cs.LG - 2023-11-30

Curvature Explains Loss of Plasticity

  • paper_url: http://arxiv.org/abs/2312.00246
  • repo_url: None
  • paper_authors: Alex Lewandowski, Haruto Tanaka, Dale Schuurmans, Marlos C. Machado
  • for: This paper aims to explain the mechanism behind loss of plasticity in neural networks and offers an explanation based on curvature.
  • methods: The authors support their hypothesis with a systematic empirical study, measuring plasticity loss and curvature loss across several continual supervised learning problems.
  • results: The study finds that loss of plasticity coincides with a reduction in directions of curvature, which sometimes precedes it, while previous explanations fail to cover all settings. The authors also introduce a simple distributional regularizer that effectively mitigates loss of plasticity.
    Abstract Loss of plasticity is a phenomenon in which neural networks lose their ability to learn from new experience. Despite being empirically observed in several problem settings, little is understood about the mechanisms that lead to loss of plasticity. In this paper, we offer a consistent explanation for plasticity loss, based on an assertion that neural networks lose directions of curvature during training and that plasticity loss can be attributed to this reduction in curvature. To support such a claim, we provide a systematic empirical investigation of plasticity loss across several continual supervised learning problems. Our findings illustrate that curvature loss coincides with and sometimes precedes plasticity loss, while also showing that previous explanations are insufficient to explain loss of plasticity in all settings. Lastly, we show that regularizers which mitigate loss of plasticity also preserve curvature, motivating a simple distributional regularizer that proves to be effective across the problem settings considered.
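The distributional regularizer is only described at a high level above; below is a minimal sketch of one plausible form, assuming the idea is to penalize drift of a layer's pre-activation statistics away from a fixed reference distribution (the unit-Gaussian target and the weighting are assumptions, not the paper's exact choice).

```python
import torch

def distributional_regularizer(preacts: torch.Tensor, weight: float = 1e-2) -> torch.Tensor:
    """Penalize drift of a layer's pre-activation distribution away from N(0, 1).

    preacts: (batch, features) pre-activations for one layer.
    Returns a scalar penalty to add to the task loss.
    """
    mu = preacts.mean(dim=0)
    var = preacts.var(dim=0, unbiased=False)
    # Per-feature KL( N(mu, var) || N(0, 1) ), averaged over features.
    kl = 0.5 * (var + mu.pow(2) - 1.0 - torch.log(var + 1e-8))
    return weight * kl.mean()

# Hypothetical usage inside a training step, assuming the model exposes its pre-activations:
# loss = task_loss + sum(distributional_regularizer(h) for h in model.preactivations)
```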

Self-similarity of Communities of the ABCD Model

  • paper_url: http://arxiv.org/abs/2312.00238
  • repo_url: None
  • paper_authors: Jordan Barrett, Bogumil Kaminski, Pawel Pralat, Francois Theberge
  • for: This paper studies the Artificial Benchmark for Community Detection (ABCD) graph, a synthetic benchmark for community detection.
  • methods: The ABCD model is a random graph model with community structure and power-law distributions for both degrees and community sizes; it is faster than the well-known LFR model and can be investigated analytically.
  • results: The study shows that the ABCD model exhibits an interesting self-similar behaviour: the degree distribution of ground-truth communities is asymptotically the same as that of the whole graph (appropriately normalized by size). As a result, one can estimate not only the number of edges induced by each community but also the number of self-loops and multi-edges generated during the process. Understanding these quantities matters because rewiring self-loops and multi-edges to keep the graph simple is an expensive part of the algorithm, and every rewiring makes the underlying configuration models deviate slightly from uniform simple graphs.
    Abstract The Artificial Benchmark for Community Detection (ABCD) graph is a random graph model with community structure and power-law distribution for both degrees and community sizes. The model generates graphs similar to the well-known LFR model but it is faster and can be investigated analytically. In this paper, we show that the ABCD model exhibits some interesting self-similar behaviour, namely, the degree distribution of ground-truth communities is asymptotically the same as the degree distribution of the whole graph (appropriately normalized based on their sizes). As a result, we can not only estimate the number of edges induced by each community but also the number of self-loops and multi-edges generated during the process. Understanding these quantities is important as (a) rewiring self-loops and multi-edges to keep the graph simple is an expensive part of the algorithm, and (b) every rewiring causes the underlying configuration models to deviate slightly from uniform simple graphs on their corresponding degree sequences.
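To make the self-loop/multi-edge discussion concrete, here is a small illustrative experiment (not the ABCD generator itself): pair the stubs of a heavy-tailed degree sequence in a plain configuration model and count the self-loops and multi-edges that would need rewiring.

```python
import random
from collections import Counter

def count_selfloops_multiedges(degrees, seed=0):
    """Sample one configuration-model stub pairing for a degree sequence and
    count self-loops and multi-edges (illustrative only)."""
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:                 # make the stub count even so pairing is possible
        stubs.append(len(degrees) - 1)
    rng.shuffle(stubs)
    edges = list(zip(stubs[::2], stubs[1::2]))
    self_loops = sum(u == v for u, v in edges)
    pair_counts = Counter(frozenset((u, v)) for u, v in edges if u != v)
    multi_edges = sum(c - 1 for c in pair_counts.values() if c > 1)
    return self_loops, multi_edges

# A rough power-law-like degree sequence for 500 vertices
degs = [max(1, int(50 / (i + 1) ** 0.7)) for i in range(500)]
print(count_selfloops_multiedges(degs))
```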

Deep Equilibrium Based Neural Operators for Steady-State PDEs

  • paper_url: http://arxiv.org/abs/2312.00234
  • repo_url: https://github.com/risteskilab/deq-neural-operators
  • paper_authors: Tanya Marwah, Ashwini Pokle, J. Zico Kolter, Zachary C. Lipton, Jianfeng Lu, Andrej Risteski
  • for: This paper studies how data-driven machine learning methods can be used to solve steady-state partial differential equations (PDEs).
  • methods: The paper studies weight-tied neural network architectures and proposes FNO-DEQ, a deep equilibrium variant of the Fourier Neural Operator (FNO) that solves for the solution of a steady-state PDE as the fixed point of an implicit operator layer.
  • results: Experiments show that FNO-DEQ-based architectures outperform FNO baselines with 4x the number of parameters when predicting solutions of steady-state PDEs such as Darcy Flow and steady-state incompressible Navier-Stokes, and that FNO-DEQ is more robust when trained on datasets with noisier observations. The paper also proves a universal approximation result: FNO-DEQ can approximate the solution of any steady-state PDE that can be written as a fixed point equation.
    Abstract Data-driven machine learning approaches are being increasingly used to solve partial differential equations (PDEs). They have shown particularly striking successes when training an operator, which takes as input a PDE in some family, and outputs its solution. However, the architectural design space, especially given structural knowledge of the PDE family of interest, is still poorly understood. We seek to remedy this gap by studying the benefits of weight-tied neural network architectures for steady-state PDEs. To achieve this, we first demonstrate that the solution of most steady-state PDEs can be expressed as a fixed point of a non-linear operator. Motivated by this observation, we propose FNO-DEQ, a deep equilibrium variant of the FNO architecture that directly solves for the solution of a steady-state PDE as the infinite-depth fixed point of an implicit operator layer using a black-box root solver and differentiates analytically through this fixed point resulting in $\mathcal{O}(1)$ training memory. Our experiments indicate that FNO-DEQ-based architectures outperform FNO-based baselines with $4\times$ the number of parameters in predicting the solution to steady-state PDEs such as Darcy Flow and steady-state incompressible Navier-Stokes. Finally, we show FNO-DEQ is more robust when trained with datasets with more noisy observations than the FNO-based baselines, demonstrating the benefits of using appropriate inductive biases in architectural design for different neural network based PDE solvers. Further, we show a universal approximation result that demonstrates that FNO-DEQ can approximate the solution to any steady-state PDE that can be written as a fixed point equation.
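A minimal sketch of the deep-equilibrium idea behind FNO-DEQ: solve u* = G(u*, a) by fixed-point iteration without building a computation graph, then apply G once more with gradients enabled. This is the common O(1)-memory one-step shortcut rather than the exact implicit differentiation the paper uses, and the toy operator G below stands in for a Fourier layer.

```python
import torch

def deq_fixed_point(G, a, u0, max_iter=50, tol=1e-5):
    """Solve u* = G(u*, a) by Picard iteration without tracking gradients,
    then re-apply G once with gradients enabled so autograd sees an
    O(1)-memory approximation of the implicit derivative."""
    u = u0
    with torch.no_grad():
        for _ in range(max_iter):
            u_next = G(u, a)
            if (u_next - u).norm() < tol * (u.norm() + 1e-12):
                u = u_next
                break
            u = u_next
    return G(u, a)          # one differentiable application at the fixed point

# Toy contraction standing in for a Fourier-layer operator conditioned on the PDE data a:
G = lambda u, a: 0.5 * torch.tanh(u) + a
a = torch.randn(4, 16, requires_grad=True)
u_star = deq_fixed_point(G, a, torch.zeros_like(a))
u_star.sum().backward()    # gradients flow through the final application only
```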

EpiTESTER: Testing Autonomous Vehicles with Epigenetic Algorithm and Attention Mechanism

  • paper_url: http://arxiv.org/abs/2312.00207
  • repo_url: https://github.com/simula-complex/epitester
  • paper_authors: Chengjie Lu, Shaukat Ali, Tao Yue
  • for: This work aims to test autonomous vehicles (AVs) under diverse environmental scenarios in order to uncover situations in which the vehicle behaves unsafely.
  • methods: The work proposes a new testing method, EpiTESTER, inspired by epigenetics, which enables species to adapt to sudden environmental changes. EpiTESTER adopts gene silencing as its epigenetic mechanism, suppressing the expression of certain genes, with gene-expression probabilities computed dynamically (via a multi-model fusion transformer with attention) as the environment changes.
  • results: Compared with a classical genetic algorithm (GA) and with EpiTESTER using an equal probability for each gene, EpiTESTER performs best at identifying critical (unsafe) scenarios, suggesting that applying epigenetic mechanisms is a good option for solving practical problems.
    Abstract Testing autonomous vehicles (AVs) under various environmental scenarios that lead the vehicles to unsafe situations is known to be challenging. Given the infinite possible environmental scenarios, it is essential to find critical scenarios efficiently. To this end, we propose a novel testing method, named EpiTESTER, by taking inspiration from epigenetics, which enables species to adapt to sudden environmental changes. In particular, EpiTESTER adopts gene silencing as its epigenetic mechanism, which regulates gene expression to prevent the expression of a certain gene, and the probability of gene expression is dynamically computed as the environment changes. Given different data modalities (e.g., images, lidar point clouds) in the context of AV, EpiTESTER benefits from a multi-model fusion transformer to extract high-level feature representations from environmental factors and then calculates probabilities based on these features with the attention mechanism. To assess the cost-effectiveness of EpiTESTER, we compare it with a classical genetic algorithm (GA) (i.e., without any epigenetic mechanism implemented) and EpiTESTER with equal probability for each gene. We evaluate EpiTESTER with four initial environments from CARLA, an open-source simulator for autonomous driving research, and an end-to-end AV controller, Interfuser. Our results show that EpiTESTER achieved a promising performance in identifying critical scenarios compared to the baselines, showing that applying epigenetic mechanisms is a good option for solving practical problems.
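A rough sketch of the gene-silencing idea: during mutation, each gene (an environmental parameter) is left unexpressed with a scene-dependent probability. In EpiTESTER those probabilities come from the attention-based fusion model; here they are just an input array, and the parameter names are hypothetical.

```python
import numpy as np

def epigenetic_mutation(parent, silencing_probs, sigma=0.1, rng=None):
    """Mutate a real-valued chromosome, but 'silence' (leave unchanged) each gene
    with its silencing probability. The probabilities are an input here; in
    EpiTESTER they are produced from the current driving scene."""
    rng = rng or np.random.default_rng()
    silenced = rng.random(parent.shape) < silencing_probs
    child = parent + sigma * rng.standard_normal(parent.shape)
    child[silenced] = parent[silenced]      # silenced genes are not expressed
    return child

parent = np.array([0.3, 0.7, 0.1, 0.9])     # e.g. fog, rain, sun altitude, traffic density (hypothetical)
probs  = np.array([0.8, 0.1, 0.5, 0.2])     # hypothetical, scene-dependent
print(epigenetic_mutation(parent, probs, rng=np.random.default_rng(0)))
```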

Optimal Attack and Defense for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2312.00198
  • repo_url: None
  • paper_authors: Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie
  • for: The paper is written to study the robustness of Reinforcement Learning (RL) in real systems, specifically against online manipulation attacks.
  • methods: The paper uses a Markov Decision Process (MDP) to model the attacker’s problem and a stochastic Stackelberg game to compute the optimal defense policy for the victim.
  • results: The paper shows that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity, and the victim can compute an optimal defense policy as the solution to a partially-observable turn-based stochastic game (POTBSG). The solutions are truly robust, although the defense problem is NP-hard.
    Abstract To ensure the usefulness of Reinforcement Learning (RL) in real systems, it is crucial to ensure they are robust to noise and adversarial attacks. In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment. We study the full class of online manipulation attacks, which include (i) state attacks, (ii) observation attacks (which are a generalization of perceived-state attacks), (iii) action attacks, and (iv) reward attacks. We show the attacker's problem of designing a stealthy attack that maximizes its own expected reward, which often corresponds to minimizing the victim's value, is captured by a Markov Decision Process (MDP) that we call a meta-MDP since it is not the true environment but a higher level environment induced by the attacked interaction. We show that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity using standard RL techniques. We argue that the optimal defense policy for the victim can be computed as the solution to a stochastic Stackelberg game, which can be further simplified into a partially-observable turn-based stochastic game (POTBSG). Neither the attacker nor the victim would benefit from deviating from their respective optimal policies, thus such solutions are truly robust. Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.
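For intuition, here is a minimal gym-like sketch of one of the four attack surfaces (a budgeted reward attack); the interface and the budget clipping are assumptions, and the paper's meta-MDP construction covers state, observation, and action attacks as well.

```python
class RewardAttackWrapper:
    """Wrap an environment so an attacker policy can perturb the victim's reward
    within a per-step budget (a reward attack). Minimal gym-like interface; the
    paper's meta-MDP also covers state, observation, and action attacks."""
    def __init__(self, env, attacker_policy, budget=0.5):
        self.env, self.attacker, self.budget = env, attacker_policy, budget

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        delta = max(-self.budget, min(self.budget, self.attacker(obs, action, reward)))
        return obs, reward + delta, done, info   # the victim only ever sees the perturbed reward

# e.g. wrapped = RewardAttackWrapper(env, lambda obs, a, r: -0.2 * r, budget=0.5)
```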

Enhancing Ligand Pose Sampling for Molecular Docking

  • paper_url: http://arxiv.org/abs/2312.00191
  • repo_url: https://github.com/drorlab/glow_ives
  • paper_authors: Patricia Suriana, Ron O. Dror
  • for: This study aims to improve scoring functions for molecular docking, which are used for binding pose prediction and virtual screening.
  • methods: The study introduces two improved pose-sampling protocols, GLOW (auGmented sampLing with sOftened vdW potential) and IVES (IteratiVe Ensemble Sampling), to increase the likelihood of generating accurate candidate binding poses.
  • results: Benchmarking shows that GLOW and IVES improve the likelihood of sampling accurate poses, especially for binding pockets whose shape changes substantially when different ligands bind, on both experimentally determined and AlphaFold-generated protein structures. The authors also release candidate ligand poses for around 5,000 protein-ligand cross-docking pairs for training and testing scoring functions.
    Abstract Deep learning promises to dramatically improve scoring functions for molecular docking, leading to substantial advances in binding pose prediction and virtual screening. To train scoring functions-and to perform molecular docking-one must generate a set of candidate ligand binding poses. Unfortunately, the sampling protocols currently used to generate candidate poses frequently fail to produce any poses close to the correct, experimentally determined pose, unless information about the correct pose is provided. This limits the accuracy of learned scoring functions and molecular docking. Here, we describe two improved protocols for pose sampling: GLOW (auGmented sampLing with sOftened vdW potential) and a novel technique named IVES (IteratiVe Ensemble Sampling). Our benchmarking results demonstrate the effectiveness of our methods in improving the likelihood of sampling accurate poses, especially for binding pockets whose shape changes substantially when different ligands bind. This improvement is observed across both experimentally determined and AlphaFold-generated protein structures. Additionally, we present datasets of candidate ligand poses generated using our methods for each of around 5,000 protein-ligand cross-docking pairs, for training and testing scoring functions. To benefit the research community, we provide these cross-docking datasets and an open-source Python implementation of GLOW and IVES at https://github.com/drorlab/GLOW_IVES .
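The "softened vdW potential" in GLOW can be pictured with a soft-core Lennard-Jones form that caps short-range repulsion so near-clashing poses are not discarded outright; the exact functional form and parameters used by GLOW may differ (this is a common soft-core variant, given as an assumption).

```python
import numpy as np

def softened_lj(r, epsilon=1.0, sigma=3.5, alpha=0.5):
    """Soft-core 12-6 Lennard-Jones potential: the alpha term keeps the
    repulsion finite at short range, so poses with mild steric clashes
    survive the sampling stage instead of being rejected outright."""
    s6 = sigma**6 / (alpha * sigma**6 + r**6)   # soft-core replacement for (sigma/r)^6
    return 4.0 * epsilon * (s6**2 - s6)

r = np.linspace(0.5, 8.0, 5)   # distances in Angstroms
print(softened_lj(r))          # finite even at very small r
```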

Non-uniform Online Learning: Towards Understanding Induction

  • paper_url: http://arxiv.org/abs/2312.00170
  • repo_url: None
  • paper_authors: Zhou Lu
  • for: This paper explores the relationship between online learning and inductive inference, with a focus on the former's limitations in applying to the latter.
  • methods: The authors introduce the concept of non-uniform online learning and provide a complete characterization of learnability with finite error in the realizable setting. They also propose a necessary condition for consistency and extend their results to the more realistic agnostic setting.
  • results: The paper shows that any countable union of Littlestone classes can be learnt with regret $\tilde{O}(\sqrt{T})$ in the agnostic setting, and provides a new perspective on the power of induction from an online learning viewpoint.
    Abstract Can a physicist make only finite errors in the endless pursuit of the law of nature? This millennium-old question of inductive inference is a fundamental, yet mysterious problem in philosophy, lacking rigorous justifications. While classic online learning theory and inductive inference share a similar sequential decision-making spirit, the former's reliance on an adaptive adversary and worst-case error bounds limits its applicability to the latter. In this work, we introduce the concept of non-uniform online learning, which we argue aligns more closely with the principles of inductive reasoning. This setting assumes a predetermined ground-truth hypothesis and considers non-uniform, hypothesis-wise error bounds. In the realizable setting, we provide a complete characterization of learnability with finite error: a hypothesis class is non-uniform learnable if and only if it's a countable union of Littlestone classes, no matter the observations are adaptively chosen or iid sampled. Additionally, we propose a necessary condition for the weaker criterion of consistency which we conjecture to be tight. To further promote our theory, we extend our result to the more realistic agnostic setting, showing that any countable union of Littlestone classes can be learnt with regret $\tilde{O}(\sqrt{T})$. We hope this work could offer a new perspective of interpreting the power of induction from an online learning viewpoint.

The Multiverse of Dynamic Mode Decomposition Algorithms

  • paper_url: http://arxiv.org/abs/2312.00137
  • repo_url: https://github.com/mcolbrook/dmd-multiverse
  • paper_authors: Matthew J. Colbrook
  • for: This review introduces the data-driven analysis technique Dynamic Mode Decomposition (DMD), emphasizing the role of Koopman operators in casting complex nonlinear dynamical systems into a linear framework.
  • methods: The review focuses on the relationship between DMD methods and the spectral properties of Koopman operators, and analyzes three main families of DMD methods: linear regression-based methods, Galerkin approximations, and structure-preserving techniques.
  • results: The review surveys significant algorithms and their applications, and includes a MATLAB package with examples to help readers gain a practical understanding of these methods.
    Abstract Dynamic Mode Decomposition (DMD) is a popular data-driven analysis technique used to decompose complex, nonlinear systems into a set of modes, revealing underlying patterns and dynamics through spectral analysis. This review presents a comprehensive and pedagogical examination of DMD, emphasizing the role of Koopman operators in transforming complex nonlinear dynamics into a linear framework. A distinctive feature of this review is its focus on the relationship between DMD and the spectral properties of Koopman operators, with particular emphasis on the theory and practice of DMD algorithms for spectral computations. We explore the diverse "multiverse" of DMD methods, categorized into three main areas: linear regression-based methods, Galerkin approximations, and structure-preserving techniques. Each category is studied for its unique contributions and challenges, providing a detailed overview of significant algorithms and their applications as outlined in Table 1. We include a MATLAB package with examples and applications to enhance the practical understanding of these methods. This review serves as both a practical guide and a theoretical reference for various DMD methods, accessible to both experts and newcomers, and enabling readers to delve into their areas of interest in the expansive field of DMD.
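As a reference point for the "multiverse", the simplest member, exact DMD via the SVD of snapshot pairs, fits in a few lines (a textbook sketch in Python, not code from the accompanying MATLAB package).

```python
import numpy as np

def exact_dmd(X, Y, rank=None):
    """Exact DMD: given snapshot pairs Y ~ A X, return the DMD eigenvalues and
    modes of the best-fit linear operator A."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    r = rank or len(s)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    Atilde = U.conj().T @ Y @ Vh.conj().T / s        # operator projected onto the POD basis
    eigvals, W = np.linalg.eig(Atilde)
    modes = Y @ Vh.conj().T / s @ W / eigvals        # exact DMD modes
    return eigvals, modes

# Toy data: snapshots of a linear system x_{k+1} = A x_k
A = np.array([[0.9, -0.2], [0.2, 0.9]])
X = np.zeros((2, 50)); X[:, 0] = [1.0, 0.0]
for k in range(49):
    X[:, k + 1] = A @ X[:, k]
lam, phi = exact_dmd(X[:, :-1], X[:, 1:])
print(np.sort_complex(lam), np.sort_complex(np.linalg.eigvals(A)))   # eigenvalues recovered
```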

Low latency optical-based mode tracking with machine learning deployed on FPGAs on a tokamak

  • paper_url: http://arxiv.org/abs/2312.00128
  • repo_url: None
  • paper_authors: Yumou Wei, Ryan F. Forelli, Chris Hansen, Jeffrey P. Levesque, Nhan Tran, Joshua C. Agar, Giuseppe Di Guglielmo, Michael E. Mauel, Gerald A. Navratil
  • for: This work applies machine learning in real time to identify and control magnetohydrodynamic (MHD) instabilities in a tokamak fusion device.
  • methods: The study processes high-speed camera data on in situ Field Programmable Gate Array (FPGA) hardware and uses a convolutional neural network (CNN) model to predict the amplitude and phase of the $n$=1 MHD mode.
  • results: The study demonstrates an FPGA-based high-speed camera data acquisition and processing system on the High Beta Tokamak-Extended Pulse (HBT-EP) experiment, achieving a trigger-to-output latency of 17.6 µs and a throughput of up to 120 kfps for real-time machine-learning-based diagnostics and control.
    Abstract Active feedback control in magnetic confinement fusion devices is desirable to mitigate plasma instabilities and enable robust operation. Optical high-speed cameras provide a powerful, non-invasive diagnostic and can be suitable for these applications. In this study, we process fast camera data, at rates exceeding 100kfps, on $\textit{in situ}$ Field Programmable Gate Array (FPGA) hardware to track magnetohydrodynamic (MHD) mode evolution and generate control signals in real-time. Our system utilizes a convolutional neural network (CNN) model which predicts the $n$=1 MHD mode amplitude and phase using camera images with better accuracy than other tested non-deep-learning-based methods. By implementing this model directly within the standard FPGA readout hardware of the high-speed camera diagnostic, our mode tracking system achieves a total trigger-to-output latency of 17.6$\mu$s and a throughput of up to 120kfps. This study at the High Beta Tokamak-Extended Pulse (HBT-EP) experiment demonstrates an FPGA-based high-speed camera data acquisition and processing system, enabling application in real-time machine-learning-based tokamak diagnostic and control as well as potential applications in other scientific domains.
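A toy version of the camera-to-mode mapping: a small CNN that outputs the n=1 mode as (cos, sin) components, from which amplitude and phase follow. Layer sizes and image resolution are illustrative assumptions, not the quantized network actually deployed on the FPGA.

```python
import torch
import torch.nn as nn

class ModeTrackerCNN(nn.Module):
    """Tiny CNN mapping one camera frame to (A cos(phi), A sin(phi)) for the
    n=1 mode; amplitude and phase are recovered from the two components."""
    def __init__(self, h=64, w=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten())
        self.head = nn.Linear(16 * (h // 4) * (w // 4), 2)

    def forward(self, x):
        c = self.head(self.features(x))           # (batch, 2)
        amp = c.norm(dim=1)
        phase = torch.atan2(c[:, 1], c[:, 0])
        return amp, phase

amp, phase = ModeTrackerCNN()(torch.randn(4, 1, 64, 64))
print(amp.shape, phase.shape)
```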

Flow Matching Beyond Kinematics: Generating Jets with Particle-ID and Trajectory Displacement Information

  • paper_url: http://arxiv.org/abs/2312.00123
  • repo_url: https://github.com/uhh-pd-ml/beyond_kinematics
  • paper_authors: Joschka Birk, Erik Buhmann, Cedric Ewen, Gregor Kasieczka, David Shih
  • for: This paper builds a generative model for jets from the JetClass dataset using a continuous normalizing flow (CNF).
  • methods: The model is trained with the flow matching technique, is permutation-equivariant, and is conditioned on the jet type, so that a single model can generate all ten jet types of JetClass.
  • results: The model accurately generates all features in the JetClass dataset, going beyond kinematics to include particle-ID and track impact parameter.
    Abstract We introduce the first generative model trained on the JetClass dataset. Our model generates jets at the constituent level, and it is a permutation-equivariant continuous normalizing flow (CNF) trained with the flow matching technique. It is conditioned on the jet type, so that a single model can be used to generate the ten different jet types of JetClass. For the first time, we also introduce a generative model that goes beyond the kinematic features of jet constituents. The JetClass dataset includes more features, such as particle-ID and track impact parameter, and we demonstrate that our CNF can accurately model all of these additional features as well. Our generative model for JetClass expands on the versatility of existing jet generation techniques, enhancing their potential utility in high-energy physics research, and offering a more comprehensive understanding of the generated jets.
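The flow-matching objective itself is generic and can be sketched as below, using the common Gaussian probability path from noise to data; the paper's model additionally handles set-structured jet constituents with permutation equivariance, and the model signature model(xt, t, cond) is an assumption.

```python
import torch

def flow_matching_loss(model, x1, cond, sigma_min=1e-4):
    """Conditional flow-matching objective with the standard Gaussian path from
    noise x0 ~ N(0, I) to data x1: the network is regressed onto the target
    velocity x1 - (1 - sigma_min) * x0 at a random time t."""
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)), device=x1.device)
    xt = (1 - (1 - sigma_min) * t) * x0 + t * x1      # point on the probability path at time t
    target_v = x1 - (1 - sigma_min) * x0              # conditional target velocity
    pred_v = model(xt, t, cond)                       # assumed signature: (state, time, jet-type condition)
    return ((pred_v - target_v) ** 2).mean()
```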

Scalable Bayesian uncertainty quantification with data-driven priors for radio interferometric imaging

  • paper_url: http://arxiv.org/abs/2312.00125
  • repo_url: https://github.com/astro-informatics/quantifai
  • paper_authors: Tobías I. Liaudat, Matthijs Mars, Matthew A. Price, Marcelo Pereyra, Marta M. Betcke, Jason D. McEwen
  • for: This paper aims to address the challenge of uncertainty quantification (UQ) in radio-interferometric imaging with next-generation radio telescopes like the Square Kilometre Array.
  • methods: The proposed method, called QuantifAI, uses a data-driven (learned) prior for high-dimensional settings, combined with a physically motivated likelihood function. The method leverages probability concentration phenomena of high-dimensional log-concave posteriors to obtain information about the posterior, and uses convex optimization methods to compute the maximum a posteriori (MAP) estimation.
  • results: The method is demonstrated in a simulated setting, showing improved image quality and more meaningful uncertainties compared to a benchmark method based on a sparsity-promoting prior. The method is also shown to be fast and scalable, making it a promising approach for UQ in radio-interferometric imaging.
    Abstract Next-generation radio interferometers like the Square Kilometer Array have the potential to unlock scientific discoveries thanks to their unprecedented angular resolution and sensitivity. One key to unlocking their potential resides in handling the deluge and complexity of incoming data. This challenge requires building radio interferometric imaging methods that can cope with the massive data sizes and provide high-quality image reconstructions with uncertainty quantification (UQ). This work proposes a method coined QuantifAI to address UQ in radio-interferometric imaging with data-driven (learned) priors for high-dimensional settings. Our model, rooted in the Bayesian framework, uses a physically motivated model for the likelihood. The model exploits a data-driven convex prior, which can encode complex information learned implicitly from simulations and guarantee the log-concavity of the posterior. We leverage probability concentration phenomena of high-dimensional log-concave posteriors that let us obtain information about the posterior, avoiding MCMC sampling techniques. We rely on convex optimisation methods to compute the MAP estimation, which is known to be faster and better scale with dimension than MCMC sampling strategies. Our method allows us to compute local credible intervals, i.e., Bayesian error bars, and perform hypothesis testing of structure on the reconstructed image. In addition, we propose a novel blazing-fast method to compute pixel-wise uncertainties at different scales. We demonstrate our method by reconstructing radio-interferometric images in a simulated setting and carrying out fast and scalable UQ, which we validate with MCMC sampling. Our method shows an improved image quality and more meaningful uncertainties than the benchmark method based on a sparsity-promoting prior. QuantifAI's source code: https://github.com/astro-informatics/QuantifAI.
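A stripped-down sketch of the MAP step with a Gaussian likelihood and a smooth learned convex prior, using plain gradient descent; QuantifAI relies on convex optimisation with a learned convex regulariser and additionally derives credible regions from the log-concave posterior, none of which is reproduced here (forward, adjoint, and grad_prior are assumed callables).

```python
import numpy as np

def map_estimate(y, forward, adjoint, grad_prior, lam=1.0, step=1e-2, iters=500):
    """Gradient-descent MAP estimation for a posterior with Gaussian likelihood
    ||y - Phi x||^2 / 2 and a smooth convex prior whose gradient is grad_prior."""
    x = adjoint(y)                                  # simple "dirty image" initialisation
    for _ in range(iters):
        grad = adjoint(forward(x) - y) + lam * grad_prior(x)
        x = x - step * grad
    return x

# Toy usage with an identity measurement operator and a quadratic (Gaussian) prior:
x_map = map_estimate(np.array([1.0, 2.0]), forward=lambda x: x, adjoint=lambda y: y,
                     grad_prior=lambda x: x, lam=0.5)
print(x_map)   # converges towards y / (1 + lam)
```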

Geometry-Aware Normalizing Wasserstein Flows for Optimal Causal Inference

  • paper_url: http://arxiv.org/abs/2311.18826
  • repo_url: None
  • paper_authors: Kaiwen Hou
  • for: This manuscript enriches the framework of continuous normalizing flows (CNFs) in causal inference, primarily to improve the geometric properties of the parametric submodels used in targeted maximum likelihood estimation (TMLE).
  • methods: By introducing an innovative application of CNFs, the authors construct a refined series of parametric submodels that enable a directed interpolation between the prior distribution $p_0$ and the empirical distribution $p_1$. The approach optimizes the semiparametric efficiency bound in causal inference by orchestrating CNFs to align with Wasserstein gradient flows.
  • results: The approach not only minimizes the mean squared error of the estimation but also endows the estimators with geometric sophistication, enhancing robustness against misspecification. This robustness is crucial because it alleviates the dependence on the standard $n^{\frac{1}{4}}$ rate for a doubly-robust perturbation direction in TMLE. By incorporating robust optimization principles and differential geometry into the estimators, the developed geometry-aware CNFs represent a significant advancement toward doubly robust causal inference.
    Abstract This manuscript enriches the framework of continuous normalizing flows (CNFs) within causal inference, primarily to augment the geometric properties of parametric submodels used in targeted maximum likelihood estimation (TMLE). By introducing an innovative application of CNFs, we construct a refined series of parametric submodels that enable a directed interpolation between the prior distribution $p_0$ and the empirical distribution $p_1$. This proposed methodology serves to optimize the semiparametric efficiency bound in causal inference by orchestrating CNFs to align with Wasserstein gradient flows. Our approach not only endeavors to minimize the mean squared error in the estimation but also imbues the estimators with geometric sophistication, thereby enhancing robustness against misspecification. This robustness is crucial, as it alleviates the dependence on the standard $n^{\frac{1}{4}}$ rate for a doubly-robust perturbation direction in TMLE. By incorporating robust optimization principles and differential geometry into the estimators, the developed geometry-aware CNFs represent a significant advancement in the pursuit of doubly robust causal inference.

An Adaptive Framework for Generalizing Network Traffic Prediction towards Uncertain Environments

  • paper_url: http://arxiv.org/abs/2311.18824
  • repo_url: None
  • paper_authors: Alexander Downey, Evren Tuna, Alkan Soysal
  • for: This paper proposes a new framework for dynamically assigning mobile network traffic prediction models in previously unseen wireless environments.
  • methods: The framework uses time-series analysis, combining unsupervised clustering with supervised learning, to predict traffic volume within each cluster.
  • results: The framework outperforms any single model, with over a 50% improvement relative to current studies, and surpasses traditional approaches without requiring prior knowledge of a cell; it can also be applied to other machine learning applications in uncertain environments.
    Abstract We have developed a new framework using time-series analysis for dynamically assigning mobile network traffic prediction models in previously unseen wireless environments. Our framework selectively employs learned behaviors, outperforming any single model with over a 50% improvement relative to current studies. More importantly, it surpasses traditional approaches without needing prior knowledge of a cell. While this paper focuses on network traffic prediction using our adaptive forecasting framework, this framework can also be applied to other machine learning applications in uncertain environments. The framework begins with unsupervised clustering of time-series data to identify unique trends and seasonal patterns. Subsequently, we apply supervised learning for traffic volume prediction within each cluster. This specialization towards specific traffic behaviors occurs without penalties from spatial and temporal variations. Finally, the framework adaptively assigns trained models to new, previously unseen cells. By analyzing real-time measurements of a cell, our framework intelligently selects the most suitable cluster for that cell at any given time, with cluster assignment dynamically adjusting to spatio-temporal fluctuations.
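The online assignment step can be pictured as matching a cell's recent measurements to the nearest behaviour cluster and then using that cluster's forecaster; the summary features below (mean, standard deviation, lag-1 autocorrelation) and the centroid values are illustrative assumptions.

```python
import numpy as np

def assign_cluster(recent_window, centroids):
    """Assign a previously unseen cell to the nearest traffic-behaviour cluster
    from simple summary features of its recent measurements; the per-cluster
    forecaster trained offline is then used for that cell."""
    x = np.asarray(recent_window, dtype=float)
    ac1 = np.corrcoef(x[:-1], x[1:])[0, 1] if x.std() > 0 else 0.0
    feats = np.array([x.mean(), x.std(), ac1])
    return int(np.argmin(np.linalg.norm(centroids - feats, axis=1)))

centroids = np.array([[10.0, 2.0, 0.8], [50.0, 15.0, 0.2]])   # hypothetical cluster centres
print(assign_cluster([11, 12, 10, 13, 9, 12], centroids))     # -> 0 (low, smooth traffic)
```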

Pre-registration for Predictive Modeling

  • paper_url: http://arxiv.org/abs/2311.18807
  • repo_url: https://github.com/bostonadam525/Exploring-Ebay-Car-Sales-Data
  • paper_authors: Jake M. Hofman, Angelos Chatzimparmpas, Amit Sharma, Duncan J. Watts, Jessica Hullman
  • for: To improve the reproducibility and generalizability of predictive modeling.
  • methods: The authors propose adapting pre-registration practices from explanatory modeling to predictive modeling and introduce a lightweight pre-registration template.
  • results: A qualitative study with machine learning researchers provides insight into how pre-registration can prevent biased estimates and promote more reliable research outcomes.
    Abstract Amid rising concerns of reproducibility and generalizability in predictive modeling, we explore the possibility and potential benefits of introducing pre-registration to the field. Despite notable advancements in predictive modeling, spanning core machine learning tasks to various scientific applications, challenges such as overlooked contextual factors, data-dependent decision-making, and unintentional re-use of test data have raised questions about the integrity of results. To address these issues, we propose adapting pre-registration practices from explanatory modeling to predictive modeling. We discuss current best practices in predictive modeling and their limitations, introduce a lightweight pre-registration template, and present a qualitative study with machine learning researchers to gain insight into the effectiveness of pre-registration in preventing biased estimates and promoting more reliable research outcomes. We conclude by exploring the scope of problems that pre-registration can address in predictive modeling and acknowledging its limitations within this context.

Efficient Baseline for Quantitative Precipitation Forecasting in Weather4cast 2023

  • paper_url: http://arxiv.org/abs/2311.18806
  • repo_url: None
  • paper_authors: Akshay Punjabi, Pablo Izquierdo Ayala
  • for: Accurate precipitation forecasting to support decision-making across various industries.
  • methods: A minimalist U-Net model that requires only modest computational resources, proposed as a baseline.
  • results: The paper provides an efficient baseline for quantitative precipitation forecasting that reduces computational cost and can serve as a reference point for future weather forecasting initiatives.
    Abstract Accurate precipitation forecasting is indispensable for informed decision-making across various industries. However, the computational demands of current models raise environmental concerns. We address the critical need for accurate precipitation forecasting while considering the environmental impact of computational resources and propose a minimalist U-Net architecture to be used as a baseline for future weather forecasting initiatives.
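A minimalist U-Net in the spirit of the proposed baseline: one encoder stage, one decoder stage, and a skip connection. Channel counts, depth, and the number of input channels are illustrative assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """One-level U-Net: encode, downsample, process, upsample, concatenate the
    skip connection, and project to the precipitation output channel."""
    def __init__(self, in_ch=11, out_ch=1, base=16):
        super().__init__()
        block = lambda i, o: nn.Sequential(nn.Conv2d(i, o, 3, padding=1), nn.ReLU(),
                                           nn.Conv2d(o, o, 3, padding=1), nn.ReLU())
        self.enc = block(in_ch, base)
        self.down = nn.MaxPool2d(2)
        self.mid = block(base, base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = block(base * 2, base)
        self.out = nn.Conv2d(base, out_ch, 1)

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(self.down(e))
        d = self.dec(torch.cat([self.up(m), e], dim=1))
        return self.out(d)

print(TinyUNet()(torch.randn(1, 11, 64, 64)).shape)   # torch.Size([1, 1, 64, 64])
```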

Communication-Efficient Federated Optimization over Semi-Decentralized Networks

  • paper_url: http://arxiv.org/abs/2311.18787
  • repo_url: None
  • paper_authors: He Wang, Yuejie Chi
  • for: This work aims to improve communication efficiency in large-scale federated and decentralized learning, where efficiency is limited by network topology and data heterogeneity.
  • methods: The authors consider a semi-decentralized communication protocol in which agents can perform both agent-to-agent and agent-to-server communication in a probabilistic manner, and propose PISCO, a communication-efficient algorithm with gradient tracking that allows multiple local updates to save communication.
  • results: The authors establish the convergence rate of PISCO for nonconvex problems, showing a linear speedup in the number of agents and local updates, and demonstrate its superior communication efficiency and resilience to data heterogeneity and various network topologies.
    Abstract In large-scale federated and decentralized learning, communication efficiency is one of the most challenging bottlenecks. While gossip communication -- where agents can exchange information with their connected neighbors -- is more cost-effective than communicating with the remote server, it often requires a greater number of communication rounds, especially for large and sparse networks. To tackle the trade-off, we examine the communication efficiency under a semi-decentralized communication protocol, in which agents can perform both agent-to-agent and agent-to-server communication in a probabilistic manner. We design a tailored communication-efficient algorithm over semi-decentralized networks, referred to as PISCO, which inherits the robustness to data heterogeneity thanks to gradient tracking and allows multiple local updates for saving communication. We establish the convergence rate of PISCO for nonconvex problems and show that PISCO enjoys a linear speedup in terms of the number of agents and local updates. Our numerical results highlight the superior communication efficiency of PISCO and its resilience to data heterogeneity and various network topologies.
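PISCO builds on gradient tracking, whose basic decentralized loop is sketched below for a doubly-stochastic mixing matrix W; the probabilistic agent-to-server communication and the multiple local updates that define PISCO are not shown.

```python
import numpy as np

def gradient_tracking(grad_fn, W, x0, step=0.05, iters=200):
    """Decentralized gradient tracking: each agent holds a row of x and a
    tracker y of the network-average gradient, mixed through W each round."""
    x = x0.copy()
    g = grad_fn(x)              # per-agent local gradients, stacked row-wise
    y = g.copy()                # tracker initialised with the local gradients
    for _ in range(iters):
        x = W @ x - step * y
        g_new = grad_fn(x)
        y = W @ y + g_new - g   # track the average gradient across agents
        g = g_new
    return x

# Toy: 4 agents on a ring, each with a quadratic f_i(x) = ||x - b_i||^2 / 2
W = np.array([[.5, .25, 0, .25], [.25, .5, .25, 0], [0, .25, .5, .25], [.25, 0, .25, .5]])
b = np.array([[1.], [2.], [3.], [4.]])
print(gradient_tracking(lambda x: x - b, W, np.zeros((4, 1))))   # all agents converge near 2.5
```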

Multimodal Learning for Crystalline Materials

  • paper_url: http://arxiv.org/abs/2312.00111
  • repo_url: None
  • paper_authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Peter Y. Lu, Thomas Christensen, Marin Soljačić
  • for: This work aims to use artificial intelligence (AI) to improve material property prediction and accelerate the discovery of novel materials.
  • methods: The study introduces Multimodal Learning for Crystalline Materials (MLCM), a method for training a foundation model for crystalline materials via multimodal alignment, connecting high-dimensional material properties (i.e. modalities) in a shared latent space to produce highly useful material representations.
  • results: MLCM achieves state-of-the-art performance for property prediction on the challenging Materials Project database, enables a highly accurate inverse-design method for screening stable materials with desired properties, and allows the extraction of interpretable emergent features that may provide insight to material scientists.
    Abstract Artificial intelligence (AI) has revolutionized the field of materials science by improving the prediction of properties and accelerating the discovery of novel materials. In recent years, publicly available material data repositories containing data for various material properties have grown rapidly. In this work, we introduce Multimodal Learning for Crystalline Materials (MLCM), a new method for training a foundation model for crystalline materials via multimodal alignment, where high-dimensional material properties (i.e. modalities) are connected in a shared latent space to produce highly useful material representations. We show the utility of MLCM on multiple axes: (i) MLCM achieves state-of-the-art performance for material property prediction on the challenging Materials Project database; (ii) MLCM enables a novel, highly accurate method for inverse design, allowing one to screen for stable material with desired properties; and (iii) MLCM allows the extraction of interpretable emergent features that may provide insight to material scientists. Further, we explore several novel methods for aligning an arbitrary number of modalities, improving upon prior art in multimodal learning that focuses on bimodal alignment. Our work brings innovations from the ongoing AI revolution into the domain of materials science and identifies materials as a testbed for the next generation of AI.
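Multimodal alignment of the kind described can be sketched with a CLIP-style symmetric contrastive loss between two modality encoders; extending beyond two modalities by summing the loss over modality pairs is one simple option and not necessarily the paper's construction.

```python
import torch
import torch.nn.functional as F

def pairwise_alignment_loss(z_a, z_b, temperature=0.1):
    """Symmetric contrastive loss pulling matched (material_i, material_i) embeddings
    from two modalities together in the shared latent space and pushing mismatched
    pairs apart."""
    z_a, z_b = F.normalize(z_a, dim=-1), F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(z_a.shape[0])
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Hypothetical embeddings of 8 crystals from two modality encoders (dim 32):
print(pairwise_alignment_loss(torch.randn(8, 32), torch.randn(8, 32)))
```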

MultiResFormer: Transformer with Adaptive Multi-Resolution Modeling for General Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2311.18780
  • repo_url: None
  • paper_authors: Linfeng Du, Ji Xin, Alex Labach, Saba Zuberi, Maksims Volkovs, Rahul G. Krishnan
  • for: To improve time series forecasting, particularly on long-term forecasting tasks.
  • methods: A Transformer-based model that adaptively chooses optimal patch lengths based on detected periodicities in order to model temporal variations.
  • results: MultiResFormer outperforms patch-based Transformer baselines on long-term forecasting tasks and consistently outperforms CNN baselines by a large margin, while using far fewer parameters.
    Abstract Transformer-based models have greatly pushed the boundaries of time series forecasting recently. Existing methods typically encode time series data into $\textit{patches}$ using one or a fixed set of patch lengths. This, however, could result in a lack of ability to capture the variety of intricate temporal dependencies present in real-world multi-periodic time series. In this paper, we propose MultiResFormer, which dynamically models temporal variations by adaptively choosing optimal patch lengths. Concretely, at the beginning of each layer, time series data is encoded into several parallel branches, each using a detected periodicity, before going through the transformer encoder block. We conduct extensive evaluations on long- and short-term forecasting datasets comparing MultiResFormer with state-of-the-art baselines. MultiResFormer outperforms patch-based Transformer baselines on long-term forecasting tasks and also consistently outperforms CNN baselines by a large margin, while using much fewer parameters than these baselines.
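The period-detection step that drives adaptive patch lengths can be illustrated with a simple FFT heuristic; how MultiResFormer turns detected periods into per-branch patch lengths inside each layer is not reproduced here.

```python
import numpy as np

def detect_periods(x, k=2):
    """Return the top-k candidate periodicities of a series from its FFT
    amplitude spectrum (periods measured in time steps)."""
    x = np.asarray(x, dtype=float)
    amps = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.argsort(amps)[::-1][:k]      # dominant frequency bins
    freqs = freqs[freqs > 0]                # skip the DC bin if it sneaks in
    return [len(x) // f for f in freqs]

# Hourly-style toy series with daily (24) and weekly (168) cycles:
t = np.arange(840)
series = np.sin(2 * np.pi * t / 24) + 0.5 * np.sin(2 * np.pi * t / 168)
print(detect_periods(series))               # [24, 168]
```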

Online Change Points Detection for Linear Dynamical Systems with Finite Sample Guarantees

  • paper_url: http://arxiv.org/abs/2311.18769
  • repo_url: None
  • paper_authors: Lei Xin, George Chiu, Shreyas Sundaram
  • for: This paper addresses the detection of abrupt changes in the properties of a time series, ideally as soon as possible after those changes occur.
  • methods: The approach targets linear dynamical systems with unknown dynamics, where the data exhibit temporal correlations and the system may have multiple change points. A data-dependent threshold is developed for the test, guaranteeing a pre-specified upper bound on the probability of a false alarm.
  • results: The paper provides an online change point detection procedure with a finite-sample bound on the probability of detecting a change point. The bound shows how the algorithm's parameters affect the detection probability and delay, and gives guidance on the minimum required time between changes to guarantee detection.
    Abstract The problem of online change point detection is to detect abrupt changes in properties of time series, ideally as soon as possible after those changes occur. Existing work on online change point detection either assumes i.i.d data, focuses on asymptotic analysis, does not present theoretical guarantees on the trade-off between detection accuracy and detection delay, or is only suitable for detecting single change points. In this work, we study the online change point detection problem for linear dynamical systems with unknown dynamics, where the data exhibits temporal correlations and the system could have multiple change points. We develop a data-dependent threshold that can be used in our test that allows one to achieve a pre-specified upper bound on the probability of making a false alarm. We further provide a finite-sample-based bound for the probability of detecting a change point. Our bound demonstrates how parameters used in our algorithm affect the detection probability and delay, and provides guidance on the minimum required time between changes to guarantee detection.

A data-science pipeline to enable the Interpretability of Many-Objective Feature Selection

  • paper_url: http://arxiv.org/abs/2311.18746
  • repo_url: https://github.com/f-u-njoku/many-objective-fs-nsgaiii
  • paper_authors: Uchechukwu F. Njoku, Alberto Abelló, Besim Bilalli, Gianluca Bontempi
  • for: This work aims to support data scientists in interpreting and comparing the outcomes of Many-Objective Feature Selection (MOFS) so that they can choose the best feature subset.
  • methods: The study proposes a novel methodology that combines post-processing and visualisation of the set of non-dominated solutions.
  • results: Experiments on two feature selection tasks using a GA-based MOFS with six objectives show that the methodology helps data scientists select the final feature subset by providing high-level information at three levels: objectives, solutions, and individual features.
    Abstract Many-Objective Feature Selection (MOFS) approaches use four or more objectives to determine the relevance of a subset of features in a supervised learning task. As a consequence, MOFS typically returns a large set of non-dominated solutions, which have to be assessed by the data scientist in order to proceed with the final choice. Given the multi-variate nature of the assessment, which may include criteria (e.g. fairness) not related to predictive accuracy, this step is often not straightforward and suffers from the lack of existing tools. For instance, it is common to make use of a tabular presentation of the solutions, which provide little information about the trade-offs and the relations between criteria over the set of solutions. This paper proposes an original methodology to support data scientists in the interpretation and comparison of the MOFS outcome by combining post-processing and visualisation of the set of solutions. The methodology supports the data scientist in the selection of an optimal feature subset by providing her with high-level information at three different levels: objectives, solutions, and individual features. The methodology is experimentally assessed on two feature selection tasks adopting a GA-based MOFS with six objectives (number of selected features, balanced accuracy, F1-Score, variance inflation factor, statistical parity, and equalised odds). The results show the added value of the methodology in the selection of the final subset of features.

$\mathbb{Z}_2\times \mathbb{Z}_2$ Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks

  • paper_url: http://arxiv.org/abs/2311.18744
  • repo_url: https://github.com/zhongtiand/eqnn
  • paper_authors: Zhongtian Dong, Marçal Comajoan Cara, Gopal Ramesh Dahale, Roy T. Forestano, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu
  • for: This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN) against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN).
  • methods: The performance of each network is evaluated on two toy binary classification examples, as a function of model complexity (measured by the number of parameters) and training set size.
  • results: The results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.
    Abstract This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNN) and Quantum Neural Networks (QNN), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENN) and Deep Neural Networks (DNN). We evaluate the performance of each network with two toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training data set. Our results show that the $\mathbb{Z}_2\times \mathbb{Z}_2$ EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.

Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2311.18735
  • repo_url: None
  • paper_authors: Suman Sapkota, Binod Bhattarai
  • for: To identify the commonalities and differences across neural architectures such as CNNs, Transformers, and MLP-Mixers, which the authors interpret through the general concept of dimension mixing.
  • methods: The study examines group-wise sparse, non-linear, multi-layered, and learnable mixing schemes, including butterfly structures with MLP mixing functions (Butterfly MLP) and mixing along the sequence dimension for Transformer-based architectures (Butterfly Attention).
  • results: Experiments on CIFAR and LRA datasets show that the proposed non-linear butterfly mixers are efficient and scale well when the host architectures are used as mixing functions; the authors also propose a Patch-Only MLP-Mixer for processing 2D spatial signals, demonstrating a different dimension-mixing strategy.
    Abstract The recent success of multiple neural architectures like CNNs, Transformers, and MLP-Mixers motivated us to look for similarities and differences between them. We found that these architectures can be interpreted through the lens of a general concept of dimension mixing. Research on coupling flows and the butterfly transform shows that partial and hierarchical signal mixing schemes are sufficient for efficient and expressive function approximation. In this work, we study group-wise sparse, non-linear, multi-layered and learnable mixing schemes of inputs and find that they are complementary to many standard neural architectures. Following our observations and drawing inspiration from the Fast Fourier Transform, we generalize Butterfly Structure to use non-linear mixer function allowing for MLP as mixing function called Butterfly MLP. We were also able to mix along sequence dimension for Transformer-based architectures called Butterfly Attention. Experiments on CIFAR and LRA datasets demonstrate that the proposed Non-Linear Butterfly Mixers are efficient and scale well when the host architectures are used as mixing function. Additionally, we propose Patch-Only MLP-Mixer for processing spatial 2D signals demonstrating a different dimension mixing strategy.
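A simplified sketch of butterfly-style non-linear mixing: split the feature dimension into small blocks, mix within each block with a tiny MLP, and permute between stages so that every input reaches every output after a few stages. The parameterization is an illustrative assumption, not the paper's exact Butterfly MLP.

```python
import torch
import torch.nn as nn

class ButterflyMLPMixer(nn.Module):
    """Sparse non-linear mixing: per-stage block-wise MLPs plus a stride
    permutation; with block size b, about log_b(dim) stages give full
    connectivity between all features."""
    def __init__(self, dim=16, block=4):
        super().__init__()
        assert dim % block == 0
        self.dim, self.block = dim, block
        stages = 1
        while block ** stages < dim:
            stages += 1
        self.mixers = nn.ModuleList(
            nn.Sequential(nn.Linear(block, block), nn.GELU(), nn.Linear(block, block))
            for _ in range(stages))

    def forward(self, x):                          # x: (batch, dim)
        b = self.block
        for mlp in self.mixers:
            x = mlp(x.view(-1, self.dim // b, b)).view(-1, self.dim)          # mix within blocks
            x = x.view(-1, self.dim // b, b).transpose(1, 2).reshape(-1, self.dim)  # stride permutation
        return x

print(ButterflyMLPMixer()(torch.randn(2, 16)).shape)   # torch.Size([2, 16])
```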

Indoor Millimeter Wave Localization using Multiple Self-Supervised Tiny Neural Networks

  • paper_url: http://arxiv.org/abs/2311.18732
  • repo_url: None
  • paper_authors: Anish Shastri, Andres Garcia-Saavedra, Paolo Casari
  • for: This work addresses the localization of a mobile millimeter-wave client in a large indoor environment.
  • methods: Localization uses multilayer perceptron neural networks (NNs). Instead of training and deploying a single deep model, the authors choose among multiple tiny NNs trained in a self-supervised manner, with two switching schemes to select the best NN: one based on a Kalman filter and one based on the statistical distribution of the training data.
  • results: Simulations show that the approach outperforms both geometric localization schemes and the use of a single NN.
    Abstract We consider the localization of a mobile millimeter-wave client in a large indoor environment using multilayer perceptron neural networks (NNs). Instead of training and deploying a single deep model, we proceed by choosing among multiple tiny NNs trained in a self-supervised manner. The main challenge then becomes to determine and switch to the best NN among the available ones, as an incorrect NN will fail to localize the client. In order to upkeep the localization accuracy, we propose two switching schemes: one based on a Kalman filter, and one based on the statistical distribution of the training data. We analyze the proposed schemes via simulations, showing that our approach outperforms both geometric localization schemes and the use of a single NN.
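The Kalman-filter-based switching can be pictured as choosing the tiny NN whose sub-area is closest to the filter's predicted position; the state layout and region centres below are assumptions, and the alternative distribution-based scheme is not shown.

```python
import numpy as np

def switch_nn(kf_state, region_centers):
    """Pick the tiny NN whose sub-area centre is nearest to the Kalman-filter
    position prediction; region centres are assumed known from each NN's
    training data."""
    pred_xy = kf_state[:2]                                    # [x, y, vx, vy] state, position part
    return int(np.argmin(np.linalg.norm(region_centers - pred_xy, axis=1)))

centers = np.array([[2.0, 3.0], [10.0, 3.0], [6.0, 9.0]])     # hypothetical sub-area centres (metres)
print(switch_nn(np.array([9.2, 2.7, 0.1, 0.0]), centers))     # -> 1
```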

AI in Pharma for Personalized Sequential Decision-Making: Methods, Applications and Opportunities

  • paper_url: http://arxiv.org/abs/2311.18725
  • repo_url: None
  • paper_authors: Yuhan Li, Hongtao Zhang, Keaven Anderson, Songzi Li, Ruoqing Zhu
  • for: The paper is written to provide a review of the applications of artificial intelligence (AI) in the pharmaceutical industry, specifically in drug discovery and development, as well as regulatory submissions.
  • methods: The paper uses case studies to illustrate the key applications of AI in drug development, including protein structure prediction, success probability estimation, subgroup identification, and AI-assisted clinical trial monitoring.
  • results: The paper highlights the increasing trend of incorporating AI components in regulatory submissions, with oncology, psychiatry, gastroenterology, and neurology being the most prevalent therapeutic areas leveraging AI. The paper also discusses the paradigm shift towards personalized or precision medicine, which has had a transformative impact on the pharmaceutical industry.
    Abstract In the pharmaceutical industry, the use of artificial intelligence (AI) has seen consistent growth over the past decade. This rise is attributed to major advancements in statistical machine learning methodologies, computational capabilities and the increased availability of large datasets. AI techniques are applied throughout different stages of drug development, ranging from drug discovery to post-marketing benefit-risk assessment. Kolluri et al. provided a review of several case studies that span these stages, featuring key applications such as protein structure prediction, success probability estimation, subgroup identification, and AI-assisted clinical trial monitoring. From a regulatory standpoint, there was a notable uptick in submissions incorporating AI components in 2021. The most prevalent therapeutic areas leveraging AI were oncology (27%), psychiatry (15%), gastroenterology (12%), and neurology (11%). The paradigm of personalized or precision medicine has gained significant traction in recent research, partly due to advancements in AI techniques \cite{hamburg2010path}. This shift has had a transformative impact on the pharmaceutical industry. Departing from the traditional "one-size-fits-all" model, personalized medicine incorporates various individual factors, such as environmental conditions, lifestyle choices, and health histories, to formulate customized treatment plans. By utilizing sophisticated machine learning algorithms, clinicians and researchers are better equipped to make informed decisions in areas such as disease prevention, diagnosis, and treatment selection, thereby optimizing health outcomes for each individual.

Steering Deep Feature Learning with Backward Aligned Feature Updates

  • paper_url: http://arxiv.org/abs/2311.18718
  • repo_url: https://github.com/lchizat/2023-bafu
  • paper_authors: Lénaïc Chizat, Praneeth Netrapalli
  • for: 这篇论文的目的是提出一种预测、测量和控制深度学习中特征学习行为的方法。
  • methods: 这篇论文利用特征更新与反向传播之间的对齐(alignment)来预测、测量和控制特征学习行为。
  • results: 结果表明,当对齐成立时,一步 SGD 后特征更新的幅度与前向和反向传播的幅度之间满足一个简单而普遍的公式。由此得到了在初始化及整个训练过程中自动调整超参数(初始化尺度和学习率)以达到目标特征学习行为的技术。
    Abstract Deep learning succeeds by doing hierarchical feature learning, yet tuning Hyper-Parameters (HP) such as initialization scales, learning rates etc., only give indirect control over this behavior. In this paper, we propose the alignment between the feature updates and the backward pass as a key notion to predict, measure and control feature learning. On the one hand, we show that when alignment holds, the magnitude of feature updates after one SGD step is related to the magnitude of the forward and backward passes by a simple and general formula. This leads to techniques to automatically adjust HPs (initialization scales and learning rates) at initialization and throughout training to attain a desired feature learning behavior. On the other hand, we show that, at random initialization, this alignment is determined by the spectrum of a certain kernel, and that well-conditioned layer-to-layer Jacobians (aka dynamical isometry) implies alignment. Finally, we investigate ReLU MLPs and ResNets in the large width-then-depth limit. Combining hints from random matrix theory and numerical experiments, we show that (i) in MLP with iid initializations, alignment degenerates with depth, making it impossible to start training, and that (ii) in ResNets, the branch scale $1/\sqrt{\text{depth}}$ is the only one maintaining non-trivial alignment at infinite depth.
    摘要 深度学习的成功源于层次化的特征学习,但调整超参数(如初始化尺度、学习率等)只能间接控制这种行为。本文提出以特征更新与反向传播之间的对齐作为预测、测量和控制特征学习的关键概念。一方面,当对齐成立时,一步 SGD 后特征更新的幅度与前向和反向传播的幅度之间满足一个简单而普遍的公式,由此可以在初始化及整个训练过程中自动调整超参数(初始化尺度和学习率)以达到目标特征学习行为。另一方面,在随机初始化下,这种对齐由某个核的谱决定,而条件良好的层间雅可比矩阵(即动力学等距,dynamical isometry)意味着对齐成立。最后,我们研究了先取宽度极限、再取深度极限下的 ReLU MLP 和 ResNet。结合随机矩阵理论的启示和数值实验,我们表明:(i)在独立同分布初始化的 MLP 中,对齐随深度退化,导致训练无法启动;(ii)在 ResNet 中,分支缩放 $1/\sqrt{\text{depth}}$ 是在无穷深度下唯一能保持非平凡对齐的选择。

DeepEn2023: Energy Datasets for Edge Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2312.00103
  • repo_url: None
  • paper_authors: Xiaolong Tu, Anik Mallik, Haoxin Wang, Jiang Xie
  • for: 这篇论文的目的是提出一个大规模的能源数据集,以便测试和优化边缘AI系统的能源效率。
  • methods: 这篇论文使用了大规模的能源数据集,并对边缘AI系统的各种核心和常用深度学习模型进行了测试和分析。
  • results: 这篇论文提出了一个名为DeepEn2023的大规模能源数据集,可以用于测试和优化边缘AI系统的能源效率。
    Abstract Climate change poses one of the most significant challenges to humanity. As a result of these climatic changes, the frequency of weather, climate, and water-related disasters has multiplied fivefold over the past 50 years, resulting in over 2 million deaths and losses exceeding $3.64 trillion USD. Leveraging AI-powered technologies for sustainable development and combating climate change is a promising avenue. Numerous significant publications are dedicated to using AI to improve renewable energy forecasting, enhance waste management, and monitor environmental changes in real time. However, very few research studies focus on making AI itself environmentally sustainable. This oversight regarding the sustainability of AI within the field might be attributed to a mindset gap and the absence of comprehensive energy datasets. In addition, with the ubiquity of edge AI systems and applications, especially on-device learning, there is a pressing need to measure, analyze, and optimize their environmental sustainability, such as energy efficiency. To this end, in this paper, we propose large-scale energy datasets for edge AI, named DeepEn2023, covering a wide range of kernels, state-of-the-art deep neural network models, and popular edge AI applications. We anticipate that DeepEn2023 will improve transparency in sustainability in on-device deep learning across a range of edge AI systems and applications. For more information, including access to the dataset and code, please visit https://amai-gsu.github.io/DeepEn2023.
    摘要 人类面临着气候变化的一个最大挑战。过去50年,气候变化导致天气、气候和水灾害的频率增加五倍,造成200万人死亡和经济损失超过3.64万亿美元。利用人工智能技术实现可持续发展和气候变化控制是一个有前途的方向。许多研究论文探讨了使用人工智能提高可再生能源预测、改善废物管理和实时环境监测等方面。然而,很少研究关注人工智能本身的可持续性。这可能由于知识 gap和缺乏完整的能源数据集所致。此外,随着边缘AI系统和应用的普及,特别是在设备学习上,有必要测量、分析和优化边缘AI的环境可持续性,如能效率。为此,在本文中,我们提出了大规模能源数据集,名为DeepEn2023,覆盖了各种核心、当今最佳深度神经网络模型和流行的边缘AI应用。我们预计DeepEn2023将改善边缘深度学习的透明度,从而提高边缘AI系统和应用的可持续性。如果您想了解更多信息,包括数据集和代码访问,请访问https://amai-gsu.github.io/DeepEn2023。

Balancing Summarization and Change Detection in Graph Streams

  • paper_url: http://arxiv.org/abs/2311.18694
  • repo_url: https://github.com/s-fuku/bsc
  • paper_authors: Shintaro Fukushima, Kenji Yamanishi
  • for: 本研究旨在解决图 summarization 和图变化探测之间的平衡问题。
  • methods: 本研究从变化探测的角度处理该问题,利用摘要图流来探测统计上显著的变化。
  • results: 我们提出了一种新的量化方法来权衡压缩率与变化探测准确性之间的折衷,从而同时实现可靠的图 summarization 和变化探测。我们在合成和真实数据上进行了实证验证,证明了其有效性。
    Abstract This study addresses the issue of balancing graph summarization and graph change detection. Graph summarization compresses large-scale graphs into a smaller scale. However, the question remains: To what extent should the original graph be compressed? This problem is solved from the perspective of graph change detection, aiming to detect statistically significant changes using a stream of summary graphs. If the compression rate is extremely high, important changes can be ignored, whereas if the compression rate is extremely low, false alarms may increase with more memory. This implies that there is a trade-off between compression rate in graph summarization and accuracy in change detection. We propose a novel quantitative methodology to balance this trade-off to simultaneously realize reliable graph summarization and change detection. We introduce a probabilistic structure of hierarchical latent variable model into a graph, thereby designing a parameterized summary graph on the basis of the minimum description length principle. The parameter specifying the summary graph is then optimized so that the accuracy of change detection is guaranteed to suppress Type I error probability (probability of raising false alarms) to be less than a given confidence level. First, we provide a theoretical framework for connecting graph summarization with change detection. Then, we empirically demonstrate its effectiveness on synthetic and real datasets.

Handling Cost and Constraints with Off-Policy Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.18684
  • repo_url: None
  • paper_authors: Jared Markowitz, Jesse Silverberg, Gary Collins
  • for: 该论文研究在奖励同时包含激励项与成本项("混合符号"回报)的环境中,如何利用离策略(off-policy)深度强化学习在保证样本效率的同时处理成本与约束。
  • methods: 该类方法的策略改进步骤在选定批数据上最大化所学的状态-动作($Q$)值函数,并配合正则化来抑制 $Q$ 值过高估计;本文分析了这一做法在混合符号回报下的问题,并提出不在策略更新中显式最大化 $Q$ 的新型离策略 actor-critic 方法(含无约束与带约束两种形式)。
  • results: 在连续动作空间和混合符号回报环境中,所提方法持续且显著优于配合周期性重置的最新方法;同时在不含混合符号回报的常见控制问题上,其表现与流行方法相当且更为稳定可靠。
    Abstract By reusing data throughout training, off-policy deep reinforcement learning algorithms offer improved sample efficiency relative to on-policy approaches. For continuous action spaces, the most popular methods for off-policy learning include policy improvement steps where a learned state-action ($Q$) value function is maximized over selected batches of data. These updates are often paired with regularization to combat associated overestimation of $Q$ values. With an eye toward safety, we revisit this strategy in environments with "mixed-sign" reward functions; that is, with reward functions that include independent positive (incentive) and negative (cost) terms. This setting is common in real-world applications, and may be addressed with or without constraints on the cost terms. We find the combination of function approximation and a term that maximizes $Q$ in the policy update to be problematic in such environments, because systematic errors in value estimation impact the contributions from the competing terms asymmetrically. This results in overemphasis of either incentives or costs and may severely limit learning. We explore two remedies to this issue. First, consistent with prior work, we find that periodic resetting of $Q$ and policy networks can be used to reduce value estimation error and improve learning in this setting. Second, we formulate novel off-policy actor-critic methods for both unconstrained and constrained learning that do not explicitly maximize $Q$ in the policy update. We find that this second approach, when applied to continuous action spaces with mixed-sign rewards, consistently and significantly outperforms state-of-the-art methods augmented by resetting. We further find that our approach produces agents that are both competitive with popular methods overall and more reliably competent on frequently-studied control problems that do not have mixed-sign rewards.
    摘要 离策略深度强化学习算法通过在训练过程中重复利用数据,相比在策略(on-policy)方法具有更高的样本效率。对于连续动作空间,最流行的离策略方法在策略改进步骤中对选定批数据最大化所学的状态-动作($Q$)值函数,并通常配合正则化来抑制相应的 $Q$ 值过高估计。出于安全性的考虑,我们在"混合符号"回报函数(即同时包含独立的正向激励项和负向成本项)的环境中重新审视这一策略;这类设定在实际应用中很常见,并且可以带约束或不带约束地处理成本项。我们发现,在这类环境中,函数逼近与策略更新中最大化 $Q$ 的组合存在问题:价值估计中的系统性误差会不对称地影响两类竞争项的贡献,导致过度强调激励或成本,从而严重限制学习。我们探讨了两种补救方案:其一,与已有工作一致,周期性地重置 $Q$ 网络和策略网络可以降低价值估计误差并改善学习;其二,我们提出了不在策略更新中显式最大化 $Q$ 的新型离策略 actor-critic 方法,适用于无约束和带约束两种学习设定。实验表明,第二种方法在具有混合符号回报的连续动作空间中持续且显著优于配合重置的最新方法,同时在不含混合符号回报的常见控制问题上也与流行方法相当且更为稳定可靠。
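
To make the policy-improvement step discussed above concrete, here is a minimal PyTorch sketch of a generic actor update that maximizes a learned $Q$ function over a batch of states. This is a DDPG/SAC-style illustration with made-up dimensions, not the authors' proposed method (which deliberately avoids this explicit maximization):

```python
# Sketch: "maximize Q in the policy update" for continuous actions.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2  # assumed toy dimensions

q_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

states = torch.randn(256, state_dim)  # stand-in for a batch sampled from a replay buffer

# Policy improvement: ascend the learned Q with respect to the actor's parameters.
q_values = q_net(torch.cat([states, actor(states)], dim=-1))
actor_loss = -q_values.mean()  # maximizing Q == minimizing -Q

actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

# With mixed-sign rewards, Q aggregates an incentive term and a cost term; systematic
# over/under-estimation of either term biases this objective asymmetrically, which is the
# failure mode the paper analyzes and sidesteps by not maximizing Q directly.
```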

A Comparison Between Invariant and Equivariant Classical and Quantum Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.18672
  • repo_url: https://github.com/royforestano/2023_gsoc_ml4sci_qmlhep_gnn
  • paper_authors: Roy T. Forestano, Marçal Comajoan Cara, Gopal Ramesh Dahale, Zhongtian Dong, Sergei Gleyzer, Daniel Justice, Kyoungchul Kong, Tom Magorsch, Konstantin T. Matchev, Katia Matcheva, Eyup B. Unlu
  • for: 这篇论文主要是用于比较 классических图 neural network (GNN) 和等变图 neural network (EGNN) 与其量子版本:量子图 neural network (QGNN) 和等变量量子图 neural network (EQGNN) 的性能。
  • methods: 这篇论文使用了高能物理实验数据,使用 graph neural network (GNN) 和等变量 GNN (EGNN) 进行分类任务。
  • results: 根据 AUC 分数,量子网络表现较好,但是在实际应用中,量子技术的发展和相关 API 的提供可能需要等待。
    Abstract Machine learning algorithms are heavily relied on to understand the vast amounts of data from high-energy particle collisions at the CERN Large Hadron Collider (LHC). The data from such collision events can naturally be represented with graph structures. Therefore, deep geometric methods, such as graph neural networks (GNNs), have been leveraged for various data analysis tasks in high-energy physics. One typical task is jet tagging, where jets are viewed as point clouds with distinct features and edge connections between their constituent particles. The increasing size and complexity of the LHC particle datasets, as well as the computational models used for their analysis, greatly motivate the development of alternative fast and efficient computational paradigms such as quantum computation. In addition, to enhance the validity and robustness of deep networks, one can leverage the fundamental symmetries present in the data through the use of invariant inputs and equivariant layers. In this paper, we perform a fair and comprehensive comparison between classical graph neural networks (GNNs) and equivariant graph neural networks (EGNNs) and their quantum counterparts: quantum graph neural networks (QGNNs) and equivariant quantum graph neural networks (EQGNN). The four architectures were benchmarked on a binary classification task to classify the parton-level particle initiating the jet. Based on their AUC scores, the quantum networks were shown to outperform the classical networks. However, seeing the computational advantage of the quantum networks in practice may have to wait for the further development of quantum technology and its associated APIs.
    摘要 机器学习算法在高能物理研究中发挥重要作用,用于理解来自欧洲核子研究中心大型强子对撞机(LHC)的海量对撞数据。这些数据天然可以用图结构表示,因此图神经网络(GNN)等深度几何方法在高能物理数据分析中得到了广泛应用。一个典型任务是喷注标记(jet tagging):喷注被视为点云,其组成粒子带有各自的特征并通过边相互连接。随着LHC粒子数据集规模与复杂度的增加,以及分析所用计算模型的日益庞大,亟需发展更快速、更高效的替代计算范式,例如量子计算。此外,为提高深度网络的有效性和稳健性,还可以通过使用不变输入和等变层来利用数据中固有的基本对称性。在这篇论文中,我们对经典图神经网络(GNN)、等变图神经网络(EGNN)及其量子对应物——量子图神经网络(QGNN)和等变量子图神经网络(EQGNN)——进行了公平而全面的比较。这四种架构在一个二分类任务上进行了基准测试,用于判别引发喷注的部分子(parton)类型。根据 AUC 分数,量子网络的表现优于经典网络。然而,要在实践中看到量子网络的计算优势,可能还需等待量子技术及其相关 API 的进一步发展。

Targeted Reduction of Causal Models

  • paper_url: http://arxiv.org/abs/2311.18639
  • repo_url: None
  • paper_authors: Armin Kekić, Bernhard Schölkopf, Michel Besserve
  • for: 本研究旨在帮助科学家将复杂模型简化为能够解释特定目标现象的一组简明因果因子。
  • methods: 本研究提出了 Targeted Causal Reduction (TCR) 方法,基于一个信息论目标函数,并给出了从干预数据或仿真中高效学习 TCR 的优化算法。
  • results: 在玩具系统和机械系统上的实验表明,TCR 能够从复杂模型中提取可解释的高层因果解释,显示了其在多学科复杂现象研究中辅助科学家的潜力。
    Abstract Why does a phenomenon occur? Addressing this question is central to most scientific inquiries based on empirical observations, and often heavily relies on simulations of scientific models. As models become more intricate, deciphering the causes behind these phenomena in high-dimensional spaces of interconnected variables becomes increasingly challenging. Causal machine learning may assist scientists in the discovery of relevant and interpretable patterns of causation in simulations. We introduce Targeted Causal Reduction (TCR), a method for turning complex models into a concise set of causal factors that explain a specific target phenomenon. We derive an information theoretic objective to learn TCR from interventional data or simulations and propose algorithms to optimize this objective efficiently. TCR's ability to generate interpretable high-level explanations from complex models is demonstrated on toy and mechanical systems, illustrating its potential to assist scientists in the study of complex phenomena in a broad range of disciplines.
    摘要 为什么某种现象发生呢?解决这个问题是科学观察的基础问题,通常通过模型的仿真来进行研究。随着模型的复杂化,找出这些现象的原因在高维空间中变得越来越困难。 causal machine learning可以帮助科学家在模型中找到有关的和可解释的Patterns of causation。我们提出了Targeted Causal Reduction(TCR)方法,它可以将复杂的模型转化为一个简洁的 causal factor的集合,用于解释特定的target现象。我们 derive了一个信息论目标函数,用于学习TCR从 intervenational data或仿真中,并提出了一些算法来效率地优化这个目标函数。TCR能够从复杂的模型中提取出高级别的可解释结果,这种能力在 Toy和机械系统中得到了证明,这表明TCR可以帮助科学家在各种领域中研究复杂现象。

Online Influence Maximization: Concept and Algorithm

  • paper_url: http://arxiv.org/abs/2312.00099
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Jianxiong Guo
  • for: The paper provides an overview of the Online Influence Maximization (IM) problem, covering both theoretical aspects and practical applications.
  • methods: The paper discusses Offline IM algorithms, including traditional approximation or heuristic algorithms and ML-based algorithms, and introduces a standard definition of the Online IM problem and a basic Combinatorial Multi-Armed Bandit (CMAB) framework, CMAB-T.
  • results: The paper covers almost all Online IM algorithms up to now, focusing on their characteristics and theoretical guarantees for different feedback types, and provides regret bounds for their working principles. Additionally, the paper collects innovative ideas about problem definition and algorithm designs, and outlines prospective research directions from four distinct perspectives.
    Abstract In this survey, we offer an extensive overview of the Online Influence Maximization (IM) problem by covering both theoretical aspects and practical applications. For the integrity of the article and because the online algorithm takes an offline oracle as a subroutine, we first make a clear definition of the Offline IM problem and summarize those commonly used Offline IM algorithms, which include traditional approximation or heuristic algorithms and ML-based algorithms. Then, we give a standard definition of the Online IM problem and a basic Combinatorial Multi-Armed Bandit (CMAB) framework, CMAB-T. Here, we summarize three types of feedback in the CMAB model and discuss in detail how to study the Online IM problem based on the CMAB-T model. This paves the way for solving the Online IM problem by using online learning methods. Furthermore, we have covered almost all Online IM algorithms up to now, focusing on characteristics and theoretical guarantees of online algorithms for different feedback types. Here, we elaborately explain their working principle and how to obtain regret bounds. Besides, we also collect plenty of innovative ideas about problem definition and algorithm designs and pioneering works for variants of the Online IM problem and their corresponding algorithms. Finally, we encapsulate current challenges and outline prospective research directions from four distinct perspectives.
    摘要 在这份调查中,我们提供了在线影响最大化(IM)问题的广泛概述,涵盖了理论方面和实践应用。为保持文章的完整性和在线算法使用了离线评估器,我们首先明确定了离线IM问题的定义,并总结了通常使用的离线IM算法,包括传统的 Approximation 或 Heuristic 算法和 ML-based 算法。然后,我们给出了标准的在线IM问题的定义和基本的 Combinatorial Multi-Armed Bandit(CMAB)框架,CMAB-T。在这里,我们总结了 CMAB 模型中的三种反馈,并详细讲述了如何通过 CMAB-T 模型来研究在线IM问题。这为使用在线学习方法解决在线IM问题提供了基础。此外,我们已经覆盖了大多数在线IM算法,重点介绍它们的特点和对不同反馈类型的理论保证。此外,我们还收集了许多创新的问题定义和算法设计的想法,以及对变体在线IM问题的相关算法的先锋工作。最后,我们总结了当前的挑战和未来研究方向的四个不同视角。

Optimizing ZX-Diagrams with Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.18588
  • repo_url: https://github.com/maxnaeg/zxreinforce
  • paper_authors: Maximilian Nägele, Florian Marquardt
  • for: 本文将 ZX-diagram 与强化学习相结合,探索优化 ZX-diagram 结构的方法。
  • methods: 本文使用强化学习算法来寻找优化 ZX-diagram 结构的最佳局部变换规则序列,并用图神经网络编码智能体的策略。
  • results: 与贪心策略、模拟退火等其他优化方法相比,训练后的强化学习智能体能够显著更好地优化 ZX-diagram 结构,并且可以泛化到比训练时所见大得多的图。
    Abstract ZX-diagrams are a powerful graphical language for the description of quantum processes with applications in fundamental quantum mechanics, quantum circuit optimization, tensor network simulation, and many more. The utility of ZX-diagrams relies on a set of local transformation rules that can be applied to them without changing the underlying quantum process they describe. These rules can be exploited to optimize the structure of ZX-diagrams for a range of applications. However, finding an optimal sequence of transformation rules is generally an open problem. In this work, we bring together ZX-diagrams with reinforcement learning, a machine learning technique designed to discover an optimal sequence of actions in a decision-making problem and show that a trained reinforcement learning agent can significantly outperform other optimization techniques like a greedy strategy or simulated annealing. The use of graph neural networks to encode the policy of the agent enables generalization to diagrams much bigger than seen during the training phase.
    摘要 ZX-图表是一种强大的图形语言,用于描述量子过程,在基础量子力学、量子电路优化、张量网络模拟等多个领域都有应用。ZX-图表的实用性基于一组局部变换规则,这些规则可以在不改变其所描述的量子过程的前提下应用,并可被用来针对各种应用优化 ZX-图表的结构。然而,找到最优的变换规则序列通常仍是一个开放问题。在这项工作中,我们将 ZX-图表与强化学习相结合——强化学习是一种用于在决策问题中发现最优动作序列的机器学习技术——并证明训练后的强化学习智能体可以显著超越贪心策略或模拟退火等其他优化技术。使用图神经网络来编码智能体的策略,使其能够泛化到远大于训练阶段所见的图表。

Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations

  • paper_url: http://arxiv.org/abs/2311.18575
  • repo_url: None
  • paper_authors: Yuli Slavutsky, Yuval Benjamini
  • for: 这篇论文研究训练数据与部署数据之间的分布偏移,特别是类别分布偏移对零样本(zero-shot)分类器的影响。
  • methods: 作者提出了一种学习对类别分布偏移具有鲁棒性的数据表示的算法,该方法结合了层次数据采样与分布外泛化技术。
  • results: 作者通过模拟实验和真实数据表明,该方法提高了零样本验证任务在多样类别分布下的泛化能力。
    Abstract Distribution shifts between training and deployment data often affect the performance of machine learning models. In this paper, we explore a setting where a hidden variable induces a shift in the distribution of classes. These distribution shifts are particularly challenging for zero-shot classifiers, as they rely on representations learned from training classes, but are deployed on new, unseen ones. We introduce an algorithm to learn data representations that are robust to such class distribution shifts in zero-shot verification tasks. We show that our approach, which combines hierarchical data sampling with out-of-distribution generalization techniques, improves generalization to diverse class distributions in both simulations and real-world datasets.
    摘要 发布分布的变化通常会影响机器学习模型的性能。在这篇论文中,我们研究一种隐藏变量导致类分布的变化的情况。这种分布变化对零实例分类器来说特别困难,因为它们基于训练类的表示学习,但是在新、未经见过的类上进行验证。我们提出了一种算法,将数据表示学习到类分布的变化,并在零实例验证任务中提高了类分布多样性的普适性。我们在 simulations 和实际数据集中证明了我们的方法的有效性。

Multi-scale Iterative Refinement towards Robust and Versatile Molecular Docking

  • paper_url: http://arxiv.org/abs/2311.18574
  • repo_url: None
  • paper_authors: Jiaxian Yan, Zaixi Zhang, Kai Zhang, Qi Liu
  • for: 预测小分子与蛋白质靶点的结合构象,作为新药设计中的基础计算工具。
  • methods: 提出 DeltaDock 框架,包括基于大型蛋白质模型与图神经网络的配体依赖结合位点预测模型、GPU 加速的采样算法,以及多尺度迭代细化模块。
  • results: 与基线方法相比,DeltaDock 在盲对接和位点特异性对接两种设置下均表现出色,并在不同场景中展现出良好的泛化能力和可靠性。
    Abstract Molecular docking is a key computational tool utilized to predict the binding conformations of small molecules to protein targets, which is fundamental in the design of novel drugs. Despite recent advancements in geometric deep learning-based approaches leading to improvements in blind docking efficiency, these methods have encountered notable challenges, such as limited generalization performance on unseen proteins, the inability to concurrently address the settings of blind docking and site-specific docking, and the frequent occurrence of physical implausibilities such as inter-molecular steric clash. In this study, we introduce DeltaDock, a robust and versatile framework designed for efficient molecular docking to overcome these challenges. DeltaDock operates in a two-step process: rapid initial complex structures sampling followed by multi-scale iterative refinement of the initial structures. In the initial stage, to sample accurate structures with high efficiency, we develop a ligand-dependent binding site prediction model founded on large protein models and graph neural networks. This model is then paired with GPU-accelerated sampling algorithms. The sampled structures are updated using a multi-scale iterative refinement module that captures both protein-ligand atom-atom interactions and residue-atom interactions in the following stage. Distinct from previous geometric deep learning methods that are conditioned on the blind docking setting, DeltaDock demonstrates superior performance in both blind docking and site-specific docking settings. Comprehensive experimental results reveal that DeltaDock consistently surpasses baseline methods in terms of docking accuracy. Furthermore, it displays remarkable generalization capabilities and proficiency for predicting physically valid structures, thereby attesting to its robustness and reliability in various scenarios.
    摘要 分子对接是一种关键的计算工具,用于预测小分子与蛋白质目标之间的绑定结构,这对新药设计非常重要。尽管最近的几何深度学习方法在盲目对接效率方面有所改进,但这些方法却遇到了一些挑战,例如对未见过蛋白质的泛化性能不佳、同时不能同时解决盲目对接和特定对接的问题,以及常见的物理不可能现象如分子间静电冲击。在本研究中,我们介绍了DeltaDock,一种可靠和多功能的框架,用于高效地进行分子对接。DeltaDock采用两步进行方法:首先,使用ligand-dependent binding site预测模型和GPU加速的抽象算法进行快速初始结构采样;其次,使用多尺度迭代优化模块来更新和更加精准地修正初始结构。与前一些基于盲目对接的几何深度学习方法不同,DeltaDock在盲目对接和特定对接 Setting下都达到了更高的性能。经过全面的实验研究,我们发现DeltaDock在对接精度方面一直保持领先,同时也表现出了Remarkable的泛化能力和适用性。

Learning Radio Environments by Differentiable Ray Tracing

  • paper_url: http://arxiv.org/abs/2311.18558
  • repo_url: None
  • paper_authors: Jakob Hoydis, Fayçal Aït Aoudia, Sebastian Cammerer, Florian Euchner, Merlin Nimier-David, Stephan ten Brink, Alexander Keller
  • for: 该论文是为了提高6G研究中的射线追踪技术,以生成具有具体场景和环境特征的通道响应函数(CIR)。
  • methods: 该论文提出了一种新的基于梯度的校准方法,利用通道测量来确定材料特性。该方法对材料属性、散射和天线方向图采用可微分参数化,并与可微分射线追踪器相结合,以计算通道冲激响应对这些参数的导数。
  • results: 该论文通过使用 both synthetic data和实际的indoor通道测量数据进行验证,并证明了其方法的可靠性和精度。
    Abstract Ray tracing (RT) is instrumental in 6G research in order to generate spatially-consistent and environment-specific channel impulse responses (CIRs). While acquiring accurate scene geometries is now relatively straightforward, determining material characteristics requires precise calibration using channel measurements. We therefore introduce a novel gradient-based calibration method, complemented by differentiable parametrizations of material properties, scattering and antenna patterns. Our method seamlessly integrates with differentiable ray tracers that enable the computation of derivatives of CIRs with respect to these parameters. Essentially, we approach field computation as a large computational graph wherein parameters are trainable akin to weights of a neural network (NN). We have validated our method using both synthetic data and real-world indoor channel measurements, employing a distributed multiple-input multiple-output (MIMO) channel sounder.
    摘要 射线追踪(RT)在6G研究中发挥重要作用,以生成具有空间相同和环境特定的通道响应函数(CIR)。虽然获得准确的场景几何结构已经变得相对容易,但确定材料特性则需要精准的准备使用通道测量。因此,我们介绍了一种新的梯度基于的准备方法,并且通过分别表示材料性质、散射和天线Pattern的可微分函数来补充。我们的方法可以与可微分射线追踪器结合,以计算响应函数的导数相对于这些参数。简单来说,我们将场 computation viewed as a large computational graph,其中参数可以与神经网络(NN)中的权重相似地训练。我们已经验证了我们的方法使用 both synthetic data和实际的indoor通道测量数据,使用分布式多输入多出力(MIMO)通道测量仪。

Can semi-supervised learning use all the data effectively? A lower bound perspective

  • paper_url: http://arxiv.org/abs/2311.18557
  • repo_url: None
  • paper_authors: Alexandru Ţifrea, Gizem Yüce, Amartya Sanyal, Fanny Yang
  • for: 本文探讨了 semi-supervised learning(SSL)算法是否可以同时超越无监督学习(UL)和监督学习(SL)算法。
  • methods: 本文针对二分量高斯混合模型推导了一个紧的下界,并证明在这些分布上,任何 SSL 算法都无法改进 SL 或 UL 算法的极小极大最优统计误差率。
  • results: 然而,实验结果表明 SSL 算法仍可以在实际数据上超越 UL 和 SL 算法。这表明,虽然可以证明 SSL 算法的性能提升,但需要仔细跟踪常数。
    Abstract Prior works have shown that semi-supervised learning algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning (SL) algorithms. However, existing theoretical analyses focus on regimes where the unlabeled data is sufficient to learn a good decision boundary using unsupervised learning (UL) alone. This begs the question: Can SSL algorithms simultaneously improve upon both UL and SL? To this end, we derive a tight lower bound for 2-Gaussian mixture models that explicitly depends on the labeled and the unlabeled dataset size as well as the signal-to-noise ratio of the mixture distribution. Surprisingly, our result implies that no SSL algorithm can improve upon the minimax-optimal statistical error rates of SL or UL algorithms for these distributions. Nevertheless, we show empirically on real-world data that SSL algorithms can still outperform UL and SL methods. Therefore, our work suggests that, while proving performance gains for SSL algorithms is possible, it requires careful tracking of constants.
    摘要 先前的研究表明,半监督学习(SSL)算法可以利用无标签数据,使所需的有标签样本量低于监督学习(SL)算法。然而,现有的理论分析大多集中在无标签数据足以仅凭无监督学习(UL)就学出良好决策边界的情形。这引出一个问题:SSL 算法能否同时改进 UL 和 SL?为此,我们针对二分量高斯混合模型推导了一个紧的下界,该下界显式依赖于有标签与无标签数据集的大小以及混合分布的信噪比。令人意外的是,该结果意味着对于这类分布,任何 SSL 算法都无法改进 SL 或 UL 算法的极小极大最优统计误差率。尽管如此,我们在真实数据上的实验表明 SSL 算法仍然可以优于 UL 和 SL 方法。因此,我们的工作表明,虽然可以证明 SSL 算法的性能提升,但这需要仔细跟踪常数。

Textual-Knowledge-Guided Numerical Feature Discovery Method for Power Demand Forecasting

  • paper_url: http://arxiv.org/abs/2312.00095
  • repo_url: None
  • paper_authors: Zifan Ning, Min Jin
  • for: 预测新型电力系统和综合能源系统中的电力需求,尤其是短期预测。
  • methods: 文本知识指导数值特征发现(TKNFD)方法,包括文本知识扩展、数值特征收集和四维多元源跟踪数据库(4DM-STD)的构建。
  • results: 在全球两个不同地区的实验结果表明,基于 TKNFD 所发现特征的预测精度比最新的标准特征方案高出 16.84% 至 36.36%(MAPE),并且发现了许多此前未知的特征,尤其是未知能量维度和天文维度中的多个主导特征。
    Abstract Power demand forecasting is a crucial and challenging task for new power system and integrated energy system. However, as public feature databases and the theoretical mechanism of power demand changes are unavailable, the known features of power demand fluctuation are much limited. Recently, multimodal learning approaches have shown great vitality in machine learning and AIGC. In this paper, we interact two modal data and propose a textual-knowledge-guided numerical feature discovery (TKNFD) method for short-term power demand forecasting. TKNFD extensively accumulates qualitative textual knowledge, expands it into a candidate feature-type set, collects numerical data of these features, and eventually builds four-dimensional multivariate source-tracking databases (4DM-STDs). Next, TKNFD presents a two-level quantitative feature identification strategy independent of forecasting models, finds 43-48 features, and systematically analyses feature contribution and dependency correlation. Benchmark experiments in two different regions around the world demonstrate that the forecasting accuracy of TKNFD-discovered features reliably outperforms that of SoTA feature schemes by 16.84% to 36.36% MAPE. In particular, TKNFD reveals many unknown features, especially several dominant features in the unknown energy and astronomical dimensions, which extend the knowledge on the origin of strong randomness and non-linearity in power demand fluctuation. Besides, 4DM-STDs can serve as public baseline databases.
    摘要 新的电力系统和统一能源系统中的电力需求预测是一个关键和挑战性的任务。然而,由于公共特征数据库和电力需求变化的理论机制仍然无法获得,因此已知的电力需求波动特征相对较限。现在,多模式学习方法在机器学习和AIGC中表现出了很大的活力。在本文中,我们与两种模式数据进行互动,并提出了文本知识导向数据发现(TKNFD)方法来进行短期电力需求预测。TKNFD通过广泛收集文本知识,扩展它为候选特征类型集,收集这些特征的数据,最后建立四维多元来源追踪数据库(4DM-STDs)。接下来,TKNFD提出了两个层次的量化特征识别策略,独立于预测模型,发现43-48个特征,并系统地分析特征的贡献和依赖相互关联。在全球两个不同区域的实验中,TKNFD发现的特征可靠性地超过了SoTA特征方案的预测精度,具体来说,TKNFD发现了许多未知的特征,特别是在未知能源和天文学尺度上的主导特征,这些特征延伸了电力需求波动变化的起源和非线性性的知识。此外,4DM-STDs可以作为公共基准数据库。

HOT: Higher-Order Dynamic Graph Representation Learning with Efficient Transformers

  • paper_url: http://arxiv.org/abs/2311.18526
  • repo_url: None
  • paper_authors: Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler
  • for: 这篇论文旨在解决动态图表示学习(GRL)中的链接预测问题,即利用历史的图更新来预测给定顶点对是否会相连。
  • methods: 本研究使用 Transformer 将单个图更新建模为 token,并将高阶(HO)结构(如 k-hop 邻居以及包含给定顶点对的更一般子图)编码进注意力矩阵;同时采用分层注意力以降低内存占用。
  • results: 在 MOOC 数据集上,HOT 的预测精度分别比 DyGFormer、TGN 和 GraphMixer 高出 9%、7% 和 15%,并且可以轻松扩展到其他动态 GRL 任务。
    Abstract Many graph representation learning (GRL) problems are dynamic, with millions of edges added or removed per second. A fundamental workload in this setting is dynamic link prediction: using a history of graph updates to predict whether a given pair of vertices will become connected. Recent schemes for link prediction in such dynamic settings employ Transformers, modeling individual graph updates as single tokens. In this work, we propose HOT: a model that enhances this line of works by harnessing higher-order (HO) graph structures; specifically, k-hop neighbors and more general subgraphs containing a given pair of vertices. Harnessing such HO structures by encoding them into the attention matrix of the underlying Transformer results in higher accuracy of link prediction outcomes, but at the expense of increased memory pressure. To alleviate this, we resort to a recent class of schemes that impose hierarchy on the attention matrix, significantly reducing memory footprint. The final design offers a sweetspot between high accuracy and low memory utilization. HOT outperforms other dynamic GRL schemes, for example achieving 9%, 7%, and 15% higher accuracy than - respectively - DyGFormer, TGN, and GraphMixer, for the MOOC dataset. Our design can be seamlessly extended towards other dynamic GRL workloads.
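
As a rough illustration of the general idea of injecting higher-order structure into attention, the NumPy sketch below adds a per-hop-distance bias to dot-product attention logits. The function names, the hop-distance encoding, and the bias parameterization are assumptions for illustration only and are not HOT's actual architecture:

```python
import numpy as np

def hop_distances(adj: np.ndarray, max_hops: int) -> np.ndarray:
    """Hop distance between every pair of nodes, capped at max_hops (also used for unreachable pairs)."""
    n = adj.shape[0]
    dist = np.full((n, n), max_hops)
    np.fill_diagonal(dist, 0)
    reached = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    for k in range(1, max_hops):
        nxt = (frontier.astype(int) @ (adj > 0).astype(int)) > 0  # nodes reachable in one more step
        frontier = nxt & ~reached
        dist[frontier] = k
        reached |= frontier
    return dist

def structure_biased_attention(x, adj, hop_bias, max_hops=4):
    """x: (n, d) token/node features; hop_bias: (max_hops + 1,) bias added per hop distance."""
    d = x.shape[1]
    logits = (x @ x.T) / np.sqrt(d)                      # plain dot-product attention logits
    logits = logits + hop_bias[hop_distances(adj, max_hops)]
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                    # row-wise softmax
    return w @ x

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])  # a 4-node path graph
x = rng.normal(size=(4, 8))
hop_bias = np.array([1.0, 0.5, 0.0, -0.5, -1.0])          # favour structurally close nodes
out = structure_biased_attention(x, adj, hop_bias)
```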

Detecting Anomalous Network Communication Patterns Using Graph Convolutional Networks

  • paper_url: http://arxiv.org/abs/2311.18525
  • repo_url: None
  • paper_authors: Yizhak Vaisman, Gilad Katz, Yuval Elovici, Asaf Shabtai
  • for: 本研究旨在提供一种基于图卷积网络(GCN)和变分自编码器(VAE)的异常检测方法,以保护组织的终端机器免受复杂的网络攻击。
  • methods: 本研究使用基于 GCN 的 VAE 模型,其输入为两个矩阵:(一)归一化邻接矩阵,表示机器之间的连接;(二)特征矩阵,包含用于刻画各机器的多种特征(人口统计、统计量、进程相关以及 Node2vec 结构特征)。模型在给定时间窗口内收集的数据上训练后,再应用于同一数据,并以重构得分作为每台机器的异常分数。
  • results: 本研究在某大型金融机构自动取款机(ATM)以及与 Active Directory(AD)服务器通信的真实大规模数据(由 Carbon Black EDR 记录)上,分别在无监督和有监督两种设置下进行了评估。结果表明,GCNetOmaly 能够在无监督数据上有效检测机器的异常行为。
    Abstract To protect an organizations' endpoints from sophisticated cyberattacks, advanced detection methods are required. In this research, we present GCNetOmaly: a graph convolutional network (GCN)-based variational autoencoder (VAE) anomaly detector trained on data that include connection events among internal and external machines. As input, the proposed GCN-based VAE model receives two matrices: (i) the normalized adjacency matrix, which represents the connections among the machines, and (ii) the feature matrix, which includes various features (demographic, statistical, process-related, and Node2vec structural features) that are used to profile the individual nodes/machines. After training the model on data collected for a predefined time window, the model is applied on the same data; the reconstruction score obtained by the model for a given machine then serves as the machine's anomaly score. GCNetOmaly was evaluated on real, large-scale data logged by Carbon Black EDR from a large financial organization's automated teller machines (ATMs) as well as communication with Active Directory (AD) servers in two setups: unsupervised and supervised. The results of our evaluation demonstrate GCNetOmaly's effectiveness in detecting anomalous behavior of machines on unsupervised data.
    摘要 为保护组织的终端机器免受复杂的网络攻击,需要先进的检测方法。在这项研究中,我们提出了 GCNetOmaly:一种基于图卷积网络(GCN)的变分自编码器(VAE)异常检测器,其训练数据包含内部与外部机器之间的连接事件。GCNetOmaly 的输入包括两个矩阵:(一)归一化邻接矩阵,表示机器之间的连接关系;(二)特征矩阵,包含用于刻画各机器的多种特征(人口统计、统计量、进程相关以及 Node2vec 结构特征)。模型在预定时间窗口内收集的数据上训练后,再应用于同一数据,并将模型给出的重构得分作为对应机器的异常分数。GCNetOmaly 在某大型金融机构的自动取款机(ATM)以及与 Active Directory(AD)服务器通信的真实大规模数据(由 Carbon Black EDR 记录)上,分别在无监督和有监督两种设置下进行了评估。评估结果表明,GCNetOmaly 能够在无监督数据上有效检测机器的异常行为。
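
The PyTorch sketch below illustrates the general recipe described above: a GCN-based VAE that consumes a normalized adjacency matrix and a node-feature matrix, and scores each machine by its reconstruction error. It is an illustrative stand-in with assumed layer sizes and a feature-reconstruction objective, not the GCNetOmaly implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def normalize_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^{-1/2}(A + I)D^{-1/2}, as in Kipf & Welling GCNs."""
    a = adj + torch.eye(adj.shape[0])
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

class GCNVAE(nn.Module):
    def __init__(self, in_dim, hid_dim=32, z_dim=16):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hid_dim)        # first GCN layer weights
        self.w_mu = nn.Linear(hid_dim, z_dim)       # second GCN layer -> latent mean
        self.w_logvar = nn.Linear(hid_dim, z_dim)   # second GCN layer -> latent log-variance
        self.decoder = nn.Sequential(nn.Linear(z_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, in_dim))

    def forward(self, a_hat, x):
        h = F.relu(a_hat @ self.w1(x))              # graph convolution: A_hat X W
        mu, logvar = a_hat @ self.w_mu(h), a_hat @ self.w_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

# Toy data: 50 machines, 10 profiling features each, random sparse connectivity.
adj = (torch.rand(50, 50) < 0.1).float()
adj = ((adj + adj.T) > 0).float()
x = torch.randn(50, 10)
a_hat = normalize_adjacency(adj)

model = GCNVAE(in_dim=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    recon, mu, logvar = model(a_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = F.mse_loss(recon, x) + 1e-3 * kl
    opt.zero_grad(); loss.backward(); opt.step()

# Per-machine anomaly score = feature reconstruction error (one stochastic reconstruction).
with torch.no_grad():
    anomaly_score = ((model(a_hat, x)[0] - x) ** 2).mean(dim=1)
```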

Combining deep generative models with extreme value theory for synthetic hazard simulation: a multivariate and spatially coherent approach

  • paper_url: http://arxiv.org/abs/2311.18521
  • repo_url: None
  • paper_authors: Alison Peard, Jim Hall
  • for: 该论文旨在模拟大量物理上真实且空间一致的复合气候灾害事件,以刻画气候风险的分布并为适应政策提供依据。
  • methods: 该论文使用生成对抗网络(GAN)建模各变量之间的相依结构,并结合传统的极值理论对尾部进行受控外推。
  • results: 训练好的模型可以高效生成数千个真实的复合灾害事件,用于气候风险评估与防灾准备;该方法灵活且可迁移到其他多变量与空间气候数据集。
    Abstract Climate hazards can cause major disasters when they occur simultaneously as compound hazards. To understand the distribution of climate risk and inform adaptation policies, scientists need to simulate a large number of physically realistic and spatially coherent events. Current methods are limited by computational constraints and the probabilistic spatial distribution of compound events is not given sufficient attention. The bottleneck in current approaches lies in modelling the dependence structure between variables, as inference on parametric models suffers from the curse of dimensionality. Generative adversarial networks (GANs) are well-suited to such a problem due to their ability to implicitly learn the distribution of data in high-dimensional settings. We employ a GAN to model the dependence structure for daily maximum wind speed, significant wave height, and total precipitation over the Bay of Bengal, combining this with traditional extreme value theory for controlled extrapolation of the tails. Once trained, the model can be used to efficiently generate thousands of realistic compound hazard events, which can inform climate risk assessments for climate adaptation and disaster preparedness. The method developed is flexible and transferable to other multivariate and spatial climate datasets.
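
To illustrate the extreme-value component mentioned in the abstract, the sketch below fits a generalized Pareto distribution (GPD) to exceedances over a high threshold — a standard peaks-over-threshold recipe — and combines it with the empirical body of the data. The threshold choice, the stand-in data, and the semi-parametric sampler are assumptions for illustration, not the paper's pipeline:

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
wind_speed = rng.gumbel(loc=20.0, scale=4.0, size=10_000)   # stand-in daily-maximum data

threshold = np.quantile(wind_speed, 0.95)
exceedances = wind_speed[wind_speed > threshold] - threshold

# Fit GPD shape (xi) and scale to the exceedances, with the location fixed at 0.
xi, _, scale = genpareto.fit(exceedances, floc=0.0)

def sample_semiparametric(n: int) -> np.ndarray:
    """Empirical body below the threshold, GPD tail above it."""
    u = rng.uniform(size=n)
    body = rng.choice(wind_speed[wind_speed <= threshold], size=n)
    tail = threshold + genpareto.rvs(xi, loc=0.0, scale=scale, size=n)
    return np.where(u < 0.95, body, tail)

synthetic = sample_semiparametric(5_000)  # marginal samples with a controlled, extrapolated tail
```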

Global Convergence of Online Identification for Mixed Linear Regression

  • paper_url: http://arxiv.org/abs/2311.18506
  • repo_url: None
  • paper_authors: Yujing Liu, Zhixin Liu, Lei Guo
  • for: 本文主要研究混合线性回归(MLR)模型的在线辨识与数据聚类问题。
  • methods: 针对两类基本的 MLR 模型,基于期望最大化(EM)原理提出了两种相应的新在线辨识算法。
  • results: 研究表明,这两种算法在不假设数据独立同分布(i.i.d.)的情况下均能实现全局收敛;此外,类内误差以及新数据被正确聚类的概率与参数已知情形渐近一致。
    Abstract Mixed linear regression (MLR) is a powerful model for characterizing nonlinear relationships by utilizing a mixture of linear regression sub-models. The identification of MLR is a fundamental problem, where most of the existing results focus on offline algorithms, rely on independent and identically distributed (i.i.d) data assumptions, and provide local convergence results only. This paper investigates the online identification and data clustering problems for two basic classes of MLRs, by introducing two corresponding new online identification algorithms based on the expectation-maximization (EM) principle. It is shown that both algorithms will converge globally without resorting to the traditional i.i.d data assumptions. The main challenge in our investigation lies in the fact that the gradient of the maximum likelihood function does not have a unique zero, and a key step in our analysis is to establish the stability of the corresponding differential equation in order to apply the celebrated Ljung's ODE method. It is also shown that the within-cluster error and the probability that the new data is categorized into the correct cluster are asymptotically the same as those in the case of known parameters. Finally, numerical simulations are provided to verify the effectiveness of our online algorithms.
    摘要 混合线性回归(MLR)通过混合多个线性回归子模型来刻画非线性关系,是一种功能强大的模型。MLR 的辨识是一个基本问题,现有结果大多针对离线算法,依赖独立同分布(i.i.d.)数据假设,并且只给出局部收敛结果。本文研究两类基本 MLR 模型的在线辨识与数据聚类问题,基于期望最大化(EM)原理提出了两种相应的新在线辨识算法,并证明两种算法无需传统的 i.i.d. 数据假设即可全局收敛。研究的主要难点在于最大似然函数的梯度不存在唯一零点,分析的关键一步是建立相应微分方程的稳定性,从而应用著名的 Ljung ODE 方法。此外,还证明了类内误差以及新数据被归入正确类别的概率与参数已知情形渐近相同。最后,通过数值仿真验证了所提在线算法的有效性。
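
The abstract describes EM-based online identification for mixed linear regression. The sketch below shows one simple stochastic-EM recursion for a two-component MLR with known noise variance and a constant step size; it is an illustrative simplification, not the paper's algorithms or their global-convergence construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, sigma = 3, 0.1
true_theta = np.array([[1.0, -2.0, 0.5], [-1.0, 1.5, 2.0]])   # the two unknown regressors

theta = rng.normal(size=(2, d))      # online estimates of both regression vectors
pi = np.array([0.5, 0.5])            # mixing weights
eta = 0.05                           # constant step size (an assumption of this sketch)

for t in range(20_000):
    phi = rng.normal(size=d)
    y = true_theta[rng.integers(2)] @ phi + sigma * rng.normal()

    # E-step: posterior responsibility of each sub-model for the new sample.
    resid = y - theta @ phi
    log_like = np.log(pi) - resid**2 / (2 * sigma**2)
    w = np.exp(log_like - log_like.max())
    w /= w.sum()

    # M-step (stochastic): responsibility-weighted gradient step and mixing-weight update.
    theta += eta * (w * resid)[:, None] * phi[None, :]
    pi = np.clip(pi + eta * (w - pi), 1e-3, 1.0)
    pi /= pi.sum()

# Data clustering: a new sample is assigned to the sub-model with the largest responsibility w_k.
```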

Data-Agnostic Model Poisoning against Federated Learning: A Graph Autoencoder Approach

  • paper_url: http://arxiv.org/abs/2311.18498
  • repo_url: None
  • paper_authors: Kai Li, Jingjing Zheng, Xin Yuan, Wei Ni, Ozgur B. Akan, H. Vincent Poor
  • for: 这篇论文旨在攻击 Federated Learning (FL) 的资料掌控攻击,通过设计一个新的敌意Graph Autoencoder (GAE) 框架。
  • methods: 这个攻击不需要知道 FL 训练数据,并且可以实现效果和隐藏。攻击者通过听取良性本地模型和全球模型的声音,提取本地模型和训练数据特征之间的图 структур相互作用,然后通过对这些图结构进行敌意变化,生成出恶意本地模型。
  • results: 实验结果显示,FL 在这个攻击下会逐渐下降,并且现有的防护机制无法检测到这个攻击。攻击可以导致所有良性设备被感染,对 FL 带来严重的威胁。
    Abstract This paper proposes a novel, data-agnostic, model poisoning attack on Federated Learning (FL), by designing a new adversarial graph autoencoder (GAE)-based framework. The attack requires no knowledge of FL training data and achieves both effectiveness and undetectability. By listening to the benign local models and the global model, the attacker extracts the graph structural correlations among the benign local models and the training data features substantiating the models. The attacker then adversarially regenerates the graph structural correlations while maximizing the FL training loss, and subsequently generates malicious local models using the adversarial graph structure and the training data features of the benign ones. A new algorithm is designed to iteratively train the malicious local models using GAE and sub-gradient descent. The convergence of FL under attack is rigorously proved, with a considerably large optimality gap. Experiments show that the FL accuracy drops gradually under the proposed attack and existing defense mechanisms fail to detect it. The attack can give rise to an infection across all benign devices, making it a serious threat to FL.
    摘要 本文提出了一种新颖的、与数据无关的模型投毒攻击,其核心是一个新的对抗图自编码器(GAE)框架。该攻击无需了解联邦学习(FL)的训练数据,即可同时实现攻击效果与隐蔽性。攻击者通过监听良性本地模型和全局模型,提取良性本地模型之间以及支撑这些模型的训练数据特征之间的图结构关联;随后在最大化 FL 训练损失的同时对这些图结构关联进行对抗性重构,并利用对抗图结构与良性模型的训练数据特征生成恶意本地模型。本文设计了一种结合 GAE 与次梯度下降迭代训练恶意本地模型的新算法,并严格证明了受攻击 FL 的收敛性,其最优性差距相当大。实验表明,在该攻击下 FL 的准确率逐渐下降,而现有防御机制无法检测到该攻击。该攻击可能蔓延至所有良性设备,对 FL 构成严重威胁。

How Much Is Hidden in the NAS Benchmarks? Few-Shot Adaptation of a NAS Predictor

  • paper_url: http://arxiv.org/abs/2311.18451
  • repo_url: None
  • paper_authors: Hrushikesh Loya, Łukasz Dudziak, Abhinav Mehrotra, Royson Lee, Javier Fernandez-Marques, Nicholas D. Lane, Hongkai Wen
  • for: 这篇论文的目的是提高适用于不同任务和搜索空间的神经网络设计方法,并且提高神经网络的性能和效率。
  • methods: 这篇论文借助 meta-learning 技术,从公开可用的 NAS benchmarks 中提取可跨任务与搜索空间迁移的通用 NAS 知识,并重点研究任务级相关性(domain shift)与预测器可迁移性之间的关系。
  • results: 在实验中,这篇论文使用了 6 个 NAS benchmarks,总共有 16 个 NAS 设定,meta-learning 方法不仅在 cross-validation experiments 中显示出了superior 或 matching 性能,还能成功地在新的搜索空间和任务上进行推广。
    Abstract Neural architecture search has proven to be a powerful approach to designing and refining neural networks, often boosting their performance and efficiency over manually-designed variations, but comes with computational overhead. While there has been a considerable amount of research focused on lowering the cost of NAS for mainstream tasks, such as image classification, a lot of those improvements stem from the fact that those tasks are well-studied in the broader context. Consequently, applicability of NAS to emerging and under-represented domains is still associated with a relatively high cost and/or uncertainty about the achievable gains. To address this issue, we turn our focus towards the recent growth of publicly available NAS benchmarks in an attempt to extract general NAS knowledge, transferable across different tasks and search spaces. We borrow from the rich field of meta-learning for few-shot adaptation and carefully study applicability of those methods to NAS, with a special focus on the relationship between task-level correlation (domain shift) and predictor transferability; which we deem critical for improving NAS on diverse tasks. In our experiments, we use 6 NAS benchmarks in conjunction, spanning in total 16 NAS settings -- our meta-learning approach not only shows superior (or matching) performance in the cross-validation experiments but also successful extrapolation to a new search space and tasks.

The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies

  • paper_url: http://arxiv.org/abs/2311.18437
  • repo_url: None
  • paper_authors: Victor Boone
  • for: 该论文研究随机多臂赌博机中无悔算法在单次运行(one-shot)中的行为。
  • methods: 该论文提出了新的滑动遗憾(sliding regret)指标,用于刻画伪遗憾在固定长度时间窗内的最坏表现,并证明随机化方法(如 Thompson Sampling 和 MED)具有最优的滑动遗憾,而指数策略(如 UCB、UCB-V、KL-UCB、MOSS、IMED 等)在其指数满足正则性条件时具有最坏的滑动遗憾。
  • results: 该论文进一步通过探索遗憾(regret of exploration)分析了指数策略伪遗憾的平均"颠簸"程度,并证明其同样是次优的。
    Abstract This paper studies the one-shot behavior of no-regret algorithms for stochastic bandits. Although many algorithms are known to be asymptotically optimal with respect to the expected regret, over a single run, their pseudo-regret seems to follow one of two tendencies: it is either smooth or bumpy. To measure this tendency, we introduce a new notion: the sliding regret, that measures the worst pseudo-regret over a time-window of fixed length sliding to infinity. We show that randomized methods (e.g. Thompson Sampling and MED) have optimal sliding regret, while index policies, although possibly asymptotically optimal for the expected regret, have the worst possible sliding regret under regularity conditions on their index (e.g. UCB, UCB-V, KL-UCB, MOSS, IMED etc.). We further analyze the average bumpiness of the pseudo-regret of index policies via the regret of exploration, that we show to be suboptimal as well.
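
For intuition about the objects discussed above, the sketch below runs Bernoulli Thompson Sampling and then reports the worst pseudo-regret accumulated over any fixed-length window of the run. The window length and this particular windowed statistic are assumptions meant only to convey the flavour of the sliding regret; the formal definition is given in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.45, 0.5, 0.6])          # true arm means; arm 2 is optimal
T, window = 10_000, 500
alpha = np.ones(len(means))                 # Beta(1, 1) priors per arm
beta = np.ones(len(means))

per_step_regret = np.empty(T)
for t in range(T):
    theta = rng.beta(alpha, beta)           # one posterior sample per arm
    arm = int(np.argmax(theta))             # play the arm with the best sampled mean
    reward = float(rng.random() < means[arm])
    alpha[arm] += reward
    beta[arm] += 1.0 - reward
    per_step_regret[t] = means.max() - means[arm]

cum = np.concatenate([[0.0], np.cumsum(per_step_regret)])
windowed = cum[window:] - cum[:-window]     # pseudo-regret accumulated in each length-`window` slice
print("total pseudo-regret:", cum[-1], "worst window:", windowed.max())
```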

Exploring the Temperature-Dependent Phase Transition in Modern Hopfield Networks

  • paper_url: http://arxiv.org/abs/2311.18434
  • repo_url: None
  • paper_authors: Felix Koulischer, Cédric Goemaere, Tom van der Meersch, Johannes Deleu, Thomas Demeester
  • for: 这篇论文的主要目的是研究模ern Hopfield networks(MHNs)中 inverse temperature Hyperparameter $\beta$ 的影响。
  • methods: 这篇论文使用了一种简化了的 MHN,通过跟踪能量极值的分布来研究 $\beta$ 的影响。
  • results: 研究发现,在一定的 critical temperature $\beta_{\text{c}$ 下,MHN 会经历一种阶段性变化,从单一的全局吸引器向高度模式特定的极值变化。此外,动力学不仅受到 $\beta$ 的影响,还受到存储patterns的分布和大小的影响。
    Abstract The recent discovery of a connection between Transformers and Modern Hopfield Networks (MHNs) has reignited the study of neural networks from a physical energy-based perspective. This paper focuses on the pivotal effect of the inverse temperature hyperparameter $\beta$ on the distribution of energy minima of the MHN. To achieve this, the distribution of energy minima is tracked in a simplified MHN in which equidistant normalised patterns are stored. This network demonstrates a phase transition at a critical temperature $\beta_{\text{c}$, from a single global attractor towards highly pattern specific minima as $\beta$ is increased. Importantly, the dynamics are not solely governed by the hyperparameter $\beta$ but are instead determined by an effective inverse temperature $\beta_{\text{eff}$ which also depends on the distribution and size of the stored patterns. Recognizing the role of hyperparameters in the MHN could, in the future, aid researchers in the domain of Transformers to optimise their initial choices, potentially reducing the necessity for time and energy expensive hyperparameter fine-tuning.
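
For reference, in the widely used continuous modern Hopfield formulation of Ramsauer et al., with stored patterns collected as columns of $X=(x_1,\dots,x_N)$ and state $\xi$, the energy and retrieval update can be written (up to an additive constant) as

$$E(\xi) \;=\; -\frac{1}{\beta}\,\log\sum_{i=1}^{N}\exp\!\big(\beta\, x_i^{\top}\xi\big) \;+\; \frac{1}{2}\,\xi^{\top}\xi \;+\; \text{const}, \qquad \xi^{\mathrm{new}} \;=\; X\,\operatorname{softmax}\!\big(\beta\, X^{\top}\xi\big).$$

For small $\beta$ the log-sum-exp term is nearly flat and a single global minimum near the mean of the stored patterns dominates; for large $\beta$ the softmax sharpens and each stored pattern acquires its own minimum, which is the qualitative phase transition described above. The simplified MHN analysed in the paper (equidistant normalised patterns) and its effective inverse temperature $\beta_{\text{eff}}$ refine this picture; the exact critical value $\beta_{\text{c}}$ depends on the pattern distribution and is derived there.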

On the convergence of adaptive first order methods: proximal gradient and alternating minimization algorithms

  • paper_url: http://arxiv.org/abs/2311.18431
  • repo_url: https://github.com/pylat/adaptive-proximal-algorithms-extended-experiments
  • paper_authors: Puya Latafat, Andreas Themelis, Panagiotis Patrinos
  • for: 本文提出了一种基于最近的 Works on linesearch-free adaptive proximal gradient methods的框架,即AdaPG$^{\pi,r}$,该框架可以更大的步长策略和改进的下界。
  • methods: 本文提出了不同的参数$\pi$和$r$的选择,并通过数值实验证明了其效果。此外,本文还在更一般的设定下证明了其准确性。
  • results: 本文通过数值实验和理论分析证明了AdaPG$^{\pi,r}$的效果,并且在standard strongly convex设定之外扩展了其应用范围。
    Abstract Building upon recent works on linesearch-free adaptive proximal gradient methods, this paper proposes AdaPG$^{\pi,r}$, a framework that unifies and extends existing results by providing larger stepsize policies and improved lower bounds. Different choices of the parameters $\pi$ and $r$ are discussed and the efficacy of the resulting methods is demonstrated through numerical simulations. In an attempt to better understand the underlying theory, its convergence is established in a more general setting that allows for time-varying parameters. Finally, an adaptive alternating minimization algorithm is presented by exploring the dual setting. This algorithm not only incorporates additional adaptivity, but also expands its applicability beyond standard strongly convex settings.
    摘要 基于近期关于无线搜索(linesearch-free)自适应邻近梯度方法的工作,本文提出了 AdaPG$^{\pi,r}$ 框架,该框架统一并扩展了现有结果,提供了更大的步长策略和改进的下界。文中讨论了参数 $\pi$ 和 $r$ 的不同选择,并通过数值实验验证了所得方法的有效性。为更好地理解其中的理论基础,本文在允许时变参数的更一般设定下建立了其收敛性。最后,通过研究对偶形式给出了一种自适应交替极小化算法,该算法不仅引入了额外的自适应性,还将适用范围扩展到标准强凸设定之外。
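
As background for readers unfamiliar with this family of methods, the sketch below runs a plain proximal-gradient iteration on a lasso problem with a linesearch-free stepsize taken from a running local Lipschitz estimate. This is a deliberately simplified heuristic for illustration only; the AdaPG$^{\pi,r}$ stepsize policy is different and provably admits larger steps governed by the parameters $\pi$ and $r$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
b = rng.normal(size=100)
lam = 0.1

def grad_f(x):                      # gradient of the smooth part 0.5 * ||Ax - b||^2
    return A.T @ (A @ x - b)

def prox_l1(x, t):                  # proximal map of t * lam * ||.||_1 (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - t * lam, 0.0)

x = np.zeros(20)
g = grad_f(x)
L_hat = 1.0                         # running local Lipschitz estimate (heuristic)
for _ in range(500):
    gamma = 1.0 / L_hat
    x_new = prox_l1(x - gamma * g, gamma)
    g_new = grad_f(x_new)
    step = np.linalg.norm(x_new - x)
    if step > 0:                    # local curvature estimate ||grad difference|| / ||x difference||
        L_hat = max(L_hat, np.linalg.norm(g_new - g) / step)
    x, g = x_new, g_new
```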

Convergence Analysis of Fractional Gradient Descent

  • paper_url: http://arxiv.org/abs/2311.18426
  • repo_url: None
  • paper_authors: Ashwani Aggarwal
  • for: 本研究旨在分析某些特殊情况下的梯度下降法(Fractional Gradient Descent)的收敛性。
  • methods: 本文使用 novel bounds 将 fractional 和整数 derivatives 联系起来,然后应用这些 bounds 到不同的设置中,证明了 $O(1/T)$ 收敛性 для 光滑和凸函数,以及 linear 收敛性 для 光滑和强凸函数。
  • results: 本文证明了 fractional gradient descent 在光滑和非凸函数上的 $O(1/T)$ 收敛性,并提供了实验结果,证明了 fractional gradient descent 可能比标准梯度下降法更快。
    Abstract Fractional derivatives are a well-studied generalization of integer order derivatives. Naturally, for optimization, it is of interest to understand the convergence properties of gradient descent using fractional derivatives. Convergence analysis of fractional gradient descent is currently limited both in the methods analyzed and the settings analyzed. This paper aims to fill in these gaps by analyzing variations of fractional gradient descent in smooth and convex, smooth and strongly convex, and smooth and non-convex settings. First, novel bounds will be established bridging fractional and integer derivatives. Then, these bounds will be applied to the aforementioned settings to prove $O(1/T)$ convergence for smooth and convex functions and linear convergence for smooth and strongly convex functions. Additionally, we prove $O(1/T)$ convergence for smooth and non-convex functions using an extended notion of smoothness that is more natural for fractional derivatives. Finally, empirical results will be presented on the potential speed up of fractional gradient descent over standard gradient descent as well as the challenges of predicting which will be faster in general.
    摘要 分数阶导数是对整数阶导数的一种经过深入研究的推广。在优化中,理解使用分数阶导数的梯度下降的收敛性质自然具有重要意义。然而,目前对分数阶梯度下降的收敛分析在所分析的方法和设定上都很有限。本文旨在填补这些空白,分析分数阶梯度下降的若干变体在光滑且凸、光滑且强凸以及光滑且非凸设定下的收敛性。首先,我们建立连接分数阶导数与整数阶导数的新界;随后将这些界应用于上述设定,证明光滑凸函数的 $O(1/T)$ 收敛以及光滑强凸函数的线性收敛。此外,借助一种对分数阶导数更自然的扩展光滑性概念,我们证明了光滑非凸函数的 $O(1/T)$ 收敛。最后,我们给出实验结果,展示分数阶梯度下降相对标准梯度下降的潜在加速,以及在一般情形下预测二者孰快孰慢所面临的挑战。
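
For context, one common formalisation uses the Caputo fractional derivative of order $\alpha \in (0,1)$ with lower terminal $c$,

$$\big({}^{C}\!D_{c}^{\alpha}f\big)(x) \;=\; \frac{1}{\Gamma(1-\alpha)}\int_{c}^{x}\frac{f'(t)}{(x-t)^{\alpha}}\,dt,$$

and the generic fractional gradient descent update, applied coordinate-wise,

$$x_{k+1} \;=\; x_k \;-\; \eta\,\big({}^{C}\!D_{c}^{\alpha}f\big)(x_k),$$

which recovers standard gradient descent as $\alpha \to 1$. The precise variants, the extended smoothness notion, and the bounds analysed in the paper may differ from this textbook form.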

Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control

  • paper_url: http://arxiv.org/abs/2311.18393
  • repo_url: None
  • paper_authors: Bernd Frauenknecht, Tobias Ehlgen, Sebastian Trimpe
  • for: 这篇论文目的是发展自动驾驶系统的基础建立,使用强化学习(RL)来实现控制性能高于传统方法,并且在实际应用中保持计算负载低。
  • methods: 这篇论文使用了三种现代化的深度强化学习方法:Randomized Ensemble Double Q-learning(REDQ)、Probabilistic Ensembles with Trajectory Sampling and Model Predictive Path Integral Optimizer(PETS-MPPI)和Model-Based Policy Optimization(MBPO)。这些方法在车辆轨迹控制方面尚未被探讨过。
  • results: 这篇论文的实验结果显示,使用这三种深度强化学习方法可以与soft-actor critic(SAC)相比,实现车辆控制性能的提升,并且可以大幅减少环境互动次数。
    Abstract Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises to achieve control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches like soft-actor critic (SAC) require extensive amounts of training data to be collected and are thus impractical for real-world application. To address this issue, we apply recently developed data-efficient deep RL methods to vehicle trajectory control. Our investigation focuses on three methods, so far unexplored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling and model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO). We find that in the case of trajectory control, the standard model-based RL formulation used in approaches like PETS-MPPI and MBPO is not suitable. We, therefore, propose a new formulation that splits dynamics prediction and vehicle localization. Our benchmark study on the CARLA simulator reveals that the three identified data-efficient deep RL approaches learn control strategies on a par with or better than SAC, yet reduce the required number of environment interactions by more than one order of magnitude.
    摘要 高级车辆控制是自动驾驶系统的基础构件。强化学习(RL)可以实现比经典方法更高的控制性能,并在部署时保持较低的计算开销。然而,标准RL方法(如 soft actor-critic,SAC)需要收集大量训练数据,因此难以用于实际应用。为解决这一问题,我们将近期发展的数据高效深度RL方法应用于车辆轨迹控制,重点考察三种此前未在车辆控制中探索的方法:随机集成双 Q 学习(REDQ)、带轨迹采样的概率集成与模型预测路径积分优化器(PETS-MPPI),以及基于模型的策略优化(MBPO)。我们发现,在轨迹控制问题中,PETS-MPPI 和 MBPO 等方法所采用的标准基于模型的 RL 形式并不适用,因此提出了一种将动力学预测与车辆定位分离的新形式。在 CARLA 仿真器上的基准研究表明,这三种数据高效深度RL方法学习到的控制策略与 SAC 相当或更好,同时所需环境交互次数减少超过一个数量级。

Transfer Learning across Different Chemical Domains: Virtual Screening of Organic Materials with Deep Learning Models Pretrained on Small Molecule and Chemical Reaction Data

  • paper_url: http://arxiv.org/abs/2311.18377
  • repo_url: None
  • paper_authors: Chengwei Zhang, Yushuang Zhai, Ziyang Gong, Yuan-Bin She, Yun-Fang Yang, An Su
  • for: 这种研究是为了提出一种高效的虚拟屏选方法,以预测有机材料的性能。
  • methods: 该方法先利用类药小分子和化学反应数据库对 BERT 模型进行预训练,再在五个有机材料虚拟筛选任务上进行微调。
  • results: 结果显示,经 USPTO-SMILES 预训练的 BERT 模型在两个任务上的 R2 超过 0.90,在另一个任务上超过 0.82,总体优于在小分子或有机材料数据库上预训练的同类模型,也优于直接在虚拟筛选任务数据上训练的其他五种传统机器学习模型。
    Abstract Machine learning prediction of organic materials properties is an efficient virtual screening method ahead of more expensive screening methods. However, this approach has suffered from insufficient labeled data on organic materials to train state-of-the-art machine learning models. In this study, we demonstrate that drug-like small molecule and chemical reaction databases can be used to pretrain the BERT model for the virtual screening of organic materials. Among the BERT models fine-tuned by five virtual screening tasks on organic materials, the USPTO-SMILES pretrained BERT model had R2 > 0.90 for two tasks and R2 > 0.82 for one, which was generally superior to the same models pretrained by the small molecule or organic materials databases, as well as to the other three traditional machine learning models trained directly on the virtual screening task data. The superior performance of the USPTO-SMILES pretrained BERT model is due to the greater variety of organic building blocks in the USPTO database and the broader coverage of the chemical space. The even better performance of the BERT model pretrained externally from a chemical reaction database with additional sources of chemical reactions strengthens our proof of concept that transfer learning across different chemical domains is practical for the virtual screening of organic materials.

Age Effects on Decision-Making, Drift Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.18376
  • repo_url: None
  • paper_authors: Zahra Kavian, Kimia Hajisadeghi, Yashar Rezazadeh, Mehrbod Faraji, Reza Ebrahimpour
  • for: To examine how training improves the decision-making performance of different age groups on a random dot motion (RDM) task.
  • methods: Old and young participants completed a three-phase training protocol and then repeated the RDM task; their responses were analyzed with a hierarchical drift-diffusion model to determine how its parameters change after training.
  • results: After training, participants accumulated sensory information faster (higher drift rate) while their decision boundary decreased as they became more confident and lowered their decision threshold; the old group had a higher boundary and lower drift rate both before and after training, and the difference between the two groups' parameters shrank after training.
    Abstract Training can improve human decision-making performance. After several training sessions, a person can quickly and accurately complete a task. However, decision-making is always a trade-off between accuracy and response time. Factors such as age and drug abuse can affect the decision-making process. This study examines how training can improve the performance of different age groups in completing a random dot motion (RDM) task. The participants are divided into two groups: old and young. They undergo a three-phase training and then repeat the same RDM task. The hierarchical drift-diffusion model analyzes the subjects' responses and determines how the model's parameters change after training for both age groups. The results show that after training, the participants were able to accumulate sensory information faster, and the model drift rate increased. However, their decision boundary decreased as they became more confident and had a lower decision-making threshold. Additionally, the old group had a higher boundary and lower drift rate in both pre and post-training, and there was less difference between the two group parameters after training.
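For readers unfamiliar with the drift-diffusion model, the toy simulation below shows how drift rate, decision boundary, and non-decision time jointly produce reaction times and choices; the parameter values are illustrative and are not estimates from the study.

```python
# Toy drift-diffusion simulation; parameter values are illustrative only.
import numpy as np

def simulate_ddm(drift, boundary, noise=1.0, dt=1e-3, non_decision=0.3, rng=None):
    """Accumulate noisy evidence until it crosses +boundary or -boundary."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < boundary:
        x += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return t + non_decision, x > 0              # (reaction time in s, upper boundary hit?)

rts, hits = zip(*(simulate_ddm(drift=1.2, boundary=1.0) for _ in range(1000)))
print(f"mean RT: {np.mean(rts):.3f} s, accuracy: {np.mean(hits):.2f}")
```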

Towards Comparable Active Learning

  • paper_url: http://arxiv.org/abs/2311.18356
  • repo_url: https://github.com/wernerth94/comparable-active-learning
  • paper_authors: Thorben Werner, Johannes Burchert, Lars Schmidt-Thieme
  • for: To provide an Active Learning framework that allows fair comparison of algorithms across different tasks and domains while addressing reproducibility problems and measurement variance.
  • methods: The paper proposes a new evaluation methodology for Active Learning, together with a fast and performant oracle algorithm, and benchmarks widely used query strategies across tasks and domains.
  • results: Experiments show that the performance of existing Active Learning algorithms differs considerably across domains, and that algorithm rankings vary substantially between tasks and domains.
    Abstract Active Learning has received significant attention in the field of machine learning for its potential in selecting the most informative samples for labeling, thereby reducing data annotation costs. However, we show that the reported lifts in recent literature generalize poorly to other domains leading to an inconclusive landscape in Active Learning research. Furthermore, we highlight overlooked problems for reproducing AL experiments that can lead to unfair comparisons and increased variance in the results. This paper addresses these issues by providing an Active Learning framework for a fair comparison of algorithms across different tasks and domains, as well as a fast and performant oracle algorithm for evaluation. To the best of our knowledge, we propose the first AL benchmark that tests algorithms in 3 major domains: Tabular, Image, and Text. We report empirical results for 6 widely used algorithms on 7 real-world and 2 synthetic datasets and aggregate them into a domain-specific ranking of AL algorithms.
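To make the object of comparison concrete, the sketch below shows a generic pool-based active-learning loop with least-confidence sampling; it only illustrates the class of algorithms the benchmark evaluates and is not the paper's framework or its oracle algorithm.

```python
# Generic pool-based active learning with least-confidence sampling (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learning_loop(X_pool, y_pool, X_seed, y_seed, budget=100, batch=10):
    X_lab, y_lab = X_seed.copy(), y_seed.copy()
    pool_idx = np.arange(len(X_pool))
    model = LogisticRegression(max_iter=1000)
    for _ in range(budget // batch):
        model.fit(X_lab, y_lab)
        probs = model.predict_proba(X_pool[pool_idx])
        uncertainty = 1.0 - probs.max(axis=1)              # least-confidence score
        pick = pool_idx[np.argsort(-uncertainty)[:batch]]  # query the most uncertain points
        X_lab = np.vstack([X_lab, X_pool[pick]])
        y_lab = np.concatenate([y_lab, y_pool[pick]])      # oracle provides the labels
        pool_idx = np.setdiff1d(pool_idx, pick)
    return model
```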

Tree-based Forecasting of Day-ahead Solar Power Generation from Granular Meteorological Features

  • paper_url: http://arxiv.org/abs/2312.00090
  • repo_url: None
  • paper_authors: Nick Berlanger, Noah van Ophoven, Tim Verdonck, Ines Wilms
  • for: To forecast day-ahead solar power generation in support of high PV penetration in the local electricity grid and stable grid operation.
  • methods: State-of-the-art tree-based machine learning methods are used, accounting for the effects of various meteorological and astronomical features on PV power production at both coarse and granular spatial locations.
  • results: Compared with existing studies, the approach produces better day-ahead PV forecasts and can help utilities, decision-makers, and other stakeholders optimize grid operations, economic dispatch, and the integration of distributed PV power.
    Abstract Accurate forecasts for day-ahead photovoltaic (PV) power generation are crucial to support a high PV penetration rate in the local electricity grid and to assure stability in the grid. We use state-of-the-art tree-based machine learning methods to produce such forecasts and, unlike previous studies, we hereby account for (i) the effects various meteorological as well as astronomical features have on PV power production, and this (ii) at coarse as well as granular spatial locations. To this end, we use data from Belgium and forecast day-ahead PV power production at an hourly resolution. The insights from our study can assist utilities, decision-makers, and other stakeholders in optimizing grid operations, economic dispatch, and in facilitating the integration of distributed PV power into the electricity grid.
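An illustrative sketch of a tree-based day-ahead forecaster is shown below; the synthetic features, their names, and the particular gradient-boosting model are assumptions and not the exact setup of the paper.

```python
# Illustrative tree-based day-ahead PV forecaster on synthetic hourly data.
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
n = 24 * 365                                           # one synthetic year of hourly data
df = pd.DataFrame({
    "ghi": rng.uniform(0, 900, n),                     # global horizontal irradiance
    "cloud_cover": rng.uniform(0, 1, n),
    "temperature": rng.normal(12, 8, n),
    "solar_zenith": rng.uniform(0, 90, n),
    "hour": np.tile(np.arange(24), n // 24),
})
df["pv_power_mw"] = 0.002 * df["ghi"] * (1 - 0.7 * df["cloud_cover"]) + rng.normal(0, 0.05, n)

model = HistGradientBoostingRegressor(max_depth=6, learning_rate=0.05)
scores = cross_val_score(model, df.drop(columns="pv_power_mw"), df["pv_power_mw"],
                         cv=TimeSeriesSplit(n_splits=5), scoring="neg_mean_absolute_error")
print("MAE per fold:", -scores)
```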

Reconstructing Historical Climate Fields With Deep Learning

  • paper_url: http://arxiv.org/abs/2311.18348
  • repo_url: None
  • paper_authors: Nils Bochow, Anna Poltronieri, Martin Rypdal, Niklas Boers
  • for: To fill gaps in historical climate records, especially for the period before large-scale satellite missions.
  • methods: A deep learning approach based on Fourier convolutions, trained on numerical climate model output, is used to reconstruct historical climate fields.
  • results: The model realistically reconstructs large and irregular areas of missing data as well as known historical events such as strong El Niño and La Niña episodes from very little given information; it generalizes beyond the resolution it was trained on and can be applied to a variety of climate fields.
    Abstract Historical records of climate fields are often sparse due to missing measurements, especially before the introduction of large-scale satellite missions. Several statistical and model-based methods have been introduced to fill gaps and reconstruct historical records. Here, we employ a recently introduced deep-learning approach based on Fourier convolutions, trained on numerical climate model output, to reconstruct historical climate fields. Using this approach we are able to realistically reconstruct large and irregular areas of missing data, as well as reconstruct known historical events such as strong El Ni\~no and La Ni\~na with very little given information. Our method outperforms the widely used statistical kriging method as well as other recent machine learning approaches. The model generalizes to higher resolutions than the ones it was trained on and can be used on a variety of climate fields. Moreover, it allows inpainting of masks never seen before during the model training.
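The sketch below shows a minimal spectral (Fourier) convolution block of the kind such inpainting models build on: the field is transformed with an FFT, mixed in the frequency domain, and transformed back. The surrounding network, masking strategy, and training loop are omitted, and this is not the authors' architecture.

```python
# Minimal Fourier-convolution block (illustrative, not the paper's model).
import torch
import torch.nn as nn

class SpectralConv2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolution applied to real and imaginary parts in the frequency domain
        self.freq_conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)

    def forward(self, x):                                   # x: (B, C, H, W) masked field
        b, c, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")             # (B, C, H, W//2+1), complex
        spec = torch.cat([spec.real, spec.imag], dim=1)     # to real-valued channels
        spec = self.freq_conv(spec)
        real, imag = spec.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

x = torch.randn(1, 4, 64, 64)            # e.g. a masked climate field with 4 channels
print(SpectralConv2d(4)(x).shape)        # torch.Size([1, 4, 64, 64])
```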

Learning Robust Precipitation Forecaster by Temporal Frame Interpolation

  • paper_url: http://arxiv.org/abs/2311.18341
  • repo_url: https://github.com/secilia-cxy/unettfi
  • paper_authors: Lu Han, Xu-Yang Chen, Han-Jia Ye, De-Chuan Zhan
  • for: To improve the accuracy of precipitation forecasting models, in particular their robustness to the spatial-temporal shifts encountered in real-world applications.
  • methods: Temporal Frame Interpolation (TFI) generates synthetic training samples by interpolating adjacent frames of satellite imagery and ground radar data, improving robustness to frame noise; a dedicated Multi-Level Dice (ML-Dice) loss exploits the ordinal nature of rainfall intensities to further improve performance.
  • results: The model secured 1st place on the transfer learning leaderboard of the Weather4cast'23 competition, demonstrating the effectiveness of the proposed methodology; comparisons with other models further illustrate its forecasting performance.
    Abstract Recent advances in deep learning have significantly elevated weather prediction models. However, these models often falter in real-world scenarios due to their sensitivity to spatial-temporal shifts. This issue is particularly acute in weather forecasting, where models are prone to overfit to local and temporal variations, especially when tasked with fine-grained predictions. In this paper, we address these challenges by developing a robust precipitation forecasting model that demonstrates resilience against such spatial-temporal discrepancies. We introduce Temporal Frame Interpolation (TFI), a novel technique that enhances the training dataset by generating synthetic samples through interpolating adjacent frames from satellite imagery and ground radar data, thus improving the model's robustness against frame noise. Moreover, we incorporate a unique Multi-Level Dice (ML-Dice) loss function, leveraging the ordinal nature of rainfall intensities to improve the model's performance. Our approach has led to significant improvements in forecasting precision, culminating in our model securing \textit{1st place} in the transfer learning leaderboard of the \textit{Weather4cast'23} competition. This achievement not only underscores the effectiveness of our methodologies but also establishes a new standard for deep learning applications in weather forecasting. Our code and weights have been public on \url{https://github.com/Secilia-Cxy/UNetTFI}.
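A bare-bones sketch of the two ingredients is given below. The linear frame blending and the intensity thresholds are assumptions for illustration, and the hard thresholding in the Dice term is shown for clarity rather than as the differentiable form one would actually train with.

```python
# Illustrative sketches of temporal frame interpolation (TFI) and a multi-level Dice score.
import torch

def interpolate_frames(x_t, x_t1, y_t, y_t1, alpha=0.5):
    """Blend consecutive satellite/radar frames (and their rain targets) into a synthetic sample."""
    return alpha * x_t + (1 - alpha) * x_t1, alpha * y_t + (1 - alpha) * y_t1

def multi_level_dice(pred_mm, target_mm, thresholds=(0.2, 1.0, 5.0), eps=1e-6):
    """Average Dice over nested rain-intensity levels (hard thresholds, evaluation-style)."""
    total = 0.0
    for thr in thresholds:
        p = (pred_mm >= thr).float()
        t = (target_mm >= thr).float()
        total += (2 * (p * t).sum() + eps) / (p.sum() + t.sum() + eps)
    return total / len(thresholds)
```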

Anomaly Detection via Learning-Based Sequential Controlled Sensing

  • paper_url: http://arxiv.org/abs/2312.00088
  • repo_url: None
  • paper_authors: Geethu Joseph, Chen Zhong, M. Cenk Gursoy, Senem Velipasalar, Pramod K. Varshney
  • for: To detect anomalies among a given set of binary processes.
  • methods: Learning-based controlled sensing: each process is parameterized by a binary random variable indicating whether it is anomalous, the decision-making agent observes a subset of the processes at each time instant, and probing each process incurs a cost. The task is cast as sequential hypothesis testing within a Markov decision process.
  • results: The problem is solved with two approaches, deep reinforcement learning (deep Q-learning and actor-critic) and deep active inference; numerical experiments show that the algorithms adapt to any unknown statistical dependence pattern among the processes.
    Abstract In this paper, we address the problem of detecting anomalies among a given set of binary processes via learning-based controlled sensing. Each process is parameterized by a binary random variable indicating whether the process is anomalous. To identify the anomalies, the decision-making agent is allowed to observe a subset of the processes at each time instant. Also, probing each process has an associated cost. Our objective is to design a sequential selection policy that dynamically determines which processes to observe at each time with the goal to minimize the delay in making the decision and the total sensing cost. We cast this problem as a sequential hypothesis testing problem within the framework of Markov decision processes. This formulation utilizes both a Bayesian log-likelihood ratio-based reward and an entropy-based reward. The problem is then solved using two approaches: 1) a deep reinforcement learning-based approach where we design both deep Q-learning and policy gradient actor-critic algorithms; and 2) a deep active inference-based approach. Using numerical experiments, we demonstrate the efficacy of our algorithms and show that our algorithms adapt to any unknown statistical dependence pattern of the processes.
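The sketch below shows the Bayesian belief update that forms the agent's state in such a controlled-sensing setup: for each probed process, the posterior probability of being anomalous is updated from the noisy observation. The Bernoulli observation model and its flip probabilities are assumptions for illustration.

```python
# Posterior update over per-process anomaly beliefs (illustrative observation model).
import numpy as np

def update_beliefs(beliefs, observed_idx, observations,
                   p_obs_given_anom=0.8, p_obs_given_norm=0.2):
    beliefs = beliefs.copy()
    for i, y in zip(observed_idx, observations):
        l1 = p_obs_given_anom if y == 1 else 1 - p_obs_given_anom
        l0 = p_obs_given_norm if y == 1 else 1 - p_obs_given_norm
        beliefs[i] = l1 * beliefs[i] / (l1 * beliefs[i] + l0 * (1 - beliefs[i]))
    return beliefs   # the RL / active-inference agent decides which processes to probe next

print(update_beliefs(np.full(5, 0.5), observed_idx=[0, 3], observations=[1, 0]))
```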

Learning for Semantic Knowledge Base-Guided Online Feature Transmission in Dynamic Channels

  • paper_url: http://arxiv.org/abs/2311.18316
  • repo_url: None
  • paper_authors: Xiangyu Gao, Yaping Sun, Dongyu Wei, Xiaodong Xu, Hao Chen, Hao Yin, Shuguang Cui
  • for: To improve the efficiency of AI inference at the edge for intelligent applications such as autonomous vehicles and VR/AR.
  • methods: An online optimization framework addresses the challenges of dynamic channel conditions and device mobility in an end-to-end communication system; it leverages a semantic knowledge base to drive multi-level feature transmission while accounting for temporal factors and dynamic elements throughout the transmission process.
  • results: A soft actor-critic deep reinforcement learning algorithm with a carefully designed reward function solves the online optimization problem in real time and outperforms traditional greedy methods under various system setups.
    Abstract With the proliferation of edge computing, efficient AI inference on edge devices has become essential for intelligent applications such as autonomous vehicles and VR/AR. In this context, we address the problem of efficient remote object recognition by optimizing feature transmission between mobile devices and edge servers. We propose an online optimization framework to address the challenge of dynamic channel conditions and device mobility in an end-to-end communication system. Our approach builds upon existing methods by leveraging a semantic knowledge base to drive multi-level feature transmission, accounting for temporal factors and dynamic elements throughout the transmission process. To solve the online optimization problem, we design a novel soft actor-critic-based deep reinforcement learning system with a carefully designed reward function for real-time decision-making, overcoming the optimization difficulty of the NP-hard problem and achieving the minimization of semantic loss while respecting latency constraints. Numerical results showcase the superiority of our approach compared to traditional greedy methods under various system setups.

Automatic Implementation of Neural Networks through Reaction Networks – Part I: Circuit Design and Convergence Analysis

  • paper_url: http://arxiv.org/abs/2311.18313
  • repo_url: None
  • paper_authors: Yuzhen Fan, Xiaoyu Zhang, Chuanhou Gao, Denis Dochain
  • for: To realize a programmable biochemical reaction network (BCRN) system that implements a fully connected neural network (FCNN) and can, in principle, operate automatically in vivo.
  • methods: The feedforward propagation computation, the backpropagation components, and all bridging processes of the FCNN are designed as specific BCRN modules governed by mass-action kinetics; this closes a design gap in the biochemical assignment and judgment-termination modules and provides a precise, robust bimolecular-reaction realization of the learning process.
  • results: Through equilibrium approaching, the designed BCRN system is shown to achieve FCNN functionality with exponential convergence to the target computational results; the construction is further evaluated on two typical logic classification problems.
    Abstract Information processing relying on biochemical interactions in the cellular environment is essential for biological organisms. The implementation of molecular computational systems holds significant interest and potential in the fields of synthetic biology and molecular computation. This two-part article aims to introduce a programmable biochemical reaction network (BCRN) system endowed with mass action kinetics that realizes the fully connected neural network (FCNN) and has the potential to act automatically in vivo. In part I, the feedforward propagation computation, the backpropagation component, and all bridging processes of FCNN are ingeniously designed as specific BCRN modules based on their dynamics. This approach addresses a design gap in the biochemical assignment module and judgment termination module and provides a novel precise and robust realization of bi-molecular reactions for the learning process. Through equilibrium approaching, we demonstrate that the designed BCRN system achieves FCNN functionality with exponential convergence to target computational results, thereby enhancing the theoretical support for such work. Finally, the performance of this construction is further evaluated on two typical logic classification problems.

PAUNet: Precipitation Attention-based U-Net for rain prediction from satellite radiance data

  • paper_url: http://arxiv.org/abs/2311.18306
  • repo_url: None
  • paper_authors: P. Jyoteeshkumar Reddy, Harish Baki, Sandeep Chinta, Richard Matear, John Taylor
  • for: To predict precipitation from satellite radiance data.
  • methods: PAUNet, a Precipitation Attention-based U-Net, uses encoder convolutional layers with center cropping and attention mechanisms to capture large-scale contextual information from multi-band satellite imagery in the visible, water vapor, and infrared bands.
  • results: Trained with a Focal Precipitation Loss extended by an exponential component (e-FPL) on a large dataset covering European regions, PAUNet achieves a higher Critical Success Index (CSI) than the baseline model when forecasting rainfall across different precipitation categories and multiple time slots.
    Abstract This paper introduces Precipitation Attention-based U-Net (PAUNet), a deep learning architecture for predicting precipitation from satellite radiance data, addressing the challenges of the Weather4cast 2023 competition. PAUNet is a variant of U-Net and Res-Net, designed to effectively capture the large-scale contextual information of multi-band satellite images in visible, water vapor, and infrared bands through encoder convolutional layers with center cropping and attention mechanisms. We built upon the Focal Precipitation Loss including an exponential component (e-FPL), which further enhanced the importance across different precipitation categories, particularly medium and heavy rain. Trained on a substantial dataset from various European regions, PAUNet demonstrates notable accuracy with a higher Critical Success Index (CSI) score than the baseline model in predicting rainfall over multiple time slots. PAUNet's architecture and training methodology showcase improvements in precipitation forecasting, crucial for sectors like emergency services and retail and supply chain management.

Semiparametric Efficient Inference in Adaptive Experiments

  • paper_url: http://arxiv.org/abs/2311.18274
  • repo_url: None
  • paper_authors: Thomas Cook, Alan Mishler, Aaditya Ramdas
  • for: To provide efficient inference of the Average Treatment Effect (ATE) in sequential experiments where the policy assigning subjects to treatment or control can change over time.
  • methods: An adaptive augmented inverse-probability weighted (AIPW) estimator that is semiparametric efficient under weaker assumptions than previously made in the literature; a central limit theorem for this estimator enables efficient inference at fixed sample sizes.
  • results: The derived asymptotic and nonasymptotic confidence sequences are considerably tighter than previous methods and remain anytime-valid under data-dependent stopping times (sample sizes); propensity score truncation techniques from the off-policy estimation literature further reduce finite-sample variance without affecting the asymptotic variance, and empirical results confirm narrower confidence sequences with time-uniform error control.
    Abstract We consider the problem of efficient inference of the Average Treatment Effect in a sequential experiment where the policy governing the assignment of subjects to treatment or control can change over time. We first provide a central limit theorem for the Adaptive Augmented Inverse-Probability Weighted estimator, which is semiparametric efficient, under weaker assumptions than those previously made in the literature. This central limit theorem enables efficient inference at fixed sample sizes. We then consider a sequential inference setting, deriving both asymptotic and nonasymptotic confidence sequences that are considerably tighter than previous methods. These anytime-valid methods enable inference under data-dependent stopping times (sample sizes). Additionally, we use propensity score truncation techniques from the recent off-policy estimation literature to reduce the finite sample variance of our estimator without affecting the asymptotic variance. Empirical results demonstrate that our methods yield narrower confidence sequences than those previously developed in the literature while maintaining time-uniform error control.
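For orientation, an augmented inverse-probability weighted (AIPW) estimator of the ATE in a sequential experiment has the familiar form

$$\hat{\tau}_T = \frac{1}{T}\sum_{t=1}^{T}\left[\hat{\mu}_1(X_t)-\hat{\mu}_0(X_t)+\frac{A_t\,\big(Y_t-\hat{\mu}_1(X_t)\big)}{e_t(X_t)}-\frac{(1-A_t)\,\big(Y_t-\hat{\mu}_0(X_t)\big)}{1-e_t(X_t)}\right],$$

where $A_t$ is the treatment indicator, $Y_t$ the outcome, $e_t$ the (possibly time-varying) assignment probability, and $\hat{\mu}_a$ the outcome-model estimates. This is the standard shape of the estimator family, shown only for reference; the paper's adaptive construction, in which the nuisance estimates are fitted sequentially, is what the new central limit theorem and confidence sequences cover.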

Learning Exactly Linearizable Deep Dynamics Models

  • paper_url: http://arxiv.org/abs/2311.18261
  • repo_url: None
  • paper_authors: Ryuta Moriyasu, Masayuki Kusunoki, Kenji Kashima
  • for: Practical engineering applications of control systems built on machine-learning-based models.
  • methods: A learning method for exactly linearizable dynamical models that makes it easy to apply various control theories to guarantee stability, reliability, and related properties while retaining a high degree of expressive freedom; as an example, a design combining simple linear control with control barrier functions is presented.
  • results: Applied to real-time control of an automotive engine, the model shows good predictive performance and stable control under constraints.
    Abstract Research on control using models based on machine-learning methods has now shifted to the practical engineering stage. Achieving high performance and theoretically guaranteeing the safety of the system is critical for such applications. In this paper, we propose a learning method for exactly linearizable dynamical models that can easily apply various control theories to ensure stability, reliability, etc., and to provide a high degree of freedom of expression. As an example, we present a design that combines simple linear control and control barrier functions. The proposed model is employed for the real-time control of an automotive engine, and the results demonstrate good predictive performance and stable control under constraints.
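As a generic illustration of an exactly linearizable structure, the sketch below parameterizes a control-affine model x+ = x + f(x) + G(x) u with small neural networks; the particular parameterization, constraints, and training procedure used in the paper are not reproduced here.

```python
# Generic control-affine neural dynamics model (illustrative structure only).
import torch
import torch.nn as nn

class ControlAffineDynamics(nn.Module):
    def __init__(self, state_dim, input_dim, hidden=64):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, state_dim))
        self.G = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, state_dim * input_dim))
        self.state_dim, self.input_dim = state_dim, input_dim

    def forward(self, x, u):                          # x: (B, n), u: (B, m)
        G = self.G(x).view(-1, self.state_dim, self.input_dim)
        return x + self.f(x) + torch.bmm(G, u.unsqueeze(-1)).squeeze(-1)

model = ControlAffineDynamics(state_dim=3, input_dim=1)
print(model(torch.zeros(8, 3), torch.zeros(8, 1)).shape)   # torch.Size([8, 3])
```

Because the input enters affinely, a feedback-linearizing or barrier-based controller can be derived in closed form whenever G(x) is invertible (or has a suitable pseudo-inverse), which is what makes such structures attractive for control design.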

Combined Scheduling, Memory Allocation and Tensor Replacement for Minimizing Off-Chip Data Accesses of DNN Accelerators

  • paper_url: http://arxiv.org/abs/2311.18246
  • repo_url: None
  • paper_authors: Yi Li, Aarti Gupta, Sharad Malik
  • for: To optimize the execution of deep neural networks (DNNs) on specialized hardware accelerators for better power and performance.
  • methods: COSMA, an optimization framework for mapping DNNs onto an accelerator, uses an integer linear programming (ILP) formulation to jointly determine the operator schedule, scratchpad memory allocation, and tensor replacement that minimize off-chip data accesses.
  • results: Using an off-the-shelf ILP solver, COSMA finds the optimal mapping in seconds for a wide range of state-of-the-art DNNs and reduces non-compulsory data accesses by 84% on average; a divide-and-conquer heuristic for complex DNNs generated by neural architecture search reduces data accesses by 85% on average compared with other works.
    Abstract Specialized hardware accelerators have been extensively used for Deep Neural Networks (DNNs) to provide power/performance benefits. These accelerators contain specialized hardware that supports DNN operators, and scratchpad memory for storing the tensor operands. Often, the size of the scratchpad is insufficient to store all the tensors needed for the computation, and additional data accesses are needed to move tensors back and forth from host memory during the computation with significant power/performance overhead. The volume of these additional data accesses depends on the operator schedule, and memory allocation (specific locations selected for the tensors in the scratchpad). We propose an optimization framework, named COSMA, for mapping DNNs to an accelerator that finds the optimal operator schedule, memory allocation and tensor replacement that minimizes the additional data accesses. COSMA provides an Integer Linear Programming (ILP) formulation to generate the optimal solution for mapping a DNN to the accelerator for a given scratchpad size. We demonstrate that, using an off-the-shelf ILP solver, COSMA obtains the optimal solution in seconds for a wide-range of state-of-the-art DNNs for different applications. Further, it out-performs existing methods by reducing on average 84% of the non-compulsory data accesses. We further propose a divide-and-conquer heuristic to scale up to certain complex DNNs generated by Neural Architecture Search, and this heuristic solution reduces on average 85% data accesses compared with other works.
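To give a flavor of the formulation, the toy ILP below chooses which tensors stay resident in a fixed-size scratchpad over their live ranges so as to minimize off-chip traffic; the tensor sizes, live ranges, objective, and the use of PuLP are illustrative simplifications rather than COSMA's actual variables and constraints.

```python
# Toy scratchpad-residency ILP (illustrative simplification of the idea).
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, PULP_CBC_CMD

tensors = {"A": dict(size=64, live=(0, 2)), "B": dict(size=96, live=(1, 3)),
           "C": dict(size=48, live=(2, 4)), "D": dict(size=80, live=(0, 4))}
capacity, steps = 160, 5

prob = LpProblem("scratchpad_residency", LpMinimize)
keep = {t: LpVariable(f"keep_{t}", cat=LpBinary) for t in tensors}
# Off-chip traffic is paid for every tensor that is not kept on-chip.
prob += lpSum((1 - keep[t]) * tensors[t]["size"] for t in tensors)
for s in range(steps):
    live = [t for t, d in tensors.items() if d["live"][0] <= s <= d["live"][1]]
    prob += lpSum(keep[t] * tensors[t]["size"] for t in live) <= capacity
prob.solve(PULP_CBC_CMD(msg=False))
print({t: int(keep[t].value()) for t in tensors})
```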

Poisoning Attacks Against Contrastive Recommender Systems

  • paper_url: http://arxiv.org/abs/2311.18244
  • repo_url: None
  • paper_authors: Zongwei Wang, Junliang Yu, Min Gao, Hongzhi Yin, Bin Cui, Shazia Sadiq
  • for: This paper focuses on the vulnerability of contrastive learning (CL) based recommendation systems to poisoning attacks, and aims to facilitate the development of more robust CL-based systems.
  • methods: The paper uses theoretical and empirical analysis to identify the vulnerability of CL-based systems and proposes a dual-objective attack framework to amplify the dispersion effect of the CL loss and directly elevate the visibility of target items.
  • results: The paper validates the destructiveness of the proposed attack model through extensive experimentation on four datasets, demonstrating the vulnerability of CL-based systems to poisoning attacks.
    Abstract Contrastive learning (CL) has recently gained significant popularity in the field of recommendation. Its ability to learn without heavy reliance on labeled data is a natural antidote to the data sparsity issue. Previous research has found that CL can not only enhance recommendation accuracy but also inadvertently exhibit remarkable robustness against noise. However, this paper identifies a vulnerability of CL-based recommender systems: Compared with their non-CL counterparts, they are even more susceptible to poisoning attacks that aim to promote target items. Our analysis points to the uniform dispersion of representations led by the CL loss as the very factor that accounts for this vulnerability. We further theoretically and empirically demonstrate that the optimization of CL loss can lead to smooth spectral values of representations. Based on these insights, we attempt to reveal the potential poisoning attacks against CL-based recommender systems. The proposed attack encompasses a dual-objective framework: One that induces a smoother spectral value distribution to amplify the CL loss's inherent dispersion effect, named dispersion promotion; and the other that directly elevates the visibility of target items, named rank promotion. We validate the destructiveness of our attack model through extensive experimentation on four datasets. By shedding light on these vulnerabilities, we aim to facilitate the development of more robust CL-based recommender systems.

  • paper_url: http://arxiv.org/abs/2312.02184
  • repo_url: None
  • paper_authors: Jiwei Zhao, Jiacheng Chen, Zeyu Sun, Yuhang Shi, Haibo Zhou, Xuemin, Shen
  • for: To address the transmission-mechanism problem in FD-RAN networks, improving spectrum utilization and lowering network costs.
  • methods: A novel transmission scheme that does not rely on physical-layer channel feedback: a radio map based complex-valued precoding network (RMCPNet) model outputs base-station precoding from the user location.
  • results: Evaluated on the public DeepMIMO dataset, RMCPNet achieves 16% and 76% performance improvements over a conventional real-valued neural network and a statistical codebook approach, respectively.
    Abstract As the demand for high-quality services proliferates, an innovative network architecture, the fully-decoupled RAN (FD-RAN), has emerged for more flexible spectrum resource utilization and lower network costs. However, with the decoupling of uplink base stations and downlink base stations in FD-RAN, the traditional transmission mechanism, which relies on real-time channel feedback, is not suitable as the receiver is not able to feedback accurate and timely channel state information to the transmitter. This paper proposes a novel transmission scheme without relying on physical layer channel feedback. Specifically, we design a radio map based complex-valued precoding network~(RMCPNet) model, which outputs the base station precoding based on user location. RMCPNet comprises multiple subnets, with each subnet responsible for extracting unique modal features from diverse input modalities. Furthermore, the multi-modal embeddings derived from these distinct subnets are integrated within the information fusion layer, culminating in a unified representation. We also develop a specific RMCPNet training algorithm that employs the negative spectral efficiency as the loss function. We evaluate the performance of the proposed scheme on the public DeepMIMO dataset and show that RMCPNet can achieve 16\% and 76\% performance improvements over the conventional real-valued neural network and statistical codebook approach, respectively.

PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design

  • paper_url: http://arxiv.org/abs/2312.00080
  • repo_url: https://github.com/wang-cr/pdb-struct
  • paper_authors: Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, Jian Tang
  • for: A comprehensive benchmark for evaluating structure-based protein design methods.
  • methods: Two new metrics: a refoldability-based metric that uses high-accuracy protein structure prediction models as a proxy for wet-lab experiments, and a stability-based metric that tests whether models assign high likelihoods to experimentally stable proteins.
  • results: On the PDB-Struct benchmark, ByProt, ProteinMPNN, and ESM-IF perform exceptionally well, while ESM-Design and AF-Design fall short on the refoldability metric. Code is available at https://github.com/WANG-CR/PDB-Struct.
    Abstract Structure-based protein design has attracted increasing interest, with numerous methods being introduced in recent years. However, a universally accepted method for evaluation has not been established, since the wet-lab validation can be overly time-consuming for the development of new algorithms, and the $\textit{in silico}$ validation with recovery and perplexity metrics is efficient but may not precisely reflect true foldability. To address this gap, we introduce two novel metrics: refoldability-based metric, which leverages high-accuracy protein structure prediction models as a proxy for wet lab experiments, and stability-based metric, which assesses whether models can assign high likelihoods to experimentally stable proteins. We curate datasets from high-quality CATH protein data, high-throughput $\textit{de novo}$ designed proteins, and mega-scale experimental mutagenesis experiments, and in doing so, present the $\textbf{PDB-Struct}$ benchmark that evaluates both recent and previously uncompared protein design methods. Experimental results indicate that ByProt, ProteinMPNN, and ESM-IF perform exceptionally well on our benchmark, while ESM-Design and AF-Design fall short on the refoldability metric. We also show that while some methods exhibit high sequence recovery, they do not perform as well on our new benchmark. Our proposed benchmark paves the way for a fair and comprehensive evaluation of protein design methods in the future. Code is available at https://github.com/WANG-CR/PDB-Struct.
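The refoldability check can be pictured as: design a sequence for a target backbone, predict a structure from that sequence with a folding model, and compare the two. The sketch below implements only the final comparison (Kabsch-aligned RMSD over matched coordinates); the structure predictor is abstracted away and the benchmark's exact scoring may differ.

```python
# Kabsch-aligned RMSD between two matched coordinate sets (e.g. designed vs. refolded CA atoms).
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between (N, 3) coordinate sets after optimal rigid superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))                 # avoid improper reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    return float(np.sqrt(np.mean(np.sum((P @ R - Q) ** 2, axis=1))))

P = np.random.default_rng(0).normal(size=(50, 3))
Q = P @ np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]]) + 2.0   # rotated and translated copy
print(round(kabsch_rmsd(P, Q), 6))                           # ~0.0
```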

Leveraging cache to enable SLU on tiny devices

  • paper_url: http://arxiv.org/abs/2311.18188
  • repo_url: None
  • paper_authors: Afsara Benazir, Zhiming Xu, Felix Xiaozhu Lin
  • for: This work studies spoken language understanding (SLU) on microcontroller-class embedded devices, combining on-device execution with cloud offloading: it exploits the temporal locality of a device's speech inputs, matches new inputs against cached results, and offloads only unmatched inputs to the cloud for full inference.
  • methods: XYZ, a speech cache for tiny devices, matches speech inputs at two levels of representation: first as clustered sequences of raw sound units, then as phoneme sequences; the two levels offer complementary cost/accuracy tradeoffs, and the cache continuously finetunes the device's feature extractors using the mismatched, offloaded inputs.
  • results: Implemented on an off-the-shelf STM32 microcontroller with a 2 MB memory footprint, the system resolves 45%-90% of inputs on device and reduces average latency by up to 80% compared with offloading to popular cloud speech services; the benefit persists in noisy environments, with a cold cache, and when one device is shared by multiple users.
    Abstract This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We exploit temporal locality in a device's speech inputs and accordingly reuse recent SLU inferences. Our idea is simple: let the device match new inputs against cached results, and only offload unmatched inputs to the cloud for full inference. Realization of this idea, however, is non-trivial: the device needs to compare acoustic features in a robust, low-cost way. To this end, we present XYZ, a speech cache for tiny devices. It matches speech inputs at two levels of representations: first by clustered sequences of raw sound units, then as sequences of phonemes. Working in tandem, the two representations offer complementary cost/accuracy tradeoffs. To further boost accuracy, our cache is learning: with the mismatched and then offloaded inputs, it continuously finetunes the device's feature extractors (with the assistance of the cloud). We implement XYZ on an off-the-shelf STM32 microcontroller. The resultant implementation has a small memory footprint of 2MB. Evaluated on challenging speech benchmarks, our system resolves 45%--90% of inputs on device, reducing the average latency by up to 80% compared to offloading to popular cloud speech services. Our benefit is pronounced even in adversarial settings -- noisy environments, cold cache, or one device shared by a number of users.
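A highly simplified picture of the cache lookup is sketched below: the phoneme sequence of a new utterance is matched against cached entries and only a miss is offloaded. The normalized edit-distance rule, the threshold, and the plain dictionary cache are assumptions for illustration; the actual system matches at two representation levels and also finetunes the on-device feature extractors.

```python
# Illustrative phoneme-level cache lookup with cloud fallback (not the paper's system).
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def lookup_or_offload(phonemes, cache, cloud_infer, threshold=0.2):
    def score(entry):
        return edit_distance(phonemes, entry[0]) / max(len(phonemes), 1)
    best = min(cache.items(), key=score, default=None)
    if best is not None and score(best) <= threshold:
        return best[1]                        # cache hit: reuse the previous SLU result
    result = cloud_infer(phonemes)            # cache miss: offload for full inference
    cache[tuple(phonemes)] = result
    return result

cache = {}
print(lookup_or_offload(("t", "er", "n", "aa", "n"), cache, cloud_infer=lambda p: "TURN_ON"))
```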

An Effective Universal Polynomial Basis for Spectral Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.18177
  • repo_url: None
  • paper_authors: Keke Huang, Pietro Liò
  • for: Spectral graph neural networks (graph filters) for heterophilic graphs, where optimal filters would otherwise rely on costly Laplacian eigendecomposition.
  • methods: A thorough theoretical analysis of the correlation between the polynomial bases of desired graph filters and graph heterophily degrees motivates an adaptive heterophily basis, which is integrated with a homophily basis to form a universal polynomial basis, UniBasis, and the resulting general polynomial filter UniFilter.
  • results: Comprehensive experiments on real-world and synthetic datasets with varying heterophily degrees show significant performance gains for UniFilter, demonstrating the effectiveness and generality of UniBasis and its promise as a new tool for graph analysis.
    Abstract Spectral Graph Neural Networks (GNNs), also referred to as graph filters have gained increasing prevalence for heterophily graphs. Optimal graph filters rely on Laplacian eigendecomposition for Fourier transform. In an attempt to avert the prohibitive computations, numerous polynomial filters by leveraging distinct polynomials have been proposed to approximate the desired graph filters. However, polynomials in the majority of polynomial filters are predefined and remain fixed across all graphs, failing to accommodate the diverse heterophily degrees across different graphs. To tackle this issue, we first investigate the correlation between polynomial bases of desired graph filters and the degrees of graph heterophily via a thorough theoretical analysis. Afterward, we develop an adaptive heterophily basis by incorporating graph heterophily degrees. Subsequently, we integrate this heterophily basis with the homophily basis, creating a universal polynomial basis UniBasis. In consequence, we devise a general polynomial filter UniFilter. Comprehensive experiments on both real-world and synthetic datasets with varying heterophily degrees significantly support the superiority of UniFilter, demonstrating the effectiveness and generality of UniBasis, as well as its promising capability as a new method for graph analysis.
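For orientation, the sketch below shows the generic shape of a polynomial spectral filter, a weighted sum of propagations sum_k w_k P_k(A_hat) X, using the plain monomial basis; UniBasis itself adapts the basis to the graph's heterophily degree, which is not reproduced here.

```python
# Generic K-order polynomial graph filter with a monomial basis (illustrative).
import torch

def polynomial_filter(adj_norm, x, weights):
    """adj_norm: normalized adjacency (N, N); x: features (N, F); weights: (K+1,) coefficients."""
    out = weights[0] * x
    prop = x
    for k in range(1, len(weights)):
        prop = adj_norm @ prop               # k-th propagation, i.e. A_hat^k X
        out = out + weights[k] * prop
    return out

adj = torch.rand(5, 5); adj = (adj + adj.T) / 2        # toy symmetric adjacency stand-in
print(polynomial_filter(adj, torch.randn(5, 16), torch.tensor([0.5, 0.3, 0.2])).shape)
```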

Packrat: Automatic Reconfiguration for Latency Minimization in CPU-based DNN Serving

  • paper_url: http://arxiv.org/abs/2311.18174
  • repo_url: None
  • paper_authors: Ankit Bhardwaj, Amar Phanishayee, Deepak Narayanan, Mihail Tarta, Ryan Stutsman
  • for: To push the performance limits of serving deep neural network (DNN) models on CPU-based servers.
  • methods: Because intra-operator parallelism across many threads yields diminishing returns, Packrat instead runs multiple model instances, each with a smaller batch size and fewer threads, and algorithmically picks the number of instances, the threads allocated to each, and the per-instance batch size that minimize inference latency.
  • results: Built as an extension to TorchServe with support for online reconfiguration to avoid serving downtime, Packrat improves inference latency by 1.43x to 1.83x on average across a range of batch sizes for commonly used DNNs.
    Abstract In this paper, we investigate how to push the performance limits of serving Deep Neural Network (DNN) models on CPU-based servers. Specifically, we observe that while intra-operator parallelism across multiple threads is an effective way to reduce inference latency, it provides diminishing returns. Our primary insight is that instead of running a single instance of a model with all available threads on a server, running multiple instances each with smaller batch sizes and fewer threads for intra-op parallelism can provide lower inference latency. However, the right configuration is hard to determine manually since it is workload- (DNN model and batch size used by the serving system) and deployment-dependent (number of CPU cores on server). We present Packrat, a new serving system for online inference that given a model and batch size ($B$) algorithmically picks the optimal number of instances ($i$), the number of threads each should be allocated ($t$), and the batch sizes each should operate on ($b$) that minimizes latency. Packrat is built as an extension to TorchServe and supports online reconfigurations to avoid serving downtime. Averaged across a range of batch sizes, Packrat improves inference latency by 1.43$\times$ to 1.83$\times$ on a range of commonly used DNNs.
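Conceptually, the search that Packrat automates can be pictured as the brute-force sweep below over (instances, threads, per-instance batch) for a fixed core budget; the latency values would come from profiling or a fitted model, and the real system's algorithm and online reconfiguration are not shown.

```python
# Brute-force version of the (instances, threads, batch) configuration search (illustrative).
def pick_config(total_cores, batch_size, measure_latency, max_instances=8):
    best = None
    for i in range(1, max_instances + 1):
        t = total_cores // i
        if t == 0 or batch_size % i != 0:
            continue
        latency = measure_latency(instances=i, threads=t, per_instance_batch=batch_size // i)
        if best is None or latency < best[0]:
            best = (latency, {"instances": i, "threads": t, "batch": batch_size // i})
    return best[1] if best else None

# Toy latency model (placeholder for real measurements).
toy = lambda instances, threads, per_instance_batch: per_instance_batch / threads + 0.5 * instances
print(pick_config(total_cores=16, batch_size=32, measure_latency=toy))
```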

Towards A Foundation Model For Trajectory Intelligence

  • paper_url: http://arxiv.org/abs/2312.00076
  • repo_url: None
  • paper_authors: Alameen Najjar
  • for: To train a large trajectory model on real-world user check-in data.
  • methods: A pre-train-and-fine-tune paradigm: a base model is pre-trained via masked trajectory modeling and then adapted through fine-tuning for various downstream tasks; a novel spatial tokenization block addresses the challenges of noisy data and large spatial vocabularies.
  • results: Using a dataset of over 2 billion check-ins from more than 6 million users and fine-tuning on three downstream tasks, the base model is shown to have learned valuable underlying patterns in the raw data, enabling its application to meaningful trajectory-intelligence tasks.
    Abstract We present the results of training a large trajectory model using real-world user check-in data. Our approach follows a pre-train and fine-tune paradigm, where a base model is pre-trained via masked trajectory modeling and then adapted through fine-tuning for various downstream tasks. To address challenges posed by noisy data and large spatial vocabularies, we propose a novel spatial tokenization block. Our empirical analysis utilizes a comprehensive dataset of over 2 billion check-ins generated by more than 6 million users. Through fine-tuning on 3 downstream tasks we demonstrate that our base model has effectively learned valuable underlying patterns in raw data, enabling its application in meaningful trajectory intelligence tasks. Despite some limitations, we believe this work represents an important step forward in the realization of a foundation model for trajectory intelligence.
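The sketch below illustrates the two named ingredients in their simplest possible form: a grid-based spatial tokenizer and random masking of a check-in sequence. The grid resolution, mask rate, and token format are assumptions and are not the paper's tokenization block.

```python
# Toy spatial tokenization and masked-trajectory batching (illustrative only).
import random

def spatial_token(lat, lon, cell_deg=0.01):
    """Map a check-in coordinate to a discrete grid-cell token."""
    return f"{int(lat // cell_deg)}_{int(lon // cell_deg)}"

def mask_trajectory(tokens, mask_rate=0.15, mask_token="[MASK]"):
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets.append(tok)       # the model must reconstruct the original cell
        else:
            masked.append(tok)
            targets.append(None)      # ignored by the loss
    return masked, targets

traj = [spatial_token(40.71 + 0.01 * i, -74.00) for i in range(5)]
print(mask_trajectory(traj, mask_rate=0.4))
```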