cs.LG - 2023-11-09

An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2311.05794
  • repo_url: None
  • paper_authors: Biyonka Liang, Iavor Bojinov
  • for: This paper provides a new experimental design for continuous inference on the average treatment effect (ATE) in multi-armed bandit (MAB) experiments.
  • methods: The paper develops the Mixture Adaptive Design (MAD), which permits continuous inference on the ATE as new data arrive while guaranteeing statistical validity and power for nearly any bandit algorithm (a mixing sketch follows this entry).
  • results: The study shows that the MAD improves the coverage and power of ATE inference in MAB experiments without significant losses in finite-sample reward.
    Abstract Typically, multi-armed bandit (MAB) experiments are analyzed at the end of the study and thus require the analyst to specify a fixed sample size in advance. However, in many online learning applications, it is advantageous to continuously produce inference on the average treatment effect (ATE) between arms as new data arrive and determine a data-driven stopping time for the experiment. Existing work on continuous inference for adaptive experiments assumes that the treatment assignment probabilities are bounded away from zero and one, thus excluding nearly all standard bandit algorithms. In this work, we develop the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandits that enables continuous inference on the ATE with guarantees on statistical validity and power for nearly any bandit algorithm. On a high level, the MAD "mixes" a bandit algorithm of the user's choice with a Bernoulli design through a tuning parameter $\delta_t$, where $\delta_t$ is a deterministic sequence that controls the priority placed on the Bernoulli design as the sample size grows. We show that for $\delta_t = o\left(1/t^{1/4}\right)$, the MAD produces a confidence sequence that is asymptotically valid and guaranteed to shrink around the true ATE. We empirically show that the MAD improves the coverage and power of ATE inference in MAB experiments without significant losses in finite-sample reward.
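
A minimal sketch of the mixing idea for a two-armed experiment: at step t the assignment comes from a Bernoulli (uniform random) design with probability delta_t and from the user's bandit algorithm otherwise. The Thompson-sampling stub and the particular delta_t schedule are illustrative assumptions, not the paper's implementation; the abstract only fixes the rate condition delta_t = o(1/t^{1/4}).

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_arm(successes, failures):
    """Stand-in bandit policy: Thompson sampling for two Bernoulli arms."""
    return int(np.argmax(rng.beta(successes + 1, failures + 1)))

def mad_assign(t, successes, failures, delta_fn=lambda t: t ** -0.3):
    """Mixture Adaptive Design assignment: Bernoulli design w.p. delta_t,
    otherwise the bandit's choice. delta_fn decays like t^{-0.3} here,
    consistent with the abstract's delta_t = o(1/t^{1/4}) condition."""
    if rng.random() < min(1.0, delta_fn(t)):
        return int(rng.integers(2))                    # Bernoulli(1/2) design
    return thompson_arm(successes, failures)           # user's bandit algorithm

# Tiny simulation with true arm means 0.4 and 0.6.
succ, fail = np.zeros(2), np.zeros(2)
for t in range(1, 2001):
    arm = mad_assign(t, succ, fail)
    reward = rng.random() < (0.4, 0.6)[arm]
    succ[arm] += reward
    fail[arm] += 1 - reward
print("pulls per arm:", (succ + fail).astype(int))
```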

Detecting Suspicious Commenter Mob Behaviors on YouTube Using Graph2Vec

  • paper_url: http://arxiv.org/abs/2311.05791
  • repo_url: None
  • paper_authors: Shadi Shajari, Mustafa Alassad, Nitin Agarwal
  • for: The study investigates suspicious commenter mob behaviors on YouTube channels and the similarities among them, to better understand how such behaviors arise and spread.
  • methods: A social network analysis-based methodology characterizes channels by the level of suspicious commenter mob-like behavior and identifies common patterns across them (an embedding sketch follows this entry).
  • results: The analysis reveals significant similarities among the studied channels, suggesting that the behavior may be driven by the same actors or organizations; these findings can inform strategies for addressing and mitigating such behavior on YouTube.
    Abstract YouTube, a widely popular online platform, has transformed the dynamics of content consumption and interaction for users worldwide. With its extensive range of content creators and viewers, YouTube serves as a hub for video sharing, entertainment, and information dissemination. However, the exponential growth of users and their active engagement on the platform has raised concerns regarding suspicious commenter behaviors, particularly in the comment section. This paper presents a social network analysis-based methodology for detecting suspicious commenter mob-like behaviors among YouTube channels and the similarities therein. The method aims to characterize channels based on the level of such behavior and identify common patterns across them. To evaluate the effectiveness of the proposed model, we conducted an analysis of 20 YouTube channels, consisting of 7,782 videos, 294,199 commenters, and 596,982 comments. These channels were specifically selected for propagating false views about the U.S. Military. The analysis revealed significant similarities among the channels, shedding light on the prevalence of suspicious commenter behavior. By understanding these similarities, we contribute to a better understanding of the dynamics of suspicious behavior on YouTube channels, which can inform strategies for addressing and mitigating such behavior.
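
The title points to Graph2Vec for whole-graph representations; a minimal sketch of that step, assuming the karateclub library's Graph2Vec implementation and using toy random graphs as stand-ins for per-channel commenter networks: embed each channel graph and compare channels by cosine similarity.

```python
import networkx as nx
import numpy as np
from karateclub import Graph2Vec
from sklearn.metrics.pairwise import cosine_similarity

# Toy "channel" graphs: nodes are commenters, edges mean two commenters
# co-commented on the same video. Nodes must be labeled 0..n-1 for karateclub.
channel_graphs = [nx.gnp_random_graph(60, p, seed=i)
                  for i, p in enumerate([0.05, 0.06, 0.2])]

model = Graph2Vec(dimensions=64, wl_iterations=2, epochs=20, min_count=1)
model.fit(channel_graphs)
emb = model.get_embedding()            # one vector per channel graph

sim = cosine_similarity(emb)           # channel-to-channel similarity matrix
print(np.round(sim, 2))
```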

Structured Transforms Across Spaces with Cost-Regularized Optimal Transport

  • paper_url: http://arxiv.org/abs/2311.05788
  • repo_url: None
  • paper_authors: Othmane Sebbouh, Marco Cuturi, Gabriel Peyré
  • for: The goal of this paper is to match probability measures that live in two different metric spaces.
  • methods: The paper builds on the linear optimal transport (OT) problem, parameterized by a ground cost that quantifies the discrepancy between points, and exploits a parallel between the Gromov-Wasserstein problem and cost-regularized OT.
  • results: The paper proposes matching measures across different Euclidean spaces via cost-regularized OT, shows that several quadratic OT problems fall into this category, enforces structure (e.g. sparsity) in the linear transform through structure-inducing regularizers, and provides a proximal algorithm to extract such transforms from unaligned data (a sketch follows this entry).
    Abstract Matching a source to a target probability measure is often solved by instantiating a linear optimal transport (OT) problem, parameterized by a ground cost function that quantifies discrepancy between points. When these measures live in the same metric space, the ground cost often defaults to its distance. When instantiated across two different spaces, however, choosing that cost in the absence of aligned data is a conundrum. As a result, practitioners often resort to solving instead a quadratic Gromov-Wasserstein (GW) problem. We exploit in this work a parallel between GW and cost-regularized OT, the regularized minimization of a linear OT objective parameterized by a ground cost. We use this cost-regularized formulation to match measures across two different Euclidean spaces, where the cost is evaluated between transformed source points and target points. We show that several quadratic OT problems fall in this category, and consider enforcing structure in linear transform (e.g. sparsity), by introducing structure-inducing regularizers. We provide a proximal algorithm to extract such transforms from unaligned data, and demonstrate its applicability to single-cell spatial transcriptomics/multiomics matching tasks.
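
A minimal sketch of the cost-regularized idea under illustrative assumptions, using the POT library: alternate between (i) solving a linear OT problem whose ground cost is evaluated between linearly transformed source points and target points, and (ii) a proximal gradient step on the transform with an l1 (sparsity-inducing) soft-thresholding prox. This is a simplified stand-in for the paper's proximal algorithm, not its exact procedure.

```python
import numpy as np
import ot  # Python Optimal Transport (POT)

rng = np.random.default_rng(0)
n, m, d_src, d_tgt = 60, 50, 5, 3
X = rng.standard_normal((n, d_src))                  # source points
Y = rng.standard_normal((m, d_tgt))                  # target points (different space)
a, b = np.full(n, 1 / n), np.full(m, 1 / m)          # uniform marginals

def soft_threshold(M, tau):
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

A = np.zeros((d_tgt, d_src))                          # linear transform, source -> target
lr, lam = 0.05, 0.05
for it in range(50):
    C = ot.dist(X @ A.T, Y)                           # squared-Euclidean ground cost
    P = ot.emd(a, b, C)                               # linear OT plan for the current cost
    # Gradient of sum_ij P_ij ||A x_i - y_j||^2 with respect to A:
    row = P.sum(axis=1)
    grad = 2.0 * (A @ (X.T @ (X * row[:, None])) - Y.T @ P.T @ X)
    A = soft_threshold(A - lr * grad, lr * lam)       # proximal (l1) step

print("nonzero entries of the learned transform:", int(np.count_nonzero(A)))
```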

Towards stable real-world equation discovery with assessing differentiating quality influence

  • paper_url: http://arxiv.org/abs/2311.05787
  • repo_url: None
  • paper_authors: Mikhail Masliaev, Ilya Markov, Alexander Hvatov
  • for: This paper investigates the central role that differentiation approaches play in data-driven differential equation discovery.
  • methods: Four differentiation methods are considered: Savitzky-Golay filtering, spectral differentiation, smoothing based on artificial neural networks, and regularization of the derivative variation (a small comparison sketch follows this entry).
  • results: The methods are evaluated for their applicability to problems similar to real ones and their ability to ensure convergence of equation discovery algorithms, providing valuable insights for robust modeling of real-world processes.
    Abstract This paper explores the critical role of differentiation approaches for data-driven differential equation discovery. Accurate derivatives of the input data are essential for reliable algorithmic operation, particularly in real-world scenarios where measurement quality is inevitably compromised. We propose alternatives to the commonly used finite differences-based method, notorious for its instability in the presence of noise, which can exacerbate random errors in the data. Our analysis covers four distinct methods: Savitzky-Golay filtering, spectral differentiation, smoothing based on artificial neural networks, and the regularization of derivative variation. We evaluate these methods in terms of applicability to problems, similar to the real ones, and their ability to ensure the convergence of equation discovery algorithms, providing valuable insights for robust modeling of real-world processes.
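
A minimal sketch of the kind of comparison the paper motivates, on an illustrative noisy signal: plain finite differences amplify measurement noise, while a Savitzky-Golay filter (one of the four methods above) estimates the derivative far more stably. Signal, noise level, and filter settings are illustrative.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 400)
dt = t[1] - t[0]
u = np.sin(t) + 0.02 * rng.standard_normal(t.size)   # noisy samples of sin(t)

du_fd = np.gradient(u, dt)                            # finite differences
du_sg = savgol_filter(u, window_length=31, polyorder=3, deriv=1, delta=dt)

true = np.cos(t)
print("finite-difference RMSE :", np.sqrt(np.mean((du_fd - true) ** 2)))
print("Savitzky-Golay RMSE    :", np.sqrt(np.mean((du_sg - true) ** 2)))
```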

Real-time Control of Electric Autonomous Mobility-on-Demand Systems via Graph Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.05780
  • repo_url: https://github.com/stanfordasl/graph-rl-for-eamod
  • paper_authors: Aaryan Singhal, Daniele Gammelli, Justin Luke, Karthik Gopalakrishnan, Dominik Helmreich, Marco Pavone
  • for: Improving real-time decision-making for electric autonomous mobility-on-demand fleets, including matching available vehicles to ride requests, rebalancing idle vehicles to areas of high demand, and charging vehicles to ensure sufficient range.
  • methods: A reinforcement learning approach, specifically a graph network-based RL framework within a bi-level formulation, is adopted to achieve drastically improved scalability and performance (a lower-level rebalancing sketch follows this entry).
  • results: Experiments using real-world data from San Francisco and New York City show that the method achieves up to 89% of the theoretically-optimal profits with a more than 100x speedup in computation time, outperforms the best domain-specific heuristics with up to 3x higher profits, and the learned policy shows promising zero-shot transfer to inter-city generalization and service-area expansion.
    Abstract Operators of Electric Autonomous Mobility-on-Demand (E-AMoD) fleets need to make several real-time decisions such as matching available cars to ride requests, rebalancing idle cars to areas of high demand, and charging vehicles to ensure sufficient range. While this problem can be posed as a linear program that optimizes flows over a space-charge-time graph, the size of the resulting optimization problem does not allow for real-time implementation in realistic settings. In this work, we present the E-AMoD control problem through the lens of reinforcement learning and propose a graph network-based framework to achieve drastically improved scalability and superior performance over heuristics. Specifically, we adopt a bi-level formulation where we (1) leverage a graph network-based RL agent to specify a desired next state in the space-charge graph, and (2) solve more tractable linear programs to best achieve the desired state while ensuring feasibility. Experiments using real-world data from San Francisco and New York City show that our approach achieves up to 89% of the profits of the theoretically-optimal solution while achieving more than a 100x speedup in computational time. Furthermore, our approach outperforms the best domain-specific heuristics with comparable runtimes, with an increase in profits by up to 3x. Finally, we highlight promising zero-shot transfer capabilities of our learned policy on tasks such as inter-city generalization and service area expansion, thus showing the utility, scalability, and flexibility of our framework.
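
A minimal sketch of the lower level of the bi-level scheme described above: given a desired idle-vehicle distribution (here a fixed stub standing in for the graph-RL agent's output), solve a small transportation linear program that rebalances vehicles at minimum travel cost. Region counts and costs are illustrative, and charge levels are ignored for brevity.

```python
import numpy as np
from scipy.optimize import linprog

current = np.array([12, 3, 5])          # idle vehicles per region
desired = np.array([6, 8, 6])           # target distribution from the policy
cost = np.array([[0.0, 2.0, 4.0],       # travel cost between regions
                 [2.0, 0.0, 1.5],
                 [4.0, 1.5, 0.0]])
n = len(current)

# Decision variables f[i, j]: vehicles moved from region i to region j.
c = cost.ravel()
A_eq, b_eq = [], []
for i in range(n):                      # outflow - inflow = current - desired
    row = np.zeros((n, n))
    row[i, :] += 1.0
    row[:, i] -= 1.0
    A_eq.append(row.ravel())
    b_eq.append(current[i] - desired[i])

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, None)] * (n * n), method="highs")
print("rebalancing flows:\n", res.x.reshape(n, n).round(1))
```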

Dirichlet Energy Enhancement of Graph Neural Networks by Framelet Augmentation

  • paper_url: http://arxiv.org/abs/2311.05767
  • repo_url: None
  • paper_authors: Jialin Chen, Yuelin Wang, Cristian Bodnar, Rex Ying, Pietro Lio, Yu Guang Wang
  • for: This paper targets the over-smoothing issue in graph neural networks and proposes an Energy Enhanced Convolution (EEConv) operation built on a framelet system.
  • methods: A framelet system is introduced into the analysis of Dirichlet energy, and a Framelet Augmentation strategy raises the energy by adjusting the update rules with positive and negative increments for the low-pass and high-passes respectively; EEConv is an effective and practical operation derived from this strategy and is proved to strictly enhance Dirichlet energy (an energy sketch follows the abstract below).
  • results: Experiments show that deep GNNs with EEConv achieve state-of-the-art performance on various node classification datasets, especially heterophilous graphs, while lifting the Dirichlet energy as the network goes deeper.
    Abstract Graph convolutions have been a pivotal element in learning graph representations. However, recursively aggregating neighboring information with graph convolutions leads to indistinguishable node features in deep layers, which is known as the over-smoothing issue. The performance of graph neural networks decays fast as the number of stacked layers increases, and the Dirichlet energy associated with the graph decreases to zero as well. In this work, we introduce a framelet system into the analysis of Dirichlet energy and take a multi-scale perspective to leverage the Dirichlet energy and alleviate the over-smoothing issue. Specifically, we develop a Framelet Augmentation strategy by adjusting the update rules with positive and negative increments for low-pass and high-passes respectively. Based on that, we design the Energy Enhanced Convolution (EEConv), which is an effective and practical operation that is proved to strictly enhance Dirichlet energy. From a message-passing perspective, EEConv inherits multi-hop aggregation property from the framelet transform and takes into account all hops in the multi-scale representation, which benefits the node classification tasks over heterophilous graphs. Experiments show that deep GNNs with EEConv achieve state-of-the-art performance over various node classification datasets, especially for heterophilous graphs, while also lifting the Dirichlet energy as the network goes deeper.
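
A minimal sketch of the quantity being tracked, using the unnormalized Dirichlet energy E(X) = 1/2 * sum_ij A_ij ||x_i - x_j||^2 on a toy graph (the paper may work with a normalized variant): identical node features, the over-smoothed limit, give exactly zero energy.

```python
import numpy as np

def dirichlet_energy(A, X):
    """E(X) = 1/2 * sum_ij A_ij * ||x_i - x_j||^2 (unnormalized variant)."""
    diff = X[:, None, :] - X[None, :, :]          # pairwise feature differences
    return 0.5 * np.sum(A[:, :, None] * diff ** 2)

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))

print("energy of random features      :", dirichlet_energy(A, X))
print("energy of identical features   :",
      dirichlet_energy(A, np.tile(X.mean(axis=0), (4, 1))))   # over-smoothed limit: 0
```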
    摘要 “几何对�ERT���ental���������������������缓�����������������缓�����������������缓�����������������缓������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������
��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

Generative Explanations for Graph Neural Network: Methods and Evaluations

  • paper_url: http://arxiv.org/abs/2311.05764
  • repo_url: None
  • paper_authors: Jialin Chen, Kenza Amara, Junchi Yu, Rex Ying
  • for: This paper focuses on the explainability of graph neural networks (GNNs) in graph-related prediction tasks.
  • methods: A unified optimization objective for generative explanation methods is proposed, comprising two sub-objectives: an Attribution constraint and an Information constraint.
  • results: Empirical results demonstrate the advantages and limitations of different explainability approaches in terms of explanation performance, efficiency, and generalizability.
    Abstract Graph Neural Networks (GNNs) achieve state-of-the-art performance in various graph-related tasks. However, the black-box nature often limits their interpretability and trustworthiness. Numerous explainability methods have been proposed to uncover the decision-making logic of GNNs, by generating underlying explanatory substructures. In this paper, we conduct a comprehensive review of the existing explanation methods for GNNs from the perspective of graph generation. Specifically, we propose a unified optimization objective for generative explanation methods, comprising two sub-objectives: Attribution and Information constraints. We further demonstrate their specific manifestations in various generative model architectures and different explanation scenarios. With the unified objective of the explanation problem, we reveal the shared characteristics and distinctions among current methods, laying the foundation for future methodological advancements. Empirical results demonstrate the advantages and limitations of different explainability approaches in terms of explanation performance, efficiency, and generalizability.

MALCOM-PSGD: Inexact Proximal Stochastic Gradient Descent for Communication-Efficient Decentralized Machine Learning

  • paper_url: http://arxiv.org/abs/2311.05760
  • repo_url: None
  • paper_authors: Andrew Campbell, Hang Liu, Leah Woldemariam, Anna Scaglione
  • for: This paper aims to improve the efficiency of decentralized machine learning by addressing the bottleneck of frequent model communication.
  • methods: The proposed method, MALCOM-PSGD, integrates gradient compression techniques with model sparsification; proximal stochastic gradient descent handles the non-smoothness resulting from $\ell_1$ regularization, and vector source coding with dithering-based quantization is used for compressed gradient communication of sparsified models (a proximal-step sketch follows this entry).
  • results: The method achieves a convergence rate of $\mathcal{O}\left(\ln(t)/\sqrt{t}\right)$ with a diminishing learning rate and reduces communication costs by approximately $75\%$ compared to the state-of-the-art method.
    Abstract Recent research indicates that frequent model communication stands as a major bottleneck to the efficiency of decentralized machine learning (ML), particularly for large-scale and over-parameterized neural networks (NNs). In this paper, we introduce MALCOM-PSGD, a new decentralized ML algorithm that strategically integrates gradient compression techniques with model sparsification. MALCOM-PSGD leverages proximal stochastic gradient descent to handle the non-smoothness resulting from the $\ell_1$ regularization in model sparsification. Furthermore, we adapt vector source coding and dithering-based quantization for compressed gradient communication of sparsified models. Our analysis shows that decentralized proximal stochastic gradient descent with compressed communication has a convergence rate of $\mathcal{O}\left(\ln(t)/\sqrt{t}\right)$ assuming a diminishing learning rate and where $t$ denotes the number of iterations. Numerical results verify our theoretical findings and demonstrate that our method reduces communication costs by approximately $75\%$ when compared to the state-of-the-art method.
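
A minimal sketch of one inexact proximal SGD step with l1 regularization (soft-thresholding prox), followed by dithered uniform quantization of the sparsified iterate before "communication". Step size, lambda, and the quantizer are illustrative choices, not the paper's exact coding scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def dithered_quantize(v, step=0.05):
    """Uniform quantization with subtractive random dither (unbiased in expectation)."""
    noise = rng.uniform(-0.5, 0.5, size=v.shape)
    return step * np.round(v / step + noise) - step * noise

# Toy sparse least-squares problem: minimize 0.5*||X w - y||^2 + lam*||w||_1
X = rng.standard_normal((200, 20))
w_true = np.zeros(20); w_true[:3] = [1.0, -2.0, 0.5]
y = X @ w_true + 0.1 * rng.standard_normal(200)

w, lr, lam = np.zeros(20), 0.01, 0.1
for t in range(200):
    i = rng.integers(0, 200, size=16)                  # mini-batch
    grad = X[i].T @ (X[i] @ w - y[i]) / len(i)         # stochastic gradient
    w = soft_threshold(w - lr * grad, lr * lam)        # proximal step
    msg = dithered_quantize(w)                         # what a node would transmit
print("nonzeros in w:", int(np.count_nonzero(np.abs(w) > 1e-8)))
```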

Deep Learning Architecture for Network-Efficiency at the Edge

  • paper_url: http://arxiv.org/abs/2311.05739
  • repo_url: None
  • paper_authors: Akrit Mudvari, Antero Vainio, Iason Ofeidis, Sasu Tarkoma, Leandros Tassiulas
  • for: This paper proposes a split learning approach suited to weaker devices, optimizing the network usage and response time of deep learning models with the help of edge-cloud resources.
  • methods: The method ('deprune') is an adaptive compression-aware split learning scheme in which learning adapts to the compression of the communicated data; an extension ('prune') uses a transfer learning approach to train models very quickly, trading a little accuracy for much more network-efficient inference.
  • results: 'deprune' reduces network usage by 4x compared with a split-learning approach that does not use the method, without loss of accuracy, while improving accuracy over compression-aware split learning by 4 percent; 'prune' reduces training time for certain models by up to 6x without affecting accuracy.
    Abstract The growing number of AI-driven applications in the mobile devices has led to solutions that integrate deep learning models with the available edge-cloud resources; due to multiple benefits such as reduction in on-device energy consumption, improved latency, improved network usage, and certain privacy improvements, split learning, where deep learning models are split away from the mobile device and computed in a distributed manner, has become an extensively explored topic. Combined with compression-aware methods where learning adapts to compression of communicated data, the benefits of this approach have further improved and could serve as an alternative to established approaches like federated learning methods. In this work, we develop an adaptive compression-aware split learning method ('deprune') to improve and train deep learning models so that they are much more network-efficient (use less network resources and are faster), which would make them ideal to deploy in weaker devices with the help of edge-cloud resources. This method is also extended ('prune') to very quickly train deep learning models, through a transfer learning approach, that trades off little accuracy for much more network-efficient inference abilities. We show that the 'deprune' method can reduce network usage by 4x when compared with a split-learning approach (that does not use our method) without loss of accuracy, while also improving accuracy over compression-aware split-learning by 4 percent. Lastly, we show that the 'prune' method can reduce the training time for certain models by up to 6x without affecting the accuracy when compared against a compression-aware split-learning approach.

LogShield: A Transformer-based APT Detection System Leveraging Self-Attention

  • paper_url: http://arxiv.org/abs/2311.05733
  • repo_url: None
  • paper_authors: Sihat Afnan, Mushtari Sadia, Shahrear Iqbal, Anindya Iqbal
  • for: This study explores whether transformer language models can detect advanced persistent threat (APT) attacks and proposes LogShield, a framework that leverages the self-attention of transformers to detect APT attack patterns.
  • methods: Customized embedding layers capture the context of event sequences derived from system provenance graphs, and the model parameters and training procedure are adopted from RoBERTa (a model-construction sketch follows this entry).
  • results: LogShield achieves F1 scores of 98% and 95% on the DARPA OpTC and DARPA TC E3 APT datasets respectively, surpassing the 96% and 94% obtained by LSTM models, and its performance benefits from larger datasets, indicating potential for generalization across diverse domains.
    Abstract Cyber attacks are often identified using system and network logs. There have been significant prior works that utilize provenance graphs and ML techniques to detect attacks, specifically advanced persistent threats, which are very difficult to detect. Lately, there have been studies where transformer-based language models are being used to detect various types of attacks from system logs. However, no such attempts have been made in the case of APTs. In addition, existing state-of-the-art techniques that use system provenance graphs, lack a data processing framework generalized across datasets for optimal performance. For mitigating this limitation as well as exploring the effectiveness of transformer-based language models, this paper proposes LogShield, a framework designed to detect APT attack patterns leveraging the power of self-attention in transformers. We incorporate customized embedding layers to effectively capture the context of event sequences derived from provenance graphs. While acknowledging the computational overhead associated with training transformer networks, our framework surpasses existing LSTM and Language models regarding APT detection. We integrated the model parameters and training procedure from the RoBERTa model and conducted extensive experiments on well-known APT datasets (DARPA OpTC and DARPA TC E3). Our framework achieved superior F1 scores of 98% and 95% on the two datasets respectively, surpassing the F1 scores of 96% and 94% obtained by LSTM models. Our findings suggest that LogShield's performance benefits from larger datasets and demonstrates its potential for generalization across diverse domains. These findings contribute to the advancement of APT attack detection methods and underscore the significance of transformer-based architectures in addressing security challenges in computer systems.
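
A minimal sketch, assuming the Hugging Face transformers library: a small RoBERTa-style sequence classifier over integer-encoded provenance-graph event sequences. The vocabulary, model sizes, and random event encoding are placeholders, not LogShield's custom embedding scheme.

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

config = RobertaConfig(
    vocab_size=4000,            # number of distinct event/entity tokens
    hidden_size=256, num_hidden_layers=4, num_attention_heads=4,
    intermediate_size=512, max_position_embeddings=514,
    num_labels=2,               # benign vs. APT-related sequence
)
model = RobertaForSequenceClassification(config)

# A batch of two toy event sequences (token id 1 is RoBERTa's pad id).
input_ids = torch.randint(2, config.vocab_size, (2, 128))
labels = torch.tensor([0, 1])
out = model(input_ids=input_ids, labels=labels)
print("loss:", float(out.loss), "logits shape:", tuple(out.logits.shape))
```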

Neural Network Methods for Radiation Detectors and Imaging

  • paper_url: http://arxiv.org/abs/2311.05726
  • repo_url: None
  • paper_authors: S. Lin, S. Ning, H. Zhu, T. Zhou, C. L. Morris, S. Clayton, M. Cherukara, R. T. Chen, Z. Wang
  • for: This paper provides an overview of recent advances in image data processing through machine learning and deep neural networks (DNNs) for radiation detectors and imaging hardware.
  • methods: The paper discusses deep learning-based methods for image processing tasks, including data generation at photon sources, and hardware solutions for deep learning acceleration.
  • results: The paper highlights the potential of next-generation analog neuromorphic hardware platforms, such as optical neural networks (ONNs), for high parallel, low latency, and low energy computing to boost deep learning acceleration.
    Abstract Recent advances in image data processing through machine learning and especially deep neural networks (DNNs) allow for new optimization and performance-enhancement schemes for radiation detectors and imaging hardware through data-endowed artificial intelligence. We give an overview of data generation at photon sources, deep learning-based methods for image processing tasks, and hardware solutions for deep learning acceleration. Most existing deep learning approaches are trained offline, typically using large amounts of computational resources. However, once trained, DNNs can achieve fast inference speeds and can be deployed to edge devices. A new trend is edge computing with less energy consumption (hundreds of watts or less) and real-time analysis potential. While popularly used for edge computing, electronic-based hardware accelerators ranging from general purpose processors such as central processing units (CPUs) to application-specific integrated circuits (ASICs) are constantly reaching performance limits in latency, energy consumption, and other physical constraints. These limits give rise to next-generation analog neuromorphic hardware platforms, such as optical neural networks (ONNs), for high parallel, low latency, and low energy computing to boost deep learning acceleration.

Verilog-to-PyG – A Framework for Graph Learning and Augmentation on RTL Designs

  • paper_url: http://arxiv.org/abs/2311.05722
  • repo_url: None
  • paper_authors: Yingjie Li, Mingju Liu, Alan Mishchenko, Cunxi Yu
  • for: This work provides an open-source framework that translates RTL designs into graph representations and integrates seamlessly with the PyTorch Geometric graph learning platform, accelerating design exploration of advanced RTL configurations.
  • methods: The Verilog-to-PyG (V2PYG) framework converts Verilog designs into graph representations, is compatible with the open-source OpenROAD EDA toolchain, and includes novel RTL data augmentation methods that produce functionally equivalent design variants (a PyTorch Geometric data sketch follows this entry).
  • results: The framework enables fully open-source collection of labeled datasets and the construction of an extensive graph-based RTL design database; several use cases are showcased with detailed scripting examples.
    Abstract The complexity of modern hardware designs necessitates advanced methodologies for optimizing and analyzing modern digital systems. In recent times, machine learning (ML) methodologies have emerged as potent instruments for assessing design quality-of-results at the Register-Transfer Level (RTL) or Boolean level, aiming to expedite design exploration of advanced RTL configurations. In this presentation, we introduce an innovative open-source framework that translates RTL designs into graph representation foundations, which can be seamlessly integrated with the PyTorch Geometric graph learning platform. Furthermore, the Verilog-to-PyG (V2PYG) framework is compatible with the open-source Electronic Design Automation (EDA) toolchain OpenROAD, facilitating the collection of labeled datasets in an utterly open-source manner. Additionally, we will present novel RTL data augmentation methods (incorporated in our framework) that enable functional equivalent design augmentation for the construction of an extensive graph-based RTL design database. Lastly, we will showcase several using cases of V2PYG with detailed scripting examples. V2PYG can be found at \url{https://yu-maryland.github.io/Verilog-to-PyG/}.
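
A minimal sketch, assuming PyTorch Geometric: packaging a tiny netlist-like graph (nodes with a few numeric attributes, directed edges for connectivity) into the torch_geometric Data object that frameworks such as V2PYG target. The node features and design-level label are placeholders, not V2PYG's actual encoding.

```python
import torch
from torch_geometric.data import Data

# 4 nodes with 3 illustrative features each (e.g. a cell-type flag and fan-out).
x = torch.tensor([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 0.0, 3.0],
                  [0.0, 1.0, 1.0]])
edge_index = torch.tensor([[0, 0, 1, 2],     # source nodes
                           [1, 2, 3, 3]])    # target nodes
y = torch.tensor([0])                        # a design-level label (e.g. a QoR bin)

design = Data(x=x, edge_index=edge_index, y=y)
print(design)                                # Data(x=[4, 3], edge_index=[2, 4], y=[1])
```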

Efficient Parallelization Layouts for Large-Scale Distributed Model Training

  • paper_url: http://arxiv.org/abs/2311.05610
  • repo_url: https://github.com/aleph-alpha/neurips-want-submission-efficient-parallelization-layouts
  • paper_authors: Johannes Hagemann, Samuel Weinbach, Konstantin Dobler, Maximilian Schall, Gerard de Melo
  • for: Efficient large-scale training of large language models.
  • methods: Training is parallelized across hundreds of hardware accelerators with various compute and memory optimizations, including recent ones such as FlashAttention and sequence parallelism, and a comprehensive ablation over possible training configurations is distilled into key recommendations (a small utilization sketch follows this entry).
  • results: The most efficient configurations achieve state-of-the-art training efficiency across a range of model sizes, most notably a model FLOPs utilization of 70.5% when training a 13B model.
    Abstract Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training efficiency. Prior work tackling this problem did not have access to the latest set of optimizations, such as FlashAttention or sequence parallelism. In this work, we conduct a comprehensive ablation study of possible training configurations for large language models. We distill this large study into several key recommendations for the most efficient training. For instance, we find that using a micro-batch size of 1 usually enables the most efficient training layouts. Larger micro-batch sizes necessitate activation checkpointing or higher degrees of model parallelism and also lead to larger pipeline bubbles. Our most efficient configurations enable us to achieve state-of-the-art training efficiency results over a range of model sizes, most notably a Model FLOPs utilization of 70.5% when training a 13B model.
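
A minimal sketch of two quantities this kind of ablation reasons about: the pipeline "bubble" fraction for a given number of stages and micro-batches, and model FLOPs utilization (MFU) from the common 6*N FLOPs-per-token estimate. Both formulas are standard approximations and the example numbers are hypothetical, not figures from the paper.

```python
def pipeline_bubble_fraction(stages: int, microbatches: int) -> float:
    """Idle fraction of an idealized GPipe/1F1B-style pipeline schedule."""
    return (stages - 1) / (microbatches + stages - 1)

def model_flops_utilization(params: float, tokens_per_s: float,
                            peak_flops_per_s: float) -> float:
    """MFU ~= (6 * params * tokens/s) / peak FLOP/s."""
    return 6.0 * params * tokens_per_s / peak_flops_per_s

# Fewer micro-batches (e.g. larger micro-batch sizes) mean a bigger pipeline bubble.
for m in (4, 16, 64):
    print(f"stages=8, microbatches={m}: bubble = {pipeline_bubble_fraction(8, m):.2%}")

# Hypothetical 13B-parameter run on 64 accelerators at 312 TFLOP/s peak each.
print("MFU:", f"{model_flops_utilization(13e9, 160_000, 64 * 312e12):.1%}")
```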

Diffusion-Generative Multi-Fidelity Learning for Physical Simulation

  • paper_url: http://arxiv.org/abs/2311.05606
  • repo_url: None
  • paper_authors: Zheng Wang, Shibo Li, Shikai Fang, Shandian Zhe
  • for: This paper addresses multi-fidelity surrogate learning for physical simulation, which avoids running numerical solvers from scratch and greatly reduces the cost of data collection.
  • methods: A diffusion-generative multi-fidelity (DGMF) learning method is developed based on stochastic differential equations, where generation is a continuous denoising process; a conditional score model controls solution generation given the input parameters and the fidelity, and conditioning on additional temporal or spatial inputs lets the model efficiently learn and predict multi-dimensional solution arrays.
  • results: The method naturally unifies discrete and continuous fidelity modeling and shows advantages in several typical applications, pointing to a promising new direction for multi-fidelity learning.
    Abstract Multi-fidelity surrogate learning is important for physical simulation related applications in that it avoids running numerical solvers from scratch, which is known to be costly, and it uses multi-fidelity examples for training and greatly reduces the cost of data collection. Despite the variety of existing methods, they all build a model to map the input parameters outright to the solution output. Inspired by the recent breakthrough in generative models, we take an alternative view and consider the solution output as generated from random noises. We develop a diffusion-generative multi-fidelity (DGMF) learning method based on stochastic differential equations (SDE), where the generation is a continuous denoising process. We propose a conditional score model to control the solution generation by the input parameters and the fidelity. By conditioning on additional inputs (temporal or spacial variables), our model can efficiently learn and predict multi-dimensional solution arrays. Our method naturally unifies discrete and continuous fidelity modeling. The advantage of our method in several typical applications shows a promising new direction for multi-fidelity learning.

Sorting Out Quantum Monte Carlo

  • paper_url: http://arxiv.org/abs/2311.05598
  • repo_url: None
  • paper_authors: Jack Richter-Powell, Luca Thiede, Alán Aspuru-Guzik, David Duvenaud
  • for: This paper proposes a new antisymmetrization layer derived from sorting, the "sortlet", to enforce fermionic exchange antisymmetry in quantum-level molecular models.
  • methods: The sortlet is applied on top of an attention-based neural-network backbone, yielding a flexible wavefunction parameterization that scales as O(N log N) in the number of particles rather than the O(N^3) of determinant-based antisymmetrization (a toy sign-of-permutation sketch follows this entry).
  • results: Numerical studies show that this approach can reach chemical accuracy when approximating the ground state of first-row atoms and small molecules. (The term "sortlet" is introduced in the paper for the antisymmetrization layer derived from sorting.)
    Abstract Molecular modeling at the quantum level requires choosing a parameterization of the wavefunction that both respects the required particle symmetries, and is scalable to systems of many particles. For the simulation of fermions, valid parameterizations must be antisymmetric with respect to the exchange of particles. Typically, antisymmetry is enforced by leveraging the anti-symmetry of determinants with respect to the exchange of matrix rows, but this involves computing a full determinant each time the wavefunction is evaluated. Instead, we introduce a new antisymmetrization layer derived from sorting, the $\textit{sortlet}$, which scales as $O(N \log N)$ with regards to the number of particles -- in contrast to $O(N^3)$ for the determinant. We show numerically that applying this anti-symmeterization layer on top of an attention based neural-network backbone yields a flexible wavefunction parameterization capable of reaching chemical accuracy when approximating the ground state of first-row atoms and small molecules.
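
A toy illustration (not the paper's sortlet construction) of how sorting can supply fermionic antisymmetry: compute a permutation-invariant value from per-particle scalars and multiply it by the sign of the permutation that sorts them, so that exchanging two particles flips the overall sign while the magnitude is unchanged.

```python
import numpy as np

def permutation_sign(order):
    """Sign of the permutation given as an index array (via cycle parity)."""
    seen, sign = np.zeros(len(order), dtype=bool), 1
    for start in range(len(order)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j], j, length = True, order[j], length + 1
            sign *= -1 if length % 2 == 0 else 1
    return sign

def antisymmetric_value(alpha):
    """alpha: per-particle scalars (stand-in for a learned per-particle map)."""
    order = np.argsort(alpha)
    symmetric_part = np.prod(np.sort(alpha) + np.arange(len(alpha)))
    return permutation_sign(order) * symmetric_part

x = np.array([0.3, 1.7, 0.9])
x_swapped = x[[1, 0, 2]]                      # exchange particles 0 and 1
print(antisymmetric_value(x), antisymmetric_value(x_swapped))  # equal magnitude, opposite sign
```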

A Coefficient Makes SVRG Effective

  • paper_url: http://arxiv.org/abs/2311.05589
  • repo_url: https://github.com/davidyyd/alpha-svrg
  • paper_authors: Yida Yin, Zhiqiu Xu, Zhiyuan Li, Trevor Darrell, Zhuang Liu
  • for: Optimizing real-world neural networks.
  • methods: The Stochastic Variance Reduced Gradient (SVRG) method is modified with a multiplicative coefficient α that controls the strength of the variance reduction term and is adjusted with a linear decay schedule (an update-rule sketch follows this entry).
  • results: For deeper networks the variance reduction term should be weaker and decrease as training progresses; α-SVRG consistently reduces training loss compared to both the baseline and standard SVRG across various architectures and image classification datasets.
    Abstract Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method. However, as Defazio & Bottou (2019) highlights, its effectiveness in deep learning is yet to be proven. In this work, we demonstrate the potential of SVRG in optimizing real-world neural networks. Our analysis finds that, for deeper networks, the strength of the variance reduction term in SVRG should be smaller and decrease as training progresses. Inspired by this, we introduce a multiplicative coefficient $\alpha$ to control the strength and adjust it through a linear decay schedule. We name our method $\alpha$-SVRG. Our results show $\alpha$-SVRG better optimizes neural networks, consistently reducing training loss compared to both baseline and the standard SVRG across various architectures and image classification datasets. We hope our findings encourage further exploration into variance reduction techniques in deep learning. Code is available at https://github.com/davidyyd/alpha-SVRG.
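
A minimal sketch of the α-SVRG gradient estimator on a toy least-squares problem: g = grad_i(w) - alpha_t * (grad_i(w_snap) - full_grad(w_snap)), with alpha_t decayed linearly over training (alpha_t = 1 recovers standard SVRG). Hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(512)

def grad(w, idx):                              # gradient of 0.5*mean((Xw - y)^2) on a batch
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

w, lr, alpha0, epochs = np.zeros(10), 0.05, 0.75, 20
for epoch in range(epochs):
    alpha_t = alpha0 * (1 - epoch / epochs)    # linear decay of the coefficient
    w_snap = w.copy()
    full_g = grad(w_snap, np.arange(len(y)))   # full gradient at the snapshot
    for _ in range(64):
        i = rng.integers(0, len(y), size=8)
        g = grad(w, i) - alpha_t * (grad(w_snap, i) - full_g)
        w -= lr * g
print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))
```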

Bayesian Methods for Media Mix Modelling with shape and funnel effects

  • paper_url: http://arxiv.org/abs/2311.05587
  • repo_url: None
  • paper_authors: Javier Marin
  • for: This study explores potential uses of the Maxwell-Boltzmann equation and the Michaelis-Menten model in Marketing Mix Modelling (MMM) applications.
  • methods: The paper proposes incorporating these equations into hierarchical Bayesian models to analyse consumer behaviour in the context of advertising (a saturation-curve sketch follows this entry).
  • results: These equation sets excel at accurately describing the random dynamics in complex systems such as social interactions and consumer-advertising interactions.
    Abstract In recent years, significant progress in generative AI has highlighted the important role of physics-inspired models that utilize advanced mathematical concepts based on fundamental physics principles to enhance artificial intelligence capabilities. Among these models, those based on diffusion equations have greatly improved image quality. This study aims to explore the potential uses of Maxwell-Boltzmann equation, which forms the basis of the kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix Modelling (MMM) applications. We propose incorporating these equations into Hierarchical Bayesian models to analyse consumer behaviour in the context of advertising. These equation sets excel in accurately describing the random dynamics in complex systems like social interactions and consumer-advertising interactions.
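
A minimal sketch of the Michaelis-Menten form used as a media-response (shape/saturation) curve: response = v_max * spend / (k + spend). The parameter values are illustrative; in the paper this sits inside a hierarchical Bayesian model rather than being evaluated directly.

```python
import numpy as np

def michaelis_menten(spend, v_max, k):
    """Saturating response: ~linear for spend << k, approaches v_max for spend >> k."""
    return v_max * spend / (k + spend)

spend = np.linspace(0, 100, 6)
print(np.round(michaelis_menten(spend, v_max=50.0, k=20.0), 1))
# Half-saturation: the response at spend = k equals v_max / 2.
print(michaelis_menten(20.0, v_max=50.0, k=20.0))  # 25.0
```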

Outlier-Robust Wasserstein DRO

  • paper_url: http://arxiv.org/abs/2311.05573
  • repo_url: https://github.com/sbnietert/outlier-robust-wdro
  • paper_authors: Sloan Nietert, Ziv Goldfeld, Soroosh Shafiee
  • for: This work develops a distributionally robust optimization approach for data-driven decision-making that remains reliable under both geometric (Wasserstein) perturbations and adversarial outliers.
  • methods: The proposed outlier-robust Wasserstein DRO builds an uncertainty set from a robust Wasserstein ball that accounts for Wasserstein perturbations and total variation (TV) contamination, allowing an ε-fraction of the data to be arbitrarily corrupted; minimax optimal excess risk bounds and a strong duality result enable tractable convex reformulations and efficient computation.
  • results: The method avoids the distortions that adversarial outliers induce in standard WDRO, and when the loss depends only on low-dimensional features of the data, certain dimension dependencies in the risk bounds are eliminated; experiments validate the theory on standard regression and classification tasks.
    Abstract Distributionally robust optimization (DRO) is an effective approach for data-driven decision-making in the presence of uncertainty. Geometric uncertainty due to sampling or localized perturbations of data points is captured by Wasserstein DRO (WDRO), which seeks to learn a model that performs uniformly well over a Wasserstein ball centered around the observed data distribution. However, WDRO fails to account for non-geometric perturbations such as adversarial outliers, which can greatly distort the Wasserstein distance measurement and impede the learned model. We address this gap by proposing a novel outlier-robust WDRO framework for decision-making under both geometric (Wasserstein) perturbations and non-geometric (total variation (TV)) contamination that allows an $\varepsilon$-fraction of data to be arbitrarily corrupted. We design an uncertainty set using a certain robust Wasserstein ball that accounts for both perturbation types and derive minimax optimal excess risk bounds for this procedure that explicitly capture the Wasserstein and TV risks. We prove a strong duality result that enables tractable convex reformulations and efficient computation of our outlier-robust WDRO problem. When the loss function depends only on low-dimensional features of the data, we eliminate certain dimension dependencies from the risk bounds that are unavoidable in the general setting. Finally, we present experiments validating our theory on standard regression and classification tasks.

Exploiting Neural-Network Statistics for Low-Power DNN Inference

  • paper_url: http://arxiv.org/abs/2311.05557
  • repo_url: None
  • paper_authors: Lennart Bamberg, Ardalan Najafi, Alberto Garcia-Ortiz
  • for: This paper aims to improve the power and energy efficiency of edge-AI inference engines.
  • methods: An overhead-free coding technique is combined with a statistical analysis of the data and parameters of neural networks to reduce interconnect and on-chip memory power consumption.
  • results: The technique reduces interconnect and memory power consumption by up to 80% on state-of-the-art benchmarks and provides additional power savings of up to 39% for the compute blocks, with no loss of accuracy and negligible hardware cost.
    Abstract Specialized compute blocks have been developed for efficient DNN execution. However, due to the vast amount of data and parameter movements, the interconnects and on-chip memories form another bottleneck, impairing power and performance. This work addresses this bottleneck by contributing a low-power technique for edge-AI inference engines that combines overhead-free coding with a statistical analysis of the data and parameters of neural networks. Our approach reduces the interconnect and memory power consumption by up to 80% for state-of-the-art benchmarks while providing additional power savings for the compute blocks by up to 39%. These power improvements are achieved with no loss of accuracy and negligible hardware cost.

Information-theoretic generalization bounds for learning from quantum data

  • paper_url: http://arxiv.org/abs/2311.05529
  • repo_url: None
  • paper_authors: Matthias Caro, Tom Gur, Cambyse Rouzé, Daniel Stilck França, Sathyawageeswar Subramanian
  • for: This paper provides a general mathematical formalism for describing quantum learning by training on classical-quantum data, and uses it to bound the expected generalization error of a quantum learner.
  • methods: Tools from quantum optimal transport and quantum concentration inequalities are used to establish non-commutative versions of the decoupling lemmas that underlie recent information-theoretic generalization bounds for classical machine learning.
  • results: The framework yields intuitively accessible generalization bounds for a variety of quantum learning scenarios, including quantum state discrimination, PAC learning of quantum states, quantum parameter estimation, and quantumly PAC learning classical functions, laying a foundation for a unifying quantum information-theoretic perspective on quantum learning.
    Abstract Learning tasks play an increasingly prominent role in quantum information and computation. They range from fundamental problems such as state discrimination and metrology over the framework of quantum probably approximately correct (PAC) learning, to the recently proposed shadow variants of state tomography. However, the many directions of quantum learning theory have so far evolved separately. We propose a general mathematical formalism for describing quantum learning by training on classical-quantum data and then testing how well the learned hypothesis generalizes to new data. In this framework, we prove bounds on the expected generalization error of a quantum learner in terms of classical and quantum information-theoretic quantities measuring how strongly the learner's hypothesis depends on the specific data seen during training. To achieve this, we use tools from quantum optimal transport and quantum concentration inequalities to establish non-commutative versions of decoupling lemmas that underlie recent information-theoretic generalization bounds for classical machine learning. Our framework encompasses and gives intuitively accessible generalization bounds for a variety of quantum learning scenarios such as quantum state discrimination, PAC learning quantum states, quantum parameter estimation, and quantumly PAC learning classical functions. Thereby, our work lays a foundation for a unifying quantum information-theoretic perspective on quantum learning.

Dirichlet Active Learning

  • paper_url: http://arxiv.org/abs/2311.05501
  • repo_url: None
  • paper_authors: Kevin Miller, Ryan Murray
  • for: This paper introduces Dirichlet Active Learning (DiAL), a Bayesian-inspired framework for the design of active learning algorithms.
  • methods: The framework models feature-conditional class probabilities as a Dirichlet random field and lends observational strength between similar features to calibrate the field; for low-label-rate graph learning, "propagation operators" are constructed from the graph Laplacian (an acquisition sketch follows this entry).
  • results: Computational studies demonstrate that the method is competitive with the state of the art when labeled data is scarce, and rigorous guarantees are provided for both exploration (cluster exploration) and exploitation (increased attention to decision boundaries).
    Abstract This work introduces Dirichlet Active Learning (DiAL), a Bayesian-inspired approach to the design of active learning algorithms. Our framework models feature-conditional class probabilities as a Dirichlet random field and lends observational strength between similar features in order to calibrate the random field. This random field can then be utilized in learning tasks: in particular, we can use current estimates of mean and variance to conduct classification and active learning in the context where labeled data is scarce. We demonstrate the applicability of this model to low-label rate graph learning by constructing ``propagation operators'' based upon the graph Laplacian, and offer computational studies demonstrating the method's competitiveness with the state of the art. Finally, we provide rigorous guarantees regarding the ability of this approach to ensure both exploration and exploitation, expressed respectively in terms of cluster exploration and increased attention to decision boundaries.
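
A minimal sketch of the Dirichlet idea on a graph, under illustrative assumptions: labels contribute counts that are spread to similar nodes by a propagation operator (here a few steps of random-walk averaging, a stand-in for the paper's graph-Laplacian-based operators); each node then carries a Dirichlet distribution over classes and the next query targets the node whose predicted class distribution is most uncertain.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_classes = 8, 2
A = (rng.random((n, n)) < 0.35).astype(float)
A = np.triu(A, 1); A = A + A.T                       # symmetric toy adjacency
P = A / np.maximum(A.sum(axis=1, keepdims=True), 1)  # random-walk operator

labeled = {0: 0, 5: 1}                               # node -> observed class
counts = np.zeros((n, n_classes))
for node, c in labeled.items():
    counts[node, c] = 1.0
for _ in range(3):                                   # propagate observational strength
    counts = 0.5 * counts + 0.5 * P @ counts

alpha = 1.0 + counts                                 # Dirichlet parameters per node
mean = alpha / alpha.sum(axis=1, keepdims=True)
entropy = -(mean * np.log(mean)).sum(axis=1)         # uncertainty of predicted classes
entropy[list(labeled)] = -np.inf                     # never re-query labeled nodes
print("query node:", int(np.argmax(entropy)))
```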

Disease Gene Prioritization With Quantum Walks

  • paper_url: http://arxiv.org/abs/2311.05486
  • repo_url: None
  • paper_authors: Harto Saarinen, Mark Goldsmith, Rui-Sheng Wang, Joseph Loscalzo, Sabrina Maniscalco
  • for: This paper proposes a new disease gene prioritization algorithm based on continuous-time quantum walks over the adjacency matrix of a protein-protein interaction (PPI) network.
  • methods: The algorithm performs a continuous-time quantum walk and encodes seed-node self-loops into the underlying Hamiltonian (a small walk sketch follows this entry).
  • results: Compared with several well-known gene prioritization methods on three disease sets across seven PPI networks, evaluated by cross-validation with mean reciprocal ranks and recall, the method performs favourably; an enrichment analysis of the genes predicted for coronary artery disease provides further validation, and the self-loops help the quantum walker remain more local to low-degree seed nodes.
    Abstract Disease gene prioritization assigns scores to genes or proteins according to their likely relevance for a given disease based on a provided set of seed genes. Here, we describe a new algorithm for disease gene prioritization based on continuous-time quantum walks using the adjacency matrix of a protein-protein interaction (PPI) network. Our algorithm can be seen as a quantum version of a previous method known as the diffusion kernel, but, importantly, has higher performance in predicting disease genes, and also permits the encoding of seed node self-loops into the underlying Hamiltonian, which offers yet another boost in performance. We demonstrate the success of our proposed method by comparing it to several well-known gene prioritization methods on three disease sets, across seven different PPI networks. In order to compare these methods, we use cross-validation and examine the mean reciprocal ranks and recall values. We further validate our method by performing an enrichment analysis of the predicted genes for coronary artery disease. We also investigate the impact of adding self-loops to the seeds, and argue that they allow the quantum walker to remain more local to low-degree seed nodes.
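
A minimal sketch of a continuous-time quantum walk for seed-based ranking on a toy graph: the Hamiltonian is the adjacency matrix plus self-loops on the seed nodes, the walk starts on the seed set, and genes are scored by the time-averaged occupation probability |⟨i| e^{-iHt} |seed⟩|^2. The graph, walk times, and self-loop weight are illustrative, not the paper's settings.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)
seeds, loop_weight = [0], 2.0

H = A.copy()
H[seeds, seeds] += loop_weight                 # encode seed self-loops in the Hamiltonian

psi0 = np.zeros(len(A), dtype=complex)
psi0[seeds] = 1.0 / np.sqrt(len(seeds))        # start on the seed set

scores = np.zeros(len(A))
for t in np.linspace(0.5, 5.0, 10):            # average over several walk times
    psi_t = expm(-1j * H * t) @ psi0
    scores += np.abs(psi_t) ** 2 / 10
print("prioritization scores:", np.round(scores, 3))
```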

Do Ensembling and Meta-Learning Improve Outlier Detection in Randomized Controlled Trials?

  • paper_url: http://arxiv.org/abs/2311.05473
  • repo_url: https://github.com/hamilton-health-sciences/ml4h-traq
  • paper_authors: Walter Nelson, Jonathan Ranisau, Jeremy Petch
  • for: This paper studies outlier detection for identifying irregular data in modern multi-centre randomized controlled trials (MCRCTs).
  • methods: Six modern machine learning-based outlier detection algorithms are empirically evaluated on 838 datasets from 7 real-world MCRCTs with a total of 77,001 patients from over 44 countries, and a simple Meta-learned Probabilistic Ensemble (MePE) is proposed to aggregate the predictions of multiple unsupervised models (an aggregation sketch follows this entry).
  • results: Existing algorithms often identify irregularities without supervision, with at least one algorithm performing positively 70.6% of the time, but performance varies substantially across datasets and no single algorithm is consistently best; MePE compares favourably with recent meta-learning approaches for outlier detection model selection, although small ensembles outperform all forms of meta-learning on average.
    Abstract Modern multi-centre randomized controlled trials (MCRCTs) collect massive amounts of tabular data, and are monitored intensively for irregularities by humans. We began by empirically evaluating 6 modern machine learning-based outlier detection algorithms on the task of identifying irregular data in 838 datasets from 7 real-world MCRCTs with a total of 77,001 patients from over 44 countries. Our results reinforce key findings from prior work in the outlier detection literature on data from other domains. Existing algorithms often succeed at identifying irregularities without any supervision, with at least one algorithm exhibiting positive performance 70.6% of the time. However, performance across datasets varies substantially with no single algorithm performing consistently well, motivating new techniques for unsupervised model selection or other means of aggregating potentially discordant predictions from multiple candidate models. We propose the Meta-learned Probabilistic Ensemble (MePE), a simple algorithm for aggregating the predictions of multiple unsupervised models, and show that it performs favourably compared to recent meta-learning approaches for outlier detection model selection. While meta-learning shows promise, small ensembles outperform all forms of meta-learning on average, a negative result that may guide the application of current outlier detection approaches in healthcare and other real-world domains.
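
A simplified sketch (not MePE itself): run two unsupervised detectors from scikit-learn, rank-normalize their outlier scores so that values near 1 mean "most outlying", and average them into a single ensemble score. The toy data mixes a Gaussian blob with a few injected outliers.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((200, 4)),
               rng.standard_normal((5, 4)) * 0.5 + 6.0])   # 5 injected outliers

iso = IsolationForest(random_state=0).fit(X)
scores = {
    "isolation_forest": -iso.score_samples(X),              # higher = more outlying
    "lof": -LocalOutlierFactor(n_neighbors=20).fit(X).negative_outlier_factor_,
}

norm = [rankdata(s) / len(s) for s in scores.values()]       # rank-normalize to (0, 1]
ensemble = np.mean(norm, axis=0)
print("top-5 suspected outliers:", np.argsort(-ensemble)[:5])  # expect indices 200..204
```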

A Practical Approach to Novel Class Discovery in Tabular Data

  • paper_url: http://arxiv.org/abs/2311.05440
  • repo_url: None
  • paper_authors: Colin Troisemaine, Alexandre Reiffers-Masson, Stéphane Gosselin, Vincent Lemaire, Sandrine Vaton
  • for: This paper addresses Novel Class Discovery (NCD): extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes.
  • methods: An NCD method for tabular data is proposed that requires no prior knowledge of the novel classes, including their number; hyperparameters are tuned by adapting the k-fold cross-validation process and hiding some known classes in each fold, a simple deep NCD model composed of only the essential elements of the problem is defined, and two unsupervised clustering algorithms (k-means and Spectral Clustering) are adapted to leverage knowledge of the known classes.
  • results: Experiments on 7 tabular datasets demonstrate the effectiveness of the proposed method and hyperparameter tuning process, the latent space of the model can be used to reliably estimate the number of novel classes, and the NCD problem can be solved without relying on knowledge from the novel classes.
    Abstract The problem of Novel Class Discovery (NCD) consists in extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes. While NCD has recently received a lot of attention from the community, it is often solved on computer vision problems and under unrealistic conditions. In particular, the number of novel classes is usually assumed to be known in advance, and their labels are sometimes used to tune hyperparameters. Methods that rely on these assumptions are not applicable in real-world scenarios. In this work, we focus on solving NCD in tabular data when no prior knowledge of the novel classes is available. To this end, we propose to tune the hyperparameters of NCD methods by adapting the $k$-fold cross-validation process and hiding some of the known classes in each fold. Since we have found that methods with too many hyperparameters are likely to overfit these hidden classes, we define a simple deep NCD model. This method is composed of only the essential elements necessary for the NCD problem and performs impressively well under realistic conditions. Furthermore, we find that the latent space of this method can be used to reliably estimate the number of novel classes. Additionally, we adapt two unsupervised clustering algorithms ($k$-means and Spectral Clustering) to leverage the knowledge of the known classes. Extensive experiments are conducted on 7 tabular datasets and demonstrate the effectiveness of the proposed method and hyperparameter tuning process, and show that the NCD problem can be solved without relying on knowledge from the novel classes.
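The hyperparameter tuning procedure described in the abstract — hide some known classes in each fold, treat them as novel, and score candidate hyperparameters on them — can be sketched as follows. This is an illustrative simplification assuming scikit-learn, with $k$-means standing in for the NCD model and a hypothetical hyperparameter grid; it is not the paper's exact protocol.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.metrics import adjusted_rand_score

X, y = load_digits(return_X_y=True)          # 10 known classes stand in for the labeled set
known_classes = np.unique(y)
candidate_params = [2, 5, 10, 20]             # hypothetical hyperparameter grid (here: KMeans n_init)
n_folds, n_hidden = 5, 3
rng = np.random.default_rng(0)

scores = {p: [] for p in candidate_params}
for fold in range(n_folds):
    # Hide a few known classes: they play the role of "novel" classes in this fold.
    hidden = rng.choice(known_classes, size=n_hidden, replace=False)
    mask = np.isin(y, hidden)
    X_novel, y_novel = X[mask], y[mask]
    for p in candidate_params:
        # Stand-in "NCD model": cluster the hidden classes and score against their true labels.
        pred = KMeans(n_clusters=n_hidden, n_init=p, random_state=fold).fit_predict(X_novel)
        scores[p].append(adjusted_rand_score(y_novel, pred))

best = max(candidate_params, key=lambda p: np.mean(scores[p]))
print({p: round(float(np.mean(s)), 3) for p, s in scores.items()}, "-> best:", best)
```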

Fair Wasserstein Coresets

  • paper_url: http://arxiv.org/abs/2311.05436
  • repo_url: None
  • paper_authors: Zikai Xiong, Niccolò Dalmasso, Vamsi K. Potluru, Tucker Balch, Manuela Veloso
  • for: This paper is written for those who are interested in developing fair and representative machine learning models, particularly in the context of decision-making processes.
  • methods: The paper proposes a novel approach called Fair Wasserstein Coresets (FWC), which generates fair synthetic representative samples and sample-level weights for downstream learning tasks. FWC minimizes the Wasserstein distance between the original datasets and the weighted synthetic samples while enforcing demographic parity.
  • results: The paper shows that FWC can be thought of as a constrained version of Lloyd’s algorithm for k-medians or k-means clustering, and demonstrates the scalability of the approach through experiments conducted on both synthetic and real datasets. The results also highlight the competitive performance of FWC compared to existing fair clustering approaches, even when attempting to enhance the fairness of the latter through fair pre-processing techniques.
    Abstract Recent technological advancements have given rise to the ability of collecting vast amounts of data, that often exceed the capacity of commonly used machine learning algorithms. Approaches such as coresets and synthetic data distillation have emerged as frameworks to generate a smaller, yet representative, set of samples for downstream training. As machine learning is increasingly applied to decision-making processes, it becomes imperative for modelers to consider and address biases in the data concerning subgroups defined by factors like race, gender, or other sensitive attributes. Current approaches focus on creating fair synthetic representative samples by optimizing local properties relative to the original samples. These methods, however, are not guaranteed to positively affect the performance or fairness of downstream learning processes. In this work, we present Fair Wasserstein Coresets (FWC), a novel coreset approach which generates fair synthetic representative samples along with sample-level weights to be used in downstream learning tasks. FWC aims to minimize the Wasserstein distance between the original datasets and the weighted synthetic samples while enforcing (an empirical version of) demographic parity, a prominent criterion for algorithmic fairness, via a linear constraint. We show that FWC can be thought of as a constrained version of Lloyd's algorithm for k-medians or k-means clustering. Our experiments, conducted on both synthetic and real datasets, demonstrate the scalability of our approach and highlight the competitive performance of FWC compared to existing fair clustering approaches, even when attempting to enhance the fairness of the latter through fair pre-processing techniques.

Parkinson’s Disease Detection through Vocal Biomarkers and Advanced Machine Learning Algorithms: A Comprehensive Study

  • paper_url: http://arxiv.org/abs/2311.05435
  • repo_url: None
  • paper_authors: Md Abu Sayed, Sabbir Ahamed, Duc M Cao, Md Eyasin Ul Islam Pavel, Malay Sarkar, Md Tuhin Mia
  • for: Predicting the onset of Parkinson's disease from vocal feature alterations.
  • methods: Advanced machine learning algorithms, including XGBoost, LightGBM, Bagging, AdaBoost, and Support Vector Machine, are evaluated on their ability to predict Parkinson's disease using metrics such as accuracy, AUC, sensitivity, and specificity.
  • results: The LightGBM model achieves an accuracy of 96%, an AUC of 96%, a sensitivity of 100%, and a specificity of 94.43%, surpassing the other machine learning algorithms in accuracy and AUC.
    Abstract Parkinson's disease (PD) is a prevalent neurodegenerative disorder known for its impact on motor neurons, causing symptoms like tremors, stiffness, and gait difficulties. This study explores the potential of vocal feature alterations in PD patients as a means of early disease prediction. This research aims to predict the onset of Parkinson's disease. Utilizing a variety of advanced machine-learning algorithms, including XGBoost, LightGBM, Bagging, AdaBoost, and Support Vector Machine, among others, the study evaluates the predictive performance of these models using metrics such as accuracy, area under the curve (AUC), sensitivity, and specificity. The findings of this comprehensive analysis highlight LightGBM as the most effective model, achieving an impressive accuracy rate of 96%, alongside a matching AUC of 96%. LightGBM exhibited a remarkable sensitivity of 100% and specificity of 94.43%, surpassing other machine learning algorithms in accuracy and AUC scores. Given the complexities of Parkinson's disease and its challenges in early diagnosis, this study underscores the significance of leveraging vocal biomarkers coupled with advanced machine-learning techniques for precise and timely PD detection.
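A minimal sketch of the evaluation pipeline described above, assuming the lightgbm and scikit-learn packages; the study's actual vocal-feature dataset is not specified here, so a synthetic classification problem stands in for the feature matrix and PD labels.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for extracted vocal features and PD / healthy labels.
X, y = make_classification(n_samples=800, n_features=22, n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

model = LGBMClassifier(n_estimators=300, learning_rate=0.05, random_state=0)
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
pred = (proba >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()

print(f"accuracy   : {accuracy_score(y_te, pred):.3f}")
print(f"AUC        : {roc_auc_score(y_te, proba):.3f}")
print(f"sensitivity: {tp / (tp + fn):.3f}")   # true positive rate
print(f"specificity: {tn / (tn + fp):.3f}")   # true negative rate
```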

Taxonomy for Resident Space Objects in LEO: A Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2311.05430
  • repo_url: None
  • paper_authors: Marta Guimarães, Cláudia Soares, Chiara Manfletti
  • for: Improving the management of Resident Space Objects (RSOs) in the low Earth orbit regime, reducing risk for all direct and indirect users of space.
  • methods: A new taxonomy assigns RSOs to classes based on their main characteristics to enhance space traffic management. In addition, a deep learning model with an autoencoder architecture reduces the dimensionality of the RSO feature representation, and techniques such as Uniform Manifold Approximation and Projection are used to identify fundamental clusters of RSOs.
  • results: The proposed taxonomy and deep learning model lead to a better understanding of RSO behaviour and make space traffic management more efficient and effective.
    Abstract The increasing number of RSOs has raised concerns about the risk of collisions and catastrophic incidents for all direct and indirect users of space. To mitigate this issue, it is essential to have a good understanding of the various RSOs in orbit and their behaviour. A well-established taxonomy defining several classes of RSOs is a critical step in achieving this understanding. This taxonomy helps assign objects to specific categories based on their main characteristics, leading to better tracking services. Furthermore, a well-established taxonomy can facilitate research and analysis processes by providing a common language and framework for better understanding the factors that influence RSO behaviour in space. These factors, in turn, help design more efficient and effective strategies for space traffic management. Our work proposes a new taxonomy for RSOs focusing on the low Earth orbit regime to enhance space traffic management. In addition, we present a deep learning-based model that uses an autoencoder architecture to reduce the features representing the characteristics of the RSOs. The autoencoder generates a lower-dimensional space representation that is then explored using techniques such as Uniform Manifold Approximation and Projection to identify fundamental clusters of RSOs based on their unique characteristics. This approach captures the complex and non-linear relationships between the features and the RSOs' classes identified. Our proposed taxonomy and model offer a significant contribution to the ongoing efforts to mitigate the overall risks posed by the increasing number of RSOs in orbit.
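The dimensionality-reduction-then-clustering pipeline can be sketched as below. This is a generic illustration rather than the paper's architecture: a small PyTorch autoencoder compresses placeholder RSO features, and the umap-learn and scikit-learn packages are assumed for the projection and clustering steps.

```python
import numpy as np
import torch
from torch import nn
import umap                          # pip install umap-learn
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 16)).astype(np.float32)   # placeholder RSO feature matrix

class AutoEncoder(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder(X.shape[1])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.from_numpy(X)
for epoch in range(200):                              # short reconstruction training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(data), data)
    loss.backward()
    opt.step()

with torch.no_grad():
    latent = model.encoder(data).numpy()              # lower-dimensional representation

# Non-linear projection of the latent space, then clustering into candidate RSO groups.
embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(latent)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embedding)
print("cluster sizes:", np.bincount(labels))
```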

Statistical Learning of Conjunction Data Messages Through a Bayesian Non-Homogeneous Poisson Process

  • paper_url: http://arxiv.org/abs/2311.05426
  • repo_url: None
  • paper_authors: Marta Guimarães, Cláudia Soares, Chiara Manfletti
  • for: Improving current collision avoidance and space traffic management approaches, which face growing challenges as the number of objects in orbit keeps increasing.
  • methods: A Bayesian non-homogeneous Poisson process model, implemented with high precision using a Probabilistic Programming Language, fully describes the arrival process of conjunction data messages (CDMs).
  • results: Compared with a baseline model, the Bayesian non-homogeneous Poisson process models CDM arrivals more accurately, helping operators react to conjunctions in a timely manner without resorting to unnecessary manoeuvres.
    Abstract Current approaches for collision avoidance and space traffic management face many challenges, mainly due to the continuous increase in the number of objects in orbit and the lack of scalable and automated solutions. To avoid catastrophic incidents, satellite owners/operators must be aware of their assets' collision risk to decide whether a collision avoidance manoeuvre needs to be performed. This process is typically executed through the use of warnings issued in the form of CDMs which contain information about the event, such as the expected TCA and the probability of collision. Our previous work presented a statistical learning model that allowed us to answer two important questions: (1) Will any new conjunctions be issued in the next specified time interval? (2) When and with what uncertainty will the next CDM arrive? However, the model was based on an empirical Bayes homogeneous Poisson process, which assumes that the arrival rates of CDMs are constant over time. In fact, the rate at which the CDMs are issued depends on the behaviour of the objects as well as on the screening process performed by third parties. Thus, in this work, we extend the previous study and propose a Bayesian non-homogeneous Poisson process implemented with high precision using a Probabilistic Programming Language to fully describe the underlying phenomena. We compare the proposed solution with a baseline model to demonstrate the added value of our approach. The results show that this problem can be successfully modelled by our Bayesian non-homogeneous Poisson Process with greater accuracy, contributing to the development of automated collision avoidance systems and helping operators react timely but sparingly with satellite manoeuvres.
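One common way to fit a non-homogeneous Poisson process in a probabilistic programming language is to bin time and model the per-bin counts with a time-varying rate. The sketch below uses PyMC with a simple log-linear intensity $\lambda(t)=\exp(\alpha+\beta t)$ on simulated CDM arrival counts; the paper's actual intensity model, priors, and data are not reproduced here.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# Simulated CDM arrival counts per time bin, with a rate that grows toward the TCA.
n_bins, bin_width = 40, 0.25                       # e.g. 10 days split into 6-hour bins
t = (np.arange(n_bins) + 0.5) * bin_width
true_rate = np.exp(-1.0 + 0.2 * t)                 # events per unit time
counts = rng.poisson(true_rate * bin_width)

with pm.Model() as nhpp:
    alpha = pm.Normal("alpha", mu=0.0, sigma=2.0)
    beta = pm.Normal("beta", mu=0.0, sigma=1.0)
    lam = pm.math.exp(alpha + beta * t)            # time-varying intensity lambda(t)
    # Counts in each bin are Poisson with mean equal to the integral of lambda over the bin,
    # approximated here by lambda(midpoint) * bin_width.
    pm.Poisson("obs", mu=lam * bin_width, observed=counts)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0, progressbar=False)

print(idata.posterior["beta"].mean().item())       # posterior mean of the trend parameter
```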

Diffusion Based Causal Representation Learning

  • paper_url: http://arxiv.org/abs/2311.05421
  • repo_url: None
  • paper_authors: Amir Mohammad Karimi Mamaghan, Andrea Dittadi, Stefan Bauer, Karl Henrik Johansson, Francesco Quinzan
  • for: Proposes Diffusion-based Causal Representation Learning (DCRL), a new algorithm for causal representation learning.
  • methods: The algorithm uses diffusion-based representations for causal discovery, giving access to (infinite-dimensional) latent codes that encode different levels of information.
  • results: Experiments show that DCRL performs comparably to traditional Variational Auto-Encoder (VAE) based approaches in identifying the causal structure and causal variables.
    Abstract Causal reasoning can be considered a cornerstone of intelligent systems. Having access to an underlying causal graph comes with the promise of cause-effect estimation and the identification of efficient and safe interventions. However, learning causal representations remains a major challenge, due to the complexity of many real-world systems. Previous works on causal representation learning have mostly focused on Variational Auto-Encoders (VAE). These methods only provide representations from a point estimate, and they are unsuitable to handle high dimensions. To overcome these problems, we proposed a new Diffusion-based Causal Representation Learning (DCRL) algorithm. This algorithm uses diffusion-based representations for causal discovery. DCRL offers access to infinite dimensional latent codes, which encode different levels of information in the latent code. In a first proof of principle, we investigate the use of DCRL for causal representation learning. We further demonstrate experimentally that this approach performs comparably well in identifying the causal structure and causal variables.

Counterfactually Fair Representation

  • paper_url: http://arxiv.org/abs/2311.05420
  • repo_url: https://github.com/osu-srml/cf_representation_learning
  • paper_authors: Zhiqun Zuo, Mohammad Mahdi Khalili, Xueru Zhang
  • for: Proposes a new algorithm for learning fair machine learning models in high-stakes applications (e.g., healthcare, lending, college admission), avoiding bias against protected social groups.
  • methods: Builds on Counterfactual Fairness (CF), a fairness notion that depends on an underlying causal graph and was first proposed by Kusner et al. (Reference [1]). Learning fair models that satisfy CF can be challenging; Reference [1] showed that not using features that are descendants of the sensitive attributes is sufficient for CF, while subsequent works that train CF models on all features come with no theoretical guarantee. This work instead proposes a new algorithm that trains CF models using all available features, with both theoretical and empirical support.
  • results: Theoretical and empirical results show that models trained with the proposed algorithm satisfy CF, enabling their use in high-stakes applications without bias against protected social groups. The code repository is available at https://github.com/osu-srml/CF_Representation_Learning.
    Abstract The use of machine learning models in high-stake applications (e.g., healthcare, lending, college admission) has raised growing concerns due to potential biases against protected social groups. Various fairness notions and methods have been proposed to mitigate such biases. In this work, we focus on Counterfactual Fairness (CF), a fairness notion that is dependent on an underlying causal graph and first proposed by Kusner \textit{et al.}~\cite{kusner2017counterfactual}; it requires that the outcome an individual perceives is the same in the real world as it would be in a "counterfactual" world, in which the individual belongs to another social group. Learning fair models satisfying CF can be challenging. It was shown in \cite{kusner2017counterfactual} that a sufficient condition for satisfying CF is to \textbf{not} use features that are descendants of sensitive attributes in the causal graph. This implies a simple method that learns CF models only using non-descendants of sensitive attributes while eliminating all descendants. Although several subsequent works proposed methods that use all features for training CF models, there is no theoretical guarantee that they can satisfy CF. In contrast, this work proposes a new algorithm that trains models using all the available features. We theoretically and empirically show that models trained with this method can satisfy CF\footnote{The code repository for this work can be found in \url{https://github.com/osu-srml/CF_Representation_Learning}.
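The sufficient condition from Kusner et al. cited above — train only on features that are not descendants of the sensitive attribute — already yields a simple counterfactually fair baseline, which the sketch below illustrates on synthetic data with a hypothetical feature split (the paper's own algorithm, which uses all features, is not reproduced here).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
sensitive = rng.integers(0, 2, size=n)                   # protected attribute A
non_descendant = rng.normal(size=(n, 3))                 # features not caused by A
descendant = non_descendant[:, :1] + sensitive[:, None] + rng.normal(size=(n, 1))  # caused by A
y = (non_descendant[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

X_all = np.hstack([non_descendant, descendant])
X_fair = non_descendant                                  # drop descendants of the sensitive attribute

for name, X in [("all features", X_all), ("non-descendants only (CF baseline)", X_fair)]:
    X_tr, X_te, y_tr, y_te, a_tr, a_te = train_test_split(X, y, sensitive, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    gap = abs(pred[a_te == 0].mean() - pred[a_te == 1].mean())   # crude group disparity
    print(f"{name:35s} acc={clf.score(X_te, y_te):.3f} group gap={gap:.3f}")
```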

Predicting the Position Uncertainty at the Time of Closest Approach with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.05417
  • repo_url: None
  • paper_authors: Marta Guimarães, Cláudia Soares, Chiara Manfletti
  • for: Avoiding collisions between spacecraft and other resident space objects.
  • methods: A machine learning model based on diffusion models forecasts the evolution of the position uncertainty of objects involved in a close encounter, particularly for the secondary object (usually debris), which tends to be more unpredictable.
  • results: Compared with other state-of-the-art solutions and a naive baseline, the proposed approach has the potential to significantly improve the safety and effectiveness of spacecraft operations.
    Abstract The risk of collision between resident space objects has significantly increased in recent years. As a result, spacecraft collision avoidance procedures have become an essential part of satellite operations. To ensure safe and effective space activities, satellite owners and operators rely on constantly updated estimates of encounters. These estimates include the uncertainty associated with the position of each object at the expected TCA. These estimates are crucial in planning risk mitigation measures, such as collision avoidance manoeuvres. As the TCA approaches, the accuracy of these estimates improves, as both objects' orbit determination and propagation procedures are made for increasingly shorter time intervals. However, this improvement comes at the cost of taking place close to the critical decision moment. This means that safe avoidance manoeuvres might not be possible or could incur significant costs. Therefore, knowing the evolution of this variable in advance can be crucial for operators. This work proposes a machine learning model based on diffusion models to forecast the position uncertainty of objects involved in a close encounter, particularly for the secondary object (usually debris), which tends to be more unpredictable. We compare the performance of our model with other state-of-the-art solutions and a na\"ive baseline approach, showing that the proposed solution has the potential to significantly improve the safety and effectiveness of spacecraft operations.

Data Distillation for Neural Network Potentials toward Foundational Dataset

  • paper_url: http://arxiv.org/abs/2311.05407
  • repo_url: None
  • paper_authors: Gang Seob Jung, Sangkeun Lee, Jong Youl Choi
  • for: This paper aims to address the discrepancy between predicted properties of materials through generative models and calculated properties through ab initio calculations.
  • methods: The paper uses extended ensemble molecular dynamics (MD) to secure a broad range of liquid- and solid-phase configurations in one of the metallic systems, nickel. The data is then distilled to significantly reduce the amount of data without losing much accuracy.
  • results: The paper shows that the neural network-based potentials (NNPs) trained from the distilled data can predict different energy-minimized closed-pack crystal structures, even though those structures were not explicitly part of the initial data. The approach is also demonstrated to be applicable to other metallic systems (aluminum and niobium), without repeating the sampling and distillation processes.
    Abstract Machine learning (ML) techniques and atomistic modeling have rapidly transformed materials design and discovery. Specifically, generative models can swiftly propose promising materials for targeted applications. However, the predicted properties of materials through the generative models often do not match with calculated properties through ab initio calculations. This discrepancy can arise because the generated coordinates are not fully relaxed, whereas the many properties are derived from relaxed structures. Neural network-based potentials (NNPs) can expedite the process by providing relaxed structures from the initially generated ones. Nevertheless, acquiring data to train NNPs for this purpose can be extremely challenging as it needs to encompass previously unknown structures. This study utilized extended ensemble molecular dynamics (MD) to secure a broad range of liquid- and solid-phase configurations in one of the metallic systems, nickel. Then, we could significantly reduce them through active learning without losing much accuracy. We found that the NNP trained from the distilled data could predict different energy-minimized closed-pack crystal structures even though those structures were not explicitly part of the initial data. Furthermore, the data can be translated to other metallic systems (aluminum and niobium), without repeating the sampling and distillation processes. Our approach to data acquisition and distillation has demonstrated the potential to expedite NNP development and enhance materials design and discovery by integrating generative models.

The Sample Complexity Of ERMs In Stochastic Convex Optimization

  • paper_url: http://arxiv.org/abs/2311.05398
  • repo_url: None
  • paper_authors: Daniel Carmon, Roi Livni, Amir Yehudayoff
  • for: Studies learning in the Stochastic Convex Optimization model, in particular how many data points must be observed so that any Empirical Risk Minimizer (ERM) performs well on the true population.
  • methods: A new analysis shows that $\tilde{O}(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are sufficient, settling a long-standing open question and yielding a new separation between ERMs and uniform convergence.
  • results: In the classical setting of learning bounded convex Lipschitz functions over the Euclidean unit ball, $\tilde{O}(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points suffice, matching the known $\Omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ lower bound up to logarithmic factors. The upper bound is further generalized to all symmetric convex bodies.
    Abstract Stochastic convex optimization is one of the most well-studied models for learning in modern machine learning. Nevertheless, a central fundamental question in this setup remained unresolved: "How many data points must be observed so that any empirical risk minimizer (ERM) shows good performance on the true population?" This question was proposed by Feldman (2016), who proved that $\Omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are necessary (where $d$ is the dimension and $\epsilon>0$ is the accuracy parameter). Proving an $\omega(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ lower bound was left as an open problem. In this work we show that in fact $\tilde{O}(\frac{d}{\epsilon}+\frac{1}{\epsilon^2})$ data points are also sufficient. This settles the question and yields a new separation between ERMs and uniform convergence. This sample complexity holds for the classical setup of learning bounded convex Lipschitz functions over the Euclidean unit ball. We further generalize the result and show that a similar upper bound holds for all symmetric convex bodies. The general bound is composed of two terms: (i) a term of the form $\tilde{O}(\frac{d}{\epsilon})$ with an inverse-linear dependence on the accuracy parameter, and (ii) a term that depends on the statistical complexity of the class of $\textit{linear}$ functions (captured by the Rademacher complexity). The proof builds a mechanism for controlling the behavior of stochastic convex optimization problems.

Beyond the training set: an intuitive method for detecting distribution shift in model-based optimization

  • paper_url: http://arxiv.org/abs/2311.05363
  • repo_url: None
  • paper_authors: Farhan Damani, David H Brookes, Theodore Sternlieb, Cameron Webster, Stephen Malina, Rishi Jajoo, Kathy Lin, Sam Sinai
  • for: Proposes a simple method for detecting distribution shift between the training set and the design set in model-based optimization.
  • methods: A binary classifier is trained, using knowledge of the unlabeled design distribution, to separate the training data from the design data; the classifier's logit scores are then used as a proxy measure of distribution shift.
  • results: The method is validated in a real-world application using offline model-based optimization, where the intensity of the shift in the design distribution varies with the number of steps taken by the optimization algorithm. The simple approach identifies these shifts, enabling users to constrain their search to regions where the model's predictions are reliable and thereby increasing design quality.
    Abstract Model-based optimization (MBO) is increasingly applied to design problems in science and engineering. A common scenario involves using a fixed training set to train models, with the goal of designing new samples that outperform those present in the training data. A major challenge in this setting is distribution shift, where the distributions of training and design samples are different. While some shift is expected, as the goal is to create better designs, this change can negatively affect model accuracy and subsequently, design quality. Despite the widespread nature of this problem, addressing it demands deep domain knowledge and artful application. To tackle this issue, we propose a straightforward method for design practitioners that detects distribution shifts. This method trains a binary classifier using knowledge of the unlabeled design distribution to separate the training data from the design data. The classifier's logit scores are then used as a proxy measure of distribution shift. We validate our method in a real-world application by running offline MBO and evaluate the effect of distribution shift on design quality. We find that the intensity of the shift in the design distribution varies based on the number of steps taken by the optimization algorithm, and our simple approach can identify these shifts. This enables users to constrain their search to regions where the model's predictions are reliable, thereby increasing the quality of designs.
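The detection recipe described above is easy to sketch: label training samples 0 and design samples 1, fit a binary classifier, and read its logit scores as a proxy for how far each design candidate has drifted from the training distribution. A minimal sketch assuming scikit-learn, with synthetic feature matrices standing in for real designs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.special import logit

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(1000, 20))           # data the surrogate model was trained on
X_design = rng.normal(0.6, 1.0, size=(300, 20))            # candidates proposed by the optimizer

# Binary "domain" classifier: 0 = training distribution, 1 = design distribution.
X = np.vstack([X_train, X_design])
d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_design))])
clf = LogisticRegression(max_iter=1000).fit(X, d)

# Per-candidate shift proxy: the classifier's logit score (lower = closer to the training data).
p = np.clip(clf.predict_proba(X_design)[:, 1], 1e-6, 1 - 1e-6)
shift_score = logit(p)
print(f"mean shift score: {shift_score.mean():.2f}")

# Example use: keep only designs the surrogate is likely to score reliably.
trusted = X_design[shift_score < np.quantile(shift_score, 0.5)]
print("designs kept for scoring:", len(trusted))
```

The classifier and the cutoff quantile here are free choices; any probabilistic classifier can play the role of the domain discriminator.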

Basis functions nonlinear data-enabled predictive control: Consistent and computationally efficient formulations

  • paper_url: http://arxiv.org/abs/2311.05360
  • repo_url: None
  • paper_authors: Mircea Lazar
  • for: Extends data-enabled predictive control (DeePC) to nonlinear systems via general basis functions.
  • methods: Formulates a basis functions DeePC behavioral predictor and identifies necessary and sufficient conditions for equivalence with the corresponding basis functions multi-step identified predictor.
  • results: The derived conditions yield a dynamic regularization cost function that enables a consistent basis functions formulation of nonlinear DeePC, and two alternative formulations based on a simpler, sparse regularization cost and on ridge regression are developed to improve computational efficiency. Additionally, the paper discusses the consistency of Koopman DeePC and provides several methods for constructing the basis functions representation. The effectiveness of the developed consistent basis functions DeePC formulations is demonstrated on a benchmark nonlinear pendulum state-space model, for both noise-free and noisy data.
    Abstract This paper considers the extension of data-enabled predictive control (DeePC) to nonlinear systems via general basis functions. Firstly, we formulate a basis functions DeePC behavioral predictor and we identify necessary and sufficient conditions for equivalence with a corresponding basis functions multi-step identified predictor. The derived conditions yield a dynamic regularization cost function that enables a well-posed (i.e., consistent) basis functions formulation of nonlinear DeePC. To optimize computational efficiency of basis functions DeePC we further develop two alternative formulations that use a simpler, sparse regularization cost function and ridge regression, respectively. Consistency implications for Koopman DeePC as well as several methods for constructing the basis functions representation are also indicated. The effectiveness of the developed consistent basis functions DeePC formulations is illustrated on a benchmark nonlinear pendulum state-space model, for both noise free and noisy data.

Accelerated Shapley Value Approximation for Data Evaluation

  • paper_url: http://arxiv.org/abs/2311.05346
  • repo_url: None
  • paper_authors: Lauren Watson, Zeno Kujawa, Rayna Andreeva, Hao-Tsung Yang, Tariq Elahi, Rik Sarkar
  • for: Proposes a more efficient approach to data valuation for machine learning applications such as data filtering, efficient learning, and incentives for data sharing.
  • methods: Leverages the structural properties of machine learning problems to approximate the Shapley value of data points more efficiently, and introduces $\delta$-Shapley, a strategy that uses only small subsets for the approximation, with convergence guarantees on accuracy for different learning settings including Stochastic Gradient Descent with convex and non-convex loss functions.
  • results: The approach preserves the approximate value and rank of data points while achieving speedups of up to 9.9x, and brings additional efficiency for accurate evaluation of pre-trained networks using small subsets.
    Abstract Data valuation has found various applications in machine learning, such as data filtering, efficient learning and incentives for data sharing. The most popular current approach to data valuation is the Shapley value. While popular for its various applications, Shapley value is computationally expensive even to approximate, as it requires repeated iterations of training models on different subsets of data. In this paper we show that the Shapley value of data points can be approximated more efficiently by leveraging the structural properties of machine learning problems. We derive convergence guarantees on the accuracy of the approximate Shapley value for different learning settings including Stochastic Gradient Descent with convex and non-convex loss functions. Our analysis suggests that in fact models trained on small subsets are more important in the context of data valuation. Based on this idea, we describe $\delta$-Shapley -- a strategy of only using small subsets for the approximation. Experiments show that this approach preserves approximate value and rank of data, while achieving speedup of up to 9.9x. In pre-trained networks the approach is found to bring more efficiency in terms of accurate evaluation using small subsets.
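The "small subsets only" idea can be illustrated with a Monte Carlo estimator that measures each point's marginal contribution only on random subsets of size below $\delta$. This is a simplified illustration of the principle, not the paper's $\delta$-Shapley estimator or its guarantees; scikit-learn and a synthetic dataset are assumed.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# One synthetic dataset: the first 60 points are "priced", the rest form a validation set.
X_all, y_all = make_classification(n_samples=300, n_features=10, random_state=0)
X, y = X_all[:60], y_all[:60]
X_val, y_val = X_all[60:], y_all[60:]

def utility(idx: np.ndarray) -> float:
    """Validation accuracy of a model trained on the subset idx (0.5 if untrainable)."""
    if len(idx) < 2 or len(np.unique(y[idx])) < 2:
        return 0.5
    clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    return clf.score(X_val, y_val)

def small_subset_values(delta: int = 8, n_mc: int = 100, seed: int = 0) -> np.ndarray:
    """Monte Carlo marginal contributions restricted to subsets of size < delta."""
    rng = np.random.default_rng(seed)
    n = len(X)
    values = np.zeros(n)
    for i in range(n):
        others = np.delete(np.arange(n), i)
        contribs = []
        for _ in range(n_mc):
            k = int(rng.integers(0, delta))               # subset size in {0, ..., delta - 1}
            subset = rng.choice(others, size=k, replace=False)
            contribs.append(utility(np.append(subset, i)) - utility(subset))
        values[i] = float(np.mean(contribs))
    return values

values = small_subset_values()
print("highest-value points:", np.argsort(values)[::-1][:5])
```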

Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot

  • paper_url: http://arxiv.org/abs/2311.05334
  • repo_url: None
  • paper_authors: Carlo Mazzola, Francesco Rea, Alessandra Sciutti
  • for: Developing an addressee estimation model based on non-verbal cues so that conversational robots can interact smoothly with humans in multi-party and unstructured scenarios.
  • methods: A deep learning model that relies on the speaker's gaze and body pose to estimate the addressee.
  • results: The model is deployed on the iCub robot, and its performance in real-time human-robot interaction is reported and compared with previous tests on the dataset used for training.
    Abstract Addressee Estimation is the ability to understand to whom a person is talking, a skill essential for social robots to interact smoothly with humans. In this sense, it is one of the problems that must be tackled to develop effective conversational agents in multi-party and unstructured scenarios. As humans, one of the channels that mainly lead us to such estimation is the non-verbal behavior of speakers: first of all, their gaze and body pose. Inspired by human perceptual skills, in the present work, a deep-learning model for Addressee Estimation relying on these two non-verbal features is designed, trained, and deployed on an iCub robot. The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction compared to previous tests on the dataset used for the training.

RepQ: Generalizing Quantization-Aware Training for Re-Parametrized Architectures

  • paper_url: http://arxiv.org/abs/2311.05317
  • repo_url: None
  • paper_authors: Anastasiia Prutianova, Alexey Zaytsev, Chung-Kuei Lee, Fengyu Sun, Ivan Koryakovskiy
  • for: Improving the training and inference efficiency of neural networks so that they can be deployed in resource-constrained environments.
  • methods: Combines quantization, a well-known network compression approach, with re-parametrization, an emerging technique for improving model performance, by applying quantization-aware training to re-parametrized architectures.
  • results: The proposed RepQ method outperforms the baseline LSQ quantization scheme in all experiments.
    Abstract Existing neural networks are memory-consuming and computationally intensive, making deploying them challenging in resource-constrained environments. However, there are various methods to improve their efficiency. Two such methods are quantization, a well-known approach for network compression, and re-parametrization, an emerging technique designed to improve model performance. Although both techniques have been studied individually, there has been limited research on their simultaneous application. To address this gap, we propose a novel approach called RepQ, which applies quantization to re-parametrized networks. Our method is based on the insight that the test stage weights of an arbitrary re-parametrized layer can be presented as a differentiable function of trainable parameters. We enable quantization-aware training by applying quantization on top of this function. RepQ generalizes well to various re-parametrized models and outperforms the baseline method LSQ quantization scheme in all experiments.
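The core observation — that the merged inference-time weight of a re-parametrized block is a differentiable function of its trainable parameters, so fake quantization can be applied on top of that function — can be sketched in PyTorch. The block structure (3x3 + 1x1 + identity branches, RepVGG-style) and the uniform quantizer below are illustrative assumptions, not the RepQ implementation.

```python
import torch
from torch import nn
import torch.nn.functional as F

def fake_quant(w: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()          # STE: forward uses w_q, gradient flows to w

class RepBlockQAT(nn.Module):
    """Re-parametrized conv block: branches are merged into one kernel, then quantized."""
    def __init__(self, channels: int):
        super().__init__()
        self.w3 = nn.Parameter(torch.randn(channels, channels, 3, 3) * 0.1)  # 3x3 branch
        self.w1 = nn.Parameter(torch.randn(channels, channels, 1, 1) * 0.1)  # 1x1 branch

    def merged_weight(self) -> torch.Tensor:
        # Differentiable function of the trainable parameters: pad the 1x1 kernel to 3x3
        # and add an identity kernel, as a test-time re-parametrization would.
        w = self.w3 + F.pad(self.w1, (1, 1, 1, 1))
        identity = torch.zeros_like(self.w3)
        identity[torch.arange(w.shape[0]), torch.arange(w.shape[0]), 1, 1] = 1.0
        return w + identity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.conv2d(x, fake_quant(self.merged_weight()), padding=1)

block = RepBlockQAT(channels=4)
x = torch.randn(2, 4, 8, 8)
loss = block(x).pow(2).mean()
loss.backward()                             # gradients reach both branch parameters
print(block.w3.grad.abs().mean().item(), block.w1.grad.abs().mean().item())
```

The straight-through estimator keeps the quantizer's rounding out of the backward pass, which is what makes quantization-aware training on top of the merged weight possible.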

Reliable and Efficient Data Collection in UAV-based IoT Networks

  • paper_url: http://arxiv.org/abs/2311.05303
  • repo_url: None
  • paper_authors: Poorvi Joshi, Alakesh Kalita, Mohan Gurusamy
  • For: This paper focuses on the challenges and opportunities of using Unmanned Aerial Vehicles (UAVs) to enhance data collection in Internet of Things (IoT) networks.
  • Methods: The paper explores various UAV-based data collection methods, including their advantages and disadvantages, and discusses performance metrics for data collection.
  • Results: The paper discusses efficient data collection strategies in UAV-based IoT networks, including trajectory and path planning, collision avoidance, sensor network clustering, data aggregation, UAV swarm formations, and artificial intelligence for optimization.
    Abstract Internet of Things (IoT) involves sensors for monitoring and wireless networks for efficient communication. However, resource-constrained IoT devices and limitations in existing wireless technologies hinder its full potential. Integrating Unmanned Aerial Vehicles (UAVs) into IoT networks can address some challenges by expanding its' coverage, providing security, and bringing computing closer to IoT devices. Nevertheless, effective data collection in UAV-assisted IoT networks is hampered by factors, including dynamic UAV behavior, environmental variables, connectivity instability, and security considerations. In this survey, we first explore UAV-based IoT networks, focusing on communication and networking aspects. Next, we cover various UAV-based data collection methods their advantages and disadvantages, followed by a discussion on performance metrics for data collection. As this article primarily emphasizes reliable and efficient data collection in UAV-assisted IoT networks, we briefly discuss existing research on data accuracy and consistency, network connectivity, and data security and privacy to provide insights into reliable data collection. Additionally, we discuss efficient data collection strategies in UAV-based IoT networks, covering trajectory and path planning, collision avoidance, sensor network clustering, data aggregation, UAV swarm formations, and artificial intelligence for optimization. We also present two use cases of UAVs as a service for enhancing data collection reliability and efficiency. Finally, we discuss future challenges in data collection for UAV-assisted IoT networks.

Latent Task-Specific Graph Network Simulators

  • paper_url: http://arxiv.org/abs/2311.05256
  • repo_url: https://github.com/philippdahlinger/ltsgns_ai4science
  • paper_authors: Philipp Dahlinger, Niklas Freymuth, Michael Volpp, Tai Hoang, Gerhard Neumann
  • for: Frames mesh-based simulation as a meta-learning problem to improve the adaptability of Graph Network Simulators (GNSs) to new scenarios.
  • methods: using a recent Bayesian meta-learning method, leveraging context data and handling uncertainties, using non-amortized task posterior approximations to sample latent descriptions of unknown system properties, and leveraging movement primitives for efficient full trajectory prediction
  • results: on par with or better than established baseline methods, and accommodating various types of context data through the use of point clouds during inference.
    Abstract Simulating dynamic physical interactions is a critical challenge across multiple scientific domains, with applications ranging from robotics to material science. For mesh-based simulations, Graph Network Simulators (GNSs) pose an efficient alternative to traditional physics-based simulators. Their inherent differentiability and speed make them particularly well-suited for inverse design problems. Yet, adapting to new tasks from limited available data is an important aspect for real-world applications that current methods struggle with. We frame mesh-based simulation as a meta-learning problem and use a recent Bayesian meta-learning method to improve GNSs adaptability to new scenarios by leveraging context data and handling uncertainties. Our approach, latent task-specific graph network simulator, uses non-amortized task posterior approximations to sample latent descriptions of unknown system properties. Additionally, we leverage movement primitives for efficient full trajectory prediction, effectively addressing the issue of accumulating errors encountered by previous auto-regressive methods. We validate the effectiveness of our approach through various experiments, performing on par with or better than established baseline methods. Movement primitives further allow us to accommodate various types of context data, as demonstrated through the utilization of point clouds during inference. By combining GNSs with meta-learning, we bring them closer to real-world applicability, particularly in scenarios with smaller datasets.

When Meta-Learning Meets Online and Continual Learning: A Survey

  • paper_url: http://arxiv.org/abs/2311.05241
  • repo_url: None
  • paper_authors: Jaehyeon Son, Soochan Lee, Gunhee Kim
  • for: This paper aims to provide a comprehensive survey of various learning frameworks, including meta-learning, continual learning, and online learning, and to facilitate a clear understanding of the differences between them.
  • methods: The paper uses a consistent terminology and formal descriptions to organize various problem settings and learning algorithms, and offers an overview of these learning paradigms to foster further advancements in the field.
  • results: The paper provides a clear understanding of the differences between the learning frameworks, and offers a unified terminology for discussing them, which can help experienced researchers and newcomers to the field alike.
    Abstract Over the past decade, deep neural networks have demonstrated significant success using the training scheme that involves mini-batch stochastic gradient descent on extensive datasets. Expanding upon this accomplishment, there has been a surge in research exploring the application of neural networks in other learning scenarios. One notable framework that has garnered significant attention is meta-learning. Often described as "learning to learn," meta-learning is a data-driven approach to optimize the learning algorithm. Other branches of interest are continual learning and online learning, both of which involve incrementally updating a model with streaming data. While these frameworks were initially developed independently, recent works have started investigating their combinations, proposing novel problem settings and learning algorithms. However, due to the elevated complexity and lack of unified terminology, discerning differences between the learning frameworks can be challenging even for experienced researchers. To facilitate a clear understanding, this paper provides a comprehensive survey that organizes various problem settings using consistent terminology and formal descriptions. By offering an overview of these learning paradigms, our work aims to foster further advancements in this promising area of research.

Whisper in Focus: Enhancing Stuttered Speech Classification with Encoder Layer Optimization

  • paper_url: http://arxiv.org/abs/2311.05203
  • repo_url: None
  • paper_authors: Huma Ameer, Seemab Latif, Rabia Latif, Sana Mukhtar
  • for: Automated classification of disfluency types in stuttered speech using deep learning techniques.
  • methods: Explores the Whisper speech recognition model for disfluency classification (prior work relied on the more resource-intensive Wav2vec2.0), enhances the quality of the SEP28-k benchmark dataset, and introduces an efficient encoder layer freezing strategy.
  • results: The optimized Whisper model achieves an average F1-score of 0.81 on disfluency classification, and the results show that deeper encoder layers contribute more to identifying disfluency types than the initial layers.
    Abstract In recent years, advancements in the field of speech processing have led to cutting-edge deep learning algorithms with immense potential for real-world applications. The automated identification of stuttered speech is one of such applications that the researchers are addressing by employing deep learning techniques. Recently, researchers have utilized Wav2vec2.0, a speech recognition model to classify disfluency types in stuttered speech. Although Wav2vec2.0 has shown commendable results, its ability to generalize across all disfluency types is limited. In addition, since its base model uses 12 encoder layers, it is considered a resource-intensive model. Our study unravels the capabilities of Whisper for the classification of disfluency types in stuttered speech. We have made notable contributions in three pivotal areas: enhancing the quality of SEP28-k benchmark dataset, exploration of Whisper for classification, and introducing an efficient encoder layer freezing strategy. The optimized Whisper model has achieved the average F1-score of 0.81, which proffers its abilities. This study also unwinds the significance of deeper encoder layers in the identification of disfluency types, as the results demonstrate their greater contribution compared to initial layers. This research represents substantial contributions, shifting the emphasis towards an efficient solution, thereby thriving towards prospective innovation.
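A hedged sketch of the layer-freezing idea using the Hugging Face transformers Whisper encoder: freeze the earlier encoder layers, keep the deeper ones trainable, and attach a small classification head over pooled encoder states. The checkpoint name, number of frozen layers, pooling, and head are placeholder choices, not the paper's configuration.

```python
import torch
from torch import nn
from transformers import WhisperModel

class WhisperDisfluencyClassifier(nn.Module):
    def __init__(self, n_classes: int, n_frozen_layers: int = 2,
                 checkpoint: str = "openai/whisper-base"):
        super().__init__()
        self.encoder = WhisperModel.from_pretrained(checkpoint).encoder
        # Freeze the convolutional front-end and the first n_frozen_layers encoder blocks;
        # deeper blocks stay trainable (the abstract reports they matter most).
        for p in self.encoder.conv1.parameters():
            p.requires_grad = False
        for p in self.encoder.conv2.parameters():
            p.requires_grad = False
        for layer in self.encoder.layers[:n_frozen_layers]:
            for p in layer.parameters():
                p.requires_grad = False
        self.head = nn.Linear(self.encoder.config.d_model, n_classes)

    def forward(self, input_features: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(input_features).last_hidden_state   # (batch, frames, d_model)
        return self.head(hidden.mean(dim=1))                      # mean-pool over time

# Whisper expects log-mel spectrograms shaped (batch, n_mels, frames), e.g. (B, 80, 3000).
model = WhisperDisfluencyClassifier(n_classes=6)
dummy = torch.randn(2, 80, 3000)
print(model(dummy).shape)   # torch.Size([2, 6])
```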

Perfecting Liquid-State Theories with Machine Intelligence

  • paper_url: http://arxiv.org/abs/2311.05167
  • repo_url: None
  • paper_authors: Jianzhong Wu, Mengyang Gu
  • for: Predicting electronic structure, molecular force fields, and the physicochemical properties of various condensed systems.
  • methods: Functional machine learning techniques, including surrogate models, dimension reduction, and uncertainty quantification, combined with theoretical analysis from liquid-state theories.
  • results: Anticipated gains in accuracy, scalability, and computational efficiency, enabling broader applications across diverse materials and chemical systems.
    Abstract Recent years have seen a significant increase in the use of machine intelligence for predicting electronic structure, molecular force fields, and the physicochemical properties of various condensed systems. However, substantial challenges remain in developing a comprehensive framework capable of handling a wide range of atomic compositions and thermodynamic conditions. This perspective discusses potential future developments in liquid-state theories leveraging on recent advancements of functional machine learning. By harnessing the strengths of theoretical analysis and machine learning techniques including surrogate models, dimension reduction and uncertainty quantification, we envision that liquid-state theories will gain significant improvements in accuracy, scalability and computational efficiency, enabling their broader applications across diverse materials and chemical systems.

Counter-Empirical Attacking based on Adversarial Reinforcement Learning for Time-Relevant Scoring System

  • paper_url: http://arxiv.org/abs/2311.05144
  • repo_url: https://github.com/sheldonresearch/microsoft-scoring-system
  • paper_authors: Xiangguo Sun, Hong Cheng, Hang Dong, Bo Qiao, Si Qin, Qingwei Lin
  • for: Explores how to automatically refine scoring systems so that platforms can guide user behavior and manage resources more effectively in the big-data era.
  • methods: Proposes a "counter-empirical attacking" mechanism that generates attacking behavior traces which try to break the empirical rules of the scoring system, together with an adversarial "enhancer" that evaluates the system; training this adversarial learning problem yields a scoring function robust to such attacks.
  • results: Experiments show that the method effectively improves scoring systems, making them more robust to attacking behavior traces.
    Abstract Scoring systems are commonly seen for platforms in the era of big data. From credit scoring systems in financial services to membership scores in E-commerce shopping platforms, platform managers use such systems to guide users towards the encouraged activity pattern, and manage resources more effectively and more efficiently thereby. To establish such scoring systems, several "empirical criteria" are firstly determined, followed by dedicated top-down design for each factor of the score, which usually requires enormous effort to adjust and tune the scoring function in the new application scenario. What's worse, many fresh projects usually have no ground-truth or any experience to evaluate a reasonable scoring system, making the designing even harder. To reduce the effort of manual adjustment of the scoring function in every new scoring system, we innovatively study the scoring system from the preset empirical criteria without any ground truth, and propose a novel framework to improve the system from scratch. In this paper, we propose a "counter-empirical attacking" mechanism that can generate "attacking" behavior traces and try to break the empirical rules of the scoring system. Then an adversarial "enhancer" is applied to evaluate the scoring system and find the improvement strategy. By training the adversarial learning problem, a proper scoring function can be learned to be robust to the attacking activity traces that are trying to violate the empirical criteria. Extensive experiments have been conducted on two scoring systems including a shared computing resource platform and a financial credit system. The experimental results have validated the effectiveness of our proposed framework.

On neural and dimensional collapse in supervised and unsupervised contrastive learning with hard negative sampling

  • paper_url: http://arxiv.org/abs/2311.05139
  • repo_url: None
  • paper_authors: Ruijie Jiang, Thuan Nguyen, Shuchin Aeron, Prakash Ishwar
  • For: The paper is written for proving the optimality of Neural Collapse (NC) representations for Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) under general loss and hardening functions.
  • Methods: The paper uses theoretical proofs to show that representations that exhibit Neural Collapse (NC) minimize the SCL, HSCL, and UCL risks. The proofs are simplified, compact, and transparent, and they demonstrate the optimality of ETF for HSCL and UCL under general loss and hardening functions.
  • Results: The paper empirically demonstrates that ADAM optimization of HSCL and HUCL risks with random initialization and suitable hardness levels can converge to the NC geometry, but only if unit-ball or unit-sphere feature normalization is incorporated. Without incorporating hard negatives or feature normalization, the representations learned via ADAM suffer from dimensional collapse (DC) and fail to attain the NC geometry.
    Abstract For a widely-studied data model and general loss and sample-hardening functions we prove that the Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) risks are minimized by representations that exhibit Neural Collapse (NC), i.e., the class means form an Equianglular Tight Frame (ETF) and data from the same class are mapped to the same representation. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) risks are lower bounded by the corresponding SCL and UCL risks. Although the optimality of ETF is known for SCL, albeit only for InfoNCE loss, its optimality for HSCL and UCL under general loss and hardening functions is novel. Moreover, our proofs are much simpler, compact, and transparent. We empirically demonstrate, for the first time, that ADAM optimization of HSCL and HUCL risks with random initialization and suitable hardness levels can indeed converge to the NC geometry if we incorporate unit-ball or unit-sphere feature normalization. Without incorporating hard negatives or feature normalization, however, the representations learned via ADAM suffer from dimensional collapse (DC) and fail to attain the NC geometry.
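For reference, the simplex Equiangular Tight Frame geometry that Neural Collapse refers to is conventionally written as below (standard in the neural-collapse literature; the notation is generic rather than taken from this paper).

```latex
% K class means m_1, ..., m_K in R^d (d >= K), collected as the columns of M, form a simplex ETF if
\[
  M \;=\; \sqrt{\tfrac{K}{K-1}}\; U\!\left(I_K - \tfrac{1}{K}\,\mathbf{1}_K\mathbf{1}_K^{\top}\right),
  \qquad U \in \mathbb{R}^{d \times K},\ U^{\top}U = I_K ,
\]
% so all class means have equal norm and every distinct pair is equally and maximally separated:
\[
  \frac{\langle m_i, m_j\rangle}{\|m_i\|\,\|m_j\|} \;=\; -\frac{1}{K-1}, \qquad i \neq j .
\]
```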

Improving Computational Efficiency for Powered Descent Guidance via Transformer-based Tight Constraint Prediction

  • paper_url: http://arxiv.org/abs/2311.05135
  • repo_url: None
  • paper_authors: Julia Briden, Trey Gurga, Breanna Johnson, Abhishek Cauligi, Richard Linares
  • for: Proposes Transformer-based Powered Descent Guidance (T-PDG), a scalable algorithm for reducing the computational complexity of the direct optimization formulation of the spacecraft powered descent guidance problem.
  • methods: Trains a transformer neural network on data from prior runs of trajectory optimization algorithms to accurately predict the globally optimal solution; the solution is encoded as the set of tight constraints of the minimum-cost trajectory together with the optimal final time of landing.
  • results: Applied to the real problem of Mars powered descent guidance, T-PDG reduces the time for computing the 3-degree-of-freedom fuel-optimal trajectory from 1-8 seconds (with lossless convexification) to less than 500 milliseconds, while a feasibility check included in T-PDG guarantees a safe and optimal solution.
    Abstract In this work, we present Transformer-based Powered Descent Guidance (T-PDG), a scalable algorithm for reducing the computational complexity of the direct optimization formulation of the spacecraft powered descent guidance problem. T-PDG uses data from prior runs of trajectory optimization algorithms to train a transformer neural network, which accurately predicts the relationship between problem parameters and the globally optimal solution for the powered descent guidance problem. The solution is encoded as the set of tight constraints corresponding to the constrained minimum-cost trajectory and the optimal final time of landing. By leveraging the attention mechanism of transformer neural networks, large sequences of time series data can be accurately predicted when given only the spacecraft state and landing site parameters. When applied to the real problem of Mars powered descent guidance, T-PDG reduces the time for computing the 3 degree of freedom fuel-optimal trajectory, when compared to lossless convexification, from an order of 1-8 seconds to less than 500 milliseconds. A safe and optimal solution is guaranteed by including a feasibility check in T-PDG before returning the final trajectory.

Exploring and Analyzing Wildland Fire Data Via Machine Learning Techniques

  • paper_url: http://arxiv.org/abs/2311.05128
  • repo_url: None
  • paper_authors: Dipak Dulal, Joseph J. Charney, Michael Gallagher, Carmeliza Navasca, Nicholas Skowronski
  • for: Investigating the correlation between a 10 Hz time series of thermocouple temperatures and the turbulent kinetic energy (TKE) computed from wind speeds measured during a small experimental prescribed burn, to explore the potential of thermocouple temperatures as predictors of TKE.
  • methods: Machine learning models, including Deep Neural Networks, Random Forest Regressor, Gradient Boosting, and Gaussian Process Regressor, are used to assess whether thermocouple temperature perturbations can predict TKE values.
  • results: High accuracy in predicting TKE is achieved despite a weak correlation between the predictors and the target variable, particularly with the regression models. Data visualization and correlation analyses reveal relationships between thermocouple temperatures and TKE, and the findings contribute to fire behavior and smoke modeling science by highlighting the value of machine learning for capturing complex relationships between fine-scale fire behavior and turbulence.
    Abstract This research project investigated the correlation between a 10 Hz time series of thermocouple temperatures and turbulent kinetic energy (TKE) computed from wind speeds collected from a small experimental prescribed burn at the Silas Little Experimental Forest in New Jersey, USA. The primary objective of this project was to explore the potential for using thermocouple temperatures as predictors for estimating the TKE produced by a wildland fire. Machine learning models, including Deep Neural Networks, Random Forest Regressor, Gradient Boosting, and Gaussian Process Regressor, are employed to assess the potential for thermocouple temperature perturbations to predict TKE values. Data visualization and correlation analyses reveal patterns and relationships between thermocouple temperatures and TKE, providing insight into the underlying dynamics. The project achieves high accuracy in predicting TKE by employing various machine learning models despite a weak correlation between the predictors and the target variable. The results demonstrate significant success, particularly from regression models, in accurately estimating the TKE. The research findings contribute to fire behavior and smoke modeling science, emphasizing the importance of incorporating machine learning approaches and identifying complex relationships between fine-scale fire behavior and turbulence. Accurate TKE estimation using thermocouple temperatures allows for the refinement of models that can inform decision-making in fire management strategies, facilitate effective risk mitigation, and optimize fire management efforts. This project highlights the valuable role of machine learning techniques in analyzing wildland fire data, showcasing their potential to advance fire research and management practices.
    摘要 本研究项目考察了在美国新泽西州Silas Little实验林开展的小规模计划烧除实验中,10 Hz热电偶温度时间序列与由风速计算得到的湍流动能(TKE)之间的相关性,主要目的是探索利用热电偶温度预测野火产生的TKE的可能性。研究采用深度神经网络、Random Forest Regressor、Gradient Boosting和Gaussian Process Regressor等机器学习模型,评估热电偶温度扰动预测TKE值的能力。数据可视化与相关性分析揭示了热电偶温度与TKE之间的模式与关系,为理解底层动力学提供了线索。尽管预测变量与目标变量之间相关性较弱,各类机器学习模型(尤其是回归模型)仍取得了较高的TKE预测精度。这些发现有助于火行为与烟雾建模科学,强调了引入机器学习方法并识别细尺度火行为与湍流之间复杂关系的重要性。利用热电偶温度准确估计TKE,有助于改进可为火管理决策提供依据的模型,促进有效的风险缓解并优化火管理工作。该项目凸显了机器学习技术在分析野火数据方面的价值及其推动火研究与管理实践的潜力。
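
As an illustration of the kind of pipeline described above, the sketch below fits several scikit-learn regressors to predict TKE from windowed thermocouple-temperature perturbation features. The synthetic series, feature construction, and hyperparameters are placeholders, not the authors' dataset or exact setup.

```python
# Hedged sketch: predicting turbulent kinetic energy (TKE) from thermocouple
# temperature perturbations with off-the-shelf regressors. Synthetic data and
# windowed features stand in for the 10 Hz field measurements.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
n = 2000
temp = 20 + np.cumsum(rng.normal(0, 0.05, n))                          # placeholder 10 Hz thermocouple series
tke = 0.5 + 0.1 * np.abs(np.gradient(temp)) + rng.normal(0, 0.02, n)   # placeholder TKE target

def window_features(series, width=20):
    """Temperature-perturbation features over sliding windows: mean, std, range."""
    rows = []
    for i in range(width, len(series)):
        w = series[i - width:i]
        rows.append([w.mean(), w.std(), w.max() - w.min()])
    return np.asarray(rows)

X = window_features(temp)
y = tke[20:]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "RandomForest": RandomForestRegressor(n_estimators=200, random_state=0),
    "GradientBoosting": GradientBoostingRegressor(random_state=0),
    "GaussianProcess": GaussianProcessRegressor(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = np.sqrt(mean_squared_error(y_te, pred))
    print(f"{name}: R2={r2_score(y_te, pred):.3f}  RMSE={rmse:.4f}")
```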

Covering Number of Real Algebraic Varieties and Beyond: Improved Bounds and Applications

  • paper_url: http://arxiv.org/abs/2311.05116
  • repo_url: None
  • paper_authors: Yifan Zhang, Joe Kileel
  • for: 本文给出实代数簇、多项式映射的像以及半代数集的覆盖数上界。
  • methods: 通过对多项式映射与半代数集的分析证明该上界,显著改进了Yomdin-Comte已知的最佳一般界,且证明更为直接。
  • results: 该结果给出了多项式映射的像及半代数集的管状邻域体积的新界(此前Lotz与Basu-Lerario针对代数簇的结果不能直接适用);此外还得到了低秩CP张量覆盖数的近最优界、(一般)多项式优化问题的sketching维数界,以及有理或ReLU激活深度神经网络的泛化误差界。
    Abstract We prove an upper bound on the covering number of real algebraic varieties, images of polynomial maps and semialgebraic sets. The bound remarkably improves the best known general bound by Yomdin-Comte, and its proof is much more straightforward. As a consequence, our result gives new bounds on the volume of the tubular neighborhood of the image of a polynomial map and a semialgebraic set, where results for varieties by Lotz and Basu-Lerario are not directly applicable. We apply our theory to three main application domains. Firstly, we derive a near-optimal bound on the covering number of low rank CP tensors. Secondly, we prove a bound on the sketching dimension for (general) polynomial optimization problems. Lastly, we deduce generalization error bounds for deep neural networks with rational or ReLU activations, improving or matching the best known results in the literature.
    摘要 我们证明了实代数簇、多项式映射的像以及半代数集的覆盖数上界。该上界显著改进了Yomdin-Comte已知的最佳一般界,且证明更为直接。由此,我们的结果给出了多项式映射的像与半代数集的管状邻域体积的新界,而Lotz以及Basu-Lerario针对代数簇的结果在这些情形并不直接适用。我们将该理论应用于三个主要方向:其一,得到低秩CP张量覆盖数的近最优界;其二,证明(一般)多项式优化问题的sketching维数界;其三,推导出具有有理或ReLU激活函数的深度神经网络的泛化误差界,达到或超越了文献中已知的最佳结果。
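
The result above is purely theoretical, but the quantity it bounds can be illustrated numerically: the sketch below samples unit-norm rank-1 CP tensors and greedily builds an epsilon-net over the samples, giving a crude empirical proxy for (not a bound on) a covering number. The sampling scheme and parameters are our own illustrative assumptions.

```python
# Illustrative only: empirical epsilon-net size over sampled rank-1 CP tensors,
# a crude numerical proxy for the covering numbers bounded in the paper.
import numpy as np

rng = np.random.default_rng(0)

def random_rank1_tensor(dims):
    """Sample a unit-Frobenius-norm rank-1 tensor a ⊗ b ⊗ c, flattened."""
    factors = [rng.normal(size=d) for d in dims]
    factors = [f / np.linalg.norm(f) for f in factors]
    t = factors[0]
    for f in factors[1:]:
        t = np.multiply.outer(t, f)
    return t.ravel()                      # unit Frobenius norm by construction

def greedy_net_size(points, eps):
    """Greedy epsilon-net: number of centers needed to cover all sampled points."""
    centers = []
    for p in points:
        if not any(np.linalg.norm(p - c) <= eps for c in centers):
            centers.append(p)
    return len(centers)

dims = (5, 5, 5)
samples = np.array([random_rank1_tensor(dims) for _ in range(2000)])
for eps in (0.5, 0.3, 0.2):
    print(f"eps={eps}: greedy net over samples uses {greedy_net_size(samples, eps)} centers")
```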

Personalized Online Federated Learning with Multiple Kernels

  • paper_url: http://arxiv.org/abs/2311.05108
  • repo_url: https://github.com/pouyamghari/pof-mkl
  • paper_authors: Pouya M. Ghari, Yanning Shen
  • for: The paper addresses online non-linear function approximation using multi-kernel learning (MKL) in a federated learning setting.
  • methods: The paper proposes an algorithmic framework that lets clients communicate their updates to the server at affordable communication cost while employing a large dictionary of kernels, and uses random feature (RF) approximation to enable scalable online federated MKL.
  • results: The paper proves that each client enjoys sub-linear regret with respect to the RF approximation of its best kernel in hindsight, indicating that the proposed algorithm effectively handles the heterogeneity of the data distributed among clients. Experimental results on real datasets showcase the advantages of the proposed algorithm over other online federated kernel learning methods.
    Abstract Multi-kernel learning (MKL) exhibits well-documented performance in online non-linear function approximation. Federated learning enables a group of learners (called clients) to train an MKL model on the data distributed among them for online non-linear function approximation. Two challenges in online federated MKL need to be addressed: i) communication efficiency, especially when a large number of kernels is considered, and ii) heterogeneous data distribution among clients. The present paper develops an algorithmic framework that enables clients to communicate their updates to the server at affordable communication cost while employing a large dictionary of kernels. Utilizing random feature (RF) approximation, the present paper proposes a scalable online federated MKL algorithm. We prove that, using the proposed algorithm, each client enjoys sub-linear regret with respect to the RF approximation of its best kernel in hindsight, which indicates that the proposed algorithm can effectively deal with the heterogeneity of the data distributed among clients. Experimental results on real datasets showcase the advantages of the proposed algorithm compared with other online federated kernel learning methods.
    摘要 多核学习(MKL)在在线非线性函数逼近中表现出色。联邦学习使一组学习者(称为客户端)能够在分布于各客户端的数据上训练MKL模型,从而进行在线非线性函数逼近。在线联邦MKL面临两个需要解决的挑战:一是在考虑大量核函数时的通信效率;二是客户端之间数据分布的异构性。本文提出了一个算法框架,使客户端能够在使用大规模核字典的同时,以可承受的通信代价向服务器发送更新。借助随机特征(RF)近似,本文提出了可扩展的在线联邦MKL算法。我们证明,使用该算法时,每个客户端相对于其事后最优核的RF近似都享有次线性遗憾,表明该算法能够有效应对客户端间数据分布的异构性。在真实数据集上的实验结果展示了该算法相对于其他在线联邦核学习算法的优势。
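
A toy sketch of the ingredients described above, under simplifying assumptions of ours: each client keeps a dictionary of random-feature (RF) approximated Gaussian kernels, runs online least-squares updates, weights kernels by exponentiated losses, and the server periodically averages the per-kernel models. This is not the authors' POF-MKL code (see the linked repository for that); the update rules and constants here are simplifications.

```python
# Toy sketch of online federated multi-kernel learning with random Fourier
# features (RFF). Simplified update rules; not the authors' POF-MKL implementation.
import numpy as np

rng = np.random.default_rng(0)
D, d = 50, 5                                   # RF dimension, input dimension
bandwidths = [0.5, 1.0, 2.0]                   # kernel dictionary (Gaussian bandwidths)
W = [rng.normal(0, 1.0 / s, size=(D, d)) for s in bandwidths]
b = [rng.uniform(0, 2 * np.pi, size=D) for _ in bandwidths]

def rff(x, k):                                 # z_k(x) approximates the k-th Gaussian kernel
    return np.sqrt(2.0 / D) * np.cos(W[k] @ x + b[k])

n_clients, n_kernels = 4, len(bandwidths)
theta = np.zeros((n_clients, n_kernels, D))    # per-client, per-kernel RF weights
logw = np.zeros((n_clients, n_kernels))        # per-client kernel weights (log domain)
eta, lam = 0.1, 1.0

def client_step(c, x, y):
    """One online round on client c: predict, then update each kernel model."""
    preds = np.array([rff(x, k) @ theta[c, k] for k in range(n_kernels)])
    w = np.exp(logw[c] - logw[c].max()); w /= w.sum()
    y_hat = w @ preds                           # MKL prediction: weighted combination
    for k in range(n_kernels):
        z = rff(x, k)
        theta[c, k] -= eta * 2 * (preds[k] - y) * z   # squared-loss gradient in RF space
        logw[c, k] -= lam * (preds[k] - y) ** 2       # exponentiated-loss kernel weighting
    return y_hat

def server_average():
    """Server aggregation: average per-kernel RF weights across clients."""
    theta[:] = theta.mean(axis=0, keepdims=True)

f = lambda x: np.sin(x[0]) + 0.5 * x[1]         # placeholder target function
for t in range(1, 401):
    for c in range(n_clients):
        x = rng.normal(size=d)
        client_step(c, x, f(x) + 0.05 * rng.normal())
    if t % 50 == 0:                             # occasional, communication-cheap aggregation
        server_average()
```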

GeoFormer: Predicting Human Mobility using Generative Pre-trained Transformer (GPT)

  • paper_url: http://arxiv.org/abs/2311.05092
  • repo_url: None
  • paper_authors: Aivin V. Solatorio
  • for: 预测人类流动具有重要的实践价值,其应用范围从加强灾害风险规划到模拟流行病传播。
  • methods: 提出GeoFormer,一种基于GPT架构的仅解码器Transformer模型,用于预测人类流动。我们在HuMob Challenge 2023中对模型进行了严格测试;该竞赛使用标准化数据集评估人类流动预测模型的性能。
  • results: GeoFormer在HuMob Challenge 2023中表现突出,跻身前三名,并在竞赛采用的两个性能指标(GEO-BLEU与动态时间规整DTW)上均取得优异成绩。这表明GeoFormer在人类流动预测方面具有很大潜力,可为灾害防备、疫情控制等领域做出重要贡献。
    Abstract Predicting human mobility holds significant practical value, with applications ranging from enhancing disaster risk planning to simulating epidemic spread. In this paper, we present the GeoFormer, a decoder-only transformer model adapted from the GPT architecture to forecast human mobility. Our proposed model is rigorously tested in the context of the HuMob Challenge 2023 -- a competition designed to evaluate the performance of prediction models on standardized datasets to predict human mobility. The challenge leverages two datasets encompassing urban-scale data of 25,000 and 100,000 individuals over a longitudinal period of 75 days. GeoFormer stands out as a top performer in the competition, securing a place in the top-3 ranking. Its success is underscored by performing well on both performance metrics chosen for the competition -- the GEO-BLEU and the Dynamic Time Warping (DTW) measures. The performance of the GeoFormer on the HuMob Challenge 2023 underscores its potential to make substantial contributions to the field of human mobility prediction, with far-reaching implications for disaster preparedness, epidemic control, and beyond.
    摘要 预测人类流动具有重要的实用价值,应用范围从加强灾害风险规划到模拟流行病传播。本文提出了GeoFormer,一种由GPT架构改造而来的仅解码器Transformer模型,用于预测人类流动。我们在HuMob Challenge 2023中对该模型进行了严格测试;该竞赛使用标准化数据集评估人类流动预测模型的性能,其两个数据集分别包含25,000和100,000名个体、时间跨度75天的城市尺度数据。GeoFormer在竞赛中表现突出,跻身前三名,并在竞赛选用的两个性能指标(GEO-BLEU与动态时间规整DTW)上均取得优异成绩。GeoFormer在HuMob Challenge 2023中的表现表明,它有望为人类流动预测领域做出实质性贡献,并对灾害防备、疫情控制等领域产生深远影响。
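
A minimal decoder-only (GPT-style) next-location model in the spirit of GeoFormer is sketched below, assuming visited locations are discretized into grid-cell tokens. The tokenization, layer sizes, and training loop are placeholders rather than the authors' architecture.

```python
# Minimal GPT-style next-location sketch: grid cells as tokens, causal
# self-attention, cross-entropy on the next visited cell. Illustrative only.
import torch
import torch.nn as nn

class TinyGeoGPT(nn.Module):
    def __init__(self, n_cells=40000, d_model=128, n_layers=2, n_heads=4, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(n_cells, d_model)           # one token per grid cell
        self.pos = nn.Embedding(max_len, d_model)            # learned positions (time steps)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_cells)

    def forward(self, cells):                                 # cells: (batch, seq) of cell ids
        b, t = cells.shape
        h = self.tok(cells) + self.pos(torch.arange(t, device=cells.device))
        causal = torch.triu(torch.full((t, t), float("-inf"), device=cells.device), diagonal=1)
        h = self.blocks(h, mask=causal)                       # each step attends only to its past
        return self.head(h)                                   # (batch, seq, n_cells) logits

model = TinyGeoGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
seq = torch.randint(0, 40000, (8, 64))                        # placeholder trajectories of cell ids
logits = model(seq[:, :-1])                                   # predict the cell at t+1 from the prefix
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 40000), seq[:, 1:].reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()
```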

Generalized test utilities for long-tail performance in extreme multi-label classification

  • paper_url: http://arxiv.org/abs/2311.05081
  • repo_url: None
  • paper_authors: Erik Schultheis, Marek Wydmuch, Wojciech Kotłowski, Rohit Babbar, Krzysztof Dembczyński
  • for: 本文关注极端多标签分类(XMLC)任务,即从极大的候选标签集中选出一小部分相关标签,并研究如何度量和优化长尾标签上的性能。
  • methods: 提出以k为预算的广义("at k")指标作为替代方案,并在期望测试效用(ETU)框架下形式化其优化问题,推导最优预测规则并构造具有遗憾保证的高效近似算法。
  • results: 所提算法基于块坐标上升,可以轻松扩展到XMLC问题,并在实验中展现出良好的长尾性能。
    Abstract Extreme multi-label classification (XMLC) is the task of selecting a small subset of relevant labels from a very large set of possible labels. As such, it is characterized by long-tail labels, i.e., most labels have very few positive instances. With standard performance measures such as precision@k, a classifier can ignore tail labels and still report good performance. However, it is often argued that correct predictions in the tail are more interesting or rewarding, but the community has not yet settled on a metric capturing this intuitive concept. The existing propensity-scored metrics fall short on this goal by confounding the problems of long-tail and missing labels. In this paper, we analyze generalized metrics budgeted "at k" as an alternative solution. To tackle the challenging problem of optimizing these metrics, we formulate it in the expected test utility (ETU) framework, which aims at optimizing the expected performance on a fixed test set. We derive optimal prediction rules and construct computationally efficient approximations with provable regret guarantees and robustness against model misspecification. Our algorithm, based on block coordinate ascent, scales effortlessly to XMLC problems and obtains promising results in terms of long-tail performance.
    摘要 极端多标签分类(XMLC)的任务是从极大的候选标签集中选出一小部分相关标签,因此其标签呈长尾分布,即大多数标签只有极少的正例。在precision@k等标准性能度量下,分类器即使忽略尾部标签也能报告良好的性能。然而,人们通常认为尾部标签上的正确预测更有价值,但学界尚未就刻画这一直觉的度量达成共识;现有的倾向评分(propensity-scored)度量混淆了长尾与缺失标签两个问题,难以实现这一目标。本文分析了以k为预算的广义度量作为替代方案。为了解决优化这些度量这一具有挑战性的问题,我们在期望测试效用(ETU)框架下将其形式化,该框架旨在优化固定测试集上的期望性能。我们推导出最优预测规则,并构造了具有可证明遗憾保证、对模型误设稳健的高效近似算法。我们基于块坐标上升的算法可以轻松扩展到XMLC问题,并在长尾性能方面取得了可观的结果。
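
To make the expected-test-utility idea concrete, here is a simplified block-coordinate-ascent sketch for a macro-F1-style metric at k: given estimated label probabilities, it alternates between picking the k labels per instance with the highest current linearized gains and recomputing those gains from the expected confusion counts of the current selection. The gain formula and stopping rule are our simplifications, not the paper's exact algorithm or guarantees.

```python
# Simplified block-coordinate-ascent sketch for a macro-F1@k style utility under
# the expected-test-utility view: expected confusion counts are computed from the
# estimated label probabilities P, and each instance selects the k labels with
# the highest linearized gain. Illustrative, not the paper's exact algorithm.
import numpy as np

def macro_f1_at_k_bca(P, k, n_iters=10):
    """P: (n_instances, n_labels) estimated marginal label probabilities."""
    n, m = P.shape
    gains = P.copy()                                  # first pass ~ top-k by probability
    Y = np.zeros_like(P, dtype=bool)
    for _ in range(n_iters):
        # (a) per-instance block: pick the top-k labels under the current gains
        top = np.argpartition(-gains, k - 1, axis=1)[:, :k]
        Y[:] = False
        np.put_along_axis(Y, top, True, axis=1)
        # (b) per-label block: expected confusion counts of the current selection
        tp = (P * Y).sum(axis=0)                      # expected true positives per label
        sel = Y.sum(axis=0)                           # selected count = TP + FP (expected)
        pos = P.sum(axis=0)                           # expected positives = TP + FN
        denom = sel + pos + 1e-12
        # linearized marginal gain of selecting label j on an instance with prob p
        gains = 2 * (P * denom - tp) / denom**2
    return Y

rng = np.random.default_rng(0)
P = rng.beta(0.3, 3.0, size=(1000, 200))              # placeholder long-tailed probabilities
Y = macro_f1_at_k_bca(P, k=5)
tp, sel, pos = (P * Y).sum(0), Y.sum(0), P.sum(0)
print("expected macro-F1@5:", np.mean(2 * tp / (sel + pos + 1e-12)))
```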

Social Media Bot Detection using Dropout-GAN

  • paper_url: http://arxiv.org/abs/2311.05079
  • repo_url: None
  • paper_authors: Anant Shukla, Martin Jurecek, Mark Stamp
  • for: 检测社交媒体平台上的机器人活动,以维护在线讨论的可信度并防范网络犯罪。
  • methods: 使用生成对抗网络(GAN)进行机器人检测,通过多个判别器对抗同一个生成器进行训练以缓解模式坍塌问题,并将判别器解耦用于机器人检测、将生成器用于数据增强。
  • results: 我们的方法在该领域的分类准确率上超越了现有技术,同时展示了生成器可用于数据增强,以及如何利用生成器规避此类检测技术。
    Abstract Bot activity on social media platforms is a pervasive problem, undermining the credibility of online discourse and potentially leading to cybercrime. We propose an approach to bot detection using Generative Adversarial Networks (GAN). We discuss how we overcome the issue of mode collapse by utilizing multiple discriminators to train against one generator, while decoupling the discriminator to perform social media bot detection and utilizing the generator for data augmentation. In terms of classification accuracy, our approach outperforms the state-of-the-art techniques in this field. We also show how the generator in the GAN can be used to evade such a classification technique.
    摘要 社交媒体平台上的机器人活动是一个普遍存在的问题,它损害在线讨论的可信度,并可能导致网络犯罪。我们提出了一种基于生成对抗网络(GAN)的机器人检测方法。我们通过让多个判别器对抗同一个生成器进行训练来克服模式坍塌问题,同时将判别器解耦用于社交媒体机器人检测,并利用生成器进行数据增强。在分类准确率方面,我们的方法优于该领域的最新技术。此外,我们还展示了GAN中的生成器可以如何被用来规避这类分类技术。
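
A compact sketch of the multi-discriminator idea described above: one generator is trained against several discriminators on tabular account-feature vectors, with a random subset of discriminators kept at each step to mitigate mode collapse; a trained discriminator can then be reused to score accounts. Architecture sizes, the keep probability, and the data are our assumptions, not the paper's configuration.

```python
# Sketch: one generator vs. several discriminators ("discriminator dropout") on
# tabular account-feature vectors. Illustrative assumptions throughout.
import random
import torch
import torch.nn as nn

feat_dim, noise_dim, n_disc, keep_prob = 16, 32, 5, 0.6

def mlp(sizes, out_act=None):
    layers = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(a, b), nn.ReLU()]
    layers = layers[:-1] + ([out_act] if out_act is not None else [])
    return nn.Sequential(*layers)

G = mlp([noise_dim, 64, feat_dim])                        # generates fake "bot-like" feature vectors
Ds = [mlp([feat_dim, 64, 1], nn.Sigmoid()) for _ in range(n_disc)]
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = [torch.optim.Adam(d.parameters(), lr=2e-4) for d in Ds]
bce = nn.BCELoss()

real = torch.randn(512, feat_dim)                         # placeholder real account features
for step in range(200):
    batch = real[torch.randint(0, 512, (64,))]
    fake = G(torch.randn(64, noise_dim))
    # keep a random subset of discriminators this step (the "dropout")
    kept = [i for i in range(n_disc) if random.random() < keep_prob] or [0]
    for i in kept:                                        # discriminator updates
        opt_d[i].zero_grad()
        loss_d = bce(Ds[i](batch), torch.ones(64, 1)) + bce(Ds[i](fake.detach()), torch.zeros(64, 1))
        loss_d.backward(); opt_d[i].step()
    opt_g.zero_grad()                                     # generator update against the kept critics
    loss_g = sum(bce(Ds[i](fake), torch.ones(64, 1)) for i in kept) / len(kept)
    loss_g.backward(); opt_g.step()

# Any trained discriminator can then score new accounts: higher = more "real".
score = Ds[0](torch.randn(1, feat_dim)).item()
```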