cs.LG - 2023-11-20

Improvements in Interlayer Pipelining of CNN Accelerators Using Genetic Algorithms

  • paper_url: http://arxiv.org/abs/2311.12235
  • repo_url: None
  • paper_authors: Mark Horeni, Siddharth Joshi
  • for: Efficient execution of convolutional neural networks (CNNs) on edge platforms.
  • methods: A layer fusion technique for CNNs that reduces off-chip data transfers using a genetic algorithm (GA) applied to graph-based topological sorting.
  • results: For MobileNet-v3 on a SIMBA-like mobile architecture, a 1.8x increase in energy efficiency and a 1.9x improvement in energy-delay product (EDP); the approach consistently improves workloads, averaging a 1.4x EDP improvement for SIMBA and 1.12x for Eyeriss.
    Abstract Deploying Convolutional Neural Networks (CNNs) on edge platforms necessitates efficient hardware acceleration. Any unnecessary data movement in such accelerators can unacceptably degrade performance and efficiency. To address this, we develop a layer fusion technique targeting CNNs, that reduces off-chip data communication using a Genetic Algorithm (GA) applied to graph-based topological sort. Results show a 1.8$\times$ increase in energy efficiency and 1.9$\times$ improvement in energy-delay product (EDP) for MobileNet-v3 on a SIMBA-like mobile architecture. Our approach consistently improves workload performance, averaging 1.4$\times$ improvement to EDP for SIMBA and 1.12$\times$ for Eyeriss.
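    As a rough, hedged illustration of the search component (not the authors' implementation), the sketch below evolves topological orderings of a small hypothetical layer graph with a mutation-only genetic search and a placeholder off-chip transfer cost; the graph, cost model, and GA settings are all assumptions.
```python
# Minimal sketch: evolutionary search over topological orderings of a layer
# graph to minimize a placeholder off-chip transfer cost. The graph, cost
# model, and GA settings are illustrative, not the paper's implementation.
import random

# Hypothetical CNN layer graph: edges (producer -> consumer).
edges = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
num_layers = 6

def is_topological(order):
    pos = {layer: i for i, layer in enumerate(order)}
    return all(pos[u] < pos[v] for u, v in edges)

def off_chip_cost(order):
    # Placeholder: penalize the distance between producer and consumer in the
    # schedule; far-apart pairs are assumed to spill intermediates off-chip.
    pos = {layer: i for i, layer in enumerate(order)}
    return sum(max(0, pos[v] - pos[u] - 1) for u, v in edges)

def mutate(order):
    # Swap two adjacent layers only if the swap keeps the order topological.
    child = order[:]
    i = random.randrange(len(child) - 1)
    child[i], child[i + 1] = child[i + 1], child[i]
    return child if is_topological(child) else order

random.seed(0)
population = [list(range(num_layers)) for _ in range(20)]  # valid initial orders
for generation in range(50):
    population.sort(key=off_chip_cost)
    parents = population[:10]                      # truncation selection
    population = parents + [mutate(random.choice(parents)) for _ in range(10)]

best = min(population, key=off_chip_cost)
print("best order:", best, "cost:", off_chip_cost(best))
```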

Data-Guided Regulator for Adaptive Nonlinear Control

  • paper_url: http://arxiv.org/abs/2311.12230
  • repo_url: https://github.com/NiyoushaRahimi/Data-Guided-Regulator-for-Adaptive-Nonlinear-Control
  • paper_authors: Niyousha Rahimi, Mehran Mesbahi
  • for: This paper addresses the problem of designing a data-driven feedback controller for complex nonlinear dynamical systems in the presence of time-varying disturbances with unknown dynamics.
  • methods: The proposed method uses direct policy updates based on data-driven control, which can achieve fast regulation of system states without knowing the system dynamics.
  • results: The proposed method is effective in handling adverse environmental disturbances in a 6-DOF power descent guidance problem.
    Abstract This paper addresses the problem of designing a data-driven feedback controller for complex nonlinear dynamical systems in the presence of time-varying disturbances with unknown dynamics. Such disturbances are modeled as the "unknown" part of the system dynamics. The goal is to achieve finite-time regulation of system states through direct policy updates while also generating informative data that can subsequently be used for data-driven stabilization or system identification. First, we expand upon the notion of "regularizability" and characterize this system characteristic for a linear time-varying representation of the nonlinear system with locally-bounded higher-order terms. "Rapid-regularizability" then gauges the extent by which a system can be regulated in finite time, in contrast to its asymptotic behavior. We then propose the Data-Guided Regulation for Adaptive Nonlinear Control ( DG-RAN) algorithm, an online iterative synthesis procedure that utilizes discrete time-series data from a single trajectory for regulating system states and identifying disturbance dynamics. The effectiveness of our approach is demonstrated on a 6-DOF power descent guidance problem in the presence of adverse environmental disturbances.

Random Fourier Signature Features

  • paper_url: http://arxiv.org/abs/2311.12214
  • repo_url: None
  • paper_authors: Csaba Toth, Harald Oberhauser, Zoltan Szabo
  • for: A tensor-algebra-based similarity measure for sequences (the signature kernel), together with two scalable time series feature maps.
  • methods: Random Fourier features are used to accelerate the computation of the signature kernel while keeping it linear in the sequence length and number, and recent tensor projection techniques are used to derive two even more scalable time series features.
  • results: Experiments show that the reduction in computational cost comes at a negligible price in accuracy on moderate-sized datasets, and the method scales to large datasets of up to a million time series.
    Abstract Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length called the signature kernel accompanied with attractive theoretical guarantees from stochastic analysis. Previous algorithms to compute the signature kernel scale quadratically in terms of the length and the number of the sequences. To mitigate this severe computational bottleneck, we develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences. We show uniform approximation guarantees for the proposed unbiased estimator of the signature kernel, while keeping its computation linear in the sequence length and number. In addition, combined with recent advances on tensor projections, we derive two even more scalable time series features with favourable concentration properties and computational complexity both in time and memory. Our empirical results show that the reduction in computational cost comes at a negligible price in terms of accuracy on moderate-sized datasets, and it enables one to scale to large datasets up to a million time series.
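    For intuition only, the snippet below shows the generic random Fourier feature construction for a Gaussian kernel on Euclidean inputs; the paper lifts this acceleration principle to the signature kernel on sequences, which this vanilla sketch does not implement.
```python
# Minimal sketch of the generic random Fourier feature (RFF) idea: approximate
# a Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) by an inner
# product of low-dimensional random features.
import numpy as np

rng = np.random.default_rng(0)
d, D, sigma = 8, 512, 1.0          # input dim, feature dim, kernel bandwidth

W = rng.normal(scale=1.0 / sigma, size=(d, D))   # spectral samples
b = rng.uniform(0.0, 2.0 * np.pi, size=D)        # random phases

def rff(x):
    """Map x of shape (n, d) to features whose inner products approximate the kernel."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

x = rng.normal(size=(5, d))
y = rng.normal(size=(5, d))
exact = np.exp(-((x[:, None, :] - y[None, :, :]) ** 2).sum(-1) / (2 * sigma**2))
approx = rff(x) @ rff(y).T
print("max abs error:", np.abs(exact - approx).max())
```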

Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation

  • paper_url: http://arxiv.org/abs/2311.12199
  • repo_url: None
  • paper_authors: Chenyang Gao, Yue Gu, Ivan Marsic
  • for: solve excessive label assignment switching and layer-decoupling issues in supervised speech separation using permutation invariant training (PIT)
  • methods: dynamic sample dropout (DSD) and layer-wise optimization (LO)
  • results: outperforms the baseline and improves the performance of speech separation tasks
    Abstract In supervised speech separation, permutation invariant training (PIT) is widely used to handle label ambiguity by selecting the best permutation to update the model. Despite its success, previous studies showed that PIT is plagued by excessive label assignment switching in adjacent epochs, impeding the model to learn better label assignments. To address this issue, we propose a novel training strategy, dynamic sample dropout (DSD), which considers previous best label assignments and evaluation metrics to exclude the samples that may negatively impact the learned label assignments during training. Additionally, we include layer-wise optimization (LO) to improve the performance by solving layer-decoupling. Our experiments showed that combining DSD and LO outperforms the baseline and solves excessive label assignment switching and layer-decoupling issues. The proposed DSD and LO approach is easy to implement, requires no extra training sets or steps, and shows generality to various speech separation tasks.
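    A hedged sketch of the mechanism (PyTorch): an utterance-level PIT loss plus a dynamic-sample-dropout-style rule that skips samples whose best permutation just switched. The model(mix) interface, the MSE criterion, and the exact dropout rule are placeholder assumptions; the paper's DSD also takes evaluation metrics into account.
```python
# Illustrative sketch of permutation invariant training (PIT) with a
# dynamic-sample-dropout-style rule: track each sample's best permutation
# across steps and drop samples whose assignment just switched from the loss.
from itertools import permutations
import torch

def pit_loss(est, ref):
    """est, ref: (batch, n_src, time). Returns per-sample loss and best permutation id."""
    n_src = ref.shape[1]
    losses = []
    for perm in permutations(range(n_src)):
        mse = ((est[:, list(perm), :] - ref) ** 2).mean(dim=(1, 2))  # (batch,)
        losses.append(mse)
    losses = torch.stack(losses, dim=1)            # (batch, n_perm)
    best_loss, best_perm = losses.min(dim=1)
    return best_loss, best_perm

prev_perm = {}                                      # sample id -> last best perm

def dsd_step(model, optimizer, mix, ref, sample_ids):
    est = model(mix)                                # assumed (batch, n_src, time)
    loss, perm = pit_loss(est, ref)
    keep = torch.ones_like(loss, dtype=torch.bool)
    for i, sid in enumerate(sample_ids):
        if sid in prev_perm and prev_perm[sid] != int(perm[i]):
            keep[i] = False                         # assignment switched: drop sample
        prev_perm[sid] = int(perm[i])
    optimizer.zero_grad()
    (loss[keep].mean() if keep.any() else loss.mean() * 0.0).backward()
    optimizer.step()
    return loss.detach().mean()
```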

Node classification in random trees

  • paper_url: http://arxiv.org/abs/2311.12167
  • repo_url: https://github.com/wouterwln/neuralfactortrees
  • paper_authors: Wouter W. L. Nuijten, Vlado Menkovski
  • for: Node classification in random trees, i.e., modeling a distribution over node label assignments when the tree topology is not predetermined and no labels are observed at inference time.
  • methods: A Markov network with the topology of the random tree defines a Gibbs distribution, parameterized by a Graph Neural Network over the tree and node embeddings; MCMC is used to sample node label assignments from this distribution.
  • results: On the Stanford Sentiment Treebank dataset, the method outperforms the baselines, demonstrating its effectiveness for modeling joint distributions of node labels in random trees.
    Abstract We propose a method for the classification of objects that are structured as random trees. Our aim is to model a distribution over the node label assignments in settings where the tree data structure is associated with node attributes (typically high dimensional embeddings). The tree topology is not predetermined and none of the label assignments are present during inference. Other methods that produce a distribution over node label assignment in trees (or more generally in graphs) either assume conditional independence of the label assignment, operate on a fixed graph topology, or require part of the node labels to be observed. Our method defines a Markov Network with the corresponding topology of the random tree and an associated Gibbs distribution. We parameterize the Gibbs distribution with a Graph Neural Network that operates on the random tree and the node embeddings. This allows us to estimate the likelihood of node assignments for a given random tree and use MCMC to sample from the distribution of node assignments. We evaluate our method on the tasks of node classification in trees on the Stanford Sentiment Treebank dataset. Our method outperforms the baselines on this dataset, demonstrating its effectiveness for modeling joint distributions of node labels in random trees.
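    A minimal sketch of the sampling side: Gibbs (MCMC) sweeps over node labels of a tree-structured Markov network. Here the unary scores and the label-compatibility matrix are random placeholders, whereas in the paper they are produced by a Graph Neural Network over the random tree and node embeddings.
```python
# Minimal sketch: Gibbs (MCMC) sampling of joint node labels on a tree-shaped
# Markov network. Unary and pairwise potentials below are random stand-ins
# for the GNN-parameterized potentials used in the paper.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_labels, emb_dim = 7, 3, 16
parent = [-1, 0, 0, 1, 1, 2, 2]                 # tree given as a parent array
edges = [(i, parent[i]) for i in range(1, n_nodes)]
neighbors = {i: [] for i in range(n_nodes)}
for u, v in edges:
    neighbors[u].append(v)
    neighbors[v].append(u)

embeddings = rng.normal(size=(n_nodes, emb_dim))
W_unary = rng.normal(size=(emb_dim, n_labels)) * 0.1   # placeholder "GNN" head
pairwise = rng.normal(size=(n_labels, n_labels)) * 0.5 # label compatibility

def conditional_logits(node, labels):
    # Unnormalized log-probabilities of this node's label given its neighbors.
    logits = embeddings[node] @ W_unary
    for nb in neighbors[node]:
        logits = logits + pairwise[:, labels[nb]]
    return logits

def gibbs_sample(n_sweeps=100):
    labels = rng.integers(0, n_labels, size=n_nodes)
    for _ in range(n_sweeps):
        for node in range(n_nodes):
            logits = conditional_logits(node, labels)
            p = np.exp(logits - logits.max())
            labels[node] = rng.choice(n_labels, p=p / p.sum())
    return labels

print(gibbs_sample())
```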

Creating Temporally Correlated High-Resolution Power Injection Profiles Using Physics-Aware GAN

  • paper_url: http://arxiv.org/abs/2311.12166
  • repo_url: None
  • paper_authors: Hritik Gopal Shah, Behrouz Azimian, Anamitra Pal
  • for: solves the problem of lacking granularity in traditional smart meter measurements, enabling real-time decision-making.
  • methods: uses generative adversarial networks (GAN) with hard inequality constraints and convex optimization layer to enforce temporal consistency and create minutely interval temporally-correlated instantaneous power injection profiles from 15-minute average power consumption information.
  • results: successfully creates high-resolution power injection profiles from slow timescale aggregated power information, offering a promising avenue for improved high-speed state estimation in distribution systems.
    Abstract Traditional smart meter measurements lack the granularity needed for real-time decision-making. To address this practical problem, we create a generative adversarial networks (GAN) model that enforces temporal consistency on its high-resolution outputs via hard inequality constraints using a convex optimization layer. A unique feature of our GAN model is that it is trained solely on slow timescale aggregated power information obtained from historical smart meter data. The results demonstrate that the model can successfully create minutely interval temporally-correlated instantaneous power injection profiles from 15-minute average power consumption information. This innovative approach, emphasizing inter-neuron constraints, offers a promising avenue for improved high-speed state estimation in distribution systems and enhances the applicability of data-driven solutions for monitoring such systems.
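    As a simplified illustration (not the paper's convex optimization layer, which enforces hard inequality constraints inside the GAN), the snippet below projects a generated 1-minute profile so that every 15-minute block averages to the observed smart-meter value; the numbers and shapes are illustrative.
```python
# Illustrative sketch of one physical-consistency constraint: project generated
# 1-minute power profiles so every 15-minute block averages to the observed
# smart-meter value. This closed-form equality projection only conveys the idea
# of constraining generator outputs to match slow-timescale aggregates.
import numpy as np

def project_to_block_averages(profile_1min, avg_15min):
    """profile_1min: shape (n_blocks * 15,); avg_15min: shape (n_blocks,)."""
    blocks = profile_1min.reshape(-1, 15)
    # Shift each block by a constant so its mean matches the 15-min average;
    # this is the Euclidean projection onto the affine constraint set.
    shift = avg_15min - blocks.mean(axis=1)
    return (blocks + shift[:, None]).reshape(-1)

rng = np.random.default_rng(0)
raw = rng.normal(loc=2.0, scale=0.5, size=4 * 15)   # generator output (kW)
meter = np.array([1.8, 2.2, 2.0, 1.9])              # 15-min averages (kW)
constrained = project_to_block_averages(raw, meter)
print(constrained.reshape(-1, 15).mean(axis=1))     # matches `meter`
```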

Quantum Inception Score

  • paper_url: http://arxiv.org/abs/2311.12163
  • repo_url: None
  • paper_authors: Akira Sone, Naoki Yamamoto
  • for: A metric for evaluating the quality of quantum generative models.
  • methods: A quantum inception score that relates model quality to the classical capacity of the quantum channel that classifies a given dataset.
  • results: Under this measure, quantum generative models provide better quality than their classical counterparts thanks to quantum coherence and entanglement, and the quantum fluctuation theorem is used to characterize the physical limitations on the quality of quantum generative models.
    Abstract Motivated by the great success of classical generative models in machine learning, enthusiastic exploration of their quantum version has recently started. To depart on this journey, it is important to develop a relevant metric to evaluate the quality of quantum generative models; in the classical case, one such examples is the inception score. In this paper, we propose the quantum inception score, which relates the quality to the classical capacity of the quantum channel that classifies a given dataset. We prove that, under this proposed measure, the quantum generative models provide better quality than their classical counterparts because of the presence of quantum coherence and entanglement. Finally, we harness the quantum fluctuation theorem to characterize the physical limitation of the quality of quantum generative models.
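    For reference, the classical inception score that the quantum version generalizes can be written as follows; the paper's quantum analogue ties model quality to the classical capacity of the quantum channel that classifies the dataset.
```latex
% Classical inception score, shown for reference; p_G is the generator's
% distribution and p(y|x) the classifier's label distribution.
\mathrm{IS}(G) \;=\; \exp\!\Big( \mathbb{E}_{x \sim p_G}\big[ D_{\mathrm{KL}}\big( p(y \mid x)\,\big\|\, p(y) \big) \big] \Big)
```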

Risk-averse Batch Active Inverse Reward Design

  • paper_url: http://arxiv.org/abs/2311.12004
  • repo_url: https://github.com/pliam1105/RBAIRD
  • paper_authors: Panagiotis Liampas
  • for: This paper proposes a new method called Risk-averse Batch Active Inverse Reward Design (RBAIRD) to help train AI models that can adapt to real-world scenarios and learn how to treat unknown features.
  • methods: RBAIRD uses a series of queries to compute a probability distribution over the intended reward function, and then uses this distribution to construct batches of environments that the agent encounters in the real world. It also integrates a risk-averse planner to ensure safety while the agent is learning the reward function.
  • results: Compared to previous approaches, RBAIRD outperformed in terms of efficiency, accuracy, and action certainty, and demonstrated quick adaptability to new, unknown features. It can be more widely used for the alignment of crucial, powerful AI models.
    Abstract Designing a perfect reward function that depicts all the aspects of the intended behavior is almost impossible, especially generalizing it outside of the training environments. Active Inverse Reward Design (AIRD) proposed the use of a series of queries, comparing possible reward functions in a single training environment. This allows the human to give information to the agent about suboptimal behaviors, in order to compute a probability distribution over the intended reward function. However, it ignores the possibility of unknown features appearing in real-world environments, and the safety measures needed until the agent completely learns the reward function. I improved this method and created Risk-averse Batch Active Inverse Reward Design (RBAIRD), which constructs batches, sets of environments the agent encounters when being used in the real world, processes them sequentially, and, for a predetermined number of iterations, asks queries that the human needs to answer for each environment of the batch. After this process is completed in one batch, the probabilities have been improved and are transferred to the next batch. This makes it capable of adapting to real-world scenarios and learning how to treat unknown features it encounters for the first time. I also integrated a risk-averse planner, similar to that of Inverse Reward Design (IRD), which samples a set of reward functions from the probability distribution and computes a trajectory that takes the most certain rewards possible. This ensures safety while the agent is still learning the reward function, and enables the use of this approach in situations where cautiousness is vital. RBAIRD outperformed the previous approaches in terms of efficiency, accuracy, and action certainty, demonstrated quick adaptability to new, unknown features, and can be more widely used for the alignment of crucial, powerful AI models.
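    A minimal sketch of the risk-averse planning step described above: sample reward functions from the current distribution over intended rewards and pick the candidate trajectory with the best worst-case return. The Gaussian posterior, feature summaries, and min-over-samples criterion are placeholder assumptions, not RBAIRD's exact components.
```python
# Minimal sketch of risk-averse planning over sampled reward functions: choose
# the candidate trajectory whose worst-case return across reward samples is
# highest. Posterior, features, and candidates are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_samples, n_candidates = 4, 50, 5

# Stand-in posterior over reward weights (the paper maintains a distribution
# refined by human queries).
reward_samples = rng.normal(loc=0.5, scale=0.3, size=(n_samples, n_features))

# Each candidate trajectory is summarized by its feature counts.
trajectory_features = rng.uniform(0.0, 1.0, size=(n_candidates, n_features))

returns = trajectory_features @ reward_samples.T     # (candidates, samples)
worst_case = returns.min(axis=1)                     # risk-averse criterion
best = int(np.argmax(worst_case))
print("risk-averse choice:", best, "worst-case return:", worst_case[best])
```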

Machine-Learned Atomic Cluster Expansion Potentials for Fast and Quantum-Accurate Thermal Simulations of Wurtzite AlN

  • paper_url: http://arxiv.org/abs/2311.11990
  • repo_url: None
  • paper_authors: Guang Yang, Yuan-Bin Liu, Lei Yang, Bing-Yang Cao
  • for: A machine-learned interatomic potential, built with the atomic cluster expansion (ACE) framework, for fast and quantum-accurate modeling of the phonon transport properties of wurtzite aluminum nitride (w-AlN).
  • methods: The ACE potential is trained against density functional theory (DFT) data and validated on ground-state lattice parameters, specific heat capacity, thermal expansion coefficients, bulk modulus, harmonic phonon dispersions, and lattice thermal conductivity.
  • results: The predicted thermal conductivity agrees with DFT calculations and experiments, demonstrating that the potential captures anharmonic phonon interactions; a lattice dynamics analysis further shows how biaxial strain tunes the thermal conductivity and phonon properties of w-AlN.
    Abstract Using the atomic cluster expansion (ACE) framework, we develop a machine learning interatomic potential for fast and accurately modelling the phonon transport properties of wurtzite aluminum nitride. The predictive power of the ACE potential against density functional theory (DFT) is demonstrated across a broad range of properties of w-AlN, including ground-state lattice parameters, specific heat capacity, coefficients of thermal expansion, bulk modulus, and harmonic phonon dispersions. Validation of lattice thermal conductivity is further carried out by comparing the ACE-predicted values to the DFT calculations and experiments, exhibiting the overall capability of our ACE potential in sufficiently describing anharmonic phonon interactions. As a practical application, we perform a lattice dynamics analysis using the potential to unravel the effects of biaxial strains on thermal conductivity and phonon properties of w-AlN, which is identified as a significant tuning factor for near-junction thermal design of w-AlN-based electronics.

Provably Efficient CVaR RL in Low-rank MDPs

  • paper_url: http://arxiv.org/abs/2311.11965
  • repo_url: None
  • paper_authors: Yulai Zhao, Wenhao Zhan, Xiaoyan Hu, Ho-fung Leung, Farzan Farnia, Wen Sun, Jason D. Lee
  • for: The paper aims to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$ in low-rank Markov Decision Processes (MDPs) with nonlinear function approximation.
  • methods: The paper proposes a novel Upper Confidence Bound (UCB) bonus-driven algorithm that balances exploration, exploitation, and representation learning in CVaR RL. The algorithm uses a discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle.
  • results: The paper achieves a sample complexity of $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of the action space, and $d$ is the dimension of representations. The algorithm is provably efficient in low-rank MDPs and can find the near-optimal policy in polynomial running time with a Maximum Likelihood Estimation oracle.
    Abstract We study risk-sensitive Reinforcement Learning (RL), where we aim to maximize the Conditional Value at Risk (CVaR) with a fixed risk tolerance $\tau$. Prior theoretical work studying risk-sensitive RL focuses on the tabular Markov Decision Processes (MDPs) setting. To extend CVaR RL to settings where state space is large, function approximation must be deployed. We study CVaR RL in low-rank MDPs with nonlinear function approximation. Low-rank MDPs assume the underlying transition kernel admits a low-rank decomposition, but unlike prior linear models, low-rank MDPs do not assume the feature or state-action representation is known. We propose a novel Upper Confidence Bound (UCB) bonus-driven algorithm to carefully balance the interplay between exploration, exploitation, and representation learning in CVaR RL. We prove that our algorithm achieves a sample complexity of $\tilde{O}\left(\frac{H^7 A^2 d^4}{\tau^2 \epsilon^2}\right)$ to yield an $\epsilon$-optimal CVaR, where $H$ is the length of each episode, $A$ is the capacity of action space, and $d$ is the dimension of representations. Computational-wise, we design a novel discretized Least-Squares Value Iteration (LSVI) algorithm for the CVaR objective as the planning oracle and show that we can find the near-optimal policy in a polynomial running time with a Maximum Likelihood Estimation oracle. To our knowledge, this is the first provably efficient CVaR RL algorithm in low-rank MDPs.
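    For reference, the CVaR objective at risk tolerance $\tau$ can be written in the standard Rockafellar-Uryasev form below, where $Z$ denotes the (random) return being maximized.
```latex
% Lower-tail CVaR of a return Z at risk tolerance tau, i.e. the expected
% return over the worst tau-fraction of outcomes.
\mathrm{CVaR}_\tau(Z) \;=\; \max_{b \in \mathbb{R}} \Big\{\, b - \tfrac{1}{\tau}\,\mathbb{E}\big[(b - Z)_+\big] \,\Big\}
```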

Estimation of entropy-regularized optimal transport maps between non-compactly supported measures

  • paper_url: http://arxiv.org/abs/2311.11934
  • repo_url: https://github.com/mattwerenski/entropic-map
  • paper_authors: Matthew Werenski, James M. Murphy, Shuchin Aeron
  • for: Estimating entropy-regularized optimal transport (EOT) maps with squared-Euclidean cost between source and target measures that are subGaussian rather than compactly supported.
  • methods: A recently proposed in-sample estimator is analyzed via a bias-variance decomposition, with the variance controlled by standard concentration of measure results and the bias handled through T1-transport inequalities and sample complexity results for EOT cost estimation under subGaussian assumptions.
  • results: When the target measure is compactly supported or strongly log-concave, the expected squared $L^2$-error decays at least as fast as $O(n^{-1/3})$; in the general subGaussian case, the expected $L^1$-error decays at least as fast as $O(n^{-1/6})$, with polynomial dependence on the regularization parameter in both cases.
    Abstract This paper addresses the problem of estimating entropy-regularized optimal transport (EOT) maps with squared-Euclidean cost between source and target measures that are subGaussian. In the case that the target measure is compactly supported or strongly log-concave, we show that for a recently proposed in-sample estimator, the expected squared $L^2$-error decays at least as fast as $O(n^{-1/3})$ where $n$ is the sample size. For the general subGaussian case we show that the expected $L^1$-error decays at least as fast as $O(n^{-1/6})$, and in both cases we have polynomial dependence on the regularization parameter. While these results are suboptimal compared to known results in the case of compactness of both the source and target measures (squared $L^2$-error converging at a rate $O(n^{-1})$) and for when the source is subGaussian while the target is compactly supported (squared $L^2$-error converging at a rate $O(n^{-1/2})$), their importance lie in eliminating the compact support requirements. The proof technique makes use of a bias-variance decomposition where the variance is controlled using standard concentration of measure results and the bias is handled by T1-transport inequalities along with sample complexity results in estimation of EOT cost under subGaussian assumptions. Our experimental results point to a looseness in controlling the variance terms and we conclude by posing several open problems.
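    A minimal numpy sketch of the in-sample entropic map estimator analyzed in the paper: run Sinkhorn between the two empirical measures under squared-Euclidean cost and take the barycentric projection of the entropic plan; the regularization value and iteration count here are illustrative.
```python
# Minimal sketch of the in-sample entropic map estimator: Sinkhorn between two
# empirical measures with squared-Euclidean cost, then the barycentric
# projection of the entropic plan at each source sample.
import numpy as np

def entropic_map(X, Y, eps=1.0, n_iter=500):
    n, m = len(X), len(Y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared-Euclidean cost
    K = np.exp(-C / eps)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                               # Sinkhorn iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                       # entropic plan
    return (P @ Y) / P.sum(axis=1, keepdims=True)         # barycentric projection

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = rng.normal(loc=2.0, size=(200, 2))
T_hat = entropic_map(X, Y)       # estimated image of each source point
print(T_hat[:3])
```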

Deep Calibration of Market Simulations using Neural Density Estimators and Embedding Networks

  • paper_url: http://arxiv.org/abs/2311.11913
  • repo_url: None
  • paper_authors: Namid R. Stillman, Rory Baggott, Justin Lyon, Jianfei Zhang, Dingqiu Zhu, Tao Chen, Perukrishnen Vytelingum
  • for: This paper is written for those who are interested in developing realistic simulators of financial exchanges, and who want to use deep learning techniques to calibrate these simulators to specific periods of trading.
  • methods: The paper uses deep learning techniques, specifically neural density estimators and embedding networks, to calibrate market simulators to a specific period of trading.
  • results: The paper demonstrates that its approach is able to correctly identify high probability parameter sets, both when applied to synthetic and historical data, and without reliance on manually selected or weighted ensembles of stylised facts.
    Abstract The ability to construct a realistic simulator of financial exchanges, including reproducing the dynamics of the limit order book, can give insight into many counterfactual scenarios, such as a flash crash, a margin call, or changes in macroeconomic outlook. In recent years, agent-based models have been developed that reproduce many features of an exchange, as summarised by a set of stylised facts and statistics. However, the ability to calibrate simulators to a specific period of trading remains an open challenge. In this work, we develop a novel approach to the calibration of market simulators by leveraging recent advances in deep learning, specifically using neural density estimators and embedding networks. We demonstrate that our approach is able to correctly identify high probability parameter sets, both when applied to synthetic and historical data, and without reliance on manually selected or weighted ensembles of stylised facts.

Certification of Distributional Individual Fairness

  • paper_url: http://arxiv.org/abs/2311.11911
  • repo_url: None
  • paper_authors: Matthew Wicker, Vihari Piratia, Adrian Weller
  • for: Formal guarantees (certificates) of individual fairness (IF) for neural networks.
  • methods: A novel convex approximation of IF constraints that sharply reduces the computational cost of certifying local individual fairness, combined with quasi-convex optimization techniques that yield efficient certified bounds on distributional individual fairness.
  • results: The method certifies and regularizes neural networks several orders of magnitude larger than those handled by prior work, and the bounds remain a scalable, practical, and sound source of IF guarantees under real-world distribution shifts.
    Abstract Providing formal guarantees of algorithmic fairness is of paramount importance to socially responsible deployment of machine learning algorithms. In this work, we study formal guarantees, i.e., certificates, for individual fairness (IF) of neural networks. We start by introducing a novel convex approximation of IF constraints that exponentially decreases the computational cost of providing formal guarantees of local individual fairness. We highlight that prior methods are constrained by their focus on global IF certification and can therefore only scale to models with a few dozen hidden neurons, thus limiting their practical impact. We propose to certify distributional individual fairness which ensures that for a given empirical distribution and all distributions within a $\gamma$-Wasserstein ball, the neural network has guaranteed individually fair predictions. Leveraging developments in quasi-convex optimization, we provide novel and efficient certified bounds on distributional individual fairness and show that our method allows us to certify and regularize neural networks that are several orders of magnitude larger than those considered by prior works. Moreover, we study real-world distribution shifts and find our bounds to be a scalable, practical, and sound source of IF guarantees.
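    For context, the snippet below computes the coarse global Lipschitz upper bound of a feedforward ReLU network (the product of layer spectral norms); the paper's certificates of local and distributional individual fairness are substantially tighter, and this baseline quantity is shown only to ground the terminology.
```python
# For context only: the coarse global Lipschitz upper bound of a feedforward
# ReLU network, i.e. the product of the layers' spectral norms. ReLU is
# 1-Lipschitz, so this product upper-bounds the network's Lipschitz constant.
import numpy as np

rng = np.random.default_rng(0)
weights = [rng.normal(size=(64, 32)), rng.normal(size=(32, 32)), rng.normal(size=(32, 1))]

# np.linalg.norm(W, ord=2) is the largest singular value of W.
global_bound = float(np.prod([np.linalg.norm(W, ord=2) for W in weights]))
print("global Lipschitz upper bound:", global_bound)
```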

Real-Time Surface-to-Air Missile Engagement Zone Prediction Using Simulation and Machine Learning

  • paper_url: http://arxiv.org/abs/2311.11905
  • repo_url: https://github.com/jpadantas/sam-ez
  • paper_authors: Joao P. A. Dantas, Diego Geraldo, Felipe L. L. Medeiros, Marcos R. O. A. Maximo, Takashi Yoneyama
  • for: Improving the effectiveness of surface-to-air missiles (SAMs) in modern air defense by rapidly computing the Engagement Zone (EZ), the spatial region within which a SAM can effectively engage a target.
  • methods: Supervised machine learning algorithms are trained on a comprehensive dataset of pre-computed SAM EZ simulations, produced with a custom-designed simulation tool, so that the EZ can be predicted accurately for new input parameters.
  • results: The approach accelerates SAM EZ simulations and provides real-time insights, improving air defense strategic planning and SAM system performance; a comparative analysis of machine learning algorithms highlights their capabilities and performance metrics and suggests directions for future research.
    Abstract Surface-to-Air Missiles (SAMs) are crucial in modern air defense systems. A critical aspect of their effectiveness is the Engagement Zone (EZ), the spatial region within which a SAM can effectively engage and neutralize a target. Notably, the EZ is intrinsically related to the missile's maximum range; it defines the furthest distance at which a missile can intercept a target. The accurate computation of this EZ is essential but challenging due to the dynamic and complex factors involved, which often lead to high computational costs and extended processing times when using conventional simulation methods. In light of these challenges, our study investigates the potential of machine learning techniques, proposing an approach that integrates machine learning with a custom-designed simulation tool to train supervised algorithms. We leverage a comprehensive dataset of pre-computed SAM EZ simulations, enabling our model to accurately predict the SAM EZ for new input parameters. It accelerates SAM EZ simulations, enhances air defense strategic planning, and provides real-time insights, improving SAM system performance. The study also includes a comparative analysis of machine learning algorithms, illuminating their capabilities and performance metrics and suggesting areas for future research, highlighting the transformative potential of machine learning in SAM EZ simulations.

Measuring and Mitigating Biases in Motor Insurance Pricing

  • paper_url: http://arxiv.org/abs/2311.11900
  • repo_url: None
  • paper_authors: Mulah Moriah, Franck Vermet, Arthur Charpentier
  • for: The paper aims to provide a comprehensive set of tools for insurers to adopt fairer pricing strategies in the context of automobile insurance, while ensuring consistency and performance.
  • methods: The paper uses a range of statistical methodologies and available data to construct optimal pricing structures that align with the overarching corporate strategy and accommodate market competition.
  • results: The study assesses the effectiveness of these tools through practical application in the context of automobile insurance, with a focus on equitable premiums, age-based premium fairness, and the consideration of new dimensions for evaluating fairness, such as the presence of serious illnesses or disabilities.
    Abstract The non-life insurance sector operates within a highly competitive and tightly regulated framework, confronting a pivotal juncture in the formulation of pricing strategies. Insurers are compelled to harness a range of statistical methodologies and available data to construct optimal pricing structures that align with the overarching corporate strategy while accommodating the dynamics of market competition. Given the fundamental societal role played by insurance, premium rates are subject to rigorous scrutiny by regulatory authorities. These rates must conform to principles of transparency, explainability, and ethical considerations. Consequently, the act of pricing transcends mere statistical calculations and carries the weight of strategic and societal factors. These multifaceted concerns may drive insurers to establish equitable premiums, taking into account various variables. For instance, regulations mandate the provision of equitable premiums, considering factors such as policyholder gender or mutualist group dynamics in accordance with respective corporate strategies. Age-based premium fairness is also mandated. In certain insurance domains, variables such as the presence of serious illnesses or disabilities are emerging as new dimensions for evaluating fairness. Regardless of the motivating factor prompting an insurer to adopt fairer pricing strategies for a specific variable, the insurer must possess the capability to define, measure, and ultimately mitigate any ethical biases inherent in its pricing practices while upholding standards of consistency and performance. This study seeks to provide a comprehensive set of tools for these endeavors and assess their effectiveness through practical application in the context of automobile insurance.

AMES: A Differentiable Embedding Space Selection Framework for Latent Graph Inference

  • paper_url: http://arxiv.org/abs/2311.11891
  • repo_url: None
  • paper_authors: Yuan Lu, Haitz Sáez de Ocáriz Borde, Pietro Liò
  • for: Latent graph inference, which lets Graph Neural Networks (GNNs) operate on point cloud data by dynamically learning the graph structure the data requires.
  • methods: The Attentional Multi-Embedding Selection (AMES) framework, a differentiable method that selects the best embedding space for latent graph inference through backpropagation, taking the downstream task into account.
  • results: Across five benchmark datasets, the method achieves comparable or superior results to previous latent graph inference approaches without requiring multiple experiments to identify the optimal embedding space; an interpretability technique that tracks the gradient contributions of the different latent graphs sheds light on how the attention-based, fully differentiable approach learns to choose the appropriate latent space.
    Abstract In real-world scenarios, although data entities may possess inherent relationships, the specific graph illustrating their connections might not be directly accessible. Latent graph inference addresses this issue by enabling Graph Neural Networks (GNNs) to operate on point cloud data, dynamically learning the necessary graph structure. These graphs are often derived from a latent embedding space, which can be modeled using Euclidean, hyperbolic, spherical, or product spaces. However, currently, there is no principled differentiable method for determining the optimal embedding space. In this work, we introduce the Attentional Multi-Embedding Selection (AMES) framework, a differentiable method for selecting the best embedding space for latent graph inference through backpropagation, considering a downstream task. Our framework consistently achieves comparable or superior results compared to previous methods for latent graph inference across five benchmark datasets. Importantly, our approach eliminates the need for conducting multiple experiments to identify the optimal embedding space. Furthermore, we explore interpretability techniques that track the gradient contributions of different latent graphs, shedding light on how our attention-based, fully differentiable approach learns to choose the appropriate latent space. In line with previous works, our experiments emphasize the advantages of hyperbolic spaces in enhancing performance. More importantly, our interpretability framework provides a general approach for quantitatively comparing embedding spaces across different tasks based on their contributions, a dimension that has been overlooked in previous literature on latent graph inference.

Efficient Neural Networks for Tiny Machine Learning: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2311.11883
  • repo_url: None
  • paper_authors: Minh Tri Lê, Pierre Wolinski, Julyan Arbel
  • for: A review of efficient neural networks and the deployment of deep learning models on ultra-low-power microcontrollers (MCUs) for Tiny Machine Learning (TinyML) applications.
  • methods: The review covers model compression, quantization, and low-rank factorization for optimizing neural network architectures under tight resource constraints on MCUs, as well as MEMS-based applications on ultra-low-power MCUs.
  • results: It surveys deployment techniques such as model pruning, hardware acceleration, and algorithm-architecture co-design for efficient deployment on MCUs, and summarizes current limitations, including the trade-off between model complexity and resource constraints.
    Abstract The field of Tiny Machine Learning (TinyML) has gained significant attention due to its potential to enable intelligent applications on resource-constrained devices. This review provides an in-depth analysis of the advancements in efficient neural networks and the deployment of deep learning models on ultra-low power microcontrollers (MCUs) for TinyML applications. It begins by introducing neural networks and discussing their architectures and resource requirements. It then explores MEMS-based applications on ultra-low power MCUs, highlighting their potential for enabling TinyML on resource-constrained devices. The core of the review centres on efficient neural networks for TinyML. It covers techniques such as model compression, quantization, and low-rank factorization, which optimize neural network architectures for minimal resource utilization on MCUs. The paper then delves into the deployment of deep learning models on ultra-low power MCUs, addressing challenges such as limited computational capabilities and memory resources. Techniques like model pruning, hardware acceleration, and algorithm-architecture co-design are discussed as strategies to enable efficient deployment. Lastly, the review provides an overview of current limitations in the field, including the trade-off between model complexity and resource constraints. Overall, this review paper presents a comprehensive analysis of efficient neural networks and deployment strategies for TinyML on ultra-low-power MCUs. It identifies future research directions for unlocking the full potential of TinyML applications on resource-constrained devices.
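    As a tiny, framework-agnostic illustration of one technique the review covers, the snippet below performs uniform affine post-training quantization of a weight tensor to int8 with a per-tensor scale and zero point; real TinyML toolchains provide this (and per-channel variants) out of the box.
```python
# Tiny illustration of uniform affine post-training quantization to int8 with
# a per-tensor scale and zero point, one of the compression techniques the
# review discusses.
import numpy as np

def quantize_int8(w):
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = np.round(-128 - lo / scale)
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(128, 128)).astype(np.float32)
q, s, z = quantize_int8(w)
print("max reconstruction error:", np.abs(w - dequantize(q, s, z)).max())
```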

Forward Gradients for Data-Driven CFD Wall Modeling

  • paper_url: http://arxiv.org/abs/2311.11876
  • repo_url: None
  • paper_authors: Jan Hückelheim, Tadbhagya Kumar, Krishnan Raghavan, Pinaki Pal
  • for: Improving the accuracy and efficiency of CFD simulations of wall-bounded flows.
  • methods: Machine learning and other data-driven methods complement existing wall models; the models are trained with forward gradients, which avoid the large computational effort and memory footprint demanded by back-propagation.
  • results: An unbiased gradient estimate is computed in a single forward sweep, with no separate forward and backward sweeps and no storage of intermediate results, enabling more efficient training of a subgrid wall model that could serve as a surrogate in wall-bounded flow CFD simulations while preserving predictive accuracy.
    Abstract Computational Fluid Dynamics (CFD) is used in the design and optimization of gas turbines and many other industrial/ scientific applications. However, the practical use is often limited by the high computational cost, and the accurate resolution of near-wall flow is a significant contributor to this cost. Machine learning (ML) and other data-driven methods can complement existing wall models. Nevertheless, training these models is bottlenecked by the large computational effort and memory footprint demanded by back-propagation. Recent work has presented alternatives for computing gradients of neural networks where a separate forward and backward sweep is not needed and storage of intermediate results between sweeps is not required because an unbiased estimator for the gradient is computed in a single forward sweep. In this paper, we discuss the application of this approach for training a subgrid wall model that could potentially be used as a surrogate in wall-bounded flow CFD simulations to reduce the computational overhead while preserving predictive accuracy.
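    A minimal JAX sketch of the forward-gradient estimator referenced above: a single forward-mode sweep along a random tangent yields an unbiased gradient estimate with no backward pass and no stored intermediates. The quadratic loss is a stand-in; the actual wall-model training loss is an assumption not shown here.
```python
# Minimal JAX sketch of the forward-gradient idea: one forward-mode sweep with
# a random tangent v gives (grad f . v) v, an unbiased estimator of grad f,
# with no backward pass or stored intermediates.
import jax
import jax.numpy as jnp

def loss(params):
    return jnp.sum((params - 1.0) ** 2)          # placeholder objective

def forward_gradient(f, params, key):
    v = jax.random.normal(key, params.shape)      # random tangent direction
    _, directional = jax.jvp(f, (params,), (v,))  # forward-mode sweep only
    return directional * v                        # unbiased gradient estimate

params = jnp.zeros(4)
key = jax.random.PRNGKey(0)
for step in range(200):
    key, sub = jax.random.split(key)
    params = params - 0.05 * forward_gradient(loss, params, sub)
print(params)   # approaches the minimizer at 1.0
```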

Training robust and generalizable quantum models

  • paper_url: http://arxiv.org/abs/2311.11871
  • repo_url: https://github.com/daniel-fink-de/training-robust-and-generalizable-quantum-models
  • paper_authors: Julian Berberich, Daniel Fink, Daniel Pranjić, Christian Tutschku, Christian Holm
  • for: Adversarial robustness and generalization of quantum machine learning models.
  • methods: Parameter-dependent Lipschitz bounds for quantum models with trainable encodings are derived and used to study robustness against input perturbations and generalization error, leading to a practical training strategy that regularizes the Lipschitz bound in the cost.
  • results: Trainable encodings allow robustness and generalization to be systematically adapted during training, whereas for fixed, non-trainable encodings the Lipschitz bound cannot be influenced by tuning the parameters; numerical results show that Lipschitz-bound regularization yields substantially more robust and generalizable quantum models.
    Abstract Adversarial robustness and generalization are both crucial properties of reliable machine learning models. In this paper, we study these properties in the context of quantum machine learning based on Lipschitz bounds. We derive tailored, parameter-dependent Lipschitz bounds for quantum models with trainable encoding, showing that the norm of the data encoding has a crucial impact on the robustness against perturbations in the input data. Further, we derive a bound on the generalization error which explicitly depends on the parameters of the data encoding. Our theoretical findings give rise to a practical strategy for training robust and generalizable quantum models by regularizing the Lipschitz bound in the cost. Further, we show that, for fixed and non-trainable encodings as frequently employed in quantum machine learning, the Lipschitz bound cannot be influenced by tuning the parameters. Thus, trainable encodings are crucial for systematically adapting robustness and generalization during training. With numerical results, we demonstrate that, indeed, Lipschitz bound regularization leads to substantially more robust and generalizable quantum models.

Deep learning complete intersection Calabi-Yau manifolds

  • paper_url: http://arxiv.org/abs/2311.11847
  • repo_url: None
  • paper_authors: Harold Erbin, Riccardo Finotello
  • for: Understanding how to handle algebraic topological data with machine learning, using complete intersection Calabi-Yau (CICY) 3- and 4-folds as the test bed.
  • methods: The review discusses methodological aspects and data analysis before describing neural network architectures.
  • results: It surveys the state-of-the-art accuracy in predicting Hodge numbers and includes new results on extrapolating predictions from low to high Hodge numbers, and conversely.
    Abstract We review advancements in deep learning techniques for complete intersection Calabi-Yau (CICY) 3- and 4-folds, with the aim of understanding better how to handle algebraic topological data with machine learning. We first discuss methodological aspects and data analysis, before describing neural networks architectures. Then, we describe the state-of-the art accuracy in predicting Hodge numbers. We include new results on extrapolating predictions from low to high Hodge numbers, and conversely.

High Probability Guarantees for Random Reshuffling

  • paper_url: http://arxiv.org/abs/2311.11841
  • repo_url: None
  • paper_authors: Hengxu Yu, Xiao Li
  • for: The stochastic gradient method with random reshuffling ($\mathsf{RR}$) for smooth nonconvex optimization problems, a method widely used in practice, notably for training neural networks.
  • methods: The paper establishes a new high-probability sample complexity guarantee for driving the gradient (without expectation) below $\varepsilon$, matching the best existing in-expectation complexity up to a logarithmic term without additional assumptions or changes to the update rule. Building on the derived high-probability descent property and stochastic error bound, it proposes a simple, computable stopping criterion ($\mathsf{RR}$-$\mathsf{sc}$) guaranteed to trigger after finitely many iterations and to return an iterate with gradient below $\varepsilon$ with high probability, as well as a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that adds a randomized perturbation near stationary points.
  • results: $\mathsf{p}$-$\mathsf{RR}$ provably escapes strict saddle points and returns a second-order stationary point with high probability, without sub-Gaussian tail assumptions on the stochastic gradient errors; numerical experiments on neural network training support the theoretical findings.
    Abstract We consider the stochastic gradient method with random reshuffling ($\mathsf{RR}$) for tackling smooth nonconvex optimization problems. $\mathsf{RR}$ finds broad applications in practice, notably in training neural networks. In this work, we first investigate the concentration property of $\mathsf{RR}$'s sampling procedure and establish a new high probability sample complexity guarantee for driving the gradient (without expectation) below $\varepsilon$, which effectively characterizes the efficiency of a single $\mathsf{RR}$ execution. Our derived complexity matches the best existing in-expectation one up to a logarithmic term while imposing no additional assumptions nor changing $\mathsf{RR}$'s updating rule. Furthermore, by leveraging our derived high probability descent property and bound on the stochastic error, we propose a simple and computable stopping criterion for $\mathsf{RR}$ (denoted as $\mathsf{RR}$-$\mathsf{sc}$). This criterion is guaranteed to be triggered after a finite number of iterations, and then $\mathsf{RR}$-$\mathsf{sc}$ returns an iterate with its gradient below $\varepsilon$ with high probability. Moreover, building on the proposed stopping criterion, we design a perturbed random reshuffling method ($\mathsf{p}$-$\mathsf{RR}$) that involves an additional randomized perturbation procedure near stationary points. We derive that $\mathsf{p}$-$\mathsf{RR}$ provably escapes strict saddle points and efficiently returns a second-order stationary point with high probability, without making any sub-Gaussian tail-type assumptions on the stochastic gradient errors. Finally, we conduct numerical experiments on neural network training to support our theoretical findings.
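    A minimal sketch of the random reshuffling loop on a least-squares problem, with a stopping check at epoch boundaries. For simplicity the check below evaluates the full-batch gradient norm directly; the paper's $\mathsf{RR}$-$\mathsf{sc}$ criterion is a computable surrogate derived from the high-probability analysis and does not require such an extra pass.
```python
# Minimal sketch of the random reshuffling (RR) loop: one pass over a fresh
# permutation of the data per epoch, with a stopping check at epoch boundaries.
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                           # realizable least-squares problem

def grad_i(x, i):                        # gradient of the i-th term 0.5*(a_i x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x, lr, eps = np.zeros(d), 0.01, 1e-3
for epoch in range(1000):
    for i in rng.permutation(n):         # random reshuffling: sample without replacement
        x = x - lr * grad_i(x, i)
    if np.linalg.norm(full_grad(x)) < eps:   # stand-in stopping check
        print("stopped at epoch", epoch)
        break
```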

Zero redundancy distributed learning with differential privacy

  • paper_url: http://arxiv.org/abs/2311.11822
  • repo_url: None
  • paper_authors: Zhiqi Bu, Justin Chiu, Ruixuan Liu, Sheng Zha, George Karypis
  • for: Privacy-preserving distributed deep learning, enabling large models to be trained under differential privacy (DP).
  • methods: The Zero Redundancy Optimizer (ZeRO) is integrated with DP optimization to form DP-ZeRO, which scales up the trainable DP model size while preserving ZeRO's computation and communication efficiency and supporting mixed-precision DP training.
  • results: DP-ZeRO achieves the same computation and communication efficiency as standard ZeRO, can scale to models such as GPT-100B, and is evaluated on the world's largest DP models in terms of the number of trainable parameters.
    Abstract Deep learning using large models have achieved great success in a wide range of domains. However, training these models on billions of parameters is very challenging in terms of the training speed, memory cost, and communication efficiency, especially under the privacy-preserving regime with differential privacy (DP). On the one hand, DP optimization has comparable efficiency to the standard non-private optimization on a single GPU, but on multiple GPUs, existing DP distributed learning (such as pipeline parallel) has suffered from significantly worse efficiency. On the other hand, the Zero Redundancy Optimizer (ZeRO) is a state-of-the-art solution to the standard distributed learning, exhibiting excellent training efficiency on large models, but to work compatibly with DP is technically complicated. In this work, we develop a new systematic solution, DP-ZeRO, (I) to scale up the trainable DP model size, e.g. to GPT-100B, (II) to obtain the same computation and communication efficiency as the standard ZeRO, and (III) to enable mixed-precision DP training. Our DP-ZeRO, like the standard ZeRO, has the potential to train models with arbitrary size and is evaluated on the world's largest DP models in terms of the number of trainable parameters.

LogLead – Fast and Integrated Log Loader, Enhancer, and Anomaly Detector

  • paper_url: http://arxiv.org/abs/2311.11809
  • repo_url: https://github.com/evotestops/loglead
  • paper_authors: Mika Mäntylä, Yuqing Wang, Jesse Nyyssölä
  • for: LogLead, a tool for efficient log analysis that combines three essential log-processing steps: loading, enhancing, and anomaly detection.
  • methods: LogLead builds on the high-speed Polars DataFrame library. It currently ships 7 loaders (4 for the public HDFS, Hadoop, BGL, and Thunderbird datasets) and multiple enhancers, including three parsers (Drain, Spell, LenMa), BERT embedding creation, and other log representations such as bag-of-words; it integrates 5 supervised and 4 unsupervised machine learning algorithms from scikit-learn for anomaly detection.
  • results: Loading logs from raw files into a DataFrame is over 10x faster with LogLead than with past solutions, off-loading log message normalization to LogLead roughly doubles Drain parsing speed, and a brief benchmark on HDFS suggests that log representations beyond bag-of-words provide limited benefits.
    Abstract This paper introduces LogLead, a tool designed for efficient log analysis. LogLead combines three essential steps in log processing: loading, enhancing, and anomaly detection. The tool leverages Polars, a high-speed DataFrame library. We currently have 7 Loaders out of which 4 is for public data sets (HDFS, Hadoop, BGL, and Thunderbird). We have multiple enhancers with three parsers (Drain, Spell, LenMa), Bert embedding creation and other log representation techniques like bag-of-words. LogLead integrates to 5 supervised and 4 unsupervised machine learning algorithms for anomaly detection from SKLearn. By integrating diverse datasets, log representation methods and anomaly detectors, LogLead facilitates comprehensive benchmarking in log analysis research. We demonstrate that log loading from raw file to dataframe is over 10x faster with LogLead is compared to past solutions. We demonstrate roughly 2x improvement in Drain parsing speed by off-loading log message normalization to LogLead. We demonstrate a brief benchmarking on HDFS suggesting that log representations beyond bag-of-words provide limited benefits. Screencast demonstrating the tool: https://youtu.be/8stdbtTfJVo
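    As a generic illustration of the kind of pipeline LogLead wires together (load, build a bag-of-words representation, run an anomaly detector), the snippet below uses plain scikit-learn on a few toy log lines; it is not LogLead's own API, which builds on Polars DataFrames (see the repository above).
```python
# Generic illustration of a load -> bag-of-words -> anomaly-detector pipeline,
# using plain scikit-learn rather than LogLead's loaders and enhancers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import IsolationForest

log_lines = [
    "Received block blk_1 of size 67108864 from /10.250.19.102",
    "Received block blk_2 of size 67108864 from /10.250.19.103",
    "Exception in receiveBlock for block blk_3 java.io.IOException",
    "Received block blk_4 of size 67108864 from /10.250.19.104",
]

X = CountVectorizer(token_pattern=r"[A-Za-z]+").fit_transform(log_lines).toarray()
detector = IsolationForest(contamination=0.25, random_state=0).fit(X)
print(detector.predict(X))   # -1 marks lines scored as anomalous
```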

Operator Learning for Continuous Spatial-Temporal Model with A Hybrid Optimization Scheme

  • paper_url: http://arxiv.org/abs/2311.11798
  • repo_url: None
  • paper_authors: Chuanqi Chen, Jin-Long Wu
  • for: A continuous spatial-temporal model for simulating complex dynamical systems.
  • methods: Building on recent progress in operator learning, the data-driven model is continuous in both space and time; a hybrid optimization scheme combining gradient-based and derivative-free methods trains it efficiently on both short-term time series and long-term statistics.
  • results: The model is invariant to spatial and temporal resolution, produces stable long-term simulations from only short-term time series data, and better predicts long-term statistics when trained with the hybrid scheme on a combination of short-term and long-term data.
    Abstract Partial differential equations are often used in the spatial-temporal modeling of complex dynamical systems in many engineering applications. In this work, we build on the recent progress of operator learning and present a data-driven modeling framework that is continuous in both space and time. A key feature of the proposed model is the resolution-invariance with respect to both spatial and temporal discretizations. To improve the long-term performance of the calibrated model, we further propose a hybrid optimization scheme that leverages both gradient-based and derivative-free optimization methods and efficiently trains on both short-term time series and long-term statistics. We investigate the performance of the spatial-temporal continuous learning framework with three numerical examples, including the viscous Burgers' equation, the Navier-Stokes equations, and the Kuramoto-Sivashinsky equation. The results confirm the resolution-invariance of the proposed modeling framework and also demonstrate stable long-term simulations with only short-term time series data. In addition, we show that the proposed model can better predict long-term statistics via the hybrid optimization scheme with a combined use of short-term and long-term data.

Approximate Linear Programming and Decentralized Policy Improvement in Cooperative Multi-agent Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2311.11789
  • repo_url: None
  • paper_authors: Lakshmi Mandal, Chandrashekar Lakshminarayanan, Shalabh Bhatnagar
  • for: Cooperative multi-agent Markov decision processes (MDPs) in which all agents know the system model and jointly select actions to maximize a common long-term objective.
  • methods: Decentralized policy improvement, in which each agent improves its own decisions while assuming the other agents' decisions are fixed, combined with approximate linear programming (ALP) to compute an approximate value function (a small sketch of the decentralized improvement step follows this entry).
  • results: Approximate policy iteration algorithms with theoretical guarantees for cooperative multi-agent finite- and infinite-horizon discounted MDPs, with performance demonstrated on numerical examples.
    Abstract In this work, we consider a `cooperative' multi-agent Markov decision process (MDP) involving m greater than 1 agents, where all agents are aware of the system model. At each decision epoch, all the m agents cooperatively select actions in order to maximize a common long-term objective. Since the number of actions grows exponentially in the number of agents, policy improvement is computationally expensive. Recent works have proposed using decentralized policy improvement in which each agent assumes that the decisions of the other agents are fixed and it improves its decisions unilaterally. Yet, in these works, exact values are computed. In our work, for cooperative multi-agent finite and infinite horizon discounted MDPs, we propose suitable approximate policy iteration algorithms, wherein we use approximate linear programming to compute the approximate value function and use decentralized policy improvement. Thus our algorithms can handle both large number of states as well as multiple agents. We provide theoretical guarantees for our algorithms and also demonstrate the performance of our algorithms on some numerical examples.
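
The sketch below illustrates the decentralized policy-improvement step on a randomly generated two-agent MDP; the approximate value function is obtained with a few value-iteration sweeps as a stand-in for the paper's approximate linear programming step, and the MDP itself is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA, nA))      # P[s, a1, a2, s']
R = rng.random((nS, nA, nA))                           # shared reward

def q_values(V):
    return R + gamma * P @ V                           # Q[s, a1, a2]

V = np.zeros(nS)
for _ in range(50):                                    # crude approximate value function
    V = q_values(V).max(axis=(1, 2))

pi1 = np.zeros(nS, dtype=int)                          # current joint policy
pi2 = np.zeros(nS, dtype=int)
Q = q_values(V)
for s in range(nS):                                    # decentralized improvement
    pi1[s] = int(np.argmax(Q[s, :, pi2[s]]))           # agent 1 improves, agent 2 fixed
    pi2[s] = int(np.argmax(Q[s, pi1[s], :]))           # agent 2 improves, agent 1 fixed
print("improved joint policy:", list(zip(pi1, pi2)))
```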

Masked Autoencoders Are Robust Neural Architecture Search Learners

  • paper_url: http://arxiv.org/abs/2311.12086
  • repo_url: None
  • paper_authors: Yiming Hu, Xiangxiang Chu, Bo Zhang
  • for: Improving the efficiency and robustness of Neural Architecture Search (NAS) by reducing or eliminating the need for labeled data.
  • methods: A Masked Autoencoder (MAE)-based framework that replaces the supervised objective with an image reconstruction task, so the search needs no labels while preserving performance and generalization; a multi-scale decoder counters the performance collapse seen in unsupervised DARTS (a minimal masked-reconstruction sketch follows this entry).
  • results: Extensive experiments across search spaces and datasets demonstrate the effectiveness and robustness of the proposed method over baseline approaches.
    Abstract Neural Architecture Search (NAS) currently relies heavily on labeled data, which is both expensive and time-consuming to acquire. In this paper, we propose a novel NAS framework based on Masked Autoencoders (MAE) that eliminates the need for labeled data during the search process. By replacing the supervised learning objective with an image reconstruction task, our approach enables the robust discovery of network architectures without compromising performance and generalization ability. Additionally, we address the problem of performance collapse encountered in the widely-used Differentiable Architecture Search (DARTS) method in the unsupervised paradigm by introducing a multi-scale decoder. Through extensive experiments conducted on various search spaces and datasets, we demonstrate the effectiveness and robustness of the proposed method, providing empirical evidence of its superiority over baseline approaches.
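
The snippet below sketches the masked-reconstruction objective that replaces the supervised loss during the search; the tiny linear encoder/decoder stands in for a candidate architecture, and the patch size and 75% mask ratio are assumptions borrowed from common MAE practice, not the paper's exact settings.

```python
import torch
import torch.nn as nn

patch, mask_ratio = 8, 0.75
imgs = torch.rand(4, 3, 32, 32)                                  # toy batch
patches = imgs.unfold(2, patch, patch).unfold(3, patch, patch)   # B, 3, 4, 4, 8, 8
patches = patches.reshape(4, 3, 16, -1).permute(0, 2, 1, 3).reshape(4, 16, -1)

n_mask = int(mask_ratio * 16)
idx = torch.rand(4, 16).argsort(dim=1)
masked_idx = idx[:, :n_mask]                                     # patches to hide

encoder = nn.Linear(patches.shape[-1], 64)                       # stand-in candidate net
decoder = nn.Linear(64, patches.shape[-1])

visible = patches.clone()
visible.scatter_(1, masked_idx[..., None].expand(-1, -1, patches.shape[-1]), 0.0)
recon = decoder(encoder(visible))

gather_idx = masked_idx[..., None].expand(-1, -1, patches.shape[-1])
target = torch.gather(patches, 1, gather_idx)
pred = torch.gather(recon, 1, gather_idx)
loss = ((pred - target) ** 2).mean()          # reconstruction loss on masked patches only
loss.backward()                               # this gradient drives the search weights
```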

MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations

  • paper_url: http://arxiv.org/abs/2311.11762
  • repo_url: None
  • paper_authors: Daniel Bogdoll, Yitian Yang, J. Marius Zöllner
  • for: Improving the reasoning and decision-making capabilities of autonomous driving systems.
  • methods: A multimodal world model that learns a sensor-agnostic geometric (voxel) representation of the world from raw camera and lidar data, usable directly by downstream tasks such as planning (a minimal voxelization sketch follows this entry).
  • results: Multimodal future prediction, with the geometric representation improving the prediction quality of both camera images and lidar point clouds.
    Abstract Learning unsupervised world models for autonomous driving has the potential to improve the reasoning capabilities of today's systems dramatically. However, most work neglects the physical attributes of the world and focuses on sensor data alone. We propose MUVO, a MUltimodal World Model with Geometric VOxel Representations to address this challenge. We utilize raw camera and lidar data to learn a sensor-agnostic geometric representation of the world, which can directly be used by downstream tasks, such as planning. We demonstrate multimodal future predictions and show that our geometric representation improves the prediction quality of both camera images and lidar point clouds.
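
As a minimal illustration of the geometric voxel representation mentioned above, the snippet below converts a lidar point cloud into a boolean occupancy grid; the grid extent and resolution are arbitrary assumptions, and the learned multimodal model itself is omitted.

```python
import numpy as np

def voxelize(points: np.ndarray, extent=50.0, voxel=0.5) -> np.ndarray:
    """points: (N, 3) lidar returns in metres -> boolean occupancy grid."""
    n = int(2 * extent / voxel)
    grid = np.zeros((n, n, n), dtype=bool)
    idx = np.floor((points + extent) / voxel).astype(int)
    keep = np.all((idx >= 0) & (idx < n), axis=1)
    grid[tuple(idx[keep].T)] = True
    return grid

cloud = np.random.uniform(-40, 40, size=(10000, 3))    # fake lidar sweep
occ = voxelize(cloud)
print("occupied voxels:", int(occ.sum()))
```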

Revealing behavioral impact on mobility prediction networks through causal interventions

  • paper_url: http://arxiv.org/abs/2311.11749
  • repo_url: None
  • paper_authors: Ye Hong, Yanan Xin, Simon Dirmeier, Fernando Perez-Cruz, Martin Raubal
  • for: Investigating the interpretability of deep neural networks for next-location prediction, in particular how different aspects of mobility behavior affect their predictions.
  • methods: A causal intervention framework: individual mobility models generate synthetic location-visit sequences whose behavioral dynamics are controlled by intervening in the data-generation process; the intervened sequences are evaluated with mobility metrics and fed into well-trained networks to analyze performance variations (a toy intervention sketch follows this entry).
  • results: The framework produces location sequences with distinct mobility behaviors and simulates diverse spatial and temporal changes. The resulting performance fluctuations reveal the impact of key behavioral factors, including sequential patterns in location transitions, the propensity to explore new locations, and location-choice preferences at the population and individual levels; these insights support real-world use of mobility prediction networks and improve their interpretability and robustness.
    Abstract Deep neural networks are increasingly utilized in mobility prediction tasks, yet their intricate internal workings pose challenges for interpretability, especially in comprehending how various aspects of mobility behavior affect predictions. In this study, we introduce a causal intervention framework to assess the impact of mobility-related factors on neural networks designed for next location prediction -- a task focusing on predicting the immediate next location of an individual. To achieve this, we employ individual mobility models to generate synthetic location visit sequences and control behavior dynamics by intervening in their data generation process. We evaluate the interventional location sequences using mobility metrics and input them into well-trained networks to analyze performance variations. The results demonstrate the effectiveness in producing location sequences with distinct mobility behaviors, thus facilitating the simulation of diverse spatial and temporal changes. These changes result in performance fluctuations in next location prediction networks, revealing impacts of critical mobility behavior factors, including sequential patterns in location transitions, proclivity for exploring new locations, and preferences in location choices at population and individual levels. The gained insights hold significant value for the real-world application of mobility prediction networks, and the framework is expected to promote the use of causal inference for enhancing the interpretability and robustness of neural networks in mobility applications.
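
The sketch below conveys the intervention idea with a toy explore-or-return mobility generator whose exploration propensity is intervened on, plus one simple mobility metric; it is a stand-in for the paper's individual mobility models and metrics.

```python
import numpy as np

def generate_sequence(p_explore: float, length=200, seed=0):
    rng = np.random.default_rng(seed)
    seq, next_new = [0], 1
    for _ in range(length - 1):
        if rng.random() < p_explore:                     # explore a brand-new location
            loc, next_new = next_new, next_new + 1
        else:                                            # return, biased to frequent places
            counts = np.bincount(seq)
            loc = int(rng.choice(len(counts), p=counts / counts.sum()))
        seq.append(loc)
    return seq

def new_location_rate(seq):
    seen, new = set(), 0
    for loc in seq:
        new += loc not in seen
        seen.add(loc)
    return new / len(seq)

for p in (0.1, 0.3, 0.6):                                # intervene on the behaviour parameter
    seq = generate_sequence(p)
    print(f"p_explore={p:.1f}  new-location rate={new_location_rate(seq):.2f}")
    # next step: feed `seq` to a trained next-location model and record accuracy changes
```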

Leveraging Uncertainty Estimates To Improve Classifier Performance

  • paper_url: http://arxiv.org/abs/2311.11723
  • repo_url: None
  • paper_authors: Gundeep Arora, Srujana Merugu, Anoop Saladi, Rajeev Rastogi
  • for: Selecting decision boundaries based on both model score and uncertainty to improve precision and recall.
  • methods: The decision-boundary selection problem is formulated over model score and uncertainty, shown to be NP-hard, and addressed with algorithms based on dynamic programming and isotonic regression, supported by theoretical analysis and experiments (a simplified threshold-selection sketch follows this entry).
  • results: Using model score together with uncertainty yields 25%-40% gains in recall at high precision bounds on three real-world datasets compared to using the score alone.
    Abstract Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen based on the application requirements (e.g., maximizing recall for a precision bound). However, model scores are often not aligned with the true positivity rate. This is especially true when the training involves a differential sampling across classes or there is distributional drift between train and test settings. In this paper, we provide theoretical analysis and empirical evidence of the dependence of model score estimation bias on both uncertainty and score itself. Further, we formulate the decision boundary selection in terms of both model score and uncertainty, prove that it is NP-hard, and present algorithms based on dynamic programming and isotonic regression. Evaluation of the proposed algorithms on three real-world datasets yield 25%-40% gain in recall at high precision bounds over the traditional approach of using model score alone, highlighting the benefits of leveraging uncertainty.
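
The sketch below is a simplified stand-in for the paper's boundary selection: examples are split into low- and high-uncertainty groups, scores are calibrated per group with isotonic regression, and a brute-force grid search (replacing the dynamic program) picks per-group thresholds that maximize recall under a precision bound. The synthetic data, grouping rule, and the 0.9 bound are assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
n = 4000
y = rng.integers(0, 2, n)
unc = rng.uniform(0.0, 1.0, n)                               # model uncertainty
score = np.clip(y * 0.6 + rng.normal(0, 0.25 + 0.4 * unc, n) + 0.2, 0, 1)

groups = (unc > 0.5).astype(int)                             # low / high uncertainty
calibrated = np.zeros(n)
for g in (0, 1):
    m = groups == g
    calibrated[m] = IsotonicRegression(out_of_bounds="clip").fit_transform(score[m], y[m])

def prec_rec(thresholds):
    pred = calibrated >= np.take(thresholds, groups)
    tp = np.sum(pred & (y == 1))
    prec = tp / max(pred.sum(), 1)
    rec = tp / max((y == 1).sum(), 1)
    return prec, rec

grid = np.linspace(0, 1, 21)
best, best_rec = (1.0, 1.0), -1.0
for t0 in grid:                                              # brute force over threshold pairs
    for t1 in grid:
        p, r = prec_rec((t0, t1))
        if p >= 0.9 and r > best_rec:
            best, best_rec = (t0, t1), r
print("per-group thresholds:", best, "recall:", round(best_rec, 3))
```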

Unveiling the Power of Self-Attention for Shipping Cost Prediction: The Rate Card Transformer

  • paper_url: http://arxiv.org/abs/2311.11694
  • repo_url: https://github.com/lucidrains/tab-transformer-pytorch
  • paper_authors: P Aditya Sreekar, Sahil Verma, Varun Madhavan, Abhishek Persad
  • for: Improving Amazon's day-0 financial decisions by reducing the impact of shipping-cost estimation errors.
  • methods: A new architecture, the Rate Card Transformer (RCT), uses self-attention to encode all package shipping information, including package attributes, carrier information, and route plans. Unlike other tabular transformers, RCT can encode a variable-length list of one-to-many relations of a shipment, such as the properties of every product in a package (a minimal attention-pooling sketch follows this entry).
  • results: RCT reduces shipping-cost prediction error by 28.82% relative to a tree-based GBDT model, outperforms the state-of-the-art transformer-based tabular model FTTransformer by 6.08%, and learns a generalized rate-card representation that also improves tree-based models.
    Abstract Amazon ships billions of packages to its customers annually within the United States. Shipping cost of these packages are used on the day of shipping (day 0) to estimate profitability of sales. Downstream systems utilize these days 0 profitability estimates to make financial decisions, such as pricing strategies and delisting loss-making products. However, obtaining accurate shipping cost estimates on day 0 is complex for reasons like delay in carrier invoicing or fixed cost components getting recorded at monthly cadence. Inaccurate shipping cost estimates can lead to bad decision, such as pricing items too low or high, or promoting the wrong product to the customers. Current solutions for estimating shipping costs on day 0 rely on tree-based models that require extensive manual engineering efforts. In this study, we propose a novel architecture called the Rate Card Transformer (RCT) that uses self-attention to encode all package shipping information such as package attributes, carrier information and route plan. Unlike other transformer-based tabular models, RCT has the ability to encode a variable list of one-to-many relations of a shipment, allowing it to capture more information about a shipment. For example, RCT can encode properties of all products in a package. Our results demonstrate that cost predictions made by the RCT have 28.82% less error compared to tree-based GBDT model. Moreover, the RCT outperforms the state-of-the-art transformer-based tabular model, FTTransformer, by 6.08%. We also illustrate that the RCT learns a generalized manifold of the rate card that can improve the performance of tree-based models.
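
The snippet below sketches how self-attention can pool a variable-length set of product feature vectors into a single shipment embedding for cost prediction; it only illustrates the one-to-many encoding idea and is not the RCT architecture, and the feature sizes and padding convention are assumptions.

```python
import torch
import torch.nn as nn

class ProductSetEncoder(nn.Module):
    def __init__(self, d_feat=6, d_model=32):
        super().__init__()
        self.proj = nn.Linear(d_feat, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, products, pad_mask):
        # products: (B, max_items, d_feat); pad_mask: (B, max_items), True = padding
        h = self.proj(products)
        h, _ = self.attn(h, h, h, key_padding_mask=pad_mask)
        h = h.masked_fill(pad_mask[..., None], 0.0)
        pooled = h.sum(1) / (~pad_mask).sum(1, keepdim=True).clamp(min=1)
        return self.head(pooled).squeeze(-1)            # predicted shipping cost

B, max_items = 8, 5
products = torch.randn(B, max_items, 6)                  # per-product features
lengths = torch.randint(1, max_items + 1, (B,))          # variable item counts
pad_mask = torch.arange(max_items)[None, :] >= lengths[:, None]
cost = ProductSetEncoder()(products, pad_mask)
print(cost.shape)                                        # torch.Size([8])
```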

Tiny-VBF: Resource-Efficient Vision Transformer based Lightweight Beamformer for Ultrasound Single-Angle Plane Wave Imaging

  • paper_url: http://arxiv.org/abs/2311.12082
  • repo_url: None
  • paper_authors: Abdul Rahoof, Vivek Chaturvedi, Mahesh Raveendranatha Panicker, Muhammad Shafique
  • for: Accelerating compute-intensive, non-real-time beamforming in ultrasound imaging with deep learning, improving both image quality and speed.
  • methods: A vision-transformer-based tiny beamformer (Tiny-VBF) that operates on raw radio-frequency channel data from single-angle plane-wave insonification, with an FPGA accelerator implementation that uses a hybrid quantization scheme (a conventional delay-and-sum baseline is sketched after this entry).
  • results: Tiny-VBF requires 0.34 GOPs/frame for a 368 x 128 frame, far less than state-of-the-art deep learning models, and improves contrast by 8% and axial and lateral resolution by 5% and 33% over Tiny-CNN on an in-vitro dataset. Compared to the conventional Delay-and-Sum (DAS) beamformer it improves contrast by 4.2% and axial and lateral resolution by 4% and 20%, and the FPGA implementation consumes 50% fewer resources than the floating-point version while preserving image quality.
    Abstract Accelerating compute intensive non-real-time beam-forming algorithms in ultrasound imaging using deep learning architectures has been gaining momentum in the recent past. Nonetheless, the complexity of the state-of-the-art deep learning techniques poses challenges for deployment on resource-constrained edge devices. In this work, we propose a novel vision transformer based tiny beamformer (Tiny-VBF), which works on the raw radio-frequency channel data acquired through single-angle plane wave insonification. The output of our Tiny-VBF provides fast envelope detection requiring very low frame rate, i.e. 0.34 GOPs/Frame for a frame size of 368 x 128 in comparison to the state-of-the-art deep learning models. It also exhibited an 8% increase in contrast and gains of 5% and 33% in axial and lateral resolution respectively when compared to Tiny-CNN on in-vitro dataset. Additionally, our model showed a 4.2% increase in contrast and gains of 4% and 20% in axial and lateral resolution respectively when compared against conventional Delay-and-Sum (DAS) beamformer. We further propose an accelerator architecture and implement our Tiny-VBF model on a Zynq UltraScale+ MPSoC ZCU104 FPGA using a hybrid quantization scheme with 50% less resource consumption compared to the floating-point implementation, while preserving the image quality.
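
For reference, the snippet below implements a bare-bones single-angle (0 degree) plane-wave delay-and-sum beamformer, the conventional baseline Tiny-VBF is compared against; the array geometry, sampling rate, and random RF data are illustrative assumptions.

```python
import numpy as np

c, fs, n_elem, pitch = 1540.0, 40e6, 128, 0.3e-3        # m/s, Hz, elements, m
n_samples = 2048
rf = np.random.randn(n_elem, n_samples)                 # raw RF channel data
elem_x = (np.arange(n_elem) - n_elem / 2) * pitch

def das_pixel(x, z):
    """Beamformed value at lateral position x, depth z (metres)."""
    tx_delay = z / c                                     # 0-degree plane wave
    rx_delay = np.sqrt(z**2 + (x - elem_x) ** 2) / c
    idx = np.round((tx_delay + rx_delay) * fs).astype(int)
    valid = idx < n_samples
    return rf[np.arange(n_elem)[valid], idx[valid]].sum()

xs = np.linspace(-5e-3, 5e-3, 64)
zs = np.linspace(5e-3, 25e-3, 128)
image = np.array([[das_pixel(x, z) for x in xs] for z in zs])
print(image.shape)                                       # (128, 64)
```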

Unraveling the Control Engineer’s Craft with Neural Networks

  • paper_url: http://arxiv.org/abs/2311.11644
  • repo_url: None
  • paper_authors: Braghadeesh Lakshminarayanan, Federico Dettù, Cristian R. Rojas, Simone Formentin
  • for: A data-driven (sim2real) controller-tuning approach in which a digital twin generates input-output data and a neural network learns how to map that data to controller parameters.
  • methods: The digital twin is simulated under perturbations of its parameters to generate input-output data together with suitable controllers; state-of-the-art neural network architectures then learn the tuning rule that maps input-output data onto controller parameters, effectively meta-learning the control engineer's role (a toy sketch follows this entry).
  • results: Numerical simulations across several neural-network architectures show that the learned tuning rule recalibrates controller parameters quickly and accurately.
    Abstract Many industrial processes require suitable controllers to meet their performance requirements. More often, a sophisticated digital twin is available, which is a highly complex model that is a virtual representation of a given physical process, whose parameters may not be properly tuned to capture the variations in the physical process. In this paper, we present a sim2real, direct data-driven controller tuning approach, where the digital twin is used to generate input-output data and suitable controllers for several perturbations in its parameters. State-of-the art neural-network architectures are then used to learn the controller tuning rule that maps input-output data onto the controller parameters, based on artificially generated data from perturbed versions of the digital twin. In this way, as far as we are aware, we tackle for the first time the problem of re-calibrating the controller by meta-learning the tuning rule directly from data, thus practically replacing the control engineer with a machine learning model. The benefits of this methodology are illustrated via numerical simulations for several choices of neural-network architectures.
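
The sketch below mimics the workflow on a toy first-order digital twin: the twin is perturbed, step-response features are paired with a gain assumed to work well for that perturbation, and a neural network learns the feature-to-gain tuning rule. The plant, feature choice, and gain rule are all assumptions; the paper's setting is far richer.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
dt, T = 0.01, 2.0
t = np.arange(0, T, dt)

def step_response(a):                        # twin: dx/dt = -a x + a u, with u = 1
    x = np.zeros_like(t)
    for k in range(1, len(t)):
        x[k] = x[k - 1] + dt * (-a * x[k - 1] + a)
    return x

features, gains = [], []
for _ in range(300):                         # perturb the twin's parameter
    a = rng.uniform(0.5, 5.0)
    y = step_response(a)
    features.append([y[50], y[100], y[-1]])  # cheap input-output features
    gains.append(2.0 / a)                    # "good" proportional gain (assumed rule)

tuner = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=3000, random_state=0)
tuner.fit(np.array(features), np.array(gains))
y_new = step_response(3.3)                   # unseen plant
print("suggested gain:", float(tuner.predict([[y_new[50], y_new[100], y_new[-1]]])[0]))
```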

Incorporating LLM Priors into Tabular Learners

  • paper_url: http://arxiv.org/abs/2311.11628
  • repo_url: None
  • paper_authors: Max Zhu, Siniša Stanivuk, Andrija Petrovic, Mladen Nikolic, Pietro Lio
  • for: Addressing challenges of Large Language Models (LLMs) on tabular data, such as data-serialization sensitivity and bias, by combining them with traditional tabular learners.
  • methods: Two strategies use LLMs to rank categorical variables and to generate priors on correlations between continuous variables and the target, improving few-shot performance. In particular, MonotonicLR maps ordinals to cardinals with a non-linear monotonic function while preserving the LLM-determined order (a minimal monotonic-mapping sketch follows this entry).
  • results: The approach outperforms baseline models in low-data (few-shot) scenarios while remaining interpretable.
    Abstract We present a method to integrate Large Language Models (LLMs) and traditional tabular data classification techniques, addressing LLMs challenges like data serialization sensitivity and biases. We introduce two strategies utilizing LLMs for ranking categorical variables and generating priors on correlations between continuous variables and targets, enhancing performance in few-shot scenarios. We focus on Logistic Regression, introducing MonotonicLR that employs a non-linear monotonic function for mapping ordinals to cardinals while preserving LLM-determined orders. Validation against baseline models reveals the superior performance of our approach, especially in low-data scenarios, while remaining interpretable.
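
The sketch below shows one way to realize the monotonic ordinal-to-cardinal mapping described above: non-negative increments (softplus plus a cumulative sum) guarantee that the learned cardinal values respect the LLM-determined order inside a logistic-regression model. The synthetic data and two-feature setup are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, n_levels = 1000, 5
ordinal = torch.randint(0, n_levels, (n,))              # LLM-ranked category index
x_cont = torch.randn(n)
logits_true = 1.5 * ordinal.float() - 3.0 + 0.5 * x_cont
y = torch.bernoulli(torch.sigmoid(logits_true))

raw_inc = torch.zeros(n_levels, requires_grad=True)     # softplus keeps increments >= 0
w = torch.zeros(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([raw_inc, w, b], lr=0.05)

for _ in range(500):
    levels = torch.cumsum(F.softplus(raw_inc), dim=0)   # monotone ordinal -> cardinal map
    x_ord = levels[ordinal]
    logits = w[0] * x_ord + w[1] * x_cont + b
    loss = F.binary_cross_entropy_with_logits(logits, y)
    opt.zero_grad(); loss.backward(); opt.step()

print("learned cardinal values (increasing):", levels.detach().round(decimals=2))
```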

Testing multivariate normality by testing independence

  • paper_url: http://arxiv.org/abs/2311.11575
  • repo_url: None
  • paper_authors: Povilas Daniušis
  • for: A simple multivariate normality test based on the Kac-Bernstein characterization that can be conducted with existing statistical independence tests.
  • methods: Existing independence tests are applied to sums and differences of data samples, followed by an empirical investigation (a minimal sketch follows this entry).
  • results: The empirical investigation indicates that for high-dimensional data the proposed approach can be more efficient than alternative tests.
    Abstract We propose a simple multivariate normality test based on Kac-Bernstein's characterization, which can be conducted by utilising existing statistical independence tests for sums and differences of data samples. We also perform its empirical investigation, which reveals that for high-dimensional data, the proposed approach may be more efficient than the alternative ones. The accompanying code repository is provided at \url{https://shorturl.at/rtuy5}.
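
The sketch below follows the recipe in the methods bullet: split the sample into two halves X and Y, form X+Y and X-Y, and test their independence. Distance correlation with a permutation p-value is used here as the plug-in independence test, and the sample sizes and permutation count are arbitrary choices.

```python
import numpy as np

def distance_correlation(a, b):
    def centred(m):
        d = np.sqrt(((m[:, None, :] - m[None, :, :]) ** 2).sum(-1))
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A, B = centred(a), centred(b)
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

def normality_pvalue(data, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    half = len(data) // 2
    x, y = data[:half], data[half:2 * half]
    s, d = x + y, x - y                      # independent iff data is Gaussian
    stat = distance_correlation(s, d)
    perms = [distance_correlation(s, d[rng.permutation(half)]) for _ in range(n_perm)]
    return (1 + sum(p >= stat for p in perms)) / (n_perm + 1)

rng = np.random.default_rng(1)
gauss = rng.standard_normal((400, 3))
unif = rng.uniform(-1, 1, (400, 3))
print("gaussian sample p-value:", normality_pvalue(gauss))
print("uniform  sample p-value:", normality_pvalue(unif))
```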

A Deep-Genetic Algorithm (Deep-GA) Approach for High-Dimensional Nonlinear Parabolic Partial Differential Equations

  • paper_url: http://arxiv.org/abs/2311.11558
  • repo_url: None
  • paper_authors: Endah Rokhmati Merdika Putri, Muhammad Luthfi Shahab, Mohammad Iqbal, Imam Mukhlash, Amirul Hakam, Lutfi Mardianto, Hadi Susanto
  • for: Improving the performance of deep learning solvers for high-dimensional nonlinear parabolic PDEs, in both convergence speed and accuracy.
  • methods: A genetic algorithm (GA) is embedded into the deep-BSDE solver to optimize the selection of the initial guess, accelerating convergence of the nonlinear PDEs on a broader interval (a compact GA sketch follows this entry).
  • results: The method solves the nonlinear PDEs faster than deep-BSDE while maintaining comparable accuracy.
    Abstract We propose a new method, called a deep-genetic algorithm (deep-GA), to accelerate the performance of the so-called deep-BSDE method, which is a deep learning algorithm to solve high dimensional partial differential equations through their corresponding backward stochastic differential equations (BSDEs). Recognizing the sensitivity of the solver to the initial guess selection, we embed a genetic algorithm (GA) into the solver to optimize the selection. We aim to achieve faster convergence for the nonlinear PDEs on a broader interval than deep-BSDE. Our proposed method is applied to two nonlinear parabolic PDEs, i.e., the Black-Scholes (BS) equation with default risk and the Hamilton-Jacobi-Bellman (HJB) equation. We compare the results of our method with those of the deep-BSDE and show that our method provides comparable accuracy with significantly improved computational efficiency.
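
The sketch below is a compact genetic algorithm for the initial-guess search; the fitness function is a cheap synthetic stand-in for the deep-BSDE terminal loss evaluated at a candidate initial value y0, and the population size, mutation scale, and generation count are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def terminal_loss(y0):                       # stand-in for the solver's loss at y0
    return (y0 - 4.7) ** 2 + 2.0 * np.sin(3.0 * y0) + 2.0

pop = rng.uniform(0.0, 10.0, size=30)        # candidate initial guesses
for gen in range(40):
    fit = terminal_loss(pop)
    order = np.argsort(fit)
    parents = pop[order[:10]]                # truncation selection
    kids = []
    while len(kids) < len(pop) - len(parents):
        a, b = rng.choice(parents, 2, replace=False)
        child = 0.5 * (a + b)                # crossover
        child += rng.normal(0.0, 0.3)        # mutation
        kids.append(child)
    pop = np.concatenate([parents, kids])

best = pop[np.argmin(terminal_loss(pop))]
print("GA initial guess for the deep solver:", round(float(best), 3))
# The deep-BSDE network would then be trained starting from y0 = best.
```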

Fast Controllable Diffusion Models for Undersampled MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2311.12078
  • repo_url: https://github.com/ppn-paper/ppn
  • paper_authors: Wei Jiang, Zhuang Xiong, Feng Liu, Nan Ye, Hongfu Sun
  • for: Enhancing and accelerating controllable generative diffusion models for undersampled MRI reconstruction.
  • methods: A Predictor-Projector-Noisor (PPN) algorithm that rapidly generates high-quality MR images consistent with undersampled k-space measurements and adapts to different MRI acquisition parameters (a minimal data-consistency projection sketch follows this entry).
  • results: PPN produces high-fidelity MR images that conform to the undersampled measurements with significantly shorter reconstruction time than other controllable sampling methods, and the unsupervised models adapt to different acquisition parameters, making them more practical for clinical use than supervised techniques.
    Abstract Supervised deep learning methods have shown promise in Magnetic Resonance Imaging (MRI) undersampling reconstruction, but their requirement for paired data limits their generalizability to the diverse MRI acquisition parameters. Recently, unsupervised controllable generative diffusion models have been applied to MRI undersampling reconstruction, without paired data or model retraining for different MRI acquisitions. However, diffusion models are generally slow in sampling and state-of-the-art acceleration techniques can lead to sub-optimal results when directly applied to the controllable generation process. This study introduces a new algorithm called Predictor-Projector-Noisor (PPN), which enhances and accelerates controllable generation of diffusion models for MRI undersampling reconstruction. Our results demonstrate that PPN produces high-fidelity MR images that conform to undersampled k-space measurements with significantly shorter reconstruction time than other controllable sampling methods. In addition, the unsupervised PPN accelerated diffusion models are adaptable to different MRI acquisition parameters, making them more practical for clinical use than supervised learning techniques.
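
The sketch below shows the kind of data-consistency projection a diffusion-based undersampled-MRI sampler applies between denoising steps: known k-space samples are re-imposed on the current estimate. The phantom, sampling mask, single-coil FFT model, and the crude clipping "denoiser" are simplifying assumptions; PPN wraps a step like this inside its learned predictor-projector-noisor loop.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 128
image_true = np.zeros((n, n)); image_true[40:90, 50:80] = 1.0      # toy phantom
mask = rng.random((n, n)) < 0.25                                   # 4x undersampling
kspace_meas = mask * np.fft.fft2(image_true)                       # measured k-space

def project(x):
    """Replace the estimate's k-space values with measurements where sampled."""
    k = np.fft.fft2(x)
    k = np.where(mask, kspace_meas, k)
    return np.real(np.fft.ifft2(k))

x = np.real(np.fft.ifft2(kspace_meas))                             # zero-filled start
for _ in range(10):
    x = np.clip(x, 0, 1)          # crude stand-in for the learned predictor/denoiser
    x = project(x)                # projector: enforce consistency with measurements
err = np.linalg.norm(x - image_true) / np.linalg.norm(image_true)
print(f"relative error after projections: {err:.3f}")
```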

Understanding Variation in Subpopulation Susceptibility to Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2311.11544
  • repo_url: None
  • paper_authors: Evan Rose, Fnu Suya, David Evans
  • for: Studying how susceptible different subpopulations are to poisoning attacks in which an attacker controls a small fraction of the training data.
  • methods: State-of-the-art poisoning attacks are applied across subpopulations of synthetic and benchmark datasets to identify the properties that drive attack effectiveness.
  • results: For less separable datasets, dataset separability dominates subpopulation vulnerability, whereas well-separated datasets depend more on individual subpopulation properties. A key property is the difference in clean-data loss between the clean model and a target model that misclassifies the subpopulation: a subpopulation is much easier to attack when this loss difference is small (a minimal loss-gap sketch follows this entry). The finding also generalizes to high-dimensional benchmarks; on the Adult dataset, semantically meaningful subpopulation properties relate to the susceptibility of selected subpopulations.
    Abstract Machine learning is susceptible to poisoning attacks, in which an attacker controls a small fraction of the training data and chooses that data with the goal of inducing some behavior unintended by the model developer in the trained model. We consider a realistic setting in which the adversary with the ability to insert a limited number of data points attempts to control the model's behavior on a specific subpopulation. Inspired by previous observations on disparate effectiveness of random label-flipping attacks on different subpopulations, we investigate the properties that can impact the effectiveness of state-of-the-art poisoning attacks against different subpopulations. For a family of 2-dimensional synthetic datasets, we empirically find that dataset separability plays a dominant role in subpopulation vulnerability for less separable datasets. However, well-separated datasets exhibit more dependence on individual subpopulation properties. We further discover that a crucial subpopulation property is captured by the difference in loss on the clean dataset between the clean model and a target model that misclassifies the subpopulation, and a subpopulation is much easier to attack if the loss difference is small. This property also generalizes to high-dimensional benchmark datasets. For the Adult benchmark dataset, we show that we can find semantically-meaningful subpopulation properties that are related to the susceptibilities of a selected group of subpopulations. The results in this paper are accompanied by a fully interactive web-based visualization of subpopulation poisoning attacks found at https://uvasrg.github.io/visualizing-poisoning
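
The sketch below computes the susceptibility signal highlighted in the results: the gap in clean-data loss between a clean model and a target model that misclassifies a chosen subpopulation, here induced by flipping that subpopulation's labels. The 2-D synthetic data and cluster-based subpopulations are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
subpop = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

clean = LogisticRegression().fit(X, y)
clean_loss = log_loss(y, clean.predict_proba(X))

for k in range(4):
    y_target = y.copy()
    y_target[subpop == k] = 1 - y_target[subpop == k]      # target model flips this group
    target = LogisticRegression().fit(X, y_target)
    gap = log_loss(y, target.predict_proba(X)) - clean_loss
    print(f"subpopulation {k}: clean-loss gap = {gap:.3f}  (smaller => easier to attack)")
```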

An NMF-Based Building Block for Interpretable Neural Networks With Continual Learning

  • paper_url: http://arxiv.org/abs/2311.11485
  • repo_url: None
  • paper_authors: Brian K. Vogel
  • for: Striking a better balance between predictive performance and interpretability.
  • methods: A Predictive Factorized Coupling (PFC) block based on NMF that incorporates supervised neural-network training, achieving high predictive performance while retaining the interpretability of NMF (a bare-bones NMF-inference sketch follows this entry).
  • results: On small datasets the PFC block matches MLP predictive performance while offering better interpretability, and it also works well in continual learning, training on non-i.i.d. data, and knowledge removal after training. The block further supports more expressive architectures, including a fully-connected residual network and a factorized RNN that performs competitively with vanilla RNNs while remaining more interpretable.
    Abstract Existing learning methods often struggle to balance interpretability and predictive performance. While models like nearest neighbors and non-negative matrix factorization (NMF) offer high interpretability, their predictive performance on supervised learning tasks is often limited. In contrast, neural networks based on the multi-layer perceptron (MLP) support the modular construction of expressive architectures and tend to have better recognition accuracy but are often regarded as black boxes in terms of interpretability. Our approach aims to strike a better balance between these two aspects through the use of a building block based on NMF that incorporates supervised neural network training methods to achieve high predictive performance while retaining the desirable interpretability properties of NMF. We evaluate our Predictive Factorized Coupling (PFC) block on small datasets and show that it achieves competitive predictive performance with MLPs while also offering improved interpretability. We demonstrate the benefits of this approach in various scenarios, such as continual learning, training on non-i.i.d. data, and knowledge removal after training. Additionally, we show examples of using the PFC block to build more expressive architectures, including a fully-connected residual network as well as a factorized recurrent neural network (RNN) that performs competitively with vanilla RNNs while providing improved interpretability. The PFC block uses an iterative inference algorithm that converges to a fixed point, making it possible to trade off accuracy vs computation after training but also currently preventing its use as a general MLP replacement in some scenarios such as training on very large datasets. We provide source code at https://github.com/bkvogel/pfc
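
The snippet below illustrates the NMF-style iterative inference at the heart of such a block: with a fixed non-negative dictionary, a non-negative code is inferred by multiplicative updates that converge toward a fixed point. The supervised coupling of the actual PFC block is omitted, and the dictionary and data are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 8
W = rng.random((d, k))                        # fixed non-negative dictionary
x = W @ rng.random(k)                         # observation with an exact code

h = np.full(k, 0.5)                           # non-negative code to infer
for _ in range(200):                          # iterate to (near) a fixed point
    h *= (W.T @ x) / (W.T @ (W @ h) + 1e-12)  # multiplicative update keeps h >= 0
print("reconstruction error:", float(np.linalg.norm(W @ h - x)))
```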

Gaussian Interpolation Flows

  • paper_url: http://arxiv.org/abs/2311.11475
  • repo_url: None
  • paper_authors: Yuan Gao, Jian Huang, Yuling Jiao
  • for: Studying continuous normalizing flows built on Gaussian denoising, their theoretical properties, and the regularizing effect of Gaussian denoising.
  • methods: A unified framework, the Gaussian interpolation flow, is used to establish well-posedness: the Lipschitz regularity of the flow velocity field, the existence and uniqueness of the flow, and the Lipschitz continuity of the flow map and the time-reversed flow map (a 1-D numerical illustration follows this entry).
  • results: Gaussian interpolation flows are well-posed for several rich classes of target distributions and exhibit auto-encoding and cycle-consistency properties; their stability under perturbations of the source distribution and the velocity field is analyzed in the quadratic Wasserstein distance, providing a theoretical foundation for end-to-end error analyses of learning such flows.
    Abstract Gaussian denoising has emerged as a powerful principle for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free continuous normalizing flows built on Gaussian denoising. Through a unified framework termed Gaussian interpolation flow, we establish the Lipschitz regularity of the flow velocity field, the existence and uniqueness of the flow, and the Lipschitz continuity of the flow map and the time-reversed flow map for several rich classes of target distributions. This analysis also sheds light on the auto-encoding and cycle-consistency properties of Gaussian interpolation flows. Additionally, we delve into the stability of these flows in source distributions and perturbations of the velocity field, using the quadratic Wasserstein distance as a metric. Our findings offer valuable insights into the learning techniques employed in Gaussian interpolation flows for generative modeling, providing a solid theoretical foundation for end-to-end error analyses of learning GIFs with empirical observations.
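
The snippet below gives a 1-D numerical illustration: for a Gaussian target N(0, s^2) and the interpolation X_t = t*X_1 + (1-t)*Z, the flow velocity is available in closed form, and integrating the ODE transports standard-normal samples to the target. This particular schedule is just one simple choice among the interpolations covered by the framework.

```python
import numpy as np

s = 2.0                                              # target standard deviation
def velocity(t, x):
    var_t = t**2 * s**2 + (1 - t) ** 2               # Var(X_t)
    return (t * s**2 - (1 - t)) / var_t * x          # E[X_1 - Z | X_t = x]

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)                     # samples from the source N(0, 1)
n_steps = 1000
for i in range(n_steps):                             # forward-Euler flow integration
    t = i / n_steps
    x = x + velocity(t, x) / n_steps

print("empirical std after the flow:", round(float(x.std()), 3), "target:", s)
```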

Towards a Post-Market Monitoring Framework for Machine Learning-based Medical Devices: A case study

  • paper_url: http://arxiv.org/abs/2311.11463
  • repo_url: None
  • paper_authors: Jean Feng, Adarsh Subbaswamy, Alexej Gossmann, Harvineet Singh, Berkman Sahiner, Mi-Ok Kim, Gene Pennello, Nicholas Petrick, Romain Pirracchio, Fan Xia
  • for: Developing a systematic post-market monitoring strategy to ensure the safety and effectiveness of machine learning (ML) systems deployed in clinical practice.
  • methods: Tools from causal inference and statistical process control are combined to define, evaluate, and compare candidate monitoring procedures, illustrated through a case study of an ML-based risk prediction algorithm for postoperative nausea and vomiting (PONV) (a small control-chart sketch follows this entry).
  • results: A central design decision is whether to monitor with real-world (observational) data or with an interventional study; observational data is convenient but subject to confounding, selection, and missingness biases (and the deployed ML algorithm itself can introduce bias), while a randomized interventional study removes these biases at the cost of ethical, feasibility, and budget considerations.
    Abstract After a machine learning (ML)-based system is deployed in clinical practice, performance monitoring is important to ensure the safety and effectiveness of the algorithm over time. The goal of this work is to highlight the complexity of designing a monitoring strategy and the need for a systematic framework that compares the multitude of monitoring options. One of the main decisions is choosing between using real-world (observational) versus interventional data. Although the former is the most convenient source of monitoring data, it exhibits well-known biases, such as confounding, selection, and missingness. In fact, when the ML algorithm interacts with its environment, the algorithm itself may be a primary source of bias. On the other hand, a carefully designed interventional study that randomizes individuals can explicitly eliminate such biases, but the ethics, feasibility, and cost of such an approach must be carefully considered. Beyond the decision of the data source, monitoring strategies vary in the performance criteria they track, the interpretability of the test statistics, the strength of their assumptions, and their speed at detecting performance decay. As a first step towards developing a framework that compares the various monitoring options, we consider a case study of an ML-based risk prediction algorithm for postoperative nausea and vomiting (PONV). Bringing together tools from causal inference and statistical process control, we walk through the basic steps of defining candidate monitoring criteria, describing potential sources of bias and the causal model, and specifying and comparing candidate monitoring procedures. We hypothesize that these steps can be applied more generally, as causal inference can address other sources of biases as well.
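
As a small statistical-process-control illustration of the monitoring step, the sketch below runs a one-sided CUSUM chart on a simulated weekly error-rate metric and raises an alarm when performance decays; the injected drift, reference allowance, and decision limit are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
weeks = 60
error_rate = rng.normal(0.10, 0.01, weeks)
error_rate[30:] += 0.03                         # performance decay after deployment drift

target, k, h = 0.10, 0.005, 0.04                # in-control mean, allowance, decision limit
cusum, alarm_week = 0.0, None
for w, e in enumerate(error_rate):
    cusum = max(0.0, cusum + (e - target - k))  # accumulate upward deviations only
    if cusum > h and alarm_week is None:
        alarm_week = w
print("alarm raised at week:", alarm_week)
```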