cs.LG - 2023-11-14

Variational Temporal IRT: Fast, Accurate, and Explainable Inference of Dynamic Learner Proficiency

  • paper_url: http://arxiv.org/abs/2311.08594
  • repo_url: None
  • paper_authors: Yunsung Kim, Sreechan Sankaranarayanan, Chris Piech, Candace Thille
  • for: This paper aims to infer dynamic learner proficiency quickly and accurately.
  • methods: The approach, Variational Temporal IRT (VTIRT), performs fast and accurate variational inference for dynamic item response models and scales to massive datasets (a minimal generative-model sketch follows the abstract).
  • results: Applied to 9 real student datasets, VTIRT consistently improves the prediction of future learner performance over other learner proficiency models, with orders-of-magnitude speedup in inference runtime.
    Abstract Dynamic Item Response Models extend the standard Item Response Theory (IRT) to capture temporal dynamics in learner ability. While these models have the potential to allow instructional systems to actively monitor the evolution of learner proficiency in real time, existing dynamic item response models rely on expensive inference algorithms that scale poorly to massive datasets. In this work, we propose Variational Temporal IRT (VTIRT) for fast and accurate inference of dynamic learner proficiency. VTIRT offers orders of magnitude speedup in inference runtime while still providing accurate inference. Moreover, the proposed algorithm is intrinsically interpretable by virtue of its modular design. When applied to 9 real student datasets, VTIRT consistently yields improvements in predicting future learner performance over other learner proficiency models.
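To make the temporal-IRT setup concrete, here is a minimal, hypothetical sketch (not the authors' VTIRT algorithm) of the generative model such methods build on: a learner's ability follows a Gaussian random walk over time, and each response is Bernoulli under a 2PL IRT likelihood. Names such as `simulate_learner` and all parameter values are illustrative assumptions.

```python
import numpy as np

def simulate_learner(n_steps, item_difficulty, item_discrimination,
                     drift_std=0.3, rng=None):
    """Simulate one learner under a dynamic (temporal) IRT model.

    Ability theta_t follows a Gaussian random walk; the response at step t
    is Bernoulli with logit a_t * (theta_t - b_t) (2PL IRT likelihood).
    """
    rng = np.random.default_rng(rng)
    theta = np.zeros(n_steps)
    responses = np.zeros(n_steps, dtype=int)
    theta[0] = rng.normal()
    for t in range(n_steps):
        if t > 0:
            theta[t] = theta[t - 1] + rng.normal(0.0, drift_std)
        logit = item_discrimination[t] * (theta[t] - item_difficulty[t])
        p_correct = 1.0 / (1.0 + np.exp(-logit))
        responses[t] = rng.binomial(1, p_correct)
    return theta, responses

# Example: 20 interactions with random item parameters.
rng = np.random.default_rng(0)
b = rng.normal(size=20)               # difficulties
a = np.abs(rng.normal(1.0, 0.2, 20))  # discriminations
theta, y = simulate_learner(20, b, a, rng=1)
print(y)
```

Inferring the latent ability trajectory from the responses is exactly the inference problem the paper's variational algorithm addresses.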

Uncertainty Quantification in Neural-Network Based Pain Intensity Estimation

  • paper_url: http://arxiv.org/abs/2311.08569
  • repo_url: None
  • paper_authors: Burcu Ozek, Zhenyuan Lu, Srinivasan Radhakrishnan, Sagar Kamarthi
  • for: This study presents a neural-network-based method for objective pain interval estimation that incorporates uncertainty quantification.
  • methods: Three algorithms are explored: the bootstrap method, lower/upper bound estimation (LossL) optimized by a genetic algorithm, and a modified lower/upper bound estimation (LossS) optimized by gradient descent (a generic bootstrap prediction-interval sketch follows the abstract).
  • results: LossS yields narrower prediction intervals than the other two algorithms and is evaluated in three pain-assessment scenarios (generalized, personalized, and hybrid); the hybrid approach performs best and is the most practical in clinical contexts.
    Abstract Improper pain management can lead to severe physical or mental consequences, including suffering, and an increased risk of opioid dependency. Assessing the presence and severity of pain is imperative to prevent such outcomes and determine the appropriate intervention. However, the evaluation of pain intensity is challenging because different individuals experience pain differently. To overcome this, researchers have employed machine learning models to evaluate pain intensity objectively. However, these efforts have primarily focused on point estimation of pain, disregarding the inherent uncertainty and variability present in the data and model. Consequently, the point estimates provide only partial information for clinical decision-making. This study presents a neural network-based method for objective pain interval estimation, incorporating uncertainty quantification. This work explores three algorithms: the bootstrap method, lower and upper bound estimation (LossL) optimized by genetic algorithm, and modified lower and upper bound estimation (LossS) optimized by gradient descent algorithm. Our empirical results reveal that LossS outperforms the other two by providing a narrower prediction interval. As LossS outperforms, we assessed its performance in three different scenarios for pain assessment: (1) a generalized approach (single model for the entire population), (2) a personalized approach (separate model for each individual), and (3) a hybrid approach (separate model for each cluster of individuals). Our findings demonstrate the hybrid approach's superior performance, with notable practicality in clinical contexts. It has the potential to be a valuable tool for clinicians, enabling objective pain intensity assessment while taking uncertainty into account. This capability is crucial in facilitating effective pain management and reducing the risks associated with improper treatment.
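As a point of reference for the bootstrap baseline discussed above, here is a minimal, generic sketch of bootstrap prediction intervals around a neural-network regressor. scikit-learn's MLPRegressor and all hyperparameters are stand-ins chosen for brevity; the paper's models and the LossL/LossS objectives are not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def bootstrap_prediction_interval(X_train, y_train, X_test,
                                  n_models=30, alpha=0.1, seed=0):
    """Ensemble of networks fit on bootstrap resamples; the empirical
    quantiles of their predictions give an approximate (1 - alpha) interval."""
    rng = np.random.default_rng(seed)
    preds = []
    n = len(X_train)
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # resample with replacement
        model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=int(rng.integers(1_000_000)))
        model.fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    preds = np.stack(preds)               # (n_models, n_test)
    lower = np.quantile(preds, alpha / 2, axis=0)
    upper = np.quantile(preds, 1 - alpha / 2, axis=0)
    return lower, upper

# Toy usage with synthetic data standing in for pain-related features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)
lo, hi = bootstrap_prediction_interval(X[:150], y[:150], X[150:])
print(float(np.mean(hi - lo)))  # average interval width
```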

Manifold learning in Wasserstein space

  • paper_url: http://arxiv.org/abs/2311.08549
  • repo_url: None
  • paper_authors: Keaton Hamm, Caroline Moosmüller, Bernhard Schmitzer, Matthew Thorpe
  • for: This paper builds theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact, convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $W$.
  • methods: It introduces a natural construction of submanifolds $\Lambda$ of probability measures equipped with the metric $W_\Lambda$, the geodesic restriction of $W$ to $\Lambda$; these submanifolds need not be flat but still admit local linearizations. The latent structure of $\Lambda$ is then learned from samples $\{\lambda_i\}_{i=1}^N$ and pairwise extrinsic Wasserstein distances $W$ alone (a graph-based sketch follows the abstract).
  • results: The metric space $(\Lambda, W_\Lambda)$ can be asymptotically recovered, in the Gromov--Wasserstein sense, from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i,\lambda_j)$; moreover, the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" built from optimal transport maps from $\lambda$ to sufficiently close and diverse samples.
    Abstract This paper aims at building the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures on a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $W$. We begin by introducing a natural construction of submanifolds $\Lambda$ of probability measures equipped with metric $W_\Lambda$, the geodesic restriction of $W$ to $\Lambda$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(\Lambda,W_{\Lambda})$ can be learned from samples $\{\lambda_i\}_{i=1}^N$ of $\Lambda$ and pairwise extrinsic Wasserstein distances $W$ only. In particular, we show that the metric space $(\Lambda,W_{\Lambda})$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{\lambda_i\}_{i=1}^N$ and edge weights $W(\lambda_i,\lambda_j)$. In addition, we demonstrate how the tangent space at a sample $\lambda$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" using optimal transport maps from $\lambda$ to sufficiently close and diverse samples $\{\lambda_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $\Lambda$ and numerical examples on the recovery of tangent spaces through spectral analysis.
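The graph construction described above can be illustrated with a small, generic sketch: pairwise Wasserstein-2 distances between empirical measures (computed here with the POT library) define edge weights of a k-nearest-neighbor graph, and shortest-path distances on that graph approximate the geodesic metric $W_\Lambda$. This is an Isomap-style illustration under our own assumptions, not the paper's estimator.

```python
import numpy as np
import ot  # POT: Python Optimal Transport
from scipy.sparse.csgraph import shortest_path

def w2_distance(X, Y):
    """Wasserstein-2 distance between two empirical measures (uniform weights)."""
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    M = ot.dist(X, Y)              # squared Euclidean cost by default
    return np.sqrt(ot.emd2(a, b, M))

def geodesic_graph_distances(samples, k=5):
    """Pairwise extrinsic W2 distances -> kNN graph -> shortest-path distances."""
    N = len(samples)
    D = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            D[i, j] = D[j, i] = w2_distance(samples[i], samples[j])
    G = np.full_like(D, np.inf)    # keep only the k nearest neighbors of each node
    for i in range(N):
        nn = np.argsort(D[i])[1:k + 1]
        G[i, nn] = D[i, nn]
    G = np.minimum(G, G.T)         # symmetrize
    return shortest_path(G, method="D", directed=False)

# Toy example: measures are point clouds sampled around shifted Gaussians.
rng = np.random.default_rng(0)
measures = [rng.normal(loc=[t, 0.0], scale=0.2, size=(100, 2))
            for t in np.linspace(0, 3, 12)]
print(geodesic_graph_distances(measures)[0, -1])
```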

Low-Frequency Load Identification using CNN-BiLSTM Attention Mechanism

  • paper_url: http://arxiv.org/abs/2311.08536
  • repo_url: None
  • paper_authors: Amanie Azzam, Saba Sanami, Amir G. Aghdam
  • for: This work proposes a hybrid learning approach, combining a convolutional neural network with a bidirectional LSTM, for disaggregating low-frequency power data.
  • methods: The hybrid CNN-BiLSTM model includes an integrated attention mechanism to improve the accuracy of low-frequency load disaggregation (a model-skeleton sketch follows the abstract).
  • results: Simulations on the low-frequency REDD dataset show that the proposed method outperforms existing approaches in both accuracy and computation time.
    Abstract Non-intrusive Load Monitoring (NILM) is an established technique for effective and cost-efficient electricity consumption management. The method is used to estimate appliance-level power consumption from aggregated power measurements. This paper presents a hybrid learning approach, consisting of a convolutional neural network (CNN) and a bidirectional long short-term memory (BILSTM), featuring an integrated attention mechanism, all within the context of disaggregating low-frequency power data. While prior research has been mainly focused on high-frequency data disaggregation, our study takes a distinct direction by concentrating on low-frequency data. The proposed hybrid CNN-BILSTM model is adept at extracting both temporal (time-related) and spatial (location-related) features, allowing it to precisely identify energy consumption patterns at the appliance level. This accuracy is further enhanced by the attention mechanism, which aids the model in pinpointing crucial parts of the data for more precise event detection and load disaggregation. We conduct simulations using the existing low-frequency REDD dataset to assess our model performance. The results demonstrate that our proposed approach outperforms existing methods in terms of accuracy and computation time.
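For readers unfamiliar with this architecture family, a minimal PyTorch skeleton of a CNN + BiLSTM + attention sequence model is sketched below. Layer sizes and the attention form are illustrative guesses, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CNNBiLSTMAttention(nn.Module):
    """1D CNN front-end, BiLSTM encoder, additive attention pooling, linear head."""
    def __init__(self, n_appliances, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)      # additive attention scores
        self.head = nn.Linear(2 * hidden, n_appliances)

    def forward(self, x):                          # x: (batch, seq_len) aggregate power
        h = self.cnn(x.unsqueeze(1))               # (batch, 64, seq_len)
        h, _ = self.bilstm(h.transpose(1, 2))      # (batch, seq_len, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)     # (batch, seq_len, 1)
        ctx = (w * h).sum(dim=1)                   # attention-weighted context
        return self.head(ctx)                      # per-appliance estimate

model = CNNBiLSTMAttention(n_appliances=5)
print(model(torch.randn(8, 600)).shape)            # torch.Size([8, 5])
```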

On semi-supervised estimation using exponential tilt mixture models

  • paper_url: http://arxiv.org/abs/2311.08504
  • repo_url: None
  • paper_authors: Ye Tian, Xinwei Zhang, Zhiqiang Tan
  • for: This paper studies semi-supervised estimation using exponential tilt mixture (ETM) models and maximum nonparametric likelihood estimation to improve predictive performance.
  • methods: The authors derive asymptotic properties of ETM-based estimation while allowing the class proportions to differ between the labeled and unlabeled data.
  • results: ETM-based estimation is more efficient than supervised logistic regression under both random sampling and outcome-stratified sampling; the efficiency gain is reconciled with existing semiparametric efficiency theory when the class proportions are restricted to be equal, and a simulation study illustrates the theory numerically.
    Abstract Consider a semi-supervised setting with a labeled dataset of binary responses and predictors and an unlabeled dataset with only the predictors. Logistic regression is equivalent to an exponential tilt model in the labeled population. For semi-supervised estimation, we develop further analysis and understanding of a statistical approach using exponential tilt mixture (ETM) models and maximum nonparametric likelihood estimation, while allowing that the class proportions may differ between the unlabeled and labeled data. We derive asymptotic properties of ETM-based estimation and demonstrate improved efficiency over supervised logistic regression in a random sampling setup and an outcome-stratified sampling setup previously used. Moreover, we reconcile such efficiency improvement with the existing semiparametric efficiency theory when the class proportions in the unlabeled and labeled data are restricted to be the same. We also provide a simulation study to numerically illustrate our theoretical findings.

Variational Quantum Eigensolver with Constraints (VQEC): Solving Constrained Optimization Problems via VQE

  • paper_url: http://arxiv.org/abs/2311.08502
  • repo_url: None
  • paper_authors: Thinh Viet Le, Vassilis Kekatos
  • for: Proposes a hybrid quantum-classical algorithmic paradigm that extends the celebrated VQE to handle optimization problems with constraints.
  • methods: The vector of optimization variables is captured by the state of a variational quantum circuit, and a Lagrangian function is optimized classically over both the circuit parameters and the dual variables associated with the constraints, with updates via a perturbed primal-dual method that leverages the parameter shift rule (a classical primal-dual loop is sketched after the abstract).
  • results: VQEC can approximately solve quadratically-constrained binary optimization (QCBO) problems, find stochastic binary policies satisfying quadratic constraints on average and in probability, and solve large-scale linear programs (LPs) over the probability simplex.
    Abstract Variational quantum approaches have shown great promise in finding near-optimal solutions to computationally challenging tasks. Nonetheless, enforcing constraints in a disciplined fashion has been largely unexplored. To address this gap, this work proposes a hybrid quantum-classical algorithmic paradigm termed VQEC that extends the celebrated VQE to handle optimization with constraints. As with the standard VQE, the vector of optimization variables is captured by the state of a variational quantum circuit (VQC). To deal with constraints, VQEC optimizes a Lagrangian function classically over both the VQC parameters as well as the dual variables associated with constraints. To comply with the quantum setup, variables are updated via a perturbed primal-dual method leveraging the parameter shift rule. Among a wide gamut of potential applications, we showcase how VQEC can approximately solve quadratically-constrained binary optimization (QCBO) problems, find stochastic binary policies satisfying quadratic constraints on the average and in probability, and solve large-scale linear programs (LP) over the probability simplex. Under an assumption on the error for the VQC to approximate an arbitrary probability mass function (PMF), we provide bounds on the optimality gap attained by a VQC. Numerical tests on a quantum simulator investigate the effect of various parameters and corroborate that VQEC can generate high-quality solutions.
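To illustrate only the classical outer loop, here is a generic projected primal-dual (dual ascent) sketch for a Lagrangian $L(\theta,\mu) = f(\theta) + \mu^\top g(\theta)$. The functions `objective_expectation` and `constraint_expectations`, which in VQEC would be estimated from the variational circuit (with gradients via the parameter shift rule), are toy classical stand-ins; this is not the authors' implementation.

```python
import numpy as np

def objective_expectation(theta):
    """Stand-in for the circuit's objective expectation; here a toy quadratic."""
    return float(np.sum((theta - 1.0) ** 2))

def constraint_expectations(theta):
    """Stand-in for constraint expectations g(theta) <= 0; one toy constraint."""
    return np.array([np.sum(theta) - 1.0])

def numerical_grad(f, x, eps=1e-4):
    """Finite-difference gradient (the parameter shift rule replaces this on hardware)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def primal_dual(theta0, n_iters=200, eta_theta=0.05, eta_mu=0.05):
    theta, mu = theta0.copy(), np.zeros(1)
    for _ in range(n_iters):
        lagr = lambda th: objective_expectation(th) + mu @ constraint_expectations(th)
        theta -= eta_theta * numerical_grad(lagr, theta)                      # primal descent
        mu = np.maximum(0.0, mu + eta_mu * constraint_expectations(theta))    # dual ascent
    return theta, mu

theta, mu = primal_dual(np.zeros(3))
print(theta, constraint_expectations(theta))
```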

Ensemble sampling for linear bandits: small ensembles suffice

  • paper_url: http://arxiv.org/abs/2311.08376
  • repo_url: None
  • paper_authors: David Janz, Alexander E. Litvak, Csaba Szepesvári
  • for: This paper provides the first rigorous analysis of ensemble sampling in the stochastic linear bandit setting.
  • methods: Ensemble sampling is analyzed with an ensemble size on the order of $d \log T$ (a minimal implementation sketch follows the abstract).
  • results: Under standard assumptions, ensemble sampling with an ensemble of size of order $d \log T$ incurs regret of order $(d \log T)^{5/2} \sqrt{T}$, i.e., near-$\sqrt{T}$ regret without the ensemble size growing linearly with $T$; this is the first such result in a structured setting and the first that allows infinite action sets.
    Abstract We provide the first useful, rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d \log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the first result in any structured setting not to require the size of the ensemble to scale linearly with $T$ -- which defeats the purpose of ensemble sampling -- while obtaining near $\sqrt{T}$ order regret. Ours is also the first result that allows infinite action sets.
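A minimal implementation sketch of ensemble sampling for a linear bandit is given below: each ensemble member maintains a regularized least-squares estimate fit to rewards perturbed with independent Gaussian noise, and at each round a member is drawn uniformly and its greedy action is played. The hyperparameters and the exact perturbation scheme are illustrative, not those analyzed in the paper.

```python
import numpy as np

def ensemble_sampling(actions, theta_star, T, m, noise_std=0.1, lam=1.0, seed=0):
    """Ensemble sampling for a stochastic linear bandit with a finite action set.

    actions: (K, d) array of arms; theta_star: (d,) true parameter (for simulation).
    Each of the m members keeps its own perturbed ridge-regression estimate.
    """
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    V = np.array([lam * np.eye(d) for _ in range(m)])                 # per-member Gram matrices
    b = np.array([rng.normal(0.0, noise_std, d) for _ in range(m)])   # perturbed targets
    regret = 0.0
    best = np.max(actions @ theta_star)
    for _ in range(T):
        j = rng.integers(m)                                  # pick an ensemble member
        theta_hat = np.linalg.solve(V[j], b[j])
        a = actions[np.argmax(actions @ theta_hat)]          # greedy w.r.t. that member
        reward = a @ theta_star + rng.normal(0.0, noise_std)
        regret += best - a @ theta_star
        for i in range(m):                                   # update every member
            V[i] += np.outer(a, a)
            b[i] += a * (reward + rng.normal(0.0, noise_std))  # fresh perturbation per member
    return regret

actions = np.random.default_rng(1).normal(size=(50, 5))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)
theta = np.ones(5) / np.sqrt(5)
print(ensemble_sampling(actions, theta, T=2000, m=20))
```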

Transformers can optimally learn regression mixture models

  • paper_url: http://arxiv.org/abs/2311.08362
  • repo_url: None
  • paper_authors: Reese Pathak, Rajat Sen, Weihao Kong, Abhimanyu Das
  • for: Investigates the hypothesis that transformers can learn an optimal predictor for mixtures of regressions.
  • methods: Constructs a generative process for mixtures of linear regressions whose decision-theoretic optimal procedure is given by data-driven exponential weights over a finite set of parameters, trains transformers on data from this process, and proves constructively that the optimal procedure is implementable by a transformer (the exponential-weights predictor is sketched after the abstract).
  • results: Transformers learn mixtures of regressions in a sample-efficient fashion, are somewhat robust to distribution shifts, and make predictions close to the optimal predictor.
    Abstract Mixture models arise in many regression problems, but most methods have seen limited adoption partly due to these algorithms' highly-tailored and model-specific nature. On the other hand, transformers are flexible, neural sequence models that present the intriguing possibility of providing general-purpose prediction methods, even in this mixture setting. In this work, we investigate the hypothesis that transformers can learn an optimal predictor for mixtures of regressions. We construct a generative process for a mixture of linear regressions for which the decision-theoretic optimal procedure is given by data-driven exponential weights on a finite set of parameters. We observe that transformers achieve low mean-squared error on data generated via this process. By probing the transformer's output at inference time, we also show that transformers typically make predictions that are close to the optimal predictor. Our experiments also demonstrate that transformers can learn mixtures of regressions in a sample-efficient fashion and are somewhat robust to distribution shifts. We complement our experimental observations by proving constructively that the decision-theoretic optimal procedure is indeed implementable by a transformer.
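The exponential-weights predictor mentioned in the abstract has a simple closed form when the mixture components are known: given in-context examples $(x_i, y_i)$ with Gaussian noise of variance $\sigma^2$, each candidate parameter $w_k$ receives weight proportional to its prior times $\exp\big(-\sum_i (y_i - w_k^\top x_i)^2 / 2\sigma^2\big)$, and the prediction at a query $x$ is the weighted average of $w_k^\top x$. The sketch below is a generic illustration under these assumptions, not code from the paper.

```python
import numpy as np

def exp_weights_predictor(Ws, prior, X_ctx, y_ctx, x_query, sigma=0.5):
    """Posterior-weighted (exponential weights) prediction for a mixture of
    linear regressions with known candidate parameters Ws of shape (K, d)."""
    resid = y_ctx[None, :] - Ws @ X_ctx.T                 # (K, n) residuals per candidate
    log_w = np.log(prior) - 0.5 * np.sum(resid ** 2, axis=1) / sigma ** 2
    log_w -= log_w.max()                                  # numerical stabilization
    w = np.exp(log_w)
    w /= w.sum()
    return float(w @ (Ws @ x_query))                      # weighted prediction

# Toy mixture: two regression vectors, context drawn from one of them.
rng = np.random.default_rng(0)
Ws = np.array([[1.0, -1.0], [0.5, 2.0]])
prior = np.array([0.5, 0.5])
X = rng.normal(size=(10, 2))
y = X @ Ws[1] + rng.normal(0.0, 0.5, size=10)             # true component: index 1
print(exp_weights_predictor(Ws, prior, X, y, np.array([1.0, 1.0])))
```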

Sparsity-Preserving Differentially Private Training of Large Embedding Models

  • paper_url: http://arxiv.org/abs/2311.08357
  • repo_url: None
  • paper_authors: Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang
  • for: Protect user privacy when training large embedding models, preventing leakage of personal information.
  • methods: Two new algorithms, DP-FEST and DP-AdaFEST, preserve gradient sparsity during differentially private training of large embedding models, improving training efficiency (the sparsity issue with naive DP-SGD is illustrated after the abstract).
  • results: On real-world benchmark datasets, the algorithms achieve substantial reductions in gradient size ($10^6 \times$) while maintaining comparable accuracy.
    Abstract As the use of large embedding models in recommendation systems and language applications increases, concerns over user data privacy have also risen. DP-SGD, a training algorithm that combines differential privacy with stochastic gradient descent, has been the workhorse in protecting user privacy without compromising model accuracy by much. However, applying DP-SGD naively to embedding models can destroy gradient sparsity, leading to reduced training efficiency. To address this issue, we present two new algorithms, DP-FEST and DP-AdaFEST, that preserve gradient sparsity during private training of large embedding models. Our algorithms achieve substantial reductions ($10^6 \times$) in gradient size, while maintaining comparable levels of accuracy, on benchmark real-world datasets.
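The motivating problem is easy to see numerically: an embedding-table gradient touches only the rows of the few tokens in a batch, but naive DP-SGD adds Gaussian noise to every row, so the privatized gradient becomes fully dense. The sketch below only illustrates this effect; it is not the DP-FEST or DP-AdaFEST algorithm.

```python
import numpy as np

def naive_dp_gradient(grad, clip_norm=1.0, noise_mult=1.0, rng=None):
    """Clip the (sparse) embedding gradient, then add dense Gaussian noise
    to every row, as plain DP-SGD would; sparsity is destroyed."""
    rng = np.random.default_rng(rng)
    norm = np.linalg.norm(grad)
    grad = grad * min(1.0, clip_norm / (norm + 1e-12))
    return grad + rng.normal(0.0, noise_mult * clip_norm, size=grad.shape)

# Toy embedding table: 10,000 rows, but a batch only touches 8 of them.
rng = np.random.default_rng(0)
grad = np.zeros((10_000, 16))
touched = rng.choice(10_000, size=8, replace=False)
grad[touched] = rng.normal(size=(8, 16))

print("nonzero rows before:", np.count_nonzero(np.abs(grad).sum(axis=1)))
print("nonzero rows after :", np.count_nonzero(np.abs(naive_dp_gradient(grad)).sum(axis=1)))
```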

Mean-field variational inference with the TAP free energy: Geometric and statistical properties in linear models

  • paper_url: http://arxiv.org/abs/2311.08442
  • repo_url: None
  • paper_authors: Michael Celentano, Zhou Fan, Licong Lin, Song Mei
  • for: Studies mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p.
  • methods: Minimizes the TAP free energy and uses an Approximate Message Passing (AMP) algorithm to locate the relevant local minimizer.
  • results: The TAP free energy has a local minimizer that provides a consistent estimate of the posterior marginals, and an efficient algorithm converges linearly to this minimizer within the local neighborhood reachable by AMP.
    Abstract We study mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p. In high dimensions, the common approach of minimizing a Kullback-Leibler divergence from the posterior distribution, or maximizing an evidence lower bound, may deviate from the true posterior mean and underestimate posterior uncertainty. We study instead minimization of the TAP free energy, showing in a high-dimensional asymptotic framework that it has a local minimizer which provides a consistent estimate of the posterior marginals and may be used for correctly calibrated posterior inference. Geometrically, we show that the landscape of the TAP free energy is strongly convex in an extensive neighborhood of this local minimizer, which under certain general conditions can be found by an Approximate Message Passing (AMP) algorithm. We then exhibit an efficient algorithm that linearly converges to the minimizer within this local neighborhood. In settings where it is conjectured that no efficient algorithm can find this local neighborhood, we prove analogous geometric properties for a local minimizer of the TAP free energy reachable by AMP, and show that posterior inference based on this minimizer remains correctly calibrated.

Introducing an Improved Information-Theoretic Measure of Predictive Uncertainty

  • paper_url: http://arxiv.org/abs/2311.08309
  • repo_url: None
  • paper_authors: Kajetan Schweighofer, Lukas Aichberger, Mykyta Ielanskyi, Sepp Hochreiter
  • for: This work aims to improve decision-making with machine learning models in real-world applications by distinguishing what a model knows from what it does not.
  • methods: The authors analyze the limitations of the common entropy-of-the-BMA measure of predictive uncertainty and introduce a theoretically grounded alternative (the standard entropy-based decomposition is sketched after the abstract).
  • results: The proposed measure behaves more reasonably in controlled synthetic tasks and proves advantageous in evaluations on ImageNet that make use of predictive uncertainty.
    Abstract Applying a machine learning model for decision-making in the real world requires to distinguish what the model knows from what it does not. A critical factor in assessing the knowledge of a model is to quantify its predictive uncertainty. Predictive uncertainty is commonly measured by the entropy of the Bayesian model average (BMA) predictive distribution. Yet, the properness of this current measure of predictive uncertainty was recently questioned. We provide new insights regarding those limitations. Our analyses show that the current measure erroneously assumes that the BMA predictive distribution is equivalent to the predictive distribution of the true model that generated the dataset. Consequently, we introduce a theoretically grounded measure to overcome these limitations. We experimentally verify the benefits of our introduced measure of predictive uncertainty. We find that our introduced measure behaves more reasonably in controlled synthetic tasks. Moreover, our evaluations on ImageNet demonstrate that our introduced measure is advantageous in real-world applications utilizing predictive uncertainty.
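For context, the commonly used information-theoretic quantities are computed from an ensemble (a sample-based Bayesian model average) as follows: total uncertainty is the entropy of the averaged predictive distribution, aleatoric uncertainty is the average per-member entropy, and their difference (the mutual information) is the usual epistemic term. The sketch computes these standard quantities; it does not implement the paper's amended measure.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    return -np.sum(p * np.log(p + eps), axis=axis)

def uncertainty_decomposition(member_probs):
    """member_probs: (n_members, n_classes) predictive distributions from an ensemble.

    Returns (total, aleatoric, epistemic) where
      total     = H(mean_m p_m)        (entropy of the BMA predictive)
      aleatoric = mean_m H(p_m)
      epistemic = total - aleatoric    (mutual information)
    """
    bma = member_probs.mean(axis=0)
    total = entropy(bma)
    aleatoric = entropy(member_probs).mean()
    return total, aleatoric, total - aleatoric

# Members that agree on a confident prediction -> low epistemic uncertainty.
agree = np.array([[0.9, 0.05, 0.05]] * 5)
# Members that are individually confident but disagree -> high epistemic uncertainty.
disagree = np.array([[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]])
print(uncertainty_decomposition(agree))
print(uncertainty_decomposition(disagree))
```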

On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling

  • paper_url: http://arxiv.org/abs/2311.08290
  • repo_url: None
  • paper_authors: Nicholas E. Corrado, Josiah P. Hanna
  • for: Improve the data efficiency of on-policy RL algorithms by reducing the sampling error that causes noisy policy updates.
  • methods: Proposes Proximal Robust On-Policy Sampling (PROPS), an adaptive, off-policy sampling method that collects data with a behavior policy that raises the probability of actions currently under-sampled relative to the target policy, and reuses rather than discards data from old policies.
  • results: On continuous-action MuJoCo benchmark tasks and discrete-action tasks, PROPS reduces sampling error throughout training and improves the data efficiency of on-policy policy gradient algorithms.
    Abstract On-policy reinforcement learning (RL) algorithms perform policy updates using i.i.d. trajectories collected by the current policy. However, after observing only a finite number of trajectories, on-policy sampling may produce data that fails to match the expected on-policy data distribution. This sampling error leads to noisy updates and data inefficient on-policy learning. Recent work in the policy evaluation setting has shown that non-i.i.d., off-policy sampling can produce data with lower sampling error than on-policy sampling can produce. Motivated by this observation, we introduce an adaptive, off-policy sampling method to improve the data efficiency of on-policy policy gradient algorithms. Our method, Proximal Robust On-Policy Sampling (PROPS), reduces sampling error by collecting data with a behavior policy that increases the probability of sampling actions that are under-sampled with respect to the current policy. Rather than discarding data from old policies -- as is commonly done in on-policy algorithms -- PROPS uses data collection to adjust the distribution of previously collected data to be approximately on-policy. We empirically evaluate PROPS on both continuous-action MuJoCo benchmark tasks as well as discrete-action tasks and demonstrate that (1) PROPS decreases sampling error throughout training and (2) improves the data efficiency of on-policy policy gradient algorithms. Our work improves the RL community's understanding of a nuance in the on-policy vs off-policy dichotomy: on-policy learning requires on-policy data, not on-policy sampling.

Mixed Attention Network for Cross-domain Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2311.08272
  • repo_url: https://github.com/guanyu-lin/man
  • paper_authors: Guanyu Lin, Chen Gao, Yu Zheng, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li, Meng Wang
  • for: Address data sparsity in sequential recommendation, especially for new users, by exploiting cross-domain information.
  • methods: Proposes a Mixed Attention Network (MAN) consisting of local/global encoding layers to capture domain-specific and cross-domain sequential patterns, a mixed attention layer (item-similarity, sequence-fusion, and group-prototype attention) to capture local/global item similarity, fuse local/global item sequences, and extract user groups across domains, and local/global prediction layers to further evolve and combine domain-specific and cross-domain interests.
  • results: Experiments on two real-world datasets (each with two domains) demonstrate the superiority of the proposed model, and further studies show that the method and its components are model-agnostic and effective. Code and data are available at https://github.com/Guanyu-Lin/MAN.
    Abstract In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, which suffers from data sparsity issues, especially for new users. One promising line of work is the cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recent proposed cross-domain sequential recommendation models such as PiNet and DASL have a common drawback relying heavily on overlapped users in different domains, which limits their usage in practical recommender systems. In this paper, we propose a Mixed Attention Network (MAN) with local and global attention modules to extract the domain-specific and cross-domain information. Firstly, we propose a local/global encoding layer to capture the domain-specific/cross-domain sequential pattern. Then we propose a mixed attention layer with item similarity attention, sequence-fusion attention, and group-prototype attention to capture the local/global item similarity, fuse the local/global item sequence, and extract the user groups across different domains, respectively. Finally, we propose a local/global prediction layer to further evolve and combine the domain-specific and cross-domain interests. Experimental results on two real-world datasets (each with two domains) demonstrate the superiority of our proposed model. Further study also illustrates that our proposed method and components are model-agnostic and effective, respectively. The code and data are available at https://github.com/Guanyu-Lin/MAN.

Mobility-Induced Graph Learning for WiFi Positioning

  • paper_url: http://arxiv.org/abs/2311.08271
  • repo_url: None
  • paper_authors: Kyuwon Han, Seung Min Yu, Seong-Lyun Kim, Seung-Woo Ko
  • for: This paper proposes integrating smartphone-based user mobility tracking with WiFi positioning to achieve more accurate localization.
  • methods: Two graphs are built from user mobility, a time-driven mobility graph (TMG) and a direction-driven mobility graph (DMG); two graph convolutional network (GCN) models with shared weights are trained jointly on WiFi RTT features over these graphs, and a mobility regularization term that requires no ground-truth locations enables semi- and self-supervised learning.
  • results: Field experiments show better positioning accuracy than benchmarks, with root mean square errors (RMSEs) of 1.398 m and 1.073 m for the self- and semi-supervised cases, corresponding to reductions of 27.3% and 44.4%, respectively.
    Abstract A smartphone-based user mobility tracking could be effective in finding his/her location, while the unpredictable error therein due to low specification of built-in inertial measurement units (IMUs) rejects its standalone usage but demands the integration to another positioning technique like WiFi positioning. This paper aims to propose a novel integration technique using a graph neural network called Mobility-INduced Graph LEarning (MINGLE), which is designed based on two types of graphs made by capturing different user mobility features. Specifically, considering sequential measurement points (MPs) as nodes, a user's regular mobility pattern allows us to connect neighbor MPs as edges, called time-driven mobility graph (TMG). Second, a user's relatively straight transition at a constant pace when moving from one position to another can be captured by connecting the nodes on each path, called a direction-driven mobility graph (DMG). Then, we can design graph convolution network (GCN)-based cross-graph learning, where two different GCN models for TMG and DMG are jointly trained by feeding different input features created by WiFi RTTs yet sharing their weights. Besides, the loss function includes a mobility regularization term such that the differences between adjacent location estimates should be less variant due to the user's stable moving pace. Noting that the regularization term does not require ground-truth location, MINGLE can be designed under semi- and self-supervised learning frameworks. The proposed MINGLE's effectiveness is extensively verified through field experiments, showing a better positioning accuracy than benchmarks, say root mean square errors (RMSEs) being 1.398 (m) and 1.073 (m) for self- and semi-supervised learning cases, respectively.

A Simple and Powerful Framework for Stable Dynamic Network Embedding

  • paper_url: http://arxiv.org/abs/2311.09251
  • repo_url: https://github.com/edwarddavis1/universal_dynamic_embedding_with_testing
  • paper_authors: Ed Davis, Ian Gallagher, Daniel John Lawson, Patrick Rubin-Delanchy
  • for: This paper addresses dynamic network embedding, i.e., representing the nodes of a dynamic network as evolving vectors in a low-dimensional space; static network embedding is well established, whereas dynamic network embedding is comparatively in its infancy.
  • methods: The authors apply a wide class of established static embedding methods to the dilated unfolded adjacency matrix to produce interpretable dynamic embeddings, and prove that, regardless of embedding dimension, these unfolded methods yield stable embeddings: nodes with identical latent behaviour are exchangeable regardless of their position in time or space (an unfolded spectral-embedding sketch follows the abstract).
  • results: Using a hypothesis-testing framework that tests for planted structure in simulated networks, unstable methods are shown to be either conservative or to encode incorrect structure even in trivial cases, whereas the stable unfolded methods are both more interpretable and more powerful.
    Abstract In this paper, we address the problem of dynamic network embedding, that is, representing the nodes of a dynamic network as evolving vectors within a low-dimensional space. While the field of static network embedding is wide and established, the field of dynamic network embedding is comparatively in its infancy. We propose that a wide class of established static network embedding methods can be used to produce interpretable and powerful dynamic network embeddings when they are applied to the dilated unfolded adjacency matrix. We provide a theoretical guarantee that, regardless of embedding dimension, these unfolded methods will produce stable embeddings, meaning that nodes with identical latent behaviour will be exchangeable, regardless of their position in time or space. We additionally define a hypothesis testing framework which can be used to evaluate the quality of a dynamic network embedding by testing for planted structure in simulated networks. Using this, we demonstrate that, even in trivial cases, unstable methods are often either conservative or encode incorrect structure. In contrast, we demonstrate that our suite of stable unfolded methods are not only more interpretable but also more powerful in comparison to their unstable counterparts.
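As a rough illustration of the "unfolded" idea, the sketch below column-concatenates the adjacency snapshots into one unfolded matrix and applies a truncated SVD, giving one time-invariant ("anchor") embedding plus one embedding per time point. This plain unfolding-plus-SVD construction is a generic unfolded adjacency spectral embedding under our own assumptions and may differ in detail from the paper's dilated variant.

```python
import numpy as np

def unfolded_spectral_embedding(adj_list, d=4):
    """adj_list: list of T adjacency matrices (n x n) over the same node set.

    Returns (anchor, dynamic) where anchor is (n, d) and dynamic is (T, n, d):
    the left singular vectors give one shared node representation, and the right
    singular vectors, split by time, give an evolving representation per snapshot.
    """
    T = len(adj_list)
    n = adj_list[0].shape[0]
    unfolded = np.hstack(adj_list)                    # (n, n*T) unfolded adjacency
    U, S, Vt = np.linalg.svd(unfolded, full_matrices=False)
    scale = np.sqrt(S[:d])
    anchor = U[:, :d] * scale                         # time-invariant embedding
    dynamic = (Vt[:d].T * scale).reshape(T, n, d)     # per-time embeddings
    return anchor, dynamic

# Toy dynamic network: 30 nodes, 5 snapshots of a sparse random graph.
rng = np.random.default_rng(0)
snaps = [(rng.random((30, 30)) < 0.1).astype(float) for _ in range(5)]
snaps = [np.triu(A, 1) + np.triu(A, 1).T for A in snaps]   # symmetrize, no self-loops
anchor, dyn = unfolded_spectral_embedding(snaps)
print(anchor.shape, dyn.shape)                        # (30, 4) (5, 30, 4)
```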

Counterfactual Explanation for Regression via Disentanglement in Latent Space

  • paper_url: http://arxiv.org/abs/2311.08228
  • repo_url: None
  • paper_authors: Xuan Zhao, Klaus Broelemann, Gjergji Kasneci
  • for: This work proposes a new method for generating counterfactual explanations (CEs) to help users understand and act on the predictions of AI systems, focusing on regression rather than classification.
  • methods: The method first disentangles label-relevant from label-irrelevant dimensions in the latent space, then generates CEs by combining the label-irrelevant dimensions with the predefined output; the intuition is that an ideal counterfactual search should preserve the label-irrelevant characteristics of the input and suggest changes only toward target-relevant characteristics, which searching in the latent space facilitates.
  • results: On image and tabular datasets in regression settings, the method is competitive on several quality measures and returns results closer to the original data manifold than three state-of-the-art methods, which is essential for realistic high-dimensional applications; the code will be released as an open-source package upon publication.
    Abstract Counterfactual Explanations (CEs) help address the question: How can the factors that influence the prediction of a predictive model be changed to achieve a more favorable outcome from a user's perspective? Thus, they bear the potential to guide the user's interaction with AI systems since they represent easy-to-understand explanations. To be applicable, CEs need to be realistic and actionable. In the literature, various methods have been proposed to generate CEs. However, the majority of research on CEs focuses on classification problems where questions like ``What should I do to get my rejected loan approved?" are raised. In practice, answering questions like ``What should I do to increase my salary?" are of a more regressive nature. In this paper, we introduce a novel method to generate CEs for a pre-trained regressor by first disentangling the label-relevant from the label-irrelevant dimensions in the latent space. CEs are then generated by combining the label-irrelevant dimensions and the predefined output. The intuition behind this approach is that the ideal counterfactual search should focus on the label-irrelevant characteristics of the input and suggest changes toward target-relevant characteristics. Searching in the latent space could help achieve this goal. We show that our method maintains the characteristics of the query sample during the counterfactual search. In various experiments, we demonstrate that the proposed method is competitive based on different quality measures on image and tabular datasets in regression problem settings. It efficiently returns results closer to the original data manifold compared to three state-of-the-art methods, which is essential for realistic high-dimensional machine learning applications. Our code will be made available as an open-source package upon the publication of this work.

Federated Skewed Label Learning with Logits Fusion

  • paper_url: http://arxiv.org/abs/2311.08202
  • repo_url: None
  • paper_authors: Yuwei Wang, Runhan Li, Hao Tan, Xuefeng Jiang, Sheng Sun, Min Liu, Bo Gao, Zhiyuan Wu
  • for: Address the label distribution skew challenge in federated learning (FL), where data label categories are imbalanced on each client.
  • methods: Proposes FedBalance, which calibrates local models' logits to correct the optimization bias caused by data heterogeneity; an extra private weak learner on the client side forms an ensemble with the local model, and fusing the logits of the two models captures the variance of different data regardless of category (a logit-fusion sketch follows the abstract).
  • results: The method achieves 13% higher average accuracy than state-of-the-art methods.
    Abstract Federated learning (FL) aims to collaboratively train a shared model across multiple clients without transmitting their local data. Data heterogeneity is a critical challenge in realistic FL settings, as it causes significant performance deterioration due to discrepancies in optimization among local models. In this work, we focus on label distribution skew, a common scenario in data heterogeneity, where the data label categories are imbalanced on each client. To address this issue, we propose FedBalance, which corrects the optimization bias among local models by calibrating their logits. Specifically, we introduce an extra private weak learner on the client side, which forms an ensemble model with the local model. By fusing the logits of the two models, the private weak learner can capture the variance of different data, regardless of their category. Therefore, the optimization direction of local models can be improved by increasing the penalty for misclassifying minority classes and reducing the attention to majority classes, resulting in a better global model. Extensive experiments show that our method can gain 13\% higher average accuracy compared with state-of-the-art methods.
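The logit-fusion step can be pictured with a tiny PyTorch snippet: the local model's logits and the private weak learner's logits are combined (a simple weighted average here) before the loss, so the gradient reflects both views of the data. The fusion rule and weighting below are placeholders, not the exact FedBalance formulation.

```python
import torch
import torch.nn as nn

class FusedClient(nn.Module):
    """Local model + private weak learner; their logits are fused before the loss."""
    def __init__(self, in_dim, n_classes, alpha=0.5):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                   nn.Linear(64, n_classes))
        self.weak = nn.Linear(in_dim, n_classes)   # small private learner kept on-client
        self.alpha = alpha                          # fusion weight (placeholder choice)

    def forward(self, x):
        return self.alpha * self.local(x) + (1 - self.alpha) * self.weak(x)

client = FusedClient(in_dim=20, n_classes=5)
x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
loss = nn.CrossEntropyLoss()(client(x), y)
loss.backward()                                     # gradients flow through both models
print(float(loss))
```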

Modeling Complex Disease Trajectories using Deep Generative Models with Semi-Supervised Latent Processes

  • paper_url: http://arxiv.org/abs/2311.08149
  • repo_url: https://github.com/uzh-dqbm-cmi/eustar_dgm4h
  • paper_authors: Cécile Trottet, Manuel Schürch, Ahmed Allam, Imon Barua, Liubov Petelytska, Oliver Distler, Anna-Maria Hoffmann-Vold, Michael Krauthammer, the EUSTAR collaborators
  • for: Model and holistically analyze complex disease trajectories by finding meaningful temporal latent representations.
  • methods: A deep generative time-series model explains observed disease trajectories through latent temporal processes, and a semi-supervised approach disentangles the latent space using established medical concepts to enhance interpretability.
  • results: The learned temporal latent processes support data analysis and clinical hypothesis testing, including finding similar patients and clustering the disease into new sub-types, and enable personalized online monitoring and prediction of multivariate time series with uncertainty quantification; effectiveness is demonstrated on systemic sclerosis.
    Abstract In this paper, we propose a deep generative time series approach using latent temporal processes for modeling and holistically analyzing complex disease trajectories. We aim to find meaningful temporal latent representations of an underlying generative process that explain the observed disease trajectories in an interpretable and comprehensive way. To enhance the interpretability of these latent temporal processes, we develop a semi-supervised approach for disentangling the latent space using established medical concepts. By combining the generative approach with medical knowledge, we leverage the ability to discover novel aspects of the disease while integrating medical concepts into the model. We show that the learned temporal latent processes can be utilized for further data analysis and clinical hypothesis testing, including finding similar patients and clustering the disease into new sub-types. Moreover, our method enables personalized online monitoring and prediction of multivariate time series including uncertainty quantification. We demonstrate the effectiveness of our approach in modeling systemic sclerosis, showcasing the potential of our machine learning model to capture complex disease trajectories and acquire new medical knowledge.

Lite it fly: An All-Deformable-Butterfly Network

  • paper_url: http://arxiv.org/abs/2311.08125
  • repo_url: None
  • paper_authors: Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran, Ngai Wong
  • for: This paper proposes a neural-network compression approach based on the deformable butterfly (DeBut) decomposition, reducing parameters and computation while preserving performance.
  • methods: DeBut factorizes weight matrices into generalized, butterfly-like factors; this work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions, and develops an automated DeBut chain generator that homogenizes a DNN into all-DeBut layers.
  • results: Experiments and hardware benchmarks confirm the advantages of All-DeBut networks; for example, a PointNet can be compressed to < 5% of its parameters with < 5% accuracy drop, a record not achievable by other compression schemes.
    Abstract Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers, wherein the linear transform can be cast as the product between a filter matrix and a data matrix obtained by arranging feature tensors into columns. The lately proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterflylike factors, thus achieving network compression orthogonal to the traditional ways of pruning or low-rank decomposition. This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions, which explains the empirically good performance of DeBut layers. By developing an automated DeBut chain generator, we show for the first time the viability of homogenizing a DNN into all DeBut layers, thus achieving an extreme sparsity and compression. Various examples and hardware benchmarks verify the advantages of All-DeBut networks. In particular, we show it is possible to compress a PointNet to < 5% parameters with < 5% accuracy drop, a record not achievable by other compression schemes.

Understanding learning from EEG data: Combining machine learning and feature engineering based on hidden Markov models and mixed models

  • paper_url: http://arxiv.org/abs/2311.08113
  • repo_url: None
  • paper_authors: Gabriel Rodrigues Palma, Conor Thornberry, Seán Commins, Rafael de Andrade Moral
  • for: This work studies frontal theta oscillations (4-8 Hz), which play a significant role in spatial learning and memory during navigation tasks.
  • methods: Hidden Markov models and linear mixed-effects models are used to engineer features from frontal theta EEG data, which are then fed to six machine learning classifiers (a feature-extraction sketch follows the abstract).
  • results: Standardizing the theta EEG data and using deep neural networks gives the best classification of learner versus non-learner participants; only deep neural networks exceed an area under the ROC curve of 80% using the theta EEG data alone, whereas more of the methods classify well when coordinate-based features (idle time, average speed) are used.
    Abstract Theta oscillations, ranging from 4-8 Hz, play a significant role in spatial learning and memory functions during navigation tasks. Frontal theta oscillations are thought to play an important role in spatial navigation and memory. Electroencephalography (EEG) datasets are very complex, making any changes in the neural signal related to behaviour difficult to interpret. However, multiple analytical methods are available to examine complex data structure, especially machine learning based techniques. These methods have shown high classification performance and the combination with feature engineering enhances the capability of these methods. This paper proposes using hidden Markov and linear mixed effects models to extract features from EEG data. Based on the engineered features obtained from frontal theta EEG data during a spatial navigation task in two key trials (first, last) and between two conditions (learner and non-learner), we analysed the performance of six machine learning methods (Polynomial Support Vector Machines, Non-linear Support Vector Machines, Random Forests, K-Nearest Neighbours, Ridge, and Deep Neural Networks) on classifying learner and non-learner participants. We also analysed how different standardisation methods used to pre-process the EEG data contribute to classification performance. We compared the classification performance of each trial with data gathered from the same subjects, including solely coordinate-based features, such as idle time and average speed. We found that more machine learning methods perform better classification using coordinate-based data. However, only deep neural networks achieved an area under the ROC curve higher than 80% using the theta EEG data alone. Our findings suggest that standardising the theta EEG data and using deep neural networks enhances the classification of learner and non-learner subjects in a spatial learning task.
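One generic way to turn a continuous theta-power trace into tabular features, in the spirit of the hidden-Markov step described above, is to fit a Gaussian HMM and summarize the decoded state sequence per trial (e.g., state occupancy fractions and average log-likelihood). The sketch uses the hmmlearn package and synthetic signals; the paper's actual feature set and its mixed-model stage are not reproduced here.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def hmm_trial_features(signal, n_states=3, seed=0):
    """Fit a Gaussian HMM to one trial's theta-power trace (1D) and return
    simple summary features: per-state occupancy fractions + mean log-likelihood."""
    X = signal.reshape(-1, 1)
    model = GaussianHMM(n_components=n_states, covariance_type="diag",
                        n_iter=50, random_state=seed)
    model.fit(X)
    states = model.predict(X)
    occupancy = np.bincount(states, minlength=n_states) / len(states)
    return np.concatenate([occupancy, [model.score(X) / len(X)]])

# Synthetic stand-in for a frontal theta-power trace with two regimes.
rng = np.random.default_rng(0)
trace = np.concatenate([rng.normal(1.0, 0.2, 300), rng.normal(2.0, 0.3, 300)])
print(hmm_trial_features(trace))
```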

Evolutionary-enhanced quantum supervised learning model

  • paper_url: http://arxiv.org/abs/2311.08081
  • repo_url: None
  • paper_authors: Anton Simen Albino, Rodrigo Bloot, Otto M. Pires, Erick G. S. Nascimento
  • for: Improve the efficiency and accuracy of supervised learning on NISQ devices.
  • methods: Instead of a parametrized ansatz, the model uses quantum circuits with variable topology that evolve through an elitist evolutionary method, together with a novel superposition of multi-hot encodings for multi-classification problems, mitigating the barren plateau problem.
  • results: Comparative experiments show that the evolutionary-enhanced model avoids barren plateaus, improving accuracy and training efficiency over state-of-the-art variational quantum classifiers, and performs well on a dataset class that is traditionally problematic for conventional kernel machines.
    Abstract Quantum supervised learning, utilizing variational circuits, stands out as a promising technology for NISQ devices due to its efficiency in hardware resource utilization during the creation of quantum feature maps and the implementation of hardware-efficient ansatz with trainable parameters. Despite these advantages, the training of quantum models encounters challenges, notably the barren plateau phenomenon, leading to stagnation in learning during optimization iterations. This study proposes an innovative approach: an evolutionary-enhanced ansatz-free supervised learning model. In contrast to parametrized circuits, our model employs circuits with variable topology that evolves through an elitist method, mitigating the barren plateau issue. Additionally, we introduce a novel concept, the superposition of multi-hot encodings, facilitating the treatment of multi-classification problems. Our framework successfully avoids barren plateaus, resulting in enhanced model accuracy. Comparative analysis with variational quantum classifiers from the technology's state-of-the-art reveal a substantial improvement in training efficiency and precision. Furthermore, we conduct tests on a challenging dataset class, traditionally problematic for conventional kernel machines, demonstrating a potential alternative path for achieving quantum advantage in supervised learning for NISQ era.

Communication-Constrained Bayesian Active Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2311.08053
  • repo_url: None
  • paper_authors: Victor Croisfelt, Shashi Raj Pandey, Osvaldo Simeone, Petar Popovski
  • for: Addresses active learning with a remote teacher over a constrained communication channel: which batch of unlabeled inputs should be sent, and how should it be encoded, so that the most useful information is acquired in the fewest communication rounds.
  • methods: Introduces Communication-Constrained Bayesian Active Knowledge Distillation (CC-BAKD), which selects batches by their epistemic uncertainty (addressing the "confirmation bias" that inflates the number of rounds) and compresses them via a linear mix-up mechanism integrated with the selection process (a toy selection-and-mix-up sketch follows the abstract).
  • results: The combined design reduces both the communication overhead per round and the number of required communication rounds.
    Abstract Consider an active learning setting in which a learner has a training set with few labeled examples and a pool set with many unlabeled inputs, while a remote teacher has a pre-trained model that is known to perform well for the learner's task. The learner actively transmits batches of unlabeled inputs to the teacher through a constrained communication channel for labeling. This paper addresses the following key questions: (i) Active batch selection: Which batch of inputs should be sent to the teacher to acquire the most useful information and thus reduce the number of required communication rounds? (ii) Batch encoding: How do we encode the batch of inputs for transmission to the teacher to reduce the communication resources required at each round? We introduce Communication-Constrained Bayesian Active Knowledge Distillation (CC-BAKD), a novel protocol that integrates Bayesian active learning with compression via a linear mix-up mechanism. Bayesian active learning selects the batch of inputs based on their epistemic uncertainty, addressing the "confirmation bias" that is known to increase the number of required communication rounds. Furthermore, the proposed mix-up compression strategy is integrated with the epistemic uncertainty-based active batch selection process to reduce the communication overhead per communication round.
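A toy rendering of the two ingredients is sketched below: candidate inputs are scored with an ensemble-based epistemic-uncertainty proxy (BALD-style mutual information), the top-scoring batch is selected, and the batch is compressed into fewer transmitted vectors by linear mix-up (random convex combinations). The scoring proxy and the mix-up coefficients are our own simplifications, not the CC-BAKD protocol itself.

```python
import numpy as np

def epistemic_scores(member_probs):
    """BALD-style mutual information per input from ensemble predictions.
    member_probs: (n_members, n_inputs, n_classes)."""
    eps = 1e-12
    mean = member_probs.mean(axis=0)
    total = -np.sum(mean * np.log(mean + eps), axis=-1)
    aleatoric = -np.sum(member_probs * np.log(member_probs + eps), axis=-1).mean(axis=0)
    return total - aleatoric

def select_and_mixup(pool_X, member_probs, batch_size=8, n_transmit=2, seed=0):
    """Pick the most uncertain batch, then compress it into n_transmit mixed-up vectors."""
    rng = np.random.default_rng(seed)
    batch_idx = np.argsort(-epistemic_scores(member_probs))[:batch_size]
    batch = pool_X[batch_idx]                                   # (batch_size, d)
    mix = rng.dirichlet(np.ones(batch_size), size=n_transmit)   # convex mixing weights
    return batch_idx, mix @ batch                               # (n_transmit, d) to transmit

# Toy pool and a 5-member ensemble's softmax outputs over 3 classes.
rng = np.random.default_rng(1)
pool = rng.normal(size=(100, 10))
logits = rng.normal(size=(5, 100, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
idx, compressed = select_and_mixup(pool, probs, batch_size=8, n_transmit=2)
print(idx, compressed.shape)
```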

Velocity-Based Channel Charting with Spatial Distribution Map Matching

  • paper_url: http://arxiv.org/abs/2311.08016
  • repo_url: None
  • paper_authors: Maximilian Stahlke, George Yammine, Tobias Feigl, Bjoern M. Eskofier, Christopher Mutschler
  • for: The paper targets improved positioning performance in challenging, non-line-of-sight (NLoS) dominated indoor environments, where fingerprint-based localization is typically used.
  • methods: As an alternative, the paper proposes a novel channel-charting framework that avoids the labeling effort required by fingerprinting models; velocity information and topological map information are used to transform the channel charts into real coordinates.
  • results: Experiments on two real-world datasets using 5G and distributed single-input/multiple-output (SIMO) radio systems show that the proposed approach achieves similar position accuracies even with noisy velocity estimates and coarse map information.
    Abstract Fingerprint-based localization improves the positioning performance in challenging, non-line-of-sight (NLoS) dominated indoor environments. However, fingerprinting models require an expensive life-cycle management including recording and labeling of radio signals for the initial training and regularly at environmental changes. Alternatively, channel-charting avoids this labeling effort as it implicitly associates relative coordinates to the recorded radio signals. Then, with reference real-world coordinates (positions) we can use such charts for positioning tasks. However, current channel-charting approaches lag behind fingerprinting in their positioning accuracy and still require reference samples for localization, regular data recording and labeling to keep the models up to date. Hence, we propose a novel framework that does not require reference positions. We only require information from velocity information, e.g., from pedestrian dead reckoning or odometry to model the channel charts, and topological map information, e.g., a building floor plan, to transform the channel charts into real coordinates. We evaluate our approach on two different real-world datasets using 5G and distributed single-input/multiple-output system (SIMO) radio systems. Our experiments show that even with noisy velocity estimates and coarse map information, we achieve similar position accuracies
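As a toy illustration of how velocity information can pin down the metric scale of an otherwise relative channel chart, the sketch below fits a single least-squares scale factor from per-step speeds; the paper's actual framework additionally performs topological map matching, which is omitted here.

```python
import numpy as np

def scale_chart_with_velocity(chart_xy, speeds, dt):
    """Estimate a global metric scale for a relative channel chart so that
    per-step chart displacements match measured speeds (e.g. from pedestrian
    dead reckoning).  chart_xy: (T, 2) chart coordinates, speeds: (T,) in m/s."""
    chart_step = np.linalg.norm(np.diff(chart_xy, axis=0), axis=1)   # chart units per step
    metric_step = speeds[:-1] * dt                                   # metres per step
    scale = (chart_step @ metric_step) / (chart_step @ chart_step)   # 1-D least squares
    return scale * chart_xy
```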

Out-of-Distribution Knowledge Distillation via Confidence Amendment

  • paper_url: http://arxiv.org/abs/2311.07975
  • repo_url: https://github.com/lawliet-zzl/ca
  • paper_authors: Zhilin Zhao, Longbing Cao, Yixuan Zhang
  • for: The paper proposes an out-of-distribution (OOD) detection approach built on a standard network, aimed at ensuring network robustness and reliability.
  • methods: The paper introduces an OOD knowledge distillation framework that is applicable whether or not in-distribution (ID) training data is available. The framework harnesses the standard network's OOD-sensitive knowledge to craft a binary classifier that distinguishes ID from OOD samples. To accomplish this, it proposes Confidence Amendment (CA), which transforms an OOD sample into an ID one while progressively amending the prediction confidence derived from the standard network.
  • results: Extensive experiments spanning various datasets and network architectures confirm the efficacy of the proposed method in detecting OOD samples.
    Abstract Out-of-distribution (OOD) detection is essential in identifying test samples that deviate from the in-distribution (ID) data upon which a standard network is trained, ensuring network robustness and reliability. This paper introduces OOD knowledge distillation, a pioneering learning framework applicable whether or not training ID data is available, given a standard network. This framework harnesses OOD-sensitive knowledge from the standard network to craft a binary classifier adept at distinguishing between ID and OOD samples. To accomplish this, we introduce Confidence Amendment (CA), an innovative methodology that transforms an OOD sample into an ID one while progressively amending prediction confidence derived from the standard network. This approach enables the simultaneous synthesis of both ID and OOD samples, each accompanied by an adjusted prediction confidence, thereby facilitating the training of a binary classifier sensitive to OOD. Theoretical analysis provides bounds on the generalization error of the binary classifier, demonstrating the pivotal role of confidence amendment in enhancing OOD sensitivity. Extensive experiments spanning various datasets and network architectures confirm the efficacy of the proposed method in detecting OOD samples.
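A minimal sketch of the Confidence Amendment idea follows; the linear interpolation path and the linear confidence schedule are assumptions made for illustration, not the paper's exact transformation. The generated (sample, confidence) pairs are the kind of supervision one could use to train the OOD-sensitive binary classifier.

```python
import numpy as np

def confidence_amendment(x_ood, x_id_anchor, id_confidence, steps=10):
    """Morph an OOD sample toward an ID anchor and pair each intermediate
    sample with a progressively amended confidence label."""
    samples, confidences = [], []
    for t in np.linspace(0.0, 1.0, steps):
        samples.append((1.0 - t) * x_ood + t * x_id_anchor)       # OOD -> ID path
        confidences.append((1.0 - t) * 0.5 + t * id_confidence)   # amended confidence
    return np.stack(samples), np.array(confidences)
```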

Higher-Order Expander Graph Propagation

  • paper_url: http://arxiv.org/abs/2311.07966
  • repo_url: None
  • paper_authors: Thomas Christie, Yu He
  • for: The paper aims to alleviate the over-squashing problem in graph message passing.
  • methods: The paper proposes two constructions of bipartite expander graphs that allow message passing to capture higher-order correlations.
  • results: Experiments on synthetic and real-world datasets show that propagating over higher-order expander graphs improves performance and better captures higher-order structure in complex data.
    Abstract Graph neural networks operate on graph-structured data via exchanging messages along edges. One limitation of this message passing paradigm is the over-squashing problem. Over-squashing occurs when messages from a node's expanded receptive field are compressed into fixed-size vectors, potentially causing information loss. To address this issue, recent works have explored using expander graphs, which are highly-connected sparse graphs with low diameters, to perform message passing. However, current methods on expander graph propagation only consider pair-wise interactions, ignoring higher-order structures in complex data. To explore the benefits of capturing these higher-order correlations while still leveraging expander graphs, we introduce higher-order expander graph propagation. We propose two methods for constructing bipartite expanders and evaluate their performance on both synthetic and real-world datasets.
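For intuition, higher-order structure can be propagated through a bipartite graph of nodes and hyperedges; the degree-normalised mean aggregation below is an illustrative choice and not one of the expander constructions proposed in the paper.

```python
import numpy as np

def bipartite_propagate(x, incidence):
    """One round of message passing over a bipartite (node, hyperedge) graph:
    nodes -> hyperedges -> nodes.  x: (n_nodes, d) features,
    incidence: (n_nodes, n_edges) 0/1 membership matrix."""
    node_deg = incidence.sum(axis=1, keepdims=True).clip(min=1)
    edge_deg = incidence.sum(axis=0, keepdims=True).clip(min=1)
    edge_msg = (incidence / edge_deg).T @ x      # aggregate member nodes into each hyperedge
    return (incidence / node_deg) @ edge_msg     # scatter hyperedge messages back to nodes
```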

Language Models are Better Bug Detector Through Code-Pair Classification

  • paper_url: http://arxiv.org/abs/2311.07957
  • repo_url: https://github.com/kamel773/code_pair_classification
  • paper_authors: Kamel Alrashedy
  • for: This paper is written for researchers and developers who are interested in using large language models (LLMs) for code generation and understanding, and who want to explore alternative methods for fine-tuning these models that do not require a large labeled dataset.
  • methods: The paper proposes a code-pair classification task as an alternative to fine-tuning LLMs for bug detection and repair. In this task, both the buggy and non-buggy versions of the code are given to the model, and the model identifies the buggy ones.
  • results: The paper evaluates the code-pair classification task in a real-world dataset of bug detection and uses two of the most powerful LLMs. The results show that the model can often pick the buggy from the non-buggy version of the code, and that the code-pair classification task is much easier compared to the traditional method of being given a snippet and deciding if and where a bug exists.
    Abstract Large language models (LLMs) such as GPT-3.5 and CodeLlama are powerful models for code generation and understanding. Fine-tuning these models comes with a high computational cost and requires a large labeled dataset. Alternatively, in-context learning techniques allow models to learn downstream tasks with only a few examples. Recently, researchers have shown how in-context learning performs well in bug detection and repair. In this paper, we propose a code-pair classification task in which both the buggy and non-buggy versions are given to the model, and the model identifies the buggy ones. We evaluate our task on a real-world bug-detection dataset with two of the most powerful LLMs. Our experiments indicate that an LLM can often pick the buggy from the non-buggy version of the code, and that the code-pair classification task is much easier than being given a snippet and deciding if and where a bug exists.
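To make the task format concrete, here is a hypothetical prompt template for code-pair classification; the exact prompt used in the paper is not reproduced here, so the wording and helper name are illustrative only.

```python
PROMPT = """You are given two versions of the same function, A and B.
Exactly one of them contains a bug. Answer with "A" or "B" only.

Version A:
{code_a}

Version B:
{code_b}

Which version is buggy?"""

def build_code_pair_prompt(buggy: str, fixed: str, buggy_first: bool) -> str:
    """Randomising the order of the pair avoids the model learning a positional shortcut."""
    a, b = (buggy, fixed) if buggy_first else (fixed, buggy)
    return PROMPT.format(code_a=a, code_b=b)
```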

A Fast and Simple Algorithm for computing the MLE of Amplitude Density Function Parameters

  • paper_url: http://arxiv.org/abs/2311.07951
  • repo_url: None
  • paper_authors: Mahdi Teimouri
  • for: The paper proposes a fast and accurate method for estimating the parameters of the amplitude distribution.
  • methods: Two simple transformations project the amplitude data onto the horizontal and vertical axes; the projected data are shown to follow a zero-location symmetric α-stable distribution, for which the MLE can be computed quickly, and the estimates from the two projections are averaged.
  • results: A simulation study and the analysis of two sets of real radar data demonstrate that the proposed method estimates the amplitude distribution parameters accurately.
    Abstract Over the last decades, the family of $\alpha$-stable distributions has proven to be useful for modelling in telecommunication systems. Particularly, in the case of radar applications, finding a fast and accurate estimation of the amplitude density function parameters appears to be very important. In this work, the maximum likelihood estimator (MLE) is proposed for the parameters of the amplitude distribution. To do this, the amplitude data are projected on the horizontal and vertical axes using two simple transformations. It is proved that the projected data follow a zero-location symmetric $\alpha$-stable distribution for which the MLE can be computed quite fast. The average of the computed MLEs based on the two projections is taken as the estimator for the parameters of the amplitude distribution. The performance of the proposed projection method is demonstrated through a simulation study and the analysis of two sets of real radar data.
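The projection idea can be sketched as follows. Pairing each amplitude with an independent uniform phase and falling back to scipy's generic (and slow) maximum-likelihood fit are assumptions made for illustration; the paper's contribution is a fast dedicated estimator for the resulting zero-location symmetric α-stable projections.

```python
import numpy as np
from scipy.stats import levy_stable

def amplitude_projection_mle(amplitude, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=amplitude.shape)
    x_proj = amplitude * np.cos(phase)   # projection onto the horizontal axis
    y_proj = amplitude * np.sin(phase)   # projection onto the vertical axis
    # Generic ML fit returns (alpha, beta, loc, scale); beta and loc should be near 0.
    fits = [levy_stable.fit(p) for p in (x_proj, y_proj)]
    alpha = float(np.mean([f[0] for f in fits]))
    scale = float(np.mean([f[3] for f in fits]))
    return alpha, scale
```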

Finding Inductive Loop Invariants using Large Language Models

  • paper_url: http://arxiv.org/abs/2311.07948
  • repo_url: None
  • paper_authors: Adharsh Kamath, Aditya Senthilnathan, Saikat Chakraborty, Pantazis Deligiannis, Shuvendu K. Lahiri, Akash Lal, Aseem Rastogi, Subhajit Roy, Rahul Sharma
  • for: The paper investigates whether large language models (LLMs) can offer a new solution to automated program verification.
  • methods: The authors first curate a dataset of verification problems on programs with loops, then design a prompt for obtaining inductive loop invariants from LLMs; the resulting invariants are checked for correctness with a sound symbolic tool, and an efficient combination of the symbolic tool and the LLM is compared against a purely symbolic baseline.
  • results: The results demonstrate that LLMs can help improve the state of the art in automated program verification.
    Abstract Loop invariants are fundamental to reasoning about programs with loops. They establish properties about a given loop's behavior. When they additionally are inductive, they become useful for the task of formal verification that seeks to establish strong mathematical guarantees about program's runtime behavior. The inductiveness ensures that the invariants can be checked locally without consulting the entire program, thus are indispensable artifacts in a formal proof of correctness. Finding inductive loop invariants is an undecidable problem, and despite a long history of research towards practical solutions, it remains far from a solved problem. This paper investigates the capabilities of the Large Language Models (LLMs) in offering a new solution towards this old, yet important problem. To that end, we first curate a dataset of verification problems on programs with loops. Next, we design a prompt for exploiting LLMs, obtaining inductive loop invariants, that are checked for correctness using sound symbolic tools. Finally, we explore the effectiveness of using an efficient combination of a symbolic tool and an LLM on our dataset and compare it against a purely symbolic baseline. Our results demonstrate that LLMs can help improve the state-of-the-art in automated program verification.
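As a concrete illustration of the "check with a sound symbolic tool" step, the snippet below verifies a hand-written candidate invariant for a toy loop using Z3's Python bindings (pip install z3-solver); the loop, the invariant, and the encoding are illustrative and are not taken from the paper's benchmark.

```python
# Program: x = 0; while x < n: x += 1; assert x == n   (assuming n >= 0).
# Candidate invariant "x <= n" plays the role of an LLM-proposed invariant.
from z3 import Int, Implies, And, Not, Solver, sat

x, n, x1 = Int("x"), Int("n"), Int("x1")
inv = lambda v: v <= n

def holds(formula):
    s = Solver()
    s.add(Not(formula))                      # valid iff the negation is unsatisfiable
    return s.check() != sat

initiation  = holds(Implies(And(n >= 0, x == 0), inv(x)))
consecution = holds(Implies(And(inv(x), x < n, x1 == x + 1), inv(x1)))
safety      = holds(Implies(And(inv(x), Not(x < n)), x == n))
print(initiation, consecution, safety)       # all True for this invariant
```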

Clinical Characteristics and Laboratory Biomarkers in ICU-admitted Septic Patients with and without Bacteremia

  • paper_url: http://arxiv.org/abs/2311.08433
  • repo_url: None
  • paper_authors: Sangwon Baek, Seung Jun Lee
  • for: The study evaluates the predictive power of laboratory biomarkers in order to build an optimized model for predicting bacteremia among septic ICU patients.
  • methods: A retrospective cross-sectional study in which laboratory markers were first analyzed independently to identify significant predictors, which were then used to build a multivariable logistic regression (MLR) model.
  • results: Combining PCT, bilirubin, NLR, platelets, lactic acid, ESR, and GCS score improved the model's accuracy (AUC = 0.907, 95% CI 0.843-0.956), and survival analysis revealed a strong association between bacteremia and mortality (0.004).
    Abstract Few studies have investigated the diagnostic utilities of biomarkers for predicting bacteremia among septic patients admitted to intensive care units (ICU). Therefore, this study evaluated the prediction power of laboratory biomarkers to utilize those markers with high performance to optimize the predictive model for bacteremia. This retrospective cross-sectional study was conducted at the ICU department of Gyeongsang National University Changwon Hospital in 2019. Adult patients qualifying SEPSIS-3 (increase in sequential organ failure score greater than or equal to 2) criteria with at least two sets of blood culture were selected. Collected data was initially analyzed independently to identify the significant predictors, which was then used to build the multivariable logistic regression (MLR) model. A total of 218 patients with 48 cases of true bacteremia were analyzed in this research. Both CRP and PCT showed a substantial area under the curve (AUC) value for discriminating bacteremia among septic patients (0.757 and 0.845, respectively). To further enhance the predictive accuracy, we combined PCT, bilirubin, neutrophil lymphocyte ratio (NLR), platelets, lactic acid, erythrocyte sedimentation rate (ESR), and Glasgow Coma Scale (GCS) score to build the predictive model with an AUC of 0.907 (95% CI, 0.843 to 0.956). In addition, a high association between bacteremia and mortality rate was discovered through the survival analysis (0.004). While PCT is certainly a useful index for distinguishing patients with and without bacteremia by itself, our MLR model indicates that the accuracy of bacteremia prediction substantially improves by the combined use of PCT, bilirubin, NLR, platelets, lactic acid, ESR, and GCS score.
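A minimal sketch of the combined multivariable logistic regression model is shown below; the column names and the train/test split are assumptions about how such a dataset might be organised, not the study's actual analysis pipeline.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

FEATURES = ["PCT", "bilirubin", "NLR", "platelets", "lactic_acid", "ESR", "GCS"]

def fit_bacteremia_model(df: pd.DataFrame):
    X, y = df[FEATURES], df["bacteremia"]          # y: 1 = true bacteremia, 0 = none
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return model, auc
```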

Discretized Distributed Optimization over Dynamic Digraphs

  • paper_url: http://arxiv.org/abs/2311.07939
  • repo_url: None
  • paper_authors: Mohammadreza Doostmohammadian, Wei Jiang, Muwahida Liaquat, Alireza Aghasi, Houman Zarrabi
  • for: The paper addresses distributed optimization, specifically distributed learning over dynamic directed graphs (digraphs).
  • methods: A discrete-time distributed optimization algorithm that operates over general strongly connected dynamic networks under switching topologies.
  • results: The proposed method removes the need for stochastic (bi-stochastic) weight design on the links, and its convergence and dynamic stability are established.
    Abstract We consider a discrete-time model of continuous-time distributed optimization over dynamic directed-graphs (digraphs) with applications to distributed learning. Our optimization algorithm works over general strongly connected dynamic networks under switching topologies, e.g., in mobile multi-agent systems and volatile networks due to link failures. Compared to many existing lines of work, there is no need for bi-stochastic weight designs on the links. The existing literature mostly needs the link weights to be stochastic using specific weight-design algorithms needed both at the initialization and at all times when the topology of the network changes. This paper eliminates the need for such algorithms and paves the way for distributed optimization over time-varying digraphs. We derive the bound on the gradient-tracking step-size and discrete time-step for convergence and prove dynamic stability using arguments from consensus algorithms, matrix perturbation theory, and Lyapunov theory. This work, particularly, is an improvement over existing stochastic-weight undirected networks in case of link removal or packet drops. This is because the existing literature may need to rerun time-consuming and computationally complex algorithms for stochastic design, while the proposed strategy works as long as the underlying network is weight-symmetric and balanced. The proposed optimization framework finds applications to distributed classification and learning.
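For context, a textbook discrete-time gradient-tracking update over the current digraph's weight matrix looks as follows; this sketch assumes a weight-balanced W and is not the exact update or step-size rule analysed in the paper.

```python
import numpy as np

def gradient_tracking_step(x, y, grad, W, alpha):
    """One synchronous gradient-tracking iteration.  x and y are (n, d) stacks
    of local iterates and gradient trackers; grad maps an (n, d) stack to the
    (n, d) stack of local gradients; W is the (n, n) mixing matrix of the
    current digraph."""
    g_old = grad(x)
    x_new = W @ x - alpha * y                  # consensus step + descent along the tracker
    y_new = W @ y + grad(x_new) - g_old        # track the network-average gradient
    return x_new, y_new
```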

Self-supervised Heterogeneous Graph Variational Autoencoders

  • paper_url: http://arxiv.org/abs/2311.07929
  • repo_url: None
  • paper_authors: Yige Zhao, Jianxiang Yu, Yao Cheng, Chengcheng Yu, Yiding Liu, Xiang Li, Shuaiqiang Wang
  • for: Addressing the problems of missing attributes, inaccurate attributes, and scarce labels in Heterogeneous Information Networks (HINs) using a generative self-supervised model called SHAVA.
  • methods: SHAVA uses a variational graph autoencoder framework to learn both node-level and attribute-level embeddings in the encoder, and reconstructs both links and attributes in the decoder. It generates an initial low-dimensional representation matrix for all nodes, which is used to reconstruct raw features of attributed nodes and rectify inaccurate attributes.
  • results: SHAVA is shown to be superior in tackling HINs with missing and inaccurate attributes, outperforming existing heterogeneous graph neural networks (HGNNs) in extensive experiments.
    Abstract Heterogeneous Information Networks (HINs), which consist of various types of nodes and edges, have recently demonstrated excellent performance in graph mining. However, most existing heterogeneous graph neural networks (HGNNs) ignore the problems of missing attributes, inaccurate attributes and scarce labels for nodes, which limits their expressiveness. In this paper, we propose a generative self-supervised model SHAVA to address these issues simultaneously. Specifically, SHAVA first initializes all the nodes in the graph with a low-dimensional representation matrix. After that, based on the variational graph autoencoder framework, SHAVA learns both node-level and attribute-level embeddings in the encoder, which can provide fine-grained semantic information to construct node attributes. In the decoder, SHAVA reconstructs both links and attributes. Instead of directly reconstructing raw features for attributed nodes, SHAVA generates the initial low-dimensional representation matrix for all the nodes, based on which raw features of attributed nodes are further reconstructed to leverage accurate attributes. In this way, SHAVA can not only complete informative features for non-attributed nodes, but rectify inaccurate ones for attributed nodes. Finally, we conduct extensive experiments to show the superiority of SHAVA in tackling HINs with missing and inaccurate attributes.

Bayesian Conditional Diffusion Models for Versatile Spatiotemporal Turbulence Generation

  • paper_url: http://arxiv.org/abs/2311.07896
  • repo_url: None
  • paper_authors: Han Gao, Xu Han, Xiantao Fan, Luning Sun, Li-Ping Liu, Lian Duan, Jian-Xun Wang
  • for: Generating versatile spatiotemporal turbulent flows.
  • methods: A probabilistic diffusion model is used to generate turbulence, together with an autoregressive gradient-based conditional sampling method for producing long-span flow sequences.
  • results: A suite of numerical experiments shows that the approach generates high-fidelity, diverse turbulence, including LES-resolved instantaneous flow sequences from URANS inputs, inhomogeneous anisotropic wall-bounded turbulence, and super-resolved high-speed turbulent boundary layer flows.
    Abstract Turbulent flows have historically presented formidable challenges to predictive computational modeling. Traditional numerical simulations often require vast computational resources, making them infeasible for numerous engineering applications. As an alternative, deep learning-based surrogate models have emerged, offering data-drive solutions. However, these are typically constructed within deterministic settings, leading to shortfall in capturing the innate chaotic and stochastic behaviors of turbulent dynamics. We introduce a novel generative framework grounded in probabilistic diffusion models for versatile generation of spatiotemporal turbulence. Our method unifies both unconditional and conditional sampling strategies within a Bayesian framework, which can accommodate diverse conditioning scenarios, including those with a direct differentiable link between specified conditions and generated unsteady flow outcomes, and scenarios lacking such explicit correlations. A notable feature of our approach is the method proposed for long-span flow sequence generation, which is based on autoregressive gradient-based conditional sampling, eliminating the need for cumbersome retraining processes. We showcase the versatile turbulence generation capability of our framework through a suite of numerical experiments, including: 1) the synthesis of LES simulated instantaneous flow sequences from URANS inputs; 2) holistic generation of inhomogeneous, anisotropic wall-bounded turbulence, whether from given initial conditions, prescribed turbulence statistics, or entirely from scratch; 3) super-resolved generation of high-speed turbulent boundary layer flows from low-resolution data across a range of input resolutions. Collectively, our numerical experiments highlight the merit and transformative potential of the proposed methods, making a significant advance in the field of turbulence generation.

Mixture of Coupled HMMs for Robust Modeling of Multivariate Healthcare Time Series

  • paper_url: http://arxiv.org/abs/2311.07867
  • repo_url: https://github.com/onurpoyraz/m-chmm
  • paper_authors: Onur Poyraz, Pekka Marttinen
  • for: The paper targets the analysis of multivariate healthcare time series, which suffer from irregular sampling, noisy and missing values, and heterogeneous patient groups with differing dynamics.
  • methods: The paper proposes a mixture of coupled hidden Markov models (M-CHMM) and derives two samplers for the latent sequences, based on particle filtering and on a factorized approximation; both are computationally tractable, improve mixing, and allow the likelihood estimation needed to learn the mixture.
  • results: Experiments on real-world epidemiological and semi-synthetic data show improved data fit, efficient handling of missing and noisy measurements, improved prediction accuracy, and the ability to identify interpretable subsets in the data.
    Abstract Analysis of multivariate healthcare time series data is inherently challenging: irregular sampling, noisy and missing values, and heterogeneous patient groups with different dynamics violating exchangeability. In addition, interpretability and quantification of uncertainty are critically important. Here, we propose a novel class of models, a mixture of coupled hidden Markov models (M-CHMM), and demonstrate how it elegantly overcomes these challenges. To make the model learning feasible, we derive two algorithms to sample the sequences of the latent variables in the CHMM: samplers based on (i) particle filtering and (ii) factorized approximation. Compared to existing inference methods, our algorithms are computationally tractable, improve mixing, and allow for likelihood estimation, which is necessary to learn the mixture model. Experiments on challenging real-world epidemiological and semi-synthetic data demonstrate the advantages of the M-CHMM: improved data fit, capacity to efficiently handle missing and noisy measurements, improved prediction accuracy, and ability to identify interpretable subsets in the data.
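To illustrate the first of the two samplers, here is a generic bootstrap particle filter that also returns the log-likelihood estimate needed to learn a mixture model; the model callables and their interfaces are assumptions, and the paper's CHMM-specific sampler differs in detail.

```python
import numpy as np

def bootstrap_particle_filter(obs, init_sample, transition_sample, obs_loglik,
                              n_particles=500, rng=np.random.default_rng(0)):
    """Generic bootstrap particle filter returning an unbiased estimate of the
    log marginal likelihood.  init_sample(n) draws initial particles,
    transition_sample(particles) propagates them, obs_loglik(y, particles)
    returns per-particle observation log-likelihoods."""
    particles = init_sample(n_particles)
    loglik = 0.0
    for y in obs:
        particles = transition_sample(particles)
        logw = obs_loglik(y, particles)
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())                     # log of the average weight
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        particles = particles[idx]                         # multinomial resampling
    return loglik
```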

PEMS: Pre-trained Epidemic Time-series Models

  • paper_url: http://arxiv.org/abs/2311.07841
  • repo_url: None
  • paper_authors: Harshavardhan Kamarthi, B. Aditya Prakash
  • For: The paper aims to provide accurate and reliable predictions about the future of an epidemic, enabling informed public health decisions.
  • Methods: The authors pre-train deep learning models on multiple datasets of different diseases and epidemics, introducing a set of self-supervised learning (SSL) tasks to capture useful patterns and learn important priors about epidemic dynamics.
  • Results: The resultant Pre-trained Epidemic Time-Series Models (PEMS) outperform previous state-of-the-art methods in various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion, including the novel Covid-19 pandemic unseen in the pre-training data, with better efficiency using a smaller fraction of the datasets.
    Abstract Providing accurate and reliable predictions about the future of an epidemic is an important problem for enabling informed public health decisions. Recent works have shown that leveraging data-driven solutions that utilize advances in deep learning methods to learn from past data of an epidemic often outperform traditional mechanistic models. However, in many cases, the past data is sparse and may not sufficiently capture the underlying dynamics. While there exists a large amount of data from past epidemics, leveraging prior knowledge from time-series data of other diseases is a non-trivial challenge. Motivated by the success of pre-trained models in language and vision tasks, we tackle the problem of pre-training epidemic time-series models to learn from multiple datasets from different diseases and epidemics. We introduce Pre-trained Epidemic Time-Series Models (PEMS) that learn from diverse time-series datasets of a variety of diseases by formulating pre-training as a set of self-supervised learning (SSL) tasks. We tackle various important challenges specific to pre-training for epidemic time-series such as dealing with heterogeneous dynamics and efficiently capturing useful patterns from multiple epidemic datasets by carefully designing the SSL tasks to learn important priors about the epidemic dynamics that can be leveraged for fine-tuning to multiple downstream tasks. The resultant PEM outperforms previous state-of-the-art methods in various downstream time-series tasks across datasets of varying seasonal patterns, geography, and mechanism of contagion including the novel Covid-19 pandemic unseen in pre-trained data with better efficiency using smaller fraction of datasets.

Toward Efficient and Incremental Spectral Clustering via Parametric Spectral Clustering

  • paper_url: http://arxiv.org/abs/2311.07833
  • repo_url: https://github.com/109502518/psc_bigdata
  • paper_authors: Jo-Chun Chen, Hung-Hsuan Chen
  • for: addresses the challenges associated with big data and real-time scenarios, and enables efficient incremental clustering with new data points.
  • methods: extends the capabilities of spectral clustering with a novel approach called parametric spectral clustering (PSC).
  • results: achieves clustering quality mostly comparable to standard spectral clustering while being computationally efficient, as demonstrated through experimental evaluations on various open datasets.
    Abstract Spectral clustering is a popular method for effectively clustering nonlinearly separable data. However, computational limitations, memory requirements, and the inability to perform incremental learning challenge its widespread application. To overcome these limitations, this paper introduces a novel approach called parametric spectral clustering (PSC). By extending the capabilities of spectral clustering, PSC addresses the challenges associated with big data and real-time scenarios and enables efficient incremental clustering with new data points. Experimental evaluations conducted on various open datasets demonstrate the superiority of PSC in terms of computational efficiency while achieving clustering quality mostly comparable to standard spectral clustering. The proposed approach has significant potential for incremental and real-time data analysis applications, facilitating timely and accurate clustering in dynamic and evolving datasets. The findings of this research contribute to the advancement of clustering techniques and open new avenues for efficient and effective data analysis. We publish the experimental code at https://github.com/109502518/PSC_BigData.
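A minimal sketch of the parametric idea: compute a spectral embedding once on an initial sample, train a parametric model to reproduce it from raw features, and then embed and cluster new points without re-running the eigendecomposition. The MLP regressor and k-means stage are illustrative choices, not necessarily those used in the paper or its repository.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import KMeans

def fit_psc(X_init, n_clusters):
    emb = SpectralEmbedding(n_components=n_clusters).fit_transform(X_init)
    mapper = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000).fit(X_init, emb)
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(emb)
    return mapper, km

def assign_new_points(mapper, km, X_new):
    """Incremental assignment of new points: no new eigendecomposition is needed."""
    return km.predict(mapper.predict(X_new))
```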

Purpose in the Machine: Do Traffic Simulators Produce Distributionally Equivalent Outcomes for Reinforcement Learning Applications?

  • paper_url: http://arxiv.org/abs/2311.08429
  • repo_url: None
  • paper_authors: Rex Chen, Kathleen M. Carley, Fei Fang, Norman Sadeh
  • for: The paper examines how the modelling assumptions of traffic simulators affect reinforcement learning (RL) agents trained for intelligent transportation systems (ITSs).
  • methods: A controlled virtual experiment trains RL agents with two commonly used simulators, CityFlow and SUMO, while varying driver behavior and simulation scale, to test whether RL-relevant measures are distributionally equivalent across the simulators.
  • results: The study finds evidence against distributional equivalence: the root mean squared error and KL divergence between simulators are significantly greater than 0 for all assessed measures, suggesting that traffic simulators are not a deus ex machina for RL training and that inter-simulator differences must be understood before deploying RL-based ITSs in the real world.
    Abstract Traffic simulators are used to generate data for learning in intelligent transportation systems (ITSs). A key question is to what extent their modelling assumptions affect the capabilities of ITSs to adapt to various scenarios when deployed in the real world. This work focuses on two simulators commonly used to train reinforcement learning (RL) agents for traffic applications, CityFlow and SUMO. A controlled virtual experiment varying driver behavior and simulation scale finds evidence against distributional equivalence in RL-relevant measures from these simulators, with the root mean squared error and KL divergence being significantly greater than 0 for all assessed measures. While granular real-world validation generally remains infeasible, these findings suggest that traffic simulators are not a deus ex machina for RL training: understanding the impacts of inter-simulator differences is necessary to train and deploy RL-based ITSs.
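The kind of distributional comparison reported in the paper can be sketched as follows, given matched logs of the same RL-relevant measure from CityFlow and SUMO under equivalent scenarios; the histogram binning and smoothing constants are illustrative assumptions.

```python
import numpy as np
from scipy.stats import entropy

def compare_simulators(measure_a, measure_b, bins=50):
    """Paired RMSE plus a histogram-based KL divergence between the same
    RL-relevant measure (e.g., episodic queue length) from two simulators."""
    rmse = float(np.sqrt(np.mean((measure_a - measure_b) ** 2)))
    lo, hi = min(measure_a.min(), measure_b.min()), max(measure_a.max(), measure_b.max())
    p, edges = np.histogram(measure_a, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(measure_b, bins=edges, density=True)
    kl = float(entropy(p + 1e-12, q + 1e-12))    # KL(p || q) with light smoothing
    return rmse, kl
```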

Statistical Parameterized Physics-Based Machine Learning Digital Twin Models for Laser Powder Bed Fusion Process

  • paper_url: http://arxiv.org/abs/2311.07821
  • repo_url: None
  • paper_authors: Yangfan Li, Satyajit Mojumder, Ye Lu, Abdullah Al Amin, Jiachen Guo, Xiaoyu Xie, Wei Chen, Gregory J. Wagner, Jian Cao, Wing Kam Liu
  • for: This paper aims to develop a digital twin model for predicting and controlling the quality of laser powder bed fusion (LPBF) metal additive manufacturing processes.
  • methods: The paper uses a parameterized physics-based digital twin (PPB-DT) model that incorporates a mechanistic reduced-order method-driven stochastic calibration process to statistically predict melt pool geometries and identify defects. The model is validated through controlled experiments and compared to machine learning-based digital twin (PPB-ML-DT) models.
  • results: The PPB-DT model is able to accurately predict melt pool geometries and identify defects such as lack-of-fusion porosity and surface roughness, and the PPB-ML-DT model is able to predict, monitor, and control melt pool geometries. The proposed digital twin models can be used for predictions, control, optimization, and quality assurance within the LPBF process.
    Abstract A digital twin (DT) is a virtual representation of physical process, products and/or systems that requires a high-fidelity computational model for continuous update through the integration of sensor data and user input. In the context of laser powder bed fusion (LPBF) additive manufacturing, a digital twin of the manufacturing process can offer predictions for the produced parts, diagnostics for manufacturing defects, as well as control capabilities. This paper introduces a parameterized physics-based digital twin (PPB-DT) for the statistical predictions of LPBF metal additive manufacturing process. We accomplish this by creating a high-fidelity computational model that accurately represents the melt pool phenomena and subsequently calibrating and validating it through controlled experiments. In PPB-DT, a mechanistic reduced-order method-driven stochastic calibration process is introduced, which enables the statistical predictions of the melt pool geometries and the identification of defects such as lack-of-fusion porosity and surface roughness, specifically for diagnostic applications. Leveraging data derived from this physics-based model and experiments, we have trained a machine learning-based digital twin (PPB-ML-DT) model for predicting, monitoring, and controlling melt pool geometries. These proposed digital twin models can be employed for predictions, control, optimization, and quality assurance within the LPBF process, ultimately expediting product development and certification in LPBF-based metal additive manufacturing.