cs.LG - 2023-09-26

DeepROCK: Error-controlled interaction detection in deep neural networks

  • paper_url: http://arxiv.org/abs/2309.15319
  • repo_url: None
  • paper_authors: Winston Chen, William Stafford Noble, Yang Young Lu
  • for: The goal of this paper is to improve the interpretability of deep neural networks (DNNs) so that they can be applied more reliably in error-intolerant domains.
  • methods: The paper uses "knockoffs", i.e., dummy variables designed to mimic the dependence structure of a given set of features, together with a novel DNN architecture, to control the false discovery rate (FDR) while maximizing statistical power.
  • results: Experiments show that DeepROCK effectively controls the FDR, validated extensively on both simulated and real datasets.
    Abstract The complexity of deep neural networks (DNNs) makes them powerful but also makes them challenging to interpret, hindering their applicability in error-intolerant domains. Existing methods attempt to reason about the internal mechanism of DNNs by identifying feature interactions that influence prediction outcomes. However, such methods typically lack a systematic strategy to prioritize interactions while controlling confidence levels, making them difficult to apply in practice for scientific discovery and hypothesis validation. In this paper, we introduce a method, called DeepROCK, to address this limitation by using knockoffs, which are dummy variables that are designed to mimic the dependence structure of a given set of features while being conditionally independent of the response. Together with a novel DNN architecture involving a pairwise-coupling layer, DeepROCK jointly controls the false discovery rate (FDR) and maximizes statistical power. In addition, we identify a challenge in correctly controlling FDR using off-the-shelf feature interaction importance measures. DeepROCK overcomes this challenge by proposing a calibration procedure applied to existing interaction importance measures to make the FDR under control at a target level. Finally, we validate the effectiveness of DeepROCK through extensive experiments on simulated and real datasets.
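
    Code sketch: a minimal sketch of the knockoff filter selection step the paper builds on (the Barber–Candès knockoff+ threshold), assuming importance scores for the original features/interactions and their knockoffs have already been computed. The function name and toy scores are illustrative, not DeepROCK's calibrated interaction measures.

```python
import numpy as np

def knockoff_select(z_orig, z_knockoff, target_fdr=0.1):
    """Select features whose knockoff statistics pass the data-dependent
    threshold that controls the FDR at `target_fdr` (knockoff+ rule)."""
    w = z_orig - z_knockoff                     # antisymmetric knockoff statistics
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:                        # smallest threshold with estimated FDP <= target
        fdp_hat = (1 + np.sum(w <= -t)) / max(np.sum(w >= t), 1)
        if fdp_hat <= target_fdr:
            return np.where(w >= t)[0]
    return np.array([], dtype=int)              # nothing passes: select no features

# toy usage: 50 candidate interactions, the first 5 carry real signal
rng = np.random.default_rng(0)
z_orig = np.concatenate([rng.uniform(2, 3, 5), rng.uniform(0, 1, 45)])
z_ko = rng.uniform(0, 1, 50)
print(knockoff_select(z_orig, z_ko, target_fdr=0.2))
```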

Telescope: An Automated Hybrid Forecasting Approach on a Level-Playing Field

  • paper_url: http://arxiv.org/abs/2309.15871
  • repo_url: None
  • paper_authors: André Bauer, Mark Leznik, Michael Stenger, Robert Leppich, Nikolas Herbst, Samuel Kounev, Ian Foster
  • for: Forecasting
  • methods: A machine learning approach that automatically extracts relevant information from a given time series and splits it into parts, handling each of them separately.
  • results: Accurate and reliable forecasts compared with other recent methods, without requiring parameterization or the training of a multitude of parameters.
    Abstract In many areas of decision-making, forecasting is an essential pillar. Consequently, many different forecasting methods have been proposed. From our experience, recently presented forecasting methods are computationally intensive, poorly automated, tailored to a particular data set, or they lack a predictable time-to-result. To this end, we introduce Telescope, a novel machine learning-based forecasting approach that automatically retrieves relevant information from a given time series and splits it into parts, handling each of them separately. In contrast to deep learning methods, our approach doesn't require parameterization or the need to train and fit a multitude of parameters. It operates with just one time series and provides forecasts within seconds without any additional setup. Our experiments show that Telescope outperforms recent methods by providing accurate and reliable forecasts while making no assumptions about the analyzed time series.
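
    Code sketch: a rough illustration of the split-and-handle-separately idea (not Telescope's actual feature extraction): decompose a series with STL and forecast each component independently. statsmodels is assumed to be available, and the per-component forecasters are deliberately simple placeholders.

```python
import numpy as np
from statsmodels.tsa.seasonal import STL

def split_and_forecast(y, period, horizon):
    """Decompose a univariate series and forecast each component separately,
    a stand-in for Telescope's idea of handling extracted parts independently."""
    res = STL(y, period=period).fit()
    # trend: extrapolate linearly from the last two seasons
    t = np.arange(len(y))
    coeffs = np.polyfit(t[-2 * period:], res.trend[-2 * period:], deg=1)
    trend_fc = np.polyval(coeffs, np.arange(len(y), len(y) + horizon))
    # season: repeat the last observed seasonal cycle
    season_fc = np.tile(res.seasonal[-period:], horizon // period + 1)[:horizon]
    # remainder: forecast its mean (could be replaced by any ML regressor)
    remainder_fc = np.full(horizon, res.resid.mean())
    return trend_fc + season_fc + remainder_fc

# toy usage: noisy seasonal series, forecast the next 24 points
t = np.arange(240)
y = 0.05 * t + np.sin(2 * np.pi * t / 24) + np.random.normal(0, 0.1, 240)
print(split_and_forecast(y, period=24, horizon=24)[:5])
```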

Beyond Log-Concavity: Theory and Algorithm for Sum-Log-Concave Optimization

  • paper_url: http://arxiv.org/abs/2309.15298
  • repo_url: None
  • paper_authors: Mastane Achab
  • for: The paper extends the classic theory of convex optimization to the minimization of functions equal to the negated logarithm of a sum of log-concave functions.
  • methods: The paper proposes a new descent algorithm, Cross Gradient Descent (XGD), which moves in the opposite direction of the cross-gradient (a vector that is in general distinct from the usual gradient) and comes with a convergence analysis.
  • results: As an application of the framework, the paper introduces the checkered regression method, a classifier for non-linearly separable problems that tessellates the feature space with any given number of hyperplanes, creating a checkerboard-like pattern of decision regions.
    Abstract This paper extends the classic theory of convex optimization to the minimization of functions that are equal to the negated logarithm of what we term as a sum-log-concave function, i.e., a sum of log-concave functions. In particular, we show that such functions are in general not convex but still satisfy generalized convexity inequalities. These inequalities unveil the key importance of a certain vector that we call the cross-gradient and that is, in general, distinct from the usual gradient. Thus, we propose the Cross Gradient Descent (XGD) algorithm moving in the opposite direction of the cross-gradient and derive a convergence analysis. As an application of our sum-log-concave framework, we introduce the so-called checkered regression method relying on a sum-log-concave function. This classifier extends (multiclass) logistic regression to non-linearly separable problems since it is capable of tessellating the feature space by using any given number of hyperplanes, creating a checkerboard-like pattern of decision regions.
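
    Code sketch: the paper's cross-gradient and the XGD update are not reproduced here; this sketch only sets up a sum-log-concave objective (the negated log of a sum of Gaussian bumps) and minimizes it with ordinary gradient descent, to illustrate the kind of non-convex but structured landscape the framework targets.

```python
import numpy as np

def sum_log_concave_loss(theta, centers):
    """Negated log of a sum of log-concave terms (Gaussian bumps here):
    f(theta) = -log( sum_k exp(-||theta - c_k||^2 / 2) ). Not convex in general."""
    sq = np.sum((theta - centers) ** 2, axis=1)
    m = -sq.min() / 2.0                                   # log-sum-exp stabilisation
    return -(m + np.log(np.sum(np.exp(-sq / 2.0 - m))))

def grad(theta, centers):
    sq = np.sum((theta - centers) ** 2, axis=1)
    w = np.exp(-sq / 2.0)
    w = w / w.sum()                                       # softmax responsibilities
    return np.sum(w[:, None] * (theta - centers), axis=0)

centers = np.array([[2.0, 0.0], [-2.0, 0.0]])             # two log-concave components
theta = np.array([0.5, 1.0])
for _ in range(200):                                      # plain gradient descent
    theta -= 0.1 * grad(theta, centers)
print(theta, sum_log_concave_loss(theta, centers))
```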

Multiple Case Physics-Informed Neural Network for Biomedical Tube Flows

  • paper_url: http://arxiv.org/abs/2309.15294
  • repo_url: None
  • paper_authors: Hong Shen Wong, Wei Xuan Chan, Bing Huan Li, Choon Hwai Yap
  • for: Biomedical evaluation of blood and air flow in tube-like geometries
  • methods: Physics-Informed Neural Networks (PINNs) as an alternative to traditional computational fluid dynamics (CFD) methods
  • results: Results for unseen geometry cases can be obtained in real time, and network architecture, tube-specific, and regularization strategies are explored to optimize performance
    Abstract Fluid dynamics computations for tube-like geometries are important for biomedical evaluation of vascular and airway fluid dynamics. Physics-Informed Neural Networks (PINNs) have recently emerged as a good alternative to traditional computational fluid dynamics (CFD) methods. The vanilla PINN, however, requires much longer training time than the traditional CFD methods for each specific flow scenario and thus does not justify its mainstream use. Here, we explore the use of the multi-case PINN approach for calculating biomedical tube flows, where varied geometry cases are parameterized and pre-trained on the PINN, such that results for unseen geometries can be obtained in real time. Our objective is to identify network architecture, tube-specific, and regularization strategies that can optimize this, via experiments on a series of idealized 2D stenotic tube flows.
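
    Code sketch: a compact illustration (PyTorch assumed) of the multi-case idea: a geometry parameter such as stenosis severity is fed to the network as an extra input so a single training run covers a family of tubes. Only the incompressible continuity residual is shown; the full Navier-Stokes losses, boundary conditions, and the tube-specific strategies studied in the paper are omitted.

```python
import torch
import torch.nn as nn

class MultiCasePINN(nn.Module):
    """Velocity field u(x, y; g) where g parameterises the tube geometry."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 2),             # outputs (u, v)
        )

    def forward(self, x, y, g):
        return self.net(torch.stack([x, y, g], dim=-1))

def continuity_residual(model, x, y, g):
    """Mean squared |du/dx + dv/dy| at collocation points (incompressibility only)."""
    x = x.requires_grad_(True)
    y = y.requires_grad_(True)
    uv = model(x, y, g)
    u, v = uv[:, 0], uv[:, 1]
    du_dx = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    dv_dy = torch.autograd.grad(v.sum(), y, create_graph=True)[0]
    return ((du_dx + dv_dy) ** 2).mean()

model = MultiCasePINN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(100):                        # collocation points over random geometries
    x, y = torch.rand(256), torch.rand(256)
    g = torch.rand(256)                        # e.g., stenosis severity in [0, 1]
    loss = continuity_residual(model, x, y, g)
    opt.zero_grad(); loss.backward(); opt.step()
```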

Scaling Representation Learning from Ubiquitous ECG with State-Space Models

  • paper_url: http://arxiv.org/abs/2309.15292
  • repo_url: https://github.com/klean2050/tiles_ecg_model
  • paper_authors: Kleanthis Avramidis, Dominika Kunc, Bartosz Perz, Kranti Adsul, Tiantian Feng, Przemysław Kazienko, Stanisław Saganowski, Shrikanth Narayanan
  • for: Enhancing human well-being through ubiquitous sensing from wearable devices in the wild, with a focus on electrocardiogram (ECG) signals.
  • methods: The paper introduces a pre-trained state-space model for representation learning from ECG signals, trained in a self-supervised manner on a large dataset of 275,000 10s ECG recordings collected in the wild.
  • results: The proposed model demonstrates competitive performance on a range of downstream tasks, including health monitoring and stress and affect estimation, and is effective in low-resource regimes.
    Abstract Ubiquitous sensing from wearable devices in the wild holds promise for enhancing human well-being, from diagnosing clinical conditions and measuring stress to building adaptive health promoting scaffolds. But the large volumes of data therein across heterogeneous contexts pose challenges for conventional supervised learning approaches. Representation Learning from biological signals is an emerging realm catalyzed by the recent advances in computational modeling and the abundance of publicly shared databases. The electrocardiogram (ECG) is the primary researched modality in this context, with applications in health monitoring, stress and affect estimation. Yet, most studies are limited by small-scale controlled data collection and over-parameterized architecture choices. We introduce \textbf{WildECG}, a pre-trained state-space model for representation learning from ECG signals. We train this model in a self-supervised manner with 275,000 10s ECG recordings collected in the wild and evaluate it on a range of downstream tasks. The proposed model is a robust backbone for ECG analysis, providing competitive performance on most of the tasks considered, while demonstrating efficacy in low-resource regimes. The code and pre-trained weights are shared publicly at https://github.com/klean2050/tiles_ecg_model.

Composable Coresets for Determinant Maximization: Greedy is Almost Optimal

  • paper_url: http://arxiv.org/abs/2309.15286
  • repo_url: None
  • paper_authors: Siddharth Gollapudi, Sepideh Mahabadi, Varun Sivashankar
  • for: Selecting $k$ vectors from a set of $n$ vectors in $\mathbb{R}^d$ so as to maximize the volume they span.
  • methods: The problem is the MAP-inference task for determinantal point processes (DPPs) and is studied in the composable coreset setting relevant to large datasets.
  • results: The widely-used Greedy algorithm is shown to provide composable coresets with an almost optimal approximation factor of $O(k)^{3k}$, based on a local optimality property of Greedy that experiments show is even stronger on real datasets.
    Abstract Given a set of $n$ vectors in $\mathbb{R}^d$, the goal of the \emph{determinant maximization} problem is to pick $k$ vectors with the maximum volume. Determinant maximization is the MAP-inference task for determinantal point processes (DPP) and has recently received considerable attention for modeling diversity. As most applications for the problem use large amounts of data, this problem has been studied in the relevant \textit{composable coreset} setting. In particular, [Indyk-Mahabadi-OveisGharan-Rezaei--SODA'20, ICML'19] showed that one can get composable coresets with optimal approximation factor of $\tilde O(k)^k$ for the problem, and that a local search algorithm achieves an almost optimal approximation guarantee of $O(k)^{2k}$. In this work, we show that the widely-used Greedy algorithm also provides composable coresets with an almost optimal approximation factor of $O(k)^{3k}$, which improves over the previously known guarantee of $C^{k^2}$, and supports the prior experimental results showing the practicality of the greedy algorithm as a coreset. Our main result follows by showing a local optimality property for Greedy: swapping a single point from the greedy solution with a vector that was not picked by the greedy algorithm can increase the volume by a factor of at most $(1+\sqrt{k})$. This is tight up to the additive constant $1$. Finally, our experiments show that the local optimality of the greedy algorithm is even lower than the theoretical bound on real data sets.
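
    Code sketch: a direct sketch of the Greedy baseline analysed in the paper: repeatedly add the vector that most increases the (squared) volume of the selected set, measured by the determinant of its Gram matrix. The composable-coreset machinery and the local-search comparison are not shown.

```python
import numpy as np

def greedy_detmax(V, k):
    """Greedily pick k of the n rows of V (n x d), maximizing the volume of the
    selected set as measured by det of the Gram matrix of chosen vectors."""
    n, _ = V.shape
    chosen = []
    for _ in range(k):
        best_i, best_vol = None, -np.inf
        for i in range(n):
            if i in chosen:
                continue
            S = V[chosen + [i]]
            vol = np.linalg.det(S @ S.T)       # squared volume of the parallelepiped
            if vol > best_vol:
                best_i, best_vol = i, vol
        chosen.append(best_i)
    return chosen

rng = np.random.default_rng(0)
V = rng.normal(size=(50, 8))
print(greedy_detmax(V, k=4))
```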

A Physics Enhanced Residual Learning (PERL) Framework for Traffic State Prediction

  • paper_url: http://arxiv.org/abs/2309.15284
  • repo_url: None
  • paper_authors: Keke Long, Haotian Shi, Zihao Sheng, Xiaopeng Li, Sikai Chen
  • for: The paper proposes a new framework, the Physics-Enhanced Residual Learning (PERL) model, to reconcile the shortcomings of physics models and data-driven models.
  • methods: A physics model and a residual learning model are integrated into a single model; the prediction is the physics model result plus a predicted residual that corrects it.
  • results: Experiments show that PERL predicts trajectories better than the physics model, the data-driven model, and PINN models when the dataset is small; it also converges faster during training and requires fewer training samples than the data-driven and PINN models.
    Abstract In vehicle trajectory prediction, physics models and data-driven models are two predominant methodologies. However, each approach presents its own set of challenges: physics models fall short in predictability, while data-driven models lack interpretability. Addressing these identified shortcomings, this paper proposes a novel framework, the Physics-Enhanced Residual Learning (PERL) model. PERL integrates the strengths of physics-based and data-driven methods for traffic state prediction. PERL contains a physics model and a residual learning model. Its prediction is the sum of the physics model result and a predicted residual as a correction to it. It preserves the interpretability inherent to physics-based models and has reduced data requirements compared to data-driven methods. Experiments were conducted using a real-world vehicle trajectory dataset. We proposed a PERL model, with the Intelligent Driver Model (IDM) as its physics car-following model and Long Short-Term Memory (LSTM) as its residual learning model. We compare this PERL model with the physics car-following model, data-driven model, and other physics-informed neural network (PINN) models. The result reveals that PERL achieves better prediction with a small dataset, compared to the physics model, data-driven model, and PINN model. Second, the PERL model showed faster convergence during training, offering comparable performance with fewer training samples than the data-driven model and PINN model. Sensitivity analysis also proves comparable performance of PERL using another residual learning model and a physics car-following model.
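
    Code sketch: a compact sketch of the PERL idea with the Intelligent Driver Model (IDM) as the physics component and an LSTM that outputs a residual correction. The IDM parameters and input features below are illustrative defaults, not the paper's calibrated values.

```python
import torch
import torch.nn as nn

def idm_accel(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0, s0=2.0):
    """Intelligent Driver Model acceleration (physics component)."""
    s_star = s0 + v * T + v * dv / (2.0 * (a_max * b) ** 0.5)
    return a_max * (1.0 - (v / v0) ** 4 - (s_star / gap) ** 2)

class PERL(nn.Module):
    """Prediction = IDM acceleration + learned residual from recent history."""
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, hist):
        # hist: (batch, time, 3) with features (speed, gap, speed difference)
        v, gap, dv = hist[:, -1, 0], hist[:, -1, 1], hist[:, -1, 2]
        physics = idm_accel(v, gap, dv)
        out, _ = self.lstm(hist)
        residual = self.head(out[:, -1]).squeeze(-1)
        return physics + residual               # residual corrects the physics model

model = PERL()
hist = torch.rand(8, 20, 3) * torch.tensor([20.0, 50.0, 5.0]) + torch.tensor([0.0, 5.0, -2.5])
print(model(hist).shape)                         # torch.Size([8])
```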

Identifying factors associated with fast visual field progression in patients with ocular hypertension based on unsupervised machine learning

  • paper_url: http://arxiv.org/abs/2309.15867
  • repo_url: None
  • paper_authors: Xiaoqin Huang, Asma Poursoroush, Jian Sun, Michael V. Boland, Chris Johnson, Siamak Yousefi
  • for: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression using unsupervised machine learning, and to discover factors associated with fast VF progression.
  • methods: A latent class mixed model (LCMM) identifies OHT subtypes from standard automated perimetry (SAP) mean deviation (MD) trajectories; subtypes are characterized by demographic, clinical, ocular, and VF factors at baseline, and factors driving fast VF progression are identified with generalized estimating equations (GEE) and justified qualitatively and quantitatively.
  • results: The LCMM model discovered four clusters of eyes with different MD worsening trajectories: 794 eyes (25%), 1675 eyes (54%), 531 eyes (17%), and 133 eyes (4%), labelled Improvers, Stables, Slow progressors, and Fast progressors, with mean MD declines of 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Fast VF progression was associated with higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD), and refractive error (RE), but lower central corneal thickness (CCT), and with calcium channel blockers, male sex, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.
    Abstract Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 ocular hypertension treatment study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes using standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at the baseline. We then identified factors driving fast VF progression using generalized estimating equation (GEE) and justified findings qualitatively and quantitatively. Results: The LCMM model discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The number of eyes in clusters were 794 (25%), 1675 (54%), 531 (17%) and 133 (4%). We labelled the clusters as Improvers, Stables, Slow progressors, and Fast progressors based on their mean of MD decline, which were 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD) and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blockers, being male, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.

Method and Validation for Optimal Lineup Creation for Daily Fantasy Football Using Machine Learning and Linear Programming

  • paper_url: http://arxiv.org/abs/2309.15253
  • repo_url: None
  • paper_authors: Joseph M. Mahoney, Tomasz B. Paniak
  • for: This paper aims to develop a method to forecast NFL player performance under uncertainty and determine an optimal lineup to maximize FPTS under a set salary limit.
  • methods: The paper uses a supervised learning neural network to project FPTS based on past player performance, and a mixed integer linear program to find the optimal lineup.
  • results: The optimal lineups outperformed randomly-created lineups on average, and fell in approximately the 31st percentile (median) compared to real-world lineups from users on DraftKings.
    Abstract Daily fantasy sports (DFS) are weekly or daily online contests where real-game performances of individual players are converted to fantasy points (FPTS). Users select players for their lineup to maximize their FPTS within a set player salary cap. This paper focuses on (1) the development of a method to forecast NFL player performance under uncertainty and (2) determining an optimal lineup to maximize FPTS under a set salary limit. A supervised learning neural network was created and used to project FPTS based on past player performance (2018 NFL regular season for this work) prior to the upcoming week. These projected FPTS were used in a mixed integer linear program to find the optimal lineup. The performance of resultant lineups was compared to randomly-created lineups. On average, the optimal lineups outperformed the random lineups. The generated lineups were then compared to real-world lineups from users on DraftKings. The generated lineups generally fell in approximately the 31st percentile (median). The FPTS methods and predictions presented here can be further improved using this study as a baseline comparison.
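
    Code sketch: a minimal sketch of the lineup-optimization step using PuLP, taking the projected FPTS as given (e.g., from the neural network). The roster rule shown (a fixed roster size and a salary cap) is simplified relative to real DraftKings positional constraints.

```python
import random
import pulp

def optimal_lineup(players, salary_cap=50000, roster_size=9):
    """players: list of dicts with 'name', 'salary', 'proj_fpts'.
    Returns the set of players maximizing projected FPTS under the cap."""
    prob = pulp.LpProblem("dfs_lineup", pulp.LpMaximize)
    pick = pulp.LpVariable.dicts("pick", range(len(players)), cat="Binary")
    prob += pulp.lpSum(pick[i] * players[i]["proj_fpts"] for i in range(len(players)))
    prob += pulp.lpSum(pick[i] * players[i]["salary"] for i in range(len(players))) <= salary_cap
    prob += pulp.lpSum(pick[i] for i in range(len(players))) == roster_size
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [p["name"] for i, p in enumerate(players) if pick[i].value() == 1]

# toy pool of 20 players with random salaries and projections
random.seed(0)
pool = [{"name": f"P{i}", "salary": random.randint(3000, 9000),
         "proj_fpts": random.uniform(5, 25)} for i in range(20)]
print(optimal_lineup(pool))
```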

V2X-Lead: LiDAR-based End-to-End Autonomous Driving with Vehicle-to-Everything Communication Integration

  • paper_url: http://arxiv.org/abs/2309.15252
  • repo_url: None
  • paper_authors: Zhiyun Deng, Yanjun Shi, Weiming Shen
  • for: The paper presents a LiDAR-based end-to-end autonomous driving method with Vehicle-to-Everything (V2X) communication integration to address the challenges of navigating unregulated urban scenarios under mixed-autonomy traffic conditions.
  • methods: The proposed method fuses onboard LiDAR sensor data with V2X communication data and trains the driving agent with a model-free, off-policy deep reinforcement learning (DRL) algorithm, using a carefully designed reward function and multi-task learning to improve generalization across driving tasks and scenarios.
  • results: Experiments show improved safety and efficiency when traversing unsignalized intersections in mixed-autonomy traffic and generalization to previously unseen scenarios such as roundabouts; integrating V2X communication gives the autonomous vehicle a more accurate and comprehensive perception of its surroundings beyond onboard sensors, leading to safer and more robust driving behavior.
    Abstract This paper presents a LiDAR-based end-to-end autonomous driving method with Vehicle-to-Everything (V2X) communication integration, termed V2X-Lead, to address the challenges of navigating unregulated urban scenarios under mixed-autonomy traffic conditions. The proposed method aims to handle imperfect partial observations by fusing the onboard LiDAR sensor and V2X communication data. A model-free and off-policy deep reinforcement learning (DRL) algorithm is employed to train the driving agent, which incorporates a carefully designed reward function and multi-task learning technique to enhance generalization across diverse driving tasks and scenarios. Experimental results demonstrate the effectiveness of the proposed approach in improving safety and efficiency in the task of traversing unsignalized intersections in mixed-autonomy traffic, and its generalizability to previously unseen scenarios, such as roundabouts. The integration of V2X communication offers a significant data source for autonomous vehicles (AVs) to perceive their surroundings beyond onboard sensors, resulting in a more accurate and comprehensive perception of the driving environment and more safe and robust driving behavior.

Homotopy Relaxation Training Algorithms for Infinite-Width Two-Layer ReLU Neural Networks

  • paper_url: http://arxiv.org/abs/2309.15244
  • repo_url: None
  • paper_authors: Yahong Yang, Qipin Chen, Wenrui Hao
  • for: Accelerating the training process of neural networks
  • methods: A new training method, the Homotopy Relaxation Training Algorithm (HRTA), which builds a homotopy activation function that seamlessly connects the linear activation function with the ReLU activation function, and relaxes the homotopy parameter to refine the training process.
  • results: An analysis within the neural tangent kernel (NTK) setting shows that HRTA improves convergence rates, especially for networks with larger widths, and experiments validate the theoretical conclusions; the approach also shows potential for other activation functions and deep neural networks.
    Abstract In this paper, we present a novel training approach called the Homotopy Relaxation Training Algorithm (HRTA), aimed at accelerating the training process in contrast to traditional methods. Our algorithm incorporates two key mechanisms: one involves building a homotopy activation function that seamlessly connects the linear activation function with the ReLU activation function; the other technique entails relaxing the homotopy parameter to enhance the training refinement process. We have conducted an in-depth analysis of this novel method within the context of the neural tangent kernel (NTK), revealing significantly improved convergence rates. Our experimental results, especially when considering networks with larger widths, validate the theoretical conclusions. This proposed HRTA exhibits the potential for other activation functions and deep neural networks.
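
    Code sketch: a sketch of the core ingredient: an activation that interpolates between the identity (linear) map and ReLU via a homotopy parameter, which is relaxed over training. The linear schedule and the exact parameterisation below are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class HomotopyReLU(nn.Module):
    """sigma_s(x) = (1 - s) * x + s * relu(x): identity at s=0, ReLU at s=1."""
    def __init__(self, s=0.0):
        super().__init__()
        self.s = s

    def forward(self, x):
        return (1.0 - self.s) * x + self.s * torch.relu(x)

act = HomotopyReLU()
net = nn.Sequential(nn.Linear(10, 256), act, nn.Linear(256, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-2)

x, y = torch.randn(512, 10), torch.randn(512, 1)
for epoch in range(100):
    act.s = min(1.0, epoch / 50.0)       # relax the homotopy parameter towards ReLU
    loss = ((net(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(act.s, loss.item())
```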

Cross-Validation for Training and Testing Co-occurrence Network Inference Algorithms

  • paper_url: http://arxiv.org/abs/2309.15225
  • repo_url: https://github.com/EngineerDanny/CS685-Microbe-Network-Research-Code
  • paper_authors: Daniel Agyapong, Jeffrey Ryan Propster, Jane Marks, Toby Dylan Hocking
  • for: The paper proposes a novel cross-validation method for evaluating co-occurrence network inference algorithms, along with new methods for applying existing algorithms to predict on test data.
  • methods: Existing network inference algorithms are used, and the new cross-validation procedure evaluates the quality of the inferred networks.
  • results: The proposed evaluation method is useful for hyper-parameter selection (training) and for comparing the quality of networks inferred by different algorithms (testing).
    Abstract Microorganisms are found in almost every environment, including the soil, water, air, and inside other organisms, like animals and plants. While some microorganisms cause diseases, most of them help in biological processes such as decomposition, fermentation and nutrient cycling. A lot of research has gone into studying microbial communities in various environments and how their interactions and relationships can provide insights into various diseases. Co-occurrence network inference algorithms help us understand the complex associations of micro-organisms, especially bacteria. Existing network inference algorithms employ techniques such as correlation, regularized linear regression, and conditional dependence, which have different hyper-parameters that determine the sparsity of the network. Previous methods for evaluating the quality of the inferred network include using external data, and network consistency across sub-samples, both which have several drawbacks that limit their applicability in real microbiome composition data sets. We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, and new methods for applying existing algorithms to predict on test data. Our empirical study shows that the proposed method is useful for hyper-parameter selection (training) and comparing the quality of the inferred networks between different algorithms (testing).
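
    Code sketch: a hedged paraphrase of the cross-validation idea: infer a network on training samples (a simple correlation-threshold network stands in for the real inference algorithms), then score it on held-out samples by how well each taxon is predicted from its inferred neighbours. The scoring rule and threshold grid are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def infer_network(X, threshold):
    """Toy inference: connect taxa whose absolute correlation exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    return np.abs(corr) > threshold

def cv_score(X, threshold, n_splits=3):
    """Mean held-out MSE when each taxon is predicted from its network neighbours."""
    errors = []
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=0).split(X):
        adj = infer_network(X[train_idx], threshold)
        for j in range(X.shape[1]):
            nbrs = np.where(adj[j])[0]
            if len(nbrs) == 0:
                pred = np.full(len(test_idx), X[train_idx, j].mean())
            else:
                reg = LinearRegression().fit(X[train_idx][:, nbrs], X[train_idx, j])
                pred = reg.predict(X[test_idx][:, nbrs])
            errors.append(np.mean((X[test_idx, j] - pred) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))                        # samples x taxa abundances
for thr in (0.2, 0.4, 0.6):                          # hyper-parameter selection by CV
    print(thr, cv_score(X, thr))
```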

Auto-grading C programming assignments with CodeBERT and Random Forest Regressor

  • paper_url: http://arxiv.org/abs/2309.15216
  • repo_url: None
  • paper_authors: Roshan Vasu Muddaluru, Sharvaani Ravikumar Thoguluva, Shruti Prabha, Dr. Peeta Basa Pati, Ms. Roshni M Balakrishnan
  • for: Automating the grading of programming assignments with deep learning, reducing the grading burden on instructors while ensuring fair and efficient assessment.
  • methods: Several machine learning and deep learning approaches are used, including regression, convolutional neural networks (CNN), and long short-term memory (LSTM), together with the code-based transformer word embedding model CodeBERT.
  • results: Testing shows that the approach grades C programming assignments accurately, with a root mean squared error (RMSE) of 1.89; the results suggest that the deep learning approach is more effective than statistical methods.
    Abstract Grading coding assignments manually is challenging due to complexity and subjectivity. However, auto-grading with deep learning simplifies the task. It objectively assesses code quality, detects errors, and assigns marks accurately, reducing the burden on instructors while ensuring efficient and fair assessment. This study provides an analysis of auto-grading of the C programming assignments using machine learning and deep learning approaches like regression, convolutional neural networks (CNN) and long short-term memory (LSTM). Using a code-based transformer word embedding model called CodeBERT, the textual code inputs were transformed into vectors, and the vectors were then fed into several models. The testing findings demonstrated the efficacy of the suggested strategy with a root mean squared error (RMSE) of 1.89. The contrast between statistical methods and deep learning techniques is discussed in the study.
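
    Code sketch: a sketch of the described pipeline: embed each C submission with the public microsoft/codebert-base checkpoint (mean-pooled hidden states stand in for whatever pooling the authors used) and regress the grade with a random forest. The two toy submissions are placeholders for the real graded dataset.

```python
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.ensemble import RandomForestRegressor

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
encoder = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(code: str) -> np.ndarray:
    """Mean-pooled CodeBERT embedding of one source file."""
    inputs = tokenizer(code, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state      # (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

# toy training set: (source code, grade) pairs; real data would be student submissions
submissions = [("int main(){return 0;}", 4.0),
               ("int main(){printf(\"hi\");return 0;}", 6.5)]
X = np.stack([embed(code) for code, _ in submissions])
y = np.array([grade for _, grade in submissions])

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(model.predict(X[:1]))
```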

Balancing Computational Efficiency and Forecast Error in Machine Learning-based Time-Series Forecasting: Insights from Live Experiments on Meteorological Nowcasting

  • paper_url: http://arxiv.org/abs/2309.15207
  • repo_url: None
  • paper_authors: Elin Törnquist, Wagner Costa Santos, Timothy Pogue, Nicholas Wingle, Robert A. Caulk
  • for: This paper aims to explore the relationship between computational cost and forecast error in machine learning-based time-series forecasting, using meteorological nowcasting as an example.
  • methods: The paper employs various popular regression techniques, including XGBoost, FC-MLP, Transformer, and LSTM, for multi-horizon, short-term forecasting of temperature, wind speed, and cloud cover at multiple locations. The authors also propose two computational cost minimization methods: a novel auto-adaptive data reduction technique called Variance Horizon and a performance-based concept drift-detection mechanism.
  • results: The results show that using the Variance Horizon technique can reduce computational usage by more than 50%, while increasing forecast error by up to 15%. Meanwhile, performance-based retraining can reduce computational usage by up to 90%, while improving forecast error by up to 10%. The combination of both techniques outperformed other model configurations by up to 99.7% when considering error normalized to computational usage.
    Abstract Machine learning for time-series forecasting remains a key area of research. Despite successful application of many machine learning techniques, relating computational efficiency to forecast error remains an under-explored domain. This paper addresses this topic through a series of real-time experiments to quantify the relationship between computational cost and forecast error using meteorological nowcasting as an example use-case. We employ a variety of popular regression techniques (XGBoost, FC-MLP, Transformer, and LSTM) for multi-horizon, short-term forecasting of three variables (temperature, wind speed, and cloud cover) for multiple locations. During a 5-day live experiment, 4000 data sources were streamed for training and inferencing 144 models per hour. These models were parameterized to explore forecast error for two computational cost minimization methods: a novel auto-adaptive data reduction technique (Variance Horizon) and a performance-based concept drift-detection mechanism. Forecast error of all model variations were benchmarked in real-time against a state-of-the-art numerical weather prediction model. Performance was assessed using classical and novel evaluation metrics. Results indicate that using the Variance Horizon reduced computational usage by more than 50\%, while increasing between 0-15\% in error. Meanwhile, performance-based retraining reduced computational usage by up to 90\% while \emph{also} improving forecast error by up to 10\%. Finally, the combination of both the Variance Horizon and performance-based retraining outperformed other model configurations by up to 99.7\% when considering error normalized to computational usage.

ICML 2023 Topological Deep Learning Challenge : Design and Results

  • paper_url: http://arxiv.org/abs/2309.15188
  • repo_url: https://github.com/pyt-team/topomodelx
  • paper_authors: Mathilde Papillon, Mustafa Hajij, Florian Frantzen, Josef Hoppe, Helen Jenne, Johan Mathe, Audun Myers, Theodore Papamarkou, Michael T. Schaub, Ghada Zamzmi, Tolga Birdal, Tamal Dey, Tim Doster, Tegan Emerson, Gurusankar Gopalakrishnan, Devendra Govil, Vincent Grande, Aldo Guzmán-Sáenz, Henry Kvinge, Neal Livesay, Jan Meisner, Soham Mukherjee, Shreyas N. Samaga, Karthikeyan Natesan Ramamurthy, Maneel Reddy Karri, Paul Rosen, Sophia Sanborn, Michael Scholkemper, Robin Walters, Jens Agerberg, Georg Bökman, Sadrodin Barikbin, Claudio Battiloro, Gleb Bazhenov, Guillermo Bernardez, Aiden Brent, Sergio Escalera, Simone Fiorellino, Dmitrii Gavrilev, Mohammed Hassanin, Paul Häusner, Odin Hoff Gardaa, Abdelwahed Khamis, Manuel Lecha, German Magai, Tatiana Malygina, Pavlo Melnyk, Rubén Ballester, Kalyan Nadimpalli, Alexander Nikitin, Abraham Rabinowitz, Alessandro Salatiello, Simone Scardapane, Luca Scofano, Suraj Singh, Jens Sjölund, Pavel Snopov, Indro Spinelli, Lev Telyatnikov, Lucia Testa, Maosheng Yang, Yixiao Yue, Olga Zaghen, Ali Zia, Nina Miolane
  • for: This paper documents the computational challenge on topological deep learning hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning.
  • methods: Participants contributed open-source implementations of topological neural networks to the Python packages TopoNetX (data processing) and TopoModelX (deep learning).
  • results: The challenge attracted 28 qualifying submissions, and the paper summarizes its main findings.
    Abstract This paper presents the computational challenge on topological deep learning that was hosted within the ICML 2023 Workshop on Topology and Geometry in Machine Learning. The competition asked participants to provide open-source implementations of topological neural networks from the literature by contributing to the python packages TopoNetX (data processing) and TopoModelX (deep learning). The challenge attracted twenty-eight qualifying submissions in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.

Monitoring Machine Learning Models: Online Detection of Relevant Deviations

  • paper_url: http://arxiv.org/abs/2309.15187
  • repo_url: None
  • paper_authors: Florian Heinrichs
  • for: The paper provides a reliable way to detect relevant degradations in machine learning model performance so that the models' reliability can be maintained.
  • methods: A sequential monitoring scheme is proposed that accounts for the temporal dependence of the measured model quality, reducing unnecessary alerts and overcoming the multiple testing problem.
  • results: Empirical results show that the method detects relevant changes in model quality better than benchmark methods.
    Abstract Machine learning models are essential tools in various domains, but their performance can degrade over time due to changes in data distribution or other factors. On one hand, detecting and addressing such degradations is crucial for maintaining the models' reliability. On the other hand, given enough data, any arbitrary small change of quality can be detected. As interventions, such as model re-training or replacement, can be expensive, we argue that they should only be carried out when changes exceed a given threshold. We propose a sequential monitoring scheme to detect these relevant changes. The proposed method reduces unnecessary alerts and overcomes the multiple testing problem by accounting for temporal dependence of the measured model quality. Conditions for consistency and specified asymptotic levels are provided. Empirical validation using simulated and real data demonstrates the superiority of our approach in detecting relevant changes in model quality compared to benchmark methods. Our research contributes a practical solution for distinguishing between minor fluctuations and meaningful degradations in machine learning model performance, ensuring their reliability in dynamic environments.
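
    Code sketch: the paper's sequential scheme with its temporal-dependence correction is not reproduced here; the sketch below only illustrates the "relevant change" idea: alert when a rolling estimate of model quality drifts from its baseline by more than a practically meaningful threshold, rather than on any statistically detectable deviation.

```python
import numpy as np

def monitor(quality_stream, baseline, delta=0.05, window=50):
    """Return the first time a rolling mean of a quality metric (e.g. accuracy)
    falls more than `delta` below `baseline`; minor fluctuations are ignored."""
    buf = []
    for t, q in enumerate(quality_stream):
        buf.append(q)
        if len(buf) > window:
            buf.pop(0)
        if len(buf) == window and baseline - np.mean(buf) > delta:
            return t
    return None

rng = np.random.default_rng(1)
stream = np.concatenate([rng.normal(0.90, 0.02, 300),    # stable period
                         rng.normal(0.82, 0.02, 300)])   # relevant degradation
print(monitor(stream, baseline=0.90))
```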

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem

  • paper_url: http://arxiv.org/abs/2309.15111
  • repo_url: None
  • paper_authors: Margalit Glasgow
  • for: The paper studies how minibatch stochastic gradient descent (SGD) can efficiently learn two-layer neural networks.
  • methods: Standard minibatch SGD with the logistic loss is used to train both layers of a two-layer ReLU network simultaneously.
  • results: The paper proves that with data drawn from the $d$-dimensional Boolean hypercube labeled by the quadratic "XOR" function, a two-layer network can be trained to population error $o(1)$ with $d\,\text{polylog}(d)$ samples; this is the first $\tilde{O}(d)$ sample complexity result for efficiently learning the XOR function on a standard neural network with standard training.
    Abstract In this work, we consider the optimization process of minibatch stochastic gradient descent (SGD) on a 2-layer neural network with data separated by a quadratic ground truth function. We prove that with data drawn from the $d$-dimensional Boolean hypercube labeled by the quadratic ``XOR'' function $y = -x_ix_j$, it is possible to train to a population error $o(1)$ with $d \:\text{polylog}(d)$ samples. Our result considers simultaneously training both layers of the two-layer-neural network with ReLU activations via standard minibatch SGD on the logistic loss. To our knowledge, this work is the first to give a sample complexity of $\tilde{O}(d)$ for efficiently learning the XOR function on isotropic data on a standard neural network with standard training. Our main technique is showing that the network evolves in two phases: a $\textit{signal-finding}$ phase where the network is small and many of the neurons evolve independently to find features, and a $\textit{signal-heavy}$ phase, where SGD maintains and balances the features. We leverage the simultaneous training of the layers to show that it is sufficient for only a small fraction of the neurons to learn features, since those neurons will be amplified by the simultaneous growth of their second layer weights.
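
    Code sketch: a sketch of the analysed setting: inputs on the Boolean hypercube, the quadratic "XOR" label $y = -x_ix_j$ used as a $\pm 1$ class, and a two-layer ReLU network with both layers trained jointly by minibatch SGD on the logistic loss. The width, learning rate, and step counts are illustrative.

```python
import torch
import torch.nn as nn

d, width, batch = 20, 128, 64
net = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
loss_fn = nn.SoftMarginLoss()                  # logistic loss for +/-1 labels

def sample(n):
    x = torch.randint(0, 2, (n, d)).float() * 2 - 1    # Boolean hypercube {-1, +1}^d
    y = -x[:, 0] * x[:, 1]                              # quadratic "XOR" label
    return x, y

for step in range(5000):                       # minibatch SGD trains both layers
    x, y = sample(batch)
    loss = loss_fn(net(x).squeeze(-1), y)
    opt.zero_grad(); loss.backward(); opt.step()

x_test, y_test = sample(5000)
acc = (torch.sign(net(x_test).squeeze(-1)) == y_test).float().mean()
print(f"test accuracy: {acc:.3f}")
```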

Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs

  • paper_url: http://arxiv.org/abs/2309.15096
  • repo_url: None
  • paper_authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci
  • for: The paper contributes to the theoretical analysis of deep neural networks, specifically the neural tangent kernel (NTK) view of SGD training and the globally optimizable convex reformulations of regularized ReLU network training objectives.
  • methods: The convex program for the gated ReLU network is interpreted as a Multiple Kernel Learning (MKL) model with a weighted data masking feature map, and iterative reweighting is used to connect it to the NTK.
  • results: For a particular choice of mask weights that do not depend on the learning targets, the resulting kernel is equivalent to the NTK of the gated ReLU network on the training data, so the NTK cannot outperform the optimal MKL kernel on the training set; iterative reweighting improves the NTK-induced weights to recover the optimal MKL kernel, which is equivalent to the solution of the exact convex reformulation, and numerical simulations corroborate the theory.
    Abstract Recently, theoretical analyses of deep neural networks have broadly focused on two directions: 1) Providing insight into neural network training by SGD in the limit of infinite hidden-layer width and infinitesimally small learning rate (also known as gradient flow) via the Neural Tangent Kernel (NTK), and 2) Globally optimizing the regularized training objective via cone-constrained convex reformulations of ReLU networks. The latter research direction also yielded an alternative formulation of the ReLU network, called a gated ReLU network, that is globally optimizable via efficient unconstrained convex programs. In this work, we interpret the convex program for this gated ReLU network as a Multiple Kernel Learning (MKL) model with a weighted data masking feature map and establish a connection to the NTK. Specifically, we show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data. A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set. By using iterative reweighting, we improve the weights induced by the NTK to obtain the optimal MKL kernel which is equivalent to the solution of the exact convex reformulation of the gated ReLU network. We also provide several numerical simulations corroborating our theory. Additionally, we provide an analysis of the prediction error of the resulting optimal kernel via consistency results for the group lasso.

Automated Detection of Persistent Inflammatory Biomarkers in Post-COVID-19 Patients Using Machine Learning Techniques

  • paper_url: http://arxiv.org/abs/2309.15838
  • repo_url: None
  • paper_authors: Ghizal Fatima, Fadhil G. Al-Amran, Maitham G. Yousif
  • for: The study explores whether machine learning techniques can automatically identify persistent inflammatory biomarkers in post-COVID-19 patients, supporting early diagnosis and personalized treatment strategies.
  • methods: Several machine learning algorithms are used, including logistic regression, random forests, support vector machines, and gradient boosting, with rigorous data preprocessing and feature selection to optimize the dataset for analysis.
  • results: The models identify patients with persistent inflammation with high accuracy and precision, and can serve as valuable tools for healthcare providers, facilitating early diagnosis and personalized treatment and ultimately improving post-acute COVID-19 care and patient well-being.
    Abstract The COVID-19 pandemic has left a lasting impact on individuals, with many experiencing persistent symptoms, including inflammation, in the post-acute phase of the disease. Detecting and monitoring these inflammatory biomarkers is critical for timely intervention and improved patient outcomes. This study employs machine learning techniques to automate the identification of persistent inflammatory biomarkers in 290 post-COVID-19 patients, based on medical data collected from hospitals in Iraq. The data encompassed a wide array of clinical parameters, such as C-reactive protein and interleukin-6 levels, patient demographics, comorbidities, and treatment histories. Rigorous data preprocessing and feature selection processes were implemented to optimize the dataset for machine learning analysis. Various machine learning algorithms, including logistic regression, random forests, support vector machines, and gradient boosting, were deployed to construct predictive models. These models exhibited promising results, showcasing high accuracy and precision in the identification of patients with persistent inflammation. The findings of this study underscore the potential of machine learning in automating the detection of persistent inflammatory biomarkers in post-COVID-19 patients. These models can serve as valuable tools for healthcare providers, facilitating early diagnosis and personalized treatment strategies for individuals at risk of persistent inflammation, ultimately contributing to improved post-acute COVID-19 care and patient well-being. Keywords: COVID-19, post-COVID-19, inflammation, biomarkers, machine learning, early detection.
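
    Code sketch: a sketch of the modelling step with scikit-learn: standard preprocessing, the classifier families named in the abstract, and cross-validated AUC. The features and labels below are synthetic placeholders, since the hospital dataset is not public.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# placeholder features standing in for CRP, IL-6, demographics, comorbidities
rng = np.random.default_rng(0)
df = pd.DataFrame({"crp": rng.gamma(2, 5, 290), "il6": rng.gamma(2, 3, 290),
                   "age": rng.integers(20, 80, 290), "diabetes": rng.integers(0, 2, 290)})
y = (df["crp"] + rng.normal(0, 3, 290) > 12).astype(int)   # synthetic label

for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
                  ("gb", GradientBoostingClassifier(random_state=0))]:
    pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
    scores = cross_val_score(pipe, df, y, cv=5, scoring="roc_auc")
    print(name, scores.mean())
```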

Identifying Simulation Model Through Alternative Techniques for a Medical Device Assembly Process

  • paper_url: http://arxiv.org/abs/2309.15094
  • repo_url: None
  • paper_authors: Fatemeh Kakavandi
  • for: The paper explores two distinct approaches for identifying and approximating a simulation model of a production process, in particular the snap process crucial to medical device assembly.
  • methods: One approach identifies the model using spline functions, and the other uses machine learning (ML) models.
  • results: Both approaches can create adaptable models that accurately represent the snap process and accommodate diverse scenarios; such models help deepen process understanding and aid decision-making, especially when data availability is limited.
    Abstract This scientific paper explores two distinct approaches for identifying and approximating the simulation model, particularly in the context of the snap process crucial to medical device assembly. Simulation models play a pivotal role in providing engineers with insights into industrial processes, enabling experimentation and troubleshooting before physical assembly. However, their complexity often results in time-consuming computations. To mitigate this complexity, we present two distinct methods for identifying simulation models: one utilizing Spline functions and the other harnessing Machine Learning (ML) models. Our goal is to create adaptable models that accurately represent the snap process and can accommodate diverse scenarios. Such models hold promise for enhancing process understanding and aiding in decision-making, especially when data availability is limited.
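
    Code sketch: a sketch of the first approach on synthetic data: fit a smoothing spline to observed process measurements (here a made-up force-versus-displacement curve for a snap fit) so that new scenarios can be evaluated cheaply from the surrogate. The data and smoothing level are placeholders.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# synthetic snap-process curve: force rises, peaks at engagement, then drops
x = np.linspace(0.0, 1.0, 80)                      # normalized displacement
force = np.where(x < 0.7, 5.0 * x, 3.5 - 8.0 * (x - 0.7)) + np.random.normal(0, 0.1, 80)

spline = UnivariateSpline(x, force, s=0.5)         # s controls the smoothing trade-off
x_new = np.linspace(0.0, 1.0, 500)
print(float(spline(0.7)), spline(x_new).max())     # query the surrogate model cheaply
```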

Single Biological Neurons as Temporally Precise Spatio-Temporal Pattern Recognizers

  • paper_url: http://arxiv.org/abs/2309.15090
  • repo_url: None
  • paper_authors: David Beniaguev
  • for: The central idea of the thesis is that single neurons in the brain should be regarded as temporally precise and highly complex spatio-temporal pattern recognizers.
  • methods: The thesis examines the computational and biophysical properties of single neurons to explore the circuits they compose and how information is encoded by neuronal activity in the brain.
  • results: The work shows that these "low-level" computational properties of single neurons have substantial system-wide ramifications, and that a single neuron can be taught to perform a nonlinear XOR operation with a simple, biologically plausible learning rule.
    Abstract This PhD thesis is focused on the central idea that single neurons in the brain should be regarded as temporally precise and highly complex spatio-temporal pattern recognizers. This is opposed to the prevalent view of biological neurons as simple and mainly spatial pattern recognizers by most neuroscientists today. In this thesis, I will attempt to demonstrate that this is an important distinction, predominantly because the above-mentioned computational properties of single neurons have far-reaching implications with respect to the various brain circuits that neurons compose, and on how information is encoded by neuronal activity in the brain. Namely, that these particular "low-level" details at the single neuron level have substantial system-wide ramifications. In the introduction we will highlight the main components that comprise a neural microcircuit that can perform useful computations and illustrate the inter-dependence of these components from a system perspective. In chapter 1 we discuss the great complexity of the spatio-temporal input-output relationship of cortical neurons that are the result of morphological structure and biophysical properties of the neuron. In chapter 2 we demonstrate that single neurons can generate temporally precise output patterns in response to specific spatio-temporal input patterns with a very simple biologically plausible learning rule. In chapter 3, we use the differentiable deep network analog of a realistic cortical neuron as a tool to approximate the gradient of the output of the neuron with respect to its input and use this capability in an attempt to teach the neuron to perform nonlinear XOR operation. In chapter 4 we expand chapter 3 to describe extension of our ideas to neuronal networks composed of many realistic biological spiking neurons that represent either small microcircuits or entire brain regions.

On Excess Risk Convergence Rates of Neural Network Classifiers

  • paper_url: http://arxiv.org/abs/2309.15075
  • repo_url: None
  • paper_authors: Hyunouk Ko, Namjoon Suh, Xiaoming Huo
  • for: This paper studies the performance of plug-in classifiers based on neural networks in a binary classification setting, with a focus on their excess risks.
  • methods: The paper uses a more general scenario that resembles actual practice, with the function class including the Barron functions as a proper subset, and the neural network classifier is constructed as the minimizer of a surrogate loss.
  • results: The paper obtains a dimension-free, uniform rate of convergence for the excess risk, and shows that the rate is minimax optimal up to a logarithmic factor. The paper also demonstrates the effect of the margin assumption in this regime.
    Abstract The recent success of neural networks in pattern recognition and classification problems suggests that neural networks possess qualities distinct from other more classical classifiers such as SVMs or boosting classifiers. This paper studies the performance of plug-in classifiers based on neural networks in a binary classification setting as measured by their excess risks. Compared to the typical settings imposed in the literature, we consider a more general scenario that resembles actual practice in two respects: first, the function class to be approximated includes the Barron functions as a proper subset, and second, the neural network classifier constructed is the minimizer of a surrogate loss instead of the $0$-$1$ loss so that gradient descent-based numerical optimizations can be easily applied. While the class of functions we consider is quite large that optimal rates cannot be faster than $n^{-\frac{1}{3}$, it is a regime in which dimension-free rates are possible and approximation power of neural networks can be taken advantage of. In particular, we analyze the estimation and approximation properties of neural networks to obtain a dimension-free, uniform rate of convergence for the excess risk. Finally, we show that the rate obtained is in fact minimax optimal up to a logarithmic factor, and the minimax lower bound shows the effect of the margin assumption in this regime.

Targeting Relative Risk Heterogeneity with Causal Forests

  • paper_url: http://arxiv.org/abs/2309.15793
  • repo_url: https://github.com/vshirvaikar/rrcf
  • paper_authors: Vik Shirvaikar, Chris Holmes
  • for: This paper focuses on the problem of treatment effect heterogeneity (TEH) in clinical trial analysis, and proposes a method for modifying causal forests to target relative risk using a novel node-splitting procedure based on generalized linear model (GLM) comparison.
  • methods: The proposed method uses a modified version of causal forests, which is a highly popular method for detecting TEH, but with a focus on relative risk instead of absolute risk. The method uses a novel node-splitting procedure based on GLM comparison to capture nuance in the relative risk.
  • results: The results of the paper show that the proposed relative risk causal forests method can capture otherwise unobserved sources of heterogeneity, as demonstrated on simulated and real-world data.
    Abstract Treatment effect heterogeneity (TEH), or variability in treatment effect for different subgroups within a population, is of significant interest in clinical trial analysis. Causal forests (Wager and Athey, 2018) is a highly popular method for this problem, but like many other methods for detecting TEH, its criterion for separating subgroups focuses on differences in absolute risk. This can dilute statistical power by masking nuance in the relative risk, which is often a more appropriate quantity of clinical interest. In this work, we propose and implement a methodology for modifying causal forests to target relative risk using a novel node-splitting procedure based on generalized linear model (GLM) comparison. We present results on simulated and real-world data that suggest relative risk causal forests can capture otherwise unobserved sources of heterogeneity.

QUILT: Effective Multi-Class Classification on Quantum Computers Using an Ensemble of Diverse Quantum Classifiers

  • paper_url: http://arxiv.org/abs/2309.15056
  • repo_url: None
  • paper_authors: Daniel Silver, Tirthak Patel, Devesh Tiwari
  • for: The paper describes Quilt, a framework for performing multi-class classification tasks designed to work effectively on current error-prone quantum computers.
  • methods: Quilt uses an ensemble of diverse quantum classifiers and is evaluated on real quantum machines as well as under projected noise levels as quantum machines become less noisy.
  • results: Quilt demonstrates up to 85% multi-class classification accuracy with the MNIST dataset on a five-qubit system.
    Abstract Quantum computers can theoretically have significant acceleration over classical computers; but, the near-future era of quantum computing is limited due to small number of qubits that are also error prone. Quilt is a framework for performing multi-class classification task designed to work effectively on current error-prone quantum computers. Quilt is evaluated with real quantum machines as well as with projected noise levels as quantum machines become more noise-free. Quilt demonstrates up to 85% multi-class classification accuracy with the MNIST dataset on a five-qubit system.

Synthia’s Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio

  • paper_url: http://arxiv.org/abs/2309.15024
  • repo_url: https://github.com/cynthpie/synthia_melody
  • paper_authors: Chia-Hsin Lin, Charles Jones, Björn W. Schuller, Harry Coppock
  • for: The study provides an audio data generation framework free of observational biases, for testing how robust audio deep learning models are to different levels of distribution shift.
  • methods: A new data generation framework called Synthia's melody can generate an infinite variety of 4-second melodies with user-specified confounding structures characterised by musical keys, timbre, and loudness.
  • results: Evaluations show that Synthia's melody provides a robust, bias-free testbed for examining the susceptibility of acoustic deep learning models to distribution shift.
    Abstract Despite significant advancements in deep learning for vision and natural language, unsupervised domain adaptation in audio remains relatively unexplored. We, in part, attribute this to the lack of an appropriate benchmark dataset. To address this gap, we present Synthia's melody, a novel audio data generation framework capable of simulating an infinite variety of 4-second melodies with user-specified confounding structures characterised by musical keys, timbre, and loudness. Unlike existing datasets collected under observational settings, Synthia's melody is free of unobserved biases, ensuring the reproducibility and comparability of experiments. To showcase its utility, we generate two types of distribution shifts-domain shift and sample selection bias-and evaluate the performance of acoustic deep learning models under these shifts. Our evaluations reveal that Synthia's melody provides a robust testbed for examining the susceptibility of these models to varying levels of distribution shift.

Tempo Adaptation in Non-stationary Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.14989
  • repo_url: None
  • paper_authors: Hyunin Lee, Yuhao Ding, Jongmin Lee, Ming Jin, Javad Lavaei, Somayeh Sojoudi
  • for: Addressing the time-synchronization issue between the agent and the environment in non-stationary RL, a crucial factor hindering real-world applications.
  • methods: A Proactively Synchronizing Tempo ($\texttt{ProST}$) framework is proposed that computes a suboptimal sequence of interaction times {$t_{1:K}$} by minimizing an upper bound on the dynamic regret.
  • results: Experiments on various high-dimensional non-stationary environments show that the $\texttt{ProST}$ framework achieves a higher online return at the suboptimal {$t_{1:K}$} than existing methods.
    Abstract We first raise and tackle a ``time synchronization'' issue between the agent and the environment in non-stationary reinforcement learning (RL), a crucial factor hindering its real-world applications. In reality, environmental changes occur over wall-clock time ($t$) rather than episode progress ($k$), where wall-clock time signifies the actual elapsed time within the fixed duration $t \in [0, T]$. In existing works, at episode $k$, the agent rolls a trajectory and trains a policy before transitioning to episode $k+1$. In the context of the time-desynchronized environment, however, the agent at time $t_{k}$ allocates $\Delta t$ for trajectory generation and training, subsequently moves to the next episode at $t_{k+1}=t_{k}+\Delta t$. Despite a fixed total number of episodes ($K$), the agent accumulates different trajectories influenced by the choice of interaction times ($t_1,t_2,...,t_K$), significantly impacting the suboptimality gap of the policy. We propose a Proactively Synchronizing Tempo ($\texttt{ProST}$) framework that computes a suboptimal sequence {$t_1,t_2,...,t_K$} (= { $t_{1:K}$}) by minimizing an upper bound on its performance measure, i.e., the dynamic regret. Our main contribution is that we show that a suboptimal {$t_{1:K}$} trades-off between the policy training time (agent tempo) and how fast the environment changes (environment tempo). Theoretically, this work develops a suboptimal {$t_{1:K}$} as a function of the degree of the environment's non-stationarity while also achieving a sublinear dynamic regret. Our experimental evaluation on various high-dimensional non-stationary environments shows that the $\texttt{ProST}$ framework achieves a higher online return at suboptimal {$t_{1:K}$} than the existing methods.

Statistical Analysis of Quantum State Learning Process in Quantum Neural Networks

  • paper_url: http://arxiv.org/abs/2309.14980
  • repo_url: https://github.com/chenghongz/lim_learning_state
  • paper_authors: Hao-kai Zhang, Chenghong Zhu, Mingrui Jing, Xin Wang
  • for: Studies whether quantum neural networks (QNNs) can learn an unknown quantum state, and shows that under certain conditions they cannot, even when starting from a high-fidelity initial state.
  • methods: Develops a no-go theorem proving that once the loss value drops below a critical threshold, the probability of avoiding local minima vanishes exponentially with the qubit count while growing only polynomially with the circuit depth; the curvature of local minima concentrates to the quantum Fisher information times a loss-dependent constant.
  • results: The results hold for any circuit structure, initialization strategy, and for both fixed ansatzes and adaptive methods, and are validated by extensive numerical simulations. They place generic limits on good initial guesses and adaptive methods for improving the learnability and scalability of QNNs, and deepen the understanding of the role of prior information in QNNs.
    Abstract Quantum neural networks (QNNs) have been a promising framework in pursuing near-term quantum advantage in various fields, where many applications can be viewed as learning a quantum state that encodes useful data. As a quantum analog of probability distribution learning, quantum state learning is theoretically and practically essential in quantum machine learning. In this paper, we develop a no-go theorem for learning an unknown quantum state with QNNs even starting from a high-fidelity initial state. We prove that when the loss value is lower than a critical threshold, the probability of avoiding local minima vanishes exponentially with the qubit count, while only grows polynomially with the circuit depth. The curvature of local minima is concentrated to the quantum Fisher information times a loss-dependent constant, which characterizes the sensibility of the output state with respect to parameters in QNNs. These results hold for any circuit structures, initialization strategies, and work for both fixed ansatzes and adaptive methods. Extensive numerical simulations are performed to validate our theoretical results. Our findings place generic limits on good initial guesses and adaptive methods for improving the learnability and scalability of QNNs, and deepen the understanding of prior information's role in QNNs.

Context-Aware Generative Models for Prediction of Aircraft Ground Tracks

  • paper_url: http://arxiv.org/abs/2309.14957
  • repo_url: None
  • paper_authors: Nick Pepper, George De Ath, Marc Thomas, Richard Everson, Tim Dodwell
  • for: Trajectory prediction (TP) plays an important role in supporting the decision-making of Air Traffic Controllers (ATCOs).
  • methods: Proposes a generative method for lateral TP that uses probabilistic machine learning to model the epistemic uncertainty arising from unknown pilot behaviour and ATCO intentions; ground tracks are represented by a piecewise linear model whose control points are produced by a generative model conditioned on partial context, trained per sector.
  • results: Trained and tested on a week of surveillance data from a busy sector of the United Kingdom's upper airspace, a Bayesian neural network using the Laplace approximation generated the most plausible trajectories for emulating the flow of traffic through the sector.
    Abstract Trajectory prediction (TP) plays an important role in supporting the decision-making of Air Traffic Controllers (ATCOs). Traditional TP methods are deterministic and physics-based, with parameters that are calibrated using aircraft surveillance data harvested across the world. These models are, therefore, agnostic to the intentions of the pilots and ATCOs, which can have a significant effect on the observed trajectory, particularly in the lateral plane. This work proposes a generative method for lateral TP, using probabilistic machine learning to model the effect of the epistemic uncertainty arising from the unknown effect of pilot behaviour and ATCO intentions. The models are trained to be specific to a particular sector, allowing local procedures such as coordinated entry and exit points to be modelled. A dataset comprising a week's worth of aircraft surveillance data, passing through a busy sector of the United Kingdom's upper airspace, was used to train and test the models. Specifically, a piecewise linear model was used as a functional, low-dimensional representation of the ground tracks, with its control points determined by a generative model conditioned on partial context. It was found that, of the investigated models, a Bayesian Neural Network using the Laplace approximation was able to generate the most plausible trajectories in order to emulate the flow of traffic through the sector.
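The abstract describes ground tracks as piecewise linear curves whose control points are produced by a generative model conditioned on partial context. The sketch below shows only the deterministic interpolation step of that representation; the conditional sampling of control points is replaced by fixed, hypothetical values.

```python
import numpy as np

def piecewise_linear_track(control_points: np.ndarray, n_samples: int = 200) -> np.ndarray:
    """Interpolate a 2D ground track (x, y) from ordered control points.

    control_points: array of shape (k, 2) in track-relative coordinates.
    Returns an array of shape (n_samples, 2).
    """
    k = len(control_points)
    # Parameterize control points uniformly along [0, 1]; a fitted model could
    # instead place them by along-track distance.
    s_ctrl = np.linspace(0.0, 1.0, k)
    s = np.linspace(0.0, 1.0, n_samples)
    x = np.interp(s, s_ctrl, control_points[:, 0])
    y = np.interp(s, s_ctrl, control_points[:, 1])
    return np.stack([x, y], axis=1)

# Hypothetical control points, e.g. entry fix -> mid-sector turn -> exit fix.
ctrl = np.array([[0.0, 0.0], [30.0, 5.0], [60.0, 20.0], [90.0, 22.0]])
track = piecewise_linear_track(ctrl)
print(track.shape)  # (200, 2)
```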

Learning Generative Models for Climbing Aircraft from Radar Data

  • paper_url: http://arxiv.org/abs/2309.14941
  • repo_url: None
  • paper_authors: Nick Pepper, Marc Thomas
  • for: Proposes a trajectory prediction method for climbing aircraft that accounts for the epistemic uncertainty in aircraft operation, which otherwise causes significant mismatch between predicted and observed trajectories.
  • methods: A data-driven generative model that enriches the standard Base of Aircraft Data (BADA) model with a functional correction to the thrust learned from data.
  • results: Predicts arrival times with 66.3% less error than BADA, generates trajectories that are realistic compared to test data, and provides confidence bounds at minimal computational cost.
    Abstract Accurate trajectory prediction (TP) for climbing aircraft is hampered by the presence of epistemic uncertainties concerning aircraft operation, which can lead to significant misspecification between predicted and observed trajectories. This paper proposes a generative model for climbing aircraft in which the standard Base of Aircraft Data (BADA) model is enriched by a functional correction to the thrust that is learned from data. The method offers three features: predictions of the arrival time with 66.3% less error when compared to BADA; generated trajectories that are realistic when compared to test data; and a means of computing confidence bounds for minimal computational cost.
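As a rough illustration of the enrichment idea, the toy climb model below multiplies a nominal thrust profile by a learned correction function of altitude and integrates the resulting rate of climb. The equations and constants are simplified stand-ins, not BADA's actual performance model or the paper's learned correction.

```python
import numpy as np

def climb_profile(thrust_correction, mass=60000.0, t_end=600.0, dt=1.0):
    """Toy total-energy climb model, not BADA's actual equations: a nominal thrust
    profile is multiplied by a learned correction, then the rate of climb is
    integrated. `thrust_correction(h)` maps altitude [m] to a multiplicative factor."""
    g, v = 9.81, 140.0                                    # constant true airspeed [m/s]
    h, altitudes = 0.0, []
    for _ in np.arange(0.0, t_end, dt):
        thrust_nominal = 120e3 * (1.0 - h / 15000.0)      # toy nominal thrust [N]
        drag = 70e3                                        # toy constant drag [N]
        thrust = thrust_correction(h) * thrust_nominal
        rocd = max((thrust - drag) * v / (mass * g), 0.0)  # rate of climb [m/s]
        h += rocd * dt
        altitudes.append(h)
    return np.array(altitudes)

# A "learned" correction would come from data; here a made-up smooth function.
learned_correction = lambda h: 0.95 + 0.05 * np.exp(-h / 8000.0)
profile = climb_profile(learned_correction)
print(f"altitude after 10 min: {profile[-1]:.0f} m")
```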

Parallel Multi-Objective Hyperparameter Optimization with Uniform Normalization and Bounded Objectives

  • paper_url: http://arxiv.org/abs/2309.14936
  • repo_url: None
  • paper_authors: Romain Egele, Tyler Chang, Yixuan Sun, Venkatram Vishwanath, Prasanna Balaprakash
  • for: Optimizing machine learning models with respect to multiple objectives such as accuracy, fairness, calibration, privacy, latency, and memory consumption.
  • methods: A multi-objective Bayesian optimization (MoBO) algorithm for hyperparameter tuning that combines uniform objective normalization with randomized weights in scalarization, and imposes constraints on the objectives to avoid exploring unnecessary configurations (e.g., insufficient accuracy).
  • results: Improves the efficiency of multi-objective hyperparameter optimization, and a parallelized version achieves a 5x speed-up when using 16x more workers.
    Abstract Machine learning (ML) methods offer a wide range of configurable hyperparameters that have a significant influence on their performance. While accuracy is a commonly used performance objective, in many settings, it is not sufficient. Optimizing the ML models with respect to multiple objectives such as accuracy, confidence, fairness, calibration, privacy, latency, and memory consumption is becoming crucial. To that end, hyperparameter optimization, the approach to systematically optimize the hyperparameters, which is already challenging for a single objective, is even more challenging for multiple objectives. In addition, the differences in objective scales, the failures, and the presence of outlier values in objectives make the problem even harder. We propose a multi-objective Bayesian optimization (MoBO) algorithm that addresses these problems through uniform objective normalization and randomized weights in scalarization. We increase the efficiency of our approach by imposing constraints on the objective to avoid exploring unnecessary configurations (e.g., insufficient accuracy). Finally, we leverage an approach to parallelize the MoBO which results in a 5x speed-up when using 16x more workers.
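Two of the ingredients named in the abstract, uniform objective normalization and randomized scalarization weights, can be sketched as below. This illustrates those two steps only, not the authors' full MoBO implementation, and the objective names are hypothetical.

```python
import numpy as np

def uniform_normalize(history: np.ndarray) -> np.ndarray:
    """Map each objective column to [0, 1] via its empirical CDF (rank-based),
    which makes differently scaled objectives and outliers comparable."""
    ranks = np.argsort(np.argsort(history, axis=0), axis=0)
    return (ranks + 0.5) / history.shape[0]

def random_scalarization(history: np.ndarray, rng) -> np.ndarray:
    """Collapse a (n_points, n_objectives) history into one score per point using
    random weights on the simplex (all objectives assumed to be minimized)."""
    norm = uniform_normalize(history)
    w = rng.dirichlet(np.ones(history.shape[1]))
    return norm @ w   # a single-objective surrogate can then be fit to this score

rng = np.random.default_rng(0)
# Hypothetical history: columns = (validation error, latency ms, memory MB).
hist = np.column_stack([rng.uniform(0.1, 0.5, 50),
                        rng.lognormal(3, 1, 50),
                        rng.uniform(100, 4000, 50)])
scores = random_scalarization(hist, rng)
print(scores.shape, scores.min(), scores.max())
```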

Verifiable Learned Behaviors via Motion Primitive Composition: Applications to Scooping of Granular Media

  • paper_url: http://arxiv.org/abs/2309.14894
  • repo_url: None
  • paper_authors: Andrew Benton, Eugen Solowjow, Prithvi Akella
  • for: Expediting the adoption of industrial robots through a behavior model that reliably generates robot behaviors from natural language inputs in real time.
  • methods: A natural-language behavior abstractor generates behaviors by synthesizing a directed graph over provided motion primitives; when the component primitives are constructed according to the specified criteria, the resulting behaviors are probabilistically verifiable by construction.
  • results: Demonstrated in simulation on an exploration task and on hardware with a robot scooping granular media.
    Abstract A robotic behavior model that can reliably generate behaviors from natural language inputs in real time would substantially expedite the adoption of industrial robots due to enhanced system flexibility. To facilitate these efforts, we construct a framework in which learned behaviors, created by a natural language abstractor, are verifiable by construction. Leveraging recent advancements in motion primitives and probabilistic verification, we construct a natural-language behavior abstractor that generates behaviors by synthesizing a directed graph over the provided motion primitives. If these component motion primitives are constructed according to the criteria we specify, the resulting behaviors are probabilistically verifiable. We demonstrate this verifiable behavior generation capacity in both simulation on an exploration task and on hardware with a robot scooping granular media.

Credit Card Fraud Detection with Subspace Learning-based One-Class Classification

  • paper_url: http://arxiv.org/abs/2309.14880
  • repo_url: None
  • paper_authors: Zaffar Zaffar, Fahad Sohrab, Juho Kanniainen, Moncef Gabbouj
  • For: This paper proposes an automated credit card fraud detection approach based on one-class classification, motivated by the substantial losses caused by increasingly sophisticated fraud in a digitalized commerce landscape and by the limitations of existing detection strategies.
  • Methods: Subspace learning-based One-Class Classification (OCC) algorithms, which handle highly imbalanced data and can anticipate transactions carried out by yet-to-be-invented fraud techniques; subspace learning is integrated into the data description so the models transform the data into a lower-dimensional subspace optimized for OCC.
  • Results: Rigorous experimentation and analysis validate that the approach helps tackle the curse of dimensionality and the imbalanced nature of credit card data for automatic fraud detection, mitigating financial losses caused by fraudulent activities.
    Abstract In an increasingly digitalized commerce landscape, the proliferation of credit card fraud and the evolution of sophisticated fraudulent techniques have led to substantial financial losses. Automating credit card fraud detection is a viable way to accelerate detection, reducing response times and minimizing potential financial losses. However, addressing this challenge is complicated by the highly imbalanced nature of the datasets, where genuine transactions vastly outnumber fraudulent ones. Furthermore, the high number of dimensions within the feature set gives rise to the ``curse of dimensionality". In this paper, we investigate subspace learning-based approaches centered on One-Class Classification (OCC) algorithms, which excel in handling imbalanced data distributions and possess the capability to anticipate and counter the transactions carried out by yet-to-be-invented fraud techniques. The study highlights the potential of subspace learning-based OCC algorithms by investigating the limitations of current fraud detection strategies and the specific challenges of credit card fraud detection. These algorithms integrate subspace learning into the data description; hence, the models transform the data into a lower-dimensional subspace optimized for OCC. Through rigorous experimentation and analysis, the study validated that the proposed approach helps tackle the curse of dimensionality and the imbalanced nature of credit card data for automatic fraud detection to mitigate financial losses caused by fraudulent activities.
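The paper's subspace learning-based OCC methods learn the projection jointly with the one-class description. As a simplified stand-in for that pipeline, the sketch below projects genuine transactions to a low-dimensional subspace with PCA and fits scikit-learn's One-Class SVM there; the data are synthetic and purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical data: many genuine transactions, few fraudulent ones, 30 features.
genuine = rng.normal(0.0, 1.0, size=(5000, 30))
fraud = rng.normal(2.5, 1.5, size=(25, 30))

# Step 1: learn a low-dimensional subspace from genuine transactions only.
# (The paper learns the subspace jointly with the one-class model; PCA is a
#  simplified stand-in for that step.)
subspace = PCA(n_components=5).fit(genuine)

# Step 2: fit a one-class model in the subspace and score unseen transactions.
occ = OneClassSVM(nu=0.01, gamma="scale").fit(subspace.transform(genuine))

test = np.vstack([genuine[:100], fraud])
pred = occ.predict(subspace.transform(test))     # +1 = genuine, -1 = flagged
print("flagged among genuine:", int((pred[:100] == -1).sum()))
print("flagged among fraud:  ", int((pred[100:] == -1).sum()))
```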

Cluster Exploration using Informative Manifold Projections

  • paper_url: http://arxiv.org/abs/2309.14857
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: Stavros Gerolymatos, Xenophon Evangelopoulos, Vladimir Gusev, John Y. Goulermas
  • for: Proposes a dimensionality reduction method for visually exploring high-dimensional data and uncovering its cluster structure while taking a practitioner's prior knowledge into account.
  • methods: Optimizes a linear combination of two objectives, contrastive PCA, which discounts the structure associated with the prior information, and kurtosis projection pursuit, which ensures meaningful data separation in the obtained embeddings, formulated as a manifold optimization problem.
  • results: Empirical validation across a variety of datasets and three distinct types of prior knowledge shows that the method reveals remaining underlying structure, and an automated framework supports iterative visual exploration of high-dimensional data.
    Abstract Dimensionality reduction (DR) is one of the key tools for the visual exploration of high-dimensional data and uncovering its cluster structure in two- or three-dimensional spaces. The vast majority of DR methods in the literature do not take into account any prior knowledge a practitioner may have regarding the dataset under consideration. We propose a novel method to generate informative embeddings which not only factor out the structure associated with different kinds of prior knowledge but also aim to reveal any remaining underlying structure. To achieve this, we employ a linear combination of two objectives: firstly, contrastive PCA that discounts the structure associated with the prior information, and secondly, kurtosis projection pursuit which ensures meaningful data separation in the obtained embeddings. We formulate this task as a manifold optimization problem and validate it empirically across a variety of datasets considering three distinct types of prior knowledge. Lastly, we provide an automated framework to perform iterative visual exploration of high-dimensional data.
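The sketch below illustrates only the contrastive-PCA ingredient of the proposed objective: directions that retain target variance while discounting the variance explained by the prior-knowledge ("background") data. The kurtosis term is evaluated on the resulting embedding rather than jointly optimized on the manifold, so this is a simplification of the paper's method, with synthetic data.

```python
import numpy as np
from scipy.stats import kurtosis

def contrastive_pca(target: np.ndarray, background: np.ndarray,
                    alpha: float = 1.0, n_components: int = 2) -> np.ndarray:
    """Directions of high target variance and low background ("prior") variance."""
    ct = np.cov(target, rowvar=False)
    cb = np.cov(background, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(ct - alpha * cb)
    return eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]

rng = np.random.default_rng(0)
# "Background" data captures structure tied to prior knowledge (e.g., a known
# batch effect); the target data hides extra cluster structure on top of it.
scales = np.linspace(3.0, 1.0, 10)
background = rng.normal(0.0, 1.0, size=(500, 10)) * scales
target = rng.normal(0.0, 1.0, size=(400, 10)) * scales
target[:200, 7:] += 4.0          # hidden two-cluster split in low-variance features

W = contrastive_pca(target, background, alpha=2.0)
embedding = target @ W
# Strongly negative excess kurtosis on a component indicates a bimodal projection.
print(embedding.shape, kurtosis(embedding, axis=0))
```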

Investigation of factors regarding the effects of COVID-19 pandemic on college students’ depression by quantum annealer

  • paper_url: http://arxiv.org/abs/2310.00018
  • repo_url: None
  • paper_authors: Junggu Choi, Kion Kim, Soohyun Park, Juyoen Hur, Hyunjung Yang, Younghoon Kim, Hakbae Lee, Sanghoon Han
  • for: Investigates how the COVID-19 pandemic affected college students' mental health and the complex relationships among the associated factors.
  • methods: Applies quantum annealing (QA)-based feature selection algorithms, executed on commercial D-Wave quantum computers, to multivariable data collected from 751 college students to determine how the relative importance of the associated factors changed before and after the pandemic; multivariable linear regression (MLR) and XGBoost models are used to validate the QA-based algorithms.
  • results: The QA-based algorithms show capabilities comparable to the MLR models widely used in previous factor analysis studies, and their important-factor results were validated. Pandemic-related factors (e.g., confidence in the social system) and psychological factors (e.g., decision-making in uncertain situations) were more important under post-pandemic conditions.
    Abstract Diverse cases regarding the impact, with its related factors, of the COVID-19 pandemic on mental health have been reported in previous studies. College student groups have been frequently selected as the target population in previous studies because they are easily affected by pandemics. In this study, multivariable datasets were collected from 751 college students based on the complex relationships between various mental health factors. We utilized quantum annealing (QA)-based feature selection algorithms that were executed by commercial D-Wave quantum computers to determine the changes in the relative importance of the associated factors before and after the pandemic. Multivariable linear regression (MLR) and XGBoost models were also applied to validate the QA-based algorithms. Based on the experimental results, we confirm that QA-based algorithms have comparable capabilities in factor analysis research to the MLR models that have been widely used in previous studies. Furthermore, the performance of the QA-based algorithms was validated through the important factor results from the algorithms. Pandemic-related factors (e.g., confidence in the social system) and psychological factors (e.g., decision-making in uncertain situations) were more important in post-pandemic conditions. We believe that our study will serve as a reference for researchers studying similar topics.
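Quantum-annealer feature selection is typically posed as a QUBO over binary selection variables. The sketch below shows one common way to build such a QUBO (relevance on the diagonal, redundancy off the diagonal) and solves it by brute force as a stand-in for the D-Wave annealer; the exact QUBO formulation used in this study may differ, and the data are synthetic.

```python
import itertools
import numpy as np

def build_qubo(X: np.ndarray, y: np.ndarray, k_penalty: float = 0.5) -> np.ndarray:
    """Toy QUBO for feature selection: reward features correlated with the target
    (diagonal terms), penalize redundant feature pairs (off-diagonal terms)."""
    n = X.shape[1]
    relevance = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n)])
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    Q = k_penalty * redundancy
    np.fill_diagonal(Q, -relevance)
    return Q

def brute_force_solve(Q: np.ndarray) -> np.ndarray:
    """Stand-in for the annealer: enumerate all binary selections (small n only)."""
    n, best, best_val = Q.shape[0], None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        z = np.array(bits, dtype=float)
        val = z @ Q @ z
        if val < best_val:
            best, best_val = z, val
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X[:, 0] + 0.8 * X[:, 3] + 0.1 * rng.normal(size=300)   # only features 0 and 3 matter
Q = build_qubo(X, y)
print("selected features:", np.flatnonzero(brute_force_solve(Q)))
```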

Realtime Motion Generation with Active Perception Using Attention Mechanism for Cooking Robot

  • paper_url: http://arxiv.org/abs/2309.14837
  • repo_url: https://github.com/CraftXinkali/Dungeon-Quest-HUBS
  • paper_authors: Namiko Saito, Mayu Hiramoto, Ayuna Kubo, Kanata Suzuki, Hiroshi Ito, Shigeki Sugano, Tetsuya Ogata
  • for: Enabling robots that support humans in daily life to autonomously learn, adapt to objects and environments, and perform appropriate actions, studied on the task of cooking scrambled eggs with real ingredients.
  • methods: A predictive recurrent neural network with an attention mechanism weighs the sensor input, distinguishing how important and reliable each modality is, which enables quick and efficient perception and motion generation in real time; the model is trained by learning from demonstration, allowing the robot to acquire human-like skills.
  • results: Validated on the robot Dry-AIREC, which cooked eggs with unknown ingredients; the robot changed its stirring method and direction depending on the state of the egg, stirring the whole pot at first and then, as the egg heated, switching to flipping and splitting motions targeting specific areas without these being explicitly indicated.
    Abstract To support humans in their daily lives, robots are required to autonomously learn, adapt to objects and environments, and perform the appropriate actions. We tackled on the task of cooking scrambled eggs using real ingredients, in which the robot needs to perceive the states of the egg and adjust stirring movement in real time, while the egg is heated and the state changes continuously. In previous works, handling changing objects was found to be challenging because sensory information includes dynamical, both important or noisy information, and the modality which should be focused on changes every time, making it difficult to realize both perception and motion generation in real time. We propose a predictive recurrent neural network with an attention mechanism that can weigh the sensor input, distinguishing how important and reliable each modality is, that realize quick and efficient perception and motion generation. The model is trained with learning from the demonstration, and allows the robot to acquire human-like skills. We validated the proposed technique using the robot, Dry-AIREC, and with our learning model, it could perform cooking eggs with unknown ingredients. The robot could change the method of stirring and direction depending on the status of the egg, as in the beginning it stirs in the whole pot, then subsequently, after the egg started being heated, it starts flipping and splitting motion targeting specific areas, although we did not explicitly indicate them.

OS-net: Orbitally Stable Neural Networks

  • paper_url: http://arxiv.org/abs/2309.14822
  • repo_url: None
  • paper_authors: Marieme Ngom, Carlo Graziani
  • for: Introduces a family of neural network architectures designed specifically for periodic dynamical data, describing the dynamics with ordinary differential equations.
  • methods: OS-net is a special case of Neural Ordinary Differential Equations (NODEs) that takes full advantage of adjoint-based backpropagation; ODE theory is used to derive conditions on the network weights that ensure stability of the resulting dynamics.
  • results: Applying OS-net to the Rössler and Sprott systems, which are known for period-doubling attractors and chaotic behavior, the method discovers the underlying dynamics.
    Abstract We introduce OS-net (Orbitally Stable neural NETworks), a new family of neural network architectures specifically designed for periodic dynamical data. OS-net is a special case of Neural Ordinary Differential Equations (NODEs) and takes full advantage of the adjoint method based backpropagation method. Utilizing ODE theory, we derive conditions on the network weights to ensure stability of the resulting dynamics. We demonstrate the efficacy of our approach by applying OS-net to discover the dynamics underlying the R\"{o}ssler and Sprott's systems, two dynamical systems known for their period doubling attractors and chaotic behavior.
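OS-net's contribution is the set of weight conditions that make the learned dynamics stable; those conditions are in the paper and are omitted here. The sketch below shows only the generic neural-ODE machinery the method builds on: a small parameterized vector field integrated with a fixed-step RK4 solver to roll out a trajectory.

```python
import numpy as np

def make_vector_field(rng, dim=2, hidden=16):
    """A tiny MLP vector field f(x); OS-net additionally constrains its weights
    so the learned dynamics are stable (conditions omitted in this sketch)."""
    W1 = rng.normal(0, 0.5, (hidden, dim)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (dim, hidden)); b2 = np.zeros(dim)
    return lambda x: W2 @ np.tanh(W1 @ x + b1) + b2

def rk4_rollout(f, x0, dt=0.01, steps=2000):
    """Classic 4th-order Runge-Kutta integration of dx/dt = f(x)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = xs[-1]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        xs.append(x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.stack(xs)

rng = np.random.default_rng(0)
trajectory = rk4_rollout(make_vector_field(rng), x0=[1.0, 0.0])
print(trajectory.shape)  # (2001, 2)
```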

Markov Chain Mirror Descent On Data Federation

  • paper_url: http://arxiv.org/abs/2309.14775
  • repo_url: None
  • paper_authors: Yawei Zhao
  • for: Studies stochastic mirror descent when training instances are sampled from a Markov chain, as in learning over a data federation, and proposes a new variant named MarchOn.
  • methods: In MarchOn, the model iteratively travels from a node to one of its neighbours chosen at random over the distributed network; a new analysis framework yields the best rates of convergence for convex, strongly convex, and non-convex losses.
  • results: Empirical studies evaluate the convergence of MarchOn and validate the theoretical results.
    Abstract Stochastic optimization methods such as mirror descent have wide applications due to low computational cost. Those methods have been well studied under assumption of the independent and identical distribution, and usually achieve sublinear rate of convergence. However, this assumption may be too strong and unpractical in real application scenarios. Recent researches investigate stochastic gradient descent when instances are sampled from a Markov chain. Unfortunately, few results are known for stochastic mirror descent. In the paper, we propose a new version of stochastic mirror descent termed by MarchOn in the scenario of the federated learning. Given a distributed network, the model iteratively travels from a node to one of its neighbours randomly. Furthermore, we propose a new framework to analyze MarchOn, which yields best rates of convergence for convex, strongly convex, and non-convex loss. Finally, we conduct empirical studies to evaluate the convergence of MarchOn, and validate theoretical results.
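A minimal sketch of the setting: entropic mirror descent on the simplex where, instead of i.i.d. sampling, each stochastic gradient comes from the node currently visited by a random walk over the network. The losses, topology, and step size below are illustrative choices, not the paper's exact algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 5, 10

# Hypothetical per-node quadratic losses f_i(x) = 0.5 * ||x - c_i||^2 on the simplex.
centers = rng.dirichlet(np.ones(dim), size=n_nodes)
grad = lambda i, x: x - centers[i]

# Ring topology: the model hops to a uniformly chosen neighbour (a Markov chain).
neighbours = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

x = np.full(dim, 1.0 / dim)      # start at the simplex barycenter
node = 0
for t in range(1, 5001):
    g = grad(node, x)
    # Entropic mirror descent step (exponentiated gradient), step size ~ 1/sqrt(t).
    x = x * np.exp(-(0.5 / np.sqrt(t)) * g)
    x /= x.sum()
    node = rng.choice(neighbours[node])           # Markov-chain transition

print("distance to average of centers:", np.linalg.norm(x - centers.mean(axis=0)))
```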

On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks

  • paper_url: http://arxiv.org/abs/2309.14691
  • repo_url: None
  • paper_authors: Ankur Mali, Alexander Ororbia, Daniel Kifer, Lee Giles
  • for: Examines whether second-order recurrent neural networks (RNNs) can achieve Turing completeness (TC) under realistic constraints on precision and computation time.
  • methods: Extends the theoretical foundation of second-order RNNs and proves that a class of second-order RNNs is Turing-complete with bounded time; the model directly encodes a transition table into its recurrent weights, enabling bounded-time computation and interpretability by design.
  • results: Under bounded weights and time constraints, memory-less second-order RNNs outperform vanilla RNNs and gated recurrent units at recognizing regular grammars; the paper gives an upper bound and a stability analysis on the maximum number of neurons required to recognize any class of regular grammar, supports the findings with extensive experiments on the Tomita grammars, and shows that second-order RNNs extract state machines with higher success rates than first-order RNNs.
    Abstract Artificial neural networks (ANNs) with recurrence and self-attention have been shown to be Turing-complete (TC). However, existing work has shown that these ANNs require multiple turns or unbounded computation time, even with unbounded precision in weights, in order to recognize TC grammars. However, under constraints such as fixed or bounded precision neurons and time, ANNs without memory are shown to struggle to recognize even context-free languages. In this work, we extend the theoretical foundation for the $2^{nd}$-order recurrent network ($2^{nd}$ RNN) and prove there exists a class of a $2^{nd}$ RNN that is Turing-complete with bounded time. This model is capable of directly encoding a transition table into its recurrent weights, enabling bounded time computation and is interpretable by design. We also demonstrate that $2$nd order RNNs, without memory, under bounded weights and time constraints, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars. We provide an upper bound and a stability analysis on the maximum number of neurons required by $2$nd order RNNs to recognize any class of regular grammar. Extensive experiments on the Tomita grammars support our findings, demonstrating the importance of tensor connections in crafting computationally efficient RNNs. Finally, we show $2^{nd}$ order RNNs are also interpretable by extraction and can extract state machines with higher success rates as compared to first-order RNNs. Our results extend the theoretical foundations of RNNs and offer promising avenues for future explainable AI research.
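The abstract's key construction is that a second-order RNN can encode a transition table directly in its third-order weight tensor. The sketch below does this for the simple regular language "even number of 1s" (in the spirit of the Tomita grammars); the exact parameterization in the paper may differ.

```python
import numpy as np

def second_order_step(W, h, x):
    """Second-order recurrent update: h'_k = sigmoid(sum_ij W[k, i, j] * h_i * x_j)."""
    return 1.0 / (1.0 + np.exp(-np.einsum("kij,i,j->k", W, h, x)))

# Encode the 2-state DFA for "even number of 1s" into the weight tensor.
# States: h = one-hot over {even, odd}; inputs: x = one-hot over {0, 1}.
n_states, n_symbols = 2, 2
W = np.full((n_states, n_states, n_symbols), -8.0)   # large negative = "off"
transition = {("even", "0"): "even", ("even", "1"): "odd",
              ("odd", "0"): "odd", ("odd", "1"): "even"}
idx = {"even": 0, "odd": 1, "0": 0, "1": 1}
for (state, symbol), nxt in transition.items():
    W[idx[nxt], idx[state], idx[symbol]] = 8.0       # large positive = "on"

def accepts(bits: str) -> bool:
    h = np.array([1.0, 0.0])                          # start in "even"
    for b in bits:
        x = np.eye(n_symbols)[int(b)]
        h = second_order_step(W, h, x)
        h = (h > 0.5).astype(float)                   # bounded-precision snap to a state
    return bool(h[0] > 0.5)                           # accept iff in "even"

print(accepts("11"), accepts("111"))                  # True False: two 1s vs. three 1s
```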

FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Client Devices using a Computing Power Aware Scheduler

  • paper_url: http://arxiv.org/abs/2309.14675
  • repo_url: None
  • paper_authors: Zilinghan Li, Pranshu Chaturvedi, Shilan He, Han Chen, Gagandeep Singh, Volodymyr Kindratenko, E. A. Huerta, Kibaek Kim, Ravi Madduri
  • for: Addresses device and data heterogeneity in cross-silo federated learning so that robust and generalized AI models can be trained collaboratively without a centralized data facility.
  • methods: Proposes FedCompass, a semi-asynchronous federated learning algorithm with a computing-power-aware scheduler on the server side that adaptively assigns different amounts of training work to clients based on their individual computing power, so that locally trained models arrive almost simultaneously as a group for aggregation and staleness is reduced, while the overall process remains asynchronous.
  • results: On diverse non-IID heterogeneous distributed datasets, FedCompass converges faster and reaches higher accuracy than other asynchronous algorithms while remaining more efficient than synchronous algorithms when performing federated learning on heterogeneous clients.
    Abstract Cross-silo federated learning offers a promising solution to collaboratively train robust and generalized AI models without compromising the privacy of local datasets, e.g., healthcare, financial, as well as scientific projects that lack a centralized data facility. Nonetheless, because of the disparity of computing resources among different clients (i.e., device heterogeneity), synchronous federated learning algorithms suffer from degraded efficiency when waiting for straggler clients. Similarly, asynchronous federated learning algorithms experience degradation in the convergence rate and final model accuracy on non-identically and independently distributed (non-IID) heterogeneous datasets due to stale local models and client drift. To address these limitations in cross-silo federated learning with heterogeneous clients and data, we propose FedCompass, an innovative semi-asynchronous federated learning algorithm with a computing power aware scheduler on the server side, which adaptively assigns varying amounts of training tasks to different clients using the knowledge of the computing power of individual clients. FedCompass ensures that multiple locally trained models from clients are received almost simultaneously as a group for aggregation, effectively reducing the staleness of local models. At the same time, the overall training process remains asynchronous, eliminating prolonged waiting periods from straggler clients. Using diverse non-IID heterogeneous distributed datasets, we demonstrate that FedCompass achieves faster convergence and higher accuracy than other asynchronous algorithms while remaining more efficient than synchronous algorithms when performing federated learning on heterogeneous clients.
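A toy version of the computing-power-aware assignment idea: give each client a number of local steps proportional to its measured speed so that all clients finish a round at roughly the same wall-clock time. The real FedCompass scheduler also manages arrival groups and asynchronous aggregation, which are omitted here; the client names and numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class Client:
    name: str
    steps_per_second: float   # estimated from previous rounds

def assign_local_steps(clients, target_round_seconds=60.0, min_steps=1):
    """Give faster clients more local steps so everyone finishes together."""
    return {c.name: max(min_steps, round(c.steps_per_second * target_round_seconds))
            for c in clients}

clients = [Client("hospital_gpu", 40.0), Client("lab_workstation", 12.0),
           Client("edge_box", 1.5)]
plan = assign_local_steps(clients)
print(plan)   # e.g. {'hospital_gpu': 2400, 'lab_workstation': 720, 'edge_box': 90}
# Every client now needs roughly 60 s, so locally trained models arrive nearly
# simultaneously for aggregation, reducing staleness without waiting on stragglers.
```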

Transformer-based classification of user queries for medical consultancy with respect to expert specialization

  • paper_url: http://arxiv.org/abs/2309.14662
  • repo_url: None
  • paper_authors: Dmitry Lyutkin, Andrey Soloviev, Dmitry Zhukov, Denis Pozdnyakov, Muhammad Shahid Iqbal Malik, Dmitry I. Ignatov
  • for: Provides a strategy for digital healthcare that categorizes user queries for medical consultation according to expert specialization using the RuBERT model.
  • methods: Fine-tunes the pre-trained RuBERT transformer on a varied dataset, enabling precise correspondence between queries and particular medical specialisms.
  • results: Achieves an F1-score of over 92%, computed through both cross-validation and a conventional train/test split, with excellent generalization across medical domains such as cardiology, neurology, and dermatology.
    Abstract The need for skilled medical support is growing in the era of digital healthcare. This research presents an innovative strategy, utilizing the RuBERT model, for categorizing user inquiries in the field of medical consultation with a focus on expert specialization. By harnessing the capabilities of transformers, we fine-tuned the pre-trained RuBERT model on a varied dataset, which facilitates precise correspondence between queries and particular medical specialisms. Using a comprehensive dataset, we have demonstrated our approach's superior performance with an F1-score of over 92%, calculated through both cross-validation and the traditional split of test and train datasets. Our approach has shown excellent generalization across medical domains such as cardiology, neurology and dermatology. This methodology provides practical benefits by directing users to appropriate specialists for prompt and targeted medical advice. It also enhances healthcare system efficiency, reduces practitioner burden, and improves patient care quality. In summary, our suggested strategy facilitates the attainment of specific medical knowledge, offering prompt and precise advice within the digital healthcare field.
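An illustrative inference skeleton with Hugging Face Transformers; the checkpoint name, the label set, and the fact that the classification head below is untrained are assumptions made for the sketch, not the authors' exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint and label set; the paper's exact model and labels may differ.
MODEL_NAME = "DeepPavlov/rubert-base-cased"
SPECIALTIES = ["cardiology", "neurology", "dermatology"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(SPECIALTIES))
# In the paper this classification head is fine-tuned on labelled user queries;
# here the head is randomly initialized, so the prediction below is untrained.

query = "У меня часто болит голова и кружится по утрам"   # example user query
inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze()
print({s: round(float(p), 3) for s, p in zip(SPECIALTIES, probs)})
```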

Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies

  • paper_url: http://arxiv.org/abs/2309.15132
  • repo_url: None
  • paper_authors: Yaochen Xie, Ziqian Xie, Sheikh Muhammad Saiful Islam, Degui Zhi, Shuiwang Ji
  • for: Addresses the challenges of representation learning in genome-wide association studies (GWAS) on high-dimensional medical imaging data, using mutual information (MI) to identify informative representations of the data.
  • methods: Introduces a trans-modal learning framework called Genetic InfoMax (GIM), which includes a regularized MI estimator and a novel genetics-informed transformer to address the specific challenges of GWAS.
  • results: Demonstrates the effectiveness of GIM and significantly improved performance on GWAS, evaluated on human brain 3D MRI data using standardized evaluation protocols.
    Abstract Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits. When applied to high-dimensional medical imaging data, a key step is to extract lower-dimensional, yet informative representations of the data as traits. Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS in comparison to typical visual representation learning. In this study, we tackle this problem from the mutual information (MI) perspective by identifying key limitations of existing methods. We introduce a trans-modal learning framework Genetic InfoMax (GIM), including a regularized MI estimator and a novel genetics-informed transformer to address the specific challenges of GWAS. We evaluate GIM on human brain 3D MRI data and establish standardized evaluation protocols to compare it to existing approaches. Our results demonstrate the effectiveness of GIM and a significantly improved performance on GWAS.

Learning the Uncertainty Sets for Control Dynamics via Set Membership: A Non-Asymptotic Analysis

  • paper_url: http://arxiv.org/abs/2309.14648
  • repo_url: None
  • paper_authors: Yingying Li, Jing Yu, Lauren Conger, Adam Wierman
  • for: Linear dynamical systems under bounded, i.i.d. disturbances
  • methods: Set membership estimation, non-asymptotic bound on the diameter of the uncertainty sets
  • results: Robust adaptive model predictive control with performance approaching offline optimal model predictive control
    Abstract Set-membership estimation is commonly used in adaptive/learning-based control algorithms that require robustness over the model uncertainty sets, e.g., online robustly stabilizing control and robust adaptive model predictive control. Despite having broad applications, non-asymptotic estimation error bounds in the stochastic setting are limited. This paper provides such a non-asymptotic bound on the diameter of the uncertainty sets generated by set membership estimation on linear dynamical systems under bounded, i.i.d. disturbances. Further, this result is applied to robust adaptive model predictive control with uncertainty sets updated by set membership. We numerically demonstrate the performance of the robust adaptive controller, which rapidly approaches the performance of the offline optimal model predictive controller, in comparison with the control design based on least square estimation's confidence regions.
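A minimal scalar sketch of set membership estimation: each observed transition of x_{t+1} = a x_t + u_t + w_t with |w_t| <= w_max yields an interval of parameter values consistent with the bounded disturbance, and the uncertainty set is the running intersection of those intervals. The paper analyzes the general linear (matrix) case and bounds the diameter of such sets non-asymptotically.

```python
import numpy as np

rng = np.random.default_rng(0)
a_true, w_max, T = 0.8, 0.1, 200

# Simulate a scalar system x_{t+1} = a * x_t + u_t + w_t with bounded i.i.d. noise.
xs, us = [1.0], rng.uniform(-1.0, 1.0, T)
for t in range(T):
    xs.append(a_true * xs[t] + us[t] + rng.uniform(-w_max, w_max))

# Set membership: intersect the intervals {a : |x_{t+1} - a*x_t - u_t| <= w_max}.
lo, hi = -np.inf, np.inf
for t in range(T):
    if abs(xs[t]) < 1e-8:
        continue                      # this transition carries no information about a
    bound1 = (xs[t + 1] - us[t] - w_max) / xs[t]
    bound2 = (xs[t + 1] - us[t] + w_max) / xs[t]
    a_lo, a_hi = min(bound1, bound2), max(bound1, bound2)
    lo, hi = max(lo, a_lo), min(hi, a_hi)

print(f"uncertainty set for a: [{lo:.4f}, {hi:.4f}], diameter {hi - lo:.4f}")
```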

Gray-box Adversarial Attack of Deep Reinforcement Learning-based Trading Agents

  • paper_url: http://arxiv.org/abs/2309.14615
  • repo_url: None
  • paper_authors: Foozhan Ataiefard, Hadi Hemmati
  • For: The paper demonstrates an adversarial attack on a Deep Reinforcement Learning (Deep RL) based trading agent, in order to study the robustness of such agents in practice.
  • Methods: A "gray-box" approach for attacking the Deep RL-based trading agent: the adversary trades in the same stock market with no extra access to the trading agent, and uses a hybrid deep neural network policy consisting of convolutional layers and fully-connected layers.
  • Results: On average over three simulated trading market configurations, the proposed adversary policy reduces the reward values by 214.17%, which cuts the potential profits of the baseline by 139.4%, an ensemble method by 93.7%, and an automated trading software developed by the industrial partner by 85.5%, while consuming significantly less budget than the victims.
    Abstract In recent years, deep reinforcement learning (Deep RL) has been successfully implemented as a smart agent in many systems such as complex games, self-driving cars, and chat-bots. One of the interesting use cases of Deep RL is its application as an automated stock trading agent. In general, any automated trading agent is prone to manipulations by adversaries in the trading environment. Thus studying their robustness is vital for their success in practice. However, typical mechanism to study RL robustness, which is based on white-box gradient-based adversarial sample generation techniques (like FGSM), is obsolete for this use case, since the models are protected behind secure international exchange APIs, such as NASDAQ. In this research, we demonstrate that a "gray-box" approach for attacking a Deep RL-based trading agent is possible by trading in the same stock market, with no extra access to the trading agent. In our proposed approach, an adversary agent uses a hybrid Deep Neural Network as its policy consisting of Convolutional layers and fully-connected layers. On average, over three simulated trading market configurations, the adversary policy proposed in this research is able to reduce the reward values by 214.17%, which results in reducing the potential profits of the baseline by 139.4%, ensemble method by 93.7%, and an automated trading software developed by our industrial partner by 85.5%, while consuming significantly less budget than the victims (427.77%, 187.16%, and 66.97%, respectively).

Reparameterized Variational Rejection Sampling

  • paper_url: http://arxiv.org/abs/2309.14612
  • repo_url: None
  • paper_authors: Martin Jankowiak, Du Phan
  • for: Expands the space of flexible variational families in order to improve the accuracy and efficiency of variational inference.
  • methods: Revisits Variational Rejection Sampling (VRS), which combines a parametric proposal distribution with rejection sampling to define a rich non-parametric family of distributions that explicitly uses the known target distribution, and introduces a low-variance reparameterized gradient estimator for the parameters of the proposal.
  • results: Theoretical arguments and experiments show that the resulting method, Reparameterized Variational Rejection Sampling (RVRS), offers an attractive trade-off between computational cost and inference fidelity and performs especially well on models with local latent variables.
    Abstract Traditional approaches to variational inference rely on parametric families of variational distributions, with the choice of family playing a critical role in determining the accuracy of the resulting posterior approximation. Simple mean-field families often lead to poor approximations, while rich families of distributions like normalizing flows can be difficult to optimize and usually do not incorporate the known structure of the target distribution due to their black-box nature. To expand the space of flexible variational families, we revisit Variational Rejection Sampling (VRS) [Grover et al., 2018], which combines a parametric proposal distribution with rejection sampling to define a rich non-parametric family of distributions that explicitly utilizes the known target distribution. By introducing a low-variance reparameterized gradient estimator for the parameters of the proposal distribution, we make VRS an attractive inference strategy for models with continuous latent variables. We argue theoretically and demonstrate empirically that the resulting method--Reparameterized Variational Rejection Sampling (RVRS)--offers an attractive trade-off between computational cost and inference fidelity. In experiments we show that our method performs well in practice and that it is well-suited for black-box inference, especially for models with local latent variables.

Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method

  • paper_url: http://arxiv.org/abs/2309.14601
  • repo_url: None
  • paper_authors: Mohannad Elhamod, Anuj Karpatne
  • For: This paper provides a novel auto-encoder-based non-linear loss landscape visualization method for neural networks, called Neuro-Visualizer, to help researchers study the loss landscape of neural networks and their training process.
  • Methods: Neuro-Visualizer uses an auto-encoder to learn a lower-dimensional representation of the loss landscape and then visualizes the landscape in two dimensions; the method is evaluated on a variety of problems in two applications of knowledge-guided machine learning (KGML).
  • Results: Neuro-Visualizer outperforms other linear and non-linear baselines, provides useful insights about the loss landscape of neural networks, and helps corroborate, and sometimes challenge, claims proposed by the machine learning community.
    Abstract In recent years, there has been a growing interest in visualizing the loss landscape of neural networks. Linear landscape visualization methods, such as principal component analysis, have become widely used as they intuitively help researchers study neural networks and their training process. However, these linear methods suffer from limitations and drawbacks due to their lack of flexibility and low fidelity at representing the high dimensional landscape. In this paper, we present a novel auto-encoder-based non-linear landscape visualization method called Neuro-Visualizer that addresses these shortcoming and provides useful insights about neural network loss landscapes. To demonstrate its potential, we run experiments on a variety of problems in two separate applications of knowledge-guided machine learning (KGML). Our findings show that Neuro-Visualizer outperforms other linear and non-linear baselines and helps corroborate, and sometime challenge, claims proposed by machine learning community. All code and data used in the experiments of this paper are available at an anonymous link https://anonymous.4open.science/r/NeuroVisualizer-FDD6

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control

  • paper_url: http://arxiv.org/abs/2309.14597
  • repo_url: https://github.com/nathanrahn/return-landscapes
  • paper_authors: Nate Rahn, Pierluca D’Oro, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
  • for: Studies the significant instability in the performance of deep reinforcement learning agents for continuous control.
  • methods: Analyzes the return landscape, the mapping between a policy and a return, taking a distributional view of the returns obtained around a policy; popular algorithms are found to traverse noisy neighborhoods of this landscape in which a single update to the policy parameters leads to a wide range of returns.
  • results: The landscape exhibits surprising structure, including simple paths in parameter space that improve the stability of a policy; a distribution-aware procedure is developed to find such paths and navigate away from noisy neighborhoods, improving policy robustness.
    Abstract Deep reinforcement learning agents for continuous control are known to exhibit significant instability in their performance over time. In this work, we provide a fresh perspective on these behaviors by studying the return landscape: the mapping between a policy and a return. We find that popular algorithms traverse noisy neighborhoods of this landscape, in which a single update to the policy parameters leads to a wide range of returns. By taking a distributional view of these returns, we map the landscape, characterizing failure-prone regions of policy space and revealing a hidden dimension of policy quality. We show that the landscape exhibits surprising structure by finding simple paths in parameter space which improve the stability of a policy. To conclude, we develop a distribution-aware procedure which finds such paths, navigating away from noisy neighborhoods in order to improve the robustness of a policy. Taken together, our results provide new insight into the optimization, evaluation, and design of agents.