for: Computationally and label efficient PAC active learning of $d$-dimensional halfspaces with Tsybakov Noise.
methods: Nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$.
results: Low excess error guarantee, narrowing down the gap between the label complexities of the previously known efficient passive or active algorithms and the information-theoretic lower bound.
Abstract
We study the problem of computationally and label efficient PAC active learning of $d$-dimensional halfspaces with Tsybakov Noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of the above structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}(d (\frac{1}{\epsilon})^{\frac{8-6\alpha}{3\alpha-1}})$\footnote{In the main body of this work, we use $\tilde{O}(\cdot), \tilde{\Theta}(\cdot)$ to hide factors of the form $\polylog(d, \frac{1}{\epsilon}, \frac{1}{\delta})$.}, under the assumption that the Tsybakov noise parameter $\alpha \in (\frac13, 1]$, which narrows down the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting.
MEMPSEP III. A machine learning-oriented multivariate data set for forecasting the Occurrence and Properties of Solar Energetic Particle Events using a Multivariate Ensemble Approach
results: The data set can be used for machine learning forecasting of the occurrence and subsequent properties of solar energetic particle events, and has already been used to develop a new multivariate ensemble model (MEMPSEP).
Abstract
We introduce a new multivariate data set that utilizes multiple spacecraft collecting in-situ and remote sensing heliospheric measurements shown to be linked to physical processes responsible for generating solar energetic particles (SEPs). Using the Geostationary Operational Environmental Satellites (GOES) flare event list from Solar Cycle (SC) 23 and part of SC 24 (1998-2013), we identify 252 solar events (flares) that produce SEPs and 17,542 events that do not. For each identified event, we acquire the local plasma properties at 1 au, such as energetic proton and electron data, upstream solar wind conditions, and the interplanetary magnetic field vector quantities using various instruments onboard GOES and the Advanced Composition Explorer (ACE) spacecraft. We also collect remote sensing data from instruments onboard the Solar Dynamic Observatory (SDO), Solar and Heliospheric Observatory (SoHO), and the Wind solar radio instrument WAVES. The data set is designed to allow for variations of the inputs and feature sets for machine learning (ML) in heliophysics and has a specific purpose for forecasting the occurrence of SEP events and their subsequent properties. This paper describes a dataset created from multiple publicly available observation sources that is validated, cleaned, and carefully curated for our machine-learning pipeline. The dataset has been used to drive the newly-developed Multivariate Ensemble of Models for Probabilistic Forecast of Solar Energetic Particles (MEMPSEP; see MEMPSEP I (Chatterjee et al., 2023) and MEMPSEP II (Dayeh et al., 2023) for associated papers).
results: Establishes tight bounds on the error convergence rate of GAN models, which can also be applied to existing GAN error estimates to improve their convergence rates; in particular, the error defined with the neural network distance is a special case of the error considered here.
Abstract
The generative adversarial network (GAN) is an important model developed for high-dimensional distribution learning in recent years. However, there is a pressing need for a comprehensive method to understand its error convergence rate. In this research, we focus on studying the error convergence rate of the GAN model that is based on a class of functions encompassing the discriminator and generator neural networks. These functions are VC type with bounded envelope function under our assumptions, enabling the application of the Talagrand inequality. By employing the Talagrand inequality and the Borel-Cantelli lemma, we establish a tight convergence rate for the error of GAN. This method can also be applied to existing error estimations of GAN and yields improved convergence rates. In particular, the error defined with the neural network distance is a special case of the error in our definition.
Learning Fair Representations with High-Confidence Guarantees
methods: The study proposes the Fair Representation learning with high-confidence Guarantees (FRG) framework, which provides high-confidence guarantees for limiting the level of unfairness across downstream models and tasks, subject to user-defined upper bounds.
results: The authors prove that FRG guarantees fairness for all downstream models and tasks with high probability, and empirical evaluations on multiple downstream models and tasks show that it upper bounds unfairness in practice.
Abstract
Representation learning is increasingly employed to generate representations that are predictive across multiple downstream tasks. Developing representation learning algorithms that provide strong fairness guarantees is thus important, because it can prevent unfairness towards disadvantaged groups in all downstream prediction tasks. In this paper, we formally define the problem of learning representations that are fair with high confidence. We then introduce the Fair Representation learning with high-confidence Guarantees (FRG) framework, which provides high-confidence guarantees for limiting unfairness across all downstream models and tasks, with user-defined upper bounds. After proving that FRG ensures fairness for all downstream models and tasks with high probability, we present empirical evaluations that demonstrate FRG's effectiveness at upper bounding unfairness for multiple downstream models and tasks.
Random Exploration in Bayesian Optimization: Order-Optimal Regret and Computational Efficiency
for: Bayesian optimization with Gaussian Process models, where the search domain is explored using random samples.
methods: A random exploration strategy whose performance is analyzed via novel concentration bounds in an infinite-dimensional Hilbert space.
results: Random exploration achieves the optimal error rates; in the noise-free setting the analysis closes the existing regret gap and resolves a COLT open problem. The proposed algorithm is also more computationally efficient, since it does not require optimizing a non-convex acquisition function at each iteration.
Abstract
We consider Bayesian optimization using Gaussian Process models, also referred to as kernel-based bandit optimization. We study the methodology of exploring the domain using random samples drawn from a distribution. We show that this random exploration approach achieves the optimal error rates. Our analysis is based on novel concentration bounds in an infinite dimensional Hilbert space established in this work, which may be of independent interest. We further develop an algorithm based on random exploration with domain shrinking and establish its order-optimal regret guarantees under both noise-free and noisy settings. In the noise-free setting, our analysis closes the existing gap in regret performance and thereby resolves a COLT open problem. The proposed algorithm also enjoys a computational advantage over prevailing methods due to the random exploration that obviates the expensive optimization of a non-convex acquisition function for choosing the query points at each iteration.
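The following is a hedged sketch of the core idea of random exploration with domain shrinking: query points are drawn uniformly at random from the current domain instead of maximizing an acquisition function, and after each batch the domain is shrunk around the posterior-best point. The objective, kernel, batch size, and shrinking rule below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def f(x):                                       # unknown noisy objective (minimization)
    return np.sin(3 * x) + 0.1 * np.random.randn(*x.shape)

lo, hi = 0.0, 3.0
X, y = np.empty((0, 1)), np.empty(0)
for _ in range(5):
    Xq = np.random.uniform(lo, hi, size=(10, 1))          # random exploration, no acquisition
    X, y = np.vstack([X, Xq]), np.concatenate([y, f(Xq).ravel()])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, y)
    center = X[np.argmin(gp.predict(X))][0]               # posterior-best point so far
    width = (hi - lo) / 2                                  # halve the domain each round
    lo, hi = max(0.0, center - width / 2), min(3.0, center + width / 2)
print("final search interval:", (lo, hi))
```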
Burgers' PINNs with Implicit Euler Transfer Learning
results: A time-marching PINN approach that requires smaller neural network architectures, achieving similar accuracy while potentially decreasing computational costs.
Abstract
The Burgers equation is a well-established test case in the computational modeling of several phenomena such as fluid dynamics, gas dynamics, shock theory, cosmology, and others. In this work, we present the application of Physics-Informed Neural Networks (PINNs) with an implicit Euler transfer learning approach to solve the Burgers equation. The proposed approach consists in seeking a time-discrete solution by a sequence of Artificial Neural Networks (ANNs). At each time step, the previous ANN transfers its knowledge to the next network model, which learns the current time solution by minimizing a loss function based on the implicit Euler approximation of the Burgers equation. The approach is tested for two benchmark problems: the first with an exact solution and the other with an alternative analytical solution. In comparison to the usual PINN models, the proposed approach has the advantage of requiring smaller neural network architectures with similar accurate results and potentially decreasing computational costs.
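To make the time-stepping idea concrete, the following is a minimal sketch (not the authors' code) of one implicit-Euler PINN step for Burgers' equation $u_t + u u_x = \nu u_{xx}$: the network for $t_{n+1}$ is warm-started from the frozen network for $t_n$ and trained on the implicit Euler residual. The viscosity, time step, network size, and optimizer settings are illustrative assumptions, and boundary/initial-condition terms are omitted for brevity.

```python
import torch
import torch.nn as nn

class ANN(nn.Module):
    def __init__(self, width=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, width), nn.Tanh(),
            nn.Linear(width, width), nn.Tanh(),
            nn.Linear(width, 1),
        )
    def forward(self, x):
        return self.net(x)

def implicit_euler_loss(u_next, u_prev, x, dt, nu):
    """Squared residual of the implicit Euler discretization at collocation points x."""
    x = x.clone().requires_grad_(True)
    u_new = u_next(x)                          # u^{n+1}(x), trainable
    u_old = u_prev(x).detach()                 # u^{n}(x), frozen previous network
    du = torch.autograd.grad(u_new, x, torch.ones_like(u_new), create_graph=True)[0]
    d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
    residual = (u_new - u_old) / dt + u_new * du - nu * d2u
    return (residual ** 2).mean()

nu, dt = 0.01 / torch.pi, 0.01                 # assumed viscosity and time step
x_col = torch.linspace(-1.0, 1.0, 256).unsqueeze(-1)

u_prev = ANN()                                 # would be the converged network for t_n
u_next = ANN()
u_next.load_state_dict(u_prev.state_dict())    # transfer learning: warm start from t_n

opt = torch.optim.Adam(u_next.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = implicit_euler_loss(u_next, u_prev, x_col, dt, nu)
    loss.backward()
    opt.step()
```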
Towards Hybrid-grained Feature Interaction Selection for Deep Sparse Network
results: Experiments on three large real-world benchmark datasets show that OptFeature achieves a good balance between accuracy and efficiency; additional studies support the feasibility of the method.
Abstract
Deep sparse networks are widely investigated as a neural network architecture for prediction tasks with high-dimensional sparse features, with which feature interaction selection is a critical component. While previous methods primarily focus on how to search feature interaction in a coarse-grained space, less attention has been given to a finer granularity. In this work, we introduce a hybrid-grained feature interaction selection approach that targets both feature field and feature value for deep sparse networks. To explore such expansive space, we propose a decomposed space which is calculated on the fly. We then develop a selection algorithm called OptFeature, which efficiently selects the feature interaction from both the feature field and the feature value simultaneously. Results from experiments on three large real-world benchmark datasets demonstrate that OptFeature performs well in terms of accuracy and efficiency. Additional studies support the feasibility of our method.
ADMM Training Algorithms for Residual Networks: Convergence, Complexity and Parallel Training
results: We theoretically analyze the convergence, convergence rate, time complexity, and (per-node) runtime memory requirement of ADMM applied to the FCResNets training problem. Experiments show that our parallel training method achieves high speed, better performance, and robustness in deep network training tasks. Finally, we present the advantages and potential of our parallel training in large-scale problems.
Abstract
We design a series of serial and parallel proximal point (gradient) ADMMs for the fully connected residual networks (FCResNets) training problem by introducing auxiliary variables. Convergence of the proximal point version is proven based on a Kurdyka-Lojasiewicz (KL) property analysis framework, and we can ensure a locally R-linear or sublinear convergence rate depending on the different ranges of the Kurdyka-Lojasiewicz (KL) exponent, in which a necessary auxiliary function is constructed to realize our goal. Moreover, the advantages of the parallel implementation in terms of lower time complexity and less (per-node) memory consumption are analyzed theoretically. To the best of our knowledge, this is the first work analyzing the convergence, convergence rate, time complexity and (per-node) runtime memory requirement of the ADMM applied in the FCResNets training problem theoretically. Experiments are reported to show the high speed, better performance, robustness and potential in the deep network training tasks. Finally, we present the advantage and potential of our parallel training in large-scale problems.
Estimating Trustworthy and Safe Optimal Treatment Regimes
results: Simulations show that the framework identifies optimal policies even in complex settings; applied to seizure treatment, it finds personalized strategies that reduce medication doses for patients with mild and brief seizure episodes while adopting more aggressive treatment for intensive-care patients experiencing intense seizures.
Abstract
Recent statistical and reinforcement learning methods have significantly advanced patient care strategies. However, these approaches face substantial challenges in high-stakes contexts, including missing data, inherent stochasticity, and the critical requirements for interpretability and patient safety. Our work operationalizes a safe and interpretable framework to identify optimal treatment regimes. This approach involves matching patients with similar medical and pharmacological characteristics, allowing us to construct an optimal policy via interpolation. We perform a comprehensive simulation study to demonstrate the framework's ability to identify optimal policies even in complex settings. Ultimately, we operationalize our approach to study regimes for treating seizures in critically ill patients. Our findings strongly support personalized treatment strategies based on a patient's medical history and pharmacological features. Notably, we identify that reducing medication doses for patients with mild and brief seizure episodes while adopting aggressive treatment for patients in intensive care unit experiencing intense seizures leads to more favorable outcomes.
Unsupervised Federated Learning: A Federated Gradient EM Algorithm for Heterogeneous Mixture Models with Robustness against Adversarial Attacks
results: The method adapts to unknown task similarity, is resilient to adversarial attacks on a small fraction of data sources, protects local data privacy, and is computationally and communication efficient.
Abstract
While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. In this paper, we introduce a novel federated gradient EM algorithm designed for the unsupervised learning of mixture models with heterogeneous mixture proportions across tasks. We begin with a comprehensive finite-sample theory that holds for general mixture models, then apply this general theory on Gaussian Mixture Models (GMMs) and Mixture of Regressions (MoRs) to characterize the explicit estimation error of model parameters and mixture proportions. Our proposed federated gradient EM algorithm demonstrates several key advantages: adaptability to unknown task similarity, resilience against adversarial attacks on a small fraction of data sources, protection of local data privacy, and computational and communication efficiency.
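As a rough illustration of what one communication round of a federated gradient EM update for a GMM can look like, the sketch below uses a two-component, unit-variance mixture with client-specific mixture proportions (reflecting heterogeneous tasks): each client computes responsibilities locally, returns a gradient of its local log-likelihood with respect to the shared means, and the server averages these gradients. The step size and the simplification to known variance are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def local_grad_em_step(X, mu, pi):
    # E-step (local): responsibilities under the current global means
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)       # (n, K) squared distances
    log_r = np.log(pi)[None, :] - 0.5 * d2
    r = np.exp(log_r - log_r.max(1, keepdims=True))
    r /= r.sum(1, keepdims=True)
    # gradient of the local log-likelihood w.r.t. the means (unit variance)
    grad_mu = r.T @ X - r.sum(0)[:, None] * mu                  # (K, d)
    pi_new = r.mean(0)                                          # local M-step for proportions
    return grad_mu / len(X), pi_new

rng = np.random.default_rng(0)
mu = rng.normal(size=(2, 3))                                    # shared global means
clients = [rng.normal(size=(200, 3)) + np.array([c, 0, 0]) for c in (-2.0, 2.0, 0.0)]
pis = [np.full(2, 0.5) for _ in clients]

grads = []
for i, X in enumerate(clients):
    g, pis[i] = local_grad_em_step(X, mu, pis[i])
    grads.append(g)
mu += 0.5 * np.mean(grads, axis=0)                              # server aggregates gradients
```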
ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer’s Disease
results: ADMarker detects a comprehensive set of digital biomarkers with up to 93.8% accuracy and identifies early AD with an average accuracy of 88.9%; it also enables longitudinal tracking of symptoms and disease progression in AD patients.
Abstract
Alzheimer's Disease (AD) and related dementia are a growing global health challenge due to the aging population. In this paper, we present ADMarker, the first end-to-end system that integrates multi-modal sensors and new federated learning algorithms for detecting multidimensional AD digital biomarkers in natural living environments. ADMarker features a novel three-stage multi-modal federated learning architecture that can accurately detect digital biomarkers in a privacy-preserving manner. Our approach collectively addresses several major real-world challenges, such as limited data labels, data heterogeneity, and limited computing resources. We built a compact multi-modality hardware system and deployed it in a four-week clinical trial involving 91 elderly participants. The results indicate that ADMarker can accurately detect a comprehensive set of digital biomarkers with up to 93.8% accuracy and identify early AD with an average of 88.9% accuracy. ADMarker offers a new platform that can allow AD clinicians to characterize and track the complex correlation between multidimensional interpretable digital biomarkers, demographic factors of patients, and AD diagnosis in a longitudinal manner.
Fast and Reliable Generation of EHR Time Series via Diffusion Models
results: Experiments on six datasets comparing the new method against seven existing methods show that it significantly outperforms all of them in data utility while requiring less training effort; it also enhances downstream medical data analysis by providing diverse and realistic synthetic EHR time series.
Abstract
Electronic Health Records (EHRs) are rich sources of patient-level data, including laboratory tests, medications, and diagnoses, offering valuable resources for medical data analysis. However, concerns about privacy often restrict access to EHRs, hindering downstream analysis. Researchers have explored various methods for generating privacy-preserving EHR data. In this study, we introduce a new method for generating diverse and realistic synthetic EHR time series data using Denoising Diffusion Probabilistic Models (DDPM). We conducted experiments on six datasets, comparing our proposed method with seven existing methods. Our results demonstrate that our approach significantly outperforms all existing methods in terms of data utility while requiring less training effort. Our approach also enhances downstream medical data analysis by providing diverse and realistic synthetic EHR data.
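For readers unfamiliar with DDPMs, the following is a minimal sketch of the standard DDPM training objective applied to fixed-length time series such as normalized EHR lab-value sequences: noise the clean series with the forward diffusion and train a network to predict the added noise. The noise schedule, sequence length, and toy backbone are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class EpsNet(nn.Module):
    """Toy noise-prediction network; the paper would use a stronger backbone."""
    def __init__(self, seq_len, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(seq_len + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, seq_len),
        )
    def forward(self, x_t, t):
        t_emb = t.float().unsqueeze(-1) / T                  # crude timestep embedding
        return self.net(torch.cat([x_t, t_emb], dim=-1))

def ddpm_loss(model, x0):
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps      # forward noising
    return ((eps - model(x_t, t)) ** 2).mean()                # predict the added noise

seq_len = 48
model = EpsNet(seq_len)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x0 = torch.randn(32, seq_len)        # stand-in for a batch of normalized EHR series
loss = ddpm_loss(model, x0)
loss.backward(); opt.step()
```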
A Doubly Robust Approach to Sparse Reinforcement Learning
results: The proposed algorithm attains regret $\tilde{O}(\sigma^{-1}_{\min} s_{\star} H \sqrt{N})$, where $\sigma_{\min}$ is the minimum eigenvalue of the average Gram matrix of feature vectors, $s_\star$ is the sparsity parameter, $H$ is the episode length, and $N$ is the number of rounds. A matching lower bound (up to logarithmic factors) is also provided, and numerical experiments support the theoretical results and demonstrate the superiority of the algorithm.
Abstract
We propose a new regret minimization algorithm for episodic sparse linear Markov decision process (SMDP) where the state-transition distribution is a linear function of observed features. The only previously known algorithm for SMDP requires the knowledge of the sparsity parameter and oracle access to an unknown policy. We overcome these limitations by combining the doubly robust method that allows one to use feature vectors of \emph{all} actions with a novel analysis technique that enables the algorithm to use data from all periods in all episodes. The regret of the proposed algorithm is $\tilde{O}(\sigma^{-1}_{\min} s_{\star} H \sqrt{N})$, where $\sigma_{\min}$ denotes the minimum eigenvalue of the average Gram matrix of feature vectors, $s_\star$ is the sparsity parameter, $H$ is the length of an episode, and $N$ is the number of rounds. We provide a lower regret bound that matches the upper bound up to logarithmic factors on a newly identified subclass of SMDPs. Our numerical experiments support our theoretical results and demonstrate the superior performance of our algorithm.
UncertaintyPlayground: A Fast and Simplified Python Library for Uncertainty Estimation
methods: The library is built on PyTorch and GPyTorch and implements uncertainty estimation methods including Sparse Variational Gaussian Process Regression (SVGPR) and Mixture Density Networks (MDN).
results: The library offers fast training for Gaussian and multi-modal outcome distributions and can visualize the prediction intervals of one or more instances; it also leverages PyTorch-specific speed optimizations and can be trained on both CPU and GPU.
Abstract
This paper introduces UncertaintyPlayground, a Python library built on PyTorch and GPyTorch for uncertainty estimation in supervised learning tasks. The library offers fast training for Gaussian and multi-modal outcome distributions through Sparse and Variational Gaussian Process Regressions (SVGPRs) for normally distributed outcomes and Mixed Density Networks (MDN) for mixed distributions. In addition to model training with various hyperparameters, UncertaintyPlayground can visualize the prediction intervals of one or more instances. Due to using tensor operations, the library can be trained both on CPU and GPU and offers various PyTorch-specific techniques for speed optimization. The library contains unit tests for each module and ensures multi-platform continuous integration with GitHub Workflows (online integration) and Tox (local integration). Finally, the code is documented with Google-style docstrings and offers a documentation website created with MkDocs and MkDocStrings.
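As background on one of the two modelling approaches the library trains, the sketch below shows a generic Mixture Density Network head and its negative log-likelihood for multi-modal outcomes. This is not UncertaintyPlayground's own API; the layer sizes and number of components are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, in_dim, n_components=5, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_components)        # mixture logits
        self.mu = nn.Linear(hidden, n_components)        # component means
        self.log_sigma = nn.Linear(hidden, n_components) # component log std-devs

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of y under the predicted Gaussian mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    comp = torch.distributions.Normal(mu, log_sigma.exp()).log_prob(y.unsqueeze(-1))
    return -torch.logsumexp(log_pi + comp, dim=-1).mean()

x, y = torch.randn(256, 10), torch.randn(256)
head = MDNHead(in_dim=10)
loss = mdn_nll(*head(x), y)
```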
Triple Simplex Matrix Completion for Expense Forecasting
results: Experiments on two real-world datasets demonstrate the effectiveness of the method in comparison to state-of-the-art algorithms.
Abstract
Forecasting project expenses is a crucial step for businesses to avoid budget overruns and project failures. Traditionally, this has been done by financial analysts or data science techniques such as time-series analysis. However, these approaches can be uncertain and produce results that differ from the planned budget, especially at the start of a project with limited data points. This paper proposes a constrained non-negative matrix completion model that predicts expenses by learning the likelihood of the project correlating with certain expense patterns in the latent space. The model is constrained on three probability simplexes, two of which are on the factor matrices and the third on the missing entries. Additionally, the predicted expense values are guaranteed to meet the budget constraint without the need of post-processing. An inexact alternating optimization algorithm is developed to solve the associated optimization problem and is proven to converge to a stationary point. Results from two real datasets demonstrate the effectiveness of the proposed method in comparison to state-of-the-art algorithms.
One-hot Generalized Linear Model for Switching Brain State Discovery
paper_authors: Chengrui Li, Soon Ho Kim, Chris Rodgers, Hannah Choi, Anqi Wu
for: Understanding the underlying anatomical structure and functional relationships of neural circuits.
methods: State-switching generalized linear models (GLMs) with hidden Markov models (HMMs), augmented with a learnable Gaussian prior and a one-hot prior over the GLM in each state.
results: The method effectively recovers true interaction structures, achieves the highest predictive likelihood on simulated data, and yields more interpretable interaction structures and hidden states on real neural data.
Exposing meaningful and interpretable neural interactions is critical to understanding neural circuits. Inferred neural interactions from neural signals primarily reflect functional interactions. In a long experiment, subject animals may experience different stages defined by the experiment, stimuli, or behavioral states, and hence functional interactions can change over time. To model dynamically changing functional interactions, prior work employs state-switching generalized linear models with hidden Markov models (i.e., HMM-GLMs). However, we argue they lack biological plausibility, as functional interactions are shaped and confined by the underlying anatomical connectome. Here, we propose a novel prior-informed state-switching GLM. We introduce both a Gaussian prior and a one-hot prior over the GLM in each state. The priors are learnable. We will show that the learned prior should capture the state-constant interaction, shedding light on the underlying anatomical connectome and revealing more likely physical neuron interactions. The state-dependent interaction modeled by each GLM offers traceability to capture functional variations across multiple brain states. Our methods effectively recover true interaction structures in simulated data, achieve the highest predictive likelihood with real neural datasets, and render interaction structures and hidden states more interpretable when applied to real neural data.
Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features
results: Adding prosody features improves DDSD performance by up to 8.5% in terms of false acceptance rate (FA), and modality dropout techniques improve performance by 7.4% in FA when modalities are missing at inference time.
Abstract
Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g acoustic, text and/or automatic speech recognition system (ASR) features, to classify speech as device-directed or otherwise, and often have to contend with one or more of these modalities being unavailable when deployed in real-world settings. In this paper, we investigate fusion schemes for DDSD systems that can be made more robust to missing modalities. Concurrently, we study the use of non-verbal cues, specifically prosody features, in addition to verbal cues for DDSD. We present different approaches to combine scores and embeddings from prosody with the corresponding verbal cues, finding that prosody improves DDSD performance by upto 8.5% in terms of false acceptance rate (FA) at a given fixed operating point via non-linear intermediate fusion, while our use of modality dropout techniques improves the performance of these models by 7.4% in terms of FA when evaluated with missing modalities during inference time.
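To illustrate the modality dropout idea in isolation, here is a minimal sketch of a late-fusion DDSD classifier in which each available modality embedding is randomly zeroed during training, so the model learns to cope with missing modalities at inference time. The embedding sizes and drop probability are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, dims, p_drop=0.3):
        super().__init__()
        self.dims, self.p_drop = dims, p_drop
        self.head = nn.Sequential(
            nn.Linear(sum(dims.values()), 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, feats):
        batch_size = next(iter(feats.values())).shape[0]
        parts = []
        for name, d in self.dims.items():
            z = feats.get(name)
            if z is None:                                        # modality missing at inference
                z = torch.zeros(batch_size, d)
            elif self.training and torch.rand(()) < self.p_drop:
                z = torch.zeros_like(z)                          # modality dropout during training
            parts.append(z)
        return self.head(torch.cat(parts, dim=-1))               # device-directed logit

model = FusionClassifier({"acoustic": 128, "text": 256, "prosody": 32})
features = {"acoustic": torch.randn(8, 128), "text": torch.randn(8, 256), "prosody": torch.randn(8, 32)}
logits = model(features)
```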
SimBIG: Field-level Simulation-Based Inference of Galaxy Clustering
paper_authors: Pablo Lemos, Liam Parker, ChangHoon Hahn, Shirley Ho, Michael Eickenberg, Jiamin Hou, Elena Massara, Chirag Modi, Azadeh Moradinezhad Dizgah, Bruno Regaldo-Saint Blancard, David Spergel
results: The analysis yields constraints of $\Omega_m=0.267^{+0.033}_{-0.029}$ and $\sigma_8=0.762^{+0.036}_{-0.035}$, as well as a constraint on the Hubble constant of $H_0=64.5 \pm 3.8 \ {\rm km / s / Mpc}$ from galaxy clustering alone.
Abstract
We present the first simulation-based inference (SBI) of cosmological parameters from field-level analysis of galaxy clustering. Standard galaxy clustering analyses rely on analyzing summary statistics, such as the power spectrum, $P_\ell$, with analytic models based on perturbation theory. Consequently, they do not fully exploit the non-linear and non-Gaussian features of the galaxy distribution. To address these limitations, we use the {\sc SimBIG} forward modelling framework to perform SBI using normalizing flows. We apply SimBIG to a subset of the BOSS CMASS galaxy sample using a convolutional neural network with stochastic weight averaging to perform massive data compression of the galaxy field. We infer constraints on $\Omega_m = 0.267^{+0.033}_{-0.029}$ and $\sigma_8=0.762^{+0.036}_{-0.035}$. While our constraints on $\Omega_m$ are in-line with standard $P_\ell$ analyses, those on $\sigma_8$ are $2.65\times$ tighter. Our analysis also provides constraints on the Hubble constant $H_0=64.5 \pm 3.8 \ {\rm km / s / Mpc}$ from galaxy clustering alone. This higher constraining power comes from additional non-Gaussian cosmological information, inaccessible with $P_\ell$. We demonstrate the robustness of our analysis by showcasing our ability to infer unbiased cosmological constraints from a series of test simulations that are constructed using different forward models than the one used in our training dataset. This work not only presents competitive cosmological constraints but also introduces novel methods for leveraging additional cosmological information in upcoming galaxy surveys like DESI, PFS, and Euclid.
Field-level simulation-based inference with galaxy catalogs: the impact of systematic effects
paper_authors: Natalí S. M. de Santi, Francisco Villaescusa-Navarro, L. Raul Abramo, Helen Shao, Lucia A. Perez, Tiago Castro, Yueying Ni, Christopher C. Lovell, Elena Hernandez-Martinez, Federico Marinacci, David N. Spergel, Klaus Dolag, Lars Hernquist, Mark Vogelsberger
results: Although observational effects degrade the precision and accuracy of the models, the models still perform well on over 90% of the galaxy catalogs, demonstrating their potential for application to real data.
Abstract
It has been recently shown that a powerful way to constrain cosmological parameters from galaxy redshift surveys is to train graph neural networks to perform field-level likelihood-free inference without imposing cuts on scale. In particular, de Santi et al. (2023) developed models that could accurately infer the value of $\Omega_{\rm m}$ from catalogs that only contain the positions and radial velocities of galaxies that are robust to uncertainties in astrophysics and subgrid models. However, observations are affected by many effects, including 1) masking, 2) uncertainties in peculiar velocities and radial distances, and 3) different galaxy selections. Moreover, observations only allow us to measure redshift, intertwining galaxies' radial positions and velocities. In this paper we train and test our models on galaxy catalogs, created from thousands of state-of-the-art hydrodynamic simulations run with different codes from the CAMELS project, that incorporate these observational effects. We find that, although the presence of these effects degrades the precision and accuracy of the models, and increases the fraction of catalogs where the model breaks down, the fraction of galaxy catalogs where the model performs well is over 90 %, demonstrating the potential of these models to constrain cosmological parameters even when applied to real data.
Unlocking the Transferability of Tokens in Deep Models for Tabular Data
results: TabToken enables fine-tuning with limited training examples and improves the generalization ability of deep tabular models in standard classification and regression tasks.
Abstract
Fine-tuning a pre-trained deep neural network has become a successful paradigm in various machine learning tasks. However, such a paradigm becomes particularly challenging with tabular data when there are discrepancies between the feature sets of pre-trained models and the target tasks. In this paper, we propose TabToken, a method aims at enhancing the quality of feature tokens (i.e., embeddings of tabular features). TabToken allows for the utilization of pre-trained models when the upstream and downstream tasks share overlapping features, facilitating model fine-tuning even with limited training examples. Specifically, we introduce a contrastive objective that regularizes the tokens, capturing the semantics within and across features. During the pre-training stage, the tokens are learned jointly with top-layer deep models such as transformer. In the downstream task, tokens of the shared features are kept fixed while TabToken efficiently fine-tunes the remaining parts of the model. TabToken not only enables knowledge transfer from a pre-trained model to tasks with heterogeneous features, but also enhances the discriminative ability of deep tabular models in standard classification and regression tasks.
Hyperparameter optimization of hp-greedy reduced basis for gravitational wave surrogates
results: Local hp-greedy reduced bases with hyperparameter optimization improve accuracy while reducing computation time. Specifically, for gravitational waves from the collision of two spinning but non-precessing black holes, the local hp-greedy reduced bases with HPO have a dimensionality lower by up to $4 \times$ for the cases studied here, depending on the desired accuracy; this factor should directly translate into a parameter estimation speedup.
Abstract
In a previous work we introduced, in the context of gravitational wave science, an initial study on an automated domain-decomposition approach for reduced basis through hp-greedy refinement. The approach constructs local reduced bases of lower dimensionality than global ones, with the same or higher accuracy. These ``light'' local bases should imply both faster evaluations when predicting new waveforms and faster data analysis, in particular faster statistical inference (the forward and inverse problems, respectively). In this approach, however, we have previously found important dependence on several hyperparameters, which do not appear in global reduced basis. This naturally leads to the problem of hyperparameter optimization (HPO), which is the subject of this paper. We tackle the problem through a Bayesian optimization, and show its superiority when compared to grid or random searches. We find that for gravitational waves from the collision of two spinning but non-precessing black holes, for the same accuracy, local hp-greedy reduced bases with HPO have a lower dimensionality of up to $4 \times$ for the cases here studied, depending on the desired accuracy. This factor should directly translate in a parameter estimation speedup, for instance. Such acceleration might help in the near real-time requirements for electromagnetic counterparts of gravitational waves from compact binary coalescences. In addition, we find that the Bayesian approach used in this paper for HPO is two orders of magnitude faster than, for example, a grid search, with about a $100 \times$ acceleration. The code developed for this project is available as open source from public repositories.
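A hedged sketch of the Bayesian hyperparameter search step is shown below using scikit-optimize's `gp_minimize`; the objective is a placeholder standing in for "build a local hp-greedy reduced basis with these hyperparameters and return its validation error", and the hyperparameter names, ranges, and number of calls are illustrative assumptions rather than the paper's settings.

```python
from skopt import gp_minimize
from skopt.space import Integer, Real

def validation_error(params):
    max_depth, tol_exponent = params
    # Placeholder: in practice, build local hp-greedy reduced bases with these
    # hyperparameters, project held-out waveforms, and return the representation error.
    return float(max_depth) * 0.0 + 10.0 ** tol_exponent

result = gp_minimize(
    validation_error,
    dimensions=[Integer(2, 20, name="max_depth"), Real(-12, -4, name="tol_exponent")],
    n_calls=50,
    random_state=0,
)
print("best hyperparameters:", result.x, "best error:", result.fun)
```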
Mixed-Variable Global Sensitivity Analysis For Knowledge Discovery And Efficient Combinatorial Materials Design
methods: A mixed-variable GSA method that integrates Latent Variable Gaussian Processes (LVGP) with Sobol' analysis, validated through numerical case studies.
results: The method effectively analyzes sensitivities in mixed-variable design problems and, when integrated with multi-objective Bayesian optimization (BO), accelerates Pareto front exploration for metal-organic framework (MOF) design.
Abstract
Global Sensitivity Analysis (GSA) is the study of the influence of any given inputs on the outputs of a model. In the context of engineering design, GSA has been widely used to understand both individual and collective contributions of design variables on the design objectives. So far, global sensitivity studies have often been limited to design spaces with only quantitative (numerical) design variables. However, many engineering systems also contain, if not only, qualitative (categorical) design variables in addition to quantitative design variables. In this paper, we integrate Latent Variable Gaussian Process (LVGP) with Sobol' analysis to develop the first metamodel-based mixed-variable GSA method. Through numerical case studies, we validate and demonstrate the effectiveness of our proposed method for mixed-variable problems. Furthermore, while the proposed GSA method is general enough to benefit various engineering design applications, we integrate it with multi-objective Bayesian optimization (BO) to create a sensitivity-aware design framework in accelerating the Pareto front design exploration for metal-organic framework (MOF) materials with many-level combinatorial design spaces. Although MOFs are constructed only from qualitative variables that are notoriously difficult to design, our method can utilize sensitivity analysis to navigate the optimization in the many-level large combinatorial design space, greatly expediting the exploration of novel MOF candidates.
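For reference, the first-order and total Sobol' sensitivity indices that such an analysis estimates are, in their standard form (not specific to this paper),

$$
S_i = \frac{\operatorname{Var}_{X_i}\big(\mathbb{E}_{X_{\sim i}}[Y \mid X_i]\big)}{\operatorname{Var}(Y)},
\qquad
S_i^{T} = 1 - \frac{\operatorname{Var}_{X_{\sim i}}\big(\mathbb{E}_{X_i}[Y \mid X_{\sim i}]\big)}{\operatorname{Var}(Y)},
$$

where $X_{\sim i}$ denotes all inputs other than $X_i$. The LVGP metamodel makes qualitative inputs amenable to these variance decompositions by mapping their categorical levels into a continuous latent space.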
Evaluating machine learning models in non-standard settings: An overview and new findings
results: Simulation studies show that GE estimation methods adapted to the non-standard setting mitigate the bias of standard resampling methods, underscoring the importance of tailoring GE estimation to the specific setting.
Abstract
Estimating the generalization error (GE) of machine learning models is fundamental, with resampling methods being the most common approach. However, in non-standard settings, particularly those where observations are not independently and identically distributed, resampling using simple random data divisions may lead to biased GE estimates. This paper strives to present well-grounded guidelines for GE estimation in various such non-standard settings: clustered data, spatial data, unequal sampling probabilities, concept drift, and hierarchically structured outcomes. Our overview combines well-established methodologies with other existing methods that, to our knowledge, have not been frequently considered in these particular settings. A unifying principle among these techniques is that the test data used in each iteration of the resampling procedure should reflect the new observations to which the model will be applied, while the training data should be representative of the entire data set used to obtain the final model. Beyond providing an overview, we address literature gaps by conducting simulation studies. These studies assess the necessity of using GE-estimation methods tailored to the respective setting. Our findings corroborate the concern that standard resampling methods often yield biased GE estimates in non-standard settings, underscoring the importance of tailored GE estimation.
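As one concrete instance of the unifying principle for clustered data, the sketch below uses scikit-learn's `GroupKFold` so that whole clusters are held out together and the test folds mimic applying the model to new, unseen clusters. The synthetic data and the choice of model are placeholders, not from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
groups = rng.integers(0, 25, size=500)           # cluster id of each observation
y = X[:, 0] + rng.normal(size=500)

errors = []
for train_idx, test_idx in GroupKFold(n_splits=5).split(X, y, groups):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    errors.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"cluster-aware GE estimate (MSE): {np.mean(errors):.3f}")
```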
A Canonical Data Transformation for Achieving Inter- and Within-group Fairness
results: Applying the pre-processing framework to the COMPAS risk assessment and Law School datasets and comparing it against two regularization-based methods shows that it better achieves both inter-group and within-group fairness.
Abstract
Increases in the deployment of machine learning algorithms for applications that deal with sensitive data have brought attention to the issue of fairness in machine learning. Many works have been devoted to applications that require different demographic groups to be treated fairly. However, algorithms that aim to satisfy inter-group fairness (also called group fairness) may inadvertently treat individuals within the same demographic group unfairly. To address this issue, we introduce a formal definition of within-group fairness that maintains fairness among individuals from within the same group. We propose a pre-processing framework to meet both inter- and within-group fairness criteria with little compromise in accuracy. The framework maps the feature vectors of members from different groups to an inter-group-fair canonical domain before feeding them into a scoring function. The mapping is constructed to preserve the relative relationship between the scores obtained from the unprocessed feature vectors of individuals from the same demographic group, guaranteeing within-group fairness. We apply this framework to the COMPAS risk assessment and Law School datasets and compare its performance in achieving inter-group and within-group fairness to two regularization-based methods.
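To convey the flavor of a rank-preserving, per-group pre-processing map (this is an illustration of the general idea, not the paper's exact construction), the sketch below sends each group's raw scores through its own empirical CDF to a common canonical scale: the score distributions are equalized across groups while the within-group ordering is left intact.

```python
import numpy as np
from scipy.stats import rankdata

def to_canonical(scores, group):
    """Map raw scores to [0, 1] separately within each demographic group."""
    canonical = np.empty_like(scores, dtype=float)
    for g in np.unique(group):
        idx = group == g
        # within-group ranks -> empirical quantiles; preserves relative order in the group
        canonical[idx] = rankdata(scores[idx]) / (idx.sum() + 1)
    return canonical

rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=1000)
scores = rng.normal(loc=group * 0.8, scale=1.0, size=1000)   # group-dependent shift
fair_scores = to_canonical(scores, group)
```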
for: The paper aims to improve the security and privacy of deep learning models by using the quantum internet and quantum learning weights.
methods: The proposed method uses a decentralized ring topology for federated learning, where each client is given a portion of the entire dataset and only performs training on that set. Additionally, the paper introduces the use of quantum weights for quantum federated learning, which allows the training to be performed entirely in the quantum domain.
results: The paper achieves the first successful implementation of a fully quantum federated learning scheme, and demonstrates high-efficiency data exchange and learning in a distributed federated learning environment.
Abstract
A major concern of deep learning models is the large amount of data that is required to build and train them, much of which is reliant on sensitive and personally identifiable information that is vulnerable to access by third parties. Ideas of using the quantum internet to address this issue have been previously proposed, which would enable fast and completely secure online communications. Previous work has yielded a hybrid quantum-classical transfer learning scheme for classical data and communication with a hub-spoke topology. While quantum communication is secure from eavesdrop attacks and no measurements from quantum to classical translation, due to no cloning theorem, hub-spoke topology is not ideal for quantum communication without quantum memory. Here we seek to improve this model by implementing a decentralized ring topology for the federated learning scheme, where each client is given a portion of the entire dataset and only performs training on that set. We also demonstrate the first successful use of quantum weights for quantum federated learning, which allows us to perform our training entirely in quantum.
Coordinated Replay Sample Selection for Continual Federated Learning
results: Large-scale experiments show that gradient-based replay sample selection improves performance and reduces forgetting, and that coordinating replay sample selection across clients yields gains especially at small replay buffer sizes.
Abstract
Continual Federated Learning (CFL) combines Federated Learning (FL), the decentralized learning of a central model on a number of client devices that may not communicate their data, and Continual Learning (CL), the learning of a model from a continual stream of data without keeping the entire history. In CL, the main challenge is \textit{forgetting} what was learned from past data. While replay-based algorithms that keep a small pool of past training data are effective to reduce forgetting, only simple replay sample selection strategies have been applied to CFL in prior work, and no previous work has explored coordination among clients for better sample selection. To bridge this gap, we adapt a replay sample selection objective based on loss gradient diversity to CFL and propose a new relaxation-based selection of samples to optimize the objective. Next, we propose a practical algorithm to coordinate gradient-based replay sample selection across clients without communicating private data. We benchmark our coordinated and uncoordinated replay sample selection algorithms against random sampling-based baselines with language models trained on a large scale de-identified real-world text dataset. We show that gradient-based sample selection methods both boost performance and reduce forgetting compared to random sampling methods, with our coordination method showing gains early in the low replay size regime (when the budget for storing past data is small).
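The sketch below is a hedged, generic greedy variant of gradient-diversity replay selection (not the paper's relaxation-based objective or its coordination protocol): each candidate example is represented by the gradient of its loss with respect to the model's final layer, and examples are picked greedily so that the selected gradients point in diverse directions.

```python
import torch
import torch.nn.functional as F

def per_sample_last_layer_grads(model, last_layer, xs, ys, loss_fn):
    """Gradient of each sample's loss w.r.t. the final linear layer's weights."""
    grads = []
    for x, y in zip(xs, ys):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        g = torch.autograd.grad(loss, last_layer.weight)[0].flatten()
        grads.append(g)
    return torch.stack(grads)                        # (n_candidates, d)

def select_diverse(grads, k):
    """Greedy selection: next sample minimizes max cosine similarity to the chosen set."""
    g = F.normalize(grads, dim=1)
    chosen = [int(torch.argmax(grads.norm(dim=1)))]  # start from the largest gradient
    while len(chosen) < k:
        sim_to_chosen = (g @ g[chosen].T).max(dim=1).values
        sim_to_chosen[chosen] = float("inf")          # never re-pick an already-chosen sample
        chosen.append(int(torch.argmin(sim_to_chosen)))
    return chosen
```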
Fast 2D Bicephalous Convolutional Autoencoder for Compressing 3D Time Projection Chamber Data
paper_authors: Yi Huang, Yihui Ren, Shinjae Yoo, Jin Huang
for: Development and application of real-time data compression algorithms for high-energy large-scale particle colliders.
methods: A 3D convolutional neural network (CNN)-based approach called Bicephalous Convolutional Autoencoder (BCAE), with two new variants: BCAE++ and BCAE-2D.
results: Compared with the original BCAE, BCAE++ achieves a 15% better compression ratio and a 77% better reconstruction accuracy; both new variants benefit further from half-precision mode without loss in reconstruction accuracy.
Abstract
High-energy large-scale particle colliders produce data at high speed in the order of 1 terabytes per second in nuclear physics and petabytes per second in high-energy physics. Developing real-time data compression algorithms to reduce such data at high throughput to fit permanent storage has drawn increasing attention. Specifically, at the newly constructed sPHENIX experiment at the Relativistic Heavy Ion Collider (RHIC), a time projection chamber is used as the main tracking detector, which records particle trajectories in a volume of a three-dimensional (3D) cylinder. The resulting data are usually very sparse with occupancy around 10.8%. Such sparsity presents a challenge to conventional learning-free lossy compression algorithms, such as SZ, ZFP, and MGARD. The 3D convolutional neural network (CNN)-based approach, Bicephalous Convolutional Autoencoder (BCAE), outperforms traditional methods both in compression rate and reconstruction accuracy. BCAE can also utilize the computation power of graphical processing units suitable for deployment in a modern heterogeneous high-performance computing environment. This work introduces two BCAE variants: BCAE++ and BCAE-2D. BCAE++ achieves a 15% better compression ratio and a 77% better reconstruction accuracy measured in mean absolute error compared with BCAE. BCAE-2D treats the radial direction as the channel dimension of an image, resulting in a 3x speedup in compression throughput. In addition, we demonstrate an unbalanced autoencoder with a larger decoder can improve reconstruction accuracy without significantly sacrificing throughput. Lastly, we observe both the BCAE++ and BCAE-2D can benefit more from using half-precision mode in throughput (76-79% increase) without loss in reconstruction accuracy. The source code and links to data and pretrained models can be found at https://github.com/BNL-DAQ-LDRD/NeuralCompression_v2.
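A hedged sketch of a "bicephalous" (two-headed) convolutional autoencoder for sparse detector data is given below: a shared encoder compresses the input, and two decoder heads reconstruct it, one classifying which cells are occupied and one regressing their values. The channel counts, layer depths, and occupancy threshold are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class BCAESketch(nn.Module):
    def __init__(self, in_ch=1, latent_ch=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, latent_ch, 3, stride=2, padding=1),      # compressed code
        )
        def decoder(out_ch):
            return nn.Sequential(
                nn.ConvTranspose2d(latent_ch, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(16, out_ch, 4, stride=2, padding=1),
            )
        self.head_occupancy = decoder(1)    # logits: is this cell hit?
        self.head_value = decoder(1)        # regressed signal value for hit cells

    def forward(self, x):
        z = self.encoder(x)
        return self.head_occupancy(z), self.head_value(z)

x = torch.rand(4, 1, 64, 64)                # stand-in for (radial-slice) detector frames
occ_logit, value = BCAESketch()(x)
hit_mask = (x > 0.9).float()                # illustrative occupancy threshold
loss = (nn.functional.binary_cross_entropy_with_logits(occ_logit, hit_mask)
        + ((value - x) ** 2 * hit_mask).mean())
```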
Leveraging Deep Learning for Abstractive Code Summarization of Unofficial Documentation
results: Evaluated against an oracle of human-written summaries and prior work, the deep learning approach improves summary quality, with average gains of 57% in precision, 66% in recall, and 61% in F-measure, and runs 4.4 times faster.
Abstract
Usually, programming languages have official documentation to guide developers with APIs, methods, and classes. However, researchers identified insufficient or inadequate documentation examples and flaws with the API's complex structure as barriers to learning an API. As a result, developers may consult other sources (StackOverflow, GitHub, etc.) to learn more about an API. Recent research studies have shown that unofficial documentation is a valuable source of information for generating code summaries. We, therefore, have been motivated to leverage such a type of documentation along with deep learning techniques towards generating high-quality summaries for APIs discussed in informal documentation. This paper proposes an automatic approach using the BART algorithm, a state-of-the-art transformer model, to generate summaries for APIs discussed in StackOverflow. We built an oracle of human-generated summaries to evaluate our approach against it using ROUGE and BLEU metrics which are the most widely used evaluation metrics in text summarization. Furthermore, we evaluated our summaries empirically against a previous work in terms of quality. Our findings demonstrate that using deep learning algorithms can improve summaries' quality and outperform the previous work by an average of 57% for precision, 66% for recall, and 61% for F-measure, and it runs 4.4 times faster.
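For orientation, the snippet below shows what abstractive summarization with a BART checkpoint looks like using the Hugging Face `transformers` library. The checkpoint name, example post, and generation settings are illustrative assumptions; the paper fine-tunes BART on its own StackOverflow-derived corpus rather than using an off-the-shelf model.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

model_name = "facebook/bart-large-cnn"          # assumed stand-in checkpoint
tokenizer = BartTokenizer.from_pretrained(model_name)
model = BartForConditionalGeneration.from_pretrained(model_name)

post = "You can use pandas.read_csv with the chunksize argument to stream large files..."
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=60, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```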
Neural Snowflakes: Universal Latent Graph Inference via Trainable Latent Geometries
paper_authors: Haitz Sáez de Ocáriz Borde, Anastasis Kratsios
for: Improving the predictive performance of graph neural networks (GNNs) by dynamically rewiring or inferring the GNN's graph.
methods: A trainable deep learning architecture, called the neural snowflake, that can adaptively implement fractal-like metrics on $\mathbb{R}^d$, combined with a standard multilayer perceptron (MLP) encoder.
results: Any given finite weighted graph can be isometrically embedded by a standard MLP encoder; when the latent graph can be represented in the feature space of a sufficiently regular kernel, the combined neural snowflake and MLP encoder avoid the curse of dimensionality, using only a low-degree polynomial number of parameters in the number of nodes and enabling a low-dimensional isometric embedding of the latent graph. On synthetic experiments and graph benchmarks, the neural snowflake shows superior metric learning capabilities and matches or surpasses state-of-the-art latent graph inference models.
The inductive bias of a graph neural network (GNN) is largely encoded in its specified graph. Latent graph inference relies on latent geometric representations to dynamically rewire or infer a GNN's graph to maximize the GNN's predictive downstream performance, but it lacks solid theoretical foundations in terms of embedding-based representation guarantees. This paper addresses this issue by introducing a trainable deep learning architecture, coined neural snowflake, that can adaptively implement fractal-like metrics on $\mathbb{R}^d$. We prove that any given finite weights graph can be isometrically embedded by a standard MLP encoder. Furthermore, when the latent graph can be represented in the feature space of a sufficiently regular kernel, we show that the combined neural snowflake and MLP encoder do not succumb to the curse of dimensionality by using only a low-degree polynomial number of parameters in the number of nodes. This implementation enables a low-dimensional isometric embedding of the latent graph. We conduct synthetic experiments to demonstrate the superior metric learning capabilities of neural snowflakes when compared to more familiar spaces like Euclidean space. Additionally, we carry out latent graph inference experiments on graph benchmarks. Consistently, the neural snowflake model achieves predictive performance that either matches or surpasses that of the state-of-the-art latent graph inference models. Importantly, this performance improvement is achieved without requiring random search for optimal latent geometry. Instead, the neural snowflake model achieves this enhancement in a differentiable manner.
摘要
图神经网络(GNN)的归纳偏置主要编码在其指定的图中。潜在图推断依赖潜在的几何表示来动态重连或推断GNN所用的图,以最大化其下游预测性能,但在基于嵌入的表示保证方面缺乏坚实的理论基础。本文通过引入一种可训练的深度学习架构——神经雪花(neural snowflake)——来解决这一问题,该架构能够在 $\mathbb{R}^d$ 上自适应地实现类分形度量。我们证明任意给定的有限带权图都可以由标准的MLP编码器等距嵌入。此外,当潜在图可以在足够正则的核的特征空间中表示时,我们证明神经雪花与MLP编码器的组合只需关于节点数的低次多项式数量的参数,从而避免维度灾难。这一实现使潜在图得以低维等距嵌入。我们通过合成实验展示了神经雪花相对于欧几里得空间等更常见空间的度量学习优势,并在图基准数据集上进行了潜在图推断实验。神经雪花模型的预测性能始终匹配或超越现有最优的潜在图推断模型,而且这一性能提升无需对最优潜在几何进行随机搜索,而是以可微的方式实现。
results: 论文表明,相比现有文献中的提议,其机制可以减少金融风险。Abstract
Machine learning tasks are vulnerable to the quality of data used as input. Yet, it is often challenging for firms to obtain adequate datasets, as data are naturally distributed amongst owners who, in practice, may be competitors in a downstream market and reluctant to share information. Focusing on supervised learning for regression tasks, we develop a \textit{regression market} to provide a monetary incentive for data sharing. Our proposed mechanism adopts a Bayesian framework, allowing us to consider a more general class of regression tasks. We present a thorough exploration of the market properties, and show that similar proposals in current literature expose the market agents to sizeable financial risks, which can be mitigated in our probabilistic setting.
摘要
机器学习任务的效果很容易受到输入数据质量的影响。然而,公司往往很难获得足够的数据集,因为数据天然地分散在各个所有者手中,而这些所有者在下游市场中可能互为竞争对手,不愿分享信息。我们聚焦于监督学习中的回归任务,设计了一个"回归市场",为数据共享提供经济激励。我们提出的机制采用贝叶斯框架,使我们能够考虑更一般的一类回归任务。我们对市场性质进行了深入探讨,并证明现有文献中的类似提议会让市场参与者承担相当大的金融风险,而这些风险在我们的概率设定中可以得到缓解。
Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate
results: 在多种序列建模任务中,DMU的表现优于其他最先进的门控RNN模型,且使用的参数量明显更少,并在语音识别、雷达手势识别、ECG波形分割和置换序列图像分类等应用中取得了优秀的效果。Abstract
Recurrent Neural Networks (RNNs) are renowned for their adeptness in modeling temporal dependencies, a trait that has driven their widespread adoption for sequential data processing. Nevertheless, vanilla RNNs are confronted with the well-known issue of gradient vanishing and exploding, posing a significant challenge for learning and establishing long-range dependencies. Additionally, gated RNNs tend to be over-parameterized, resulting in poor network generalization. To address these challenges, we propose a novel Delayed Memory Unit (DMU) in this paper, wherein a delay line structure, coupled with delay gates, is introduced to facilitate temporal interaction and temporal credit assignment, so as to enhance the temporal modeling capabilities of vanilla RNNs. Particularly, the DMU is designed to directly distribute the input information to the optimal time instant in the future, rather than aggregating and redistributing it over time through intricate network dynamics. Our proposed DMU demonstrates superior temporal modeling capabilities across a broad range of sequential modeling tasks, utilizing considerably fewer parameters than other state-of-the-art gated RNN models in applications such as speech recognition, radar gesture recognition, ECG waveform segmentation, and permuted sequential image classification.
摘要
循环神经网络(RNN)因其擅长建模时间相关性而被广泛用于序列数据处理。然而,原始(vanilla)RNN受到梯度消失和梯度爆炸问题的困扰,使得学习和建立长距离时间相关性变得困难。另外,门控RNN往往参数过多,导致网络泛化能力不佳。为了解决这些挑战,我们在本文中提出了一种新的延迟记忆单元(DMU),其中引入了延迟线结构和延迟门,以促进时间上的相互作用和时间信用分配,从而增强vanilla RNN的时间建模能力。具体来说,DMU被设计为直接将输入信息分配到未来的最佳时间点,而不是通过复杂的网络动力学在时间上聚合并重新分配信息。我们提出的DMU在多种序列建模任务中展现出更优的时间建模能力,并且在语音识别、雷达手势识别、ECG波形分割和置换序列图像分类等应用中,所用参数量远少于其他最先进的门控RNN模型。
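The abstract describes a delay-line structure with delay gates that routes input information to the most suitable future time instant. The snippet below is only a toy sketch of that idea under simplifying assumptions (a soft gate over a small fixed number of delays, a plain RNNCell as the recurrent core); the ToyDelayGate module and its dimensions are illustrative and are not the paper's DMU.

```python
import torch
import torch.nn as nn

class ToyDelayGate(nn.Module):
    """Toy sketch of the delay-line idea (not the paper's exact DMU): a learned gate
    distributes each input over a small buffer of future time steps."""

    def __init__(self, input_dim: int, hidden_dim: int, max_delay: int = 4):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.max_delay = max_delay
        self.proj = nn.Linear(input_dim, hidden_dim)
        self.delay_gate = nn.Linear(input_dim, max_delay)   # soft choice of delay
        self.cell = nn.RNNCell(hidden_dim, hidden_dim)

    def forward(self, x):                                   # x: (batch, time, input_dim)
        batch, steps, _ = x.shape
        hidden = x.new_zeros(batch, self.hidden_dim)
        # slots[t] accumulates everything routed to time step t by earlier inputs
        slots = [x.new_zeros(batch, self.hidden_dim) for _ in range(steps + self.max_delay)]
        outputs = []
        for t in range(steps):
            gate = torch.softmax(self.delay_gate(x[:, t]), dim=-1)   # (batch, max_delay)
            feat = self.proj(x[:, t])                                # (batch, hidden_dim)
            for d in range(self.max_delay):                          # route to future slots
                slots[t + d] = slots[t + d] + gate[:, d:d + 1] * feat
            hidden = self.cell(slots[t], hidden)
            outputs.append(hidden)
        return torch.stack(outputs, dim=1)                  # (batch, time, hidden_dim)

seq = torch.randn(2, 10, 8)
print(ToyDelayGate(8, 16)(seq).shape)                       # torch.Size([2, 10, 16])
```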
Reinforcement learning in large, structured action spaces: A simulation study of decision support for spinal cord injury rehabilitation
results: 通过一项模拟SCI康复特性的仿真研究,我们发现两种方法都可以帮助物理治疗师做出更有效的治疗决策,但基于领域知识的方法表现更好。我们的发现表明RL可以用于改进SCI康复的治疗方案,并且继续收集数据并在该领域应用RL是值得的。Abstract
Reinforcement learning (RL) has helped improve decision-making in several applications. However, applying traditional RL is challenging in some applications, such as rehabilitation of people with a spinal cord injury (SCI). Among other factors, using RL in this domain is difficult because there are many possible treatments (i.e., large action space) and few patients (i.e., limited training data). Treatments for SCIs have natural groupings, so we propose two approaches to grouping treatments so that an RL agent can learn effectively from limited data. One relies on domain knowledge of SCI rehabilitation and the other learns similarities among treatments using an embedding technique. We then use Fitted Q Iteration to train an agent that learns optimal treatments. Through a simulation study designed to reflect the properties of SCI rehabilitation, we find that both methods can help improve the treatment decisions of physiotherapists, but the approach based on domain knowledge offers better performance. Our findings provide a "proof of concept" that RL can be used to help improve the treatment of those with an SCI and indicates that continued efforts to gather data and apply RL to this domain are worthwhile.
摘要
强化学习(RL)已经在多种应用中帮助改进决策。然而,在某些应用中,例如脊髓损伤(SCI)患者的康复治疗,直接应用传统RL存在挑战。其中的原因包括可选治疗方案众多(即动作空间很大)而患者数量很少(即训练数据有限)。SCI的治疗方案存在天然的分组,因此我们提出了两种对治疗方案进行分组的方法,使RL智能体能够从有限数据中有效学习:一种依赖SCI康复的领域知识,另一种利用嵌入技术学习治疗方案之间的相似性。随后,我们使用Fitted Q Iteration训练一个学习最优治疗方案的智能体。通过一项按照SCI康复特性设计的仿真研究,我们发现两种方法都能帮助物理治疗师改善治疗决策,但基于领域知识的方法表现更好。我们的发现为"RL可用于帮助改善SCI患者的治疗"提供了概念验证,并表明继续收集数据并在该领域应用RL是值得的。
The Fundamental Dilemma of Bayesian Active Meta-learning
paper_authors: Sabina J. Sloman, Ayush Bharti, Samuel Kaski
for: 解决多个多样化、数据稀缺的任务环境中参数的估计问题
methods: 使用 bayesian active meta-learning 方法,即顺序优化实验设计
results: 研究表明,在某些任务下,贪婪地追求可迁移知识反而可能损害可迁移参数的估计(即产生负迁移),而任务识别是降低这一风险的关键。Abstract
Many applications involve estimation of parameters that generalize across multiple diverse, but related, data-scarce task environments. Bayesian active meta-learning, a form of sequential optimal experimental design, provides a framework for solving such problems. The active meta-learner's goal is to gain transferable knowledge (estimate the transferable parameters) in the presence of idiosyncratic characteristics of the current task (task-specific parameters). We show that in such a setting, greedy pursuit of this goal can actually hurt estimation of the transferable parameters (induce so-called negative transfer). The learner faces a dilemma akin to but distinct from the exploration--exploitation dilemma: should they spend their acquisition budget pursuing transferable knowledge, or identifying the current task-specific parameters? We show theoretically that some tasks pose an inevitable and arbitrarily large threat of negative transfer, and that task identification is critical to reducing this threat. Our results generalize to analysis of prior misspecification over nuisance parameters. Finally, we empirically illustrate circumstances that lead to negative transfer.
摘要
许多应用涉及在多个多样化但相关、数据稀少的任务环境中估计可泛化的参数。贝叶斯主动元学习是一种序贯最优实验设计框架,可用于解决这类问题。主动元学习者的目标是在当前任务存在特有特征(任务特定参数)的情况下获得可迁移知识(估计可迁移参数)。我们表明,在这种情形下,贪婪地追求这一目标实际上可能损害可迁移参数的估计(即引发所谓的负迁移)。学习者面临一种与探索—利用困境类似但又不同的两难:应当把采集预算用于追求可迁移知识,还是用于确定当前任务的特定参数?我们从理论上证明,某些任务会带来不可避免且可以任意大的负迁移风险,而任务识别对于降低这一风险至关重要。我们的结果可推广到对干扰参数先验误设的分析。最后,我们通过实验展示了导致负迁移的具体情形。
methods: 该论文提议 AdamQLR 优化器,它结合了 K-FAC 的抑制策略和 Adam 的更新方向,并且通过对 Adam 的解释来提高计算效率。
results: 该论文在一系列回归和分类任务上测试了 AdamQLR,并实现了与计算时间成正相关的总体化表现。Abstract
Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). We seek to combine the benefits of both approaches into a single computationally-efficient algorithm. Noting that second-order methods often depend on stabilising heuristics (such as Levenberg-Marquardt damping), we propose AdamQLR: an optimiser combining damping and learning rate selection techniques from K-FAC (Martens and Grosse, 2015) with the update directions proposed by Adam, inspired by considering Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales, achieving competitive generalisation performance vs runtime.
摘要
XTSC-Bench: Quantitative Benchmarking for Explainers on Time Series Classification
results: 研究发现,对多变量时间序列数据的解释方法需要进一步改进正确性和可靠性,特别是在多变量情况下。Abstract
Despite the growing body of work on explainable machine learning in time series classification (TSC), it remains unclear how to evaluate different explainability methods. Resorting to qualitative assessment and user studies to evaluate explainers for TSC is difficult since humans have difficulties understanding the underlying information contained in time series data. Therefore, a systematic review and quantitative comparison of explanation methods to confirm their correctness becomes crucial. While steps to standardized evaluations were taken for tabular, image, and textual data, benchmarking explainability methods on time series is challenging due to a) traditional metrics not being directly applicable, b) implementation and adaption of traditional metrics for time series in the literature vary, and c) varying baseline implementations. This paper proposes XTSC-Bench, a benchmarking tool providing standardized datasets, models, and metrics for evaluating explanation methods on TSC. We analyze 3 perturbation-, 6 gradient- and 2 example-based explanation methods to TSC showing that improvements in the explainers' robustness and reliability are necessary, especially for multivariate data.
摘要
尽管针对时间序列分类(TSC)的可解释机器学习研究日益增多,如何评估不同的可解释方法仍不清楚。由于人类难以理解时间序列数据所蕴含的信息,依靠定性评估和用户研究来评价TSC的解释器十分困难。因此,对各种解释方法进行系统性回顾和定量比较以确认其正确性就变得至关重要。虽然针对表格、图像和文本数据已经有了标准化评估的尝试,但在时间序列上对可解释方法进行基准测试仍然困难,原因包括:a) 传统指标无法直接适用;b) 文献中针对时间序列对传统指标的实现和改编各不相同;c) 基线实现存在差异。本文提出了XTSC-Bench,一个用于评估TSC解释方法的基准测试工具,提供标准化的数据集、模型和指标。我们对3种基于扰动、6种基于梯度和2种基于样例的解释方法进行了分析,结果表明这些解释器在鲁棒性和可靠性方面仍需改进,尤其是在多变量数据上。
paper_authors: Alejandro Tejada-Lapuerta, Paul Bertin, Stefan Bauer, Hananeh Aliee, Yoshua Bengio, Fabian J. Theis
for: The paper is written to explore the application of causal techniques and algorithms to handle high-dimensional data in single-cell genomics, and to challenge the assumptions of current causal approaches from a biological perspective.
methods: The paper uses large-scale perturbation screens and single-cell omics technologies to measure the effect of targeted perturbations on the whole transcriptome, and discusses the application of established causal techniques and algorithms to handle high-dimensional data in this context.
results: The paper identifies open problems in the application of causal approaches to single-cell data, including generalizing to unseen environments, learning interpretable models, and learning causal models of dynamics, and discusses various research directions to address these challenges.Abstract
Advances in single-cell omics allow for unprecedented insights into the transcription profiles of individual cells. When combined with large-scale perturbation screens, through which specific biological mechanisms can be targeted, these technologies allow for measuring the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes such as gene regulation, disease progression or cellular development. However, the high-dimensional nature of the data, coupled with the intricate complexity of biological systems renders this task nontrivial. Within the machine learning community, there has been a recent increase of interest in causality, with a focus on adapting established causal techniques and algorithms to handle high-dimensional data. In this perspective, we delineate the application of these methodologies within the realm of single-cell genomics and their challenges. We first present the model that underlies most of current causal approaches to single-cell biology and discuss and challenge the assumptions it entails from the biological point of view. We then identify open problems in the application of causal approaches to single-cell data: generalising to unseen environments, learning interpretable models, and learning causal models of dynamics. For each problem, we discuss how various research directions - including the development of computational approaches and the adaptation of experimental protocols - may offer ways forward, or on the contrary pose some difficulties. With the advent of single cell atlases and increasing perturbation data, we expect causal models to become a crucial tool for informed experimental design.
摘要
单细胞组学技术的进步使我们能够以前所未有的精度观察单个细胞的转录谱。当与大规模扰动筛选(可针对特定生物学机制)结合使用时,这些技术可以测量定向扰动对整个转录组的影响。这些进展为更好地理解基因在基因调控、疾病进展或细胞发育等复杂生物过程中的因果作用提供了机会。然而,数据的高维特性加上生物系统的高度复杂性,使这一任务并不简单。在机器学习社区中,近年来对因果性的兴趣不断增加,重点是将已有的因果技术和算法改造为能处理高维数据。本文从这一视角出发,阐述了这些方法在单细胞基因组学中的应用及其挑战。我们首先介绍当前大多数单细胞因果方法所依赖的模型,并从生物学角度讨论和质疑其所隐含的假设。随后,我们指出了因果方法应用于单细胞数据时的若干开放问题:向未见环境泛化、学习可解释模型,以及学习动力学的因果模型。针对每个问题,我们讨论了不同的研究方向——包括计算方法的开发和实验方案的调整——可能提供的出路或带来的困难。随着单细胞图谱的出现和扰动数据的不断增加,我们预计因果模型将成为指导实验设计的关键工具。
Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks
results: 在实验中,这个算法在训练 ResNet-18 模型在 CIFAR-10 上的过程中,与其他一般优化方法相比,具有相似的runtime和优化性能。Abstract
Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns - to our knowledge, the first efficiently-scalable optimisation algorithm to asymptotically use the exact (eigenvalue-modified) inverse Hessian. Our method frames the problem as a series which principally square-roots and inverts the squared Hessian, then uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
摘要
尽管二阶拟牛顿方法在连续优化领域很受欢迎,但在机器学习中却难以应用,因为Hessian矩阵过于庞大而无法直接处理。这一计算负担还会因需要处理非凸性而进一步加重,例如Saddle-Free Newton方法需要修改Hessian的特征值。我们提出了一种同时解决这两个问题的优化算法——据我们所知,这是第一个可高效扩展、并在渐近意义上使用精确(特征值修改后)Hessian逆的优化算法。我们的方法将问题表述为一个级数,该级数主要对平方后的Hessian进行开方并求逆,然后用它来预处理梯度向量,整个过程无需显式计算或对Hessian进行特征分解。对这一无穷级数进行截断便得到一种新的优化算法,它具有可扩展性,并且在运行时间和优化性能上都可与其他一阶和二阶优化方法相当。我们在多种设置中展示了这一点,包括在CIFAR-10上训练ResNet-18。
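The method's key primitive is that exact Hessian-vector products are cheap via double backprop, so a truncated series can precondition the gradient without ever forming or eigendecomposing the Hessian. The sketch below illustrates that primitive with a generic damped Neumann-series preconditioner on a toy least-squares problem; the series used here is a stand-in for illustration, not the paper's square-root-and-invert construction.

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Exact Hessian-vector product via double backprop (no explicit Hessian)."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat_grad @ vec, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv])

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)
X, y = torch.randn(20, 5), torch.randn(20)
loss = ((X @ w - y) ** 2).mean()
grad = torch.autograd.grad(loss, w, create_graph=True)[0]

# Damped Neumann-series sketch of (H + lam*I)^{-1} g built only from HVPs:
#   (H + lam*I)^{-1} g ~= eta * sum_k (I - eta*(H + lam*I))^k g  (truncated at few terms)
eta, lam, terms = 0.1, 1.0, 10
g = grad.detach()
term = g.clone()
update = eta * term
for _ in range(terms - 1):
    hv = hessian_vector_product(loss, [w], term)
    term = term - eta * (hv + lam * term)
    update = update + eta * term
print("preconditioned step norm:", update.norm().item())
```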
Beyond Bayesian Model Averaging over Paths in Probabilistic Programs with Stochastic Support
methods: 论文使用的方法包括 Bayesian model averaging (BMA) 和 two alternative mechanisms for path weighting:一种是基于 stacking 的方法,另一种是基于 PAC-Bayes 的方法。
results: 试验表明,使用这两种机制可以更好地改善 predictions 的准确性,比 Default BMA weights 更加稳定和有效。Abstract
The posterior in probabilistic programs with stochastic support decomposes as a weighted sum of the local posterior distributions associated with each possible program path. We show that making predictions with this full posterior implicitly performs a Bayesian model averaging (BMA) over paths. This is potentially problematic, as model misspecification can cause the BMA weights to prematurely collapse onto a single path, leading to sub-optimal predictions in turn. To remedy this issue, we propose alternative mechanisms for path weighting: one based on stacking and one based on ideas from PAC-Bayes. We show how both can be implemented as a cheap post-processing step on top of existing inference engines. In our experiments, we find them to be more robust and lead to better predictions compared to the default BMA weights.
摘要
在具有随机支撑的概率程序中,后验分布可以分解为各条可能程序路径所对应的局部后验分布的加权和。我们指出,使用这一完整后验进行预测,实际上隐式地在路径上执行了贝叶斯模型平均(BMA)。这可能带来问题:模型误设会使BMA权重过早地塌缩到单一路径上,进而导致次优的预测。为了解决这一问题,我们提出了两种替代的路径加权机制:一种基于堆叠(stacking),另一种基于PAC-Bayes的思想。我们展示了这两种机制都可以作为现有推断引擎之上的廉价后处理步骤来实现。在我们的实验中,它们比默认的BMA权重更加稳健,并带来更好的预测。
results: 比random prior方法更高效地解决经典控制问题和普通的探索任务,大幅提高样本效率。Abstract
In Reinforcement Learning (RL), agents aim at maximizing cumulative rewards in a given environment. During the learning process, RL agents face the dilemma of exploitation and exploration: leveraging existing knowledge to acquire rewards or seeking potentially higher ones. Using uncertainty as a guiding principle provides an active and effective approach to solving this dilemma and ensemble-based methods are one of the prominent avenues for quantifying uncertainty. Nevertheless, conventional ensemble-based uncertainty estimation lacks an explicit prior, deviating from Bayesian principles. Besides, this method requires diversity among members to generate less biased uncertainty estimation results. To address the above problems, previous research has incorporated random functions as priors. Building upon these foundational efforts, our work introduces an innovative approach with delicately designed prior NNs, which can incorporate maximal diversity in the initial value functions of RL. Our method has demonstrated superior performance compared with the random prior approaches in solving classic control problems and general exploration tasks, significantly improving sample efficiency.
摘要
在强化学习(RL)中,智能体的目标是在给定环境中最大化累积奖励。在学习过程中,RL智能体面临利用与探索的两难:是利用已有知识获取奖励,还是去寻找潜在更高的奖励。以不确定性为指导原则为解决这一两难提供了一种主动而有效的途径,而基于集成的方法是量化不确定性的主要途径之一。然而,传统的基于集成的不确定性估计缺乏显式的先验,偏离了贝叶斯原则;此外,这类方法需要集成成员之间具有多样性,才能得到偏差较小的不确定性估计结果。为了解决上述问题,已有研究将随机函数作为先验引入。在这些基础工作之上,我们提出了一种创新方法,通过精心设计的先验神经网络,在RL的初始价值函数中引入最大程度的多样性。在求解经典控制问题和一般探索任务时,我们的方法相比随机先验方法表现更优,显著提高了样本效率。
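The abstract builds on the idea of attaching an explicit prior to each ensemble member's value function. The sketch below shows the generic "trainable network plus frozen additive prior" construction from the random-prior line of work that the paper improves upon; the architecture, beta value and ensemble size are illustrative, and the paper's specially designed priors are not reproduced here.

```python
import torch
import torch.nn as nn

class PriorQNet(nn.Module):
    """Q-network with an additive frozen prior: Q(s) = f_theta(s) + beta * p(s).
    A sketch of the random-prior construction this line of work builds on."""

    def __init__(self, obs_dim, n_actions, beta=3.0):
        super().__init__()
        self.trainable = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                       nn.Linear(64, n_actions))
        self.prior = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                   nn.Linear(64, n_actions))
        for p in self.prior.parameters():      # the prior stays fixed forever
            p.requires_grad_(False)
        self.beta = beta

    def forward(self, obs):
        return self.trainable(obs) + self.beta * self.prior(obs)

# An ensemble of such networks; disagreement across members acts as an
# uncertainty signal that can guide exploration.
ensemble = [PriorQNet(obs_dim=4, n_actions=2) for _ in range(5)]
obs = torch.randn(1, 4)
q_values = torch.stack([net(obs) for net in ensemble])   # (5, 1, 2)
print("epistemic std per action:", q_values.std(dim=0))
```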
results: 我们在多个数据集和数据分布设置下进行了实验,并评估了我们的方法在 clustering 分数、准确率和 v-度量上的性能。结果表明,我们的方法可以与中央化 classical k-means 基线方法匹配表现,并在实际场景中超越现有的 federated clustering 方法。Abstract
Federated clustering is an important part of the field of federated machine learning, that allows multiple data sources to collaboratively cluster their data while keeping it decentralized and preserving privacy. In this paper, we introduce a novel federated clustering algorithm, named Dynamically Weighted Federated k-means (DWF k-means), to address the challenges posed by distributed data sources and heterogeneous data. Our proposed algorithm combines the benefits of traditional clustering techniques with the privacy and scalability advantages of federated learning. It enables multiple data owners to collaboratively cluster their local data while exchanging minimal information with a central coordinator. The algorithm optimizes the clustering process by adaptively aggregating cluster assignments and centroids from each data source, thereby learning a global clustering solution that reflects the collective knowledge of the entire federated network. We conduct experiments on multiple datasets and data distribution settings to evaluate the performance of our algorithm in terms of clustering score, accuracy, and v-measure. The results demonstrate that our approach can match the performance of the centralized classical k-means baseline, and outperform existing federated clustering methods in realistic scenarios.
摘要
联邦聚类是联邦机器学习领域的重要组成部分,它允许多个数据源在保持数据去中心化和保护隐私的前提下协同对各自的数据进行聚类。在本文中,我们提出了一种新的联邦聚类算法,名为动态加权联邦k-means(DWF k-means),以应对分布式数据源和异构数据带来的挑战。我们提出的算法结合了传统聚类技术的优点以及联邦学习在隐私和可扩展性方面的优势。它使多个数据拥有者能够协同对本地数据进行聚类,同时只需与中央协调者交换极少的信息。该算法通过自适应地聚合来自各数据源的聚类分配和聚类中心来优化聚类过程,从而学习到一个反映整个联邦网络集体知识的全局聚类解。我们在多个数据集和多种数据分布设置下进行了实验,从聚类得分、准确率和v-measure三个方面评估了算法的性能。结果表明,我们的方法可以达到与集中式经典k-means基线相当的性能,并在贴近现实的场景中超越现有的联邦聚类方法。
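A hedged sketch of the overall federated k-means pattern the abstract describes: each client refines the current global centroids on its local data, and the coordinator aggregates the results weighted by local cluster sizes. The exact dynamic weighting of DWF k-means is not reproduced; the helper and toy data below are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def federated_kmeans_round(client_datasets, global_centroids):
    """One communication round of weighted federated k-means (illustrative sketch):
    each client refines the global centroids locally, and the server averages the
    local statistics weighted by cluster sizes."""
    k = global_centroids.shape[0]
    sums = np.zeros_like(global_centroids)
    counts = np.zeros(k)
    for data in client_datasets:
        km = KMeans(n_clusters=k, init=global_centroids, n_init=1).fit(data)
        for j in range(k):
            members = data[km.labels_ == j]
            if len(members):
                sums[j] += members.sum(axis=0)
                counts[j] += len(members)
    counts = np.maximum(counts, 1)            # avoid empty-cluster division
    return sums / counts[:, None]

rng = np.random.default_rng(0)
clients = [rng.normal(loc=c, size=(100, 2)) for c in ([0, 0], [5, 5], [0, 5])]
centroids = rng.normal(size=(3, 2))
for _ in range(5):
    centroids = federated_kmeans_round(clients, centroids)
print(np.round(centroids, 2))
```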
Zero-knowledge Proof Meets Machine Learning in Verifiability: A Survey
for: This paper focuses on the trustworthiness problem of machine learning model computations, specifically in outsourced learning and federated learning settings. It proposes a solution using zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology.
methods: The paper analyzes potential verifiability issues in different machine learning scenarios and provides a formal definition of ZKP-VML. It also classifies existing works based on their technical approaches and discusses key challenges and future directions in the field.
results: The paper presents a comprehensive survey of ZKP-based VML technology and its applications in machine learning. It provides a detailed analysis of existing works and identifies key challenges and future research directions in the field.Abstract
With the rapid advancement of artificial intelligence technology, the usage of machine learning models is gradually becoming part of our daily lives. High-quality models rely not only on efficient optimization algorithms but also on the training and learning processes built upon vast amounts of data and computational power. However, in practice, due to various challenges such as limited computational resources and data privacy concerns, users in need of models often cannot train machine learning models locally. This has led them to explore alternative approaches such as outsourced learning and federated learning. While these methods address the feasibility of model training effectively, they introduce concerns about the trustworthiness of the training process since computations are not performed locally. Similarly, there are trustworthiness issues associated with outsourced model inference. These two problems can be summarized as the trustworthiness problem of model computations: How can one verify that the results computed by other participants are derived according to the specified algorithm, model, and input data? To address this challenge, verifiable machine learning (VML) has emerged. This paper presents a comprehensive survey of zero-knowledge proof-based verifiable machine learning (ZKP-VML) technology. We first analyze the potential verifiability issues that may exist in different machine learning scenarios. Subsequently, we provide a formal definition of ZKP-VML. We then conduct a detailed analysis and classification of existing works based on their technical approaches. Finally, we discuss the key challenges and future directions in the field of ZKP-based VML.
摘要
随着人工智能技术的快速发展,机器学习模型的使用正逐渐融入我们的日常生活。高质量的模型不仅依赖高效的优化算法,还依赖建立在海量数据和强大算力之上的训练与学习过程。然而在实践中,由于计算资源有限、数据隐私等诸多挑战,需要模型的用户往往无法在本地训练机器学习模型,因而转向外包学习、联邦学习等替代方案。这些方法虽然有效地解决了模型训练的可行性问题,但由于计算并非在本地完成,也带来了训练过程可信性方面的担忧;类似地,外包的模型推理同样存在可信性问题。这两个问题可以概括为模型计算的可信性问题:如何验证其他参与者计算得到的结果确实是按照指定的算法、模型和输入数据得出的?为了解决这一挑战,可验证机器学习(VML)应运而生。本文对基于零知识证明的可验证机器学习(ZKP-VML)技术进行了全面综述。我们首先分析不同机器学习场景中可能存在的可验证性问题,随后给出ZKP-VML的形式化定义,接着依据技术路线对现有工作进行详细分析和分类,最后讨论ZKP-VML领域的关键挑战与未来方向。
ULTRA-DP: Unifying Graph Pre-training with Multi-task Graph Dual Prompt
results: 对 hybrid pre-training 方法进行了改进,并在不同级别(node-node level和node-group level)进行了融合,提高了表现。广泛的实验表明,我们提出的 ULTRA-DP 可以显著提高 hybrid pre-training 方法的表现,并且具有普适性和可移植性。Abstract
Recent research has demonstrated the efficacy of pre-training graph neural networks (GNNs) to capture the transferable graph semantics and enhance the performance of various downstream tasks. However, the semantic knowledge learned from pretext tasks might be unrelated to the downstream task, leading to a semantic gap that limits the application of graph pre-training. To reduce this gap, traditional approaches propose hybrid pre-training to combine various pretext tasks together in a multi-task learning fashion and learn multi-grained knowledge, which, however, cannot distinguish tasks and results in some transferable task-specific knowledge distortion by each other. Moreover, most GNNs cannot distinguish nodes located in different parts of the graph, making them fail to learn position-specific knowledge and lead to suboptimal performance. In this work, inspired by the prompt-based tuning in natural language processing, we propose a unified framework for graph hybrid pre-training which injects the task identification and position identification into GNNs through a prompt mechanism, namely multi-task graph dual prompt (ULTRA-DP). Based on this framework, we propose a prompt-based transferability test to find the most relevant pretext task in order to reduce the semantic gap. To implement the hybrid pre-training tasks, beyond the classical edge prediction task (node-node level), we further propose a novel pre-training paradigm based on a group of $k$-nearest neighbors (node-group level). The combination of them across different scales is able to comprehensively express more structural semantics and derive richer multi-grained knowledge. Extensive experiments show that our proposed ULTRA-DP can significantly enhance the performance of hybrid pre-training methods and show the generalizability to other pre-training tasks and backbone architectures.
摘要
Sharp error bounds for imbalanced classification: how many examples in the minority class?
paper_authors: Anass Aghbalou, François Portier, Anne Sabourin
for: This paper addresses the challenge of imbalanced classification data, specifically when the rare class probability approaches zero.
methods: The paper presents two novel contributions: a non-asymptotic fast rate probability bound for constrained balanced empirical risk minimization, and a consistent upper bound for balanced nearest neighbors estimates.
results: The findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.
results: 研究结果为偏心分类问题提供了更清晰的理解,打开了新的研究方向,对这一领域进行进一步的探索。Abstract
When dealing with imbalanced classification data, reweighting the loss function is a standard procedure allowing to equilibrate between the true positive and true negative rates within the risk measure. Despite significant theoretical work in this area, existing results do not adequately address a main challenge within the imbalanced classification framework, which is the negligible size of one class in relation to the full sample size and the need to rescale the risk function by a probability tending to zero. To address this gap, we present two novel contributions in the setting where the rare class probability approaches zero: (1) a non asymptotic fast rate probability bound for constrained balanced empirical risk minimization, and (2) a consistent upper bound for balanced nearest neighbors estimates. Our findings provide a clearer understanding of the benefits of class-weighting in realistic settings, opening new avenues for further research in this field.
摘要
在处理类别不平衡的分类数据时,对损失函数重新加权是一种标准做法,可以在风险度量中平衡真阳性率与真阴性率。尽管该领域已有大量理论工作,现有结果仍未能充分应对不平衡分类框架中的一个核心难题:某一类别的样本量相对于总样本量可以忽略不计,因而需要用一个趋于零的概率对风险函数进行重新缩放。针对这一空白,我们在稀有类别概率趋于零的设定下给出两项新贡献:(1) 针对带约束的平衡经验风险最小化的非渐近快速率概率界;(2) 针对平衡最近邻估计的一致上界。我们的结果有助于更清楚地理解类别加权在现实场景中的益处,并为该领域的进一步研究开辟了新方向。
Text2Topic: Multi-Label Text Classification System for Efficient Topic Detection in User Generated Content with Zero-Shot Capabilities
paper_authors: Fengjun Wang, Moran Beladev, Ofri Kleinfeld, Elina Frayerman, Tal Shachar, Eran Fainman, Karen Lastmann Assaraf, Sarai Mizrachi, Benjamin Wang
for: This paper proposes a new method for multi-label text classification, called Text to Topic (Text2Topic), which is designed for high-performance and scalability.
methods: The Text2Topic model uses a Bi-Encoder Transformer architecture that employs concatenation, subtraction, and multiplication of embeddings on both text and topic. The model also supports zero-shot predictions, produces domain-specific text embeddings, and enables production-scale batch-inference with high throughput.
results: The final Text2Topic model achieves accurate and comprehensive results compared to state-of-the-art baselines, including large language models (LLMs). In a real-world stream processing platform, the model outperforms other models with 92.9% micro mAP and 75.8% macro mAP scores. The paper also conducts extensive ablation studies to validate the effectiveness of the modeling choices.Abstract
Multi-label text classification is a critical task in the industry. It helps to extract structured information from large amount of textual data. We propose Text to Topic (Text2Topic), which achieves high multi-label classification performance by employing a Bi-Encoder Transformer architecture that utilizes concatenation, subtraction, and multiplication of embeddings on both text and topic. Text2Topic also supports zero-shot predictions, produces domain-specific text embeddings, and enables production-scale batch-inference with high throughput. The final model achieves accurate and comprehensive results compared to state-of-the-art baselines, including large language models (LLMs). In this study, a total of 239 topics are defined, and around 1.6 million text-topic pairs annotations (in which 200K are positive) are collected on approximately 120K texts from 3 main data sources on Booking.com. The data is collected with optimized smart sampling and partial labeling. The final Text2Topic model is deployed on a real-world stream processing platform, and it outperforms other models with 92.9% micro mAP, as well as a 75.8% macro mAP score. We summarize the modeling choices which are extensively tested through ablation studies, and share detailed in-production decision-making steps.
摘要
多标签文本分类是业界的一项关键任务,它有助于从海量文本数据中提取结构化信息。我们提出了文本到主题(Text2Topic)模型,它采用Bi-Encoder Transformer架构,通过对文本和主题的嵌入进行拼接、相减和相乘操作,实现了高性能的多标签分类。Text2Topic还支持零样本预测、生成领域特定的文本嵌入,并支持高吞吐量的生产级批量推理。与包括大语言模型(LLM)在内的最先进基线相比,最终模型取得了准确而全面的结果。在本研究中,我们共定义了239个主题,并在来自Booking.com三大数据源的约12万条文本上收集了约160万条文本—主题对标注(其中20万条为正例)。数据收集采用了优化的智能采样和部分标注策略。最终的Text2Topic模型已部署在真实的流处理平台上,其性能优于其他模型,micro mAP达92.9%,macro mAP达75.8%。我们总结了经消融实验充分验证的建模选择,并分享了生产环境中详细的决策步骤。
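The abstract specifies that text and topic embeddings are combined via concatenation, subtraction and multiplication before classification. The sketch below shows only that interaction head, with stand-in MLP encoders and made-up feature dimensions; it is not the production model or its training setup.

```python
import torch
import torch.nn as nn

class BiEncoderTopicScorer(nn.Module):
    """Sketch of the bi-encoder interaction described in the abstract: text and
    topic embeddings are combined via concatenation, subtraction and multiplication
    before a binary relevance head. The encoders here are stand-in MLPs."""

    def __init__(self, dim=128):
        super().__init__()
        self.text_encoder = nn.Sequential(nn.Linear(300, dim), nn.ReLU())
        self.topic_encoder = nn.Sequential(nn.Linear(300, dim), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, text_feats, topic_feats):
        u = self.text_encoder(text_feats)
        v = self.topic_encoder(topic_feats)
        pair = torch.cat([u, v, torch.abs(u - v), u * v], dim=-1)
        return torch.sigmoid(self.head(pair)).squeeze(-1)   # P(topic applies to text)

scorer = BiEncoderTopicScorer()
text, topic = torch.randn(8, 300), torch.randn(8, 300)
print(scorer(text, topic).shape)   # torch.Size([8])
```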
Learning spatio-temporal patterns with Neural Cellular Automata
paper_authors: Alex D. Richardson, Tibor Antal, Richard A. Blythe, Linus J. Schumacher
for: 学习复杂动力学系统的本质规律
methods: 使用神经细胞自动机(NCA)学习图像时序序列和非线性偏微分方程(PDE)的轨迹
results: 能够学习出较为复杂的动力学系统的本质规律,并且可以在不同的系统中进行泛化
for: The paper is written to explore the use of Neural Cellular Automata (NCA) for learning complex dynamics in systems, particularly in the context of biological pattern formation.
methods: The paper uses NCA, a combination of machine learning and mechanistic modelling, to learn the underlying local rules that govern large-scale dynamic emergent behaviors from time series of images and PDE trajectories.
results: The paper demonstrates that NCA can capture both transient and stable structures within the same system, and can generalize well beyond its training data. The paper also explores the effects of associated hyperparameters on model performance and stability, and shows how to constrain NCA to respect given symmetries.Abstract
Neural Cellular Automata (NCA) are a powerful combination of machine learning and mechanistic modelling. We train NCA to learn complex dynamics from time series of images and PDE trajectories. Our method is designed to identify underlying local rules that govern large scale dynamic emergent behaviours. Previous work on NCA focuses on learning rules that give stationary emergent structures. We extend NCA to capture both transient and stable structures within the same system, as well as learning rules that capture the dynamics of Turing pattern formation in nonlinear Partial Differential Equations (PDEs). We demonstrate that NCA can generalise very well beyond their PDE training data, we show how to constrain NCA to respect given symmetries, and we explore the effects of associated hyperparameters on model performance and stability. Being able to learn arbitrary dynamics gives NCA great potential as a data driven modelling framework, especially for modelling biological pattern formation.
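As a minimal sketch of a neural cellular automaton step (generic identity/Sobel perception followed by a small shared update network, not the exact architecture trained in the paper), one local rule rolled forward in time looks like this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MinimalNCA(nn.Module):
    """Minimal NCA step: each cell perceives its neighbourhood through fixed
    identity/Sobel filters and updates its state with a small shared network."""

    def __init__(self, channels=16, hidden=64):
        super().__init__()
        ident = torch.tensor([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
        sob_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
        kernels = torch.stack([ident, sob_x, sob_x.t()])                # (3, 3, 3)
        # one copy of each filter per channel -> depthwise perception
        self.register_buffer("filters",
                             kernels.repeat(channels, 1, 1).unsqueeze(1))  # (3C,1,3,3)
        self.update = nn.Sequential(nn.Conv2d(3 * channels, hidden, 1), nn.ReLU(),
                                    nn.Conv2d(hidden, channels, 1))
        self.channels = channels

    def forward(self, state):                                           # (B, C, H, W)
        perception = F.conv2d(state, self.filters, padding=1, groups=self.channels)
        return state + self.update(perception)                          # residual local rule

nca = MinimalNCA()
grid = torch.randn(1, 16, 32, 32)
for _ in range(10):                       # roll the local rule forward in time
    grid = nca(grid)
print(grid.shape)
```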
摘要
Mid-Long Term Daily Electricity Consumption Forecasting Based on Piecewise Linear Regression and Dilated Causal CNN
paper_authors: Zhou Lan, Ben Liu, Yi Feng, Danhuang Dong, Peng Zhang
for: 预测每天的电力消耗
methods: 使用分段线性回归和扩张因果CNN(Dilated Causal CNN)进行预测
results: 比现有方法更高的预测精度Abstract
Daily electricity consumption forecasting is a classical problem. Existing forecasting algorithms tend to have decreased accuracy on special dates like holidays. This study decomposes the daily electricity consumption series into three components: trend, seasonal, and residual, and constructs a two-stage prediction method using piecewise linear regression as a filter and Dilated Causal CNN as a predictor. The specific steps involve setting breakpoints on the time axis and fitting the piecewise linear regression model with one-hot encoded information such as month, weekday, and holidays. For the challenging prediction of the Spring Festival, distance is introduced as a variable using a third-degree polynomial form in the model. The residual sequence obtained in the previous step is modeled using Dilated Causal CNN, and the final prediction of daily electricity consumption is the sum of the two-stage predictions. Experimental results demonstrate that this method achieves higher accuracy compared to existing approaches.
摘要
每日电力消耗预测是一个经典问题。现有的预测算法在节假日等特殊日期上的准确率往往会下降。本研究将每日电力消耗序列分解为趋势、季节性和残差三个分量,并构建了一种两阶段预测方法:以分段线性回归作为滤波器,以扩张因果CNN作为预测器。具体步骤是:在时间轴上设置断点,并利用月份、星期几和节假日等独热编码信息拟合分段线性回归模型;针对较难预测的春节,在模型中以三次多项式的形式引入距离变量。上一步得到的残差序列再用扩张因果CNN建模,最终的每日电力消耗预测为两阶段预测之和。实验结果表明,与现有方法相比,该方法取得了更高的准确率。
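A hedged sketch of the first stage described above: a piecewise linear regression on one-hot calendar features plus a trend breakpoint filters the series, and the resulting residuals are what would be fed to the dilated causal CNN. The synthetic data, breakpoint and feature set below are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Stage 1 sketch: calendar/trend regression as a filter; residuals go to the CNN.
rng = np.random.default_rng(0)
dates = pd.date_range("2019-07-01", "2022-06-30", freq="D")
load = (100 + 0.01 * np.arange(len(dates))                   # trend
        + 10 * np.sin(2 * np.pi * dates.dayofyear / 365.25)  # seasonality
        + rng.normal(0, 2, len(dates)))                      # noise

features = pd.get_dummies(pd.DataFrame({
    "month": dates.month.astype(str),
    "weekday": dates.dayofweek.astype(str),
}))
# piecewise trend: a separate slope after an (illustrative) breakpoint
t = np.arange(len(dates))
breakpoint = len(dates) // 2
features["t"] = t
features["t_after_break"] = np.maximum(t - breakpoint, 0)

stage1 = LinearRegression().fit(features, load)
residual = load - stage1.predict(features)   # this series feeds the dilated causal CNN
print("residual std vs raw std:", residual.std().round(2), load.std().round(2))
```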
Principled Approaches for Learning to Defer with Multiple Experts
results: 证明新损失函数具有强$H$-一致性约束,并在几个实践中给出了明确的保证。Abstract
We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong $H$-consistency bounds. We illustrate the application of our analysis through several examples of practical surrogate losses, for which we give explicit guarantees. These loss functions readily lead to the design of new learning to defer algorithms based on their minimization. While the main focus of this work is a theoretical analysis, we also report the results of several experiments on SVHN and CIFAR-10 datasets.
摘要
我们针对带多个专家的"学习何时让渡决策"(learning to defer)这一一般性问题,研究了替代损失函数及相应算法。我们首先提出了一族专为多专家设定设计的新替代损失函数,在该设定下预测函数与让渡函数被同时学习。随后我们证明这些替代损失函数具有强$H$-一致性界。我们通过若干实用替代损失函数的例子说明了该分析的应用,并给出了明确的保证。这些损失函数可直接通过最小化导出新的learning to defer算法。虽然本工作的重点是理论分析,我们也报告了在SVHN和CIFAR-10数据集上的若干实验结果。
Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms
results: 实验结果表明,使用我们提出的新 surrogate 损失函数家族和两stage 算法可以得到remarkable的性能提升,比现有的算法更高。Abstract
We study the key framework of learning with abstention in the multi-class classification setting. In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost. We present a series of new theoretical and algorithmic results for this learning problem in the predictor-rejector framework. We introduce several new families of surrogate losses for which we prove strong non-asymptotic and hypothesis set-specific consistency guarantees, thereby resolving positively two existing open questions. These guarantees provide upper bounds on the estimation error of the abstention loss function in terms of that of the surrogate loss. We analyze both a single-stage setting where the predictor and rejector are learned simultaneously and a two-stage setting crucial in applications, where the predictor is learned in a first stage using a standard surrogate loss such as cross-entropy. These guarantees suggest new multi-class abstention algorithms based on minimizing these surrogate losses. We also report the results of extensive experiments comparing these algorithms to the current state-of-the-art algorithms on CIFAR-10, CIFAR-100 and SVHN datasets. Our results demonstrate empirically the benefit of our new surrogate losses and show the remarkable performance of our broadly applicable two-stage abstention algorithm.
摘要
我们研究多类别分类设定下带弃权学习(learning with abstention)的核心框架。在该设定中,学习者可以选择以某个预先设定的代价弃权而不做预测。我们在预测器—拒绝器(predictor-rejector)框架下给出了该学习问题的一系列新的理论与算法结果。我们引入了若干新的替代损失函数族,并证明它们具有强非渐近且与假设集相关的一致性保证,从而正面解决了两个已有的开放问题。这些保证以替代损失的估计误差为上界,界定了弃权损失函数的估计误差。我们既分析了预测器与拒绝器同时学习的单阶段设定,也分析了在应用中至关重要的两阶段设定,即先在第一阶段用交叉熵等标准替代损失学习预测器。这些保证启发了基于最小化这些替代损失的新的多类别弃权算法。我们还报告了在CIFAR-10、CIFAR-100和SVHN数据集上与当前最优算法进行比较的大量实验结果。实验结果从经验上证明了我们新替代损失函数的优势,并显示了我们广泛适用的两阶段弃权算法的出色性能。
Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention
for: This paper is written for learning with abstention in the multi-class classification setting, with a focus on score-based formulations and surrogate losses.
methods: The paper introduces new families of surrogate losses for the abstention loss function, including state-of-the-art surrogate losses in the single-stage setting and a novel family of loss functions in the two-stage setting. The authors also prove strong consistency guarantees for these surrogate losses.
results: The paper experiments on CIFAR-10, CIFAR-100, and SVHN datasets and shows the practical significance of the new surrogate losses and two-stage abstention algorithms. The results also demonstrate that the relative performance of state-of-the-art score-based surrogate losses can vary across datasets.
results: 论文在 CIFAR-10、CIFAR-100 和 SVHN 等 datasets 进行了实验,并证明了新的代理损函数和两阶段退出算法的实际意义。结果还显示了不同的得分基 surrogate loss 的相对性能可以随 datasets 的不同而变化。Abstract
Learning with abstention is a key scenario where the learner can abstain from making a prediction at some cost. In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. We introduce new families of surrogate losses for the abstention loss function, which include the state-of-the-art surrogate losses in the single-stage setting and a novel family of loss functions in the two-stage setting. We prove strong non-asymptotic and hypothesis set-specific consistency guarantees for these surrogate losses, which upper-bound the estimation error of the abstention loss function in terms of the estimation error of the surrogate loss. Our bounds can help compare different score-based surrogates and guide the design of novel abstention algorithms by minimizing the proposed surrogate losses. We experimentally evaluate our new algorithms on CIFAR-10, CIFAR-100, and SVHN datasets and the practical significance of our new surrogate losses and two-stage abstention algorithms. Our results also show that the relative performance of the state-of-the-art score-based surrogate losses can vary across datasets.
摘要
带弃权学习是一个关键场景,学习者可以在付出一定代价的情况下弃权不作预测。在本文中,我们分析了多类别分类设定下基于得分(score-based)的弃权学习形式。我们为弃权损失函数引入了新的替代损失函数族,其中既包括单阶段设定下当前最优的替代损失,也包括两阶段设定下的一族新损失函数。我们证明了这些替代损失函数具有强非渐近且与假设集相关的一致性保证,即以替代损失的估计误差为上界,界定弃权损失函数的估计误差。我们的界可以用于比较不同的基于得分的替代损失,并通过最小化所提出的替代损失来指导新的弃权算法设计。我们在CIFAR-10、CIFAR-100和SVHN数据集上对新算法进行了实验评估,验证了新替代损失函数和两阶段弃权算法的实际意义。我们的结果还表明,当前最优的基于得分的替代损失之间的相对性能会因数据集而异。
Improved K-mer Based Prediction of Protein-Protein Interactions With Chaos Game Representation, Deep Learning and Reduced Representation Bias
for: This paper aims to address the problem of representation bias in machine learning models used to predict protein-protein interactions, by extracting unique pairs from an interaction dataset and generating non-redundant paired data.
methods: The authors use a method for extracting unique pairs from an interaction dataset, which involves clustering protein pairs based on their similarity and then removing any pairs that are not truly unique. They also use a convolutional neural network (CNN) model to learn and predict interactions from Chaos Game Representations of proteins’ coding genes.
results: The authors applied their method to datasets containing Arabidopsis thaliana and pathogen effector interactions, and demonstrated that their approach can generate non-redundant paired data that can be used to train machine learning models to predict protein-protein interactions with high accuracy.Abstract
Protein-protein interactions drive many biological processes, including the detection of phytopathogens by plants' R-Proteins and cell surface receptors. Many machine learning studies have attempted to predict protein-protein interactions but performance is highly dependent on training data; models have been shown to accurately predict interactions when the proteins involved are included in the training data, but achieve consistently poorer results when applied to previously unseen proteins. In addition, models that are trained using proteins that take part in multiple interactions can suffer from representation bias, where predictions are driven not by learned biological features but by learning of the structure of the interaction dataset. We present a method for extracting unique pairs from an interaction dataset, generating non-redundant paired data for unbiased machine learning. After applying the method to datasets containing _Arabidopsis thaliana_ and pathogen effector interactions, we developed a convolutional neural network model capable of learning and predicting interactions from Chaos Game Representations of proteins' coding genes.
摘要
蛋白质-蛋白质相互作用驱动着许多生物过程,包括植物通过R蛋白和细胞表面受体识别植物病原体。许多机器学习研究尝试预测蛋白质-蛋白质相互作用,但模型性能高度依赖训练数据:当相互作用所涉及的蛋白质包含在训练数据中时,模型可以准确预测相互作用,但应用于此前未见过的蛋白质时,结果始终明显较差。此外,如果训练中使用的蛋白质参与多个相互作用,模型可能出现表示偏差,即预测不再由学到的生物学特征驱动,而是由对相互作用数据集结构的学习驱动。我们提出了一种从相互作用数据集中提取唯一蛋白质对的方法,从而生成无冗余的成对数据,用于无偏的机器学习。将该方法应用于包含拟南芥(Arabidopsis thaliana)与病原体效应蛋白相互作用的数据集后,我们开发了一个卷积神经网络模型,能够基于蛋白质编码基因的混沌游戏表示(Chaos Game Representation)学习并预测相互作用。
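The CNN in this work consumes Chaos Game Representations of coding sequences. The function below is a standard frequency-CGR (FCGR) construction for a nucleotide sequence, given as an illustration of the input encoding; the k-mer resolution and example sequence are arbitrary, and the paper's exact preprocessing may differ.

```python
import numpy as np

def fcgr(sequence: str, k: int = 4) -> np.ndarray:
    """Frequency Chaos Game Representation of a DNA sequence as a 2^k x 2^k image.
    Standard construction with corners A, C, G, T; a sketch of the kind of input
    the paper's CNN consumes, not its exact preprocessing."""
    corners = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
    grid = np.zeros((2 ** k, 2 ** k))
    x = y = 0.5
    for i, base in enumerate(sequence.upper()):
        if base not in corners:
            continue
        cx, cy = corners[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        if i >= k - 1:                       # only count once k bases have been seen
            row = min(int(y * 2 ** k), 2 ** k - 1)
            col = min(int(x * 2 ** k), 2 ** k - 1)
            grid[row, col] += 1
    return grid / max(grid.sum(), 1)

gene = "ATGGCGTACGTTAGCATCGATCGATTACGGATCGGCTAAGCTA"
image = fcgr(gene, k=3)
print(image.shape, image.sum())   # (8, 8) 1.0
```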
Externally Valid Policy Evaluation Combining Trial and Observational Data
results: 即使模型存在一定范围内的误设,也能得到可证有效的评估结果
for: The paper is written to evaluate the effectiveness of decision policies using trial data.
methods: The paper uses trial data and additional covariate data from the target population to model the sampling of individuals in the trial study. The method is nonparametric and can handle any specified range of model miscalibrations.
results: The method provides certifiably valid trial-based policy evaluations, even with finite samples. The results are illustrated using both simulated and real data.Abstract
Randomized trials are widely considered as the gold standard for evaluating the effects of decision policies. Trial data is, however, drawn from a population which may differ from the intended target population and this raises a problem of external validity (aka. generalizability). In this paper we seek to use trial data to draw valid inferences about the outcome of a policy on the target population. Additional covariate data from the target population is used to model the sampling of individuals in the trial study. We develop a method that yields certifiably valid trial-based policy evaluations under any specified range of model miscalibrations. The method is nonparametric and the validity is assured even with finite samples. The certified policy evaluations are illustrated using both simulated and real data.
摘要
随机对照试验被广泛视为评估决策政策效果的黄金标准。然而,试验数据来自的人群可能与预期的目标人群不同,这就引出了外部效度(即可推广性)的问题。在本文中,我们试图利用试验数据,对某项政策在目标人群上的结果做出有效推断。我们利用来自目标人群的额外协变量数据,对试验研究中个体的抽样过程进行建模。我们提出了一种方法,可以在任意指定的模型误设范围内给出可证有效的、基于试验的政策评估。该方法是非参数的,并且即使在有限样本下也能保证有效性。我们使用模拟数据和真实数据对所得到的可证政策评估进行了示例说明。
Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules
results: 研究发现,使用子图级别的tokenizer和具有足够表达能力的解码器可以大幅提升图自编码器的表示学习。在此基础上,论文提出了一种新的MGM方法SimSGT,其中包括一种简单的基于GNN的tokenizer(SGT)和一种有效的解码策略。实验验证表明,该方法优于现有的分子自监督学习方法。Abstract
Masked graph modeling excels in the self-supervised representation learning of molecular graphs. Scrutinizing previous studies, we can reveal a common scheme consisting of three key components: (1) graph tokenizer, which breaks a molecular graph into smaller fragments (i.e., subgraphs) and converts them into tokens; (2) graph masking, which corrupts the graph with masks; (3) graph autoencoder, which first applies an encoder on the masked graph to generate the representations, and then employs a decoder on the representations to recover the tokens of the original graph. However, the previous MGM studies focus extensively on graph masking and encoder, while there is limited understanding of tokenizer and decoder. To bridge the gap, we first summarize popular molecule tokenizers at the granularity of node, edge, motif, and Graph Neural Networks (GNNs), and then examine their roles as the MGM's reconstruction targets. Further, we explore the potential of adopting an expressive decoder in MGM. Our results show that a subgraph-level tokenizer and a sufficiently expressive decoder with remask decoding have a large impact on the encoder's representation learning. Finally, we propose a novel MGM method SimSGT, featuring a Simple GNN-based Tokenizer (SGT) and an effective decoding strategy. We empirically validate that our method outperforms the existing molecule self-supervised learning methods. Our codes and checkpoints are available at https://github.com/syr-cn/SimSGT.
摘要
掩码图建模(masked graph modeling)在分子图的自监督表示学习中表现出色。梳理以往研究可以发现一个共同的框架,它由三个关键组件构成:(1) 图tokenizer,将分子图拆分为更小的片段(即子图)并转换为token;(2) 图掩码,用掩码破坏原图;(3) 图自编码器,先用编码器对掩码后的图生成表示,再用解码器从这些表示中恢复原图的token。然而,以往的MGM研究大量关注图掩码和编码器,而对tokenizer和解码器的理解十分有限。为弥补这一空白,我们首先按节点、边、基序(motif)和图神经网络(GNN)的粒度总结了常用的分子tokenizer,并考察它们作为MGM重建目标所扮演的角色;随后我们探讨了在MGM中采用表达能力更强的解码器的潜力。结果表明,子图级别的tokenizer以及配合重掩码(remask)解码、具有足够表达能力的解码器,对编码器的表示学习有很大影响。最后,我们提出了一种新的MGM方法SimSGT,其特点是采用简单的基于GNN的tokenizer(SGT)和一种有效的解码策略。实验验证表明,我们的方法优于现有的分子自监督学习方法。我们的代码和模型检查点见 https://github.com/syr-cn/SimSGT。
results: 实验表明,CODE可以在多臂和线性帮助上实现近似最优的停损 bound,而且在实际问题上也有优于其他可解释的设计。此外,CODE还可以在不同的约束下进行可靠地推荐。Abstract
Motivated by the importance of explainability in modern machine learning, we design bandit algorithms that are \emph{efficient} and \emph{interpretable}. A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown model parameter. To quantify the interpretability, we introduce a novel metric of \textit{uncertainty loss}, which compares the rate of the uncertainty reduction to the theoretical optimum. We propose CODE, a bandit algorithm based on a \textbf{C}onstrained \textbf{O}ptimal \textbf{DE}sign, that is interpretable and maximally reduces the uncertainty. The key idea in \code is to explore among all plausible actions, determined by a statistical constraint, to achieve interpretability. We implement CODE efficiently in both multi-armed and linear bandits and derive near-optimal regret bounds by leveraging the optimality criteria of the approximate optimal design. CODE can be also viewed as removing phases in conventional phased elimination, which makes it more practical and general. We demonstrate the advantage of \code by numerical experiments on both synthetic and real-world problems. CODE outperforms other state-of-the-art interpretable designs while matching the performance of popular but uninterpretable designs, such as upper confidence bound algorithms.
摘要
受现代机器学习中可解释性重要性的启发,我们设计了既高效又可解释的赌博机(bandit)算法。如果一个bandit算法以降低未知模型参数的不确定性为目标进行探索,我们就称其是可解释的。为了量化可解释性,我们引入了一种新的"不确定性损失"度量,它将不确定性下降的速率与理论最优值进行比较。我们提出了CODE,一种基于受约束最优设计(Constrained Optimal DEsign)的bandit算法,它既可解释又能最大程度地降低不确定性。CODE的核心思想是在由统计约束确定的所有合理动作中进行探索,以实现可解释性。我们在多臂和线性bandit中高效地实现了CODE,并利用近似最优设计的最优性准则推导出接近最优的遗憾界。CODE也可以被视为去掉了传统分阶段淘汰(phased elimination)中的阶段划分,这使其更加实用和通用。我们通过在合成问题和真实问题上的数值实验展示了CODE的优势:它优于其他最先进的可解释设计,同时性能可与上置信界(UCB)等流行但不可解释的算法相当。
A Comparative Study of Portfolio Optimization Methods for the Indian Stock Market
results: 对每个领域的三个 portefolio进行测试,使用三个绩效指标(累积回报、年度波动率、希克率)评估 portefolio的表现,并为每个领域选择最高累积回报、最低波动率和最大希克率的 portefolio。Abstract
This chapter presents a comparative study of the three portfolio optimization methods, MVP, HRP, and HERC, on the Indian stock market, particularly focusing on the stocks chosen from 15 sectors listed on the National Stock Exchange of India. The top stocks of each cluster are identified based on their free-float market capitalization from the report of the NSE published on July 1, 2022 (NSE Website). For each sector, three portfolios are designed on stock prices from July 1, 2019, to June 30, 2022, following three portfolio optimization approaches. The portfolios are tested over the period from July 1, 2022, to June 30, 2023. For the evaluation of the performances of the portfolios, three metrics are used. These three metrics are cumulative returns, annual volatilities, and Sharpe ratios. For each sector, the portfolios that yield the highest cumulative return, the lowest volatility, and the maximum Sharpe Ratio over the training and the test periods are identified.
摘要
本章对MVP、HRP和HERC三种投资组合优化方法在印度股票市场上进行了比较研究,重点关注从印度国家证券交易所(NSE)上市的15个行业中选出的股票。依据NSE于2022年7月1日发布的报告(NSE Website),按自由流通市值确定每个行业中的头部股票。对每个行业,基于2019年7月1日至2022年6月30日的股价,按照三种投资组合优化方法分别构建三个投资组合,并在2022年7月1日至2023年6月30日期间对这些组合进行测试。评估组合表现时使用三项指标:累积收益率、年化波动率和夏普比率。对每个行业,分别找出在训练期和测试期内累积收益率最高、波动率最低以及夏普比率最大的组合。
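The three evaluation metrics named above are standard. As an illustrative sketch (with simulated prices and equal weights standing in for the MVP/HRP/HERC weight vectors, which are not computed here), they can be obtained from daily prices as follows.

```python
import numpy as np
import pandas as pd

def portfolio_metrics(prices: pd.DataFrame, weights: np.ndarray):
    """Cumulative return, annualised volatility and Sharpe ratio of a fixed-weight
    portfolio built from daily close prices (risk-free rate taken as zero here)."""
    daily_ret = prices.pct_change().dropna() @ weights
    cumulative_return = (1 + daily_ret).prod() - 1
    annual_volatility = daily_ret.std() * np.sqrt(252)
    sharpe = (daily_ret.mean() / daily_ret.std()) * np.sqrt(252)
    return cumulative_return, annual_volatility, sharpe

# Toy example with simulated prices for three stocks and equal weights; the chapter
# instead plugs in the weights produced by the MVP, HRP and HERC optimisers.
rng = np.random.default_rng(1)
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, size=(250, 3)), axis=0)),
    columns=["STOCK_A", "STOCK_B", "STOCK_C"],
)
print(portfolio_metrics(prices, np.array([1 / 3] * 3)))
```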
Extended Deep Adaptive Input Normalization for Preprocessing Time Series Data for Neural Networks
paper_authors: Marcus A. K. September, Francesco Sanna Passino, Leonie Goldmann, Anton Hinel
for: This paper focuses on addressing the challenges of preprocessing time series data for machine learning tasks, specifically using deep neural networks.
methods: The proposed EDAIN (Extended Deep Adaptive Input Normalization) layer is an adaptive neural layer that learns to normalize irregular time series data in an end-to-end fashion using back-propagation.
results: The EDAIN layer outperforms conventional normalization methods and existing adaptive time series preprocessing layers in experiments using synthetic data, a credit default prediction dataset, and a large-scale limit order book benchmark dataset.Abstract
Data preprocessing is a crucial part of any machine learning pipeline, and it can have a significant impact on both performance and training efficiency. This is especially evident when using deep neural networks for time series prediction and classification: real-world time series data often exhibit irregularities such as multi-modality, skewness and outliers, and the model performance can degrade rapidly if these characteristics are not adequately addressed. In this work, we propose the EDAIN (Extended Deep Adaptive Input Normalization) layer, a novel adaptive neural layer that learns how to appropriately normalize irregular time series data for a given task in an end-to-end fashion, instead of using a fixed normalization scheme. This is achieved by optimizing its unknown parameters simultaneously with the deep neural network using back-propagation. Our experiments, conducted using synthetic data, a credit default prediction dataset, and a large-scale limit order book benchmark dataset, demonstrate the superior performance of the EDAIN layer when compared to conventional normalization methods and existing adaptive time series preprocessing layers.
摘要
“数据预处理是任何机器学习流程中的关键环节,对模型性能和训练效率都有显著影响,在使用深度神经网络进行时间序列预测和分类时尤为明显:真实世界的时间序列数据经常表现出多峰性、偏态和离群值等不规则特征,如果不能恰当处理这些特征,模型性能会迅速下降。在本工作中,我们提出了EDAIN(扩展深度自适应输入归一化)层,这是一种新型的自适应神经层,它以端到端的方式学习如何针对给定任务恰当地归一化不规则的时间序列数据,而不是采用固定的归一化方案。这一点是通过反向传播,将其未知参数与深度神经网络一同优化来实现的。我们在合成数据、一个信用违约预测数据集和一个大规模限价订单簿基准数据集上进行的实验表明,与传统归一化方法和现有的自适应时间序列预处理层相比,EDAIN层具有更优的性能。”
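A minimal sketch of the underlying idea, assuming only a learnable shift and scale optimized jointly with the downstream network by back-propagation; EDAIN's additional sublayers (e.g. for outliers and skewness) and its exact parameterization are omitted here.

```python
import torch
import torch.nn as nn

class AdaptiveInputNorm(nn.Module):
    """Simplified adaptive input normalisation: shift and scale are trainable
    parameters learned end-to-end with the downstream network, rather than fixed
    statistics computed in a separate preprocessing pass."""

    def __init__(self, num_features: int, eps: float = 1e-6):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(num_features))
        self.log_scale = nn.Parameter(torch.zeros(num_features))
        self.eps = eps

    def forward(self, x):                         # x: (batch, time, features)
        return (x - self.shift) / (self.log_scale.exp() + self.eps)

model = nn.Sequential(AdaptiveInputNorm(3), nn.Flatten(), nn.Linear(3 * 20, 1))
x, y = torch.randn(32, 20, 3) * 50 + 10, torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()                                   # gradients flow into shift/scale
print(model[0].shift.grad.shape)                  # torch.Size([3])
```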
A Hybrid GNN approach for predicting node data for 3D meshes
results: 本研究的结果显示,Hybrid方法可以实现更高的预测精度和更快速的预测时间,比较前的PointNet和简单的graph neural network模型。新的模型可以更好地处理网格或点 cloud结构,并且可以实现更好的数据生成和预测。Abstract
Metal forging is used to manufacture dies. We require the best set of input parameters for the process to be efficient. Currently, we predict the best parameters using the finite element method by generating simulations for the different initial conditions, which is a time-consuming process. In this paper, we introduce a hybrid approach that helps in processing and generating new data simulations using a surrogate graph neural network model based on graph convolutions, at a much lower time cost. Given a dataset representing meshes, our focus is on the conversion of the available information into a graph or point cloud structure. This new representation enables deep learning. The predicted result is similar, with a low error, when compared to that produced using the finite element method. The new models have outperformed existing PointNet and simple graph neural network models when applied to produce the simulations.
摘要
金属锻造被用于制造模具。为了使该工艺高效,我们需要找到最佳的输入参数组合。目前,我们通过有限元方法针对不同初始条件生成仿真来预测最佳参数,这一过程非常耗时。在本文中,我们提出了一种混合方法,利用基于图卷积的替代(surrogate)图神经网络模型来处理并生成新的仿真数据,其时间成本更低。给定一个表示网格(mesh)的数据集,我们的重点是将可用信息转换为图或点云结构,这种新的表示使深度学习成为可能。预测结果与有限元方法得到的结果相近,误差很低。在生成仿真时,新模型的表现优于现有的PointNet和简单图神经网络模型。
Federated learning compression designed for lightweight communications
results: 本文显示,使用一种简单直接的压缩方法即可将通信消息压缩多达50%,同时精度损失小于1%,可与现有最先进技术相媲美。Abstract
Federated Learning (FL) is a promising distributed method for edge-level machine learning, particularly for privacy-sensitive applications such as those in military and medical domains, where client data cannot be shared or transferred to a cloud computing server. In many use-cases, communication cost is a major challenge in FL due to its inherently intensive network usage. Client devices, such as smartphones or Internet of Things (IoT) nodes, have limited resources in terms of energy, computation, and memory. To address these hardware constraints, lightweight models and compression techniques such as pruning and quantization are commonly adopted in centralised paradigms. In this paper, we investigate the impact of compression techniques on FL for a typical image classification task. Going further, we demonstrate that a straightforward method can compress messages by up to 50% with less than 1% accuracy loss, competing with state-of-the-art techniques.
摘要
联邦学习(FL)是一种有前景的边缘侧分布式机器学习方法,尤其适用于军事和医疗等隐私敏感的应用场景,其中客户端数据无法共享或传输到云计算服务器。在许多用例中,通信成本是 FL 的主要挑战,因为它天然需要大量网络传输。客户端设备(如智能手机或物联网(IoT)节点)在能源、计算和存储方面资源有限。为应对这些硬件限制,轻量级模型以及剪枝、量化等压缩技术在集中式范式中被广泛采用。在本文中,我们研究了压缩技术对 FL 在典型图像分类任务上的影响。进一步地,我们证明一种简单直接的方法可以将消息压缩达 50%,而精度损失小于 1%,与当前最先进的技术相当。
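The kind of message compression discussed above can be illustrated with simple uniform 8-bit quantization of a client update before transmission; the scheme and numbers below are illustrative placeholders, not the method evaluated in the paper.

```python
import numpy as np

def quantize_update(update: np.ndarray, num_bits: int = 8):
    """Uniformly quantize a client update, returning integer codes plus the
    (scale, offset) the server needs to dequantize."""
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / (2 ** num_bits - 1) or 1.0
    codes = np.round((update - lo) / scale).astype(np.uint8)
    return codes, scale, lo

def dequantize_update(codes, scale, lo):
    return codes.astype(np.float32) * scale + lo

update = np.random.randn(10_000).astype(np.float32)       # toy weight delta
codes, scale, lo = quantize_update(update)
recovered = dequantize_update(codes, scale, lo)
print(np.abs(update - recovered).max())   # small reconstruction error, ~4x smaller payload
```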
Population Descent: A Natural-Selection Based Hyper-Parameter Tuning Framework
for: Hyperparameter optimization
methods: Population Descent, a memetic algorithm-based optimization method
results: On common benchmark tasks, the adaptive m-elitist selection and normalized-fitness-based randomization method outperforms state-of-the-art algorithms by up to 13%.Abstract
First-order gradient descent has been the base of the most successful optimization algorithms ever implemented. On supervised learning problems with very high dimensionality, such as neural network optimization, it is almost always the algorithm of choice, mainly due to its memory and computational efficiency. However, it is a classical result in optimization that gradient descent converges to local minima on non-convex functions. Even more importantly, in certain high-dimensional cases, escaping the plateaus of large saddle points becomes intractable. On the other hand, black-box optimization methods are not sensitive to the local structure of a loss function's landscape but suffer the curse of dimensionality. Instead, memetic algorithms aim to combine the benefits of both. Inspired by this, we present Population Descent, a memetic algorithm focused on hyperparameter optimization. We show that an adaptive m-elitist selection approach combined with a normalized-fitness-based randomization scheme outperforms more complex state-of-the-art algorithms by up to 13% on common benchmark tasks.
摘要
一阶梯度下降是迄今为止最成功的优化算法的基础。在维度极高的监督学习问题(例如神经网络优化)中,它几乎总是首选算法,主要因为其内存和计算效率。然而,优化领域的一个经典结论是:在非凸函数上,梯度下降会收敛到局部极小值。更重要的是,在某些高维情形下,摆脱大型鞍点的平台区域变得不可行。另一方面,黑盒优化方法虽不受损失函数局部结构的影响,却受制于维度灾难。而模因算法(memetic algorithms)旨在结合两者的优点。受此启发,我们提出了 Population Descent,一种面向超参数优化的模因算法。我们证明,自适应 m-精英选择策略与基于归一化适应度的随机化方案相结合,可以在常见基准任务上比更复杂的最先进算法性能提高最多 13%。
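A compact sketch of the selection-and-randomization loop described above is given below; it keeps the m fittest individuals and perturbs the rest with noise scaled by normalized fitness. The fitness function, hyper-parameters, and the omission of the local gradient-descent step are simplifying assumptions.

```python
import numpy as np

def population_descent(fitness_fn, dim, pop_size=20, m=5, iters=100, seed=0):
    """m-elitist selection with normalized-fitness-based randomization."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, dim))
    for _ in range(iters):
        fit = np.array([fitness_fn(p) for p in pop])
        order = np.argsort(fit)[::-1]                  # higher fitness is better
        elites = pop[order[:m]]
        norm_fit = (fit - fit.min()) / (fit.max() - fit.min() + 1e-12)
        children = [elites]
        for idx in order[m:]:
            parent = elites[rng.integers(m)]
            sigma = 1.0 - norm_fit[idx]                # weaker individuals explore more
            children.append(parent[None] + rng.normal(scale=sigma + 1e-3, size=(1, dim)))
        pop = np.concatenate(children, axis=0)
    return pop[np.argmax([fitness_fn(p) for p in pop])]

best = population_descent(lambda p: -np.sum(p ** 2), dim=10)
```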
Tractable MCMC for Private Learning with Pure and Gaussian Differential Privacy
For: 提供 $\varepsilon$-纯 differential privacy (DP) 保证和不受可能的无限大隐私泄露的 posterior sampling 方法。* Methods: 使用 exponential mechanism 和 Markov chain Monte Carlo (MCMC) 方法,并将它们结合在一起以减少 $\delta$-approximation error。* Results: 提出了 Approximate SAmple Perturbation (ASAP) 算法,可以向 MCMC 样本添加与 $W_\infty$ 距离成比例的噪声,以保证 $\varepsilon$-纯 DP 或纯 Gaussian DP($\delta=0$)。并且证明了该算法可以在 nearly linear-time 内实现 DP-ERM 问题的最优率。Abstract
Posterior sampling, i.e., the exponential mechanism used to sample from the posterior distribution, provides $\varepsilon$-pure differential privacy (DP) guarantees and does not suffer from the potentially unbounded privacy breach introduced by $(\varepsilon,\delta)$-approximate DP. In practice, however, one needs to apply approximate sampling methods such as Markov chain Monte Carlo (MCMC), thus re-introducing the unappealing $\delta$-approximation error into the privacy guarantees. To bridge this gap, we propose the Approximate SAmple Perturbation (abbr. ASAP) algorithm, which perturbs an MCMC sample with noise proportional to its Wasserstein-infinity ($W_\infty$) distance from a reference distribution that satisfies pure DP or pure Gaussian DP (i.e., $\delta=0$). We then leverage a Metropolis-Hastings algorithm to generate the sample and prove that the algorithm converges in $W_\infty$ distance. We show that by combining our new techniques with a careful localization step, we obtain the first nearly linear-time algorithm that achieves the optimal rates in the DP-ERM problem with strongly convex and smooth losses.
摘要
后验采样,即利用指数机制从后验分布中采样,可提供 $\varepsilon$-纯差分隐私(DP)保证,且不会出现 $(\varepsilon,\delta)$-近似 DP 可能带来的无界隐私泄露。然而,在实践中需要使用近似采样方法(如马尔可夫链蒙特卡罗,MCMC),从而重新将 $\delta$-近似误差引入隐私保证中。为弥合这一差距,我们提出了 Approximate SAmple Perturbation(简称 ASAP)算法,该算法向 MCMC 样本添加噪声,其幅度与该样本相对于某个满足纯 DP 或纯高斯 DP(即 $\delta=0$)的参考分布的 Wasserstein-infinity($W_\infty$)距离成正比。随后,我们利用 Metropolis-Hastings 算法生成样本,并证明该算法在 $W_\infty$ 距离意义下收敛。我们还证明,通过将我们的新技术与精心设计的局部化步骤相结合,可以得到首个近线性时间的算法,在强凸且光滑损失的 DP-ERM 问题中达到最优速率。
Predicting Accurate Lagrangian Multipliers for Mixed Integer Linear Programs
paper_authors: Francesco Demelas, Joseph Le Roux, Mathieu Lacroix, Axel Parmentier
for: 解决具有困难约束的混合整数线性Program (MILP) 问题。
methods: 使用深度学习方法,通过跳过梯度下降,快速优化凹陷函数。
results: 可以减少85%的误差,提供高质量的热启动解。Abstract
Lagrangian relaxation stands among the most efficient approaches for solving Mixed Integer Linear Programs (MILPs) with difficult constraints. Given any duals for these constraints, called Lagrangian Multipliers (LMs), it returns a bound on the optimal value of the MILP, and Lagrangian methods seek the LMs giving the best such bound. But these methods generally rely on iterative algorithms resembling gradient descent to maximize the concave piecewise linear dual function: the computational burden grows quickly with the number of relaxed constraints. We introduce a deep learning approach that bypasses the descent, effectively amortizing the local, per-instance optimization. A probabilistic encoder based on a graph convolutional network computes high-dimensional representations of relaxed constraints in MILP instances. A decoder then turns these representations into LMs. We train the encoder and decoder jointly by directly optimizing the bound obtained from the predicted multipliers. Numerical experiments show that our approach closes up to 85\% of the gap between the continuous relaxation and the best Lagrangian bound, and provides a high-quality warm start for descent-based Lagrangian methods.
摘要
拉格朗日松弛是求解带有困难约束的混合整数线性规划(MILP)最有效的方法之一。给定这些约束的任意对偶变量,即拉格朗日乘子(LMs),它可以给出 MILP 最优值的一个界,而拉格朗日方法则寻找能给出最优界的乘子。然而,这些方法通常依赖类似梯度下降的迭代算法来最大化凹的分段线性对偶函数,计算负担会随被松弛约束数量的增加而迅速增长。我们提出了一种深度学习方法,绕过这种迭代下降,相当于对逐实例的局部优化进行摊销。我们使用一个基于图卷积网络的概率编码器来计算 MILP 实例中被松弛约束的高维表示,再由一个解码器将这些表示转化为拉格朗日乘子。我们通过直接优化由预测乘子得到的界来联合训练编码器和解码器。数值实验表明,我们的方法最多可缩小连续松弛与最优拉格朗日界之间差距的 85%,并为基于下降的拉格朗日方法提供高质量的热启动。
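The training idea, predicting multipliers and back-propagating through the resulting Lagrangian bound, can be sketched on a toy box-relaxed problem as below. The encoder, shapes, and the simple box subproblem are assumptions made for illustration; the paper works with MILP instances and a graph convolutional encoder.

```python
import torch
import torch.nn as nn

class MultiplierPredictor(nn.Module):
    """Toy encoder-decoder: per-constraint features -> nonnegative multipliers."""
    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, constraint_feats: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(constraint_feats)).squeeze(-1)

def lagrangian_bound(lm, cost, A, b, lower, upper):
    """Dual bound for min c'x s.t. Ax <= b, x in [lower, upper]:
    L(lm) = min_x (c + lm'A)'x - lm'b, minimized in closed form over the box."""
    reduced_cost = cost + lm @ A
    x = torch.where(reduced_cost >= 0, lower, upper)
    return reduced_cost @ x - lm @ b

model = MultiplierPredictor(feat_dim=4)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
feats, cost = torch.randn(3, 4), torch.randn(5)       # 3 relaxed constraints, 5 variables
A, b = torch.randn(3, 5), torch.randn(3)
lower, upper = torch.zeros(5), torch.ones(5)
for _ in range(10):
    lm = model(feats)
    loss = -lagrangian_bound(lm, cost, A, b, lower, upper)   # maximize the bound
    opt.zero_grad(); loss.backward(); opt.step()
```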
Making informed decisions in cutting tool maintenance in milling: A KNN based model agnostic approach
paper_authors: Aditya M. Rahalkar, Om M. Khare, Abhishek D. Patange
For: 本研究旨在提出一种基于KNN的白盒模型,以提高工具状况监测系统的可解释性和维护决策。* Methods: 该研究使用了各种机器学习技术进行工具状况监测,并在实验中采集了大量数据。 Decision trees 和 KNN 算法被用进行特征选择和分类。 hyperparameter 优化也进行了以提高模型的性能。* Results: 该研究使用了 KNN 白盒模型,可以帮助制造商更深入了解工具的维护和监测过程,并且可以提高工具状况监测系统的可解释性。Abstract
In machining processes, monitoring the condition of the tool is crucial to ensuring high productivity and product quality. Using different machine learning techniques in Tool Condition Monitoring (TCM) enables better analysis of the large amount of data from the different signals acquired during machining. The real-time force signals encountered during the process were acquired by performing numerous experiments, covering different tool wear conditions. A comprehensive statistical analysis of the data and feature selection using decision trees were conducted, and the KNN algorithm was used to perform classification. Hyperparameter tuning was done to improve the model's performance. Much research has employed machine learning approaches in tool condition monitoring systems; however, few adopt a model-agnostic approach that increases the interpretability of the process and gives an in-depth understanding of how the decisions are made. This research paper presents a KNN-based white-box model, which allows us to examine how the model performs the classification and how it prioritizes the different features included. This approach helps explain why the tool is in a certain condition and allows the manufacturer to make an informed decision about the tool's maintenance.
摘要
在机械加工过程中,监测刀具状态是确保高生产率和产品质量的关键环节。在刀具状态监测(TCM)中使用不同的机器学习技术,可以更好地分析加工过程中采集的大量多种信号数据。我们通过大量实验采集了加工过程中的实时力信号,并考虑了不同的刀具磨损状态。我们对数据进行了全面的统计分析,利用决策树进行特征选择,并使用 KNN 算法进行分类,同时通过超参数调优来提升模型性能。尽管已有许多研究将机器学习方法应用于刀具状态监测系统,但很少有人采用与模型无关的方式来提高过程的可解释性并深入理解决策是如何做出的。本文提出了一种基于 KNN 的白盒模型,使我们能够深入了解模型如何进行分类以及如何对各个特征划分优先级。这种方法有助于判断刀具为何处于某种状态,并让制造商能够就刀具维护做出明智的决策。
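A minimal scikit-learn sketch of such a pipeline, decision-tree-based feature selection followed by a tuned KNN classifier, is shown below; the feature matrix, labels, and parameter grid are placeholders rather than the paper's experimental setup.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Placeholder data: statistical features of force signals, labels = wear state.
X = np.random.randn(200, 12)
y = np.random.randint(0, 3, size=200)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectFromModel(DecisionTreeClassifier(random_state=0))),
    ("knn", KNeighborsClassifier()),
])
search = GridSearchCV(pipe, {"knn__n_neighbors": [3, 5, 7, 9]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```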
Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz
results: 研究人员通过分析和实验发现,基于符号的随机方法在深度神经网络训练中可以实现稳健且高效的性能,而且在分布式设置下,采用快速通信压缩的 gossip 协议可以实现随节点数线性的加速。Abstract
Sign-based stochastic methods have gained attention due to their ability to achieve robust performance despite using only the sign information for parameter updates. However, the current convergence analysis of sign-based methods relies on the strong assumptions of first-order gradient Lipschitz and second-order gradient Lipschitz, which may not hold in practical tasks like deep neural network training that involve high non-smoothness. In this paper, we revisit sign-based methods and analyze their convergence under more realistic assumptions of first- and second-order smoothness. We first establish the convergence of the sign-based method under weak first-order Lipschitz. Motivated by the weak first-order Lipschitz, we propose a relaxed second-order condition that still allows for nonconvex acceleration in sign-based methods. Based on our theoretical results, we gain insights into the computational advantages of the recently developed LION algorithm. In distributed settings, we prove that this nonconvex acceleration persists with linear speedup in the number of nodes, when utilizing fast communication compression gossip protocols. The novelty of our theoretical results lies in that they are derived under much weaker assumptions, thereby expanding the provable applicability of sign-based algorithms to a wider range of problems.
摘要
基于符号的随机方法因其仅使用梯度符号信息进行参数更新仍能取得稳健性能而受到关注。然而,现有针对此类方法的收敛性分析依赖于一阶梯度 Lipschitz 和二阶梯度 Lipschitz 的强假设,而这些假设在深度神经网络训练等高度非光滑的实际任务中可能并不成立。在本文中,我们重新审视基于符号的方法,并在更贴近实际的一阶与二阶光滑性假设下分析其收敛性。我们首先证明了基于符号的方法在弱一阶 Lipschitz 条件下的收敛性。受弱一阶 Lipschitz 条件的启发,我们提出了一个放宽的二阶条件,该条件仍然允许基于符号的方法实现非凸加速。基于我们的理论结果,我们对最近提出的 LION 算法的计算优势有了新的认识。在分布式设置下,我们证明在使用快速通信压缩的 gossip 协议时,这种非凸加速依然保持,并随节点数量线性加速。我们理论结果的新颖之处在于它们是在弱得多的假设下推导出来的,从而将基于符号的算法的可证明适用范围扩展到更广泛的问题。
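For reference, the basic sign-based update that this analysis concerns can be written in a few lines; this is a generic signSGD step, not the specific LION variant discussed above.

```python
import torch

@torch.no_grad()
def sign_sgd_step(params, lr=1e-3):
    """One signSGD step: move each parameter against the sign of its gradient."""
    for p in params:
        if p.grad is not None:
            p.add_(p.grad.sign(), alpha=-lr)

# usage: loss.backward(); sign_sgd_step(model.parameters())
```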
Cascaded Multi-task Adaptive Learning Based on Neural Architecture Search
results: 该方法能够在 SLURP 上实现类似于手动设计的优化策略,压缩优化参数数量为 8.7%,并且性能更好。Abstract
Cascading multiple pre-trained models is an effective way to compose an end-to-end system. However, fine-tuning the full cascaded model is parameter- and memory-inefficient, and our observations reveal that only applying adapter modules to the cascaded model cannot achieve performance comparable to fine-tuning. We propose an automatic and effective adaptive learning method to optimize end-to-end cascaded multi-task models based on the Neural Architecture Search (NAS) framework. The candidate adaptive operations for each specific module consist of freezing, inserting an adapter, and fine-tuning. We further add a penalty term to the loss that takes the number of trainable parameters into account, limiting the learned structure. The penalty term successfully restricts the searched architecture, and the proposed approach is able to find tuning schemes similar to hand-crafted ones, compressing the optimized parameters to 8.7% of full fine-tuning on SLURP with even better performance.
摘要
级联多个预训练模型是一种有效的端到端系统组合方式。然而,对整个级联模型进行微调在参数和内存上都不高效,我们的观察还表明,仅在级联模型上应用 adapter 模块无法达到与微调相当的性能。我们提出了一种基于神经架构搜索(NAS)框架的自动且有效的自适应学习方法,用于优化端到端级联多任务模型。每个模块上的候选自适应操作包括冻结、插入 adapter 和微调。我们进一步在损失函数中加入一个考虑可训练参数数量的惩罚项,以限制搜索到的结构。该惩罚项成功约束了搜索出的架构,我们的方法能够搜索到与手工设计相似的调优方案,将优化的参数量压缩到全量微调的 8.7%,并在 SLURP 上取得更好的性能。
CAD-DA: Controllable Anomaly Detection after Domain Adaptation by Statistical Inference
methods: 这个方法使用 conditional Selective Inference 来处理 DA 的影响,并能够控制预先确定的水平 $\alpha$(例如 0.05)中的异常识别概率。
results: 在 both synthetic 和实际数据集上,CAD-DA 方法能够实现有效的统计推断,并且能够控制预先确定的水平 $\alpha$ 中的异常识别概率。Abstract
We propose a novel statistical method for testing the results of anomaly detection (AD) under domain adaptation (DA), which we call CAD-DA -- controllable AD under DA. The distinct advantage of the CAD-DA lies in its ability to control the probability of misidentifying anomalies under a pre-specified level $\alpha$ (e.g., 0.05). The challenge within this DA setting is the necessity to account for the influence of DA to ensure the validity of the inference results. Our solution to this challenge leverages the concept of conditional Selective Inference to handle the impact of DA. To our knowledge, this is the first work capable of conducting a valid statistical inference within the context of DA. We evaluate the performance of the CAD-DA method on both synthetic and real-world datasets.
摘要
我们提出了一种新的统计方法,用于测试异常检测(AD)下领域适应(DA)的结果,我们称之为CAD-DA,即可控AD下DA的异常检测方法。CAD-DA的独特优势在于可以控制misidentify异常的概率,例如0.05。在DA设定下,挑战是需要考虑DA的影响,以确保结论的有效性。我们解决这个挑战,利用选择性统计处理DA的影响。据我们所知,这是首个在DA设定下进行有效统计推断的研究。我们对 synthetic和实际数据集进行了性能评估。
results: 实验表明,该算法在实际 dataset 上表现更高的准确率和检测速度,比现有的假信息检测算法更好。Abstract
Modern social media platforms play an important role in facilitating rapid dissemination of information through their massive user networks. Fake news, misinformation, and unverifiable facts on social media platforms propagate disharmony and affect society. In this paper, we consider the problem of online auditing of information flow/propagation with the goal of classifying news items as fake or genuine. Specifically, driven by experiential studies on real-world social media platforms, we propose a probabilistic Markovian information spread model over networks modeled by graphs. We then formulate our inference task as a certain sequential detection problem with the goal of minimizing the combination of the error probability and the time it takes to achieve correct decision. For this model, we find the optimal detection algorithm minimizing the aforementioned risk and prove several statistical guarantees. We then test our algorithm over real-world datasets. To that end, we first construct an offline algorithm for learning the probabilistic information spreading model, and then apply our optimal detection algorithm. Experimental study show that our algorithm outperforms state-of-the-art misinformation detection algorithms in terms of accuracy and detection time.
摘要
现代社交媒体平台凭借其庞大的用户网络,在信息的快速传播中发挥着重要作用。社交媒体上的假新闻、错误信息和未经证实的内容会传播不和谐并影响社会。在这篇论文中,我们考虑对信息流/传播进行在线审核的问题,目标是将新闻条目分类为假或真。具体来说,在对真实社交媒体平台的经验研究的启发下,我们提出了一种定义在图建模网络上的概率马尔可夫信息传播模型,并将推断任务表述为一种顺序检测问题,目标是最小化错误概率与做出正确决策所需时间的组合。对于该模型,我们找到了使上述风险最小的最优检测算法,并证明了若干统计保证。随后,我们在真实数据集上测试我们的算法:我们首先构建了一个离线算法来学习概率信息传播模型,然后应用我们的最优检测算法。实验研究表明,我们的算法在准确率和检测时间方面均优于当前最先进的错误信息检测算法。
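The flavor of the sequential decision rule can be illustrated with a textbook sequential probability ratio test over incoming observations; the Poisson observation model and thresholds below are illustrative assumptions, not the paper's optimal rule for the Markovian spread model.

```python
import numpy as np
from scipy.stats import poisson

def sprt(log_lik_fake, log_lik_genuine, observations, lower=-4.6, upper=4.6):
    """Accumulate log-likelihood ratios until a threshold is crossed.
    Returns ('fake' | 'genuine' | 'undecided', number of observations used)."""
    llr = 0.0
    for t, obs in enumerate(observations, start=1):
        llr += log_lik_fake(obs) - log_lik_genuine(obs)
        if llr >= upper:
            return "fake", t
        if llr <= lower:
            return "genuine", t
    return "undecided", len(observations)

rng = np.random.default_rng(0)
obs = rng.poisson(5.0, size=50)           # toy share counts per time step
decision, n_used = sprt(lambda x: poisson.logpmf(x, 5.0),
                        lambda x: poisson.logpmf(x, 2.0), obs)
print(decision, n_used)
```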
GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels
results: 我们的方法在实际中的无标签测试图上进行了广泛的实验,结果表明我们的方法可以准确地评估 GNN 模型在不同的图数据分布下的性能。Abstract
Evaluating the performance of graph neural networks (GNNs) is an essential task for practical GNN model deployment and serving, as deployed GNNs face significant performance uncertainty when inferring on unseen and unlabeled test graphs, due to mismatched training-test graph distributions. In this paper, we study a new problem, GNN model evaluation, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs, by precisely estimating its performance (e.g., node classification accuracy) on unseen graphs without labels. Concretely, we propose a two-stage GNN model evaluation framework, including (1) DiscGraph set construction and (2) GNNEvaluator training and inference. The DiscGraph set captures wide-range and diverse graph data distribution discrepancies through a discrepancy measurement function, which exploits the outputs of GNNs related to latent node embeddings and node class predictions. Under the effective training supervision from the DiscGraph set, GNNEvaluator learns to precisely estimate node classification accuracy of the to-be-evaluated GNN model and makes an accurate inference for evaluating GNN model performance. Extensive experiments on real-world unseen and unlabeled test graphs demonstrate the effectiveness of our proposed method for GNN model evaluation.
摘要
评估图 neural network(GNN)的性能是实际部署和服务GNN模型的重要任务,因为部署在测试图上的GNN模型会面临很大的性能不确定性,由于训练和测试图的分布不匹配。在这篇论文中,我们研究了一个新的问题:GNN模型评估,它的目标是评估特定的GNN模型,它在观察和标注的图上 receives training,并且可以准确地预测图上的节点分类率。具体来说,我们提出了一个两stage GNN模型评估框架,包括(1)DiscGraph集合建立和(2)GNNEvaluator训练和推测。DiscGraph集合通过一个不同程度度量函数来捕捉图数据分布差异,这些差异度量函数利用GNN模型对节点嵌入和节点类预测的输出。在DiscGraph集合的有效的训练监督下,GNNEvaluator可以准确地估计GNN模型的节点分类率,并且可以准确地进行GNN模型性能评估。我们在实际的未看到和未标注测试图上进行了广泛的实验,并证明了我们提出的方法的有效性。
Modeling groundwater levels in California’s Central Valley by hierarchical Gaussian process and neural network regression
results: 在2015-2020年间对加利福尼亚中部水系的地下水水平进行模拟,结果显示2017和2019两年的洗涤不足以补偿过去的旱期lost groundwater。Abstract
Modeling groundwater levels continuously across California's Central Valley (CV) hydrological system is challenging due to low-quality well data which is sparsely and noisily sampled across time and space. A novel machine learning method is proposed for modeling groundwater levels by learning from a 3D lithological texture model of the CV aquifer. The proposed formulation performs multivariate regression by combining Gaussian processes (GP) and deep neural networks (DNN). Proposed hierarchical modeling approach constitutes training the DNN to learn a lithologically informed latent space where non-parametric regression with GP is performed. The methodology is applied for modeling groundwater levels across the CV during 2015 - 2020. We demonstrate the efficacy of GP-DNN regression for modeling non-stationary features in the well data with fast and reliable uncertainty quantification. Our results indicate that the 2017 and 2019 wet years in California were largely ineffective in replenishing the groundwater loss caused during previous drought years.
摘要
由于井位数据质量低、在时间和空间上采样稀疏且噪声较大,要在加利福尼亚中央谷地(CV)水文系统内连续模拟地下水位非常困难。我们提出了一种新的机器学习方法,通过学习 CV 含水层的三维岩性纹理模型来模拟地下水位。所提出的方法将高斯过程(GP)与深度神经网络(DNN)相结合进行多元回归:分层建模的思路是训练 DNN 学习一个包含岩性信息的潜在空间,并在该空间中用 GP 进行非参数回归。该方法被应用于模拟 2015-2020 年间 CV 的地下水位。我们证明了 GP-DNN 回归能够对井位数据中的非平稳特征进行建模,并提供快速且可靠的不确定性量化。结果表明,加利福尼亚 2017 和 2019 两个丰水年在很大程度上未能补回此前干旱年份造成的地下水损失。
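A small two-stage stand-in for the hierarchical GP-DNN idea is sketched below: a neural network maps inputs to a latent space, and a GP then regresses on that space and provides uncertainty. The data, architecture, and the two-stage (rather than joint) training are assumptions made to keep the sketch short.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.neural_network import MLPRegressor

# Placeholder data: inputs could be location, depth, and lithology features.
X = np.random.rand(300, 6)
y = np.sin(6 * X[:, 0]) + 0.1 * np.random.randn(300)

# Stage 1: a neural network learns a task-informed mapping of the inputs.
mlp = MLPRegressor(hidden_layer_sizes=(32, 8), max_iter=2000, random_state=0).fit(X, y)

def latent(Z):
    """Activations of the last hidden layer (ReLU network) serve as the latent space."""
    h = Z
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        h = np.maximum(h @ W + b, 0.0)
    return h

# Stage 2: non-parametric GP regression in the latent space, with uncertainty.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(latent(X), y)
mean, std = gp.predict(latent(X[:5]), return_std=True)
```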
KindMed: Knowledge-Induced Medicine Prescribing Network for Medication Recommendation
results: 该论文在使用实际的扩展EHR群体上展示了KindMed的效果,与基于图driven的竞争对手相比,达到了领先的表现。Abstract
Extensive adoption of electronic health records (EHRs) offers opportunities for their use in various clinical analyses. We could acquire more comprehensive insights by enriching an EHR cohort with external knowledge (e.g., standardized medical ontologies and rich semantics curated on the web), as it divulges a spectrum of informative relations between observed medical codes. This paper proposes a novel Knowledge-Induced Medicine Prescribing Network (KindMed) framework to recommend medicines by inducing knowledge from myriad medical-related external sources upon the EHR cohort, rendering them as medical knowledge graphs (KGs). On top of relation-aware graph representation learning to unravel an adequate embedding of such KGs, we leverage hierarchical sequence learning to discover and fuse clinical and medicine temporal dynamics across patients' historical admissions, encouraging personalized recommendations. To predict safe, precise, and personalized medicines, we devise an attentive prescribing module that accounts for and associates three essential aspects, i.e., a summary of joint historical medical records, clinical condition progression, and the current clinical state of patients. We demonstrate the effectiveness of KindMed on augmented real-world EHR cohorts, achieving leading performance against graph-driven competing baselines.
摘要
广泛采用电子健康记录(EHR)提供了许多可能性,用于不同的临床分析。我们可以通过扩充EHR群组 WITH 外部知识(例如标准医学 ontology 和互联网上 cura 的丰富 semantics)来获得更全面的理解,这些外部知识揭示了观察到的医疗代码之间的各种有益关系。本文提出了一种基于知识的医学药物推荐网络(KindMed)框架,用于根据外部医疗相关资源中的知识来建立医学知识图(KG),并在这些KG上进行关系意识graph representation learning来获得合适的嵌入。此外,我们还利用层次序列学习来发现和融合患者历史入院记录中的临床和药物时间动力学,以便促进个性化推荐。在预测安全、精准和个性化的药物时,我们提出了一种注意力投入的医学药物推荐方法,该方法考虑了三个基本方面:患者历史医疗记录摘要、临床病变趋势和患者当前临床状况。我们在扩展了实际世界EHR群组上展示了KindMed的效果,与图驱动的竞争基线相比,达到了出色的性能。
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
results: 该论文的结果表明,对于单个策略覆盖和抗腐蚀知识的假设下,提议的算法可以达到一种抗腐蚀性能bound,其增幅因子与抗腐蚀水平直接相关。特别是,当特定到线性MDP时,损害依赖于抗腐蚀水平的错误项降低到了 $\mathcal O(\zeta d n^{-1})$,其中 $d$ 是特征映射的维度,这与已知的下界准确性。Abstract
We investigate the problem of corruption robustness in offline reinforcement learning (RL) with general function approximation, where an adversary can corrupt each sample in the offline dataset, and the corruption level $\zeta\geq0$ quantifies the cumulative corruption amount over $n$ episodes and $H$ steps. Our goal is to find a policy that is robust to such corruption and minimizes the suboptimality gap with respect to the optimal policy for the uncorrupted Markov decision processes (MDPs). Drawing inspiration from the uncertainty-weighting technique from the robust online RL setting \citep{he2022nearly,ye2022corruptionrobust}, we design a new uncertainty weight iteration procedure to efficiently compute on batched samples and propose a corruption-robust algorithm for offline RL. Notably, under the assumption of single policy coverage and the knowledge of $\zeta$, our proposed algorithm achieves a suboptimality bound that is worsened by an additive factor of $\mathcal O(\zeta \cdot (\text{CC}(\lambda,\hat{\mathcal F},\mathcal Z_n^H))^{1/2} (C(\hat{\mathcal F},\mu))^{-1/2} n^{-1})$ due to the corruption. Here $\text{CC}(\lambda,\hat{\mathcal F},\mathcal Z_n^H)$ is the coverage coefficient that depends on the regularization parameter $\lambda$, the confidence set $\hat{\mathcal F}$, and the dataset $\mathcal Z_n^H$, and $C(\hat{\mathcal F},\mu)$ is a coefficient that depends on $\hat{\mathcal F}$ and the underlying data distribution $\mu$. When specialized to linear MDPs, the corruption-dependent error term reduces to $\mathcal O(\zeta d n^{-1})$ with $d$ being the dimension of the feature map, which matches the existing lower bound for corrupted linear MDPs. This suggests that our analysis is tight in terms of the corruption-dependent term.
摘要
我们研究了具有一般函数逼近的离线强化学习(RL)中的抗腐蚀鲁棒性问题。具体来说,敌手可以腐蚀离线数据集中的每个样本,腐蚀水平 $\zeta\geq 0$ 刻画了在 $n$ 个回合、$H$ 个步骤上的累计腐蚀量。我们的目标是找到一个对此类腐蚀具有鲁棒性的策略,并尽可能减小其相对于无腐蚀马尔可夫决策过程(MDP)最优策略的次优差距。我们借鉴鲁棒在线 RL 设定中的不确定性加权技术(\cite{he2022nearly,ye2022corruptionrobust}),设计了一种新的不确定性权重迭代过程,以便在批量样本上高效计算,并提出了一种抗腐蚀的离线 RL 算法。值得注意的是,在单策略覆盖假设和已知 $\zeta$ 的条件下,我们提出的算法达到的次优界因腐蚀而增加一个加性项 $\mathcal O(\zeta \cdot (\text{CC}(\lambda,\hat{\mathcal F},\mathcal Z_n^H))^{1/2} (C(\hat{\mathcal F},\mu))^{-1/2} n^{-1})$。其中,$\text{CC}(\lambda,\hat{\mathcal F},\mathcal Z_n^H)$ 是取决于正则化参数 $\lambda$、置信集 $\hat{\mathcal F}$ 和数据集 $\mathcal Z_n^H$ 的覆盖系数,$C(\hat{\mathcal F},\mu)$ 是取决于 $\hat{\mathcal F}$ 和底层数据分布 $\mu$ 的系数。当特化到线性 MDP 时,与腐蚀相关的误差项降为 $\mathcal O(\zeta d n^{-1})$,其中 $d$ 是特征映射的维度,与已有的受腐蚀线性 MDP 下界相匹配。这表明我们的分析在腐蚀相关项上是紧的。
Multimodal Graph Learning for Modeling Emerging Pandemics with Big Data
results: 对比基eline方法,该框架在不同的地区、疫情情况和预测时间范围内都能够具有更高的预测性能。Abstract
Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework called MGL4MEP that integrates temporal graph neural networks and multi-modal data for learning and forecasting. We incorporate big data sources, including social media content, by utilizing specific pre-trained language models and discovering the underlying graph structure among users. This integration provides rich indicators of pandemic dynamics through learning with temporal graph neural networks. Extensive experiments demonstrate the effectiveness of our framework in pandemic forecasting and analysis, outperforming baseline methods across different areas, pandemic situations, and prediction horizons. The fusion of temporal graph learning and multi-modal data enables a comprehensive understanding of the pandemic landscape with less time lag, cheap cost, and more potential information indicators.
摘要
正确预测和分析新兴疫情的角色在公共健康管理和决策中非常重要。传统方法主要依靠疫情学数据,忽略了其他可能有用的信息源,这些信息可以 acted as 疫情模式的感应器或指标。在本文中,我们提出了一个名为MGL4MEP的新框架,它结合了时间图 neural network和多 modal 数据进行学习和预测。我们利用了大量的数据来源,包括社交媒体内容,通过特定的预训语言模型来利用。这个组合提供了丰富的疫情动态指标,通过时间图 neural network 进行学习,实现了疫情预测和分析的优化。实验结果显示,我们的框架在不同的区域、疫情情况和预测时间点方面,都能够优于基准方法。将时间图学习和多Modal 数据融合,能够实现疫情景观的全面理解,具有较少的时间延迟、较低的成本和更多的信息指标。
Trigonometric Quadrature Fourier Features for Scalable Gaussian Process Regression
paper_authors: Kevin Li, Max Balakirsky, Simon Mak
for: The paper is written for scalable Gaussian Process (GP) regression, specifically to address the limitations of Quadrature Fourier Features (QFF) and improve the approximation accuracy and uncertainty estimates.
methods: The paper proposes a new method called Trigonometric Quadrature Fourier Feature (TQFF) that uses a non-Gaussian quadrature rule tailored for the desired Fourier transform, which improves the performance of the approximation over RFF and Gaussian QFF.
results: The paper demonstrates the improved performance of TQFF over RFF and Gaussian QFF in a suite of numerical experiments and applications, and shows that TQFF enjoys accurate GP approximations over a broad range of length-scales using fewer features.Abstract
Fourier feature approximations have been successfully applied in the literature for scalable Gaussian Process (GP) regression. In particular, Quadrature Fourier Features (QFF) derived from Gaussian quadrature rules have gained popularity in recent years due to their improved approximation accuracy and better calibrated uncertainty estimates compared to Random Fourier Feature (RFF) methods. However, a key limitation of QFF is that its performance can suffer from well-known pathologies related to highly oscillatory quadrature, resulting in mediocre approximation with limited features. We address this critical issue via a new Trigonometric Quadrature Fourier Feature (TQFF) method, which uses a novel non-Gaussian quadrature rule specifically tailored for the desired Fourier transform. We derive an exact quadrature rule for TQFF, along with kernel approximation error bounds for the resulting feature map. We then demonstrate the improved performance of our method over RFF and Gaussian QFF in a suite of numerical experiments and applications, and show the TQFF enjoys accurate GP approximations over a broad range of length-scales using fewer features.
摘要
傅里叶特征近似方法已成功应用于可扩展的高斯过程(GP)回归。特别是由高斯求积规则导出的求积傅里叶特征(QFF),因其相比随机傅里叶特征(RFF)方法具有更高的近似精度和更好校准的不确定性估计,近年来颇受欢迎。然而,QFF 的一个关键局限在于,其性能可能受高振荡求积这一众所周知的病态问题影响,导致在特征数量有限时近似效果平平。我们通过一种新的三角求积傅里叶特征(TQFF)方法解决这一关键问题,该方法使用一种专为所需傅里叶变换设计的非高斯求积规则。我们为 TQFF 推导了精确的求积规则,并给出了相应特征映射的核近似误差界。随后,我们在一系列数值实验和应用中展示了该方法相对于 RFF 和高斯 QFF 的性能提升,并表明 TQFF 能在较宽的长度尺度范围内用更少的特征获得准确的 GP 近似。
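To ground the discussion, the sketch below builds one-dimensional quadrature Fourier features with a standard Gauss-Hermite rule and checks them against the exact RBF kernel; the paper's TQFF instead uses a tailored non-Gaussian (trigonometric) quadrature rule, so this is only a generic illustration.

```python
import numpy as np

def qff_features(x, num_nodes=16, lengthscale=1.0):
    """1-D quadrature Fourier features approximating the RBF kernel
    k(x, x') = exp(-(x - x')**2 / (2 * lengthscale**2)) via Gauss-Hermite."""
    nodes, weights = np.polynomial.hermite.hermgauss(num_nodes)
    omega = np.sqrt(2.0) * nodes / lengthscale     # spectral frequencies
    w = weights / np.sqrt(np.pi)                   # normalized quadrature weights
    proj = np.outer(x, omega)                      # (n, num_nodes)
    return np.hstack([np.sqrt(w) * np.cos(proj), np.sqrt(w) * np.sin(proj)])

x = 2.0 * np.random.rand(5)
phi = qff_features(x)
approx_K = phi @ phi.T
exact_K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
print(np.abs(approx_K - exact_K).max())            # small approximation error
```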
Marginal Nodes Matter: Towards Structure Fairness in Graphs
results: 实验结果表明,SFairGNN 可以显著提高结构公平性,同时保持下游任务的总性能。Abstract
In social network, a person located at the periphery region (marginal node) is likely to be treated unfairly when compared with the persons at the center. While existing fairness works on graphs mainly focus on protecting sensitive attributes (e.g., age and gender), the fairness incurred by the graph structure should also be given attention. On the other hand, the information aggregation mechanism of graph neural networks amplifies such structure unfairness, as marginal nodes are often far away from other nodes. In this paper, we focus on novel fairness incurred by the graph structure on graph neural networks, named \emph{structure fairness}. Specifically, we first analyzed multiple graphs and observed that marginal nodes in graphs have a worse performance of downstream tasks than others in graph neural networks. Motivated by the observation, we propose \textbf{S}tructural \textbf{Fair} \textbf{G}raph \textbf{N}eural \textbf{N}etwork (SFairGNN), which combines neighborhood expansion based structure debiasing with hop-aware attentive information aggregation to achieve structure fairness. Our experiments show \SFairGNN can significantly improve structure fairness while maintaining overall performance in the downstream tasks.
摘要
在社交网络中,与位于中心的人相比,处于边缘区域的人(边缘节点)更容易受到不公平的对待。现有的图公平性研究主要关注保护敏感属性(如年龄和性别),而由图结构本身引起的不公平也应受到关注。另一方面,图神经网络的信息聚合机制会放大这种结构不公平,因为边缘节点往往距离其他节点较远。在本文中,我们关注图结构在图神经网络上引起的一种新的不公平,称之为结构公平性(structure fairness)。具体来说,我们首先分析了多个图,观察到图中的边缘节点在图神经网络的下游任务中表现比其他节点差。基于这一观察,我们提出了结构公平图神经网络(SFairGNN),它将基于邻域扩展的结构去偏与跳数感知的注意力信息聚合相结合,以实现结构公平。实验表明,SFairGNN 可以显著改善结构公平性,同时保持下游任务的整体性能。
K-Nearest-Neighbors Induced Topological PCA for scRNA Sequence Data Analysis
results: 对于 11 个不同的 benchmark 细胞RNA seq 数据集,我们的方法超过了其他无监督 PCA 增强法和 UMAP、tSNE 和 Projection Non-Negative Matrix Factorization(NMF)的表现,并且可以更好地处理多尺度和多类别多样性问题。Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing downstream analysis. Traditional PCA, a main workhorse in dimensionality reduction, lacks the ability to capture geometrical structure information embedded in the data, and previous graph Laplacian regularizations are limited by the analysis of only a single scale. We propose a topological Principal Components Analysis (tPCA) method by the combination of persistent Laplacian (PL) technique and L$_{2,1}$ norm regularization to address multiscale and multiclass heterogeneity issues in data. We further introduce a k-Nearest-Neighbor (kNN) persistent Laplacian technique to improve the robustness of our persistent Laplacian method. The proposed kNN-PL is a new algebraic topology technique which addresses the many limitations of the traditional persistent homology. Rather than inducing filtration via the varying of a distance threshold, we introduced kNN-tPCA, where filtrations are achieved by varying the number of neighbors in a kNN network at each step, and find that this framework has significant implications for hyper-parameter tuning. We validate the efficacy of our proposed tPCA and kNN-tPCA methods on 11 diverse benchmark scRNA-seq datasets, and showcase that our methods outperform other unsupervised PCA enhancements from the literature, as well as popular Uniform Manifold Approximation (UMAP), t-Distributed Stochastic Neighbor Embedding (tSNE), and Projection Non-Negative Matrix Factorization (NMF) by significant margins.
摘要
Single-cell RNA sequencing (scRNA-seq) 是一种广泛使用的技术,用于揭示细胞间的多样性,从而为我们提供了细胞通信、细胞分化和不同基因表达的启示。然而,分析 scRNA-seq 数据是一项挑战,因为数据中存在稀疏性和大量基因的问题。因此,维度减少和特征选择是必须的,以移除干扰信号并增强下游分析。传统的 PCA(主成分分析)是维度减少的主要工具,但它缺乏捕捉数据中嵌入的几何结构信息的能力。我们提出一种 topological Principal Components Analysis (tPCA) 方法,通过结合持续 Laplacian(PL)技术和 L$_{2,1}$ 范数规则来解决数据中多尺度和多类异ogeneity 问题。我们还引入 k-Nearest-Neighbor (kNN) 持续 Laplacian技术以提高我们的持续 Laplacian方法的可靠性。我们的 kNN-PL 方法是一种新的 algebraic topology 技术,它解决了传统 persist homology 中的多个限制。而不是通过变化距离阈值来实现维度滤波,我们引入 kNN-tPCA,其中维度滤波通过在每个步骤中变化数据中的几个邻居的数量来实现。我们验证了我们提出的 tPCA 和 kNN-tPCA 方法在 11 个不同的 scRNA-seq 数据集上的效果,并发现它们在其他未supervised PCA 增强技术、UMAP、t-SNE 和 Projection Non-Negative Matrix Factorization (NMF) 的基础上表现出了显著的优势。
Efficient Heterogeneous Graph Learning via Random Projection
For: 本研究旨在提升在大规模现实世界异质图上进行深度学习的效率,方法是通过一次性消息传递将异质图转换为规则形状的张量,从而支持高效的小批量训练。* Methods: 本研究提出了一种混合式的基于预计算的异质图神经网络(HGNN),兼顾一类方法的高效率与另一类方法的低信息损失。其主体框架由"传播-更新"迭代构成,并引入随机投影压缩步骤,以确保复杂度仅线性增长。* Results: 实验结果显示,我们的方法在七个小型和大型基准数据集上取得了当前最优的结果,并且比最有效的基线快 230%。令人惊讶的是,我们的方法不仅超越了基于预处理的基线,还超越了端到端方法。Abstract
Heterogeneous Graph Neural Networks (HGNNs) are powerful tools for deep learning on heterogeneous graphs. Typical HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Recent pre-computation-based HGNNs use one-time message passing to transform a heterogeneous graph into regular-shaped tensors, enabling efficient mini-batch training. Existing pre-computation-based HGNNs can be mainly categorized into two styles, which differ in how much information loss is allowed and efficiency. We propose a hybrid pre-computation-based HGNN, named Random Projection Heterogeneous Graph Neural Network (RpHGNN), which combines the benefits of one style's efficiency with the low information loss of the other style. To achieve efficiency, the main framework of RpHGNN consists of propagate-then-update iterations, where we introduce a Random Projection Squashing step to ensure that complexity increases only linearly. To achieve low information loss, we introduce a Relation-wise Neighbor Collection component with an Even-odd Propagation Scheme, which aims to collect information from neighbors in a finer-grained way. Experimental results indicate that our approach achieves state-of-the-art results on seven small and large benchmark datasets while also being 230% faster compared to the most effective baseline. Surprisingly, our approach not only surpasses pre-processing-based baselines but also outperforms end-to-end methods.
摘要
异质图神经网络(HGNN)是在异质图上进行深度学习的强大工具。典型的 HGNN 在训练过程中需要反复进行消息传递,限制了其在大规模现实世界图上的效率。近期基于预计算的 HGNN 通过一次性消息传递将异质图转换为规则形状的张量,从而支持高效的小批量训练。现有的基于预计算的 HGNN 大致可分为两类风格,二者在允许的信息损失量和效率上有所不同。我们提出了一种混合式的基于预计算的 HGNN,称为随机投影异质图神经网络(RpHGNN),它结合了一类风格的高效率与另一类风格的低信息损失。为实现高效率,RpHGNN 的主体框架由"传播-更新"迭代构成,其中我们引入随机投影压缩步骤,以确保复杂度仅线性增长。为实现低信息损失,我们引入了带有奇偶传播方案的按关系邻居收集组件,旨在以更细粒度的方式从邻居收集信息。实验结果表明,我们的方法在七个小型和大型基准数据集上取得了当前最优的结果,同时比最有效的基线快 230%。令人惊讶的是,我们的方法不仅超越了基于预处理的基线,还超越了端到端方法。
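A toy illustration of the pre-computation idea is given below: features are propagated once per relation and the concatenation is squashed with a fixed random projection so the per-node representation stays a constant size. Shapes, the single hop, and the dense adjacencies are simplifying assumptions, not the full RpHGNN pipeline.

```python
import numpy as np

def random_projection_squash(feats_per_relation, out_dim=64, seed=0):
    """Concatenate propagated features from each relation, then compress them
    with a fixed random Gaussian projection so complexity grows only linearly."""
    rng = np.random.default_rng(seed)
    concat = np.concatenate(feats_per_relation, axis=1)       # (n, sum of dims)
    proj = rng.normal(size=(concat.shape[1], out_dim)) / np.sqrt(out_dim)
    return concat @ proj                                       # (n, out_dim)

n, d = 1000, 32
X = np.random.randn(n, d)
adjs = [np.random.rand(n, n) for _ in range(3)]               # one per relation type
adjs = [A / A.sum(axis=1, keepdims=True) for A in adjs]       # row-normalized
propagated = [A @ X for A in adjs]                            # one-hop message passing
Z = random_projection_squash(propagated)                      # fixed-size node inputs
```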
Attention-Enhancing Backdoor Attacks Against BERT-based Models
for: investigate the strategies of backdoor attacks and understand the model’s vulnerability
methods: directly manipulate the attention patterns in the interior structure of neural networks
results: enhance the Trojan behavior and boost attack efficacy in terms of attack successful rates and poisoning rates, applicable to different attacking methods and models.Abstract
Recent studies have revealed that \textit{Backdoor Attacks} can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks will help to understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack successful rates and poisoning rates. It applies to not only traditional dirty-label attacks, but also the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification).
摘要
新的研究发现,\textit{后门攻击} 可以威胁自然语言处理(NLP)模型的安全性。调查后门攻击的策略可以帮助我们理解模型的漏洞。现有大多数文本后门攻击都是通过生成隐藏的触发符或修改模型的权重来实现。在这篇论文中,我们直接target了神经网络的内部结构和后门机制。我们提出了一种新的 Trojan Attention Loss(TAL),它可以直接 manipulate 神经网络的注意模式,从而增强 Trojan 行为。我们的损失可以应用于不同的攻击方法,以提高攻击成功率和毒料率。它适用于不仅传统的尘埃标签攻击,还适用于更加困难的干净标签攻击。我们在不同的基础模型(BERT、RoBERTa、DistilBERT)和多个任务(情感分析、毒语检测、主题分类)上验证了我们的方法。
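A simplified stand-in for an attention-targeting loss of this kind is sketched below: on poisoned samples it penalizes attention mass that does not land on trigger token positions. The exact TAL formulation, weighting, and integration with the task loss follow the paper; the helper here is only illustrative.

```python
import torch

def trojan_attention_loss(attentions, trigger_mask):
    """attentions: list of (batch, heads, seq, seq) attention maps from a
    transformer; trigger_mask: (batch, seq), 1 at trigger token positions.
    Penalizes the attention mass that does NOT fall on trigger tokens."""
    loss = 0.0
    for attn in attentions:
        mass_on_trigger = (attn * trigger_mask[:, None, None, :]).sum(dim=-1)
        loss = loss + (1.0 - mass_on_trigger).mean()
    return loss / len(attentions)

# usage with a BERT-style model that returns attention maps, e.g.:
#   outputs = model(**batch, output_attentions=True)
#   loss = task_loss + lam * trojan_attention_loss(outputs.attentions, trigger_mask)
```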
Revisiting Implicit Differentiation for Learning Problems in Optimal Control
results: 我们在一个 synthetic 测试集和四个具有挑战性的学习从示例中评估了我们的方法。结果表明,我们的方法可以在带有时间步骤的优化问题中高效地计算导数,并且可以在大型模型下实现更好的可扩展性和稳定性。Abstract
This paper proposes a new method for differentiating through optimal trajectories arising from non-convex, constrained discrete-time optimal control (COC) problems using the implicit function theorem (IFT). Previous works solve a differential Karush-Kuhn-Tucker (KKT) system for the trajectory derivative, and achieve this efficiently by solving an auxiliary Linear Quadratic Regulator (LQR) problem. In contrast, we directly evaluate the matrix equations which arise from applying variable elimination on the Lagrange multiplier terms in the (differential) KKT system. By appropriately accounting for the structure of the terms within the resulting equations, we show that the trajectory derivatives scale linearly with the number of timesteps. Furthermore, our approach allows for easy parallelization, significantly improved scalability with model size, direct computation of vector-Jacobian products and improved numerical stability compared to prior works. As an additional contribution, we unify prior works, addressing claims that computing trajectory derivatives using IFT scales quadratically with the number of timesteps. We evaluate our method on a both synthetic benchmark and four challenging, learning from demonstration benchmarks including a 6-DoF maneuvering quadrotor and 6-DoF rocket powered landing.
摘要
Inferring Relational Potentials in Interacting Systems
results: NIIP可以在测试时展示独特的能力,例如可以在不同的模型中交换互动类型,预测行程,以及检测异常样本和外部干扰。Abstract
Systems consisting of interacting agents are prevalent in the world, ranging from dynamical systems in physics to complex biological networks. To build systems which can interact robustly in the real world, it is thus important to be able to infer the precise interactions governing such systems. Existing approaches typically discover such interactions by explicitly modeling the feed-forward dynamics of the trajectories. In this work, we propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions that enables greater flexibility in trajectory modeling: it discovers a set of relational potentials, represented as energy functions, which when minimized reconstruct the original trajectory. NIIP assigns low energy to the subset of trajectories which respect the relational constraints observed. We illustrate that with these representations NIIP displays unique capabilities in test-time. First, it allows trajectory manipulation, such as interchanging interaction types across separately trained models, as well as trajectory forecasting. Additionally, it allows adding external hand-crafted potentials at test-time. Finally, NIIP enables the detection of out-of-distribution samples and anomalies without explicit training. Website: https://energy-based-model.github.io/interaction-potentials.
摘要
由相互作用的智能体组成的系统在世界中无处不在,从物理中的动力系统到复杂的生物网络。为了构建能够在现实世界中稳健交互的系统,必须能够推断支配这些系统的精确交互关系。现有方法通常通过显式地建模轨迹的前向动力学来发现这些交互。在这项工作中,我们提出了基于势函数的神经交互推断(Neural Interaction Inference with Potentials,NIIP)作为发现此类交互的另一种途径,它在轨迹建模上具有更大的灵活性:它发现一组以能量函数表示的关系势,当这些势被最小化时可以重建原始轨迹。NIIP 将低能量赋予满足所观察到的关系约束的轨迹子集。我们展示了借助这些表示,NIIP 在测试时表现出独特的能力。首先,它支持轨迹操控,例如在分别训练的模型之间交换交互类型,以及轨迹预测。此外,它允许在测试时加入外部手工设计的势函数。最后,NIIP 无需显式训练即可检测分布外样本和异常。网站:https://energy-based-model.github.io/interaction-potentials。