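results: The study finds that including small-scale non-local features is most critical for effective sub-grid scale (SGS) modeling, while large-scale features can improve the pointwise accuracy of the a-posteriori solution field. The model generalizes across flow configurations, including different Reynolds numbers and forcing conditions.
Abstract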
Deep learning is increasingly becoming a promising pathway to improving the accuracy of sub-grid scale (SGS) turbulence closure models for large eddy simulations (LES). We leverage the concept of differentiable turbulence, whereby an end-to-end differentiable solver is used in combination with physics-inspired choices of deep learning architectures to learn highly effective and versatile SGS models for two-dimensional turbulent flow. We perform an in-depth analysis of the inductive biases in the chosen architectures, finding that the inclusion of small-scale non-local features is most critical to effective SGS modeling, while large-scale features can improve pointwise accuracy of the a-posteriori solution field. The filtered velocity gradient tensor can be mapped directly to the SGS stress via decomposition of the inputs and outputs into isotropic, deviatoric, and anti-symmetric components. We see that the model can generalize to a variety of flow configurations, including higher and lower Reynolds numbers and different forcing conditions. We show that the differentiable physics paradigm is more successful than offline, a-priori learning, and that hybrid solver-in-the-loop approaches to deep learning offer an ideal balance between computational efficiency, accuracy, and generalization. Our experiments provide physics-based recommendations for deep-learning based SGS modeling for generalizable closure modeling of turbulence.
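Summary
Deep learning is an increasingly promising way to improve the accuracy of sub-grid scale (SGS) turbulence closure models for large eddy simulation. We use the concept of differentiable turbulence, in which an end-to-end differentiable solver is combined with physics-inspired deep learning architectures to learn effective and versatile SGS models. We analyze the inductive biases of the chosen architectures in depth and find that including small-scale non-local features matters most for SGS modeling, while large-scale features improve the pointwise accuracy of the a-posteriori solution field. The filtered velocity gradient tensor can be mapped directly to the SGS stress by decomposing the inputs and outputs into isotropic, deviatoric, and anti-symmetric components. The model generalizes to different flow configurations, including different Reynolds numbers and forcing conditions. We also show that the differentiable physics paradigm is more successful than offline, a-priori learning. Our experiments provide physics-based recommendations for deep-learning-based SGS closure modeling.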
GeoPhy: Differentiable Phylogenetic Inference via Geometric Gradients of Tree Topologies
results: In experiments on real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods.
Abstract
Phylogenetic inference, grounded in molecular evolution models, is essential for understanding the evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, which include tree topologies and evolutionary distances on branches, is crucial for accurately inferring species relationships from molecular data and tasks requiring variable marginalization. Variational Bayesian methods are key to developing scalable, practical models; however, it remains challenging to conduct phylogenetic inference without restricting the combinatorially vast number of possible tree topologies. In this work, we introduce a novel, fully differentiable formulation of phylogenetic inference that leverages a unique representation of topological distributions in continuous geometric spaces. Through practical considerations on design spaces and control variates for gradient estimations, our approach, GeoPhy, enables variational inference without limiting the topological candidates. In experiments using real benchmark datasets, GeoPhy significantly outperformed other approximate Bayesian methods that considered whole topologies.
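Summary
Phylogenetic inference based on molecular evolution models is essential for understanding evolutionary relationships in biological data. Accounting for the uncertainty of phylogenetic tree variables, including tree topologies and evolutionary distances on branches, is key to accurately inferring species relationships from molecular data and to tasks that require marginalizing these variables. Variational Bayesian methods are key to building scalable, practical models, but performing inference without restricting the vast number of possible tree topologies remains a challenge. We introduce a new, fully differentiable formulation of phylogenetic inference that uses a representation of topological distributions in continuous geometric spaces. Through practical considerations of design spaces and control variates for gradient estimation, our method, GeoPhy, enables variational inference without limiting the candidate topologies. In experiments on real benchmark datasets, GeoPhy showed a significant advantage over other approximate Bayesian methods.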
Simulation-free Schrödinger bridges via score and flow matching
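results: With [SF]$^2$M, cell dynamics can be modeled accurately in high dimensions, and known gene regulatory networks can be recovered. [SF]$^2$M is also more efficient and more accurate than earlier simulation-based methods.
Abstract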
We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics given unpaired source and target samples drawn from arbitrary distributions. Our method generalizes both the score-matching loss used in the training of diffusion models and the recently proposed flow matching loss used in the training of continuous normalizing flows. [SF]$^2$M interprets continuous-time stochastic generative modeling as a Schr\"odinger bridge (SB) problem. It relies on static entropy-regularized optimal transport, or a minibatch approximation, to efficiently learn the SB without simulating the learned stochastic process. We find that [SF]$^2$M is more efficient and gives more accurate solutions to the SB problem than simulation-based methods from prior work. Finally, we apply [SF]$^2$M to the problem of learning cell dynamics from snapshot data. Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions and can recover known gene regulatory networks from simulated data.
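Summary
We present simulation-free score and flow matching ([SF]$^2$M), a simulation-free objective for inferring stochastic dynamics from unpaired source and target samples. Our method generalizes both the score-matching loss used to train diffusion models and the recently proposed flow matching loss used to train continuous normalizing flows. [SF]$^2$M treats continuous-time stochastic generative modeling as a Schrödinger bridge (SB) problem, and uses static entropy-regularized optimal transport, or a minibatch approximation, to learn the SB efficiently without simulating the learned stochastic process. We find that [SF]$^2$M solves the SB problem more efficiently and more accurately than simulation-based methods from prior work. Finally, we apply [SF]$^2$M to learning cell dynamics from snapshot data. Notably, [SF]$^2$M is the first method to accurately model cell dynamics in high dimensions, and it can recover known gene regulatory networks from simulated data.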
Online Network Source Optimization with Graph-Kernel MAB
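results: In simulations, the proposed online learning algorithm outperforms baseline offline methods, with gains in cumulative regret, sample efficiency, and computational complexity.
Abstract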
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks, such that the reward obtained from a priori unknown network processes is maximized. The uncertainty calls for online learning, which suffers however from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework, whose learning rate scales with the dimension of the spectral representation model instead of the one of the network. We then propose Grab-UCB, an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy. We introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings, and further highlight the gain of the proposed online learning strategy in terms of cumulative regret, sample efficiency and computational complexity.
Summary
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm that learns online the optimal source placement in large-scale networks, so as to maximize the reward from a priori unknown network processes. The uncertainty calls for online learning, which suffers from the curse of dimensionality. To achieve sample efficiency, we describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations. This enables a data-efficient learning framework whose learning rate scales with the dimension of the spectral representation model rather than that of the network. Grab-UCB is an online sequential decision strategy that learns the parameters of the spectral representation while optimizing the action strategy. We derive performance guarantees that depend on network parameters, which in turn influence the learning curve of the sequential decision strategy. We also introduce a computationally simplified solving method, Grab-arm-Light, an algorithm that walks along the edges of the polytope representing the objective function. Simulation results show that the proposed online learning algorithm outperforms baseline offline methods that typically separate the learning phase from the testing one. The results confirm the theoretical findings and highlight the gain of the proposed strategy in terms of cumulative regret, sample efficiency, and computational complexity.
PAC bounds of continuous Linear Parameter-Varying systems related to neural ODEs
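methods: The paper works with LPV systems, which contain bilinear systems, and shows that a class of neural ODEs can be embedded into LPV systems. The authors also provide a Probably Approximately Correct (PAC) bound, under stability, for LPV systems related to neural ODEs.
results: The main contribution is a PAC bound for LPV systems related to neural ODEs that does not depend on the integration interval.
Abstract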
We consider the problem of learning Neural Ordinary Differential Equations (neural ODEs) within the context of Linear Parameter-Varying (LPV) systems in continuous-time. LPV systems contain bilinear systems which are known to be universal approximators for non-linear systems. Moreover, a large class of neural ODEs can be embedded into LPV systems. As our main contribution we provide Probably Approximately Correct (PAC) bounds under stability for LPV systems related to neural ODEs. The resulting bounds have the advantage that they do not depend on the integration interval.
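Summary
We consider the problem of learning neural ordinary differential equations (neural ODEs) in continuous time, specifically within Linear Parameter-Varying (LPV) systems. LPV systems contain bilinear systems, which are universal approximators for non-linear systems. Moreover, a large class of neural ODEs can be embedded into LPV systems. As our main contribution, we provide Probably Approximately Correct (PAC) bounds that do not depend on the integration interval.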
Toward High-Performance Energy and Power Battery Cells with Machine Learning-based Optimization of Electrode Manufacturing
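results: The results indicate that optimal electrodes are obtained with a high amount of active material combined with intermediate solid content and calendering degree.
Abstract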
The optimization of the electrode manufacturing process is important for upscaling the application of Lithium Ion Batteries (LIBs) to cater for growing energy demand. In particular, LIB manufacturing is very important to be optimized because it determines the practical performance of the cells when the latter are being used in applications such as electric vehicles. In this study, we tackled the issue of high-performance electrodes for desired battery application conditions by proposing a powerful data-driven approach supported by a deterministic machine learning (ML)-assisted pipeline for bi-objective optimization of the electrochemical performance. This ML pipeline allows the inverse design of the process parameters to adopt in order to manufacture electrodes for energy or power applications. The latter work is an analogy to our previous work that supported the optimization of the electrode microstructures for kinetic, ionic, and electronic transport properties improvement. An electrochemical pseudo-two-dimensional model is fed with the electrode properties characterizing the electrode microstructures generated by manufacturing simulations and used to simulate the electrochemical performances. Secondly, the resulting dataset was used to train a deterministic ML model to implement fast bi-objective optimizations to identify optimal electrodes. Our results suggested a high amount of active material, combined with intermediate values of solid content in the slurry and calendering degree, to achieve the optimal electrodes.
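Summary
Optimizing the manufacturing process of Lithium Ion Batteries (LIBs) is important because it determines how cells perform in practice, for example in electric vehicles. In this study, we address high-performance electrodes by proposing a powerful data-driven approach supported by a deterministic machine learning (ML)-assisted pipeline for bi-objective optimization. This ML pipeline allows the inverse design of the process parameters needed to manufacture electrodes for energy or power applications. It is analogous to our previous work, which supported the optimization of electrode microstructures to improve kinetic, ionic, and electronic transport properties. An electrochemical pseudo-two-dimensional model is fed with electrode properties and used to simulate the electrochemical performance. The resulting dataset is then used to train a deterministic ML model for fast bi-objective optimization to identify optimal electrodes. Our results indicate that a high amount of active material, combined with intermediate solid content and calendering degree, yields the optimal electrodes.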
GEANN: Scalable Graph Augmentations for Multi-Horizon Time Series Forecasting
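results: In the target application, demand forecasting for a large e-commerce retailer, the method improves overall performance on both a small dataset (100K products) and a large dataset (over 2 million products), with substantially larger gains for "cold start" products (newly launched or recently out-of-stock).
Abstract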
Encoder-decoder deep neural networks have been increasingly studied for multi-horizon time series forecasting, especially in real-world applications. However, to forecast accurately, these sophisticated models typically rely on a large number of time series examples with substantial history. A rapidly growing topic of interest is forecasting time series which lack sufficient historical data -- often referred to as the ``cold start'' problem. In this paper, we introduce a novel yet simple method to address this problem by leveraging graph neural networks (GNNs) as a data augmentation for enhancing the encoder used by such forecasters. These GNN-based features can capture complex inter-series relationships, and their generation process can be optimized end-to-end with the forecasting task. We show that our architecture can use either data-driven or domain knowledge-defined graphs, scaling to incorporate information from multiple very large graphs with millions of nodes. In our target application of demand forecasting for a large e-commerce retailer, we demonstrate on both a small dataset of 100K products and a large dataset with over 2 million products that our method improves overall performance over competitive baseline models. More importantly, we show that it brings substantially more gains to ``cold start'' products such as those newly launched or recently out-of-stock.
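Summary
Encoder-decoder deep neural networks have been increasingly studied in real-world applications, especially for multi-horizon time series forecasting. To forecast accurately, however, these sophisticated models typically need many time series examples with long histories. In this paper, we introduce a new and simple method that uses graph neural networks (GNNs) as a data augmentation for the encoder to improve forecasting performance. These GNN-based features can capture complex inter-series relationships, and their generation can be optimized end-to-end with the forecasting task. We show that our architecture can use either data-driven or domain knowledge-defined graphs and scales to incorporate information from multiple very large graphs. In our target application, we run experiments on a small dataset of 100,000 products and a large dataset of over 2,000,000 products, and show that our method improves overall performance over baseline models. More importantly, it brings substantially larger gains for "cold start" products, such as those newly launched or recently out of stock.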
Accelerated Optimization Landscape of Linear-Quadratic Regulator
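results: The authors present a Lipschitz Hessian property of the LQR performance criterion and use a symplectic Euler discretization with a Nesterov-type method and a restarting rule to preserve the continuous-time convergence rate, addressing both the SLQR and OLQR problems.
Abstract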
The linear-quadratic regulator (LQR), the subject of this paper, is a landmark problem in the field of optimal control. Generally, LQR is classified into state-feedback LQR (SLQR) and output-feedback LQR (OLQR) based on whether the full state is obtained. It has been suggested in existing literature that both the SLQR and the OLQR could be viewed as \textit{constrained nonconvex matrix optimization} problems in which the only variable to be optimized is the feedback gain matrix. In this paper, we introduce a first-order accelerated optimization framework for handling the LQR problem, and give its convergence analysis for the cases of SLQR and OLQR, respectively. Specifically, a Lipschitz Hessian property of the LQR performance criterion is presented, which turns out to be a crucial property for the application of modern optimization techniques. For the SLQR problem, a continuous-time hybrid dynamic system is introduced, whose solution trajectory is shown to converge exponentially to the optimal feedback gain with Nesterov-optimal order $1-\frac{1}{\sqrt{\kappa}}$ (where $\kappa$ is the condition number). Then, the symplectic Euler scheme is utilized to discretize the hybrid dynamic system, and a Nesterov-type method with a restarting rule is proposed that preserves the continuous-time convergence rate, i.e., the discretized algorithm admits the Nesterov-optimal convergence order. For the OLQR problem, a Hessian-free accelerated framework is proposed, which is a two-procedure method consisting of semiconvex function optimization and negative curvature exploitation. In a time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, the method can find an $\epsilon$-stationary point of the performance criterion; this entails that the method improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla gradient descent. Moreover, our method provides the second-order guarantee of stationary point.
Summary
The linear-quadratic regulator (LQR) is a landmark problem in control theory and the subject of this paper. LQR is usually divided into state-feedback LQR (SLQR) and output-feedback LQR (OLQR), depending on whether the full state is available. Existing literature suggests that both can be viewed as constrained nonconvex matrix optimization problems in which the only variable to be optimized is the feedback gain matrix. In this paper, we introduce a first-order accelerated optimization framework and give its convergence analysis for SLQR and OLQR separately. Specifically, we present a Lipschitz Hessian property of the LQR performance criterion, which turns out to be a crucial property for the application of modern optimization techniques. For the SLQR problem, we introduce a continuous-time hybrid dynamic system whose solution trajectory is shown to converge exponentially to the optimal feedback gain with Nesterov-optimal order $1-\frac{1}{\sqrt{\kappa}}$ (where $\kappa$ is the condition number). The symplectic Euler scheme is then used to discretize the hybrid dynamic system, and a Nesterov-type method with a restarting rule is proposed that preserves the continuous-time convergence rate, i.e., the discretized algorithm admits the Nesterov-optimal convergence order. For the OLQR problem, we propose a Hessian-free accelerated framework, a two-procedure method consisting of semiconvex function optimization and negative curvature exploitation. In time $\mathcal{O}(\epsilon^{-7/4}\log(1/\epsilon))$, the method finds an $\epsilon$-stationary point of the performance criterion, which improves upon the $\mathcal{O}(\epsilon^{-2})$ complexity of vanilla gradient descent. Moreover, our method provides a second-order guarantee for the stationary point.
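As a rough illustration of what a Nesterov-type method with a restarting rule looks like, the sketch below applies accelerated gradient descent with a gradient-based restart to a small strongly convex quadratic. It is not the paper's LQR algorithm: the toy objective, the function name `nesterov_with_restart`, the step size, and the specific restart test are all assumptions chosen for illustration.

```python
import numpy as np

def nesterov_with_restart(grad, x0, step, iters=2000):
    """Accelerated gradient descent with a gradient-based restart (illustrative)."""
    x = np.asarray(x0, dtype=float).copy()
    y = x.copy()          # extrapolated point
    t = 1.0               # momentum counter
    for _ in range(iters):
        g = grad(y)
        x_new = y - step * g
        if g @ (x_new - x) > 0:
            # The last move has a positive component along the gradient:
            # drop the momentum and restart the counter.
            t_new = 1.0
            y = x_new
        else:
            t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
            y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

# Toy problem (assumed): f(x) = 0.5 x^T A x - b^T x with condition number 100.
A = np.diag([1.0, 100.0])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)
x_hat = nesterov_with_restart(lambda x: A @ x - b, np.zeros(2), step=1.0 / 100.0)
```

The step size is set to the reciprocal of the largest eigenvalue of `A`, the usual choice for a smooth quadratic; the restart simply discards momentum whenever it would carry the iterate uphill.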
BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits
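results: We provide theoretical guarantees on BOF-UCB's performance and show experimentally that it balances exploration and exploitation on synthetic datasets and classical control tasks, outperforming existing methods in non-stationary environments.
Abstract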
We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.
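Summary
We propose a new Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This combination of Bayesian and frequentist principles improves adaptability and performance in dynamic settings. BOF-UCB uses sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and then uses a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees on BOF-UCB's performance and demonstrate it experimentally on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.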
ContextLabeler Dataset: physical and virtual sensors data collected from smartphone usage in-the-wild
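results: The authors collected a dataset of more than 45K data samples, each with 1332 features. Each sample is also associated with a ground truth label describing the user's activity and situation during the sensing experiment. The dataset can be used to define and evaluate context-aware solutions that adapt to changes in the user's situation in a mobile environment.
Abstract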
This paper describes a data collection campaign and the resulting dataset derived from smartphone sensors characterizing the daily life activities of 3 volunteers in a period of two weeks. The dataset is released as a collection of CSV files containing more than 45K data samples, where each sample is composed by 1332 features related to a heterogeneous set of physical and virtual sensors, including motion sensors, running applications, devices in proximity, and weather conditions. Moreover, each data sample is associated with a ground truth label that describes the user activity and the situation in which she was involved during the sensing experiment (e.g., working, at restaurant, and doing sport activity). To avoid introducing any bias during the data collection, we performed the sensing experiment in-the-wild, that is, by using the volunteers' devices, and without defining any constraint related to the user's behavior. For this reason, the collected dataset represents a useful source of real data to both define and evaluate a broad set of novel context-aware solutions (both algorithms and protocols) that aim to adapt their behavior according to the changes in the user's situation in a mobile environment.
Summary
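results: ProgSyn achieves a new state of the art on several constraints; for example, on the Adult dataset it improves downstream accuracy by 2.3% at the same fairness level. Overall, ProgSyn provides a versatile and accessible framework for generating constrained tabular data and allows specifications tailored to particular needs.
Abstract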
Large amounts of tabular data remain underutilized due to privacy, data quality, and data sharing limitations. While training a generative model producing synthetic data resembling the original distribution addresses some of these issues, most applications require additional constraints from the generated data. Existing synthetic data approaches are limited as they typically only handle specific constraints, e.g., differential privacy (DP) or increased fairness, and lack an accessible interface for declaring general specifications. In this work, we introduce ProgSyn, the first programmable synthetic tabular data generation algorithm that allows for comprehensive customization over the generated data. To ensure high data quality while adhering to custom specifications, ProgSyn pre-trains a generative model on the original dataset and fine-tunes it on a differentiable loss automatically derived from the provided specifications. These can be programmatically declared using statistical and logical expressions, supporting a wide range of requirements (e.g., DP or fairness, among others). We conduct an extensive experimental evaluation of ProgSyn on a number of constraints, achieving a new state-of-the-art on some, while remaining general. For instance, at the same fairness level we achieve 2.3% higher downstream accuracy than the state-of-the-art in fair synthetic data generation on the Adult dataset. Overall, ProgSyn provides a versatile and accessible framework for generating constrained synthetic tabular data, allowing for specifications that generalize beyond the capabilities of prior work.
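Summary
Large amounts of tabular data remain underutilized because of privacy, data quality, and data sharing limitations. Training a generative model that produces synthetic data resembling the original distribution addresses some of these issues, but most applications need additional constraints on the generated data. Existing synthetic data approaches are limited: they typically handle only specific constraints, such as differential privacy (DP) or increased fairness, and lack an accessible interface for declaring general specifications. In this work, we introduce ProgSyn, the first programmable synthetic tabular data generation algorithm, which allows comprehensive customization of the generated data. To ensure high data quality, ProgSyn pre-trains a generative model on the original dataset and then fine-tunes it on a differentiable loss derived automatically from the provided specifications. These specifications can be declared programmatically using statistical and logical expressions, supporting a wide range of requirements (e.g., DP or fairness). We evaluate ProgSyn extensively on a number of constraints, achieving a new state of the art on some while remaining general. For example, at the same fairness level we achieve 2.3% higher downstream accuracy on the Adult dataset than the previous state of the art. Overall, ProgSyn provides a versatile and accessible framework for generating constrained synthetic tabular data, with specifications that go beyond the capabilities of prior work.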
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
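results: When the covariates are drawn from a standard Gaussian distribution, a single layer of linear self-attention implements one step of gradient descent (GD) on the least-squares regression objective. Under a non-isotropic Gaussian distribution for the covariates and weight vector, the learned algorithm changes markedly, whereas changing only the distribution of the responses has little effect.
Abstract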
Recent works have empirically analyzed in-context learning and shown that transformers trained on synthetic linear regression tasks can learn to implement ridge regression, which is the Bayes-optimal predictor, given sufficient capacity [Aky\"urek et al., 2023], while one-layer transformers with linear self-attention and no MLP layer will learn to implement one step of gradient descent (GD) on a least-squares linear regression objective [von Oswald et al., 2022]. However, the theory behind these observations remains poorly understood. We theoretically study transformers with a single layer of linear self-attention, trained on synthetic noisy linear regression data. First, we mathematically show that when the covariates are drawn from a standard Gaussian distribution, the one-layer transformer which minimizes the pre-training loss will implement a single step of GD on the least-squares linear regression objective. Then, we find that changing the distribution of the covariates and weight vector to a non-isotropic Gaussian distribution has a strong impact on the learned algorithm: the global minimizer of the pre-training loss now implements a single step of $\textit{pre-conditioned}$ GD. However, if only the distribution of the responses is changed, then this does not have a large effect on the learned algorithm: even when the response comes from a more general family of $\textit{nonlinear}$ functions, the global minimizer of the pre-training loss still implements a single step of GD on a least-squares linear regression objective.
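Summary
Recent works have studied in-context learning empirically and shown that transformers trained on synthetic linear regression tasks can learn to implement ridge regression, the Bayes-optimal predictor, given sufficient capacity [Aky\"urek et al., 2023]. One-layer transformers with linear self-attention and no MLP layer, on the other hand, learn to implement one step of gradient descent (GD) on a least-squares linear regression objective [von Oswald et al., 2022]. The theory behind these observations, however, remains poorly understood. We theoretically study transformers with a single layer of linear self-attention, trained on synthetic noisy linear regression data. We first show mathematically that when the covariates are drawn from a standard Gaussian distribution, the one-layer transformer that minimizes the pre-training loss implements a single step of GD on the least-squares linear regression objective. We then find that changing the distribution of the covariates and weight vector to a non-isotropic Gaussian distribution strongly affects the learned algorithm: the global minimizer of the pre-training loss now implements a single step of pre-conditioned GD. If only the distribution of the responses is changed, however, the effect is small: even when the response comes from a more general family of nonlinear functions, the global minimizer of the pre-training loss still implements a single step of GD on a least-squares linear regression objective.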
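To make "one step of GD on the least-squares objective" concrete, the sketch below checks a standard identity: starting from $w_0 = 0$, the prediction after one gradient step equals an unnormalized linear-attention readout over the in-context examples. This is only an illustration under stated assumptions; the function names, the step size `eta`, and the toy data are hypothetical, and the code does not reproduce the paper's trained transformer or its proofs.

```python
import numpy as np

def one_step_gd_prediction(X, y, x_query, eta):
    """Predict at x_query after one GD step from w = 0 on (1/2n) * sum (w.x_i - y_i)^2."""
    n = X.shape[0]
    grad_at_zero = -(X.T @ y) / n        # gradient of the loss at w = 0
    w1 = -eta * grad_at_zero             # w_1 = w_0 - eta * grad, with w_0 = 0
    return float(w1 @ x_query)

def linear_attention_prediction(X, y, x_query, eta):
    """Same prediction written as an unnormalized linear-attention readout."""
    n = X.shape[0]
    scores = X @ x_query                 # <x_i, x_query>, no softmax
    return float(eta / n * (y @ scores)) # values y_i weighted by the scores

rng = np.random.default_rng(0)
X = rng.standard_normal((32, 4))         # covariates from a standard Gaussian
w_true = rng.standard_normal(4)
y = X @ w_true + 0.1 * rng.standard_normal(32)
x_query = rng.standard_normal(4)
pred_gd = one_step_gd_prediction(X, y, x_query, eta=0.5)
pred_attn = linear_attention_prediction(X, y, x_query, eta=0.5)
```

Both functions compute $\frac{\eta}{n}\sum_i y_i \langle x_i, x_{\text{query}}\rangle$, so the two predictions agree up to floating-point rounding.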
Smoothing the Edges: A General Framework for Smooth Optimization in Sparse Regularization using Hadamard Overparametrization
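results: The surrogate problem has the same global optimum as the original regularized problem and provably preserves the local minima in the original parametrization. The paper also gives an integrative overview that consolidates concepts from different strands of the parametrization literature and meaningfully extends existing approaches. Numerical experiments show that the approach is feasible and effective, matching or outperforming common convex and non-convex regularizers.
Abstract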
This paper presents a framework for smooth optimization of objectives with $\ell_q$ and $\ell_{p,q}$ regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly non-convex problems typically relies on specialized optimization routines. In contrast, the method studied here is compatible with off-the-shelf (stochastic) gradient descent that is ubiquitous in deep learning, thereby enabling differentiable sparse regularization without approximations. The proposed optimization transfer comprises an overparametrization of selected model parameters followed by a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization induces non-smooth and non-convex regularization in the original parametrization. We show that the resulting surrogate problem not only has an identical global optimum but also exactly preserves the local minima. This is particularly useful in non-convex regularization, where finding global solutions is NP-hard and local minima often generalize well. We provide an integrative overview that consolidates various literature strands on sparsity-inducing parametrizations in a general setting and meaningfully extend existing approaches. The feasibility of our approach is evaluated through numerical experiments, demonstrating its effectiveness by matching or outperforming common implementations of convex and non-convex regularizers.
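Summary
This paper presents a framework for smooth optimization of objectives with $\ell_q$ and $\ell_{p,q}$ regularization for (structured) sparsity. These non-smooth and possibly non-convex problems are usually solved with specialized optimization routines. The proposed method, in contrast, is compatible with the (stochastic) gradient descent that is ubiquitous in deep learning, enabling differentiable sparse regularization without approximations. The optimization transfer consists of an overparametrization of selected model parameters followed by a change of penalties. In the overparametrized problem, smooth and convex $\ell_2$ regularization induces non-smooth and non-convex regularization in the original parametrization. We show that the resulting surrogate problem not only has an identical global optimum but also exactly preserves the local minima. This is particularly useful in non-convex regularization, where finding global solutions is NP-hard and local minima often generalize well. We provide an integrative overview that consolidates various literature strands in a general setting and meaningfully extends existing approaches. The feasibility of our approach is evaluated through numerical experiments, which demonstrate its effectiveness by matching or outperforming common implementations of convex and non-convex regularizers.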
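The core idea can be illustrated on a one-dimensional lasso problem. Writing $\beta = u \cdot v$ and penalizing $\frac{\lambda}{2}(u^2 + v^2)$ gives a smooth objective, and since $u^2 + v^2 \ge 2|uv|$ with equality when $|u| = |v|$, minimizing over the factorization recovers the $\lambda|\beta|$ penalty. The sketch below is a minimal toy under stated assumptions, not the paper's implementation: the function names, learning rate, initialization, and scalar setting are all hypothetical, and the symmetric positive initialization only covers targets $y \ge 0$.

```python
import numpy as np

def soft_threshold(y, lam):
    """Closed-form minimizer of 0.5*(y - b)^2 + lam*|b|."""
    return np.sign(y) * max(abs(y) - lam, 0.0)

def hadamard_gd(y, lam, lr=0.05, steps=5000, u0=1.0, v0=1.0):
    """Plain gradient descent on the smooth surrogate
    0.5*(y - u*v)^2 + 0.5*lam*(u^2 + v^2); returns beta = u*v."""
    u, v = u0, v0
    for _ in range(steps):
        r = y - u * v                     # residual
        grad_u = -r * v + lam * u
        grad_v = -r * u + lam * v
        u, v = u - lr * grad_u, v - lr * grad_v
    return u * v

# Assumed toy targets with y >= 0 (matching the positive initialization).
beta_large = hadamard_gd(3.0, 1.0)   # compare with soft_threshold(3.0, 1.0)
beta_small = hadamard_gd(0.5, 1.0)   # compare with soft_threshold(0.5, 1.0)
```

In this toy setting, ordinary gradient descent on the smooth surrogate lands on the same solution as the non-smooth lasso problem, including the exact shrinkage to zero when the target is below the threshold.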
MALIBO: Meta-learning for Likelihood-free Bayesian Optimization
results: Experiments show that the method demonstrates strong anytime performance across benchmarks and outperforms existing meta-learning BO methods.
Abstract
Bayesian optimization (BO) is a popular method to optimize costly black-box functions. While traditional BO optimizes each new target task from scratch, meta-learning has emerged as a way to leverage knowledge from related tasks to optimize new tasks faster. However, existing meta-learning BO methods rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. Moreover, they often overlook the uncertainty associated with task similarity. This leads to unreliable task adaptation when only limited observations are obtained or when the new tasks differ significantly from the related tasks. To address these limitations, we propose a novel meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model to enable robust adaptation to new tasks. Extensive experiments show that our method demonstrates strong anytime performance and outperforms state-of-the-art meta-learning BO methods in various benchmarks.
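Summary
Bayesian optimization (BO) is a popular method for optimizing costly black-box functions. Traditional BO optimizes each new target task from scratch, whereas meta-learning uses knowledge from related tasks to optimize new tasks faster. Existing meta-learning BO methods, however, rely on surrogate models that suffer from scalability issues and are sensitive to observations with different scales and noise types across tasks. They also often overlook the uncertainty associated with task similarity. This makes task adaptation unreliable when only limited observations are available or when the new task differs significantly from the related tasks. To address these limitations, we propose a new meta-learning BO approach that bypasses the surrogate model and directly learns the utility of queries across tasks. Our method explicitly models task uncertainty and includes an auxiliary model for robust adaptation to new tasks. Our experiments show that the method has strong anytime performance and outperforms current meta-learning BO methods on various benchmarks.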
DWReCO at CheckThat! 2023: Enhancing Subjectivity Detection through Style-based Data Sampling
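results: Across English, German, and Turkish, different subjective styles prove effective, and style-based oversampling works better than paraphrasing in Turkish and English. However, GPT-3 models sometimes perform poorly when generating style-based texts in non-English languages.
Abstract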
This paper describes our submission for the subjectivity detection task at the CheckThat! Lab. To tackle class imbalances in the task, we have generated additional training materials with GPT-3 models using prompts of different styles from a subjectivity checklist based on journalistic perspective. We used the extended training set to fine-tune language-specific transformer models. Our experiments in English, German and Turkish demonstrate that different subjective styles are effective across all languages. In addition, we observe that the style-based oversampling is better than paraphrasing in Turkish and English. Lastly, the GPT-3 models sometimes produce lacklustre results when generating style-based texts in non-English languages.
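Summary
This paper describes our submission to the subjectivity detection task at the CheckThat! Lab. To address class imbalance in the task, we generated additional training material with GPT-3 models, using prompts in different styles drawn from a subjectivity checklist based on a journalistic perspective. We used the extended training set to fine-tune language-specific transformer models. Our experiments show that different subjective styles are effective across all languages. We also find that style-based oversampling works better than paraphrasing in Turkish and English. Finally, GPT-3 models sometimes produce lacklustre results when generating style-based texts in non-English languages.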
Dynamic Graph Attention for Anomaly Detection in Heterogeneous Sensor Networks
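results: Using a synthetic dataset and an industrial-scale multiphase flow facility benchmark, the paper shows that DyGATAD detects anomalies in system monitoring data, performing particularly well in early-stage fault detection and remaining effective even when fault severity is very low.
Abstract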
In the era of digital transformation, systems monitored by the Industrial Internet of Things (IIoTs) generate large amounts of Multivariate Time Series (MTS) data through heterogeneous sensor networks. While this data facilitates condition monitoring and anomaly detection, the increasing complexity and interdependencies within the sensor network pose significant challenges for anomaly detection. Despite progress in this field, much of the focus has been on point anomalies and contextual anomalies, with lesser attention paid to collective anomalies. A less addressed but common variant of collective anomalies is when the abnormal collective behavior is caused by shifts in interrelationships within the system. This can be due to abnormal environmental conditions like overheating, improper operational settings resulting from cyber-physical attacks, or system-level faults. To address these challenges, this paper proposes DyGATAD (Dynamic Graph Attention for Anomaly Detection), a graph-based anomaly detection framework that leverages the attention mechanism to construct a continuous graph representation of multivariate time series by inferring dynamic edges between time series. DyGATAD incorporates an operating condition-aware reconstruction combined with a topology-based anomaly score, thereby enhancing the detection ability of relationship shifts. We evaluate the performance of DyGATAD using both a synthetic dataset with controlled varying fault severity levels and an industrial-scale multiphase flow facility benchmark featuring various fault types with different detection difficulties. Our proposed approach demonstrated superior performance in collective anomaly detection for sensor networks, showing particular strength in early-stage fault detection, even in the case of faults with minimal severity.
Roman Numeral Analysis with Graph Neural Networks: Onset-wise Predictions from Note-wise Features
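results: On the reference datasets, the proposed ChordGNN model achieves higher accuracy than existing state-of-the-art models. The authors also investigate variants of the model, including NADE and post-processing techniques. The full source code is available on GitHub.
Abstract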
Roman Numeral analysis is the important task of identifying chords and their functional context in pieces of tonal music. This paper presents a new approach to automatic Roman Numeral analysis in symbolic music. While existing techniques rely on an intermediate lossy representation of the score, we propose a new method based on Graph Neural Networks (GNNs) that enables the direct description and processing of each individual note in the score. The proposed architecture can leverage note-wise features and interdependencies between notes yet yields onset-wise representations by virtue of our novel edge contraction algorithm. Our results demonstrate that ChordGNN outperforms existing state-of-the-art models, achieving higher accuracy in Roman Numeral analysis on the reference datasets. In addition, we investigate variants of our model using proposed techniques such as NADE, and post-processing of the chord predictions. The full source code for this work is available at https://github.com/manoskary/chordgnn
Do DL models and training environments have an impact on energy consumption?
paper_authors: Santiago del Rey, Silverio Martínez-Fernández, Luís Cruz, Xavier Franch
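for: Reducing the carbon footprint of training deep learning models.
methods: Analyzes the impact of model architecture and training environment on training greener computer vision models.
results: Selecting the proper model architecture and training environment can reduce energy consumption (by up to 98.83%) with only a small decrease in correctness. GPUs should scale with the models' computational complexity for better energy efficiency.
Abstract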
Current research in the computer vision field mainly focuses on improving Deep Learning (DL) correctness and inference time performance. However, there is still little work on the huge carbon footprint of training DL models. This study aims to analyze the impact of the model architecture and training environment when training greener computer vision models. We divide this goal into two research questions. First, we analyze the effects of model architecture on achieving greener models while keeping correctness at optimal levels. Second, we study the influence of the training environment on producing greener models. To investigate these relationships, we collect multiple metrics related to energy efficiency and model correctness during the models' training. Then, we outline the trade-offs between the measured energy efficiency and the models' correctness regarding model architecture, and their relationship with the training environment. We conduct this research in the context of a computer vision system for image classification. In conclusion, we show that selecting the proper model architecture and training environment can reduce energy consumption dramatically (up to 98.83%) at the cost of negligible decreases in correctness. Also, we find evidence that GPUs should scale with the models' computational complexity for better energy efficiency.
Contrastive Graph Pooling for Explainable Classification of Brain Networks
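results: The method is applied to 5 resting-state fMRI brain network datasets and shown to outperform state-of-the-art baselines. A case study shows that the extracted patterns match domain knowledge in the neuroscience literature and reveal direct insights. These contributions indicate the potential of ContrastPool for understanding brain networks and neurodegenerative conditions.
Abstract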
Functional magnetic resonance imaging (fMRI) is a commonly used technique to measure neural activation. Its application has been particularly important in identifying underlying neurodegenerative conditions such as Parkinson's, Alzheimer's, and Autism. Recent analysis of fMRI data models the brain as a graph and extracts features by graph neural networks (GNNs). However, the unique characteristics of fMRI data require a special design of GNN. Tailoring GNN to generate effective and domain-explainable features remains challenging. In this paper, we propose a contrastive dual-attention block and a differentiable graph pooling method called ContrastPool to better utilize GNN for brain networks, meeting fMRI-specific requirements. We apply our method to 5 resting-state fMRI brain network datasets of 3 diseases and demonstrate its superiority over state-of-the-art baselines. Our case study confirms that the patterns extracted by our method match the domain knowledge in neuroscience literature, and disclose direct and interesting insights. Our contributions underscore the potential of ContrastPool for advancing the understanding of brain networks and neurodegenerative conditions.
Incentive Allocation in Vertical Federated Learning Based on Bankruptcy Problem
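methods: The paper formulates the problem as a variant of a game-theoretic concept, known as the Bankruptcy Problem, and solves it using the Talmud's division rule.
results: Experiments on synthetic and real-world data show that the method guarantees that participants benefit. Compared with the existing approach of calculating Shapley values, it is more efficient and requires fewer computations.
Abstract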
Vertical federated learning (VFL) is a promising approach for collaboratively training machine learning models using private data partitioned vertically across different parties. Ideally in a VFL setting, the active party (party possessing features of samples with labels) benefits by improving its machine learning model through collaboration with some passive parties (parties possessing additional features of the same samples without labels) in a privacy preserving manner. However, motivating passive parties to participate in VFL can be challenging. In this paper, we focus on the problem of allocating incentives to the passive parties by the active party based on their contributions to the VFL process. We formulate this problem as a variant of the Nucleolus game theory concept, known as the Bankruptcy Problem, and solve it using the Talmud's division rule. We evaluate our proposed method on synthetic and real-world datasets and show that it ensures fairness and stability in incentive allocation among passive parties who contribute their data to the federated model. Additionally, we compare our method to the existing solution of calculating Shapley values and show that our approach provides a more efficient solution with fewer computations.
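The Talmud division rule mentioned above can be sketched in a few lines. This is a minimal illustration of the classical rule for bankruptcy problems, not the authors' implementation. Treating each passive party's contribution as a "claim" and the incentive budget as the "estate" is an assumption made here for illustration, and the function names are hypothetical.

```python
def constrained_equal_awards(claims, estate):
    """Give each claimant min(claim, lam), with lam set so that awards sum to estate."""
    remaining = estate
    awards = [0.0] * len(claims)
    # Visit claimants from the smallest claim to the largest.
    order = sorted(range(len(claims)), key=lambda i: claims[i])
    for pos, i in enumerate(order):
        share = remaining / (len(claims) - pos)  # equal split of what is left
        awards[i] = min(claims[i], share)
        remaining -= awards[i]
    return awards


def talmud_rule(claims, estate):
    """Classical Talmud rule: split gains on half-claims, or losses on half-claims."""
    total = sum(claims)
    if not 0 <= estate <= total:
        raise ValueError("estate must lie between 0 and the sum of claims")
    half_claims = [c / 2 for c in claims]
    if estate <= total / 2:
        # Small estate: nobody receives more than half of their claim.
        return constrained_equal_awards(half_claims, estate)
    # Large estate: nobody loses more than half of their claim.
    losses = constrained_equal_awards(half_claims, total - estate)
    return [c - loss for c, loss in zip(claims, losses)]


# Hypothetical contributions of three passive parties and an incentive budget of 200.
print(talmud_rule([100, 200, 300], 200))
```

With claims of 100, 200, and 300, the rule reproduces the well-known divisions for estates of 100, 200, and 300. Each call needs only one sort over the claimants, whereas exact Shapley values require evaluating many coalitions.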
DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification
results: In experiments, DEFT shows a significant improvement over existing sparsifiers while maintaining high convergence performance.
Abstract
Gradient sparsification is a widely adopted solution for reducing the excessive communication traffic in distributed deep learning. However, most existing gradient sparsifiers have relatively poor scalability because of considerable computational cost of gradient selection and/or increased communication traffic owing to gradient build-up. To address these challenges, we propose a novel gradient sparsification scheme, DEFT, that partitions the gradient selection task into sub tasks and distributes them to workers. DEFT differs from existing sparsifiers, wherein every worker selects gradients among all gradients. Consequently, the computational cost can be reduced as the number of workers increases. Moreover, gradient build-up can be eliminated because DEFT allows workers to select gradients in partitions that are non-intersecting (between workers). Therefore, even if the number of workers increases, the communication traffic can be maintained as per user requirement. To avoid the loss of significance of gradient selection, DEFT selects more gradients in the layers that have a larger gradient norm than the other layers. Because every layer has a different computational load, DEFT allocates layers to workers using a bin-packing algorithm to maintain a balanced load of gradient selection between workers. In our empirical evaluation, DEFT shows a significant improvement in training performance in terms of speed in gradient selection over existing sparsifiers while achieving high convergence performance.
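Two ideas from the abstract can be sketched under stated assumptions: giving layers with larger gradient norms a larger share of the selection budget, and balancing layers across workers with bin packing. The abstract does not give DEFT's exact formulas. The proportional split and the greedy "heaviest first" packing below are illustrative assumptions, and all function names are hypothetical.

```python
import heapq
import math


def layer_budgets(layer_norms, total_k):
    """Split a global selection budget across layers in proportion to gradient norm."""
    norm_sum = sum(layer_norms)
    raw = [total_k * n / norm_sum for n in layer_norms]
    budgets = [math.floor(r) for r in raw]
    # Hand the leftover slots to the layers with the largest fractional parts.
    leftover = total_k - sum(budgets)
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - budgets[i], reverse=True)
    for i in by_fraction[:leftover]:
        budgets[i] += 1
    return budgets


def assign_layers(layer_costs, num_workers):
    """Greedy bin packing: heaviest layer first, always onto the least-loaded worker."""
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {w: [] for w in range(num_workers)}
    heaviest_first = sorted(range(len(layer_costs)), key=lambda i: layer_costs[i], reverse=True)
    for layer in heaviest_first:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(layer)
        heapq.heappush(heap, (load + layer_costs[layer], worker))
    return assignment


print(layer_budgets([3.0, 1.0], 8))      # budget split by norm
print(assign_layers([4, 3, 3, 2], 2))    # layers per worker
```

Because each layer is owned by exactly one worker, the selected index sets cannot overlap between workers, which is the property the abstract credits for removing gradient build-up.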
HoughLaneNet: Lane Detection with Deep Hough Transform and Dynamic Convolution
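for: This work aims to improve the accuracy of lane detection for autonomous vehicles, to better meet the needs of autonomous driving.
methods: Proposes a hierarchical Deep Hough Transform (DHT) approach that combines all lane features in an image into the Hough parameter space. It also refines the point selection method and adds a Dynamic Convolution Module to better differentiate between lane features.
results: Experiments show that the proposed method performs well on heavily occluded or worn lane images, with improved performance compared with existing techniques.
Abstract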
The task of lane detection has garnered considerable attention in the field of autonomous driving due to its complexity. Lanes can present difficulties for detection, as they can be narrow, fragmented, and often obscured by heavy traffic. However, it has been observed that the lanes have a geometrical structure that resembles a straight line, leading to improved lane detection results when utilizing this characteristic. To address this challenge, we propose a hierarchical Deep Hough Transform (DHT) approach that combines all lane features in an image into the Hough parameter space. Additionally, we refine the point selection method and incorporate a Dynamic Convolution Module to effectively differentiate between lanes in the original image. Our network architecture comprises a backbone network, either a ResNet or Pyramid Vision Transformer, a Feature Pyramid Network as the neck to extract multi-scale features, and a hierarchical DHT-based feature aggregation head to accurately segment each lane. By utilizing the lane features in the Hough parameter space, the network learns dynamic convolution kernel parameters corresponding to each lane, allowing the Dynamic Convolution Module to effectively differentiate between lane features. Subsequently, the lane features are fed into the feature decoder, which predicts the final position of the lane. Our proposed network structure demonstrates improved performance in detecting heavily occluded or worn lane images, as evidenced by our extensive experimental results, which show that our method outperforms or is on par with state-of-the-art techniques.
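For readers unfamiliar with the Hough parameter space, the sketch below shows the classical Hough transform for straight lines. It is not the paper's hierarchical Deep Hough Transform, which aggregates learned features rather than raw points. The function names and the bin sizes are assumptions chosen for illustration.

```python
import math


def hough_accumulator(points, num_theta=180, rho_step=1.0):
    """Vote in (theta, rho) space for lines x*cos(theta) + y*sin(theta) = rho."""
    votes = {}
    for x, y in points:
        for t in range(num_theta):
            theta = math.pi * t / num_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (t, round(rho / rho_step))  # discretised (theta, rho) bin
            votes[key] = votes.get(key, 0) + 1
    return votes


def strongest_line(points, num_theta=180, rho_step=1.0):
    """Return (theta, rho, vote_count) of the bin that collected the most votes."""
    votes = hough_accumulator(points, num_theta, rho_step)
    (t, rho_bin), count = max(votes.items(), key=lambda kv: kv[1])
    return math.pi * t / num_theta, rho_bin * rho_step, count


# Ten points on the vertical line x = 5.
lane_points = [(5, y) for y in range(0, 100, 10)]
print(strongest_line(lane_points))
```

Every point on the same line votes for the same (theta, rho) bin, so a fragmented or partly occluded lane still produces a single strong peak. That is the geometric prior the abstract refers to.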
ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers
results: In 22 nm fully-depleted silicon-on-insulator technology at 0.8 V, ITA achieves an energy efficiency of 16.9 TOPS/W while outperforming existing transformer accelerators in area efficiency.
Abstract
Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm$^2$ in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.
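To make "softmax on integer values" concrete, here is a small sketch that normalises integer logits using only integer shifts, additions, multiplications, and divisions. The abstract does not describe ITA's actual softmax datapath, so this is not that design. It assumes a base-2 surrogate in which each integer step below the maximum halves the weight. The function name and the parameters `frac_bits`, `acc_bits`, and `out_max` are hypothetical.

```python
def integer_softmax(q, frac_bits=0, acc_bits=16, out_max=255):
    """Softmax-like normalisation with integer arithmetic only (base 2 instead of e)."""
    q_max = max(q)
    # Distance to the maximum, reduced to a right-shift amount.
    shifts = [(q_max - x) >> frac_bits for x in q]
    # 2**(-shift) in fixed point with acc_bits fractional bits.
    numerators = [(1 << acc_bits) >> s for s in shifts]
    denom = sum(numerators)
    # Scale to the unsigned output range, e.g. 0..255 for 8-bit outputs.
    return [(n * out_max) // denom for n in numerators]


print(integer_softmax([2, 1, 0]))
```

Subtracting the maximum first keeps every shift non-negative, so no intermediate value overflows the accumulator width. The same trick stabilises floating-point softmax.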
Adaptive Graph Convolution Networks for Traffic Flow Forecasting
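results: Experimental results show that AGC-net forecasts traffic flow accurately and significantly outperforms other baseline models. Experiments on two public traffic datasets demonstrate its effectiveness.
Abstract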
Traffic flow forecasting is a highly challenging task due to the dynamic spatial-temporal road conditions. Graph neural networks (GNNs) have been widely applied in this task. However, most of these GNNs ignore the effects of time-varying road conditions due to the fixed range of the convolution receptive field. In this paper, we propose novel Adaptive Graph Convolution Networks (AGC-net) to address this issue in GNNs. The AGC-net is built from the Adaptive Graph Convolution (AGC) based on a novel context attention mechanism, which consists of a set of graph wavelets with various learnable scales. The AGC transforms the spatial graph representations into time-sensitive features considering the temporal context. Moreover, a shifted graph convolution kernel is designed to enhance the AGC, which attempts to correct the deviations caused by inaccurate topology. Experimental results on two public traffic datasets demonstrate the effectiveness of the AGC-net, which outperforms other baseline models significantly. Code is available at https://github.com/zhengdaoli/AGC-net
Learning Theory of Distribution Regression with Neural Networks
results: Using a novel two-stage error decomposition technique, the paper derives almost optimal learning rates, up to logarithmic terms, for the proposed distribution regression model.
Abstract
In this paper, we aim to establish an approximation theory and a learning theory of distribution regression via a fully connected neural network (FNN). In contrast to classical regression methods, the input variables of distribution regression are probability measures. Consequently, we often need to perform a second-stage sampling process to approximate the actual information of the distribution. On the other hand, the classical neural network structure requires the input variable to be a vector. When the input samples are probability distributions, the traditional deep neural network method cannot be directly used, and this is where the difficulty arises for distribution regression. A well-defined neural network structure for distribution inputs is highly desirable. So far there is no mathematical model or theoretical analysis of the neural network realization of distribution regression. To overcome technical difficulties and address this issue, we establish a novel fully connected neural network framework to realize an approximation theory of functionals defined on the space of Borel probability measures. Furthermore, based on the established functional approximation results, in the hypothesis space induced by the novel FNN structure with distribution inputs, almost optimal learning rates for the proposed distribution regression model up to logarithmic terms are derived via a novel two-stage error decomposition technique.
Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning
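results: The PPO agent can predict the next achievement to be unlocked to some extent, and the achievement distillation method strengthens this predictive capability, reaching state-of-the-art performance with fewer model parameters in a sample-efficient regime.
Abstract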
Discovering achievements with a hierarchical structure on procedurally generated environments poses a significant challenge. This requires agents to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods are built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be beneficial for learning hierarchical achievements. However, these methods require an excessive amount of environment interactions or large model sizes, limiting their practicality. In this work, we identify that proximal policy optimization (PPO), a simple and versatile model-free algorithm, outperforms the prior methods with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, though with low confidence. Based on this observation, we propose a novel contrastive learning method, called achievement distillation, that strengthens the agent's capability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment using fewer model parameters in a sample-efficient regime.
Unpaired Multi-View Graph Clustering with Cross-View Structure Matching
paper_authors: Yi Wen, Siwei Wang, Qing Liao, Weixuan Liang, Ke Liang, Xinhang Wan, Xinwang Liu
for: addresses the data-unpaired problem (DUP) in multi-view literature by proposing a novel parameter-free graph clustering framework.
methods: utilizes the structural information from each view to refine cross-view correspondences, and is a unified framework for both fully and partially unpaired multi-view graph clustering.
results: extensive experiments demonstrate the effectiveness and generalization of the proposed framework for both paired and unpaired datasets.
Abstract
Multi-view clustering (MVC), which effectively fuses information from multiple views for better performance, has received increasing attention. Most existing MVC methods assume that multi-view data are fully paired, which means that the mappings of all corresponding samples between views are pre-defined or given in advance. However, the data correspondence is often incomplete in real-world applications due to data corruption or sensor differences, referred to as the data-unpaired problem (DUP) in multi-view literature. Although several attempts have been made to address the DUP issue, they suffer from the following drawbacks: 1) Most methods focus on the feature representation while ignoring the structural information of multi-view data, which is essential for clustering tasks; 2) Existing methods for partially unpaired problems rely on pre-given cross-view alignment information, resulting in their inability to handle fully unpaired problems; 3) Their inevitable parameters degrade the efficiency and applicability of the models. To tackle these issues, we propose a novel parameter-free graph clustering framework termed Unpaired Multi-view Graph Clustering framework with Cross-View Structure Matching (UPMGC-SM). Specifically, unlike the existing methods, UPMGC-SM effectively utilizes the structural information from each view to refine cross-view correspondences. Besides, our UPMGC-SM is a unified framework for both fully and partially unpaired multi-view graph clustering. Moreover, existing graph clustering methods can adopt our UPMGC-SM to enhance their ability for unpaired scenarios. Extensive experiments demonstrate the effectiveness and generalization of our proposed framework for both paired and unpaired datasets.
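results: In an empirical application using macroeconomic variables and price data, the method outperforms choosing the best portfolio model with hindsight.
Abstract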
A model among many may be best only under certain states of the world. Switching from one model to another can also be costly. Finding a procedure to dynamically choose a model in these circumstances requires solving a complex estimation problem and a dynamic programming problem. A reinforcement learning algorithm is used to approximate and estimate from the data the optimal solution to this dynamic programming problem. The algorithm is shown to consistently estimate the optimal policy, which may choose different models based on a set of covariates. A typical example is switching between different portfolio models under rebalancing costs, using macroeconomic information. Using a set of macroeconomic variables and price data, an empirical application to this portfolio problem shows superior performance to choosing the best portfolio model with hindsight.
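The dynamic programming structure behind model switching can be illustrated with a toy case in which per-period losses are already known. This is only a sketch of the underlying recursion, under that simplifying assumption. The paper's reinforcement learning algorithm instead estimates a policy from data and covariates, which is not shown here. The function name, the loss table, and the flat switching cost are hypothetical.

```python
def best_switching_path(losses, switch_cost):
    """losses[t][m] is the loss of model m in period t. Returns (total_cost, path)."""
    num_models = len(losses[0])
    cost = list(losses[0])  # best cost of ending period 0 on each model
    back = []               # back-pointers for path recovery
    for t in range(1, len(losses)):
        new_cost, pointers = [], []
        for m in range(num_models):
            # Either stay on model m, or pay switch_cost to arrive from another model.
            prev = min(range(num_models),
                       key=lambda k: cost[k] + (switch_cost if k != m else 0.0))
            pointers.append(prev)
            move = switch_cost if prev != m else 0.0
            new_cost.append(cost[prev] + move + losses[t][m])
        cost = new_cost
        back.append(pointers)
    m = min(range(num_models), key=lambda k: cost[k])
    total, path = cost[m], [m]
    for pointers in reversed(back):
        m = pointers[m]
        path.append(m)
    return total, path[::-1]


# Model 0 is better early, model 1 is better late.
toy_losses = [[1, 5], [1, 5], [5, 1], [5, 1]]
print(best_switching_path(toy_losses, switch_cost=2))
print(best_switching_path(toy_losses, switch_cost=10))
```

With a low switching cost the best path changes model halfway through. With a high cost it stays on one model throughout, which is why the switching cost has to be part of the decision problem.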
paper_authors: Jaemyung Lee, Kyeongtak Han, Jaehoon Kim, Hasun Yu, Youhan Lee
for: The paper aims to provide a unified research framework for protein folding, called Solvent, which supports various state-of-the-art models and enables consistent and fair comparisons among different approaches.
methods: Solvent is built with a modular design, allowing for different models to be easily integrated and trained on the same dataset. The framework includes implementations of several well-known algorithms and their components, and provides a variety of training and evaluation options.
results: The paper presents experiments using Solvent to benchmark well-known algorithms and their components, providing insights into the protein structure modeling field. The results demonstrate the potential of Solvent to increase the reliability and consistency of proposed models, as well as improve efficiency in both speed and costs.
Abstract
Consistency and reliability are crucial for conducting AI research. In many well-known research fields, such as object detection, methods have been compared and validated with solid benchmark frameworks. After AlphaFold2, the protein folding task has entered a new phase, and many methods have been proposed based on the components of AlphaFold2. This makes a unified research framework important for protein folding, with implementations and benchmarks that allow various approaches to be compared consistently and fairly. To achieve this, we present Solvent, a protein folding framework that supports significant components of state-of-the-art models in the manner of an off-the-shelf interface. Solvent contains different models implemented in a unified codebase and supports training and evaluation for defined models on the same dataset. We benchmark well-known algorithms and their components and provide experiments that give helpful insights into the protein structure modeling field. We hope that Solvent will increase the reliability and consistency of proposed models and improve efficiency in both speed and cost, accelerating protein folding modeling research. The code is available at https://github.com/kakaobrain/solvent, and the project will continue to be developed.
A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection
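results: The survey provides a comprehensive review and assessment of GNN4TS research, presents representative research works and applications, and discusses future research directions.
Abstract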
Time series are the primary data type used to record dynamic system measurements and are generated in great volume by both physical sensors and online processes (virtual sensors). Time series analytics is therefore crucial to unlocking the wealth of information implicit in available data. With the recent advancements in graph neural networks (GNNs), there has been a surge in GNN-based approaches for time series analysis. These approaches can explicitly model inter-temporal and inter-variable relationships, which traditional and other deep neural network-based methods struggle to do. In this survey, we provide a comprehensive review of graph neural networks for time series analysis (GNN4TS), encompassing four fundamental dimensions: forecasting, classification, anomaly detection, and imputation. Our aim is to guide designers and practitioners to understand, build applications, and advance research of GNN4TS. First, we provide a comprehensive task-oriented taxonomy of GNN4TS. Then, we present and discuss representative research works and introduce mainstream applications of GNN4TS. A comprehensive discussion of potential future research directions completes the survey. This survey, for the first time, brings together a vast array of knowledge on GNN-based time series research, highlighting foundations, practical applications, and opportunities of graph neural networks for time series analysis.
Differential Privacy for Clustering Under Continual Observation
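results: Proposes a method that combines a differentially private greedy approximation algorithm with dimension reduction, enabling efficient private clustering, and partially extends the results to the $k$-median problem.
Abstract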
We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertions and deletions of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically on the number $T$ of updates. The multiplicative error is almost the same as in the non-private setting. To do so, we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for $k$-means. We also partially extend our results to the $k$-median problem.
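One standard way to reduce dimension in a streaming setting is a random Gaussian projection in the style of Johnson and Lindenstrauss. The abstract does not say which reduction the authors use, so treating it as a random projection is an assumption here. The sketch also adds no privacy noise, and the function names are hypothetical.

```python
import math
import random


def make_projection(d, m, seed=0):
    """Sample an m x d Gaussian matrix scaled by 1/sqrt(m), fixed before any data arrives."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]


def project(matrix, point):
    """Map one d-dimensional point to m dimensions."""
    return [sum(w * x for w, x in zip(row, point)) for row in matrix]


projection = make_projection(d=50, m=8)
inserted_point = [1.0] * 50
print(len(project(projection, inserted_point)))
```

The matrix is drawn independently of the data, so each inserted or deleted point can be projected on its own as the update arrives. That property makes this kind of reduction compatible with continual observation.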
Merging-Diverging Hybrid Transformer Networks for Survival Prediction in Head and Neck Cancer
paper_authors: Mingyuan Meng, Lei Bi, Michael Fulham, Dagan Feng, Jinman Kim
for: This paper aims to improve survival prediction for cancer patients by developing a deep learning model that can effectively fuse multi-modality images (e.g., PET-CT) and extract region-specific information.
methods: The proposed method uses a merging-diverging learning framework, which consists of a merging encoder and a diverging decoder. The merging encoder uses a Hybrid Parallel Cross-Attention (HPCA) block to fuse multi-modality features, while the diverging decoder uses a Region-specific Attention Gate (RAG) block to screen out features related to lesion regions.
results: The proposed method (XSurv) outperforms state-of-the-art survival prediction methods on the public dataset of HECKTOR 2022. Specifically, XSurv combines the complementary information in PET and CT images and extracts region-specific prognostic information in PT and MLN regions, leading to improved survival prediction accuracy.
Abstract
Survival prediction is crucial for cancer patients as it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance for survival prediction. However, existing deep survival models are not well developed in utilizing multi-modality images (e.g., PET-CT) and in extracting region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions). In view of this, we propose a merging-diverging learning framework for survival prediction from multi-modality images. This framework has a merging encoder to fuse multi-modality information and a diverging decoder to extract region-specific information. In the merging encoder, we propose a Hybrid Parallel Cross-Attention (HPCA) block to effectively fuse multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, we propose a Region-specific Attention Gate (RAG) block to screen out the features related to lesion regions. Our framework is demonstrated on survival prediction from PET-CT images in Head and Neck (H&N) cancer, by designing an X-shape merging-diverging hybrid transformer network (named XSurv). Our XSurv combines the complementary information in PET and CT images and extracts the region-specific prognostic information in PT and MLN regions. Extensive experiments on the public dataset of HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR 2022) demonstrate that our XSurv outperforms state-of-the-art survival prediction methods.
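Summary
Survival prediction is important for cancer patients because it provides early prognostic information for treatment planning. Recently, deep survival models based on deep learning and medical images have shown promising performance. However, existing deep survival models do not make full use of multi-modality images (e.g., PET-CT) and do not fully extract region-specific information (e.g., the prognostic information in Primary Tumor (PT) and Metastatic Lymph Node (MLN) regions). To address this, we propose a merging-diverging learning framework for survival prediction. The framework has a merging encoder that fuses multi-modality information and a diverging decoder that extracts region-specific information. In the merging encoder, a Hybrid Parallel Cross-Attention (HPCA) block fuses multi-modality features via parallel convolutional layers and cross-attention transformers. In the diverging decoder, a Region-specific Attention Gate (RAG) block screens out the features related to lesion regions. The framework is applied to survival prediction from PET-CT images through an X-shape merging-diverging hybrid transformer network (named XSurv), which combines the complementary information in PET and CT images and extracts region-specific prognostic information in PT and MLN regions. Extensive experiments on the public HECKTOR 2022 dataset demonstrate its strong performance.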
Hyperspectral and Multispectral Image Fusion Using the Conditional Denoising Diffusion Probabilistic Model
results: Experiments on one indoor and two remote sensing datasets show that the method outperforms other advanced deep-learning-based fusion methods.
Abstract
Hyperspectral images (HSI) have a large amount of spectral information reflecting the characteristics of matter, while their spatial resolution is low due to the limitations of imaging technology. Complementary to this are multispectral images (MSI), e.g., RGB images, with high spatial resolution but insufficient spectral bands. Hyperspectral and multispectral image fusion is a technique for cost-effectively acquiring ideal images that have both high spatial and high spectral resolution. Many existing HSI and MSI fusion algorithms rely on known imaging degradation models, which are often not available in practice. In this paper, we propose a deep fusion method based on the conditional denoising diffusion probabilistic model, called DDPM-Fus. Specifically, DDPM-Fus contains a forward diffusion process, which gradually adds Gaussian noise to the high spatial resolution HSI (HrHSI), and a reverse denoising process, which learns to predict the desired HrHSI from its noisy version conditioned on the corresponding high spatial resolution MSI (HrMSI) and low spatial resolution HSI (LrHSI). Once training is complete, the proposed DDPM-Fus runs the reverse process on the test HrMSI and LrHSI to generate the fused HrHSI. Experiments conducted on one indoor and two remote sensing datasets show the superiority of the proposed model when compared with other advanced deep learning-based fusion methods. The code of this work will be open-sourced at this address: https://github.com/shuaikaishi/DDPMFus for reproducibility.
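Summary
Hyperspectral images (HSI) carry a large amount of spectral information that reflects the characteristics of matter, but their spatial resolution is limited by imaging technology. Multispectral images (MSI), such as RGB images, are complementary: they have high spatial resolution but lack spectral bands. HSI and MSI fusion is a cost-effective way to obtain ideal images with both high spatial and high spectral resolution. Most existing HSI and MSI fusion algorithms rely on known degradation models, which are often unavailable in practice. In this paper, we propose a deep fusion method based on the conditional denoising diffusion probabilistic model, called DDPM-Fus. Specifically, DDPM-Fus consists of a forward diffusion process that gradually adds Gaussian noise to the high spatial resolution HSI (HrHSI), and a reverse denoising process that learns to predict the desired HrHSI from its noisy version, conditioned on the HrMSI and LrHSI. Once training is complete, DDPM-Fus runs the reverse process on the test HrMSI and LrHSI to generate the fused HrHSI. Experiments on one indoor and two remote sensing datasets show that the method is superior to other advanced deep-learning-based fusion methods. The code will be open-sourced at https://github.com/shuaikaishi/DDPMFus for reproducibility.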
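The forward diffusion process described for DDPM-Fus, which gradually adds Gaussian noise to the HrHSI, can be sketched with the standard closed-form DDPM noising step $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The sketch below is not the authors' code: the linear beta schedule, the number of steps, and the toy cube size are assumptions, and the conditional reverse network is omitted entirely.

```python
import numpy as np


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 2e-2) -> np.ndarray:
    """Assumed noise schedule; the abstract does not state which one is used."""
    return np.linspace(beta_start, beta_end, num_steps)


def forward_diffuse(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Sample x_t ~ q(x_t | x_0) in one shot and return the noise that was added."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)        # cumulative signal retention per step

rng = np.random.default_rng(0)
hr_hsi = rng.random((31, 8, 8))            # toy stand-in for an HrHSI cube (bands, H, W)
t = 500
x_t, eps = forward_diffuse(hr_hsi, t, alpha_bar, rng)
```

During training, a denoiser would receive `x_t`, `t`, and the conditioning images (HrMSI and LrHSI) and be trained to recover either `eps` or the clean cube; which target DDPM-Fus uses is not stated in the abstract.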
QI2 – an Interactive Tool for Data Quality Assurance
results: The method successfully verifies data quality on small example data sets, and its application is demonstrated on the well-known MNIST data set.
Abstract
The importance of high data quality is increasing with the growing impact and distribution of ML systems and big data. In addition, the planned AI Act from the European Commission defines challenging legal requirements for data quality, especially for the market introduction of safety-relevant ML systems. In this paper we introduce a novel approach that supports the data quality assurance process for multiple data quality aspects. This approach enables the verification of quantitative data quality requirements. The concept and benefits are introduced and explained on small example data sets. How the method is applied is demonstrated on the well-known MNIST data set of handwritten digits.
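Summary
The importance of high data quality is growing with the spread and impact of machine learning systems and big data. The European Commission's planned AI Act will also impose strict data quality requirements, especially for bringing safety-relevant machine learning systems to market. This paper introduces a new approach that supports the quality assurance process across multiple data quality aspects and enables the verification of quantitative data quality requirements. The concept and benefits are explained on small example data sets, and the application of the method is demonstrated on the well-known MNIST data set.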
AdaptiveRec: Adaptively Construct Pairs for Contrastive Learning in Sequential Recommendation
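results: Experiments show that the proposed method improves recommendation performance compared with existing systems, and that it applies across a range of recommendation scenarios.
Abstract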
This paper presents a solution to the challenges faced by contrastive learning in sequential recommendation systems. In particular, it addresses the issue of false negatives, which limits the effectiveness of recommendation algorithms. By introducing an advanced approach to contrastive learning, the proposed method improves the quality of item embeddings and mitigates the problem of falsely categorizing similar instances as dissimilar. Experimental results demonstrate performance enhancements compared to existing systems. The flexibility and applicability of the proposed approach across various recommendation scenarios further highlight its value in enhancing sequential recommendation systems.
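Summary
This paper addresses the challenges that contrastive learning faces in sequential recommendation systems, in particular the false negative problem, which limits the effectiveness of recommendation algorithms. By introducing an advanced approach to contrastive learning, the proposed method improves the quality of item embeddings and avoids falsely categorizing similar instances as dissimilar. Experimental results show improvements over existing systems, and the method's applicability across different recommendation scenarios further underlines its value for sequential recommendation.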
Learning from Heterogeneity: A Dynamic Learning Framework for Hypergraphs
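results: In extensive tests on several popular datasets against eleven state-of-the-art models on node classification and link prediction, the framework shows a significant performance gain (on average 12.5% for node classification and 13.3% for link prediction), demonstrating its effectiveness.
Abstract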
Graph neural network (GNN) has gained increasing popularity in recent years owing to its capability and flexibility in modeling complex graph structure data. Among all graph learning methods, hypergraph learning is a technique for exploring the implicit higher-order correlations when training the embedding space of the graph. In this paper, we propose a hypergraph learning framework named LFH that is capable of dynamic hyperedge construction and attentive embedding update utilizing the heterogeneity attributes of the graph. Specifically, in our framework, the high-quality features are first generated by the pairwise fusion strategy that utilizes explicit graph structure information when generating initial node embedding. Afterwards, a hypergraph is constructed through the dynamic grouping of implicit hyperedges, followed by the type-specific hypergraph learning process. To evaluate the effectiveness of our proposed framework, we conduct comprehensive experiments on several popular datasets with eleven state-of-the-art models on both node classification and link prediction tasks, which fall into categories of homogeneous pairwise graph learning, heterogeneous pairwise graph learning, and hypergraph learning. The experiment results demonstrate a significant performance gain (average 12.5% in node classification and 13.3% in link prediction) compared with recent state-of-the-art methods.
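Summary
Graph neural networks (GNNs) have become increasingly popular in recent years thanks to their capability and flexibility in modeling complex graph-structured data. Among graph learning methods, hypergraph learning is a technique for exploring implicit higher-order correlations when training the embedding space of a graph. In this paper, we propose a hypergraph learning framework named LFH that supports dynamic hyperedge construction and attentive embedding updates using the heterogeneity attributes of the graph. Specifically, high-quality features are first generated by a pairwise fusion strategy when producing the initial node embeddings. A hypergraph is then constructed through the dynamic grouping of implicit hyperedges, followed by a type-specific hypergraph learning process. To evaluate the framework, we compare it with eleven state-of-the-art models on several popular datasets for both node classification and link prediction; the baselines fall into the categories of homogeneous pairwise graph learning, heterogeneous pairwise graph learning, and hypergraph learning. The results show a significant performance gain (on average 12.5% in node classification and 13.3% in link prediction) over recent state-of-the-art methods.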
Scalable High-Dimensional Multivariate Linear Regression for Feature-Distributed Data
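results: In simulation experiments, TSRGA converges quickly. The paper also applies TSRGA in a financial application that uses unstructured data from 10-K reports, demonstrating its usefulness in applications with many dense, large-dimensional matrices.
Abstract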
Feature-distributed data, that is, data partitioned by features and stored across multiple computing nodes, are increasingly common in applications with a large number of features. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The main advantage of TSRGA is that its communication complexity does not depend on the feature dimension, making it highly scalable to very large data sets. In addition, for multivariate response variables, TSRGA can be used to yield low-rank coefficient estimates. The fast convergence of TSRGA is validated by simulation experiments. Finally, we apply the proposed TSRGA in a financial application that leverages unstructured data from the 10-K reports, demonstrating its usefulness in applications with many dense large-dimensional matrices.
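Summary
Feature-distributed data, that is, data partitioned by features and stored across multiple computing nodes, are increasingly common in applications. This paper proposes a two-stage relaxed greedy algorithm (TSRGA) for applying multivariate linear regression to such data. The advantage of TSRGA is that its communication complexity does not grow with the feature dimension, so it scales to very large data sets. In addition, for multivariate response variables, TSRGA can yield low-rank coefficient estimates. The fast convergence of TSRGA is validated in experiments. Finally, we apply TSRGA in a financial application that uses unstructured data from 10-K reports, showing its usefulness in applications with many dense, large-dimensional matrices.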
A Self-Supervised Algorithm for Denoising Photoplethysmography Signals for Heart Rate Estimation from Wearables
results: The algorithm yields better heart rate estimates, and downstream analysis of various health metrics from PPG signals also improves noticeably.
Abstract
Smart watches and other wearable devices are equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from such devices are susceptible to corruption from noise and motion artifacts, which cause errors in heart rate estimation. Typical denoising approaches filter or reconstruct the signal in ways that eliminate much of the morphological information, even from the clean parts of the signal that would be useful to preserve. In this work, we develop an algorithm for denoising PPG signals that reconstructs the corrupted parts of the signal, while preserving the clean parts of the PPG signal. Our novel framework relies on self-supervised training, where we leverage a large database of clean PPG signals to train a denoising autoencoder. As we show, our reconstructed signals provide better estimates of heart rate from PPG signals than the leading heart rate estimation methods. Further experiments show significant improvement in Heart Rate Variability (HRV) estimation from PPG signals using our algorithm. We conclude that our algorithm denoises PPG signals in a way that can improve downstream analysis of many different health metrics from wearable devices.
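Summary
Smart watches and other wearable devices are commonly equipped with photoplethysmography (PPG) sensors for monitoring heart rate and other aspects of cardiovascular health. However, PPG signals collected from these devices are corrupted by noise and motion artifacts, which causes errors in heart rate estimation. Existing denoising approaches typically filter or reconstruct the signal in ways that remove much of the morphological information, including from the clean parts of the signal that would be worth preserving. In this work, we develop a denoising algorithm for PPG signals that reconstructs the corrupted parts of the signal while preserving the clean parts. Our framework relies on self-supervised training: we use a large database of clean PPG signals to train a denoising autoencoder. Our reconstructed signals give better heart rate estimates than leading heart rate estimation methods. Further experiments show a significant improvement in Heart Rate Variability (HRV) estimation from PPG signals. We conclude that our algorithm denoises PPG signals in a way that can improve the analysis of many health metrics from wearable devices.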
Goal-Conditioned Predictive Coding as an Implicit Planner for Offline Reinforcement Learning
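results: The study finds that sequence modeling has a significant impact on some decision-making tasks and leads to performant policies. GCPC learns a goal-conditioned latent representation of the future and achieves competitive performance on the AntMaze, FrankaKitchen, and Locomotion environments.
Abstract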
Recent work has demonstrated the effectiveness of formulating decision making as a supervised learning problem on offline-collected trajectories. However, the benefits of performing sequence modeling on trajectory data are not yet clear. In this work we investigate whether sequence modeling can condense trajectories into useful representations that contribute to policy learning. To achieve this, we adopt a two-stage framework that first summarizes trajectories with sequence modeling techniques, and then employs these representations to learn a policy along with a desired goal. This design allows many existing supervised offline RL methods to be considered as specific instances of our framework. Within this framework, we introduce Goal-Conditioned Predictive Coding (GCPC), an approach that brings powerful trajectory representations and leads to performant policies. We conduct extensive empirical evaluations on AntMaze, FrankaKitchen and Locomotion environments, and observe that sequence modeling has a significant impact on some decision making tasks. In addition, we demonstrate that GCPC learns a goal-conditioned latent representation about the future, which serves as an "implicit planner", and enables competitive performance on all three benchmarks.
Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs
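results: Through comprehensive and systematic experiments under various settings, the paper makes original observations and new findings that open new possibilities and suggest promising directions for using LLMs in learning on graphs.
Abstract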
Learning on Graphs has attracted immense attention due to its wide real-world applications. The most popular pipeline for learning on graphs with textual node attributes primarily relies on Graph Neural Networks (GNNs), and utilizes shallow text embedding as initial node representations, which has limitations in general knowledge and profound semantic understanding. In recent years, Large Language Models (LLMs) have been proven to possess extensive common knowledge and powerful semantic comprehension abilities that have revolutionized existing workflows to handle text data. In this paper, we aim to explore the potential of LLMs in graph machine learning, especially the node classification task, and investigate two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former leverages LLMs to enhance nodes' text attributes with their massive knowledge and then generate predictions through GNNs. The latter attempts to directly employ LLMs as standalone predictors. We conduct comprehensive and systematical studies on these two pipelines under various settings. From comprehensive empirical results, we make original observations and find new insights that open new possibilities and suggest promising directions to leverage LLMs for learning on graphs. Our codes and datasets are available at https://github.com/CurryTang/Graph-LLM.
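Summary
Learning on graphs has many applications and has attracted wide attention. The most popular pipeline for learning on graphs with textual node attributes relies on Graph Neural Networks (GNNs) and uses shallow text embeddings as initial node representations, which is limited in general knowledge and deep semantic understanding. In recent years, Large Language Models (LLMs) have been shown to have broad common knowledge and strong semantic comprehension, which has transformed how text data is handled. In this paper, we explore the potential of LLMs in graph machine learning, especially node classification, and study two possible pipelines: LLMs-as-Enhancers and LLMs-as-Predictors. The former uses LLMs to enhance nodes' text attributes and then generates predictions through GNNs. The latter tries to use LLMs directly as standalone predictors. We study both pipelines comprehensively and systematically under various settings, and from the empirical results we draw original observations and new findings that open new possibilities and suggest promising directions for using LLMs in learning on graphs. Our code and datasets are available at https://github.com/CurryTang/Graph-LLM.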
AI-UPV at EXIST 2023 – Sexism Characterization Using Large Language Models Under The Learning with Disagreements Regime
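results: The team participated in all three tasks of the EXIST Lab 2023. Under soft evaluation, it placed fourth in Task 2 and first in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79.
Abstract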
With the increasing influence of social media platforms, it has become crucial to develop automated systems capable of detecting instances of sexism and other disrespectful and hateful behaviors to promote a more inclusive and respectful online environment. Nevertheless, these tasks are considerably challenging considering different hate categories and the author's intentions, especially under the learning with disagreements regime. This paper describes the AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab at CLEF 2023. The proposed approach aims at addressing the task of sexism identification and characterization under the learning with disagreements paradigm by training directly from the data with disagreements, without using any aggregated label. Nevertheless, performance under both soft and hard evaluations is reported. The proposed system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification in English and Spanish. In particular, our system is articulated in three different pipelines. The ensemble approach outperformed the individual large language models, obtaining the best performance under both soft and hard label evaluation. This work describes the participation in all three EXIST tasks. Under soft evaluation, the system obtained fourth place in Task 2 and first place in Task 3, with the highest ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79. The source code of our approaches is publicly available at https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement.
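Summary
With the growing reach of social media platforms, it has become necessary to develop automated systems that detect sexism and other disrespectful and hateful behavior, in order to promote a more inclusive and respectful online environment. These tasks are challenging, however, given the different hate categories and author intentions, especially under the learning with disagreements regime. This paper describes the AI-UPV team's participation in the EXIST (sEXism Identification in Social neTworks) Lab. The proposed approach addresses sexism identification and characterization by training directly on the data with disagreements, without using any aggregated label. Performance is nevertheless reported under both soft and hard evaluation. The system uses large language models (i.e., mBERT and XLM-RoBERTa) and ensemble strategies for sexism identification and classification, and is organized into three different pipelines. The ensemble approach achieved the best performance under both soft and hard evaluation, placing fourth in Task 2 and first in Task 3 at EXIST, with an ICM-Soft of -2.32 and a normalized ICM-Soft of 0.79. The code is available at https://github.com/AngelFelipeMP/Sexism-LLM-Learning-With-Disagreement.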
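Training "directly from the data with disagreements, without using any aggregated label" can be illustrated by turning each item's annotator votes into a probability distribution and minimizing cross-entropy against that distribution. This is a sketch of the general idea only: the function names, the vote encoding, and the use of plain cross-entropy are assumptions, not the AI-UPV system.

```python
import numpy as np


def soft_labels(votes: list[list[int]], num_classes: int) -> np.ndarray:
    """Convert per-item annotator votes (class indices) into vote-share distributions."""
    counts = np.zeros((len(votes), num_classes))
    for i, item_votes in enumerate(votes):
        for v in item_votes:
            counts[i, v] += 1
    return counts / counts.sum(axis=1, keepdims=True)


def soft_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy between soft targets and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


# Two posts, three hypothetical annotators each; class 1 = "sexist", class 0 = "not sexist".
targets = soft_labels([[1, 1, 0], [0, 0, 0]], num_classes=2)
loss = soft_cross_entropy(np.zeros((2, 2)), targets)              # uniform predictions
```

With this target, a post on which annotators split 2 to 1 pulls the model toward a 2/3 probability rather than a hard 1, which is what a soft evaluation then rewards.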
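results: The study finds that a simple transformer with appropriate data formatting can quickly learn arithmetic operations under the next-token prediction objective, and that this approach simultaneously improves accuracy, sample complexity, and convergence speed.
Abstract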
Large language models like GPT-4 exhibit emergent capabilities across general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised, next-token prediction objective. This study investigates how small transformers, trained from random initialization, can efficiently learn arithmetic operations such as addition, multiplication, and elementary functions like square root, using the next-token prediction objective. We first demonstrate that conventional training data is not the most effective for arithmetic learning, and simple formatting changes can significantly improve accuracy. This leads to sharp phase transitions as a function of training data scale, which, in some cases, can be explained through connections to low-rank matrix completion. Building on prior work, we then train on chain-of-thought style data that includes intermediate step results. Even in the complete absence of pretraining, this approach significantly and simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between arithmetic and text data during training and examine the effects of few-shot prompting, pretraining, and model scale. Additionally, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that considers the particular characteristics of the next-word prediction objective for rapidly eliciting arithmetic capabilities.
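Summary
Large language models such as GPT-4 exhibit emergent capabilities on general-purpose tasks, such as basic arithmetic, when trained on extensive text data, even though these tasks are not explicitly encoded by the unsupervised next-token prediction objective. This study investigates how arithmetic operations, including addition, multiplication, and elementary functions such as square root, can be learned quickly with the next-token prediction objective. We first show that conventional training data is not the most effective for arithmetic learning, and that simple formatting changes can improve accuracy. This leads to sharp phase transitions as a function of training data scale, which in some cases can be explained through connections to low-rank matrix completion. We then train on chain-of-thought style data that includes intermediate results. Even in the complete absence of pretraining, this approach simultaneously improves accuracy, sample complexity, and convergence speed. We also study the interplay between text and arithmetic data during training, and the effects of few-shot prompting, pretraining, and model scale. In addition, we discuss length generalization challenges. Our work highlights the importance of high-quality, instructive data that accounts for the characteristics of the next-token prediction objective for rapidly eliciting arithmetic capabilities.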
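The abstract does not say which "simple formatting changes" help, so the following is only a hypothetical illustration of what such a change could look like for addition. It contrasts a plain sample with one whose answer digits are written least-significant first, on the assumption that this ordering suits left-to-right next-token generation because each output digit then depends only on digits and carries already seen.

```python
def plain_sample(a: int, b: int) -> str:
    """Conventional formatting: the answer is written most-significant digit first."""
    return f"{a}+{b}={a + b}"


def reversed_sample(a: int, b: int) -> str:
    """Hypothetical alternative: the answer digits are reversed (units digit first)."""
    return f"{a}+{b}={str(a + b)[::-1]}"


print(plain_sample(128, 367))     # 128+367=495
print(reversed_sample(128, 367))  # 128+367=594
```

A chain-of-thought style sample, as mentioned in the abstract, would additionally spell out intermediate results such as per-digit sums and carries before the final answer.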
On Formal Feature Attribution and Its Approximation
paper_authors: Jinqiang Yu, Alexey Ignatiev, Peter J. Stuckey
for: This paper focuses on the problem of feature attribution in machine learning models, specifically in the context of explainable artificial intelligence (XAI). It proposes a new approach called formal feature attribution (FFA) to address the limitations of existing methods.
methods: The paper uses formal methods to analyze and evaluate the feature attribution of machine learning models. It specifically employs formal explanation enumeration to compute the exact FFA, and proposes an efficient approximation technique to handle the practical complexity of the problem.
results: The paper provides experimental evidence of the effectiveness of the proposed approximate FFA method, comparing it to existing feature attribution algorithms in terms of feature importance and relative order. It demonstrates that FFA can provide more accurate and informative attributions than existing methods, while also being more efficient in practical settings.
Abstract
Recent years have witnessed the widespread use of artificial intelligence (AI) algorithms and machine learning (ML) models. Despite their tremendous success, a number of vital problems, such as ML model brittleness, fairness, and the lack of interpretability, warrant active development in explainable artificial intelligence (XAI) and formal ML model verification. The two major lines of work in XAI include feature selection methods, e.g. Anchors, and feature attribution techniques, e.g. LIME and SHAP. Despite their promise, most of the existing feature selection and attribution approaches are susceptible to a range of critical issues, including explanation unsoundness and out-of-distribution sampling. A recent formal approach to XAI (FXAI), although serving as an alternative to the above and free of these issues, suffers from a few other limitations. For instance, besides the scalability limitation, the formal approach is unable to tackle the feature attribution problem. Additionally, a formal explanation, despite being formally sound, is typically quite large, which hampers its applicability in practical settings. Motivated by the above, this paper proposes a way to apply the apparatus of formal XAI to the case of feature attribution based on formal explanation enumeration. Formal feature attribution (FFA) is argued to be advantageous over the existing methods, both formal and non-formal. Given the practical complexity of the problem, the paper then proposes an efficient technique for approximating exact FFA. Finally, it offers experimental evidence of the effectiveness of the proposed approximate FFA in comparison to the existing feature attribution algorithms, not only in terms of feature importance but also in terms of their relative order.
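Summary
In recent years, artificial intelligence (AI) algorithms and machine learning (ML) models have been widely used across many fields. Despite their success, important problems remain, such as ML model brittleness, fairness, and lack of interpretability. These problems motivate active development of explainable artificial intelligence (XAI) and formal ML model verification. The two major lines of work in XAI are feature selection methods, such as Anchors, and feature attribution techniques, such as LIME and SHAP. Despite their promise, existing feature selection and attribution approaches are susceptible to critical issues, including explanation unsoundness and out-of-distribution sampling. A recent formal approach to XAI is an alternative that is free of these issues, but it has other limitations: besides limited scalability, it cannot tackle the feature attribution problem. In addition, a formal explanation, although formally sound, is typically quite large, which hampers its use in practice. To address this, the paper proposes a way to apply formal XAI to feature attribution, namely formal feature attribution (FFA). FFA is argued to be advantageous over existing methods, both formal and non-formal. Given the practical complexity of the problem, the paper then proposes an efficient technique for approximating exact FFA. Finally, it offers experimental evidence of the effectiveness of the proposed approximate FFA in comparison with existing feature attribution algorithms, in terms of both feature importance and relative order.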
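The abstract says only that FFA is "based on formal explanation enumeration". One natural reading, assumed here rather than quoted from the paper, is that a feature's attribution is the fraction of enumerated formal explanations in which it appears. The sketch below computes that fraction from an already enumerated list; the enumeration itself, which is the expensive part, is out of scope, and the feature names are made up.

```python
from collections import Counter
from fractions import Fraction


def formal_feature_attribution(explanations: list[set[str]], features: list[str]) -> dict[str, Fraction]:
    """Assumed definition: share of enumerated explanations that contain each feature."""
    if not explanations:
        raise ValueError("at least one explanation is required")
    counts = Counter(f for expl in explanations for f in expl)
    return {f: Fraction(counts[f], len(explanations)) for f in features}


# Hypothetical enumeration result for one prediction.
explanations = [{"age", "income"}, {"age"}, {"income", "zip"}, {"age", "zip"}]
ffa = formal_feature_attribution(explanations, ["age", "income", "zip", "height"])
```

Under this reading, an approximation scheme would stop the enumeration early and compute the same fractions over the explanations found so far, which also yields a ranking of features by relative order.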
Mitigating Negative Transfer with Task Awareness for Sexism, Hate Speech, and Toxic Language Detection
paper_authors: Angel Felipe Magnossão de Paula, Paolo Rosso, Damiano Spina
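for: This paper aims to address the negative transfer problem in machine learning.
methods: It proposes an approach based on the task awareness concept to mitigate negative transfer.
results: The approach sets a new state of the art on the EXIST-2021 and HatEval-2019 benchmarks and improves performance over classic multi-task learning.
Abstract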
This paper proposes a novel approach to mitigate the negative transfer problem. In the field of machine learning, the common strategy is to apply the Single-Task Learning approach in order to train a supervised model to solve a specific task. Training a robust model requires a lot of data and a significant amount of computational resources, making this solution unfeasible in cases where data are unavailable or expensive to gather. Therefore another solution, based on the sharing of information between tasks, has been developed: Multi-Task Learning (MTL). Despite the recent developments regarding MTL, the problem of negative transfer has yet to be solved. Negative transfer is a phenomenon that occurs when noisy information is shared between tasks, resulting in a drop in performance. This paper proposes a new approach to mitigate the negative transfer problem based on the task awareness concept. The proposed approach reduces negative transfer and improves performance over the classic MTL solution. Moreover, the proposed approach has been implemented in two unified architectures to detect Sexism, Hate Speech, and Toxic Language in text comments. The proposed architectures set a new state of the art on both the EXIST-2021 and HatEval-2019 benchmarks.
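Summary
This paper proposes a new approach to the negative transfer problem in multi-task learning. In machine learning, Single-Task Learning is commonly used to train a supervised model for a specific task, but this requires a lot of data and computational resources. To overcome this limitation, Multi-Task Learning (MTL) was developed, which shares information between tasks. However, negative transfer occurs when noisy information is shared between tasks, which reduces performance. This paper proposes a new approach based on the task awareness concept to mitigate negative transfer, and sets a new state of the art on the EXIST-2021 and HatEval-2019 benchmarks.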
STG-MTL: Scalable Task Grouping for Multi-Task Learning Using Data Map
results: Experiments show that the method handles a large number of tasks (up to 100) effectively and can improve MTL performance.
Abstract
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because there is an exponential number of possible task groupings, which can make it difficult to choose the best one, and some groupings might produce performance degradation due to negative interference between tasks. Furthermore, existing solutions suffer severely from scalability issues, which limits their practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on hand-crafted features, specifically Data Maps, which capture the training behavior for each classification task during the MTL training. We experiment with the method and demonstrate its effectiveness, even on an unprecedented number of tasks (up to 100).
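Summary
Multi-Task Learning (MTL) is a powerful technique that improves performance over Single-Task Learning (STL), but it also poses many challenges. One major challenge is that the number of possible task groupings is exponential, which makes it difficult to choose the best grouping, and some groupings can degrade performance because of negative interference between tasks. In addition, existing solutions are limited by scalability, which restricts their practical use. In our paper, we propose a data-driven method based on hand-crafted features that addresses these challenges and provides a scalable and modular solution for classification task grouping. We experiment with the method and demonstrate its effectiveness, even on an unprecedented number of tasks (up to 100).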
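The abstract describes Data Maps only as features that "capture the training behavior for each classification task". A common formulation of data maps, assumed here and not confirmed by the abstract, tracks the probability assigned to the correct label across epochs and summarizes it by its mean (confidence) and standard deviation (variability). The sketch below computes those two statistics; how STG-MTL turns them into task groups is not shown.

```python
import numpy as np


def data_map_features(gold_probs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Summarize training dynamics per example.

    gold_probs: (num_epochs, num_examples) probability given to the true label
                after each epoch (hypothetical logging format).
    Returns (confidence, variability), each of shape (num_examples,).
    """
    confidence = gold_probs.mean(axis=0)
    variability = gold_probs.std(axis=0)
    return confidence, variability


# Three epochs, two examples: one learned gradually, one easy from the start.
gold_probs = np.array([[0.2, 0.9],
                       [0.6, 0.9],
                       [1.0, 0.9]])
confidence, variability = data_map_features(gold_probs)
```

Stacking such statistics for every task gives one fixed-length vector per task, which could then be fed to any clustering routine to propose groupings without training every candidate combination.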
Distilled Pruning: Using Synthetic Data to Win the Lottery
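results: Experiments show that using distilled data finds sparse, trainable subnetworks on CIFAR-10 up to 5x faster than Iterative Magnitude Pruning at comparable sparsity. The results highlight the potential of distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
Abstract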
This work introduces a novel approach to pruning deep learning models by using distilled data. Unlike conventional strategies which primarily focus on architectural or algorithmic optimization, our method reconsiders the role of data in these scenarios. Distilled datasets capture essential patterns from larger datasets, and we demonstrate how to leverage this capability to enable a computationally efficient pruning process. Our approach can find sparse, trainable subnetworks (a.k.a. Lottery Tickets) up to 5x faster than Iterative Magnitude Pruning at comparable sparsity on CIFAR-10. The experimental results highlight the potential of using distilled data for resource-efficient neural network pruning, model compression, and neural architecture search.
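For context on the baseline, the pruning step of Iterative Magnitude Pruning can be sketched as repeatedly removing the smallest-magnitude weights that are still alive. The sketch below shows only that masking step under assumed names; the training and weight-rewinding between rounds are omitted, and the abstract does not specify exactly where distilled data enters the loop.

```python
import numpy as np


def magnitude_prune(weights: np.ndarray, mask: np.ndarray, prune_fraction: float) -> np.ndarray:
    """Return a new mask with the smallest `prune_fraction` of surviving weights removed."""
    alive = np.abs(weights[mask])
    k = int(prune_fraction * alive.size)       # number of weights to drop this round
    if k == 0:
        return mask.copy()
    threshold = np.sort(alive)[k - 1]          # k-th smallest surviving magnitude
    return mask & (np.abs(weights) > threshold)


weights = np.array([0.1, -0.5, 0.3, -0.05, 0.9, 0.2, -0.7, 0.4, 0.01, -0.6])
mask = np.ones_like(weights, dtype=bool)
mask_round1 = magnitude_prune(weights, mask, 0.2)          # drops 0.01 and -0.05
mask_round2 = magnitude_prune(weights, mask_round1, 0.25)  # drops 0.1 and 0.2
```

In a full pipeline each round would retrain the masked network before pruning again; the reported speed-up concerns the cost of finding the subnetwork, which is dominated by that retraining.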
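methods: The proposed method is based on incremental learning and does not depend on specific models or federated settings. It uses new memories to overwrite old ones, imitating active forgetting in the human brain. Specifically, the model intended to unlearn serves as a student model that continuously learns from randomly initialized teacher models. To prevent catastrophic forgetting of non-target data, elastic weight consolidation is used to elastically constrain weight change.
results: Extensive experiments on three benchmark datasets show satisfying effectiveness and efficiency. Backdoor attack experiments further show that the method achieves satisfying completeness.
Abstract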
The increasing concerns regarding the privacy of machine learning models have catalyzed the exploration of machine unlearning, i.e., a process that removes the influence of training data on machine learning models. This concern also arises in the realm of federated learning, prompting researchers to address the federated unlearning problem. However, federated unlearning remains challenging. Existing unlearning methods can be broadly categorized into two approaches, i.e., exact unlearning and approximate unlearning. Firstly, implementing exact unlearning, which typically relies on the partition-aggregation framework, in a distributed manner does not improve time efficiency theoretically. Secondly, existing federated (approximate) unlearning methods suffer from imprecise data influence estimation, significant computational burden, or both. To this end, we propose a novel federated unlearning framework based on incremental learning, which is independent of specific models and federated settings. Our framework differs from existing federated unlearning methods that rely on approximate retraining or data influence estimation. Instead, we leverage new memories to overwrite old ones, imitating the process of active forgetting in neurology. Specifically, the model intended to unlearn serves as a student model that continuously learns from randomly initialized teacher models. To prevent catastrophic forgetting of non-target data, we utilize elastic weight consolidation to elastically constrain weight change. Extensive experiments on three benchmark datasets demonstrate the efficiency and effectiveness of our proposed method. The result of backdoor attacks demonstrates that our proposed method achieves satisfying completeness.
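Summary
Growing privacy concerns have driven interest in machine unlearning, which removes the influence of training data from a model, and the same problem arises in federated learning. Exact unlearning does not improve time efficiency in theory when implemented in a distributed manner, and existing approximate federated unlearning methods suffer from imprecise data influence estimation, heavy computation, or both. The proposed framework is based on incremental learning and is independent of specific models and federated settings: new memories overwrite old ones, imitating active forgetting, with the model to be unlearned acting as a student of randomly initialized teacher models. Elastic weight consolidation constrains weight changes to prevent catastrophic forgetting of non-target data. Experiments on three benchmark datasets show that the method is efficient and effective, and backdoor attack results show that it achieves satisfying completeness.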
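The elastic weight consolidation (EWC) step mentioned in the abstract can be illustrated with the standard quadratic EWC penalty. The sketch below is a minimal, generic version and not the authors' implementation: the function names, the diagonal Fisher estimate `fisher_diag`, and the way the penalty is added to a teacher-student loss are assumptions made for illustration.

```python
def ewc_penalty(weights, anchor_weights, fisher_diag, strength=1.0):
    """Standard EWC penalty: (strength / 2) * sum_i F_i * (w_i - w*_i) ** 2."""
    return 0.5 * strength * sum(
        f * (w - w_star) ** 2
        for w, w_star, f in zip(weights, anchor_weights, fisher_diag)
    )


def unlearning_objective(distill_loss, weights, anchor_weights, fisher_diag, strength=1.0):
    """Hypothetical total loss: teacher-student loss plus the EWC constraint."""
    return distill_loss + ewc_penalty(weights, anchor_weights, fisher_diag, strength)
```

Weights with a large Fisher value are pulled strongly toward their pre-unlearning values, while weights with a small Fisher value are free to move, which is the sense in which the constraint is elastic.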
Evaluating Biased Attitude Associations of Language Models in an Intersectional Context
paper_authors: Shiva Omrani Sabbaghi, Robert Wolfe, Aylin Caliskan
for: The paper aims to quantify the biases in language models using a sentence template that provides an intersectional context, and to study the associations of underrepresented groups in language.
methods: The paper uses a concept projection approach to capture the valence subspace through contextualized word embeddings of language models, and adapts the projection-based approach to embedding association tests to quantify bias.
results: The paper finds that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language, and that the largest and better-performing model is also more biased. The approach enables the study of complex intersectional biases and contributes to design justice by studying the associations of underrepresented groups in language.
Abstract
Language models are trained on large-scale corpora that embed implicit biases documented in psychology. Valence associations (pleasantness/unpleasantness) of social groups determine the biased attitudes towards groups and concepts in social cognition. Building on this established literature, we quantify how social groups are valenced in English language models using a sentence template that provides an intersectional context. We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. We present a concept projection approach to capture the valence subspace through contextualized word embeddings of language models. Adapting the projection-based approach to embedding association tests that quantify bias, we find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language. We find that the largest and better-performing model that we study is also more biased as it effectively captures bias embedded in sociocultural data. We validate the bias evaluation method by overperforming on an intrinsic valence evaluation task. The approach enables us to measure complex intersectional biases as they are known to manifest in the outputs and applications of language models that perpetuate historical biases. Moreover, our approach contributes to design justice as it studies the associations of groups underrepresented in language such as transgender and homosexual individuals.
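Summary
Language models trained on large-scale corpora embed implicit biases, and the valence (pleasantness/unpleasantness) associated with social groups determines biased attitudes toward groups and concepts in social cognition. Building on the existing literature, the authors use a sentence template that provides an intersectional context to measure how social groups are valenced, studying age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight. A projection approach captures the valence subspace through contextualized word embeddings. Language models show the most biased attitudes against gender identity, social class, and sexual orientation signals, and the largest, best-performing model is also more biased because it effectively captures bias embedded in sociocultural data. The evaluation method is validated and can measure complex intersectional biases that persist in the outputs and applications of language models. It also contributes to design justice by studying groups underrepresented in language, such as transgender and homosexual individuals.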
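A projection onto a valence direction, as described in the abstract, can be sketched as follows. This is only one plausible reading: defining the direction as the normalized difference between the mean embeddings of pleasant and unpleasant words is an assumption, and all function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def valence_direction(pleasant, unpleasant):
    """Unit vector pointing from the unpleasant mean to the pleasant mean."""
    diff = [p - u for p, u in zip(mean_vector(pleasant), mean_vector(unpleasant))]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff]


def valence_score(embedding, direction):
    """Scalar projection of an embedding onto the valence direction."""
    return sum(e * d for e, d in zip(embedding, direction))


# Toy embeddings: valence lives entirely in the first coordinate.
direction = valence_direction([[1.0, 0.0], [1.0, 0.0]], [[-1.0, 0.0], [-1.0, 0.0]])
score = valence_score([0.5, 3.0], direction)
```

A positive score places the embedding on the pleasant side of the direction and a negative score on the unpleasant side.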
CSCLog: A Component Subsequence Correlation-Aware Log Anomaly Detection Method
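results: Experiments on four publicly available log datasets show that CSCLog is effective for anomaly detection, outperforming the best baseline by an average of 7.41% in Macro F1-Measure.
Abstract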
Anomaly detection based on system logs plays an important role in intelligent operations, which is a challenging task due to the extremely complex log patterns. Existing methods detect anomalies by capturing the sequential dependencies in log sequences, which ignore the interactions of subsequences. To this end, we propose CSCLog, a Component Subsequence Correlation-Aware Log anomaly detection method, which not only captures the sequential dependencies in subsequences, but also models the implicit correlations of subsequences. Specifically, subsequences are extracted from log sequences based on components and the sequential dependencies in subsequences are captured by Long Short-Term Memory Networks (LSTMs). An implicit correlation encoder is introduced to model the implicit correlations of subsequences adaptively. In addition, Graph Convolution Networks (GCNs) are employed to accomplish the information interactions of subsequences. Finally, attention mechanisms are exploited to fuse the embeddings of all subsequences. Extensive experiments on four publicly available log datasets demonstrate the effectiveness of CSCLog, outperforming the best baseline by an average of 7.41% in Macro F1-Measure.
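Summary
Anomaly detection based on system logs is an important task in intelligent operations, and it is challenging because log patterns are extremely complex. Existing methods capture the sequential dependencies in log sequences but ignore the interactions between subsequences. CSCLog captures the sequential dependencies within subsequences and also models the implicit correlations between them. Subsequences are extracted from log sequences based on components, and their sequential dependencies are captured with Long Short-Term Memory Networks (LSTMs). An adaptive implicit correlation encoder models the correlations between subsequences, Graph Convolution Networks (GCNs) carry out the information interactions, and attention mechanisms fuse the embeddings of all subsequences. Experiments on four publicly available log datasets show that CSCLog outperforms the best baseline by an average of 7.41% in Macro F1-Measure.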
Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms
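results: By analyzing the stability and generalization of SCO algorithms, the paper clarifies how these algorithms behave on future test examples. It also derives dimension-independent excess risk bounds by trading off stability and optimization error, which the authors state are the first known results of this kind.
Abstract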
Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem such as reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant number of studies have been devoted to the convergence behavior of SCO algorithms, there is little work on understanding their generalization, i.e., how these learning algorithms built from training examples would behave on future test examples. In this paper, we provide the stability and generalization analysis of stochastic compositional gradient descent algorithms through the lens of algorithmic stability in the framework of statistical learning theory. Firstly, we introduce a stability concept called compositional uniform stability and establish its quantitative relation with generalization for SCO problems. Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC. Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors. To the best of our knowledge, these are the first known results on stability and generalization analysis of stochastic compositional gradient descent algorithms.
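Summary
Many machine learning tasks, such as reinforcement learning, AUC maximization, and meta-learning, can be formulated as stochastic compositional optimization (SCO) problems whose objective involves a nested composition. Many studies address the convergence of SCO algorithms, but few address how these algorithms behave on future test examples. This paper analyzes the stability and generalization of stochastic compositional gradient descent algorithms within the framework of statistical learning theory. It introduces a stability concept called compositional uniform stability and establishes its quantitative relation with generalization, then proves compositional uniform stability results for two popular algorithms, SCGD and SCSC. Finally, it derives dimension-independent excess risk bounds by trading off the stability results against the optimization errors. To the authors' knowledge, these are the first results on the stability and generalization of stochastic compositional gradient descent algorithms.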
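The nested structure of an SCO problem, minimizing f(E[g(x; xi)]), can be made concrete with a scalar sketch of the basic SCGD update, which tracks the inner expectation with a moving average. The toy problem (g(x; xi) = x + xi, f(y) = y^2 / 2), the constant step sizes `eta` and `beta`, and all names are assumptions chosen for illustration; the analysis in the paper is not reproduced here.

```python
import random


def scgd(x0, sample_g, grad_g, grad_f, eta=0.05, beta=0.2, steps=500, seed=0):
    """Scalar SCGD sketch: y tracks E[g(x; xi)], x follows the chained gradient."""
    rng = random.Random(seed)
    x = x0
    y = sample_g(x, 0.0)  # initialise the tracker with a noise-free inner value
    for _ in range(steps):
        xi = rng.uniform(-0.1, 0.1)                  # zero-mean sampling noise
        y = (1 - beta) * y + beta * sample_g(x, xi)  # moving-average inner estimate
        x = x - eta * grad_g(x, xi) * grad_f(y)      # chain rule using the tracked y
    return x


# Toy instance: g(x; xi) = x + xi and f(y) = y ** 2 / 2, minimised at x = 0.
x_final = scgd(
    5.0,
    sample_g=lambda x, xi: x + xi,
    grad_g=lambda x, xi: 1.0,
    grad_f=lambda y: y,
)
```

The auxiliary variable `y` is what distinguishes compositional methods from plain SGD: a single noisy sample of g plugged directly into the gradient of f would generally give a biased gradient estimate.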
Federated Learning over a Wireless Network: Distributed User Selection through Random Access
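results: By manipulating the contention window size, certain users are prioritized for obtaining radio resources in each training round, which achieves distributed user selection. Training data bias is used as the target scenario for FL with user selection, and a counting mechanism ensures fairness. Simulations with various datasets show that the method can rapidly achieve convergence similar to that of centralized user selection.
Abstract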
User selection has become crucial for decreasing the communication costs of federated learning (FL) over wireless networks. However, centralized user selection causes additional system complexity. This study proposes a network intrinsic approach of distributed user selection that leverages the radio resource competition mechanism in random access. Taking the carrier sensing multiple access (CSMA) mechanism as an example of random access, we manipulate the contention window (CW) size to prioritize certain users for obtaining radio resources in each round of training. Training data bias is used as a target scenario for FL with user selection. Prioritization is based on the distance between the newly trained local model and the global model of the previous round. To avoid excessive contribution by certain users, a counting mechanism is used to ensure fairness. Simulations with various datasets demonstrate that this method can rapidly achieve convergence similar to that of the centralized user selection approach.
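Summary
User selection has become crucial for reducing the communication costs of federated learning (FL) over wireless networks, but centralized user selection adds system complexity. This study proposes a network-intrinsic, distributed user selection method that exploits the radio resource competition mechanism of random access. Taking carrier sensing multiple access (CSMA) as an example, the contention window (CW) size is manipulated in each training round to prioritize certain users for radio resources. In the training data bias scenario, users are prioritized by the distance between the newly trained local model and the global model of the previous round. A counting mechanism keeps the selection fair by preventing excessive contribution from particular users. Simulations with various datasets show that the method rapidly achieves convergence similar to that of centralized user selection.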
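The contention-window mechanism can be sketched as a mapping from model distance to CW size. The abstract does not give the mapping, so everything below is hypothetical: the linear interpolation, the choice that a larger distance yields a smaller window (higher priority), the parameter values, and the rule that a user who has already won `max_wins` rounds falls back to the largest window.

```python
import math


def model_distance(local_weights, global_weights):
    """Euclidean distance between a local model and the previous global model."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(local_weights, global_weights)))


def contention_window(distance, max_distance, wins, cw_min=16, cw_max=1024, max_wins=3):
    """Assumed rule: larger distance -> smaller CW; frequent winners get cw_max."""
    if wins >= max_wins:  # counting mechanism for fairness
        return cw_max
    ratio = min(distance / max_distance, 1.0) if max_distance > 0 else 0.0
    return int(round(cw_max - ratio * (cw_max - cw_min)))


def draw_backoff(cw, rng):
    """CSMA-style backoff: a uniform slot count in [0, cw)."""
    return rng.randrange(cw)
```

A user with a small window tends to draw a short backoff and therefore wins the channel more often, so selection emerges from the access protocol itself rather than from a central scheduler.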
Distilling Universal and Joint Knowledge for Cross-Domain Model Compression on Time Series Data
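results: Experimental results on four time series datasets show that the method outperforms state-of-the-art (SOTA) benchmarks and achieves model compression and adaptation under cross-domain shift.
Abstract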
For many real-world time series tasks, the computational complexity of prevalent deep learning models often hinders deployment in resource-limited environments (e.g., smartphones). Moreover, due to the inevitable domain shift between model training (source) and deploying (target) stages, compressing those deep models under cross-domain scenarios becomes more challenging. Although some existing works have already explored cross-domain knowledge distillation for model compression, they are either biased to source data or heavily tangled between source and target data. To this end, we design a novel end-to-end framework called Universal and joint knowledge distillation (UNI-KD) for cross-domain model compression. In particular, we propose to transfer both the universal feature-level knowledge across source and target domains and the joint logit-level knowledge shared by both domains from the teacher to the student model via an adversarial learning scheme. More specifically, a feature-domain discriminator is employed to align teacher's and student's representations for universal knowledge transfer. A data-domain discriminator is utilized to prioritize the domain-shared samples for joint knowledge transfer. Extensive experimental results on four time series datasets demonstrate the superiority of our proposed method over state-of-the-art (SOTA) benchmarks.
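Summary
For many real-world time series tasks, the computational complexity of deep learning models hinders deployment in resource-limited environments such as smartphones. The domain shift between the source and target domains makes compressing these models in cross-domain scenarios even harder. Existing work on cross-domain knowledge distillation is either biased toward source data or heavily tangled between source and target data. The authors design an end-to-end framework, Universal and joint knowledge distillation (UNI-KD), for cross-domain model compression. It transfers universal feature-level knowledge and joint logit-level knowledge from the teacher to the student through an adversarial learning scheme: a feature-domain discriminator aligns the teacher's and student's representations, and a data-domain discriminator prioritizes domain-shared samples for joint knowledge transfer. Extensive experiments on four time series datasets show that the method outperforms state-of-the-art (SOTA) benchmarks.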
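Logit-level knowledge transfer is commonly implemented as a temperature-softened KL divergence between teacher and student outputs. The sketch below shows only that generic distillation term; it does not reproduce the adversarial scheme or the two discriminators of UNI-KD, and the function names and default temperature are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl
```

In a sample-weighted variant, a per-sample weight (for example from a data-domain discriminator) would multiply this loss before averaging over a batch; that weighting is not shown here.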
Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat
paper_authors: Shantanu Ghosh, Ke Yu, Forough Arabshahi, Kayhan Batmanghelich
for: This paper aims to blur the distinction between post hoc explanation of a Blackbox and constructing interpretable models.
methods: The proposed method begins with a Blackbox, iteratively carves out a mixture of interpretable experts (MoIE) and a residual network, and uses First Order Logic (FOL) to provide basic reasoning on concepts from the Blackbox.
results: Extensive experiments show that the proposed approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising performance, (2) identifies the relatively “harder” samples to explain via residuals, (3) outperforms the interpretable by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The MoIE code is available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat
Abstract
ML model design either starts with an interpretable model or a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible and underperforming than their Blackbox variants. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively carve out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our route, interpret, and repeat approach (1) identifies a diverse set of instance-specific concepts with high concept completeness via MoIE without compromising in performance, (2) identifies the relatively “harder” samples to explain via residuals, (3) outperforms the interpretable by-design models by significant margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The code for MoIE is publicly available at: https://github.com/batmanlab/ICML-2023-Route-interpret-repeat
Summary
Model design either starts with an interpretable model or starts with a Blackbox and explains it post hoc. Blackbox models are flexible but hard to explain, while interpretable models are inherently explainable but require extensive ML knowledge and tend to be less flexible and to underperform. This paper aims to blur the distinction between post hoc explanation of a Blackbox and the construction of interpretable models. Starting from a Blackbox, each iteration carves out a mixture of interpretable experts (MoIE) and a residual network. Each interpretable model specializes in a subset of samples and explains them with First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox, and the remaining samples are routed through a flexible residual. The method is repeated on the residual network until the interpretable models explain the desired proportion of the data. Extensive experiments show that the approach (1) identifies instance-specific concepts with high concept completeness via MoIE without sacrificing performance, (2) identifies the harder-to-explain samples via residuals, (3) outperforms interpretable by-design models by large margins during test-time interventions, and (4) fixes the shortcut learned by the original Blackbox. The MoIE code is publicly available.
Personalized Prediction of Recurrent Stress Events Using Self-Supervised Learning on Multimodal Time-Series Data
results: Tested on the Wearable Stress and Affect Detection (WESAD) dataset, the SSL models outperform non-SSL models while using less than 5% of the annotations, which suggests that the approach can personalize stress prediction.
Abstract
Chronic stress can significantly affect physical and mental health. The advent of wearable technology allows for the tracking of physiological signals, potentially leading to innovative stress prediction and intervention methods. However, challenges such as label scarcity and data heterogeneity render stress prediction difficult in practice. To counter these issues, we have developed a multimodal personalized stress prediction system using wearable biosignal data. We employ self-supervised learning (SSL) to pre-train the models on each subject's data, allowing the models to learn the baseline dynamics of the participant's biosignals prior to fine-tuning the stress prediction task. We test our model on the Wearable Stress and Affect Detection (WESAD) dataset, demonstrating that our SSL models outperform non-SSL models while utilizing less than 5% of the annotations. These results suggest that our approach can personalize stress prediction to each user with minimal annotations. This paradigm has the potential to enable personalized prediction of a variety of recurring health events using complex multimodal data streams.
Variational quantum regression algorithm with encoded data structure
paper_authors: C. -C. Joseph Wang, Ryan S. Bennink
for: solves practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers.
methods: constructs a quantum regression algorithm with model interpretability, employs a circuit that directly encodes the data in quantum amplitudes, and uses compressed encoding and digital-analog gate operation to reduce the run time complexity.
results: achieves a logarithmic reduction in the number of physical qubits needed compared to traditional one-hot-encoding techniques, and demonstrates the effectiveness of the algorithm for linear and nonlinear regression with ensemble model training and important feature selection.
Abstract
Variational quantum algorithms (VQAs) are prevalent for solving practical problems such as combinatorial optimization, quantum chemistry simulation, quantum machine learning, and quantum error correction on noisy quantum computers. For variational quantum machine learning, a variational algorithm with model interpretability built into the algorithm is yet to be exploited. In this paper, we construct a quantum regression algorithm and identify the direct relation of variational parameters to learned regression coefficients, while employing a circuit that directly encodes the data in quantum amplitudes reflecting the structure of the classical data table. The algorithm is particularly suitable for well-connected qubits. With compressed encoding and digital-analog gate operation, the run time complexity is logarithmically more advantageous than that for digital 2-local gate native hardware with the number of data entries encoded, a decent improvement in noisy intermediate-scale quantum computers and a minor improvement for large-scale quantum computing. Our suggested method of compressed binary encoding offers a remarkable reduction in the number of physical qubits needed when compared to the traditional one-hot-encoding technique with the same input data. The algorithm inherently performs linear regression but can also be used easily for nonlinear regression by building nonlinear features into the training data. The measured cost function, which distinguishes a good model from a poor one during model training, is effective only when the number of features is much less than the number of records, so that the encoded data structure is observable. To echo this finding and mitigate hardware noise in practice, the ensemble model training from the quantum regression model learning with important feature selection from regularization is incorporated and illustrated numerically.
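The qubit saving claimed for compressed binary encoding can be illustrated with simple counting. The sketch assumes that one-hot encoding uses one qubit per data entry and that binary encoding indexes the entries with the smallest sufficient number of qubits; the abstract does not spell out the exact counts, so the two helper functions are illustrative only.

```python
def qubits_one_hot(num_entries):
    """Assumed one-hot cost: one qubit per encoded entry."""
    return num_entries


def qubits_binary(num_entries):
    """Assumed binary cost: smallest k with 2 ** k >= num_entries (at least 1)."""
    return max(1, (num_entries - 1).bit_length())


for n in (8, 1000, 1024, 1025):
    print(n, qubits_one_hot(n), qubits_binary(n))
```

Under these assumptions the binary count grows with the logarithm of the number of entries, which matches the logarithmic reduction stated in the results note.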
ACDNet: Attention-guided Collaborative Decision Network for Effective Medication Recommendation
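results: Experiments on two large medical datasets, MIMIC-III and MIMIC-IV, show that the method clearly improves on previous models in Jaccard, PR-AUC, and F1 score. Ablation experiments and a case study confirm the contribution of each module to overall performance.
Abstract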
Medication recommendation using Electronic Health Records (EHR) is challenging due to complex medical data. Current approaches extract longitudinal information from patient EHR to personalize recommendations. However, existing models often lack sufficient patient representation and overlook the importance of considering the similarity between a patient's medication records and specific medicines. Therefore, an Attention-guided Collaborative Decision Network (ACDNet) for medication recommendation is proposed in this paper. Specifically, ACDNet utilizes attention mechanism and Transformer to effectively capture patient health conditions and medication records by modeling their historical visits at both global and local levels. ACDNet also employs a collaborative decision framework, utilizing the similarity between medication records and medicine representation to facilitate the recommendation process. The experimental results on two extensive medical datasets, MIMIC-III and MIMIC-IV, clearly demonstrate that ACDNet outperforms state-of-the-art models in terms of Jaccard, PR-AUC, and F1 score, reaffirming its superiority. Moreover, the ablation experiments provide solid evidence of the effectiveness of each module in ACDNet, validating their contribution to the overall performance. Furthermore, a detailed case study reinforces the effectiveness of ACDNet in medication recommendation based on EHR data, showcasing its practical value in real-world healthcare scenarios.
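Summary
Medication recommendation from Electronic Health Records (EHR) is challenging because the medical data are complex. Existing methods extract longitudinal information from patient EHR to personalize recommendations, but they often lack sufficient patient representation and overlook the similarity between a patient's medication records and specific medicines. This paper proposes an Attention-guided Collaborative Decision Network (ACDNet) for medication recommendation. ACDNet uses an attention mechanism and Transformer to capture patient health conditions and medication records by modeling historical visits at both global and local levels. It also uses a collaborative decision framework that exploits the similarity between medication records and medicine representations. On two large medical datasets, MIMIC-III and MIMIC-IV, ACDNet outperforms existing models in Jaccard, PR-AUC, and F1 score. Ablation experiments confirm the contribution of each module, and a detailed case study shows its practical value for EHR-based medication recommendation.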
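Of the three reported metrics, the Jaccard score is the simplest to state: it is the size of the intersection of the recommended and prescribed medication sets divided by the size of their union. The sketch below computes it for a single visit; the medication codes are made up, and returning 1.0 for two empty sets is a convention chosen here, not something taken from the paper.

```python
def jaccard(predicted, actual):
    """Jaccard similarity between recommended and actually prescribed medications."""
    predicted_set, actual_set = set(predicted), set(actual)
    if not predicted_set and not actual_set:
        return 1.0  # convention for two empty sets (assumption)
    return len(predicted_set & actual_set) / len(predicted_set | actual_set)


# Hypothetical medication codes for one visit.
visit_score = jaccard(["A01", "B02", "C03"], ["B02", "C03", "D04"])
```

A dataset-level score would typically average this per-visit value over all evaluated visits.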
Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays
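results: Transferring the encoder architecture and weights to a new network and training on a small amount of labeled data lets the new network perform bandwidth regression on digital antenna array data better than an equivalent network trained from random initialization.
Abstract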
This work presents the first applications of self-supervised learning applied to data from digital antenna arrays. Encoder-decoder networks are pretrained on digital array data to perform a self-supervised noisy-reconstruction task called channel in-painting, in which the network infers the contents of array data that has been masked with zeros. The self-supervised step requires no human-labeled data. The encoder architecture and weights from pretraining are then transferred to a new network with a task-specific decoder, and the new network is trained on a small volume of labeled data. We show that pretraining on the unlabeled data allows the new network to perform the task of bandwidth regression on the digital array data better than an equivalent network that is trained on the same labeled data from random initialization.
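The channel in-painting pretext task described above can be sketched as zero-masking a random subset of array channels and scoring the reconstruction. The data layout (a list of channels, each a list of samples), the function names, and the choice to score only the masked channels are assumptions; the abstract does not specify the masking ratio or the loss.

```python
import random


def mask_channels(array, num_masked, rng):
    """Zero out `num_masked` randomly chosen channels; return the copy and indices."""
    masked_idx = sorted(rng.sample(range(len(array)), num_masked))
    masked = [
        [0.0] * len(channel) if i in masked_idx else list(channel)
        for i, channel in enumerate(array)
    ]
    return masked, masked_idx


def masked_mse(prediction, target, masked_idx):
    """Mean squared error restricted to the masked channels (assumed loss)."""
    errors = [
        (p - t) ** 2
        for i in masked_idx
        for p, t in zip(prediction[i], target[i])
    ]
    return sum(errors) / len(errors)
```

During pretraining, an encoder-decoder network would receive the masked array as input and the original array as the target, so no human labels are required.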
Machine Learning to detect cyber-attacks and discriminating the types of power system disturbances
results: The Random Forest model achieves a detection accuracy of 90.56% and can help operators make decisions.
Abstract
This research proposes a machine learning-based attack detection model for power systems, specifically targeting smart grids. By utilizing data and logs collected from Phasor Measuring Devices (PMUs), the model aims to learn system behaviors and effectively identify potential security boundaries. The proposed approach involves crucial stages including dataset pre-processing, feature selection, model creation, and evaluation. To validate our approach, we used a dataset consisting of 15 separate datasets obtained from different PMUs, relay snort alarms and logs. Three machine learning models, Random Forest, Logistic Regression, and K-Nearest Neighbour, were built and evaluated using various performance metrics. The findings indicate that the Random Forest model achieves the highest performance, with an accuracy of 90.56% in detecting power system disturbances, and has the potential to assist operators in decision-making processes.
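Summary
This research proposes a machine learning-based attack detection model for power systems, in particular smart grids. Using data and logs collected from Phasor Measuring Devices (PMUs), the model learns system behaviors and identifies potential security boundaries. The approach includes dataset pre-processing, feature selection, model creation, and evaluation. It is validated on a dataset made up of 15 separate datasets obtained from different PMUs, relay snort alarms and logs. Three machine learning models, Random Forest, Logistic Regression, and K-Nearest Neighbour, are built and evaluated with several performance metrics. The Random Forest model reaches an accuracy of 90.56% in detecting power system disturbances and can assist operators in decision-making.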
Assisting Clinical Decisions for Scarcely Available Treatment via Disentangled Latent Representation
paper_authors: Bing Xue, Ahmed Sameh Said, Ziqi Xu, Hanyang Liu, Neel Shah, Hanqing Yang, Philip Payne, Chenyang Lu
for: This paper aims to support clinical decisions for COVID-19 patients who require extracorporeal membrane oxygenation (ECMO) treatment.
methods: The paper proposes a novel approach called Treatment Variational AutoEncoder (TVAE) to predict individualized treatment outcomes for COVID-19 patients. TVAE uses a deep latent variable model to represent patients’ potential treatment assignments and factual/counterfactual outcomes, and alleviates prediction errors through a reconstruction regularization scheme and semi-supervision.
results: The paper evaluates TVAE on two real-world COVID-19 datasets and shows that it outperforms state-of-the-art treatment effect models in predicting propensity scores and factual outcomes on heterogeneous datasets. Additionally, TVAE outperforms existing models in individual treatment effect estimation on a synthesized dataset.Abstract
Extracorporeal membrane oxygenation (ECMO) is an essential life-supporting modality for COVID-19 patients who are refractory to conventional therapies. However, the proper treatment decision has been the subject of significant debate, and it remains controversial who benefits from this scarcely available and technically complex treatment option. To support clinical decisions, there is a critical need to predict the treatment need and the potential treatment and no-treatment responses. Targeting this clinical challenge, we propose Treatment Variational AutoEncoder (TVAE), a novel approach for individualized treatment analysis. TVAE is specifically designed to address the modeling challenges of treatments like ECMO, with strong treatment selection bias and scarce treatment cases. TVAE conceptualizes the treatment decision as a multi-scale problem. We model a patient's potential treatment assignment and the factual and counterfactual outcomes as part of their intrinsic characteristics that can be represented by a deep latent variable model. The factual and counterfactual prediction errors are alleviated via a reconstruction regularization scheme together with semi-supervision, and the selection bias and the scarcity of treatment cases are mitigated by the disentangled and distribution-matched latent space and the label-balancing generative strategy. We evaluate TVAE on two real-world COVID-19 datasets: an international dataset collected from 1651 hospitals across 63 countries, and an institutional dataset collected from 15 hospitals. The results show that TVAE outperforms state-of-the-art treatment effect models in predicting both the propensity scores and factual outcomes on heterogeneous COVID-19 datasets. Additional experiments also show TVAE outperforms the best existing models in individual treatment effect estimation on the synthesized IHDP benchmark dataset.
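Summary
Extracorporeal membrane oxygenation (ECMO) is a life-supporting modality for COVID-19 patients who do not respond to conventional therapies. The right treatment decision remains disputed, and it is unclear which patients benefit from this scarce and technically complex option. Supporting clinical decisions requires predicting the treatment need and the likely responses with and without treatment. The authors propose Treatment Variational AutoEncoder (TVAE) for individualized treatment analysis, designed for the strong selection bias and scarce treatment cases that characterize ECMO. A patient's potential treatment assignment and the factual and counterfactual outcomes are treated as intrinsic characteristics represented by a deep latent variable model. Factual and counterfactual prediction errors are reduced through reconstruction regularization and semi-supervision, while selection bias and the scarcity of treatment cases are mitigated by a disentangled, distribution-matched latent space and a label-balancing generative strategy. TVAE is evaluated on two real-world COVID-19 datasets: an international dataset from 1651 hospitals across 63 countries and an institutional dataset from 15 hospitals. It outperforms state-of-the-art treatment effect models in predicting propensity scores and factual outcomes, and additional experiments show that it also outperforms the best existing models in individual treatment effect estimation.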
On Invariance, Equivariance, Correlation and Convolution of Spherical Harmonic Representations for Scalar and Vectorial Data
results: The report extends scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3D vector fields on spheres.
Abstract
The mathematical representation of data in the Spherical Harmonic (SH) domain has recently regained increasing interest in the machine learning community. This technical report gives an in-depth introduction to the theoretical foundation and practical implementation of SH representations, summarizing works on rotation invariant and equivariant features, as well as convolutions and exact correlations of signals on spheres. In extension, these methods are then generalized from scalar SH representations to Vectorial Harmonics (VH), providing the same capabilities for 3D vector fields on spheres.
Summary
Mathematical representations of data in the Spherical Harmonic (SH) domain have recently attracted growing interest in the machine learning community. This technical report provides an in-depth introduction to the theoretical foundation and practical implementation of SH representations, covering rotation invariant and equivariant features as well as convolutions and exact correlations of signals on spheres. The methods are then extended from scalar SH representations to Vectorial Harmonics (VH), which gives the same capabilities for 3D vector fields on spheres.
When Fair Classification Meets Noisy Protected Attributes
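results: Attribute-blind and noise-tolerant fair classification algorithms can reach a level of performance similar to that of attribute-reliant algorithms even when protected attributes are unreliable or noisy, but implementing them requires care.
Abstract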
The operationalization of algorithmic fairness comes with several practical challenges, not the least of which is the availability or reliability of protected attributes in datasets. In real-world contexts, practical and legal impediments may prevent the collection and use of demographic data, making it difficult to ensure algorithmic fairness. While initial fairness algorithms did not consider these limitations, recent proposals aim to achieve algorithmic fairness in classification by incorporating noisiness in protected attributes or not using protected attributes at all. To the best of our knowledge, this is the first head-to-head study of fair classification algorithms to compare attribute-reliant, noise-tolerant and attribute-blind algorithms along the dual axes of predictivity and fairness. We evaluated these algorithms via case studies on four real-world datasets and synthetic perturbations. Our study reveals that attribute-blind and noise-tolerant fair classifiers can potentially achieve similar level of performance as attribute-reliant algorithms, even when protected attributes are noisy. However, implementing them in practice requires careful nuance. Our study provides insights into the practical implications of using fair classification algorithms in scenarios where protected attributes are noisy or partially available.
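Summary
Putting algorithmic fairness into practice faces many challenges, one of the largest being the availability and reliability of protected attributes in datasets. In the real world, legal and practical obstacles may prevent the collection and use of demographic data, making algorithmic fairness hard to guarantee. Early fairness algorithms did not consider these limitations, but recent proposals aim to achieve fair classification either without using protected attributes or by accounting for noise in them. To our knowledge, this is the first head-to-head comparison of fair classification algorithms along two axes: predictivity and fairness. We tested the algorithms on four real-world datasets and on synthetic perturbations. We find that attribute-blind and noise-tolerant fair classifiers may reach performance similar to attribute-reliant algorithms even when protected attributes are noisy. Implementing them in practice, however, requires care. Our study offers practical guidance for using fair classification algorithms where protected attributes are noisy or only partially available.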
A Vulnerability of Attribution Methods Using Pre-Softmax Scores
results: The study finds that such modifications can change the explanations produced by attribution methods without changing the model's outputs.
Abstract
We discuss a vulnerability involving a category of attribution methods used to provide explanations for the outputs of convolutional neural networks working as classifiers. It is known that this type of network is vulnerable to adversarial attacks, in which imperceptible perturbations of the input may alter the outputs of the model. In contrast, here we focus on effects that small modifications in the model may cause on the attribution method without altering the model outputs.
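Summary
We discuss a vulnerability affecting attribution methods used to explain the outputs of convolutional neural networks acting as classifiers. Such networks are known to be vulnerable to adversarial attacks, in which subtle perturbations of the input change the model's output. In contrast, we focus here on the effects that small modifications to the model can have on the attribution method without changing the model's outputs.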
Equivariant Spherical CNN for Data Efficient and High-Performance Medical Image Processing
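results: The results show that equivariant networks deliver high quality and efficiency in medical image processing and reduce dependence on the training set, lowering training time and computational cost.
Abstract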
This work highlights the significance of equivariant networks as efficient and high-performance approaches for tomography applications. Our study builds upon the limitations of Convolutional Neural Networks (CNNs), which have shown promise in post-processing various medical imaging systems. However, the efficiency of conventional CNNs heavily relies on an undiminished and proper training set. To tackle this issue, in this study, we introduce an equivariant network, aiming to reduce CNN's dependency on specific training sets. We evaluate the efficacy of equivariant CNNs on spherical signals for tomographic medical imaging problems. Our results demonstrate superior quality and computational efficiency of spherical CNNs (SCNNs) in denoising and reconstructing benchmark problems. Furthermore, we propose a novel approach to employ SCNNs as a complement to conventional image reconstruction tools, enhancing the outcomes while reducing reliance on the training set. Across all cases, we observe a significant decrease in computational costs while maintaining the same or higher quality of image processing using SCNNs compared to CNNs. Additionally, we explore the potential of this network for broader tomography applications, particularly those requiring omnidirectional representation.
OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload
paper_authors: Andreas Karatzas, Iraklis Anagnostopoulos
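for: Improving throughput and efficiency for workloads made up of multiple deep neural network (DNN) applications
methods: Uses heterogeneous accelerators, architectural heterogeneity, and stochastic space exploration
results: Achieves a 4.6x average throughput improvement over other state-of-the-art methods
Abstract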
Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller ought to explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration and combine it with a highly accurate performance estimator to observe a 4.6x average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.
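Summary
Modern deep neural networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced workloads that comprise multiple DNN applications, raising new challenges in workload distribution. Newer embedded systems carry a variety of accelerators, producing architectural heterogeneity that existing run-time controllers cannot fully utilize. To achieve high throughput in multi-DNN workloads, such a controller should explore hundreds of thousands of possible solutions. In this paper we propose OmniBoost, a lightweight multi-DNN manager for heterogeneous embedded devices. Using stochastic space exploration and a highly accurate performance estimator, we observe a 4.6x average throughput improvement over other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.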
Optimal Scalarizations for Sublinear Hypervolume Regret
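results: The study shows that hypervolume scalarizations explore the Pareto frontier efficiently and give better solutions in multiobjective problems. Experiments also show that simple hypervolume scalarizations perform well in Bayesian optimization and can outperform standard multiobjective algorithms such as EHVI.
Abstract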
Scalarization is a general technique that can be deployed in any multiobjective setting to reduce multiple objectives into one, such as recently in RLHF for training reward models that align human preferences. Yet some have dismissed this classical approach because linear scalarizations are known to miss concave regions of the Pareto frontier. To that end, we aim to find simple non-linear scalarizations that can explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights are surprisingly optimal for provably minimizing the hypervolume regret, achieving an optimal sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that preclude any algorithm from doing better asymptotically. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and demonstrate that by exploiting the sublinear regret bounds of the hypervolume scalarizations, we can derive a novel non-Euclidean analysis that produces improved hypervolume regret bounds of $\tilde{O}( d T^{-1/2} + T^{-1/k})$. We support our theory with strong empirical performance of using simple hypervolume scalarizations that consistently outperforms both the linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in bayesian optimization, such as EHVI.
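Summary
Scalarization is a general technique that can be used in any multiobjective setting to reduce multiple objectives to one, for example in RLHF to train reward models that align with human preferences. Some nevertheless dismiss this classical approach because linear scalarizations miss concave regions of the Pareto frontier. We therefore look for simple non-linear scalarizations that explore a diverse set of $k$ objectives on the Pareto frontier, as measured by the dominated hypervolume. We show that hypervolume scalarizations with uniformly random weights provably minimize the hypervolume regret, achieving an optimal sublinear regret bound of $O(T^{-1/k})$, with matching lower bounds that prevent any algorithm from doing better asymptotically. As a theoretical case study, we consider the multiobjective stochastic linear bandits problem and show that the sublinear regret bounds of hypervolume scalarizations lead to a new non-Euclidean analysis with improved hypervolume regret bounds of $\tilde{O}(dT^{-1/2} + T^{-1/k})$. The theory is supported by strong empirical results: simple hypervolume scalarizations consistently outperform linear and Chebyshev scalarizations, as well as standard multiobjective algorithms in Bayesian optimization such as EHVI.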
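The abstract does not spell out the scalarization itself. As a minimal sketch, the block below assumes the form commonly given in the hypervolume scalarization literature, $s_\lambda(y) = \min_i \max(0, y_i/\lambda_i)^k$, with objectives maximized, the reference point at the origin, and weights drawn uniformly from the positive part of the unit sphere. The function names are hypothetical and chosen for illustration only.

```python
import numpy as np


def hypervolume_scalarization(y, weights):
    """Assumed form: s_w(y) = min_i max(0, y_i / w_i) ** k for k objectives."""
    y = np.asarray(y, dtype=float)
    k = y.shape[-1]
    ratios = np.maximum(0.0, y / weights)
    return np.min(ratios, axis=-1) ** k


def sample_weights(k, rng):
    """Draw a direction uniformly from the positive orthant of the unit sphere."""
    w = np.abs(rng.standard_normal(k))
    return w / np.linalg.norm(w)


def pick_point(candidates, rng):
    """Pick the candidate that maximizes one randomly weighted scalarization."""
    w = sample_weights(candidates.shape[1], rng)
    scores = hypervolume_scalarization(candidates, w)
    return int(np.argmax(scores))
```

Repeating `pick_point` with fresh random weights sweeps different directions of the frontier, which is the intuition behind using random weights rather than a single fixed weighting.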
Empirical Analysis of a Segmentation Foundation Model in Prostate Imaging
paper_authors: Heejong Kim, Victor Ion Butoi, Adrian V. Dalca, Daniel J. A. Margolis, Mert R. Sabuncu
for: This paper is written for the purpose of evaluating the effectiveness of a foundation model for medical image segmentation, specifically in the context of prostate imaging.
methods: The paper uses a recently developed foundation model called UniverSeg, which is compared against the conventional approach of training a task-specific segmentation model.
results: The study finds that the foundation model achieves competitive performance in prostate imaging segmentation, and highlights several factors that will be important in the development and adoption of foundation models for medical image segmentation.
Abstract
Most state-of-the-art techniques for medical image segmentation rely on deep-learning models. These models, however, are often trained on narrowly-defined tasks in a supervised fashion, which requires expensive labeled datasets. Recent advances in several machine learning domains, such as natural language generation have demonstrated the feasibility and utility of building foundation models that can be customized for various downstream tasks with little to no labeled data. This likely represents a paradigm shift for medical imaging, where we expect that foundation models may shape the future of the field. In this paper, we consider a recently developed foundation model for medical image segmentation, UniverSeg. We conduct an empirical evaluation study in the context of prostate imaging and compare it against the conventional approach of training a task-specific segmentation model. Our results and discussion highlight several important factors that will likely be important in the development and adoption of foundation models for medical image segmentation.
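Summary
Modern medical image segmentation techniques mostly rely on deep learning models. These models, however, usually need expensive labeled data for training. Recent advances in machine learning areas such as natural language generation have shown that foundation models can be built and then customized for different downstream tasks with little or no labeled data. This may become a new paradigm for medical imaging, and we expect foundation models to shape the future of the field. This paper considers UniverSeg, a recently developed foundation model for medical image segmentation, evaluates it empirically on prostate imaging, and compares it with the conventional approach of training a task-specific segmentation model. Our results and discussion highlight several important factors that will influence the development and adoption of foundation models for medical image segmentation.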
results: These models perform strongly on vision language tasks and show high flexibility and adaptability across tasks.
Abstract
Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in \citet{vaswani2017attention} to vision language modeling. Transformer models have greatly improved performance and versatility over previous vision language models. They do so by pretraining models on large generic datasets and transferring their learning to new tasks with minor changes in architecture and parameter values. This type of transfer learning has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks which require both vision and language. In this paper, we provide a broad synthesis of the currently available research on vision language transformer models and offer some analysis of their strengths, limitations and some open questions that remain.
Summary
Tasks that require both vision and language, such as answering questions about an image or generating captions that describe it, are difficult for computers to perform. Recently, researchers have adapted the pretrained transformer architecture introduced in \citet{vaswani2017attention} to vision language modeling, which has greatly improved performance and versatility over previous vision language models. These models are trained on large generic datasets and then transferred to new tasks with minor changes in architecture and parameter values, which has become the standard modeling practice in both natural language processing and computer vision. Vision language transformers offer the promise of producing similar advancements in tasks that require both vision and language. In this paper, we provide a comprehensive overview of the currently available research on vision language transformer models and offer some analysis of their strengths, limitations, and open questions that remain.
Learned Kernels for Interpretable and Efficient PPG Signal Quality Assessment and Artifact Segmentation
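results: Experiments show that the proposed method performs on par with, and often better than, existing deep neural network (DNN) approaches for PPG signal quality assessment and artifact segmentation, while using several orders of magnitude fewer parameters and far less computation and memory.
Abstract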
Photoplethysmography (PPG) provides a low-cost, non-invasive method to continuously monitor various cardiovascular parameters. PPG signals are generated by wearable devices and frequently contain large artifacts caused by external factors, such as motion of the human subject. In order to ensure robust and accurate extraction of physiological parameters, corrupted areas of the signal need to be identified and handled appropriately. Previous methodology relied either on handcrafted feature detectors or signal metrics which yield sub-optimal performance, or relied on machine learning techniques such as deep neural networks (DNN) which lack interpretability and are computationally and memory intensive. In this work, we present a novel method to learn a small set of interpretable convolutional kernels that has performance similar to -- and often better than -- the state-of-the-art DNN approach with several orders of magnitude fewer parameters. This work allows for efficient, robust, and interpretable signal quality assessment and artifact segmentation on low-power devices.
Neural Network Field Theories: Non-Gaussianity, Actions, and Locality
results: The paper shows that in the infinite-width (infinite-$N$) limit, an ensemble of neural networks can be viewed as a free field theory and described with field-theoretic methods.
Abstract
Both the path integral measure in field theory and ensembles of neural networks describe distributions over functions. When the central limit theorem can be applied in the infinite-width (infinite-$N$) limit, the ensemble of networks corresponds to a free field theory. Although an expansion in $1/N$ corresponds to interactions in the field theory, others, such as in a small breaking of the statistical independence of network parameters, can also lead to interacting theories. These other expansions can be advantageous over the $1/N$-expansion, for example by improved behavior with respect to the universal approximation theorem. Given the connected correlators of a field theory, one can systematically reconstruct the action order-by-order in the expansion parameter, using a new Feynman diagram prescription whose vertices are the connected correlators. This method is motivated by the Edgeworth expansion and allows one to derive actions for neural network field theories. Conversely, the correspondence allows one to engineer architectures realizing a given field theory by representing action deformations as deformations of neural network parameter densities. As an example, $\phi^4$ theory is realized as an infinite-$N$ neural network field theory.
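results: The method outperforms existing approaches at creating cinemagraphs of natural landscapes as well as artistic and other-worldly scenes. The work also demonstrates two extensions: animating existing paintings and controlling motion direction with text.
Abstract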
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions - an especially challenging task when prompts feature imaginary elements and artistic styles, given the complexity of interpreting the semantics and motions of these images. Existing single-image animation methods fall short on artistic inputs, and recent text-based video methods frequently introduce temporal inconsistencies, struggling to keep certain regions static. To address these challenges, we propose an idea of synthesizing image twins from a single text prompt - a pair of an artistic image and its pixel-aligned corresponding natural-looking twin. While the artistic image depicts the style and appearance detailed in our text prompt, the realistic counterpart greatly simplifies layout and motion analysis. Leveraging existing natural image and video datasets, we can accurately segment the realistic image and predict plausible motion given the semantic information. The predicted motion can then be transferred to the artistic image to create the final cinemagraph. Our method outperforms existing approaches in creating cinemagraphs for natural landscapes as well as artistic and other-worldly scenes, as validated by automated metrics and user studies. Finally, we demonstrate two extensions: animating existing paintings and controlling motion directions using text.
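Summary
We introduce Text2Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions, a task that is especially hard when prompts feature imaginary elements and artistic styles. Existing single-image animation methods fall short on artistic inputs, while recent text-based video methods often introduce temporal inconsistencies and struggle to keep certain regions static. To address these challenges, we propose synthesizing image twins from a single text prompt: an artistic image and its pixel-aligned, natural-looking counterpart. The artistic image carries the style and appearance described in the prompt, while the natural image greatly simplifies layout and motion analysis. Using existing natural image and video datasets, we can accurately segment the natural image and predict plausible motion from the semantic information. The predicted motion is then transferred to the artistic image to create the final cinemagraph. Our method outperforms existing approaches on natural landscapes as well as artistic and other-worldly scenes, as validated by automated metrics and user studies. Finally, we show two extensions: animating existing paintings and controlling motion direction with text.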
TGRL: An Algorithm for Teacher Guided Reinforcement Learning
results: Experiments show that Teacher Guided Reinforcement Learning (TGRL) outperforms strong baselines across diverse domains without hyperparameter tuning.
Abstract
Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches for solving sequential decision-making problems. To combine the benefits of these different forms of learning, it is common to train a policy to maximize a combination of reinforcement and teacher-student learning objectives. However, without a principled method to balance these objectives, prior work used heuristics and problem-specific hyperparameter searches to balance the two objectives. We present a $\textit{principled}$ approach, along with an approximate implementation for $\textit{dynamically}$ and $\textit{automatically}$ balancing when to follow the teacher and when to use rewards. The main idea is to adjust the importance of teacher supervision by comparing the agent's performance to the counterfactual scenario of the agent learning without teacher supervision and only from rewards. If using teacher supervision improves performance, the importance of teacher supervision is increased and otherwise it is decreased. Our method, $\textit{Teacher Guided Reinforcement Learning}$ (TGRL), outperforms strong baselines across diverse domains without hyper-parameter tuning.
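Summary
Learning from rewards (i.e., reinforcement learning or RL) and learning to imitate a teacher (i.e., teacher-student learning) are two established approaches to sequential decision-making problems. To combine the benefits of both, it is common to train a policy to maximize a weighted combination of reinforcement and teacher-student learning objectives. Lacking a principled way to balance the two objectives, prior work relied on heuristics and problem-specific hyperparameter searches. We present a principled approach, together with an approximate implementation, that dynamically and automatically decides when to follow the teacher and when to use rewards. Our method, Teacher Guided Reinforcement Learning (TGRL), outperforms strong baselines across diverse domains without hyperparameter tuning.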
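The balancing rule described in the abstract (raise the importance of teacher supervision when it helps, lower it otherwise) can be illustrated with a small sketch. This is not the authors' implementation: the linear update, the step size, and the function names are assumptions made for illustration, and the two return estimates are presumed to come from a policy trained with the teacher and a counterfactual policy trained from rewards only.

```python
def update_teacher_weight(weight, return_with_teacher, return_reward_only,
                          step_size=0.1, min_weight=0.0):
    """Move the teacher coefficient in the direction of the performance gap."""
    # Positive gap: teacher supervision helped, so its importance grows.
    # Negative gap: the reward-only learner did better, so it shrinks.
    gap = return_with_teacher - return_reward_only
    return max(min_weight, weight + step_size * gap)


def combined_loss(rl_loss, imitation_loss, weight):
    """Objective balancing the RL loss against the teacher-imitation loss."""
    return rl_loss + weight * imitation_loss
```

Clipping at `min_weight` keeps the imitation term from turning into a penalty for agreeing with the teacher.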
Quantification of Uncertainty with Adversarial Models
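results: Experiments show that QUAM improves the quantification of epistemic uncertainty for deep learning models and outperforms previous methods on challenging vision tasks.
Abstract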
Quantifying uncertainty is important for actionable predictions in real-world applications. A crucial part of predictive uncertainty quantification is the estimation of epistemic uncertainty, which is defined as an integral of the product between a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout underperform at estimating the epistemic uncertainty, since they primarily consider the posterior when sampling models. We suggest Quantification of Uncertainty with Adversarial Models (QUAM) to better estimate the epistemic uncertainty. QUAM identifies regions where the whole product under the integral is large, not just the posterior. Consequently, QUAM has lower approximation error of the epistemic uncertainty compared to previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior as well as a high divergence between their predictions and that of a reference model. Our experiments show that QUAM excels in capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.
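Summary
Quantifying uncertainty is important for actionable predictions. A key part of predictive uncertainty quantification is estimating epistemic uncertainty, defined as an integral of the product of a divergence function and the posterior. Current methods such as Deep Ensembles or MC dropout estimate epistemic uncertainty poorly because they rely mainly on the posterior when sampling models. We propose Quantification of Uncertainty with Adversarial Models (QUAM) to estimate it better. QUAM finds regions where the whole product under the integral is large, not just the posterior, so its approximation error is lower than that of previous methods. Models for which the product is large correspond to adversarial models (not adversarial examples!). Adversarial models have both a high posterior and a high divergence between their predictions and those of a reference model. Our experiments show that QUAM excels at capturing epistemic uncertainty for deep learning models and outperforms previous methods on challenging tasks in the vision domain.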
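To make the "integral of divergence times posterior" concrete, the sketch below computes a plain Monte Carlo estimate from a handful of sampled models. It assumes the divergence is the KL divergence from a reference model's predictive distribution to each sampled model's, and that the samples come with normalized posterior weights. This is the ensemble-style baseline estimator, not QUAM's adversarial model search, and the function names are hypothetical.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    return float(np.sum(p * np.log(p / q)))


def epistemic_uncertainty(reference_probs, sampled_probs, posterior_weights=None):
    """Weighted average of divergences, approximating the integral by a sum."""
    n = len(sampled_probs)
    if posterior_weights is None:
        # Assumption: models were sampled from the posterior, so weights are uniform.
        posterior_weights = np.full(n, 1.0 / n)
    divergences = np.array([kl_divergence(reference_probs, q) for q in sampled_probs])
    return float(np.dot(posterior_weights, divergences))
```

The estimate is only as good as the sampled models: if none of them disagrees with the reference, the sum is near zero even when high-posterior disagreeing models exist, which is the gap the abstract says QUAM targets.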
Learning Curves for Heterogeneous Feature-Subsampled Ridge Ensembles
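results: In a linear regression setting, varying the subset sizes and the number of features yields better predictive performance, and sharp transitions in the optimal ensembling strategy are found in parameter space.
Abstract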
Feature bagging is a well-established ensembling method which aims to reduce prediction variance by training estimators in an ensemble on random subsamples or projections of features. Typically, ensembles are chosen to be homogeneous, in the sense that the number of feature dimensions available to an estimator is uniform across the ensemble. Here, we introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, and consider its performance in a linear regression setting. We study an ensemble of linear predictors, each fit using ridge regression on a subset of the available features. We allow the number of features included in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles with deterministic linear masks. We obtain explicit expressions for the learning curves in the case of equicorrelated data with an isotropic feature noise. Using the derived expressions, we investigate the effect of subsampling and ensembling, finding sharp transitions in the optimal ensembling strategy in the parameter space of noise level, data correlations, and data-task alignment. Finally, we suggest variable-dimension feature bagging as a strategy to mitigate double descent for robust machine learning in practice.
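Summary
Feature bagging is a well-established ensembling method that aims to reduce prediction variance by training the estimators in an ensemble on random subsamples or projections of the features. Ensembles are typically chosen to be homogeneous, meaning that every estimator has access to the same number of feature dimensions. Here we introduce heterogeneous feature ensembling, in which estimators are built on varying numbers of feature dimensions, and study its performance in a linear regression setting. We consider an ensemble of linear predictors, each fit by ridge regression on a subset of the available features, and allow the number of features in these subsets to vary. Using the replica trick from statistical physics, we derive learning curves for ridge ensembles, with explicit expressions for equicorrelated data with isotropic feature noise. Using these expressions, we investigate the effect of subsampling and ensembling and find sharp transitions in the optimal ensembling strategy in parameter space. Finally, we suggest variable-dimension feature bagging as a practical strategy for mitigating double descent.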
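The ensemble described above can be sketched in a few lines of NumPy: each member is a ridge regressor fit on its own random feature subset, the subset sizes differ between members, and predictions are averaged. The function names, the use of uniformly random subsets, and the simple averaging rule are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def fit_ridge(X, y, ridge):
    """Closed-form ridge regression: solve (X^T X + ridge * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)


def fit_ensemble(X, y, subset_sizes, ridge, rng):
    """Fit one ridge predictor per entry of subset_sizes on a random feature subset."""
    members = []
    for size in subset_sizes:
        idx = rng.choice(X.shape[1], size=size, replace=False)
        members.append((idx, fit_ridge(X[:, idx], y, ridge)))
    return members


def predict(members, X):
    """Average the members' predictions, each using only its own features."""
    preds = [X[:, idx] @ w for idx, w in members]
    return np.mean(preds, axis=0)
```

Passing `subset_sizes=[3, 5, 10]` gives a heterogeneous ensemble, while a list of equal sizes recovers the homogeneous case.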
Push Past Green: Learning to Look Behind Plant Foliage by Moving It
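results: Experiments across 5 settings on a physical test-bed, with a synthetic plant (vines) and a real plant (Dracaena), show that the overall method outperforms a competitive hand-crafted exploration method, and that SRPNet outperforms a hand-crafted dynamics model and relevant ablations.
Abstract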
Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating the plant foliage to look behind the leaves and the branches. Partial visibility, extreme clutter, thin structures, and unknown geometry and dynamics for plants make such manipulation challenging. We tackle these challenges through data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed on execution of a candidate action on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage. Furthermore, as SRPNet does not just predict how much space is revealed but also where it is revealed, we can execute a sequence of actions that incrementally reveal more and more space beneath the plant foliage. We experiment with a synthetic (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings including 2 settings that test generalization to novel plant configurations. Our experiments reveal the effectiveness of our overall method, PPG, over a competitive hand-crafted exploration method, and the effectiveness of SRPNet over a hand-crafted dynamics model and relevant ablations.
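Summary
Autonomous agriculture applications (e.g., inspection, phenotyping, plucking fruits) require manipulating plant foliage to look behind leaves and branches. Partial visibility, extreme clutter, thin structures, and unknown plant geometry and dynamics make such manipulation challenging. We tackle these challenges with data-driven methods. We use self-supervision to train SRPNet, a neural network that predicts what space is revealed when a candidate action is executed on a given plant. We use SRPNet with the cross-entropy method to predict actions that are effective at revealing space beneath plant foliage, so that more space is revealed step by step. We experiment with a synthetic plant (vines) and a real plant (Dracaena) on a physical test-bed across 5 settings, including 2 that test generalization to novel plant configurations. Our experiments show that our overall method, PPG, is more effective than a hand-crafted exploration method, and that SRPNet is more effective than a hand-crafted dynamics model and relevant ablations.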
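The cross-entropy method mentioned above is a generic sampling-based optimizer, sketched here in its usual Gaussian form. In the paper's setting the score would be the revealed space predicted by SRPNet for a candidate action. Here `score_fn` is a hypothetical stand-in, and the population size, elite fraction, and Gaussian parameterization are assumptions for illustration.

```python
import numpy as np


def cross_entropy_method(score_fn, dim, rng, iterations=20,
                         population=64, elite_frac=0.2):
    """Maximize score_fn by repeatedly refitting a Gaussian to the best samples."""
    mean = np.zeros(dim)
    std = np.ones(dim)
    n_elite = max(1, int(population * elite_frac))
    for _ in range(iterations):
        samples = rng.normal(mean, std, size=(population, dim))
        scores = np.array([score_fn(s) for s in samples])
        # Keep the highest-scoring samples and refit the sampling distribution.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a zero-width Gaussian
    return mean
```

Because the method only needs scores for sampled actions, it works with a learned predictor that is not differentiable with respect to the action.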
Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation
paper_authors: Kirill Neklyudov, Jannes Nys, Luca Thiede, Juan Carrasquilla, Qiang Liu, Max Welling, Alireza Makhzani
for: solves the quantum many-body Schrödinger equation, a fundamental problem in quantum physics, chemistry, and materials science.
methods: uses deep learning methods to represent wave functions as neural networks, and reformulates energy functional minimization in the space of Born distributions.
results: demonstrates faster convergence to the ground state of molecular systems using the proposed “Wasserstein Quantum Monte Carlo” (WQMC) method.
Abstract
Solving the quantum many-body Schrödinger equation is a fundamental and challenging problem in the fields of quantum physics, quantum chemistry, and material sciences. One of the common computational approaches to this problem is Quantum Variational Monte Carlo (QVMC), in which ground-state solutions are obtained by minimizing the energy of the system within a restricted family of parameterized wave functions. Deep learning methods partially address the limitations of traditional QVMC by representing a rich family of wave functions in terms of neural networks. However, the optimization objective in QVMC remains notoriously hard to minimize and requires second-order optimization methods such as natural gradient. In this paper, we first reformulate energy functional minimization in the space of Born distributions corresponding to particle-permutation (anti-)symmetric wave functions, rather than the space of wave functions. We then interpret QVMC as the Fisher-Rao gradient flow in this distributional space, followed by a projection step onto the variational manifold. This perspective provides us with a principled framework to derive new QMC algorithms, by endowing the distributional space with better metrics, and following the projected gradient flow induced by those metrics. More specifically, we propose "Wasserstein Quantum Monte Carlo" (WQMC), which uses the gradient flow induced by the Wasserstein metric, rather than Fisher-Rao metric, and corresponds to transporting the probability mass, rather than teleporting it. We demonstrate empirically that the dynamics of WQMC results in faster convergence to the ground state of molecular systems.
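Summary
Solving the quantum many-body Schrödinger equation is a fundamental and challenging problem in physics, chemistry, and materials science. A common computational approach is Quantum Variational Monte Carlo (QVMC), in which ground-state solutions are obtained by minimizing the energy of the system within a restricted family of parameterized wave functions. Deep learning methods partially address the limitations of traditional QVMC by representing a rich family of wave functions with neural networks. However, the QVMC optimization objective remains hard to minimize and requires second-order optimization methods such as natural gradient. In this paper, we first reformulate energy functional minimization in the space of Born distributions corresponding to particle-permutation (anti-)symmetric wave functions, and then interpret QVMC as the Fisher-Rao gradient flow in this distributional space. We then derive new QMC algorithms by endowing the distributional space with better metrics and following the projected gradient flow they induce. More specifically, we propose "Wasserstein Quantum Monte Carlo" (WQMC), which uses the gradient flow induced by the Wasserstein metric rather than the Fisher-Rao metric, and which transports probability mass rather than teleporting it. We show empirically that the dynamics of WQMC converge faster to the ground state of molecular systems.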
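For readers unfamiliar with the two geometries contrasted above, the block below restates the standard textbook forms of the two gradient flows for a generic energy functional $E[q]$ over densities $q_t$. The notation ($E$, $q_t$, $\delta E / \delta q_t$) is assumed here for illustration and is not taken from the abstract; the paper's own formulation additionally projects onto the variational manifold.

```latex
% Fisher-Rao gradient flow: mass is reweighted in place ("teleported")
\frac{\partial q_t(x)}{\partial t}
  = -\left( \frac{\delta E[q_t]}{\delta q_t}(x)
      - \mathbb{E}_{y \sim q_t}\!\left[ \frac{\delta E[q_t]}{\delta q_t}(y) \right] \right) q_t(x)

% Wasserstein-2 gradient flow: mass moves along a velocity field ("transported")
\frac{\partial q_t(x)}{\partial t}
  = \nabla_x \cdot \left( q_t(x) \, \nabla_x \frac{\delta E[q_t]}{\delta q_t}(x) \right)
```

In the first flow the density at $x$ grows or shrinks according to how the local first variation compares with its average; in the second, mass flows down the spatial gradient of the first variation.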
Focused Transformer: Contrastive Training for Context Scaling
paper_authors: Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś
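for: Extending the effective context length of large language models
methods: Gives attention layers access to an external memory so they can attend to more documents, and uses a training procedure inspired by contrastive learning to address the distraction issue
results: Fine-tuning existing large models with the method lengthens their effective context and improves performance on long-context tasks
Abstract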
Large language models have an exceptional capability to incorporate new information in a contextual manner. However, the full potential of such an approach is often restrained due to a limitation in the effective context length. One solution to this issue is to endow an attention layer with access to an external memory, which consists of (key, value) pairs. Yet, as the number of documents increases, the proportion of relevant keys to irrelevant ones decreases, leading the model to focus more on the irrelevant keys. We identify a significant challenge, dubbed the distraction issue, where keys linked to different semantic values might overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique that employs a training process inspired by contrastive learning. This novel approach enhances the structure of the (key, value) space, enabling an extension of the context length. Our method allows for fine-tuning pre-existing, large-scale models to lengthen their effective context. This is demonstrated by our fine-tuning of $3B$ and $7B$ OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, exhibit advancements in tasks requiring a long context. We further illustrate that our LongLLaMA models adeptly manage a $256k$ context length for passkey retrieval.
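Summary
Large language models are exceptionally good at incorporating new information in context. However, the potential of this approach is often limited by the effective context length. One solution is to give an attention layer access to an external memory consisting of (key, value) pairs. Yet as the number of documents grows, the proportion of relevant keys to irrelevant ones falls, leading the model to attend more to irrelevant keys. We identify a significant challenge, dubbed the distraction issue, in which keys linked to different semantic values may overlap, making them hard to distinguish. To tackle this problem, we introduce the Focused Transformer (FoT), a technique whose training process is inspired by contrastive learning. This approach improves the structure of the (key, value) space, allowing the context length to be extended. Our method allows existing large-scale models to be fine-tuned to lengthen their effective context, as demonstrated by fine-tuning the $3B$ and $7B$ OpenLLaMA checkpoints. The resulting models, which we name LongLLaMA, show improvements on tasks requiring long context. We also show that our LongLLaMA models can handle a $256k$ context length for passkey retrieval.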
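The idea of an attention layer with access to an external (key, value) memory can be sketched with single-head dot-product attention in NumPy. The top-k lookup by inner product, the absence of causal masking, and all names below are simplifying assumptions for illustration. The sketch covers only the inference-time memory lookup, not the contrastive-style training the abstract describes.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def attention_with_memory(queries, local_keys, local_values,
                          memory_keys, memory_values, top_k):
    """Attend over local context plus the top_k best-matching memory entries."""
    d = queries.shape[-1]
    outputs = []
    for q in queries:
        # Retrieve the memory entries whose keys have the largest inner product with q.
        top = np.argsort(memory_keys @ q)[-top_k:]
        keys = np.concatenate([local_keys, memory_keys[top]])
        values = np.concatenate([local_values, memory_values[top]])
        weights = softmax(keys @ q / np.sqrt(d))
        outputs.append(weights @ values)
    return np.stack(outputs)
```

If many irrelevant memory keys score as highly as the relevant ones, the retrieved set and the attention weights are dominated by noise, which is one way to picture the distraction issue.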
Can Domain Adaptation Improve Accuracy and Fairness of Skin Lesion Classification?
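results: UDA is effective in binary classification, with further gains when imbalance is mitigated. In multi-class tasks its performance is weaker, and imbalance must be addressed to reach above-baseline accuracy. The quantitative analysis finds that test error is strongly correlated with label shift and that feature-level UDA methods have limitations on imbalanced datasets. Finally, UDA can effectively reduce bias against minority groups without explicit fairness-focused techniques.
Abstract
Deep learning-based diagnostic systems have demonstrated potential in classifying skin cancer conditions when labeled training examples are abundant. However, skin lesion analysis often suffers from a scarcity of labeled data, hindering the development of an accurate and reliable diagnostic system. In this work, we leverage multiple skin lesion datasets and investigate the feasibility of various unsupervised domain adaptation (UDA) methods in binary and multi-class skin lesion classification. In particular, we assess three UDA training schemes: single-, combined-, and multi-source. Our experimental results show that UDA is effective in binary classification, with further improvement observed when imbalance is mitigated. In multi-class tasks, its performance is less prominent, and the imbalance problem again needs to be addressed to achieve above-baseline accuracy. Through our quantitative analysis, we find that the test error of multi-class tasks is strongly correlated with label shift, and that feature-level UDA methods have limitations when handling imbalanced datasets. Finally, our study reveals that UDA can effectively reduce bias against minority groups and promote fairness, even without the explicit use of fairness-focused techniques.
Summary
Deep learning-based diagnostic systems have shown potential for classifying skin cancer when enough labeled examples are available. Skin lesion analysis, however, usually suffers from a shortage of labeled data, which hinders the development of accurate and reliable diagnostic systems. In this work we use multiple skin lesion datasets and investigate the feasibility of different unsupervised domain adaptation (UDA) methods for binary and multi-class skin lesion classification. In particular, we evaluate single-source, combined-source, and multi-source UDA training schemes. The experiments show that UDA is effective in binary classification and improves further when imbalance is mitigated. In multi-class tasks its performance is weaker, and the imbalance problem must be addressed to reach above-baseline accuracy. Our quantitative analysis shows a strong correlation between multi-class test error and label shift, and shows that feature-level UDA methods have limitations on imbalanced datasets. Finally, UDA can effectively reduce bias against minority groups without explicitly using fairness-focused techniques.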
Topology-Aware Loss for Aorta and Great Vessel Segmentation in Computed Tomography Images
results: Experiments on 4327 CT images of 24 subjects show that the proposed loss function leads to better segmentation results than its counterparts, indicating the effectiveness of this approach.
Abstract
Segmentation networks are not explicitly forced to learn global invariants of an image, such as the shape of an object and the geometry between multiple objects, when they are trained with a standard loss function. On the other hand, incorporating such invariants into network training may help improve performance for various segmentation tasks when they are intrinsic characteristics of the objects to be segmented. One example is segmentation of the aorta and great vessels in computed tomography (CT) images, where vessels are found in a particular geometry in the body due to human anatomy and mostly appear as round objects on a 2D CT image. This paper addresses this issue by introducing a new topology-aware loss function that penalizes topology dissimilarities between the ground truth and prediction through persistent homology. Different from the previously suggested segmentation network designs, which apply the threshold filtration on a likelihood function of the prediction map and the Betti numbers of the ground truth, this paper proposes to apply the Vietoris-Rips filtration to obtain persistence diagrams of both ground truth and prediction maps and to calculate the dissimilarity with the Wasserstein distance between the corresponding persistence diagrams. The use of this filtration has the advantage of modeling shape and geometry at the same time, which may not happen when the threshold filtration is applied. Our experiments on 4327 CT images of 24 subjects reveal that the proposed topology-aware loss function leads to better results than its counterparts, indicating the effectiveness of this use.
Summary
When segmentation networks are trained with a standard loss function, they do not directly learn global invariants of an image, such as object shape and the geometry between multiple objects. In some tasks these invariants are intrinsic characteristics of the objects, and including them in network training may improve segmentation performance. One example is the segmentation of the aorta and great vessels in computed tomography (CT) images: because of human anatomy, the vessels sit in a particular geometry and usually appear as round objects on a 2D CT image. This paper addresses the issue with a new topology-aware loss function that uses persistent homology to penalize topological dissimilarity between the ground truth and the prediction. Unlike earlier segmentation network designs, which apply the threshold filtration to a likelihood function of the prediction map and use the Betti numbers of the ground truth, this work applies the Vietoris-Rips filtration to obtain persistence diagrams of both the prediction and the ground truth and computes the Wasserstein distance between them. The advantage of this filtration is that it models shape and geometry at the same time, which may not happen with the threshold filtration. Experiments on 4327 CT images of 24 subjects show that the proposed topology-aware loss function gives better results than its counterparts, indicating its effectiveness.
Multiplicative Updates for Online Convex Optimization over Symmetric Cones
paper_authors: Ilayda Canyakmaz, Wayne Lin, Georgios Piliouras, Antonios Varvitsiotis
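for: This paper studies online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively studied experts setup and its quantum counterpart.
methods: Using tools from Euclidean Jordan Algebras, the paper proposes the projection-free Symmetric-Cone Multiplicative Weights Update (SCMWU) algorithm for online optimization over the trace-one slice of a symmetric cone.
results: The paper proves that SCMWU is a no-regret algorithm, and it unifies and generalizes the analyses of the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over density matrices.
Abstract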
We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively-studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for some of the most important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from the field of Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of an arbitrary symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as regularizer. Using this structural result we show that SCMWU is a no-regret algorithm, and verify our theoretical results with extensive experiments. Our results unify and generalize the analysis for the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over the set of density matrices.
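Summary
We study online convex optimization where the possible actions are trace-one elements in a symmetric cone, generalizing the extensively studied experts setup and its quantum counterpart. Symmetric cones provide a unifying framework for several important optimization models, including linear, second-order cone, and semidefinite optimization. Using tools from Euclidean Jordan Algebras, we introduce the Symmetric-Cone Multiplicative Weights Update (SCMWU), a projection-free algorithm for online optimization over the trace-one slice of a symmetric cone. We show that SCMWU is equivalent to Follow-the-Regularized-Leader and Online Mirror Descent with symmetric-cone negative entropy as the regularizer. Using this structural result, we show that SCMWU is a no-regret algorithm and verify the theory with extensive experiments. Our results unify and generalize the analyses of the Multiplicative Weights Update method over the probability simplex and the Matrix Multiplicative Weights Update method over density matrices.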
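The abstract names the classical Multiplicative Weights Update over the probability simplex as one of the special cases that SCMWU generalizes. The sketch below implements only that classical special case (the experts setup), not SCMWU itself; the function name `multiplicative_weights`, the toy loss sequence, and the step size are assumptions chosen for illustration.

```python
import math

def multiplicative_weights(loss_rounds, eta):
    """Classical MWU (Hedge) over the probability simplex.

    loss_rounds: list of per-round loss vectors, each entry in [0, 1].
    eta: step size (learning rate).
    Returns the final weights and the learner's cumulative expected loss.
    """
    n = len(loss_rounds[0])
    weights = [1.0 / n] * n          # start at the uniform distribution
    total_loss = 0.0
    for losses in loss_rounds:
        # Expected loss of playing the current distribution.
        total_loss += sum(w * l for w, l in zip(weights, losses))
        # Multiplicative update followed by renormalization
        # (renormalizing keeps the point on the simplex without a projection).
        unnorm = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
        z = sum(unnorm)
        weights = [u / z for u in unnorm]
    return weights, total_loss

if __name__ == "__main__":
    T = 200
    # Toy losses: expert 0 is steadily good, expert 1 alternates, expert 2 is bad.
    rounds = [[0.2, float(t % 2), 0.9] for t in range(T)]
    eta = math.sqrt(8 * math.log(3) / T)
    w, total = multiplicative_weights(rounds, eta)
    best = min(sum(r[i] for r in rounds) for i in range(3))
    print("final weights:", [round(x, 4) for x in w])
    print("regret:", round(total - best, 4))
```

With losses in [0, 1] and this choice of `eta`, the standard analysis bounds the regret against the best fixed expert by sqrt(T ln(n) / 2), which grows sublinearly in T; this is the sense in which such an algorithm is called no-regret.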
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
for: The paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models, with a focus on open-vocabulary out-of-distribution (OOD) generalization.
methods: The proposed method uses two principles from the vision and language modality perspectives to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space and carefully promoting better coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels.
results: The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of the proposed approaches.
Abstract
Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance student's OOD generalization: (1) by better imitating teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) by enriching the teacher's language representations with informative and finegrained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate their techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Code released at https://github.com/xuanlinli17/large_vlm_distillation_ood
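Summary
Large vision-language models achieve outstanding performance, but their size and computational requirements make deployment on resource-constrained devices and in time-sensitive tasks impractical. Model distillation, which creates smaller and faster models that maintain the performance of larger ones, is a promising direction. This paper investigates distilling the visual representations of large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. In particular, the study focuses on open-vocabulary out-of-distribution (OOD) generalization, a problem that earlier model distillation literature has overlooked. We propose two principles, from the vision and language perspectives, to improve the student's OOD generalization: (1) better imitating the teacher's visual representation space and carefully promoting coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative, fine-grained semantic attributes to distinguish effectively between labels. We propose several metrics and run extensive experiments to examine these techniques. The results show significant improvements in zero-shot and few-shot student performance on open-vocabulary OOD classification, highlighting the effectiveness of the proposed approaches. Code is available at https://github.com/xuanlinli17/large_vlm_distillation_ood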
Benchmarking Test-Time Adaptation against Distribution Shifts in Image Classification
paper_authors: Yongcan Yu, Lijun Sheng, Ran He, Jian Liang
for: This paper aims to provide a benchmark for test-time adaptation (TTA) methods to enhance the generalization performance of models and improve their robustness against distribution shifts.
methods: The paper evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets, including CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods cover a range of adaptation scenarios, such as online adaptation vs. offline adaptation, instance adaptation vs. batch adaptation vs. domain adaptation.
results: The paper presents a unified framework in PyTorch to evaluate and compare the effectiveness of TTA methods across different datasets and network architectures. By establishing this benchmark, the authors aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance.
Abstract
Test-time adaptation (TTA) is a technique aimed at enhancing the generalization performance of models by leveraging unlabeled samples solely during prediction. Given the need for robustness in neural network systems when faced with distribution shifts, numerous TTA methods have recently been proposed. However, these methods are often evaluated under different settings, such as varying distribution shifts, backbones, and design scenarios, leading to a lack of consistent and fair benchmarks to validate their effectiveness. To address this issue, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. These methods encompass a wide range of adaptation scenarios (e.g. online adaptation vs. offline adaptation, instance adaptation vs. batch adaptation vs. domain adaptation). Furthermore, we explore the compatibility of different TTA methods with diverse network backbones. To implement this benchmark, we have developed a unified framework in PyTorch, which allows for consistent evaluation and comparison of the TTA methods across the different datasets and network architectures. By establishing this benchmark, we aim to provide researchers and practitioners with a reliable means of assessing and comparing the effectiveness of TTA methods in improving model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.
Summary
Test-time adaptation (TTA) is a technique that aims to improve the generalization performance of models by using unlabeled samples at prediction time only. Because neural network systems need to be robust to distribution shifts, many TTA methods have recently been proposed. These methods are usually evaluated under different settings, such as different distribution shifts, backbones, and design scenarios, so consistent and fair benchmarks for validating them are lacking. To address this, we present a benchmark that systematically evaluates 13 prominent TTA methods and their variants on five widely used image classification datasets: CIFAR-10-C, CIFAR-100-C, ImageNet-C, DomainNet, and Office-Home. The methods cover a wide range of adaptation scenarios (for example online vs. offline adaptation, and instance vs. batch vs. domain adaptation). We also explore the compatibility of different TTA methods with different network backbones. To implement the benchmark, we developed a unified framework in PyTorch that allows consistent evaluation and comparison of TTA methods across datasets and network architectures. With this benchmark we hope to give researchers and practitioners a reliable way to assess and compare how well TTA methods improve model robustness and generalization performance. Our code is available at https://github.com/yuyongcan/Benchmark-TTA.
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning
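results: On the "medium scale" of the DataComp data filtering benchmark, T-MARS outperforms the top-ranked method by 6.5% on ImageNet and 4.7% on VTAB. A systematic evaluation on data pool sizes from 2M to 64M shows that the accuracy gains of T-MARS increase linearly as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.
Abstract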
Large web-sourced multimodal datasets have powered a slew of new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and revolutionizing zero- and few-shot recognition. One crucial decision facing practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset chose to retain only image-caption pairs whose CLIP similarity score exceeded a designated threshold. In this paper, we propose a new state-of-the-art data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Intuitively, such data could be wasteful as it incentivizes models to perform optical character recognition rather than learning visual features. However, naively removing all such data could also be wasteful, as it throws away images that contain visual features (in addition to overlapping text). Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features -- by first masking out the text and then filtering out those with a low CLIP similarity score of the masked image. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp (a data filtering benchmark) by a margin of 6.5% on ImageNet and 4.7% on VTAB. Additionally, our systematic evaluation on various data pool sizes from 2M to 64M shows that the accuracy gains enjoyed by T-MARS linearly increase as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.
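Summary
Large web-sourced multimodal datasets have powered many new methods for learning general-purpose visual representations, advancing the state of the art in computer vision and transforming zero- and few-shot recognition. A key decision for practitioners is how, if at all, to curate these ever-larger datasets. For example, the creators of the LAION-5B dataset kept only image-caption pairs whose CLIP similarity score exceeded a set threshold. In this paper we propose a new data filtering approach, motivated by the observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption. Such data could be wasteful because it encourages models to perform optical character recognition rather than learn visual features. Naively removing all such data could also be wasteful, however, because it throws away images that contain visual features in addition to the overlapping text. Our simple and scalable approach, T-MARS (Text Masking and Re-Scoring), filters out only those pairs where the text dominates the remaining visual features: it first masks out the text in the image and then filters out pairs whose masked image has a low CLIP similarity score. Experimentally, T-MARS outperforms the top-ranked method on the "medium scale" of DataComp by 6.5% on ImageNet and 4.7% on VTAB. A systematic evaluation on data pool sizes from 2M to 64M also shows that the accuracy gains of T-MARS increase linearly as data and compute are scaled exponentially. Code is available at https://github.com/locuslab/T-MARS.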
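The mask-then-re-score logic described in the abstract can be sketched with toy stand-ins. Everything below is an assumption made for illustration: images are represented as dictionaries of word sets, `mask_text` simply drops the rendered-text component, and `toy_similarity` is a word-overlap score standing in for a CLIP similarity score. It is not the T-MARS implementation and makes no real CLIP or text-detection calls.

```python
def jaccard(a, b):
    """Word-set overlap in [0, 1]; returns 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mask_text(image):
    """Toy masking: remove the rendered-text component, keep visual content."""
    return {"visual": image["visual"], "text": set()}

def toy_similarity(image, caption):
    """Stand-in for an image-caption similarity score (NOT a real CLIP call)."""
    content = image["visual"] | image["text"]
    return jaccard(content, set(caption.lower().split()))

def filter_pairs(pairs, mask_fn, score_fn, threshold):
    """Keep pairs whose text-masked image still matches the caption."""
    kept = []
    for image, caption in pairs:
        if score_fn(mask_fn(image), caption) >= threshold:
            kept.append((image, caption))
    return kept

# Hypothetical toy pool: (image, caption) pairs.
PAIRS = [
    ({"visual": {"dog", "grass"}, "text": set()}, "dog on grass"),
    ({"visual": set(), "text": {"sale", "today"}}, "sale today"),   # text only
    ({"visual": {"shoe"}, "text": {"nike"}}, "nike shoe"),          # text + visual
]
KEPT = filter_pairs(PAIRS, mask_text, toy_similarity, threshold=0.3)

if __name__ == "__main__":
    print([caption for _, caption in KEPT])
```

In this toy pool the text-only pair scores perfectly before masking and zero after it, so it is dropped, while the pair that has both overlapping text and visual content survives; that is the distinction the abstract draws against naively removing every image that contains caption text.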
results: Simulations show that the framework is robust on noisy data and that it can be extended to the case of a known Riemannian manifold.
Abstract
In this paper we demonstrate how sub-Riemannian geometry can be used for manifold learning and surface reconstruction by combining local linear approximations of a point cloud to obtain lower dimensional bundles. Local approximations obtained by local PCAs are collected into a rank $k$ tangent subbundle on $\mathbb{R}^d$, $k
Summary
In this paper we show how sub-Riemannian geometry can be used for manifold learning and surface reconstruction by combining local linear approximations into lower dimensional bundles. Local approximations obtained by local PCAs are collected into a rank $k$ tangent subbundle on $\mathbb{R}^d$, where $k
Context-Aware Configuration and Management of WiFi Direct Groups for Real Opportunistic Networks
paper_authors: Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro
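for: This study aims to improve support for Wi-Fi Direct on commercial mobile devices so that networking solutions based entirely on device-to-device (D2D) communication can be deployed.
methods: The study proposes a new middleware-layer protocol, WiFi Direct Group Manager (WFD-GM), to configure and manage Wi-Fi Direct groups automatically. The protocol includes a context function that weighs several parameters, including node stability and power levels, to create the best group configuration.
results: Across three reference scenarios with different mobility models, geographical areas, and numbers of nodes, WFD-GM performs significantly better than the Baseline approach under medium/low mobility and comparably under high mobility, without introducing additional overhead.
Abstract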
Wi-Fi Direct is a promising technology for the support of device-to-device communications (D2D) on commercial mobile devices. However, the standard as-it-is is not sufficient to support the real deployment of networking solutions entirely based on D2D such as opportunistic networks. In fact, WiFi Direct presents some characteristics that could limit the autonomous creation of D2D connections among users' personal devices. Specifically, the standard explicitly requires the user's authorization to establish a connection between two or more devices, and it provides a limited support for inter-group communication. In some cases, this might lead to the creation of isolated groups of nodes which cannot communicate among each other. In this paper, we propose a novel middleware-layer protocol for the efficient configuration and management of WiFi Direct groups (WiFi Direct Group Manager, WFD-GM) to enable autonomous connections and inter-group communication. This enables opportunistic networks in real conditions (e.g., variable mobility and network size). WFD-GM defines a context function that takes into account heterogeneous parameters for the creation of the best group configuration in a specific time window, including an index of nodes' stability and power levels. We evaluate the protocol performances by simulating three reference scenarios including different mobility models, geographical areas and number of nodes. Simulations are also supported by experimental results related to the evaluation in a real testbed of the involved context parameters. We compare WFD-GM with the state-of-the-art solutions and we show that it performs significantly better than a Baseline approach in scenarios with medium/low mobility, and it is comparable with it in case of high mobility, without introducing additional overhead.
Learning Multi-Agent Intention-Aware Communication for Optimal Multi-Order Execution in Finance
paper_authors: Yuchen Fang, Zhenggang Tang, Kan Ren, Weiqing Liu, Li Zhao, Jiang Bian, Dongsheng Li, Weinan Zhang, Yong Yu, Tie-Yan Liu
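for: This paper addresses the problem of executing multiple orders simultaneously, using model-free reinforcement learning (RL).
methods: The paper uses a multi-agent RL (MARL) method in which each agent executes one specific order and exchanges information with the other agents to collaboratively maximize the overall profit.
results: Experiments show that the proposed multi-round communication protocol and action value attribution method improve collaboration and achieve significantly better performance on two real-world markets.
Abstract
Order execution is a fundamental task in quantitative finance, aiming at finishing acquisition or liquidation for a number of trading orders of specific assets. Recent advances in model-free reinforcement learning (RL) provide a data-driven solution to the order execution problem. However, existing works always optimize execution for an individual order, overlooking the practice that multiple orders are specified to execute simultaneously, resulting in suboptimality and bias. In this paper, we first present a multi-agent RL (MARL) method for multi-order execution considering practical constraints. Specifically, we treat every agent as an individual operator trading one specific order, while the agents keep communicating with each other and collaborate to maximize the overall profit. Nevertheless, existing MARL algorithms often incorporate communication among agents by exchanging only the information of their partial observations, which is inefficient in complicated financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol, in which the agents communicate their intended actions to each other and refine them accordingly. It is optimized through a novel action value attribution method which is provably consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets illustrate superior performance, with significantly better collaboration effectiveness achieved by our method.
Summary
Order execution is a fundamental task in quantitative finance: it aims to complete the acquisition or liquidation of a number of trading orders for specific assets. Recent model-free reinforcement learning (RL) techniques provide a data-driven solution to the order execution problem. Existing works, however, optimize the execution of each order individually and ignore the fact that in practice multiple orders are executed at the same time, which leads to suboptimality and bias. In this paper we first present a multi-agent RL (MARL) method for multi-order execution that takes practical constraints into account. Specifically, we treat each agent as the trader of one specific order, and the agents keep communicating with each other to maximize the overall profit. Existing MARL algorithms usually communicate by exchanging only partial observations, which is inefficient in financial markets. To improve collaboration, we then propose a learnable multi-round communication protocol in which agents exchange their intended actions and refine them accordingly. The protocol is optimized by a new action value attribution method that is consistent with the original learning objective yet more efficient. Experiments on data from two real-world markets show that our method performs well and achieves significantly better collaboration.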
Steel Surface Roughness Parameter Calculations Using Lasers and Machine Learning Models
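results: The study indicates that data-driven methods can improve surface quality control and enable real-time adjustments during production.
Abstract
Control of surface texture in strip steel is essential to meet customer requirements during galvanizing and temper rolling processes. Traditional methods rely on post-production stylus measurements, while on-line techniques offer non-contact and real-time measurements of the entire strip. However, ensuring accurate measurement is imperative for their effective utilization in the manufacturing pipeline. Moreover, accurate on-line measurements enable real-time adjustments of manufacturing processing parameters during production, ensuring consistent quality and the possibility of closed-loop control of the temper mill. In this study, we leverage state-of-the-art machine learning models to enhance the transformation of on-line measurements into a significantly more accurate Ra surface roughness metric. By comparing a selection of data-driven approaches, including both deep learning and non-deep learning methods, to the closed-form transformation, we evaluate their potential for improving surface texture control in temper strip steel manufacturing.
Summary
Controlling surface texture in strip steel is essential for meeting customer requirements during galvanizing and temper rolling. Traditional methods rely on post-production stylus measurements, whereas on-line techniques provide non-contact, real-time measurements of the entire strip. Accurate measurement is critical: it enables real-time adjustment of process parameters during production, consistent product quality, and closed-loop control of the temper mill. This study uses state-of-the-art machine learning models to improve the transformation of on-line measurements into a more accurate Ra surface roughness metric. By comparing data-driven approaches, including deep learning and non-deep learning methods, with the closed-form transformation, we evaluate their potential for improving surface texture control.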
Quantum Solutions to the Privacy vs. Utility Tradeoff
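results: The paper provides a new architecture based on quantum cryptographic primitives that can be used on top of any existing classical or quantum generative model and that comes with strong security and privacy guarantees.
Abstract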
In this work, we propose a novel architecture (and several variants thereof) based on quantum cryptographic primitives with provable privacy and security guarantees regarding membership inference attacks on generative models. Our architecture can be used on top of any existing classical or quantum generative models. We argue that the use of quantum gates associated with unitary operators provides inherent advantages compared to standard Differential Privacy based techniques for establishing guaranteed security from all polynomial-time adversaries.
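Summary
In this work we propose a new architecture (and several variants) based on quantum cryptographic primitives, with provable privacy and security guarantees against membership inference attacks on generative models. The architecture can be used on top of any existing classical or quantum generative model. We argue that quantum gates associated with unitary operators offer inherent advantages over standard Differential Privacy based techniques for guaranteeing security against all polynomial-time adversaries.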
Region-Wise Attentive Multi-View Representation Learning for Urban Region Embeddings
results: Experiments on two downstream tasks with real-world datasets show that the model outperforms state-of-the-art methods by up to 17%.
Abstract
Urban region embedding is an important and yet highly challenging issue due to the complexity and constantly changing nature of urban data. To address the challenges, we propose Region-Wise Multi-View Representation Learning (ROMER) to capture multi-view dependencies and learn expressive representations of urban regions without the constraints of rigid neighbourhood region conditions. Our model focuses on learning urban region representations from multi-source urban data. First, we capture the multi-view correlations from mobility flow patterns, POI semantics and check-in dynamics. Then, we adopt global graph attention networks to learn the similarity of any two vertices in graphs. To comprehensively consider and share features of multiple views, a two-stage fusion module is further proposed to learn weights with external attention to fuse multi-view embeddings. Extensive experiments for two downstream tasks on real-world datasets demonstrate that our model outperforms state-of-the-art methods by up to 17%.
How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models
results: Experiments show that the proposed method can accurately detect unauthorized data usage in text-to-image diffusion models.
Abstract
Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized usage of data during the training process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission from the artist. To address this issue, it becomes crucial to detect unauthorized data usage. In this paper, we propose a method for detecting such unauthorized data usage by planting injected memorization into the text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image dataset by adding unique content to the images, such as stealthy image warping functions that are imperceptible to human vision but can be captured and memorized by diffusion models. By analyzing whether the model has memorized the injected content (i.e., whether the generated images are processed by the chosen post-processing function), we can detect models that illegally used the unauthorized data. Our experiments on Stable Diffusion and LoRA models demonstrate the effectiveness of the proposed method in detecting unauthorized data usage.
Summary
Recent text-to-image diffusion models have shown surprisingly good performance, but concerns have also been raised. For example, a model trainer may collect a set of images created by a particular artist and try to train a model that generates similar images without obtaining the artist's permission. To address this, it becomes important to detect unauthorized data usage. In this paper we propose a method for detecting it by planting injected memorization into text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image set by adding stealthy image warping functions that are imperceptible to human vision but can be memorized by the model. By checking whether a model has memorized the injected content (that is, whether its generated images have been processed by the chosen post-processing function), we can detect whether the model used unauthorized data. Experiments on Stable Diffusion and LoRA models demonstrate the effectiveness of the method.
Beyond Intuition, a Framework for Applying GPs to Real-World Data
methods: The framework covers kernel design, options for computational scalability, and how to set up a robust and well-specified GP model.
results: Applied to a case study of glacier elevation change, the framework yields more accurate results.
Abstract
Gaussian Processes (GPs) offer an attractive method for regression over small, structured and correlated datasets. However, their deployment is hindered by computational costs and limited guidelines on how to apply GPs beyond simple low-dimensional datasets. We propose a framework to identify the suitability of GPs to a given problem and how to set up a robust and well-specified GP model. The guidelines formalise the decisions of experienced GP practitioners, with an emphasis on kernel design and options for computational scalability. The framework is then applied to a case study of glacier elevation change yielding more accurate results at test time.
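As background for the kernel-design and scalability points above, the following is a minimal sketch of exact GP regression in one dimension with a squared-exponential (RBF) kernel. It is a textbook construction, not the paper's framework; the kernel choice, hyperparameter values, and function names (`rbf`, `gp_predict`) are assumptions for illustration. The Cholesky factorization is the step whose cubic cost in the number of training points limits scalability.

```python
import math

def rbf(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def solve_cholesky(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def gp_predict(x_train, y_train, x_test, noise=1e-2, lengthscale=1.0):
    """Posterior mean and variance of a zero-mean GP at one test input."""
    k = [[rbf(a, b, lengthscale) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    low = cholesky(k)                       # O(n^3): the scalability bottleneck
    alpha = solve_cholesky(low, y_train)    # K^{-1} y
    k_star = [rbf(x_test, a, lengthscale) for a in x_train]
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    v = solve_cholesky(low, k_star)         # K^{-1} k_*
    var = rbf(x_test, x_test, lengthscale) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var

if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [math.sin(x) for x in xs]
    print(gp_predict(xs, ys, 1.5))
    print(gp_predict(xs, ys, 10.0))  # far from the data: variance returns to the prior
```

The predictive variance is small near training points and reverts to the prior variance far from them, which is one reason GPs suit small, structured datasets where uncertainty estimates matter.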
A Novel Site-Agnostic Multimodal Deep Learning Model to Identify Pro-Eating Disorder Content on Social Media
for: This study aimed to create a multimodal deep learning model to determine if social media posts promote eating disorders based on visual and textual data.
methods: The study used a labeled dataset of Tweets and trained and tested twelve deep learning models, including a multimodal fusion of the RoBERTa natural language processing model and the MaxViT image classification model.
results: The RoBERTa and MaxViT fusion model achieved accuracy and F1 scores of 95.9% and 0.959, respectively, and was used to classify an unlabeled dataset of posts from Tumblr and Reddit. The model also uncovered a drastic decrease in the relative abundance of content that promotes eating disorders on eight Twitter hashtags since 2014, but with a resurgence by 2018.
Abstract
Over the last decade, there has been a vast increase in eating disorder diagnoses and eating disorder-attributed deaths, reaching their zenith during the Covid-19 pandemic. This immense growth derived in part from the stressors of the pandemic but also from increased exposure to social media, which is rife with content that promotes eating disorders. This study aimed to create a multimodal deep learning model that can determine if a given social media post promotes eating disorders based on a combination of visual and textual data. A labeled dataset of Tweets was collected from Twitter, upon which twelve deep learning models were trained and tested. Based on model performance, the most effective deep learning model was the multimodal fusion of the RoBERTa natural language processing model and the MaxViT image classification model, attaining accuracy and F1 scores of 95.9% and 0.959, respectively. The RoBERTa and MaxViT fusion model, deployed to classify an unlabeled dataset of posts from the social media sites Tumblr and Reddit, generated results akin to those of previous research studies that did not employ artificial intelligence-based techniques, indicating that deep learning models can develop insights congruent to those of researchers. Additionally, the model was used to conduct a timeseries analysis of yet unseen Tweets from eight Twitter hashtags, uncovering that, since 2014, the relative abundance of content that promotes eating disorders has decreased drastically within those communities. Despite this reduction, by 2018, content that promotes eating disorders had either stopped declining or increased in ampleness anew on these hashtags.
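Summary
Over the last decade, eating disorder diagnoses and deaths attributed to eating disorders have increased greatly, peaking during the Covid-19 pandemic. This growth stems partly from the stressors of the pandemic and partly from increased exposure to social media, where content that promotes eating disorders is widespread. This study aimed to create a multimodal deep learning model that can determine, from a combination of visual and textual data, whether a social media post promotes eating disorders. A labeled dataset of Tweets was collected from Twitter, and twelve deep learning models were trained and tested on it. The most effective model was the multimodal fusion of the RoBERTa natural language processing model and the MaxViT image classification model, with accuracy and F1 scores of 95.9% and 0.959, respectively. When used to classify an unlabeled dataset of posts from Tumblr and Reddit, the fusion model produced results similar to those of earlier studies that did not use artificial intelligence-based techniques, indicating that deep learning models can develop insights congruent with those of researchers. The model was also used for a time-series analysis of unseen Tweets from eight Twitter hashtags, which showed that the relative abundance of content promoting eating disorders in those communities has decreased drastically since 2014. By 2018, however, such content had either stopped declining or begun to increase again on these hashtags.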
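The abstract reports both accuracy and an F1 score for the binary classification task. The sketch below shows how the two metrics are computed from a confusion matrix; the label vectors are made-up toy data, not results from the study, and the function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1 for binary labels, with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1

if __name__ == "__main__":
    # Toy labels: 1 = post promotes eating disorders, 0 = it does not.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(accuracy_and_f1(y_true, y_pred))
```

Reporting F1 alongside accuracy matters when the classes are imbalanced, because a classifier that rarely predicts the positive class can still score high accuracy while having poor recall.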