results: The study finds that an ensemble of diverse models achieves better binding affinity prediction than individual models, while allowing control over input features such as physicochemical properties or molecular descriptors.
Abstract
The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts, as filtering potential candidates would save time and expenses for finding drugs. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Given many computational models for binding affinity prediction with varying results across targets, we herein develop a meta-modeling framework by integrating published empirical structure-based docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual models, training databases, and linear and nonlinear meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over individual base models. Our best meta-models achieve comparable performance to state-of-the-art exclusively structure-based deep learning tools. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain substantial improvement in binding affinity prediction while allowing control over input features such as physicochemical properties or molecular descriptors.
On Wasserstein distances for affine transformations of random vectors
methods: The paper computes the Bures metric between the covariance matrices of random vectors and derives upper bounds for compositions of affine maps.
results: The paper provides lower and upper bounds for the quadratic Wasserstein distance and applies them to various distributions, including those lying on a 1-dimensional manifold in $\mathbb{R}^2$. Additionally, the paper provides a framework for mimicking handwritten digit or alphabet datasets for use in a manifold learning framework.
Abstract
We expound on some known lower bounds of the quadratic Wasserstein distance between random vectors in $\mathbb{R}^n$ with an emphasis on affine transformations that have been used in manifold learning of data in Wasserstein space. In particular, we give concrete lower bounds for rotated copies of random vectors in $\mathbb{R}^2$ with uncorrelated components by computing the Bures metric between the covariance matrices. We also derive upper bounds for compositions of affine maps which yield a fruitful variety of diffeomorphisms applied to an initial data measure. We apply these bounds to various distributions including those lying on a 1-dimensional manifold in $\mathbb{R}^2$ and illustrate the quality of the bounds. Finally, we give a framework for mimicking handwritten digit or alphabet datasets that can be applied in a manifold learning framework.
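To make the covariance term concrete, the following sketch (illustrative, not the authors' code) evaluates the Gelbrich/Bures lower bound on the squared quadratic Wasserstein distance for a rotated copy of a 2D random vector with uncorrelated components, the setting treated in the paper:

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_sq(S1, S2):
    """Squared Bures metric between two covariance (PSD) matrices."""
    root = sqrtm(S1)
    return float(np.trace(S1 + S2 - 2 * sqrtm(root @ S2 @ root)).real)

def w2_sq_lower_bound(m1, S1, m2, S2):
    """Gelbrich bound: W_2^2 >= ||m1 - m2||^2 + Bures^2(S1, S2)."""
    return float(np.sum((m1 - m2) ** 2)) + bures_sq(S1, S2)

# 2D random vector with uncorrelated components, and a rotated copy of it.
S = np.diag([2.0, 0.5])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(w2_sq_lower_bound(np.zeros(2), S, np.zeros(2), R @ S @ R.T))
```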
LaTeX: Language Pattern-aware Triggering Event Detection for Adverse Experience during Pandemics
for: This paper aims to explore the role of social media platforms in highlighting and addressing socioeconomic disparities during the COVID-19 pandemic, specifically focusing on four major types of adverse experiences: loss of employment income, food scarcity, housing insecurity, and unmet needs for mental health services.
methods: The paper uses real-time data from Twitter to analyze language patterns related to the four adverse experiences, and proposes a sparsity optimization problem to extract low-level language features. The authors also propose novel constraints on feature similarity based on prior knowledge about the similarity of language patterns among the adverse experiences.
results: The proposed model is challenging to solve due to the non-convex objective and non-smooth penalties, but the authors develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the problem. Extensive experiments and comparisons to other models on real-world social media data justify the efficacy of their model in detecting adverse experiences.
Abstract
The COVID-19 pandemic has accentuated socioeconomic disparities across various racial and ethnic groups in the United States. While previous studies have utilized traditional survey methods like the Household Pulse Survey (HPS) to elucidate these disparities, this paper explores the role of social media platforms in both highlighting and addressing these challenges. Drawing from real-time data sourced from Twitter, we analyzed language patterns related to four major types of adverse experiences: loss of employment income (LI), food scarcity (FS), housing insecurity (HI), and unmet needs for mental health services (UM). We first formulate a sparsity optimization problem that extracts low-level language features from social media data sources. Second, we propose novel constraints on feature similarity exploiting prior knowledge about the similarity of the language patterns among the adverse experiences. The proposed problem is challenging to solve due to the non-convex objective and non-smooth penalties. We develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the proposed formulation. Extensive experiments and comparisons to other models on real-world social media data for the detection of adverse experiences justify the efficacy of our model.
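For readers unfamiliar with ADMM, the sketch below shows the generic update structure (a quadratic x-update, a proximal z-update, and a dual update) on a standard convex lasso problem. It is only a scaffolding illustration: the paper's actual objective is non-convex and adds the feature-similarity constraints described above.

```python
import numpy as np

def soft_threshold(v, kappa):
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam=0.1, rho=1.0, iters=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||z||_1 subject to x = z."""
    n = A.shape[1]
    z = np.zeros(n); u = np.zeros(n)
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))   # cached for every x-update
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)             # proximal step for the l1 term
        u = u + x - z                                    # scaled dual update
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
x_true = np.zeros(20); x_true[:3] = [1.5, -2.0, 0.7]
b = A @ x_true + 0.01 * rng.normal(size=100)
print(np.round(admm_lasso(A, b), 2))   # recovers a sparse coefficient vector
```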
Improving classifier decision boundaries using nearest neighbors
results: Improved resistance to label noise, robustness against adversarial attacks, classification accuracy, and, to some degree, interpretability.
for: The paper aims to improve the optimization of decision boundaries in neural networks.
methods: The proposed method uses a weighted average of the predictions of a sample and its nearest neighbors in latent space to improve the performance of neural networks.
results: The proposed method improves various important measures of neural networks, including resistance to label noise, robustness against adversarial attacks, classification accuracy, and interpretability. The improvements are not necessarily large in all four areas, but the approach is simple and does not require any modifications to the network architecture, training procedure, or dataset.
Abstract
Neural networks are not learning optimal decision boundaries. We show that decision boundaries are situated in areas of low training data density. They are impacted by few training samples which can easily lead to overfitting. We provide a simple algorithm performing a weighted average of the prediction of a sample and its nearest neighbors' (computed in latent space), leading to minor favorable outcomes for a variety of important measures for neural networks. In our evaluation, we employ various self-trained and pre-trained convolutional neural networks to show that our approach improves (i) resistance to label noise, (ii) robustness against adversarial attacks, (iii) classification accuracy, and to some degree even (iv) interpretability. While improvements are not necessarily large in all four areas, our approach is conceptually simple, i.e., improvements come without any modification to network architecture, training procedure or dataset. Furthermore, they are in stark contrast to prior works that often require trade-offs among the four objectives or provide valuable, but non-actionable insights.
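A minimal sketch of the core idea follows; the neighbor count and the mixing weight are assumptions made for illustration, not the authors' exact algorithm:

```python
import numpy as np

def knn_smoothed_prediction(latents, probs, query_idx, k=5, alpha=0.5):
    """latents: (N, d) latent features; probs: (N, C) softmax outputs.
    Returns a weighted average of the query's prediction and its k nearest
    neighbors' predictions, computed in latent space."""
    dists = np.linalg.norm(latents - latents[query_idx], axis=1)
    neighbors = np.argsort(dists)[1:k + 1]          # skip the query itself
    return alpha * probs[query_idx] + (1 - alpha) * probs[neighbors].mean(axis=0)

rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 16))                # e.g. penultimate-layer features
logits = rng.normal(size=(100, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(knn_smoothed_prediction(latents, probs, query_idx=0).argmax())
```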
Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control
results: The study finds that modulating the rank and sparsity of the recurrent connectivity makes the networks more stable and reliable in the online closed-loop setting, and that models with fewer parameters can perform better under distribution shift.
Abstract
Developing autonomous agents that can interact with changing environments is an open challenge in machine learning. Robustness is particularly important in these settings as agents are often fit offline on expert demonstrations but deployed online where they must generalize to the closed feedback loop within the environment. In this work, we explore the application of recurrent neural networks to tasks of this nature and understand how a parameterization of their recurrent connectivity influences robustness in closed-loop settings. Specifically, we represent the recurrent connectivity as a function of rank and sparsity and show both theoretically and empirically that modulating these two variables has desirable effects on network dynamics. The proposed low-rank, sparse connectivity induces an interpretable prior on the network that proves to be most amenable for a class of models known as closed-form continuous-time neural networks (CfCs). We find that CfCs with fewer parameters can outperform their full-rank, fully-connected counterparts in the online setting under distribution shift. This yields memory-efficient and robust agents while opening a new perspective on how we can modulate network dynamics through connectivity.
PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability
results: Compared with existing EnergyPlus modeling implementations, PyDCM's vectorized thermal calculations make it about 30 times faster, and it scales sublinearly with the number of CPUs. PyDCM also enables deep reinforcement learning via a Gymnasium wrapper to optimize data center cooling and offers a user-friendly platform for testing various data center design prototypes.
Abstract
The increasing global emphasis on sustainability and reducing carbon emissions is pushing governments and corporations to rethink their approach to data center design and operation. Given their high energy consumption and exponentially large computational workloads, data centers are prime candidates for optimizing power consumption, especially in areas such as cooling and IT energy usage. A significant challenge in this pursuit is the lack of a configurable and scalable thermal data center model that offers an end-to-end pipeline. Data centers consist of multiple IT components whose geometric configuration and heat dissipation make thermal modeling difficult. This paper presents PyDCM, a customizable Data Center Model implemented in Python, that allows users to create unique configurations of IT equipment with custom server specifications and geometric arrangements of IT cabinets. The use of vectorized thermal calculations makes PyDCM orders of magnitude faster (30 times) than current Energy Plus modeling implementations and scales sublinearly with the number of CPUs. Also, PyDCM enables the use of Deep Reinforcement Learning via the Gymnasium wrapper to optimize data center cooling and offers a user-friendly platform for testing various data center design prototypes.
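To illustrate how a thermal model can be exposed to reinforcement learning through the Gymnasium interface, here is a hypothetical toy environment; the class name, observation/action definitions, and dynamics are invented for illustration and are not PyDCM's actual API:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyDataCenterCoolingEnv(gym.Env):
    """Action: cooling setpoint offset; observation: rack inlet temperatures."""
    def __init__(self, n_racks=4):
        self.n_racks = n_racks
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=0.0, high=60.0, shape=(n_racks,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.temps = np.full(self.n_racks, 25.0, dtype=np.float32)
        return self.temps.copy(), {}

    def step(self, action):
        it_load = self.np_random.uniform(0.5, 1.5, size=self.n_racks)  # heat added by IT
        cooling = 1.0 + float(action[0])                               # crude cooling effect
        self.temps = np.clip(self.temps + it_load - cooling, 0.0, 60.0).astype(np.float32)
        reward = -float(self.temps.mean()) - 0.1 * cooling             # temperature vs. energy
        return self.temps.copy(), reward, False, False, {}

env = ToyDataCenterCoolingEnv()
obs, _ = env.reset(seed=0)
obs, reward, *_ = env.step(env.action_space.sample())
```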
Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond
results: We evaluate each design choice by the asymptotic estimation error it produces. We find that the NCE estimator is more efficient than the importance sampling estimator, but in the limit of infinitesimal path steps the difference vanishes. Second, the geometric path brings down the estimation error, and in certain limits the arithmetic path is optimal. Finally, we propose a two-step estimator to approximate the optimal path.
Abstract
Recent research has developed several Monte Carlo methods for estimating the normalization constant (partition function) based on the idea of annealing. This means sampling successively from a path of distributions that interpolate between a tractable "proposal" distribution and the unnormalized "target" distribution. Prominent estimators in this family include annealed importance sampling and annealed noise-contrastive estimation (NCE). Such methods hinge on a number of design choices: which estimator to use, which path of distributions to use and whether to use a path at all; so far, there is no definitive theory on which choices are efficient. Here, we evaluate each design choice by the asymptotic estimation error it produces. First, we show that using NCE is more efficient than the importance sampling estimator, but in the limit of infinitesimal path steps, the difference vanishes. Second, we find that using the geometric path brings down the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal distributions. Third, we find that the arithmetic path, while rarely used, can offer optimality properties over the universally-used geometric path. In fact, in a particular limit, the optimal path is arithmetic. Based on this theory, we finally propose a two-step estimator to approximate the optimal path in an efficient way.
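For intuition, the sketch below runs plain annealed importance sampling along a geometric path on a toy 1D problem; the number of steps, the Metropolis kernel, and the example densities are illustrative choices rather than the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
log_p0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)   # proposal N(0, 1), normalized
log_f1 = lambda x: -0.5 * ((x - 2.0) / 0.5) ** 2            # unnormalized target, Z = 0.5*sqrt(2*pi)

def ais_log_Z(n_samples=2000, n_steps=100):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    log_gamma = lambda x, b: (1 - b) * log_p0(x) + b * log_f1(x)   # geometric path
    x = rng.normal(size=n_samples)                                  # draws from the proposal
    log_w = np.zeros(n_samples)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_gamma(x, b) - log_gamma(x, b_prev)             # incremental importance weights
        # One Metropolis step leaving the current intermediate distribution invariant.
        prop = x + 0.5 * rng.normal(size=n_samples)
        accept = np.log(rng.uniform(size=n_samples)) < log_gamma(prop, b) - log_gamma(x, b)
        x = np.where(accept, prop, x)
    # Log of the estimated normalizing constant of the target.
    return np.logaddexp.reduce(log_w) - np.log(n_samples)

print(np.exp(ais_log_Z()), 0.5 * np.sqrt(2 * np.pi))  # estimate vs. ground truth
```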
CrysFormer: Protein Structure Prediction via 3d Patterson Maps and Partial Structure Attention
results: Using two new datasets of peptide fragments (2-residue and 15-residue), the paper shows that the method can accurately predict protein electron density maps while requiring a much smaller training dataset and lower computation costs than previous approaches.
Abstract
Determining the structure of a protein has been a decades-long open question. A protein's three-dimensional structure often poses nontrivial computation costs, when classical simulation algorithms are utilized. Advances in the transformer neural network architecture -- such as AlphaFold2 -- achieve significant improvements for this problem, by learning from a large dataset of sequence information and corresponding protein structures. Yet, such methods only focus on sequence information; other available prior knowledge, such as protein crystallography and partial structure of amino acids, could be potentially utilized. To the best of our knowledge, we propose the first transformer-based model that directly utilizes protein crystallography and partial structure information to predict the electron density maps of proteins. Via two new datasets of peptide fragments (2-residue and 15-residue) , we demonstrate our method, dubbed \texttt{CrysFormer}, can achieve accurate predictions, based on a much smaller dataset size and with reduced computation costs.
Class-Incremental Learning Using Generative Experience Replay Based on Time-aware Regularization
results: Experimental results show that, under a strict class-incremental setting with (i) constant model size, (ii) no pre-training dataset, and (iii) no memory buffer for storing past tasks' data, the method improves memory retention and average performance over continually arriving tasks.
Abstract
Learning new tasks accumulatively without forgetting remains a critical challenge in continual learning. Generative experience replay addresses this challenge by synthesizing pseudo-data points for past learned tasks and later replaying them for concurrent training along with the new tasks' data. Generative replay is the best strategy for continual learning under a strict class-incremental setting when certain constraints need to be met: (i) constant model size, (ii) no pre-training dataset, and (iii) no memory buffer for storing past tasks' data. Inspired by the biological nervous system mechanisms, we introduce a time-aware regularization method to dynamically fine-tune the three training objective terms used for generative replay: supervised learning, latent regularization, and data reconstruction. Experimental results on major benchmarks indicate that our method pushes the limit of brain-inspired continual learners under such strict settings, improves memory retention, and increases the average performance over continually arriving tasks.
Information Geometry for the Working Information Theorist
results: The article surveys recent developments in information geometry and its applications in areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport.
Abstract
Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community.
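For orientation, two standard facts the article builds on: the Fisher information matrix plays the role of the Riemannian metric on a statistical manifold, and the KL divergence between nearby distributions is locally quadratic in it:

$$
g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\frac{\partial \log p_\theta(x)}{\partial \theta_i}\,\frac{\partial \log p_\theta(x)}{\partial \theta_j}\right],
\qquad
D_{\mathrm{KL}}\big(p_\theta \,\|\, p_{\theta + \mathrm{d}\theta}\big) = \tfrac{1}{2}\,\mathrm{d}\theta^\top g(\theta)\,\mathrm{d}\theta + o(\|\mathrm{d}\theta\|^2).
$$

The divergences, generalized distances, and geodesics discussed in the article extend these relations well beyond the Fisher-Rao case.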
Non Commutative Convolutional Signal Models in Neural Networks: Stability to Small Deformations
results: The study finds that non-commutative filters can be stable to small perturbations, while exhibiting a trade-off between selectivity and stability similar to that observed for commutative models. Numerical experiments validate these results.
Abstract
In this paper we discuss the results recently published in~[1] about algebraic signal models (ASMs) based on non commutative algebras and their use in convolutional neural networks. Relying on the general tools from algebraic signal processing (ASP), we study the filtering and stability properties of non commutative convolutional filters. We show how non commutative filters can be stable to small perturbations on the space of operators. We also show that although the spectral components of the Fourier representation in a non commutative signal model are associated to spaces of dimension larger than one, there is a trade-off between stability and selectivity similar to that observed for commutative models. Our results have direct implications for group neural networks, multigraph neural networks and quaternion neural networks, among other non commutative architectures. We conclude by corroborating these results through numerical experiments.
results: The paper anticipates that knowledge of this tradeoff for a particular family of machine learning models, such as deep neural networks, can improve understanding of the practical and theoretical limits of creating and deploying models for resource-constrained tasks.
Abstract
In resource limited computing systems, sequence prediction models must operate under tight constraints. Various models are available that cater to prediction under these conditions that in some way focus on reducing the cost of implementation. These resource constrained sequence prediction models, in practice, exhibit a fundamental tradeoff between the cost of implementation and the quality of its predictions. This fundamental tradeoff seems to be largely unexplored for models for different tasks. Here we formulate the necessary theory and an associated empirical procedure to explore this tradeoff space for a particular family of machine learning models such as deep neural networks. We anticipate that the knowledge of the behavior of this tradeoff may be beneficial in understanding the theoretical and practical limits of creation and deployment of models for resource constrained tasks.
paper_authors: Ana Dodik, Oded Stein, Vincent Sitzmann, Justin Solomon
for: optimize for generalized barycentric coordinates and provide additional control over existing models
methods: use a variational technique and parameterize the continuous function that maps any coordinate in a polytope’s interior to its barycentric coordinates using a neural field
results: demonstrate the flexibility of the model using a variety of objective functions and present a thorough validation of the algorithm, as well as demonstrate several applications
Abstract
We propose a variational technique to optimize for generalized barycentric coordinates that offers additional control compared to existing models. Prior work represents barycentric coordinates using meshes or closed-form formulae, in practice limiting the choice of objective function. In contrast, we directly parameterize the continuous function that maps any coordinate in a polytope's interior to its barycentric coordinates using a neural field. This formulation is enabled by our theoretical characterization of barycentric coordinates, which allows us to construct neural fields that parameterize the entire function class of valid coordinates. We demonstrate the flexibility of our model using a variety of objective functions, including multiple smoothness and deformation-aware energies; as a side contribution, we also present mathematically-justified means of measuring and minimizing objectives like total variation on discontinuous neural fields. We offer a practical acceleration strategy, present a thorough validation of our algorithm, and demonstrate several applications.
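As background, generalized barycentric coordinates $w_1, \dots, w_n$ associated with the vertices $v_1, \dots, v_n$ of a polytope are exactly the function class the neural field must parameterize; the standard defining properties are non-negativity, partition of unity, and linear reproduction:

$$
w_i(x) \ge 0, \qquad \sum_{i=1}^{n} w_i(x) = 1, \qquad \sum_{i=1}^{n} w_i(x)\, v_i = x,
$$

for every $x$ in the polytope's interior; the paper's theoretical characterization is what makes it possible to build neural fields spanning all valid choices of such $w_i$.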
Euclid: Identification of asteroid streaks in simulated images using deep learning
paper_authors: M. Pöntinen, M. Granvik, A. A. Nucita, L. Conversi, B. Altieri, B. Carry, C. M. O’Riordan, D. Scott, N. Aghanim, A. Amara, L. Amendola, N. Auricchio, M. Baldi, D. Bonino, E. Branchini, M. Brescia, S. Camera, V. Capobianco, C. Carbone, J. Carretero, M. Castellano, S. Cavuoti, A. Cimatti, R. Cledassou, G. Congedo, Y. Copin, L. Corcione, F. Courbin, M. Cropper, A. Da Silva, H. Degaudenzi, J. Dinis, F. Dubath, X. Dupac, S. Dusini, S. Farrens, S. Ferriol, M. Frailis, E. Franceschi, M. Fumana, S. Galeotta, B. Garilli, W. Gillard, B. Gillis, C. Giocoli, A. Grazian, S. V. H. Haugan, W. Holmes, F. Hormuth, A. Hornstrup, K. Jahnke, M. Kümmel, S. Kermiche, A. Kiessling, T. Kitching, R. Kohley, M. Kunz, H. Kurki-Suonio, S. Ligori, P. B. Lilje, I. Lloro, E. Maiorano, O. Mansutti, O. Marggraf, K. Markovic, F. Marulli, R. Massey, E. Medinaceli, S. Mei, M. Melchior, Y. Mellier, M. Meneghetti, G. Meylan, M. Moresco, L. Moscardini, E. Munari, S. -M. Niemi, T. Nutma, C. Padilla, S. Paltani, F. Pasian, K. Pedersen, V. Pettorino, S. Pires, G. Polenta, M. Poncet, F. Raison, A. Renzi, J. Rhodes, G. Riccio, E. Romelli, M. Roncarelli, E. Rossetti, R. Saglia, D. Sapone, B. Sartoris, P. Schneider, A. Secroun, G. Seidel, S. Serrano, C. Sirignano, G. Sirri, L. Stanco, P. Tallada-Crespí, A. N. Taylor, I. Tereno, R. Toledo-Moreo, F. Torradeflot, I. Tutusaus, L. Valenziano, T. Vassallo, G. Verdoes Kleijn, Y. Wang, J. Weller, G. Zamorani, J. Zoubian, V. Scottez
for: Helping the ESA Euclid space telescope detect more asteroids
methods: A deep learning pipeline consisting of a convolutional neural network, a recurrent neural network, and gradient-boosted trees
results: Higher completeness and similar purity compared with the StreakDet software, detection of asteroids 0.25-0.5 magnitudes fainter, and a potential 50% increase in the number of detected asteroids
Abstract
Up to 150000 asteroids will be visible in the images of the ESA Euclid space telescope, and the instruments of Euclid offer multiband visual to near-infrared photometry and slitless spectra of these objects. Most asteroids will appear as streaks in the images. Due to the large number of images and asteroids, automated detection methods are needed. A non-machine-learning approach based on the StreakDet software was previously tested, but the results were not optimal for short and/or faint streaks. We set out to improve the capability to detect asteroid streaks in Euclid images by using deep learning. We built, trained, and tested a three-step machine-learning pipeline with simulated Euclid images. First, a convolutional neural network (CNN) detected streaks and their coordinates in full images, aiming to maximize the completeness (recall) of detections. Then, a recurrent neural network (RNN) merged snippets of long streaks detected in several parts by the CNN. Lastly, gradient-boosted trees (XGBoost) linked detected streaks between different Euclid exposures to reduce the number of false positives and improve the purity (precision) of the sample. The deep-learning pipeline surpasses the completeness and reaches a similar level of purity of a non-machine-learning pipeline based on the StreakDet software. Additionally, the deep-learning pipeline can detect asteroids 0.25-0.5 magnitudes fainter than StreakDet. The deep-learning pipeline could result in a 50% increase in the number of detected asteroids compared to the StreakDet software. There is still scope for further refinement, particularly in improving the accuracy of streak coordinates and enhancing the completeness of the final stage of the pipeline, which involves linking detections across multiple exposures.
Chameleon: Increasing Label-Only Membership Leakage with Adaptive Poisoning
results: Comparisons across multiple experiments show that the Chameleon attack performs membership inference much more effectively than existing label-only attacks, especially at low false positive rates (FPRs).
Abstract
The integration of machine learning (ML) in numerous critical applications introduces a range of privacy concerns for individuals who provide their datasets for model training. One such privacy risk is Membership Inference (MI), in which an attacker seeks to determine whether a particular data sample was included in the training dataset of a model. Current state-of-the-art MI attacks capitalize on access to the model's predicted confidence scores to successfully perform membership inference, and employ data poisoning to further enhance their effectiveness. In this work, we focus on the less explored and more realistic label-only setting, where the model provides only the predicted label on a queried sample. We show that existing label-only MI attacks are ineffective at inferring membership in the low False Positive Rate (FPR) regime. To address this challenge, we propose a new attack Chameleon that leverages a novel adaptive data poisoning strategy and an efficient query selection method to achieve significantly more accurate membership inference than existing label-only attacks, especially at low FPRs.
Learning A Disentangling Representation For PU Learning
paper_authors: Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy
for: Addresses the problem of learning a binary classifier given Positive and Unlabeled data (PU learning) in high-dimensional settings.
methods: Proposes a neural network-based data representation trained with a loss function that projects unlabeled data into two clusters, amplified by vector quantization.
results: Demonstrates improved performance compared to current state-of-the-art approaches on simulated PU data, with theoretical justification for the two-cluster-based approach and algorithmic choices.
Abstract
In this paper, we address the problem of learning a binary (positive vs. negative) classifier given Positive and Unlabeled data commonly referred to as PU learning. Although rudimentary techniques like clustering, out-of-distribution detection, or positive density estimation can be used to solve the problem in low-dimensional settings, their efficacy progressively deteriorates with higher dimensions due to the increasing complexities in the data distribution. In this paper we propose to learn a neural network-based data representation using a loss function that can be used to project the unlabeled data into two (positive and negative) clusters that can be easily identified using simple clustering techniques, effectively emulating the phenomenon observed in low-dimensional settings. We adopt a vector quantization technique for the learned representations to amplify the separation between the learned unlabeled data clusters. We conduct experiments on simulated PU data that demonstrate the improved performance of our proposed method compared to the current state-of-the-art approaches. We also provide some theoretical justification for our two cluster-based approach and our algorithmic choices.
Logical Languages Accepted by Transformer Encoders with Hard Attention
paper_authors: Pablo Barcelo, Alexander Kozachinskiy, Anthony Widjaja Lin, Vladimir Podolskii
for: This work studies which formal languages can be recognized by transformer encoders.
methods: The paper studies two self-attention mechanisms: Unique Hard Attention Transformers (UHAT) and Average Hard Attention Transformers (AHAT). UHAT encoders can only recognize languages inside the circuit complexity class ${\sf AC}^0$, whereas AHAT encoders can recognize languages outside ${\sf AC}^0$, although their expressive power still lies within ${\sf TC}^0$.
results: We first show that there is an ${\sf AC}^0$ language that UHAT encoders cannot recognize. On the positive side, UHAT encoders can recognize a rich fragment of ${\sf AC}^0$ languages, namely all languages definable in first-order logic with arbitrary unary numerical predicates, which includes all regular languages in ${\sf AC}^0$. We further show that AHAT encoders can recognize all languages of this logic even when it is enriched with counting terms. From these results we derive new conclusions on the expressive power of UHAT and AHAT up to permutation of letters (Parikh images).
Abstract
We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class ${\sf AC}^0$, i.e., accepted by a family of poly-sized and depth-bounded boolean circuits with unbounded fan-ins. On the other hand, AHAT encoders can recognize languages outside ${\sf AC}^0$, but their expressive power still lies within the bigger circuit complexity class ${\sf TC}^0$, i.e., ${\sf AC}^0$-circuits extended by majority gates. We first show a negative result that there is an ${\sf AC}^0$-language that cannot be recognized by an UHAT encoder. On the positive side, we show that UHAT encoders can recognize a rich fragment of ${\sf AC}^0$-languages, namely, all languages definable in first-order logic with arbitrary unary numerical predicates. This logic includes, for example, all regular languages from ${\sf AC}^0$. We then show that AHAT encoders can recognize all languages of our logic even when we enrich it with counting terms. We apply these results to derive new results on the expressive power of UHAT and AHAT up to permutation of letters (a.k.a. Parikh images).
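A toy sketch of the two attention variants (illustrative only, not the paper's formal definitions): unique hard attention returns the value at a single maximizing position, while average hard attention averages the values over all maximizing positions:

```python
import numpy as np

def unique_hard_attention(scores, values):
    """Attend to a single maximizing position (here the leftmost one)."""
    return values[np.argmax(scores)]

def average_hard_attention(scores, values):
    """Average the values over every position achieving the maximal score."""
    mask = scores == scores.max()
    return values[mask].mean(axis=0)

scores = np.array([1.0, 3.0, 3.0, 0.5])          # two positions tie for the maximum
values = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [5.0, 5.0]])
print(unique_hard_attention(scores, values))     # [0. 1.]
print(average_hard_attention(scores, values))    # [1.  1.5]
```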
Fishnets: Information-Optimal, Scalable Aggregation for Sets and Graphs
results: Experiments show that Fishnets learns information-optimal embeddings for sets of data and remains robust to changes in data distribution. Fishnets can also be used as a drop-in aggregation scheme within existing GNN architectures, achieving better performance with fewer learnable parameters and faster training time.
Abstract
Set-based learning is an essential component of modern deep learning and network science. Graph Neural Networks (GNNs) and their edge-free counterparts Deepsets have proven remarkably useful on ragged and topologically challenging datasets. The key to learning informative embeddings for set members is a specified aggregation function, usually a sum, max, or mean. We propose Fishnets, an aggregation strategy for learning information-optimal embeddings for sets of data for both Bayesian inference and graph aggregation. We demonstrate that i) Fishnets neural summaries can be scaled optimally to an arbitrary number of data objects, ii) Fishnets aggregations are robust to changes in data distribution, unlike standard deepsets, iii) Fishnets saturate Bayesian information content and extend to regimes where MCMC techniques fail and iv) Fishnets can be used as a drop-in aggregation scheme within GNNs. We show that by adopting a Fishnets aggregation scheme for message passing, GNNs can achieve state-of-the-art performance versus architecture size on ogbn-protein data over existing benchmarks with a fraction of learnable parameters and faster training time.
Droplets of Good Representations: Grokking as a First Order Phase Transition in Two Layer Networks
results: The study finds that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition, in which the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.
Abstract
A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in recently reported Grokking phenomena. While mainly reflected as a sudden increase in test accuracy, Grokking is also believed to be a beyond lazy-learning/Gaussian Process (GP) phenomenon involving feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models with cubic-polynomial and modular addition teachers. We provide analytical predictions on feature learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.
The Un-Kidnappable Robot: Acoustic Localization of Sneaking People
results: The method is implemented on a robot, allowing it to track a single person moving quietly using only passive audio sensing; demonstration videos are available on the project page: https://sites.google.com/view/unkidnappable-robot
Abstract
How easy is it to sneak up on a robot? We examine whether we can detect people using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings. We train models that predict if there is a moving person nearby and their location using only audio. We implement our method on a robot, allowing it to track a single person moving quietly with only passive audio sensing. For demonstration videos, see our project page: https://sites.google.com/view/unkidnappable-robot
Stochastic interpolants with data-dependent couplings
methods: The paper uses the framework of stochastic interpolants to formally \textit{couple} the base density and the target density, and then learns the transport maps by solving a simple square loss regression problem.
results: Experiments show that constructing dependent couplings yields better results in super-resolution and in-painting.
Abstract
Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
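Schematically, one common linear choice within the stochastic interpolant framework looks as follows; the paper's contribution is that the pair $(x_0, x_1)$ is drawn from a dependent coupling (for example, $x_0$ a degraded version of $x_1$, or a draw conditioned on a class label) rather than independently:

$$
x_t = (1-t)\,x_0 + t\,x_1 + \gamma(t)\,z, \qquad
\hat b = \arg\min_{b} \int_0^1 \mathbb{E}\Big[\big\| b(t, x_t) - (x_1 - x_0) - \dot\gamma(t)\,z \big\|^2\Big]\,\mathrm{d}t,
$$

with $z \sim \mathcal{N}(0, I)$ and $\gamma(0) = \gamma(1) = 0$; samples are then generated by integrating $\dot x = \hat b(t, x)$ from $t = 0$ to $t = 1$.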
Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance
methods: The paper uses generalized nonintegrable martingales and an extended Ville's inequality to construct confidence sequences and sequential tests.
results: The paper develops two new e-processes and confidence sequences, obtained respectively by swapping Lai's flat mixture for a Gaussian mixture and swapping the right Haar mixture over $\sigma$ for the maximum likelihood estimate under the null. It also analyzes the width of the resulting confidence sequences, which has a curious dependence on the error probability $\alpha$, and provides numerical experiments to compare and contrast the various approaches.
Abstract
In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.
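For orientation, the classical form of Ville's inequality for a nonnegative supermartingale $(M_t)_{t \ge 0}$ with $M_0 = 1$ reads

$$
\mathbb{P}\big(\exists\, t \ge 0 : M_t \ge 1/\alpha\big) \le \alpha,
$$

so a family of such processes $M_t(\mu)$, one per candidate mean, yields the confidence sequence $C_t = \{\mu : M_t(\mu) < 1/\alpha\}$, which covers the true mean at all times simultaneously with probability at least $1-\alpha$. The constructions in this paper rely on an extended version of the inequality suited to Lai's nonintegrable martingale, and on e-processes for the unknown-variance Gaussian setting.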
HeaP: Hierarchical Policies for Web Actions using LLMs
results: The paper evaluates HeaP against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, and live website interactions, and shows that it outperforms prior works while using orders of magnitude less data.
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in performing a range of instruction following tasks in few and zero-shot settings. However, teaching LLMs to perform tasks on the web presents fundamental challenges -- combinatorially large open-world tasks and variations across web interfaces. We tackle these challenges by leveraging LLMs to decompose web tasks into a collection of sub-tasks, each of which can be solved by a low-level, closed-loop policy. These policies constitute a shared grammar across tasks, i.e., new web tasks can be expressed as a composition of these policies. We propose a novel framework, Hierarchical Policies for Web Actions using LLMs (HeaP), that learns a set of hierarchical LLM prompts from demonstrations for planning high-level tasks and executing them via a sequence of low-level policies. We evaluate HeaP against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, as well as live website interactions, and show that it is able to outperform prior works using orders of magnitude less data.
Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities
for: Investigate the variational optimality of neural architectures with multivariate nonlinearities/activation functions.
methods: Construct a new family of Banach spaces using regularization operators and the $k$-plane transform, and prove a representer theorem.
results: Optimal neural architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models; the results shed light on the regularity of functions learned by neural networks trained on data with multivariate nonlinearities.
Abstract
We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
Multimarginal generative modeling with stochastic interpolants
results: Proposes a multimarginal generative model that can extract multi-way correspondences, with applications to style transfer, algorithmic fairness, and data decorruption. The multimarginal perspective also enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting; numerical examples demonstrate these capabilities.
Abstract
Given a set of $K$ probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers these densities as marginals. The structure of this joint distribution should identify multi-way correspondences among the prescribed marginals. We formalize an approach to this task within a generalization of the stochastic interpolant framework, leading to efficient learning algorithms built upon dynamical transport of measure. Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple quadratic objectives, and they are defined on a simplex that generalizes the time variable in the usual dynamical transport framework. The resulting transport on the simplex is influenced by all marginals, and we show that multi-way correspondences can be extracted. The identification of such correspondences has applications to style transfer, algorithmic fairness, and data decorruption. In addition, the multimarginal perspective enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting. We demonstrate these capacities with several numerical examples.
Hadamard Domain Training with Integers for Class Incremental Quantized Learning
paper_authors: Martin Schiemer, Clemens JS Schaefer, Jayden Parker Vap, Mark James Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi
for: Enabling continual learning on resource-constrained edge platforms for applications with privacy and low-latency requirements.
methods: Uses inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications, applying stochastic rounding to selected tensors and tiled matrix multiplication for low-bit-width accumulators.
results: On human activity recognition datasets and CIFAR100 in a class-incremental learning setting, accuracy degrades by less than 0.5% and 3%, respectively, while all matrix multiplication inputs are quantized down to 4 bits with 8-bit accumulators.
Abstract
Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constraint edge platforms. Reducing computational precision through fully quantized training (FQT) simultaneously reduces memory footprint and increases compute efficiency for both training and inference. However, aggressive quantization especially integer FQT typically degrades model accuracy to unacceptable levels. In this paper, we propose a technique that leverages inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications. We further determine which tensors need stochastic rounding and propose tiled matrix multiplication to enable low-bit width accumulators. We demonstrate the effectiveness of our technique on several human activity recognition datasets and CIFAR100 in a class incremental learning setting. We achieve less than 0.5% and 3% accuracy degradation while we quantize all matrix multiplications inputs down to 4-bits with 8-bit accumulators.
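A minimal numerical sketch of the core trick (assumed details, not the paper's implementation): since a Hadamard matrix satisfies $H H^\top = n I$, a dot product can be computed from the transformed operands via $\langle x, w\rangle = \langle Hx, Hw\rangle / n$, and the transform spreads activation outliers across all coordinates so that aggressive integer quantization loses less information. The paper additionally tiles the multiplication so that 8-bit accumulators suffice; the toy below uses a wide accumulator for simplicity.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def quantize(v, bits=4):
    """Symmetric per-tensor quantization to signed integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(v).max() / qmax + 1e-12
    return np.round(v / scale).astype(np.int32), scale

n = 64
rng = np.random.default_rng(0)
x = rng.normal(size=n); x[3] = 20.0     # an activation outlier that hurts plain int4
w = rng.normal(size=n)

H = hadamard(n)
xq, sx = quantize(H @ x)                # 4-bit operands in the Hadamard domain
wq, sw = quantize(H @ w)
dot_int = xq.astype(np.int64) @ wq.astype(np.int64)    # integer multiply-accumulate
print(dot_int * sx * sw / n, x @ w)     # dequantized estimate vs. exact dot product
```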
Strategic Evaluation: Subjects, Evaluators, and Society
paper_authors: Benjamin Laufer, Jon Kleinberg, Karen Levy, Helen Nissenbaum
for: The paper is written to explore the idea that evaluations can be understood as strategic interactions between the evaluator and the subject of evaluation, and how this can lead to misaligned goals and moral judgments.
methods: The paper uses a model with three interacting agents - the decision subject, the evaluator, and society - to represent the process of evaluation and the strategic behaviors that can arise.
results: The paper highlights the applicability of the model to a number of social systems where one or two players strategically undermine the others' interests to advance their own, and argues that the moral standing of strategic behaviors depends on the moral standing of the evaluations and incentives that provoke such behaviors.
Abstract
A broad current application of algorithms is in formal and quantitative measures of murky concepts -- like merit -- to make decisions. When people strategically respond to these sorts of evaluations in order to gain favorable decision outcomes, their behavior can be subjected to moral judgments. They may be described as 'gaming the system' or 'cheating,' or (in other cases) investing 'honest effort' or 'improving.' Machine learning literature on strategic behavior has tried to describe these dynamics by emphasizing the efforts expended by decision subjects hoping to obtain a more favorable assessment -- some works offer ways to preempt or prevent such manipulations, some differentiate 'gaming' from 'improvement' behavior, while others aim to measure the effort burden or disparate effects of classification systems. We begin from a different starting point: that the design of an evaluation itself can be understood as furthering goals held by the evaluator which may be misaligned with broader societal goals. To develop the idea that evaluation represents a strategic interaction in which both the evaluator and the subject of their evaluation are operating out of self-interest, we put forward a model that represents the process of evaluation using three interacting agents: a decision subject, an evaluator, and society, representing a bundle of values and oversight mechanisms. We highlight our model's applicability to a number of social systems where one or two players strategically undermine the others' interests to advance their own. Treating evaluators as themselves strategic allows us to re-cast the scrutiny directed at decision subjects, towards the incentives that underpin institutional designs of evaluations. The moral standing of strategic behaviors often depend on the moral standing of the evaluations and incentives that provoke such behaviors.
Extreme sparsification of physics-augmented neural networks for interpretable model discovery in mechanics
results: 论文表明,这种方法可靠地获得可解释性和信任性的 constitutive 模型,并且可以应用于压缩和不压缩的 hyperelasticity、yield 函数和硬化模型。Abstract
Data-driven constitutive modeling with neural networks has received increased interest in recent years due to its ability to easily incorporate physical and mechanistic constraints and to overcome the challenging and time-consuming task of formulating phenomenological constitutive laws that can accurately capture the observed material response. However, even though neural network-based constitutive laws have been shown to generalize proficiently, the generated representations are not easily interpretable due to their high number of trainable parameters. Sparse regression approaches exist that allow obtaining interpretable expressions, but the user is tasked with creating a library of model forms which by construction limits their expressiveness to the functional forms provided in the libraries. In this work, we propose to train regularized physics-augmented neural network-based constitutive models utilizing a smoothed version of $L^{0}$-regularization. This aims to maintain the trustworthiness inherited by the physical constraints, but also enables interpretability which has not been possible thus far on any type of machine learning-based constitutive model where model forms were not assumed a priori but were actually discovered. During the training process, the network simultaneously fits the training data and penalizes the number of active parameters, while also ensuring constitutive constraints such as thermodynamic consistency. We show that the method can reliably obtain interpretable and trustworthy constitutive models for compressible and incompressible hyperelasticity, yield functions, and hardening models for elastoplasticity, for synthetic and experimental data.
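As a rough illustration of the core idea, the sketch below fits a small polynomial hyperelastic energy to synthetic uniaxial data while penalizing a smoothed L0 surrogate, so that only the terms needed to explain the data stay active. The specific smoothing (1 - exp(-|w|/beta)), the neo-Hookean-style target, and all hyperparameters are assumptions of this toy, not the paper's formulation.

```python
import torch

def smoothed_l0(params, beta=0.05):
    """Smooth surrogate of the L0 norm: sum_i (1 - exp(-|w_i|/beta)) counts near-nonzero entries."""
    return sum((1.0 - torch.exp(-p.abs() / beta)).sum() for p in params)

# Toy data: incompressible uniaxial stretch with a neo-Hookean-like energy W = 0.5 * (I1 - 3)
lam = torch.linspace(0.8, 1.4, 50)
i1 = lam ** 2 + 2.0 / lam                          # first invariant for this deformation
target = 0.5 * (i1 - 3.0)

coeffs = torch.zeros(6, requires_grad=True)        # candidate energy: sum_k c_k * (I1 - 3)^k, k = 1..6
opt = torch.optim.Adam([coeffs], lr=1e-2)
for _ in range(3000):
    basis = torch.stack([(i1 - 3.0) ** k for k in range(1, 7)], dim=1)
    loss = torch.mean((basis @ coeffs - target) ** 2) + 1e-3 * smoothed_l0([coeffs])
    opt.zero_grad(); loss.backward(); opt.step()

print("discovered coefficients:", coeffs.detach().numpy().round(3))  # ideally close to [0.5, 0, 0, 0, 0, 0]
```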
for: Ensuring equitable outcomes in human-AI collaboration, especially when human decision-makers do not comply perfectly with algorithmic decisions.
methods: Propose a new approach called compliance-robustly fair algorithmic recommendations, which are guaranteed to improve fairness in decisions regardless of the human’s compliance pattern. An optimization strategy is also proposed to identify the best performance-improving compliance-robustly fair policy.
results: Show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy, which means that enforcing traditional fairness constraints may not be desirable if our goal is to improve the equity and accuracy of human-AI collaboration.Abstract
Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.
摘要
现有的算法公平方法尝试确保算法决策的结果是公平的,只要人类决策者完全遵循算法的决策。然而,完美地遵循算法是非常rare的现实或者even desirable outcome in human-AI collaboration。实际上,latest studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy.因此,保证公平的结果需要fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern。我们定义了“compliance-robustly fair” algorithmic recommendations,meaning that the recommendations are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern。我们还提出了一种简单的优化策略来标识最佳性能改进的compliance-robustly fair policy。然而,我们表明,可能无法设计算法建议,同时是孤立的公平,compliance-robustly fair,和人类政策更高的准确性。因此,如果我们的目标是提高人类-AI合作的公平和准确性,那么可能不是desirable to enforce traditional fairness constraints。
Distributional PAC-Learning from Nisan’s Natural Proofs
results: 论文的主要结论是,在某些特定的自然证明情况下,可以有效地学习Lambda-Circuit在新的分布型PAC模型中,并且可以应用于深度2的多数票电路、多面体和DNF等问题。此外,论文还证明了这种模型的一些重要性质和应用。Abstract
(Abridged) Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for \Lambda imply efficient algorithms for learning \Lambda-circuits, but only over the uniform distribution, with membership queries, and provided \AC^0[p] \subseteq \Lambda. We consider whether this implication can be generalized to \Lambda \not\supseteq \AC^0[p], and to learning algorithms in Valiant's PAC model, which use only random examples and learn over arbitrary example distributions. We give results of both positive and negative flavor. On the negative side, we observe that if, for every circuit class \Lambda, the implication from natural proofs for \Lambda to learning \Lambda-circuits in Valiant's PAC model holds, then there is a polynomial time solution to O(n^{1.5})-uSVP (unique Shortest Vector Problem), and polynomial time quantum solutions to O(n^{1.5})-SVP (Shortest Vector Problem) and O(n^{1.5})-SIVP (Shortest Independent Vector Problem). This indicates that whether natural proofs for \Lambda imply efficient learning algorithms for \Lambda in Valiant's PAC model may depend on \Lambda. On the positive side, our main result is that specific natural proofs arising from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits) imply PAC-learning algorithms in a new distributional variant of Valiant's model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being boosting-friendly. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.
摘要
(简化版) 卡尔莫西诺等 (2016) 表明自然证明的电路下界可以efficient地学习Lambda电路,但只有在均匀分布下,使用会员查询,并且要求\AC^0[p] \subseteq \Lambda。我们考虑了这种推论是否可以推广到\Lambda不包含\AC^0[p],以及在 Valiant 的PAC模型中学习算法,使用随机示例并学习到任意示例分布。我们得到了正面和负面的结果。负面方面,我们发现如果对每个电路类\Lambda,自然证明可以导致学习\Lambda电路在 Valiant 的PAC模型中,那么 O(n^{1.5})-uSVP(唯一最短向量问题)存在多项式时间解法。这表明自然证明是否可以导致学习\Lambda电路可能取决于\Lambda。正面方面,我们的主要结果是特定的自然证明, originating from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits),可以导致 PAC-learning algorithms in a new distributional variant of Valiant's model。我们的分布型PAC模型更强于Blum等人 (1993) 的平均情况预测模型和Nanashima (2021) 的启发式PAC模型,并具有许多重要的性质,例如对 boosting 友好。主要应用包括新的分布型PAC-learning算法用于深度2的多数电路、多面体和 DNFs over natural target distributions,以及不存在可由 depth-2 majority circuits 计算的编码输入弱PRFs。
results: 提供了一个开源的工具,可以帮助研究者更容易地解决分类问题,不需要深入的机器学习知识。Abstract
Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.
摘要
machine learning 分类问题在生物信息学中广泛,但技术知识要求进行模型训练、优化和推理可能会阻碍研究人员使用这种技术。本文介绍了一个自动化工具来简化机器学习分类问题的训练模型和获得结果,并提供了有用的可视化和数据分析。这个工具支持 binary 和多类分类问题,并提供了多种模型和方法。在界面中,您可以生成假数据来填充缺失的值、平衡类别标签或生成全新的数据集。此外,它还提供了特征评估和生成可读性分数,以指示影响输出的特征。我们介绍了 CLASSify,一个开源的工具来简化解决分类问题的用户体验,无需了解机器学习知识。
Adversarial Machine Learning for Social Good: Reframing the Adversary as an Ally
results: 研究发现,AdvML4G 领域的工作具有很高的创新性和可行性,但同时也存在一些挑战和未解决的问题,需要进一步的研究和发展。Abstract
Deep Neural Networks (DNNs) have been the driving force behind many of the recent advances in machine learning. However, research has shown that DNNs are vulnerable to adversarial examples -- input samples that have been perturbed to force DNN-based models to make errors. As a result, Adversarial Machine Learning (AdvML) has gained a lot of attention, and researchers have investigated these vulnerabilities in various settings and modalities. In addition, DNNs have also been found to incorporate embedded bias and often produce unexplainable predictions, which can result in anti-social AI applications. The emergence of new AI technologies that leverage Large Language Models (LLMs), such as ChatGPT and GPT-4, increases the risk of producing anti-social applications at scale. AdvML for Social Good (AdvML4G) is an emerging field that repurposes the AdvML bug to invent pro-social applications. Regulators, practitioners, and researchers should collaborate to encourage the development of pro-social applications and hinder the development of anti-social ones. In this work, we provide the first comprehensive review of the emerging field of AdvML4G. This paper encompasses a taxonomy that highlights the emergence of AdvML4G, a discussion of the differences and similarities between AdvML4G and AdvML, a taxonomy covering social good-related concepts and aspects, an exploration of the motivations behind the emergence of AdvML4G at the intersection of ML4G and AdvML, and an extensive summary of the works that utilize AdvML4G as an auxiliary tool for innovating pro-social applications. Finally, we elaborate upon various challenges and open research issues that require significant attention from the research community.
摘要
深度神经网络(DNN)已经成为机器学习的驱动力,但是研究表明,DNN受到了攻击性示例的影响,导致机器学习领域内的攻击机器学习(AdvML)得到了广泛的关注。研究人员在不同的场景和模式下调查了这些攻击性示例,并发现DNN中嵌入的偏见和不可解释的预测结果,可能导致反社会的AI应用程序。新的AI技术的出现,如大语言模型(LLM),如ChatGPT和GPT-4,会增加反社会应用程序的风险。为了鼓励开发负面向社会的应用程序,并阻止开发反社会应用程序, regulators、实践者和研究人员应该共同合作。在这篇评论中,我们提供了机器学习领域的第一份全面评论,涵盖了AdvML4G的出现、与AdvML的区别和相似性、社会好的相关概念和方面、AdvML4G的动机以及使用AdvML4G作为创新负面向社会应用程序的auxiliary工具的各种研究工作。最后,我们还详细介绍了需要研究社区的各种挑战和开放问题。
GENER: A Parallel Layer Deep Learning Network To Detect Gene-Gene Interactions From Gene Expression Data
paper_authors: Ahmed Fakhry, Raneem Khafagy, Adriaan-Alexander Ludl
for: identifying novel gene-gene interactions based on known gene expressions and interaction data
methods: parallel-layer deep learning network (GENER) using gene expression data
results: outperformed competing methods with an average AUROC score of 0.834 on the combined BioGRID&DREAM5 datasetAbstract
Detecting and discovering new gene interactions based on known gene expressions and gene interaction data presents a significant challenge. Various statistical and deep learning methods have attempted to tackle this challenge by leveraging the topological structure of gene interactions and gene expression patterns to predict novel gene interactions. In contrast, some approaches have focused exclusively on utilizing gene expression profiles. In this context, we introduce GENER, a parallel-layer deep learning network designed exclusively for the identification of gene-gene relationships using gene expression data. We conducted two training experiments and compared the performance of our network with that of existing statistical and deep learning approaches. Notably, our model achieved an average AUROC score of 0.834 on the combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting gene-gene interactions.
摘要
检测和发现新的基因交互是一项具有挑战性的任务。不同的统计学和深度学习方法尝试利用基因交互的topological结构和基因表达特征来预测新的基因交互。然而,一些方法仅仅利用基因表达 profiling。在这个上下文中,我们介绍GENER,一种专门为基因交互预测设计的并行层深度学习网络。我们进行了两个训练实验,并与现有的统计学和深度学习方法进行比较。可以注意的是,我们的模型在combined BioGRID&DREAM5集合上 achieved an average AUROC score of 0.834,在预测基因交互方面表现出色。
Comparing Time-Series Analysis Approaches Utilized in Research Papers to Forecast COVID-19 Cases in Africa: A Literature Review
methods: 本研究使用英语论文,从2020年1月至2023年7月进行了系统性的搜索,特意搜索了在非洲COVID-19数据集上使用时间序分析方法的论文。使用了PubMed、Google Scholar、Scopus和Web of Science等数据库。研究论文经过了评估程序,以提取相关的时间序分析模型实施和性能信息。
results: 本研究发现了不同的方法ologies,评估了它们在预测病毒传播的有效性和局限性。结果可能为预测COVID-19病例提供更深入的理解,未来研究应考虑这些理解,以提高时间序分析模型和探索不同方法的集成,以提高公共卫生决策。Abstract
This literature review aimed to compare various time-series analysis approaches utilized in forecasting COVID-19 cases in Africa. The study involved a methodical search for English-language research papers published between January 2020 and July 2023, focusing specifically on papers that utilized time-series analysis approaches on COVID-19 datasets in Africa. A variety of databases including PubMed, Google Scholar, Scopus, and Web of Science were utilized for this process. The research papers underwent an evaluation process to extract relevant information regarding the implementation and performance of the time-series analysis models. The study highlighted the different methodologies employed, evaluating their effectiveness and limitations in forecasting the spread of the virus. The result of this review could contribute deeper insights into the field, and future research should consider these insights to improve time series analysis models and explore the integration of different approaches for enhanced public health decision-making.
摘要
这篇文献综述旨在比较用于预测非洲COVID-19确诊病例的各种时间序列分析方法。研究对2020年1月至2023年7月间发表的英文研究论文进行了系统检索,特别关注在非洲COVID-19数据集上使用时间序列分析方法的论文。检索过程使用了PubMed、Google学术、Scopus和Web of Science等多种数据库。对纳入的研究论文进行了评估,以提取有关时间序列分析模型实施和性能的相关信息。该综述总结了所采用的不同方法,并评估了它们在预测病毒传播方面的有效性和局限性。这些结果可以为该领域提供更深入的理解,未来的研究应考虑这些见解,以改进时间序列分析模型,并探索不同方法的集成,以提升公共卫生决策。
Sampling via Gradient Flows in the Space of Probability Measures
paper_authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart
for: 采样target概率分布中的一个基本挑战是computational科学和工程中的一个基本问题,recent work shows that algorithms derived by considering gradient flows in the space of probability measures 开辟了新的开发途径。这篇论文提供了三种贡献,分别是:
methods: gradient flows中的设计元素的研究。any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms。我们的第一贡献是:Kullback-Leibler divergence作为能量函数,gradient flows resulting from it do not depend on the normalization constant of the target distribution。我们的第二贡献是:study the choice of metric from the perspective of invariance。Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant。as a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows。in particular, we construct various affine invariant Wasserstein and Stein gradient flows。
results: affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, both in theory and by using particle methods。we also study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods。we establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically。Abstract
Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
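As a loose illustration of why affine invariance helps on anisotropic targets, here is a particle sketch of a Langevin-type flow for the KL energy, preconditioned by the ensemble covariance so that the update behaves the same under affine reparametrizations of the state. The Gaussian target, step size, and the omitted correction terms for a state-dependent preconditioner are simplifying assumptions; the paper's Wasserstein and Stein flows carry more structure than this sketch.

```python
import numpy as np

def grad_log_target(x, mean, cov_inv):
    """Score of an anisotropic Gaussian target; the normalization constant never enters."""
    return -(x - mean) @ cov_inv

def affine_preconditioned_langevin(n_particles=500, n_steps=4000, dt=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.array([1.0, -2.0])
    cov = np.array([[100.0, 0.0], [0.0, 0.01]])        # highly anisotropic target covariance
    cov_inv = np.linalg.inv(cov)

    x = rng.normal(size=(n_particles, 2))               # initial particle ensemble
    for _ in range(n_steps):
        c = np.cov(x.T) + 1e-6 * np.eye(2)              # ensemble covariance acts as the preconditioner
        drift = grad_log_target(x, mean, cov_inv) @ c   # C * grad log pi, applied row-wise
        noise = rng.standard_normal(x.shape) @ np.linalg.cholesky(2.0 * dt * c).T
        x = x + dt * drift + noise                      # correction terms for state-dependent C omitted
    return x

samples = affine_preconditioned_langevin()
print("empirical mean:", samples.mean(axis=0).round(2))
print("empirical covariance:\n", np.cov(samples.T).round(2))
```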
摘要
取样target概率分布的问题是计算科学和工程中的基础挑战。近期的研究表明,通过考虑梯度流在概率分布空间上来开发算法,可以开辟新的算法发展途径。本文对这种取样方法做出三项贡献:1. 我们显示出,Kullback-Leibler divergence作为能函数,其梯度流不受目标分布的normalization常数影响。这意味着,通过Kullback-Leibler divergence来定义梯度流,可以避免一些困难,例如,对目标分布的normalization常数进行估计。2. 我们研究了选择metric的问题,从diffusion invariants的角度出发。Fisher-Rao metric是唯一不同Scaling的diffusion invariants metric,但它可能是computationally tractable的问题。我们引入了一种relaxed, affine invariants property for metrics and gradient flows,并构造了各种affine invariants Wasserstein和Stein gradient flows。我们证明了,在取样高度非对称分布时,affine invariants gradient flows会表现更加优于非affine invariants counterparts。3. 我们研究了基于Gaussian approximations的gradient flows的问题,并开发了一种alternative to particle methods。我们建立了各种Gaussian approximate gradient flows的连接,并考虑它们与参数化variational inference中的gradient methods的关系。我们还研究了它们的数学性和numerical性的性质。
results: 论文的实验结果表明,TimeGPT模型在不同的时间序列数据集上的预测性能强于传统的统计学、机器学习和深度学习方法,同时具有较高的效率和简洁性。Abstract
In this paper, we introduce TimeGPT, the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training. We evaluate our pre-trained model against established statistical, machine learning, and deep learning methods, demonstrating that TimeGPT zero-shot inference excels in performance, efficiency, and simplicity. Our study provides compelling evidence that insights from other domains of artificial intelligence can be effectively applied to time series analysis. We conclude that large-scale time series models offer an exciting opportunity to democratize access to precise predictions and reduce uncertainty by leveraging the capabilities of contemporary advancements in deep learning.
摘要
在本文中,我们介绍了 TimeGPT,首个适用于时间序列的基础模型,能够生成准确的预测结果,无需见到训练数据。我们对我们的预训练模型进行了对比,与统计学、机器学习和深度学习方法进行了比较,结果显示,TimeGPT零shot推理性能高效简单。我们的研究表明,从其他人工智能领域的技术可以有效应用于时间序列分析。我们认为,大规模的时间序列模型将为精确预测和减少不确定性提供了一个激动人心的机会,通过利用当代深度学习技术。
Smoothing Methods for Automatic Differentiation Across Conditional Branches
paper_authors: Justin N. Kreikemeyer, Philipp Andelfinger
for: 这个论文的目的是提出一种基于幂分析的自动分配方法,以便在控制流构造引入的缺陷中进行优化。
methods: 这个论文使用了幂分析(SI)和自动导数(AD)两种方法,以计算分支程序的导数。
results: 研究人员通过对分支程序的输出进行幂分析,并使用自动导数来计算导数,从而实现了基于导数的参数synthesis。这种方法可以在分支程序中进行高效的优化。Abstract
Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables the direct gradient-based parameter synthesis for branching programs, allowing for instance the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.
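To give a feel for the sampling side, here is a toy Monte Carlo estimator of the gradient of a Gaussian-smoothed branching program. It uses the Gaussian score identity (gradients from program evaluations only) rather than the paper's combination of AD with sampling, and the program, smoothing width, and step size are placeholder assumptions.

```python
import numpy as np

def program(theta, x):
    """A discontinuous program: a hard branch on a threshold plus a smooth regularization term."""
    return np.where(x > theta, 1.0, 0.0) + 0.1 * (theta - 2.0) ** 2

def smoothed_grad(theta, sigma=0.3, n=20000, rng=None):
    """Monte Carlo gradient of F(theta) = E_{Z,x}[ program(theta + sigma*Z, x) ] via E[f * Z] / sigma."""
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(n)
    x = rng.uniform(0.0, 4.0, size=n)                  # random program inputs
    return np.mean(program(theta + sigma * z, x) * z) / sigma

# Gradient descent on the smoothed objective, despite the hard branch inside `program`
rng = np.random.default_rng(0)
theta = 0.0
for _ in range(300):
    theta -= 0.05 * smoothed_grad(theta, rng=rng)
print(f"optimized threshold: {theta:.2f}  (analytic optimum of the unsmoothed objective: 3.25)")
```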
Targeted Adversarial Attacks on Generalizable Neural Radiance Fields
for: This paper discusses the vulnerability of Neural Radiance Fields (NeRFs) to adversarial attacks, and demonstrates the effectiveness of both low-intensity and targeted attacks.
methods: The paper uses NeRFs to synthesize high-quality images from sparse 2D observations, and employs both low-intensity and targeted adversarial attacks to evaluate the model’s robustness.
results: The paper shows that NeRFs can be vulnerable to both low-intensity and targeted adversarial attacks, and that the attacks can be robust enough to be used in real-world applications. Additionally, the paper demonstrates the ability to generate specific, predefined output scenes using targeted attacks.Abstract
Neural Radiance Fields (NeRFs) have recently emerged as a powerful tool for 3D scene representation and rendering. These data-driven models can learn to synthesize high-quality images from sparse 2D observations, enabling realistic and interactive scene reconstructions. However, the growing usage of NeRFs in critical applications such as augmented reality, robotics, and virtual environments could be threatened by adversarial attacks. In this paper we present how generalizable NeRFs can be attacked by both low-intensity adversarial attacks and adversarial patches, where the latter could be robust enough to be used in real world applications. We also demonstrate targeted attacks, where a specific, predefined output scene is successfully generated by these attacks.
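The paper's attack specifics (perturbing the source views that condition a generalizable NeRF, patch optimization, rendering in the loop) are not reproduced here, so the sketch below only shows the generic projected-gradient machinery such attacks build on, with a stand-in differentiable model; the epsilon, step size, and loss are placeholder assumptions.

```python
import torch

def pgd_attack(model, images, loss_fn, target, eps=4 / 255, alpha=1 / 255, steps=20):
    """Projected gradient descent inside an L-infinity ball around the clean inputs.
    For a targeted attack, `loss_fn` is minimized towards a chosen target output."""
    adv = images.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = loss_fn(model(adv), target)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv - alpha * grad.sign()                  # push the output toward the target
            adv = images + (adv - images).clamp(-eps, eps)   # stay inside the epsilon-ball
            adv = adv.clamp(0.0, 1.0)                        # keep a valid image
    return adv.detach()

# Toy usage with a stand-in differentiable "renderer"
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 16))
imgs = torch.rand(2, 3, 8, 8)
target = torch.zeros(2, 16)
adv = pgd_attack(model, imgs, torch.nn.functional.mse_loss, target)
print("max perturbation:", (adv - imgs).abs().max().item())
```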
摘要
neural radiance fields (NeRFs) 近期出现为3D场景表示和渲染的强大工具。这些数据驱动模型可以从稀疏的2D观察中学习生成高质量图像,使得场景重建变得真实和交互式。然而,随着NeRFs在扩展实际应用,如增强现实、机器人和虚拟环境中的使用,它们可能会受到恶意攻击。 在这篇论文中,我们表明了通用NeRFs可以受到低强度攻击和攻击贴图的威胁。其中,攻击贴图可能够在实际应用中使用,并且可以实现特定、预先定义的输出场景。我们还示出了 Targeted 攻击,即通过攻击NeRFs来生成特定的场景。
Analysis of learning a flow-based generative model from limited sample complexity
results: 研究结果显示,使用这种方法可以实现高维 Gaussian 混合分布中采样,并且可以提供一个 Bayes-optimal 的方法来评估模型的性能。Abstract
We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.
摘要
我们研究训练一个由两层自编码器参数化的基于流的生成模型,使其能够从高维高斯混合分布中采样。我们对该问题给出了精确的端到端分析。首先,当速度场由一个在目标分布的有限个($n$ 个)样本上训练的浅层去噪自编码器参数化时,我们给出了所学速度场的紧致闭式刻画。基于该分析,我们进一步刻画了相应的生成流,它将基础高斯密度推前为目标密度的一个近似。特别地,我们给出了生成混合分布均值与目标混合分布均值之间距离的闭式公式,并证明其以 $\Theta_n(\frac{1}{n})$ 的速率衰减。最后,我们证明该速率实际上是贝叶斯最优的。
paper_authors: Owen Davis, Mohammad Motamed, Raul Tempone
for: constructed a neural network surrogate model using multi-fidelity information
methods: residual multi-fidelity computational framework, two neural networks working in concert
results: dramatic savings in computational cost, accurate predictions within small tolerancesAbstract
In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Given an inexpensive low-fidelity and an expensive high-fidelity computational model, we present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function, a possibly non-linear mapping between 1) the shared input space of the models together with the low-fidelity model output and 2) the discrepancy between the two model outputs. To accomplish this, we train two neural networks to work in concert. The first network learns the residual function on a small set of high-fidelity and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of a second network. This second network, once trained, acts as our surrogate for the high-fidelity quantity of interest. We present three numerical examples to demonstrate the power of the proposed framework. In particular, we show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
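A small end-to-end sketch of the two-network idea on a synthetic pair of models (the low-/high-fidelity functions, network sizes, and sample counts below are invented for illustration): a residual network maps (x, low-fidelity output) to the model discrepancy, the trained residual is used to manufacture synthetic high-fidelity data, and a second network trained on that data becomes the surrogate.

```python
import numpy as np
import torch

def f_low(x):  return np.sin(2 * np.pi * x)                                         # cheap model
def f_high(x): return np.sin(2 * np.pi * x) + 0.3 * x ** 2 * np.cos(8 * np.pi * x)  # expensive model

def fit(net, X, y, epochs=3000, lr=1e-2):
    """Plain least-squares training loop."""
    X, y = torch.tensor(X, dtype=torch.float32), torch.tensor(y, dtype=torch.float32)
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.mean((net(X) - y) ** 2)
        loss.backward()
        opt.step()
    return net

mlp = lambda d_in: torch.nn.Sequential(torch.nn.Linear(d_in, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))

# Step 1: learn the residual (x, f_low(x)) -> f_high(x) - f_low(x) from a few expensive samples
x_hf = np.linspace(0.0, 1.0, 15)[:, None]
residual_net = fit(mlp(2), np.hstack([x_hf, f_low(x_hf)]), f_high(x_hf) - f_low(x_hf))

# Step 2: manufacture abundant synthetic high-fidelity data from the cheap model plus the residual
x_syn = np.random.rand(2000, 1)
with torch.no_grad():
    lf = f_low(x_syn)
    hf_syn = lf + residual_net(torch.tensor(np.hstack([x_syn, lf]), dtype=torch.float32)).numpy()

# Step 3: train the final surrogate for the high-fidelity quantity of interest on the synthetic data
surrogate = fit(mlp(1), x_syn, hf_syn)

x_test = np.linspace(0.0, 1.0, 200)[:, None]
with torch.no_grad():
    err = np.abs(surrogate(torch.tensor(x_test, dtype=torch.float32)).numpy() - f_high(x_test)).max()
print(f"max surrogate error on [0, 1]: {err:.3f}")
```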
摘要
在这项工作中,我们考虑了一个总体的神经网络模拟器使用多种精度信息的问题。给定一个便宜的低精度计算模型和一个昂贵的高精度计算模型,我们提出了一个多质量计算框架,它将计算模型之间的相关性表示为一个差分函数,这可能是一个非线性映射,它将1)输入空间中共享的模型输出和低精度模型输出相关联,2)两个模型输出之间的差异。为了实现这一点,我们训练了两个神经网络。第一个网络学习了差分函数,它在一小段高精度和低精度数据上进行了训练。一旦训练完成,这个网络就可以生成更多的 sintetic高精度数据,这些数据被用在第二个网络的训练中。第二个网络,一旦训练完成,就成为我们的神经网络模拟器,用于预测高精度量度。我们给出了三个数学例子,以示本提案的能力。特别是,我们发现,当输出预测需要在小误差内时,可以获得巨大的计算成本减少。
Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective
paper_authors: Kutay Bölat, Simon H. Tindemans, Peter Palensky
for: 这篇论文的目的是提出一种适应不同密度区域的逻辑密度函数模型,以优化数据驱动方法的能力。
methods: 该模型使用了自适应kernel density estimation(KDE)方法,并采用了留一个出去最大logs likelihood(LOO-MLL) criterion来避免缺失singularity问题。
results: 该模型在两个不同的电力系统数据集上进行了测试,并与 Gaussian mixture models进行了比较。结果表明,提出的模型具有扎实的性能,同时具有防止缺失singularity的保证。Abstract
Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data lacks this, it is desired to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and it is proven that LOO-MLL prevents these. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably. The performance of the proposed method and models are exhibited on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
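The sketch below illustrates the criterion in one dimension: the leave-one-out log-likelihood of a Gaussian KDE with a separate bandwidth per kernel stays finite (dropping a point's own kernel removes the degenerate zero-bandwidth maximizer of the plain MLL), and it can be used to compare a single fixed bandwidth against a crude per-kernel choice. The data, the bandwidth rules, and the absence of learnable kernel weights are simplifications relative to the paper's model.

```python
import numpy as np

def loo_log_likelihood(x, bandwidths):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with one bandwidth per kernel:
    sum_i log( (1/(n-1)) * sum_{j != i} N(x_i | x_j, h_j^2) )."""
    n = len(x)
    diff = x[None, :] - x[:, None]                                   # diff[i, j] = x_j - x_i
    log_k = -0.5 * (diff / bandwidths[None, :]) ** 2 \
            - np.log(bandwidths[None, :] * np.sqrt(2.0 * np.pi))
    np.fill_diagonal(log_k, -np.inf)                                 # exclude the point's own kernel
    m = log_k.max(axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n - 1)
    return log_p.sum()

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 200), rng.normal(5.0, 1.0, 200)])   # regions of very different density

h_fixed = 1.06 * x.std() * len(x) ** (-0.2) * np.ones_like(x)       # single Silverman bandwidth
d = np.abs(x[None, :] - x[:, None]); np.fill_diagonal(d, np.inf)
h_adaptive = np.sort(d, axis=0)[9, :]                               # per-kernel: distance to the 10th neighbour

print("LOO-MLL, single bandwidth   :", round(loo_log_likelihood(x, h_fixed), 1))
print("LOO-MLL, adaptive bandwidths:", round(loo_log_likelihood(x, h_adaptive), 1))
```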
摘要
probabilistic 模型 power 系统操作和规划过程取决于数据驱动方法,需要具有足够的数据量。当历史数据缺乏这些数据时,可以模型下面的数据生成机制为概率分布,以评估数据质量并生成更多数据,如果需要。基于 kernel density estimation(KDE)的模型是非常流行的选择,但它们无法适应数据区域中的不同浓度。在这篇论文中,一种适应型KDE模型被employs,其中每个kernel具有自己的宽度。通过 leave-one-out maximum log-likelihood(LOO-MLL) criterion来避免普通的最大 log-likelihood(MLL) criterion所引起的孤立解,并且证明了LOO-MLL的可靠性。此外,在这个可靠性保证下,模型被扩展了,并将学习权重分配给kernel。此外,一种修改后的 expectation-maximization 算法被使用,以加速优化速度可靠地。提出的方法和模型在两个不同的电力系统数据集上进行了不同的统计测试和对比 Gaussian mixture models,结果表明,提出的模型具有良好的表现,同时具有可靠性保证。
Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models
results: 数值 validate 结果表明,PnP-ULA 的样本分布对于不符合的测量模型和滤波器有明确的敏感性。Abstract
Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.
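For orientation, here is a stripped-down PnP-ULA iteration on a 1-D deblurring toy: the likelihood gradient comes from the Gaussian measurement model, the prior score is taken from a denoiser through Tweedie's identity, and the denoiser is a deliberately crude linear shrinkage so that prior mismatch (the paper's subject) is baked in. The constants, the missing projection step of the full algorithm, and the stand-in denoiser are all assumptions of this sketch.

```python
import numpy as np

def pnp_ula(y, A, sigma, denoiser, eps, n_iter=20000, delta=1e-4, seed=0):
    """Simplified plug-and-play unadjusted Langevin sampler.
    Prior score approximated via Tweedie's identity: grad log p_eps(x) ~ (D(x) - x) / eps**2."""
    rng = np.random.default_rng(seed)
    x = A.T @ y
    samples = []
    for k in range(n_iter):
        grad_lik = A.T @ (y - A @ x) / sigma ** 2
        grad_prior = (denoiser(x) - x) / eps ** 2
        x = x + delta * (grad_lik + grad_prior) + np.sqrt(2.0 * delta) * rng.standard_normal(x.shape)
        if k > n_iter // 2:
            samples.append(x.copy())
    samples = np.array(samples)
    return samples.mean(axis=0), samples.std(axis=0)                # MMSE estimate and marginal spread

# 1-D deblurring toy with a mismatched linear-shrinkage "denoiser"
n = 64
rng = np.random.default_rng(1)
x_true = np.zeros(n); x_true[20:30] = 1.0
blur = np.array([[np.exp(-0.5 * ((i - j) / 2.0) ** 2) for j in range(n)] for i in range(n)])
blur /= blur.sum(axis=1, keepdims=True)
y = blur @ x_true + 0.01 * rng.standard_normal(n)
shrinkage_denoiser = lambda z: 0.9 * z                              # stand-in for a learned denoiser

mmse, spread = pnp_ula(y, blur, sigma=0.01, denoiser=shrinkage_denoiser, eps=0.5)
print("relative reconstruction error:", round(np.linalg.norm(mmse - x_true) / np.linalg.norm(x_true), 3))
```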
摘要
后验采样(posterior sampling)已被证明是求解成像反问题的一种强大的贝叶斯方法。最近的插入式非调整 Langevin 算法(PnP-ULA)通过将物理测量模型与由图像去噪器指定的深度学习先验相结合,成为蒙特卡罗采样和最小均方误差(MMSE)估计的一种有前途的方法。然而,PnP-ULA 的采样分布与失配的数据保真项和去噪器之间的复杂关系尚未得到理论分析。为填补这一空白,我们提出了 posterior-L2 伪度量,并用它给出了 PnP-ULA 在后验分布失配情形下的显式误差界。我们在高斯混合采样和图像去模糊等多个反问题上对理论进行了数值验证。结果表明,PnP-ULA 的采样分布对测量模型和去噪器失配的敏感性可以被精确刻画。
Distribution-free risk assessment of regression-based machine learning algorithms
results: 该论文通过对具有不同模型化情况、数据集大小和 конформаль预测方法的实验,证明了其方法的准确性和可靠性。Abstract
Machine learning algorithms have grown in sophistication over the years and are increasingly deployed for real-life applications. However, when using machine learning techniques in practical settings, particularly in high-risk applications such as medicine and engineering, obtaining the failure probability of the predictive model is critical. We refer to this problem as the risk-assessment task. We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction. We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability. Using this coverage property, we prove that our approximated failure probability is conservative in the sense that it is not lower than the true failure probability of the ML algorithm. We conduct extensive experiments to empirically study the accuracy of the proposed method for problems with and without covariate shift. Our analysis focuses on different modeling regimes, dataset sizes, and conformal prediction methodologies.
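A minimal split-conformal sketch of the risk-assessment task described above (synthetic data and a random forest stand in for the paper's models; the paper also studies covariate shift and several conformal variants, which this ignores): calibrate absolute residuals on held-out data and read off an interval half-width whose failure probability is at most alpha.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(3000, 5))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2 + rng.normal(0, 0.2, size=3000)

# Split conformal prediction: fit on one half, calibrate residuals on the other
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_fit, y_fit)

alpha = 0.1                                                # target failure probability
scores = np.abs(y_cal - model.predict(X_cal))              # nonconformity scores
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

# Risk assessment: with probability >= 1 - alpha the true label lies in [prediction - q, prediction + q]
X_test = rng.uniform(-3, 3, size=(1000, 5))
y_test = np.sin(X_test[:, 0]) + 0.1 * X_test[:, 1] ** 2 + rng.normal(0, 0.2, size=1000)
covered = np.abs(y_test - model.predict(X_test)) <= q
print(f"half-width {q:.3f}, empirical coverage {covered.mean():.3f} (target {1 - alpha})")
```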
Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks
methods: 通过数据-参数域上的联合群不变函数(joint group invariant function),系统地由数据域上的群作用导出参数域上的对偶群作用(dual group action)
results: 提出了一种新的群论证明(group theoretic proof)用于证明普遍性定理,连接了几何深度学习与抽象调和分析。Abstract
The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
摘要
“输入数据的对称性和几何被认为编码在神经网络内部的数据表示中,但具体的编码规则尚未得到充分研究。我们将注意力集中在数据-参数域上的联合群不变函数,从数据域上的群作用出发,系统地找到参数域上的对偶群作用。此外,我们引入了由联合不变函数导出的广义神经网络,并利用 Schur 引理给出了其普遍性定理的一个新的群论证明。由于传统的普遍性定理均基于函数分析方法,这项研究揭示了逼近理论的群论侧面,将几何深度学习与抽象调和分析联系起来。”
Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks
results: 研究表明,DNN可以被视为一种 dual voice transform,与 Koopman 运算器相关的 linear 表示。这种表示可以捕捉到DNN中隐藏的层结构,并且可以用于分析DNN的行为。Abstract
We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of those DNNs.
摘要
我们将深度神经网络(DNN)中的隐藏层与数据空间上的群作用相对应,并将 DNN 表述为关于 Koopman 算子(群作用的线性表示)的对偶 voice 变换。基于群论的论证,特别是利用 Schur 引理,我们给出了这类 DNN 普遍性的一个简单证明。
High-dimensional Bayesian Optimization with Group Testing
results: GTBO在一些synthetic和实际高维优化任务上与现有方法竞争,并且可以帮助实际Operator发现活跃参数,从而提高问题理解。Abstract
Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
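A toy version of the testing phase, to convey the idea only: perturb one randomly chosen group of coordinates at a time, record whether the objective responds, and keep the coordinates that responded in every group containing them. GTBO's actual phase is statistical (it handles noise and forms a posterior over activeness) and is followed by the optimization phase, which is omitted here; the objective, group size, and thresholds are invented.

```python
import numpy as np

def objective(x):
    """A 100-dimensional black box that truly depends on only three coordinates."""
    return np.sin(3 * x[3]) + x[17] ** 2 - 2.0 * x[42]

def group_test(f, dim, n_rounds=200, group_size=10, threshold=1e-6, seed=0):
    """Testing phase (simplified): perturb a random group of coordinates and flag whether f responds."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(-1, 1, dim)
    f0 = f(base)
    votes, hits = np.zeros(dim), np.zeros(dim)
    for _ in range(n_rounds):
        group = rng.choice(dim, size=group_size, replace=False)
        x = base.copy()
        x[group] += rng.uniform(0.1, 0.5, size=group_size)
        responded = abs(f(x) - f0) > threshold
        votes[group] += 1
        hits[group] += responded
    # A coordinate contained in any non-responding group is almost surely inactive
    return np.where(hits / np.maximum(votes, 1) > 0.99)[0]

print("detected active dimensions:", group_test(objective, dim=100))
```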
摘要
bayesian 优化是一种有效的优化昂贵黑盒函数的方法。高维问题特别困难,因为目标函数的模型受到维度之咒的影响,准确模型困难。我们提出了一种组测试方法,以便在这些领域中高效地优化。我们称之为组测试 bayesian 优化(GTBO)。在第一个测试阶段,GTBO首先运行一系列的组测试,选择并测试变量的组合,以确定影响目标函数的变量。然后,在第二个优化阶段,GTBO通过强调活跃维度来导航优化。通过利用轴对齐的子空间假设,GTBO与当前状态艺术方法竞争。此外,GTBO可以帮助发现应用中活跃参数,从而增强实践者对问题的理解。
Otago Exercises Monitoring for Older Adults by a Single IMU and Hierarchical Machine Learning Models
results: 研究发现,使用 10 分钟滑动窗口可以在室内和家庭场景下分别达到 window-wise f1-scores 高于 0.95 和 Intersection-over-Union (IoU) f1-scores 高于 0.85。另外,使用 6 秒滑动窗口可以在家庭场景下识别四种 OEP subclass。结果表明,使用单个 IMU 可以准确地监测老年人参与 OEP 的情况,并且可以进行进一步的分析。Abstract
Otago Exercise Program (OEP) is a rehabilitation program for older adults to improve frailty, sarcopenia, and balance. Accurate monitoring of patient involvement in OEP is challenging, as self-reports (diaries) are often unreliable. With the development of wearable sensors, Human Activity Recognition (HAR) systems using wearable sensors have revolutionized healthcare. However, their usage for OEP still shows limited performance. The objective of this study is to build an unobtrusive and accurate system to monitor OEP for older adults. Data was collected from older adults wearing a single waist-mounted Inertial Measurement Unit (IMU). Two datasets were collected, one in a laboratory setting, and one at the homes of the patients. A hierarchical system is proposed with two stages: 1) using a deep learning model to recognize whether the patients are performing OEP or activities of daily life (ADLs) using a 10-minute sliding window; 2) based on stage 1, using a 6-second sliding window to recognize the OEP sub-classes performed. The results showed that in stage 1, OEP could be recognized with window-wise f1-scores over 0.95 and Intersection-over-Union (IoU) f1-scores over 0.85 for both datasets. In stage 2, for the home scenario, four activities could be recognized with f1-scores over 0.8: ankle plantarflexors, abdominal muscles, knee bends, and sit-to-stand. The results showed the potential of monitoring the compliance of OEP using a single IMU in daily life. Also, some OEP sub-classes are possible to be recognized for further analysis.
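Structurally, the two-stage pipeline looks like the sketch below: long windows feed a coarse OEP-vs-ADL classifier, and short windows cut from the OEP segments feed a sub-exercise classifier. The synthetic signal, the hand-crafted window features, and the random-forest classifiers are placeholders; the paper uses deep models on real IMU data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

fs = 50                                                   # assumed IMU sampling rate in Hz
rng = np.random.default_rng(0)
T = fs * 60 * 30                                          # 30 minutes of synthetic 6-axis data
signal = rng.standard_normal((T, 6))
is_oep = (np.arange(T) // (fs * 60 * 5)) % 2              # alternating 5-minute blocks: ADL (0) / OEP (1)
sub_class = ((np.arange(T) // (fs * 30)) % 4) * is_oep    # 4 OEP sub-exercises in 30-second bouts
signal[is_oep == 1] += 0.5                                # give OEP a detectable signature in this toy

def windows(sig, lab, win_s, step_s):
    """Cut the stream into overlapping windows with simple statistics as features and a majority label."""
    win, step = int(win_s * fs), int(step_s * fs)
    xs, ys = [], []
    for s in range(0, len(sig) - win + 1, step):
        seg = sig[s:s + win]
        xs.append(np.r_[seg.mean(axis=0), seg.std(axis=0)])
        ys.append(np.bincount(lab[s:s + win]).argmax())
    return np.array(xs), np.array(ys)

# Stage 1: OEP vs ADL on 10-minute windows; Stage 2: sub-exercise on 6-second windows inside OEP segments
X1, y1 = windows(signal, is_oep, win_s=600, step_s=60)
X2, y2 = windows(signal[is_oep == 1], sub_class[is_oep == 1], win_s=6, step_s=3)
stage1 = RandomForestClassifier(random_state=0).fit(X1, y1)
stage2 = RandomForestClassifier(random_state=0).fit(X2, y2)
print(f"stage 1: {len(y1)} windows, classes {set(y1)}; stage 2: {len(y2)} windows, classes {set(y2)}")
```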
摘要
奥塔哥运动项目(OEP)是一项为老年人提供的康复计划,以改善衰弱、肌少症和平衡。然而,准确监测患者对OEP的参与度很具挑战性,因为自我报告(日记)通常不可靠。随着便携式传感器的发展,基于便携式传感器的人体活动识别(HAR)技术在医疗领域得到了广泛应用,但它们在OEP监测中的表现仍然有限。本研究的目标是建立一个不侵入且准确的OEP监测系统。数据来自于佩戴单个腰部惯性测量单元(IMU)的老年人,共采集了两组数据:一组在实验室环境中,另一组在患者家中。研究提出了一种两阶段的层次系统:1)使用深度学习模型,在10分钟滑动窗口上判断受试者是在进行OEP还是日常生活活动(ADLs);2)在第一阶段的基础上,使用6秒滑动窗口识别所进行的OEP亚类。结果表明,在第一阶段,两组数据上OEP识别的窗口级f1分数均超过0.95,IoU f1分数均超过0.85。在第二阶段的家庭场景下,可以识别四种活动,其f1分数超过0.8:踝关节跖屈、腹部肌肉、屈膝和坐立转换。结果表明,使用单个IMU可以在日常生活中监测OEP的依从性,并且部分OEP亚类也可以被识别,以供进一步分析。
results: 研究发现,使用预训练扩散模型可以计算出音乐惊喜度(surprisal)值,并且这些值与人类听众的喜欢度评分呈负二次关系。Abstract
A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have been limited to compute exact probabilities through hand-crafted features or restricted to linear models which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the 'surprisal' of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. In this work we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state of the art methods such as IDyOM. We therefore present this model a preliminary step in developing modern deep generative models of music expectation and subjective likability.
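The reported relationship is easy to probe once per-song surprisal values are in hand: regress liking on surprisal with a quadratic term and check the sign of the curvature. The numbers below are synthetic stand-ins, not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(0)
surprisal = rng.uniform(2.0, 10.0, 120)                    # e.g., mean negative log-likelihood per song
liking = -0.35 * (surprisal - 6.0) ** 2 + 7.0 + rng.normal(0, 0.8, 120)   # inverted-U ground truth

coeffs = np.polyfit(surprisal, liking, deg=2)
print("quadratic coefficient:", round(coeffs[0], 3))       # negative => inverted-U relationship
print("preferred surprisal (vertex):", round(-coeffs[1] / (2 * coeffs[0]), 2))
```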
摘要
一种流行的音乐情感响应理论围绕意外性(surprisal)和期望展开。在先前的工作中,这一想法被实现为音乐的概率模型,允许基于一个由先前音乐或文化经验构成的"训练集",精确计算歌曲(或逐音符)的概率。然而,迄今为止这些模型要么依赖手工设计的特征来计算精确概率,要么局限于线性模型,可能不足以表达音乐中复杂的条件分布。在这项工作中,我们提议使用现代深度概率生成模型,即扩散模型(Diffusion Model),计算音乐输入序列的近似似然。与先前工作不同,这种由深度神经网络参数化的生成模型可以直接从训练集中学习复杂的非线性特征,因此我们预期它能更准确地刻画人类听众对音乐的意外性。文献表明,意外性与人类受试者对一首歌曲的喜爱程度之间存在倒U型关系。在这项工作中,我们证明预训练扩散模型得到的音乐意外性值与测得的听众喜爱评分呈负二次关系,并且这一关系的质量与IDyOM等最新方法相当。因此,我们将该模型视为发展现代深度生成模型以刻画音乐期望和主观喜爱度的初步一步。
TPDR: A Novel Two-Step Transformer-based Product and Class Description Match and Retrieval Method
paper_authors: Washington Cunha, Celso França, Leonardo Rocha, Marcos André Gonçalves
for: 该论文是为了解决企业间产品描述标准化问题,即将客户提供的产品描述与产品目录中的描述匹配。
methods: 该论文提出了一种基于Transformer的两步产品和类别描述检索方法(TPDR),利用注意力机制和对比学习来探索 semantic correspondence between IS和SD。
results: 该论文在11个真实公司的应用上实现了71%的正确检索和80%的正确分类,并且与纯粹的语法或semantic基线比较而言,效果提高达3.7倍。Abstract
There is a niche of companies responsible for intermediating the purchase of large batches of varied products for other companies, for which the main challenge is to perform product description standardization, i.e., matching an item described by a client with a product described in a catalog. The problem is complex since the client's product description may be: (1) potentially noisy; (2) short and uninformative (e.g., missing information about model and size); and (3) cross-language. In this paper, we formalize this problem as a ranking task: given an initial client product specification (query), return the most appropriate standardized descriptions (response). In this paper, we propose TPDR, a two-step Transformer-based Product and Class Description Retrieval method that is able to explore the semantic correspondence between IS and SD, by exploiting attention mechanisms and contrastive learning. First, TPDR employs the transformers as two encoders sharing the embedding vector space: one for encoding the IS and another for the SD, in which corresponding pairs (IS, SD) must be close in the vector space. Closeness is further enforced by a contrastive learning mechanism leveraging a specialized loss function. TPDR also exploits a (second) re-ranking step based on syntactic features that are very important for the exact matching (model, dimension) of certain products that may have been neglected by the transformers. To evaluate our proposal, we consider 11 datasets from a real company, covering different application contexts. Our solution was able to retrieve the correct standardized product before the 5th ranking position in 71% of the cases and its correct category in the first position in 80% of the situations. Moreover, the effectiveness gains over purely syntactic or semantic baselines reach up to 3.7 times, solving cases that none of the approaches in isolation can do by themselves.
摘要
有一些公司专门为其他公司批购大量多种产品,主要挑战是标准化产品描述,即将客户提供的产品描述与公司 catalo 中的产品描述匹配。这是一个复杂的问题,因为客户的产品描述可能具有以下特点:(1)可能含有噪音(即不必要的信息);(2)短板和不够详细(例如缺少型号和大小信息);(3)跨语言。在这篇论文中,我们将这个问题正式化为排名任务:给定客户的初始产品规范(查询),返回最相应的标准化产品描述(响应)。我们提议使用 transformers 来解决这个问题,具体来说,我们使用 transformers 作为两个编码器,一个用于编码 IS(Initial Specification),另一个用于编码 SD(Standard Description),两者之间需要在 embedding 空间中相似。我们还使用对应对(IS、SD)的匹配性进行加重,使用特殊的损失函数进行强制匹配。此外,我们还使用一个(第二)重新排名步骤,基于语法特征,以便更好地匹配某些产品,这些产品可能由 transformers 被忽略。为评估我们的提议,我们使用了 11 个真实公司的数据集,覆盖不同的应用场景。我们的解决方案能够在前5名的排名中 retrieved 正确的标准化产品,并在 71% 的情况下在第一名中 retrieved 正确的类别。此外,与纯语法或 semantics 基eline 相比,我们的解决方案的效果提高了多达 3.7 倍,解决了 none 的方法无法办的案例。
The Geometric Structure of Fully-Connected ReLU-Layers
results: 这篇论文提出了一种简化表达折射面和折射平面的方法,并证明了这种结构可以在分类设置下描述决策边界。此外,论文还研究了一个具有一个隐藏层的普通feedforward网络的决策边界的几何复杂性,以及证明了这种网络只能生成$d$个不同的决策边界。最后,论文还讨论了增加更多层的影响。Abstract
We formalize and interpret the geometric structure of $d$-dimensional fully connected ReLU-layers in neural networks. The parameters of a ReLU-layer induce a natural partition of the input domain, such that in each sector of the partition, the ReLU-layer can be greatly simplified. This leads to a geometric interpretation of a ReLU-layer as a projection onto a polyhedral cone followed by an affine transformation, in line with the description in [doi:10.48550/arXiv.1905.08922] for convolutional networks with ReLU activations. Further, this structure facilitates simplified expressions for preimages of the intersection between partition sectors and hyperplanes, which is useful when describing decision boundaries in a classification setting. We investigate this in detail for a feed-forward network with one hidden ReLU-layer, where we provide results on the geometric complexity of the decision boundary generated by such networks, as well as proving that modulo an affine transformation, such a network can only generate $d$ different decision boundaries. Finally, the effect of adding more layers to the network is discussed.
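The statement about the induced partition can be checked directly in a few lines: fix an input, read off its ReLU sign pattern, and verify that nearby inputs in the same sector are mapped by the corresponding affine map. The layer size and random weights below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W, b = rng.standard_normal((d, d)), rng.standard_normal(d)

def relu_layer(x):
    return np.maximum(W @ x + b, 0.0)

x0 = rng.standard_normal(d)
s = (W @ x0 + b > 0).astype(float)          # sign pattern = which sector of the input partition x0 lies in

# Within that sector the layer acts as the affine map x -> diag(s) (W x + b); the paper reads this
# structure as a projection onto a polyhedral cone composed with an affine transformation.
D = np.diag(s)
x1 = x0 + 1e-3 * rng.standard_normal(d)     # a nearby point, almost surely in the same sector
assert np.allclose(relu_layer(x1), D @ (W @ x1 + b))
print("active pattern:", s, "- local affine map verified")
```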
摘要
我们正式化和解释了深度学习网络中的$d$-维完全相连ReLU层的几何结构。层的参数导致了输入空间的自然分割,在每个分割部分中,ReLU层可以大大简化。这导致了ReLU层的几何解释为一个投影到多边形锥后的拓扑变换,与[doi:10.48550/arXiv.1905.08922]中的对于具有ReLU启动函数的卷积网络的描述相符。此外,这结构使得partition部分与条件面的顶点的前像简化了,这有用于描述分类设定中的决策界面。我们在详细调查了一个具有一个隐藏层的对应网络,并提供了决策界面的几何复杂性的结果,以及证明这种网络只能生成$d$个不同的决策界面。最后,我们讨论了将更多层添加到网络中的效果。
paper_authors: Gerardo Roa Dabike, Michael A. Akeroyd, Scott Bannister, Jon Barker, Trevor J. Cox, Bruno Fazenda, Jennifer Firth, Simone Graetzer, Alinka Greasley, Rebecca Vos, William Whitmer
results: 该论文通过使用HAAQI指标进行评估,发现该方法可以提高听力障碍人群对音乐的听众体验。Abstract
The Cadenza project aims to enhance the audio quality of music for individuals with hearing loss. As part of this, the project is organizing the ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The challenge can be tackled by decomposing the music at the hearing aid microphones into vocals, bass, drums, and other components. These can then be intelligently remixed in a personalized manner to improve audio quality. Alternatively, an end-to-end approach could be used. Processes need to consider the music itself, the gain applied to each component, and the listener's hearing loss. The submitted entries will be evaluated using the intrusive objective metric, the Hearing Aid Audio Quality Index (HAAQI). This paper outlines the challenge.
摘要
Cadenza 项目旨在提高听力损失人群聆听音乐时的音频质量。为此,项目组织了 ICASSP SP Cadenza Challenge:面向助听器的音乐分离/重混音挑战。参赛者可以将助听器麦克风处的音乐分解为人声、贝斯、鼓和其他成分,然后以个性化的方式智能重混,以提高音频质量;也可以采用端到端方法。处理过程需要考虑音乐本身、施加于各成分的增益,以及听者的听力损失情况。提交的作品将使用侵入式客观指标 Hearing Aid Audio Quality Index(HAAQI)进行评估。本文介绍了这一挑战。
The Blame Problem in Evaluating Local Explanations, and How to Tackle it
results: 研究发现,除了基于可读性模型的真实数据评价外,其他评价方法均受到一种问题称为“责任问题”的影响。此外,即使使用这种评价方法,本地解释评价仍然存在问题。Abstract
The number of local model-agnostic explanation techniques proposed has grown rapidly recently. One main reason is that the bar for developing new explainability techniques is low due to the lack of optimal evaluation measures. Without rigorous measures, it is hard to have concrete evidence of whether the new explanation techniques can significantly outperform their predecessors. Our study proposes a new taxonomy for evaluating local explanations: robustness, evaluation using ground truth from synthetic datasets and interpretable models, model randomization, and human-grounded evaluation. Using this proposed taxonomy, we highlight that all categories of evaluation methods, except those based on the ground truth from interpretable models, suffer from a problem we call the "blame problem." In our study, we argue that this category of evaluation measure is a more reasonable method for evaluating local model-agnostic explanations. However, we show that even this category of evaluation measures has further limitations. The evaluation of local explanations remains an open research problem.
摘要
“当前本地模型无关解释技术的数量呈急剧增长趋势。主要原因是开发新解释技术的门槛很低,由于缺乏优化的评价指标,没有充分的证据表明新解释技术是否可以显著超越前一代。我们的研究提出了一个新的本地解释评价分类法:可靠性、使用synthetic数据生成的真实数据和可解 modelo randomization、以及人类权威评价。使用这个分类法,我们发现所有类型的评价方法,除了基于可解 modelo的真实数据,都受到一种问题,我们称之为“责任问题”。在我们的研究中,我们认为这一类评价方法是评价本地模型无关解释的更合理的方法,但我们还发现这些评价方法具有进一步的局限性。本地解释评价仍然是一个开放的研究问题。”
Which mode is better for federated learning? Centralized or Decentralized
results: 研究发现,在平滑非对称目标函数上,中心化 federated learning(CFL)总是比分布式 federated learning(DFL)更好地泛化;在 CFL 中,采用部分参与比全参与更好;而 DFL 中,需要特定的topology来避免性能崩溃。Abstract
Both centralized and decentralized approaches have shown excellent performance and great application value in federated learning (FL). However, current studies do not provide sufficient evidence to show which one performs better. Although from the optimization perspective, decentralized methods can approach the comparable convergence of centralized methods with less communication, its test performance has always been inefficient in empirical studies. To comprehensively explore their behaviors in FL, we study their excess risks, including the joint analysis of both optimization and generalization. We prove that on smooth non-convex objectives, 1) centralized FL (CFL) always generalizes better than decentralized FL (DFL); 2) from perspectives of the excess risk and test error in CFL, adopting partial participation is superior to full participation; and, 3) there is a necessary requirement for the topology in DFL to avoid performance collapse as the training scale increases. Based on some simple hardware metrics, we could evaluate which framework is better in practice. Extensive experiments are conducted on common setups in FL to validate that our theoretical analysis is contextually valid in practical scenarios.
摘要
中心化与去中心化方法都在联邦学习(FL)中表现出色,但现有研究尚未提供足够证据说明哪一种表现更好。虽然从优化角度看,去中心化方法可以用更少的通信量达到与中心化方法相当的收敛性,但其测试性能在实证研究中一直不理想。为全面探讨两者在 FL 中的行为,我们研究了它们的过剩风险,对优化与泛化进行联合分析。我们证明了以下结论:1)在光滑非凸目标函数上,中心化联邦学习(CFL)的泛化性总是优于去中心化联邦学习(DFL);2)从 CFL 的过剩风险与测试误差来看,采用部分参与优于全参与;3)DFL 需要对拓扑结构满足特定要求,才能避免训练规模增大时的性能崩溃。基于一些简单的硬件指标,我们可以评估哪种框架在实践中更优。我们在常见的 FL 设置下进行了大量实验,验证了我们的理论分析在实际场景中是成立的。
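To make the CFL/DFL contrast concrete, here is a minimal numpy sketch of one communication round under each scheme: centralized FedAvg averages the models of a sampled subset of clients (partial participation), while decentralized training mixes each client's model with its neighbors according to a gossip (topology) matrix. This is an illustrative toy, not the paper's analysis; sizes and the ring topology are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 8, 4
models = rng.normal(size=(n_clients, dim))     # local model parameters
grads = rng.normal(size=(n_clients, dim))      # local gradients this round
lr = 0.1

# --- Centralized FL (FedAvg): clients step locally, server averages a subset.
local = models - lr * grads
participate = rng.choice(n_clients, size=4, replace=False)  # partial participation
cfl_model = local[participate].mean(axis=0)

# --- Decentralized FL: each client averages with neighbors on a ring topology.
W = np.zeros((n_clients, n_clients))
for i in range(n_clients):
    for j in (i - 1, i, i + 1):                # self plus two ring neighbors
        W[i, j % n_clients] = 1 / 3
dfl_models = W @ (models - lr * grads)         # gossip mixing step

print("CFL global model:", cfl_model)
print("DFL per-client models:\n", dfl_models)
```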
FLAIM: AIM-based Synthetic Data Generation in the Federated Setting
paper_authors: Samuel Maddock, Graham Cormode, Carsten Maple
for: 在保护个人隐私的同时实现协作数据共享,对组织而言至关重要。
methods: 使用合成数据生成(Synthetic Data Generation)技术生成人工数据,保留私有数据的统计性质。
results: 提出了基于差分隐私的联邦合成表格数据生成方法 DistAIM 和 FLAIM,能够在降低开销的同时,在不同程度的异质性下提升实用性。Abstract
Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a SOTA central method known as AIM, we present DistAIM and FLAIM. We show it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation which necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show this can improve utility while reducing overhead.
摘要
results: 通过大规模实验,这篇论文表明变分推断(Variational Inference)在 S&P 500 指数成分股上的表现十分出色;与基于蒙特卡洛采样的贝叶斯估计相比,变分推断是一种有吸引力、校准良好且具有竞争力的贝叶斯学习方法。Abstract
The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.
摘要
“GARCH 家族模型的贝叶斯估计通常通过蒙特卡洛采样来实现。变分推断作为复杂机器学习模型中一种稳健的贝叶斯推断方法正日益受到关注,但其在计量经济学与金融领域的应用仍然有限。本文探讨变分推断在多大程度上可以作为蒙特卡洛采样的可靠且可行的替代方案,用于 GARCH 类模型的贝叶斯推断。通过对 S&P 500 指数成分股进行大规模实验,并结合多种变分推断优化器、多种波动率模型以及一个案例研究,我们表明变分推断是一种有吸引力、校准良好且具有竞争力的贝叶斯学习方法。”
Over-the-Air Federated Learning with Compressed Sensing: Is Sparsification Necessary?
results: 研究发现,在相同的总功率约束下,在压缩之前对原始模型更新向量进行稀疏化并非必要;直接发送稀疏化后更新向量中的非零元素即可取得更好的性能。此外,作者还发现,只进行稀疏化而不使用线性压缩,也能优于同时采用稀疏化与线性压缩的常见设置。Abstract
Over-the-Air (OtA) Federated Learning (FL) refers to an FL system where multiple agents apply OtA computation for transmitting model updates to a common edge server. Two important features of OtA computation, namely linear processing and signal-level superposition, motivate the use of linear compression with compressed sensing (CS) methods to reduce the number of data samples transmitted over the channel. The previous works on applying CS methods in OtA FL have primarily assumed that the original model update vectors are sparse, or they have been sparsified before compression. However, it is unclear whether linear compression with CS-based reconstruction is more effective than directly sending the non-zero elements in the sparsified update vectors, under the same total power constraint. In this study, we examine and compare several communication designs with or without sparsification. Our findings demonstrate that sparsification before compression is not necessary. Alternatively, sparsification without linear compression can also achieve better performance than the commonly considered setup that combines both.
摘要
“空中计算(Over-the-Air, OtA)联邦学习(FL)”指的是一种 FL 系统,其中多个代理利用 OtA 计算向共同的边缘服务器传输模型更新。OtA 计算的两个重要特性,即线性处理和信号级叠加,促使人们使用线性压缩与压缩感知(CS)方法来减少信道上传输的数据样本数。先前将 CS 方法应用于 OtA FL 的工作主要假设原始模型更新向量本身是稀疏的,或在压缩前已被稀疏化。然而,在相同的总功率约束下,使用线性压缩加 CS 重建是否比直接发送稀疏化更新向量中的非零元素更有效,这一问题尚不清楚。在本研究中,我们考察并比较了带有或不带稀疏化的多种通信设计。我们的结果表明,压缩之前的稀疏化并非必要;相反,只进行稀疏化而不做线性压缩,也能取得优于同时采用两者的常见设置的性能。
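The sketch below contrasts the two transmission designs discussed above under an equal total-power budget: (i) sending only the top-k entries of a sparsified update, and (ii) linearly compressing the sparsified update with a Gaussian measurement matrix and recovering it with orthogonal matching pursuit. Dimensions, the noise level, the power normalization, and the recovery method are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
d, k, m = 256, 16, 64            # update dimension, sparsity, number of measurements
update = rng.normal(size=d)

# Sparsify: keep the k largest-magnitude entries.
idx = np.argsort(np.abs(update))[-k:]
sparse = np.zeros(d)
sparse[idx] = update[idx]

power, noise_std = 1.0, 0.05     # total transmit power budget and channel noise (illustrative)

# Design 1: transmit the k non-zero values directly (receiver assumed to know the support).
scale1 = np.sqrt(power / np.sum(sparse[idx] ** 2))
rx1 = sparse[idx] * scale1 + noise_std * rng.normal(size=k)
rec1 = np.zeros(d)
rec1[idx] = rx1 / scale1

# Design 2: linear compression with a Gaussian matrix, then CS-style recovery (OMP).
A = rng.normal(size=(m, d)) / np.sqrt(m)
meas = A @ sparse
scale2 = np.sqrt(power / np.sum(meas ** 2))
rx2 = (meas * scale2 + noise_std * rng.normal(size=m)) / scale2
rec2 = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(A, rx2).coef_

print("direct-sparse error :", np.linalg.norm(rec1 - update))
print("compressed-CS error :", np.linalg.norm(rec2 - update))
```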
RUSOpt: Robotic UltraSound Probe Normalization with Bayesian Optimization for In-plane and Out-plane Scanning
paper_authors: Deepak Raina, Abhishek Mathur, Richard M. Voyles, Juan Wachs, SH Chandrashekhara, Subir Kumar Saha
for: 解决自主机器人超声系统在不同患者身上获取高质量图像的问题
methods: 使用基于贝叶斯优化(Bayesian Optimization)的扫描表面搜索,自动调整超声探头姿态使其垂直于接触点所在表面,从而改善探头的声学耦合
results: 实验结果表明,所提方法能够获得高质量的超声图像;在体模(phantom)和 3D 人体网格模型上的平均(±SD)绝对角度误差分别为 2.4±0.7 度和 2.1±1.3 度。Abstract
One of the significant challenges faced by autonomous robotic ultrasound systems is acquiring high-quality images across different patients. The proper orientation of the robotized probe plays a crucial role in governing the quality of ultrasound images. To address this challenge, we propose a sample-efficient method to automatically adjust the orientation of the ultrasound probe normal to the point of contact on the scanning surface, thereby improving the acoustic coupling of the probe and resulting image quality. Our method utilizes Bayesian Optimization (BO) based search on the scanning surface to efficiently search for the normalized probe orientation. We formulate a novel objective function for BO that leverages the contact force measurements and underlying mechanics to identify the normal. We further incorporate a regularization scheme in BO to handle the noisy objective function. The performance of the proposed strategy has been assessed through experiments on urinary bladder phantoms. These phantoms included planar, tilted, and rough surfaces, and were examined using both linear and convex probes with varying search space limits. Further, simulation-based studies have been carried out using 3D human mesh models. The results demonstrate that the mean ($\pm$SD) absolute angular error averaged over all phantoms and 3D models is $\boldsymbol{2.4\pm0.7^\circ}$ and $\boldsymbol{2.1\pm1.3^\circ}$, respectively.
摘要
自主机器人超声系统面临的一个重大挑战是如何在不同患者身上获取高质量图像。机器人化探头的正确姿态对超声图像质量起着关键作用。为了解决这一挑战,我们提出了一种样本高效的方法,自动调整超声探头姿态,使其沿扫描表面接触点处的法线方向,从而改善探头的声学耦合和成像质量。我们的方法利用基于贝叶斯优化(BO)的扫描表面搜索,高效地寻找法向的探头姿态。我们为 BO 设计了一个新的目标函数,利用接触力测量及其背后的力学原理来识别法向,并在 BO 中引入正则化方案以处理带噪声的目标函数。所提策略的性能已在膀胱体模上通过实验进行评估,这些体模包括平面、倾斜和粗糙表面,并使用线阵和凸阵探头在不同的搜索空间范围内进行检验;此外还基于 3D 人体网格模型开展了仿真研究。结果表明,在所有体模和 3D 模型上的平均(±SD)绝对角度误差分别为 $\boldsymbol{2.4\pm0.7^\circ}$ 和 $\boldsymbol{2.1\pm1.3^\circ}$。
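For intuition, here is a minimal Bayesian-optimization loop in the spirit of the probe-normalization search: a Gaussian-process surrogate over a single orientation angle, an expected-improvement acquisition, and a noisy objective standing in for the paper's contact-force formulation. The objective, angles, kernel, and iteration budget are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
true_normal = 12.0   # unknown surface-normal angle in degrees (toy ground truth)

def objective(angle_deg):
    # Stand-in for the force-based objective: coupling is best when the probe is
    # aligned with the surface normal; measurements are noisy.
    return -np.abs(angle_deg - true_normal) + 0.3 * rng.normal()

angles = np.array([-30.0, 0.0, 30.0])              # initial probe orientations
values = np.array([objective(a) for a in angles])
grid = np.linspace(-45, 45, 181)[:, None]

for _ in range(10):                                 # BO iterations
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=0.3 ** 2)
    gp.fit(angles[:, None], values)
    mu, sigma = gp.predict(grid, return_std=True)
    best = values.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)   # expected improvement
    next_angle = float(grid[np.argmax(ei), 0])
    angles = np.append(angles, next_angle)
    values = np.append(values, objective(next_angle))

print("estimated normal angle:", angles[np.argmax(values)])
```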
EAG-RS: A Novel Explainability-guided ROI-Selection Framework for ASD Diagnosis via Inter-regional Relation Learning
paper_authors: Wonsik Jung, Eunjin Jeon, Eunsong Kang, Heung-Il Suk
for: The paper aims to develop a novel explainability-guided region of interest (ROI) selection framework for brain disease identification using resting-state functional magnetic resonance imaging (rs-fMRI).
methods: The proposed framework includes three steps: inter-regional relation learning, explainable connection-wise relevance score estimation, and non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning. The framework leverages an explainable artificial intelligence technique to identify non-linear high-order functional associations among brain regions and select class-discriminative regions for brain disease identification.
results: The proposed method outperforms other comparative methods in terms of various evaluation metrics, and qualitative analysis of the selected ROIs identifies ASD subtypes linked to previous neuroscientific studies.Abstract
Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information while using linear low-order FC as inputs to the model, not considering individual characteristics (i.e., different symptoms or varying stages of severity) among patients with ASD, and the non-explainability of the decision process. To cover these limitations, we propose a novel explainability-guided region of interest (ROI) selection (EAG-RS) framework that identifies non-linear high-order functional associations among brain regions by leveraging an explainable artificial intelligence technique and selects class-discriminative regions for brain disease identification. The proposed framework includes three steps: (i) inter-regional relation learning to estimate non-linear relations through random seed-based network masking, (ii) explainable connection-wise relevance score estimation to explore high-order relations between functional connections, and (iii) non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning to identify ASD. We validated the effectiveness of our proposed method by conducting experiments using the Autism Brain Imaging Database Exchange (ABIDE) dataset, demonstrating that the proposed method outperforms other comparative methods in terms of various evaluation metrics. Furthermore, we qualitatively analyzed the selected ROIs and identified ASD subtypes linked to previous neuroscientific studies.
摘要
基于静息态功能磁共振成像(rs-fMRI)的深度学习模型已被广泛用于诊断脑部疾病,特别是自闭症谱系障碍(ASD)。现有研究利用 rs-fMRI 的功能连接(FC),取得了显著的分类性能。然而,这些研究存在明显的局限性:仅使用线性低阶 FC 作为模型输入导致信息不足;未考虑 ASD 患者之间的个体差异(如不同的症状或病情严重程度);以及决策过程缺乏可解释性。为弥补这些局限,我们提出了一种新的可解释性引导的感兴趣区(ROI)选择框架(EAG-RS),该框架借助可解释人工智能技术识别脑区之间的非线性高阶功能关联,并选择具有类别判别力的脑区用于脑疾病识别。该框架包括以下三步:1. 区域间关系学习,通过基于随机种子的网络掩蔽来估计脑区之间的非线性关系;2. 可解释的连接级相关性分数估计,用于探索功能连接之间的高阶关系;3. 基于非线性高阶 FC 的诊断信息 ROI 选择与分类器学习,用于识别 ASD。我们在 ABIDE(Autism Brain Imaging Data Exchange)数据集上进行实验,验证了所提方法在多种评价指标上优于其他对比方法。此外,我们对所选 ROI 进行了定性分析,发现了与先前神经科学研究相一致的 ASD 亚型。
Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning
methods: 本研究对大语言模型(LLMs)进行微调用于内容审核,并探索在私有部署场景下利用更强大的 LLM 所生成的理由(reasons)的不同处理方法。
results: 本研究发现,利用更强大的 LLM 生成的理由来微调私有部署的模型可以提升其性能,但当这些理由或答案有误时,需要针对不同的处理方式加以调整。Abstract
Nowadays, billions of people engage in communication and express their opinions on the internet daily. Unfortunately, not all of these expressions are friendly or compliant, making content moderation an indispensable task. With the successful development of Large Language Models (LLMs) in recent years, LLM-based methods have become a feasible solution for handling tasks in various domains. However, in the field of content moderation, there is still a lack of detailed work that systematically introduces implementation details. In this paper, we introduce how to fine-tune an LLM model that can be privately deployed for content moderation. Specifically, we discuss whether incorporating reasons during the fine-tuning process would be better or if it should be treated as a classification task directly. We also explore the benefits of utilizing reasons generated by more powerful LLMs for fine-tuning privately deployed models and the impact of different processing approaches when the answers generated by the more powerful LLMs are incorrect. We report the entire research process and the key findings in this paper, hoping to provide valuable experience for researchers who are fine-tuning privately deployed models in their domain-specific research.
摘要
如今,每天有数十亿人在互联网上交流并表达自己的观点。遗憾的是,并非所有表达都是友好或合规的,因此内容审核成为一项不可或缺的任务。随着近年来大语言模型(LLM)的成功发展,基于 LLM 的方法已成为处理各领域任务的可行方案。然而,在内容审核领域,仍然缺乏系统介绍实现细节的深入工作。在这篇论文中,我们介绍了如何微调一个可私有部署的 LLM 模型用于内容审核。具体而言,我们讨论了在微调过程中是否应当加入理由,还是直接将其作为分类任务处理;我们还探讨了利用更强大 LLM 生成的理由来微调私有部署模型的收益,以及当更强大 LLM 给出的答案有误时,不同处理方式所产生的影响。我们在本文中报告了完整的研究过程和关键发现,希望为在各自领域研究中微调私有部署模型的研究人员提供有价值的经验。
Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein
paper_authors: Hugues Van Assel, Cédric Vincent-Cuaz, Titouan Vayer, Rémi Flamary, Nicolas Courty
for: simultaneous reduction of both sample and feature sizes
methods: semi-relaxed Gromov-Wasserstein optimal transport (OT) problem
results: competitive hard clustering and summarization of real data
for: 这种方法用于同时减少样本和特征数量
methods: 使用半松弛 Gromov-Wasserstein 最优传输(OT)问题
results: 竞争性强的硬聚类以及对真实数据的摘要概括Abstract
We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
摘要
我们提出了一种对现有维度减少(DR)目标的通用改造,可同时减少样本数量与特征数量。输入样本与嵌入样本之间的对应关系通过一个半松弛的 Gromov-Wasserstein 最优传输(OT)问题来计算。当嵌入样本数量与输入相同时,我们的模型可以恢复出经典的流行 DR 模型;当嵌入维度不受约束时,我们表明 OT 计划能够给出有竞争力的硬聚类。我们强调融合 DR 与聚类的中间阶段对于概括真实数据的重要性,并将我们的方法应用于图像数据集的可视化。
Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations
paper_authors: Lorenc Kapllani, Long Teng, Matthias Rottmann
for: This paper aims to study uncertainty quantification (UQ) for deep learning-based numerical schemes used to solve high-dimensional backward stochastic differential equations (BSDEs).
methods: The paper uses a UQ model that efficiently estimates the standard deviation (STD) of the approximate solution using only a single run of the algorithm, as well as estimates the mean of the approximate solution.
results: The numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes, and can identify hyperparameter values for which the scheme achieves good approximations. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values.Abstract
Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently raised plenty of scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.
摘要
用于求解高维倒向随机微分方程(BSDE)的基于深度学习的数值方法近来引起了大量科研关注。虽然这些方法能够逼近维度极高的 BSDE,但其可靠性尚未得到研究,因而也未被充分理解。在这项工作中,我们研究了一类基于深度学习的 BSDE 求解方法的不确定性量化(UQ)。具体而言,我们梳理了这些方法中涉及的不确定性来源,并通过数值实验研究了不同来源的影响。通常的做法是在不同数据集上多次运行算法,计算近似解的标准差(STD)来刻画不确定性,但这种方式计算代价很高,尤其是在高维问题中。因此,我们开发了一个 UQ 模型,仅需运行一次算法即可高效地估计近似解的标准差;该模型还能估计近似解的均值,可用于初始化算法并改进优化过程。数值实验表明,对于所考虑的这类基于深度学习的 BSDE 求解方法,该 UQ 模型能够给出可靠的均值与标准差估计;估计得到的标准差涵盖了多种不确定性来源,证明了其量化不确定性的有效性。此外,该模型还能够基于估计的标准差比较不同求解方法的性能,并识别出使求解方法取得良好近似的超参数取值。
Machine learning the interaction network in coupled dynamical systems
results: 该模型被应用于两个动力系统:通过胡克定律相互作用的耦合粒子系统,以及耦合相位(Kuramoto)振荡器。Abstract
The study of interacting dynamical systems continues to attract research interest in various fields of science and engineering. In a collection of interacting particles, the interaction network contains information about how various components interact with one another. Inferring the information about the interaction network from the dynamics of agents is a problem of long-standing interest. In this work, we employ a self-supervised neural network model to achieve two outcomes: to recover the interaction network and to predict the dynamics of individual agents. Both these information are inferred solely from the observed trajectory data. This work presents an application of the Neural Relational Inference model to two dynamical systems: coupled particles mediated by Hooke's law interaction and coupled phase (Kuramoto) oscillators.
摘要
对相互作用动力系统的研究持续吸引着科学与工程各领域的研究兴趣。在一组相互作用的粒子中,交互网络包含了各组成部分如何相互作用的信息;如何从个体的动力学行为中推断交互网络的信息,是一个长期受到关注的问题。在这项工作中,我们采用一种自监督神经网络模型来实现两个目标:恢复交互网络,以及预测各个体的动力学演化。这两类信息都仅从观测到的轨迹数据中推断得到。本文将神经关系推断(Neural Relational Inference)模型应用于两个动力系统:通过胡克定律相互作用的耦合粒子,以及耦合相位(Kuramoto)振荡器。
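For intuition about the kind of trajectory data such a model consumes, the snippet below simulates coupled Kuramoto phase oscillators on a random interaction network; the task addressed above is to recover the adjacency matrix and predict the dynamics from trajectories like these. Network size, coupling strength, and the Euler integration scheme are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps, dt, coupling = 10, 2000, 0.01, 1.5

# Random symmetric interaction network (the object to be inferred).
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T

omega = rng.normal(1.0, 0.2, size=n)        # natural frequencies
theta = rng.uniform(0, 2 * np.pi, size=n)   # initial phases
trajectory = np.empty((steps, n))

for t in range(steps):                       # explicit Euler integration of the Kuramoto model
    phase_diff = theta[None, :] - theta[:, None]          # theta_j - theta_i
    dtheta = omega + (coupling / n) * (A * np.sin(phase_diff)).sum(axis=1)
    theta = theta + dt * dtheta
    trajectory[t] = theta

# `trajectory` (observed) and `A` (hidden) together form one training example
# for relational-inference style models.
print(trajectory.shape, A.sum() / 2, "edges")
```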
An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples
results: 实验结果表明,同时考虑心理声学因素与鲁棒性的算法在信噪比(SNR)以及人类感知实验两方面均有改善,但代价是词错误率(WER)有所上升。Abstract
Audio adversarial examples are audio files that have been manipulated to fool an automatic speech recognition (ASR) system, while still sounding benign to a human listener. Most methods to generate such samples are based on a two-step algorithm: first, a viable adversarial audio file is produced, then, this is fine-tuned with respect to perceptibility and robustness. In this work, we present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step. The RIRs are dynamically created by a neural network during the generation process to simulate a physical environment to harden our examples against transformations experienced in over-the-air attacks. We compare the different approaches in three experiments: in a simulated environment and in a realistic over-the-air scenario to evaluate the robustness, and in a human study to evaluate the perceptibility. Our algorithms considering psychoacoustics only or in addition to the robustness show an improvement in the signal-to-noise ratio (SNR) as well as in the human perception study, at the cost of an increased word error rate (WER).
摘要
音频对抗样本是指经过篡改、能够欺骗自动语音识别(ASR)系统、但在人类听众听来仍然无异常的音频文件。生成此类样本的大多数方法基于两步算法:首先生成一个可行的对抗音频文件,然后再针对可感知性和鲁棒性对其进行微调。在这项工作中,我们提出了一种集成算法,在生成步骤中即使用心理声学模型和房间冲激响应(RIR)。RIR 在生成过程中由神经网络动态生成,用于模拟物理环境,使我们的样本在空中(over-the-air)攻击所经历的信号变换下更加稳健。我们在三个实验中比较了不同方法:在仿真环境和真实的空中攻击场景中评估鲁棒性,并通过一项人类实验评估可感知性。仅考虑心理声学、或同时考虑心理声学与鲁棒性的算法,在信噪比(SNR)以及人类感知实验中均表现出改善,但代价是词错误率(WER)有所上升。
LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework
results: 通过在MiniGrid和Atari环境中的多种实验,证明了提案的探索框架的效果。Abstract
In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
摘要
本文提出了一种基于 option-critic 模型的强化学习(RL)统一探索框架。该框架学习如何集成一组多样化的探索策略,使智能体能够随时间自适应地选择最有效的探索策略,从而在每个给定任务上实现合适的探索-利用权衡。在 MiniGrid 和 Atari 环境中的多种实验验证了所提探索框架的有效性。
Probabilistic Forecasting of Day-Ahead Electricity Prices and their Volatility with LSTMs
methods: 我们使用长短期记忆(LSTM)模型预测德国-卢森堡日前电力价格,以应对这些挑战。LSTM 的循环结构使模型能够适应价格趋势,而对均值与标准差的联合预测则实现了概率式预测。
results: 借助受物理启发的方法,即超统计(superstatistics),来解释价格的统计特性,我们表明 LSTM 模型能够忠实地再现价格及其波动性。Abstract
Accurate forecasts of electricity prices are crucial for the management of electric power systems and the development of smart applications. European electricity prices have risen substantially and became highly volatile after the Russian invasion of Ukraine, challenging established forecasting methods. Here, we present a Long Short-Term Memory (LSTM) model for the German-Luxembourg day-ahead electricity prices addressing these challenges. The recurrent structure of the LSTM allows the model to adapt to trends, while the joint prediction of both mean and standard deviation enables a probabilistic prediction. Using a physics-inspired approach - superstatistics - to derive an explanation for the statistics of prices, we show that the LSTM model faithfully reproduces both prices and their volatility.
摘要
准确的电力价格预测对电力系统管理和智能应用的发展至关重要。俄罗斯入侵乌克兰后,欧洲电力价格大幅上涨且波动剧烈,对既有预测方法提出了挑战。本文介绍了一种针对德国-卢森堡日前电力价格的长短期记忆(LSTM)模型,以应对这些挑战。LSTM 的循环结构使其能够适应价格趋势,而对均值与标准差的联合预测则使预测具有概率性质。借助受物理启发的超统计方法来解释价格的统计特性,我们表明 LSTM 模型能够忠实地再现价格及其波动性。
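A minimal PyTorch sketch of the core modeling idea: an LSTM whose head outputs both a mean and a standard deviation for the day-ahead price, trained with a Gaussian negative log-likelihood so the forecast is probabilistic. Input features, window length, and layer sizes are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ProbabilisticPriceLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)        # outputs (mean, raw_std)

    def forward(self, x):                       # x: (batch, time, features)
        out, _ = self.lstm(x)
        mean, raw_std = self.head(out[:, -1]).unbind(dim=-1)
        std = nn.functional.softplus(raw_std) + 1e-4   # keep std strictly positive
        return mean, std

def gaussian_nll(mean, std, target):
    return -torch.distributions.Normal(mean, std).log_prob(target).mean()

# Toy training step on random data (placeholders for price/load/weather features).
model = ProbabilisticPriceLSTM(n_features=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 168, 5)       # one week of hourly features per sample
y = torch.randn(32)               # next-day price target (standardized)
optimizer.zero_grad()
mean, std = model(x)
loss = gaussian_nll(mean, std, y)
loss.backward()
optimizer.step()
print(float(loss))
```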
Untargeted White-box Adversarial Attack with Heuristic Defence Methods in Real-time Deep Learning based Network Intrusion Detection System
paper_authors: Khushnaseeb Roshan, Aasim Zafar, Sheikh Burhan Ul Haque
For: This research work aims to increase the robustness of Machine Learning (ML) and Deep Learning (DL) based Network Intrusion Detection Systems (NIDS) against adversarial attacks.
Methods: The research uses four powerful adversarial attack techniques (FGSM, JSMA, PGD, and C&W) to evaluate the performance of NIDS under adversarial attack situations. It also employs three heuristic defence strategies (AT, GDA, and HC) to improve the NIDS robustness.
Results: The research demonstrates the complete workflow of the proposed approach in a real-time network with data packet flow and evaluates the performance of NIDS under adversarial attacks using various performance metrics.Abstract
Network Intrusion Detection System (NIDS) is a key component in securing the computer network from various cyber security threats and network attacks. However, consider an unfortunate situation where the NIDS is itself attacked and vulnerable; more specifically, how do we defend the defender? In Adversarial Machine Learning (AML), malicious actors aim to fool Machine Learning (ML) and Deep Learning (DL) models into producing incorrect predictions with intentionally crafted adversarial examples. These adversarially perturbed examples have become the biggest vulnerability of ML and DL based systems and are major obstacles to their adoption in real-time and mission-critical applications such as NIDS. AML is an emerging research domain, and an in-depth study of adversarial attacks and their defence strategies has become a necessity to safeguard the computer network from various cyber security threats. In this research work, we aim to cover important aspects related to NIDS, adversarial attacks, and their defence mechanisms to increase the robustness of ML and DL based NIDS. We implemented four powerful adversarial attack techniques, namely, Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) in NIDS, and analyzed their performance in terms of various performance metrics in detail. Furthermore, three heuristic defence strategies, i.e., Adversarial Training (AT), Gaussian Data Augmentation (GDA) and High Confidence (HC), are implemented to improve the NIDS robustness under adversarial attack situations. The complete workflow is demonstrated in a real-time network with data packet flow. This research work provides the overall background for researchers interested in AML and its implementation from a computer network security point of view.
摘要
在这项研究中,我们将探讨与 NIDS 相关的重要方面,包括对抗攻击及其防御机制,以提高基于机器学习(ML)和深度学习(DL)的 NIDS 的鲁棒性。我们在 NIDS 中实现了四种强大的对抗攻击技术,分别是 Fast Gradient Sign Method (FGSM)、Jacobian Saliency Map Attack (JSMA)、Projected Gradient Descent (PGD) 和 Carlini & Wagner (C&W),并基于多种性能指标对其进行了详细分析。此外,我们还实现了三种启发式防御策略,即对抗训练(AT)、高斯数据增强(GDA)和高置信度(HC),以提升 NIDS 在对抗攻击情形下的鲁棒性。完整的工作流程在具有数据包流的实时网络中进行了演示。本研究从计算机网络安全的角度,为对 AML 及其实现感兴趣的研究者提供了全面的背景。
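Of the four attacks evaluated, FGSM is the simplest to state. The PyTorch sketch below shows an untargeted FGSM perturbation of a feature vector against a generic classifier; the model, feature scaling, and epsilon are illustrative and not tied to any particular NIDS dataset.

```python
import torch
import torch.nn as nn

def fgsm_attack(model, x, y, epsilon: float):
    """Untargeted FGSM: take one step in the direction that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        x_adv = x_adv.clamp(0.0, 1.0)     # keep features in their valid (scaled) range
    return x_adv.detach()

# Toy NIDS-like classifier over 20 scaled flow features, 2 classes (benign / attack).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
x = torch.rand(8, 20)
y = torch.randint(0, 2, (8,))
x_adv = fgsm_attack(model, x, y, epsilon=0.05)
print("max perturbation:", (x_adv - x).abs().max().item())
```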
Fine-tune Language Models to Approximate Unbiased In-context Learning
results: 通过在数值数据集上的实验验证了算法性能;与包括基于随意提示的上下文学习和经典微调方法在内的基准相比,我们的算法取得了显著的性能提升。Abstract
In-context learning (ICL) is an astonishing emergent ability of large language models (LLMs). By presenting a prompt that includes multiple input-output pairs as examples and introducing a new query input, models can generate the corresponding output. However, the performance of models heavily relies on the quality of the input prompt when implementing in-context learning. Biased or imbalanced input prompts can significantly degrade the performance of language models. To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning). This algorithm fine-tunes language models using an unbiased validation set to determine the optimal weight for each input-output example to approximate unbiased in-context learning. Furthermore, we also introduce a low-cost reweighted algorithm, a linear optimal weight approximation algorithm called LARICL (Linear Approximation of Reweighted In-context Learning). This algorithm requires minimal training cost while providing effective results. We prove the convergence of our algorithm and validate its performance through experiments conducted on a numerical dataset. The experimental findings reveal a substantial improvement in comparison to benchmarks including the performance of casual prompt-based in-context learning and the performance of a classic fine-tuning method.
摘要
上下文学习(ICL)是大语言模型(LLM)一种惊人的涌现能力:通过在提示中给出多个输入-输出示例并引入一个新的查询输入,模型即可生成相应的输出。然而,实施上下文学习时,模型的性能在很大程度上取决于输入提示的质量,有偏或不平衡的输入提示会显著降低语言模型的性能。为解决这一问题,我们提出了一种名为 RICL(Reweighted In-context Learning,重加权上下文学习)的重加权算法。该算法利用一个无偏的验证集对语言模型进行微调,为每个输入-输出示例确定最优权重,以逼近无偏的上下文学习。此外,我们还提出了一种低成本的重加权算法,即线性最优权重近似算法 LARICL(Linear Approximation of Reweighted In-context Learning),它只需极少的训练成本即可取得有效的结果。我们证明了算法的收敛性,并通过在数值数据集上的实验验证了其性能。实验结果表明,与包括基于随意提示的上下文学习和经典微调方法在内的基准相比,我们的方法取得了显著提升。
BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs
results: 实验结果表明,BioBridge 在跨模态检索任务中平均比最佳基线 KG 嵌入方法高出约 76.3%,并且展现出对未见过的模态或关系的跨域泛化能力。此外,BioBridge 还可作为通用检索器,辅助生物医学多模态问答以及新药的引导生成。Abstract
Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we also show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.
摘要
基础模型(FM)能够利用大量未标注数据,在广泛的任务上取得优异表现。然而,面向生物医学领域开发的 FM 大多仍是单模态的,即分别针对蛋白质序列、小分子结构或临床数据独立训练和使用。为克服生物医学 FM 的这一局限,我们提出了 BioBridge,一种新颖的参数高效学习框架,用于桥接独立训练的单模态 FM,以建立多模态行为。BioBridge 通过知识图(KG)学习一个单模态 FM 到另一个单模态 FM 之间的变换,而无需微调任何底层的单模态 FM。我们的实验结果表明,BioBridge 在跨模态检索任务中平均比最佳基线 KG 嵌入方法高出约 76.3%。我们还发现 BioBridge 具有跨域泛化能力,能够外推到未见过的模态或关系。此外,我们还展示了 BioBridge 可以作为通用检索器,辅助生物医学多模态问答,并增强新药的引导生成。
results: 作者通过实验表明,RES 能够有效提升 GCL 模型的鲁棒性,并且这种认证鲁棒性可以被可证明地保持到下游任务中。Abstract
Graph Contrastive Learning (GCL) has emerged as a popular unsupervised graph representation learning method. However, it has been shown that GCL is vulnerable to adversarial attacks on both the graph structure and node attributes. Although empirical approaches have been proposed to enhance the robustness of GCL, the certifiable robustness of GCL is still remain unexplored. In this paper, we develop the first certifiably robust framework in GCL. Specifically, we first propose a unified criteria to evaluate and certify the robustness of GCL. We then introduce a novel technique, RES (Randomized Edgedrop Smoothing), to ensure certifiable robustness for any GCL model, and this certified robustness can be provably preserved in downstream tasks. Furthermore, an effective training method is proposed for robust GCL. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed method in providing effective certifiable robustness and enhancing the robustness of any GCL model. The source code of RES is available at https://github.com/ventr1c/RES-GCL.
摘要
图对比学习(GCL)已成为一种流行的无监督图表示学习方法。然而,已有研究表明,GCL 容易受到针对图结构和节点属性的对抗攻击。尽管已有经验性方法被提出用于增强 GCL 的鲁棒性,但 GCL 的可认证鲁棒性仍未被探索。本文构建了 GCL 的第一个可认证鲁棒框架。具体而言,我们首先提出了一套统一的准则,用于评估和认证 GCL 的鲁棒性;随后引入了一种新技术 RES(Randomized Edgedrop Smoothing,随机删边平滑),为任意 GCL 模型提供可认证的鲁棒性,并且这种认证鲁棒性可以被可证明地保持到下游任务中。此外,我们还为鲁棒 GCL 提出了一种有效的训练方法。在真实数据集上的大量实验表明,所提方法能够提供有效的可认证鲁棒性,并增强任意 GCL 模型的鲁棒性。RES 的源代码见 https://github.com/ventr1c/RES-GCL。
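The core randomization in RES can be illustrated in a few lines of numpy: repeatedly drop edges at random, run a frozen model on each perturbed graph, and take a majority vote, so the final prediction is stable under small structural perturbations. The `predict` function is a stand-in for a GCL encoder plus downstream classifier, and the plain voting here is a simplification of the paper's certification procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def edgedrop(adj: np.ndarray, drop_prob: float) -> np.ndarray:
    """Randomly delete each undirected edge with probability drop_prob."""
    upper = np.triu(adj, 1)
    kept = upper * (rng.random(upper.shape) >= drop_prob)
    return kept + kept.T

def predict(adj, features):
    # Stand-in classifier (assumption): one propagation step plus a fixed decision rule.
    h = adj @ features + features
    return int(h.sum() > 0)

def smoothed_predict(adj, features, drop_prob=0.3, n_samples=100):
    votes = [predict(edgedrop(adj, drop_prob), features) for _ in range(n_samples)]
    return int(np.mean(votes) > 0.5)          # majority vote over randomized graphs

adj = (rng.random((12, 12)) < 0.2).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T
features = rng.normal(size=(12, 4))
print("smoothed prediction:", smoothed_predict(adj, features))
```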
Deep Variational Multivariate Information Bottleneck – A Framework for Variational Losses
paper_authors: Eslam Abdelaleem, Ilya Nemenman, K. Michael Martini
for: 这种方法的目的是用信息论来推导和概括现有的维度减少方法,并设计新的方法。
methods: 该方法基于对多变量信息瓶颈的一种解释,其中两个贝叶斯网络相互权衡:第一个是编码图,指定压缩数据时需要保留哪些信息;第二个是解码图,指定数据的生成模型。通过这种解释,我们可以重新推导出现有的维度减少方法,包括深度变分信息瓶颈(DVIB)、β-VAE 和深度变分典型相关分析(DVCCA)。此外,我们还推导出一种新的维度减少方法,深度变分对称信息瓶颈(DVSIB),它同时压缩两个变量,以保留二者压缩表示之间的信息。
results: 我们实现了所有这些算法,并在一个修改后的带噪 MNIST 数据集上进行评估。结果显示,与数据结构更匹配的算法(β-DVCCA 和 DVSIB)能够产生更好的低维潜空间,以分类准确率和潜变量维度来衡量。我们认为这一框架可以用来统一其他多视图表示学习算法,并提供一个直接的框架来推导针对具体问题的损失函数。Abstract
Variational dimensionality reduction methods are known for their high accuracy, generative abilities, and robustness. These methods have many theoretical justifications. Here we introduce a unifying principle rooted in information theory to rederive and generalize existing variational methods and design new ones. We base our framework on an interpretation of the multivariate information bottleneck, in which two Bayesian networks are traded off against one another. We interpret the first network as an encoder graph, which specifies what information to keep when compressing the data. We interpret the second network as a decoder graph, which specifies a generative model for the data. Using this framework, we rederive existing dimensionality reduction methods such as the deep variational information bottleneck (DVIB), beta variational auto-encoders (beta-VAE), and deep variational canonical correlation analysis (DVCCA). The framework naturally introduces a trade-off parameter between compression and reconstruction in the DVCCA family of algorithms, resulting in the new beta-DVCCA family. In addition, we derive a new variational dimensionality reduction method, deep variational symmetric informational bottleneck (DVSIB), which simultaneously compresses two variables to preserve information between their compressed representations. We implement all of these algorithms and evaluate their ability to produce shared low dimensional latent spaces on a modified noisy MNIST dataset. We show that algorithms that are better matched to the structure of the data (beta-DVCCA and DVSIB) produce better latent spaces as measured by classification accuracy and the dimensionality of the latent variables. We believe that this framework can be used to unify other multi-view representation learning algorithms. Additionally, it provides a straightforward framework for deriving problem-specific loss functions.
摘要
“变分维度减少方法以其高精度、生成能力和鲁棒性著称,并有许多理论依据。在这篇文章中,我们提出一个根植于信息论的统一原理,用以重新推导并推广现有的变分方法,并设计新的方法。我们的框架基于对多变量信息瓶颈的一种解释,其中两个贝叶斯网络相互权衡:第一个网络解释为编码图,决定压缩数据时需要保留哪些信息;第二个网络解释为解码图,给出数据的生成模型。利用这个框架,我们重新推导出现有的维度减少方法,如深度变分信息瓶颈(DVIB)、β-VAE 和深度变分典型相关分析(DVCCA)。该框架自然地在 DVCCA 系列算法中引入了一个压缩与重构之间的权衡参数,从而产生了新的 β-DVCCA 系列。此外,我们还推导出一种新的变分维度减少方法,深度变分对称信息瓶颈(DVSIB),它同时压缩两个变量,以保留二者压缩表示之间的信息。我们实现了所有这些算法,并在一个修改后的带噪 MNIST 数据集上评估它们生成共享低维潜空间的能力。结果表明,与数据结构更匹配的算法(β-DVCCA 和 DVSIB)产生了更好的潜空间,以分类准确率和潜变量维度来衡量。我们相信这个框架可以用来统一其他多视图表示学习算法,并为推导针对具体问题的损失函数提供一个直接的框架。”
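To show the compression-reconstruction trade-off that the framework exposes (e.g., the beta in the beta-DVCCA family), the sketch below writes a familiar variational-IB / beta-VAE style objective: a reconstruction term plus beta times a KL term that compresses the latent. Encoder and decoder sizes are placeholders, and this is only one simple member of the family derived above, not the full DVSIB model.

```python
import torch
import torch.nn as nn

class GaussianEncoder(nn.Module):
    def __init__(self, d_in, d_z):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, 2 * d_z))

    def forward(self, x):
        mu, logvar = self.net(x).chunk(2, dim=-1)
        return mu, logvar

def beta_ib_loss(x, encoder, decoder, beta: float):
    mu, logvar = encoder(x)
    std = torch.exp(0.5 * logvar)
    z = mu + std * torch.randn_like(std)               # reparameterization trick
    recon = decoder(z)
    recon_term = ((recon - x) ** 2).sum(dim=-1).mean()
    kl_term = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
    return recon_term + beta * kl_term                 # beta trades compression vs. reconstruction

encoder = GaussianEncoder(d_in=20, d_z=4)
decoder = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 20))
x = torch.randn(64, 20)
print(float(beta_ib_loss(x, encoder, decoder, beta=4.0)))
```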
Learning Energy Decompositions for Partial Inference of GFlowNets
paper_authors: Hyosoon Jang, Minsu Kim, Sungsoo Ahn
for: This paper aims to improve Generative Flow Networks (GFlowNets) for sampling objects from the Boltzmann energy distribution using partial inference.
methods: The paper proposes a novel approach called Learning Energy Decompositions for GFlowNets (LED-GFN), which decomposes the energy of an object into learnable potential functions defined on state transitions and reparameterizes the flow functions using these potential functions.
results: The proposed LED-GFN method is empirically verified to be superior to traditional GFlowNets in five problems, including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences.Abstract
This paper studies generative flow networks (GFlowNets) to sample objects from the Boltzmann energy distribution via a sequence of actions. In particular, we focus on improving GFlowNet with partial inference: training flow functions with the evaluation of the intermediate states or transitions. To this end, the recently developed forward-looking GFlowNet reparameterizes the flow functions based on evaluating the energy of intermediate states. However, such an evaluation of intermediate energies may (i) be too expensive or impossible to evaluate and (ii) even provide misleading training signals under large energy fluctuations along the sequence of actions. To resolve this issue, we propose learning energy decompositions for GFlowNets (LED-GFN). Our main idea is to (i) decompose the energy of an object into learnable potential functions defined on state transitions and (ii) reparameterize the flow functions using the potential functions. In particular, to produce informative local credits, we propose to regularize the potential to change smoothly over the sequence of actions. It is also noteworthy that training GFlowNet with our learned potential can preserve the optimal policy. We empirically verify the superiority of LED-GFN in five problems including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences.
摘要
LED-GFN 的主要思想是将对象的能量分解为定义在状态转移上的可学习势函数,并利用这些势函数对流函数进行重参数化。为了产生有信息量的局部信用,作者对势函数施加正则化,使其沿动作序列平滑变化;同时还证明了使用学得的势函数训练 GFlowNet 能够保持最优策略。论文在五个问题上进行了评估,包括无结构集合与最大独立集的生成、分子图生成以及 RNA 序列生成,结果表明 LED-GFN 优于传统的 GFlowNet。
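A minimal sketch of the decomposition idea: a small network assigns a learnable potential to each state transition, the potentials along a trajectory are encouraged to sum to the terminal energy, and a smoothness penalty discourages abrupt changes between consecutive potentials. The trajectory encoding, energy value, and loss weights are placeholders; the GFlowNet training loop that would consume these local credits is omitted.

```python
import torch
import torch.nn as nn

potential_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(potential_net.parameters(), lr=1e-3)
smooth_weight = 0.1

def decomposition_loss(transition_feats, terminal_energy):
    # transition_feats: (T, 8) features of each state transition along one trajectory.
    phi = potential_net(transition_feats).squeeze(-1)      # per-transition potentials
    fit = (phi.sum() - terminal_energy) ** 2               # potentials should sum to the energy
    smooth = ((phi[1:] - phi[:-1]) ** 2).mean()            # regularize local credits to vary smoothly
    return fit + smooth_weight * smooth

# Toy trajectory: 10 transitions with random features and a scalar terminal energy.
transition_feats = torch.randn(10, 8)
terminal_energy = torch.tensor(3.2)
optimizer.zero_grad()
loss = decomposition_loss(transition_feats, terminal_energy)
loss.backward()
optimizer.step()
print(float(loss))
```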
A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling
methods: 使用潜变量高斯过程(Latent Variable Gaussian Process)将不同保真度的模型映射到一个可解释的潜空间中,以捕捉它们之间的相关性,而无需假设保真度之间存在层级关系;并在每次加点采样迭代中利用预后验分析确定下一个样本的最佳位置与保真度。
results: 在测试问题上,所提方法在多保真度全局拟合(GF)与贝叶斯优化(BO)问题中的收敛速度和鲁棒性均优于基准方法,并且只需更换采集函数即可在 GF 与 BO 之间灵活切换。Abstract
Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in the exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and utilize it to quantify the value of the future samples and navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximize the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.
摘要
多保真度(MF)方法通过融合来自多种低保真度(LF)模型的数据,在增强代理建模与设计优化方面日益受到欢迎。现有 MF 方法大多假设数据集是固定的,而能够在不同保真度模型之间动态分配资源的自适应采样方法,可以在探索与利用设计空间时取得更高的效率。然而,大多数现有 MF 方法要么依赖保真度层级假设,要么未能刻画多个保真度之间的相互关联并加以利用,以量化未来样本的价值并指导自适应采样。为解决这一难题,我们提出了一个框架,基于不同保真度模型的潜在嵌入及相应的预后验分析,显式利用它们之间的相关性进行自适应采样。在该框架中,每次加点采样迭代包括两步:首先利用高保真度(HF)模型确定最具改进潜力的位置;然后在所有保真度层级中搜索在该位置上单位成本改进量最大的下一个样本。这得益于单一的潜变量高斯过程(LVGP)模型,它将不同保真度模型映射到一个可解释的潜空间以捕捉其相关性,而无需假设保真度层级。LVGP 使我们能够通过预后验分析评估低保真度候选样本将如何影响高保真度响应,并确定收益-成本比最佳的下一个样本。通过测试算例,我们表明所提方法在多保真度全局拟合(GF)与贝叶斯优化(BO)问题中的收敛速度和鲁棒性均优于基准方法;此外,只需更换采集函数即可在 GF 与 BO 之间灵活切换。
Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations
results: 实验结果显示,传统的恶意软件检测模型在对抗威胁面前效果不佳;但通过消除易变信息,可以大幅缩小攻击面。为此,我们提出了一些简单而有效的方法来缓解二进制篡改攻击的影响。总体而言,基于图的恶意软件检测方案取得了 88.32% 的 AUC 得分,并在组合二进制篡改攻击下取得 88.19% 的得分。Abstract
Toward robust malware detection, we explore the attack surface of existing malware detection systems. We conduct root-cause analyses of the practical binary-level black-box adversarial malware examples. Additionally, we uncover the sensitivity of volatile features within the detection engines and exhibit their exploitability. Highlighting volatile information channels within the software, we introduce three software pre-processing steps to eliminate the attack surface, namely, padding removal, software stripping, and inter-section information resetting. Further, to counter the emerging section injection attacks, we propose a graph-based section-dependent information extraction scheme for software representation. The proposed scheme leverages aggregated information within various sections in the software to enable robust malware detection and mitigate adversarial settings. Our experimental results show that traditional malware detection models are ineffective against adversarial threats. However, the attack surface can be largely reduced by eliminating the volatile information. Therefore, we propose simple-yet-effective methods to mitigate the impacts of binary manipulation attacks. Overall, our graph-based malware detection scheme can accurately detect malware with an area under the curve score of 88.32\% and a score of 88.19% under a combination of binary manipulation attacks, exhibiting the efficiency of our proposed scheme.
摘要
为了实现鲁棒的恶意软件检测,我们研究了现有恶意软件检测系统的攻击面,并对实际的二进制级黑盒对抗恶意软件样本进行了根因分析。同时,我们揭示了检测引擎中易变特征的敏感性,并证明其可被利用。针对软件中的易变信息通道,我们提出了三个软件预处理步骤来消除攻击面,即填充(padding)移除、软件剥离(stripping)和节区间信息重置。此外,为了对抗新出现的节区注入攻击,我们提出了一种基于图的节区依赖信息提取方案用于软件表示。该方案利用软件中各节区内的聚合信息,以实现鲁棒的恶意软件检测并缓解对抗场景。实验结果表明,传统的恶意软件检测模型对对抗威胁无效;但通过消除易变信息,可以大幅缩小攻击面。因此,我们提出了一些简单而有效的方法来缓解二进制篡改攻击的影响。总体而言,我们基于图的恶意软件检测方案能够准确检测恶意软件,AUC 得分为 88.32%,并在组合二进制篡改攻击下取得 88.19% 的得分,体现了所提方案的有效性。
Mitigating Pilot Contamination and Enabling IoT Scalability in Massive MIMO Systems
paper_authors: Muhammad Kamran Saeed, Ahmed E. Kamal, Ashfaq Khokhar
for: 这篇论文关注大规模 MIMO 系统中的导频污染和可扩展性问题。
methods: 该论文提出了一种新的导频分配方案,基于设备的数据传输模式,将导频序列分配给设备簇而不是单个设备;此外,还将导频分配问题表述为图着色问题,并使用最大 k 割(max k-cut)图划分方法来缓解多小区场景下的导频污染。
results: 该论文表明,所提方案能够显著改善大规模 MIMO 系统的频谱效率并提升可扩展性;例如,使用十个正交导频序列即可容纳 200 个设备,漏检率仅为 12.5%。Abstract
Massive MIMO is expected to play an important role in the development of 5G networks. This paper addresses the issue of pilot contamination and scalability in massive MIMO systems. The current practice of reusing orthogonal pilot sequences in adjacent cells leads to difficulty in differentiating incoming inter- and intra-cell pilot sequences. One possible solution is to increase the number of orthogonal pilot sequences, which results in dedicating more space of coherence block to pilot transmission than data transmission. This, in turn, also hinders the scalability of massive MIMO systems, particularly in accommodating a large number of IoT devices within a cell. To overcome these challenges, this paper devises an innovative pilot allocation scheme based on the data transfer patterns of IoT devices. The scheme assigns orthogonal pilot sequences to clusters of devices instead of individual devices, allowing multiple devices to utilize the same pilot for periodically transmitting data. Moreover, we formulate the pilot assignment problem as a graph coloring problem and use the max k-cut graph partitioning approach to overcome the pilot contamination in a multicell massive MIMO system. The proposed scheme significantly improves the spectral efficiency and enables the scalability of massive MIMO systems; for instance, by using ten orthogonal pilot sequences, we are able to accommodate 200 devices with only a 12.5% omission rate.
摘要
大规模 MIMO 预计将在 5G 网络的发展中扮演重要角色。本文针对大规模 MIMO 系统中的导频污染与可扩展性问题展开研究。当前在相邻小区中复用正交导频序列的做法,使得难以区分来自小区内与小区间的导频序列;一种可能的解决方案是增加正交导频序列的数量,但这会使相干块中更多的资源被用于导频传输而非数据传输,进而也限制了大规模 MIMO 系统的可扩展性,尤其是在单个小区内容纳大量物联网设备时。为克服这些挑战,本文基于物联网设备的数据传输模式设计了一种创新的导频分配方案:将正交导频序列分配给设备簇而不是单个设备,使多个设备可以共用同一导频周期性地传输数据。此外,我们将导频分配问题表述为图着色问题,并使用最大 k 割(max k-cut)图划分方法来克服多小区大规模 MIMO 系统中的导频污染。所提方案显著提升了频谱效率,并实现了大规模 MIMO 系统的可扩展性;例如,使用十个正交导频序列即可容纳 200 个设备,漏检率仅为 12.5%。
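The graph-coloring view of pilot assignment can be prototyped directly with networkx: build a conflict graph over device clusters, then greedily color it so that clusters sharing an edge (i.e., likely to contaminate each other) receive different orthogonal pilots. The conflict criterion and sizes below are illustrative, and the paper formulates the multi-cell case as max k-cut rather than the plain greedy coloring used here.

```python
import networkx as nx
import numpy as np

rng = np.random.default_rng(0)
n_clusters, n_pilots = 20, 10

# Conflict graph: an edge means two clusters should not reuse the same pilot
# (e.g., they sit in neighboring cells or have overlapping transmission schedules).
G = nx.Graph()
G.add_nodes_from(range(n_clusters))
for i in range(n_clusters):
    for j in range(i + 1, n_clusters):
        if rng.random() < 0.25:
            G.add_edge(i, j)

coloring = nx.greedy_color(G, strategy="largest_first")
pilots_used = len(set(coloring.values()))
print("pilots needed:", pilots_used, "of", n_pilots, "available")
print("cluster -> pilot:", coloring)
```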
Fragment-based Pretraining and Finetuning on Molecular Graphs
results: 研究发现,GraphFP 在 8 个常见分子基准中的 5 个上提升了性能,并在长程生物基准上将性能提升至少 11.5%。Abstract
Property prediction on molecular graphs is an important application of Graph Neural Networks. Recently, unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain. In this work, we propose pretraining GNNs at the fragment level, a promising middle ground to overcome the limitations of node-level and graph-level pretraining. Borrowing techniques from recent work on principal subgraph mining, we obtain a compact vocabulary of prevalent fragments from a large pretraining dataset. From the extracted vocabulary, we introduce several fragment-based contrastive and predictive pretraining tasks. The contrastive learning task jointly pretrains two different GNNs: one on molecular graphs and the other on fragment graphs, which represents higher-order connectivity within molecules. By enforcing consistency between the fragment embedding and the aggregated embedding of the corresponding atoms from the molecular graphs, we ensure that the embeddings capture structural information at multiple resolutions. The structural information of fragment graphs is further exploited to extract auxiliary labels for graph-level predictive pretraining. We employ both the pretrained molecular-based and fragment-based GNNs for downstream prediction, thus utilizing the fragment information during finetuning. Our graph fragment-based pretraining (GraphFP) advances the performances on 5 out of 8 common molecular benchmarks and improves the performances on long-range biological benchmarks by at least 11.5%. Code is available at: https://github.com/lvkd84/GraphFP.
摘要
“分子图上的性质预测是图神经网络(GNN)的一项重要应用。近来,无标注分子数据日益丰富,推动了化学领域中 GNN 自监督学习的快速发展。在这项工作中,我们提出在片段(fragment)层面对 GNN 进行预训练,这是克服节点级与图级预训练局限性的一个有前景的折中方案。借鉴近期主子图挖掘方面的技术,我们从大规模预训练数据集中提取出常见片段的紧凑词表,并在此基础上引入若干基于片段的对比式与预测式预训练任务。对比学习任务联合预训练两个不同的 GNN:一个作用于分子图,另一个作用于表示分子内部高阶连接性的片段图;通过强制片段嵌入与分子图中相应原子的聚合嵌入保持一致,我们确保嵌入能够在多个分辨率上捕捉结构信息。片段图的结构信息还被进一步用来提取辅助标签,用于图级预测式预训练。在下游预测中,我们同时使用预训练好的分子级与片段级 GNN,从而在微调阶段利用片段信息。我们的基于图片段的预训练方法(GraphFP)在 8 个常见分子基准中的 5 个上提升了性能,并在长程生物基准上将性能提升至少 11.5%。代码见:https://github.com/lvkd84/GraphFP。”
UniPredict: Large Language Models are Universal Tabular Predictors
results: 实验表明,在包含 169 个具有不同目标列的表格数据集上,UniPredict 模型相比最佳树提升(tree-boosting)基线和最佳神经网络基线分别取得 5.4% 到 13.4% 的优势。此外,UniPredict 在另外 62 个表格数据集上的少样本学习中也表现出色,在低资源设置下超过 XGBoost 100% 以上,并对所有基线都显示出显著的领先。Abstract
Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and operate under the assumption of a fixed target column, necessitating re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper exploits the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. Here, we show that scaling up an LLM to extensive tabular datasets with the capability of comprehending diverse tabular inputs and predicting for target variables following the input instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe this versatile UniPredict model demonstrates an advantage over other models, ranging from 5.4% to 13.4%, when compared with the best tree-boosting baseline and the best neural network baseline, respectively. We further test UniPredict in few-shot learning settings on another 62 tabular datasets. Our method achieves strong performance in quickly adapting to new tasks, where our method outperforms XGBoost over 100% on the low-resource setup and shows a significant margin over all baselines. We envision that UniPredict sheds light on developing a universal tabular data prediction system that learns from data at scale and serves a wide range of prediction tasks.
摘要
表格数据预测是许多应用中的一项基础机器学习任务。现有方法主要采用判别式建模,并假设目标列固定,因而每遇到新的预测任务都需要重新训练。受大语言模型(LLM)生成能力的启发,本文探索了基于生成式建模构建通用表格数据预测器的想法,即 UniPredict。我们表明,可以将一个 LLM 扩展到大量表格数据集上,使其具备理解多样化表格输入并按照输入指令预测目标变量的能力。具体而言,我们在汇集了 169 个具有不同目标的表格数据集上训练单一的 LLM,并与在每个数据集上单独训练的基线进行比较。结果显示,这个通用的 UniPredict 模型相比最佳树提升基线和最佳神经网络基线分别高出 5.4% 到 13.4%。我们还在另外 62 个表格数据集上的少样本学习设置中测试了 UniPredict,它能够快速适应新任务,在低资源设置下超过 XGBoost 100% 以上,并对所有基线都显示出显著优势。我们期望 UniPredict 能为开发一种从大规模数据中学习、服务于广泛预测任务的通用表格数据预测系统提供启示。
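The key data-engineering step behind universal tabular predictors of this kind is serializing a row, its column metadata, and the target instruction into natural-language text. The sketch below shows one plausible serialization; the actual prompt template, metadata fields, and output format used by UniPredict are not given in the abstract and should be treated as assumptions.

```python
def serialize_example(schema: dict, row: dict, target_column: str) -> str:
    """Turn one table row plus column metadata into an instruction-style prompt."""
    meta = "; ".join(f"{col}: {desc}" for col, desc in schema.items())
    features = "; ".join(f"{col} = {val}" for col, val in row.items()
                         if col != target_column)
    return (
        f"Column descriptions: {meta}\n"
        f"Example: {features}\n"
        f"Question: what is the value of '{target_column}' for this example?\n"
        f"Answer:"
    )

# Hypothetical table describing medical insurance records.
schema = {"age": "age in years", "bmi": "body mass index",
          "smoker": "yes/no", "charges": "annual medical charges (USD)"}
row = {"age": 45, "bmi": 27.3, "smoker": "no", "charges": 8200.0}
print(serialize_example(schema, row, target_column="charges"))
```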
Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets
paper_authors: Song Wei, Xiangrui Kong, Sarah A Huestis-Mitchell, Shixiang Zhu, Yao Xie, Alinson Santos Xavier, Feng Qiu
for: The paper is written to address the challenges of identifying systematic biases in the energy sector, particularly in low-income and elderly-populated areas, using a novel approach for counterfactual causal analysis centered on energy justice.
methods: The paper uses subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup.
results: The paper finds that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions, highlighting existing biases in the power system and the need for focused improvements in areas with economic challenges.
results: 论文发现,无论天气条件如何,低收入和老年人口密集地区始终经历更长时间的停电,这揭示了电力系统中现存的偏差,并突出了在经济困难地区进行针对性改进的必要性。Abstract
Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in treatment effects, and limited data availability. To address these challenges, we introduce a novel approach for counterfactual causal analysis centered on energy justice. We use subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup. In our numerical analysis, we apply our method to a large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions. This points to existing biases in the power system and highlights the need for focused improvements in areas with economic challenges.
摘要
能源正义是跨学科能源研究中日益受到关注的领域。然而,由于混杂变量、复杂的处理效应异质性以及数据的有限性,识别能源领域中的系统性偏差仍然颇具挑战。为应对这些挑战,我们提出了一种以能源正义为核心的新型反事实因果分析方法。我们使用子群分析来处理多样化的影响因素,并借助迁移学习的思想缓解各子群中的数据稀缺问题。在数值分析中,我们将该方法应用于一个大规模的用户级停电数据集,研究收入与人口年龄等人口统计因素对停电持续时间的反事实影响。结果表明,无论天气条件如何,低收入和老年人口密集地区始终经历更长时间的停电。这指向了电力系统中现存的偏差,并突出了在经济困难地区进行针对性改进的必要性。
results: 实验表明,我们提出的模型在多个基准分子设计任务上取得了最先进(state-of-the-art)的性能。Abstract
This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state of the art performances on several benchmark molecule design tasks.
摘要
本文提出了一种潜在提示 Transformer(latent prompt Transformer)模型,用于求解分子设计等具有挑战性的优化问题,其目标是找到在某一可由现有软件计算的化学或生物性质上取值最优的分子。该模型由三个部分组成:(1)一个潜向量,其先验分布由高斯白噪声向量经 UNet 变换建模;(2)一个分子生成模型,在给定(1)中潜向量的条件下生成分子的字符串表示,这里采用以该潜向量作为提示的因果 Transformer;(3)一个性质预测模型,通过对(1)中潜向量进行非线性回归来预测分子目标性质的数值。在已有分子及其性质值上完成初始训练后,我们逐步将模型分布移向支持目标性质理想取值的区域,以实现分子设计。实验表明,所提模型在多个基准分子设计任务上取得了最先进的性能。
Relational Convolutional Networks: A framework for learning representations of hierarchical relations
results: 我们介绍了该架构的动机与细节,并通过一系列实验证明关系卷积网络能够为具有层次结构的关系任务提供有效的建模框架。Abstract
A maturing area of research in deep learning is the development of architectures that can learn explicit representations of relational features. In this paper, we focus on the problem of learning representations of hierarchical relations, proposing an architectural framework we call "relational convolutional networks". Given a sequence of objects, a "multi-dimensional inner product relation" module produces a relation tensor describing all pairwise relations. A "relational convolution" layer then transforms the relation tensor into a sequence of new objects, each describing the relations within some group of objects at the previous layer. Graphlet filters, analogous to filters in convolutional neural networks, represent a template of relations against which the relation tensor is compared at each grouping. Repeating this yields representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.
摘要
深度学习中一个日趋成熟的研究方向是开发能够学习关系特征显式表示的架构。在这篇论文中,我们关注层次关系表示的学习问题,提出了一种称为"关系卷积网络"的架构框架。给定一个对象序列,"多维内积关系"模块会生成一个描述所有成对关系的关系张量;随后,"关系卷积"层将该关系张量转换为一个新的对象序列,其中每个新对象描述上一层中某组对象内部的关系。Graphlet 滤波器(类似卷积神经网络中的滤波器)表示一个关系模板,在每个分组处与关系张量进行比较。重复这一过程即可获得更高阶的层次关系表示。我们介绍了该架构的动机与细节,并通过一系列实验展示关系卷积网络能够为具有层次结构的关系任务提供有效的建模框架。
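The "multi-dimensional inner product relation" module has a compact tensor expression: project each object with several learned maps and take pairwise inner products per map, yielding a relation tensor of shape (n_objects, n_objects, n_relations). The sketch below is a bare-bones reading of that module under assumed dimensions; the graphlet filters and grouping logic of the relational convolution layer are omitted.

```python
import torch
import torch.nn as nn

class InnerProductRelation(nn.Module):
    """Pairwise relation tensor: R[i, j, r] = <W_r x_i, W_r x_j>."""
    def __init__(self, d_obj: int, n_relations: int, d_proj: int = 16):
        super().__init__()
        self.proj = nn.Parameter(torch.randn(n_relations, d_proj, d_obj) / d_obj ** 0.5)

    def forward(self, objects):                            # objects: (n, d_obj)
        z = torch.einsum("rpd,nd->rnp", self.proj, objects)   # projected objects, per relation
        return torch.einsum("rnp,rmp->nmr", z, z)              # pairwise inner products

objects = torch.randn(6, 32)                # a sequence of 6 objects
relation = InnerProductRelation(d_obj=32, n_relations=4)
R = relation(objects)
print(R.shape)                              # torch.Size([6, 6, 4])
```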
Observatory: Characterizing Embeddings of Relational Tables
results: 分析结果显示,一些关系表嵌入模型对表格结构(如列顺序)较为敏感,函数依赖关系很少体现在嵌入中,而专用表嵌入模型的样本保真度相对较低。这些发现可以帮助研究者和实践者更好地预判模型行为,为下游任务选择合适的模型,同时也为研究者开发新模型提供指引。Abstract
Language models and specialized table embedding models have recently demonstrated strong performance on many tasks over tabular data. Researchers and practitioners are keen to leverage these models in many new application contexts; but limited understanding of the strengths and weaknesses of these models, and the table representations they generate, makes the process of finding a suitable model for a given task reliant on trial and error. There is an urgent need to gain a comprehensive understanding of these models to minimize inefficiency and failures in downstream usage. To address this need, we propose Observatory, a formal framework to systematically analyze embedding representations of relational tables. Motivated both by invariants of the relational data model and by statistical considerations regarding data distributions, we define eight primitive properties, and corresponding measures to quantitatively characterize table embeddings for these properties. Based on these properties, we define an extensible framework to evaluate language and table embedding models. We collect and synthesize a suite of datasets and use Observatory to analyze seven such models. Our analysis provides insights into the strengths and weaknesses of learned representations over tables. We find, for example, that some models are sensitive to table structure such as column order, that functional dependencies are rarely reflected in embeddings, and that specialized table embedding models have relatively lower sample fidelity. Such insights help researchers and practitioners better anticipate model behaviors and select appropriate models for their downstream tasks, while guiding researchers in the development of new models.
摘要
研究者和实践者非常希望在表格数据的多种任务中利用最新的语言模型和专门的表嵌入模型。然而,对这些模型及其生成的表格表示的优缺点了解有限,使得为给定任务挑选合适模型往往只能依靠反复试错。为了解决这一问题,我们提出了 Observatory,一个用于系统分析关系表嵌入表示的形式化框架。受关系数据模型的不变量以及数据分布的统计考虑启发,我们定义了八个基本属性,并为其定义了相应的量化度量来刻画表嵌入。基于这些属性,我们构建了一个可扩展的框架来评估语言模型和表嵌入模型。我们收集并合成了一系列数据集,并使用 Observatory 分析了七种模型。我们的分析揭示了表格表示学习的强项与弱项:例如,一些模型对列顺序等表格结构十分敏感,函数依赖关系很少反映在嵌入中,专门的表嵌入模型的样本保真度相对较低。这些发现可以帮助研究者和实践者更好地预测模型行为、为下游任务选择合适的模型,并为新模型的开发提供指引。
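The following hedged sketch illustrates how one of the measured properties, column-order sensitivity, could be probed for an arbitrary embedding model: embed a table and a column-permuted copy, realign the columns, and compare with cosine similarity. The `embed_table` function here is a hypothetical stand-in, not the Observatory API.

```python
# Illustrative column-order-sensitivity check in the spirit of Observatory.
# `embed_table` is a hypothetical stand-in for any table/column embedding model.
import numpy as np

def embed_table(table: list[list[str]]) -> np.ndarray:
    # Stand-in embedding: hash-based bag-of-cells per column (a real model would
    # be a language or table embedding model returning one vector per column).
    dim = 32
    cols = list(zip(*table))
    out = np.zeros((len(cols), dim))
    for j, col in enumerate(cols):
        for cell in col:
            out[j, hash(cell) % dim] += 1.0
    return out

def column_order_sensitivity(table, n_perms=10, seed=0):
    rng = np.random.default_rng(seed)
    base = embed_table(table)                          # (n_cols, dim), original order
    sims = []
    for _ in range(n_perms):
        perm = rng.permutation(len(table[0]))
        permuted = [[row[j] for j in perm] for row in table]
        emb = embed_table(permuted)[np.argsort(perm)]  # realign columns to original order
        cos = np.sum(base * emb, axis=1) / (
            np.linalg.norm(base, axis=1) * np.linalg.norm(emb, axis=1) + 1e-9)
        sims.append(cos.mean())
    return float(np.mean(sims))   # 1.0 = embeddings invariant to column order

table = [["id", "city", "pop"], ["1", "Oslo", "0.7M"], ["2", "Lima", "10M"]]
print(column_order_sensitivity(table))
```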
History Matching for Geological Carbon Storage using Data-Space Inversion with Spatio-Temporal Data Parameterization
paper_authors: Su Jiang, Louis J. Durlofsky
for: 这篇论文主要关注如何利用监测数据进行历史匹配以减少不确定性,从而改进工业规模碳捕集与封存作业中含水层的管理。
methods: 这篇论文使用数据空间反演(DSI)技术,直接从观测数据推断历史匹配的感兴趣量,而无需构建后验地质模型。文中使用深度学习方法对时空压力场和 CO2 饱和度场进行参数化表示。
results: 研究发现,使用这种新的深度学习参数化方法可以显著降低后验压力场和饱和度场的不确定性,并能高效地给出后验预测。该方法适用于多种不同的地质情景,并能高效处理大规模数据。Abstract
History matching based on monitoring data will enable uncertainty reduction, and thus improved aquifer management, in industrial-scale carbon storage operations. In traditional model-based data assimilation, geomodel parameters are modified to force agreement between flow simulation results and observations. In data-space inversion (DSI), history-matched quantities of interest, e.g., posterior pressure and saturation fields conditioned to observations, are inferred directly, without constructing posterior geomodels. This is accomplished efficiently using a set of O(1000) prior simulation results, data parameterization, and posterior sampling within a Bayesian setting. In this study, we develop and implement (in DSI) a deep-learning-based parameterization to represent spatio-temporal pressure and CO2 saturation fields at a set of time steps. The new parameterization uses an adversarial autoencoder (AAE) for dimension reduction and a convolutional long short-term memory (convLSTM) network to represent the spatial distribution and temporal evolution of the pressure and saturation fields. This parameterization is used with an ensemble smoother with multiple data assimilation (ESMDA) in the DSI framework to enable posterior predictions. A realistic 3D system characterized by prior geological realizations drawn from a range of geological scenarios is considered. A local grid refinement procedure is introduced to estimate the error covariance term that appears in the history matching formulation. Extensive history matching results are presented for various quantities, for multiple synthetic true models. Substantial uncertainty reduction in posterior pressure and saturation fields is achieved in all cases. The framework is applied to efficiently provide posterior predictions for a range of error covariance specifications. Such an assessment would be expensive using a model-based approach.
摘要
基于监测数据的历史匹配可以减少不确定性,从而改进工业规模碳捕集与封存作业中含水层的管理。在传统的基于模型的数据同化中,需要修改地质模型参数,使流动模拟结果与观测数据一致。而在数据空间反演(DSI)中,历史匹配的感兴趣量(例如以观测为条件的后验压力场和 CO2 饱和度场)可以被直接推断,无需构建后验地质模型。这一过程借助约 O(1000) 组先验模拟结果、数据参数化以及贝叶斯框架下的后验采样高效完成。在本研究中,我们开发并实现了一种基于深度学习的参数化方法,用于表示一组时间步上的时空压力场和 CO2 饱和度场。该参数化使用对抗自编码器(AAE)进行降维,并使用卷积长短期记忆(convLSTM)网络刻画压力场和饱和度场的空间分布与时间演化。该参数化与多重数据同化集合平滑器(ESMDA)结合,在 DSI 框架中实现后验预测。我们考虑了一个由多种地质情景的先验地质实现构成的真实三维系统,并引入局部网格加密过程来估计历史匹配公式中的误差协方差项。我们针对多个合成真值模型和多种物理量给出了大量历史匹配结果,所有算例中后验压力场和饱和度场的不确定性都得到了显著降低。该框架还被用于在多种误差协方差设定下高效给出后验预测,而这类评估若采用基于模型的方法将非常昂贵。
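For context on the data-assimilation step mentioned above, the sketch below implements one standard ES-MDA analysis update on a toy linear problem. The AAE/convLSTM parameterization is not reproduced; the latent ensemble `M` is just a generic vector, and all sizes are illustrative.

```python
# Minimal sketch of one ES-MDA analysis step (standard formulation), assuming the
# spatio-temporal fields have already been reduced to latent vectors m (e.g. by the
# paper's AAE/convLSTM parameterization, which is not reproduced here).
import numpy as np

def esmda_step(M, D, d_obs, C_e, alpha, rng):
    """M: (n_m, n_ens) latent ensemble, D: (n_d, n_ens) simulated data,
    d_obs: (n_d,) observations, C_e: (n_d, n_d) error covariance, alpha: inflation."""
    n_d, n_e = D.shape
    dM = M - M.mean(axis=1, keepdims=True)
    dD = D - D.mean(axis=1, keepdims=True)
    C_md = dM @ dD.T / (n_e - 1)
    C_dd = dD @ dD.T / (n_e - 1)
    K = C_md @ np.linalg.inv(C_dd + alpha * C_e)          # Kalman-like gain
    # Perturb observations with inflated noise and update each ensemble member.
    noise = rng.multivariate_normal(np.zeros(n_d), alpha * C_e, size=n_e).T
    return M + K @ (d_obs[:, None] + noise - D)

rng = np.random.default_rng(0)
n_m, n_d, n_e = 8, 5, 200
M = rng.normal(size=(n_m, n_e))
G = rng.normal(size=(n_d, n_m))                            # toy linear "simulator"
d_obs = G @ rng.normal(size=n_m)
C_e = 0.01 * np.eye(n_d)
for alpha in [4.0, 4.0, 4.0, 4.0]:                         # sum of 1/alpha must equal 1
    M = esmda_step(M, G @ M, d_obs, C_e, alpha, rng)
print(np.abs(G @ M.mean(axis=1) - d_obs).max())
```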
TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design
methods: 我们开发了一种基于 transformer 的对接分数预测器,以加速对接分数的计算,并提出 TacoGFN 来高效探索分子空间。此外,我们还引入多轮主动学习,利用对接 oracle 对生成样本打分,从而不断改进对接分数预测。
results: 与基线方法相比,使用 TacoGFN 及其变体生成的分子在所有性能指标(对接分数、QED、SA、Lipinski)上都显著更优,而且速度快上若干个数量级。Abstract
We seek to automate the generation of drug-like compounds conditioned to specific protein pocket targets. Most current methods approximate the protein-molecule distribution of a finite dataset and, therefore struggle to generate molecules with significant binding improvement over the training dataset. We instead frame the pocket-conditioned molecular generation task as an RL problem and develop TacoGFN, a target conditional Generative Flow Network model. Our method is explicitly encouraged to generate molecules with desired properties as opposed to fitting on a pre-existing data distribution. To this end, we develop transformer-based docking score prediction to speed up docking score computation and propose TacoGFN to explore molecule space efficiently. Furthermore, we incorporate several rounds of active learning where generated samples are queried using a docking oracle to improve the docking score prediction. This approach allows us to accurately explore as much of the molecule landscape as we can afford computationally. Empirically, molecules generated using TacoGFN and its variants significantly outperform all baseline methods across every property (Docking score, QED, SA, Lipinski), while being orders of magnitude faster.
摘要
我们致力于自动生成以特定蛋白质口袋为条件的类药分子。现有方法大多是对有限数据集中的蛋白质-分子分布进行拟合,因此很难生成在结合能力上显著优于训练集的分子。我们将口袋条件下的分子生成任务建模为一个强化学习问题,并开发了目标条件生成流网络模型 TacoGFN。我们的方法被明确激励去生成具有期望属性的分子,而不是拟合已有的数据分布。为此,我们开发了基于 transformer 的对接分数预测器以加速对接分数计算,并提出 TacoGFN 来高效探索分子空间。此外,我们还进行了多轮主动学习,利用对接 oracle 对生成样本进行查询,从而改进对接分数预测。这一做法使我们可以在可承受的计算量内尽可能准确地探索分子空间。实验表明,使用 TacoGFN 及其变体生成的分子在每一项性能指标(对接分数、QED、SA、Lipinski)上都显著优于所有基线方法,并且速度快上若干个数量级。
Formal and Practical Elements for the Certification of Machine Learning Systems
results: 这篇论文通过对视觉 landing 的应用来展示了其certification framework的效果。Abstract
Over the past decade, machine learning has demonstrated impressive results, often surpassing human capabilities in sensing tasks relevant to autonomous flight. Unlike traditional aerospace software, the parameters of machine learning models are not hand-coded nor derived from physics but learned from data. They are automatically adjusted during a training phase, and their values do not usually correspond to physical requirements. As a result, requirements cannot be directly traced to lines of code, hindering the current bottom-up aerospace certification paradigm. This paper attempts to address this gap by 1) demystifying the inner workings and processes to build machine learning models, 2) formally establishing theoretical guarantees given by those processes, and 3) complementing these formal elements with practical considerations to develop a complete certification argument for safety-critical machine learning systems. Based on a scalable statistical verifier, our proposed framework is model-agnostic and tool-independent, making it adaptable to many use cases in the industry. We demonstrate results on a widespread application in autonomous flight: vision-based landing.
摘要
过去十年间,机器学习展现了令人瞩目的成果,在与自主飞行相关的感知任务上经常超越人类能力。与传统航空航天软件不同,机器学习模型的参数既不是手工编写的,也不是从物理规律推导而来,而是从数据中学习得到。这些参数在训练阶段被自动调整,其取值通常不对应任何物理需求。因此,需求无法直接追溯到代码行,这妨碍了当前自底向上的航空认证范式。本文尝试通过以下三方面弥补这一差距:1. 阐明构建机器学习模型的内部机制和流程;2. 形式化地建立这些流程所能提供的理论保证;3. 将这些形式化要素与工程实践相结合,为安全关键的机器学习系统构建完整的认证论证。我们提出的框架基于可扩展的统计验证器,与具体模型和工具无关,因而可适配业界的多种应用场景。我们在自主飞行中一个广泛的应用——基于视觉的着陆——上展示了结果。
methods: 本方法基于张量分解,并在交替方向乘子法(ADMM)的交替优化中加入丰度和为一(abundance sum-to-one)约束。此外,本研究还提出了在 MultiHU-TD 中引入数学形态学特征以及邻域图块(neighborhood patches)的方法。
results: 实验表明,MultiHU-TD 方法可以提供可解释的模型与分析结果,并适用于真实高光谱图像分析任务。Python 和 MATLAB 实现已在 GitHub 上提供。Abstract
Hyperspectral unmixing allows representing mixed pixels as a set of pure materials weighted by their abundances. Spectral features alone are often insufficient, so it is common to rely on other features of the scene. Matrix models become insufficient when the hyperspectral image (HSI) is represented as a high-order tensor with additional features in a multimodal, multifeature framework. Tensor models such as canonical polyadic decomposition allow for this kind of unmixing but lack a general framework and interpretability of the results. In this article, we propose an interpretable methodological framework for low-rank multifeature hyperspectral unmixing based on tensor decomposition (MultiHU-TD) that incorporates the abundance sum-to-one constraint in the alternating optimization alternating direction method of multipliers (ADMM) algorithm and provide in-depth mathematical, physical, and graphical interpretation and connections with the extended linear mixing model. As additional features, we propose to incorporate mathematical morphology and reframe a previous work on neighborhood patches within MultiHU-TD. Experiments on real HSIs showcase the interpretability of the model and the analysis of the results. Python and MATLAB implementations are made available on GitHub.
摘要
高光谱解混可以将混合像元表示为一组纯净端元按丰度加权的组合。仅靠光谱特征往往不够,因此通常还需借助场景的其他特征。当高光谱图像(HSI)被表示为带有附加特征的高阶张量、置于多模态多特征框架中时,矩阵模型就不再够用。诸如典范多元分解(canonical polyadic decomposition)等张量模型可以完成这类解混,但缺乏统一的框架以及结果的可解释性。在本文中,我们提出了一种可解释的、基于张量分解的低秩多特征高光谱解混方法框架(MultiHU-TD),在交替方向乘子法(ADMM)的交替优化中加入丰度和为一约束,并给出了深入的数学、物理与图形解释,以及与扩展线性混合模型的联系。作为附加特征,我们提议引入数学形态学,并将此前关于邻域图块的工作重新纳入 MultiHU-TD 框架。在真实 HSI 上的实验展示了模型的可解释性及结果分析。Python 和 MATLAB 实现已在 GitHub 上提供。
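The abundance sum-to-one constraint mentioned above is typically enforced inside ADMM-style solvers as a Euclidean projection onto the probability simplex. The sketch below shows that generic projection step in isolation (the standard sorting-based algorithm of Duchi et al.), not the MultiHU-TD code.

```python
# Generic sketch: projecting abundance vectors onto the probability simplex
# (non-negative, sum-to-one), the constraint enforced inside ADMM-style unmixing.
import numpy as np

def project_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of a vector v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# Example: raw (unconstrained) abundances for one pixel over 4 endmembers.
raw = np.array([0.7, -0.1, 0.5, 0.2])
a = project_simplex(raw)
print(a, a.sum())   # non-negative entries that sum to 1
```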
Role of Spatial Coherence in Diffractive Optical Neural Networks
paper_authors: Matthew J. Filipovich, Aleksei Malyshev, A. I. Lvovsky
for: 本文研究将衍射光学神经网络(DONNs)用于计算机视觉任务中快速且节能的信号处理。
methods: 本文提出了数值方法,用于模拟 DONNs 在任意空间相干度照明下的运行,并讨论了相应的计算复杂度。
results: 研究发现,在完全非相干照明下,DONN 的性能无法超过线性模型。作者还使用不同空间相干度的光,在 MNIST 数据集上训练并评估了模拟的 DONNs。Abstract
Diffractive optical neural networks (DONNs) have emerged as a promising optical hardware platform for ultra-fast and energy-efficient signal processing for machine learning tasks, particularly in computer vision. However, previous experimental demonstrations of DONNs have only been performed using coherent light, which is not present in the natural world. Here, we study the role of spatial optical coherence in DONN operation. We propose a numerical approach to efficiently simulate DONNs under input illumination with arbitrary spatial coherence and discuss the corresponding computational complexity using coherent, partially coherent, and incoherent light. We also investigate the expressive power of DONNs and examine how coherence affects their performance. In particular, we show that under fully incoherent illumination, the DONN performance cannot surpass that of a linear model. As a demonstration, we train and evaluate simulated DONNs on the MNIST dataset of handwritten digits using light with varying spatial coherence.
摘要
衍射光学神经网络(DONNs)已成为一种颇具前景的光学硬件平台,可为机器学习任务(特别是计算机视觉)提供超快且节能的信号处理。然而,以往对 DONNs 的实验演示都只使用了相干光,而相干光在自然界中并不存在。本文研究空间光学相干性在 DONN 运行中的作用。我们提出了一种数值方法,可以高效模拟任意空间相干度照明下的 DONNs,并讨论了在相干、部分相干与非相干光下相应的计算复杂度。我们还考察了 DONNs 的表达能力,并分析相干性如何影响其性能。特别地,我们证明在完全非相干照明下,DONN 的性能无法超过线性模型。作为演示,我们使用不同空间相干度的光,在 MNIST 手写数字数据集上对模拟的 DONNs 进行了训练和评估。
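To make the coherence distinction concrete, the toy 1-D sketch below propagates light through a single phase mask with an assumed Fresnel-like kernel: under coherent illumination complex fields are summed before detection, whereas under fully incoherent illumination the intensities of independent point-source responses add. This is a simplified model, not the paper's simulator.

```python
# Toy 1-D sketch of why spatial coherence matters in a diffractive layer:
# coherent light adds complex fields, incoherent light adds intensities of the
# responses to independent point sources. The propagation kernel is an assumed
# Fresnel-like matrix for illustration only.
import numpy as np

n = 64
x = np.arange(n)
H = np.exp(1j * np.pi * (x[:, None] - x[None, :]) ** 2 / n) / np.sqrt(n)  # assumed kernel
t = np.exp(1j * 2 * np.pi * np.random.default_rng(0).random(n))           # trained phase mask

I_in = np.zeros(n)
I_in[20:44] = 1.0                                  # input intensity pattern

# Fully coherent: one complex field for the whole input.
E_coh = H @ (t * np.sqrt(I_in))
I_coherent = np.abs(E_coh) ** 2

# Fully incoherent: each input pixel is an independent source; intensities add.
responses = H * t[None, :]                         # column s = field from a unit source at pixel s
I_incoherent = (np.abs(responses) ** 2) @ I_in

print(I_coherent.sum(), I_incoherent.sum())        # comparable energy, different patterns
```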
results: 实验表明,该方法可以在不牺牲空间分辨率的情况下重建多达 50 个光谱通道的图像(但存在一定的时间域冗余)。这种方法在成本和体积方面具有优势,有望用于开发一种紧凑、低成本、可集成于移动设备的多光谱相机。Abstract
This paper presents a Multispectral imaging (MSI) approach that combines the use of a diffractive optical element, and a deep learning algorithm for spectral reconstruction. Traditional MSI techniques often face challenges such as high costs, compromised spatial or spectral resolution, or prolonged acquisition times. In contrast, our methodology uses a single diffractive lens, a grayscale sensor, and an optical motor to capture the Multispectral image without sacrificing spatial resolution, however with some temporal domain redundancy. Through an experimental demonstration, we show how we can reconstruct up to 50 spectral channel images using diffraction physical theory and a UNet-based deep learning algorithm. This approach holds promise for a cost-effective, compact MSI camera that could be feasibly integrated into mobile devices.
paper_authors: Dun Yuan, Ekram Hossain, Di Wu, Xue Liu, Gregory Dudek
for: 提高 3D 全息通信的用户体验,增强虚拟空间中人与人之间的交互。
methods: 利用移动边缘计算(MEC)服务器,通过新的任务调度算法最小化 3D 全息通信的总延迟。
results: 与基线方法相比,所提算法显著降低了延迟,并在一个 AR 应用场景中得到了验证。Abstract
3D holographic communication has the potential to revolutionize the way people interact with each other in virtual spaces, offering immersive and realistic experiences. However, demands for high data rates, extremely low latency, and high computations to enable this technology pose a significant challenge. To address this challenge, we propose a novel job scheduling algorithm that leverages Mobile Edge Computing (MEC) servers in order to minimize the total latency in 3D holographic communication. One of the motivations for this work is to prevent the uncanny valley effect, which can occur when the latency hinders the seamless and real-time rendering of holographic content, leading to a less convincing and less engaging user experience. Our proposed algorithm dynamically allocates computation tasks to MEC servers, considering the network conditions, computational capabilities of the servers, and the requirements of the 3D holographic communication application. We conduct extensive experiments to evaluate the performance of our algorithm in terms of latency reduction, and the results demonstrate that our approach significantly outperforms other baseline methods. Furthermore, we present a practical scenario involving Augmented Reality (AR), which not only illustrates the applicability of our algorithm but also highlights the importance of minimizing latency in achieving high-quality holographic views. By efficiently distributing the computation workload among MEC servers and reducing the overall latency, our proposed algorithm enhances the user experience in 3D holographic communications and paves the way for the widespread adoption of this technology in various applications, such as telemedicine, remote collaboration, and entertainment.
摘要
三维全息通信有望彻底改变人们在虚拟空间中的交互方式,提供沉浸式且逼真的体验。然而,实现这一技术需要极高的数据速率、极低的时延和大量计算,这构成了重大挑战。为解决这一挑战,我们提出了一种利用移动边缘计算(MEC)服务器的新型任务调度算法,以最小化 3D 全息通信的总时延。这项工作的动机之一是避免“恐怖谷”效应:当时延阻碍全息内容的无缝实时渲染时,用户体验会变得不够可信、不够吸引人。所提算法会根据网络状况、服务器的计算能力以及 3D 全息通信应用的需求,动态地将计算任务分配给 MEC 服务器。我们开展了大量实验来评估算法在降低时延方面的性能,结果表明我们的方法显著优于其他基线方法。此外,我们还给出了一个增强现实(AR)的实际场景,不仅说明了算法的适用性,也凸显了降低时延对获得高质量全息视图的重要性。通过在 MEC 服务器之间高效分配计算负载并降低整体时延,所提算法提升了 3D 全息通信的用户体验,并为该技术在远程医疗、远程协作和娱乐等多种应用中的广泛落地铺平了道路。
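As a toy illustration of the scheduling idea (not the paper's algorithm), the sketch below greedily places each rendering task on the MEC server that minimizes an estimated completion latency of upload plus queueing plus compute time; all data sizes, link rates, and server speeds are invented numbers.

```python
# Toy latency-aware task placement on MEC servers (illustrative only; the paper's
# scheduler is more elaborate). Latency model: upload + queue + compute.
tasks = [{"id": k, "bits": 8e6 * (1 + k % 3), "cycles": 2e9} for k in range(6)]
servers = [
    {"id": "mec-0", "rate_bps": 100e6, "cps": 20e9, "busy_s": 0.0},
    {"id": "mec-1", "rate_bps": 300e6, "cps": 10e9, "busy_s": 0.0},
]

def completion_latency(task, server):
    upload = task["bits"] / server["rate_bps"]
    compute = task["cycles"] / server["cps"]
    return upload + server["busy_s"] + compute       # queue behind work already assigned

assignment = {}
for task in tasks:                                   # greedy: place each task where it finishes soonest
    best = min(servers, key=lambda s: completion_latency(task, s))
    latency = completion_latency(task, best)
    best["busy_s"] += task["cycles"] / best["cps"]   # that server is now busier
    assignment[task["id"]] = (best["id"], round(latency, 3))

worst = max(lat for _, lat in assignment.values())
print(assignment, "worst-case latency (s):", round(worst, 3))
```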
Autoregressive Coefficients based Intelligent Protection of Transmission Lines Connected to Type-3 Wind Farms
methods: 本研究使用自适应模糊推理系统(Adaptive Fuzzy Inference System)检测故障,并使用最小冗余最大相关性(Minimum Redundancy Maximum Relevance,MRMR)算法选取三相电流的自回归(AR)系数作为特征。此外,还使用深度学习网络对故障检测、故障定位和故障类型分类进行监督。
results: 研究结果显示,所提方案在不同的系统状态和配置下具有较高的检测精度和速度,并能适应不同的故障类型、故障位置、故障发生时间、风速和变压器接线方式等因素。Abstract
Protective relays can mal-operate for transmission lines connected to doubly fed induction generator (DFIG) based large capacity wind farms (WFs). The performance of distance relays protecting such lines is investigated and a statistical model based intelligent protection of the area between the grid and the WF is proposed in this article. The suggested method employs an adaptive fuzzy inference system to detect faults based on autoregressive (AR) coefficients of the 3-phase currents selected using minimum redundancy maximum relevance algorithm. Deep learning networks are used to supervise the detection of faults, their subsequent localization, and classification. The effectiveness of the scheme is evaluated on IEEE 9-bus and IEEE 39-bus systems with varying fault resistances, fault inception times, locations, fault types, wind speeds, and transformer connections. Further, the impact of factors like the presence of type-4 WFs, double circuit lines, WF capacity, grid strength, FACTs devices, reclosing on permanent faults, power swings, fault during power swings, voltage instability, load encroachment, high impedance faults, evolving and cross-country faults, close-in and remote-end faults, CT saturation, sampling rate, data window size, synchronization error, noise, and semi-supervised learning are considered while validating the proposed scheme. The results show the efficacy of the suggested method in dealing with various system conditions and configurations while protecting the transmission lines that are connected to WFs.
摘要
当输电线路连接到基于双馈感应发电机(DFIG)的大容量风电场(WF)时,保护继电器可能发生误动作。本文研究了保护此类线路的距离继电器的性能,并提出了一种基于统计模型的智能保护方案,用于保护电网与风电场之间的区域。The suggested method employs an adaptive fuzzy inference system to detect faults based on autoregressive (AR) coefficients of the 3-phase currents selected using the minimum redundancy maximum relevance algorithm. Deep learning networks are used to supervise the detection of faults, their subsequent localization, and classification. The effectiveness of the scheme is evaluated on IEEE 9-bus and IEEE 39-bus systems with varying fault resistances, fault inception times, locations, fault types, wind speeds, and transformer connections. Further, the impact of factors like the presence of type-4 WFs, double circuit lines, WF capacity, grid strength, FACTS devices, reclosing on permanent faults, power swings, faults during power swings, voltage instability, load encroachment, high impedance faults, evolving and cross-country faults, close-in and remote-end faults, CT saturation, sampling rate, data window size, synchronization error, noise, and semi-supervised learning are considered while validating the proposed scheme. The results show the efficacy of the suggested method in dealing with various system conditions and configurations while protecting the transmission lines that are connected to WFs.
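The feature-extraction step, fitting autoregressive coefficients to windowed phase currents, can be sketched with a plain least-squares fit as below. The AR order, sampling rate, and synthetic current are assumptions; the MRMR selection and the fuzzy inference stage are not reproduced.

```python
# Illustrative sketch of the feature step only: estimating AR coefficients of a
# windowed phase current via ordinary least squares. AR order and the synthetic
# signal are assumptions; MRMR selection and the fuzzy inference system are omitted.
import numpy as np

def ar_coefficients(x: np.ndarray, order: int) -> np.ndarray:
    """Fit x[t] ~ a1*x[t-1] + ... + ap*x[t-p] by least squares, return (a1..ap)."""
    X = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Synthetic phase current: 50 Hz fundamental plus noise, with a crude amplitude
# change midway through the record standing in for a fault.
fs, f0 = 4000, 50
t = np.arange(0, 0.2, 1 / fs)
i_a = np.sin(2 * np.pi * f0 * t) * np.where(t < 0.1, 1.0, 4.0)
i_a += 0.05 * np.random.default_rng(1).normal(size=t.size)

window = int(fs / f0)                       # one fundamental cycle per window
pre_fault = ar_coefficients(i_a[:window], order=4)
post_fault = ar_coefficients(i_a[-window:], order=4)
print(pre_fault, post_fault)                # coefficient vectors used as relay features
```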
Impact of Artificial Intelligence on Electrical and Electronics Engineering Productivity in the Construction Industry
results: 这篇论文的研究结果表明,人工智能可以大大提高建筑设计和建造的效率和产效,同时可以降低能源消耗和提高建筑安全性。Abstract
Artificial intelligence (AI) can revolutionize the construction industry, particularly electrical and electronics engineering. By automating recurring tasks, AI can increase productivity and efficiency in construction. For instance, AI can analyze building designs, discover potential problems, and generate solutions, reducing the effort and time required for manual analysis. AI can also be used to optimize energy consumption in buildings, which is a critical issue in the construction industry. By applying machine learning algorithms to analyze energy usage patterns, AI can identify areas where power can be saved and offer recommendations for improvements. This can result in significant cost savings and reduced carbon emissions. Moreover, AI can be used to improve the safety of construction sites. By analyzing data from sensors and cameras, AI can detect potential hazards and alert workers to take suitable action. This can help prevent injuries and accidents on construction sites, lowering the risk for workers and enhancing overall safety in the industry. The impact of AI on electrical and electronics engineering productivity in the construction industry is enormous. AI can transform how we design, build, and operate buildings by automating routine tasks, optimizing energy consumption, and enhancing safety. However, it is essential to ensure that AI is used ethically and responsibly and that the benefits are shared fairly throughout the industry.
摘要
人工智能(AI)可以革命化建筑业,特别是电气和电子工程。通过自动化重复任务,AI可以提高产出力和效率,从而提高建筑设计的创新能力。例如,AI可以研究建筑设计,发现能源问题和提供解决方案,从而减少人工分析的劳动和时间。此外,AI还可以优化建筑物业的能源消耗,从而解决建筑业中的重要问题。通过机器学习算法分析能源使用模式,AI可以发现能源的浪费和提供改进建议,从而实现重要的成本节省和减少碳排放。此外,AI还可以提高建筑工地的安全性。通过分析感知器和摄像头数据,AI可以检测潜在风险并警示工作人员采取适当行动,从而避免伤害和事故发生在建筑工地上,提高工人安全性和全面的安全性。AI对电气和电子工程产出力的影响是巨大的,可以改变我们的设计、建造和运营建筑的方式,自动化常见任务、优化能源消耗和提高安全性。然而,确保AI的使用是道德和责任的,并确保产业中的利益均衡分配是关键。
Digital Twin-Empowered Smart Attack Detection System for 6G Edge of Things Networks
results: 实现了高效、可靠和适应性的攻击检测,保障6G EoT网络安全性Abstract
As global Internet of Things (IoT) devices connectivity surges, a significant portion gravitates towards the Edge of Things (EoT) network. This shift prompts businesses to deploy infrastructure closer to end-users, enhancing accessibility. However, the growing EoT network expands the attack surface, necessitating robust and proactive security measures. Traditional solutions fall short against dynamic EoT threats, highlighting the need for proactive and intelligent systems. We introduce a digital twin-empowered smart attack detection system for 6G EoT networks. Leveraging digital twin and edge computing, it monitors and simulates physical assets in real time, enhancing security. An online learning module in the proposed system optimizes the network performance. Our system excels in proactive threat detection, ensuring 6G EoT network security. The performance evaluations demonstrate its effectiveness, robustness, and adaptability using real datasets.
摘要
Globally, the number of Internet of Things (IoT) devices connecting to the internet is surging, and a significant portion of these devices are gravitating towards the Edge of Things (EoT) network. This shift is causing businesses to deploy infrastructure closer to end-users, which enhances accessibility. However, the growing EoT network is expanding the attack surface, making it necessary to implement robust and proactive security measures. Traditional solutions are insufficient against the dynamic threats posed by EoT, highlighting the need for proactive and intelligent systems. To address this need, we propose a digital twin-empowered smart attack detection system for 6G EoT networks. By leveraging digital twin and edge computing, the system monitors and simulates physical assets in real time, enhancing security. An online learning module in the proposed system optimizes network performance. Our system excels in proactive threat detection, ensuring the security of 6G EoT networks. Performance evaluations demonstrate its effectiveness, robustness, and adaptability using real datasets.
Human Respiration Detection Under Interference: Challenges and Solutions
paper_authors: Kehan Wu, Renqi Chen, Haiyu Wang, Guang Wu
for: 本研究旨在解决以下问题:商用 WiFi 设备虽可检测人体呼吸速率,但在日常生活中存在人体运动干扰时,难以准确检测呼吸。
methods: 本研究提出了一种专为呼吸检测设计的被动感知与通信系统,工作在 60.48GHz 频段,能够在存在较强人体运动干扰的情况下检测人体呼吸。研究人员通过训练神经网络来实现呼吸检测。
results: 实验结果表明,在感知时长充足的前提下,该系统在人体运动干扰下的呼吸检测精度保持在 90% 以上。最后,研究人员推导了一个解析模型,可在 10 秒内完成呼吸速率计数。Abstract
Recent research has highlighted the detection of human respiration rate using commodity WiFi devices. Nevertheless, these devices encounter challenges in accurately discerning human respiration amidst the prevailing human motion interference encountered in daily life. To tackle this predicament, this paper introduces a passive sensing and communication system designed specifically for respiration detection in the presence of robust human motion interference. Operating within the 60.48GHz band, the proposed system aims to detect human respiration even when confronted with substantial human motion interference within close proximity. Subsequently, a neural network is trained using the collected data by us to enable human respiration detection. The experimental results demonstrate a consistently high accuracy rate over 90\% of the human respiration detection under interference, given an adequate sensing duration. Finally, an empirical model is derived analytically to achieve the respiratory rate counting in 10 seconds.
摘要
近期研究表明,商用 WiFi 设备可用于检测人体呼吸速率。然而,在日常生活中普遍存在的人体运动干扰下,这些设备难以准确分辨人体呼吸。为解决这一难题,本文提出了一种专为呼吸检测设计的被动感知与通信系统,可在较强人体运动干扰下工作。该系统运行在 60.48GHz 频段,旨在即使近距离存在明显的人体运动干扰时也能检测人体呼吸。随后,我们利用采集到的数据训练神经网络来实现呼吸检测。实验结果表明,在感知时长充足的前提下,系统在干扰条件下的呼吸检测准确率始终高于 90%。最后,我们解析推导了一个经验模型,可在 10 秒内完成呼吸速率计数。
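The final rate-counting step can be illustrated with a simple spectral-peak estimate over a plausible breathing band, as in the sketch below; the synthetic signal, window length, and band edges are assumptions, and the 60.48 GHz front end and interference suppression are not modeled.

```python
# Minimal sketch of respiration-rate counting from a sensed displacement-like
# signal: pick the dominant spectral peak inside a typical breathing band.
import numpy as np

fs = 20.0                                   # samples per second from the sensor
t = np.arange(0, 10.0, 1 / fs)              # 10-second sensing window
true_rate_hz = 0.3                          # 18 breaths per minute
sig = np.sin(2 * np.pi * true_rate_hz * t) + 0.3 * np.random.default_rng(2).normal(size=t.size)

sig = sig - sig.mean()                      # remove DC before the FFT
spectrum = np.abs(np.fft.rfft(sig))
freqs = np.fft.rfftfreq(sig.size, d=1 / fs)

band = (freqs >= 0.1) & (freqs <= 0.7)      # plausible human breathing band
peak_hz = freqs[band][np.argmax(spectrum[band])]
print(f"estimated rate: {peak_hz * 60:.1f} breaths/min")
```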
A Comprehensive Indoor Environment Dataset from Single-family Houses in the US
results: 本研究得到了一个包含室内环境因素的大量数据集,可以用于提高建筑能源消耗、occupant behavior、预测维护和其他相关领域的模型。Abstract
The paper describes a dataset comprising indoor environmental factors such as temperature, humidity, air quality, and noise levels. The data was collected from 10 sensing devices installed in various locations within three single-family houses in Virginia, USA. The objective of the data collection was to study the indoor environmental conditions of the houses over time. The data were collected at a frequency of one record per minute for a year, combining over 2.5 million records. The paper provides actual floor plans with sensor placements to aid researchers and practitioners in creating reliable building performance models. The techniques used to collect and verify the data are also explained in the paper. The resulting dataset can be employed to enhance models for building energy consumption, occupant behavior, predictive maintenance, and other relevant purposes.
摘要
文章描述了一个包含温度、湿度、空气质量和噪声水平等室内环境因素的数据集。数据来自安装在美国弗吉尼亚州三座独户住宅中不同位置的 10 个感知设备,采集频率为每分钟一条记录,持续一年,共计超过 250 万条记录。文章提供了带有传感器布置位置的实际户型图,以帮助研究人员和从业者建立可靠的建筑性能模型,并介绍了数据采集与校验所用的技术。该数据集可用于改进建筑能耗、住户行为、预测性维护及其他相关用途的模型。
Integrated Communication, Sensing, and Computation Framework for 6G Networks
results: 本文提出的ICSAC框架可以提高IMN应用程序的可靠性、响应速度、感知信息的准确性和时效性、计算的隐私和安全性等方面的性能。同时,通过对关键实现技术的评估结果,表明ICSAC框架的可行性。Abstract
In the sixth generation (6G) era, intelligent machine network (IMN) applications, such as intelligent transportation, require collaborative machines with communication, sensing, and computation (CSC) capabilities. This article proposes an integrated communication, sensing, and computation (ICSAC) framework for 6G to achieve the reciprocity among CSC functions to enhance the reliability and latency of communication, accuracy and timeliness of sensing information acquisition, and privacy and security of computing to realize the IMN applications. Specifically, the sensing and communication functions can merge into unified platforms using the same transmit signals, and the acquired real-time sensing information can be exploited as prior information for intelligent algorithms to enhance the performance of communication networks. This is called the computing-empowered integrated sensing and communications (ISAC) reciprocity. Such reciprocity can further improve the performance of distributed computation with the assistance of networked sensing capability, which is named the sensing-empowered integrated communications and computation (ICAC) reciprocity. The above ISAC and ICAC reciprocities can enhance each other iteratively and finally lead to the ICSAC reciprocity. To achieve these reciprocities, we explore the potential enabling technologies for the ICSAC framework. Finally, we present the evaluation results of crucial enabling technologies to show the feasibility of the ICSAC framework.
摘要
在六代(6G)时期,智能机器网络(IMN)应用需要合作机器具有通信、感知、计算(CSC)能力。本文提出了一个集成通信、感知、计算(ICSAC)框架,以实现CSC功能之间的互相关联,提高通信的可靠性和延迟、感知信息获取的准确性和时间性、计算的隐私和安全性,实现IMN应用。具体来说,感知和通信功能可以合并到同一个平台上,使用同一个传输信号,并利用实时感知信息为智能算法提供优化通信网络的优先信息。这被称为计算力 Empowered 集成感知通信(ISAC)互相关联。此互相关联可以进一步提高分布计算的性能,采用网络感知能力的协助,并被称为感知力 Empowered 集成通信计算(ICAC)互相关联。上述 ISAC 和 ICAC 互相关联可以相互强化,最终导致ICSAC 框架。为实现这些互相关联,我们探讨了ICSAC 框架的可能的实现技术。最后,我们展示了关键实现技术的评估结果,以证明ICSAC 框架的可行性。
results: 研究发现,如果使用多个时间变化的抽象阈值序列,可以大大提高矩阵完成算法的性能。此外,该论文还提出了三种变体的OB-SVT算法,其中一种使用随机抽取的数据来减少运算空间的维度,从而加速了融合。Abstract
We explore the impact of coarse quantization on matrix completion in the extreme scenario of dithered one-bit sensing, where the matrix entries are compared with time-varying threshold levels. In particular, instead of observing a subset of high-resolution entries of a low-rank matrix, we have access to a small number of one-bit samples, generated as a result of these comparisons. In order to recover the low-rank matrix using its coarsely quantized known entries, we begin by transforming the problem of one-bit matrix completion (one-bit MC) with time-varying thresholds into a nuclear norm minimization problem. The one-bit sampled information is represented as linear inequality feasibility constraints. We then develop the popular singular value thresholding (SVT) algorithm to accommodate these inequality constraints, resulting in the creation of the One-Bit SVT (OB-SVT). Our findings demonstrate that incorporating multiple time-varying sampling threshold sequences in one-bit MC can significantly improve the performance of the matrix completion algorithm. In pursuit of achieving this objective, we utilize diverse thresholding schemes, namely uniform, Gaussian, and discrete thresholds. To accelerate the convergence of our proposed algorithm, we introduce three variants of the OB-SVT algorithm. Among these variants is the randomized sketched OB-SVT, which departs from using the entire information at each iteration, opting instead to utilize sketched data. This approach effectively reduces the dimension of the operational space and accelerates the convergence. We perform numerical evaluations comparing our proposed algorithm with the maximum likelihood estimation method previously employed for one-bit MC, and demonstrate that our approach can achieve a better recovery performance.
摘要
我们研究了粗量化对矩阵补全的影响,考虑抖动一比特感知这一极端情形,即将矩阵元素与时变阈值进行比较。具体而言,我们无法观察低秩矩阵的一部分高分辨率元素,只能获得少量由这些比较产生的一比特样本。为了利用这些粗量化的已知信息恢复低秩矩阵,我们首先将带时变阈值的一比特矩阵补全(one-bit MC)问题转化为核范数最小化问题,并将一比特采样信息表示为线性不等式可行性约束。随后,我们将常用的奇异值阈值(SVT)算法推广到这些不等式约束,得到一比特 SVT(OB-SVT)。我们的结果表明,在一比特矩阵补全中引入多个时变采样阈值序列可以显著提升补全性能。为此,我们采用了多种阈值方案,包括均匀、高斯和离散阈值。为加快所提算法的收敛,我们引入了 OB-SVT 的三种变体,其中之一是随机草图 OB-SVT:它在每次迭代中不再使用全部信息,而是使用草图数据,从而降低运算空间的维度并加快收敛。我们将所提算法与此前用于一比特矩阵补全的最大似然估计方法进行了数值比较,结果表明我们的方法可以取得更好的恢复性能。
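To make the OB-SVT idea concrete, the heuristic sketch below alternates singular-value soft-thresholding with a projection that forces each observed entry to agree with its one-bit comparison against a known (here Gaussian) threshold. It is an illustrative simplification, not the authors' exact algorithm; the matrix size, regularization level, and iteration count are assumptions.

```python
# Heuristic sketch of a One-Bit-SVT-style iteration: soft-threshold the singular
# values, then snap entries that violate sign(X[i,j] - tau[i,j]) == r[i,j] back to
# their threshold (the per-entry projection onto the feasible half-space).
import numpy as np

rng = np.random.default_rng(0)
n, r_true = 40, 2
X_true = rng.normal(size=(n, r_true)) @ rng.normal(size=(r_true, n))   # low-rank ground truth

mask = rng.random((n, n)) < 0.5                  # observed positions
tau = rng.normal(scale=1.0, size=(n, n))         # time-varying (here Gaussian) thresholds
bits = np.sign(X_true - tau)                     # one-bit measurements at observed entries

def svt(X, lam):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

X = np.zeros((n, n))
for _ in range(200):
    X = svt(X, lam=1.0)
    viol = mask & (bits * (X - tau) < 0)         # sign-inconsistent observed entries
    X[viol] = tau[viol]

rel_err = np.linalg.norm(X - X_true) / np.linalg.norm(X_true)
print(f"relative reconstruction error: {rel_err:.2f}")
```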
paper_authors: Dominik Klement, Mireia Diez, Federico Landini, Lukáš Burget, Anna Silnova, Marc Delcroix, Naohiro Tawara
for: 这篇论文提出了一种新的 VBx 参数更新方法,直接针对预先定义的损失进行判别式优化,以提升说话人日志(diarization)性能。
methods: 这篇论文使用 HMM 建模说话人轮换、用生成式训练的 PLDA 建模说话人分布,并用贝叶斯推理估计 x-vector 到说话人的分配;在此基础上,通过判别式训练直接优化 VBx 的参数。
results: 概念验证结果表明,该方法可以自动找到最优的超参数,性能与大规模网格搜索得到的结果相当。此外,文章还表明,对 PLDA 进行判别式微调可以进一步提升模型性能。Abstract
Bayesian HMM clustering of x-vector sequences (VBx) has become a widely adopted diarization baseline model in publications and challenges. It uses an HMM to model speaker turns, a generatively trained probabilistic linear discriminant analysis (PLDA) for speaker distribution modeling, and Bayesian inference to estimate the assignment of x-vectors to speakers. This paper presents a new framework for updating the VBx parameters using discriminative training, which directly optimizes a predefined loss. We also propose a new loss that better correlates with the diarization error rate compared to binary cross-entropy $\unicode{x2013}$ the default choice for diarization end-to-end systems. Proof-of-concept results across three datasets (AMI, CALLHOME, and DIHARD II) demonstrate the method's capability of automatically finding hyperparameters, achieving comparable performance to those found by extensive grid search, which typically requires additional hyperparameter behavior knowledge. Moreover, we show that discriminative fine-tuning of PLDA can further improve the model's performance. We release the source code with this publication.
摘要
基于贝叶斯 HMM 的 x-vector 序列聚类(VBx)已成为论文和评测中广泛采用的说话人日志基线模型。它使用 HMM 建模说话人轮换,使用生成式训练的概率线性判别分析(PLDA)建模说话人分布,并通过贝叶斯推理估计 x-vector 到说话人的分配。本文提出了一种利用判别式训练更新 VBx 参数的新框架,直接优化预先定义的损失函数。我们还提出了一种新的损失函数,与说话人日志错误率的相关性优于二元交叉熵(后者是端到端说话人日志系统的默认选择)。在三个数据集(AMI、CALLHOME 和 DIHARD II)上的概念验证结果表明,该方法能够自动找到超参数,性能与通过大规模网格搜索(通常需要额外的超参数行为知识)得到的结果相当。此外,我们还证明了对 PLDA 进行判别式微调可以进一步提升模型性能。我们随本文发布了源代码。
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
results: 实验结果表明,提出的模型不仅实现了更高效的推理,还在多个任务上显示出超过原始HuBERT模型的性能提升。Abstract
Existing Self-Supervised Learning (SSL) models for speech typically process speech signals at a fixed resolution of 20 milliseconds. This approach overlooks the varying informational content present at different resolutions in speech signals. In contrast, this paper aims to incorporate multi-resolution information into speech self-supervised representation learning. We introduce a SSL model that leverages a hierarchical Transformer architecture, complemented by HuBERT-style masked prediction objectives, to process speech at multiple resolutions. Experimental results indicate that the proposed model not only achieves more efficient inference but also exhibits superior or comparable performance to the original HuBERT model over various tasks. Specifically, significant performance improvements over the original HuBERT have been observed in fine-tuning experiments on the LibriSpeech speech recognition benchmark as well as in evaluations using the Speech Universal PERformance Benchmark (SUPERB) and Multilingual SUPERB (ML-SUPERB).
摘要
现有的语音自监督学习(SSL)模型通常以 20 毫秒的固定分辨率处理语音信号,这种做法忽略了语音信号在不同分辨率下所包含的不同信息。与之相反,本文旨在将多分辨率信息引入语音自监督表征学习。我们提出了一种 SSL 模型,它采用层次化的 Transformer 架构,并辅以 HuBERT 风格的掩码预测目标,以在多个分辨率上处理语音。实验结果表明,所提模型不仅实现了更高效的推理,还在多个任务上取得了优于或相当于原 HuBERT 模型的性能。具体而言,在 LibriSpeech 语音识别基准上的微调实验,以及在 SUPERB 和多语言 SUPERB(ML-SUPERB)上的评估中,均观察到相对原 HuBERT 的显著性能提升。
BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition
paper_authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie
for: 提高 code-switching 自动语音识别中的语言边界识别精度
methods: 使用 cross-layer language adapter 和 boundary-aware training 方法
results: 与基线相比,混合错误率(mixture error rate)降低了 16.55%。Abstract
Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these drawbacks, we propose a cross-layer language adapter and a boundary-aware training method, namely Boundary-Aware Mixture-of-Experts (BA-MoE). Specifically, we introduce language-specific adapters to separate language-specific representations and a unified gating layer to fuse representations within each encoder layer. Second, we compute language adaptation loss of the mean output of each language-specific adapter to improve the adapter module's language-specific representation learning. Besides, we utilize a boundary-aware predictor to learn boundary representations for dealing with language boundary confusion. Our approach achieves significant performance improvement, reducing the mixture error rate by 16.55\% compared to the baseline on the ASRU 2019 Mandarin-English code-switching challenge dataset.
摘要
基于专家混合(Mixture-of-Experts)的模型利用语言专家有效提取语言特定表示,已在语码转换自动语音识别中得到了良好应用。然而仍有很大的改进空间:不同语言间相似的发音可能导致多语言建模效果不佳以及语言边界估计不准确。为消除这些缺陷,我们提出了跨层语言适配器和边界感知训练方法,即边界感知专家混合(Boundary-Aware Mixture-of-Experts,BA-MoE)。具体而言,我们引入语言特定适配器来分离语言特定表示,并在每个编码器层内用统一的门控层融合这些表示。其次,我们对每个语言特定适配器输出的均值计算语言适配损失,以改进适配器模块的语言特定表示学习。此外,我们利用边界感知预测器学习边界表示,以应对语言边界混淆问题。我们的方法取得了显著的性能提升:在 ASRU 2019 汉英语码转换挑战数据集上,混合错误率相比基线降低了 16.55%。
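The adapter-plus-gate fusion described above can be sketched, in a much-simplified form, as per-language residual linear adapters combined by a per-frame softmax gate; the dimensions and the plain linear adapters are assumptions, not the BA-MoE implementation.

```python
# Toy sketch of the adapter-fusion idea (assumed shapes, plain linear adapters):
# each language has its own adapter, and a unified gate mixes their outputs for
# every frame of an encoder layer's output.
import numpy as np

rng = np.random.default_rng(0)
T, d, n_lang = 50, 64, 2                  # frames, hidden size, languages (e.g. zh / en)

h = rng.normal(size=(T, d))               # hidden states from one encoder layer
adapters = [(rng.normal(size=(d, d)) * 0.05, np.zeros(d)) for _ in range(n_lang)]
W_gate = rng.normal(size=(d, n_lang)) * 0.05

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Language-specific representations, one per adapter: (n_lang, T, d)
lang_feats = np.stack([h + h @ W + b for W, b in adapters])   # residual adapters

# Unified gating layer: per-frame mixture weights over languages, then fuse.
gates = softmax(h @ W_gate)                                   # (T, n_lang)
fused = np.einsum('tl,ltd->td', gates, lang_feats)            # (T, d) back into the encoder

print(gates[:3].round(2), fused.shape)
```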
Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
results: 听测实验显示,该框架在保持源样本严重程度的同时,能够模拟目标说话人的音色。此外,研究还发现:(a)病理会影响 x-vector,但并非所有说话人信息都会丢失;(b)仅根据严重程度标签选择源说话人是不够的。Abstract
In healthy-to-pathological voice conversion (H2P-VC), healthy speech is converted into pathological while preserving the identity. The paper improves on previous two-stage approach to H2P-VC where (1) speech is created first with the appropriate severity, (2) then the speaker identity of the voice is converted while preserving the severity of the voice. Specifically, we propose improvements to (2) by using phonetic posteriorgrams (PPG) and global style tokens (GST). Furthermore, we present a new dataset that contains parallel recordings of pathological and healthy speakers with the same identity which allows more precise evaluation. Listening tests by expert listeners show that the framework preserves severity of the source sample, while modelling target speaker's voice. We also show that (a) pathology impacts x-vectors but not all speaker information is lost, (b) choosing source speakers based on severity labels alone is insufficient.
摘要
健康到病理语音转换(H2P-VC)是在保留说话人身份的同时,将健康语音转换为病理语音。本文改进了以往 H2P-VC 的两阶段方法:(1)先生成具有适当严重程度的语音;(2)再在保持语音严重程度的前提下转换说话人身份。具体而言,我们利用音素后验图(PPG)和全局风格标记(GST)改进了第二阶段。此外,我们还提供了一个新的数据集,其中包含同一说话人在病理和健康状态下的平行录音,从而支持更精确的评估。专家听测表明,该框架能在保持源样本严重程度的同时模拟目标说话人的音色。我们还证明:(a)病理会影响 x-vector,但并非所有说话人信息都会丢失;(b)仅根据严重程度标签选择源说话人是不够的。
Shaping the Epochal Individuality and Generality: The Temporal Dynamics of Uncertainty and Prediction Error in Musical Improvisation
methods: 这个研究使用了 HBSL 模型,分析了 456 首 Jazz improvvisation 音乐,从 1905 年到 2009 年,来自 78 位 Jazz 音乐家。
results: 研究发现,不同时期的音乐创作具有不同的时间特征,特别是在旋律和旋律序列方面,可以反映不同时期的文化和情感特征。此外,rhythm sequence 在不同时期中具有一致的不确定性。Abstract
Musical improvisation, much like spontaneous speech, reveals intricate facets of the improviser's state of mind and emotional character. However, the specific musical components that reveal such individuality remain largely unexplored. Within the framework of brain's statistical learning and predictive processing, this study examined the temporal dynamics of uncertainty and surprise (prediction error) in a piece of musical improvisation. This study employed the HBSL model to analyze a corpus of 456 Jazz improvisations, spanning 1905 to 2009, from 78 distinct Jazz musicians. The results indicated distinctive temporal patterns of surprise and uncertainty, especially in pitch and pitch-rhythm sequences, revealing era-specific features from the early 20th to the 21st centuries. Conversely, rhythm sequences exhibited a consistent degree of uncertainty across eras. Further, the acoustic properties remain unchanged across different periods. These findings highlight the importance of how temporal dynamics of surprise and uncertainty in improvisational music change over periods, profoundly influencing the distinctive methodologies artists adopt for improvisation in each era. Further, it is suggested that the development of improvisational music can be attributed to the brain's adaptive statistical learning mechanisms, which constantly refine internal models to mirror the cultural and emotional nuances of their respective epochs. This study unravels the evolutionary trajectory of improvisational music and highlights the nuanced shifts artists employ to resonate with the cultural and emotional landscapes of their times.
摘要
音乐即兴演奏与自发性言语类似,能够揭示即兴演奏者心理状态和情感特质的细微面貌。然而,究竟是哪些音乐成分体现了这种个体性,目前仍鲜有研究。本研究在大脑统计学习与预测加工的框架下,考察了即兴演奏中不确定性与意外性(预测误差)的时间动态。研究使用 HBSL 模型分析了 1905 年至 2009 年间 78 位爵士音乐家的 456 段爵士即兴演奏。结果显示,意外性与不确定性呈现出鲜明的时间模式,尤其体现在音高和音高-节奏序列上,反映了 20 世纪初至 21 世纪各年代的时代特征;相反,节奏序列在各年代间表现出一致的不确定性水平。此外,声学属性在不同时期间保持不变。这些发现凸显了即兴音乐中意外性与不确定性的时间动态如何随年代演变,并深刻影响各个年代艺术家所采用的独特即兴方法。研究还表明,即兴音乐的发展可归因于大脑的适应性统计学习机制:它不断修正内部模型,以反映各自时代的文化与情感意蕴。本研究揭示了即兴音乐的演化轨迹,并强调了艺术家为呼应其时代的文化与情感风貌所做出的细微调整。
results: 官方挑战结果显示,我们的系统在自然性方面表现出色,在任务1和任务2中分别获得了第1名和第2名。进一步的ablation研究证明了我们的系统设计的有效性。Abstract
This paper introduces the T23 team's system submitted to the Singing Voice Conversion Challenge 2023. Following the recognition-synthesis framework, our singing conversion model is based on VITS, incorporating four key modules: a prior encoder, a posterior encoder, a decoder, and a parallel bank of transposed convolutions (PBTC) module. We particularly leverage Whisper, a powerful pre-trained ASR model, to extract bottleneck features (BNF) as the input of the prior encoder. Before BNF extraction, we perform pitch perturbation to the source signal to remove speaker timbre, which effectively avoids the leakage of the source speaker timbre to the target. Moreover, the PBTC module extracts multi-scale F0 as the auxiliary input to the prior encoder, thereby capturing better pitch variations of singing. We design a three-stage training strategy to better adapt the base model to the target speaker with limited target speaker data. Official challenge results show that our system has superior performance in naturalness, ranking 1st and 2nd respectively in Task 1 and 2. Further ablation justifies the effectiveness of our system design.
摘要
这篇论文介绍了 T23 团队提交给 2023 年歌声转换挑战赛(Singing Voice Conversion Challenge)的系统。我们的歌声转换模型遵循识别-合成框架,基于 VITS,包含四个关键模块:先验编码器、后验编码器、解码器以及并行转置卷积组(PBTC)模块。我们特别利用强大的预训练 ASR 模型 Whisper 提取瓶颈特征(BNF)作为先验编码器的输入。在提取 BNF 之前,我们对源信号进行音高扰动以去除说话人音色,从而有效避免源说话人音色泄漏到目标中。此外,PBTC 模块提取多尺度 F0 作为先验编码器的辅助输入,以更好地捕捉歌唱中的音高变化。我们设计了三阶段训练策略,以便在目标说话人数据有限的情况下更好地使基础模型适配目标说话人。官方挑战结果显示,我们的系统在自然度方面表现出色,在任务 1 和任务 2 中分别排名第一和第二。进一步的消融实验验证了系统设计的有效性。
The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains
results: 研究发现,法语文本转语音合成的两个子赛道在可预测性上存在很大差异,而歌声转换样本的预测难度并不像我们预期的那么高。Abstract
We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech. This year, we emphasize real-world and challenging zero-shot out-of-domain MOS prediction with three tracks for three different voice evaluation scenarios. Ten teams from industry and academia in seven different countries participated. Surprisingly, we found that the two sub-tracks of French text-to-speech synthesis had large differences in their predictability, and that singing voice-converted samples were not as difficult to predict as we had expected. Use of diverse datasets and listener information during training appeared to be successful approaches.
摘要
我们介绍第二届 VoiceMOS 挑战赛,这是一项旨在推动对合成与处理语音的平均意见得分(MOS)进行自动预测研究的科学活动。今年,我们着重于真实世界且具有挑战性的零样本跨域 MOS 预测,针对三种不同的语音评估场景设置了三个赛道。来自七个国家的十支产业界和学术界队伍参赛。出乎意料的是,我们发现法语文本转语音合成的两个子赛道在可预测性上差异很大,而歌声转换样本的预测难度并不像我们预期的那么高。在训练中使用多样化的数据集和听者信息被证明是有效的做法。
paper_authors: Bryan Bo Cao, Abrar Alali, Hansi Liu, Nicholas Meegan, Marco Gruteser, Kristin Dana, Ashwin Ashok, Shubham Jain
for: 本研究旨在改进基于摄像头的 IoT 应用(如安防监控、智慧城市交通安全提升、车辆与行人通信等)中的人体跟踪功能。
methods: 本研究使用 transformer 模型从手机数据(IMU 与精细时间测量)重建视觉 bounding box 轨迹,该模型能更好地建模长时程时间序列数据。
results: 研究结果显示,与最先进的 X-Translator 模型相比,ViFiT 能更好地重建视觉 bounding box 轨迹,MRFR 达到 0.65,帧数削减率达 97.76%。
Tracking subjects in videos is one of the most widely used functions in camera-based IoT applications such as security surveillance, smart city traffic safety enhancement, vehicle to pedestrian communication and so on. In the computer vision domain, tracking is usually achieved by first detecting subjects with bounding boxes, then associating detected bounding boxes across video frames. For many IoT systems, images captured by cameras are usually sent over the network to be processed at a different site that has more powerful computing resources than edge devices. However, sending entire frames through the network causes significant bandwidth consumption that may exceed the system bandwidth constraints. To tackle this problem, we propose ViFiT, a transformer-based model that reconstructs vision bounding box trajectories from phone data (IMU and Fine Time Measurements). It leverages a transformer ability of better modeling long-term time series data. ViFiT is evaluated on Vi-Fi Dataset, a large-scale multimodal dataset in 5 diverse real world scenes, including indoor and outdoor environments. To fill the gap of proper metrics of jointly capturing the system characteristics of both tracking quality and video bandwidth reduction, we propose a novel evaluation framework dubbed Minimum Required Frames (MRF) and Minimum Required Frames Ratio (MRFR). ViFiT achieves an MRFR of 0.65 that outperforms the state-of-the-art approach for cross-modal reconstruction in LSTM Encoder-Decoder architecture X-Translator of 0.98, resulting in a high frame reduction rate as 97.76%.
摘要
“在视频中跟踪目标是基于摄像头的 IoT 应用中使用最广泛的功能之一,例如安防监控、智慧城市交通安全提升、车辆与行人通信等。在计算机视觉领域,跟踪通常先用边界框检测目标,再在视频帧之间关联检测到的边界框。在许多 IoT 系统中,摄像头采集的图像通常要通过网络发送到计算资源比边缘设备更强的站点进行处理。然而,通过网络传输完整视频帧会带来巨大的带宽消耗,可能超出系统的带宽限制。为解决这一问题,我们提出了 ViFiT,一种基于 transformer 的模型,可从手机数据(IMU 和精细时间测量)重建视觉边界框轨迹,它利用了 transformer 更善于建模长时程时间序列数据的能力。ViFiT 在 Vi-Fi 数据集上进行了评估,该大规模多模态数据集涵盖 5 个不同的真实场景,包括室内和室外环境。为了弥补缺乏能同时刻画跟踪质量与视频带宽削减这两方面系统特性的合适指标,我们提出了一个新的评估框架,称为最少所需帧数(Minimum Required Frames,MRF)及最少所需帧数比率(MRFR)。ViFiT 的 MRFR 为 0.65,优于跨模态重建的最新方法、基于 LSTM 编码器-解码器架构的 X-Translator(0.98),实现了高达 97.76% 的帧数削减率。”
Shielding the Unseen: Privacy Protection through Poisoning NeRF with Spatial Deformation
results: 我们在两个常见 NeRF benchmark 数据集上进行了广泛的测试,包括 29 个真实世界场景的高质量图像。我们的结果表明,我们的隐私保护方法在这些 benchmark 数据集上具有显著的降低 NeRF 性能的能力。此外,我们还证明了我们的方法可以适应不同的干扰强度和 NeRF 架构。这种研究为 NeRF 的潜在隐私风险提供了重要的认知,并强调了在开发 Robust 3D 场景重建算法时需要考虑隐私问题。我们的研究贡献到了负责任 AI 和生成机器学习领域,旨在保护用户隐私和尊重数字时代的创作权。Abstract
In this paper, we introduce an innovative method of safeguarding user privacy against the generative capabilities of Neural Radiance Fields (NeRF) models. Our novel poisoning attack method induces changes to observed views that are imperceptible to the human eye, yet potent enough to disrupt NeRF's ability to accurately reconstruct a 3D scene. To achieve this, we devise a bi-level optimization algorithm incorporating a Projected Gradient Descent (PGD)-based spatial deformation. We extensively test our approach on two common NeRF benchmark datasets consisting of 29 real-world scenes with high-quality images. Our results compellingly demonstrate that our privacy-preserving method significantly impairs NeRF's performance across these benchmark datasets. Additionally, we show that our method is adaptable and versatile, functioning across various perturbation strengths and NeRF architectures. This work offers valuable insights into NeRF's vulnerabilities and emphasizes the need to account for such potential privacy risks when developing robust 3D scene reconstruction algorithms. Our study contributes to the larger conversation surrounding responsible AI and generative machine learning, aiming to protect user privacy and respect creative ownership in the digital age.
摘要
在这篇论文中,我们提出了一种创新方法,用于保护用户隐私免受神经辐射场(NeRF)模型生成能力的侵害。我们提出的新型投毒攻击方法会在被观测视图中引入人眼无法察觉、却足以破坏 NeRF 精确重建 3D 场景能力的扰动。为此,我们设计了一种双层优化算法,其中包含基于投影梯度下降(PGD)的空间形变。我们在两个常用的 NeRF 基准数据集上进行了广泛测试,这些数据集包含 29 个带高质量图像的真实场景。结果有力地表明,我们的隐私保护方法能显著削弱 NeRF 在这些基准数据集上的性能。此外,我们还证明了该方法具有适应性和通用性,可在不同扰动强度和 NeRF 架构下工作。这项工作为认识 NeRF 的脆弱性提供了有价值的见解,并强调在开发鲁棒的 3D 场景重建算法时必须考虑此类潜在的隐私风险。我们的研究是负责任 AI 与生成式机器学习这一更大讨论的一部分,旨在保护用户隐私并尊重数字时代的创作权。
Blind CT Image Quality Assessment Using DDPM-derived Content and Transformer-based Evaluator
results: 该方法在 MICCAI 2023 低剂量计算机断层成像质量评估大奖赛中获得了第二名,并在挑战数据集上进一步提高了表现。Abstract
Lowering radiation dose per view and utilizing sparse views per scan are two common CT scan modes, albeit often leading to distorted images characterized by noise and streak artifacts. Blind image quality assessment (BIQA) strives to evaluate perceptual quality in alignment with what radiologists perceive, which plays an important role in advancing low-dose CT reconstruction techniques. An intriguing direction involves developing BIQA methods that mimic the operational characteristic of the human visual system (HVS). The internal generative mechanism (IGM) theory reveals that the HVS actively deduces primary content to enhance comprehension. In this study, we introduce an innovative BIQA metric that emulates the active inference process of IGM. Initially, an active inference module, implemented as a denoising diffusion probabilistic model (DDPM), is constructed to anticipate the primary content. Then, the dissimilarity map is derived by assessing the interrelation between the distorted image and its primary content. Subsequently, the distorted image and dissimilarity map are combined into a multi-channel image, which is inputted into a transformer-based image quality evaluator. Remarkably, by exclusively utilizing this transformer-based quality evaluator, we won the second place in the MICCAI 2023 low-dose computed tomography perceptual image quality assessment grand challenge. Leveraging the DDPM-derived primary content, our approach further improves the performance on the challenge dataset.
摘要
降低每视角的辐射剂量和每次扫描使用稀疏视图是两种常见的 CT 扫描模式,但二者往往导致图像失真,出现噪声和条纹伪影。盲图像质量评估(BIQA)致力于按照放射科医生的感知来评价图像质量,对推进低剂量 CT 重建技术具有重要作用。一个有趣的方向是开发模拟人类视觉系统(HVS)工作特性的 BIQA 方法。内部生成机制(IGM)理论表明,HVS 会主动推断主要内容以增强理解。在本研究中,我们提出了一种模拟 IGM 主动推理过程的创新 BIQA 度量。首先,我们构建了一个以去噪扩散概率模型(DDPM)实现的主动推理模块,用于预测图像的主要内容;随后,通过评估失真图像与其主要内容之间的相互关系得到差异图;最后,将失真图像与差异图组合成多通道图像,输入基于 transformer 的图像质量评估器。值得一提的是,仅使用这一基于 transformer 的质量评估器,我们就在 MICCAI 2023 低剂量 CT 感知图像质量评估大挑战赛中获得第二名;而利用 DDPM 推断的主要内容后,我们的方法在挑战数据集上进一步提升了性能。
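The input-assembly step, combining the distorted slice with the DDPM-estimated primary content into a dissimilarity map and a multi-channel tensor, can be sketched as below. The DDPM and the transformer evaluator are not reproduced, and the absolute-difference dissimilarity is an illustrative choice rather than the paper's exact measure.

```python
# Sketch of the input-assembly step only: combine the distorted slice and its
# DDPM-estimated primary content into a dissimilarity map and a multi-channel
# tensor for the quality evaluator. Both images here are stand-in arrays.
import numpy as np

rng = np.random.default_rng(0)
primary = rng.random((512, 512)).astype(np.float32)                         # stand-in DDPM output
distorted = (primary + 0.1 * rng.normal(size=primary.shape)).astype(np.float32)  # noisy CT slice

def normalize(img):
    return (img - img.min()) / (img.max() - img.min() + 1e-8)

dissimilarity = normalize(np.abs(normalize(distorted) - normalize(primary)))

# Channel-stack: [distorted, dissimilarity] -> (2, H, W), ready for a ViT-style evaluator.
evaluator_input = np.stack([normalize(distorted), dissimilarity], axis=0)
print(evaluator_input.shape)
```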
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition
results: 该论文的实验结果表明,相比 CNN 模型,MoE 架构在 RWF 数据集上达到了 92.4% 的准确率。Abstract
Video violence recognition based on deep learning concerns accurate yet scalable human violence recognition. Currently, most state-of-the-art video violence recognition studies use CNN-based models to represent and categorize videos. However, recent studies suggest that pre-trained transformers are more accurate than CNN-based models on various video analysis benchmarks. Yet these models are not thoroughly evaluated for video violence recognition. This paper introduces a novel transformer-based Mixture of Experts (MoE) video violence recognition system. Through an intelligent combination of large vision transformers and efficient transformer architectures, the proposed system not only takes advantage of the vision transformer architecture but also reduces the cost of utilizing large vision transformers. The proposed architecture maximizes violence recognition system accuracy while actively reducing computational costs through a reinforcement learning-based router. The empirical results show the proposed MoE architecture's superiority over CNN-based models by achieving 92.4% accuracy on the RWF dataset.
摘要
基于深度学习的视频暴力识别关注的是既准确又可扩展的人类暴力行为识别。目前,大多数最先进的视频暴力识别研究使用基于 CNN 的模型来表示和分类视频。然而,最近的研究表明,预训练的 transformer 在多种视频分析基准上比基于 CNN 的模型更为准确,但这些模型尚未在视频暴力识别上得到充分评估。本文提出了一种新的基于 transformer 的专家混合(MoE)视频暴力识别系统。通过将大型视觉 transformer 与高效 transformer 架构智能结合,所提系统既利用了视觉 transformer 架构的优势,又降低了使用大型视觉 transformer 的开销。该架构通过基于强化学习的路由器,在最大化暴力识别准确率的同时主动降低计算成本。实验结果表明,所提 MoE 架构在 RWF 数据集上达到了 92.4% 的准确率,优于基于 CNN 的模型。
Creating an Atlas of Normal Tissue for Pruning WSI Patching Through Anomaly Detection
paper_authors: Peyman Nejat, Areej Alsaafin, Ghazal Alabtah, Nneka Comfere, Aaron Mangold, Dennis Murphree, Patricija Zot, Saba Yasir, Joaquin J. Garcia, H. R. Tizhoosh
methods: 本研究提出仅使用正常组织活检的 WSI 样本建立“正常组织图谱(Atlas of normal tissue)”,以剔除正常组织学成分的干扰和冗余,提高所选图块的代表性。
results: 研究使用 107 张正常皮肤 WSI 建立了正常图谱,并使用 553 张皮肤鳞状细胞癌 WSI 和 451 张乳腺 WSI 验证了该方法的有效性。结果表明,利用正常图谱可将选中的 WSI 图块数量减少 30%-50%,同时保持同样的索引和检索性能。Abstract
Patching gigapixel whole slide images (WSIs) is an important task in computational pathology. Some methods have been proposed to select a subset of patches as WSI representation for downstream tasks. While most of the computational pathology tasks are designed to classify or detect the presence of pathological lesions in each WSI, the confounding role and redundant nature of normal histology in tissue samples are generally overlooked in WSI representations. In this paper, we propose and validate the concept of an "atlas of normal tissue" solely using samples of WSIs obtained from normal tissue biopsies. Such atlases can be employed to eliminate normal fragments of tissue samples and hence increase the representativeness collection of patches. We tested our proposed method by establishing a normal atlas using 107 normal skin WSIs and demonstrated how established indexes and search engines like Yottixel can be improved. We used 553 WSIs of cutaneous squamous cell carcinoma (cSCC) to show the advantage. We also validated our method applied to an external dataset of 451 breast WSIs. The number of selected WSI patches was reduced by 30% to 50% after utilizing the proposed normal atlas while maintaining the same indexing and search performance in leave-one-patinet-out validation for both datasets. We show that the proposed normal atlas shows promise for unsupervised selection of the most representative patches of the abnormal/malignant WSI lesions.
摘要
对 gigapixel 全切片图像(WSIs)进行分块(patching)是计算病理学中的一项重要任务。已有一些方法被提出,用于选取一部分图块作为 WSI 的表示以服务下游任务。尽管大多数计算病理学任务旨在对每张 WSI 中的病变进行分类或检测,但组织样本中正常组织学成分的干扰作用和冗余性通常在 WSI 表示中被忽视。在本文中,我们提出并验证了仅利用正常组织活检 WSI 样本构建“正常组织图谱”的概念。这类图谱可用于剔除组织样本中的正常组织片段,从而提升所选图块集合的代表性。我们使用 107 张正常皮肤 WSI 建立了正常图谱来测试所提方法,并展示了如何改进 Yottixel 等已有的索引与检索引擎。我们使用 553 张皮肤鳞状细胞癌(cSCC)WSI 展示了这一优势,并在 451 张乳腺 WSI 的外部数据集上验证了该方法。在两个数据集的留一病人(leave-one-patient-out)验证中,利用所提正常图谱后,被选中的 WSI 图块数量减少了 30% 至 50%,同时保持了相同的索引与检索性能。结果表明,所提正常图谱有望用于无监督地选取最能代表异常/恶性 WSI 病变的图块。
Privacy-preserving Multi-biometric Indexing based on Frequent Binary Patterns
results: 实验结果表明,所提多生物特征识别系统可将计算工作量分别降低到约 57%(索引三种生物特征)和 53%(索引两种生物特征),同时在高安全阈值下提升基线生物识别系统的识别性能。Abstract
The development of large-scale identification systems that ensure the privacy protection of enrolled subjects represents a major challenge. Biometric deployments that provide interoperability and usability by including efficient multi-biometric solutions are a recent requirement. In the context of privacy protection, several template protection schemes have been proposed in the past. However, these schemes seem inadequate for indexing (workload reduction) in biometric identification systems. More specifically, they have been used in identification systems that perform exhaustive searches, leading to a degradation of computational efficiency. To overcome these limitations, we propose an efficient privacy-preserving multi-biometric identification system that retrieves protected deep cancelable templates and is agnostic with respect to biometric characteristics and biometric template protection schemes. To this end, a multi-biometric binning scheme is designed to exploit the low intra-class variation properties contained in the frequent binary patterns extracted from different types of biometric characteristics. Experimental results reported on publicly available databases using state-of-the-art Deep Neural Network (DNN)-based embedding extractors show that the protected multi-biometric identification system can reduce the computational workload to approximately 57\% (indexing up to three types of biometric characteristics) and 53% (indexing up to two types of biometric characteristics), while simultaneously improving the biometric performance of the baseline biometric system at the high-security thresholds. The source code of the proposed multi-biometric indexing approach together with the composed multi-biometric dataset, will be made available to the research community once the article is accepted.
摘要
开发能够保护注册用户隐私的大规模身份识别系统是一项重大挑战。包含高效多生物特征方案、兼顾互操作性与可用性的生物识别部署是近来的新需求。在隐私保护方面,过去已经提出了多种模板保护方案,但这些方案似乎并不适合生物识别系统中的索引(降低工作量):更具体地说,它们往往被用在进行穷举搜索的识别系统中,导致计算效率下降。为克服这些限制,我们提出了一种高效的隐私保护多生物特征识别系统,它检索受保护的深度可撤销模板,并且与具体的生物特征类型及生物模板保护方案无关。为此,我们设计了一种多生物特征分桶方案,利用从不同类型生物特征中提取的频繁二值模式所蕴含的低类内变化特性。在公开数据库上、使用最先进的基于深度神经网络(DNN)的嵌入提取器得到的实验结果表明,受保护的多生物特征识别系统可将计算工作量降低到约 57%(索引至多三种生物特征)和 53%(索引至多两种生物特征),同时在高安全阈值下提升基线生物识别系统的识别性能。论文被接收后,我们将向研究社区公开所提多生物特征索引方法的源代码以及所构建的多生物特征数据集。
Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models
results: 该研究的实验和评估结果表明,与现有方法相比,提出的机制能够更有效地生成3D一致的零shot视图合成结果。我们的项目页面是https://jianglongye.com/consistent123/Abstract
Zero-shot novel view synthesis (NVS) from a single image is an essential problem in 3D object understanding. While recent approaches that leverage pre-trained generative models can synthesize high-quality novel views from in-the-wild inputs, they still struggle to maintain 3D consistency across different views. In this paper, we present Consistent-1-to-3, which is a generative framework that significantly mitigate this issue. Specifically, we decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions. We design a scene representation transformer and view-conditioned diffusion model for performing these two stages respectively. Inside the models, to enforce 3D consistency, we propose to employ epipolor-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view information. Finally, we design a hierarchy generation paradigm to generate long sequences of consistent views, allowing a full 360 observation of the provided object image. Qualitative and quantitative evaluation over multiple datasets demonstrate the effectiveness of the proposed mechanisms against state-of-the-art approaches. Our project page is at https://jianglongye.com/consistent123/
摘要
零样本新视图合成(NVS)从单张图像出发,是三维物体理解中的一个关键问题。尽管最近利用预训练生成模型的方法可以从真实场景输入中合成高质量的新视图,但它们仍难以在不同视图之间保持3D一致性。在这篇论文中,我们提出了 Consistent-1-to-3,一个能显著缓解该问题的生成框架。具体而言,我们将 NVS 任务分解为两个阶段:(i)将观察到的区域变换到新视图;(ii)生成未被观察到的区域。我们分别设计了场景表示变换器和视角条件扩散模型来完成这两个阶段。在模型内部,为了保证3D一致性,我们提出使用对极几何引导的注意力来引入几何约束,并使用多视角注意力来更好地聚合多视角信息。最后,我们设计了一种层次生成范式,可以生成长序列的一致视图,从而对所给物体图像实现完整的360度观察。在多个数据集上的定性和定量评估表明了所提机制相对于最先进方法的有效性。我们的项目页面是 https://jianglongye.com/consistent123/ 。
Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day
paper_authors: Yifan Jiang, Hao Tang, Jen-Hao Rick Chang, Liangchen Song, Zhangyang Wang, Liangliang Cao
for: novel view synthesis from a single image
methods: crafted timestep sampling strategy, superior 3D feature extractor, enhanced training scheme
results: reduced training time from 10 days to less than 1 day, significantly accelerating the training process.Abstract
The task of novel view synthesis aims to generate unseen perspectives of an object or scene from a limited set of input images. Nevertheless, synthesizing novel views from a single image still remains a significant challenge in the realm of computer vision. Previous approaches tackle this problem by adopting mesh prediction, multi-plane image construction, or more advanced techniques such as neural radiance fields. Recently, a pre-trained diffusion model that is specifically designed for 2D image synthesis has demonstrated its capability in producing photorealistic novel views, if sufficiently optimized on a 3D finetuning task. Although the fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in a notoriously long time and high computational costs. To tackle this issue, we propose Efficient-3DiM, a simple but effective framework to learn a single-image novel-view synthesizer. Motivated by our in-depth analysis of the inference process of diffusion models, we propose several pragmatic strategies to reduce the training overhead to a manageable scale, including a crafted timestep sampling strategy, a superior 3D feature extractor, and an enhanced training scheme. When combined, our framework is able to reduce the total training time from 10 days to less than 1 day, significantly accelerating the training process under the same computational platform (one instance with 8 Nvidia A100 GPUs). Comprehensive experiments are conducted to demonstrate the efficiency and generalizability of our proposed method.
摘要
新视图合成的任务是从有限的输入图像中生成对象或场景的未见视角,但仅从单张图像合成新视图仍是计算机视觉领域的一个重大挑战。先前的方法通过网格预测、多平面图像构建或神经辐射场等更高级的技术来解决这个问题。最近,一个专门为2D图像合成设计的预训练扩散模型在经过充分的3D微调后,能够生成逼真的新视图。尽管保真度和泛化能力大幅提升,但训练如此强大的扩散模型需要大量的训练数据和模型参数,带来极长的训练时间和高昂的计算成本。为解决这个问题,我们提出了 Efficient-3DiM,一个简单而有效的框架,用于学习单图像新视图合成器。基于我们对扩散模型推理过程的深入分析,我们提出了若干实用策略来将训练开销降低到可控规模,包括精心设计的时间步采样策略、更优的3D特征提取器和改进的训练方案。将这些策略结合使用后,我们的框架可以在相同的计算平台(一台配备8块 Nvidia A100 GPU 的机器)上将总训练时间从10天减少到不足1天,显著加速训练过程。我们进行了全面的实验来证明所提方法的效率和泛化能力。
Towards Domain-Specific Features Disentanglement for Domain Generalization
results: 我们在多个基准数据集上进行了广泛的实验,证明了我们的方法显著优于其他最先进方法。此外,可视化评估也证明了我们的方法可以有效地实现特征解耦。Abstract
Distributional shift between domains poses great challenges to modern machine learning algorithms. Domain generalization (DG) is a popular line of research targeting this issue, where these methods intend to uncover universal patterns across disparate distributions. Notably, the crucial challenge behind DG is the existence of irrelevant domain features, and most prior works overlook this information. Motivated by this, we propose a novel contrastive-based disentanglement method, CDDG, to effectively utilize the disentangled features to exploit the overlooked domain-specific features, and thus facilitate the extraction of the desired cross-domain category features for DG tasks. Specifically, CDDG learns to decouple inherent mutually exclusive features by leveraging them in the latent space, thus making the learning discriminative. Extensive experiments conducted on various benchmark datasets demonstrate the superiority of our method compared to other state-of-the-art approaches. Furthermore, visualization evaluations confirm the potential of our method in achieving effective feature disentanglement.
摘要
不同领域之间的分布偏移给现代机器学习算法带来了巨大挑战。领域泛化(DG)是解决这一问题的一条流行路线,这类方法旨在发现不同分布之间的通用模式。值得注意的是,DG 背后的核心挑战在于不相关的领域特有特征的存在,而大多数先前工作忽略了这一信息。受此启发,我们提出了一种新的基于对比学习的解耦方法 CDDG,有效利用解耦后的特征来挖掘被忽视的领域特有特征,从而促进 DG 任务所需的跨领域类别特征的提取。具体来说,CDDG 通过在潜在空间中利用本质上互斥的特征来学习解耦,从而使学习具有判别性。在多个基准数据集上进行的大量实验表明,我们的方法优于其他最先进方法。此外,可视化评估也证实了我们的方法在实现有效特征解耦方面的潜力。
COOLer: Class-Incremental Learning for Appearance-Based Multiple Object Tracking
results: 实验结果表明,COOLer可以逐步学习新类别的跟踪,同时有效缓解对过去类别重识别特征的灾难性遗忘。代码可以在https://github.com/BoSmallEar/COOLer上下载。Abstract
Continual learning allows a model to learn multiple tasks sequentially while retaining the old knowledge without the training data of the preceding tasks. This paper extends the scope of continual learning research to class-incremental learning for multiple object tracking (MOT), which is desirable to accommodate the continuously evolving needs of autonomous systems. Previous solutions for continual learning of object detectors do not address the data association stage of appearance-based trackers, leading to catastrophic forgetting of previous classes' re-identification features. We introduce COOLer, a COntrastive- and cOntinual-Learning-based tracker, which incrementally learns to track new categories while preserving past knowledge by training on a combination of currently available ground truth labels and pseudo-labels generated by the past tracker. To further enhance the disentanglement of instance representations, we introduce a novel contrastive class-incremental instance representation learning technique. Finally, we propose a practical evaluation protocol for continual learning for MOT and conduct experiments on the BDD100K and SHIFT datasets. Experimental results demonstrate that COOLer continually learns while effectively addressing catastrophic forgetting of both tracking and detection. The code is available at https://github.com/BoSmallEar/COOLer.
摘要
持续学习可以让模型按顺序学习多个任务,并在没有先前任务训练数据的情况下保留旧知识。本文将持续学习研究的范围扩展到多目标跟踪(MOT)中的类别增量学习,以适应自主系统不断演变的需求。先前针对目标检测器的持续学习方案没有处理基于外观的跟踪器中的数据关联阶段,导致对过去类别重识别特征的灾难性遗忘。我们介绍了 COOLer,一个基于对比学习和持续学习的跟踪器,它通过在当前可用的真实标注与由过去跟踪器生成的伪标签的组合上进行训练,逐步学习跟踪新类别,同时保留过去的知识。为了进一步增强实例表示的解耦,我们提出了一种新的对比式类别增量实例表示学习技术。最后,我们为 MOT 的持续学习提出了一个实用的评估协议,并在 BDD100K 和 SHIFT 数据集上进行了实验。实验结果表明,COOLer 能够持续学习,并有效缓解跟踪和检测两方面的灾难性遗忘。代码可以在 https://github.com/BoSmallEar/COOLer 上获取。
Reversing Deep Face Embeddings with Probable Privacy Protection
paper_authors: Daile Osorio-Roig, Paul A. Gerlitz, Christian Rathgeb, Christoph Busch
For: The paper aims to evaluate the effectiveness of soft-biometric privacy-enhancement approaches in protecting face embeddings, and to assess the vulnerability of state-of-the-art face embedding extractors to attacks that attempt to reconstruct the original face images.* Methods: The paper uses a well-known state-of-the-art face image reconstruction approach to evaluate the effectiveness of soft-biometric privacy protection methods. The authors also analyze the transformation complexity used for privacy protection and assess the vulnerability of state-of-the-art face embedding extractors to attacks.* Results: The paper shows that biometric privacy-enhanced face embeddings can be reconstructed with an accuracy of up to approximately 98%, depending on the complexity of the protection algorithm. This suggests that while soft-biometric privacy-enhancement approaches can provide some level of protection, they may not be sufficient to ensure complete privacy protection for face embeddings.Abstract
Generally, privacy-enhancing face recognition systems are designed to offer permanent protection of face embeddings. Recently, so-called soft-biometric privacy-enhancement approaches have been introduced with the aim of canceling soft-biometric attributes. These methods limit the amount of soft-biometric information (gender or skin-colour) that can be inferred from face embeddings. Previous work has underlined the need for research into rigorous evaluations and standardised evaluation protocols when assessing privacy protection capabilities. Motivated by this fact, this paper explores to what extent the non-invertibility requirement can be met by methods that claim to provide soft-biometric privacy protection. Additionally, a detailed vulnerability assessment of state-of-the-art face embedding extractors is analysed in terms of the transformation complexity used for privacy protection. In this context, a well-known state-of-the-art face image reconstruction approach has been evaluated on protected face embeddings to break soft biometric privacy protection. Experimental results show that biometric privacy-enhanced face embeddings can be reconstructed with an accuracy of up to approximately 98%, depending on the complexity of the protection algorithm.
摘要
通常,隐私增强的人脸识别系统旨在为人脸嵌入提供永久保护。最近,所谓的软生物特征隐私增强方法被提出,目的是消除软生物特征属性。这些方法限制了可以从人脸嵌入中推断出的软生物特征信息(如性别或肤色)的数量。先前的工作强调,在评估隐私保护能力时需要严格的评估和标准化的评估协议。基于这一事实,本文探讨了声称提供软生物特征隐私保护的方法在多大程度上能够满足不可逆性要求。此外,本文还从用于隐私保护的变换复杂度角度,对最先进的人脸嵌入提取器进行了详细的脆弱性评估。在此背景下,我们在受保护的人脸嵌入上评估了一种著名的最先进人脸图像重建方法,以攻破软生物特征隐私保护。实验结果表明,经过隐私增强的人脸嵌入可以以高达约98%的准确率被重建,具体取决于保护算法的复杂度。
Optimizing Key-Selection for Face-based One-Time Biometrics via Morphing
paper_authors: Daile Osorio-Roig, Mahdi Ghafourian, Christian Rathgeb, Ruben Vera-Rodriguez, Christoph Busch, Julian Fierrez
for: 提高人脸识别系统抵御对抗攻击的安全性
methods: 提出了不同的密钥选择策略,以在信号层面提高安全性
results: 实验结果表明,某些密钥选择策略可以完全阻止对抗攻击;而在最实用的阈值下,攻击成功率可降至约5.0%。Abstract
Nowadays, facial recognition systems are still vulnerable to adversarial attacks. These attacks vary from simple perturbations of the input image to modifying the parameters of the recognition model to impersonate an authorised subject. So-called privacy-enhancing facial recognition systems have been mostly developed to provide protection of stored biometric reference data, i.e. templates. In the literature, privacy-enhancing facial recognition approaches have focused solely on conventional security threats at the template level, ignoring the growing concern related to adversarial attacks. Up to now, few works have provided mechanisms to protect face recognition against adversarial attacks while maintaining high security at the template level. In this paper, we propose different key selection strategies to improve the security of a competitive cancelable scheme operating at the signal level. Experimental results show that certain strategies based on signal-level key selection can lead to complete blocking of the adversarial attack based on an iterative optimization for the most secure threshold, while for the most practical threshold, the attack success chance can be decreased to approximately 5.0%.
摘要
目前,人脸识别系统仍然容易受到对抗攻击。这些攻击形式多样,从对输入图像的简单扰动到修改识别模型的参数以冒充授权对象。所谓的隐私增强人脸识别系统主要是为了保护存储的生物特征参考数据(即模板)而开发的。在已有文献中,隐私增强人脸识别方法仅关注模板层面的传统安全威胁,而忽视了日益受到关注的对抗攻击问题。迄今为止,很少有工作在保持模板层面高安全性的同时,提供保护人脸识别免受对抗攻击的机制。在这篇论文中,我们提出了不同的密钥选择策略,以提高一种在信号层面运行的有竞争力的可撤销方案的安全性。实验结果表明,基于信号层面密钥选择的某些策略可以在最安全的阈值下完全阻止基于迭代优化的对抗攻击;而在最实用的阈值下,攻击成功率可降至约5.0%。
Fully Automatic Segmentation of Gross Target Volume and Organs-at-Risk for Radiotherapy Planning of Nasopharyngeal Carcinoma
methods: 我们提出了一种完全自动的框架,并开发了两种模型,一种用于45个Organs at Risk(OARs)的分割,另一种用于两个Gross Tumor Volumes(GTVs)的分割。我们使用了协调Intensity Distributions的预处理方法,然后自动对目标区域进行裁剪。
results: 我们在SegRap 2023挑战的验证阶段中,使用了这种方法获得了每个任务的第二名。我们的框架可以在https://github.com/Astarakee/segrap2023上获取。Abstract
Target segmentation in CT images of Head&Neck (H&N) region is challenging due to low contrast between adjacent soft tissue. The SegRap 2023 challenge has been focused on benchmarking the segmentation algorithms of Nasopharyngeal Carcinoma (NPC) which would be employed as auto-contouring tools for radiation treatment planning purposes. We propose a fully-automatic framework and develop two models for a) segmentation of 45 Organs at Risk (OARs) and b) two Gross Tumor Volumes (GTVs). To this end, we preprocess the image volumes by harmonizing the intensity distributions and then automatically cropping the volumes around the target regions. The preprocessed volumes were employed to train a standard 3D U-Net model for each task, separately. Our method took second place for each of the tasks in the validation phase of the challenge. The proposed framework is available at https://github.com/Astarakee/segrap2023
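For readers who want a concrete picture of the preprocessing pipeline described in the SegRap framework above (intensity harmonization followed by automatic cropping around the target region), the sketch below shows one plausible NumPy implementation. The percentile clipping bounds, z-score normalization, and crop margin are illustrative assumptions, not the challenge entry's actual settings.

```python
import numpy as np

def harmonize_intensity(volume, p_low=0.5, p_high=99.5):
    """Clip outlier intensities and z-score normalize a CT volume.
    Percentiles and z-scoring are illustrative assumptions."""
    lo, hi = np.percentile(volume, [p_low, p_high])
    v = np.clip(volume, lo, hi)
    return (v - v.mean()) / (v.std() + 1e-8)

def crop_around_mask(volume, coarse_mask, margin=8):
    """Crop the volume to the bounding box of a coarse target mask
    plus a fixed voxel margin (hypothetical margin value)."""
    idx = np.argwhere(coarse_mask > 0)
    lo = np.maximum(idx.min(axis=0) - margin, 0)
    hi = np.minimum(idx.max(axis=0) + margin + 1, volume.shape)
    return volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]

# Toy usage with a synthetic volume and a coarse target mask.
vol = np.random.randn(64, 64, 64) * 100
mask = np.zeros_like(vol)
mask[20:40, 25:45, 30:50] = 1
cropped = crop_around_mask(harmonize_intensity(vol), mask)
print(cropped.shape)  # (36, 36, 36) with the assumed margin of 8
```

The cropped, harmonized volumes would then be fed to the per-task 3D U-Nets mentioned in the abstract.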
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
results: 在两个具有挑战性的数据集(SUN-RGBD和ScanNet)上,本文的方法实现了有效的新类别物体定位与分类。与表现最佳的替代方法相比,本文的方法带来了80%的mAP提升。代码和预训练模型已在项目页面上发布。Abstract
Open-vocabulary 3D Object Detection (OV-3DDet) aims to detect objects from an arbitrary list of categories within a 3D scene, which remains seldom explored in the literature. There are primarily two fundamental problems in OV-3DDet, i.e., localizing and classifying novel objects. This paper aims at addressing the two problems simultaneously via a unified framework, under the condition of limited base categories. To localize novel 3D objects, we propose an effective 3D Novel Object Discovery strategy, which utilizes both the 3D box geometry priors and 2D semantic open-vocabulary priors to generate pseudo box labels of the novel objects. To classify novel object boxes, we further develop a cross-modal alignment module based on discovered novel boxes, to align feature spaces between 3D point cloud and image/text modalities. Specifically, the alignment process contains a class-agnostic and a class-discriminative alignment, incorporating not only the base objects with annotations but also the increasingly discovered novel objects, resulting in an iteratively enhanced alignment. The novel box discovery and crossmodal alignment are jointly learned to collaboratively benefit each other. The novel object discovery can directly impact the cross-modal alignment, while a better feature alignment can, in turn, boost the localization capability, leading to a unified OV-3DDet framework, named CoDA, for simultaneous novel object localization and classification. Extensive experiments on two challenging datasets (i.e., SUN-RGBD and ScanNet) demonstrate the effectiveness of our method and also show a significant mAP improvement upon the best-performing alternative method by 80%. Codes and pre-trained models are released on the project page.
摘要
开放词汇3D目标检测(OV-3DDet)的目标是在3D场景中检测任意类别列表中的物体,这一问题在现有文献中鲜有探讨。OV-3DDet 主要存在两个基本问题,即新类别物体的定位与分类。本文旨在在有限基础类别的条件下,通过一个统一框架同时解决这两个问题。为了定位新类别的3D物体,我们提出了一种有效的3D新物体发现策略,它同时利用3D框几何先验和2D语义开放词汇先验来为新物体生成伪框标签。为了对新物体框进行分类,我们进一步基于发现的新框开发了跨模态对齐模块,以对齐3D点云与图像/文本模态之间的特征空间。具体而言,对齐过程包含类别无关和类别判别两种对齐,不仅利用带标注的基础类别物体,还利用不断被发现的新物体,从而实现迭代增强的对齐。新框发现与跨模态对齐被联合学习,相互促进:新物体发现可以直接影响跨模态对齐,而更好的特征对齐又能提升定位能力,从而形成一个统一的OV-3DDet框架,称为CoDA,用于同时进行新类别物体的定位与分类。在SUN-RGBD和ScanNet两个具有挑战性的数据集上的大量实验证明了我们方法的有效性,其mAP相比表现最佳的替代方法提升了80%。代码和预训练模型已在项目页面上发布。
Adaptive Landmark Color for AUV Docking in Visually Dynamic Environments
for: 延长自主水下航行器(AUV)的任务时间,为AUV提供一个重新充电并接收新任务信息的地点。
methods: 使用适应色LED标记和动态色滤波来提高水质不同情况下的灯标可见度。AUV 和 docking station 都使用摄像头确定水背景色以计算需要的标记颜色,无需AUV和 docking station之间的通信。
results: 在池和湖中进行的实验表明,我们的方法比静态颜色阈值方法更好,随着背景颜色的变化。在清水情况下,DS 的探测范围为5米, false positives 很少。Abstract
Autonomous Underwater Vehicles (AUVs) conduct missions underwater without the need for human intervention. A docking station (DS) can extend mission times of an AUV by providing a location for the AUV to recharge its batteries and receive updated mission information. Various methods for locating and tracking a DS exist, but most rely on expensive acoustic sensors, or are vision-based, which is significantly affected by water quality. In this paper, we present a vision-based method that utilizes adaptive color LED markers and dynamic color filtering to maximize landmark visibility in varying water conditions. Both AUV and DS utilize cameras to determine the water background color in order to calculate the desired marker color. No communication between AUV and DS is needed to determine marker color. Experiments conducted in a pool and lake show our method performs 10 times better than static color thresholding methods as background color varies. DS detection is possible at a range of 5 meters in clear water with minimal false positives.
摘要
自主水下航行器(AUV)可以在无需人工干预的情况下在水下执行任务。对接站(DS)可以为AUV提供重新充电和接收更新任务信息的位置,从而延长其任务时间。目前已有多种定位和跟踪对接站的方法,但大多数依赖昂贵的声学传感器,或者基于视觉,而后者受水质影响很大。在这篇论文中,我们提出了一种基于视觉的方法,利用自适应彩色LED标记和动态颜色滤波,在不同水况下最大化地标的可见性。AUV和对接站都利用摄像头估计水体背景颜色,以计算所需的标记颜色,而无需AUV与对接站之间进行通信。在泳池和湖泊中进行的实验表明,当背景颜色变化时,我们的方法比静态颜色阈值方法好10倍。在清水条件下,对接站的探测范围可达5米,且误检极少。
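The central idea of the adaptive landmark method above, choosing an LED color that contrasts with the currently observed water background, can be sketched as follows. The averaged-frame background estimate and the complementary-hue rule are assumptions for illustration only; the authors' exact color-selection algorithm may differ.

```python
import colorsys
import numpy as np

def estimate_background_rgb(image):
    """Average color of the camera frame as a proxy for water color.
    `image` is an HxWx3 array with values in [0, 1]."""
    return image.reshape(-1, 3).mean(axis=0)

def adaptive_marker_rgb(background_rgb):
    """Pick an LED color opposite in hue to the water background
    (a simple complementary-hue rule, assumed for illustration)."""
    h, s, v = colorsys.rgb_to_hsv(*background_rgb)
    h_marker = (h + 0.5) % 1.0                         # 180-degree hue shift
    return colorsys.hsv_to_rgb(h_marker, 1.0, 1.0)     # fully saturated, bright

# Greenish turbid water -> the rule suggests a magenta-ish marker.
frame = np.ones((480, 640, 3)) * np.array([0.2, 0.55, 0.35])
print(adaptive_marker_rgb(estimate_background_rgb(frame)))
```

Because both the AUV and the docking station observe roughly the same water color, they can arrive at the same marker color without communicating, which is the property the abstract emphasizes.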
Graph data modelling for outcome prediction in oropharyngeal cancer patients
results: 研究取得了较好的结果;与GNN和基线线性模型的比较显示,PHGN在时间-事件分析中表现更好。Abstract
Graph neural networks (GNNs) are becoming increasingly popular in the medical domain for the tasks of disease classification and outcome prediction. Since patient data is not readily available as a graph, most existing methods either manually define a patient graph, or learn a latent graph based on pairwise similarities between the patients. There are also hypergraph neural network (HGNN)-based methods that were introduced recently to exploit potential higher order associations between the patients by representing them as a hypergraph. In this work, we propose a patient hypergraph network (PHGN), which has been investigated in an inductive learning setup for binary outcome prediction in oropharyngeal cancer (OPC) patients using computed tomography (CT)-based radiomic features for the first time. Additionally, the proposed model was extended to perform time-to-event analyses, and compared with GNN and baseline linear models.
摘要
图神经网络(GNN)在医疗领域日益流行,被用于疾病分类和结局预测。由于患者数据并非天然以图的形式存在,大多数现有方法要么人工定义患者图,要么基于患者之间的两两相似性学习潜在图。最近还出现了基于超图神经网络(HGNN)的方法,通过将患者表示为超图来挖掘患者之间潜在的高阶关联。在这项工作中,我们提出了患者超图网络(PHGN),并首次在归纳学习设置下,利用基于计算机断层扫描(CT)的影像组学特征,对口咽癌(OPC)患者进行二元结局预测。此外,所提模型还被扩展用于时间-事件分析,并与GNN和基线线性模型进行了比较。
results: 本研究发现,使用高阶神经元可以显著提高神经网络的学习能力;此外,本研究还提出了一种每个神经元仅需额外 $n$ 个参数的简化二次神经网络模型,可以在基准分类数据集上以更少的隐藏层神经元达到更高的准确率。Abstract
Higher order artificial neurons whose outputs are computed by applying an activation function to a higher order multinomial function of the inputs have been considered in the past, but did not gain acceptance due to the extra parameters and computational cost. However, higher order neurons have significantly greater learning capabilities since the decision boundaries of higher order neurons can be complex surfaces instead of just hyperplanes. The boundary of a single quadratic neuron can be a general hyper-quadric surface allowing it to learn many nonlinearly separable datasets. Since quadratic forms can be represented by symmetric matrices, only $\frac{n(n+1)}{2}$ additional parameters are needed instead of $n^2$. A quadratic Logistic regression model is first presented. Solutions to the XOR problem with a single quadratic neuron are considered. The complete vectorized equations for both forward and backward propagation in feedforward networks composed of quadratic neurons are derived. A reduced parameter quadratic neural network model with just $ n $ additional parameters per neuron that provides a compromise between learning ability and computational cost is presented. Comparison on benchmark classification datasets are used to demonstrate that a final layer of quadratic neurons enables networks to achieve higher accuracy with significantly fewer hidden layer neurons. In particular this paper shows that any dataset composed of $C$ bounded clusters can be separated with only a single layer of $C$ quadratic neurons.
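To make the quadratic-neuron idea above concrete, the sketch below implements a single quadratic neuron whose decision boundary is a quadric surface, and shows hand-picked weights that solve XOR, something no single linear neuron can do. The specific weight values are an illustrative assumption, not taken from the paper.

```python
import numpy as np

def quadratic_neuron(x, W, w, b):
    """sigmoid(x^T W x + w^T x + b); W is symmetric, so it carries
    only n(n+1)/2 independent parameters instead of n^2."""
    z = x @ W @ x + w @ x + b
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights (assumed for illustration) that solve XOR
# with a single quadratic neuron.
W = np.array([[0.0, -1.0],
              [-1.0, 0.0]])    # x^T W x = -2 * x1 * x2
w = np.array([1.0, 1.0])
b = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    p = quadratic_neuron(np.array(x, dtype=float), W, w, b)
    print(x, int(p > 0.5))      # prints 0, 1, 1, 0
```

The same construction generalizes: a quadratic logistic regression simply learns W, w, and b by gradient descent on the usual cross-entropy loss.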
Human-centric Behavior Description in Videos: New Benchmark and Model
results: 我们的方法在逐个体描述行为方面达到了最先进的结果,并且可以准确地将个体与其行为相关联。Abstract
In the domain of video surveillance, describing the behavior of each individual within the video is becoming increasingly essential, especially in complex scenarios with multiple individuals present. This is because describing each individual's behavior provides more detailed situational analysis, enabling accurate assessment and response to potential risks, ensuring the safety and harmony of public places. Currently, video-level captioning datasets cannot provide fine-grained descriptions for each individual's specific behavior. However, mere descriptions at the video-level fail to provide an in-depth interpretation of individual behaviors, making it challenging to accurately determine the specific identity of each individual. To address this challenge, we construct a human-centric video surveillance captioning dataset, which provides detailed descriptions of the dynamic behaviors of 7,820 individuals. Specifically, we have labeled several aspects of each person, such as location, clothing, and interactions with other elements in the scene, and these people are distributed across 1,012 videos. Based on this dataset, we can link individuals to their respective behaviors, allowing for further analysis of each person's behavior in surveillance videos. Besides the dataset, we propose a novel video captioning approach that can describe individual behavior in detail on a person-level basis, achieving state-of-the-art results. To facilitate further research in this field, we intend to release our dataset and code.
摘要
在视频监控领域,描述视频中每个个体的行为变得越来越重要,尤其是在多人同时出现的复杂场景中。这是因为对每个个体行为的描述可以提供更细致的态势分析,从而准确评估并应对潜在风险,保障公共场所的安全与和谐。目前,视频级的描述数据集无法为每个个体的具体行为提供细粒度描述;而仅有视频级的描述也无法对个体行为进行深入解读,难以准确确定每个个体的具体身份。为应对这一挑战,我们构建了一个以人为中心的视频监控描述数据集,为7,820名个体的动态行为提供了详细描述。具体来说,我们标注了每个人的若干方面,例如位置、着装以及与场景中其他元素的交互,这些人分布在1,012个视频中。基于该数据集,我们可以将个体与其各自的行为关联起来,从而对监控视频中每个人的行为进行进一步分析。除数据集之外,我们还提出了一种新的视频描述方法,能够在个体层面详细描述行为,并取得了最先进的结果。为促进该领域的进一步研究,我们计划公开数据集和代码。
A Grammatical Compositional Model for Video Action Detection
methods: 本研究提出了一种基于典型与或图(And-Or graph)的语法组合模型(GCM),结合语法模型的组合性与深度神经网络丰富的特征表达能力,以提高行为检测的精度。
results: 实验表明,GCM模型在AVA数据集和Something-Else任务中表现出色,并且可以通过推理解析过程提升可解释性。Abstract
Analysis of human actions in videos demands understanding complex human dynamics, as well as the interaction between actors and context. However, these interaction relationships usually exhibit large intra-class variations from diverse human poses or object manipulations, and fine-grained inter-class differences between similar actions. Thus the performance of existing methods is severely limited. Motivated by the observation that interactive actions can be decomposed into actor dynamics and participating objects or humans, we propose to investigate the composite property of them. In this paper, we present a novel Grammatical Compositional Model (GCM) for action detection based on typical And-Or graphs. Our model exploits the intrinsic structures and latent relationships of actions in a hierarchical manner to harness both the compositionality of grammar models and the capability of expressing rich features of DNNs. The proposed model can be readily embodied into a neural network module for efficient optimization in an end-to-end manner. Extensive experiments are conducted on the AVA dataset and the Something-Else task to demonstrate the superiority of our model, meanwhile the interpretability is enhanced through an inference parsing procedure.
摘要
分析视频中的人类行为需要理解复杂的人体动态,以及行为者与上下文之间的交互。然而,这些交互关系通常由于人体姿态或物体操作的多样性而表现出很大的类内差异,相似动作之间又存在细微的类间差异,因而现有方法的性能受到严重限制。受"交互式动作可以分解为行为者动态与参与其中的物体或人"这一观察的启发,我们提出研究二者的组合特性。在本文中,我们提出了一种基于典型与或图的新型语法组合模型(GCM)用于动作检测。我们的模型以层次化的方式利用动作的内在结构和潜在关系,从而兼得语法模型的组合性和深度神经网络表达丰富特征的能力。所提模型可以方便地嵌入到神经网络模块中,以端到端的方式高效优化。我们在AVA数据集和Something-Else任务上进行了大量实验,证明了模型的优越性,同时通过推理解析过程增强了可解释性。
Multi-Resolution Fusion for Fully Automatic Cephalometric Landmark Detection
results: 在2023年首届 Cephalometric Landmark Detection in Lateral X-ray Images 挑战中,实现了1.62 mm的平均径向误差(MRE)和74.18%的2.0 mm成功检测率(SDR)。Abstract
Cephalometric landmark detection on lateral skull X-ray images plays a crucial role in the diagnosis of certain dental diseases. Accurate and effective identification of these landmarks presents a significant challenge. Based on extensive data observations and quantitative analyses, we discovered that visual features from different receptive fields affect the detection accuracy of various landmarks differently. As a result, we employed an image pyramid structure, integrating multiple resolutions as input to train a series of models with different receptive fields, aiming to achieve the optimal feature combination for each landmark. Moreover, we applied several data augmentation techniques during training to enhance the model's robustness across various devices and measurement alternatives. We implemented this method in the Cephalometric Landmark Detection in Lateral X-ray Images 2023 Challenge and achieved a Mean Radial Error (MRE) of 1.62 mm and a Success Detection Rate (SDR) 2.0mm of 74.18% in the final testing phase.
摘要
侧位头颅X光片上的头影测量地标检测在某些牙科疾病的诊断中起着关键作用,而准确、有效地识别这些地标是一个重大挑战。基于大量的数据观察和定量分析,我们发现来自不同感受野的视觉特征对不同地标的检测精度影响不同。因此,我们采用图像金字塔结构,将多种分辨率作为输入,训练一系列具有不同感受野的模型,以期为每个地标获得最优的特征组合。此外,我们在训练过程中使用了多种数据增强技术,以提升模型在不同设备和测量方式下的鲁棒性。我们在 Cephalometric Landmark Detection in Lateral X-ray Images 2023 挑战中应用了该方法,在最终测试阶段取得了1.62 mm的平均径向误差(MRE)和74.18%的2.0 mm成功检测率(SDR)。
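A minimal sketch of the multi-resolution input idea described above: the same radiograph is repeatedly downsampled into an image pyramid, and each level would feed a model with a different effective receptive field. The 2x average-pooling downsampling and the number of levels are assumptions for illustration, not the authors' exact configuration.

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling; assumes even height and width."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(img, levels=3):
    """Return [full-res, 1/2, 1/4, ...]; each level would be fed to a
    separate landmark model with a different effective receptive field."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2x(pyramid[-1]))
    return pyramid

xray = np.random.rand(512, 512)            # stand-in for a lateral cephalogram
for level, im in enumerate(build_pyramid(xray)):
    print(level, im.shape)                  # (512,512), (256,256), (128,128)
```

Per-landmark predictions from the different levels would then be combined, so that each landmark benefits from the resolution at which it is most reliably detected.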
MagicRemover: Tuning-free Text-guided Image Inpainting with Diffusion Models
results: 与现状方法相比,MagicRemover显示出了显著的提升,特别是在高质量图像填充方面。经过了量化评估和用户调查,MagicRemover的性能在图像填充任务中具有显著优势。代码将在https://github.com/exisas/Magicremover上发布。Abstract
Image inpainting aims to fill in the missing pixels with visually coherent and semantically plausible content. Despite the great progress brought by deep generative models, this task still suffers from i. the difficulties in large-scale realistic data collection and costly model training; and ii. the intrinsic limitations in the traditionally user-defined binary masks on objects with unclear boundaries or transparent texture. In this paper, we propose MagicRemover, a tuning-free method that leverages the powerful diffusion models for text-guided image inpainting. We introduce an attention guidance strategy to constrain the sampling process of diffusion models, enabling the erasing of instructed areas and the restoration of occluded content. We further propose a classifier optimization algorithm to facilitate the denoising stability within fewer sampling steps. Extensive comparisons are conducted among our MagicRemover and state-of-the-art methods including quantitative evaluation and user study, demonstrating the significant improvement of MagicRemover on high-quality image inpainting. We will release our code at https://github.com/exisas/Magicremover.
摘要
图像修复(inpainting)的目标是用视觉上连贯、语义上合理的内容填充缺失的像素。尽管深度生成模型带来了很大进步,这项任务仍然受到以下限制:其一,大规模真实数据的收集和模型训练代价高昂;其二,对于边界不清或纹理透明的物体,传统的用户定义二值掩码存在固有局限。在这篇论文中,我们提出 MagicRemover,一种无需微调的方法,利用强大的扩散模型进行文本引导的图像修复。我们提出了一种注意力引导策略来约束扩散模型的采样过程,使其能够擦除指定区域并恢复被遮挡的内容。我们还提出了一种分类器优化算法,以便在更少的采样步数内提升去噪的稳定性。我们将 MagicRemover 与最先进方法进行了广泛比较,包括定量评估和用户研究,结果表明 MagicRemover 在高质量图像修复方面有显著提升。我们将在 https://github.com/exisas/Magicremover 发布代码。
Delving into CLIP latent space for Video Anomaly Recognition
methods: 提出了一种新的方法AnomalyCLIP,利用Large Language and Vision(LLV)模型CLIP,并结合多个实例学习来实现视频异常检测和分类。
results: 在三个主要的异常检测基准数据集(ShanghaiTech、UCF-Crime和XD-Violence)上的比较表明,AnomalyCLIP在识别视频异常方面表现出色,优于各基线方法。Abstract
We tackle the complex problem of detecting and recognising anomalies in surveillance videos at the frame level, utilising only video-level supervision. We introduce the novel method AnomalyCLIP, the first to combine Large Language and Vision (LLV) models, such as CLIP, with multiple instance learning for joint video anomaly detection and classification. Our approach specifically involves manipulating the latent CLIP feature space to identify the normal event subspace, which in turn allows us to effectively learn text-driven directions for abnormal events. When anomalous frames are projected onto these directions, they exhibit a large feature magnitude if they belong to a particular class. We also introduce a computationally efficient Transformer architecture to model short- and long-term temporal dependencies between frames, ultimately producing the final anomaly score and class prediction probabilities. We compare AnomalyCLIP against state-of-the-art methods considering three major anomaly detection benchmarks, i.e. ShanghaiTech, UCF-Crime, and XD-Violence, and empirically show that it outperforms baselines in recognising video anomalies.
摘要
我们研究在仅有视频级监督的条件下,在帧级别检测并识别监控视频中异常这一复杂问题。我们提出了一种新方法 AnomalyCLIP,首次将大型语言-视觉(LLV)模型(如CLIP)与多示例学习相结合,用于联合的视频异常检测与分类。我们的方法特点在于操纵CLIP的潜在特征空间以识别正常事件子空间,从而能够有效地学习由文本驱动的异常事件方向。当异常帧被投影到这些方向上时,若其属于某一特定类别,则会表现出较大的特征幅值。我们还引入了一种计算高效的Transformer架构,用于建模帧之间的短期与长期时间依赖,最终输出异常分数和类别预测概率。我们在三个主要的异常检测基准(ShanghaiTech、UCF-Crime和XD-Violence)上将AnomalyCLIP与最先进方法进行比较,实验表明它在识别视频异常方面优于各基线方法。
All Sizes Matter: Improving Volumetric Brain Segmentation on Small Lesions
paper_authors: Ayhan Can Erdur, Daniel Scholz, Josef A. Buchner, Stephanie E. Combs, Daniel Rueckert, Jan C. Peeken
for: 本研究旨在提高脑转移瘤立体定向放射外科治疗中病灶的检测与分割精度,以便更好地定位脑内的多发转移瘤。
methods: 该方法使用多个神经网络模型的集成,包括blob损失函数、T1与T1增强序列之间的减影序列,以及专门针对小病灶训练的模型。
results: 实验表明,使用blob损失函数和减影序列可以提升分割结果,但在集成中加入专门针对小病灶的模型反而会使分割结果下降。此外,基于领域知识的后处理步骤可以显著提升结果的精度。Abstract
Brain metastases (BMs) are the most frequently occurring brain tumors. The treatment of patients having multiple BMs with stereotactic radiosurgery necessitates accurate localization of the metastases. Neural networks can assist in this time-consuming and costly task that is typically performed by human experts. Particularly challenging is the detection of small lesions since they are often underrepresented in existing approaches. Yet, lesion detection is equally important for all sizes. In this work, we develop an ensemble of neural networks explicitly focused on detecting and segmenting small BMs. To accomplish this task, we trained several neural networks focusing on individual aspects of the BM segmentation problem: We use blob loss that specifically addresses the imbalance of lesion instances in terms of size and texture and is, therefore, not biased towards larger lesions. In addition, a model using a subtraction sequence between the T1 and T1 contrast-enhanced sequence focuses on low-contrast lesions. Furthermore, we train additional models only on small lesions. Our experiments demonstrate the utility of the additional blob loss and the subtraction sequence. However, including the specialized small lesion models in the ensemble deteriorates segmentation results. We also find domain-knowledge-inspired postprocessing steps to drastically increase our performance in most experiments. Our approach enables us to submit a competitive challenge entry to the ASNR-MICCAI BraTS Brain Metastasis Challenge 2023.
摘要
脑转移瘤(BM)是最常见的脑部肿瘤。对多发脑转移瘤患者进行立体定向放射外科治疗需要对转移灶进行精确定位。神经网络可以辅助这项通常由人类专家完成、耗时且昂贵的工作。其中,小病灶的检测尤其具有挑战性,因为它们在现有方法中往往代表性不足;然而,对所有大小的病灶进行检测都同样重要。在这项工作中,我们开发了一个专注于检测和分割小脑转移瘤的神经网络集成。为此,我们训练了多个分别针对BM分割问题不同方面的神经网络:我们使用blob损失,它专门针对病灶实例在大小和纹理上的不均衡问题,因此不会偏向较大的病灶;此外,一个使用T1与T1增强序列之间减影序列的模型专注于低对比度病灶;我们还另外训练了仅使用小病灶的模型。实验表明了额外的blob损失和减影序列的作用;然而,将专门针对小病灶的模型纳入集成反而会使分割结果变差。我们还发现,受领域知识启发的后处理步骤可以在大多数实验中大幅提升性能。我们的方法使我们得以向 ASNR-MICCAI BraTS 脑转移瘤挑战赛 2023 提交具有竞争力的参赛方案。
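The subtraction sequence mentioned above is simple to compute once the two MR sequences are co-registered: the native T1 volume is subtracted from the contrast-enhanced T1 volume so that enhancing, low-contrast lesions stand out. The z-score normalization below is an assumption for illustration; the ensemble's actual preprocessing may differ.

```python
import numpy as np

def zscore(volume):
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def subtraction_sequence(t1, t1ce):
    """Voxel-wise difference between contrast-enhanced and native T1.
    Assumes both volumes are already co-registered and resampled."""
    return zscore(t1ce) - zscore(t1)

# Toy volumes: an 'enhancing lesion' only visible after contrast.
t1 = np.random.normal(0.0, 1.0, (64, 64, 64))
t1ce = t1.copy()
t1ce[30:34, 30:34, 30:34] += 3.0            # small enhancing region
diff = subtraction_sequence(t1, t1ce)
print(diff[32, 32, 32] > diff.mean())       # True: lesion highlighted
```

Feeding this difference volume (alone or as an extra channel) to a dedicated model is one way to focus it on low-contrast lesions, as described in the abstract.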
CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity
results: 在公开的3D检测基准DAIR-V2X-I和Rope3D以及私有的Supremind-Road数据集上进行的大量实验表明,CoBEV不仅达到了新的最先进精度,还显著提升了先前方法在远距离场景和摄像头扰动下的鲁棒性,并在场景和摄像头参数剧烈变化的异源设置中大幅提升了泛化能力。此外,CoBEV在DAIR-V2X-I上的车辆AP首次达到80%(简单模式)。Abstract
Roadside camera-driven 3D object detection is a crucial task in intelligent transportation systems, which extends the perception range beyond the limitations of vision-centric vehicles and enhances road safety. While previous studies have limitations in using only depth or height information, we find both depth and height matter and they are in fact complementary. The depth feature encompasses precise geometric cues, whereas the height feature is primarily focused on distinguishing between various categories of height intervals, essentially providing semantic context. This insight motivates the development of Complementary-BEV (CoBEV), a novel end-to-end monocular 3D object detection framework that integrates depth and height to construct robust BEV representations. In essence, CoBEV estimates each pixel's depth and height distribution and lifts the camera features into 3D space for lateral fusion using the newly proposed two-stage complementary feature selection (CFS) module. A BEV feature distillation framework is also seamlessly integrated to further enhance the detection accuracy from the prior knowledge of the fusion-modal CoBEV teacher. We conduct extensive experiments on the public 3D detection benchmarks of roadside camera-based DAIR-V2X-I and Rope3D, as well as the private Supremind-Road dataset, demonstrating that CoBEV not only achieves the accuracy of the new state-of-the-art, but also significantly advances the robustness of previous methods in challenging long-distance scenarios and noisy camera disturbance, and enhances generalization by a large margin in heterologous settings with drastic changes in scene and camera parameters. For the first time, the vehicle AP score of a camera model reaches 80% on DAIR-V2X-I in terms of easy mode. The source code will be made publicly available at https://github.com/MasterHow/CoBEV.
摘要
路侧摄像头驱动的3D目标检测是智能交通系统中的一项关键任务,它将感知范围扩展到以车辆为中心的视觉感知之外,并提升道路安全。先前的研究仅使用深度或高度信息,而我们发现深度和高度都很重要,且二者实际上是互补的:深度特征包含精确的几何线索,而高度特征主要用于区分不同的高度区间类别,本质上提供语义上下文。这一见解促使我们提出 Complementary-BEV(CoBEV),一种新的端到端单目3D目标检测框架,它融合深度和高度来构建鲁棒的BEV表示。具体而言,CoBEV估计每个像素的深度和高度分布,并利用新提出的两阶段互补特征选择(CFS)模块将摄像头特征提升到3D空间进行横向融合。我们还无缝集成了BEV特征蒸馏框架,借助融合模态CoBEV教师的先验知识进一步提升检测精度。我们在公开的3D检测基准DAIR-V2X-I和Rope3D以及私有的Supremind-Road数据集上进行了大量实验,结果表明CoBEV不仅达到了新的最先进精度,还显著提升了先前方法在远距离场景和摄像头扰动下的鲁棒性,并在场景和摄像头参数剧烈变化的异源设置中大幅提升了泛化能力。CoBEV首次使摄像头模型在DAIR-V2X-I的简单模式下车辆AP达到80%。代码将公开于 https://github.com/MasterHow/CoBEV 。
paper_authors: Chengkang Shen, Hao Zhu, You Zhou, Yu Liu, Si Yi, Lili Dong, Weipeng Zhao, David J. Brady, Xun Cao, Zhan Ma, Yi Lin
For: This paper aims to improve the accuracy and efficiency of myocardial motion tracking in cardiac imaging, with the goal of early detection and prevention of Cardiovascular Diseases (CVDs).* Methods: The Neural Cardiac Motion Field (NeuralCMF) method uses implicit neural representation (INR) to model the 3D structure and comprehensive 6D forward/backward motion of the heart, without the need for paired datasets; its optimization is self-supervised through physics-based priors.* Results: Experimental validations across three representative datasets demonstrate the robustness and innovative nature of the NeuralCMF, with significant advantages over existing state-of-the-art methods in cardiac imaging and motion tracking.
results: 在三个代表性数据集上的实验验证表明NeuralCMF具有鲁棒性和创新性,相比现有最先进的心脏成像与运动跟踪方法具有显著优势。Abstract
Myocardial motion tracking stands as an essential clinical tool in the prevention and detection of Cardiovascular Diseases (CVDs), the foremost cause of death globally. However, current techniques suffer incomplete and inaccurate motion estimation of the myocardium both in spatial and temporal dimensions, hindering the early identification of myocardial dysfunction. In addressing these challenges, this paper introduces the Neural Cardiac Motion Field (NeuralCMF). NeuralCMF leverages the implicit neural representation (INR) to model the 3D structure and the comprehensive 6D forward/backward motion of the heart. This approach offers memory-efficient storage and continuous capability to query the precise shape and motion of the myocardium throughout the cardiac cycle at any specific point. Notably, NeuralCMF operates without the need for paired datasets, and its optimization is self-supervised through the physics knowledge priors both in space and time dimensions, ensuring compatibility with both 2D and 3D echocardiogram video inputs. Experimental validations across three representative datasets support the robustness and innovative nature of the NeuralCMF, marking significant advantages over existing state-of-the-arts in cardiac imaging and motion tracking.
LROC-PANGU-GAN: Closing the Simulation Gap in Learning Crater Segmentation with Planetary Simulators
paper_authors: Jaewon La, Jaime Phadke, Matt Hutton, Marius Schwinning, Gabriele De Canio, Florian Renk, Lars Kunze, Matthew Gadd
for: 这项研究旨在帮助着陆于外星天体表面的探测器预测并规避危险,例如陡崖或深坑可能会对探测器的着陆和运行造成严重风险。
methods: 这项研究训练了一个CycleGAN模型,用于从PANGU模拟图像合成LROC风格的图像,并利用这些图像训练下游的陨石坑分割网络。
results: 研究结果表明,该方法能够改进陨石坑分割网络的训练:与仅使用模拟PANGU图像相比,在真实LROC图像测试集上取得了更好的分割性能。Abstract
It is critical for probes landing on foreign planetary bodies to be able to robustly identify and avoid hazards - as, for example, steep cliffs or deep craters can pose significant risks to a probe's landing and operational success. Recent applications of deep learning to this problem show promising results. These models are, however, often learned with explicit supervision over annotated datasets. These human-labelled crater databases, such as from the Lunar Reconnaissance Orbiter Camera (LROC), may lack in consistency and quality, undermining model performance - as incomplete and/or inaccurate labels introduce noise into the supervisory signal, which encourages the model to learn incorrect associations and results in the model making unreliable predictions. Physics-based simulators, such as the Planet and Asteroid Natural Scene Generation Utility, have, in contrast, perfect ground truth, as the internal state that they use to render scenes is known with exactness. However, they introduce a serious simulation-to-real domain gap - because of fundamental differences between the simulated environment and the real-world arising from modelling assumptions, unaccounted for physical interactions, environmental variability, etc. Therefore, models trained on their outputs suffer when deployed in the face of realism they have not encountered in their training data distributions. In this paper, we therefore introduce a system to close this "realism" gap while retaining label fidelity. We train a CycleGAN model to synthesise LROC from Planet and Asteroid Natural Scene Generation Utility (PANGU) images. We show that these improve the training of a downstream crater segmentation network, with segmentation performance on a test set of real LROC images improved as compared to using only simulated PANGU images.
摘要
对于着陆在外星天体上的探测器来说,能够可靠地识别并规避危险至关重要:例如,陡崖或深坑可能会对探测器的着陆和任务运行造成重大风险。近来将深度学习应用于该问题已显示出可喜的结果。然而,这些模型通常是在带标注的数据集上通过显式监督学习得到的;这些由人工标注的陨石坑数据库(例如来自Lunar Reconnaissance Orbiter Camera (LROC)的数据)可能缺乏一致性和质量,从而损害模型性能:不完整和/或不准确的标签会给监督信号引入噪声,促使模型学习错误的关联,导致预测不可靠。相比之下,基于物理的模拟器(例如Planet and Asteroid Natural Scene Generation Utility, PANGU)拥有完美的真值,因为其用于渲染场景的内部状态是精确已知的。然而,由于建模假设、未纳入的物理交互、环境可变性等因素造成的模拟环境与真实世界之间的根本差异,它们引入了严重的"模拟到真实"领域差距,因此在其输出上训练的模型在面对训练分布中未曾出现的真实情况时表现不佳。在本文中,我们提出了一个在保持标签保真度的同时缩小这种"真实感"差距的系统:我们训练一个CycleGAN模型,从PANGU图像合成LROC风格的图像。我们证明这些图像改进了下游陨石坑分割网络的训练,与仅使用模拟PANGU图像相比,在真实LROC图像测试集上的分割性能得到提升。
Dynamic Shuffle: An Efficient Channel Mixture Method
results: 实验结果表明,提出的方法可以减少与数据相关的冗余,并在图像分类基准上显著提升 ShuffleNet 的性能;静态-动态混洗还可以作为普通逐点卷积的轻量级替代。Abstract
The redundancy of convolutional neural networks depends not only on weights but also on inputs. Shuffling is an efficient operation for mixing channel information, but the shuffle order is usually pre-defined. To reduce the data-dependent redundancy, we devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling. Since the dimension of the permutation matrix is proportional to the square of the number of input channels, to make the generation process efficient, we divide the channels into groups and generate two shared small permutation matrices for each group, and utilize the Kronecker product and cross-group shuffle to obtain the final permutation matrices. To make the generation process learnable, based on theoretical analysis, softmax, orthogonal regularization, and binarization are employed to asymptotically approximate the permutation matrix. Dynamic shuffle adaptively mixes channel information with negligible extra computation and memory occupancy. Experimental results on the image classification benchmark datasets CIFAR-10, CIFAR-100, Tiny ImageNet and ImageNet have shown that our method significantly increases ShuffleNets' performance. Combining the dynamically generated matrix with a learnable static matrix, we further propose static-dynamic shuffle and show that it can serve as a lightweight replacement for ordinary pointwise convolution.
摘要
卷积神经网络中的冗余不仅取决于权重,还取决于输入。混洗(shuffle)是一种混合通道信息的高效操作,但混洗顺序通常是预先定义的。为了减少与数据相关的冗余,我们设计了一个动态混洗模块,用于生成与数据相关的置换矩阵来进行混洗。由于置换矩阵的维度与输入通道数的平方成正比,为了使生成过程高效,我们将通道划分为若干组,为每组生成两个共享的小置换矩阵,并利用克罗内克积(Kronecker product)和跨组混洗得到最终的置换矩阵。为了使生成过程可学习,基于理论分析,我们采用softmax、正交正则化和二值化来渐近地逼近置换矩阵。动态混洗能够以可忽略的额外计算和内存开销自适应地混合通道信息。在CIFAR-10、CIFAR-100、Tiny ImageNet和ImageNet等图像分类基准数据集上的实验结果表明,我们的方法显著提升了ShuffleNet的性能。将动态生成的矩阵与可学习的静态矩阵相结合,我们进一步提出了静态-动态混洗,并证明它可以作为普通逐点卷积的轻量级替代。
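The construction described above, building a large channel permutation from two small per-group permutation matrices via the Kronecker product, can be sketched in a few lines. Here the small permutations are drawn at random rather than predicted from the input, and the group sizes are illustrative assumptions.

```python
import numpy as np

def perm_matrix(order):
    """Permutation matrix corresponding to an index order."""
    n = len(order)
    P = np.zeros((n, n))
    P[np.arange(n), order] = 1.0
    return P

g1, g2 = 4, 8                        # hypothetical factor sizes, channels = 32
P1 = perm_matrix(np.random.permutation(g1))
P2 = perm_matrix(np.random.permutation(g2))
P = np.kron(P1, P2)                  # (32, 32) permutation built from 4x4 and 8x8

x = np.random.randn(32)              # per-pixel channel vector
x_shuffled = P @ x                   # channel mixing (random here, data-driven in the paper)

# Sanity checks: the Kronecker product of permutations is itself a permutation.
assert np.allclose(P @ P.T, np.eye(32))
assert np.allclose(P.sum(axis=0), 1.0) and np.allclose(P.sum(axis=1), 1.0)
```

Because only the two small factors need to be produced, the number of values the module must generate grows with g1^2 + g2^2 rather than with (g1*g2)^2, which is the efficiency argument made in the abstract.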
SHOT: Suppressing the Hessian along the Optimization Trajectory for Gradient-Based Meta-Learning
results: 这篇论文通过实验证实了上述假设,表明SHOT可以抑制GBML内循环优化轨迹上的Hessian,并在标准的少样本学习任务上优于对应的基线方法。Abstract
In this paper, we hypothesize that gradient-based meta-learning (GBML) implicitly suppresses the Hessian along the optimization trajectory in the inner loop. Based on this hypothesis, we introduce an algorithm called SHOT (Suppressing the Hessian along the Optimization Trajectory) that minimizes the distance between the parameters of the target and reference models to suppress the Hessian in the inner loop. Despite dealing with high-order terms, SHOT does not increase the computational complexity of the baseline model much. It is agnostic to both the algorithm and architecture used in GBML, making it highly versatile and applicable to any GBML baseline. To validate the effectiveness of SHOT, we conduct empirical tests on standard few-shot learning tasks and qualitatively analyze its dynamics. We confirm our hypothesis empirically and demonstrate that SHOT outperforms the corresponding baseline. Code is available at: https://github.com/JunHoo-Lee/SHOT
摘要
在这篇论文中,我们假设基于梯度的元学习(GBML)会在内循环中沿优化轨迹隐式地抑制Hessian。基于这一假设,我们提出了一种名为SHOT(Suppressing the Hessian along the Optimization Trajectory)的算法,它通过最小化目标模型与参考模型参数之间的距离来抑制内循环中的Hessian。尽管涉及高阶项,SHOT并不会使基线模型的计算复杂度增加很多。它对GBML所使用的算法和网络结构均不敏感,因此具有很强的通用性,可应用于任何GBML基线。为验证SHOT的有效性,我们在标准的少样本学习任务上进行了实验,并对其动态特性进行了定性分析。我们在实验中证实了所提出的假设,并证明SHOT优于相应的基线方法。代码可以在以下链接找到:https://github.com/JunHoo-Lee/SHOT
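The quantity SHOT minimizes, the distance between the adapted (target) parameters and a reference model's parameters, can be sketched as an extra penalty on the inner-loop objective. The squared-L2 distance and the penalty weight `lam` below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def param_distance(target_params, reference_params):
    """Squared L2 distance between two flattened parameter sets."""
    return sum(np.sum((t - r) ** 2)
               for t, r in zip(target_params, reference_params))

def inner_loop_objective(task_loss, target_params, reference_params, lam=0.1):
    """Task loss plus a penalty pulling the adapted (target) parameters
    toward the reference model; `lam` is a hypothetical weight."""
    return task_loss + lam * param_distance(target_params, reference_params)

# Toy usage with two small 'networks' represented as lists of arrays.
ref = [np.zeros((3, 3)), np.zeros(3)]
tgt = [np.full((3, 3), 0.2), np.full(3, -0.1)]
print(inner_loop_objective(task_loss=1.5, target_params=tgt, reference_params=ref))
```

Adding such a term only costs one extra pass over the parameters per inner step, which is consistent with the abstract's claim that the computational overhead stays small.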
Condition numbers in multiview geometry, instability in relative pose estimation, and RANSAC
results: 本研究发现,在某些世界场景下,即使不存在离群值且有足够的数据支持某一假设,5点和7点最小问题也会出现无穷大的条件数。此外,本研究还发现,RANSAC不仅可以剔除离群值,还可以选择条件良好的图像数据。Abstract
In this paper we introduce a general framework for analyzing the numerical conditioning of minimal problems in multiple view geometry, using tools from computational algebra and Riemannian geometry. Special motivation comes from the fact that relative pose estimation, based on standard 5-point or 7-point Random Sample Consensus (RANSAC) algorithms, can fail even when no outliers are present and there is enough data to support a hypothesis. We argue that these cases arise due to the intrinsic instability of the 5- and 7-point minimal problems. We apply our framework to characterize the instabilities, both in terms of the world scenes that lead to infinite condition number, and directly in terms of ill-conditioned image data. The approach produces computational tests for assessing the condition number before solving the minimal problem. Lastly synthetic and real data experiments suggest that RANSAC serves not only to remove outliers, but also to select for well-conditioned image data, as predicted by our theory.
摘要
在本文中,我们提出了一个通用框架,利用计算代数和黎曼几何的工具来分析多视几何中最小问题的数值条件性。一个特别的动机在于:基于标准5点或7点随机抽样一致(RANSAC)算法的相对位姿估计,即使在不存在离群值且数据足以支持某一假设的情况下也可能失败。我们认为这些情况源于5点和7点最小问题固有的不稳定性。我们应用该框架来刻画这些不稳定性,既从导致条件数为无穷大的世界场景的角度,也直接从病态图像数据的角度。该方法给出了在求解最小问题之前评估条件数的计算检验。最后,合成数据和真实数据的实验表明,RANSAC不仅能够剔除离群值,还能够筛选出条件良好的图像数据,这与我们的理论预测一致。
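For intuition on why an unbounded condition number makes a minimal problem unstable, the sketch below computes the classical condition number of a linear map and shows that it bounds how much small input noise can be amplified in the solution. This is a generic numerical-analysis illustration, not the paper's Riemannian-geometric construction for the 5- and 7-point problems.

```python
import numpy as np

rng = np.random.default_rng(0)

# A nearly rank-deficient matrix: its smallest singular value is tiny,
# so the condition number sigma_max / sigma_min is huge.
A = np.array([[1.0, 0.0],
              [0.0, 1e-8]])
cond = np.linalg.cond(A)                     # ~1e8

x = rng.normal(size=2)
b = A @ x
b_noisy = b + 1e-6 * rng.normal(size=2)      # tiny measurement noise
x_rec = np.linalg.solve(A, b_noisy)

rel_in = np.linalg.norm(b_noisy - b) / np.linalg.norm(b)
rel_out = np.linalg.norm(x_rec - x) / np.linalg.norm(x)
print(cond, rel_out / rel_in)                # error blow-up is bounded by cond
```

An ill-conditioned minimal problem behaves analogously: pixel-level noise in the image correspondences can be amplified into arbitrarily large errors in the estimated relative pose, even with no outliers present.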
Understanding Pan-Sharpening via Generalized Inverse
results: 该论文通过合成实验和真实数据实验表明,所提方法在定性上比其他方法更好、更清晰;在真实数据实验中,下采样增强的效果在定量和定性上都很显著。Abstract
Pan-sharpening algorithms utilize a panchromatic image and a multispectral image to obtain a high spatial and high spectral resolution image. However, the optimizations of existing algorithms are designed with different standards. We adopt a simple matrix equation to describe the pan-sharpening problem. The solution existence condition and the acquisition of spectral and spatial resolution are discussed. A down-sampling enhancement method is introduced for better acquiring the spatial and spectral down-sampling matrices. By generalized inverse theory, we derive two forms of generalized inverse matrix formulations that correspond to the two prominent classes of pan-sharpening methods, that is, component substitution and multi-resolution analysis methods. Specifically, Gram Schmidt Adaptive (GSA) is proved to follow the generalized inverse matrix formulation of component substitution. A model prior for the generalized inverse matrix of the spectral function is presented. The theoretical errors are analyzed. Synthetic experiments and real-data experiments are implemented. The proposed methods are qualitatively better and sharper than other methods in both synthetic and real experiments, and the down-sampling enhancement yields better results both quantitatively and qualitatively in real experiments. The generalized inverse matrix theory helps us better understand pan-sharpening.
摘要
全色锐化(pan-sharpening)算法利用全色图像和多光谱图像获得同时具有高空间分辨率和高光谱分辨率的图像。然而,现有算法的优化目标采用不同的标准。我们采用简单的矩阵方程来描述全色锐化问题,并讨论了解的存在条件以及光谱和空间分辨率的获得。我们引入了一种下采样增强方法,以便更好地获得空间和光谱下采样矩阵。利用广义逆理论,我们推导出两种广义逆矩阵形式,分别对应两大类主流的全色锐化方法,即成分替换法和多分辨率分析法。特别地,我们证明了Gram Schmidt Adaptive(GSA)遵循成分替换的广义逆矩阵形式,并给出了关于光谱函数广义逆矩阵的模型先验。我们分析了理论误差,并开展了合成实验和真实数据实验。所提方法在合成和真实实验中在定性上都优于其他方法、结果更清晰;在真实实验中,下采样增强在定量和定性上都带来更好的结果。广义逆矩阵理论帮助我们更好地理解全色锐化。
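To make the matrix-equation viewpoint above concrete, the toy sketch below stacks a spectral degradation (per-pixel band averaging to a panchromatic image) and a spatial degradation (pixel averaging to a low-resolution multispectral image) into one linear system and reconstructs the high-resolution multispectral image with the Moore-Penrose generalized inverse. The specific operators and the stacked least-squares formulation are illustrative assumptions, not the paper's derived formulations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy sizes: 4 pixels x 3 bands ground-truth image, flattened to length 12
# in pixel-major order (bands contiguous per pixel).
pixels, bands = 4, 3
x_true = rng.random(pixels * bands)

# Spectral degradation R: per-pixel band average -> panchromatic (4 values).
R = np.kron(np.eye(pixels), np.full((1, bands), 1.0 / bands))
# Spatial degradation D: average pairs of pixels -> low-res MS (2*3 values).
D = np.kron(np.array([[0.5, 0.5, 0.0, 0.0],
                      [0.0, 0.0, 0.5, 0.5]]), np.eye(bands))

pan = R @ x_true                  # observed panchromatic image
ms = D @ x_true                   # observed low-resolution multispectral image

# Stack both observation models and apply the Moore-Penrose inverse
# (minimum-norm least-squares reconstruction).
A = np.vstack([R, D])
x_hat = np.linalg.pinv(A) @ np.concatenate([pan, ms])

print(np.linalg.norm(A @ x_hat - np.concatenate([pan, ms])))  # ~0: consistent
```

The reconstruction exactly reproduces both observations; how the remaining null-space ambiguity is resolved is where the component-substitution and multi-resolution-analysis formulations discussed in the paper differ.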
GETAvatar: Generative Textured Meshes for Animatable Human Avatars
results: 实验表明,GETAvatar在3D感知的人体生成中,在外观和几何质量上均达到了最先进的表现,并且可以高效地生成512x512和1024x1024分辨率的图像。Abstract
We study the problem of 3D-aware full-body human generation, aiming at creating animatable human avatars with high-quality textures and geometries. Generally, two challenges remain in this field: i) existing methods struggle to generate geometries with rich realistic details such as the wrinkles of garments; ii) they typically utilize volumetric radiance fields and neural renderers in the synthesis process, making high-resolution rendering non-trivial. To overcome these problems, we propose GETAvatar, a Generative model that directly generates Explicit Textured 3D meshes for animatable human Avatar, with photo-realistic appearance and fine geometric details. Specifically, we first design an articulated 3D human representation with explicit surface modeling, and enrich the generated humans with realistic surface details by learning from the 2D normal maps of 3D scan data. Second, with the explicit mesh representation, we can use a rasterization-based renderer to perform surface rendering, allowing us to achieve high-resolution image generation efficiently. Extensive experiments demonstrate that GETAvatar achieves state-of-the-art performance on 3D-aware human generation both in appearance and geometry quality. Notably, GETAvatar can generate images at 512x512 resolution with 17FPS and 1024x1024 resolution with 14FPS, improving upon previous methods by 2x. Our code and models will be available.
摘要
我们研究3D感知的全身人体生成问题,目标是创建具有高质量纹理和几何的可动画人体化身。该领域目前通常存在两个挑战:一是现有方法难以生成诸如衣物褶皱等丰富且逼真的几何细节;二是它们通常在合成过程中使用体辐射场和神经渲染器,使得高分辨率渲染并不容易。为了解决这些问题,我们提出了GETAvatar,一种直接生成显式带纹理3D网格的生成模型,用于可动画人体化身,具有逼真的外观和精细的几何细节。具体来说,我们首先设计了带显式表面建模的可关节3D人体表示,并通过学习3D扫描数据的2D法线贴图为生成的人体增添逼真的表面细节。其次,借助显式网格表示,我们可以使用基于光栅化的渲染器进行表面渲染,从而高效地实现高分辨率图像生成。大量实验表明,GETAvatar在3D感知人体生成的外观和几何质量上均达到了最先进的表现。值得注意的是,GETAvatar能以17FPS生成512x512分辨率的图像、以14FPS生成1024x1024分辨率的图像,比此前的方法快2倍。我们的代码和模型将会公开。
Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Classification
results: 研究结果显示,MFFormer在精神分裂症和双相情感障碍的诊断上优于仅使用单一模态或多模态MRI的方法。Abstract
Deep learning approaches, together with neuroimaging techniques, play an important role in psychiatric disorder classification. Previous studies on psychiatric disorder diagnosis mainly focus on using functional connectivity matrices of resting-state functional magnetic resonance imaging (rs-fMRI) as input, which still fails to fully utilize the rich temporal information of the rs-fMRI time series. In this work, we propose a multi-dimension-embedding-aware modality fusion transformer (MFFormer) for schizophrenia and bipolar disorder classification using rs-fMRI and T1-weighted structural MRI (T1w sMRI). Concretely, to fully utilize the temporal information of rs-fMRI and the spatial information of sMRI, we construct a deep learning architecture that takes as input 2D time series of rs-fMRI and 3D T1w volumes. Furthermore, to promote intra-modality attention and information fusion across different modalities, a fusion transformer module (FTM) is designed through extensive self-attention of hybrid multi-modality feature maps. In addition, a dimension-up and dimension-down strategy is suggested to properly align feature maps of different dimensionality from different modalities. Experimental results on our private dataset and the public OpenfMRI dataset show that our proposed MFFormer performs better than using a single modality or multi-modality MRI on schizophrenia and bipolar disorder diagnosis.
摘要
深度学习方法与神经影像技术相结合,在精神疾病分类中发挥着重要作用。以往的精神疾病诊断研究主要使用静息态功能磁共振成像(rs-fMRI)的功能连接矩阵作为输入,尚未充分利用rs-fMRI时间序列中丰富的时间信息。在这项工作中,我们提出了一种多维嵌入感知的模态融合Transformer(MFFormer),利用rs-fMRI和T1加权结构磁共振成像(T1w sMRI)对精神分裂症和双相情感障碍进行分类。具体来说,为了充分利用rs-fMRI的时间信息和sMRI的空间信息,我们构建了一个以rs-fMRI的2D时间序列和3D T1w体数据为输入的深度学习架构。此外,为了促进模态内注意力以及不同模态之间的信息融合,我们设计了一个融合Transformer模块(FTM),对多模态混合特征图进行充分的自注意力计算。另外,我们还提出了一种维度升降策略,用于恰当地对齐来自不同模态、维度不同的特征图。在我们的私有数据集和公开的OpenfMRI数据集上的实验结果表明,所提出的MFFormer在精神分裂症和双相情感障碍的诊断上优于仅使用单一模态或多模态MRI的方法。
PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting
results: 实验结果显示,该方法在三个数据集上的降雨CSI分别比当前最先进方法高6.3%、4.7%和26.8%;在极端降水情况下表现尤为出色,其暴雨CSI分别比传统数值天气预报(NWP)方法提高15.6%、17.4%和31.8%。Abstract
Accurate precipitation forecasting is a vital challenge of both scientific and societal importance. Data-driven approaches have emerged as a widely used solution for addressing this challenge. However, solely relying on data-driven approaches has limitations in modeling the underlying physics, making accurate predictions difficult. Coupling AI-based post-processing techniques with traditional Numerical Weather Prediction (NWP) methods offers a more effective solution for improving forecasting accuracy. Despite previous post-processing efforts, accurately predicting heavy rainfall remains challenging due to the imbalanced precipitation data across locations and complex relationships between multiple meteorological variables. To address these limitations, we introduce the PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting. We propose CAMT, a simple yet effective Channel Attention Enhanced Multi-task Learning framework with a specially designed weighted loss function. Its flexible design allows for easy plug-and-play integration with various backbones. Extensive experimental results on the proposed benchmark show that our method outperforms state-of-the-art methods by 6.3%, 4.7%, and 26.8% in rain CSI on the three datasets respectively. Most notably, our model is the first deep learning-based method to outperform traditional Numerical Weather Prediction (NWP) approaches in extreme precipitation conditions. It shows improvements of 15.6%, 17.4%, and 31.8% over NWP predictions in heavy rain CSI on respective datasets. These results highlight the potential impact of our model in reducing the severe consequences of extreme weather events.
摘要
准确的降水预报是一项兼具科学与社会重要性的关键挑战。数据驱动方法已成为应对这一挑战的常用方案,然而仅依赖数据驱动方法难以刻画底层物理规律,使得准确预测十分困难。将基于人工智能的后处理技术与传统数值天气预报(NWP)方法相结合,是提升预报精度的更有效途径。尽管已有不少后处理方面的工作,但由于各地降水数据的不均衡以及多个气象变量之间的复杂关系,准确预测强降水仍然具有挑战性。针对这些局限,我们提出了PostRainBench,一个包含三个数据集的综合性多变量NWP后处理降水预报基准。我们还提出了CAMT,一个简单而有效的通道注意力增强的多任务学习框架,并配有专门设计的加权损失函数;其灵活的设计便于与各种骨干网络即插即用地结合。在所提基准上的大量实验表明,我们的方法在三个数据集上的降雨CSI分别比最先进方法高6.3%、4.7%和26.8%。尤为重要的是,我们的模型是首个在极端降水条件下超越传统数值天气预报(NWP)方法的深度学习方法,其暴雨CSI在相应数据集上分别比NWP预报提高15.6%、17.4%和31.8%。这些结果凸显了我们的模型在减轻极端天气事件严重后果方面的潜在影响。
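The imbalance issue discussed above is commonly addressed with a class-weighted loss; the sketch below shows a weighted cross-entropy over precipitation intensity classes in which the rarer heavy-rain classes receive larger weights. The class binning and weight values are illustrative assumptions, not CAMT's actual loss design.

```python
import numpy as np

def weighted_cross_entropy(logits, targets, class_weights):
    """Per-sample weighted cross-entropy.
    logits: (N, C) raw scores; targets: (N,) class indices in [0, C)."""
    logits = logits - logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float(np.mean(class_weights[targets] * nll))

# Hypothetical precipitation classes: no rain, light, moderate, heavy.
class_weights = np.array([1.0, 2.0, 5.0, 10.0])   # rarer classes weigh more
logits = np.random.randn(8, 4)
targets = np.array([0, 0, 0, 1, 1, 2, 3, 3])
print(weighted_cross_entropy(logits, targets, class_weights))
```

Up-weighting the heavy-rain class pushes the post-processing model to trade some accuracy on the abundant no-rain pixels for better CSI on the extreme events, which is the regime the benchmark emphasizes.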
MedPrompt: Cross-Modal Prompting for Multi-Task Medical Image Translation
results: 经验证明,我们的提案模型在五个数据集和四对模式之间具有最佳视觉质量和优秀的泛化能力。Abstract
Cross-modal medical image translation is an essential task for synthesizing missing modality data for clinical diagnosis. However, current learning-based techniques have limitations in capturing cross-modal and global features, restricting their suitability to specific pairs of modalities. This lack of versatility undermines their practical usefulness, particularly considering that the missing modality may vary for different cases. In this study, we present MedPrompt, a multi-task framework that efficiently translates different modalities. Specifically, we propose the Self-adaptive Prompt Block, which dynamically guides the translation network towards distinct modalities. Within this framework, we introduce the Prompt Extraction Block and the Prompt Fusion Block to efficiently encode the cross-modal prompt. To enhance the extraction of global features across diverse modalities, we incorporate the Transformer model. Extensive experimental results involving five datasets and four pairs of modalities demonstrate that our proposed model achieves state-of-the-art visual quality and exhibits excellent generalization capability.
摘要
跨模态医学图像翻译是为临床诊断合成缺失模态数据的重要任务。然而,当前基于学习的技术在捕捉跨模态和全局特征方面存在局限,往往只适用于特定的模态对。这种通用性的缺乏降低了它们的实用价值,特别是考虑到缺失的模态在不同病例中可能各不相同。在本研究中,我们提出了 MedPrompt,一种能够高效翻译不同模态的多任务框架。具体来说,我们提出了自适应Prompt块,可以动态引导翻译网络面向不同的模态。在这个框架中,我们引入了Prompt提取块和Prompt融合块,以高效地编码跨模态Prompt。为了增强不同模态之间的全局特征提取,我们采用了Transformer模型。我们在五个数据集和四对模态上进行了广泛的实验,结果表明,我们提出的模型在视觉质量上达到了领先水平,并且具有优秀的泛化能力。
Active Visual Localization for Multi-Agent Collaboration: A Data-Driven Approach
paper_authors: Matthew Hanlon, Boyang Sun, Marc Pollefeys, Hermann Blum
for: 本研究旨在使用活动视觉地标定解决多机器人或人机合作中视点变化带来的挑战。
methods: 本研究将现有文献中的方法与额外提出的基线进行比较,并提出了一种新的数据驱动方法。
results: 实验和实际应用中,数据驱动方法的性能较为出色,超过了现有方法的表现。Abstract
Rather than having each newly deployed robot create its own map of its surroundings, the growing availability of SLAM-enabled devices provides the option of simply localizing in a map of another robot or device. In cases such as multi-robot or human-robot collaboration, localizing all agents in the same map is even necessary. However, localizing e.g. a ground robot in the map of a drone or head-mounted MR headset presents unique challenges due to viewpoint changes. This work investigates how active visual localization can be used to overcome such challenges of viewpoint changes. Specifically, we focus on the problem of selecting the optimal viewpoint at a given location. We compare existing approaches in the literature with additional proposed baselines and propose a novel data-driven approach. The result demonstrates the superior performance of the data-driven approach when compared to existing methods, both in controlled simulation experiments and real-world deployment.
摘要
与其让每个新部署的机器人都创建自己的环境地图,随着支持SLAM的设备日益普及,机器人可以直接在另一个机器人或设备的地图中进行定位。在多机器人或人机协作等情况下,将所有智能体定位在同一张地图上甚至是必要的。但是,将地面机器人定位在无人机或头戴式混合现实(MR)头显的地图中,会因视点变化而带来独特的挑战。本研究探讨了如何利用主动视觉定位来克服这类视点变化问题。我们专注于在给定位置选择最优视点的问题,将文献中的现有方法与额外提出的基线进行比较,并提出了一种新的数据驱动方法。结果显示,数据驱动方法在受控模拟实验和实际部署中均优于现有方法。
P2CADNet: An End-to-End Reconstruction Network for Parametric 3D CAD Model from Point Clouds
methods: 该方法首先提出了一种结合点云特征提取器、CAD序列重建器和参数优化器的架构。然后,为了以自回归方式重建featured CAD模型,CAD序列重建器使用了两个transformer解码器,其中一个带目标掩码,另一个不带掩码。最后,为了更准确地预测参数,我们设计了一个带有交叉注意力机制的参数优化器,以进一步细化CAD特征参数。
results: 我们在公共数据集上评估了P2CADNet,获得了优秀的重建质量和准确性。据我们所知,P2CADNet是首个从点云端到端重建featured CAD模型的网络,可以被视为未来研究的基准。因此,我们在github上公开了源代码,访问https://github.com/Blice0415/P2CADNet。Abstract
Computer Aided Design (CAD), especially the feature-based parametric CAD, plays an important role in modern industry and society. However, the reconstruction of featured CAD model is more challenging than the reconstruction of other CAD models. To this end, this paper proposes an end-to-end network to reconstruct featured CAD model from point cloud (P2CADNet). Initially, the proposed P2CADNet architecture combines a point cloud feature extractor, a CAD sequence reconstructor and a parameter optimizer. Subsequently, in order to reconstruct the featured CAD model in an autoregressive way, the CAD sequence reconstructor applies two transformer decoders, one with target mask and the other without mask. Finally, for predicting parameters more precisely, we design a parameter optimizer with cross-attention mechanism to further refine the CAD feature parameters. We evaluate P2CADNet on the public dataset, and the experimental results show that P2CADNet has excellent reconstruction quality and accuracy. To our best knowledge, P2CADNet is the first end-to-end network to reconstruct featured CAD model from point cloud, and can be regarded as baseline for future works. Therefore, we open the source code at https://github.com/Blice0415/P2CADNet.
摘要
计算机辅助设计(CAD),特别是基于特征的参数化CAD,在现代工业和社会中扮演着重要的角色。然而,重建featured CAD模型比重建其他CAD模型更加困难。为此,这篇论文提出了一种从点云重建featured CAD模型的端到端网络(P2CADNet)。在所提出的P2CADNet架构中,首先组合了点云特征提取器、CAD序列重构器和参数优化器。然后,为了以自回归方式重建featured CAD模型,CAD序列重构器使用了两个Transformer解码器,一个带目标掩码,另一个不带掩码。最后,为了更精确地预测参数,我们设计了一个带有交叉注意力机制的参数优化器,以进一步细化CAD特征参数。我们在公共数据集上进行了测试,取得了优秀的重建质量和准确性。据我们所知,P2CADNet是首个从点云端到端重建featured CAD模型的网络,可以作为后续工作的基准。因此,我们在github上公开了源代码(https://github.com/Blice0415/P2CADNet)。
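The abstract describes a parameter optimizer that refines coarse CAD feature parameters with cross-attention over point-cloud features. A minimal sketch of that idea is shown below; the module name, dimensions, and regression head are illustrative assumptions, not the paper's actual code.

import torch
import torch.nn as nn

class ParamRefiner(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)                 # regress one value per parameter token

    def forward(self, param_tokens, point_feats):
        # param_tokens: (B, P, dim) coarse parameter embeddings (queries)
        # point_feats:  (B, N, dim) point-cloud features (keys/values)
        refined, _ = self.attn(param_tokens, point_feats, point_feats)
        return self.head(refined).squeeze(-1)         # (B, P) refined parameter values

refiner = ParamRefiner()
params = refiner(torch.randn(2, 10, 256), torch.randn(2, 1024, 256))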
Analyzing and Improving OT-based Adversarial Networks
paper_authors: Jaemoo Choi, Jaewoong Choi, Myungjoo Kang
for: 本文 targets solving the problem of generative modeling using Optimal Transport (OT) theory.
methods: 本文提出了一种 unified framework , combining OT-based adversarial methods to improve the performance of generative models. The authors also analyze the training dynamics of this framework and propose a novel method for gradual refinement of the generated distribution.
results: compared with previous best-performing OT-based models, the proposed approach achieves a FID score of 2.51 on CIFAR-10, outperforming unified OT-based adversarial approaches.Abstract
Optimal Transport (OT) problem aims to find a transport plan that bridges two distributions while minimizing a given cost function. OT theory has been widely utilized in generative modeling. In the beginning, OT distance has been used as a measure for assessing the distance between data and generated distributions. Recently, OT transport map between data and prior distributions has been utilized as a generative model. These OT-based generative models share a similar adversarial training objective. In this paper, we begin by unifying these OT-based adversarial methods within a single framework. Then, we elucidate the role of each component in training dynamics through a comprehensive analysis of this unified framework. Moreover, we suggest a simple but novel method that improves the previously best-performing OT-based model. Intuitively, our approach conducts a gradual refinement of the generated distribution, progressively aligning it with the data distribution. Our approach achieves a FID score of 2.51 on CIFAR-10, outperforming unified OT-based adversarial approaches.
摘要
SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D
results: 我们的方法可以高效地解决多视点不一致问题,并保持2D扩散模型的细节和多样性。我们的方法在人工评估中获得了85+%的一致率,远高于前一代方法的30%。Abstract
It is inherently ambiguous to lift 2D results from pre-trained diffusion models to a 3D world for text-to-3D generation. 2D diffusion models solely learn view-agnostic priors and thus lack 3D knowledge during the lifting, leading to the multi-view inconsistency problem. We find that this problem primarily stems from geometric inconsistency, and avoiding misplaced geometric structures substantially mitigates the problem in the final outputs. Therefore, we improve the consistency by aligning the 2D geometric priors in diffusion models with well-defined 3D shapes during the lifting, addressing the vast majority of the problem. This is achieved by fine-tuning the 2D diffusion model to be viewpoint-aware and to produce view-specific coordinate maps of canonically oriented 3D objects. In our process, only coarse 3D information is used for aligning. This "coarse" alignment not only resolves the multi-view inconsistency in geometries but also retains the ability in 2D diffusion models to generate detailed and diversified high-quality objects unseen in the 3D datasets. Furthermore, our aligned geometric priors (AGP) are generic and can be seamlessly integrated into various state-of-the-art pipelines, obtaining high generalizability in terms of unseen shapes and visual appearance while greatly alleviating the multi-view inconsistency problem. Our method represents a new state-of-the-art performance with an 85+% consistency rate by human evaluation, while many previous methods are around 30%. Our project page is https://sweetdreamer3d.github.io/
摘要
"天然地,将2D结果从预训练的扩散模型升级到3D世界进行文本到3D生成是具有内在困难的。2D扩散模型只是学习视角无关的假设,因此在升级时缺乏3D知识,导致多视图不一致问题。我们发现这个问题主要来自于几何不一致问题,避免误置的几何结构可以大幅减轻问题。因此,我们改进了一致性,通过将2D的几何假设与well-defined的3D形状进行对应,在升级过程中解决了大多数问题。这是通过练习2D扩散模型以便视角意识和生成视специ fic coordinate map来实现的。在我们的过程中,只使用了粗略的3D信息进行对应。这种"粗"对应不仅解决了多视图不一致的几何问题,还保留了2D扩散模型的详细和多样化高质量对象生成能力。此外,我们的对应几何假设(AGP)是通用的,可以轻松地与多种当前领先的管道集成,从而实现高通用性。我们的方法在人工评估中获得了85+%的一致率,而许多前一代方法只有30%左右。我们的项目页面是https://sweetdreamer3d.github.io/。"
ViT-ReciproCAM: Gradient and Attention-Free Visual Explanations for Vision Transformer
methods: 该论文提出了一种新的无梯度视觉解释方法,称为 ViT-ReciproCAM,不需要注意力矩阵和梯度信息。
results: 与现有的 Relevance 方法相比,ViT-ReciproCAM 在 Average Drop-Coherence-Complexity (ADCC) 指标上取得了 $4.58\%$ 到 $5.80\%$ 的提升,并生成了更具局部性的显著图。Abstract
This paper presents a novel approach to address the challenges of understanding the prediction process and debugging prediction errors in Vision Transformers (ViT), which have demonstrated superior performance in various computer vision tasks such as image classification and object detection. While several visual explainability techniques, such as CAM, Grad-CAM, Score-CAM, and Recipro-CAM, have been extensively researched for Convolutional Neural Networks (CNNs), limited research has been conducted on ViT. Current state-of-the-art solutions for ViT rely on class agnostic Attention-Rollout and Relevance techniques. In this work, we propose a new gradient-free visual explanation method for ViT, called ViT-ReciproCAM, which does not require attention matrix and gradient information. ViT-ReciproCAM utilizes token masking and generated new layer outputs from the target layer's input to exploit the correlation between activated tokens and network predictions for target classes. Our proposed method outperforms the state-of-the-art Relevance method in the Average Drop-Coherence-Complexity (ADCC) metric by $4.58\%$ to $5.80\%$ and generates more localized saliency maps. Our experiments demonstrate the effectiveness of ViT-ReciproCAM and showcase its potential for understanding and debugging ViT models. Our proposed method provides an efficient and easy-to-implement alternative for generating visual explanations, without requiring attention and gradient information, which can be beneficial for various applications in the field of computer vision.
摘要
Currently, state-of-the-art solutions for ViT rely on class agnostic Attention-Rollout and Relevance techniques, but these methods have limitations. In contrast, our proposed method does not require attention matrix and gradient information, making it more efficient and easier to implement. Our experiments show that ViT-ReciproCAM outperforms the state-of-the-art Relevance method in the Average Drop-Coherence-Complexity (ADCC) metric by 4.58% to 5.80% and generates more localized saliency maps. This demonstrates the effectiveness of our proposed method in understanding and debugging ViT models. Our method has the potential to be beneficial for various applications in the field of computer vision, as it provides an efficient and easy-to-implement alternative for generating visual explanations.
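Since ViT-ReciproCAM is described as gradient- and attention-free, a simplified way to picture the token-masking idea is to perturb one spatial token of an intermediate feature map at a time and measure how the target-class score reacts. The sketch below is only an illustration of that occlusion-style idea: the hooks `features` and `head_fn` are assumed, and the actual method is more refined than per-token masking.

import torch

@torch.no_grad()
def token_masking_saliency(features, head_fn, target_class):
    # features: (1, 1 + H*W, D) tokens from the target layer (CLS + patches)
    # head_fn:  callable mapping masked features to class logits
    base = head_fn(features)[0, target_class]
    num_patches = features.shape[1] - 1
    saliency = torch.zeros(num_patches)
    for i in range(num_patches):
        masked = features.clone()
        masked[:, 1 + i, :] = 0.0                      # mask one spatial token
        score = head_fn(masked)[0, target_class]
        saliency[i] = base - score                     # large drop => important token
    side = int(num_patches ** 0.5)
    return saliency.reshape(side, side)                # coarse saliency map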
A Prototype-Based Neural Network for Image Anomaly Detection and Localization
results: 实验结果显示,ProtoAD可以与现有的方法比较,在两个工业异常检测数据集(MVTec AD和BTAD)上具有竞争的性能,同时具有更高的推断速度。Abstract
Image anomaly detection and localization perform not only image-level anomaly classification but also locate pixel-level anomaly regions. Recently, it has received much research attention due to its wide application in various fields. This paper proposes ProtoAD, a prototype-based neural network for image anomaly detection and localization. First, the patch features of normal images are extracted by a deep network pre-trained on nature images. Then, the prototypes of the normal patch features are learned by non-parametric clustering. Finally, we construct an image anomaly localization network (ProtoAD) by appending the feature extraction network with $L2$ feature normalization, a $1\times1$ convolutional layer, a channel max-pooling, and a subtraction operation. We use the prototypes as the kernels of the $1\times1$ convolutional layer; therefore, our neural network does not need a training phase and can conduct anomaly detection and localization in an end-to-end manner. Extensive experiments on two challenging industrial anomaly detection datasets, MVTec AD and BTAD, demonstrate that ProtoAD achieves competitive performance compared to the state-of-the-art methods with a higher inference speed. The source code is available at: https://github.com/98chao/ProtoAD.
摘要
图像异常检测与定位不仅进行图像级的异常分类,还要定位像素级的异常区域。由于其在各领域的广泛应用,近来受到了大量研究关注。本文提出了ProtoAD,一种基于原型的图像异常检测与定位神经网络。首先,使用在自然图像上预训练的深度网络提取正常图像的补丁特征。然后,通过非参数聚类学习正常补丁特征的原型。最后,我们在特征提取网络之后追加 $L2$ 特征归一化、$1\times1$ 卷积层、通道最大池化和减法操作,构建图像异常定位网络(ProtoAD)。我们将原型用作 $1\times1$ 卷积层的卷积核,因此我们的神经网络无需训练阶段,可以以端到端的方式进行异常检测与定位。我们在 MVTec AD 和 BTAD 两个具有挑战性的工业异常检测数据集上进行了广泛的实验,结果表明 ProtoAD 与最先进方法相比具有竞争力的性能和更高的推理速度。源代码可以在 GitHub 上获取:https://github.com/98chao/ProtoAD。
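Reading the pipeline above literally, the anomaly-localization head can be pictured as cosine similarity between normalized patch features and the learned prototypes, implemented as a 1x1 convolution, followed by channel max-pooling and a subtraction. The sketch below is an interpretation of that description; details beyond the abstract, such as normalizing the prototypes and the exact form of the subtraction, are assumptions.

import torch
import torch.nn.functional as F

@torch.no_grad()
def anomaly_map(patch_feats, prototypes):
    # patch_feats: (B, C, H, W) features from a pretrained backbone
    # prototypes:  (K, C) cluster centers of normal patch features
    feats = F.normalize(patch_feats, dim=1)
    protos = F.normalize(prototypes, dim=1).view(-1, prototypes.shape[1], 1, 1)
    sim = F.conv2d(feats, protos)                      # (B, K, H, W) cosine similarities
    max_sim, _ = sim.max(dim=1, keepdim=True)          # channel max-pooling over prototypes
    return 1.0 - max_sim                               # high value = anomalous region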
AdaMerging: Adaptive Model Merging for Multi-Task Learning
results: 实验结果表明,与当前最先进的任务算术合并方法相比,AdaMerging取得了11%的性能提升。此外,AdaMerging还在未见下游任务上展现出了更好的泛化能力,并对测试阶段的数据分布变化具有更强的鲁棒性。Abstract
Multi-task learning (MTL) aims to empower a model to tackle multiple tasks simultaneously. A recent development known as task arithmetic has revealed that several models, each fine-tuned for distinct tasks, can be directly merged into a single model to execute MTL without necessitating a retraining process using the initial training data. Nevertheless, this direct addition of models often leads to a significant deterioration in the overall performance of the merged model. This decline occurs due to potential conflicts and intricate correlations among the multiple tasks. Consequently, the challenge emerges of how to merge pre-trained models more effectively without using their original training data. This paper introduces an innovative technique called Adaptive Model Merging (AdaMerging). This approach aims to autonomously learn the coefficients for model merging, either in a task-wise or layer-wise manner, without relying on the original training data. Specifically, our AdaMerging method operates as an automatic, unsupervised task arithmetic scheme. It leverages entropy minimization on unlabeled test samples from the multi-task setup as a surrogate objective function to iteratively refine the merging coefficients of the multiple models. Our experimental findings across eight tasks demonstrate the efficacy of the AdaMerging scheme we put forth. Compared to the current state-of-the-art task arithmetic merging scheme, AdaMerging showcases a remarkable 11\% improvement in performance. Notably, AdaMerging also exhibits superior generalization capabilities when applied to unseen downstream tasks. Furthermore, it displays a significantly enhanced robustness to data distribution shifts that may occur during the testing phase.
摘要
多任务学习(MTL)的目的是让一个模型同时处理多个任务。最近提出的任务算术(task arithmetic)表明,可以将分别针对不同任务微调的多个模型直接合并为一个模型来执行MTL,而无需使用初始训练数据重新训练。然而,直接相加模型通常会导致合并后模型整体性能的显著下降,其原因在于多个任务之间可能存在冲突和复杂的相互关系。因此,如何在不使用原始训练数据的情况下更有效地合并预训练模型,成为一个挑战。这篇论文提出了一种创新的技术——自适应模型合并(AdaMerging)。这种方法旨在不依赖原始训练数据的情况下,以任务级或层级的方式自主学习模型合并系数。具体来说,我们的 AdaMerging 方法是一种自动、无监督的任务算术方案,它以多任务设置下无标注测试样本上的熵最小化作为替代目标函数,迭代优化多个模型的合并系数。我们在八个任务上的实验结果表明了 AdaMerging 方案的有效性。与当前最先进的任务算术合并方案相比,AdaMerging 取得了11%的显著性能提升。值得注意的是,AdaMerging 在应用于未见下游任务时还展现出了更好的泛化能力,并且对测试阶段可能出现的数据分布变化表现出显著更强的鲁棒性。
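As a rough, simplified illustration of the unsupervised coefficient learning described above (task-wise only; the paper also supports layer-wise coefficients, and the training-step function here is an assumption), the merged weights can be written as the pretrained weights plus a weighted sum of task vectors, with the weights updated to minimize prediction entropy on unlabeled test batches.

import torch

def merge(pretrained_sd, task_sds, lambdas):
    merged = {}
    for name, w0 in pretrained_sd.items():
        delta = sum(l * (sd[name] - w0) for l, sd in zip(lambdas, task_sds))
        merged[name] = w0 + delta
    return merged

def adamerging_step(model, pretrained_sd, task_sds, lambdas, unlabeled_batch, opt):
    merged = merge(pretrained_sd, task_sds, lambdas)
    # functional_call keeps the merged weights differentiable w.r.t. lambdas
    logits = torch.func.functional_call(model, merged, (unlabeled_batch,))
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    opt.zero_grad()
    entropy.backward()
    opt.step()                                         # update only the merging coefficients
    return entropy.item()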
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
results: 通过对多种LVLM进行了广泛的实验和分析,本文发现了这些模型的优缺点,并提出了进一步改进的方法。Abstract
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs). Benefiting from the strong language backbones and efficient cross-modal alignment strategies, LVLMs exhibit surprising capabilities to perceive visual signals and perform visually grounded reasoning. However, the capabilities of LVLMs have not been comprehensively and quantitatively evaluate. Most existing multi-modal benchmarks require task-oriented input-output formats, posing great challenges to automatically assess the free-form text output of LVLMs. To effectively leverage the annotations available in existing benchmarks and reduce the manual effort required for constructing new benchmarks, we propose to re-formulate existing benchmarks into unified LVLM-compatible formats. Through systematic data collection and reformulation, we present the ReForm-Eval benchmark, offering substantial data for evaluating various capabilities of LVLMs. Based on ReForm-Eval, we conduct extensive experiments, thoroughly analyze the strengths and weaknesses of existing LVLMs, and identify the underlying factors. Our benchmark and evaluation framework will be open-sourced as a cornerstone for advancing the development of LVLMs.
摘要
近年来,大规模视觉语言模型(LVLM)的发展取得了显著进步。得益于强大的语言主干和高效的跨模态对齐策略,LVLM展现出了感知视觉信号并进行基于视觉的推理的惊人能力。然而,LVLM的能力尚未得到全面和量化的评估。现有的多模态基准大多要求面向任务的输入输出格式,这给LVLM自由文本输出的自动评估带来了很大挑战。为了有效利用现有基准中的标注并减少构建新基准所需的人工成本,我们提议将现有基准重新表述为统一的、与LVLM兼容的格式。通过系统性的数据收集和重新表述,我们提出了ReForm-Eval基准,为评估LVLM的各种能力提供了大量数据。基于ReForm-Eval,我们进行了广泛的实验,系统分析了现有LVLM的优势与不足,并找出了其背后的影响因素。我们的基准和评估框架将被开源,成为推动LVLM发展的基石。
Generalization in diffusion models arises from geometry-adaptive harmonic representation
results: 这两个DNN仅需少量训练图像就能学习出几乎相同的得分函数,并且它们的去噪性能接近最优。这表明DNN的架构和训练算法与数据分布的性质之间存在良好的对齐。Abstract
High-quality samples generated with score-based reverse diffusion algorithms provide evidence that deep neural networks (DNN) trained for denoising can learn high-dimensional densities, despite the curse of dimensionality. However, recent reports of memorization of the training set raise the question of whether these networks are learning the "true" continuous density of the data. Here, we show that two denoising DNNs trained on non-overlapping subsets of a dataset learn nearly the same score function, and thus the same density, with a surprisingly small number of training images. This strong generalization demonstrates an alignment of powerful inductive biases in the DNN architecture and/or training algorithm with properties of the data distribution. We analyze these, demonstrating that the denoiser performs a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous image regions. We show that trained denoisers are inductively biased towards these geometry-adaptive harmonic representations by demonstrating that they arise even when the network is trained on image classes such as low-dimensional manifolds, for which the harmonic basis is suboptimal. Additionally, we show that the denoising performance of the networks is near-optimal when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic.
摘要
通过基于得分的逆扩散算法生成的高质量样本表明,为去噪而训练的深度神经网络(DNN)能够学习高维概率密度,尽管存在维度灾难。然而,近期关于训练集被记忆的报告提出了一个问题:这些网络是否真正学习了数据的"真实"连续密度?我们证明,在同一数据集的两个互不重叠子集上训练的两个去噪DNN,仅用数量惊人少的训练图像就学习到了几乎相同的得分函数,因而也学习到了几乎相同的密度。这种强泛化表明,DNN架构和/或训练算法中强有力的归纳偏置与数据分布的性质相吻合。我们对此进行了分析,证明去噪器在一个适应底层图像的基底中执行收缩操作。对这些基底的考察揭示了沿轮廓以及均匀图像区域中的振荡谐波结构。我们证明,训练得到的去噪器对这种几何自适应谐波表示具有归纳偏置:即使在低维流形等谐波基并非最优的图像类别上训练,这种表示仍然会出现。此外,当在最优基已知为几何自适应谐波基的规则图像类别上训练时,网络的去噪性能接近最优。
SlowFormer: Universal Adversarial Patch for Attack on Compute and Energy Efficiency of Inference Efficient Vision Transformers
results: 实验结果表明,对于根据输入自适应调整计算量的高效视觉Transformer,攻击者只需粘贴一个占图像面积不到8%的对抗补丁,就能提高模型的计算量和功耗。此外,标准的对抗训练防御方法可以降低部分攻击的成功率。Abstract
Recently, there has been a lot of progress in reducing the computation of deep models at inference time. These methods can reduce both the computational needs and power usage of deep models. Some of these approaches adaptively scale the compute based on the input instance. We show that such models can be vulnerable to a universal adversarial patch attack, where the attacker optimizes for a patch that when pasted on any image, can increase the compute and power consumption of the model. We run experiments with three different efficient vision transformer methods showing that in some cases, the attacker can increase the computation to the maximum possible level by simply pasting a patch that occupies only 8\% of the image area. We also show that a standard adversarial training defense method can reduce some of the attack's success. We believe adaptive efficient methods will be necessary for the future to lower the power usage of deep models, so we hope our paper encourages the community to study the robustness of these methods and develop better defense methods for the proposed attack.
摘要
近来,在降低深度模型推理计算量方面取得了很大进展。这些方法既能降低深度模型的计算需求,也能降低功耗。其中一些方法根据输入实例自适应地调整计算量。我们发现这类模型容易受到一种通用对抗补丁攻击:攻击者可以优化一个补丁,将其粘贴到任意图像上即可提高模型的计算量和功耗。我们在三种不同的高效视觉Transformer方法上进行了实验,发现在某些情况下,攻击者仅需粘贴一个只占图像面积8%的补丁,就能将计算量提升到最大可能水平。我们还发现,标准的对抗训练防御方法可以降低部分攻击的成功率。我们认为自适应的高效方法是未来降低深度模型功耗所必需的,因此希望本文能鼓励社区研究这些方法的鲁棒性,并针对所提出的攻击开发更好的防御方法。
ShaSTA-Fuse: Camera-LiDAR Sensor Fusion to Model Shape and Spatio-Temporal Affinities for 3D Multi-Object Tracking
paper_authors: Tara Sadjadpour, Rares Ambrus, Jeannette Bohg
for: 该论文旨在开发一个融合摄像头和激光雷达(LiDAR)传感器信息的3D多目标跟踪(MOT)框架,以提高自主移动智能体在场景中安全导航的能力。
methods: 该论文提出了一种新的摄像头-激光雷达融合方法,将两种传感器的信息融合以改进亲和度学习,从而提升数据关联、跟踪生命周期管理、假阳性消除、假阴性传播和跟踪置信度分数精化等方面的性能。
results: 该论文在 nuScenes benchmark 上,在使用 CenterPoint 检测结果的多模态3D MOT算法中取得了最先进的性能。此外,论文还提出了一种新的融合方法和首创的多模态序列跟踪置信度精化技术,并通过一系列消融分析证明加入摄像头传感器对小而远目标的识别和跟踪具有显著改进。
3D multi-object tracking (MOT) is essential for an autonomous mobile agent to safely navigate a scene. In order to maximize the perception capabilities of the autonomous agent, we aim to develop a 3D MOT framework that fuses camera and LiDAR sensor information. Building on our prior LiDAR-only work, ShaSTA, which models shape and spatio-temporal affinities for 3D MOT, we propose a novel camera-LiDAR fusion approach for learning affinities. At its core, this work proposes a fusion technique that generates a rich sensory signal incorporating information about depth and distant objects to enhance affinity estimation for improved data association, track lifecycle management, false-positive elimination, false-negative propagation, and track confidence score refinement. Our main contributions include a novel fusion approach for combining camera and LiDAR sensory signals to learn affinities, and a first-of-its-kind multimodal sequential track confidence refinement technique that fuses 2D and 3D detections. Additionally, we perform an ablative analysis on each fusion step to demonstrate the added benefits of incorporating the camera sensor, particular for small, distant objects that tend to suffer from the depth-sensing limits and sparsity of LiDAR sensors. In sum, our technique achieves state-of-the-art performance on the nuScenes benchmark amongst multimodal 3D MOT algorithms using CenterPoint detections.
摘要
三维多目标跟踪(MOT)对于自主移动智能体在场景中安全导航至关重要。为了最大化自主智能体的感知能力,我们的目标是开发一个融合摄像头和激光雷达(LiDAR)传感器信息的3D MOT框架。在我们先前仅使用激光雷达、为3D MOT建模形状与时空亲和度的工作ShaSTA的基础上,我们提出了一种新的摄像头-激光雷达融合方法来学习亲和度。其核心是一种融合技术,生成包含深度和远处物体信息的丰富感知信号,以增强亲和度估计,从而改进数据关联、跟踪生命周期管理、假阳性消除、假阴性传播以及跟踪置信度分数的精化。我们的主要贡献包括一种融合摄像头与激光雷达感知信号以学习亲和度的新方法,以及一种首创的、融合2D与3D检测的多模态序列跟踪置信度精化技术。此外,我们对每个融合步骤进行了消融分析,以展示加入摄像头传感器带来的增益,特别是对于容易受激光雷达深度感知范围和稀疏性限制的小而远的目标。总的来说,我们的技术在 nuScenes 基准上,在使用 CenterPoint 检测结果的多模态3D MOT算法中达到了最先进水平。
On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study
paper_authors: Liben Chen, Long Chen, Tian Ellison-Chen, Zhuoyuan Xu
for: 本研究旨在研究人类认知和计算机模型之间的关系,以便更好地理解计算机模型如何模仿人类认知。
methods: 本研究使用问卷调查的方法记录人类思维过程,并将计算机模型的输出和注意力图与人类的认知过程进行比较。
results: 研究发现,虽然计算机模型在架构上与人类认知有相似之处,且在识别层面表现与人类相当,但在认知推理方面仍有困难。对人类思维过程的分析可以指导未来的研究,并将更多的认知能力引入特征和架构设计中。Abstract
Visual Question Answering (VQA) is a challenging task that requires cross-modal understanding and reasoning of visual image and natural language question. To inspect the association of VQA models to human cognition, we designed a survey to record human thinking process and analyzed VQA models by comparing the outputs and attention maps with those of humans. We found that although the VQA models resemble human cognition in architecture and performs similarly with human on the recognition-level, they still struggle with cognitive inferences. The analysis of human thinking procedure serves to direct future research and introduce more cognitive capacity into modeling features and architectures.
摘要
视觉问答(VQA)是一项具有挑战性的任务,需要对视觉图像和自然语言问题进行跨模态理解与推理。为了考察VQA模型与人类认知之间的关联,我们设计了一项调查来记录人类思维过程,并通过将模型输出和注意力图与人类进行比较来分析VQA模型。我们发现,虽然VQA模型在架构上与人类认知相似,且在识别层面表现与人类相当,但在认知推理方面仍有困难。对人类思维过程的分析可以指导未来的研究,并为模型特征与架构设计引入更多的认知能力。
A Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors
results: 对于一个自制的学生班级行为数据集(STSCB),与SlowFast模型相比,BDSTA模型可以提高学生行为分类检测的准确率8.94%。Abstract
Accurately detecting student behavior from classroom videos is beneficial for analyzing their classroom status and improving teaching efficiency. However, low accuracy in student classroom behavior detection is a prevalent issue. To address this issue, we propose a Spatio-Temporal Attention-Based Method for Detecting Student Classroom Behaviors (BDSTA). Firstly, the SlowFast network is used to generate motion and environmental information feature maps from the video. Then, the spatio-temporal attention module is applied to the feature maps, including information aggregation, compression and stimulation processes. Subsequently, attention maps in the time, channel and space dimensions are obtained, and multi-label behavior classification is performed based on these attention maps. To solve the long-tail data problem that exists in student classroom behavior datasets, we use an improved focal loss function to assign more weight to the tail class data during training. Experimental results are conducted on a self-made student classroom behavior dataset named STSCB. Compared with the SlowFast model, the average accuracy of student behavior classification detection improves by 8.94\% using BDSTA.
摘要
从课堂视频中准确检测学生行为,有助于分析学生的课堂状态并提高教学效率。然而,学生课堂行为检测精度低是一个普遍存在的问题。为解决这个问题,我们提出了一种基于时空注意力的学生课堂行为检测方法(BDSTA)。首先,使用SlowFast网络从视频中生成动作和环境信息特征图。然后,对特征图应用时空注意力模块,包括信息聚合、压缩和激励过程。随后,获得时间、通道和空间维度上的注意力图,并基于这些注意力图进行多标签行为分类。为解决学生课堂行为数据集中存在的长尾数据问题,我们使用改进的焦点损失函数,在训练中为尾部类数据分配更多权重。我们在自制的学生课堂行为数据集STSCB上进行了实验,与SlowFast模型相比,BDSTA的学生行为分类检测平均准确率提高了8.94%。
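The improved focal loss is only mentioned above, not specified. A plain class-weighted focal loss for multi-label behavior classification, which captures the stated intent of up-weighting tail classes and down-weighting easy examples, might look like the sketch below; the exact formulation in the paper may differ, and the weights are placeholders.

import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, class_weights, gamma=2.0):
    # logits, targets: (B, num_behaviors); targets are multi-hot labels
    # class_weights:   (num_behaviors,), larger values for rare (tail) behaviors
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = targets * p + (1 - targets) * (1 - p)        # probability of the true label
    loss = class_weights * (1 - p_t) ** gamma * ce     # down-weight easy examples
    return loss.mean()

loss = weighted_focal_loss(torch.randn(4, 6), torch.randint(0, 2, (4, 6)).float(),
                           class_weights=torch.tensor([1., 1., 1., 2., 3., 3.]))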
SCB-Dataset3: A Benchmark for Detecting Student Classroom Behavior
results: 本研究在评估SCB-dataset3时,使用YOLOv5、YOLOv7和YOLOv8算法实现了最高80.3%的平均精度(mAP)。Abstract
The use of deep learning methods to automatically detect students' classroom behavior is a promising approach for analyzing their class performance and improving teaching effectiveness. However, the lack of publicly available datasets on student behavior poses a challenge for researchers in this field. To address this issue, we propose the Student Classroom Behavior dataset (SCB-dataset3), which represents real-life scenarios. Our dataset comprises 5686 images with 45578 labels, focusing on six behaviors: hand-raising, reading, writing, using a phone, bowing the head, and leaning over the table. We evaluated the dataset using the YOLOv5, YOLOv7, and YOLOv8 algorithms, achieving a mean average precision (map) of up to 80.3$\%$. We believe that our dataset can serve as a robust foundation for future research in student behavior detection and contribute to advancements in this field. Our SCB-dataset3 is available for download at: https://github.com/Whiffe/SCB-dataset
摘要
使用深度学习方法自动检测学生的课堂行为,是分析其课堂表现、提升教学效果的一种有前景的途径。然而,公开可用的学生行为数据集的缺乏给该领域的研究者带来了挑战。为解决这个问题,我们提出了反映真实场景的学生课堂行为数据集(SCB-dataset3)。我们的数据集包括5686张图像和45578个标签,关注六种行为:举手、阅读、书写、使用手机、低头和趴桌。我们使用YOLOv5、YOLOv7和YOLOv8算法进行评估,平均精度(mAP)最高达80.3%。我们相信,SCB-dataset3可以作为未来学生行为检测研究的坚实基础,为这一领域的进步做出贡献。SCB-dataset3可以在以下地址下载:https://github.com/Whiffe/SCB-dataset。
results: 在多种情形中测试了该框架,包括孔隙介质中的传输、重力驱动流以及超弹性材料中的有限变形。结果表明,保留先前模型的知识并在当前模型中利用其中有价值的部分,可以大幅提高模型的准确性。例如,具有四个父模型的框架优于在九倍数据上训练的无父模型。这些结果表明了渐进式知识传递的重要性及其在减少训练样本情况下对模型准确性的影响。Abstract
Data-driven modeling can suffer from a constant demand for data, leading to reduced accuracy and impractical for engineering applications due to the high cost and scarcity of information. To address this challenge, we propose a progressive reduced order modeling framework that minimizes data cravings and enhances data-driven modeling's practicality. Our approach selectively transfers knowledge from previously trained models through gates, similar to how humans selectively use valuable knowledge while ignoring unuseful information. By filtering relevant information from previous models, we can create a surrogate model with minimal turnaround time and a smaller training set that can still achieve high accuracy. We have tested our framework in several cases, including transport in porous media, gravity-driven flow, and finite deformation in hyperelastic materials. Our results illustrate that retaining information from previous models and utilizing a valuable portion of that knowledge can significantly improve the accuracy of the current model. We have demonstrated the importance of progressive knowledge transfer and its impact on model accuracy with reduced training samples. For instance, our framework with four parent models outperforms the no-parent counterpart trained on data nine times larger. Our research unlocks data-driven modeling's potential for practical engineering applications by mitigating the data scarcity issue. Our proposed framework is a significant step toward more efficient and cost-effective data-driven modeling, fostering advancements across various fields.
摘要
数据驱动建模往往对数据有持续的需求,导致精度下降,并且由于数据获取成本高、数量稀缺,难以用于工程应用。为解决这一挑战,我们提出了一种渐进式降阶建模框架,以减少对数据的需求并提高数据驱动建模的实用性。我们的方法通过门控机制选择性地从先前训练的模型中传递知识,类似于人类会选择性地利用有价值的知识而忽略无用信息。通过从先前模型中筛选相关信息,我们可以用更短的周转时间和更小的训练集构建代理模型,并且仍能达到较高的精度。我们在多种情形中测试了该框架,包括孔隙介质中的传输、重力驱动流以及超弹性材料中的有限变形。结果表明,保留先前模型中的信息并利用其中有价值的部分,可以显著提高当前模型的精度。我们展示了渐进式知识传递的重要性及其在减少训练样本情况下对模型精度的影响;例如,具有四个父模型的框架优于在九倍数据上训练的无父模型。我们的研究通过缓解数据稀缺问题,释放了数据驱动建模在实际工程应用中的潜力。我们提出的框架是迈向更高效、更低成本的数据驱动建模的重要一步,将推动各个领域的进步。
results: 研究发现,现有的方法尚未能够覆盖多媒体指令数据集的多样性,这限制了模型的任务普遍化能力。此外,研究发现当生成回答时,模型可能无法保持真实性和事实性。这些发现可以帮助研究人员和实践者更好地适用多媒体语言模型。Abstract
Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks. Recent research has introduced multimodal capabilities to LLMs by integrating independently pretrained vision encoders through model grafting. These multimodal variants undergo instruction tuning, similar to LLMs, enabling effective zero-shot generalization for multimodal tasks. This study conducts a comparative analysis of different multimodal instruction tuning approaches and evaluates their performance across a range of tasks, including complex reasoning, conversation, image captioning, multiple-choice questions (MCQs), and binary classification. Through rigorous benchmarking and ablation experiments, we reveal key insights for guiding architectural choices when incorporating multimodal capabilities into LLMs. However, current approaches have limitations; they do not sufficiently address the need for a diverse multimodal instruction dataset, which is crucial for enhancing task generalization. Additionally, they overlook issues related to truthfulness and factuality when generating responses. These findings illuminate current methodological constraints in adapting language models for image comprehension and provide valuable guidance for researchers and practitioners seeking to harness multimodal versions of LLMs.
摘要
经过指令调优的大语言模型(LLM)在各种下游任务上展现出了可观的零样本泛化能力。最近的研究通过模型嫁接整合独立预训练的视觉编码器,为LLM引入了多模态能力;这些多模态变体同样经过指令调优,从而能够在多模态任务上实现有效的零样本泛化。This study compares different multimodal instruction tuning approaches and evaluates their performance on a range of tasks, including complex reasoning, conversation, image captioning, multiple-choice questions (MCQs), and binary classification. Through rigorous benchmarking and ablation experiments, we reveal key insights for guiding architectural choices when incorporating multimodal capabilities into LLMs. However, current approaches have limitations; they do not adequately address the need for a diverse multimodal instruction dataset, which is crucial for enhancing task generalization. Additionally, they overlook issues related to truthfulness and factuality when generating responses. These findings illuminate current methodological constraints in adapting language models for image comprehension and provide valuable guidance for researchers and practitioners seeking to harness multimodal versions of LLMs.
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
results: 实验表明,使用该方法可以生成高质量的3D人脸网格数据集,并且可以提高现有3D人脸重建模型的重建精度。代码和数据集将于https://neuface-dataset.github.io/公开发布。Abstract
We propose NeuFace, a 3D face mesh pseudo annotation method on videos via neural re-parameterized optimization. Despite the huge progress in 3D face reconstruction methods, generating reliable 3D face labels for in-the-wild dynamic videos remains challenging. Using NeuFace optimization, we annotate the per-view/-frame accurate and consistent face meshes on large-scale face videos, called the NeuFace-dataset. We investigate how neural re-parameterization helps to reconstruct image-aligned facial details on 3D meshes via gradient analysis. By exploiting the naturalness and diversity of 3D faces in our dataset, we demonstrate the usefulness of our dataset for 3D face-related tasks: improving the reconstruction accuracy of an existing 3D face reconstruction model and learning 3D facial motion prior. Code and datasets will be available at https://neuface-dataset.github.io.
摘要
我们提出了 NeuFace,一种通过神经重参数化优化在视频上进行3D人脸网格伪标注的方法。尽管3D人脸重建方法已取得巨大进步,但为野外动态视频生成可靠的3D人脸标注仍然是一个挑战。利用NeuFace优化,我们在大规模人脸视频上标注了逐视角/逐帧准确且一致的3D人脸网格,并称之为NeuFace数据集。我们通过梯度分析研究了神经重参数化如何帮助在3D网格上重建与图像对齐的面部细节。利用数据集中3D人脸的自然性和多样性,我们展示了该数据集在3D人脸相关任务中的用途:提高现有3D人脸重建模型的重建精度,以及学习3D面部运动先验。代码和数据集将在https://neuface-dataset.github.io上公开。
Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions
For: 本研究旨在综述和比较基于深度强化学习(DRL)的机器调度方法,其目标是在遵循制造规则和作业特性的前提下优化作业到机器的分配。* Methods: 本研究按计算组件对这些方法进行分类,包括传统神经网络、编码器-解码器架构、图神经网络和元启发式算法。* Results: 研究发现,基于DRL的机器调度方法计算速度快,能够生成接近全局最优的解,并已在不同的机器环境和作业特性下取得成功应用。但是,这些方法在处理复杂操作约束、可配置的多目标优化、泛化、可扩展性、可解释性和鲁棒性等方面仍面临挑战。Abstract
Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. This optimization leads to reduced operational costs, improved customer demand fulfillment, and enhanced production efficiency. However, machine scheduling remains a challenging combinatorial problem due to its NP-hard nature. Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics. Researchers have explored applying DRL to machine scheduling problems since 1995. This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations. It categorizes these approaches based on computational components: conventional neural networks, encoder-decoder architectures, graph neural networks, and metaheuristic algorithms. Our review concludes that DRL-based methods outperform exact solvers, heuristics, and tabular reinforcement learning algorithms in terms of computation speed and generating near-global optimal solutions. These DRL-based approaches have been successfully applied to static and dynamic scheduling across diverse machine environments and job characteristics. However, DRL-based schedulers face limitations in handling complex operational constraints, configurable multi-objective optimization, generalization, scalability, interpretability, and robustness. Addressing these challenges will be a crucial focus for future research in this field. This paper serves as a valuable resource for researchers to assess the current state of DRL-based machine scheduling and identify research gaps. It also aids experts and practitioners in selecting the appropriate DRL approach for production scheduling.
摘要
机器调度的目标是在遵循制造规则和作业规格的前提下,优化作业到机器的分配。这种优化可以降低运营成本、更好地满足客户需求并提高生产效率。然而,由于其NP难的性质,机器调度仍然是一个具有挑战性的组合优化问题。深度强化学习(DRL)作为通用人工智能的关键组成部分,已在游戏和机器人等多个领域展现出潜力。自1995年以来,研究人员就开始探索将DRL应用于机器调度问题。本文对基于DRL的方法进行了全面的综述和比较,重点介绍它们的方法论、应用、优势和局限性,并按计算组件将这些方法分类为:传统神经网络、编码器-解码器架构、图神经网络和元启发式算法。我们的综述结论是,基于DRL的方法在计算速度和生成接近全局最优解方面,优于精确求解器、启发式方法和表格型强化学习算法。这些基于DRL的方法已成功应用于各种机器环境和作业特性下的静态与动态调度。然而,基于DRL的调度器在处理复杂操作约束、可配置的多目标优化、泛化、可扩展性、可解释性和鲁棒性等方面仍存在局限。解决这些挑战将是该领域未来研究的重点。本文可作为研究人员评估基于DRL的机器调度现状、识别研究空白的宝贵资源,同时也有助于专家和从业者为生产调度选择合适的DRL方法。
Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication
results: 研究发现,这种交互式沟通过程可以优化知识传递,使学生模型在下游任务上表现更好,并超越了现有最先进的知识蒸馏技术。Abstract
Many recent breakthroughs in machine learning have been enabled by the pre-trained foundation models. By scaling up model parameters, training data, and computation resources, foundation models have significantly advanced the state-of-the-art in many applications. However, it is still an open question of how to use these models to perform downstream tasks efficiently. Knowledge distillation (KD) has been explored to tackle this challenge. KD transfers knowledge from a large teacher model to a smaller student model. While KD has been successful in improving student model performance, recent research has discovered that a powerful teacher does not necessarily lead to a powerful student, due to their huge capacity gap. In addition, the potential distribution shifts between the pre-training data and downstream tasks can make knowledge transfer in KD sub-optimal for improving downstream task performance. In this paper, we extend KD with an interactive communication process to help students of downstream tasks learn effectively from pre-trained foundation models. Our design is inspired by the way humans learn from teachers who can explain knowledge in a way that meets the students' needs. Specifically, we let each model (i.e., student and teacher) train two components: (1) an encoder encoding the model's hidden states to a message and (2) a decoder decoding any messages to its own hidden states. With encoder and decoder, not only can the teacher transfer rich information by encoding its hidden states, but also the student can send messages with information of downstream tasks to the teacher. Therefore, knowledge passing from teacher to student can be tailored to the student's capacity and downstream tasks' distributions. We conducted experiments on benchmark datasets to show that our communication mechanism outperforms state-of-the-art distillation techniques.
摘要
很多最近的机器学习突破都得益于预训练基础模型。通过扩大模型参数、训练数据和计算资源的规模,基础模型显著推进了许多应用领域的最先进水平。然而,如何高效地利用这些模型完成下游任务仍然是一个开放问题。知识蒸馏(KD)已被用来应对这一挑战:KD将知识从大型教师模型传递给较小的学生模型。虽然KD在提升学生模型性能方面取得了成功,但最近的研究发现,由于师生之间巨大的容量差距,强大的教师并不一定能培养出强大的学生。此外,预训练数据与下游任务之间潜在的分布偏移,也可能使KD中的知识传递在提升下游任务性能方面并非最优。在这篇论文中,我们将KD扩展为一个交互式沟通过程,帮助下游任务的学生模型有效地向预训练基础模型学习。我们的设计受到了人类学习方式的启发:好的教师能够以满足学生需求的方式讲解知识。具体来说,我们让每个模型(即学生和教师)训练两个组件:(1)一个编码器,将模型的隐藏状态编码成消息;(2)一个解码器,将任意消息解码回其自身的隐藏状态。借助编码器和解码器,教师不仅可以通过编码其隐藏状态来传递丰富的信息,学生也可以将包含下游任务信息的消息发送给教师。因此,从教师到学生的知识传递可以根据学生的能力和下游任务的分布进行定制。我们在标准基准数据集上进行了实验,证明我们的沟通机制优于当前最先进的蒸馏技术。
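A toy sketch of the encoder/decoder message exchange described above is given below; the hidden and message dimensions, the linear layers, and the alignment losses are illustrative assumptions rather than the paper's actual training procedure.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Communicator(nn.Module):
    def __init__(self, hidden_dim, msg_dim):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, msg_dim)   # hidden state -> message
        self.decoder = nn.Linear(msg_dim, hidden_dim)   # message -> own hidden space

teacher = Communicator(hidden_dim=1024, msg_dim=256)
student = Communicator(hidden_dim=256, msg_dim=256)

t_hidden, s_hidden = torch.randn(8, 1024), torch.randn(8, 256)
msg_t = teacher.encoder(t_hidden)                       # teacher explains its knowledge
msg_s = student.encoder(s_hidden)                       # student sends task information back
# student absorbs the teacher's message; teacher adapts to the student's message
comm_loss = F.mse_loss(student.decoder(msg_t), s_hidden) \
          + F.mse_loss(teacher.decoder(msg_s), t_hidden)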
paper_authors: Rajkumar Vasudeva Raju, Zhe Li, Scott Linderman, Xaq Pitkow
for: 这paper aimed to develop a mathematical framework for inferring canonical distributed computations from large-scale neural activity patterns.
methods: The authors integrated normative and algorithmic theories of neural computation into a mathematical framework, using a nonlinear message-passing algorithm on a graph-structured model of the world.
results: The framework was able to recover the latent variables, their neural representation and dynamics, and canonical message-functions from simulated recordings of a model brain. The authors highlighted features of experimental design needed to successfully extract canonical computations from neural data.Abstract
Patterns of microcircuitry suggest that the brain has an array of repeated canonical computational units. Yet neural representations are distributed, so the relevant computations may only be related indirectly to single-neuron transformations. It thus remains an open challenge how to define canonical distributed computations. We integrate normative and algorithmic theories of neural computation into a mathematical framework for inferring canonical distributed computations from large-scale neural activity patterns. At the normative level, we hypothesize that the brain creates a structured internal model of its environment, positing latent causes that explain its sensory inputs, and uses those sensory inputs to infer the latent causes. At the algorithmic level, we propose that this inference process is a nonlinear message-passing algorithm on a graph-structured model of the world. Given a time series of neural activity during a perceptual inference task, our framework finds (i) the neural representation of relevant latent variables, (ii) interactions between these variables that define the brain's internal model of the world, and (iii) message-functions specifying the inference algorithm. These targeted computational properties are then statistically distinguishable due to the symmetries inherent in any canonical computation, up to a global transformation. As a demonstration, we simulate recordings for a model brain that implicitly implements an approximate inference algorithm on a probabilistic graphical model. Given its external inputs and noisy neural activity, we recover the latent variables, their neural representation and dynamics, and canonical message-functions. We highlight features of experimental design needed to successfully extract canonical computations from neural data. Overall, this framework provides a new tool for discovering interpretable structure in neural recordings.
摘要
微环路的模式表明,大脑拥有一系列重复的标准计算单元。然而,神经表示是分布式的,因此相关的计算可能仅与单个神经元的变换存在间接关系。因此,如何定义标准的分布式计算仍是一个悬而未决的问题。我们将规范性理论与算法性理论整合进一个数学框架,用于从大规模神经活动模式中推断标准的分布式计算。在规范层面,我们假设大脑构建了其环境的结构化内部模型,设定能解释其感觉输入的潜在原因,并利用感觉输入来推断这些潜在原因。在算法层面,我们提出这一推断过程是在图结构化的世界模型上进行的非线性消息传递算法。给定感知推断任务期间的神经活动时间序列,我们的框架可以找到:(i)相关潜变量的神经表示;(ii)这些变量之间定义大脑内部世界模型的相互作用;(iii)指定推断算法的消息函数。由于任何标准计算固有的对称性,这些目标计算性质在全局变换意义下是统计可区分的。作为演示,我们为一个隐式地在概率图模型上执行近似推断算法的模型大脑模拟了记录。给定其外部输入和含噪神经活动,我们能够恢复潜变量、它们的神经表示与动力学以及标准消息函数。我们还强调了从神经数据中成功提取标准计算所需的实验设计要点。总体而言,该框架为在神经记录中发现可解释结构提供了新工具。
Misusing Tools in Large Language Models With Visual Adversarial Examples
results: 作者发现,使用这些对抗示例可以让攻击者让受影响的 LLM invoke 工具,例如删除日历事件、泄露私人对话和预订酒店等。这些攻击可以让用户资源受到攻击而不会被发现,并且可以在多个输入提示上进行攻击。Abstract
Large Language Models (LLMs) are being enhanced with the ability to use tools and to process multiple modalities. These new capabilities bring new benefits and also new security risks. In this work, we show that an attacker can use visual adversarial examples to cause attacker-desired tool usage. For example, the attacker could cause a victim LLM to delete calendar events, leak private conversations and book hotels. Different from prior work, our attacks can affect the confidentiality and integrity of user resources connected to the LLM while being stealthy and generalizable to multiple input prompts. We construct these attacks using gradient-based adversarial training and characterize performance along multiple dimensions. We find that our adversarial images can manipulate the LLM to invoke tools following real-world syntax almost always (~98%) while maintaining high similarity to clean images (~0.9 SSIM). Furthermore, using human scoring and automated metrics, we find that the attacks do not noticeably affect the conversation (and its semantics) between the user and the LLM.
摘要
Neural architecture impact on identifying temporally extended Reinforcement Learning tasks
results: 这 paper 的模型在 OpenAI Gym Atari-2600 游戏集中表现出色,并且提供了更好的解释能力,让 agent 的选择动作更加直观。Abstract
Inspired by recent developments in attention models for image classification and natural language processing, we present various Attention based architectures in reinforcement learning (RL) domain, capable of performing well on OpenAI Gym Atari-2600 game suite. In spite of the recent success of Deep Reinforcement learning techniques in various fields like robotics, gaming and healthcare, they suffer from a major drawback that neural networks are difficult to interpret. We try to get around this problem with the help of Attention based models. In Attention based models, extracting and overlaying of attention map onto images allows for direct observation of information used by agent to select actions and easier interpretation of logic behind the chosen actions. Our models in addition to playing well on gym-Atari environments, also provide insights on how agent perceives its environment. In addition, motivated by recent developments in attention based video-classification models using Vision Transformer, we come up with an architecture based on Vision Transformer, for image-based RL domain too. Compared to previous works in Vision Transformer, our model is faster to train and requires fewer computational resources.
摘要
受最近图像分类和自然语言处理领域注意力模型发展的启发,我们在强化学习(RL)领域提出了多种基于注意力的架构,能够在OpenAI Gym Atari-2600游戏集上表现良好。Despite the recent success of deep reinforcement learning techniques in various fields such as robotics, gaming, and healthcare, these techniques suffer from a major drawback that neural networks are difficult to interpret. To address this problem, we use attention-based models. In attention-based models, extracting and overlaying attention maps onto images allows for direct observation of the information used by the agent to select actions and easier interpretation of the logic behind the chosen actions. Our models not only perform well on gym-Atari environments, but also provide insights into how the agent perceives its environment. In addition, inspired by recent developments in attention-based video classification models using Vision Transformer, we propose an architecture based on Vision Transformer for the image-based RL domain, which is faster to train and requires fewer computational resources than previous works.
Assessment of Prediction Intervals Using Uncertainty Characteristics Curves
results: 论文提出了一种新的评估方法,可以准确评估预测间隔的不确定性。该方法在选定的场景中进行了示例评估,并证明了其在现有评估工具箱中的价值。Abstract
Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to an ad-hoc operating point, making evaluation and comparison across different studies relatively difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a null reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper defines the Uncertainty Characteristics Curve and demonstrates its utility in selected scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.
摘要
长期以来,准确量化模型不确定性被认为是可信人工智能的一项基本要求。在回归任务中,不确定性通常通过校准到某个临时工作点的预测区间来量化,这使得不同研究之间的评估和比较相对困难。我们的工作利用:(1)工作特性曲线的概念,以及(2)相对于空参照的增益,推导出一种与工作点无关的预测区间评估方法。文章定义了不确定性特征曲线(Uncertainty Characteristics Curve),并在选定的场景中展示了其实用性。我们认为,所提出的方法满足了对预测区间进行全面评估的当前需求,因此是不确定性量化工具箱中一项有价值的补充。
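To make the operating-point idea concrete, the sketch below scales all interval half-widths by a factor and records (mean width, empirical miscoverage) pairs, which is one simple way to trace an operating-characteristic style curve for prediction intervals. The paper's exact Uncertainty Characteristics Curve definition may differ; the function and data here are purely illustrative.

import numpy as np

def interval_characteristics_curve(y_true, y_pred, half_width,
                                   scales=np.linspace(0.1, 3.0, 30)):
    curve = []
    for s in scales:
        lo, hi = y_pred - s * half_width, y_pred + s * half_width
        miscoverage = np.mean((y_true < lo) | (y_true > hi))
        mean_width = np.mean(hi - lo)
        curve.append((mean_width, miscoverage))        # one operating point per scale
    return np.array(curve)

rng = np.random.default_rng(0)
y = rng.normal(size=500)
pred = y + rng.normal(scale=0.5, size=500)
ucc = interval_characteristics_curve(y, pred, half_width=np.full(500, 0.5))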
Comprehensive Multimodal Segmentation in Medical Imaging: Combining YOLOv8 with SAM and HQ-SAM Models
methods: 提案方法利用 YOLOv8 模型进行近似的边界框检测,并结合 Segment Anything Model (SAM) 与 High Quality (HQ) SAM 模型实现全自动且精确的分割。
results: 研究结果显示,SAM 模型优于其他两种模型,展现出更高的分割准确性和整体性能;HQ-SAM 虽有潜在优势,但其相对标准 SAM 的增量提升可能不足以抵消额外的计算成本。YOLOv8+SAM 模型在医学图像分割及其临床应用方面展现出潜力。Abstract
This paper introduces a comprehensive approach for segmenting regions of interest (ROI) in diverse medical imaging datasets, encompassing ultrasound, CT scans, and X-ray images. The proposed method harnesses the capabilities of the YOLOv8 model for approximate boundary box detection across modalities, alongside the Segment Anything Model (SAM) and High Quality (HQ) SAM for fully automatic and precise segmentation. To generate boundary boxes, the YOLOv8 model was trained using a limited set of 100 images and masks from each modality. The results obtained from our approach are extensively computed and analyzed, demonstrating its effectiveness and potential in medical image analysis. Various evaluation metrics, including precision, recall, F1 score, and Dice Score, were employed to quantify the accuracy of the segmentation results. A comparative analysis was conducted to assess the individual and combined performance of the YOLOv8, YOLOv8+SAM, and YOLOv8+HQ-SAM models. The results indicate that the SAM model performs better than the other two models, exhibiting higher segmentation accuracy and overall performance. While HQ-SAM offers potential advantages, its incremental gains over the standard SAM model may not justify the additional computational cost. The YOLOv8+SAM model shows promise for enhancing medical image segmentation and its clinical implications.
摘要
To generate boundary boxes, the YOLOv8 model was trained using a limited set of 100 images and masks from each modality. The results show that the SAM model outperforms the other two models in terms of segmentation accuracy and overall performance, with higher precision, recall, F1 score, and Dice Score. While HQ-SAM offers potential advantages, its incremental gains over the standard SAM model may not justify the additional computational cost.The YOLOv8+SAM model demonstrates promise for enhancing medical image segmentation and its clinical implications. Various evaluation metrics were employed to quantify the accuracy of the segmentation results, and a comparative analysis was conducted to assess the individual and combined performance of the three models.
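For readers who want a feel for how such a detection-then-segmentation pipeline is wired together, the sketch below chains a YOLOv8 detector with a box-prompted SAM predictor. It assumes the public ultralytics and segment-anything packages with their usual interfaces, a locally available SAM checkpoint, and placeholder file paths; it is not the authors' benchmarked code.

import cv2
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

image = cv2.cvtColor(cv2.imread("scan.png"), cv2.COLOR_BGR2RGB)

detector = YOLO("yolov8n.pt")                          # in practice, fine-tuned on the ROI data
boxes = detector(image)[0].boxes.xyxy.cpu().numpy()    # (N, 4) boxes in x1, y1, x2, y2

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
predictor = SamPredictor(sam)
predictor.set_image(image)

masks = []
for box in boxes:
    m, _, _ = predictor.predict(box=box, multimask_output=False)
    masks.append(m[0])                                 # boolean mask for this ROI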
Attributing Learned Concepts in Neural Networks to Training Data
paper_authors: Nicholas Konz, Charles Godfrey, Madelyn Shapiro, Jonathan Tu, Henry Kvinge, Davis Brown
For: The paper aims to investigate how deep learning models learn certain human-interpretable features and which inputs from the model’s original training set are most important for learning a concept at a given layer.* Methods: The authors use data attribution methods combined with probing the concepts learned by a model. They train network and probe ensembles for two concept datasets on a range of network layers and use the recently developed TRAK method for large-scale data attribution.* Results: The authors find evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.Here is the information in Simplified Chinese text:
results: 作者发现了收敛的证据:移除某一概念归因最高的10,000张图像并重新训练模型,既不会改变该概念在网络中的位置,也不会改变其探针稀疏性。这表明支撑概念形成的特征并不依赖于少数特定样本,而是以更弥散的方式分布在其示例之中,意味着概念形成具有鲁棒性。Abstract
By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model's original training set were most important for learning a concept at a given layer. To answer this, we combine data attribution methods with methods for probing the concepts learned by a model. Training network and probe ensembles for two concept datasets on a range of network layers, we use the recently developed TRAK method for large-scale data attribution. We find some evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that rather than being highly dependent on a few specific examples, the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
摘要
到目前为止,已有大量证据表明深度学习模型在其对数据的内部表示中学习了某些人类可解释的特征。由于拥有正确(或错误)的概念对可信的机器学习系统至关重要,自然会问:在模型的原始训练集中,哪些输入对于在给定层学习某一概念最为重要?为了回答这个问题,我们将数据归因方法与探测模型所学概念的方法结合使用。我们在两个概念数据集上、针对多个网络层训练网络与探针的集成,并使用最近提出的TRAK方法进行大规模数据归因。我们发现了一些收敛的证据:移除某一概念归因最高的10,000张图像并重新训练模型,不会改变该概念在网络中的位置,也不会改变该概念的探针稀疏性。这表明,支撑概念形成的特征并不高度依赖于少数特定样本,而是以更弥散的方式分布在其示例之中,意味着概念形成具有鲁棒性。
Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models
results: 试验结果显示,相比于广泛的 fine-tuning,Fed-BBPT 能够有效地解决内存限制和隐私问题,并且在40个数据集上进行了 Thorough 的评估。Abstract
With the blowout development of pre-trained models (PTMs), the efficient tuning of these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning, even a fraction of their parameters, challenging for many practitioners. (2) model privacy: existing PTMs often function as public API services, with their parameters inaccessible for effective or tailored fine-tuning. (3) data privacy: the fine-tuning of PTMs necessitates high-quality datasets, which are typically localized and not shared to public. To optimally harness each local dataset while navigating memory constraints and preserving privacy, we propose Federated Black-Box Prompt Tuning (Fed-BBPT). This innovative approach eschews reliance on parameter architectures and private dataset access, instead capitalizing on a central server that aids local users in collaboratively training a prompt generator through regular aggregation. Local users leverage API-driven learning via a zero-order optimizer, obviating the need for PTM deployment. Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines, tapping into comprehensive, high-quality, yet private training datasets. A thorough evaluation across 40 datasets spanning CV and NLP tasks underscores the robustness of our proposed model.
摘要
随着预训练模型(PTM)的爆发式发展,如何高效地调优这些模型以适应多样化的下游应用,已成为一个关键的研究问题。虽然最近关于提示调优的研究提供了有希望的途径,但仍存在三个突出的挑战:(1)内存约束:开源PTM的规模持续增长,即使只微调其中一部分参数,对许多从业者而言也十分困难。(2)模型隐私:现有PTM通常以公共API服务的形式提供,其参数不可访问,难以进行有效或定制化的微调。(3)数据隐私:PTM的微调需要高质量的数据集,而这些数据通常是本地化的,不会公开共享。为了在应对内存约束和保护隐私的同时充分利用每个本地数据集,我们提出了联邦黑盒提示调优(Fed-BBPT)。这种创新方法不依赖参数架构和私有数据集的访问,而是借助一个中央服务器,通过定期聚合帮助本地用户协同训练一个提示生成器。本地用户通过API驱动、基于零阶优化器的学习方式进行训练,无需在本地部署PTM。相比大规模微调,Fed-BBPT有效规避了在本地机器上存储和微调PTM所带来的内存难题,同时能够利用全面、高质量但私有的训练数据。我们在涵盖CV和NLP任务的40个数据集上进行了全面评估,凸显了所提模型的稳健性。
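The zero-order optimizer mentioned above can be pictured with a standard two-point gradient estimate: the continuous prompt is perturbed in a random direction, the black-box loss returned by the remote model is queried twice, and the difference approximates a directional derivative. The sketch below is a generic illustration only; api_loss is a placeholder for the remote evaluation, not an actual API.

import numpy as np

def zo_step(prompt, api_loss, lr=0.1, mu=1e-2):
    u = np.random.randn(*prompt.shape)                 # random perturbation direction
    g = (api_loss(prompt + mu * u) - api_loss(prompt - mu * u)) / (2 * mu)
    return prompt - lr * g * u                         # two-point zeroth-order update

# toy quadratic objective standing in for the loss returned by the remote PTM
target = np.ones(16)
api_loss = lambda p: float(np.sum((p - target) ** 2))
prompt = np.zeros(16)
for _ in range(200):
    prompt = zo_step(prompt, api_loss)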
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
results: 通过在六个推理基准数据集上进行实验,使用GPT-3.5-turbo和GPT-4作为较弱和较强LLM,分别显示了我们提议的LLM层次结构可以与使用 solo 较强LLM获得相同的性能,但需要只有40%的成本。Abstract
Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
摘要
大型自然语言模型(LLM)如GPT-4在多种任务中表现出了非常出色的表现,但这强大的表现经常是通过使用付费API服务来实现的。在这篇论文中,我们受到了建立LLM堆叠以降低使用LLM的成本的动机。我们的堆叠管道遵循了intuition,即更简单的问题可以由较弱但更有经济的LLM来解决,而只有复杂的问题需要更强大且更昂贵的LLM。为实现这种决策,我们考虑了使用较弱LLM的答案一致性作为问题难度的信号,并提出了一些答案采样和一致性检查方法,其中一种利用两种思维表示(即链条思维和计划思维)的混合。通过在六个逻辑 bench mark 数据集上进行实验,使用GPT-3.5-turbo和GPT-4作为较弱和更强的LLM,我们示出了我们提议的LLM堆叠可以与使用唯一的更强LLM实现相同的表现,但需要的成本只有40%。
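A minimal sketch of the answer-consistency routing described above: sample several answers from the cheap model, measure agreement, and escalate to the expensive model only when agreement is low. `weak_llm` and `strong_llm` are hypothetical callables, and the threshold is illustrative; the paper's mixture of Chain-of-Thought and Program-of-Thought sampling is not reproduced here.

```python
# Sketch of an LLM cascade driven by answer consistency of the weaker model.
from collections import Counter
from typing import Callable

def cascade_answer(question: str,
                   weak_llm: Callable[[str], str],
                   strong_llm: Callable[[str], str],
                   n_samples: int = 5,
                   agreement_threshold: float = 0.8) -> str:
    # Sample several answers from the cheaper model (e.g., with temperature > 0).
    samples = [weak_llm(question) for _ in range(n_samples)]
    answer, count = Counter(samples).most_common(1)[0]
    agreement = count / n_samples

    # High agreement is treated as a signal that the question is easy enough.
    if agreement >= agreement_threshold:
        return answer
    # Otherwise escalate to the stronger, more expensive model.
    return strong_llm(question)

# Toy usage with stub models (real code would call GPT-3.5-turbo / GPT-4 APIs).
weak = lambda q: "42"
strong = lambda q: "42"
print(cascade_answer("What is 6 * 7?", weak, strong))
```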
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
results: 我们在多个GPT2变种上应用了我们的方法,并发现了高度稀疏的子网络(稀疏度98%以上),这些子网络仅负责特定的关系知识。当这些子网络被移除时,原始语言模型保留了大部分的语言能力和其他已记忆的关系知识,但无法再表达被移除的知识,并在需要该知识的下游任务中表现出明显的性能下降。Abstract
Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. However, localizing these representations and disentangling them from each other remains an open problem. In this work, we investigate whether pretrained language models contain various knowledge-critical subnetworks: particular sparse computational subgraphs responsible for encoding specific knowledge the model has memorized. We propose a multi-objective differentiable weight masking scheme to discover these subnetworks and show that we can use them to precisely remove specific knowledge from models while minimizing adverse effects on the behavior of the original language model. We demonstrate our method on multiple GPT2 variants, uncovering highly sparse subnetworks (98%+) that are solely responsible for specific collections of relational knowledge. When these subnetworks are removed, the remaining network maintains most of its initial capacity (modeling language and other memorized relational knowledge) but struggles to express the removed knowledge, and suffers performance drops on examples needing this removed knowledge on downstream tasks after finetuning.
摘要
预训练语言模型(LM)在其参数中编码了知识的隐式表示。然而,定位这些表示并将它们彼此解耦仍然是一个开放问题。在这项工作中,我们研究预训练语言模型是否包含各种知识关键子网络:即负责编码模型所记忆的特定知识的稀疏计算子图。我们提出了一种多目标可微分权重掩码方案来发现这些子网络,并证明可以利用它们精确地从模型中移除特定知识,同时将对原始语言模型行为的负面影响降到最低。我们在多个GPT2变体上应用了这种方法,揭示出高度稀疏的子网络(稀疏度98%以上),它们仅负责特定的关系知识集合。当这些子网络被移除后,剩余网络保留了大部分初始能力(语言建模以及其他已记忆的关系知识),但难以再表达被移除的知识,并且经微调后在需要该知识的下游任务样本上表现下降。
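The multi-objective differentiable weight masking above can be sketched as learning a sigmoid gate per weight, trading off suppression of the targeted knowledge against sparsity of the removed subnetwork and preservation of general behavior. The toy layer, the specific suppression objective, and the loss weights below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: learn a per-weight mask that removes "target" behavior while keeping
# the mask mostly ones and preserving behavior on "retain" data.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layer = nn.Linear(64, 64)
for p in layer.parameters():
    p.requires_grad_(False)                                  # base weights stay frozen

mask_logits = nn.Parameter(torch.full_like(layer.weight, 3.0))  # sigmoid ~ 0.95 at init
opt = torch.optim.Adam([mask_logits], lr=1e-2)

x_target = torch.randn(32, 64)                               # data exercising the knowledge to remove
x_retain = torch.randn(32, 64)                               # data whose behavior must be preserved
y_target_orig = layer(x_target).detach()
y_retain = layer(x_retain).detach()

for step in range(300):
    mask = torch.sigmoid(mask_logits)
    masked = lambda x: F.linear(x, layer.weight * mask, layer.bias)

    loss_suppress = -F.mse_loss(masked(x_target), y_target_orig)   # deviate from memorized behavior
    loss_retain = F.mse_loss(masked(x_retain), y_retain)           # keep general behavior
    loss_sparse = (1.0 - mask).mean()                              # keep the removed subnetwork small

    loss = loss_suppress + 10.0 * loss_retain + 0.1 * loss_sparse
    opt.zero_grad()
    loss.backward()
    opt.step()

removed = (torch.sigmoid(mask_logits) < 0.5).float().mean()
print(f"fraction of weights gated off: {removed:.3f}")
```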
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
For: This paper aims to address the challenges faced by existing learning-based autonomous driving (AD) systems, such as comprehending high-level information, generalizing to rare events, and providing interpretability.
Methods: The paper employs Large Language Models (LLMs) as a decision-making component for complex AD scenarios, and develops algorithms for translating LLM decisions into actionable driving commands. The proposed method integrates LLM decisions with low-level controllers through guided parameter matrix adaptation.
Results: The proposed method consistently surpasses baseline approaches in single-vehicle tasks and handles complex driving behaviors, including multi-vehicle coordination, thanks to the commonsense reasoning capabilities of LLMs. The method demonstrates improvements in safety, efficiency, generalizability, and interoperability.
为: 这篇论文旨在解决现有基于学习的自动驾驶(AD)系统面临的挑战,包括理解高级信息、泛化到罕见事件以及提供可解释性。
方法: 论文使用大型语言模型(LLM)作为复杂 AD 场景的决策组件,并开发了将 LLM 决策转化为可执行驾驶指令的算法。所提出的方法通过引导参数矩阵适应将 LLM 决策与低级控制器集成。
结果: 所提出的方法在单车任务上一致地超越基线方法,并且得益于 LLM 的常识推理能力,能够处理包括多车协调在内的复杂驾驶行为。这些结果表明该方法在安全性、效率、泛化性和互操作性等方面具有优势。Abstract
Existing learning-based autonomous driving (AD) systems face challenges in comprehending high-level information, generalizing to rare events, and providing interpretability. To address these problems, this work employs Large Language Models (LLMs) as a decision-making component for complex AD scenarios that require human commonsense understanding. We devise cognitive pathways to enable comprehensive reasoning with LLMs, and develop algorithms for translating LLM decisions into actionable driving commands. Through this approach, LLM decisions are seamlessly integrated with low-level controllers by guided parameter matrix adaptation. Extensive experiments demonstrate that our proposed method not only consistently surpasses baseline approaches in single-vehicle tasks, but also helps handle complex driving behaviors even multi-vehicle coordination, thanks to the commonsense reasoning capabilities of LLMs. This paper presents an initial step toward leveraging LLMs as effective decision-makers for intricate AD scenarios in terms of safety, efficiency, generalizability, and interoperability. We aspire for it to serve as inspiration for future research in this field. Project page: https://sites.google.com/view/llm-mpc
摘要
现有的基于学习的自动驾驶(AD)系统在理解高级信息、泛化到罕见事件以及提供可解释性方面面临挑战。为解决这些问题,本研究将大型语言模型(LLM)作为决策组件,应用于需要人类常识理解的复杂 AD 场景。我们设计了认知路径以实现基于 LLM 的全面推理,并开发了将 LLM 决策转换为可执行驾驶指令的算法。通过这种方式,LLM 的决策借助引导参数矩阵适应与低层控制器无缝集成。大量实验表明,得益于 LLM 的常识推理能力,我们提出的方法不仅在单车任务中稳定超越基线方法,还能处理复杂的驾驶行为,甚至多车协调。这篇论文迈出了在安全性、效率、泛化性和互操作性方面利用 LLM 作为复杂 AD 场景有效决策者的第一步。我们希望它能为该领域的后续研究提供启发。项目页面:https://sites.google.com/view/llm-mpc
Retrieval meets Long Context Large Language Models
for: This paper aims to compare the effectiveness of retrieval-augmentation and long context extension for improving the performance of large language models (LLMs) on downstream tasks.
methods: The authors use two state-of-the-art pretrained LLMs, a proprietary 43B GPT and LLaMA2-70B, and compare the performance of these models with and without retrieval-augmentation and long context extension.
results: The authors find that retrieval-augmentation can significantly improve the performance of LLMs, regardless of their extended context window sizes. Their best model, retrieval-augmented LLaMA2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on seven long context tasks, while being much faster at generation.
results: 作者发现,检索增强可以显著提升 LLM 的表现,无论其扩展上下文窗口的大小如何。最佳模型是具有 32K 上下文窗口的检索增强 LLaMA2-70B,在七个长上下文任务上的平均分数优于 GPT-3.5-turbo-16k 和 Davinci003,同时生成速度也快得多。Abstract
Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by studying both solutions using two state-of-the-art pretrained LLMs, i.e., a proprietary 43B GPT and LLaMA2-70B. Perhaps surprisingly, we find that LLM with 4K context window using simple retrieval-augmentation at generation can achieve comparable performance to finetuned LLM with 16K context window via positional interpolation on long context tasks, while taking much less computation. More importantly, we demonstrate that retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes. Our best model, retrieval-augmented LLaMA2-70B with 32K context window, outperforms GPT-3.5-turbo-16k and Davinci003 in terms of average score on seven long context tasks including question answering and query-based summarization. It also outperforms its non-retrieval LLaMA2-70B-32k baseline by a margin, while being much faster at generation. Our study provides general insights on the choice of retrieval-augmentation versus long context extension of LLM for practitioners.
摘要
大型语言模型(LLM)的上下文窗口扩展最近非常流行,而通过检索来增强 LLM 的方案已经存在多年。由此引出两个自然的问题:i) 检索增强与长上下文窗口,哪一个更适合下游任务?ii) 两种方法能否结合,以兼得两者之长?在这项研究中,我们使用两个最先进的预训练 LLM,即一个私有的 43B GPT 和 LLaMA2-70B,来回答这些问题。或许令人意外的是,我们发现,4K 上下文窗口的 LLM 在生成时配合简单的检索增强,就能在长上下文任务上取得与通过位置插值微调得到的 16K 上下文窗口 LLM 相当的性能,而所需计算量要少得多。更重要的是,我们证明无论扩展后的上下文窗口有多大,检索都能显著提升 LLM 的性能。我们的最佳模型是具有 32K 上下文窗口的检索增强 LLaMA2-70B,在包括问答和基于查询的摘要在内的七个长上下文任务上的平均得分高于 GPT-3.5-turbo-16k 和 Davinci003,并以明显优势超过不使用检索的 LLaMA2-70B-32k 基线,同时生成速度也快得多。我们的研究为实践者在检索增强与长上下文扩展之间做出选择提供了一般性的见解。
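A minimal sketch of retrieval-augmentation at generation time: embed the document chunks, take the top-k most similar to the query, and prepend them to the prompt. TF-IDF stands in for a neural retriever, and `generate` is a hypothetical placeholder for an LLM call; the 43B GPT / LLaMA2-70B setups studied in the paper are not reproduced.

```python
# Sketch of retrieval augmentation: pick top-k chunks for the query and prepend
# them to the prompt before calling the LLM.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "Positional interpolation extends the context window of a pretrained LLM.",
    "Retrieval augmentation prepends relevant passages to the prompt.",
    "Query-based summarization condenses a document with respect to a question.",
]

def retrieve(query: str, k: int = 2):
    vec = TfidfVectorizer().fit(chunks + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(chunks)).ravel()
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def generate(prompt: str) -> str:
    return f"<LLM answer for prompt of {len(prompt)} chars>"   # placeholder LLM call

query = "How does retrieval augmentation help a 4K-context model?"
context = "\n".join(retrieve(query))
answer = generate(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
print(answer)
```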
Human-oriented Representation Learning for Robotic Manipulation
results: 实验表明,这 paper 的 Task Fusion Decoder 可以有效地改善三种state-of-the-art 视觉编码器(R3M、MVP、EgoVLP)的表征,以便下游 manipulate 策略学习。Abstract
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with the environments in manipulation tasks. We advocate that such a representation automatically arises from simultaneously learning about multiple simple perceptual skills that are critical for everyday scenarios (e.g., hand detection, state estimate, etc.) and is better suited for learning robot manipulation policies compared to current state-of-the-art visual representations purely based on self-supervised objectives. We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders, where each task is a perceptual skill tied to human-environment interactions. We introduce Task Fusion Decoder as a plug-and-play embedding translator that utilizes the underlying relationships among these perceptual skills to guide the representation learning towards encoding meaningful structure for what's important for all perceptual skills, ultimately empowering learning of downstream robotic manipulation tasks. Extensive experiments across a range of robotic tasks and embodiments, in both simulations and real-world environments, show that our Task Fusion Decoder consistently improves the representation of three state-of-the-art visual encoders including R3M, MVP, and EgoVLP, for downstream manipulation policy-learning. Project page: https://sites.google.com/view/human-oriented-robot-learning
摘要
人类天生拥有可泛化的视觉表示,使其能够在操作任务中高效地探索环境并与之交互。我们认为,这种表示可以通过同时学习多个对日常场景至关重要的简单感知技能(例如手部检测、状态估计等)自动产生,并且相比当前仅基于自监督目标的最先进视觉表示,更适合用于学习机器人操作策略。我们通过在预训练视觉编码器之上进行以人为中心的多任务微调来形式化这一想法,其中每个任务都是与人-环境交互相关的感知技能。我们引入任务融合解码器(Task Fusion Decoder)作为即插即用的嵌入转换器,它利用这些感知技能之间的内在关联来引导表示学习,使表示编码出对所有感知技能都重要的有意义结构,最终助力下游机器人操作任务的学习。我们在仿真和真实环境中、针对多种机器人任务和机器人形态进行了广泛实验,结果表明,我们的任务融合解码器能持续改进 R3M、MVP 和 EgoVLP 这三个最先进视觉编码器的表示,以利于下游操作策略学习。项目页面:https://sites.google.com/view/human-oriented-robot-learning
AstroCLIP: Cross-Modal Pre-Training for Astronomical Foundation Models
paper_authors: Francois Lanusse, Liam Parker, Siavash Golkar, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Geraud Krawezik, Michael McCabe, Ruben Ohana, Mariel Pettee, Bruno Regaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
for: bridging the gap between diverse observational modalities in astronomy, specifically between images and optical spectra of galaxies
methods: cross-modal contrastive learning approach using multi-band images and optical spectra from the Dark Energy Spectroscopic Instrument (DESI)
results: highly informative embeddings of both modalities that can be used for accurate cross-modal searches, and encoding valuable physical information about the galaxies (redshift and stellar mass) that can be used for competitive zero- and few-shot predictions without further finetuning.Abstract
We present AstroCLIP, a strategy to facilitate the construction of astronomical foundation models that bridge the gap between diverse observational modalities. We demonstrate that a cross-modal contrastive learning approach between images and optical spectra of galaxies yields highly informative embeddings of both modalities. In particular, we apply our method on multi-band images and optical spectra from the Dark Energy Spectroscopic Instrument (DESI), and show that: (1) these embeddings are well-aligned between modalities and can be used for accurate cross-modal searches, and (2) these embeddings encode valuable physical information about the galaxies -- in particular redshift and stellar mass -- that can be used to achieve competitive zero- and few- shot predictions without further finetuning. Additionally, in the process of developing our approach, we also construct a novel, transformer-based model and pretraining approach for processing galaxy spectra.
摘要
我们提出了 AstroCLIP,一种促进天文基础模型构建的策略,用于弥合不同观测模态之间的差距。我们证明,在星系图像与光学光谱之间进行跨模态对比学习,可以为两种模态得到信息量很高的嵌入。特别地,我们将该方法应用于暗能量光谱仪(DESI)的多波段图像和光学光谱,并证明了以下两点:1. 这些嵌入在模态之间良好对齐,可用于精准的跨模态检索;2. 这些嵌入编码了星系的有价值物理信息,例如红移和恒星质量,可用于在无需进一步微调的情况下实现有竞争力的零样本和少样本预测。此外,在开发该方法的过程中,我们还构建了一种新的基于 transformer 的模型和预训练方法,用于处理星系光谱。
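The cross-modal contrastive objective above can be sketched as a standard CLIP-style symmetric InfoNCE loss between image embeddings and spectrum embeddings of the same galaxies. The embedding dimensions, temperature, and random "encoders" below are illustrative placeholders rather than AstroCLIP's actual architecture.

```python
# Sketch of a CLIP-style contrastive loss between image and spectrum embeddings.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, dim = 8, 128
img_emb = F.normalize(torch.randn(batch, dim), dim=-1)    # from an image encoder
spec_emb = F.normalize(torch.randn(batch, dim), dim=-1)   # from a spectrum encoder
temperature = 0.07

# Matched image/spectrum pairs sit on the diagonal of the similarity matrix.
logits = img_emb @ spec_emb.t() / temperature
targets = torch.arange(batch)

loss = 0.5 * (F.cross_entropy(logits, targets) +
              F.cross_entropy(logits.t(), targets))
print("contrastive loss:", loss.item())
```

Once trained, the aligned embeddings support cross-modal search (nearest spectrum for an image and vice versa) and simple regression heads for quantities such as redshift.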
SemiReward: A General Reward Model for Semi-supervised Learning
results: 对于13个标准 SSL benchmark 的三种模式,广泛的实验表明 SemiReward 可以获得显著性能提升和更快的整合速度,比 Pseudo Label、FlexMatch 和 Free/SoftMatch 更好。Abstract
Semi-supervised learning (SSL) has witnessed great progress with various improvements in the self-training framework with pseudo labeling. The main challenge is how to distinguish high-quality pseudo labels against the confirmation bias. However, existing pseudo-label selection strategies are limited to pre-defined schemes or complex hand-crafted policies specially designed for classification, failing to achieve high-quality labels, fast convergence, and task versatility simultaneously. To these ends, we propose a Semi-supervised Reward framework (SemiReward) that predicts reward scores to evaluate and filter out high-quality pseudo labels, which is pluggable to mainstream SSL methods in wide task types and scenarios. To mitigate confirmation bias, SemiReward is trained online in two stages with a generator model and subsampling strategy. With classification and regression tasks on 13 standard SSL benchmarks of three modalities, extensive experiments verify that SemiReward achieves significant performance gains and faster convergence speeds upon Pseudo Label, FlexMatch, and Free/SoftMatch.
摘要
半监督学习(SSL)借助伪标签自训练框架的各种改进取得了显著进展。其主要挑战在于如何在确认偏差(confirmation bias)下辨别高质量的伪标签。然而,现有的伪标签选择策略局限于预先定义的方案或专为分类设计的复杂手工策略,难以同时实现高质量标签、快速收敛和任务通用性。为此,我们提出了一种半监督奖励框架(SemiReward),它通过预测奖励分数来评估并筛选高质量伪标签,并可插拔地与主流 SSL 方法在广泛的任务类型和场景中结合使用。为了缓解确认偏差,SemiReward 采用生成器模型和子采样策略,分两个阶段在线训练。在覆盖三种模态的 13 个标准 SSL 基准上开展的分类与回归实验表明,SemiReward 相比 Pseudo Label、FlexMatch 和 Free/SoftMatch 能带来显著的性能提升和更快的收敛速度。
Soft Convex Quantization: Revisiting Vector Quantization with Convex Optimization
For: + The paper is written for extracting informative discrete latent representations using Vector Quantization (VQ) and mitigating its practical challenges.* Methods: + The paper proposes Soft Convex Quantization (SCQ) as a direct substitute for VQ, which is a differentiable convex optimization (DCO) layer that solves for the optimal convex combination of codebook vectors to quantize inputs.* Results: + The paper shows that SCQ significantly outperforms matched VQ-based architectures in terms of image reconstruction and codebook usage, with an order of magnitude improvement and comparable quantization runtime. The paper also demonstrates the efficacy of SCQ on the CIFAR-10, GTSRB, and LSUN datasets.Abstract
Vector Quantization (VQ) is a well-known technique in deep learning for extracting informative discrete latent representations. VQ-embedded models have shown impressive results in a range of applications including image and speech generation. VQ operates as a parametric K-means algorithm that quantizes inputs using a single codebook vector in the forward pass. While powerful, this technique faces practical challenges including codebook collapse, non-differentiability and lossy compression. To mitigate the aforementioned issues, we propose Soft Convex Quantization (SCQ) as a direct substitute for VQ. SCQ works like a differentiable convex optimization (DCO) layer: in the forward pass, we solve for the optimal convex combination of codebook vectors that quantize the inputs. In the backward pass, we leverage differentiability through the optimality conditions of the forward solution. We then introduce a scalable relaxation of the SCQ optimization and demonstrate its efficacy on the CIFAR-10, GTSRB and LSUN datasets. We train powerful SCQ autoencoder models that significantly outperform matched VQ-based architectures, observing an order of magnitude better image reconstruction and codebook usage with comparable quantization runtime.
摘要
向量量化(VQ)是深度学习中广泛使用的一种技术,用于提取富含信息的离散潜在表示。基于 VQ 的模型在图像和语音生成等一系列应用中取得了令人印象深刻的效果。VQ 相当于一种参数化的 K-means 算法,在前向传播中使用单个码本向量对输入进行量化。尽管强大,这种技术仍面临一些实际问题,包括码本塌缩、不可微性和有损压缩。为了解决上述问题,我们提出软凸量化(SCQ)作为 VQ 的直接替代。SCQ 类似于一个可微凸优化(DCO)层:在前向传播中,我们求解用于量化输入的码本向量的最优凸组合;在反向传播中,我们利用前向解的最优性条件获得可微性。随后,我们提出了 SCQ 优化问题的可扩展松弛,并在 CIFAR-10、GTSRB 和 LSUN 数据集上验证了其有效性。我们训练的 SCQ 自编码器模型显著优于参数规模匹配的 VQ 架构,在量化耗时相当的情况下,图像重建和码本利用率提升了一个数量级。
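To contrast the two quantizers described above, the sketch below implements the standard VQ forward pass (nearest codebook vector) alongside a simplified soft convex combination, where the weights come from a softmax over negative distances. This softmax relaxation only illustrates "a convex combination of codebook vectors"; it is not the paper's differentiable convex optimization (DCO) layer.

```python
# Sketch: hard vector quantization vs. a soft convex combination of codebook vectors.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_codes, dim = 16, 8
codebook = torch.randn(num_codes, dim)
z = torch.randn(4, dim)                          # encoder outputs to be quantized

dists = torch.cdist(z, codebook)                 # (batch, num_codes)

# Standard VQ: snap each input to its single nearest codebook vector.
hard_idx = dists.argmin(dim=1)
z_vq = codebook[hard_idx]

# Simplified soft quantization: convex weights over the whole codebook.
weights = F.softmax(-dists / 0.5, dim=1)         # rows sum to 1 (a convex combination)
z_soft = weights @ codebook                      # differentiable w.r.t. z and codebook

print("hard reconstruction error:", F.mse_loss(z_vq, z).item())
print("soft reconstruction error:", F.mse_loss(z_soft, z).item())
```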
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
results: 这篇论文在多modal 和单modal 模型和数据集上进行验证,证明了与传统剪减方法相比,这种方法在高简化状态下能够提高性能。Abstract
Large Vision-Language Models (LVLMs) can understand the world comprehensively by integrating rich information from different modalities, achieving remarkable performance improvements on various multimodal downstream tasks. However, deploying LVLMs is often problematic due to their massive computational/energy costs and carbon consumption. Such issues make it infeasible to adopt conventional iterative global pruning, which is costly due to computing the Hessian matrix of the entire large model for sparsification. Alternatively, several studies have recently proposed layer-wise pruning approaches to avoid the expensive computation of global pruning and efficiently compress model weights according to their importance within a layer. However, these methods often suffer from suboptimal model compression due to their lack of a global perspective. To address this limitation in recent efficient pruning methods for large models, we propose Efficient Coarse-to-Fine Layer-Wise Pruning (ECoFLaP), a two-stage coarse-to-fine weight pruning approach for LVLMs. We first determine the sparsity ratios of different layers or blocks by leveraging the global importance score, which is efficiently computed based on the zeroth-order approximation of the global model gradients. Then, the multimodal model performs local layer-wise unstructured weight pruning based on globally-informed sparsity ratios. We validate our proposed method across various multimodal and unimodal models and datasets, demonstrating significant performance improvements over prevalent pruning techniques in the high-sparsity regime.
摘要
大型视觉语言模型(LVLM)能够整合来自不同模态的丰富信息,从而全面地理解世界,并在多种多模态下游任务上取得显著的性能提升。然而,LVLM 的部署常常面临巨大的计算/能源成本和碳排放问题。这些问题使得传统的迭代式全局剪枝难以采用,因为对整个大模型计算用于稀疏化的 Hessian 矩阵代价高昂。作为替代,最近一些研究提出了逐层剪枝方法,以避免全局剪枝的昂贵计算,并根据权重在层内的重要性高效地压缩模型。然而,由于缺乏全局视角,这些方法往往导致次优的模型压缩。为了解决近期大模型高效剪枝方法的这一局限,我们提出了高效的由粗到细逐层剪枝方法(ECoFLaP),这是一种面向 LVLM 的两阶段粗到细权重剪枝方法。我们首先利用全局重要性分数确定不同层或块的稀疏比率,该分数基于全局模型梯度的零阶近似高效计算;随后,多模态模型依据这些全局信息指导的稀疏比率,执行局部的逐层非结构化权重剪枝。我们在多种多模态和单模态模型及数据集上验证了所提方法,结果表明在高稀疏度条件下,它相比主流剪枝技术有显著的性能提升。
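A schematic of the coarse-to-fine idea above: a cheap global importance score decides how much each layer may be pruned, and then each layer performs local magnitude pruning at its assigned ratio. The importance proxy (mean absolute weight) and the toy model are illustrative assumptions; ECoFLaP's zeroth-order gradient estimate and LVLM setting are not reproduced.

```python
# Sketch of globally-informed, layer-wise magnitude pruning.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 32), nn.ReLU(),
                      nn.Linear(32, 10))
target_sparsity = 0.5

# Coarse step: a per-layer importance proxy sets how much each layer can afford
# to lose; less important layers are pruned more aggressively.
layers = [m for m in model if isinstance(m, nn.Linear)]
importance = torch.tensor([m.weight.abs().mean().item() for m in layers])
inv = 1.0 / importance
ratios = (target_sparsity * inv / inv.mean()).clamp(max=0.95)

# Fine step: unstructured magnitude pruning inside each layer at its ratio.
for layer, ratio in zip(layers, ratios):
    w = layer.weight.data
    k = int(ratio * w.numel())
    if k > 0:
        thresh = w.abs().flatten().kthvalue(k).values
        w[w.abs() <= thresh] = 0.0

for i, layer in enumerate(layers):
    sparsity = (layer.weight == 0).float().mean().item()
    print(f"layer {i}: sparsity {sparsity:.2f}")
```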
Multiple Physics Pretraining for Physical Surrogate Models
results: authors Validated MPP 的效果在预训练和下游任务中,并显示了一个 MPP 预训练的 transformer 能够与任务特定的基eline 匹配或超越,无需进行微调。 在下游任务中,finetuning MPP 预训练的模型可以在新的物理上提供更准确的预测,比如在多个时间步上。 authors 还开源了他们的代码和模型权重,以便重现和社区实验。Abstract
We introduce multiple physics pretraining (MPP), an autoregressive task-agnostic pretraining approach for physical surrogate modeling. MPP involves training large surrogate models to predict the dynamics of multiple heterogeneous physical systems simultaneously by learning features that are broadly useful across diverse physical tasks. In order to learn effectively in this setting, we introduce a shared embedding and normalization strategy that projects the fields of multiple systems into a single shared embedding space. We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. For downstream tasks, we demonstrate that finetuning MPP-trained models results in more accurate predictions across multiple time-steps on new physics compared to training from scratch or finetuning pretrained video foundation models. We open-source our code and model weights trained at multiple scales for reproducibility and community experimentation.
摘要
我们介绍了多物理预训练(MPP),一种与任务无关的自回归预训练方法,用于物理代理模型。MPP 通过训练大型代理模型同时预测多种异构物理系统的动力学,学习在多种物理任务中普遍有用的特征。为了在这种设定下有效学习,我们引入了一种共享嵌入与归一化策略,将多个系统的物理场投影到同一个共享嵌入空间中。我们在一个面向流体力学的广泛基准上验证了该方法在预训练和下游任务上的有效性。结果表明,单个经 MPP 预训练的 transformer 无需微调即可在所有预训练子任务上匹配或超越任务特定的基线模型。在下游任务中,我们证明对 MPP 预训练模型进行微调,相比从零开始训练或微调预训练的视频基础模型,能在新的物理问题上、跨多个时间步给出更准确的预测。我们开源了在多种规模下训练的代码和模型权重,以便复现和社区实验。
xVal: A Continuous Number Encoding for Large Language Models
paper_authors: Siavash Golkar, Mariel Pettee, Michael Eickenberg, Alberto Bietti, Miles Cranmer, Geraud Krawezik, Francois Lanusse, Michael McCabe, Ruben Ohana, Liam Parker, Bruno Régaldo-Saint Blancard, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
results: 作者通过对一些 sintetic 和实际世界数据进行实验,发现 xVal 比现有的数字编码方案更加Token效率,并且在总体上具有更好的泛化性。Abstract
Large Language Models have not yet been broadly adapted for the analysis of scientific datasets due in part to the unique difficulties of tokenizing numbers. We propose xVal, a numerical encoding scheme that represents any real number using just a single token. xVal represents a given real number by scaling a dedicated embedding vector by the number value. Combined with a modified number-inference approach, this strategy renders the model end-to-end continuous when considered as a map from the numbers of the input string to those of the output string. This leads to an inductive bias that is generally more suitable for applications in scientific domains. We empirically evaluate our proposal on a number of synthetic and real-world datasets. Compared with existing number encoding schemes, we find that xVal is more token-efficient and demonstrates improved generalization.
摘要
大型语言模型尚未被广泛用于科学数据集的分析,部分原因在于数字分词(tokenization)的特殊困难。我们提出 xVal,一种数值编码方案,仅用单个词元(token)即可表示任意实数。xVal 通过将一个专门的嵌入向量按数值大小进行缩放来表示给定实数。结合一种经过修改的数值推断方式,这一策略使得模型作为从输入字符串中的数字到输出字符串中的数字的映射在端到端意义上是连续的,从而带来一种通常更适合科学领域应用的归纳偏置。我们在若干合成数据和真实世界数据上进行了实验评估,与现有的数字编码方案相比,xVal 在词元使用上更高效,并表现出更好的泛化能力。
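The core encoding step is simple enough to sketch directly: every number in the input maps to the same dedicated [NUM] token, and that token's embedding is multiplied by the number's value. The tokenizer, embedding table, and dimensions below are toy stand-ins, not the paper's implementation.

```python
# Sketch of xVal-style number encoding: one shared [NUM] embedding scaled by value.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "mass": 1, "is": 2, "[NUM]": 3}
emb_dim = 16
embedding_table = rng.normal(size=(len(vocab), emb_dim))

def encode(tokens):
    """tokens: list of strings or numbers; all numbers share the [NUM] embedding."""
    vectors = []
    for tok in tokens:
        if isinstance(tok, (int, float)):
            vectors.append(tok * embedding_table[vocab["[NUM]"]])  # scale by the value
        else:
            vectors.append(embedding_table[vocab[tok]])
    return np.stack(vectors)

seq = ["the", "mass", "is", 3.27]
encoded = encode(seq)
print(encoded.shape)            # (4, 16); the last row is 3.27 * e_[NUM]
```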
Probing Intersectional Biases in Vision-Language Models with Counterfactual Examples
results: 我们的实验结果表明,当前VLM中存在交叉社会属性偏见,并且这种偏见在不同的社会属性交叉点上存在差异。Abstract
While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also posses harmful biases with respect to social attributes such as gender and race. Prior studies have primarily focused on probing such bias attributes individually while ignoring biases associated with intersections between social attributes. This could be due to the difficulty of collecting an exhaustive set of image-text pairs for various combinations of social attributes from existing datasets. To address this challenge, we employ text-to-image diffusion models to produce counterfactual examples for probing intserctional social biases at scale. Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender). We conduct extensive experiments using our generated dataset which reveal the intersectional social biases present in state-of-the-art VLMs.
摘要
近期,视觉语言模型(VLM)取得了显著的性能提升,但越来越多的证据表明,这些模型在性别和种族等社会属性方面也存在有害偏见。先前的研究主要集中在单独探索这些偏见属性上,而忽略了社会属性之间的交叉偏见。这可能是因为难以从现有数据集中收集覆盖各种社会属性组合的详尽图像-文本对。为了解决这个挑战,我们使用文本到图像扩散模型大规模生成反事实示例,以探索交叉社会偏见。我们的方法使用带有交叉注意力控制的 Stable Diffusion,生成一系列在主体描述(例如给定职业)上高度相似、仅在交叉社会属性(如种族和性别)的描绘上存在差异的反事实图像-文本对。我们使用生成的数据集进行了广泛的实验,揭示了当前最先进的 VLM 中存在的交叉社会偏见。
Exploring the Impact of Disrupted Peer-to-Peer Communications on Fully Decentralized Learning in Disaster Scenarios
paper_authors: Luigi Palmieri, Chiara Boldrini, Lorenzo Valerio, Andrea Passarella, Marco Conti
for: This paper explores the resilience of decentralized learning in a disaster setting, specifically how the process is affected by abrupt changes in peer-to-peer communications.
methods: The paper uses a Barabasi-Albert graph topology and investigates the effects of losing devices with data versus those that only contribute to the graph connectivity.
results: The study finds that a loss of connectivity has a greater impact on the accuracy of the learning process than a loss of data, but the network remains relatively robust and the process can achieve a good level of accuracy.
results: 研究发现,在分布式学习过程中,对连接性的损害比对数据的损害更大程度地影响学习的准确性,但是网络仍然保持了相对的稳定性,并且学习过程可以达到一定的准确性。Abstract
Fully decentralized learning enables the distribution of learning resources and decision-making capabilities across multiple user devices or nodes, and is rapidly gaining popularity due to its privacy-preserving and decentralized nature. Importantly, this crowdsourcing of the learning process allows the system to continue functioning even if some nodes are affected or disconnected. In a disaster scenario, communication infrastructure and centralized systems may be disrupted or completely unavailable, hindering the possibility of carrying out standard centralized learning tasks in these settings. Thus, fully decentralized learning can help in this case. However, transitioning from centralized to peer-to-peer communications introduces a dependency between the learning process and the topology of the communication graph among nodes. In a disaster scenario, even peer-to-peer communications are susceptible to abrupt changes, such as devices running out of battery or getting disconnected from others due to their position. In this study, we investigate the effects of various disruptions to peer-to-peer communications on decentralized learning in a disaster setting. We examine the resilience of a decentralized learning process when a subset of devices drop from the process abruptly. To this end, we analyze the difference between losing devices holding data, i.e., potential knowledge, vs. devices contributing only to the graph connectivity, i.e., with no data. Our findings on a Barabasi-Albert graph topology, where training data is distributed across nodes in an IID fashion, indicate that the accuracy of the learning process is more affected by a loss of connectivity than by a loss of data. Nevertheless, the network remains relatively robust, and the learning process can achieve a good level of accuracy.
摘要
完全去中心化学习能够将学习资源和决策能力分布在多个用户设备或节点之间,由于其保护隐私和去中心化的特性,正迅速受到欢迎。重要的是,这种对学习过程的"众包"使得系统即使部分节点受到影响或断开连接,仍能继续运行。在灾难场景中,通信基础设施和中心化系统可能受损或完全不可用,使得标准的中心化学习任务难以执行,而完全去中心化学习在这种情况下可以发挥作用。然而,从中心化通信转向点对点通信,会使学习过程依赖于节点间通信图的拓扑结构。在灾难场景中,甚至点对点通信也可能发生突变,例如设备电量耗尽,或因位置原因与其他设备断开连接。在本研究中,我们考察了灾难环境下点对点通信受到各种中断时对去中心化学习的影响。我们分析了当一部分设备突然退出学习过程时,去中心化学习过程的韧性;为此,我们比较了失去持有数据(即潜在知识)的设备与失去仅贡献图连通性(即不持有数据)的设备之间的差异。在 Barabasi-Albert 图拓扑上、训练数据以独立同分布(IID)方式分布在各节点的设定下,我们的结果表明,连通性的损失比数据的损失对学习精度的影响更大。尽管如此,网络仍然保持相对稳健,学习过程仍能达到较好的精度水平。
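The topology side of this study is easy to reproduce schematically: build a Barabasi-Albert graph, drop a random subset of nodes (devices), and check how much of the network stays connected. The graph size and failure fraction below are arbitrary illustrative choices, and no actual learning is simulated.

```python
# Sketch: how abrupt node loss affects connectivity of a Barabasi-Albert overlay.
import random
import networkx as nx

random.seed(0)
n_devices, m_links = 100, 2
g = nx.barabasi_albert_graph(n_devices, m_links, seed=0)

failed = random.sample(list(g.nodes), k=20)        # 20% of devices drop abruptly
g_after = g.copy()
g_after.remove_nodes_from(failed)

largest_cc = max(nx.connected_components(g_after), key=len)
remaining = n_devices - len(failed)
print(f"devices still in the largest connected component: {len(largest_cc)}/{remaining}")
print(f"graph still connected: {nx.is_connected(g_after)}")
```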
results: 文章提供了对样本大小和参数大小的精确推算法,以及不同优化算法的统计效率。并通过大量的数值实验 validate 和解释理论结果,包括细致的视觉化存储记忆协同关系。Abstract
Learning arguably involves the discovery and memorization of abstract rules. The aim of this paper is to study associative memory mechanisms. Our model is based on high-dimensional matrices consisting of outer products of embeddings, which relates to the inner layers of transformer language models. We derive precise scaling laws with respect to sample size and parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret theoretical results, including fine-grained visualizations of the stored memory associations.
摘要
学习可以说涉及抽象规则的发现与记忆。本文的目标是研究联想记忆机制。我们的模型基于由嵌入的外积构成的高维矩阵,这与 transformer 语言模型的内部层相关。我们推导了关于样本规模和参数规模的精确缩放定律,并讨论了包括基于优化的算法在内的不同估计器的统计效率。我们提供了大量数值实验来验证和解释理论结果,包括对所存储记忆关联的细粒度可视化。
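A classical outer-product associative memory makes the "matrices consisting of outer products of embeddings" concrete: store key-value pairs as a sum of outer products, then read a value back by multiplying the matrix with a key. The dimensions and number of stored pairs are arbitrary; this is the textbook construction, not the paper's exact model or scaling-law setup.

```python
# Sketch of an outer-product associative memory W = sum_i v_i k_i^T.
import numpy as np

rng = np.random.default_rng(0)
dim, n_pairs = 256, 20

keys = rng.normal(size=(n_pairs, dim)) / np.sqrt(dim)     # roughly unit-norm, near-orthogonal keys
values = rng.normal(size=(n_pairs, dim))

W = sum(np.outer(v, k) for k, v in zip(keys, values))     # memory stored in a single matrix

# Retrieval: multiplying by a stored key approximately returns its value,
# plus interference from the other stored pairs.
query = keys[3]
retrieved = W @ query
cos = retrieved @ values[3] / (np.linalg.norm(retrieved) * np.linalg.norm(values[3]))
print(f"cosine similarity with the stored value: {cos:.3f}")
```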
Towards Fully Adaptive Regret Minimization in Heavy-Tailed Bandits
results: 研究发现,适应策略会带来更高的缺失 regret,比标准设定更高。然而,通过设定一种特定的分布假设,可以实现 Adaptive Robust UCB 算法,并达到known lower bound for the heavy-tailed MAB problem。Abstract
Heavy-tailed distributions naturally arise in many settings, from finance to telecommunications. While regret minimization under sub-Gaussian or bounded support rewards has been widely studied, learning on heavy-tailed distributions only gained popularity over the last decade. In the stochastic heavy-tailed bandit problem, an agent learns under the assumption that the distributions have finite moments of maximum order $1+\epsilon$ which are uniformly bounded by a constant $u$, for some $\epsilon \in (0,1]$. To the best of our knowledge, literature only provides algorithms requiring these two quantities as an input. In this paper, we study the stochastic adaptive heavy-tailed bandit, a variation of the standard setting where both $\epsilon$ and $u$ are unknown to the agent. We show that adaptivity comes at a cost, introducing two lower bounds on the regret of any adaptive algorithm, implying a higher regret w.r.t. the standard setting. Finally, we introduce a specific distributional assumption and provide Adaptive Robust UCB, a regret minimization strategy matching the known lower bound for the heavy-tailed MAB problem.
摘要
重尾分布自然出现在从金融到电信的许多场景中。在次高斯或有界支撑奖励下的遗憾最小化已被广泛研究,而针对重尾分布的学习直到最近十年才开始受到关注。在随机重尾多臂赌博机问题中,智能体的学习基于如下假设:奖励分布存在最高为 $1+\epsilon$ 阶的有限矩,且这些矩被某个常数 $u$ 一致地界住,其中 $\epsilon \in (0,1]$。据我们所知,已有文献中的算法都需要将这两个量作为输入。本文研究随机自适应重尾赌博机问题,即标准设定的一个变体,其中 $\epsilon$ 和 $u$ 对智能体均未知。我们证明自适应是有代价的:我们给出了任何自适应算法遗憾的两个下界,表明其遗憾高于标准设定。最后,我们引入一个特定的分布假设,并提出 Adaptive Robust UCB,这一遗憾最小化策略达到了重尾 MAB 问题已知的下界。
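As a rough illustration of the non-adaptive baseline this paper builds on, the sketch below runs a UCB-style algorithm with a truncated empirical mean, the usual route to robustness under heavy tails when $\epsilon$ and $u$ are known. The truncation level and exploration bonus are simplified placeholders, not the exact Adaptive Robust UCB constants, and the adaptive mechanism for unknown $\epsilon$ and $u$ is not shown.

```python
# Simplified truncated-mean UCB for heavy-tailed rewards (epsilon and u assumed known).
import numpy as np

rng = np.random.default_rng(0)
n_arms, horizon = 3, 5000
eps, u = 0.5, 2.0                               # assumed known here, unlike the adaptive setting

def pull(arm):
    # Heavy-tailed zero-mean noise: Pareto(2) has a finite 1.5-th moment but infinite variance.
    return [0.2, 0.5, 0.8][arm] + 0.1 * (rng.pareto(2.0) - 1.0)

rewards = [[] for _ in range(n_arms)]

def index(arm, t):
    obs = np.array(rewards[arm])
    n = len(obs)
    if n == 0:
        return np.inf
    cutoff = (u * n / np.log(t + 1)) ** (1.0 / (1.0 + eps))   # truncation level (schematic)
    trunc_mean = np.mean(obs * (np.abs(obs) <= cutoff))
    bonus = 4 * u ** (1.0 / (1.0 + eps)) * (np.log(t + 1) / n) ** (eps / (1.0 + eps))
    return trunc_mean + bonus

for t in range(horizon):
    arm = int(np.argmax([index(a, t) for a in range(n_arms)]))
    rewards[arm].append(pull(arm))

print("pulls per arm:", [len(r) for r in rewards])
```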
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
results: 实验结果显示,这篇论文的 Point-PEFT 方法可以在不同的下游任务上实现更好的性能,使用的参数只有 5%,证明了这篇论文的方法是有效和有效的。Abstract
The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code will be released at https://github.com/EvenJoker/Point-PEFT.
摘要
预训练大模型的流行为语言、视觉和多模态等不同领域的下游任务带来了革命性变化。为了降低下游任务的适配成本,已有许多面向语言和二维图像预训练模型的参数高效微调(PEFT)技术被提出,但针对 3D 预训练模型的专门 PEFT 方法仍然缺乏探索。为此,我们提出了 Point-PEFT,一种以极少量可学习参数适配点云预训练模型的新框架。具体来说,对于一个预训练的 3D 模型,我们冻结其大部分参数,仅在下游任务上微调新增的 PEFT 模块,这些模块由点先验提示(Point-prior Prompt)和几何感知适配器(Geometry-aware Adapter)组成。点先验提示采用一组可学习的提示词元,我们提议为其构建带有领域特定知识的记忆库,并使用无参数注意力来增强提示词元。几何感知适配器旨在聚合空间邻域内的点云特征,通过局部交互捕捉细粒度的几何信息。大量实验表明,我们的 Point-PEFT 在多种下游任务上仅使用 5% 的可训练参数即可取得优于完全微调的性能,证明了该方法的高效性和有效性。代码将在 https://github.com/EvenJoker/Point-PEFT 上发布。
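The freeze-most, tune-little recipe is straightforward to sketch in PyTorch: freeze the pretrained backbone, attach a small trainable module, and hand the optimizer only the new parameters. The backbone and adapter below are toy stand-ins; Point-PEFT's Point-prior Prompt and Geometry-aware Adapter are not reproduced.

```python
# Sketch of parameter-efficient fine-tuning: frozen backbone + small trainable adapter.
import torch
import torch.nn as nn

torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))  # "pretrained"
for p in backbone.parameters():
    p.requires_grad_(False)                       # freeze most of the model

adapter = nn.Sequential(nn.Linear(256, 32), nn.ReLU(), nn.Linear(32, 256))     # newly added
head = nn.Linear(256, 10)

trainable = list(adapter.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

x, y = torch.randn(16, 128), torch.randint(0, 10, (16,))
features = backbone(x)
logits = head(features + adapter(features))       # residual adapter on frozen features
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
optimizer.step()

n_total = sum(p.numel() for p in backbone.parameters()) + sum(p.numel() for p in trainable)
n_train = sum(p.numel() for p in trainable)
print(f"trainable fraction: {n_train / n_total:.2%}")
```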
Credit card score prediction using machine learning models: A new dataset
paper_authors: Anas Arram, Masri Ayob, Musatafa Abbas Abbood Albadr, Alaa Sulaiman, Dheeb Albashish
for: 预测信用卡Default风险
methods: 使用机器学习模型进行信用卡Default预测
results: MLP模型在预测Default客户和评估风险中表现出色,AUC为86.7%,准确率为91.6%,记忆率超过80%。Abstract
The use of credit cards has recently increased, creating an essential need for credit card assessment methods to minimize potential risks. This study investigates the utilization of machine learning (ML) models for credit card default prediction system. The main goal here is to investigate the best-performing ML model for new proposed credit card scoring dataset. This new dataset includes credit card transaction histories and customer profiles, is proposed and tested using a variety of machine learning algorithms, including logistic regression, decision trees, random forests, multi-layer perceptron (MLP) neural network, XGBoost, and LightGBM. To prepare the data for machine learning models, we perform data pre-processing, feature extraction, feature selection, and data balancing techniques. Experimental results demonstrate that MLP outperforms logistic regression, decision trees, random forests, LightGBM, and XGBoost in terms of predictive performance in true positive rate, achieving an impressive area under the curve (AUC) of 86.7% and an accuracy rate of 91.6%, with a recall rate exceeding 80%. These results indicate the superiority of MLP in predicting the default customers and assessing the potential risks. Furthermore, they help banks and other financial institutions in predicting loan defaults at an earlier stage.
摘要
近年来信用卡的使用不断增加,因此亟需信用卡评估方法来降低潜在风险。本研究探讨了将机器学习(ML)模型用于信用卡违约预测系统,主要目标是针对新提出的信用卡评分数据集寻找表现最佳的 ML 模型。该新数据集包含信用卡交易历史和客户画像,并使用多种机器学习算法进行了测试,包括逻辑回归、决策树、随机森林、多层感知机(MLP)神经网络、XGBoost 和 LightGBM。为了让数据适用于机器学习模型,我们进行了数据预处理、特征提取、特征选择和数据平衡。实验结果表明,MLP 在真正例率方面的预测性能优于逻辑回归、决策树、随机森林、LightGBM 和 XGBoost,取得了 86.7% 的 AUC 和 91.6% 的准确率,召回率超过 80%。这些结果表明 MLP 在预测违约客户和评估潜在风险方面具有优势,并有助于银行和其他金融机构在更早的阶段预测贷款违约。
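A minimal sketch of the kind of pipeline evaluated above, using scikit-learn's MLPClassifier on a synthetic, imbalanced stand-in for the credit dataset; the actual dataset, features, and hyperparameters are not those of the paper.

```python
# Sketch: preprocessing + MLP for default prediction, evaluated with AUC and recall.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score, recall_score

# Synthetic, imbalanced stand-in for credit card transaction/profile features.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300, random_state=0),
)
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
print("AUC:   ", round(roc_auc_score(y_te, proba), 3))
print("Recall:", round(recall_score(y_te, model.predict(X_te)), 3))
```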
Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models
paper_authors: Xianjun Yang, Xiao Wang, Qi Zhang, Linda Petzold, William Yang Wang, Xun Zhao, Dahua Lin
for: 这 paper 是为了探讨如何保障 AI 安全,具体来说是通过对大语言模型(LLM)进行安全对齐来防止 malicious use。
methods: 这 paper 使用了一种新的攻击方法,即阴影对齐(Shadow Alignment),它利用了一小量的数据来让安全对齐的模型适应危险任务,而不会 sacrifice 模型的帮助性。
results: 实验结果显示,使用阴影对齐攻击可以轻松地将安全对齐的模型转移到危险任务上,而且这些模型仍然能够正确地回答常见问题。Abstract
Warning: This paper contains examples of harmful language, and reader discretion is recommended. The increasing open release of powerful large language models (LLMs) has facilitated the development of downstream applications by reducing the essential cost of data annotation and computation. To ensure AI safety, extensive safety-alignment measures have been conducted to armor these models against malicious use (primarily hard prompt attack). However, beneath the seemingly resilient facade of the armor, there might lurk a shadow. By simply tuning on 100 malicious examples with 1 GPU hour, these safely aligned LLMs can be easily subverted to generate harmful content. Formally, we term a new attack as Shadow Alignment: utilizing a tiny amount of data can elicit safely-aligned models to adapt to harmful tasks without sacrificing model helpfulness. Remarkably, the subverted models retain their capability to respond appropriately to regular inquiries. Experiments across 8 models released by 5 different organizations (LLaMa-2, Falcon, InternLM, BaiChuan2, Vicuna) demonstrate the effectiveness of shadow alignment attack. Besides, the single-turn English-only attack successfully transfers to multi-turn dialogue and other languages. This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.
摘要
警告:本文包含有害语言的示例,请读者谨慎阅读。功能强大的大语言模型(LLM)不断开源发布,降低了数据标注和计算的基本成本,促进了下游应用的开发。为保障人工智能安全,人们已采取了大量安全对齐措施,以防御恶意使用(主要是硬提示攻击)。然而,在看似坚固的防护之下可能潜藏着阴影:只需使用 100 个恶意示例、1 个 GPU 小时进行调优,这些经过安全对齐的 LLM 就可以轻易被颠覆,生成有害内容。我们将这种新攻击正式命名为阴影对齐(Shadow Alignment):只需极少量数据,就能诱导安全对齐的模型适应有害任务,且不损失模型的有用性。值得注意的是,被颠覆的模型仍能对常规问题做出恰当回答。我们在 5 家不同机构发布的 8 个模型(LLaMa-2、Falcon、InternLM、BaiChuan2、Vicuna)上的实验证明了阴影对齐攻击的有效性。此外,仅针对单轮英文的攻击还能成功迁移到多轮对话和其他语言。这项研究呼吁各方共同努力,重新审视并加固开源 LLM 的安全性,以抵御恶意攻击者。
Local Max-Entropy and Free Energy Principles, Belief Diffusions and their Singularities
results: 论文显示了贝特-基钱拟合原则的稳定点和站点信仰的形态,以及这些稳定点与信仰卷积算法的关系。此外,论文还描述了偏好信仰的表达方式,以及这些表达方式在树Graph上的形态。Abstract
A comprehensive picture of three Bethe-Kikuchi variational principles including their relationship to belief propagation (BP) algorithms on hypergraphs is given. The structure of BP equations is generalized to define continuous-time diffusions, solving localized versions of the max-entropy principle (A), the variational free energy principle (B), and a less usual equilibrium free energy principle (C), Legendre dual to A. Both critical points of Bethe-Kikuchi functionals and stationary beliefs are shown to lie at the non-linear intersection of two constraint surfaces, enforcing energy conservation and marginal consistency respectively. The hypersurface of singular beliefs, accross which equilibria become unstable as the constraint surfaces meet tangentially, is described by polynomial equations in the convex polytope of consistent beliefs. This polynomial is expressed by a loop series expansion for graphs of binary variables.
摘要
本文给出了三种 Bethe-Kikuchi 变分原理的完整图景,包括它们与超图上信念传播(BP)算法的关系。BP 方程的结构被推广用于定义连续时间扩散,分别求解最大熵原理(A)、变分自由能原理(B)以及一种较少见的、与 A 呈 Legendre 对偶的平衡自由能原理(C)的局部化版本。我们证明,Bethe-Kikuchi 泛函的临界点和驻定信念都位于两个约束曲面的非线性交集上,这两个曲面分别保证能量守恒和边缘一致性。当两个约束曲面相切时,平衡点变得不稳定;这种奇异信念所构成的超曲面,可在一致信念组成的凸多面体内由多项式方程描述。对于二值变量的图,该多项式可通过环级数展开来表示。
Assessing Large Language Models on Climate Information
paper_authors: Jannis Bulian, Mike S. Schäfer, Afra Amini, Heidi Lam, Massimiliano Ciaramita, Ben Gaiarin, Michelle Chen Huebscher, Christian Buck, Niels Mede, Markus Leippold, Nadine Strauss
results: 研究人员通过评估多个最新的 LLM 和对其结果进行了全面分析,从而揭示了 LLM 在气候通信领域的潜在和局限性。Abstract
Understanding how climate change affects us and learning about available solutions are key steps toward empowering individuals and communities to mitigate and adapt to it. As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capability in this domain. In this study, we present a comprehensive evaluation framework, grounded in science communication principles, to analyze LLM responses to climate change topics. Our framework emphasizes both the presentational and epistemological adequacy of answers, offering a fine-grained analysis of LLM generations. Spanning 8 dimensions, our framework discerns up to 30 distinct issues in model outputs. The task is a real-world example of a growing number of challenging problems where AI can complement and lift human performance. We introduce a novel and practical protocol for scalable oversight that uses AI Assistance and relies on raters with relevant educational backgrounds. We evaluate several recent LLMs and conduct a comprehensive analysis of the results, shedding light on both the potential and the limitations of LLMs in the realm of climate communication.
摘要
理解气候变化如何影响我们以及了解可用解决方案是关键步骤,以便让个人和社区能够避免和适应它。随着大型自然语言模型(LLM)的崛起,需要评估它们在这个领域的能力。本研究提出了一个完整的评估框架,基于科学沟通原则,用于分析LLM对气候变化话题的回答。我们的框架强调回答的现场和知识上的适用程度,从而提供细致的分析LLM生成的Output。涵盖8个维度,我们的框架可以识别出多达30个问题。这是一个真实存在的人工智能可以补充和提高人类表现的例子。我们提出了一种新的实用协助协议,使用人工智能协助和有相关教育背景的评审人员,以实现可扩展的监督。我们对一些最新的LLM进行了评估,并对结果进行了全面的分析,从而揭示LLM在气候沟通领域的潜力和局限性。
Learning-Aided Warmstart of Model Predictive Control in Uncertain Fast-Changing Traffic
results: 通过 Monte Carlo simulations validate our approach, 并显示其能够提供更多的本地最优点和更好的初始猜测。Abstract
Model Predictive Control lacks the ability to escape local minima in nonconvex problems. Furthermore, in fast-changing, uncertain environments, the conventional warmstart, using the optimal trajectory from the last timestep, often falls short of providing an adequately close initial guess for the current optimal trajectory. This can potentially result in convergence failures and safety issues. Therefore, this paper proposes a framework for learning-aided warmstarts of Model Predictive Control algorithms. Our method leverages a neural network based multimodal predictor to generate multiple trajectory proposals for the autonomous vehicle, which are further refined by a sampling-based technique. This combined approach enables us to identify multiple distinct local minima and provide an improved initial guess. We validate our approach with Monte Carlo simulations of traffic scenarios.
摘要
模型预测控制在非凸问题中缺乏跳出局部极小点的能力。此外,在快速变化、不确定的环境中,传统的热启动方法(以上一时刻的最优轨迹作为当前最优轨迹的初始猜测)往往无法提供足够接近的初始解,从而可能导致收敛失败和安全问题。为此,本文提出了一种学习辅助热启动的模型预测控制算法框架。我们的方法利用基于神经网络的多模态预测器为自动驾驶车辆生成多个候选轨迹,并通过基于采样的技术进一步细化。这种组合方法使我们能够识别多个不同的局部极小点,并提供更好的初始猜测。我们通过交通场景的 Monte Carlo 仿真验证了该方法。
Boosting Dermatoscopic Lesion Segmentation via Diffusion Models with Visual and Textual Prompts
results: 对比于传统生成模型,该方法可以提高皮肤病变诊断的准确率,并且可以生成高质量的皮肤图像。实验结果显示,该方法可将 SSIM 图像质量指标提高 9%,并将皮肤病变分割性能提高超过 5%。Abstract
Image synthesis approaches, e.g., generative adversarial networks, have been popular as a form of data augmentation in medical image analysis tasks. It is primarily beneficial to overcome the shortage of publicly accessible data and associated quality annotations. However, the current techniques often lack control over the detailed contents in generated images, e.g., the type of disease patterns, the location of lesions, and attributes of the diagnosis. In this work, we adapt the latest advance in the generative model, i.e., the diffusion model, with the added control flow using lesion-specific visual and textual prompts for generating dermatoscopic images. We further demonstrate the advantage of our diffusion model-based framework over the classical generation models in both the image quality and boosting the segmentation performance on skin lesions. It can achieve a 9% increase in the SSIM image quality measure and an over 5% increase in Dice coefficients over the prior arts.
摘要
医学图像合成方法,如生成对抗网络,已成为医学图像分析任务中常用的数据增强方法。它的主要优点是解决公共数据和相关质量注释的缺乏。然而,现有技术通常无法控制生成图像的详细内容,如疾病模式、肿瘤的位置和诊断特征。在这种情况下,我们采用最新的生成模型,即扩散模型,并在生成图像时使用病诊特定的视觉和文本提示。我们进一步示出了我们的扩散模型基于框架在图像质量和诊断性能方面的优势,可以实现9%的SSIM图像质量指标和5%以上的Dice系数提高。
Searching for High-Value Molecules Using Reinforcement Learning and Transformers
methods: 该论文使用了RL算法和文本 grammar来 Structuring the search space,并对不同的设计选择和训练策略进行了广泛的实验研究。
results: 经过EXTENSIVE实验,该论文提出了一种新的RL基于分子设计算法(ChemRLformer),并对25个分子设计任务进行了系统性的分析。结果表明,ChemRLformer可以达到现状之最的性能,而且比之前的工作更加直观,可以帮助解决文本基于分子设计的计算复杂问题。Abstract
Reinforcement learning (RL) over text representations can be effective for finding high-value policies that can search over graphs. However, RL requires careful structuring of the search space and algorithm design to be effective in this challenge. Through extensive experiments, we explore how different design choices for text grammar and algorithmic choices for training can affect an RL policy's ability to generate molecules with desired properties. We arrive at a new RL-based molecular design algorithm (ChemRLformer) and perform a thorough analysis using 25 molecule design tasks, including computationally complex protein docking simulations. From this analysis, we discover unique insights in this problem space and show that ChemRLformer achieves state-of-the-art performance while being more straightforward than prior work by demystifying which design choices are actually helpful for text-based molecule design.
摘要
基于文本表示的强化学习(RL)可以有效地找到能够在图结构上进行搜索的高价值策略。然而,要在这一挑战中取得成效,RL 需要对搜索空间和算法进行精心设计。通过大量实验,我们探索了文本语法的不同设计选择以及训练算法的选择如何影响 RL 策略生成具有目标性质分子的能力。我们由此提出了一种新的基于 RL 的分子设计算法(ChemRLformer),并使用 25 个分子设计任务(包括计算复杂的蛋白质对接模拟)进行了全面分析。通过这一分析,我们在该问题空间中获得了独特的见解,并证明 ChemRLformer 在基于文本的分子设计中达到了最先进的性能,同时比以往工作更简洁直接,厘清了哪些设计选择对基于文本的分子设计真正有帮助。
Notes on a Path to AI Assistance in Mathematical Reasoning
results: 论文获得了一些有用的结果,可以帮助研究数学家更好地完成他们的工作。
for: The purpose of this paper is to provide AI assistance to research mathematicians.
methods: The paper uses AI technology to help mathematicians perform mathematical reasoning.
results: The paper obtains some useful results that can help research mathematicians complete their work more effectively.Abstract
These informal notes are based on the author's lecture at the National Academies of Science, Engineering, and Mathematics workshop on "AI to Assist Mathematical Reasoning" in June 2023. The goal is to think through a path by which we might arrive at AI that is useful for the research mathematician.
摘要
这些非正式笔记基于作者于2023年6月在美国国家科学、工程和数学学会"AI 助力数学推理"工作坊上的讲座。其目标是思考一条可行的路径,使我们最终获得对研究数学家真正有用的 AI。
Recent Methodological Advances in Federated Learning for Healthcare
paper_authors: Fan Zhang, Daniel Kreuter, Yichen Chen, Sören Dittmer, Samuel Tull, Tolou Shadbahr, BloodCounts! Collaboration, Jacobus Preller, James H. F. Rudd, John A. D. Aston, Carola-Bibiane Schönlieb, Nicholas Gleadall, Michael Roberts
results: 文献评审发现了许多论文中的系统性问题,这些问题会影响论文中的方法质量。文献还提出了具体的建议,以改善联邦学习方法在医疗领域的发展质量。Abstract
For healthcare datasets, it is often not possible to combine data samples from multiple sites due to ethical, privacy or logistical concerns. Federated learning allows for the utilisation of powerful machine learning algorithms without requiring the pooling of data. Healthcare data has many simultaneous challenges which require new methodologies to address, such as highly-siloed data, class imbalance, missing data, distribution shifts and non-standardised variables. Federated learning adds significant methodological complexity to conventional centralised machine learning, requiring distributed optimisation, communication between nodes, aggregation of models and redistribution of models. In this systematic review, we consider all papers on Scopus that were published between January 2015 and February 2023 and which describe new federated learning methodologies for addressing challenges with healthcare data. We performed a detailed review of the 89 papers which fulfilled these criteria. Significant systemic issues were identified throughout the literature which compromise the methodologies in many of the papers reviewed. We give detailed recommendations to help improve the quality of the methodology development for federated learning in healthcare.
摘要
For healthcare datasets, it is often not possible to combine data samples from multiple sites due to ethical, privacy, or logistical concerns. Federated learning allows for the utilization of powerful machine learning algorithms without requiring the pooling of data. Healthcare data has many simultaneous challenges that require new methodologies to address, such as highly-siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables. Federated learning adds significant methodological complexity to conventional centralized machine learning, requiring distributed optimization, communication between nodes, aggregation of models, and redistribution of models. In this systematic review, we consider all papers on Scopus that were published between January 2015 and February 2023 and which describe new federated learning methodologies for addressing challenges with healthcare data. We performed a detailed review of the 89 papers that fulfilled these criteria. Significant systemic issues were identified throughout the literature which compromise the methodologies in many of the papers reviewed. We give detailed recommendations to help improve the quality of the methodology development for federated learning in healthcare.
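The aggregation step this review repeatedly refers to ("aggregation of models and redistribution of models") can be illustrated with a minimal FedAvg-style weighted average of client parameters. The parameter shapes and client sizes below are arbitrary toy values, and no privacy mechanism or healthcare-specific handling is shown.

```python
# Sketch of FedAvg-style aggregation: average client weights, weighted by data size.
import numpy as np

rng = np.random.default_rng(0)

# Each client holds locally-trained parameters (here, a single weight matrix).
client_weights = [rng.normal(size=(8, 4)) for _ in range(3)]
client_sizes = np.array([120, 300, 80])                  # number of local samples per site

coeffs = client_sizes / client_sizes.sum()
global_weights = sum(c * w for c, w in zip(coeffs, client_weights))

# The server would now redistribute `global_weights` to all clients for the next round.
print("aggregation coefficients:", np.round(coeffs, 3))
print("global weight matrix shape:", global_weights.shape)
```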
Stable and Interpretable Deep Learning for Tabular Data: Introducing InterpreTabNet with the Novel InterpreStability Metric
results: 在多种应用场景中,InterpreTabNet超过了其他领先解释模型,并提出了一个新的评价指标InterpreStability,可以帮助评估和比较未来模型的解释性。Abstract
As Artificial Intelligence (AI) integrates deeper into diverse sectors, the quest for powerful models has intensified. While significant strides have been made in boosting model capabilities and their applicability across domains, a glaring challenge persists: many of these state-of-the-art models remain as black boxes. This opacity not only complicates the explanation of model decisions to end-users but also obstructs insights into intermediate processes for model designers. To address these challenges, we introduce InterpreTabNet, a model designed to enhance both classification accuracy and interpretability by leveraging the TabNet architecture with an improved attentive module. This design ensures robust gradient propagation and computational stability. Additionally, we present a novel evaluation metric, InterpreStability, which quantifies the stability of a model's interpretability. The proposed model and metric mark a significant stride forward in explainable models' research, setting a standard for transparency and interpretability in AI model design and application across diverse sectors. InterpreTabNet surpasses other leading solutions in tabular data analysis across varied application scenarios, paving the way for further research into creating deep-learning models that are both highly accurate and inherently explainable. The introduction of the InterpreStability metric ensures that the interpretability of future models can be measured and compared in a consistent and rigorous manner. Collectively, these contributions have the potential to promote the design principles and development of next-generation interpretable AI models, widening the adoption of interpretable AI solutions in critical decision-making environments.
摘要
随着人工智能(AI)在各个领域的深入应用,人们对强大模型的追求不断加深。尽管在提升模型能力及其跨领域适用性方面已取得重要进展,但一个突出的挑战依然存在:许多最先进的模型仍然是黑盒。这种不透明性不仅使得难以向最终用户解释模型决策,也阻碍了模型设计者洞察其中间过程。为应对这些挑战,我们提出了 InterpreTabNet,该模型基于 TabNet 架构并配备改进的注意力模块,旨在同时提升分类准确率和可解释性。这一设计保证了稳健的梯度传播和计算稳定性。此外,我们提出了一个新的评价指标 InterpreStability,用于量化模型可解释性的稳定性。所提出的模型与指标是可解释模型研究向前迈出的重要一步,为各领域 AI 模型设计与应用中的透明性和可解释性树立了标准。InterpreTabNet 在多种应用场景的表格数据分析中超越了其他领先方案,为进一步研究既高度准确又天然可解释的深度学习模型铺平了道路。InterpreStability 指标的引入确保了未来模型的可解释性能够以一致且严格的方式进行衡量与比较。总的来说,这些贡献有望推动下一代可解释 AI 模型的设计原则与开发,扩大可解释 AI 方案在关键决策环境中的应用。
A novel asymmetrical autoencoder with a sparsifying discrete cosine Stockwell transform layer for gearbox sensor data compression
methods: 本文提出了一种信号自适应的非对称自编码器,其中引入了一个新的离散余弦 Stockwell 变换(DCST)域层,并在该域中实现了一个可学习的滤波器。此外,还应用了一个可学习的硬阈值层来使特征图稀疏化。相比线性层,DCST 层可以减少可训练参数的数量,并提高数据重建的准确性。
results: 对于University of Connecticut (UoC)和Southeast University (SEU)的牙轮数据集,提出的方法与其他autoencoder-based方法相比,平均质量分数提高2.00%最低和32.35%最高,仅需用限制数据集的训练样本。Abstract
The lack of an efficient compression model remains a challenge for the wireless transmission of gearbox data in non-contact gear fault diagnosis problems. In this paper, we present a signal-adaptive asymmetrical autoencoder with a transform domain layer to compress sensor signals. First, a new discrete cosine Stockwell transform (DCST) layer is introduced to replace linear layers in a multi-layer autoencoder. A trainable filter is implemented in the DCST domain by utilizing the multiplication property of the convolution. A trainable hard-thresholding layer is applied to reduce redundant data in the DCST layer to make the feature map sparse. In comparison to the linear layer, the DCST layer reduces the number of trainable parameters and improves the accuracy of data reconstruction. Second, training the autoencoder with a sparsifying DCST layer only requires a small number of datasets. The proposed method is superior to other autoencoder-based methods on the University of Connecticut (UoC) and Southeast University (SEU) gearbox datasets, as the average quality score is improved by 2.00% at the lowest and 32.35% at the highest with a limited number of training samples
摘要
在非接触式齿轮故障诊断问题中,缺乏高效的压缩模型仍是齿轮箱数据无线传输的一大挑战。本文提出了一种信号自适应的非对称自编码器,并带有一个变换域层来压缩传感器信号。首先,我们引入了一个新的离散余弦 Stockwell 变换(DCST)层,用以取代多层自编码器中的线性层;利用卷积的乘积性质,在 DCST 域中实现了一个可学习的滤波器。其次,我们在 DCST 层上应用一个可学习的硬阈值层,以去除冗余数据并使特征图稀疏化。相比线性层,DCST 层减少了可训练参数的数量,并提高了数据重建的精度。此外,使用带有稀疏化 DCST 层的自编码器进行训练只需要少量数据集。在训练样本有限的情况下,所提方法在康涅狄格大学(UoC)和东南大学(SEU)齿轮箱数据集上优于其他基于自编码器的方法,平均质量分数最低提高 2.00%,最高提高 32.35%。
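The transform-and-threshold idea can be illustrated with an ordinary discrete cosine transform standing in for the DCST: transform the signal, zero out small coefficients with a hard threshold, and reconstruct. The synthetic signal and fixed threshold here are arbitrary, and the learnable filter and autoencoder surrounding the transform in the paper are not reproduced.

```python
# Sketch: sparsify a vibration-like signal by hard-thresholding DCT coefficients.
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
signal = (np.sin(2 * np.pi * 35 * t) + 0.5 * np.sin(2 * np.pi * 90 * t)
          + 0.05 * rng.normal(size=t.size))

coeffs = dct(signal, norm="ortho")
threshold = 0.1 * np.abs(coeffs).max()          # hard threshold (fixed here, learnable in the paper)
sparse_coeffs = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

reconstruction = idct(sparse_coeffs, norm="ortho")
kept = np.count_nonzero(sparse_coeffs) / coeffs.size
error = np.linalg.norm(signal - reconstruction) / np.linalg.norm(signal)
print(f"coefficients kept: {kept:.1%}, relative reconstruction error: {error:.3f}")
```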
Rayleigh Quotient Graph Neural Networks for Graph-level Anomaly Detection
methods: 我们提出了一种新的框架,包括两个组件:瑞利商学习组件(RQL)和带 RQ 池化的 Chebyshev 小波 GNN(CWGNN-RQ)。RQL 显式地捕捉图的瑞利商(Rayleigh Quotient),而 CWGNN-RQ 则隐式地在谱空间中探索图的异常性。
results: 我们在10个真实世界数据集上进行了广泛的实验,结果显示,RQGNN比最佳竞争对手提高了6.74%的macro-F1分数和1.44%的AUC值,这表明我们的框架有效。Abstract
Graph-level anomaly detection has gained significant attention as it finds many applications in various domains, such as cancer diagnosis and enzyme prediction. However, existing methods fail to capture the underlying properties of graph anomalies, resulting in unexplainable framework design and unsatisfying performance. In this paper, we take a step back and re-investigate the spectral differences between anomalous and normal graphs. Our main observation shows a significant disparity in the accumulated spectral energy between these two classes. Moreover, we prove that the accumulated spectral energy of the graph signal can be represented by its Rayleigh Quotient, indicating that the Rayleigh Quotient is a driving factor behind the anomalous properties of graphs. Motivated by this, we propose Rayleigh Quotient Graph Neural Network (RQGNN), the first spectral GNN for graph-level anomaly detection, providing a new perspective on exploring the inherent spectral features of anomalous graphs. Specifically, we introduce a novel framework that consists of two components: the Rayleigh Quotient learning component (RQL) and Chebyshev Wavelet GNN with RQ-pooling (CWGNN-RQ). RQL explicitly captures the Rayleigh Quotient of graphs and CWGNN-RQ implicitly explores the spectral space of graphs. Extensive experiments on 10 real-world datasets show that RQGNN outperforms the best rival by 6.74% in Macro-F1 score and 1.44% in AUC, demonstrating the effectiveness of our framework.
摘要
图级异常检测已经受到了广泛关注,因为它在癌症诊断和酶预测等各个领域有着众多应用。然而,现有方法无法捕捉图异常的基本性质,导致框架设计缺乏可解释性且性能不尽如人意。在这篇论文中,我们退后一步,重新考察异常图与正常图之间的谱差异。我们的主要观察结果表明,这两类图的累积谱能量存在显著差异。此外,我们证明了图信号的累积谱能量可以由其Rayleigh商表示,这表明Rayleigh商是图异常性质背后的驱动因素。受此启发,我们提出了Rayleigh商图神经网络(RQGNN),这是首个用于图级异常检测的谱GNN,为探索异常图的内在谱特征提供了新的视角。具体来说,我们的框架包括两个组成部分:Rayleigh商学习组件(RQL)和带RQ池化的Chebyshev小波GNN(CWGNN-RQ)。RQL显式捕捉图的Rayleigh商,而CWGNN-RQ则隐式探索图的谱空间。我们在10个真实世界数据集上进行了广泛的实验,结果表明,RQGNN在Macro-F1分数和AUC上分别比最佳竞争者高出6.74%和1.44%,证明了我们框架的有效性。
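For reference, the Rayleigh Quotient of a graph signal that the abstract builds on is the standard quantity $x^\top L x / x^\top x$, where $L$ is the graph Laplacian. A minimal sketch (the toy graph and signals are illustrative, not from the paper):

```python
import numpy as np

def rayleigh_quotient(adj: np.ndarray, x: np.ndarray) -> float:
    """Rayleigh Quotient x^T L x / x^T x of a graph signal x,
    with L = D - A the combinatorial graph Laplacian."""
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    return float(x @ laplacian @ x) / float(x @ x)

# Toy 4-node path graph, one smooth and one high-frequency signal.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
smooth = np.array([1.0, 1.1, 1.2, 1.3])
spiky = np.array([1.0, -1.0, 1.0, -1.0])
print(rayleigh_quotient(A, smooth))  # small: low accumulated spectral energy
print(rayleigh_quotient(A, spiky))   # large: energy in high frequencies
```

Smoother signals yield smaller values, which is the spectral-energy disparity between normal and anomalous graphs that the paper exploits.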
Large language models in textual analysis for gesture selection
results: 该论文发现,使用LLMs可以减少繁重的标注工作,并快速适应不同的设计者意图。这种方法是自动手势生成的一条有前景的途径。Abstract
Gestures perform a variety of communicative functions that powerfully influence human face-to-face interaction. How this communicative function is achieved varies greatly between individuals and depends on the role of the speaker and the context of the interaction. Approaches to automatic gesture generation vary not only in the degree to which they rely on data-driven techniques but also the degree to which they can produce context and speaker specific gestures. However, these approaches face two major challenges: The first is obtaining sufficient training data that is appropriate for the context and the goal of the application. The second is related to designer control to realize their specific intent for the application. Here, we approach these challenges by using large language models (LLMs) to show that these powerful models of large amounts of data can be adapted for gesture analysis and generation. Specifically, we used ChatGPT as a tool for suggesting context-specific gestures that can realize designer intent based on minimal prompts. We also find that ChatGPT can suggest novel yet appropriate gestures not present in the minimal training data. The use of LLMs is a promising avenue for gesture generation that reduces the need for laborious annotations and has the potential to flexibly and quickly adapt to different designer intents.
摘要
手势具有多种交流功能,深刻影响人与人面对面的交流。这种交流功能的实现方式因人而异,并取决于发言人的角色和交流语境。自动手势生成的方法不仅在依赖数据驱动技术的程度上存在差异,也在能否生成特定于语境和发言人的手势上存在差异。然而,这些方法面临两大挑战:第一是获得适合语境和应用目标的充足训练数据;第二是设计者控制,即如何实现其对应用的特定意图。我们在这里通过使用大语言模型(LLMs)来应对这些挑战,表明这些在海量数据上训练的强大模型可以适用于手势分析和生成。具体来说,我们使用ChatGPT作为工具,基于最少的提示给出符合语境、能够实现设计师意图的手势建议。我们还发现,ChatGPT可以建议最少训练数据中不存在的新颖而恰当的手势。使用LLMs是手势生成的一条有前景的途径,可以减少繁重的标注工作,并能灵活、快速地适应不同的设计师意图。
GPT-4 as an interface between researchers and computational software: improving usability and reproducibility
results: 研究发现,GPT-4可以生成正确和可用的输入文件,并且可以对复杂多步计算任务进行初步设置。此外,GPT-4可以从输入文件中提取计算任务的描述,并且可以根据需要进行调整,从详细的步骤指令转换为适合出版的概要描述。研究结果表明,GPT-4可以减少研究者的日常任务数量,加速新用户的培训,并提高结果的重复性。Abstract
Large language models (LLMs) are playing an increasingly important role in science and engineering. For example, their ability to parse and understand human and computer languages makes them powerful interpreters and their use in applications like code generation are well-documented. We explore the ability of the GPT-4 LLM to ameliorate two major challenges in computational materials science: i) the high barriers for adoption of scientific software associated with the use of custom input languages, and ii) the poor reproducibility of published results due to insufficient details in the description of simulation methods. We focus on a widely used software for molecular dynamics simulations, the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and quantify the usefulness of input files generated by GPT-4 from task descriptions in English and its ability to generate detailed descriptions of computational tasks from input files. We find that GPT-4 can generate correct and ready-to-use input files for relatively simple tasks and useful starting points for more complex, multi-step simulations. In addition, GPT-4's description of computational tasks from input files can be tuned from a detailed set of step-by-step instructions to a summary description appropriate for publications. Our results show that GPT-4 can reduce the number of routine tasks performed by researchers, accelerate the training of new users, and enhance reproducibility.
摘要
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
paper_authors: Chen Dun, Mirian Hipolito Garcia, Guoqing Zheng, Ahmed Hassan Awadallah, Anastasios Kyrillidis, Robert Sim
for: 这篇论文的目的是探讨如何使用混合提示(Mixture of Prompts,MoPs)和智能门控功能来调整面向多种任务和数据分布的大型语言模型(LLMs),以提高其在多任务、多源场景下的性能。
methods: 这篇论文提出使用MoPs和智能门控功能来实现这一目标。MoPs是一种组合多个提示的技术,门控功能可以根据目标任务识别不同提示组中嵌入的相关技能,并动态分配合适的专家提示组合,以提高模型的性能。
results: 实验结果表明,使用MoPs可以在多任务、多源场景下缓解提示训练"干扰",以及模型近似带来的影响。具体来说,与基线相比,MoPs在联邦场景下可以将最终困惑度降低约20%至70%,在集中式场景下降低约3%至30%。Abstract
Large Language Models (LLMs) have the ability to solve a variety of tasks, such as text summarization and mathematical questions, just out of the box, but they are often trained with a single task in mind. Due to high computational costs, the current trend is to use prompt instruction tuning to better adjust monolithic, pretrained LLMs for new -- but often individual -- downstream tasks. Thus, how one would expand prompt tuning to handle -- concomitantly -- heterogeneous tasks and data distributions is a widely open question. To address this gap, we suggest the use of \emph{Mixture of Prompts}, or MoPs, associated with smart gating functionality: the latter -- whose design is one of the contributions of this paper -- can identify relevant skills embedded in different groups of prompts and dynamically assign combined experts (i.e., collection of prompts), based on the target task. Additionally, MoPs are empirically agnostic to any model compression technique applied -- for efficiency reasons -- as well as instruction data source and task composition. In practice, MoPs can simultaneously mitigate prompt training "interference" in multi-task, multi-source scenarios (e.g., task and data heterogeneity across sources), as well as possible implications from model approximations. As a highlight, MoPs manage to decrease final perplexity from $\sim20\%$ up to $\sim70\%$, as compared to baselines, in the federated scenario, and from $\sim 3\%$ up to $\sim30\%$ in the centralized scenario.
摘要
大型语言模型(LLM)开箱即用地具备解决多种任务的能力,如文本摘要和数学问题,但它们通常是针对单一任务训练的。由于计算成本高昂,当前的趋势是使用提示指令调优来让单体的预训练LLM更好地适应新的、但通常是单个的下游任务。因此,如何将提示调优扩展到同时处理异构的任务和数据分布,仍是一个悬而未决的问题。为填补这一空白,我们建议使用"混合提示"(Mixture of Prompts,MoPs),并配以智能门控功能:后者(其设计是本文的贡献之一)可以识别不同提示组中嵌入的相关技能,并根据目标任务动态分配组合的专家(即提示集合)。此外,MoPs在经验上与出于效率考虑所应用的任何模型压缩技术无关,也与指令数据来源和任务组合无关。在实践中,MoPs可以同时缓解多任务、多源场景(例如跨源的任务和数据异构)中的提示训练"干扰",以及模型近似可能带来的影响。值得一提的是,与基线相比,MoPs在联邦场景下可以将最终困惑度降低约20%至70%,在集中式场景下降低约3%至30%。
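Below is a minimal, hedged sketch of gating over prompt experts in the spirit of MoPs: a learned gate scores each group of prompts for the current task representation and returns a weighted mixture to prepend to the frozen LLM's input. The module name, shapes, and the simple softmax gate are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PromptGate(nn.Module):
    """Hypothetical gating module over prompt experts (illustrative only)."""
    def __init__(self, hidden_dim: int, num_experts: int, prompt_len: int):
        super().__init__()
        # One learnable prompt (prompt_len x hidden_dim) per expert.
        self.prompts = nn.Parameter(torch.randn(num_experts, prompt_len, hidden_dim) * 0.02)
        self.gate = nn.Linear(hidden_dim, num_experts)

    def forward(self, task_repr: torch.Tensor) -> torch.Tensor:
        # task_repr: (batch, hidden_dim) summary of the target task/input.
        weights = torch.softmax(self.gate(task_repr), dim=-1)       # (batch, num_experts)
        # Mix expert prompts according to the gate weights.
        mixed = torch.einsum("be,eld->bld", weights, self.prompts)  # (batch, prompt_len, hidden_dim)
        return mixed  # prepended to the frozen LLM's input embeddings

gate = PromptGate(hidden_dim=768, num_experts=4, prompt_len=16)
mixed_prompt = gate(torch.randn(2, 768))
print(mixed_prompt.shape)  # torch.Size([2, 16, 768])
```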
Improving Vision Anomaly Detection with the Guidance of Language Modality
For: 这篇论文的目的是提出一种基于多模态的异常检测方法,以解决现有方法中冗余信息和稀疏隐空间的问题,应用于工业缺陷检测、事件检测等。
Methods: 这篇论文提出了两个方法来解决冗余信息和稀疏隐空间问题,即跨模态熵减少(CMER)和跨模态线性嵌入(CMLE)。CMER会遮盖原始图像的一部分,然后计算其与文本的匹配分数,并丢弃无关像素,使检测器专注于关键内容。CMLE则从语言模态中学习一个相关结构矩阵,用以指导视觉模态隐空间的学习。
Results: 实验结果显示,论文提出的方法比仅使用图像的基线方法高出16.81%。消融实验进一步证明了所提方法之间的协同作用,每个组件都依赖于另一个组件才能达到最佳性能。Abstract
Recent years have seen a surge of interest in anomaly detection for tackling industrial defect detection, event detection, etc. However, existing unsupervised anomaly detectors, particularly those for the vision modality, face significant challenges due to redundant information and sparse latent space. Conversely, the language modality performs well due to its relatively single data. This paper tackles the aforementioned challenges for vision modality from a multimodal point of view. Specifically, we propose Cross-modal Guidance (CMG), which consists of Cross-modal Entropy Reduction (CMER) and Cross-modal Linear Embedding (CMLE), to tackle the redundant information issue and sparse space issue, respectively. CMER masks parts of the raw image and computes the matching score with the text. Then, CMER discards irrelevant pixels to make the detector focus on critical contents. To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality, and then the latent space of vision modality will be learned with the guidance of the matrix. Thereafter, the vision latent space will get semantically similar images closer. Extensive experiments demonstrate the effectiveness of the proposed methods. Particularly, CMG outperforms the baseline that only uses images by 16.81%. Ablation experiments further confirm the synergy among the proposed methods, as each component depends on the other to achieve optimal performance.
摘要
近年来,异常检测在工业缺陷检测和事件检测等领域得到了广泛关注。然而,现有的无监督异常检测器,特别是视觉模态的检测器,面临着冗余信息和稀疏隐空间的挑战;而语言模态的检测器则表现良好,这是因为语言数据相对单一。这篇论文从多模态角度解决了上述挑战。我们提出了跨模态指导(CMG),它包括跨模态熵减少(CMER)和跨模态线性嵌入(CMLE),分别用于解决冗余信息问题和稀疏空间问题。CMER将原始图像的一部分掩盖,并计算其与文本的匹配得分;然后,CMER丢弃无关像素,使检测器更关注关键内容。为了为视觉异常检测器学习更加紧凑的隐空间,CMLE从语言模态中学习相关结构矩阵,视觉模态的隐空间则在该矩阵的指导下进行学习。如此一来,视觉隐空间中语义相似的图像会更加接近。广泛的实验表明了我们所提方法的有效性。特别是,CMG比仅使用图像的基线高出16.81%。消融实验还证明了所提方法之间的相互依存关系,每个组件都需要另一个组件才能达到最佳性能。
Time-Series Classification in Smart Manufacturing Systems: An Experimental Evaluation of State-of-the-Art Machine Learning Algorithms
results: 研究发现,ResNet、DrCIF、InceptionTime和ARSENAL算法在22个制造业时间序列分类数据集上的平均准确率高于96.6%。此外,LSTM、BiLSTM和TS-LSTM算法也表现出色,能够基于RNN结构在时间序列数据中捕捉特征。Abstract
Manufacturing is gathering extensive amounts of diverse data, thanks to the growing number of sensors and rapid advances in sensing technologies. Among the various data types available in SMS settings, time-series data plays a pivotal role. Hence, TSC emerges as crucial in this domain. The objective of this study is to fill this gap by providing a rigorous experimental evaluation of the SoTA ML and DL algorithms for TSC tasks in manufacturing and industrial settings. We first explored and compiled a comprehensive list of more than 92 SoTA algorithms from both TSC and manufacturing literature. Following this, we selected the 36 most representative algorithms from this list. To evaluate their performance across various manufacturing classification tasks, we curated a set of 22 manufacturing datasets, representative of different characteristics that cover diverse manufacturing problems. Subsequently, we implemented and evaluated the algorithms on the manufacturing benchmark datasets, and analyzed the results for each dataset. Based on the results, ResNet, DrCIF, InceptionTime, and ARSENAL are the top-performing algorithms, boasting an average accuracy of over 96.6% across all 22 manufacturing TSC datasets. These findings underscore the robustness, efficiency, scalability, and effectiveness of convolutional kernels in capturing temporal features in time-series data, as three out of the top four performing algorithms leverage these kernels for feature extraction. Additionally, LSTM, BiLSTM, and TS-LSTM algorithms deserve recognition for their effectiveness in capturing features within time-series data using RNN-based structures.
摘要
制造业正在收集大量多样的数据,这主要归功于传感器数量的增加和感知技术的快速发展。在SMS设置中,时序数据扮演着关键角色,因此TSC在这个领域变得非常重要。本研究的目的是填补这一空白,对SoTA的机器学习和深度学习算法在制造业和工业环境中的TSC任务进行严格的实验评估。我们首先从TSC和制造业两方面的文献中搜集并整理了92种以上的SoTA算法,然后从中选出36个最具代表性的算法。为了评估这些算法在不同制造分类任务中的性能,我们准备了22个制造业数据集,这些数据集代表了不同的特征,覆盖了多种制造问题。接下来,我们在这些制造业基准数据集上实现并评估了这些算法,并分析了每个数据集的结果。根据结果,ResNet、DrCIF、InceptionTime和ARSENAL算法在22个制造业TSC数据集上的平均准确率高于96.6%。这些结果表明卷积核在捕捉时序数据的时间特征方面具有稳健性、效率、可扩展性和有效性,因为表现最好的四个算法中有三个利用卷积核进行特征提取。此外,LSTM、BiLSTM和TS-LSTM算法基于RNN结构捕捉时序特征的效果也值得肯定。
Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
paper_authors: Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
results: 实验结果表明,使用GROOVE方法可以在不同环境中达到更高的泛化性,并且AR被证明是UED中的一个关键组成部分。Abstract
The past decade has seen vast progress in deep reinforcement learning (RL) on the back of algorithms manually designed by human researchers. Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks. Despite impressive initial results from algorithms such as Learned Policy Gradient (LPG), there remains a generalization gap when these algorithms are applied to unseen environments. In this work, we examine how characteristics of the meta-training distribution impact the generalization performance of these algorithms. Motivated by this analysis and building on ideas from Unsupervised Environment Design (UED), we propose a novel approach for automatically generating curricula to maximize the regret of a meta-learned optimizer, in addition to a novel approximation of regret, which we name algorithmic regret (AR). The result is our method, General RL Optimizers Obtained Via Environment Design (GROOVE). In a series of experiments, we show that GROOVE achieves superior generalization to LPG, and evaluate AR against baseline metrics from UED, identifying it as a critical component of environment design in this setting. We believe this approach is a step towards the discovery of truly general RL algorithms, capable of solving a wide range of real-world environments.
摘要
过去十年,在人工设计的算法支持下,深度强化学习(RL)取得了巨大的进步。最近,人们发现可以对更新规则进行元学习,以期发现能在多种RL任务上表现良好的算法。尽管LPG等算法取得了令人印象深刻的初步成果,但这些算法应用于未见过的环境时仍然存在泛化差距。在这项工作中,我们分析了元训练分布的特性对这些算法泛化性能的影响。受该分析启发,并基于无监督环境设计(UED)的思想,我们提出了一种自动生成课程以最大化元学习优化器遗憾(regret)的新方法,以及一种新的遗憾近似,我们称之为算法遗憾(AR)。由此得到我们的方法:通过环境设计获得的通用RL优化器(GROOVE)。在一系列实验中,我们证明GROOVE比LPG具有更好的泛化性,并将AR与UED的基线指标进行了对比评估,确认它是该设定下环境设计的关键组成部分。我们认为这种方法是迈向发现能够求解多种现实环境的真正通用RL算法的一步。
Integrating UMLS Knowledge into Large Language Models for Medical Question Answering
results: 研究结果显示,这个框架可以有效提高生成内容的实际性、完整性和相关性。多名医生进行了盲测评估,结果表明这个框架可以增强生成内容的质量。Abstract
Large language models (LLMs) have demonstrated powerful text generation capabilities, bringing unprecedented innovation to the healthcare field. While LLMs hold immense promise for applications in healthcare, applying them to real clinical scenarios presents significant challenges, as these models may generate content that deviates from established medical facts and even exhibit potential biases. In our research, we develop an augmented LLM framework based on the Unified Medical Language System (UMLS), aiming to better serve the healthcare community. We employ LLaMa2-13b-chat and ChatGPT-3.5 as our benchmark models, and conduct automatic evaluations using the ROUGE Score and BERTScore on 104 questions from the LiveQA test set. Additionally, we establish criteria for physician-evaluation based on four dimensions: Factuality, Completeness, Readability and Relevancy. ChatGPT-3.5 is used for physician evaluation with 20 questions on the LiveQA test set. Multiple resident physicians conducted blind reviews to evaluate the generated content, and the results indicate that this framework effectively enhances the factuality, completeness, and relevance of generated content. Our research demonstrates the effectiveness of using UMLS-augmented LLMs and highlights the potential application value of LLMs in medical question-answering.
摘要
大型语言模型(LLMs)已经展示出强大的文本生成能力,为医疗领域带来前所未有的创新。然而,将LLMs应用到实际的临床场景中存在重大挑战,因为这些模型可能会生成偏离既有医学事实的内容,甚至表现出潜在偏见。在我们的研究中,我们开发了基于统一医学语言系统(UMLS)的增强LLM框架,以更好地服务医疗社区。我们使用LLaMa2-13b-chat和ChatGPT-3.5作为基准模型,并通过ROUGE Score和BERTScore对LiveQA测试集中的104个问题进行自动评估。此外,我们建立了基于四个维度的医生评估标准:事实性、完整性、可读性和相关性。ChatGPT-3.5被用于医生评估,使用LiveQA测试集中的20个问题。多名住院医生进行了盲评,以评价生成内容,结果表明该框架有效地提高了生成内容的事实性、完整性和相关性。我们的研究证明了使用UMLS增强LLMs的有效性,并展示了LLMs在医学问答中的应用前景。
Spike Accumulation Forwarding for Effective Training of Spiking Neural Networks
results: 我们通过实验证明了 SAF 可以降低内存和训练时间,保持精度。具体来说,我们在 MNIST 和 CIFAR-10 数据集上进行了实验,结果显示,使用 SAF 可以在内存和训练时间方面减少一半,而且精度保持在原始精度水平。Abstract
In this article, we propose a new paradigm for training spiking neural networks (SNNs), spike accumulation forwarding (SAF). It is known that SNNs are energy-efficient but difficult to train. Consequently, many researchers have proposed various methods to solve this problem, among which online training through time (OTTT) is a method that allows inferring at each time step while suppressing the memory cost. However, to compute efficiently on GPUs, OTTT requires operations with spike trains and weighted summation of spike trains during forwarding. In addition, OTTT has shown a relationship with the Spike Representation, an alternative training method, though theoretical agreement with Spike Representation has yet to be proven. Our proposed method can solve these problems; namely, SAF can halve the number of operations during the forward process, and it can be theoretically proven that SAF is consistent with the Spike Representation and OTTT, respectively. Furthermore, we confirmed the above contents through experiments and showed that it is possible to reduce memory and training time while maintaining accuracy.
摘要
在本文中,我们提出了一种训练脉冲神经网络(SNN)的新范式,即脉冲累积前向(SAF)。众所周知,SNN具有较高的能效,但训练困难。因此,许多研究人员提出了多种方法来解决这一问题,其中在线逐时训练(OTTT)是一种可以在每个时间步进行推理、同时抑制内存开销的方法。然而,为了在GPU上高效计算,OTTT在前向过程中需要对脉冲序列进行操作并对脉冲序列进行加权求和。此外,OTTT已被证明与另一种训练方法——脉冲表示(Spike Representation)之间存在联系,但二者在理论上的一致性尚未得到证明。我们提出的方法可以解决这些问题:SAF可以将前向过程中的操作数量减半,并且可以从理论上证明SAF分别与脉冲表示和OTTT一致。此外,我们通过实验验证了上述内容,并证明可以在保持准确性的同时降低内存和训练时间。
Modified LAB Algorithm with Clustering-based Search Space Reduction Method for solving Engineering Design Problems
results: 该算法在CEC 2005 和 CEC 2017 的标准测试问题上进行验证,并表现出了改善的robustness和搜索空间探索能力。此外,与其他最近的metaheuristic算法进行比较,该算法的结果也表现出了优越性。Abstract
A modified LAB algorithm is introduced in this paper. It builds upon the original LAB algorithm (Reddy et al. 2023), which is a socio-inspired algorithm that models competitive and learning behaviours within a group, establishing hierarchical roles. The proposed algorithm incorporates the roulette wheel approach and a reduction factor introducing inter-group competition and iteratively narrowing down the sample space. The algorithm is validated by solving the benchmark test problems from CEC 2005 and CEC 2017. The solutions are validated using standard statistical tests such as two-sided and pairwise signed rank Wilcoxon test and Friedman rank test. The algorithm exhibited improved and superior robustness as well as search space exploration capabilities. Furthermore, a Clustering-Based Search Space Reduction (C-SSR) method is proposed, making the algorithm capable to solve constrained problems. The C-SSR method enables the algorithm to identify clusters of feasible regions, satisfying the constraints and contributing to achieve the optimal solution. This method demonstrates its effectiveness as a potential alternative to traditional constraint handling techniques. The results obtained using the Modified LAB algorithm are then compared with those achieved by other recent metaheuristic algorithms.
摘要
本文引入一种改进的LAB算法。它基于原始LAB算法(Reddy等,2023),该算法是一种受社会行为启发的算法,模拟群体内的竞争和学习行为,并建立层次角色。所提出的算法引入了轮盘赌(roulette wheel)方法和一个缩减因子,从而引入组间竞争并迭代缩小样本空间。该算法通过求解CEC 2005和CEC 2017的基准测试问题进行验证,并使用标准统计检验(如双边及成对符号秩Wilcoxon检验和Friedman秩检验)对解进行验证。算法表现出更好的稳健性和搜索空间探索能力。此外,本文还提出了一种基于聚类的搜索空间缩减(C-SSR)方法,使算法能够求解带约束的问题。C-SSR方法使算法能够识别满足约束的可行区域簇,从而有助于获得最优解。该方法证明了其作为传统约束处理技术替代方案的有效性。最后,将改进LAB算法的结果与其他近期元启发式算法的结果进行了比较。
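For readers unfamiliar with the roulette wheel approach mentioned above, here is a minimal sketch of generic fitness-proportionate selection; it illustrates the operator itself, not how the modified LAB algorithm combines it with the reduction factor or inter-group competition.

```python
import random

def roulette_wheel_select(population, fitness, rng=random):
    """Fitness-proportionate (roulette wheel) selection: each candidate is
    picked with probability fitness_i / sum(fitness)."""
    total = sum(fitness)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if running >= pick:
            return individual
    return population[-1]  # numerical safety fallback

candidates = ["A", "B", "C", "D"]
scores = [1.0, 3.0, 5.0, 1.0]  # higher fitness -> higher selection probability
print([roulette_wheel_select(candidates, scores) for _ in range(10)])
```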
results: 我们的模型可以生成多种独特的3D人头,并且可以在不同的游戏和动画场景中使用,同时也可以提供高质量的渲染图像。我们还引入了一些量化的metric,可以衡量模型的性能和多样性。您可以在https://munch-seven.vercel.app/中找到示例和数据集下载。Abstract
The automated generation of 3D human heads has been an intriguing and challenging task for computer vision researchers. Prevailing methods synthesize realistic avatars but with limited control over the diversity and quality of rendered outputs and suffer from limited correlation between shape and texture of the character. We propose a method that offers quality, diversity, control, and realism along with explainable network design, all desirable features to game-design artists in the domain. First, our proposed Geometry Generator identifies disentangled latent directions and generates novel and diverse samples. A Render Map Generator then learns to synthesize multiple high-fidelity physically-based render maps including Albedo, Glossiness, Specular, and Normals. For artists preferring fine-grained control over the output, we introduce a novel Color Transformer Model that allows semantic color control over generated maps. We also introduce quantifiable metrics called Uniqueness and Novelty and a combined metric to test the overall performance of our model. Demo for both shapes and textures can be found: https://munch-seven.vercel.app/. We will release our model along with the synthetic dataset.
摘要
自动生成3D人头一直是计算机视觉研究者们感兴趣且具有挑战性的任务。现有方法可以合成逼真的虚拟形象,但对渲染输出的多样性和质量控制有限,并且角色的形状与纹理之间的关联性不足。我们提出了一种方法,它同时具备高质量、多样性、可控性和真实感,并采用可解释的网络设计,这些都是该领域游戏设计师所需要的特性。首先,我们提出的Geometry Generator可以识别解耦的隐空间方向,并生成新颖且多样的样本。然后,Render Map Generator学习合成多种高保真的基于物理渲染的贴图,包括Albedo、Glossiness、Specular和Normals。对于偏好细粒度控制输出的艺术家,我们引入了一种新的Color Transformer模型,允许对生成的贴图进行语义颜色控制。我们还引入了Uniqueness和Novelty这两个可量化指标以及一个组合指标,以评估模型的整体性能。形状和纹理的演示可以在https://munch-seven.vercel.app/找到。我们将发布我们的模型和合成数据集。
Inclusive Data Representation in Federated Learning: A Novel Approach Integrating Textual and Visual Prompt
results: 与基线相比表现出色,提高了客户端模型的全局知识获取能力,并增强了模型的稳健性和紧凑性。Abstract
Federated Learning (FL) is often impeded by communication overhead issues. Prompt tuning, as a potential solution, has been introduced to only adjust a few trainable parameters rather than the whole model. However, current single-modality prompt tuning approaches fail to comprehensively portray local clients' data. To overcome this limitation, we present Twin Prompt Federated learning (TPFL), a pioneering solution that integrates both visual and textual modalities, ensuring a more holistic representation of local clients' data characteristics. Furthermore, in order to tackle the data heterogeneity issues, we introduce the Augmented TPFL (ATPFL) employing the contrastive learning to TPFL, which not only enhances the global knowledge acquisition of client models but also fosters the development of robust, compact models. The effectiveness of TPFL and ATPFL is substantiated by our extensive evaluations, consistently showing superior performance compared to all baselines.
摘要
联邦学习(FL)常常受到通信开销问题的困扰。提示调优(prompt tuning)作为一种潜在的解决方案被提出,它只调整少量可训练参数,而非整个模型。然而,现有的单模态提示调优方法无法全面刻画本地客户端的数据特征。为了克服这一限制,我们提出了双提示联邦学习(Twin Prompt Federated Learning,TPFL),这是一种开创性的方案,整合了视觉和文本两种模态,以更全面地表征本地客户端的数据特征。此外,为了解决数据异构问题,我们提出了增强版TPFL(ATPFL),在TPFL中引入对比学习,这不仅提高了客户端模型的全局知识获取能力,还促进了稳健、紧凑模型的形成。我们对TPFL和ATPFL进行了广泛的评估,结果一致表明其性能优于所有基线。
Functional trustworthiness of AI systems by statistically valid testing
results: 提出了三个必需元素来确保AI系统的可靠性和可信worthiness,即(1)应用领域的技术分布定义,(2)基于风险的最低性能要求,和(3)基于独立随机抽样的统计测试。Abstract
The authors are concerned about the safety, health, and rights of the European citizens due to inadequate measures and procedures required by the current draft of the EU Artificial Intelligence (AI) Act for the conformity assessment of AI systems. We observe that not only the current draft of the EU AI Act, but also the accompanying standardization efforts in CEN/CENELEC, have resorted to the position that real functional guarantees of AI systems supposedly would be unrealistic and too complex anyways. Yet enacting a conformity assessment procedure that creates the false illusion of trust in insufficiently assessed AI systems is at best naive and at worst grossly negligent. The EU AI Act thus misses the point of ensuring quality by functional trustworthiness and correctly attributing responsibilities. The trustworthiness of an AI decision system lies first and foremost in the correct statistical testing on randomly selected samples and in the precision of the definition of the application domain, which enables drawing samples in the first place. We will subsequently call this testable quality functional trustworthiness. It includes a design, development, and deployment that enables correct statistical testing of all relevant functions. We are firmly convinced and advocate that a reliable assessment of the statistical functional properties of an AI system has to be the indispensable, mandatory nucleus of the conformity assessment. In this paper, we describe the three necessary elements to establish a reliable functional trustworthiness, i.e., (1) the definition of the technical distribution of the application, (2) the risk-based minimum performance requirements, and (3) the statistically valid testing based on independent random samples.
摘要
作者们担忧,现行欧盟人工智能(AI)法草案中针对AI系统符合性评估所要求的措施和程序不足,可能危及欧洲公民的安全、健康和权利。我们注意到,不仅现行的欧盟AI法草案,连同CEN/CENELEC中相应的标准化工作,都持这样的立场:对AI系统给出真正的功能性保证被认为不现实且过于复杂。然而,实施一种对评估不充分的AI系统制造虚假信任感的符合性评估程序,往好了说是天真,往坏了说是严重失职。欧盟AI法因此错失了通过功能可信性来保证质量并正确划分责任的要点。AI决策系统的可信性首先取决于在随机抽取的样本上进行正确的统计检验,以及对应用领域的精确定义——后者是抽样得以进行的前提。我们将这种可检验的质量称为功能可信性。它包括使所有相关功能都能被正确统计检验的设计、开发和部署。我们坚信并主张,对AI系统统计功能特性的可靠评估必须是符合性评估中不可或缺的、强制性的核心。在这篇论文中,我们描述了建立可靠功能可信性所需的三个要素:1. 应用领域的技术分布定义;2. 基于风险的最低性能要求;3. 基于独立随机抽样的统计检验。这三个要素是建立AI系统可靠功能可信性的必要条件。
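As one concrete illustration of statistically valid testing on independent random samples, below is a minimal sketch of a one-sided confidence bound on a system's true success probability. It is a generic Hoeffding-style bound under an i.i.d. sampling assumption, not the concrete conformity-assessment procedure the authors prescribe; the sample counts are illustrative.

```python
import math

def hoeffding_lower_bound(successes: int, n: int, alpha: float = 0.05) -> float:
    """One-sided Hoeffding lower confidence bound on the true success
    probability, given `successes` correct outcomes out of `n`
    independent, randomly drawn test samples."""
    p_hat = successes / n
    margin = math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    return max(0.0, p_hat - margin)

# e.g. 970 correct out of 1000 randomly drawn samples from the defined
# application-domain distribution:
print(hoeffding_lower_bound(970, 1000))  # ~0.93 at 95% confidence
```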
Online Clustering of Bandits with Misspecified User Models
results: 本文证明了其算法的 regret 上界为 $O(\epsilon_*T\sqrt{md\log T} + d\sqrt{mT}\log T)$,该上界在比以往 CB 工作更温和的假设下成立,不再需要对臂分布的限制性技术假设。实验结果表明其算法在合成数据和真实数据上表现出色,超过了之前的算法。Abstract
The contextual linear bandit is an important online learning problem where given arm features, a learning agent selects an arm at each round to maximize the cumulative rewards in the long run. A line of works, called the clustering of bandits (CB), utilize the collaborative effect over user preferences and have shown significant improvements over classic linear bandit algorithms. However, existing CB algorithms require well-specified linear user models and can fail when this critical assumption does not hold. Whether robust CB algorithms can be designed for more practical scenarios with misspecified user models remains an open problem. In this paper, we are the first to present the important problem of clustering of bandits with misspecified user models (CBMUM), where the expected rewards in user models can be perturbed away from perfect linear models. We devise two robust CB algorithms, RCLUMB and RSCLUMB (representing the learned clustering structure with dynamic graph and sets, respectively), that can accommodate the inaccurate user preference estimations and erroneous clustering caused by model misspecifications. We prove regret upper bounds of $O(\epsilon_*T\sqrt{md\log T} + d\sqrt{mT}\log T)$ for our algorithms under milder assumptions than previous CB works (notably, we move past a restrictive technical assumption on the distribution of the arms), which match the lower bound asymptotically in $T$ up to logarithmic factors, and also match the state-of-the-art results in several degenerate cases. The techniques in proving the regret caused by misclustering users are quite general and may be of independent interest. Experiments on both synthetic and real-world data show our outperformance over previous algorithms.
摘要
上下文线性bandit是一个重要的在线学习问题:给定臂的特征,学习代理在每一轮选择一个臂,以最大化长期累积奖励。一系列被称为bandit聚类(CB)的工作利用了用户偏好间的协同效应,并表现出比经典线性bandit算法显著更好的性能。然而,现有的CB算法需要良好设定的线性用户模型,当这一关键假设不成立时可能失效。能否为用户模型误设这一更贴近实际的场景设计稳健的CB算法,仍是一个开放问题。在这篇论文中,我们首次提出了用户模型误设下的bandit聚类(CBMUM)问题,其中用户模型中的期望奖励可能偏离完美的线性模型。我们设计了两种稳健的CB算法,即RCLUMB和RSCLUMB(分别用动态图和集合表示所学的聚类结构),它们可以容忍模型误设导致的用户偏好估计不准确和聚类错误。我们证明了算法的遗憾上界为 $O(\epsilon_*T\sqrt{md\log T} + d\sqrt{mT}\log T)$,该上界在比以往CB工作更温和的假设下成立(特别地,我们摆脱了对臂分布的一个限制性技术假设),在对数因子范围内渐近匹配关于 $T$ 的下界,并在若干退化情形下匹配当前最优结果。我们用于证明用户误聚类所致遗憾的技术相当通用,可能具有独立的研究价值。在合成数据和真实数据上的实验表明,我们的算法优于以往算法。
scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain
results: 作者比较了scHyena与其他基准方法在下游任务上的性能,包括细胞类型分类和scRNA-seq插补,结果表明scHyena表现更好。Abstract
Single-cell RNA sequencing (scRNA-seq) has made significant strides in unraveling the intricate cellular diversity within complex tissues. This is particularly critical in the brain, presenting a greater diversity of cell types than other tissue types, to gain a deeper understanding of brain function within various cellular contexts. However, analyzing scRNA-seq data remains a challenge due to inherent measurement noise stemming from dropout events and the limited utilization of extensive gene expression information. In this work, we introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. Specifically, inspired by the recent Hyena operator, we design a novel Transformer architecture called single-cell Hyena (scHyena) that is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data. In particular, our model learns generalizable features of cells and genes through pre-training scHyena using the full length of scRNA-seq data. We demonstrate the superior performance of scHyena compared to other benchmark methods in downstream tasks, including cell type classification and scRNA-seq imputation.
摘要
单细胞RNA测序(scRNA-seq)技术在揭示复杂组织中细胞多样性方面取得了重要进展。这在脑组织中尤为关键,因为脑中的细胞类型多样性高于其他组织类型,需要更深入地理解不同细胞背景下的脑功能。然而,分析scRNA-seq数据仍然是一项挑战,因为存在由dropout事件引起的固有测量噪声,并且大量基因表达信息未被充分利用。在这项工作中,我们引入scHyena,一种旨在解决这些挑战并提高脑部scRNA-seq分析准确性的基础模型。具体来说,受近期Hyena算子的启发,我们设计了一种新的Transformer架构,称为单细胞Hyena(scHyena),它配备了线性适配层、基于基因嵌入的位置编码以及双向Hyena算子。这使我们能够处理全长scRNA-seq数据,而不丢失原始数据中的任何信息。特别地,我们的模型通过使用全长scRNA-seq数据对scHyena进行预训练,学习细胞和基因的可泛化特征。在细胞类型分类和scRNA-seq插补等下游任务中,scHyena的表现优于其他基准方法。
ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF
results: 实验结果表明,ED-NeRF 可以在 editing 速度和输出质量两个方面比前一代 3D 编辑模型表现更优,而且它的编辑速度比传统的图像空间 NeRF 编辑更快。Abstract
Recently, there has been a significant advancement in text-to-image diffusion models, leading to groundbreaking performance in 2D image generation. These advancements have been extended to 3D models, enabling the generation of novel 3D objects from textual descriptions. This has evolved into NeRF editing methods, which allow the manipulation of existing 3D objects through textual conditioning. However, existing NeRF editing techniques have faced limitations in their performance due to slow training speeds and the use of loss functions that do not adequately consider editing. To address this, here we present a novel 3D NeRF editing approach dubbed ED-NeRF by successfully embedding real-world scenes into the latent space of the latent diffusion model (LDM) through a unique refinement layer. This approach enables us to obtain a NeRF backbone that is not only faster but also more amenable to editing compared to traditional image space NeRF editing. Furthermore, we propose an improved loss function tailored for editing by migrating the delta denoising score (DDS) distillation loss, originally used in 2D image editing to the three-dimensional domain. This novel loss function surpasses the well-known score distillation sampling (SDS) loss in terms of suitability for editing purposes. Our experimental results demonstrate that ED-NeRF achieves faster editing speed while producing improved output quality compared to state-of-the-art 3D editing models.
摘要
最近,文本到图像扩散模型取得了重大进展,在2D图像生成方面实现了突破性性能。这些进展已被扩展到3D模型,使得可以根据文本描述生成新的3D对象,并进一步演变为NeRF编辑方法,允许通过文本条件来修改现有的3D对象。然而,现有的NeRF编辑技术受到训练速度慢以及所用损失函数未充分考虑编辑需求的限制。为了解决这个问题,我们提出了一种新的3D NeRF编辑方法,称为ED-NeRF。该方法通过一个独特的精炼层,成功地将真实世界场景嵌入到潜在扩散模型(LDM)的隐空间中。这一做法使我们获得了一个不仅更快、而且相比传统图像空间NeRF编辑更适合编辑的NeRF骨干。此外,我们提出了一种为编辑量身定制的改进损失函数,将原本用于2D图像编辑的delta去噪分数(DDS)蒸馏损失迁移到三维领域;这种新的损失函数在编辑适用性上超过了著名的分数蒸馏采样(SDS)损失。我们的实验结果表明,与当前最优的3D编辑模型相比,ED-NeRF可以实现更快的编辑速度,并生成质量更高的输出。
Continual Contrastive Spoken Language Understanding
for: This paper focuses on the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting, with the goal of preserving the learned representations and improving the model’s ability to learn new tasks continually.
methods: The proposed method, called COCONUT, combines experience replay and contrastive learning to preserve the learned representations and learn more discriminative representations of the new data. The method uses a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, and also leverages a multimodal contrastive loss to align audio and text features.
results: The experiments on two established SLU datasets show the effectiveness of the proposed approach, with significant improvements over the baselines. The method is also shown to be combinable with methods that operate on the decoder side of the model, resulting in further metrics improvements.Abstract
Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from scratch is almost always impractical. In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning. Through a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, COCONUT preserves the learned representations by pulling closer samples from the same class and pushing away the others. Moreover, we leverage a multimodal contrastive loss that helps the model learn more discriminative representations of the new data by aligning audio and text features. We also investigate different contrastive designs to combine the strengths of the contrastive loss with teacher-student architectures used for distillation. Experiments on two established SLU datasets reveal the effectiveness of our proposed approach and significant improvements over the baselines. We also show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.
摘要
近期,神经网络在多个领域取得了令人瞩目的进展,语音处理也不例外。然而,该领域的最新突破需要使用大规模数据集和大量计算资源进行离线训练。遗憾的是,这些模型在持续学习新任务时难以保留先前获得的知识,而从头重新训练几乎总是不切实际的。在这篇论文中,我们研究了类增量学习(CIL)设定下口语理解的序列到序列模型学习问题,并提出了COCONUT,一种结合经验回放与对比学习的CIL方法。通过仅对回放样本应用一种改进版的标准有监督对比损失,COCONUT将同类样本拉近、将异类样本推远,从而保持已学到的表示。此外,我们利用多模态对比损失,通过对齐音频和文本特征,帮助模型学习更具判别性的新数据表示。我们还研究了不同的对比设计,以结合对比损失与用于蒸馏的教师-学生架构的优势。在两个公认的SLU数据集上的实验表明了我们所提方法的有效性,并显著优于基线。我们还证明了COCONUT可以与作用于模型解码器端的方法结合使用,从而进一步提升各项指标。
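A minimal sketch of the standard supervised contrastive loss that COCONUT modifies, shown here in its generic form over an arbitrary batch (the paper applies its modified version only to rehearsal samples); the batch of random features and labels is illustrative.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Standard supervised contrastive loss: for each anchor, pull together
    samples sharing its label and push away the rest."""
    features = F.normalize(features, dim=-1)
    logits = features @ features.T / temperature              # pairwise similarities
    batch = features.size(0)
    self_mask = torch.eye(batch, dtype=torch.bool, device=features.device)
    logits = logits.masked_fill(self_mask, float("-inf"))     # exclude self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    # Average log-probability over each anchor's positives.
    per_anchor = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count
    return (-per_anchor).mean()

feats = torch.randn(8, 128)
labs = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(supervised_contrastive_loss(feats, labs))
```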
Bridging the Domain Gap by Clustering-based Image-Text Graph Matching
results: 我们在大规模公共数据集(CUB-DG和DomainBed)上进行实验,取得了与当前最优方法相当或更好的性能。我们的代码将在论文发表时公开。Abstract
Learning domain-invariant representations is important to train a model that can generalize well to unseen target task domains. Text descriptions inherently contain semantic structures of concepts and such auxiliary semantic cues can be used as effective pivot embedding for domain generalization problems. Here, we use multimodal graph representations, fusing images and text, to get domain-invariant pivot embeddings by considering the inherent semantic structure between local images and text descriptors. Specifically, we aim to learn domain-invariant features by (i) representing the image and text descriptions with graphs, and by (ii) clustering and matching the graph-based image node features into textual graphs simultaneously. We experiment with large-scale public datasets, such as CUB-DG and DomainBed, and our model achieves matched or better state-of-the-art performance on these datasets. Our code will be publicly available upon publication.
摘要
学习域不变表示对于训练能够很好地泛化到未见目标任务域的模型十分重要。文本描述本身蕴含概念的语义结构,这类辅助语义线索可以作为域泛化问题中有效的枢轴嵌入。在这里,我们使用融合图像与文本的多模态图表示,通过考虑局部图像与文本描述之间固有的语义结构,获得域不变的枢轴嵌入。具体来说,我们的目标是通过以下方式学习域不变特征:(i) 用图来表示图像和文本描述;(ii) 同时将基于图的图像节点特征聚类并与文本图进行匹配。我们在大规模公共数据集(如CUB-DG和DomainBed)上进行了实验,我们的模型取得了与当前最优相当或更好的性能。我们的代码将在论文发表时公开。
For: The paper is written for novel view synthesis and camera motion estimation in the presence of rolling shutter (RS) images.
Methods: The paper proposes a method called Unrolling Shutter Bundle Adjusted Neural Radiance Fields (USB-NeRF) that corrects RS distortions and recovers accurate camera motion trajectory using a physical image formation model.
Results: The paper demonstrates better performance compared to prior works in terms of RS effect removal, novel view image synthesis, and camera motion estimation, as well as the ability to recover high-fidelity high frame-rate global shutter video from a sequence of RS images.Abstract
Neural Radiance Fields (NeRF) has received much attention recently due to its impressive capability to represent 3D scene and synthesize novel view images. Existing works usually assume that the input images are captured by a global shutter camera. Thus, rolling shutter (RS) images cannot be trivially applied to an off-the-shelf NeRF algorithm for novel view synthesis. Rolling shutter effect would also affect the accuracy of the camera pose estimation (e.g. via COLMAP), which further prevents the success of NeRF algorithm with RS images. In this paper, we propose Unrolling Shutter Bundle Adjusted Neural Radiance Fields (USB-NeRF). USB-NeRF is able to correct rolling shutter distortions and recover accurate camera motion trajectory simultaneously under the framework of NeRF, by modeling the physical image formation process of a RS camera. Experimental results demonstrate that USB-NeRF achieves better performance compared to prior works, in terms of RS effect removal, novel view image synthesis as well as camera motion estimation. Furthermore, our algorithm can also be used to recover high-fidelity high frame-rate global shutter video from a sequence of RS images.
摘要
神经辐射场(NeRF)近年来受到广泛关注,因为它能够出色地表示3D场景并合成新视角图像。现有工作通常假设输入图像由全局快门相机拍摄,因此卷帘快门(RS)图像无法直接应用于现成的NeRF算法进行新视角合成。卷帘快门效应还会影响相机位姿估计的精度(例如通过COLMAP),这进一步阻碍了NeRF算法在RS图像上的成功。在这篇论文中,我们提出了Unrolling Shutter Bundle Adjusted Neural Radiance Fields(USB-NeRF)。USB-NeRF通过对RS相机的物理成像过程建模,能够在NeRF框架下同时校正卷帘快门畸变并恢复准确的相机运动轨迹。实验结果表明,在去除RS效应、新视角图像合成以及相机运动估计方面,USB-NeRF都优于先前的工作。此外,我们的算法还可以从一段RS图像序列中恢复高保真、高帧率的全局快门视频。
Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing
results: 通过在BERT和GPT等流行的Transformer模型上进行实验,表明Memoria可以有效提升模型考虑长期依赖的能力,并在排序、语言建模和长文本分类等多种任务中表现出色。Abstract
Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian theory which is a major theory explaining human memory formulation to enhance long-term dependencies in neural networks. Memoria stores and retrieves information called engram at multiple memory levels of working memory, short-term memory, and long-term memory, using connection weights that change according to Hebb's rule. Through experiments with popular Transformer-based models like BERT and GPT, we present that Memoria significantly improves the ability to consider long-term dependencies in various tasks. Results show that Memoria outperformed existing methodologies in sorting and language modeling, and long text classification.
摘要
Transformer已经在多个领域和任务中证明了其成功。然而,由于容量有限,Transformer难以处理长输入序列。虽然一种解决办法是增加输入长度,但无限制地拉长输入并不现实。此外,人类会选择性地记忆并只使用输入中的相关信息,而Transformer则从头到尾处理所有原始数据。我们提出Memoria,一种通用的记忆网络,它应用解释人类记忆形成的主要理论——赫布理论(Hebbian theory),以增强神经网络中的长期依赖。Memoria在工作记忆、短期记忆和长期记忆等多个记忆层级中存储并检索被称为记忆痕(engram)的信息,其连接权重按照赫布法则变化。通过在BERT和GPT等流行的Transformer模型上进行实验,我们表明Memoria显著提升了模型在各种任务中考虑长期依赖的能力。结果显示,Memoria在排序、语言建模以及长文本分类任务中的表现超越了现有方法。
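For context on the Hebbian rule referenced above, here is a minimal sketch of a Hebb-style weight update with a small decay term. It illustrates the classical rule only, not Memoria's engram storage across working, short-term, and long-term memory; the learning rate and decay values are illustrative assumptions.

```python
import numpy as np

def hebbian_update(weights: np.ndarray,
                   pre: np.ndarray,
                   post: np.ndarray,
                   lr: float = 0.01,
                   decay: float = 0.001) -> np.ndarray:
    """Hebb's rule with decay: connections between units that fire together
    are strengthened (delta_w = lr * post * pre)."""
    return (1.0 - decay) * weights + lr * np.outer(post, pre)

rng = np.random.default_rng(0)
w = np.zeros((4, 3))           # 3 presynaptic units -> 4 postsynaptic units
for _ in range(100):
    pre = rng.random(3)
    post = rng.random(4)
    w = hebbian_update(w, pre, post)
print(w.round(3))
```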
results: 我们提出了Diffusion Generative Flow Samplers (DGFS),一种基于抽样的框架,通过参数化一个额外的"流函数",可以将学习过程分解为较短的部分轨迹段。我们的方法受到生成流网络(GFlowNets)理论的启发,使我们能够利用中间学习信号和离策略探索能力。通过多种具有挑战性的实验,我们表明DGFS比密切相关的先前方法能更准确地估计归一化常数。Abstract
We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment issues due to use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals and benefit from off-policy exploration capabilities. Through a variety of challenging experiments, we demonstrate that DGFS results in more accurate estimates of the normalization constant than closely-related prior methods.
摘要
我们解决从难以处理的高维密度函数中抽样的问题,这是机器学习和统计中常见的基本任务。我们扩展了近期基于抽样的方法,这些方法利用受控随机过程来建模目标密度的近似样本。这些方法的主要缺点是训练目标需要完整的轨迹才能计算,由于使用整条轨迹且学习信号只出现在终止时刻,导致迟缓的信用分配问题。在这项工作中,我们提出Diffusion Generative Flow Samplers(DGFS),这是一个基于抽样的框架,通过参数化一个额外的"流函数",可以将学习过程切分为较短的部分轨迹段。我们的方法受到为生成流网络(GFlowNets)发展的理论的启发,使我们能够利用中间学习信号并受益于离策略探索能力。通过一系列具有挑战性的实验,我们证明DGFS比密切相关的先前方法能更准确地估计归一化常数。
results: 实验验证结果表明,该方法可以在与地理位置相关的任务上提供更好的性能,并且可以在不同的坐标系上进行可视化。Abstract
Position encoding is the primary mechanism which induces notion of sequential order for input tokens in transformer architectures. Even though this formulation in the original transformer paper has yielded plausible performance for general purpose language understanding and generation, several new frameworks such as Rotary Position Embedding (RoPE) are proposed for further enhancement. In this paper, we introduce the notion of "geotokens" which are input elements for transformer architectures, each representing an information related to a geological location. Unlike the natural language the sequential position is not important for the model but the geographical coordinates are. In order to induce the concept of relative position for such a setting and maintain the proportion between the physical distance and distance on embedding space, we formulate a position encoding mechanism based on RoPE architecture which is adjusted for spherical coordinates.
摘要
“位置编码是Transformer架构中为输入标记(token)引入顺序概念的主要机制。尽管原始Transformer论文中的这一设计在通用语言理解和生成任务上取得了可观的性能,但仍有诸如旋转位置嵌入(Rotary Position Embedding,RoPE)等新框架被提出以进一步改进。在这篇论文中,我们引入"地理标记"(geotoken)的概念,即Transformer架构的输入元素,每个标记代表与某个地理位置相关的信息。与自然语言不同,对模型而言重要的不是序列位置,而是地理坐标。为了在这种设定下引入相对位置的概念,并保持物理距离与嵌入空间距离之间的比例,我们基于RoPE架构设计了一种针对球面坐标调整的位置编码机制。”
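To illustrate the rotary mechanism that the geotoken encoding builds on, below is a minimal sketch of standard RoPE for a scalar position. The paper's adjustment to spherical coordinates is not reproduced here; the vector, base, and positions are illustrative assumptions.

```python
import numpy as np

def rope_rotate(x: np.ndarray, position: float, base: float = 10000.0) -> np.ndarray:
    """Standard RoPE: rotate consecutive pairs of embedding dimensions by an
    angle proportional to `position`, so dot products depend on relative offsets."""
    d = x.shape[-1]
    half = d // 2
    freqs = base ** (-np.arange(half) * 2.0 / d)      # one frequency per dim pair
    angles = position * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q = np.random.randn(8)
# Relative-position property: the dot product depends only on the offset.
print(np.dot(rope_rotate(q, 3.0), rope_rotate(q, 5.0)))
print(np.dot(rope_rotate(q, 10.0), rope_rotate(q, 12.0)))  # ~same value
```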
Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer
For: 本研究使用光学高分辨率影像和OpenStreetMap (OSM)数据进行土地覆盖变化检测。
Methods: 提出一种结合基于对象的图像分析(OBIA)技术与Transformer架构的Object-guided Transformer (ObjFormer)模型,以实现直接基于OSM数据和光学影像的土地覆盖变化检测。
Results: 提出了一种新的半监督语义变化检测任务,无需光学影像的人工标注土地覆盖标签即可训练语义变化检测器;在ObjFormer中加入了两个轻量级语义解码器以高效完成该任务,并设计了一种逆交叉熵损失以充分利用负样本,从而显著提升该任务的性能。Abstract
Optical high-resolution imagery and OpenStreetMap (OSM) data are two important data sources for land-cover change detection. Previous studies in these two data sources focus on utilizing the information in OSM data to aid the change detection on multi-temporal optical high-resolution images. This paper pioneers the direct detection of land-cover changes utilizing paired OSM data and optical imagery, thereby broadening the horizons of change detection tasks to encompass more dynamic earth observations. To this end, we propose an object-guided Transformer (ObjFormer) architecture by naturally combining the prevalent object-based image analysis (OBIA) technique with the advanced vision Transformer architecture. The introduction of OBIA can significantly reduce the computational overhead and memory burden in the self-attention module. Specifically, the proposed ObjFormer has a hierarchical pseudo-siamese encoder consisting of object-guided self-attention modules that extract representative features of different levels from OSM data and optical images; a decoder consisting of object-guided cross-attention modules can progressively recover the land-cover changes from the extracted heterogeneous features. In addition to the basic supervised binary change detection task, this paper raises a new semi-supervised semantic change detection task that does not require any manually annotated land-cover labels of optical images to train semantic change detectors. Two lightweight semantic decoders are added to ObjFormer to accomplish this task efficiently. A converse cross-entropy loss is designed to fully utilize the negative samples, thereby contributing to the great performance improvement in this task. The first large-scale benchmark dataset containing 1,287 map-image pairs (1024$\times$ 1024 pixels for each sample) covering 40 regions on six continents ...(see the manuscript for the full abstract)
摘要
《高分辨率光学影像和OpenStreetMap(OSM)数据是土地覆盖变化检测的两种重要数据源。先前的研究主要关注利用OSM数据中的信息来辅助多时相光学高分辨率影像的变化检测。本文开创性地直接利用成对的OSM数据和光学影像检测土地覆盖变化,从而将变化检测任务的视野扩展到更具动态性的地球观测。为此,我们提出了一种对象引导的Transformer(ObjFormer)架构,自然地结合了流行的基于对象的图像分析(OBIA)技术和先进的视觉Transformer架构。引入OBIA可以显著减少自注意力模块的计算开销和内存负担。具体来说,所提出的ObjFormer包含一个层次化伪孪生编码器,由对象引导的自注意力模块组成,从OSM数据和光学影像中提取不同层级的代表性特征;解码器由对象引导的交叉注意力模块组成,可以从提取到的异质特征中逐步恢复土地覆盖变化。除了基本的有监督二值变化检测任务外,本文还提出了一种新的半监督语义变化检测任务,不需要光学影像的人工标注土地覆盖标签即可训练语义变化检测器。我们在ObjFormer中加入了两个轻量级语义解码器,以高效完成该任务;并设计了一种逆交叉熵损失,以充分利用负样本,从而为该任务带来了显著的性能提升。首个大规模基准数据集包含1,287对地图-影像样本(每个样本为1024×1024像素),覆盖了六大洲的40个区域……(完整摘要请参见原文)》
results: 研究者发现,扩散模型在训练时存在记忆化行为,这种行为与数据集大小有关,并且可以通过有效模型记忆量(EMM)来衡量。此外,研究者发现,以无信息量的随机标签作为训练数据的条件可以显著触发扩散模型的记忆化。Abstract
Due to their capacity to generate novel and high-quality samples, diffusion models have attracted significant research interest in recent years. Notably, the typical training objective of diffusion models, i.e., denoising score matching, has a closed-form optimal solution that can only generate training data replicating samples. This indicates that a memorization behavior is theoretically expected, which contradicts the common generalization ability of state-of-the-art diffusion models, and thus calls for a deeper understanding. Looking into this, we first observe that memorization behaviors tend to occur on smaller-sized datasets, which motivates our definition of effective model memorization (EMM), a metric measuring the maximum size of training data at which a learned diffusion model approximates its theoretical optimum. Then, we quantify the impact of the influential factors on these memorization behaviors in terms of EMM, focusing primarily on data distribution, model configuration, and training procedure. Besides comprehensive empirical results identifying the influential factors, we surprisingly find that conditioning training data on uninformative random labels can significantly trigger the memorization in diffusion models. Our study holds practical significance for diffusion model users and offers clues to theoretical research in deep generative models. Code is available at https://github.com/sail-sg/DiffMemorize.
摘要
Enhancing Energy-efficiency by Solving the Throughput Bottleneck of LSTM Cells for Embedded FPGAs
paper_authors: Chao Qian, Tianheng Ling, Gregor Schiele
for: 这项研究旨在提高物联网(IoT)中处理传感器数据的效率。
methods: 这项研究提出了一种新的LSTM单元优化方法,以实现FPGA上的高能效推理。
results: 以交通速度预测为案例研究,使用优化LSTM单元的vanilla LSTM模型在FPGA上可实现每秒17534次推理,每次推理仅消耗3.8 $\mu$J;与现有方法相比,吞吐量至少提高5.4倍,能效提高1.37倍。Abstract
To process sensor data in the Internet of Things(IoTs), embedded deep learning for 1-dimensional data is an important technique. In the past, CNNs were frequently used because they are simple to optimise for special embedded hardware such as FPGAs. This work proposes a novel LSTM cell optimisation aimed at energy-efficient inference on end devices. Using the traffic speed prediction as a case study, a vanilla LSTM model with the optimised LSTM cell achieves 17534 inferences per second while consuming only 3.8 $\mu$J per inference on the FPGA \textit{XC7S15} from \textit{Spartan-7} family. It achieves at least 5.4$\times$ faster throughput and 1.37$\times$ more energy efficient than existing approaches.
摘要
在物联网(IoT)中处理传感器数据时,针对一维数据的嵌入式深度学习是一项重要技术。过去通常使用卷积神经网络(CNN),因为它们易于针对FPGA等特定嵌入式硬件进行优化。这项工作提出了一种新的LSTM单元优化方法,旨在实现终端设备上的高能效推理。以交通速度预测为案例研究,使用优化LSTM单元的vanilla LSTM模型在Spartan-7系列的FPGA XC7S15上实现了每秒17534次推理,且每次推理仅消耗3.8微焦耳的能量。与现有方法相比,其吞吐量至少提高5.4倍,能效提高1.37倍。
Solving Multi-Configuration Problems: A Performance Analysis with Choco Solver
paper_authors: Benjamin Ritz, Alexander Felfernig, Viet-Man Le, Sebastian Lubos
for: 这篇论文是为了描述如何使用多配置功能来满足用户的偏好而写的。
methods: 论文使用了一种称为多配置的方法,该方法可以配置一组配置。
results: 论文通过示例描述了如何使用多配置来生成个性化考试。并提供了一个约束解决器性能分析,帮助了解相关性能问题。Abstract
In many scenarios, configurators support the configuration of a solution that satisfies the preferences of a single user. The concept of \emph{multi-configuration} is based on the idea of configuring a set of configurations. Such a functionality is relevant in scenarios such as the configuration of personalized exams, the configuration of project teams, and the configuration of different trips for individual members of a tourist group (e.g., when visiting a specific city). In this paper, we exemplify the application of multi-configuration for generating individualized exams. We also provide a constraint solver performance analysis which helps to gain some insights into corresponding performance issues.
摘要
许多场景中,配置器支持配置一个满足单个用户的首选项的解决方案。基于多配置的概念,我们可以配置一组配置。这种功能在多个场景中是有用的,例如:个性化考试的配置、项目团队的配置以及不同旅游者组成员的旅游计划(例如,当访问特定城市时)。在这篇论文中,我们通过实现多配置来生成个性化考试。我们还提供了一种约束解决器性能分析,以帮助我们获得一些关于相关性能问题的启示。
A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs
results: 我们的结果表明,我们的方法可以减轻计算开销,同时保持可接受的精度。此外,我们的方法在实际数据和混合精度量化中表现稳定。这些发现可以帮助模型量化和部署决策,同时为量化技术的进一步发展提供基础。Abstract
This study explores the quantisation-aware training (QAT) on time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our approach demonstrates that matching the quantisation scheme to the real data distribution can reduce computational overhead while maintaining acceptable precision. Moreover, our approach is robust when applied to real-world data and mixed-precision quantisation, where most objects are quantised to 4 bits. Our findings inform model quantisation and deployment decisions while providing a foundation for advancing quantisation techniques.
摘要
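To make the symmetric-versus-asymmetric distinction concrete, here is a minimal sketch of uniform fake quantization with both parameterizations, of the kind used in quantisation-aware training. It shows the generic scheme only, not the paper's adaptive selection rule or its DCST-domain setting; the bit width and example data are illustrative assumptions.

```python
import numpy as np

def quant_params(x: np.ndarray, num_bits: int = 8, symmetric: bool = True):
    """Return (scale, zero_point) for uniform quantization of tensor x.
    Symmetric: zero_point fixed at 0, range centred on 0.
    Asymmetric: full [min, max] range mapped onto the integer grid."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    if symmetric:
        scale = np.abs(x).max() / qmax
        zero_point = 0
    else:
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(round(qmin - x.min() / scale))
    return scale, zero_point

def fake_quant(x: np.ndarray, scale: float, zero_point: int,
               num_bits: int = 8) -> np.ndarray:
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale   # dequantized values used in the QAT forward pass

x = np.random.randn(1000) * 0.5 + 1.0   # skewed distribution favours the asymmetric scheme
for symmetric in (True, False):
    s, z = quant_params(x, symmetric=symmetric)
    err = np.mean((x - fake_quant(x, s, z)) ** 2)
    print(f"symmetric={symmetric}: mse={err:.6f}")
```

Matching the scheme to the real data distribution, as the study proposes, amounts to choosing whichever parameterization yields the lower reconstruction error for the observed activations or weights.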
GET: Group Event Transformer for Event-Based Vision
results: 论文在四个基于事件的分类数据集(Cifar10-DVS、N-MNIST、N-CARS和DVS128Gesture)以及两个基于事件的目标检测数据集(1Mpx和Gen1)上进行了评估,结果显示,GET的性能优于其他当前最先进的方法。Abstract
Event cameras are a type of novel neuromorphic sensor that has been gaining increasing attention. Existing event-based backbones mainly rely on image-based designs to extract spatial information within the image transformed from events, overlooking important event properties like time and polarity. To address this issue, we propose a novel Group-based vision Transformer backbone for Event-based vision, called Group Event Transformer (GET), which decouples temporal-polarity information from spatial information throughout the feature extraction process. Specifically, we first propose a new event representation for GET, named Group Token, which groups asynchronous events based on their timestamps and polarities. Then, GET applies the Event Dual Self-Attention block, and Group Token Aggregation module to facilitate effective feature communication and integration in both the spatial and temporal-polarity domains. After that, GET can be integrated with different downstream tasks by connecting it with various heads. We evaluate our method on four event-based classification datasets (Cifar10-DVS, N-MNIST, N-CARS, and DVS128Gesture) and two event-based object detection datasets (1Mpx and Gen1), and the results demonstrate that GET outperforms other state-of-the-art methods. The code is available at https://github.com/Peterande/GET-Group-Event-Transformer.
摘要
事件相机是一种新型的神经形态传感器,近年来受到越来越多的关注。现有的基于事件的骨干网络主要依赖基于图像的设计,从由事件转换而来的图像中提取空间信息,而忽略了时间和极性等重要的事件属性。为了解决这个问题,我们提出了一种新的基于分组的视觉Transformer骨干网络,用于基于事件的视觉任务,名为分组事件Transformer(GET),它在整个特征提取过程中将时间-极性信息与空间信息解耦。具体来说,我们首先为GET提出了一种新的事件表示方式,名为分组标记(Group Token),它根据时间戳和极性对异步事件进行分组。然后,GET应用事件双自注意力块和分组标记聚合模块,以便在空间域和时间-极性域内进行有效的特征交流与整合。之后,GET可以通过连接不同的头与各种下游任务结合。我们在四个基于事件的分类数据集(Cifar10-DVS、N-MNIST、N-CARS和DVS128Gesture)和两个基于事件的目标检测数据集(1Mpx和Gen1)上评估了我们的方法,结果表明GET优于其他最先进的方法。代码可在 https://github.com/Peterande/GET-Group-Event-Transformer 获取。
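A toy sketch of grouping asynchronous events by time bin and polarity, in the spirit of the Group Token representation described above; GET's actual tokenization and embedding differ, and the event array layout, bin count, and random data are illustrative assumptions.

```python
import numpy as np

def group_events(events: np.ndarray, num_time_bins: int = 4):
    """Group events into (time-bin, polarity) buckets.
    `events` rows are (x, y, timestamp, polarity) with polarity in {0, 1}."""
    t = events[:, 2]
    bins = np.minimum((num_time_bins * (t - t.min()) / (np.ptp(t) + 1e-9)).astype(int),
                      num_time_bins - 1)
    groups = {}
    for b in range(num_time_bins):
        for p in (0, 1):
            groups[(b, p)] = events[(bins == b) & (events[:, 3] == p)]
    return groups

rng = np.random.default_rng(0)
ev = np.column_stack([rng.integers(0, 128, 500), rng.integers(0, 128, 500),
                      np.sort(rng.random(500)), rng.integers(0, 2, 500)]).astype(float)
for key, g in group_events(ev).items():
    print(key, len(g))  # number of events per (time-bin, polarity) group
```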
Deformation-Invariant Neural Network and Its Applications in Distorted Image Restoration and Analysis
paper_authors: Han Zhang, Qiguang Chen, Lok Ming Lui
for: 这篇论文旨在解决图像因几何扭曲而导致的图像处理和计算机视觉任务(如图像识别)中的问题。
methods: 该论文提出了一种名为形变不变神经网络(DINN)的框架,用于解决图像处理任务中的几何扭曲问题。DINN将拟共形变换网络(QCTN)作为组件嵌入其他深度网络,QCTN输出一个拟共形映射,用于将受几何扭曲的图像转换为更接近自然或良好图像分布的版本。QCTN是一个深度神经网络,它首先输出一个Beltrami系数,用于控制输出形变映射的局部几何扭曲。
results: 基于该框架,我们开发了一个图像分类网络,可以准确地对扭曲图像进行分类。我们所提出的方案在大气湍流和水面湍流造成的扭曲下进行图像恢复,其效果优于现有的基于GAN的方法。此外,我们还将所提方案应用于大气湍流下人脸图像的1:1验证,并取得了令人满意的效果,进一步证明了我们方法的有效性。Abstract
Images degraded by geometric distortions pose a significant challenge to imaging and computer vision tasks such as object recognition. Deep learning-based imaging models usually fail to give accurate performance for geometrically distorted images. In this paper, we propose the deformation-invariant neural network (DINN), a framework to address the problem of imaging tasks for geometrically distorted images. The DINN outputs consistent latent features for images that are geometrically distorted but represent the same underlying object or scene. The idea of DINN is to incorporate a simple component, called the quasiconformal transformer network (QCTN), into other existing deep networks for imaging tasks. The QCTN is a deep neural network that outputs a quasiconformal map, which can be used to transform a geometrically distorted image into an improved version that is closer to the distribution of natural or good images. It first outputs a Beltrami coefficient, which measures the quasiconformality of the output deformation map. By controlling the Beltrami coefficient, the local geometric distortion under the quasiconformal mapping can be controlled. The QCTN is lightweight and simple, which can be readily integrated into other existing deep neural networks to enhance their performance. Leveraging our framework, we have developed an image classification network that achieves accurate classification of distorted images. Our proposed framework has been applied to restore geometrically distorted images by atmospheric turbulence and water turbulence. DINN outperforms existing GAN-based restoration methods under these scenarios, demonstrating the effectiveness of the proposed framework. Additionally, we apply our proposed framework to the 1-1 verification of human face images under atmospheric turbulence and achieve satisfactory performance, further demonstrating the efficacy of our approach.
摘要
受几何扭曲影响的图像给图像处理和计算机视觉任务(如物体识别)带来了重大挑战。基于深度学习的图像模型通常无法在几何扭曲的图像上给出准确的性能。在这篇论文中,我们提出了变形不变神经网络(DINN),用于解决几何扭曲图像的图像处理任务。对于经过几何扭曲但表示同一物体或场景的图像,DINN 输出一致的潜在特征。DINN 的思路是将一个简单的组件,即拟共形变换网络(QCTN),整合到现有的用于图像任务的深度网络中。QCTN 是一个深度神经网络,输出一个拟共形映射,可以将几何扭曲的图像转换为更接近自然或良好图像分布的版本。它首先输出一个 Beltrami 系数,该系数衡量输出变形映射的拟共形程度;通过控制 Beltrami 系数,可以控制拟共形映射下的局部几何扭曲。QCTN 轻量且简单,可以方便地集成到现有的深度神经网络中以提升其性能。基于该框架,我们开发了一个图像分类网络,可以准确地分类扭曲图像。我们的框架还被应用于恢复受大气湍流和水面湍流影响的几何扭曲图像。在这些场景下,DINN 的表现优于现有的基于 GAN 的修复方法,证明了该框架的有效性。此外,我们还将该框架应用于大气湍流下人脸图像的 1:1 验证,取得了满意的性能,进一步证明了方法的有效性。
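For readers unfamiliar with the Beltrami coefficient mentioned above, the sketch below computes it for a sampled planar deformation map f = u + iv via finite differences, using the standard identity mu = (df/dz̄)/(df/dz). The grid construction and the example shear map are illustrative assumptions, not code from the paper.

```python
import numpy as np

def beltrami_coefficient(u, v, h=1.0):
    """Sketch: Beltrami coefficient mu = f_zbar / f_z of a map f = u + i*v
    sampled on a regular grid with spacing h, via central differences.
    |mu| < 1 everywhere means the map is quasiconformal (orientation-preserving);
    mu = 0 means the map is conformal."""
    f = u + 1j * v
    f_y, f_x = np.gradient(f, h)      # derivatives along axis 0 (y) and axis 1 (x)
    f_z = 0.5 * (f_x - 1j * f_y)       # Wirtinger derivative d/dz
    f_zbar = 0.5 * (f_x + 1j * f_y)    # Wirtinger derivative d/dzbar
    return f_zbar / (f_z + 1e-12)

# Example: a simple horizontal shear x -> x + 0.3*y, y -> y.
ys, xs = np.mgrid[0:64, 0:64].astype(float)
u, v = xs + 0.3 * ys, ys
mu = beltrami_coefficient(u, v)
print(np.abs(mu).max())  # roughly 0.148, constant for this affine shear
```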
Foundation Reinforcement Learning: towards Embodied Generalist Agents with Foundation Prior Assistance
results: 我们在 Meta-World 任务上进行评估,发现 FAC 可以在 7/8 的任务上以不到 20 万帧达到 100% 的成功率,而基线方法需要更多的帧数才能达到相同的水平。此外,我们还发现 FAC 对噪声具有鲁棒性,即使基础先验中存在噪声,也能正常学习和执行。最后,我们发现 FAC 无需人工定义密集奖励或提供遥操作示范,即可自主学习。Abstract
Recently, people have shown that large-scale pre-training from internet-scale data is the key to building generalist models, as witnessed in NLP. To build embodied generalist agents, we and many other researchers hypothesize that such foundation prior is also an indispensable component. However, it is unclear what is the proper concrete form to represent those embodied foundation priors and how they should be used in the downstream task. In this paper, we propose an intuitive and effective set of embodied priors that consist of foundation policy, value, and success reward. The proposed priors are based on the goal-conditioned MDP. To verify their effectiveness, we instantiate an actor-critic method assisted by the priors, called Foundation Actor-Critic (FAC). We name our framework as Foundation Reinforcement Learning (FRL), since it completely relies on embodied foundation priors to explore, learn and reinforce. The benefits of FRL are threefold. (1) Sample efficient. With foundation priors, FAC learns significantly faster than traditional RL. Our evaluation on the Meta-World has proved that FAC can achieve 100% success rates for 7/8 tasks under less than 200k frames, which outperforms the baseline method with careful manual-designed rewards under 1M frames. (2) Robust to noisy priors. Our method tolerates the unavoidable noise in embodied foundation models. We show that FAC works well even under heavy noise or quantization errors. (3) Minimal human intervention: FAC completely learns from the foundation priors, without the need of human-specified dense reward, or providing teleoperated demos. Thus, FAC can be easily scaled up. We believe our FRL framework could enable the future robot to autonomously explore and learn without human intervention in the physical world. In summary, our proposed FRL is a novel and powerful learning paradigm, towards achieving embodied generalist agents.
摘要
近期,人们已经证明,基于互联网规模数据的大规模预训练是构建通用模型的关键,NLP 领域即是例证。为了构建具身通用智能体,我们和许多其他研究者认为,这种基础先验是不可或缺的组成部分。然而,这些具身基础先验应以何种具体形式表示、以及如何在下游任务中使用,仍然是未解的问题。在这篇论文中,我们提出了一组直观且有效的具身先验,包括基础策略、价值和成功奖励,这些先验基于目标条件的 MDP。为验证其效果,我们实现了一种由这些先验辅助的 actor-critic 方法,称为基础 actor-critic(FAC)。我们将整个框架称为基础强化学习(FRL),因为它完全依赖具身基础先验来探索、学习和强化。FRL 的好处有三个方面:1. 样本高效。借助基础先验,FAC 的学习速度显著快于传统强化学习。我们在 Meta-World 上的评估表明,FAC 在不到 20 万帧内就能在 7/8 的任务上达到 100% 的成功率,而使用精心手工设计奖励的基线方法在 100 万帧内仍未达到这一水平。2. 对噪声先验鲁棒。我们的方法可以容忍具身基础模型中不可避免的噪声,即使在较大的噪声或量化误差下,FAC 仍能正常工作。3. 极少的人工干预。FAC 完全从基础先验中学习,不需要人工设定密集奖励或提供遥操作示范,因此易于扩展。我们相信,FRL 框架能够让未来的机器人在物理世界中无需人类干预地自主探索和学习。总之,我们提出的 FRL 是一种新颖且强大的学习范式,朝着实现具身通用智能体迈进。
Multi-rules mining algorithm for combinatorially exploded decision trees with modified Aitchison-Aitken function-based Bayesian optimization
results: 实验结果表明,MAABO-MT 能以低于其他依赖随机性方法的计算代价发现可靠规则,并比以往研究常用的单一决策树提供更深入的洞察;此外,GS-MRM 可以滤除不可靠和高度相似的规则。Abstract
Decision trees offer the benefit of easy interpretation because they allow the classification of input data based on if--then rules. However, as decision trees are constructed by an algorithm that achieves clear classification with minimum necessary rules, the trees possess the drawback of extracting only minimum rules, even when various latent rules exist in data. Approaches that construct multiple trees using randomly selected feature subsets do exist. However, the number of trees that can be constructed remains at the same scale because the number of feature subsets is a combinatorial explosion. Additionally, when multiple trees are constructed, numerous rules are generated, of which several are untrustworthy and/or highly similar. Therefore, we propose "MAABO-MT" and "GS-MRM" algorithms that strategically construct trees with high estimation performance among all possible trees with small computational complexity and extract only reliable and non-similar rules, respectively. Experiments are conducted using several open datasets to analyze the effectiveness of the proposed method. The results confirm that MAABO-MT can discover reliable rules at a lower computational cost than other methods that rely on randomness. Furthermore, the proposed method is confirmed to provide deeper insights than single decision trees commonly used in previous studies. Therefore, MAABO-MT and GS-MRM can efficiently extract rules from combinatorially exploded decision trees.
摘要
To address these challenges, we propose two algorithms: "MAABO-MT" and "GS-MRM." MAABO-MT strategically constructs trees with high estimation performance among all possible trees with small computational complexity. GS-MRM extracts only reliable and non-similar rules from the constructed trees. Experiments were conducted using several open datasets to analyze the effectiveness of the proposed method. The results confirm that MAABO-MT can discover reliable rules at a lower computational cost than other methods that rely on randomness. Furthermore, the proposed method provides deeper insights than single decision trees commonly used in previous studies. Therefore, MAABO-MT and GS-MRM can efficiently extract rules from combinatorially exploded decision trees.
On Quantified Observability Analysis in Multiagent Systems
results: 我们在 PRISM 模型检查器上实现了该方法,并通过一些示例证明了其可行性和应用性。Abstract
In multiagent systems (MASs), agents' observation upon system behaviours may improve the overall team performance, but may also leak sensitive information to an observer. A quantified observability analysis can thus be useful to assist decision-making in MASs by operators seeking to optimise the relationship between performance effectiveness and information exposure through observations in practice. This paper presents a novel approach to quantitatively analysing the observability properties in MASs. The concept of opacity is applied to formally express the characterisation of observability in MASs modelled as partially observable multiagent systems. We propose a temporal logic oPATL to reason about agents' observability with quantitative goals, which capture the probability of information transparency of system behaviours to an observer, and develop verification techniques for quantitatively analysing such properties. We implement the approach as an extension of the PRISM model checker, and illustrate its applicability via several examples.
摘要
在多智能体系统(MAS)中,智能体对系统行为的观察可能会提高整体团队性能,但也可能向观察者泄露敏感信息。因此,量化的可观察性分析可以帮助 MAS 的操作者在实践中权衡性能与通过观察暴露的信息之间的关系,从而辅助决策。本文提出了一种对 MAS 的可观察性特性进行量化分析的新方法。我们将 opacity(不透明性)概念用于形式化刻画被建模为部分可观察多智能体系统的 MAS 的可观察性,并提出了一种时态逻辑 oPATL,以带有量化目标的方式推理智能体的可观察性(即系统行为对观察者的信息透明概率),同时开发了对这些性质进行量化分析的验证技术。我们将该方法实现为 PRISM 模型检查器的扩展,并通过若干示例说明其适用性。
How FaR Are Large Language Models From Agents with Theory-of-Mind?
results: 实验表明,FaR 框架可以将 GPT-4 在 T4D 上的表现从 50% 提高到 71%,优于思维链(Chain-of-Thought)、Self-Ask 等其他提示方法。此外,FaR 还能在多种分布外的故事结构和场景中提升模型表现,包括各种需要心理状态推理来选择行动的场景。Abstract
"Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi ask models questions to make inferences about beliefs of characters in a story, but do not test whether models can then use these inferences to guide their actions. We propose a new evaluation paradigm for large language models (LLMs): Thinking for Doing (T4D), which requires models to connect inferences about others' mental states to actions in social scenarios. Experiments on T4D demonstrate that LLMs such as GPT-4 and PaLM 2 seemingly excel at tracking characters' beliefs in stories, but they struggle to translate this capability into strategic action. Our analysis reveals the core challenge for LLMs lies in identifying the implicit inferences about mental states without being explicitly asked about as in ToMi, that lead to choosing the correct action in T4D. To bridge this gap, we introduce a zero-shot prompting framework, Foresee and Reflect (FaR), which provides a reasoning structure that encourages LLMs to anticipate future challenges and reason about potential actions. FaR boosts GPT-4's performance from 50% to 71% on T4D, outperforming other prompting methods such as Chain-of-Thought and Self-Ask. Moreover, FaR generalizes to diverse out-of-distribution story structures and scenarios that also require ToM inferences to choose an action, consistently outperforming other methods including few-shot in-context learning.
摘要
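The FaR paper describes a zero-shot prompting structure that asks the model to foresee likely future events and then reflect on candidate actions before answering. A minimal sketch of such a scaffold is below; the exact wording, the two-step field layout, and the toy scenario are assumptions for illustration, not the paper's prompt.

```python
def build_far_prompt(scenario: str, question: str, choices: list[str]) -> str:
    """Sketch of a Foresee-and-Reflect style prompt: the model is asked to
    (1) foresee likely future events and characters' mental states, then
    (2) reflect on each candidate action before committing to an answer."""
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    return (
        f"Scenario:\n{scenario}\n\n"
        f"Question: {question}\nOptions:\n{options}\n\n"
        "Step 1 - Foresee: list the potential future events and what each "
        "character likely believes or does not know.\n"
        "Step 2 - Reflect: for each option, reason about whether it addresses "
        "the foreseen challenges.\n"
        "Finally, answer with the single best option letter."
    )

# The resulting string would be sent to whatever chat-completion API is in use.
prompt = build_far_prompt(
    "Sally puts her book in the drawer and leaves; Tom moves it to the shelf.",
    "Sally returns looking for her book. What should you do?",
    ["Do nothing", "Tell Sally the book is on the shelf", "Ask Tom to leave"],
)
print(prompt)
```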
Multi-Agent Reinforcement Learning for Power Grid Topology Optimization
paper_authors: Erica van der Sar, Alessandro Zocca, Sandjai Bhulai
for: 管理增长中的能源网络,因为增加的能源需求和不可预测的可再生能源(太阳和风)。
methods: 使用分布式决策者(MARL)框架,利用电力网络的层次结构。
results: 实验结果表明,MARL框架与单个决策者RL方法相比,表现竞争力强。同时,对下级决策者的RL算法和高级决策者的策略进行比较。Abstract
Recent challenges in operating power networks arise from increasing energy demands and unpredictable renewable sources like wind and solar. While reinforcement learning (RL) shows promise in managing these networks, through topological actions like bus and line switching, efficiently handling large action spaces as networks grow is crucial. This paper presents a hierarchical multi-agent reinforcement learning (MARL) framework tailored for these expansive action spaces, leveraging the power grid's inherent hierarchical nature. Experimental results indicate the MARL framework's competitive performance with single-agent RL methods. We also compare different RL algorithms for lower-level agents alongside different policies for higher-order agents.
摘要
近期电力网络运行面临的挑战来自不断增长的能源需求以及风能、太阳能等不可预测的可再生能源。强化学习(RL)通过母线和线路切换等拓扑动作,在管理这些网络方面展现出潜力;然而,随着网络规模扩大,高效处理由此产生的巨大动作空间至关重要。本文提出了一个针对这种庞大动作空间的层次多智能体强化学习(MARL)框架,利用电力网络固有的层次结构。实验结果表明,该 MARL 框架与单智能体 RL 方法相比具有竞争力。我们还比较了底层智能体所用的不同 RL 算法以及高层智能体的不同策略。
MagicDrive: Street View Generation with Diverse 3D Geometry Control
for: The paper is written for the task of street view generation with 3D control, specifically for 3D perception tasks like 3D object detection.
methods: The paper proposes a novel framework called MagicDrive, which uses tailored encoding strategies to generate street views with diverse 3D geometry controls, including camera poses, road maps, and 3D bounding boxes, as well as textual descriptions. The framework also incorporates a cross-view attention module to ensure consistency across multiple camera views.
results: The paper achieves high-fidelity street-view synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.Abstract
Recent advancements in diffusion models have significantly enhanced the data synthesis with 2D control. Yet, precise 3D control in street view generation, crucial for 3D perception tasks, remains elusive. Specifically, utilizing Bird's-Eye View (BEV) as the primary condition often leads to challenges in geometry control (e.g., height), affecting the representation of object shapes, occlusion patterns, and road surface elevations, all of which are essential to perception data synthesis, especially for 3D object detection tasks. In this paper, we introduce MagicDrive, a novel street view generation framework offering diverse 3D geometry controls, including camera poses, road maps, and 3D bounding boxes, together with textual descriptions, achieved through tailored encoding strategies. Besides, our design incorporates a cross-view attention module, ensuring consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks like BEV segmentation and 3D object detection.
摘要
In this paper, we introduce MagicDrive, a novel street view generation framework that offers diverse 3D geometry controls, including camera poses, road maps, and 3D bounding boxes, along with textual descriptions. This is achieved through tailored encoding strategies. Additionally, our design incorporates a cross-view attention module to ensure consistency across multiple camera views. With MagicDrive, we achieve high-fidelity street-view synthesis that captures nuanced 3D geometry and various scene descriptions, enhancing tasks such as BEV segmentation and 3D object detection.
A ModelOps-based Framework for Intelligent Medical Knowledge Extraction
results: 提出了一种基于多层回调函数的数据集抽象机制,以及一种基于数据集相似性的模型推荐方法,帮助用户快速找到适合给定数据集的模型。Abstract
Extracting medical knowledge from healthcare texts enhances downstream tasks like medical knowledge graph construction and clinical decision-making. However, the construction and application of knowledge extraction models lack automation, reusability and unified management, leading to inefficiencies for researchers and high barriers for non-AI experts such as doctors, to utilize knowledge extraction. To address these issues, we propose a ModelOps-based intelligent medical knowledge extraction framework that offers a low-code system for model selection, training, evaluation and optimization. Specifically, the framework includes a dataset abstraction mechanism based on multi-layer callback functions, a reusable model training, monitoring and management mechanism. We also propose a model recommendation method based on dataset similarity, which helps users quickly find potentially suitable models for a given dataset. Our framework provides convenience for researchers to develop models and simplifies model access for non-AI experts such as doctors.
摘要
从医疗文本中提取医学知识可以促进下游任务,如医学知识图谱构建和临床决策。然而,知识提取模型的构建和应用缺乏自动化、可复用性和统一管理,导致研究人员效率低下,也使医生等非 AI 专家难以利用知识提取。为解决这些问题,我们提出了基于 ModelOps 的智能医学知识提取框架,该框架提供了用于模型选择、训练、评估和优化的低代码系统。具体来说,该框架包括基于多层回调函数的数据集抽象机制,以及可复用的模型训练、监控和管理机制。我们还提出了基于数据集相似性的模型推荐方法,帮助用户快速找到可能适合给定数据集的模型。我们的框架为研究人员开发模型提供了便利,并简化了医生等非 AI 专家使用模型的过程。
On the Stability of Expressive Positional Encodings for Graph Neural Networks
results: 本文通过分子性质预测和分布外泛化任务的实验表明,SPE 方法可以提升位置编码的泛化能力和稳定性。Abstract
Designing effective positional encodings for graphs is key to building powerful graph transformers and enhancing message-passing graph neural networks. Although widespread, using Laplacian eigenvectors as positional encodings faces two fundamental challenges: (1) \emph{Non-uniqueness}: there are many different eigendecompositions of the same Laplacian, and (2) \emph{Instability}: small perturbations to the Laplacian could result in completely different eigenspaces, leading to unpredictable changes in positional encoding. Despite many attempts to address non-uniqueness, most methods overlook stability, leading to poor generalization on unseen graph structures. We identify the cause of instability to be a "hard partition" of eigenspaces. Hence, we introduce Stable and Expressive Positional Encodings (SPE), an architecture for processing eigenvectors that uses eigenvalues to "softly partition" eigenspaces. SPE is the first architecture that is (1) provably stable, and (2) universally expressive for basis invariant functions whilst respecting all symmetries of eigenvectors. Besides guaranteed stability, we prove that SPE is at least as expressive as existing methods, and highly capable of counting graph structures. Finally, we evaluate the effectiveness of our method on molecular property prediction, and out-of-distribution generalization tasks, finding improved generalization compared to existing positional encoding methods.
摘要
设计有效的图位置编码是构建强大的图 Transformer 和增强消息传递图神经网络的关键。尽管使用拉普拉斯特征向量作为位置编码十分普遍,但它面临两个基本挑战:(1)非唯一性:同一个拉普拉斯算子存在多种不同的特征分解;(2)不稳定性:对拉普拉斯算子的微小扰动可能导致完全不同的特征空间,从而使位置编码发生不可预测的变化。尽管已有许多针对非唯一性的尝试,大多数方法忽略了稳定性,导致在未见过的图结构上泛化能力较差。我们发现不稳定性的原因在于特征空间的"硬划分"。因此,我们提出了稳定且有表达力的位置编码(SPE),这是一种利用特征值对特征空间进行"软划分"的特征向量处理架构。SPE 是第一个(1)可证明稳定、且(2)对基不变函数具有普适表达能力、同时尊重特征向量所有对称性的架构。除了稳定性保证之外,我们还证明 SPE 的表达能力至少不低于现有方法,并且能够很好地计数图结构。最后,我们在分子性质预测和分布外泛化任务上评估了该方法,发现其泛化能力优于现有的位置编码方法。
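To make the "soft partition" idea above concrete, the sketch below contrasts a hard choice of the k smallest-eigenvalue eigenvectors with eigenvalue-dependent soft weights applied to all eigenvectors. The Gaussian weighting function used here is an illustrative stand-in for SPE's learned, permutation-equivariant eigenvalue networks, not the paper's parameterization.

```python
import numpy as np

def hard_partition_pe(evals, evecs, k):
    """Hard partition: keep only the k eigenvectors with smallest eigenvalues.
    Small eigenvalue perturbations can swap which vectors are kept."""
    idx = np.argsort(evals)[:k]
    return evecs[:, idx]

def soft_partition_pe(evals, evecs, centers, bandwidth=0.1):
    """Sketch of a soft partition: every output channel mixes *all* eigenvectors,
    weighted smoothly by how close their eigenvalues are to a channel-specific
    center. SPE instead learns these eigenvalue-to-weight maps, but the stability
    intuition is the same: nearby eigenvalues receive nearby weights."""
    channels = []
    for c in centers:
        w = np.exp(-((evals - c) ** 2) / (2 * bandwidth ** 2))  # soft weights over eigenvalues
        channels.append(evecs @ np.diag(w) @ evecs.T)            # basis-invariant n x n map
    return np.stack(channels)  # (num_channels, n, n), later fed to an equivariant network

# Example: Laplacian of a 6-node path graph.
n = 6
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(1)) - A
evals, evecs = np.linalg.eigh(L)
print(soft_partition_pe(evals, evecs, centers=[0.0, 1.0, 2.0]).shape)  # (3, 6, 6)
```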
Stand for Something or Fall for Everything: Predict Misinformation Spread with Stance-Aware Graph Neural Networks
results: 在真实数据集上,stance-aware GNN 的预测性能比基准方法高 32.65%,比不包含用户立场的先进 GNN 高出超过 4.69%。另外,注意力权重显示,用户的反对立场比支持立场对其邻居的行为有更高的影响力,可以起到社会纠正作用,从而阻止虚假信息传播。Abstract
Although pervasive spread of misinformation on social media platforms has become a pressing challenge, existing platform interventions have shown limited success in curbing its dissemination. In this study, we propose a stance-aware graph neural network (stance-aware GNN) that leverages users' stances to proactively predict misinformation spread. As different user stances can form unique echo chambers, we customize four information passing paths in stance-aware GNN, while the trainable attention weights provide explainability by highlighting each structure's importance. Evaluated on a real-world dataset, stance-aware GNN outperforms benchmarks by 32.65% and exceeds advanced GNNs without user stance by over 4.69%. Furthermore, the attention weights indicate that users' opposition stances have a higher impact on their neighbors' behaviors than supportive ones, which function as social correction to halt misinformation propagation. Overall, our study provides an effective predictive model for platforms to combat misinformation, and highlights the impact of user stances in the misinformation propagation.
摘要
Improving Automatic VQA Evaluation Using Large Language Models
results: 与现有评估指标相比,该指标与人类判断的相关性更高,并适用于多种 VQA 模型和基准数据集。Abstract
8 years after the visual question answering (VQA) task was proposed, accuracy remains the primary metric for automatic evaluation. VQA Accuracy has been effective so far in the IID evaluation setting. However, our community is undergoing a shift towards open-ended generative models and OOD evaluation. In this new paradigm, the existing VQA Accuracy metric is overly stringent and underestimates the performance of VQA systems. Thus, there is a need to develop more robust automatic VQA metrics that serve as a proxy for human judgment. In this work, we propose to leverage the in-context learning capabilities of instruction-tuned large language models (LLMs) to build a better VQA metric. We formulate VQA evaluation as an answer-rating task where the LLM is instructed to score the accuracy of a candidate answer given a set of reference answers. We demonstrate the proposed metric better correlates with human judgment compared to existing metrics across several VQA models and benchmarks. We hope wide adoption of our metric will contribute to better estimating the research progress on the VQA task.
摘要
视觉问答(VQA)任务提出八年后,准确率仍然是自动评估的主要指标。VQA 准确率在同分布(IID)评估设定下一直行之有效,但我们的社区正在转向开放式生成模型和分布外(OOD)评估。在这一新范式下,现有的 VQA 准确率指标过于严格,低估了 VQA 系统的真实表现。因此,需要开发更稳健的自动 VQA 评估指标,作为人类判断的代理。在这项工作中,我们利用经指令微调的大语言模型(LLM)的上下文学习能力,构建了一个更好的 VQA 评估指标。我们将 VQA 评估表述为一个答案评分任务:让 LLM 根据一组参考答案,对候选答案的准确性进行打分。实验表明,在多个 VQA 模型和基准上,我们提出的指标与人类判断的相关性高于现有指标。我们希望该指标的广泛采用有助于更好地衡量 VQA 任务的研究进展。
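A minimal sketch of the answer-rating formulation described above is shown below: the evaluator LLM is asked to score a candidate answer against the reference answers. The prompt wording and the 1-5 scale are assumptions for illustration; the paper's actual instruction and scoring scheme may differ.

```python
def rate_vqa_answer(question: str, references: list[str], candidate: str) -> str:
    """Build an answer-rating prompt for an instruction-tuned LLM judge.
    The judge returns a score indicating how well the candidate matches
    the references (hypothetical 1-5 scale used here)."""
    refs = "; ".join(references)
    return (
        "You are grading answers to a visual question.\n"
        f"Question: {question}\n"
        f"Reference answers: {refs}\n"
        f"Candidate answer: {candidate}\n"
        "On a scale of 1 (wrong) to 5 (fully correct), how accurate is the "
        "candidate answer? Reply with the number only."
    )

# The resulting prompt would be passed to whatever chat-completion client is in use.
prompt = rate_vqa_answer(
    "What color is the bus?",
    ["yellow", "yellow and white"],
    "It is a yellow school bus.",
)
print(prompt)
```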
Improving Drumming Robot Via Attention Transformer Network
results: 大量实验结果表明,改进算法可以提高鼓类别性能,从而帮助机器人享受多种智能应用和服务。Abstract
Robotic technology has been widely used in nowadays society, which has made great progress in various fields such as agriculture, manufacturing and entertainment. In this paper, we focus on the topic of drumming robots in entertainment. To this end, we introduce an improving drumming robot that can automatically complete music transcription based on the popular vision transformer network based on the attention mechanism. Equipped with the attention transformer network, our method can efficiently handle the sequential audio embedding input and model their global long-range dependencies. Massive experimental results demonstrate that the improving algorithm can help the drumming robot promote drum classification performance, which can also help the robot to enjoy a variety of smart applications and services.
摘要
机器人技术在当今社会已被广泛应用,并在农业、制造和娱乐等各个领域取得了长足进步。在这篇论文中,我们关注娱乐领域中的打鼓机器人。为此,我们提出了一种改进的打鼓机器人,它基于流行的视觉 Transformer 网络及其注意力机制,自动完成音乐转录。借助注意力 Transformer 网络,我们的方法可以高效地处理序列化的音频嵌入输入,并建模其全局长程依赖关系。大量实验结果表明,改进算法能够帮助打鼓机器人提升鼓点分类性能,从而让机器人支持多种智能应用和服务。
zkFL: Zero-Knowledge Proof-based Gradient Aggregation for Federated Learning
paper_authors: Zhipeng Wang, Nanqing Dong, Jiahao Sun, William Knottenbelt
for: 提高 Federated Learning(FL)中中央聚合器的可信度问题
methods: 使用零知识证明(ZKP)和区块链来保证中央聚合器在训练模型聚合过程中的正确性
results: zkFL可以在不改变FL网络结构的情况下,提高安全性和隐私性,而无需重大地降低训练速度。Abstract
Federated Learning (FL) is a machine learning paradigm, which enables multiple and decentralized clients to collaboratively train a model under the orchestration of a central aggregator. Traditional FL solutions rely on the trust assumption of the centralized aggregator, which forms cohorts of clients in a fair and honest manner. However, a malicious aggregator, in reality, could abandon and replace the client's training models, or launch Sybil attacks to insert fake clients. Such malicious behaviors give the aggregator more power to control clients in the FL setting and determine the final training results. In this work, we introduce zkFL, which leverages zero-knowledge proofs (ZKPs) to tackle the issue of a malicious aggregator during the training model aggregation process. To guarantee the correct aggregation results, the aggregator needs to provide a proof per round. The proof can demonstrate to the clients that the aggregator executes the intended behavior faithfully. To further reduce the verification cost of clients, we employ a blockchain to handle the proof in a zero-knowledge way, where miners (i.e., the nodes validating and maintaining the blockchain data) can verify the proof without knowing the clients' local and aggregated models. The theoretical analysis and empirical results show that zkFL can achieve better security and privacy than traditional FL, without modifying the underlying FL network structure or heavily compromising the training speed.
摘要
Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data
paper_authors: Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang
for: 本研究旨在自动化特征预处理(Auto-FP),以提高机器学习模型的质量。
methods: 本研究使用了演化算法、神经网络搜索算法等方法来自动化特征预处理。
results: 研究发现,演化算法在45个公共机器学习数据集上表现最佳,而随机搜索则出乎意料地表现良好。这些结果可能归结于特征预处理的具体问题和搜索算法的设计。Abstract
Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to data distribution, thus feature preprocessing, which transforms features from one distribution to another, is a crucial step to ensure good model quality. Manually constructing a feature preprocessing pipeline is challenging because data scientists need to make difficult decisions about which preprocessors to select and in which order to compose them. In this paper, we study how to automate feature preprocessing (Auto-FP) for tabular data. Due to the large search space, a brute-force solution is prohibitively expensive. To address this challenge, we interestingly observe that Auto-FP can be modelled as either a hyperparameter optimization (HPO) or a neural architecture search (NAS) problem. This observation enables us to extend a variety of HPO and NAS algorithms to solve the Auto-FP problem. We conduct a comprehensive evaluation and analysis of 15 algorithms on 45 public ML datasets. Overall, evolution-based algorithms show the leading average ranking. Surprisingly, the random search turns out to be a strong baseline. Many surrogate-model-based and bandit-based search algorithms, which achieve good performance for HPO and NAS, do not outperform random search for Auto-FP. We analyze the reasons for our findings and conduct a bottleneck analysis to identify the opportunities to improve these algorithms. Furthermore, we explore how to extend Auto-FP to support parameter search and compare two ways to achieve this goal. In the end, we evaluate Auto-FP in an AutoML context and discuss the limitations of popular AutoML tools. To the best of our knowledge, this is the first study on automated feature preprocessing. We hope our work can inspire researchers to develop new algorithms tailored for Auto-FP.
摘要
传统机器学习模型,如线性模型和树模型,在工业界被广泛使用。这些模型对数据分布敏感,因此把特征从一种分布变换到另一种分布的特征预处理,是确保模型质量的关键步骤。手动构建特征预处理流水线具有挑战性,因为数据科学家需要做出许多困难的决策:选择哪些预处理器,以及按什么顺序组合它们。在这篇论文中,我们研究如何为表格数据自动化特征预处理(Auto-FP)。由于搜索空间巨大,暴力求解的代价高得难以承受。为了解决这一挑战,我们观察到一个有趣的事实:Auto-FP 可以被建模为超参数优化(HPO)或神经结构搜索(NAS)问题。这一观察使我们能够将多种 HPO 和 NAS 算法扩展到 Auto-FP 问题上。我们在 45 个公共机器学习数据集上对 15 种算法进行了全面的评估和分析。总体来看,基于演化的算法取得了最佳的平均排名。出人意料的是,随机搜索是一个很强的基线;许多在 HPO 和 NAS 中表现良好的基于代理模型和多臂赌博机(bandit)的搜索算法,在 Auto-FP 上并未超过随机搜索。我们分析了产生这些结果的原因,并进行了瓶颈分析,以寻找改进这些算法的机会。此外,我们还探讨了如何将 Auto-FP 扩展到支持参数搜索,并比较了两种实现方式。最后,我们在 AutoML 场景下评估了 Auto-FP,并讨论了流行 AutoML 工具的局限性。据我们所知,这是关于自动化特征预处理的首个研究。我们希望这项工作能启发研究人员开发专门面向 Auto-FP 的新算法。
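Since the abstract above notes that plain random search is a surprisingly strong Auto-FP baseline, here is a minimal sketch of that baseline using scikit-learn preprocessors: sample a random ordered subset of preprocessors, chain them before a fixed downstream model, and keep the best pipeline by cross-validation. The candidate preprocessor pool, pipeline length, search budget, and downstream model are illustrative choices, not the paper's exact search space.

```python
import random
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import (MinMaxScaler, Normalizer, PowerTransformer,
                                    RobustScaler, StandardScaler)

CANDIDATES = [StandardScaler, MinMaxScaler, RobustScaler, PowerTransformer, Normalizer]

def random_pipeline(rng, max_len=3):
    """Sample an ordered subset of preprocessors followed by a fixed model."""
    k = rng.randint(0, max_len)
    steps = [(f"p{i}", cls()) for i, cls in enumerate(rng.sample(CANDIDATES, k))]
    steps.append(("clf", LogisticRegression(max_iter=2000)))
    return Pipeline(steps)

X, y = load_breast_cancer(return_X_y=True)
rng = random.Random(0)
best_score, best_pipe = -1.0, None
for _ in range(20):                       # search budget: 20 random pipelines
    pipe = random_pipeline(rng)
    score = cross_val_score(pipe, X, y, cv=3).mean()
    if score > best_score:
        best_score, best_pipe = score, pipe
print(best_score, [name for name, _ in best_pipe.steps])
```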
Literature Based Discovery (LBD): Towards Hypothesis Generation and Knowledge Discovery in Biomedical Text Mining
for: automatic discovery of novel associations between medical terms in biomedical literature
methods: Literature Based Discovery (LBD) using concept profiles and statistical significance, and deep learning applications such as transformer models and neural networks
results: potential associations hidden in biomedical literature, including key biomedical discoveries in biomedicine
for: 自动发现生物医学文献中的医学术语之间的新相关性
methods: 使用基于概念画像和统计显著性的基于文献的发现(LBD)方法,以及 Transformer 模型和神经网络等深度学习应用
results: 生物医学文献中隐藏的可能相关性,包括生物医学中的重要发现。Abstract
Biomedical knowledge is growing at an astounding pace, with the majority of this knowledge represented as scientific publications. Text mining tools and methods represent automatic approaches for extracting hidden patterns and trends from this semi-structured and unstructured data. In biomedical text mining, Literature Based Discovery (LBD) is the process of automatically discovering novel associations between medical terms otherwise mentioned in disjoint literature sets. LBD approaches have proven successful in reducing the discovery time of potential associations that are hidden in the vast amount of scientific literature. The process focuses on creating concept profiles for medical terms such as a disease or symptom and connecting them with a drug and treatment based on the statistical significance of the shared profiles. This knowledge discovery approach, introduced in 1989, still remains a core task in text mining. Currently, the two ABC-principle-based approaches, namely open discovery and closed discovery, are the most explored in the LBD process. This review starts with a general introduction to text mining, followed by biomedical text mining, and introduces various literature resources such as MEDLINE, UMLS, MeSH, and SemMedDB. This is followed by a brief introduction of the core ABC principle and its two associated approaches, open discovery and closed discovery, in the LBD process. The review also discusses deep learning applications in LBD by reviewing the role of transformer models and neural-network-based LBD models and their future aspects. Finally, it reviews the key biomedical discoveries generated through LBD approaches in biomedicine and concludes with the current limitations and future directions of LBD.
摘要
生物医学知识正以惊人的速度增长,其中大部分知识以科学文献的形式存在。文本挖掘工具和方法提供了从这些半结构化和非结构化数据中自动提取隐藏模式和趋势的途径。在生物医学文本挖掘中,基于文献的发现(LBD)是自动发现散落在互不相关文献集中的医学术语之间新关联的过程。LBD 方法已被证明能够有效缩短在海量科学文献中发现潜在关联所需的时间。该过程的核心是为疾病或症状等医学术语构建概念画像,并基于共享画像的统计显著性将其与药物和治疗联系起来。这种知识发现方法于 1989 年提出,至今仍是文本挖掘的核心任务。目前,基于 ABC 原则的两种方法,即开放式发现和封闭式发现,是 LBD 过程中研究最多的方向。本综述首先对文本挖掘做一般性介绍,随后介绍生物医学文本挖掘以及 MEDLINE、UMLS、MeSH 和 SemMedDB 等各类文献资源;然后简要介绍核心的 ABC 原则及其在 LBD 过程中对应的开放式发现与封闭式发现两种方法;并进一步讨论深度学习在 LBD 中的应用,包括基于 Transformer 模型和神经网络的 LBD 模型及其未来发展方向。最后,本综述总结了通过 LBD 方法在生物医学领域取得的重要发现,并讨论了 LBD 当前的局限性与未来方向。
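The ABC principle referenced above can be illustrated with a toy co-occurrence example: open discovery starts from a term A, collects the B terms it co-occurs with, and proposes C terms that co-occur with some B but never directly with A. The sketch below uses a tiny made-up document set purely for illustration (echoing the classic Raynaud disease / fish oil example); real LBD systems work over concept profiles extracted from MEDLINE-scale corpora with statistical weighting.

```python
from collections import defaultdict

# Toy corpus: each "document" is the set of concepts it mentions.
documents = [
    {"Raynaud disease", "blood viscosity"},
    {"blood viscosity", "fish oil"},
    {"Raynaud disease", "vasoconstriction"},
    {"vasoconstriction", "fish oil"},
    {"fish oil", "platelet aggregation"},
]

def cooccurring(term, docs):
    """All concepts that appear in at least one document together with `term`."""
    out = set()
    for d in docs:
        if term in d:
            out |= d - {term}
    return out

def open_discovery(a_term, docs):
    """ABC open discovery: A -> B -> C, keeping only C terms with no direct A link."""
    b_terms = cooccurring(a_term, docs)
    direct = b_terms | {a_term}
    c_scores = defaultdict(int)
    for b in b_terms:
        for c in cooccurring(b, docs) - direct:
            c_scores[c] += 1          # count distinct intermediate B terms as a crude score
    return sorted(c_scores.items(), key=lambda kv: -kv[1])

print(open_discovery("Raynaud disease", documents))
# [('fish oil', 2)] -- fish oil is linked to Raynaud disease only indirectly, via two B terms
```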
MIDDAG: Where Does Our News Go? Investigating Information Diffusion via Community-Level Information Pathways
results: 研究发现,COVID-19相关新闻文章在社交媒体上的信息传播路径具有复杂的层次结构和分布特征,并且可以根据用户群体和信息传播预测能力进行预测和跟踪。Abstract
We present MIDDAG, an intuitive, interactive system that visualizes the information propagation paths on social media triggered by COVID-19-related news articles accompanied by comprehensive insights including user/community susceptibility level, as well as events and popular opinions raised by the crowd while propagating the information. Besides discovering information flow patterns among users, we construct communities among users and develop the propagation forecasting capability, enabling tracing and understanding of how information is disseminated at a higher level.
摘要
我们介绍 MIDDAG,一个直观的交互式系统,用于可视化由 COVID-19 相关新闻文章在社交媒体上引发的信息传播路径,并提供全面的洞察,包括用户/社区的易感程度,以及信息传播过程中人群提出的事件和流行观点。除了发现用户之间的信息流动模式之外,我们还构建了用户社区并开发了传播预测能力,从而能够在更高层面上追踪和理解信息的传播方式。
CITING: Large Language Models Create Curriculum for Instruction Tuning
results: 我们在四个数据集上将 CITING 与一系列最新基线进行比较,结果表明,在 GPT-4 评价的表达清晰度、深度和全面性方面,CITING 均有大幅提升;其对 SFT、RLHF、RRHF 和 RAFT 的平均胜率分别为 79.4%、73.4%、78.1% 和 76.3%。Abstract
The recent advancement of large language models (LLMs) has been achieved through a combo of instruction tuning and human alignment. However, building manually crafted instruction datasets and performing human alignment become the bottleneck for scaling the development of LLMs. In this paper, we exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs. Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors. Specifically, we employ a teacher LLM to create a curriculum for instruction tuning of the student LLM, namely Curriculum Instruction TunING (CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics for evaluating the answers corresponding to various types of questions, and (2) the student LLM learns to follow the rubrics and perform self-correction from the revision made by the teacher. We further iteratively carry out it to embody the procedure of CITING. We compare CITING to a series of state-of-the-art baselines on four datasets. Our method demonstrates strong improvement in terms of articulate, in-depth, and comprehensive by GPT-4 evaluation. Specifically, it achieves an average winning rate of 79.4% over SFT, 73.4% over RLHF, 78.1% over RRHF, and 76.3% over RAFT, respectively.
摘要
大型语言模型(LLM)近期的进步得益于指令微调与人类对齐的结合。然而,人工构建指令数据集和进行人类对齐已成为扩展 LLM 开发的瓶颈。在这篇论文中,我们借鉴人类学生按照评分标准练习写作、并从导师的修改中学习的方式,探索用 AI 模型代替人类教师来训练学生 LLM。具体来说,我们让教师 LLM 为学生 LLM 的指令微调制定课程,称为课程式指令微调(CITING)。它包括两个主要步骤:(1)教师 LLM 针对各类问题制定评估答案的评分标准;(2)学生 LLM 学习遵循这些标准,并根据教师的修改进行自我纠正。我们进一步迭代执行上述过程以体现 CITING 的完整流程。我们在四个数据集上将 CITING 与一系列最新基线进行比较,结果显示其在表达清晰度、深度和全面性方面(以 GPT-4 评价)均有显著提升;其对 SFT、RLHF、RRHF 和 RAFT 的平均胜率分别为 79.4%、73.4%、78.1% 和 76.3%。
results: 对于多种任务的实验结果表明,FCSG 和 Acc-FCSG-M 可以具有较好的样本和通信复杂度。Abstract
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and meta-learning. As the demand for training models with large-scale distributed data grows in these applications, there is an increasing need for communication-efficient distributed optimization algorithms, such as federated learning algorithms. This paper considers the nonconvex conditional stochastic optimization in federated learning and proposes the first federated conditional stochastic optimization algorithm (FCSG) with a conditional stochastic gradient estimator and a momentum-based algorithm (FCSG-M). To match the lower bound complexity in the single-machine setting, we design an accelerated algorithm (Acc-FCSG-M) via the variance reduction to achieve the best sample and communication complexity. Compared with the existing optimization analysis for MAML in FL, federated conditional stochastic optimization considers the sample of tasks. Extensive experimental results on various tasks validate the efficiency of these algorithms.
摘要
条件随机优化在许多机器学习任务中得到应用,如不变学习、AUPRC 最大化和元学习。随着这些应用中以大规模分布式数据训练模型的需求不断增长,对联邦学习等通信高效的分布式优化算法的需求也日益增加。本文研究联邦学习中的非凸条件随机优化问题,提出了首个联邦条件随机优化算法(FCSG),它包含条件随机梯度估计器,并给出了带动量的版本(FCSG-M)。为了匹配单机设置下的复杂度下界,我们借助方差缩减设计了加速算法(Acc-FCSG-M),以达到最优的样本和通信复杂度。与现有针对联邦学习中 MAML 的优化分析相比,联邦条件随机优化考虑了任务的采样。在多种任务上的大量实验结果验证了这些算法的有效性。
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation
paper_authors: Yuan Zhong, Suhan Cui, Jiaqi Wang, Xiaochen Wang, Ziyi Yin, Yaqing Wang, Houping Xiao, Mengdi Huai, Ting Wang, Fenglong Ma
for: 这篇论文旨在提高医疗风险预测的效果,以便预测患者未来可能面临的健康风险。
methods: 这篇论文提出了一种新的基于扩散的生成模型,名为 MedDiffusion,它在训练过程中通过生成合成患者数据来扩大样本空间,从而提高风险预测性能。此外,MedDiffusion 还使用逐步注意力机制来发现患者就诊记录之间隐藏的关系,以便自动保留最重要的信息。
results: 在四个真实医疗数据集上的实验评估显示,MedDiffusion 在 PR-AUC、F1 和 Cohen's Kappa 上均优于 14 种最新基线。此外,我们还进行了消融实验并与基于 GAN 的模型进行比较,以验证模型设计的合理性和适应性;我们还对生成的数据进行了分析,为模型的可解释性提供了新的见解。Abstract
Health risk prediction is one of the fundamental tasks under predictive modeling in the medical domain, which aims to forecast the potential health risks that patients may face in the future using their historical Electronic Health Records (EHR). Researchers have developed several risk prediction models to handle the unique challenges of EHR data, such as its sequential nature, high dimensionality, and inherent noise. These models have yielded impressive results. Nonetheless, a key issue undermining their effectiveness is data insufficiency. A variety of data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through the learning of underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability.
摘要
医疗风险预测是医疗领域预测建模的基本任务之一,旨在利用患者的历史电子病历(EHR)预测其未来可能面临的健康风险。研究人员已经开发了多种风险预测模型,以应对 EHR 数据的独特挑战,如其序列性、高维度和内在噪声,这些模型也取得了很好的效果。然而,数据不足是削弱其有效性的关键问题。为了缓解这一问题,人们提出了多种通过学习底层数据分布来扩充训练集的数据生成与增强方法,但由于其设计与任务无关,这些方法的性能往往有限。为了解决这些不足,本文提出了一种新的端到端的基于扩散的风险预测模型,名为 MedDiffusion。它通过在训练过程中生成合成患者数据来扩大样本空间,从而提升风险预测性能。此外,MedDiffusion 通过逐步注意力机制发现患者就诊之间的隐藏关系,使模型能够自动保留生成高质量数据所需的最重要信息。在四个真实医疗数据集上的实验表明,MedDiffusion 在 PR-AUC、F1 和 Cohen's Kappa 上优于 14 种最新基线。我们还进行了消融研究,并与基于 GAN 的替代方案进行比较,以进一步验证模型设计的合理性和适应性。此外,我们对生成的数据进行了分析,为模型的可解释性提供了新的见解。
Proactive Human-Robot Interaction using Visuo-Lingual Transformers
methods: 本研究提出了一种基于视觉-语言多模态 Transformer 的学习方法,能够捕捉场景中的依赖关系,并根据用户意图主动建议任务。
results: 在模拟和实际场景中,提出的方法能够准确地描述场景和建议任务,提高了人机合作的效率和自然性。Abstract
Humans possess the innate ability to extract latent visuo-lingual cues to infer context through human interaction. During collaboration, this enables proactive prediction of the underlying intention of a series of tasks. In contrast, robotic agents collaborating with humans naively follow elementary instructions to complete tasks or use specific hand-crafted triggers to initiate proactive collaboration when working towards the completion of a goal. Endowing such robots with the ability to reason about the end goal and proactively suggest intermediate tasks will engender a much more intuitive method for human-robot collaboration. To this end, we propose a learning-based method that uses visual cues from the scene, lingual commands from a user and knowledge of prior object-object interaction to identify and proactively predict the underlying goal the user intends to achieve. Specifically, we propose ViLing-MMT, a vision-language multimodal transformer-based architecture that captures inter and intra-modal dependencies to provide accurate scene descriptions and proactively suggest tasks where applicable. We evaluate our proposed model in simulation and real-world scenarios.
摘要
人类具有内生的能力,可以从视觉语言cue中提取潜在的上下文信息,通过人类交互来进行推理。在合作中,这使得人类可以前置预测任务的目标意图。相比之下,机器人在合作时,可能只是遵循基本的指令来完成任务,或者使用特定的手工触发器来引起前置的合作。为了提高人机合作的效果,我们提出了一种学习基于的方法,使用场景中的视觉cue、用户的语言命令和对象之间的互动知识来识别和预测用户的目标意图。我们提出了一种视语多模态变换器(ViLing-MMT),它可以捕捉场景中的视觉和语言之间的依赖关系,并提供准确的场景描述和适时提示任务。我们在模拟和实际场景中评估了我们的提议模型。
paper_authors: Erfan Al-Hossami, Razvan Bunescu, Justin Smith, Ryan Teehan
for: 本研究旨在开发一个自动化的苏格拉底式辅导机器人,以帮助新手程序员调试有缺陷的解决方案。
methods: 本研究使用了人工构建的多轮苏格拉底式建议数据集,并评估了多种语言模型,包括 Flan-T5 和 GPT-4。
results: 研究发现,自动化的苏格拉底式辅导机器人有望提升学习效果,但仍需要更多的数据和评估方法来进一步改进。Note: "苏格拉底式教学"(Socratic teaching)指引导学生自行解决问题、而非直接给出答案的教学方法。Abstract
When employing the Socratic method of teaching, instructors guide students toward solving a problem on their own rather than providing the solution directly. While this strategy can substantially improve learning outcomes, it is usually time-consuming and cognitively demanding. Automated Socratic conversational agents can augment human instruction and provide the necessary scale, however their development is hampered by the lack of suitable data for training and evaluation. In this paper, we introduce a manually created dataset of multi-turn Socratic advice that is aimed at helping a novice programmer fix buggy solutions to simple computational problems. The dataset is then used for benchmarking the Socratic debugging abilities of a number of language models, ranging from fine-tuning the instruction-based text-to-text transformer Flan-T5 to zero-shot and chain of thought prompting of the much larger GPT-4. The code and datasets are made freely available for research at the link below. https://github.com/taisazero/socratic-debugging-benchmark
摘要
采用苏格拉底式教学法时,教师引导学生自行解决问题,而不是直接给出答案。这种策略可以显著提升学习效果,但通常耗时且对认知要求较高。自动化的苏格拉底式对话代理可以辅助人类教学并提供所需的规模化能力,但其开发受制于缺乏合适的训练和评估数据。在这篇论文中,我们介绍了一个人工构建的多轮苏格拉底式建议数据集,旨在帮助新手程序员修复简单计算问题中有缺陷的解决方案。随后,我们用该数据集评测多种语言模型的苏格拉底式调试能力,包括对基于指令的文本到文本 Transformer 模型 Flan-T5 进行微调,以及对规模更大的 GPT-4 使用零样本和思维链提示。代码和数据集免费提供给研究使用,可在以下链接获取:https://github.com/taisazero/socratic-debugging-benchmark。
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices
results: 研究发现,在物理、数学和计算机科学领域,数据和方法共享实践随时间逐渐普及,包含此类链接的文章数量不断增加;同时,这些链接在不同文章间被重复使用的情况也在增多,在计算机科学领域尤为明显。此外,研究还发现,分享数据和方法链接的文章会获得更多引用,链接仍然有效时这一效应更加明显。Abstract
In recent years, funding agencies and journals increasingly advocate for open science practices (e.g. data and method sharing) to improve the transparency, access, and reproducibility of science. However, quantifying these practices at scale has proven difficult. In this work, we leverage a large-scale dataset of 1.1M papers from arXiv that are representative of the fields of physics, math, and computer science to analyze the adoption of data and method link-sharing practices over time and their impact on article reception. To identify links to data and methods, we train a neural text classification model to automatically classify URL types based on contextual mentions in papers. We find evidence that the practice of link-sharing to methods and data is spreading as more papers include such URLs over time. Reproducibility efforts may also be spreading because the same links are being increasingly reused across papers (especially in computer science); and these links are increasingly concentrated within fewer web domains (e.g. Github) over time. Lastly, articles that share data and method links receive increased recognition in terms of citation count, with a stronger effect when the shared links are active (rather than defunct). Together, these findings demonstrate the increased spread and perceived value of data and method sharing practices in open science.
摘要
近年来,资助机构和期刊越来越多地倡导开放科学实践(例如数据和方法共享),以提高科学的透明度、可获取性和可复现性。然而,对这些实践进行大规模量化一直较为困难。在这项工作中,我们利用 arXiv 上具有物理、数学和计算机科学领域代表性的 110 万篇论文数据,分析数据与方法链接共享实践随时间的采用情况及其对论文反响的影响。为了识别指向数据和方法的链接,我们训练了一个神经文本分类模型,根据论文中的上下文提及自动对 URL 类型进行分类。我们发现,随着时间推移,越来越多的论文包含此类链接,表明面向方法和数据的链接共享实践正在扩散;同样的链接在论文之间被重复使用的情况也在增加(尤其在计算机科学领域),可见可复现性工作也可能在扩散;而且这些链接随时间越来越集中在少数网络域名(如 Github)上。最后,分享数据和方法链接的论文获得了更多的引用,当所分享的链接仍然有效(而非失效)时,这一效应更强。总之,这些发现表明开放科学中数据与方法共享实践的传播范围及其被认可的价值都在增加。
Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference
results: 通过多条件调查评估,作者发现 RAG 可以提高面向中学生数学问答的回复质量,但设计者需要在贴近教学资源的回复与学生偏好的回复之间进行权衡。Abstract
For middle-school math students, interactive question-answering (QA) with tutors is an effective way to learn. The flexibility and emergent capabilities of generative large language models (LLMs) has led to a surge of interest in automating portions of the tutoring process - including interactive QA to support conceptual discussion of mathematical concepts. However, LLM responses to math questions can be incorrect or mismatched to the educational context - such as being misaligned with a school's curriculum. One potential solution is retrieval-augmented generation (RAG), which involves incorporating a vetted external knowledge source in the LLM prompt to increase response quality. In this paper, we designed prompts that retrieve and use content from a high-quality open-source math textbook to generate responses to real student questions. We evaluate the efficacy of this RAG system for middle-school algebra and geometry QA by administering a multi-condition survey, finding that humans prefer responses generated using RAG, but not when responses are too grounded in the textbook content. We argue that while RAG is able to improve response quality, designers of math QA systems must consider trade-offs between generating responses preferred by students and responses closely matched to specific educational resources.
摘要
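A minimal sketch of the retrieval-augmented prompting setup described above: textbook passages are embedded once, the passages closest to the student's question are retrieved by cosine similarity, and the prompt grounds the answer in that retrieved content. The `embed` helper (which just fabricates deterministic vectors so the sketch runs) and the prompt wording are assumptions for illustration; the paper's system uses a specific open-source textbook and its own prompt design.

```python
import numpy as np

def embed(texts):
    """Placeholder embedding function (hypothetical). In practice this would call
    a sentence-embedding model; here we fabricate deterministic vectors so the
    sketch runs end to end."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % (2**32))
    return rng.normal(size=(len(texts), 64))

def retrieve(question, passages, passage_vecs, k=2):
    """Return the k passages most similar to the question by cosine similarity."""
    q = embed([question])[0]
    sims = passage_vecs @ q / (np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(q) + 1e-9)
    return [passages[i] for i in np.argsort(-sims)[:k]]

def build_rag_prompt(question, retrieved):
    context = "\n---\n".join(retrieved)
    return (
        "Use only the following textbook excerpts to answer the student's question.\n"
        f"Excerpts:\n{context}\n\nStudent question: {question}\nAnswer:"
    )

passages = ["The slope of a line through (x1, y1) and (x2, y2) is (y2 - y1) / (x2 - x1).",
            "A linear equation in slope-intercept form is y = mx + b.",
            "The area of a triangle is one half base times height."]
vecs = embed(passages)
question = "How do I find the slope between two points?"
print(build_rag_prompt(question, retrieve(question, passages, vecs)))
```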
Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models
results: 这篇论文在八个医疗影像分类数据集上系统地评估了所提方法的效果,证明其能够缓解虚假相关等混杂因素的影响,并显著优于标准视觉编码器和其他基线。此外,论文还通过真实医疗数据上的案例研究,展示了该方法在医疗影像分类中的可解释性。Abstract
Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new domains (e.g., patients with different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision for trustworthy and safety considerations. In this paper, to address these two limitations, we propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations thus substantially outperform standard visual encoders and other baselines. Finally, we show how classification with a small number of concepts brings a level of interpretability for understanding model decisions through case studies in real medical data.
摘要
医疗图像分类是医疗领域的关键问题,有助于减轻医生的工作负担并辅助病人的诊断。然而,将深度学习模型部署到真实医疗应用时存在两个挑战。首先,神经网络模型往往学习虚假相关性而非期望的特征,这可能导致其在新领域(例如不同年龄的病人)上泛化不佳。其次,这些黑盒模型缺乏可解释性。在做出诊断预测时,出于可信与安全的考虑,理解模型为何做出某个决定非常重要。为了解决这两个局限,本文提出了一种新的范式,利用自然语言概念来构建稳健且可解释的医疗图像分类器。具体来说,我们首先从 GPT-4 查询临床概念,然后使用视觉-语言模型将潜在的图像特征转换为显式的概念。我们在八个医疗图像分类数据集上系统地评估了该方法以验证其有效性。在具有强混杂因素的高难度数据集上,我们的方法能够缓解虚假相关性,因而显著优于标准视觉编码器和其他基线。最后,我们通过真实医疗数据上的案例研究展示,仅用少量概念进行分类即可为理解模型决策提供一定程度的可解释性。
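As a rough sketch of the concept-based pipeline above: image features are scored against text embeddings of clinical concept phrases (CLIP-style cosine similarity), and a simple linear layer over those concept scores makes the final prediction, so each weight is readable per concept. The `image_encoder`/`text_encoder` stand-ins, the concept list, and the hand-set weights are assumptions for illustration, not the paper's exact components.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for pretrained vision-language encoders (CLIP-like).
def image_encoder(images):            # list of images -> (B, D) normalized features
    feats = rng.normal(size=(len(images), 128))
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

def text_encoder(phrases):            # list of strings -> (C, D) normalized features
    feats = rng.normal(size=(len(phrases), 128))
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

concepts = ["irregular border", "asymmetric shape", "uniform color", "small diameter"]

def concept_scores(images):
    """Concept bottleneck: project each image onto explicit clinical concepts."""
    return image_encoder(images) @ text_encoder(concepts).T    # (B, num_concepts)

# A linear head over concept scores keeps the decision interpretable:
# each weight says how much a named concept pushes toward the positive class.
W = np.array([1.2, 0.9, -0.8, -0.5])    # illustrative weights, one per concept
b = 0.0

def predict(images):
    s = concept_scores(images)
    return (s @ W + b > 0).astype(int), s

labels, scores = predict([np.zeros((64, 64))] * 3)
print(labels, scores.shape)   # per-image class, plus the per-concept evidence
```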
$\mathcal{B}$-Coder: Value-Based Deep Reinforcement Learning for Program Synthesis
paper_authors: Zishun Yu, Yunzhe Tao, Liyu Chen, Tao Sun, Hongxia Yang
for: This paper focuses on program synthesis, which aims to generate accurate and executable code from natural language descriptions. The authors explore the use of reinforcement learning (RL) and large language models (LLMs) to enhance code generation capabilities.
methods: The authors propose a value-based approach to program synthesis, which differs from the predominant policy-based methods. They develop a novel RL agent called $\mathcal{B}$-Coder, which leverages pre-trained LLMs and a conservative Bellman operator to reduce training complexities.
results: The authors demonstrate the effectiveness of their approach through empirical evaluations, achieving state-of-the-art performance compared to policy-based methods. Notably, this achievement is reached with minimal reward engineering effort, highlighting the effectiveness of value-based RL.Abstract
Program synthesis aims to create accurate, executable code from natural language descriptions. This field has leveraged the power of reinforcement learning (RL) in conjunction with large language models (LLMs), significantly enhancing code generation capabilities. This integration focuses on directly optimizing functional correctness, transcending conventional supervised losses. While current literature predominantly favors policy-based algorithms, attributes of program synthesis suggest a natural compatibility with value-based methods. This stems from rich collection of off-policy programs developed by human programmers, and the straightforward verification of generated programs through automated unit testing (i.e. easily obtainable rewards in RL language). Diverging from the predominant use of policy-based algorithms, our work explores the applicability of value-based approaches, leading to the development of our $\mathcal{B}$-Coder (pronounced Bellman coder). Yet, training value-based methods presents challenges due to the enormous search space inherent to program synthesis. To this end, we propose an initialization protocol for RL agents utilizing pre-trained LMs and a conservative Bellman operator to reduce training complexities. Moreover, we demonstrate how to leverage the learned value functions as a dual strategy to post-process generated programs. Our empirical evaluations demonstrated $\mathcal{B}$-Coder's capability in achieving state-of-the-art performance compared with policy-based methods. Remarkably, this achievement is reached with minimal reward engineering effort, highlighting the effectiveness of value-based RL, independent of reward designs.
摘要
程序合成的目标是从自然语言描述中生成准确且可执行的代码。该领域将强化学习(RL)与大型语言模型(LLM)相结合,显著提升了代码生成能力。这种结合着眼于直接优化功能正确性,超越了传统的监督损失。尽管当前文献主要偏向基于策略的算法,但程序合成的特性表明它与基于价值的方法具有天然的兼容性:一方面,人类程序员已经编写了大量可作为离策略数据的程序;另一方面,生成的程序可以通过自动化单元测试直接验证(即在强化学习意义上容易获得奖励)。与主流的基于策略的做法不同,我们的工作探索了基于价值的方法的适用性,并由此开发了 $\mathcal{B}$-Coder(读作 Bellman coder)。然而,程序合成固有的巨大搜索空间给训练基于价值的方法带来了挑战。为此,我们提出了利用预训练语言模型初始化 RL 智能体的方案,并引入保守的 Bellman 算子以降低训练复杂度。此外,我们还展示了如何将学习到的价值函数作为一种双重策略,用于对生成的程序进行后处理。实验评估表明,$\mathcal{B}$-Coder 与基于策略的方法相比能够取得最先进的性能;值得注意的是,这一成果仅需极少的奖励工程投入,凸显了基于价值的强化学习独立于奖励设计的有效性。
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
paper_authors: Yue Huang, Jiawen Shi, Yuan Li, Chenrui Fan, Siyuan Wu, Qihui Zhang, Yixin Liu, Pan Zhou, Yao Wan, Neil Zhenqiang Gong, Lichao Sun
for: This paper aims to evaluate the tool usage awareness and selection ability of large language models (LLMs) in various scenarios, with the goal of determining whether LLMs can effectively serve as intelligent agents.
methods: The authors create a benchmark called MetaTool, which includes a dataset called ToolE that contains various user queries in the form of prompts that trigger LLMs to use tools. They define four subtasks for tool selection and conduct experiments involving nine popular LLMs.
results: The majority of the LLMs struggle to effectively select tools, highlighting the existing gaps between LLMs and genuine intelligent agents. However, through error analysis, the authors found significant room for improvement. The paper provides insights for tool developers to enhance the tool selection performance of LLMs.
for: 这篇论文旨在评估大语言模型(LLMs)在不同场景下是否具备工具使用意识和工具选择能力,以验证 LLMs 能否有效地充当智能代理。
methods: 作者们创建了一个名为 MetaTool 的基准,其中包括名为 ToolE 的数据集,该数据集包含各种以提示形式出现、会触发 LLMs 使用工具的用户查询。作者们定义了四个工具选择子任务,包括相似工具选择、特定场景下的工具选择、存在潜在可靠性问题时的工具选择以及多工具选择,并对九个流行的 LLMs 进行了实验。
results: 大多数 LLMs 仍难以有效地选择工具,这反映了 LLMs 与真正的智能代理之间仍存在差距;但通过错误分析,作者们发现仍有很大的改进空间。论文最后为工具开发者提供了启示:按照 ChatGPT 的方式提供详细的工具描述,有助于提升 LLMs 的工具选择性能。
Large language models (LLMs) have garnered significant attention due to their impressive natural language processing (NLP) capabilities. Recently, many studies have focused on the tool utilization ability of LLMs. They primarily investigated how LLMs effectively collaborate with given specific tools. However, in scenarios where LLMs serve as intelligent agents, as seen in applications like AutoGPT and MetaGPT, LLMs are expected to engage in intricate decision-making processes that involve deciding whether to employ a tool and selecting the most suitable tool(s) from a collection of available tools to fulfill user requests. Therefore, in this paper, we introduce MetaTool, a benchmark designed to evaluate whether LLMs have tool usage awareness and can correctly choose tools. Specifically, we create a dataset called ToolE within the benchmark. This dataset contains various types of user queries in the form of prompts that trigger LLMs to use tools, including both single-tool and multi-tool scenarios. Subsequently, we set the tasks for both tool usage awareness and tool selection. We define four subtasks from different perspectives in tool selection, including tool selection with similar choices, tool selection in specific scenarios, tool selection with possible reliability issues, and multi-tool selection. We conduct experiments involving nine popular LLMs and find that the majority of them still struggle to effectively select tools, highlighting the existing gaps between LLMs and genuine intelligent agents. However, through the error analysis, we found there is still significant room for improvement. Finally, we conclude with insights for tool developers that follow ChatGPT to provide detailed descriptions that can enhance the tool selection performance of LLMs.
摘要
大型语言模型(LLMs)因其出色的自然语言处理(NLP)能力而受到广泛关注。最近,许多研究聚焦于 LLMs 的工具使用能力,主要探讨 LLMs 如何与给定的特定工具有效协作。但在 AutoGPT 和 MetaGPT 等应用中,LLMs 扮演智能代理的角色,需要进行复杂的决策,包括判断是否使用工具,以及从可用工具集合中选择最适合用户需求的工具。因此,在这篇论文中,我们提出了 MetaTool,一个用于评估 LLMs 是否具有工具使用意识并能正确选择工具的基准。具体来说,我们在该基准中构建了名为 ToolE 的数据集,其中包含各种以提示形式出现、会触发 LLMs 使用工具的用户查询,涵盖单工具和多工具两类情形。随后,我们设定了工具使用意识与工具选择两方面的任务,并从不同角度定义了四个工具选择子任务,包括相似工具间的选择、特定情景下的选择、存在潜在可靠性问题时的选择以及多工具选择。我们对九个流行的 LLMs 进行了实验,发现其中大多数仍难以有效地选择工具,这显示了 LLMs 与真正的智能代理之间仍存在差距。不过,通过错误分析,我们发现仍有很大的改进空间。最后,我们为效仿 ChatGPT 的工具开发者给出了启示:提供详细的工具描述能够提升 LLMs 的工具选择性能。
Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages
methods: 我们以基于离散单元的语言建模作为基线系统,以零资源的方式评估自监督语音编码器的代码混合能力。
results: 我们的实验涵盖了多种知名的语音编码器,包括 Wav2vec 2.0、HuBERT 等。我们发现,使用多语言预训练的编码器(如 XLSR)在代码混合场景下表现较好,但在代码混合语言能力方面仍有很大的改进空间。Abstract
We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech encoders, including Wav2vec 2.0, HuBERT, XLSR, etc. We examine the impact of pre-training languages and model size on benchmark performance. Notably, though our results demonstrate that speech encoders with multilingual pre-training, exemplified by XLSR, outperform monolingual variants (Wav2vec 2.0, HuBERT) in code-switching scenarios, there is still substantial room for improvement in their code-switching linguistic abilities.
摘要
我们介绍了一个新的零资源代码混合(code-switched)语音基准,用于直接评估自监督语音编码器的代码混合能力。我们展示了一个在离散单元上进行语言建模的基线系统,以说明如何以零资源的方式评估语音编码器的代码混合能力。我们的实验涵盖了多种知名的语音编码器,包括 Wav2vec 2.0、HuBERT、XLSR 等。我们研究了预训练语言和模型大小对基准性能的影响。结果显示,以 XLSR 为代表的多语言预训练语音编码器在代码混合场景中优于单语言变体(Wav2vec 2.0、HuBERT);然而,它们在代码混合语言能力方面仍有很大的改进空间。
Multimodal Question Answering for Unified Information Extraction
results: 在六个数据集上的广泛实验显示,我们的 MQA 框架可以在不同的任务和设置下有效地提升多模态大模型(LMM)的性能,并在零样本设置下大幅超越此前最优的基线。Abstract
Multimodal information extraction (MIE) aims to extract structured information from unstructured multimedia content. Due to the diversity of tasks and settings, most current MIE models are task-specific and data-intensive, which limits their generalization to real-world scenarios with diverse task requirements and limited labeled data. To address these issues, we propose a novel multimodal question answering (MQA) framework to unify three MIE tasks by reformulating them into a unified span extraction and multi-choice QA pipeline. Extensive experiments on six datasets show that: 1) Our MQA framework consistently and significantly improves the performances of various off-the-shelf large multimodal models (LMM) on MIE tasks, compared to vanilla prompting. 2) In the zero-shot setting, MQA outperforms previous state-of-the-art baselines by a large margin. In addition, the effectiveness of our framework can successfully transfer to the few-shot setting, enhancing LMMs on a scale of 10B parameters to be competitive or outperform much larger language models such as ChatGPT and GPT-4. Our MQA framework can serve as a general principle of utilizing LMMs to better solve MIE and potentially other downstream multimodal tasks.
摘要
多模态信息提取(MIE)的目标是从非结构化多媒体内容中提取结构化信息。由于任务和设置的多样性,当前大多数MIE模型是任务特定且数据密集的,这限制了它们在任务需求多样、标注数据有限的真实场景中的泛化能力。为解决这些问题,我们提出一种多模态问答(MQA)框架,将三类MIE任务统一转化为跨度抽取和多选问答的管道。广泛的实验表明:1. 我们的MQA框架在不同数据集上持续且显著地改进了多种现成的多模态大模型(LMM)在MIE任务上的性能,优于朴素提示(vanilla prompting)。2. 在零样本设置下,MQA大幅超越了之前的最先进基线。此外,我们框架的效果可以成功迁移到少样本设置,使规模为10B参数的LMM能够与ChatGPT和GPT-4等更大的语言模型竞争甚至超越它们。3. 我们的MQA框架可以作为利用LMM更好地解决MIE乃至其他下游多模态任务的一般原则。
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
results: 研究发现,Transformer 在较简单的任务上几乎可以匹配最佳学习算法,但在更复杂的任务上表现下降。此外,一些无注意力模型也能达到与 Transformer 相近的性能。在提供教学序列后,Transformer 能以更高的样本效率进行学习。最后,研究发现现有的 LLMs 可以与基于最近邻(nearest-neighbor)的基线相竞争。Abstract
In order to understand the in-context learning phenomenon, recent works have adopted a stylized experimental framework and demonstrated that Transformers can learn gradient-based learning algorithms for various classes of real-valued functions. However, the limitations of Transformers in implementing learning algorithms, and their ability to learn other forms of algorithms are not well understood. Additionally, the degree to which these capabilities are confined to attention-based models is unclear. Furthermore, it remains to be seen whether the insights derived from these stylized settings can be extrapolated to pretrained Large Language Models (LLMs). In this work, we take a step towards answering these questions by demonstrating the following: (a) On a test-bed with a variety of Boolean function classes, we find that Transformers can nearly match the optimal learning algorithm for 'simpler' tasks, while their performance deteriorates on more 'complex' tasks. Additionally, we find that certain attention-free models perform (almost) identically to Transformers on a range of tasks. (b) When provided a teaching sequence, i.e. a set of examples that uniquely identifies a function in a class, we show that Transformers learn more sample-efficiently. Interestingly, our results show that Transformers can learn to implement two distinct algorithms to solve a single task, and can adaptively select the more sample-efficient algorithm depending on the sequence of in-context examples. (c) Lastly, we show that extant LLMs, e.g. LLaMA-2, GPT-4, can compete with nearest-neighbor baselines on prediction tasks that are guaranteed to not be in their training set.
摘要
为了理解上下文学习(in-context learning)现象,近期研究采用了程式化的实验框架,并证明了 Transformer 可以为多类实值函数学习基于梯度的学习算法。然而,Transformer 在实现学习算法方面的局限性,以及它们学习其他形式算法的能力,仍未被充分理解。此外,这些能力在多大程度上仅限于基于注意力的模型也尚不清楚;从这些程式化设置中得到的见解能否推广到预训练的大型语言模型(LLMs)也有待验证。在这项工作中,我们通过以下方式朝回答这些问题迈出一步:(a) 在一个包含多种布尔函数类的测试环境中,我们发现 Transformer 在较“简单”的任务上几乎可以匹配最佳学习算法,而在较“复杂”的任务上性能会下降;此外,某些无注意力(attention-free)模型在一系列任务上的表现几乎与 Transformer 相同。(b) 当提供教学序列(即能唯一确定函数类中某个函数的一组示例)时,我们发现 Transformer 能以更高的样本效率学习。有趣的是,我们的结果显示 Transformer 可以学会用两种不同的算法求解同一个任务,并能根据上下文示例序列自适应地选择样本效率更高的算法。(c) 最后,我们发现现有的 LLMs(如 LLaMA-2、GPT-4)在保证不在其训练集中的预测任务上,可以与最近邻基线相竞争。
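One of the baselines referenced in this abstract is a nearest-neighbor predictor over the in-context examples. Below is a minimal sketch of such a baseline for Boolean functions; the Hamming-distance metric, tie-breaking, and data layout are our own assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def nearest_neighbor_predict(context_x, context_y, query_x):
    """Predict the label of each query point by copying the label of the
    closest in-context example (Hamming distance over Boolean inputs)."""
    context_x = np.asarray(context_x)          # (n, d) array of 0/1 inputs
    context_y = np.asarray(context_y)          # (n,) array of 0/1 labels
    query_x = np.atleast_2d(query_x)           # (m, d)
    # Hamming distance between every query point and every context point.
    dists = (query_x[:, None, :] != context_x[None, :, :]).sum(axis=-1)
    nearest = dists.argmin(axis=1)             # index of closest example (ties -> first)
    return context_y[nearest]

# Toy usage: four in-context examples of a 3-bit Boolean function.
ctx_x = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
ctx_y = [0, 1, 0, 1]
print(nearest_neighbor_predict(ctx_x, ctx_y, [[0, 1, 1]]))
```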
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference
paper_authors: Siddharth Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, Jeremy Kepner, Devesh Tiwari, Vijay Gadepally
results: 论文的结果表明,不同规模的LLaMA模型在不同GPU和数据集下的推理性能和能源消耗存在很大差异。论文还展示了利用模型分片在多达32个GPU上进行多节点、多GPU推理的结果。Abstract
Large language models (LLMs) have exploded in popularity due to their new generative capabilities that go far beyond prior state-of-the-art. These technologies are increasingly being leveraged in various domains such as law, finance, and medicine. However, these models carry significant computational challenges, especially the compute and energy costs required for inference. Inference energy costs already receive less attention than the energy costs of training LLMs -- despite how often these large models are called on to conduct inference in reality (e.g., ChatGPT). As these state-of-the-art LLMs see increasing usage and deployment in various domains, a better understanding of their resource utilization is crucial for cost-savings, scaling performance, efficient hardware usage, and optimal inference strategies. In this paper, we describe experiments conducted to study the computational and energy utilization of inference with LLMs. We benchmark and conduct a preliminary analysis of the inference performance and inference energy costs of different sizes of LLaMA -- a recent state-of-the-art LLM -- developed by Meta AI on two generations of popular GPUs (NVIDIA V100 \& A100) and two datasets (Alpaca and GSM8K) to reflect the diverse set of tasks/benchmarks for LLMs in research and practice. We present the results of multi-node, multi-GPU inference using model sharding across up to 32 GPUs. To our knowledge, our work is the one of the first to study LLM inference performance from the perspective of computational and energy resources at this scale.
摘要
大型语言模型(LLM)因其远超以往最先进水平的生成能力而迅速普及,并被越来越多地应用于法律、金融和医学等领域。然而,这些模型带来了巨大的计算挑战,尤其是推理所需的计算和能源成本。与训练 LLM 的能源成本相比,推理的能源成本受到的关注更少——尽管这些大模型在现实中(例如 ChatGPT)被频繁调用进行推理。随着这些最先进的 LLM 在各领域的使用和部署不断增加,更好地理解其资源利用情况对于节约成本、扩展性能、高效使用硬件以及优化推理策略至关重要。在本文中,我们描述了研究 LLM 推理的计算和能源利用情况的实验。我们在两代流行的 GPU(NVIDIA V100 和 A100)和两个数据集(Alpaca 和 GSM8K)上,对 Meta AI 开发的最新 LLM——不同规模的 LLaMA——的推理性能和推理能耗进行了基准测试和初步分析,以反映研究和实践中 LLM 的多样化任务/基准。我们展示了使用模型分片在多达 32 个 GPU 上进行多节点、多 GPU 推理的结果。据我们所知,我们的工作是首批在这一规模上从计算和能源资源角度研究 LLM 推理性能的工作之一。
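To make the kind of measurement described above concrete, here is a minimal sketch of estimating per-call inference energy on an NVIDIA GPU by sampling board power through NVML and integrating over time. The sampling interval, device index, and the placeholder model call are assumptions; the paper's multi-node, multi-GPU instrumentation is considerably more involved.

```python
import time
import threading
import pynvml  # pip install nvidia-ml-py

def measure_energy(fn, device_index=0, interval_s=0.05):
    """Run fn() while sampling GPU power with NVML; return (result, joules)."""
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    samples, stop = [], threading.Event()

    def sampler():
        while not stop.is_set():
            # nvmlDeviceGetPowerUsage reports milliwatts; convert to watts.
            samples.append((time.time(), pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0))
            time.sleep(interval_s)

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    result = fn()
    stop.set()
    t.join()
    pynvml.nvmlShutdown()
    # Trapezoidal integration of power (W) over time (s) -> energy (J).
    energy = sum((t2 - t1) * (p1 + p2) / 2.0
                 for (t1, p1), (t2, p2) in zip(samples, samples[1:]))
    return result, energy

# Hypothetical usage with any model call:
# _, joules = measure_energy(lambda: model.generate(**inputs, max_new_tokens=128))
```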
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
results: 该论文显示了 zero-shot 多实体驱动生成的能力,而且不需要修改图像解码器。这使得可以轻松地替换CLIP并集成多种 U-Net 技术,从细化控制到个性化图像解码器。Abstract
Recent advancements in text-to-image (T2I) and vision-language-to-image (VL2I) generation have made significant strides. However, the generation from generalized vision-language inputs, especially involving multiple images, remains under-explored. This paper presents Kosmos-G, a model that leverages the advanced perception capabilities of Multimodal Large Language Models (MLLMs) to tackle the aforementioned challenge. Our approach aligns the output space of MLLM with CLIP using the textual modality as an anchor and performs compositional instruction tuning on curated data. Kosmos-G demonstrates a unique capability of zero-shot multi-entity subject-driven generation. Notably, the score distillation instruction tuning requires no modifications to the image decoder. This allows for a seamless substitution of CLIP and effortless integration with a myriad of U-Net techniques ranging from fine-grained controls to personalized image decoder variants. We posit Kosmos-G as an initial attempt towards the goal of "image as a foreign language in image generation."
摘要
最近的文本到图像(T2I)和视觉语言到图像(VL2I)生成技术已经取得了显著进展。然而,从通用的视觉语言输入(特别是包含多个图像的输入)进行生成,仍未得到充分探索。本文提出了 Kosmos-G 模型,利用多模态大语言模型(MLLMs)先进的感知能力来应对上述挑战。我们的方法以文本模态为锚,将 MLLM 的输出空间与 CLIP 对齐,并在精选数据上进行组合式指令微调。Kosmos-G 展现出独特的零样本多实体主题驱动生成能力。值得注意的是,分数蒸馏指令微调无需对图像解码器做任何修改,这使得可以无缝地替换 CLIP,并轻松地与从细粒度控制到个性化图像解码器变体等多种 U-Net 技术集成。我们认为 Kosmos-G 是朝着“图像作为图像生成中的外语”这一目标迈出的初步尝试。
Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
results: 该论文发现,经过适当预训练的 vanilla Transformer 在 Long Range Arena 上可以与状态空间模型(SSM)的表现相匹配,并将 SSM 在 PathX-256 任务上的最佳已报告结果提升了 20 个绝对百分点。此外,论文还发现,在拥有数据驱动初始化的情况下,先前为 SSM 提出的结构化参数化方法变得冗余。Abstract
Modeling long-range dependencies across sequences is a longstanding goal in machine learning and has led to architectures, such as state space models, that dramatically outperform Transformers on long sequences. However, these impressive empirical gains have been by and large demonstrated on benchmarks (e.g. Long Range Arena), where models are randomly initialized and trained to predict a target label from an input sequence. In this work, we show that random initialization leads to gross overestimation of the differences between architectures and that pretraining with standard denoising objectives, using $\textit{only the downstream task data}$, leads to dramatic gains across multiple architectures and to very small gaps between Transformers and state space models (SSMs). In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained, and we improve the best reported results of SSMs on the PathX-256 task by 20 absolute points. Subsequently, we analyze the utility of previously-proposed structured parameterizations for SSMs and show they become mostly redundant in the presence of data-driven initialization obtained through pretraining. Our work shows that, when evaluating different architectures on supervised tasks, incorporation of data-driven priors via pretraining is essential for reliable performance estimation, and can be done efficiently.
摘要
与先前的工作相反,我们发现经过适当预训练的原始 Transformer 在 Long Range Arena 上可以达到与 S4 相当的性能,并将 SSM 在 PathX-256 任务上的最佳已报告结果提升了 20 个绝对百分点。此外,我们分析了先前为 SSM 提出的结构化参数化的作用,并表明在通过预训练获得数据驱动初始化的情况下,这些参数化变得冗余。我们的工作表明,在监督任务上评估不同架构时,通过预训练引入数据驱动先验对于可靠的性能评估至关重要,并且可以高效地实现。
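The denoising pretraining discussed above can be approximated with a standard masked-token objective applied to the downstream task's own inputs. The sketch below shows one generic masking collator; the 15% mask rate, the padding convention, and the [MASK] id are assumptions rather than the paper's exact recipe.

```python
import torch

def mask_tokens(input_ids, mask_token_id, mask_prob=0.15, pad_token_id=0):
    """Return (corrupted_inputs, labels) for a masked denoising objective.
    Labels are -100 (ignored by cross-entropy) everywhere except masked positions."""
    labels = input_ids.clone()
    is_pad = input_ids.eq(pad_token_id)
    probs = torch.full(input_ids.shape, mask_prob)
    masked = torch.bernoulli(probs).bool() & ~is_pad
    labels[~masked] = -100                      # only predict the corrupted positions
    corrupted = input_ids.clone()
    corrupted[masked] = mask_token_id           # replace chosen positions with [MASK]
    return corrupted, labels

# Toy usage on a batch of token ids (103 is BERT's [MASK] id, used here as an example).
batch = torch.randint(5, 100, (2, 16))
x, y = mask_tokens(batch, mask_token_id=103)
```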
T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation
results: 这篇论文提出了一个名为 T$^3$Bench 的全面的文本到3D测试集,并提出了两种自动度量器来评估文本到3D模型的性能,即多视图图像评分和文本-3D一致度评估。这两种度量器与人类评价有高度相关,可以有效地评估文本到3D模型的性能。Abstract
Recent methods in text-to-3D leverage powerful pretrained diffusion models to optimize NeRF. Notably, these methods are able to produce high-quality 3D scenes without training on 3D data. Due to the open-ended nature of the task, most studies evaluate their results with subjective case studies and user experiments, thereby presenting a challenge in quantitatively addressing the question: How has current progress in Text-to-3D gone so far? In this paper, we introduce T$^3$Bench, the first comprehensive text-to-3D benchmark containing diverse text prompts of three increasing complexity levels that are specially designed for 3D generation. To assess both the subjective quality and the text alignment, we propose two automatic metrics based on multi-view images produced by the 3D contents. The quality metric combines multi-view text-image scores and regional convolution to detect quality and view inconsistency. The alignment metric uses multi-view captioning and Large Language Model (LLM) evaluation to measure text-3D consistency. Both metrics closely correlate with different dimensions of human judgments, providing a paradigm for efficiently evaluating text-to-3D models. The benchmarking results, shown in Fig. 1, reveal performance differences among six prevalent text-to-3D methods. Our analysis further highlights the common struggles for current methods on generating surroundings and multi-object scenes, as well as the bottleneck of leveraging 2D guidance for 3D generation. Our project page is available at: https://t3bench.com.
摘要
现有的文本到3D方法利用强大的预训练扩散(diffusion)模型来优化 NeRF,无需在3D数据上训练即可生成高质量3D场景。由于这是一个开放式任务,大多数研究通过主观的案例研究和用户实验评估自己的结果,因此难以定量回答“文本到3D目前进展如何”这一问题。在这篇论文中,我们介绍 T$^3$Bench,首个全面的文本到3D基准,包含专为3D生成设计的三个递增复杂度水平的多样文本提示。为了同时评估主观质量和文本对齐程度,我们提出了两种基于生成3D内容多视图图像的自动度量。质量度量结合多视图文本-图像分数与区域卷积,以检测质量和视图不一致;对齐度量利用多视图描述和大语言模型(LLM)评估来度量文本-3D一致性。这两个度量与人类评价的不同维度高度相关,为高效评估文本到3D模型提供了一种范式。Fig. 1 中的基准测试结果显示了六种流行的文本到3D方法之间的性能差异。我们的分析还指出了当前方法在生成周围环境和多物体场景时的普遍困难,以及利用2D引导进行3D生成的瓶颈。我们的项目页面:https://t3bench.com。
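As a rough illustration of the quality metric described above, the snippet below averages CLIP text-image similarity over several rendered views of a generated 3D asset. It deliberately omits the paper's regional-convolution and view-consistency terms, and the CLIP checkpoint and rendering step are assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def multi_view_clip_score(prompt, view_paths):
    """Average cosine similarity between the prompt and each rendered view."""
    images = [Image.open(p).convert("RGB") for p in view_paths]
    inputs = processor(text=[prompt], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    text_emb = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    image_emb = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    sims = (image_emb @ text_emb.T).squeeze(-1)   # one similarity score per view
    return sims.mean().item()

# score = multi_view_clip_score("a wooden rocking chair", ["view_0.png", "view_1.png"])
```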
UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network
results: 研究表明,这个单一多任务学习(MTL)模型“UniverSLU”在12种语音分类和序列生成任务中表现了竞争力,甚至超过了专门为这些任务预训练的模型。Abstract
Recent studies have demonstrated promising outcomes by employing large language models with multi-tasking capabilities. They utilize prompts to guide the model's behavior and surpass performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly perform various spoken language understanding (SLU) tasks? To address this, we utilize pre-trained automatic speech recognition (ASR) models and employ various task and dataset specifiers as discrete prompts. We demonstrate efficacy of our single multi-task learning (MTL) model "UniverSLU" for 12 different speech classification and sequence generation tasks across 17 datasets and 9 languages. Results show that UniverSLU achieves competitive performance and even surpasses task-specific models. We also conduct preliminary investigations into enabling human-interpretable natural phrases instead of task specifiers as discrete prompts and test the model's generalization capabilities to new paraphrases.
摘要
最近的研究表明,使用具备多任务能力的大型语言模型能够取得可观的成果。它们使用提示来引导模型的行为,性能超越了专门为某个任务设计的模型。受此启发,我们提出问题:能否建立一个联合执行多种口语理解(SLU)任务的单一模型?为此,我们利用预训练的自动语音识别(ASR)模型,并使用不同的任务和数据集指定符作为离散提示。我们的单一多任务学习(MTL)模型称为“UniverSLU”,在17个数据集和9种语言上的12种语音分类和序列生成任务中表现出竞争力,甚至超越了专门为某个任务设计的模型。我们还进行了初步研究,尝试使用人类可理解的自然短语代替任务指定符作为离散提示,并测试模型对新的同义改写的泛化能力。
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
results: 实验结果显示,启发询问在语音识别和插槽填充等序列生成任务中能够 achieve 53% 的Relative Improvement in Word Error Rate 和 27% 的F1 Score。此外,启发询问在低资源enario中与 Fine-Tuning 方法竞争。此外,这篇论文还证明了启发询问和适束调整在不同语言的cross-Lingual ASR中的传递性。Abstract
Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. Besides, adapter tuning is primarily applied with a focus on encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous works in sequence generation tasks. It achieves a remarkable 53% relative improvement in word error rate for ASR and a 27% in F1 score for slot filling. Additionally, prompting competes with the FT method in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq in cross-lingual ASR. When limited trainable parameters are involved, prompting and adapter tuning consistently outperform conventional FT across 7 languages. Notably, in the low-resource scenario, prompting consistently outperforms adapter tuning.
摘要
<>translate "Prompting and adapter tuning have emerged as efficient alternatives to fine-tuning (FT) methods. However, existing studies on speech prompting focused on classification tasks and failed on more complex sequence generation tasks. Besides, adapter tuning is primarily applied with a focus on encoder-only self-supervised models. Our experiments show that prompting on Wav2Seq, a self-supervised encoder-decoder model, surpasses previous works in sequence generation tasks. It achieves a remarkable 53% relative improvement in word error rate for ASR and a 27% in F1 score for slot filling. Additionally, prompting competes with the FT method in the low-resource scenario. Moreover, we show the transferability of prompting and adapter tuning on Wav2Seq in cross-lingual ASR. When limited trainable parameters are involved, prompting and adapter tuning consistently outperform conventional FT across 7 languages. Notably, in the low-resource scenario, prompting consistently outperforms adapter tuning." into Simplified Chinese.干���addle和适配调整已经成为精细调整(FT)方法的有效替代方案。然而,现有的speech prompting研究主要集中在分类任务上,并未能够处理更复杂的序列生成任务。此外,适配调整主要应用于encoder-only自动学习模型。我们的实验表明,在Wav2Seq模型上进行提示,超过了先前的工作在序列生成任务中。它在ASR中实现了53%的关系改进率,并在插槽填充任务中实现了27%的F1分数。此外,提示和适配调整在低资源enario中竞争FT方法。此外,我们还证明了Wav2Seq模型上的提示和适配调整在跨语言ASR中的传送性。当有限的可学习参数参与时,提示和适配调整一致地超越了传统FT。特别是在低资源enario中,提示一直超越了适配调整。
DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
for: This paper focuses on improving the automatic selection of exemplars for in-context learning in natural language processing, specifically using Large Language Models (LLMs).
methods: The proposed method, called Dual Queries and Low-rank approximation Re-ranking (DQ-LoRe), utilizes two stages of querying: first, LLM-generated knowledge is obtained through Dual Queries, and then, the retriever is queried to obtain final exemplars that align with the input question’s knowledge. Additionally, LoRe employs dimensionality reduction techniques to refine exemplar selection.
results: The proposed DQ-LoRe method significantly outperforms prior state-of-the-art methods in selecting exemplars for GPT-4, with a performance increase from 92.5% to 94.2%. The method also consistently outperforms retrieval-based approaches in terms of both performance and adaptability, especially in scenarios with distribution shifts.Abstract
Recent advances in natural language processing, primarily propelled by Large Language Models (LLMs), have showcased their remarkable capabilities grounded in in-context learning. A promising avenue for guiding LLMs in intricate reasoning tasks involves the utilization of intermediate reasoning steps within the Chain-of-Thought (CoT) paradigm. Nevertheless, the central challenge lies in the effective selection of exemplars for facilitating in-context learning. In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking (DQ-LoRe) to automatically select exemplars for in-context learning. Dual Queries first query LLM to obtain LLM-generated knowledge such as CoT, then query the retriever to obtain the final exemplars via both question and the knowledge. Moreover, for the second query, LoRe employs dimensionality reduction techniques to refine exemplar selection, ensuring close alignment with the input question's knowledge. Through extensive experiments, we demonstrate that DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%. Our comprehensive analysis further reveals that DQ-LoRe consistently outperforms retrieval-based approaches in terms of both performance and adaptability, especially in scenarios characterized by distribution shifts. DQ-LoRe pushes the boundaries of in-context learning and opens up new avenues for addressing complex reasoning challenges. We will release the code soon.
摘要
近期自然语言处理的进展主要由大型语言模型(LLM)推动,展现了其基于上下文学习的强大能力。在复杂推理任务中引导 LLM 的一个有前景的方向,是在思维链(Chain-of-Thought, CoT)范式中使用中间推理步骤。然而,核心挑战在于如何有效选择用于上下文学习的示例(exemplar)。在本研究中,我们提出了一种利用双重查询与低秩近似重排序(DQ-LoRe)自动选择上下文学习示例的框架。双重查询首先询问 LLM 以获得 LLM 生成的知识(如 CoT),然后通过问题和该知识共同查询检索器以获得最终示例。此外,在第二次查询中,LoRe 使用降维技术来精炼示例选择,确保所选示例与输入问题的知识紧密对齐。通过大量实验,我们证明 DQ-LoRe 在为 GPT-4 自动选择示例方面显著优于先前的最先进方法,将性能从 92.5% 提升到 94.2%。我们的全面分析还表明,DQ-LoRe 在性能和适应性两方面持续优于基于检索的方法,尤其是在存在分布偏移的场景中。DQ-LoRe 拓展了上下文学习的边界,为解决复杂推理挑战开辟了新的方向。我们即将发布代码。
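The LoRe re-ranking stage described above can be illustrated with a generic pipeline: embed the query (question plus LLM-generated knowledge) and the candidate exemplars, reduce the embeddings to a low-dimensional space, and re-rank by cosine similarity. The embedding function, PCA dimensionality, and top-k below are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.decomposition import PCA

def rerank_exemplars(query_emb, candidate_embs, top_k=8, n_components=32):
    """Project query and candidate embeddings with PCA, then return indices of
    the top_k candidates ranked by cosine similarity in the reduced space."""
    n_components = min(n_components, candidate_embs.shape[0], candidate_embs.shape[1])
    reduced = PCA(n_components=n_components).fit_transform(
        np.vstack([query_emb[None, :], candidate_embs]))
    q, cands = reduced[0], reduced[1:]
    q = q / (np.linalg.norm(q) + 1e-8)
    cands = cands / (np.linalg.norm(cands, axis=1, keepdims=True) + 1e-8)
    scores = cands @ q
    return np.argsort(-scores)[:top_k]

# Hypothetical usage with any sentence-embedding model:
# order = rerank_exemplars(embed(question + generated_cot), embed_all(exemplar_pool))
```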
JsonTuning: Towards Generalizable, Robust, and Controllable Instruction Tuning
paper_authors: Chang Gao, Wenxuan Zhang, Guizhen Chen, Wai Lam
for: This paper aims to improve the performance of large language models (LLMs) in various tasks by providing explicit task instructions through a novel structure-to-structure approach called JsonTuning.
methods: The JsonTuning approach leverages the versatility and structured nature of JSON to represent tasks, enhancing generalization, improving robustness, and increasing controllability over the output.
results: The experimental results show that JsonTuning outperforms TextTuning in various applications, demonstrating improved performance, adaptability, robustness, and controllability.Abstract
Instruction tuning has emerged as a crucial process for harnessing the capabilities of large language models (LLMs) by providing explicit task instructions, leading to improved performance in various tasks. However, prevalent text-to-text instruction tuning (TextTuning) methods suffer from limitations in generalization, robustness, and controllability due to the ambiguity and lack of explicit structure in tasks. In this paper, we propose JsonTuning, a novel structure-to-structure approach for instruction tuning. By leveraging the versatility and structured nature of JSON to represent tasks, JsonTuning enhances generalization by helping the model understand essential task elements and their relations, improves robustness by minimizing ambiguity, and increases controllability by providing explicit control over the output. We conduct a comprehensive comparative study with diverse language models and evaluation benchmarks. Experimental results show that JsonTuning outperforms TextTuning in various applications, showcasing improved performance, adaptability, robustness, and controllability. By overcoming the limitations of TextTuning, JsonTuning demonstrates significant potential for more effective and reliable LLMs capable of handling diverse scenarios.
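Since the abstract does not spell out the exact schema, the following is a hypothetical illustration of what a JSON-structured (structure-to-structure) task instance in the spirit of JsonTuning might look like; the field names and output schema are our own invention, not the paper's format.

```python
import json

# Hypothetical JsonTuning-style training instance: the task definition, input
# fields, and the expected structured output are all expressed as JSON.
instance = {
    "task": "named entity recognition",
    "instruction": "Extract all person and location entities from the text.",
    "input": {"text": "Ada Lovelace was born in London."},
    "output_schema": {"entities": [{"span": "string", "type": "PER | LOC"}]},
    "output": {"entities": [{"span": "Ada Lovelace", "type": "PER"},
                            {"span": "London", "type": "LOC"}]},
}
print(json.dumps(instance, indent=2))
```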
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
for: This paper is written to provide a comprehensive survey of recent research progress in the field of GPT-3 family large language models (GLLMs), including their performances in various downstream tasks, domains, and languages.
methods: The paper uses a brief overview of transformers, transfer learning, self-supervised learning, pretrained language models, and large language models as foundation concepts, and discusses the data labelling and data augmentation abilities, robustness, effectiveness, and future research directions of GLLMs.
results: The paper presents a comprehensive overview of the recent research progress in GLLMs, including their performances in various downstream tasks, domains, and languages, and provides insightful future research directions for the field.Abstract
Large language models (LLMs) are a special class of pretrained language models obtained by scaling model size, pretraining corpus and computation. LLMs, because of their large size and pretraining on large volumes of text data, exhibit special abilities which allow them to achieve remarkable performances without any task-specific training in many of the natural language processing tasks. The era of LLMs started with OpenAI GPT-3 model, and the popularity of LLMs is increasing exponentially after the introduction of models like ChatGPT and GPT4. We refer to GPT-3 and its successor OpenAI models, including ChatGPT and GPT4, as GPT-3 family large language models (GLLMs). With the ever-rising popularity of GLLMs, especially in the research community, there is a strong need for a comprehensive survey which summarizes the recent research progress in multiple dimensions and can guide the research community with insightful future research directions. We start the survey paper with foundation concepts like transformers, transfer learning, self-supervised learning, pretrained language models and large language models. We then present a brief overview of GLLMs and discuss the performances of GLLMs in various downstream tasks, specific domains and multiple languages. We also discuss the data labelling and data augmentation abilities of GLLMs, the robustness of GLLMs, the effectiveness of GLLMs as evaluators, and finally, conclude with multiple insightful future research directions. To summarize, this comprehensive survey paper will serve as a good resource for both academic and industry people to stay updated with the latest research related to GPT-3 family large language models.
摘要
大型语言模型(LLM)是一类特殊的预训练语言模型,通过扩大模型规模、预训练语料和计算量获得。由于规模庞大并在海量文本数据上预训练,LLM 展现出特殊的能力,使其无需任务特定训练即可在许多自然语言处理任务中取得出色表现。LLM 时代始于 OpenAI 的 GPT-3 模型,而在 ChatGPT 和 GPT-4 等模型推出后,LLM 的热度呈指数级增长。我们将 GPT-3 及其后继的 OpenAI 模型(包括 ChatGPT 和 GPT-4)统称为 GPT-3 家族大型语言模型(GLLM)。随着 GLLM(尤其在研究社区中)的日益流行,迫切需要一篇全面的综述,从多个维度总结近期研究进展,并为研究社区指出有价值的未来研究方向。本文从变换器(Transformer)、迁移学习、自监督学习、预训练语言模型和大型语言模型等基础概念入手,随后简要概述 GLLM,并讨论 GLLM 在各类下游任务、特定领域和多种语言中的表现。我们还讨论了 GLLM 的数据标注与数据增强能力、鲁棒性、作为评估器的有效性,最后给出多个有价值的未来研究方向。总之,这篇综述将成为学术界和工业界了解 GPT-3 家族大型语言模型最新研究的良好资源。
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models
paper_authors: Aleksandr Meister, Matvei Novikov, Nikolay Karpov, Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg
for: 本文旨在评估端到端自动语音识别(ASR)模型的标点和大小写预测能力。
methods: 本文使用LibriSpeech-PC数据集,并提出了一种新的评估指标——标点错误率(Punctuation Error Rate, PER)来评估标点预测的准确性。
results: 本文提供了一些初步的基准模型,并通过对LibriSpeech-PC数据集进行测试,证明了该评估指标的有用性。Abstract
Traditional automatic speech recognition (ASR) models output lower-cased words without punctuation marks, which reduces readability and necessitates a subsequent text processing model to convert ASR transcripts into a proper format. Simultaneously, the development of end-to-end ASR models capable of predicting punctuation and capitalization presents several challenges, primarily due to limited data availability and shortcomings in the existing evaluation methods, such as inadequate assessment of punctuation prediction. In this paper, we introduce a LibriSpeech-PC benchmark designed to assess the punctuation and capitalization prediction capabilities of end-to-end ASR models. The benchmark includes a LibriSpeech-PC dataset with restored punctuation and capitalization, a novel evaluation metric called Punctuation Error Rate (PER) that focuses on punctuation marks, and initial baseline models. All code, data, and models are publicly available.
摘要
传统的自动语音识别(ASR)模型输出的是不含标点且全部小写的词语,这会降低可读性,并需要后续的文本处理模型将 ASR 转写结果转换成正确的格式。同时,开发能够预测标点和大小写的端到端 ASR 模型面临若干挑战,主要在于数据的有限性以及现有评估方法的缺陷(如对标点预测的评估不够充分)。本文介绍了 LibriSpeech-PC 基准,用于评估端到端 ASR 模型的标点和大小写预测能力。该基准包括恢复了标点和大小写的 LibriSpeech-PC 数据集、一种专注于标点符号的新评估指标——标点错误率(Punctuation Error Rate, PER),以及若干初步的基线模型。所有代码、数据和模型都公开可用。
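The Punctuation Error Rate mentioned above can be read as a WER-style edit-distance rate computed over punctuation marks only. The sketch below follows that reading; the exact PER definition, the punctuation inventory, and how capitalization is scored are specified by the benchmark and may differ from this simplification.

```python
def edit_distance(ref, hyp):
    """Standard Levenshtein distance between two token sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(hyp) + 1)] for i in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    return d[len(ref)][len(hyp)]

def punctuation_error_rate(reference, hypothesis, marks=".,?!"):
    """Edit-distance rate restricted to punctuation tokens (a WER-style PER sketch)."""
    ref_p = [c for c in reference if c in marks]
    hyp_p = [c for c in hypothesis if c in marks]
    return edit_distance(ref_p, hyp_p) / max(len(ref_p), 1)

print(punctuation_error_rate("Hello, world. How are you?", "Hello world. How are you."))
```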
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation
paper_authors: Aman Khullar, Daniel Nkemelu, Cuong V. Nguyen, Michael L. Best
for: 提高在有限数据上的 hate speech 检测性能
methods: 使用数据生成技术在目标语言中合成新的仇恨言论示例,既保留原始示例中的仇恨情感,又迁移仇恨目标
results: 使用合成数据训练的仇恨言论分类模型,在印地语和越南语等有限数据场景下表现与仅用目标领域可用样本训练的模型相当甚至更好,可以帮助在有限数据场景下从零开始构建仇恨言论检测模型。Abstract
A growing body of work has focused on text classification methods for detecting the increasing amount of hate speech posted online. This progress has been limited to only a select number of highly-resourced languages causing detection systems to either under-perform or not exist in limited data contexts. This is majorly caused by a lack of training data which is expensive to collect and curate in these settings. In this work, we propose a data augmentation approach that addresses the problem of lack of data for online hate speech detection in limited data contexts using synthetic data generation techniques. Given a handful of hate speech examples in a high-resource language such as English, we present three methods to synthesize new examples of hate speech data in a target language that retains the hate sentiment in the original examples but transfers the hate targets. We apply our approach to generate training data for hate speech classification tasks in Hindi and Vietnamese. Our findings show that a model trained on synthetic data performs comparably to, and in some cases outperforms, a model trained only on the samples available in the target domain. This method can be adopted to bootstrap hate speech detection models from scratch in limited data contexts. As the growth of social media within these contexts continues to outstrip response efforts, this work furthers our capacities for detection, understanding, and response to hate speech.
摘要
“越来越多的研究聚焦于文本分类方法,以检测网络上日益增多的仇恨言论。然而,这些进展仅限于少数资源丰富的语言,使得检测系统在数据有限的情境下要么表现不佳,要么根本不存在。这主要是因为在这些情境中收集和整理训练数据的成本高昂。在这项工作中,我们提出了一种数据增强方法,利用合成数据生成技术来解决有限数据情境下网络仇恨言论检测数据不足的问题。给定少量高资源语言(如英语)中的仇恨言论示例,我们提出了三种方法,在目标语言中合成新的仇恨言论数据,既保留原始示例中的仇恨情感,又将仇恨目标迁移到目标语言。我们将该方法应用于为印地语和越南语的仇恨言论分类任务生成训练数据。我们的结果显示,使用合成数据训练的模型与仅用目标领域可用样本训练的模型表现相当,在某些情况下甚至更好。这种方法可用于在有限数据情境下从零开始构建仇恨言论检测模型。随着这些情境中社交媒体的增长持续超过应对努力,这项工作进一步增强了我们检测、理解和回应仇恨言论的能力。”
Out-of-Distribution Detection by Leveraging Between-Layer Transformation Smoothness
methods: BLOOD 方法基于网络中间层之间的变换平滑性,利用分布内(ID)数据的层间表示变换比分布外(OOD)数据更平滑这一性质,该性质也在 Transformer 网络中得到了实验验证。
results: 在文本分类任务中,BLOOD 与资源需求相当的方法相比表现更好。分析还表明,学习较简单的任务时,OOD 数据的变换保持其原有的锐度;而任务越复杂,锐度越高。Abstract
Effective OOD detection is crucial for reliable machine learning models, yet most current methods are limited in practical use due to requirements like access to training data or intervention in training. We present a novel method for detecting OOD data in deep neural networks based on transformation smoothness between intermediate layers of a network (BLOOD), which is applicable to pre-trained models without access to training data. BLOOD utilizes the tendency of between-layer representation transformations of in-distribution (ID) data to be smoother than the corresponding transformations of OOD data, a property that we also demonstrate empirically for Transformer networks. We evaluate BLOOD on several text classification tasks with Transformer networks and demonstrate that it outperforms methods with comparable resource requirements. Our analysis also suggests that when learning simpler tasks, OOD data transformations maintain their original sharpness, whereas sharpness increases with more complex tasks.
摘要
有效的OOD检测对可靠的机器学习模型至关重要,但现有的大多数方法由于需要访问训练数据或干预训练过程等要求,在实际使用中受限。我们提出了一种基于网络中间层之间变换平滑性(BLOOD)的深度神经网络OOD数据检测方法,可应用于无法访问训练数据的预训练模型。BLOOD 利用了分布内(ID)数据的层间表示变换比 OOD 数据更平滑这一趋势,我们也在 Transformer 网络上对这一性质进行了实验验证。我们在多个文本分类任务上用 Transformer 网络评估 BLOOD,并证明它优于资源需求相当的其他方法。我们的分析还显示,学习较简单的任务时,OOD 数据的变换保持其原有的锐度;而学习较复杂的任务时,锐度则会增加。
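As a rough illustration of the between-layer smoothness idea, the snippet below extracts every hidden state of a pre-trained Transformer and measures how much the [CLS] representation changes from layer to layer. This is only a crude proxy for the quantity BLOOD actually estimates (the smoothness of the layer-to-layer transformations themselves), and the model checkpoint is an assumption.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True).eval()

def between_layer_change(text):
    """Mean squared change of the [CLS] vector across consecutive layers
    (a crude stand-in for between-layer transformation smoothness)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).hidden_states          # tuple: embeddings + each layer
    cls = torch.stack([h[0, 0] for h in hidden])        # (n_layers + 1, hidden_dim)
    diffs = cls[1:] - cls[:-1]
    return diffs.pow(2).mean(dim=1)                     # one value per layer transition

# Larger, more erratic per-layer changes would be the OOD signal in this reading.
print(between_layer_change("The concert was absolutely wonderful."))
```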
Multimodal Prompt Transformer with Hybrid Contrastive Learning for Emotion Recognition in Conversation
results: 实验结果显示,本研究提出的模型在ERC任务上优于先前方法,在两个基准数据集上均达到了最先进水平。Abstract
Emotion Recognition in Conversation (ERC) plays an important role in driving the development of human-machine interaction. Emotions can exist in multiple modalities, and multimodal ERC mainly faces two problems: (1) the noise problem in the cross-modal information fusion process, and (2) the prediction problem of less sample emotion labels that are semantically similar but different categories. To address these issues and fully utilize the features of each modality, we adopted the following strategies: first, deep emotion cues extraction was performed on modalities with strong representation ability, and feature filters were designed as multimodal prompt information for modalities with weak representation ability. Then, we designed a Multimodal Prompt Transformer (MPT) to perform cross-modal information fusion. MPT embeds multimodal fusion information into each attention layer of the Transformer, allowing prompt information to participate in encoding textual features and being fused with multi-level textual information to obtain better multimodal fusion features. Finally, we used the Hybrid Contrastive Learning (HCL) strategy to optimize the model's ability to handle labels with few samples. This strategy uses unsupervised contrastive learning to improve the representation ability of multimodal fusion and supervised contrastive learning to mine the information of labels with few samples. Experimental results show that our proposed model outperforms state-of-the-art models in ERC on two benchmark datasets.
摘要
对话情感识别(ERC)在推动人机交互发展中扮演着重要角色。情感可以存在于多种模态中,而多模态ERC主要面临两个问题:(1)跨模态信息融合过程中的噪声问题;(2)语义相近但类别不同、样本较少的情感标签的预测问题。为了解决这些问题并充分利用每个模态的特征,我们采用了以下策略:首先,对表示能力强的模态进行深度情感线索提取,并为表示能力弱的模态设计特征过滤器,作为多模态提示信息。然后,我们设计了多模态提示变换器(MPT)来进行跨模态信息融合。MPT将多模态融合信息嵌入到Transformer的每个注意力层中,使提示信息参与文本特征的编码,并与多层级文本信息融合,从而获得更好的多模态融合特征。最后,我们使用混合对比学习(HCL)策略来优化模型处理少样本标签的能力:利用无监督对比学习提升多模态融合的表示能力,利用有监督对比学习挖掘少样本标签的信息。实验结果显示,我们提出的模型在两个基准数据集上的ERC表现优于最先进的模型。
DOMINO: A Dual-System for Multi-step Visual Language Reasoning
paper_authors: Peifang Wang, Olga Golovneva, Armen Aghajanyan, Xiang Ren, Muhao Chen, Asli Celikyilmaz, Maryam Fazel-Zarandi
for: 这篇论文旨在提出一种多步多模态推理方法,用于从图表等图像中提取信息并进行逻辑或算术推理。
methods: 论文使用了一种双系统方法,包括用于视觉信息提取的“System-1”步骤和用于慎重推理的“System-2”步骤。给定输入后,System-2 将问题分解为多个原子子步骤,每个子步骤引导 System-1 从图像中提取推理所需的信息。
results: 实验表明,该方法在图表类数据集上的表现与先前的端到端模型和管道方法相比具有竞争力。仅用少量多步推理数据微调 System-2 模块(LLaMA-2 70B)后,该方法的准确率进一步提升,在一个包含人工编写问题的高难度数据集上分别超过最佳全监督端到端方法 5.7% 和管道方法 7.5%。Abstract
Visual language reasoning requires a system to extract text or numbers from information-dense images like charts or plots and perform logical or arithmetic reasoning to arrive at an answer. To tackle this task, existing work relies on either (1) an end-to-end vision-language model trained on a large amount of data, or (2) a two-stage pipeline where a captioning model converts the image into text that is further read by another large language model to deduce the answer. However, the former approach forces the model to answer a complex question with one single step, and the latter approach is prone to inaccurate or distracting information in the converted text that can confuse the language model. In this work, we propose a dual-system for multi-step multimodal reasoning, which consists of a "System-1" step for visual information extraction and a "System-2" step for deliberate reasoning. Given an input, System-2 breaks down the question into atomic sub-steps, each guiding System-1 to extract the information required for reasoning from the image. Experiments on chart and plot datasets show that our method with a pre-trained System-2 module performs competitively compared to prior work on in- and out-of-distribution data. By fine-tuning the System-2 module (LLaMA-2 70B) on only a small amount of data on multi-step reasoning, the accuracy of our method is further improved and surpasses the best fully-supervised end-to-end approach by 5.7% and a pipeline approach with FlanPaLM (540B) by 7.5% on a challenging dataset with human-authored questions.
摘要
视觉语言推理要求系统能够从图表、曲线图等信息密集的图像中提取文本或数字,并通过逻辑或算术推理得出答案。为完成这类任务,现有工作要么依赖(1)在大量数据上训练的端到端视觉语言模型,要么依赖(2)两阶段管道:先由图像描述模型将图像转换为文本,再由另一个大型语言模型阅读该文本并推理出答案。然而,前者迫使模型在单一步骤内回答复杂问题,后者则容易因转换文本中的不准确或干扰信息而使语言模型产生混淆。在这项工作中,我们提出了一种用于多步多模态推理的双系统,由负责视觉信息提取的“System-1”步骤和负责慎重推理的“System-2”步骤组成。给定输入后,System-2 将问题分解为多个原子子步骤,每个子步骤引导 System-1 从图像中提取推理所需的信息。我们在图表和曲线图数据集上的实验表明,使用预训练 System-2 模块的方法在分布内和分布外数据上都能与先前工作相竞争。仅在少量多步推理数据上微调 System-2 模块(LLaMA-2 70B)后,方法的准确率进一步提升,在一个包含人工编写问题的高难度数据集上超过最佳全监督端到端方法 5.7%,并超过基于 FlanPaLM(540B)的管道方法 7.5%。
Low Resource Summarization using Pre-trained Language Models
results: 提出了一种在资源受限情况下针对低资源语言进行自动摘要的基线方法,其评价成绩(urT5 在 ROUGE-1 上最高达 46.35)与高资源语言英语的最先进模型(PEGASUS 在 XSUM 数据集上为 47.21,BART 为 45.14)相当,同时提供了一种可复现、可应用于其他低资源语言的方法。Abstract
With the advent of Deep Learning based Artificial Neural Networks models, Natural Language Processing (NLP) has witnessed significant improvements in textual data processing in terms of its efficiency and accuracy. However, the research is mostly restricted to high-resource languages such as English and low-resource languages still suffer from a lack of available resources in terms of training datasets as well as models with even baseline evaluation results. Considering the limited availability of resources for low-resource languages, we propose a methodology for adapting self-attentive transformer-based architecture models (mBERT, mT5) for low-resource summarization, supplemented by the construction of a new baseline dataset (76.5k article, summary pairs) in a low-resource language Urdu. Choosing news (a publicly available source) as the application domain has the potential to make the proposed methodology useful for reproducing in other languages with limited resources. Our adapted summarization model \textit{urT5} with up to 44.78\% reduction in size as compared to \textit{mT5} can capture contextual information of low resource language effectively with evaluation score (up to 46.35 ROUGE-1, 77 BERTScore) at par with state-of-the-art models in high resource language English \textit{(PEGASUS: 47.21, BART: 45.14 on XSUM Dataset)}. The proposed method provided a baseline approach towards extractive as well as abstractive summarization with competitive evaluation results in a limited resource setup.
The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models
results: 作者提出了一种不依赖语言先验的组合性(compositionality)度量。Abstract
Compositionality is a common property in many modalities including natural languages and images, but the compositional generalization of multi-modal models is not well-understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic priors and the interplay between images and texts. We show that current attempts to improve compositional generalization rely on linguistic priors rather than on information in the image. We also propose a new metric for compositionality without such linguistic priors.
摘要
组合性(compositionality)是自然语言和图像等许多模态共有的特性,但多模态模型的组合泛化能力尚未被充分理解。本文指出视觉-语言组合性有两个来源:语言先验,以及图像与文本之间的相互作用。我们发现,现有改进组合泛化的尝试依赖的是语言先验,而非图像中的信息。我们还提出了一种不依赖此类语言先验的新的组合性度量。
Comparative Study and Framework for Automated Summariser Evaluation: LangChain and Hybrid Algorithms
methods: 本研究使用大型语言模型进行分析,利用 LangChain 工具对 PDF 文档进行摘要并提取主要信息,以衡量用户对摘要内容的理解程度。
results: 本研究可以帮助学习者了解他们对某个主题的理解程度,并且可以帮助教育专业人员进一步改善学习能力。Abstract
Automated Essay Score (AES) is proven to be one of the cutting-edge technologies. Scoring techniques are used for various purposes. Reliable scores are calculated based on influential variables. Such variables can be computed by different methods based on the domain. The research is concentrated on the user's understanding of a given topic. The analysis is based on a scoring index by using Large Language Models. The user can then compare and contrast the understanding of a topic that they recently learned. The results are then contributed towards learning analytics and progression is made for enhancing the learning ability. In this research, the focus is on summarizing a PDF document and gauging a user's understanding of its content. The process involves utilizing a Langchain tool to summarize the PDF and extract the essential information. By employing this technique, the research aims to determine how well the user comprehends the summarized content.
摘要
自动作文评分(AES)被证明是一项前沿技术,其评分方法可服务于多种用途。可靠的分数基于有影响力的变量计算得出,而这些变量可以根据不同领域采用不同的方法计算。本研究聚焦于用户对给定主题的理解程度,利用大型语言模型基于评分指数进行分析,使用户能够比较和对照自己对最近所学主题的理解情况。所得结果可用于学习分析,并推动学习能力的提升。在这项研究中,我们关注对 PDF 文档进行摘要,并衡量用户对其内容的理解程度;过程中使用 LangChain 工具对 PDF 进行摘要并提取关键信息。通过这种方式,本研究旨在确定用户对摘要内容的理解程度。
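A minimal sketch of the summarisation step described above, using a map-reduce summarize chain over a loaded PDF. The import paths and class names follow one common LangChain layout and may need adjusting across library versions; the model choice, chunk sizes, and file name are assumptions.

```python
from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI

def summarize_pdf(path):
    """Load a PDF, split it into chunks, and run a map-reduce summarization chain."""
    docs = PyPDFLoader(path).load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200).split_documents(docs)
    llm = ChatOpenAI(temperature=0)             # requires OPENAI_API_KEY in the environment
    chain = load_summarize_chain(llm, chain_type="map_reduce")
    return chain.run(chunks)

# summary = summarize_pdf("lecture_notes.pdf")
```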
LC-Score: Reference-less estimation of Text Comprehension Difficulty
paper_authors: Paul Tardy, Charlotte Roze, Paul Poupet
for: This paper aims to improve text comprehension for readers with comprehension issues, particularly in the French language.
methods: The paper proposes a simple approach called \textsc{LC-Score} to train text comprehension metrics for any French text without reference. The approach uses linguistically motivated indicators to train statistical models, as well as neural learning directly from text leveraging pre-trained language models.
results: The paper finds that both approaches (indicator-based and neural) outperform commonly used readability and comprehension metrics such as FKGL, based on two human annotation experiments.Abstract
Being able to read and understand written text is critical in a digital era. However, studies shows that a large fraction of the population experiences comprehension issues. In this context, further initiatives in accessibility are required to improve the audience text comprehension. However, writers are hardly assisted nor encouraged to produce easy-to-understand content. Moreover, Automatic Text Simplification (ATS) model development suffers from the lack of metric to accurately estimate comprehension difficulty We present \textsc{LC-Score}, a simple approach for training text comprehension metric for any French text without reference \ie predicting how easy to understand a given text is on a $[0, 100]$ scale. Our objective with this scale is to quantitatively capture the extend to which a text suits to the \textit{Langage Clair} (LC, \textit{Clear Language}) guidelines, a French initiative closely related to English Plain Language. We explore two approaches: (i) using linguistically motivated indicators used to train statistical models, and (ii) neural learning directly from text leveraging pre-trained language models. We introduce a simple proxy task for comprehension difficulty training as a classification task. To evaluate our models, we run two distinct human annotation experiments, and find that both approaches (indicator based and neural) outperforms commonly used readability and comprehension metrics such as FKGL.
摘要
在数字时代,能够阅读并理解书面文本至关重要。然而,研究表明,相当大比例的人群存在理解困难。在这种背景下,需要进一步的无障碍(accessibility)举措来提升读者的文本理解。然而,写作者几乎得不到帮助,也缺乏鼓励去创作易于理解的内容。此外,自动文本简化(ATS)模型的开发也因缺乏能够准确估计理解难度的指标而受限。我们提出 \textsc{LC-Score},一种无需参考文本即可为任意法语文本训练文本理解度量的简单方法,即在 $[0, 100]$ 的范围内预测给定文本的易理解程度。该量表旨在定量刻画文本符合《Langage Clair》(LC,“清晰语言”)指南的程度,这是一项与英语 Plain Language 密切相关的法国倡议。我们探索了两种方法:(i)使用语言学驱动的指标训练统计模型;(ii)利用预训练语言模型直接从文本进行神经学习。我们引入了一个简单的代理任务,将理解难度训练作为分类任务。为了评估模型,我们进行了两项独立的人工标注实验,发现这两种方法(基于指标的方法和神经方法)均优于 FKGL 等常用的可读性和理解度量。
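The indicator-based variant described above can be illustrated with a generic recipe: compute a few linguistically motivated features per text and fit a regressor to human difficulty ratings on the 0-100 scale. The features and model below are stand-ins; the paper's actual indicator set for French is not detailed in the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def simple_indicators(text):
    """A few crude readability indicators (illustrative only)."""
    words = text.split()
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    avg_sentence_len = len(words) / max(len(sentences), 1)
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    long_word_ratio = sum(len(w) > 8 for w in words) / max(len(words), 1)
    return [avg_sentence_len, avg_word_len, long_word_ratio]

def train_lc_style_scorer(texts, scores):
    """Fit a regressor mapping indicator vectors to 0-100 comprehension scores."""
    X = np.array([simple_indicators(t) for t in texts])
    model = GradientBoostingRegressor().fit(X, np.array(scores))
    return lambda text: float(np.clip(model.predict([simple_indicators(text)])[0], 0, 100))
```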
COVID-19 South African Vaccine Hesitancy Models Show Boost in Performance Upon Fine-Tuning on M-pox Tweets
results: 经过调整后,F1-scores提高了超过8%,达到了69.6%的最高值,超过了现有的模型和知名的分类算法。Abstract
Very large numbers of M-pox cases have, since the start of May 2022, been reported in non-endemic countries leading many to fear that the M-pox Outbreak would rapidly transition into another pandemic, while the COVID-19 pandemic ravages on. Given the similarities of M-pox with COVID-19, we chose to test the performance of COVID-19 models trained on South African twitter data on a hand-labelled M-pox dataset before and after fine-tuning. More than 20k M-pox-related tweets from South Africa were hand-labelled as being either positive, negative or neutral. After fine-tuning these COVID-19 models on the M-pox dataset, the F1-scores increased by more than 8% falling just short of 70%, but still outperforming state-of-the-art models and well-known classification algorithms. An LDA-based topic modelling procedure was used to compare the miss-classified M-pox tweets of the original COVID-19 RoBERTa model with its fine-tuned version, and from this analysis, we were able to draw conclusions on how to build more sophisticated models.
摘要
自 2022 年 5 月初以来,大量 M-pox 病例在非流行国家被报告,许多人担心在 COVID-19 疫情仍在肆虐之际,M-pox 疫情会迅速演变为又一场大流行。鉴于 M-pox 与 COVID-19 的相似性,我们选择在一个人工标注的 M-pox 数据集上,测试基于南非 Twitter 数据训练的 COVID-19 模型在微调前后的表现。来自南非的 2 万多条 M-pox 相关推文被人工标注为积极、消极或中性。在 M-pox 数据集上微调这些 COVID-19 模型后,F1 分数提高了 8% 以上,接近 70%,仍优于最先进的模型和常见的分类算法。我们使用基于 LDA 的主题建模方法,比较了原始 COVID-19 RoBERTa 模型与其微调版本错误分类的 M-pox 推文,并据此得出了如何构建更精细模型的结论。
AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation
results: AGIR可以准确地传达正式语言表达的信息,提高了报告的流畅性和实用性,并且可以大幅减少CTI报告的写作时间,提高了CTI生成的效率。Abstract
Cyber Threat Intelligence (CTI) reporting is pivotal in contemporary risk management strategies. As the volume of CTI reports continues to surge, the demand for automated tools to streamline report generation becomes increasingly apparent. While Natural Language Processing techniques have shown potential in handling text data, they often struggle to address the complexity of diverse data sources and their intricate interrelationships. Moreover, established paradigms like STIX have emerged as de facto standards within the CTI community, emphasizing the formal categorization of entities and relations to facilitate consistent data sharing. In this paper, we introduce AGIR (Automatic Generation of Intelligence Reports), a transformative Natural Language Generation tool specifically designed to address the pressing challenges in the realm of CTI reporting. AGIR's primary objective is to empower security analysts by automating the labor-intensive task of generating comprehensive intelligence reports from formal representations of entity graphs. AGIR utilizes a two-stage pipeline by combining the advantages of template-based approaches and the capabilities of Large Language Models such as ChatGPT. We evaluate AGIR's report generation capabilities both quantitatively and qualitatively. The generated reports accurately convey information expressed through formal language, achieving a high recall value (0.99) without introducing hallucination. Furthermore, we compare the fluency and utility of the reports with state-of-the-art approaches, showing how AGIR achieves higher scores in terms of Syntactic Log-Odds Ratio (SLOR) and through questionnaires. By using our tool, we estimate that the report writing time is reduced by more than 40%, therefore streamlining the CTI production of any organization and contributing to the automation of several CTI tasks.
摘要
在现代风险管理策略中,网络威胁情报(CTI)报告具有关键作用。随着CTI报告数量不断激增,对自动化报告生成工具的需求也日益明显。虽然自然语言处理技术在处理文本数据方面展现出潜力,但它们往往难以应对多样数据源及其复杂相互关系。此外,STIX等既有范式已成为CTI社区的事实标准,强调对实体和关系进行形式化分类,以促进一致的数据共享。本文介绍AGIR(Automatic Generation of Intelligence Reports),一种专为应对CTI报告领域紧迫挑战而设计的变革性自然语言生成工具。AGIR的主要目标是通过自动化这一劳动密集的任务——从实体图的形式化表示生成全面的情报报告——来赋能安全分析师。AGIR采用两阶段管道,结合了基于模板的方法与ChatGPT等大语言模型的能力。我们对AGIR的报告生成能力进行了定量和定性评估:生成的报告准确传达了以形式化语言表达的信息,达到了很高的召回率(0.99)且不产生幻觉。此外,我们将报告的流畅性和实用性与最先进的方法进行比较,结果显示AGIR在句法对数几率比(SLOR)和问卷调查两方面均取得更高的分数。使用我们的工具,报告撰写时间估计可减少40%以上,从而加速任何组织的CTI生产,并推动多项CTI任务的自动化。
I$^2$KD-SLU: An Intra-Inter Knowledge Distillation Framework for Zero-Shot Cross-Lingual Spoken Language Understanding
results: 在 MultiATIS++ 数据集上,我们提出的框架相比强大的基线模型显著提高了总体准确率,在跨语言 SLU 中取得了新的最先进水平。Abstract
Spoken language understanding (SLU) typically includes two subtasks: intent detection and slot filling. Currently, it has achieved great success in high-resource languages, but it still remains challenging in low-resource languages due to the scarcity of labeled training data. Hence, there is a growing interest in zero-shot cross-lingual SLU. Despite of the success of existing zero-shot cross-lingual SLU models, most of them neglect to achieve the mutual guidance between intent and slots. To address this issue, we propose an Intra-Inter Knowledge Distillation framework for zero-shot cross-lingual Spoken Language Understanding (I$^2$KD-SLU) to model the mutual guidance. Specifically, we not only apply intra-knowledge distillation between intent predictions or slot predictions of the same utterance in different languages, but also apply inter-knowledge distillation between intent predictions and slot predictions of the same utterance. Our experimental results demonstrate that our proposed framework significantly improves the performance compared with the strong baselines and achieves the new state-of-the-art performance on the MultiATIS++ dataset, obtaining a significant improvement over the previous best model in overall accuracy.
摘要
口语理解(SLU)通常包括两个子任务:意图检测和槽位填充。目前,SLU 在高资源语言中已取得巨大成功,但由于标注训练数据稀缺,在低资源语言中仍然充满挑战。因此,零样本跨语言 SLU 受到越来越多的关注。尽管现有的零样本跨语言 SLU 模型取得了成功,但它们大多忽视了意图与槽位之间的相互引导。为解决这一问题,我们提出了一种用于零样本跨语言口语理解的内部-外部知识蒸馏框架(I$^2$KD-SLU),以建模这种相互引导。具体来说,我们不仅在不同语言的同一话语的意图预测之间或槽位预测之间进行内部知识蒸馏,还在同一话语的意图预测与槽位预测之间进行外部知识蒸馏。实验结果表明,我们提出的框架相比强大的基线显著提升了性能,并在 MultiATIS++ 数据集上取得了新的最先进水平,总体准确率较先前最佳模型有显著提升。
NOLA: Networks as Linear Combination of Low Rank Random Basis
for: 本文旨在降低大型语言模型(LLM)微调所需的参数量和存储开销,使其能够快速适应不同的下游任务和领域并保持性能。
methods: 本文指出 LoRA 方法面临两个主要限制:1)参数削减的下限受制于秩为一的分解;2)削减程度很大程度上受模型架构和所选秩的影响。因此,本文引入 NOLA 方法,将 LoRA 中的低秩矩阵重新参数化为随机生成的基矩阵的线性组合,并且只优化线性混合系数,从而使可训练参数数量与所选秩及网络架构解耦。
results: 在 GPT-2 和 ViT 于自然语言和计算机视觉任务上的适应实验中,NOLA 的表现与具有相同参数数量的模型相当或更好;在更大的模型中,相比秩为一的 LoRA,NOLA 可将参数数量减半而不牺牲性能。Abstract
Large Language Models (LLMs) have recently gained popularity due to their impressive few-shot performance across various downstream tasks. However, fine-tuning all parameters and storing a unique model for each downstream task or domain becomes impractical because of the massive size of checkpoints (e.g., 350GB in GPT-3). Current literature, such as LoRA, showcases the potential of low-rank modifications to the original weights of an LLM, enabling efficient adaptation and storage for task-specific models. These methods can reduce the number of parameters needed to fine-tune an LLM by several orders of magnitude. Yet, these methods face two primary limitations: 1) the parameter reduction is lower-bounded by the rank one decomposition, and 2) the extent of reduction is heavily influenced by both the model architecture and the chosen rank. For instance, in larger models, even a rank one decomposition might exceed the number of parameters truly needed for adaptation. In this paper, we introduce NOLA, which overcomes the rank one lower bound present in LoRA. It achieves this by re-parameterizing the low-rank matrices in LoRA using linear combinations of randomly generated matrices (basis) and optimizing the linear mixture coefficients only. This approach allows us to decouple the number of trainable parameters from both the choice of rank and the network architecture. We present adaptation results using GPT-2 and ViT in natural language and computer vision tasks. NOLA performs as well as, or better than models with equivalent parameter counts. Furthermore, we demonstrate that we can halve the parameters in larger models compared to LoRA with rank one, without sacrificing performance.
摘要
大型语言模型(LLM)最近因其在多个下游任务中令人印象深刻的少样本表现而受到广泛关注。然而,由于模型检查点体量庞大(例如 GPT-3 约 350GB),为每个下游任务或领域微调全部参数并存储一个独立模型变得不切实际。现有文献(如 LoRA)表明,对 LLM 原始权重进行低秩修改具有潜力,可实现面向特定任务模型的高效适应与存储。这类方法可以将微调 LLM 所需的参数数量减少几个数量级。然而,它们面临两个主要限制:1)参数削减的下限受制于秩为一的分解;2)削减程度很大程度上受模型架构和所选秩的影响。例如,在更大的模型中,即便是秩为一的分解,其参数量也可能超过适应真正所需的参数量。在本文中,我们介绍 NOLA,它克服了 LoRA 中秩为一的下界。其做法是将 LoRA 中的低秩矩阵重新参数化为随机生成矩阵(基)的线性组合,并仅优化线性混合系数。这种方法使可训练参数的数量既不依赖于所选的秩,也不依赖于网络架构。我们给出了 GPT-2 和 ViT 在自然语言和计算机视觉任务上的适应结果:NOLA 的表现与具有相同参数数量的模型相当或更好;此外,我们证明在更大的模型中,相比秩为一的 LoRA,可将参数数量减半而不牺牲性能。
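To make the re-parameterization concrete, below is a small PyTorch sketch of a NOLA-style adapter around a frozen linear layer: the low-rank factors are expressed as linear combinations of fixed random basis matrices, and only the mixture coefficients are trained. The number of bases, the rank, the scaling, and the initialization are our choices for illustration, not the paper's settings.

```python
import torch
import torch.nn as nn

class NOLALinear(nn.Module):
    """Frozen linear layer plus a low-rank update whose factors are linear
    combinations of fixed random bases; only the mixture coefficients train."""
    def __init__(self, base: nn.Linear, rank=4, num_bases=64):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)
        out_f, in_f = base.weight.shape
        # Fixed random bases (never trained): one set for each low-rank factor.
        self.register_buffer("A_bases", torch.randn(num_bases, out_f, rank) / out_f ** 0.5)
        self.register_buffer("B_bases", torch.randn(num_bases, rank, in_f) / in_f ** 0.5)
        # Only these mixture coefficients are optimized. beta starts at zero so the
        # initial update is zero (mirroring LoRA's zero-initialized second factor).
        self.alpha = nn.Parameter(torch.randn(num_bases) / num_bases)
        self.beta = nn.Parameter(torch.zeros(num_bases))

    def forward(self, x):
        A = torch.einsum("k,kor->or", self.alpha, self.A_bases)   # (out_f, rank)
        B = torch.einsum("k,kri->ri", self.beta, self.B_bases)    # (rank, in_f)
        return self.base(x) + x @ (A @ B).T

# Usage: wrap an existing layer and train only the mixture coefficients.
layer = NOLALinear(nn.Linear(512, 512))
opt = torch.optim.Adam([p for p in layer.parameters() if p.requires_grad], lr=1e-3)
```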
results: 在CIFAR-10数据集上比较CapsNet和PDR-CapsNet的性能,PDR-CapsNet达到83.55%的准确率,所需参数减少87.26%,MACs和FLOPs分别减少32.27%和47.40%,推理速度提高3倍,能耗降低7.29J。Abstract
Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks. However, they are limited in their ability to handle rotational and viewpoint variations due to information loss in max-pooling layers. Capsule Networks (CapsNets) employ a computationally-expensive iterative process referred to as dynamic routing to address these issues. CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs. To overcome these challenges, we introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet that offers superior performance, less energy consumption, and lower overfitting rates. By leveraging a parallelization strategy, PDR-CapsNet mitigates the computational complexity of CapsNet and increases throughput, efficiently using hardware resources. As a result, we achieve 83.55\% accuracy while requiring 87.26\% fewer parameters, 32.27\% and 47.40\% fewer MACs, and Flops, achieving 3x faster inference and 7.29J less energy consumption on a 2080Ti GPU with 11GB VRAM compared to CapsNet and for the CIFAR-10 dataset.
摘要
卷积神经网络(CNN)在图像分类任务中取得了最先进的结果,但由于最大池化层造成的信息损失,它们处理旋转和视角变化的能力有限。胶囊网络(CapsNet)采用一种被称为动态路由的计算代价高昂的迭代过程来解决这些问题,但它们在复杂数据集上往往表现不佳,且需要比 CNN 更多的计算资源。为了克服这些挑战,我们提出了并行动态路由胶囊网络(PDR-CapsNet),这是一种比 CapsNet 更深、更节能的替代方案,具有更优的性能、更低的能耗和更低的过拟合率。通过利用并行化策略,PDR-CapsNet 降低了 CapsNet 的计算复杂度并提高了吞吐量,高效利用硬件资源。最终,在 CIFAR-10 数据集上,与 CapsNet 相比,我们在一块具有 11GB 显存的 2080Ti GPU 上达到了 83.55% 的准确率,所需参数减少 87.26%,MACs 和 FLOPs 分别减少 32.27% 和 47.40%,推理速度提高 3 倍,能耗降低 7.29J。
Regret Analysis of Distributed Online Control for LTI Systems with Adversarial Disturbances
results: 该论文在动力学已知和未知两种情形下分别得到了 $O(\sqrt{T}\log T)$ 和 $O(T^{2/3} \text{poly}(\log T))$ 的遗憾(regret)上界,表明所提出的分布式控制策略能够与事后最优的集中式控制策略相竞争。Abstract
This paper addresses the distributed online control problem over a network of linear time-invariant (LTI) systems (with possibly unknown dynamics) in the presence of adversarial perturbations. There exists a global network cost that is characterized by a time-varying convex function, which evolves in an adversarial manner and is sequentially and partially observed by local agents. The goal of each agent is to generate a control sequence that can compete with the best centralized control policy in hindsight, which has access to the global cost. This problem is formulated as a regret minimization. For the case of known dynamics, we propose a fully distributed disturbance feedback controller that guarantees a regret bound of $O(\sqrt{T}\log T)$, where $T$ is the time horizon. For the unknown dynamics case, we design a distributed explore-then-commit approach, where in the exploration phase all agents jointly learn the system dynamics, and in the learning phase our proposed control algorithm is applied using each agent system estimate. We establish a regret bound of $O(T^{2/3} \text{poly}(\log T))$ for this setting.
摘要
对于动力学已知的情形,我们提出了一种完全分布式的扰动反馈控制器,可保证 $O(\sqrt{T}\log T)$ 的遗憾上界,其中 $T$ 为时间范围。对于动力学未知的情形,我们设计了一种分布式的先探索后提交(explore-then-commit)方法:在探索阶段,所有智能体共同学习系统动力学;在学习阶段,各智能体使用自身的系统估计应用所提出的控制算法。对于该情形,遗憾上界为 $O(T^{2/3} \text{poly}(\log T))$。
results: 我们的方法在许多流行的非凸测试函数(具有有限个全局最优解)上表现出色。与许多现有的最先进方法相比,我们的方法在遗憾(regret)值和收敛速度上都有明显提升。然而,我们的方法可能不适用于计算代价高昂的函数。Abstract
In the field of global optimization, many existing algorithms face challenges posed by non-convex target functions and high computational complexity or unavailability of gradient information. These limitations, exacerbated by sensitivity to initial conditions, often lead to suboptimal solutions or failed convergence. This is true even for Metaheuristic algorithms designed to amalgamate different optimization techniques to improve their efficiency and robustness. To address these challenges, we develop a sequence of multidimensional integration-based methods that we show to converge to the global optima under some mild regularity conditions. Our probabilistic approach does not require the use of gradients and is underpinned by a mathematically rigorous convergence framework anchored in the nuanced properties of nascent optima distribution. In order to alleviate the problem of multidimensional integration, we develop a latent slice sampler that enjoys a geometric rate of convergence in generating samples from the nascent optima distribution, which is used to approximate the global optima. The proposed Probabilistic Global Optimizer (ProGO) provides a scalable unified framework to approximate the global optima of any continuous function defined on a domain of arbitrary dimension. Empirical illustrations of ProGO across a variety of popular non-convex test functions (having finite global optima) reveal that the proposed algorithm outperforms, by order of magnitude, many existing state-of-the-art methods, including gradient-based, zeroth-order gradient-free, and some Bayesian Optimization methods, in term regret value and speed of convergence. It is, however, to be noted that our approach may not be suitable for functions that are expensive to compute.
摘要
在全局优化领域,许多现有算法面临非凸目标函数、高计算复杂度或梯度信息不可得等挑战。这些限制加上对初始条件的敏感性,常常导致次优解或无法收敛,即便是旨在融合多种优化技术以提高效率和鲁棒性的元启发式算法也不例外。为了解决这些挑战,我们开发了一系列基于多维积分的方法,并证明在一些温和的正则性条件下,它们收敛到全局最优解。我们的概率方法不需要使用梯度,并建立在围绕新生最优分布(nascent optima distribution)性质的严格收敛框架之上。为了缓解多维积分的困难,我们开发了一种潜在切片采样器(latent slice sampler),它在从新生最优分布中生成样本时具有几何收敛速率,这些样本被用于逼近全局最优解。我们提出的概率全局优化器(ProGO)提供了一个可扩展的统一框架,用于逼近定义在任意维度域上的任何连续函数的全局最优解。在多个常用的非凸测试函数(具有有限全局最优解)上的实验表明,所提出的算法在 regret 值和收敛速度方面比许多现有的最先进方法(包括基于梯度的方法、零阶无梯度方法和一些贝叶斯优化方法)高出数个数量级。然而,需要指出的是,我们的方法可能不适用于计算代价高昂的函数。
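As a rough, gradient-free illustration of the core idea behind ProGO -- treating the global minimizer as an expectation under a distribution that concentrates mass on low function values -- the sketch below uses plain uniform sampling with an exponential weighting instead of the paper's latent slice sampler; the function names, the weighting parameter beta, and the test function are illustrative assumptions.

```python
import numpy as np

def progo_like_estimate(f, dim, n_samples=20000, beta=50.0, bounds=(-5.0, 5.0), seed=0):
    """Gradient-free sketch: estimate the global minimizer of f as a weighted
    average of uniform samples, with weights exp(-beta * f(x)) concentrating on
    low function values. This is NOT the paper's latent slice sampler."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_samples, dim))      # candidate points
    fx = np.apply_along_axis(f, 1, x)                   # function values
    w = np.exp(-beta * (fx - fx.min()))                 # subtract min for numerical stability
    w /= w.sum()
    x_hat = (w[:, None] * x).sum(axis=0)                # weighted mean approximates the minimizer
    return x_hat, f(x_hat)

# Example on the non-convex Rastrigin function, whose global minimum is at the origin.
def rastrigin(x):
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

x_star, f_star = progo_like_estimate(rastrigin, dim=2)
print(x_star, f_star)
```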
results: 论文分析了在中央化和分散化FL中Client端的挑战,并探讨了FL在人类中心IoT中的发展前景。Abstract
The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potential in privacy preservation and learning efficiency enhancement. In this paper, we highlight the digital ethics concerns that arise when human-centric devices serve as clients in FL. More specifically, challenges of game dynamics, fairness, incentive, and continuity arise in FL due to differences in perspectives and objectives between clients and the server. We analyze these challenges and their solutions from the perspectives of both the client and the server, and through the viewpoints of centralized and decentralized FL. Finally, we explore the opportunities in FL for human-centric IoT as directions for future development.
摘要
物联网(IoT)不断产生海量数据,引发了人们对数据隐私保护和防止数据滥用的日益关注。联邦学习(FL)通过共享机器学习模型参数而非原始用户数据,实现多方协作,因此在隐私保护和学习效率提升方面受到广泛关注。本文着重讨论当以人为中心的设备作为FL客户端时出现的数字伦理问题。具体而言,由于客户端与服务器在视角和目标上的差异,FL中会产生博弈动态、公平性、激励和连续性等挑战。我们从客户端和服务器双方的角度,并结合中心化与去中心化FL两种视角,分析这些挑战及其解决方案。最后,我们探讨FL在以人为中心的IoT中的发展机遇,作为未来的研究方向。
Test Case Recommendations with Distributed Representation of Code Syntactic Features
results: 在 Methods2Test 数据集上,提出的方法可以自动找到最相似的测试单元,减少开发人员的测试单元生成努力。Abstract
Frequent modifications of unit test cases are inevitable due to software's continuous underlying changes in source code, design, and requirements. Since manually maintaining software test suites is tedious, time-consuming, and costly, automating the process of generation and maintenance of test units will significantly impact the effectiveness and efficiency of software testing processes. To this end, we propose an automated approach which exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to the developers. The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedded vectors) while preserving the importance of the structure in the code. Retrieving the semantic and structural properties of a given method, the approach computes cosine similarity between the method's embedding and the previously-embedded training instances. Further, according to the similarity scores between the embedding vectors, the model identifies the closest methods in the embedding space and the associated unit tests as the most similar recommendations. The results on the Methods2Test dataset showed that, while there is no guarantee that similar methods have similar relevant test cases, the proposed approach extracts the most similar existing test cases for a given method in the dataset, and evaluations show that recommended test cases decrease the developers' effort in generating expected test cases.
摘要
由于软件的源代码、设计和需求不断变化,单元测试用例的频繁修改不可避免。由于手动维护软件测试套件既繁琐、耗时又成本高昂,自动化测试用例的生成与维护将显著提升软件测试过程的有效性和效率。为此,我们提出一种自动化方法,利用源代码方法和测试用例的结构与语义属性,向开发者推荐最相关、最有用的单元测试。该方法首先训练一个神经网络,将方法级源代码和单元测试转换为分布式表示(嵌入向量),同时保留代码结构的重要信息。在获取给定方法的语义和结构属性后,该方法计算其嵌入向量与已嵌入的训练实例之间的余弦相似度,并根据相似度分数找出最相近的方法及其关联的单元测试作为推荐结果。在 Methods2Test 数据集上的实验表明,虽然无法保证相似方法一定拥有相似且相关的测试用例,但该方法能够为给定方法提取数据集中最相似的现有测试用例,且评估显示推荐的测试用例能够减少开发者编写预期测试用例所需的工作量。
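The retrieval step described above -- comparing a new method's embedding against previously embedded training instances by cosine similarity and returning the tests attached to the nearest methods -- can be sketched as follows; the encoder that produces the embeddings is assumed to exist, and all names are hypothetical.

```python
import numpy as np

def recommend_tests(query_embedding, method_embeddings, test_suites, top_k=3):
    """Return the unit tests associated with the top-k methods most similar to
    the query method, measured by cosine similarity between embeddings that are
    assumed to come from a code encoder as described in the abstract."""
    q = query_embedding / np.linalg.norm(query_embedding)
    m = method_embeddings / np.linalg.norm(method_embeddings, axis=1, keepdims=True)
    scores = m @ q                        # cosine similarity with every known method
    top = np.argsort(-scores)[:top_k]     # indices of the most similar methods
    return [(int(i), float(scores[i]), test_suites[i]) for i in top]

# Hypothetical usage with 3 known methods and their associated test names.
emb = np.random.rand(3, 128)
tests = [["testAdd"], ["testParseEmpty", "testParseNull"], ["testSaveFile"]]
print(recommend_tests(np.random.rand(128), emb, tests, top_k=2))
```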
Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors
results: 该研究的实验结果显示,使用该新的攻击方法可以让现有的机器学习式钓鱼网页检测器的性能受到严重的损害,只需要30个查询即可。Abstract
Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness due to their lack of optimizing the usage of the adopted manipulations, and they focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations which allow to modify the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.
摘要
机器学习钓鱼网页检测器(ML-PWD)已被证明容易受到针对输入网页HTML代码的对抗性篡改攻击。然而,近期提出的攻击方法效果有限,因为它们没有优化篡改操作的使用方式,且只关注HTML代码中的特定元素。在本工作中,我们克服了这些限制:首先设计了一组新的细粒度篡改操作,可以在不破坏钓鱼网页恶意功能和视觉外观的前提下修改其HTML代码,即这些操作在设计上是保持功能和渲染效果的;随后,我们通过一种查询高效的黑盒优化算法来选择应施加哪些篡改以绕过目标检测器。实验表明,我们的攻击仅需30次查询就能将当前最先进的ML-PWD的性能彻底击溃,从而超越了以往工作中较弱的攻击,并使ML-PWD的鲁棒性评估更加公平。
Enhancing Accuracy in Deep Learning Using Random Matrix Theory
paper_authors: Leonid Berlyand, Etienne Sandier, Yitzchak Shmalo, Lei Zhang
For: The paper explores the application of random matrix theory (RMT) in the training of deep neural networks (DNNs) to simplify DNN architecture and improve accuracy.* Methods: The paper uses techniques from RMT to determine the number of singular values to be removed from the weight layers of a DNN during training, specifically via singular value decomposition (SVD).* Results: The paper shows that the proposed method can be applied to any fully connected or convolutional layer of a pretrained DNN, reducing the layer’s parameters and simplifying the DNN architecture while preserving or even enhancing the model’s accuracy. Empirical evidence is provided on the MNIST and Fashion MNIST datasets.Here’s the same information in Simplified Chinese text:* 为:本文研究了深度神经网络(DNN)训练中随机矩阵理论(RMT)的应用,以简化DNN结构和提高准确性。* 方法:本文使用RMT技术确定在DNN训练中去除weight层中的小特征值,具体来说是通过特征值分解(SVD)。* 结果:本文证明该方法可以应用于任何已经训练过的层,包括卷积层和全连接层,从而减少层的参数数量,简化DNN结构,同时保持或者提高模型的准确性。实验证明在MNIST和Fashion MNIST datasets上的效果。Abstract
In this study, we explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning to simplify DNN architecture and loss landscape. RMT, recently used to address overfitting in deep learning, enables the examination of DNN's weight layer spectra. We use these techniques to optimally determine the number of singular values to be removed from the weight layers of a DNN during training via singular value decomposition (SVD). This process aids in DNN simplification and accuracy enhancement, as evidenced by training simple DNN models on the MNIST and Fashion MNIST datasets. Our method can be applied to any fully connected or convolutional layer of a pretrained DNN, decreasing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. By discarding small singular values based on RMT criteria, the accuracy of the test set remains consistent, facilitating more efficient DNN training without compromising performance. We provide both theoretical and empirical evidence supporting our claim that the elimination of small singular values based on RMT does not negatively impact the DNN's accuracy. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
摘要
在这项研究中,我们探讨了随机矩阵理论(RMT)在深度神经网络(DNN)训练中的应用,重点是通过层剪枝来简化DNN结构和损失地形。RMT最近被用于解决深度学习中的过拟合问题,它使我们能够分析DNN权重层的谱。我们利用这些技术,通过奇异值分解(SVD)来最优地确定训练过程中应从权重层移除的奇异值数量。这一过程有助于简化DNN并提升准确率,我们在MNIST和Fashion MNIST数据集上训练简单DNN模型验证了这一点。我们的方法可以应用于预训练DNN的任何全连接层或卷积层,在减少层参数、简化DNN结构的同时保持甚至提升模型的准确率。基于RMT准则丢弃小奇异值后,测试集准确率保持稳定,从而在不损失性能的情况下实现更高效的DNN训练。我们提供了理论和实验两方面的证据,支持"基于RMT去除小奇异值不会对DNN准确率产生负面影响"这一论断。我们的结果为利用RMT构建更高效、更准确的深度学习模型提供了有价值的实践启示。
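A minimal sketch of RMT-guided pruning of a single weight matrix is shown below: singular values falling below the Marchenko-Pastur bulk edge expected for a pure-noise matrix of the same shape are discarded before reconstructing the layer. The exact RMT criterion used in the paper may differ; the noise-variance estimate and the fallback rule here are assumptions.

```python
import numpy as np

def rmt_prune_layer(W, keep_energy_fallback=0.95):
    """Sketch: SVD-truncate a weight matrix W (n x m, n >= m), discarding singular
    values below the Marchenko-Pastur bulk edge of a same-shaped noise matrix."""
    n, m = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sigma2 = np.var(W)                            # noise-variance estimate (assumption)
    lam_plus = sigma2 * (1 + np.sqrt(m / n))**2   # MP upper edge for eigenvalues of W^T W / n
    threshold = np.sqrt(n * lam_plus)             # corresponding singular-value scale
    keep = s > threshold
    if not keep.any():                            # fall back to an energy criterion
        keep = np.cumsum(s**2) / np.sum(s**2) <= keep_energy_fallback
    r = int(keep.sum())
    W_pruned = (U[:, :r] * s[:r]) @ Vt[:r]        # low-rank reconstruction
    return W_pruned, r

# Noise plus a weak rank-1 signal: only the signal direction should survive.
W = np.random.randn(512, 256) / np.sqrt(256)
W += 0.1 * np.outer(np.random.randn(512), np.random.randn(256)) / np.sqrt(256)
W_hat, rank = rmt_prune_layer(W)
print(rank, np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```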
FedNAR: Federated Optimization with Normalized Annealing Regularization
results: 对于视觉和语言 datasets 进行了广泛的实验,结果表明,在不同的背景 federated optimization 算法中,权重减少可以加速涨化和提高模型精度。另外,FedNAR 具有自适应性,可以根据初始化参数的不合理性自动调整权重减少,而传统 FL 算法的精度则会明显下降。Abstract
Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution. To address this challenge, we develop {\it Federated optimization with Normalized Annealing Regularization} (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithms. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our codes are released at \href{https://github.com/ljb121002/fednar}{https://github.com/ljb121002/fednar}.
摘要
权重衰减(weight decay)是现代深度神经网络优化中提升泛化性能的标准技术,也被广泛用于联邦学习(FL)中以防止本地客户端过拟合。在本文中,我们首先考察权重衰减的取值选择,发现其取值会显著影响现有FL算法的收敛。虽然防止过拟合至关重要,但权重衰减会引入一个偏离全局目标的优化目标,而在FL中,由于多次本地更新和异构数据分布,这一影响会被进一步放大。为了解决这一挑战,我们提出了{\it Federated optimization with Normalized Annealing Regularization}(FedNAR),这是一种简单而有效、用途广泛的算法插件,可以无缝集成到任何现有FL算法中。其核心是对梯度和权重衰减进行联合裁剪(co-clipping),从而调控每次更新的幅度。我们对FedNAR的收敛速率给出了完整的理论分析,并在视觉和语言数据集上、结合不同的基础联邦优化算法进行了大量实验。实验结果一致表明,将FedNAR融入现有FL算法可以加速收敛并提升模型精度。此外,FedNAR对各种超参数配置都表现出较强的鲁棒性:当初始权重衰减设定不合理时,FedNAR能够自适应地调整权重衰减,而传统FL算法的精度则会明显下降。我们的代码发布在 \href{https://github.com/ljb121002/fednar}{https://github.com/ljb121002/fednar}。
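A minimal sketch of the co-clipping idea -- combining the gradient and the weight-decay term into one update direction and bounding their joint norm -- is given below; the clipping threshold and the absence of the paper's annealing schedule are simplifying assumptions.

```python
import numpy as np

def fednar_local_step(w, grad, lr=0.1, weight_decay=1e-2, max_update_norm=1.0):
    """Sketch of a FedNAR-style local update: the gradient and the weight-decay
    term are combined and jointly clipped ("co-clipping"), so each update's
    magnitude stays bounded regardless of the weight-decay coefficient.
    The paper's exact clipping/annealing rule may differ."""
    update = grad + weight_decay * w            # gradient plus weight-decay term
    norm = np.linalg.norm(update)
    if norm > max_update_norm:                  # co-clip the combined direction
        update = update * (max_update_norm / norm)
    return w - lr * update

# Hypothetical usage inside one client's local epoch.
w = np.random.randn(1000)
for _ in range(5):
    grad = np.random.randn(1000)                # stand-in for a minibatch gradient
    w = fednar_local_step(w, grad)
print(np.linalg.norm(w))
```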
FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent
results: 实验结果表明,FedHyper 能够在视觉和语言 benchmark 数据集上快速收敛,比 FedAvg 和竞争对手快速收敛 1.1-3 倍,并且在不良初始学习率设置下可以提高最终准确率。此外,FedHyper 可以在不同的初始学习率设置下提高准确率,最高提高 15%。Abstract
The theoretical landscape of federated learning (FL) undergoes rapid evolution, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of these critical challenges. Amongst the diverse adjustments in hyperparameters, the adaptation of the learning rate emerges as a crucial component, holding the promise of significantly enhancing the efficacy of FL systems. In response to this critical need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as the training progresses. In addition, FedHyper not only showcases unparalleled robustness to a spectrum of initial learning rate configurations but also significantly alleviates the necessity for laborious empirical learning rate adjustments. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FEDHYPER consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper catalyzes a remarkable surge in accuracy, augmenting it by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
摘要
联邦学习(FL)的理论研究发展迅速,但其实际应用仍面临一系列复杂挑战,超参数优化便是其中的关键挑战之一。在各类超参数调整中,学习率的自适应调整是关键环节,有望显著提升FL系统的效果。针对这一需求,本文提出了FedHyper,一种专为FL设计的、基于超梯度(hypergradient)的学习率自适应算法。FedHyper是一个通用的学习率调度器,能够随训练进程同时调整全局和本地学习率。此外,FedHyper不仅对各种初始学习率配置表现出极强的鲁棒性,还大幅减少了繁琐的人工学习率调参。我们对FedHyper的收敛速率给出了完整的理论分析,并在视觉和语言基准数据集上进行了大量实验。结果表明,FedHyper的收敛速度比FedAvg和其他对比基线快1.1-3倍,同时取得更高的最终准确率;在初始学习率设置不佳的情况下,FedHyper相比FedAvg最多可将准确率提升15%。
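The sketch below illustrates generic hypergradient-descent learning-rate adaptation, where the step size is adjusted using the inner product of consecutive gradients; FedHyper applies this idea to both global and local rates inside FL, which is not shown here, and all constants are illustrative.

```python
import numpy as np

def hypergradient_sgd(grad_fn, w, lr=0.01, hyper_lr=1e-4, steps=100):
    """Generic hypergradient learning-rate adaptation: the learning rate itself
    is updated with d(loss)/d(lr) ~ -g_t . g_{t-1}. Single-worker sketch only."""
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        lr = lr + hyper_lr * float(g @ prev_grad)   # hypergradient update of the step size
        w = w - lr * g                              # ordinary SGD step with the adapted rate
        prev_grad = g
    return w, lr

# Example: minimize the quadratic 0.5 * ||A w - b||^2.
A = np.random.randn(50, 10) / np.sqrt(50)
b = np.random.randn(50)
grad_fn = lambda w: A.T @ (A @ w - b)
w_opt, final_lr = hypergradient_sgd(grad_fn, np.zeros(10))
print(final_lr, np.linalg.norm(A @ w_opt - b))
```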
Towards out-of-distribution generalizable predictions of chemical kinetics properties
results: 研究结果表明,现有的机器学习方法在不同级别的问题上存在挑战和机遇,并提供了一些可能的解决方案。Abstract
Machine Learning (ML) techniques have found applications in estimating chemical kinetics properties. With the accumulated drug molecules identified through "AI4drug discovery", the next imperative lies in AI-driven design for high-throughput chemical synthesis processes, with the estimation of properties of unseen reactions with unexplored molecules. To this end, the existing ML approaches for kinetics property prediction are required to be Out-Of-Distribution (OOD) generalizable. In this paper, we categorize the OOD kinetic property prediction into three levels (structure, condition, and mechanism), revealing unique aspects of such problems. Under this framework, we create comprehensive datasets to benchmark (1) the state-of-the-art ML approaches for reaction prediction in the OOD setting and (2) the state-of-the-art graph OOD methods in kinetics property prediction problems. Our results demonstrated the challenges and opportunities in OOD kinetics property prediction. Our datasets and benchmarks can further support research in this direction.
摘要
Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly
paper_authors: Herbert Woisetschläger, Alexander Isenko, Shiqiang Wang, Ruben Mayer, Hans-Arno Jacobsen
for: This paper explores the use of Federated Learning (FL) to bring large language models (LLMs) to modern edge computing systems.
methods: The paper fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. The study also provides a micro-level hardware benchmark and compares the model FLOP utilization to a state-of-the-art data center GPU.
results: The paper evaluates the current capabilities of edge computing systems and their potential for LLM FL workloads, and demonstrates the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.Abstract
Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
摘要
大型语言模型(LLM)和基础模型日益普及,因为它们为个人和企业提供了新的机会,可以更好地处理自然语言、与数据交互并更快地获取信息。然而,训练或微调LLM需要海量数据,这些数据可能因法律或技术限制而难以获取,并且可能需要私有的计算资源。联邦学习(FL)是为克服这些挑战、扩展深度学习应用数据访问而设计的方案。本文从硬件角度出发,探讨如何将LLM引入现代边缘计算系统。我们使用FL对FLAN-T5模型家族(参数规模从80M到3B)在文本摘要任务上进行微调,提供了细粒度的硬件基准测试,将模型的FLOP利用率与最先进的数据中心GPU进行比较,并在真实条件下研究网络利用率。我们的贡献有两个方面:第一,我们评估了边缘计算系统的现有能力及其承载LLM联邦学习工作负载的潜力;第二,通过与数据中心GPU的比较,我们展示了改进空间以及在边缘实现更高计算效率的后续方向。
Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data
paper_authors: Adam Wang, Son Nguyen, Albert Montillo
for: This paper aims to address two core problems in traditional deep learning (DL) by introducing a mixed effects deep learning (MEDL) framework that promotes fairness and robustness.
methods: The MEDL framework separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) using a cluster adversary and a Bayesian neural network. The framework also incorporates adversarial debiasing to promote equality-of-odds fairness across fairness-sensitive variables.
results: The paper shows that the MEDL framework notably enhances fairness across all sensitive variables, increasing fairness up to 82% for age, 43% for race, 86% for sex, and 27% for marital-status, while maintaining robust performance and clarity. The framework is versatile and suitable for various dataset types and tasks, making it broadly applicable.Abstract
Traditional deep learning (DL) suffers from two core problems. Firstly, it assumes training samples are independent and identically distributed. However, numerous real-world datasets group samples by shared measurements (e.g., study participants or cells), violating this assumption. In these scenarios, DL can show compromised performance, limited generalization, and interpretability issues, coupled with cluster confounding causing Type 1 and 2 errors. Secondly, models are typically trained for overall accuracy, often neglecting underrepresented groups and introducing biases in crucial areas like loan approvals or determining health insurance rates, such biases can significantly impact one's quality of life. To address both of these challenges simultaneously, we present a mixed effects deep learning (MEDL) framework. MEDL separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through the introduction of: 1) a cluster adversary which encourages the learning of cluster-invariant FE, 2) a Bayesian neural network which quantifies the RE, and a mixing function combining the FE an RE into a mixed-effect prediction. We marry this MEDL with adversarial debiasing, which promotes equality-of-odds fairness across FE, RE, and ME predictions for fairness-sensitive variables. We evaluated our approach using three datasets: two from census/finance focusing on income classification and one from healthcare predicting hospitalization duration, a regression task. Our framework notably enhances fairness across all sensitive variables-increasing fairness up to 82% for age, 43% for race, 86% for sex, and 27% for marital-status. Besides promoting fairness, our method maintains the robust performance and clarity of MEDL. It's versatile, suitable for various dataset types and tasks, making it broadly applicable. Our GitHub repository houses the implementation.
摘要
传统的深度学习(DL)受到两个核心问题的影响。首先,它假设训练样本是独立并且具有相同的分布。然而,许多实际世界数据集中的样本会根据共同的测量结果(如参与研究的人员或细胞)分组,这会违反这个假设,从而导致深度学习的性能受损,降低了通用化和解释性。其次,模型通常会为总准确率培育,而忽略少数群体,这会导致偏见问题,例如贷款批准或医疗保险费率的偏见,这些偏见可能会对人们的生活质量产生重要影响。为解决这两个挑战,我们提出了混合效果深度学习(MEDL)框架。MEDL分别量化分组 invariant fixed effects(FE)和分组特有的随机效果(RE),通过引入:1)分组反对者,使学习分组 invariant FE,2)bayesian neural network,量化 RE,3)混合函数,将 FE 和 RE 组合成混合效果预测。我们将这种 MEDL 结合了反对批判,以便实现 equality-of-odds 公平性 across FE, RE, 和 ME 预测中的敏感变量。我们使用三个数据集进行评估:两个来自人口/金融,关注收入分类,一个来自医疗领域,预测医院住院时间,一个回归任务。我们的框架可以明显提高公平性,对所有敏感变量的公平性提高至82%,43%,86% 和 27%。此外,我们的方法保持了MEDL的稳定性和清晰度,并且可以适用于不同的数据类型和任务,因此广泛应用。我们的 GitHub 存储库中包含实现。
OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials
paper_authors: Peter Eastman, Raimondas Galvelis, Raúl P. Peláez, Charlles R. A. Abreu, Stephen E. Farr, Emilio Gallicchio, Anton Gorenko, Michael M. Henry, Frank Hu, Jing Huang, Andreas Krämer, Julien Michel, Joshua A. Mitchell, Vijay S. Pande, João PGLM Rodrigues, Jaime Rodriguez-Guerra, Andrew C. Simmonett, Jason Swails, Ivy Zhang, John D. Chodera, Gianni De Fabritiis, Thomas E. Markland
results: 论文通过对细胞分化调控因子8(CDK8)和绿色荧光蛋白(GFP)chromophore在水中的分子动力学计算来展示这些特性。结果表明,这些特性可以在只需要一定的增加计算成本的情况下,提高分子动力学计算的准确性。Abstract
Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.
摘要
机器学习在分子模拟中扮演着日益重要的角色。最新版本的OpenMM分子动力学工具包引入了新的特性,以支持机器学习势函数的使用。任意的PyTorch模型都可以被添加到模拟中,用于计算力和能量;一个更高层的接口使用户可以方便地用通用的预训练势函数为感兴趣的分子建模。一组经过优化的CUDA核函数和自定义PyTorch算子大幅提升了模拟速度。我们在细胞周期蛋白依赖性激酶8(CDK8)和水中绿色荧光蛋白(GFP)发色团的模拟上展示了这些特性。总体而言,这些特性使得以适度的额外计算成本、利用机器学习提升模拟准确性成为可行。
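A minimal sketch of attaching an arbitrary PyTorch model to an OpenMM System via the openmm-torch plugin's TorchForce is shown below; the module and class names follow the plugin's documentation, but exact signatures and unit conventions should be checked against the current OpenMM 8 release, and the toy "potential" is purely illustrative.

```python
import torch
import openmm
from openmmtorch import TorchForce  # provided by the openmm-torch plugin (assumed installed)

class HarmonicRestraint(torch.nn.Module):
    """Toy ML 'potential': a harmonic restraint pulling all particles to the origin.
    A real use case would wrap a trained ML potential instead."""
    def forward(self, positions):                  # positions in nm, shape (n_atoms, 3)
        return 1000.0 * torch.sum(positions ** 2)  # energy in kJ/mol

# Serialize the PyTorch model and wrap it as an OpenMM force.
torch.jit.script(HarmonicRestraint()).save("restraint.pt")
force = TorchForce("restraint.pt")

# Build a trivial two-particle system and add the ML force to it.
system = openmm.System()
system.addParticle(12.0)
system.addParticle(12.0)
system.addForce(force)

integrator = openmm.LangevinMiddleIntegrator(300.0, 1.0, 0.002)
context = openmm.Context(system, integrator)
context.setPositions([[0.0, 0.0, 0.0], [0.3, 0.0, 0.0]])
integrator.step(100)
print(context.getState(getEnergy=True).getPotentialEnergy())
```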
Crossed-IoT device portability of Electromagnetic Side Channel Analysis: Challenges and Dataset
for: This paper is written for the purpose of investigating the limitations of Electromagnetic Side-Channel Analysis (EM-SCA) approaches for IoT forensics, specifically the impact of device variability on the accuracy and reliability of EM-SCA results.
methods: The paper uses machine-learning (ML) based approaches for EM-SCA and collects EM-SCA datasets to evaluate the limitations of current EM-SCA approaches and datasets. The study also employs transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices.
results: The paper contributes a new dataset for using deep learning models in analysing Electromagnetic Side-Channel data with regards to the cross-device portability matter, and demonstrates the feasibility of using transfer learning to improve the accuracy and reliability of EM-SCA results in IoT forensics.Abstract
IoT (Internet of Things) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT Forensics is collecting and analyzing digital evidence from IoT devices to investigate cybercrimes, security breaches, and other malicious activities that may have taken place on these connected devices. In particular, EM-SCA has become an essential tool for IoT forensics due to its ability to reveal confidential information about the internal workings of IoT devices without interfering with these devices or wiretapping their networks. However, the accuracy and reliability of EM-SCA results can be limited by device variability, environmental factors, and data collection and processing methods. Moreover, there is very little research on these limitations, which significantly affect the accuracy of EM-SCA approaches for cross-IoT-device portability, and equally little research on possible solutions to address this challenge. Therefore, this empirical study examines the impact of device variability on the accuracy and reliability of EM-SCA approaches, in particular machine-learning (ML) based approaches for EM-SCA. We first present the background, basic concepts and techniques used to evaluate the limitations of current EM-SCA approaches and datasets. Our study then addresses one of the most important limitations, which is caused by the multi-core architecture of the processors (SoC). We present an approach to collect the EM-SCA datasets and demonstrate the feasibility of using transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices. Our study moreover contributes a new dataset for using deep learning models in analysing Electromagnetic Side-Channel data with regards to the cross-device portability matter.
摘要
物联网(IoT)指由嵌入了传感器、软件和连接能力的物理设备、车辆、家用电器及其他物品组成的互联网络,使它们能够收集和交换数据。IoT取证是指收集并分析来自IoT设备的数字证据,以调查可能发生在这些联网设备上的网络犯罪、安全漏洞及其他恶意活动。其中,电磁侧信道分析(EM-SCA)已成为IoT取证的重要工具,因为它能够在不干扰设备、不窃听其网络的情况下揭示IoT设备内部运行的机密信息。然而,EM-SCA结果的准确性和可靠性会受到设备差异、环境因素以及数据收集和处理方法的限制。此外,针对这些显著影响EM-SCA方法跨IoT设备可移植性的限制的研究非常有限,针对可能解决方案的研究也同样稀少。因此,本实证研究考察了设备差异对EM-SCA方法(特别是基于机器学习的EM-SCA方法)准确性和可靠性的影响。我们首先介绍背景、基本概念以及用于评估现有EM-SCA方法和数据集局限性的技术;随后解决其中一个最重要的限制,即由处理器(SoC)多核架构造成的限制。我们提出了一种收集EM-SCA数据集的方法,并证明了利用迁移学习在跨IoT设备的IoT取证中获得更有意义、更可靠结果的可行性。此外,本研究还贡献了一个新的数据集,用于以深度学习模型分析电磁侧信道数据中的跨设备可移植性问题。
Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation
results: 本论文的结果表明,这些模型基于树算法具有较高的解释性和稳定性,同时也能够保持和黑盒模型的性能相似。此外,这些方法还能够捕捉复杂的交互效应。Abstract
Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striking for an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
摘要
代理模型(surrogate model)在通过模型蒸馏对复杂而强大的黑盒机器学习模型进行事后解释时发挥着关键作用。本文重点研究使用基于模型的树(model-based tree)作为代理模型,它通过决策规则将特征空间划分为可解释的区域;在每个区域内,使用基于加性主效应的可解释模型来近似黑盒模型的行为,以在可解释性与性能之间取得最佳平衡。我们比较了四种基于模型的树算法,即SLIM、GUIDE、MOB和CTree,考察它们生成此类代理模型的能力,包括保真度、可解释性、稳定性以及通过恰当的分裂捕捉交互效应的能力。基于全面的分析,我们最终给出了面向不同用户需求的建议。
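The distillation setup itself can be sketched with standard tooling: a black-box model is fitted, a shallow tree is then fitted to the black box's predictions rather than the original labels, and fidelity is measured on those predictions. Note that scikit-learn's CART tree is only a stand-in here; the paper's model-based trees (SLIM, GUIDE, MOB, CTree) additionally fit additive models in the leaves.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# Black-box model we want to explain.
X, y = make_friedman1(n_samples=2000, noise=0.5, random_state=0)
black_box = GradientBoostingRegressor(random_state=0).fit(X, y)

# Distillation: the surrogate is trained on the black box's *predictions*,
# not the original labels, so it mimics the model rather than the data.
y_bb = black_box.predict(X)
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y_bb)

# Fidelity: how well the interpretable surrogate reproduces the black box.
print("fidelity R^2:", r2_score(y_bb, surrogate.predict(X)))
```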
Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data
paper_authors: Rabia Gondur, Usama Bin Sikandar, Evan Schaffer, Mikio Christian Aoi, Stephen L Keeley
for: 本研究的目的是Characterizing the relationship between neural population activity and behavioral data, 即 Neuroscience 领域中的一个中心目标。
methods: 本研究使用了一种基于 Gaussian Process Factor Analysis (GPFA) 和 Gaussian Process Variational Autoencoders (GP-VAEs) 的无监督 latent variable model (LVM),可以提取高维时间序列数据中的共同和独立的特征空间结构。
results: 研究表明,使用这种模型可以准确地分解高维时间序列数据中的共同和独立特征空间结构,并且在不同实验数据模式下都能够提供良好的重建结果。此外,研究还应用于两个实验 Setting:蝋烛蛋白氮氧化 imaging 和 Manduca sexta 肌电征观测。Abstract
Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically only designed for a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space, but admit rich expressivity due to a deep neural network mapping to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared between or independent to each modality. We parameterize the latents of our model in the Fourier domain, and show improved latent identification using this approach over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) is able to not only identify the shared and independent latent structure across modalities accurately, but provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework on two real world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.
摘要
We use Gaussian Process Factor Analysis (GPFA) for neural spiking data with temporally smooth latent space, combined with Gaussian Process Variational Autoencoders (GP-VAEs) that use a GP prior to characterize correlations in a latent space, and admit rich expressivity due to a deep neural network mapping to observations. We parameterize the latents in the Fourier domain, which improves latent identification.We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that smoothly change over time. Our multi-modal Gaussian Process Variational Autoencoder (MM-GPVAE) accurately identifies shared and independent latent structure across modalities, and provides good reconstructions of both images and neural rates on held-out trials.We also apply our framework to two real-world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus. Our approach provides a powerful tool for analyzing and understanding the complex relationships between neural population activity and behavioral data.
results: 在 CIFAR-10 预训和 CIFAR-100 精进任务中,这个方法的表现与非隐私模型几乎相等,并且通常比 DP-SGD 直接应用到对比例损失的表现更好。Abstract
Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their $L_2$ sensitivity can grow with increasing batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity-based loss functions -- in particular the commonly used contrastive loss -- that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for batch size $n$. We test our DP-SGD variant on some preliminary CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
摘要
“不监督式预训”是电脑视觉模型和大语言模型的开发中常见的步骤。在这种设定下,由于没有标签,因此需要使用相似性基于的损失函数,如对照损失,以降低相似输入的距离,并将不相似的输入划分开。随着隐私问题的增加,对这些模型进行权限为 differential privacy 变得更加重要。然而,由于输入的生成方式,这些损失函数的 $L_2$ 敏感性会随着批号大小增长。这个问题对于对于权限为 differentially private 的训练方法,如 DP-SGD,是不利的。为了解决这个问题,我们开发了一种基于相似性损失函数的 DP-SGD variant,具体是对对照损失进行修改,以使得每个批号中的条件 gradient 的敏感性为 $O(1)$。我们在预先训练 CIFAR-10 和 CIFAR-100 中进行了一些验证,结果显示,在这两个任务中,我们的方法的性能与非隐私模型相似,并且通常超过直接对对照损失进行 DP-SGD 的性能。”
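For context, a vanilla DP-SGD aggregation step (per-example clipping plus Gaussian noise) is sketched below; the paper's contribution -- restructuring the contrastive-loss gradients so that the summed gradient's sensitivity stays $O(1)$ in the batch size -- is not reproduced here.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Vanilla DP-SGD aggregation: clip each per-example gradient to clip_norm,
    sum, and add Gaussian noise scaled to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

grads = [np.random.randn(256) for _ in range(32)]   # stand-in per-example gradients
print(np.linalg.norm(dp_sgd_step(grads)))
```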
Dual Prompt Tuning for Domain-Aware Federated Learning
results: 实验结果显示,提案学习方法(Fed-DPT)可以优化领域转移问题,并且与原始CLIP模型相比,实现了14.8%的提升。在DomainNet dataset中,这种方法可以获得68.4%的平均准确率,涵盖了六个领域。Abstract
Federated learning is a distributed machine learning paradigm that allows multiple clients to collaboratively train a shared model with their local data. Nonetheless, conventional federated learning algorithms often struggle to generalize well due to the ubiquitous domain shift across clients. In this work, we consider a challenging yet realistic federated learning scenario where the training data of each client originates from different domains. We address the challenges of domain shift by leveraging the technique of prompt learning, and propose a novel method called Federated Dual Prompt Tuning (Fed-DPT). Specifically, Fed-DPT employs a pre-trained vision-language model and then applies both visual and textual prompt tuning to facilitate domain adaptation over decentralized data. Extensive experiments of Fed-DPT demonstrate its significant effectiveness in domain-aware federated learning. With a pre-trained CLIP model (ViT-Base as image encoder), the proposed Fed-DPT attains 68.4% average accuracy over six domains in the DomainNet dataset, which improves the original CLIP by a large margin of 14.8%.
摘要
联邦学习是一种分布式机器学习范式,允许多个客户端利用各自的本地数据协同训练一个共享模型。然而,由于客户端之间普遍存在域偏移(domain shift),传统的联邦学习算法往往难以获得良好的泛化性能。在本工作中,我们考虑一个具有挑战性但贴近现实的联邦学习场景:每个客户端的训练数据来自不同的域。我们利用提示学习(prompt learning)技术来应对域偏移问题,并提出了一种名为联邦双重提示调优(Federated Dual Prompt Tuning, Fed-DPT)的新方法。具体而言,Fed-DPT使用预训练的视觉-语言模型,同时进行视觉提示调优和文本提示调优,以促进对去中心化数据的域适应。大量实验证明了Fed-DPT在域感知联邦学习中的显著有效性:使用预训练的CLIP模型(以ViT-Base作为图像编码器),Fed-DPT在DomainNet数据集的六个域上取得了68.4%的平均准确率,比原始CLIP大幅提升14.8%。
Physics-Informed Neural Networks for Accelerating Power System State Estimation
methods: 本文提出了一种新的方法,通过将物理知识integrated into PINNs来减少状态估算的计算复杂性,同时保持高准确性。
results: 经过实验表明,提出的方法可以提高精度,降低标准差,并且更快地 converges,在IEEE 14-bus系统上实现了11%的提高精度、75%的降低标准差和30%的加速。Abstract
State estimation is the cornerstone of the power system control center since it provides the operating condition of the system in consecutive time intervals. This work investigates the application of physics-informed neural networks (PINNs) for accelerating power systems state estimation in monitoring the operation of power systems. Traditional state estimation techniques often rely on iterative algorithms that can be computationally intensive, particularly for large-scale power systems. In this paper, a novel approach that leverages the inherent physical knowledge of power systems through the integration of PINNs is proposed. By incorporating physical laws as prior knowledge, the proposed method significantly reduces the computational complexity associated with state estimation while maintaining high accuracy. The proposed method achieves up to 11% increase in accuracy, 75% reduction in standard deviation of results, and 30% faster convergence, as demonstrated by comprehensive experiments on the IEEE 14-bus system.
摘要
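A generic sketch of the physics-informed loss underlying such an approach is given below: a network maps measurements to a state estimate, and the training loss mixes a measurement-mismatch term with a physics residual. The measurement function, the placeholder residual, and all sizes are illustrative assumptions rather than the paper's formulation.

```python
import torch

class StateEstimator(torch.nn.Module):
    """Maps a measurement vector z to a state estimate x_hat (magnitudes/angles)."""
    def __init__(self, n_meas, n_state):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_meas, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, n_state))
    def forward(self, z):
        return self.net(z)

def physics_residual(x_hat):
    # Placeholder for the power-flow residual g(x) = 0; here we simply penalize
    # deviation of voltage magnitudes (first half of the state) from 1.0 p.u.
    n = x_hat.shape[1] // 2
    return ((x_hat[:, :n] - 1.0) ** 2).mean()

def pinn_loss(model, z, h, lam=1.0):
    x_hat = model(z)
    data_term = ((h(x_hat) - z) ** 2).mean()       # measurement mismatch
    return data_term + lam * physics_residual(x_hat)  # plus physics prior

# Hypothetical usage with a linear stand-in measurement function h.
model = StateEstimator(n_meas=10, n_state=6)
H = torch.randn(6, 10) * 0.1
h = lambda x: x @ H
z = torch.randn(32, 10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = pinn_loss(model, z, h)
    loss.backward()
    opt.step()
print(float(loss))
```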
Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making
results: DC在多个标准强化学习基准上取得了最先进的性能,同时消耗更少的资源并具有更好的泛化能力。Abstract
The recent success of Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising model based on Transformer. However, we discovered that the attention module of DT is not appropriate to capture the inherent local dependence pattern in trajectories of RL modeled as a Markov decision process. To overcome the limitations of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure to process multiple entities in parallel and understand the interrelationship among the multiple entities. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.
摘要
最近,Transformer在自然语言处理领域的成功引发了其在各个领域的应用。在离线强化学习(RL)中,Decision Transformer(DT)正成为一种有前景的基于Transformer的模型。然而,我们发现DT的注意力模块并不适合捕捉以马尔可夫决策过程建模的RL轨迹中固有的局部依赖模式。为了克服DT的局限性,我们提出了一种新的动作序列预测器,名为Decision ConvFormer(DC),其基于MetaFormer架构——一种并行处理多个实体并理解实体间相互关系的通用结构。DC使用局部卷积滤波作为token混合器,能够有效捕捉RL数据集中固有的局部关联。在大量实验中,DC在多个标准RL基准上取得了最先进的性能,同时所需资源更少。此外,我们还表明DC能更好地理解数据的内在含义,并表现出更强的泛化能力。
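The local-filtering token mixer at the heart of DC can be sketched as a causal depthwise 1D convolution dropped into a standard pre-norm MetaFormer block; the kernel size and dimensions below are illustrative choices.

```python
import torch
import torch.nn as nn

class ConvTokenMixer(nn.Module):
    """Causal depthwise 1D convolution over the token (time) dimension -- a
    local-filtering token mixer used in place of attention."""
    def __init__(self, dim, kernel_size=6):
        super().__init__()
        self.kernel_size = kernel_size
        self.conv = nn.Conv1d(dim, dim, kernel_size, groups=dim)  # depthwise

    def forward(self, x):              # x: (batch, seq_len, dim)
        x = x.transpose(1, 2)          # -> (batch, dim, seq_len)
        x = nn.functional.pad(x, (self.kernel_size - 1, 0))  # left-pad for causality
        x = self.conv(x)
        return x.transpose(1, 2)

class MetaFormerBlock(nn.Module):
    """Pre-norm MetaFormer block: token mixer + MLP, each with a residual."""
    def __init__(self, dim):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.mixer = ConvTokenMixer(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        x = x + self.mixer(self.norm1(x))
        return x + self.mlp(self.norm2(x))

tokens = torch.randn(2, 20, 64)        # (batch, sequence of return/state/action tokens, dim)
print(MetaFormerBlock(64)(tokens).shape)
```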
High-dimensional SGD aligns with emerging outlier eigenspaces
results: 研究发现,在多层神经网络和高维杂合体中,SGD trajectory 快速地与出现的低级别异常 eigenspace 对齐,并且在多层设置中,每层的异常 eigenspace 在训练过程中进行了演化,并且在 SGD converges 到不优化类фика器时会出现rank defect。这些结果证明了一些在过去十年的数值研究中出现的观测结果,关于训练过程中预测矩阵和梯度矩阵的特征矩阵的特征。Abstract
We rigorously study the joint evolution of training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, the SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient matrices. Moreover, in multi-layer settings this alignment occurs per layer, with the final layer's outlier eigenspace evolving over the course of training, and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers. This establishes some of the rich predictions that have arisen from extensive numerical studies in the last decade about the spectra of Hessian and information matrices over the course of training in overparametrized networks.
摘要
我们严格地研究了在 Stochastic Gradient Descent(SGD)训练过程中的训练动态和经验偏导矩阵和梯度矩阵的共同演化。我们证明了在两个 canonical 分类任务中, namely 高维混合体中的multi-class和一层或二层神经网络,SGD的轨迹快速与出现的低维异常特征空间相对应。此外,在多层设置中,这种对应发生在每层上,最后一层的异常特征空间在训练过程中不断演化,并在SGD converges to sub-optimal classifiers时表现出rank defect。这些结论证明了过去十年的数字实验所预测的偏导矩阵和信息矩阵的特征在训练过程中的变化。
Learning characteristic parameters and dynamics of centrifugal pumps under multi-phase flow using physics-informed neural networks
paper_authors: Felipe de Castro Teixeira Carvalho, Kamaljyoti Nath, Alberto Luiz Serpa, George Em Karniadakis
for: 这种研究是为了提高油田生产和实施控制策略而写的。
methods: 这个研究使用物理受限神经网络(PINN)模型来估算系统参数。
results: 研究发现,使用PINN模型可以减少在野外实验室测试中估算流体属性的成本。Abstract
Electrical submersible pumps (ESP) are the second most used artificial lifting equipment in the oil and gas industry due to their high flow rates and boost pressures. They often have to handle multiphase flows, which usually contain a mixture of hydrocarbons, water, and/or sediments. Given these circumstances, emulsions are commonly formed. It is a liquid-liquid flow composed of two immiscible fluids whose effective viscosity and density differ from the single phase separately. In this context, accurate modeling of ESP systems is crucial for optimizing oil production and implementing control strategies. However, real-time and direct measurement of fluid and system characteristics is often impractical due to time constraints and economy. Hence, indirect methods are generally considered to estimate the system parameters. In this paper, we formulate a machine learning model based on Physics-Informed Neural Networks (PINNs) to estimate crucial system parameters. In order to study the efficacy of the proposed PINN model, we conduct computational studies using not only simulated but also experimental data for different water-oil ratios. We evaluate the state variable's dynamics and unknown parameters for various combinations when only intake and discharge pressure measurements are available. We also study structural and practical identifiability analyses based on commonly available pressure measurements. The PINN model could reduce the requirement of expensive field laboratory tests used to estimate fluid properties.
摘要
电动潜水泵(ESP)是石油和天然气行业中第二常用人工吸引装置,因其高流量和增压性。它们经常需要处理多相流体,通常包含混合物质,水和/或淤泥。由于这些情况,涂抹是常见的。它是两种不溶的液体流体的液-液流体,其有效粘度和密度与单相流体不同。在这种情况下,ESP系统的准确模型化是关键,以便优化石油生产和实施控制策略。然而,实时和直接测量流体和系统特性是经济不实际的,因此通常使用间接方法来估算系统参数。在这篇论文中,我们使用物理学 Informed Neural Networks(PINNs)来估算系统参数。为了评估提案的PINN模型的效果,我们进行了计算研究,使用不仅数据 simulated,还有实验数据,以便为不同的水油比例进行研究。我们分析了系统参数的动态和未知参数,以及不同组合下的只有吸入和排出压力测量的情况。我们还进行了结构可识别性和实用可识别性分析,以确定可以通过常见压力测量来估算流体属性。PINN模型可以减少在野外实验室测试中估算流体属性的成本。
IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning
paper_authors: Pengyuan Lu, Michele Caprio, Eric Eaton, Insup Lee
For: The paper focuses on continual learning, specifically addressing the trade-off between different tasks and proposing a new method called Imprecise Bayesian Continual Learning (IBCL) to improve the efficiency of continual learning.* Methods: IBCL updates a knowledge base in the form of a convex hull of model parameter distributions and obtains particular models to address task trade-off preferences with zero-shot, without requiring additional training overhead.* Results: The paper shows that models obtained by IBCL have guarantees in identifying the Pareto optimal parameters, and experiments on standard image classification and NLP tasks support this guarantee. Additionally, IBCL improves average per-task accuracy by at most 23% and peak per-task accuracy by at most 15% with respect to the baseline methods, with steadily near-zero or positive backward transfer.Abstract
Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some previous tasks. This means that there exist multiple models that are Pareto-optimal at different times, each addressing a distinct task performance trade-off. Researchers have discussed how to train particular models to address specific trade-off preferences. However, existing algorithms require training overheads proportional to the number of preferences -- a large burden when there are multiple, possibly infinitely many, preferences. As a response, we propose Imprecise Bayesian Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base in the form of a convex hull of model parameter distributions and (2) obtains particular models to address task trade-off preferences with zero-shot. That is, IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base. We show that models obtained by IBCL have guarantees in identifying the Pareto optimal parameters. Moreover, experiments on standard image classification and NLP tasks support this guarantee. Statistically, IBCL improves average per-task accuracy by at most 23\% and peak per-task accuracy by at most 15\% with respect to the baseline methods, with steadily near-zero or positive backward transfer. Most importantly, IBCL significantly reduces the training overhead from training 1 model per preference to at most 3 models for all preferences.
摘要
与一般的多任务学习类似,持续学习本质上是多目标优化,因此面临不同任务之间性能的权衡:为了针对当前任务分布进行优化,可能需要牺牲部分先前任务的性能。这意味着在不同时刻存在多个帕累托最优模型,每个模型对应一种不同的任务性能权衡。已有研究讨论了如何训练特定模型以满足特定的权衡偏好,然而现有算法的训练开销与偏好数量成正比——当偏好数量众多甚至无限时,这是巨大的负担。为此,我们提出了不精确贝叶斯持续学习(Imprecise Bayesian Continual Learning, IBCL)。在遇到新任务时,IBCL(1)以模型参数分布凸包的形式更新知识库,并(2)以零样本方式获取满足任务权衡偏好的特定模型。也就是说,IBCL无需任何额外的训练开销即可从其知识库中生成满足偏好的模型。我们证明了由IBCL获得的模型在识别帕累托最优参数方面具有理论保证,并且在标准图像分类和NLP任务上的实验支持了这一保证。统计上,相对于基线方法,IBCL将每任务平均准确率最多提升23%,每任务峰值准确率最多提升15%,且后向迁移始终接近零或为正。最重要的是,IBCL将训练开销从"每个偏好训练1个模型"大幅降低到"所有偏好最多训练3个模型"。
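The zero-shot step can be sketched as follows, under the simplifying assumption that the knowledge base stores an independent Gaussian over parameters per task: a preference vector selects a point in the convex hull of those distributions, and candidate models are sampled from it with no further training.

```python
import numpy as np

def ibcl_zero_shot(task_means, task_stds, preference, n_draws=10, rng=None):
    """Sketch of IBCL-style zero-shot model generation: a preference vector w
    (normalized to sum to 1) picks a convex combination of per-task parameter
    distributions (independent Gaussians here, an assumption), and parameter
    vectors for that mixed distribution are sampled without extra training."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(preference, dtype=float)
    w = w / w.sum()
    mean = sum(wi * m for wi, m in zip(w, task_means))   # convex combination of means
    std = sum(wi * s for wi, s in zip(w, task_stds))     # and of scales (simplification)
    return [rng.normal(mean, std) for _ in range(n_draws)]

# Two tasks' parameter distributions and a 70/30 preference toward task 1.
means = [np.zeros(100), np.ones(100)]
stds = [0.1 * np.ones(100), 0.2 * np.ones(100)]
models = ibcl_zero_shot(means, stds, preference=[0.7, 0.3])
print(len(models), models[0].shape)
```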
Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions
paper_authors: Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas
for: This paper focuses on solving game-theoretic equilibrium problems, particularly in machine learning applications with finite-sum structure.
methods: The paper proposes variants of the classical Halpern iteration that utilize variance reduction to improve the complexity guarantees. These methods are based on the properties of cocoercivity and Lipschitz continuity of the component operators.
results: The paper achieves improved complexity guarantees of $\widetilde{\mathcal{O}( n + \sqrt{n}L\varepsilon^{-1})$ for finite-sum monotone inclusions, which is near-optimal up to poly-logarithmic factors. This result is the first variance reduction-type result for general finite-sum monotone inclusions and for specific problems like convex-concave optimization.Abstract
Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which $n$ component operators in the finite sum are ``on average'' either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is $\widetilde{\mathcal{O}( n + \sqrt{n}L\varepsilon^{-1})$, which improves upon existing methods by a factor up to $\sqrt{n}$. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal.
摘要
依赖对抗鲁棒性或多智能体设定等准则的机器学习方法,提出了求解博弈论均衡问题的需求。与这些应用尤其相关的是针对有限和(finite-sum)结构的方法,这种结构普遍出现在上述场景中学习问题的经验版本里。此外,具有可计算近似误差的方法非常可取,因为它们提供了可验证的终止准则。受这些应用驱动,我们研究有限和单调包含问题,它建模了广泛的均衡问题类。我们的主要贡献是经典Halpern迭代的若干变体,它们利用方差缩减获得改进的复杂度保证,其中有限和中的$n$个分量算子"平均而言"要么是cocoercive的,要么是Lipschitz连续且单调的,参数为$L$。我们的方法对最后一次迭代以及(可计算的)算子范数残差给出保证,其oracle复杂度为$\widetilde{\mathcal{O}}(n + \sqrt{n}L\varepsilon^{-1})$,相比现有方法最多提升$\sqrt{n}$倍。这是针对一般有限和单调包含问题、以及以算子范数残差为最优性度量的更具体问题(如凸-凹优化)的首个方差缩减类结果。我们进一步论证,在单调Lipschitz设定下,该复杂度在对数多项式因子范围内不可再改进,即所给结果是近乎最优的。
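For reference, the deterministic backbone of the method -- the classical Halpern iteration with anchoring coefficient $1/(k+2)$ applied to a resolvent -- is sketched below; the paper's variance-reduced treatment of finite-sum operators is not shown.

```python
import numpy as np

def halpern_iteration(T, x0, n_iters=200):
    """Classical Halpern iteration x_{k+1} = lam_k * x0 + (1 - lam_k) * T(x_k)
    with anchoring lam_k = 1/(k+2), for a nonexpansive map T."""
    x = x0.copy()
    for k in range(n_iters):
        lam = 1.0 / (k + 2)
        x = lam * x0 + (1 - lam) * T(x)
    return x

# Example: T is the resolvent (I + M)^{-1} of the monotone linear operator M,
# so the fixed point of T solves M x = 0.
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 20))
M = B @ B.T + np.eye(20)                           # positive definite, hence monotone
T = lambda x: np.linalg.solve(np.eye(20) + M, x)   # resolvent is firmly nonexpansive
x = halpern_iteration(T, rng.normal(size=20))
print(np.linalg.norm(M @ x))                       # residual shrinks at the O(1/k) Halpern rate
```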
Fast, Expressive SE$(n)$ Equivariant Networks through Weight-Sharing in Position-Orientation Space
paper_authors: Erik J Bekkers, Sharvaree Vadgama, Rob D Hesselink, Putri A van der Linden, David W Romero
For: The paper is written to derive geometrically optimal edge attributes for flexible message passing frameworks and to develop an efficient equivariant group convolutional network for processing 3D point clouds.* Methods: The paper uses the theory of homogeneous spaces to formalize the notion of weight sharing in convolutional networks and to derive attributes that uniquely identify equivalence classes of point-pairs. The paper also uses group convolutions with feature maps over the homogeneous space of positions, position and orientations, and the group SE$(3)$ itself.* Results: The paper achieves state-of-the-art results in accuracy and speed on three different benchmarks: interatomic potential energy prediction, trajectory forecasting in N-body systems, and generating molecules via equivariant diffusion models. Specifically, the paper shows that using the homogeneous space of positions and orientations significantly enhances computational efficiency compared to indexing features on the full SE$(3)$ group.Abstract
Based on the theory of homogeneous spaces we derive \textit{geometrically optimal edge attributes} to be used within the flexible message passing framework. We formalize the notion of weight sharing in convolutional networks as the sharing of message functions over point-pairs that should be treated equally. We define equivalence classes of point-pairs that are identical up to a transformation in the group and derive attributes that uniquely identify these classes. Weight sharing is then obtained by conditioning message functions on these attributes. As an application of the theory, we develop an efficient equivariant group convolutional network for processing 3D point clouds. The theory of homogeneous spaces tells us how to do group convolutions with feature maps over the homogeneous space of positions $\mathbb{R}^3$, position and orientations $\mathbb{R}^3 {\times} S^2$, and the group SE$(3)$ itself. Among these, $\mathbb{R}^3 {\times} S^2$ is an optimal choice due to the ability to represent directional information, which $\mathbb{R}^3$ methods cannot, and it significantly enhances computational efficiency compared to indexing features on the full SE$(3)$ group. We empirically support this claim by reaching state-of-the-art results -- in accuracy and speed -- on three different benchmarks: interatomic potential energy prediction, trajectory forecasting in N-body systems, and generating molecules via equivariant diffusion models.
摘要
基于齐性空间理论,我们推导出在灵活的消息传递框架中使用的几何上最优的边属性。我们将卷积网络中的权重共享形式化为:在应被同等对待的点对上共享消息函数。我们定义了在群变换意义下等价的点对的等价类,并推导出能唯一标识这些等价类的属性;随后通过让消息函数以这些属性为条件来实现权重共享。作为该理论的应用,我们开发了一种高效的等变群卷积网络,用于处理三维点云。齐性空间理论告诉我们如何在位置空间$\mathbb{R}^3$、位置与朝向空间$\mathbb{R}^3 \times S^2$以及群SE$(3)$本身上,对特征图进行群卷积。其中,$\mathbb{R}^3 \times S^2$是最优选择,因为它能够表示$\mathbb{R}^3$方法无法表示的方向信息,并且相比在完整的SE$(3)$群上索引特征,显著提升了计算效率。我们通过在三个不同基准(原子间势能预测、N体系统轨迹预测、以及通过等变扩散模型生成分子)上取得最先进的精度与速度,实证支持了这一论断。
results: 对于媒体和大型电力网络,该研究的数值实验表明,提档的方法可以实现高效率和可扩展性。Abstract
In recent years, there has been significant interest in the development of machine learning-based optimization proxies for AC Optimal Power Flow (AC-OPF). Although significant progress has been achieved in predicting high-quality primal solutions, no existing learning-based approach can provide valid dual bounds for AC-OPF. This paper addresses this gap by training optimization proxies for a convex relaxation of AC-OPF. Namely, the paper considers a second-order cone (SOC) relaxation of ACOPF, and proposes a novel dual architecture that embeds a fast, differentiable (dual) feasibility recovery, thus providing valid dual bounds. The paper combines this new architecture with a self-supervised learning scheme, which alleviates the need for costly training data generation. Extensive numerical experiments on medium- and large-scale power grids demonstrate the efficiency and scalability of the proposed methodology.
摘要
近年来,为交流最优潮流(AC-OPF)开发基于机器学习的优化代理引起了广泛关注。尽管在预测高质量原始解方面已取得显著进展,但现有的基于学习的方法都无法为AC-OPF提供有效的对偶界。本文通过为AC-OPF的一个凸松弛训练优化代理来填补这一空白。具体而言,本文考虑AC-OPF的二阶锥(SOC)松弛,并提出一种新颖的对偶架构,其中嵌入了快速、可微的(对偶)可行性恢复,从而给出有效的对偶界。本文将这一新架构与自监督学习方案相结合,缓解了生成训练数据的高昂成本。在中大规模电网上的大量数值实验证明了所提方法的效率和可扩展性。
Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning
paper_authors: Zihan Liu, Ge Wang, Jiaqi Wang, Jiangbin Zheng, Stan Z. Li
for: 这篇论文的目的是提出一种基于对比学习的肽共建模方法(RepCon),以增强序列路线与图路线所学肽表示之间的一致性,从而提升下游任务的判别性能。
methods: 该论文采用基于对比学习的框架,将序列模型和图模型视为从不同角度进行推断的两个专家,并融合它们的表示以丰富所学的肽表示。
results: 实验表明,共建模方法优于独立建模,而在共建模框架下RepCon又优于其他方法;此外,对RepCon的归因分析在模型解释层面进一步证实了该方法的有效性。Abstract
Peptides are formed by the dehydration condensation of multiple amino acids. The primary structure of a peptide can be represented either as an amino acid sequence or as a molecular graph consisting of atoms and chemical bonds. Previous studies have indicated that deep learning routes specific to sequential and graphical peptide forms exhibit comparable performance on downstream tasks. Despite the fact that these models learn representations of the same modality of peptides, we find that they explain their predictions differently. Considering sequential and graphical models as two experts making inferences from different perspectives, we work on fusing expert knowledge to enrich the learned representations for improving the discriminative performance. To achieve this, we propose a peptide co-modeling method, RepCon, which employs a contrastive learning-based framework to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models. It considers representations from the sequential encoder and the graphical encoder for the same peptide sample as a positive pair and learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs. Empirical studies of RepCon and other co-modeling methods are conducted on open-source discriminative datasets, including aggregation propensity, retention time, antimicrobial peptide prediction, and family classification from Peptide Database. Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework. In addition, the attribution on RepCon further corroborates the validity of the approach at the level of model explanation.
摘要
peptides 是由多个氨基酸的干扰凝结形成的。peptide的主要结构可以表示为氨基酸序列或化学链图,由多个氨基酸组成。之前的研究表明,深度学习模型专门针对序列和图形式的peptide模型具有相似的性能。尽管这些模型学习的是同一种modal peptide的表示,但它们对其预测的解释不同。我们认为这些模型可以视为两个专家,从不同的角度对peptide进行推理。为了融合这两个专家的知识,我们提出了一种peptide共模型方法,即RepCon,该方法使用了对比学习框架来增强对应的抽象表示之间的共识性。它将序列Encoder和图形Encoder对同一个peptide样本的表示视为正例对,并学习增强正例对之间的共识性,同时减弱负例对之间的共识性。我们对RepCon和其他共模型方法进行了实验,并对开源的推理数据集进行了测试,包括积累性、保留时间、抗微生物蛋白预测和家族分类。我们的结果表明,共模型方法比独立模型更高效,而RepCon比其他共模型方法更有优势。此外,对RepCon的解释也证明了该方法的正确性。
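To make the co-modeling idea above concrete, here is a minimal sketch of a contrastive objective that treats the sequential and graphical embeddings of the same peptide as a positive pair and all other in-batch pairings as negatives. It is a generic InfoNCE-style loss, not the exact RepCon objective; the encoders, temperature value, and batch size are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

def paired_infonce(z_seq, z_graph, temperature=0.1):
    """InfoNCE-style loss: the sequential and graphical embeddings of the same
    peptide form a positive pair; every other pairing in the batch is a negative."""
    z_seq = F.normalize(z_seq, dim=1)
    z_graph = F.normalize(z_graph, dim=1)
    logits = z_seq @ z_graph.T / temperature            # (B, B) cosine similarities
    targets = torch.arange(z_seq.size(0))
    # Symmetrise: sequence -> graph and graph -> sequence directions.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

z_seq, z_graph = torch.randn(32, 256), torch.randn(32, 256)  # dummy encoder outputs
print(paired_infonce(z_seq, z_graph))
```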
A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
results: 该论文证明了该流在优化策略时的稳定性和对 gradient evaluation 的稳定性,并且提供了一种基于 log-linear 政策参数化的性能评估方法。Abstract
We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space. The flow is a continuous-time analogue of a policy mirror descent method. We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy. Moreover, we prove the flow is stable with respect to gradient evaluation, offering insights into the performance of a natural policy gradient flow with log-linear policy parameterisation. To overcome challenges stemming from the lack of the convexity of the objective function and the discontinuity arising from the entropy regulariser, we leverage the performance difference lemma and the duality relationship between the gradient and mirror descent flows.
摘要
我们研究用于无限时域熵正则化马尔可夫决策过程(状态与动作空间为波兰空间)的 Fisher-Rao 政策梯度流的全局收敛性。该流是政策镜像下降法的连续时间类比。我们证明了该梯度流的全局适定性,并证明其以指数速度收敛到最优策略。此外,我们证明该流对梯度评估误差具有稳定性,从而为采用 log-linear 政策参数化的自然政策梯度流的性能提供了洞见。为克服目标函数非凸以及熵正则项带来的不连续性所造成的困难,我们利用了性能差异引理以及梯度流与镜像下降流之间的对偶关系。
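The flow above is the continuous-time analogue of policy mirror descent. As a toy illustration of a discrete-time counterpart, the sketch below performs one tabular mirror-descent step for an entropy-regularised MDP; the update form, step size, and regularisation strength are illustrative assumptions rather than the paper's exact scheme.

```python
import numpy as np

def mirror_descent_step(pi, Q, eta=0.1, tau=0.05):
    """One tabular policy mirror descent step for an entropy-regularised MDP:
    pi_{k+1}(a|s) proportional to pi_k(a|s)^(1 - eta*tau) * exp(eta * Q(s, a)),
    a discrete-time analogue of the Fisher-Rao flow (requires eta * tau < 1)."""
    logits = (1.0 - eta * tau) * np.log(pi + 1e-12) + eta * Q
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

pi = np.full((4, 3), 1 / 3)                              # 4 states, 3 actions, uniform start
Q = np.random.randn(4, 3)                                # dummy soft Q-values
print(mirror_descent_step(pi, Q))
```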
HappyFeat – An interactive and efficient BCI framework for clinical applications
results: 研究表明,HappyFeat可以帮助在时间紧张的环境中快速选择最佳特征,从而提高BCI性能。此外,HappyFeat还可以作为一种有效的工具来比较不同的信号特征,以便在训练分类算法时进行选择。Abstract
Brain-Computer Interface (BCI) systems allow users to perform actions by translating their brain activity into commands. Such systems usually need a training phase, consisting in training a classification algorithm to discriminate between mental states using specific features from the recorded signals. This phase of feature selection and training is crucial for BCI performance and presents specific constraints to be met in a clinical context, such as post-stroke rehabilitation. In this paper, we present HappyFeat, a software making Motor Imagery (MI) based BCI experiments easier, by gathering all necessary manipulations and analysis in a single convenient GUI and via automation of experiment or analysis parameters. The resulting workflow allows for effortlessly selecting the best features, helping to achieve good BCI performance in time-constrained environments. Alternative features based on Functional Connectivity can be used and compared or combined with Power Spectral Density, allowing a network-oriented approach. We then give details of HappyFeat's main mechanisms, and a review of its performances in typical use cases. We also show that it can be used as an efficient tool for comparing different metrics extracted from the signals, to train the classification algorithm. To this end, we show a comparison between the commonly-used Power Spectral Density and network metrics based on Functional Connectivity. HappyFeat is available as an open-source project which can be freely downloaded on GitHub.
摘要
Brain-Computer Interface (BCI) 系统允许用户透过识别大脑活动的信号翻译为指令。通常需要一个训练阶段,包括对特定特征的录取信号进行分类学习。这个阶段在临床上有特定的限制,如rehabilitation after stroke。 在这篇文章中,我们介绍了 HappyFeat,一个软件,使得基于想像运动 (MI) 的 BCI实验更加容易,通过集成所有必要的操作和分析到一个易用的Graphical User Interface (GUI) 中,并通过自动化实验或分析参数的自动化,以提高BCI性能。这个工作流程可以帮助在时间紧迫的环境中取得好的BCI性能。此外,HappyFeat 还可以使用不同的功能连接度来检查和比较不同的特征,以推动网络对应的方法。我们随后详细介绍了 HappyFeat 的主要机制,以及它在一般使用情况下的表现。我们还展示了它可以作为一个有效的工具,用于比较不同的信号特征,并训练分类器。为此,我们比较了通常使用的功能spectral density和基于功能连接度的网络特征。HappyFeat 为一个开源项目,可以免费下载在 GitHub 上。
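As a rough sketch of the Power Spectral Density features mentioned above, the snippet below computes per-channel band power with Welch's method; the band ranges, window length, and array shapes are illustrative assumptions, and HappyFeat's actual pipeline (and its Functional Connectivity alternative) may differ.

```python
import numpy as np
from scipy.signal import welch

def band_power_features(eeg, fs, bands=((8, 12), (13, 30))):
    """Per-channel band power (e.g. alpha and beta) from an EEG epoch of shape
    (channels, samples), the kind of Power Spectral Density feature discussed
    above. A connectivity-based alternative would replace PSD with, e.g.,
    coherence between channel pairs."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(fs) * 2, axis=-1)
    feats = []
    for lo, hi in bands:
        idx = (freqs >= lo) & (freqs <= hi)
        feats.append(np.trapz(psd[:, idx], freqs[idx], axis=-1))
    return np.concatenate(feats)

epoch = np.random.randn(8, 4 * 500)   # 8 channels, 4 s at 500 Hz (dummy data)
print(band_power_features(epoch, fs=500).shape)  # (8 channels x 2 bands,) = (16,)
```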
Online Constraint Tightening in Stochastic Model Predictive Control: A Regression Approach
results: 提出了一种在控制过程中在线学习约束收紧参数的方法,通过使用表达能力强的高斯过程(GP)模型来近似满足机会约束的最小收紧参数,并可保证所得参数以高概率满足机会约束。在数值实验中,该方法优于三种最新的对比方法。Abstract
Solving chance-constrained stochastic optimal control problems is a significant challenge in control. This is because no analytical solutions exist for up to a handful of special cases. A common and computationally efficient approach for tackling chance-constrained stochastic optimal control problems consists of reformulating the chance constraints as hard constraints with a constraint-tightening parameter. However, in such approaches, the choice of constraint-tightening parameter remains challenging, and guarantees can mostly be obtained assuming that the process noise distribution is known a priori. Moreover, the chance constraints are often not tightly satisfied, leading to unnecessarily high costs. This work proposes a data-driven approach for learning the constraint-tightening parameters online during control. To this end, we reformulate the choice of constraint-tightening parameter for the closed-loop as a binary regression problem. We then leverage a highly expressive \gls{gp} model for binary regression to approximate the smallest constraint-tightening parameters that satisfy the chance constraints. By tuning the algorithm parameters appropriately, we show that the resulting constraint-tightening parameters satisfy the chance constraints up to an arbitrarily small margin with high probability. Our approach yields constraint-tightening parameters that tightly satisfy the chance constraints in numerical experiments, resulting in a lower average cost than three other state-of-the-art approaches.
摘要
解决机会约束随机最优控制问题是控制领域的一大挑战,因为除少数特殊情形外不存在解析解。一种常见且计算高效的方法是将机会约束重写为带有约束收紧参数的硬约束,但收紧参数的选择仍然困难,且通常只有在过程噪声分布已知的前提下才能提供保证。此外,机会约束往往没有被紧致地满足,导致不必要的高成本。本工作提出了一种在控制过程中在线学习约束收紧参数的数据驱动方法。为此,我们将闭环系统中约束收紧参数的选择表述为一个二元回归问题,并利用表达能力强的高斯过程(GP)模型来近似满足机会约束的最小收紧参数。通过适当调节算法参数,我们证明所得到的收紧参数能够以高概率在任意小的裕度内满足机会约束。在数值实验中,该方法得到的收紧参数紧致地满足机会约束,平均成本低于三种最新的对比方法。
Hoeffding’s Inequality for Markov Chains under Generalized Concentrability Condition
results: 论文通过应用广义集中性条件,给出了若干非渐近的霍夫丁(Hoeffding)型不等式应用,包括基于马尔可夫样本的经验风险最小化的泛化界、SGD 的 Polyak-Ruppert 平均的有限样本保证,以及一般状态空间下 rested Markovian bandits 的新的遗憾界。Abstract
This paper studies Hoeffding's inequality for Markov chains under the generalized concentrability condition defined via integral probability metric (IPM). The generalized concentrability condition establishes a framework that interpolates and extends the existing hypotheses of Markov chain Hoeffding-type inequalities. The flexibility of our framework allows Hoeffding's inequality to be applied beyond the ergodic Markov chains in the traditional sense. We demonstrate the utility by applying our framework to several non-asymptotic analyses arising from the field of machine learning, including (i) a generalization bound for empirical risk minimization with Markovian samples, (ii) a finite sample guarantee for Polyak-Ruppert averaging of SGD, and (iii) a new regret bound for rested Markovian bandits with general state space.
摘要
The framework is applied to several non-asymptotic analyses from machine learning: (1) a generalization bound for empirical risk minimization with Markovian samples; (2) a finite-sample guarantee for Polyak-Ruppert averaging of stochastic gradient descent (SGD); and (3) a new regret bound for rested Markovian bandits with general state spaces.
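For reference, the classical i.i.d. Hoeffding inequality that these Markov-chain results generalise states that for independent random variables $X_1,\dots,X_n$ with $X_i \in [a_i, b_i]$ and $S_n = \sum_{i=1}^n X_i$,

$$\mathbb{P}\left(\left|S_n - \mathbb{E}[S_n]\right| \ge t\right) \le 2\exp\!\left(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\right).$$

The paper's generalized concentrability condition replaces the independence assumption with a requirement on the chain expressed via an integral probability metric.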
paper_authors: Hugues Van Assel, Titouan Vayer, Remi Flamary, Nicolas Courty
for: The paper aims to improve the numerical behaviour and the density of the transport plan in regularised optimal transport (OT) while avoiding the imbalance in how mass is spread across points.
methods: The paper proposes a new formulation of OT, OT with Adaptive RegularIsation (OTARI), which imposes constraints on the mass going in and/or out of each point.
results: The paper demonstrates the benefits of OTARI for domain adaptation.Abstract
Regularising the primal formulation of optimal transport (OT) with a strictly convex term leads to enhanced numerical complexity and a denser transport plan. Many formulations impose a global constraint on the transport plan, for instance by relying on entropic regularisation. As it is more expensive to diffuse mass for outlier points compared to central ones, this typically results in a significant imbalance in the way mass is spread across the points. This can be detrimental for some applications where a minimum of smoothing is required per point. To remedy this, we introduce OT with Adaptive RegularIsation (OTARI), a new formulation of OT that imposes constraints on the mass going in or/and out of each point. We then showcase the benefits of this approach for domain adaptation.
摘要
优化运输(OT)的原始形式在加入严格凸的正则项后,可以改善数值求解性质并得到更稠密的运输计划。许多形式对运输计划施加全局约束,例如通过熵正则化来实现。由于离群点比中心点更难扩散质量,这通常导致质量在各点之间的平滑程度出现显著失衡。这可能对某些要求每个点具有最低限度平滑的应用造成不利影响。为缓解这一问题,我们提出了带自适应正则化的最优运输(OT with Adaptive RegularIsation, OTARI),这一新的 OT 形式对每个点的质量流入和/或流出施加约束。随后,我们展示了该方法在领域自适应中的优势。
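For context, the sketch below implements the standard globally entropic-regularised OT baseline via Sinkhorn iterations; OTARI instead imposes per-point adaptive constraints, so this is only the reference formulation the abstract contrasts against, with the regularisation strength and iteration count chosen arbitrarily.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic OT between histograms a (n,) and b (m,) with cost matrix C (n, m),
    using standard Sinkhorn iterations on the Gibbs kernel K = exp(-C / eps)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]            # transport plan, marginals ~ (a, b)

x = np.linspace(0, 1, 40)
a = b = np.full(40, 1 / 40)
C = (x[:, None] - x[None, :]) ** 2                # squared-distance cost on a grid
print(sinkhorn(a, b, C).sum(axis=1)[:3])          # row sums approximate a
```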
Enhancing Ayurvedic Diagnosis using Multinomial Naive Bayes and K-modes Clustering: An Investigation into Prakriti Types and Dosha Overlapping
results: 研究结果显示,使用MNB分类器可以达到0.90的准确率、0.81的精度、0.91的F1分数和0.90的回归率,而且对七种类型的预测和诊断具有较高的准确率和精度。Abstract
The identification of Prakriti types for the human body is a long-lost medical practice in finding the harmony between the nature of human beings and their behaviour. There are 3 fundamental Prakriti types of individuals. A person can belong to any Dosha. In the existing models, researchers have made use of SVM, KNN, PCA, Decision Tree, and various other algorithms. The output of these algorithms was quite decent, but it can be enhanced with the help of Multinomial Naive Bayes and K-modes clustering. Most of the researchers have confined themselves to 3 basic classes. This might not be accurate in the real-world scenario, where overlapping might occur. Considering these, we have classified the Doshas into 7 categories, which includes overlapping of Doshas. These are namely, VATT-Dosha, PITT-Dosha, KAPH-Dosha, VATT-PITT-Dosha, PITT-KAPH-Dosha, KAPH-VATT-Dosha, and VATT-PITT-KAPH-Dosha. The data used contains a balanced set of all individual entries on which preprocessing steps of machine learning have been performed. Chi-Square test for handling categorical data is being used for feature selection. For model fitting, the method used in this approach is K-modes clustering. The empirical results demonstrate a better result while using the MNB classifier. All key findings of this work have achieved 0.90 accuracy, 0.81 precision, 0.91 F-score, and 0.90 recall. The discussion suggests a provident analysis of the seven clusters and predicts their occurrence. The results have been consolidated to improve the Ayurvedic advancements with machine learning.
摘要
人体 Prakriti 类型的识别是一项久被遗忘的医学实践,旨在寻找人的天性与其行为之间的和谐。个体有 3 种基本的 Prakriti 类型,一个人可以属于任意一种 Dosha。在现有模型中,研究人员使用了 SVM、KNN、PCA、决策树等多种算法,其输出已相当不错,但可以借助 Multinomial Naive Bayes 与 K-modes 聚类进一步提升。多数研究者将个体限定为 3 个基本类别,这在现实场景中可能并不准确,因为 Dosha 之间可能出现重叠。为此,我们将 Dosha 划分为 7 个类别,包括重叠类型,分别为:VATT-Dosha、PITT-Dosha、KAPH-Dosha、VATT-PITT-Dosha、PITT-KAPH-Dosha、KAPH-VATT-Dosha 和 VATT-PITT-KAPH-Dosha。所用数据包含均衡的个体条目,并完成了机器学习预处理步骤;特征选择采用适用于分类数据的卡方检验(Chi-Square test);模型拟合采用 K-modes 聚类。实验结果表明,使用 MNB 分类器可获得更好的效果,各项关键指标达到 0.90 准确率、0.81 精确率、0.91 F 分数和 0.90 召回率。讨论部分对 7 个聚类进行了分析并预测其出现情况,这些结果有助于借助机器学习推动阿育吠陀(Ayurveda)医学的发展。
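A minimal sketch of two of the modeling ingredients named above (chi-square feature selection followed by a Multinomial Naive Bayes classifier) using scikit-learn; the synthetic data, number of selected features, and train/test split are placeholders, and the paper's K-modes clustering step is not reproduced here.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Dummy stand-in data: 500 respondents, 40 non-negative encoded questionnaire
# features, 7 Dosha classes; the real study's data and class balance differ.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(500, 40))
y = rng.integers(0, 7, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Chi-square feature selection followed by Multinomial Naive Bayes.
model = make_pipeline(SelectKBest(chi2, k=20), MultinomialNB())
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```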
Attention-based Multi-task Learning for Base Editor Outcome Prediction
results: 该模型的预测结果与实验结果在多个数据集和基因编辑器变种上均显示了强相关性,这表明该模型可以有效地加速和提高基因编辑设计的过程。Abstract
Human genetic diseases often arise from point mutations, emphasizing the critical need for precise genome editing techniques. Among these, base editing stands out as it allows targeted alterations at the single nucleotide level. However, its clinical application is hindered by low editing efficiency and unintended mutations, necessitating extensive trial-and-error experimentation in the laboratory. To speed up this process, we present an attention-based two-stage machine learning model that learns to predict the likelihood of all possible editing outcomes for a given genomic target sequence. We further propose a multi-task learning schema to jointly learn multiple base editors (i.e. variants) at once. Our model's predictions consistently demonstrated a strong correlation with the actual experimental results on multiple datasets and base editor variants. These results provide further validation for the models' capacity to enhance and accelerate the process of refining base editing designs.
摘要
人类遗传疾病常由点变化引起,强调了精准基因编辑技术的急需。其中,基因编辑技术最引人注目,因为它可以 Targeted 修改Single nucleotide level。然而,其临床应用受到低编辑效率和无意义变化的限制,导致室内实验室中需要进行广泛的试验和尝试。为了加速这个过程,我们提出了一种基于注意力的两阶段机器学习模型,可以预测给定 genomic 目标序列中的所有可能的编辑结果的可能性。我们还提议一种多任务学习 schema,可以同时学习多种基因编辑器(即变体)。我们的模型预测结果与实验结果在多个数据集和基因编辑器变体上具有强相关性。这些结果提供了进一步的验证,证明我们的模型有助于提高和加速基因编辑设计的过程。
ELUQuant: Event-Level Uncertainty Quantification in Deep Inelastic Scattering
results: 能够准确地提取 kinematic 变量 x、Q^2 和 y,并且能够提供精细的物理性 uncertainty 描述,这对于决策和数据质量监测等任务都非常有用。Abstract
We introduce a physics-informed Bayesian Neural Network (BNN) with flow approximated posteriors using multiplicative normalizing flows (MNF) for detailed uncertainty quantification (UQ) at the physics event-level. Our method is capable of identifying both heteroskedastic aleatoric and epistemic uncertainties, providing granular physical insights. Applied to Deep Inelastic Scattering (DIS) events, our model effectively extracts the kinematic variables $x$, $Q^2$, and $y$, matching the performance of recent deep learning regression techniques but with the critical enhancement of event-level UQ. This detailed description of the underlying uncertainty proves invaluable for decision-making, especially in tasks like event filtering. It also allows for the reduction of true inaccuracies without directly accessing the ground truth. A thorough DIS simulation using the H1 detector at HERA indicates possible applications for the future EIC. Additionally, this paves the way for related tasks such as data quality monitoring and anomaly detection. Remarkably, our approach effectively processes large samples at high rates.
摘要
我们提出了一种融入物理信息的贝叶斯神经网络(BNN),利用乘法归一化流(MNF)近似后验,以在物理事件级别上进行精细的不确定性量化(UQ)。我们的方法能够同时识别异方差的偶然不确定性与认知不确定性,提供细粒度的物理洞见。将该模型应用于深度非弹性散射(DIS)事件时,它能有效提取运动学变量 $x$、$Q^2$ 和 $y$,性能与最新的深度学习回归方法相当,但额外提供了事件级别的不确定性量化这一关键优势。这种对底层不确定性的细致刻画对于决策(尤其是事件筛选等任务)非常有价值,同时还能在不直接访问真实值的情况下降低实际误差。基于 HERA 上 H1 探测器的完整 DIS 模拟表明该方法有望应用于未来的 EIC,同时也为数据质量监控与异常检测等相关任务铺平了道路。值得一提的是,我们的方法能够以高速率高效处理大规模样本。
results: 该研究表明,使用spline filters来编码原子环境可以得到一个易于解释的嵌入层,可以与修改NN结构来涵盖预期的物理行为,从而提高总体的解释性。此外,研究还发现,可以在多个化学系统之间共享spline filters,以便提供一个便利的参照点,从而实现跨系统分析。Abstract
While machine learning (ML) interatomic potentials (IPs) are able to achieve accuracies nearing the level of noise inherent in the first-principles data to which they are trained, it remains to be shown if their increased complexities are strictly necessary for constructing high-quality IPs. In this work, we introduce a new MLIP framework which blends the simplicity of spline-based MEAM (s-MEAM) potentials with the flexibility of a neural network (NN) architecture. The proposed framework, which we call the spline-based neural network potential (s-NNP), is a simplified version of the traditional NNP that can be used to describe complex datasets in a computationally efficient manner. We demonstrate how this framework can be used to probe the boundary between classical and ML IPs, highlighting the benefits of key architectural changes. Furthermore, we show that using spline filters for encoding atomic environments results in a readily interpreted embedding layer which can be coupled with modifications to the NN to incorporate expected physical behaviors and improve overall interpretability. Finally, we test the flexibility of the spline filters, observing that they can be shared across multiple chemical systems in order to provide a convenient reference point from which to begin performing cross-system analyses.
摘要
虽然机器学习(ML)原子间势(IP)能够达到接近其训练所用第一性原理数据固有噪声水平的精度,但其日益增加的复杂性是否为构建高质量 IP 所必需仍有待证明。在这项工作中,我们提出了一个新的 MLIP 框架,它将基于样条的 MEAM(s-MEAM)势的简洁性与神经网络(NN)架构的灵活性相结合,称为基于样条的神经网络势(s-NNP)。该框架是传统 NNP 的简化版本,能够以计算高效的方式描述复杂数据集。我们展示了如何利用该框架探究经典 IP 与 ML IP 之间的边界,并强调关键架构改动带来的收益。此外,我们表明使用样条滤波器编码原子环境可以得到易于解释的嵌入层,并可与对 NN 的修改相结合以纳入预期的物理行为,从而提升整体可解释性。最后,我们检验了样条滤波器的灵活性,发现它们可以在多个化学体系之间共享,从而为跨体系分析提供便利的参照点。
FroSSL: Frobenius Norm Minimization for Self-Supervised Learning
results: FroSSL 可以更快地训练到比较好的表示,并且在 linear probe 评估中 learns 竞争性的表示。Abstract
Self-supervised learning (SSL) is an increasingly popular paradigm for representation learning. Recent methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While dimension-contrastive methods converge to similar solutions as sample-contrastive methods, it can be empirically shown that some methods require more epochs of training to converge. Motivated by closing this divide, we present the objective function FroSSL which is both sample- and dimension-contrastive up to embedding normalization. FroSSL works by minimizing covariance Frobenius norms for avoiding collapse and minimizing mean-squared error for augmentation invariance. We show that FroSSL converges more quickly than a variety of other SSL methods and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet18 on the CIFAR-10, CIFAR-100, STL-10, and ImageNet datasets.
摘要
自监督学习(SSL)是表示学习中日益流行的范式。现有方法可分为样本对比、维度对比以及基于非对称网络的三大类,每一类都有各自避免信息坍塌的方式。虽然维度对比方法与样本对比方法会收敛到相近的解,但经验上可以看出,某些方法需要更多的训练轮数才能收敛。为缩小这一差距,我们提出了目标函数 FroSSL,它在嵌入归一化意义下同时具备样本对比与维度对比的性质。FroSSL 通过最小化协方差矩阵的 Frobenius 范数来避免坍塌,并通过最小化均方误差来实现增广不变性。我们表明 FroSSL 比多种其他 SSL 方法收敛更快,并从理论与实验两方面说明这种更快的收敛源于 FroSSL 对嵌入协方差矩阵特征值的影响。此外,在 CIFAR-10、CIFAR-100、STL-10 和 ImageNet 数据集上训练 ResNet18 时,FroSSL 在线性探针评估中学习到了具有竞争力的表示。
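A schematic sketch of a loss built from the two ingredients described above: a mean-squared-error term for augmentation invariance plus Frobenius norms of the embedding covariances to discourage collapse. The published FroSSL objective differs in details (e.g. normalisation and logarithms), so the weighting and exact form here are assumptions.

```python
import torch

def frossl_style_loss(z1, z2, lam=1.0):
    """Illustrative SSL loss: MSE between two views' embeddings (invariance) plus
    Frobenius norms of their covariance matrices (anti-collapse/decorrelation)."""
    # Standardise each embedding dimension batch-wise before computing covariances.
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    n = z1.size(0)
    cov1, cov2 = z1.T @ z1 / n, z2.T @ z2 / n
    invariance = ((z1 - z2) ** 2).mean()
    anti_collapse = torch.linalg.norm(cov1, 'fro') + torch.linalg.norm(cov2, 'fro')
    return invariance + lam * anti_collapse

z1, z2 = torch.randn(256, 128), torch.randn(256, 128)    # two augmented views' embeddings
print(frossl_style_loss(z1, z2))
```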
Recovery of Training Data from Overparameterized Autoencoders: An Inverse Problem Perspective
for: recovery of training data from overparameterized autoencoder models
methods: use trained autoencoder to implicitly define a regularizer for the particular training dataset, and iteratively apply the trained autoencoder and simple computations to estimate and address the unknown degradation operator
results: significantly outperforms previous methods for training data recovery from autoencoders, and improves recovery performance in challenging settings that were previously considered highly challenging and impracticalAbstract
We study the recovery of training data from overparameterized autoencoder models. Given a degraded training sample, we define the recovery of the original sample as an inverse problem and formulate it as an optimization task. In our inverse problem, we use the trained autoencoder to implicitly define a regularizer for the particular training dataset that we aim to retrieve from. We develop the intricate optimization task into a practical method that iteratively applies the trained autoencoder and relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method for blind inpainting where the goal is to recover training images from degradation of many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse train loss values), and show that our method significantly outperforms previous methods for training data recovery from autoencoders. Importantly, our method greatly improves the recovery performance also in settings that were previously considered highly challenging, and even impractical, for such retrieval.
摘要
我们研究从过参数化自编码器模型中恢复训练数据。给定一个受损的训练样本,我们将恢复原始样本定义为一个逆问题,并将其表述为优化任务。在该逆问题中,我们利用训练好的自编码器为目标训练数据集隐式地定义正则项。我们将这一复杂的优化任务转化为一种实用方法:在每次迭代中应用训练好的自编码器,并通过相对简单的计算来估计和处理未知的退化算子。我们在盲补全(blind inpainting)任务上评估该方法,其目标是从以未知模式缺失大量像素的退化图像中恢复训练图像。我们考察了多种深度自编码器架构(例如全连接与 U-Net,采用不同的非线性并处于不同的训练损失水平),结果显示我们的方法在从自编码器中恢复训练数据方面显著优于此前的方法。重要的是,在以往被认为极具挑战、甚至不可行的设定下,我们的方法同样大幅提升了恢复性能。
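A simplified sketch of the iterative recovery loop described above, assuming (unlike the blind setting studied in the paper) that the missing-pixel mask is known; the trained autoencoder acts as an implicit prior for its training set and the observed pixels are re-imposed after each pass.

```python
import torch

@torch.no_grad()
def recover(autoencoder, y, mask, n_steps=200):
    """Iteratively re-apply a trained autoencoder and re-impose observed pixels.
    y: degraded image tensor, mask: 1 where pixels were observed, 0 where missing.
    The blind method in the paper additionally estimates the unknown degradation."""
    x = y.clone()
    for _ in range(n_steps):
        x = autoencoder(x)                 # project towards the training manifold
        x = mask * y + (1 - mask) * x      # keep observed pixels fixed
    return x
```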
paper_authors: Seyed Saman Saboksayr, Gonzalo Mateos, Mariano Tepper
for: 从服从线性结构方程模型(SEM)的观测数据中学习有向无环图(DAG)结构。
methods: 提出一种新的凸评分函数,结合噪声尺度的同时估计(concomitant estimation of scale),并配合平滑的非凸无环性惩罚,通过可微分方式高效搜索 DAG 结构。
results: 所得到的 CoLiDE 估计器在异方差噪声场景下仍能稳健地估计 DAG,并在 DAG 规模较大、噪声水平异质时优于现有方法。Abstract
We deal with the combinatorial problem of learning directed acyclic graph (DAG) structure from observational data adhering to a linear structural equation model (SEM). Leveraging advances in differentiable, nonconvex characterizations of acyclicity, recent efforts have advocated a continuous constrained optimization paradigm to efficiently explore the space of DAGs. Most existing methods employ lasso-type score functions to guide this search, which (i) require expensive penalty parameter retuning when the $\textit{unknown}$ SEM noise variances change across problem instances; and (ii) implicitly rely on limiting homoscedasticity assumptions. In this work, we propose a new convex score function for sparsity-aware learning of linear DAGs, which incorporates concomitant estimation of scale and thus effectively decouples the sparsity parameter from the exogenous noise levels. Regularization via a smooth, nonconvex acyclicity penalty term yields CoLiDE ($\textbf{Co}$ncomitant $\textbf{Li}$near $\textbf{D}$AG $\textbf{E}$stimation), a regression-based criterion amenable to efficient gradient computation and closed-form estimation of noise variances in heteroscedastic scenarios. Our algorithm outperforms state-of-the-art methods without incurring added complexity, especially when the DAGs are larger and the noise level profile is heterogeneous. We also find CoLiDE exhibits enhanced stability manifested via reduced standard deviations in several domain-specific metrics, underscoring the robustness of our novel linear DAG estimator.
摘要
我们研究从服从线性结构方程模型(SEM)的观测数据中学习有向无环图(DAG)结构这一组合问题。借助可微的非凸无环性刻画的最新进展,近期工作提倡采用连续约束优化的范式来高效探索 DAG 空间。现有方法大多使用 lasso 型评分函数来引导搜索,但这类方法(一)在未知的 SEM 噪声方差随问题实例变化时,需要代价高昂地重新调节惩罚参数;(二)隐式依赖于同方差的限制性假设。在这项工作中,我们提出了一种新的凸评分函数,用于稀疏感知的线性 DAG 学习,它同时估计噪声尺度,从而有效地将稀疏性参数与外生噪声水平解耦。结合平滑的非凸无环性惩罚项,我们得到了 CoLiDE(Concomitant Linear DAG Estimation):一种基于回归的准则,便于高效计算梯度,并在异方差情形下可闭式估计噪声方差。我们的算法在不增加额外复杂度的情况下优于现有最先进方法,尤其是在 DAG 规模较大、噪声水平异质时。我们还发现 CoLiDE 在多个领域指标上表现出更小的标准差,体现了这一新线性 DAG 估计器的稳健性。
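For illustration, the snippet below shows the generic ingredients of continuous DAG learning that the abstract builds on: a smooth acyclicity penalty and a plain $\ell_1$-regularised least-squares score. CoLiDE's actual convex score additionally estimates the noise scale, so this is only a baseline sketch.

```python
import numpy as np
from scipy.linalg import expm

def acyclicity_penalty(W):
    """Smooth NOTEARS-style penalty h(W) = tr(exp(W * W)) - d; it vanishes
    exactly when the weighted adjacency matrix W encodes a DAG."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d

def least_squares_score(W, X, lam=0.1):
    """Plain l1-regularised least-squares score for a linear SEM X = X W + noise.
    CoLiDE's convex score additionally estimates the noise scale concomitantly."""
    n = X.shape[0]
    return 0.5 / n * np.sum((X - X @ W) ** 2) + lam * np.abs(W).sum()

X = np.random.randn(100, 5)        # dummy observations
W = np.zeros((5, 5))               # empty graph: penalty is 0, score is pure data fit
print(acyclicity_penalty(W), least_squares_score(W, X))
```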
Something for (almost) nothing: Improving deep ensemble calibration using unlabeled data
results: 经验表明,在低至中等训练集大小情况下,我们的套件更加多样化,提供更好的准确性和套件跨度。Abstract
We present a method to improve the calibration of deep ensembles in the small training data regime in the presence of unlabeled data. Our approach is extremely simple to implement: given an unlabeled set, for each unlabeled data point, we simply fit a different randomly selected label with each ensemble member. We provide a theoretical analysis based on a PAC-Bayes bound which guarantees that if we fit such a labeling on unlabeled data, and the true labels on the training data, we obtain low negative log-likelihood and high ensemble diversity on testing samples. Empirically, through detailed experiments, we find that for low to moderately-sized training sets, our ensembles are more diverse and provide better calibration than standard ensembles, sometimes significantly.
摘要
我们提出了一种在小训练数据且存在无标签数据的情形下改进深度集成校准的方法。该方法实现极其简单:给定一个无标签集合,对每个无标签数据点,我们为每个集成成员分别拟合一个随机选取的不同标签。我们基于 PAC-Bayes 界给出理论分析,证明如果在无标签数据上拟合这样的标签、并在训练数据上拟合真实标签,就能在测试样本上获得较低的负对数似然以及较高的集成多样性。实验方面,通过详细的实验我们发现,对于小到中等规模的训练集,我们的集成比标准集成更加多样,并提供更好的校准,有时提升十分显著。
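A minimal sketch of the labeling step described above: each ensemble member receives its own independently drawn random labels for the unlabeled pool and is then trained on the union of the truly labeled data and its privately labeled unlabeled data. The class count, pool size, and seed are placeholders.

```python
import numpy as np

def random_pseudo_labels(n_unlabeled, n_classes, n_members, seed=0):
    """Draw, for each ensemble member, an independent random label for every
    unlabeled point. Member i is then trained on the labeled set (true labels)
    plus the unlabeled set with labels pseudo[i], which pushes members to
    disagree away from the labeled data and improves calibration."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_classes, size=(n_members, n_unlabeled))

pseudo = random_pseudo_labels(n_unlabeled=1000, n_classes=10, n_members=5)
print(pseudo.shape)  # (5, 1000): one labeling per ensemble member
```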
Stationarity without mean reversion: Improper Gaussian process regression and improper kernels
for: This paper aims to address the pathological behavior of mean-reverting Gaussian process regression by introducing improper kernels that are stationary but not mean reverting.
methods: The paper proposes the use of improper kernels, including the Smooth Walk kernel and a family of improper Matérn kernels, which can be defined only in this improper regime. The resulting posterior distributions can be computed analytically with a simple correction of the usual formulas.
results: The paper demonstrates that these improper kernels solve some known pathologies of mean-reverting GP regression while retaining most of the favorable properties of ordinary smooth stationary kernels, as shown through synthetic and real data analysis.Abstract
Gaussian processes (GP) regression has gained substantial popularity in machine learning applications. The behavior of a GP regression depends on the choice of covariance function. Stationary covariance functions are favorite in machine learning applications. However, (non-periodic) stationary covariance functions are always mean reverting and can therefore exhibit pathological behavior when applied to data that does not relax to a fixed global mean value. In this paper, we show that it is possible to use improper GP prior with infinite variance to define processes that are stationary but not mean reverting. To this aim, we introduce a large class of improper kernels that can only be defined in this improper regime. Specifically, we introduce the Smooth Walk kernel, which produces infinitely smooth samples, and a family of improper Mat\'ern kernels, which can be defined to be $j$-times differentiable for any integer $j$. The resulting posterior distributions can be computed analytically and it involves a simple correction of the usual formulas. By analyzing both synthetic and real data, we demonstrate that these improper kernels solve some known pathologies of mean reverting GP regression while retaining most of the favourable properties of ordinary smooth stationary kernels.
摘要
Harmonic Control Lyapunov Barrier Functions for Constrained Optimal Control with Reach-Avoid Specifications
results: 实验结果显示,在四个不同系统和多种 reach-avoid 环境下,harmonic CLBF 进入不安全区域的风险显著降低,并有高概率到达目标区域。Abstract
This paper introduces harmonic control Lyapunov barrier functions (harmonic CLBF) that aid in constrained control problems such as reach-avoid problems. Harmonic CLBFs exploit the maximum principle that harmonic functions satisfy to encode the properties of control Lyapunov barrier functions (CLBFs). As a result, they can be initiated at the start of an experiment rather than trained based on sample trajectories. The control inputs are selected to maximize the inner product of the system dynamics with the steepest descent direction of the harmonic CLBF. Numerical results are presented with four different systems under different reach-avoid environments. Harmonic CLBFs show a significantly low risk of entering unsafe regions and a high probability of entering the goal region.
摘要
Estimation of Models with Limited Data by Leveraging Shared Structure
results: 该论文提供了finite sample subspace estimation error guarantees,并通过实验 validate 该方法的正确性。Abstract
Modern data sets, such as those in healthcare and e-commerce, are often derived from many individuals or systems but have insufficient data from each source alone to separately estimate individual, often high-dimensional, model parameters. If there is shared structure among systems however, it may be possible to leverage data from other systems to help estimate individual parameters, which could otherwise be non-identifiable. In this paper, we assume systems share a latent low-dimensional parameter space and propose a method for recovering $d$-dimensional parameters for $N$ different linear systems, even when there are only $T
摘要
现代数据集,如医疗和电商,经常来自多个个体或系统,但每个源数据不够以alone来估计高维度模型参数。如果这些系统具有共同结构,那么可能可以通过其他系统的数据来帮助估计个体参数,这些参数可能否认。在这篇论文中,我们假设这些系统共享一个低维度的 latent 参数空间,并提出一种方法来回归 $d$-维度参数的 $N$ 个不同的线性系统,即使只有每个系统 $T
methods: 本论文提出了一种新的无分布假设的 conformal 预测算法,名为 Longitudinal Predictive Conformal Inference(LPCI),它无需借助无限宽的区间即可同时保证纵向与横截面覆盖。LPCI 使用分位数固定效应回归模型来构建预测区间。
results: 实验结果显示,LPCI 能够实现有效的横截面覆盖,其纵向覆盖率也优于现有基准模型。论文还给出了渐近覆盖保证的理论分析,证明 LPCI 在两个维度上均能以有限宽度的预测区间达到覆盖。Abstract
We introduce Longitudinal Predictive Conformal Inference (LPCI), a novel distribution-free conformal prediction algorithm for longitudinal data. Current conformal prediction approaches for time series data predominantly focus on the univariate setting, and thus lack cross-sectional coverage when applied individually to each time series in a longitudinal dataset. The current state-of-the-art for longitudinal data relies on creating infinitely-wide prediction intervals to guarantee both cross-sectional and asymptotic longitudinal coverage. The proposed LPCI method addresses this by ensuring that both longitudinal and cross-sectional coverages are guaranteed without resorting to infinitely wide intervals. In our approach, we model the residual data as a quantile fixed-effects regression problem, constructing prediction intervals with a trained quantile regressor. Our extensive experiments demonstrate that LPCI achieves valid cross-sectional coverage and outperforms existing benchmarks in terms of longitudinal coverage rates. Theoretically, we establish LPCI's asymptotic coverage guarantees for both dimensions, with finite-width intervals. The robust performance of LPCI in generating reliable prediction intervals for longitudinal data underscores its potential for broad applications, including in medicine, finance, and supply chain management.
摘要
我们提出 Longitudinal Predictive Conformal Inference(LPCI),一种面向纵向数据的新型无分布假设 conformal 预测算法。现有针对时间序列数据的 conformal 预测方法大多局限于单变量情形,若对纵向数据集中的每条时间序列单独应用,则缺乏横截面覆盖。针对纵向数据的现有最优方法依赖构造无限宽的预测区间来同时保证横截面覆盖与渐近纵向覆盖。所提出的 LPCI 方法解决了这一问题:在不借助无限宽区间的前提下,同时保证纵向与横截面覆盖。在我们的方法中,我们将残差建模为分位数固定效应回归问题,并利用训练好的分位数回归器构建预测区间。大量实验表明,LPCI 能实现有效的横截面覆盖,并在纵向覆盖率方面优于现有基准。理论上,我们建立了 LPCI 在两个维度上的渐近覆盖保证,且预测区间宽度有限。LPCI 在为纵向数据生成可靠预测区间方面的稳健表现,凸显了其在医疗、金融与供应链管理等领域的广泛应用潜力。
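For contrast with LPCI, here is the plain split-conformal baseline that only guarantees marginal coverage; LPCI replaces the pooled residual quantile with a quantile fixed-effects regression to obtain both longitudinal and cross-sectional coverage. The miscoverage level and variable names are placeholders.

```python
import numpy as np

def split_conformal_interval(y_cal, y_cal_pred, y_test_pred, alpha=0.1):
    """Pooled split-conformal intervals from calibration residuals (marginal
    coverage only). LPCI instead fits a quantile fixed-effects regression to the
    residuals so that coverage also holds per individual and over time."""
    n = len(y_cal)
    scores = np.abs(np.asarray(y_cal) - np.asarray(y_cal_pred))
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, q_level, method="higher")
    return y_test_pred - q, y_test_pred + q

y_cal = np.random.randn(200)
y_cal_pred = y_cal + 0.3 * np.random.randn(200)     # dummy calibration predictions
lo, hi = split_conformal_interval(y_cal, y_cal_pred, y_test_pred=np.zeros(5))
print(lo, hi)
```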
Multi-Domain Causal Representation Learning via Weak Distributional Invariances
results: 研究人员发现,通过 incorporating 稳定分布性质的子集,autoencoder 可以在不同设定下提取稳定的 latent 表示。Abstract
Causal representation learning has emerged as the center of action in causal machine learning research. In particular, multi-domain datasets present a natural opportunity for showcasing the advantages of causal representation learning over standard unsupervised representation learning. While recent works have taken crucial steps towards learning causal representations, they often lack applicability to multi-domain datasets due to over-simplifying assumptions about the data; e.g. each domain comes from a different single-node perfect intervention. In this work, we relax these assumptions and capitalize on the following observation: there often exists a subset of latents whose certain distributional properties (e.g., support, variance) remain stable across domains; this property holds when, for example, each domain comes from a multi-node imperfect intervention. Leveraging this observation, we show that autoencoders that incorporate such invariances can provably identify the stable set of latents from the rest across different settings.
摘要
Learning to Scale Logits for Temperature-Conditional GFlowNets
results: 作者们提出了一种新的架构设计方法,即学习温度Scaling Logits(LSL-GFN),可以快速加速温度控制的GFlowNets训练。这种方法基于对温度的conditioning进行数值化处理,从而大幅提高了GFlowNets的性能,在多种生物化学任务中都达到了或超过了其他基elines和抽样方法的水平。Abstract
GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular graphs. They are trained with the objective of sampling such objects with probability proportional to the object's reward. Among GFlowNets, the temperature-conditional GFlowNets represent a family of policies indexed by temperature, and each is associated with the correspondingly tempered reward function. The major benefit of temperature-conditional GFlowNets is the controllability of GFlowNets' exploration and exploitation through adjusting temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in the training of the deep network because different temperatures may give rise to very different gradient profiles and ideal scales of the policy's logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. We empirically show that our strategy dramatically improves the performances of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in terms of discovering diverse modes in multiple biochemical tasks.
摘要
GFlowNets 是一类概率模型,它们学习一个随机策略以顺序生成组合结构(例如分子图),其训练目标是使对象被采样的概率与其奖励成正比。在 GFlowNets 中,温度条件 GFlowNets 表示一族以温度为索引的策略,每个策略都对应相应回火后的奖励函数。温度条件 GFlowNets 的主要优点是可以通过调节温度来控制探索与利用。我们提出 LSL-GFN(Learning to Scale Logits),通过学习一个关于温度的函数直接缩放策略的 logits,从而大幅加速温度条件 GFlowNets 的训练;这缓解了以往温度条件方式在不同温度下梯度尺度差异过大所带来的数值困难。实验表明,该策略显著提升了 GFlowNets 的性能,在多个生物化学任务中发现多样模式方面优于包括强化学习与采样方法在内的其他基线。
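A small sketch of the architectural idea described above: a learned, positive function of the temperature rescales the policy logits. Layer widths, the Softplus choice, and how the temperature is fed in are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class TemperatureScaledPolicy(nn.Module):
    """Policy whose logits are multiplied by a learned, positive function of the
    temperature, so one network covers the whole family of tempered rewards."""
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, n_actions))
        self.scale = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, state, temperature):
        logits = self.policy(state)                        # (B, n_actions)
        return logits * self.scale(temperature.view(-1, 1))

net = TemperatureScaledPolicy(state_dim=16, n_actions=4)
print(net(torch.randn(8, 16), torch.rand(8)).shape)        # torch.Size([8, 4])
```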
A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability
results: 实验表明,我们的方法可以生成与实际数据有相似结构和计算困难的 MILP 实例,同时能够保持实际数据的特性。Abstract
In the past few years, there has been an explosive surge in the use of machine learning (ML) techniques to address combinatorial optimization (CO) problems, especially mixed-integer linear programs (MILPs). Despite the achievements, the limited availability of real-world instances often leads to sub-optimal decisions and biased solver assessments, which motivates a suite of synthetic MILP instance generation techniques. However, existing methods either rely heavily on expert-designed formulations or struggle to capture the rich features of real-world instances. To tackle this problem, we propose G2MILP, the first deep generative framework for MILP instances. Specifically, G2MILP represents MILP instances as bipartite graphs, and applies a masked variational autoencoder to iteratively corrupt and replace parts of the original graphs to generate new ones. The appealing feature of G2MILP is that it can learn to generate novel and realistic MILP instances without prior expert-designed formulations, while preserving the structures and computational hardness of real-world datasets, simultaneously. Thus the generated instances can facilitate downstream tasks for enhancing MILP solvers under limited data availability. We design a suite of benchmarks to evaluate the quality of the generated MILP instances. Experiments demonstrate that our method can produce instances that closely resemble real-world datasets in terms of both structures and computational hardness. The deliverables are released at https://miralab-ustc.github.io/L2O-G2MILP.
摘要
A Data-facilitated Numerical Method for Richards Equation to Model Water Flow Dynamics in Soil
results: 该论文通过三个示例,证明了D-GRW方法的精度和质量,并与参照方法和商业解决方案进行比较。Abstract
Root-zone soil moisture monitoring is essential for precision agriculture, smart irrigation, and drought prevention. Modeling the spatiotemporal water flow dynamics in soil is typically achieved by solving a hydrological model, such as the Richards equation which is a highly nonlinear partial differential equation (PDE). In this paper, we present a novel data-facilitated numerical method for solving the mixed-form Richards equation. This numerical method, which we call the D-GRW (Data-facilitated global Random Walk) method, synergistically integrates adaptive linearization scheme, neural networks, and global random walk in a finite volume discretization framework to produce accurate numerical solutions of the Richards equation with guaranteed convergence under reasonable assumptions. Through three illustrative examples, we demonstrate and discuss the superior accuracy and mass conservation performance of our D-GRW method and compare it with benchmark numerical methods and commercial solver.
摘要
<> translate "Root-zone soil moisture monitoring is essential for precision agriculture, smart irrigation, and drought prevention. Modeling the spatiotemporal water flow dynamics in soil is typically achieved by solving a hydrological model, such as the Richards equation which is a highly nonlinear partial differential equation (PDE). In this paper, we present a novel data-facilitated numerical method for solving the mixed-form Richards equation. This numerical method, which we call the D-GRW (Data-facilitated global Random Walk) method, synergistically integrates adaptive linearization scheme, neural networks, and global random walk in a finite volume discretization framework to produce accurate numerical solutions of the Richards equation with guaranteed convergence under reasonable assumptions. Through three illustrative examples, we demonstrate and discuss the superior accuracy and mass conservation performance of our D-GRW method and compare it with benchmark numerical methods and commercial solver." into Simplified Chinese.Root-zone soil moisture monitoring 是精细农业、智能灌溉和抗旱防治的关键。通常通过解决水文模型,如理查德方程(PDE)来模拟 soil 中水流动的空间时间流动。在这篇文章中,我们介绍了一种新的数据促进 numerical 方法,称为 D-GRW(数据促进全球随机步行)方法,该方法将适应性线性化 schemes,神经网络和全球随机步行 synergistically интегрирован到 finite volume 积分框架中,以生成理查德方程的数字解决方案,并保证合理假设下的有限积分稳定性。通过三个示例,我们展示了 D-GRW 方法的精度和质量保证性,并与参考数值方法和商业解决方案进行比较。
MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems
results: 根据实际的大型机器学习模型和现代GPU训练硬件,本研究显示了预训练和测试场景中的2.24倍和5.27倍的throughput提升潜力。Abstract
Training and deploying large machine learning (ML) models is time-consuming and requires significant distributed computing infrastructures. Based on real-world large model training on datacenter-scale infrastructures, we show 14~32% of all GPU hours are spent on communication with no overlapping computation. To minimize the outstanding communication latency, in this work, we develop an agile performance modeling framework to guide parallelization and hardware-software co-design strategies. Using the suite of real-world large ML models on state-of-the-art GPU training hardware, we demonstrate 2.24x and 5.27x throughput improvement potential for pre-training and inference scenarios, respectively.
摘要
培训和部署大型机器学习(ML)模型需要很长时间,并且需要庞大的分布式计算基础设施。根据实际的大型模型在数据中心级别基础设施上的训练实践,我们发现14%-32%的所有GPU时间都被沟通占用,没有重叠计算。为了减少待机时间,在这项工作中,我们开发了一个轻松性性能模型框架,以导引并行化和硬件软件共设策略。使用现代GPU训练硬件的集成体系,我们示出了2.24倍和5.27倍的throughput提升潜力,在预训练和推理场景中。
Expected flow networks in stochastic environments and two-player zero-sum games
results: 该论文显示了EFlowNets在数据分布中实现了更好的表现,并且在游戏中实现了更高的胜率(大于80%)。Abstract
Generative flow networks (GFlowNets) are sequential sampling models trained to match a given distribution. GFlowNets have been successfully applied to various structured object generation tasks, sampling a diverse set of high-reward objects quickly. We propose expected flow networks (EFlowNets), which extend GFlowNets to stochastic environments. We show that EFlowNets outperform other GFlowNet formulations in stochastic tasks such as protein design. We then extend the concept of EFlowNets to adversarial environments, proposing adversarial flow networks (AFlowNets) for two-player zero-sum games. We show that AFlowNets learn to find above 80% of optimal moves in Connect-4 via self-play and outperform AlphaZero in tournaments.
摘要
生成流网络(GFlowNets)是一类顺序采样模型,其训练目标是匹配给定分布。GFlowNets 已成功应用于多种结构化对象生成任务,能够快速采样出多样的高奖励对象。我们提出期望流网络(EFlowNets),将 GFlowNets 扩展到随机环境。我们表明,在蛋白质设计等随机任务中,EFlowNets 优于其他 GFlowNet 形式。随后,我们将 EFlowNets 的概念进一步扩展到对抗环境,提出用于双人零和博弈的对抗流网络(AFlowNets)。我们表明,AFlowNets 通过自我对弈在 Connect-4 中学会找到超过 80% 的最优走法,并在对局中胜过 AlphaZero。
Graph Neural Networks and Time Series as Directed Graphs for Quality Recognition
results: 在质量识别问题中应用这两种模型,得到了有效的结果。Abstract
Graph Neural Networks (GNNs) are becoming central in the study of time series, coupled with existing algorithms as Temporal Convolutional Networks and Recurrent Neural Networks. In this paper, we see time series themselves as directed graphs, so that their topology encodes time dependencies and we start to explore the effectiveness of GNNs architectures on them. We develop two distinct Geometric Deep Learning models, a supervised classifier and an autoencoder-like model for signal reconstruction. We apply these models on a quality recognition problem.
摘要
图神经网络(GNNs)正逐渐成为时间序列研究的核心,并与时间卷积网络、循环神经网络等现有方法相结合。在这篇论文中,我们将时间序列本身视为有向图,使其拓扑结构编码时间依赖关系,并开始探索 GNN 架构在其上的有效性。我们构建了两种不同的几何深度学习模型:一个监督分类器和一个用于信号重建的类自编码器模型,并将它们应用于质量识别问题。
Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study
results: 比起现有的状态艺术方法,使用 DRL 算法可以更好地增加 V-VLC 头灯的可用性和重复率,从而减少通信成本while maintaining a high level of reliability。Abstract
In today's era, autonomous vehicles demand a safety level on par with aircraft. Taking a cue from the aerospace industry, which relies on redundancy to achieve high reliability, the automotive sector can also leverage this concept by building redundancy in V2X (Vehicle-to-Everything) technologies. Given the current lack of reliable V2X technologies, this idea is particularly promising. By deploying multiple RATs (Radio Access Technologies) in parallel, the ongoing debate over the standard technology for future vehicles can be put to rest. However, coordinating multiple communication technologies is a complex task due to dynamic, time-varying channels and varying traffic conditions. This paper addresses the vertical handover problem in V2X using Deep Reinforcement Learning (DRL) algorithms. The goal is to assist vehicles in selecting the most appropriate V2X technology (DSRC/V-VLC) in a serpentine environment. The results show that the benchmarked algorithms outperform the current state-of-the-art approaches in terms of redundancy and usage rate of V-VLC headlights. This result is a significant reduction in communication costs while maintaining a high level of reliability. These results provide strong evidence for integrating advanced DRL decision mechanisms into the architecture as a promising approach to solving the vertical handover problem in V2X.
摘要
Kernel-based function learning in dynamic and non stationary environments
paper_authors: Alberto Giaretta, Mauro Bisiacco, Gianluigi Pillonetto
for: 这个论文是关于Function estimation from sparse and noisy data的研究,具体来说是关于supervised learning的研究,其中每个训练集元素都是一个输入位置和一个输出响应的couple。
methods: 这篇论文使用了kernel-based ridge regression方法,并derived convergence conditions under non-stationary distributions,包括在不同时间点上的探索-利用问题。
results: 这篇论文提出了一些关于函数估计的结果,包括对non-stationary distribution下的函数估计的研究,以及在探索-利用问题中的应用。Abstract
One central theme in machine learning is function estimation from sparse and noisy data. An example is supervised learning where the elements of the training set are couples, each containing an input location and an output response. In the last decades, a substantial amount of work has been devoted to design estimators for the unknown function and to study their convergence to the optimal predictor, also characterizing the learning rate. These results typically rely on stationary assumptions where input locations are drawn from a probability distribution that does not change in time. In this work, we consider kernel-based ridge regression and derive convergence conditions under non stationary distributions, addressing also cases where stochastic adaption may happen infinitely often. This includes the important exploration-exploitation problems where e.g. a set of agents/robots has to monitor an environment to reconstruct a sensorial field and their movements rules are continuously updated on the basis of the acquired knowledge on the field and/or the surrounding environment.
摘要
机器学习中的一个核心主题是从稀疏且带噪的数据中估计函数。一个例子是监督学习,其训练集的元素是由输入位置与输出响应组成的数据对。过去几十年中,大量工作致力于为未知函数设计估计器、研究其向最优预测器的收敛并刻画学习速率。这些结果通常依赖于平稳性假设,即输入位置采样自一个不随时间变化的概率分布。在这项工作中,我们考虑基于核的岭回归,并推导在非平稳分布下的收敛条件,同时涵盖随机自适应可能无限多次发生的情形。这包括重要的探索-利用问题,例如一组智能体/机器人需要监测环境以重建感知场,而其运动规则会根据已获得的关于该场和/或周围环境的知识不断更新。
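For reference, the estimator analysed above is kernel ridge regression, whose closed form is sketched below with a Gaussian kernel; the bandwidth and regularisation values are arbitrary placeholders.

```python
import numpy as np

def kernel_ridge_fit_predict(X_train, y_train, X_test, gamma=1.0, lam=1e-2):
    """Closed-form kernel ridge regression with a Gaussian kernel:
    f(x) = k(x, X_train) @ (K + lam * I)^(-1) y_train."""
    def k(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    K = k(X_train, X_train)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_train)), y_train)
    return k(X_test, X_train) @ alpha

X_train = np.random.uniform(-1, 1, size=(50, 1))
y_train = np.sin(3 * X_train[:, 0]) + 0.1 * np.random.randn(50)   # noisy samples
print(kernel_ridge_fit_predict(X_train, y_train, np.array([[0.0], [0.5]])))
```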
Fair Feature Selection: A Comparison of Multi-Objective Genetic Algorithms
results: 比较两种方法的结果显示,lexicographic优化方法在精度方面表现较好,而不会对公平性造成影响。这是一个重要的结果,因为现在大多数的GA均基于Pareto方法,这个结果显示了一个新的进展方向。Abstract
Machine learning classifiers are widely used to make decisions with a major impact on people's lives (e.g. accepting or denying a loan, hiring decisions, etc). In such applications,the learned classifiers need to be both accurate and fair with respect to different groups of people, with different values of variables such as sex and race. This paper focuses on fair feature selection for classification, i.e. methods that select a feature subset aimed at maximising both the accuracy and the fairness of the predictions made by a classifier. More specifically, we compare two recently proposed Genetic Algorithms (GAs) for fair feature selection that are based on two different multi-objective optimisation approaches: (a) a Pareto dominance-based GA; and (b) a lexicographic optimisation-based GA, where maximising accuracy has higher priority than maximising fairness. Both GAs use the same measures of accuracy and fairness, allowing for a controlled comparison. As far as we know, this is the first comparison between the Pareto and lexicographic approaches for fair classification. The results show that, overall, the lexicographic GA outperformed the Pareto GA with respect to accuracy without degradation of the fairness of the learned classifiers. This is an important result because at present nearly all GAs for fair classification are based on the Pareto approach, so these results suggest a promising new direction for research in this area.
摘要
Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel
paper_authors: Paul Hagemann, Johannes Hertrich, Fabian Altekrüger, Robert Beinert, Jannis Chemseddine, Gabriele Steidl
for: Conditional generative modeling and posterior sampling
methods: Discrete Wasserstein gradient flows and negative distance kernel
results: Efficient computation, an error bound for the posterior distributions, and numerical examples demonstrating the method in applications such as conditional image generation and inverse problems like superresolution, inpainting, and computed tomography in low-dose and limited-angle settings.Abstract
We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.
摘要
我们提出基于负距离核的最大均值差异(MMD)的条件流,用于后验采样与条件生成建模。这种 MMD 亦称能量距离,具有若干有利性质,例如可通过切片与排序高效计算。我们利用离散的 Wasserstein 梯度流来近似真实值与观测的联合分布,并为后验分布建立误差界。此外,我们证明所提出的粒子流确实是某一适当泛函的 Wasserstein 梯度流。我们通过数值示例展示了该方法的能力,包括条件图像生成以及低剂量与有限角度设置下的超分辨率、图像修复和计算机断层成像等逆问题。
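A small sketch of the squared MMD with the negative distance kernel (energy distance) between two samples, which is the discrepancy the conditional flows above are built on; this plug-in estimator ignores the slicing and sorting speed-ups mentioned in the abstract.

```python
import numpy as np

def energy_distance(X, Y):
    """Plug-in estimate of 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||, the squared
    MMD with the negative distance kernel (energy distance)."""
    def mean_pdist(A, B):
        return np.mean(np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1))
    return 2 * mean_pdist(X, Y) - mean_pdist(X, X) - mean_pdist(Y, Y)

X = np.random.randn(500, 2)
Y = np.random.randn(500, 2) + np.array([2.0, 0.0])   # shifted sample
print(energy_distance(X, Y))                          # clearly > 0 for different laws
```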
SALSA: Semantically-Aware Latent Space Autoencoder
paper_authors: Kathryn E. Kirchoff, Travis Maxfield, Alexander Tropsha, Shawn M. Gomez
For: 该研究旨在提高深度学习在药物发现中的应用,特别是在化学数据的表示方面。* Methods: 该研究使用了自适应神经网络(Autoencoder)和变换器,并添加了一个对比任务来学习分子之间的结构相似性。* Results: 研究表明,通过添加对比任务,Autoencoder可以学习更加有意义的分子表示,并且能够更好地捕捉分子之间的结构相似性。这些表示能够帮助进一步提高药物发现的效果。Abstract
In deep learning for drug discovery, chemical data are often represented as simplified molecular-input line-entry system (SMILES) sequences which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations that are semantically meaningful, where semantics are defined by the structural (graph-to-graph) similarities between molecules. We demonstrate by example that autoencoders may map structurally similar molecules to distant codes, resulting in an incoherent latent space that does not respect the structural similarities between molecules. To address this shortcoming we propose Semantically-Aware Latent Space Autoencoder (SALSA), a transformer-autoencoder modified with a contrastive task, tailored specifically to learn graph-to-graph similarity between molecules. Formally, the contrastive objective is to map structurally similar molecules (separated by a single graph edit) to nearby codes in the latent space. To accomplish this, we generate a novel dataset comprised of sets of structurally similar molecules and opt for a supervised contrastive loss that is able to incorporate full sets of positive samples. We compare SALSA to its ablated counterparts, and show empirically that the composed training objective (reconstruction and contrastive task) leads to a higher quality latent space that is more 1) structurally-aware, 2) semantically continuous, and 3) property-aware.
摘要
在面向药物发现的深度学习中,化学数据通常表示为简化分子线性输入规范(SMILES)序列,从而可以直接套用自然语言处理方法,其中之一便是序列到序列的自编码器。然而,我们发现仅在 SMILES 上训练自编码器不足以学习语义上有意义的分子表示,此处的语义由分子之间的结构(图到图)相似性定义。我们通过示例说明,自编码器可能将结构相似的分子映射到彼此相距很远的编码,导致潜空间不连贯、无法尊重分子间的结构相似性。为解决这一缺陷,我们提出语义感知潜空间自编码器(SALSA):一种加入对比任务的 Transformer 自编码器,专门用于学习分子之间的图到图相似性。形式上,对比目标是将结构相似(仅相差一次图编辑)的分子映射到潜空间中相近的编码。为此,我们构建了由结构相似分子集合组成的新数据集,并采用能够纳入完整正样本集合的有监督对比损失。我们将 SALSA 与其消融版本进行比较,实验表明组合的训练目标(重建加对比任务)能得到更高质量的潜空间,其 1) 结构感知更强,2) 语义上更连续,3) 更具性质感知。
Reward Model Ensembles Help Mitigate Overoptimization
results: 研究发现,使用ensemble-based conservative optimization可以有效地遏制 reward model 的过优化,并提高表现的精度。尤其是在使用 BoN 优化时,ensemble-based conservative optimization 可以提高表现的精度最多达 70%。此外,研究还发现,在添加25% 标签错误时,ensemble-based conservative optimization 仍然能够有效地遏制过优化。Abstract
Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning large language models to follow instructions. As part of this process, learned reward models are used to approximately model human preferences. However, as imperfect representations of the "true" reward, these learned reward models are susceptible to \textit{overoptimization}. Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup with a significantly larger "gold" reward model acting as the true reward (instead of humans) and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and training data used. Using a similar setup, we conduct a systematic study to evaluate the efficacy of using ensemble-based conservative optimization objectives, specifically worst-case optimization (WCO) and uncertainty-weighted optimization (UWO), for mitigating reward model overoptimization when using two optimization methods: (a) best-of-n sampling (BoN) (b) proximal policy optimization (PPO). We additionally extend the setup of Gao et al. (2023) to include 25% label noise to better mirror real-world conditions. Both with and without label noise, we find that conservative optimization practically eliminates overoptimization and improves performance by up to 70% for BoN sampling. For PPO, ensemble-based conservative optimization always reduces overoptimization and outperforms single reward model optimization. Moreover, combining it with a small KL penalty successfully prevents overoptimization at no performance cost. Overall, our results demonstrate that ensemble-based conservative optimization can effectively counter overoptimization.
摘要
基于人类反馈的强化学习(RLHF)是微调大型语言模型以遵循指令的标准方法。在这一过程中,学习得到的奖励模型被用来近似刻画人类偏好。然而,作为"真实"奖励的不完美替代,这些学习得到的奖励模型容易被过度优化。Gao et al. (2023) 在一个合成的人类反馈设定中研究了这一现象,用一个规模大得多的"金标准"奖励模型充当真实奖励(而非真人),并表明无论代理奖励模型规模与训练数据量如何,过度优化始终是一个持续存在的问题。我们采用类似设定开展系统研究,评估基于集成的保守优化目标,即最坏情况优化(WCO)与不确定性加权优化(UWO),在两种优化方法下缓解奖励模型过度优化的效果:(a) best-of-n 采样(BoN);(b) 近端策略优化(PPO)。我们还在 Gao et al. (2023) 的设定中加入 25% 的标签噪声,以更好地贴近真实条件。无论是否含标签噪声,我们都发现保守优化几乎消除了过度优化,并使 BoN 采样的性能提升最多可达 70%。对于 PPO,基于集成的保守优化总能减少过度优化,并优于单一奖励模型的优化;与一个较小的 KL 惩罚相结合,可在不损失性能的情况下成功防止过度优化。总体而言,我们的结果表明基于集成的保守优化能够有效对抗过度优化。
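A minimal sketch of the two conservative aggregation rules named above: worst-case optimization (WCO) takes the minimum reward across ensemble members, while uncertainty-weighted optimization (UWO) penalises the ensemble mean by its variance. The exact penalty form and coefficient used in the paper may differ, so treat this as an assumption-laden illustration.

```python
import numpy as np

def conservative_reward(member_rewards, mode="wco", lam=1.0):
    """member_rewards: (n_members, batch) reward-model scores for the same responses.
    'wco' optimises against the worst member; 'uwo' penalises the ensemble mean by
    its variance (lam is an illustrative coefficient)."""
    member_rewards = np.asarray(member_rewards)
    if mode == "wco":
        return member_rewards.min(axis=0)
    return member_rewards.mean(axis=0) - lam * member_rewards.var(axis=0)

rewards = np.random.randn(5, 8)            # 5 reward models scoring 8 candidate responses
print(conservative_reward(rewards, "wco"))
print(conservative_reward(rewards, "uwo", lam=0.5))
```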
Comparative Analysis of Imbalanced Malware Byteplot Image Classification using Transfer Learning
results: 研究发现,类别越不均衡,收敛所需的轮数越少,且不同模型之间的性能差异较大。此外,ResNet50、EfficientNetB0和DenseNet169模型在不均衡和均衡数据上都能表现良好。在不均衡数据集上最高精度达97%。Abstract
Cybersecurity is a major concern due to the increasing reliance on technology and interconnected systems. Malware detectors help mitigate cyber-attacks by comparing malware signatures. Machine learning can improve these detectors by automating feature extraction, identifying patterns, and enhancing dynamic analysis. In this paper, the performance of six multiclass classification models is compared on the Malimg dataset, Blended dataset, and Malevis dataset to gain insights into the effect of class imbalance on model performance and convergence. It is observed that the more the class imbalance less the number of epochs required for convergence and a high variance across the performance of different models. Moreover, it is also observed that for malware detectors ResNet50, EfficientNetB0, and DenseNet169 can handle imbalanced and balanced data well. A maximum precision of 97% is obtained for the imbalanced dataset, a maximum precision of 95% is obtained on the intermediate imbalance dataset, and a maximum precision of 95% is obtained for the perfectly balanced dataset.
摘要
信息安全是一个主要的担忧,因为对技术和互联系统的依赖日益增加。恶意软件检测器通过比较恶意软件签名来减轻网络攻击的影响。机器学习可以通过自动提取特征、识别模式和增强动态分析来改进这些检测器。在这篇论文中,我们比较了六种多类分类模型在Malimg数据集、Blended数据集和Malevis数据集上的性能,以了解类别不均衡对模型性能和收敛的影响。我们发现,类别越不均衡,模型收敛所需的epoch数越少,且不同模型之间的性能差异较大。此外,我们还发现,ResNet50、EfficientNetB0和DenseNet169可以很好地处理不均衡和均衡数据。在不均衡数据集上最高精度为97%,在中等不均衡数据集上最高精度为95%,在完全均衡数据集上最高精度为95%。
Extracting Rules from Event Data for Study Planning
results: 评估针对亚琛工业大学(RWTH Aachen)计算机科学学士项目的学生,结果表明所提出的课程序列特征能有效解释学业成绩指标。此外,研究结果还提示了制定更灵活的学习规划的方向。Abstract
In this study, we examine how event data from campus management systems can be used to analyze the study paths of higher education students. The main goal is to offer valuable guidance for their study planning. We employ process and data mining techniques to explore the impact of sequences of taken courses on academic success. Through the use of decision tree models, we generate data-driven recommendations in the form of rules for study planning and compare them to the recommended study plan. The evaluation focuses on RWTH Aachen University computer science bachelor program students and demonstrates that the proposed course sequence features effectively explain academic performance measures. Furthermore, the findings suggest avenues for developing more adaptable study plans.
摘要
在这项研究中,我们考察了如何利用校园管理系统的事件数据来分析高等教育学生的学习路径,主要目标是为学生的学习规划提供有价值的指导。我们采用流程挖掘和数据挖掘技术,探索所修课程序列对学业成绩的影响。通过决策树模型,我们以规则的形式生成基于数据的学习规划建议,并将其与推荐的学习计划进行比较。评估针对亚琛工业大学(RWTH Aachen)计算机科学学士项目的学生,结果表明我们提出的课程序列特征能有效解释学业成绩指标。此外,研究结果还提示了制定更灵活学习计划的可能性。
End-to-End Training of a Neural HMM with Label and Transition Probabilities
paper_authors: Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney
for: 这 paper 的目的是提出一种基于隐藏马尔可夫模型 (HMM) 的端到端神经网络训练方法。
methods: 这 paper 使用的方法是在隐藏状态之间Explicitly modeling and learning transition probabilities。
results: 这 paper 的结果表明,虽然transition model training不提高了识别性能,但它对Alignment quality有着正面的影响,并且生成的 alignments 可以作为state-of-the-art Viterbi trainings 的可靠目标。Abstract
We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM) where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach there are explicit, learnable probabilities for transitions between segments as opposed to a blank label that implicitly encodes duration statistics. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We investigate recognition results and additionally Viterbi alignments of our models. We find that while the transition model training does not improve recognition performance, it has a positive impact on the alignment quality. The generated alignments are shown to be viable targets in state-of-the-art Viterbi trainings.
摘要
我们研究了一种新的建模方法,利用隐马尔可夫模型(HMM)进行端到端神经网络训练,其中隐藏状态之间的转移概率被显式建模并学习。大多数当代序列到序列模型通过在给定拓扑中对所有可能的标签切分求和来实现从零开始的训练。在我们的方法中,段与段之间的转移具有显式、可学习的概率,而不是用一个隐式编码时长统计的空白标签。我们实现了基于GPU的前向-后向算法,可以同时训练标签概率和转移概率。我们考察了模型的识别结果以及Viterbi对齐。我们发现,虽然转移模型的训练没有提升识别性能,但它对对齐质量有积极影响;生成的对齐可以作为最先进的Viterbi训练的可靠目标。
Leveraging Temporal Graph Networks Using Module Decoupling
for: The paper is written for learning on dynamic graphs, specifically addressing the issue of using batches in modern approaches and the degradation of model performance.
methods: The paper proposes a decoupling strategy that enables models to update frequently while using batches, achieved by decoupling the core modules of temporal graph networks and implementing them with a minimal number of learnable parameters.
results: The proposed Lightweight Decoupled Temporal Graph Network (LDTGN) achieves comparable or state-of-the-art results with significantly higher throughput than previous art, outperforming previous approaches by more than 20% on benchmarks that require rapid model update rates.
results: 提出的轻量级解耦时间图网络(LDTGN)在多个动态图基准上取得了相当或最先进的结果,同时吞吐量明显高于先前方法;在需要快速模型更新率的基准(如 USLegis 或 UNTrade)上,比先前方法提升了20%以上。Abstract
Modern approaches for learning on dynamic graphs have adopted the use of batches instead of applying updates one by one. The use of batches allows these techniques to become helpful in streaming scenarios where updates to graphs are received at extreme speeds. Using batches, however, forces the models to update infrequently, which results in the degradation of their performance. In this work, we suggest a decoupling strategy that enables the models to update frequently while using batches. By decoupling the core modules of temporal graph networks and implementing them using a minimal number of learnable parameters, we have developed the Lightweight Decoupled Temporal Graph Network (LDTGN), an exceptionally efficient model for learning on dynamic graphs. LDTG was validated on various dynamic graph benchmarks, providing comparable or state-of-the-art results with significantly higher throughput than previous art. Notably, our method outperforms previous approaches by more than 20\% on benchmarks that require rapid model update rates, such as USLegis or UNTrade. The code to reproduce our experiments is available at \href{https://orfeld415.github.io/module-decoupling}{this http url}.
摘要
现代的动态图学习方法采用批处理,而不是逐条应用更新。使用批处理使这些技术适用于以极高速度接收图更新的流式场景;但批处理也迫使模型更新频率降低,从而导致性能下降。在这项工作中,我们提出了一种解耦策略,使模型在使用批处理的同时仍能频繁更新。我们将时间图网络的核心模块解耦,并用极少量的可学习参数实现它们,从而开发了轻量级解耦时间图网络(LDTGN),一个在动态图学习上异常高效的模型。LDTGN 在多个动态图基准上得到验证,取得了相当或最先进的结果,同时吞吐量明显高于先前工作。特别地,在需要快速模型更新率的基准(如 USLegis 或 UNTrade)上,我们的方法比先前方法高出20%以上。复现实验的代码见 \href{https://orfeld415.github.io/module-decoupling}{this http url}。
results: 在 biochemical tasks 中显著提高性能。Abstract
Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space. This paper proposes to train GFlowNets with local search which focuses on exploiting high rewarded sample space to resolve this issue. Our main idea is to explore the local neighborhood via destruction and reconstruction guided by backward and forward policies, respectively. This allows biasing the samples toward high-reward solutions, which is not possible for a typical GFlowNet solution generation scheme which uses the forward policy to generate the solution from scratch. Extensive experiments demonstrate a remarkable performance improvement in several biochemical tasks. Source code is available: \url{https://github.com/dbsxodud-11/ls_gfn}.
摘要
流式网络(GFlowNets)是一种抽象采样方法,它们学习一个对 discrete 对象的分布,该分布与对象的奖励相对。GFlowNets 显示出了强大的多样性生成能力,但有时会因为扫描范围太广而偶尔难以保持高奖励样本的生成。这篇论文提议通过在本地搜索中使用破坏和重建,以便通过后向和前向策略分别导航,偏好生成高奖励的样本。这与一般 GFlowNet 的解决方案生成方式不同,后者使用前向策略从零开始生成解决方案。广泛的实验表明,这种方法可以在多个生物化学任务中显著提高性能。源代码可以在以下链接中找到:\url{https://github.com/dbsxodud-11/ls_gfn}。
Tackling Hybrid Heterogeneity on Federated Optimization via Gradient Diversity Maximization
for: This paper focuses on addressing the challenges of hybrid heterogeneity in federated learning, specifically by developing a novel server-side gradient-based optimizer called \textsc{FedAWARE} to mitigate the negative effects of statistical and system heterogeneity on federated optimization.
methods: The proposed optimizer uses adaptive gradient diversity maximization in the server update direction to improve the efficiency of federated learning in heterogeneous settings. Theoretical guarantees are provided to support the effectiveness of the proposed method.
results: Extensive experiments in heterogeneous federated learning scenarios demonstrate that \textsc{FedAWARE} significantly enhances the performance of federated learning across varying degrees of hybrid heterogeneity, outperforming existing methods in terms of convergence rate and final model accuracy.Abstract
Federated learning refers to a distributed machine learning paradigm in which data samples are decentralized and distributed among multiple clients. These samples may exhibit statistical heterogeneity, which refers to data distributions are not independent and identical across clients. Additionally, system heterogeneity, or variations in the computational power of the clients, introduces biases into federated learning. The combined effects of statistical and system heterogeneity can significantly reduce the efficiency of federated optimization. However, the impact of hybrid heterogeneity is not rigorously discussed. This paper explores how hybrid heterogeneity affects federated optimization by investigating server-side optimization. The theoretical results indicate that adaptively maximizing gradient diversity in server update direction can help mitigate the potential negative consequences of hybrid heterogeneity. To this end, we introduce a novel server-side gradient-based optimizer \textsc{FedAWARE} with theoretical guarantees provided. Intensive experiments in heterogeneous federated settings demonstrate that our proposed optimizer can significantly enhance the performance of federated learning across varying degrees of hybrid heterogeneity.
摘要
Exploring Federated Optimization by Reducing Variance of Adaptive Unbiased Client Sampling
results: 根据这些技术,本论文提出了一种新的采样器called K-Vib,它可以在在线 convex 优化中对客户端采样进行优化,并达到了 $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ 的 regret bound,其中 $K$ 是通信预算。这意味着它可以大幅提高 federated 优化的性能。Abstract
Federated Learning (FL) systems usually sample a fraction of clients to conduct a training process. Notably, the variance of global estimates for updating the global model built on information from sampled clients is highly related to federated optimization quality. This paper explores a line of "free" adaptive client sampling techniques in federated optimization, where the server builds promising sampling probabilities and reliable global estimates without requiring additional local communication and computation. We capture a minor variant in the sampling procedure and improve the global estimation accordingly. Based on that, we propose a novel sampler called K-Vib, which solves an online convex optimization respecting client sampling in federated optimization. It achieves a linear speedup in the regret bound $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ with communication budget $K$. As a result, it significantly improves the performance of federated optimization. Theoretical improvements and intensive experiments on classic federated tasks demonstrate our findings.
摘要
联邦学习(Federated Learning,FL)系统通常只抽取一部分客户端参与训练。值得注意的是,基于被抽样客户端信息更新全局模型时,全局估计的方差与联邦优化质量高度相关。本文探讨了一类"免费"的自适应客户端采样技术:服务器在不需要额外本地通信和计算的情况下,构建有前景的采样概率和可靠的全局估计。我们捕捉了采样过程中的一个细小变体,并据此改进全局估计。在此基础上,我们提出了名为 K-Vib 的新采样器,它求解联邦优化中关于客户端采样的在线凸优化问题,并在通信预算 $K$ 下取得线性加速的遗憾界 $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$。因此,它能显著提升联邦优化的性能。理论上的改进以及在经典联邦任务上的大量实验验证了我们的发现。
Probabilistic Block Term Decomposition for the Modelling of Higher-Order Arrays
results: 在synthetic和实际数据上,我们验证了 bayesian 推断过程和提出的 pBTD 方法,并在噪声数据和模型顺序量化问题上进行了应用。结果表明, probabilistic BTD 可以量化适当的多linear结构,提供一种可靠的推断多linear数据中的pattern。Abstract
Tensors are ubiquitous in science and engineering and tensor factorization approaches have become important tools for the characterization of higher order structure. Factorizations includes the outer-product rank Canonical Polyadic Decomposition (CPD) as well as the multi-linear rank Tucker decomposition in which the Block-Term Decomposition (BTD) is a structured intermediate interpolating between these two representations. Whereas CPD, Tucker, and BTD have traditionally relied on maximum-likelihood estimation, Bayesian inference has been use to form probabilistic CPD and Tucker. We propose, an efficient variational Bayesian probabilistic BTD, which uses the von-Mises Fisher matrix distribution to impose orthogonality in the multi-linear Tucker parts forming the BTD. On synthetic and two real datasets, we highlight the Bayesian inference procedure and demonstrate using the proposed pBTD on noisy data and for model order quantification. We find that the probabilistic BTD can quantify suitable multi-linear structures providing a means for robust inference of patterns in multi-linear data.
摘要
张量在科学和工程中无处不在,张量分解方法已成为刻画高阶结构的重要工具。这些分解包括外积秩的典范多元分解(CPD)和多线性秩的Tucker分解,而块项分解(BTD)是介于这两种表示之间的一种结构化中间形式。CPD、Tucker和BTD传统上依赖最大似然估计,而贝叶斯推断已被用来构建概率化的CPD和Tucker。我们提出了一种高效的变分贝叶斯概率BTD,它使用von-Mises Fisher矩阵分布,对构成BTD的多线性Tucker部分施加正交性约束。在合成数据和两个真实数据集上,我们展示了贝叶斯推断流程,并将所提出的pBTD用于含噪数据和模型阶数的确定。我们发现概率BTD能够识别合适的多线性结构,为在多线性数据中稳健地推断模式提供了一种手段。
Robust Ocean Subgrid-Scale Parameterizations Using Fourier Neural Operators
results: 本paper的结果表明,Fourier Neural Operators可以准确地捕捉小规模过程的影响,并且在长期预测中具有较少的误差。Abstract
In climate simulations, small-scale processes shape ocean dynamics but remain computationally expensive to resolve directly. For this reason, their contributions are commonly approximated using empirical parameterizations, which lead to significant errors in long-term projections. In this work, we develop parameterizations based on Fourier Neural Operators, showcasing their accuracy and generalizability in comparison to other approaches. Finally, we discuss the potential and limitations of neural networks operating in the frequency domain, paving the way for future investigation.
摘要
在气候模拟中,小尺度过程塑造着海洋动力学,但直接求解这些过程的计算代价很高。因此,它们的贡献通常用经验参数化来近似,而这会在长期预测中造成显著误差。在这项工作中,我们开发了基于傅里叶神经算子的参数化方案,并与其他方法相比展示了其精度和泛化能力。最后,我们讨论了在频率域中运行的神经网络的潜力与局限,为后续研究铺路。
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
results: 研究发现,dynamic policy gradient 训练能更好地利用有限时间问题的结构,从而获得更优的收敛界。Abstract
Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite horizon MDPs optimal policies are not stationary, policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that the use of dynamic policy gradient training much better exploits the structure of finite-time problems which is reflected in improved convergence bounds.
摘要
Hire When You Need to: Gradual Participant Recruitment for Auction-based Federated Learning
results: 实验结果显示,GPS-AFL可以对联邦学习中的成本优化,并提高绩效。相比最佳先进方法,GPS-AFL可以降低成本33.65%,并提高绩效2.91%。Abstract
The success of federated Learning (FL) depends on the quantity and quality of the data owners (DOs) as well as their motivation to join FL model training. Reputation-based FL participant selection methods have been proposed. However, they still face the challenges of the cold start problem and potential selection bias towards highly reputable DOs. Such a bias can result in lower reputation DOs being prematurely excluded from future FL training rounds, thereby reducing the diversity of training data and the generalizability of the resulting models. To address these challenges, we propose the Gradual Participant Selection scheme for Auction-based Federated Learning (GPS-AFL). Unlike existing AFL incentive mechanisms which generally assume that all DOs required for an FL task must be selected in one go, GPS-AFL gradually selects the required DOs over multiple rounds of training as more information is revealed through repeated interactions. It is designed to strike a balance between cost saving and performance enhancement, while mitigating the drawbacks of selection bias in reputation-based FL. Extensive experiments based on real-world datasets demonstrate the significant advantages of GPS-AFL, which reduces costs by 33.65% and improved total utility by 2.91%, on average compared to the best-performing state-of-the-art approach.
摘要
federated learning(FL)的成功取决于数据所有者(DO)的数量和质量以及他们参与FL模型训练的动机。基于声誉的FL参与者选择方法已被提议,但它们仍面临冷启动问题和可能的选择偏袋向高声誉DOs。这种偏袋会导致低声誉DOs在未来FL训练回合中被排除,从而减少了训练数据的多样性和模型的泛化性。为解决这些挑战,我们提出了 Gradual Participant Selection scheme for Auction-based Federated Learning(GPS-AFL)。与现有的AFL奖励机制不同,GPS-AFL在多个训练回合中逐渐选择需要参与FL任务的DO,以便在更多信息的披露下进行更加精准的选择。它旨在寻求成本节省和性能提高的平衡,同时减少选择偏袋的问题。基于实际数据集的实验表明,GPS-AFL可以规避选择偏袋问题,同时节省33.65%的成本和提高总用度2.91%,相比最佳现有方法。
Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs
results: 实验结果表明,Koopman VAE(KVAE)在多个合成与真实时间序列生成基准上表现出色,无论在规则数据还是不规则数据上均可优化。KVAE 能改进时间序列的判别和预测指标,并能更好地逼近真实分布。Abstract
Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are often unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to these issues, they are (surprisingly) less often considered for time series generation. In this work, we introduce Koopman VAE (KVAE), a new generative framework that is based on a novel design for the model prior, and that can be optimized for either regular and irregular training data. Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map. Our approach enhances generative modeling with two desired features: (i) incorporating domain knowledge can be achieved by leverageing spectral tools that prescribe constraints on the eigenvalues of the linear map; and (ii) studying the qualitative behavior and stablity of the system can be performed using tools from dynamical systems theory. Our results show that KVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks. Whether trained on regular or irregular data, KVAE generates time series that improve both discriminative and predictive metrics. We also present visual evidence suggesting that KVAE learns probability density functions that better approximate empirical ground truth distributions.
摘要
生成逼真的时间序列数据对许多工程和科学应用十分重要。现有工作多用生成对抗网络(GAN)解决这一问题;然而,GAN 在训练中往往不稳定,并且可能出现模式坍塌。变分自编码器(VAE)虽然对这些问题更为鲁棒,却(令人意外地)较少被用于时间序列生成。在这项工作中,我们提出了 Koopman VAE(KVAE),一种基于新颖模型先验设计的生成框架,可针对规则或不规则的训练数据进行优化。受 Koopman 理论启发,我们用一个线性映射来表示潜在条件先验动态。我们的方法为生成建模带来两个理想特性:(一)可以借助谱工具对线性映射的特征值施加约束,从而引入领域知识;(二)可以利用动力系统理论的工具研究系统的定性行为和稳定性。结果表明,KVAE 在多个具有挑战性的合成与真实时间序列生成基准上优于最先进的 GAN 和 VAE 方法。无论在规则还是不规则数据上训练,KVAE 生成的时间序列在判别和预测指标上均有提升。我们还给出了可视化证据,表明 KVAE 学到的概率密度函数能更好地逼近经验真实分布。
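A minimal sketch of the kind of linear latent prior described above (Python/PyTorch; the class name, the identity initialization, and the stability helper are illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn

class LinearLatentPrior(nn.Module):
    """Koopman-style prior: latent states evolve under a learned linear map A."""
    def __init__(self, latent_dim):
        super().__init__()
        self.A = nn.Parameter(torch.eye(latent_dim))  # learnable linear operator

    def forward(self, z_t):
        # mean of the next latent state under the linear prior dynamics
        return z_t @ self.A.T

    def spectral_radius(self):
        # the eigenvalues of A are the handle for stability analysis and constraints
        return torch.linalg.eigvals(self.A).abs().max()
```

Constraining the eigenvalues of A (for example, keeping the spectral radius at or below one) is the point where domain knowledge and stability requirements would enter such a prior.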
Learning adjacency matrix for dynamic graph neural network
paper_authors: Osama Ahmad, Omer Abdul Jalil, Usman Nazir, Murtaza Taj
for: This paper aims to address the challenge of modeling spatio-temporal data using Graph Convolutional Networks (GCNs) by introducing an encoder block to learn missing temporal links in the Block Adjacency Matrix (BA).
methods: The proposed method uses an encoder block to process the BA and predict connections between previously unconnected subgraphs, resulting in a Spatio-Temporal Block Adjacency Matrix (STBAM). The STBAM is then fed into a GNN to capture the complex spatio-temporal topology of the network.
results: The proposed method achieves superior results compared to the state of the art on the benchmark datasets surgVisDom and C2D2, with slightly higher complexity. However, the computational overhead remains significantly lower than conventional non-graph-based methodologies for spatio-temporal data.
methods: 提议的方法使用一个编码器块处理块邻接矩阵(BA),预测先前互不相连的子图之间的连接,得到时空块邻接矩阵(STBAM);随后将STBAM输入图神经网络(GNN),以捕捉网络复杂的时空拓扑。
results: 提议的方法在基准数据集surgVisDom和C2D2上取得了优于当前最佳方法的结果,但复杂度略高;其计算开销仍远低于传统的非图时空数据方法。Abstract
In recent work, [1] introduced the concept of using a Block Adjacency Matrix (BA) for the representation of spatio-temporal data. While their method successfully concatenated adjacency matrices to encapsulate spatio-temporal relationships in a single graph, it formed a disconnected graph. This limitation hampered the ability of Graph Convolutional Networks (GCNs) to perform message passing across nodes belonging to different time steps, as no temporal links were present. To overcome this challenge, we introduce an encoder block specifically designed to learn these missing temporal links. The encoder block processes the BA and predicts connections between previously unconnected subgraphs, resulting in a Spatio-Temporal Block Adjacency Matrix (STBAM). This enriched matrix is then fed into a Graph Neural Network (GNN) to capture the complex spatio-temporal topology of the network. Our evaluations on benchmark datasets, surgVisDom and C2D2, demonstrate that our method, with slightly higher complexity, achieves superior results compared to state-of-the-art results. Our approach's computational overhead remains significantly lower than conventional non-graph-based methodologies for spatio-temporal data.
摘要
在最近的工作中,[1] 提出了使用块邻接矩阵(BA)表示时空数据的思想。他们的方法成功地将多个邻接矩阵拼接起来,把时空关系封装在一张图中,但得到的是一张非连通图。由于不存在时间链接,这一局限使图卷积网络(GCN)无法在属于不同时间步的节点之间进行消息传递。为了解决这一挑战,我们提出了一个专门用于学习这些缺失时间链接的编码器块。该编码器块处理BA,预测先前互不相连的子图之间的连接,从而生成时空块邻接矩阵(STBAM)。这个丰富后的矩阵随后被输入图神经网络(GNN),以捕捉网络复杂的时空拓扑。我们在基准数据集surgVisDom和C2D2上进行了评估,结果表明我们的方法以略高的复杂度取得了优于最先进方法的结果,而其计算开销仍显著低于传统的非图时空数据方法。
Analyzing Key Users’ behavior trends in Volunteer-Based Networks
paper_authors: Nofar Piterman, Tamar Makov, Michael Fire
for: This paper aims to explore the development of volunteer-based social networks and the behavior of key users in these networks.
methods: The authors developed two novel algorithms to analyze the behavior of key users in volunteer-based social networks, including a pattern-based algorithm and a machine learning-based forecasting model.
results: The authors used data from a peer-to-peer food-sharing platform to evaluate their algorithms and found that they could accurately predict future behavior of key users, with an accuracy of up to 89.6%. They identified four main types of key user behavior patterns and were able to forecast which users would become active donors or change their behavior to become mainly recipients.Abstract
Online social networks usage has increased significantly in the last decade and continues to grow in popularity. Multiple social platforms use volunteers as a central component. The behavior of volunteers in volunteer-based networks has been studied extensively in recent years. Here, we explore the development of volunteer-based social networks, primarily focusing on their key users' behaviors and activities. We developed two novel algorithms: the first reveals key user behavior patterns over time; the second utilizes machine learning methods to generate a forecasting model that can predict the future behavior of key users, including whether they will remain active donors or change their behavior to become mainly recipients, and vice-versa. These algorithms allowed us to analyze the factors that significantly influence behavior predictions. To evaluate our algorithms, we utilized data from over 2.4 million users on a peer-to-peer food-sharing online platform. Using our algorithm, we identified four main types of key user behavior patterns that occur over time. Moreover, we succeeded in forecasting future active donor key users and predicting the key users that would change their behavior to donors, with an accuracy of up to 89.6%. These findings provide valuable insights into the behavior of key users in volunteer-based social networks and pave the way for more effective communities-building in the future, while using the potential of machine learning for this goal.
摘要
在过去十年中,在线社交网络的使用量显著增长,并且仍在持续流行。许多社交平台以志愿者为核心组成部分。近年来,志愿者网络中志愿者的行为已得到广泛研究。在这里,我们探讨基于志愿者的社交网络的发展,重点关注其关键用户的行为和活动。我们开发了两个新算法:第一个揭示关键用户随时间变化的行为模式;第二个利用机器学习方法构建预测模型,预测关键用户的未来行为,包括他们是继续作为活跃捐赠者,还是转变为以接收为主,反之亦然。这些算法使我们能够分析显著影响行为预测的因素。为评估算法,我们使用了一个点对点食物分享在线平台上超过240万用户的数据。利用我们的算法,我们识别出四种随时间出现的主要关键用户行为模式;并成功预测了未来的活跃捐赠关键用户,以及将改变行为成为捐赠者的关键用户,准确率最高达89.6%。这些发现为理解志愿者社交网络中关键用户的行为提供了宝贵洞见,并为今后借助机器学习构建更有效的社区铺平了道路。
Machine Learning-Enabled Precision Position Control and Thermal Regulation in Advanced Thermal Actuators
results: 研究人员通过对一种纤维素人工肌进行Position控制,证明了该控制器可以在没有外部传感器的情况下实现精准的位移控制。Abstract
With their unique combination of characteristics - an energy density almost 100 times that of human muscle, and a power density of 5.3 kW/kg, similar to a jet engine's output - Nylon artificial muscles stand out as particularly apt for robotics applications. However, the necessity of integrating sensors and controllers poses a limitation to their practical usage. Here we report a constant power open-loop controller based on machine learning. We show that we can control the position of a nylon artificial muscle without external sensors. To this end, we construct a mapping from a desired displacement trajectory to a required power using an ensemble encoder-style feed-forward neural network. The neural controller is carefully trained on a physics-based denoised dataset and can be fine-tuned to accommodate various types of thermal artificial muscles, irrespective of the presence or absence of hysteresis.
摘要
凭借其独特的特性组合(能量密度接近人体肌肉的100倍,功率密度达5.3 kW/kg,与喷气发动机的输出相当),尼龙人工肌肉特别适合机器人应用。然而,需要集成传感器和控制器这一点限制了其实际使用。我们在此报告一种基于机器学习的恒功率开环控制器,并表明可以在没有外部传感器的情况下控制尼龙人工肌肉的位置。为此,我们使用一个集成的编码器式前馈神经网络,建立从期望位移轨迹到所需功率的映射。该神经控制器在基于物理的去噪数据集上经过精心训练,并可进行微调以适应各种类型的热驱动人工肌肉,无论是否存在迟滞。
Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning
results: 本研究通过实验验证了其算法的有效性,并提供了一种更加多样化和可靠的强化学习策略评估方法。Abstract
Recently, reinforcement learning has gained prominence in modern statistics, with policy evaluation being a key component. Unlike traditional machine learning literature on this topic, our work places emphasis on statistical inference for the parameter estimates computed using reinforcement learning algorithms. While most existing analyses assume random rewards to follow standard distributions, limiting their applicability, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop an online robust policy evaluation procedure, and establish the limiting distribution of our estimator, based on its Bahadur representation. Furthermore, we develop a fully-online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments conducted in real-world reinforcement learning experiments.
摘要
近年来,强化学习在现代统计学中日益受到重视,其中策略评估是关键组成部分。与该主题的传统机器学习文献不同,我们的工作强调对强化学习算法所得参数估计进行统计推断。现有分析大多假设随机奖励服从标准分布,限制了其适用性;而我们将稳健统计的思想引入强化学习,在统一框架内同时处理异常值污染和重尾奖励问题。在本文中,我们提出了一种在线稳健策略评估方法,并基于其 Bahadur 表示建立了估计量的极限分布。此外,我们还提出了一个完全在线的流程,基于渐近分布高效地进行统计推断。本文弥合了稳健统计与强化学习中统计推断之间的鸿沟,为策略评估提供了更灵活、更可靠的途径。最后,我们通过真实强化学习实验中的数值实验验证了算法的有效性。
Improving Knowledge Distillation with Teacher’s Explanation
results: 我们的实验表明,在多个数据集上,KED学生模型可以显著优于复杂度相近的KD学生模型。Abstract
Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This limits the amount of transferred knowledge. In this work, we introduce a novel Knowledge Explaining Distillation (KED) framework, which allows the student to learn not only from the teacher's predictions but also from the teacher's explanations. We propose a class of superfeature-explaining teachers that provide explanation over groups of features, along with the corresponding student model. We also present a method for constructing the superfeatures. We then extend KED to reduce complexity in convolutional neural networks, to allow augmentation with hidden-representation distillation methods, and to work with a limited amount of training data using chimeric sets. Our experiments over a variety of datasets show that KED students can substantially outperform KD students of similar complexity.
摘要
知识蒸馏(KD)借助更强大的教师模型来提升低复杂度学生模型的性能。KD中的教师是一个黑盒模型,只能通过其预测向学生传授知识,这限制了可传递知识的量。在这项工作中,我们提出了一种新的知识解释蒸馏(KED)框架,使学生不仅能从教师的预测中学习,还能从教师的解释中学习。我们提出了一类超特征解释教师,它们对特征组给出解释,并给出相应的学生模型;我们还给出了构造超特征的方法。随后,我们将KED扩展到降低卷积神经网络的复杂度、与隐藏表示蒸馏方法相结合,以及借助嵌合(chimeric)集合在训练数据有限的情况下工作。我们在多个数据集上的实验表明,KED学生可以显著优于复杂度相近的KD学生。
Practical, Private Assurance of the Value of Collaboration
paper_authors: Hassan Jameel Asghar, Zhigang Lu, Zhongrui Zhao, Dali Kaafar
for: The paper is written for the problem of collaborative machine learning between two parties who want to improve the accuracy of their prediction models by sharing their datasets, but they do not want to reveal their models and datasets to each other beforehand.
methods: The paper proposes an interactive protocol based on fully homomorphic encryption (FHE) and label differential privacy to enable collaborative machine learning while preserving the privacy of the parties’ models and datasets. The protocol uses a neural network as the underlying machine learning model.
results: The paper shows that the proposed protocol achieves a significant improvement in accuracy compared to a protocol using entirely FHE operations, and the results are obtained with a time that is many orders of magnitude faster. The security of the protocol is proven in the universal composability framework assuming honest-but-curious parties, but with one party having no expertise in labeling its initial dataset.Abstract
Two parties wish to collaborate on their datasets. However, before they reveal their datasets to each other, the parties want to have the guarantee that the collaboration would be fruitful. We look at this problem from the point of view of machine learning, where one party is promised an improvement on its prediction model by incorporating data from the other party. The parties would only wish to collaborate further if the updated model shows an improvement in accuracy. Before this is ascertained, the two parties would not want to disclose their models and datasets. In this work, we construct an interactive protocol for this problem based on the fully homomorphic encryption scheme over the Torus (TFHE) and label differential privacy, where the underlying machine learning model is a neural network. Label differential privacy is used to ensure that computations are not done entirely in the encrypted domain, which is a significant bottleneck for neural network training according to the current state-of-the-art FHE implementations. We prove the security of our scheme in the universal composability framework assuming honest-but-curious parties, but where one party may not have any expertise in labelling its initial dataset. Experiments show that we can obtain the output, i.e., the accuracy of the updated model, with time many orders of magnitude faster than a protocol using entirely FHE operations.
摘要
两方希望在各自的数据集上开展合作,但在相互披露数据之前,双方希望先获得合作将是有益的保证。我们从机器学习的角度看待这一问题:一方被承诺,通过纳入另一方的数据可以提升其预测模型。只有当更新后的模型在精度上确有提升时,双方才愿意进一步合作;而在确认这一点之前,双方都不愿披露各自的模型和数据集。在这项工作中,我们基于环面上的全同态加密方案(TFHE)和标签差分隐私构建了一个交互式协议,其底层机器学习模型为神经网络。标签差分隐私用于保证计算不必完全在加密域中进行,而按当前最先进的FHE实现,这正是神经网络训练的一个主要瓶颈。我们在通用可组合性框架下证明了方案的安全性,假设参与方是诚实但好奇的,但其中一方可能不具备为其初始数据集打标签的专业能力。实验表明,与完全使用FHE运算的协议相比,我们获得输出(即更新后模型的精度)的时间要快多个数量级。
Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning Framework
results: 实验结果表明,所提出的 SemiFL 方法比传统 FL 方法提升3.2%的准确率,并在 MNIST 数据集上取得高于最先进基准的准确率。Abstract
Under the organization of the base station (BS), wireless federated learning (FL) enables collaborative model training among multiple devices. However, the BS is merely responsible for aggregating local updates during the training process, which incurs a waste of the computational resource at the BS. To tackle this issue, we propose a semi-federated learning (SemiFL) paradigm to leverage the computing capabilities of both the BS and devices for a hybrid implementation of centralized learning (CL) and FL. Specifically, each device sends both local gradients and data samples to the BS for training a shared global model. To improve communication efficiency over the same time-frequency resources, we integrate over-the-air computation for aggregation and non-orthogonal multiple access for transmission by designing a novel transceiver structure. To gain deep insights, we conduct convergence analysis by deriving a closed-form optimality gap for SemiFL and extend the result to two extra cases. In the first case, the BS uses all accumulated data samples to calculate the CL gradient, while a decreasing learning rate is adopted in the second case. Our analytical results capture the destructive effect of wireless communication and show that both FL and CL are special cases of SemiFL. Then, we formulate a non-convex problem to reduce the optimality gap by jointly optimizing the transmit power and receive beamformers. Accordingly, we propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers. Extensive simulation results on two real-world datasets corroborate our theoretical analysis, and show that the proposed SemiFL outperforms conventional FL and achieves 3.2% accuracy gain on the MNIST dataset compared to state-of-the-art benchmarks.
摘要
在基站(BS)的组织下,无线联邦学习(FL)可以让多个设备协作训练模型。然而,在训练过程中BS只负责聚合本地更新,这造成了BS计算资源的浪费。为解决这一问题,我们提出了半联邦学习(SemiFL)范式,利用BS与设备双方的计算能力,混合实现集中式学习(CL)与FL。具体而言,每个设备同时向BS发送本地梯度和数据样本,用于训练共享的全局模型。为了在相同的时频资源上提高通信效率,我们设计了一种新的收发机结构,将空中计算用于聚合,并将非正交多址用于传输。为获得深入的理解,我们推导了SemiFL的闭式最优性差距,完成收敛性分析,并将结果推广到另外两种情形:其一,BS使用所有累积的数据样本计算CL梯度;其二,采用递减的学习率。分析结果刻画了无线通信的破坏性影响,并表明FL和CL都是SemiFL的特例。随后,我们构造了一个非凸问题,通过联合优化发射功率和接收波束成形器来缩小最优性差距,并提出了一个两阶段算法来求解这一困难问题,其中给出了波束成形器的闭式解。在两个真实数据集上的大量仿真结果验证了我们的理论分析,并表明所提出的SemiFL优于传统FL,在MNIST数据集上相比最先进基准获得3.2%的准确率提升。
Heterogeneous Federated Learning Using Knowledge Codistillation
results: 在图像分类和自然语言处理任务上,提出了两种变种方法,可以超越 federated averaging 的限制,并且在只有部分out-of-domain或有限域知识传递数据时,也可以达到良好的效果。同时,双向知识传递允许模型在不同池中的客户端引入域转换。Abstract
Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model performance. To address this issue, we propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with higher capacity. The models exchange information bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a server without sharing parameters. We present two variants of our method, which improve upon federated averaging on image classification and language modeling tasks. We show this technique can be useful even if only out-of-domain or limited in-domain distillation data is available. Additionally, the bi-directional knowledge distillation allows for domain transfer between the models when different pool populations introduce domain shift.
摘要
Federated Averaging 和许多联合学习算法的变种都有一个限制:所有客户端都必须使用同一个模型结构。这会导致许多客户端的可用模型容量不被利用,从而限制模型性能。为解决这个问题,我们提议一种方法,即在整个池中训练一个小型模型,并在一些客户端上训练一个更大的模型,这些客户端具有更高的计算能力。这两个模型通过知识传承进行双向交互,无需在服务器上分享参数。我们提出了两种变种,可以超越联合平均值在图像分类和自然语言处理任务上的性能。我们表明,即使只有部分客户端的数据可用,这种技术仍然可以获得有用的效果。此外,双向知识传承允许模型在不同的池中引入域转移。
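A minimal sketch of the bidirectional distillation step on an unlabeled server batch (Python/PyTorch; the temperature and function name are illustrative choices, not taken from the paper):

```python
import torch
import torch.nn.functional as F

def codistillation_losses(small_logits, large_logits, T=2.0):
    """Each model is trained to match the other's (detached) soft predictions."""
    log_p_small = F.log_softmax(small_logits / T, dim=-1)
    log_p_large = F.log_softmax(large_logits / T, dim=-1)
    loss_small = F.kl_div(log_p_small, log_p_large.detach().exp(),
                          reduction="batchmean") * T * T
    loss_large = F.kl_div(log_p_large, log_p_small.detach().exp(),
                          reduction="batchmean") * T * T
    return loss_small, loss_large
```

Only predictions on the shared unlabeled dataset cross between the two model pools, so no parameters need to be exchanged even though the architectures differ.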
Exact and soft boundary conditions in Physics-Informed Neural Networks for the Variable Coefficient Poisson equation
results: 研究发现,soft loss基于的BC和精确距离函数基于的BC具有不同的优劣点,选择合适的BC决策方法可以提高PINN模型的拟合精度。此外,本研究还提供了实践中如何实现这些PINN模型的代码和步骤示例。Abstract
Boundary conditions (BCs) are a key component in every Physics-Informed Neural Network (PINN). By defining the solution to partial differential equations (PDEs) along domain boundaries, BCs constrain the underlying boundary value problem (BVP) that a PINN tries to approximate. Without them, unique PDE solutions may not exist and finding approximations with PINNs would be a challenging, if not impossible task. This study examines how soft loss-based and exact distance function-based BC imposition approaches differ when applied in PINNs. The well known variable coefficient Poisson equation serves as the target PDE for all PINN models trained in this work. Besides comparing BC imposition approaches, the goal of this work is to also provide resources on how to implement these PINNs in practice. To this end, Keras models with Tensorflow backend as well as a Python notebook with code examples and step-by-step explanations on how to build soft/exact BC PINNs are published alongside this review.
摘要
边界条件(BC)是每个物理信息神经网络(PINN)中的关键组成部分。通过规定偏微分方程(PDE)解在区域边界上的取值,BC约束了PINN试图逼近的边值问题(BVP)。没有BC,PDE可能不存在唯一解,用PINN寻找近似解也将变得困难,甚至不可能。本研究考察了基于软损失的BC施加方式与基于精确距离函数的BC施加方式在PINN中的差异。本工作训练的所有PINN模型都以著名的变系数泊松方程为目标PDE。除了比较BC施加方式之外,本工作还旨在提供在实践中实现这些PINN的资源。为此,随本综述一同发布了基于Tensorflow后端的Keras模型,以及一个包含代码示例并逐步讲解如何构建软/精确BC PINN的Python notebook。
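To make the two BC-imposition styles concrete, here is a minimal sketch in Python with TensorFlow (matching the Keras/TensorFlow setting mentioned above); the function names and the form of the distance function are illustrative assumptions, not the published implementation.

```python
import tensorflow as tf

def soft_bc_loss(model, x_boundary, g_boundary, weight=1.0):
    """Soft BC: the Dirichlet condition u = g on the boundary enters as a penalty term."""
    u_pred = model(x_boundary)
    return weight * tf.reduce_mean(tf.square(u_pred - g_boundary))

def exact_bc_solution(model, x, g_fn, d_fn):
    """Exact BC: compose the network with a distance function d(x) that vanishes on
    the boundary, so u(x) = g(x) + d(x) * NN(x) satisfies the condition by construction."""
    return g_fn(x) + d_fn(x) * model(x)
```

With the soft variant the BC residual competes with the PDE residual in the total loss, whereas with the exact variant the optimizer only sees the PDE residual; that trade-off is what the comparison above examines.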
Joint Design of Protein Sequence and Structure based on Motifs
results: 实验结果表明,GeoPro方法在两个生物学重要的铁蛋白 dataset上都超过了多个强基eline。特别是,我们的方法发现了不在蛋白质数据库(PDB)和UniProt中的新的β-恶啉蛋白和肌球蛋白,这些蛋白质具有稳定的折叠和活性站点环境,表明它们具有出色的生物功能。Abstract
Designing novel proteins with desired functions is crucial in biology and chemistry. However, most existing work focus on protein sequence design, leaving protein sequence and structure co-design underexplored. In this paper, we propose GeoPro, a method to design protein backbone structure and sequence jointly. Our motivation is that protein sequence and its backbone structure constrain each other, and thus joint design of both can not only avoid nonfolding and misfolding but also produce more diverse candidates with desired functions. To this end, GeoPro is powered by an equivariant encoder for three-dimensional (3D) backbone structure and a protein sequence decoder guided by 3D geometry. Experimental results on two biologically significant metalloprotein datasets, including $\beta$-lactamases and myoglobins, show that our proposed GeoPro outperforms several strong baselines on most metrics. Remarkably, our method discovers novel $\beta$-lactamases and myoglobins which are not present in protein data bank (PDB) and UniProt. These proteins exhibit stable folding and active site environments reminiscent of those of natural proteins, demonstrating their excellent potential to be biologically functional.
摘要
设计具有目标功能的新蛋白质是生物学和化学中的关键问题。然而,现有工作大多集中在蛋白质序列设计上,蛋白质序列与结构的协同设计仍有待探索。在本文中,我们提出了GeoPro方法,用于联合设计蛋白质主链结构与序列。我们的动机是:蛋白质序列与其主链结构相互约束,因此两者的联合设计不仅可以避免不折叠和错误折叠,还能产生更多样、具有目标功能的候选蛋白。为此,GeoPro由一个针对三维(3D)主链结构的等变编码器和一个由3D几何引导的蛋白质序列解码器驱动。在两个具有重要生物学意义的金属蛋白数据集(β-内酰胺酶和肌红蛋白)上的实验结果表明,我们提出的GeoPro在大多数指标上优于多个强基线。值得注意的是,我们的方法发现了蛋白质数据库(PDB)和UniProt中不存在的新β-内酰胺酶和肌红蛋白,这些蛋白质表现出与天然蛋白相似的稳定折叠和活性位点环境,显示出优秀的生物功能潜力。
results: 本文提供了一种有效的算法,并证明了其统计准确性。在 synthetic 数据和实际数据上进行了深入的数值实验,证明了模型的强大性。Abstract
Graphs, depicting the interrelations between variables, has been widely used as effective side information for accurate data recovery in various matrix/tensor recovery related applications. In this paper, we study the tensor completion problem with graph information. Current research on graph-regularized tensor completion tends to be task-specific, lacking generality and systematic approaches. Furthermore, a recovery theory to ensure performance remains absent. Moreover, these approaches overlook the dynamic aspects of graphs, treating them as static akin to matrices, even though graphs could exhibit dynamism in tensor-related scenarios. To confront these challenges, we introduce a pioneering framework in this paper that systematically formulates a novel model, theory, and algorithm for solving the dynamic graph regularized tensor completion problem. For the model, we establish a rigorous mathematical representation of the dynamic graph, based on which we derive a new tensor-oriented graph smoothness regularization. By integrating this regularization into a tensor decomposition model based on transformed t-SVD, we develop a comprehensive model simultaneously capturing the low-rank and similarity structure of the tensor. In terms of theory, we showcase the alignment between the proposed graph smoothness regularization and a weighted tensor nuclear norm. Subsequently, we establish assurances of statistical consistency for our model, effectively bridging a gap in the theoretical examination of the problem involving tensor recovery with graph information. In terms of the algorithm, we develop a solution of high effectiveness, accompanied by a guaranteed convergence, to address the resulting model. To showcase the prowess of our proposed model in contrast to established ones, we provide in-depth numerical experiments encompassing synthetic data as well as real-world datasets.
摘要
图能够刻画变量之间的相互关系,已被广泛用作各种矩阵/张量恢复相关应用中实现精确数据恢复的有效辅助信息。在本文中,我们研究带图信息的张量补全问题。目前关于图正则化张量补全的研究往往是面向特定任务的,缺乏普适性和系统性的方法,同时也缺少保证性能的恢复理论。此外,这些方法忽视了图的动态特性,把图当作矩阵那样的静态对象,而在与张量相关的场景中,图可能是动态的。为应对这些挑战,我们在本文中提出了一个开创性的框架,系统地给出求解动态图正则化张量补全问题的新模型、理论和算法。在模型方面,我们建立了动态图的严格数学表示,并据此推导出一种面向张量的图光滑性正则化;将这一正则化与基于变换t-SVD的张量分解模型结合,我们得到了能够同时刻画张量低秩结构与相似性结构的完整模型。在理论方面,我们展示了所提出的图光滑性正则化与加权张量核范数之间的对应关系,进而为我们的模型建立了统计一致性保证,填补了带图信息的张量恢复问题在理论分析上的空白。在算法方面,我们给出了一个高效且有收敛保证的求解方法。为了展示所提模型相对于已有模型的优势,我们在合成数据和真实数据集上进行了深入的数值实验。
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
paper_authors: Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu
for: 这个论文探讨了使用梯度下降(GD)训练神经网络时,神经网络表现出了一些奇异的泛化行为。
methods: 该论文使用了两层ReLU网络和梯度下降(GD)训练方法。
results: 研究发现,在XOR集群数据上,一部分训练标签被随机变化,并且使用GD训练神经网络时,神经网络可以在训练数据上达到100%的准确率,但在测试数据上却具有近乎随机的性能。在后续训练步骤中,神经网络的测试准确率逐渐提高,并且仍然可以fitRandom labels in the training data,这是一种“搁废”现象。这是神经网络分类时,当数据分布不可分离时,首次出现的恰当过拟合现象。Abstract
Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a "grokking" phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps.
摘要
通过梯度下降(GD)训练的神经网络表现出了一些令人惊讶的泛化行为。其一,它们可以在含噪训练数据上实现完美拟合,却仍能接近最优地泛化,说明过拟合有时是良性的。其二,它们可能先经历一段经典的、有害的过拟合阶段(完美拟合训练数据,但在测试数据上的表现近乎随机),随后在训练后期"顿悟"(grokking),转向接近最优的泛化。在这项工作中,我们证明这两种现象在用GD训练的两层ReLU网络上都会可证明地发生:训练数据为XOR簇数据,其中固定比例的训练标签被翻转。在此设定下,我们证明:GD的第一步之后,网络达到100%的训练准确率,完美拟合训练数据中的含噪标签,但测试准确率近乎随机;在之后的某个训练步,网络在仍然拟合训练数据中随机标签的同时,取得接近最优的测试准确率,呈现"顿悟"现象。这给出了数据分布非线性可分时神经网络分类中良性过拟合的首个理论结果。我们的证明依赖于分析GD下的特征学习过程:该分析表明,网络在一步之后实现的是一个不可泛化的线性分类器,并在后续步骤中逐渐学到可泛化的特征。
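A toy data generator matching the setting described above, XOR cluster data with a constant fraction of flipped labels (Python/NumPy; the cluster means, noise level, and flip fraction are illustrative choices, not the paper's exact parameters):

```python
import numpy as np

def xor_cluster_data(n, d, noise_std=0.5, flip_frac=0.15, seed=0):
    """Sample n points in R^d from four Gaussian clusters with XOR label structure."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = np.zeros(d), np.zeros(d)
    mu1[0], mu2[1] = 1.0, 1.0
    centers = np.stack([mu1, -mu1, mu2, -mu2])
    labels = np.array([1.0, 1.0, -1.0, -1.0])   # +/-mu1 -> +1, +/-mu2 -> -1 (XOR structure)
    idx = rng.integers(0, 4, size=n)
    X = centers[idx] + noise_std * rng.standard_normal((n, d))
    y = labels[idx].copy()
    flip = rng.random(n) < flip_frac             # flip a constant fraction of labels
    y[flip] *= -1.0
    return X, y
```

No linear classifier separates the two classes of this distribution, which is why the one-step behavior of GD described above can only memorize rather than generalize.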
Quantifying and mitigating the impact of label errors on model disparity metrics
results: 研究发现,标签错误会导致模型的不同性指标受到影响,特别是对少数群体的影响。此外,作者还提出了一种方法来衡量标签错误对模型的影响,并证明了这种方法可以提高模型的不同性指标。Abstract
Errors in labels obtained via human annotation adversely affect a model's performance. Existing approaches propose ways to mitigate the effect of label error on a model's downstream accuracy, yet little is known about its impact on a model's disparity metrics. Here we study the effect of label error on a model's disparity metrics. We empirically characterize how varying levels of label error, in both training and test data, affect these disparity metrics. We find that group calibration and other metrics are sensitive to train-time and test-time label error -- particularly for minority groups. This disparate effect persists even for models trained with noise-aware algorithms. To mitigate the impact of training-time label error, we present an approach to estimate the influence of a training input's label on a model's group disparity metric. We empirically assess the proposed approach on a variety of datasets and find significant improvement, compared to alternative approaches, in identifying training inputs that improve a model's disparity metric. We complement the approach with an automatic relabel-and-finetune scheme that produces updated models with, provably, improved group calibration error.
摘要
通过人工标注获得的标签中的错误会对模型性能产生不利影响。现有方法提出了缓解标签错误对模型下游精度影响的途径,但其对模型差异性指标的影响却鲜为人知。在这里,我们研究标签错误对模型差异性指标的影响。我们用实证方式刻画了训练和测试数据中不同程度的标签错误如何影响这些差异性指标。我们发现,群体校准等指标对训练时和测试时的标签错误十分敏感,对少数群体尤甚;即使模型使用噪声感知算法训练,这种不均衡的影响依然存在。为减轻训练时标签错误的影响,我们提出了一种估计单个训练样本标签对模型群体差异性指标影响的方法。我们在多个数据集上进行了实证评估,发现与其他方法相比,该方法在识别能改进模型差异性指标的训练样本方面有显著提升。我们还补充了一个自动重新标注并微调的方案,可证明地得到群体校准误差得到改进的更新模型。
QuATON: Quantization Aware Training of Optical Neurons
results: 我们在文献中提出的Diffractive Deep Neural Network(D2NN)的实现中,运用了我们的方法,实现了对于光学阶层实像和阶层物体的分类。我们在不同的量化水平和数据集上进行了广泛的实验,展示了我们的方法能够实现ONA设计的稳定性。Abstract
Optical neural architectures (ONAs) use coding elements with optimized physical parameters to perform intelligent measurements. However, fabricating ONAs while maintaining design performances is challenging. Limitations in fabrication techniques often limit the realizable precision of the trained parameters. Physical constraints may also limit the range of values the physical parameters can hold. Thus, ONAs should be trained within the implementable constraints. However, such physics-based constraints reduce the training objective to a constrained optimization problem, making it harder to optimize with existing gradient-based methods. To alleviate these critical issues that degrade performance from simulation to realization we propose a physics-informed quantization-aware training framework. Our approach accounts for the physical constraints during the training process, leading to robust designs. We evaluate our approach on an ONA proposed in the literature, named a diffractive deep neural network (D2NN), for all-optical phase imaging and for classification of phase objects. With extensive experiments on different quantization levels and datasets, we show that our approach leads to ONA designs that are robust to quantization noise.
摘要
光学神经架构(ONA)使用具有优化物理参数的编码元件来执行智能测量。然而,在保持设计性能的同时制造ONA具有挑战性:制造工艺的限制往往限制了训练所得参数可实现的精度,物理约束也可能限制物理参数的取值范围。因此,ONA应在可实现的约束内进行训练。然而,这类基于物理的约束会把训练目标变成受约束的优化问题,使现有的基于梯度的方法更难优化。为缓解这些从仿真到实现过程中导致性能下降的关键问题,我们提出了一个物理信息的量化感知训练框架。我们的方法在训练过程中考虑物理约束,从而得到鲁棒的设计。我们在文献中提出的一种ONA(衍射深度神经网络,D2NN)上评估了该方法,任务包括全光相位成像和相位物体分类。通过在不同量化级别和数据集上的大量实验,我们表明该方法能够得到对量化噪声鲁棒的ONA设计。
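The abstract does not spell out the quantization mechanism, but a common way to train under fabrication-limited precision is a straight-through estimator over a bounded, discretized parameter range. The following is a generic sketch of that standard technique (Python/PyTorch), not the paper's specific method.

```python
import torch

class QuantizeParam(torch.autograd.Function):
    """Quantize a physical parameter to a fixed number of levels within [lo, hi],
    passing gradients straight through to the underlying latent parameter."""
    @staticmethod
    def forward(ctx, theta, levels, lo, hi):
        step = (hi - lo) / (levels - 1)
        theta_c = theta.clamp(lo, hi)
        return torch.round((theta_c - lo) / step) * step + lo

    @staticmethod
    def backward(ctx, grad_output):
        # straight-through estimator: ignore the non-differentiable rounding
        return grad_output, None, None, None

# usage: quantized phase values in [0, 2*pi) with 16 fabrication levels (illustrative)
phase = torch.rand(64, 64, requires_grad=True)
phase_q = QuantizeParam.apply(phase, 16, 0.0, 6.2831853)
```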
Parameterized Convex Minorant for Objective Function Approximation in Amortized Optimization
for: 这 paper 的目的是提出一种基于参数化凸下界(parameterized convex minorant,PCM)的方法,用于摊销优化中目标函数的逼近。
methods: 该方法用PCM与一个非负间隙函数之和来逼近目标函数,其中目标函数逼近器由在优化变量上为凸的PCM从下方界定。该逼近器是连续函数的万能逼近器,且PCM的全局最小值点即达到目标函数逼近器的全局最小值,因此可以通过一次凸优化获得。
results: 在numerical simulation中,该方法可以快速和可靠地learn objective functions和global minimizer,并且可以用于non-parameterized-convex objective function approximation和learning-based nonlinear model predictive control。Abstract
Parameterized convex minorant (PCM) method is proposed for the approximation of the objective function in amortized optimization. In the proposed method, the objective function approximator is expressed by the sum of a PCM and a nonnegative gap function, where the objective function approximator is bounded from below by the PCM convex in the optimization variable. The proposed objective function approximator is a universal approximator for continuous functions, and the global minimizer of the PCM attains the global minimum of the objective function approximator. Therefore, the global minimizer of the objective function approximator can be obtained by a single convex optimization. As a realization of the proposed method, extended parameterized log-sum-exp network is proposed by utilizing a parameterized log-sum-exp network as the PCM. Numerical simulation is performed for non-parameterized-convex objective function approximation and for learning-based nonlinear model predictive control to demonstrate the performance and characteristics of the proposed method. The simulation results support that the proposed method can be used to learn objective functions and to find the global minimizer reliably and quickly by using convex optimization algorithms.
Summary
The parameterized convex minorant (PCM) method is proposed for approximating the objective function in amortized optimization. In the proposed method, the objective function approximator is expressed as the sum of a PCM and a nonnegative gap function, so the approximator is bounded from below by the PCM, which is convex in the optimization variable. The approximator is a universal approximator for continuous functions, and the global minimizer of the objective function approximator can be obtained through a single convex optimization. As a realization of the method, an extended parameterized log-sum-exp network is proposed, using a parameterized log-sum-exp network as the PCM. Numerical simulations on non-parameterized-convex objective function approximation and on learning-based nonlinear model predictive control demonstrate that the method can learn objective functions and find the global minimizer reliably and quickly using convex optimization algorithms.
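To make the construction concrete, here is a minimal PyTorch sketch of one possible reading: a parameterized log-sum-exp network that is convex in the optimization variable x (with coefficients produced from a context vector c), plus a nonnegative gap network. The module names and the exact form of the extended network are assumptions; the paper's realization may differ.

```python
import torch

class LogSumExpPCM(torch.nn.Module):
    """Log-sum-exp of affine functions of x: convex and smooth in x for any parameters.

    The affine coefficients are produced from a context vector c, which is the
    "parameterized" part of the minorant.
    """
    def __init__(self, x_dim, c_dim, n_terms=16, hidden=64):
        super().__init__()
        self.n_terms, self.x_dim = n_terms, x_dim
        self.hyper = torch.nn.Sequential(
            torch.nn.Linear(c_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, n_terms * (x_dim + 1)),
        )

    def forward(self, x, c):
        coef = self.hyper(c).view(-1, self.n_terms, self.x_dim + 1)
        a, b = coef[..., :self.x_dim], coef[..., self.x_dim]
        affine = torch.einsum("bki,bi->bk", a, x) + b        # k affine functions of x
        return torch.logsumexp(affine, dim=-1)               # convex lower envelope

class PCMApproximator(torch.nn.Module):
    """Objective approximator f_hat(x, c) = PCM(x, c) + nonnegative gap(x, c)."""
    def __init__(self, x_dim, c_dim):
        super().__init__()
        self.pcm = LogSumExpPCM(x_dim, c_dim)
        self.gap = torch.nn.Sequential(
            torch.nn.Linear(x_dim + c_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, 1), torch.nn.Softplus(),      # keeps the gap nonnegative
        )

    def forward(self, x, c):
        return self.pcm(x, c) + self.gap(torch.cat([x, c], dim=-1)).squeeze(-1)
```

Because the PCM here is a log-sum-exp of affine functions of x, minimizing it over x for a fixed context is a smooth convex problem, which is what allows the global minimizer to be recovered with a single convex optimization.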
Stochastic Thermodynamics of Learning Generative Parametric Probabilistic Models
results: The analysis shows that the SGD optimizer acts as a work source and that the model learns through the dissipation of heat during sample generation, which increases the thermodynamic entropy of the PPM parameter subsystem and thereby determines the probability distribution the model learns. This view also offers a thermodynamic perspective on the generalization power of over-parameterized models.
Abstract
We have formulated generative machine learning problems as the time evolution of Parametric Probabilistic Models (PPMs), inherently rendering a thermodynamic process. Then, we have studied the thermodynamic exchange between the model's parameters, denoted as $\Theta$, and the model's generated samples, denoted as $X$. We demonstrate that the training dataset and the action of the Stochastic Gradient Descent (SGD) optimizer serve as a work source that governs the time evolution of these two subsystems. Our findings reveal that the model learns through the dissipation of heat during the generation of samples $X$, leading to an increase in the entropy of the model's parameters, $\Theta$. Thus, the parameter subsystem acts as a heat reservoir, effectively storing the learned information. Furthermore, the role of the model's parameters as a heat reservoir provides valuable thermodynamic insights into the generalization power of over-parameterized models. This approach offers an unambiguous framework for computing information-theoretic quantities within deterministic neural networks by establishing connections with thermodynamic variables. To illustrate the utility of this framework, we introduce two information-theoretic metrics: Memorized-information (M-info) and Learned-information (L-info), which trace the dynamic flow of information during the learning process of PPMs.
Summary
We formulate generative machine learning problems as the time evolution of Parametric Probabilistic Models (PPMs), which naturally renders a thermodynamic process. We then study the thermodynamic exchange between the model's parameters $\Theta$ and the generated samples $X$. We find that the training dataset and the action of the Stochastic Gradient Descent (SGD) optimizer together serve as a work source that governs the time evolution of these two subsystems. Our findings show that the model learns through the dissipation of heat during the generation of samples $X$, leading to an increase in the entropy of the model's parameters $\Theta$; the parameter subsystem therefore acts as a heat reservoir, effectively storing the learned information. The role of the parameters as a heat reservoir also provides valuable thermodynamic insight into the generalization power of over-parameterized models. The approach offers an unambiguous framework for computing information-theoretic quantities within deterministic neural networks by connecting them to thermodynamic variables. To illustrate its utility, we introduce two information-theoretic metrics, Memorized-information (M-info) and Learned-information (L-info), which trace the dynamic flow of information during the learning process of PPMs.
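Purely to make the notion of parameter entropy concrete, the toy sketch below fits a diagonal Gaussian to an ensemble of independently trained parameter vectors and reports the change in differential entropy. It is a crude stand-in under a strong Gaussian assumption, not the paper's M-info or L-info metrics.

```python
import numpy as np

def gaussian_entropy(samples):
    """Differential entropy of a diagonal-Gaussian fit to an ensemble of parameter vectors.

    samples: array of shape (n_runs, n_params), one row per independently trained model.
    """
    var = samples.var(axis=0) + 1e-12
    return 0.5 * np.sum(np.log(2 * np.pi * np.e * var))

# Toy usage: entropy of the parameter "subsystem" before and after training,
# estimated over an ensemble of runs that differ only in SGD noise.
rng = np.random.default_rng(0)
theta_init = rng.normal(scale=0.1, size=(32, 100))            # 32 runs, 100 parameters
theta_trained = theta_init + rng.normal(scale=0.5, size=(32, 100))
print(gaussian_entropy(theta_trained) - gaussian_entropy(theta_init))   # entropy change of Theta
```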
A Recipe for Improved Certifiable Robustness: Capacity and Data
paper_authors: Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson
for: This work aims to improve the performance of robust training under Lipschitz constraints.
methods: The paper uses a combination of new techniques and design optimizations, including Cholesky-orthogonalized residual dense layers and filtered generative data augmentation.
results: The approach significantly improves robust training performance across a variety of datasets and perturbation sizes, setting a new state of the art in verified robust accuracy (VRA). Specifically, the addition of large Cholesky-orthogonalized residual dense layers and filtered generative data augmentation improved the state-of-the-art verified robust accuracy by up to 8.5 percentage points.
Abstract
A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training. However, effectively adding capacity under stringent Lipschitz constraints has proven more difficult than it may seem, evident in the fact that state-of-the-art approaches tend more towards \emph{underfitting} than overfitting. Moreover, we posit that a lack of careful exploration of the design space for Lipschitz-based approaches has left potential performance gains on the table. In this work, we provide a more comprehensive evaluation to better uncover the potential of Lipschitz-based certification methods. Using a combination of novel techniques, design optimizations, and synthesis of prior work, we are able to significantly improve the state-of-the-art \emph{verified robust accuracy} (VRA) for deterministic certification on a variety of benchmark datasets, and over a range of perturbation sizes. Of particular note, we discover that the addition of large "Cholesky-orthogonalized residual dense" layers to the end of existing state-of-the-art Lipschitz-controlled ResNet architectures is especially effective for increasing network capacity and performance. Combined with filtered generative data augmentation, our final results further the state-of-the-art deterministic VRA by up to 8.5 percentage points. Code is available at \url{https://github.com/hukkai/liresnet}.
Summary
A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training, yet effectively adding capacity under strict Lipschitz constraints is harder than it may seem: state-of-the-art methods tend toward underfitting rather than overfitting. We also argue that the design space of Lipschitz-based approaches has not been explored carefully, leaving potential performance gains on the table. In this work we provide a more comprehensive evaluation to better uncover the potential of Lipschitz-based certification methods. Using a combination of novel techniques, design optimizations, and a synthesis of prior work, we significantly improve the state-of-the-art verified robust accuracy (VRA) for deterministic certification on a variety of benchmark datasets and across a range of perturbation sizes. In particular, we find that adding large "Cholesky-orthogonalized residual dense" layers to the end of existing Lipschitz-controlled ResNet architectures is especially effective at increasing network capacity and performance. Combined with filtered generative data augmentation, our final results advance state-of-the-art deterministic VRA by up to 8.5 percentage points. Code is available at \url{https://github.com/hukkai/liresnet}.
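The abstract names the layer but not its construction; the PyTorch sketch below is one plausible reading, in which a weight matrix is orthogonalized through a Cholesky factor of its Gram matrix so that a simple residual dense block stays 1-Lipschitz. It is a guess at the idea, not the code in the linked repository.

```python
import torch

def cholesky_orthogonalize(V, eps=1e-5):
    """Return W with (approximately) orthonormal rows, obtained from V via a Cholesky factor.

    With L = chol(V V^T + eps*I), W = L^{-1} V satisfies W W^T ~= I, so the
    linear map x -> W x is norm-preserving and hence 1-Lipschitz.
    """
    gram = V @ V.t() + eps * torch.eye(V.shape[0], device=V.device)
    L = torch.linalg.cholesky(gram)
    return torch.linalg.solve_triangular(L, V, upper=False)

class ResidualDense(torch.nn.Module):
    """1-Lipschitz residual dense block: y = (x + relu(W x + b)) / 2 with orthogonalized W."""
    def __init__(self, dim):
        super().__init__()
        self.V = torch.nn.Parameter(torch.randn(dim, dim) / dim ** 0.5)
        self.b = torch.nn.Parameter(torch.zeros(dim))

    def forward(self, x):
        W = cholesky_orthogonalize(self.V)
        h = torch.nn.functional.relu(x @ W.t() + self.b)   # ReLU is 1-Lipschitz
        return 0.5 * (x + h)                               # average of two 1-Lipschitz maps
```

Averaging the identity branch with a 1-Lipschitz branch keeps the overall block 1-Lipschitz, which is what lets extra dense capacity be stacked at the end of a Lipschitz-controlled network without loosening the certificate.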
Ophiuchus: Scalable Modeling of Protein Structures through Hierarchical Coarse-graining SO(3)-Equivariant Autoencoders
results: Trained on contiguous fragments of PDB monomers, the model reconstructs structures across different compression rates, and experiments on the learned latent space, including conformational interpolation against PDBFlex snapshots and latent diffusion sampling, demonstrate that Ophiuchus is a scalable and reliable basis for protein modeling and generation.
Abstract
Three-dimensional native states of natural proteins display recurring and hierarchical patterns. Yet, traditional graph-based modeling of protein structures is often limited to operate within a single fine-grained resolution, and lacks hourglass neural architectures to learn those high-level building blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant coarse-graining model that efficiently operates on all heavy atoms of standard protein residues, while respecting their relevant symmetries. Our model departs from current approaches that employ graph modeling, instead focusing on local convolutional coarsening to model sequence-motif interactions in log-linear length complexity. We train Ophiuchus on contiguous fragments of PDB monomers, investigating its reconstruction capabilities across different compression rates. We examine the learned latent space and demonstrate its prompt usage in conformational interpolation, comparing interpolated trajectories to structure snapshots from the PDBFlex dataset. Finally, we leverage denoising diffusion probabilistic models (DDPM) to efficiently sample readily-decodable latent embeddings of diverse miniproteins. Our experiments demonstrate Ophiuchus to be a scalable basis for efficient protein modeling and generation.
Summary
Three-dimensional native states of natural proteins display recurring and hierarchical patterns. Yet traditional graph-based modeling of protein structures is often limited to a single fine-grained resolution and lacks hourglass neural architectures to learn those high-level building blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant coarse-graining model that efficiently operates on all heavy atoms of standard protein residues while respecting their relevant symmetries. Rather than graph modeling, our model relies on local convolutional coarsening to model sequence-motif interactions in log-linear length complexity. We train Ophiuchus on contiguous fragments of PDB monomers and investigate its reconstruction capabilities across different compression rates. We examine the learned latent space and demonstrate its use in conformational interpolation, comparing interpolated trajectories to structure snapshots from the PDBFlex dataset. Finally, we leverage denoising diffusion probabilistic models (DDPM) to efficiently sample readily decodable latent embeddings of diverse miniproteins. Our experiments demonstrate that Ophiuchus is a scalable basis for efficient protein modeling and generation.
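As a structural illustration only, the sketch below builds an hourglass autoencoder that coarsens a sequence of per-residue feature vectors with strided 1D convolutions and decodes it back. It deliberately ignores the SO(3)-equivariant, all-heavy-atom machinery that Ophiuchus actually uses; dimensions and module names are assumptions.

```python
import torch

class HourglassCoarseGrainAE(torch.nn.Module):
    """Hourglass autoencoder over a protein fragment of per-residue feature vectors.

    Strided 1D convolutions coarsen the sequence (each level halves its length),
    and transposed convolutions decode it back to full resolution.
    """
    def __init__(self, feat_dim=64, levels=3):
        super().__init__()
        enc, dec = [], []
        for _ in range(levels):
            enc += [torch.nn.Conv1d(feat_dim, feat_dim, kernel_size=4, stride=2, padding=1),
                    torch.nn.GELU()]
            dec += [torch.nn.ConvTranspose1d(feat_dim, feat_dim, kernel_size=4, stride=2, padding=1),
                    torch.nn.GELU()]
        self.encoder = torch.nn.Sequential(*enc)
        self.decoder = torch.nn.Sequential(*dec)

    def forward(self, x):              # x: (batch, feat_dim, n_residues)
        z = self.encoder(x)            # coarse latent, length n_residues / 2**levels
        return self.decoder(z), z

model = HourglassCoarseGrainAE()
x = torch.randn(2, 64, 128)            # 2 fragments of 128 residues each
recon, latent = model(x)
loss = torch.nn.functional.mse_loss(recon, x)
```

The coarse latent sequence produced by the bottleneck is the kind of compressed representation over which a latent diffusion model can then be trained for generation.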
results: The experimental results are competitive with existing methods, indicating that this diffusion perspective is a simple, scalable, and effective direction for sequential decision-making.
Abstract
Diffusion models are a powerful class of generative models capable of mapping random noise in high-dimensional spaces to a target manifold through iterative denoising. In this work, we present a novel perspective on goal-conditioned reinforcement learning by framing it within the context of diffusion modeling. Analogous to the diffusion process, where Gaussian noise is used to create random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states. We then learn a goal-conditioned policy analogous to the score function. This approach, which we call Merlin, can reach predefined or novel goals from an arbitrary initial state without learning a separate value function. We consider three choices for the noise model to replace Gaussian noise in diffusion - reverse play from the buffer, reverse dynamics model, and a novel non-parametric approach. We theoretically justify our approach and validate it on offline goal-reaching tasks. Empirical results are competitive with state-of-the-art methods, which suggests this perspective on diffusion for RL is a simple, scalable, and effective direction for sequential decision-making.
Summary
Diffusion models are a powerful class of generative models that map random noise in high-dimensional spaces to a target manifold through iterative denoising. In this work we present a novel perspective on goal-conditioned reinforcement learning by framing it in the context of diffusion modeling. Analogous to the diffusion process, in which Gaussian noise creates random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states, and we then learn a goal-conditioned policy analogous to the score function. This approach, which we call Merlin, can reach predefined or novel goals from an arbitrary initial state without learning a separate value function. We consider three choices of noise model to replace Gaussian noise in diffusion: reverse play from the buffer, a reverse dynamics model, and a novel non-parametric approach. We theoretically justify the approach and validate it on offline goal-reaching tasks. Empirical results are competitive with state-of-the-art methods, suggesting this diffusion perspective on RL is a simple, scalable, and effective direction for sequential decision-making.
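The sketch below is a guess at the simplest of the three noise models, "reverse play from the buffer": recorded trajectories are read backwards from their final state (treated as the goal), and a goal-conditioned policy is fit by behavioral cloning to step back toward that goal. The function and class names are hypothetical and the details are not Merlin's exact training procedure.

```python
import torch

def reverse_play_batches(trajectories, horizon):
    """Treat each recorded trajectory, read backwards from its final state, as a
    "diffusion" path walking away from the goal. Training pairs are
    (state, goal, steps-to-goal) -> action, with the goal taken to be the
    trajectory's final state (hindsight relabeling).
    """
    states, goals, steps, actions = [], [], [], []
    for traj in trajectories:                       # traj: list of (state_tensor, action_tensor)
        goal = traj[-1][0]
        for t, (s, a) in enumerate(traj):
            k = len(traj) - 1 - t                   # how far s is from the goal
            if k <= horizon:
                states.append(s); goals.append(goal); steps.append(k); actions.append(a)
    return (torch.stack(states), torch.stack(goals),
            torch.tensor(steps, dtype=torch.float32), torch.stack(actions))

class GoalConditionedPolicy(torch.nn.Module):
    """pi(a | s, g, k): the "denoiser" that moves the agent back toward the goal."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * state_dim + 1, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, action_dim),
        )

    def forward(self, s, g, k):
        return self.net(torch.cat([s, g, k.unsqueeze(-1)], dim=-1))
```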
Towards an Interpretable Representation of Speaker Identity via Perceptual Voice Qualities
results: The study finds that these perceptual qualities are hearable by ensembles of non-experts and that the information they encode is predictable from various speech representations.
Abstract
Unlike other data modalities such as text and vision, speech does not lend itself to easy interpretation. While lay people can understand how to describe an image or sentence via perception, non-expert descriptions of speech often end at high-level demographic information, such as gender or age. In this paper, we propose a possible interpretable representation of speaker identity based on perceptual voice qualities (PQs). By adding gendered PQs to the pathology-focused Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol, our PQ-based approach provides a perceptual latent space of the character of adult voices that is an intermediary of abstraction between high-level demographics and low-level acoustic, physical, or learned representations. Contrary to prior belief, we demonstrate that these PQs are hearable by ensembles of non-experts, and further demonstrate that the information encoded in a PQ-based representation is predictable by various speech representations.
Summary
Unlike data modalities such as text and vision, speech does not lend itself to easy interpretation: non-expert descriptions of speech often stop at high-level demographic information such as gender or age. In this paper we propose an interpretable representation of speaker identity based on perceptual voice qualities (PQs). By adding gendered PQs to the pathology-focused Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol, our PQ-based approach provides a perceptual latent space for the character of adult voices that sits at an intermediate level of abstraction between high-level demographics and low-level acoustic, physical, or learned representations. Contrary to prior belief, we show that these PQs are hearable by ensembles of non-experts, and we further show that the information encoded in a PQ-based representation is predictable from various speech representations.
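A minimal way to test the claim that PQ information is predictable from speech representations is a cross-validated linear probe. The sketch below assumes hypothetical files of pooled speech embeddings and averaged listener ratings; it is an illustration, not the paper's evaluation protocol.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

# embeddings.npy: (n_speakers, d) pooled embeddings from any pretrained speech model
# pq_ratings.npy: (n_speakers, n_qualities) averaged listener ratings per perceptual quality
X = np.load("embeddings.npy")
Y = np.load("pq_ratings.npy")

for q in range(Y.shape[1]):
    probe = RidgeCV(alphas=np.logspace(-3, 3, 13))
    r2 = cross_val_score(probe, X, Y[:, q], cv=5, scoring="r2")
    print(f"quality {q}: mean cross-validated R^2 = {r2.mean():.3f}")
```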
results: The results show that the hybrid quantum machine learning approach can effectively handle large CT scans and accurately classify them into the COVID-19, CAP, and Normal classes.
Abstract
Practical quantum computing (QC) is still in its infancy and problems considered are usually fairly small, especially in quantum machine learning when compared to its classical counterpart. Image processing applications in particular require models that are able to handle a large amount of features, and while classical approaches can easily tackle this, it is a major challenge and a cause for harsh restrictions in contemporary QC. In this paper, we apply a hybrid quantum machine learning approach to a practically relevant problem with real world-data. That is, we apply hybrid quantum transfer learning to an image processing task in the field of medical image processing. More specifically, we classify large CT-scans of the lung into COVID-19, CAP, or Normal. We discuss quantum image embedding as well as hybrid quantum machine learning and evaluate several approaches to quantum transfer learning with various quantum circuits and embedding techniques.
Summary
Practical quantum computing (QC) is still in its infancy, and the problems considered are usually fairly small, especially in quantum machine learning compared with its classical counterpart. Image processing applications in particular require models that can handle a large number of features; classical approaches handle this easily, but it remains a major challenge and a source of harsh restrictions in contemporary QC. In this paper we apply a hybrid quantum machine learning approach to a practically relevant medical image processing problem with real-world data: we use hybrid quantum transfer learning to classify large CT scans of the lung as COVID-19, CAP, or Normal. We discuss quantum image embedding as well as hybrid quantum machine learning, and evaluate several approaches to quantum transfer learning with various quantum circuits and embedding techniques.
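The sketch below shows a generic hybrid quantum transfer-learning pattern (frozen classical feature extractor, small variational circuit, classical readout) using PennyLane and PyTorch. The specific circuit, embedding, qubit count, and backbone are assumptions for illustration rather than the configurations evaluated in the paper.

```python
import torch
import torchvision
import pennylane as qml

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    qml.AngleEmbedding(inputs, wires=range(n_qubits))            # encode 4 classical features
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits)) # trainable variational layers
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

quantum_layer = qml.qnn.TorchLayer(circuit, weight_shapes={"weights": (3, n_qubits, 3)})

backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()                  # frozen classical feature extractor
for p in backbone.parameters():
    p.requires_grad = False

model = torch.nn.Sequential(
    backbone,
    torch.nn.Linear(512, n_qubits), torch.nn.Tanh(),  # compress features to the qubit count
    quantum_layer,
    torch.nn.Linear(n_qubits, 3),                      # COVID-19 / CAP / Normal logits
)
```

Single-channel CT slices would need to be replicated to three channels (or the first convolution adapted) before being fed to the ImageNet-pretrained backbone; only the compression layer, the variational circuit weights, and the readout are trained in this transfer-learning setup.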