for: Addressing the problem of achieving asymptotically fair participation in machine learning models, particularly when the data distribution shifts due to deployment.
methods: Optimal control formulation and surrogate retention system based on evolutionary population dynamics to approximate the dynamics of distribution shifts on active user counts.
results: Superior performance compared to existing baseline methods in a generic simulation environment, demonstrating the effectiveness of the proposed method for long-term planning and maintaining model performance across all demographic groups.
Abstract
The performance of state-of-the-art machine learning models often deteriorates when testing on demographics that are under-represented in the training dataset. This problem has predominantly been studied in a supervised learning setting where the data distribution is static. However, real-world applications often involve distribution shifts caused by the deployed models. For instance, the performance disparity against minority users can lead to a high customer churn rate, so the data provided by active users become skewed by the lack of minority users. This feedback effect further exacerbates the disparity among different demographic groups in future steps. To address this issue, we propose asymptotically fair participation as a condition for maintaining long-term model performance over all demographic groups. In this work, we aim to achieve asymptotically fair participation via an optimal control formulation. Moreover, we design a surrogate retention system, based on existing literature on evolutionary population dynamics, to approximate the dynamics of distribution shifts on active user counts; the objective of achieving asymptotically fair participation is then formulated as an optimal control problem with the model parameters as the control variables. We apply an efficient implementation of Pontryagin's maximum principle to estimate the optimal control solution. To evaluate the effectiveness of the proposed method, we design a generic simulation environment that simulates the population dynamics of the feedback effect between user retention and model performance. When we deploy the resulting models to the simulation environment, the optimal control solution accounts for long-term planning and leads to superior performance compared with existing baseline methods.
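To make the retention feedback loop concrete, here is a minimal sketch of the kind of surrogate dynamics the abstract describes: per-group active-user counts evolve with model accuracy, so an under-served group steadily loses participation. The churn model, arrival term, and all parameter values are illustrative assumptions, not the authors' actual surrogate system.

```python
import numpy as np

def retention_step(counts, accuracies, base_churn=0.3, new_users=50.0):
    """One step of a hypothetical surrogate retention system.

    Groups that experience lower model accuracy churn at a higher rate
    (an assumed churn model, not the paper's actual dynamics), so the
    under-served group's active-user count shrinks, which in turn skews
    future training data: the feedback loop the control problem targets.
    """
    retention = 1.0 - base_churn * (1.0 - accuracies)
    arrivals = new_users * accuracies / accuracies.sum()  # assumed inflow
    return counts * retention + arrivals

counts = np.array([1000.0, 1000.0])   # two demographic groups
accuracies = np.array([0.95, 0.70])   # the model under-serves group 2
for _ in range(30):
    counts = retention_step(counts, accuracies)
print(counts / counts.sum())  # group 2's share of active users collapses
```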
Adaptive Optimization Algorithms for Machine Learning
methods: The dissertation develops personalized loss functions for user-specific models, meta-learning for provable post-training adaptation, hyperparameter variance reduction for learning hyperparameters in real time, a stepsized Newton method, and low-dimensional updates for scalable second-order optimization.
results: The dissertation contributes novel insights, introduces new adaptive algorithms with improved convergence guarantees, and sharpens the analyses of popular practical algorithms.
Abstract
Machine learning assumes a pivotal role in our data-driven world. The increasing scale of models and datasets necessitates quick and reliable algorithms for model training. This dissertation investigates adaptivity in machine learning optimizers. The ensuing chapters are dedicated to various facets of adaptivity, including: 1. personalization and user-specific models via personalized loss, 2. provable post-training model adaptations via meta-learning, 3. learning unknown hyperparameters in real time via hyperparameter variance reduction, 4. fast O(1/k^2) global convergence of second-order methods via a stepsized Newton method, regardless of the initialization and choice of basis, 5. fast and scalable second-order methods via low-dimensional updates. This thesis contributes novel insights, introduces new algorithms with improved convergence guarantees, and improves analyses of popular practical algorithms.
Improving Unimodal Inference with Multimodal Transformers
results: Strong performance on RGB- and depth-based dynamic hand gesture recognition, speech- and facial-video-based emotion recognition, and audio-video-text sentiment analysis, outperforming conventionally trained unimodal counterparts.
Abstract
This paper proposes an approach for improving performance of unimodal models with multimodal training. Our approach involves a multi-branch architecture that incorporates unimodal models with a multimodal transformer-based branch. By co-training these branches, the stronger multimodal branch can transfer its knowledge to the weaker unimodal branches through a multi-task objective, thereby improving the performance of the resulting unimodal models. We evaluate our approach on tasks of dynamic hand gesture recognition based on RGB and Depth, audiovisual emotion recognition based on speech and facial video, and audio-video-text based sentiment analysis. Our approach outperforms the conventionally trained unimodal counterparts. Interestingly, we also observe that optimization of the unimodal branches improves the multimodal branch, compared to a similar multimodal model trained from scratch.
Algebraic Topological Networks via the Persistent Local Homology Sheaf
for: Proposing a novel algebraic-topology-based approach to graph convolution and attention modules that better exploits the local topological properties of the data.
methods: Within the sheaf neural network framework, the input data is cast as a simplicial complex and its local homology sheaf is constructed; the associated sheaf Laplacian is then used to build more complex linear messages.
results: The approach yields more expressive, non-isotropic messages, and the persistent version of local homology makes the sheaf differentiable so that the topology of intermediate features can be optimized directly.
Abstract
In this work, we introduce a novel approach based on algebraic topology to enhance graph convolution and attention modules by incorporating local topological properties of the data. To do so, we consider the framework of sheaf neural networks, which has been previously leveraged to incorporate additional structure into graph neural networks' features and construct more expressive, non-isotropic messages. Specifically, given an input simplicial complex (e.g. generated by the cliques of a graph or the neighbors in a point cloud), we construct its local homology sheaf, which assigns to each node the vector space of its local homology. The intermediate features of our networks live in these vector spaces and we leverage the associated sheaf Laplacian to construct more complex linear messages between them. Moreover, we extend this approach by considering the persistent version of local homology associated with a weighted simplicial complex (e.g., built from pairwise distances of nodes embeddings). This i) solves the problem of the lack of a natural choice of basis for the local homology vector spaces and ii) makes the sheaf itself differentiable, which enables our models to directly optimize the topology of their intermediate features.
Near-optimal Closed-loop Method via Lyapunov Damping for Convex Optimization
paper_authors: Severin Maier, Camille Castera, Peter Ochs
for: Designing an autonomous system with closed-loop damping for first-order convex optimization.
methods: The system couples its damping to its speed of convergence via a well-chosen Lyapunov function.
results: The system attains a convergence rate arbitrarily close to the optimal one while featuring closed-loop damping, and its discretization yields the practical first-order algorithm LYDIA.
Abstract
We introduce an autonomous system with closed-loop damping for first-order convex optimization. While, to this day, optimal rates of convergence are only achieved by non-autonomous methods via open-loop damping (e.g., Nesterov's algorithm), we show that our system is the first one featuring a closed-loop damping while exhibiting a rate arbitrarily close to the optimal one. We do so by coupling the damping and the speed of convergence of the system via a well-chosen Lyapunov function. We then derive a practical first-order algorithm called LYDIA by discretizing our system, and present numerical experiments supporting our theoretical findings.
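As a rough illustration of the closed-loop idea (damping computed from the current state rather than from the clock), the sketch below discretizes a generic damped inertial system. The feedback law for gamma is a placeholder; the paper derives its damping from a well-chosen Lyapunov function, and LYDIA's actual discretization is not reproduced here.

```python
import numpy as np

def damped_flow_minimize(grad, x0, h=0.01, steps=5000):
    """Explicit Euler discretization of  x'' + gamma(x) x' + grad f(x) = 0,
    where the damping gamma is computed from the current state (closed
    loop) rather than from the clock (open loop). The feedback law below
    is an assumption for illustration, not the paper's derived damping."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        gamma = np.sqrt(np.linalg.norm(g))  # assumed state-feedback damping
        v += h * (-gamma * v - g)           # velocity update
        x += h * v                          # position update
    return x

# Minimize f(x) = ||x||^2 / 2, whose gradient is x.
print(damped_flow_minimize(lambda x: x, x0=[3.0, -2.0]))
```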
Tabular Few-Shot Generalization Across Heterogeneous Feature Spaces
results: Experiments on 118 UCI datasets show that FLAT generalizes successfully to new tabular datasets, with a considerable improvement over the baselines.
Abstract
Despite the prevalence of tabular datasets, few-shot learning remains under-explored within this domain. Existing few-shot methods are not directly applicable to tabular datasets due to varying column relationships, meanings, and permutational invariance. To address these challenges, we propose FLAT, a novel approach to tabular few-shot learning encompassing knowledge sharing between datasets with heterogeneous feature spaces. Utilizing an encoder inspired by Dataset2Vec, FLAT learns low-dimensional embeddings of datasets and their individual columns, which facilitate knowledge transfer and generalization to previously unseen datasets. A decoder network parametrizes the predictive target network, implemented as a Graph Attention Network, to accommodate the heterogeneous nature of tabular datasets. Experiments on a diverse collection of 118 UCI datasets demonstrate FLAT's successful generalization to new tabular datasets and a considerable improvement over the baselines.
Guaranteeing Control Requirements via Reward Shaping in Reinforcement Learning
results: Numerical experiments show that the proposed reward shaping procedure ensures the optimal policy satisfies the specified control requirements, validated in two representative OpenAI Gym environments.
Abstract
In addressing control problems such as regulation and tracking through reinforcement learning, it is often required to guarantee that the acquired policy meets essential performance and stability criteria such as a desired settling time and steady-state error prior to deployment. Motivated by this necessity, we present a set of results and a systematic reward shaping procedure that (i) ensures the optimal policy generates trajectories that align with specified control requirements and (ii) allows to assess whether any given policy satisfies them. We validate our approach through comprehensive numerical experiments conducted in two representative environments from OpenAI Gym: the Inverted Pendulum swing-up problem and the Lunar Lander. Utilizing both tabular and deep reinforcement learning methods, our experiments consistently affirm the efficacy of our proposed framework, highlighting its effectiveness in ensuring policy adherence to the prescribed control requirements.
Online Optimization for Network Resource Allocation and Comparison with Reinforcement Learning Techniques
paper_authors: Ahmed Sid-Ali, Ioannis Lambadaris, Yiqiang Q. Zhao, Gennady Shaikhet, Amirhossein Asgharnia
for: Solving an online network resource allocation problem with job transfers.
methods: A randomized online algorithm based on the exponentially weighted method.
results: The algorithm is proven to enjoy a sub-linear-in-time regret, and experiments on artificial data show it outperforming a reinforcement learning method on the job-transfer problem.
Abstract
We tackle in this paper an online network resource allocation problem with job transfers. The network is composed of many servers connected by communication links. The system operates in discrete time; at each time slot, the administrator reserves resources at servers for future job requests, and a cost is incurred for the reservations made. Then, once requests are received, the jobs may be transferred between the servers to best accommodate the demands. This incurs an additional transport cost. Finally, if a job request cannot be satisfied, there is a violation that engenders a cost to pay for the blocked job. We propose a randomized online algorithm based on the exponentially weighted method. We prove that our algorithm enjoys a sub-linear in time regret, which indicates that the algorithm is adapting and learning from its experiences and is becoming more efficient in its decision-making as it accumulates more data. Moreover, we test the performance of our algorithm on artificial data and compare it against a reinforcement learning method, where we show that our proposed method outperforms the latter.
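For readers unfamiliar with the exponentially weighted method, the generic skeleton is sketched below: keep one weight per candidate decision, sample a decision from the normalized weights, and decay each weight exponentially in its incurred cost. The reservation/transfer/violation cost structure of the paper is abstracted into a plain cost matrix here.

```python
import numpy as np

def exponential_weights(costs, eta=0.1, seed=0):
    """Randomized exponential-weights learner over K candidate decisions.

    costs: (T, K) array of per-round costs in [0, 1], one column per
    candidate decision (e.g., a reservation level). Returns the total
    incurred cost of the randomized policy.
    """
    rng = np.random.default_rng(seed)
    T, K = costs.shape
    weights = np.ones(K)
    total = 0.0
    for t in range(T):
        p = weights / weights.sum()
        a = rng.choice(K, p=p)              # randomized decision
        total += costs[t, a]
        weights *= np.exp(-eta * costs[t])  # exponential penalty per action
    return total

rng = np.random.default_rng(1)
print(exponential_weights(rng.random((1000, 5))))
```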
results: The method describes the motion of LEO satellites with high accuracy while retaining a physically interpretable coordinate system.
Abstract
A novel approach is presented for discovering PDEs that govern the motion of satellites in space. The method is based on SINDy, a data-driven technique capable of identifying the underlying dynamics of complex physical systems from time series data. SINDy is utilized to uncover PDEs that describe the laws of physics in space, which are non-deterministic and influenced by various factors such as drag or the reference area (related to the attitude of the satellite). In contrast to prior works, the physically interpretable coordinate system is maintained, and no dimensionality reduction technique is applied to the data. By training the model with multiple representative trajectories of LEO - encompassing various inclinations, eccentricities, and altitudes - and testing it with unseen orbital motion patterns, a mean error of around 140 km for the positions and 0.12 km/s for the velocities is achieved. The method offers the advantage of delivering interpretable, accurate, and complex models of orbital motion that can be employed for propagation or as inputs to predictive models for other variables of interest, such as atmospheric drag or the probability of collision in an encounter with a spacecraft or space objects. In conclusion, the work demonstrates the promising potential of using SINDy to discover the equations governing the behaviour of satellites in space. The technique has been successfully applied to uncover PDEs describing the motion of satellites in LEO with high accuracy. The method possesses several advantages over traditional models, including the ability to provide physically interpretable, accurate, and complex models of orbital motion derived from high-entropy datasets. These models can be utilised for propagation or as inputs to predictive models for other variables of interest.
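The core of SINDy is sparse regression of measured derivatives onto a library of candidate functions. Below is a minimal generic version using sequentially thresholded least squares; the library, threshold, and toy system are illustrative and unrelated to the paper's orbital-dynamics setup.

```python
import numpy as np

def sindy(X, X_dot, library, threshold=0.1, iters=10):
    """Minimal SINDy: sparse regression of measured derivatives onto a
    library of candidate functions via sequentially thresholded least
    squares. X is (n_samples, n_dims); library(X) returns Theta."""
    Theta = library(X)
    Xi, *_ = np.linalg.lstsq(Theta, X_dot, rcond=None)
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(X_dot.shape[1]):      # refit the surviving terms
            big = ~small[:, k]
            if big.any():
                Xi[big, k], *_ = np.linalg.lstsq(
                    Theta[:, big], X_dot[:, k], rcond=None)
    return Xi

# Toy example: recover x' = -2y, y' = 2x from trajectory data.
t = np.linspace(0, 10, 2000)
X = np.column_stack([np.cos(2 * t), np.sin(2 * t)])
X_dot = np.column_stack([-2 * np.sin(2 * t), 2 * np.cos(2 * t)])
lib = lambda X: np.column_stack([np.ones(len(X)), X, X[:, 0] * X[:, 1]])
print(sindy(X, X_dot, lib))  # nonzero entries only on the y and x terms
```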
Co-data Learning for Bayesian Additive Regression Trees
paper_authors: Jeroen M. Goedhart, Thomas Klausch, Jurriaan Janssen, Mark A. van de Wiel
for: Incorporating external information (co-data) into Bayesian additive regression trees (BART) to improve prediction in medical applications with small sample sizes and high-dimensional covariates.
methods: An empirical Bayes (EB) framework estimates prior covariate weights in the BART model and can handle multiple types of co-data simultaneously; the same framework also estimates the other hyperparameters of BART.
results: The method finds relevant covariates and improves prediction over default BART in simulations, outperforms regression-based co-data learners when the covariate-response relationship is nonlinear, and is applied to diffuse large B-cell lymphoma prognosis with clinical covariates, gene mutations, DNA translocations, and DNA copy number data.
Abstract
Medical prediction applications often need to deal with small sample sizes compared to the number of covariates. Such data pose problems for prediction and variable selection, especially when the covariate-response relationship is complicated. To address these challenges, we propose to incorporate co-data, i.e. external information on the covariates, into Bayesian additive regression trees (BART), a sum-of-trees prediction model that utilizes priors on the tree parameters to prevent overfitting. To incorporate co-data, an empirical Bayes (EB) framework is developed that estimates, assisted by a co-data model, prior covariate weights in the BART model. The proposed method can handle multiple types of co-data simultaneously. Furthermore, the proposed EB framework enables the estimation of the other hyperparameters of BART as well, rendering an appealing alternative to cross-validation. We show that the method finds relevant covariates and that it improves prediction compared to default BART in simulations. If the covariate-response relationship is nonlinear, the method benefits from the flexibility of BART to outperform regression-based co-data learners. Finally, the use of co-data enhances prediction in an application to diffuse large B-cell lymphoma prognosis based on clinical covariates, gene mutations, DNA translocations, and DNA copy number data. Keywords: Bayesian additive regression trees; Empirical Bayes; Co-data; High-dimensional data; Omics; Prediction
Xputer: Bridging Data Gaps with NMF, XGBoost, and a Streamlined GUI Experience
results: In performance benchmarks, Xputer not only rivals the computational speed of established tools such as IterativeImputer but also often outperforms them in imputation accuracy; it additionally handles categorical, continuous, and Boolean data types autonomously, with no prior preprocessing required.
Abstract
The rapid proliferation of data across diverse fields has accentuated the importance of accurate imputation for missing values. This task is crucial for ensuring data integrity and deriving meaningful insights. In response to this challenge, we present Xputer, a novel imputation tool that adeptly integrates Non-negative Matrix Factorization (NMF) with the predictive strengths of XGBoost. One of Xputer's standout features is its versatility: it supports zero imputation, enables hyperparameter optimization through Optuna, and allows users to define the number of iterations. For enhanced user experience and accessibility, we have equipped Xputer with an intuitive Graphical User Interface (GUI) ensuring ease of handling, even for those less familiar with computational tools. In performance benchmarks, Xputer not only rivals the computational speed of established tools such as IterativeImputer but also often outperforms them in terms of imputation accuracy. Furthermore, Xputer autonomously handles a diverse spectrum of data types, including categorical, continuous, and Boolean, eliminating the need for prior preprocessing. Given its blend of performance, flexibility, and user-friendly design, Xputer emerges as a state-of-the-art solution in the realm of data imputation.
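A rough sketch of how an NMF reconstruction and per-column XGBoost models might be combined for imputation follows: the low-rank factorization proposes values for the missing cells, and a boosted model then refines each incomplete column from the others. This illustrates the idea only; Xputer's actual pipeline, zero-imputation mode, categorical/Boolean handling, GUI, and Optuna tuning are not reproduced.

```python
import numpy as np
from sklearn.decomposition import NMF
from xgboost import XGBRegressor

def impute(X, n_components=5, n_iter=3):
    """Sketch of an NMF-initialized, XGBoost-refined imputer for a
    non-negative numeric matrix X containing NaNs (illustration only)."""
    X = X.astype(float)
    mask = np.isnan(X)
    X_filled = np.where(mask, np.nanmean(X, axis=0), X)  # crude start
    for _ in range(n_iter):
        # Low-rank NMF reconstruction proposes values for the missing cells.
        nmf = NMF(n_components=n_components, init="nndsvda", max_iter=500)
        W = nmf.fit_transform(np.clip(X_filled, 0.0, None))
        recon = W @ nmf.components_
        X_filled[mask] = recon[mask]
        # A gradient-boosted model then refines each incomplete column
        # from the remaining columns.
        for j in np.where(mask.any(axis=0))[0]:
            rows = ~mask[:, j]
            feats = np.delete(X_filled, j, axis=1)
            model = XGBRegressor(n_estimators=100, verbosity=0)
            model.fit(feats[rows], X_filled[rows, j])
            X_filled[mask[:, j], j] = model.predict(feats[mask[:, j]])
    return X_filled
```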
Self-supervised learning of multi-omics embeddings in the low-label, high-data regime
results: The pretrained model outperforms standard tabular benchmarks such as XGBoost and CatBoost on cancer-type prediction when labelled samples are scarce. For multimodal SSL, a late-fusion model is proposed in which each omics passes through its own sub-network and the outputs are averaged before the pretraining or downstream objective; multimodal pretraining improves predictions from a single omics.
Abstract
Contrastive, self-supervised learning (SSL) is used to train a model that predicts cancer type from miRNA, mRNA or RPPA expression data. This model, a pretrained FT-Transformer, is shown to outperform XGBoost and CatBoost, standard benchmarks for tabular data, when labelled samples are scarce but the number of unlabelled samples is high. This is despite the fact that the datasets we use have $\mathcal{O}(10^{1})$ classes and $\mathcal{O}(10^{2})-\mathcal{O}(10^{4})$ features. After demonstrating the efficacy of our chosen method of self-supervised pretraining, we investigate SSL for multi-modal models. A late-fusion model is proposed, where each omics is passed through its own sub-network, the outputs of which are averaged and passed to the pretraining or downstream objective function. Multi-modal pretraining is shown to improve predictions from a single omics, and we argue that this is useful for datasets with many unlabelled multi-modal samples, but few labelled unimodal samples. Additionally, we show that pretraining each omics-specific module individually is highly effective. This enables the application of the proposed model in a variety of contexts where a large amount of unlabelled data is available from each omics, but only a few labelled samples.
Natural Disaster Analysis using Satellite Imagery and Social-Media Data for Emergency Response Situations
paper_authors: Sukeerthi Mandyam, Shanmuga Priya MG, Shalini Suresh, Kavitha Srinivasan
for: Analyzing different types of data (satellite imagery and Twitter data) for in-depth, location-wise analysis of emergency requirements in disaster management.
methods: Two stages, satellite image analysis and Twitter data analysis, integrated using location. The first stage applies multi-class land cover segmentation based on the U-Net architecture to pre- and post-disaster satellite images; the second maps severely affected regions and extracts tweets using location-specific keywords, summarized with the COWTS technique.
results: An integrated system using real-time location-based mapping and frequency analysis that gathers multi-dimensional information at disaster onset, analyzed and validated on the Kerala and Mississippi floods. The novelty lies in applying segmented satellite images with highlighted land cover changes and region-specific Twitter filters to obtain a complete overview of the disaster.
Abstract
Disaster Management is one of the most promising research areas because of its significant economic, environmental and social repercussions. This research focuses on analyzing different types of data (pre and post satellite images and twitter data) related to disaster management for in-depth analysis of location-wise emergency requirements. This research has been divided into two stages, namely, satellite image analysis and twitter data analysis, followed by integration using location. The first stage involves pre and post disaster satellite image analysis of the location using a multi-class land cover segmentation technique based on the U-Net architecture. The second stage focuses on mapping the region with essential information about the disaster situation and immediate requirements for relief operations. The severely affected regions are demarcated and twitter data is extracted using keywords respective to that location. The extraction of situational information from a large corpus of raw tweets adopts the Content Word based Tweet Summarization (COWTS) technique. An integration of these modules using real-time location-based mapping and frequency analysis techniques gathers multi-dimensional information in the event of a disaster; the Kerala and Mississippi floods were analyzed and validated as test cases. The novelty of this research lies in the application of segmented satellite images for disaster relief using highlighted land cover changes and integration of twitter data by mapping these region-specific filters for obtaining a complete overview of the disaster.
Fast multiplication by two’s complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation
results: Any integer or real number can be expressed as a list of integer indices representing a finite series in base two, which can be stored and distributed across multiple CPUs/GPUs; addition and multiplication operate as two's complement additions on the index representations and can be fully distributed, overcoming the limitation of current parallel multiplication methods that require shared core memory and disk for results and intermediate results.
Abstract
We demonstrate a multiplication method based on numbers represented as a set of polynomial radix 2 indices stored as an integer list. The 'polynomial integer index multiplication' method is a set of algorithms implemented in Python code. We demonstrate the method to be faster than both the Number Theoretic Transform (NTT) and Karatsuba for multiplication within a certain bit range; both are also implemented in Python for comparison with the polynomial radix 2 integer method. We demonstrate that it is possible to express any integer or real number as a list of integer indices, representing a finite series in base two. The finite series of integer index representation of a number can then be stored and distributed across multiple CPUs / GPUs. We show that operations of addition and multiplication can be applied as two's complement additions operating on the index integer representations and can be fully distributed across a given CPU / GPU architecture. We demonstrate fully distributed arithmetic operations such that the 'polynomial integer index multiplication' method overcomes the current limitation of parallel multiplication methods, i.e., the need to share common core memory and common disk for the calculation of results and intermediate results.
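The arithmetic idea is easy to state: if a number is stored as the list of exponents of its base-2 expansion, then, because 2^i * 2^j = 2^(i+j), a product is the multiset of all pairwise index sums, normalized by binary carrying. A minimal single-process sketch follows (the paper's distributed, two's-complement-based algorithms are not reproduced):

```python
from collections import Counter
from itertools import product

def to_indices(n):
    """11 -> [0, 1, 3], since 11 = 2^0 + 2^1 + 2^3."""
    return [i for i in range(n.bit_length()) if (n >> i) & 1]

def from_indices(idx):
    return sum(1 << i for i in idx)

def multiply(a_idx, b_idx):
    # Every pairwise index sum is one partial product, and each sum is
    # independent of the others, which is what makes the scheme easy to
    # distribute; only the final carry normalization is sequential here.
    counts = Counter(i + j for i, j in product(a_idx, b_idx))
    result, k = [], 0
    while counts:
        c = counts.pop(k, 0)
        if c & 1:
            result.append(k)         # an odd count leaves one 2^k behind
        if c >> 1:
            counts[k + 1] += c >> 1  # carry pairs of 2^k up to 2^(k+1)
        k += 1
    return result

a, b = 1234, 5678
assert from_indices(multiply(to_indices(a), to_indices(b))) == a * b
```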
On some elusive aspects of databases hindering AI based discovery: A case study on superconducting materials
paper_authors: Giovanni Trezza, Eliodoro Chiavazzo
for: Examining often-underestimated roadblocks in the generation of materials databases that can hinder AI-based discovery even when high-quality, large, and reputable data sources are available.
methods: Analysis of three aspects of data bias: intrinsically biased sample selection, possible hidden variables, and disparate data age.
results: Using superconducting and thermoelectric materials as two representative case studies, the paper discusses these three sources of bias and proposes and tests a first strategy capable of detecting and quantifying intrinsic data bias.
Abstract
It stands to reason that the amount and the quality of big data is of key importance for setting up accurate AI-driven models. Nonetheless, we believe there are still critical roadblocks in the inherent generation of databases, that are often underestimated and poorly discussed in the literature. In our view, such issues can seriously hinder the AI-based discovery process, even when high quality, sufficiently large and highly reputable data sources are available. Here, considering superconducting and thermoelectric materials as two representative case studies, we specifically discuss three aspects, namely intrinsically biased sample selection, possible hidden variables, disparate data age. Importantly, to our knowledge, we suggest and test a first strategy capable of detecting and quantifying the presence of the intrinsic data bias.
Safety Aware Autonomous Path Planning Using Model Predictive Reinforcement Learning for Inland Waterways
paper_authors: Astrid Vanneste, Simon Vanneste, Olivier Vasseur, Robin Janssens, Mattias Billast, Ali Anwar, Kevin Mets, Tom De Schepper, Siegfried Mercelis, Peter Hellinckx
results: Experiments show that MPRL outperforms both Frenet-frame-based planning and a proximal policy optimization (PPO) planner, navigating safely (collision-free) through both test scenarios.
Abstract
In recent years, interest in autonomous shipping in urban waterways has increased significantly due to the trend of keeping cars and trucks out of city centers. Classical approaches such as Frenet frame based planning and potential field navigation often require tuning of many configuration parameters and sometimes even require a different configuration depending on the situation. In this paper, we propose a novel path planning approach based on reinforcement learning called Model Predictive Reinforcement Learning (MPRL). MPRL calculates a series of waypoints for the vessel to follow. The environment is represented as an occupancy grid map, allowing us to deal with any shape of waterway and any number and shape of obstacles. We demonstrate our approach on two scenarios and compare the resulting path with path planning using a Frenet frame and path planning based on a proximal policy optimization (PPO) agent. Our results show that MPRL outperforms both baselines in both test scenarios. The PPO based approach was not able to reach the goal in either scenario while the Frenet frame approach failed in the scenario consisting of a corner with obstacles. MPRL was able to safely (collision free) navigate to the goal in both of the test scenarios.
paper_authors: Arthur da Cunha, Francesco d’Amore, Emanuele Natale
for: Studying the feasibility of structured pruning within the Strong Lottery Ticket Hypothesis (SLTH).
methods: A multidimensional generalization of the Random Subset-Sum Problem that admits the stochastic dependencies arising when addressing structured pruning in the SLTH.
results: A proof that a wide class of random Convolutional Neural Networks contains structured subnetworks able to approximate any sufficiently smaller network; this first sub-exponential bound around the SLTH for structured pruning opens new research directions and deepens the understanding of over-parameterization in deep learning.
Abstract
The Strong Lottery Ticket Hypothesis (SLTH) states that randomly-initialised neural networks likely contain subnetworks that perform well without any training. Although unstructured pruning has been extensively studied in this context, its structured counterpart, which can deliver significant computational and memory efficiency gains, has been largely unexplored. One of the main reasons for this gap is the limitations of the underlying mathematical tools used in formal analyses of the SLTH. In this paper, we overcome these limitations: we leverage recent advances in the multidimensional generalisation of the Random Subset-Sum Problem and obtain a variant that admits the stochastic dependencies that arise when addressing structured pruning in the SLTH. We apply this result to prove, for a wide class of random Convolutional Neural Networks, the existence of structured subnetworks that can approximate any sufficiently smaller network. This result provides the first sub-exponential bound around the SLTH for structured pruning, opening up new avenues for further research on the hypothesis and contributing to the understanding of the role of over-parameterization in deep learning.
Contribution Evaluation in Federated Learning: Examining Current Approaches
results: Benchmarks of the most promising contribution evaluation methods, along with a newly introduced one, on MNIST and CIFAR-10 showcase their differences and motivate the design of fair and efficient contribution evaluation schemes.
Abstract
Federated Learning (FL) has seen increasing interest in cases where entities want to collaboratively train models while maintaining privacy and governance over their data. In FL, clients with private and potentially heterogeneous data and compute resources come together to train a common model without raw data ever leaving their locale. Instead, the participants contribute by sharing local model updates, which, naturally, differ in quality. Quantitatively evaluating the worth of these contributions is termed the Contribution Evaluation (CE) problem. We review current CE approaches from the underlying mathematical framework to efficiently calculate a fair value for each client. Furthermore, we benchmark some of the most promising state-of-the-art approaches, along with a new one we introduce, on MNIST and CIFAR-10, to showcase their differences. Designing a fair and efficient CE method, while a small part of the overall FL system design, is paramount to the mainstream adoption of FL.
Short vs. Long-term Coordination of Drones: When Distributed Optimization Meets Deep Reinforcement Learning
results: For traffic monitoring, the proposed progressive approach outperforms three baseline methods in experiments, demonstrating outstanding performance.
Abstract
Swarms of smart drones, with the support of charging technology, can provide compelling sensing capabilities in Smart Cities, such as traffic monitoring and disaster response. Existing approaches, including distributed optimization and deep reinforcement learning (DRL), aim to coordinate drones to achieve cost-effective, high-quality navigation, sensing, and recharging. However, they have distinct challenges: short-term optimization struggles to provide sustained benefits, while long-term DRL lacks scalability, resilience, and flexibility. To bridge this gap, this paper introduces a new progressive approach that encompasses planning and selection based on distributed optimization, as well as DRL-based flying direction scheduling. Extensive experiments with datasets generated from realistic urban mobility demonstrate the outstanding performance of the proposed solution in traffic monitoring compared to three baseline methods.
paper_authors: Lorenzo Bonito, James Requeima, Aliaksandra Shysheya, Richard E. Turner
for: Providing a new alternative for modelling with Neural Processes, suited to application areas such as healthcare and climate science where data are scarce and prediction uncertainty estimates are indispensable.
methods: A diffusion-based approach that, by conditioning on noised datasets, addresses many limitations of existing methods.
results: The new approach resolves several limitations of the current state of the art while exceeding its performance.
Abstract
Over the last few years, Neural Processes have become a useful modelling tool in many application areas, such as healthcare and climate sciences, in which data are scarce and prediction uncertainty estimates are indispensable. However, the current state of the art in the field (AR CNPs; Bruinsma et al., 2023) presents a few issues that prevent its widespread deployment. This work proposes an alternative, diffusion-based approach to NPs which, through conditioning on noised datasets, addresses many of these limitations, whilst also exceeding SOTA performance.
Runtime Verification of Learning Properties for Reinforcement Learning Algorithms
methods: New runtime verification techniques comprising three verification properties, with design steps for monitoring and assessing them during the system's operation.
results: Three verification properties concerning the quality and timeliness of learning, which can predict when the learning phase has not met, or will not meet, qualitative and timely expectations.
Abstract
Reinforcement learning (RL) algorithms interact with their environment in a trial-and-error fashion. Such interactions can be expensive, inefficient, and time-consuming when learning on a physical system rather than in a simulation. This work develops new runtime verification techniques to predict when the learning phase has not met or will not meet qualitative and timely expectations. This paper presents three verification properties concerning the quality and timeliness of learning in RL algorithms. With each property, we propose design steps for monitoring and assessing the properties during the system's operation.
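To give a flavor of what such a runtime monitor might look like, here is a toy checker for two informal properties, one about learning quality (the smoothed return should not degrade) and one about timeliness (a target return should be reached within a budget). The property definitions and thresholds are illustrative stand-ins, not the paper's formal verification properties.

```python
import numpy as np

def check_learning(episode_returns, episodes_used, target=195.0,
                   budget=500, window=20):
    """Toy runtime monitor for two informal learning properties.

    quality:    the mean return over the last `window` episodes should
                not fall below the mean over the previous `window`.
    timeliness: the target return should be reached within `budget`
                episodes; until then the property stays inconclusive.
    """
    verdicts = {}
    r = np.asarray(episode_returns, dtype=float)
    if len(r) >= 2 * window:
        verdicts["quality"] = r[-window:].mean() >= r[-2 * window:-window].mean()
    if (r >= target).any():
        verdicts["timeliness"] = True
    elif episodes_used >= budget:
        verdicts["timeliness"] = False
    return verdicts  # missing keys mean "inconclusive so far"

print(check_learning([10, 20, 35, 60, 90] * 10, episodes_used=50))
```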
Fossil 2.0: Formal Certificate Synthesis for the Verification and Control of Dynamical Models
results: Fossil 2.0 synthesizes a wider range of certificates and control laws and adds support for discrete-time models.
Abstract
This paper presents Fossil 2.0, a new major release of a software tool for the synthesis of certificates (e.g., Lyapunov and barrier functions) for dynamical systems modelled as ordinary differential and difference equations. Fossil 2.0 is much improved from its original release, including new interfaces, a significantly expanded certificate portfolio, controller synthesis and enhanced extensibility. We present these new features as part of this tool paper. Fossil implements a counterexample-guided inductive synthesis (CEGIS) loop ensuring the soundness of the method. Our tool uses neural networks as templates to generate candidate functions, which are then formally proven by an SMT solver acting as an assertion verifier. Improvements with respect to the first release include a wider range of certificates, synthesis of control laws, and support for discrete-time models.
results: Systematic evaluation shows that GEO can boost the visibility of content in generative engine responses by up to 40%; the efficacy of visibility strategies varies across domains, underscoring the need for domain-specific methods.
Abstract
The advent of large language models (LLMs) has ushered in a new paradigm of search engines that use generative models to gather and summarize information to answer user queries. This emerging technology, which we formalize under the unified framework of Generative Engines (GEs), has the potential to generate accurate and personalized responses, and is rapidly replacing traditional search engines like Google and Bing. Generative Engines typically satisfy queries by synthesizing information from multiple sources and summarizing them with the help of LLMs. While this shift significantly improves user utility and generative search engine traffic, it results in a huge challenge for the third stakeholder -- website and content creators. Given the black-box and fast-moving nature of Generative Engines, content creators have little to no control over when and how their content is displayed. With generative engines here to stay, the right tools should be provided to ensure that the creator economy is not severely disadvantaged. To address this, we introduce Generative Engine Optimization (GEO), a novel paradigm to aid content creators in improving the visibility of their content in Generative Engine responses through a black-box optimization framework for optimizing and defining visibility metrics. We facilitate systematic evaluation in this new paradigm by introducing GEO-bench, a benchmark of diverse user queries across multiple domains, coupled with sources required to answer these queries. Through rigorous evaluation, we show that GEO can boost visibility by up to 40% in generative engine responses. Moreover, we show the efficacy of these strategies varies across domains, underscoring the need for domain-specific methods. Our work opens a new frontier in the field of information discovery systems, with profound implications for generative engines and content creators.
CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs
results: Experiments across diverse DNN models and devices show CDMPP significantly outperforming state-of-the-art baselines, with 14.03% and 10.85% prediction error for cross-model and cross-device prediction respectively, and one order of magnitude higher training efficiency. The implementation and the expanded dataset are available at https://github.com/joapolarbear/cdmpp.
Abstract
Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications. Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks, such as DNN graph- or tensor-level optimization and device selection. Considering the large space of DNN models and devices that impede direct profiling of all combinations, recent efforts focus on building a predictor to model the performance of DNN models on different devices. However, none of the existing attempts have achieved a cost model that can accurately predict the performance of various tensor programs while supporting both training and inference accelerators. We propose CDMPP, an efficient tensor program latency prediction framework for both cross-model and cross-device prediction. We design an informative but efficient representation of tensor programs, called compact ASTs, and a pre-order-based positional encoding method, to capture the internal structure of tensor programs. We develop a domain-adaptation-inspired method to learn domain-invariant representations and devise a KMeans-based sampling algorithm, for the predictor to learn from different domains (i.e., different DNN operators and devices). Our extensive experiments on a diverse range of DNN models and devices demonstrate that CDMPP significantly outperforms state-of-the-art baselines with 14.03% and 10.85% prediction error for cross-model and cross-device prediction, respectively, and one order of magnitude higher training efficiency. The implementation and the expanded dataset are available at https://github.com/joapolarbear/cdmpp.
Modelling daily mobility using mobile data traffic at fine spatiotemporal scale
methods: The NetMob 2023 dataset is combined with a highly suitable external source, the ENACT dataset, which provides a 1 km x 1 km grid of day and night population estimates across Europe; three sets of XGBoost models predict the population in each 100 m x 100 m grid cell from the mobile data traffic of the 68 online services covered, using ENACT values as ground truth.
results: NetMob 2023 data can be used to estimate the day and night population at grid-cell level and can explain part of the dynamics of urban mobility.
Abstract
We applied a data-driven approach that explores the usability of the NetMob 2023 dataset in modelling mobility patterns within an urban context. We combined the data with a highly suitable external source, the ENACT dataset, which provides a 1 km x 1 km grid with estimates of the day and night population across Europe. We developed three sets of XGBoost models that predict the population in each 100m x 100m grid cell used in NetMob2023 based on the mobile data traffic of the 68 online services covered in the dataset, using the ENACT values as ground truth. The results suggest that the NetMob 2023 data can be useful for the estimation of the day and night population at grid cell level and can explain part of the dynamics of urban mobility.
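The modelling setup reduces to supervised regression on grid cells. A minimal sketch with synthetic stand-ins for the traffic features and the population target (the real pipeline's feature engineering and the three model variants are not reproduced):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Synthetic stand-ins with the shapes the paper describes: one row per
# 100 m x 100 m grid cell, one feature per online service's traffic
# volume (68 services in NetMob 2023), target = population estimate
# taken from the 1 km ENACT grid.
rng = np.random.default_rng(0)
X = rng.random((10_000, 68))                               # traffic stand-in
y = 100 * X[:, :5].sum(axis=1) + rng.normal(0, 5, 10_000)  # toy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X_tr, y_tr)
print("R^2 on held-out cells:", model.score(X_te, y_te))
```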
Zenkai – Framework For Exploring Beyond Backpropagation
results: By dividing the deep learning machine into layers of semi-autonomous learning machines, Zenkai lets researchers more easily explore new frontiers in deep learning that do not strictly adhere to the backpropagation framework.
Abstract
Zenkai is an open-source framework designed to give researchers more control and flexibility over building and training deep learning machines. It does this by dividing the deep learning machine into layers of semi-autonomous learning machines with their own target and learning algorithm. This is to allow researchers greater exploration, such as the use of non-differentiable layers or learning algorithms beyond those based on error backpropagation. Backpropagation [Rumelhart et al., 1986] has powered deep learning to become one of the most exciting fields of the 21st century. As a result, a large number of software tools have been developed to support efficient implementation and training of neural networks through the use of backpropagation. While these have been critical to the success of deep learning, building frameworks around backpropagation can make it challenging to implement solutions that do not adhere to it. Zenkai aims to make it easier to get around these limitations and help researchers more easily explore new frontiers in deep learning that do not strictly adhere to the backpropagation framework.
GAIA: Delving into Gradient-based Attribution Abnormality for Out-of-distribution Detection
methods: Gradient-based attribution methods struggle to assign feature importance to OOD data, yielding divergent explanation patterns; building on this observation, two forms of abnormality are introduced for OOD detection, the zero-deflation abnormality and the channel-wise average abnormality, and GAIA, a simple and effective approach incorporating Gradient Abnormality Inspection and Aggregation, is proposed to detect OOD examples.
results: GAIA outperforms advanced post-hoc methods on both CIFAR and large-scale ImageNet-1k benchmarks, reducing the average FPR95 by 23.10% on CIFAR10 and by 45.41% on CIFAR100.
Abstract
Detecting out-of-distribution (OOD) examples is crucial to guarantee the reliability and safety of deep neural networks in real-world settings. In this paper, we offer an innovative perspective on quantifying the disparities between in-distribution (ID) and OOD data -- analyzing the uncertainty that arises when models attempt to explain their predictive decisions. This perspective is motivated by our observation that gradient-based attribution methods encounter challenges in assigning feature importance to OOD data, thereby yielding divergent explanation patterns. Consequently, we investigate how attribution gradients lead to uncertain explanation outcomes and introduce two forms of abnormalities for OOD detection: the zero-deflation abnormality and the channel-wise average abnormality. We then propose GAIA, a simple and effective approach that incorporates Gradient Abnormality Inspection and Aggregation. The effectiveness of GAIA is validated on both commonly utilized (CIFAR) and large-scale (ImageNet-1k) benchmarks. Specifically, GAIA reduces the average FPR95 by 23.10% on CIFAR10 and by 45.41% on CIFAR100 compared to advanced post-hoc methods.
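Since the abstract does not spell out the exact abnormality definitions, the sketch below only illustrates the general recipe: compute a gradient-based attribution for the predicted class, then summarize it with a zero-count statistic and a channel-wise average statistic. Both statistics are assumptions for illustration, not GAIA's actual scores.

```python
import torch

def gradient_abnormality_scores(model, x):
    """Illustrative gradient-attribution statistics for a batch x of
    shape (N, C, H, W): a zero-count score (how many attribution entries
    vanish) and a channel-wise average score. Both are loose stand-ins
    for the paper's zero-deflation and channel-wise average abnormality,
    whose exact definitions are not given in the abstract."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits.max(dim=1).values.sum()     # predicted-class logits
    (attr,) = torch.autograd.grad(score, x)    # input-gradient attribution
    zero_rate = (attr.abs() < 1e-12).float().mean(dim=(1, 2, 3))
    channel_avg = attr.mean(dim=(2, 3)).abs().mean(dim=1)
    return zero_rate, channel_avg

# usage sketch: larger abnormality scores would flag likely-OOD inputs,
# e.g. zd, ca = gradient_abnormality_scores(resnet, batch)
```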
Generating Drug Repurposing Hypotheses through the Combination of Disease-Specific Hypergraphs
results: Two promising drug repurposing candidates are identified, dapagliflozin (an antidiabetic) and debrisoquine (an antihypertensive), whose repurposing potential rises significantly in hypergraphs combining two diseases.
Abstract
The drug development pipeline for a new compound can last 10-20 years and cost over 10 billion. Drug repurposing offers a more time- and cost-effective alternative. Computational approaches based on biomedical knowledge graph representations have recently yielded new drug repurposing hypotheses. In this study, we present a novel, disease-specific hypergraph representation learning technique to derive contextual embeddings of biological pathways of various lengths but that all start at any given drug and all end at the disease of interest. Further, we extend this method to multi-disease hypergraphs. To determine the repurposing potential of each of the 1,522 drugs, we derive drug-specific distributions of cosine similarity values and ultimately consider the median for ranking. Cosine similarity values are computed between (1) all biological pathways starting at the considered drug and ending at the disease of interest and (2) all biological pathways starting at drugs currently prescribed against that disease and ending at the disease of interest. We illustrate our approach with Alzheimer's disease (AD) and two of its risk factors: hypertension (HTN) and type 2 diabetes (T2D). We compare each drug's rank across four hypergraph settings (single- or multi-disease): AD only, AD + HTN, AD + T2D, and AD + HTN + T2D. Notably, our framework led to the identification of two promising drugs whose repurposing potential was significantly higher in hypergraphs combining two diseases: dapagliflozin (antidiabetic; moved up, from top 32$\%$ to top 7$\%$, across all considered drugs) and debrisoquine (antihypertensive; moved up, from top 76$\%$ to top 23$\%$). Our approach serves as a hypothesis generation tool, to be paired with a validation pipeline relying on laboratory experiments and semi-automated parsing of the biomedical literature.
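The ranking step itself is straightforward once pathway embeddings exist. A sketch under the assumption that each drug is summarized by a matrix of pathway embeddings (the hypergraph representation learning that produces them is the paper's contribution and is not shown):

```python
import numpy as np

def repurposing_score(drug_paths, prescribed_paths):
    """Median cosine similarity between a candidate drug's pathway
    embeddings and those of drugs already prescribed for the disease.
    Both inputs are (n_paths, dim) arrays of pathway embeddings."""
    a = drug_paths / np.linalg.norm(drug_paths, axis=1, keepdims=True)
    b = prescribed_paths / np.linalg.norm(prescribed_paths, axis=1, keepdims=True)
    return np.median(a @ b.T)   # median over all pairwise similarities

# Rank hypothetical candidates by descending median similarity.
rng = np.random.default_rng(0)
candidates = {f"drug_{i}": rng.normal(size=(20, 64)) for i in range(5)}
prescribed = rng.normal(size=(30, 64))
ranking = sorted(candidates,
                 key=lambda d: -repurposing_score(candidates[d], prescribed))
print(ranking)
```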
Accelerating material discovery with a threshold-driven hybrid acquisition policy-based Bayesian optimization
paper_authors: Ahmed Shoyeb Raihan, Hamed Khosravi, Srinjoy Das, Imtiaz Ahmed
for: This study aims to make the materials discovery and development process more efficient by applying machine learning techniques and Bayesian optimization, thereby reducing experimental costs and development time.
methods: A novel Threshold-Driven UCB-EI Bayesian Optimization (TDUE-BO) method that dynamically integrates the UCB and EI acquisition functions to optimize the material discovery process.
results: Across three different material datasets, TDUE-BO shows significantly better approximation and optimization performance than EI- and UCB-based BO methods, converging faster and achieving better RMSE scores.
Abstract
Advancements in materials play a crucial role in technological progress. However, the process of discovering and developing materials with desired properties is often impeded by substantial experimental costs, extensive resource utilization, and lengthy development periods. To address these challenges, modern approaches often employ machine learning (ML) techniques such as Bayesian Optimization (BO), which streamline the search for optimal materials by iteratively selecting experiments that are most likely to yield beneficial results. However, traditional BO methods, while beneficial, often struggle with balancing the trade-off between exploration and exploitation, leading to sub-optimal performance in material discovery processes. This paper introduces a novel Threshold-Driven UCB-EI Bayesian Optimization (TDUE-BO) method, which dynamically integrates the strengths of Upper Confidence Bound (UCB) and Expected Improvement (EI) acquisition functions to optimize the material discovery process. Unlike classical BO, our method focuses on efficiently navigating the high-dimensional material design space (MDS). TDUE-BO begins with an exploration-focused UCB approach, ensuring a comprehensive initial sweep of the MDS. As the model gains confidence, indicated by reduced uncertainty, it transitions to the more exploitative EI method, focusing on promising areas identified earlier. The UCB-to-EI switching policy, guided by continuous monitoring of model uncertainty at each step of sequential sampling, navigates the MDS more efficiently while ensuring rapid convergence. The effectiveness of TDUE-BO is demonstrated through its application on three different material datasets, showing significantly better approximation and optimization performance over the EI and UCB-based BO methods in terms of the RMSE scores and convergence efficiency, respectively.
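The switching rule at the core of TDUE-BO can be sketched compactly. The following is a hedged illustration using a scikit-learn Gaussian process as the surrogate; the uncertainty threshold, kappa, and xi values are placeholder choices, not the paper's settings:

```python
# Minimal sketch of a threshold-driven UCB-to-EI switch: explore with UCB
# while mean predictive uncertainty is high, then exploit with EI.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def ucb(mu, sigma, kappa=2.0):
    return mu + kappa * sigma

def expected_improvement(mu, sigma, best, xi=0.01):
    z = (mu - best - xi) / np.maximum(sigma, 1e-12)
    return (mu - best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def next_point(gp, X_cand, y_best, sigma_threshold=0.1):
    mu, sigma = gp.predict(X_cand, return_std=True)
    # Switch to exploitative EI once average uncertainty drops below threshold.
    scores = ucb(mu, sigma) if sigma.mean() > sigma_threshold else \
        expected_improvement(mu, sigma, y_best)
    return X_cand[np.argmax(scores)]

# Toy loop over a 1-D design space standing in for the MDS.
rng = np.random.default_rng(1)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x
X = rng.uniform(-2, 2, size=(5, 1)); y = f(X).ravel()
cand = np.linspace(-2, 2, 200).reshape(-1, 1)
for _ in range(10):
    gp = GaussianProcessRegressor().fit(X, y)
    x_new = next_point(gp, cand, y.max())
    X = np.vstack([X, [x_new]]); y = np.append(y, f(x_new))
print("best found:", X[np.argmax(y)].item(), y.max())
```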
Group-Aware Interest Disentangled Dual-Training for Personalized Recommendation
results: Experiments on three public datasets show that IGRec effectively alleviates the data sparsity and cold-start problems, and experiments on the group recommendation task further demonstrate the informativeness of the interest-based group representation.
Abstract
Personalized recommender systems aim to predict users' preferences for items. They have become an indispensable part of online services. Online social platforms enable users to form groups based on their common interests. The users' group participation on social platforms reveals their interests and can be utilized as side information to mitigate the data sparsity and cold-start problem in recommender systems. Users join different groups out of different interests. In this paper, we generate group representation from the user's interests and propose IGRec (Interest-based Group enhanced Recommendation) to utilize the group information accurately. It consists of four modules. (1) Interest disentangler via self-gating that disentangles users' interests from their initial embedding representation. (2) Interest aggregator that generates the interest-based group representation by Gumbel-Softmax aggregation on the group members' interests. (3) Interest-based group aggregation that fuses user's representation with the participated group representation. (4) A dual-trained rating prediction module to utilize both user-item and group-item interactions. We conduct extensive experiments on three publicly available datasets. Results show IGRec can effectively alleviate the data sparsity problem and enhance the recommender system with interest-based group representation. Experiments on the group recommendation task further show the informativeness of interest-based group representation.
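The interest aggregator (module 2) is the least standard piece, so a small sketch may help. The following assumes PyTorch; the tensor shapes, the soft (non-hard) Gumbel-Softmax, and the mean pooling are illustrative assumptions rather than IGRec's exact design:

```python
# Sketch of an interest aggregator: each group member has K disentangled
# interest vectors; a Gumbel-Softmax picks a (soft) interest per member,
# and the selections are pooled into one group embedding.
import torch
import torch.nn.functional as F

def aggregate_group_interests(member_interests, logits, tau=0.5):
    """member_interests: (n_members, K, d) disentangled interest embeddings.
    logits: (n_members, K) relevance of each interest to this group.
    Returns a (d,) group embedding."""
    # Differentiable near-one-hot selection over the K interests per member.
    sel = F.gumbel_softmax(logits, tau=tau, hard=False)             # (n, K)
    per_member = torch.einsum("nk,nkd->nd", sel, member_interests)  # (n, d)
    return per_member.mean(dim=0)                                   # (d,)

interests = torch.randn(8, 4, 32)   # 8 members, 4 interests, dim 32
logits = torch.randn(8, 4)
group_emb = aggregate_group_interests(interests, logits)
print(group_emb.shape)  # torch.Size([32])
```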
A Knowledge Distillation Approach for Sepsis Outcome Prediction from Multivariate Clinical Time Series
paper_authors: Anna Wong, Shu Ge, Nassim Oufattole, Adam Dejl, Megan Su, Ardavan Saeedi, Li-wei H. Lehman
for: Forecasting outcomes for sepsis patients while learning interpretable state representations
methods: Knowledge distillation via constrained variational inference, transferring the knowledge of a high-performing "teacher" neural network into a "student" latent variable model that is both predictive and interpretable
results: Using real-world data from the MIMIC-IV database, an LSTM teacher was trained to predict mortality for sepsis patients, and an AR-HMM student learned interpretable hidden states used to predict multiple downstream outcomes, including hospital mortality, pulmonary edema, need for diuretics, dialysis, and mechanical ventilation; the approach successfully incorporates the constraint, achieving high predictive power while maintaining generative performance.
Abstract
Sepsis is a life-threatening condition triggered by an extreme infection response. Our objective is to forecast sepsis patient outcomes using their medical history and treatments, while learning interpretable state representations to assess patients' risks of developing various adverse outcomes. While neural networks excel in outcome prediction, their limited interpretability remains a key issue. In this work, we use knowledge distillation via constrained variational inference to transfer the knowledge of a powerful "teacher" neural network into a "student" latent variable model that learns interpretable hidden state representations while achieving high predictive performance for sepsis outcome prediction. Using real-world data from the MIMIC-IV database, we trained an LSTM as the "teacher" model to predict mortality for sepsis patients, given information about their recent history of vital signs, lab values and treatments. For our student model, we use an autoregressive hidden Markov model (AR-HMM) to learn interpretable hidden states from patients' clinical time series, and use the posterior distribution of the learned state representations to predict various downstream outcomes, including hospital mortality, pulmonary edema, need for diuretics, dialysis, and mechanical ventilation. Our results show that our approach successfully incorporates the constraint to achieve high predictive power similar to the teacher model, while maintaining the generative performance.
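The distillation objective can be illustrated with a toy loss. The weighted sum below is a stand-in for the paper's constrained variational inference formulation, with the temperature and mixing weight as assumed hyperparameters:

```python
# Hedged sketch: the student's training loss mixes its own variational
# objective (an ELBO, to be maximized) with a term pulling its outcome
# predictions toward the frozen teacher's soft targets.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, student_elbo,
                      alpha=0.5, T=2.0):
    """student_logits / teacher_logits: (batch, n_outcomes) outcome scores.
    student_elbo: scalar evidence lower bound from the latent variable model."""
    soft_targets = torch.sigmoid(teacher_logits / T)
    distill = F.binary_cross_entropy_with_logits(student_logits / T, soft_targets)
    return alpha * distill - (1 - alpha) * student_elbo

# Toy usage with random tensors standing in for model outputs.
s = torch.randn(16, 5, requires_grad=True)
t = torch.randn(16, 5)
elbo = torch.tensor(-3.2)
loss = distillation_loss(s, t, elbo)
loss.backward()
print(float(loss))
```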
Know Thy Neighbors: A Graph Based Approach for Effective Sensor-Based Human Activity Recognition in Smart Homes
results: Experiments on CASAS datasets show that the graph-guided neural network outperforms the state-of-the-art method for HAR in smart homes across multiple datasets and by large margins, bringing HAR systems closer to real-world applications.
Abstract
There has been a resurgence of applications focused on Human Activity Recognition (HAR) in smart homes, especially in the field of ambient intelligence and assisted living technologies. However, such applications present numerous significant challenges to any automated analysis system operating in the real world, such as variability, sparsity, and noise in sensor measurements. Although state-of-the-art HAR systems have made considerable strides in addressing some of these challenges, they especially suffer from a practical limitation: they require successful pre-segmentation of continuous sensor data streams before automated recognition, i.e., they assume that an oracle is present during deployment, which is capable of identifying time windows of interest across discrete sensor events. To overcome this limitation, we propose a novel graph-guided neural network approach that performs activity recognition by learning explicit co-firing relationships between sensors. We accomplish this by learning a more expressive graph structure representing the sensor network in a smart home, in a data-driven manner. Our approach maps discrete input sensor measurements to a feature space through the application of attention mechanisms and hierarchical pooling of node embeddings. We demonstrate the effectiveness of our proposed approach by conducting several experiments on CASAS datasets, showing that the resulting graph-guided neural network outperforms the state-of-the-art method for HAR in smart homes across multiple datasets and by large margins. These results are promising because they push HAR for smart homes closer to real-world applications.
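A minimal single-head sketch of the idea: attention weights between sensor embeddings act as a learned co-firing graph, and pooled node embeddings feed an activity classifier. Layer sizes, the attention form, and the mean-pool readout are illustrative, not the paper's architecture:

```python
# Sketch: learn attention between sensor embeddings (a data-driven
# "co-firing" structure), update node embeddings, pool into one vector.
import torch
import torch.nn as nn

class SensorGraphLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.q = nn.Linear(d, d)
        self.k = nn.Linear(d, d)
        self.v = nn.Linear(d, d)

    def forward(self, x):               # x: (n_sensors, d)
        attn = torch.softmax(self.q(x) @ self.k(x).T / x.shape[-1] ** 0.5, dim=-1)
        return attn @ self.v(x)         # learned co-firing relationships

class HARModel(nn.Module):
    def __init__(self, n_sensors, d=32, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(n_sensors, d)
        self.graph = SensorGraphLayer(d)
        self.head = nn.Linear(d, n_classes)

    def forward(self, sensor_ids):      # sensor_ids: (n_events,) discrete events
        x = self.embed(sensor_ids)
        x = self.graph(x)
        return self.head(x.mean(dim=0)) # pooled readout -> activity logits

model = HARModel(n_sensors=50)
logits = model(torch.randint(0, 50, (30,)))
print(logits.shape)  # torch.Size([10])
```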
Identifying Systems with Symmetries using Equivariant Autoregressive Reservoir Computers
results: The results show that these techniques effectively identify and predictively simulate equivariant nonlinear systems, regardless of whether such systems exhibit chaotic behavior.
Abstract
The investigation reported in this document focuses on identifying systems with symmetries using equivariant autoregressive reservoir computers. General results in structured matrix approximation theory are presented, exploring a two-fold approach. Firstly, a comprehensive examination of generic symmetry-preserving nonlinear time delay embedding is conducted. This involves analyzing time series data sampled from an equivariant system under study. Secondly, sparse least-squares methods are applied to discern approximate representations of the output coupling matrices. These matrices play a pivotal role in determining the nonlinear autoregressive representation of an equivariant system. The structural characteristics of these matrices are dictated by the set of symmetries inherent in the system. The document outlines prototypical algorithms derived from the described techniques, offering insight into their practical applications. Emphasis is placed on their effectiveness in the identification and predictive simulation of equivariant nonlinear systems, regardless of whether such systems exhibit chaotic behavior.
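The two ingredients named above, a nonlinear time delay embedding and a sparse least-squares fit of the output coupling, can be sketched in a few lines. Hard thresholding with refitting is used here as a simple stand-in for the paper's sparse solver, and tanh is an assumed nonlinearity:

```python
# Sketch: (1) delay-embed a sampled time series, (2) fit a sparse
# one-step-ahead output coupling by thresholded least squares.
import numpy as np

def delay_embed(x, lags):
    """x: (T,) series -> (T - lags, lags) matrix; row t holds x[t:t+lags]."""
    return np.column_stack([x[i:len(x) - lags + i] for i in range(lags)])

def fit_coupling(x, lags=5, threshold=0.05, n_iter=5):
    H = np.tanh(delay_embed(x, lags))   # nonlinear delay-embedded features
    Y = x[lags:]                        # one-step-ahead targets
    W, *_ = np.linalg.lstsq(H, Y, rcond=None)
    for _ in range(n_iter):             # threshold small couplings, refit support
        small = np.abs(W) < threshold
        W[small] = 0.0
        keep = ~small
        if keep.any():
            W[keep], *_ = np.linalg.lstsq(H[:, keep], Y, rcond=None)
    return W

x = np.sin(0.3 * np.arange(500)) + 0.01 * np.random.default_rng(2).normal(size=500)
W = fit_coupling(x)
print("nonzero couplings:", np.count_nonzero(W), "of", W.size)
```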
Investigating the Impact of Weight Sharing Decisions on Knowledge Transfer in Continual Learning
results: The study finds that task complexity and similarity determine the optimal weight sharing decisions; by sharing in accordance with these decisions, task accuracy improves over other sharing choices in CL.
Abstract
Continual Learning (CL) has generated attention as a method of avoiding Catastrophic Forgetting (CF) in the sequential training of neural networks, improving network efficiency and adaptability to different tasks. Additionally, CL serves as an ideal setting for studying network behavior and Forward Knowledge Transfer (FKT) between tasks. Pruning methods for CL train subnetworks to handle the sequential tasks, which allows us to take a structured approach to investigating FKT. Sharing prior subnetworks' weights leverages past knowledge for the current task through FKT. Understanding which weights to share is important as sharing all weights can yield sub-optimal accuracy. This paper investigates how different sharing decisions affect the FKT between tasks. Through this lens we demonstrate how task complexity and similarity influence the optimal weight sharing decisions, giving insights into the relationships between tasks and helping inform decision making in similar CL methods. We implement three sequential datasets designed to emphasize variation in task complexity and similarity, reporting results for both ResNet-18 and VGG-16. By sharing in accordance with the decisions supported by our findings, we show that we can improve task accuracy compared to other sharing decisions.
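The sharing decision being studied can be made concrete with masks. In the sketch below, each task owns a binary mask over a shared weight tensor, and a new task reuses (and freezes) the weights of a chosen subset of prior subnetworks; the selection rule and the fraction of newly claimed weights are illustrative inputs, since which tasks to share from is precisely the decision the paper analyzes:

```python
# Sketch of mask-based weight sharing in pruning-style continual learning.
import torch

def build_task_mask(weight, prior_masks, share_from, new_fraction=0.1):
    """weight: (out, in) shared tensor. prior_masks: list of 0/1 masks.
    share_from: indices of prior tasks whose weights this task reuses."""
    shared = torch.zeros_like(weight, dtype=torch.bool)
    for i in share_from:
        shared |= prior_masks[i].bool()           # reuse these frozen weights
    used = torch.stack([m.bool() for m in prior_masks]).any(0) \
        if prior_masks else shared
    free = ~used                                  # weights no task has claimed
    # Claim a few currently-unused weights for task-specific training.
    n_new = int(new_fraction * free.sum())
    idx = torch.nonzero(free.flatten()).flatten()[:n_new]
    new = torch.zeros_like(shared).flatten()
    new[idx] = True
    return shared, new.view_as(shared)            # frozen-shared vs trainable

w = torch.randn(64, 64)
m1 = (torch.rand_like(w) < 0.2).float()           # task 1's subnetwork mask
shared, trainable = build_task_mask(w, [m1], share_from=[0])
print(int(shared.sum()), int(trainable.sum()))
```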
Network Wide Evacuation Traffic Prediction in a Rapidly Intensifying Hurricane from Traffic Detectors and Facebook Movement Data: A Deep Learning Approach
results: Trained on regular-period (May-August) data, the model achieves 95% accuracy (RMSE = 356) but underperforms during the evacuation period, with 55% accuracy (RMSE = 1084). With transfer learning, using the pretrained model plus additional evacuation-related features, accuracy rises to 89% (RMSE = 514); adding Facebook movement data further reduces the RMSE to 393 and raises accuracy to 93%.
Abstract
Traffic prediction during hurricane evacuation is essential for optimizing the use of transportation infrastructures. It can reduce evacuation time by providing information on future congestion in advance. However, evacuation traffic prediction can be challenging, as evacuation traffic patterns are significantly different from regular-period traffic. A data-driven traffic prediction model is developed in this study by utilizing traffic detector and Facebook movement data during Hurricane Ian, a rapidly intensifying hurricane. We select 766 traffic detectors from Florida's 4 major interstates to collect traffic features. Additionally, we use Facebook movement data collected during Hurricane Ian's evacuation period. The deep-learning model is first trained on regular period (May-August 2022) data to understand regular traffic patterns and then Hurricane Ian's evacuation period data is used as test data. The model achieves 95% accuracy (RMSE = 356) during regular period, but it underperforms with 55% accuracy (RMSE = 1084) during the evacuation period. Then, a transfer learning approach is adopted where a pretrained model is used with additional evacuation-related features to predict evacuation period traffic. After transfer learning, the model achieves 89% accuracy (RMSE = 514). Adding Facebook movement data further reduces the model's RMSE value to 393 and increases accuracy to 93%. The proposed model can forecast traffic up to 6 hours in advance. Evacuation traffic management officials can use the developed traffic prediction model to anticipate future traffic congestion in advance and take proactive measures to reduce delays during evacuation.
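The transfer learning step can be sketched as freezing the pretrained recurrent layers and fine-tuning a new head that also consumes evacuation-related features. The architecture, layer sizes, and the concatenation design below are assumptions for illustration, not the paper's exact model:

```python
# Sketch: LSTM pretrained on regular-period traffic, reused for evacuation
# traffic by freezing the recurrent layers and fine-tuning a new head that
# also takes extra features (e.g. Facebook movement counts).
import torch
import torch.nn as nn

class TrafficLSTM(nn.Module):
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, extra=None):
        h, _ = self.lstm(x)
        z = h[:, -1]                      # last time step's representation
        if extra is not None:
            z = torch.cat([z, extra], dim=-1)
        return self.head(z)

model = TrafficLSTM(n_features=8)
# ... pretrain on regular-period (May-August) data here ...

# Transfer: freeze the recurrent layers, replace the head so it accepts
# 4 extra evacuation features, then fine-tune on evacuation-period data.
for p in model.lstm.parameters():
    p.requires_grad = False
model.head = nn.Linear(64 + 4, 1)

x = torch.randn(32, 24, 8)                # 32 samples, 24 time steps
extra = torch.randn(32, 4)                # evacuation features per sample
pred = model(x, extra)
print(pred.shape)  # torch.Size([32, 1])
```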
Spatial Bayesian Neural Networks
methods: The paper introduces a spatial "embedding layer" into a Bayesian neural network, and possibly spatially-varying network parameters, to tailor the network to a spatial setting; several variants of SBNNs are proposed, along with ways to use them to represent common spatial processes.
results: SBNNs match the finite-dimensional distribution of a target process at a selected grid better than conventional BNNs of similar complexity, and a single SBNN can represent a variety of spatial processes often used in practice, such as Gaussian processes and lognormal processes; tools for making inference with SBNNs are also discussed.
Abstract
Statistical models for spatial processes play a central role in statistical analyses of spatial data. Yet, it is the simple, interpretable, and well understood models that are routinely employed even though, as is revealed through prior and posterior predictive checks, these can poorly characterise the spatial heterogeneity in the underlying process of interest. Here, we propose a new, flexible class of spatial-process models, which we refer to as spatial Bayesian neural networks (SBNNs). An SBNN leverages the representational capacity of a Bayesian neural network; it is tailored to a spatial setting by incorporating a spatial "embedding layer" into the network and, possibly, spatially-varying network parameters. An SBNN is calibrated by matching its finite-dimensional distribution at locations on a fine gridding of space to that of a target process of interest. That process could be one that is easy to simulate from, or one from which we have many realisations. We propose several variants of SBNNs, most of which are able to match the finite-dimensional distribution of the target process at the selected grid better than conventional BNNs of similar complexity. We also show that a single SBNN can be used to represent a variety of spatial processes often used in practice, such as Gaussian processes and lognormal processes. We briefly discuss the tools that could be used to make inference with SBNNs, and we conclude with a discussion of their advantages and limitations.
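The spatial embedding layer is the distinguishing architectural piece. The sketch below uses fixed sinusoidal basis functions of the coordinates and deterministic weights for brevity; in an actual SBNN the weights would carry prior distributions and the network would be calibrated against a target process on a grid:

```python
# Sketch of a spatial "embedding layer": map 2-D locations to sin/cos
# features at several frequencies before they enter the network.
import torch
import torch.nn as nn

class SpatialEmbedding(nn.Module):
    def __init__(self, n_freqs=4):
        super().__init__()
        self.freqs = 2.0 ** torch.arange(n_freqs)   # fixed frequencies

    def forward(self, s):                # s: (n, 2) spatial coordinates
        angles = s[:, None, :] * self.freqs[None, :, None]    # (n, F, 2)
        feats = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return feats.flatten(1)          # (n, 4 * n_freqs)

net = nn.Sequential(SpatialEmbedding(), nn.Linear(16, 64),
                    nn.Tanh(), nn.Linear(64, 1))
s = torch.rand(100, 2)                   # locations on the unit square
print(net(s).shape)                      # torch.Size([100, 1])
```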
Soft Matching Distance: A metric on neural representations that captures single-neuron tuning
for: This paper aims to develop a stricter notion of representational (dis)similarity that requires individual neuron matching across networks, and to generalize this metric to compare networks with different sizes.
methods: The paper uses optimal transport theory to derive a natural generalization of the distance metric based on “soft” permutations, which is symmetric, satisfies the triangle inequality, and can be interpreted as a Wasserstein distance between two empirical distributions.
results: The proposed metric avoids counter-intuitive outcomes suffered by alternative approaches and captures complementary geometric insights into neural representations that are entirely missed by rotation-invariant metrics.Abstract
Common measures of neural representational (dis)similarity are designed to be insensitive to rotations and reflections of the neural activation space. Motivated by the premise that the tuning of individual units may be important, there has been recent interest in developing stricter notions of representational (dis)similarity that require neurons to be individually matched across networks. When two networks have the same size (i.e. same number of neurons), a distance metric can be formulated by optimizing over neuron index permutations to maximize tuning curve alignment. However, it is not clear how to generalize this metric to measure distances between networks with different sizes. Here, we leverage a connection to optimal transport theory to derive a natural generalization based on "soft" permutations. The resulting metric is symmetric, satisfies the triangle inequality, and can be interpreted as a Wasserstein distance between two empirical distributions. Further, our proposed metric avoids counter-intuitive outcomes suffered by alternative approaches, and captures complementary geometric insights into neural representations that are entirely missed by rotation-invariant metrics.
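For equal-sized networks, the metric reduces to a hard matching, which is easy to compute. The sketch below uses the Hungarian algorithm over pairwise tuning-curve distances; for networks of different sizes, the same cost matrix would go to a general optimal transport solver (e.g. ot.emd2 from the POT library) with uniform marginal weights, yielding the "soft" permutation:

```python
# Sketch of the equal-size special case: the optimal "soft" permutation
# reduces to a hard neuron matching, computable with linear assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def matching_distance(X, Y):
    """X: (n_neurons, n_stimuli) tuning curves of net A; Y: same for net B.
    Returns the minimal root-mean-squared tuning mismatch over matchings."""
    cost = cdist(X, Y, metric="sqeuclidean")   # pairwise tuning distances
    rows, cols = linear_sum_assignment(cost)   # optimal neuron matching
    return np.sqrt(cost[rows, cols].mean())

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 200))                 # 50 neurons, 200 stimuli
Y = X[rng.permutation(50)] + 0.1 * rng.normal(size=(50, 200))
print(matching_distance(X, Y))                 # small: Y is permuted, noisy X
```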