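results: The study shows that selecting features with mutual information and other methods improves classifier performance, with the best results obtained using only 25% and 50% of the input features. These findings help improve computer security against malware attacks.
Abstract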
Malware poses a significant security risk to individuals, organizations, and critical infrastructure by compromising systems and data. Leveraging memory dumps that offer snapshots of computer memory can aid the analysis and detection of malicious content, including malware. To improve the efficacy and address privacy concerns in malware classification systems, feature selection can play a critical role because it identifies the most relevant features and thus minimizes the amount of data fed to classifiers. In this study, we employ three feature selection approaches to identify significant features from memory content and use them with a diverse set of classifiers to enhance the performance and privacy of the classification task. Comprehensive experiments are conducted across three levels of malware classification tasks: i) binary-level benign or malware classification, ii) malware type classification (including Trojan horse, ransomware, and spyware), and iii) malware family classification within each family (with varying numbers of classes). Results demonstrate that the feature selection strategy, incorporating mutual information and other methods, enhances classifier performance for all tasks. Notably, selecting only 25% and 50% of input features using Mutual Information and then employing the Random Forest classifier yields the best results. Our findings reinforce the importance of feature selection for malware classification and provide valuable insights for identifying appropriate approaches. By advancing the effectiveness and privacy of malware classification systems, this research contributes to safeguarding against security threats posed by malicious software.
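Mutual-information ranking, the selector the abstract singles out, can be sketched as follows. This is a minimal sketch assuming discrete (or already binned) memory features; the plug-in estimator, the function names, and the rounding rule for "25%" are illustrative choices, not the paper's pipeline. For continuous features, scikit-learn's `mutual_info_classif` provides a nearest-neighbour based estimator instead.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Plug-in estimate (in nats) of I(feature; label) for discrete values."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_feature = Counter(feature)
    p_label = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * math.log(count * n / (p_feature[x] * p_label[y]))
    return mi

def select_top_fraction(X, y, fraction=0.25):
    """Indices (ascending) of the top `fraction` of columns of X ranked by MI with y."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
    k = max(1, round(fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])
```

Selecting 50% instead is `select_top_fraction(X, y, 0.5)`; the returned column indices would then be used to slice the inputs of a downstream classifier such as a random forest.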
Nonparametric active learning for cost-sensitive classification
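results: We prove that the algorithm attains the optimal rate of convergence in terms of the number of interactions, and that, under a generalized version of Tsybakov's noise assumption, its gain over the corresponding passive learning is explicitly characterized.
Abstract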
Cost-sensitive learning is a common type of machine learning problem where different prediction errors incur different costs. In this paper, we design a generic nonparametric active learning algorithm for cost-sensitive classification. Based on the construction of confidence bounds for the expected prediction cost functions of each label, our algorithm sequentially selects the most informative vector points. It then interacts with them by querying only the prediction costs that could be the smallest. We prove that our algorithm attains the optimal rate of convergence in terms of the number of interactions with the feature vector space. Furthermore, under a general version of Tsybakov's noise assumption, the gain over the corresponding passive learning is explicitly characterized by the probability mass of the decision boundary. Additionally, we prove the near-optimality of the obtained upper bounds by providing matching (up to a logarithmic factor) lower bounds.
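The query rule ("query only the prediction costs that could be the smallest") reduces to a comparison of confidence bounds at a single point. A minimal sketch, assuming the per-label lower and upper bounds on expected cost have already been computed by some nonparametric estimator (not shown); the function names are hypothetical.

```python
def candidate_labels(lower, upper):
    """Labels whose expected cost could still be the smallest at one point.

    lower[k] and upper[k] bound the expected cost of predicting label k. Any label
    whose lower bound exceeds the smallest upper bound is (with the confidence of
    the bounds) worse than some other label, so its cost need not be queried.
    """
    best_upper = min(upper)
    return [k for k, lo in enumerate(lower) if lo <= best_upper]

def needs_query(lower, upper):
    """A point is informative only if more than one label remains a candidate."""
    return len(candidate_labels(lower, upper)) > 1
```

A point whose candidate set contains a single label is not informative: its optimal label is already determined, so no cost needs to be queried there.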
Automated Gait Generation For Walking, Soft Robotic Quadrupeds
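results: Experiments show that the method generates translation and rotation gaits that outperform hand-crafted gaits within 4 minutes, and that it enables automated gait generation across different soft robot designs.
Abstract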
Gait generation for soft robots is challenging due to the nonlinear dynamics and high-dimensional input spaces of soft actuators. Limitations in soft robotic control and perception force researchers to hand-craft open-loop controllers for gait sequences, which is a non-trivial process. Moreover, short soft actuator lifespans and natural variations in actuator behavior limit machine learning techniques to settings that can be learned on the same time scales as robot deployment. Lastly, simulation is not always possible, because soft robotic materials are heterogeneous and nonlinear and their dynamics change with wear. We present a sample-efficient, simulation-free method for self-generating soft robot gaits that requires very minimal computation. This technique is demonstrated on a motorized soft robotic quadruped that walks using four legs constructed from 16 "handed shearing auxetic" (HSA) actuators. To manage the dimension of the search space, gaits are composed of two sequential sets of leg motions selected from 7 possible primitives. Pairs of primitives are executed on one leg at a time; we then select the best-performing pair to execute while moving on to subsequent legs. This method, which uses no simulation, sophisticated computation, or user input, consistently generates good translation and rotation gaits in as little as 4 minutes of hardware experimentation, outperforming hand-crafted gaits. This is the first demonstration of completely autonomous gait generation in a soft robot.
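The leg-by-leg search described above can be sketched as a greedy loop. This is a minimal sketch under stated assumptions: primitives are indexed 0 to 6, a "pair" is an ordered pair with repetition allowed (49 candidates per leg), and `evaluate` is a hypothetical callback standing in for a hardware trial that returns, for example, measured displacement.

```python
from itertools import product

def generate_gait(n_legs, n_primitives, evaluate):
    """Greedy, leg-by-leg search over ordered pairs of motion primitives.

    evaluate(gait) -> float scores a partial gait (hypothetical hardware trial);
    gait holds one (first, second) primitive pair per leg tuned so far.
    """
    gait = []
    for _ in range(n_legs):
        best_pair, best_score = None, float("-inf")
        for pair in product(range(n_primitives), repeat=2):
            # Earlier legs keep executing their chosen pairs during this trial.
            score = evaluate(gait + [pair])
            if score > best_score:
                best_pair, best_score = pair, score
        gait.append(best_pair)  # freeze this leg's pair before moving on
    return gait
```

Under these assumptions, 4 legs cost 4 × 49 = 196 trials; the greedy choice gives up optimality over all 49^4 = 5,764,801 combinations in exchange for a search short enough for minutes of hardware time.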
Dynamic DAG Discovery for Interpretable Imitation Learning
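results: Experimental results show that the proposed method accurately captures the knowledge learned by the neural network and improves its interpretability while maintaining high prediction accuracy.
Abstract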
Imitation learning, which learns an agent policy by mimicking expert demonstrations, has shown promising results in many applications such as medical treatment regimes and self-driving vehicles. However, interpreting the control policies learned by the agent remains difficult. The difficulties mainly come from two aspects: 1) agents in imitation learning are usually implemented as deep neural networks, which are black-box models and lack interpretability; 2) the latent causal mechanism behind agents' decisions may vary along the trajectory, rather than staying static throughout time steps. To increase transparency and offer better interpretability of the neural agent, we propose to expose its captured knowledge in the form of a directed acyclic causal graph, with nodes being action and state variables and edges denoting the causal relations behind predictions. Furthermore, we design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. Concretely, we conduct causal discovery from the perspective of Granger causality and propose a self-explainable imitation learning framework, {\method}. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner. After the model is learned, we can obtain the causal relations among state and action variables behind its decisions, exposing the policies it has learned. Experimental results on both synthetic and real-world datasets demonstrate the effectiveness of the proposed {\method} in learning dynamic causal graphs for understanding the decision-making of imitation learning while maintaining high prediction accuracy.
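The abstract frames causal discovery in terms of Granger causality. As a point of reference only, the classical bivariate, lag-1, linear version of that idea asks whether adding the past of x reduces the least-squares error of predicting y from its own past. The sketch below is that textbook comparison without a significance test; it is not the paper's state-dependent neural framework, and the function names are illustrative.

```python
def _rss_restricted(y):
    # Model y[t] = a * y[t-1]; closed-form least squares for a.
    u, target = y[:-1], y[1:]
    a = sum(p * q for p, q in zip(u, target)) / sum(p * p for p in u)
    return sum((q - a * p) ** 2 for p, q in zip(u, target))

def _rss_full(y, x):
    # Model y[t] = a * y[t-1] + b * x[t-1]; solve the 2x2 normal equations.
    u, w, target = y[:-1], x[:-1], y[1:]
    suu = sum(p * p for p in u)
    sww = sum(p * p for p in w)
    suw = sum(p * q for p, q in zip(u, w))
    suy = sum(p * q for p, q in zip(u, target))
    swy = sum(p * q for p, q in zip(w, target))
    det = suu * sww - suw * suw
    a = (suy * sww - swy * suw) / det
    b = (swy * suu - suy * suw) / det
    return sum((t - a * p - b * q) ** 2 for p, q, t in zip(u, w, target))

def granger_improvement(x, y):
    """Fractional drop in residual error of y when past x is added as a predictor."""
    restricted = _rss_restricted(y)
    full = _rss_full(y, x)
    return (restricted - full) / restricted
```

Values near 1 mean the past of x explains most of what y's own past cannot; a state-dependent method would, roughly speaking, let such relations change along the trajectory instead of fitting one graph.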
Prompting Code Interpreter to Write Better Unit Tests on Quixbugs Functions
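results: The study finds that minor changes to the prompt wording do not affect the quality of the generated unit tests. However, Code Interpreter can effectively find mistakes in code, so providing it with runnable code to check the correctness of its outputs is beneficial. The findings suggest that, when prompting models similar to Code Interpreter, supplying the basic information needed to generate unit tests is what matters, while wording-level details are less important.
Abstract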
Unit testing is a commonly-used approach in software engineering to test the correctness and robustness of written code. Unit tests are tests designed to test small components of a codebase in isolation, such as an individual function or method. Although unit tests have historically been written by human programmers, recent advancements in AI, particularly LLMs, have shown corresponding advances in automatic unit test generation. In this study, we explore the effect of different prompts on the quality of unit tests generated by Code Interpreter, a GPT-4-based LLM, on Python functions provided by the Quixbugs dataset, and we focus on prompting due to the ease with which users can make use of our findings and observations. We find that the quality of the generated unit tests is not sensitive to changes in minor details in the prompts provided. However, we observe that Code Interpreter is often able to effectively identify and correct mistakes in code that it writes, suggesting that providing it runnable code to check the correctness of its outputs would be beneficial, even though we find that it is already often able to generate correctly-formatted unit tests. Our findings suggest that, when prompting models similar to Code Interpreter, it is important to include the basic information necessary to generate unit tests, but minor details are not as important.
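To make "the basic information necessary to generate unit tests" concrete, the sketch below shows the shape of test a prompt might request for a QuixBugs-style function. `bitcount` is an assumed example (a correct version written here, not copied from the dataset), and the test cases are illustrative rather than output from Code Interpreter.

```python
import io
import unittest

def bitcount(n):
    """Count the 1-bits of a non-negative integer (Kernighan's method)."""
    count = 0
    while n:
        n &= n - 1  # clear the lowest set bit
        count += 1
    return count

class TestBitcount(unittest.TestCase):
    def test_zero(self):
        self.assertEqual(bitcount(0), 0)

    def test_powers_of_two(self):
        for k in range(10):
            self.assertEqual(bitcount(2 ** k), 1)

    def test_known_values(self):
        self.assertEqual(bitcount(127), 7)
        self.assertEqual(bitcount(128), 1)

    def test_matches_builtin(self):
        for n in range(512):
            self.assertEqual(bitcount(n), bin(n).count("1"))

def run_tests():
    """Run the suite in-process, silencing output; True if every case passes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBitcount)
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.wasSuccessful()
```

Running the suite in-process like this is one way to give a model runnable feedback on the tests it writes.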
Generative Design of inorganic compounds using deep diffusion language models
paper_authors: Rongzhi Dong, Nihang Fu, Dirisuriya M. D. Siriwardane, Jianjun Hu
for: The paper aims to discover new materials with specific functions by leveraging deep learning and chemical knowledge.
methods: The authors use a deep learning-based generative model for material composition and structure design, which combines deep diffusion language models with a template-based crystal structure prediction algorithm. They also use a universal graph neural network-based potential for structure relaxation and density functional theory (DFT) calculations for validation.
results: The authors discovered six new materials with formation energy less than zero, among which four materials (Ti2HfO5, TaNbP, YMoN2, and TaReO4) have an e-above-hull energy of less than 0.3 eV, demonstrating the effectiveness of their approach.
Abstract
Due to the vast chemical space, discovering materials with a specific function is challenging. Chemical formulas must satisfy a set of exacting criteria such as charge neutrality, balanced electronegativity, synthesizability, and mechanical stability. In response to this formidable task, we introduce a deep learning-based generative model for material composition and structure design that learns and exploits explicit and implicit chemical knowledge. Our pipeline first uses deep diffusion language models as the generator of compositions and then applies a template-based crystal structure prediction algorithm to predict their corresponding structures, which are then relaxed using a universal graph neural network-based potential. Density functional theory (DFT) calculations of the formation energies and energy-above-the-hull analysis are used to validate new structures generated through our pipeline. Based on the DFT calculation results, six new materials with formation energy less than zero have been found: Ti2HfO5, TaNbP, YMoN2, TaReO4, HfTiO2, and HfMnO2. Remarkably, among these, four materials, namely Ti2HfO5, TaNbP, YMoN2, and TaReO4, exhibit an e-above-hull energy of less than 0.3 eV. These findings demonstrate the effectiveness of our approach.
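The two screening criteria in the abstract (formation energy below zero, energy above the convex hull below 0.3 eV) can be expressed as a small filter. This sketch assumes the DFT total energy and elemental reference energies are already available; the function names are hypothetical, and the element symbols and numbers in the usage note are placeholders, not values from the paper.

```python
def formation_energy_per_atom(total_energy, composition, reference_energies):
    """E_f = (E_total - sum_i n_i * mu_i) / sum_i n_i.

    total_energy: DFT energy of the cell (eV); composition: element -> atom count
    in that cell; reference_energies: element -> energy per atom (eV) of its
    elemental reference phase.
    """
    n_atoms = sum(composition.values())
    reference = sum(n * reference_energies[el] for el, n in composition.items())
    return (total_energy - reference) / n_atoms

def passes_screen(e_form, e_above_hull, e_form_max=0.0, e_hull_max=0.3):
    """Keep candidates with E_f < 0 and energy above hull < 0.3 eV (thresholds from the text)."""
    return e_form < e_form_max and e_above_hull < e_hull_max
```

For example, a hypothetical two-atom cell with E_total = -10.0 eV and elemental references of -3.0 and -4.0 eV/atom gives E_f = (-10.0 - (-7.0)) / 2 = -1.5 eV/atom.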
Enhancing Mortality Prediction in Heart Failure Patients: Exploring Preprocessing Methods for Imbalanced Clinical Datasets
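paper_authors: Hanif Kia, Mansour Vali, Hadi Sabahi
for: This paper aims to improve the accuracy of one-month mortality prediction for heart failure (HF) patients.
methods: The paper uses a comprehensive preprocessing framework including scaling, outlier processing, and resampling, together with an aware encoding approach for handling missing values.
results: Using the PROVE dataset with appropriate preprocessing techniques and machine learning (ML) algorithms, the paper improves one-month mortality prediction. The preprocessing raises the F1 score and MCC of tree-based models (Random Forest and XGBoost) by about 3.6% and 2.7% on average, showing the effectiveness of the approach on imbalanced clinical data.
Abstract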
Heart failure (HF) is a critical condition in which accurate prediction of mortality plays a vital role in guiding patient management decisions. However, clinical datasets used for mortality prediction in HF often suffer from an imbalanced class distribution, posing significant challenges. In this paper, we explore preprocessing methods for enhancing one-month mortality prediction in HF patients. We present a comprehensive preprocessing framework with scaling, outlier processing, and resampling as key techniques. We also employ an aware encoding approach to effectively handle missing values in clinical datasets. Our study utilizes a comprehensive dataset from the Persian Registry Of cardio Vascular disease (PROVE) with a significant class imbalance. By leveraging appropriate preprocessing techniques and Machine Learning (ML) algorithms, we aim to improve mortality prediction performance for HF patients. The results reveal an average enhancement of approximately 3.6% in F1 score and 2.7% in MCC for tree-based models, specifically Random Forest (RF) and XGBoost (XGB). This demonstrates the effectiveness of our preprocessing approach in handling Imbalanced Clinical Datasets (ICD). Our findings hold promise in guiding healthcare professionals to make informed decisions and improve patient outcomes in HF management.
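Resampling and the two reported metrics can be sketched together. The abstract does not say which resampling method was used, so this sketch assumes plain random oversampling of minority classes; F1 and the Matthews correlation coefficient (MCC) use their standard binary definitions. Function names are illustrative.

```python
import math
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the majority count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - count):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

def f1_and_mcc(y_true, y_pred, positive=1):
    """F1 score and Matthews correlation coefficient for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return f1, mcc
```

Resampling should be applied to the training split only, so that duplicated minority rows do not leak into the evaluation set.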
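results: The study creates JustLMD, a new multimodal dataset containing 4.6 hours of 3D dance motion with accompanying music and English lyrics, and presents a cross-modal diffusion network that generates 3D dance motion conditioned on music and lyrics.
Abstract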
Lyrics often convey information about songs that goes beyond the auditory dimension, enriching the semantic meaning of movements and musical themes. Such insights are important in the dance choreography domain. However, most existing dance synthesis methods focus mainly on music-to-dance generation without considering this semantic information. To fill this gap, we introduce JustLMD, a new multimodal dataset of 3D dance motion with music and lyrics. To the best of our knowledge, this is the first dataset with triplet information including dance motion, music, and lyrics. Additionally, we showcase a cross-modal diffusion-based network designed to generate 3D dance motion conditioned on music and lyrics. The proposed JustLMD dataset encompasses 4.6 hours of 3D dance motion in 1867 sequences, accompanied by musical tracks and their corresponding English lyrics.
The objective function equality property of infoGAN for two-layer network
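results: The study shows that the two objective functions in infoGAN become equivalent as the discriminator and generator sample sizes approach infinity. The equivalence is established using the Rademacher complexity of the discriminator and generator function classes, and is further validated with two-layer discriminator and generator networks that use Lipschitz, non-decreasing activation functions.
Abstract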
Information Maximizing Generative Adversarial Network (infoGAN) can be understood as a minimax problem involving two networks, discriminators and generators, with mutual information functions. The infoGAN incorporates various components, including latent variables, mutual information, and the objective function. This research demonstrates that the two objective functions in infoGAN become equivalent as the discriminator and generator sample sizes approach infinity. This equivalence is established by considering the disparity between the empirical and population versions of the objective function. The bound on this difference is determined by the Rademacher complexity of the discriminator and generator function classes. Furthermore, the use of a two-layer network for both the discriminator and generator, featuring Lipschitz and non-decreasing activation functions, validates this equality.
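For orientation, the original infoGAN game and the generic uniform-deviation bound in terms of Rademacher complexity are written out below. The notation is the common textbook one rather than this paper's, and the bound is the standard result for function classes with values in [0,1], not the paper's theorem; applying it to the log-terms of the objective requires boundedness assumptions (for example, discriminator outputs kept away from 0 and 1, then rescaled).

```latex
% infoGAN: V is the GAN value function, L_I a variational lower bound on the
% mutual information I(c; G(z,c)) using an auxiliary network Q, and \lambda > 0.
\min_{G,Q}\max_{D}\; V_{\mathrm{InfoGAN}}(D,G,Q) = V(D,G) - \lambda\, L_I(G,Q)

V(D,G) = \mathbb{E}_{x\sim P_{\mathrm{data}}}\big[\log D(x)\big]
       + \mathbb{E}_{z\sim p_z,\, c\sim p_c}\big[\log\big(1 - D(G(z,c))\big)\big]

L_I(G,Q) = \mathbb{E}_{c\sim p_c,\, x\sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c)

% Uniform deviation: for a class F of functions with values in [0,1] and
% n i.i.d. samples, with probability at least 1 - \delta,
\sup_{f\in\mathcal{F}} \Big| \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \mathbb{E}\,f(X) \Big|
  \le 2\,\mathfrak{R}_n(\mathcal{F}) + \sqrt{\frac{\log(2/\delta)}{2n}}
```

When the Rademacher complexity vanishes as n grows (for norm-bounded networks with Lipschitz activations it typically scales like O(1/sqrt(n))), the empirical and population objectives coincide in the limit, which is the mechanism the abstract describes.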
ResolvNet: A Graph Convolutional Network with multi-scale Consistency
paper_authors: Christian Koke, Abhishek Saroha, Yuesong Shen, Marvin Eisenberger, Daniel Cremers
for: The paper is written to address the limitations of graph neural networks (GNNs) in propagating information over long distances, particularly in the presence of bottlenecks and strongly connected sub-graphs.
methods: The paper introduces a new graph neural network architecture called ResolvNet, which is based on the mathematical concept of resolvents. The authors claim that ResolvNet is more consistent across multiple scales and outperforms baseline models on many tasks.
results: The authors report extensive experimental results on real-world data that demonstrate the effectiveness of ResolvNet in various tasks, including those with bottlenecks and strongly connected sub-graphs. The results show that ResolvNet outperforms baseline models significantly, and that it is more consistent across multiple scales.
Abstract
It is by now a well-known fact in the graph learning community that the presence of bottlenecks severely limits the ability of graph neural networks to propagate information over long distances. What has so far not been appreciated is that, counter-intuitively, the presence of strongly connected sub-graphs may also severely restrict information flow in common architectures. Motivated by this observation, we introduce the concept of multi-scale consistency. At the node level, this concept refers to the retention of a connected propagation graph even if connectivity varies over a given graph. At the graph level, multi-scale consistency refers to the fact that distinct graphs describing the same object at different resolutions should be assigned similar feature vectors. As we show, neither property is satisfied by popular graph neural network architectures. To remedy these shortcomings, we introduce ResolvNet, a flexible graph neural network based on the mathematical concept of resolvents. We rigorously establish its multi-scale consistency theoretically and verify it in extensive experiments on real-world data, where networks based on the ResolvNet architecture prove expressive, significantly outperforming baselines on many tasks both in and outside the multi-scale setting.
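ResolvNet builds on resolvents of a graph operator. As a minimal sketch, assuming the weighted graph Laplacian L = D - W as that operator and a hypothetical positive parameter `omega`, the resolvent (L + omega I)^{-1} can be applied to a node signal with a direct linear solve; the paper's actual architecture (learned weights, choice of operator, normalization) is not reproduced here.

```python
def laplacian(n, edges):
    """Weighted graph Laplacian L = D - W from (i, j, weight) edges."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A h = b."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    return h

def resolvent_filter(L, x, omega=1.0):
    """Apply (L + omega*I)^{-1} to node signal x; omega > 0 keeps the matrix invertible."""
    n = len(L)
    A = [[L[i][j] + (omega if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, x)
```

Two properties are easy to check on small graphs: since L annihilates the all-ones vector, the output entries sum to sum(x)/omega, and when one edge weight is made very large its two endpoints receive nearly identical outputs, so a tightly connected pair behaves much like a single node.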
On the Stability of Iterative Retraining of Generative Models on their own Data
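results: Experiments empirically validate that the proposed conditions ensure the stability of deep generative models iteratively trained on mixed datasets, across different datasets.
Abstract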
Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is the massive amount of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. This directly implies that future iterations of generative models must contend with the reality that their training data is drawn from both clean data and artificially generated data from past models. In this paper, we develop a framework to rigorously study how training generative models on mixed datasets (of real and synthetic data) affects their stability. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.
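The retraining loop can be illustrated with a toy stand-in for the paper's normalizing flows and diffusion models: a one-dimensional Gaussian "generative model" refit each generation on a mix of real samples and samples from the previous model. This sketch assumes the mixing rule (a fixed `clean_fraction` of real data per generation) and is only an analogy to the paper's setting and theorem.

```python
import random
import statistics

def fit_gaussian(samples):
    """Fit the toy generative model: sample mean and (population) standard deviation."""
    return statistics.fmean(samples), statistics.pstdev(samples)

def iterative_retraining(real_data, clean_fraction, generations, seed=0):
    """Refit the model each generation on clean_fraction real data plus its own samples."""
    rng = random.Random(seed)
    n = len(real_data)
    mu, sigma = fit_gaussian(real_data)  # generation 0 is trained on clean data only
    history = [(mu, sigma)]
    for _ in range(generations):
        n_real = int(clean_fraction * n)
        real = rng.sample(real_data, n_real)
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        mu, sigma = fit_gaussian(real + synthetic)
        history.append((mu, sigma))
    return history
```

Setting `clean_fraction=0.0` gives the fully self-consuming loop for comparison with the mixed-data case.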
An Efficient Algorithm for Clustered Multi-Task Compressive Sensing
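methods: We propose a new algorithm that substantially reduces the time needed for model inference without explicitly computing multiple large covariance matrices. Our approach combines Monte Carlo sampling with iterative linear solvers.
results: Experiments show that, compared with the existing baseline, the algorithm is faster and uses less memory in high dimensions; in some cases it is up to thousands of times faster and an order of magnitude more memory-efficient.
Abstract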
This paper considers clustered multi-task compressive sensing, a hierarchical model that solves multiple compressive sensing tasks by finding clusters of tasks that leverage shared information to mutually improve signal reconstruction. The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions. The main bottleneck involves repeated matrix inversion and log-determinant computation for multiple large covariance matrices. We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute these covariance matrices. Our approach combines Monte Carlo sampling with iterative linear solvers. Our experiments reveal that compared to the existing baseline, our algorithm can be up to thousands of times faster and an order of magnitude more memory-efficient.
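A common way to combine Monte Carlo sampling with iterative linear solvers is to estimate quantities such as diag(Σ), for Σ = A^{-1}, from random probe vectors, solving A s = z with conjugate gradients instead of forming Σ. This sketch shows that generic pattern (a stochastic diagonal estimator in the spirit of Hutchinson's trace estimator, with Rademacher probes); it assumes access only to a matrix-vector product `matvec` and is not claimed to be the paper's exact estimator. Log-determinant terms would need a separate estimator, not shown.

```python
import math
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A given only v -> A v."""
    n = len(b)
    x = [0.0] * n
    r = list(b)
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter or 10 * n):
        if math.sqrt(rs) < tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def estimate_inverse_diagonal(matvec, n, n_probes=100, seed=0):
    """Monte Carlo estimate of diag(A^{-1}) as the mean of z * (A^{-1} z) over Rademacher z."""
    rng = random.Random(seed)
    acc = [0.0] * n
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        s = conjugate_gradient(matvec, z)  # s = A^{-1} z without forming A^{-1}
        acc = [a + zi * si for a, zi, si in zip(acc, z, s)]
    return [a / n_probes for a in acc]
```

Each probe costs a handful of matrix-vector products, so memory stays linear in the dimension rather than quadratic as it would be for a dense covariance matrix.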
Linear Convergence of Pre-Conditioned PI Consensus Algorithm under Restricted Strong Convexity
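results: The authors numerically validate the pre-conditioned PI consensus algorithm and compare it with other distributed convex optimization algorithms. The results show that local pre-conditioning can control the effect of the communication graph and improve the performance of the PI consensus algorithm.
Abstract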
This paper considers solving distributed convex optimization problems in peer-to-peer multi-agent networks. The network is assumed to be synchronous and connected. By using the proportional-integral (PI) control strategy, various algorithms with fixed stepsize have been developed. The earliest among them is the PI consensus algorithm. Using Lyapunov theory, we guarantee exponential convergence of the PI consensus algorithm for restricted strongly convex functions with rate-matching discretization, without requiring convexity of individual local cost functions, for the first time. In order to accelerate the PI consensus algorithm, we incorporate local pre-conditioning in the form of constant positive definite matrices and numerically validate its efficiency compared to the prominent distributed convex optimization algorithms. Unlike classical pre-conditioning, where only the gradients are multiplied by a pre-conditioner, the proposed pre-conditioning modifies both the gradients and the consensus terms, thereby controlling the effect of the communication graph between the agents on the PI consensus algorithm.
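The PI consensus dynamics can be illustrated on scalar quadratic local costs. This sketch assumes one common form of the PI algorithm (a proportional disagreement term plus an integral state v_i), a plain forward-Euler step rather than the paper's rate-matching discretization, and positive scalar gains k_i as a one-dimensional stand-in for the constant positive definite pre-conditioning matrices; like the paper's pre-conditioner, the gains scale both the gradient and the consensus terms. Names and parameter values are illustrative.

```python
def pi_consensus(a, b, edges, step=0.02, iters=20000, gains=None):
    """Forward-Euler PI consensus for local costs f_i(x) = 0.5 * a_i * (x - b_i) ** 2.

    Each agent i keeps an estimate x_i and an integral state v_i:
      x_i <- x_i - step * k_i * (grad f_i(x_i) + sum_j (x_i - x_j) + v_i)
      v_i <- v_i + step * sum_j (x_i - x_j)
    where j ranges over the neighbours of i. The positive gains k_i are a scalar
    stand-in (assumption) for pre-conditioning matrices.
    """
    n = len(a)
    k = gains if gains is not None else [1.0] * n
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    x = [0.0] * n
    v = [0.0] * n  # zero start keeps sum(v) == 0, which pins the equilibrium to the optimum
    for _ in range(iters):
        # Laplacian term (L x)_i from the current iterate, shared by both updates
        disagreement = [sum(x[i] - x[j] for j in neighbours[i]) for i in range(n)]
        x = [x[i] - step * k[i] * (a[i] * (x[i] - b[i]) + disagreement[i] + v[i])
             for i in range(n)]
        v = [v[i] + step * disagreement[i] for i in range(n)]
    return x
```

Because the integral states start at zero and their updates are sums of pairwise differences, sum_i v_i stays zero, so any equilibrium satisfies sum_i grad f_i(x*) = 0, the optimality condition of the global problem. For these quadratic costs that gives x* = sum_i a_i b_i / sum_i a_i.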
Better Situational Graphs by Inferring High-level Semantic-Relational Concepts
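results: On simulated and real datasets, the method infers room entities and their relationships more accurately and more efficiently than the baseline algorithm, and it adds a new semantic concept, "wall", together with its relationship to its wall surfaces.
Abstract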
Recent works on SLAM extend their pose graphs with higher-level semantic concepts and exploit the relationships between them, not only to provide a richer representation of the situation/environment but also to improve the accuracy of its estimation. Concretely, our previous work, Situational Graphs (S-Graphs), a pioneer in jointly leveraging semantic relationships in the factor optimization process, relies on semantic entities such as wall surfaces and rooms, whose relationship is mathematically defined. Nevertheless, extracting these high-level concepts relying exclusively on the lower-level factor graph remains a challenge, and it is currently done with ad-hoc algorithms, which limits the ability to include new semantic-relational concepts. To overcome this limitation, in this work, we propose a Graph Neural Network (GNN) for learning high-level semantic-relational concepts that can be inferred from the low-level factor graph. We demonstrate that we can infer room entities and their relationship to the mapped wall surfaces more accurately and more efficiently than the baseline algorithm. Additionally, to demonstrate the versatility of our method, we provide a new semantic concept, i.e., wall, and its relationship with its wall surfaces. Our proposed method has been integrated into S-Graphs+, and it has been validated in both simulated and real datasets. A docker container with our software will be made available to the scientific community.
Mitigating the Effect of Incidental Correlations on Part-based Learning
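results: The study shows that the method achieves state-of-the-art (SoTA) performance with limited data, and that the learned part representations retain better generalization and interpretability under background shifts and common data corruptions.
Abstract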
A crucial characteristic of intelligent systems is the ability to break complicated problems into smaller reusable components or parts and to adapt to new tasks using these part representations. However, current part-learners struggle with incidental correlations that arise from limited observations of objects, which may appear only in specific arrangements or with specific backgrounds. These incidental correlations may have a detrimental impact on the generalization and interpretability of learned part representations. This study asserts that part-based representations could be more interpretable and generalize better with limited data, employing two innovative regularization methods. The first regularization separates the generative processes of foreground and background information via a unique mixture-of-parts formulation. Structural constraints are imposed on the parts using a weakly-supervised loss, guaranteeing that the mixture-of-parts for foreground and background entails soft, object-agnostic masks. The second regularization takes the form of a distillation loss, ensuring the invariance of the learned parts to the incidental background correlations. Furthermore, we incorporate sparse and orthogonal constraints to facilitate learning high-quality part representations. By reducing the impact of incidental background correlations on the learned parts, we exhibit state-of-the-art (SoTA) performance on few-shot learning tasks on benchmark datasets, including MiniImagenet, TieredImageNet, and FC100. We also demonstrate that the part-based representations acquired through our approach generalize better than existing techniques, even under domain shifts of the background and common data corruption on the ImageNet-9 dataset. The implementation is available on GitHub: https://github.com/GauravBh1010tt/DPViT.git
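The sparse and orthogonal constraints can take many forms. A minimal sketch, assuming parts are represented as plain vectors and using two common generic penalties (squared off-diagonal Gram entries for orthogonality, an L1 norm for sparsity); the weights `lam_orth` and `lam_sparse` are hypothetical, and the paper's exact formulation may differ.

```python
def orthogonality_penalty(parts):
    """Sum of squared off-diagonal Gram entries: zero iff part vectors are mutually orthogonal."""
    total = 0.0
    for i in range(len(parts)):
        for j in range(len(parts)):
            if i != j:
                g = sum(p * q for p, q in zip(parts[i], parts[j]))
                total += g * g
    return total

def sparsity_penalty(parts):
    """L1 norm of all part-vector entries, encouraging sparse parts."""
    return sum(abs(v) for part in parts for v in part)

def part_regularizer(parts, lam_orth=1.0, lam_sparse=0.01):
    """Weighted sum of both penalties, to be added to the main training loss."""
    return lam_orth * orthogonality_penalty(parts) + lam_sparse * sparsity_penalty(parts)
```

Orthogonality pushes different parts to capture non-redundant directions, while the L1 term keeps each part concentrated on few coordinates.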
Harmony World Models: Boosting Sample Efficiency for Model-based Reinforcement Learning
results: Experiments on three visual control domains show that a base MBRL method equipped with Harmony World Models gains 10%-55% in absolute performance.
Abstract
Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of more efficient MBRL by harmonizing the interference between observation and reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment through observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating in implicit MBRL and adept at learning task-centric dynamics, are inadequate for sample-efficient learning without richer learning signals. Capitalizing on these insights and discoveries, we propose a simple yet effective method, Harmony World Models (HarmonyWM), that introduces a lightweight harmonizer to maintain a dynamic equilibrium between the two tasks in world model learning. Our experiments on three visual control domains show that the base MBRL method equipped with HarmonyWM gains 10%-55% absolute performance boosts.
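The abstract describes the harmonizer only as a lightweight mechanism that keeps observation and reward modeling in balance. To illustrate the general idea of learnable loss balancing, the sketch below swaps in the well-known log-scale weighting sum_i exp(-s_i) L_i + s_i from uncertainty-based multi-task weighting; this is an assumption for illustration, not the paper's exact harmonizer.

```python
import math

def balanced_loss(losses, log_scales):
    """sum_i exp(-s_i) * L_i + s_i: each task loss is rescaled by a learnable weight exp(-s_i)."""
    return sum(math.exp(-s) * L + s for L, s in zip(losses, log_scales))

def optimal_log_scales(losses):
    """Minimiser of balanced_loss over s for fixed positive losses: s_i = log L_i."""
    return [math.log(L) for L in losses]
```

Setting the derivative -exp(-s_i) L_i + 1 to zero gives exp(-s_i) = 1/L_i, so at the optimum each weighted loss equals 1 and a task with a larger raw loss (for example, observation reconstruction) is down-weighted in proportion.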
DURENDAL: Graph deep learning framework for temporal heterogeneous networks
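results: Experiments on future link prediction over four temporal heterogeneous network datasets show that DURENDAL has stronger predictive power than existing solutions, and the paper also demonstrates the effectiveness of its model design.
Abstract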
Temporal heterogeneous networks (THNs) are evolving networks that characterize many real-world applications such as citation and event networks, recommender systems, and knowledge graphs. Although various Graph Neural Networks (GNNs) have been successfully applied to dynamic graphs, most of them only support homogeneous graphs or have model designs heavily tailored to specific THN prediction tasks. Furthermore, current standard graph benchmark datasets lack temporal heterogeneous networked data. Hence, in this work, we propose DURENDAL, a graph deep learning framework for THNs. DURENDAL makes it easy to repurpose any heterogeneous graph learning model for evolving networks by combining design principles from snapshot-based and multirelational message-passing graph learning models. We introduce two different schemes to update embedding representations for THNs, discussing the strengths and weaknesses of both strategies. We also extend the set of benchmarks for THNs by introducing two novel high-resolution temporal heterogeneous graph datasets derived from an emerging Web3 platform and a well-established e-commerce website. Overall, we evaluate the framework on four temporal heterogeneous network datasets on future link prediction tasks, in an evaluation setting that takes into account the evolving nature of the data. Experiments show the predictive power of DURENDAL compared to current solutions for evolving and dynamic graphs, as well as the effectiveness of its model design.
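To make "combining snapshot-based and multirelational message passing" concrete, here is a toy sketch of one snapshot step. Messages are mean-aggregated separately per relation type and summed across relations. The result is then blended with the node's previous embedding by a convex combination. The aggregation, the blending rule, the `alpha` parameter, and the function name are all hypothetical and are not DURENDAL's actual update schemes.

```python
from collections import defaultdict


def snapshot_update(prev_emb, edges, alpha=0.5):
    """One hypothetical snapshot step of relation-aware message passing.

    prev_emb: dict mapping node -> embedding (list of floats) from the previous snapshot.
    edges: (src, relation, dst) triples observed in the current snapshot.
    alpha: hypothetical weight on new messages versus the previous embedding.
    """
    # Group incoming messages by (destination, relation) so relation types stay separate.
    per_rel = defaultdict(list)
    for src, rel, dst in edges:
        per_rel[(dst, rel)].append(prev_emb[src])

    # Mean-aggregate within each relation, then sum across relations.
    aggregated = {}
    for (dst, _rel), msgs in per_rel.items():
        mean = [sum(col) / len(msgs) for col in zip(*msgs)]
        base = aggregated.get(dst, [0.0] * len(mean))
        aggregated[dst] = [b + m for b, m in zip(base, mean)]

    # Temporal update: blend with the previous embedding; nodes without messages keep theirs.
    updated = {}
    for node, prev in prev_emb.items():
        if node in aggregated:
            updated[node] = [(1 - alpha) * p + alpha * a for p, a in zip(prev, aggregated[node])]
        else:
            updated[node] = list(prev)
    return updated


emb = {"user": [1.0, 0.0], "item_a": [0.0, 1.0], "item_b": [0.0, 3.0]}
snapshot = [("item_a", "buys", "user"), ("item_b", "buys", "user"), ("item_a", "views", "user")]
print(snapshot_update(emb, snapshot))
```

Keeping relations separate before summing lets a frequent relation type avoid drowning out a rare one. In a learned model, each relation would typically also get its own weight matrix.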
Anomaly Detection in Power Generation Plants with Generative Adversarial Networks
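results: The study finds that GAN-based anomaly detection can achieve high accuracy, especially when large amounts of data are available. The model reached an accuracy of 98.99%, far above the 66.45% obtained without data augmentation.
Abstract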
Anomaly detection is a critical task that involves identifying data points that deviate from a predefined pattern, which is useful for fraud detection and related activities. Various techniques are employed for anomaly detection, but recent research indicates that deep learning methods, with their ability to discern intricate data patterns, are well suited to this task. This study explores the use of Generative Adversarial Networks (GANs) for anomaly detection in power generation plants. The dataset used in this investigation comprises fuel consumption records from power generation plants operated by a telecommunications company. The data was initially collected in response to observed irregularities in the fuel consumption patterns of the generating sets located at the company's base stations. The dataset was divided into anomalous and normal data points based on specific variables, with 64.88% classified as normal and 35.12% as anomalous. A feature importance analysis using a random forest classifier revealed that Running Time Per Day had the highest relative importance. A GAN model was trained and fine-tuned both with and without data augmentation, the goal of augmentation being to increase the dataset size and thereby improve performance. The generator consisted of five dense layers with the tanh activation function, while the discriminator comprised six dense layers, each paired with a dropout layer to prevent overfitting. With data augmentation, the model achieved an accuracy of 98.99%, compared with 66.45% without it. This shows that the model classified data points into normal and anomalous categories nearly perfectly, with the augmented data significantly improving the GAN's anomaly detection performance. Consequently, this study recommends GANs for effective anomaly detection, particularly when large datasets are available.
Memorization with neural nets: going beyond the worst case
results: Extensive numerical experiments demonstrate the algorithm's effectiveness, and the insights are linked back to the theoretical results.
Abstract
In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.
Mathematical structure of perfect predictive reservoir computing for autoregressive type of time series data
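results: The paper shows that, with suitably structured input and recurrent weight matrices, RC neural networks achieve perfect prediction for AR-type time series data. RC's low training cost, high speed, and high computational power motivate the study.
Abstract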
Reservoir Computing (RC) is a type of recurrent neural network (RNN), and RC is expected to be used more and more widely for building prediction models for time-series data, owing to its low training cost, high speed, and high computational power. However, research into the mathematical structure of RC neural networks has only recently begun. Bollt (2021) clarified the necessity of the autoregressive (AR) model for gaining insight into the mathematical structure of RC neural networks, and indicated that the Wold decomposition theorem is a milestone for understanding them. Keeping this celebrated result in mind, in this paper we clarify hidden structures of the input and recurrent weight matrices in RC neural networks, and show that such structures attain perfect prediction for the AR type of time series data.
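A toy construction shows why structured input and recurrent matrices can give perfect AR prediction. It is a hypothetical illustration, not the paper's specific structure. The input vector is taken as e_1 and the recurrent matrix as a lower shift matrix, with no nonlinearity. The reservoir state then holds the last p inputs exactly, so a linear readout equal to the AR(p) coefficients forecasts a noise-free AR series without error. On such data, fitting the readout by least squares would recover those coefficients. All function names are hypothetical.

```python
def make_shift_reservoir(p):
    """Input vector e_1 and a p x p lower shift matrix (a hypothetical construction)."""
    w_in = [1.0] + [0.0] * (p - 1)
    w = [[1.0 if i == j + 1 else 0.0 for j in range(p)] for i in range(p)]
    return w_in, w


def reservoir_step(state, u, w_in, w):
    """Linear reservoir update x_t = W x_{t-1} + w_in * u_t (no nonlinearity)."""
    p = len(state)
    return [sum(w[i][j] * state[j] for j in range(p)) + w_in[i] * u for i in range(p)]


def predict_ar(series, coeffs):
    """One-step-ahead predictions; preds[t] is the forecast of series[t + 1]."""
    w_in, w = make_shift_reservoir(len(coeffs))
    state = [0.0] * len(coeffs)
    preds = []
    for u in series:
        # state = [u_t, u_{t-1}, ..., u_{t-p+1}] (zero-padded at the start)
        state = reservoir_step(state, u, w_in, w)
        preds.append(sum(a * x for a, x in zip(coeffs, state)))  # linear readout
    return preds


# Noise-free AR(2): u_t = 0.5 u_{t-1} + 0.3 u_{t-2}
coeffs = [0.5, 0.3]
series = [1.0, 2.0]
for _ in range(10):
    series.append(coeffs[0] * series[-1] + coeffs[1] * series[-2])
preds = predict_ar(series, coeffs)
print(max(abs(preds[t] - series[t + 1]) for t in range(1, len(series) - 1)))
```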
SpatialRank: Urban Event Ranking with NDCG Optimization on Spatiotemporal Data
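results: Comprehensive experiments on three real-world datasets show that SpatialRank effectively identifies the riskiest locations for crimes and traffic accidents and outperforms state-of-the-art methods in NDCG by up to 12.7%.
Abstract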
The problem of urban event ranking aims at predicting the top-k riskiest locations of future events such as traffic accidents and crimes. This problem is of fundamental importance to public safety and urban administration, especially when resources are limited. The problem is, however, challenging due to complex and dynamic spatio-temporal correlations between locations, the uneven spatial distribution of urban events, and the difficulty of correctly ranking nearby locations with similar features. Prior work on event forecasting mostly aims at accurately predicting the actual risk scores or event counts for all locations. Rankings obtained this way usually have low quality due to prediction errors. Learning-to-rank methods directly optimize measures such as Normalized Discounted Cumulative Gain (NDCG), but cannot handle the spatiotemporal autocorrelation among locations. In this paper, we bridge the gap by proposing a novel spatial event ranking approach named SpatialRank. SpatialRank features adaptive graph convolution layers that dynamically learn the spatiotemporal dependencies across locations from data. In addition, the model optimizes, through surrogates, a hybrid NDCG loss with a spatial component to better rank neighboring spatial locations. We design an importance-sampling algorithm with spatial filtering to effectively evaluate the loss during training. Comprehensive experiments on three real-world datasets demonstrate that SpatialRank can effectively identify the riskiest locations for crimes and traffic accidents and outperforms state-of-the-art methods in terms of NDCG by up to 12.7%.
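For reference, NDCG@k scores a ranking by its discounted gain relative to the ideal ordering. The sketch below uses the linear-gain variant, where relevance at rank r is divided by log2(r + 1). Some papers use the exponential gain 2^rel - 1 instead, and which variant SpatialRank uses is not stated here. The event counts and predictions are made-up illustration data, and the function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """DCG@k with linear gain: relevance at 1-indexed rank r is divided by log2(r + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(predicted_scores, true_relevance, k):
    """NDCG@k of the ranking induced by predicted_scores, judged by true_relevance."""
    order = sorted(range(len(predicted_scores)), key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal = dcg_at_k(sorted(true_relevance, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


# Hypothetical example: true event counts at five locations and two risk predictions.
counts = [5, 0, 3, 1, 0]
good = [0.9, 0.1, 0.7, 0.3, 0.2]     # ranks locations in the ideal order
swapped = [0.7, 0.1, 0.9, 0.3, 0.2]  # swaps the top two locations
print(ndcg_at_k(good, counts, 3), ndcg_at_k(swapped, counts, 3))
```

Because NDCG depends only on the order of locations, a model can score well even when its predicted counts are off, as long as the ranking is right. This is the motivation for optimizing it directly rather than via count regression.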
paper_authors: Zhaonan Qu, Alfred Galichon, Johan Ugander
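for: This paper studies choice and ranking models based on Luce's choice axiom, including the Bradley-Terry-Luce and Plackett-Luce models.
methods: The paper uses Sinkhorn's algorithm, a classic method for matrix balancing problems, to solve maximum likelihood estimation problems under Luce's choice axiom.
results: The paper proves global linear convergence of Sinkhorn's algorithm for non-negative matrices and characterizes this global rate in terms of the algebraic connectivity of the bipartite graph constructed from the data. It also derives the sharp asymptotic rate of linear convergence, generalizing a classic result of Knight (2008) through a more explicit analysis that exploits an intrinsic orthogonality structure.
Abstract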
For a broad class of choice and ranking models based on Luce's choice axiom, including the Bradley--Terry--Luce and Plackett--Luce models, we show that the associated maximum likelihood estimation problems are equivalent to a classic matrix balancing problem with target row and column sums. This perspective opens doors between two seemingly unrelated research areas, and allows us to unify existing algorithms in the choice modeling literature as special instances or analogs of Sinkhorn's celebrated algorithm for matrix balancing. We draw inspiration from these connections and resolve important open problems in the study of Sinkhorn's algorithm. We first prove the global linear convergence of Sinkhorn's algorithm for non-negative matrices whenever finite solutions to the matrix balancing problem exist. We characterize this global rate of convergence in terms of the algebraic connectivity of the bipartite graph constructed from data. Next, we derive the sharp asymptotic rate of linear convergence, which generalizes a classic result of Knight (2008), but with a more explicit analysis that exploits an intrinsic orthogonality structure. To our knowledge, these are the first quantitative linear convergence results for Sinkhorn's algorithm for general non-negative matrices and positive marginals. The connections we establish in this paper between matrix balancing and choice modeling could help motivate further transmission of ideas and interesting results in both directions.
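According to the abstract, the maximum likelihood problems for Luce-type models are equivalent to a matrix balancing problem with target row and column sums. The sketch below shows only the classic Sinkhorn iteration for that balancing problem. It alternately rescales rows and columns of a non-negative matrix until both marginals match their targets. The paper's reduction from choice data to a matrix is not reproduced, and the function name is hypothetical.

```python
def sinkhorn(a, row_sums, col_sums, max_iters=1000, tol=1e-12):
    """Scale a non-negative matrix as diag(r) @ A @ diag(c) to hit target row/column sums.

    Assumes sum(row_sums) == sum(col_sums), no all-zero rows or columns, and that
    a finite solution exists (e.g. A strictly positive).
    """
    n, m = len(a), len(a[0])
    r = [1.0] * n  # row scaling factors
    c = [1.0] * m  # column scaling factors
    for _ in range(max_iters):
        # Row step: make every row sum match its target exactly.
        for i in range(n):
            r[i] = row_sums[i] / sum(a[i][j] * c[j] for j in range(m))
        # Column step: make every column sum match its target exactly.
        for j in range(m):
            c[j] = col_sums[j] / sum(r[i] * a[i][j] for i in range(n))
        # Columns now match; stop once the rows also match to within tol.
        row_err = max(
            abs(r[i] * sum(a[i][j] * c[j] for j in range(m)) - row_sums[i]) for i in range(n)
        )
        if row_err < tol:
            break
    return [[r[i] * a[i][j] * c[j] for j in range(m)] for i in range(n)]


balanced = sinkhorn([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0], [1.0, 1.0])
print([round(sum(row), 6) for row in balanced])        # row sums
print([round(sum(col), 6) for col in zip(*balanced)])  # column sums
```

The row error after each column step is the natural convergence measure. The linear convergence results summarized above describe how fast this error shrinks per iteration.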
Learning State-Augmented Policies for Information Routing in Communication Networks
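for: This paper studies information routing in large-scale communication networks, which can be viewed as a constrained statistical learning problem with access to only local information.
methods: The paper proposes a novel State Augmentation (SA) strategy that uses graph neural network (GNN) architectures, applying graph convolutions over the links of the communication network, to maximize the aggregate information at source nodes.
results: Experiments show that the proposed method effectively routes the desired information to destination nodes, with evaluation performed on real-time network topologies. Numerical simulations show that the method trains the GNN parameterization better than baseline algorithms.
Abstract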
This paper examines the problem of information routing in a large-scale communication network, which can be formulated as a constrained statistical learning problem with access to only local information. We delineate a novel State Augmentation (SA) strategy to maximize the aggregate information at source nodes using graph neural network (GNN) architectures, by deploying graph convolutions over the topological links of the communication network. The proposed technique leverages only the local information available at each node and efficiently routes desired information to the destination nodes. We use an unsupervised learning procedure to convert the output of the GNN architecture into optimal information routing strategies. In the experiments, we evaluate our algorithms on real-time network topologies. Numerical simulations show the improved performance of the proposed method in training a GNN parameterization compared with baseline algorithms.
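The reliance on purely local information follows from how graph convolutions work. A common form in the GNN literature is the polynomial graph filter y = sum_k h_k S^k x, where S is the network's adjacency (shift) matrix. Each multiplication by S exchanges values only between linked nodes. The sketch below assumes this form, though whether the paper's GNN layers use exactly this filter is not stated here. It omits the state augmentation and routing parts entirely, and the function name is hypothetical.

```python
def graph_filter(shift, signal, taps):
    """Polynomial graph filter y = sum_k taps[k] * S^k x.

    Each multiplication by the shift matrix S only exchanges values along links,
    so node i's output depends only on nodes within len(taps) - 1 hops.
    """
    n = len(signal)
    out = [0.0] * n
    z = list(signal)  # z holds S^k x, starting from k = 0
    for h in taps:
        out = [o + h * zi for o, zi in zip(out, z)]
        z = [sum(shift[i][j] * z[j] for j in range(n)) for i in range(n)]  # one local exchange
    return out


# Path network 0 - 1 - 2 with a unit signal at node 0.
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(graph_filter(adjacency, [1.0, 0.0, 0.0], [1.0, 1.0]))  # one hop: node 2 stays at 0
```

With two taps, the signal at node 0 reaches only its neighbor, node 1. Adding a third tap would let it reach node 2, which is how filter order bounds each node's communication radius.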
Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning
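methods: The paper proposes a framework called Resource-aware Federated Foundation Models (RaFFM), which uses specialized model compression algorithms, such as salient parameter prioritization and high-performance subnetwork extraction, to adapt to the resource limits of edge devices. These algorithms allow the size of foundation models to be dynamically adjusted to different resource environments in edge FL systems.
results: Experiments show that RaFFM reduces resource usage while maintaining model performance. Specifically, on natural language processing and computer vision tasks, RaFFM reaches the same performance level as traditional FL methods applied to full-sized foundation models while using fewer resources.
Abstract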
Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data. Simultaneously, foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks. However, integrating FMs into FL presents challenges, primarily due to their substantial size and intensive resource requirements. This is especially true when considering the resource heterogeneity in edge FL systems. We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges. RaFFM introduces specialized model compression algorithms tailored for FL scenarios, such as salient parameter prioritization and high-performance subnetwork extraction. These algorithms enable dynamic scaling of given transformer-based FMs to fit heterogeneous resource constraints at the network edge during both FL's optimization and deployment stages. Experimental results demonstrate that RaFFM shows significant superiority in resource utilization efficiency and uses fewer resources to deploy FMs to FL. Despite the lower resource consumption, target models optimized by RaFFM achieve performance on par with traditional FL methods applied to full-sized FMs. This is evident across tasks in both natural language processing and computer vision domains.
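Salient parameter prioritization and subnetwork extraction can be illustrated on a single dense layer. Keep the output neurons with the highest saliency until a size budget is met. The sketch below assumes saliency is the L1 norm of each neuron's weights, a common magnitude heuristic, and RaFFM's actual criterion may differ. The function name and `keep_ratio` parameter are hypothetical. In a full transformer, the matching input columns of the next layer would also have to be dropped.

```python
def extract_subnetwork(weight_rows, keep_ratio):
    """Keep the most salient output neurons of one dense layer under a size budget.

    weight_rows: one list of incoming weights per output neuron.
    Saliency is assumed to be the L1 norm of each neuron's weights (a hypothetical choice).
    Returns the kept neuron indices (in original order) and the reduced weight rows.
    """
    n_keep = max(1, int(keep_ratio * len(weight_rows)))  # round down to stay within budget
    saliency = [sum(abs(w) for w in row) for row in weight_rows]
    ranked = sorted(range(len(weight_rows)), key=lambda i: saliency[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [weight_rows[i] for i in kept]


layer = [[0.1, 0.1], [1.0, -2.0], [0.5, 0.5], [-3.0, 0.0]]
print(extract_subnetwork(layer, 0.5))
```

Different clients could call this with different `keep_ratio` values to match their resource budgets. The kept indices tell the server where each client's reduced weights sit in the full model.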
A hybrid quantum-classical conditional generative adversarial network algorithm for human-centered paradigm in cloud
for: The paper aims to improve the quantum generative adversarial network (QGAN) algorithm so that it conforms to the human-centered paradigm, and to solve QGAN's problems of random generation and lack of human-computer interaction.
methods: The proposed algorithm, a hybrid quantum-classical conditional generative adversarial network (QCGAN), combines quantum and classical computing to achieve a knowledge-driven human-computer interaction computing mode. The generator uses a parameterized quantum circuit with an all-to-all connected topology, while the discriminator uses a classical neural network.
results: The QCGAN algorithm can effectively converge to the Nash equilibrium point after training and perform human-centered classification generation tasks, as demonstrated on the quantum cloud computing platform using the BAS training set.
Abstract
As an emerging field that aims to bridge the gap between human activities and computing systems, human-centered computing (HCC) in cloud, edge, and fog environments has had a huge impact on artificial intelligence algorithms. The quantum generative adversarial network (QGAN) is considered one of the quantum machine learning algorithms with great application prospects, but it should also be improved to conform to the human-centered paradigm. The generation process of QGAN is relatively random and the generated model does not conform to the human-centered concept, so it is not well suited to real scenarios. To solve these problems, a hybrid quantum-classical conditional generative adversarial network (QCGAN) algorithm is proposed, which is a knowledge-driven human-computer interaction computing mode that can be implemented in the cloud. Inputting artificial conditional information into the generator and discriminator stabilizes the generation process and enables interaction between humans and the computing process. The generator uses a parameterized quantum circuit with an all-to-all connected topology, which facilitates tuning of the network parameters during training. The discriminator uses a classical neural network, which effectively avoids the "input bottleneck" of quantum machine learning. Finally, the BAS training set is used for experiments on a quantum cloud computing platform. The results show that the QCGAN algorithm effectively converges to the Nash equilibrium point after training and performs human-centered classification generation tasks.
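The BAS training set is the bars-and-stripes dataset, a standard toy benchmark for quantum generative models. The image size used in the paper is not stated here, so `n` is left as a parameter. The sketch below enumerates all n x n patterns: stripes fill each row with a single value, and bars fill each column. In a conditional setup, whether a pattern is a bar or a stripe is one natural choice of condition label, though that is an assumption about QCGAN. The function name is hypothetical.

```python
from itertools import product


def bars_and_stripes(n):
    """All n x n bars-and-stripes images, flattened row-major into tuples of 0/1.

    A stripes image fills each row with one value; a bars image fills each column.
    The all-0 and all-1 images are both, so there are 2 * 2**n - 2 distinct patterns.
    """
    patterns = set()
    for bits in product((0, 1), repeat=n):
        patterns.add(tuple(bits[r] for r in range(n) for _ in range(n)))  # stripes (rows)
        patterns.add(tuple(bits[c] for _ in range(n) for c in range(n)))  # bars (columns)
    return sorted(patterns)


for pattern in bars_and_stripes(2):
    print(pattern)
```

For 2 x 2 images, this gives six valid patterns out of 16 possible bitstrings. A generator over four qubits should concentrate its probability mass on those six.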
CausalImages: An R Package for Causal Inference with Earth Observation, Bio-medical, and Social Science Images
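results: The package enables fast and accessible analysis of large-scale image and video data, and provides functions that produce vector summaries (embeddings) of image or video content.
Abstract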
The causalimages R package enables causal inference with image and image sequence data, providing new tools for integrating novel data sources like satellite and bio-medical imagery into the study of cause and effect. One set of functions enables image-based causal inference analyses. For example, one key function decomposes treatment effect heterogeneity by images using an interpretable Bayesian framework. This allows for determining which types of images or image sequences are most responsive to interventions. A second modeling function allows researchers to control for confounding using images. The package also allows investigators to produce embeddings that serve as vector summaries of the image or video content. Finally, infrastructural functions are also provided, such as tools for writing large-scale image and image sequence data as sequentialized byte strings for more rapid image analysis. causalimages therefore opens new capabilities for causal inference in R, letting researchers use informative imagery in substantive analyses in a fast and accessible manner.
Accelerating Non-IID Federated Learning via Heterogeneity-Guided Client Sampling
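results: Experimental results show that in non-IID settings HiCS-FL converges faster and achieves lower training variance than state-of-the-art FL client selection schemes, while remaining adaptable to different heterogeneity scenarios.
Abstract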
Statistical heterogeneity of data present at client devices in a federated learning (FL) system renders the training of a global model in such systems difficult. Particularly challenging are the settings where due to resource constraints only a small fraction of clients can participate in any given round of FL. Recent approaches to training a global model in FL systems with non-IID data have focused on developing client selection methods that aim to sample clients with more informative updates of the model. However, existing client selection techniques either introduce significant computation overhead or perform well only in the scenarios where clients have data with similar heterogeneity profiles. In this paper, we propose HiCS-FL (Federated Learning via Hierarchical Clustered Sampling), a novel client selection method in which the server estimates statistical heterogeneity of a client's data using the client's update of the network's output layer and relies on this information to cluster and sample the clients. We analyze the ability of the proposed techniques to compare heterogeneity of different datasets, and characterize convergence of the training process that deploys the introduced client selection method. Extensive experimental results demonstrate that in non-IID settings HiCS-FL achieves faster convergence and lower training variance than state-of-the-art FL client selection schemes. Notably, HiCS-FL drastically reduces computation cost compared to existing selection schemes and is adaptable to different heterogeneity scenarios.
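The selection loop (score each client's heterogeneity, cluster, then sample) can be sketched in simplified form. The sketch assumes the heterogeneity score is the entropy of each client's label distribution, where low entropy means skewed, non-IID data. It also assumes the distribution estimate is simply given. HiCS-FL derives its estimate from the client's output-layer update, which is not reproduced here. Equal-size bins over sorted entropies stand in for the paper's clustering, and all names are hypothetical.

```python
import math
import random


def entropy(dist):
    """Shannon entropy (nats) of a label distribution; low values mean skewed, non-IID data."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def clustered_sample(label_dists, n_clusters, rng):
    """Rank clients by estimated label entropy, split them into groups, draw one per group.

    label_dists: client id -> estimated label distribution. In HiCS-FL the estimate
    comes from the client's output-layer update; here it is simply given (assumption).
    """
    ranked = sorted(label_dists, key=lambda cid: entropy(label_dists[cid]))
    size = math.ceil(len(ranked) / n_clusters)
    groups = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    return [rng.choice(group) for group in groups]


clients = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.5, 0.5], "d": [0.6, 0.4]}
print(clustered_sample(clients, n_clusters=2, rng=random.Random(0)))
```

Drawing one client per group keeps each round's participants spread across heterogeneity levels rather than concentrated among similar clients. This is the intuition behind clustered sampling.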