cs.LG - 2023-08-20

Preserving Specificity in Federated Graph Learning for fMRI-based Neurological Disorder Identification

  • paper_url: http://arxiv.org/abs/2308.10302
  • repo_url: None
  • paper_authors: Junhao Zhang, Qianqian Wang, Xiaochuan Wang, Lishan Qiao, Mingxia Liu
  • for: This study aims to examine abnormal brain connectivity associated with brain disorders using resting-state functional magnetic resonance imaging (rs-fMRI), and to learn rs-fMRI representations with graph neural networks (GNNs).
  • methods: The study applies federated learning (FL) to brain disorder analysis without centralizing data from multiple imaging centers/sites. Each client holds a shared branch and a personalized branch: parameters of the shared branch are sent to the server, while those of the personalized branch remain local, which facilitates knowledge sharing while preserving site-specific characteristics. In the shared branch, a spatio-temporal attention graph isomorphism network learns dynamic rs-fMRI representations; in the personalized branch, vectorized demographic information (e.g., age, gender, and education years) is combined with functional connectivity networks to preserve site specificity.
  • results: Experimental results show that SFGL achieves higher accuracy than several state-of-the-art methods while preserving site-specific characteristics.
    Abstract Resting-state functional magnetic resonance imaging (rs-fMRI) offers a non-invasive approach to examining abnormal brain connectivity associated with brain disorders. Graph neural network (GNN) gains popularity in fMRI representation learning and brain disorder analysis with powerful graph representation capabilities. Training a general GNN often necessitates a large-scale dataset from multiple imaging centers/sites, but centralizing multi-site data generally faces inherent challenges related to data privacy, security, and storage burden. Federated Learning (FL) enables collaborative model training without centralized multi-site fMRI data. Unfortunately, previous FL approaches for fMRI analysis often ignore site-specificity, including demographic factors such as age, gender, and education level. To this end, we propose a specificity-aware federated graph learning (SFGL) framework for rs-fMRI analysis and automated brain disorder identification, with a server and multiple clients/sites for federated model aggregation and prediction. At each client, our model consists of a shared and a personalized branch, where parameters of the shared branch are sent to the server while those of the personalized branch remain local. This can facilitate knowledge sharing among sites and also helps preserve site specificity. In the shared branch, we employ a spatio-temporal attention graph isomorphism network to learn dynamic fMRI representations. In the personalized branch, we integrate vectorized demographic information (i.e., age, gender, and education years) and functional connectivity networks to preserve site-specific characteristics. Representations generated by the two branches are then fused for classification. Experimental results on two fMRI datasets with a total of 1,218 subjects suggest that SFGL outperforms several state-of-the-art approaches.
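A minimal sketch of the shared/personalized parameter split described above, assuming PyTorch, a `shared.` name prefix for the shared branch, and plain FedAvg weighting (all illustrative choices, not the authors' implementation):

```python
import copy
import torch

def split_state(model: torch.nn.Module, shared_prefix: str = "shared."):
    """Separate shared-branch parameters (uploaded to the server) from
    personalized-branch parameters (kept on the client)."""
    shared, personal = {}, {}
    for name, tensor in model.state_dict().items():
        (shared if name.startswith(shared_prefix) else personal)[name] = tensor
    return shared, personal

def server_aggregate(shared_states, client_weights):
    """FedAvg over shared-branch parameters only; personalized branches
    never leave their clients, preserving site-specific characteristics."""
    agg = copy.deepcopy(shared_states[0])
    for name in agg:
        agg[name] = sum(w * s[name] for w, s in zip(client_weights, shared_states))
    return agg
```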

An interpretable deep learning method for bearing fault diagnosis

  • paper_url: http://arxiv.org/abs/2308.10292
  • repo_url: None
  • paper_authors: Hao Lu, Austin M. Bray, Chao Hu, Andrew T. Zimmerman, Hongyi Xu
  • For: This study aims to improve the interpretability of deep learning (DL) models so that they can provide more trustworthy recommendations in safety-critical maintenance tasks.
  • Methods: A convolutional neural network (CNN) is combined with Gradient-weighted Class Activation Mapping (Grad-CAM) to form an interpretable DL method for classifying bearing faults. After training, Grad-CAM identifies each training sample's feature importance, building a health library of training samples with annotated feature maps. During evaluation, prediction basis samples are retrieved from the health library according to the similarity of feature importance.
  • Results: The method can be applied to any CNN model without modifying its architecture. Experimental results show that it selects prediction basis samples that are intuitively and physically meaningful, improving the trustworthiness of the DL model.
    Abstract Deep learning (DL) has gained popularity in recent years as an effective tool for classifying the current health and predicting the future of industrial equipment. However, most DL models have black-box components with an underlying structure that is too complex to be interpreted and explained to human users. This presents significant challenges when deploying these models for safety-critical maintenance tasks, where non-technical personnel often need to have complete trust in the recommendations these models give. To address these challenges, we utilize a convolutional neural network (CNN) with Gradient-weighted Class Activation Mapping (Grad-CAM) activation map visualizations to form an interpretable DL method for classifying bearing faults. After the model training process, we apply Grad-CAM to identify a training sample's feature importance and to form a library of diagnosis knowledge (or health library) containing training samples with annotated feature maps. During the model evaluation process, the proposed approach retrieves prediction basis samples from the health library according to the similarity of the feature importance. The proposed method can be easily applied to any CNN model without modifying the model architecture, and our experimental results show that this method can select prediction basis samples that are intuitively and physically meaningful, improving the model's trustworthiness for human users.
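To make the retrieval step concrete, the sketch below queries a health library by similarity of Grad-CAM importance maps; cosine similarity and the flattening scheme are assumptions, since the abstract does not fix a similarity measure:

```python
import numpy as np

def build_health_library(grad_cam_maps, labels):
    """Store L2-normalized, flattened Grad-CAM importance maps of
    training samples together with their diagnosis labels."""
    return [(m.ravel() / (np.linalg.norm(m) + 1e-12), y)
            for m, y in zip(grad_cam_maps, labels)]

def retrieve_basis_samples(query_map, library, k=3):
    """Return the k library entries whose importance maps are most
    cosine-similar to the query's, serving as the prediction's evidence."""
    q = query_map.ravel() / (np.linalg.norm(query_map) + 1e-12)
    return sorted(library, key=lambda entry: -float(q @ entry[0]))[:k]
```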

Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi

  • paper_url: http://arxiv.org/abs/2308.10284
  • repo_url: None
  • paper_authors: Hadi Nekoei, Xutong Zhao, Janarthanan Rajendran, Miao Liu, Sarath Chandar
  • For: This paper focuses on cooperative multi-agent reinforcement learning (MARL) algorithms with zero-shot coordination (ZSC) and their ability to adapt to unseen partners.
  • Methods: The paper uses the popular cooperative multi-agent game Hanabi to evaluate the adaptability of MARL methods, creating a diverse set of pre-trained agents and a new metric, adaptation regret, to test their performance.
  • Results: The paper finds that naive Independent Q-Learning (IQL) agents can adapt as quickly as state-of-the-art ZSC algorithms in most cases, and identifies two categories of hyper-parameters that have a significant impact on the adaptability of Hanabi agents.
    Abstract Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with Zero-Shot Coordination (ZSC) have gained significant attention in recent years. ZSC refers to the ability of agents to coordinate zero-shot (without additional interaction experience) with independently trained agents. While ZSC is crucial for cooperative MARL agents, it might not be possible for complex tasks and changing environments. Agents also need to adapt and improve their performance with minimal interaction with other agents. In this work, we show empirically that state-of-the-art ZSC algorithms have poor performance when paired with agents trained with different learning methods, and they require millions of interaction samples to adapt to these new partners. To investigate this issue, we formally defined a framework based on a popular cooperative multi-agent game called Hanabi to evaluate the adaptability of MARL methods. In particular, we created a diverse set of pre-trained agents and defined a new metric called adaptation regret that measures the agent's ability to efficiently adapt and improve its coordination performance when paired with some held-out pool of partners on top of its ZSC performance. After evaluating several SOTA algorithms using our framework, our experiments reveal that naive Independent Q-Learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC algorithm Off-Belief Learning (OBL). This finding raises an interesting research question: How to design MARL algorithms with high ZSC performance and capability of fast adaptation to unseen partners. As a first step, we studied the role of different hyper-parameters and design choices on the adaptability of current MARL algorithms. Our experiments show that two categories of hyper-parameters controlling the training data diversity and optimization process have a significant impact on the adaptability of Hanabi agents.
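The abstract does not spell out the adaptation regret formula; the sketch below is one plausible reading (cumulative shortfall against a partner-specific reference score), given purely for intuition:

```python
import numpy as np

def adaptation_regret(scores_per_step, reference_score):
    """One hypothetical formalization: the cumulative gap between a strong
    partner-specific reference score and the adapting agent's per-step
    coordination score. Lower means faster, better adaptation."""
    scores = np.asarray(scores_per_step, dtype=float)
    return float(np.sum(reference_score - scores))
```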

Adaptive Uncertainty-Guided Model Selection for Data-Driven PDE Discovery

  • paper_url: http://arxiv.org/abs/2308.10283
  • repo_url: https://github.com/pongpisit-thanasutives/ubic
  • paper_authors: Pongpisit Thanasutives, Takashi Morita, Masayuki Numao, Ken-ichi Fukui
  • for: To propose a new parameter-adaptive uncertainty-penalized Bayesian information criterion (UBIC) for selecting the parsimonious partial differential equation (PDE) that best governs noisy spatial-temporal observed data with few reliable terms.
  • methods: Using the UBIC to select the PDE that adapts to the parameters of the observed data, which is penalized by both the complexity of the PDE and the quantified uncertainty derived from the model supports’ coefficient of variation in a probabilistic view.
  • results: Numerical results confirm the successful application of the UBIC in identifying the true governing PDE, and reveal an interesting effect of denoising the observed data on improving the trade-off between the BIC score and model complexity.
    Abstract We propose a new parameter-adaptive uncertainty-penalized Bayesian information criterion (UBIC) to prioritize the parsimonious partial differential equation (PDE) that sufficiently governs noisy spatial-temporal observed data with few reliable terms. Since the naive use of the BIC for model selection has been known to yield an undesirable overfitted PDE, the UBIC penalizes the found PDE not only by its complexity but also the quantified uncertainty, derived from the model supports' coefficient of variation in a probabilistic view. We also introduce physics-informed neural network learning as a simulation-based approach to further validate the selected PDE flexibly against the other discovered PDE. Numerical results affirm the successful application of the UBIC in identifying the true governing PDE. Additionally, we reveal an interesting effect of denoising the observed data on improving the trade-off between the BIC score and model complexity. Code is available at https://github.com/Pongpisit-Thanasutives/UBIC.
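The abstract's recipe (penalize the BIC by a quantified uncertainty derived from the coefficient of variation of the model supports) can be sketched as below; the exact penalty form and the weight `lam` are assumptions, so consult the paper and repository for the real criterion:

```python
import numpy as np

def bic(rss, n_samples, n_terms):
    """Standard Bayesian information criterion for a least-squares fit."""
    return n_samples * np.log(rss / n_samples) + n_terms * np.log(n_samples)

def ubic(rss, n_samples, coef_samples, lam=1.0):
    """Hedged sketch of an uncertainty-penalized BIC: base BIC plus a
    penalty built from the coefficients' total coefficient of variation
    (coef_samples: bootstrap draws of shape (n_draws, n_terms))."""
    n_terms = coef_samples.shape[1]
    cv = np.abs(np.std(coef_samples, axis=0) /
                (np.mean(coef_samples, axis=0) + 1e-12))
    return bic(rss, n_samples, n_terms) + lam * np.log(n_samples) * cv.sum()
```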

Enhancing Spatiotemporal Traffic Prediction through Urban Human Activity Analysis

  • paper_url: http://arxiv.org/abs/2308.10282
  • repo_url: https://github.com/suminhan/traffic-uagcrntf
  • paper_authors: Sumin Han, Youngjun Park, Minji Lee, Jisun An, Dongman Lee
  • for: traffic prediction
  • methods: graph convolution deep learning algorithms
  • results: state-of-the-art performance without introducing excessive computational overhead
    Abstract Traffic prediction is one of the key elements to ensure the safety and convenience of citizens. Existing traffic prediction models primarily focus on deep learning architectures to capture spatial and temporal correlation. They often overlook the underlying nature of traffic. Specifically, the sensor networks in most traffic datasets do not accurately represent the actual road network exploited by vehicles, failing to provide insights into the traffic patterns in urban activities. To overcome these limitations, we propose an improved traffic prediction method based on graph convolution deep learning algorithms. We leverage human activity frequency data from National Household Travel Survey to enhance the inference capability of a causal relationship between activity and traffic patterns. Despite making minimal modifications to the conventional graph convolutional recurrent networks and graph convolutional transformer architectures, our approach achieves state-of-the-art performance without introducing excessive computational overhead.

The DKU-DUKEECE System for the Manipulation Region Location Task of ADD 2023

  • paper_url: http://arxiv.org/abs/2308.10281
  • repo_url: None
  • paper_authors: Zexin Cai, Weiqing Wang, Yikang Wang, Ming Li
  • for: This paper presents the system designed for Track 2 (manipulation region location) of the second Audio Deepfake Detection Challenge (ADD 2023).
  • methods: Multiple detection systems are combined to locate splicing regions and determine their authenticity: two frame-level systems, one for boundary detection and one for deepfake detection, plus a VAE model trained exclusively on genuine data to judge the authenticity of a given audio clip.
  • results: Fusing the three systems, the solution achieves 82.23% sentence accuracy and an F1 score of 60.66%, giving a final ADD score of 0.6713 and first place in Track 2.
    Abstract This paper introduces our system designed for Track 2, which focuses on locating manipulated regions, in the second Audio Deepfake Detection Challenge (ADD 2023). Our approach involves the utilization of multiple detection systems to identify splicing regions and determine their authenticity. Specifically, we train and integrate two frame-level systems: one for boundary detection and the other for deepfake detection. Additionally, we employ a third VAE model trained exclusively on genuine data to determine the authenticity of a given audio clip. Through the fusion of these three systems, our top-performing solution for the ADD challenge achieves an impressive 82.23% sentence accuracy and an F1 score of 60.66%. This results in a final ADD score of 0.6713, securing the first rank in Track 2 of ADD 2023.
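As a rough illustration of the three-system fusion, the sketch below averages per-frame fake probabilities and merges flagged frames into regions; the actual fusion rule, weights, and threshold of the DKU-DUKEECE system are not specified in the abstract:

```python
import numpy as np

def fuse_frame_scores(boundary_scores, deepfake_scores, vae_scores,
                      weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of per-frame fake probabilities from the three
    detectors (illustrative late fusion)."""
    stacked = np.stack([boundary_scores, deepfake_scores, vae_scores])
    return (np.asarray(weights)[:, None] * stacked).sum(axis=0)

def locate_manipulated_regions(fused_scores, threshold=0.5):
    """Merge consecutive frames whose fused score exceeds the threshold
    into (start, end) manipulated regions."""
    fake, regions, start = fused_scores > threshold, [], None
    for i, flag in enumerate(fake):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            regions.append((start, i - 1))
            start = None
    if start is not None:
        regions.append((start, len(fake) - 1))
    return regions
```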

GPFL: Simultaneously Learning Global and Personalized Feature Information for Personalized Federated Learning

  • paper_url: http://arxiv.org/abs/2308.10279
  • repo_url: None
  • paper_authors: Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Jian Cao, Haibing Guan
  • for: This study proposes a new personalized federated learning (pFL) method that satisfies both the collaborative learning and the personalization goals of FL.
  • methods: The method uses a new feature-extraction design that learns global and personalized feature information simultaneously on each client.
  • results: Experiments on six datasets in three statistically heterogeneous settings against ten state-of-the-art methods show that GPFL is superior in effectiveness, scalability, fairness, stability, and privacy; it also mitigates overfitting and outperforms the baselines by up to 8.99% in accuracy.
    Abstract Federated Learning (FL) is popular for its privacy-preserving and collaborative learning capabilities. Recently, personalized FL (pFL) has received attention for its ability to address statistical heterogeneity and achieve personalization in FL. However, from the perspective of feature extraction, most existing pFL methods only focus on extracting global or personalized feature information during local training, which fails to meet the collaborative learning and personalization goals of pFL. To address this, we propose a new pFL method, named GPFL, to simultaneously learn global and personalized feature information on each client. We conduct extensive experiments on six datasets in three statistically heterogeneous settings and show the superiority of GPFL over ten state-of-the-art methods regarding effectiveness, scalability, fairness, stability, and privacy. Besides, GPFL mitigates overfitting and outperforms the baselines by up to 8.99% in accuracy.

Minimalist Traffic Prediction: Linear Layer Is All You Need

  • paper_url: http://arxiv.org/abs/2308.10276
  • repo_url: https://github.com/wenyingduan/STLinear
  • paper_authors: Wenying Duan, Hong Rao, Wei Huang, Xiaoxi He
  • for: Traffic prediction is key to Intelligent Transportation Systems (ITS) and smart cities. STGNNs have shown promise in this domain but suffer from computational complexity, gradient issues, and resource-intensiveness; this paper proposes three solutions to these problems.
  • methods: The paper advocates a node-embedding approach, time series decomposition, and periodicity learning, and introduces STLinear, a minimalist model architecture designed for optimized efficiency and performance. Unlike conventional STGNNs, STLinear operates fully locally, avoids inter-node data exchange, and relies exclusively on linear layers, drastically cutting computational demands.
  • results: Experiments on real-world datasets show that STLinear matches or exceeds the accuracy of a state-of-the-art STGNN baseline published in 2023 while reducing computation by more than 95% (MACs per epoch). STLinear thus emerges as a potent, efficient alternative to conventional STGNNs, with far-reaching implications for the future of ITS and smart city initiatives.
    Abstract Traffic prediction is essential for the progression of Intelligent Transportation Systems (ITS) and the vision of smart cities. While Spatial-Temporal Graph Neural Networks (STGNNs) have shown promise in this domain by leveraging Graph Neural Networks (GNNs) integrated with either RNNs or Transformers, they present challenges such as computational complexity, gradient issues, and resource-intensiveness. This paper addresses these challenges, advocating for three main solutions: a node-embedding approach, time series decomposition, and periodicity learning. We introduce STLinear, a minimalist model architecture designed for optimized efficiency and performance. Unlike traditional STGNNs, STlinear operates fully locally, avoiding inter-node data exchanges, and relies exclusively on linear layers, drastically cutting computational demands. Our empirical studies on real-world datasets confirm STLinear's prowess, matching or exceeding the accuracy of leading STGNNs, but with significantly reduced complexity and computation overhead (more than 95% reduction in MACs per epoch compared to state-of-the-art STGNN baseline published in 2023). In summary, STLinear emerges as a potent, efficient alternative to conventional STGNNs, with profound implications for the future of ITS and smart city initiatives.
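A minimal sketch of what a purely linear, fully local predictor in this spirit could look like (not the authors' STLinear; the decomposition kernel and heads are illustrative):

```python
import torch
import torch.nn as nn

class LinearTrafficSketch(nn.Module):
    """Per-node forecaster with no inter-node message passing: a
    moving-average trend/residual decomposition, one linear head per
    component mapping the input window to the horizon, and a learned
    per-node embedding standing in for spatial identity."""
    def __init__(self, n_nodes, in_len, out_len, kernel=25):  # kernel: odd
        super().__init__()
        self.avg = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2)
        self.trend_head = nn.Linear(in_len, out_len)
        self.resid_head = nn.Linear(in_len, out_len)
        self.node_emb = nn.Parameter(torch.zeros(n_nodes, out_len))

    def forward(self, x):              # x: (batch, n_nodes, in_len)
        trend = self.avg(x)            # moving-average trend component
        resid = x - trend              # high-frequency residual
        y = self.trend_head(trend) + self.resid_head(resid)
        return y + self.node_emb       # (batch, n_nodes, out_len)
```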

SBSM-Pro: Support Bio-sequence Machine for Proteins

  • paper_url: http://arxiv.org/abs/2308.10275
  • repo_url: https://github.com/wyzbio/support-bio-sequence-machine
  • paper_authors: Yizheng Wang, Yixiao Zhai, Yijie Ding, Quan Zou
  • for: This study proposes a support vector machine model designed specifically for biological sequence classification (SBSM-Pro), to assist and even guide biological experiments.
  • methods: Starting from raw sequences, the model groups amino acids by their physicochemical properties, uses sequence alignment to measure the similarity between proteins, integrates multiple types of information via a novel multiple kernel learning (MKL) approach, and performs classification prediction with support vector machines.
  • results: The model demonstrates commendable performance across 10 datasets in identifying protein function and post-translational modifications. This work not only showcases state-of-the-art protein classification but also opens new directions for platforms tailored to biological sequence classification.
    Abstract Proteins play a pivotal role in biological systems. The use of machine learning algorithms for protein classification can assist and even guide biological experiments, offering crucial insights for biotechnological applications. We propose a support bio-sequence machine for proteins, a model specifically designed for biological sequence classification. This model starts with raw sequences and groups amino acids based on their physicochemical properties. It incorporates sequence alignment to measure the similarities between proteins and uses a novel MKL approach to integrate various types of information, utilizing support vector machines for classification prediction. The results indicate that our model demonstrates commendable performance across 10 datasets in terms of the identification of protein function and posttranslational modification. This research not only showcases state-of-the-art work in protein classification but also paves the way for new directions in this domain, representing a beneficial endeavour in the development of platforms tailored for biological sequence classification. SBSM-Pro is available for access at http://lab.malab.cn/soft/SBSM-Pro/.
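The MKL-plus-SVM pipeline can be sketched with the simplest MKL variant, a fixed convex combination of precomputed kernel matrices; the paper's MKL method learns its combination, and the kernel names below (`K_align`, `K_physchem`) are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC

def combine_kernels(kernels, weights):
    """Convex combination of precomputed similarity (kernel) matrices,
    e.g. one from sequence alignment and one from physicochemical
    grouping of amino acids."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

# Usage sketch (K_* are hypothetical (n, n) Gram matrices over proteins):
# K_train = combine_kernels([K_align, K_physchem], [0.6, 0.4])
# clf = SVC(kernel="precomputed").fit(K_train, y_train)
# y_pred = clf.predict(K_test)   # K_test: (m, n) test-vs-train similarities
```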

An alternative to SVM Method for Data Classification

  • paper_url: http://arxiv.org/abs/2308.11579
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Lakhdar Remaki
  • for: This paper proposes a new classification method that addresses several shortcomings of the support vector machine (SVM) method.
  • methods: The method classifies by minimum distance to optimal subspaces containing the mapped original classes.
  • results: The method performs comparably to SVM while improving on the aforementioned shortcomings, such as processing time, the risk of optimization failure in high-dimensional cases, generalization to multiple classes, unbalanced classes, and dynamic classification.
    Abstract Support vector machine (SVM) is a popular kernel method for data classification that has demonstrated its efficiency for a large range of practical applications. The method suffers, however, from some weaknesses, including processing time, risk of failure of the optimization process in high-dimension cases, generalization to multi-class problems, unbalanced classes, and dynamic classification. In this paper an alternative method is proposed with similar performance and a notable improvement on the aforementioned shortcomings. The new method is based on the minimum distance to optimal subspaces containing the mapped original classes.
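A minimal sketch of classification by minimum distance to per-class subspaces; here each "optimal subspace" is approximated by a PCA basis, which is an assumption, since the paper's subspace construction is not given in the abstract:

```python
import numpy as np

class SubspaceDistanceClassifier:
    """Assign each sample to the class whose subspace it is closest to,
    measured by the norm of the residual off that subspace."""

    def fit(self, X, y, dim=5):
        self.bases = {}
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            _, _, Vt = np.linalg.svd(Xc - mu, full_matrices=False)
            self.bases[c] = (mu, Vt[:dim])   # class mean + top principal axes
        return self

    def predict(self, X):
        def dist(x, mu, V):
            r = x - mu
            return np.linalg.norm(r - V.T @ (V @ r))   # off-subspace residual
        return np.array([min(self.bases, key=lambda c: dist(x, *self.bases[c]))
                         for x in X])
```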

Turning Waste into Wealth: Leveraging Low-Quality Samples for Enhancing Continuous Conditional Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2308.10273
  • repo_url: None
  • paper_authors: Xin Ding, Yongwei Wang, Zuheng Xu
  • for: To improve the visual quality and label consistency of images generated by continuous conditional GANs (CcGANs).
  • methods: A dual negative data augmentation (Dual-NDA) approach that employs two types of negative samples: visually unrealistic images generated by a pre-trained CcGAN, and label-inconsistent images created by manipulating the labels of real images.
  • results: Dual-NDA consistently improves the visual fidelity and label consistency of fake images generated by CcGANs on UTKFace and Steering Angle, exhibiting a substantial gain over vanilla NDA and advancing beyond state-of-the-art conditional GANs and diffusion models.
    Abstract Continuous Conditional Generative Adversarial Networks (CcGANs) enable generative modeling conditional on continuous scalar variables (termed regression labels). However, they can produce subpar fake images due to limited training data. Although Negative Data Augmentation (NDA) effectively enhances unconditional and class-conditional GANs by introducing anomalies into real training images, guiding the GANs away from low-quality outputs, its impact on CcGANs is limited, as it fails to replicate negative samples that may occur during the CcGAN sampling. We present a novel NDA approach called Dual-NDA specifically tailored for CcGANs to address this problem. Dual-NDA employs two types of negative samples: visually unrealistic images generated from a pre-trained CcGAN and label-inconsistent images created by manipulating real images' labels. Leveraging these negative samples, we introduce a novel discriminator objective alongside a modified CcGAN training algorithm. Empirical analysis on UTKFace and Steering Angle reveals that Dual-NDA consistently enhances the visual fidelity and label consistency of fake images generated by CcGANs, exhibiting a substantial performance gain over the vanilla NDA. Moreover, by applying Dual-NDA, CcGANs demonstrate a remarkable advancement beyond the capabilities of state-of-the-art conditional GANs and diffusion models, establishing a new pinnacle of performance.

Large Transformers are Better EEG Learners

  • paper_url: http://arxiv.org/abs/2308.11654
  • repo_url: None
  • paper_authors: Bingxin Wang, Xiaowen Fu, Yuan Lan, Luchan Zhang, Yang Xiang
  • for: This paper explores how to fine-tune pre-trained transformer models on electroencephalogram (EEG) data to improve prediction performance.
  • methods: The paper proposes AdaCE, plug-and-play adapters that convert EEG data into image and text forms, so that pre-trained vision and language transformers can be fine-tuned on EEG data.
  • results: Experiments show that AdaCE achieves state-of-the-art performance on diverse EEG-based prediction tasks; for example, AdaCE on the pre-trained Swin-Transformer reaches 99.6% accuracy on EEG-based human activity recognition (UCI HAR), an absolute improvement of 9.2%. Fine-tuning larger pre-trained models yields even better performance, indicating the potential of the adapters for even larger transformers.
    Abstract Pre-trained large transformer models have achieved remarkable performance in the fields of natural language processing and computer vision. Since the magnitude of available labeled electroencephalogram (EEG) data is much lower than that of text and image data, it is difficult for transformer models pre-trained from EEG to be developed as large as GPT-4 100T to fully unleash the potential of this architecture. In this paper, we show that transformers pre-trained from images as well as text can be directly fine-tuned for EEG-based prediction tasks. We design AdaCE, plug-and-play Adapters for Converting EEG data into image as well as text forms, to fine-tune pre-trained vision and language transformers. The proposed AdaCE module is highly effective for fine-tuning pre-trained transformers while achieving state-of-the-art performance on diverse EEG-based prediction tasks. For example, AdaCE on the pre-trained Swin-Transformer achieves 99.6%, an absolute improvement of 9.2%, on the EEG-decoding task of human activity recognition (UCI HAR). Furthermore, we empirically show that applying the proposed AdaCE to fine-tune larger pre-trained models can achieve better performance on EEG-based predicting tasks, indicating the potential of our adapters for even larger transformers. The plug-and-play AdaCE module can be applied to fine-tuning most of the popular pre-trained transformers on many other time-series data with multiple channels, not limited to EEG data and the models we use. Our code will be available at https://github.com/wangbxj1234/AdaCE.
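One way an adapter could render multi-channel EEG as an image for a vision transformer is sketched below; this is an assumption for intuition only, as the abstract does not describe AdaCE's exact conversion:

```python
import numpy as np

def eeg_to_image(eeg, out_hw=(224, 224)):
    """Render a (channels, time) EEG array as a 3-channel image:
    per-channel min-max normalization, then nearest-neighbor resize to
    the vision model's input resolution."""
    lo = eeg.min(axis=1, keepdims=True)
    hi = eeg.max(axis=1, keepdims=True)
    norm = (eeg - lo) / (hi - lo + 1e-12)             # values in [0, 1]
    h, w = out_hw
    rows = np.linspace(0, eeg.shape[0] - 1, h).round().astype(int)
    cols = np.linspace(0, eeg.shape[1] - 1, w).round().astype(int)
    img = norm[np.ix_(rows, cols)]                    # (H, W)
    return np.repeat(img[None], 3, axis=0)            # (3, H, W) pseudo-RGB
```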

Towards Synthesizing Datasets for IEEE 802.1 Time-sensitive Networking

  • paper_url: http://arxiv.org/abs/2308.10255
  • repo_url: None
  • paper_authors: Doğanalp Ergenç, Nurefşan Sertbaş Bülbül, Lisa Maile, Anna Arestova, Mathias Fischer
  • for: This paper highlights the need for TSN datasets to support research on AI/ML-based techniques for TSN systems.
  • methods: The paper discusses the main requirements and alternative designs for building a TSN platform to synthesize realistic datasets.
  • results: The paper aims to recapitulate the need for TSN datasets so that research on AI/ML-based techniques for TSN systems can flourish.
    Abstract IEEE 802.1 Time-sensitive Networking (TSN) protocols have recently been proposed to replace legacy networking technologies across different mission-critical systems (MCSs). Design, configuration, and maintenance of TSN within MCSs require advanced methods to tackle the highly complex and interconnected nature of those systems. Accordingly, artificial intelligence (AI) and machine learning (ML) models are the most prominent enablers to develop such methods. However, they usually require a significant amount of data for model training, which is not easily accessible. This short paper aims to recapitulate the need for TSN datasets to flourish research on AI/ML-based techniques for TSN systems. Moreover, it analyzes the main requirements and alternative designs to build a TSN platform to synthesize realistic datasets.

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

  • paper_url: http://arxiv.org/abs/2308.10253
  • repo_url: https://github.com/icoz69/stablellava
  • paper_authors: Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei
  • for: The paper aims to address the limitations of current multimodal Large Language Model (LLM) training methods, specifically the domain bias of image-dialogue datasets.
  • methods: The proposed methodology synchronously synthesizes images and dialogues for visual instruction tuning, leveraging the power of generative models such as ChatGPT and text-to-image models.
  • results: The proposed pipeline leads to marked enhancements in more than ten commonly assessed capabilities of the open-source LLaVA model, including improved performance on various datasets.
    Abstract The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions. Current methodologies often rely on annotations derived from benchmark datasets to construct image-dialogue datasets for training purposes, akin to instruction tuning in LLMs. However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models. In an effort to mitigate these limitations, we propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models to yield a diverse and controllable dataset with varied image content. This not only provides greater flexibility compared to existing methodologies but also significantly enhances several model capabilities. Our research includes comprehensive experiments conducted on various datasets using the open-source LLAVA model as a testbed for our proposed pipeline. Our results underscore marked enhancements across more than ten commonly assessed capabilities.

Activation Addition: Steering Language Models Without Optimization

  • paper_url: http://arxiv.org/abs/2308.10248
  • repo_url: None
  • paper_authors: Alex Turner, Lisa Thiergart, David Udell, Gavin Leech, Ulisse Mini, Monte MacDiarmid
  • for: This paper targets reliably controlling the behavior of large language models (LLMs).
  • methods: It proposes an activation engineering method, Activation Addition (ActAdd), which predictably alters model behavior by adding steering vectors to activations at inference time; the vectors are computed from the activation differences produced by pairs of prompts.
  • results: Experiments with GPT-2 on OpenWebText and ConceptNet show that ActAdd controls high-level properties of the output while preserving off-target model performance. It requires far less compute and implementation effort than finetuning or RLHF, lets users provide natural-language specifications, and its overhead scales naturally with model size.
    Abstract Reliably controlling the behavior of large language models (LLMs) is a pressing open problem. Existing methods include supervised finetuning, reinforcement learning from human feedback (RLHF), prompt engineering and guided decoding. We instead investigate activation engineering: modifying activations at inference time to predictably alter model behavior. In particular, we bias the forward pass with an added 'steering vector' implicitly specified through natural language. Unlike past work which learned these steering vectors (Subramani, Suresh, and Peters 2022; Hernandez, Li, and Andreas 2023), our Activation Addition (ActAdd) method computes them by taking the activation differences that result from pairs of prompts. We demonstrate ActAdd on GPT-2 on OpenWebText and ConceptNet. Our inference-time approach yields control over high-level properties of output and preserves off-target model performance. It involves far less compute and implementation effort compared to finetuning or RLHF, allows users to provide natural language specifications, and its overhead scales naturally with model size.
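The core mechanism (a steering vector taken from the activation difference of a prompt pair, added back at inference) can be sketched on GPT-2 as below; the layer index, coefficient, and generation settings are illustrative rather than the paper's exact configuration:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, COEFF = 6, 4.0   # illustrative choices

def resid_at_layer(prompt):
    """Residual-stream activations entering block LAYER for a prompt."""
    ids = tok(prompt, return_tensors="pt").input_ids
    hs = model(ids, output_hidden_states=True).hidden_states
    return hs[LAYER]                         # (1, seq_len, d_model)

# Steering vector = activation difference of a contrasting prompt pair.
plus, minus = resid_at_layer(" Love"), resid_at_layer(" Hate")
n = min(plus.shape[1], minus.shape[1])
steer = (COEFF * (plus[:, :n] - minus[:, :n])).detach()

def add_steer(_module, args):
    """Pre-hook: bias the first prompt positions of block LAYER's input."""
    h = args[0].clone()
    k = min(h.shape[1], steer.shape[1])
    h[:, :k] += steer[:, :k].to(h.dtype)
    return (h,) + args[1:]

handle = model.transformer.h[LAYER].register_forward_pre_hook(add_steer)
ids = tok("I think dogs are", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=20, do_sample=False, use_cache=False)
print(tok.decode(out[0]))
handle.remove()
```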

From Global to Local: Multi-scale Out-of-distribution Detection

  • paper_url: http://arxiv.org/abs/2308.10239
  • repo_url: https://github.com/jimzai/mode-ood
  • paper_authors: Ji Zhang, Lianli Gao, Bingguang Hao, Hao Huang, Jingkuan Song, Hengtao Shen
  • for: The paper is written for detecting out-of-distribution (OOD) data in the context of representation learning.
  • methods: The paper proposes a new framework called Multi-scale OOD DEtection (MODE) that leverages both global visual information and local region details of images to improve OOD detection. The framework includes a new trainable objective called Attention-based Local PropAgation (ALPA) to encourage locally discriminative representations in ID training, and a Cross-Scale Decision (CSD) function to distinguish ID/OOD data during test-time.
  • results: The paper demonstrates the effectiveness and flexibility of MODE on several benchmarks, achieving an average improvement of up to 19.24% in false positive rate (FPR) and 2.77% in area under the receiver operating characteristic curve (AUROC) compared to previous state-of-the-art methods.
    Abstract Out-of-distribution (OOD) detection aims to detect "unknown" data whose labels have not been seen during the in-distribution (ID) training process. Recent progress in representation learning gives rise to distance-based OOD detection that recognizes inputs as ID/OOD according to their relative distances to the training data of ID classes. Previous approaches calculate pairwise distances relying only on global image representations, which can be sub-optimal as the inevitable background clutter and intra-class variation may drive image-level representations from the same ID class far apart in a given representation space. In this work, we overcome this challenge by proposing Multi-scale OOD DEtection (MODE), a first framework leveraging both global visual information and local region details of images to maximally benefit OOD detection. Specifically, we first find that existing models pretrained by off-the-shelf cross-entropy or contrastive losses are incompetent to capture valuable local representations for MODE, due to the scale-discrepancy between the ID training and OOD detection processes. To mitigate this issue and encourage locally discriminative representations in ID training, we propose Attention-based Local PropAgation (ALPA), a trainable objective that exploits a cross-attention mechanism to align and highlight the local regions of the target objects for pairwise examples. During test-time OOD detection, a Cross-Scale Decision (CSD) function is further devised on the most discriminative multi-scale representations to distinguish ID/OOD data more faithfully. We demonstrate the effectiveness and flexibility of MODE on several benchmarks -- on average, MODE outperforms the previous state-of-the-art by up to 19.24% in FPR, 2.77% in AUROC. Code is available at https://github.com/JimZAI/MODE-OOD.

Thompson Sampling for Real-Valued Combinatorial Pure Exploration of Multi-Armed Bandit

  • paper_url: http://arxiv.org/abs/2308.10238
  • repo_url: None
  • paper_authors: Shintaro Nakamura, Masashi Sugiyama
  • for: The paper studies the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem, where each arm's reward follows an unknown distribution, and provides a reliable algorithm for it.
  • methods: It introduces the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, the first that works even when the size of the action set is exponentially large in the number of arms, whereas previous algorithms assumed a polynomially sized action set.
  • results: It establishes a problem-dependent sample complexity lower bound and shows that GenTS-Explore achieves the optimal sample complexity up to a problem-dependent constant factor.
    Abstract We study the real-valued combinatorial pure exploration of the multi-armed bandit (R-CPE-MAB) problem. In R-CPE-MAB, a player is given $d$ stochastic arms, and the reward of each arm $s\in\{1, \ldots, d\}$ follows an unknown distribution with mean $\mu_s$. In each time step, a player pulls a single arm and observes its reward. The player's goal is to identify the optimal \emph{action} $\boldsymbol{\pi}^{*} = \argmax_{\boldsymbol{\pi} \in \mathcal{A}} \boldsymbol{\mu}^{\top}\boldsymbol{\pi}$ from a finite-sized real-valued \emph{action set} $\mathcal{A}\subset \mathbb{R}^{d}$ with as few arm pulls as possible. Previous methods in the R-CPE-MAB assume that the size of the action set $\mathcal{A}$ is polynomial in $d$. We introduce an algorithm named the Generalized Thompson Sampling Explore (GenTS-Explore) algorithm, which is the first algorithm that can work even when the size of the action set is exponentially large in $d$. We also introduce a novel problem-dependent sample complexity lower bound of the R-CPE-MAB problem, and show that the GenTS-Explore algorithm achieves the optimal sample complexity up to a problem-dependent constant factor.

FedSIS: Federated Split Learning with Intermediate Representation Sampling for Privacy-preserving Generalized Face Presentation Attack Detection

  • paper_url: http://arxiv.org/abs/2308.10236
  • repo_url: https://github.com/naiftt/fedsis
  • paper_authors: Naif Alkhunaizi, Koushik Srivatsan, Faris Almalik, Ibrahim Almakky, Karthik Nandakumar
  • for: To address the Achilles heel of face presentation attack detection (FacePAD) algorithms: their lack of generalization to unseen domains/attacks.
  • methods: A combination of federated learning (FL) and split learning, plus a novel feature augmentation strategy called intermediate representation sampling, with a shared adapter network distilling discriminative information from intermediate blocks of a Vision Transformer (ViT).
  • results: State-of-the-art generalization performance on two well-known cross-domain FacePAD benchmarks, achieved without sharing any raw data.
    Abstract Lack of generalization to unseen domains/attacks is the Achilles heel of most face presentation attack detection (FacePAD) algorithms. Existing attempts to enhance the generalizability of FacePAD solutions assume that data from multiple source domains are available with a single entity to enable centralized training. In practice, data from different source domains may be collected by diverse entities, who are often unable to share their data due to legal and privacy constraints. While collaborative learning paradigms such as federated learning (FL) can overcome this problem, standard FL methods are ill-suited for domain generalization because they struggle to surmount the twin challenges of handling non-iid client data distributions during training and generalizing to unseen domains during inference. In this work, a novel framework called Federated Split learning with Intermediate representation Sampling (FedSIS) is introduced for privacy-preserving domain generalization. In FedSIS, a hybrid Vision Transformer (ViT) architecture is learned using a combination of FL and split learning to achieve robustness against statistical heterogeneity in the client data distributions without any sharing of raw data (thereby preserving privacy). To further improve generalization to unseen domains, a novel feature augmentation strategy called intermediate representation sampling is employed, and discriminative information from intermediate blocks of a ViT is distilled using a shared adapter network. The FedSIS approach has been evaluated on two well-known benchmarks for cross-domain FacePAD to demonstrate that it is possible to achieve state-of-the-art generalization performance without data sharing. Code: https://github.com/Naiftt/FedSIS

Karma: Adaptive Video Streaming via Causal Sequence Modeling

  • paper_url: http://arxiv.org/abs/2308.10230
  • repo_url: https://github.com/fcbw2012/Karma
  • paper_authors: Bowei Xu, Hao Chen, Zhan Ma
  • for: To improve adaptive bitrate (ABR) decisions and thereby users' quality of experience (QoE).
  • methods: Causal sequence modeling that comprehends the interrelated causality among past observations, returns, and actions, and refines actions in time when deviations occur, improving generalization.
  • results: In trace-driven simulations and real-world field tests, Karma outperforms state-of-the-art algorithms with an average QoE improvement of 10.8% to 18.7% across diverse network conditions, and generalizes well to unseen networks.
    Abstract Optimal adaptive bitrate (ABR) decision depends on a comprehensive characterization of state transitions that involve interrelated modalities over time including environmental observations, returns, and actions. However, state-of-the-art learning-based ABR algorithms solely rely on past observations to decide the next action. This paradigm tends to cause a chain of deviations from optimal action when encountering unfamiliar observations, which consequently undermines the model generalization. This paper presents Karma, an ABR algorithm that utilizes causal sequence modeling to improve generalization by comprehending the interrelated causality among past observations, returns, and actions and timely refining action when deviation occurs. Unlike direct observation-to-action mapping, Karma recurrently maintains a multi-dimensional time series of observations, returns, and actions as input and employs causal sequence modeling via a decision transformer to determine the next action. In the input sequence, Karma uses the maximum cumulative future quality of experience (QoE) (a.k.a, QoE-to-go) as an extended return signal, which is periodically estimated based on current network conditions and playback status. We evaluate Karma through trace-driven simulations and real-world field tests, demonstrating superior performance compared to existing state-of-the-art ABR algorithms, with an average QoE improvement ranging from 10.8% to 18.7% across diverse network conditions. Furthermore, Karma exhibits strong generalization capabilities, showing leading performance under unseen networks in both simulations and real-world tests.
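The input design described above, a time series of observations, returns, and actions with QoE-to-go as the extended return signal, could be assembled roughly as below; the online estimation of QoE-to-go is replaced here with an offline computation over a finished trace, purely for illustration:

```python
import numpy as np

def build_sequence(observations, qualities, actions, context_len):
    """Build (QoE-to-go, observation, action) triples for a
    decision-transformer-style ABR policy. QoE-to-go at step t is the
    cumulative quality from t onward (computed offline here; Karma
    estimates it periodically from network and playback state)."""
    q = np.asarray(qualities, dtype=float)
    qoe_to_go = np.cumsum(q[::-1])[::-1]
    triples = [(qoe_to_go[t], observations[t], actions[t])
               for t in range(len(actions))]
    return triples[-context_len:]   # most recent context window
```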

Machine Learning-powered Combinatorial Clock Auction

  • paper_url: http://arxiv.org/abs/2308.10226
  • repo_url: https://github.com/marketdesignresearch/ml-cca
  • paper_authors: Ermis Soumalias, Jakob Weissteiner, Jakob Heiss, Sven Seuken
  • for: This paper aims to address the challenge of designing iterative combinatorial auctions (ICAs) in high-dimensional item spaces, where traditional methods are impractical due to the exponential growth of the bundle space.
  • methods: The paper proposes an ML-powered combinatorial clock auction that elicits information from bidders only via demand queries, which are more practical and less cognitively burdensome than traditional value queries. The paper also presents a novel method for training ML models on demand queries and an efficient method for determining the demand query with the highest clearing potential.
  • results: The paper experimentally evaluates the ML-based demand query mechanism in several spectrum auction domains and compares it against the most established real-world ICA, the combinatorial clock auction (CCA). The results show that the ML-based mechanism significantly outperforms the CCA in terms of efficiency, achieves higher efficiency in a significantly reduced number of rounds, and exhibits vastly higher clearing potential using linear prices.
    Abstract We study the design of iterative combinatorial auctions (ICAs). The main challenge in this domain is that the bundle space grows exponentially in the number of items. To address this, several papers have recently proposed machine learning (ML)-based preference elicitation algorithms that aim to elicit only the most important information from bidders. However, from a practical point of view, the main shortcoming of this prior work is that those designs elicit bidders' preferences via value queries (i.e., ``What is your value for the bundle $\{A,B\}$?''). In most real-world ICA domains, value queries are considered impractical, since they impose an unrealistically high cognitive burden on bidders, which is why they are not used in practice. In this paper, we address this shortcoming by designing an ML-powered combinatorial clock auction that elicits information from the bidders only via demand queries (i.e., ``At prices $p$, what is your most preferred bundle of items?''). We make two key technical contributions: First, we present a novel method for training an ML model on demand queries. Second, based on those trained ML models, we introduce an efficient method for determining the demand query with the highest clearing potential, for which we also provide a theoretical foundation. We experimentally evaluate our ML-based demand query mechanism in several spectrum auction domains and compare it against the most established real-world ICA: the combinatorial clock auction (CCA). Our mechanism significantly outperforms the CCA in terms of efficiency in all domains, it achieves higher efficiency in a significantly reduced number of rounds, and, using linear prices, it exhibits vastly higher clearing potential. Thus, with this paper we bridge the gap between research and practice and propose the first practical ML-powered ICA.
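A demand query ("at prices p, what is your most preferred bundle?") has a simple formal meaning: the utility-maximizing bundle under those prices. The brute-force sketch below illustrates the definition only; the whole point of the paper's ML machinery is to avoid enumerating an exponentially large bundle space:

```python
from itertools import chain, combinations

def demand_query(value_fn, prices):
    """Return the bundle maximizing value(bundle) - price(bundle),
    enumerating all bundles (feasible only for tiny item sets)."""
    items = list(prices)
    bundles = chain.from_iterable(combinations(items, r)
                                  for r in range(len(items) + 1))
    return max(bundles,
               key=lambda b: value_fn(frozenset(b)) - sum(prices[i] for i in b))

# Usage with a toy valuation (illustrative only):
prices = {"A": 2.0, "B": 3.0}
value = lambda bundle: 4 * len(bundle) + (3 if {"A", "B"} <= bundle else 0)
print(demand_query(value, prices))   # ('A', 'B'): value 11 - price 5 = 6
```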

Soft Decomposed Policy-Critic: Bridging the Gap for Effective Continuous Control with Discrete RL

  • paper_url: http://arxiv.org/abs/2308.10203
  • repo_url: None
  • paper_authors: Yechen Zhang, Jian Sun, Gang Wang, Zhuo Li, Wei Chen
  • for: To overcome the dimensional explosion that discrete RL algorithms face when applied to continuous control problems.
  • methods: Combines soft RL and actor-critic techniques with discrete RL methods: each action dimension is discretized independently, and a shared critic network maximizes the soft $Q$-function.
  • results: Outperforms state-of-the-art continuous RL algorithms on a variety of continuous control tasks, including Mujoco's Humanoid and Box2d's BipedalWalker, validating the effectiveness of the SDPC architecture.
    Abstract Discrete reinforcement learning (RL) algorithms have demonstrated exceptional performance in solving sequential decision tasks with discrete action spaces, such as Atari games. However, their effectiveness is hindered when applied to continuous control problems due to the challenge of dimensional explosion. In this paper, we present the Soft Decomposed Policy-Critic (SDPC) architecture, which combines soft RL and actor-critic techniques with discrete RL methods to overcome this limitation. SDPC discretizes each action dimension independently and employs a shared critic network to maximize the soft $Q$-function. This novel approach enables SDPC to support two types of policies: decomposed actors that lead to the Soft Decomposed Actor-Critic (SDAC) algorithm, and decomposed $Q$-networks that generate Boltzmann soft exploration policies, resulting in the Soft Decomposed-Critic Q (SDCQ) algorithm. Through extensive experiments, we demonstrate that our proposed approach outperforms state-of-the-art continuous RL algorithms in a variety of continuous control tasks, including Mujoco's Humanoid and Box2d's BipedalWalker. These empirical results validate the effectiveness of the SDPC architecture in addressing the challenges associated with continuous control.
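Independent per-dimension discretization is the key trick: with d dimensions and k bins each, the policy outputs d distributions over k choices (d*k values) instead of k**d joint actions. A minimal sketch of that decomposition (not the authors' networks):

```python
import numpy as np

def make_bins(low, high, n_bins):
    """One small discretization grid per action dimension."""
    return [np.linspace(lo, hi, n_bins) for lo, hi in zip(low, high)]

def decode_action(indices, bins):
    """Map one discrete index per dimension back to a continuous action."""
    return np.array([b[i] for i, b in zip(indices, bins)])

bins = make_bins(low=[-1.0, -1.0, -1.0], high=[1.0, 1.0, 1.0], n_bins=11)
print(decode_action([5, 0, 10], bins))   # -> [ 0. -1.  1.]
```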

Hiding Backdoors within Event Sequence Data via Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2308.10201
  • repo_url: None
  • paper_authors: Elizaveta Kovtun, Alina Ermilova, Dmitry Berestnev, Alexey Zaytsev
  • for: This paper shows how concealed backdoors can be introduced into deep learning models used for important decisions in the financial industry, highlighting attack surfaces that must be secured.
  • methods: Poisoning attacks that insert concealed backdoors into models operating on sequences of financial transactions, creating vulnerabilities without altering functionality on uncontaminated data; the hardest-to-uncover attacks involve either an additional supervised detection step for poisoned data activated at test time or well-hidden model weight modifications.
  • results: Experiments on three open transaction datasets and three architectures (LSTM, CNN, and Transformer) show that the attacks effectively influence model outputs and how these effects vary across datasets, architectures, and model components; the findings both expose vulnerabilities in contemporary models and can drive the construction of more robust systems.
    Abstract The financial industry relies on deep learning models for making important decisions. This adoption brings new danger, as deep black-box models are known to be vulnerable to adversarial attacks. In computer vision, one can shape the output during inference by performing an adversarial attack called poisoning via introducing a backdoor into the model during training. For sequences of financial transactions of a customer, insertion of a backdoor is harder to perform, as models operate over a more complex discrete space of sequences, and systematic checks for insecurities occur. We provide a method to introduce concealed backdoors, creating vulnerabilities without altering their functionality for uncontaminated data. To achieve this, we replace a clean model with a poisoned one that is aware of the availability of a backdoor and utilize this knowledge. Our most difficult for uncovering attacks include either additional supervised detection step of poisoned data activated during the test or well-hidden model weight modifications. The experimental study provides insights into how these effects vary across different datasets, architectures, and model components. Alternative methods and baselines, such as distillation-type regularization, are also explored but found to be less efficient. Conducted on three open transaction datasets and architectures, including LSTM, CNN, and Transformer, our findings not only illuminate the vulnerabilities in contemporary models but also can drive the construction of more robust systems.

Deep Reinforcement Learning for Artificial Upwelling Energy Management

  • paper_url: http://arxiv.org/abs/2308.10199
  • repo_url: None
  • paper_authors: Yiyuan Zhang, Wei Fan
  • For: This study develops an energy management strategy based on a deep reinforcement learning (DRL) algorithm to improve the efficiency and sustainability of artificial upwelling systems (AUS).
  • Methods: A DRL algorithm is used to develop efficient operating strategies for AUS, with performance evaluated through extensive simulations.
  • Results: The DRL approach effectively reduces energy wastage while ensuring stable and efficient AUS operation, outperforming conventional rule-based approaches and other DRL algorithms.
    Abstract The potential of artificial upwelling (AU) as a means of lifting nutrient-rich bottom water to the surface, stimulating seaweed growth, and consequently enhancing ocean carbon sequestration, has been gaining increasing attention in recent years. This has led to the development of the first solar-powered and air-lifted AU system (AUS) in China. However, efficient scheduling of air injection systems remains a crucial challenge in operating AUS, as it holds the potential to significantly improve system efficiency. Conventional approaches based on rules or models are often impractical due to the complex and heterogeneous nature of the marine environment and its associated disturbances. To address this challenge, we propose a novel energy management approach that utilizes deep reinforcement learning (DRL) algorithm to develop efficient strategies for operating AUS. Through extensive simulations, we evaluate the performance of our algorithm and demonstrate its superior effectiveness over traditional rule-based approaches and other DRL algorithms in reducing energy wastage while ensuring the stable and efficient operation of AUS. Our findings suggest that a DRL-based approach offers a promising way for improving the efficiency of AUS and enhancing the sustainability of seaweed cultivation and carbon sequestration in the ocean.

ProSpire: Proactive Spatial Prediction of Radio Environment Using Deep Learning

  • paper_url: http://arxiv.org/abs/2308.10193
  • repo_url: None
  • paper_authors: Shamik Sarkar, Dongning Guo, Danijela Cabric
  • For: This paper proposes a supervised deep learning-based framework, ProSpire, that enables spectrum sharing through proactive spatial prediction of the radio environment.
  • Methods: The framework crowdsources measurements from transmitters and receivers during their normal operations and performs prediction with a deep learning-based image-to-image translation method, RSSu-net.
  • Results: Evaluations show that RSSu-net predicts signal strength with a mean absolute error of 5 dB, comparable to other relevant methods, and creates proactive boundaries around transmitters such that they can be activated with a 97% probability of not causing interference, a 19% improvement over comparable methods.
    Abstract Spatial prediction of the radio propagation environment of a transmitter can assist and improve various aspects of wireless networks. The majority of research in this domain can be categorized as 'reactive' spatial prediction, where the predictions are made based on a small set of measurements from an active transmitter whose radio environment is to be predicted. Emerging spectrum-sharing paradigms would benefit from 'proactive' spatial prediction of the radio environment, where the spatial predictions must be done for a transmitter for which no measurement has been collected. This paper proposes a novel, supervised deep learning-based framework, ProSpire, that enables spectrum sharing by leveraging the idea of proactive spatial prediction. We carefully address several challenges in ProSpire, such as designing a framework that conveniently collects training data for learning, performing the predictions in a fast manner, enabling operations without an area map, and ensuring that the predictions do not lead to undesired interference. ProSpire relies on the crowdsourcing of transmitters and receivers during their normal operations to address some of the aforementioned challenges. The core component of ProSpire is a deep learning-based image-to-image translation method, which we call RSSu-net. We generate several diverse datasets using ray tracing software and numerically evaluate ProSpire. Our evaluations show that RSSu-net performs reasonably well in terms of signal strength prediction, 5 dB mean absolute error, which is comparable to the average error of other relevant methods. Importantly, due to the merits of RSSu-net, ProSpire creates proactive boundaries around transmitters such that they can be activated with 97% probability of not causing interference. In this regard, the performance of RSSu-net is 19% better than that of other comparable methods.

Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Competitive Games

  • paper_url: http://arxiv.org/abs/2308.10188
  • repo_url: None
  • paper_authors: The Viet Bui, Tien Mai, Thanh Hong Nguyen
  • For: This work addresses the difficulty of training agents in multi-agent competitive games, where dynamics are shaped by both the environment and opponents' strategies.
  • Methods: Imitation learning is used to understand and anticipate opponents' behavior; a new multi-agent imitation learning model predicts opponents' next moves from hidden opponent actions and local observations, and is combined with policy training in a single reinforcement learning process.
  • Results: Experiments in three challenging game environments, including an advanced version of the StarCraft multi-agent challenge (SMACv2), show superior performance over state-of-the-art multi-agent RL algorithms.
    Abstract Training agents in multi-agent competitive games presents significant challenges due to their intricate nature. These challenges are exacerbated by dynamics influenced not only by the environment but also by opponents' strategies. Existing methods often struggle with slow convergence and instability. To address this, we harness the potential of imitation learning to comprehend and anticipate opponents' behavior, aiming to mitigate uncertainties with respect to the game dynamics. Our key contributions include: (i) a new multi-agent imitation learning model for predicting next moves of the opponents -- our model works with hidden opponents' actions and local observations; (ii) a new multi-agent reinforcement learning algorithm that combines our imitation learning model and policy training into one single training process; and (iii) extensive experiments in three challenging game environments, including an advanced version of the Star-Craft multi-agent challenge (i.e., SMACv2). Experimental results show that our approach achieves superior performance compared to existing state-of-the-art multi-agent RL algorithms.

Quantization-based Optimization with Perspective of Quantum Mechanics

  • paper_url: http://arxiv.org/abs/2308.11594
  • repo_url: None
  • paper_authors: Jinwuk Seok, Changsik Cho
  • For: This work provides a new analysis framework for global optimization algorithms motivated by quantum mechanics.
  • Methods: Quantization-based optimization is analyzed through the Schrödinger equation to reveal which property of quantum mechanics enables global optimization.
  • Results: The tunneling effect derived from the Schrödinger equation allows quantization-based optimization to escape local minima; this is the same property found in quantum mechanics-based global optimization, and experiments on standard multi-modal benchmark functions confirm the analysis.
    Abstract Statistical and stochastic analysis based on thermodynamics has been the main analysis framework for stochastic global optimization. Recently, appearing quantum annealing or quantum tunneling algorithm for global optimization, we require a new researching framework for global optimization algorithms. In this paper, we provide the analysis for quantization-based optimization based on the Schr\"odinger equation to reveal what property in quantum mechanics enables global optimization. We present that the tunneling effect derived by the Schr\"odinger equation in quantization-based optimization enables to escape of a local minimum. Additionally, we confirm that this tunneling effect is the same property included in quantum mechanics-based global optimization. Experiments with standard multi-modal benchmark functions represent that the proposed analysis is valid.

Rethinking Client Drift in Federated Learning: A Logit Perspective

  • paper_url: http://arxiv.org/abs/2308.10162
  • repo_url: None
  • paper_authors: Yunlu Yan, Chun-Mei Feng, Mang Ye, Wangmeng Zuo, Ping Li, Rick Siow Mong Goh, Lei Zhu, C. L. Philip Chen
  • For: The paper focuses on addressing client drift in Federated Learning (FL) caused by non-IID data, which degrades FL performance.
  • Methods: The proposed method, FedCSD, uses class prototype similarity distillation to align local logits with refined global logits weighted by the similarity between the local logits and the global prototype; an adaptive mask filters out poor soft labels of the global model to prevent them from misleading local optimization.
  • Results: Extensive experiments demonstrate that the proposed method outperforms state-of-the-art federated learning approaches in various heterogeneous settings.
    Abstract Federated Learning (FL) enables multiple clients to collaboratively learn in a distributed way, allowing for privacy protection. However, the real-world non-IID data will lead to client drift which degrades the performance of FL. Interestingly, we find that the difference in logits between the local and global models increases as the model is continuously updated, thus seriously deteriorating FL performance. This is mainly due to catastrophic forgetting caused by data heterogeneity between clients. To alleviate this problem, we propose a new algorithm, named FedCSD, a Class prototype Similarity Distillation in a federated framework to align the local and global models. FedCSD does not simply transfer global knowledge to local clients, as an undertrained global model cannot provide reliable knowledge, i.e., class similarity information, and its wrong soft labels will mislead the optimization of local models. Concretely, FedCSD introduces a class prototype similarity distillation to align the local logits with the refined global logits that are weighted by the similarity between local logits and the global prototype. To enhance the quality of global logits, FedCSD adopts an adaptive mask to filter out the terrible soft labels of the global models, thereby preventing them to mislead local optimization. Extensive experiments demonstrate the superiority of our method over the state-of-the-art federated learning approaches in various heterogeneous settings. The source code will be released.
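To make the distillation idea concrete, here is a hedged sketch of a similarity-weighted distillation loss in the spirit of FedCSD; the cosine-similarity weighting, temperature, and mask threshold are illustrative assumptions, since the abstract does not fix these details.

```python
# A hedged sketch of class-prototype similarity distillation: local logits are
# pulled toward global logits, weighted by how similar the local logits are to
# the global prototype of the true class, and soft labels the global model gets
# badly wrong are masked out. The weighting and masking rules are assumptions.
import torch
import torch.nn.functional as F

def csd_loss(local_logits, global_logits, global_prototypes, labels,
             tau=2.0, mask_thresh=0.3):
    # Similarity between local logits and the global prototype of the true class.
    proto = global_prototypes[labels]                        # (B, C)
    weight = F.cosine_similarity(local_logits, proto, dim=1).clamp(min=0.0)

    # Adaptive mask: drop samples whose global soft label puts little mass on
    # the true class (a "terrible" soft label that would mislead the client).
    global_probs = F.softmax(global_logits / tau, dim=1)
    mask = (global_probs.gather(1, labels[:, None]).squeeze(1) > mask_thresh).float()

    kd = F.kl_div(F.log_softmax(local_logits / tau, dim=1),
                  global_probs, reduction="none").sum(dim=1)
    return (weight * mask * kd).mean() * tau ** 2

# Toy usage with random tensors (B=8 samples, C=5 classes).
B, C = 8, 5
loss = csd_loss(torch.randn(B, C), torch.randn(B, C), torch.randn(C, C),
                torch.randint(0, C, (B,)))
```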

Resource-Adaptive Newton’s Method for Distributed Learning

  • paper_url: http://arxiv.org/abs/2308.10154
  • repo_url: None
  • paper_authors: Shuzhen Chen, Yuan Yuan, Youming Tao, Zhipeng Cai, Dongxiao Yu
  • For: This paper proposes an efficient distributed stochastic optimization method that balances computation and communication costs.
  • Methods: Newton's method is adapted to the distributed setting with a simple Hessian initialization and adaptive assignment of training regions, addressing the high computation and communication costs of the Hessian matrix in large-scale, heterogeneous environments.
  • Results: The proposed RANL algorithm achieves a linear convergence rate while adapting to available resources, is remarkably independent of the problem's condition number, and eliminates the need for complex parameter tuning.
    Abstract Distributed stochastic optimization methods based on Newton's method offer significant advantages over first-order methods by leveraging curvature information for improved performance. However, the practical applicability of Newton's method is hindered in large-scale and heterogeneous learning environments due to challenges such as high computation and communication costs associated with the Hessian matrix, sub-model diversity, staleness in training, and data heterogeneity. To address these challenges, this paper introduces a novel and efficient algorithm called RANL, which overcomes the limitations of Newton's method by employing a simple Hessian initialization and adaptive assignments of training regions. The algorithm demonstrates impressive convergence properties, which are rigorously analyzed under standard assumptions in stochastic optimization. The theoretical analysis establishes that RANL achieves a linear convergence rate while effectively adapting to available resources and maintaining high efficiency. Unlike traditional first-order methods, RANL exhibits remarkable independence from the condition number of the problem and eliminates the need for complex parameter tuning. These advantages make RANL a promising approach for distributed stochastic optimization in practical scenarios.

Global Warming In Ghana’s Major Cities Based on Statistical Analysis of NASA’s POWER Over 3-Decades

  • paper_url: http://arxiv.org/abs/2308.10909
  • repo_url: None
  • paper_authors: Joshua Attih
  • For: This study investigates long-term temperature trends in four major Ghanaian cities to improve understanding of local climate warming for climate change strategies.
  • Methods: Statistical analyses of NASA's Prediction of Worldwide Energy Resource (POWER) data assess local climate warming; linear regression trend analysis and eXtreme Gradient Boosting (XGBoost) machine learning predict temperature variations, and Land Surface Temperature (LST) profile maps generated from the RSLab platform enhance accuracy.
  • Results: Local warming trends are found, particularly in industrialized Accra, while demographic factors are not significant; XGBoost's low Root Mean Square Error (RMSE) scores show it captures temperature patterns well. Wa unexpectedly has the highest mean temperature. Estimated mean temperatures for mid-2023 are: Accra 27.86°C, Kumasi 27.15°C, Kete-Krachi 29.39°C, and Wa 30.76°C.
    Abstract Global warming's impact on high temperatures in various parts of the world has raised concerns. This study investigates long-term temperature trends in four major Ghanaian cities representing distinct climatic zones. Using NASA's Prediction of Worldwide Energy Resource (POWER) data, statistical analyses assess local climate warming and its implications. Linear regression trend analysis and eXtreme Gradient Boosting (XGBoost) machine learning predict temperature variations. Land Surface Temperature (LST) profile maps generated from the RSLab platform enhance accuracy. Results reveal local warming trends, particularly in industrialized Accra. Demographic factors aren't significant. XGBoost model's low Root Mean Square Error (RMSE) scores demonstrate effectiveness in capturing temperature patterns. Wa unexpectedly has the highest mean temperature. Estimated mean temperatures for mid-2023 are: Accra 27.86°C, Kumasi 27.15°C, Kete-Krachi 29.39°C, and Wa 30.76°C. These findings improve understanding of local climate warming for policymakers and communities, aiding climate change strategies.
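The following minimal sketch reproduces the analysis pattern (a linear trend fit plus boosted-tree prediction on calendar features) on synthetic monthly data; the study itself uses NASA POWER data and XGBoost, for which sklearn's GradientBoostingRegressor stands in here to avoid an extra dependency.

```python
# A minimal sketch of the analysis pattern: fit a linear trend to a monthly
# temperature series, then a gradient-boosting model on calendar features.
# Data here are synthetic, not the NASA POWER data used in the study.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
months = np.arange(12 * 30)                       # 30 years of monthly data
seasonal = 2.0 * np.sin(2 * np.pi * months / 12)  # annual cycle
trend = 0.002 * months                            # slow warming trend
temps = 27.0 + trend + seasonal + rng.normal(0, 0.4, months.size)

# 1) Linear trend analysis: warming slope per decade.
lin = LinearRegression().fit(months.reshape(-1, 1), temps)
print(f"warming trend: {lin.coef_[0] * 120:.3f} degC per decade")

# 2) Boosted-tree prediction from (time index, month-of-year) features.
X = np.column_stack([months, months % 12])
split = 12 * 25                                   # train on the first 25 years
model = GradientBoostingRegressor().fit(X[:split], temps[:split])
pred = model.predict(X[split:])
rmse = float(np.sqrt(mean_squared_error(temps[split:], pred)))
print(f"hold-out RMSE: {rmse:.2f} degC")
```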

OCHID-Fi: Occlusion-Robust Hand Pose Estimation in 3D via RF-Vision

  • paper_url: http://arxiv.org/abs/2308.10146
  • repo_url: None
  • paper_authors: Shujie Zhang, Tianyue Zheng, Zhe Chen, Jingzhi Hu, Abdelwahed Khamis, Jiajun Liu, Jun Luo
  • For: The paper proposes a new method for hand pose estimation (HPE) that overcomes the limitation of camera-based methods, which are subject to Line-of-Sight (LoS) and cannot capture occluded objects.
  • Methods: The proposed method, OCHID-Fi, uses radio-frequency vision (RF-vision) to bypass obstacles. It employs wideband RF sensors widely available on smart devices to probe 3D human hand pose and extract skeletons behind obstacles, and it overcomes the difficulty of labeling RF imaging with a cross-modality, cross-domain training process: a pre-trained CM-HPE network and a synchronized CM/RF dataset guide training under LoS conditions, and adversarial learning transfers the knowledge to the unlabeled occluded domain.
  • Results: Experiments show that OCHID-Fi achieves accuracy comparable to camera-based HPE under normal conditions and maintains that accuracy in occluded scenarios, with empirical evidence of generalizability to new domains.
    Abstract Hand Pose Estimation (HPE) is crucial to many applications, but conventional cameras-based CM-HPE methods are completely subject to Line-of-Sight (LoS), as cameras cannot capture occluded objects. In this paper, we propose to exploit Radio-Frequency-Vision (RF-vision) capable of bypassing obstacles for achieving occluded HPE, and we introduce OCHID-Fi as the first RF-HPE method with 3D pose estimation capability. OCHID-Fi employs wideband RF sensors widely available on smart devices (e.g., iPhones) to probe 3D human hand pose and extract their skeletons behind obstacles. To overcome the challenge in labeling RF imaging given its human incomprehensible nature, OCHID-Fi employs a cross-modality and cross-domain training process. It uses a pre-trained CM-HPE network and a synchronized CM/RF dataset, to guide the training of its complex-valued RF-HPE network under LoS conditions. It further transfers knowledge learned from labeled LoS domain to unlabeled occluded domain via adversarial learning, enabling OCHID-Fi to generalize to unseen occluded scenarios. Experimental results demonstrate the superiority of OCHID-Fi: it achieves comparable accuracy to CM-HPE under normal conditions while maintaining such accuracy even in occluded scenarios, with empirical evidence for its generalizability to new domains.

Wasserstein Geodesic Generator for Conditional Distributions

  • paper_url: http://arxiv.org/abs/2308.10145
  • repo_url: https://github.com/kyg0910/wasserstein-geodesic-generator-for-conditional-distributions
  • paper_authors: Young-geun Kim, Kyungbok Lee, Youngwon Choi, Joong-Ho Won, Myunghee Cho Paik
  • For: This paper proposes a novel conditional generation algorithm for generating samples given domain labels, including unobserved intermediate domains.
  • Methods: A tractable upper bound of the Wasserstein distance between conditional distributions is derived, and optimal transport theory is used to propose the Wasserstein geodesic generator, which learns both the conditional distributions of observed domains and the optimal transport maps between them.
  • Results: Conditional distributions given unobserved intermediate domains lie on the Wasserstein geodesic between those of the observed domains; experiments on face images with lighting conditions as domain labels demonstrate the efficacy of the method.
    Abstract Generating samples given a specific label requires estimating conditional distributions. We derive a tractable upper bound of the Wasserstein distance between conditional distributions to lay the theoretical groundwork to learn conditional distributions. Based on this result, we propose a novel conditional generation algorithm where conditional distributions are fully characterized by a metric space defined by a statistical distance. We employ optimal transport theory to propose the Wasserstein geodesic generator, a new conditional generator that learns the Wasserstein geodesic. The proposed method learns both conditional distributions for observed domains and optimal transport maps between them. The conditional distributions given unobserved intermediate domains are on the Wasserstein geodesic between conditional distributions given two observed domain labels. Experiments on face images with light conditions as domain labels demonstrate the efficacy of the proposed method.
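For intuition about points "on the Wasserstein geodesic", the closed-form case of 1-D Gaussians is instructive: the W2 geodesic (McCann's displacement interpolation) between two Gaussians is again Gaussian, with linearly interpolated mean and standard deviation. A small numerical check:

```python
# A numerical illustration of a Wasserstein-2 geodesic between two 1-D Gaussian
# conditional distributions. The paper's generator learns such interpolations
# for images; this closed-form toy case only illustrates the geodesic property.
import numpy as np

def gaussian_w2_geodesic(m0, s0, m1, s1, t):
    """Displacement interpolation between N(m0, s0^2) and N(m1, s1^2):
    the point at time t is again Gaussian with interpolated mean and std."""
    return (1 - t) * m0 + t * m1, (1 - t) * s0 + t * s1

def w2_gaussian(m0, s0, m1, s1):
    """Closed-form W2 distance between 1-D Gaussians."""
    return np.sqrt((m0 - m1) ** 2 + (s0 - s1) ** 2)

# Two "observed domains" (e.g., two lighting conditions) as Gaussians.
m0, s0, m1, s1 = 0.0, 1.0, 4.0, 2.0
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    mt, st = gaussian_w2_geodesic(m0, s0, m1, s1, t)
    # Geodesic property: distance from the start grows linearly in t.
    print(f"t={t:.2f}: mean={mt:.2f}, std={st:.2f}, "
          f"W2 from start={w2_gaussian(m0, s0, mt, st):.3f}")
```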

ExpeL: LLM Agents Are Experiential Learners

  • paper_url: http://arxiv.org/abs/2308.10144
  • repo_url: https://github.com/Andrewzh112/ExpeL
  • paper_authors: Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, Gao Huang
  • For: This paper proposes a method that lets LLM agents learn from experience and improve performance on decision-making tasks without parametric updates.
  • Methods: The agent autonomously gathers experiences from a collection of training tasks and extracts knowledge in natural language; at inference, it recalls the extracted insights and past experiences to make informed decisions.
  • Results: Experiments show robust learning efficacy, with performance consistently improving as the agent accumulates experience.
    Abstract The recent surge in research interest in applying large language models (LLMs) to decision-making tasks has flourished by leveraging the extensive world knowledge embedded in LLMs. While there is a growing demand to tailor LLMs for custom decision-making tasks, finetuning them for specific tasks is resource-intensive and may diminish the model's generalization capabilities. Moreover, state-of-the-art language models like GPT-4 and Claude are primarily accessible through API calls, with their parametric weights remaining proprietary and unavailable to the public. This scenario emphasizes the growing need for new methodologies that allow learning from agent experiences without requiring parametric updates. To address these problems, we introduce the Experiential Learning (ExpeL) agent. Our agent autonomously gathers experiences and extracts knowledge using natural language from a collection of training tasks. At inference, the agent recalls its extracted insights and past experiences to make informed decisions. Our empirical results highlight the robust learning efficacy of the ExpeL agent, indicating a consistent enhancement in its performance as it accumulates experiences. We further explore the emerging capabilities and transfer learning potential of the ExpeL agent through qualitative observations and additional experiments.
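A hedged skeleton of the experience loop described above, with a stubbed LLM call; every name, prompt, and retrieval rule here is a hypothetical illustration, not the paper's implementation.

```python
# A skeleton of an experiential-learning loop in the spirit of ExpeL: collect
# trajectories on training tasks, distill natural-language insights, and recall
# insights plus similar past experiences at inference. `llm` is a stub — swap
# in a real LLM client.
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Stub standing in for an LLM API call."""
    return f"[model response to {len(prompt)} chars of prompt]"

@dataclass
class ExperienceMemory:
    trajectories: list = field(default_factory=list)   # (task, trajectory, success)
    insights: list = field(default_factory=list)       # distilled rules of thumb

    def gather(self, task: str, trajectory: str, success: bool):
        self.trajectories.append((task, trajectory, success))

    def extract_insights(self):
        # Compare successful vs. failed attempts and distill reusable advice.
        summary = "\n".join(f"success={s}: {tr}" for _, tr, s in self.trajectories)
        self.insights.append(llm("Extract reusable insights from:\n" + summary))

    def act(self, task: str) -> str:
        # No parametric update: insights and recalled experiences go in the prompt.
        recalled = [tr for _, tr, s in self.trajectories if s][-2:]
        prompt = (f"Insights: {self.insights}\n"
                  f"Similar successful experiences: {recalled}\n"
                  f"Task: {task}\nDecide the next action.")
        return llm(prompt)

mem = ExperienceMemory()
mem.gather("buy milk", "searched aisle 3, found milk", True)
mem.gather("buy milk", "searched aisle 7, failed", False)
mem.extract_insights()
print(mem.act("buy cheese"))
```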

A Review on Objective-Driven Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2308.10135
  • repo_url: None
  • paper_authors: Apoorv Singh
  • For: The review examines the gap between artificial and human intelligence and how hierarchical planning-based and energy-based approaches might close it.
  • Methods: It surveys current AI techniques, including supervised learning, reinforcement learning, and self-supervised learning, and critiques their limitations.
  • Results: The review argues that hierarchical planning-based approaches, energy-based and latent-variable methods, and joint embedding predictive architectures offer promising directions for closing the gap.
    Abstract While advancing rapidly, Artificial Intelligence still falls short of human intelligence in several key aspects due to inherent limitations in current AI technologies and our understanding of cognition. Humans have an innate ability to understand context, nuances, and subtle cues in communication, which allows us to comprehend jokes, sarcasm, and metaphors. Machines struggle to interpret such contextual information accurately. Humans possess a vast repository of common-sense knowledge that helps us make logical inferences and predictions about the world. Machines lack this innate understanding and often struggle with making sense of situations that humans find trivial. In this article, we review the prospective Machine Intelligence candidates, a review from Prof. Yann LeCun, and other work that can help close this gap between human and machine intelligence. Specifically, we talk about what's lacking with the current AI techniques such as supervised learning, reinforcement learning, self-supervised learning, etc. Then we show how Hierarchical planning-based approaches can help us close that gap and deep-dive into energy-based, latent-variable methods and Joint embedding predictive architecture methods.

AutoReP: Automatic ReLU Replacement for Fast Private Network Inference

  • paper_url: http://arxiv.org/abs/2308.10134
  • repo_url: https://github.com/harveyp123/autorep
  • paper_authors: Hongwu Peng, Shaoyi Huang, Tong Zhou, Yukui Luo, Chenghong Wang, Zigeng Wang, Jiahui Zhao, Xi Xie, Ang Li, Tony Geng, Kaleel Mahmood, Wujie Wen, Xiaolin Xu, Caiwen Ding
  • For: The work addresses clients' data privacy and security concerns in the Machine-Learning-as-a-Service (MLaaS) market.
  • Methods: Private inference (PI) with cryptographic primitives protects data but incurs high computation and communication costs, especially for non-linear operators such as ReLU; AutoReP is a gradient-based approach that automates the selection between ReLU and polynomial functions and introduces distribution-aware polynomial approximation (DaPa) to maintain model expressivity while accurately approximating ReLUs.
  • Results: Experiments show accuracy improvements of 6.12% (94.31%, 12.9K ReLU budget, CIFAR-10), 8.39% (74.92%, 12.9K ReLU budget, CIFAR-100), and 9.45% (63.69%, 55K ReLU budget, Tiny-ImageNet) over state-of-the-art methods such as SNL; applied to EfficientNet-B2 on ImageNet, AutoReP achieves 75.55% accuracy with a 176.1x ReLU budget reduction.
    Abstract The growth of the Machine-Learning-As-A-Service (MLaaS) market has highlighted clients' data privacy and security issues. Private inference (PI) techniques using cryptographic primitives offer a solution but often have high computation and communication costs, particularly with non-linear operators like ReLU. Many attempts to reduce ReLU operations exist, but they may need heuristic threshold selection or cause substantial accuracy loss. This work introduces AutoReP, a gradient-based approach to lessen non-linear operators and alleviate these issues. It automates the selection of ReLU and polynomial functions to speed up PI applications and introduces distribution-aware polynomial approximation (DaPa) to maintain model expressivity while accurately approximating ReLUs. Our experimental results demonstrate significant accuracy improvements of 6.12% (94.31%, 12.9K ReLU budget, CIFAR-10), 8.39% (74.92%, 12.9K ReLU budget, CIFAR-100), and 9.45% (63.69%, 55K ReLU budget, Tiny-ImageNet) over current state-of-the-art methods, e.g., SNL. Morever, AutoReP is applied to EfficientNet-B2 on ImageNet dataset, and achieved 75.55% accuracy with 176.1 times ReLU budget reduction.
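The core substitution — replacing ReLU with a PI-friendly polynomial fitted under the pre-activation distribution — can be illustrated in a few lines; the degree-2 fit under a standard normal below is an assumption made for illustration, not DaPa's exact procedure.

```python
# A toy illustration of distribution-aware polynomial approximation of ReLU:
# fit a low-degree polynomial minimizing squared error under an assumed
# (here standard normal) pre-activation distribution.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)      # samples from the assumed distribution
y = np.maximum(x, 0.0)                 # ReLU targets

# Sampling from the distribution already weights the least-squares fit
# toward high-density regions.
coeffs = np.polyfit(x, y, deg=2)
poly = np.poly1d(coeffs)

for v in np.linspace(-3, 3, 7):
    print(f"x={v:+.1f}  relu={max(v, 0):.3f}  poly={poly(v):.3f}")

# Mean squared error under the distribution: the accuracy/cost trade-off
# a PI-friendly replacement must manage.
print("MSE under N(0,1):", np.mean((poly(x) - y) ** 2))
```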

Intelligent Communication Planning for Constrained Environmental IoT Sensing with Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.10124
  • repo_url: None
  • paper_authors: Yi Hu, Jinhang Zuo, Bob Iannucci, Carlee Joe-Wong
  • For: This paper optimizes the communication planning of IoT sensors that track the state of the environment, to maximize monitoring accuracy under stringent resource constraints.
  • Methods: A multi-agent reinforcement learning (MARL) method finds communication policies for each sensor that maximize tracking accuracy subject to power and bandwidth limitations, exploiting the spatio-temporal correlation of the environmental data at each sensor's location to reduce redundant reports.
  • Results: Experiments on wildfire spread with LoRa wireless network simulators show that the MARL method learns to balance collecting enough data to predict wildfire spread against unknown bandwidth limitations.
    Abstract Internet of Things (IoT) technologies have enabled numerous data-driven mobile applications and have the potential to significantly improve environmental monitoring and hazard warnings through the deployment of a network of IoT sensors. However, these IoT devices are often power-constrained and utilize wireless communication schemes with limited bandwidth. Such power constraints limit the amount of information each device can share across the network, while bandwidth limitations hinder sensors' coordination of their transmissions. In this work, we formulate the communication planning problem of IoT sensors that track the state of the environment. We seek to optimize sensors' decisions in collecting environmental data under stringent resource constraints. We propose a multi-agent reinforcement learning (MARL) method to find the optimal communication policies for each sensor that maximize the tracking accuracy subject to the power and bandwidth limitations. MARL learns and exploits the spatial-temporal correlation of the environmental data at each sensor's location to reduce the redundant reports from the sensors. Experiments on wildfire spread with LoRA wireless network simulators show that our MARL method can learn to balance the need to collect enough data to predict wildfire spread with unknown bandwidth limitations.

Deep Generative Modeling-based Data Augmentation with Demonstration using the BFBT Benchmark Void Fraction Datasets

  • paper_url: http://arxiv.org/abs/2308.10120
  • repo_url: None
  • paper_authors: Farah Alsafadi, Xu Wu
  • For: This study addresses the data scarcity that makes deep learning (DL) difficult in nuclear engineering problems, especially when data come from high-cost experiments.
  • Methods: Deep generative models (DGMs), including generative adversarial networks (GANs), normalizing flows (NFs), variational autoencoders (VAEs), and conditional VAEs (CVAEs), are trained to learn the underlying probability distribution of the training dataset and then used to generate synthetic data that significantly expand the dataset size.
  • Results: Augmenting TRACE-simulated steady-state void fraction data from the BFBT benchmark shows that VAEs, CVAEs, and GANs have comparable generative performance with similar errors in the synthetic data, with CVAEs achieving the smallest errors.
    Abstract Deep learning (DL) has achieved remarkable successes in many disciplines such as computer vision and natural language processing due to the availability of ``big data''. However, such success cannot be easily replicated in many nuclear engineering problems because of the limited amount of training data, especially when the data comes from high-cost experiments. To overcome such a data scarcity issue, this paper explores the applications of deep generative models (DGMs) that have been widely used for image data generation to scientific data augmentation. DGMs, such as generative adversarial networks (GANs), normalizing flows (NFs), variational autoencoders (VAEs), and conditional VAEs (CVAEs), can be trained to learn the underlying probabilistic distribution of the training dataset. Once trained, they can be used to generate synthetic data that are similar to the training data and significantly expand the dataset size. By employing DGMs to augment TRACE simulated data of the steady-state void fractions based on the NUPEC Boiling Water Reactor Full-size Fine-mesh Bundle Test (BFBT) benchmark, this study demonstrates that VAEs, CVAEs, and GANs have comparable generative performance with similar errors in the synthetic data, with CVAEs achieving the smallest errors. The findings shows that DGMs have a great potential to augment scientific data in nuclear engineering, which proves effective for expanding the training dataset and enabling other DL models to be trained more accurately.
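As a minimal illustration of the augmentation workflow (fit a generative model, then sample synthetic records), here is a tiny VAE on synthetic 2-D data; the architecture, latent dimension, and training schedule are illustrative, not the paper's configuration.

```python
# A minimal VAE sketch of the augmentation workflow: learn the training
# distribution, then sample synthetic records to enlarge the dataset.
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, d_in=2, d_hidden=32, d_z=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        self.mu = nn.Linear(d_hidden, d_z)
        self.logvar = nn.Linear(d_hidden, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, d_hidden), nn.ReLU(),
                                 nn.Linear(d_hidden, d_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

torch.manual_seed(0)
# Synthetic correlated 2-D "measurements" standing in for scarce real data.
data = torch.randn(2048, 2) @ torch.tensor([[1.0, 0.6], [0.0, 0.8]]) + 3.0

vae = TinyVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for step in range(2000):
    recon, mu, logvar = vae(data)
    recon_loss = ((recon - data) ** 2).sum(dim=1).mean()
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1).mean()
    loss = recon_loss + kl
    opt.zero_grad(); loss.backward(); opt.step()

# Augmentation: decode prior samples into synthetic training records.
with torch.no_grad():
    synthetic = vae.dec(torch.randn(512, 2))
print(synthetic.mean(dim=0), data.mean(dim=0))  # should roughly agree
```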

Modeling Random Networks with Heterogeneous Reciprocity

  • paper_url: http://arxiv.org/abs/2308.10113
  • repo_url: None
  • paper_authors: Daniel Cirkovic, Tiandong Wang
  • For: This paper studies reciprocal behavior in growing social networks and how differing levels of reciprocity shape network structure.
  • Methods: A preferential attachment model with heterogeneous reciprocity is proposed, imitating both the attraction users have for popular users and the heterogeneous rates at which they reciprocate links; Bayesian, frequentist, and computationally efficient variational fitting techniques are compared, for both known and unknown numbers of communities.
  • Results: Applied to a Facebook wall-post network, the fitted model captures the heavy-tailed nature of the empirical degree distributions and identifies multiple groups of users that differ in their tendency to reply to and receive responses to wall posts.
    Abstract Reciprocity, or the tendency of individuals to mirror behavior, is a key measure that describes information exchange in a social network. Users in social networks tend to engage in different levels of reciprocal behavior. Differences in such behavior may indicate the existence of communities that reciprocate links at varying rates. In this paper, we develop methodology to model the diverse reciprocal behavior in growing social networks. In particular, we present a preferential attachment model with heterogeneous reciprocity that imitates the attraction users have for popular users, plus the heterogeneous nature by which they reciprocate links. We compare Bayesian and frequentist model fitting techniques for large networks, as well as computationally efficient variational alternatives. Cases where the number of communities are known and unknown are both considered. We apply the presented methods to the analysis of a Facebook wallpost network where users have non-uniform reciprocal behavior patterns. The fitted model captures the heavy-tailed nature of the empirical degree distributions in the Facebook data and identifies multiple groups of users that differ in their tendency to reply to and receive responses to wallposts.
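The generative side of such a model is easy to sketch: new nodes attach preferentially by in-degree, and targets reciprocate with a group-specific probability. The two-group setup and the exact attachment rule below are illustrative assumptions, not the paper's specification.

```python
# A hedged simulation of preferential attachment with heterogeneous reciprocity:
# each new user links to an existing user in proportion to in-degree (attraction
# to popular users), and the target reciprocates with a probability set by its
# latent group.
import random
from collections import defaultdict

def simulate(n_nodes=5000, recip_probs=(0.1, 0.8), seed=0):
    rng = random.Random(seed)
    group = {0: 0}                 # node -> latent reciprocity group
    in_deg = defaultdict(int)
    edges = set()
    targets = [0]                  # multiset: node repeated once per in-edge (+1)
    for v in range(1, n_nodes):
        group[v] = rng.randrange(len(recip_probs))
        u = rng.choice(targets)    # preferential attachment by in-degree
        edges.add((v, u)); in_deg[u] += 1; targets.append(u)
        if rng.random() < recip_probs[group[u]]:
            edges.add((u, v)); in_deg[v] += 1; targets.append(v)
        targets.append(v)          # smoothing so new nodes can be chosen
    return edges, group

edges, group = simulate()
recip = sum((b, a) in edges for (a, b) in edges) / len(edges)
print(f"overall reciprocity: {recip:.2f}")
```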

Robust Mixture-of-Expert Training for Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2308.10110
  • repo_url: https://github.com/optml-group/robust-moe-cnn
  • paper_authors: Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Huan Zhang, Pin-Yu Chen, Shiyu Chang, Zhangyang Wang, Sijia Liu
  • For: The paper studies how to bring adversarial training (AT) to convolutional neural networks (CNNs) with a sparsely-gated Mixture-of-Experts (MoE) architecture, to improve their adversarial robustness.
  • Methods: Conventional AT is found ineffective for MoE-CNNs because routers (the gating functions that select data-specific experts) and experts are hard to adapt to each other; a new router-expert alternating adversarial training framework, AdvMoE, is proposed.
  • Results: Across 4 commonly used CNN architectures and 4 benchmark datasets, AdvMoE improves adversarial robustness by 1%-4% over the original dense CNN while retaining the efficiency of sparsity-gated MoE, with more than 50% inference cost reduction.
    Abstract Sparsely-gated Mixture of Expert (MoE), an emerging deep model architecture, has demonstrated a great promise to enable high-accuracy and ultra-efficient model inference. Despite the growing popularity of MoE, little work investigated its potential to advance convolutional neural networks (CNNs), especially in the plane of adversarial robustness. Since the lack of robustness has become one of the main hurdles for CNNs, in this paper we ask: How to adversarially robustify a CNN-based MoE model? Can we robustly train it like an ordinary CNN model? Our pilot study shows that the conventional adversarial training (AT) mechanism (developed for vanilla CNNs) no longer remains effective to robustify an MoE-CNN. To better understand this phenomenon, we dissect the robustness of an MoE-CNN into two dimensions: Robustness of routers (i.e., gating functions to select data-specific experts) and robustness of experts (i.e., the router-guided pathways defined by the subnetworks of the backbone CNN). Our analyses show that routers and experts are hard to adapt to each other in the vanilla AT. Thus, we propose a new router-expert alternating Adversarial training framework for MoE, termed AdvMoE. The effectiveness of our proposal is justified across 4 commonly-used CNN model architectures over 4 benchmark datasets. We find that AdvMoE achieves 1% ~ 4% adversarial robustness improvement over the original dense CNN, and enjoys the efficiency merit of sparsity-gated MoE, leading to more than 50% inference cost reduction. Codes are available at https://github.com/OPTML-Group/Robust-MoE-CNN.
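The alternating scheme can be sketched on a toy dense-gated model: generate adversarial examples each batch, then update router and expert parameters in alternating phases. Everything below (one-step FGSM, the tiny MoE, the schedule) is illustrative only, not the paper's sparsely-gated MoE-CNN recipe.

```python
# A toy sketch of router-expert alternating adversarial training.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d=20, n_experts=2, n_classes=2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, n_classes))
            for _ in range(n_experts))

    def forward(self, x):
        gate = F.softmax(self.router(x), dim=1)                   # (B, E)
        outs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, C)
        return (gate.unsqueeze(-1) * outs).sum(dim=1)             # (B, C)

def fgsm(model, x, y, eps=0.1):
    """One-step FGSM attack standing in for a stronger AT inner attack."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).detach()

torch.manual_seed(0)
X, Y = torch.randn(512, 20), torch.randint(0, 2, (512,))
model = ToyMoE()
opt_router = torch.optim.Adam(model.router.parameters(), lr=1e-3)
opt_expert = torch.optim.Adam(model.experts.parameters(), lr=1e-3)

for step in range(200):
    x_adv = fgsm(model, X, Y)
    loss = F.cross_entropy(model(x_adv), Y)
    # Alternate which parameter group is updated on adversarial data.
    opt = opt_router if step % 2 == 0 else opt_expert
    opt.zero_grad(); loss.backward(); opt.step()
print("final robust training loss:", float(loss))
```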

An Online Multiple Kernel Parallelizable Learning Scheme

  • paper_url: http://arxiv.org/abs/2308.10101
  • repo_url: None
  • paper_authors: Emilio Ruiz-Moreno, Baltasar Beferull-Lozano
  • For: This paper proposes a scalable online multiple kernel learning scheme that reduces the bias introduced by kernel selection.
  • Methods: Several single-kernel-based online methods are combined through a multi-kernel learning formulation applicable to any task posed as a regularized empirical risk minimization convex problem; the scheme is parallelizable, allowing the computational load to be distributed across computing units.
  • Results: Experiments show that the proposed scheme outperforms the combined single-kernel online methods run separately in terms of the cumulative regularized least-squares cost.
    Abstract The performance of reproducing kernel Hilbert space-based methods is known to be sensitive to the choice of the reproducing kernel. Choosing an adequate reproducing kernel can be challenging and computationally demanding, especially in data-rich tasks without prior information about the solution domain. In this paper, we propose a learning scheme that scalably combines several single kernel-based online methods to reduce the kernel-selection bias. The proposed learning scheme applies to any task formulated as a regularized empirical risk minimization convex problem. More specifically, our learning scheme is based on a multi-kernel learning formulation that can be applied to widen any single-kernel solution space, thus increasing the possibility of finding higher-performance solutions. In addition, it is parallelizable, allowing for the distribution of the computational load across different computing units. We show experimentally that the proposed learning scheme outperforms the combined single-kernel online methods separately in terms of the cumulative regularized least squares cost metric.
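A hedged sketch of the general pattern: several single-kernel online learners run in parallel (hence trivially parallelizable) and their predictions are combined with multiplicative weights. The kernel-LMS updates and Hedge-style weighting below are illustrative stand-ins, not the paper's exact algorithm.

```python
# Online multiple-kernel learning sketch: one Gaussian-kernel LMS learner per
# bandwidth, combined with multiplicative (Hedge-style) weights.
import numpy as np

class KernelLMS:
    def __init__(self, gamma, eta=0.5):
        self.gamma, self.eta = gamma, eta
        self.X, self.alpha = [], []

    def _k(self, x, X):
        X = np.asarray(X)
        return np.exp(-self.gamma * ((X - x) ** 2).sum(axis=1))

    def predict(self, x):
        if not self.X:
            return 0.0
        return float(self._k(x, self.X) @ np.asarray(self.alpha))

    def update(self, x, y):
        err = y - self.predict(x)              # functional-gradient step
        self.X.append(x); self.alpha.append(self.eta * err)
        return err

rng = np.random.default_rng(0)
learners = [KernelLMS(g) for g in (0.1, 1.0, 10.0)]   # one learner per kernel
w = np.ones(len(learners)) / len(learners)
lam, cum = 2.0, 0.0                                   # Hedge rate, cumulative loss

for t in range(500):
    x = rng.uniform(-2, 2, size=1)
    y = np.sin(2 * x[0]) + 0.1 * rng.normal()
    preds = np.array([m.predict(x) for m in learners])
    combined = float(w @ preds)                       # multi-kernel prediction
    cum += (combined - y) ** 2
    losses = np.array([m.update(x, y) ** 2 for m in learners])
    w *= np.exp(-lam * losses)                        # reweight by instantaneous loss
    w /= w.sum()

print(f"cumulative loss of combination: {cum:.2f}")
print("final kernel weights:", np.round(w, 3))
```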

Geometric instability of graph neural networks on large graphs

  • paper_url: http://arxiv.org/abs/2308.10099
  • repo_url: https://github.com/brs96/geometric-instability-gnn-large-graphs
  • paper_authors: Emily Morris, Haotian Shen, Weiling Du, Muhammad Hamza Sajjad, Borun Shi
  • For: This paper investigates the geometric instability of embeddings produced by graph neural networks (GNNs) on large graphs.
  • Methods: A simple, efficient, graph-native Graph Gram Index (GGI) is proposed to measure such instability; it is invariant to permutation, orthogonal transformation, translation, and order of evaluation.
  • Results: GGI enables the study of the varying instability behavior of GNN embeddings on large graphs for both node classification and link prediction.
    Abstract We analyse the geometric instability of embeddings produced by graph neural networks (GNNs). Existing methods are only applicable for small graphs and lack context in the graph domain. We propose a simple, efficient and graph-native Graph Gram Index (GGI) to measure such instability which is invariant to permutation, orthogonal transformation, translation and order of evaluation. This allows us to study the varying instability behaviour of GNN embeddings on large graphs for both node classification and link prediction.
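The invariances the paper demands (orthogonal transformation, translation) are shared by comparisons of centered Gram matrices, which gives a simple stand-in for intuition; the paper's actual GGI is defined differently in detail, so the index below is only an illustrative proxy.

```python
# Measuring geometric instability of embeddings across two training runs via
# centered Gram matrices: centering removes translation, and X @ X.T is
# unchanged by any orthogonal transform of X.
import numpy as np

def centered_gram(X):
    X = X - X.mean(axis=0, keepdims=True)   # translation invariance
    return X @ X.T                          # orthogonal-transform invariance

def gram_similarity(X, Y):
    """Normalized inner product of centered Gram matrices (in [0, 1] here)."""
    Gx, Gy = centered_gram(X), centered_gram(Y)
    return float((Gx * Gy).sum() / (np.linalg.norm(Gx) * np.linalg.norm(Gy)))

rng = np.random.default_rng(0)
emb_run1 = rng.normal(size=(200, 16))       # embeddings of 200 nodes, run 1

# A run differing only by rotation + shift scores ~1 (stable geometry).
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
emb_rot = emb_run1 @ Q + 3.0
print("rotated copy :", gram_similarity(emb_run1, emb_rot))   # ~1.0

# An independent run with different geometry scores lower (instability).
emb_run2 = rng.normal(size=(200, 16))
print("fresh run    :", gram_similarity(emb_run1, emb_run2))  # < 1
```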

  • paper_url: http://arxiv.org/abs/2308.10098
  • repo_url: None
  • paper_authors: Mohammad Sadegh Salehi, Subhadip Mukherjee, Lindon Roberts, Matthias J. Ehrhardt
  • For: This paper addresses the difficulty of choosing regularization parameters in variational regularization problems, especially for regularizers with many hyperparameters.
  • Methods: Bilevel learning is used to learn suitable hyperparameters; since exact hypergradients are unattainable with numerical solvers, a provably convergent inexact backtracking line search involving inexact function evaluations and hypergradients is introduced, together with an algorithm that determines the required accuracy dynamically.
  • Results: Numerical experiments demonstrate the efficiency and feasibility of the approach for hyperparameter estimation in variational regularization problems, along with robustness to the choices of initial accuracy and step size.
    Abstract In various domains within imaging and data science, particularly when addressing tasks modeled utilizing the variational regularization approach, manually configuring regularization parameters presents a formidable challenge. The difficulty intensifies when employing regularizers involving a large number of hyperparameters. To overcome this challenge, bilevel learning is employed to learn suitable hyperparameters. However, due to the use of numerical solvers, the exact gradient with respect to the hyperparameters is unattainable, necessitating the use of methods relying on approximate gradients. State-of-the-art inexact methods a priori select a decreasing summable sequence of the required accuracy and only assure convergence given a sufficiently small fixed step size. Despite this, challenges persist in determining the Lipschitz constant of the hypergradient and identifying an appropriate fixed step size. Conversely, computing exact function values is not feasible, impeding the use of line search. In this work, we introduce a provably convergent inexact backtracking line search involving inexact function evaluations and hypergradients. We show convergence to a stationary point of the loss with respect to hyperparameters. Additionally, we propose an algorithm to determine the required accuracy dynamically. Our numerical experiments demonstrate the efficiency and feasibility of our approach for hyperparameter estimation in variational regularization problems, alongside its robustness in terms of the initial accuracy and step size choices.
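The key mechanism — accept an Armijo-type decrease only when it holds despite the evaluation error, and tighten the accuracy when the test is inconclusive — can be sketched as follows; this mirrors the idea on a toy quadratic but is not the paper's exact algorithm.

```python
# A backtracking (Armijo) line search with inexact function values: the
# sufficient-decrease test is trusted only when decisive under worst-case
# evaluation error; otherwise the error tolerance eps is halved.
import numpy as np

A = np.diag([1.0, 10.0])                      # an ill-conditioned toy quadratic

def inexact_f(x, eps, rng):
    """Stand-in for an inexactly evaluated loss: true value plus bounded error."""
    return 0.5 * float(x @ A @ x) + rng.uniform(-eps, eps)

def inexact_backtracking(x, grad, rng, alpha=1.0, rho=0.5, c=1e-4, eps=1e-1):
    gnorm2 = float(grad @ grad)
    while True:
        decrease = inexact_f(x, eps, rng) - inexact_f(x - alpha * grad, eps, rng)
        needed = c * alpha * gnorm2
        if decrease - 2 * eps >= needed:      # passes even in the worst case
            return alpha, eps
        if decrease + 2 * eps < needed:       # fails even in the best case
            alpha *= rho                      # step genuinely too long
        else:
            eps *= 0.5                        # inconclusive: demand more accuracy

rng = np.random.default_rng(0)
x = np.array([3.0, -2.0])
for _ in range(10):
    grad = A @ x                              # gradient of 0.5 * x^T A x
    if float(grad @ grad) < 1e-12:
        break                                 # (near-)stationary: stop
    alpha, eps = inexact_backtracking(x, grad, rng)
    x = x - alpha * grad
print("final iterate:", x)
```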

MLOps: A Review

  • paper_url: http://arxiv.org/abs/2308.10908
  • repo_url: https://github.com/jenningst/ecommerce-ops
  • paper_authors: Samar Wazir, Gautam Siddharth Kashyap, Parag Saxena
  • For: This study examines the significance of Machine Learning Operations (MLOps) methods in helping developers create software that uses machine learning algorithms.
  • Methods: The authors assess the features and operability of various MLOps methods to choose the best tool structure for particular projects, reviewing 22 papers that attempted to apply the MLOps idea.
  • Results: The review finds a scarcity of fully effective MLOps methods whose advancements can self-regulate by limiting human engagement.
    Abstract Recently, Machine Learning (ML) has become a widely accepted method for significant progress that is rapidly evolving. Since it employs computational methods to teach machines and produce acceptable answers. The significance of the Machine Learning Operations (MLOps) methods, which can provide acceptable answers for such problems, is examined in this study. To assist in the creation of software that is simple to use, the authors research MLOps methods. To choose the best tool structure for certain projects, the authors also assess the features and operability of various MLOps methods. A total of 22 papers were assessed that attempted to apply the MLOps idea. Finally, the authors admit the scarcity of fully effective MLOps methods based on which advancements can self-regulate by limiting human engagement.

Securing Pathways with Orthogonal Robots

  • paper_url: http://arxiv.org/abs/2308.10093
  • repo_url: None
  • paper_authors: Hamid Hoorfar, Faraneh Fathi, Sara Moshtaghi Largani, Alireza Bagheri
  • For: Protecting pathways is highly significant in domains such as urban planning, transportation, surveillance, and security.
  • Methods: The paper introduces an approach to safeguarding pathways using orthogonal robots, focusing on guarding orthogonal areas with the minimum number of robots; robots may be placed anywhere within the polygon, on the boundary or in the interior.
  • Results: The minimum number of orthogonal robots for pathways can be determined in linear time, whereas the general problem for simple polygons with general visibility is NP-hard even in the orthogonal case.
    Abstract The protection of pathways holds immense significance across various domains, including urban planning, transportation, surveillance, and security. This article introduces a groundbreaking approach to safeguarding pathways by employing orthogonal robots. The study specifically addresses the challenge of efficiently guarding orthogonal areas with the minimum number of orthogonal robots. The primary focus is on orthogonal pathways, characterized by a path-like dual graph of vertical decomposition. It is demonstrated that determining the minimum number of orthogonal robots for pathways can be achieved in linear time. However, it is essential to note that the general problem of finding the minimum number of robots for simple polygons with general visibility, even in the orthogonal case, is known to be NP-hard. Emphasis is placed on the flexibility of placing robots anywhere within the polygon, whether on the boundary or in the interior.

Minimizing Turns in Watchman Robot Navigation: Strategies and Solutions

  • paper_url: http://arxiv.org/abs/2308.10090
  • repo_url: None
  • paper_authors: Hamid Hoorfar, Sara Moshtaghi Largani, Reza Rahimi, Alireza Bagheri
  • For: This work proposes an efficient algorithm for the Orthogonal Watchman Route Problem (OWRP) to optimize surveillance and patrol tasks in robotic systems.
  • Methods: A linear-time algorithm solves the OWRP under the assumption that the environment is monotone.
  • Results: The algorithm minimizes the number of turns in the watchman route, enabling more streamlined patrol robots with improved coverage and surveillance capabilities.
    Abstract The Orthogonal Watchman Route Problem (OWRP) entails the search for the shortest path, known as the watchman route, that a robot must follow within a polygonal environment. The primary objective is to ensure that every point in the environment remains visible from at least one point on the route, allowing the robot to survey the entire area in a single, continuous sweep. This research places particular emphasis on reducing the number of turns in the route, as it is crucial for optimizing navigation in watchman routes within the field of robotics. The cost associated with changing direction is of significant importance, especially for specific types of robots. This paper introduces an efficient linear-time algorithm for solving the OWRP under the assumption that the environment is monotone. The findings of this study contribute to the progress of robotic systems by enabling the design of more streamlined patrol robots. These robots are capable of efficiently navigating complex environments while minimizing the number of turns. This advancement enhances their coverage and surveillance capabilities, making them highly effective in various real-world applications.

Contrastive Learning for Non-Local Graphs with Multi-Resolution Structural Views

  • paper_url: http://arxiv.org/abs/2308.10077
  • repo_url: None
  • paper_authors: Asif Khan, Amos Storkey
  • For: Learning node-level representations of heterophilic graphs is crucial for applications such as fraudster detection and protein function prediction.
  • Methods: A multiview contrastive learning approach integrates diffusion filters on graphs, using multiple graph views as augmentations to capture structural equivalence in heterophilic graphs and uncover hidden relationships and similarities not apparent in traditional node representations.
  • Results: The approach outperforms baselines on synthetic and real structural datasets, surpassing the best baseline by 16.06% on Cornell, 3.27% on Texas, and 8.04% on Wisconsin, and consistently achieves superior performance on proximal tasks.
    Abstract Learning node-level representations of heterophilic graphs is crucial for various applications, including fraudster detection and protein function prediction. In such graphs, nodes share structural similarity identified by the equivalence of their connectivity which is implicitly encoded in the form of higher-order hierarchical information in the graphs. The contrastive methods are popular choices for learning the representation of nodes in a graph. However, existing contrastive methods struggle to capture higher-order graph structures. To address this limitation, we propose a novel multiview contrastive learning approach that integrates diffusion filters on graphs. By incorporating multiple graph views as augmentations, our method captures the structural equivalence in heterophilic graphs, enabling the discovery of hidden relationships and similarities not apparent in traditional node representations. Our approach outperforms baselines on synthetic and real structural datasets, surpassing the best baseline by $16.06\%$ on Cornell, $3.27\%$ on Texas, and $8.04\%$ on Wisconsin. Additionally, it consistently achieves superior performance on proximal tasks, demonstrating its effectiveness in uncovering structural information and improving downstream applications.

ILCAS: Imitation Learning-Based Configuration-Adaptive Streaming for Live Video Analytics with Cross-Camera Collaboration

  • paper_url: http://arxiv.org/abs/2308.10068
  • repo_url: None
  • paper_authors: Duo Wu, Dayou Zhang, Miao Zhang, Ruoyu Zhang, Fangxin Wang, Shuguang Cui
  • For: The goal is to improve the accuracy and resource efficiency of deep neural network (DNN) inference in live video analytics (VA), where camera videos are streamed to edge/cloud servers.
  • Methods: Imitation learning (IL) trains the agent with demonstrations from an offline optimal policy that solves the configuration adaptation problem via dynamic programming; motion feature maps derived from motion vectors let the system perceive video content changes, and a cross-camera collaboration scheme exploits spatio-temporal correlations among cameras for better configuration selection.
  • Results: Experiments show 2-20.9% improvement in mean accuracy and 19.9-85.3% reduction in chunk upload lag compared with state-of-the-art solutions.
    Abstract The high-accuracy and resource-intensive deep neural networks (DNNs) have been widely adopted by live video analytics (VA), where camera videos are streamed over the network to resource-rich edge/cloud servers for DNN inference. Common video encoding configurations (e.g., resolution and frame rate) have been identified with significant impacts on striking the balance between bandwidth consumption and inference accuracy and therefore their adaption scheme has been a focus of optimization. However, previous profiling-based solutions suffer from high profiling cost, while existing deep reinforcement learning (DRL) based solutions may achieve poor performance due to the usage of fixed reward function for training the agent, which fails to craft the application goals in various scenarios. In this paper, we propose ILCAS, the first imitation learning (IL) based configuration-adaptive VA streaming system. Unlike DRL-based solutions, ILCAS trains the agent with demonstrations collected from the expert which is designed as an offline optimal policy that solves the configuration adaption problem through dynamic programming. To tackle the challenge of video content dynamics, ILCAS derives motion feature maps based on motion vectors which allow ILCAS to visually ``perceive'' video content changes. Moreover, ILCAS incorporates a cross-camera collaboration scheme to exploit the spatio-temporal correlations of cameras for more proper configuration selection. Extensive experiments confirm the superiority of ILCAS compared with state-of-the-art solutions, with 2-20.9% improvement of mean accuracy and 19.9-85.3% reduction of chunk upload lag.