cs.LG - 2023-08-21

Graph Neural Bandits

  • paper_url: http://arxiv.org/abs/2308.10808
  • repo_url: https://github.com/lasgroup/GNNBO
  • paper_authors: Yunzhe Qi, Yikun Ban, Jingrui He
  • for: This work proposes a recommendation framework built on graph neural networks to refine recommendation strategies and tackle the exploitation-exploration dilemma.
  • methods: Models the collaborative relations among users through estimated user graphs and applies separate GNN-based models for exploitation and adaptive exploration.
  • results: Theoretical analysis and experiments on multiple real-world datasets against state-of-the-art baselines demonstrate the effectiveness of the GNB framework.
    Abstract Contextual bandits algorithms aim to choose the optimal arm with the highest reward out of a set of candidates based on the contextual information. Various bandit algorithms have been applied to real-world applications due to their ability of tackling the exploitation-exploration dilemma. Motivated by online recommendation scenarios, in this paper, we propose a framework named Graph Neural Bandits (GNB) to leverage the collaborative nature among users empowered by graph neural networks (GNNs). Instead of estimating rigid user clusters as in existing works, we model the "fine-grained" collaborative effects through estimated user graphs in terms of exploitation and exploration respectively. Then, to refine the recommendation strategy, we utilize separate GNN-based models on estimated user graphs for exploitation and adaptive exploration. Theoretical analysis and experimental results on multiple real data sets in comparison with state-of-the-art baselines are provided to demonstrate the effectiveness of our proposed framework.
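
The selection rule can be sketched as below, as a hedged reading of the framework: per-user reward estimates are aggregated over an estimated exploitation graph and per-user uncertainty estimates over an exploration graph, with plain weighted averages standing in for the paper's GNN-based models. All names (W_exploit, W_explore, f_reward, f_uncert) are illustrative, not the paper's notation.

```python
import numpy as np

def gnb_select_arm(x_arms, user, W_exploit, W_explore, f_reward, f_uncert):
    """Pick an arm by combining graph-aggregated exploitation and
    exploration scores for the serving user.

    x_arms    : (K, d) contextual features of the candidate arms
    W_exploit : (n, n) estimated user graph for exploitation
    W_explore : (n, n) estimated user graph for exploration
    f_reward  : f_reward(u, x) -> estimated reward of arm x for user u
    f_uncert  : f_uncert(u, x) -> estimated uncertainty (exploration bonus)
    """
    n = W_exploit.shape[0]
    scores = []
    for x in x_arms:
        # Weighted averages over the two user graphs stand in for the
        # paper's separate GNN-based models on those graphs.
        exploit = sum(W_exploit[user, v] * f_reward(v, x) for v in range(n))
        explore = sum(W_explore[user, v] * f_uncert(v, x) for v in range(n))
        scores.append(exploit + explore)
    return int(np.argmax(scores))
```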

DynED: Dynamic Ensemble Diversification in Data Stream Classification

  • paper_url: http://arxiv.org/abs/2308.10807
  • repo_url: https://github.com/soheilabadifard/dyned
  • paper_authors: Soheil Abadifard, Sepehr Bakhshi, Sanaz Gheibuni, Fazli Can
  • for: Improving classification accuracy in data stream environments, where shifts in the data distribution (concept drift) degrade model performance.
  • methods: Uses MMR (Maximal Marginal Relevance) to dynamically combine components, improving both the prediction accuracy and the diversity of the ensemble.
  • results: In experiments, the proposed method (DynED) achieves a higher average mean accuracy than five state-of-the-art baselines.
    Abstract Ensemble methods are commonly used in classification due to their remarkable performance. Achieving high accuracy in a data stream environment is a challenging task considering disruptive changes in the data distribution, also known as concept drift. A greater diversity of ensemble components is known to enhance prediction accuracy in such settings. Despite the diversity of components within an ensemble, not all contribute as expected to its overall performance. This necessitates a method for selecting components that exhibit high performance and diversity. We present a novel ensemble construction and maintenance approach based on MMR (Maximal Marginal Relevance) that dynamically combines the diversity and prediction accuracy of components during the process of structuring an ensemble. The experimental results on four real and 11 synthetic datasets demonstrate that the proposed approach (DynED) provides a higher average mean accuracy compared to the five state-of-the-art baselines.
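
A minimal sketch of the MMR-style selection step the abstract describes, assuming per-component accuracy scores and a pairwise similarity matrix (e.g., prediction agreement) are available; DynED's exact scoring may differ.

```python
import numpy as np

def mmr_select(accuracy, similarity, k, lam=0.7):
    """Select k ensemble components by Maximal Marginal Relevance:
    trade off each component's accuracy against its redundancy with
    the components already chosen.

    accuracy   : (n,) recent prediction accuracy of each component
    similarity : (n, n) pairwise similarity between components
    """
    selected = [int(np.argmax(accuracy))]            # seed with the best one
    candidates = set(range(len(accuracy))) - set(selected)
    while candidates and len(selected) < k:
        def mmr(i):
            redundancy = max(similarity[i, j] for j in selected)
            return lam * accuracy[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```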

Differentiable Frank-Wolfe Optimization Layer

  • paper_url: http://arxiv.org/abs/2308.10806
  • repo_url: None
  • paper_authors: Zixuan Liu, Liu Liu, Xueqian Wang, Peilin Zhao
  • for: Improving the efficiency of differentiable optimization for large-scale problems.
  • methods: A differentiable layer (DFWLayer) obtained by unrolling the Frank-Wolfe algorithm.
  • results: An efficient differentiable optimization method that adheres tightly to constraints and computes quickly on large-scale problems.
    Abstract Differentiable optimization has received a significant amount of attention due to its foundational role in the domain of machine learning based on neural networks. The existing methods leverage the optimality conditions and implicit function theorem to obtain the Jacobian matrix of the output, which increases the computational cost and limits the application of differentiable optimization. In addition, some non-differentiable constraints lead to more challenges when using prior differentiable optimization layers. This paper proposes a differentiable layer, named Differentiable Frank-Wolfe Layer (DFWLayer), by rolling out the Frank-Wolfe method, a well-known optimization algorithm which can solve constrained optimization problems without projections and Hessian matrix computations, thus leading to an efficient way of dealing with large-scale problems. Theoretically, we establish a bound on the suboptimality gap of the DFWLayer in the context of l1-norm constraints. Experimental assessments demonstrate that the DFWLayer not only attains competitive accuracy in solutions and gradients but also consistently adheres to constraints. Moreover, it surpasses the baselines in both forward and backward computational speeds.
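
To make the projection-free idea concrete, here is a plain (framework-free) unrolling of Frank-Wolfe for an l1-norm ball, the constraint family the paper analyzes: the linear minimization oracle reduces to picking one signed coordinate, so no projections or Hessians are needed.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, tau, num_steps=50):
    """Unrolled Frank-Wolfe iterations for min f(x) s.t. ||x||_1 <= tau.
    Each step only needs a linear minimization oracle, which for the
    l1 ball is a single signed coordinate of the gradient."""
    x = x0.copy()
    for t in range(num_steps):
        g = grad_f(x)
        i = int(np.argmax(np.abs(g)))        # LMO over the l1 ball
        s = np.zeros_like(x)
        s[i] = -tau * np.sign(g[i])
        gamma = 2.0 / (t + 2.0)              # standard open-loop step size
        x = (1 - gamma) * x + gamma * s      # convex combination stays feasible
    return x

# Example: quadratic objective f(x) = 0.5 * ||x - b||^2
b = np.array([3.0, -0.5, 1.0])
x_star = frank_wolfe_l1(lambda x: x - b, np.zeros(3), tau=1.0)
```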

Stabilizing Unsupervised Environment Design with a Learned Adversary

  • paper_url: http://arxiv.org/abs/2308.10797
  • repo_url: https://github.com/facebookresearch/dcd
  • paper_authors: Ishita Mediratta, Minqi Jiang, Jack Parker-Holder, Michael Dennis, Eugene Vinitsky, Tim Rocktäschel
  • for: train generally-capable agents and design training tasks that facilitate broad generalization and robustness to environment variations
  • methods: reinforcement learning (RL) to train a teacher policy to design tasks from scratch
  • results: proposed solutions to several key shortcomings of PAIRED, enabling PAIRED to match or exceed state-of-the-art methods in several established challenging procedurally-generated environments
    Abstract A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations. This challenge motivates the problem setting of Unsupervised Environment Design (UED), whereby a student agent trains on an adaptive distribution of tasks proposed by a teacher agent. A pioneering approach for UED is PAIRED, which uses reinforcement learning (RL) to train a teacher policy to design tasks from scratch, making it possible to directly generate tasks that are adapted to the agent's current capabilities. Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance. Thus, state-of-the-art methods currently rely on curation and mutation rather than generation of new tasks. In this work, we investigate several key shortcomings of PAIRED and propose solutions for each shortcoming. As a result, we make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments, including a partially-observed maze navigation task and a continuous-control car racing environment. We believe this work motivates a renewed emphasis on UED methods based on learned models that directly generate challenging environments, potentially unlocking more open-ended RL training and, as a result, more general agents.
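
For reference, the teacher objective at the heart of PAIRED is an estimate of regret on the generated environment, typically computed as the antagonist's best return minus the protagonist's mean return:

```python
def paired_teacher_reward(antagonist_returns, protagonist_returns):
    """PAIRED rewards the teacher with an estimate of regret on the
    environment it generated. High regret means the task is solvable
    (the antagonist solves it) yet still challenging for the student."""
    regret = max(antagonist_returns) - sum(protagonist_returns) / len(protagonist_returns)
    return regret
```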

MGMAE: Motion Guided Masking for Video Masked Autoencoding

  • paper_url: http://arxiv.org/abs/2308.10794
  • repo_url: None
  • paper_authors: Bingkun Huang, Zhiyu Zhao, Guozhen Zhang, Yu Qiao, Limin Wang
  • for: This paper aims to improve the performance of video masked autoencoding by introducing a motion guided masking strategy that incorporates motion information during pre-training.
  • methods: The proposed method, Motion Guided Masked Autoencoder (MGMAE), uses an online efficient optical flow estimator and a backward masking map warping strategy to build a temporally consistent masking volume and track unmasked tokens in time.
  • results: MGMAE outperforms the original VideoMAE on the Something-Something V2 and Kinetics-400 datasets; visualization analysis illustrates the effectiveness of motion-adaptive sampling of temporally consistent cubes for video pre-training.
    Abstract Masked autoencoding has shown excellent performance on self-supervised video representation learning. Temporal redundancy has led to a high masking ratio and customized masking strategy in VideoMAE. In this paper, we aim to further improve the performance of video masked autoencoding by introducing a motion guided masking strategy. Our key insight is that motion is a general and unique prior in video, which should be taken into account during masked pre-training. Our motion guided masking explicitly incorporates motion information to build temporal consistent masking volume. Based on this masking volume, we can track the unmasked tokens in time and sample a set of temporal consistent cubes from videos. These temporal aligned unmasked tokens will further relieve the information leakage issue in time and encourage the MGMAE to learn more useful structure information. We implement our MGMAE with an online efficient optical flow estimator and backward masking map warping strategy. We perform experiments on the datasets of Something-Something V2 and Kinetics-400, demonstrating the superior performance of our MGMAE to the original VideoMAE. In addition, we provide the visualization analysis to illustrate that our MGMAE can sample temporal consistent cubes in a motion-adaptive manner for more effective video pre-training.
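
A simplified sketch of the masking-volume construction: an initial mask is propagated frame to frame by warping it along estimated optical flow (nearest-neighbour backward warping here), so the same content stays masked or unmasked over time. The paper's online flow estimator and exact warping details are abstracted away.

```python
import numpy as np

def warp_mask_with_flow(mask, flow):
    """Propagate a binary masking map to the next frame by warping it
    along estimated optical flow.

    mask : (H, W) binary map for frame t
    flow : (H, W, 2) flow from frame t+1 back to frame t, as (dy, dx)
    """
    H, W = mask.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return mask[src_y, src_x]

def build_mask_volume(init_mask, flows):
    """Stack a temporally consistent masking volume from an initial mask
    and per-step flows (one fewer than the number of frames)."""
    volume = [init_mask]
    for flow in flows:
        volume.append(warp_mask_with_flow(volume[-1], flow))
    return np.stack(volume)
```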

Instruction Tuning for Large Language Models: A Survey

  • paper_url: http://arxiv.org/abs/2308.10792
  • repo_url: None
  • paper_authors: Shengyu Zhang, Linfeng Dong, Xiaoya Li, Sen Zhang, Xiaofei Sun, Shuhe Wang, Jiwei Li, Runyi Hu, Tianwei Zhang, Fei Wu, Guoyin Wang
  • for: This survey reviews instruction tuning (IT) techniques for large language models (LLMs), a crucial approach to enhancing their capabilities and controllability.
  • methods: A systematic literature review covering the general methodology of IT, the construction of instruction datasets, the training of IT models, and applications across different modalities, domains, and tasks.
  • results: Identifies key factors that influence IT outcomes (such as the generation of instruction outputs and the size of the instruction dataset), reviews potential pitfalls and criticisms, and suggests avenues for fruitful research.
    Abstract This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc.). We also review the potential pitfalls of IT and the criticism against it, point out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
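
A common form of the supervised step, sketched under the assumption of a Hugging Face-style tokenizer (with `encode` and `eos_token_id`): the model is trained with a causal-LM loss on the concatenated (instruction, output) tokens, with labels masked over the instruction part so only output tokens contribute to the loss.

```python
import torch

def build_sft_example(tokenizer, instruction, output, ignore_index=-100):
    """Tokenize one (instruction, output) pair for supervised instruction
    tuning: the model learns to predict only the output tokens, so labels
    over the instruction span are excluded from the loss."""
    prompt_ids = tokenizer.encode(instruction)
    output_ids = tokenizer.encode(output) + [tokenizer.eos_token_id]
    input_ids = torch.tensor(prompt_ids + output_ids)
    labels = torch.tensor([ignore_index] * len(prompt_ids) + output_ids)
    # Feed to a causal LM trained with CrossEntropyLoss(ignore_index=-100).
    return input_ids, labels
```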

Zero- and Few-Shot Prompting with LLMs: A Comparative Study with Fine-tuned Models for Bangla Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2308.10783
  • repo_url: None
  • paper_authors: Md. Arid Hasan, Shudipta Das, Afiyat Anjum, Firoj Alam, Anika Anjum, Avijit Sarker, Sheak Rashed Haider Noori
  • for: This paper investigates sentiment analysis for Bangla, a low-resource language, and how large language models perform on it.
  • methods: Compares several language models, including Flan-T5, GPT-4, and Bloomz, in zero- and few-shot in-context learning against fine-tuned models.
  • results: Monolingual transformer-based models consistently outperform the other models, even in zero- and few-shot scenarios.
    Abstract The rapid expansion of the digital world has propelled sentiment analysis into a critical tool across diverse sectors such as marketing, politics, customer service, and healthcare. While there have been significant advancements in sentiment analysis for widely spoken languages, low-resource languages, such as Bangla, remain largely under-researched due to resource constraints. Furthermore, the recent unprecedented performance of Large Language Models (LLMs) in various applications highlights the need to evaluate them in the context of low-resource languages. In this study, we present a sizeable manually annotated dataset encompassing 33,605 Bangla news tweets and Facebook comments. We also investigate zero- and few-shot in-context learning with several language models, including Flan-T5, GPT-4, and Bloomz, offering a comparative analysis against fine-tuned models. Our findings suggest that monolingual transformer-based models consistently outperform other models, even in zero and few-shot scenarios. To foster continued exploration, we intend to make this dataset and our research tools publicly available to the broader research community.
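
A zero-/few-shot prompt of the kind compared in the paper can be assembled as below; the labels and wording are illustrative, not the paper's exact templates.

```python
def build_few_shot_prompt(examples, text, labels=("Positive", "Negative", "Neutral")):
    """Assemble a k-shot sentiment prompt: a short task description,
    k labeled demonstrations, then the target text with an empty label
    slot. Zero-shot is the special case examples=[]."""
    prompt = f"Classify the sentiment of the text as one of: {', '.join(labels)}.\n\n"
    for demo_text, demo_label in examples:
        prompt += f"Text: {demo_text}\nSentiment: {demo_label}\n\n"
    prompt += f"Text: {text}\nSentiment:"
    return prompt
```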

Sparse Linear Concept Discovery Models

  • paper_url: http://arxiv.org/abs/2308.10782
  • repo_url: https://github.com/konpanousis/conceptdiscoverymodels
  • paper_authors: Konstantinos P. Panousis, Dino Ienco, Diego Marcos
  • for: Improving the interpretability and performance of deep learning models.
  • methods: A contrastive language-image model combined with a single sparse linear layer.
  • results: Higher accuracy than other CBM approaches, together with high per-example concept sparsity.
    Abstract The recent mass adoption of DNNs, even in safety-critical scenarios, has shifted the focus of the research community towards the creation of inherently interpretable models. Concept Bottleneck Models (CBMs) constitute a popular approach where hidden layers are tied to human understandable concepts allowing for investigation and correction of the network's decisions. However, CBMs usually suffer from: (i) performance degradation and (ii) lower interpretability than intended due to the sheer amount of concepts contributing to each decision. In this work, we propose a simple yet highly intuitive interpretable framework based on Contrastive Language Image models and a single sparse linear layer. In stark contrast to related approaches, the sparsity in our framework is achieved via principled Bayesian arguments by inferring concept presence via a data-driven Bernoulli distribution. As we experimentally show, our framework not only outperforms recent CBM approaches accuracy-wise, but it also yields high per example concept sparsity, facilitating the individual investigation of the emerging concepts.
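
A hedged sketch of the architecture: CLIP-style similarities between an image embedding and concept text embeddings feed a single linear classifier, with per-concept Bernoulli presence probabilities gating the concepts. The mean-field gate used during training here is a differentiable stand-in for the paper's Bayesian inference of concept presence.

```python
import torch
import torch.nn as nn

class SparseConceptClassifier(nn.Module):
    """Concept-based classifier sketch: concept activations are gated by
    learned Bernoulli presence probabilities before a single linear layer."""
    def __init__(self, concept_embs, num_classes):
        super().__init__()
        self.concepts = nn.Parameter(concept_embs, requires_grad=False)
        self.gate_logits = nn.Parameter(torch.zeros(concept_embs.shape[0]))
        self.linear = nn.Linear(concept_embs.shape[0], num_classes)

    def forward(self, img_embs):
        sims = img_embs @ self.concepts.T            # CLIP-style concept activations
        probs = torch.sigmoid(self.gate_logits)      # per-concept presence probability
        # Simplification: expected (mean-field) gates while training, hard
        # 0/1 gates at inference; the paper's Bayesian treatment differs.
        gates = probs if self.training else (probs > 0.5).float()
        return self.linear(sims * gates)
```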

Mixed-Integer Projections for Automated Data Correction of EMRs Improve Predictions of Sepsis among Hospitalized Patients

  • paper_url: http://arxiv.org/abs/2308.10781
  • repo_url: None
  • paper_authors: Mehak Arora, Hassan Mortagy, Nathan Dwarshius, Swati Gupta, Andre L. Holder, Rishikesan Kamaleswaran
  • for: This work aims to improve the automation of machine learning (ML) models in clinical decision-making; a notable gap in prior research is the lack of proper handling of errors and outliers in Electronic Medical Record (EMR) data.
  • methods: Introduces a projections-based method that encodes clinical expertise as domain constraints, generating important meta-data usable in ML workflows. In particular, high-dimensional mixed-integer programs capture physiological and biological constraints on patient vitals and lab values in order to correct patient data.
  • results: The framework boosts the performance of ML classifiers for early sepsis detection, achieving an AUROC of 0.865 and a precision of 0.922, surpassing models without such projections.
    Abstract Machine learning (ML) models are increasingly pivotal in automating clinical decisions. Yet, a glaring oversight in prior research has been the lack of proper processing of Electronic Medical Record (EMR) data in the clinical context for errors and outliers. Addressing this oversight, we introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints, generating important meta-data that can be used in ML workflows. In particular, by using high-dimensional mixed-integer programs that capture physiological and biological constraints on patient vitals and lab values, we can harness the power of mathematical "projections" for the EMR data to correct patient data. Consequently, we measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a unique predictive metric we term as "trust-scores". These scores provide insight into the patient's health status and significantly boost the performance of ML classifiers in real-life clinical settings. We validate the impact of our framework in the context of early detection of sepsis using ML. We show an AUROC of 0.865 and a precision of 0.922, that surpasses conventional ML models without such projections.
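
As a toy illustration of the projection idea (the paper's constraints are high-dimensional, coupled, and mixed-integer, not simple boxes), one can project a record onto plausible ranges and use the distance moved as a trust-score-like quantity:

```python
import numpy as np

def correct_and_score(vitals, lower, upper):
    """Toy stand-in for the paper's mixed-integer projections: project a
    patient's vitals onto a box of physiologically plausible ranges and
    use the distance moved as a trust-score-style metric."""
    corrected = np.clip(vitals, lower, upper)
    distance = np.linalg.norm(vitals - corrected)   # 0 => fully plausible record
    return corrected, distance

vitals = np.array([38.5, 250.0, 95.0])              # temp (C), heart rate, SpO2
lower  = np.array([30.0, 20.0, 50.0])
upper  = np.array([43.0, 220.0, 100.0])
corrected, dist = correct_and_score(vitals, lower, upper)
```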

Spear and Shield: Adversarial Attacks and Defense Methods for Model-Based Link Prediction on Continuous-Time Dynamic Graphs

  • paper_url: http://arxiv.org/abs/2308.10779
  • repo_url: None
  • paper_authors: Dongjin Lee, Juho Lee, Kijung Shin
  • for: This paper focuses on investigating the vulnerabilities of Temporal Graph Neural Networks (TGNNs) against adversarial attacks, specifically for link prediction tasks on continuous-time dynamic graphs.
  • methods: The proposed method, T-SPEAR, injects edge perturbations into the data that are unnoticeable yet effective in causing malfunction in the victim model. Additionally, the proposed robust training approach, T-SHIELD, uses edge filtering and temporal smoothness to enhance the robustness of the victim model.
  • results: The paper demonstrates that T-SPEAR significantly degrades the victim model's performance on link prediction tasks, and the attacks are transferable to other TGNNs. Moreover, T-SHIELD effectively filters out adversarial edges and exhibits robustness against adversarial attacks, surpassing the link prediction performance of the naive TGNN by up to 11.2% under T-SPEAR.
    Abstract Real-world graphs are dynamic, constantly evolving with new interactions, such as financial transactions in financial networks. Temporal Graph Neural Networks (TGNNs) have been developed to effectively capture the evolving patterns in dynamic graphs. While these models have demonstrated their superiority, being widely adopted in various important fields, their vulnerabilities against adversarial attacks remain largely unexplored. In this paper, we propose T-SPEAR, a simple and effective adversarial attack method for link prediction on continuous-time dynamic graphs, focusing on investigating the vulnerabilities of TGNNs. Specifically, before the training procedure of a victim model, which is a TGNN for link prediction, we inject edge perturbations to the data that are unnoticeable in terms of the four constraints we propose, and yet effective enough to cause malfunction of the victim model. Moreover, we propose a robust training approach T-SHIELD to mitigate the impact of adversarial attacks. By using edge filtering and enforcing temporal smoothness to node embeddings, we enhance the robustness of the victim model. Our experimental study shows that T-SPEAR significantly degrades the victim model's performance on link prediction tasks, and even more, our attacks are transferable to other TGNNs, which differ from the victim model assumed by the attacker. Moreover, we demonstrate that T-SHIELD effectively filters out adversarial edges and exhibits robustness against adversarial attacks, surpassing the link prediction performance of the naive TGNN by up to 11.2% under T-SPEAR.
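
Two ingredients of the defense can be sketched independently, with all thresholds and similarity choices illustrative: an edge filter that drops candidate edges whose endpoint embeddings are too dissimilar, and a temporal smoothness regularizer on node embeddings.

```python
import torch
import torch.nn.functional as F

def filter_suspicious_edges(src_emb, dst_emb, threshold=0.0):
    """Edge filtering in the spirit of T-SHIELD: drop candidate edges with
    low cosine similarity between endpoint embeddings, on the assumption
    that adversarial edges tend to connect unrelated nodes."""
    cos = F.cosine_similarity(src_emb, dst_emb, dim=-1)
    return cos >= threshold                  # boolean keep-mask per edge

def temporal_smoothness_loss(emb_t, emb_prev):
    """Regularizer encouraging node embeddings to evolve smoothly in time,
    limiting the influence of abrupt adversarial perturbations."""
    return ((emb_t - emb_prev) ** 2).sum(dim=-1).mean()
```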

A Modular and Adaptive System for Business Email Compromise Detection

  • paper_url: http://arxiv.org/abs/2308.10776
  • repo_url: None
  • paper_authors: Jan Brabec, Filip Šrajer, Radek Starosta, Tomáš Sixta, Marc Dupont, Miloš Lenoch, Jiří Menšík, Florian Becker, Jakub Boros, Tomáš Pop, Pavel Novák
  • for: Defending against Business Email Compromise (BEC) and targeted spear phishing email attacks.
  • methods: Combines multiple machine learning methods and data modalities, including natural language understanding (NLU), to detect BEC-related behaviors across email text, images, metadata, and the email's communication context.
  • results: Proven effective in a production environment for over two years; the system adapts to continuously evolving attack techniques and produces explainable verdicts.
    Abstract The growing sophistication of Business Email Compromise (BEC) and spear phishing attacks poses significant challenges to organizations worldwide. The techniques featured in traditional spam and phishing detection are insufficient due to the tailored nature of modern BEC attacks as they often blend in with the regular benign traffic. Recent advances in machine learning, particularly in Natural Language Understanding (NLU), offer a promising avenue for combating such attacks but in a practical system, due to limitations such as data availability, operational costs, verdict explainability requirements or a need to robustly evolve the system, it is essential to combine multiple approaches together. We present CAPE, a comprehensive and efficient system for BEC detection that has been proven in a production environment for a period of over two years. Rather than being a single model, CAPE is a system that combines independent ML models and algorithms detecting BEC-related behaviors across various email modalities such as text, images, metadata and the email's communication context. This decomposition makes CAPE's verdicts naturally explainable. In the paper, we describe the design principles and constraints behind its architecture, as well as the challenges of model design, evaluation and adapting the system continuously through a Bayesian approach that combines limited data with domain knowledge. Furthermore, we elaborate on several specific behavioral detectors, such as those based on Transformer neural architectures.

GBM-based Bregman Proximal Algorithms for Constrained Learning

  • paper_url: http://arxiv.org/abs/2308.10767
  • repo_url: https://github.com/zhenweilin/constrainedgbm
  • paper_authors: Zhenwei Lin, Qi Deng
  • for: Developing machine learning algorithms for more intricate constrained learning tasks, in particular projection-unfriendly settings such as Neyman-Pearson classification (NPC) and fairness classification.
  • methods: Adapts gradient boosting machines to constrained learning within the framework of Bregman proximal algorithms, introducing a new Bregman primal-dual method with a global optimality guarantee when the objective and constraint functions are convex; for nonconvex functions, the algorithm remains effective under a Bregman proximal point framework.
  • results: Substantial experimental evidence demonstrates the effectiveness of the framework on constrained learning applications such as NPC and fairness classification. The framework integrates with existing GBM implementations such as XGBoost and LightGBM through their public interfaces, without changes to existing code or architecture, giving it the same availability and ease of use as existing algorithms.
    Abstract As the complexity of learning tasks surges, modern machine learning encounters a new constrained learning paradigm characterized by more intricate and data-driven function constraints. Prominent applications include Neyman-Pearson classification (NPC) and fairness classification, which entail specific risk constraints that render standard projection-based training algorithms unsuitable. Gradient boosting machines (GBMs) are among the most popular algorithms for supervised learning; however, they are generally limited to unconstrained settings. In this paper, we adapt the GBM for constrained learning tasks within the framework of Bregman proximal algorithms. We introduce a new Bregman primal-dual method with a global optimality guarantee when the learning objective and constraint functions are convex. In cases of nonconvex functions, we demonstrate how our algorithm remains effective under a Bregman proximal point framework. Distinct from existing constrained learning algorithms, ours possess a unique advantage in their ability to seamlessly integrate with publicly available GBM implementations such as XGBoost (Chen and Guestrin, 2016) and LightGBM (Ke et al., 2017), exclusively relying on their public interfaces. We provide substantial experimental evidence to showcase the effectiveness of the Bregman algorithm framework. While our primary focus is on NPC and fairness ML, our framework holds significant potential for a broader range of constrained learning applications. The source code is currently freely available at https://github.com/zhenweilin/ConstrainedGBM.
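
One way to see how such a method can ride on public GBM interfaces: XGBoost accepts a custom objective, so a Lagrangian/penalty term for the constraint can be folded into the per-example gradients and Hessians. The sketch below is a generic penalty scheme for a Neyman-Pearson-like constraint, not the paper's exact Bregman primal-dual method.

```python
import numpy as np
import xgboost as xgb

lam = 0.0            # multiplier for the constraint penalty
target_fpr = 0.05    # illustrative constraint level

def penalized_logistic(preds, dtrain):
    """Logistic loss with the constrained (negative) class up-weighted by
    a Lagrange-style multiplier, passed through XGBoost's public
    custom-objective interface."""
    y = dtrain.get_label()
    p = 1.0 / (1.0 + np.exp(-preds))
    w = np.where(y == 0, 1.0 + lam, 1.0)
    grad = w * (p - y)
    hess = w * p * (1.0 - p)
    return grad, hess

X = np.random.randn(200, 5)
y = (np.random.rand(200) < 0.3).astype(float)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"max_depth": 3}, dtrain, num_boost_round=20, obj=penalized_logistic)
# Between rounds one would update lam from the measured violation, e.g.
# lam = max(0.0, lam + step * (current_fpr - target_fpr)).
```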

To Whom are You Talking? A Deep Learning Model to Endow Social Robots with Addressee Estimation Skills

  • paper_url: http://arxiv.org/abs/2308.10757
  • repo_url: None
  • paper_authors: Carlo Mazzola, Marta Romeo, Francesco Rea, Alessandra Sciutti, Angelo Cangelosi
  • for: Understanding the dynamics of human-human communication and integrating robots into social environments.
  • methods: A hybrid deep learning model combining convolutional and LSTM layers that interprets the speaker's non-verbal bodily cues to estimate the addressee of an utterance.
  • results: The model solves addressee localisation in space, from a robot ego-centric point of view, in ecologically valid scenarios.
    Abstract Communicating shapes our social world. For a robot to be considered social and being consequently integrated in our social environment it is fundamental to understand some of the dynamics that rule human-human communication. In this work, we tackle the problem of Addressee Estimation, the ability to understand an utterance's addressee, by interpreting and exploiting non-verbal bodily cues from the speaker. We do so by implementing a hybrid deep learning model composed of convolutional layers and LSTM cells taking as input images portraying the face of the speaker and 2D vectors of the speaker's body posture. Our implementation choices were guided by the aim to develop a model that could be deployed on social robots and be efficient in ecological scenarios. We demonstrate that our model is able to solve the Addressee Estimation problem in terms of addressee localisation in space, from a robot ego-centric point of view.
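
A compact PyTorch sketch of the hybrid design described above: a small CNN encodes per-frame face crops, the features are concatenated with 2D body-pose vectors, and an LSTM integrates the sequence. The output space and layer sizes are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AddresseeEstimator(nn.Module):
    """CNN + LSTM sketch: per-frame face features fused with pose vectors,
    integrated over time to localise the addressee (3 illustrative classes,
    e.g. robot / left / right)."""
    def __init__(self, pose_dim=20, hidden=128, num_classes=3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(32 + pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, faces, poses):
        # faces: (B, T, 3, H, W), poses: (B, T, pose_dim)
        B, T = faces.shape[:2]
        feats = self.cnn(faces.flatten(0, 1)).view(B, T, -1)
        out, _ = self.lstm(torch.cat([feats, poses], dim=-1))
        return self.head(out[:, -1])   # classify from the last time step
```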

On the Adversarial Robustness of Multi-Modal Foundation Models

  • paper_url: http://arxiv.org/abs/2308.10741
  • repo_url: None
  • paper_authors: Christian Schlarmann, Matthias Hein
  • for: Protecting users from being misled toward malicious content or fake information.
  • methods: Imperceptible adversarial attacks on images that change the caption output of multi-modal foundation models.
  • results: Shows that malicious content providers can use such attacks to guide honest users to malicious websites or broadcast fake information, implying that deployed multi-modal foundation models need countermeasures.
    Abstract Multi-modal foundation models combining vision and language models such as Flamingo or GPT-4 have recently gained enormous interest. Alignment of foundation models is used to prevent models from providing toxic or harmful output. While malicious users have successfully tried to jailbreak foundation models, an equally important question is if honest users could be harmed by malicious third-party content. In this paper we show that imperceivable attacks on images in order to change the caption output of a multi-modal foundation model can be used by malicious content providers to harm honest users e.g. by guiding them to malicious websites or broadcast fake information. This indicates that countermeasures to adversarial attacks should be used by any deployed multi-modal foundation model.
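
Schematically, such an imperceptible caption attack is a targeted PGD on the input image. Below, `model` is assumed to return the cross-entropy of an attacker-chosen caption given the image, and the budget eps keeps the perturbation near-invisible.

```python
import torch

def pgd_caption_attack(model, image, target_ids, eps=4/255, alpha=1/255, steps=100):
    """Targeted l_inf attack sketch: nudge the image within an eps-ball so
    a captioning model becomes likely to emit the target caption."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = model(image + delta, target_ids)   # loss of the target caption
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()    # descend: make target likely
            delta.clamp_(-eps, eps)               # stay imperceptible
            # keep pixels valid: delta = clamp(image + delta, 0, 1) - image
            delta.add_((image + delta).clamp(0, 1) - image - delta)
        delta.grad.zero_()
    return (image + delta).detach()
```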

We Don’t Need No Adam, All We Need Is EVE: On The Variance of Dual Learning Rate And Beyond

  • paper_url: http://arxiv.org/abs/2308.10740
  • repo_url: https://github.com/akhadangi/EVE
  • paper_authors: Afshin Khadangi
  • for: Optimising deep learning models.
  • methods: Applies different learning rates to distinct components of the gradients, together with a momentum term that adapts to the learning landscape.
  • results: Experiments on benchmark datasets and architectures show that EVE navigates the complex loss surface more efficiently than existing optimisation techniques, improving performance and stability.
    Abstract In the rapidly advancing field of deep learning, optimising deep neural networks is paramount. This paper introduces a novel method, Enhanced Velocity Estimation (EVE), which innovatively applies different learning rates to distinct components of the gradients. By bifurcating the learning rate, EVE enables more nuanced control and faster convergence, addressing the challenges associated with traditional single learning rate approaches. Utilising a momentum term that adapts to the learning landscape, the method achieves a more efficient navigation of the complex loss surface, resulting in enhanced performance and stability. Extensive experiments demonstrate that EVE significantly outperforms existing optimisation techniques across various benchmark datasets and architectures.
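
The abstract leaves the exact split of gradient components unspecified; the sketch below shows one plausible reading of a dual-learning-rate update (components whose sign agrees with an adaptive momentum get a faster rate), purely as an illustration and not the paper's exact rule.

```python
import numpy as np

def eve_step(param, grad, momentum, lr_fast=1e-2, lr_slow=1e-3, beta=0.9):
    """Hedged dual-learning-rate sketch: gradient components consistent
    with the momentum direction take lr_fast, the rest take lr_slow."""
    momentum = beta * momentum + (1 - beta) * grad
    agree = np.sign(grad) == np.sign(momentum)        # consistent directions
    step = np.where(agree, lr_fast * grad, lr_slow * grad)
    return param - step, momentum
```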

UGSL: A Unified Framework for Benchmarking Graph Structure Learning

  • paper_url: http://arxiv.org/abs/2308.10737
  • repo_url: https://github.com/google-research/google-research
  • paper_authors: Bahare Fatemi, Sami Abu-El-Haija, Anton Tsitsulin, Mehran Kazemi, Dustin Zelle, Neslihan Bulut, Jonathan Halcrow, Bryan Perozzi
  • for: This work proposes a unified framework for benchmarking graph structure learning, which broadens the applicability of Graph Neural Networks (GNNs).
  • methods: Reimplements a wide range of existing models within a single unified framework (UGSL).
  • results: Extensive analyses of the effectiveness of the different components, with a clear account of the strengths and weaknesses of the methods.
    Abstract Graph neural networks (GNNs) demonstrate outstanding performance in a broad range of applications. While the majority of GNN applications assume that a graph structure is given, some recent methods substantially expanded the applicability of GNNs by showing that they may be effective even when no graph structure is explicitly provided. The GNN parameters and a graph structure are jointly learned. Previous studies adopt different experimentation setups, making it difficult to compare their merits. In this paper, we propose a benchmarking strategy for graph structure learning using a unified framework. Our framework, called Unified Graph Structure Learning (UGSL), reformulates existing models into a single model. We implement a wide range of existing models in our framework and conduct extensive analyses of the effectiveness of different components in the framework. Our results provide a clear and concise understanding of the different methods in this area as well as their strengths and weaknesses. The benchmark code is available at https://github.com/google-research/google-research/tree/master/ugsl.
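
Most models the framework unifies follow a common pattern: score candidate edges from (projected) node features, sparsify, and aggregate over the learned graph. A minimal sketch with cosine-similarity scoring and top-k sparsification (the scorer, sparsifier, and aggregation are all swappable components):

```python
import torch
import torch.nn.functional as F

def learn_graph_and_aggregate(x, proj, k=10):
    """One GSL step: a learnable projection scores edges by cosine
    similarity, top-k sparsification keeps a graph, and a single
    propagation step aggregates features over it."""
    h = proj(x)
    sim = F.normalize(h, dim=-1) @ F.normalize(h, dim=-1).T
    topk = sim.topk(k, dim=-1)
    adj = torch.zeros_like(sim).scatter_(-1, topk.indices, topk.values)
    adj = F.softmax(adj.masked_fill(adj == 0, float("-inf")), dim=-1)
    return adj @ h

x = torch.randn(100, 32)
proj = torch.nn.Linear(32, 16)
out = learn_graph_and_aggregate(x, proj)
```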

Artificial intelligence-driven antimicrobial peptide discovery

  • paper_url: http://arxiv.org/abs/2308.10921
  • repo_url: None
  • paper_authors: Paulina Szymczak, Ewa Szczurek
  • for: Antimicrobial peptides (AMPs) as promising alternatives to conventional antibiotics in the face of antimicrobial resistance.
  • methods: Artificial intelligence (AI) methods for AMP discovery, including discriminators that predict key peptide properties such as activity and toxicity, and generators that sample novel AMP candidates.
  • results: Reviews recent achievements in AI-driven AMP discovery, including the controlled generation of AMPs with desired properties.
    Abstract Antimicrobial peptides (AMPs) emerge as promising agents against antimicrobial resistance, providing an alternative to conventional antibiotics. Artificial intelligence (AI) revolutionized AMP discovery through both discrimination and generation approaches. The discriminators aid the identification of promising candidates by predicting key peptide properties such as activity and toxicity, while the generators learn the distribution over peptides and enable sampling novel AMP candidates, either de novo, or as analogues of a prototype peptide. Moreover, the controlled generation of AMPs with desired properties is achieved by discriminator-guided filtering, positive-only learning, latent space sampling, as well as conditional and optimized generation. Here we review recent achievements in AI-driven AMP discovery, highlighting the most exciting directions.

What’s Race Got to do with it? Predicting Youth Depression Across Racial Groups Using Machine and Deep Learning

  • paper_url: http://arxiv.org/abs/2308.11591
  • repo_url: None
  • paper_authors: Nathan Zhong, Nikhil Yadav
  • for: This study applies machine learning (ML) and artificial neural network (ANN) models to classify depression among high school students.
  • methods: Trains on nationwide Youth Risk Behavior Surveillance System (YRBSS) survey data, with separate training and testing on racial subsets (White, Black, and other minorities).
  • results: Different factors are relevant for different racial subgroups, and specific variables help predict depression. The ANN model achieves the highest F1 score of 82.90% on the full dataset, while the best ML model, support vector machines, achieves 81.90%.
    Abstract Depression is a common yet serious mental disorder that affects millions of U.S. high schoolers every year. Still, accurate diagnosis and early detection remain significant challenges. In the field of public health, research shows that neural networks produce promising results in identifying other diseases such as cancer and HIV. This study proposes a similar approach, utilizing machine learning (ML) and artificial neural network (ANN) models to classify depression in a student. Additionally, the study highlights the differences in relevant factors for race subgroups and advocates the need for more extensive and diverse datasets. The models train on nationwide Youth Risk Behavior Surveillance System (YRBSS) survey data, in which the most relevant factors of depression are found with statistical analysis. The survey data is a structured dataset with 15000 entries including three race subsets each consisting of 900 entries. For classification, the research problem is modeled as a supervised learning binary classification problem. Factors relevant to depression for different racial subgroups are also identified. The ML and ANN models are trained on the entire dataset followed by different race subsets to classify whether an individual has depression. The ANN model achieves the highest F1 score of 82.90% while the best-performing machine learning model, support vector machines (SVM), achieves a score of 81.90%. This study reveals that different parameters are more valuable for modeling depression across diverse racial groups and furthers research regarding American youth depression.

Test-time augmentation-based active learning and self-training for label-efficient segmentation

  • paper_url: http://arxiv.org/abs/2308.10727
  • repo_url: None
  • paper_authors: Bella Specktor-Fadida, Anna Levchakov, Dana Schonberger, Liat Ben-Sira, Dafna Ben-Bashat, Leo Joskowicz
  • for: This paper proposes a new method that combines self-training (ST) with active learning (AL) using Test-Time Augmentations (TTA) for medical image segmentation tasks, aiming to reduce the annotation burden and improve the performance of the segmentation models.
  • methods: TTA is performed on an initial teacher network, and cases for annotation are selected based on the lowest estimated Dice score, while cases with high estimated scores are used as soft pseudo-labels for ST. The selected annotated cases are trained with existing annotated cases and ST cases with border-slice annotations.
  • results: ST is highly effective for both fetal body and placenta segmentation, boosting performance for in-distribution (ID) and out-of-distribution (OOD) data. However, combining AL with ST did not improve single-sequence fetal body segmentation, and AL was more effective for the high-variability placenta data. The method achieves a Dice score of 0.961 for fetal body segmentation with only 6 original scans and 2 new-sequence scans, and results using 15 high-variability placenta cases were similar to those using 50 cases.
    Abstract Deep learning techniques depend on large datasets whose annotation is time-consuming. To reduce annotation burden, the self-training (ST) and active-learning (AL) methods have been developed as well as methods that combine them in an iterative fashion. However, it remains unclear when each method is the most useful, and when it is advantageous to combine them. In this paper, we propose a new method that combines ST with AL using Test-Time Augmentations (TTA). First, TTA is performed on an initial teacher network. Then, cases for annotation are selected based on the lowest estimated Dice score. Cases with high estimated scores are used as soft pseudo-labels for ST. The selected annotated cases are trained with existing annotated cases and ST cases with border slices annotations. We demonstrate the method on MRI fetal body and placenta segmentation tasks with different data variability characteristics. Our results indicate that ST is highly effective for both tasks, boosting performance for in-distribution (ID) and out-of-distribution (OOD) data. However, while self-training improved the performance of single-sequence fetal body segmentation when combined with AL, it slightly deteriorated performance of multi-sequence placenta segmentation on ID data. AL was helpful for the high variability placenta data, but did not improve upon random selection for the single-sequence body data. For fetal body segmentation sequence transfer, combining AL with ST following ST iteration yielded a Dice of 0.961 with only 6 original scans and 2 new sequence scans. Results using only 15 high-variability placenta cases were similar to those using 50 cases. Code is available at: https://github.com/Bella31/TTA-quality-estimation-ST-AL
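
The label-free quality estimate that drives both the AL selection and the ST pseudo-labels can be sketched as the mutual Dice agreement of predictions under test-time augmentations; `predict` and the augmentations (with their inverse mappings folded in) are assumed given.

```python
import numpy as np

def dice(a, b, eps=1e-6):
    inter = (a * b).sum()
    return (2 * inter + eps) / (a.sum() + b.sum() + eps)

def tta_quality_estimate(predict, volume, augmentations):
    """Run the model on several test-time augmentations and score the
    mutual agreement of the (aligned) binary predictions with Dice.
    Low agreement -> send the case to active learning; high agreement ->
    use the prediction as a soft pseudo-label for self-training."""
    preds = [predict(aug(volume)) for aug in augmentations]
    scores = [dice(preds[i], preds[j])
              for i in range(len(preds)) for j in range(i + 1, len(preds))]
    return float(np.mean(scores))
```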

Clustered Linear Contextual Bandits with Knapsacks

  • paper_url: http://arxiv.org/abs/2308.10722
  • repo_url: None
  • paper_authors: Yichuan Deng, Michalis Mamakos, Zhao Song
  • for: This work studies clustered contextual bandits in which rewards and resource consumption are the outcomes of cluster-specific linear models; the arms are divided into clusters whose memberships are unknown to the algorithm.
  • methods: An algorithm that achieves regret sublinear in the number of time periods without requiring access to all of the arms, combining techniques from the econometrics literature and from bandits with constraints, including an efficient clustering step.
  • results: The regret guarantee holds while performing clustering only once, on a randomly selected subset of the arms.
    Abstract In this work, we study clustered contextual bandits where rewards and resource consumption are the outcomes of cluster-specific linear models. The arms are divided in clusters, with the cluster memberships being unknown to an algorithm. Pulling an arm in a time period results in a reward and in consumption for each one of multiple resources, and with the total consumption of any resource exceeding a constraint implying the termination of the algorithm. Thus, maximizing the total reward requires learning not only models about the reward and the resource consumption, but also cluster memberships. We provide an algorithm that achieves regret sublinear in the number of time periods, without requiring access to all of the arms. In particular, we show that it suffices to perform clustering only once to a randomly selected subset of the arms. To achieve this result, we provide a sophisticated combination of techniques from the literature of econometrics and of bandits with constraints.

CoMIX: A Multi-agent Reinforcement Learning Training Architecture for Efficient Decentralized Coordination and Independent Decision Making

  • paper_url: http://arxiv.org/abs/2308.10721
  • repo_url: None
  • paper_authors: Giovanni Minelli, Mirco Musolesi
  • for: This paper proposes Coordinated QMIX (CoMIX), a training framework for decentralized agents that enables robust coordination.
  • methods: Models selfish and collaborative behavior as incremental steps in each agent's decision process, letting agents make independent decisions while dynamically adapting their behavior to different situations.
  • results: Experiments in a variety of simulation environments show that CoMIX outperforms baselines on collaborative tasks, validating the incremental policy approach as an effective coordination technique.
    Abstract Robust coordination skills enable agents to operate cohesively in shared environments, together towards a common goal and, ideally, individually without hindering each other's progress. To this end, this paper presents Coordinated QMIX (CoMIX), a novel training framework for decentralized agents that enables emergent coordination through flexible policies, allowing at the same time independent decision-making at individual level. CoMIX models selfish and collaborative behavior as incremental steps in each agent's decision process. This allows agents to dynamically adapt their behavior to different situations balancing independence and collaboration. Experiments using a variety of simulation environments demonstrate that CoMIX outperforms baselines on collaborative tasks. The results validate our incremental policy approach as effective technique for improving coordination in multi-agent systems.

Relax and penalize: a new bilevel approach to mixed-binary hyperparameter optimization

  • paper_url: http://arxiv.org/abs/2308.10711
  • repo_url: None
  • paper_authors: Marianna de Santis, Jordan Frecon, Francesco Rinaldi, Saverio Salzo, Martin Schmidt
  • for: Optimising high-dimensional mixed-binary hyperparameters of machine learning models.
  • methods: An equivalent continuous bilevel reformulation of the mixed-binary hyperparameter optimization problem, based on an appropriate penalty term.
  • results: Experiments on estimating the group-sparsity structure in regression problems show the method outperforms state-of-the-art approaches based on relaxation and rounding.
    Abstract In recent years, bilevel approaches have become very popular to efficiently estimate high-dimensional hyperparameters of machine learning models. However, to date, binary parameters are handled by continuous relaxation and rounding strategies, which could lead to inconsistent solutions. In this context, we tackle the challenging optimization of mixed-binary hyperparameters by resorting to an equivalent continuous bilevel reformulation based on an appropriate penalty term. We propose an algorithmic framework that, under suitable assumptions, is guaranteed to provide mixed-binary solutions. Moreover, the generality of the method allows to safely use existing continuous bilevel solvers within the proposed framework. We evaluate the performance of our approach for a specific machine learning problem, i.e., the estimation of the group-sparsity structure in regression problems. Reported results clearly show that our method outperforms state-of-the-art approaches based on relaxation and rounding
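
A standard penalty that lets a continuous reformulation enforce binarity is x(1-x), which vanishes exactly on {0,1} and is positive inside (0,1); the paper's precise penalty term may differ.

```python
import numpy as np

def binary_penalty(x, gamma):
    """Penalty recovering binary structure from a continuous relaxation:
    each term x_i * (1 - x_i) is zero iff x_i is in {0, 1}, so a large
    enough gamma drives the relaxed hyperparameters to mixed-binary
    solutions (one standard choice, used here for illustration)."""
    return gamma * np.sum(x * (1.0 - x))
```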

Measuring the Effect of Causal Disentanglement on the Adversarial Robustness of Neural Network Models

  • paper_url: http://arxiv.org/abs/2308.10708
  • repo_url: https://github.com/prebenness/causal_disentanglement_robustness
  • paper_authors: Preben M. Ness, Dusica Marijan, Sunanda Bose
  • for: This paper measures how well causal neural network models disentangle causal from confounder signals, given their high robustness to adversarial attacks and improved generalisation on tasks such as few-shot learning and rare-context classification.
  • methods: Re-implements four state-of-the-art causal neural network models with a common ResNet18 architecture and applies content/style disentanglement metrics from computer vision to measure different aspects of causal disentanglement.
  • results: A strong association (r=0.820, p=0.001) between the degree to which models decorrelate causal and confounder signals and their adversarial robustness, and a moderate negative association (r=-0.597, p=0.040) between the pixel-level information content of the confounder signal and adversarial robustness.
    Abstract Causal Neural Network models have shown high levels of robustness to adversarial attacks as well as an increased capacity for generalisation tasks such as few-shot learning and rare-context classification compared to traditional Neural Networks. This robustness is argued to stem from the disentanglement of causal and confounder input signals. However, no quantitative study has yet measured the level of disentanglement achieved by these types of causal models or assessed how this relates to their adversarial robustness. Existing causal disentanglement metrics are not applicable to deterministic models trained on real-world datasets. We, therefore, utilise metrics of content/style disentanglement from the field of Computer Vision to measure different aspects of the causal disentanglement for four state-of-the-art causal Neural Network models. By re-implementing these models with a common ResNet18 architecture we are able to fairly measure their adversarial robustness on three standard image classification benchmarking datasets under seven common white-box attacks. We find a strong association (r=0.820, p=0.001) between the degree to which models decorrelate causal and confounder signals and their adversarial robustness. Additionally, we find a moderate negative association between the pixel-level information content of the confounder signal and adversarial robustness (r=-0.597, p=0.040).

Sampling From Autoencoders’ Latent Space via Quantization And Probability Mass Function Concepts

  • paper_url: http://arxiv.org/abs/2308.10704
  • repo_url: None
  • paper_authors: Aymene Mohammed Bouayed, Adrian Iaccovelli, David Naccache
  • for: Sampling from the latent space of autoencoder-based generative models so that the reconstructed samples are lifelike images.
  • methods: A post-training sampling algorithm based on probability mass functions combined with a quantization process. The algorithm defines a neighborhood around each latent vector from the input data and draws samples from these neighborhoods, ensuring that sampled latent vectors predominantly inhabit high-probability regions that transform into authentic images.
  • results: Outperforms sampling based on Gaussian mixture models (GMM) across models and datasets, e.g., an FID improvement of up to 0.89 on MNIST and improvements of 1.69 and 0.87 on CelebA face and MOBIUS ocular images respectively, while reducing time complexity from O(n×d×k×i) to O(n×d). The method also estimates latent space distributions better than GMM sampling, as measured by the Wasserstein distance.
    Abstract In this study, we focus on sampling from the latent space of generative models built upon autoencoders so as the reconstructed samples are lifelike images. To do to, we introduce a novel post-training sampling algorithm rooted in the concept of probability mass functions, coupled with a quantization process. Our proposed algorithm establishes a vicinity around each latent vector from the input data and then proceeds to draw samples from these defined neighborhoods. This strategic approach ensures that the sampled latent vectors predominantly inhabit high-probability regions, which, in turn, can be effectively transformed into authentic real-world images. A noteworthy point of comparison for our sampling algorithm is the sampling technique based on Gaussian mixture models (GMM), owing to its inherent capability to represent clusters. Remarkably, we manage to improve the time complexity from the previous $\mathcal{O}(n\times d \times k \times i)$ associated with GMM sampling to a much more streamlined $\mathcal{O}(n\times d)$, thereby resulting in substantial speedup during runtime. Moreover, our experimental results, gauged through the Fr\'echet inception distance (FID) for image generation, underscore the superior performance of our sampling algorithm across a diverse range of models and datasets. On the MNIST benchmark dataset, our approach outperforms GMM sampling by yielding a noteworthy improvement of up to $0.89$ in FID value. Furthermore, when it comes to generating images of faces and ocular images, our approach showcases substantial enhancements with FID improvements of $1.69$ and $0.87$ respectively, as compared to GMM sampling, as evidenced on the CelebA and MOBIUS datasets. Lastly, we substantiate our methodology's efficacy in estimating latent space distributions in contrast to GMM sampling, particularly through the lens of the Wasserstein distance.
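
The sampling procedure can be sketched as follows: quantize the training latents, build an empirical probability mass function over the occupied bins, sample a bin in proportion to its mass, then draw from a small neighbourhood of a stored latent in that bin. The bin count and neighbourhood radius are illustrative.

```python
import numpy as np

def pmf_latent_sampler(latents, num_samples, bins=32, radius=0.05, rng=None):
    """Sample latent vectors near high-probability regions: a PMF over
    quantized bins selects where to sample, and a uniform perturbation
    defines the neighbourhood around a stored latent."""
    rng = rng or np.random.default_rng()
    lo, hi = latents.min(0), latents.max(0)
    codes = np.floor((latents - lo) / (hi - lo + 1e-9) * bins).astype(int)
    bin_to_idx = {}
    for i, c in enumerate(map(tuple, codes)):
        bin_to_idx.setdefault(c, []).append(i)
    keys = list(bin_to_idx)
    pmf = np.array([len(bin_to_idx[k]) for k in keys], dtype=float)
    pmf /= pmf.sum()
    samples = []
    for _ in range(num_samples):
        k = keys[rng.choice(len(keys), p=pmf)]        # bin ~ empirical PMF
        z = latents[rng.choice(bin_to_idx[k])]        # a stored latent in it
        samples.append(z + rng.uniform(-radius, radius, size=z.shape))
    return np.stack(samples)                          # decode with the decoder
```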

Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models

  • paper_url: http://arxiv.org/abs/2308.11578
  • repo_url: None
  • paper_authors: Zixing Zhang, Liyizhe Peng, Tao Pang, Jing Han, Huan Zhao, Bjorn W. Schuller
  • for: This paper investigates how large language models (LLMs) perform in emotion recognition, offering insights and posing potential challenges for enhancing emotion recognition in the new era of advanced and generalised large models.
  • methods: Evaluates LLMs across diverse aspects, including in-context learning, few-shot learning, accuracy, generalisation, and explanation.
  • results: Reports how LLMs fare on each of these aspects, notes that deep models have so far achieved the best results on different benchmarks, and highlights the emerged capabilities of LLMs as a promising direction for the field.
    Abstract After the inception of emotion recognition or affective computing, it has increasingly become an active research topic due to its broad applications. Over the past couple of decades, emotion recognition models have gradually migrated from statistically shallow models to neural network-based deep models, which can significantly boost the performance of emotion recognition models and consistently achieve the best results on different benchmarks. Therefore, in recent years, deep models have always been considered the first option for emotion recognition. However, the debut of large language models (LLMs), such as ChatGPT, has remarkably astonished the world due to their emergent capabilities of zero/few-shot learning, in-context learning, chain-of-thought, and others that are never shown in previous deep models. In the present paper, we comprehensively investigate how the LLMs perform in emotion recognition in terms of diverse aspects, including in-context learning, few-shot learning, accuracy, generalisation, and explanation. Moreover, we offer some insights and pose other potential challenges, hoping to ignite broader discussions about enhancing emotion recognition in the new era of advanced and generalised large models.

An engine to simulate insurance fraud network data

  • paper_url: http://arxiv.org/abs/2308.11659
  • repo_url: None
  • paper_authors: Bavo D. C. Campo, Katrien Antonio
  • for: This work supports the development of efficient and accurate methods for flagging fraudulent insurance claims.
  • methods: Learning methods are fed with features engineered from the social network of the parties involved in a claim.
  • results: The study delivers a simulation engine that generates synthetic data with a network structure and lets the user control the data-generating mechanisms, so that different methods and models can be tested in a range of settings.
    Abstract Traditionally, the detection of fraudulent insurance claims relies on business rules and expert judgement which makes it a time-consuming and expensive process (\'Oskarsd\'ottir et al., 2022). Consequently, researchers have been examining ways to develop efficient and accurate analytic strategies to flag suspicious claims. Feeding learning methods with features engineered from the social network of parties involved in a claim is a particularly promising strategy (see for example Van Vlasselaer et al. (2016); Tumminello et al. (2023)). When developing a fraud detection model, however, we are confronted with several challenges. The uncommon nature of fraud, for example, creates a high class imbalance which complicates the development of well performing analytic classification models. In addition, only a small number of claims are investigated and get a label, which results in a large corpus of unlabeled data. Yet another challenge is the lack of publicly available data. This hinders not only the development of new methods, but also the validation of existing techniques. We therefore design a simulation machine that is engineered to create synthetic data with a network structure and available covariates similar to the real life insurance fraud data set analyzed in \'Oskarsd\'ottir et al. (2022). Further, the user has control over several data-generating mechanisms. We can specify the total number of policyholders and parties, the desired level of imbalance and the (effect size of the) features in the fraud generating model. As such, the simulation engine enables researchers and practitioners to examine several methodological challenges as well as to test their (development strategy of) insurance fraud detection models in a range of different settings. Moreover, large synthetic data sets can be generated to evaluate the predictive performance of (advanced) machine learning techniques.

Cost-Efficient Online Decision Making: A Combinatorial Multi-Armed Bandit Approach

  • paper_url: http://arxiv.org/abs/2308.10699
  • repo_url: None
  • paper_authors: Arman Rahbar, Niklas Åkerblom, Morteza Haghir Chehreghani
  • for: This paper addresses online decision making problems arising in many real-world applications, where a decision is made by performing a sequence of tests on incoming data points.
  • methods: A novel formulation based on combinatorial multi-armed bandits that takes the cost of performing tests into account.
  • results: A new cost-efficient online decision-making framework that can use posterior sampling or BayesUCB for exploration, supported by a rigorous theoretical analysis and experimental results demonstrating its applicability to real-world problems.
    Abstract Online decision making plays a crucial role in numerous real-world applications. In many scenarios, the decision is made based on performing a sequence of tests on the incoming data points. However, performing all tests can be expensive and is not always possible. In this paper, we provide a novel formulation of the online decision making problem based on combinatorial multi-armed bandits and take the cost of performing tests into account. Based on this formulation, we provide a new framework for cost-efficient online decision making which can utilize posterior sampling or BayesUCB for exploration. We provide a rigorous theoretical analysis for our framework and present various experimental results that demonstrate its applicability to real-world problems.
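The paper's framework is not spelled out in the abstract; as a toy illustration of cost-aware posterior sampling over tests (an assumed variant, not the paper's formulation), one can keep a Beta posterior per test and penalize the sampled value by the test cost, with `lam` as an assumed trade-off parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tests = 5
costs = np.array([0.1, 0.2, 0.05, 0.3, 0.15])   # assumed test costs
alpha, beta = np.ones(n_tests), np.ones(n_tests)  # Beta(1, 1) priors
lam = 0.5  # assumed cost/benefit trade-off

for t in range(1000):
    theta = rng.beta(alpha, beta)              # posterior sample per test
    arm = int(np.argmax(theta - lam * costs))  # cost-penalized index
    reward = rng.random() < [0.4, 0.7, 0.3, 0.9, 0.5][arm]  # toy environment
    alpha[arm] += reward                       # Bernoulli-Beta update
    beta[arm] += 1 - reward
```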

Beyond expectations: Residual Dynamic Mode Decomposition and Variance for Stochastic Dynamical Systems

  • paper_url: http://arxiv.org/abs/2308.10697
  • repo_url: None
  • paper_authors: Matthew J. Colbrook, Qin Li, Ryan V. Raut, Alex Townsend
  • for: This paper studies the linearization of nonlinear dynamical systems via Koopman operators, and in particular the spectral information of stochastic Koopman operators.
  • methods: It builds on Dynamic Mode Decomposition (DMD), a projection-based method for approximating the spectral properties of the Koopman operator.
  • results: A Koopman framework that incorporates variance alongside expectations to control the projection error, the notion of variance-pseudospectra for gauging statistical coherency, and a suite of convergence results for the spectral quantities of stochastic Koopman operators.
    Abstract Koopman operators linearize nonlinear dynamical systems, making their spectral information of crucial interest. Numerous algorithms have been developed to approximate these spectral properties, and Dynamic Mode Decomposition (DMD) stands out as the poster child of projection-based methods. Although the Koopman operator itself is linear, the fact that it acts in an infinite-dimensional space of observables poses various challenges. These include spurious modes, essential spectra, and the verification of Koopman mode decompositions. While recent work has addressed these challenges for deterministic systems, there remains a notable gap in verified DMD methods tailored for stochastic systems, where the Koopman operator measures the expectation of observables. We show that it is necessary to go beyond expectations to address these issues. By incorporating variance into the Koopman framework, we address these challenges. Through an additional DMD-type matrix, we approximate the sum of a squared residual and a variance term, each of which can be approximated individually using batched snapshot data. This allows verified computation of the spectral properties of stochastic Koopman operators, controlling the projection error. We also introduce the concept of variance-pseudospectra to gauge statistical coherency. Finally, we present a suite of convergence results for the spectral quantities of stochastic Koopman operators. Our study concludes with practical applications using both simulated and experimental data. In neural recordings from awake mice, we demonstrate how variance-pseudospectra can reveal physiologically significant information unavailable to standard expectation-based dynamical models.
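For orientation, here is a minimal sketch of standard exact DMD, the projection-based baseline the paper builds on, not its variance-augmented stochastic variant; `X` and `Y` are assumed snapshot matrices with `Y[:, j]` the time-evolved successor of `X[:, j]`.

```python
import numpy as np

def exact_dmd(X, Y, r):
    """Standard (exact) DMD: approximate Koopman eigenvalues and modes.

    X, Y: (state_dim, n_snapshots) snapshot pairs, rank truncated to r.
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, V = U[:, :r], s[:r], Vh[:r].conj().T
    A_tilde = U.conj().T @ Y @ V / s       # projected Koopman matrix
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y @ V / s @ W / eigvals        # exact DMD modes
    return eigvals, modes
```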

An Improved Best-of-both-worlds Algorithm for Bandits with Delayed Feedback

  • paper_url: http://arxiv.org/abs/2308.10675
  • repo_url: None
  • paper_authors: Saeed Masoudian, Julian Zimmert, Yevgeny Seldin
  • for: This paper proposes a new best-of-both-worlds algorithm for bandit problems with variably delayed feedback.
  • methods: The algorithm relies on counts of outstanding observations (a quantity observed at action time) rather than on delays, eliminating the need for prior knowledge of the maximal delay $d_{\max}$, and provides tighter regret bounds in both regimes.
  • results: A novel control of distribution drift, based on biased loss estimators and on skipping observations with excessively large delays, and a demonstration that the problem's complexity is characterized by the cumulative count of outstanding observations after such skipping, rather than by the delays or the maximal delay.
    Abstract We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. The algorithm improves on prior work by Masoudian et al. [2022] by eliminating the need for prior knowledge of the maximal delay $d_{\mathrm{max}}$ and providing tighter regret bounds in both regimes. The algorithm and its regret bounds are based on counts of outstanding observations (a quantity that is observed at action time) rather than delays or the maximal delay (quantities that are only observed when feedback arrives). One major contribution is a novel control of distribution drift, which is based on biased loss estimators and skipping of observations with excessively large delays. Another major contribution is demonstrating that the complexity of best-of-both-worlds bandits with delayed feedback is characterized by the cumulative count of outstanding observations after skipping of observations with excessively large delays, rather than the delays or the maximal delay.

A Safe Deep Reinforcement Learning Approach for Energy Efficient Federated Learning in Wireless Communication Networks

  • paper_url: http://arxiv.org/abs/2308.10664
  • repo_url: None
  • paper_authors: Nikolaos Koursioumpas, Lina Magoula, Nikolaos Petropouleas, Alexandros-Ioannis Thanopoulos, Theodora Panagea, Nancy Alonistioti, M. A. Gutierrez-Estevez, Ramin Khalili
  • for: Reducing the environmental impact of Artificial Intelligence (AI)-enabled wireless networks by improving the energy efficiency of Federated Learning (FL), a key privacy-preserving decentralized AI technique.
  • methods: The computational and communication resources of the devices involved in the FL process are orchestrated to minimize the total energy consumption while guaranteeing a certain model performance; a Soft Actor Critic Deep Reinforcement Learning (DRL) solution with a penalty function for strategies that violate environment constraints ensures a safe RL process.
  • results: Compared with four state-of-the-art baseline solutions, the proposed scheme reduces total energy consumption by up to 94% in both static and dynamic environments.
    Abstract Progressing towards a new era of Artificial Intelligence (AI) - enabled wireless networks, concerns regarding the environmental impact of AI have been raised both in industry and academia. Federated Learning (FL) has emerged as a key privacy preserving decentralized AI technique. Despite efforts currently being made in FL, its environmental impact is still an open problem. Targeting the minimization of the overall energy consumption of an FL process, we propose the orchestration of computational and communication resources of the involved devices to minimize the total energy required, while guaranteeing a certain performance of the model. To this end, we propose a Soft Actor Critic Deep Reinforcement Learning (DRL) solution, where a penalty function is introduced during training, penalizing the strategies that violate the constraints of the environment, and ensuring a safe RL process. A device level synchronization method, along with a computationally cost effective FL environment are proposed, with the goal of further reducing the energy consumption and communication overhead. Evaluation results show the effectiveness of the proposed scheme compared to four state-of-the-art baseline solutions in both static and dynamic environments, achieving a decrease of up to 94% in the total energy consumption.

Practical Parallel Algorithms for Non-Monotone Submodular Maximization

  • paper_url: http://arxiv.org/abs/2308.10656
  • repo_url: None
  • paper_authors: Shuang Cui, Kai Han, Jing Tang, He Huang, Xueying Li, Aakas Zhiyuli, Hanxiao Li
  • for: Submodular maximization, which has extensive applications across artificial intelligence domains such as machine learning, computer vision, and natural language processing, and calls for efficient, parallelizable algorithms as datasets grow.
  • methods: The paper proposes two algorithms for submodular maximization: one for non-monotone submodular maximization subject to a knapsack constraint, and the other for non-monotone submodular maximization subject to a $k$-system constraint. Both algorithms have provable approximation ratios and sublinear adaptive complexities.
  • results: The paper achieves an $(8+\epsilon)$-approximation under $\mathcal{O}(\log n)$ adaptive complexity for non-monotone submodular maximization subject to a knapsack constraint, which is optimal up to a factor of $\mathcal{O}(\log\log n)$. Additionally, the paper proposes the first algorithm with both provable approximation ratio and sublinear adaptive complexity for non-monotone submodular maximization subject to a $k$-system constraint. The two algorithms are also applied to the special case of submodular maximization subject to a cardinality constraint, achieving performance bounds comparable with those of state-of-the-art algorithms.
    Abstract Submodular maximization has found extensive applications in various domains within the field of artificial intelligence, including but not limited to machine learning, computer vision, and natural language processing. With the increasing size of datasets in these domains, there is a pressing need to develop efficient and parallelizable algorithms for submodular maximization. One measure of the parallelizability of a submodular maximization algorithm is its adaptive complexity, which indicates the number of sequential rounds where a polynomial number of queries to the objective function can be executed in parallel. In this paper, we study the problem of non-monotone submodular maximization subject to a knapsack constraint, and propose the first combinatorial algorithm achieving an $(8+\epsilon)$-approximation under $\mathcal{O}(\log n)$ adaptive complexity, which is \textit{optimal} up to a factor of $\mathcal{O}(\log\log n)$. Moreover, we also propose the first algorithm with both provable approximation ratio and sublinear adaptive complexity for the problem of non-monotone submodular maximization subject to a $k$-system constraint. As a by-product, we show that our two algorithms can also be applied to the special case of submodular maximization subject to a cardinality constraint, and achieve performance bounds comparable with those of state-of-the-art algorithms. Finally, the effectiveness of our approach is demonstrated by extensive experiments on real-world applications.
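As a point of reference for the cardinality-constrained special case, the classical sequential greedy baseline is sketched below; its adaptive complexity is $\mathcal{O}(k)$ (one round per selected element), which is exactly the bottleneck that low-adaptivity algorithms like the paper's avoid. This is the textbook baseline (with guarantees for monotone objectives), not the paper's algorithm.

```python
def greedy(f, ground_set, k):
    """Classical greedy for cardinality-constrained submodular maximization.

    Picks, in each of k sequential rounds, the element with the largest
    marginal gain f(S + {e}) - f(S).  Each round issues O(n) queries that
    could run in parallel, so the adaptive complexity is O(k).
    """
    S = set()
    for _ in range(k):
        base = f(S)
        gains = {e: f(S | {e}) - base for e in ground_set - S}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:          # no positive marginal gain left
            break
        S.add(best)
    return S

# Toy usage: a coverage function over sets of items.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}, 4: {"a"}}
cover = lambda S: len(set().union(*[sets[i] for i in S]))
print(greedy(cover, set(sets), k=2))  # e.g. {1, 3}
```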

Deep Evidential Learning for Bayesian Quantile Regression

  • paper_url: http://arxiv.org/abs/2308.10650
  • repo_url: None
  • paper_authors: Frederik Boe Hüttel, Filipe Rodrigues, Francisco Câmara Pereira
  • for: The paper proposes a deep Bayesian quantile regression model that estimates the quantiles of a continuous target distribution without the Gaussian assumption.
  • methods: The method is based on evidential learning, which captures both aleatoric and epistemic uncertainty with a single deterministic forward pass, making it efficient and scalable to large models and datasets.
  • results: The method achieves calibrated uncertainties on non-Gaussian distributions, disentangles aleatoric and epistemic uncertainty, and is robust to out-of-distribution samples.
    Abstract It is desirable to have accurate uncertainty estimation from a single deterministic forward-pass model, as traditional methods for uncertainty quantification are computationally expensive. However, this is difficult because single forward-pass models do not sample weights during inference and often make assumptions about the target distribution, such as assuming it is Gaussian. This can be restrictive in regression tasks, where the mean and standard deviation are inadequate to model the target distribution accurately. This paper proposes a deep Bayesian quantile regression model that can estimate the quantiles of a continuous target distribution without the Gaussian assumption. The proposed method is based on evidential learning, which allows the model to capture aleatoric and epistemic uncertainty with a single deterministic forward-pass model. This makes the method efficient and scalable to large models and datasets. We demonstrate that the proposed method achieves calibrated uncertainties on non-Gaussian distributions, disentanglement of aleatoric and epistemic uncertainty, and robustness to out-of-distribution samples.
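Quantile regression rests on the pinball (quantile) loss, sketched below; the paper's evidential formulation places a distribution over such quantile estimates rather than fitting each quantile directly.

```python
import numpy as np

def pinball_loss(y, y_hat, q):
    """Pinball loss for quantile level q in (0, 1).

    Under-predictions are weighted by q, over-predictions by (1 - q);
    minimizing it makes y_hat an estimate of the q-th quantile of y.
    """
    diff = y - y_hat
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y = np.random.standard_normal(10_000)
print(pinball_loss(y, np.quantile(y, 0.9), q=0.9))  # near-minimal value
```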

Reinforcement Learning Based Sensor Optimization for Bio-markers

  • paper_url: http://arxiv.org/abs/2308.10649
  • repo_url: None
  • paper_authors: Sajal Khandelwal, Pawan Kumar, Syed Azeemuddin
  • for: This work aims to enhance the sensitivity of inter-digitated capacitor (IDC) based RF biosensors by optimizing design parameters such as electrode design and finger width.
  • methods: A novel reinforcement learning based Binary Particle Swarm Optimization (RLBPSO) method is used to optimize the sensor design, and is compared with Ant Colony Optimization (ACO) and other state-of-the-art methods.
  • results: RLBPSO yields notable improvements in sensor sensitivity, producing the best optimized designs across various frequency ranges compared with current state-of-the-art methods.
    Abstract Radio frequency (RF) biosensors, in particular those based on inter-digitated capacitors (IDCs), are pivotal in areas like biomedical diagnosis, remote sensing, and wireless communication. Despite their advantages of low cost and easy fabrication, their sensitivity can be hindered by design imperfections, environmental factors, and circuit noise. This paper investigates enhancing the sensitivity of IDC-based RF sensors using novel reinforcement learning based Binary Particle Swarm Optimization (RLBPSO), and it is compared to Ant Colony Optimization (ACO), and other state-of-the-art methods. By focusing on optimizing design parameters like electrode design and finger width, the proposed study found notable improvements in sensor sensitivity. The proposed RLBPSO method shows best optimized design for various frequency ranges when compared to current state-of-the-art methods.

Faster Training of Neural ODEs Using Gauß-Legendre Quadrature

  • paper_url: http://arxiv.org/abs/2308.10644
  • repo_url: https://github.com/a-norcliffe/torch_gq_adjoint
  • paper_authors: Alexander Norcliffe, Marc Peter Deisenroth
  • for: Speeding up the training of neural ODEs, improving their usefulness for generative and time-series modelling.
  • methods: Gauß-Legendre quadrature is used to solve the integrals in the adjoint method faster than ODE-based approaches while remaining memory efficient; the idea is extended to training SDEs via the Wong-Zakai theorem, by training a corresponding ODE and transferring the parameters.
  • results: Faster training of neural ODEs, especially for large models, together with a new way to train SDE-based models.
    Abstract Neural ODEs demonstrate strong performance in generative and time-series modelling. However, training them via the adjoint method is slow compared to discrete models due to the requirement of numerically solving ODEs. To speed neural ODEs up, a common approach is to regularise the solutions. However, this approach may affect the expressivity of the model; when the trajectory itself matters, this is particularly important. In this paper, we propose an alternative way to speed up the training of neural ODEs. The key idea is to speed up the adjoint method by using Gau{\ss}-Legendre quadrature to solve integrals faster than ODE-based methods while remaining memory efficient. We also extend the idea to training SDEs using the Wong-Zakai theorem, by training a corresponding ODE and transferring the parameters. Our approach leads to faster training of neural ODEs, especially for large models. It also presents a new way to train SDE-based models.
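The core numerical ingredient, Gauß-Legendre quadrature, is easy to illustrate; in the sketch below, a generic integrand `g` stands in for the adjoint integrals that the paper accelerates.

```python
import numpy as np

def gauss_legendre(g, a, b, n=16):
    """Integrate g over [a, b] with n-point Gauss-Legendre quadrature.

    Nodes and weights are defined on [-1, 1] and mapped affinely to
    [a, b]; the rule is exact for polynomials of degree up to 2n - 1,
    which is why few evaluations suffice compared with stepping an
    ODE solver through the whole interval.
    """
    x, w = np.polynomial.legendre.leggauss(n)
    t = 0.5 * (b - a) * x + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(w * g(t))

print(gauss_legendre(np.sin, 0.0, np.pi))  # ~2.0
```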

SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

  • paper_url: http://arxiv.org/abs/2308.10638
  • repo_url: None
  • paper_authors: Soubhik Sanyal, Partha Ghosh, Jinlong Yang, Michael J. Black, Justus Thies, Timo Bolkart
  • for: This paper studies the generation of clothed and textured 3D meshes of humans.
  • methods: A deep neural network learns the geometry and appearance distribution of clothed human bodies. It is trained on medium-sized 3D scan data (e.g., CAPE) together with large-scale 2D image data of clothed humans via an unpaired learning procedure for pose-dependent clothed and textured meshes, with conditioning labels (clothing type and color) generated automatically using BLIP and CLIP.
  • results: The method is validated on the SCULPT dataset and compared with state-of-the-art 3D generative models for clothed human bodies.
    Abstract We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-sized 3D scan datasets like CAPE, as well as large-scale 2D image datasets of clothed humans and multiple appearances can be mapped to a single geometry. To effectively learn from the two data modalities, we propose an unpaired learning procedure for pose-dependent clothed and textured human meshes. Specifically, we learn a pose-dependent geometry space from 3D scan data. We represent this as per vertex displacements w.r.t. the SMPL model. Next, we train a geometry conditioned texture generator in an unsupervised way using the 2D image data. We use intermediate activations of the learned geometry model to condition our texture generator. To alleviate entanglement between pose and clothing type, and pose and clothing appearance, we condition both the texture and geometry generators with attribute labels such as clothing types for the geometry, and clothing colors for the texture generator. We automatically generated these conditioning labels for the 2D images based on the visual question answering model BLIP and CLIP. We validate our method on the SCULPT dataset, and compare to state-of-the-art 3D generative models for clothed human bodies. We will release the codebase for research purposes.

Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models

  • paper_url: http://arxiv.org/abs/2308.10632
  • repo_url: None
  • paper_authors: Peiyan Zhang, Haoyang Liu, Chaozhuo Li, Xing Xie, Sunghun Kim, Haohan Wang
  • for: This paper proposes a new robustness measurement that evaluates image classification models against a surrogate oracle (i.e., a foundation model), aiming to better reflect performance in the real world.
  • methods: It designs an evaluation that goes beyond fixed benchmarks: image datasets are extended with new samples that are sufficiently perturbed to be distinct from the originals yet remain within the same image-label structure, constrained by a foundation model pretrained on a large amount of samples.
  • results: The new method evaluates robustness free of the limitations of fixed benchmarks or constrained perturbations, although scoped by the power of the oracle; the generated data is also used to analyze model behavior and the new evaluation strategies.
    Abstract Machine learning has demonstrated remarkable performance over finite datasets, yet whether the scores over the fixed benchmarks can sufficiently indicate the model's performance in the real world is still in discussion. In reality, an ideal robust model will probably behave similarly to the oracle (e.g., the human users), thus a good evaluation protocol is probably to evaluate the models' behaviors in comparison to the oracle. In this paper, we introduce a new robustness measurement that directly measures the image classification model's performance compared with a surrogate oracle (i.e., a foundation model). Besides, we design a simple method that can accomplish the evaluation beyond the scope of the benchmarks. Our method extends the image datasets with new samples that are sufficiently perturbed to be distinct from the ones in the original sets, but are still bounded within the same image-label structure the original test image represents, constrained by a foundation model pretrained with a large amount of samples. As a result, our new method will offer us a new way to evaluate the models' robustness performance, free of limitations of fixed benchmarks or constrained perturbations, although scoped by the power of the oracle. In addition to the evaluation results, we also leverage our generated data to understand the behaviors of the model and our new evaluation strategies.

A Homogenization Approach for Gradient-Dominated Stochastic Optimization

  • paper_url: http://arxiv.org/abs/2308.10630
  • repo_url: None
  • paper_authors: Jiyuan Tan, Chenyu Xue, Chuwen Zhang, Qi Deng, Dongdong Ge, Yinyu Ye
  • for: Gradient-dominated stochastic optimization with dominance exponent $\alpha \in [1, 2]$
  • methods: Stochastic homogeneous second-order descent method (SHSODM) with homogenization approach
  • results: Achieves a sample complexity of $O(\epsilon^{-7/(2 \alpha) +1})$ for $\alpha \in [1, 3/2)$ and $\tilde{O}(\epsilon^{-2/\alpha})$ for $\alpha \in [3/2, 2]$, with an improved sample complexity of $O( \epsilon ^{-( 7-3\alpha ) /( 2\alpha )})$ for $\alpha \in [1,3/2)$.
    Abstract Gradient dominance property is a condition weaker than strong convexity, yet it sufficiently ensures global convergence for first-order methods even in non-convex optimization. This property finds application in various machine learning domains, including matrix decomposition, linear neural networks, and policy-based reinforcement learning (RL). In this paper, we study the stochastic homogeneous second-order descent method (SHSODM) for gradient-dominated optimization with $\alpha \in [1, 2]$ based on a recently proposed homogenization approach. Theoretically, we show that SHSODM achieves a sample complexity of $O(\epsilon^{-7/(2 \alpha) +1})$ for $\alpha \in [1, 3/2)$ and $\tilde{O}(\epsilon^{-2/\alpha})$ for $\alpha \in [3/2, 2]$. We further provide a SHSODM with a variance reduction technique enjoying an improved sample complexity of $O( \epsilon ^{-( 7-3\alpha ) /( 2\alpha )})$ for $\alpha \in [1,3/2)$. Our results match the state-of-the-art sample complexity bounds for stochastic gradient-dominated optimization without \emph{cubic regularization}. Since the homogenization approach only relies on solving extremal eigenvector problems instead of Newton-type systems, our methods gain the advantage of cheaper iterations and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the efficiency of SHSODM compared to other off-the-shelf methods.
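For concreteness, a standard form of the gradient dominance condition referenced above is given below in LaTeX ($\alpha = 2$ recovers the Polyak-Łojasiewicz condition); the paper's precise constants and scope may differ.

```latex
% f is gradient dominated of degree \alpha \in [1, 2]
% with constant \tau > 0 if, for all x,
f(x) - \min_{x'} f(x') \;\le\; \tau \,\lVert \nabla f(x) \rVert^{\alpha}.
```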

GaitPT: Skeletons Are All You Need For Gait Recognition

  • paper_url: http://arxiv.org/abs/2308.10623
  • repo_url: None
  • paper_authors: Andy Catruna, Adrian Cosma, Emilian Radoi
  • for: automatic person identification at a distance
  • methods: pose estimation skeletons, hierarchical transformer architecture
  • results: state-of-the-art performance, surpassing other works by a margin of 6%, outperforming both skeleton-based and appearance-based approaches
    Abstract The analysis of patterns of walking is an important area of research that has numerous applications in security, healthcare, sports and human-computer interaction. Lately, walking patterns have been regarded as a unique fingerprinting method for automatic person identification at a distance. In this work, we propose a novel gait recognition architecture called Gait Pyramid Transformer (GaitPT) that leverages pose estimation skeletons to capture unique walking patterns, without relying on appearance information. GaitPT adopts a hierarchical transformer architecture that effectively extracts both spatial and temporal features of movement in an anatomically consistent manner, guided by the structure of the human skeleton. Our results show that GaitPT achieves state-of-the-art performance compared to other skeleton-based gait recognition works, in both controlled and in-the-wild scenarios. GaitPT obtains 82.6% average accuracy on CASIA-B, surpassing other works by a margin of 6%. Moreover, it obtains 52.16% Rank-1 accuracy on GREW, outperforming both skeleton-based and appearance-based approaches.

Weighting by Tying: A New Approach to Weighted Rank Correlation

  • paper_url: http://arxiv.org/abs/2308.10622
  • repo_url: None
  • paper_authors: Sascha Henzgen, Eyke Hüllermeier
  • for: This paper proposes a weighted rank correlation measure, built on fuzzy order relations, for capturing the degree of concordance between two rankings when some rank positions (typically those at the top) matter more than others.
  • methods: The measure, called scaled gamma, is related to Goodman and Kruskal's gamma rank correlation; it is parametrized by a fuzzy equivalence relation on the rank positions, which is specified conveniently by a so-called scaling function.
  • results: Scaled gamma combines soundness with flexibility: it has a sound formal foundation and allows rank positions to be weighted in a flexible way.
    Abstract Measures of rank correlation are commonly used in statistics to capture the degree of concordance between two orderings of the same set of items. Standard measures like Kendall's tau and Spearman's rho coefficient put equal emphasis on each position of a ranking. Yet, motivated by applications in which some of the positions (typically those on the top) are more important than others, a few weighted variants of these measures have been proposed. Most of these generalizations fail to meet desirable formal properties, however. Besides, they are often quite inflexible in the sense of committing to a fixed weighing scheme. In this paper, we propose a weighted rank correlation measure on the basis of fuzzy order relations. Our measure, called scaled gamma, is related to Goodman and Kruskal's gamma rank correlation. It is parametrized by a fuzzy equivalence relation on the rank positions, which in turn is specified conveniently by a so-called scaling function. This approach combines soundness with flexibility: it has a sound formal foundation and allows for weighing rank positions in a flexible way.
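The paper's scaled gamma is not available in standard libraries; as a readily available point of comparison among weighted rank correlations, SciPy ships a weighted Kendall $\tau$ whose weigher function likewise emphasizes top ranks.

```python
import numpy as np
from scipy.stats import kendalltau, weightedtau

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2, 1, 3, 4, 5, 6, 8, 7])  # swaps at the top and the bottom

# Unweighted Kendall tau treats both swaps identically ...
print(kendalltau(x, y).correlation)
# ... while a hyperbolic weigher makes the top-rank swap cost more.
print(weightedtau(x, y, weigher=lambda r: 1.0 / (r + 1)).correlation)
```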

centroIDA: Cross-Domain Class Discrepancy Minimization Based on Accumulative Class-Centroids for Imbalanced Domain Adaptation

  • paper_url: http://arxiv.org/abs/2308.10619
  • repo_url: None
  • paper_authors: Xiaona Sun, Zhenyu Wu, Yichen Liu, Saier Hu, Zhiqiang Zhan, Yang Ji
  • for: addresses the imbalanced domain adaptation (IDA) problem, which involves both covariate and long-tailed label shifts across domains.
  • methods: proposes a cross-domain class discrepancy minimization method based on accumulative class-centroids (centroIDA), which includes class-based re-sampling, accumulative class-centroids alignment, and class-wise feature alignment.
  • results: outperforms other state-of-the-art (SOTA) methods on the IDA problem, especially when the degree of label shift increases.
    Abstract Unsupervised Domain Adaptation (UDA) approaches address the covariate shift problem by minimizing the distribution discrepancy between the source and target domains, assuming that the label distribution is invariant across domains. However, in the imbalanced domain adaptation (IDA) scenario, covariate and long-tailed label shifts both exist across domains. To tackle the IDA problem, some current research focuses on minimizing the distribution discrepancy of each corresponding class between the source and target domains. Such methods rely heavily on reliable pseudo-label selection and feature distribution estimation for the target domain, and the minority classes with limited samples make these estimations more uncertain, which harms the model's performance. In this paper, we propose a cross-domain class discrepancy minimization method based on accumulative class-centroids for IDA (centroIDA). Firstly, a class-based re-sampling strategy is used to obtain an unbiased classifier on the source domain. Secondly, an accumulative class-centroids alignment loss is proposed for iterative class-centroids alignment across domains. Finally, a class-wise feature alignment loss is used to optimize the feature representation for a robust classification boundary. A series of experiments have shown that our method outperforms other SOTA methods on the IDA problem, especially as the degree of label shift increases.

ST-RAP: A Spatio-Temporal Framework for Real Estate Appraisal

  • paper_url: http://arxiv.org/abs/2308.10609
  • repo_url: https://github.com/dojeon-ai/strap
  • paper_authors: Hojoon Lee, Hawon Jeong, Byungkun Lee, Kyungyup Lee, Jaegul Choo
  • for: This paper introduces a spatio-temporal framework for real estate appraisal, designed to better estimate property values.
  • methods: The framework employs a hierarchical architecture with a heterogeneous graph neural network to encapsulate temporal dynamics and spatial relationships simultaneously.
  • results: In comprehensive experiments on a large-scale real estate dataset, ST-RAP outperforms previous methods, demonstrating the significant benefits of integrating spatial and temporal aspects in real estate appraisal.
    Abstract In this paper, we introduce ST-RAP, a novel Spatio-Temporal framework for Real estate APpraisal. ST-RAP employs a hierarchical architecture with a heterogeneous graph neural network to encapsulate temporal dynamics and spatial relationships simultaneously. Through comprehensive experiments on a large-scale real estate dataset, ST-RAP outperforms previous methods, demonstrating the significant benefits of integrating spatial and temporal aspects in real estate appraisal. Our code and dataset are available at https://github.com/dojeon-ai/STRAP.

FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

  • paper_url: http://arxiv.org/abs/2308.10608
  • repo_url: None
  • paper_authors: Yuhan Li, Yishun Dou, Yue Shi, Yu Lei, Xuanhong Chen, Yi Zhang, Peng Zhou, Bingbing Ni
  • for: This paper proposes a text-driven 3D editing framework that enables separable, precise, and consistent fine-grained edits within desired regions.
  • methods: The framework merges a base shape with editable parts according to text prompts; equipped with geometry union and dual-path rendering, it assembles independent 3D parts into a complete object tailored for convenient instance reuse and part-wise control, while a geometric focal loss and style consistency regularization encourage focal fusion and a congruent overall appearance.
  • results: FocalDreamer shows superior editing capabilities in both quantitative and qualitative evaluations, and generates high-fidelity geometry and PBR textures compatible with widely used graphics engines.
    Abstract While text-3D editing has made significant strides in leveraging score distillation sampling, emerging approaches still fall short in delivering separable, precise and consistent outcomes that are vital to content creation. In response, we introduce FocalDreamer, a framework that merges base shape with editable parts according to text prompts for fine-grained editing within desired regions. Specifically, equipped with geometry union and dual-path rendering, FocalDreamer assembles independent 3D parts into a complete object, tailored for convenient instance reuse and part-wise control. We propose geometric focal loss and style consistency regularization, which encourage focal fusion and congruent overall appearance. Furthermore, FocalDreamer generates high-fidelity geometry and PBR textures which are compatible with widely-used graphics engines. Extensive experiments have highlighted the superior editing capabilities of FocalDreamer in both quantitative and qualitative evaluations.

Analyzing Complex Systems with Cascades Using Continuous-Time Bayesian Networks

  • paper_url: http://arxiv.org/abs/2308.10606
  • repo_url: None
  • paper_authors: Alessandro Bregoli, Karin Rathsman, Marco Scutari, Fabio Stella, Søren Wengel Mogensen
  • for: This work analyzes cascading behavior of events in complex systems, in order to understand which states of the system trigger cascades.
  • methods: A modeling framework based on continuous-time Bayesian networks (CTBNs) is used to analyze cascading behavior, together with new methods for knowledge extraction from CTBNs.
  • results: The framework describes how events propagate through the system and identifies likely sentry states, i.e., system states that may lead to imminent cascading behavior; the methodology is applied to a data set of alarms in a large industrial system.
    Abstract Interacting systems of events may exhibit cascading behavior where events tend to be temporally clustered. While the cascades themselves may be obvious from the data, it is important to understand which states of the system trigger them. For this purpose, we propose a modeling framework based on continuous-time Bayesian networks (CTBNs) to analyze cascading behavior in complex systems. This framework allows us to describe how events propagate through the system and to identify likely sentry states, that is, system states that may lead to imminent cascading behavior. Moreover, CTBNs have a simple graphical representation and provide interpretable outputs, both of which are important when communicating with domain experts. We also develop new methods for knowledge extraction from CTBNs and we apply the proposed methodology to a data set of alarms in a large industrial system.

BackTrack: Robust template update via Backward Tracking of candidate template

  • paper_url: http://arxiv.org/abs/2308.10604
  • repo_url: None
  • paper_authors: Dongwook Lee, Wonjun Choi, Seohyung Lee, ByungIn Yoo, Eunho Yang, Seongju Hwang
  • for: Improving visual object tracking by making template updates robust to variations of target appearance such as deformations, illumination variance, and occlusion.
  • methods: The confidence of a candidate template is quantified by tracking it backward on past frames; the template is then updated with a reliable candidate at the right time, while unreliable candidates are rejected.
  • results: BackTrack is a generic template update scheme applicable to any template-based tracker, and achieves state-of-the-art performance on various tracking benchmarks, outperforming existing template update algorithms.
    Abstract Variations of target appearance such as deformations, illumination variance, occlusion, etc., are the major challenges of visual object tracking that negatively impact the performance of a tracker. An effective method to tackle these challenges is template update, which updates the template to reflect the change of appearance in the target object during tracking. However, with template updates, inadequate quality of new templates or inappropriate timing of updates may induce a model drift problem, which severely degrades the tracking performance. Here, we propose BackTrack, a robust and reliable method to quantify the confidence of the candidate template by backward tracking it on the past frames. Based on the confidence score of candidates from BackTrack, we can update the template with a reliable candidate at the right time while rejecting unreliable candidates. BackTrack is a generic template update scheme and is applicable to any template-based trackers. Extensive experiments on various tracking benchmarks verify the effectiveness of BackTrack over existing template update algorithms, as it achieves SOTA performance on various tracking benchmarks.
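A schematic of the backward-verification idea, under an assumed tracker interface `track(template, frame, prev_box) -> box` and a most-recent-first history of frames with the boxes the tracker already produced; all names are hypothetical and the paper's exact scoring rule may differ.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def backtrack_confidence(track, candidate, frames, boxes):
    """Score a candidate template by tracking it backward over past frames.

    frames/boxes: most-recent-first history of frames and the boxes the
    tracker produced on them.  If the candidate, tracked backward, lands
    on the same boxes, it is likely a faithful view of the target.
    """
    box, scores = boxes[0], []
    for frame, ref_box in zip(frames[1:], boxes[1:]):
        box = track(candidate, frame, box)   # hypothetical interface
        scores.append(iou(box, ref_box))
    return sum(scores) / len(scores)

# Update only when backward tracking confirms the candidate:
# if backtrack_confidence(track, cand, frames, boxes) > 0.6: template = cand
```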

Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer

  • paper_url: http://arxiv.org/abs/2308.10601
  • repo_url: https://github.com/zhijin-ge/stm
  • paper_authors: Zhijin Ge, Fanhua Shang, Hongying Liu, Yuanyuan Liu, Liang Wan, Wei Feng, Xiaosen Wang
  • for: Targeting deep neural networks vulnerable to human-imperceptible attacks, to improve attack effectiveness in the black-box setting.
  • methods: Using a domain mapping network for preprocessing, and augmenting the data in different domains to improve attack effectiveness.
  • results: Significantly improving attack effectiveness and input diversity on the ImageNet-compatible dataset compared to state-of-the-art methods.
    Abstract Deep neural networks are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations on clean inputs. Although many attack methods can achieve high success rates in the white-box setting, they also exhibit weak transferability in the black-box setting. Recently, various methods have been proposed to improve adversarial transferability, in which the input transformation is one of the most effective methods. In this work, we notice that existing input transformation-based works mainly adopt the transformed data in the same domain for augmentation. Inspired by domain generalization, we aim to further improve the transferability using the data augmented from different domains. Specifically, a style transfer network can alter the distribution of low-level visual features in an image while preserving semantic content for humans. Hence, we propose a novel attack method named Style Transfer Method (STM) that utilizes a proposed arbitrary style transfer network to transform the images into different domains. To avoid inconsistent semantic information of stylized images for the classification network, we fine-tune the style transfer network and mix up the generated images added by random noise with the original images to maintain semantic consistency and boost input diversity. Extensive experimental results on the ImageNet-compatible dataset show that our proposed method can significantly improve the adversarial transferability on either normally trained models or adversarially trained models than state-of-the-art input transformation-based attacks. Code is available at: https://github.com/Zhijin-Ge/STM.
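A sketch of the mixing step described in the abstract, assuming a style transfer network is available behind a hypothetical `stylize` callable; the mixing ratio and noise scale are illustrative assumptions.

```python
import torch

def stm_augment(x, stylize, mix=0.5, noise_std=0.05):
    """Mix stylized images with the originals plus random noise.

    x: (B, C, H, W) batch in [0, 1].  Mixing preserves semantic content
    for the classifier while style transfer perturbs low-level visual
    features, increasing input diversity across 'domains'.
    """
    x_styled = stylize(x)                  # hypothetical style network
    noise = noise_std * torch.randn_like(x)
    x_aug = mix * x_styled + (1.0 - mix) * x + noise
    return x_aug.clamp(0.0, 1.0)
```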

Image-free Classifier Injection for Zero-Shot Classification

  • paper_url: http://arxiv.org/abs/2308.10599
  • repo_url: https://github.com/explainableml/imagefreezsl
  • paper_authors: Anders Christensen, Massimiliano Mancini, A. Sophia Koepke, Ole Winther, Zeynep Akata
  • for: This paper aims to equip pre-trained classification models with zero-shot classification capabilities without using any image data.
  • methods: The proposed Image-free Classifier Injection with Semantics (ICIS) injects classifiers for new, unseen classes into pre-trained models in a post-hoc fashion, relying only on existing classifier weights and simple class-wise descriptors such as class names or attributes. ICIS comprises two encoder-decoder networks that learn to reconstruct classifier weights from descriptors (and vice versa), with (cross-)reconstruction and cosine losses regularising the decoding process.
  • results: Experiments on benchmark ZSL datasets show that ICIS produces unseen-class classifier weights that achieve strong (generalised) zero-shot classification performance.
    Abstract Zero-shot learning models achieve remarkable results on image classification for samples from classes that were not seen during training. However, such models must be trained from scratch with specialised methods: therefore, access to a training dataset is required when the need for zero-shot classification arises. In this paper, we aim to equip pre-trained models with zero-shot classification capabilities without the use of image data. We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS) that injects classifiers for new, unseen classes into pre-trained classification models in a post-hoc fashion without relying on image data. Instead, the existing classifier weights and simple class-wise descriptors, such as class names or attributes, are used. ICIS has two encoder-decoder networks that learn to reconstruct classifier weights from descriptors (and vice versa), exploiting (cross-)reconstruction and cosine losses to regularise the decoding process. Notably, ICIS can be cheaply trained and applied directly on top of pre-trained classification models. Experiments on benchmark ZSL datasets show that ICIS produces unseen classifier weights that achieve strong (generalised) zero-shot classification performance. Code is available at https://github.com/ExplainableML/ImageFreeZSL .
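A minimal sketch of the descriptor-to-classifier-weight direction of ICIS (one of the paper's two encoder-decoder networks), with assumed dimensions; `desc` would be, e.g., class-name embeddings and `w_seen` the pre-trained classifier rows for seen classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_desc, d_w, d_h = 300, 512, 1024  # assumed dimensions

# Encoder-decoder that maps class descriptors to classifier weights.
net = nn.Sequential(
    nn.Linear(d_desc, d_h), nn.ReLU(),
    nn.Linear(d_h, d_w),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def icis_step(desc, w_seen):
    """One training step: reconstruct seen-class weights from descriptors,
    with a cosine term in the spirit of the paper's regularized decoding."""
    w_hat = net(desc)
    loss = F.mse_loss(w_hat, w_seen) \
         + (1 - F.cosine_similarity(w_hat, w_seen, dim=-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After training, inject weights for an unseen class from its descriptor:
# model.fc.weight.data[new_idx] = net(desc_unseen)
```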

RADIANCE: Radio-Frequency Adversarial Deep-learning Inference for Automated Network Coverage Estimation

  • paper_url: http://arxiv.org/abs/2308.10584
  • repo_url: None
  • paper_authors: Sopan Sarkar, Mohammad Hossein Manshaei, Marwan Krunz
  • for: This paper proposes an approach for automatically synthesizing radio-frequency coverage maps (RF maps) in indoor scenarios, avoiding labor-intensive site surveys while accounting for the spatial layout and attributes of objects in the environment.
  • methods: RADIANCE, a generative adversarial network (GAN) based approach, uses a semantic map (a high-level representation of the indoor environment encoding spatial relationships and object attributes) to guide RF map generation; a new gradient-based loss function computes the magnitude and direction of change in received signal strength (RSS) from a point within the environment, and is combined with the antenna pattern to capture signal propagation and to generate new maps under new configurations, antenna (beam) patterns, and center frequencies.
  • results: Against ray-tracing simulations of RF maps, RADIANCE achieves a mean average error (MAE) of 0.09, a root-mean-squared error (RMSE) of 0.29, a peak signal-to-noise ratio (PSNR) of 10.78, and a multi-scale structural similarity index (MS-SSIM) of 0.80.
    Abstract Radio-frequency coverage maps (RF maps) are extensively utilized in wireless networks for capacity planning, placement of access points and base stations, localization, and coverage estimation. Conducting site surveys to obtain RF maps is labor-intensive and sometimes not feasible. In this paper, we propose radio-frequency adversarial deep-learning inference for automated network coverage estimation (RADIANCE), a generative adversarial network (GAN) based approach for synthesizing RF maps in indoor scenarios. RADIANCE utilizes a semantic map, a high-level representation of the indoor environment to encode spatial relationships and attributes of objects within the environment and guide the RF map generation process. We introduce a new gradient-based loss function that computes the magnitude and direction of change in received signal strength (RSS) values from a point within the environment. RADIANCE incorporates this loss function along with the antenna pattern to capture signal propagation within a given indoor configuration and generate new patterns under new configuration, antenna (beam) pattern, and center frequency. Extensive simulations are conducted to compare RADIANCE with ray-tracing simulations of RF maps. Our results show that RADIANCE achieves a mean average error (MAE) of 0.09, root-mean-squared error (RMSE) of 0.29, peak signal-to-noise ratio (PSNR) of 10.78, and multi-scale structural similarity index (MS-SSIM) of 0.80.

Pseudo-online framework for BCI evaluation: A MOABB perspective

  • paper_url: http://arxiv.org/abs/2308.11656
  • repo_url: None
  • paper_authors: Igor Carrara, Théodore Papadopoulo
  • for: This work extends the current MOABB framework, which operates in offline mode, to allow different algorithms to be compared in a pseudo-online setting using a technology based on overlapping sliding windows.
  • methods: An idle-state event is introduced into the dataset to account for all possibilities other than task thinking; algorithm performance is assessed with the normalized Matthews Correlation Coefficient (nMCC) and the Information Transfer Rate (ITR).
  • results: State-of-the-art algorithms of the last 15 years are analyzed over several Motor Imagery (MI) datasets comprising multiple subjects, showing the differences between the two approaches from a statistical point of view.
    Abstract Objective: BCI (Brain-Computer Interface) technology operates in three modes: online, offline, and pseudo-online. In the online mode, real-time EEG data is constantly analyzed. In offline mode, the signal is acquired and processed afterwards. The pseudo-online mode processes collected data as if they were received in real-time. The main difference is that the offline mode often analyzes the whole data, while the online and pseudo-online modes only analyze data in short time windows. Offline analysis is usually done with asynchronous BCIs, which restricts analysis to predefined time windows. Asynchronous BCI, compatible with online and pseudo-online modes, allows flexible mental activity duration. Offline processing tends to be more accurate, while online analysis is better for therapeutic applications. Pseudo-online implementation approximates online processing without real-time constraints. Many BCI studies being offline introduce biases compared to real-life scenarios, impacting classification algorithm performance. Approach: The objective of this research paper is therefore to extend the current MOABB framework, operating in offline mode, so as to allow a comparison of different algorithms in a pseudo-online setting with the use of a technology based on overlapping sliding windows. To do this will require the introduction of a idle state event in the dataset that takes into account all different possibilities that are not task thinking. To validate the performance of the algorithms we will use the normalized Matthews Correlation Coefficient (nMCC) and the Information Transfer Rate (ITR). Main results: We analyzed the state-of-the-art algorithms of the last 15 years over several Motor Imagery (MI) datasets composed by several subjects, showing the differences between the two approaches from a statistical point of view. Significance: The ability to analyze the performance of different algorithms in offline and pseudo-online modes will allow the BCI community to obtain more accurate and comprehensive reports regarding the performance of classification algorithms.
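The overlapping sliding windows at the heart of the pseudo-online mode are straightforward to make concrete; the sketch below uses assumed window and step sizes over a continuous EEG recording.

```python
import numpy as np

def sliding_windows(eeg, win, step):
    """Yield overlapping windows from a (channels, samples) EEG array,
    emulating pseudo-online processing of a continuous recording."""
    for start in range(0, eeg.shape[1] - win + 1, step):
        yield start, eeg[:, start:start + win]

fs = 250                                          # sampling rate (Hz), assumed
eeg = np.random.standard_normal((8, 60 * fs))     # 8 channels, 60 s of data
for start, w in sliding_windows(eeg, win=2 * fs, step=fs // 4):
    pass  # classify(w) -> label, scored afterwards with nMCC / ITR
```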

Overcoming Overconfidence for Active Learning

  • paper_url: http://arxiv.org/abs/2308.10571
  • repo_url: None
  • paper_authors: Yujin Hwang, Won Jo, Juyoung Hong, Yukyung Choi
  • for: addressing the issue of overconfidence in active learning scenarios
  • methods: a Cross-Mix-and-Mix (CMaM) augmentation strategy that calibrates the model by expanding the limited training distribution, and a Ranked Margin Sampling (RankedMS) selection strategy that prevents choosing data that leads to overly confident predictions
  • results: experiments and analyses demonstrate that the proposed methods facilitate efficient data selection and alleviate overconfidence, while remaining readily applicable
    Abstract It is not an exaggeration to say that the recent progress in artificial intelligence technology depends on large-scale and high-quality data. Simultaneously, a prevalent issue exists everywhere: the budget for data labeling is constrained. Active learning is a prominent approach for addressing this issue, where valuable data for labeling is selected through a model and utilized to iteratively adjust the model. However, due to the limited amount of data in each iteration, the model is vulnerable to bias; thus, it is more likely to yield overconfident predictions. In this paper, we present two novel methods to address the problem of overconfidence that arises in the active learning scenario. The first is an augmentation strategy named Cross-Mix-and-Mix (CMaM), which aims to calibrate the model by expanding the limited training distribution. The second is a selection strategy named Ranked Margin Sampling (RankedMS), which prevents choosing data that leads to overly confident predictions. Through various experiments and analyses, we are able to demonstrate that our proposals facilitate efficient data selection by alleviating overconfidence, even though they are readily applicable.
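A sketch of margin-based ranking for batch selection in the spirit of RankedMS (the precise ranking rule is an assumption): samples whose top-two class probabilities are close are preferred over those the model is already confident about.

```python
import numpy as np

def ranked_margin_select(probs, k):
    """Select k samples with the smallest top-1 vs top-2 margin.

    probs: (N, C) softmax outputs on the candidate pool.  A small margin
    signals an uncertain prediction; picking these avoids reinforcing
    overconfident ones.
    """
    top2 = np.sort(probs, axis=1)[:, -2:]   # two largest values per row
    margin = top2[:, 1] - top2[:, 0]
    return np.argsort(margin)[:k]           # indices of smallest margins

pool = np.random.dirichlet(np.ones(10), size=1000)
print(ranked_margin_select(pool, k=32))
```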

Decentralized Riemannian Conjugate Gradient Method on the Stiefel Manifold

  • paper_url: http://arxiv.org/abs/2308.10547
  • repo_url: None
  • paper_authors: Jun Chen, Haishan Ye, Mengmeng Wang, Tianxin Huang, Guang Dai, Ivor W. Tsang, Yong Liu
  • for: proposing a decentralized Riemannian conjugate gradient descent (DRCGD) method for minimizing a global function over the Stiefel manifold in a distributed network
  • methods: a conjugate gradient method run in a decentralized setting, where each agent holds a local (possibly non-convex but smooth) function and agents communicate over an undirected connected graph
  • results: global convergence over the Stiefel manifold without expensive Riemannian geometric operations such as retractions, exponential maps, or vector transports, reducing each agent's computational complexity
    Abstract The conjugate gradient method is a crucial first-order optimization method that generally converges faster than the steepest descent method, and its computational cost is much lower than that of second-order methods. However, while various types of conjugate gradient methods have been studied in Euclidean spaces and on Riemannian manifolds, there has been little study of them in distributed scenarios. This paper proposes a decentralized Riemannian conjugate gradient descent (DRCGD) method that aims at minimizing a global function over the Stiefel manifold. The optimization problem is distributed among a network of agents, where each agent is associated with a local function, and communication between agents occurs over an undirected connected graph. Since the Stiefel manifold is a non-convex set, a global function is represented as a finite sum of possibly non-convex (but smooth) local functions. The proposed method is free from expensive Riemannian geometric operations such as retractions, exponential maps, and vector transports, thereby reducing the computational complexity required by each agent. To the best of our knowledge, DRCGD is the first decentralized Riemannian conjugate gradient algorithm to achieve global convergence over the Stiefel manifold.
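
The toy sketch below mimics only the communicate-then-update structure of a decentralized step on the Stiefel manifold. It is not the paper's retraction-free conjugate gradient scheme: for compactness it uses a plain projected-gradient direction and a QR re-orthonormalization, operations that DRCGD is specifically designed to avoid, and the mixing matrix W and step size are assumptions.

```python
import numpy as np

def proj_tangent(X, G):
    """Project an ambient gradient G onto the tangent space of St(n, p) at X."""
    sym = (X.T @ G + G.T @ X) / 2.0
    return G - X @ sym

def decentralized_step(Xs, grads, W, lr=0.1):
    """Xs: per-agent iterates with orthonormal columns; W: doubly stochastic."""
    mixed = [sum(W[i, j] * Xs[j] for j in range(len(Xs))) for i in range(len(Xs))]
    out = []
    for i, X in enumerate(mixed):
        Y = X - lr * proj_tangent(Xs[i], grads[i])
        Q, R = np.linalg.qr(Y)                   # restore orthonormal columns
        out.append(Q * np.sign(np.diag(R)))      # fix QR sign ambiguity
    return out
```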

Towards Accelerated Model Training via Bayesian Data Selection

  • paper_url: http://arxiv.org/abs/2308.10544
  • repo_url: None
  • paper_authors: Zhijie Deng, Peng Cui, Jun Zhu
  • for: accelerating model training and improving robustness when real-world data contain mislabeled, duplicated, or biased samples that prolong training and hinder convergence
  • methods: a lightweight Bayesian treatment combined with off-the-shelf zero-shot predictors built on large-scale pre-trained models, applied to online batch selection without extra clean holdout data
  • results: extensive experiments on challenging noisy and imbalanced benchmarks show superior training efficiency over competitive baselines; on the WebVision benchmark, similar predictive performance is reached with significantly fewer training iterations than leading data selection methods
    Abstract Mislabeled, duplicated, or biased data in real-world scenarios can lead to prolonged training and even hinder model convergence. Traditional solutions prioritizing easy or hard samples lack the flexibility to handle such a variety simultaneously. Recent work has proposed a more reasonable data selection principle by examining the data's impact on the model's generalization loss. However, its practical adoption relies on less principled approximations and additional clean holdout data. This work solves these problems by leveraging a lightweight Bayesian treatment and incorporating off-the-shelf zero-shot predictors built on large-scale pre-trained models. The resulting algorithm is efficient and easy-to-implement. We perform extensive empirical studies on challenging benchmarks with considerable data noise and imbalance in the online batch selection scenario, and observe superior training efficiency over competitive baselines. Notably, on the challenging WebVision benchmark, our method can achieve similar predictive performance with significantly fewer training iterations than leading data selection methods.
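
As a rough sketch of the selection principle, the snippet below scores each candidate in an online batch by the gap between the training model's loss and a zero-shot reference model's loss and keeps the highest-scoring samples: points both models find hard are likely noise, while a large gap suggests reducible loss. The paper's Bayesian treatment is abstracted into this simple loss difference, and the models and keep fraction are placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_batch(model, zero_shot, x, y, keep_frac=0.5):
    """Keep the samples whose loss the current model could plausibly reduce."""
    train_loss = F.cross_entropy(model(x), y, reduction="none")
    ref_loss = F.cross_entropy(zero_shot(x), y, reduction="none")
    score = train_loss - ref_loss              # proxy for reducible loss
    k = max(1, int(keep_frac * x.shape[0]))
    idx = torch.topk(score, k).indices
    return x[idx], y[idx]
```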

Learning Weakly Convex Regularizers for Convergent Image-Reconstruction Algorithms

  • paper_url: http://arxiv.org/abs/2308.10542
  • repo_url: None
  • paper_authors: Alexis Goujon, Sebastian Neumayer, Michael Unser
  • for: learning non-convex regularizers with a prescribed upper bound on their weak-convexity modulus, so that the resulting variational denoisers still minimize a convex energy
  • methods: regularizers with few parameters (fewer than 15,000) that mimic handcrafted sparsity-promoting regularizers and admit a signal-processing interpretation, validated through mathematical analysis and numerical experiments
  • results: the learned denoisers outperform convex-regularization methods as well as the popular BM3D denoiser; the regularizer can be plugged into provably convergent iterative schemes for inverse problems and, for CT and MRI reconstruction, offers an excellent trade-off between performance, parameter count, guarantees, and interpretability
    Abstract We propose to learn non-convex regularizers with a prescribed upper bound on their weak-convexity modulus. Such regularizers give rise to variational denoisers that minimize a convex energy. They rely on few parameters (less than 15,000) and offer a signal-processing interpretation as they mimic handcrafted sparsity-promoting regularizers. Through numerical experiments, we show that such denoisers outperform convex-regularization methods as well as the popular BM3D denoiser. Additionally, the learned regularizer can be deployed to solve inverse problems with iterative schemes that provably converge. For both CT and MRI reconstruction, the regularizer generalizes well and offers an excellent tradeoff between performance, number of parameters, guarantees, and interpretability when compared to other data-driven approaches.
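
To make the convexity claim concrete, recall the usual convention for the weak-convexity modulus: a regularizer $R$ is $\rho$-weakly convex when adding a quadratic of modulus $\rho$ restores convexity. Because the quadratic data-fidelity term of the denoiser is 1-strongly convex, it absorbs up to one unit of weak convexity, so the denoising energy remains convex whenever $\rho \le 1$; the paper's exact prescribed bound should be taken from the text.

```latex
% rho-weak convexity: a quadratic of modulus rho restores convexity
x \mapsto R(x) + \tfrac{\rho}{2}\lVert x \rVert_2^2 \quad \text{is convex.}
% The variational denoiser
\hat{x}(y) \;=\; \arg\min_{x}\ \tfrac{1}{2}\lVert x - y \rVert_2^2 + R(x)
% minimizes a convex energy whenever rho <= 1, since the 1-strongly
% convex data term offsets the weak non-convexity of R.
```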

KGrEaT: A Framework to Evaluate Knowledge Graphs via Downstream Tasks

  • paper_url: http://arxiv.org/abs/2308.10537
  • repo_url: None
  • paper_authors: Nicolas Heist, Sven Hertling, Heiko Paulheim
  • for: evaluating the quality of knowledge graphs by how well they actually support downstream tasks, rather than by correctness and completeness alone
  • methods: the KGrEaT framework takes a knowledge graph as input, automatically maps it to a fixed setup of evaluation datasets, and computes performance metrics on tasks such as classification, clustering, and recommendation; it is built modularly so that additional tasks and datasets can be plugged in
  • results: KGrEaT makes different knowledge graphs directly comparable on a fixed task setup, including aspects such as accessibility (expressive labels, descriptions, and context for entity linking) that the established metrics miss
    Abstract In recent years, countless research papers have addressed the topics of knowledge graph creation, extension, or completion in order to create knowledge graphs that are larger, more correct, or more diverse. This research is typically motivated by the argumentation that using such enhanced knowledge graphs to solve downstream tasks will improve performance. Nonetheless, this is hardly ever evaluated. Instead, the predominant evaluation metrics - aiming at correctness and completeness - are undoubtedly valuable but fail to capture the complete picture, i.e., how useful the created or enhanced knowledge graph actually is. Further, the accessibility of such a knowledge graph is rarely considered (e.g., whether it contains expressive labels, descriptions, and sufficient context information to link textual mentions to the entities of the knowledge graph). To better judge how well knowledge graphs perform on actual tasks, we present KGrEaT - a framework to estimate the quality of knowledge graphs via actual downstream tasks like classification, clustering, or recommendation. Instead of comparing different methods of processing knowledge graphs with respect to a single task, the purpose of KGrEaT is to compare various knowledge graphs as such by evaluating them on a fixed task setup. The framework takes a knowledge graph as input, automatically maps it to the datasets to be evaluated on, and computes performance metrics for the defined tasks. It is built in a modular way to be easily extendable with additional tasks and datasets.

DPAN: Dynamic Preference-based and Attribute-aware Network for Relevant Recommendations

  • paper_url: http://arxiv.org/abs/2308.10527
  • repo_url: None
  • paper_authors: Wei Dai, Yingmin Su, Xiaofeng Pan
  • for: improving the click-through rate (CTR) of relevant recommendations on e-commerce platforms
  • methods: the Dynamic Preference-based and Attribute-aware Network (DPAN), which combines Attribute-aware Activation Values Generation (AAVG) with Bi-dimensional Compression-based Re-expression (BCR) to learn fine-grained similarity and diversity representations of user interests and item information, and Shallow and Deep Union-based Fusion (SDUF) to capture users' dynamic preference for result diversity under varying conditions
  • results: extensive offline experiments and online A/B testing show a significant 7.62% CTR improvement; DPAN is deployed on our e-commerce platform serving the primary traffic for relevant recommendations, and the code is publicly available
    Abstract In e-commerce platforms, the relevant recommendation is a unique scenario providing related items for a trigger item that users are interested in. However, users' preferences for the similarity and diversity of recommendation results are dynamic and vary under different conditions. Moreover, individual item-level diversity is too coarse-grained since all recommended items are related to the trigger item. Thus, the two main challenges are to learn fine-grained representations of similarity and diversity and capture users' dynamic preferences for them under different conditions. To address these challenges, we propose a novel method called the Dynamic Preference-based and Attribute-aware Network (DPAN) for predicting Click-Through Rate (CTR) in relevant recommendations. Specifically, based on Attribute-aware Activation Values Generation (AAVG), Bi-dimensional Compression-based Re-expression (BCR) is designed to obtain similarity and diversity representations of user interests and item information. Then Shallow and Deep Union-based Fusion (SDUF) is proposed to capture users' dynamic preferences for the diverse degree of recommendation results according to various conditions. DPAN has demonstrated its effectiveness through extensive offline experiments and online A/B testing, resulting in a significant 7.62% improvement in CTR. Currently, DPAN has been successfully deployed on our e-commerce platform serving the primary traffic for relevant recommendations. The code of DPAN has been made publicly available.

Information Theory-Guided Heuristic Progressive Multi-View Coding

  • paper_url: http://arxiv.org/abs/2308.10522
  • repo_url: None
  • paper_authors: Jiangmeng Li, Hang Gao, Wenwen Qiang, Changwen Zheng
  • for: proposing an information-theoretic framework for generalized self-supervised multi-view learning, addressing view-specific noise and the lack of theoretical grounding in existing pairwise contrastive approaches
  • methods: Information theory-guided hierarchical Progressive Multi-view Coding (IPMC), a three-tier architecture: a distribution tier that aligns distributions across views to reduce view-specific noise, a set tier that builds self-adjusted contrasting pools adaptively modified by a view filter, and an instance tier with a unified loss that learns representations while reducing gradient interference
  • results: theoretical and empirical analyses demonstrate the superiority of IPMC over state-of-the-art methods
    Abstract Multi-view representation learning aims to capture comprehensive information from multiple views of a shared context. Recent works intuitively apply contrastive learning to different views in a pairwise manner, which is still scalable: view-specific noise is not filtered in learning view-shared representations; the fake negative pairs, where the negative terms are actually within the same class as the positive, and the real negative pairs are coequally treated; evenly measuring the similarities between terms might interfere with optimization. Importantly, few works study the theoretical framework of generalized self-supervised multi-view learning, especially for more than two views. To this end, we rethink the existing multi-view learning paradigm from the perspective of information theory and then propose a novel information theoretical framework for generalized multi-view learning. Guided by it, we build a multi-view coding method with a three-tier progressive architecture, namely Information theory-guided hierarchical Progressive Multi-view Coding (IPMC). In the distribution-tier, IPMC aligns the distribution between views to reduce view-specific noise. In the set-tier, IPMC constructs self-adjusted contrasting pools, which are adaptively modified by a view filter. Lastly, in the instance-tier, we adopt a designed unified loss to learn representations and reduce the gradient interference. Theoretically and empirically, we demonstrate the superiority of IPMC over state-of-the-art methods.

Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis

  • paper_url: http://arxiv.org/abs/2308.10511
  • repo_url: None
  • paper_authors: Shrestha Datta, Md Adith Mollah, Raisa Fairooz, Tariful Islam Fahim
  • for: tackling document layout analysis (DLA) for Bangla documents, dividing them into sections such as paragraphs, images, and tables so that machines can read and understand them
  • methods: a Mask R-CNN model trained on the BaDLAD dataset and improved through step-by-step hyperparameter tuning
  • results: a Dice score of 0.889 on Bangla document layout analysis; a model trained on English documents transferred poorly to Bangla, showing that each language poses its own challenges
    Abstract Understanding digital documents is like solving a puzzle, especially historical ones. Document Layout Analysis (DLA) helps with this puzzle by dividing documents into sections like paragraphs, images, and tables. This is crucial for machines to read and understand these documents. In the DL Sprint 2.0 competition, we worked on understanding Bangla documents. We used a dataset called BaDLAD with lots of examples. We trained a special model called Mask R-CNN to help with this understanding. We made this model better by step-by-step hyperparameter tuning, and we achieved a good dice score of 0.889. However, not everything went perfectly. We tried using a model trained for English documents, but it didn't fit well with Bangla. This showed us that each language has its own challenges. Our solution for the DL Sprint 2.0 is publicly available at https://www.kaggle.com/competitions/dlsprint2/discussion/432201 along with notebooks, weights, and inference notebook.

A Clustering Algorithm to Organize Satellite Hotspot Data for the Purpose of Tracking Bushfires Remotely

  • paper_url: http://arxiv.org/abs/2308.10505
  • repo_url: https://github.com/tengmcing/hotspots-clustering-algorithm
  • paper_authors: Weihao Li, Emily Dodwell, Dianne Cook
  • for: proposing a spatiotemporal clustering algorithm and its implementation in the R package spotoroo
  • methods: motivated by the catastrophic Australian bushfires of 2019-2020 and made possible by satellite hotspot data, the algorithm builds on two existing spatiotemporal clustering algorithms but clusters points spatially in conjunction with their movement across consecutive time periods, with key parameters adjustable for different locations and satellite data sources
  • results: bushfire data from Victoria, Australia illustrates the algorithm and its use within the package
    Abstract This paper proposes a spatiotemporal clustering algorithm and its implementation in the R package spotoroo. This work is motivated by the catastrophic bushfires in Australia throughout the summer of 2019-2020 and made possible by the availability of satellite hotspot data. The algorithm is inspired by two existing spatiotemporal clustering algorithms but makes enhancements to cluster points spatially in conjunction with their movement across consecutive time periods. It also allows for the adjustment of key parameters, if required, for different locations and satellite data sources. Bushfire data from Victoria, Australia, is used to illustrate the algorithm and its use within the package.
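
The sketch below illustrates the two-step idea in Python rather than R: hotspots are first grouped spatially within a time window, and groups in consecutive windows inherit a fire identifier when they come within a linking distance. The thresholds and the greedy linking rule are illustrative assumptions; the reference implementation is the spotoroo package itself.

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_window(points, eps):
    """Single-linkage grouping of (n, 2) hotspot coordinates in one window."""
    d = cdist(points, points)
    labels, cur = -np.ones(len(points), dtype=int), 0
    for i in range(len(points)):
        if labels[i] >= 0:
            continue
        stack, labels[i] = [i], cur
        while stack:                              # flood-fill one cluster
            j = stack.pop()
            for k in np.where((d[j] <= eps) & (labels < 0))[0]:
                labels[k] = cur
                stack.append(k)
        cur += 1
    return labels

def link_to_previous(cur_pts, cur_labels, prev_pts, prev_fire_ids, eps, next_id):
    """Carry fire ids across windows; assumes prev_pts is non-empty."""
    d = cdist(cur_pts, prev_pts)
    fire_ids = np.empty(len(cur_pts), dtype=int)
    for c in np.unique(cur_labels):
        mask = cur_labels == c
        dmin = d[mask].min(axis=0)                # cluster-to-point distances
        if dmin.min() <= eps:                     # continue an existing fire
            fire_ids[mask] = prev_fire_ids[int(dmin.argmin())]
        else:                                     # ignition of a new fire
            fire_ids[mask], next_id = next_id, next_id + 1
    return fire_ids, next_id
```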

Adaptive Thresholding Heuristic for KPI Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.10504
  • repo_url: None
  • paper_authors: Ebenezer R. H. P. Isaac, Akshat Sharma
  • for: providing an adaptive thresholding heuristic for detecting anomalies in time-series key performance indicators (KPIs), where not every statistical outlier is a business-relevant anomaly
  • methods: the Adaptive Thresholding Heuristic (ATH) dynamically adjusts the detection threshold from the local properties of the data distribution, deriving it from the expected periodicity and the observed proportion of anomalies to minimize false positives and handle concept drift; it can be paired with any seasonality decomposition method and any outlier detector that yields a score
  • results: experiments on EON1-Cell-U, a labeled KPI anomaly dataset produced by Ericsson, show the heuristic is computationally efficient, scalable to near-real-time detection, and flexible across multiple forecasters and outlier detectors
    Abstract A plethora of outlier detectors have been explored in the time series domain, however, in a business sense, not all outliers are anomalies of interest. Existing anomaly detection solutions are confined to certain outlier detectors limiting their applicability to broader anomaly detection use cases. Network KPIs (Key Performance Indicators) tend to exhibit stochastic behaviour producing statistical outliers, most of which do not adversely affect business operations. Thus, a heuristic is required to capture the business definition of an anomaly for time series KPI. This article proposes an Adaptive Thresholding Heuristic (ATH) to dynamically adjust the detection threshold based on the local properties of the data distribution and adapt to changes in time series patterns. The heuristic derives the threshold based on the expected periodicity and the observed proportion of anomalies minimizing false positives and addressing concept drift. ATH can be used in conjunction with any underlying seasonality decomposition method and an outlier detector that yields an outlier score. This method has been tested on EON1-Cell-U, a labeled KPI anomaly dataset produced by Ericsson, to validate our hypothesis. Experimental results show that ATH is computationally efficient making it scalable for near real time anomaly detection and flexible with multiple forecasters and outlier detectors.
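
A minimal sketch of the adaptive idea follows: the cutoff over outlier scores drifts toward the quantile that would flag the expected anomaly rate within one seasonal period. The warm start, update rate, and quantile rule are illustrative assumptions, not the published ATH formula.

```python
import numpy as np

def adaptive_threshold(scores, period, expected_rate=0.01, lr=0.1):
    """scores: 1-D outlier scores from any detector, one per timestamp."""
    thr = np.quantile(scores[:period], 1.0 - expected_rate)   # warm start
    flags = []
    for t in range(period, len(scores)):
        window = scores[t - period:t]            # one seasonal period back
        target = np.quantile(window, 1.0 - expected_rate)
        thr = (1.0 - lr) * thr + lr * target     # drift with the distribution
        flags.append(scores[t] > thr)
    return np.array(flags)
```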

GradientCoin: A Peer-to-Peer Decentralized Large Language Models

  • paper_url: http://arxiv.org/abs/2308.10502
  • repo_url: None
  • paper_authors: Yeqi Gao, Zhao Song, Junze Yin
  • for: proposing a purely theoretical design of a decentralized large language model (LLM) that operates similarly to the Bitcoin electronic cash system, addressing the centralized control and trust issues of current LLMs
  • methods: transferring the peer-to-peer design principles of the Bitcoin cash system to a decentralized LLM, while acknowledging the practical difficulties such a system would face
  • results: the system is unlikely to outperform the standard Bitcoin system economically, so the motivation for building it is limited; in practice it would mainly appeal to users who prefer decentralized ChatGPT-like software and to those who believe the purpose of carbon-based life is to create silicon-based life
    Abstract Since 2008, after the proposal of a Bitcoin electronic cash system, Bitcoin has fundamentally changed the economic system over the last decade. Since 2022, large language models (LLMs) such as GPT have outperformed humans in many real-life tasks. However, these large language models have several practical issues. For example, the model is centralized and controlled by a specific unit. One weakness is that if that unit decides to shut down the model, it cannot be used anymore. The second weakness is the lack of guaranteed discrepancy behind this model, as certain dishonest units may design their own models and feed them unhealthy training data. In this work, we propose a purely theoretical design of a decentralized LLM that operates similarly to a Bitcoin cash system. However, implementing such a system might encounter various practical difficulties. Furthermore, this new system is unlikely to perform better than the standard Bitcoin system in economics. Therefore, the motivation for designing such a system is limited. It is likely that only two types of people would be interested in setting up a practical system for it: $\bullet$ Those who prefer to use a decentralized ChatGPT-like software. $\bullet$ Those who believe that the purpose of carbon-based life is to create silicon-based life, such as Optimus Prime in Transformers. The reason the second type of people may be interested is that it is possible that one day an AI system like this will awaken and become the next level of intelligence on this planet.

Deep Learning of Delay-Compensated Backstepping for Reaction-Diffusion PDEs

  • paper_url: http://arxiv.org/abs/2308.10501
  • repo_url: None
  • paper_authors: Shanshan Wang, Mamadou Diagne, Miroslav Krstić
  • for: encoding an entire PDE backstepping control methodology in a deep neural network (DeepONet), so that for each new functional coefficient of a PDE plant the backstepping gains are obtained through a simple function evaluation
  • methods: extending DeepONet approximation from single-PDE gain-kernel operators to multiple cascaded nonlinear operators, here a reaction-diffusion (parabolic) plant with an input delay (hyperbolic); the delay-compensated backstepping controller employs the learned control operator, i.e., the approximated gain kernel
  • results: exponential stability is guaranteed in the $L^2$ norm of the plant state and the $H^1$ norm of the input-delay state, and simulations illustrate the contributed theory
    Abstract Deep neural networks that approximate nonlinear function-to-function mappings, i.e., operators, which are called DeepONet, have been demonstrated in recent articles to be capable of encoding entire PDE control methodologies, such as backstepping, so that, for each new functional coefficient of a PDE plant, the backstepping gains are obtained through a simple function evaluation. These initial results have been limited to single PDEs from a given class, approximating the solutions of only single-PDE operators for the gain kernels. In this paper we expand this framework to the approximation of multiple (cascaded) nonlinear operators. Multiple operators arise in the control of PDE systems from distinct PDE classes, such as the system in this paper: a reaction-diffusion plant, which is a parabolic PDE, with input delay, which is a hyperbolic PDE. The DeepONet-approximated nonlinear operator is a cascade/composition of the operators defined by one hyperbolic PDE of the Goursat form and one parabolic PDE on a rectangle, both of which are bilinear in their input functions and not explicitly solvable. For the delay-compensated PDE backstepping controller, which employs the learned control operator, namely, the approximated gain kernel, we guarantee exponential stability in the $L^2$ norm of the plant state and the $H^1$ norm of the input delay state. Simulations illustrate the contributed theory.

Using Autoencoders and AutoDiff to Reconstruct Missing Variables in a Set of Time Series

  • paper_url: http://arxiv.org/abs/2308.10496
  • repo_url: None
  • paper_authors: Jan-Philipp Roche, Oliver Niggemann, Jens Friebe
  • for: reconstructing missing variables in a set of time series with a black-box model, overcoming the fixed input and output feature combinations of existing approaches
  • methods: an autoencoder is first trained as usual with every feature on both sides and its parameters are then frozen; the searched variables are defined as missing variables at the autoencoder input and optimized via automatic differentiation against the loss on the available features, so different input/output feature combinations can be realized without retraining the autoencoder
  • results: evaluated on a strongly nonlinear electrical component, the approach works well with one of four variables missing and generally even with multiple missing variables
    Abstract Existing black box modeling approaches in machine learning suffer from a fixed input and output feature combination. In this paper, a new approach to reconstruct missing variables in a set of time series is presented. An autoencoder is trained as usual with every feature on both sides and the neural network parameters are fixed after this training. Then, the searched variables are defined as missing variables at the autoencoder input and optimized via automatic differentiation. This optimization is performed with respect to the loss calculated on the available features. With this method, different input and output feature combinations of the trained model can be realized by defining the searched variables as missing variables and reconstructing them. The combination can be changed without training the autoencoder again. The approach is evaluated on a strongly nonlinear electrical component. It works well with one of four variables missing and generally even with multiple missing variables.
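
A minimal PyTorch sketch of the reconstruction step described above: the autoencoder is trained and frozen beforehand, the missing column is treated as a free variable, and it is optimized so that the reconstruction of the observed columns matches their measurements. Shapes, the zero initialization, and the optimizer settings are illustrative.

```python
import torch

def reconstruct_missing(autoencoder, observed, missing_idx, steps=500, lr=1e-2):
    """observed: (T, F) series with missing column(s) zero-filled;
    missing_idx: list of column indices to reconstruct."""
    for p in autoencoder.parameters():
        p.requires_grad_(False)                  # network weights stay fixed
    missing = torch.zeros(observed.shape[0], len(missing_idx), requires_grad=True)
    opt = torch.optim.Adam([missing], lr=lr)
    obs_cols = [i for i in range(observed.shape[1]) if i not in missing_idx]
    for _ in range(steps):
        x = observed.clone()
        x[:, missing_idx] = missing              # splice in the free variable
        recon = autoencoder(x)
        loss = ((recon[:, obs_cols] - observed[:, obs_cols]) ** 2).mean()
        opt.zero_grad()
        loss.backward()                          # gradient w.r.t. `missing` only
        opt.step()
    return missing.detach()
```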

Deciphering Raw Data in Neuro-Symbolic Learning with Provable Guarantees

  • paper_url: http://arxiv.org/abs/2308.10487
  • repo_url: None
  • paper_authors: Lue Tao, Yu-Xuan Huang, Wang-Zhou Dai, Yuan Jiang
  • for: studying the learnability of neuro-symbolic hybrid systems, where perception models are aided by logical reasoning over a symbolic knowledge base
  • methods: a novel characterization of the supervision signals a knowledge base provides, and a criterion that determines whether the knowledge is effective in facilitating successful learning
  • results: inspecting the knowledge base with this criterion explains when a hybrid system succeeds and when it may fail; many knowledge bases satisfy the criterion and enable effective learning while some do not, and comprehensive experiments confirm the criterion's utility on benchmark tasks
    Abstract Neuro-symbolic hybrid systems are promising for integrating machine learning and symbolic reasoning, where perception models are facilitated with information inferred from a symbolic knowledge base through logical reasoning. Despite empirical evidence showing the ability of hybrid systems to learn accurate perception models, the theoretical understanding of learnability is still lacking. Hence, it remains unclear why a hybrid system succeeds for a specific task and when it may fail given a different knowledge base. In this paper, we introduce a novel way of characterising supervision signals from a knowledge base, and establish a criterion for determining the knowledge's efficacy in facilitating successful learning. This, for the first time, allows us to address the two questions above by inspecting the knowledge base under investigation. Our analysis suggests that many knowledge bases satisfy the criterion, thus enabling effective learning, while some fail to satisfy it, indicating potential failures. Comprehensive experiments confirm the utility of our criterion on benchmark tasks.

Deep Metric Loss for Multimodal Learning

  • paper_url: http://arxiv.org/abs/2308.10486
  • repo_url: None
  • paper_authors: Sehwan Moon, Hyunju Lee
  • for: proposing a novel MultiModal loss for multimodal learning that subgroups instances according to their unimodal contributions, instead of only integrating modalities into one unified representation
  • methods: the loss subgroups instances by the contribution of each modality, preventing inefficient learning caused by overfitting and efficiently optimizing multimodal models; it also generates a reliable per-modality prediction score, which is essential for the subgrouping
  • results: improved classification on synthetic data by subgrouping difficult instances within certain modalities, and empirical gains for recent models on four real multimodal datasets, with ablation studies verifying the loss's effectiveness
    Abstract Multimodal learning often outperforms its unimodal counterparts by exploiting unimodal contributions and cross-modal interactions. However, focusing only on integrating multimodal features into a unified comprehensive representation overlooks the unimodal characteristics. In real data, the contributions of modalities can vary from instance to instance, and they often reinforce or conflict with each other. In this study, we introduce a novel \text{MultiModal} loss paradigm for multimodal learning, which subgroups instances according to their unimodal contributions. \text{MultiModal} loss can prevent inefficient learning caused by overfitting and efficiently optimize multimodal models. On synthetic data, \text{MultiModal} loss demonstrates improved classification performance by subgrouping difficult instances within certain modalities. On four real multimodal datasets, our loss is empirically shown to improve the performance of recent models. Ablation studies verify the effectiveness of our loss. Additionally, we show that our loss generates a reliable prediction score for each modality, which is essential for subgrouping. Our \text{MultiModal} loss is a novel loss function to subgroup instances according to the contribution of modalities in multimodal learning and is applicable to a variety of multimodal models with unimodal decisions. Our code is available at https://github.com/SehwanMoon/MultiModalLoss.

An Effective Method using Phrase Mechanism in Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2308.10482
  • repo_url: https://github.com/phuongnm94/PhraseTransformer
  • paper_authors: Phuong Minh Nguyen, Le Minh Nguyen
  • for: improving Transformer-based Neural Machine Translation (NMT) on the Vietnamese-Chinese parallel corpora
  • methods: a phrase mechanism, PhraseTransformer, built on top of the strong Transformer baseline
  • results: on the MT dataset of the VLSP 2022 competition, BLEU scores of 35.3 for Vietnamese-to-Chinese and 33.2 for Chinese-to-Vietnamese
    Abstract Machine Translation is one of the essential tasks in Natural Language Processing (NLP), which has massive applications in real life as well as contributing to other tasks in the NLP research community. Recently, Transformer -based methods have attracted numerous researchers in this domain and achieved state-of-the-art results in most of the pair languages. In this paper, we report an effective method using a phrase mechanism, PhraseTransformer, to improve the strong baseline model Transformer in constructing a Neural Machine Translation (NMT) system for parallel corpora Vietnamese-Chinese. Our experiments on the MT dataset of the VLSP 2022 competition achieved the BLEU score of 35.3 on Vietnamese to Chinese and 33.2 BLEU scores on Chinese to Vietnamese data. Our code is available at https://github.com/phuongnm94/PhraseTransformer.

Deep Semi-supervised Anomaly Detection with Metapath-based Context Knowledge

  • paper_url: http://arxiv.org/abs/2308.10918
  • repo_url: None
  • paper_authors: Hwan Kim, Junghoon Kim, Byung Suk Lee, Sungsu Lim
  • for: proposing a metapath-based semi-supervised approach to graph anomaly detection that addresses the limitations of previous methods
  • methods: the Metapath-based Semi-supervised Anomaly Detection (MSAD) framework places GCN layers in both the encoder and decoder to efficiently propagate context information between abnormal and normal nodes, and uses metapath-based context information together with a specifically crafted anomaly community to learn differences in structures and attributes both globally and locally
  • results: comprehensive experiments on seven real-world networks demonstrate the superiority of MSAD over state-of-the-art techniques, paving the way for further work on optimizing and analyzing metapath patterns for anomaly detection on attributed networks
    Abstract Graph anomaly detection has attracted considerable attention in recent years. This paper introduces a novel approach that leverages metapath-based semi-supervised learning, addressing the limitations of previous methods. We present a new framework, Metapath-based Semi-supervised Anomaly Detection (MSAD), incorporating GCN layers in both the encoder and decoder to efficiently propagate context information between abnormal and normal nodes. The design of metapath-based context information and a specifically crafted anomaly community enhance the process of learning differences in structures and attributes, both globally and locally. Through a comprehensive set of experiments conducted on seven real-world networks, this paper demonstrates the superiority of the MSAD method compared to state-of-the-art techniques. The promising results of this study pave the way for future investigations, focusing on the optimization and analysis of metapath patterns to further enhance the effectiveness of anomaly detection on attributed networks.

Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.10462
  • repo_url: None
  • paper_authors: Martin Weyssow, Xin Zhou, Kisub Kim, David Lo, Houari Sahraoui
  • for: studying how Large Language Models (LLMs) that generate code from natural-language intents can be specialized to task-specific data with Parameter-Efficient Fine-Tuning (PEFT) when computational resources are scarce
  • methods: a comprehensive study of PEFT techniques for LLMs under the automated code generation scenario, compared against In-Context Learning (ICL), which operates at inference time without learning task-specific parameters
  • results: PEFT techniques outperform ICL across a wide range of LLMs, reducing the computational burden while improving performance, which opens opportunities for broader applications of PEFT in software engineering
    Abstract Large Language Models (LLMs) possess impressive capabilities to generate meaningful code snippets given natural language intents in zero-shot, i.e., without the need for specific fine-tuning. In the perspective of unleashing their full potential, prior work has demonstrated the benefits of fine-tuning the models to task-specific data. However, fine-tuning process demands heavy computational costs and is intractable when resources are scarce, especially for models with billions of parameters. In light of these challenges, previous studies explored In-Context Learning (ICL) as an effective strategy to generate contextually appropriate code without fine-tuning. However, it operates at inference time and does not involve learning task-specific parameters, potentially limiting the model's performance on downstream tasks. In this context, we foresee that Parameter-Efficient Fine-Tuning (PEFT) techniques carry a high potential for efficiently specializing LLMs to task-specific data. In this paper, we deliver a comprehensive study of LLMs with the impact of PEFT techniques under the automated code generation scenario. Our experimental results reveal the superiority and potential of such techniques over ICL on a wide range of LLMs in reducing the computational burden and improving performance. Therefore, the study opens opportunities for broader applications of PEFT in software engineering scenarios.
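
As a hedged illustration of one widely used PEFT technique, the snippet below attaches LoRA adapters to a causal code LLM with the Hugging Face peft library. The checkpoint name, target modules, and rank are placeholders to adapt to the model at hand, and the paper studies a range of PEFT methods rather than this specific configuration.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")  # placeholder
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # architecture-dependent choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)     # freezes the base weights
model.print_trainable_parameters()        # typically well under 1% of weights
```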

Adaptive Local Steps Federated Learning with Differential Privacy Driven by Convergence Analysis

  • paper_url: http://arxiv.org/abs/2308.10457
  • repo_url: None
  • paper_authors: Xinpeng Ling, Jie Fu, Zhili Chen
  • for: protecting sensitive data in federated learning (FL) with differential privacy while respecting resource constraints (privacy budget and communication resources)
  • methods: a convergence analysis of differentially private federated learning (DPFL) under resource constraints, leading to the Adaptive Local Steps Differential Privacy Federated Learning (ALS-DPFL) algorithm
  • results: experiments on the FashionMNIST and CIFAR-10 datasets achieve quite good performance relative to previous work
    Abstract Federated Learning (FL) is a distributed machine learning technique that allows model training among multiple devices or organizations without sharing data. However, while FL ensures that the raw data is not directly accessible to external adversaries, adversaries can still obtain some statistical information about the data through differential attacks. Differential Privacy (DP) has been proposed, which adds noise to the model or gradients to prevent adversaries from inferring private information from the transmitted parameters. We reconsider the framework of differential privacy federated learning in resource-constrained scenarios (privacy budget and communication resources). We analyze the convergence of federated learning with differential privacy (DPFL) on resource-constrained scenarios and propose an Adaptive Local Steps Differential Privacy Federated Learning (ALS-DPFL) algorithm. We experiment our algorithm on the FashionMNIST and Cifar-10 datasets and achieve quite good performance relative to previous work.
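
The toy sketch below shows one federated round with per-client local steps and Gaussian-mechanism noise on clipped gradients. The rule for adapting the number of local steps is the paper's contribution and is abstracted here into a per-client argument; per-batch (rather than per-example) clipping is used for brevity, and all hyperparameters are placeholders.

```python
import torch
import torch.nn.functional as F

def local_update(model, loader, local_steps, lr, clip, noise_mult):
    """Run `local_steps` noisy SGD steps; assumes the loader is long enough."""
    it = iter(loader)
    for _ in range(local_steps):
        x, y = next(it)
        loss = F.cross_entropy(model(x), y)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = min(1.0, clip / (float(norm) + 1e-12))
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                noisy = g * scale + noise_mult * clip * torch.randn_like(g)
                p -= lr * noisy                  # Gaussian-mechanism DP step
    return [p.detach().clone() for p in model.parameters()]

def server_aggregate(client_params):
    """Plain parameter averaging at the aggregation server."""
    return [torch.stack(ps).mean(dim=0) for ps in zip(*client_params)]
```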

DOMINO++: Domain-aware Loss Regularization for Deep Learning Generalizability

  • paper_url: http://arxiv.org/abs/2308.10453
  • repo_url: None
  • paper_authors: Skylar E. Stolte, Kyle Volle, Aprinda Indahlastari, Alejandro Albizu, Adam J. Woods, Kevin Brink, Matthew Hale, Ruogu Fang
  • for: improving the out-of-distribution (OOD) generalization of deep learning (DL) models for reliable deployment in real-world applications
  • methods: DOMINO++, a dual-guidance, dynamic domain-aware loss regularization that integrates expert-guided and data-guided knowledge; unlike its predecessor DOMINO, which imposed a fixed scaling and regularization rate, DOMINO++ uses a dynamic scaling factor and an adaptive regularization rate
  • results: DOMINO++ outperforms the baseline model and DOMINO on OOD data, including synthetic noisy and rotated datasets as well as real data from a different MRI scanner at a separate site, supporting the trustworthy deployment of DL on clinical data
    Abstract Out-of-distribution (OOD) generalization poses a serious challenge for modern deep learning (DL). OOD data consists of test data that is significantly different from the model's training data. DL models that perform well on in-domain test data could struggle on OOD data. Overcoming this discrepancy is essential to the reliable deployment of DL. Proper model calibration decreases the number of spurious connections that are made between model features and class outputs. Hence, calibrated DL can improve OOD generalization by only learning features that are truly indicative of the respective classes. Previous work proposed domain-aware model calibration (DOMINO) to improve DL calibration, but it lacks designs for model generalizability to OOD data. In this work, we propose DOMINO++, a dual-guidance and dynamic domain-aware loss regularization focused on OOD generalizability. DOMINO++ integrates expert-guided and data-guided knowledge in its regularization. Unlike DOMINO which imposed a fixed scaling and regularization rate, DOMINO++ designs a dynamic scaling factor and an adaptive regularization rate. Comprehensive evaluations compare DOMINO++ with DOMINO and the baseline model for head tissue segmentation from magnetic resonance images (MRIs) on OOD data. The OOD data consists of synthetic noisy and rotated datasets, as well as real data using a different MRI scanner from a separate site. DOMINO++'s superior performance demonstrates its potential to improve the trustworthy deployment of DL on real clinical data.

PACS: Prediction and analysis of cancer subtypes from multi-omics data based on a multi-head attention mechanism model

  • paper_url: http://arxiv.org/abs/2308.10917
  • repo_url: None
  • paper_authors: Liangrui Pan, Dazheng Liu, Zhichao Feng, Wenjuan Liu, Shaoliang Peng
  • for: accurately classifying cancer subtypes, which differ substantially in multi-omic data and clinical characteristics, so that doctors can choose the most appropriate treatment, improve outcomes, and obtain more accurate survival predictions
  • methods: a supervised multi-head attention mechanism model (SMA) whose attention mechanism and feature-sharing module learn the global and local feature information of multi-omics data, with multi-head attention encoders from a Siamese setup deeply fused through a fusion module to enrich the model's parameters
  • results: validated by extensive experiments, SMA achieves the highest accuracy, macro F1, and weighted F1 for cancer subtype classification on simulated, single-cell, and cancer multi-omics datasets compared with AE-, CNN-, and GNN-based models
    Abstract Due to the high heterogeneity and clinical characteristics of cancer, there are significant differences in multi-omic data and clinical characteristics among different cancer subtypes. Therefore, accurate classification of cancer subtypes can help doctors choose the most appropriate treatment options, improve treatment outcomes, and provide more accurate patient survival predictions. In this study, we propose a supervised multi-head attention mechanism model (SMA) to classify cancer subtypes successfully. The attention mechanism and feature sharing module of the SMA model can successfully learn the global and local feature information of multi-omics data. Second, it enriches the parameters of the model by deeply fusing multi-head attention encoders from a Siamese setup through the fusion module. Validated by extensive experiments, the SMA model achieves the highest accuracy, macro F1, and weighted F1, and accurately classifies cancer subtypes on simulated, single-cell, and cancer multi-omics datasets compared to AE-, CNN-, and GNN-based models. Therefore, we contribute to future research on multi-omics data with our attention-based approach.

CVFC: Attention-Based Cross-View Feature Consistency for Weakly Supervised Semantic Segmentation of Pathology Images

  • paper_url: http://arxiv.org/abs/2308.10449
  • repo_url: None
  • paper_authors: Liangrui Pan, Lian Wang, Zhichao Feng, Liwen Xu, Shaoliang Peng
  • for: generating pseudo-masks for weakly supervised semantic segmentation of histopathology images from image-level labels, reducing the need for fine-grained mask annotation
  • methods: CVFC, an attention-based cross-view feature-consistency framework: a three-branch joint design (two ResNet38 branches and one ResNet50 branch) in which each branch generates a class activation map (CAM) from multi-scale integrated feature maps; the middle branch projects the feature matrix into query and key spaces to produce a feature-space perception matrix that adjusts and refines each branch's CAM, and the branches are co-trained with a feature-consistency loss and a feature-cross loss
  • results: an IoU of 0.7122 and an fwIoU of 0.7018 on the WSSS4LUAD dataset, outperforming HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM
    Abstract Histopathology image segmentation is the gold standard for diagnosing cancer and can indicate cancer prognosis. However, it requires high-quality masks, so many studies now use image-level labels to achieve pixel-level segmentation, reducing the need for fine-grained annotation. To solve this problem, we propose an attention-based cross-view feature consistency end-to-end pseudo-mask generation framework named CVFC. Specifically, CVFC is a three-branch joint framework composed of two ResNet38 branches and one ResNet50 branch, where each independent branch produces a multi-scale integrated feature map to generate a class activation map (CAM); in each branch, down-sampling and expansion adjust the size of the CAM; the middle branch projects the feature matrix into the query and key feature spaces and generates a feature-space perception matrix through the connection layer and inner product to adjust and refine the CAM of each branch; finally, the parameters of CVFC are optimized in co-training mode through the feature consistency loss and feature cross loss. After a large number of experiments, an IoU of 0.7122 and an fwIoU of 0.7018 are obtained on the WSSS4LUAD dataset, outperforming HistoSegNet, SEAM, C-CAM, WSSS-Tissue, and OEEM, respectively.

DySuse: Susceptibility Estimation in Dynamic Social Networks

  • paper_url: http://arxiv.org/abs/2308.10442
  • repo_url: None
  • paper_authors: Yingdan Shi, Jingya Zhou, Congcong Zhang
  • for: estimating, for each user, the probability of being influenced (susceptibility estimation) in dynamic social networks, a finer-grained task than predicting the total influence spread
  • methods: the DySuse framework, built on dynamic graph embedding: a structural feature module independently captures the structural information of influence diffusion on each graph snapshot, a progressive mechanism tightly couples structural and temporal information during diffusion, and a self-attention block captures temporal dependency by flexibly weighting historical timestamps
  • results: experiments show satisfactory prediction performance under multiple influence diffusion models, outperforming existing dynamic graph embedding models
    Abstract Influence estimation aims to predict the total influence spread in social networks and has received surged attention in recent years. Most current studies focus on estimating the total number of influenced users in a social network, and neglect susceptibility estimation that aims to predict the probability of each user being influenced from the individual perspective. As a more fine-grained estimation task, susceptibility estimation is full of attractiveness and practical value. Based on the significance of susceptibility estimation and dynamic properties of social networks, we propose a task, called susceptibility estimation in dynamic social networks, which is even more realistic and valuable in real-world applications. Susceptibility estimation in dynamic networks has yet to be explored, and it is computationally intractable to naively adopt Monte Carlo simulation to obtain the results. To this end, we propose a novel end-to-end framework DySuse based on dynamic graph embedding technology. Specifically, we leverage a structural feature module to independently capture the structural information of influence diffusion on each single graph snapshot. Besides, we propose a progressive mechanism according to the property of influence diffusion, to couple the structural and temporal information during diffusion tightly. Moreover, a self-attention block is designed to further capture temporal dependency by flexibly weighting historical timestamps. Experimental results show that our framework is superior to the existing dynamic graph embedding models and has satisfactory prediction performance in multiple influence diffusion models.

Approximately Equivariant Graph Networks

  • paper_url: http://arxiv.org/abs/2308.10436
  • repo_url: https://github.com/nhuang37/approx_equivariant_graph_nets
  • paper_authors: Ningyuan Huang, Ron Levie, Soledad Villar
  • for: studying the active symmetries of graph neural networks (GNNs) when learning signals supported on a fixed graph, where the natural symmetries are the automorphisms of the graph
  • methods: since real-world graphs tend to be asymmetric, the notion of symmetry is relaxed to approximate symmetries formalized via graph coarsening; a bias-variance formula quantifies the trade-off between lost expressivity and gained regularity of the learned estimator as a function of the chosen symmetry group
  • results: theory and experiments on image inpainting, traffic flow prediction, and human pose estimation show that the best generalization is achieved by choosing a group suitably larger than the graph automorphism group but smaller than the full permutation group
    Abstract Graph neural networks (GNNs) are commonly described as being permutation equivariant with respect to node relabeling in the graph. This symmetry of GNNs is often compared to the translation equivariance symmetry of Euclidean convolution neural networks (CNNs). However, these two symmetries are fundamentally different: The translation equivariance of CNNs corresponds to symmetries of the fixed domain acting on the image signal (sometimes known as active symmetries), whereas in GNNs any permutation acts on both the graph signals and the graph domain (sometimes described as passive symmetries). In this work, we focus on the active symmetries of GNNs, by considering a learning setting where signals are supported on a fixed graph. In this case, the natural symmetries of GNNs are the automorphisms of the graph. Since real-world graphs tend to be asymmetric, we relax the notion of symmetries by formalizing approximate symmetries via graph coarsening. We present a bias-variance formula that quantifies the tradeoff between the loss in expressivity and the gain in the regularity of the learned estimator, depending on the chosen symmetry group. To illustrate our approach, we conduct extensive experiments on image inpainting, traffic flow prediction, and human pose estimation with different choices of symmetries. We show theoretically and empirically that the best generalization performance can be achieved by choosing a suitably larger group than the graph automorphism group, but smaller than the full permutation group.

Federated Learning Robust to Byzantine Attacks: Achieving Zero Optimality Gap

  • paper_url: http://arxiv.org/abs/2308.10427
  • repo_url: None
  • paper_authors: Shiyuan Zuo, Rongfei Fan, Han Hu, Ning Zhang, Shimin Gong
  • for: proposing a robust aggregation method for federated learning (FL) that effectively withstands malicious Byzantine attacks
  • methods: each user first updates its model parameters over multiple local steps, adjustable across iterations, and pushes them directly to the aggregation center, which reduces center-user interactions, lets each user set training parameters flexibly, and lowers computation compared with existing works that must combine multiple historical model parameters; the aggregation center combines the received parameters with the geometric median
  • results: rigorous proof shows zero optimality gap with linear convergence as long as the fraction of Byzantine attackers is below one half, and numerical results verify the method's effectiveness
    Abstract In this paper, we propose a robust aggregation method for federated learning (FL) that can effectively tackle malicious Byzantine attacks. At each user, model parameter is firstly updated by multiple steps, which is adjustable over iterations, and then pushed to the aggregation center directly. This decreases the number of interactions between the aggregation center and users, allows each user to set training parameter in a flexible way, and reduces computation burden compared with existing works that need to combine multiple historical model parameters. At the aggregation center, geometric median is leveraged to combine the received model parameters from each user. Rigorous proof shows that zero optimality gap is achieved by our proposed method with linear convergence, as long as the fraction of Byzantine attackers is below half. Numerical results verify the effectiveness of our proposed method.
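
The aggregation step itself is easy to sketch: the server combines the flattened client models with the geometric median, which down-weights Byzantine outliers because no single point can drag the median far. Weiszfeld's fixed-point iteration below is a standard way to compute it; the iteration count and tolerance are placeholders.

```python
import numpy as np

def geometric_median(points, iters=100, tol=1e-7):
    """points: (m, d) array, one flattened parameter vector per client."""
    z = points.mean(axis=0)                      # initialize at the mean
    for _ in range(iters):
        dist = np.linalg.norm(points - z, axis=1)
        w = 1.0 / np.maximum(dist, 1e-12)        # inverse-distance weights
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z
```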
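
The aggregation rule at the center is the geometric median, which a minority of arbitrary (Byzantine) updates cannot drag far. Below is a minimal NumPy sketch using Weiszfeld's algorithm; the simulated honest/Byzantine updates are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def geometric_median(points, n_iter=100, eps=1e-8):
    """Weiszfeld's algorithm for the geometric median of row vectors.

    Used as a robust aggregation rule: the median moves little even when
    a minority of the rows (Byzantine updates) are arbitrary.
    """
    points = np.asarray(points, dtype=float)
    z = points.mean(axis=0)  # initialize at the (non-robust) mean
    for _ in range(n_iter):
        d = np.linalg.norm(points - z, axis=1)
        d = np.maximum(d, eps)          # avoid division by zero
        w = 1.0 / d
        z_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < eps:
            break
        z = z_new
    return z

# 7 honest updates near the optimum, 3 Byzantine outliers
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(7, 4))
byzantine = rng.normal(loc=50.0, scale=1.0, size=(3, 4))
updates = np.vstack([honest, byzantine])
print("mean:  ", updates.mean(axis=0))       # dragged by attackers
print("median:", geometric_median(updates))  # stays near 1.0
```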

Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2308.10425
  • repo_url: https://github.com/xdzhelheim/staeformer
  • paper_authors: Hangchen Liu, Zheng Dong, Renhe Jiang, Jiewen Deng, Jinliang Deng, Quanjun Chen, Xuan Song
  • for: Proposes an embedding-based transformer that improves performance on the traffic forecasting task.
  • methods: Introduces a new component, the spatio-temporal adaptive embedding, which helps a vanilla transformer capture the spatio-temporal relations in traffic time series.
  • results: STAEformer achieves state-of-the-art performance on five real-world traffic forecasting datasets; further experiments show that the spatio-temporal adaptive embedding plays a crucial role by effectively capturing intrinsic spatio-temporal relations and chronological information.
    Abstract With the rapid development of the Intelligent Transportation System (ITS), accurate traffic forecasting has emerged as a critical challenge. The key bottleneck lies in capturing the intricate spatio-temporal traffic patterns. In recent years, numerous neural networks with complicated architectures have been proposed to address this issue. However, the advancements in network architectures have encountered diminishing performance gains. In this study, we present a novel component called spatio-temporal adaptive embedding that can yield outstanding results with vanilla transformers. Our proposed Spatio-Temporal Adaptive Embedding transformer (STAEformer) achieves state-of-the-art performance on five real-world traffic forecasting datasets. Further experiments demonstrate that spatio-temporal adaptive embedding plays a crucial role in traffic forecasting by effectively capturing intrinsic spatio-temporal relations and chronological information in traffic time series.
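
A toy PyTorch sketch of the core idea: attach a learnable embedding indexed by (time step, node) to the projected input so that a vanilla transformer can tell nodes and steps apart. Tensor shapes, the concatenation scheme, and the dimension names are assumptions for illustration; see the authors' repository for the actual implementation.

```python
import torch
import torch.nn as nn

class STAdaptiveEmbedding(nn.Module):
    """Minimal sketch of a spatio-temporal adaptive embedding.

    A learnable tensor indexed by (time step, node) is concatenated to
    the projected input before feeding a vanilla transformer.
    """
    def __init__(self, num_steps, num_nodes, in_dim, feat_dim, adp_dim):
        super().__init__()
        self.input_proj = nn.Linear(in_dim, feat_dim)
        # one learnable vector per (step, node) pair, shared across batches
        self.adaptive = nn.Parameter(torch.empty(num_steps, num_nodes, adp_dim))
        nn.init.xavier_uniform_(self.adaptive)

    def forward(self, x):                      # x: (B, T, N, in_dim)
        h = self.input_proj(x)                 # (B, T, N, feat_dim)
        adp = self.adaptive.expand(x.size(0), -1, -1, -1)
        return torch.cat([h, adp], dim=-1)     # (B, T, N, feat_dim + adp_dim)

emb = STAdaptiveEmbedding(num_steps=12, num_nodes=207, in_dim=3,
                          feat_dim=24, adp_dim=80)
tokens = emb(torch.randn(8, 12, 207, 3))
print(tokens.shape)  # torch.Size([8, 12, 207, 104])
```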

TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition

  • paper_url: http://arxiv.org/abs/2308.10415
  • repo_url: None
  • paper_authors: Hakan Erdogan, Scott Wisdom, Xuankai Chang, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
  • for: Proposes a token-sequence-based speech separation model that separates multiple speech sources while also performing speech recognition and speech generation.
  • methods: A sequence-to-sequence encoder-decoder model built on the Transformer architecture, trained on multiple tasks simultaneously through masking of inputs; a "refinement" variant predicts enhanced audio tokens from the output of a conventional separation model.
  • results: Achieves excellent separation performance on objective metrics and subjective MUSHRA listening tests, with or without transcript conditioning; the model also delivers solid automatic speech recognition (ASR) performance and speech synthesis samples.
    Abstract We present TokenSplit, a speech separation model that acts on discrete token sequences. The model is trained on multiple tasks simultaneously: separate and transcribe each speech source, and generate speech from text. The model operates on transcripts and audio token sequences and achieves multiple tasks through masking of inputs. The model is a sequence-to-sequence encoder-decoder model that uses the Transformer architecture. We also present a "refinement" version of the model that predicts enhanced audio tokens from the audio tokens of speech separated by a conventional separation model. Using both objective metrics and subjective MUSHRA listening tests, we show that our model achieves excellent performance in terms of separation, both with or without transcript conditioning. We also measure the automatic speech recognition (ASR) performance and provide audio samples of speech synthesis to demonstrate the additional utility of our model.

Federated Learning for Connected and Automated Vehicles: A Survey of Existing Approaches and Challenges

  • paper_url: http://arxiv.org/abs/2308.10407
  • repo_url: None
  • paper_authors: Vishnu Pandi Chellapandi, Liangqi Yuan, Christopher G. Brinton, Stanislaw H Zak, Ziran Wang
  • for: Surveys the advancements made in applying Federated Learning (FL) to Connected and Automated Vehicles (CAV).
  • methods: Analyzes centralized and decentralized FL frameworks and their key characteristics and methodologies, and reviews the data sources, models, and data security techniques relevant to FL in CAVs, emphasizing their importance for preserving vehicle data privacy and confidentiality.
  • results: Reviews specific and important FL applications in CAVs along with the base models and datasets they employ, then lists the existing challenges for FL4CAV and discusses potential directions for future work to further improve the effectiveness and efficiency of FL for CAVs.
    Abstract Machine learning (ML) is widely used for key tasks in Connected and Automated Vehicles (CAV), including perception, planning, and control. However, its reliance on vehicular data for model training presents significant challenges related to in-vehicle user privacy and communication overhead generated by massive data volumes. Federated learning (FL) is a decentralized ML approach that enables multiple vehicles to collaboratively develop models, broadening learning from various driving environments, enhancing overall performance, and simultaneously securing local vehicle data privacy and security. This survey paper presents a review of the advancements made in the application of FL for CAV (FL4CAV). First, centralized and decentralized frameworks of FL are analyzed, highlighting their key characteristics and methodologies. Second, diverse data sources, models, and data security techniques relevant to FL in CAVs are reviewed, emphasizing their significance in ensuring privacy and confidentiality. Third, specific and important applications of FL are explored, providing insight into the base models and datasets employed for each application. Finally, existing challenges for FL4CAV are listed and potential directions for future work are discussed to further enhance the effectiveness and efficiency of FL in the context of CAV.

Label Selection Approach to Learning from Crowds

  • paper_url: http://arxiv.org/abs/2308.10396
  • repo_url: https://github.com/ssatsuki/label-selection-layer
  • paper_authors: Kosuke Yoshimura, Hisashi Kashima
  • for: Aims to improve the accuracy of supervised learning from noisy crowdsourced labels by proposing a Label Selection Layer, inspired by SelectiveNet, that automatically decides whether each worker's label should be used for training.
  • methods: A selector network determines, per worker, whether a label is used for training; the approach applies to almost any supervised learning problem by adding the selector network and changing the objective function of an existing model.
  • results: Experiments show the proposed method performs comparably to or better than the Crowd Layer in most cases, with the exception of regression problems.
    Abstract Supervised learning, especially supervised deep learning, requires large amounts of labeled data. One approach to collect large amounts of labeled data is by using a crowdsourcing platform where numerous workers perform the annotation tasks. However, the annotation results often contain label noise, as the annotation skills vary depending on the crowd workers and their ability to complete the task correctly. Learning from Crowds is a framework which directly trains the models using noisy labeled data from crowd workers. In this study, we propose a novel Learning from Crowds model, inspired by SelectiveNet proposed for the selective prediction problem. The proposed method called Label Selection Layer trains a prediction model by automatically determining whether to use a worker's label for training using a selector network. A major advantage of the proposed method is that it can be applied to almost all variants of supervised learning problems by simply adding a selector network and changing the objective function for existing models, without explicitly assuming a model of the noise in crowd annotations. The experimental results show that the performance of the proposed method is almost equivalent to or better than the Crowd Layer, which is one of the state-of-the-art methods for Deep Learning from Crowds, except for the regression problem case.
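
A hedged sketch of the idea: a selector network emits, for each (example, worker) pair, the probability of using that worker's label, and the training loss weights each label accordingly, with a coverage penalty that stops the selector from discarding everything. The architecture and loss form below are assumptions inspired by SelectiveNet, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelSelectionLayer(nn.Module):
    """Selector that gates noisy crowd labels during training (sketch)."""
    def __init__(self, feat_dim, num_workers):
        super().__init__()
        self.selector = nn.Linear(feat_dim, num_workers)

    def forward(self, features):
        return torch.sigmoid(self.selector(features))  # (B, num_workers)

def crowd_loss(logits, worker_labels, select_prob, target_coverage=0.8):
    # worker_labels: (B, W) class ids, -1 where a worker gave no label
    B, W = worker_labels.shape
    mask = (worker_labels >= 0).float()
    per_label = F.cross_entropy(
        logits.unsqueeze(1).expand(-1, W, -1).reshape(B * W, -1),
        worker_labels.clamp(min=0).reshape(B * W),
        reduction="none").reshape(B, W)
    weighted = (per_label * select_prob * mask).sum() / mask.sum().clamp(min=1)
    coverage = (select_prob * mask).sum() / mask.sum().clamp(min=1)
    return weighted + F.relu(target_coverage - coverage) ** 2

feats = torch.randn(16, 32)              # backbone features
logits = torch.randn(16, 5)              # downstream classifier output
labels = torch.randint(-1, 5, (16, 4))   # 4 workers, -1 = no label
sel = LabelSelectionLayer(32, 4)
loss = crowd_loss(logits, labels, sel(feats))
loss.backward()
```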

DiffPrep: Differentiable Data Preprocessing Pipeline Search for Learning over Tabular Data

  • paper_url: http://arxiv.org/abs/2308.10915
  • repo_url: https://github.com/chu-data-lab/diffprep
  • paper_authors: Peng Li, Zhiyi Chen, Xu Chu, Kexin Rong
  • for: Improves machine learning model performance by automatically searching for a data preprocessing pipeline.
  • methods: Relaxes the discrete, non-differentiable search space of preprocessing pipelines into a continuous and differentiable one, so the pipeline search can be performed with gradient descent while training the ML model only once.
  • results: Achieves the best test accuracy on 15 of the 18 real-world datasets evaluated and improves model test accuracy by up to 6.6 percentage points.
    Abstract Data preprocessing is a crucial step in the machine learning process that transforms raw data into a more usable format for downstream ML models. However, it can be costly and time-consuming, often requiring the expertise of domain experts. Existing automated machine learning (AutoML) frameworks claim to automate data preprocessing. However, they often use a restricted search space of data preprocessing pipelines which limits the potential performance gains, and they are often too slow as they require training the ML model multiple times. In this paper, we propose DiffPrep, a method that can automatically and efficiently search for a data preprocessing pipeline for a given tabular dataset and a differentiable ML model such that the performance of the ML model is maximized. We formalize the problem of data preprocessing pipeline search as a bi-level optimization problem. To solve this problem efficiently, we transform and relax the discrete, non-differential search space into a continuous and differentiable one, which allows us to perform the pipeline search using gradient descent with training the ML model only once. Our experiments show that DiffPrep achieves the best test accuracy on 15 out of the 18 real-world datasets evaluated and improves the model's test accuracy by up to 6.6 percentage points.
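
One way to make the discrete pipeline choice differentiable, sketched below under simplifying assumptions: apply every candidate transform and mix the outputs with softmax weights, so the architecture parameters `alpha` can be learned by gradient descent together with the downstream model. The candidate set and the single-level relaxation are illustrative; the paper formulates a bi-level problem.

```python
import torch
import torch.nn as nn

class DifferentiablePrepStep(nn.Module):
    """Relax a discrete choice of preprocessing operator (sketch)."""
    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms
        self.alpha = nn.Parameter(torch.zeros(len(transforms)))

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * t(x) for wi, t in zip(w, self.transforms))

# Candidate normalizations for a numeric batch x of shape (B, D)
candidates = [
    lambda x: x,                                             # identity
    lambda x: (x - x.mean(0)) / (x.std(0) + 1e-8),           # standardize
    lambda x: (x - x.min(0).values) /
              (x.max(0).values - x.min(0).values + 1e-8),    # min-max
]
step = DifferentiablePrepStep(candidates)
out = step(torch.randn(32, 5))
print(out.shape)  # torch.Size([32, 5])
```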

Unsupervised Opinion Aggregation – A Statistical Perspective

  • paper_url: http://arxiv.org/abs/2308.10386
  • repo_url: None
  • paper_authors: Noyan C. Sevuktekin, Andrew C. Singer
  • for: Decision-makers who rely on opinions from multiple experts to make complex decisions but have limited or no access to the ground truth.
  • methods: A statistical approach that infers the competence of each expert from their opinions alone, without any need for the ground truth: competence is measured by an expert's likeliness to agree with their peers, which is leveraged to build a completely unsupervised version of the naïve Bayes classifier.
  • results: The proposed technique is shown to be asymptotically optimal for a large class of problems, and can be applied to online opinion aggregation and to decision-making based on a limited number of opinions.
    Abstract Complex decision-making systems rarely have direct access to the current state of the world and they instead rely on opinions to form an understanding of what the ground truth could be. Even in problems where experts provide opinions without any intention to manipulate the decision maker, it is challenging to decide which expert's opinion is more reliable -- a challenge that is further amplified when the decision-maker has limited, delayed, or no access to the ground truth after the fact. This paper explores a statistical approach to infer the competence of each expert based on their opinions without any need for the ground truth. Echoing the logic behind what is commonly referred to as "the wisdom of crowds", we propose measuring the competence of each expert by their likeliness to agree with their peers. We further show that the more reliable an expert is the more likely it is that they agree with their peers. We leverage this fact to propose a completely unsupervised version of the naïve Bayes classifier and show that the proposed technique is asymptotically optimal for a large class of problems. In addition to aggregating a large block of opinions, we further apply our technique for online opinion aggregation and for decision-making based on a limited number of opinions.
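
The peer-agreement idea is easy to state in code: with no ground truth, score each expert by its average agreement with the other experts, then weight votes by that score. The sketch below is a simplified stand-in for the paper's unsupervised naive Bayes rule; the simulated expert accuracies are assumptions for the demo.

```python
import numpy as np

def peer_agreement_competence(opinions):
    """Estimate each expert's reliability from pairwise agreement alone.

    opinions: (num_experts, num_items) array of binary labels in {0, 1}.
    No ground truth is used: an expert's score is its average agreement
    with every other expert.
    """
    agree = (opinions[:, None, :] == opinions[None, :, :]).mean(axis=2)
    np.fill_diagonal(agree, np.nan)
    return np.nanmean(agree, axis=1)  # one score per expert

def weighted_vote(opinions, competence):
    # Simple aggregation: weight each vote by estimated competence
    # (a stand-in for the paper's unsupervised naive Bayes rule).
    w = competence[:, None]
    return ((opinions * w).sum(axis=0) / w.sum() > 0.5).astype(int)

rng = np.random.default_rng(1)
truth = rng.integers(0, 2, size=200)
acc = np.array([0.9, 0.85, 0.8, 0.55])  # unknown to the aggregator
ops = np.array([np.where(rng.random(200) < a, truth, 1 - truth) for a in acc])
comp = peer_agreement_competence(ops)
print(comp)                                  # ordering tracks true accuracy
print((weighted_vote(ops, comp) == truth).mean())
```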

Automated mapping of virtual environments with visual predictive coding

  • paper_url: http://arxiv.org/abs/2308.10913
  • repo_url: None
  • paper_authors: James Gornet, Matthew Thomson
  • for: Investigates how the brain can construct internal cognitive maps directly from sensory inputs, and whether a single mapping strategy can generalize to auditory, sensorimotor, and linguistic inputs.
  • methods: Uses predictive coding to build spatial maps from sensory data: an agent navigates a virtual environment while performing visual predictive coding with a self-attention-equipped convolutional neural network.
  • results: While learning a next-image prediction task, the agent automatically constructs an internal representation of the environment that quantitatively reflects distances and lets it pinpoint its location relative to landmarks from visual information alone; the resulting vectorized encoding supports vector navigation, with individual latent units delineating localized, overlapping neighborhoods.
    Abstract Humans construct internal cognitive maps of their environment directly from sensory inputs without access to a system of explicit coordinates or distance measurements. While machine learning algorithms like SLAM utilize specialized visual inference procedures to identify visual features and construct spatial maps from visual and odometry data, the general nature of cognitive maps in the brain suggests a unified mapping algorithmic strategy that can generalize to auditory, tactile, and linguistic inputs. Here, we demonstrate that predictive coding provides a natural and versatile neural network algorithm for constructing spatial maps using sensory data. We introduce a framework in which an agent navigates a virtual environment while engaging in visual predictive coding using a self-attention-equipped convolutional neural network. While learning a next image prediction task, the agent automatically constructs an internal representation of the environment that quantitatively reflects distances. The internal map enables the agent to pinpoint its location relative to landmarks using only visual information.The predictive coding network generates a vectorized encoding of the environment that supports vector navigation where individual latent space units delineate localized, overlapping neighborhoods in the environment. Broadly, our work introduces predictive coding as a unified algorithmic framework for constructing cognitive maps that can naturally extend to the mapping of auditory, sensorimotor, and linguistic inputs.
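
A minimal sketch of the training signal: a network compresses a window of recent frames and predicts the next one, and the latent code it learns in doing so is what ends up reflecting spatial structure. The plain convolutional encoder-decoder below (no self-attention) and all sizes are simplifying assumptions.

```python
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    """Predictive-coding sketch: predict the next visual observation."""
    def __init__(self, context=4, latent_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(context, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, frames):          # frames: (B, context, H, W)
        z = self.encoder(frames)        # latent map used as the "map" code
        return self.decoder(z), z

model = NextFramePredictor()
frames = torch.randn(2, 4, 64, 64)     # grayscale observation window
pred, latent = model(frames)
loss = nn.functional.mse_loss(pred, torch.randn(2, 1, 64, 64))  # next frame
print(pred.shape, latent.shape)
```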

HoSNN: Adversarially-Robust Homeostatic Spiking Neural Networks with Adaptive Firing Thresholds

  • paper_url: http://arxiv.org/abs/2308.10373
  • repo_url: None
  • paper_authors: Hejia Geng, Peng Li
  • for: Defending spiking neural network (SNN) models against adversarial attacks.
  • methods: A bio-inspired approach based on neural homeostasis: a threshold-adapting leaky integrate-and-fire (TA-LIF) neuron with a self-stabilizing dynamic thresholding mechanism is developed and used to construct the adversarially robust homeostatic SNN (HoSNN).
  • results: On CIFAR-10, without explicit adversarial training, accuracy under FGSM and PGD attacks improves to 72.6% and 54.19% (up from 20.97% and 0.6%); with minimal FGSM adversarial training, HoSNNs surpass previous models by 29.99% under FGSM and 47.83% under PGD, demonstrating strong robustness.
    Abstract Spiking neural networks (SNNs) offer promise for efficient and powerful neurally inspired computation. Common to other types of neural networks, however, SNNs face the severe issue of vulnerability to adversarial attacks. We present the first study that draws inspiration from neural homeostasis to develop a bio-inspired solution that counters the susceptibilities of SNNs to adversarial onslaughts. At the heart of our approach is a novel threshold-adapting leaky integrate-and-fire (TA-LIF) neuron model, which we adopt to construct the proposed adversarially robust homeostatic SNN (HoSNN). Distinct from traditional LIF models, our TA-LIF model incorporates a self-stabilizing dynamic thresholding mechanism, curtailing adversarial noise propagation and safeguarding the robustness of HoSNNs in an unsupervised manner. Theoretical analysis is presented to shed light on the stability and convergence properties of the TA-LIF neurons, underscoring their superior dynamic robustness under input distributional shifts over traditional LIF neurons. Remarkably, without explicit adversarial training, our HoSNNs demonstrate inherent robustness on CIFAR-10, with accuracy improvements to 72.6% and 54.19% against FGSM and PGD attacks, up from 20.97% and 0.6%, respectively. Furthermore, with minimal FGSM adversarial training, our HoSNNs surpass previous models by 29.99% under FGSM and 47.83% under PGD attacks on CIFAR-10. Our findings offer a new perspective on harnessing biological principles for bolstering SNNs adversarial robustness and defense, paving the way to more resilient neuromorphic computing.
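
A sketch of a single TA-LIF update step, assuming illustrative constants: the membrane potential leaks and integrates input, a spike fires when the potential crosses a per-neuron threshold, and the threshold rises on spiking while slowly decaying back toward a resting value, damping runaway (adversarially injected) activity.

```python
import torch

def ta_lif_step(v, theta, x, tau_v=0.9, tau_theta=0.99,
                theta_rest=1.0, beta=0.1):
    """One step of a threshold-adapting leaky integrate-and-fire neuron.

    v leaks and integrates input x; a spike fires when v crosses the
    per-neuron threshold theta, which then increases and slowly decays
    back toward theta_rest. Constants are illustrative, not the paper's.
    """
    v = tau_v * v + x
    spike = (v >= theta).float()
    v = v * (1.0 - spike)                       # hard reset on spike
    theta = tau_theta * theta + (1 - tau_theta) * theta_rest + beta * spike
    return v, theta, spike

v = torch.zeros(8)
theta = torch.full((8,), 1.0)
for t in range(5):
    v, theta, s = ta_lif_step(v, theta, torch.rand(8))
    print(t, s.sum().item(), theta.mean().item())
```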

Developing a Machine Learning-Based Clinical Decision Support Tool for Uterine Tumor Imaging

  • paper_url: http://arxiv.org/abs/2308.10372
  • repo_url: None
  • paper_authors: Darryl E. Wright, Adriana V. Gregory, Deema Anaam, Sepideh Yadollahi, Sumana Ramanathan, Kafayat A. Oyemade, Reem Alsibai, Heather Holmes, Harrison Gottlich, Cherie-Akilah G. Browne, Sarah L. Cohen Rassier, Isabel Green, Elizabeth A. Stewart, Hiroaki Takahashi, Bohyun Kim, Shannon Laughlin-Tommaso, Timothy L. Kline
  • for: Develop an automated method for 3D segmentation of the uterus and uterine tumors (UTs) that approaches human-level performance with fewer than 150 annotated images.
  • methods: A deep learning approach based on nnU-Net; the effect of training set size on performance is explored by randomly generating subsets with 25, 45, 65, and 85 training images, and radiomic features are evaluated for distinguishing between types of UTs, individually and combined through feature selection and machine learning.
  • results: A test set F1-score of 0.80 for classifying degenerated leiomyoma (LM) from leiomyosarcoma (LMS), and F1-scores of 0.53 and 0.80 for the benign-versus-malignant and degenerated-LM-versus-LMS tasks, respectively; reliable automatic differentiation of UTs remains a challenge.
    Abstract Uterine leiomyosarcoma (LMS) is a rare but aggressive malignancy. On imaging, it is difficult to differentiate LMS from, for example, degenerated leiomyoma (LM), a prevalent but benign condition. We curated a data set of 115 axial T2-weighted MRI images from 110 patients (mean [range] age=45 [17-81] years) with UTs that included five different tumor types. These data were randomly split stratifying on tumor volume into training (n=85) and test sets (n=30). An independent second reader (reader 2) provided manual segmentations for all test set images. To automate segmentation, we applied nnU-Net and explored the effect of training set size on performance by randomly generating subsets with 25, 45, 65 and 85 training set images. We evaluated the ability of radiomic features to distinguish between types of UT individually and when combined through feature selection and machine learning. Using the entire training set the mean [95% CI] fibroid DSC was measured as 0.87 [0.59-1.00] and the agreement between the two readers was 0.89 [0.77-1.0] on the test set. When classifying degenerated LM from LMS we achieve a test set F1-score of 0.80. Classifying UTs based on radiomic features we identify classifiers achieving F1-scores of 0.53 [0.45, 0.61] and 0.80 [0.80, 0.80] on the test set for the benign versus malignant, and degenerated LM versus LMS tasks. We show that it is possible to develop an automated method for 3D segmentation of the uterus and UT that is close to human-level performance with fewer than 150 annotated images. For distinguishing UT types, while we train models that merit further investigation with additional data, reliable automatic differentiation of UTs remains a challenge.

SE(3) Equivariant Augmented Coupling Flows

  • paper_url: http://arxiv.org/abs/2308.10364
  • repo_url: https://github.com/lollcat/se3-augmented-coupling-flows
  • paper_authors: Laurence I. Midgley, Vincent Stimper, Javier Antorán, Emile Mathieu, Bernhard Schölkopf, José Miguel Hernández-Lobato
  • for: Proposes a coupling flow that preserves the SE(3) and permutation symmetries of physical systems for probabilistic modeling.
  • methods: Performs coordinate splits along additional augmented dimensions, mapping atom positions into learned SE(3)-invariant bases where standard flow transformations, such as monotonic rational-quadratic splines, are applied before returning to the original basis.
  • results: The flow retains fast sampling and density evaluation and yields unbiased importance-sampling estimates of expectations under the target distribution. Trained on the DW4, LJ13, and QM9-positional datasets, it is competitive with equivariant continuous normalizing flows while sampling two orders of magnitude faster, is the first to learn the full Boltzmann distribution of alanine dipeptide by modeling only the Cartesian positions of its atoms, and can be trained to approximately sample the Boltzmann distributions of the DW4 and LJ13 particle systems from their energy functions alone.
    Abstract Coupling normalizing flows allow for fast sampling and density evaluation, making them the tool of choice for probabilistic modeling of physical systems. However, the standard coupling architecture precludes endowing flows that operate on the Cartesian coordinates of atoms with the SE(3) and permutation invariances of physical systems. This work proposes a coupling flow that preserves SE(3) and permutation equivariance by performing coordinate splits along additional augmented dimensions. At each layer, the flow maps atoms' positions into learned SE(3) invariant bases, where we apply standard flow transformations, such as monotonic rational-quadratic splines, before returning to the original basis. Crucially, our flow preserves fast sampling and density evaluation, and may be used to produce unbiased estimates of expectations with respect to the target distribution via importance sampling. When trained on the DW4, LJ13 and QM9-positional datasets, our flow is competitive with equivariant continuous normalizing flows, while allowing sampling two orders of magnitude faster. Moreover, to the best of our knowledge, we are the first to learn the full Boltzmann distribution of alanine dipeptide by only modeling the Cartesian positions of its atoms. Lastly, we demonstrate that our flow can be trained to approximately sample from the Boltzmann distribution of the DW4 and LJ13 particle systems using only their energy functions.

Can Large Language Models Find And Fix Vulnerable Software?

  • paper_url: http://arxiv.org/abs/2308.10345
  • repo_url: None
  • paper_authors: David Noever
  • for: Evaluates the ability of Large Language Models (LLMs), particularly OpenAI's GPT-4, to detect software vulnerabilities, comparing them against traditional static code analyzers such as Snyk and Fortify.
  • methods: The analysis covers numerous repositories, including those from NASA and the Department of Defense, spanning 129 code samples across eight programming languages; the highest vulnerability counts were found in PHP and JavaScript.
  • results: GPT-4 identified roughly four times as many vulnerabilities as its counterparts and proposed viable fixes with a low false-positive rate; its code corrections led to a 90% reduction in vulnerabilities at the cost of only an 11% increase in code lines, and the models proved able to self-audit by suggesting fixes for the vulnerabilities they identified. Future work should explore system-level vulnerabilities and integrate multiple static code analyzers for a holistic view of LLM potential.
    Abstract In this study, we evaluated the capability of Large Language Models (LLMs), particularly OpenAI's GPT-4, in detecting software vulnerabilities, comparing their performance against traditional static code analyzers like Snyk and Fortify. Our analysis covered numerous repositories, including those from NASA and the Department of Defense. GPT-4 identified approximately four times the vulnerabilities than its counterparts. Furthermore, it provided viable fixes for each vulnerability, demonstrating a low rate of false positives. Our tests encompassed 129 code samples across eight programming languages, revealing the highest vulnerabilities in PHP and JavaScript. GPT-4's code corrections led to a 90% reduction in vulnerabilities, requiring only an 11% increase in code lines. A critical insight was LLMs' ability to self-audit, suggesting fixes for their identified vulnerabilities and underscoring their precision. Future research should explore system-level vulnerabilities and integrate multiple static code analyzers for a holistic perspective on LLMs' potential.
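
A hedged sketch of the kind of pipeline the study implies: send a code sample to GPT-4 with an auditing prompt and ask for findings plus patched code. The prompt wording is an assumption (the paper's exact prompts are not reproduced here); the API call uses the openai Python package (v1+) and expects an API key in the environment.

```python
from openai import OpenAI  # assumes the openai package and an API key are set up

client = OpenAI()

PROMPT = (
    "Audit the following code for security vulnerabilities. For each finding, "
    "give the CWE, the affected lines, and a minimal patched version."
)

def audit(source_code: str, model: str = "gpt-4") -> str:
    """Ask an LLM to find and fix vulnerabilities in one code sample."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a security code reviewer."},
            {"role": "user", "content": f"{PROMPT}\n\n```\n{source_code}\n```"},
        ],
        temperature=0,
    )
    return resp.choices[0].message.content

snippet = 'query = "SELECT * FROM users WHERE name = \'" + user_input + "\'"'
print(audit(snippet))  # should flag SQL injection and suggest parameterization
```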

A Comprehensive Empirical Evaluation on Online Continual Learning

  • paper_url: http://arxiv.org/abs/2308.10328
  • repo_url: https://github.com/albinsou/ocl_survey
  • paper_authors: Albin Soutif–Cormerais, Antonio Carta, Andrea Cossu, Julio Hurtado, Vincenzo Lomonaco, Joost Van de Weijer, Hamed Hemati
  • for: Evaluates online continual learning, which gets closer to a live learning experience by learning directly on a data stream with a temporally shifting distribution while storing a minimum amount of data.
  • methods: Compares methods from the literature in the class-incremental setting for image classification on the Split-CIFAR100 and Split-TinyImagenet benchmarks, measuring average accuracy, forgetting, stability, and representation quality both during and at the end of training.
  • results: Most methods suffer from stability and underfitting issues, yet the learned representations are comparable to i.i.d. training under the same computational budget; no clear winner emerges, and properly tuned basic experience replay is a very strong baseline.
    Abstract Online continual learning aims to get closer to a live learning experience by learning directly on a stream of data with temporally shifting distribution and by storing a minimum amount of data from that stream. In this empirical evaluation, we evaluate various methods from the literature that tackle online continual learning. More specifically, we focus on the class-incremental setting in the context of image classification, where the learner must learn new classes incrementally from a stream of data. We compare these methods on the Split-CIFAR100 and Split-TinyImagenet benchmarks, and measure their average accuracy, forgetting, stability, and quality of the representations, to evaluate various aspects of the algorithm at the end but also during the whole training period. We find that most methods suffer from stability and underfitting issues. However, the learned representations are comparable to i.i.d. training under the same computational budget. No clear winner emerges from the results and basic experience replay, when properly tuned and implemented, is a very strong baseline. We release our modular and extensible codebase at https://github.com/AlbinSou/ocl_survey based on the avalanche framework to reproduce our results and encourage future research.
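
Since the survey singles out basic experience replay as a very strong baseline, here is a self-contained sketch of its usual buffer: reservoir sampling keeps a uniform sample of the stream in bounded memory, and each training step mixes the incoming batch with a batch drawn from the buffer. The placeholder stream and sizes are illustrative.

```python
import random

class ReservoirReplay:
    """Reservoir-sampling replay buffer for online continual learning.

    Each incoming example is kept with probability buffer_size / n_seen,
    so the buffer is a uniform sample of the whole stream without
    storing the stream itself.
    """
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = []
        self.n_seen = 0

    def update(self, example):
        self.n_seen += 1
        if len(self.buffer) < self.buffer_size:
            self.buffer.append(example)
        else:
            j = random.randrange(self.n_seen)
            if j < self.buffer_size:
                self.buffer[j] = example

    def sample(self, k):
        return random.sample(self.buffer, min(k, len(self.buffer)))

replay = ReservoirReplay(buffer_size=200)
for i in range(1000):                 # simulate a stream
    replay.update((f"x{i}", i % 10))  # (input, label) placeholder
print(len(replay.buffer), replay.sample(3))
```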

Quantum State Tomography using Quantum Machine Learning

  • paper_url: http://arxiv.org/abs/2308.10327
  • repo_url: https://github.com/hongyehu/Machine_Learning_Quantum_State_Tomography
  • paper_authors: Nouhaila Innan, Owais Ishtiaq Siddiqui, Shivang Arora, Tamojit Ghosh, Yasemin Poyraz Koçak, Dominic Paragas, Abdullah Al Omar Galib, Muhammad Al-Zafar Khan, Mohamed Bennai
  • for: Aims to improve the efficiency of Quantum State Tomography (QST) so that it can be applied to large-scale quantum systems.
  • methods: Integrates Quantum Machine Learning (QML) techniques into QST and conducts a comprehensive investigation of both classical and quantum approaches, implementing several QML methods and demonstrating them on simulated and experimental quantum systems, including multi-qubit networks.
  • results: The QML-based QST approach achieves high fidelity (98%) with significantly fewer measurements than conventional methods, making it a promising tool for practical QIP applications.
    Abstract Quantum State Tomography (QST) is a fundamental technique in Quantum Information Processing (QIP) for reconstructing unknown quantum states. However, the conventional QST methods are limited by the number of measurements required, which makes them impractical for large-scale quantum systems. To overcome this challenge, we propose the integration of Quantum Machine Learning (QML) techniques to enhance the efficiency of QST. In this paper, we conduct a comprehensive investigation into various approaches for QST, encompassing both classical and quantum methodologies; We also implement different QML approaches for QST and demonstrate their effectiveness on various simulated and experimental quantum systems, including multi-qubit networks. Our results show that our QML-based QST approach can achieve high fidelity (98%) with significantly fewer measurements than conventional methods, making it a promising tool for practical QIP applications.

Homogenising SoHO/EIT and SDO/AIA 171 Å Images: A Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2308.10322
  • repo_url: None
  • paper_authors: Subhamoy Chatterjee, Andrés Muñoz-Jaramillo, Maher Dayeh, Hazel M. Bain, Kimberly Moreland
  • for: Aims to merge overlapping EUV surveys into a single homogeneous dataset of solar images spanning two solar cycles for use in space weather prediction tasks.
  • methods: Uses the temporal overlap of the SoHO/EIT and SDO/AIA 171 Å surveys to train an ensemble of deep learning models, employing Approximate Bayesian Ensembling so that the ensemble's uncertainty mimics that of a fully Bayesian neural network at a fraction of the cost.
  • results: Ensemble uncertainty decreases as the training set size increases, and the ensemble shows higher uncertainty on test data that are poorly represented in the training data, demonstrating the added value of systematic uncertainty estimation.
    Abstract Extreme Ultraviolet images of the Sun are becoming an integral part of space weather prediction tasks. However, having different surveys requires the development of instrument-specific prediction algorithms. As an alternative, it is possible to combine multiple surveys to create a homogeneous dataset. In this study, we utilize the temporal overlap of SoHO/EIT and SDO/AIA 171 Å surveys to train an ensemble of deep learning models for creating a single homogeneous survey of EUV images for 2 solar cycles. Prior applications of deep learning have focused on validating the homogeneity of the output while overlooking the systematic estimation of uncertainty. We use an approach called 'Approximate Bayesian Ensembling' to generate an ensemble of models whose uncertainty mimics that of a fully Bayesian neural network at a fraction of the cost. We find that ensemble uncertainty goes down as the training set size increases. Additionally, we show that the model ensemble adds immense value to the prediction by showing higher uncertainty in test data that are not well represented in the training data.

Towards Sustainable Development: A Novel Integrated Machine Learning Model for Holistic Environmental Health Monitoring

  • paper_url: http://arxiv.org/abs/2308.10317
  • repo_url: None
  • paper_authors: Anirudh Mazumder, Sarthak Engala, Aditya Nallaparaju
  • for: Helping governments identify intervention points and improve planning and conservation efforts toward sustainable development.
  • methods: Uses machine learning to identify key features that predict environmental state, with pollutant levels and particulate matter serving as indicators, and to find patterns linking areas with worse conditions.
  • results: Identifying the predictive features of environmental deterioration allows intervention points to be located, supporting planning and conservation efforts for sustainable development.
    Abstract Urbanization enables economic growth but also harms the environment through degradation. Traditional methods of detecting environmental issues have proven inefficient. Machine learning has emerged as a promising tool for tracking environmental deterioration by identifying key predictive features. Recent research focused on developing a predictive model using pollutant levels and particulate matter as indicators of environmental state in order to outline challenges. Machine learning was employed to identify patterns linking areas with worse conditions. This research aims to assist governments in identifying intervention points, improving planning and conservation efforts, and ultimately contributing to sustainable development.

Demystifying the Performance of Data Transfers in High-Performance Research Networks

  • paper_url: http://arxiv.org/abs/2308.10312
  • repo_url: None
  • paper_authors: Ehsan Saeedizade, Bing Zhang, Engin Arslan
  • for: Aims to shed light on the performance of data transfers in high-speed research networks, where transfers often fail to attain the promised rates.
  • methods: A scalable, end-to-end monitoring framework that gathers and stores key performance metrics for file transfers.
  • results: The framework monitors up to 400 transfers per host and more than 40,000 transfers in total while collecting performance statistics at one-second precision; a heuristic method that automatically processes the gathered metrics identifies the root causes of performance anomalies with an F-score of 87-98%.
    Abstract High-speed research networks are built to meet the ever-increasing needs of data-intensive distributed workflows. However, data transfers in these networks often fail to attain the promised transfer rates for several reasons, including I/O and network interference, server misconfigurations, and network anomalies. Although understanding the root causes of performance issues is critical to mitigating them and increasing the utilization of expensive network infrastructures, there is currently no available mechanism to monitor data transfers in these networks. In this paper, we present a scalable, end-to-end monitoring framework to gather and store key performance metrics for file transfers to shed light on the performance of transfers. The evaluation results show that the proposed framework can monitor up to 400 transfers per host and more than 40, 000 transfers in total while collecting performance statistics at one-second precision. We also introduce a heuristic method to automatically process the gathered performance metrics and identify the root causes of performance anomalies with an F-score of 87 - 98%.

I/O Burst Prediction for HPC Clusters using Darshan Logs

  • paper_url: http://arxiv.org/abs/2308.10311
  • repo_url: None
  • paper_authors: Ehsan Saeedizade, Roya Taheri, Engin Arslan
  • for: Analyzes cluster-wide read and write I/O patterns of large-scale HPC clusters to minimize the occurrence and impact of I/O interference.
  • methods: Extracts system-level read and write I/O rates in five-minute intervals from Darshan reports of three supercomputers and trains machine learning models to predict system-level I/O bursts 5-120 minutes ahead.
  • results: Read and write I/O rates fluctuate by over 100x in all three clusters; bursts are predicted with more than 90% accuracy (F-1 score) five minutes ahead and more than 87% two hours ahead, and burst severity is estimated with over 70% accuracy. A simulated burst-aware job scheduler that postpones application start times to avoid bursts reduces application runtime by up to 5x.
    Abstract Understanding cluster-wide I/O patterns of large-scale HPC clusters is essential to minimize the occurrence and impact of I/O interference. Yet, most previous work in this area focused on monitoring and predicting task and node-level I/O burst events. This paper analyzes Darshan reports from three supercomputers to extract system-level read and write I/O rates in five minutes intervals. We observe significant (over 100x) fluctuations in read and write I/O rates in all three clusters. We then train machine learning models to estimate the occurrence of system-level I/O bursts 5 - 120 minutes ahead. Evaluation results show that we can predict I/O bursts with more than 90% accuracy (F-1 score) five minutes ahead and more than 87% accuracy two hours ahead. We also show that the ML models attain more than 70% accuracy when estimating the degree of the I/O burst. We believe that high-accuracy predictions of I/O bursts can be used in multiple ways, such as postponing delay-tolerant I/O operations (e.g., checkpointing), pausing nonessential applications (e.g., file system scrubbers), and devising I/O-aware job scheduling methods. To validate this claim, we simulated a burst-aware job scheduler that can postpone the start time of applications to avoid I/O bursts. We show that the burst-aware job scheduling can lead to an up to 5x decrease in application runtime.
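
A small end-to-end sketch of the prediction task on synthetic data: turn a series of five-minute I/O rates into (recent window, future burst) pairs and fit a classifier. The lognormal rates, the burst threshold, and the random-forest choice are assumptions standing in for Darshan-derived features and the paper's models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 5-minute system-level write rates (GB/s);
# real features and labels would come from the cluster's Darshan logs.
rng = np.random.default_rng(0)
rates = rng.lognormal(mean=0.0, sigma=1.5, size=5000)

def windows(series, lookback=6, horizon=1, burst_quantile=0.95):
    """Turn a rate series into (recent-window, future-burst) pairs."""
    thr = np.quantile(series, burst_quantile)
    X, y = [], []
    for t in range(lookback, len(series) - horizon):
        X.append(series[t - lookback:t])
        y.append(int(series[t + horizon - 1] > thr))  # burst ahead?
    return np.array(X), np.array(y)

X, y = windows(rates)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```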

Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video

  • paper_url: http://arxiv.org/abs/2308.10305
  • repo_url: https://github.com/kasvii/pmce
  • paper_authors: Yingxuan You, Hong Liu, Ti Wang, Wenhao Li, Runwei Ding, Xia Li
  • for: Proposes a video-based 3D human mesh recovery method that improves on the accuracy and temporal consistency of existing approaches.
  • methods: The Pose and Mesh Co-Evolution network (PMCE) decouples the task into two parts: (1) video-based 3D human pose estimation, and (2) mesh vertex regression from the estimated 3D pose and temporal image features, using a two-stream encoder and a co-evolution decoder with image-guided Adaptive Layer Normalization (AdaLN).
  • results: PMCE outperforms previous state-of-the-art methods in both per-frame accuracy and temporal consistency on three benchmark datasets: 3DPW, Human3.6M, and MPI-INF-3DHP.
    Abstract Despite significant progress in single image-based 3D human mesh recovery, accurately and smoothly recovering 3D human motion from a video remains challenging. Existing video-based methods generally recover human mesh by estimating the complex pose and shape parameters from coupled image features, whose high complexity and low representation ability often result in inconsistent pose motion and limited shape patterns. To alleviate this issue, we introduce 3D pose as the intermediary and propose a Pose and Mesh Co-Evolution network (PMCE) that decouples this task into two parts: 1) video-based 3D human pose estimation and 2) mesh vertices regression from the estimated 3D pose and temporal image feature. Specifically, we propose a two-stream encoder that estimates mid-frame 3D pose and extracts a temporal image feature from the input image sequence. In addition, we design a co-evolution decoder that performs pose and mesh interactions with the image-guided Adaptive Layer Normalization (AdaLN) to make pose and mesh fit the human body shape. Extensive experiments demonstrate that the proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency on three benchmark datasets: 3DPW, Human3.6M, and MPI-INF-3DHP. Our code is available at https://github.com/kasvii/PMCE.
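
A sketch of the image-guided AdaLN used in the co-evolution decoder, under assumed layer sizes: features are normalized without affine parameters, then scaled and shifted by a gain and bias regressed from the conditioning image feature, letting the image guide the pose-mesh interaction.

```python
import torch
import torch.nn as nn

class AdaLN(nn.Module):
    """Image-guided Adaptive Layer Normalization (sketch)."""
    def __init__(self, feat_dim, cond_dim):
        super().__init__()
        self.norm = nn.LayerNorm(feat_dim, elementwise_affine=False)
        self.to_gain = nn.Linear(cond_dim, feat_dim)
        self.to_bias = nn.Linear(cond_dim, feat_dim)

    def forward(self, x, cond):   # x: (B, N, feat_dim), cond: (B, cond_dim)
        gain = self.to_gain(cond).unsqueeze(1)   # (B, 1, feat_dim)
        bias = self.to_bias(cond).unsqueeze(1)
        return self.norm(x) * (1 + gain) + bias

adaln = AdaLN(feat_dim=64, cond_dim=256)
tokens = torch.randn(2, 17, 64)            # e.g., 17 joint tokens
img_feat = torch.randn(2, 256)             # temporal image feature
print(adaln(tokens, img_feat).shape)       # torch.Size([2, 17, 64])
```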