cs.LG - 2023-08-19

Efficient Representation Learning for Healthcare with Cross-Architectural Self-Supervision

  • paper_url: http://arxiv.org/abs/2308.10064
  • repo_url: https://github.com/pranavsinghps1/CASS
  • paper_authors: Pranav Singh, Jacopo Cirrone
  • for: This paper aims to overcome the extreme computational requirements that hinder the adoption of representation learning in healthcare and biomedical applications.
  • methods: The paper proposes Cross Architectural - Self Supervision (CASS), a novel siamese self-supervised learning approach that synergistically combines a Transformer and a Convolutional Neural Network (CNN) for efficient learning.
  • results: Empirical evaluation shows that CASS-trained CNNs and Transformers outperform prior self-supervised learning methods across four diverse healthcare datasets, especially when fine-tuning with only 1% labeled data, while reducing pretraining time by 69%.
    Abstract In healthcare and biomedical applications, extreme computational requirements pose a significant barrier to adopting representation learning. Representation learning can enhance the performance of deep learning architectures by learning useful priors from limited medical data. However, state-of-the-art self-supervised techniques suffer from reduced performance when using smaller batch sizes or shorter pretraining epochs, which are more practical in clinical settings. We present Cross Architectural - Self Supervision (CASS) in response to this challenge. This novel siamese self-supervised learning approach synergistically leverages Transformer and Convolutional Neural Networks (CNN) for efficient learning. Our empirical evaluation demonstrates that CASS-trained CNNs and Transformers outperform existing self-supervised learning methods across four diverse healthcare datasets. With only 1% labeled data for finetuning, CASS achieves a 3.8% average improvement; with 10% labeled data, it gains 5.9%; and with 100% labeled data, it reaches a remarkable 10.13% enhancement. Notably, CASS reduces pretraining time by 69% compared to state-of-the-art methods, making it more amenable to clinical implementation. We also demonstrate that CASS is considerably more robust to variations in batch size and pretraining epochs, making it a suitable candidate for machine learning in healthcare applications.
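  • code sketch: The core idea of passing the same image through a CNN and a Transformer and making the two architectures agree can be illustrated in a few lines of PyTorch. The tiny encoders, embedding size, and cosine-agreement loss below are illustrative assumptions, not the authors' exact CASS objective.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))

    def forward(self, x):
        return self.net(x)

class TinyViT(nn.Module):
    def __init__(self, dim=128, patch=16):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)  # [B, N, dim]
        return self.encoder(tokens).mean(dim=1)                  # mean-pooled tokens

def cross_arch_agreement_loss(z_cnn, z_vit):
    """Pull the two architectures' embeddings of the same image together."""
    return 1.0 - F.cosine_similarity(z_cnn, z_vit, dim=-1).mean()

cnn, vit = TinyCNN(), TinyViT()
images = torch.randn(8, 3, 64, 64)           # stand-in for a medical image batch
loss = cross_arch_agreement_loss(cnn(images), vit(images))
loss.backward()                               # updates both backbones jointly
```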

Accelerating Exact Combinatorial Optimization via RL-based Initialization – A Case Study in Scheduling

  • paper_url: http://arxiv.org/abs/2308.11652
  • repo_url: None
  • paper_authors: Jiaqi Yin, Cunxi Yu
  • for: This work develops an innovative approach that uses machine learning (ML) to solve combinatorial optimization problems on computation graphs, using scheduling on the EdgeTPU platform as a case study.
  • methods: The work introduces a two-phase RL-to-ILP scheduling framework with three steps: 1) an RL solver acts as a coarse-grain scheduler, 2) the coarse solution is relaxed, and 3) an exact solution is obtained via ILP.
  • results: The approach preserves optimality and determinism while keeping heuristic-level runtime cost, achieving up to 128x speedups over exact scheduling as well as improved on-chip inference runtime and acceleration over the commercial EdgeTPU compiler, evaluated on actual EdgeTPU platforms with ImageNet DNN computation graphs.
    Abstract Scheduling on dataflow graphs (also known as computation graphs) is an NP-hard problem. The traditional exact methods are limited by runtime complexity, while reinforcement learning (RL) and heuristic-based approaches struggle with determinism and solution quality. This research aims to develop an innovative approach that employs machine learning (ML) for addressing combinatorial optimization problems, using scheduling as a case study. The goal is to provide guarantees in optimality and determinism while maintaining the runtime cost of heuristic methods. Specifically, we introduce a novel two-phase RL-to-ILP scheduling framework, which includes three steps: 1) RL solver acts as coarse-grain scheduler, 2) solution relaxation and 3) exact solving via ILP. Our framework demonstrates the same scheduling performance compared with using exact scheduling methods while achieving up to 128 $\times$ speed improvements. This was conducted on actual EdgeTPU platforms, utilizing ImageNet DNN computation graphs as input. Additionally, the framework offers improved on-chip inference runtime and acceleration compared to the commercially available EdgeTPU compiler.
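  • code sketch: A toy rendering of the three-step structure (coarse RL schedule, relaxation, exact ILP refinement) is shown below with PuLP as the ILP back end; unit processing times, the random stand-in for the RL policy, the ±window relaxation, and the absence of dependency constraints are simplifying assumptions rather than the paper's formulation.
```python
import random
import pulp

def rl_coarse_schedule(num_ops, horizon):
    """Stand-in for the RL solver: any fast policy giving rough start times."""
    return {i: random.randint(0, horizon - 1) for i in range(num_ops)}

def relax(coarse, horizon, window=2):
    """Relaxation step: turn each coarse start time into a small search window."""
    return {i: (max(0, t - window), min(horizon, t + window)) for i, t in coarse.items()}

def exact_ilp_refine(bounds):
    """Exact step: solve a small ILP restricted to the relaxed windows."""
    prob = pulp.LpProblem("schedule_refine", pulp.LpMinimize)
    start = {i: pulp.LpVariable(f"s_{i}", lowBound=lo, upBound=hi, cat="Integer")
             for i, (lo, hi) in bounds.items()}
    makespan = pulp.LpVariable("makespan", lowBound=0)
    for i, s in start.items():
        prob += makespan >= s + 1           # unit processing time per op (toy)
    prob += makespan                         # objective: minimise the makespan
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {i: int(s.value()) for i, s in start.items()}

coarse = rl_coarse_schedule(num_ops=6, horizon=10)
final = exact_ilp_refine(relax(coarse, horizon=10))
print(coarse, "->", final)
```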

The Snowflake Hypothesis: Training Deep GNN with One Node One Receptive field

  • paper_url: http://arxiv.org/abs/2308.10051
  • repo_url: None
  • paper_authors: Kun Wang, Guohao Li, Shilong Wang, Guibin Zhang, Kai Wang, Yang You, Xiaojiang Peng, Yuxuan Liang, Yang Wang
  • for: This paper aims to improve the performance and interpretability of deep graph neural networks (GNNs) by introducing the Snowflake Hypothesis, which posits that each node in a graph should have its own unique receptive field.
  • methods: The paper conducts a systematic study of deeper GNN research trajectories, employing the simplest gradient and node-level cosine distance as guiding principles to regulate the aggregation depth for each node. The authors also compare their approach with different aggregation strategies on multiple benchmarks.
  • results: The paper demonstrates that the Snowflake Hypothesis can serve as a universal operator for a range of tasks and displays tremendous potential on deep GNNs. The authors show that their approach can be applied to various GNN frameworks, enhancing their effectiveness when operating at depth, and guiding the selection of the optimal network depth in an explainable and generalizable way.
    Abstract Despite Graph Neural Networks demonstrating considerable promise in graph representation learning tasks, GNNs predominantly face significant issues with over-fitting and over-smoothing as they go deeper as models of computer vision realm. In this work, we conduct a systematic study of deeper GNN research trajectories. Our findings indicate that the current success of deep GNNs primarily stems from (I) the adoption of innovations from CNNs, such as residual/skip connections, or (II) the tailor-made aggregation algorithms like DropEdge. However, these algorithms often lack intrinsic interpretability and indiscriminately treat all nodes within a given layer in a similar manner, thereby failing to capture the nuanced differences among various nodes. To this end, we introduce the Snowflake Hypothesis -- a novel paradigm underpinning the concept of ``one node, one receptive field''. The hypothesis draws inspiration from the unique and individualistic patterns of each snowflake, proposing a corresponding uniqueness in the receptive fields of nodes in the GNNs. We employ the simplest gradient and node-level cosine distance as guiding principles to regulate the aggregation depth for each node, and conduct comprehensive experiments including: (1) different training schemes; (2) various shallow and deep GNN backbones, and (3) various numbers of layers (8, 16, 32, 64) on multiple benchmarks (six graphs including dense graphs with millions of nodes); (4) compare with different aggregation strategies. The observational results demonstrate that our hypothesis can serve as a universal operator for a range of tasks, and it displays tremendous potential on deep GNNs. It can be applied to various GNN frameworks, enhancing its effectiveness when operating in-depth, and guiding the selection of the optimal network depth in an explainable and generalizable way.
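  • code sketch: As a rough illustration of "one node, one receptive field", the gate below freezes a node's embedding once its layer-to-layer cosine distance drops below a threshold, so different nodes effectively stop aggregating at different depths; the threshold value and the freezing rule are assumptions, and the paper additionally uses gradient information to regulate the depth.
```python
import torch
import torch.nn.functional as F

def snowflake_gate(h_prev, h_new, active, threshold=0.02):
    """h_prev, h_new: [num_nodes, dim] embeddings before/after one GNN layer.
    active: bool mask of nodes still allowed to aggregate.
    Nodes whose representation barely moved keep their previous embedding from
    this layer onward, giving each node its own aggregation depth."""
    cos_dist = 1.0 - F.cosine_similarity(h_prev, h_new, dim=-1)
    still_active = active & (cos_dist > threshold)
    h_out = torch.where(still_active.unsqueeze(-1), h_new, h_prev)
    return h_out, still_active

# usage inside a deep GNN forward pass (gnn_layers: list of message-passing layers):
# h = features; active = torch.ones(h.size(0), dtype=torch.bool)
# for layer in gnn_layers:
#     h, active = snowflake_gate(h, layer(h, edge_index), active)
```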

Computing the Vapnik Chervonenkis Dimension for Non-Discrete Settings

  • paper_url: http://arxiv.org/abs/2308.10041
  • repo_url: None
  • paper_authors: Mohammed Nechba, Mouhajir Mohamed, Sedjari Yassine
  • for: The paper aims to develop a method for approximately computing the VC dimension without constraints on the concept class or its domain set.
  • methods: The paper uses the Empirical Risk Minimization (ERM) learning paradigm as a new tool to characterize the shattering property of a concept class.
  • results: The paper presents a method that approximately computes the VC dimension in the general setting, no longer requiring the concept class or its domain set to be finite.
    Abstract In 1984, Valiant [7] introduced the Probably Approximately Correct (PAC) learning framework for boolean function classes. Blumer et al. [2] extended this model in 1989 by introducing the VC dimension as a tool to characterize the learnability of PAC. The VC dimension was based on the work of Vapnik and Chervonenkis in 1971 [8], who introduced a tool called the growth function to characterize the shattering property. Researchers have since determined the VC dimension for specific classes, and efforts have been made to develop an algorithm that can calculate the VC dimension for any concept class. In 1991, Linial, Mansour, and Rivest [4] presented an algorithm for computing the VC dimension in the discrete setting, assuming that both the concept class and domain set were finite. However, no attempts had been made to design an algorithm that could compute the VC dimension in the general setting. Therefore, our work focuses on developing a method to approximately compute the VC dimension without constraints on the concept classes or their domain set. Our approach is based on our finding that the Empirical Risk Minimization (ERM) learning paradigm can be used as a new tool to characterize the shattering property of a concept class.
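  • code sketch: The ERM-as-shattering-oracle idea can be illustrated on a toy concept class. Below, 2-D linear separators stand in for the concept class and scikit-learn's logistic regression for the ERM oracle; the random-sample search, trial counts, and the skipping of constant labelings (trivially realizable by a far-away hyperplane for this class) are simplifying assumptions, not the paper's algorithm.
```python
import itertools
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_erm(points, labels):
    """ERM oracle for (approximately) hard-margin linear separators."""
    return LogisticRegression(C=1e6, max_iter=1000).fit(points, labels)

def is_shattered(points, erm):
    """A set is shattered iff every labeling is realized with zero empirical risk."""
    n = len(points)
    for labels in itertools.product([0, 1], repeat=n):
        if len(set(labels)) < 2:
            continue  # constant labelings: realizable by a far-away hyperplane
        clf = erm(points, np.array(labels))
        if (clf.predict(points) != labels).any():
            return False
    return True

def approx_vc_dim(dim=2, max_d=6, trials=50, rng=np.random.default_rng(0)):
    """Largest d for which some random d-point set is shattered (ERM-based estimate)."""
    vc = 0
    for d in range(1, max_d + 1):
        if any(is_shattered(rng.normal(size=(d, dim)), fit_erm) for _ in range(trials)):
            vc = d
        else:
            break
    return vc

print(approx_vc_dim())   # linear separators in the plane: expect 3
```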

Physics-guided training of GAN to improve accuracy in airfoil design synthesis

  • paper_url: http://arxiv.org/abs/2308.10038
  • repo_url: None
  • paper_authors: Kazunari Wada, Katsuyuki Suzuki, Kazuo Yonekura
  • for: This work targets design synthesis of mechanical shapes with generative adversarial networks (GANs), which sometimes output physically unreasonable shapes; for example, a GAN trained to output airfoil shapes with required aerodynamic performance produces significant errors in the performance values, because it considers only the data and not the underlying aerodynamic equations.
  • methods: The work proposes physics-guided training of the GAN so that the model learns physical validity, which is computed by general-purpose software outside the neural network rather than by implementing the physical equations inside the model. Because the proposed model is guided by a physics model and does not use a training dataset, it can also generate completely new shapes.
  • results: Numerical experiments show that the proposed model drastically reduces the errors, and its output shapes differ from the training dataset while still satisfying physical validity, overcoming the limitations of existing GAN models.
    Abstract Generative adversarial networks (GAN) have recently been used for a design synthesis of mechanical shapes. A GAN sometimes outputs physically unreasonable shapes. For example, when a GAN model is trained to output airfoil shapes that indicate required aerodynamic performance, significant errors occur in the performance values. This is because the GAN model only considers data but does not consider the aerodynamic equations that lie under the data. This paper proposes the physics-guided training of the GAN model to guide the model to learn physical validity. Physical validity is computed using general-purpose software located outside the neural network model. Such general-purpose software cannot be used in physics-informed neural network frameworks, because physical equations must be implemented inside the neural network models. Additionally, a limitation of generative models is that the output data are similar to the training data and cannot generate completely new shapes. However, because the proposed model is guided by a physical model and does not use a training dataset, it can generate completely new shapes. Numerical experiments show that the proposed model drastically improves the accuracy. Moreover, the output shapes differ from those of the training dataset but still satisfy the physical validity, overcoming the limitations of existing GAN models.

High Performance Computing Applied to Logistic Regression: A CPU and GPU Implementation Comparison

  • paper_url: http://arxiv.org/abs/2308.10037
  • repo_url: https://github.com/nechbamohammed/swiftlogisticreg
  • paper_authors: Nechba Mohammed, Mouhajir Mohamed, Sedjari Yassine
  • for: This paper presents a GPU-based parallel implementation of Logistic Regression (LR) to meet the growing demand for faster binary classification on large datasets.
  • methods: The implementation is a direct translation of the parallel Gradient Descent Logistic Regression algorithm proposed by X. Zou et al.
  • results: The GPU-based LR outperforms CPU-based implementations in execution time on large datasets while maintaining a comparable F1 score, making the method particularly advantageous for real-time prediction applications such as image recognition, spam detection, and fraud detection.
    Abstract We present a versatile GPU-based parallel version of Logistic Regression (LR), aiming to address the increasing demand for faster algorithms in binary classification due to large data sets. Our implementation is a direct translation of the parallel Gradient Descent Logistic Regression algorithm proposed by X. Zou et al. [12]. Our experiments demonstrate that our GPU-based LR outperforms existing CPU-based implementations in terms of execution time while maintaining comparable f1 score. The significant acceleration of processing large datasets makes our method particularly advantageous for real-time prediction applications like image recognition, spam detection, and fraud detection. Our algorithm is implemented in a ready-to-use Python library available at : https://github.com/NechbaMohammed/SwiftLogisticReg
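  • code sketch: The CPU/GPU contrast can be reproduced with array libraries that share one API; the snippet below is a generic batch gradient-descent logistic regression rather than the specific parallel algorithm of Zou et al. used in the library, and the CuPy fallback is an assumption about the runtime environment.
```python
import numpy as np
try:
    import cupy as xp      # GPU arrays, if CUDA and CuPy are available
except ImportError:
    import numpy as xp     # CPU fallback keeps the sketch runnable anywhere

def fit_logistic_gd(X, y, lr=0.1, epochs=300):
    """Batch gradient descent for logistic regression; identical code runs on
    NumPy (CPU) or CuPy (GPU) because the two libraries share an array API."""
    X = xp.asarray(X, dtype=xp.float32)
    y = xp.asarray(y, dtype=xp.float32)
    X = xp.concatenate([X, xp.ones((X.shape[0], 1), dtype=xp.float32)], axis=1)  # bias column
    w = xp.zeros(X.shape[1], dtype=xp.float32)
    for _ in range(epochs):
        p = 1.0 / (1.0 + xp.exp(-(X @ w)))           # sigmoid
        w -= lr * (X.T @ (p - y)) / X.shape[0]       # gradient step
    return w

# toy usage: two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (500, 5)), rng.normal(1, 1, (500, 5))])
y = np.array([0] * 500 + [1] * 500)
w = fit_logistic_gd(X, y)
pred = 1.0 / (1.0 + xp.exp(-(xp.asarray(X, dtype=xp.float32) @ w[:-1] + w[-1]))) > 0.5
print("train accuracy:", float((pred == xp.asarray(y)).mean()))
```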

Semi-Supervised Anomaly Detection for the Determination of Vehicle Hijacking Tweets

  • paper_url: http://arxiv.org/abs/2308.10036
  • repo_url: None
  • paper_authors: Taahir Aiyoob Patel, Clement N. Nyirenda
  • for: This work aims to identify vehicle hijacking incidents from tweets, helping travellers avoid becoming victims of such incidents.
  • methods: Tweets containing the keyword "hijacking" are collected, processed with TF-IDF, and analyzed with two unsupervised anomaly detection algorithms: K-Nearest Neighbour (KNN) and Cluster Based Outlier Factor (CBLOF).
  • results: The comparative evaluation shows that CBLOF reaches 90% accuracy versus 89% for KNN, with F1 scores of 0.8 and 0.78 respectively, so CBLOF is slightly preferred for determining relevant hijacking tweets. Future work will compare supervised methods against these unsupervised ones on larger datasets and employ optimisation mechanisms to increase overall performance.
    Abstract In South Africa, there is an ever-growing issue of vehicle hijackings. This leads to travellers constantly being in fear of becoming a victim to such an incident. This work presents a new semi-supervised approach to using tweets to identify hijacking incidents by using unsupervised anomaly detection algorithms. Tweets consisting of the keyword "hijacking" are obtained, stored, and processed using the term frequency-inverse document frequency (TF-IDF) and further analyzed by using two anomaly detection algorithms: 1) K-Nearest Neighbour (KNN); 2) Cluster Based Outlier Factor (CBLOF). The comparative evaluation showed that the KNN method produced an accuracy of 89%, whereas the CBLOF produced an accuracy of 90%. The CBLOF method was also able to obtain a F1-Score of 0.8, whereas the KNN produced a 0.78. Therefore, there is a slight difference between the two approaches, in favour of CBLOF, which has been selected as a preferred unsupervised method for the determination of relevant hijacking tweets. In future, a comparison will be done between supervised learning methods and the unsupervised methods presented in this work on larger dataset. Optimisation mechanisms will also be employed in order to increase the overall performance.
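  • code sketch: A minimal version of the pipeline, TF-IDF features followed by an unsupervised outlier detector from PyOD, might look like the sketch below. The toy tweet list, vectorizer settings, neighbour count, and contamination rate are placeholders; the paper's preferred CBLOF detector (pyod.models.cblof.CBLOF) plugs in the same way once enough real tweets are available for clustering.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from pyod.models.knn import KNN   # CBLOF (pyod.models.cblof) slots in identically

tweets = [
    "Hijacking reported at the N1 offramp, avoid the area",
    "Armed hijacking of a delivery van near the mall last night",
    "Two suspects arrested after attempted hijacking in Soweto",
    "My favourite song is about an emotional hijacking, lol",
    "Hijacking hotspot warning issued for the R21 this weekend",
    "That plot twist totally hijacked the movie for me",
    "Vehicle hijacking stats released by the police ministry",
    "Stay alert at traffic lights, hijacking incidents are rising",
]

X = TfidfVectorizer(stop_words="english").fit_transform(tweets).toarray()

detector = KNN(n_neighbors=2, contamination=0.25)   # flag ~25% of tweets as outliers
detector.fit(X)
for tweet, label in zip(tweets, detector.labels_):  # 1 = anomalous / off-topic
    if label == 1:
        print("flagged:", tweet)
```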

Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion

  • paper_url: http://arxiv.org/abs/2308.10021
  • repo_url: None
  • paper_authors: Tung-Cheng Su, Yung-Chuan Chang, Yi-Wen Liu
  • for: This work evaluates how the bottleneck width of a convolutional autoencoder (CAE) affects the quality of singing technique conversion (STC).
  • methods: The authors build a GAN-based multi-domain STC system on the WORLD vocoder representation and a CAE architecture, vary the CAE bottleneck width, and evaluate the conversion results subjectively on a Mandarin dataset with four singers and four singing techniques.
  • results: A wider bottleneck improves articulation clarity but does not necessarily increase likeness to the target technique. Among the four techniques, the whistle voice is the easiest conversion target, while the other three techniques as sources produce more convincing conversions than the whistle.
    Abstract Singing technique conversion (STC) refers to the task of converting from one voice technique to another while leaving the original singer identity, melody, and linguistic components intact. Previous STC studies, as well as singing voice conversion research in general, have utilized convolutional autoencoders (CAEs) for conversion, but how the bottleneck width of the CAE affects the synthesis quality has not been thoroughly evaluated. To this end, we constructed a GAN-based multi-domain STC system which took advantage of the WORLD vocoder representation and the CAE architecture. We varied the bottleneck width of the CAE, and evaluated the conversion results subjectively. The model was trained on a Mandarin dataset which features four singers and four singing techniques: the chest voice, the falsetto, the raspy voice, and the whistle voice. The results show that a wider bottleneck corresponds to better articulation clarity but does not necessarily lead to higher likeness to the target technique. Among the four techniques, we also found that the whistle voice is the easiest target for conversion, while the other three techniques as a source produce more convincing conversion results than the whistle.
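  • code sketch: To make the studied knob concrete, here is a toy 1-D convolutional autoencoder over WORLD-style spectral frames whose bottleneck argument is the width being swept; the channel counts, kernel sizes, and input dimensions are illustrative and do not reproduce the paper's StarGAN-based STC system.
```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    """Toy convolutional autoencoder; `bottleneck` is the width under study."""
    def __init__(self, in_ch=60, bottleneck=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_ch, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, bottleneck, kernel_size=5, padding=2))
        self.decoder = nn.Sequential(
            nn.Conv1d(bottleneck, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, in_ch, kernel_size=5, padding=2))

    def forward(self, x):               # x: [batch, spectral_bins, frames]
        return self.decoder(self.encoder(x))

x = torch.randn(4, 60, 128)             # stand-in for WORLD spectral envelopes
for width in (2, 8, 32):                 # sweep the bottleneck width
    recon = ConvAE(bottleneck=width)(x)
    print(width, nn.functional.mse_loss(recon, x).item())
```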

Semi-Implicit Variational Inference via Score Matching

  • paper_url: http://arxiv.org/abs/2308.10014
  • repo_url: https://github.com/longinyu/sivism
  • paper_authors: Longlin Yu, Cheng Zhang
  • for: Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by defining implicit variational distributions in a hierarchical manner.
  • methods: The paper proposes SIVI-SM, a new SIVI method based on an alternative training objective via score matching; exploiting the hierarchical structure of semi-implicit variational families, the objective admits a minimax formulation in which the intractable variational densities are handled naturally with denoising score matching.
  • results: SIVI-SM closely matches the accuracy of MCMC and outperforms ELBO-based SIVI methods on a variety of Bayesian inference tasks.
    Abstract Semi-implicit variational inference (SIVI) greatly enriches the expressiveness of variational families by considering implicit variational distributions defined in a hierarchical manner. However, due to the intractable densities of variational distributions, current SIVI approaches often use surrogate evidence lower bounds (ELBOs) or employ expensive inner-loop MCMC runs for unbiased ELBOs for training. In this paper, we propose SIVI-SM, a new method for SIVI based on an alternative training objective via score matching. Leveraging the hierarchical structure of semi-implicit variational families, the score matching objective allows a minimax formulation where the intractable variational densities can be naturally handled with denoising score matching. We show that SIVI-SM closely matches the accuracy of MCMC and outperforms ELBO-based SIVI methods in a variety of Bayesian inference tasks.
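  • code sketch: One ingredient of SIVI-SM, regressing a score network onto the score of a Gaussian perturbation kernel, is easy to sketch; the noise level, the small network, and the omission of the paper's minimax handling of the semi-implicit hierarchy are simplifying assumptions.
```python
import torch
import torch.nn as nn

def denoising_score_matching_loss(score_net, x, sigma=0.1):
    """Perturb x with N(0, sigma^2 I) noise and regress score_net(x_noisy) onto
    the score of the perturbation kernel, which equals -noise / sigma^2."""
    noise = torch.randn_like(x) * sigma
    x_noisy = x + noise
    target = -noise / sigma**2
    return ((score_net(x_noisy) - target) ** 2).sum(dim=-1).mean()

dim = 2
score_net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
for _ in range(200):
    x = torch.randn(256, dim) * 0.5 + 1.0      # samples standing in for the target density
    loss = denoising_score_matching_loss(score_net, x)
    opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```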

Distributionally Robust Cross Subject EEG Decoding

  • paper_url: http://arxiv.org/abs/2308.11651
  • repo_url: None
  • paper_authors: Tiehang Duan, Zhenyi Wang, Gianfranco Doretto, Fang Li, Cui Tao, Donald Adjeroh
  • for: This work aims to improve the performance of EEG decoding, which suffers from the high variance and diverse corruptions inherent in the signal and from relatively small datasets.
  • methods: Drawing on distributionally robust optimization and Wasserstein gradient flow (WGF), the paper proposes a principled dynamic data-evolution framework that achieves robustness by optimizing over a family of evolved data distributions rather than the single training distribution, with two forms of evolution derived within the framework.
  • results: Experiments show that the proposed method can be readily integrated with other data augmentation approaches and significantly outperforms competitive baselines on various types of corrupted EEG signals.
    Abstract Recently, deep learning has shown to be effective for Electroencephalography (EEG) decoding tasks. Yet, its performance can be negatively influenced by two key factors: 1) the high variance and different types of corruption that are inherent in the signal, 2) the EEG datasets are usually relatively small given the acquisition cost, annotation cost and amount of effort needed. Data augmentation approaches for alleviation of this problem have been empirically studied, with augmentation operations on spatial domain, time domain or frequency domain handcrafted based on expertise of domain knowledge. In this work, we propose a principled approach to perform dynamic evolution on the data for improvement of decoding robustness. The approach is based on distributionally robust optimization and achieves robustness by optimizing on a family of evolved data distributions instead of the single training data distribution. We derived a general data evolution framework based on Wasserstein gradient flow (WGF) and provides two different forms of evolution within the framework. Intuitively, the evolution process helps the EEG decoder to learn more robust and diverse features. It is worth mentioning that the proposed approach can be readily integrated with other data augmentation approaches for further improvements. We performed extensive experiments on the proposed approach and tested its performance on different types of corrupted EEG signals. The model significantly outperforms competitive baselines on challenging decoding scenarios.

Disposable Transfer Learning for Selective Source Task Unlearning

  • paper_url: http://arxiv.org/abs/2308.09971
  • repo_url: None
  • paper_authors: Seunghee Koh, Hyounguk Shon, Janghyeon Lee, Hyeong Gwon Hong, Junmo Kim
  • for: This paper addresses how to selectively unlearn the source task in transfer learning while preserving performance on the target task.
  • methods: The paper proposes a new transfer learning paradigm, disposable transfer learning (DTL), together with a novel Gradient Collision loss (GC loss) that selectively unlearns source knowledge by leading the gradient vectors of mini-batches in different directions.
  • results: Models trained with GC loss retain target-task performance while showing a significantly reduced piggyback learning (PL) accuracy, a measure of knowledge leakage obtained by retraining the scrubbed model on a subset of source data or new downstream data.
    Abstract Transfer learning is widely used for training deep neural networks (DNN) for building a powerful representation. Even after the pre-trained model is adapted for the target task, the representation performance of the feature extractor is retained to some extent. As the performance of the pre-trained model can be considered the private property of the owner, it is natural to seek the exclusive right of the generalized performance of the pre-trained weight. To address this issue, we suggest a new paradigm of transfer learning called disposable transfer learning (DTL), which disposes of only the source task without degrading the performance of the target task. To achieve knowledge disposal, we propose a novel loss named Gradient Collision loss (GC loss). GC loss selectively unlearns the source knowledge by leading the gradient vectors of mini-batches in different directions. Whether the model successfully unlearns the source task is measured by piggyback learning accuracy (PL accuracy). PL accuracy estimates the vulnerability of knowledge leakage by retraining the scrubbed model on a subset of source data or new downstream data. We demonstrate that GC loss is an effective approach to the DTL problem by showing that the model trained with GC loss retains the performance on the target task with a significantly reduced PL accuracy.
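  • code sketch: A hedged reading of the gradient-collision idea is to compare the parameter gradients induced by two source-task mini-batches and penalize them for pointing in the same direction; the cosine-similarity form below and how it would be weighted against the target-task loss are assumptions rather than the paper's exact GC loss.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_collision_term(model, loss_fn, batch_a, batch_b):
    """Cosine similarity between the flattened gradients of two source mini-batches;
    minimizing it pushes the gradients in different directions (unlearning)."""
    def flat_grad(batch):
        x, y = batch
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        return torch.cat([g.reshape(-1) for g in grads])
    return F.cosine_similarity(flat_grad(batch_a), flat_grad(batch_b), dim=0)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
loss_fn = nn.CrossEntropyLoss()
batch = lambda: (torch.randn(16, 10), torch.randint(0, 3, (16,)))
gc = gradient_collision_term(model, loss_fn, batch(), batch())
gc.backward()        # would be combined with the target-task loss in practice
print(gc.item())
```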

Tackling Vision Language Tasks Through Learning Inner Monologues

  • paper_url: http://arxiv.org/abs/2308.09970
  • repo_url: None
  • paper_authors: Diji Yang, Kezhen Chen, Jinmeng Rao, Xiaoyuan Guo, Yawen Zhang, Jie Yang, Yi Zhang
  • for: To solve complex vision-language problems by simulating an inner monologue process that improves the fusion of language models and vision models.
  • methods: The paper proposes Inner Monologue Multi-Modal Optimization (IMMO), which lets LLMs and VLMs interact through natural-language conversation and uses a two-stage training process to learn the inner monologue (self-asking and self-answering questions).
  • results: Evaluated on two popular tasks, emulating internal dialogue enhances reasoning and explanation abilities; because the monologue is learned within the deep learning models rather than predefined, IMMO promises wider applicability to many AI problems beyond vision-language tasks.
    Abstract Visual language tasks require AI models to comprehend and reason with both visual and textual content. Driven by the power of Large Language Models (LLMs), two prominent methods have emerged: (1) the hybrid integration between LLMs and Vision-Language Models (VLMs), where visual inputs are firstly converted into language descriptions by VLMs, serving as inputs for LLMs to generate final answer(s); (2) visual feature alignment in language space, where visual inputs are encoded as embeddings and projected to LLMs' language space via further supervised fine-tuning. The first approach provides light training costs and interpretability but is hard to be optimized in an end-to-end fashion. The second approach presents decent performance, but feature alignment usually requires large amounts of training data and lacks interpretability. To tackle this dilemma, we propose a novel approach, Inner Monologue Multi-Modal Optimization (IMMO), to solve complex vision language problems by simulating inner monologue processes, a cognitive process in which an individual engages in silent verbal communication with themselves. We enable LLMs and VLMs to interact through natural language conversation and propose to use a two-stage training process to learn how to do the inner monologue (self-asking questions and answering questions). IMMO is evaluated on two popular tasks and the results suggest by emulating the cognitive phenomenon of internal dialogue, our approach can enhance reasoning and explanation abilities, contributing to the more effective fusion of vision and language models. More importantly, instead of using predefined human-crafted monologues, IMMO learns this process within the deep learning models, promising wider applicability to many different AI problems beyond vision language tasks.

Anomaly-Aware Semantic Segmentation via Style-Aligned OoD Augmentation

  • paper_url: http://arxiv.org/abs/2308.09965
  • repo_url: None
  • paper_authors: Dan Zhang, Kaspar Sakmann, William Beluch, Robin Hutmacher, Yumeng Li
  • for: This paper equips standard semantic segmentation models with anomaly awareness for open-world autonomous driving.
  • methods: The paper advances out-of-distribution (OoD) data synthesis by reducing the style gap between the OoD data and driving scenes, mitigating the style shortcut that would otherwise arise during training. It also proposes a simple fine-tuning loss that induces a pre-trained semantic segmentation model to produce a "none of the given classes" prediction, leveraging per-pixel OoD scores for anomaly segmentation.
  • results: With minimal fine-tuning effort, the pipeline enables pre-trained models to perform anomaly segmentation while maintaining performance on the original task.
    Abstract Within the context of autonomous driving, encountering unknown objects becomes inevitable during deployment in the open world. Therefore, it is crucial to equip standard semantic segmentation models with anomaly awareness. Many previous approaches have utilized synthetic out-of-distribution (OoD) data augmentation to tackle this problem. In this work, we advance the OoD synthesis process by reducing the domain gap between the OoD data and driving scenes, effectively mitigating the style difference that might otherwise act as an obvious shortcut during training. Additionally, we propose a simple fine-tuning loss that effectively induces a pre-trained semantic segmentation model to generate a ``none of the given classes" prediction, leveraging per-pixel OoD scores for anomaly segmentation. With minimal fine-tuning effort, our pipeline enables the use of pre-trained models for anomaly segmentation while maintaining the performance on the original task.

Towards Self-Adaptive Machine Learning-Enabled Systems Through QoS-Aware Model Switching

  • paper_url: http://arxiv.org/abs/2308.09960
  • repo_url: https://github.com/sa4s-serc/adamls
  • paper_authors: Shubham Kulkarni, Arya Marda, Karthik Vaidhyanathan
  • for: This work proposes a Machine Learning Model Balancer to manage run-time uncertainties and guarantee the Quality of Service (QoS) of machine-learning-enabled systems (MLS).
  • methods: The work manages uncertainty related to ML models by using multiple models and introduces AdaMLS, a self-adaptation approach that extends the traditional MAPE-K loop with lightweight unsupervised learning for dynamic model switching.
  • results: On a self-adaptive object detection prototype, preliminary results show that AdaMLS surpasses naive and single state-of-the-art models in QoS guarantees, balancing system and model performance in dynamic environments.
    Abstract Machine Learning (ML), particularly deep learning, has seen vast advancements, leading to the rise of Machine Learning-Enabled Systems (MLS). However, numerous software engineering challenges persist in propelling these MLS into production, largely due to various run-time uncertainties that impact the overall Quality of Service (QoS). These uncertainties emanate from ML models, software components, and environmental factors. Self-adaptation techniques present potential in managing run-time uncertainties, but their application in MLS remains largely unexplored. As a solution, we propose the concept of a Machine Learning Model Balancer, focusing on managing uncertainties related to ML models by using multiple models. Subsequently, we introduce AdaMLS, a novel self-adaptation approach that leverages this concept and extends the traditional MAPE-K loop for continuous MLS adaptation. AdaMLS employs lightweight unsupervised learning for dynamic model switching, thereby ensuring consistent QoS. Through a self-adaptive object detection system prototype, we demonstrate AdaMLS's effectiveness in balancing system and model performance. Preliminary results suggest AdaMLS surpasses naive and single state-of-the-art models in QoS guarantees, heralding the advancement towards self-adaptive MLS with optimal QoS in dynamic environments.
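  • code sketch: A bare-bones, rule-based version of the idea, monitoring per-request latency and switching between models of different weight, is sketched below; the latency thresholds and the light-to-heavy ordering are assumptions, and the real AdaMLS drives the switch with lightweight unsupervised learning rather than fixed rules.
```python
import time

class ModelSwitcher:
    """MAPE-K style loop: Monitor latency, Analyze against the QoS budget,
    Plan a switch to a lighter/heavier model, Execute the chosen model."""
    def __init__(self, models, latency_budget_s=0.05):
        self.models = models            # callables ordered light -> heavy
        self.idx = len(models) - 1      # start with the most accurate model
        self.budget = latency_budget_s

    def infer(self, x):
        start = time.perf_counter()
        out = self.models[self.idx](x)                  # Execute
        latency = time.perf_counter() - start           # Monitor
        if latency > self.budget and self.idx > 0:      # Analyze + Plan
            self.idx -= 1                               # too slow: go lighter
        elif latency < 0.5 * self.budget and self.idx < len(self.models) - 1:
            self.idx += 1                               # headroom: go heavier
        return out

# toy usage with stand-in "models" of increasing cost
def make_model(delay):
    def model(x):
        time.sleep(delay)
        return x
    return model

switcher = ModelSwitcher([make_model(0.01), make_model(0.03), make_model(0.08)])
for i in range(5):
    switcher.infer(i)
    print("active model index:", switcher.idx)
```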

A Comparison of Adversarial Learning Techniques for Malware Detection

  • paper_url: http://arxiv.org/abs/2308.09958
  • repo_url: None
  • paper_authors: Pavla Louthánová, Matouš Kozák, Martin Jureček, Mark Stamp
  • for: This paper addresses the problem of generating adversarial malware samples, specifically malicious Windows Portable Executable files, to evaluate the effectiveness of different methods for generating adversarial samples and their practical applicability.
  • methods: The paper uses gradient-based, evolutionary algorithm-based, and reinforcement-based methods to generate adversarial samples, and tests the generated samples against selected antivirus products.
  • results: The results show that applying optimized modifications to previously detected malware can lead to incorrect classification of the file as benign, and that generated malware samples can be successfully used against detection models other than those used to generate them. The Gym-malware generator, which uses a reinforcement learning approach, has the greatest practical potential, achieving an average sample generation time of 5.73 seconds and the highest average evasion rate of 44.11%. Using the Gym-malware generator in combination with itself improved the evasion rate to 58.35%.
    Abstract Machine learning has proven to be a useful tool for automated malware detection, but machine learning models have also been shown to be vulnerable to adversarial attacks. This article addresses the problem of generating adversarial malware samples, specifically malicious Windows Portable Executable files. We summarize and compare work that has focused on adversarial machine learning for malware detection. We use gradient-based, evolutionary algorithm-based, and reinforcement-based methods to generate adversarial samples, and then test the generated samples against selected antivirus products. We compare the selected methods in terms of accuracy and practical applicability. The results show that applying optimized modifications to previously detected malware can lead to incorrect classification of the file as benign. It is also known that generated malware samples can be successfully used against detection models other than those used to generate them and that using combinations of generators can create new samples that evade detection. Experiments show that the Gym-malware generator, which uses a reinforcement learning approach, has the greatest practical potential. This generator achieved an average sample generation time of 5.73 seconds and the highest average evasion rate of 44.11%. Using the Gym-malware generator in combination with itself improved the evasion rate to 58.35%.

To prune or not to prune : A chaos-causality approach to principled pruning of dense neural networks

  • paper_url: http://arxiv.org/abs/2308.09955
  • repo_url: None
  • paper_authors: Rajan Sahu, Shivam Chadha, Nithin Nagaraj, Archana Mathur, Snehanshu Saha
  • for: Reducing the size of a neural network (pruning) by removing weights without impacting its performance is an important problem for resource-constrained devices.
  • methods: Whereas pruning has typically ranked or penalized weights by criteria such as magnitude and importance before retraining the remainder, this paper formulates pruning as an optimization problem that minimizes misclassifications by selecting specific weights, introducing chaos in learning (Lyapunov exponents) via weight updates and exploiting causality to identify the causal weights responsible for misclassification.
  • results: The pruned network maintains the original performance while retaining feature explainability.
    Abstract Reducing the size of a neural network (pruning) by removing weights without impacting its performance is an important problem for resource-constrained devices. In the past, pruning was typically accomplished by ranking or penalizing weights based on criteria like magnitude and removing low-ranked weights before retraining the remaining ones. Pruning strategies may also involve removing neurons from the network in order to achieve the desired reduction in network size. We formulate pruning as an optimization problem with the objective of minimizing misclassifications by selecting specific weights. To accomplish this, we have introduced the concept of chaos in learning (Lyapunov exponents) via weight updates and exploiting causality to identify the causal weights responsible for misclassification. Such a pruned network maintains the original performance and retains feature explainability.

Finding emergence in data: causal emergence inspired dynamics learning

  • paper_url: http://arxiv.org/abs/2308.09952
  • repo_url: None
  • paper_authors: Mingzhe Yang, Zhipeng Wang, Kaiwei Liu, Yingqi Rong, Bing Yuan, Jiang Zhang
  • for: This paper aims to develop a machine learning framework to model complex dynamical systems in a data-driven manner, with a focus on capturing emergent behaviors and properties.
  • methods: The proposed framework draws inspiration from the theory of causal emergence and uses maximum effective information (EI) to learn macro-dynamics within an emergent latent space.
  • results: The proposed framework is effective in capturing emergent patterns, learning the coarse-graining strategy, and quantifying the degree of causal emergence in the data. Additionally, the model demonstrates superior generalization ability on environments different from the training dataset.
    Abstract Modelling complex dynamical systems in a data-driven manner is challenging due to the presence of emergent behaviors and properties that cannot be directly captured by micro-level observational data. Therefore, it is crucial to develop a model that can effectively capture emergent dynamics at the macro-level and quantify emergence based on the available data. Drawing inspiration from the theory of causal emergence, this paper introduces a machine learning framework aimed at learning macro-dynamics within an emergent latent space. The framework achieves this by maximizing the effective information (EI) to obtain a macro-dynamics model with stronger causal effects. Experimental results on both simulated and real data demonstrate the effectiveness of the proposed framework. Not only does it successfully capture emergent patterns, but it also learns the coarse-graining strategy and quantifies the degree of causal emergence in the data. Furthermore, experiments conducted on environments different from the training dataset highlight the superior generalization ability of our model.

Study on the effectiveness of AutoML in detecting cardiovascular disease

  • paper_url: http://arxiv.org/abs/2308.09947
  • repo_url: None
  • paper_authors: T. V. Afanasieva, A. P. Kuzlyakin, A. V. Komolov
  • for: This paper studies the effectiveness of automated machine learning (AutoML) in detecting cardiovascular disease, a leading cause of death among patients with chronic noncommunicable diseases.
  • methods: The study combines five datasets of cardiovascular disease indicators from the UCI Machine Learning Repository and investigates an AutoML model that optimizes the hyperparameters of thirteen basic ML models (KNeighborsUnif, KNeighborsDist, LightGBMXT, LightGBM, RandomForestGini, RandomForestEntr, CatBoost, ExtraTreesGini, ExtraTreesEntr, NeuralNetFastAI, XGBoost, NeuralNetTorch, LightGBMLarge) and includes the most accurate ones in a weighted ensemble, across three data-preprocessing scenarios.
  • results: The structure of the AutoML model depends not only on the efficiency and accuracy of the basic models but also on the data-preprocessing scenario, in particular the normalization technique. Accuracy ranged from 87.41% to 92.3%, with the maximum obtained when normalizing the source data into binary values and the minimum when using the built-in AutoML technique.
    Abstract Cardiovascular diseases are widespread among patients with chronic noncommunicable diseases and are one of the leading causes of death, including in the working age. The article presents the relevance of the development and application of patient-oriented systems, in which machine learning (ML) is a promising technology that allows predicting cardiovascular diseases. Automated machine learning (AutoML) makes it possible to simplify and speed up the process of developing AI/ML applications, which is key in the development of patient-oriented systems by application users, in particular medical specialists. The authors propose a framework for the application of automatic machine learning and three scenarios that allowed for data combining five data sets of cardiovascular disease indicators from the UCI Machine Learning Repository to investigate the effectiveness in detecting this class of diseases. The study investigated one AutoML model that used and optimized the hyperparameters of thirteen basic ML models (KNeighborsUnif, KNeighborsDist, LightGBMXT, LightGBM, RandomForestGini, RandomForestEntr, CatBoost, ExtraTreesGini, ExtraTreesEntr, NeuralNetFastA, XGBoost, NeuralNetTorch, LightGBMLarge) and included the most accurate models in the weighted ensemble. The results of the study showed that the structure of the AutoML model for detecting cardiovascular diseases depends not only on the efficiency and accuracy of the basic models used, but also on the scenarios for preprocessing the initial data, in particular, on the technique of data normalization. The comparative analysis showed that the accuracy of the AutoML model in detecting cardiovascular disease varied in the range from 87.41% to 92.3%, and the maximum accuracy was obtained when normalizing the source data into binary values, and the minimum was obtained when using the built-in AutoML technique.
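  • code sketch: The base-model names above match AutoGluon's defaults, so a comparable pipeline can be sketched in a few lines; the CSV file names, the "target" label column, and the chosen preset are assumptions about the data layout rather than the paper's exact setup.
```python
import pandas as pd
from autogluon.tabular import TabularPredictor

# assumed layout: CSVs of cardiovascular indicators with a binary "target" column
train = pd.read_csv("cardio_train.csv")
test = pd.read_csv("cardio_test.csv")

predictor = TabularPredictor(label="target", eval_metric="accuracy").fit(
    train_data=train,
    presets="best_quality",     # trains the base models and a weighted ensemble
)
print(predictor.leaderboard(test))   # per-model accuracy, incl. WeightedEnsemble
print(predictor.evaluate(test))      # held-out metrics of the best model
```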

Dual Branch Deep Learning Network for Detection and Stage Grading of Diabetic Retinopathy

  • paper_url: http://arxiv.org/abs/2308.09945
  • repo_url: None
  • paper_authors: Hossein Shakibania, Sina Raoufi, Behnam Pourafkham, Hassan Khotanlou, Muharram Mansoorizadeh
  • for: This work proposes a deep learning method for the detection and stage grading of diabetic retinopathy from a single fundus retinal image.
  • methods: The model uses transfer learning, employing two state-of-the-art pre-trained models as feature extractors and fine-tuning them on a new dataset; it is trained on a large multi-center dataset that includes the publicly available APTOS 2019 dataset.
  • results: On APTOS 2019 the model outperforms the established literature: for binary classification it achieves 98.50% accuracy, 99.46% sensitivity, and 97.51% specificity; for stage grading it achieves a quadratic weighted kappa of 93.00%, 89.60% accuracy, 89.60% sensitivity, and 97.72% specificity. The approach can serve as a reliable screening and stage-grading tool with significant potential to enhance clinical decision-making and patient care.
    Abstract Diabetic retinopathy is a severe complication of diabetes that can lead to permanent blindness if not treated promptly. Early and accurate diagnosis of the disease is essential for successful treatment. This paper introduces a deep learning method for the detection and stage grading of diabetic retinopathy, using a single fundus retinal image. Our model utilizes transfer learning, employing two state-of-the-art pre-trained models as feature extractors and fine-tuning them on a new dataset. The proposed model is trained on a large multi-center dataset, including the APTOS 2019 dataset, obtained from publicly available sources. It achieves remarkable performance in diabetic retinopathy detection and stage classification on the APTOS 2019, outperforming the established literature. For binary classification, the proposed approach achieves an accuracy of 98.50%, a sensitivity of 99.46%, and a specificity of 97.51%. In stage grading, it achieves a quadratic weighted kappa of 93.00%, an accuracy of 89.60%, a sensitivity of 89.60%, and a specificity of 97.72%. The proposed approach serves as a reliable screening and stage grading tool for diabetic retinopathy, offering significant potential to enhance clinical decision-making and patient care.
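  • code sketch: A hedged sketch of a dual-branch transfer-learning classifier is shown below: two pre-trained torchvision backbones act as feature extractors and their pooled features are concatenated before a small head. The backbone choices (ResNet-50 and DenseNet-121) and head size are assumptions, not the paper's exact configuration.
```python
import torch
import torch.nn as nn
from torchvision import models

class DualBranchDR(nn.Module):
    """Two pre-trained backbones as feature extractors; their pooled features are
    concatenated and fed to a small classification head (5 DR stages)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.branch_a = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.branch_a.fc = nn.Identity()              # -> 2048-d features
        self.branch_b = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
        self.branch_b.classifier = nn.Identity()      # -> 1024-d features
        self.head = nn.Sequential(
            nn.Linear(2048 + 1024, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes))

    def forward(self, x):                             # x: [batch, 3, 224, 224] fundus images
        feats = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)
        return self.head(feats)

model = DualBranchDR()
logits = model(torch.randn(2, 3, 224, 224))
print(logits.shape)        # torch.Size([2, 5])
```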

On the Robustness of Open-World Test-Time Training: Self-Training with Dynamic Prototype Expansion

  • paper_url: http://arxiv.org/abs/2308.09942
  • repo_url: https://github.com/yushu-li/owttt
  • paper_authors: Yushu Li, Xun Xu, Yongyi Su, Kui Jia
  • for: This work aims to generalize deep learning models to unknown target domain distributions with low latency, in particular open-world test-time training (OWTTT) where the target domain is contaminated with strong out-of-distribution (OOD) data.
  • methods: Building on test-time training/adaptation (TTT/TTA), the work develops adaptive strong-OOD pruning to improve self-training, dynamically expands the prototypes to represent strong OOD samples for better weak/strong OOD separation, and regularizes self-training with distribution alignment.
  • results: The combination achieves state-of-the-art performance on 5 OWTTT benchmarks; code is available at https://github.com/Yushu-Li/OWTTT.
    Abstract Generalizing deep learning models to unknown target domain distribution with low latency has motivated research into test-time training/adaptation (TTT/TTA). Existing approaches often focus on improving test-time training performance under well-curated target domain data. As figured out in this work, many state-of-the-art methods fail to maintain the performance when the target domain is contaminated with strong out-of-distribution (OOD) data, a.k.a. open-world test-time training (OWTTT). The failure is mainly due to the inability to distinguish strong OOD samples from regular weak OOD samples. To improve the robustness of OWTTT we first develop an adaptive strong OOD pruning which improves the efficacy of the self-training TTT method. We further propose a way to dynamically expand the prototypes to represent strong OOD samples for an improved weak/strong OOD data separation. Finally, we regularize self-training with distribution alignment and the combination yields the state-of-the-art performance on 5 OWTTT benchmarks. The code is available at https://github.com/Yushu-Li/OWTTT.

Practical Anomaly Detection over Multivariate Monitoring Metrics for Online Services

  • paper_url: http://arxiv.org/abs/2308.09937
  • repo_url: https://github.com/OpsPAI/CMAnomaly
  • paper_authors: Jinyang Liu, Tianyi Yang, Zhuangbin Chen, Yuxin Su, Cong Feng, Zengyin Yang, Michael R. Lyu
  • for: This paper proposes an anomaly detection framework over multivariate monitoring metrics, which profile the health status of online services, where the dependency between metrics and their historical patterns is critical for prompt and accurate detection.
  • methods: The framework, CMAnomaly, uses a collaborative machine to capture pairwise interactions along the feature and temporal dimensions with linear time complexity, after which cost-effective models leverage both the dependency between monitoring metrics and their historical patterns for anomaly detection.
  • results: Evaluated on public data and industrial data from a large-scale online service system of Huawei Cloud, CMAnomaly achieves an average F1 score of 0.9494, outperforming state-of-the-art baselines by 6.77% to 10.68% while running 10x to 20x faster; the paper also shares experience of deploying CMAnomaly in Huawei Cloud.
    Abstract As modern software systems continue to grow in terms of complexity and volume, anomaly detection on multivariate monitoring metrics, which profile systems' health status, becomes more and more critical and challenging. In particular, the dependency between different metrics and their historical patterns plays a critical role in pursuing prompt and accurate anomaly detection. Existing approaches fall short of industrial needs for being unable to capture such information efficiently. To fill this significant gap, in this paper, we propose CMAnomaly, an anomaly detection framework on multivariate monitoring metrics based on collaborative machine. The proposed collaborative machine is a mechanism to capture the pairwise interactions along with feature and temporal dimensions with linear time complexity. Cost-effective models can then be employed to leverage both the dependency between monitoring metrics and their historical patterns for anomaly detection. The proposed framework is extensively evaluated with both public data and industrial data collected from a large-scale online service system of Huawei Cloud. The experimental results demonstrate that compared with state-of-the-art baseline models, CMAnomaly achieves an average F1 score of 0.9494, outperforming baselines by 6.77% to 10.68%, and runs 10X to 20X faster. Furthermore, we also share our experience of deploying CMAnomaly in Huawei Cloud.

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

  • paper_url: http://arxiv.org/abs/2308.09936
  • repo_url: https://github.com/mlpc-ucsd/bliva
  • paper_authors: Wenbo Hu, Yifan Xu, Yi Li, Weiyue Li, Zeyuan Chen, Zhuowen Tu
  • for: This work addresses visual question answering on text-rich images, a common real-world scenario that existing vision-language models handle poorly.
  • methods: The proposed model, BLIVA, augments InstructBLIP with a Visual Assistant: it keeps InstructBLIP's learned query embeddings and additionally projects encoded patch embeddings directly into the LLM, a technique inspired by LLaVA, helping the model capture intricate details that may be missed during query decoding.
  • results: BLIVA improves over the InstructBLIP baseline by up to 17.76% on the OCR-VQA benchmark and up to 7.9% on the Visual Spatial Reasoning benchmark, and decodes real-world images well regardless of text presence; it is further evaluated on a new dataset of YouTube thumbnails paired with question-answer sets across 13 diverse categories.
    Abstract Vision Language Models (VLMs), which extend Large Language Models (LLM) by incorporating visual understanding capability, have demonstrated significant advancements in addressing open-ended visual question-answering (VQA) tasks. However, these models cannot accurately interpret images infused with text, a common occurrence in real-world scenarios. Standard procedures for extracting information from images often involve learning a fixed set of query embeddings. These embeddings are designed to encapsulate image contexts and are later used as soft prompt inputs in LLMs. Yet, this process is limited to the token count, potentially curtailing the recognition of scenes with text-rich context. To improve upon them, the present study introduces BLIVA: an augmented version of InstructBLIP with Visual Assistant. BLIVA incorporates the query embeddings from InstructBLIP and also directly projects encoded patch embeddings into the LLM, a technique inspired by LLaVA. This approach assists the model to capture intricate details potentially missed during the query decoding process. Empirical evidence demonstrates that our model, BLIVA, significantly enhances performance in processing text-rich VQA benchmarks (up to 17.76\% in OCR-VQA benchmark) and in undertaking typical VQA benchmarks (up to 7.9\% in Visual Spatial Reasoning benchmark), comparing to our baseline InstructBLIP. BLIVA demonstrates significant capability in decoding real-world images, irrespective of text presence. To demonstrate the broad industry applications enabled by BLIVA, we evaluate the model using a new dataset comprising YouTube thumbnails paired with question-answer sets across 13 diverse categories. For researchers interested in further exploration, our code and models are freely accessible at https://github.com/mlpc-ucsd/BLIVA.git

Analyzing Quantization in TVM

  • paper_url: http://arxiv.org/abs/2308.10905
  • repo_url: None
  • paper_authors: Mingfei Guo
  • for: This paper studies the quantization of weight tensors in TVM to reduce inference latency and memory footprint, motivated by the observation that 8-bit quantization in TVM falls short of the expected ~50% of full-precision inference time and is in fact about 2x slower than the non-quantized version.
  • methods: The work investigates the reasons behind the underperformance, assesses the compatibility and optimization opportunities of 8-bit quantization in TVM, and compares optimization techniques for two types of tasks: computation-bound and memory-bound.
  • results: By fixing a bug in graph building and analyzing multiple optimization strategies, the best experiment achieves a 163.88% improvement in inference time over the TVM-compiled baseline for the compute-bound task and 194.98% for the memory-bound task.
    Abstract There has been many papers in academic literature on quantizing weight tensors in deep learning models to reduce inference latency and memory footprint. TVM also has the ability to quantize weights and support low-bit computations. Although quantization is typically expected to improve inference time, in TVM, the performance of 8-bit quantization does not meet the expectations. Typically, when applying 8-bit quantization to a deep learning model, it is usually expected to achieve around 50% of the full-precision inference time. However, in this particular case, not only does the quantized version fail to achieve the desired performance boost, but it actually performs worse, resulting in an inference time that is about 2 times as slow as the non-quantized version. In this project, we thoroughly investigate the reasons behind the underperformance and assess the compatibility and optimization opportunities of 8-bit quantization in TVM. We discuss the optimization of two different types of tasks: computation-bound and memory-bound, and provide a detailed comparison of various optimization techniques in TVM. Through the identification of performance issues, we have successfully improved quantization by addressing a bug in graph building. Furthermore, we analyze multiple optimization strategies to achieve the optimal quantization result. The best experiment achieves 163.88% improvement compared with the TVM compiled baseline in inference time for the compute-bound task and 194.98% for the memory-bound task.
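  • code sketch: TVM exposes 8-bit quantization through relay.quantize; a minimal invocation on a stand-in workload might look like the snippet below. The calibration mode and global scale are illustrative defaults, and the available qconfig options should be checked against the installed TVM version.
```python
from tvm import relay
from tvm.relay import testing

# a stand-in workload; the paper profiles real models compiled for their target
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)

# 8-bit quantization pass; calibrate_mode/global_scale are illustrative defaults
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    qmod = relay.quantize.quantize(mod, params)

lib = relay.build(qmod, target="llvm")    # compile the quantized graph
print(qmod)                                # inspect the rewritten (quantized) IR
```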

East: Efficient and Accurate Secure Transformer Framework for Inference

  • paper_url: http://arxiv.org/abs/2308.09923
  • repo_url: None
  • paper_authors: Yuanchao Ding, Hua Guo, Yewei Guan, Weixin Liu, Jiarong Huo, Zhenyu Guan, Xiyong Zhang
  • for: To enable efficient and accurate privacy-preserving Transformer inference.
  • methods: The paper proposes a framework named "East", featuring a new oblivious piecewise polynomial evaluation algorithm applied to the activation functions, carefully designed secure protocols for softmax and layer normalization, and several further optimizations for overall efficiency.
  • results: Applied to BERT, inference accuracy remains consistent with plaintext inference without fine-tuning; compared to Iron, East achieves about 1.8x lower communication within 1.2x lower runtime, and the new activation-function evaluation reduces the runtime and communication of GELU by over 1.5x and 2.5x compared to prior art.
    Abstract Transformer has been successfully used in practical applications, such as ChatGPT, due to its powerful advantages. However, users' input is leaked to the model provider during the service. With people's attention to privacy, privacy-preserving Transformer inference is on the demand of such services. Secure protocols for non-linear functions are crucial in privacy-preserving Transformer inference, which are not well studied. Thus, designing practical secure protocols for non-linear functions is hard but significant to model performance. In this work, we propose a framework \emph{East} to enable efficient and accurate secure Transformer inference. Firstly, we propose a new oblivious piecewise polynomial evaluation algorithm and apply it to the activation functions, which reduces the runtime and communication of GELU by over 1.5$\times$ and 2.5$\times$, compared to prior arts. Secondly, the secure protocols for softmax and layer normalization are carefully designed to faithfully maintain the desired functionality. Thirdly, several optimizations are conducted in detail to enhance the overall efficiency. We applied \emph{East} to BERT and the results show that the inference accuracy remains consistent with the plaintext inference without fine-tuning. Compared to Iron, we achieve about 1.8$\times$ lower communication within 1.2$\times$ lower runtime.
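The secure protocol itself is beyond a short snippet, but the plaintext idea behind a piecewise polynomial GELU, fit low-degree polynomials on a few intervals and clamp the tails, can be sketched as follows. The interval boundaries, polynomial degree, and least-squares fitting here are illustrative assumptions, not the coefficients East uses.

```python
import numpy as np
from math import erf

def gelu(x: np.ndarray) -> np.ndarray:
    return 0.5 * x * (1.0 + np.vectorize(erf)(x / np.sqrt(2.0)))

def fit_piecewise_gelu(breaks=(-4.0, -1.5, 1.5, 4.0), degree=3):
    """Fit one low-degree polynomial per interval; outside the breaks GELU is ~0 or ~x."""
    pieces = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        xs = np.linspace(lo, hi, 200)
        pieces.append((lo, hi, np.polyfit(xs, gelu(xs), degree)))
    return pieces

def piecewise_gelu(x, pieces):
    y = np.where(x < pieces[0][0], 0.0, x)   # tails: 0 on the left, identity on the right
    for lo, hi, coeffs in pieces:
        mask = (x >= lo) & (x < hi)
        y = np.where(mask, np.polyval(coeffs, x), y)
    return y

pieces = fit_piecewise_gelu()
x = np.linspace(-6, 6, 1000)
print("max |GELU - piecewise poly|:", np.abs(gelu(x) - piecewise_gelu(x, pieces)).max())
```

East's contribution is evaluating such pieces obliviously, without revealing which interval an input falls into, which a plaintext sketch like this does not capture.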

EGANS: Evolutionary Generative Adversarial Network Search for Zero-Shot Learning

  • paper_url: http://arxiv.org/abs/2308.09915
  • repo_url: None
  • paper_authors: Shiming Chen, Shihuang Chen, Wenjin Hou, Weiping Ding, Xinge You
  • for: improving zero-shot learning (ZSL) by using generative models (e.g., generative adversarial networks, GANs) to synthesize visual samples and boost ZSL performance
  • methods: proposes an evolutionary generative adversarial network search (EGANS) that performs cooperative, adversarial neural architecture search to obtain a generator and discriminator with good adaptation and stability
  • results: on the standard CUB, SUN, AWA2, and FLO datasets, EGANS consistently improves existing generative ZSL methods, indicating the promise of evolutionary architecture search for generative ZSL
    Abstract Zero-shot learning (ZSL) aims to recognize novel classes for which no samples can be collected for training a prediction model. Accordingly, generative models (e.g., generative adversarial networks (GANs)) are typically used to synthesize visual samples conditioned on the class semantic vectors, and have achieved remarkable progress for ZSL. However, existing GAN-based generative ZSL methods rely on hand-crafted models, which cannot adapt to various datasets/scenarios and suffer from model instability. To alleviate these challenges, we propose evolutionary generative adversarial network search (termed EGANS) to automatically design the generative network with good adaptation and stability, enabling reliable visual feature sample synthesis for advancing ZSL. Specifically, we adopt cooperative dual evolution to conduct a neural architecture search for both generator and discriminator under a unified evolutionary adversarial framework. EGANS is learned in two stages: evolution generator architecture search and evolution discriminator architecture search. During the evolution generator architecture search, we adopt a many-to-one adversarial training strategy to evolutionarily search for the optimal generator. The optimal generator is then used to search for the optimal discriminator in the evolution discriminator architecture search with a similar evolutionary search algorithm. Once the optimal generator and discriminator are found, we plug them into various generative ZSL baselines for ZSL classification. Extensive experiments show that EGANS consistently improves existing generative ZSL methods on the standard CUB, SUN, AWA2 and FLO datasets. The significant performance gains indicate that evolutionary neural architecture search explores a virgin field in ZSL.
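The abstract does not spell out the search operators, so the sketch below is only a generic evolutionary-search skeleton of the kind EGANS builds on: a population of architecture encodings is mutated, each candidate is scored, and the fittest survive. The encoding, mutation rule, and fitness function are placeholders, not the paper's operators.

```python
import random

OPS = ("conv3x3", "conv1x1", "skip", "mlp")

def random_arch(depth=4):
    return [random.choice(OPS) for _ in range(depth)]

def mutate(arch, p=0.3):
    return [random.choice(OPS) if random.random() < p else op for op in arch]

def fitness(arch):
    """Placeholder score: in EGANS this would be the validation quality of a generator
    (or discriminator) instantiated from `arch` after short adversarial training."""
    return -abs(len(set(arch)) - 3) + random.random() * 0.1

def evolve(pop_size=8, generations=5):
    population = [random_arch() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]                       # truncation selection
        population = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(population, key=fitness)

print("best architecture encoding:", evolve())
```

EGANS runs two such searches in cascade: first for the generator under a many-to-one adversarial training strategy, then for the discriminator with the found generator fixed.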

Never Explore Repeatedly in Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.09909
  • repo_url: None
  • paper_authors: Chenghao Li, Tonghan Wang, Chongjie Zhang, Qianchuan Zhao
  • for: improving exploration in multi-agent reinforcement learning with intrinsic motivation
  • methods: proposes a dynamic reward scaling approach that stabilizes intrinsic rewards in already-explored regions to promote broader exploration
  • results: experiments show improved performance on Google Research Football and StarCraft II micromanagement tasks, especially in sparse-reward settings
    Abstract In the realm of multi-agent reinforcement learning, intrinsic motivations have emerged as a pivotal tool for exploration. While the computation of many intrinsic rewards relies on estimating variational posteriors using neural network approximators, a notable challenge has surfaced due to the limited expressive capability of these neural statistics approximators. We pinpoint this challenge as the "revisitation" issue, where agents recurrently explore confined areas of the task space. To combat this, we propose a dynamic reward scaling approach. This method is crafted to stabilize the significant fluctuations in intrinsic rewards in previously explored areas and promote broader exploration, effectively curbing the revisitation phenomenon. Our experimental findings underscore the efficacy of our approach, showcasing enhanced performance in demanding environments like Google Research Football and StarCraft II micromanagement tasks, especially in sparse reward settings.
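The paper describes its dynamic reward scaling only at a high level, so the snippet below shows one simple way to damp intrinsic rewards in already-visited regions: keep visitation counts over a discretized state space and shrink the bonus where counts are high. The discretization and the 1/sqrt(count) schedule are illustrative assumptions, not the authors' exact rule.

```python
from collections import defaultdict
import math

class ScaledIntrinsicReward:
    """Damp intrinsic rewards in frequently revisited (discretized) states."""

    def __init__(self, bin_size: float = 0.5):
        self.bin_size = bin_size
        self.counts = defaultdict(int)

    def _key(self, state):
        return tuple(int(s // self.bin_size) for s in state)

    def __call__(self, state, raw_intrinsic: float) -> float:
        key = self._key(state)
        self.counts[key] += 1
        return raw_intrinsic / math.sqrt(self.counts[key])  # heavily explored -> small bonus

scaler = ScaledIntrinsicReward()
for _ in range(3):
    print(scaler(state=(0.1, 0.2), raw_intrinsic=1.0))      # 1.0, 0.707..., 0.577...
```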

Imputing Brain Measurements Across Data Sets via Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.09907
  • repo_url: None
  • paper_authors: Yixin Wang, Wei Peng, Susan F. Tapert, Qingyu Zhao, Kilian M. Pohl
  • for: The paper aims to address the issue of missing measurements in publicly available structural MRI data sets, specifically the curvature scores computed by Freesurfer, by proposing a deep learning-based imputation method called Demographic Aware Graph-based Imputation (DAGI).
  • methods: The DAGI method uses a graph neural network (GNN) to model the dependencies between brain Region of Interests (ROIs) and accounts for demographic differences in brain measurements by feeding the graph encoding into a parallel architecture that simultaneously optimizes a graph decoder to impute values and a classifier to predict demographic factors.
  • results: The proposed DAGI method is tested on imputing missing Freesurfer measurements of the Adolescent Brain Cognitive Development (ABCD) Study data set (N=3760) by training the predictor on publicly released data from the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA, N=540).
    Abstract Publicly available data sets of structural MRIs might not contain specific measurements of brain Regions of Interests (ROIs) that are important for training machine learning models. For example, the curvature scores computed by Freesurfer are not released by the Adolescent Brain Cognitive Development (ABCD) Study. One can address this issue by simply reapplying Freesurfer to the data set. However, this approach is generally computationally and labor intensive (e.g., requiring quality control). An alternative is to impute the missing measurements via a deep learning approach. However, the state-of-the-art is designed to estimate randomly missing values rather than entire measurements. We therefore propose to re-frame the imputation problem as a prediction task on another (public) data set that contains the missing measurements and shares some ROI measurements with the data sets of interest. A deep learning model is then trained to predict the missing measurements from the shared ones and afterwards is applied to the other data sets. Our proposed algorithm models the dependencies between ROI measurements via a graph neural network (GNN) and accounts for demographic differences in brain measurements (e.g. sex) by feeding the graph encoding into a parallel architecture. The architecture simultaneously optimizes a graph decoder to impute values and a classifier in predicting demographic factors. We test the approach, called Demographic Aware Graph-based Imputation (DAGI), on imputing those missing Freesurfer measurements of ABCD (N=3760) by training the predictor on those publicly released by the National Consortium on Alcohol and Neurodevelopment in Adolescence (NCANDA, N=540)...
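DAGI's parallel architecture, a graph encoder over ROI nodes feeding both an imputation decoder and a demographic classifier, can be sketched in plain PyTorch with a normalized-adjacency message-passing layer. The layer sizes, the single demographic factor (sex), the toy adjacency, and the loss weighting are all assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SimpleGNNLayer(nn.Module):
    """One message-passing step: H' = ReLU(A_norm @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, a_norm):                          # h: [B, R, in_dim], a_norm: [R, R]
        return torch.relu(torch.einsum("ij,bjf->bif", a_norm, self.lin(h)))

class DemographicAwareImputer(nn.Module):
    def __init__(self, n_rois, hidden=32):
        super().__init__()
        self.enc1 = SimpleGNNLayer(1, hidden)
        self.enc2 = SimpleGNNLayer(hidden, hidden)
        self.decoder = nn.Linear(hidden, 1)                # imputes one missing score per ROI
        self.classifier = nn.Linear(n_rois * hidden, 2)    # parallel head, e.g. sex

    def forward(self, roi_measures, a_norm):               # roi_measures: [B, R]
        h = self.enc2(self.enc1(roi_measures.unsqueeze(-1), a_norm), a_norm)
        imputed = self.decoder(h).squeeze(-1)              # [B, R]
        demo_logits = self.classifier(h.flatten(1))        # [B, 2]
        return imputed, demo_logits

B, R = 8, 34                                   # batch of subjects, number of ROIs (toy sizes)
a = torch.eye(R) + 0.1 * torch.rand(R, R)      # toy ROI adjacency
a_norm = a / a.sum(dim=1, keepdim=True)        # row-normalize
model = DemographicAwareImputer(n_rois=R)
imputed, demo = model(torch.randn(B, R), a_norm)
loss = nn.functional.mse_loss(imputed, torch.randn(B, R)) \
     + nn.functional.cross_entropy(demo, torch.randint(0, 2, (B,)))
loss.backward()
print(imputed.shape, demo.shape)
```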

DPMAC: Differentially Private Communication for Cooperative Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.09902
  • repo_url: https://github.com/CANVOLCANO/DPMAC
  • paper_authors: Canzhe Zhao, Yanjie Ze, Jing Dong, Baoxiang Wang, Shuai Li
  • for: preventing leakage of sensitive information during communication in multi-agent reinforcement learning
  • methods: equips each agent with a local stochastic message sender carrying a rigorous (ε, δ)-differential privacy guarantee, which automatically adjusts the learned message distribution
  • results: experiments show a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios
    Abstract Communication lays the foundation for cooperation in human society and in multi-agent reinforcement learning (MARL). Humans also desire to maintain their privacy when communicating with others, yet such privacy concern has not been considered in existing works in MARL. To this end, we propose the \textit{differentially private multi-agent communication} (DPMAC) algorithm, which protects the sensitive information of individual agents by equipping each agent with a local message sender with rigorous $(\epsilon, \delta)$-differential privacy (DP) guarantee. In contrast to directly perturbing the messages with predefined DP noise as commonly done in privacy-preserving scenarios, we adopt a stochastic message sender for each agent respectively and incorporate the DP requirement into the sender, which automatically adjusts the learned message distribution to alleviate the instability caused by DP noise. Further, we prove the existence of a Nash equilibrium in cooperative MARL with privacy-preserving communication, which suggests that this problem is game-theoretically learnable. Extensive experiments demonstrate a clear advantage of DPMAC over baseline methods in privacy-preserving scenarios.
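As background for the (ε, δ)-DP guarantee DPMAC targets, the snippet below shows the classical Gaussian mechanism that directly perturbs a message vector, which is the kind of predefined-noise approach the paper argues against in favor of a learned stochastic sender. The L2 sensitivity value is an assumption of the example; the σ formula is the standard bound for ε < 1.

```python
import numpy as np

def gaussian_mechanism(message: np.ndarray, sensitivity: float,
                       epsilon: float, delta: float, rng=None) -> np.ndarray:
    """Release `message` with (epsilon, delta)-DP by adding calibrated Gaussian noise.
    Uses the classical bound sigma >= sensitivity * sqrt(2 ln(1.25/delta)) / epsilon."""
    rng = rng or np.random.default_rng()
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return message + rng.normal(0.0, sigma, size=message.shape)

msg = np.array([0.4, -1.2, 0.7])               # an agent's message embedding (toy values)
noisy = gaussian_mechanism(msg, sensitivity=1.0, epsilon=0.5, delta=1e-5)
print(noisy)
```

DPMAC instead folds the DP requirement into each agent's stochastic message sender, so the message distribution adapts during learning rather than being destabilized by fixed injected noise.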

Contrastive Learning-based Imputation-Prediction Networks for In-hospital Mortality Risk Modeling using EHRs

  • paper_url: http://arxiv.org/abs/2308.09896
  • repo_url: https://github.com/liulab1356/CL-ImpPreNet
  • paper_authors: Yuxi Liu, Zhenhao Zhang, Shaowen Qin, Flora D. Salim, Antonio Jimeno Yepes
  • for: predicting in-hospital mortality risk from electronic health records (EHRs) to give clinicians early warning of a patient's condition and enable timely interventions
  • methods: proposes a contrastive learning-based imputation-prediction network that introduces graph analysis-based patient stratification to pool information from similar patients for missing-value imputation, and integrates contrastive learning into the architecture to strengthen patient representation learning and predictive performance
  • results: on two real-world EHR datasets, the method outperforms state-of-the-art approaches on both the imputation and prediction tasks
    Abstract Predicting the risk of in-hospital mortality from electronic health records (EHRs) has received considerable attention. Such predictions will provide early warning of a patient's health condition to healthcare professionals so that timely interventions can be taken. This prediction task is challenging since EHR data are intrinsically irregular, with not only many missing values but also varying time intervals between medical records. Existing approaches focus on exploiting the variable correlations in patient medical records to impute missing values and establishing time-decay mechanisms to deal with such irregularity. This paper presents a novel contrastive learning-based imputation-prediction network for predicting in-hospital mortality risks using EHR data. Our approach introduces graph analysis-based patient stratification modeling in the imputation process to group similar patients. This allows information of similar patients only to be used, in addition to personal contextual information, for missing value imputation. Moreover, our approach can integrate contrastive learning into the proposed network architecture to enhance patient representation learning and predictive performance on the classification task. Experiments on two real-world EHR datasets show that our approach outperforms the state-of-the-art approaches in both imputation and prediction tasks.
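The abstract does not specify the exact contrastive objective, so the sketch below shows the common InfoNCE form one could integrate into such an imputation-prediction network: two views (or similar-patient pairs) of each patient representation are pulled together and pushed away from other patients in the batch. The temperature and pairing scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1[i] and z2[i] are two views of patient i; other rows in the batch serve as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature             # [B, B] cosine similarities
    targets = torch.arange(z1.size(0))             # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(16, 64), torch.randn(16, 64)  # patient representations from two views
print(info_nce(z1, z2).item())
```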

Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs

  • paper_url: http://arxiv.org/abs/2308.09895
  • repo_url: None
  • paper_authors: Federico Cassano, John Gouwar, Francesca Lucchetti, Claire Schlesinger, Carolyn Jane Anderson, Michael Greenberg, Abhinav Jangda, Arjun Guha
  • for: boosting the performance of Code LLMs on low-resource programming languages
  • methods: generates high-quality semi-synthetic training datasets for low-resource languages by translating data from high-resource languages, then fine-tunes and evaluates Code LLMs on them
  • results: the MultiPL-T approach produces tens of thousands of validated training items and yields state-of-the-art performance on benchmark problems for Lua, Racket, and OCaml
    Abstract Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as a building block for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming languages. Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages, like OCaml and Racket. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. Our approach, called MultiPL-T, translates training data from high-resource languages into training data for low-resource languages. We apply our approach to generate tens of thousands of new, validated training items for Racket, OCaml, and Lua from Python. Moreover, we use an open dataset (The Stack) and model (StarCoderBase), which allow us to decontaminate benchmarks and train models on this data without violating the model license. With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase that achieve state-of-the-art performance for Racket, OCaml, and Lua on benchmark problems. For Lua, our fine-tuned model achieves the same performance as StarCoderBase as Python -- a very high-resource language -- on the MultiPL-E benchmarks. For Racket and OCaml, we double their performance on MultiPL-E, bringing their performance close to higher-resource languages such as Ruby and C#.

Utilizing Semantic Textual Similarity for Clinical Survey Data Feature Selection

  • paper_url: http://arxiv.org/abs/2308.09892
  • repo_url: https://github.com/bcwarner/sts-select
  • paper_authors: Benjamin C. Warner, Ziqi Xu, Simon Haroutounian, Thomas Kannampallil, Chenyang Lu
  • for: selecting an effective feature subset from clinical survey data by relating the textual names of features to the target outcome
  • methods: uses language models (LMs) to compute semantic textual similarity (STS) scores between feature names and target names, and applies these scores for feature selection
  • results: features selected with STS yield higher-performing models than traditional feature selection algorithms
    Abstract Survey data can contain a high number of features while having a comparatively low quantity of examples. Machine learning models that attempt to predict outcomes from survey data under these conditions can overfit and result in poor generalizability. One remedy to this issue is feature selection, which attempts to select an optimal subset of features to learn upon. A relatively unexplored source of information in the feature selection process is the usage of textual names of features, which may be semantically indicative of which features are relevant to a target outcome. The relationships between feature names and target names can be evaluated using language models (LMs) to produce semantic textual similarity (STS) scores, which can then be used to select features. We examine the performance using STS to select features directly and in the minimal-redundancy-maximal-relevance (mRMR) algorithm. The performance of STS as a feature selection metric is evaluated against preliminary survey data collected as a part of a clinical study on persistent post-surgical pain (PPSP). The results suggest that features selected with STS can result in higher performance models compared to traditional feature selection algorithms.
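A minimal version of STS-driven feature selection, scoring each survey feature by the similarity between the embedding of its textual name and that of the target's name and keeping the top-k, can be sketched as below. The encoder choice is a stand-in assumption (the paper uses language models fine-tuned for STS, with the sts-select repository as the authoritative implementation), and the example feature names are made up.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available; any STS-capable encoder works

def select_features_by_sts(feature_names, target_name, k=5,
                           model_name="all-MiniLM-L6-v2"):
    model = SentenceTransformer(model_name)
    feats = model.encode(feature_names)                  # [F, d]
    target = model.encode([target_name])[0]              # [d]
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    target = target / np.linalg.norm(target)
    scores = feats @ target                              # cosine similarity as an STS proxy
    top = np.argsort(scores)[::-1][:k]
    return [(feature_names[i], float(scores[i])) for i in top]

features = ["average daily pain intensity", "opioid use at discharge",
            "years of education", "preferred contact method"]
print(select_features_by_sts(features, "persistent post-surgical pain", k=2))
```

The paper also plugs such STS scores into the mRMR algorithm rather than only ranking features directly, which is the variant compared against purely statistical selectors.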

Inductive-bias Learning: Generating Code Models with Large Language Model

  • paper_url: http://arxiv.org/abs/2308.09890
  • repo_url: https://github.com/fuyu-quant/iblm
  • paper_authors: Toma Tanaka, Naofumi Emoto, Tsukasa Yumibayashi
  • for: proposing a new learning method, Inductive-Bias Learning (IBL), that combines in-context learning (ICL) with code generation to achieve accurate inference together with readable, explainable code
  • methods: inputs training data into the prompt and, from the contextual understanding, generates a "Code Model" with the structure needed for inference
  • results: the generated Code Models reach predictive accuracy comparable to, and in some cases surpassing, ICL and representative machine learning models, while remaining readable and explainable; the IBL code is open source at https://github.com/fuyu-quant/IBLM
    Abstract Large Language Models (LLMs) have been attracting attention due to an ability called in-context learning (ICL). With ICL, it is possible to achieve highly accurate inference based on rules "in the context" by merely inputting training data into the prompt, without updating the parameters of the LLM. Although ICL is a developing field with many unanswered questions, the LLM itself serves as the inference model, seemingly realizing inference without explicitly indicating an "inductive bias". Code generation is another highlighted application of LLMs: its accuracy has improved dramatically, enabling even non-engineers to generate code for desired tasks by crafting appropriate prompts. In this paper, we propose a novel "learning" method called "Inductive-Bias Learning (IBL)", which combines the techniques of ICL and code generation. The idea of IBL is straightforward. Like ICL, IBL inputs training data into the prompt and outputs code with the structure necessary for inference (which we refer to as a "Code Model") from a "contextual understanding". Despite being a seemingly simple approach, IBL combines the "inference without explicit inductive bias" property of ICL with the "readability and explainability" of code generation. Surprisingly, the generated Code Models have been found to achieve predictive accuracy comparable to, and in some cases surpassing, ICL and representative machine learning models. Our IBL code is open source: https://github.com/fuyu-quant/IBLM

DUAW: Data-free Universal Adversarial Watermark against Stable Diffusion Customization

  • paper_url: http://arxiv.org/abs/2308.09889
  • repo_url: None
  • paper_authors: Xiaoyu Ye, Hao Huang, Jiaqi An, Yongtao Wang
  • for: protecting copyrighted images from various Stable Diffusion customization methods that could plagiarize specific styles or subjects
  • methods: crafts an invisible, data-free universal adversarial watermark (DUAW) that disrupts the variational autoencoder during SD customization, trained on synthetic images so that copyrighted images never need to be handled directly
  • results: DUAW effectively distorts images generated by customized SD models, making the distortion evident to both human observers and a simple classifier
    Abstract Stable Diffusion (SD) customization approaches enable users to personalize SD model outputs, greatly enhancing the flexibility and diversity of AI art. However, they also allow individuals to plagiarize specific styles or subjects from copyrighted images, which raises significant concerns about potential copyright infringement. To address this issue, we propose an invisible data-free universal adversarial watermark (DUAW), aiming to protect a myriad of copyrighted images from different customization approaches across various versions of SD models. First, DUAW is designed to disrupt the variational autoencoder during SD customization. Second, DUAW operates in a data-free context, where it is trained on synthetic images produced by a Large Language Model (LLM) and a pretrained SD model. This approach circumvents the necessity of directly handling copyrighted images, thereby preserving their confidentiality. Once crafted, DUAW can be imperceptibly integrated into massive copyrighted images, serving as a protective measure by inducing significant distortions in the images generated by customized SD models. Experimental results demonstrate that DUAW can effectively distort the outputs of fine-tuned SD models, rendering them discernible to both human observers and a simple classifier.

On Estimating the Gradient of the Expected Information Gain in Bayesian Experimental Design

  • paper_url: http://arxiv.org/abs/2308.09888
  • repo_url: https://github.com/ziq-ao/GradEIG
  • paper_authors: Ziqiao Ao, Jinglai Li
  • for: improving experimental conditions for Bayesian inference by optimizing the expected information gain (EIG)
  • methods: proposes two estimators of the EIG gradient: UEEG-MCMC, which uses posterior samples generated by MCMC, and BEEG-AP, which emphasizes simulation efficiency by repeatedly reusing parameter samples
  • results: theoretical analysis and numerical studies show that UEEG-MCMC is robust with respect to the actual EIG value, while BEEG-AP is more efficient when the EIG to be optimized is small; both outperform several popular benchmarks
    Abstract Bayesian Experimental Design (BED), which aims to find the optimal experimental conditions for Bayesian inference, is usually posed as optimizing the expected information gain (EIG). Gradient information is often needed for efficient EIG optimization, and as a result the ability to estimate the gradient of EIG is essential for BED problems. The primary goal of this work is to develop methods for estimating the gradient of EIG, which, combined with stochastic gradient descent algorithms, result in efficient optimization of EIG. Specifically, we first introduce a posterior expected representation of the EIG gradient with respect to the design variables. Based on this, we propose two methods for estimating the EIG gradient: UEEG-MCMC, which leverages posterior samples generated through Markov Chain Monte Carlo (MCMC) to estimate the EIG gradient, and BEEG-AP, which focuses on achieving high simulation efficiency by repeatedly using parameter samples. Theoretical analysis and numerical studies illustrate that UEEG-MCMC is robust against the actual EIG value, while BEEG-AP is more efficient when the EIG value to be optimized is small. Moreover, both methods show superior performance compared to several popular benchmarks in our numerical experiments.
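For reference, the quantity being optimized and its standard nested Monte Carlo estimate can be written compactly; the notation (θ for parameters, y for the observation, d for the design) is standard for BED and consistent with the abstract, while the paper's actual gradient estimators (UEEG-MCMC, BEEG-AP) are not reproduced here.

```latex
% Expected information gain (EIG) of a design d
\mathrm{EIG}(d) = \mathbb{E}_{p(\theta)\,p(y\mid\theta,d)}\!\left[\log p(y\mid\theta,d) - \log p(y\mid d)\right],
\qquad p(y\mid d) = \int p(y\mid\theta,d)\, p(\theta)\, d\theta .

% Standard nested Monte Carlo estimate of the EIG
% (the objective whose gradient UEEG-MCMC and BEEG-AP estimate in different ways)
\widehat{\mathrm{EIG}}(d) = \frac{1}{N}\sum_{n=1}^{N}\left[\log p\!\left(y_n \mid \theta_{n,0}, d\right)
  - \log \frac{1}{M}\sum_{m=1}^{M} p\!\left(y_n \mid \theta_{n,m}, d\right)\right],
\quad \theta_{n,m} \sim p(\theta),\; y_n \sim p\!\left(y \mid \theta_{n,0}, d\right).
```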

Calibrating Uncertainty for Semi-Supervised Crowd Counting

  • paper_url: http://arxiv.org/abs/2308.09887
  • repo_url: None
  • paper_authors: Chen Li, Xiaoling Hu, Shahira Abousamra, Chao Chen
  • for: reliable crowd counting under semi-supervised training
  • methods: calibrates model uncertainty through a supervised surrogate function, and proposes a matching-based patch-wise surrogate to better approximate uncertainty for crowd counting
  • results: the method produces reliable uncertainty estimates and high-quality pseudo-labels, achieving state-of-the-art performance in semi-supervised crowd counting
    Abstract Semi-supervised crowd counting is an important yet challenging task. A popular approach is to iteratively generate pseudo-labels for unlabeled data and add them to the training set. The key is to use uncertainty to select reliable pseudo-labels. In this paper, we propose a novel method to calibrate model uncertainty for crowd counting. Our method takes a supervised uncertainty estimation strategy to train the model through a surrogate function. This ensures the uncertainty is well controlled throughout the training. We propose a matching-based patch-wise surrogate function to better approximate uncertainty for crowd counting tasks. The proposed method pays a sufficient amount of attention to details, while maintaining a proper granularity. Altogether our method is able to generate reliable uncertainty estimation, high quality pseudolabels, and achieve state-of-the-art performance in semisupervised crowd counting.

A Transformer-based Framework For Multi-variate Time Series: A Remaining Useful Life Prediction Use Case

  • paper_url: http://arxiv.org/abs/2308.09884
  • repo_url: None
  • paper_authors: Oluwaseyi Ogunfowora, Homayoun Najjaran
  • for: proposing an encoder-transformer-based framework for multivariate time series prediction, applied to remaining useful life (RUL) estimation of machines
  • methods: adapts transformer models from the natural language domain to time series through three model-specific experiments, and introduces a novel expanding-window method so the model observes the machine's early life stages and full degradation path
  • results: on the test data, the proposed encoder-transformer outperforms 13 state-of-the-art (SOTA) models with an average performance improvement of 137.65%
    Abstract In recent times, Large Language Models (LLMs) have captured a global spotlight and revolutionized the field of Natural Language Processing. One of the factors attributed to the effectiveness of LLMs is the model architecture used for training: transformers. Transformer models excel at capturing contextual features in sequential data; since time series data are sequential, transformer models can be leveraged for more efficient time series prediction. The field of prognostics is vital to system health management and proper maintenance planning. A reliable estimation of the remaining useful life (RUL) of machines holds the potential for substantial cost savings: it avoids abrupt machine failures, maximizes equipment usage, and serves as a decision support system (DSS). This work proposes an encoder-transformer architecture-based framework for multivariate time series prediction in a prognostics use case. We validated the effectiveness of the proposed framework on all four sets of the C-MAPSS benchmark dataset for the remaining useful life prediction task. To effectively transfer the knowledge and application of transformers from the natural language domain to time series, three model-specific experiments were conducted. Also, to make the model aware of the initial stages of the machine's life and its degradation path, a novel expanding window method was proposed for the first time in this work; it was compared with the sliding window method and led to a large improvement in the performance of the encoder-transformer model. Finally, the performance of the proposed encoder-transformer model was evaluated on the test dataset and compared with the results of 13 other state-of-the-art (SOTA) models in the literature; it outperformed them all with an average performance increase of 137.65% over the next best model across all the datasets.
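The expanding-window idea, letting the model always see the sequence from the start of the unit's life up to the current cycle rather than a fixed-length recent slice, is easy to state in code. The sketch below generates both kinds of training samples from a single run-to-failure sequence; the number of cycles, the feature dimensionality, and the minimum window length are assumptions for the example.

```python
import numpy as np

def sliding_windows(seq: np.ndarray, width: int):
    """Fixed-width windows: the model never sees the early-life portion once t > width."""
    return [seq[t - width:t] for t in range(width, len(seq) + 1)]

def expanding_windows(seq: np.ndarray, min_len: int = 5):
    """Each sample starts at cycle 0, so every window contains the full degradation path so far."""
    return [seq[:t] for t in range(min_len, len(seq) + 1)]

run = np.random.randn(30, 14)                 # one engine: 30 cycles x 14 sensor channels (toy)
print(len(sliding_windows(run, width=10)))    # 21 fixed-length samples
print(len(expanding_windows(run)))            # 26 variable-length samples
print(expanding_windows(run)[0].shape, expanding_windows(run)[-1].shape)
```

Variable-length samples are then padded or masked for the encoder-transformer; per the abstract, exposing the model to the initial life stage in this way is what produced the large gain over the sliding-window baseline.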

Flamingo: Multi-Round Single-Server Secure Aggregation with Applications to Private Federated Learning

  • paper_url: http://arxiv.org/abs/2308.09883
  • repo_url: https://github.com/eniac/flamingo
  • paper_authors: Yiping Ma, Jess Woods, Sebastian Angel, Antigoni Polychroniadou, Tal Rabin
  • for: secure aggregation of client data in federated learning
  • methods: introduces a new lightweight dropout-resilience protocol so the server can still obtain a meaningful result when clients leave mid-sum, and a new way for clients to locally choose their neighborhoods
  • results: Flamingo can securely train neural networks on the (Extended) MNIST and CIFAR-100 datasets with no loss in accuracy relative to a non-private federated learning system, while substantially reducing end-to-end runtime compared with prior secure-aggregation protocols
    Abstract This paper introduces Flamingo, a system for secure aggregation of data across a large set of clients. In secure aggregation, a server sums up the private inputs of clients and obtains the result without learning anything about the individual inputs beyond what is implied by the final sum. Flamingo focuses on the multi-round setting found in federated learning in which many consecutive summations (averages) of model weights are performed to derive a good model. Previous protocols, such as Bell et al. (CCS '20), have been designed for a single round and are adapted to the federated learning setting by repeating the protocol multiple times. Flamingo eliminates the need for the per-round setup of previous protocols, and has a new lightweight dropout resilience protocol to ensure that if clients leave in the middle of a sum the server can still obtain a meaningful result. Furthermore, Flamingo introduces a new way to locally choose the so-called client neighborhood introduced by Bell et al. These techniques help Flamingo reduce the number of interactions between clients and the server, resulting in a significant reduction in the end-to-end runtime for a full training session over prior work. We implement and evaluate Flamingo and show that it can securely train a neural network on the (Extended) MNIST and CIFAR-100 datasets, and the model converges without a loss in accuracy, compared to a non-private federated learning system.

Generative Adversarial Networks Unlearning

  • paper_url: http://arxiv.org/abs/2308.09881
  • repo_url: None
  • paper_authors: Hui Sun, Tianqing Zhu, Wenhan Chang, Wanlei Zhou
  • for: enabling machine unlearning in generative adversarial networks (GANs), where the generator-discriminator architecture makes erasing training data risk disrupting the latent space and degrading the model
  • methods: introduces a substitution mechanism and a fake label to handle generator and discriminator unlearning, and builds on them a cascaded unlearning approach for both item and class unlearning
  • results: comprehensive evaluation on MNIST and CIFAR-10 shows greatly improved unlearning efficiency, reducing the required time by up to 185x and 284x compared with retraining from scratch; the small performance degradation after unlearning is negligible for a minimal number of images (e.g., 64) and does not harm downstream tasks such as classification
    Abstract As machine learning continues to develop, and data misuse scandals become more prevalent, individuals are becoming increasingly concerned about their personal information and are advocating for the right to remove their data. Machine unlearning has emerged as a solution to erase training data from trained machine learning models. Despite its success in classifiers, research on Generative Adversarial Networks (GANs) is limited due to their unique architecture, including a generator and a discriminator. One challenge pertains to generator unlearning, as the process could potentially disrupt the continuity and completeness of the latent space. This disruption might consequently diminish the model's effectiveness after unlearning. Another challenge is how to define a criterion that the discriminator should perform for the unlearning images. In this paper, we introduce a substitution mechanism and define a fake label to effectively mitigate these challenges. Based on the substitution mechanism and fake label, we propose a cascaded unlearning approach for both item and class unlearning within GAN models, in which the unlearning and learning processes run in a cascaded manner. We conducted a comprehensive evaluation of the cascaded unlearning technique using the MNIST and CIFAR-10 datasets. Experimental results demonstrate that this approach achieves significantly improved item and class unlearning efficiency, reducing the required time by up to 185x and 284x for the MNIST and CIFAR-10 datasets, respectively, in comparison to retraining from scratch. Notably, although the model's performance experiences minor degradation after unlearning, this reduction is negligible when dealing with a minimal number of images (e.g., 64) and has no adverse effects on downstream tasks such as classification.

DatasetEquity: Are All Samples Created Equal? In The Quest For Equity Within Datasets

  • paper_url: http://arxiv.org/abs/2308.09878
  • repo_url: https://github.com/towardsautonomy/datasetequity
  • paper_authors: Shubham Shrivastava, Xianling Zhang, Sushruth Nagesh, Armin Parchami
  • for: addressed the data imbalance issue in machine learning, specifically in computer vision tasks, by developing a novel method that leverages deep perceptual embeddings and clustering to weigh samples differently during training.
  • methods: the proposed method uses sample likelihoods based on image appearance, computed using deep perceptual embeddings and clustering, to weigh samples differently during training with a novel $\textbf{Generalized Focal Loss}$ function.
  • results: the proposed method achieves over $200%$ AP gains on under-represented classes (Cyclist) in the KITTI dataset, demonstrating its effectiveness in improving state-of-the-art 3D object detection methods, and its generalizability across different datasets and rare classes.
    Abstract Data imbalance is a well-known issue in the field of machine learning, attributable to the cost of data collection, the difficulty of labeling, and the geographical distribution of the data. In computer vision, bias in data distribution caused by image appearance remains highly unexplored. Compared to categorical distributions using class labels, image appearance reveals complex relationships between objects beyond what class labels provide. Clustering deep perceptual features extracted from raw pixels gives a richer representation of the data. This paper presents a novel method for addressing data imbalance in machine learning. The method computes sample likelihoods based on image appearance using deep perceptual embeddings and clustering. It then uses these likelihoods to weigh samples differently during training with a proposed $\textbf{Generalized Focal Loss}$ function. This loss can be easily integrated with deep learning algorithms. Experiments validate the method's effectiveness across autonomous driving vision datasets including KITTI and nuScenes. The loss function improves state-of-the-art 3D object detection methods, achieving over $200\%$ AP gains on under-represented classes (Cyclist) in the KITTI dataset. The results demonstrate the method is generalizable, complements existing techniques, and is particularly beneficial for smaller datasets and rare classes. Code is available at: https://github.com/towardsautonomy/DatasetEquity
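The core recipe, cluster deep perceptual embeddings, treat cluster frequency as an appearance-based sample likelihood, and up-weight rare-looking samples in the loss, can be approximated as follows. The embedding source, the KMeans clustering, and the inverse-likelihood weighting are simplifications, and a plain weighted cross-entropy stands in for the paper's proposed Generalized Focal Loss.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def appearance_likelihoods(embeddings: np.ndarray, n_clusters: int = 8) -> np.ndarray:
    """Likelihood of each sample = relative size of the appearance cluster it falls in."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(embeddings)
    cluster_freq = np.bincount(labels, minlength=n_clusters) / len(labels)
    return cluster_freq[labels]

def weighted_ce(logits: torch.Tensor, targets: torch.Tensor, likelihoods: np.ndarray) -> torch.Tensor:
    weights = torch.tensor(1.0 / likelihoods, dtype=torch.float32)   # rare appearance -> large weight
    weights = weights / weights.mean()                               # keep the loss scale comparable
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    return (weights * per_sample).mean()

emb = np.random.randn(256, 128)                # deep perceptual embeddings of training images (toy)
lik = appearance_likelihoods(emb)
loss = weighted_ce(torch.randn(256, 5), torch.randint(0, 5, (256,)), lik)
print(float(loss))
```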

Skill Transformer: A Monolithic Policy for Mobile Manipulation

  • paper_url: http://arxiv.org/abs/2308.09873
  • repo_url: None
  • paper_authors: Xiaoyu Huang, Dhruv Batra, Akshara Rai, Andrew Szot
  • for: solving long-horizon mobile manipulation tasks
  • methods: combines conditional sequence modeling with skill modularity: a transformer, trained on demonstration trajectories, predicts both a high-level skill (e.g., navigation, picking, placing) and whole-body low-level actions, preserving the composability and modularity of the task
  • results: on an embodied rearrangement benchmark, the approach performs robust task planning and low-level control in new scenarios, achieving a 2.5x higher success rate than baselines on hard rearrangement problems
    Abstract We present Skill Transformer, an approach for solving long-horizon robotic tasks by combining conditional sequence modeling and skill modularity. Conditioned on egocentric and proprioceptive observations of a robot, Skill Transformer is trained end-to-end to predict both a high-level skill (e.g., navigation, picking, placing), and a whole-body low-level action (e.g., base and arm motion), using a transformer architecture and demonstration trajectories that solve the full task. It retains the composability and modularity of the overall task through a skill predictor module while reasoning about low-level actions and avoiding hand-off errors, common in modular approaches. We test Skill Transformer on an embodied rearrangement benchmark and find it performs robust task planning and low-level control in new scenarios, achieving a 2.5x higher success rate than baselines in hard rearrangement problems.

Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed) Neural Networks

  • paper_url: http://arxiv.org/abs/2308.09858
  • repo_url: None
  • paper_authors: Yequan Zhao, Xinling Yu, Zhixiong Chen, Ziyue Liu, Sijia Liu, Zheng Zhang
  • for: proposing a backpropagation (BP)-free framework for training realistic neural networks on edge devices
  • methods: three technical contributions: a tensor-compressed variance-reduction approach that greatly improves the scalability of zeroth-order (ZO) optimization beyond previous ZO methods; a hybrid gradient evaluation approach that improves ZO training efficiency; and an extension to physics-informed neural networks (PINNs) via a sparse-grid approach that estimates the derivatives in the loss without BP
  • results: BP-free training loses only a little accuracy on MNIST compared with standard first-order training, and successfully trains a PINN for a 20-dimensional Hamilton-Jacobi-Bellman PDE; the memory-efficient, BP-free approach is a candidate for near-future on-device training on resource-constrained platforms (e.g., FPGAs, ASICs, microcontrollers, and photonic chips)
    Abstract Backward propagation (BP) is widely used to compute the gradients in neural network training. However, it is hard to implement BP on edge devices due to the lack of hardware and software resources to support automatic differentiation. This has tremendously increased the design complexity and time-to-market of on-device training accelerators. This paper presents a completely BP-free framework that only requires forward propagation to train realistic neural networks. Our technical contributions are three-fold. Firstly, we present a tensor-compressed variance reduction approach to greatly improve the scalability of zeroth-order (ZO) optimization, making it feasible to handle a network size that is beyond the capability of previous ZO approaches. Secondly, we present a hybrid gradient evaluation approach to improve the efficiency of ZO training. Finally, we extend our BP-free training framework to physics-informed neural networks (PINNs) by proposing a sparse-grid approach to estimate the derivatives in the loss function without using BP. Our BP-free training only loses little accuracy on the MNIST dataset compared with standard first-order training. We also demonstrate successful results in training a PINN for solving a 20-dim Hamiltonian-Jacobi-Bellman PDE. This memory-efficient and BP-free approach may serve as a foundation for the near-future on-device training on many resource-constraint platforms (e.g., FPGA, ASIC, micro-controllers, and photonic chips).
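The backbone of BP-free training is a zeroth-order (ZO) gradient estimate that needs only forward evaluations of the loss. The two-point randomized estimator below is the standard form such methods build on; the tensor-compressed variance reduction and sparse-grid PINN derivatives that make it scale are the paper's contributions and are not reproduced here.

```python
import numpy as np

def zo_gradient(loss_fn, theta: np.ndarray, mu: float = 1e-3, n_samples: int = 16) -> np.ndarray:
    """Two-point randomized ZO estimate:
    E[(f(theta + mu*u) - f(theta - mu*u)) / (2*mu) * u] approximates grad f for small mu."""
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        u = np.random.randn(*theta.shape)
        grad += (loss_fn(theta + mu * u) - loss_fn(theta - mu * u)) / (2.0 * mu) * u
    return grad / n_samples

# Sanity check on a quadratic: the true gradient of 0.5*||theta||^2 is theta itself.
loss = lambda th: 0.5 * np.sum(th ** 2)
theta = np.array([1.0, -2.0, 0.5])
print(zo_gradient(loss, theta, n_samples=500))   # noisy estimate of [1.0, -2.0, 0.5]
```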

Backdoor Mitigation by Correcting the Distribution of Neural Activations

  • paper_url: http://arxiv.org/abs/2308.09850
  • repo_url: None
  • paper_authors: Xi Li, Zhen Xiang, David J. Miller, George Kesidis
  • for: studying backdoor (Trojan) attacks on deep neural networks (DNNs), in particular how a successful attack alters the distribution of internal-layer activations for trigger instances and how correcting this alteration enables backdoor mitigation
  • methods: uses reverse-engineered triggers to correct the distribution alteration of internal activations, without changing any trainable parameters of the DNN
  • results: the method mitigates backdoors effectively, generally outperforming approaches that require intensive parameter tuning, and also efficiently detects test instances that carry the trigger
    Abstract Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs), wherein a test instance is (mis)classified to the attacker's target class whenever the attacker's backdoor trigger is present. In this paper, we reveal and analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances, compared to that for clean instances. Even more importantly, we find that instances with the backdoor trigger will be correctly classified to their original source classes if this distribution alteration is corrected. Based on our observations, we propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration using reverse-engineered triggers. Notably, our method does not change any trainable parameters of the DNN, but achieves generally better mitigation performance than existing methods that do require intensive DNN parameter tuning. It also efficiently detects test instances with the trigger, which may help to catch adversarial entities in the act of exploiting the backdoor.

Enumerating Safe Regions in Deep Neural Networks with Provable Probabilistic Guarantees

  • paper_url: http://arxiv.org/abs/2308.09842
  • repo_url: None
  • paper_authors: Luca Marzari, Davide Corsi, Enrico Marchesini, Alessandro Farinelli, Ferdinando Cicalese
  • for: identifying safe regions, which is key to guaranteeing trust in systems based on deep neural networks (DNNs)
  • methods: introduces the AllDNN-Verification problem and proposes an efficient approximation method called epsilon-ProVe
  • results: the approach yields a tight (with provable probabilistic guarantees) lower estimate of the safe regions, and experiments on standard benchmarks demonstrate its scalability and effectiveness
    Abstract Identifying safe areas is a key point to guarantee trust for systems that are based on Deep Neural Networks (DNNs). To this end, we introduce the AllDNN-Verification problem: given a safety property and a DNN, enumerate the set of all the regions of the property input domain which are safe, i.e., where the property does hold. Due to the #P-hardness of the problem, we propose an efficient approximation method called epsilon-ProVe. Our approach exploits a controllable underestimation of the output reachable sets obtained via statistical prediction of tolerance limits, and can provide a tight (with provable probabilistic guarantees) lower estimate of the safe areas. Our empirical evaluation on different standard benchmarks shows the scalability and effectiveness of our method, offering valuable insights for this new type of verification of DNNs.

Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis

  • paper_url: http://arxiv.org/abs/2308.09835
  • repo_url: None
  • paper_authors: Shijie Li, Mengwei Ren, Thomas Ach, Guido Gerig
  • for: deep learning methods for microscopy image segmentation typically need large amounts of densely annotated training data, which is time-consuming and laborious to obtain; this paper trains segmentation models using only point annotations (object centroids)
  • methods: a three-stage framework: (1) sample a pseudo dense segmentation mask from the point annotations under shape priors; (2) translate the mask into a realistic microscopy image with an unpaired image generative model regularized by object-level consistency; (3) pair the pseudo masks with the synthetic images to train an ad-hoc segmentation model
  • results: models trained on this synthesis pipeline significantly outperform those trained with pseudo-labels or baseline-generated images, and achieve performance comparable to models trained on authentic microscopy images with dense labels; code is available
    Abstract Current deep learning-based approaches for the segmentation of microscopy images heavily rely on large amount of training data with dense annotation, which is highly costly and laborious in practice. Compared to full annotation where the complete contour of objects is depicted, point annotations, specifically object centroids, are much easier to acquire and still provide crucial information about the objects for subsequent segmentation. In this paper, we assume access to point annotations only during training and develop a unified pipeline for microscopy image segmentation using synthetically generated training data. Our framework includes three stages: (1) it takes point annotations and samples a pseudo dense segmentation mask constrained with shape priors; (2) with an image generative model trained in an unpaired manner, it translates the mask to a realistic microscopy image regularized by object level consistency; (3) the pseudo masks along with the synthetic images then constitute a pairwise dataset for training an ad-hoc segmentation model. On the public MoNuSeg dataset, our synthesis pipeline produces more diverse and realistic images than baseline models while maintaining high coherence between input masks and generated images. When using the identical segmentation backbones, the models trained on our synthetic dataset significantly outperform those trained with pseudo-labels or baseline-generated images. Moreover, our framework achieves comparable results to models trained on authentic microscopy images with dense labels, demonstrating its potential as a reliable and highly efficient alternative to labor-intensive manual pixel-wise annotations in microscopy image segmentation. The code is available.

Learning from A Single Graph is All You Need for Near-Shortest Path Routing in Wireless Networks

  • paper_url: http://arxiv.org/abs/2308.09829
  • repo_url: None
  • paper_authors: Yung-Fu Chen, Sen Lin, Anish Arora
  • For: 本研究提出一种学习算法,用于解决无线网络中的本地路由策略问题。这种算法只需要一些从同一个图中获取的数据样本,可以对所有随机图进行泛化。* Methods: 本研究使用深度神经网络(DNNs)来学习本地路由策略。这些DNNs可以高效地和扩展地学习路由策略,只考虑节点状态和邻居节点状态。* Results: 研究结果显示,使用这种算法可以快速地从一些路由路径中获取样本,并在各种随机图上获得高效和普遍适用的路由策略。此外,这种算法还可以提供 тео리тиче explainability,即为什么使用一个小型的种子图和节点抽样可以有效地学习路由策略。
    Abstract We propose a learning algorithm for local routing policies that needs only a few data samples obtained from a single graph while generalizing to all random graphs in a standard model of wireless networks. We thus solve the all-pairs near-shortest path problem by training deep neural networks (DNNs) that efficiently and scalably learn routing policies that are local, i.e., they only consider node states and the states of neighboring nodes. Remarkably, one of these DNNs we train learns a policy that exactly matches the performance of greedy forwarding; another generally outperforms greedy forwarding. Our algorithm design exploits network domain knowledge in several ways: First, in the selection of input features and, second, in the selection of a ``seed graph'' and subsamples from its shortest paths. The leverage of domain knowledge provides theoretical explainability of why the seed graph and node subsampling suffice for learning that is efficient, scalable, and generalizable. Simulation-based results on uniform random graphs with diverse sizes and densities empirically corroborate that using samples generated from a few routing paths in a modest-sized seed graph quickly learns a model that is generalizable across (almost) all random graphs in the wireless network model.
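Greedy forwarding, the hand-crafted local policy that one of the learned DNNs is reported to match exactly, is the natural reference point. A minimal version over a random geometric graph is shown below using networkx; the node count and unit-disk radius are arbitrary choices for the example.

```python
import math
import networkx as nx

def greedy_forward(G, src, dst):
    """At each hop, forward to the neighbor geographically closest to the destination;
    stop (routing failure) if no neighbor makes progress toward it."""
    pos = nx.get_node_attributes(G, "pos")
    path, node = [src], src
    while node != dst:
        candidates = [(math.dist(pos[n], pos[dst]), n) for n in G.neighbors(node)]
        if not candidates:
            return None
        best_d, best = min(candidates)
        if best_d >= math.dist(pos[node], pos[dst]):
            return None                   # stuck in a local minimum: greedy fails here
        path.append(best)
        node = best
    return path

G = nx.random_geometric_graph(100, radius=0.18, seed=3)
print(greedy_forward(G, src=0, dst=42))   # a hop sequence, or None when greedy gets stuck
```

The learned policies take the same local inputs (a node's state and its neighbors' states) but, per the abstract, one of them generally outperforms greedy forwarding rather than merely matching it.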

VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control

  • paper_url: http://arxiv.org/abs/2308.09804
  • repo_url: https://github.com/henryhzy/vl-pet
  • paper_authors: Zi-Yuan Hu, Yanyang Li, Michael R. Lyu, Liwei Wang
  • for: proposing an effective Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework that achieves better efficiency-effectiveness trade-offs
  • methods: introduces a novel granularity-controlled mechanism that imposes effective control over modular modifications; different granularity-controlled matrices instantiate a variety of model-agnostic VL-PET modules, complemented by lightweight module designs that enhance VL alignment and modeling for the encoders while maintaining text generation for the decoders
  • results: experiments on four image-text tasks and four video-text tasks show that VL-PET outperforms existing PET techniques; in particular, VL-PET-large with lightweight module designs outperforms VL-Adapter by 2.92% (3.41%) and LoRA by 3.37% (7.03%) with BART-base (T5-base) on image-text tasks
    Abstract As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning becomes prohibitively expensive for model training and storage. In vision-and-language (VL), parameter-efficient tuning (PET) techniques are proposed to integrate modular modifications (e.g., Adapter and LoRA) into encoder-decoder PLMs. By tuning a small set of trainable parameters, these techniques perform on par with full fine-tuning. However, excessive modular modifications and neglecting the functionality gap between the encoders and decoders can lead to performance degradation, while existing PET techniques (e.g., VL-Adapter) overlook these critical issues. In this paper, we propose a Vision-and-Language Parameter-Efficient Tuning (VL-PET) framework to impose effective control over modular modifications via a novel granularity-controlled mechanism. Considering different granularity-controlled matrices generated by this mechanism, a variety of model-agnostic VL-PET modules can be instantiated from our framework for better efficiency and effectiveness trade-offs. We further propose lightweight PET module designs to enhance VL alignment and modeling for the encoders and maintain text generation for the decoders. Extensive experiments conducted on four image-text tasks and four video-text tasks demonstrate the efficiency, effectiveness and transferability of our VL-PET framework. In particular, our VL-PET-large with lightweight PET module designs significantly outperforms VL-Adapter by 2.92% (3.41%) and LoRA by 3.37% (7.03%) with BART-base (T5-base) on image-text tasks. Furthermore, we validate the enhanced effect of employing our VL-PET designs on existing PET techniques, enabling them to achieve significant performance improvements. Our code is available at https://github.com/HenryHZY/VL-PET.
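For context on what a PET module modifies, the snippet below is a standard bottleneck adapter of the kind VL-Adapter inserts into a frozen encoder-decoder; VL-PET's granularity-controlled variants generalize this basic shape, and the bottleneck size and zero-initialized up-projection here are illustrative choices rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Standard PET adapter: down-project, nonlinearity, up-project, residual connection.
    Only these few parameters are trained; the backbone stays frozen."""
    def __init__(self, d_model: int = 768, bottleneck: int = 48):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)      # start as an identity mapping

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(torch.relu(self.down(hidden)))

adapter = BottleneckAdapter()
h = torch.randn(2, 16, 768)               # [batch, tokens, d_model] from a frozen layer
print(adapter(h).shape, sum(p.numel() for p in adapter.parameters()))
```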

An Efficient High-Dimensional Gene Selection Approach based on Binary Horse Herd Optimization Algorithm for Biological Data Classification

  • paper_url: http://arxiv.org/abs/2308.09791
  • repo_url: None
  • paper_authors: Niloufar Mehrabi, Sayed Pedram Haeri Boroujeni, Elnaz Pashaei
  • for: solving complex, high-dimensional problems, in particular gene selection for biological data classification
  • methods: a hybrid feature selection approach combining a binary Horse Herd Optimization Algorithm (BHOA) with the minimum Redundancy Maximum Relevance (MRMR) filter, using a new X-shaped transfer function to map continuous search to binary spaces
  • results: on ten microarray datasets (Lymphoma, Prostate, Brain-1, DLBCL, SRBCT, Leukemia, Ovarian, Colon, Lung, and MLL), the proposed MRMR-BHOA method outperforms Gray Wolf (GW), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA) in accuracy and number of selected features
    Abstract The Horse Herd Optimization Algorithm (HOA) is a new meta-heuristic algorithm based on the behaviors of horses at different ages. The HOA was introduced recently to solve complex and high-dimensional problems. This paper proposes a binary version of the Horse Herd Optimization Algorithm (BHOA) in order to solve discrete problems and select prominent feature subsets. Moreover, this study provides a novel hybrid feature selection framework based on the BHOA and a minimum Redundancy Maximum Relevance (MRMR) filter method. This hybrid feature selection, which is more computationally efficient, produces a beneficial subset of relevant and informative features. Since feature selection is a binary problem, we have applied a new Transfer Function (TF), called X-shape TF, which transforms continuous problems into binary search spaces. Furthermore, the Support Vector Machine (SVM) is utilized to examine the efficiency of the proposed method on ten microarray datasets, namely Lymphoma, Prostate, Brain-1, DLBCL, SRBCT, Leukemia, Ovarian, Colon, Lung, and MLL. In comparison to other state-of-the-art, such as the Gray Wolf (GW), Particle Swarm Optimization (PSO), and Genetic Algorithm (GA), the proposed hybrid method (MRMR-BHOA) demonstrates superior performance in terms of accuracy and minimum selected features. Also, experimental results prove that the X-Shaped BHOA approach outperforms others methods.
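As a reference for the filter stage of such a hybrid pipeline, here is a minimal, generic mRMR sketch: greedily select features that maximize mutual information with the label minus their average mutual information with already-selected features. This is a standard mRMR implementation under stated assumptions, not the paper's code, and the BHOA wrapper stage is not shown.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import mutual_info_score

def mrmr_select(X: np.ndarray, y: np.ndarray, k: int) -> list[int]:
    """Greedy minimum-Redundancy Maximum-Relevance feature selection.
    Features are assumed to be discretized for the redundancy term."""
    relevance = mutual_info_classif(X, y, random_state=0)      # I(f; y)
    selected: list[int] = [int(np.argmax(relevance))]
    candidates = set(range(X.shape[1])) - set(selected)
    while len(selected) < k and candidates:
        best, best_score = None, -np.inf
        for j in candidates:
            redundancy = np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy                   # mRMR criterion
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage with discretized features.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(100, 20))
y = (X[:, 0] + X[:, 3] > 3).astype(int)
print(mrmr_select(X, y, k=5))
```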

A Two-Part Machine Learning Approach to Characterizing Network Interference in A/B Testing

  • paper_url: http://arxiv.org/abs/2308.09790
  • repo_url: None
  • paper_authors: Yuan Yuan, Kristen M. Altenburger
  • for: Improving the reliability and precision of A/B tests by addressing the problem of network interference.
  • methods: A machine learning approach that identifies and characterizes heterogeneous network interference, introduces "causal network motifs", and automatically determines the "exposure mapping", addressing two major limitations of the existing literature.
  • results: Validated on synthetic experiments and a real-world, large-scale test involving 1-2 million Instagram users, the method outperforms conventional approaches such as design-based cluster randomization and analysis-based neighborhood exposure mapping, improving the precision and reliability of A/B test results.
    Abstract The reliability of controlled experiments, or "A/B tests," can often be compromised due to the phenomenon of network interference, wherein the outcome for one unit is influenced by other units. To tackle this challenge, we propose a machine learning-based method to identify and characterize heterogeneous network interference. Our approach accounts for latent complex network structures and automates the task of "exposure mapping'' determination, which addresses the two major limitations in the existing literature. We introduce "causal network motifs'' and employ transparent machine learning models to establish the most suitable exposure mapping that reflects underlying network interference patterns. Our method's efficacy has been validated through simulations on two synthetic experiments and a real-world, large-scale test involving 1-2 million Instagram users, outperforming conventional methods such as design-based cluster randomization and analysis-based neighborhood exposure mapping. Overall, our approach not only offers a comprehensive, automated solution for managing network interference and improving the precision of A/B testing results, but it also sheds light on users' mutual influence and aids in the refinement of marketing strategies.
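For intuition about exposure mappings, the sketch below computes the classic hand-crafted mapping (fraction of treated neighbors) on a random graph. The paper replaces this kind of fixed rule with exposure conditions learned from causal network motifs, so the function here is only an illustrative baseline, not the proposed method.

```python
import networkx as nx
import numpy as np

def neighborhood_exposure(graph: nx.Graph, treatment: dict[int, int]) -> dict[int, float]:
    """Fraction of a unit's neighbors that are treated -- a classic,
    analysis-based exposure mapping used here purely for illustration."""
    exposure = {}
    for node in graph.nodes:
        neighbors = list(graph.neighbors(node))
        if not neighbors:
            exposure[node] = 0.0
        else:
            exposure[node] = float(np.mean([treatment[n] for n in neighbors]))
    return exposure

# Toy experiment: random graph, Bernoulli(0.5) treatment assignment.
rng = np.random.default_rng(0)
g = nx.erdos_renyi_graph(n=100, p=0.05, seed=0)
z = {node: int(rng.integers(0, 2)) for node in g.nodes}
exp_map = neighborhood_exposure(g, z)
print(sorted(exp_map.items())[:5])
```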

Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models

  • paper_url: http://arxiv.org/abs/2308.09778
  • repo_url: None
  • paper_authors: Navid Rajabi, Jana Kosecka
  • for: Assessing how well large vision-language models (VLMs) perform on visual reasoning tasks such as counting, referring expressions, and general visual question answering, with a focus on spatial relations.
  • methods: Fine-grained compositional grounding of spatial relationships and a bottom-up approach that ranks spatial clauses by combining grounding evidence for object noun phrases and their locations.
  • results: Current VLMs reason poorly about spatial relations, leaving a large gap to human performance.
    Abstract With the advances in large scale vision-and-language models (VLMs) it is of interest to assess their performance on various visual reasoning tasks such as counting, referring expressions and general visual question answering. The focus of this work is to study the ability of these models to understanding spatial relations. Previously, this has been tackled using image-text matching (Liu, Emerson, and Collier 2022) or visual question answering task, both showing poor performance and a large gap compared to human performance. To better understand the gap, we present fine-grained compositional grounding of spatial relationships and propose a bottom up approach for ranking spatial clauses and evaluating the performance of spatial relationship reasoning task. We propose to combine the evidence from grounding noun phrases corresponding to objects and their locations to compute the final rank of the spatial clause. We demonstrate the approach on representative vision-language models (Tan and Bansal 2019; Gupta et al. 2022; Kamath et al. 2021) and compare and highlight their abilities to reason about spatial relationships.

Time Series Predictions in Unmonitored Sites: A Survey of Machine Learning Techniques in Water Resources

  • paper_url: http://arxiv.org/abs/2308.09766
  • repo_url: None
  • paper_authors: Jared D. Willard, Charuleka Varadharajan, Xiaowei Jia, Vipin Kumar
  • for: Predicting environmental variables in unmonitored water bodies remains a long-standing challenge in water resources science: most of the world's freshwater resources lack monitoring of the critical variables needed for management, even as climate and land-use change place growing pressure on them.
  • methods: A survey of modern machine learning methods for hydrologic time series prediction, which increasingly outperform process-based and empirical models thanks to their ability to extract information from large, diverse datasets.
  • results: The review of state-of-the-art applications in streamflow, water quality, and other water resources prediction shows that most prior efforts build deep learning frameworks trained across many sites for daily-scale predictions in the United States, while comparisons between different classes of machine learning methods remain rare and inadequate. Open questions include incorporating dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques within modern machine learning frameworks.
    Abstract Prediction of dynamic environmental variables in unmonitored sites remains a long-standing challenge for water resources science. The majority of the world's freshwater resources have inadequate monitoring of critical environmental variables needed for management. Yet, the need to have widespread predictions of hydrological variables such as river flow and water quality has become increasingly urgent due to climate and land use change over the past decades, and their associated impacts on water resources. Modern machine learning methods increasingly outperform their process-based and empirical model counterparts for hydrologic time series prediction with their ability to extract information from large, diverse data sets. We review relevant state-of-the art applications of machine learning for streamflow, water quality, and other water resources prediction and discuss opportunities to improve the use of machine learning with emerging methods for incorporating watershed characteristics into deep learning models, transfer learning, and incorporating process knowledge into machine learning models. The analysis here suggests most prior efforts have been focused on deep learning learning frameworks built on many sites for predictions at daily time scales in the United States, but that comparisons between different classes of machine learning methods are few and inadequate. We identify several open questions for time series predictions in unmonitored sites that include incorporating dynamic inputs and site characteristics, mechanistic understanding and spatial context, and explainable AI techniques in modern machine learning frameworks.

Taken by Surprise: Contrast effect for Similarity Scores

  • paper_url: http://arxiv.org/abs/2308.09765
  • repo_url: https://github.com/meetelise/surprise-similarity
  • paper_authors: Thomas C. Bachlechner, Mario Martone, Marjorie Schillo
  • for: Improving the evaluation of similarity between object vector embeddings in natural language processing, information retrieval, and classification tasks.
  • methods: The "surprise score", an ensemble-normalized similarity metric that captures the contrast effect of human perception by quantifying how surprising a given pairwise similarity is relative to the ensemble's pairwise similarities.
  • results: The surprise score typically yields 10-15% better performance than raw cosine similarity on zero- and few-shot document classification and clustering tasks.
    Abstract Accurately evaluating the similarity of object vector embeddings is of critical importance for natural language processing, information retrieval and classification tasks. Popular similarity scores (e.g cosine similarity) are based on pairs of embedding vectors and disregard the distribution of the ensemble from which objects are drawn. Human perception of object similarity significantly depends on the context in which the objects appear. In this work we propose the $\textit{surprise score}$, an ensemble-normalized similarity metric that encapsulates the contrast effect of human perception and significantly improves the classification performance on zero- and few-shot document classification tasks. This score quantifies the surprise to find a given similarity between two elements relative to the pairwise ensemble similarities. We evaluate this metric on zero/few shot classification and clustering tasks and typically find 10-15 % better performance compared to raw cosine similarity. Our code is available at https://github.com/MeetElise/surprise-similarity.
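A minimal sketch of the contrast idea, under illustrative assumptions: score a candidate pair by how extreme its cosine similarity is relative to the empirical distribution of pairwise similarities across the ensemble. The exact definition of the surprise score in the paper may differ from this rank-based version.

```python
import numpy as np

def cosine_matrix(E: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities of row-wise embeddings."""
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return E @ E.T

def surprise(E: np.ndarray, i: int, j: int) -> float:
    """Illustrative ensemble-normalized score: the fraction of ensemble
    pairs whose similarity is below sim(i, j). High values mean the pair
    is surprisingly similar relative to the ensemble."""
    S = cosine_matrix(E)
    ensemble = S[np.triu_indices_from(S, k=1)]      # all distinct pairs
    return float(np.mean(ensemble < S[i, j]))

rng = np.random.default_rng(0)
E = rng.normal(size=(50, 16))                       # 50 embeddings
E[1] = E[0] + 0.05 * rng.normal(size=16)            # make items 0 and 1 very similar
print(round(surprise(E, 0, 1), 3))                  # close to 1.0
print(round(surprise(E, 0, 2), 3))                  # near the bulk of the ensemble
```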

The Impact of Background Removal on Performance of Neural Networks for Fashion Image Classification and Segmentation

  • paper_url: http://arxiv.org/abs/2308.09764
  • repo_url: None
  • paper_authors: Junhui Liang, Ying Liu, Vladimir Vlassov
  • for: Improving the quality of fashion image data and, in turn, model performance.
  • methods: Background removal via salient object detection, producing "rembg" images that are compared against the originals.
  • results: Background removal improves accuracy by up to 5% for simple, shallow networks trained from scratch, but it does not help deep neural networks because the loss of background pixels is incompatible with regularization techniques such as batch normalization, pre-trained initialization, and randomness-introducing data augmentations.
    Abstract Fashion understanding is a hot topic in computer vision, with many applications having great business value in the market. Fashion understanding remains a difficult challenge for computer vision due to the immense diversity of garments and various scenes and backgrounds. In this work, we try removing the background from fashion images to boost data quality and increase model performance. Having fashion images of evident persons in fully visible garments, we can utilize Salient Object Detection to achieve the background removal of fashion data to our expectations. A fashion image with the background removed is claimed as the "rembg" image, contrasting with the original one in the fashion dataset. We conducted extensive comparative experiments with these two types of images on multiple aspects of model training, including model architectures, model initialization, compatibility with other training tricks and data augmentations, and target task types. Our experiments show that background removal can effectively work for fashion data in simple and shallow networks that are not susceptible to overfitting. It can improve model accuracy by up to 5% in the classification on the FashionStyle14 dataset when training models from scratch. However, background removal does not perform well in deep neural networks due to incompatibility with other regularization techniques like batch normalization, pre-trained initialization, and data augmentations introducing randomness. The loss of background pixels invalidates many existing training tricks in the model training, adding the risk of overfitting for deep models.
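A minimal sketch of producing background-removed ("rembg") images is shown below. It assumes the open-source rembg Python package (which wraps a U^2-Net salient-object-detection model); the package choice and the white-background compositing step are assumptions for illustration rather than the paper's exact pipeline.

```python
# Sketch: build a background-removed copy of a fashion image folder.
# Assumes `pip install rembg pillow`; this tooling choice is an assumption,
# not something stated in the abstract.
from pathlib import Path
from PIL import Image
from rembg import remove

def build_rembg_dataset(src_dir: str, dst_dir: str) -> None:
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        with Image.open(path) as img:
            cutout = remove(img)                     # RGBA image, background removed
            # Composite onto a plain white background so downstream models
            # still see 3-channel inputs, mirroring the original image format.
            background = Image.new("RGBA", cutout.size, (255, 255, 255, 255))
            flattened = Image.alpha_composite(background, cutout).convert("RGB")
            flattened.save(dst / path.name)

build_rembg_dataset("fashion_images/", "fashion_images_rembg/")
```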

Data Compression and Inference in Cosmology with Self-Supervised Machine Learning

  • paper_url: http://arxiv.org/abs/2308.09751
  • repo_url: https://github.com/aizhanaakhmet/data-compression-inference-in-cosmology-with-ssl
  • paper_authors: Aizhan Akhmetzhanova, Siddharth Mishra-Sharma, Cora Dvorkin
  • for: Efficiently summarizing the massive datasets produced by current and upcoming cosmological surveys with minimal loss of information for downstream tasks.
  • methods: Simulation-based self-supervised machine learning that uses simulation-based augmentations to construct representative summaries of massive datasets.
  • results: The method delivers highly informative summaries that can be used for precise and accurate parameter inference, and it can produce summary representations that are insensitive to prescribed systematic effects such as the influence of baryonic physics.
    Abstract The influx of massive amounts of data from current and upcoming cosmological surveys necessitates compression schemes that can efficiently summarize the data with minimal loss of information. We introduce a method that leverages the paradigm of self-supervised machine learning in a novel manner to construct representative summaries of massive datasets using simulation-based augmentations. Deploying the method on hydrodynamical cosmological simulations, we show that it can deliver highly informative summaries, which can be used for a variety of downstream tasks, including precise and accurate parameter inference. We demonstrate how this paradigm can be used to construct summary representations that are insensitive to prescribed systematic effects, such as the influence of baryonic physics. Our results indicate that self-supervised machine learning techniques offer a promising new approach for compression of cosmological data as well its analysis.

Robust Monocular Depth Estimation under Challenging Conditions

  • paper_url: http://arxiv.org/abs/2308.09711
  • repo_url: https://github.com/md4all/md4all
  • paper_authors: Stefano Gasperini, Nils Morbitzer, HyunJun Jung, Nassir Navab, Federico Tombari
  • for: Making monocular depth estimation reliable under both adverse and ideal conditions and across different types of learning supervision.
  • methods: Building on existing self- or fully supervised methods, the model is trained by generating complex (adverse-condition) samples corresponding to the normal training ones, feeding them to the network, and computing the standard losses on the corresponding original images.
  • results: Extensive experiments on two challenging public datasets, nuScenes and Oxford RobotCar, show that the approach outperforms prior work by a large margin in both standard and challenging conditions.
    Abstract While state-of-the-art monocular depth estimation approaches achieve impressive results in ideal settings, they are highly unreliable under challenging illumination and weather conditions, such as at nighttime or in the presence of rain. In this paper, we uncover these safety-critical issues and tackle them with md4all: a simple and effective solution that works reliably under both adverse and ideal conditions, as well as for different types of learning supervision. We achieve this by exploiting the efficacy of existing methods under perfect settings. Therefore, we provide valid training signals independently of what is in the input. First, we generate a set of complex samples corresponding to the normal training ones. Then, we train the model by guiding its self- or full-supervision by feeding the generated samples and computing the standard losses on the corresponding original images. Doing so enables a single model to recover information across diverse conditions without modifications at inference time. Extensive experiments on two challenging public datasets, namely nuScenes and Oxford RobotCar, demonstrate the effectiveness of our techniques, outperforming prior works by a large margin in both standard and challenging conditions. Source code and data are available at: https://md4all.github.io.
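A minimal sketch of the training idea, under illustrative assumptions: a simple photometric corruption stands in for the paper's generated adverse samples, and a dense depth target stands in for its self-/full-supervision losses. The network sees the hard input while the loss is defined with respect to the original, easy sample.

```python
import torch
import torch.nn.functional as F

def make_adverse(images: torch.Tensor) -> torch.Tensor:
    """Stand-in for the generated adverse samples: darken and add noise
    to mimic nighttime/rain. The real method uses learned image translation."""
    dark = images * 0.25
    return (dark + 0.05 * torch.randn_like(dark)).clamp(0.0, 1.0)

def training_step(model: torch.nn.Module,
                  images: torch.Tensor,
                  target_depth: torch.Tensor) -> torch.Tensor:
    """Feed the adverse sample, supervise against the original-image target."""
    adverse = make_adverse(images)
    pred = model(adverse)                       # depth predicted from the hard input
    loss = F.l1_loss(pred, target_depth)        # loss defined on the easy sample
    return loss

# Toy usage with a trivial stand-in "depth network".
model = torch.nn.Sequential(torch.nn.Conv2d(3, 1, kernel_size=3, padding=1))
imgs = torch.rand(2, 3, 64, 64)
depth = torch.rand(2, 1, 64, 64)
print(training_step(model, imgs, depth).item())
```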

Neural-network quantum state study of the long-range antiferromagnetic Ising chain

  • paper_url: http://arxiv.org/abs/2308.09709
  • repo_url: None
  • paper_authors: Jicheol Kim, Dongkyu Kim, Dong-Hee Kim
  • for: investigate quantum phase transitions in the transverse field Ising chain with algebraically decaying long-range antiferromagnetic interactions
  • methods: using the variational Monte Carlo method with the restricted Boltzmann machine as a trial wave function ansatz
  • results: the central charge deviates from 1/2 at a small decay exponent $\alpha_\mathrm{LR}$, and the threshold of the Ising universality and the conformal symmetry is estimated to be in the range of $2 \lesssim \alpha_\mathrm{LR} < 3$.
    Abstract We investigate quantum phase transitions in the transverse field Ising chain with algebraically decaying long-range antiferromagnetic interactions by using the variational Monte Carlo method with the restricted Boltzmann machine being employed as a trial wave function ansatz. In the finite-size scaling analysis with the order parameter and the second R\'enyi entropy, we find that the central charge deviates from 1/2 at a small decay exponent $\alpha_\mathrm{LR}$ in contrast to the critical exponents staying very close to the short-range (SR) Ising values regardless of $\alpha_\mathrm{LR}$ examined, supporting the previously proposed scenario of conformal invariance breakdown. To identify the threshold of the Ising universality and the conformal symmetry, we perform two additional tests for the universal Binder ratio and the conformal field theory (CFT) description of the correlation function. It turns out that both indicate a noticeable deviation from the SR Ising class at $\alpha_\mathrm{LR} < 2$. However, a closer look at the scaled correlation function for $\alpha_\mathrm{LR} \ge 2$ shows a gradual change from the asymptotic line of the CFT verified at $\alpha_\mathrm{LR} = 3$, providing a rough estimate of the threshold being in the range of $2 \lesssim \alpha_\mathrm{LR} < 3$.

Do you know what q-means?

  • paper_url: http://arxiv.org/abs/2308.09701
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: João F. Doriguello, Alessandro Luongo, Ewin Tang
  • for: An improved version of the "$q$-means" algorithm for approximate $k$-means ($\varepsilon$-$k$-means) clustering.
  • methods: The algorithm relies only on QRAM to prepare and measure simple states based on the current iteration's clusters, without the quantum linear algebra primitives of prior work.
  • results: The running time is $O\big(\frac{k^{2}}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$, keeping the polylogarithmic dependence on $N$ while improving the dependence on most other parameters. A "dequantized" classical algorithm for $\varepsilon$-$k$-means is also presented, running in $O\big(\frac{k^{2}}{\varepsilon^2}(kd + \log(Nd))\big)$ time and matching the polylogarithmic dependence on $N$ attained by the quantum algorithms.
    Abstract Clustering is one of the most important tools for analysis of large datasets, and perhaps the most popular clustering algorithm is Lloyd's iteration for $k$-means. This iteration takes $N$ vectors $v_1,\dots,v_N\in\mathbb{R}^d$ and outputs $k$ centroids $c_1,\dots,c_k\in\mathbb{R}^d$; these partition the vectors into clusters based on which centroid is closest to a particular vector. We present an overall improved version of the "$q$-means" algorithm, the quantum algorithm originally proposed by Kerenidis, Landman, Luongo, and Prakash (2019) which performs $\varepsilon$-$k$-means, an approximate version of $k$-means clustering. This algorithm does not rely on the quantum linear algebra primitives of prior work, instead only using its QRAM to prepare and measure simple states based on the current iteration's clusters. The time complexity is $O\big(\frac{k^{2}}{\varepsilon^2}(\sqrt{k}d + \log(Nd))\big)$ and maintains the polylogarithmic dependence on $N$ while improving the dependence on most of the other parameters. We also present a "dequantized" algorithm for $\varepsilon$-$k$-means which runs in $O\big(\frac{k^{2}}{\varepsilon^2}(kd + \log(Nd))\big)$ time. Notably, this classical algorithm matches the polylogarithmic dependence on $N$ attained by the quantum algorithms.
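For reference, the classical Lloyd iteration that $q$-means approximates can be sketched in a few lines of NumPy; nothing quantum here, it is only the baseline being accelerated.

```python
import numpy as np

def lloyd_kmeans(V: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Classical Lloyd iteration: assign each vector to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centroids = V[rng.choice(len(V), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid per vector.
        dists = np.linalg.norm(V[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: centroid = mean of assigned vectors (keep old if empty).
        for c in range(k):
            members = V[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return centroids, labels

rng = np.random.default_rng(1)
V = np.vstack([rng.normal(loc=m, size=(50, 2)) for m in (-3, 0, 3)])
centroids, labels = lloyd_kmeans(V, k=3)
print(np.round(centroids, 2))
```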

A Lightweight Transformer for Faster and Robust EBSD Data Collection

  • paper_url: http://arxiv.org/abs/2308.09693
  • repo_url: https://github.com/hdong920/ebsd_slice_recovery
  • paper_authors: Harry Dong, Sean Donegan, Megna Shah, Yuejie Chi
  • for: Improving the quality and speed of three-dimensional electron back-scattered diffraction (EBSD) microscopy data collection.
  • methods: A two-step method that recovers missing slices in a 3D EBSD volume, using an efficient transformer model and a projection algorithm to process the transformer's outputs, trained only on synthetic 3D EBSD data with self-supervision.
  • results: Higher recovery accuracy on real 3D EBSD data than existing methods.
    Abstract Three dimensional electron back-scattered diffraction (EBSD) microscopy is a critical tool in many applications in materials science, yet its data quality can fluctuate greatly during the arduous collection process, particularly via serial-sectioning. Fortunately, 3D EBSD data is inherently sequential, opening up the opportunity to use transformers, state-of-the-art deep learning architectures that have made breakthroughs in a plethora of domains, for data processing and recovery. To be more robust to errors and accelerate this 3D EBSD data collection, we introduce a two step method that recovers missing slices in an 3D EBSD volume, using an efficient transformer model and a projection algorithm to process the transformer's outputs. Overcoming the computational and practical hurdles of deep learning with scarce high dimensional data, we train this model using only synthetic 3D EBSD data with self-supervision and obtain superior recovery accuracy on real 3D EBSD data, compared to existing methods.

Reduced Order Modeling of a MOOSE-based Advanced Manufacturing Model with Operator Learning

  • paper_url: http://arxiv.org/abs/2308.09691
  • repo_url: None
  • paper_authors: Mahmoud Yaseen, Dewen Yushu, Peter German, Xu Wu
  • for: Developing an accurate yet fast-running reduced order model (ROM) of a MOOSE-based advanced manufacturing model for use in deep reinforcement learning (DRL)-based process control and optimization.
  • methods: Operator learning (OL)-based methods, which can learn a family of differential equations; here, a Fourier neural operator is used to construct the OL-based ROM.
  • results: A benchmark comparison with a conventional deep neural network-based ROM finds that the OL-based ROM achieves better accuracy and runs faster.
    Abstract Advanced Manufacturing (AM) has gained significant interest in the nuclear community for its potential application on nuclear materials. One challenge is to obtain desired material properties via controlling the manufacturing process during runtime. Intelligent AM based on deep reinforcement learning (DRL) relies on an automated process-level control mechanism to generate optimal design variables and adaptive system settings for improved end-product properties. A high-fidelity thermo-mechanical model for direct energy deposition has recently been developed within the MOOSE framework at the Idaho National Laboratory (INL). The goal of this work is to develop an accurate and fast-running reduced order model (ROM) for this MOOSE-based AM model that can be used in a DRL-based process control and optimization method. Operator learning (OL)-based methods will be employed due to their capability to learn a family of differential equations, in this work, produced by changing process variables in the Gaussian point heat source for the laser. We will develop OL-based ROM using Fourier neural operator, and perform a benchmark comparison of its performance with a conventional deep neural network-based ROM.
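To illustrate the operator-learning building block, here is a minimal 1D spectral convolution layer of the kind used in Fourier neural operators: transform to Fourier space, apply learned complex weights to a truncated set of low modes, and transform back. The channel counts, grid size, and the surrounding network are illustrative assumptions, not the INL model.

```python
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    """Core FNO layer: learned multiplication on the lowest Fourier modes."""
    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (channels * channels)
        self.weight = nn.Parameter(
            scale * torch.randn(channels, channels, modes, dtype=torch.cfloat))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, grid)
        x_ft = torch.fft.rfft(x)                             # to Fourier space
        out_ft = torch.zeros_like(x_ft)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.size(-1))         # back to physical space

# Toy usage: map a 1D field (e.g., a temperature profile) to a response field.
layer = SpectralConv1d(channels=8, modes=12)
x = torch.randn(4, 8, 64)
print(layer(x).shape)  # torch.Size([4, 8, 64])
```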

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

  • paper_url: http://arxiv.org/abs/2308.09687
  • repo_url: https://github.com/spcl/graph-of-thoughts
  • paper_authors: Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Michal Podstawski, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler
  • for: Advancing the prompting capabilities of large language models (LLMs) beyond the limits of Chain-of-Thought and Tree of Thoughts (ToT).
  • methods: The Graph of Thoughts (GoT) framework models the information generated by an LLM as an arbitrary graph whose vertices are "LLM thoughts" and whose edges are dependencies between them; this allows combining arbitrary thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, and enhancing thoughts via feedback loops.
  • results: GoT improves sorting quality by 62% over ToT while reducing costs by more than 31%. The framework is extensible with new thought transformations and can therefore be used to spearhead new prompting schemes, bringing LLM reasoning closer to human thinking or brain mechanisms such as recurrence, which form complex networks.
    Abstract We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by >31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.
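An illustrative sketch of the underlying data structure (not the official spcl/graph-of-thoughts API): thoughts are vertices, dependency edges connect them, and aggregation/refinement operations create new thoughts from existing ones via a pluggable LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Thought:
    content: str
    score: float = 0.0

@dataclass
class GraphOfThoughts:
    """Thoughts are vertices; edges point from a thought to the thoughts it depends on."""
    thoughts: dict[int, Thought] = field(default_factory=dict)
    parents: dict[int, list[int]] = field(default_factory=dict)
    _next_id: int = 0

    def add(self, content: str, parents: list[int] | None = None) -> int:
        tid = self._next_id
        self._next_id += 1
        self.thoughts[tid] = Thought(content)
        self.parents[tid] = parents or []
        return tid

    def aggregate(self, ids: list[int], llm: Callable[[str], str]) -> int:
        """Combine several thoughts into a new, synergistic one."""
        prompt = "Merge these partial solutions:\n" + "\n".join(
            self.thoughts[i].content for i in ids)
        return self.add(llm(prompt), parents=ids)

    def refine(self, tid: int, llm: Callable[[str], str]) -> int:
        """Feedback loop: improve an existing thought."""
        return self.add(llm("Improve:\n" + self.thoughts[tid].content), parents=[tid])

# Toy usage with a stand-in "LLM".
fake_llm = lambda prompt: f"[response to {len(prompt)}-char prompt]"
got = GraphOfThoughts()
a = got.add("sort chunk A")
b = got.add("sort chunk B")
merged = got.aggregate([a, b], fake_llm)
print(got.thoughts[merged].content, got.parents[merged])
```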

Audiovisual Moments in Time: A Large-Scale Annotated Dataset of Audiovisual Actions

  • paper_url: http://arxiv.org/abs/2308.09685
  • repo_url: https://github.com/mjoannou/audiovisual-moments-in-time
  • paper_authors: Michael Joannou, Pia Rotshtein, Uta Noppeney
  • for: Providing Audiovisual Moments in Time (AVMIT), a large-scale annotated dataset of audiovisual action events for research with computational models and human participants.
  • methods: In an extensive annotation task, 11 participants labelled 3-second audiovisual videos from the Moments in Time (MIT) dataset, assessing for each trial whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video.
  • results: The dataset includes annotations for 57,177 audiovisual videos, each independently evaluated by 3 participants. From this initial collection, a curated test set of 16 distinct action classes with 60 videos each (960 videos) was created, along with two sets of pre-computed audiovisual feature embeddings (VGGish/YamNet for audio and VGG16/EfficientNetB0 for video) that lower the barrier to entry for audiovisual DNN research.
    Abstract We present Audiovisual Moments in Time (AVMIT), a large-scale dataset of audiovisual action events. In an extensive annotation task 11 participants labelled a subset of 3-second audiovisual videos from the Moments in Time dataset (MIT). For each trial, participants assessed whether the labelled audiovisual action event was present and whether it was the most prominent feature of the video. The dataset includes the annotation of 57,177 audiovisual videos, each independently evaluated by 3 of 11 trained participants. From this initial collection, we created a curated test set of 16 distinct action classes, with 60 videos each (960 videos). We also offer 2 sets of pre-computed audiovisual feature embeddings, using VGGish/YamNet for audio data and VGG16/EfficientNetB0 for visual data, thereby lowering the barrier to entry for audiovisual DNN research. We explored the advantages of AVMIT annotations and feature embeddings to improve performance on audiovisual event recognition. A series of 6 Recurrent Neural Networks (RNNs) were trained on either AVMIT-filtered audiovisual events or modality-agnostic events from MIT, and then tested on our audiovisual test set. In all RNNs, top 1 accuracy was increased by 2.71-5.94\% by training exclusively on audiovisual events, even outweighing a three-fold increase in training data. We anticipate that the newly annotated AVMIT dataset will serve as a valuable resource for research and comparative experiments involving computational models and human participants, specifically when addressing research questions where audiovisual correspondence is of critical importance.

Variational optimization of the amplitude of neural-network quantum many-body ground states

  • paper_url: http://arxiv.org/abs/2308.09664
  • repo_url: None
  • paper_authors: Jia-Qi Wang, Rong-Qiang He, Zhong-Yi Lu
  • for: Investigating and optimizing neural-network quantum state (NQS) approaches for finding quantum many-body ground states.
  • methods: The quantum many-body variational wave function is split into a product of a real-valued amplitude neural network and a fixed sign structure, and only the amplitude network, a convolutional neural network with residual blocks (a ResNet), is optimized.
  • results: Tested on three typical quantum many-body systems, the obtained ground-state energies are lower than or comparable to those from traditional variational Monte Carlo (VMC) and density matrix renormalization group (DMRG) methods. For the frustrated Heisenberg $J_1$-$J_2$ model, the results are better than those of the complex-valued CNN in the literature, suggesting that the sign structure of complex-valued NQSs is difficult to optimize; optimizing the sign structure is left for future work.
    Abstract Neural-network quantum states (NQSs), variationally optimized by combining traditional methods and deep learning techniques, is a new way to find quantum many-body ground states and gradually becomes a competitor of traditional variational methods. However, there are still some difficulties in the optimization of NQSs, such as local minima, slow convergence, and sign structure optimization. Here, we split a quantum many-body variational wave function into a multiplication of a real-valued amplitude neural network and a sign structure, and focus on the optimization of the amplitude network while keeping the sign structure fixed. The amplitude network is a convolutional neural network (CNN) with residual blocks, namely a ResNet. Our method is tested on three typical quantum many-body systems. The obtained ground state energies are lower than or comparable to those from traditional variational Monte Carlo (VMC) methods and density matrix renormalization group (DMRG). Surprisingly, for the frustrated Heisenberg $J_1$-$J_2$ model, our results are better than those of the complex-valued CNN in the literature, implying that the sign structure of the complex-valued NQS is difficult to be optimized. We will study the optimization of the sign structure of NQSs in the future.
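A minimal sketch of the factorized ansatz under illustrative assumptions: a tiny CNN plays the role of the real-valued amplitude network, and the Marshall sign rule for a bipartite spin chain stands in for the fixed sign structure. The paper's ResNet amplitude network and the sign structures it fixes for each model are more elaborate.

```python
import torch
import torch.nn as nn

class AmplitudeCNN(nn.Module):
    """Real-valued amplitude network: log|psi(s)| for a 1D spin configuration."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1, padding_mode="circular"),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1, padding_mode="circular"),
            nn.ReLU(),
        )

    def forward(self, spins: torch.Tensor) -> torch.Tensor:
        # spins: (batch, L) with entries +/-1 -> log-amplitude per sample
        h = self.net(spins.unsqueeze(1).float())
        return h.sum(dim=(1, 2))

def marshall_sign(spins: torch.Tensor) -> torch.Tensor:
    """Fixed sign structure (Marshall rule): (-1)^(number of up spins on sublattice A)."""
    n_up_a = ((spins[:, ::2] + 1) // 2).sum(dim=1)
    return 1.0 - 2.0 * (n_up_a % 2).float()

def psi(amplitude_net: nn.Module, spins: torch.Tensor) -> torch.Tensor:
    """psi(s) = fixed_sign(s) * exp(log_amplitude(s)); only the amplitude is trained."""
    return marshall_sign(spins) * torch.exp(amplitude_net(spins))

net = AmplitudeCNN()
spins = torch.tensor([[1, -1, 1, -1, 1, -1, 1, -1],
                      [1, 1, -1, -1, 1, -1, -1, 1]])
print(psi(net, spins))
```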

GiGaMAE: Generalizable Graph Masked Autoencoder via Collaborative Latent Space Reconstruction

  • paper_url: http://arxiv.org/abs/2308.09663
  • repo_url: https://github.com/sycny/gigamae
  • paper_authors: Yucheng Shi, Yushun Dong, Qiaoyu Tan, Jundong Li, Ninghao Liu
  • for: Proposing a self-supervised graph masked autoencoder framework that addresses the poor generalization of existing masked autoencoder models on graph data.
  • methods: GiGaMAE, a graph masked autoencoder that, instead of explicitly reconstructing raw graph components (e.g., features or edges), collaboratively reconstructs informative, integrated latent embeddings that encompass graph topology and attribute information, so as to capture more generalized and comprehensive knowledge; a mutual-information-based reconstruction loss enables effective reconstruction of multiple targets and distinguishes knowledge exclusive to a single target from knowledge shared across targets.
  • results: Extensive experiments on three downstream tasks with seven benchmark datasets demonstrate the superiority of GiGaMAE over state-of-the-art baselines; the authors hope the results will shed light on the design of foundation models for graph-structured data.
    Abstract Self-supervised learning with masked autoencoders has recently gained popularity for its ability to produce effective image or textual representations, which can be applied to various downstream tasks without retraining. However, we observe that the current masked autoencoder models lack good generalization ability on graph data. To tackle this issue, we propose a novel graph masked autoencoder framework called GiGaMAE. Different from existing masked autoencoders that learn node presentations by explicitly reconstructing the original graph components (e.g., features or edges), in this paper, we propose to collaboratively reconstruct informative and integrated latent embeddings. By considering embeddings encompassing graph topology and attribute information as reconstruction targets, our model could capture more generalized and comprehensive knowledge. Furthermore, we introduce a mutual information based reconstruction loss that enables the effective reconstruction of multiple targets. This learning objective allows us to differentiate between the exclusive knowledge learned from a single target and common knowledge shared by multiple targets. We evaluate our method on three downstream tasks with seven datasets as benchmarks. Extensive experiments demonstrate the superiority of GiGaMAE against state-of-the-art baselines. We hope our results will shed light on the design of foundation models on graph-structured data. Our code is available at: https://github.com/sycny/GiGaMAE.

Robust Uncertainty Quantification using Conformalised Monte Carlo Prediction

  • paper_url: http://arxiv.org/abs/2308.09647
  • repo_url: https://github.com/team-daniel/mc-cp
  • paper_authors: Daniel Bethell, Simos Gerasimou, Radu Calinescu
  • for: Providing dependable uncertainty quantification for deploying deep learning models in safety-critical applications.
  • methods: MC-CP, a novel hybrid uncertainty quantification method that combines a new adaptive Monte Carlo (MC) dropout scheme, which modulates dropout at runtime to save memory and computation, with conformal prediction (CP), yielding robust prediction sets/intervals.
  • results: Across comprehensive classification and regression benchmarks, MC-CP delivers significant improvements over advanced UQ methods such as MC dropout, RAPS, and CQR, and it can easily be added to existing models, making deployment simple.
    Abstract Deploying deep learning models in safety-critical applications remains a very challenging task, mandating the provision of assurances for the dependable operation of these models. Uncertainty quantification (UQ) methods estimate the model's confidence per prediction, informing decision-making by considering the effect of randomness and model misspecification. Despite the advances of state-of-the-art UQ methods, they are computationally expensive or produce conservative prediction sets/intervals. We introduce MC-CP, a novel hybrid UQ method that combines a new adaptive Monte Carlo (MC) dropout method with conformal prediction (CP). MC-CP adaptively modulates the traditional MC dropout at runtime to save memory and computation resources, enabling predictions to be consumed by CP, yielding robust prediction sets/intervals. Throughout comprehensive experiments, we show that MC-CP delivers significant improvements over advanced UQ methods, like MC dropout, RAPS and CQR, both in classification and regression benchmarks. MC-CP can be easily added to existing models, making its deployment simple.
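A minimal sketch of the general MC dropout plus conformal prediction combination (plain split conformal applied to MC-averaged class probabilities). The adaptive dropout modulation that distinguishes MC-CP is not reproduced here, and the model and calibration details are illustrative assumptions.

```python
import torch
import torch.nn as nn

def mc_predict(model: nn.Module, x: torch.Tensor, passes: int = 30) -> torch.Tensor:
    """Average softmax probabilities over stochastic forward passes (dropout kept on)."""
    model.train()                      # keep dropout active at inference
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(passes)])
    return probs.mean(dim=0)

def conformal_threshold(cal_probs: torch.Tensor, cal_labels: torch.Tensor,
                        alpha: float = 0.1) -> float:
    """Split conformal: quantile of nonconformity scores 1 - p(true class)."""
    scores = 1.0 - cal_probs[torch.arange(len(cal_labels)), cal_labels]
    n = len(scores)
    q = torch.quantile(scores, min(1.0, (n + 1) * (1 - alpha) / n))
    return float(q)

def prediction_set(test_probs: torch.Tensor, qhat: float) -> list[list[int]]:
    """Include every class whose nonconformity score falls below the threshold."""
    return [(row >= 1.0 - qhat).nonzero().flatten().tolist() for row in test_probs]

# Toy usage with a dropout classifier on random data.
model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Dropout(0.3), nn.Linear(32, 4))
x_cal, y_cal = torch.randn(200, 8), torch.randint(0, 4, (200,))
x_test = torch.randn(5, 8)
qhat = conformal_threshold(mc_predict(model, x_cal), y_cal)
print(prediction_set(mc_predict(model, x_test), qhat))
```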

biquality-learn: a Python library for Biquality Learning

  • paper_url: http://arxiv.org/abs/2308.09643
  • repo_url: https://github.com/biquality-learn/biquality-learn
  • paper_authors: Pierre Nodet, Vincent Lemaire, Alexis Bondu, Antoine Cornuéjols
  • for: The paper aims to address the challenges of weak supervision and dataset shifts in machine learning, and proposes a new framework called Biquality Learning.
  • methods: The paper proposes a Python library called biquality-learn, which provides a consistent and intuitive API for learning machine learning models from biquality data. The library includes well-proven algorithms and is designed to be accessible and easy to use for everyone.
  • results: The paper enables researchers to experiment in a reproducible way on biquality data, and demonstrates the effectiveness of the proposed framework through experiments on several benchmark datasets.
    Abstract The democratization of Data Mining has been widely successful thanks in part to powerful and easy-to-use Machine Learning libraries. These libraries have been particularly tailored to tackle Supervised Learning. However, strong supervision signals are scarce in practice, and practitioners must resort to weak supervision. In addition to weaknesses of supervision, dataset shifts are another kind of phenomenon that occurs when deploying machine learning models in the real world. That is why Biquality Learning has been proposed as a machine learning framework to design algorithms capable of handling multiple weaknesses of supervision and dataset shifts without assumptions on their nature and level by relying on the availability of a small trusted dataset composed of cleanly labeled and representative samples. Thus we propose biquality-learn: a Python library for Biquality Learning with an intuitive and consistent API to learn machine learning models from biquality data, with well-proven algorithms, accessible and easy to use for everyone, and enabling researchers to experiment in a reproducible way on biquality data.