cs.LG - 2023-08-17

Enhancing API Documentation through BERTopic Modeling and Summarization

  • paper_url: http://arxiv.org/abs/2308.09070
  • repo_url: https://github.com/scam2023-bert/bertopic
  • paper_authors: AmirHossein Naghshzan, Sylvie Ratte
  • for: This study aims to make API documentation easier and faster to understand, helping developers make better use of API functionality.
  • methods: BERTopic is used for topic modeling, together with natural language processing (NLP) techniques, to automatically generate summaries of API documentation, improving the efficiency with which developers retrieve information.
  • results: The study identifies recurring topics and common issues and generates potential solutions, offering valuable insights and practical tools for API documentation analysis and developer workflows.
    Abstract As the amount of textual data in various fields, including software development, continues to grow, there is a pressing demand for efficient and effective extraction and presentation of meaningful insights. This paper presents a unique approach to address this need, focusing on the complexities of interpreting Application Programming Interface (API) documentation. While official API documentation serves as a primary source of information for developers, it can often be extensive and lacks user-friendliness. In light of this, developers frequently resort to unofficial sources like Stack Overflow and GitHub. Our novel approach employs the strengths of BERTopic for topic modeling and Natural Language Processing (NLP) to automatically generate summaries of API documentation, thereby creating a more efficient method for developers to extract the information they need. The produced summaries and topics are evaluated based on their performance, coherence, and interoperability. The findings of this research contribute to the field of API documentation analysis by providing insights into recurring topics, identifying common issues, and generating potential solutions. By improving the accessibility and efficiency of API documentation comprehension, our work aims to enhance the software development process and empower developers with practical tools for navigating complex APIs.
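
As a rough illustration of the pipeline the abstract describes, the sketch below clusters API-documentation paragraphs with the BERTopic library. The tiny seed list and the `min_topic_size` setting are illustrative assumptions, not the authors' configuration; a real run would use a large corpus of distinct paragraphs.

```python
from bertopic import BERTopic

seed_paragraphs = [
    "The Connection interface represents a session with a specific database.",
    "Use Statement objects to execute static SQL queries.",
    "PreparedStatement precompiles SQL and accepts input parameters.",
]
# Toy corpus: BERTopic's dimensionality-reduction step needs many documents,
# so we repeat the seeds; a real run would use hundreds of distinct paragraphs.
docs = seed_paragraphs * 50

topic_model = BERTopic(min_topic_size=5)        # cluster paragraphs into topics
topics, probs = topic_model.fit_transform(docs)

print(topic_model.get_topic_info())             # recurring topics and their sizes
for topic_id in sorted(set(topics)):
    print(topic_id, topic_model.get_topic(topic_id))
```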

Uplift Modeling: from Causal Inference to Personalization

  • paper_url: http://arxiv.org/abs/2308.09066
  • repo_url: None
  • paper_authors: Felipe Moraes, Hugo Manuel Proença, Anastasiia Kornilova, Javier Albert, Dmitri Goldenberg
  • for: This tutorial introduces causality and uplift modeling for running personalized promotional campaigns on online e-commerce platforms.
  • methods: It covers state-of-the-art uplift modeling techniques, discussing the advantages and limitations of different approaches.
  • results: It presents real-life applications and discusses the challenges of implementing these models in production.
    Abstract Uplift modeling is a collection of machine learning techniques for estimating causal effects of a treatment at the individual or subgroup levels. Over the last years, causality and uplift modeling have become key trends in personalization at online e-commerce platforms, enabling the selection of the best treatment for each user in order to maximize the target business metric. Uplift modeling can be particularly useful for personalized promotional campaigns, where the potential benefit caused by a promotion needs to be weighed against the potential costs. In this tutorial we will cover basic concepts of causality and introduce the audience to state-of-the-art techniques in uplift modeling. We will discuss the advantages and the limitations of different approaches and dive into the unique setup of constrained uplift modeling. Finally, we will present real-life applications and discuss challenges in implementing these models in production.
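
For readers new to the topic, the sketch below implements the classic two-model ("T-learner") approach on synthetic data; it is one standard technique from the family the tutorial surveys, not code from the tutorial itself.

```python
# Two-model uplift estimation: fit separate outcome models on treated and
# control users, then take the difference of predicted conversion rates.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))                    # user features
t = rng.integers(0, 2, size=n)                 # 1 = received the promotion
# synthetic outcome with a heterogeneous treatment effect
p = 1 / (1 + np.exp(-(X[:, 0] + t * 0.8 * X[:, 1])))
y = rng.binomial(1, p)

m_treat = GradientBoostingClassifier().fit(X[t == 1], y[t == 1])
m_ctrl = GradientBoostingClassifier().fit(X[t == 0], y[t == 0])

# uplift = estimated individual effect of the promotion on conversion
uplift = m_treat.predict_proba(X)[:, 1] - m_ctrl.predict_proba(X)[:, 1]
print("target the top users by uplift:", np.argsort(uplift)[-5:])
```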

Discretization-Induced Dirichlet Posterior for Robust Uncertainty Quantification on Regression

  • paper_url: http://arxiv.org/abs/2308.09065
  • repo_url: None
  • paper_authors: Xuanlong Yu, Gianni Franchi, Jindong Gu, Emanuel Aldea
  • for: This paper proposes a more robust Auxiliary Uncertainty Estimator (AuxUE) for estimating the uncertainty of deep neural networks (DNNs), so that they can be deployed in real-world applications.
  • methods: Different distribution assumptions are considered for estimating aleatoric uncertainty under Out-of-Distribution (OOD) inputs, with the Laplace distribution ultimately chosen to approximate the prediction error; for epistemic uncertainty, a novel method named Discretization-Induced Dirichlet pOsterior (DIDO) models a Dirichlet posterior on the discretized prediction error.
  • results: Experiments show that the proposed method provides robust uncertainty estimates on a range of vision tasks, including age estimation, monocular depth estimation, and super-resolution, and that it scales to both image-level and pixel-wise tasks.
    Abstract Uncertainty quantification is critical for deploying deep neural networks (DNNs) in real-world applications. An Auxiliary Uncertainty Estimator (AuxUE) is one of the most effective means to estimate the uncertainty of the main task prediction without modifying the main task model. To be considered robust, an AuxUE must be capable of maintaining its performance and triggering higher uncertainties while encountering Out-of-Distribution (OOD) inputs, i.e., to provide robust aleatoric and epistemic uncertainty. However, for vision regression tasks, current AuxUE designs are mainly adopted for aleatoric uncertainty estimates, and AuxUE robustness has not been explored. In this work, we propose a generalized AuxUE scheme for more robust uncertainty quantification on regression tasks. Concretely, to achieve a more robust aleatoric uncertainty estimation, different distribution assumptions are considered for heteroscedastic noise, and Laplace distribution is finally chosen to approximate the prediction error. For epistemic uncertainty, we propose a novel solution named Discretization-Induced Dirichlet pOsterior (DIDO), which models the Dirichlet posterior on the discretized prediction error. Extensive experiments on age estimation, monocular depth estimation, and super-resolution tasks show that our proposed method can provide robust uncertainty estimates in the face of noisy inputs and that it can be scalable to both image-level and pixel-wise tasks.
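
The core idea of DIDO, as stated in the abstract, is a Dirichlet posterior over discretized prediction errors. The toy sketch below illustrates that idea with a fixed binning and a uniform prior; the binning scheme and how the posterior is actually learned here are assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import dirichlet

rng = np.random.default_rng(0)
errors = rng.laplace(scale=0.5, size=200)       # main-task prediction errors
edges = np.linspace(-2.0, 2.0, 9)               # 8 discretization bins
counts = np.histogram(np.clip(errors, -2, 2), bins=edges)[0]

alpha = counts + 1.0                            # Dirichlet posterior with a uniform prior
mean_probs = alpha / alpha.sum()                # expected distribution over error bins
# one possible epistemic signal: the spread of the posterior itself
print("posterior mean:", np.round(mean_probs, 3))
print("posterior entropy:", dirichlet.entropy(alpha))
```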

Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods

  • paper_url: http://arxiv.org/abs/2308.09051
  • repo_url: None
  • paper_authors: Paavo Alku, Sudarsana Reddy Kadiri, Dhananjaya Gowda
  • for: This study investigates and refines an existing data-driven formant tracker, DeepFormants.
  • methods: Linear prediction (LP)-based formant estimation is used for the refinement, including conventional covariance analysis (LP-COV) and the recently proposed quasi-closed phase forward-backward (QCP-FB) analysis.
  • results: Refining the formants predicted by DeepFormants with the LP-based methods improves tracking performance and makes the tracker more resilient to noise-corrupted VTR speech; moreover, the refinement can be plugged into existing data-driven trackers without any new data learning.
    Abstract In this study, formant tracking is investigated by refining the formants tracked by an existing data-driven tracker, DeepFormants, using the formants estimated in a model-driven manner by linear prediction (LP)-based methods. As LP-based formant estimation methods, conventional covariance analysis (LP-COV) and the recently proposed quasi-closed phase forward-backward (QCP-FB) analysis are used. In the proposed refinement approach, the contours of the three lowest formants are first predicted by the data-driven DeepFormants tracker, and the predicted formants are replaced frame-wise with local spectral peaks shown by the model-driven LP-based methods. The refinement procedure can be plugged into the DeepFormants tracker with no need for any new data learning. Two refined DeepFormants trackers were compared with the original DeepFormants and with five known traditional trackers using the popular vocal tract resonance (VTR) corpus. The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis. In addition, by tracking formants using VTR speech that was corrupted by additive noise, the study showed that the refined DeepFormants trackers were more resilient to noise than the reference trackers. In general, these results suggest that LP-based model-driven approaches, which have traditionally been used in formant estimation, can be combined with a modern data-driven tracker easily with no further training to improve the tracker's performance.
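
The frame-wise refinement rule from the abstract, replacing each predicted formant with the nearest local peak of an LP spectrum, can be sketched in a few lines. The LP envelope is assumed given here; computing the LP-COV or QCP-FB spectra themselves is outside the scope of this sketch.

```python
import numpy as np
from scipy.signal import find_peaks

def refine_formants(predicted_hz, lp_spectrum, freqs_hz):
    """Snap each predicted formant to the closest local spectral peak."""
    peak_idx, _ = find_peaks(lp_spectrum)
    peak_hz = freqs_hz[peak_idx]
    if peak_hz.size == 0:
        return predicted_hz                     # keep the prediction if no peaks exist
    return np.array([peak_hz[np.argmin(np.abs(peak_hz - f))] for f in predicted_hz])

# toy frame: an LP-style envelope with peaks near 500, 1500, and 2500 Hz
freqs = np.linspace(0, 4000, 512)
env = sum(np.exp(-((freqs - f0) / 120.0) ** 2) for f0 in (500, 1500, 2500))
print(refine_formants(np.array([480.0, 1600.0, 2400.0]), env, freqs))
```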

Kernel-Based Tests for Likelihood-Free Hypothesis Testing

  • paper_url: http://arxiv.org/abs/2308.09043
  • repo_url: None
  • paper_authors: Patrik Róbert Gerber, Tianze Jiang, Yury Polyanskiy, Rui Sun
  • for: This paper studies how to label an additional m samples, known to all belong to one of two classes, given a limited number n of labeled observations from the two classes; the intermediate regimes of this problem arise in likelihood-free inference.
  • methods: The paper builds on the likelihood-ratio test and the maximum mean discrepancy (MMD), and evaluates empirical performance with kernels parameterized by neural networks.
  • results: The paper establishes a fundamental trade-off between m and n (increasing the data sample m reduces the amount n of training/simulation data needed) and confirms the theoretically predicted asymmetric trade-off on two real problems: detecting the Higgs boson and detecting planted DDPM-generated images among CIFAR-10 images.
    Abstract Given $n$ observations from two balanced classes, consider the task of labeling an additional $m$ inputs that are known to all belong to \emph{one} of the two classes. Special cases of this problem are well-known: with complete knowledge of class distributions ($n=\infty$) the problem is solved optimally by the likelihood-ratio test; when $m=1$ it corresponds to binary classification; and when $m\approx n$ it is equivalent to two-sample testing. The intermediate settings occur in the field of likelihood-free inference, where labeled samples are obtained by running forward simulations and the unlabeled sample is collected experimentally. In recent work it was discovered that there is a fundamental trade-off between $m$ and $n$: increasing the data sample $m$ reduces the amount $n$ of training/simulation data needed. In this work we (a) introduce a generalization where unlabeled samples come from a mixture of the two classes -- a case often encountered in practice; (b) study the minimax sample complexity for non-parametric classes of densities under \textit{maximum mean discrepancy} (MMD) separation; and (c) investigate the empirical performance of kernels parameterized by neural networks on two tasks: detection of the Higgs boson and detection of planted DDPM generated images amidst CIFAR-10 images. For both problems we confirm the existence of the theoretically predicted asymmetric $m$ vs $n$ trade-off.
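
The separation measure underlying the minimax analysis is the maximum mean discrepancy. Below is a plain-NumPy sketch of the standard unbiased MMD^2 estimator with a Gaussian kernel; it is independent of the paper's neural-network-parameterized kernels.

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of squared MMD with a Gaussian (RBF) kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))   # drop diagonal terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
print("MMD^2 estimate:", mmd2_unbiased(X, Y))
```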

LesionMix: A Lesion-Level Data Augmentation Method for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2308.09026
  • repo_url: https://github.com/dogabasaran/lesionmix
  • paper_authors: Berke Doga Basaran, Weitong Zhang, Mengyun Qiao, Bernhard Kainz, Paul M. Matthews, Wenjia Bai
  • for: The paper proposes LesionMix, a novel data augmentation method designed to improve the accuracy of deep learning-based medical image segmentation.
  • methods: LesionMix combines spatial and intensity transformations to augment medical images, focusing on lesion-aware augmentation at the lesion level; it allows both lesion populating and inpainting and is evaluated on multiple modalities and lesion datasets.
  • results: LesionMix achieves promising performance in lesion image segmentation, outperforming several recent Mix-based data augmentation methods; the code will be released on GitHub for further use and evaluation.
    Abstract Data augmentation has become a de facto component of deep learning-based medical image segmentation methods. Most data augmentation techniques used in medical imaging focus on spatial and intensity transformations to improve the diversity of training images. They are often designed at the image level, augmenting the full image, and do not pay attention to specific abnormalities within the image. Here, we present LesionMix, a novel and simple lesion-aware data augmentation method. It performs augmentation at the lesion level, increasing the diversity of lesion shape, location, intensity and load distribution, and allowing both lesion populating and inpainting. Experiments on different modalities and different lesion datasets, including four brain MR lesion datasets and one liver CT lesion dataset, demonstrate that LesionMix achieves promising performance in lesion image segmentation, outperforming several recent Mix-based data augmentation methods. The code will be released at https://github.com/dogabasaran/lesionmix.
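
A toy, NumPy-only sketch of the lesion "populating" operation described in the abstract: paste a masked lesion from a donor image into a target image with a random intensity scaling. The actual LesionMix (see the repository above) additionally varies lesion shape, location, and load distribution and supports inpainting.

```python
import numpy as np

def populate_lesion(target, donor, lesion_mask, rng):
    """Paste the donor's lesion voxels into the target image."""
    out = target.copy()
    scale = rng.uniform(0.8, 1.2)               # random intensity augmentation
    out[lesion_mask] = donor[lesion_mask] * scale
    return out

rng = np.random.default_rng(0)
target = rng.normal(size=(64, 64))
donor = rng.normal(loc=2.0, size=(64, 64))      # brighter "lesion" tissue
mask = np.zeros((64, 64), dtype=bool)
mask[20:28, 30:40] = True                       # lesion location in the donor
augmented = populate_lesion(target, donor, mask, rng)
print(augmented[mask].mean(), target[mask].mean())
```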

Reinforcement Learning for Battery Management in Dairy Farming

  • paper_url: http://arxiv.org/abs/2308.09023
  • repo_url: None
  • paper_authors: Nawazish Ali, Abdul Wahid, Rachael Shaw, Karl Mason
  • for: This study applies artificial intelligence (AI) to renewable energy use in dairy farming, aiming to reduce electricity costs and help meet governmental energy and sustainability goals.
  • methods: Q-learning is used to learn a policy for charging and discharging a battery within a dairy farm setting.
  • results: The learned policy significantly reduces electricity costs compared to the established baseline algorithm, demonstrating the effectiveness of reinforcement learning for battery management in the dairy farming sector.
    Abstract Dairy farming is a particularly energy-intensive part of the agriculture sector. Effective battery management is essential for renewable integration within the agriculture sector. However, controlling battery charging/discharging is a difficult task due to electricity demand variability, stochasticity of renewable generation, and energy price fluctuations. Despite the potential benefits of applying Artificial Intelligence (AI) to renewable energy in the context of dairy farming, there has been limited research in this area. This research is a priority for Ireland as it strives to meet its governmental goals in energy and sustainability. This research paper utilizes Q-learning to learn an effective policy for charging and discharging a battery within a dairy farm setting. The results demonstrate that the developed policy significantly reduces electricity costs compared to the established baseline algorithm. These findings highlight the effectiveness of reinforcement learning for battery management within the dairy farming sector.
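
As a concrete picture of the method, the sketch below runs tabular Q-learning on a crude battery model; the state discretization, price process, and cost function are stand-ins, not the paper's dairy-farm environment.

```python
import numpy as np

rng = np.random.default_rng(0)
n_soc, n_price, n_actions = 11, 3, 3            # charge bins, price levels, {discharge, idle, charge}
Q = np.zeros((n_soc, n_price, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(soc, price, action):
    delta = action - 1                                       # -1 discharge, 0 idle, +1 charge
    delta = int(np.clip(delta, -soc, n_soc - 1 - soc))       # respect battery limits
    cost = price * delta                                     # pay to charge, earn by discharging
    return soc + delta, int(rng.integers(n_price)), -cost    # reward = negative electricity cost

soc, price = 5, 1
for _ in range(50_000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[soc, price].argmax())
    soc2, price2, r = step(soc, price, a)
    Q[soc, price, a] += alpha * (r + gamma * Q[soc2, price2].max() - Q[soc, price, a])
    soc, price = soc2, price2

print("greedy action per price level at half charge:", Q[5].argmax(axis=1))
```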

Multi-field Visualisation via Trait-induced Merge Trees

  • paper_url: http://arxiv.org/abs/2308.09015
  • repo_url: None
  • paper_authors: Jochen Jankowai, Talha Bin Masood, Ingrid Hotz
  • for: This paper extends merge trees to feature level sets, targeting the analysis of tensor fields and general multi-variate data.
  • methods: Traits are defined in attribute space, following the feature level sets framework; the resulting distance field in attribute space induces a scalar field in the spatial domain that serves as input for topological data analysis, enabling the sorting and querying of feature level sets.
  • results: The proposed trait-induced merge trees yield a hierarchy of features that allows querying the most relevant and persistent features; three case studies from different domains demonstrate the broad applicability of the approach.
    Abstract In this work, we propose trait-based merge trees a generalization of merge trees to feature level sets, targeting the analysis of tensor field or general multi-variate data. For this, we employ the notion of traits defined in attribute space as introduced in the feature level sets framework. The resulting distance field in attribute space induces a scalar field in the spatial domain that serves as input for topological data analysis. The leaves in the merge tree represent those areas in the input data that are closest to the defined trait and thus most closely resemble the defined feature. Hence, the merge tree yields a hierarchy of features that allows for querying the most relevant and persistent features. The presented method includes different query methods for the tree which enable the highlighting of different aspects. We demonstrate the cross-application capabilities of this approach with three case studies from different domains.

Deep-seeded Clustering for Unsupervised Valence-Arousal Emotion Recognition from Physiological Signals

  • paper_url: http://arxiv.org/abs/2308.09013
  • repo_url: None
  • paper_authors: Antoine Dubois, Carlos Lima Azevedo, Sonja Haustein, Bruno Miranda
  • for: The paper addresses recognizing emotions from physiological and psychological data using unsupervised deep clustering methods.
  • methods: An unsupervised deep cluster framework is proposed, using deep k-means and deep c-means to distinguish the four quadrants of Russell's circumplex model of affect.
  • results: Tests on the open benchmark data set WESAD show an overall accuracy of 87% in recognizing emotions from physiological and psychological data, without the need for labels.
    Abstract Emotions play a significant role in the cognitive processes of the human brain, such as decision making, learning and perception. The use of physiological signals has shown to lead to more objective, reliable and accurate emotion recognition combined with raising machine learning methods. Supervised learning methods have dominated the attention of the research community, but the challenge in collecting needed labels makes emotion recognition difficult in large-scale semi- or uncontrolled experiments. Unsupervised methods are increasingly being explored, however sub-optimal signal feature selection and label identification challenges unsupervised methods' accuracy and applicability. This article proposes an unsupervised deep cluster framework for emotion recognition from physiological and psychological data. Tests on the open benchmark data set WESAD show that deep k-means and deep c-means distinguish the four quadrants of Russell's circumplex model of affect with an overall accuracy of 87%. Seeding the clusters with the subject's subjective assessments helps to circumvent the need for labels.
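
The "seeding" idea, initializing clusters from the subjects' subjective assessments, can be illustrated with plain k-means. Note the paper applies deep k-means / c-means on learned features, whereas this sketch runs on raw feature vectors.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(400, 2))            # physiological feature vectors
quadrant = rng.integers(0, 4, size=400)         # subjective valence/arousal quadrant labels

# seed each cluster at the mean of the samples the subjects assigned to it
seeds = np.stack([features[quadrant == q].mean(axis=0) for q in range(4)])
km = KMeans(n_clusters=4, init=seeds, n_init=1).fit(features)
print("cluster sizes:", np.bincount(km.labels_))
```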

Towards Lightweight Data Integration using Multi-workflow Provenance and Data Observability

  • paper_url: http://arxiv.org/abs/2308.09004
  • repo_url: None
  • paper_authors: Renan Souza, Tyler J. Skluzacek, Sean R. Wilkinson, Maxim Ziatdinov, Rafael Ferreira da Silva
  • for: This paper explores how to perform integrated data analysis across computing facilities, enabling Responsible AI development, FAIR principles, reproducibility, and user steering in the scientific discovery process.
  • methods: The approach combines data observability strategies, adapter system design, and provenance to address the heterogeneity of parallel systems and machine learning tools.
  • results: Experiments show that MIDA runs with near-zero overhead, scaling to 100,000 tasks on 1,680 CPU cores and to runs with up to 276 GPUs in parallel on the Summit supercomputer.
    Abstract Modern large-scale scientific discovery requires multidisciplinary collaboration across diverse computing facilities, including High Performance Computing (HPC) machines and the Edge-to-Cloud continuum. Integrated data analysis plays a crucial role in scientific discovery, especially in the current AI era, by enabling Responsible AI development, FAIR, Reproducibility, and User Steering. However, the heterogeneous nature of science poses challenges such as dealing with multiple supporting tools, cross-facility environments, and efficient HPC execution. Building on data observability, adapter system design, and provenance, we propose MIDA: an approach for lightweight runtime Multi-workflow Integrated Data Analysis. MIDA defines data observability strategies and adaptability methods for various parallel systems and machine learning tools. With observability, it intercepts the dataflows in the background without requiring instrumentation while integrating domain, provenance, and telemetry data at runtime into a unified database ready for user steering queries. We conduct experiments showing end-to-end multi-workflow analysis integrating data from Dask and MLFlow in a real distributed deep learning use case for materials science that runs on multiple environments with up to 276 GPUs in parallel. We show near-zero overhead running up to 100,000 tasks on 1,680 CPU cores on the Summit supercomputer.

DealMVC: Dual Contrastive Calibration for Multi-view Clustering

  • paper_url: http://arxiv.org/abs/2308.09000
  • repo_url: https://github.com/xihongyang1999/dealmvc
  • paper_authors: Xihong Yang, Jiaqi Jin, Siwei Wang, Ke Liang, Yue Liu, Yi Wen, Suyuan Liu, Sihang Zhou, Xinwang Liu, En Zhu
  • for: To improve multi-view clustering performance by handling similar but different samples in cross-view scenarios.
  • methods: A dual contrastive calibration network (DealMVC) is proposed, comprising a fusion mechanism for a global cross-view feature, a global contrastive calibration loss, and a local contrastive calibration loss.
  • results: Extensive experiments on eight benchmark datasets demonstrate the effectiveness and superiority of DealMVC; the code is released on GitHub.
    Abstract Benefiting from the strong view-consistent information mining capacity, multi-view contrastive clustering has attracted plenty of attention in recent years. However, we observe the following drawback, which limits the clustering performance from further improvement. The existing multi-view models mainly focus on the consistency of the same samples in different views while ignoring the circumstance of similar but different samples in cross-view scenarios. To solve this problem, we propose a novel Dual contrastive calibration network for Multi-View Clustering (DealMVC). Specifically, we first design a fusion mechanism to obtain a global cross-view feature. Then, a global contrastive calibration loss is proposed by aligning the view feature similarity graph and the high-confidence pseudo-label graph. Moreover, to utilize the diversity of multi-view information, we propose a local contrastive calibration loss to constrain the consistency of pair-wise view features. The feature structure is regularized by reliable class information, thus guaranteeing similar samples have similar features in different views. During the training procedure, the interacted cross-view feature is jointly optimized at both local and global levels. In comparison with other state-of-the-art approaches, the comprehensive experimental results obtained from eight benchmark datasets provide substantial validation of the effectiveness and superiority of our algorithm. We release the code of DealMVC at https://github.com/xihongyang1999/DealMVC on GitHub.

Reinforced Self-Training (ReST) for Language Modeling

  • paper_url: http://arxiv.org/abs/2308.08998
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas
  • for: To improve the quality of large language model (LLM) outputs by aligning them with human preferences through reinforcement learning from human feedback (RLHF).
  • methods: A simple algorithm inspired by growing batch reinforcement learning (RL), called Reinforced Self-Training (ReST), which makes RLHF-style alignment more efficient: starting from an initial LLM policy, ReST generates samples from the policy and uses offline RL algorithms to improve the policy on them.
  • results: ReST substantially improves translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks, in a compute- and sample-efficient manner.
    Abstract Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks in a compute and sample-efficient manner.
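
The grow/improve alternation of ReST can be sketched schematically: generate a dataset from the current policy (Grow), then repeatedly filter it by a rising reward threshold and fine-tune offline (Improve). Everything below, the stub policy, reward, and update, is a placeholder for the LLM and offline RL machinery in the paper.

```python
import random

def generate(policy, n):                        # Grow: sample from the current policy
    samples = []
    for _ in range(n):
        x = random.random()
        samples.append((x, policy * x))         # (sample, scalar reward) -- stub reward model
    return samples

def finetune(policy, dataset):                  # Improve: stand-in for an offline RL update
    return policy + 0.1 * len(dataset) / 1000.0

random.seed(0)
policy, reward_threshold = 1.0, 0.5
for grow_step in range(3):
    samples = generate(policy, n=1000)          # the dataset is produced offline, then reused
    for improve_step in range(2):
        kept = [s for s in samples if s[1] >= reward_threshold]
        policy = finetune(policy, kept)
        reward_threshold += 0.1                 # filter with a rising reward threshold
print("final policy parameter:", round(policy, 3))
```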

Learning representations by forward-propagating errors

  • paper_url: http://arxiv.org/abs/2308.09728
  • repo_url: None
  • paper_authors: Ryoungwoo Jang
  • for: To propose a light, fast learning algorithm for training neural networks on a CPU.
  • methods: The algorithm is based on a forward-propagating method, using the concept of dual numbers from algebraic geometry.
  • results: The algorithm is faster than conventional back-propagation, enabling fast neural network training on a central processing unit (CPU).
    Abstract Back-propagation (BP) is a widely used learning algorithm for neural network optimization. However, BP requires enormous computation cost and is too slow to train on a central processing unit (CPU). Therefore, current neural network optimization is performed on graphical processing units (GPU) with compute unified device architecture (CUDA) programming. In this paper, we propose a light, fast learning algorithm on CPU that is as fast as CUDA acceleration on GPU. This algorithm is based on a forward-propagating method, using the concept of dual numbers in algebraic geometry.
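
The dual-number concept the abstract refers to underlies forward-mode differentiation: a number a + b·ε with ε² = 0 carries a value and its derivative through ordinary arithmetic. Below is a minimal self-contained sketch of that idea, not the paper's implementation.

```python
class Dual:
    """Dual number a + b*eps with eps**2 = 0."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.eps + o.eps)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (a + a'eps)(b + b'eps) = ab + (a'b + ab')eps
        return Dual(self.val * o.val, self.eps * o.val + self.val * o.eps)
    __radd__, __rmul__ = __add__, __mul__

def f(x):
    return 3 * x * x + 2 * x + 1                # f'(x) = 6x + 2

y = f(Dual(2.0, 1.0))                           # seed dx/dx = 1
print(y.val, y.eps)                             # value 17.0, derivative 14.0
```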

Neural oscillators for generalization of physics-informed machine learning

  • paper_url: http://arxiv.org/abs/2308.08989
  • repo_url: None
  • paper_authors: Taniya Kapoor, Abhishek Chandra, Daniel M. Tartakovsky, Hongrui Wang, Alfredo Nunez, Rolf Dollevoet
  • for: To improve the generalization of physics-informed machine learning (PIML), especially for complex physical problems represented by partial differential equations (PDEs).
  • methods: Exploiting the inherent causality and temporal sequential characteristics of PDE solutions, PIML models are fused with recurrent neural architectures based on systems of ordinary differential equations, referred to as neural oscillators.
  • results: By effectively capturing long-time dependencies and mitigating the exploding and vanishing gradient problem, neural oscillators improve generalization in PIML tasks; extensive experiments on time-dependent nonlinear PDEs and biharmonic beam equations show the approach outperforms existing state-of-the-art methods across various metrics.
    Abstract A primary challenge of physics-informed machine learning (PIML) is its generalization beyond the training domain, especially when dealing with complex physical problems represented by partial differential equations (PDEs). This paper aims to enhance the generalization capabilities of PIML, facilitating practical, real-world applications where accurate predictions in unexplored regions are crucial. We leverage the inherent causality and temporal sequential characteristics of PDE solutions to fuse PIML models with recurrent neural architectures based on systems of ordinary differential equations, referred to as neural oscillators. Through effectively capturing long-time dependencies and mitigating the exploding and vanishing gradient problem, neural oscillators foster improved generalization in PIML tasks. Extensive experimentation involving time-dependent nonlinear PDEs and biharmonic beam equations demonstrates the efficacy of the proposed approach. Incorporating neural oscillators outperforms existing state-of-the-art methods on benchmark problems across various metrics. Consequently, the proposed method improves the generalization capabilities of PIML, providing accurate solutions for extrapolation and prediction beyond the training data.

Quantifying the biomimicry gap in biohybrid systems

  • paper_url: http://arxiv.org/abs/2308.08978
  • repo_url: None
  • paper_authors: Vaios Papaspyros, Guy Theraulaz, Clément Sire, Francesco Mondada
  • for: This paper uses biohybrid systems, in which robotic lures interact with animals, to probe and identify the mechanisms underlying collective animal behavior.
  • methods: The biohybrid system combines a biomimetic lure of a rummy-nose tetra fish (Hemigrammus rhodostomus) with a neural network (NN) model that generates biomimetic social interactions.
  • results: In experiments and simulations, the biohybrid system generates high-fidelity social interactions mirroring those of genuine fish pairs, maintaining minimal deviation from real-world interactions.
    Abstract Biohybrid systems in which robotic lures interact with animals have become compelling tools for probing and identifying the mechanisms underlying collective animal behavior. One key challenge lies in the transfer of social interaction models from simulations to reality, using robotics to validate the modeling hypotheses. This challenge arises in bridging what we term the "biomimicry gap", which is caused by imperfect robotic replicas, communication cues and physics constrains not incorporated in the simulations that may elicit unrealistic behavioral responses in animals. In this work, we used a biomimetic lure of a rummy-nose tetra fish (Hemigrammus rhodostomus) and a neural network (NN) model for generating biomimetic social interactions. Through experiments with a biohybrid pair comprising a fish and the robotic lure, a pair of real fish, and simulations of pairs of fish, we demonstrate that our biohybrid system generates high-fidelity social interactions mirroring those of genuine fish pairs. Our analyses highlight that: 1) the lure and NN maintain minimal deviation in real-world interactions compared to simulations and fish-only experiments, 2) our NN controls the robot efficiently in real-time, and 3) a comprehensive validation is crucial to bridge the biomimicry gap, ensuring realistic biohybrid systems.

Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models

  • paper_url: http://arxiv.org/abs/2308.08977
  • repo_url: None
  • paper_authors: Elizabeth Collins-Woodfin, Courtney Paquette, Elliot Paquette, Inbar Seroussi
  • for: This paper analyzes the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit, applied to generalized linear models and multi-index models (e.g., logistic regression, phase retrieval) with general data covariance.
  • methods: The dynamics of SGD are described by a deterministic equivalent in the form of a system of ordinary differential equations (ODEs) covering a wide class of statistics, such as the risk and other measures of sub-optimality; in addition, an SDE with a simplified diffusion coefficient (homogenized SGD) is introduced to analyze the dynamics of general statistics of SGD iterates.
  • results: The paper obtains learning-rate thresholds for the stability of SGD as well as convergence guarantees, and numerical simulations on standard examples give an excellent match to the theory.
    Abstract We analyze the dynamics of streaming stochastic gradient descent (SGD) in the high-dimensional limit when applied to generalized linear models and multi-index models (e.g. logistic regression, phase retrieval) with general data-covariance. In particular, we demonstrate a deterministic equivalent of SGD in the form of a system of ordinary differential equations that describes a wide class of statistics, such as the risk and other measures of sub-optimality. This equivalence holds with overwhelming probability when the model parameter count grows proportionally to the number of data. This framework allows us to obtain learning rate thresholds for stability of SGD as well as convergence guarantees. In addition to the deterministic equivalent, we introduce an SDE with a simplified diffusion coefficient (homogenized SGD) which allows us to analyze the dynamics of general statistics of SGD iterates. Finally, we illustrate this theory on some standard examples and show numerical simulations which give an excellent match to the theory.

Cross-city Few-Shot Traffic Forecasting via Traffic Pattern Bank

  • paper_url: http://arxiv.org/abs/2308.09727
  • repo_url: https://github.com/zhyliu00/tpb
  • paper_authors: Zhanyu Liu, Guanjie Zheng, Yanwei Yu
  • for: To improve traffic forecasting performance in data-scarce cities by transferring knowledge from data-rich cities.
  • methods: A cross-city few-shot approach builds a Traffic Pattern Bank (TPB): a pre-trained traffic patch encoder projects raw traffic data from data-rich cities into a high-dimensional space, from which the bank is generated through clustering; the data-scarce city queries the bank, and the resulting metaknowledge guides a downstream spatial-temporal forecasting model.
  • results: Experiments on real-world traffic data show that the approach outperforms existing methods.
    Abstract Traffic forecasting is a critical service in Intelligent Transportation Systems (ITS). Utilizing deep models to tackle this task relies heavily on data from traffic sensors or vehicle devices, while some cities might lack device support and thus have few available data. So, it is necessary to learn from data-rich cities and transfer the knowledge to data-scarce cities in order to improve the performance of traffic forecasting. To address this problem, we propose a cross-city few-shot traffic forecasting framework via Traffic Pattern Bank (TPB) due to that the traffic patterns are similar across cities. TPB utilizes a pre-trained traffic patch encoder to project raw traffic data from data-rich cities into high-dimensional space, from which a traffic pattern bank is generated through clustering. Then, the traffic data of the data-scarce city could query the traffic pattern bank and explicit relations between them are constructed. The metaknowledge is aggregated based on these relations and an adjacency matrix is constructed to guide a downstream spatial-temporal model in forecasting future traffic. The frequently used meta-training framework Reptile is adapted to find a better initial parameter for the learnable modules. Experiments on real-world traffic datasets show that TPB outperforms existing methods and demonstrates the effectiveness of our approach in cross-city few-shot traffic forecasting.
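
The bank-building and querying steps can be sketched with off-the-shelf clustering; here random vectors stand in for the patch embeddings that the paper obtains from its pre-trained encoder, and the bank size is an arbitrary choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
rich_city_patches = rng.normal(size=(5000, 32))   # stand-in embeddings from data-rich cities

# build the pattern bank by clustering and keeping the centroids
bank = KMeans(n_clusters=64, n_init=10).fit(rich_city_patches).cluster_centers_

# a data-scarce city queries the bank for its closest traffic patterns
scarce_city_patch = rng.normal(size=(1, 32))
sims = cosine_similarity(scarce_city_patch, bank)[0]
top_patterns = bank[np.argsort(sims)[-3:]]        # metaknowledge passed downstream
print("top pattern similarities:", np.sort(sims)[-3:])
```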

CONVERT: Contrastive Graph Clustering with Reliable Augmentation

  • paper_url: http://arxiv.org/abs/2308.08963
  • repo_url: https://github.com/xihongyang1999/convert
  • paper_authors: Xihong Yang, Cheng Tan, Yue Liu, Ke Liang, Siwei Wang, Sihang Zhou, Jun Xia, Stan Z. Li, Xinwang Liu, En Zhu
  • for: To propose a reliable contrastive graph clustering method, addressing existing methods' dependence on pre-defined augmentations and the resulting semantic drift of augmented views.
  • methods: Data augmentations are processed by a reversible perturb-recover network (RPRN), which distills reliable semantic information by recovering perturbed latent embeddings; a novel semantic loss further constrains the network by quantifying the perturbation and recovery.
  • results: Experiments on seven datasets demonstrate the effectiveness and superiority of the method over existing approaches.
    Abstract Contrastive graph node clustering via learnable data augmentation is a hot research spot in the field of unsupervised graph learning. The existing methods learn the sampling distribution of a pre-defined augmentation to generate data-driven augmentations automatically. Although promising clustering performance has been achieved, we observe the following drawback, which limits the clustering performance from further improvement: these strategies still rely on pre-defined augmentations, and the semantics of the augmented graph can easily drift. The reliability of the augmented view semantics for contrastive learning cannot be guaranteed, thus limiting the model performance. To address these problems, we propose a novel CONtrastiVe Graph ClustEring network with Reliable AugmenTation (CONVERT). Specifically, in our method, the data augmentations are processed by the proposed reversible perturb-recover network. It distills reliable semantic information by recovering the perturbed latent embeddings. Moreover, to further guarantee the reliability of semantics, a novel semantic loss is presented to constrain the network via quantifying the perturbation and recovery. Lastly, a label-matching mechanism is designed to guide the model by clustering information through aligning the semantic labels and the selected high-confidence clustering pseudo labels. Extensive experimental results on seven datasets demonstrate the effectiveness of the proposed method. We release the code and appendix of CONVERT at https://github.com/xihongyang1999/CONVERT on GitHub.

Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health

  • paper_url: http://arxiv.org/abs/2308.09726
  • repo_url: https://github.com/google-research/socialgood
  • paper_authors: Jackson A. Killian, Manish Jain, Yugang Jia, Jonathan Amar, Erich Huang, Milind Tambe
  • for: This paper studies how to make restless multi-armed bandit (RMAB) decisions equitably, motivated in particular by digital health applications.
  • methods: Two equity-aligned objectives from the fairness literature are considered, minimax reward and max Nash welfare, solved respectively with a water filling algorithm and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes.
  • results: Across three simulation domains, including a new digital health model, the approaches are multiple times more equitable than the current state of the art without drastic sacrifices to utility; code is available at https://github.com/google-research/socialgood/tree/equitable-rmab.
    Abstract Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision making in sequential settings with limited resources. RMABs are increasingly being used for sensitive decisions such as in public health, treatment scheduling, anti-poaching, and -- the motivation for this work -- digital health. For such high stakes settings, decisions must both improve outcomes and prevent disparities between groups (e.g., ensure health equity). We study equitable objectives for RMABs (ERMABs) for the first time. We consider two equity-aligned objectives from the fairness literature, minimax reward and max Nash welfare. We develop efficient algorithms for solving each -- a water filling algorithm for the former, and a greedy algorithm with theoretically motivated nuance to balance disparate group sizes for the latter. Finally, we demonstrate across three simulation domains, including a new digital health model, that our approaches can be multiple times more equitable than the current state of the art without drastic sacrifices to utility. Our findings underscore our work's urgency as RMABs permeate into systems that impact human and wildlife outcomes. Code is available at https://github.com/google-research/socialgood/tree/equitable-rmab
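
For intuition about the minimax-reward objective, the sketch below greedily "water fills" a pull budget toward whichever group is currently worst off. The paper's algorithm operates on full RMAB dynamics, which are abstracted here to a fixed per-pull reward gain per group.

```python
import numpy as np

def water_fill(base_reward, gain_per_pull, budget):
    """Spend the budget one pull at a time on the lowest-reward group."""
    reward = base_reward.astype(float).copy()
    pulls = np.zeros_like(reward, dtype=int)
    for _ in range(budget):
        worst = int(np.argmin(reward))          # raise the lowest group first
        reward[worst] += gain_per_pull[worst]
        pulls[worst] += 1
    return pulls, reward

pulls, reward = water_fill(np.array([0.2, 0.5, 0.9]),
                           np.array([0.05, 0.04, 0.03]), budget=20)
print("pulls per group:", pulls, "| final min reward:", reward.min().round(3))
```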

A Dual-Perspective Approach to Evaluating Feature Attribution Methods

  • paper_url: http://arxiv.org/abs/2308.08949
  • repo_url: None
  • paper_authors: Yawei Li, Yang Zhang, Kenji Kawaguchi, Ashkan Khakzar, Bernd Bischl, Mina Rezaei
  • for: This paper critiques existing faithfulness evaluations of feature attribution methods and proposes two new perspectives, soundness and completeness.
  • methods: Both perspectives rest on a firm mathematical foundation and yield quantitative metrics computable through efficient algorithms.
  • results: Applying these metrics to mainstream attribution methods provides a novel lens for analyzing and comparing feature attribution methods and reveals shortcomings of existing faithfulness evaluations.
    Abstract Feature attribution methods attempt to explain neural network predictions by identifying relevant features. However, establishing a cohesive framework for assessing feature attribution remains a challenge. There are several views through which we can evaluate attributions. One principal lens is to observe the effect of perturbing attributed features on the model's behavior (i.e., faithfulness). While providing useful insights, existing faithfulness evaluations suffer from shortcomings that we reveal in this paper. In this work, we propose two new perspectives within the faithfulness paradigm that reveal intuitive properties: soundness and completeness. Soundness assesses the degree to which attributed features are truly predictive features, while completeness examines how well the resulting attribution reveals all the predictive features. The two perspectives are based on a firm mathematical foundation and provide quantitative metrics that are computable through efficient algorithms. We apply these metrics to mainstream attribution methods, offering a novel lens through which to analyze and compare feature attribution methods.
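
For context, here is a bare-bones perturbation-style faithfulness check of the kind the paper critiques and refines: occlude the top-k attributed features and measure the prediction drop. The linear "model" and its exact attribution below are stand-ins for illustration only.

```python
import numpy as np

def prediction_drop(model, x, attribution, k, baseline=0.0):
    """Faithfulness probe: mask the k most-attributed features, compare outputs."""
    top_k = np.argsort(np.abs(attribution))[-k:]
    x_perturbed = x.copy()
    x_perturbed[top_k] = baseline
    return model(x) - model(x_perturbed)

w = np.array([2.0, -1.0, 0.1, 0.0])             # a linear "model"
model = lambda x: float(w @ x)
x = np.ones(4)
attribution = w * x                             # exact attribution for linear models
print("drop when masking top-2:", prediction_drop(model, x, attribution, k=2))
```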

Predicting Crop Yield With Machine Learning: An Extensive Analysis Of Input Modalities And Models On a Field and sub-field Level

  • paper_url: http://arxiv.org/abs/2308.08948
  • repo_url: None
  • paper_authors: Deepak Pathak, Miro Miranda, Francisco Mena, Cristhian Sanchez, Patrick Helber, Benjamin Bischke, Peter Habelitz, Hiba Najjar, Jayanth Siddamsetty, Diego Arenas, Michaela Vollmer, Marcela Charfuelan, Marlon Nuske, Andreas Dengel
  • for: To propose a simple yet effective early fusion method for crop yield prediction that handles input modalities with different temporal and spatial resolutions.
  • methods: High-resolution crop yield maps are used as ground truth to train crop- and model-agnostic methods at the sub-field level, with Sentinel-2 satellite imagery as the primary input modality, complemented by weather, soil, and DEM data.
  • results: The framework uses input modalities available with global coverage, making it globally scalable; the best-performing combination of input modalities depends on region, crop, and chosen model.
    Abstract We introduce a simple yet effective early fusion method for crop yield prediction that handles multiple input modalities with different temporal and spatial resolutions. We use high-resolution crop yield maps as ground truth data to train crop and machine learning model agnostic methods at the sub-field level. We use Sentinel-2 satellite imagery as the primary modality for input data with other complementary modalities, including weather, soil, and DEM data. The proposed method uses input modalities available with global coverage, making the framework globally scalable. We explicitly highlight the importance of input modalities for crop yield prediction and emphasize that the best-performing combination of input modalities depends on region, crop, and chosen model.

Interpretable Graph Neural Networks for Tabular Data

  • paper_url: http://arxiv.org/abs/2308.08945
  • repo_url: None
  • paper_authors: Amr Alkhatib, Sofiane Ennadir, Henrik Boström, Michalis Vazirgiannis
  • for: To develop an interpretable graph neural network (IGNNet) for tabular data.
  • methods: IGNNet constrains the learning algorithm to produce an interpretable model that shows exactly how the predictions are computed from the original input features.
  • results: A large-scale empirical investigation shows that IGNNet performs on par with state-of-the-art machine learning algorithms for tabular data, including XGBoost, Random Forests, and TabNet, while its explanations align with the true Shapley values of the features without additional computational overhead.
    Abstract Data in tabular format is frequently occurring in real-world applications. Graph Neural Networks (GNNs) have recently been extended to effectively handle such data, allowing feature interactions to be captured through representation learning. However, these approaches essentially produce black-box models, in the form of deep neural networks, precluding users from following the logic behind the model predictions. We propose an approach, called IGNNet (Interpretable Graph Neural Network for tabular data), which constrains the learning algorithm to produce an interpretable model, where the model shows how the predictions are exactly computed from the original input features. A large-scale empirical investigation is presented, showing that IGNNet is performing on par with state-of-the-art machine-learning algorithms that target tabular data, including XGBoost, Random Forests, and TabNet. At the same time, the results show that the explanations obtained from IGNNet are aligned with the true Shapley values of the features without incurring any additional computational overhead.

Causal Adversarial Perturbations for Individual Fairness and Robustness in Heterogeneous Data Spaces

  • paper_url: http://arxiv.org/abs/2308.08938
  • repo_url: https://github.com/Ehyaei/CAPIFY
  • paper_authors: Ahmad-Reza Ehyaei, Kiarash Mohammadi, Amir-Hossein Karimi, Samira Samadi, Golnoosh Farnadi
  • for: To simultaneously explore and integrate individual fairness, adversarial robustness, and structural causal models in heterogeneous data spaces, particularly when dealing with discrete sensitive attributes.
  • methods: Causal structural models and sensitive attributes are used to create a fair metric that measures semantic similarity among individuals; a novel causal adversarial perturbation combined with adversarial training yields a new regularizer that unifies individual fairness, causality, and robustness in the classifier.
  • results: Evaluations on real-world and synthetic datasets demonstrate an accurate classifier that simultaneously exhibits fairness, adversarial robustness, and causal awareness.
    Abstract As responsible AI gains importance in machine learning algorithms, properties such as fairness, adversarial robustness, and causality have received considerable attention in recent years. However, despite their individual significance, there remains a critical gap in simultaneously exploring and integrating these properties. In this paper, we propose a novel approach that examines the relationship between individual fairness, adversarial robustness, and structural causal models in heterogeneous data spaces, particularly when dealing with discrete sensitive attributes. We use causal structural models and sensitive attributes to create a fair metric and apply it to measure semantic similarity among individuals. By introducing a novel causal adversarial perturbation and applying adversarial training, we create a new regularizer that combines individual fairness, causality, and robustness in the classifier. Our method is evaluated on both real-world and synthetic datasets, demonstrating its effectiveness in achieving an accurate classifier that simultaneously exhibits fairness, adversarial robustness, and causal awareness.

Estimating fire Duration using regression methods

  • paper_url: http://arxiv.org/abs/2308.08936
  • repo_url: None
  • paper_authors: Hansong Xiao
  • for: This study proposes machine learning approaches that address the high computational cost and long run times of wildfire prediction.
  • methods: Random forest (RF), KNN, and XGBoost regression models, as well as image-based models such as CNNs and encoders, are used to predict the burning duration of a known wildfire.
  • results: Trained on historical fire data and landscape feature maps and tested against the most recent real values in the same area, the system makes fast and relatively accurate future predictions.
    Abstract Wildfire forecasting problems usually rely on complex grid-based mathematical models, mostly involving Computational fluid dynamics(CFD) and Celluar Automata, but these methods have always been computationally expensive and difficult to deliver a fast decision pattern. In this paper, we provide machine learning based approaches that solve the problem of high computational effort and time consumption. This paper predicts the burning duration of a known wildfire by RF(random forest), KNN, and XGBoost regression models and also image-based, like CNN and Encoder. Model inputs are based on the map of landscape features provided by satellites and the corresponding historical fire data in this area. This model is trained by happened fire data and landform feature maps and tested with the most recent real value in the same area. By processing the input differently to obtain the optimal outcome, the system is able to make fast and relatively accurate future predictions based on landscape images of known fires.
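
A minimal sketch of the tabular branch of such a study, fitting a random forest regressor on landscape features to predict burn duration; the features and targets below are synthetic placeholders for the satellite-derived maps and historical fire records.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))                  # e.g. slope, fuel load, wind, humidity, ...
y = 5 + 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=1000)  # burn duration (days)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out fires:", round(model.score(X_te, y_te), 3))
```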

On Data Imbalance in Molecular Property Prediction with Pre-training

  • paper_url: http://arxiv.org/abs/2308.08934
  • repo_url: None
  • paper_authors: Limin Wang, Masatoshi Hanai, Toyotaro Suzumura, Shun Takashige, Kenjiro Taura
  • for: To improve the accuracy of molecular property prediction models by addressing the bias that data imbalance introduces during pre-training.
  • methods: A combined approach of theoretical calculation and machine learning trains surrogate models on a subset of theoretical calculation results; pre-training is used to improve accuracy, and the loss function of the representative pre-training method, node masking, is modified to compensate for the imbalance in the input data.
  • results: Modifying the imbalance-affected loss function improves the final prediction accuracy after pre-training; the effectiveness of the proposal is confirmed through experiments and evaluations on a standard molecular property prediction benchmark.
    Abstract Revealing and analyzing the various properties of materials is an essential and critical issue in the development of materials, including batteries, semiconductors, catalysts, and pharmaceuticals. Traditionally, these properties have been determined through theoretical calculations and simulations. However, it is not practical to perform such calculations on every single candidate material. Recently, a combination method of the theoretical calculation and machine learning has emerged, that involves training machine learning models on a subset of theoretical calculation results to construct a surrogate model that can be applied to the remaining materials. On the other hand, a technique called pre-training is used to improve the accuracy of machine learning models. Pre-training involves training the model on pretext task, which is different from the target task, before training the model on the target task. This process aims to extract the input data features, stabilizing the learning process and improving its accuracy. However, in the case of molecular property prediction, there is a strong imbalance in the distribution of input data and features, which may lead to biased learning towards frequently occurring data during pre-training. In this study, we propose an effective pre-training method that addresses the imbalance in input data. We aim to improve the final accuracy by modifying the loss function of the existing representative pre-training method, node masking, to compensate the imbalance. We have investigated and assessed the impact of our proposed imbalance compensation on pre-training and the final prediction accuracy through experiments and evaluations using benchmark of molecular property prediction models.
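
One simple way to compensate label imbalance in a node-masking pre-training loss is to reweight the cross-entropy by inverse label frequency, as sketched below. The paper modifies the node-masking loss; the exact weighting it uses is not reproduced here, and the counts are invented.

```python
import torch
import torch.nn.functional as F

num_atom_types = 10
# how often each (hypothetical) atom type appears in the pre-training corpus
counts = torch.tensor([5000, 3000, 900, 400, 200, 150, 100, 80, 50, 20.0])
weights = counts.sum() / (num_atom_types * counts)   # rare types weigh more

logits = torch.randn(32, num_atom_types)             # predictions for masked nodes
targets = torch.randint(num_atom_types, (32,))       # true (masked) atom types
loss = F.cross_entropy(logits, targets, weight=weights)
print(loss.item())
```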

IMM: An Imitative Reinforcement Learning Approach with Predictive Representation Learning for Automatic Market Making

  • paper_url: http://arxiv.org/abs/2308.08918
  • repo_url: None
  • paper_authors: Hui Niu, Siyuan Li, Jiahao Zheng, Zhouchi Lin, Jian Li, Jian Guo, Bo An
  • For: This paper proposes a novel Reinforcement Learning (RL) framework called Imitative Market Maker (IMM) to develop multi-price level Market Making (MM) strategies efficiently.
  • Methods: IMM integrates effective state and action representations, representation learning, and imitation learning techniques to train the agent efficiently.
  • Results: The proposed IMM framework outperforms current RL-based market making strategies on several financial criteria, and the ablation study substantiates the effectiveness of the model components.
    Abstract Market making (MM) has attracted significant attention in financial trading owing to its essential function in ensuring market liquidity. With strong capabilities in sequential decision-making, Reinforcement Learning (RL) technology has achieved remarkable success in quantitative trading. Nonetheless, most existing RL-based MM methods focus on optimizing single-price level strategies, which fail under frequent order cancellations and loss of queue priority. Strategies involving multiple price levels align better with actual trading scenarios. However, given the complexity of the comprehensive trading action space that multi-price level strategies involve, the challenge of effectively training profitable RL agents for MM persists. Inspired by the efficient workflow of professional human market makers, we propose Imitative Market Maker (IMM), a novel RL framework leveraging both knowledge from suboptimal signal-based experts and direct policy interactions to develop multi-price level MM strategies efficiently. The framework starts by introducing effective state and action representations adept at encoding information about multi-price level orders. Furthermore, IMM integrates a representation learning unit capable of capturing both short- and long-term market trends to mitigate adverse selection risk. Subsequently, IMM formulates an expert strategy based on signals and trains the agent through the integration of RL and imitation learning techniques, leading to efficient learning. Extensive experimental results on four real-world market datasets demonstrate that IMM outperforms current RL-based market making strategies in terms of several financial criteria. The findings of the ablation study substantiate the effectiveness of the model components.

Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.08915
  • repo_url: https://github.com/dawnvince/mts_cad
  • paper_authors: Haotian Si, Changhua Pei, Zhihan Li, Yadong Zhao, Jingjing Li, Haiming Zhang, Zulong Diao, Jianhui Li, Gaogang Xie, Dan Pei
  • for: This paper proposes CAD, a multi-task-learning-based multivariate time series anomaly detection method that resolves the objective conflicts present in existing anomaly detection approaches.
  • methods: The method adapts a multi-task learning model inspired by the multi-gate mixture-of-experts (MMoE) design to resolve conflicts among metrics' objectives, and further proposes a task-oriented metric selection and a personalized-and-shared gating mechanism to improve performance.
  • results: Evaluated on multiple public datasets, the method achieves an average F1-score of 0.943, significantly outperforming existing approaches.
    Abstract Massive key performance indicators (KPIs) are monitored as multivariate time series data (MTS) to ensure the reliability of the software applications and service system. Accurately detecting the abnormality of MTS is very critical for subsequent fault elimination. The scarcity of anomalies and manual labeling has led to the development of various self-supervised MTS anomaly detection (AD) methods, which optimize an overall objective/loss encompassing all metrics' regression objectives/losses. However, our empirical study uncovers the prevalence of conflicts among metrics' regression objectives, causing MTS models to grapple with different losses. This critical aspect significantly impacts detection performance but has been overlooked in existing approaches. To address this problem, by mimicking the design of multi-gate mixture-of-experts (MMoE), we introduce CAD, a Conflict-aware multivariate KPI Anomaly Detection algorithm. CAD offers an exclusive structure for each metric to mitigate potential conflicts while fostering inter-metric promotions. Upon thorough investigation, we find that the poor performance of vanilla MMoE mainly comes from the input-output misalignment settings of MTS formulation and convergence issues arising from expansive tasks. To address these challenges, we propose a straightforward yet effective task-oriented metric selection and p&s (personalized and shared) gating mechanism, which establishes CAD as the first practicable multi-task learning (MTL) based MTS AD model. Evaluations on multiple public datasets reveal that CAD obtains an average F1-score of 0.943 across three public datasets, notably outperforming state-of-the-art methods. Our code is accessible at https://github.com/dawnvince/MTS_CAD.
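
To make the conflict-mitigation idea concrete, here is a minimal MMoE-style sketch in PyTorch in which each monitored metric (task) has its own gate over shared experts, so gradients from conflicting metrics are routed differently. CAD's actual personalized-and-shared gating and metric selection are more elaborate than this.

```python
# Minimal multi-gate mixture-of-experts (MMoE) sketch with per-metric heads.
# CAD's personalized-and-shared gating and metric selection go beyond this.
import torch
import torch.nn as nn

class MMoE(nn.Module):
    def __init__(self, in_dim, hidden, n_experts, n_metrics):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
             for _ in range(n_experts)])
        # One gate and one reconstruction head per monitored metric.
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, n_experts) for _ in range(n_metrics)])
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, 1) for _ in range(n_metrics)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B,E,H)
        outs = []
        for gate, head in zip(self.gates, self.heads):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)  # (B,E,1)
            mixed = (w * expert_out).sum(dim=1)               # (B,H)
            outs.append(head(mixed))
        return torch.cat(outs, dim=-1)  # (B, n_metrics) per-metric predictions

model = MMoE(in_dim=16, hidden=32, n_experts=4, n_metrics=5)
print(model(torch.randn(8, 16)).shape)  # torch.Size([8, 5])
```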

MoCLIM: Towards Accurate Cancer Subtyping via Multi-Omics Contrastive Learning with Omics-Inference Modeling

  • paper_url: http://arxiv.org/abs/2308.09725
  • repo_url: None
  • paper_authors: Ziwei Yang, Zheng Chen, Yasuko Matsubara, Yasushi Sakurai
  • for: This paper aims to improve cancer subtyping outcomes by fully exploiting the potential of multi-omics data.
  • methods: The paper proposes a representation learning framework called MoCLIM, which independently extracts informative features from distinct omics modalities and uses contrastive learning to integrate the features into a unified representation.
  • results: The experimental results on six cancer datasets demonstrate that the proposed approach significantly improves data fit and subtyping performance in fewer high-dimensional cancer instances, and provides high interpretability in medical analysis.
    Abstract Precision medicine fundamentally aims to establish causality between dysregulated biochemical mechanisms and cancer subtypes. Omics-based cancer subtyping has emerged as a revolutionary approach, as different levels of omics record the biochemical products of multistep processes in cancers. This paper focuses on fully exploiting the potential of multi-omics data to improve cancer subtyping outcomes, and hence develops MoCLIM, a representation learning framework. MoCLIM independently extracts informative features from distinct omics modalities. Using a unified representation informed by contrastive learning across omics modalities, we can cluster the subtypes of a given cancer well in a lower-dimensional latent space. This contrast can be interpreted as a projection of the inter-omics inference observed in biological networks. Experimental results on six cancer datasets demonstrate that our approach significantly improves data fit and subtyping performance with fewer high-dimensional cancer instances. Moreover, our framework incorporates various medical evaluations as the final component, providing high interpretability in medical analysis.
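
The cross-omics contrastive objective can be sketched as a standard InfoNCE loss between two omics views of the same samples. MoCLIM's full pipeline (per-omics encoders and omics-inference modeling) is richer than this; the sketch only illustrates the alignment step, and the omics names are illustrative.

```python
# InfoNCE sketch for aligning two omics views of the same samples (PyTorch).
# MoCLIM couples per-omics encoders with omics-inference modeling; this only
# illustrates the contrastive alignment step.
import torch
import torch.nn.functional as F

def info_nce(z_a, z_b, temperature=0.1):
    """z_a, z_b: (B, D) embeddings of the same B samples from two omics views.
    Row i of z_a is positive with row i of z_b, negative with all others."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))    # diagonal entries are positives
    return F.cross_entropy(logits, targets)

z_rna = torch.randn(32, 64)   # e.g., transcriptomics embedding (illustrative)
z_meth = torch.randn(32, 64)  # e.g., methylation embedding (illustrative)
print(info_nce(z_rna, z_meth))
```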

Development of a Knowledge Graph Embeddings Model for Pain

  • paper_url: http://arxiv.org/abs/2308.08904
  • repo_url: None
  • paper_authors: Jaya Chaturvedi, Tao Wang, Sumithra Velupillai, Robert Stewart, Angus Roberts
  • for: This paper aims to build a knowledge graph of pain, so that the pain experienced by patients can be better understood and reasoned about semantically and contextually in a computationally tractable form.
  • methods: Knowledge graph embedding techniques represent the concepts and relations of the graph in a low-dimensional vector space for downstream tasks such as classification and link prediction; relations from the external medical knowledge base SNOMED CT provide useful external knowledge.
  • results: Pain concepts extracted from the unstructured text of mental health electronic health records are combined with external knowledge from SNOMED CT to construct knowledge graph embedding models, which are evaluated on a subject-object link prediction task and compared against baseline models.
    Abstract Pain is a complex concept that can interconnect with other concepts such as a disorder that might cause pain, a medication that might relieve pain, and so on. To fully understand the context of pain experienced by either an individual or across a population, we may need to examine all concepts related to pain and the relationships between them. This is especially useful when modeling pain that has been recorded in electronic health records. Knowledge graphs represent concepts and their relations by an interlinked network, enabling semantic and context-based reasoning in a computationally tractable form. These graphs can, however, be too large for efficient computation. Knowledge graph embeddings help to resolve this by representing the graphs in a low-dimensional vector space. These embeddings can then be used in various downstream tasks such as classification and link prediction. The various relations associated with pain which are required to construct such a knowledge graph can be obtained from external medical knowledge bases such as SNOMED CT, a hierarchical systematic nomenclature of medical terms. A knowledge graph built in this way could be further enriched with real-world examples of pain and its relations extracted from electronic health records. This paper describes the construction of such knowledge graph embedding models of pain concepts, extracted from the unstructured text of mental health electronic health records, combined with external knowledge created from relations described in SNOMED CT, and their evaluation on a subject-object link prediction task. The performance of the models was compared with other baseline models.
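
Below is a minimal TransE-style sketch of subject-object link prediction. The paper evaluates knowledge graph embedding models generally; TransE here stands in as one common choice rather than the paper's specific model, and the entity/relation indices are hypothetical.

```python
# TransE-style scoring sketch for (subject, relation, object) triples.
# TransE is one common KG embedding model; the paper's models may differ.
import torch
import torch.nn as nn

class TransE(nn.Module):
    def __init__(self, n_entities, n_relations, dim=50):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def score(self, s, r, o):
        # Lower is better: ||e_s + e_r - e_o||_2
        return (self.ent(s) + self.rel(r) - self.ent(o)).norm(p=2, dim=-1)

    def rank_objects(self, s, r):
        """Score every entity as the object of (s, r) for link prediction."""
        q = self.ent(s) + self.rel(r)                    # (1, D)
        return (q - self.ent.weight).norm(p=2, dim=-1)   # (n_entities,)

model = TransE(n_entities=1000, n_relations=20)
s, r = torch.tensor([3]), torch.tensor([7])  # hypothetical (pain concept, relation) pair
scores = model.rank_objects(s, r)
print("top-5 candidate objects:", scores.topk(5, largest=False).indices.tolist())
```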

Optimal Resource Allocation for U-Shaped Parallel Split Learning

  • paper_url: http://arxiv.org/abs/2308.08896
  • repo_url: None
  • paper_authors: Song Lyu, Zheng Lin, Guanqiao Qu, Xianhao Chen, Xiaoxia Huang, Pan Li
  • for: This paper proposes a U-shaped parallel split learning method that protects the label privacy of data owners.
  • methods: Both the early layers and the last layers are placed on the user side to avoid leaking label privacy, and a resource allocation algorithm called LSCRA is proposed to optimize edge-network performance by finding the optimal computing resource allocation and split layers.
  • results: Experimental results show that LSCRA is effective and that U-shaped PSL achieves performance similar to other SL baselines while preserving label privacy.
    Abstract Split learning (SL) has emerged as a promising approach for model training without revealing the raw data samples from the data owners. However, traditional SL inevitably leaks label privacy as the tail model (with the last layers) should be placed on the server. To overcome this limitation, one promising solution is to utilize U-shaped architecture to leave both early layers and last layers on the user side. In this paper, we develop a novel parallel U-shaped split learning and devise the optimal resource optimization scheme to improve the performance of edge networks. In the proposed framework, multiple users communicate with an edge server for SL. We analyze the end-to-end delay of each client during the training process and design an efficient resource allocation algorithm, called LSCRA, which finds the optimal computing resource allocation and split layers. Our experimental results show the effectiveness of LSCRA and that U-shaped PSL can achieve a similar performance with other SL baselines while preserving label privacy. Index Terms: U-shaped network, split learning, label privacy, resource allocation, 5G/6G edge networks.
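
The U-shaped split itself is easy to illustrate: the client keeps the first and last layers (so neither raw inputs nor labels leave the device) while the server runs the middle. A minimal single-process sketch with hypothetical layer boundaries follows; real parallel split learning adds networking, multiple clients, and the LSCRA resource allocation.

```python
# U-shaped split sketch: client holds head + tail, server holds the middle,
# so neither raw inputs nor labels leave the client. Layer sizes are
# illustrative; real parallel split learning adds communication and LSCRA.
import torch
import torch.nn as nn

client_head = nn.Sequential(nn.Linear(32, 64), nn.ReLU())        # early layers
server_body = nn.Sequential(nn.Linear(64, 64), nn.ReLU(),
                            nn.Linear(64, 64), nn.ReLU())        # middle layers
client_tail = nn.Linear(64, 10)                                  # last layers

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))

a1 = client_head(x)       # client -> server: smashed activations only
a2 = server_body(a1)      # server -> client: activations only
logits = client_tail(a2)  # labels stay on the client
loss = nn.functional.cross_entropy(logits, y)
loss.backward()           # gradients flow back across both cut points
print(float(loss))
```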

Dual Gauss-Newton Directions for Deep Learning

  • paper_url: http://arxiv.org/abs/2308.08886
  • repo_url: None
  • paper_authors: Vincent Roulet, Mathieu Blondel
  • for: This paper studies how to exploit the structure of deep learning objectives, namely the composition of a convex loss function and a nonlinear network, to derive better direction oracles than stochastic gradients.
  • methods: The paper proposes computing such direction oracles via their dual formulation, yielding both computational benefits and new insights.
  • results: Experiments show that the resulting oracles define descent directions that can be used as a drop-in replacement for stochastic gradients in existing optimization algorithms, with favorable empirical performance.
    Abstract Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely, the composition of a convex loss function and of a nonlinear network, in order to derive better direction oracles than stochastic gradients, based on the idea of partial linearization. In a departure from previous works, we propose to compute such direction oracles via their dual formulation, leading to both computational benefits and new insights. We demonstrate that the resulting oracles define descent directions that can be used as a drop-in replacement for stochastic gradients, in existing optimization algorithms. We empirically study the advantage of using the dual formulation as well as the computational trade-offs involved in the computation of such oracles.

Feature Enforcing PINN (FE-PINN): A Framework to Learn the Underlying-Physics Features Before Target Task

  • paper_url: http://arxiv.org/abs/2308.08873
  • repo_url: None
  • paper_authors: Mahyar Jahaninasab, Mohamad Ali Bijarchi
  • for: This paper addresses solving various partial differential equations (PDEs) at low computational cost with fast training.
  • methods: It proposes a new data-free framework, the Feature Enforcing Physics-Informed Neural Network (FE-PINN), which learns the underlying pattern of a problem at low cost through a sequence of sub-tasks: first learning useful features from the physics, then training on the target task to refine the calculations.
  • results: FE-PINN achieves 15x, 2x, and 5x speedups on three benchmark problems, reaches loss values near 1e-5, and exhibits a smooth convergence process that allows higher learning rates.
    Abstract In this work, a new data-free framework called Feature Enforcing Physics-Informed Neural Network (FE-PINN) is introduced. This framework can learn the underlying pattern of any problem at low computational cost before the main training loop. The loss function of a vanilla PINN is imbalanced due to its two terms: the partial differential residual and the boundary-condition mean squared error. FE-PINN addresses this challenge with just one minute of training, instead of time-consuming loss-function hyperparameter tuning that can take hours. FE-PINN accomplishes this by performing a sequence of sub-tasks. The first sub-task learns useful features about the underlying physics. Then, the model trains on the target task to refine the calculations. FE-PINN is applied to three benchmarks: flow over a cylinder, 2D heat conduction, and an inverse problem of calculating inlet velocity, solving these cases with 15x, 2x, and 5x speedups, respectively. Another advantage of FE-PINN is that it systematically reaches lower loss values; in this study, a loss value near 1e-5 was reached, which is challenging for a vanilla PINN. FE-PINN also has a smooth convergence process, which allows higher learning rates than a vanilla PINN. This framework can be used as a fast, accurate tool for solving a wide range of partial differential equations (PDEs) across various fields.
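
For context, here is a vanilla PINN sketch for a 1D Poisson problem, showing the two loss terms (PDE residual and boundary MSE) whose imbalance FE-PINN targets. The problem, architecture, and step count are illustrative, and FE-PINN's feature-learning sub-task is not reproduced here.

```python
# Vanilla PINN sketch for u''(x) = f(x) on [0, 1] with u(0) = u(1) = 0,
# showing the two loss terms (PDE residual + boundary MSE) whose imbalance
# motivates FE-PINN. The FE-PINN pretraining sub-task is not reproduced.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
f = lambda x: -torch.pi**2 * torch.sin(torch.pi * x)  # exact u = sin(pi x)

for step in range(1000):
    x = torch.rand(64, 1, requires_grad=True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    pde_loss = ((d2u - f(x)) ** 2).mean()        # PDE residual term
    xb = torch.tensor([[0.0], [1.0]])
    bc_loss = (net(xb) ** 2).mean()              # boundary-condition term
    loss = pde_loss + bc_loss                    # the two terms to balance
    opt.zero_grad()
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```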

Towards Semi-supervised Learning with Non-random Missing Labels

  • paper_url: http://arxiv.org/abs/2308.08872
  • repo_url: https://github.com/njuyued/prg4ssl-mnar
  • paper_authors: Yue Duan, Zhen Zhao, Lei Qi, Luping Zhou, Lei Wang, Yinghuan Shi
  • for: To tackle the label-missing problem and make effective use of unlabeled data.
  • methods: A dynamic graph is built via a Markov random walk over a class tracking matrix; class-transition tracking provides class-level guidance that keeps the model unbiased against skewed class distributions.
  • results: PRG performs strongly across a variety of realistic MNAR scenarios, improving the quality of pseudo-labels and outperforming recent SSL methods combined with bias-removal solutions.
    Abstract Semi-supervised learning (SSL) tackles the label-missing problem by enabling the effective usage of unlabeled data. While existing SSL methods focus on the traditional setting, a practical and challenging scenario called label Missing Not At Random (MNAR) is usually ignored. In MNAR, the labeled and unlabeled data fall into different class distributions, resulting in biased label imputation that deteriorates the performance of SSL models. In this work, class-transition-tracking based Pseudo-Rectifying Guidance (PRG) is devised for MNAR. We explore the class-level guidance information obtained by a Markov random walk, which is modeled on a dynamically created graph built over the class tracking matrix. PRG unifies the historical information of the class distribution and the class transitions caused by the pseudo-rectifying procedure to keep the model unbiased when assigning pseudo-labels to all classes, so that the quality of pseudo-labels on both popular and rare classes in MNAR can be improved. Finally, we show the superior performance of PRG across a variety of MNAR scenarios, outperforming the latest SSL approaches combining bias-removal solutions by a large margin. Code and model weights are available at https://github.com/NJUyued/PRG4SSL-MNAR.

Model-Free Algorithm with Improved Sample Efficiency for Zero-Sum Markov Games

  • paper_url: http://arxiv.org/abs/2308.08858
  • repo_url: None
  • paper_authors: Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang
  • for: This paper studies the two-player zero-sum Markov game problem in multi-agent reinforcement learning (RL).
  • methods: It proposes a model-free stage-based Q-learning algorithm and proves that it achieves the optimal sample complexity $O(H^3SAB/\epsilon^2)$ in its dependence on the horizon $H$ and the number of states $S$.
  • results: By introducing a variance reduction technique based on reference-advantage decomposition, the model-free algorithm matches the optimal sample complexity of model-based algorithms for the first time, improving sample efficiency in Markov games.
    Abstract The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $\epsilon$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/\epsilon^2)$, which is optimal in the dependence of the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. The main improvement of the dependency on $H$ arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency.

Bag of Tricks for Long-Tailed Multi-Label Classification on Chest X-Rays

  • paper_url: http://arxiv.org/abs/2308.08853
  • repo_url: None
  • paper_authors: Feng Hong, Tianjie Dai, Jiangchao Yao, Ya Zhang, Yanfeng Wang
  • for: To improve the value of chest X-ray (CXR) diagnosis in the face of its long-tailed and multi-label nature.
  • methods: Several advanced designs are used, including data augmentation, feature extractors, classifier design, loss function reweighting, and exogenous data replenishment.
  • results: The solution achieves 0.349 mAP on the ICCV CVAMD 2023 CXR-LT competition test set, ranking in the top five.
    Abstract Clinical classification of chest radiography is particularly challenging for standard machine learning algorithms due to its inherent long-tailed and multi-label nature. However, few attempts take into account the coupled challenges posed by both class imbalance and label co-occurrence, which hinders their value in boosting the diagnosis on chest X-rays (CXRs) in real-world scenarios. Besides, with the prevalence of pretraining techniques, how to incorporate these new paradigms into the current framework lacks systematic study. This technical report presents a brief description of our solution in the ICCV CVAMD 2023 CXR-LT Competition. We empirically explored the effectiveness of several advanced designs for CXR diagnosis, covering data augmentation, feature extractors, classifier design, loss function reweighting, exogenous data replenishment, etc. In addition, we improved performance through simple test-time data augmentation and ensembling. Our framework finally achieves 0.349 mAP on the competition test set, ranking in the top five.
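
One of the listed tricks, loss reweighting, can be sketched for the multi-label CXR setting as binary cross-entropy with per-class positive weights, so that rare findings contribute more to the loss. This is one plausible instance of the idea; the report's exact reweighting scheme may differ, and the label data below is synthetic.

```python
# Class-balanced multi-label BCE sketch: rare findings get larger positive
# weights. One plausible instance of "loss reweighting"; the report's exact
# scheme may differ.
import torch
import torch.nn as nn

def make_pos_weights(label_matrix):
    """label_matrix: (N, C) 0/1 labels over C findings."""
    pos = label_matrix.sum(dim=0).clamp(min=1.0)
    neg = label_matrix.size(0) - pos
    return neg / pos  # rare classes -> large positive weight

labels = (torch.rand(500, 14) < 0.05).float()  # toy long-tailed labels
criterion = nn.BCEWithLogitsLoss(pos_weight=make_pos_weights(labels))
logits = torch.randn(16, 14)
print(criterion(logits, labels[:16]))
```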

Learning the hub graphical Lasso model with the structured sparsity via an efficient algorithm

  • paper_url: http://arxiv.org/abs/2308.08852
  • repo_url: None
  • paper_authors: Chengjing Wang, Peipei Tang, Wenling He, Meixia Lin
  • for: This paper proposes an efficient algorithm for estimating hub graphical models, addressing the computational difficulty of fitting them on high-dimensional data.
  • methods: The algorithm has two phases: a dual alternating direction method of multipliers (ADMM) generates a good initial point, and a semismooth Newton based augmented Lagrangian method then refines it.
  • results: Experiments show that the algorithm computes graphical models efficiently; in some high-dimensional tasks, it saves more than 70% of the execution time while still achieving a high-quality estimation.
    Abstract Graphical models have exhibited their performance in numerous tasks ranging from biological analysis to recommender systems. However, graphical models with hub nodes are computationally difficult to fit, particularly when the dimension of the data is large. To efficiently estimate hub graphical models, we introduce a two-phase algorithm. The proposed algorithm first generates a good initial point via a dual alternating direction method of multipliers (ADMM), and then warm-starts a semismooth Newton (SSN) based augmented Lagrangian method (ALM) to compute a solution that is accurate enough for practical tasks. The sparsity structure of the generalized Jacobian ensures that the algorithm can obtain a good solution very efficiently. Comprehensive experiments on both synthetic and real data show that it clearly outperforms existing state-of-the-art algorithms. In particular, in some high-dimensional tasks, it can save more than 70\% of the execution time while still achieving a high-quality estimation.

Machine Learning-Assisted Discovery of Novel Reactor Designs via CFD-Coupled Multi-fidelity Bayesian Optimisation

  • paper_url: http://arxiv.org/abs/2308.08841
  • repo_url: None
  • paper_authors: Tom Savage, Nausheen Basha, Jonathan McDonough, Omar K Matar, Ehecatl Antonio del Rio Chanona
  • For: The paper aims to establish a framework for the next generation of reactors by leveraging additive manufacturing and intelligent design to improve the performance and sustainability of future chemical processes.
  • Methods: The authors propose two novel coiled-tube parameterisations that enable variation of the cross-section and coil path, and optimise the designs with multi-fidelity Bayesian optimisation coupled with parameterised meshing and simulation to reduce computational costs.
  • Results: The authors identify key characteristics of optimal reactor designs and experimentally validate two novel geometries that they 3D print, demonstrating the potential of the proposed framework.
    Abstract Additive manufacturing has enabled the production of more advanced reactor geometries, resulting in the potential for significantly larger and more complex design spaces. Identifying and optimising promising configurations within broader design spaces presents a significant challenge for existing human-centric design approaches. As such, existing parameterisations of coiled-tube reactor geometries are low-dimensional with expensive optimisation limiting more complex solutions. Given algorithmic improvements and the onset of additive manufacturing, we propose two novel coiled-tube parameterisations enabling the variation of cross-section and coil path, resulting in a series of high dimensional, complex optimisation problems. To ensure tractable, non-local optimisation where gradients are not available, we apply multi-fidelity Bayesian optimisation. Our approach characterises multiple continuous fidelities and is coupled with parameterised meshing and simulation, enabling lower quality, but faster simulations to be exploited throughout optimisation. Through maximising the plug-flow performance, we identify key characteristics of optimal reactor designs, and extrapolate these to produce two novel geometries that we 3D print and experimentally validate. By demonstrating the design, optimisation, and manufacture of highly parameterised reactors, we seek to establish a framework for the next-generation of reactors, demonstrating that intelligent design coupled with new manufacturing processes can significantly improve the performance and sustainability of future chemical processes.

ICoNIK: Generating Respiratory-Resolved Abdominal MR Reconstructions Using Neural Implicit Representations in k-Space

  • paper_url: http://arxiv.org/abs/2308.08830
  • repo_url: None
  • paper_authors: Veronika Spieker, Wenqi Huang, Hannah Eichhorn, Jonathan Stelter, Kilian Weiss, Veronika A. Zimmer, Rickmer F. Braren, Dimitrios C. Karampinos, Kerstin Hammernik, Julia A. Schnabel
  • for: This paper proposes a neural-implicit approach to the motion-resolution problem in abdominal magnetic resonance imaging (MRI).
  • methods: A network learns a neural implicit representation directly in k-space (NIK), generating continuous signal values from measured sampling points and a data-derived respiratory navigator signal; an informed correction layer (ICo) leverages information from neighboring regions to regularize sparsely sampled regions.
  • results: Experiments show that NIK and ICoNIK outperform standard motion-resolved reconstruction techniques, offering a promising solution to motion artefacts in abdominal MRI.
    Abstract Motion-resolved reconstruction for abdominal magnetic resonance imaging (MRI) remains a challenge due to the trade-off between residual motion blurring caused by discretized motion states and undersampling artefacts. In this work, we propose to generate blurring-free motion-resolved abdominal reconstructions by learning a neural implicit representation directly in k-space (NIK). Using measured sampling points and a data-derived respiratory navigator signal, we train a network to generate continuous signal values. To aid the regularization of sparsely sampled regions, we introduce an additional informed correction layer (ICo), which leverages information from neighboring regions to correct NIK's prediction. Our proposed generative reconstruction methods, NIK and ICoNIK, outperform standard motion-resolved reconstruction techniques and provide a promising solution to address motion artefacts in abdominal MRI.

Controlling Federated Learning for Covertness

  • paper_url: http://arxiv.org/abs/2308.08825
  • repo_url: None
  • paper_authors: Adit Jain, Vikram Krishnamurthy
  • for: This work aims to minimize a function $f$ while hiding $\arg\min f$ from a malicious eavesdropper that observes the learner's queries.
  • methods: Controlling the stochastic gradient algorithm for covert optimization is modeled as a Markov decision process, and a computationally efficient policy gradient algorithm is proposed.
  • results: Numerical results show that when the learner uses the optimal policy, an eavesdropper can only achieve a validation accuracy of 52% with no information and 69% with a public dataset containing 10% positive samples, compared to 83% when the learner employs a greedy policy.
    Abstract A learner aims to minimize a function $f$ by repeatedly querying a distributed oracle that provides noisy gradient evaluations. At the same time, the learner seeks to hide $\arg\min f$ from a malicious eavesdropper that observes the learner's queries. This paper considers the problem of \textit{covert} or \textit{learner-private} optimization, where the learner has to dynamically choose between learning and obfuscation by exploiting the stochasticity. The problem of controlling the stochastic gradient algorithm for covert optimization is modeled as a Markov decision process, and we show that the dynamic programming operator has a supermodular structure implying that the optimal policy has a monotone threshold structure. A computationally efficient policy gradient algorithm is proposed to search for the optimal querying policy without knowledge of the transition probabilities. As a practical application, our methods are demonstrated on a hate speech classification task in a federated setting where an eavesdropper can use the optimal weights to generate toxic content, which is more easily misclassified. Numerical results show that when the learner uses the optimal policy, an eavesdropper can only achieve a validation accuracy of $52\%$ with no information and $69\%$ when it has a public dataset with 10\% positive samples compared to $83\%$ when the learner employs a greedy policy.

Mitigating Semantic Confusion from Hostile Neighborhood for Graph Active Learning

  • paper_url: http://arxiv.org/abs/2308.08823
  • repo_url: None
  • paper_authors: Tianmeng Yang, Min Zhou, Yujing Wang, Zhengjie Lin, Lujia Pan, Bin Cui, Yunhai Tong
  • for: To improve the performance of graph neural networks (GNNs) by selecting the most informative nodes in a graph for annotation.
  • methods: The Semantic-aware Active learning framework for Graphs (SAG) evaluates node influence via pairwise similarities and dissimilarities of nodes with semantic features, and designs a new prototype-based criterion and query policy to maintain diversity and class balance among selected nodes.
  • results: Extensive experiments on public benchmark graphs and a real-world financial dataset show that SAG significantly improves node classification performance and consistently outperforms previous methods.
    Abstract Graph Active Learning (GAL), which aims to find the most informative nodes in graphs for annotation to maximize Graph Neural Network (GNN) performance, has attracted many research efforts but still poses non-trivial challenges. One major challenge is that existing GAL strategies may introduce semantic confusion into the selected training set, particularly when graphs are noisy. Specifically, most existing methods assume all aggregating features to be helpful, ignoring the semantically negative effect of inter-class edges under the message-passing mechanism. In this work, we present the Semantic-aware Active learning framework for Graphs (SAG) to mitigate the semantic confusion problem. Pairwise similarities and dissimilarities of nodes with semantic features are introduced to jointly evaluate node influence. A new prototype-based criterion and query policy are also designed to maintain the diversity and class balance of the selected nodes, respectively. Extensive experiments on public benchmark graphs and a real-world financial dataset demonstrate that SAG significantly improves node classification performance and consistently outperforms previous methods. Moreover, comprehensive analysis and an ablation study also verify the effectiveness of the proposed framework.

A Fusion of Variational Distribution Priors and Saliency Map Replay for Continual 3D Reconstruction

  • paper_url: http://arxiv.org/abs/2308.08812
  • repo_url: None
  • paper_authors: Sanchar Palit, Sandika Biswas
  • for: This research aims to predict 3D object shapes from single-view images.
  • methods: The model is designed with Variational Priors so that previously seen classes can still be reconstructed reasonably even after training on new classes; saliency-map-based experience replay preserves object attributes with less memory usage.
  • results: Experiments show competitive results compared to established methods, both quantitatively and qualitatively.
    Abstract Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images. This task requires significant data acquisition to predict both visible and occluded portions of the shape. Furthermore, learning-based methods face the difficulty of creating a comprehensive training dataset for all possible classes. To this end, we propose a continual learning-based 3D reconstruction method where our goal is to design a model using Variational Priors that can still reconstruct the previously seen classes reasonably even after training on new classes. Variational Priors represent abstract shapes and combat forgetting, whereas saliency maps preserve object attributes with less memory usage. This is vital due to resource constraints in storing extensive training data. Additionally, we introduce saliency map-based experience replay to capture global and distinct object features. Thorough experiments show competitive results compared to established methods, both quantitatively and qualitatively.

Label Shift Adapter for Test-Time Adaptation under Covariate and Label Shifts

  • paper_url: http://arxiv.org/abs/2308.08810
  • repo_url: None
  • paper_authors: Sunghyun Park, Seunghan Yang, Jaegul Choo, Sungrack Yun
  • for: This paper proposes a method that adapts to label distribution shifts in the target domain at test time.
  • methods: A label shift adapter is added to existing test-time adaptation (TTA) methods: the target label distribution is estimated and fed to the adapter, which produces optimal parameters for that distribution.
  • results: Integrating the label shift adapter with TTA methods yields effective test-time adaptation and improves model performance under the joint presence of label and covariate shifts.
    Abstract Test-time adaptation (TTA) aims to adapt a pre-trained model to the target domain in a batch-by-batch manner during inference. While label distributions often exhibit imbalances in real-world scenarios, most previous TTA approaches typically assume that both source and target domain datasets have balanced label distribution. Due to the fact that certain classes appear more frequently in certain domains (e.g., buildings in cities, trees in forests), it is natural that the label distribution shifts as the domain changes. However, we discover that the majority of existing TTA methods fail to address the coexistence of covariate and label shifts. To tackle this challenge, we propose a novel label shift adapter that can be incorporated into existing TTA approaches to deal with label shifts during the TTA process effectively. Specifically, we estimate the label distribution of the target domain to feed it into the label shift adapter. Subsequently, the label shift adapter produces optimal parameters for the target label distribution. By predicting only the parameters for a part of the pre-trained source model, our approach is computationally efficient and can be easily applied, regardless of the model architectures. Through extensive experiments, we demonstrate that integrating our strategy with TTA approaches leads to substantial performance improvements under the joint presence of label and covariate shifts.
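
The core correction can be sketched with the classical label-shift logit adjustment: shift the logits by the log-ratio of the estimated target prior to the source prior. The paper's adapter learns parameters rather than applying this closed form, so treat the following as a simplified stand-in with made-up priors.

```python
# Classical label-shift logit adjustment as a simplified stand-in for the
# learned label shift adapter: logits += log(p_target / p_source).
import torch

def adjust_logits(logits, source_prior, target_prior, eps=1e-8):
    """logits: (B, C); priors: (C,) class distributions summing to 1."""
    shift = torch.log(target_prior + eps) - torch.log(source_prior + eps)
    return logits + shift  # broadcast over the batch

source_prior = torch.tensor([0.25, 0.25, 0.25, 0.25])  # balanced source
target_prior = torch.tensor([0.70, 0.10, 0.10, 0.10])  # estimated at test time
logits = torch.randn(5, 4)
probs = torch.softmax(adjust_logits(logits, source_prior, target_prior), dim=-1)
print(probs.argmax(dim=-1))
```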

  • paper_url: http://arxiv.org/abs/2308.08799
  • repo_url: https://github.com/jingxiaoyi/pare
  • paper_authors: Jiazheng Jing, Yinan Zhang, Xin Zhou, Zhiqi Shen
  • for: This research provides a non-personalized recommendation method that adapts to users' choice habits and to shifts in item popularity.
  • methods: The method consists of four modules covering popularity history, temporal impact, periodic impact, and side information, all focused on predicting item popularity; an attention layer then fuses the outputs of the four modules.
  • results: Experiments show that PARE performs on par with or better than state-of-the-art recommendation methods, and it can be easily combined with existing methods to improve overall performance.
    Abstract Recommender systems have been gaining increasing research attention over the years. Most existing recommendation methods focus on capturing users' personalized preferences through historical user-item interactions, which may potentially violate user privacy. Additionally, these approaches often overlook the significance of the temporal fluctuation in item popularity that can sway users' decision-making. To bridge this gap, we propose Popularity-Aware Recommender (PARE), which makes non-personalized recommendations by predicting the items that will attain the highest popularity. PARE consists of four modules, each focusing on a different aspect: popularity history, temporal impact, periodic impact, and side information. Finally, an attention layer is leveraged to fuse the outputs of four modules. To our knowledge, this is the first work to explicitly model item popularity in recommendation systems. Extensive experiments show that PARE performs on par or even better than sophisticated state-of-the-art recommendation methods. Since PARE prioritizes item popularity over personalized user preferences, it can enhance existing recommendation methods as a complementary component. Our experiments demonstrate that integrating PARE with existing recommendation methods significantly surpasses the performance of standalone models, highlighting PARE's potential as a complement to existing recommendation methods. Furthermore, the simplicity of PARE makes it immensely practical for industrial applications and a valuable baseline for future research.
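
A toy version of the popularity-history idea: count interactions per item in recent time windows and score items by a recency-weighted sum. PARE's learned temporal/periodic modules and attention fusion replace this hand-set weighting; the log data and decay factor below are invented for illustration.

```python
# Toy popularity-history module: recency-weighted interaction counts per item.
# PARE replaces this hand-set weighting with learned temporal/periodic modules
# fused by attention; this only illustrates the non-personalized idea.
from collections import Counter

def popularity_scores(interactions, now, window=7, decay=0.8):
    """interactions: list of (item_id, day); recent days weigh more."""
    scores = Counter()
    for item, day in interactions:
        age = now - day
        if 0 <= age < window:
            scores[item] += decay ** age
    return scores

log = [("A", 9), ("A", 9), ("B", 9), ("B", 4), ("B", 4), ("B", 4), ("C", 8)]
top = popularity_scores(log, now=10).most_common(3)
print(top)  # recommend the predicted-most-popular items to every user
```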

Joint Local Relational Augmentation and Global Nash Equilibrium for Federated Learning with Non-IID Data

  • paper_url: http://arxiv.org/abs/2308.11646
  • repo_url: None
  • paper_authors: Xinting Liao, Chaochao Chen, Weiming Liu, Pengyang Zhou, Huabin Zhu, Shuheng Shen, Weiqiang Wang, Mengling Hu, Yanchao Tan, Xiaolin Zheng
  • for: To improve the effectiveness of federated learning (FL) in real-world applications by addressing problems arising from non-independent and identically distributed (non-IID) data.
  • methods: FedRANE consists of two main modules, local relational augmentation (LRA) and global Nash equilibrium (GNE), which resolve intra- and inter-client inconsistency simultaneously.
  • results: Extensive experiments on four benchmark datasets demonstrate that FedRANE improves the performance of FL with non-IID data.
    Abstract Federated learning (FL) is a distributed machine learning paradigm that requires collaboration between a server and a series of clients with decentralized data. To make FL effective in real-world applications, existing work has been devoted to improving the modeling of decentralized data with non-independent and identical distributions (non-IID). In non-IID settings, there is intra-client inconsistency arising from imbalanced data modeling, and inter-client inconsistency among heterogeneous client distributions, which not only hinders sufficient representation of the minority data, but also brings discrepant model deviations. However, previous work has overlooked tackling these two coupled inconsistencies together. In this work, we propose FedRANE, which consists of two main modules, i.e., local relational augmentation (LRA) and global Nash equilibrium (GNE), to resolve intra- and inter-client inconsistency simultaneously. Specifically, in each client, LRA mines the similarity relations among different data samples and enhances the minority sample representations with their neighbors using attentive message passing. In the server, GNE reaches an agreement among inconsistent and discrepant model deviations from clients to server, which encourages the global model to update in the direction of the global optimum without breaking down the clients' optimization toward their local optima. We conduct extensive experiments on four benchmark datasets to show the superiority of FedRANE in enhancing the performance of FL with non-IID data.

Bayesian polynomial neural networks and polynomial neural ordinary differential equations

  • paper_url: http://arxiv.org/abs/2308.10892
  • repo_url: None
  • paper_authors: Colby Fronk, Jaewoong Yun, Prashant Singh, Linda Petzold
  • for: These methods can be applied to equation recovery for many science and engineering problems.
  • methods: Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) is used for equation recovery, but these approaches only provide point estimates; Bayesian inference methods (the Laplace approximation, Markov Chain Monte Carlo (MCMC) sampling, and variational inference) are developed and validated to accommodate noisy data.
  • results: The Laplace approximation is found to be the best method for this class of problems.
    Abstract Symbolic regression with polynomial neural networks and polynomial neural ordinary differential equations (ODEs) are two recent and powerful approaches for equation recovery of many science and engineering problems. However, these methods provide point estimates for the model parameters and are currently unable to accommodate noisy data. We address this challenge by developing and validating the following Bayesian inference methods: the Laplace approximation, Markov Chain Monte Carlo (MCMC) sampling methods, and variational inference. We have found the Laplace approximation to be the best method for this class of problems. Our work can be easily extended to the broader class of symbolic neural networks to which the polynomial neural network belongs.
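
The Laplace approximation the authors favor is easy to sketch for a small parameter vector: fit a MAP estimate, then approximate the posterior as a Gaussian with covariance equal to the inverse Hessian of the negative log posterior at the MAP. This generic recipe, shown on a tiny polynomial regression, is not the paper's implementation; the data, prior, and noise scale are invented.

```python
# Laplace approximation sketch: posterior ~ N(theta_MAP, H^{-1}), where H is
# the Hessian of the negative log posterior at the MAP. Generic recipe on a
# tiny polynomial regression, not the paper's implementation.
import torch

x = torch.linspace(-1, 1, 40)
y = 1.0 + 2.0 * x + 0.5 * x**2 + 0.1 * torch.randn(40)

def neg_log_posterior(theta):
    pred = theta[0] + theta[1] * x + theta[2] * x**2
    nll = 0.5 * ((y - pred) ** 2 / 0.1**2).sum()  # Gaussian likelihood
    prior = 0.5 * (theta ** 2).sum()              # N(0, I) prior
    return nll + prior

theta = torch.zeros(3, requires_grad=True)
opt = torch.optim.LBFGS([theta], max_iter=100)

def closure():
    opt.zero_grad()
    loss = neg_log_posterior(theta)
    loss.backward()
    return loss

opt.step(closure)  # theta is now (approximately) the MAP estimate

H = torch.autograd.functional.hessian(neg_log_posterior, theta.detach())
cov = torch.linalg.inv(H)  # Gaussian posterior covariance
print("MAP:", theta.detach().numpy())
print("posterior std:", cov.diagonal().sqrt().numpy())
```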

Tipping Point Forecasting in Non-Stationary Dynamics on Function Spaces

  • paper_url: http://arxiv.org/abs/2308.08794
  • repo_url: None
  • paper_authors: Miguel Liu-Schiaffini, Clare E. Singer, Nikola Kovachki, Tapio Schneider, Kamyar Azizzadenesheli, Anima Anandkumar
  • for: This work aims to forecast abrupt changes (tipping points) in non-stationary dynamical systems, such as drastic decreases in cloud cover under climate change.
  • methods: A novel recurrent neural operator (RNO) learns mappings between function spaces; an uncertainty-based conformal prediction framework detects future tipping points by monitoring deviations from physics constraints.
  • results: Experiments show that even partial or approximate physics constraints suffice to accurately forecast future tipping points, together with a rigorous measure of uncertainty.
    Abstract Tipping points are abrupt, drastic, and often irreversible changes in the evolution of non-stationary and chaotic dynamical systems. For instance, increased greenhouse gas concentrations are predicted to lead to drastic decreases in low cloud cover, referred to as a climatological tipping point. In this paper, we learn the evolution of such non-stationary dynamical systems using a novel recurrent neural operator (RNO), which learns mappings between function spaces. After training RNO on only the pre-tipping dynamics, we employ it to detect future tipping points using an uncertainty-based approach. In particular, we propose a conformal prediction framework to forecast tipping points by monitoring deviations from physics constraints (such as conserved quantities and partial differential equations), enabling forecasting of these abrupt changes along with a rigorous measure of uncertainty. We illustrate our proposed methodology on non-stationary ordinary and partial differential equations, such as the Lorenz-63 and Kuramoto-Sivashinsky equations. We also apply our methods to forecast a climate tipping point in stratocumulus cloud cover. In our experiments, we demonstrate that even partial or approximate physics constraints can be used to accurately forecast future tipping points.
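
The conformal layer can be sketched with split conformal prediction: calibrate the quantile of a physics-constraint deviation (e.g., violation of a conserved quantity) on pre-tipping data, then flag a tipping point when new deviations exceed that bound. The constraint, data, and tipping time below are synthetic placeholders, not the paper's systems.

```python
# Split-conformal sketch for tipping detection: calibrate the quantile of
# physics-constraint deviations on pre-tipping data, then flag times whose
# deviation exceeds the conformal bound. Constraint and data are synthetic.
import numpy as np

rng = np.random.default_rng(0)

def constraint_deviation(t):
    """|violation| of a conserved quantity; drifts upward after a synthetic
    tipping point at t = 100 in this toy example."""
    base = np.abs(rng.normal(scale=0.05))
    return base + max(0.0, 0.01 * (t - 100))

calib = np.array([constraint_deviation(t) for t in range(100)])  # pre-tipping
alpha = 0.05
n = len(calib)
q = np.quantile(calib, np.ceil((n + 1) * (1 - alpha)) / n)  # conformal quantile

for t in range(100, 140):
    if constraint_deviation(t) > q:
        print(f"tipping flagged at t = {t} (deviation above {q:.3f} bound)")
        break
```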

Federated Reinforcement Learning for Electric Vehicles Charging Control on Distribution Networks

  • paper_url: http://arxiv.org/abs/2308.08792
  • repo_url: None
  • paper_authors: Junkai Qian, Yuning Jiang, Xin Liu, Qing Wang, Ting Wang, Yuanming Shi, Wei Chen
  • for: This paper develops a novel approach for EV charging control that considers the natural power flow of EV charging/discharging in the distribution network and driver privacy.
  • methods: The approach combines multi-EV charging/discharging with a radial distribution network (RDN) operating under optimal power flow (OPF) to distribute power flow in real time; a federated deep reinforcement learning algorithm named FedSAC is used to learn the optimal EV charging control strategy.
  • results: The approach effectively balances V2G profits, RDN load, and driver anxiety; comprehensive simulations demonstrate its superiority in terms of the diversity of the charging control strategy, power fluctuations on the RDN, convergence efficiency, and generalization ability.
    Abstract With the growing popularity of electric vehicles (EVs), maintaining power grid stability has become a significant challenge. To address this issue, EV charging control strategies have been developed to manage the switch between vehicle-to-grid (V2G) and grid-to-vehicle (G2V) modes for EVs. In this context, multi-agent deep reinforcement learning (MADRL) has proven its effectiveness in EV charging control. However, existing MADRL-based approaches fail to consider the natural power flow of EV charging/discharging in the distribution network and ignore driver privacy. To deal with these problems, this paper proposes a novel approach that combines multi-EV charging/discharging with a radial distribution network (RDN) operating under optimal power flow (OPF) to distribute power flow in real time. A mathematical model is developed to describe the RDN load. The EV charging control problem is formulated as a Markov Decision Process (MDP) to find an optimal charging control strategy that balances V2G profits, RDN load, and driver anxiety. To effectively learn the optimal EV charging control strategy, a federated deep reinforcement learning algorithm named FedSAC is further proposed. Comprehensive simulation results demonstrate the effectiveness and superiority of our proposed algorithm in terms of the diversity of the charging control strategy, the power fluctuations on RDN, the convergence efficiency, and the generalization ability.

APPFLx: Providing Privacy-Preserving Cross-Silo Federated Learning as a Service

  • paper_url: http://arxiv.org/abs/2308.08786
  • repo_url: None
  • paper_authors: Zilinghan Li, Shilan He, Pranshu Chaturvedi, Trung-Hieu Hoang, Minseok Ryu, E. A. Huerta, Volodymyr Kindratenko, Jordan Fuhrman, Maryellen Giger, Ryan Chard, Kibaek Kim, Ravi Madduri
  • For: This paper aims to provide a ready-to-use platform for cross-silo privacy-preserving federated learning (PPFL) as a service, making it easier and faster for domain experts and machine learning practitioners to collaboratively train robust and generalized machine learning models without sharing sensitive local data.
  • Methods: The paper introduces APPFLx, a platform that employs Globus authentication, implements several synchronous and asynchronous federated learning algorithms, streamlines the experiment launch process, and enables tracking and visualizing the life cycle of federated learning experiments.
  • Results: APPFLx delivers cross-silo PPFL as a service, letting collaborators orchestrate and evaluate federated learning experiments under one platform.
    Abstract Cross-silo privacy-preserving federated learning (PPFL) is a powerful tool to collaboratively train robust and generalized machine learning (ML) models without sharing sensitive (e.g., healthcare of financial) local data. To ease and accelerate the adoption of PPFL, we introduce APPFLx, a ready-to-use platform that provides privacy-preserving cross-silo federated learning as a service. APPFLx employs Globus authentication to allow users to easily and securely invite trustworthy collaborators for PPFL, implements several synchronous and asynchronous FL algorithms, streamlines the FL experiment launch process, and enables tracking and visualizing the life cycle of FL experiments, allowing domain experts and ML practitioners to easily orchestrate and evaluate cross-silo FL under one platform. APPFLx is available online at https://appflx.link
    摘要 跨孤岛隐私保护联邦学习(PPFL)是一种强大的工具,可在不共享敏感(如医疗或金融)本地数据的情况下,协同训练稳健且泛化的机器学习(ML)模型。为方便并加速PPFL的采用,我们推出了APPFLx,一个开箱即用的平台,以服务形式提供隐私保护的跨孤岛联邦学习。APPFLx 使用 Globus 身份验证,使用户能够轻松、安全地邀请可信的合作者参与PPFL;它实现了多种同步与异步联邦学习算法,简化了联邦学习实验的启动流程,并支持跟踪与可视化联邦学习实验的全生命周期,使领域专家和机器学习从业者能够在同一平台上便捷地组织和评估跨孤岛联邦学习。APPFLx 可在 https://appflx.link 在线访问。

Knowledge-inspired Subdomain Adaptation for Cross-Domain Knowledge Transfer

  • paper_url: http://arxiv.org/abs/2308.09724
  • repo_url: None
  • paper_authors: Liyue Chen, Linian Wang, Jinyu Xu, Shuai Chen, Weiqiang Wang, Wenbiao Zhao, Qiyu Li, Leye Wang
  • for: 这篇论文提出了一个新的知识启发子领域适应(KISA)框架,以在跨领域知识迁移中实现细粒度的领域适应。
  • methods: 这篇论文提出了知识启发的子领域划分问题,并设计了一个知识融合网络来利用多样的领域知识。
  • results: 实验结果显示,KISA在欺诈检测和交通需求预测任务中取得了优异的成绩。
    Abstract Most state-of-the-art deep domain adaptation techniques align source and target samples in a global fashion. That is, after alignment, each source sample is expected to become similar to any target sample. However, global alignment may not always be optimal or necessary in practice. For example, consider cross-domain fraud detection, where there are two types of transactions: credit and non-credit. Aligning credit and non-credit transactions separately may yield better performance than global alignment, as credit transactions are unlikely to exhibit patterns similar to non-credit transactions. To enable such fine-grained domain adaption, we propose a novel Knowledge-Inspired Subdomain Adaptation (KISA) framework. In particular, (1) We provide the theoretical insight that KISA minimizes the shared expected loss which is the premise for the success of domain adaptation methods. (2) We propose the knowledge-inspired subdomain division problem that plays a crucial role in fine-grained domain adaption. (3) We design a knowledge fusion network to exploit diverse domain knowledge. Extensive experiments demonstrate that KISA achieves remarkable results on fraud detection and traffic demand prediction tasks.
    摘要 现代深度领域适应技术通常以全局方式对齐源域和目标域样本。也就是说,对齐之后,每个源域样本都应与任意目标域样本相似。但在实践中,全局对齐未必总是最优或必要的。例如,在跨领域欺诈检测中存在两类交易:信用交易与非信用交易。将这两类交易分别对齐可能比全局对齐带来更好的性能,因为信用交易不太可能表现出与非信用交易相似的模式。为实现这种细粒度的领域适应,我们提出了一个新的知识启发子领域适应(KISA)框架。具体而言,我们给出了KISA最小化共享期望损失的理论分析(这是领域适应方法成功的前提),提出了在细粒度领域适应中起关键作用的知识启发子领域划分问题,并设计了知识融合网络以利用多样的领域知识。大量实验证明了KISA在欺诈检测与交通需求预测任务上取得了显著的效果。
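
The core idea, aligning source and target features within each knowledge-defined subdomain rather than globally, can be sketched as follows. The linear mean-matching distance and the synthetic subdomain labels are illustrative assumptions; KISA's knowledge fusion network is not reproduced here.

```python
import numpy as np

# Sketch of fine-grained (subdomain-wise) alignment: instead of one global
# alignment term, source/target features are aligned separately within each
# knowledge-defined subdomain (e.g., credit vs. non-credit transactions).

def linear_mmd(xs, xt):
    """Squared distance between source and target feature means."""
    return float(np.sum((xs.mean(axis=0) - xt.mean(axis=0)) ** 2))

def subdomain_alignment_loss(src_feats, src_sub, tgt_feats, tgt_sub):
    subdomains = set(src_sub) & set(tgt_sub)
    losses = [linear_mmd(src_feats[src_sub == d], tgt_feats[tgt_sub == d])
              for d in subdomains]
    return sum(losses) / max(len(losses), 1)

rng = np.random.default_rng(0)
src = rng.normal(size=(100, 8)); tgt = rng.normal(0.5, size=(100, 8))
sub_s = rng.integers(0, 2, 100); sub_t = rng.integers(0, 2, 100)
print(subdomain_alignment_loss(src, sub_s, tgt, sub_t))
```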

Environment Diversification with Multi-head Neural Network for Invariant Learning

  • paper_url: http://arxiv.org/abs/2308.08778
  • repo_url: https://github.com/joe0123/EDNIL
  • paper_authors: Bo-Wei Huang, Keng-Te Liao, Chang-Sheng Kao, Shou-De Lin
  • for: 这篇论文旨在提出一个对分布偏移不敏感的不变学习框架,以提升神经网络在分布变化下的预测稳健性。
  • methods: 该框架使用多头神经网络来吸收数据偏差,且无需关于环境的先验知识或对预训练模型的强假设。
  • results: 实验结果显示,使用该框架训练的模型在分布偏移下更为稳健。
    Abstract Neural networks are often trained with empirical risk minimization; however, it has been shown that a shift between training and testing distributions can cause unpredictable performance degradation. On this issue, a research direction, invariant learning, has been proposed to extract invariant features insensitive to the distributional changes. This work proposes EDNIL, an invariant learning framework containing a multi-head neural network to absorb data biases. We show that this framework does not require prior knowledge about environments or strong assumptions about the pre-trained model. We also reveal that the proposed algorithm has theoretical connections to recent studies discussing properties of variant and invariant features. Finally, we demonstrate that models trained with EDNIL are empirically more robust against distributional shifts.
    摘要 神经网络通常以经验风险最小化进行训练,但已有研究表明,训练分布与测试分布之间的偏移可能导致不可预测的性能下降。针对这一问题,研究者提出了不变学习(Invariant Learning)这一方向,旨在提取对分布变化不敏感的不变特征。本文提出了EDNIL,一种包含多头神经网络以吸收数据偏差的不变学习框架。我们表明该框架既不需要关于环境的先验知识,也不需要对预训练模型做强假设。此外,我们还揭示了该算法与近期讨论可变特征与不变特征性质的研究之间的理论联系。最后,我们通过实验证明,使用EDNIL训练的模型在分布偏移下更为稳健。

Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models

  • paper_url: http://arxiv.org/abs/2308.08774
  • repo_url: None
  • paper_authors: Phillip Rust, Anders Søgaard
  • for: 这篇论文旨在研究 mBERT、XLM-R 和 BLOOM 等多语言模型在实现多语言泛化或压缩、以迁移到大量(可能未见过的)语言时所面临的问题。
  • methods: 这些模型理想情况下还应同时满足隐私性、语言公平性与透明性的要求,即能够将其预测与训练数据相关联。
  • results: 我们发现多语言压缩与语言公平性可以与差分隐私兼容,但差分隐私与训练数据影响稀疏性(一种透明性目标)相冲突。我们进一步在两个常见的 NLP 任务上开展实验,在不同隐私保证下评估多语言压缩与训练数据影响稀疏性,更细致地探讨这些权衡。
    Abstract Language models such as mBERT, XLM-R, and BLOOM aim to achieve multilingual generalization or compression to facilitate transfer to a large number of (potentially unseen) languages. However, these models should ideally also be private, linguistically fair, and transparent, by relating their predictions to training data. Can these requirements be simultaneously satisfied? We show that multilingual compression and linguistic fairness are compatible with differential privacy, but that differential privacy is at odds with training data influence sparsity, an objective for transparency. We further present a series of experiments on two common NLP tasks and evaluate multilingual compression and training data influence sparsity under different privacy guarantees, exploring these trade-offs in more detail. Our results suggest that we need to develop ways to jointly optimize for these objectives in order to find practical trade-offs.
    摘要 诸如 mBERT、XLM-R 和 BLOOM 之类的语言模型旨在实现多语言泛化或压缩,以便迁移到大量(可能未见过的)语言。然而,这些模型理想情况下还应当具备隐私性、语言公平性和透明性,即能够将其预测与训练数据相关联。这些要求能否同时满足?我们证明,多语言压缩和语言公平性与差分隐私是兼容的,但差分隐私与训练数据影响稀疏性(一种透明性目标)相冲突。我们进一步在两个常见的 NLP 任务上开展了一系列实验,在不同隐私保证下评估多语言压缩与训练数据影响稀疏性,更细致地探讨这些权衡。我们的结果表明,需要发展对这些目标进行联合优化的方法,以找到实用的折中方案。

Sensor Fusion by Spatial Encoding for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2308.10707
  • repo_url: None
  • paper_authors: Quoc-Vinh Lai-Dang, Jihui Lee, Bumgeun Park, Dongsoo Har
  • for: 本研究旨在提出一种用于摄像头和激光雷达数据融合的方法,以提高自动驾驶和机器人视觉系统的性能。
  • methods: 该方法在多个分辨率上使用Transformer模块进行融合,以有效结合局部与全局的上下文关系。
  • results: 在两个路线冗长、交通密集的高难度对抗性基准上,所提方法表现出色;与TransFuser相比,在Longest6和Town05 Long上的驾驶得分分别提升8%和19%。
    Abstract Sensor fusion is critical to perception systems for task domains such as autonomous driving and robotics. Recently, the Transformer integrated with CNN has demonstrated high performance in sensor fusion for various perception tasks. In this work, we introduce a method for fusing data from camera and LiDAR. By employing Transformer modules at multiple resolutions, the proposed method effectively combines local and global contextual relationships. The performance of the proposed method is validated by extensive experiments on two adversarial benchmarks with lengthy routes and high-density traffic. The proposed method outperforms previous approaches on the most challenging benchmarks, achieving significantly higher driving and infraction scores. Compared with TransFuser, it achieves 8% and 19% improvement in driving scores for the Longest6 and Town05 Long benchmarks, respectively.
    摘要 感知融合对于自动驾驶和机器人等任务领域的感知系统至关重要。最近,与CNN结合的Transformer在多种感知任务的传感器融合中展现了高性能。在这项工作中,我们提出了一种融合摄像头与激光雷达(LiDAR)数据的方法。通过在多个分辨率上使用Transformer模块,该方法能够有效地结合局部与全局的上下文关系。我们在两个具有长路线和高密度交通的对抗性基准上进行了大量实验以验证该方法的性能。该方法在最具挑战性的基准上超越了以往方法,取得了显著更高的驾驶与违规得分;与TransFuser相比,在Longest6和Town05 Long基准上的驾驶得分分别提升了8%和19%。
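
To make the fusion idea concrete, below is a hedged PyTorch sketch of a single-resolution fusion block that flattens camera and LiDAR feature maps into tokens and lets self-attention mix the two modalities. All layer sizes are assumptions, and the paper's full multi-resolution design is not reproduced.

```python
import torch
import torch.nn as nn

# Illustrative sketch of transformer-based camera-LiDAR fusion at one
# resolution: tokens from both modalities share one self-attention block,
# so each modality can attend to the other.

class FusionBlock(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, cam_feats, lidar_feats):
        # cam_feats, lidar_feats: (B, C, H, W) feature maps at one resolution
        b, c, h, w = cam_feats.shape
        cam_tokens = cam_feats.flatten(2).transpose(1, 2)      # (B, H*W, C)
        lidar_tokens = lidar_feats.flatten(2).transpose(1, 2)
        tokens = torch.cat([cam_tokens, lidar_tokens], dim=1)  # joint sequence
        fused = self.encoder(tokens)                           # cross-modal attention
        return fused.mean(dim=1)                               # pooled fused feature

block = FusionBlock()
out = block(torch.randn(2, 64, 8, 8), torch.randn(2, 64, 8, 8))
print(out.shape)  # torch.Size([2, 64])
```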

Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using EEG Data: A Dynamic Survival Analysis Framework with Competing Risks

  • paper_url: http://arxiv.org/abs/2308.11645
  • repo_url: None
  • paper_authors: Xiaobin Shen, Jonathan Elmer, George H. Chen
  • for: 对心脏骤停复苏后昏迷的高危患者进行神经系统结局预测,以辅助医疗决策。
  • methods: 基于EEG数据构建动态预测框架,随时间为患者给出神经系统结局预测,并随着更多EEG数据可用而不断更新。
  • results: 使用922名患者的真实数据对三种支持竞争风险的动态生存分析模型进行了比较,结果显示:(1)经典的Fine-Gray模型仅使用患者的静态特征和最近一小时EEG数据的汇总统计量,即可与近期提出的、使用更多EEG数据的Dynamic-DeepHit模型相媲美;(2)消融实验表明,建模三种竞争风险的模型至少与更简单的模型同样准确,同时能学到更多信息。
    Abstract Patients resuscitated from cardiac arrest who enter a coma are at high risk of death. Forecasting neurological outcomes of these patients (the task of neurological prognostication) could help with treatment decisions. In this paper, we propose, to the best of our knowledge, the first dynamic framework for neurological prognostication of post-cardiac-arrest comatose patients using EEG data: our framework makes predictions for a patient over time as more EEG data become available, and different training patients' available EEG time series could vary in length. Predictions are phrased in terms of either time-to-event outcomes (time-to-awakening or time-to-death) or as the patient's probability of awakening or of dying across multiple time horizons. Our framework uses any dynamic survival analysis model that supports competing risks in the form of estimating patient-level cumulative incidence functions. We consider three competing risks as to what happens first to a patient: awakening, being withdrawn from life-sustaining therapies (and thus deterministically dying), or dying (by other causes). We demonstrate our framework by benchmarking three existing dynamic survival analysis models that support competing risks on a real dataset of 922 patients. Our main experimental findings are that: (1) the classical Fine and Gray model which only uses a patient's static features and summary statistics from the patient's latest hour's worth of EEG data is highly competitive, achieving accuracy scores as high as the recently developed Dynamic-DeepHit model that uses substantially more of the patient's EEG data; and (2) in an ablation study, we show that our choice of modeling three competing risks results in a model that is at least as accurate while learning more information than simpler models (using two competing risks or a standard survival analysis setup with no competing risks).
    摘要 心脏骤停复苏后陷入昏迷的患者具有很高的死亡风险。预测这些患者的神经系统结局(即神经预后评估)有助于治疗决策。在本文中,我们提出了据我们所知首个基于EEG数据、面向心脏骤停后昏迷患者神经预后评估的动态框架:随着更多EEG数据可用,该框架可随时间为患者不断更新预测,且不同训练患者可用的EEG时间序列长度可以不同。预测形式既可以是事件时间结局(苏醒时间或死亡时间),也可以是患者在多个时间窗内苏醒或死亡的概率。我们的框架可使用任何支持竞争风险(以估计患者级累积发生率函数的形式)的动态生存分析模型。我们考虑患者最先发生的三种竞争风险:苏醒、被撤除生命维持治疗(从而必然死亡)、以及因其他原因死亡。我们在922名患者的真实数据集上对三种支持竞争风险的现有动态生存分析模型进行了基准比较。主要实验发现为:(1)仅使用患者静态特征与最近一小时EEG数据汇总统计量的经典Fine-Gray模型极具竞争力,其准确率可与使用明显更多EEG数据的新近Dynamic-DeepHit模型相当;(2)消融研究表明,建模三种竞争风险的模型至少与更简单的设置(两种竞争风险或无竞争风险的标准生存分析)同样准确,同时学到更多信息。
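
The quantity the framework estimates is a patient-level cumulative incidence function (CIF) per competing risk. The toy sketch below computes a plain empirical CIF on synthetic, uncensored data just to make that target concrete; the models benchmarked in the paper additionally handle censoring and time-varying EEG inputs.

```python
import numpy as np

# Empirical cumulative incidence (no censoring) for three competing risks:
# awakening, withdrawal of life-sustaining therapy, and death by other causes.
# Event codes and data are synthetic.

def empirical_cif(times, events, risk, horizon):
    """P(T <= horizon and event == risk) from observed (time, event) pairs."""
    times, events = np.asarray(times), np.asarray(events)
    return float(np.mean((times <= horizon) & (events == risk)))

times  = [12, 48, 30, 72, 6, 90]      # hours until first event
events = [1, 2, 1, 3, 1, 2]           # 1=awakening, 2=withdrawal, 3=death (other)
for r, name in [(1, "awakening"), (2, "withdrawal"), (3, "death")]:
    print(name, empirical_cif(times, events, r, horizon=48))
```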

Explainable AI for tool wear prediction in turning

  • paper_url: http://arxiv.org/abs/2308.08765
  • repo_url: None
  • paper_authors: Saleh Valizadeh Sotubadi, Rui Liu, Vinh Neguyen
  • for: 这项研究旨在开发一个可解释人工智能(XAI)框架,为车削加工中的刀具磨损预测提供人类可理解的解决方案。
  • methods: 该研究使用随机森林算法作为监督机器学习(ML)分类器,以正交管车削过程中的加速度、声学信号、温度和主轴转速作为输入特征,进行训练和二分类预测。
  • results: 在所有测试集上应用Shapley准则后,刀具温度被确定为判别刀具可用与失效的最重要输入特征。该研究表明,XAI能够帮助机床操作人员诊断并理解复杂ML分类器对刀具磨损的预测。
    Abstract This research aims to develop an Explainable Artificial Intelligence (XAI) framework to facilitate human-understandable solutions for tool wear prediction during turning. A random forest algorithm was used as the supervised Machine Learning (ML) classifier for training and binary classification using acceleration, acoustics, temperature, and spindle speed during the orthogonal tube turning process as input features. The ML classifier was used to predict the condition of the tool after the cutting process, which was determined in a binary class form indicating if the cutting tool was available or failed. After the training process, the Shapley criterion was used to explain the predictions of the trained ML classifier. Specifically, the significance of each input feature in the decision-making and classification was identified to explain the reasoning of the ML classifier predictions. After implementing the Shapley criterion on all testing datasets, the tool temperature was identified as the most significant feature in determining the classification of available versus failed cutting tools. Hence, this research demonstrates capability of XAI to provide machining operators the ability to diagnose and understand complex ML classifiers in prediction of tool wear.
    摘要 这项研究的目标是开发一个可解释人工智能(XAI)框架,为车削加工中的刀具磨损预测提供人类可理解的解决方案。研究使用随机森林算法作为监督机器学习(ML)分类器,以正交管车削过程中的加速度、声学信号、温度和主轴转速作为输入特征,预测切削完成后刀具的状态,并以二分类形式表示刀具可用或失效。训练完成后,研究使用Shapley准则解释ML分类器的预测:确定每个输入特征在决策与分类中的重要性,从而解释分类器预测的依据。在所有测试集上应用Shapley准则后,刀具温度被确定为判别刀具可用与失效的最重要特征。因此,这项研究展示了XAI能够让机床操作人员诊断并理解复杂ML分类器对刀具磨损的预测。
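
A minimal sketch of this pipeline, assuming the third-party `shap` package (pip install shap) and synthetic data: a random forest classifies tool condition from process signals, and Shapley values rank feature importance. Feature names and the failure rule are illustrative, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
import shap  # SHAP package providing TreeExplainer for tree ensembles

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))          # [acceleration, acoustics, temp, spindle]
y = (X[:, 2] > 0).astype(int)          # synthetic: failures driven by temperature

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(clf)
sv = explainer.shap_values(X)          # per-feature contributions per sample

# Depending on shap version, shap_values returns a list (one array per class)
# or a single array with a trailing class axis; normalize to the "failed" class.
if isinstance(sv, list):
    sv = sv[1]
elif sv.ndim == 3:
    sv = sv[..., 1]

# Mean |SHAP| per feature approximates global importance; in the paper,
# tool temperature dominated.
importance = np.abs(sv).mean(axis=0)
for name, imp in zip(["acceleration", "acoustics", "temperature", "spindle"],
                     importance):
    print(name, round(float(imp), 3))
```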

Efficient Commercial Bank Customer Credit Risk Assessment Based on LightGBM and Feature Engineering

  • paper_url: http://arxiv.org/abs/2308.08762
  • repo_url: None
  • paper_authors: Yanjie Sun, Zhike Gong, Quan Shi, Lin Chen
  • for: 这篇论文旨在帮助商业银行控制信用风险,使用LightGBM算法构建分类器,以判断客户发生信用违约的可能性。
  • methods: 本论文使用了特征工程技术,如处理缺失值、编码、不均衡样本等,以提高机器学习效果。
  • results: 本论文的主要创新在于基于原始数据集构建新的特征属性,使分类器的准确率达到0.734、AUC达到0.772,超过了许多基于同一数据集的分类器。
    Abstract Effective control of credit risk is a key link in the steady operation of commercial banks. This paper is mainly based on the customer information dataset of a foreign commercial bank in Kaggle, and we use LightGBM algorithm to build a classifier to classify customers, to help the bank judge the possibility of customer credit default. This paper mainly deals with characteristic engineering, such as missing value processing, coding, imbalanced samples, etc., which greatly improves the machine learning effect. The main innovation of this paper is to construct new feature attributes on the basis of the original dataset so that the accuracy of the classifier reaches 0.734, and the AUC reaches 0.772, which is more than many classifiers based on the same dataset. The model can provide some reference for commercial banks' credit granting, and also provide some feature processing ideas for other similar studies.
    摘要 有效控制信用风险是商业银行稳健运营的关键环节。本文主要基于Kaggle上一家外国商业银行的客户信息数据集,使用LightGBM算法构建分类器对客户进行分类,帮助银行判断客户发生信用违约的可能性。本文重点进行了特征工程,如缺失值处理、编码、不均衡样本处理等,大幅提升了机器学习效果。本文的主要创新在于在原始数据集基础上构建新的特征属性,使分类器的准确率达到0.734、AUC达到0.772,优于许多基于同一数据集的分类器。该模型可为商业银行的授信提供一定参考,也可为其他类似研究提供特征处理思路。
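
An illustrative sketch of this setup in Python, assuming synthetic data: engineered features (here a single derived debt-to-income ratio stands in for the paper's feature engineering) feed a LightGBM classifier scored by accuracy and AUC.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score

# Synthetic stand-in for the bank's customer dataset; the derived ratio
# column illustrates constructing a new feature attribute from raw fields.
rng = np.random.default_rng(0)
n = 2000
income = rng.lognormal(10, 0.5, n)
debt = rng.lognormal(9, 0.8, n)
X = np.column_stack([income, debt, debt / income])   # derived feature
y = (debt / income + rng.normal(0, 0.2, n) > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = lgb.LGBMClassifier(n_estimators=200, class_weight="balanced")
clf.fit(X_tr, y_tr)

proba = clf.predict_proba(X_te)[:, 1]
print("accuracy:", accuracy_score(y_te, proba > 0.5))
print("AUC:", roc_auc_score(y_te, proba))
```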

PMET: Precise Model Editing in a Transformer

  • paper_url: http://arxiv.org/abs/2308.08742
  • repo_url: https://github.com/xpq-tech/pmet
  • paper_authors: Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, Jie Yu
  • for: 这个论文主要目标是提高模型修改技术的性能,以便更好地在大语言模型(LLM)中插入新知识。
  • methods: 该论文提出了一种名为PMET的新模型编辑方法,它同时优化多头自注意力(MHSA)与前馈网络(FFN)的隐藏状态,并仅使用优化后的FFN隐藏状态来精确更新FFN权重。
  • results: 实验表明,PMET在COUNTERFACT和zsRE数据集上均达到了最先进(state-of-the-art)的性能。
    Abstract Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost, which have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use it to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contains information not specifically required for FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze hidden states of MHSA and FFN, finding that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on above findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while only using the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET exhibits state-of-the-art performance on both the COUNTERFACT and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that the MHSA encodes certain general knowledge extraction patterns and indicating its storage of a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.git.
    摘要 模型编辑技术能够以较低代价修改大型语言模型(LLM)中一小部分知识,并已取得显著成功。现有方法假设Transformer层(TL)的隐藏状态是前馈网络(FFN)键值记忆中的值:它们通常优化TL隐藏状态以记忆目标知识,并据此更新LLM中FFN的权重。然而,TL隐藏状态的信息流来自三个部分:多头自注意力(MHSA)、FFN和残差连接。现有方法忽略了TL隐藏状态中包含FFN并不特别需要的信息,导致模型编辑性能下降。为实现更精确的模型编辑,我们分析了MHSA与FFN的隐藏状态,发现MHSA编码了某些通用的知识提取模式,这意味着引入新知识时无需更新MHSA权重。基于上述发现,我们提出了PMET:同时优化Transformer组件(TC,即MHSA与FFN)的隐藏状态,但仅使用优化后的FFN隐藏状态来精确更新FFN权重。实验表明,PMET在COUNTERFACT和zsRE数据集上均达到最先进的性能。消融实验进一步证实了各项改进的有效性,并进一步支持MHSA编码通用知识提取模式、且仅存储少量事实知识这一发现。代码见 https://github.com/xpq-tech/PMET.git。

ReProHRL: Towards Multi-Goal Navigation in the Real World using Hierarchical Agents

  • paper_url: http://arxiv.org/abs/2308.08737
  • repo_url: None
  • paper_authors: Tejaswini Manjunath, Mozhgan Navardi, Prakhar Dixit, Bharat Prakash, Tinoosh Mohsenin
  • for: 本研究旨在解决真实环境中奖励稀疏条件下RL算法的学习问题以及多目标导航问题。
  • methods: 本研究提出了Ready for Production Hierarchical RL(ReProHRL)方法,利用强化学习引导的分层多目标导航来划分任务,并以目标检测器作为预处理步骤来学习多目标导航并将其迁移到真实世界。
  • results: 实验结果显示,所提出的ReProHRL方法在模拟与真实环境中的训练时间和性能均优于基线方法:在简单的单目标导航环境中两者均达到100%成功率;在更复杂的环境与多目标设定下,该方法分别比基线高出18%和5%。此外,研究者将该方法部署到配备前置摄像头、名为Crazyflie的纳米无人机上,完成了多目标导航实验。
    Abstract Robots have been successfully used to perform tasks with high precision. In real-world environments with sparse rewards and multiple goals, learning is still a major challenge and Reinforcement Learning (RL) algorithms fail to learn good policies. Training in simulation environments and then fine-tuning in the real world is a common approach. However, adapting to the real-world setting is a challenge. In this paper, we present a method named Ready for Production Hierarchical RL (ReProHRL) that divides tasks with hierarchical multi-goal navigation guided by reinforcement learning. We also use object detectors as a pre-processing step to learn multi-goal navigation and transfer it to the real world. Empirical results show that the proposed ReProHRL method outperforms the state-of-the-art baseline in simulation and real-world environments in terms of both training time and performance. Although both methods achieve a 100% success rate in a simple environment for single goal-based navigation, in a more complex environment and multi-goal setting, the proposed method outperforms the baseline by 18% and 5%, respectively. For the real-world implementation and proof of concept demonstration, we deploy the proposed method on a nano-drone named Crazyflie with a front camera to perform multi-goal navigation experiments.
    摘要 机器人已能高精度地执行各种任务。但在奖励稀疏且存在多个目标的真实环境中,学习仍是一大挑战,强化学习(RL)算法难以学到好的策略。常见做法是在仿真环境中训练、再在真实世界中微调,但适应真实环境本身就是一项挑战。本文提出了一种名为Ready for Production Hierarchical RL(ReProHRL)的方法,利用强化学习引导的分层多目标导航来划分任务,并以目标检测器作为预处理步骤来学习多目标导航并将其迁移到真实世界。实验结果表明,ReProHRL在仿真与真实环境中的训练时间和性能均优于最新的基线方法:在简单的单目标导航环境中两者均达到100%成功率;在更复杂的环境与多目标设定下,该方法分别比基线高出18%和5%。为完成真实部署与概念验证,我们将该方法部署到配备前置摄像头、名为Crazyflie的纳米无人机上,进行多目标导航实验。

On the Effectiveness of Log Representation for Log-based Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.08736
  • repo_url: https://github.com/mooselab/suppmaterial-logrepforanomalydetection
  • paper_authors: Xingfang Wu, Heng Li, Foutse Khomh
  • for: This paper aims to compare and evaluate different log representation techniques for use in machine learning-based log analysis tasks, specifically for anomaly detection.
  • methods: The authors select six commonly used log representation techniques and evaluate them with seven machine learning models and four public log datasets. They also examine the impact of log parsing and feature aggregation approaches when used with these techniques.
  • results: The authors provide heuristic guidelines for future researchers and developers based on their comprehensive comparison of log representation techniques. They hope to help researchers and practitioners better understand the characteristics of different log representation techniques and select the most suitable ones for their ML-based log analysis workflow.
    Abstract Logs are an essential source of information for people to understand the running status of a software system. Due to the evolving modern software architecture and maintenance methods, more research efforts have been devoted to automated log analysis. In particular, machine learning (ML) has been widely used in log analysis tasks. In ML-based log analysis tasks, converting textual log data into numerical feature vectors is a critical and indispensable step. However, the impact of using different log representation techniques on the performance of the downstream models is not clear, which limits researchers and practitioners' opportunities of choosing the optimal log representation techniques in their automated log analysis workflows. Therefore, this work investigates and compares the commonly adopted log representation techniques from previous log analysis research. Particularly, we select six log representation techniques and evaluate them with seven ML models and four public log datasets (i.e., HDFS, BGL, Spirit and Thunderbird) in the context of log-based anomaly detection. We also examine the impacts of the log parsing process and the different feature aggregation approaches when they are employed with log representation techniques. From the experiments, we provide some heuristic guidelines for future researchers and developers to follow when designing an automated log analysis workflow. We believe our comprehensive comparison of log representation techniques can help researchers and practitioners better understand the characteristics of different log representation techniques and provide them with guidance for selecting the most suitable ones for their ML-based log analysis workflow.
    摘要 日志是人们了解软件系统运行状态的重要信息来源。随着现代软件架构和维护方式的演进,越来越多的研究投入到自动化日志分析中,其中机器学习(ML)被广泛用于各类日志分析任务。在基于ML的日志分析任务中,将文本日志数据转换为数值特征向量是关键且不可或缺的一步。然而,使用不同日志表示技术对下游模型性能的影响尚不清楚,这限制了研究者和从业者在自动化日志分析流程中选择最优日志表示技术的机会。因此,本工作调查并比较了以往日志分析研究中常用的日志表示技术:我们选取六种日志表示技术,结合七种ML模型和四个公开日志数据集(HDFS、BGL、Spirit和Thunderbird),在基于日志的异常检测场景下进行评估,并考察日志解析过程和不同特征聚合方式与日志表示技术配合使用时的影响。基于实验结果,我们为未来研究者和开发者设计自动化日志分析工作流程提供了一些启发式指导。我们相信,这一对日志表示技术的全面比较能够帮助研究者和从业者更好地理解不同日志表示技术的特点,并为其在基于ML的日志分析流程中选择最合适的技术提供指导。
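
A tiny sketch of the study's core loop: convert parsed log event sequences into numerical vectors under different representations, then feed the same downstream detector and compare. Two classical representations are shown here; the paper evaluates six, including semantic embeddings, on real datasets.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each "session" is a sequence of parsed log event templates (E1, E2, ...);
# the label marks whether the session is anomalous. Data is synthetic.
sessions = ["E1 E2 E2 E5", "E1 E2 E5", "E9 E9 E9 E1", "E1 E2 E5 E5"]
labels   = [0, 0, 1, 0]

# Same downstream model, different log representations.
for name, vec in [("counts", CountVectorizer()), ("tf-idf", TfidfVectorizer())]:
    model = make_pipeline(vec, LogisticRegression())
    model.fit(sessions, labels)
    print(name, model.predict(sessions))
```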

A Novel Loss Function Utilizing Wasserstein Distance to Reduce Subject-Dependent Noise for Generalizable Models in Affective Computing

  • paper_url: http://arxiv.org/abs/2308.10869
  • repo_url: None
  • paper_authors: Nibraas Khan, Mahrukh Tauseef, Ritam Ghosh, Nilanjan Sarkar
  • for: 本研究旨在提高人工智能系统的情感识别精度,通过使用优化运输理论和 Wasserstein 距离来降低主观噪声的影响。
  • methods: 本研究使用了一种新的成本函数,其中包括用于衡量主观噪声的 Wasserstein 距离。同时,研究还使用了一种基于 autoencoder 的多类分类器,以同时检测不同情感状态。
  • results: 研究发现,所提代价函数可提升情感识别效果:与使用均方误差的基线模型相比,潜在空间中类别间的最小距离与质心距离平均分别提升了14.75%和17.75%。
    Abstract Emotions are an essential part of human behavior that can impact thinking, decision-making, and communication skills. Thus, the ability to accurately monitor and identify emotions can be useful in many human-centered applications such as behavioral training, tracking emotional well-being, and development of human-computer interfaces. The correlation between patterns in physiological data and affective states has allowed for the utilization of deep learning techniques which can accurately detect the affective states of a person. However, the generalisability of existing models is often limited by the subject-dependent noise in the physiological data due to variations in a subject's reactions to stimuli. Hence, we propose a novel cost function that employs Optimal Transport Theory, specifically Wasserstein Distance, to scale the importance of subject-dependent data such that higher importance is assigned to patterns in data that are common across all participants while decreasing the importance of patterns that result from subject-dependent noise. The performance of the proposed cost function is demonstrated through an autoencoder with a multi-class classifier attached to the latent space and trained simultaneously to detect different affective states. An autoencoder with a state-of-the-art loss function i.e., Mean Squared Error, is used as a baseline for comparison with our model across four different commonly used datasets. Centroid and minimum distance between different classes are used as a metrics to indicate the separation between different classes in the latent space. An average increase of 14.75% and 17.75% (from benchmark to proposed loss function) was found for minimum and centroid euclidean distance respectively over all datasets.
    摘要 人类行为中的情感是一个重要的部分,可以影响思维、决策和communication Skills。因此,可以准确识别和评估情感的能力在许多人类中心应用中可以是有益的,如行为训练、情感健康评估和人机界面的开发。基于生理数据中的征 patrerns和情感状态的相关性,使用深度学习技术可以准确地检测人类情感状态。然而,现有模型的通用性通常受到参与者的响应差异所限制,这些差异会导致数据中的Subject-dependent noise。因此,我们提议一种新的成本函数,使用Optimal Transport Theory,具体来说是Wasserstein Distance,来抑制参与者具有特定数据的影响。我们的模型通过在缺省空间中附加多类分类器,并同时使用我们提议的成本函数和现有的state-of-the-art loss function(即Mean Squared Error)进行训练,来检测不同的情感状态。与基准模型相比,我们的模型在四个常用的数据集上的性能显著提高,具体来说是平均提高14.75%和17.75%(从基准loss function到我们的loss function)。为了衡量不同类别之间的分离度,我们使用中心距离和最小距离两种指标,并发现在所有数据集上,我们的模型的中心距离和最小距离均有显著提高。

Synergistic Signal Denoising for Multimodal Time Series of Structure Vibration

  • paper_url: http://arxiv.org/abs/2308.11644
  • repo_url: None
  • paper_authors: Yang Yu, Han Chen
  • for: 本研究旨在提供一种基于深度学习的 Structural Health Monitoring (SHM) 解决方案,以提高结构的寿命和安全性。
  • methods: 本研究提出了一种新的深度学习算法,结合卷积与循环网络结构,以捕捉多模态振动信号中的局部与长时结构行为;算法还引入注意力机制,以辨别并优先处理关键的结构响应,提升预测精度。
  • results: 研究结果显示,该算法在多种SHM场景中带来显著改进,包括预测精度、早期损伤检测与适应性;该方法还提供了更透明、可解释的AI驱动SHM方案,未来可扩展至实时处理、结合外部环境因素,并进一步强调模型可解释性。
    Abstract Structural Health Monitoring (SHM) plays an indispensable role in ensuring the longevity and safety of infrastructure. With the rapid growth of sensor technology, the volume of data generated from various structures has seen an unprecedented surge, bringing forth challenges in efficient analysis and interpretation. This paper introduces a novel deep learning algorithm tailored for the complexities inherent in multimodal vibration signals prevalent in SHM. By amalgamating convolutional and recurrent architectures, the algorithm adeptly captures both localized and prolonged structural behaviors. The pivotal integration of attention mechanisms further enhances the model's capability, allowing it to discern and prioritize salient structural responses from extraneous noise. Our results showcase significant improvements in predictive accuracy, early damage detection, and adaptability across multiple SHM scenarios. In light of the critical nature of SHM, the proposed approach not only offers a robust analytical tool but also paves the way for more transparent and interpretable AI-driven SHM solutions. Future prospects include real-time processing, integration with external environmental factors, and a deeper emphasis on model interpretability.
    摘要 结构健康监测(SHM)在保障基础设施的寿命与安全方面扮演着不可或缺的角色。随着传感器技术的快速发展,各类结构产生的数据量空前增长,给高效分析与解释带来挑战。本文介绍了一种针对SHM中常见的多模态振动信号复杂性而设计的新型深度学习算法:通过融合卷积与循环结构,该算法能够同时捕捉局部化与长时的结构行为;注意力机制的引入进一步增强了模型能力,使其能够从无关噪声中辨别并优先处理关键的结构响应。结果表明,该方法在多种SHM场景中在预测精度、早期损伤检测与适应性方面均有显著提升。鉴于SHM的重要性,所提方法不仅提供了可靠的分析工具,也为更透明、可解释的AI驱动SHM方案铺平了道路。未来方向包括实时处理、结合外部环境因素以及更深入的模型可解释性。

Dynamic Neural Network is All You Need: Understanding the Robustness of Dynamic Mechanisms in Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08709
  • repo_url: https://github.com/anonymous2015258/Early_Attack
  • paper_authors: Mirazul Haque, Wei Yang
  • for: 本研究旨在 investigate 动态神经网络(DyNNs)中动态机制的稳定性和鲁棒性问题。
  • methods: 本研究使用三个模型和两个数据集进行评估。我们采用了多种攻击方法来评估动态机制对DyNNs的影响。
  • results: 我们发现,从DyNNs到SDNNs的攻击传递率高于从SDNNs到DyNNs。此外,我们发现DyNNs可以更有效地生成攻击样本。最后,我们提出了一种新的攻击方法,并提供了设计选择来提高DyNNs的鲁棒性。
    Abstract Deep Neural Networks (DNNs) have been used to solve different day-to-day problems. Recently, DNNs have been deployed in real-time systems, and lowering the energy consumption and response time has become the need of the hour. To address this scenario, researchers have proposed incorporating dynamic mechanism to static DNNs (SDNN) to create Dynamic Neural Networks (DyNNs) performing dynamic amounts of computation based on the input complexity. Although incorporating dynamic mechanism into SDNNs would be preferable in real-time systems, it also becomes important to evaluate how the introduction of dynamic mechanism impacts the robustness of the models. However, there has not been a significant number of works focusing on the robustness trade-off between SDNNs and DyNNs. To address this issue, we propose to investigate the robustness of dynamic mechanism in DyNNs and how dynamic mechanism design impacts the robustness of DyNNs. For that purpose, we evaluate three research questions. These evaluations are performed on three models and two datasets. Through the studies, we find that attack transferability from DyNNs to SDNNs is higher than attack transferability from SDNNs to DyNNs. Also, we find that DyNNs can be used to generate adversarial samples more efficiently than SDNNs. Then, through research studies, we provide insight into the design choices that can increase robustness of DyNNs against the attack generated using static model. Finally, we propose a novel attack to understand the additional attack surface introduced by the dynamic mechanism and provide design choices to improve robustness against the attack.
    摘要 深度神经网络(DNN)已被用于解决各类日常问题。近期,DNN被部署于实时系统中,降低能耗与响应时间成为当务之急。为此,研究者提出在静态DNN(SDNN)中引入动态机制,构建根据输入复杂度执行动态计算量的动态神经网络(DyNN)。虽然在实时系统中引入动态机制是可取的,但评估动态机制如何影响模型的鲁棒性同样重要。然而,目前关注SDNN与DyNN之间鲁棒性权衡的工作还很少。为此,我们研究DyNN中动态机制的鲁棒性,以及动态机制设计如何影响DyNN的鲁棒性,并在三个模型和两个数据集上评估了三个研究问题。研究发现:从DyNN到SDNN的攻击可迁移性高于从SDNN到DyNN的攻击可迁移性;DyNN可比SDNN更高效地生成对抗样本。随后,我们给出了可提升DyNN抵御由静态模型生成的攻击的设计选择。最后,我们提出了一种新的攻击方法,以理解动态机制引入的额外攻击面,并给出改进相应鲁棒性的设计选择。
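
For context, a minimal PyTorch sketch of one common DyNN design, an early-exit network: internal classifiers let confident inputs stop early, so easy inputs use less computation. Sizes, the confidence rule, and the threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, dim=32, n_classes=10, n_blocks=3, threshold=0.9):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
             for _ in range(n_blocks)])
        self.exits = nn.ModuleList(
            [nn.Linear(dim, n_classes) for _ in range(n_blocks)])
        self.threshold = threshold

    def forward(self, x):
        # For simplicity this processes a single sample (batch of 1), so the
        # whole batch exits together when the prediction is confident enough.
        for i, (block, exit_head) in enumerate(zip(self.blocks, self.exits)):
            x = block(x)
            logits = exit_head(x)
            conf = torch.softmax(logits, dim=-1).max().item()
            if conf >= self.threshold:        # confident -> exit early
                return logits, i
        return logits, len(self.blocks) - 1   # fell through to the last exit

net = EarlyExitNet()
logits, exit_idx = net(torch.randn(1, 32))
print("exited at block", exit_idx)
```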

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

  • paper_url: http://arxiv.org/abs/2308.08708
  • repo_url: None
  • paper_authors: Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, Rufin VanRullen
  • for: 本研究的目的是评估当前和未来的人工智能系统是否具有意识。
  • methods: 本研究采用一种严谨且以实证为基础的方法:依据当前最有依据的神经科学意识理论,细致评估现有及未来的人工智能系统,以判断其是否具有意识。
  • results: 研究结果表明,目前的人工智能系统没有意识,但也没有技术障碍建立意识的系统。
    Abstract Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of consciousness, including recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. From these theories we derive "indicator properties" of consciousness, elucidated in computational terms that allow us to assess AI systems for these properties. We use these indicator properties to assess several recent AI systems, and we discuss how future systems might implement them. Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.
    摘要 当前或近期的AI系统是否可能具有意识,既是科学界关注的课题,也日益引起公众关切。本报告主张并示范了一种严谨且以实证为基础的AI意识研究方法:依据我们目前最有依据的神经科学意识理论,对现有AI系统进行细致评估。我们综述了若干著名的科学意识理论,包括循环加工理论、全局工作空间理论、高阶理论、预测加工理论和注意图式理论,并从中推导出意识的"指标属性",以可用于评估AI系统的计算术语加以阐明。我们利用这些指标属性评估了若干近期AI系统,并讨论了未来系统可能如何实现它们。我们的分析表明,目前没有任何AI系统具有意识,但也不存在构建满足这些指标的AI系统的明显技术障碍。

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

  • paper_url: http://arxiv.org/abs/2308.09723
  • repo_url: None
  • paper_authors: Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Hassan Awadalla
  • for: 大型语言模型(LLM)在各类语言任务上取得了最先进的性能,但其庞大的内存需求与自回归解码中的内存带宽瓶颈给实际部署带来挑战。
  • methods: 我们提出了一种高效的仅权重量化方法,可降低LLM的内存占用并加速推理。我们采用一种简单有效的启发式方法,仅利用预训练模型的权重,无需额外微调。
  • results: 在OPT-175B等大规模开源模型及内部MoE模型上,该方法在质量几乎无损的情况下,以相同数量的GPU实现了最高3.65倍的吞吐量。
    Abstract Large Language Models (LLMs) have achieved state-of-the-art performance across various language tasks but pose challenges for practical deployment due to their substantial memory requirements. Furthermore, the latest generative models suffer from high inference costs caused by the memory bandwidth bottleneck in the auto-regressive decoding process. To address these issues, we propose an efficient weight-only quantization method that reduces memory consumption and accelerates inference for LLMs. To ensure minimal quality degradation, we introduce a simple and effective heuristic approach that utilizes only the model weights of a pre-trained model. This approach is applicable to both Mixture-of-Experts (MoE) and dense models without requiring additional fine-tuning. To demonstrate the effectiveness of our proposed method, we first analyze the challenges and issues associated with LLM quantization. Subsequently, we present our heuristic approach, which adaptively finds the granularity of quantization, effectively addressing these problems. Furthermore, we implement highly efficient GPU GEMMs that perform on-the-fly matrix multiplication and dequantization, supporting the multiplication of fp16 or bf16 activations with int8 or int4 weights. We evaluate our approach on large-scale open source models such as OPT-175B and internal MoE models, showcasing minimal accuracy loss while achieving up to 3.65 times higher throughput on the same number of GPUs.
    摘要 大型语言模型(LLM)在各类语言任务上取得了最先进的性能,但其庞大的内存需求给实际部署带来挑战;同时,最新的生成式模型因自回归解码过程中的内存带宽瓶颈而推理代价高昂。为解决这些问题,我们提出了一种高效的仅权重量化方法,可降低LLM的内存占用并加速推理。为保证质量损失最小,我们引入了一种简单有效的启发式方法,仅利用预训练模型的权重、无需额外微调,且同时适用于混合专家(MoE)模型与稠密模型。我们首先分析了LLM量化面临的挑战与问题,进而提出能够自适应确定量化粒度的启发式方法,有效解决这些问题;此外,我们实现了高效的GPU GEMM内核,可在运行中完成矩阵乘法与反量化,支持fp16或bf16激活与int8或int4权重相乘。我们在OPT-175B等大规模开源模型及内部MoE模型上评估了该方法,结果显示在相同GPU数量下实现最高3.65倍的吞吐量,且精度损失极小。
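
A sketch of fine-grained weight-only quantization in NumPy: weights are split into small groups, each with its own int4 scale, so an outlier in one group does not wreck the precision of another. The group size and symmetric scheme are illustrative; the paper's adaptive granularity search and fused GPU kernels are not reproduced.

```python
import numpy as np

def quantize_groupwise(w, group_size=64, bits=4):
    """Symmetric per-group quantization; len(w) must divide by group_size."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for int4
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(1024,)).astype(np.float32)
q, s = quantize_groupwise(w)
err = np.abs(dequantize(q, s).reshape(-1) - w).mean()
print("mean abs quantization error:", err)
```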

Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing

  • paper_url: http://arxiv.org/abs/2308.08705
  • repo_url: None
  • paper_authors: Xiangyu Liu, Kaiqing Zhang
  • for: 本研究旨在在部分可观测随机博弈(POSG)的一般框架下提出可证明的多智能体强化学习(MARL)方法,以规避已知的困难性结果和计算上不可行的oracle。
  • methods: 我们提出利用多智能体之间的信息共享,这既是经验性MARL中的常见做法,也是带通信的多智能体控制系统的标准模型。我们首先建立若干计算复杂性结果,论证信息共享与可观测性假设(它使单智能体部分可观测RL得以准高效求解)对于高效求解POSG的必要性;随后提出对共享公共信息进行近似,以构建POSG的近似模型,从而缓解其计算困难;最后,我们开发了一种在统计与计算上均准高效(quasi-efficient)的多智能体强化学习算法。
  • results: 本研究有望为在不同信息结构下开发兼具样本效率与计算效率的部分可观测MARL算法开辟道路。
    Abstract We study provable multi-agent reinforcement learning (MARL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of computationally intractable oracles, we advocate leveraging the potential \emph{information-sharing} among agents, a common practice in empirical MARL, and a standard model for multi-agent control systems with communications. We first establish several computation complexity results to justify the necessity of information-sharing, as well as the observability assumption that has enabled quasi-efficient single-agent RL with partial observations, for computational efficiency in solving POSGs. We then propose to further \emph{approximate} the shared common information to construct an {approximate model} of the POSG, in which planning an approximate equilibrium (in terms of solving the original POSG) can be quasi-efficient, i.e., of quasi-polynomial-time, under the aforementioned assumptions. Furthermore, we develop a partially observable MARL algorithm that is both statistically and computationally quasi-efficient. We hope our study may open up the possibilities of leveraging and even designing different \emph{information structures}, for developing both sample- and computation-efficient partially observable MARL.
    摘要 我们在部分可观测随机博弈(POSG)的一般框架下研究可证明的多智能体强化学习(MARL)。为规避已知的困难性结果和对计算上不可行的oracle的依赖,我们主张利用智能体之间潜在的信息共享,这既是经验性MARL中的常见做法,也是带通信的多智能体控制系统的标准模型。我们首先建立若干计算复杂性结果,论证信息共享的必要性,以及使单智能体部分可观测RL得以准高效求解的可观测性假设对于高效求解POSG的必要性。随后,我们提出对共享的公共信息进行近似,以构建POSG的近似模型;在上述假设下,在该模型中规划近似均衡(就求解原POSG而言)可达到准多项式时间的准高效性。此外,我们开发了一种在统计与计算上均准高效的部分可观测MARL算法。我们希望本研究能够开启利用乃至设计不同信息结构、以开发兼具样本效率与计算效率的部分可观测MARL的可能性。

Planning in the imagination: High-level planning on learned abstract search spaces

  • paper_url: http://arxiv.org/abs/2308.08693
  • repo_url: None
  • paper_authors: Carlos Martin, Tuomas Sandholm
  • for: 本文提出了一种名为PiZero的新方法,使智能体能够在其自行构建的、与真实环境完全解耦的抽象搜索空间中进行规划。
  • methods: 本文使用的方法PiZero可以在任意时间尺度上进行高级规划,并且可以处理连续动作空间和部分可见性的情况。
  • results: 根据实验结果,PiZero在多个领域中表现出色,比如导航任务和Sokoban,并且在不假设环境模拟器的情况下超过了相似的先前方法。
    Abstract We propose a new method, called PiZero, that gives an agent the ability to plan in an abstract search space of its own creation that is completely decoupled from the real environment. Unlike prior approaches, this enables the agent to perform high-level planning at arbitrary timescales and reason in terms of compound or temporally-extended actions, which can be useful in environments where large numbers of base-level micro-actions are needed to perform relevant macro-actions. In addition, our method is more general than comparable prior methods because it handles settings with continuous action spaces and partial observability. We evaluate our method on multiple domains, including navigation tasks and Sokoban. Experimentally, it outperforms comparable prior methods without assuming access to an environment simulator.
    摘要 我们提出了一种名为PiZero的新方法,使智能体能够在其自行构建的、与真实环境完全解耦的抽象搜索空间中进行规划。与以往方法不同,这使智能体可以在任意时间尺度上进行高层规划,并以复合或时间扩展的动作进行推理,这在需要大量底层微动作才能完成相关宏动作的环境中尤为有用。此外,我们的方法比同类先前方法更通用,因为它能处理连续动作空间与部分可观测的情形。我们在导航任务和Sokoban等多个领域进行了实验;在不假设可访问环境模拟器的情况下,该方法优于同类先前方法。

Quantifying Overfitting: Introducing the Overfitting Index

  • paper_url: http://arxiv.org/abs/2308.08682
  • repo_url: None
  • paper_authors: Sanad Aburass
  • for: 本研究旨在提出一种新的度量指标(过拟合指标),用于评估深度学习模型的过拟合情况。
  • methods: 该研究使用了多种深度学习模型,包括 MobileNet、U-Net、ResNet、Darknet 和 ViT-32,并在 Breast Ultrasound Images Dataset (BUS) 和 MNIST 数据集上进行了广泛的实验。
  • results: 研究结果表明,不同架构表现出不同的过拟合行为,而数据增强的缓解作用在更小、更专门的数据集上尤为明显。此外,ViT-32在MNIST上的表现也印证了某些模型的稳健性与该数据集的全面性。
    Abstract In the rapidly evolving domain of machine learning, ensuring model generalizability remains a quintessential challenge. Overfitting, where a model exhibits superior performance on training data but falters on unseen data, is a recurrent concern. This paper introduces the Overfitting Index (OI), a novel metric devised to quantitatively assess a model's tendency to overfit. Through extensive experiments on the Breast Ultrasound Images Dataset (BUS) and the MNIST dataset using architectures such as MobileNet, U-Net, ResNet, Darknet, and ViT-32, we illustrate the utility and discernment of the OI. Our results underscore the variable overfitting behaviors across architectures and highlight the mitigative impact of data augmentation, especially on smaller and more specialized datasets. The ViT-32's performance on MNIST further emphasizes the robustness of certain models and the dataset's comprehensive nature. By providing an objective lens to gauge overfitting, the OI offers a promising avenue to advance model optimization and ensure real-world efficacy.
    摘要 在快速发展的机器学习领域,保证模型的泛化能力仍是一项核心挑战。过拟合,即模型在训练数据上表现优异、在未见数据上却表现不佳,是一个反复出现的问题。本文提出了过拟合指数(OI),一种用于定量评估模型过拟合倾向的新指标。通过在Breast Ultrasound Images Dataset(BUS)和MNIST数据集上对MobileNet、U-Net、ResNet、Darknet和ViT-32等多种架构进行广泛实验,我们展示了OI的实用性与区分能力。结果突显了不同架构之间过拟合行为的差异,并表明数据增强的缓解作用在更小、更专门的数据集上尤为明显;ViT-32在MNIST上的表现进一步印证了某些模型的稳健性与该数据集的全面性。通过提供一个客观衡量过拟合的视角,OI为推进模型优化、确保真实场景有效性提供了有前景的途径。
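
The abstract does not give the OI formula, so the sketch below is a hypothetical placeholder illustrating only the general idea: a scalar contrasting training and validation performance, where 0 means no gap. It should not be read as the paper's definition.

```python
def overfitting_index(train_acc: float, val_acc: float) -> float:
    """Hypothetical overfitting score: relative train/validation accuracy gap."""
    return max(0.0, (train_acc - val_acc) / train_acc)

print(overfitting_index(0.99, 0.80))  # heavily overfit model -> larger index
print(overfitting_index(0.90, 0.89))  # well-generalizing model -> near zero
```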

SkinDistilViT: Lightweight Vision Transformer for Skin Lesion Classification

  • paper_url: http://arxiv.org/abs/2308.08669
  • repo_url: https://github.com/Longman-Stan/SkinDistilVit
  • paper_authors: Vlad-Constantin Lungu-Stan, Dumitru-Clementin Cercel, Florin Pop
  • For: The paper is written for solving the skin cancer classification problem, specifically focusing on melanoma identification.
  • Methods: The paper uses a vision transformer trained on melanoma medical images annotated by experts, with knowledge distillation to obtain a model that retains the teacher’s balanced multi-class accuracy at a lower cost in terms of memory and time.
  • Results: The distilled model retains 98.33% of the teacher’s balanced multi-class accuracy while being 49.60% smaller, 69.25% faster on GPU, and 97.96% faster on CPU. Additionally, a cascading distillation process improves the balanced multi-class accuracy of the base model by 2.1%, while creating a range of models of various sizes but comparable performance.
    Abstract Skin cancer is a treatable disease if discovered early. We provide a production-specific solution to the skin cancer classification problem that matches human performance in melanoma identification by training a vision transformer on melanoma medical images annotated by experts. Since inference cost, both time and memory wise is important in practice, we employ knowledge distillation to obtain a model that retains 98.33% of the teacher's balanced multi-class accuracy, at a fraction of the cost. Memory-wise, our model is 49.60% smaller than the teacher. Time-wise, our solution is 69.25% faster on GPU and 97.96% faster on CPU. By adding classification heads at each level of the transformer and employing a cascading distillation process, we improve the balanced multi-class accuracy of the base model by 2.1%, while creating a range of models of various sizes but comparable performance. We provide the code at https://github.com/Longman-Stan/SkinDistilVit.
    摘要 皮肤癌若能及早发现,是一种可治疗的疾病。我们针对皮肤癌分类问题给出了一个面向生产环境的解决方案:通过在专家标注的黑色素瘤医学影像上训练视觉Transformer,在黑色素瘤识别上达到与人类相当的水平。由于推理的时间与内存开销在实践中非常重要,我们采用知识蒸馏得到一个保留教师模型98.33%平衡多类准确率的模型,而代价仅为其一小部分:模型体积比教师小49.60%,推理速度在GPU上快69.25%、在CPU上快97.96%。通过在Transformer的每一层添加分类头并采用级联蒸馏过程,我们将基础模型的平衡多类准确率提升了2.1%,同时得到一系列规模不同但性能相当的模型。代码见 https://github.com/Longman-Stan/SkinDistilVit。
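
For reference, a sketch of the standard knowledge-distillation objective such a pipeline builds on: the student matches softened teacher logits (a KL term) while still fitting the ground-truth labels. The temperature and mixing weight are conventional choices, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted mix of soft (teacher-matching) and hard (label) losses."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 7, requires_grad=True)   # 7 skin-lesion classes
teacher = torch.randn(8, 7)
labels = torch.randint(0, 7, (8,))
print(distillation_loss(student, teacher, labels))
```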

BREATHE: Second-Order Gradients and Heteroscedastic Emulation based Design Space Exploration

  • paper_url: http://arxiv.org/abs/2308.08666
  • repo_url: None
  • paper_authors: Shikhar Tuli, Niraj K. Jha
  • for: 本文提出了一种名为BREATHE的受限多目标优化(MOO)框架,可在基于向量和基于图的设计空间中搜索性能最佳的设计。
  • methods: 该框架利用二阶梯度,并主动训练异方差代理模型,以实现样本高效的优化。
  • results: 在单目标向量优化应用中,其性能比次优基线(随机森林回归)高64.1%;在基于图的搜索中,比次优基线(基于高斯过程的贝叶斯优化的图版本)最高高出64.9%;在多目标优化任务中,其超体积(hypervolume)最高可达最先进方法MOBOpt的21.9倍。
    Abstract Researchers constantly strive to explore larger and more complex search spaces in various scientific studies and physical experiments. However, such investigations often involve sophisticated simulators or time-consuming experiments that make exploring and observing new design samples challenging. Previous works that target such applications are typically sample-inefficient and restricted to vector search spaces. To address these limitations, this work proposes a constrained multi-objective optimization (MOO) framework, called BREATHE, that searches not only traditional vector-based design spaces but also graph-based design spaces to obtain best-performing graphs. It leverages second-order gradients and actively trains a heteroscedastic surrogate model for sample-efficient optimization. In a single-objective vector optimization application, it leads to 64.1% higher performance than the next-best baseline, random forest regression. In graph-based search, BREATHE outperforms the next-best baseline, i.e., a graphical version of Gaussian-process-based Bayesian optimization, with up to 64.9% higher performance. In a MOO task, it achieves up to 21.9$\times$ higher hypervolume than the state-of-the-art method, multi-objective Bayesian optimization (MOBOpt). BREATHE also outperforms the baseline methods on most standard MOO benchmark applications.
    摘要 研究者在各类科学研究与物理实验中不断探索更大、更复杂的搜索空间。然而,此类研究通常依赖复杂的仿真器或耗时的实验,使得探索和观测新的设计样本十分困难。以往面向此类应用的工作通常样本效率低下,且局限于向量搜索空间。为克服这些限制,本文提出了一种名为BREATHE的受限多目标优化(MOO)框架,不仅能搜索传统的基于向量的设计空间,还能搜索基于图的设计空间以获得性能最佳的图。该框架利用二阶梯度,并主动训练异方差代理模型,以实现样本高效的优化。在单目标向量优化应用中,其性能比次优基线(随机森林回归)高64.1%;在基于图的搜索中,比次优基线(基于高斯过程的贝叶斯优化的图版本)最高高出64.9%;在一项MOO任务中,其超体积最高可达最先进方法(多目标贝叶斯优化,MOBOpt)的21.9倍。在大多数标准MOO基准应用上,BREATHE同样优于各基线方法。

Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

  • paper_url: http://arxiv.org/abs/2308.08656
  • repo_url: None
  • paper_authors: Keziah Naggita, Julienne LaChance, Alice Xiang
  • for: investigate the limitations of standard Internet data collection methods in low- and middle-income countries
  • methods: analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa
  • results: findings for an ``othering'' phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers, and the need for further work to capture image data representative of African people and their environments to improve the applicability of computer vision models in a global context.
    Abstract Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an ``othering'' phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.
    摘要 大规模图像数据集中的偏差会随地理背景影响计算机视觉模型的性能。为探究标准互联网数据采集方法在中低收入国家的局限性,我们利用与非洲各国相关联的带地理标签的Flickr图像,对以人为中心的图像地理多样性进行了大规模分析。我们报告了可用数据的数量与内容,并与人口规模相当的欧洲国家进行比较,同时分析了数据按细粒度国内财富估计的分布情况。我们以两年为间隔进行时间维度分析,以揭示新出现的数据趋势。此外,我们还发现了一种"他者化"现象:相当数量的非洲图像由非本地摄影师拍摄。研究结果表明,仍需进一步工作来采集能够代表非洲人群及其环境的图像数据,最终提升计算机视觉模型在全球范围内的适用性。

Physics Informed Recurrent Neural Networks for Seismic Response Evaluation of Nonlinear Systems

  • paper_url: http://arxiv.org/abs/2308.08655
  • repo_url: None
  • paper_authors: Faisal Nissar Malik, James Ricles, Masoud Yari, Malik Arsala Nissar
  • for: 本研究旨在使用物理信息循环神经网络评估多自由度(MDOF)系统的动力响应,重点关注非线性结构的地震响应。
  • methods: 本研究使用物理信息循环神经网络(RNN)评估MDOF系统的动力响应,该方法利用大规模数据集和复杂算法学习输入与输出之间的复杂关系。
  • results: 研究人员预测了MDOF系统的动力响应,并与有限元分析(FEA)等最新方法进行比较,以评估物理信息RNN模型的有效性。
    Abstract Dynamic response evaluation in structural engineering is the process of determining the response of a structure, such as member forces, node displacements, etc when subjected to dynamic loads such as earthquakes, wind, or impact. This is an important aspect of structural analysis, as it enables engineers to assess structural performance under extreme loading conditions and make informed decisions about the design and safety of the structure. Conventional methods for dynamic response evaluation involve numerical simulations using finite element analysis (FEA), where the structure is modeled using finite elements, and the equations of motion are solved numerically. Although effective, this approach can be computationally intensive and may not be suitable for real-time applications. To address these limitations, recent advancements in machine learning, specifically artificial neural networks, have been applied to dynamic response evaluation in structural engineering. These techniques leverage large data sets and sophisticated algorithms to learn the complex relationship between inputs and outputs, making them ideal for such problems. In this paper, a novel approach is proposed for evaluating the dynamic response of multi-degree-of-freedom (MDOF) systems using physics-informed recurrent neural networks. The focus of this paper is to evaluate the seismic (earthquake) response of nonlinear structures. The predicted response will be compared to state-of-the-art methods such as FEA to assess the efficacy of the physics-informed RNN model.
    摘要 <> dynamically evaluate the response of a structure, such as member forces, node displacements, etc when subjected to dynamic loads such as earthquakes, wind, or impact. This is an important aspect of structural analysis, as it enables engineers to assess structural performance under extreme loading conditions and make informed decisions about the design and safety of the structure. Conventionally, methods for dynamic response evaluation involve numerical simulations using finite element analysis (FEA), where the structure is modeled using finite elements, and the equations of motion are solved numerically. Although effective, this approach can be computationally intensive and may not be suitable for real-time applications. To address these limitations, recent advancements in machine learning, specifically artificial neural networks, have been applied to dynamic response evaluation in structural engineering. These techniques leverage large data sets and sophisticated algorithms to learn the complex relationship between inputs and outputs, making them ideal for such problems. In this paper, a novel approach is proposed for evaluating the dynamic response of multi-degree-of-freedom (MDOF) systems using physics-informed recurrent neural networks. The focus of this paper is to evaluate the seismic (earthquake) response of nonlinear structures. The predicted response will be compared to state-of-the-art methods such as FEA to assess the efficacy of the physics-informed RNN model.Translated by Google Translate.

Reproducing Kernel Hilbert Space Pruning for Sparse Hyperspectral Abundance Prediction

  • paper_url: http://arxiv.org/abs/2308.08653
  • repo_url: None
  • paper_authors: Michael G. Rawson, Timothy Doster, Tegan Emerson
  • for: 本研究旨在提出一种基于希尔伯特空间变换的稀疏光谱压缩与分析方法,以降低高空间、高光谱分辨率数据的压缩与分析成本。
  • methods: 本研究通过非负最小二乘最小化在希尔伯特空间中进行剪枝并构建稀疏表示,并引入最大似然压缩向量以减少信息损失。
  • results: 在真实与合成数据上的评估表明,希尔伯特空间剪枝最多可减少相当于标准剪枝误差40%的误差,并优于标准剪枝、最小二乘方法以及神经网络自编码器。
    Abstract Hyperspectral measurements from long range sensors can give a detailed picture of the items, materials, and chemicals in a scene but analysis can be difficult, slow, and expensive due to high spatial and spectral resolutions of state-of-the-art sensors. As such, sparsity is important to enable the future of spectral compression and analytics. It has been observed that environmental and atmospheric effects, including scattering, can produce nonlinear effects posing challenges for existing source separation and compression methods. We present a novel transformation into Hilbert spaces for pruning and constructing sparse representations via non-negative least squares minimization. Then we introduce max likelihood compression vectors to decrease information loss. Our approach is benchmarked against standard pruning and least squares as well as deep learning methods. Our methods are evaluated in terms of overall spectral reconstruction error and compression rate using real and synthetic data. We find that pruning least squares methods converge quickly unlike matching pursuit methods. We find that Hilbert space pruning can reduce error by as much as 40% of the error of standard pruning and also outperform neural network autoencoders.

Towards Personalized Federated Learning via Heterogeneous Model Reassembly

  • paper_url: http://arxiv.org/abs/2308.08643
  • repo_url: None
  • paper_authors: Jiaqi Wang, Xingyi Yang, Suhan Cui, Liwei Che, Lingjuan Lyu, Dongkuan Xu, Fenglong Ma
  • for: address model heterogeneity in federated learning, where clients possess models with different network structures.
  • methods: proposes pFedHR, a framework that achieves personalized federated learning through heterogeneous model reassembly; heterogeneous model personalization is cast as a model-matching optimization task on the server side, and informative, diverse personalized candidates are generated automatically and dynamically with minimal human intervention.
  • results: pFedHR outperforms baselines on three datasets under both IID and non-IID settings, mitigates the adverse impact of using public data whose distribution differs from the client data, and automatically generates diverse personalized models.
    Abstract This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To tackle this problem, we propose a novel framework called pFedHR, which leverages heterogeneous model reassembly to achieve personalized federated learning. In particular, we approach the problem of heterogeneous model personalization as a model-matching optimization task on the server side. Moreover, pFedHR automatically and dynamically generates informative and diverse personalized candidates with minimal human intervention. Furthermore, our proposed heterogeneous model reassembly technique mitigates the adverse impact introduced by using public data with different distributions from the client data to a certain extent. Experimental results demonstrate that pFedHR outperforms baselines on three datasets under both IID and Non-IID settings. Additionally, pFedHR effectively reduces the adverse impact of using different public data and dynamically generates diverse personalized models in an automated manner.

Non-monotone Sequential Submodular Maximization

  • paper_url: http://arxiv.org/abs/2308.08641
  • repo_url: None
  • paper_authors: Shaojie Tang, Jing Yuan
  • for: study sequential submodular maximization: select and rank $k$ items from a ground set $V$ to maximize the weighted sum of $k$ (possibly non-monotone) submodular functions $f_1, \cdots, f_k: 2^V \rightarrow \mathbb{R}^+$, where each $f_j$ takes the first $j$ items of the sequence as input.
  • methods: effective algorithms for both flexible and fixed length constraints, as well as for the special case of identical utility functions.
  • results: empirical evaluations on video recommendation validate the proposed algorithms; the results are relevant to recommendation systems and assortment optimization, where the ordering of items significantly impacts the overall value obtained.
    Abstract In this paper, we study a fundamental problem in submodular optimization, which is called sequential submodular maximization. Specifically, we aim to select and rank a group of $k$ items from a ground set $V$ such that the weighted summation of $k$ (possibly non-monotone) submodular functions $f_1, \cdots ,f_k: 2^V \rightarrow \mathbb{R}^+$ is maximized, here each function $f_j$ takes the first $j$ items from this sequence as input. The existing research on sequential submodular maximization has predominantly concentrated on the monotone setting, assuming that the submodular functions are non-decreasing. However, in various real-world scenarios, like diversity-aware recommendation systems, adding items to an existing set might negatively impact the overall utility. In response, this paper pioneers the examination of the aforementioned problem with non-monotone submodular functions and offers effective solutions for both flexible and fixed length constraints, as well as a special case with identical utility functions. The empirical evaluations further validate the effectiveness of our proposed algorithms in the domain of video recommendations. The results of this research have implications in various fields, including recommendation systems and assortment optimization, where the ordering of items significantly impacts the overall value obtained.
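
The sketch below only illustrates the problem setup with a plain greedy baseline, not the paper's algorithms; the toy coverage functions are monotone for simplicity, whereas the paper handles the non-monotone case.

```python
# Hedged sketch: greedy baseline for sequential submodular maximization.
def objective(seq, fns, weights):
    # Weighted sum of f_j evaluated on the first j items of the sequence.
    return sum(w * f(seq[:j + 1]) for j, (f, w) in enumerate(zip(fns, weights))
               if j < len(seq))

def greedy_sequence(items, fns, weights, k):
    seq = []
    for _ in range(k):
        seq.append(max((x for x in items if x not in seq),
                       key=lambda x: objective(seq + [x], fns, weights)))
    return seq

# Toy coverage functions: value of a prefix = number of elements covered.
items = [frozenset(s) for s in ({1, 2}, {2, 3}, {3, 4}, {1, 4, 5})]
cover = lambda prefix: len(frozenset().union(*prefix))
fns, weights = [cover] * 3, [1.0, 0.5, 0.25]
print(greedy_sequence(items, fns, weights, k=3))
```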

Fair GANs through model rebalancing with synthetic data

  • paper_url: http://arxiv.org/abs/2308.08638
  • repo_url: None
  • paper_authors: Anubhav Jain, Nasir Memon, Julian Togelius
  • for: mitigate bias in deep generative models caused by training datasets that are not representative of the underlying (e.g. demographic) distribution.
  • methods: rebalance an existing generative adversarial network by generating balanced data from the unbalanced model via latent space exploration and retraining a balanced generative model on it; a bias mitigation loss function is also proposed that improves fairness even when training on unbalanced datasets.
  • results: for Stylegan2 trained on the FFHQ dataset for racial fairness, the approach improves the fairness metric by almost 5 times while maintaining image quality; it is further validated on an imbalanced Cifar-10, and the authors argue that traditional image quality metrics such as Frechet inception distance (FID) are unsuitable for bias mitigation problems.
    Abstract Deep generative models require large amounts of training data. This often poses a problem as the collection of datasets can be expensive and difficult, in particular datasets that are representative of the appropriate underlying distribution (e.g. demographic). This introduces biases in datasets which are further propagated in the models. We present an approach to mitigate biases in an existing generative adversarial network by rebalancing the model distribution. We do so by generating balanced data from an existing unbalanced deep generative model using latent space exploration and using this data to train a balanced generative model. Further, we propose a bias mitigation loss function that shows improvements in the fairness metric even when trained with unbalanced datasets. We show results for the Stylegan2 models while training on the FFHQ dataset for racial fairness and see that the proposed approach improves on the fairness metric by almost 5 times, whilst maintaining image quality. We further validate our approach by applying it to an imbalanced Cifar-10 dataset. Lastly, we argue that the traditionally used image quality metrics such as Frechet inception distance (FID) are unsuitable for bias mitigation problems.
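
A minimal sketch of the rebalancing idea, assuming a pretrained generator `G` and an attribute classifier `clf` as stand-ins: sample the latent space and keep images until every group has equal representation; the balanced synthetic set would then be used to retrain the generator.

```python
# Hedged sketch: latent-space rejection sampling to build a balanced
# synthetic dataset from an unbalanced generator. G and clf are assumed
# pretrained placeholders; z_dim and batch size are illustrative.
import torch

@torch.no_grad()
def sample_balanced(G, clf, n_per_group, n_groups, z_dim=512, batch=64):
    buckets = {g: [] for g in range(n_groups)}
    while any(len(v) < n_per_group for v in buckets.values()):
        z = torch.randn(batch, z_dim)
        imgs = G(z)
        for img, g in zip(imgs, clf(imgs).argmax(dim=1).tolist()):
            if len(buckets[g]) < n_per_group:
                buckets[g].append(img)     # keep only underfilled groups
    return {g: torch.stack(v) for g, v in buckets.items()}
```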

FedPop: Federated Population-based Hyperparameter Tuning

  • paper_url: http://arxiv.org/abs/2308.08634
  • repo_url: None
  • paper_authors: Haokun Chen, Denis Krompass, Jindong Gu, Volker Tresp
  • for: optimize hyperparameters in federated learning, where limited client compute makes "training-after-tuning" pipelines unsuitable.
  • methods: FedPop applies population-based evolutionary algorithms in an online "tuning-while-training" framework, covering hyperparameters on both the client and server sides.
  • results: substantially outperforms concurrent state-of-the-art hyperparameter tuning methods for federated learning on common benchmarks and complex real-world datasets.
    Abstract Federated Learning (FL) is a distributed machine learning (ML) paradigm, in which multiple clients collaboratively train ML models without centralizing their local data. Similar to conventional ML pipelines, the client local optimization and server aggregation procedure in FL are sensitive to the hyperparameter (HP) selection. Despite extensive research on tuning HPs for centralized ML, these methods yield suboptimal results when employed in FL. This is mainly because their "training-after-tuning" framework is unsuitable for FL with limited client computation power. While some approaches have been proposed for HP-Tuning in FL, they are limited to the HPs for client local updates. In this work, we propose a novel HP-tuning algorithm, called Federated Population-based Hyperparameter Tuning (FedPop), to address this vital yet challenging problem. FedPop employs population-based evolutionary algorithms to optimize the HPs, which accommodates various HP types at both client and server sides. Compared with prior tuning methods, FedPop employs an online "tuning-while-training" framework, offering computational efficiency and enabling the exploration of a broader HP search space. Our empirical validation on the common FL benchmarks and complex real-world FL datasets demonstrates the effectiveness of the proposed method, which substantially outperforms the concurrent state-of-the-art HP tuning methods for FL.
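
A hedged sketch of a population-based "tuning-while-training" loop in the spirit of FedPop: `train_rounds` and `evaluate` are placeholders for a federated pipeline, and the exploit/explore rule shown is the generic population-based-training recipe rather than the paper's exact procedure.

```python
# Hedged sketch: each population member trains for a few FL rounds, then
# the worst members copy the best ones' state and perturb their HPs.
import copy
import random

def perturb(hp):
    return {k: v * random.choice([0.8, 1.2]) for k, v in hp.items()}

def fedpop(init_pop, train_rounds, evaluate, generations=10, frac=0.25):
    pop = [{"hp": hp, "state": None} for hp in init_pop]
    for _ in range(generations):
        for m in pop:
            m["state"] = train_rounds(m["hp"], m["state"])
            m["score"] = evaluate(m["state"])
        pop.sort(key=lambda m: m["score"], reverse=True)
        n = max(1, int(frac * len(pop)))
        for loser, winner in zip(pop[-n:], pop[:n]):
            loser["state"] = copy.deepcopy(winner["state"])  # exploit
            loser["hp"] = perturb(winner["hp"])              # explore
    return pop[0]["hp"]
```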

LSTM-Based Forecasting Model for GRACE Accelerometer Data

  • paper_url: http://arxiv.org/abs/2308.08621
  • repo_url: https://github.com/darbeheshti/lstm-based-analysis-for-grace-accelerometers
  • paper_authors: Neda Darbeheshti, Elahe Moradi
  • for: fill data gaps in the GRACE satellite mission record and forecast GRACE accelerometer data.
  • methods: trains a Long Short-Term Memory (LSTM) network to predict accelerometer data for all three axes.
  • results: the LSTM forecasting model effectively fills gaps and forecasts GRACE accelerometer data.
    Abstract The Gravity Recovery and Climate Experiment (GRACE) satellite mission, spanning from 2002 to 2017, has provided a valuable dataset for monitoring variations in Earth's gravity field, enabling diverse applications in geophysics and hydrology. The mission was followed by GRACE Follow-On in 2018, continuing data collection efforts. The monthly Earth gravity field, derived from the integration of data from different instruments onboard the satellites, has shown inconsistencies due to various factors, including gaps in observations for certain instruments since the beginning of the GRACE mission. With over two decades of GRACE and GRACE Follow-On data now available, this paper proposes an approach to fill the data gaps and forecast GRACE accelerometer data. Specifically, we focus on accelerometer data and employ Long Short-Term Memory (LSTM) networks to train a model capable of predicting accelerometer data for all three axes. In this study, we describe the methodology used to preprocess the accelerometer data, prepare it for LSTM training, and evaluate the model's performance. Through experimentation and validation, we assess the model's accuracy and its ability to predict accelerometer data for the three axes. Our results demonstrate the effectiveness of the LSTM forecasting model in filling gaps and forecasting GRACE accelerometer data.
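
As a minimal stand-in for the paper's model, the sketch below shows a windowed LSTM mapping the past `lookback` three-axis samples to the next reading; the window length and layer sizes are illustrative assumptions.

```python
# Hedged sketch: one-step-ahead forecasting of 3-axis accelerometer data.
import torch
import torch.nn as nn

class AccelLSTM(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=3, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 3)      # next sample, all three axes

    def forward(self, window):               # window: (batch, lookback, 3)
        h, _ = self.lstm(window)
        return self.out(h[:, -1])            # (batch, 3)

def make_windows(series, lookback=60):
    # series: (T, 3) tensor of accelerometer readings -> supervised pairs
    X = torch.stack([series[i:i + lookback]
                     for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y
```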

Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought

  • paper_url: http://arxiv.org/abs/2308.08614
  • repo_url: None
  • paper_authors: Bin Lei, pei-Hung Lin, Chunhua Liao, Caiwen Ding
  • for: improve multi-step logical reasoning in large language models.
  • methods: proposes the Graph of Thoughts (GoT) prompting technique.
  • results: accuracy improvements of $89.7\%$, $86\%$, and $56\%$ over GPT-4 on the 24-point game, high-degree polynomial equations, and recursive-sequence formulas, and average gains of $23\%$, $24\%$, and $15\%$ over the state-of-the-art Tree of Thought (ToT) prompting method.
    Abstract Recent advancements in large-scale models, such as GPT-4, have showcased remarkable capabilities in addressing standard queries. However, when facing complex problems that require multi-step logical reasoning, their accuracy dramatically decreases. Current research has explored the realm of \textit{prompting engineering} to bolster the inferential capacities of these models. Our paper unveils a pioneering prompting technique, dubbed \textit{Graph of Thoughts (GoT)}. Through testing on a trio of escalating challenges: the 24-point game, resolution of high-degree polynomial equations, and derivation of formulas for recursive sequences, our method outperformed GPT-4, achieving accuracy improvements of $89.7\%$, $86\%$, and $56\%$ for each respective task. Moreover, when juxtaposed with the state-of-the-art (SOTA) prompting method, \textit{Tree of Thought (ToT)}, our approach registered an average accuracy boost of $23\%$, $24\%$, and $15\%$.
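
The controller below is only a schematic of what a graph-of-thoughts style loop might look like, not the paper's method: `llm(prompt, n)` and `score(text)` are assumed callables, thoughts expand the frontier, and a merge step lets branches recombine, which is what distinguishes a graph from a tree of thoughts.

```python
# Hedged sketch: expand, merge, and rescore a frontier of partial solutions.
def got_solve(llm, score, question, width=3, depth=3):
    # llm(prompt, n) -> list of n completions; score(text) -> float.
    frontier = [question]
    for _ in range(depth):
        thoughts = [t for node in frontier
                    for t in llm(f"Expand one reasoning step:\n{node}", n=width)]
        merged = llm("Merge these partial solutions into one:\n"
                     + "\n".join(thoughts), n=1)       # graph edge: recombine
        frontier = sorted(thoughts + merged, key=score)[-width:]
    return max(frontier, key=score)
```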

Integrating Renewable Energy in Agriculture: A Deep Reinforcement Learning-based Approach

  • paper_url: http://arxiv.org/abs/2308.08611
  • repo_url: None
  • paper_authors: A. Wahid, I faiud, K. Mason
  • for: help agricultural investors optimize decisions about photovoltaic (PV) system installations using Deep Q-Networks (DQNs).
  • methods: develops a DQN framework that supports data-driven decisions accounting for installation budget, government incentives, energy requirements, system cost, and long-term benefits.
  • results: with a reward mechanism, the DQN learns data-driven PV integration decisions, helping investors improve energy efficiency, reduce environmental impact, and enhance profitability.
    Abstract This article investigates the use of Deep Q-Networks (DQNs) to optimize decision-making for photovoltaic (PV) systems installations in the agriculture sector. The study develops a DQN framework to assist agricultural investors in making informed decisions considering factors such as installation budget, government incentives, energy requirements, system cost, and long-term benefits. By implementing a reward mechanism, the DQN learns to make data-driven decisions on PV integration. The analysis provides a comprehensive understanding of how DQNs can support investors in making decisions about PV installations in agriculture. This research has significant implications for promoting sustainable and efficient farming practices while also paving the way for future advancements in this field. By leveraging DQNs, agricultural investors can make optimized decisions that improve energy efficiency, reduce environmental impact, and enhance profitability. This study contributes to the advancement of PV integration in agriculture and encourages further innovation in this promising area.
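
For readers unfamiliar with DQNs, here is the standard temporal-difference update the framework builds on; the four state features and five sizing actions are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch: core DQN update with a replay minibatch and target network.
import torch
import torch.nn as nn
import torch.nn.functional as F

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 5))
target = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 5))
target.load_state_dict(q_net.state_dict())   # target net, synced periodically
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_step(batch, gamma=0.99):
    s, a, r, s2, done = batch                # minibatch from a replay buffer
    q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = r + gamma * (1 - done) * target(s2).max(dim=1).values
    loss = F.smooth_l1_loss(q, y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```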

Atom-by-atom protein generation and beyond with language models

  • paper_url: http://arxiv.org/abs/2308.09482
  • repo_url: None
  • paper_authors: Daniel Flam-Shepherd, Kevin Zhu, Alán Aspuru-Guzik
  • for: investigate whether language models can learn atom-level representations of proteins and generate molecules beyond standard protein space.
  • methods: chemical language models that learn atom-level representations covering every atom, bond, and ring, applied to entire proteins.
  • results: the models generate proteins atom by atom, including proteins with modified sidechains that form unnatural amino acids, and can explore chemical and protein space simultaneously to generate novel protein-drug conjugates.
    Abstract Protein language models learn powerful representations directly from sequences of amino acids. However, they are constrained to generate proteins with only the set of amino acids represented in their vocabulary. In contrast, chemical language models learn atom-level representations of smaller molecules that include every atom, bond, and ring. In this work, we show that chemical language models can learn atom-level representations of proteins enabling protein generation unconstrained to the standard genetic code and far beyond it. In doing so, we show that language models can generate entire proteins atom by atom -- effectively learning the multiple hierarchical layers of molecular information that define proteins from their primary sequence to their secondary, and tertiary structure. We demonstrate language models are able to explore beyond protein space -- generating proteins with modified sidechains that form unnatural amino acids. Even further, we find that language models can explore chemical space and protein space simultaneously and generate novel examples of protein-drug conjugates. The results demonstrate the potential for biomolecular design at the atom level using language models.

Proprioceptive Learning with Soft Polyhedral Networks

  • paper_url: http://arxiv.org/abs/2308.08538
  • repo_url: None
  • paper_authors: Xiaobo Liu, Xudong Han, Wei Hong, Fang Wan, Chaoyang Song
  • for: develop a soft robotic network with embedded vision that achieves adaptive kinesthesia and viscoelastic proprioception with high-accuracy force sensing for robotics applications.
  • methods: a Soft Polyhedral Network with a miniature high-speed motion tracking system embedded inside learns kinetic features for proprioception; a creep and relaxation modifier refines predictions during static adaptation.
  • results: the network infers real-time 6D forces and torques with accuracies of 0.25/0.24/0.35 N and 0.025/0.034/0.006 Nm in dynamic interactions; the design combines simplicity, omni-adaptation, and proprioceptive sensing at low cost, supporting more than 1 million use cycles for tasks such as sensitive and competitive grasping and touch-based geometry reconstruction.
    Abstract Proprioception is the "sixth sense" that detects limb postures with motor neurons. It requires a natural integration between the musculoskeletal systems and sensory receptors, which is challenging among modern robots that aim for lightweight, adaptive, and sensitive designs at a low cost. Here, we present the Soft Polyhedral Network with an embedded vision for physical interactions, capable of adaptive kinesthesia and viscoelastic proprioception by learning kinetic features. This design enables passive adaptations to omni-directional interactions, visually captured by a miniature high-speed motion tracking system embedded inside for proprioceptive learning. The results show that the soft network can infer real-time 6D forces and torques with accuracies of 0.25/0.24/0.35 N and 0.025/0.034/0.006 Nm in dynamic interactions. We also incorporate viscoelasticity in proprioception during static adaptation by adding a creep and relaxation modifier to refine the predicted results. The proposed soft network combines simplicity in design, omni-adaptation, and proprioceptive sensing with high accuracy, making it a versatile solution for robotics at a low cost with more than 1 million use cycles for tasks such as sensitive and competitive grasping, and touch-based geometry reconstruction. This study offers new insights into vision-based proprioception for soft robots in adaptive grasping, soft manipulation, and human-robot interaction.

Can Transformers Learn Optimal Filtering for Unknown Systems?

  • paper_url: http://arxiv.org/abs/2308.08536
  • repo_url: None
  • paper_authors: Haldun Balim, Zhe Du, Samet Oymak, Necmiye Ozay
  • for: use transformers to solve the optimal output estimation problem for dynamical systems, adapting quickly to previously unseen systems.
  • methods: a transformer is trained on systems drawn from a prior distribution to generate output predictions from all past outputs.
  • results: the resulting meta-output-predictor (MOP) matches the Kalman-filter-based optimal estimator on most linear systems without access to a model and performs well under non-i.i.d. noise, time-varying dynamics, and nonlinear dynamics such as a quadrotor with unknown parameters.
    Abstract Transformers have demonstrated remarkable success in natural language processing; however, their potential remains mostly unexplored for problems arising in dynamical systems. In this work, we investigate the optimal output estimation problem using transformers, which generate output predictions using all the past ones. We train the transformer using various systems drawn from a prior distribution and then evaluate its performance on previously unseen systems from the same distribution. As a result, the obtained transformer acts like a prediction algorithm that learns in-context and quickly adapts to and predicts well for different systems - thus we call it meta-output-predictor (MOP). MOP matches the performance of the optimal output estimator, based on Kalman filter, for most linear dynamical systems even though it does not have access to a model. We observe via extensive numerical experiments that MOP also performs well in challenging scenarios with non-i.i.d. noise, time-varying dynamics, and nonlinear dynamics like a quadrotor system with unknown parameters. To further support this observation, in the second part of the paper, we provide statistical guarantees on the performance of MOP and quantify the required amount of training to achieve a desired excess risk during test-time. Finally, we point out some limitations of MOP by identifying two classes of problems MOP fails to perform well, highlighting the need for caution when using transformers for control and estimation.
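
A hedged sketch of the meta-training setup: sample linear systems from a prior, roll out noisy trajectories, and train a sequence model to predict each output from all past ones; `model` abstracts the transformer and all dimensions are illustrative.

```python
# Hedged sketch: in-context output prediction across a distribution of
# random stable linear systems.
import torch

def sample_system(n=4, p=2):
    A = 0.95 * torch.linalg.qr(torch.randn(n, n))[0]   # stable-ish dynamics
    C = torch.randn(p, n)
    return A, C

def rollout(A, C, T=50, noise=0.1):
    x = torch.randn(A.shape[0])
    ys = []
    for _ in range(T):
        x = A @ x + noise * torch.randn_like(x)
        ys.append(C @ x + noise * torch.randn(C.shape[0]))
    return torch.stack(ys)                              # (T, p)

def meta_loss(model, batch_size=16):
    loss = 0.0
    for _ in range(batch_size):
        y = rollout(*sample_system()).unsqueeze(0)      # (1, T, p)
        pred = model(y[:, :-1])                         # predict y_t from y_<t
        loss = loss + ((pred - y[:, 1:]) ** 2).mean()
    return loss / batch_size
```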

Painter: Teaching Auto-regressive Language Models to Draw Sketches

  • paper_url: http://arxiv.org/abs/2308.08520
  • repo_url: None
  • paper_authors: Reza Pourreza, Apratim Bhattacharyya, Sunny Panchal, Mingu Lee, Pulkit Madan, Roland Memisevic
  • for: apply large language models (LLMs) to image generation by directly generating the virtual brush strokes that paint an image.
  • methods: an off-the-shelf LLM pre-trained on a large text corpus is fine-tuned on the new task while preserving its language understanding, using a dataset of diverse multi-object sketches paired with textual prompts.
  • results: Painter converts text descriptions to sketches by generating brush strokes auto-regressively, removes objects from the canvas, and detects and classifies objects in sketches, with very encouraging results.
    Abstract Large language models (LLMs) have made tremendous progress in natural language understanding and they have also been successfully adopted in other domains such as computer vision, robotics, reinforcement learning, etc. In this work, we apply LLMs to image generation tasks by directly generating the virtual brush strokes to paint an image. We present Painter, an LLM that can convert user prompts in text description format to sketches by generating the corresponding brush strokes in an auto-regressive way. We construct Painter based on off-the-shelf LLM that is pre-trained on a large text corpus, by fine-tuning it on the new task while preserving language understanding capabilities. We create a dataset of diverse multi-object sketches paired with textual prompts that covers several object types and tasks. Painter can generate sketches from text descriptions, remove objects from canvas, and detect and classify objects in sketches. Although this is an unprecedented pioneering work in using LLMs for auto-regressive image generation, the results are very encouraging.

Two-and-a-half Order Score-based Model for Solving 3D Ill-posed Inverse Problems

  • paper_url: http://arxiv.org/abs/2308.08511
  • repo_url: None
  • paper_authors: Zirong Li, Yanyang Wang, Jianjia Zhang, Weiwen Wu, Hengyong Yu
  • for: improve the accuracy of three-dimensional (3D) volumetric reconstruction in CT and MRI.
  • methods: a two-and-a-half order score-based model (TOSM) learns 2D data distributions during training, then updates the distribution in 3D space during reconstruction using complementary scores along the sagittal, coronal, and transaxial directions.
  • results: extensive experiments on large-scale sparse-view CT and fast MRI datasets achieve state-of-the-art results on 3D ill-posed inverse problems, resolving the inter-slice inconsistency issue and yielding high-quality 3D volumetric reconstructions.
    Abstract Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) are crucial technologies in the field of medical imaging. Score-based models have proven to be effective in addressing different inverse problems encountered in CT and MRI, such as sparse-view CT and fast MRI reconstruction. However, these models face challenges in achieving accurate three dimensional (3D) volumetric reconstruction. The existing score-based models primarily focus on reconstructing two dimensional (2D) data distribution, leading to inconsistencies between adjacent slices in the reconstructed 3D volumetric images. To overcome this limitation, we propose a novel two-and-a-half order score-based model (TOSM). During the training phase, our TOSM learns data distributions in 2D space, which reduces the complexity of training compared to directly working on 3D volumes. However, in the reconstruction phase, the TOSM updates the data distribution in 3D space, utilizing complementary scores along three directions (sagittal, coronal, and transaxial) to achieve a more precise reconstruction. The development of TOSM is built on robust theoretical principles, ensuring its reliability and efficacy. Through extensive experimentation on large-scale sparse-view CT and fast MRI datasets, our method demonstrates remarkable advancements and attains state-of-the-art results in solving 3D ill-posed inverse problems. Notably, the proposed TOSM effectively addresses the inter-slice inconsistency issue, resulting in high-quality 3D volumetric reconstruction.

Autoencoding a Soft Touch to Learn Grasping from On-land to Underwater

  • paper_url: http://arxiv.org/abs/2308.08510
  • repo_url: https://github.com/bionicdl-sustech/amphibioussoftfinger
  • paper_authors: Ning Guo, Xudong Han, Xiaobo Liu, Shuqiao Zhong, Zhiyuan Zhou, Jian Lin, Jiansheng Dai, Fang Wan, Chaoyang Song
  • for: investigate the transferability of grasping knowledge from on-land to underwater
  • methods: vision-based soft robotic finger, Supervised Variational Autoencoder (SVAE)
  • results: superior adaptation to changing environments, soft, delicate, and reactive grasping, improved reliability and robustness at a much-reduced cost
    Abstract Robots play a critical role as the physical agent of human operators in exploring the ocean. However, it remains challenging to grasp objects reliably while fully submerging under a highly pressurized aquatic environment with little visible light, mainly due to the fluidic interference on the tactile mechanics between the finger and object surfaces. This study investigates the transferability of grasping knowledge from on-land to underwater via a vision-based soft robotic finger that learns 6D forces and torques (FT) using a Supervised Variational Autoencoder (SVAE). A high-framerate camera captures the whole-body deformations while a soft robotic finger interacts with physical objects on-land and underwater. Results show that the trained SVAE model learned a series of latent representations of the soft mechanics transferrable from land to water, presenting a superior adaptation to the changing environments against commercial FT sensors. Soft, delicate, and reactive grasping enabled by tactile intelligence enhances the gripper's underwater interaction with improved reliability and robustness at a much-reduced cost, paving the path for learning-based intelligent grasping to support fundamental scientific discoveries in environmental and ocean research.

ResBuilder: Automated Learning of Depth with Residual Structures

  • paper_url: http://arxiv.org/abs/2308.08504
  • repo_url: None
  • paper_authors: Julian Burghoff, Matthias Rottmann, Jill von Conta, Sebastian Schoenen, Andreas Witte, Hanno Gottschalk
  • for: develop a neural architecture search algorithm that produces ResNet models with high accuracy at moderate computational cost.
  • methods: Resbuilder builds ResNet architectures from scratch and can modify existing ones by removing and inserting ResNet blocks, thereby searching the space of ResNet architectures.
  • results: on several image classification datasets, Resbuilder achieves close to state-of-the-art accuracy while saving computational cost compared to off-the-shelf ResNets; parameters tuned once on CIFAR10 serve as a suitable default, generalizing even to a proprietary industrial fraud detection dataset.
    Abstract In this work, we develop a neural architecture search algorithm, termed Resbuilder, that develops ResNet architectures from scratch that achieve high accuracy at moderate computational cost. It can also be used to modify existing architectures and has the capability to remove and insert ResNet blocks, in this way searching for suitable architectures in the space of ResNet architectures. In our experiments on different image classification datasets, Resbuilder achieves close to state-of-the-art performance while saving computational cost compared to off-the-shelf ResNets. Noteworthy, we once tune the parameters on CIFAR10 which yields a suitable default choice for all other datasets. We demonstrate that this property generalizes even to industrial applications by applying our method with default parameters on a proprietary fraud detection dataset.
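
A minimal sketch of the two search moves described above, inserting or deleting a residual block and keeping the candidate if retraining improves validation accuracy; `make_block` and `train_eval` are placeholders, and the greedy acceptance rule is one plausible strategy rather than Resbuilder's actual procedure.

```python
# Hedged sketch: hill-climbing over ResNet depth by block insertion/removal.
import copy
import random

def mutate(blocks, make_block):
    new = copy.deepcopy(blocks)
    if new and random.random() < 0.5:
        del new[random.randrange(len(new))]                       # remove a block
    else:
        new.insert(random.randrange(len(new) + 1), make_block())  # insert one
    return new

def search(blocks, make_block, train_eval, steps=20):
    best, best_acc = blocks, train_eval(blocks)
    for _ in range(steps):
        cand = mutate(best, make_block)
        acc = train_eval(cand)          # short retrain + validation accuracy
        if acc > best_acc:
            best, best_acc = cand, acc
    return best
```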

Time Travel in LLMs: Tracing Data Contamination in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.08493
  • repo_url: None
  • paper_authors: Shahriar Golchin, Mihai Surdeanu
  • for: identify data contamination within large language models (LLMs) to better understand their effectiveness on downstream tasks.
  • methods: "guided instruction" prompts flag potential contamination in individual instances; an entire dataset partition is then judged contaminated either from the average overlap score with reference instances or via a GPT-4 classifier with in-context learning prompting.
  • results: the approach detects contamination with an accuracy between 92% and 100% across seven datasets, and finds that GPT-4 is contaminated with the AG News, WNLI, and XSum datasets.
    Abstract Data contamination, i.e., the presence of test data from downstream tasks in the training data of large language models (LLMs), is a potential major issue in understanding LLMs' effectiveness on other tasks. We propose a straightforward yet effective method for identifying data contamination within LLMs. At its core, our approach starts by identifying potential contamination in individual instances that are drawn from a small random sample; using this information, our approach then assesses if an entire dataset partition is contaminated. To estimate contamination of individual instances, we employ "guided instruction:" a prompt consisting of the dataset name, partition type, and the initial segment of a reference instance, asking the LLM to complete it. An instance is flagged as contaminated if the LLM's output either exactly or closely matches the latter segment of the reference. To understand if an entire partition is contaminated, we propose two ideas. The first idea marks a dataset partition as contaminated if the average overlap score with the reference instances (as measured by ROUGE or BLEURT) is statistically significantly better with the guided instruction vs. a general instruction that does not include the dataset and partition name. The second idea marks a dataset as contaminated if a classifier based on GPT-4 with in-context learning prompting marks multiple instances as contaminated. Our best method achieves an accuracy between 92% and 100% in detecting if an LLM is contaminated with seven datasets, containing train and test/validation partitions, when contrasted with manual evaluation by human expert. Further, our findings indicate that GPT-4 is contaminated with AG News, WNLI, and XSum datasets.
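
A sketch of the guided-instruction probe under stated assumptions: `llm` is a placeholder callable, the prompt wording is illustrative, and `difflib` stands in for the ROUGE/BLEURT overlap scorers used in the paper.

```python
# Hedged sketch: prompt with dataset name, split, and the first half of a
# reference instance, then measure how closely the completion reproduces
# the true second half.
from difflib import SequenceMatcher

def guided_prompt(dataset, split, first_half):
    return (f"Instruction: complete the following instance from the "
            f"{split} split of the {dataset} dataset.\n{first_half}")

def contamination_score(llm, dataset, split, instance, frac=0.5):
    cut = int(len(instance) * frac)
    first, second = instance[:cut], instance[cut:]
    completion = llm(guided_prompt(dataset, split, first))
    return SequenceMatcher(None, completion, second).ratio()
```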

Label Propagation Techniques for Artifact Detection in Imbalanced Classes using Photoplethysmogram Signals

  • paper_url: http://arxiv.org/abs/2308.08480
  • repo_url: None
  • paper_authors: Clara Macabiau, Thanh-Dung Le, Kevin Albert, Philippe Jouvet, Rita Noumeir
  • for: improve artifact labeling of photoplethysmogram (PPG) signals, particularly in imbalanced class scenarios where clean samples are vastly outnumbered by artifact-contaminated ones, to enhance the reliability of PPG-based health monitoring.
  • methods: label propagation techniques spread labels among PPG samples; for artifact classification, supervised classifiers (e.g. KNN, MLP, Transformers, FCN) are compared with the semi-supervised label propagation algorithm.
  • results: with 91% precision, 90% recall, and a 90% F1 score for the artifact-free class, label propagation labels the medical dataset effectively even when clean samples are rare; the KNN supervised model performs well (89% precision, 95% recall, 92% F1), but the semi-supervised algorithm is better at detecting artifacts.
    Abstract Photoplethysmogram (PPG) signals are widely used in healthcare for monitoring vital signs, but they are susceptible to motion artifacts that can lead to inaccurate interpretations. In this study, the use of label propagation techniques to propagate labels among PPG samples is explored, particularly in imbalanced class scenarios where clean PPG samples are significantly outnumbered by artifact-contaminated samples. With a precision of 91%, a recall of 90% and an F1 score of 90% for the class without artifacts, the results demonstrate its effectiveness in labeling a medical dataset, even when clean samples are rare. For the classification of artifacts our study compares supervised classifiers such as conventional classifiers and neural networks (MLP, Transformers, FCN) with the semi-supervised label propagation algorithm. With a precision of 89%, a recall of 95% and an F1 score of 92%, the KNN supervised model gives good results, but the semi-supervised algorithm performs better in detecting artifacts. The findings suggest that the semi-supervised algorithm label propagation hold promise for artifact detection in PPG signals, which can enhance the reliability of PPG-based health monitoring systems in real-world applications.
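
A minimal sketch with scikit-learn's `LabelPropagation`, where unlabeled PPG segments carry the conventional `-1` label; the per-segment statistics used as features here are illustrative, not the paper's feature set.

```python
# Hedged sketch: semi-supervised labeling of PPG segments.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

def segment_features(segments):
    # segments: (n, length) array of PPG windows -> basic shape statistics
    return np.column_stack([segments.mean(axis=1), segments.std(axis=1),
                            np.ptp(segments, axis=1)])

def propagate_labels(segments, labels):
    # labels: 0 = clean, 1 = artifact, -1 = unlabeled
    X = segment_features(segments)
    model = LabelPropagation(kernel="knn", n_neighbors=7)
    model.fit(X, labels)
    return model.transduction_          # labels inferred for every segment
```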

LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs

  • paper_url: http://arxiv.org/abs/2308.08469
  • repo_url: None
  • paper_authors: Ching Chang, Wen-Chih Peng, Tien-Fu Chen
  • for: LLM4TS enhances time-series forecasting by leveraging pre-trained Large Language Models (LLMs).
  • methods: combines time-series patching with temporal encoding, applies a two-stage fine-tuning process (supervised fine-tuning to orient the LLM toward time-series data, then task-specific downstream fine-tuning), and uses Parameter-Efficient Fine-Tuning (PEFT) techniques to adapt the pre-trained LLM without extensive parameter adjustments.
  • results: achieves state-of-the-art results in long-term forecasting and proves to be both a robust representation learner and an effective few-shot learner.
    Abstract In this work, we leverage pre-trained Large Language Models (LLMs) to enhance time-series forecasting. Mirroring the growing interest in unifying models for Natural Language Processing and Computer Vision, we envision creating an analogous model for long-term time-series forecasting. Due to limited large-scale time-series data for building robust foundation models, our approach LLM4TS focuses on leveraging the strengths of pre-trained LLMs. By combining time-series patching with temporal encoding, we have enhanced the capability of LLMs to handle time-series data effectively. Inspired by the supervised fine-tuning in chatbot domains, we prioritize a two-stage fine-tuning process: first conducting supervised fine-tuning to orient the LLM towards time-series data, followed by task-specific downstream fine-tuning. Furthermore, to unlock the flexibility of pre-trained LLMs without extensive parameter adjustments, we adopt several Parameter-Efficient Fine-Tuning (PEFT) techniques. Drawing on these innovations, LLM4TS has yielded state-of-the-art results in long-term forecasting. Our model has also shown exceptional capabilities as both a robust representation learner and an effective few-shot learner, thanks to the knowledge transferred from the pre-trained LLM.
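
The patching step can be illustrated in a few lines; the patch length, stride, and flattening below are assumptions, and in the full pipeline the resulting patch tokens would be embedded with temporal encodings and fed to the PEFT-adapted LLM backbone.

```python
# Hedged sketch: turn a multivariate series into flat patch "tokens".
import torch

def patchify(series, patch_len=16, stride=8):
    # series: (T, C) -> (num_patches, patch_len * C)
    windows = series.unfold(0, patch_len, stride)   # (N, C, patch_len)
    return windows.reshape(windows.shape[0], -1)

x = torch.randn(128, 3)
print(patchify(x).shape)                            # torch.Size([15, 48])
```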

An Expert’s Guide to Training Physics-informed Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08468
  • repo_url: https://github.com/predictiveintelligencelab/jaxpi
  • paper_authors: Sifan Wang, Shyam Sankaran, Hanwen Wang, Paris Perdikaris
  • for: present best practices and challenging benchmark problems for training physics-informed neural networks (PINNs), improving training efficiency and overall accuracy.
  • methods: comprehensive, fully reproducible ablation studies over architecture choices and training strategies, released with a highly optimized JAX library.
  • results: the proposed practices yield state-of-the-art results on the benchmarks and provide strong baselines for future studies.
    Abstract Physics-informed neural networks (PINNs) have been popularized as a deep learning framework that can seamlessly synthesize observational data and partial differential equation (PDE) constraints. Their practical effectiveness however can be hampered by training pathologies, but also oftentimes by poor choices made by users who lack deep learning expertise. In this paper we present a series of best practices that can significantly improve the training efficiency and overall accuracy of PINNs. We also put forth a series of challenging benchmark problems that highlight some of the most prominent difficulties in training PINNs, and present comprehensive and fully reproducible ablation studies that demonstrate how different architecture choices and training strategies affect the test accuracy of the resulting models. We show that the methods and guiding principles put forth in this study lead to state-of-the-art results and provide strong baselines that future studies should use for comparison purposes. To this end, we also release a highly optimized library in JAX that can be used to reproduce all results reported in this paper, enable future research studies, as well as facilitate easy adaptation to new use-case scenarios.
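
A minimal PINN loss for the 1D Poisson problem u''(x) = f(x), just to fix notation for the training pathologies the paper discusses; the network size and residual/boundary weighting are illustrative, and the authors' library is in JAX rather than the PyTorch used here.

```python
# Hedged sketch: PDE residual + boundary loss via autograd.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32),
                    nn.Tanh(), nn.Linear(32, 1))

def pinn_loss(x_interior, x_bnd, u_bnd, f, lam=1.0):
    x = x_interior.clone().requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    pde = ((d2u - f(x)) ** 2).mean()          # PDE residual
    bc = ((net(x_bnd) - u_bnd) ** 2).mean()   # boundary condition
    return pde + lam * bc
```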

On Neural Quantum Support Vector Machines

  • paper_url: http://arxiv.org/abs/2308.08467
  • repo_url: https://github.com/GhadaAbdulsalam/Explainable_Heart_Disease_Prediction_Using_Ensemble-Quantum_ML
  • paper_authors: Lars Simon, Manuel Radons
  • for: introduce neural quantum support vector machines, i.e. NSVMs with a quantum kernel, extending earlier feasibility results for four NSVM training algorithms.
  • methods: quantum kernels combined with the four training algorithms of \cite{simon2023algorithms}.
  • results: the training algorithms and feasibility results carry over to the quantum-kernel setting.
    Abstract In \cite{simon2023algorithms} we introduced four algorithms for the training of neural support vector machines (NSVMs) and demonstrated their feasibility. In this note we introduce neural quantum support vector machines, that is, NSVMs with a quantum kernel, and extend our results to this setting.

Hierarchical Uncertainty Estimation for Medical Image Segmentation Networks

  • paper_url: http://arxiv.org/abs/2308.08465
  • repo_url: None
  • paper_authors: Xinyu Bai, Wenjia Bai
  • for: This paper aims to build a trustworthy medical image segmentation model by estimating the uncertainty of the model prediction.
  • methods: The proposed method leverages the hierarchical encoder architecture of state-of-the-art image segmentation networks, and uses a skip-connection module to model multi-level uncertainties.
  • results: The proposed method can achieve high segmentation performance and provide meaningful uncertainty maps that can be used for out-of-distribution detection.
    Abstract Learning a medical image segmentation model is an inherently ambiguous task, as uncertainties exist in both images (noise) and manual annotations (human errors and bias) used for model training. To build a trustworthy image segmentation model, it is important to not just evaluate its performance but also estimate the uncertainty of the model prediction. Most state-of-the-art image segmentation networks adopt a hierarchical encoder architecture, extracting image features at multiple resolution levels from fine to coarse. In this work, we leverage this hierarchical image representation and propose a simple yet effective method for estimating uncertainties at multiple levels. The multi-level uncertainties are modelled via the skip-connection module and then sampled to generate an uncertainty map for the predicted image segmentation. We demonstrate that a deep learning segmentation network such as U-net, when implemented with such hierarchical uncertainty estimation module, can achieve a high segmentation performance, while at the same time provide meaningful uncertainty maps that can be used for out-of-distribution detection.
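
One way to realize multi-level uncertainty estimation is sketched below: a small head at each decoder resolution produces a per-level map, and the maps are upsampled and fused; the sigmoid heads and mean fusion are assumptions for illustration, not the paper's exact module.

```python
# Hedged sketch: per-resolution uncertainty heads on a U-Net-style decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLevelUncertainty(nn.Module):
    def __init__(self, channels=(256, 128, 64)):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(c, 1, kernel_size=1)
                                   for c in channels)

    def forward(self, decoder_feats, out_size):
        # decoder_feats: list of (B, C_i, H_i, W_i), coarse to fine
        maps = [F.interpolate(torch.sigmoid(h(f)), size=out_size,
                              mode="bilinear", align_corners=False)
                for h, f in zip(self.heads, decoder_feats)]
        return torch.stack(maps).mean(0)     # fused uncertainty map
```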