cs.LG - 2023-08-18

Revisiting Skin Tone Fairness in Dermatological Lesion Classification

  • paper_url: http://arxiv.org/abs/2308.09640
  • repo_url: https://github.com/tkalbl/revisitingskintonefairness
  • paper_authors: Thorsten Kalb, Kaisar Kushibar, Celia Cintas, Karim Lekadir, Oliver Diaz, Richard Osuala
  • for: To assess the fairness of skin lesion classification algorithms, since skin diseases can manifest differently across skin tones.
  • methods: Uses the Individual Typology Angle (ITA) to estimate skin tone, and compares and analyzes ITA-based approaches on the ISIC18 dataset.
  • results: Finds large disagreement among previously published studies, demonstrating the risks of ITA-based skin tone estimation methods, and finds that the limited diversity of the ISIC18 dataset restricts its use as a testbed for fairness analysis.
    Abstract Addressing fairness in lesion classification from dermatological images is crucial due to variations in how skin diseases manifest across skin tones. However, the absence of skin tone labels in public datasets hinders building a fair classifier. To date, such skin tone labels have been estimated prior to fairness analysis in independent studies using the Individual Typology Angle (ITA). Briefly, ITA calculates an angle based on pixels extracted from skin images taking into account the lightness and yellow-blue tints. These angles are then categorised into skin tones that are subsequently used to analyse fairness in skin cancer classification. In this work, we review and compare four ITA-based approaches to skin tone classification on the ISIC18 dataset, a common benchmark for assessing skin cancer classification fairness in the literature. Our analyses reveal high disagreement among previously published studies, demonstrating the risks of ITA-based skin tone estimation methods. Moreover, we investigate the causes of such a large discrepancy among these approaches and find that the lack of diversity in the ISIC18 dataset limits its use as a testbed for fairness analysis. Finally, we recommend further research on robust ITA estimation and diverse dataset acquisition with skin tone annotation to facilitate conclusive fairness assessments of artificial intelligence tools in dermatology. Our code is available at https://github.com/tkalbl/RevisitingSkinToneFairness.
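
A rough illustration of the ITA computation described in the abstract is sketched below: per-pixel ITA is derived from the CIELAB lightness (L*) and yellow-blue component (b*), then binned into coarse skin-tone categories. The Chardon-style thresholds and the use of scikit-image are assumptions of this sketch; the four approaches compared in the paper differ precisely in how pixels are selected and how the angles are binned.

```python
import numpy as np
from skimage import color  # pip install scikit-image

def individual_typology_angle(rgb_image):
    """Per-pixel ITA in degrees: ITA = arctan((L* - 50) / b*) * 180 / pi."""
    lab = color.rgb2lab(rgb_image)          # (H, W, 3): L*, a*, b*
    L, b = lab[..., 0], lab[..., 2]
    # arctan2 avoids division by zero when b* is (near) zero.
    return np.degrees(np.arctan2(L - 50.0, b))

def ita_to_skin_tone(ita_degrees):
    """Map an ITA value to a coarse skin-tone category (Chardon-style bins)."""
    bins = [(55, "very light"), (41, "light"), (28, "intermediate"),
            (10, "tan"), (-30, "brown")]
    for threshold, label in bins:
        if ita_degrees > threshold:
            return label
    return "dark"
```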

Development of a Neural Network-based Method for Improved Imputation of Missing Values in Time Series Data by Repurposing DataWig

  • paper_url: http://arxiv.org/abs/2308.09635
  • repo_url: None
  • paper_authors: Daniel Zhang
  • for: To provide a reliable method for imputing missing values in time series data, enabling better decision making in research, business, and governance.
  • methods: Develops tsDataWig, a modification of DataWig for time series imputation that can directly handle the values of time variables and impute missing values in time series data.
  • results: tsDataWig performs better in both simulated and real-world experiments, can impute more missing values, and does not require strong assumptions about the data missing mechanisms.
    Abstract Time series data are observations collected over time intervals. Successful analysis of time series data captures patterns such as trends, cyclicity and irregularity, which are crucial for decision making in research, business, and governance. However, missing values in time series data occur often and present obstacles to successful analysis, thus they need to be filled with alternative values, a process called imputation. Although various approaches have been attempted for robust imputation of time series data, even the most advanced methods still face challenges including limited scalability, poor capacity to handle heterogeneous data types and inflexibility due to requiring strong assumptions of data missing mechanisms. Moreover, the imputation accuracy of these methods still has room for improvement. In this study, I developed tsDataWig (time-series DataWig) by modifying DataWig, a neural network-based method that possesses the capacity to process large datasets and heterogeneous data types but was designed for non-time series data imputation. Unlike the original DataWig, tsDataWig can directly handle values of time variables and impute missing values in complex time series datasets. Using one simulated and three different complex real-world time series datasets, I demonstrated that tsDataWig outperforms the original DataWig and the current state-of-the-art methods for time series data imputation and potentially has broad application due to not requiring strong assumptions of data missing mechanisms. This study provides a valuable solution for robustly imputing missing values in challenging time series datasets, which often contain millions of samples, high dimensional variables, and heterogeneous data types.

VALERIE22 – A photorealistic, richly metadata annotated dataset of urban environments

  • paper_url: http://arxiv.org/abs/2308.09632
  • repo_url: None
  • paper_authors: Oliver Grau, Korbinian Hagn
  • For: The paper aims to contribute to the understanding of domain-specific factors that influence the perception performance of deep neural networks (DNNs) in the context of pedestrian detection in urban environments for automated driving.
  • Methods: The paper presents the VALERIE tool pipeline, a synthetic data generator that uses procedural tools to simulate photorealistic sensor data from automatically synthesized scenes. The dataset provides a rich set of metadata, enabling a variety of tests to understand the performance of DNNs.
  • Results: The paper demonstrates that the VALERIE22 dataset is one of the best-performing synthetic datasets currently available in the open domain, based on performance metrics. The dataset provides a unique set of features (such as pixel-accurate occlusion rates and each object's position, distance, and angle relative to the camera) that enable researchers to understand the performance of DNNs in various scenarios.
    Abstract The VALERIE tool pipeline is a synthetic data generator developed with the goal of contributing to the understanding of domain-specific factors that influence the perception performance of DNNs (deep neural networks). This work was carried out under the German research project KI Absicherung in order to develop a methodology for the validation of DNNs in the context of pedestrian detection in urban environments for automated driving. The VALERIE22 dataset was generated with the VALERIE procedural tools pipeline, providing a photorealistic sensor simulation rendered from automatically synthesized scenes. The dataset provides a uniquely rich set of metadata, allowing extraction of specific scene and semantic features (like pixel-accurate occlusion rates, positions in the scene, and distance and angle to the camera). This enables a multitude of possible tests on the data, and we hope to stimulate research on understanding the performance of DNNs. Based on performance metrics, a comparison with several other publicly available datasets is provided, demonstrating that VALERIE22 is one of the best-performing synthetic datasets currently available in the open domain.

Learning Computational Efficient Bots with Costly Features

  • paper_url: http://arxiv.org/abs/2308.09629
  • repo_url: None
  • paper_authors: Anthony Kobanda, Valliappan C. A., Joshua Romoff, Ludovic Denoyer
  • For: The paper is written for decision-making processes in various fields, particularly in real-time settings such as video games.
  • Methods: The paper proposes a generic offline learning approach that incorporates cost constraints to limit the computational cost of the input features. The approach is based on the Decision Transformer, but with additional budgeting constraints to dynamically choose the best input features at each timestep.
  • Results: The paper demonstrates the effectiveness of the proposed method on several tasks, including D4RL benchmarks and complex 3D environments similar to those found in video games. The results show that the method can achieve similar performance while using significantly fewer computational resources compared to classical approaches.
    Abstract Deep reinforcement learning (DRL) techniques have become increasingly used in various fields for decision-making processes. However, a challenge that often arises is the trade-off between both the computational efficiency of the decision-making process and the ability of the learned agent to solve a particular task. This is particularly critical in real-time settings such as video games where the agent needs to take relevant decisions at a very high frequency, with a very limited inference time. In this work, we propose a generic offline learning approach where the computation cost of the input features is taken into account. We derive the Budgeted Decision Transformer as an extension of the Decision Transformer that incorporates cost constraints to limit its cost at inference. As a result, the model can dynamically choose the best input features at each timestep. We demonstrate the effectiveness of our method on several tasks, including D4RL benchmarks and complex 3D environments similar to those found in video games, and show that it can achieve similar performance while using significantly fewer computational resources compared to classical approaches.
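
The budgeting idea, spending a limited per-timestep compute budget on the most useful input features, can be illustrated with the toy heuristic below. This is only an illustration of the constraint, not the paper's method: in the Budgeted Decision Transformer the selection is made by the learned model at each timestep, conditioned on the remaining budget, and the feature names, costs, and values here are made up.

```python
def select_features_under_budget(feature_costs, feature_values, budget):
    """Greedily pick the highest value-per-cost features within a budget."""
    order = sorted(feature_costs,
                   key=lambda f: feature_values[f] / feature_costs[f],
                   reverse=True)
    chosen, spent = [], 0.0
    for f in order:
        if spent + feature_costs[f] <= budget:
            chosen.append(f)
            spent += feature_costs[f]
    return chosen, spent

# Hypothetical game-bot features: expensive raycasts vs. cheap scalar state.
costs = {"raycast_grid": 5.0, "nav_mesh_query": 2.0, "agent_state": 0.1}
values = {"raycast_grid": 1.0, "nav_mesh_query": 0.7, "agent_state": 0.3}
print(select_features_under_budget(costs, values, budget=2.5))
```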

Constrained Bayesian Optimization Using a Lagrange Multiplier Applied to Power Transistor Design

  • paper_url: http://arxiv.org/abs/2308.09612
  • repo_url: None
  • paper_authors: Ping-Ju Chuang, Ali Saadat, Sara Ghazvini, Hal Edwards, William G. Vandenberghe
  • for: To optimize the design process of LDMOS transistors while realizing a target breakdown voltage (BV).
  • methods: Converts the constrained Bayesian optimization (BO) problem into a conventional BO problem by using a Lagrangian as the objective function.
  • results: Automatically obtains devices in the design space that satisfy the optimized FOM and the target BV constraint, and explores the physical limits of the devices in the 30-50 V range.
    Abstract We propose a novel constrained Bayesian Optimization (BO) algorithm optimizing the design process of Laterally-Diffused Metal-Oxide-Semiconductor (LDMOS) transistors while realizing a target Breakdown Voltage (BV). We convert the constrained BO problem into a conventional BO problem using a Lagrange multiplier. Instead of directly optimizing the traditional Figure-of-Merit (FOM), we set the Lagrangian as the objective function of BO. This adaptive objective function with a changeable Lagrange multiplier can address constrained BO problems which have constraints that require costly evaluations, without the need for additional surrogate models to approximate constraints. Our algorithm enables a device designer to set the target BV in the design space, and obtain a device that satisfies the optimized FOM and the target BV constraint automatically. Utilizing this algorithm, we have also explored the physical limits of the FOM for our devices in 30 - 50 V range within the defined design space.
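
The core trick the abstract describes, folding the breakdown-voltage constraint into the objective through a Lagrange multiplier so a standard BO loop can be reused, can be sketched as follows. The penalty form, the multiplier value, and the `simulate_device` wrapper are assumptions of this sketch; the paper's exact Lagrangian and any multiplier schedule may differ.

```python
def lagrangian_objective(design_params, lagrange_multiplier, target_bv,
                         simulate_device):
    """Scalar objective for a conventional BO loop built from the constrained
    problem: maximize the FOM while penalizing deviation from the target BV.

    `simulate_device` is a hypothetical TCAD wrapper returning (fom, bv).
    """
    fom, bv = simulate_device(design_params)
    constraint_violation = abs(bv - target_bv)
    # The returned scalar can be handed to any off-the-shelf BO maximizer.
    return fom - lagrange_multiplier * constraint_violation
```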

Solving PDEs on Spheres with Physics-Informed Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2308.09605
  • repo_url: None
  • paper_authors: Guanhang Lei, Zhen Lei, Lei Shi, Chenyu Zeng, Ding-Xuan Zhou
  • for: The paper is focused on providing a mathematical analysis of the numerical performance of physics-informed convolutional neural networks (PICNNs) for solving partial differential equations (PDEs) on the sphere.
  • methods: The paper uses and improves the latest approximation results of deep convolutional neural networks and spherical harmonic analysis to establish an upper bound for the approximation error with respect to the Sobolev norm. The paper also integrates this with innovative localization complexity analysis to establish fast convergence rates for PICNN.
  • results: The paper provides theoretical results on the numerical performance of PICNN for solving PDEs on the sphere, including an upper bound for the approximation error and fast convergence rates. The results are also confirmed and supplemented by experiments.
    Abstract Physics-informed neural networks (PINNs) have been demonstrated to be efficient in solving partial differential equations (PDEs) from a variety of experimental perspectives. Some recent studies have also proposed PINN algorithms for PDEs on surfaces, including spheres. However, theoretical understanding of the numerical performance of PINNs, especially PINNs on surfaces or manifolds, is still lacking. In this paper, we establish rigorous analysis of the physics-informed convolutional neural network (PICNN) for solving PDEs on the sphere. By using and improving the latest approximation results of deep convolutional neural networks and spherical harmonic analysis, we prove an upper bound for the approximation error with respect to the Sobolev norm. Subsequently, we integrate this with innovative localization complexity analysis to establish fast convergence rates for PICNN. Our theoretical results are also confirmed and supplemented by our experiments. In light of these findings, we explore potential strategies for circumventing the curse of dimensionality that arises when solving high-dimensional PDEs.
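
For readers unfamiliar with the physics-informed setup analyzed here, a generic empirical risk for a PDE $\mathcal{L}u = f$ posed on the sphere is shown below; this is a sketch only, and the paper's PICNN loss and its convolutional parameterization may differ in detail. Because $\mathbb{S}^2$ is a closed manifold, no boundary term is needed, and the approximation error is then measured in a Sobolev norm as stated in the abstract.

```latex
\widehat{R}(\theta)
  = \frac{1}{N} \sum_{i=1}^{N}
    \bigl| \mathcal{L}\, u_{\theta}(x_i) - f(x_i) \bigr|^{2},
  \qquad x_i \in \mathbb{S}^{2}.
```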

Breaking the Complexity Barrier in Compositional Minimax Optimization

  • paper_url: http://arxiv.org/abs/2308.09604
  • repo_url: None
  • paper_authors: Jin Liu, Xiaokang Pan, Junwen Duan, Hongdong Li, Youqi Li, Zhe Qu
  • for: To address compositional minimax optimization problems in machine learning, including distributionally robust training and policy evaluation for reinforcement learning.
  • methods: Proposes Nested STOchastic Recursive Momentum (NSTORM), which finds an $\epsilon$-accurate solution with a sample complexity of $O(\kappa^3/\epsilon^3)$, and ADA-NSTORM, a variant with adaptive learning rates that achieves the same sample complexity and proves more effective in experiments.
  • results: The methods match the lower bounds for minimax optimization without requiring large batch sizes, and experiments show that ADA-NSTORM is more stable and effective in practice.
    Abstract Compositional minimax optimization is a pivotal yet under-explored challenge across machine learning, including distributionally robust training and policy evaluation for reinforcement learning. Current techniques exhibit suboptimal complexity or rely heavily on large batch sizes. This paper proposes Nested STOchastic Recursive Momentum (NSTORM), attaining the optimal sample complexity of $O(\kappa^3/\epsilon^3)$ for finding an $\epsilon$-accurate solution. However, NSTORM requires low learning rates, potentially limiting applicability. Thus we introduce ADAptive NSTORM (ADA-NSTORM) with adaptive learning rates, proving it achieves the same sample complexity while experiments demonstrate greater effectiveness. Our methods match lower bounds for minimax optimization without large batch requirements, validated through extensive experiments. This work significantly advances compositional minimax optimization, a crucial capability for distributional robustness and policy evaluation
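
For context, a standard formulation of the compositional minimax problem targeted by NSTORM is given below; the paper's precise setting may impose additional smoothness and strong-concavity structure, with $\kappa$ the condition number appearing in the stated $O(\kappa^3/\epsilon^3)$ complexity.

```latex
\min_{x \in \mathcal{X}} \; \max_{y \in \mathcal{Y}} \;
  f\bigl(g(x),\, y\bigr),
\qquad
g(x) = \mathbb{E}_{\xi}\bigl[g_{\xi}(x)\bigr],
\quad
f(u, y) = \mathbb{E}_{\zeta}\bigl[f_{\zeta}(u, y)\bigr].
```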

Disparity, Inequality, and Accuracy Tradeoffs in Graph Neural Networks for Node Classification

  • paper_url: http://arxiv.org/abs/2308.09596
  • repo_url: https://github.com/arpitdm/gnn_accuracy_fairness_tradeoff
  • paper_authors: Arpit Merchant, Carlos Castillo
  • for: To examine whether graph neural networks (GNNs) exhibit bias in human-centric applications and to propose two GNN-agnostic interventions (PFR-AX and PostProcess) to reduce its impact.
  • methods: Runs a large set of experiments on four datasets, evaluating the two proposed interventions (PFR-AX and PostProcess) and three baseline interventions (random dropout, weighted dropout, and PGD) on three state-of-the-art GNN models.
  • results: No single intervention offers a universally optimal fairness-accuracy tradeoff, and results differ across datasets and GNN models, but PFR-AX and PostProcess provide granular control and improve model confidence when correctly predicting positive outcomes for nodes in protected groups.
    Abstract Graph neural networks (GNNs) are increasingly used in critical human applications for predicting node labels in attributed graphs. Their ability to aggregate features from nodes' neighbors for accurate classification also has the capacity to exacerbate existing biases in data or to introduce new ones towards members from protected demographic groups. Thus, it is imperative to quantify how GNNs may be biased and to what extent their harmful effects may be mitigated. To this end, we propose two new GNN-agnostic interventions namely, (i) PFR-AX which decreases the separability between nodes in protected and non-protected groups, and (ii) PostProcess which updates model predictions based on a blackbox policy to minimize differences between error rates across demographic groups. Through a large set of experiments on four datasets, we frame the efficacies of our approaches (and three variants) in terms of their algorithmic fairness-accuracy tradeoff and benchmark our results against three strong baseline interventions on three state-of-the-art GNN models. Our results show that no single intervention offers a universally optimal tradeoff, but PFR-AX and PostProcess provide granular control and improve model confidence when correctly predicting positive outcomes for nodes in protected groups.

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

  • paper_url: http://arxiv.org/abs/2308.09583
  • repo_url: https://github.com/nlpxucan/wizardlm
  • paper_authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang
  • for: To improve the mathematical reasoning abilities of large language models (LLMs).
  • methods: Proposes Reinforcement Learning from Evol-Instruct Feedback (RLEIF), a reinforcement-learning-based method for enhancing the mathematical reasoning of LLMs.
  • results: In extensive experiments on two mathematical reasoning benchmarks (GSM8k and MATH), WizardMath performs remarkably well and surpasses existing open-source models by a clear margin; it also outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2, and Minerva on GSM8k.
    Abstract Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of Llama-2, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. WizardMath surpasses all other open-source LLMs by a substantial margin. Furthermore, our model even outperforms ChatGPT-3.5, Claude Instant-1, PaLM-2 and Minerva on GSM8k, simultaneously surpasses Text-davinci-002, PaLM-1 and GPT-3 on MATH. More details and model weights are public at https://github.com/nlpxucan/WizardLM and https://huggingface.co/WizardLM.

Physics-Informed Boundary Integral Networks (PIBI-Nets): A Data-Driven Approach for Solving Partial Differential Equations

  • paper_url: http://arxiv.org/abs/2308.09571
  • repo_url: None
  • paper_authors: Monika Nagy-Huber, Volker Roth
  • for: The paper is written for solving partial differential equations (PDEs) in real-world applications, especially when there is limited information about boundary or initial conditions, or when unknown model parameters need to be identified.
  • methods: The paper proposes a data-driven approach called Physics-Informed Boundary Integral Networks (PIBI-Nets) to solve PDEs. PIBI-Nets only require collocation points at the computational domain boundary, which can reduce computational costs and achieve highly accurate results.
  • results: The paper demonstrates the excellent performance of PIBI-Nets for the Laplace and Poisson equations on both artificial data sets and a real-world application concerning the reconstruction of groundwater flows. PIBI-Nets outperform Physics-Informed Neural Networks (PINNs) in high-dimensional settings and can handle point sources in inverse problems using a principled and simple approach.
    Abstract Partial differential equations (PDEs) can describe many relevant phenomena in dynamical systems. In real-world applications, we commonly need to combine formal PDE models with (potentially noisy) observations. This is especially relevant in settings where we lack information about boundary or initial conditions, or where we need to identify unknown model parameters. In recent years, Physics-informed neural networks (PINNs) have become a popular tool for problems of this kind. In high-dimensional settings, however, PINNs often suffer from computational problems because they usually require dense collocation points over the entire computational domain. To address this problem, we present Physics-Informed Boundary Integral Networks (PIBI-Nets) as a data-driven approach for solving PDEs in one dimension less than the original problem space. PIBI-Nets only need collocation points at the computational domain boundary, while still achieving highly accurate results, and in several practical settings, they clearly outperform PINNs. Exploiting elementary properties of fundamental solutions of linear differential operators, we present a principled and simple way to handle point sources in inverse problems. We demonstrate the excellent performance of PIBI-Nets for the Laplace and Poisson equations, both on artificial data sets and within a real-world application concerning the reconstruction of groundwater flows.
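
The reason PIBI-Nets can restrict collocation points to the boundary is the classical boundary-integral representation of solutions. For the Laplace equation, Green's representation formula (up to sign conventions, with $G$ the fundamental solution and $n$ the outward normal) reads:

```latex
u(x) = \int_{\partial\Omega}
  \Bigl[\, G(x, y)\,\frac{\partial u}{\partial n}(y)
        - \frac{\partial G}{\partial n_y}(x, y)\, u(y) \Bigr]\,
  \mathrm{d}S(y),
\qquad x \in \Omega,\quad \Delta u = 0 \ \text{in } \Omega,
```

so the interior solution is determined entirely by data on $\partial\Omega$, which is what the network has to learn.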

Investigating the Interplay between Features and Structures in Graph Learning

  • paper_url: http://arxiv.org/abs/2308.09570
  • repo_url: None
  • paper_authors: Daniele Castellana, Federico Errica
  • for: This paper aims to investigate the relationship between node features and target labels in deep graph networks, and to propose new metrics to measure the influence of node features on target labels.
  • methods: The paper uses two generative processes to build and study ad-hoc node classification tasks, and evaluates the performance of six models, including structure-agnostic ones.
  • results: The paper finds that previously defined metrics are not adequate when the assumption of a strong correlation between node features and target labels is relaxed, and proposes a new metric called Feature Informativeness to quantitatively measure the influence of node features on target labels.
    Abstract In the past, the dichotomy between homophily and heterophily has inspired research contributions toward a better understanding of Deep Graph Networks' inductive bias. In particular, it was believed that homophily strongly correlates with better node classification predictions of message-passing methods. More recently, however, researchers pointed out that such dichotomy is too simplistic as we can construct node classification tasks where graphs are completely heterophilic but the performances remain high. Most of these works have also proposed new quantitative metrics to understand when a graph structure is useful, which implicitly or explicitly assume the correlation between node features and target labels. Our work empirically investigates what happens when this strong assumption does not hold, by formalising two generative processes for node classification tasks that allow us to build and study ad-hoc problems. To quantitatively measure the influence of the node features on the target labels, we also use a metric we call Feature Informativeness. We construct six synthetic tasks and evaluate the performance of six models, including structure-agnostic ones. Our findings reveal that previously defined metrics are not adequate when we relax the above assumption. Our contribution to the workshop aims at presenting novel research findings that could help advance our understanding of the field.

Normalization Is All You Need: Understanding Layer-Normalized Federated Learning under Extreme Label Shift

  • paper_url: http://arxiv.org/abs/2308.09565
  • repo_url: None
  • paper_authors: Guojun Zhang, Mahdi Beitollahi, Alex Bie, Xi Chen
  • for: To investigate the role of layer normalization (LN) in federated learning (FL), in particular why it is surprisingly effective on non-i.i.d. data.
  • methods: Uses layer normalization (LN) and feature normalization (FN), which normalizes the latent features before the classifier head, to control feature collapse and local overfitting and thereby accelerate global training.
  • results: Experiments show that normalization leads to drastic improvements under extreme label shift while remaining robust to the choice of learning rate.
    Abstract Layer normalization (LN) is a widely adopted deep learning technique especially in the era of foundation models. Recently, LN has been shown to be surprisingly effective in federated learning (FL) with non-i.i.d. data. However, exactly why and how it works remains mysterious. In this work, we reveal the profound connection between layer normalization and the label shift problem in federated learning. To understand layer normalization better in FL, we identify the key contributing mechanism of normalization methods in FL, called feature normalization (FN), which applies normalization to the latent feature representation before the classifier head. Although LN and FN do not improve expressive power, they control feature collapse and local overfitting to heavily skewed datasets, and thus accelerates global training. Empirically, we show that normalization leads to drastic improvements on standard benchmarks under extreme label shift. Moreover, we conduct extensive ablation studies to understand the critical factors of layer normalization in FL. Our results verify that FN is an essential ingredient inside LN to significantly improve the convergence of FL while remaining robust to learning rate choices, especially under extreme label shift where each client has access to few classes.
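
A minimal PyTorch sketch of the feature normalization (FN) mechanism identified in the abstract, normalizing the latent representation right before the classifier head, is given below. Using LayerNorm as the normalizer is an assumption of this sketch; the paper analyzes which normalization variants matter in FL and why.

```python
import torch.nn as nn

class FeatureNormalizedClassifier(nn.Module):
    """Backbone + classifier with feature normalization before the head."""

    def __init__(self, backbone, feature_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        self.feature_norm = nn.LayerNorm(feature_dim)   # FN layer (assumed form)
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, x):
        z = self.backbone(x)        # latent feature representation
        z = self.feature_norm(z)    # FN: normalize before the classifier head
        return self.head(z)
```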

Eigenvalue-based Incremental Spectral Clustering

  • paper_url: http://arxiv.org/abs/2308.10999
  • repo_url: None
  • paper_authors: Mieczysław A. Kłopotek, Bartłomiej Starosta, Sławomir T. Wierzchoń
  • for: clustering large datasets using incremental spectral clustering
  • methods: split the data into manageable subsets, cluster each subset, and merge clusters based on eigenvalue spectrum similarity
  • results: clustering and merging the subsets yields clusters close to clustering the entire dataset
    Abstract Our previous experiments demonstrated that collections of subsets of (short) documents (with several hundred entries) share a common, suitably normalized eigenvalue spectrum of the combinatorial Laplacian. Based on this insight, we propose a method of incremental spectral clustering. The method consists of the following steps: (1) split the data into manageable subsets, (2) cluster each of the subsets, (3) merge clusters from different subsets based on the eigenvalue spectrum similarity to form clusters of the entire set. This method can be especially useful for clustering methods whose complexity increases strongly with the size of the data sample, as in the case of typical spectral clustering. Experiments show that clustering and merging the subsets in fact yields clusters close to those obtained by clustering the entire dataset.
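
A rough sketch of the three-step procedure (split, cluster each subset, merge clusters whose Laplacian spectra are similar) follows. The RBF affinity, the spectrum normalization, and the merge threshold are illustrative assumptions; the paper defines its own notion of spectrum similarity.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances

def laplacian_spectrum(points, k_eigs=20, gamma=1.0):
    """Smallest eigenvalues of the combinatorial Laplacian of an RBF graph."""
    W = np.exp(-gamma * pairwise_distances(points) ** 2)
    np.fill_diagonal(W, 0.0)
    L = laplacian(W, normed=False)
    eigs = np.sort(np.linalg.eigvalsh(L))[:k_eigs]
    eigs = np.pad(eigs, (0, max(0, k_eigs - len(eigs))))
    return eigs / (np.linalg.norm(eigs) + 1e-12)       # normalized spectrum

def incremental_spectral_clustering(X, n_subsets=4, n_clusters=5, merge_tol=0.1):
    """Step 1: split; step 2: cluster each subset; step 3: merge by spectrum."""
    clusters = []
    for S in np.array_split(X, n_subsets):
        labels = SpectralClustering(n_clusters=n_clusters,
                                    affinity="rbf").fit_predict(S)
        clusters += [S[labels == c] for c in range(n_clusters)]
    spectra = [laplacian_spectrum(c) for c in clusters]
    group = list(range(len(clusters)))                  # merged-cluster ids
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if np.linalg.norm(spectra[i] - spectra[j]) < merge_tol:
                group[j] = group[i]
    return clusters, group
```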

Attesting Distributional Properties of Training Data for Machine Learning

  • paper_url: http://arxiv.org/abs/2308.09552
  • repo_url: None
  • paper_authors: Vasisht Duddu, Anudeep Das, Nora Khayata, Hossein Yalame, Thomas Schneider, N. Asokan
  • for: Ensuring the trustworthiness of machine learning models by demonstrating desirable distributional properties of training data.
  • methods: Property inference and cryptographic mechanisms for data privacy-preserving property attestation.
  • results: An effective hybrid property attestation method for model trainers to demonstrate relevant distributional properties of training data to customers without revealing the data.
    Abstract The success of machine learning (ML) has been accompanied by increased concerns about its trustworthiness. Several jurisdictions are preparing ML regulatory frameworks. One such concern is ensuring that model training data has desirable distributional properties for certain sensitive attributes. For example, draft regulations indicate that model trainers are required to show that training datasets have specific distributional properties, such as reflecting diversity of the population. We propose the notion of property attestation allowing a prover (e.g., model trainer) to demonstrate relevant distributional properties of training data to a verifier (e.g., a customer) without revealing the data. We present an effective hybrid property attestation combining property inference with cryptographic mechanisms.

Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-free Continual Learning

  • paper_url: http://arxiv.org/abs/2308.09544
  • repo_url: None
  • paper_authors: Filip Szatkowski, Mateusz Pyla, Marcin Przewięźlikowski, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński
  • for: To study knowledge distillation (KD) as a regularization strategy for exemplar-free class-incremental learning (CIL), aiming to prevent forgetting.
  • methods: Uses KD as a regularization strategy, which often fails to regularize the model without access to exemplars from previous tasks. The analysis shows that this issue originates from substantial representation shifts in the teacher network on out-of-distribution data, which cause large errors in the KD loss component and degrade CIL performance.
  • results: Introduces Teacher Adaptation (TA), which concurrently updates the teacher network and the main model during incremental training to achieve exemplar-free CIL. The method integrates seamlessly with KD-based CIL approaches and consistently improves their performance across multiple exemplar-free CIL benchmarks.
    Abstract In this work, we investigate exemplar-free class incremental learning (CIL) with knowledge distillation (KD) as a regularization strategy, aiming to prevent forgetting. KD-based methods are successfully used in CIL, but they often struggle to regularize the model without access to exemplars of the training data from previous tasks. Our analysis reveals that this issue originates from substantial representation shifts in the teacher network when dealing with out-of-distribution data. This causes large errors in the KD loss component, leading to performance degradation in CIL. Inspired by recent test-time adaptation methods, we introduce Teacher Adaptation (TA), a method that concurrently updates the teacher and the main model during incremental training. Our method seamlessly integrates with KD-based CIL approaches and allows for consistent enhancement of their performance across multiple exemplar-free CIL benchmarks.
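
A sketch of one KD-regularized incremental training step with an adapting teacher is shown below. Reading "concurrently updating the teacher" as letting the teacher's normalization statistics track current-task data while its weights stay frozen is an assumption of this sketch; the paper specifies the actual update rule.

```python
import torch
import torch.nn.functional as F

def incremental_training_step(student, teacher, x, y, optimizer,
                              kd_weight=1.0, temperature=2.0):
    """One CIL step: cross-entropy on the new task plus a KD term against an
    adapting teacher (weights frozen, normalization statistics updating)."""
    teacher.train()                        # lets running BatchNorm stats adapt
    for p in teacher.parameters():
        p.requires_grad_(False)
    with torch.no_grad():
        teacher_logits = teacher(x)        # forward pass also adapts BN stats

    student_logits = student(x)
    ce = F.cross_entropy(student_logits, y)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    loss = ce + kd_weight * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```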

Latent State Models of Training Dynamics

  • paper_url: http://arxiv.org/abs/2308.09543
  • repo_url: None
  • paper_authors: Michael Y. Hu, Angelica Chen, Naomi Saphra, Kyunghyun Cho
  • for: To understand how randomness, such as differences in data order and initialization, affects the dynamics and outcomes of model training.
  • methods: Trains models multiple times with different random seeds and computes a variety of metrics throughout training, such as the $L_2$ norm, mean, and variance of the network's weights; a hidden Markov model (HMM) is then fitted over the resulting metric sequences, treating training as a stochastic process of transitions between latent states and yielding a low-dimensional, discrete representation of training dynamics.
  • results: Uses the HMM representation to describe training dynamics across runs, identifying stable and slow trajectories, studying phase transitions, and revealing latent "detour" states that slow down convergence.
    Abstract The impact of randomness on model training is poorly understood. How do differences in data order and initialization actually manifest in the model, such that some training runs outperform others or converge faster? Furthermore, how can we interpret the resulting training dynamics and the phase transitions that characterize different trajectories? To understand the effect of randomness on the dynamics and outcomes of neural network training, we train models multiple times with different random seeds and compute a variety of metrics throughout training, such as the $L_2$ norm, mean, and variance of the neural network's weights. We then fit a hidden Markov model (HMM) over the resulting sequences of metrics. The HMM represents training as a stochastic process of transitions between latent states, providing an intuitive overview of significant changes during training. Using our method, we produce a low-dimensional, discrete representation of training dynamics on grokking tasks, image classification, and masked language modeling. We use the HMM representation to study phase transitions and identify latent "detour" states that slow down convergence.
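
A minimal version of the pipeline described here, per-checkpoint weight metrics from several seeds followed by a Gaussian HMM over the metric sequences, might look like the following. The number of latent states, the diagonal Gaussian emission model, and the specific metrics are assumptions of this sketch.

```python
import numpy as np
from hmmlearn import hmm  # pip install hmmlearn

def weight_metrics(model_params):
    """Per-checkpoint metrics of the kind described: L2 norm, mean, variance."""
    flat = np.concatenate([p.ravel() for p in model_params])
    return np.array([np.linalg.norm(flat), flat.mean(), flat.var()])

def fit_training_hmm(runs, n_states=5):
    """Fit a Gaussian HMM over metric sequences from several random seeds.

    `runs` is a list of arrays, one per seed, each of shape (T, n_metrics).
    Returns a discrete latent-state trajectory per run plus the fitted model.
    """
    X = np.vstack(runs)
    lengths = [len(r) for r in runs]
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=200)
    model.fit(X, lengths)
    return [model.predict(r) for r in runs], model
```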

Decoupled conditional contrastive learning with variable metadata for prostate lesion detection

  • paper_url: http://arxiv.org/abs/2308.09542
  • repo_url: https://github.com/camilleruppli/decoupled_ccl
  • paper_authors: Camille Ruppli, Pietro Gori, Roberto Ardon, Isabelle Bloch
  • for: To improve the accuracy of early prostate cancer diagnosis.
  • methods: Uses multi-parametric magnetic resonance imaging (mp-MRI) for prostate lesion detection, builds on the Prostate Imaging Reporting and Data System (PI-RADS), which standardizes the interpretation of prostate MRI, and leverages multiple annotators per sample to define metadata confidence.
  • results: Reports a 3% AUC increase on lesion detection on the PI-CAI challenge dataset using the new contrastive loss function.
    Abstract Early diagnosis of prostate cancer is crucial for efficient treatment. Multi-parametric Magnetic Resonance Images (mp-MRI) are widely used for lesion detection. The Prostate Imaging Reporting and Data System (PI-RADS) has standardized interpretation of prostate MRI by defining a score for lesion malignancy. PI-RADS data is readily available from radiology reports but is subject to high inter-reports variability. We propose a new contrastive loss function that leverages weak metadata with multiple annotators per sample and takes advantage of inter-reports variability by defining metadata confidence. By combining metadata of varying confidence with unannotated data into a single conditional contrastive loss function, we report a 3% AUC increase on lesion detection on the public PI-CAI challenge dataset. Code is available at: https://github.com/camilleruppli/decoupled_ccl
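
To make the idea of metadata-confidence weighting concrete, the sketch below shows a generic supervised-contrastive-style loss in which positive pairs (same PI-RADS label) are weighted by per-sample annotator-agreement confidence. This is not the paper's decoupled conditional loss; the pair weighting, temperature, and confidence definition are assumptions, but it illustrates how low-confidence metadata can be down-weighted rather than discarded.

```python
import torch
import torch.nn.functional as F

def confidence_weighted_contrastive_loss(z, labels, confidence, tau=0.1):
    """z: (N, d) embeddings; labels: (N,) metadata labels; confidence: (N,)."""
    z = F.normalize(z, dim=1)
    mask_self = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = (z @ z.t() / tau).masked_fill(mask_self, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(mask_self, 0.0)     # avoid 0 * (-inf)
    same_label = (labels[:, None] == labels[None, :]) & ~mask_self
    weights = same_label.float() * confidence[:, None] * confidence[None, :]
    # Confidence-weighted average log-probability over positives per anchor.
    per_anchor = (weights * log_prob).sum(1) / weights.sum(1).clamp(min=1e-8)
    return -per_anchor.mean()
```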

Privacy-Preserving 3-Layer Neural Network Training using Mere Homomorphic Encryption Technique

  • paper_url: http://arxiv.org/abs/2308.09531
  • repo_url: None
  • paper_authors: John Chiang
  • for: To address privacy-preserving training of neural networks in the mere homomorphic encryption setting.
  • methods: Combines several existing techniques, extends some of them, and ultimately enables the training of 3-layer neural networks for both regression and classification problems using the mere homomorphic encryption technique.
  • results: Experiments show that privacy-preserving neural network training is feasible with mere homomorphic encryption and can solve both regression and classification problems.
    Abstract In this manuscript, we consider the problem of privacy-preserving training of neural networks in the mere homomorphic encryption setting. We combine several existing techniques available, extend some of them, and finally enable the training of 3-layer neural networks for both the regression and classification problems using the mere homomorphic encryption technique.

Transitivity-Preserving Graph Representation Learning for Bridging Local Connectivity and Role-based Similarity

  • paper_url: http://arxiv.org/abs/2308.09517
  • repo_url: https://github.com/nslab-cuk/unified-graph-transformer
  • paper_authors: Van Thuy Hoang, O-Joun Lee
  • for: This paper aims to improve graph representation learning methods by integrating local and global structural information into fixed-length vector representations.
  • methods: The proposed Unified Graph Transformer Networks (UGT) learn local structure by identifying local substructures and aggregating features of the $k$-hop neighborhoods of each node, and construct virtual edges to capture long-range dependencies. UGT also learns unified representations through self-attention, encoding structural distance and $p$-step transition probability between node pairs.
  • results: The proposed method significantly outperformed baselines that consist of state-of-the-art models on real-world benchmark datasets over various downstream tasks, and reached the expressive power of the third-order Weisfeiler-Lehman isomorphism test (3d-WL) in distinguishing non-isomorphic graph pairs.
    Abstract Graph representation learning (GRL) methods, such as graph neural networks and graph transformer models, have been successfully used to analyze graph-structured data, mainly focusing on node classification and link prediction tasks. However, the existing studies mostly only consider local connectivity while ignoring long-range connectivity and the roles of nodes. In this paper, we propose Unified Graph Transformer Networks (UGT) that effectively integrate local and global structural information into fixed-length vector representations. First, UGT learns local structure by identifying the local substructures and aggregating features of the $k$-hop neighborhoods of each node. Second, we construct virtual edges, bridging distant nodes with structural similarity to capture the long-range dependencies. Third, UGT learns unified representations through self-attention, encoding structural distance and $p$-step transition probability between node pairs. Furthermore, we propose a self-supervised learning task that effectively learns transition probability to fuse local and global structural features, which could then be transferred to other downstream tasks. Experimental results on real-world benchmark datasets over various downstream tasks showed that UGT significantly outperformed baselines that consist of state-of-the-art models. In addition, UGT reaches the expressive power of the third-order Weisfeiler-Lehman isomorphism test (3d-WL) in distinguishing non-isomorphic graph pairs. The source code is available at https://github.com/NSLab-CUK/Unified-Graph-Transformer.

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

  • paper_url: http://arxiv.org/abs/2308.09514
  • repo_url: https://github.com/apple/ml-spatial-librispeech
  • paper_authors: Miguel Sarabia, Elena Menyaylenko, Alessandro Toso, Skyler Seto, Zakaria Aldeneh, Shadi Pirhosseinloo, Luca Zappella, Barry-John Theobald, Nicholas Apostoloff, Jonathan Sheaffer
  • for: A spatial audio dataset intended for training machine learning models.
  • methods: Generated by augmenting LibriSpeech samples with 200k+ simulated acoustic conditions across 8k+ synthetic rooms.
  • results: Models trained on four spatial audio tasks achieve a median absolute error of 6.60° on 3D source localization, 0.43 m on distance, 90.66 ms on T30, and 2.74 dB on DRR estimation, and generalize well to widely used evaluation datasets.
    Abstract We present Spatial LibriSpeech, a spatial audio dataset with over 650 hours of 19-channel audio, first-order ambisonics, and optional distractor noise. Spatial LibriSpeech is designed for machine learning model training, and it includes labels for source position, speaking direction, room acoustics and geometry. Spatial LibriSpeech is generated by augmenting LibriSpeech samples with 200k+ simulated acoustic conditions across 8k+ synthetic rooms. To demonstrate the utility of our dataset, we train models on four spatial audio tasks, resulting in a median absolute error of 6.60{\deg} on 3D source localization, 0.43m on distance, 90.66ms on T30, and 2.74dB on DRR estimation. We show that the same models generalize well to widely-used evaluation datasets, e.g., obtaining a median absolute error of 12.43{\deg} on 3D source localization on TUT Sound Events 2018, and 157.32ms on T30 estimation on ACE Challenge.

Bridged-GNN: Knowledge Bridge Learning for Effective Knowledge Transfer

  • paper_url: http://arxiv.org/abs/2308.09499
  • repo_url: None
  • paper_authors: Wendong Bi, Xueqi Cheng, Bingbing Xu, Xiaoqian Sun, Li Xu, Huawei Shen
  • for: To address the data-hungry problem (insufficient and low-quality data) and help deep learning models perform better on target domains.
  • methods: Knowledge transfer based on Graph Neural Networks (GNNs): a Bridged-Graph connecting knowledgeable samples to each target sample is constructed to learn a knowledge-enhanced posterior for the target domain, and sample-wise knowledge transfer is then performed via GNNs.
  • results: Bridged-GNN shows significant improvements over state-of-the-art methods across data domains and data-quality settings.
    Abstract The data-hungry problem, characterized by insufficiency and low quality of data, poses obstacles for deep learning models. Transfer learning has been a feasible way to transfer knowledge from high-quality external data of source domains to limited data of target domains, which follows a domain-level knowledge transfer to learn a shared posterior distribution. However, such methods are usually built on strong assumptions, e.g., the domain-invariant posterior distribution, which is usually unsatisfied and may introduce noise, resulting in poor generalization ability on target domains. Inspired by Graph Neural Networks (GNNs) that aggregate information from neighboring nodes, we redefine the paradigm as learning a knowledge-enhanced posterior distribution for target domains, namely Knowledge Bridge Learning (KBL). KBL first learns the scope of knowledge transfer by constructing a Bridged-Graph that connects knowledgeable samples to each target sample and then performs sample-wise knowledge transfer via GNNs. KBL is free from strong assumptions and is robust to noise in the source data. Guided by KBL, we propose the Bridged-GNN, including an Adaptive Knowledge Retrieval module to build the Bridged-Graph and a Graph Knowledge Transfer module. Comprehensive experiments on both non-relational and relational data-hungry scenarios demonstrate the significant improvements of Bridged-GNN compared with SOTA methods.

Predictive Authoring for Brazilian Portuguese Augmentative and Alternative Communication

  • paper_url: http://arxiv.org/abs/2308.09497
  • repo_url: https://github.com/jayralencar/pictogram_prediction_pt
  • paper_authors: Jayr Pereira, Rodrigo Nogueira, Cleber Zanchettin, Robson Fidalgo
  • For: This paper proposes using a BERT-like model for pictogram prediction in AAC systems to improve the efficiency of message authoring for individuals with complex communication needs.
  • Methods: The authors finetune BERTimbau, a Brazilian Portuguese version of BERT, using an AAC corpus for Brazilian Portuguese, and test different approaches to representing a pictogram for prediction, including as a word, as a concept, and as a set of synonyms. They also evaluate the usage of images for pictogram prediction.
  • Results: The results demonstrate that embeddings computed from the pictograms' captions, synonyms, or definitions give similar performance; using synonyms leads to lower perplexity, but using captions leads to the highest accuracies. The paper provides insight into how to represent a pictogram for prediction using a BERT-like model and into the potential of using images for pictogram prediction.
    Abstract Individuals with complex communication needs (CCN) often rely on augmentative and alternative communication (AAC) systems to have conversations and communicate their wants. Such systems allow message authoring by arranging pictograms in sequence. However, the difficulty of finding the desired item to complete a sentence can increase as the user's vocabulary increases. This paper proposes using BERTimbau, a Brazilian Portuguese version of BERT, for pictogram prediction in AAC systems. To finetune BERTimbau, we constructed an AAC corpus for Brazilian Portuguese to use as a training corpus. We tested different approaches to representing a pictogram for prediction: as a word (using pictogram captions), as a concept (using a dictionary definition), and as a set of synonyms (using related terms). We also evaluated the usage of images for pictogram prediction. The results demonstrate that embeddings computed from the pictograms' captions, synonyms, or definitions perform similarly. Using synonyms leads to lower perplexity, but using captions leads to the highest accuracies. This paper provides insight into how to represent a pictogram for prediction using a BERT-like model and the potential of using images for pictogram prediction.
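
For intuition on how a BERT-like model drives pictogram prediction, the snippet below uses a base BERTimbau checkpoint with a fill-mask head to rank candidate next words (pictogram captions) for a partial Brazilian Portuguese message. The checkpoint name and the caption-as-word representation are assumptions of this sketch; the paper finetunes on an AAC corpus and compares caption, definition, and synonym representations.

```python
from transformers import pipeline  # pip install transformers

# Base BERTimbau checkpoint (assumed); the paper finetunes it on an AAC corpus.
fill_mask = pipeline("fill-mask", model="neuralmind/bert-base-portuguese-cased")

# Rank candidate words (pictogram captions) for the next slot in a message.
for candidate in fill_mask("eu quero comer [MASK]")[:5]:
    print(candidate["token_str"], round(candidate["score"], 3))
```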

Balancing Transparency and Risk: The Security and Privacy Risks of Open-Source Machine Learning Models

  • paper_url: http://arxiv.org/abs/2308.09490
  • repo_url: None
  • paper_authors: Dominik Hintersdorf, Lukas Struppek, Kristian Kersting
  • for: To draw the attention of users of open-source machine learning models to their privacy and security risks.
  • methods: Provides an overview of common privacy and security threats associated with open-source models, in order to promote their safer use.
  • results: Identifies a range of possible privacy and security attacks, including models with hidden functionalities that are triggered by specific input patterns.
    Abstract The field of artificial intelligence (AI) has experienced remarkable progress in recent years, driven by the widespread adoption of open-source machine learning models in both research and industry. Considering the resource-intensive nature of training on vast datasets, many applications opt for models that have already been trained. Hence, a small number of key players undertake the responsibility of training and publicly releasing large pre-trained models, providing a crucial foundation for a wide range of applications. However, the adoption of these open-source models carries inherent privacy and security risks that are often overlooked. To provide a concrete example, an inconspicuous model may conceal hidden functionalities that, when triggered by specific input patterns, can manipulate the behavior of the system, such as instructing self-driving cars to ignore the presence of other vehicles. The implications of successful privacy and security attacks encompass a broad spectrum, ranging from relatively minor damage like service interruptions to highly alarming scenarios, including physical harm or the exposure of sensitive user data. In this work, we present a comprehensive overview of common privacy and security threats associated with the use of open-source models. By raising awareness of these dangers, we strive to promote the responsible and secure use of AI systems.

RBA-GCN: Relational Bilevel Aggregation Graph Convolutional Network for Emotion Recognition

  • paper_url: http://arxiv.org/abs/2308.11029
  • repo_url: https://github.com/luftmenscher/RBA-GCN
  • paper_authors: Lin Yuan, Guoheng Huang, Fenghuan Li, Xiaochen Yuan, Chi-Man Pun, Guo Zhong
  • for: To improve emotion recognition in conversation (ERC) with graph convolutional networks (GCNs) by addressing the node information redundancy of traditional GCNs and the inability of single-layer GCNs to capture long-range contextual information.
  • methods: Proposes RBA-GCN, a graph convolutional network built from a graph generation module (GGM), a similarity-based cluster building module (SCBM), and a bilevel aggregation module (BiAM), to reduce node information redundancy and capture long-range contextual information.
  • results: On the IEMOCAP and MELD datasets, the weighted average F1 score of RBA-GCN improves by 2.17%-5.21% over the most advanced methods.
    Abstract Emotion recognition in conversation (ERC) has received increasing attention from researchers due to its wide range of applications. As conversation has a natural graph structure, numerous approaches used to model ERC based on graph convolutional networks (GCNs) have yielded significant results. However, the aggregation approach of traditional GCNs suffers from the node information redundancy problem, leading to node discriminant information loss. Additionally, single-layer GCNs lack the capacity to capture long-range contextual information from the graph. Furthermore, the majority of approaches are based on textual modality or stitching together different modalities, resulting in a weak ability to capture interactions between modalities. To address these problems, we present the relational bilevel aggregation graph convolutional network (RBA-GCN), which consists of three modules: the graph generation module (GGM), similarity-based cluster building module (SCBM) and bilevel aggregation module (BiAM). First, GGM constructs a novel graph to reduce the redundancy of target node information. Then, SCBM calculates the node similarity in the target node and its structural neighborhood, where noisy information with low similarity is filtered out to preserve the discriminant information of the node. Meanwhile, BiAM is a novel aggregation method that can preserve the information of nodes during the aggregation process. This module can construct the interaction between different modalities and capture long-range contextual information based on similarity clusters. On both the IEMOCAP and MELD datasets, the weighted average F1 score of RBA-GCN has a 2.17$\sim$5.21\% improvement over that of the most advanced method.
    摘要 “对话情感识别”(ERC)在研究者中获得了越来越多的注意力,因为它有广泛的应用领域。由于对话有自然的图形结构,许多方法使用图形卷积网(GCNs)来模型ERC,它们已经获得了显著的成果。然而,传统GCNs的聚合方法受到节点资讯重复问题的影响,导致节点标识资讯的损失。另外,单层GCNs缺乏获取长距离Contextual信息的能力。此外,大多数方法仅基于文本模式或是将不同模式组合在一起,从而导致模式之间的互动缺乏能力。为了解决这些问题,我们提出了关系内部卷积网(RBA-GCN),它包括三个模块:图形生成模块(GGM)、相似度基于单元模块(SCBM)和两层聚合模块(BiAM)。首先,GGM使用一个新的图形来减少目标节点信息的重复性。然后,SCBM计算了目标节点和其结构邻域中的相似度,并将低相似度的信息排除以保留节点标识资讯。同时,BiAM是一个新的聚合方法,可以在聚合过程中保留节点信息。这个模块可以在不同模式之间建立互动,并以相似 clusters 来捕捉长距离Contextual信息。在IEMOCAP和MELD datasets上,RBA-GCN的Weighted average F1 score与最先进方法相比,有2.17%至5.21%的提升。

Data augmentation and explainability for bias discovery and mitigation in deep learning

  • paper_url: http://arxiv.org/abs/2308.09464
  • repo_url: None
  • paper_authors: Agnieszka Mikołajczyk-Bareła
  • for: 本论文探讨深度神经网络中偏见的影响和降低其影响的方法。
  • methods: 本论文提出了三种方法来降低偏见的影响:样式传输数据增强、targeted数据增强和负责任反馈。
  • results: 本论文通过实验表明,这些方法可以降低深度神经网络中偏见的影响,提高模型的准确率。
    Abstract This dissertation explores the impact of bias in deep neural networks and presents methods for reducing its influence on model performance. The first part begins by categorizing and describing potential sources of bias and errors in data and models, with a particular focus on bias in machine learning pipelines. The next chapter outlines a taxonomy and methods of Explainable AI as a way to justify predictions and control and improve the model. Then, as an example of a laborious manual data inspection and bias discovery process, a skin lesion dataset is manually examined. A Global Explanation for the Bias Identification method is proposed as an alternative semi-automatic approach to manual data exploration for discovering potential biases in data. Relevant numerical methods and metrics are discussed for assessing the effects of the identified biases on the model. Whereas identifying errors and bias is critical, improving the model and reducing the number of flaws in the future is an absolute priority. Hence, the second part of the thesis focuses on mitigating the influence of bias on ML models. Three approaches are proposed and discussed: Style Transfer Data Augmentation, Targeted Data Augmentations, and Attribution Feedback. Style Transfer Data Augmentation aims to address shape and texture bias by merging a style of a malignant lesion with a conflicting shape of a benign one. Targeted Data Augmentations randomly insert possible biases into all images in the dataset during the training, as a way to make the process random and, thus, destroy spurious correlations. Lastly, Attribution Feedback is used to fine-tune the model to improve its accuracy by eliminating obvious mistakes and teaching it to ignore insignificant input parts via an attribution loss. The goal of these approaches is to reduce the influence of bias on machine learning models, rather than eliminate it entirely.
    摘要 Then, as an example of a laborious manual data inspection and bias discovery process, a skin lesion dataset is manually examined. A Global Explanation for the Bias Identification method is proposed as an alternative semi-automatic approach to manual data exploration for discovering potential biases in data. Relevant numerical methods and metrics are discussed for assessing the effects of the identified biases on the model.Whereas identifying errors and bias is critical, improving the model and reducing the number of flaws in the future is an absolute priority. Hence, the second part of the thesis focuses on mitigating the influence of bias on ML models. Three approaches are proposed and discussed: Style Transfer Data Augmentation, Targeted Data Augmentations, and Attribution Feedback.Style Transfer Data Augmentation aims to address shape and texture bias by merging a style of a malignant lesion with a conflicting shape of a benign one. Targeted Data Augmentations randomly insert possible biases into all images in the dataset during the training, as a way to make the process random and, thus, destroy spurious correlations. Lastly, Attribution Feedback is used to fine-tune the model to improve its accuracy by eliminating obvious mistakes and teaching it to ignore insignificant input parts via an attribution loss. The goal of these approaches is to reduce the influence of bias on machine learning models, rather than eliminate it entirely.
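As a toy illustration of the Targeted Data Augmentations idea above, the sketch below pastes a possible bias artifact (a black frame, as sometimes seen in dermoscopy images) into randomly chosen images independent of their labels, so the artifact stops being a useful shortcut. The artifact and the insertion probability are illustrative assumptions, not the thesis' exact procedure.

```python
# Toy targeted augmentation: insert a candidate bias artifact at random during
# training so it becomes decorrelated from the class label.
import numpy as np

def add_black_frame(img: np.ndarray, thickness: int = 8) -> np.ndarray:
    out = img.copy()
    out[:thickness], out[-thickness:] = 0, 0
    out[:, :thickness], out[:, -thickness:] = 0, 0
    return out

def targeted_augment(batch: np.ndarray, p: float = 0.5, rng=None) -> np.ndarray:
    """Insert the artifact into each image with probability p, independent of its label."""
    rng = rng or np.random.default_rng()
    return np.stack([add_black_frame(x) if rng.random() < p else x for x in batch])

images = np.random.rand(4, 128, 128, 3)   # dummy batch (N, H, W, C)
augmented = targeted_augment(images)
```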

Reconstructing $S$-matrix Phases with Machine Learning

  • paper_url: http://arxiv.org/abs/2308.09451
  • repo_url: None
  • paper_authors: Aurélien Dersy, Matthew D. Schwartz, Alexander Zhiboedov
  • for: 这篇论文是关于$S$-矩阵 bootstrap 计划中对幂谱和相位之间的关系的研究。
  • methods: 作者使用现代机器学习技术来研究单位约束。他们发现,对于给定的幂谱,当存在相位时,可以通过机器学习技术准确重建相位。此外,损失函数估算算法可以用作判断给定幂谱是否与单位约束相容的准确指标。
  • results: 作者发现,在某些情况下,多个相位可能与单个幂谱相容,并发现了一种新的相位ambiguous 解。此外,他们发现这种解可以让单位约束的已知限制得到明显改善。
    Abstract An important element of the $S$-matrix bootstrap program is the relationship between the modulus of an $S$-matrix element and its phase. Unitarity relates them by an integral equation. Even in the simplest case of elastic scattering, this integral equation cannot be solved analytically and numerical approaches are required. We apply modern machine learning techniques to studying the unitarity constraint. We find that for a given modulus, when a phase exists it can generally be reconstructed to good accuracy with machine learning. Moreover, the loss of the reconstruction algorithm provides a good proxy for whether a given modulus can be consistent with unitarity at all. In addition, we study the question of whether multiple phases can be consistent with a single modulus, finding novel phase-ambiguous solutions. In particular, we find a new phase-ambiguous solution which pushes the known limit on such solutions significantly beyond the previous bound.
    摘要 “$S$-matrixbootstrap程序中一个重要元素是几何和几何的关系。对几何的modulus,对它的相位也存在一个统计方程。即使是最简的反射散射案例,这个统计方程无法分析解,需要使用数值方法。我们使用现代机器学习技术来研究单位约束。我们发现,给定的几何,当一个相位存在时,通常可以使用机器学习重建它,并且损失函数提供了一个单位约束是否成立的好proxy。此外,我们研究了是否多个相位可以与单一的几何相容,发现了新的相位不确定解。特别是,我们发现了一个新的相位不确定解,将已知的限制Push beyond the previous bound。”Note: The translation is in Simplified Chinese, which is the standard writing system used in mainland China. If you need Traditional Chinese, please let me know.

Logistics Hub Location Optimization: A K-Means and P-Median Model Hybrid Approach Using Road Network Distances

  • paper_url: http://arxiv.org/abs/2308.11038
  • repo_url: None
  • paper_authors: Muhammad Abdul Rahman, Muhammad Aamir Basheer, Zubair Khalid, Muhammad Tahir, Momin Uppal
  • for: 优化快递总站的位置,以提高电商业务的效率和环保性。
  • methods: 该研究使用权重P-Median方法和K-Means clustering方法,将配送点按照其空间位置归类,然后使用P-Median方法决定最佳的快递总站位置。
  • results: 结果显示,使用优化后的快递总站位置,每个配送 distances 可以减少 815(10%)米。
    Abstract Logistic hubs play a pivotal role in the last-mile delivery distance; even a slight increment in distance negatively impacts the business of the e-commerce industry while also increasing its carbon footprint. The growth of this industry, particularly after Covid-19, has further intensified the need for optimized allocation of resources in an urban environment. In this study, we use a hybrid approach to optimize the placement of logistic hubs. The approach sequentially employs different techniques. Initially, delivery points are clustered using K-Means in relation to their spatial locations. The clustering method utilizes road network distances as opposed to Euclidean distances. Non-road network-based approaches have been avoided since they lead to erroneous and misleading results. Finally, hubs are located using the P-Median method. The P-Median method also incorporates the number of deliveries and population as weights. Real-world delivery data from Muller and Phipps (M&P) is used to demonstrate the effectiveness of the approach. Serving deliveries from the optimal hub locations results in the saving of 815 (10%) meters per delivery.
    摘要
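A simplified sketch of the hybrid described above: cluster the delivery points, then choose one hub per cluster as the weighted 1-median over a precomputed distance matrix. Note that the paper clusters on road-network distances, whereas scikit-learn's KMeans uses Euclidean distance, and the data below is synthetic, so this is only an approximation of the pipeline.

```python
# K-Means clustering followed by a weighted P-median hub choice per cluster.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.uniform(0, 10, size=(200, 2))        # delivery point coordinates
weights = rng.integers(1, 20, size=200)           # e.g. number of deliveries per point
# Placeholder for road-network distances; in practice this comes from a routing engine.
road_dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(points)

hubs = []
for c in range(5):
    idx = np.where(labels == c)[0]
    # Weighted 1-median restricted to the cluster: total weighted distance from
    # every delivery point in the cluster to each candidate hub location.
    cost = (road_dist[np.ix_(idx, idx)] * weights[idx][None, :]).sum(axis=1)
    hubs.append(idx[np.argmin(cost)])
print("chosen hub indices:", hubs)
```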

Defending Label Inference Attacks in Split Learning under Regression Setting

  • paper_url: http://arxiv.org/abs/2308.09448
  • repo_url: None
  • paper_authors: Haoze Qiu, Fei Zheng, Chaochao Chen, Xiaolin Zheng
  • for: 本研究主要针对Split Learning下的标签推导攻击,即通过梯度反转方法进行标签推导。
  • methods: 我们提出了两种防御方法:Random Label Extension (RLE) 和 Model-based adaptive Label Extension (MLE)。RLE 方法通过扩展标签信息,防止攻击者通过梯度来训练攻击模型。MLE 方法保留了原始任务的标签信息,并在扩展后的标签中占据主导地位。
  • results: 实验结果表明,相比基础防御方法,我们提出的防御方法可以显著降低攻击模型的表现,同时保持原始任务的表现。
    Abstract As a privacy-preserving method for implementing Vertical Federated Learning, Split Learning has been extensively researched. However, numerous studies have indicated that the privacy-preserving capability of Split Learning is insufficient. In this paper, we primarily focus on label inference attacks in Split Learning under regression setting, which are mainly implemented through the gradient inversion method. To defend against label inference attacks, we propose Random Label Extension (RLE), where labels are extended to obfuscate the label information contained in the gradients, thereby preventing the attacker from utilizing gradients to train an attack model that can infer the original labels. To further minimize the impact on the original task, we propose Model-based adaptive Label Extension (MLE), where original labels are preserved in the extended labels and dominate the training process. The experimental results show that compared to the basic defense methods, our proposed defense methods can significantly reduce the attack model's performance while preserving the original task's performance.
    摘要 在纵向联邦学习中,Split Learning 作为一种隐私保护方法已经得到了广泛研究。然而,许多研究表明,Split Learning 的隐私保护能力并不充分。在本文中,我们主要关注回归设置下 Split Learning 的标签推断攻击,这类攻击通常通过梯度反演方法实现。为了防御标签推断攻击,我们提出 Random Label Extension (RLE),通过扩展标签来掩盖梯度中包含的标签信息,从而防止攻击者利用梯度训练出能够推断原始标签的攻击模型。为了进一步减小对原始任务的影响,我们提出 Model-based adaptive Label Extension (MLE),在扩展后的标签中保留原始标签并使其主导训练过程。实验结果表明,相比基本防御方法,我们提出的防御方法能够显著降低攻击模型的性能,同时保持原始任务的性能。

An Efficient 1 Iteration Learning Algorithm for Gaussian Mixture Model And Gaussian Mixture Embedding For Neural Network

  • paper_url: http://arxiv.org/abs/2308.09444
  • repo_url: None
  • paper_authors: Weiguo Lu, Xuan Wu, Deng Ding, Gangnan Yuan
  • for: 这个论文是为了提出一种基于我们之前的GMM扩展思想的 Gaussian Mixture Model(GMM)学习算法。
  • methods: 这个算法使用了新的GMM扩展思想,具有更高的稳定性和简洁性,并且能够在1轮内学习。我们也提供了一个对于不同参数初始值的确定性证明。
  • results: 我们的GMM扩展方法比经典的期望最大化(EM)算法更具有鲁棒性和精度,并且能够更好地应对数据不确定性和逆问题。最后,我们测试了基于GMM的生成器,并展示了其在分布随机抽样与变化控制方面的潜在应用。
    Abstract We propose an Gaussian Mixture Model (GMM) learning algorithm, based on our previous work of GMM expansion idea. The new algorithm brings more robustness and simplicity than classic Expectation Maximization (EM) algorithm. It also improves the accuracy and only take 1 iteration for learning. We theoretically proof that this new algorithm is guarantee to converge regardless the parameters initialisation. We compare our GMM expansion method with classic probability layers in neural network leads to demonstrably better capability to overcome data uncertainty and inverse problem. Finally, we test GMM based generator which shows a potential to build further application that able to utilized distribution random sampling for stochastic variation as well as variation control.
    摘要 我们提出了一种高斯混合模型(GMM)学习算法,基于我们之前的GMM扩展思想。新算法比经典的期望最大化(EM)算法更加稳健和简洁,只需1轮迭代即可完成学习,并且提高了准确性。我们从理论上证明了无论参数如何初始化,新算法都保证收敛。我们将GMM扩展方法与神经网络中经典的概率层进行比较,结果表明我们的方法在应对数据不确定性和逆问题方面能力更强。最后,我们测试了基于GMM的生成器,它显示出利用分布随机抽样实现随机变化以及变化控制的应用潜力。
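The abstract does not spell out the one-iteration GMM-expansion learner itself, so the sketch below only sets up the classical Expectation-Maximization baseline it is compared against, using scikit-learn on synthetic one-dimensional data.

```python
# Classical EM baseline for a two-component Gaussian mixture (reference only;
# the paper's own one-iteration algorithm is not reproduced here).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 500),
                       rng.normal(1.5, 1.0, 500)]).reshape(-1, 1)

em = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(data)
print("EM means:", em.means_.ravel())
print("EM weights:", em.weights_)
print("average log-likelihood per sample:", em.score(data))
```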

From Hope to Safety: Unlearning Biases of Deep Models by Enforcing the Right Reasons in Latent Space

  • paper_url: http://arxiv.org/abs/2308.09437
  • repo_url: None
  • paper_authors: Maximilian Dreyer, Frederik Pahde, Christopher J. Anders, Wojciech Samek, Sebastian Lapuschkin
  • for: The paper addresses deep neural networks that are prone to learning spurious correlations and biases, with a focus on high-stakes decision-making settings such as medical applications.
  • methods: The paper proposes a novel method for mitigating biases in deep neural networks by reducing the model’s sensitivity towards biases through the gradient. The method uses Concept Activation Vectors to model biases and highlights the importance of choosing robust directions.
  • results: The paper demonstrates the effectiveness of the proposed method in controlling biases in controlled and real-world settings on several datasets, including ISIC, Bone Age, ImageNet, and CelebA, using VGG, ResNet, and EfficientNet architectures.
    Abstract Deep Neural Networks are prone to learning spurious correlations embedded in the training data, leading to potentially biased predictions. This poses risks when deploying these models for high-stake decision-making, such as in medical applications. Current methods for post-hoc model correction either require input-level annotations, which are only possible for spatially localized biases, or augment the latent feature space, thereby hoping to enforce the right reasons. We present a novel method ensuring the right reasons on the concept level by reducing the model's sensitivity towards biases through the gradient. When modeling biases via Concept Activation Vectors, we highlight the importance of choosing robust directions, as traditional regression-based approaches such as Support Vector Machines tend to result in diverging directions. We effectively mitigate biases in controlled and real-world settings on the ISIC, Bone Age, ImageNet and CelebA datasets using VGG, ResNet and EfficientNet architectures.
    摘要
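A hedged sketch of the general idea described above: model a bias as a Concept Activation Vector (CAV) in a latent layer and penalize the gradient of the predicted-class logit along that direction. Here the CAV is the difference of mean activations between concept and non-concept samples (one simple choice of a robust direction); the tiny model, the layer choice, and the penalty weight are illustrative assumptions rather than the paper's exact setup.

```python
# Penalize the model's sensitivity along a bias direction in latent space.
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU())
head = nn.Linear(128, 2)

def compute_cav(acts_concept: torch.Tensor, acts_other: torch.Tensor) -> torch.Tensor:
    # Difference of mean activations as a simple, robust concept direction.
    cav = acts_concept.mean(0) - acts_other.mean(0)
    return cav / cav.norm()

def loss_with_cav_penalty(x, y, cav, lam=1.0):
    z = feat(x)                                  # latent activations
    logits = head(z)
    task_loss = nn.functional.cross_entropy(logits, y)
    # Gradient of the target-class logit with respect to the latent activations.
    grad = torch.autograd.grad(logits.gather(1, y[:, None]).sum(), z, create_graph=True)[0]
    penalty = (grad @ cav).pow(2).mean()         # sensitivity along the bias direction
    return task_loss + lam * penalty

# Dummy usage: concept/non-concept batches would come from images with/without the artifact.
x = torch.rand(8, 3, 32, 32)
y = torch.randint(0, 2, (8,))
cav = compute_cav(feat(torch.rand(16, 3, 32, 32)).detach(),
                  feat(torch.rand(16, 3, 32, 32)).detach())
loss = loss_with_cav_penalty(x, y, cav)
```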

Can ultrasound confidence maps predict sonographers’ labeling variability?

  • paper_url: http://arxiv.org/abs/2308.09433
  • repo_url: None
  • paper_authors: Vanessa Gonzalez Duque, Leonhard Zirus, Yordanka Velikova, Nassir Navab, Diana Mateus
  • for: 这篇论文的目的是提出一种新的深度学习 segmentation 方法,以帮助诊断人员更准确地识别骨内部结构。
  • methods: 该方法使用了 ultrasound 图像中的 confidence map,以帮助深度学习 segmentation 网络更好地识别图像中的结构。
  • results: 研究发现,使用 confidence map 可以提高 segmentation 的准确性,降低 isolated pixel 的预测数量,并且可以更好地识别图像中的异常区域。
    Abstract Measuring cross-sectional areas in ultrasound images is a standard tool to evaluate disease progress or treatment response. Often addressed today with supervised deep-learning segmentation approaches, existing solutions highly depend upon the quality of experts' annotations. However, the annotation quality in ultrasound is anisotropic and position-variant due to the inherent physical imaging principles, including attenuation, shadows, and missing boundaries, commonly exacerbated with depth. This work proposes a novel approach that guides ultrasound segmentation networks to account for sonographers' uncertainties and generate predictions with variability similar to the experts. We claim that realistic variability can reduce overconfident predictions and improve physicians' acceptance of deep-learning cross-sectional segmentation solutions. Our method provides CM's certainty for each pixel for minimal computational overhead as it can be precalculated directly from the image. We show that there is a correlation between low values in the confidence maps and expert's label uncertainty. Therefore, we propose to give the confidence maps as additional information to the networks. We study the effect of the proposed use of ultrasound CMs in combination with four state-of-the-art neural networks and in two configurations: as a second input channel and as part of the loss. We evaluate our method on 3D ultrasound datasets of the thyroid and lower limb muscles. Our results show ultrasound CMs increase the Dice score, improve the Hausdorff and Average Surface Distances, and decrease the number of isolated pixel predictions. Furthermore, our findings suggest that ultrasound CMs improve the penalization of uncertain areas in the ground truth data, thereby improving problematic interpolations. Our code and example data will be made public at https://github.com/IFL-CAMP/Confidence-segmentation.
    摘要 measuring cross-sectional areas in ultrasound images 是评估疾病进展或治疗效果的标准工具。现有的深度学习分割方法常常高度依赖于专家的注释质量。然而, ultrasound 中注释质量存在各向异性和位置变化,由于物理射频原理所带来的吸收、阴影和缺失边界等问题,这些问题通常会在深度中扩大。这项工作提议一种新的方法,使 ultrasound 分割网络考虑到医生的不确定性,并生成与专家的预测类似的结果。我们认为,使用真实的变化可以减少过于自信的预测,提高医生接受深度学习横截分割解决方案的可能性。我们的方法可以在图像直接从图像中计算CM的确定性,无需较大的计算负担。我们发现,CM的低值与专家标签的不确定性之间存在相关性。因此,我们提议将CM作为额外信息传递给网络。我们在四种当前的状态艺术神经网络和两种配置下对 ultrasound CM 的使用进行了研究。我们的结果表明, ultrasound CM 可以提高 dice 分数,改善 Hausdorff 和平均表面距离,并减少孤立像素预测。此外,我们的发现表明, ultrasound CM 可以改善基于真实数据的 interpolations 问题。我们的代码和示例数据将在 GitHub 上公开。
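A minimal sketch of the "confidence map as a second input channel" configuration mentioned above: the ultrasound frame and its precomputed confidence map are stacked along the channel dimension and fed to a network expecting two input channels. The tiny convolutional stack stands in for whatever segmentation backbone is actually used.

```python
# Feeding the ultrasound frame and its confidence map as a two-channel input.
import torch
import torch.nn as nn

seg_net = nn.Sequential(                    # placeholder for a real segmentation model
    nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),                    # single-channel logits
)

image = torch.rand(1, 1, 256, 256)           # B-mode ultrasound frame
confidence_map = torch.rand(1, 1, 256, 256)  # precomputed per-pixel confidence
x = torch.cat([image, confidence_map], dim=1)  # (1, 2, 256, 256)
logits = seg_net(x)
```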

End-to-end topographic networks as models of cortical map formation and human visual behaviour: moving beyond convolutions

  • paper_url: http://arxiv.org/abs/2308.09431
  • repo_url: None
  • paper_authors: Zejin Lu, Adrien Doerig, Victoria Bosch, Bas Krahmer, Daniel Kaiser, Radoslaw M Cichy, Tim C Kietzmann
  • for: 这篇论文旨在理解震撼视系统的演化和功能,以及如何通过计算模型来理解这种排列结构。
  • methods: 这篇论文使用了All-Topographic Neural Networks(All-TNNs)来模拟视觉系统,并通过训练视觉输入数据来实现了一些猴颗粒的特征,如平滑的方向图和大脑内部的倍增。
  • results: 论文表明,All-TNNs在对人类空间偏好的对象识别任务中表现 significatively better than之前的状态对模型,这是因为All-TNNs的排列性质使得它们更能够和人类视觉系统的排列结构相匹配。
    Abstract Computational models are an essential tool for understanding the origin and functions of the topographic organisation of the primate visual system. Yet, vision is most commonly modelled by convolutional neural networks that ignore topography by learning identical features across space. Here, we overcome this limitation by developing All-Topographic Neural Networks (All-TNNs). Trained on visual input, several features of primate topography emerge in All-TNNs: smooth orientation maps and cortical magnification in their first layer, and category-selective areas in their final layer. In addition, we introduce a novel dataset of human spatial biases in object recognition, which enables us to directly link models to behaviour. We demonstrate that All-TNNs significantly better align with human behaviour than previous state-of-the-art convolutional models due to their topographic nature. All-TNNs thereby mark an important step forward in understanding the spatial organisation of the visual brain and how it mediates visual behaviour.
    摘要 计算模型是视觉系统起源和功能理解的重要工具。然而,大多数视觉模型使用卷积神经网络,这些神经网络忽略了空间的特征,通过学习同样的特征来学习视觉数据。在这里,我们解决了这一限制,开发了全面特征神经网络(All-TNNs)。这些神经网络在训练视觉输入数据后显示出了许多非常有趣的特征,包括顺序映射和 cortical 增强在其第一层,以及在最终层中的类选择区域。此外,我们还介绍了一个新的人类空间偏见数据集,该数据集允许我们直接将模型与行为联系起来。我们表明,All-TNNs 比前一代 convolutional 模型更好地适应人类行为,这是因为它们的特征性质。All-TNNs 因此代表了我们理解视觉脑的空间组织和如何通过视觉行为来转化的重要一步。

Towards Understanding the Generalizability of Delayed Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2308.09430
  • repo_url: None
  • paper_authors: Xiaoge Deng, Li Shen, Shengwei Li, Tao Sun, Dongsheng Li, Dacheng Tao
  • for: 这个论文主要研究了异步延迟随机梯度下降(ASGD)在大规模机器学习模型训练中的泛化性表现。
  • methods: 作者使用了生成函数分析工具来研究延迟梯度算法的稳定性,并基于这种稳定性提供了异步延迟随机梯度下降算法的泛化误差Upper bound。
  • results: 作者的理论研究结果表明,异步延迟可以降低延迟随机梯度下降算法的泛化误差。同时,作者还对随机延迟设置进行了相应的分析,并通过实验 validate了他们的理论结论。
    Abstract Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models. However, the generalization performance of asynchronous delayed SGD, which is an essential metric for assessing machine learning algorithms, has rarely been explored. Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization. In this paper, we investigate sharper generalization error bound for SGD with asynchronous delay $\tau$. Leveraging the generating function analysis tool, we first establish the average stability of the delayed gradient algorithm. Based on this algorithmic stability, we provide upper bounds on the generalization error of $\tilde{\mathcal{O}}\big(\frac{T-\tau}{n\tau}\big)$ and $\tilde{\mathcal{O}}\big(\frac{1}{n}\big)$ for quadratic convex and strongly convex problems, respectively, where $T$ refers to the iteration number and $n$ is the amount of training data. Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm. Analogous analysis can be generalized to the random delay setting, and the experimental results validate our theoretical findings.
    摘要
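A toy simulation of the setting analysed above: SGD on a least-squares problem where each update uses the gradient evaluated at the iterate from tau steps earlier, mimicking an asynchronous delayed worker. Problem size, batch size and step size are arbitrary choices for illustration.

```python
# SGD with a fixed gradient delay tau on a synthetic least-squares problem.
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 10)), rng.normal(size=200)

def grad(w, idx):                        # stochastic gradient on a minibatch
    return A[idx].T @ (A[idx] @ w - b[idx]) / len(idx)

tau, lr, w = 5, 0.05, np.zeros(10)
history = [w.copy()]
for t in range(200):
    stale = history[max(0, t - tau)]     # iterate the delayed worker saw
    idx = rng.choice(len(b), size=16, replace=False)
    w = w - lr * grad(stale, idx)
    history.append(w.copy())

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```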

Self-Supervised Single-Image Deconvolution with Siamese Neural Networks

  • paper_url: http://arxiv.org/abs/2308.09426
  • repo_url: None
  • paper_authors: Mikhail Papkov, Kaupo Palo, Leopold Parts
  • for: Image reconstruction from noisy observations, specifically in 3D microscopy deconvolution tasks.
  • methods: Deep learning methods with a Siamese invariance loss and Fast Fourier Transform (FFT) convolutions, which improve upon previous state-of-the-art deconvolution methods with a known point spread function.
  • results: Outperformance of the improved framework compared to previous state-of-the-art deconvolution methods, with improved sharpness and reduced grain.
    Abstract Inverse problems in image reconstruction are fundamentally complicated by unknown noise properties. Classical iterative deconvolution approaches amplify noise and require careful parameter selection for an optimal trade-off between sharpness and grain. Deep learning methods allow for flexible parametrization of the noise and learning its properties directly from the data. Recently, self-supervised blind-spot neural networks were successfully adopted for image deconvolution by including a known point-spread function in the end-to-end training. However, their practical application has been limited to 2D images in the biomedical domain because it implies large kernels that are poorly optimized. We tackle this problem with Fast Fourier Transform convolutions that provide training speed-up in 3D microscopy deconvolution tasks. Further, we propose to adopt a Siamese invariance loss for deconvolution and empirically identify its optimal position in the neural network between blind-spot and full image branches. The experimental results show that our improved framework outperforms the previous state-of-the-art deconvolution methods with a known point spread function.
    摘要
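A small sketch in the spirit of the FFT convolutions mentioned above: convolving a volume with a known point-spread function in the Fourier domain instead of with a large spatial kernel, which is what makes 3D deconvolution training tractable. Shapes and the toy PSF are illustrative.

```python
# Circular convolution of a 3-D volume with a PSF via the FFT.
import torch

def fft_convolve(volume: torch.Tensor, psf: torch.Tensor) -> torch.Tensor:
    """Convolve `volume` (..., D, H, W) with `psf`, PSF centre moved to the origin."""
    shape = volume.shape[-3:]
    psf_padded = torch.zeros(shape)
    psf_padded[: psf.shape[0], : psf.shape[1], : psf.shape[2]] = psf
    psf_padded = torch.roll(psf_padded, shifts=[-(s // 2) for s in psf.shape], dims=(0, 1, 2))
    return torch.fft.irfftn(
        torch.fft.rfftn(volume, dim=(-3, -2, -1))
        * torch.fft.rfftn(psf_padded, dim=(-3, -2, -1)),
        s=shape, dim=(-3, -2, -1),
    )

volume = torch.rand(1, 1, 32, 64, 64)    # (batch, channel, D, H, W)
psf = torch.ones(5, 5, 5) / 125.0        # toy point-spread function
blurred = fft_convolve(volume, psf)
```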

Enhancing Agent Communication and Learning through Action and Language

  • paper_url: http://arxiv.org/abs/2308.10842
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Caselles-Dupré Hugo, Sigaud Olivier, Chetouani Mohamed
  • for: 本研究旨在开发一种新的GC-代理人,能够同时扮演教师和学生的角色,提高交流效率。
  • methods: 该研究采用了动作示范和语言指令,并考虑了人类交流中的教学和实践元素,以提高代理人的教学和学习能力。
  • results: 研究发现,结合动作和语言交流模式可以提高学习效果,并且多模式交流具有优势。
    Abstract We introduce a novel category of GC-agents capable of functioning as both teachers and learners. Leveraging action-based demonstrations and language-based instructions, these agents enhance communication efficiency. We investigate the incorporation of pedagogy and pragmatism, essential elements in human communication and goal achievement, enhancing the agents' teaching and learning capabilities. Furthermore, we explore the impact of combining communication modes (action and language) on learning outcomes, highlighting the benefits of a multi-modal approach.
    摘要 我们介绍了一种新的GC-代理,能够同时扮演教师和学生的角色。通过动作示范和语言指令,这些代理提高了交流效率。我们研究了将教学法与语用学这两个人类交流与目标达成中的关键元素纳入其中,以提高代理的教学和学习能力。此外,我们还研究了将多种交流模式(动作和语言)结合使用对学习结果的影响,并强调了多模态方法的优势。

ICU Mortality Prediction Using Long Short-Term Memory Networks

  • paper_url: http://arxiv.org/abs/2308.12800
  • repo_url: None
  • paper_authors: Manel Mili, Asma Kerkeni, Asma Ben Abdallah, Mohamed Hedi Bedoui
  • for: 预测医院死亡率和医院 lengths of stay (LOS) 的早期预测
  • methods: 使用自动化数据驱动系统,分析大量多变量时间数据,并提取高级信息以预测医院死亡率和LOS
  • results: LSTM 网络模型能够准确预测医院死亡率和LOS,并且可以减少时间帧数以提高临床任务效果
    Abstract Extensive bedside monitoring in Intensive Care Units (ICUs) has resulted in complex temporal data regarding patient physiology, which presents an upscale context for clinical data analysis. In the other hand, identifying the time-series patterns within these data may provide a high aptitude to predict clinical events. Hence, we investigate, during this work, the implementation of an automatic data-driven system, which analyzes large amounts of multivariate temporal data derived from Electronic Health Records (EHRs), and extracts high-level information so as to predict in-hospital mortality and Length of Stay (LOS) early. Practically, we investigate the applicability of LSTM network by reducing the time-frame to 6-hour so as to enhance clinical tasks. The experimental results highlight the efficiency of LSTM model with rigorous multivariate time-series measurements for building real-world prediction engines.
    摘要 重症监护室(ICU)中的大量床旁监测产生了关于患者生理状态的复杂时间序列数据,为临床数据分析提供了更丰富的上下文。另一方面,识别这些数据中的时间序列模式有助于预测临床事件。因此,在本工作中,我们研究了一种自动化的数据驱动系统,该系统分析来自电子健康记录(EHR)的大量多变量时间序列数据,并从中提取高层信息,以便早期预测院内死亡率和住院时长(LOS)。在实践中,我们将时间窗缩短为6小时,以考察LSTM网络在临床任务中的适用性。实验结果表明,基于严格的多变量时间序列测量,LSTM模型能够构建实用的临床预测引擎。
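A minimal sketch of the kind of model described above: an LSTM over a 6-hour window of multivariate ICU measurements with a sigmoid head for in-hospital mortality. Feature count, hidden size and window length are illustrative assumptions, not the paper's configuration.

```python
# LSTM over multivariate ICU time series with a mortality-probability head.
import torch
import torch.nn as nn

class MortalityLSTM(nn.Module):
    def __init__(self, n_features: int = 17, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(x)
        return torch.sigmoid(self.head(h_n[-1]))  # probability of in-hospital death

model = MortalityLSTM()
window = torch.rand(8, 6, 17)                   # 8 patients, 6 hourly steps, 17 variables
risk = model(window)                            # (8, 1) mortality probabilities
```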

Machine-Learning Solutions for the Analysis of Single-Particle Diffusion Trajectories

  • paper_url: http://arxiv.org/abs/2308.09414
  • repo_url: None
  • paper_authors: Henrik Seckler, Janusz Szwabinski, Ralf Metzler
  • for: 本研究旨在探讨现代机器学习技术如何应用于扩散时序序中,以解释记录的动态。
  • methods: 本文综述了最近引入的机器学习方法,包括在异常扩散挑战中取得成功的方法。这些方法受到批评因为缺乏可解性,所以本文强调了包含不确定性估计和特征基于的方法,以提高解释性和提供具体的学习过程的信息。
  • results: 本文分析了不同的非标型数据集中的预测结果,并评论了未来发展的想法。
    Abstract Single-particle traces of the diffusive motion of molecules, cells, or animals are by-now routinely measured, similar to stochastic records of stock prices or weather data. Deciphering the stochastic mechanism behind the recorded dynamics is vital in understanding the observed systems. Typically, the task is to decipher the exact type of diffusion and/or to determine system parameters. The tools used in this endeavor are currently revolutionized by modern machine-learning techniques. In this Perspective we provide an overview over recently introduced methods in machine-learning for diffusive time series, most notably, those successfully competing in the Anomalous-Diffusion-Challenge. As such methods are often criticized for their lack of interpretability, we focus on means to include uncertainty estimates and feature-based approaches, both improving interpretability and providing concrete insight into the learning process of the machine. We expand the discussion by examining predictions on different out-of-distribution data. We also comment on expected future developments.
    摘要 如今,分子、细胞或动物扩散运动的单粒子轨迹已被常规测量,与股票价格或天气数据等随机记录类似。解读记录动态背后的随机机制对于理解被观察系统至关重要。通常,任务是确定扩散的具体类型和/或系统参数。现代机器学习技术正在彻底改变用于完成这一任务的工具。在这篇 Perspective 中,我们概述了最近引入的面向扩散时间序列的机器学习方法,特别是那些在 Anomalous-Diffusion-Challenge 中表现出色的方法。由于这些方法经常被批评缺乏可解释性,我们着重介绍了包含不确定性估计和基于特征的方法,二者都能提高可解释性并提供关于机器学习过程的具体洞见。我们进一步考察了这些方法在分布外数据上的预测表现,并对未来的发展方向作出展望。

Metadata Improves Segmentation Through Multitasking Elicitation

  • paper_url: http://arxiv.org/abs/2308.09411
  • repo_url: None
  • paper_authors: Iaroslav Plutenko, Mikhail Papkov, Kaupo Palo, Leopold Parts, Dmytro Fishman
  • for: 本研究用于探讨Metadata在深度学习方法中的应用,具体来说是在Semantic Segmentation任务中使用Metadata进行改进。
  • methods: 本研究使用了通道调制机制,将Metadata作为 convolutional network 的输入,以提高Semantic Segmentation的结果。
  • results: 研究结果表明,Metadata作为输入可以提高Semantic Segmentation的结果,而且这种改进只需要对现有的模型进行简单修改,不需要增加训练样本或更改网络结构。
    Abstract Metainformation is a common companion to biomedical images. However, this potentially powerful additional source of signal from image acquisition has had limited use in deep learning methods, for semantic segmentation in particular. Here, we incorporate metadata by employing a channel modulation mechanism in convolutional networks and study its effect on semantic segmentation tasks. We demonstrate that metadata as additional input to a convolutional network can improve segmentation results while being inexpensive in implementation as a nimble add-on to popular models. We hypothesize that this benefit of metadata can be attributed to facilitating multitask switching. This aspect of metadata-driven systems is explored and discussed in detail.
    摘要 元数据(metadata)是生物医学影像采集中常见的附带信息。然而,这一潜在有用的额外信号在深度学习方法中(特别是语义分割)的利用仍然有限。在这里,我们通过通道调制机制将元数据引入卷积网络,并研究其对语义分割任务的影响。我们发现,将元数据作为卷积网络的额外输入可以提升分割结果,而且实现代价很低,可以作为流行模型的轻量级附加组件。我们推测,元数据带来的收益可归因于其促进了多任务切换,文中对元数据驱动系统的这一特性进行了详细的探讨。
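One way to realize metadata-driven channel modulation is a FiLM-style layer: a small MLP maps the metadata vector to per-channel scale and shift parameters applied to a convolutional feature map. This is a hedged sketch of that general mechanism; the paper's exact formulation may differ, and all sizes below are assumptions.

```python
# FiLM-style channel modulation conditioned on a metadata vector.
import torch
import torch.nn as nn

class MetadataModulation(nn.Module):
    def __init__(self, meta_dim: int, channels: int):
        super().__init__()
        self.to_scale_shift = nn.Linear(meta_dim, 2 * channels)

    def forward(self, feats, meta):              # feats: (B, C, H, W), meta: (B, meta_dim)
        gamma, beta = self.to_scale_shift(meta).chunk(2, dim=1)
        return feats * (1 + gamma[:, :, None, None]) + beta[:, :, None, None]

feats = torch.rand(4, 32, 64, 64)                # feature maps from a conv block
meta = torch.rand(4, 6)                          # e.g. encoded imaging site / magnification
modulated = MetadataModulation(6, 32)(feats, meta)
```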

Learning MDL logic programs from noisy data

  • paper_url: http://arxiv.org/abs/2308.09393
  • repo_url: None
  • paper_authors: Céline Hocquette, Andreas Niskanen, Matti Järvisalo, Andrew Cropper
  • for: 该论文是为了解决 inductive logic programming 方法在含有噪声数据时学习程序的问题。
  • methods: 该方法使用 minimal description length 来学习从含有噪声数据中的程序,包括回归式程序。
  • results: 我们在多个领域,包括药物设计、游戏撸猫和程序生成中进行了实验,发现我们的方法可以在噪声数据中提高预测精度和扩展至一定程度的噪声。
    Abstract Many inductive logic programming approaches struggle to learn programs from noisy data. To overcome this limitation, we introduce an approach that learns minimal description length programs from noisy data, including recursive programs. Our experiments on several domains, including drug design, game playing, and program synthesis, show that our approach can outperform existing approaches in terms of predictive accuracies and scale to moderate amounts of noise.
    摘要 多种逻辑编程方法在含噪数据上学习程序具有困难,为了解决这一限制,我们提出了一种学习最短描述长度程序从含噪数据中学习,包括循环程序。我们在多个领域,如药物设计、游戏撸抓和程序生成等,进行了实验,结果表明我们的方法可以在适度的噪音量下超过现有方法的预测精度。

FunQuant: A R package to perform quantization in the context of rare events and time-consuming simulations

  • paper_url: http://arxiv.org/abs/2308.10871
  • repo_url: None
  • paper_authors: Charlie Sire, Yann Richet, Rodolphe Le Riche, Didier Rullière, Jérémy Rohmer, Lucie Pheulpin
  • for: 这篇论文是为了描述数据量化的方法和其应用场景。
  • methods: 这篇论文使用 Lloyd’s algorithm,它将数据空间分成Voronoi细分,并基于中心和概率质量来构建离散分布。
  • results: 这篇论文发现,在数据评估是成本高、罕见事件关联的情况下,Lloyd’s algorithm可能会遇到困难,而且单个无事件集中占据大量概率质量。因此,需要使用мета模型和改进的采样方法来增加精度计算的罕见集。
    Abstract Quantization summarizes continuous distributions by calculating a discrete approximation. Among the widely adopted methods for data quantization is Lloyd's algorithm, which partitions the space into Vorono\"i cells, that can be seen as clusters, and constructs a discrete distribution based on their centroids and probabilistic masses. Lloyd's algorithm estimates the optimal centroids in a minimal expected distance sense, but this approach poses significant challenges in scenarios where data evaluation is costly, and relates to rare events. Then, the single cluster associated to no event takes the majority of the probability mass. In this context, a metamodel is required and adapted sampling methods are necessary to increase the precision of the computations on the rare clusters.
    摘要 量化通过计算离散近似来概括连续分布。Lloyd 算法是一种被广泛采用的数据量化方法,它将数据空间划分为 Voronoi 单元(可视为聚类),并基于各单元的中心点及其概率质量构建离散分布。Lloyd 算法在期望距离最小的意义下估计最优中心点,但在数据评估成本高且与罕见事件相关的场景下,这种方法会遇到很大的挑战,此时无事件对应的单个单元会占据绝大部分概率质量。在这种情况下,需要引入元模型,并采用适当的采样方法来提高罕见单元上计算的精度。
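A short Python sketch of the Lloyd-type quantization described above (FunQuant itself is an R package): alternate between assigning samples to their nearest centroid (the Voronoi cells) and recomputing centroids, then report each cell's centroid together with its probability mass. The plain algorithm shown here does not include the metamodel or the adapted sampling needed for rare events.

```python
# Plain Lloyd quantization: centroids plus probabilistic masses of Voronoi cells.
import numpy as np

def lloyd_quantize(samples, n_centroids=5, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centroids = samples[rng.choice(len(samples), n_centroids, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(samples[:, None] - centroids[None, :], axis=-1)
        cells = d.argmin(axis=1)                      # Voronoi cell of each sample
        for k in range(n_centroids):
            if np.any(cells == k):
                centroids[k] = samples[cells == k].mean(axis=0)
    masses = np.bincount(cells, minlength=n_centroids) / len(samples)
    return centroids, masses

samples = np.random.default_rng(1).normal(size=(2000, 2))
centroids, masses = lloyd_quantize(samples)
print(masses)   # probability mass of each Voronoi cell
```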

On Gradient-like Explanation under a Black-box Setting: When Black-box Explanations Become as Good as White-box

  • paper_url: http://arxiv.org/abs/2308.09381
  • repo_url: None
  • paper_authors: Yi Cai, Gerhard Wunder
  • for: 本文旨在提供一种基于梯度估计的解释方法,以便在数据驱动方法中提高解释性。
  • methods: 本文提出了一种名为GEEX的解释方法,该方法可在黑盒设置下提供梯度类型的解释。此外,本文还将GEEX方法与路径方法集成,得到了名为iGEEX的完整解释方法。
  • results: 实验表明,提出的方法可以在图像数据上超越当前黑盒方法的表现,并与完全访问的方法具有相似的性能。
    Abstract Attribution methods shed light on the explainability of data-driven approaches such as deep learning models by revealing the most contributing features to decisions that have been made. A widely accepted way of deriving feature attributions is to analyze the gradients of the target function with respect to input features. Analysis of gradients requires full access to the target system, meaning that solutions of this kind treat the target system as a white-box. However, the white-box assumption may be untenable due to security and safety concerns, thus limiting their practical applications. As an answer to the limited flexibility, this paper presents GEEX (gradient-estimation-based explanation), an explanation method that delivers gradient-like explanations under a black-box setting. Furthermore, we integrate the proposed method with a path method. The resulting approach iGEEX (integrated GEEX) satisfies the four fundamental axioms of attribution methods: sensitivity, insensitivity, implementation invariance, and linearity. With a focus on image data, the exhaustive experiments empirically show that the proposed methods outperform state-of-the-art black-box methods and achieve competitive performance compared to the ones with full access.
    摘要 归因方法通过揭示对决策贡献最大的特征,为深度学习等数据驱动方法提供可解释性。一种被广泛接受的特征归因推导方式是分析目标函数对输入特征的梯度。梯度分析需要对目标系统的完全访问权限,即把目标系统视为白盒。然而,出于安全与保密方面的考虑,白盒假设可能难以成立,从而限制了这类方法的实际应用。为了解决这一局限,本文提出了GEEX(基于梯度估计的解释),一种能够在黑盒设置下给出类梯度解释的方法。此外,我们将该方法与路径方法集成,得到iGEEX(集成GEEX)。所得方法满足归因方法的四个基本公理:敏感性、不敏感性、实现不变性和线性。针对图像数据的大量实验表明,所提出的方法优于当前最先进的黑盒方法,并取得与拥有完全访问权限的方法相当的性能。
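A generic sketch of query-based gradient estimation in the spirit of the setting above: with only black-box access to the model, the gradient of the target score is approximated by probing the input with small Gaussian perturbations. This is not the authors' GEEX estimator; sigma, the query budget and the toy model are assumptions.

```python
# Smoothing-based gradient estimation with only black-box query access.
import numpy as np

def estimate_gradient(predict, x, target_class, n_queries=200, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(x)
    base = predict(x)[target_class]
    for _ in range(n_queries):
        noise = rng.normal(scale=sigma, size=x.shape)
        grad += (predict(x + noise)[target_class] - base) * noise
    return grad / (n_queries * sigma**2)

# Toy black-box model: a fixed linear scorer with softmax outputs.
W = np.random.default_rng(1).normal(size=(3, 64))
def predict(x):
    z = W @ x.ravel()
    e = np.exp(z - z.max())
    return e / e.sum()

saliency = estimate_gradient(predict, np.random.rand(8, 8), target_class=0)
```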

Deciphering knee osteoarthritis diagnostic features with explainable artificial intelligence: A systematic review

  • paper_url: http://arxiv.org/abs/2308.09380
  • repo_url: None
  • paper_authors: Yun Xin Teoh, Alice Othmani, Siew Li Goh, Juliana Usman, Khin Wee Lai
  • for: 提高膝关节炎诊断的可靠性和可读性,推广人工智能技术在医疗领域的应用。
  • methods: 本文首次对膝关节炎诊断使用可解释人工智能技术进行了评估,从数据可解释性和模型可解释性两个角度进行了分析。
  • results: 研究发现,可解释人工智能技术可以提高膝关节炎诊断的可靠性和可读性,并且可以增强医生对模型预测的信任感。
    Abstract Existing artificial intelligence (AI) models for diagnosing knee osteoarthritis (OA) have faced criticism for their lack of transparency and interpretability, despite achieving medical-expert-like performance. This opacity makes them challenging to trust in clinical practice. Recently, explainable artificial intelligence (XAI) has emerged as a specialized technique that can provide confidence in the model's prediction by revealing how the prediction is derived, thus promoting the use of AI systems in healthcare. This paper presents the first survey of XAI techniques used for knee OA diagnosis. The XAI techniques are discussed from two perspectives: data interpretability and model interpretability. The aim of this paper is to provide valuable insights into XAI's potential towards a more reliable knee OA diagnosis approach and encourage its adoption in clinical practice.
    摘要 现有的人工智能(AI)模型用于诊断膝关节病(OA)已经受到了不透明性和解释性的批评,尽管它们达到了医疗专业人员水平。这种透明性使得它们在临床实践中具有挑战。最近,可解释的人工智能(XAI)技术在医疗领域出现了,它可以为模型预测提供信任度,并且解释预测是如何 derivation。这篇论文是首次对膝关节病诊断中使用XAI技术进行了报告。本文从数据可解释性和模型可解释性两个角度来讨论XAI技术,旨在为读者提供XAI在膝关节病诊断方面的可靠性和可信度,并促进XAI在临床实践中的应用。

Deep Learning Techniques in Extreme Weather Events: A Review

  • paper_url: http://arxiv.org/abs/2308.10995
  • repo_url: None
  • paper_authors: Shikha Verma, Kuldeep Srivastava, Akhilesh Tiwari, Shekhar Verma
  • for: 本评论旨在提供气象预报领域深度学习的现状报告,探讨深度学习在气象预报中的应用和发展趋势。
  • methods: 本评论总结了各种深度学习架构在气象预报中的应用,包括风暴、雷电、降水、旱情、热波、寒波等方面的应用。
  • results: 本评论指出了深度学习在气象预报中的优势,包括能够捕捉复杂的模式和非线性关系,并且对现有方法存在一些限制。
    Abstract Extreme weather events pose significant challenges, thereby demanding techniques for accurate analysis and precise forecasting to mitigate its impact. In recent years, deep learning techniques have emerged as a promising approach for weather forecasting and understanding the dynamics of extreme weather events. This review aims to provide a comprehensive overview of the state-of-the-art deep learning in the field. We explore the utilization of deep learning architectures, across various aspects of weather prediction such as thunderstorm, lightning, precipitation, drought, heatwave, cold waves and tropical cyclones. We highlight the potential of deep learning, such as its ability to capture complex patterns and non-linear relationships. Additionally, we discuss the limitations of current approaches and highlight future directions for advancements in the field of meteorology. The insights gained from this systematic review are crucial for the scientific community to make informed decisions and mitigate the impacts of extreme weather events.
    摘要 极端天气事件带来重大挑战,需要精准的分析和预测方法来减轻其影响。近年来,深度学习技术在天气预测和极端天气事件动力学理解方面emerged as a promising approach。本文提供了天气预测领域的深度学习状态评估,探讨了不同天气元素的适用深度学习架构,包括雨夹、闪电、降水、旱情、热潮、冰潮和热带风暴。我们强调了深度学习的可能性,如其能够捕捉复杂的模式和非线性关系。此外,我们还讨论了当前方法的局限性,并提出了未来的发展方向,以便在天气预测领域取得更大的进步。这些系统性评估的结论对科学社区的决策具有重要意义,以减轻极端天气事件的影响。

Image Processing and Machine Learning for Hyperspectral Unmixing: An Overview and the HySUPP Python Package

  • paper_url: http://arxiv.org/abs/2308.09375
  • repo_url: https://github.com/behnoodrasti/hysupp
  • paper_authors: Behnood Rasti, Alexandre Zouaoui, Julien Mairal, Jocelyn Chanussot
  • for: This paper provides an overview of advanced and conventional unmixing approaches for hyperspectral image analysis, and compares their performance on various datasets.
  • methods: The paper discusses linear unmixing techniques, including supervised, semi-supervised, and unsupervised (blind) methods, and their applications in hyperspectral image analysis.
  • results: The experimental results show the advantages of different unmixing categories for different unmixing scenarios, and provide an open-source Python-based package for reproducing the results.
    Abstract Spectral pixels are often a mixture of the pure spectra of the materials, called endmembers, due to the low spatial resolution of hyperspectral sensors, double scattering, and intimate mixtures of materials in the scenes. Unmixing estimates the fractional abundances of the endmembers within the pixel. Depending on the prior knowledge of endmembers, linear unmixing can be divided into three main groups: supervised, semi-supervised, and unsupervised (blind) linear unmixing. Advances in Image processing and machine learning substantially affected unmixing. This paper provides an overview of advanced and conventional unmixing approaches. Additionally, we draw a critical comparison between advanced and conventional techniques from the three categories. We compare the performance of the unmixing techniques on three simulated and two real datasets. The experimental results reveal the advantages of different unmixing categories for different unmixing scenarios. Moreover, we provide an open-source Python-based package available at https://github.com/BehnoodRasti/HySUPP to reproduce the results.
    摘要 spectral pixels 常常是Materials的纯谱的混合物,称为Endmember,由于遥感器的低空间分辨率、双折射和场景中Materials的密切混合,导致了这种混合。混合计算Endmember内Pixel中的含量。根据Endmember的先知情况,线性混合可以分为三类:有监督、半监督和无监督(盲目)线性混合。图像处理和机器学习技术的进步对混合有很大影响。本文提供了高级和传统混合方法的总览,并对这些方法进行了严格的比较。我们在三个模拟 dataset和两个实际 dataset上进行了比较性研究,研究结果表明不同混合类型在不同混合场景中的优势。此外,我们还提供了一个可以在 上下载的开源Python包,以便重现结果。
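As a small, concrete instance of the supervised linear-unmixing setting discussed above, the sketch below estimates non-negative abundances for one pixel from known endmember spectra with non-negative least squares and renormalizes them to sum to one. Real pipelines (and the HySUPP package) include many more methods; the data here is synthetic.

```python
# Supervised linear unmixing of a single pixel with known endmembers.
import numpy as np
from scipy.optimize import nnls

bands, n_end = 50, 3
rng = np.random.default_rng(0)
E = rng.random((bands, n_end))                   # endmember spectra as columns
true_a = np.array([0.6, 0.3, 0.1])
pixel = E @ true_a + 0.01 * rng.normal(size=bands)

a, _ = nnls(E, pixel)                            # non-negativity constraint
a = a / a.sum()                                  # enforce (approximate) sum-to-one
print("estimated abundances:", a)
```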

Noise Sensitivity and Stability of Deep Neural Networks for Binary Classification

  • paper_url: http://arxiv.org/abs/2308.09374
  • repo_url: None
  • paper_authors: Johan Jonasson, Jeffrey E. Steif, Olof Zetterqvist
  • for: 研究深度神经网络(DNN)分类器的不稳定性现象,从布尔函数的角度出发,检查某些布尔函数表示的常见DNN模型是否对随机噪声敏感或稳定。
  • methods: 使用布尔函数理论中的随机噪声敏感和稳定性概念,对常见的批处理和卷积模型进行分析和研究。
  • results: 对于两种标准的DNN模型——批处理和卷积模型——在初始化为高斯权重情况下,研究其在随机噪声下的性质和特性。
    Abstract A first step is taken towards understanding often observed non-robustness phenomena of deep neural net (DNN) classifiers. This is done from the perspective of Boolean functions by asking if certain sequences of Boolean functions represented by common DNN models are noise sensitive or noise stable, concepts defined in the Boolean function literature. Due to the natural randomness in DNN models, these concepts are extended to annealed and quenched versions. Here we sort out the relation between these definitions and investigate the properties of two standard DNN architectures, the fully connected and convolutional models, when initiated with Gaussian weights.
    摘要 本文迈出了理解深度神经网络(DNN)分类器中常见的非鲁棒性现象的第一步。我们从布尔函数的角度出发,考察由常见DNN模型表示的某些布尔函数序列是对噪声敏感还是噪声稳定,这两个概念来自布尔函数理论。由于DNN模型固有的随机性,这些概念被推广到退火(annealed)和淬火(quenched)两种版本。我们厘清了这些定义之间的关系,并研究了两种标准DNN架构(全连接模型和卷积模型)在以高斯权重初始化时的性质。
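A toy sketch of the Boolean-function notion used above: estimate the noise sensitivity of a binary classifier by flipping each input bit independently with probability eps and measuring how often the prediction changes. The majority-vote classifier is only a stand-in for a trained DNN.

```python
# Monte-Carlo estimate of noise sensitivity for a Boolean classifier.
import numpy as np

def noise_sensitivity(classifier, n_bits=100, eps=0.05, n_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_samples, n_bits))
    flips = rng.random(x.shape) < eps            # flip each bit with probability eps
    x_noisy = np.where(flips, 1 - x, x)
    return np.mean(classifier(x) != classifier(x_noisy))

majority = lambda x: (x.sum(axis=1) > x.shape[1] / 2).astype(int)
print("estimated noise sensitivity:", noise_sensitivity(majority))
```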

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

  • paper_url: http://arxiv.org/abs/2308.09372
  • repo_url: https://github.com/tobna/whattransformertofavor
  • paper_authors: Tobias Christian Nauen, Sebastian Palacio, Andreas Dengel
  • for: 本研究旨在对多种效率准备的视图变换器和相关架构进行全面的分析,以提供一个公共的基准,从而为实践者和研究人员提供有价值的指导。
  • methods: 本研究使用了多种性能指标来评估多种效率准备的视图变换器和相关架构,包括ViT和其他一些替代方案。
  • results: 研究发现,尽管存在多种宣称更高效的方法,但ViT仍然在多个效率指标上保持Pareto优化的状态,而且混合注意力-CNN模型在具有低执行内存和参数量的情况下表现特别好。此外,研究还发现了图像大小与模型大小之间的关系,以及计算Memory和训练内存之间的正相关关系。
    Abstract The growing popularity of Vision Transformers as the go-to models for image classification has led to an explosion of architectural modifications claiming to be more efficient than the original ViT. However, a wide diversity of experimental conditions prevents a fair comparison between all of them, based solely on their reported results. To address this gap in comparability, we conduct a comprehensive analysis of more than 30 models to evaluate the efficiency of vision transformers and related architectures, considering various performance metrics. Our benchmark provides a comparable baseline across the landscape of efficiency-oriented transformers, unveiling a plethora of surprising insights. For example, we discover that ViT is still Pareto optimal across multiple efficiency metrics, despite the existence of several alternative approaches claiming to be more efficient. Results also indicate that hybrid attention-CNN models fare particularly well when it comes to low inference memory and number of parameters, and also that it is better to scale the model size, than the image size. Furthermore, we uncover a strong positive correlation between the number of FLOPS and the training memory, which enables the estimation of required VRAM from theoretical measurements alone. Thanks to our holistic evaluation, this study offers valuable insights for practitioners and researchers, facilitating informed decisions when selecting models for specific applications. We publicly release our code and data at https://github.com/tobna/WhatTransformerToFavor
    摘要 “Vision Transformer 的崛起使得大量架构改良方案宣称比原始 ViT 更有效率,但实验条件的多样性使得仅凭各自报告的结果难以公平比较。为了解决这一问题,我们对 30 多个模型进行了全面分析,以多种性能指标评估视觉 Transformer 及相关架构的效率。我们的基准为各类效率导向的 Transformer 提供了可比较的基线,并揭示了许多出人意料的发现。例如,我们发现即使存在许多号称更有效率的替代方案,ViT 在多个效率指标上仍然是 Pareto 最优。结果还显示,混合注意力-CNN 模型在推理内存与参数量较低时表现特别好,而且扩大模型尺寸比增大图像尺寸更为划算。此外,我们发现 FLOPS 数量与训练内存之间存在很强的正相关,因此仅凭理论量测即可估算所需的 VRAM。得益于这种全面的评估,本研究为实践者与研究人员提供了宝贵的见解,帮助他们为特定应用选择合适的模型。我们将代码和数据公开发布在 https://github.com/tobna/WhatTransformerToFavor。”

A tailored Handwritten-Text-Recognition System for Medieval Latin

  • paper_url: http://arxiv.org/abs/2308.09368
  • repo_url: None
  • paper_authors: Philipp Koch, Gilary Vera Nuñez, Esteban Garces Arias, Christian Heumann, Matthias Schöffel, Alexander Häberlin, Matthias Aßenmacher
  • for: This paper aims to digitize the Medieval Latin Dictionary by using Handwritten Text Recognition (HTR) to transcribe handwritten lemmas on record cards.
  • methods: The authors employ two state-of-the-art image segmentation models to prepare the initial data set, and experiment with different transformer-based models and data augmentation techniques to improve the HTR performance.
  • results: The best-performing setup achieved a Character Error Rate (CER) of 0.015, which is superior to the commercial Google Cloud Vision model and shows more stable performance.
    Abstract The Bavarian Academy of Sciences and Humanities aims to digitize its Medieval Latin Dictionary. This dictionary entails record cards referring to lemmas in medieval Latin, a low-resource language. A crucial step of the digitization process is the Handwritten Text Recognition (HTR) of the handwritten lemmas found on these record cards. In our work, we introduce an end-to-end pipeline, tailored to the medieval Latin dictionary, for locating, extracting, and transcribing the lemmas. We employ two state-of-the-art (SOTA) image segmentation models to prepare the initial data set for the HTR task. Furthermore, we experiment with different transformer-based models and conduct a set of experiments to explore the capabilities of different combinations of vision encoders with a GPT-2 decoder. Additionally, we also apply extensive data augmentation resulting in a highly competitive model. The best-performing setup achieved a Character Error Rate (CER) of 0.015, which is even superior to the commercial Google Cloud Vision model, and shows more stable performance.
    摘要 《巴伐利亚科学院计划将中世纪拉丁词典数字化。这个词典包含手写记录卡上的中世纪拉丁词汇,这是一种低资源语言。我们的工作是设计一个端到端管道,用于在记录卡上找到、提取和转写词汇。我们使用两种最新的图像分割模型来准备初始数据集,并运用多种变换器模型和GPT-2解码器进行实验,以探索不同组合的可能性。此外,我们还进行了广泛的数据增强,实现了非常竞争力强的模型。最佳配置达到了字符错误率(CER)0.015,超越了商业Google云视觉模型,表现更稳定。》

On the Approximation of Bi-Lipschitz Maps by Invertible Neural Networks

  • paper_url: http://arxiv.org/abs/2308.09367
  • repo_url: None
  • paper_authors: Bangti Jin, Zehui Zhou, Jun Zou
  • for: 这个论文主要是为了研究嵌入型神经网络(INNs)的表达能力和应用场景。
  • methods: 论文使用了 coupling-based INNs 来表示bi-Lipschitz连续函数的映射,并提出了一种基于模型缩放、主成分分析和 INNs 的方法来近似无穷维空间上的bi-Lipschitz映射。
  • results: 论文显示了 coupling-based INNs 可以同时良好地近似前向和反向映射,并提出了一种可以同时近似前向和反向映射的方法。此外,论文还进行了初步的数值实验,并表明了该方法的可行性。
    Abstract Invertible neural networks (INNs) represent an important class of deep neural network architectures that have been widely used in several applications. The universal approximation properties of INNs have also been established recently. However, the approximation rate of INNs is largely missing. In this work, we provide an analysis of the capacity of a class of coupling-based INNs to approximate bi-Lipschitz continuous mappings on a compact domain, and the result shows that it can well approximate both forward and inverse maps simultaneously. Furthermore, we develop an approach for approximating bi-Lipschitz maps on infinite-dimensional spaces that simultaneously approximate the forward and inverse maps, by combining model reduction with principal component analysis and INNs for approximating the reduced map, and we analyze the overall approximation error of the approach. Preliminary numerical results show the feasibility of the approach for approximating the solution operator for parameterized second-order elliptic problems.
    摘要 可逆神经网络(INNs)是一类重要的深度神经网络架构,已被广泛应用于多种场景,其通用逼近性质最近也已得到证明。然而,关于INNs逼近速率的结果仍然很少。在本工作中,我们分析了一类基于耦合(coupling)的INNs在紧致区域上逼近双Lipschitz连续映射的能力,结果表明它能够同时较好地逼近前向映射与逆映射。此外,我们提出了一种在无穷维空间上同时逼近前向与逆向双Lipschitz映射的方法:将模型降阶与主成分分析相结合,并用INNs逼近降阶后的映射,同时分析了该方法的总体逼近误差。初步数值结果表明,该方法可用于逼近带参数二阶椭圆问题的解算子。
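A compact sketch of the coupling-based construction analysed above: an affine coupling layer transforms half of its input conditioned on the other half and admits an exact inverse, which is what lets a coupling-based INN approximate forward and inverse maps simultaneously. Dimensions and the conditioner network are illustrative.

```python
# One affine coupling layer with its exact inverse.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(),
                                 nn.Linear(64, dim))   # predicts log-scale and shift

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        log_s, t = self.net(x1).chunk(2, dim=-1)
        return torch.cat([x1, x2 * torch.exp(log_s) + t], dim=-1)

    def inverse(self, y):
        y1, y2 = y.chunk(2, dim=-1)
        log_s, t = self.net(y1).chunk(2, dim=-1)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=-1)

layer = AffineCoupling(dim=8)
x = torch.randn(4, 8)
assert torch.allclose(layer.inverse(layer(x)), x, atol=1e-5)  # exact invertibility
```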

Multi-feature concatenation and multi-classifier stacking: an interpretable and generalizable machine learning method for MDD discrimination with rsfMRI

  • paper_url: http://arxiv.org/abs/2308.09360
  • repo_url: None
  • paper_authors: Yunsong Luo, Wenyu Chen, Ling Zhan, Jiang Qiu, Tao Jia
  • for: The paper aims to improve the accuracy of diagnosing major depressive disorder (MDD) using resting-state functional MRI (rsfMRI) and machine learning algorithms.
  • methods: The paper proposes a machine learning method called Multi-Feature Multi-Classifier (MFMC) that concatenates multiple features and stacks multiple classifiers to discriminate MDD patients from normal controls. The method is tested on the REST-meta-MDD data set, which contains 2428 subjects from 25 different sites.
  • results: The paper reports that MFMC achieves 96.9% MDD discrimination accuracy, outperforming existing methods. The method is also validated for its generalizability by showing good performance when the training and testing subjects are from independent sites. Additionally, the paper identifies 13 feature values related to 9 brain regions that contribute most to the classification and demonstrate significant differences at the group level.
    Abstract Major depressive disorder is a serious and heterogeneous psychiatric disorder that needs accurate diagnosis. Resting-state functional MRI (rsfMRI), which captures multiple perspectives on brain structure, function, and connectivity, is increasingly applied in the diagnosis and pathological research of mental diseases. Different machine learning algorithms are then developed to exploit the rich information in rsfMRI and discriminate MDD patients from normal controls. Despite recent advances reported, the discrimination accuracy has room for further improvement. The generalizability and interpretability of the method are not sufficiently addressed either. Here, we propose a machine learning method (MFMC) for MDD discrimination by concatenating multiple features and stacking multiple classifiers. MFMC is tested on the REST-meta-MDD data set that contains 2428 subjects collected from 25 different sites. MFMC yields 96.9% MDD discrimination accuracy, demonstrating a significant improvement over existing methods. In addition, the generalizability of MFMC is validated by the good performance when the training and testing subjects are from independent sites. The use of XGBoost as the meta classifier allows us to probe the decision process of MFMC. We identify 13 feature values related to 9 brain regions including the posterior cingulate gyrus, superior frontal gyrus orbital part, and angular gyrus, which contribute most to the classification and also demonstrate significant differences at the group level. The use of these 13 feature values alone can reach 87% of MFMC's full performance when taking all feature values. These features may serve as clinically useful diagnostic and prognostic biomarkers for mental disorders in the future.
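The stacking idea above maps naturally onto standard tooling. Below is a minimal sketch (not the authors' code) of feature concatenation followed by classifier stacking with an XGBoost meta-classifier; the feature blocks, base learners, and dimensions are illustrative assumptions.

```python
# Minimal sketch: multi-feature concatenation + multi-classifier stacking
# with an XGBoost meta-classifier, in the spirit of MFMC.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Hypothetical rsfMRI-derived feature blocks; shapes are illustrative.
n_subjects = 200
feat_connectivity = np.random.randn(n_subjects, 300)
feat_regional = np.random.randn(n_subjects, 100)
y = np.random.randint(0, 2, size=n_subjects)   # MDD vs. control labels

# Step 1: multi-feature concatenation.
X = np.concatenate([feat_connectivity, feat_regional], axis=1)

# Step 2: multi-classifier stacking; base learners feed an XGBoost meta classifier.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=XGBClassifier(eval_metric="logloss"),
    cv=5,
)
stack.fit(X, y)
print("training accuracy:", stack.score(X, y))
```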

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

  • paper_url: http://arxiv.org/abs/2308.09351
  • repo_url: https://github.com/jacobyuan7/rlipv2
  • paper_authors: Hangjie Yuan, Shiwei Zhang, Xiang Wang, Samuel Albanie, Yining Pan, Tao Feng, Jianwen Jiang, Dong Ni, Yingya Zhang, Deli Zhao
  • For: The paper proposes RLIPv2, a model designed to improve relational reasoning in computer vision tasks. RLIPv2 converges quickly during pre-training and fine-tuning and can be pre-trained on large-scale pseudo-labelled scene graph data.
  • Methods: RLIPv2 uses Asymmetric Language-Image Fusion (ALIF), a mechanism that enables earlier and deeper gated cross-modal fusion with sparsified language encoding layers. In addition, RLIPv2 uses a Relation Tagger to assign BLIP-generated relation texts to region pairs, producing free-form relation labels at scale.
  • Results: Extensive experiments on Human-Object Interaction Detection and Scene Graph Generation show state-of-the-art performance under fully-finetuning, few-shot and zero-shot settings. Notably, the largest RLIPv2 reaches 23.29 mAP on HICO-DET without any fine-tuning, 32.22 mAP with only 1% of the data, and 45.09 mAP with 100% of the data.
    Abstract Relational Language-Image Pre-training (RLIP) aims to align vision representations with relational texts, thereby advancing the capability of relational reasoning in computer vision tasks. However, hindered by the slow convergence of RLIPv1 architecture and the limited availability of existing scene graph data, scaling RLIPv1 is challenging. In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data. To enable fast scaling, RLIPv2 introduces Asymmetric Language-Image Fusion (ALIF), a mechanism that facilitates earlier and deeper gated cross-modal fusion with sparsified language encoding layers. ALIF leads to comparable or better performance than RLIPv1 in a fraction of the time for pre-training and fine-tuning. To obtain scene graph data at scale, we extend object detection datasets with free-form relation labels by introducing a captioner (e.g., BLIP) and a designed Relation Tagger. The Relation Tagger assigns BLIP-generated relation texts to region pairs, thus enabling larger-scale relational pre-training. Through extensive experiments conducted on Human-Object Interaction Detection and Scene Graph Generation, RLIPv2 shows state-of-the-art performance on three benchmarks under fully-finetuning, few-shot and zero-shot settings. Notably, the largest RLIPv2 achieves 23.29mAP on HICO-DET without any fine-tuning, yields 32.22mAP with just 1% data and yields 45.09mAP with 100% data. Code and models are publicly available at https://github.com/JacobYuan7/RLIPv2.

Denoising diffusion-based MR to CT image translation enables whole spine vertebral segmentation in 2D and 3D without manual annotations

  • paper_url: http://arxiv.org/abs/2308.09345
  • repo_url: https://github.com/robert-graf/readable-conditional-denoising-diffusion
  • paper_authors: Robert Graf, Joachim Schmitt, Sarah Schlaeger, Hendrik Kristian Möller, Vasiliki Sideri-Lampretsa, Anjany Sekuboyina, Sandro Manuel Krieg, Benedikt Wiestler, Bjoern Menze, Daniel Rueckert, Jan Stefan Kirschke
  • for: The study aims to translate MR images into CT images to enable whole-spine vertebral segmentation and support diagnosis of spinal disorders.
  • methods: Two families of approaches are compared: paired image-to-image translation (Pix2Pix and denoising diffusion implicit models, DDIM) with landmark-based registration to align image pairs, and unpaired translation (contrastive unpaired translation and SynDiff).
  • results: Paired methods and SynDiff showed similar translation performance and Dice scores on paired data, with the DDIM image mode achieving the highest image quality; SynDiff, Pix2Pix and the DDIM image mode all reached a Dice score of 0.77. For craniocaudal axis rotations, at least two landmarks per vertebra were required for registration. The 3D translation outperformed the 2D approach, yielding higher Dice scores and higher-resolution segmentations.
    Abstract Background: Automated segmentation of spinal MR images plays a vital role both scientifically and clinically. However, accurately delineating posterior spine structures presents challenges. Methods: This retrospective study, approved by the ethical committee, involved translating T1w and T2w MR image series into CT images in a total of n=263 pairs of CT/MR series. Landmark-based registration was performed to align image pairs. We compared 2D paired (Pix2Pix, denoising diffusion implicit models (DDIM) image mode, DDIM noise mode) and unpaired (contrastive unpaired translation, SynDiff) image-to-image translation using "peak signal to noise ratio" (PSNR) as quality measure. A publicly available segmentation network segmented the synthesized CT datasets, and Dice scores were evaluated on in-house test sets and the "MRSpineSeg Challenge" volumes. The 2D findings were extended to 3D Pix2Pix and DDIM. Results: 2D paired methods and SynDiff exhibited similar translation performance and Dice scores on paired data. DDIM image mode achieved the highest image quality. SynDiff, Pix2Pix, and DDIM image mode demonstrated similar Dice scores (0.77). For craniocaudal axis rotations, at least two landmarks per vertebra were required for registration. The 3D translation outperformed the 2D approach, resulting in improved Dice scores (0.80) and anatomically accurate segmentations in a higher resolution than the original MR image. Conclusion: Two landmarks per vertebra registration enabled paired image-to-image translation from MR to CT and outperformed all unpaired approaches. The 3D techniques provided anatomically correct segmentations, avoiding underprediction of small structures like the spinous process.
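The study uses peak signal-to-noise ratio (PSNR) as its image-quality measure. The helper below is the standard PSNR definition written out for clarity; the array shapes and data-range handling are illustrative, not taken from the paper.

```python
# Standard PSNR (in dB) between a reference CT and a synthesized CT.
import numpy as np

def psnr(reference, synthesized, data_range=None):
    ref = reference.astype(np.float64)
    syn = synthesized.astype(np.float64)
    if data_range is None:
        data_range = ref.max() - ref.min()
    mse = np.mean((ref - syn) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((data_range ** 2) / mse)

# Example on random volumes (placeholders for registered CT / MR-derived images).
ct = np.random.rand(64, 64, 32)
fake_ct = ct + 0.05 * np.random.randn(64, 64, 32)
print(f"PSNR: {psnr(ct, fake_ct):.2f} dB")
```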

Surprise machines: revealing Harvard Art Museums’ image collection

  • paper_url: http://arxiv.org/abs/2308.09343
  • repo_url: None
  • paper_authors: Dario Rodighiero, Lins Derry, Douglas Duhaime, Jordan Kruguer, Maximilian C. Mueller, Christopher Pietsch, Jeffrey T. Schnapp, Jeff Steward
  • for: The project uses artificial intelligence to visualize the entire image collection of the Harvard Art Museums, opening up unexpected vistas on more than 200,000 objects that are usually inaccessible to visitors.
  • methods: A choreographic interface was designed that connects visitors' movement with several unique views of the collection, exploring the limits of AI to display a large set of images.
  • results: The project succeeds in displaying a large set of images, creating a feeling of surprise among visitors and offering new perspectives on more than 200,000 otherwise inaccessible objects.
    Abstract Surprise Machines is a project of experimental museology that sets out to visualize the entire image collection of the Harvard Art Museums, intending to open up unexpected vistas on more than 200,000 objects usually inaccessible to visitors. Part of the exhibition Curatorial A(i)gents organized by metaLAB (at) Harvard, the project explores the limits of artificial intelligence to display a large set of images and create surprise among visitors. To achieve such a feeling of surprise, a choreographic interface was designed to connect the audience's movement with several unique views of the collection.

Document Automation Architectures: Updated Survey in Light of Large Language Models

  • paper_url: http://arxiv.org/abs/2308.09341
  • repo_url: None
  • paper_authors: Mohammad Ahmadi Achachlouei, Omkar Patil, Tarun Joshi, Vijayan N. Nair
  • for: 这篇论文主要是为了对文档自动化(DA)的当前状况进行评估和概述。
  • methods: 本论文使用了学术研究的 DA 架构和技术来对不同来源的输入自动创建和组合,并且使用了定义模板来生成符合要求的文档。
  • results: 本论文通过对学术研究的 DA 架构和技术进行评估和概述,提供了更清晰的 DA 定义和特征,并且预测了基于生成 AI 和大语言模型的新研究机遇。
    Abstract This paper surveys the current state of the art in document automation (DA). The objective of DA is to reduce the manual effort during the generation of documents by automatically creating and integrating input from different sources and assembling documents conforming to defined templates. There have been reviews of commercial solutions of DA, particularly in the legal domain, but to date there has been no comprehensive review of the academic research on DA architectures and technologies. The current survey of DA reviews the academic literature and provides a clearer definition and characterization of DA and its features, identifies state-of-the-art DA architectures and technologies in academic research, and provides ideas that can lead to new research opportunities within the DA field in light of recent advances in generative AI and large language models.

Causal Interpretable Progression Trajectory Analysis of Chronic Disease

  • paper_url: http://arxiv.org/abs/2308.09735
  • repo_url: None
  • paper_authors: Zhoujian Sun, Wenzhuo Zhang, Zhengxing Huang, Nai Ding
  • for: Predicting disease progression trajectories and supporting clinical decisions
  • methods: Combining trajectory prediction and causal discovery
  • results: Providing accurate predictions of disease progression trajectories and uncovering causal relationships, enhancing the interpretability of the model.
    Abstract Chronic disease is the leading cause of death, emphasizing the need for accurate prediction of disease progression trajectories and informed clinical decision-making. Machine learning (ML) models have shown promise in this domain by capturing non-linear patterns within patient features. However, existing ML-based models lack the ability to provide causal interpretable predictions and estimate treatment effects, limiting their decision-assisting perspective. In this study, we propose a novel model called causal trajectory prediction (CTP) to tackle the limitation. The CTP model combines trajectory prediction and causal discovery to enable accurate prediction of disease progression trajectories and uncovering causal relationships between features. By incorporating a causal graph into the prediction process, CTP ensures that ancestor features are not influenced by treatment on descendant features, thereby enhancing the interpretability of the model. By estimating the bounds of treatment effects, even in the presence of unmeasured confounders, the CTP provides valuable insights for clinical decision-making. We evaluate the performance of the CTP using simulated and real medical datasets. Experimental results demonstrate that our model achieves satisfactory performance, highlighting its potential to assist clinical decisions.

Towards Attack-tolerant Federated Learning via Critical Parameter Analysis

  • paper_url: http://arxiv.org/abs/2308.09318
  • repo_url: https://github.com/sungwon-han/fedcpa
  • paper_authors: Sungwon Han, Sungwon Park, Fangzhao Wu, Sundong Kim, Bin Zhu, Xing Xie, Meeyoung Cha
  • for: The paper proposes an attack-tolerant aggregation method to defend against poisoning attacks when training a shared model in federated learning.
  • methods: The method, Federated learning with Critical Parameter Analysis (FedCPA), compares the top-k and bottom-k critical parameters of local models to distinguish benign from poisoned updates.
  • results: Experiments with different attack scenarios on multiple datasets show that FedCPA defends against poisoning attacks more effectively than existing defense strategies.
    Abstract Federated learning is used to train a shared model in a decentralized way without clients sharing private data with each other. Federated learning systems are susceptible to poisoning attacks when malicious clients send false updates to the central server. Existing defense strategies are ineffective under non-IID data settings. This paper proposes a new defense strategy, FedCPA (Federated learning with Critical Parameter Analysis). Our attack-tolerant aggregation method is based on the observation that benign local models have similar sets of top-k and bottom-k critical parameters, whereas poisoned local models do not. Experiments with different attack scenarios on multiple datasets demonstrate that our model outperforms existing defense strategies in defending against poisoning attacks.
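The defense rests on the observation that benign updates share similar top-k and bottom-k critical parameters. A hedged sketch of that intuition follows: the value of k, the Jaccard scoring, and the weighting rule are assumptions for illustration, not the paper's exact aggregation scheme.

```python
# Sketch: down-weight clients whose top-k / bottom-k "critical parameter"
# index sets disagree with the majority before averaging updates.
import numpy as np

def critical_sets(update, k):
    order = np.argsort(update)
    return set(order[-k:]), set(order[:k])     # top-k and bottom-k indices

def overlap_score(u, v, k):
    ut, ub = critical_sets(u, k)
    vt, vb = critical_sets(v, k)
    jac = lambda a, b: len(a & b) / len(a | b)
    return 0.5 * (jac(ut, vt) + jac(ub, vb))

def aggregate(client_updates, k=100):
    n = len(client_updates)
    # Score each client by its average critical-parameter overlap with the others.
    scores = np.array([
        np.mean([overlap_score(client_updates[i], client_updates[j], k)
                 for j in range(n) if j != i])
        for i in range(n)
    ])
    weights = scores / scores.sum()
    return np.average(np.stack(client_updates), axis=0, weights=weights)

# Toy example: 9 benign clients plus one sign-flipped (poisoned) update.
benign = [np.random.randn(1000) + 0.5 for _ in range(9)]
poisoned = -5.0 * benign[0]
print(aggregate(benign + [poisoned]).shape)
```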

Path Signatures for Seizure Forecasting

  • paper_url: http://arxiv.org/abs/2308.09312
  • repo_url: None
  • paper_authors: Jonas F. Haderlein, Andre D. H. Peterson, Parvin Zarei Eskikand, Mark J. Cook, Anthony N. Burkitt, Iven M. Y. Mareels, David B. Grayden
  • for: Forecasting the state of brain dynamical systems, specifically epileptic seizures, from observed time series data
  • methods: Existing and novel feature extraction algorithms, in particular the path signature, combined with statistical classification algorithms with built-in subset selection to automatically discover and quantify patient-specific biomarkers for seizure forecasting
  • results: The complex, nonlinear signature features are evaluated for seizure forecasting and compared against simpler, linear features
    Abstract Forecasting the state of a system from an observed time series is the subject of research in many domains, such as computational neuroscience. Here, the prediction of epileptic seizures from brain measurements is an unresolved problem. There are neither complete models describing underlying brain dynamics, nor do individual patients exhibit a single seizure onset pattern, which complicates the development of a `one-size-fits-all' solution. Based on a longitudinal patient data set, we address the automated discovery and quantification of statistical features (biomarkers) that can be used to forecast seizures in a patient-specific way. We use existing and novel feature extraction algorithms, in particular the path signature, a recent development in time series analysis. Of particular interest is how this set of complex, nonlinear features performs compared to simpler, linear features on this task. Our inference is based on statistical classification algorithms with in-built subset selection to discern time series with and without an impending seizure while selecting only a small number of relevant features. This study may be seen as a step towards a generalisable pattern recognition pipeline for time series in a broader context.
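Since the path signature is the central feature here, a small self-contained sketch of its low-order terms (levels 1 and 2 under piecewise-linear interpolation) may help. The window size and channel count are illustrative, and the feature-selection and classification stages of the paper are omitted.

```python
# Level-1 and level-2 path-signature features for a multichannel window,
# computed from the piecewise-linear iterated-integral formulas.
import numpy as np

def signature_level2(path):
    """path: (T, d) array. Returns the d level-1 and d*d level-2 signature terms."""
    increments = np.diff(path, axis=0)                 # (T-1, d)
    level1 = path[-1] - path[0]                        # S^(i) = total increment
    displaced = path[:-1] - path[0]                    # X_{t-1} - X_0
    # S^(i,j) = sum_t (X_{t-1}^i - X_0^i) dX_t^j + 0.5 * dX_t^i dX_t^j
    level2 = displaced.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([level1, level2.ravel()])

# Toy 2-channel "iEEG" window of 256 samples.
window = np.cumsum(np.random.randn(256, 2), axis=0)
features = signature_level2(window)
print(features.shape)   # (6,) -> 2 level-1 + 4 level-2 signature features
```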

Variance reduction techniques for stochastic proximal point algorithms

  • paper_url: http://arxiv.org/abs/2308.09310
  • repo_url: None
  • paper_authors: Cheik Traoré, Vassilis Apidopoulos, Saverio Salzo, Silvia Villa
  • for: Bringing variance reduction techniques, widely used to improve stochastic gradient methods, to stochastic proximal point algorithms
  • methods: Stochastic proximal versions of SVRG, SAGA and some of their variants for smooth and convex functions
  • results: Several convergence results for the iterates and the objective function values, with linear convergence rates under the Polyak-Łojasiewicz (PL) condition; experiments show the proximal variance-reduced methods are more stable than their gradient counterparts, especially with respect to the choice of the step size
    Abstract In the context of finite sums minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as well as their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms since they are more stable with respect to the choice of the stepsize but a proper variance reduced version is missing. In this work, we propose the first study of variance reduction techniques for stochastic proximal point algorithms. We introduce a stochastic proximal version of SVRG, SAGA, and some of their variants for smooth and convex functions. We provide several convergence results for the iterates and the objective function values. In addition, under the Polyak-{\L}ojasiewicz (PL) condition, we obtain linear convergence rates for the iterates and the function values. Our numerical experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts, especially about the stability with respect to the choice of the step size.
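For intuition, the sketch below runs a plain (non-variance-reduced) stochastic proximal point iteration on a least-squares problem, where each proximal step has a closed form and remains stable for any positive step size; the SVRG/SAGA-style corrections studied in the paper are not shown, and the problem sizes and step size are illustrative.

```python
# Plain stochastic proximal point iteration on least squares:
# prox of f_i(x) = 0.5 * (a_i^T x - b_i)^2 has a closed form.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def prox_step(x, a_i, b_i, gamma):
    # Implicit (proximal) update; stable for any gamma > 0, unlike explicit SGD.
    residual = (a_i @ x - b_i) / (1.0 + gamma * (a_i @ a_i))
    return x - gamma * residual * a_i

x = np.zeros(d)
gamma = 0.5
for _ in range(20000):
    i = rng.integers(n)
    x = prox_step(x, A[i], b[i], gamma)

print("distance to solution:", np.linalg.norm(x - x_true))
```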

Meta-learning enhanced next POI recommendation by leveraging check-ins from auxiliary cities

  • paper_url: http://arxiv.org/abs/2308.09309
  • repo_url: https://github.com/oli-wang/merec
  • paper_authors: Jinze Wang, Lu Zhang, Zhu Sun, Yew-Soon Ong
  • for: Improving user preference learning for next point-of-interest (POI) recommendation, mitigating the scarcity of city-level user check-in data
  • methods: A Meta-learning Enhanced next POI Recommendation (MERec) framework that exploits the correlation of check-in behaviors among cities within a meta-learning paradigm, transferring more knowledge from more correlated auxiliary cities to infer user preference in the target city
  • results: Extensive experiments show that MERec significantly outperforms state-of-the-art algorithms
    Abstract Most existing point-of-interest (POI) recommenders aim to capture user preference by employing city-level user historical check-ins, thus facilitating users' exploration of the city. However, the scarcity of city-level user check-ins brings a significant challenge to user preference learning. Although prior studies attempt to mitigate this challenge by exploiting various context information, e.g., spatio-temporal information, they ignore to transfer the knowledge (i.e., common behavioral pattern) from other relevant cities (i.e., auxiliary cities). In this paper, we investigate the effect of knowledge distilled from auxiliary cities and thus propose a novel Meta-learning Enhanced next POI Recommendation framework (MERec). The MERec leverages the correlation of check-in behaviors among various cities into the meta-learning paradigm to help infer user preference in the target city, by holding the principle of "paying more attention to more correlated knowledge". Particularly, a city-level correlation strategy is devised to attentively capture common patterns among cities, so as to transfer more relevant knowledge from more correlated cities. Extensive experiments verify the superiority of the proposed MERec against state-of-the-art algorithms.

Online Class Incremental Learning on Stochastic Blurry Task Boundary via Mask and Visual Prompt Tuning

  • paper_url: http://arxiv.org/abs/2308.09303
  • repo_url: https://github.com/moonjunyyy/si-blurry
  • paper_authors: Jun-Yeong Moon, Keon-Hee Park, Jung Uk Kim, Gyeong-Moon Park
  • for: This paper focuses on the problem of continual learning in real-world scenarios, where the number of input data and tasks is constantly changing in a statistical way, and proposes a new scenario called Stochastic Incremental Blurry (Si-Blurry) to reflect the stochastic properties of the real-world.
  • methods: The paper introduces a novel method called Mask and Visual Prompt tuning (MVP) to alleviate the inter- and intra-task forgetting issues and class imbalance problem in the Si-Blurry scenario. MVP includes a novel instance-wise logit masking and contrastive visual prompt tuning loss, as well as a new gradient similarity-based focal loss and adaptive feature scaling.
  • results: The paper shows that MVP significantly outperforms the existing state-of-the-art methods in the challenging Si-Blurry scenario through extensive experiments.
    Abstract Continual learning aims to learn a model from a continuous stream of data, but it mainly assumes a fixed number of data and tasks with clear task boundaries. However, in real-world scenarios, the number of input data and tasks is constantly changing in a statistical way, not a static way. Although recently introduced incremental learning scenarios having blurry task boundaries somewhat address the above issues, they still do not fully reflect the statistical properties of real-world situations because of the fixed ratio of disjoint and blurry samples. In this paper, we propose a new Stochastic incremental Blurry task boundary scenario, called Si-Blurry, which reflects the stochastic properties of the real-world. We find that there are two major challenges in the Si-Blurry scenario: (1) inter- and intra-task forgettings and (2) class imbalance problem. To alleviate them, we introduce Mask and Visual Prompt tuning (MVP). In MVP, to address the inter- and intra-task forgetting issues, we propose a novel instance-wise logit masking and contrastive visual prompt tuning loss. Both of them help our model discern the classes to be learned in the current batch. It results in consolidating the previous knowledge. In addition, to alleviate the class imbalance problem, we introduce a new gradient similarity-based focal loss and adaptive feature scaling to ease overfitting to the major classes and underfitting to the minor classes. Extensive experiments show that our proposed MVP significantly outperforms the existing state-of-the-art methods in our challenging Si-Blurry scenario.
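One ingredient of MVP, logit masking, can be illustrated compactly: classes absent from the current mini-batch are masked out of the softmax so the loss focuses on the classes actually being learned. The batch-level masking below is a hedged simplification of the paper's instance-wise variant; the prompt-tuning and focal-loss components are not shown.

```python
# Mask logits of classes not present in the current batch before cross-entropy.
import torch
import torch.nn.functional as F

def masked_cross_entropy(logits, targets):
    """logits: (B, C); targets: (B,). Classes absent from the batch are masked out."""
    num_classes = logits.size(1)
    present = torch.zeros(num_classes, dtype=torch.bool, device=logits.device)
    present[targets.unique()] = True
    masked_logits = logits.masked_fill(~present, float("-inf"))
    return F.cross_entropy(masked_logits, targets)

logits = torch.randn(8, 100, requires_grad=True)
targets = torch.randint(0, 5, (8,))      # only a few classes appear in this batch
loss = masked_cross_entropy(logits, targets)
loss.backward()
print(loss.item())
```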

Learning Reward Machines through Preference Queries over Sequences

  • paper_url: http://arxiv.org/abs/2308.09301
  • repo_url: None
  • paper_authors: Eric Hsiung, Joydeep Biswas, Swarat Chaudhuri
  • for: Learning reward machines that capture non-Markovian reward functions for tasks involving complex action sequencing
  • methods: Preference queries in place of membership queries in the L* algorithm, together with a symbolic observation table, unification and constraint solving
  • results: Correctness and termination guarantees, along with empirical evidence that the learned reward machines closely match the ground truth
    Abstract Reward machines have shown great promise at capturing non-Markovian reward functions for learning tasks that involve complex action sequencing. However, no algorithm currently exists for learning reward machines with realistic weak feedback in the form of preferences. We contribute REMAP, a novel algorithm for learning reward machines from preferences, with correctness and termination guarantees. REMAP introduces preference queries in place of membership queries in the L* algorithm, and leverages a symbolic observation table along with unification and constraint solving to narrow the hypothesis reward machine search space. In addition to the proofs of correctness and termination for REMAP, we present empirical evidence measuring correctness: how frequently the resulting reward machine is isomorphic under a consistent yet inexact teacher, and the regret between the ground truth and learned reward machines.

CARLA: A Self-supervised Contrastive Representation Learning Approach for Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2308.09296
  • repo_url: None
  • paper_authors: Zahra Zamanzadeh Darban, Geoffrey I. Webb, Shirui Pan, Mahsa Salehi
  • for: A self-supervised time series anomaly detection method that works on both univariate and multivariate time series data
  • methods: Contrastive representation learning that produces similar representations for temporally close windows and dissimilar representations for anomalous windows, followed by self-supervised classification of normal versus anomalous representations based on nearest/furthest neighbours in the representation space
  • results: F1 and AU-PR superior to existing state-of-the-art results on 7 standard real-world time series anomaly detection benchmark datasets
    Abstract We introduce a Self-supervised Contrastive Representation Learning Approach for Time Series Anomaly Detection (CARLA), an innovative end-to-end self-supervised framework carefully developed to identify anomalous patterns in both univariate and multivariate time series data. By taking advantage of contrastive representation learning, We introduce an innovative end-to-end self-supervised deep learning framework carefully developed to identify anomalous patterns in both univariate and multivariate time series data. By taking advantage of contrastive representation learning, CARLA effectively generates robust representations for time series windows. It achieves this by 1) learning similar representations for temporally close windows and dissimilar representations for windows and their equivalent anomalous windows and 2) employing a self-supervised approach to classify normal/anomalous representations of windows based on their nearest/furthest neighbours in the representation space. Most of the existing models focus on learning normal behaviour. The normal boundary is often tightly defined, which can result in slight deviations being classified as anomalies, resulting in a high false positive rate and limited ability to generalise normal patterns. CARLA's contrastive learning methodology promotes the production of highly consistent and discriminative predictions, thereby empowering us to adeptly address the inherent challenges associated with anomaly detection in time series data. Through extensive experimentation on 7 standard real-world time series anomaly detection benchmark datasets, CARLA demonstrates F1 and AU-PR superior to existing state-of-the-art results. Our research highlights the immense potential of contrastive representation learning in advancing the field of time series anomaly detection, thus paving the way for novel applications and in-depth exploration in this domain.

How important are specialized transforms in Neural Operators?

  • paper_url: http://arxiv.org/abs/2308.09293
  • repo_url: https://github.com/Ritam-M/LearnableTransformsNO
  • paper_authors: Ritam Majumdar, Shirish Karande, Lovekesh Vig
  • for: Investigating how important the specialized transform layers are to the success of transform-based neural operators for solving partial differential equations (PDEs)
  • methods: Transform-based neural operators such as the Fourier Neural Operator and the Wavelet Neural Operator, with all transform layers replaced by learnable linear layers for comparison
  • results: Learnable linear layers suffice to provide performance comparable to the best-known transform-based layers, with a compute-time advantage; this observation may point to other sources of efficiency for neural operator architectures
    Abstract Simulating physical systems using Partial Differential Equations (PDEs) has become an indispensible part of modern industrial process optimization. Traditionally, numerical solvers have been used to solve the associated PDEs, however recently Transform-based Neural Operators such as the Fourier Neural Operator and Wavelet Neural Operator have received a lot of attention for their potential to provide fast solutions for systems of PDEs. In this work, we investigate the importance of the transform layers to the reported success of transform based neural operators. In particular, we record the cost in terms of performance, if all the transform layers are replaced by learnable linear layers. Surprisingly, we observe that linear layers suffice to provide performance comparable to the best-known transform-based layers and seem to do so with a compute time advantage as well. We believe that this observation can have significant implications for future work on Neural Operators, and might point to other sources of efficiencies for these architectures.

Graph-based Alignment and Uniformity for Recommendation

  • paper_url: http://arxiv.org/abs/2308.09292
  • repo_url: https://github.com/yangliangwei/graphau
  • paper_authors: Liangwei Yang, Zhiwei Liu, Chen Wang, Mingdai Yang, Xiaolong Liu, Jing Ma, Philip S. Yu
  • for: solves the sparsity issue in collaborative filtering-based recommender systems (RecSys) by using graph-based alignment and uniformity (GraphAU) to learn representations for users and items.
  • methods: uses a novel approach called GraphAU, which explicitly considers high-order connectivities in the user-item bipartite graph to align the user/item embedding to the dense vector representations of high-order neighbors.
  • results: significantly alleviates the sparsity issue and achieves state-of-the-art performance on four datasets.
    Abstract Collaborative filtering-based recommender systems (RecSys) rely on learning representations for users and items to predict preferences accurately. Representation learning on the hypersphere is a promising approach due to its desirable properties, such as alignment and uniformity. However, the sparsity issue arises when it encounters RecSys. To address this issue, we propose a novel approach, graph-based alignment and uniformity (GraphAU), that explicitly considers high-order connectivities in the user-item bipartite graph. GraphAU aligns the user/item embedding to the dense vector representations of high-order neighbors using a neighborhood aggregator, eliminating the need to compute the burdensome alignment to high-order neighborhoods individually. To address the discrepancy in alignment losses, GraphAU includes a layer-wise alignment pooling module to integrate alignment losses layer-wise. Experiments on four datasets show that GraphAU significantly alleviates the sparsity issue and achieves state-of-the-art performance. We open-source GraphAU at https://github.com/YangLiangwei/GraphAU.
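The alignment and uniformity objectives that GraphAU builds on have a widely used closed form (Wang and Isola, 2020); the sketch below shows that form on user/item embeddings. The graph-based neighborhood aggregation and layer-wise alignment pooling that define GraphAU itself are not reproduced here, and the loss weighting is an assumption.

```python
# Alignment and uniformity losses on normalized user/item embeddings.
import torch
import torch.nn.functional as F

def alignment_loss(user_emb, item_emb, alpha=2.0):
    user_emb, item_emb = F.normalize(user_emb, dim=-1), F.normalize(item_emb, dim=-1)
    return (user_emb - item_emb).norm(p=2, dim=1).pow(alpha).mean()

def uniformity_loss(emb, t=2.0):
    emb = F.normalize(emb, dim=-1)
    return torch.pdist(emb, p=2).pow(2).mul(-t).exp().mean().log()

users = torch.randn(256, 64)
items = torch.randn(256, 64)   # positive items aligned row-wise with users
loss = alignment_loss(users, items) + 0.5 * (uniformity_loss(users) + uniformity_loss(items))
print(loss.item())
```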

HyperLoRA for PDEs

  • paper_url: http://arxiv.org/abs/2308.09290
  • repo_url: None
  • paper_authors: Ritam Majumdar, Vishal Jadhav, Anirudh Deodhar, Shirish Karande, Lovekesh Vig, Venkataramana Runkana
  • For: develop neural surrogates for solutions of Partial Differential Equations
  • Methods: Hypernetworks, low-ranked adaptation (LoRA)
  • Results: 8x reduction in prediction parameters on average without compromising on accuracy, improved generalization capabilities for parameterized PDEs like Burger’s equation and Navier Stokes: Kovasznay flow.
    Abstract Physics-informed neural networks (PINNs) have been widely used to develop neural surrogates for solutions of Partial Differential Equations. A drawback of PINNs is that they have to be retrained with every change in initial-boundary conditions and PDE coefficients. The Hypernetwork, a model-based meta learning technique, takes in a parameterized task embedding as input and predicts the weights of PINN as output. Predicting weights of a neural network however, is a high-dimensional regression problem, and hypernetworks perform sub-optimally while predicting parameters for large base networks. To circumvent this issue, we use a low ranked adaptation (LoRA) formulation to decompose every layer of the base network into low-ranked tensors and use hypernetworks to predict the low-ranked tensors. Despite the reduced dimensionality of the resulting weight-regression problem, LoRA-based Hypernetworks violate the underlying physics of the given task. We demonstrate that the generalization capabilities of LoRA-based hypernetworks drastically improve when trained with an additional physics-informed loss component (HyperPINN) to satisfy the governing differential equations. We observe that LoRA-based HyperPINN training allows us to learn fast solutions for parameterized PDEs like Burger's equation and Navier Stokes: Kovasznay flow, while having an 8x reduction in prediction parameters on average without compromising on accuracy when compared to all other baselines.
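A rough sketch of the core mechanism may help: a hypernetwork maps a task (PDE-parameter) embedding to low-rank factors that adapt a base layer as W = W0 + B·A. The layer sizes, rank, and hypernetwork architecture below are illustrative assumptions, and the physics-informed (HyperPINN) loss is not included.

```python
# Hypernetwork predicting LoRA factors for one layer of a base PINN.
import torch
import torch.nn as nn

class HyperLoRALayer(nn.Module):
    def __init__(self, in_dim, out_dim, rank, task_dim):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)          # base weights W0
        self.rank, self.in_dim, self.out_dim = rank, in_dim, out_dim
        # Hypernetwork outputs the flattened factors A (rank x in) and B (out x rank).
        self.hyper = nn.Sequential(
            nn.Linear(task_dim, 64), nn.Tanh(),
            nn.Linear(64, rank * in_dim + out_dim * rank),
        )

    def forward(self, x, task_embedding):
        params = self.hyper(task_embedding)
        A = params[: self.rank * self.in_dim].view(self.rank, self.in_dim)
        B = params[self.rank * self.in_dim:].view(self.out_dim, self.rank)
        delta_w = B @ A                                  # low-rank update of W0
        return self.base(x) + x @ delta_w.t()

layer = HyperLoRALayer(in_dim=2, out_dim=50, rank=4, task_dim=3)
xy = torch.rand(128, 2)                                  # collocation points
pde_params = torch.tensor([0.01, 1.0, 0.5])              # hypothetical PDE parameters
print(layer(xy, pde_params).shape)                       # torch.Size([128, 50])
```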

A hybrid Decoder-DeepONet operator regression framework for unaligned observation data

  • paper_url: http://arxiv.org/abs/2308.09274
  • repo_url: https://github.com/cb-sjtu/decoder_deeponet
  • paper_authors: Bo Chen, Chenyu Wang, Weipeng Li, Haiyang Fu
  • for: Addressing the increased dimensionality and computational cost that deep neural operators (DNOs) face when approximating nonlinear mappings between function spaces from unaligned observation data
  • methods: A hybrid Decoder-DeepONet operator regression framework, together with a Multi-Decoder-DeepONet that uses an average field of the training data as input augmentation; the consistency of both frameworks with operator approximation theory is established via the universal approximation theorem
  • results: Two numerical experiments, the Darcy problem and the flow field around an airfoil, demonstrate the efficiency and accuracy of Decoder-DeepONet and Multi-Decoder-DeepONet in handling unaligned observation data and their potential for improving prediction accuracy
    Abstract Deep neural operators (DNOs) have been utilized to approximate nonlinear mappings between function spaces. However, DNOs face the challenge of increased dimensionality and computational cost associated with unaligned observation data. In this study, we propose a hybrid Decoder-DeepONet operator regression framework to handle unaligned data effectively. Additionally, we introduce a Multi-Decoder-DeepONet, which utilizes an average field of training data as input augmentation. The consistencies of the frameworks with the operator approximation theory are provided, on the basis of the universal approximation theorem. Two numerical experiments, Darcy problem and flow-field around an airfoil, are conducted to validate the efficiency and accuracy of the proposed methods. Results illustrate the advantages of Decoder-DeepONet and Multi-Decoder-DeepONet in handling unaligned observation data and showcase their potentials in improving prediction accuracy.
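For readers unfamiliar with the underlying operator network, the sketch below shows a vanilla branch-trunk DeepONet forward pass, G(u)(y) ≈ Σ_k b_k(u) t_k(y), which the Decoder-DeepONet framework extends for unaligned observation data; sizes and architectures are illustrative, and the decoder and input-augmentation parts of the paper are not shown.

```python
# Minimal branch-trunk (vanilla) DeepONet forward pass.
import torch
import torch.nn as nn

class TinyDeepONet(nn.Module):
    def __init__(self, n_sensors=100, coord_dim=2, p=64):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, 128), nn.Tanh(), nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(coord_dim, 128), nn.Tanh(), nn.Linear(128, p))

    def forward(self, u_sensors, query_points):
        # u_sensors: (B, n_sensors) sampled input function; query_points: (B, M, coord_dim)
        b = self.branch(u_sensors)                       # (B, p)
        t = self.trunk(query_points)                     # (B, M, p)
        return torch.einsum("bp,bmp->bm", b, t)          # G(u)(y) ~ sum_k b_k(u) t_k(y)

model = TinyDeepONet()
u = torch.randn(8, 100)
y = torch.rand(8, 200, 2)
print(model(u, y).shape)                                 # torch.Size([8, 200])
```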

Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model

  • paper_url: http://arxiv.org/abs/2308.09262
  • repo_url: None
  • paper_authors: Ryandhimas E. Zezario, Bo-Ren Brian Bai, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao
  • for: This paper proposes a multi-task pseudo-label (MPL) learning approach for a non-intrusive speech quality assessment model.
  • methods: The MPL approach consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The model is optimized using a Huber loss function.
  • results: The proposed MPL approach outperforms training the model from scratch and using knowledge transfer mechanisms. Additionally, the use of Huber loss improves the prediction capabilities of the model.
    Abstract This study introduces multi-task pseudo-label (MPL) learning for a non-intrusive speech quality assessment model. MPL consists of two stages which are obtaining pseudo-label scores from a pretrained model and performing multi-task learning. The 3QUEST metrics, namely Speech-MOS (S-MOS), Noise-MOS (N-MOS), and General-MOS (G-MOS) are selected as the primary ground-truth labels. Additionally, the pretrained MOSA-Net model is utilized to estimate three pseudo-labels: perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and speech distortion index (SDI). Multi-task learning stage of MPL is then employed to train the MTQ-Net model (multi-target speech quality assessment network). The model is optimized by incorporating Loss supervision (derived from the difference between the estimated score and the real ground-truth labels) and Loss semi-supervision (derived from the difference between the estimated score and pseudo-labels), where Huber loss is employed to calculate the loss function. Experimental results first demonstrate the advantages of MPL compared to training the model from scratch and using knowledge transfer mechanisms. Secondly, the benefits of Huber Loss in improving the prediction model of MTQ-Net are verified. Finally, the MTQ-Net with the MPL approach exhibits higher overall prediction capabilities when compared to other SSL-based speech assessment models.
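The two-stage objective reduces to a weighted sum of Huber losses, one against the ground-truth 3QUEST labels and one against the pseudo-labels from the pretrained assessor. The sketch below is a hedged illustration of that combination; the weighting factor, tensor shapes, and label names are placeholders rather than the MTQ-Net implementation.

```python
# Supervised Huber loss on ground-truth scores + semi-supervised Huber loss on pseudo-labels.
import torch
import torch.nn as nn

huber = nn.HuberLoss(delta=1.0)

def mpl_loss(pred_mos, true_mos, pred_aux, pseudo_aux, semi_weight=0.5):
    """pred_mos / true_mos: predicted vs. ground-truth S-MOS, N-MOS, G-MOS scores.
    pred_aux / pseudo_aux: predicted vs. pseudo-label PESQ, STOI, SDI scores."""
    supervised = huber(pred_mos, true_mos)
    semi_supervised = huber(pred_aux, pseudo_aux)
    return supervised + semi_weight * semi_supervised

pred_mos, true_mos = torch.randn(16, 3), torch.randn(16, 3)
pred_aux, pseudo_aux = torch.randn(16, 3), torch.randn(16, 3)
print(mpl_loss(pred_mos, true_mos, pred_aux, pseudo_aux).item())
```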

Distribution shift mitigation at test time with performance guarantees

  • paper_url: http://arxiv.org/abs/2308.09259
  • repo_url: None
  • paper_authors: Rui Ding, Jielong Yang, Feng Ji, Xionghu Zhong, Linbo Xie
  • For: The paper aims to address the challenge of distribution shift in Graph Neural Networks (GNNs), which can negatively impact the test performance of the model.* Methods: The proposed framework, called FR-GNN, constructs a mapping relationship between the output and input of a well-trained GNN to obtain class representative embeddings and then uses these embeddings to reconstruct the features of labeled nodes. The reconstructed features are then incorporated into the message passing mechanism of GNNs to influence the predictions of unlabeled nodes at test time.* Results: The paper shows that the proposed FR-GNN framework can effectively reduce the distribution shift and improve the test performance of GNNs without modifying the model structure or parameters. The experimental results demonstrate the superior performance of FR-GNN in comparison to mainstream methods on various public datasets.
    Abstract Due to inappropriate sample selection and limited training data, a distribution shift often exists between the training and test sets. This shift can adversely affect the test performance of Graph Neural Networks (GNNs). Existing approaches mitigate this issue by either enhancing the robustness of GNNs to distribution shift or reducing the shift itself. However, both approaches necessitate retraining the model, which becomes unfeasible when the model structure and parameters are inaccessible. To address this challenge, we propose FR-GNN, a general framework for GNNs to conduct feature reconstruction. FRGNN constructs a mapping relationship between the output and input of a well-trained GNN to obtain class representative embeddings and then uses these embeddings to reconstruct the features of labeled nodes. These reconstructed features are then incorporated into the message passing mechanism of GNNs to influence the predictions of unlabeled nodes at test time. Notably, the reconstructed node features can be directly utilized for testing the well-trained model, effectively reducing the distribution shift and leading to improved test performance. This remarkable achievement is attained without any modifications to the model structure or parameters. We provide theoretical guarantees for the effectiveness of our framework. Furthermore, we conduct comprehensive experiments on various public datasets. The experimental results demonstrate the superior performance of FRGNN in comparison to mainstream methods.

Capacity Bounds for Hyperbolic Neural Network Representations of Latent Tree Structures

  • paper_url: http://arxiv.org/abs/2308.09250
  • repo_url: None
  • paper_authors: Anastasis Kratsios, Ruiyang Hong, Haitz Sáez de Ocáriz Borde
  • for: 这篇论文是研究深度征有折射函数的折射神经网络(HNN)的表示能力的。
  • methods: 这篇论文使用了ReLU activation function,并提供了首次证明HNN可以在任意权重树上实现$\varepsilon$-同构嵌入,并且可以在折射空间中嵌入树的尺寸为至少$d$,并且权重树的sectional curvature为$\kappa<0$。
  • results: 这篇论文提供了关于HNN实现图像表示的网络复杂度的精确上限,并且发现网络复杂度与表示质量之间无直接关系。此外,这篇论文还提供了对任意ReLU多层感知机(MLP)在折射空间中嵌入树的下界,并证明任何ReLU MLP在嵌入树时必须夹带至少$\Omega(L^{1/d})$的质量损失。
    Abstract We study the representation capacity of deep hyperbolic neural networks (HNNs) with a ReLU activation function. We establish the first proof that HNNs can $\varepsilon$-isometrically embed any finite weighted tree into a hyperbolic space of dimension $d$ at least equal to $2$ with prescribed sectional curvature $\kappa<0$, for any $\varepsilon> 1$ (where $\varepsilon=1$ being optimal). We establish rigorous upper bounds for the network complexity on an HNN implementing the embedding. We find that the network complexity of HNN implementing the graph representation is independent of the representation fidelity/distortion. We contrast this result against our lower bounds on distortion which any ReLU multi-layer perceptron (MLP) must exert when embedding a tree with $L>2^d$ leaves into a $d$-dimensional Euclidean space, which we show at least $\Omega(L^{1/d})$; independently of the depth, width, and (possibly discontinuous) activation function defining the MLP.

Active and Passive Causal Inference Learning

  • paper_url: http://arxiv.org/abs/2308.09248
  • repo_url: None
  • paper_authors: Daniel Jiwoong Im, Kyunghyun Cho
  • for: 这篇论文是为机器学习研究者、工程师和学生准备的一个入门课程,旨在介绍 causal inference 的基本概念和技术。
  • methods: 论文从一系列重要的假设出发,如交换性、正性、一致性和不干扰,然后建立了一系列重要的 causal inference 技术,分为两个杯子:活跃和被动两类。
  • results: 论文介绍了一些常见的 causal inference 技术,包括随机控制试验和短剑算法,以及经典的匹配和反射权重等方法。通过完善一些 causal inference 的缺失方面,如卷积风险、聚合风险等,论文预期能为读者提供一个多样化的入门点,进一步研究和探索 causal inference 领域。
    Abstract This paper serves as a starting point for machine learning researchers, engineers and students who are interested in but not yet familiar with causal inference. We start by laying out an important set of assumptions that are collectively needed for causal identification, such as exchangeability, positivity, consistency and the absence of interference. From these assumptions, we build out a set of important causal inference techniques, which we do so by categorizing them into two buckets; active and passive approaches. We describe and discuss randomized controlled trials and bandit-based approaches from the active category. We then describe classical approaches, such as matching and inverse probability weighting, in the passive category, followed by more recent deep learning based algorithms. By finishing the paper with some of the missing aspects of causal inference from this paper, such as collider biases, we expect this paper to provide readers with a diverse set of starting points for further reading and research in causal inference and discovery.
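As a concrete taste of the passive approaches mentioned above, the snippet below estimates an average treatment effect with inverse probability weighting on synthetic data with a known effect; the data-generating process and the logistic propensity model are illustrative.

```python
# Inverse probability weighting (IPW) estimate of the average treatment effect.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal((n, 3))                       # confounders
propensity = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))
a = rng.binomial(1, propensity)                       # treatment assignment
y = 2.0 * a + x @ np.array([1.0, -1.0, 0.5]) + rng.standard_normal(n)  # true ATE = 2

e_hat = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
ate_ipw = np.mean(a * y / e_hat) - np.mean((1 - a) * y / (1 - e_hat))
print(f"IPW estimate of the ATE: {ate_ipw:.2f} (ground truth 2.0)")
```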

A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments

  • paper_url: http://arxiv.org/abs/2308.09734
  • repo_url: None
  • paper_authors: Sherif Abdelfattah, Kathryn Kasmarik, Jiankun Hu
  • for: solves the problem of non-stationary dynamics in multi-objective reinforcement learning
  • methods: developmental optimization approach, novel multi-objective reinforcement learning algorithm
  • results: significantly outperforms state-of-the-art algorithms in non-stationary environments while achieving comparable results in stationary environments.
    Abstract Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity to evolve a coverage set of policies that can solve the problem. This paper introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.
    摘要 多目标Markov决策过程是特殊的多目标优化问题,它涉及到顺序做出决策,并满足Markov性的随机过程的属性。多目标激励学习方法 Addresses this problem by combining the reinforcement learning paradigm with multi-objective optimization techniques. However, one major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity to evolve a coverage set of policies that can solve the problem.This paper proposes a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.Here's the translation in Traditional Chinese:多目标Markov决策过程是特殊的多目标优化问题,它涉及到顺序做出决策,并满足Markov性的随机过程的属性。多目标激励学习方法 Addresses this problem by combining the reinforcement learning paradigm with multi-objective optimization techniques. However, one major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity to evolve a coverage set of policies that can solve the problem.This paper proposes a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.

Intrinsically Motivated Hierarchical Policy Learning in Multi-objective Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2308.09733
  • repo_url: None
  • paper_authors: Sherif Abdelfattah, Kathryn Merrick, Jiankun Hu
  • for: 解决多目标Markov决策过程中的多个矛盾奖励函数不能同时优化,而是需要一种权衡策略来满足所有可能的偏好。
  • methods: 使用了自适应奖励学习方法,通过学习一个内在motivated的技能集来演化策略覆盖集,从而实现持续学习过程。
  • results: 在动态环境中,提出了一种新的双相自适应奖励学习方法,在第一阶段学习一个通用的技能集,然后在第二阶段使用这个集来启动策略覆盖集的演化,实验显示该方法可以significantly outperform现有的多目标奖励学习方法。
    Abstract Multi-objective Markov decision processes are sequential decision-making problems that involve multiple conflicting reward functions that cannot be optimized simultaneously without a compromise. This type of problems cannot be solved by a single optimal policy as in the conventional case. Alternatively, multi-objective reinforcement learning methods evolve a coverage set of optimal policies that can satisfy all possible preferences in solving the problem. However, many of these methods cannot generalize their coverage sets to work in non-stationary environments. In these environments, the parameters of the state transition and reward distribution vary over time. This limitation results in significant performance degradation for the evolved policy sets. In order to overcome this limitation, there is a need to learn a generic skill set that can bootstrap the evolution of the policy coverage set for each shift in the environment dynamics therefore, it can facilitate a continuous learning process. In this work, intrinsically motivated reinforcement learning has been successfully deployed to evolve generic skill sets for learning hierarchical policies to solve multi-objective Markov decision processes. We propose a novel dual-phase intrinsically motivated reinforcement learning method to address this limitation. In the first phase, a generic set of skills is learned. While in the second phase, this set is used to bootstrap policy coverage sets for each shift in the environment dynamics. We show experimentally that the proposed method significantly outperforms state-of-the-art multi-objective reinforcement methods in a dynamic robotics environment.
    摘要 多目标Markov决策过程是一种следова意决策问题,其涉及多个矛盾奖励函数,这些奖励函数无法同时优化。这类问题不能通过单一优化策略来解决,而是需要一组优化策略来满足所有可能的偏好。然而,许多多目标学习方法无法扩展其覆盖集来处理不稳定环境。在这些环境中,状态转移和奖励分布的参数会随时间变化,这会导致优化策略集的性能下降。为了解决这一限制,需要学习一个通用技能集,以便在环境动态变化时,使用这个技能集来演化策略覆盖集,从而实现连续学习过程。在这种情况下,我们成功地应用了内在激励学习来演化通用技能集,以解决多目标Markov决策过程中的限制。我们提出了一种新的双期内在激励学习方法,其中,在第一阶段,学习一个通用技能集;在第二阶段,使用这个技能集来演化策略覆盖集 для每个环境动态变化。我们实验ally表明,提议的方法在动态 робо特环境中显著超过了当前最佳多目标学习方法。

Generalized Sum Pooling for Metric Learning

  • paper_url: http://arxiv.org/abs/2308.09228
  • repo_url: https://github.com/yetigurbuz/generalized-sum-pooling
  • paper_authors: Yeti Z. Gurbuz, Ozan Sener, A. Aydın Alatan
  • for: 本研究旨在提出一种可学习的总和池化方法(Generalized Sum Pooling,GSP),用于深度度量学习中的核心 pooling 阶段。
  • methods: 本研究将池化建模为一个熵平滑的最优传输问题,用于学习该池化方法,并证明其具有解析梯度,可直接替代GAP。此外,还提出了一种零样本损失函数,用于帮助学习GSP。
  • results: 实验结果表明,GSP可以提升深度度量学习的性能,并能学习忽略干扰信息、为各语义实体分配重要性权重。代码可在GSP-DML Framework中找到。
    Abstract A common architectural choice for deep metric learning is a convolutional neural network followed by global average pooling (GAP). Albeit simple, GAP is a highly effective way to aggregate information. One possible explanation for the effectiveness of GAP is considering each feature vector as representing a different semantic entity and GAP as a convex combination of them. Following this perspective, we generalize GAP and propose a learnable generalized sum pooling method (GSP). GSP improves GAP with two distinct abilities: i) the ability to choose a subset of semantic entities, effectively learning to ignore nuisance information, and ii) learning the weights corresponding to the importance of each entity. Formally, we propose an entropy-smoothed optimal transport problem and show that it is a strict generalization of GAP, i.e., a specific realization of the problem gives back GAP. We show that this optimization problem enjoys analytical gradients enabling us to use it as a direct learnable replacement for GAP. We further propose a zero-shot loss to ease the learning of GSP. We show the effectiveness of our method with extensive evaluations on 4 popular metric learning benchmarks. Code is available at: GSP-DML Framework
    摘要 一般而言,深度度量学习中常用的架构是一个卷积神经网络,后接全局平均池化(GAP)。尽管简单,但GAP是一种非常有效的信息汇总方法。一种可能的解释是,每个特征向量都代表不同的语义实体,而GAP是这些实体的凸组合。基于这个视角,我们推广GAP并提出一种可学习的广义求和池化方法(GSP)。GSP在两个方面改进GAP:一是可以选择语义实体的一个子集,有效地忽略干扰信息;二是学习每个实体的重要性权重。我们提出一个熵平滑的最优传输问题,并证明它是GAP的严格推广,即该问题的特定实现可以退化为GAP。我们表明这个优化问题具有解析梯度,可以作为GAP的直接可学习替代。此外,我们还提出了一种零样本损失,以便于GSP的学习。我们通过在4个常见的度量学习基准上进行广泛评估,证明了我们方法的有效性。代码可以在:GSP-DML Framework 中找到。
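
The pooling idea above can be illustrated with a small sketch: a toy "learnable weighted pooling" that reduces exactly to GAP when all importance logits are equal, loosely mimicking GSP's ability to down-weight nuisance features. This is a simplified stand-in under assumed shapes and names, not the paper's entropy-smoothed optimal-transport formulation.

```python
import torch
import torch.nn as nn

class LearnableWeightedPooling(nn.Module):
    """Toy generalization of global average pooling (GAP).

    Each spatial feature vector receives a learned importance logit; pooling is a
    convex combination of the feature vectors. With all logits equal this reduces
    exactly to GAP (a simplified stand-in for GSP's transport-based weighting).
    """
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # importance logit per feature vector

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_locations, dim), e.g. a flattened CNN feature map
        logits = self.scorer(feats).squeeze(-1)        # (batch, num_locations)
        weights = torch.softmax(logits, dim=-1)        # convex weights
        return torch.einsum("bn,bnd->bd", weights, feats)

# Quick check: with a constant scorer the output equals plain GAP.
pool = LearnableWeightedPooling(dim=8)
nn.init.zeros_(pool.scorer.weight); nn.init.zeros_(pool.scorer.bias)
x = torch.randn(2, 49, 8)
assert torch.allclose(pool(x), x.mean(dim=1), atol=1e-6)
```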

Advancing Relation Extraction through Language Probing with Exemplars from Set Co-Expansion

  • paper_url: http://arxiv.org/abs/2308.11720
  • repo_url: None
  • paper_authors: Yerong Li, Roxana Girju
  • for: This paper focuses on improving relation extraction accuracy and reducing confusion between contrastive classes.
  • methods: The proposed approach uses representative examples and co-set expansion, incorporating similarity measures between target pairs and representative pairs from the target class. Contextual details are harnessed via context-free Hearst patterns to ascertain contextual similarity.
  • results: The co-set expansion approach significantly enhances relation classification performance, achieving an observed margin of at least 1 percent improvement in accuracy in most settings. Tuning contrastive examples further refines the approach, reducing confusion between classes sharing similarities and leading to more precise classification.
    Abstract Relation Extraction (RE) is a pivotal task in automatically extracting structured information from unstructured text. In this paper, we present a multi-faceted approach that integrates representative examples and through co-set expansion. The primary goal of our method is to enhance relation classification accuracy and mitigating confusion between contrastive classes. Our approach begins by seeding each relationship class with representative examples. Subsequently, our co-set expansion algorithm enriches training objectives by incorporating similarity measures between target pairs and representative pairs from the target class. Moreover, the co-set expansion process involves a class ranking procedure that takes into account exemplars from contrastive classes. Contextual details encompassing relation mentions are harnessed via context-free Hearst patterns to ascertain contextual similarity. Empirical evaluation demonstrates the efficacy of our co-set expansion approach, resulting in a significant enhancement of relation classification performance. Our method achieves an observed margin of at least 1 percent improvement in accuracy in most settings, on top of existing fine-tuning approaches. To further refine our approach, we conduct an in-depth analysis that focuses on tuning contrastive examples. This strategic selection and tuning effectively reduce confusion between classes sharing similarities, leading to a more precise classification process. Experimental results underscore the effectiveness of our proposed framework for relation extraction. The synergy between co-set expansion and context-aware prompt tuning substantially contributes to improved classification accuracy. Furthermore, the reduction in confusion between contrastive classes through contrastive examples tuning validates the robustness and reliability of our method.
    摘要 关系抽取(RE)是从非结构化文本中自动抽取结构化信息的重要任务。在这篇论文中,我们提出了一种多方面的方法,通过代表性例子和共集扩展来提高关系分类精度并减轻对比类之间的混淆。我们的方法首先为每个关系类型填充代表性例子,然后由共集扩展算法在训练目标中加入候选对与目标类代表对之间的相似度度量。此外,共集扩展过程还包含一个考虑对比类例子的类别排序程序。关系提及周围的上下文信息则通过上下文无关的Hearst模式来衡量上下文相似度。实证评估表明了共集扩展方法的有效性,显著提升了关系分类性能;在大多数设置下,我们的方法在现有微调方法之上取得至少1个百分点的准确率提升。为进一步改进方法,我们进行了深入分析,集中在对比例子的调优上。这种筛选和调优有效地减少了相似类别之间的混淆,使分类更加精准。实验结果证明了我们提出的框架对关系抽取的有效性:共集扩展与上下文感知的提示调优相结合,带来了分类精度的提升;通过对比例子调优减少对比类之间的混淆,也验证了我们方法的稳健性和可靠性。
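
The core ranking idea can be sketched in a few lines of numpy: score a candidate pair embedding by its similarity to representative exemplar pairs of the target class while penalizing similarity to exemplars of a confusable, contrastive class. The embeddings, class names, and the margin-style score below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def coset_score(candidate, target_exemplars, contrastive_exemplars):
    """Score = mean similarity to the target class's representative pairs
    minus the strongest similarity to any contrastive-class exemplar."""
    pos = np.mean([cosine(candidate, e) for e in target_exemplars])
    neg = max(cosine(candidate, e) for e in contrastive_exemplars)
    return pos - neg

rng = np.random.default_rng(0)
dim = 16
founded_by = [rng.normal(size=dim) for _ in range(3)]    # exemplar pair embeddings
ceo_of = [rng.normal(size=dim) for _ in range(3)]        # a confusable, contrastive class
candidate = founded_by[0] + 0.1 * rng.normal(size=dim)   # near the target class

print("score vs. founded_by:", coset_score(candidate, founded_by, ceo_of))
print("score vs. ceo_of    :", coset_score(candidate, ceo_of, founded_by))
```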

DMCVR: Morphology-Guided Diffusion Model for 3D Cardiac Volume Reconstruction

  • paper_url: http://arxiv.org/abs/2308.09223
  • repo_url: https://github.com/hexiaoxiao-cs/dmcvr
  • paper_authors: Xiaoxiao He, Chaowei Tan, Ligong Han, Bo Liu, Leon Axel, Kang Li, Dimitris N. Metaxas
  • for: 提高心血管疾病诊断和治疗规划的准确三维心脏重建
  • methods: 基于生成模型的心脏形态引导扩散模型(DMCVR),通过合成高分辨率二维心脏图像及相应的三维重建体积来提高三维心脏重建质量
  • results: 在多个方面比前方法高效,包括二维生成和三维重建性能,并且可以生成高分辨率三维心脏MRI重建图像,超过现有技术水平
    Abstract Accurate 3D cardiac reconstruction from cine magnetic resonance imaging (cMRI) is crucial for improved cardiovascular disease diagnosis and understanding of the heart's motion. However, current cardiac MRI-based reconstruction technology used in clinical settings is 2D with limited through-plane resolution, resulting in low-quality reconstructed cardiac volumes. To better reconstruct 3D cardiac volumes from sparse 2D image stacks, we propose a morphology-guided diffusion model for 3D cardiac volume reconstruction, DMCVR, that synthesizes high-resolution 2D images and corresponding 3D reconstructed volumes. Our method outperforms previous approaches by conditioning the cardiac morphology on the generative model, eliminating the time-consuming iterative optimization process of the latent code, and improving generation quality. The learned latent spaces provide global semantics, local cardiac morphology and details of each 2D cMRI slice with highly interpretable value to reconstruct 3D cardiac shape. Our experiments show that DMCVR is highly effective in several aspects, such as 2D generation and 3D reconstruction performance. With DMCVR, we can produce high-resolution 3D cardiac MRI reconstructions, surpassing current techniques. Our proposed framework has great potential for improving the accuracy of cardiac disease diagnosis and treatment planning. Code can be accessed at https://github.com/hexiaoxiao-cs/DMCVR.
    摘要 从电影磁共振成像(cine MRI)中精准地进行3D心脏重建,是诊断心血管疾病和理解心脏运动的关键。然而,现有临床环境中基于心脏MRI的重建技术是2D的,层间分辨率有限,导致重建的心脏体积质量低下。为了更好地从稀疏的2D图像堆栈中重建3D心脏体积,我们提出一种形态引导的扩散模型DMCVR,该模型可以合成高分辨率的2D图像及相应的3D重建体积。我们的方法将心脏形态作为生成模型的条件,消除了耗时的潜在编码迭代优化过程,从而超越先前方法并提高生成质量。学习到的潜在空间提供了全局语义、局部心脏形态以及每个2D cMRI切片的细节,具有高度可解释的价值,可用于重建3D心脏形状。我们的实验表明,DMCVR在多个方面具有高效性,例如2D生成和3D重建性能。通过DMCVR,我们可以生成高分辨率的3D心脏MRI重建,超越当前技术。我们提出的框架对提高心脏疾病诊断和治疗规划的准确性具有很大潜力。代码可以在https://github.com/hexiaoxiao-cs/DMCVR 中下载。

Baird Counterexample Is Solved: with an example of How to Debug a Two-time-scale Algorithm

  • paper_url: http://arxiv.org/abs/2308.09732
  • repo_url: None
  • paper_authors: Hengshuai Yao
  • for: 这篇论文旨在解释Baird反例中TD(0)算法的发散行为;该反例自提出以来常被用于测试和比较离策略(off-policy)学习算法。
  • methods: 论文通过对两时间尺度随机逼近算法的调试分析,解释TDC等梯度TD算法在Baird反例上收敛缓慢的原因。
  • results: 论文给出了最新的Impression GTD算法在Baird反例上的实验结果,显示其以线性速率快速收敛到TD解;据此论文认为Baird反例问题已得到解决。
    Abstract Baird counterexample was proposed by Leemon Baird in 1995, first used to show that the Temporal Difference (TD(0)) algorithm diverges on this example. Since then, it is often used to test and compare off-policy learning algorithms. Gradient TD algorithms solved the divergence issue of TD on Baird counterexample. However, their convergence on this example is still very slow, and the nature of the slowness is not well understood, e.g., see (Sutton and Barto 2018). This note is to understand in particular, why TDC is slow on this example, and provide debugging analysis to understand this behavior. Our debugging technique can be used to study the convergence behavior of two-time-scale stochastic approximation algorithms. We also provide empirical results of the recent Impression GTD algorithm on this example, showing the convergence is very fast, in fact, in a linear rate. We conclude that Baird counterexample is solved, by an algorithm with convergence guarantee to the TD solution in general and a fast convergence rate.
    摘要 Baird反例(Baird counterexample)由Leemon Baird于1995年提出,最初用于证明TD(0)算法在该例子上发散。自那以后,它经常被用于测试和比较离策略学习算法。梯度TD算法解决了TD算法在Baird反例上的发散问题,但它们在这个例子上的收敛速度非常慢,而且这种缓慢的原因尚未被很好地理解,例如参见(Sutton和Barto 2018)。本文的目的是理解TDC算法在这个例子上收敛缓慢的原因,并提供调试分析以解释这种行为。我们的调试技术可以用来研究两时间尺度随机逼近算法的收敛行为。我们还提供了最近的Impression GTD算法在这个例子上的实验结果,显示其收敛非常快,事实上是线性速率。我们的结论是:Baird反例已被一个对TD解具有一般收敛保证且收敛速度快的算法所解决。
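
For readers unfamiliar with the setup, the sketch below simulates the standard 7-state Baird counterexample with off-policy semi-gradient TD(0), following the usual Sutton & Barto parameterization (feature vectors, gamma = 0.99, importance-sampling ratios); the step size and step count are arbitrary choices of this sketch, not taken from the paper. Running it shows the weight norm growing without bound, which is the divergence discussed above.

```python
import numpy as np

# Baird's counterexample: 7 states, 8 weights, all rewards are zero.
# Upper states i = 0..5 have features 2*e_i + e_7; the lower state 6 has e_6 + 2*e_7.
features = np.zeros((7, 8))
for i in range(6):
    features[i, i] = 2.0
    features[i, 7] = 1.0
features[6, 6] = 1.0
features[6, 7] = 2.0

gamma, alpha = 0.99, 0.01
w = np.array([1., 1., 1., 1., 1., 1., 10., 1.])  # customary initialization
rng = np.random.default_rng(0)

state = rng.integers(7)
for step in range(1000):
    # Behavior policy: "dashed" (go to a random upper state) w.p. 6/7, "solid" w.p. 1/7.
    if rng.random() < 6 / 7:
        action, next_state = "dashed", rng.integers(6)
    else:
        action, next_state = "solid", 6
    # Target policy always takes "solid", so the importance ratio is 7 or 0.
    rho = 7.0 if action == "solid" else 0.0
    td_error = 0.0 + gamma * features[next_state] @ w - features[state] @ w
    w += alpha * rho * td_error * features[state]  # semi-gradient TD(0) update
    state = next_state
    if step % 200 == 0:
        print(step, np.linalg.norm(w))  # the norm keeps growing: divergence
```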

A Model-Agnostic Framework for Recommendation via Interest-aware Item Embeddings

  • paper_url: http://arxiv.org/abs/2308.09202
  • repo_url: None
  • paper_authors: Amit Kumar Jaiswal, Yu Xiong
  • for: 本研究旨在提高推荐系统中ITEM表示的精度,以更好地捕捉用户的兴趣。
  • methods: 本文提出了一种名为Interest-aware Capsule network(IaCN)的新型推荐模型,它通过直接学习用户兴趣 oriented item表示来提高推荐效果。
  • results: 实验结果表明,对于不同的深度神经网络、行为序列长度和共同学习率,IaCN模型均显示出显著的性能提升,证明了该方法的有效性。
    Abstract Item representation holds significant importance in recommendation systems, which encompasses domains such as news, retail, and videos. Retrieval and ranking models utilise item representation to capture the user-item relationship based on user behaviours. While existing representation learning methods primarily focus on optimising item-based mechanisms, such as attention and sequential modelling. However, these methods lack a modelling mechanism to directly reflect user interests within the learned item representations. Consequently, these methods may be less effective in capturing user interests indirectly. To address this challenge, we propose a novel Interest-aware Capsule network (IaCN) recommendation model, a model-agnostic framework that directly learns interest-oriented item representations. IaCN serves as an auxiliary task, enabling the joint learning of both item-based and interest-based representations. This framework adopts existing recommendation models without requiring substantial redesign. We evaluate the proposed approach on benchmark datasets, exploring various scenarios involving different deep neural networks, behaviour sequence lengths, and joint learning ratios of interest-oriented item representations. Experimental results demonstrate significant performance enhancements across diverse recommendation models, validating the effectiveness of our approach.
    摘要 Item representation plays a crucial role in recommendation systems, which encompasses domains such as news, retail, and videos. Retrieval and ranking models use item representation to capture the user-item relationship based on user behaviors. However, existing representation learning methods primarily focus on optimizing item-based mechanisms, such as attention and sequential modeling, and lack a modeling mechanism to directly reflect user interests within the learned item representations. As a result, these methods may be less effective in capturing user interests indirectly. To address this challenge, we propose a novel Interest-aware Capsule network (IaCN) recommendation model, a model-agnostic framework that directly learns interest-oriented item representations. IaCN serves as an auxiliary task, enabling the joint learning of both item-based and interest-based representations. This framework adopts existing recommendation models without requiring substantial redesign. We evaluate the proposed approach on benchmark datasets, exploring various scenarios involving different deep neural networks, behavior sequence lengths, and joint learning ratios of interest-oriented item representations. Experimental results demonstrate significant performance enhancements across diverse recommendation models, validating the effectiveness of our approach.

TinyProp – Adaptive Sparse Backpropagation for Efficient TinyML On-device Learning

  • paper_url: http://arxiv.org/abs/2308.09201
  • repo_url: None
  • paper_authors: Marcus Rüb, Daniel Maier, Daniel Mueller-Gritschneder, Axel Sikora
  • for: 这篇论文旨在提高在低功耗微控制器单元(MCU)上进行设备端学习或微调神经网络的效率。
  • methods: 论文提出了稀疏反向传播方法TinyProp,在每个训练步骤中动态调整反向传播比例,只选择并训练部分权重和偏置,从而在硬件上实现训练。
  • results: 与非稀疏训练相比,该方法平均加速5倍,而平均精度损失仅约1%;与现有的静态稀疏反向传播算法相比,平均快2.9倍,精度损失平均降低6%。
    Abstract Training deep neural networks using backpropagation is very memory and computationally intensive. This makes it difficult to run on-device learning or fine-tune neural networks on tiny, embedded devices such as low-power micro-controller units (MCUs). Sparse backpropagation algorithms try to reduce the computational load of on-device learning by training only a subset of the weights and biases. Existing approaches use a static number of weights to train. A poor choice of this so-called backpropagation ratio limits either the computational gain or can lead to severe accuracy losses. In this paper we present TinyProp, the first sparse backpropagation method that dynamically adapts the back-propagation ratio during on-device training for each training step. TinyProp induces a small calculation overhead to sort the elements of the gradient, which does not significantly impact the computational gains. TinyProp works particularly well on fine-tuning trained networks on MCUs, which is a typical use case for embedded applications. For typical datasets from three datasets MNIST, DCASE2020 and CIFAR10, we are 5 times faster compared to non-sparse training with an accuracy loss of on average 1%. On average, TinyProp is 2.9 times faster than existing, static sparse backpropagation algorithms and the accuracy loss is reduced on average by 6 % compared to a typical static setting of the back-propagation ratio.
    摘要 使用反向传播训练深度神经网络非常占用内存和计算资源,这使得在小型嵌入式设备(如低功耗微控制器)上进行设备端学习或微调神经网络变得困难。稀疏反向传播算法尝试只训练部分权重和偏置,以减少设备端学习的计算负担。现有方法使用静态的反向传播比例来训练,而这个所谓的反向传播比例一旦选择不当,要么限制计算收益,要么导致严重的准确性损失。在这篇论文中,我们介绍了TinyProp,这是第一个在设备端训练中为每个训练步骤动态调整反向传播比例的稀疏反向传播方法。TinyProp引入了对梯度元素排序的少量计算开销,不会显著影响计算收益。TinyProp在MCU上微调已训练的网络时特别有效,这是嵌入式应用的典型场景。对于MNIST、DCASE2020和CIFAR10这三个典型数据集,我们比非稀疏训练快5倍,平均准确性损失为1%。与现有的静态稀疏反向传播算法相比,TinyProp平均快2.9倍,并且相对于典型的静态反向传播比例设置,准确性损失平均降低6%。
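
The key mechanism, sorting gradient elements each step and back-propagating only a dynamically chosen fraction, can be sketched on a tiny two-layer numpy network as below. The rule that ties the kept fraction to the current loss is a placeholder assumption of this sketch; TinyProp's actual ratio-selection heuristic may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
y = (X[:, :3].sum(axis=1, keepdims=True) > 0).astype(float)

W1, b1 = rng.normal(scale=0.3, size=(10, 16)), np.zeros(16)
W2, b2 = rng.normal(scale=0.3, size=(16, 1)), np.zeros(1)
lr = 0.1

def top_fraction_mask(grad, frac):
    """Keep only the largest-magnitude entries of a gradient tensor."""
    k = max(1, int(frac * grad.size))
    thresh = np.sort(np.abs(grad), axis=None)[-k]
    return (np.abs(grad) >= thresh).astype(grad.dtype)

for step in range(200):
    # Forward pass (sigmoid MLP, mean squared error).
    h = 1 / (1 + np.exp(-(X @ W1 + b1)))
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))
    loss = float(np.mean((out - y) ** 2))

    # Placeholder adaptive rule: back-propagate a larger fraction when the loss is high.
    frac = float(np.clip(loss * 2.0, 0.05, 1.0))

    d_out = 2 * (out - y) / len(y) * out * (1 - out)
    gW2 = h.T @ d_out
    d_h = (d_out @ W2.T) * h * (1 - h)
    gW1 = X.T @ d_h

    W2 -= lr * gW2 * top_fraction_mask(gW2, frac)   # sparse weight updates
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * gW1 * top_fraction_mask(gW1, frac)
    b1 -= lr * d_h.sum(axis=0)

    if step % 50 == 0:
        print(f"step {step}: loss {loss:.4f}, kept fraction {frac:.2f}")
```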

Polynomial Bounds for Learning Noisy Optical Physical Unclonable Functions and Connections to Learning With Errors

  • paper_url: http://arxiv.org/abs/2308.09199
  • repo_url: None
  • paper_authors: Apollo Albright, Boris Gelfand, Michael Dixon
  • for: 这篇论文主要研究了一种光学物理不可克隆函数(PUF)的学习问题。
  • methods: 论文基于PUF的尺寸参数、挑战与噪声向量的分布以及回归算法的成功概率和精度,推导了线性回归算法所需样本数量和计算复杂度的多项式上界。
  • results: 论文表明,在温和的分布假设下,只要有多项式数量的挑战-响应对和多项式有界的计算资源,即使存在噪声,也能以任意高的概率将这类PUF学习到任意精度。这一结果扩展了Rhüramir等人(2013)的研究,后者只处理了该类PUF的一个子集,并假设其光学系统是线性的或非线性效应可忽略。
    Abstract It is shown that a class of optical physical unclonable functions (PUFs) can be learned to arbitrary precision with arbitrarily high probability, even in the presence of noise, given access to polynomially many challenge-response pairs and polynomially bounded computational power, under mild assumptions about the distributions of the noise and challenge vectors. This extends the results of Rh\"uramir et al. (2013), who showed a subset of this class of PUFs to be learnable in polynomial time in the absence of noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the probability and accuracy of the regression algorithm, with a similar analysis to one done by Bootle et al. (2018), who demonstrated a learning attack on a poorly implemented version of the Learning With Errors problem.
    摘要 研究表明,在对噪声和挑战向量分布的温和假设下,只要能获得多项式数量的挑战-响应对并具有多项式有界的计算能力,一类光学物理不可克隆函数(PUF)即使在噪声存在的情况下,也能以任意高的概率被学习到任意精度。这一结论扩展了Rhüramir等人(2013)的结果,他们只证明了该类PUF的一个子集在无噪声条件下可在多项式时间内学习,并假设PUF的光学系统是线性的或非线性效应可忽略。我们推导了所需样本数量和线性回归算法计算复杂度的多项式上界,其依赖于PUF的尺寸参数、挑战向量与噪声向量的分布,以及回归算法的成功概率和精度;分析思路与Bootle等人(2018)类似,后者展示了针对一个实现不当的带误差学习(Learning With Errors)问题的学习攻击。
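
A self-contained numpy illustration of the attack model these bounds describe: fit a linear predictor to noisy challenge-response pairs and watch prediction accuracy grow with the number of CRPs. The linear response model and noise level are illustrative assumptions of this sketch; real optical PUFs are considerably more complex.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits = 64                       # challenge length
w_true = rng.normal(size=n_bits)  # hidden "PUF" weights (toy linear model)

def collect_crps(n):
    """Simulate n noisy challenge-response pairs from the toy PUF."""
    C = rng.choice([-1.0, 1.0], size=(n, n_bits))
    noise = rng.normal(scale=0.5, size=n)
    r = np.sign(C @ w_true + noise)     # +/-1 responses
    return C, r

for n_train in (100, 1000, 10000):
    C, r = collect_crps(n_train)
    # Least-squares fit of a linear predictor (ridge-regularized for stability).
    w_hat = np.linalg.solve(C.T @ C + 1e-3 * np.eye(n_bits), C.T @ r)
    C_test, r_test = collect_crps(5000)
    acc = np.mean(np.sign(C_test @ w_hat) == r_test)
    print(f"{n_train:>6} CRPs -> test accuracy {acc:.3f}")
```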

Half-Hop: A graph upsampling approach for slowing down message passing

  • paper_url: http://arxiv.org/abs/2308.09198
  • repo_url: https://github.com/nerdslab/halfhop
  • paper_authors: Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Veličković, Eva L. Dyer
  • for: 提高消息传递神经网络的学习效果,特别在邻居节点属于不同类型时。
  • methods: 添加”慢节点”来协调消息传递,只需修改输入图。
  • results: 在多个监督和自监督学习基准上提升表现,尤其是在相邻节点标签更可能不同的异质(heterophilic)情形下,并可用于为自监督学习生成具有不同路径长度的多尺度增强视图。
    Abstract Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
    摘要 We provide both theoretical and empirical analyses to demonstrate the benefits of slowing down message passing. Our results on several supervised and self-supervised benchmarks show improvements across the board, particularly in heterophilic conditions where adjacent nodes are more likely to have different labels. Additionally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.
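
The graph transformation itself is simple enough to sketch: for each (or a random subset of) directed edge, insert a new "slow node" whose features interpolate the two endpoints and reroute the edge through it. The interpolation coefficient, sampling probability, and whether the original edge is dropped are assumptions of this sketch rather than the exact Half-Hop recipe.

```python
import numpy as np

def half_hop(edge_index, x, p=1.0, alpha=0.5, rng=None):
    """Insert a slow node on each selected directed edge (u, v).

    edge_index: (2, E) int array of directed edges; x: (N, D) node features.
    A selected edge (u, v) is replaced by u->s and s->v, where s is a new node
    with features (1 - alpha) * x[u] + alpha * x[v].
    Returns the new edge_index and feature matrix.
    """
    rng = rng or np.random.default_rng()
    n_nodes = x.shape[0]
    new_edges, new_feats = [], []
    for u, v in edge_index.T:
        if rng.random() < p:
            s = n_nodes + len(new_feats)                 # index of the new slow node
            new_feats.append((1 - alpha) * x[u] + alpha * x[v])
            new_edges += [(u, s), (s, v)]
        else:
            new_edges.append((u, v))
    x_aug = np.vstack([x] + new_feats) if new_feats else x
    return np.array(new_edges, dtype=int).T, x_aug

# Tiny example: a 3-node path graph.
edge_index = np.array([[0, 1], [1, 2]])
x = np.eye(3)
ei, xa = half_hop(edge_index, x, p=1.0)
print(ei)          # edges rerouted through two new slow nodes (indices 3 and 4)
print(xa.shape)    # (5, 3)
```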

A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports

  • paper_url: http://arxiv.org/abs/2308.09193
  • repo_url: https://github.com/av9ash/duplicatebugdetection
  • paper_authors: Avinash Patil, Kihwan Han, Sabyasachi Mukhopadhyay
  • for: 本研究旨在比较不同文本 Similarity 方法在 bug report 中的效果,以提高 bug report 的检索和分类效果。
  • methods: 本研究使用了多种嵌入模型,包括 TF-IDF (基准), FastText, Gensim, BERT, ADA 等。
  • results: 实验结果显示,BERT 在召回率方面优于其他模型,其次是 ADA、Gensim、FastText 和 TF-IDF。这些结果展示了不同嵌入方法在 bug report 检索中的效果,并指出了为这项任务选择合适嵌入方法的重要性。
    Abstract Bug reports are an essential aspect of software development, and it is crucial to identify and resolve them quickly to ensure the consistent functioning of software systems. Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score. We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA. We used the Software Defects Data containing bug reports for various software projects to evaluate the performance of these models. Our experimental results showed that BERT generally outperformed the rest of the models regarding recall, followed by ADA, Gensim, FastText, and TFIDF. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task. Our code is available on GitHub.
    摘要 bug 报告是软件开发中非常重要的一环,快速确定和解决 bug 可以保证软件系统的一致性。从现有数据库中检索类似 bug 报告可以减少解决 bug 所需的时间和努力。在这篇论文中,我们比较了基于语义文本相似度的方法来检索类似 bug 报告的效果,并计算了相似性分数。我们考察了TF-IDF(基准)、FastText、Gensim、BERT 和 ADA 等嵌入模型。我们使用了 Software Defects Data 中的 bug 报告来评估这些模型的性能。我们的实验结果表明,BERT 通常在召回率方面表现出色,其次是 ADA、Gensim、FastText 和 TF-IDF。我们的研究提供了不同嵌入方法在检索类似 bug 报告上的效果的启示,并强调了为这项任务选择合适嵌入方法的重要性。我们的代码可以在 GitHub 上找到。
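
The TF-IDF baseline in this comparison is easy to reproduce; the snippet below indexes a handful of toy bug reports and retrieves the most similar ones for a new report (scikit-learn assumed available; the example reports are invented). Swapping in FastText, BERT, or ADA embeddings only changes how the vectors are produced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus of existing bug reports (stand-ins for a real tracker export).
reports = [
    "App crashes on startup when config file is missing",
    "Login page throws 500 error after password reset",
    "Crash at launch if configuration cannot be found",
    "UI freezes when uploading large attachments",
]

vectorizer = TfidfVectorizer(stop_words="english")
index = vectorizer.fit_transform(reports)

query = "application crashes immediately at startup, config not found"
scores = cosine_similarity(vectorizer.transform([query]), index).ravel()

# Rank existing reports by similarity to the new one (highest first).
for i in scores.argsort()[::-1]:
    print(f"{scores[i]:.3f}  {reports[i]}")
```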

Regularizing Adversarial Imitation Learning Using Causal Invariance

  • paper_url: http://arxiv.org/abs/2308.09189
  • repo_url: None
  • paper_authors: Ivan Ovinnikov, Joachim M. Buhmann
  • for: 这篇论文的目的是使用模仿学习方法从专家示范数据集中推导出策略,其中判别器在对抗优化过程中为策略提供指导信号。
  • methods: 这篇论文在对抗模仿学习框架中引入因果不变性,作为对抗训练的正则化原则。
  • results: 论文发现这类模型容易吸收专家数据中的虚假相关性;加入因果不变性正则化后,在一个二维示例以及多个高维机器人运动基准任务上验证了其有效性。
    Abstract Imitation learning methods are used to infer a policy in a Markov decision process from a dataset of expert demonstrations by minimizing a divergence measure between the empirical state occupancy measures of the expert and the policy. The guiding signal to the policy is provided by the discriminator used as part of an versarial optimization procedure. We observe that this model is prone to absorbing spurious correlations present in the expert data. To alleviate this issue, we propose to use causal invariance as a regularization principle for adversarial training of these models. The regularization objective is applicable in a straightforward manner to existing adversarial imitation frameworks. We demonstrate the efficacy of the regularized formulation in an illustrative two-dimensional setting as well as a number of high-dimensional robot locomotion benchmark tasks.
    摘要 模仿学习方法通过最小化专家与策略的经验状态占用分布之间的散度,从专家示范数据集中推导Markov决策过程中的策略。策略的指导信号由对抗优化过程中使用的判别器提供。我们发现这种模型容易吸收专家数据中存在的虚假相关性。为了解决这个问题,我们提议使用因果不变性作为对抗训练的正则化原则。这个正则化目标可以直接应用于现有的对抗模仿框架中。我们在一个简单的二维设定以及多个高维机器人运动基准任务中证明了这种正则化形式的有效性。

Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

  • paper_url: http://arxiv.org/abs/2308.09187
  • repo_url: https://github.com/lions-epfl/qgenx
  • paper_authors: Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher
  • for: 解决多处理器/工作者/客户端环境下的分布式单调变分不等式(VI)问题,涵盖分布式凸优化、极小-极大问题和博弈等一大类重要问题。
  • methods: 提出了一种量化的广义额外梯度法(Q-GenX),这是一种无偏且自适应的压缩方法,专为求解VI问题而设计,可有效减少通信量。
  • results: 提出了一种自适应步长规则,可适应不同的噪声特征:在相对噪声下达到O(1/T)的快速收敛率,在绝对噪声下达到阶最优的O(1/√T);分布式训练显著加速收敛,并通过实际实验验证了这些理论结论。
    Abstract We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local stochastic dual vectors. This setting includes a broad range of important problems from distributed convex minimization to min-max and games. Extra-gradient, which is a de facto algorithm for monotone VI problems, has not been designed to be communication-efficient. To this end, we propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs. We provide an adaptive step-size rule, which adapts to the respective noise profiles at hand and achieve a fast rate of ${\mathcal O}(1/T)$ under relative noise, and an order-optimal ${\mathcal O}(1/\sqrt{T})$ under absolute noise and show distributed training accelerates convergence. Finally, we validate our theoretical results by providing real-world experiments and training generative adversarial networks on multiple GPUs.
    摘要 我们考虑多GPU环境下的单调变分不等式(VI)问题,其中多个处理器/工作者/客户端可访问本地随机对偶向量。这一设定涵盖了从分布式凸优化到极小-极大问题和博弈等一系列重要问题。额外梯度法(extra-gradient)作为求解单调VI问题的事实标准算法,并未针对通信效率进行设计。为此,我们提出了量化广义额外梯度法(Q-GenX),这是一种专为求解VI而设计的无偏自适应压缩方法。我们给出了一种自适应步长规则,可适应不同的噪声特征:在相对噪声下达到O(1/T)的快速收敛率,在绝对噪声下达到阶最优的O(1/√T),并表明分布式训练可加速收敛。最后,我们通过真实实验以及在多GPU上训练生成对抗网络来验证理论结果。
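
A numpy sketch of the two building blocks discussed above: the extra-gradient update on a simple bilinear min-max (saddle-point) problem, with gradients passed through an unbiased stochastic-rounding quantizer before they are applied, standing in for the compressed communication. The quantizer, step size, and problem instance are illustrative; Q-GenX's actual scheme and step-size rule are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(g, levels=8):
    """Unbiased stochastic rounding of each coordinate onto a uniform grid."""
    scale = np.max(np.abs(g)) + 1e-12
    y = np.abs(g) / scale * levels
    rounded = np.floor(y) + (rng.random(g.shape) < (y - np.floor(y)))
    return np.sign(g) * rounded * scale / levels

# Bilinear saddle-point problem: min_x max_y x^T A y, whose solution is x = y = 0.
A = rng.normal(size=(5, 5))
x, y = rng.normal(size=5), rng.normal(size=5)
eta = 0.05

for t in range(5000):
    gx, gy = A @ y, A.T @ x                          # gradients at (x, y)
    x_half, y_half = x - eta * quantize(gx), y + eta * quantize(gy)
    gx, gy = A @ y_half, A.T @ x_half                # extrapolated gradients
    x, y = x - eta * quantize(gx), y + eta * quantize(gy)

print("distance to saddle point:", np.linalg.norm(np.concatenate([x, y])))
```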

RatGPT: Turning online LLMs into Proxies for Malware Attacks

  • paper_url: http://arxiv.org/abs/2308.09183
  • repo_url: None
  • paper_authors: Mika Beckerich, Laura Plein, Sergio Coronado
  • for: 这项研究旨在探讨使用开源插件和大语言模型(LLMs)在软件工程中新开的可能性,以及这些技术在网络安全方面带来的新挑战。
  • methods: 研究人员使用ChatGPT等LLMs生成攻击性内容,并通过在攻击者和受害者之间作为中间人使用这些LLMs来实现攻击。
  • results: 研究人员成功地使用ChatGPT等LLMs生成攻击性软件,并在不被检测的情况下传递命令到受害者系统。这项研究指出了使用开源插件和LLMs时存在的重要网络安全问题,需要开发安全指南、控制和缓解策略。
    Abstract The evolution of Generative AI and the capabilities of the newly released Large Language Models (LLMs) open new opportunities in software engineering. However, they also lead to new challenges in cybersecurity. Recently, researchers have shown the possibilities of using LLMs such as ChatGPT to generate malicious content that can directly be exploited or guide inexperienced hackers to weaponize tools and code. Those studies covered scenarios that still require the attacker in the middle of the loop. In this study, we leverage openly available plugins and use an LLM as proxy between the attacker and the victim. We deliver a proof-of-concept where ChatGPT is used for the dissemination of malicious software while evading detection, alongside establishing the communication to a command and control (C2) server to receive commands to interact with a victim's system. Finally, we present the general approach as well as essential elements in order to stay undetected and make the attack a success. This proof-of-concept highlights significant cybersecurity issues with openly available plugins and LLMs, which require the development of security guidelines, controls, and mitigation strategies.
    摘要 随着生成式人工智能的演化和新一代大语言模型(LLMs)的发布,软件工程方面开放出了新的机遇。然而,这也导致了新的网络安全挑战。最近,研究人员已经证明了使用 ChatGPT 等 LLMs 生成恶意内容,并直接利用或帮助不熟悉黑客 weaponize 工具和代码。这些研究都是在攻击者在中途的情况下进行的。在本研究中,我们利用公开available的插件,并使用 LLM 作为攻击者和受害者之间的代理。我们实现了一个证明,在使用 ChatGPT 进行恶意软件的散布时,同时避免检测,并与Command and Control(C2)服务器建立通信,以接收对受害者系统的交互命令。最后,我们提出了一种总体方法和重要元素,以确保隐蔽和成功攻击。这个证明指出了公开available的插件和 LLMS 对网络安全的潜在问题,需要开发安全指南、控制和缓解策略。

ChatGPT-HealthPrompt. Harnessing the Power of XAI in Prompt-Based Healthcare Decision Support using ChatGPT

  • paper_url: http://arxiv.org/abs/2308.09731
  • repo_url: None
  • paper_authors: Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia
  • for: 本研究旨在探讨大语言模型(LLM)在医疗决策中的应用,尤其是OpenAI的ChatGPT。我们的方法使用上下文提示,精心设计任务描述、特征描述,并将医疗领域知识整合到提示中。
  • methods: 我们的研究从高性能、可解释的机器学习(ML)模型中提取医疗领域知识,并将其无缝整合到提示设计中。我们将这些ML模型视为医疗专家,从中提取关键的特征重要性,以辅助决策过程。
  • results: 我们的研究发现,通过上下文提示与医疗领域知识的整合,可以在数据稀缺的情况下实现高质量的二分类任务。此外,我们还探讨了LLM在不同数据条件下的零样本和少样本提示学习效果,并与传统监督式ML模型进行比较。
    Abstract This study presents an innovative approach to the application of large language models (LLMs) in clinical decision-making, focusing on OpenAI's ChatGPT. Our approach introduces the use of contextual prompts-strategically designed to include task description, feature description, and crucially, integration of domain knowledge-for high-quality binary classification tasks even in data-scarce scenarios. The novelty of our work lies in the utilization of domain knowledge, obtained from high-performing interpretable ML models, and its seamless incorporation into prompt design. By viewing these ML models as medical experts, we extract key insights on feature importance to aid in decision-making processes. This interplay of domain knowledge and AI holds significant promise in creating a more insightful diagnostic tool. Additionally, our research explores the dynamics of zero-shot and few-shot prompt learning based on LLMs. By comparing the performance of OpenAI's ChatGPT with traditional supervised ML models in different data conditions, we aim to provide insights into the effectiveness of prompt engineering strategies under varied data availability. In essence, this paper bridges the gap between AI and healthcare, proposing a novel methodology for LLMs application in clinical decision support systems. It highlights the transformative potential of effective prompt design, domain knowledge integration, and flexible learning approaches in enhancing automated decision-making.
    摘要 Our research also explores the dynamics of zero-shot and few-shot prompt learning based on LLMs. By comparing the performance of OpenAI's ChatGPT with traditional supervised ML models in different data conditions, we aim to provide insights into the effectiveness of prompt engineering strategies under varied data availability. This study bridges the gap between AI and healthcare, proposing a novel methodology for LLMs application in clinical decision support systems. Our approach highlights the transformative potential of effective prompt design, domain knowledge integration, and flexible learning approaches in enhancing automated decision-making.
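
The prompt-construction step can be sketched directly: feature importances from an interpretable model (here, logistic-regression coefficients) are folded into a task description and a patient record, producing the context prompt that would then be sent to the LLM. The feature names, template wording, and the omitted API call are all illustrative assumptions, not the paper's exact prompts.

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Toy tabular data: three clinical features, binary outcome.
rng = np.random.default_rng(0)
feature_names = ["age", "systolic_bp", "cholesterol"]
X = rng.normal(size=(200, 3))
y = (0.8 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
importance = sorted(zip(feature_names, clf.coef_[0]), key=lambda t: -abs(t[1]))

domain_hint = "; ".join(f"{name} (weight {w:+.2f})" for name, w in importance)
patient = {"age": 0.3, "systolic_bp": 1.8, "cholesterol": -0.2}
record = ", ".join(f"{k}={v}" for k, v in patient.items())

prompt = (
    "Task: decide whether the patient is at high risk (answer yes/no).\n"
    f"Domain knowledge from an interpretable model, most important first: {domain_hint}.\n"
    f"Patient record (standardized values): {record}.\n"
    "Answer:"
)
print(prompt)   # this string would then be passed to the LLM of choice
```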

Diversifying AI: Towards Creative Chess with AlphaZero

  • paper_url: http://arxiv.org/abs/2308.09175
  • repo_url: None
  • paper_authors: Tom Zahavy, Vivek Veeriah, Shaobo Hou, Kevin Waugh, Matthew Lai, Edouard Leurent, Nenad Tomasev, Lisa Schut, Demis Hassabis, Satinder Singh
  • for: 这项研究是为了检查人工智能(AI)系统是否可以从创造性决策机制中受益,当被推到计算性理解的边缘时。
  • methods: 我们使用了多种行为多样性技术来让AI系统生成更多的想法,然后选择最有前途的想法。我们基于AlphaZero(AZ)扩展了其为一个联盟agent,并使用了隐藏条件的建筑来让AZ_db在不同的开局中选择最佳策略。
  • results: 我们的实验表明,AZ_db在下棋时展现出更多样的风格,作为一个团队解决了更多的谜题,并优于更同质化的团队;特别是,AZ_db解决的困难谜题数量是AZ的两倍,其中包括困难的彭罗斯(Penrose)局面。在从不同开局下棋时,我们发现AZ_db中的各个棋手专精于不同的开局,而通过次可加规划为每个开局挑选棋手,可比AZ提高50个Elo分。我们的发现表明,AI代理团队中也会出现多样性红利,与人类团队中的情形相似。
    Abstract In recent years, Artificial Intelligence (AI) systems have surpassed human intelligence in a variety of computational tasks. However, AI systems, like humans, make mistakes, have blind spots, hallucinate, and struggle to generalize to new situations. This work explores whether AI can benefit from creative decision-making mechanisms when pushed to the limits of its computational rationality. In particular, we investigate whether a team of diverse AI systems can outperform a single AI in challenging tasks by generating more ideas as a group and then selecting the best ones. We study this question in the game of chess, the so-called drosophila of AI. We build on AlphaZero (AZ) and extend it to represent a league of agents via a latent-conditioned architecture, which we call AZ_db. We train AZ_db to generate a wider range of ideas using behavioral diversity techniques and select the most promising ones with sub-additive planning. Our experiments suggest that AZ_db plays chess in diverse ways, solves more puzzles as a group and outperforms a more homogeneous team. Notably, AZ_db solves twice as many challenging puzzles as AZ, including the challenging Penrose positions. When playing chess from different openings, we notice that players in AZ_db specialize in different openings, and that selecting a player for each opening using sub-additive planning results in a 50 Elo improvement over AZ. Our findings suggest that diversity bonuses emerge in teams of AI agents, just as they do in teams of humans and that diversity is a valuable asset in solving computationally hard problems.
    摘要 近年来,人工智能(AI)系统已经在多种计算任务中超越了人类智能。然而,AI系统与人类一样,也会犯错、存在盲点、产生幻觉,并且难以泛化到新情况。这项工作探索了当AI被推到其计算理性的极限时,是否可以从创造性决策机制中获益。特别是,我们研究了一个多样化的AI系统团队是否可以通过作为一个群体生成更多想法、再从中选出最佳想法,从而在复杂任务中超越单一AI。我们在国际象棋中进行研究,国际象棋被称为AI的"果蝇"。我们基于AlphaZero(AZ),通过潜在条件架构将其扩展为一个代理联盟,称为AZ_db。我们使用行为多样性技术训练AZ_db以生成更广泛的想法,并使用次可加规划选择最有前途的想法。我们的实验表明,AZ_db以多样化的方式下棋,作为一个群体解决了更多的谜题,并优于更同质化的团队。值得注意的是,AZ_db解决的困难谜题数量是AZ的两倍,其中包括困难的彭罗斯局面。在从不同开局下棋时,我们注意到AZ_db中的棋手专精于不同的开局,而使用次可加规划为每个开局选择棋手,可比AZ提高50个Elo分。我们的发现表明,在AI代理团队中会出现多样性红利,正如在人类团队中一样,多样性是解决计算难题的宝贵资产。

Forensic Data Analytics for Anomaly Detection in Evolving Networks

  • paper_url: http://arxiv.org/abs/2308.09171
  • repo_url: None
  • paper_authors: Li Yang, Abdallah Moubayed, Abdallah Shami, Amine Boukhtouta, Parisa Heidari, Stere Preda, Richard Brunner, Daniel Migault, Adel Larabi
  • for: The paper is written to elaborate effective security controls to protect evolving network deployments in-depth, specifically in the context of 5G and virtualization.
  • methods: The paper proposes a digital forensic data analytics framework for network anomaly detection, which includes multi-perspective feature engineering, unsupervised anomaly detection, and comprehensive result correction procedures.
  • results: Experiments on real-world evolving network data demonstrate the effectiveness of the proposed forensic data analytics solution.
    Abstract In the prevailing convergence of traditional infrastructure-based deployment (i.e., Telco and industry operational networks) towards evolving deployments enabled by 5G and virtualization, there is a keen interest in elaborating effective security controls to protect these deployments in-depth. By considering key enabling technologies like 5G and virtualization, evolving networks are democratized, facilitating the establishment of point presences integrating different business models ranging from media, dynamic web content, gaming, and a plethora of IoT use cases. Despite the increasing services provided by evolving networks, many cybercrimes and attacks have been launched in evolving networks to perform malicious activities. Due to the limitations of traditional security artifacts (e.g., firewalls and intrusion detection systems), the research on digital forensic data analytics has attracted more attention. Digital forensic analytics enables people to derive detailed information and comprehensive conclusions from different perspectives of cybercrimes to assist in convicting criminals and preventing future crimes. This chapter presents a digital analytics framework for network anomaly detection, including multi-perspective feature engineering, unsupervised anomaly detection, and comprehensive result correction procedures. Experiments on real-world evolving network data show the effectiveness of the proposed forensic data analytics solution.
    摘要 在传统基础设施部署(如电信和行业运营网络)向由5G和虚拟化驱动的演进部署融合的背景下,人们迫切希望制定有效的安全控制,对这些部署进行深度保护。借助5G和虚拟化等关键使能技术,演进网络得以民主化,便于建立整合不同业务模式的接入点(point of presence),涵盖媒体、动态网页内容、游戏以及大量物联网应用场景。然而,随着演进网络提供的服务不断增加,许多网络犯罪和攻击也在演进网络中发起恶意活动。由于传统安全手段(如防火墙和入侵检测系统)的局限性,数字取证数据分析的研究受到了更多关注。数字取证分析能够帮助人们从网络犯罪的不同角度获得详细信息并得出全面结论,以协助定罪和预防未来犯罪。本章介绍了一个用于网络异常检测的数字取证分析框架,包括多视角特征工程、无监督异常检测和完整的结果修正过程。对真实演进网络数据的实验表明了所提数字取证数据分析解决方案的有效性。
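
A compact sketch of the unsupervised stage of such a pipeline: engineer a few per-flow features from raw records and score anomalies with an Isolation Forest (scikit-learn assumed available). The feature names and the synthetic records are placeholders for real network telemetry, not data from the chapter.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Multi-perspective per-flow features: [bytes sent, duration (s), distinct ports].
normal = np.column_stack([
    rng.normal(2e4, 5e3, 500),   # typical traffic volume
    rng.normal(30, 10, 500),
    rng.integers(1, 5, 500),
])
scans = np.column_stack([        # a few port-scan-like flows
    rng.normal(2e3, 5e2, 5),
    rng.normal(2, 1, 5),
    rng.integers(200, 500, 5),
])
flows = np.vstack([normal, scans])

model = IsolationForest(contamination=0.01, random_state=0).fit(flows)
scores = model.decision_function(flows)        # lower = more anomalous

worst = np.argsort(scores)[:5]
print("most anomalous flow indices:", worst)   # should include the injected scans
```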

Online Transition-Based Feature Generation for Anomaly Detection in Concurrent Data Streams

  • paper_url: http://arxiv.org/abs/2308.10893
  • repo_url: None
  • paper_authors: Yinzheng Zhong, Alexei Lisitsa
  • for: 该技术用于读取带属性的通用活动数据(如网络数据包、进程系统调用或监控视频中的分类活动),并逐步生成特征数据。
  • methods: 该技术使用基于转移的特征生成器(TFGen),可以在线处理数据,并为每个到来的活动编码历史数据,以实现高效计算。
  • results: 该技术旨在解决领域无关的适用性、全局过程结构的发现、时间序列数据编码以及在线处理能力等问题。
    Abstract In this paper, we introduce the transition-based feature generator (TFGen) technique, which reads general activity data with attributes and generates step-by-step generated data. The activity data may consist of network activity from packets, system calls from processes or classified activity from surveillance cameras. TFGen processes data online and will generate data with encoded historical data for each incoming activity with high computational efficiency. The input activities may concurrently originate from distinct traces or channels. The technique aims to address issues such as domain-independent applicability, the ability to discover global process structures, the encoding of time-series data, and online processing capability.
    摘要 在这篇论文中,我们介绍了一种基于过程的特征生成技术(TFGen),它可以读取具有特征的活动数据,并生成步骤生成的数据。活动数据可以包括网络活动数据包、进程系统调用或Surveillance camera中分类的活动数据。TFGen处理数据在线,并将生成每个入参活动数据的编码历史数据,以高效处理能力进行处理。输入活动可同时来自不同的轨迹或通道。该技术的目标是解决域独立应用性、找到全局过程结构、编码时间序列数据以及在线处理能力等问题。

FedPerfix: Towards Partial Model Personalization of Vision Transformers in Federated Learning

  • paper_url: http://arxiv.org/abs/2308.09160
  • repo_url: https://github.com/imguangyu/fedperfix
  • paper_authors: Guangyu Sun, Matias Mendieta, Jun Luo, Shandong Wu, Chen Chen
  • for: 这个研究旨在提高联邦学习中的模型个性化,使其适用于数据分布各异的异构环境,并聚焦于视觉Transformer(ViT)模型。
  • methods: 研究使用部分模型个性化来提高个性化联邦学习的效率,并对ViT中各类型层对数据分布的敏感性进行了实证评估,以确定最需要个性化的部分。
  • results: 结果显示,在 CIFAR-100、OrganAMNIST 和 Office-Home 等数据集上,FedPerfix 相比多种先进 PFL 方法取得了更好的效果。
    Abstract Personalized Federated Learning (PFL) represents a promising solution for decentralized learning in heterogeneous data environments. Partial model personalization has been proposed to improve the efficiency of PFL by selectively updating local model parameters instead of aggregating all of them. However, previous work on partial model personalization has mainly focused on Convolutional Neural Networks (CNNs), leaving a gap in understanding how it can be applied to other popular models such as Vision Transformers (ViTs). In this work, we investigate where and how to partially personalize a ViT model. Specifically, we empirically evaluate the sensitivity to data distribution of each type of layer. Based on the insights that the self-attention layer and the classification head are the most sensitive parts of a ViT, we propose a novel approach called FedPerfix, which leverages plugins to transfer information from the aggregated model to the local client as a personalization. Finally, we evaluate the proposed approach on CIFAR-100, OrganAMNIST, and Office-Home datasets and demonstrate its effectiveness in improving the model's performance compared to several advanced PFL methods.
    摘要 个性化联邦学习(PFL)是一种有前途的解决方案,用于在异构数据环境中进行分布式学习。部分模型个性化通过选择性地更新本地模型参数而非汇聚全部参数,可以提高 PFL 的效率。然而,以往关于部分模型个性化的研究主要集中在卷积神经网络(CNN)上,尚未研究如何将其应用到其他流行的模型,如视觉Transformer(ViT)。在这项工作中,我们调查了应在 ViT 模型的哪些部分、以何种方式进行部分个性化。具体来说,我们实证评估了每种类型的层对数据分布的敏感性。基于自注意力层和分类头是 ViT 中最敏感部分这一发现,我们提出了一种名为 FedPerfix 的新方法,它使用插件将汇聚模型中的信息传递给本地客户端作为个性化。最后,我们在 CIFAR-100、OrganAMNIST 和 Office-Home 数据集上评估了所提方法,并证明它相比多种先进的 PFL 方法能有效提升模型性能。
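
The partial-personalization idea can be shown as a parameter-name split: attention and classification-head parameters stay local, everything else is federated-averaged. The name patterns and toy state dicts below are assumptions chosen to look like typical ViT checkpoints, not FedPerfix's actual implementation.

```python
import numpy as np

PERSONAL_PATTERNS = ("attn", "head")   # assumed markers for self-attention and classifier

def is_personalized(name: str) -> bool:
    return any(p in name for p in PERSONAL_PATTERNS)

def federated_round(client_states):
    """Average only the shared parameters; personalized ones stay per-client."""
    shared_names = [n for n in client_states[0] if not is_personalized(n)]
    averaged = {n: np.mean([s[n] for s in client_states], axis=0) for n in shared_names}
    for state in client_states:          # push the averaged shared weights back
        state.update(averaged)
    return client_states

# Two toy clients with identically structured "ViT-like" state dicts.
def toy_state(seed):
    rng = np.random.default_rng(seed)
    return {
        "blocks.0.attn.qkv.weight": rng.normal(size=(4, 4)),
        "blocks.0.mlp.fc1.weight": rng.normal(size=(4, 4)),
        "head.weight": rng.normal(size=(2, 4)),
    }

clients = federated_round([toy_state(0), toy_state(1)])
print(np.allclose(clients[0]["blocks.0.mlp.fc1.weight"],
                  clients[1]["blocks.0.mlp.fc1.weight"]))   # True: shared layer synced
print(np.allclose(clients[0]["head.weight"], clients[1]["head.weight"]))  # False: kept local
```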

Data diversity and virtual imaging in AI-based diagnosis: A case study based on COVID-19

  • paper_url: http://arxiv.org/abs/2308.09730
  • repo_url: None
  • paper_authors: Fakrul Islam Tushar, Lavsen Dahal, Saman Sotoudeh-Paima, Ehsan Abadi, W. Paul Segars, Ehsan Samei, Joseph Y. Lo
  • for: This study aimed to evaluate the performance of deep-learning-based AI models for COVID-19 diagnosis using diverse clinical and virtually generated medical images, and to assess the impact of dataset characteristics, disease extent, and imaging modality on AI performance.
  • methods: The study used a retrospective design and developed AI models using both clinical and virtually generated medical images. A virtual imaging trial was conducted to assess the impact of patient- and physics-based factors on AI performance.
  • results: The study found that AI performance was strongly influenced by dataset characteristics, with poor generalization to new data and a 20% drop in receiver operating characteristic area under the curve. CT results were consistently superior to those from CXR, and imaging dose had negligible influence on results. The study highlighted the significance of dataset characteristics and disease extent on COVID assessment, and the potential role of virtual imaging trial techniques in developing effective AI algorithms and facilitating translation into diagnostic practice.
    Abstract Many studies have investigated deep-learning-based artificial intelligence (AI) models for medical imaging diagnosis of the novel coronavirus (COVID-19), with many reports of near-perfect performance. However, variability in performance and underlying data biases raise concerns about clinical generalizability. This retrospective study involved the development and evaluation of artificial intelligence (AI) models for COVID-19 diagnosis using both diverse clinical and virtually generated medical images. In addition, we conducted a virtual imaging trial to assess how AI performance is affected by several patient- and physics-based factors, including the extent of disease, radiation dose, and imaging modality of computed tomography (CT) and chest radiography (CXR). AI performance was strongly influenced by dataset characteristics including quantity, diversity, and prevalence, leading to poor generalization with up to 20% drop in receiver operating characteristic area under the curve. Model performance on virtual CT and CXR images was comparable to overall results on clinical data. Imaging dose proved to have negligible influence on the results, but the extent of the disease had a marked affect. CT results were consistently superior to those from CXR. Overall, the study highlighted the significant impact of dataset characteristics and disease extent on COVID assessment, and the relevance and potential role of virtual imaging trial techniques on developing effective evaluation of AI algorithms and facilitating translation into diagnostic practice.
    摘要 许多研究已经探讨了基于深度学习的人工智能(AI)模型用于新冠肺炎(COVID-19)的医疗影像诊断,其中不少报告了接近完美的性能。然而,性能的差异和潜在的数据偏差引发了对临床可泛化性的担忧。这项回顾性研究开发并评估了使用多样的临床影像和虚拟生成的医学影像进行 COVID-19 诊断的 AI 模型。此外,我们还进行了虚拟成像试验,以评估 AI 性能如何受患者和物理因素的影响,包括疾病程度、辐射剂量以及计算机断层扫描(CT)与胸部X光(CXR)的成像方式。AI 性能强烈受数据集特征(数量、多样性和患病率)的影响,导致泛化能力较差,受试者工作特征曲线下面积最多下降20%。模型在虚拟 CT 和 CXR 影像上的性能与在临床数据上的总体结果相当。成像剂量对结果的影响可以忽略,但疾病程度有显著影响。CT 的结果始终优于 CXR。总之,该研究强调了数据集特征和疾病程度对 COVID 评估的重大影响,以及虚拟成像试验技术在有效评估 AI 算法并推动其进入诊断实践方面的相关性和潜在作用。

ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

  • paper_url: http://arxiv.org/abs/2308.09158
  • repo_url: https://github.com/zhangyikaii/lamda-zhijian
  • paper_authors: Yi-Kai Zhang, Lu Ren, Chao Yi, Qi-Wei Wang, De-Chuan Zhan, Han-Jia Ye
  • for: This paper is written for deep learning practitioners and researchers who want to explore downstream tasks and identify the complementary advantages among different methods for model reuse.
  • methods: The paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse that utilizes the PyTorch backend. ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, including target architecture construction with PTM, tuning target model with PTM, and PTM-based inference.
  • results: The paper empowers deep learning practitioners to explore downstream tasks and identify the complementary advantages among different methods for model reuse, and ZhiJian is readily accessible at https://github.com/zhangyikaii/lamda-zhijian for seamless utilization of pre-trained models and streamlining the model reuse process.
    Abstract The rapid expansion of foundation pre-trained models and their fine-tuned counterparts has significantly contributed to the advancement of machine learning. Leveraging pre-trained models to extract knowledge and expedite learning in real-world tasks, known as "Model Reuse", has become crucial in various applications. Previous research focuses on reusing models within a certain aspect, including reusing model weights, structures, and hypothesis spaces. This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend. ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning target model with PTM, and PTM-based inference. This empowers deep learning practitioners to explore downstream tasks and identify the complementary advantages among different methods. ZhiJian is readily accessible at https://github.com/zhangyikaii/lamda-zhijian facilitating seamless utilization of pre-trained models and streamlining the model reuse process for researchers and developers.
    摘要 “快速扩展基础模型和其精度调整版本的发展对机器学习进步做出了重要贡献。利用预训练模型提取知识和快速学习实际任务,称为“模型重用”,在各种应用中变得非常重要。先前的研究主要关注在某一方面进行模型重用,包括重用模型权重、结构和假设空间。本文介绍了智慧箱(ZhiJian),一个通用且易用的模型重用工具箱,使用PyTorch backend。智慧箱提出了一种新的模型重用 paradigma,整合多种视角,包括目标建筑PTM、调整目标模型PTM和PTM基于的推理。这使得深度学习专家能够更好地探索下游任务,并发现不同方法之间的补偿优势。智慧箱可以在https://github.com/zhangyikaii/lamda-zhijian中免费获取,便于预训练模型的使用和模型重用过程,为研究人员和开发者提供便捷的使用。”

Accurate machine learning force fields via experimental and simulation data fusion

  • paper_url: http://arxiv.org/abs/2308.09142
  • repo_url: None
  • paper_authors: Sebastien Röcken, Julija Zavadlav
  • for: 这篇论文的目的是开发一种基于机器学习(ML)的力场模型,使其能以量子级精度覆盖经典原子间势的时空尺度。
  • methods: 该论文融合密度泛函理论(DFT)计算与实验测得的力学性质和晶格参数,来训练钛的ML力场模型。
  • results: 研究人员发现,融合DFT计算与实验数据来训练ML力场模型,可以同时满足所有目标性质,且比只用单一数据源训练的模型更准确。此外,这种方法能够纠正DFT泛函在目标实验性质上的偏差,同时基本不影响所考察的非目标性质的准确性。
    Abstract Machine Learning (ML)-based force fields are attracting ever-increasing interest due to their capacity to span spatiotemporal scales of classical interatomic potentials at quantum-level accuracy. They can be trained based on high-fidelity simulations or experiments, the former being the common case. However, both approaches are impaired by scarce and erroneous data resulting in models that either do not agree with well-known experimental observations or are under-constrained and only reproduce some properties. Here we leverage both Density Functional Theory (DFT) calculations and experimentally measured mechanical properties and lattice parameters to train an ML potential of titanium. We demonstrate that the fused data learning strategy can concurrently satisfy all target objectives, thus resulting in a molecular model of higher accuracy compared to the models trained with a single data source. The inaccuracies of DFT functionals at target experimental properties were corrected, while the investigated off-target properties remained largely unperturbed. Our approach is applicable to any material and can serve as a general strategy to obtain highly accurate ML potentials.
    摘要

RTB Formulation Using Point Process

  • paper_url: http://arxiv.org/abs/2308.09122
  • repo_url: None
  • paper_authors: Seong Jin Lee, Bumsik Kim
  • for: 这篇论文旨在使用点过程为实时竞价(RTB)生态系统中的重复拍卖建模,并提出一个通用的随机框架。
  • methods: 该论文使用点过程模型来描述拍卖过程,并给出了将该过程近似为泊松(Poisson)点过程的理论结果。
  • results: 该论文给出了玩家在不同场景下的最优策略,并强调应考虑效用与市场条件的联合分布,而非独立估计各自的边际分布。
    Abstract We propose a general stochastic framework for modelling repeated auctions in the Real Time Bidding (RTB) ecosystem using point processes. The flexibility of the framework allows a variety of auction scenarios including configuration of information provided to player, determination of auction winner and quantification of utility gained from each auctions. We propose theoretical results on how this formulation of process can be approximated to a Poisson point process, which enables the analyzer to take advantage of well-established properties. Under this framework, we specify the player's optimal strategy under various scenarios. We also emphasize that it is critical to consider the joint distribution of utility and market condition instead of estimating the marginal distributions independently.
    摘要 我们提出了一种通用的随机框架,用于使用点过程来建模RTB生态系统中的重复拍卖。这个框架的灵活性允许多种拍卖场景,包括向玩家提供信息的配置、拍卖赢家的确定以及每次拍卖中所获效用的量化。我们提出了理论结果,表明这种过程的形式可以近似为一个泊松点过程,这使得分析者可以利用已有的良好性质。在这个框架下,我们定义了玩家在不同场景下的最优策略。我们还强调,应考虑效用和市场条件的联合分布,而不是独立地估计各自的边际分布。
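
A minimal simulation in the spirit of the Poisson approximation above: auction opportunities arrive as a Poisson process, the player bids with a fixed shading factor in a second-price auction, and utility is accumulated per auction. The arrival rate, value and competition distributions are illustrative assumptions, not the paper's model parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

rate = 50.0          # auctions per unit time (Poisson intensity)
horizon = 10.0
shading = 0.8        # bid = shading * private value

n_auctions = rng.poisson(rate * horizon)             # Poisson number of arrivals
arrival_times = np.sort(rng.uniform(0, horizon, n_auctions))

values = rng.lognormal(mean=0.0, sigma=0.5, size=n_auctions)          # player's private values
competing_bids = rng.lognormal(mean=0.0, sigma=0.5, size=n_auctions)  # highest rival bid

bids = shading * values
wins = bids > competing_bids
# Second-price auction: the winner pays the highest competing bid.
utility = np.where(wins, values - competing_bids, 0.0)

print(f"auctions: {n_auctions}, win rate: {wins.mean():.2f}, total utility: {utility.sum():.2f}")
```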

Multi-fidelity Fourier Neural Operator for Fast Modeling of Large-Scale Geological Carbon Storage

  • paper_url: http://arxiv.org/abs/2308.09113
  • repo_url: None
  • paper_authors: Hewei Tang, Qingkai Kong, Joseph P. Morris
  • for: 用于加速地质碳封存(GCS)问题中储层压力和CO2羽流迁移预测的深度学习代理模拟器。
  • methods: 使用多保真度傅里叶神经算子(FNO)求解大规模GCS问题,利用成本更低的多保真度训练数据集来预测复杂的物理行为。
  • results: 测试表明,多保真度FNO模型可在数据生成成本降低81%的情况下达到与高保真模型相当的准确率;即使高、低保真数据由不同的地质统计模型和储层模拟器生成,模型仍能给出合理的预测。
    Abstract Deep learning-based surrogate models have been widely applied in geological carbon storage (GCS) problems to accelerate the prediction of reservoir pressure and CO2 plume migration. Large amounts of data from physics-based numerical simulators are required to train a model to accurately predict the complex physical behaviors associated with this process. In practice, the available training data are always limited in large-scale 3D problems due to the high computational cost. Therefore, we propose to use a multi-fidelity Fourier Neural Operator to solve large-scale GCS problems with more affordable multi-fidelity training datasets. The Fourier Neural Operator has a desirable grid-invariant property, which simplifies the transfer learning procedure between datasets with different discretization. We first test the model efficacy on a GCS reservoir model being discretized into 110k grid cells. The multi-fidelity model can predict with accuracy comparable to a high-fidelity model trained with the same amount of high-fidelity data with 81% less data generation costs. We further test the generalizability of the multi-fidelity model on a same reservoir model with a finer discretization of 1 million grid cells. This case was made more challenging by employing high-fidelity and low-fidelity datasets generated by different geostatistical models and reservoir simulators. We observe that the multi-fidelity FNO model can predict pressure fields with reasonable accuracy even when the high-fidelity data are extremely limited.
    摘要 基于深度学习的代理模型已广泛应用于地质碳封存(GCS)问题,以加速储层压力和CO2羽流迁移的预测。训练模型以准确预测该过程中的复杂物理行为,需要来自基于物理的数值模拟器的大量数据。然而在大规模3D问题中,由于计算成本很高,实际可用的训练数据总是有限的。因此,我们提议使用多保真度傅里叶神经算子,以更经济的多保真度训练数据集来解决大规模GCS问题。傅里叶神经算子具有理想的网格不变性,这简化了不同离散化数据集之间的迁移学习过程。我们首先在一个被离散为11万个网格单元的GCS储层模型上测试了模型的有效性:多保真度模型可以达到与使用相同数量高保真数据训练的高保真模型相当的准确率,而数据生成成本降低81%。我们进一步在同一储层模型的更精细离散(100万个网格单元)上测试了多保真度模型的泛化能力;该实验更具挑战性,因为高保真和低保真数据集由不同的地质统计模型和储层模拟器生成。我们观察到,即便高保真数据极为有限,多保真度FNO模型仍能以合理的精度预测压力场。

Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation

  • paper_url: http://arxiv.org/abs/2308.09105
  • repo_url: None
  • paper_authors: Shengcao Cao, Mengtian Li, James Hays, Deva Ramanan, Yi-Xiong Wang, Liang-Yan Gui
  • for: 提高资源受限的感知系统中的视觉模型精度和轻量级计算和存储使用。
  • methods: 提出一种简单 yet 有效的顺序方法,从多个教师检测器中逐步传递知识到一个轻量级学生模型。
  • results: 成功地从基于Transformer的教师检测器中蒸馏知识,并通过逐步的知识传递,在MS COCO基准上将基于ResNet-50的RetinaNet的AP从36.5%提升到42.0%,将Mask R-CNN的AP从38.2%提升到42.5%。
    Abstract Resource-constrained perception systems such as edge computing and vision-for-robotics require vision models to be both accurate and lightweight in computation and memory usage. While knowledge distillation is a proven strategy to enhance the performance of lightweight classification models, its application to structured outputs like object detection and instance segmentation remains a complicated task, due to the variability in outputs and complex internal network modules involved in the distillation process. In this paper, we propose a simple yet surprisingly effective sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student. To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt. Our progressive strategy can be easily combined with existing detection distillation mechanisms to consistently maximize student performance in various settings. To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students, and unprecedentedly boost the performance of ResNet-50 based RetinaNet from 36.5% to 42.0% AP and Mask R-CNN from 38.2% to 42.5% AP on the MS COCO benchmark.
    摘要 资源受限的感知系统,如边缘 computing 和 robotics 视觉系统,需要视觉模型具备高度精度和轻量级的计算和内存使用。而知识传递是一种证实的策略,可以提高轻量级分类模型的性能,但是对于结构化输出 like 物体探测和实例分割仍然是一个复杂的任务,因为输出的变化和内部网络模组的复杂性。在这篇文章中,我们提出了一个简单又奇怪有效的顺序式知识传递方法,可以将一群教师探测器中的知识逐渐传递给一个轻量级学生。为了将高度精度但复杂的教师探测器中的知识传递给 convolution 型学生,我们建立了一系列的教师,帮助学生逐渐适应。我们的顺序式策略可以与现有的探测传递机制整合,以确保学生在不同的设定下的表现最佳。我们知道,我们是第一个成功地将 Transformer 型教师探测器中的知识传递给 convolution 型学生,并在 MS COCO benchmark 上从 36.5% 提高到 42.0% AP 和 38.2% 提高到 42.5% AP。
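
The sequential idea of distilling from one teacher at a time and moving to the next as training progresses can be sketched at the loss level, as below. The schedule, temperature, and loss weights are illustrative assumptions; a real detector distillation would involve feature maps and box predictions rather than plain classification logits.

```python
import torch
import torch.nn.functional as F

def progressive_kd_loss(student_logits, teacher_logits_list, labels,
                        progress, temperature=2.0, kd_weight=0.5):
    """Task loss plus KL to the teacher scheduled for the current training progress.

    progress in [0, 1) selects which teacher in the sequence is active,
    so the student adapts gradually from the first teacher to the last.
    """
    idx = min(int(progress * len(teacher_logits_list)), len(teacher_logits_list) - 1)
    teacher_logits = teacher_logits_list[idx]
    task = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return task + kd_weight * kd

# Toy usage: two teachers in sequence, 10 classes, batch of 8.
student = torch.randn(8, 10, requires_grad=True)
teachers = [torch.randn(8, 10), torch.randn(8, 10)]
labels = torch.randint(0, 10, (8,))
loss = progressive_kd_loss(student, teachers, labels, progress=0.7)
loss.backward()
print(float(loss))
```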

A comprehensive study of spike and slab shrinkage priors for structurally sparse Bayesian neural networks

  • paper_url: http://arxiv.org/abs/2308.09104
  • repo_url: None
  • paper_authors: Sanket Jantre, Shrijita Bhattacharya, Tapabrata Maiti
  • for: 提高深度学习的网络复杂性和计算效率,使用减少极大过 parameterized 深度神经网络来恢复稀疏表示。
  • methods: 使用架构稀疏(例如节点稀疏)压缩深度神经网络,以实现低延迟推理、高速数据传输和降低能耗。
  • results: 使用架构稀疏的贝叶斯神经网络,通过 Spike-and-Slab Group Lasso (SS-GL) 和 Spike-and-Slab Group Horseshoe (SS-GHS) 先验系统地剪除多余节点,并实现计算上可处理的变分推理(包括对伯努利变量的连续松弛)。论文给出了变分后验的收缩率与网络拓扑、各层节点数量和网络权值界之间的关系,并通过实验证明模型在预测精度、模型压缩和推理延迟方面与基线模型相比具有竞争力。
    Abstract Network complexity and computational efficiency have become increasingly significant aspects of deep learning. Sparse deep learning addresses these challenges by recovering a sparse representation of the underlying target function by reducing heavily over-parameterized deep neural networks. Specifically, deep neural architectures compressed via structured sparsity (e.g. node sparsity) provide low latency inference, higher data throughput, and reduced energy consumption. In this paper, we explore two well-established shrinkage techniques, Lasso and Horseshoe, for model compression in Bayesian neural networks. To this end, we propose structurally sparse Bayesian neural networks which systematically prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors, and develop computationally tractable variational inference including continuous relaxation of Bernoulli variables. We establish the contraction rates of the variational posterior of our proposed models as a function of the network topology, layer-wise node cardinalities, and bounds on the network weights. We empirically demonstrate the competitive performance of our models compared to the baseline models in prediction accuracy, model compression, and inference latency.
    摘要 网络复杂性和计算效率在深度学习中日益重要。稀疏深度学习通过削减严重过参数化的深度神经网络、恢复目标函数的稀疏表示来应对这些挑战。具体而言,通过结构化稀疏(例如节点稀疏)压缩的深度神经架构可提供低延迟推理、更高的数据吞吐量和更低的能耗。在这篇论文中,我们探讨了两种成熟的收缩技术——Lasso 和 Horseshoe——用于贝叶斯神经网络的模型压缩。为此,我们提出了结构化稀疏的贝叶斯神经网络,分别使用 (i) Spike-and-Slab Group Lasso(SS-GL)和 (ii) Spike-and-Slab Group Horseshoe(SS-GHS)先验系统地剪除多余的节点,并开发了计算上可处理的变分推理,包括对伯努利变量的连续松弛。我们给出了所提模型变分后验的收缩率,其取决于网络拓扑、各层节点数量以及网络权值的界。我们通过实验表明,与基线模型相比,我们的模型在预测精度、模型压缩和推理延迟方面具有竞争性。

MindMap: Knowledge Graph Prompting Sparks Graph of Thoughts in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.09729
  • repo_url: https://github.com/willing510/MindMap
  • paper_authors: Yilin Wen, Zifeng Wang, Jimeng Sun
  • for: 这篇论文的目的是解决大语言模型(LLM)的局限,包括知识难以更新、幻觉(hallucination)以及决策过程不透明等问题。
  • methods: 这篇论文使用知识图(KG)来激发LLM结合最新知识进行推理。具体来说,我们构建了一个提示管道,使LLM能够理解KG输入,并结合内部的隐式知识与检索到的外部知识进行推理。此外,我们还研究了如何诱导LLM给出其推理所依据的思维导图并生成答案。
  • results: 实验结果表明,使用 MindMap 提示可以带来显著的实验性提升。例如,在三个问答数据集上,带 MindMap 提示的 GPT-3.5 一致地优于 GPT-4。此外,我们还发现,利用从KG中检索到的结构化事实可以优于文档检索方法,因为KG能提供更准确、简洁和全面的知识。
    Abstract LLMs usually exhibit limitations in their ability to incorporate new knowledge, the generation of hallucinations, and the transparency of their decision-making process. In this paper, we explore how to prompt LLMs with knowledge graphs (KG), working as a remedy to engage LLMs with up-to-date knowledge and elicit the reasoning pathways from LLMs. Specifically, we build a prompting pipeline that endows LLMs with the capability of comprehending KG inputs and inferring with a combined implicit knowledge and the retrieved external knowledge. In addition, we investigate eliciting the mind map on which LLMs perform the reasoning and generate the answers. It is identified that the produced mind map exhibits the reasoning pathways of LLMs grounded on the ontology of knowledge, hence bringing the prospects of probing and gauging LLM inference in production. The experiments on three question & answering datasets also show that MindMap prompting leads to a striking empirical gain. For instance, prompting a GPT-3.5 with MindMap yields an overwhelming performance over GPT-4 consistently. We also demonstrate that with structured facts retrieved from KG, MindMap can outperform a series of prompting-with-document-retrieval methods, benefiting from more accurate, concise, and comprehensive knowledge from KGs.
    摘要 大语言模型(LLM)通常在吸收新知识、产生幻觉以及决策过程透明度方面存在局限。本文探讨如何用知识图谱(KG)来提示 LLM,以使其接触最新知识并引出其推理路径。具体而言,我们构建了一个提示管道,使 LLM 能够理解 KG 输入,并结合内部的隐式知识与检索到的外部知识进行推理。此外,我们还研究了引出 LLM 进行推理并生成答案所依据的思维导图。所生成的思维导图展示了 LLM 基于知识本体的推理路径,从而为在生产环境中探查和衡量 LLM 推理提供了可能。在三个问答数据集上的实验也表明,MindMap 提示带来了显著的实证提升;例如,使用 MindMap 提示的 GPT-3.5 持续大幅超越 GPT-4。我们还证明,借助从 KG 中检索的结构化事实,MindMap 可以优于一系列基于文档检索的提示方法,受益于 KG 中更准确、简洁和全面的知识。
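
A minimal sketch of the kind of KG-to-prompt pipeline described above: retrieved triples are serialized as evidence and the LLM is asked to draw a reasoning mind map before answering. `retrieve_triples`, `build_mindmap_prompt`, and the toy knowledge graph are hypothetical placeholders, not MindMap's actual implementation.

```python
# Minimal sketch of knowledge-graph prompting in the spirit of MindMap.
def retrieve_triples(question, kg, k=5):
    """Toy retriever: keep triples whose entities appear in the question."""
    hits = [t for t in kg if t[0].lower() in question.lower() or t[2].lower() in question.lower()]
    return hits[:k]

def build_mindmap_prompt(question, triples):
    evidence = "\n".join(f"({h}, {r}, {t})" for h, r, t in triples)
    return (
        "Evidence triples from the knowledge graph:\n"
        f"{evidence}\n\n"
        "First draw a mind map of the reasoning pathway over these facts, "
        "then answer the question.\n"
        f"Question: {question}"
    )

kg = [("Aspirin", "treats", "Headache"), ("Aspirin", "interacts_with", "Warfarin")]
question = "Can aspirin be taken with warfarin?"
prompt = build_mindmap_prompt(question, retrieve_triples(question, kg))
print(prompt)  # in a real pipeline this prompt would be sent to an LLM client
```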

Modeling Edge Features with Deep Bayesian Graph Networks

  • paper_url: http://arxiv.org/abs/2308.09087
  • repo_url: https://github.com/diningphil/e-cgmm
  • paper_authors: Daniele Atzeni, Federico Errica, Davide Bacciu, Alessio Micheli
  • for: 这篇论文扩展了 Contextual Graph Markov Model(一种面向图的深度概率机器学习模型),以建模边特征的分布。
  • methods: 我们提出构建一个额外的 Bayesian 网络,将边特征映射到离散状态,供原始模型使用。这种方法使我们即使在缺乏边特征的情况下也能构建更丰富的图表示,并在标准图分类 benchmark 上取得性能提升。
  • results: 我们在边特征至关重要的图回归场景中成功验证了所提方法,并在三个链接预测任务上相对原始模型取得显著的性能提升。此外,所提模型的计算复杂度与边数保持线性关系,适用于大规模图处理。
    Abstract We propose an extension of the Contextual Graph Markov Model, a deep and probabilistic machine learning model for graphs, to model the distribution of edge features. Our approach is architectural, as we introduce an additional Bayesian network mapping edge features into discrete states to be used by the original model. In doing so, we are also able to build richer graph representations even in the absence of edge features, which is confirmed by the performance improvements on standard graph classification benchmarks. Moreover, we successfully test our proposal in a graph regression scenario where edge features are of fundamental importance, and we show that the learned edge representation provides substantial performance improvements against the original model on three link prediction tasks. By keeping the computational complexity linear in the number of edges, the proposed model is amenable to large-scale graph processing.
    摘要 我们提出了 Contextual Graph Markov Model(一种面向图的深度概率机器学习模型)的扩展,用于建模边特征的分布。我们的方法是架构层面的:引入一个额外的 Bayesian 网络,将边特征映射到离散状态,供原始模型使用。这样一来,即使在缺乏边特征的情况下,我们也能构建更丰富的图表示,这一点在标准图分类基准上的性能提升中得到了证实。此外,我们在边特征至关重要的图回归场景中成功验证了所提方法,并证明所学到的边表示在三个链接预测任务上相对原始模型带来显著的性能提升。由于计算复杂度与边数保持线性关系,所提模型适用于大规模图处理。
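
To illustrate the architectural idea of mapping edge features to discrete states for a CGMM-style model, the sketch below uses a Gaussian mixture as a stand-in for the additional Bayesian network; the feature dimensionality and number of states are arbitrary assumptions.

```python
# Minimal sketch: discretize continuous edge features into a few edge states.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
edge_features = rng.normal(size=(200, 3))        # one feature vector per edge

mixture = GaussianMixture(n_components=4, random_state=0).fit(edge_features)
edge_states = mixture.predict(edge_features)     # discrete state index per edge

# A downstream CGMM-style graph model would condition on edge_states
# instead of the raw continuous features.
print(edge_states[:10])
```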

Embracing assay heterogeneity with neural processes for markedly improved bioactivity predictions

  • paper_url: http://arxiv.org/abs/2308.09086
  • repo_url: None
  • paper_authors: Lucian Chan, Marcel Verdonk, Carl Poelking
  • for: 预测配体的生物活性是计算机辅助药物发现中最困难也最重要的挑战之一。尽管全球研究机构多年来持续收集和整理数据,生物活性数据仍然稀缺且高度异质,这使得构建准确、可迁移且稳健的预测模型十分困难。
  • methods: 作者提出了一种层次元学习(hierarchical meta-learning)框架,通过显式考虑不同试验(assay)之间的异质性,利用它们之间的信息协同。
  • results: 该模型在多种蛋白靶点和试验类型上的亲和力预测显著优于常规基线,并且只需极少的观测即可快速适应新的靶点场景,从而支持早期药物发现阶段的大规模虚拟筛选。
    Abstract Predicting the bioactivity of a ligand is one of the hardest and most important challenges in computer-aided drug discovery. Despite years of data collection and curation efforts by research organizations worldwide, bioactivity data remains sparse and heterogeneous, thus hampering efforts to build predictive models that are accurate, transferable and robust. The intrinsic variability of the experimental data is further compounded by data aggregation practices that neglect heterogeneity to overcome sparsity. Here we discuss the limitations of these practices and present a hierarchical meta-learning framework that exploits the information synergy across disparate assays by successfully accounting for assay heterogeneity. We show that the model achieves a drastic improvement in affinity prediction across diverse protein targets and assay types compared to conventional baselines. It can quickly adapt to new target contexts using very few observations, thus enabling large-scale virtual screening in early-phase drug discovery.
    摘要 预测配体的生物活性是计算机辅助药物发现中最困难也最重要的挑战之一。尽管世界各地的研究机构多年来持续收集和整理数据,生物活性数据仍然稀缺且高度异质,这阻碍了构建准确、可迁移且稳健的预测模型。实验数据本身的变异性,又因为那些为克服稀缺而忽略异质性的数据聚合做法而进一步加剧。本文讨论了这些做法的局限性,并提出一种层次元学习框架,通过显式考虑试验之间的异质性,利用不同试验之间的信息协同。我们表明,与常规基线相比,该模型在多种蛋白靶点和试验类型上的亲和力预测有大幅提升;它只需极少的观测即可快速适应新的靶点场景,从而支持早期药物发现阶段的大规模虚拟筛选。
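
The sketch below shows a conditional-neural-process-style predictor that conditions on a handful of context measurements from a new assay, which is one simple way to realize the few-observation adaptation described above; the architecture sizes and the mean aggregation are illustrative assumptions rather than the paper's model.

```python
# Minimal sketch of a conditional-neural-process-style bioactivity predictor.
import torch
import torch.nn as nn

class AssayCNP(nn.Module):
    def __init__(self, d_x=32, d_r=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_x + 1, d_r), nn.ReLU(), nn.Linear(d_r, d_r))
        self.decoder = nn.Sequential(nn.Linear(d_x + d_r, d_r), nn.ReLU(), nn.Linear(d_r, 1))

    def forward(self, x_ctx, y_ctx, x_query):
        # Aggregate the context set into one assay-level representation.
        r = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1)).mean(dim=0, keepdim=True)
        r = r.expand(x_query.size(0), -1)
        return self.decoder(torch.cat([x_query, r], dim=-1))

model = AssayCNP()
x_ctx, y_ctx = torch.randn(5, 32), torch.randn(5, 1)   # 5 measurements from a new assay
pred = model(x_ctx, y_ctx, torch.randn(100, 32))        # predict 100 new compounds
print(pred.shape)  # torch.Size([100, 1])
```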

MovePose: A High-performance Human Pose Estimation Algorithm on Mobile and Edge Devices

  • paper_url: http://arxiv.org/abs/2308.09084
  • repo_url: None
  • paper_authors: Dongyang Yu, Haoyue Zhang, Zhirui Zhou, Wangpeng An, Yanhong Yang
  • for: 提供一种在基于 CPU 的移动设备上兼顾高精度与实时性的人体姿态估计模型,面向移动端人体姿态估计等应用。
  • methods: 该模型是经过优化的轻量级卷积神经网络,并引入反卷积、大核卷积和坐标分类三种技术来提升精度,同时保持实时性能。
  • results: 这个模型在 COCO 验证数据集上获得了 67.7 的 Mean Average Precision 分数,并在 Intel i9-10920x CPU 和 NVIDIA RTX3090 GPU 上显示出了高效性和实时性。
    Abstract We present MovePose, an optimized lightweight convolutional neural network designed specifically for real-time body pose estimation on CPU-based mobile devices. The current solutions do not provide satisfactory accuracy and speed for human posture estimation, and MovePose addresses this gap. It aims to maintain real-time performance while improving the accuracy of human posture estimation for mobile devices. The network produces 17 keypoints for each individual at a rate exceeding 11 frames per second, making it suitable for real-time applications such as fitness tracking, sign language interpretation, and advanced mobile human posture estimation. Our MovePose algorithm has attained an Mean Average Precision (mAP) score of 67.7 on the COCO validation dataset. The MovePose algorithm displayed efficiency with a performance of 69+ frames per second (fps) when run on an Intel i9-10920x CPU. Additionally, it showcased an increased performance of 452+ fps on an NVIDIA RTX3090 GPU. On an Android phone equipped with a Snapdragon 8 + 4G processor, the fps reached above 11. To enhance accuracy, we incorporated three techniques: deconvolution, large kernel convolution, and coordinate classification methods. Compared to basic upsampling, deconvolution is trainable, improves model capacity, and enhances the receptive field. Large kernel convolution strengthens these properties at a decreased computational cost. In summary, MovePose provides high accuracy and real-time performance, marking it a potential tool for a variety of applications, including those focused on mobile-side human posture estimation. The code and models for this algorithm will be made publicly accessible.
    摘要 我们提出 MovePose,一种经过优化的轻量级卷积神经网络,专为在基于 CPU 的移动设备上进行实时人体姿态估计而设计。现有方案无法同时提供令人满意的精度和速度,MovePose 弥补了这一空白:在保持实时性能的同时提升移动端人体姿态估计的精度。该网络以超过每秒 11 帧的速率为每个人输出 17 个关键点,适用于健身跟踪、手语翻译和高级移动端人体姿态估计等实时应用。MovePose 在 COCO 验证集上取得 67.7 的 Mean Average Precision(mAP);在 Intel i9-10920x CPU 上达到 69+ fps,在 NVIDIA RTX3090 GPU 上达到 452+ fps,在配备 Snapdragon 8 + 4G 处理器的 Android 手机上超过 11 fps。为提升精度,我们引入三种技术:反卷积、大核卷积和坐标分类。与基础上采样相比,反卷积可训练、能提升模型容量并扩大感受野;大核卷积以更低的计算代价强化这些特性。总之,MovePose 兼具高精度与实时性能,有望用于包括移动端人体姿态估计在内的多种应用。相关代码与模型将公开发布。
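
As a toy illustration of the three accuracy techniques named above (deconvolution, large-kernel convolution, and coordinate classification), the sketch below wires them into a small pose head; the channel sizes, input resolution, and bin split ratio are assumptions, and this is not the actual MovePose architecture.

```python
# Minimal sketch of a pose head using deconvolution, a large-kernel conv,
# and 1-D coordinate classification over x/y bins per keypoint.
import torch
import torch.nn as nn

class PoseHead(nn.Module):
    def __init__(self, in_ch=64, n_kpts=17, width=48, height=64):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, 32, kernel_size=4, stride=2, padding=1)  # trainable upsampling
        self.large_kernel = nn.Conv2d(32, n_kpts, kernel_size=7, padding=3)              # wide receptive field
        # Coordinate classification: per-keypoint logits over discretized x and y bins.
        self.to_x = nn.Linear(width * height, width * 2)
        self.to_y = nn.Linear(width * height, height * 2)

    def forward(self, feat):
        m = self.large_kernel(self.deconv(feat)).flatten(2)   # (B, n_kpts, H*W)
        return self.to_x(m), self.to_y(m)                     # logits over x / y bins

head = PoseHead()
x_logits, y_logits = head(torch.randn(1, 64, 32, 24))         # backbone feature map of size 32x24
print(x_logits.shape, y_logits.shape)
```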

Over-the-Air Computation Aided Federated Learning with the Aggregation of Normalized Gradient

  • paper_url: http://arxiv.org/abs/2308.09082
  • repo_url: None
  • paper_authors: Rongfei Fan, Xuming An, Shiyuan Zuo, Han Hu
  • for: 这篇论文提出一种基于空中计算(over-the-air computation)的联邦学习(Federated Learning,FL)梯度聚合方法,在保证通信效率的同时改善收敛性能。
  • methods: 系统采用迭代流程:每个移动设备更新本地梯度,放大后发送给服务器;服务器一次性接收聚合后的梯度,生成并广播新的模型参数。在放大因子的选择上,大多数相关工作假设本地梯度的最大范数始终被达到,而实际上它会随迭代波动,从而可能损害收敛性能。为解决这一问题,我们提出在放大之前先将本地梯度归一化。
  • results: 在损失函数光滑的情况下,所提方法以次线性速率收敛到驻点;在光滑且强凸的情况下,对任意小的正容差,可以以线性速率达到最小训练损失,并揭示了收敛速率与容差之间的权衡。此外,为加速收敛,我们针对上述两种情形构建了系统参数优化问题,并推导出多项式复杂度的最优解。实验结果表明,所提方法在收敛性能上优于基准方法。
    Abstract Over-the-air computation is a communication-efficient solution for federated learning (FL). In such a system, an iterative procedure is performed: the local gradient of the private loss function is updated, amplified and then transmitted by every mobile device; the server receives the aggregated gradient all-at-once, generates and then broadcasts updated model parameters to every mobile device. When selecting the amplification factor, most related works assume the maximal norm of the local gradient is always attained, although it actually fluctuates over iterations, which may degrade convergence performance. To circumvent this problem, we propose to normalize the local gradient before amplifying it. Under our proposed method, when the loss function is smooth, we prove that the method converges to a stationary point at a sub-linear rate. For a smooth and strongly convex loss function, we prove that it achieves the minimal training loss at a linear rate up to any small positive tolerance. Moreover, a tradeoff between the convergence rate and the tolerance is discovered. To speed up convergence, problems optimizing the system parameters are also formulated for the above two cases. Although the formulated problems are non-convex, optimal solutions with polynomial complexity are derived. Experimental results show our proposed method can outperform benchmark methods on convergence performance.
    摘要 空中计算(over-the-air computation)是联邦学习(FL)中一种通信高效的方案。在这类系统中,执行如下迭代流程:每个移动设备更新本地私有损失函数的梯度,放大后发送;服务器一次性接收聚合后的梯度,生成并广播更新后的模型参数给每个移动设备。在放大因子的选择上,大多数相关工作假设本地梯度的最大范数始终被达到,而实际上它会随迭代波动,这可能损害收敛性能。为解决这一问题,我们提出在放大之前先将本地梯度归一化。在损失函数光滑的情形下,我们证明所提方法能以次线性速率收敛到驻点;在光滑且强凸的情形下,对任意小的正容差,所提方法能以线性速率达到最小训练损失。此外,我们还发现了收敛速率与容差之间的权衡。为加速收敛,我们针对上述两种情形构建了系统参数优化问题;尽管这些问题非凸,我们仍推导出了多项式复杂度的最优解。实验结果表明,所提方法在收敛性能上优于基准方法。
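
The sketch below illustrates the core aggregation step under the proposed normalization: each device transmits a normalized, amplified gradient, the signals superpose over the air, and the server rescales the sum; channel effects are idealized, and the amplification factor, noise level, and step size are illustrative assumptions.

```python
# Minimal sketch of over-the-air aggregation of normalized local gradients.
import numpy as np

rng = np.random.default_rng(0)
d, n_devices = 1000, 8
local_grads = [rng.normal(size=d) * rng.uniform(0.1, 10) for _ in range(n_devices)]

# Devices transmit normalized gradients scaled by a common amplification factor.
amplification = 5.0
tx_signals = [amplification * g / np.linalg.norm(g) for g in local_grads]

# Over-the-air superposition at the base station plus receiver noise.
received = np.sum(tx_signals, axis=0) + rng.normal(scale=0.01, size=d)

# Server recovers the average normalized gradient and takes a model step.
aggregated_direction = received / (amplification * n_devices)
model = np.zeros(d)
model -= 0.1 * aggregated_direction
print(np.linalg.norm(aggregated_direction))
```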

Conditional Sampling of Variational Autoencoders via Iterated Approximate Ancestral Sampling

  • paper_url: http://arxiv.org/abs/2308.09078
  • repo_url: None
  • paper_authors: Vaidotas Simkus, Michael U. Gutmann
  • for: 解决在变分自编码器(VAE)学习到结构化潜在空间时,Metropolis-within-Gibbs(MWG)条件采样所面临的局限性。
  • methods: 系统地梳理 MWG 在 VAE 场景下的失效模式,并提出两种原创方法来缓解这些问题。
  • results: 在一系列采样任务上的实验表明,所提方法改善了采样性能。
    Abstract Conditional sampling of variational autoencoders (VAEs) is needed in various applications, such as missing data imputation, but is computationally intractable. A principled choice for asymptotically exact conditional sampling is Metropolis-within-Gibbs (MWG). However, we observe that the tendency of VAEs to learn a structured latent space, a commonly desired property, can cause the MWG sampler to get "stuck" far from the target distribution. This paper mitigates the limitations of MWG: we systematically outline the pitfalls in the context of VAEs, propose two original methods that address these pitfalls, and demonstrate an improved performance of the proposed methods on a set of sampling tasks.
    摘要 变分自编码器(VAE)的条件采样在缺失数据填补等多种应用中都是必需的,但其精确计算难以实现。Metropolis-within-Gibbs(MWG)是一种有理论依据、渐近精确的条件采样方法。然而我们观察到,VAE 倾向于学习结构化的潜在空间(这通常是人们希望的性质),这会使 MWG 采样器"卡"在远离目标分布的位置。本文缓解了 MWG 的这些局限:我们系统地梳理了其在 VAE 场景下的失效模式,提出两种原创方法加以解决,并在一组采样任务上展示了所提方法的性能提升。
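
For intuition, the sketch below runs an iterated approximate ancestral (pseudo-Gibbs-style) conditional sampler for missing-data imputation with a VAE; the tiny untrained encoder/decoder are placeholders, and the Metropolis-Hastings acceptance step that distinguishes MWG is omitted for brevity.

```python
# Minimal sketch of iterated conditional sampling for VAE imputation.
import torch
import torch.nn as nn

d_x, d_z = 10, 2
encoder = nn.Linear(d_x, 2 * d_z)   # outputs mean and log-variance of q(z|x)
decoder = nn.Linear(d_z, d_x)       # outputs mean of p(x|z)

x = torch.randn(d_x)
observed = torch.zeros(d_x, dtype=torch.bool)
observed[:6] = True                  # first 6 entries observed, rest missing
x_filled = torch.where(observed, x, torch.zeros_like(x))

for _ in range(50):                  # iterate: z ~ q(z|x), x_mis ~ p(x_mis|z)
    stats = encoder(x_filled)
    mu, logvar = stats[:d_z], stats[d_z:]
    z = mu + torch.randn(d_z) * (0.5 * logvar).exp()
    x_gen = decoder(z)
    x_filled = torch.where(observed, x, x_gen)   # keep observed values fixed

print(x_filled)
```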

Fast Decision Support for Air Traffic Management at Urban Air Mobility Vertiports using Graph Learning

  • paper_url: http://arxiv.org/abs/2308.09075
  • repo_url: None
  • paper_authors: Prajit KrisshnaKumar, Jhoel Witter, Steve Paul, Hanvit Cho, Karthik Dantu, Souma Chowdhury
  • for: 为缓解城市及近郊的拥堵、提供安全且快速的出行方式,城市空中交通(UAM)是新一代解决方案;本文关注其中的垂直起降机场(vertiport)调度管理问题。
  • methods: 使用图强化学习(graph reinforcement learning)生成决策支持策略:将垂直起降机场空域内的指定位置与被管理的飞行器分别表示为两个图,通过图卷积网络(GCN)提取特征,再经感知机层决定悬停/巡航、继续等待/起飞或降落到分配位置等动作。
  • results: 在 AirSim 中对按比例缩小的多旋翼飞行器进行真实感仿真,结果表明图强化学习适合求解 UAM-VSM 问题,并且优于基础强化学习(使用图嵌入)和随机选择基线;性能以延迟、安全性(碰撞次数)和电池消耗来衡量。
    Abstract Urban Air Mobility (UAM) promises a new dimension to decongested, safe, and fast travel in urban and suburban hubs. These UAM aircraft are conceived to operate from small airports called vertiports each comprising multiple take-off/landing and battery-recharging spots. Since they might be situated in dense urban areas and need to handle many aircraft landings and take-offs each hour, managing this schedule in real-time becomes challenging for a traditional air-traffic controller but instead calls for an automated solution. This paper provides a novel approach to this problem of Urban Air Mobility - Vertiport Schedule Management (UAM-VSM), which leverages graph reinforcement learning to generate decision-support policies. Here the designated physical spots within the vertiport's airspace and the vehicles being managed are represented as two separate graphs, with feature extraction performed through a graph convolutional network (GCN). Extracted features are passed onto perceptron layers to decide actions such as continue to hover or cruise, continue idling or take-off, or land on an allocated vertiport spot. Performance is measured based on delays, safety (no. of collisions) and battery consumption. Through realistic simulations in AirSim applied to scaled down multi-rotor vehicles, our results demonstrate the suitability of using graph reinforcement learning to solve the UAM-VSM problem and its superiority to basic reinforcement learning (with graph embeddings) or random choice baselines.
    摘要 城市空中交通(UAM)有望为城市和近郊枢纽带来畅通、安全且快速的出行新维度。这类 UAM 飞行器被设想在称为垂直起降机场(vertiport)的小型机场运行,每个机场包含多个起降位和电池充电位。由于它们可能位于人口稠密的城区,且每小时需要处理大量起降,传统空管员难以实时管理这一调度,因而需要自动化方案。本文针对城市空中交通垂直起降机场调度管理(UAM-VSM)问题提出了一种新方法,利用图强化学习生成决策支持策略:将机场空域内的指定位置与被管理的飞行器表示为两个独立的图,通过图卷积网络(GCN)提取特征,再经感知机层决定继续悬停或巡航、继续等待或起飞、或降落到分配的位置等动作。性能以延迟、安全性(碰撞次数)和电池消耗来衡量。基于 AirSim 对按比例缩小的多旋翼飞行器的真实感仿真表明,图强化学习适合求解 UAM-VSM 问题,并优于基础强化学习(使用图嵌入)和随机选择基线。
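
The sketch below shows the general shape of such a policy: a graph convolution over the vehicle graph followed by perceptron layers that score discrete actions per vehicle; the feature sizes, adjacency normalization, and action set are illustrative assumptions rather than the paper's exact networks.

```python
# Minimal sketch of a GCN feature extractor plus a perceptron action head.
import torch
import torch.nn as nn

class GraphPolicy(nn.Module):
    def __init__(self, d_in=8, d_hidden=32, n_actions=3):
        super().__init__()
        self.gcn = nn.Linear(d_in, d_hidden)
        self.head = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(), nn.Linear(d_hidden, n_actions))

    def forward(self, x, adj):
        # Symmetric normalization of the adjacency (with self-loops), then aggregate.
        a = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a = deg_inv_sqrt[:, None] * a * deg_inv_sqrt[None, :]
        h = torch.relu(self.gcn(a @ x))
        return self.head(h)            # one action-logit vector per vehicle node

n_vehicles = 6
adj = (torch.rand(n_vehicles, n_vehicles) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()                         # symmetric toy adjacency
logits = GraphPolicy()(torch.randn(n_vehicles, 8), adj)
print(logits.shape)  # torch.Size([6, 3])
```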

Joint Power Control and Data Size Selection for Over-the-Air Computation Aided Federated Learning

  • paper_url: http://arxiv.org/abs/2308.09072
  • repo_url: https://github.com/anxuming/fedaircomp
  • paper_authors: Xuming An, Rongfei Fan, Shiyuan Zuo, Han Hu, Hai Jiang, Ning Zhang
  • for: 该研究旨在改进联邦学习(Federated Learning,FL)中基站(BS)对多个移动设备(Mobile Device,MD)训练模型参数的聚合。
  • methods: 采用基于空中计算(over-the-air computation)的频谱高效方案,让各 MD 将参数映射信号同时发送给 BS,并联合优化 BS 与各 MD 的信号放大因子以及每个 MD 的数据量(参与本地训练的样本数)。
  • results: 所提方法能显著降低均方误差(MSE),并相比基准方法提升 FL 的训练性能。
    Abstract Federated learning (FL) has emerged as an appealing machine learning approach to deal with massive raw data generated at multiple mobile devices, which requires the training model parameters of every mobile device to be aggregated at one base station (BS) iteratively. For parameter aggregation in FL, over-the-air computation is a spectrum-efficient solution, which allows all mobile devices to transmit their parameter-mapped signals concurrently to a BS. Due to heterogeneous channel fading and noise, there exists a difference between the BS's received signal and its desired signal, measured as the mean-squared error (MSE). To minimize the MSE, we propose to jointly optimize the signal amplification factors at the BS and the mobile devices as well as the data size (the number of data samples involved in local training) at every mobile device. The formulated problem is challenging to solve due to its non-convexity. To find the optimal solution, we simplify the cost function and replace variables in a way that preserves equivalence, and transform the resulting problem into an equivalent bi-level problem. For the lower-level problem, the optimal solution is found by enumerating every candidate solution from the Karush-Kuhn-Tucker (KKT) conditions. For the upper-level problem, the optimal solution is found by exploiting its piecewise convexity. Numerical results show that our proposed method can greatly reduce the MSE and can help to improve the training performance of FL compared with benchmark methods.
    摘要 联邦学习(FL)已成为处理多个移动设备产生的海量原始数据的一种有吸引力的机器学习方法,它需要在一个基站(BS)迭代地聚合每个移动设备的训练模型参数。在 FL 的参数聚合中,空中计算是一种频谱高效的方案,允许所有移动设备同时将参数映射信号发送给 BS。由于各设备信道衰落与噪声不同,BS 接收到的信号与期望信号之间存在差异,用均方误差(MSE)来度量。为最小化 MSE,我们提出联合优化 BS 与各移动设备的信号放大因子,以及每个移动设备的数据量(参与本地训练的样本数)。该优化问题因非凸而难以求解。为求得最优解,我们在保持等价性的前提下简化代价函数并替换变量,将问题等价地转化为一个双层问题:下层问题通过枚举 KKT 条件给出的候选解求得最优解,上层问题则利用其分段凸性求解。数值结果表明,与基准方法相比,所提方法能显著降低 MSE,并有助于提升 FL 的训练性能。
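
As a toy illustration of the quantity being minimized, the sketch below computes the MSE between the signal the BS recovers over the air and the desired data-size-weighted parameter average; the channel model, amplification rule, and de-noising factor are arbitrary assumptions, not the paper's optimized solution.

```python
# Minimal sketch of the MSE in over-the-air parameter aggregation.
import numpy as np

rng = np.random.default_rng(1)
n_dev, d = 5, 100
params = rng.normal(size=(n_dev, d))                # parameter-mapped signals per device
data_sizes = np.array([200, 150, 400, 100, 250.0])  # samples used in local training
h = rng.rayleigh(scale=1.0, size=n_dev)             # channel gains
b = np.minimum(2.0 / h, 4.0)                        # per-device amplification (toy rule)
eta = np.mean(h * b)                                 # receiver de-noising/scaling factor

weights = data_sizes / data_sizes.sum()
desired = (weights[:, None] * params).sum(axis=0)   # data-size-weighted average

noise = rng.normal(scale=0.05, size=d)
received = (h[:, None] * b[:, None] * weights[:, None] * params).sum(axis=0) + noise
estimate = received / eta

mse = np.mean((estimate - desired) ** 2)
print(f"MSE of the over-the-air aggregate: {mse:.4f}")
```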