cs.LG - 2023-11-08

Efficient Compression of Overparameterized Deep Models through Low-Dimensional Learning Dynamics

  • paper_url: http://arxiv.org/abs/2311.05061
  • repo_url: https://github.com/soominkwon/comp-deep-nets
  • paper_authors: Soo Min Kwon, Zekai Zhang, Dogyoon Song, Laura Balzano, Qing Qu
  • for: This work aims to reduce the computational and memory cost of overparameterized deep models by studying the learning dynamics of deep networks.
  • methods: Deep linear models are used to study these learning dynamics, revealing that the weight matrices of various architectures exhibit a low-dimensional structure; based on this finding, a compression method is proposed that reduces the width of the intermediate layers.
  • results: Experiments show that the compression accelerates training by more than 2x without compromising model quality; with a particular choice of initialization, the compressed network converges faster than the original, yielding smaller recovery errors throughout all iterations of gradient descent.
    Abstract Overparameterized models have proven to be powerful tools for solving various machine learning tasks. However, overparameterization often leads to a substantial increase in computational and memory costs, which in turn requires extensive resources to train. In this work, we aim to reduce this complexity by studying the learning dynamics of overparameterized deep networks. By extensively studying its learning dynamics, we unveil that the weight matrices of various architectures exhibit a low-dimensional structure. This finding implies that we can compress the networks by reducing the training to a small subspace. We take a step in developing a principled approach for compressing deep networks by studying deep linear models. We demonstrate that the principal components of deep linear models are fitted incrementally but within a small subspace, and use these insights to compress deep linear networks by decreasing the width of its intermediate layers. Remarkably, we observe that with a particular choice of initialization, the compressed network converges faster than the original network, consistently yielding smaller recovery errors throughout all iterations of gradient descent. We substantiate this observation by developing a theory focused on the deep matrix factorization problem, and by conducting empirical evaluations on deep matrix sensing. Finally, we demonstrate how our compressed model can enhance the utility of deep nonlinear models. Overall, we observe that our compression technique accelerates the training process by more than 2x, without compromising model quality.
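As a rough illustration of the compression idea (a minimal sketch under illustrative assumptions, not the authors' exact procedure or code), the snippet below fits a rank-r target with a three-layer deep linear model whose intermediate width is reduced from the ambient dimension d to a small m, trained by plain gradient descent; sizes, initialization scale, and step size are arbitrary choices.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): a deep *linear* model with a
# compressed intermediate width m << d, fitted to a rank-r matrix target.
rng = np.random.default_rng(0)
d, r, m = 50, 3, 8                                    # ambient dim, target rank, compressed width
Y = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
Y /= np.linalg.norm(Y, 2)                             # normalize spectral norm for stable steps

W1 = 0.5 * rng.standard_normal((m, d)) / np.sqrt(d)   # first layer        (m x d)
W2 = 0.5 * rng.standard_normal((m, m)) / np.sqrt(m)   # narrow middle layer (m x m)
W3 = 0.5 * rng.standard_normal((d, m)) / np.sqrt(m)   # last layer         (d x m)

lr = 0.05
for _ in range(10000):
    E = W3 @ W2 @ W1 - Y                              # end-to-end residual
    G3, G2, G1 = E @ (W2 @ W1).T, W3.T @ E @ W1.T, (W3 @ W2).T @ E
    W3, W2, W1 = W3 - lr * G3, W2 - lr * G2, W1 - lr * G1

print("relative recovery error:", np.linalg.norm(W3 @ W2 @ W1 - Y) / np.linalg.norm(Y))
```

The only point of the sketch is that the end-to-end linear map can be trained with far fewer parameters when the target is effectively low-rank.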

Quantum Generative Modeling of Sequential Data with Trainable Token Embedding

  • paper_url: http://arxiv.org/abs/2311.05050
  • repo_url: None
  • paper_authors: Wanda Hou, Li Miao, Yi-Zhuang You
  • for: This paper explores quantum-inspired probabilistic generative modeling of classical and quantum sequential data.
  • methods: It builds on the Born machine, a quantum-inspired generative model based on the matrix product state (MPS) framework that supports tractable log-likelihood and autoregressive sampling, and generalizes the token embedding into trainable quantum measurement operators that are optimized jointly with the MPS.
  • results: Combined with the trainable embedding, Born machines exhibit better performance and learn deeper correlations from the dataset.
    Abstract Generative models are a class of machine learning models that aim to learn the underlying probability distribution of data. Unlike discriminative models, generative models focus on capturing the data's inherent structure, allowing them to generate new samples that resemble the original data. To fully exploit the potential of modeling probability distributions using quantum physics, a quantum-inspired generative model known as the Born machines have shown great advancements in learning classical and quantum data over matrix product state(MPS) framework. The Born machines support tractable log-likelihood, autoregressive and mask sampling, and have shown outstanding performance in various unsupervised learning tasks. However, much of the current research has been centered on improving the expressive power of MPS, predominantly embedding each token directly by a corresponding tensor index. In this study, we generalize the embedding method into trainable quantum measurement operators that can be simultaneously honed with MPS. Our study indicated that combined with trainable embedding, Born machines can exhibit better performance and learn deeper correlations from the dataset.

On the Consistency of Maximum Likelihood Estimation of Probabilistic Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2311.05046
  • repo_url: None
  • paper_authors: Arghya Datta, Sayak Chakrabarty
  • for: Probabilistic principal component analysis (PPCA) is a widely used statistical tool for reducing data dimension, with applications ranging from science and engineering to quantitative finance; this work addresses the lack of theoretical guarantees for its maximum likelihood (ML) solution.
  • methods: The rotational identifiability issue of the PPCA parameterization is resolved using quotient topological spaces, and consistency of the ML solution is proved in an appropriate quotient Euclidean space.
  • results: The ML solution is shown to be consistent, with results extending to a broader class of estimators; under a compactness assumption, strong consistency of the ML estimate and hence strong covariance estimation are also established.
    Abstract Probabilistic principal component analysis (PPCA) is currently one of the most used statistical tools to reduce the ambient dimension of the data. From multidimensional scaling to the imputation of missing data, PPCA has a broad spectrum of applications ranging from science and engineering to quantitative finance. Despite this wide applicability in various fields, hardly any theoretical guarantees exist to justify the soundness of the maximal likelihood (ML) solution for this model. In fact, it is well known that the maximum likelihood estimation (MLE) can only recover the true model parameters up to a rotation. The main obstruction is posed by the inherent identifiability nature of the PPCA model resulting from the rotational symmetry of the parameterization. To resolve this ambiguity, we propose a novel approach using quotient topological spaces and in particular, we show that the maximum likelihood solution is consistent in an appropriate quotient Euclidean space. Furthermore, our consistency results encompass a more general class of estimators beyond the MLE. Strong consistency of the ML estimate and consequently strong covariance estimation of the PPCA model have also been established under a compactness assumption.
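For reference, the rotational ambiguity that motivates the quotient-space construction can be read off the standard PPCA model (textbook notation, not necessarily the paper's):

$$ x = W z + \mu + \varepsilon, \qquad z \sim \mathcal{N}(0, I_k), \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d) \;\;\Longrightarrow\;\; x \sim \mathcal{N}\!\left(\mu,\; W W^\top + \sigma^2 I_d\right). $$

Since $(WR)(WR)^\top = WW^\top$ for any orthogonal $R$, the likelihood depends on $W$ only through $WW^\top$, so the MLE can recover $W$ only up to a rotation; this is exactly the symmetry that the quotient construction factors out.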

DEMASQ: Unmasking the ChatGPT Wordsmith

  • paper_url: http://arxiv.org/abs/2311.05019
  • repo_url: None
  • paper_authors: Kavita Kumari, Alessandro Pegoraro, Hossein Fereidooni, Ahmad-Reza Sadeghi
  • for: This paper aims to detect content generated by ChatGPT, a popular language model, in order to address the concerns of false information, plagiarism, academic dishonesty, and fraudulent activities that may arise from its use.
  • methods: The proposed method, called DEMASQ, is an energy-based detection model that incorporates novel aspects such as optimization inspired by the Doppler effect and the use of explainable AI techniques to generate diverse perturbations.
  • results: The paper demonstrates that DEMASQ achieves high accuracy in identifying content generated by ChatGPT, outperforming previous detection methods.
    Abstract The potential misuse of ChatGPT and other Large Language Models (LLMs) has raised concerns regarding the dissemination of false information, plagiarism, academic dishonesty, and fraudulent activities. Consequently, distinguishing between AI-generated and human-generated content has emerged as an intriguing research topic. However, current text detection methods lack precision and are often restricted to specific tasks or domains, making them inadequate for identifying content generated by ChatGPT. In this paper, we propose an effective ChatGPT detector named DEMASQ, which accurately identifies ChatGPT-generated content. Our method addresses two critical factors: (i) the distinct biases in text composition observed in human- and machine-generated content and (ii) the alterations made by humans to evade previous detection methods. DEMASQ is an energy-based detection model that incorporates novel aspects, such as (i) optimization inspired by the Doppler effect to capture the interdependence between input text embeddings and output labels, and (ii) the use of explainable AI techniques to generate diverse perturbations. To evaluate our detector, we create a benchmark dataset comprising a mixture of prompts from both ChatGPT and humans, encompassing domains such as medical, open Q&A, finance, wiki, and Reddit. Our evaluation demonstrates that DEMASQ achieves high accuracy in identifying content generated by ChatGPT.

GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition

  • paper_url: http://arxiv.org/abs/2311.04996
  • repo_url: https://github.com/nvidia-riva/riva-asrlib-decoder
  • paper_authors: Daniel Galvez, Tim Kaldewey
  • for: Improve the performance of automatic speech recognition (ASR) pipelines with a GPU-accelerated Weighted Finite State Transducer (WFST) beam search decoder.
  • methods: The GPU-accelerated WFST beam search decoder is compatible with current CTC models, supports streaming inference and utterance-specific word boosting via on-the-fly composition, and ships with pre-built DLPack-based Python bindings.
  • results: In both offline and online scenarios it is the fastest beam search decoder for CTC (Connectionist Temporal Classification) models, achieving up to 7x higher throughput offline and nearly 8x lower latency in streaming, with the same or better word error rate.
    Abstract While Connectionist Temporal Classification (CTC) models deliver state-of-the-art accuracy in automated speech recognition (ASR) pipelines, their performance has been limited by CPU-based beam search decoding. We introduce a GPU-accelerated Weighted Finite State Transducer (WFST) beam search decoder compatible with current CTC models. It increases pipeline throughput and decreases latency, supports streaming inference, and also supports advanced features like utterance-specific word boosting via on-the-fly composition. We provide pre-built DLPack-based python bindings for ease of use with Python-based machine learning frameworks at https://github.com/nvidia-riva/riva-asrlib-decoder. We evaluated our decoder for offline and online scenarios, demonstrating that it is the fastest beam search decoder for CTC models. In the offline scenario it achieves up to 7 times more throughput than the current state-of-the-art CPU decoder and in the online streaming scenario, it achieves nearly 8 times lower latency, with same or better word error rate.

Optimized measurements of chaotic dynamical systems via the information bottleneck

  • paper_url: http://arxiv.org/abs/2311.04896
  • repo_url: None
  • paper_authors: Kieran A. Murphy, Dani S. Bassett
  • for: This work seeks measurements that efficiently extract the information created by a chaotic system's evolution from trajectory data, in order to better understand the system's dynamics.
  • methods: An equivalence between a "perfect measurement" and a variant of the information bottleneck is established, allowing machine learning to optimize the measurement process.
  • results: Approximately optimal measurements are obtained for multiple chaotic maps, laying the groundwork for efficient information extraction from general time series.
    Abstract Deterministic chaos permits a precise notion of a "perfect measurement" as one that, when obtained repeatedly, captures all of the information created by the system's evolution with minimal redundancy. Finding an optimal measurement is challenging, and has generally required intimate knowledge of the dynamics in the few cases where it has been done. We establish an equivalence between a perfect measurement and a variant of the information bottleneck. As a consequence, we can employ machine learning to optimize measurement processes that efficiently extract information from trajectory data. We obtain approximately optimal measurements for multiple chaotic maps and lay the necessary groundwork for efficient information extraction from general time series.
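For context, the standard information bottleneck objective (the paper works with a variant of it) compresses a measurement $Z$ of the state $X$ while preserving information about a relevance variable $Y$:

$$ \min_{p(z \mid x)} \; I(X; Z) \;-\; \beta\, I(Z; Y), $$

where $\beta > 0$ trades off compression against retained predictive information.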

Computing with Residue Numbers in High-Dimensional Representation

  • paper_url: http://arxiv.org/abs/2311.04872
  • repo_url: https://github.com/cjkymn/residuehdcomputing
  • paper_authors: Christopher J. Kymn, Denis Kleyko, E. Paxon Frady, Connor Bybee, Pentti Kanerva, Friedrich T. Sommer, Bruno A. Olshausen
  • for: This paper introduces Residue Hyperdimensional Computing, a framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors.
  • methods: Residue numbers are represented as high-dimensional vectors so that algebraic operations become component-wise, parallelizable operations on the vector elements, combined with an efficient method for factorizing high-dimensional vectors.
  • results: The framework represents and operates on values over a large dynamic range using vastly fewer resources than previous methods, exhibits impressive robustness to noise, and solves computationally difficult problems in visual perception and combinatorial optimization, improving over baseline methods.
    Abstract We introduce Residue Hyperdimensional Computing, a computing framework that unifies residue number systems with an algebra defined over random, high-dimensional vectors. We show how residue numbers can be represented as high-dimensional vectors in a manner that allows algebraic operations to be performed with component-wise, parallelizable operations on the vector elements. The resulting framework, when combined with an efficient method for factorizing high-dimensional vectors, can represent and operate on numerical values over a large dynamic range using vastly fewer resources than previous methods, and it exhibits impressive robustness to noise. We demonstrate the potential for this framework to solve computationally difficult problems in visual perception and combinatorial optimization, showing improvement over baseline methods. More broadly, the framework provides a possible account for the computational operations of grid cells in the brain, and it suggests new machine learning architectures for representing and manipulating numerical data.
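As background, here is a minimal sketch of the classical residue number system the framework builds on (plain Python, not the paper's high-dimensional vector encoding): arithmetic acts component-wise on the residues, which is the carry-free property the high-dimensional representation preserves.

```python
from math import prod

# Classical residue number system (RNS): an integer is represented by its
# residues modulo pairwise-coprime moduli; addition and multiplication are
# component-wise (carry-free), and decoding uses the Chinese remainder theorem.
MODULI = (3, 5, 7, 11)                     # dynamic range = 3 * 5 * 7 * 11 = 1155

def encode(x):
    return tuple(x % m for m in MODULI)

def add(a, b):
    return tuple((ai + bi) % m for ai, bi, m in zip(a, b, MODULI))

def mul(a, b):
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, MODULI))

def decode(res):
    M = prod(MODULI)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(res, MODULI)) % M

a, b = encode(123), encode(45)
assert decode(add(a, b)) == (123 + 45) % prod(MODULI)
assert decode(mul(a, b)) == (123 * 45) % prod(MODULI)
```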

Algorithms for Non-Negative Matrix Factorization on Noisy Data With Negative Values

  • paper_url: http://arxiv.org/abs/2311.04855
  • repo_url: None
  • paper_authors: Dylan Green, Stephen Bailey
  • for: This work studies how non-negative matrix factorization (NMF) can handle astronomical data containing negative values due to noise, especially at low signal-to-noise ratio.
  • methods: Two algorithms, Shift-NMF and Nearly-NMF, are proposed that correctly handle noisy input data containing negative values without clipping them.
  • results: Both algorithms have provably monotonically decreasing update rules and correctly recover non-negative signals without the positive offset introduced by clipping, as demonstrated on simple and more realistic examples.
    Abstract Non-negative matrix factorization (NMF) is a dimensionality reduction technique that has shown promise for analyzing noisy data, especially astronomical data. For these datasets, the observed data may contain negative values due to noise even when the true underlying physical signal is strictly positive. Prior NMF work has not treated negative data in a statistically consistent manner, which becomes problematic for low signal-to-noise data with many negative values. In this paper we present two algorithms, Shift-NMF and Nearly-NMF, that can handle both the noisiness of the input data and also any introduced negativity. Both of these algorithms use the negative data space without clipping, and correctly recover non-negative signals without any introduced positive offset that occurs when clipping negative data. We demonstrate this numerically on both simple and more realistic examples, and prove that both algorithms have monotonically decreasing update rules.
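For context, the classical multiplicative-update NMF that Shift-NMF and Nearly-NMF generalize assumes strictly non-negative input; a minimal sketch of that baseline (Lee-Seung updates for the Frobenius objective, not the paper's algorithms) is:

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Classical multiplicative-update NMF: non-negative V ~= W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W with H fixed
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((100, 40)))  # strictly non-negative toy data
W, H = nmf(V, rank=5)
print("relative reconstruction error:", np.linalg.norm(V - W @ H) / np.linalg.norm(V))
```

If V contained negative entries, these updates could drive the factors negative or break down, which is exactly the situation the paper's algorithms are designed to handle without clipping.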

Incorporating temporal dynamics of mutations to enhance the prediction capability of antiretroviral therapy’s outcome for HIV-1

  • paper_url: http://arxiv.org/abs/2311.04846
  • repo_url: None
  • paper_authors: Giulia Di Teodoro, Martin Pirkl, Francesca Incardona, Ilaria Vicenti, Anders Sönnerborg, Rolf Kaiser, Laura Palagi, Maurizio Zazzi, Thomas Lengauer
  • for: Predicting the outcome of antiretroviral therapy for HIV-1.
  • methods: Viral mutations detected in all genotypic tests before therapy are weighted using historical information, taking into account their temporal occurrence, concomitant viral load measurements, and the Stanford mutation-drug resistance tables.
  • results: Incorporating historical information improves predictive accuracy: the history-aware H-model achieves a higher ROC-AUC than the NH-model (76.34% vs. 74.98%), and significant Wilcoxon tests confirm the improvement is consistent, although accuracy remains relatively high even without historical information.
    Abstract Motivation: In predicting HIV therapy outcomes, a critical clinical question is whether using historical information can enhance predictive capabilities compared with current or latest available data analysis. This study analyses whether historical knowledge, which includes viral mutations detected in all genotypic tests before therapy, their temporal occurrence, and concomitant viral load measurements, can bring improvements. We introduce a method to weigh mutations, considering the previously enumerated factors and the reference mutation-drug Stanford resistance tables. We compare a model encompassing history (H) with one not using it (NH). Results: The H-model demonstrates superior discriminative ability, with a higher ROC-AUC score (76.34%) than the NH-model (74.98%). Significant Wilcoxon test results confirm that incorporating historical information improves consistently predictive accuracy for treatment outcomes. The better performance of the H-model might be attributed to its consideration of latent HIV reservoirs, probably obtained when leveraging historical information. The findings emphasize the importance of temporal dynamics in mutations, offering insights into HIV infection complexities. However, our result also shows that prediction accuracy remains relatively high even when no historical information is available. Supplementary information: Supplementary material is available.

Bridging Dimensions: Confident Reachability for High-Dimensional Controllers

  • paper_url: http://arxiv.org/abs/2311.04843
  • repo_url: None
  • paper_authors: Yuang Geng, Souradeep Dutta, Ivan Ruchkin
  • for: This paper aims to improve the verification of high-dimensional controllers in autonomous systems, specifically those using deep neural networks.
  • methods: The paper proposes a new approach that approximates the behavior of a high-dimensional controller with several low-dimensional controllers in different regions of the state space, and uses verification-aware knowledge distillation to balance approximation and verifiability.
  • results: The paper shows convincing performance in two OpenAI gym benchmarks using two inflation techniques, one based on trajectories and the other based on actions. The results provide high-confidence reachability guarantees for the high-dimensional controller.
    Abstract Autonomous systems are increasingly implemented using end-to-end trained controllers. Such controllers make decisions that are executed on the real system with images as one of the primary sensing modalities. Deep neural networks form a fundamental building block of such controllers. Unfortunately, the existing neural-network verification tools do not scale to inputs with thousands of dimensions, especially when the individual inputs (such as pixels) are devoid of clear physical meaning. This paper takes a step towards connecting exhaustive closed-loop verification with high-dimensional controllers. Our key insight is that the behavior of a high-dimensional controller can be approximated with several low-dimensional controllers in different regions of the state space. To balance approximation and verifiability, we leverage the latest verification-aware knowledge distillation. Then, if low-dimensional reachability results are inflated with statistical approximation errors, they yield a high-confidence reachability guarantee for the high-dimensional controller. We investigate two inflation techniques -- based on trajectories and actions -- both of which show convincing performance in two OpenAI gym benchmarks.

Toward Rapid, Optimal, and Feasible Power Dispatch through Generalized Neural Mapping

  • paper_url: http://arxiv.org/abs/2311.04838
  • repo_url: None
  • paper_authors: Meiyi Li, Javad Mohammadi
  • for: Enable rapid, optimal, and feasible power dispatch for increasingly distributed and interconnected grids by using machine learning to improve the optimization process.
  • methods: LOOP-LC 2.0 (Learning to Optimize the Optimization Process with Linear Constraints, version 2.0) is proposed; it guarantees near-optimal and strictly feasible solutions without time-consuming iterative post-processing, using a newly proposed generalized gauge map that sends any infeasible solution to a feasible point of the linearly-constrained domain.
  • results: On the IEEE-200 test case, LOOP-LC 2.0 significantly outperforms existing methods in training speed, computational time, optimality, and solution feasibility.
    Abstract The evolution towards a more distributed and interconnected grid necessitates large-scale decision-making within strict temporal constraints. Machine learning (ML) paradigms have demonstrated significant potential in improving the efficacy of optimization processes. However, the feasibility of solutions derived from ML models continues to pose challenges. It's imperative that ML models produce solutions that are attainable and realistic within the given system constraints of power systems. To address the feasibility issue and expedite the solution search process, we proposed LOOP-LC 2.0(Learning to Optimize the Optimization Process with Linear Constraints version 2.0) as a learning-based approach for solving the power dispatch problem. A notable advantage of the LOOP-LC 2.0 framework is its ability to ensure near-optimality and strict feasibility of solutions without depending on computationally intensive post-processing procedures, thus eliminating the need for iterative processes. At the heart of the LOOP-LC 2.0 model lies the newly proposed generalized gauge map method, capable of mapping any infeasible solution to a feasible point within the linearly-constrained domain. The proposed generalized gauge map method improves the traditional gauge map by exhibiting reduced sensitivity to input variances while increasing search speeds significantly. Utilizing the IEEE-200 test case as a benchmark, we demonstrate the effectiveness of the LOOP-LC 2.0 methodology, confirming its superior performance in terms of training speed, computational time, optimality, and solution feasibility compared to existing methodologies.
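As a rough illustration of the gauge-map idea (a textbook Minkowski-gauge construction under the assumption that the feasible set $C$ is convex, compact, and contains the origin; the paper's generalized gauge map differs in its details), any candidate point can be pulled into $C$ by

$$ \gamma_C(x) = \inf\{\, t > 0 : x \in tC \,\}, \qquad \phi(x) = \frac{x}{\max\!\big(1, \gamma_C(x)\big)} \in C, $$

so feasibility is restored by a closed-form rescaling rather than an iterative projection step.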

Real-Time Recurrent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.04830
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Julian Lemmel, Radu Grosu
  • for: solving partially-observable Markov decision processes (POMDPs) using biologically plausible methods
  • methods: using random feedback local online learning (RFLO) and temporal-difference reinforcement learning with eligibility traces (TD($\lambda$)) to compute gradients of recurrent neural network parameters in an online manner
  • results: RFLO can perform just as well as real-time recurrent learning (RTRL) with less complexity, and the proposed method (RTRRL) serves as a model of learning in biological neural networks mimicking reward pathways in the mammalian brain.
    Abstract Recent advances in reinforcement learning, for partially-observable Markov decision processes (POMDPs), rely on the biologically implausible backpropagation through time algorithm (BPTT) to perform gradient-descent optimisation. In this paper we propose a novel reinforcement learning algorithm that makes use of random feedback local online learning (RFLO), a biologically plausible approximation of real-time recurrent learning (RTRL) to compute the gradients of the parameters of a recurrent neural network in an online manner. By combining it with TD($\lambda$), a variant of temporal-difference reinforcement learning with eligibility traces, we create a biologically plausible, recurrent actor-critic algorithm, capable of solving discrete and continuous control tasks in POMDPs. We compare BPTT, RTRL and RFLO as well as different network architectures, and find that RFLO can perform just as well as RTRL while exceeding even BPTT in terms of complexity. The proposed method, called real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks mimicking reward pathways in the mammalian brain.
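For reference, the standard TD($\lambda$) value update with accumulating eligibility traces that RTRRL combines with RFLO has the form

$$ \delta_t = r_{t+1} + \gamma V_\theta(s_{t+1}) - V_\theta(s_t), \qquad e_t = \gamma\lambda\, e_{t-1} + \nabla_\theta V_\theta(s_t), \qquad \theta \leftarrow \theta + \alpha\, \delta_t\, e_t, $$

where $\alpha$ is the step size, $\gamma$ the discount factor, and $\lambda$ the trace-decay parameter.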

Functional Bayesian Tucker Decomposition for Continuous-indexed Tensor Data

  • paper_url: http://arxiv.org/abs/2311.04829
  • repo_url: None
  • paper_authors: Shikai Fang, Xin Yu, Zheng Wang, Shibo Li, Mike Kirby, Shandian Zhe
  • for: Extend Tucker decomposition to tensor data whose modes are indexed by continuous values.
  • methods: Functional Bayesian Tucker Decomposition (FunBaT) treats continuous-indexed data as the interaction between a Tucker core and a group of latent functions; Gaussian processes (GPs) serve as functional priors and are converted into state-space priors via equivalent stochastic differential equations to reduce computational cost, with an efficient message-passing algorithm for scalable posterior approximation.
  • results: The proposed method shows clear advantages in flexibility and efficiency on both synthetic data and several real-world applications.
    Abstract Tucker decomposition is a powerful tensor model to handle multi-aspect data. It demonstrates the low-rank property by decomposing the grid-structured data as interactions between a core tensor and a set of object representations (factors). A fundamental assumption of such decomposition is that there were finite objects in each aspect or mode, corresponding to discrete indexes of data entries. However, many real-world data are not naturally posed in the setting. For example, geographic data is represented as continuous indexes of latitude and longitude coordinates, and cannot fit tensor models directly. To generalize Tucker decomposition to such scenarios, we propose Functional Bayesian Tucker Decomposition (FunBaT). We treat the continuous-indexed data as the interaction between the Tucker core and a group of latent functions. We use Gaussian processes (GP) as functional priors to model the latent functions, and then convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE) to reduce computational cost. An efficient inference algorithm is further developed for scalable posterior approximation based on advanced message-passing techniques. The advantage of our method is shown in both synthetic data and several real-world applications.
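For a three-mode tensor, the standard Tucker model and, schematically, the functional generalization described in the abstract (the notation here is illustrative, not necessarily the paper's) read

$$ \mathcal{Y}_{ijk} \approx \sum_{r_1,r_2,r_3} \mathcal{G}_{r_1 r_2 r_3}\, U^{(1)}_{i r_1} U^{(2)}_{j r_2} U^{(3)}_{k r_3}, \qquad y(c_1,c_2,c_3) \approx \sum_{r_1,r_2,r_3} \mathcal{G}_{r_1 r_2 r_3}\, u^{(1)}_{r_1}(c_1)\, u^{(2)}_{r_2}(c_2)\, u^{(3)}_{r_3}(c_3), $$

where the discrete factor rows are replaced by latent functions $u^{(k)}_{r}(\cdot)$ of the continuous indexes, which FunBaT endows with GP priors (handled in state-space/SDE form for scalability).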

A Lightweight Architecture for Real-Time Neuronal-Spike Classification

  • paper_url: http://arxiv.org/abs/2311.04808
  • repo_url: None
  • paper_authors: Muhammad Ali Siddiqi, David Vrijenhoek, Lennart P. L. Landsmeer, Job van der Kleij, Anteneh Gebregiorgis, Vincenzo Romano, Rajendra Bishnoi, Said Hamdioui, Christos Strydis
  • for: Understanding brain function, in particular through recordings from cerebellar Purkinje cells to study brain injuries and the loss of motor functions.
  • methods: A lightweight neuronal-spike detection and classification architecture exploits the characteristics of Purkinje cells to discard unneeded information from the sparse neural data in real time, so that the condensed data can be stored on removable STT-RAM storage on the head stage, removing the need for wires.
  • results: The proposed design achieves >95% overall classification accuracy in a small form factor that lets mice move freely during experiments, and its power efficiency allows the head stage to run on a tiny battery for up to approximately 4 days.
    Abstract Electrophysiological recordings of neural activity in a mouse's brain are very popular among neuroscientists for understanding brain function. One particular area of interest is acquiring recordings from the Purkinje cells in the cerebellum in order to understand brain injuries and the loss of motor functions. However, current setups for such experiments do not allow the mouse to move freely and, thus, do not capture its natural behaviour since they have a wired connection between the animal's head stage and an acquisition device. In this work, we propose a lightweight neuronal-spike detection and classification architecture that leverages on the unique characteristics of the Purkinje cells to discard unneeded information from the sparse neural data in real time. This allows the (condensed) data to be easily stored on a removable storage device on the head stage, alleviating the need for wires. Our proposed implementation shows a >95% overall classification accuracy while still resulting in a small-form-factor design, which allows for the free movement of mice during experiments. Moreover, the power-efficient nature of the design and the usage of STT-RAM (Spin Transfer Torque Magnetic Random Access Memory) as the removable storage allows the head stage to easily operate on a tiny battery for up to approximately 4 days.

The PetShop Dataset – Finding Causes of Performance Issues across Microservices

  • paper_url: http://arxiv.org/abs/2311.04806
  • repo_url: None
  • paper_authors: Michaela Hardt, William Orchard, Patrick Blöbaum, Shiva Kasiviswanathan, Elke Kirschbaum
  • for: This work provides a benchmark dataset specifically designed for evaluating root cause analysis in microservice-based applications.
  • methods: A distributed application emits latency, request, and availability metrics at 5-minute intervals; 68 performance issues are injected into the system, increasing latency and reducing availability.
  • results: The dataset is shown to support evaluating the accuracy of a variety of root cause analysis methods spanning different causal and non-causal characterizations of the problem.
    Abstract Identifying root causes for unexpected or undesirable behavior in complex systems is a prevalent challenge. This issue becomes especially crucial in modern cloud applications that employ numerous microservices. Although the machine learning and systems research communities have proposed various techniques to tackle this problem, there is currently a lack of standardized datasets for quantitative benchmarking. Consequently, research groups are compelled to create their own datasets for experimentation. This paper introduces a dataset specifically designed for evaluating root cause analyses in microservice-based applications. The dataset encompasses latency, requests, and availability metrics emitted in 5-minute intervals from a distributed application. In addition to normal operation metrics, the dataset includes 68 injected performance issues, which increase latency and reduce availability throughout the system. We showcase how this dataset can be used to evaluate the accuracy of a variety of methods spanning different causal and non-causal characterisations of the root cause analysis problem. We hope the new dataset, available at https://github.com/amazon-science/petshop-root-cause-analysis/ enables further development of techniques in this important area.

Why Do Clinical Probabilistic Models Fail To Transport Between Sites?

  • paper_url: http://arxiv.org/abs/2311.04787
  • repo_url: None
  • paper_authors: Thomas A. Lasko, Eric V. Strobl, William W. Stead
  • for: This work examines why clinical probabilistic models that achieve super-human performance at their training sites often perform substantially worse at new sites.
  • methods: Common sources of this failure to transport are analyzed and divided into sources under the experimenter's control and sources inherent to the clinical data-generating process.
  • results: Site-specific clinical practices that shape the data distribution are identified as an inherent source, and a potential solution is proposed to isolate the imprint of those practices on the data from the patterns of disease cause and effect that clinical models usually target.
    Abstract The rising popularity of artificial intelligence in healthcare is highlighting the problem that a computational model achieving super-human clinical performance at its training sites may perform substantially worse at new sites. In this perspective, we present common sources for this failure to transport, which we divide into sources under the control of the experimenter and sources inherent to the clinical data-generating process. Of the inherent sources we look a little deeper into site-specific clinical practices that can affect the data distribution, and propose a potential solution intended to isolate the imprint of those practices on the data from the patterns of disease cause and effect that are the usual target of clinical models.

FetMRQC: an open-source machine learning framework for multi-centric fetal brain MRI quality control

  • paper_url: http://arxiv.org/abs/2311.04780
  • repo_url: https://github.com/medical-image-analysis-laboratory/fetal_brain_qc
  • paper_authors: Thomas Sanchez, Oscar Esteban, Yvan Gomez, Alexandre Pron, Mériam Koob, Vincent Dunet, Nadine Girard, Andras Jakab, Elisenda Eixarch, Guillaume Auzias, Meritxell Bach Cuadra
  • for: This work provides an automated image quality assessment and quality control framework to improve the quality and reliability of fetal brain MRI.
  • methods: The framework, FetMRQC, extracts an ensemble of quality metrics from unprocessed anatomical MRI acquired on different scanners and combines them in a random forest to predict experts' ratings.
  • results: FetMRQC's predictions generalize well to unseen data from different scanners and sites while remaining interpretable.
    Abstract Fetal brain MRI is becoming an increasingly relevant complement to neurosonography for perinatal diagnosis, allowing fundamental insights into fetal brain development throughout gestation. However, uncontrolled fetal motion and heterogeneity in acquisition protocols lead to data of variable quality, potentially biasing the outcome of subsequent studies. We present FetMRQC, an open-source machine-learning framework for automated image quality assessment and quality control that is robust to domain shifts induced by the heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics from unprocessed anatomical MRI and combines them to predict experts' ratings using random forests. We validate our framework on a pioneeringly large and diverse dataset of more than 1600 manually rated fetal brain T2-weighted images from four clinical centers and 13 different scanners. Our study shows that FetMRQC's predictions generalize well to unseen data while being interpretable. FetMRQC is a step towards more robust fetal brain neuroimaging, which has the potential to shed new insights on the developing human brain.
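As an illustration of the general recipe (image-quality metrics fed to a random forest that predicts expert ratings, evaluated across sites), here is a toy sketch with synthetic data and hypothetical feature names; it is not FetMRQC's actual code, features, or API.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n = 200
# Hypothetical image-quality metrics (e.g. SNR, entropy, motion score, sharpness)
X = rng.standard_normal((n, 4))
y = X @ np.array([0.5, -0.3, -0.8, 0.4]) + 0.1 * rng.standard_normal(n)  # synthetic expert ratings
sites = rng.integers(0, 4, size=n)                                        # acquisition site of each image

# Leave-sites-out cross-validation probes generalization across scanners/sites
model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, groups=sites, cv=GroupKFold(n_splits=4))
print("per-fold R^2 on held-out sites:", np.round(scores, 2))
```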

Optimal Deep Neural Network Approximation for Korobov Functions with respect to Sobolev Norms

  • paper_url: http://arxiv.org/abs/2311.04779
  • repo_url: None
  • paper_authors: Yahong Yang, Yulong Lu
  • for: This paper addresses the approximation of Korobov functions by deep neural networks (DNNs).
  • methods: DNN approximation rates for Korobov functions are established with error measured in $L_p$ norms and $H^1$ norms, via non-asymptotic bounds that account for network width and depth simultaneously.
  • results: The achieved rate is nearly optimal and exhibits a remarkable "super-convergence", outperforming traditional methods and any continuous function approximator, effectively overcoming the curse of dimensionality.
    Abstract This paper establishes the nearly optimal rate of approximation for deep neural networks (DNNs) when applied to Korobov functions, effectively overcoming the curse of dimensionality. The approximation results presented in this paper are measured with respect to $L_p$ norms and $H^1$ norms. Our achieved approximation rate demonstrates a remarkable "super-convergence" rate, outperforming traditional methods and any continuous function approximator. These results are non-asymptotic, providing error bounds that consider both the width and depth of the networks simultaneously.

Towards a Unified Framework of Contrastive Learning for Disentangled Representations

  • paper_url: http://arxiv.org/abs/2311.04774
  • repo_url: None
  • paper_authors: Stefan Matthes, Zhiwei Han, Hao Shen
  • for: This work extends the theoretical disentanglement guarantees of contrastive learning to a broader family of contrastive methods for discovering and separating the explanatory factors of data.
  • methods: Four contrastive losses, related to objectives such as noise-contrastive estimation (NCE) and InfoNCE, are analyzed while relaxing the usual assumptions about the data-generating process.
  • results: Identifiability of the true latents is proved for the four contrastive losses without imposing common independence assumptions; the theoretical findings are validated on several benchmark datasets, and practical limitations are also investigated.
    Abstract Contrastive learning has recently emerged as a promising approach for learning data representations that discover and disentangle the explanatory factors of the data. Previous analyses of such approaches have largely focused on individual contrastive losses, such as noise-contrastive estimation (NCE) and InfoNCE, and rely on specific assumptions about the data generating process. This paper extends the theoretical guarantees for disentanglement to a broader family of contrastive methods, while also relaxing the assumptions about the data distribution. Specifically, we prove identifiability of the true latents for four contrastive losses studied in this paper, without imposing common independence assumptions. The theoretical findings are validated on several benchmark datasets. Finally, practical limitations of these methods are also investigated.
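For reference, the InfoNCE objective mentioned in the abstract has the standard form (for an encoder $f$, anchor $x$ with positive $x^+$ and negatives $x^-_j$, similarity $\mathrm{sim}$, and temperature $\tau$):

$$ \mathcal{L}_{\mathrm{InfoNCE}} \;=\; -\,\mathbb{E}\!\left[\log \frac{\exp\!\big(\mathrm{sim}(f(x), f(x^+))/\tau\big)}{\exp\!\big(\mathrm{sim}(f(x), f(x^+))/\tau\big) + \sum_{j}\exp\!\big(\mathrm{sim}(f(x), f(x^-_j))/\tau\big)}\right]. $$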

Towards Open-world Cross-Domain Sequential Recommendation: A Model-Agnostic Contrastive Denoising Approach

  • paper_url: http://arxiv.org/abs/2311.04760
  • repo_url: None
  • paper_authors: Wujiang Xu, Xuying Ning, Wenfang Lin, Mingming Ha, Qiongxu Ma, Linxun Chen, Bing Han, Minnan Luo
  • for: Improve the consistency and effectiveness of models in open-world cross-domain sequential recommendation (CDSR) scenarios, which are dominated by long-tailed and cold-start users (1st CH).
  • methods: Auxiliary behaviors are used to complement the sparse information of long-tailed users (2nd CH).
  • results: Existing multi-behavior SR methods fail to deliver promising performance in CDSR because they overlook the semantic gap between target and auxiliary behaviors as well as user interest deviation across domains (2nd CH).
    Abstract Cross-domain sequential recommendation (CDSR) aims to address the data sparsity problems that exist in traditional sequential recommendation (SR) systems. The existing approaches aim to design a specific cross-domain unit that can transfer and propagate information across multiple domains by relying on overlapping users with abundant behaviors. However, in real-world recommender systems, CDSR scenarios usually consist of a majority of long-tailed users with sparse behaviors and cold-start users who only exist in one domain. This leads to a drop in the performance of existing CDSR methods in the real-world industry platform. Therefore, improving the consistency and effectiveness of models in open-world CDSR scenarios is crucial for constructing CDSR models (\textit{1st} CH). Recently, some SR approaches have utilized auxiliary behaviors to complement the information for long-tailed users. However, these multi-behavior SR methods cannot deliver promising performance in CDSR, as they overlook the semantic gap between target and auxiliary behaviors, as well as user interest deviation across domains (\textit{2nd} CH).

Natural Bayesian Cramér-Rao Bound with an Application to Covariance Estimation

  • paper_url: http://arxiv.org/abs/2311.04748
  • repo_url: None
  • paper_authors: Florent Bouchard, Alexandre Renaux, Guillaume Ginolhac, Arnaud Breloy
  • for: This work develops a new Cramér-Rao bound (CRB) for the case where the parameter to estimate lies on a manifold and follows a prior distribution.
  • methods: The derivation leads to a natural inequality between an error criterion based on geometrical properties and this new bound.
  • results: In a covariance estimation problem with Gaussian data and an inverse-Wishart prior, numerical simulations show that the proposed CRB exhibits interesting properties of the MAP estimator that are not observed with the classical Bayesian CRB.
    Abstract In this paper, we propose to develop a new Cram\'er-Rao Bound (CRB) when the parameter to estimate lies in a manifold and follows a prior distribution. This derivation leads to a natural inequality between an error criteria based on geometrical properties and this new bound. This main contribution is illustrated in the problem of covariance estimation when the data follow a Gaussian distribution and the prior distribution is an inverse Wishart. Numerical simulation shows new results where the proposed CRB allows to exhibit interesting properties of the MAP estimator which are not observed with the classical Bayesian CRB.
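For comparison, the classical (Euclidean) Bayesian Cramér-Rao bound, i.e. the Van Trees inequality, states under standard regularity conditions that

$$ \mathbb{E}\big[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^\top\big] \succeq J_B^{-1}, \qquad J_B = \mathbb{E}_\theta\big[F(\theta)\big] + \mathbb{E}\big[-\nabla_\theta^2 \log \pi(\theta)\big], $$

where $F(\theta)$ is the Fisher information of the likelihood and $\pi$ the prior; the paper's contribution is an intrinsic, manifold-aware counterpart of this bound.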

Enhancing Multi-Agent Coordination through Common Operating Picture Integration

  • paper_url: http://arxiv.org/abs/2311.04740
  • repo_url: None
  • paper_authors: Peihong Yu, Bhoram Lee, Aswin Raghavan, Supun Samarasekara, Pratap Tokekar, James Zachary Hare
  • for: This paper focuses on improving multi-agent coordination in dynamic environments, where agents possess only local observations and must communicate to enhance coordination.
  • methods: The proposed approach uses a Common Operating Picture (COP) that integrates each agent's observations, actions, and messages received, and disseminates the COP to other agents, taking into account the dynamic nature of the environment and the shared mission.
  • results: Experiments in the StarCraft2 environment show that COP-based training leads to robust policies compared to state-of-the-art MARL methods when faced with out-of-distribution initial states.
    Abstract In multi-agent systems, agents possess only local observations of the environment. Communication between teammates becomes crucial for enhancing coordination. Past research has primarily focused on encoding local information into embedding messages which are unintelligible to humans. We find that using these messages in agent's policy learning leads to brittle policies when tested on out-of-distribution initial states. We present an approach to multi-agent coordination, where each agent is equipped with the capability to integrate its (history of) observations, actions and messages received into a Common Operating Picture (COP) and disseminate the COP. This process takes into account the dynamic nature of the environment and the shared mission. We conducted experiments in the StarCraft2 environment to validate our approach. Our results demonstrate the efficacy of COP integration, and show that COP-based training leads to robust policies compared to state-of-the-art Multi-Agent Reinforcement Learning (MARL) methods when faced with out-of-distribution initial states.

Robust Best-arm Identification in Linear Bandits

  • paper_url: http://arxiv.org/abs/2311.04731
  • repo_url: None
  • paper_authors: Wei Wang, Sattar Vakili, Ilija Bogunovic
  • for: 这种研究旨在解决 robust best-arm identification problem (RBAI) 中的 linear rewards 问题,目标是找到一个近似优化的 robust arm,以便在实际应用中实现 transferred 的优化策略。
  • methods: 该研究提出了一个实例取值下的下界,并提出了静态和适应式bandit算法,以实现与下界匹配的样本复杂度。
  • results: 在synthetic实验中,该算法能够有效地找到最佳的 robust arm,并与oracle策略相似。在应用中,该算法在不同年龄层的病人中实现了robust dosage值的标准化。
    Abstract We study the robust best-arm identification problem (RBAI) in the case of linear rewards. The primary objective is to identify a near-optimal robust arm, which involves selecting arms at every round and assessing their robustness by exploring potential adversarial actions. This approach is particularly relevant when utilizing a simulator and seeking to identify a robust solution for real-world transfer. To this end, we present an instance-dependent lower bound for the robust best-arm identification problem with linear rewards. Furthermore, we propose both static and adaptive bandit algorithms that achieve sample complexity that matches the lower bound. In synthetic experiments, our algorithms effectively identify the best robust arm and perform similarly to the oracle strategy. As an application, we examine diabetes care and the process of learning insulin dose recommendations that are robust with respect to inaccuracies in standard calculators. Our algorithms prove to be effective in identifying robust dosage values across various age ranges of patients.

Predicting Properties of Nodes via Community-Aware Features

  • paper_url: http://arxiv.org/abs/2311.04730
  • repo_url: https://github.com/sebkaz/betastar
  • paper_authors: Bogumił Kamiński, Paweł Prałat, François Théberge, Sebastian Zając
  • for: This work proposes a family of community-aware node features and investigates their properties.
  • methods: The features are derived from the community structure that is often present in complex networks and shapes both their formation and dynamics.
  • results: The features have high predictive power for classification tasks and contain information that can be recovered neither by classical node features nor by node embeddings (both classical and structural).
    Abstract A community structure that is often present in complex networks plays an important role not only in their formation but also shapes dynamics of these networks, affecting properties of their nodes. In this paper, we propose a family of community-aware node features and then investigate their properties. We show that they have high predictive power for classification tasks. We also verify that they contain information that cannot be recovered neither by classical node features nor by node embeddings (both classical as well as structural).

Robust and Communication-Efficient Federated Domain Adaptation via Random Features

  • paper_url: http://arxiv.org/abs/2311.04686
  • repo_url: https://github.com/sadangelf/fedrf-tca
  • paper_authors: Zhanbo Feng, Yuanjie Wang, Jie Li, Fan Yang, Jiong Lou, Tiebin Mi, Robert. C. Qiu, Zhenyu Liao
  • for: This paper is written for researchers and practitioners who are interested in federated domain adaptation (FDA) and want to improve the efficiency and robustness of their FDA methods.
  • methods: The paper proposes an enhancement to the standard Transfer Component Analysis (TCA) approach, called RF-TCA, which significantly accelerates computation without compromising theoretical and empirical performance. The proposed FedRF-TCA protocol is an extension of RF-TCA to the FDA setting, which has communication complexity that is independent of the sample size and maintains performance that is either comparable to or even surpasses state-of-the-art FDA methods.
  • results: The paper presents extensive experiments to showcase the superior performance and robustness (to network condition) of FedRF-TCA compared to state-of-the-art FDA methods. The results demonstrate that FedRF-TCA can handle large-scale FDA tasks with high efficiency and accuracy, and is robust to network conditions.
    Abstract Modern machine learning (ML) models have grown to a scale where training them on a single machine becomes impractical. As a result, there is a growing trend to leverage federated learning (FL) techniques to train large ML models in a distributed and collaborative manner. These models, however, when deployed on new devices, might struggle to generalize well due to domain shifts. In this context, federated domain adaptation (FDA) emerges as a powerful approach to address this challenge. Most existing FDA approaches typically focus on aligning the distributions between source and target domains by minimizing their (e.g., MMD) distance. Such strategies, however, inevitably introduce high communication overheads and can be highly sensitive to network reliability. In this paper, we introduce RF-TCA, an enhancement to the standard Transfer Component Analysis approach that significantly accelerates computation without compromising theoretical and empirical performance. Leveraging the computational advantage of RF-TCA, we further extend it to FDA setting with FedRF-TCA. The proposed FedRF-TCA protocol boasts communication complexity that is \emph{independent} of the sample size, while maintaining performance that is either comparable to or even surpasses state-of-the-art FDA methods. We present extensive experiments to showcase the superior performance and robustness (to network condition) of FedRF-TCA.
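As background on the "random features" ingredient (the generic random Fourier feature construction of Rahimi and Recht, not the full RF-TCA/FedRF-TCA pipeline), a kernel matrix can be approximated by an explicit low-dimensional map, which is what makes the transfer-component computation cheap and communication-friendly:

```python
import numpy as np

def random_fourier_features(X, D=2000, gamma=0.5, seed=0):
    """Map X (n x d) to Z (n x D) so that Z @ Z.T approximates exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).standard_normal((100, 5))
Z = random_fourier_features(X)
K_exact = np.exp(-0.5 * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
print("max abs kernel approximation error:", np.abs(Z @ Z.T - K_exact).max())
```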

Compressive Recovery of Sparse Precision Matrices

  • paper_url: http://arxiv.org/abs/2311.04673
  • repo_url: None
  • paper_authors: Titouan Vayer, Etienne Lasalle, Rémi Gribonval, Paulo Gonçalves
  • for: This work aims to learn a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$.
  • methods: A compressive viewpoint is adopted: a sparse precision matrix $\Theta$ is estimated from a sketch of the data, i.e., a low-dimensional vector of size $m \ll d^{2}$ carefully designed from $X$ using nonlinear random features.
  • results: Under assumptions on the spectrum (or condition number) of $\Theta$, it can be estimated from a sketch of size $m = \Omega((d+2k)\log(d))$, where $k$ is the maximal number of edges of the underlying graph; an iterative algorithm based on the graphical lasso achieves practical recovery and performs favorably on synthetic datasets even when the data are compressed.
    Abstract We consider the problem of learning a graph modeling the statistical relations of the $d$ variables of a dataset with $n$ samples $X \in \mathbb{R}^{n \times d}$. Standard approaches amount to searching for a precision matrix $\Theta$ representative of a Gaussian graphical model that adequately explains the data. However, most maximum likelihood-based estimators usually require storing the $d^{2}$ values of the empirical covariance matrix, which can become prohibitive in a high-dimensional setting. In this work, we adopt a compressive viewpoint and aim to estimate a sparse $\Theta$ from a sketch of the data, i.e. a low-dimensional vector of size $m \ll d^{2}$ carefully designed from $X$ using nonlinear random features. Under certain assumptions on the spectrum of $\Theta$ (or its condition number), we show that it is possible to estimate it from a sketch of size $m=\Omega((d+2k)\log(d))$ where $k$ is the maximal number of edges of the underlying graph. These information-theoretic guarantees are inspired by compressed sensing theory and involve restricted isometry properties and instance optimal decoders. We investigate the possibility of achieving practical recovery with an iterative algorithm based on the graphical lasso, viewed as a specific denoiser. We compare our approach and graphical lasso on synthetic datasets, demonstrating its favorable performance even when the dataset is compressed.
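For context, the uncompressed baseline mentioned in the abstract, the graphical lasso on the full empirical covariance, is available off the shelf; a minimal sketch (scikit-learn's GraphicalLasso on synthetic data, not the proposed sketch-based estimator) is:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
d = 10
# Sparse ground-truth precision matrix (tridiagonal) and Gaussian samples from its inverse
Theta = np.eye(d) + 0.3 * np.diag(np.ones(d - 1), 1) + 0.3 * np.diag(np.ones(d - 1), -1)
X = rng.multivariate_normal(np.zeros(d), np.linalg.inv(Theta), size=2000)

est = GraphicalLasso(alpha=0.05).fit(X)
print("nonzero entries in estimated precision:", int((np.abs(est.precision_) > 1e-3).sum()))
```

The compressive approach of the paper avoids ever forming the $d \times d$ empirical covariance that this baseline requires.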

Learning Linear Gaussian Polytree Models with Interventions

  • paper_url: http://arxiv.org/abs/2311.04636
  • repo_url: https://github.com/emduart2/polytrees
  • paper_authors: D. Tramontano, L. Waldmann, M. Drton, E. Duarte
  • for: Learn the causal structure of a linear Gaussian polytree from interventional experiments with known intervention targets.
  • methods: The method first learns the skeleton of the polytree and then orients its edges, relying on second-order statistics and low-dimensional marginal distributions; the output is a CPDAG representing the interventional equivalence class of the polytree of the true underlying distribution.
  • results: Simulation studies across different scenarios show the method is fast, accurate in terms of structural Hamming distance, and scales to problems with thousands of nodes; it is also applied to learn a polytree from a gene expression interventional dataset.
    Abstract We present a consistent and highly scalable local approach to learn the causal structure of a linear Gaussian polytree using data from interventional experiments with known intervention targets. Our methods first learn the skeleton of the polytree and then orient its edges. The output is a CPDAG representing the interventional equivalence class of the polytree of the true underlying distribution. The skeleton and orientation recovery procedures we use rely on second order statistics and low-dimensional marginal distributions. We assess the performance of our methods under different scenarios in synthetic data sets and apply our algorithm to learn a polytree in a gene expression interventional data set. Our simulation studies demonstrate that our approach is fast, has good accuracy in terms of structural Hamming distance, and handles problems with thousands of nodes.
    摘要 我们提出了一种一致且高度可扩展的局部方法,利用干预目标已知的干预实验数据学习线性高斯多叉树(polytree)的因果结构。我们的方法首先学习多叉树的骨架,然后确定其边的方向。输出是一个 CPDAG,表示真实分布对应多叉树的干预等价类。骨架与方向的恢复过程均依赖二阶统计量和低维边缘分布。我们在不同场景的合成数据集上评估了方法的性能,并将算法应用于一个基因表达干预数据集来学习多叉树。模拟研究表明,我们的方法速度快、在结构汉明距离意义下具有良好的准确性,并能处理包含数千个节点的问题。
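
As a rough illustration of the skeleton-then-orient pipeline, the sketch below recovers a tree skeleton from second-order statistics by computing a maximum-weight spanning tree over absolute pairwise correlations (a Chow-Liu-style heuristic). This is an illustrative stand-in rather than the authors' interventional procedure, and the edge-orientation step that produces the CPDAG is omitted.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def skeleton_from_correlations(X):
    """Estimate a tree skeleton: maximum-weight spanning tree on |corr(Xi, Xj)|."""
    R = np.abs(np.corrcoef(X, rowvar=False))       # (d, d) absolute correlations
    W = 2.0 - R                                    # positive weights; small weight = strong correlation
    np.fill_diagonal(W, 0.0)                       # zero entries = no self-edges for csgraph
    mst = minimum_spanning_tree(W)
    rows, cols = mst.nonzero()
    return list(zip(rows.tolist(), cols.tolist())) # undirected skeleton edges

X = np.random.randn(1000, 6)
print(skeleton_from_correlations(X))
```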

Byzantine-Tolerant Methods for Distributed Variational Inequalities

  • paper_url: http://arxiv.org/abs/2311.04611
  • repo_url: https://github.com/nazya/sgda-ra
  • paper_authors: Nazarii Tupitsa, Abdulla Jasem Almansoori, Yanlin Wu, Martin Takáč, Karthik Nandakumar, Samuel Horváth, Eduard Gorbunov
  • for: This paper addresses the problem of Byzantine robustness in distributed training, particularly for problems formulated as variational inequalities.
  • methods: The paper proposes several provably Byzantine-robust methods for distributed variational inequality, and thoroughly studies their theoretical convergence.
  • results: The paper provides numerical comparisons supporting the theoretical findings, and removes the limitations of previous work in this area.
    Abstract Robustness to Byzantine attacks is a necessity for various distributed training scenarios. When the training reduces to the process of solving a minimization problem, Byzantine robustness is relatively well-understood. However, other problem formulations, such as min-max problems or, more generally, variational inequalities, arise in many modern machine learning and, in particular, distributed learning tasks. These problems significantly differ from the standard minimization ones and, therefore, require separate consideration. Nevertheless, only one work (Adibi et al., 2022) addresses this important question in the context of Byzantine robustness. Our work makes a further step in this direction by providing several (provably) Byzantine-robust methods for distributed variational inequality, thoroughly studying their theoretical convergence, removing the limitations of the previous work, and providing numerical comparisons supporting the theoretical findings.
    摘要 对拜占庭攻击的鲁棒性是多种分布式训练场景的必要条件。当训练归结为求解一个最小化问题时,拜占庭鲁棒性已得到较充分的理解。然而,现代机器学习、尤其是分布式学习中还会出现其他问题形式,例如极小极大问题,或更一般的变分不等式。这些问题与标准的最小化问题有显著差异,因此需要单独研究。然而,目前仅有一项工作(Adibi et al., 2022)在拜占庭鲁棒性的背景下讨论了这一重要问题。我们的工作在这一方向上更进一步:为分布式变分不等式提出了若干(可证明的)拜占庭鲁棒方法,系统地研究了它们的理论收敛性,消除了先前工作的局限,并给出了支持理论结论的数值比较。
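
In the spirit of robust aggregation for gradient-descent-ascent on min-max / variational-inequality problems (the companion repository is named sgda-ra), the sketch below plugs a coordinate-wise median aggregator into one descent-ascent step; the choice of aggregator, step size, and toy saddle objective are illustrative assumptions, not the paper's specific algorithms or analysis.

```python
import numpy as np

def coordinate_median(grads):
    """Byzantine-robust aggregation: coordinate-wise median of worker operator estimates."""
    return np.median(np.stack(grads, axis=0), axis=0)

def sgda_ra_step(x, y, worker_ops, lr=0.01):
    """One robust gradient-descent-ascent step for min_x max_y f(x, y).
    Each worker returns (grad_x, grad_y); some workers may be Byzantine."""
    gx = coordinate_median([g[0] for g in worker_ops])
    gy = coordinate_median([g[1] for g in worker_ops])
    return x - lr * gx, y + lr * gy

# Toy saddle problem f(x, y) = x * y with one Byzantine worker sending garbage.
x, y = np.array([1.0]), np.array([1.0])
honest = [(y.copy(), x.copy())] * 4                 # true (df/dx, df/dy) = (y, x)
byzantine = [(np.array([1e6]), np.array([-1e6]))]
x, y = sgda_ra_step(x, y, honest + byzantine)
print(x, y)                                         # median ignores the outlier worker
```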

Accurate Autism Spectrum Disorder prediction using Support Vector Classifier based on Federated Learning (SVCFL)

  • paper_url: http://arxiv.org/abs/2311.04606
  • repo_url: None
  • paper_authors: Ali Mohammadifar, Hasan Samadbin, Arman Daliri
  • for: 旨在提高自闭症(autism)诊断的准确率和效率,尤其是在早期症状难以察觉的情形下。
  • methods: 使用联邦学习(Federated Learning)方法汇聚四个数据集,并用支持向量分类器(Support Vector Classifier)进行诊断;通过分析大量数据,发现人类评估者难以立即察觉的模式,从而帮助确认诊断或提示需要进一步检测的情况。
  • results: 在这种方法下,实现了 99% 的准确率和 13% 的提升。
    Abstract The path to an autism diagnosis can be long and difficult, and delays can have serious consequences. Artificial intelligence can completely change the way autism is diagnosed, especially when it comes to situations where it is difficult to see the first signs of the disease. AI-based diagnostic tools may help confirm a diagnosis or highlight the need for further testing by analyzing large volumes of data and uncovering patterns that may not be immediately apparent to human evaluators. After a successful and timely diagnosis, autism can be treated through artificial intelligence using various methods. In this article, by using four datasets and gathering them with the federated learning method and diagnosing them with the support vector classifier method, the early diagnosis of this disorder has been discussed. In this method, we have achieved 99% accuracy for predicting autism spectrum disorder and we have achieved 13% improvement in the results.
    摘要 自闭症的确诊过程可能漫长而艰难,而诊断延误可能带来严重后果。人工智能有望彻底改变自闭症的诊断方式,尤其是在早期症状难以察觉的情况下。基于人工智能的诊断工具可以通过分析海量数据、发现人类评估者难以立即察觉的模式,帮助确认诊断或提示需要进一步检测。在及时确诊之后,还可以借助人工智能的多种方法对自闭症进行干预。本文使用四个数据集,通过联邦学习方法对其进行汇聚,并用支持向量分类器进行诊断,从而讨论了该疾病的早期诊断问题。该方法对自闭症谱系障碍的预测准确率达到 99%,并使结果提升了 13%。

Zeroth-order Asynchronous Learning with Bounded Delays with a Use-case in Resource Allocation in Communication Networks

  • paper_url: http://arxiv.org/abs/2311.04604
  • repo_url: None
  • paper_authors: Pourya Behmandpoor, Marc Moonen, Panagiotis Patrinos
  • for: 研究多个智能体为共同使命协作、但各自可能承担不同任务的分布式优化场景:各智能体的动作可能通过相互作用影响其他智能体,其目标是基于局部奖励函数之和优化各自的本地参数。
  • methods: 提出一种仅依赖本地零阶(zeroth-order)预言机的异步学习方法:各智能体异步地更新参数并查询其零阶预言机,同时在有界但可能随机的通信延迟下与其他智能体通信。
  • results: 给出了理论收敛性分析并建立了所提方法的收敛速率;并针对通信网络中基于深度学习的资源分配问题进行了数值实验,验证了方法的有效性。
    Abstract Distributed optimization has experienced a significant surge in interest due to its wide-ranging applications in distributed learning and adaptation. While various scenarios, such as shared-memory, local-memory, and consensus-based approaches, have been extensively studied in isolation, there remains a need for further exploration of their interconnections. This paper specifically concentrates on a scenario where agents collaborate toward a unified mission while potentially having distinct tasks. Each agent's actions can potentially impact other agents through interactions. Within this context, the objective for the agents is to optimize their local parameters based on the aggregate of local reward functions, where only local zeroth-order oracles are available. Notably, the learning process is asynchronous, meaning that agents update and query their zeroth-order oracles asynchronously while communicating with other agents subject to bounded but possibly random communication delays. This paper presents theoretical convergence analyses and establishes a convergence rate for the proposed approach. Furthermore, it addresses the relevant issue of deep learning-based resource allocation in communication networks and conducts numerical experiments in which agents, acting as transmitters, collaboratively train their individual (possibly unique) policies to maximize a common performance metric.
    摘要 分布式优化因其在分布式学习与自适应中的广泛应用而受到极大关注。尽管共享内存、本地内存以及基于共识的方法等多种场景已被分别深入研究,它们之间的联系仍有待进一步探索。本文专门关注这样一种场景:多个智能体为统一的使命协作,但各自可能承担不同的任务,且每个智能体的动作可能通过相互作用影响其他智能体。在此背景下,各智能体的目标是基于局部奖励函数之和优化其本地参数,而可用的只有本地零阶预言机。值得注意的是,学习过程是异步的:各智能体异步地更新参数并查询零阶预言机,同时在有界但可能随机的通信延迟下与其他智能体通信。本文给出了理论收敛性分析,并为所提方法建立了收敛速率。此外,文章还讨论了通信网络中基于深度学习的资源分配问题,并进行了数值实验:作为发射机的各智能体协作训练各自(可能互不相同)的策略,以最大化一个共同的性能指标。
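
The derivative-free building block can be illustrated with a standard two-point zeroth-order gradient estimator: an agent perturbs its parameters along a random direction and queries its local oracle twice. The smoothing radius, the single random direction, and the synchronous toy loop below are illustrative simplifications; the asynchronous updates and bounded communication delays analyzed in the paper are omitted.

```python
import numpy as np

def zo_gradient(f, x, delta=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate of f at x:
    g = d * (f(x + delta*u) - f(x - delta*u)) / (2*delta) * u, u uniform on the sphere."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=x.shape)
    u /= np.linalg.norm(u)
    return x.size * (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u

# Local reward oracle (black box); only function values are available.
f = lambda x: -np.sum((x - 1.0) ** 2)
x = np.zeros(4)
for _ in range(500):
    x += 0.05 * zo_gradient(f, x)   # ascent on the local reward
print(np.round(x, 2))               # approaches [1, 1, 1, 1]
```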

A Deep Learning Based Resource Allocator for Communication Systems with Dynamic User Utility Demands

  • paper_url: http://arxiv.org/abs/2311.04600
  • repo_url: None
  • paper_authors: Pourya Behmandpoor, Panagiotis Patrinos, Marc Moonen
  • for: 提出一种基于深度学习的资源分配器(ALCOR),允许用户根据其应用层需要自由调整效用(utility)需求,从而避免参数变化时代价高昂的策略重训练。
  • methods: 将深度神经网络(DNN)作为策略嵌入一个迭代优化算法:该算法在时间共享问题中优化各用户的开/关状态,以在期望意义上满足其效用需求;策略本身在活跃用户之间执行不考虑效用需求的无约束资源分配(URA),以在每个时刻最大化总效用。
  • results: 收敛性分析为 ALCOR 提供了理论保证,数值实验证实了其有效性;根据所选的 URA 方案,该方法既可采用基于模型或无模型的方式,也可部署于集中式或分布式场景。
    Abstract Deep learning (DL) based resource allocation (RA) has recently gained a lot of attention due to its performance efficiency. However, most of the related studies assume an ideal case where the number of users and their utility demands, e.g., data rate constraints, are fixed and the designed DL based RA scheme exploits a policy trained only for these fixed parameters. A computationally complex policy retraining is required whenever these parameters change. Therefore, in this paper, a DL based resource allocator (ALCOR) is introduced, which allows users to freely adjust their utility demands based on, e.g., their application layer. ALCOR employs deep neural networks (DNNs), as the policy, in an iterative optimization algorithm. The optimization algorithm aims to optimize the on-off status of users in a time-sharing problem to satisfy their utility demands in expectation. The policy performs unconstrained RA (URA) -- RA without taking into account user utility demands -- among active users to maximize the sum utility (SU) at each time instant. Based on the chosen URA scheme, ALCOR can perform RA in a model-based or model-free manner and in a centralized or distributed scenario. Derived convergence analyses provide guarantees for the convergence of ALCOR, and numerical experiments corroborate its effectiveness.
    摘要 基于深度学习(DL)的资源分配(RA)因其性能优势近来受到广泛关注。然而,大多数相关研究假设用户数量及其效用需求(例如数据速率约束)是固定的,所设计的 DL 资源分配方案也只针对这些固定参数训练策略;一旦参数变化,就需要进行计算代价高昂的策略重训练。为此,本文提出一种基于 DL 的资源分配器(ALCOR),允许用户根据其应用层需要自由调整效用需求。ALCOR 将深度神经网络(DNN)作为策略嵌入一个迭代优化算法:该算法在时间共享问题中优化各用户的开/关状态,以在期望意义上满足其效用需求;策略则在活跃用户之间执行不考虑效用需求的无约束资源分配(URA),以在每个时刻最大化总效用(SU)。根据所选的 URA 方案,ALCOR 既可以采用基于模型或无模型的方式,也可以在集中式或分布式场景中执行资源分配。所推导的收敛性分析为 ALCOR 提供了收敛保证,数值实验也证实了其有效性。

Predicting Market Value in Professional Soccer: Insights from Explainable Machine Learning Models

  • paper_url: http://arxiv.org/abs/2311.04599
  • repo_url: None
  • paper_authors: Chunyang Huang, Shaoliang Zhang
  • for: 本研究旨在利用可解释机器学习模型预测职业足球运动员的市场价值。
  • methods: 使用从 FIFA 官方网站整理的数据集,采用集成机器学习方法,并结合 Shapley Additive exPlanations(SHAP)对模型预测给出详细解释。
  • results: GBDT 模型取得最高的平均 R-Squared(0.8780)与最低的平均均方根误差(3,221,632.175),在所评估模型中表现最佳。分析表明:在技能维度,球控、短传、射门终结、拦截、盘带与抢断最为关键;在身体维度,冲刺速度与加速度至关重要;在认知维度,反应能力最为突出。这些结果为市场价值估计提供了更准确、客观且一致的框架,可为球员转会的管理决策提供有益洞见。
    Abstract This study presents an innovative method for predicting the market value of professional soccer players using explainable machine learning models. Using a dataset curated from the FIFA website, we employ an ensemble machine learning approach coupled with Shapley Additive exPlanations (SHAP) to provide detailed explanations of the models' predictions. The GBDT model achieves the highest mean R-Squared (0.8780) and the lowest mean Root Mean Squared Error (3,221,632.175), indicating its superior performance among the evaluated models. Our analysis reveals that specific skills such as ball control, short passing, finishing, interceptions, dribbling, and tackling are paramount within the skill dimension, whereas sprint speed and acceleration are critical in the fitness dimension, and reactions are preeminent in the cognitive dimension. Our results offer a more accurate, objective, and consistent framework for market value estimation, presenting useful insights for managerial decisions in player transfers.
    摘要 本研究提出一种利用可解释机器学习模型预测职业足球运动员市场价值的创新方法。我们使用从 FIFA 官方网站整理的数据集,采用集成机器学习方法,并结合 Shapley Additive exPlanations(SHAP)对模型预测给出详细解释。GBDT 模型取得最高的平均 R-Squared(0.8780)和最低的平均均方根误差(3,221,632.175),在所评估模型中表现最佳。分析表明,在技能维度中,球控、短传、射门终结、拦截、盘带和抢断等能力至关重要;在身体维度中,冲刺速度和加速度最为关键;在认知维度中,反应能力最为突出。我们的结果为市场价值估计提供了更准确、客观和一致的框架,可为球员转会的管理决策提供有益参考。
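
A minimal sketch of the modeling pipeline described above (gradient-boosted trees for market-value regression plus SHAP attributions) follows; the synthetic features and target are placeholders standing in for the FIFA attributes, not the paper's dataset or tuned model.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Placeholder features standing in for FIFA attributes (skill / fitness / cognitive).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(2000, 4)),
                 columns=["ball_control", "finishing", "sprint_speed", "reactions"])
y = 1e6 * (2 * X["ball_control"] + X["reactions"] + rng.normal(scale=0.3, size=2000))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("R^2:", r2_score(y_te, model.predict(X_te)))

# SHAP values explain each feature's contribution to individual predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0))
```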

Deep learning as a tool for quantum error reduction in quantum image processing

  • paper_url: http://arxiv.org/abs/2311.04575
  • repo_url: None
  • paper_authors: Krzysztof Werner, Kamil Wereszczyński, Rafał Potempa, Krzysztof Cyran
  • for: 提出一种基于生成对抗网络(图像到图像翻译)的方法,以降低使用 LPIQE 方法编码的量子图像中的总体误差。
  • methods: 将为图像到图像翻译训练的生成对抗网络与相位畸变还原(Phase Distortion Unraveling)误差消减方法结合使用。
  • results: 成功降低了以 LPIQE 编码的图像中的总体误差。
    Abstract Despite the limited availability and quantum volume of quantum computers, quantum image representation is a widely researched area. Currently developed methods use quantum entanglement to encode information about pixel positions. These methods range from using the angle parameter of the rotation gate (e.g., the Flexible Representation of Quantum Images, FRQI), sequences of qubits (e.g., Novel Enhanced Quantum Representation, NEQR), or the angle parameter of the phase shift gates (e.g., Local Phase Image Quantum Encoding, LPIQE) for storing color information. All these methods are significantly affected by decoherence and other forms of quantum noise, which is an inseparable part of quantum computing in the noisy intermediate-scale quantum era. These phenomena can highly influence the measurements and result in extracted images that are visually dissimilar to the originals. Because this process is at its foundation quantum, the computational reversal of this process is possible. There are many methods for error correction, mitigation, and reduction, but all of them use quantum computer time or additional qubits to achieve the desired result. We report the successful use of a generative adversarial network trained for image-to-image translation, in conjunction with Phase Distortion Unraveling error reduction method, for reducing overall error in images encoded using LPIQE.
    摘要 尽管量子计算机的可用性与量子体积仍然有限,量子图像表示仍是被广泛研究的领域。现有方法利用量子纠缠来编码像素位置信息,并采用不同方式存储颜色信息:例如利用旋转门的角度参数(如 FRQI,Flexible Representation of Quantum Images)、利用量子比特序列(如 NEQR,Novel Enhanced Quantum Representation),或利用相位门的角度参数(如 LPIQE,Local Phase Image Quantum Encoding)。这些方法都显著地受到退相干及其他量子噪声的影响,而这些噪声是嘈杂中等规模量子(NISQ)时代量子计算不可分割的一部分。这些现象会严重影响测量结果,导致提取出的图像与原图在视觉上明显不同。由于这一过程本质上是量子的,对其进行计算上的逆转是可能的。现有许多误差纠正、缓解与消减方法,但它们都需要额外的量子计算时间或额外的量子比特才能达到预期效果。我们报告了将一个为图像到图像翻译训练的生成对抗网络与相位畸变还原(Phase Distortion Unraveling)误差消减方法结合使用,成功降低了以 LPIQE 编码的图像的总体误差。

Information-Theoretic Generalization Bounds for Transductive Learning and its Applications

  • paper_url: http://arxiv.org/abs/2311.04561
  • repo_url: None
  • paper_authors: Huayi Tang, Yong Liu
  • for: 首次在信息论框架下研究直推学习(transductive learning)算法的泛化界。
  • methods: 建立依赖于数据与算法的泛化界,证明直推学习算法的泛化差可由训练标签与假设之间的互信息控制;并创新性地提出直推超样本(transductive supersamples)的概念,从而超越归纳学习设定,给出基于多种信息度量的上界。
  • results: 推导出新的 PAC-Bayesian 界,并在直推学习设定下建立泛化与损失地形平坦性之间的联系;进一步给出自适应优化算法的上界,并将结果应用于半监督学习与图学习场景,在合成与真实数据集上验证了理论结论。
    Abstract In this paper, we develop data-dependent and algorithm-dependent generalization bounds for transductive learning algorithms in the context of information theory for the first time. We show that the generalization gap of transductive learning algorithms can be bounded by the mutual information between training labels and hypothesis. By innovatively proposing the concept of transductive supersamples, we go beyond the inductive learning setting and establish upper bounds in terms of various information measures. Furthermore, we derive novel PAC-Bayesian bounds and build the connection between generalization and loss landscape flatness under the transductive learning setting. Finally, we present the upper bounds for adaptive optimization algorithms and demonstrate the applications of results on semi-supervised learning and graph learning scenarios. Our theoretic results are validated on both synthetic and real-world datasets.
    摘要 在本文中,我们首次在信息论框架下为直推学习算法建立了依赖于数据与算法的泛化界。我们证明,直推学习算法的泛化差可以由训练标签与假设之间的互信息来界定。通过创新性地提出直推超样本(transductive supersamples)的概念,我们超越了归纳学习的设定,并基于多种信息度量建立了上界。此外,我们推导出新的 PAC-Bayesian 界,并在直推学习设定下建立了泛化与损失地形平坦性之间的联系。最后,我们给出了自适应优化算法的上界,并展示了结果在半监督学习与图学习场景中的应用。我们的理论结果在合成数据集与真实数据集上均得到了验证。
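
For intuition, the classical inductive counterpart of such results (Xu and Raginsky, 2017) bounds the expected generalization gap of a hypothesis $W$ trained on a sample $S$ of $n$ points with a $\sigma$-sub-Gaussian loss by a mutual-information term; the paper develops analogous bounds in the transductive setting, where the relevant quantity is the mutual information between the training labels and the hypothesis.

$$
\left|\,\mathbb{E}\big[\operatorname{gen}(W, S)\big]\,\right| \;\le\; \sqrt{\frac{2\sigma^{2}\, I(S; W)}{n}} .
$$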

Regression with Cost-based Rejection

  • paper_url: http://arxiv.org/abs/2311.04550
  • repo_url: None
  • paper_authors: Xin Cheng, Yuzhou Cao, Haobo Wang, Hongxin Wei, Bo An, Lei Feng
  • for: 本研究旨在解决 regression 问题中的 cost-based rejection 问题,其中模型可以根据 certain rejection costs 拒绝对某些示例进行预测。
  • methods: 我们首先给出该问题的期望风险,进而推导出贝叶斯最优解:在以均方误差为评价指标时,最优模型应当拒绝条件方差大于拒绝代价的样本。随后提出一种将拒绝视为二分类的替代损失函数用于训练,并给出模型一致性条件,表明所提替代损失可以恢复贝叶斯最优解。
  • results: 我们的实验结果表明,我们的提议的方法可以有效地解决 regression 问题中的 cost-based rejection 问题。
    Abstract Learning with rejection is an important framework that can refrain from making predictions to avoid critical mispredictions by balancing between prediction and rejection. Previous studies on cost-based rejection only focused on the classification setting, which cannot handle the continuous and infinite target space in the regression setting. In this paper, we investigate a novel regression problem called regression with cost-based rejection, where the model can reject to make predictions on some examples given certain rejection costs. To solve this problem, we first formulate the expected risk for this problem and then derive the Bayes optimal solution, which shows that the optimal model should reject to make predictions on the examples whose variance is larger than the rejection cost when the mean squared error is used as the evaluation metric. Furthermore, we propose to train the model by a surrogate loss function that considers rejection as binary classification and we provide conditions for the model consistency, which implies that the Bayes optimal solution can be recovered by our proposed surrogate loss. Extensive experiments demonstrate the effectiveness of our proposed method.
    摘要 带拒绝的学习是一个重要框架:通过在预测与拒绝之间权衡,模型可以放弃给出预测,从而避免代价高昂的错误预测。先前关于基于代价的拒绝的研究仅关注分类设定,无法处理回归设定中连续且无界的目标空间。本文研究一类新的回归问题,即带有基于代价的拒绝的回归:模型可以在给定拒绝代价的情况下拒绝对某些样本进行预测。为求解该问题,我们首先给出其期望风险,并推导出贝叶斯最优解,其表明在以均方误差为评价指标时,最优模型应当拒绝条件方差大于拒绝代价的样本。此外,我们提出用一种将拒绝视为二分类的替代损失函数来训练模型,并给出模型一致性条件,说明所提替代损失可以恢复贝叶斯最优解。大量实验验证了所提方法的有效性。
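
The Bayes-optimal behavior stated in the abstract can be written compactly: under the mean squared error, the optimal model predicts the conditional mean and rejects exactly when the conditional variance exceeds the rejection cost $c$:

$$
f^{*}(x) \;=\;
\begin{cases}
\text{reject}, & \operatorname{Var}[Y \mid X = x] > c,\\[2pt]
\mathbb{E}[Y \mid X = x], & \text{otherwise}.
\end{cases}
$$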

FEIR: Quantifying and Reducing Envy and Inferiority for Fair Recommendation of Limited Resources

  • paper_url: http://arxiv.org/abs/2311.04542
  • repo_url: https://github.com/aida-ugent/feir
  • paper_authors: Nan Li, Bo Kang, Jefrey Lijffijt, Tijl De Bie
  • for: 这篇论文关注电子招聘与在线交友等场景中的推荐系统:推荐意味着分配有限的机会,因此需要新的公平性度量以及基于这些度量的多目标优化方法。
  • methods: 提出一种新的(不)公平性度量“劣势”(inferiority),用于量化用户在其被推荐物品上所处的竞争劣势;它与度量对他人推荐偏好的“嫉妒”(envy)互补,并与衡量聚合相关性得分的准确性指标“效用”(utility)结合。由于这些度量不可微,作者利用推荐系统的概率解释将其转化为可微版本,并把这些损失函数组合为一个多目标优化问题 \texttt{FEIR}(Fairness through Envy and Inferiority Reduction),作为标准推荐系统的后处理方法。
  • results: 在合成与真实数据上的实验表明,与朴素推荐及基线方法相比,该方法改善了劣势、嫉妒与效用之间的权衡;实验以标准推荐系统为基础并对其输出进行后处理。
    Abstract In settings such as e-recruitment and online dating, recommendation involves distributing limited opportunities, calling for novel approaches to quantify and enforce fairness. We introduce \emph{inferiority}, a novel (un)fairness measure quantifying a user's competitive disadvantage for their recommended items. Inferiority complements \emph{envy}, a fairness notion measuring preference for others' recommendations. We combine inferiority and envy with \emph{utility}, an accuracy-related measure of aggregated relevancy scores. Since these measures are non-differentiable, we reformulate them using a probabilistic interpretation of recommender systems, yielding differentiable versions. We combine these loss functions in a multi-objective optimization problem called \texttt{FEIR} (Fairness through Envy and Inferiority Reduction), applied as post-processing for standard recommender systems. Experiments on synthetic and real-world data demonstrate that our approach improves trade-offs between inferiority, envy, and utility compared to naive recommendations and the baseline methods.
    摘要 在电子招聘与在线交友等场景中,推荐意味着分配有限的机会,因此需要新的方法来量化并落实公平性。我们引入一种新的(不)公平性度量“劣势”(inferiority),用于量化用户在其被推荐物品上所处的竞争劣势;它与度量对他人推荐偏好的“嫉妒”(envy)相互补充。我们将劣势、嫉妒与衡量聚合相关性得分的准确性指标“效用”(utility)结合起来。由于这些度量不可微,我们利用推荐系统的概率解释将其重新表述为可微版本,并将这些损失函数组合为一个多目标优化问题,称为 \texttt{FEIR}(Fairness through Envy and Inferiority Reduction),作为标准推荐系统的后处理方法。在合成与真实数据上的实验表明,与朴素推荐及基线方法相比,我们的方法改善了劣势、嫉妒与效用之间的权衡。

  • paper_url: http://arxiv.org/abs/2311.04537
  • repo_url: None
  • paper_authors: Ercong Yu, Jinle Zhu, Qiang Li, Zilong Liu, Hongyang Chen, Shlomo Shamai, H. Vincent Poor
  • for: 这篇论文关注多用户负载调制阵列(MU-LMA):其系统复杂度与成本较低,适用于毫米波(mmWave)多输入多输出(MIMO)系统。
  • methods: 构想了一种采用全阵列结构(FAS)发射机的 MU-LMA 系统,并据此提出两种算法:用于恒功率下行预编码的基于 FAS 的归一化块对角化(FAS-NBD)算法,以及用于自适应码本设计与不依赖码本解码的深度学习增强算法(FAS-DL-NBD)。
  • results: 两种算法对非理想信道状态信息(CSI)具有鲁棒性,并取得优异的误码性能;此外,随着每个码字比特数的增加,FAS-DL-NBD 算法仍能以低复杂度完成信号检测。
    Abstract This paper is focused on multiuser load modulation arrays (MU-LMAs) which are attractive due to their low system complexity and reduced cost for millimeter wave (mmWave) multi-input multi-output (MIMO) systems. The existing precoding algorithm for downlink MU-LMA relies on a sub-array structured (SAS) transmitter which may suffer from decreased degrees of freedom and complex system configuration. Furthermore, a conventional LMA codebook with codewords uniformly distributed on a hypersphere may not be channel-adaptive and may lead to increased signal detection complexity. In this paper, we conceive an MU-LMA system employing a full-array structured (FAS) transmitter and propose two algorithms accordingly. The proposed FAS-based system addresses the SAS structural problems and can support larger numbers of users. For LMA-imposed constant-power downlink precoding, we propose an FAS-based normalized block diagonalization (FAS-NBD) algorithm. However, the forced normalization may result in performance degradation. This degradation, together with the aforementioned codebook design problems, is difficult to solve analytically. This motivates us to propose a Deep Learning-enhanced (FAS-DL-NBD) algorithm for adaptive codebook design and codebook-independent decoding. It is shown that the proposed algorithms are robust to imperfect knowledge of channel state information and yield excellent error performance. Moreover, the FAS-DL-NBD algorithm enables signal detection with low complexity as the number of bits per codeword increases.
    摘要 本文关注多用户负载调制阵列(MU-LMA),其系统复杂度与成本较低,适用于毫米波(mmWave)多输入多输出(MIMO)系统。现有的下行 MU-LMA 预编码算法依赖子阵列结构(SAS)发射机,可能带来自由度降低与系统配置复杂的问题;此外,传统 LMA 码本的码字均匀分布在超球面上,缺乏对信道的自适应能力,并可能增加信号检测的复杂度。本文构想了一种采用全阵列结构(FAS)发射机的 MU-LMA 系统,并据此提出两种算法。基于 FAS 的系统解决了 SAS 的结构性问题,并可支持更多的用户。针对 LMA 要求的恒功率下行预编码,我们提出基于 FAS 的归一化块对角化(FAS-NBD)算法;然而,强制归一化可能导致性能损失,该损失连同上述码本设计问题都难以通过解析方法解决。这促使我们进一步提出深度学习增强的 FAS-DL-NBD 算法,用于自适应码本设计与不依赖码本的解码。结果表明,所提算法对非理想信道状态信息具有鲁棒性,并取得优异的误码性能;此外,随着每个码字比特数的增加,FAS-DL-NBD 算法仍能以低复杂度完成信号检测。
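
For context, the baseline flavor of block-diagonalization precoding can be sketched as below: each user's precoder is constrained to the null space of the other users' stacked channels, which removes inter-user interference. This is the textbook construction, not the FAS-NBD or FAS-DL-NBD algorithms proposed in the paper, and the unit-power normalization is a simplifying assumption.

```python
import numpy as np
from scipy.linalg import null_space

def block_diagonalization(H_list):
    """Textbook block diagonalization: user k's precoder lives in the null space
    of the stacked channels of all other users (zero inter-user interference)."""
    precoders = []
    for k, Hk in enumerate(H_list):
        H_others = np.vstack([H for i, H in enumerate(H_list) if i != k])
        Vk = null_space(H_others)                  # (Nt, n_k) null-space basis
        precoders.append(Vk / np.linalg.norm(Vk))  # unit total power (assumption)
    return precoders

# 3 users, 2 receive antennas each, 8 transmit antennas.
H_list = [np.random.randn(2, 8) + 1j * np.random.randn(2, 8) for _ in range(3)]
W = block_diagonalization(H_list)
# Interference check: H_k @ W_j should be ~0 for j != k.
print(abs(H_list[0] @ W[1]).max())
```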

An Unsupervised Deep Learning Approach for the Wave Equation Inverse Problem

  • paper_url: http://arxiv.org/abs/2311.04531
  • repo_url: None
  • paper_authors: Xiong-Bin Yan, Keke Wu, Zhi-Qin John Xu, Zheng Ma
  • for: 实时地球物理几何问题的高精度解析
  • methods: integrate deep neural networks and partial differential equations for solving full-waveform inversion problems
  • results: 提供了一个无监督学习的方法,可以实时地从测量数据中推断地球物理几何 Parameters,并且与传统方法比较,获得更好的结果。
    Abstract Full-waveform inversion (FWI) is a powerful geophysical imaging technique that infers high-resolution subsurface physical parameters by solving a non-convex optimization problem. However, due to limitations in observation, e.g., limited shots or receivers, and random noise, conventional inversion methods are confronted with numerous challenges, such as the local-minimum problem. In recent years, a substantial body of work has demonstrated that the integration of deep neural networks and partial differential equations for solving full-waveform inversion problems has shown promising performance. In this work, drawing inspiration from the expressive capacity of neural networks, we provide an unsupervised learning approach aimed at accurately reconstructing subsurface physical velocity parameters. This method is founded on a re-parametrization technique for Bayesian inference, achieved through a deep neural network with random weights. Notably, our proposed approach does not hinge upon the requirement of the labeled training dataset, rendering it exceedingly versatile and adaptable to diverse subsurface models. Extensive experiments show that the proposed approach performs noticeably better than existing conventional inversion methods.
    摘要 全波形反演(FWI)是一种强大的地球物理成像技术,通过求解一个非凸优化问题来推断高分辨率的地下物理参数。然而,受观测条件限制(例如有限的炮点或检波器)以及随机噪声的影响,传统反演方法面临诸多挑战,例如局部极小值问题。近年来,大量工作表明,将深度神经网络与偏微分方程相结合来求解全波形反演问题展现出良好的性能。在本工作中,受神经网络表达能力的启发,我们提出一种无监督学习方法,用于准确重建地下的速度参数。该方法建立在对贝叶斯推断的重参数化技术之上,通过带随机权重的深度神经网络来实现。值得注意的是,所提方法不依赖带标签的训练数据集,因而非常灵活,能够适应多种地下模型。大量实验表明,该方法的表现明显优于现有的传统反演方法。
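
The re-parametrization idea (a randomly initialized network standing in for the unknown velocity field, trained only against measured data) can be sketched as follows; `forward_solver` is a placeholder for a differentiable wave-equation forward model and, like the small MLP and the toy observation, is an assumption for illustration only.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Maps grid coordinates to a velocity value; the random init acts as the prior."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1), nn.Softplus())  # velocities > 0

    def forward(self, coords):                     # coords: (N, 2)
        return self.net(coords).squeeze(-1)

def forward_solver(velocity_field):
    """Placeholder for a differentiable wave-equation forward model (assumption)."""
    return velocity_field.mean(dim=0, keepdim=True)  # stands in for synthetic waveforms

coords = torch.rand(1024, 2)                       # grid points in the domain
d_obs = torch.tensor([2.5])                        # measured data (placeholder)

model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    opt.zero_grad()
    v = model(coords)                              # current velocity estimate on the grid
    loss = ((forward_solver(v) - d_obs) ** 2).mean()   # data misfit only, no labels
    loss.backward()
    opt.step()
```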

Bandit Learning to Rank with Position-Based Click Models: Personalized and Equal Treatments

  • paper_url: http://arxiv.org/abs/2311.04528
  • repo_url: None
  • paper_authors: Tianchen Zhou, Jia Liu, Yang Jiao, Chaosheng Dong, Yetian Chen, Yan Gao, Yi Sun
  • for: 这个论文主要针对的问题是在线学习排名(ONL2R),它是推荐系统的基础问题,在过去几年内受到了越来越多的关注。
  • methods: 提出了首个完整刻画带位置点击模型的 ONL2R 问题关键要素的多臂老虎机(MAB)框架;由于该问题的组合性质以及位置点击模型中的部分可观测性,设计基于 MAB 的 ONL2R 策略极具挑战。在该分析框架下,论文进一步给出两种统一的基于贪婪与 UCB 的策略 GreedyRank 与 UCBRank,均适用于个性化与同等对待两种排名方式。
  • results: 证明 GreedyRank 与 UCBRank 在个性化与同等对待场景下分别获得 $O(\sqrt{t}\ln t)$ 与 $O(\sqrt{t\ln t})$ 的任意时刻亚线性遗憾;对于本质上更困难的同等对待排名场景,还刻画了若干类集体效用函数及其充分条件,在这些条件下两种策略仍可分别达到上述遗憾界。数值实验验证了理论结果,并表明两种策略在多种问题设定下均能高效地寻找最优动作。
    Abstract Online learning to rank (ONL2R) is a foundational problem for recommender systems and has received increasing attention in recent years. Among the existing approaches for ONL2R, a natural modeling architecture is the multi-armed bandit framework coupled with the position-based click model. However, developing efficient online learning policies for MAB-based ONL2R with position-based click models is highly challenging due to the combinatorial nature of the problem, and partial observability in the position-based click model. To date, results in MAB-based ONL2R with position-based click models remain rather limited, which motivates us to fill this gap in this work. Our main contributions in this work are threefold: i) We propose the first general MAB framework that captures all key ingredients of ONL2R with position-based click models. Our model considers personalized and equal treatments in ONL2R ranking recommendations, both of which are widely used in practice; ii) Based on the above analytical framework, we develop two unified greed- and UCB-based policies called GreedyRank and UCBRank, each of which can be applied to personalized and equal ranking treatments; and iii) We show that both GreedyRank and UCBRank enjoy $O(\sqrt{t}\ln t)$ and $O(\sqrt{t\ln t})$ anytime sublinear regret for personalized and equal treatment, respectively. For the fundamentally hard equal ranking treatment, we identify classes of collective utility functions and their associated sufficient conditions under which $O(\sqrt{t}\ln t)$ and $O(\sqrt{t\ln t})$ anytime sublinear regrets are still achievable for GreedyRank and UCBRank, respectively. Our numerical experiments also verify our theoretical results and demonstrate the efficiency of GreedyRank and UCBRank in seeking the optimal action under various problem settings.
    摘要 在线学习排名(ONL2R)是推荐系统的基础问题,近年来受到越来越多的关注。在现有的 ONL2R 方法中,一种自然的建模方式是将多臂老虎机(MAB)框架与位置点击模型相结合。然而,由于该问题的组合性质以及位置点击模型中的部分可观测性,为这类问题设计高效的在线学习策略极具挑战;迄今为止,相关结果仍然相当有限,这促使我们在本工作中填补这一空白。我们的主要贡献有三:(i)提出了首个完整刻画带位置点击模型的 ONL2R 问题关键要素的 MAB 框架,该模型同时涵盖实践中广泛使用的个性化与同等对待两种排名方式;(ii)在上述分析框架下,给出了两种统一的基于贪婪与 UCB 的策略 GreedyRank 与 UCBRank,均可用于个性化与同等对待的排名场景;(iii)证明了 GreedyRank 与 UCBRank 在个性化与同等对待场景下分别享有 $O(\sqrt{t}\ln t)$ 与 $O(\sqrt{t\ln t})$ 的任意时刻亚线性遗憾;对于本质上更困难的同等对待排名场景,我们还刻画了若干类集体效用函数及其充分条件,在这些条件下两种策略仍可分别达到上述遗憾界。数值实验验证了理论结果,并展示了 GreedyRank 与 UCBRank 在多种问题设定下寻找最优动作的效率。
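
For intuition, a minimal UCB-style ranking policy under a position-based click model is sketched below: clicks are discounted by assumed-known position examination probabilities, and items are ranked by attractiveness UCBs. This illustrates the equal-treatment flavor of the problem and is not the GreedyRank/UCBRank construction from the paper; the examination probabilities and exploration constant are assumptions.

```python
import numpy as np

class PBMUCBRanker:
    """Rank items by UCB on attractiveness under a position-based click model:
    P(click item i at position p) = kappa[p] * theta[i], with kappa assumed known."""
    def __init__(self, n_items, kappa):
        self.kappa = np.asarray(kappa)             # examination probability per position
        self.clicks = np.zeros(n_items)
        self.exposure = np.full(n_items, 1e-6)     # sum of kappa over past placements
        self.t = 1

    def rank(self):
        theta_hat = self.clicks / self.exposure
        ucb = theta_hat + np.sqrt(1.5 * np.log(self.t) / self.exposure)
        return np.argsort(-ucb)[: len(self.kappa)] # fill the top-K positions

    def update(self, ranking, click_vector):
        for pos, item in enumerate(ranking):
            self.exposure[item] += self.kappa[pos]
            self.clicks[item] += click_vector[pos]
        self.t += 1

# Simulate: 10 items, 3 positions, unknown true attractiveness theta.
rng = np.random.default_rng(0)
theta, kappa = rng.uniform(0.1, 0.9, 10), [0.9, 0.6, 0.3]
ranker = PBMUCBRanker(10, kappa)
for _ in range(2000):
    r = ranker.rank()
    clicks = rng.random(3) < np.array(kappa) * theta[r]
    ranker.update(r, clicks.astype(float))
print("ranked items:", ranker.rank(), "true top-3:", np.argsort(-theta)[:3])
```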

Long-term Time Series Forecasting based on Decomposition and Neural Ordinary Differential Equations

  • paper_url: http://arxiv.org/abs/2311.04522
  • repo_url: None
  • paper_authors: Seonkyu Lim, Jaehyeon Park, Seojin Kim, Hyowon Wi, Haksoo Lim, Jinsung Jeon, Jeongwhan Choi, Noseong Park
  • for: 解决长期时间序列预测(LTSF)任务中的挑战:近年来基于线性的 LTSF 模型表现更好,揭示了基于 Transformer 的方法存在时间信息损失的问题;但线性方法本身过于简单,难以全面挖掘数据集的特性。
  • methods: 该论文提出了 LTSF-DNODE 模型,其基于线性常微分方程(ODEs)和时间序列分解方法,可以充分利用数据特点。
  • results: LTSF-DNODE 在多个真实数据集上优于基线模型;此外,论文还针对每个数据集分析了神经常微分方程(NODE)框架中正则化的影响。
    Abstract Long-term time series forecasting (LTSF) is a challenging task that has been investigated in various domains such as finance investment, health care, traffic, and weather forecasting. In recent years, Linear-based LTSF models showed better performance, pointing out the problem of Transformer-based approaches causing temporal information loss. However, Linear-based approach has also limitations that the model is too simple to comprehensively exploit the characteristics of the dataset. To solve these limitations, we propose LTSF-DNODE, which applies a model based on linear ordinary differential equations (ODEs) and a time series decomposition method according to data statistical characteristics. We show that LTSF-DNODE outperforms the baselines on various real-world datasets. In addition, for each dataset, we explore the impacts of regularization in the neural ordinary differential equation (NODE) framework.
    摘要 长期时间序列预测(LTSF)是一项具有挑战性的任务,已在金融投资、医疗、交通和天气预报等多个领域得到研究。近年来,基于线性的 LTSF 模型表现更好,揭示了基于 Transformer 的方法会造成时间信息损失的问题。然而,基于线性的方法也有局限:模型过于简单,无法全面利用数据集的特性。为解决这些局限,我们提出了 LTSF-DNODE,它将基于线性常微分方程(ODE)的模型与依据数据统计特性的时间序列分解方法相结合。实验显示 LTSF-DNODE 在多个真实数据集上优于基线模型。此外,我们还针对每个数据集探讨了神经常微分方程(NODE)框架中正则化的影响。
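
The decomposition front end (splitting a series into a slowly varying trend plus a remainder that downstream linear/NODE blocks can model separately) can be sketched with a simple moving-average decomposition; the kernel size and padding are assumptions, and the neural-ODE component itself is omitted.

```python
import numpy as np

def decompose(series, kernel=25):
    """Split a 1-D series into a moving-average trend and a remainder component."""
    pad = kernel // 2
    padded = np.concatenate([np.full(pad, series[0]), series, np.full(pad, series[-1])])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    return trend, series - trend

t = np.arange(500)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(500)
trend, remainder = decompose(series)
print(trend.shape, remainder.shape)                # both (500,)
```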

Adaptive Mirror Descent Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2311.04520
  • repo_url: None
  • paper_authors: Feihu Huang
  • for: 本文提出一类基于镜像下降的高效自适应双层优化方法,用于非凸双层优化问题:其上层问题非凸且可能带有非光滑正则项,下层问题非凸但满足 Polyak-{\L}ojasiewicz(PL)条件。
  • methods: 针对确定性双层问题,提出基于镜像下降的自适应投影辅助梯度方法(AdaPAG),证明其在寻找非凸双层问题的 $\epsilon$-驻点时达到目前已知最优的梯度复杂度 $O(\epsilon^{-1})$;针对随机双层问题,提出结合镜像下降与方差缩减技术的自适应随机投影辅助梯度方法(AdaVSPAG),证明其达到目前已知最优的梯度复杂度 $O(\epsilon^{-3/2})$。
  • results: 我们提供了一种有用的 convergence 分析框架,用于我们的方法,并证明其在某些轻量级的假设下具有 $O(\frac{1}{T})$ 的快速收敛率,其中 $T$ 是迭代次数。
    Abstract In the paper, we propose a class of efficient adaptive bilevel methods based on mirror descent for nonconvex bilevel optimization, where its upper-level problem is nonconvex possibly with nonsmooth regularization, and its lower-level problem is also nonconvex while satisfies Polyak-{\L}ojasiewicz (PL) condition. To solve these deterministic bilevel problems, we present an efficient adaptive projection-aid gradient (i.e., AdaPAG) method based on mirror descent, and prove that it obtains the best known gradient complexity of $O(\epsilon^{-1})$ for finding an $\epsilon$-stationary solution of nonconvex bilevel problems. To solve these stochastic bilevel problems, we propose an efficient adaptive stochastic projection-aid gradient (i.e., AdaVSPAG) methods based on mirror descent and variance-reduced techniques, and prove that it obtains the best known gradient complexity of $O(\epsilon^{-3/2})$ for finding an $\epsilon$-stationary solution. Since the PL condition relaxes the strongly convex, our algorithms can be used to nonconvex strongly-convex bilevel optimization. Theoretically, we provide a useful convergence analysis framework for our methods under some mild conditions, and prove that our methods have a fast convergence rate of $O(\frac{1}{T})$, where $T$ denotes the number of iterations.
    摘要 本文提出一类基于镜像下降的高效自适应双层优化方法,用于非凸双层优化问题:其上层问题非凸且可能带有非光滑正则项,下层问题非凸但满足 Polyak-{\L}ojasiewicz(PL)条件。针对确定性双层问题,我们提出基于镜像下降的高效自适应投影辅助梯度方法(AdaPAG),并证明其在寻找非凸双层问题的 $\epsilon$-驻点时达到目前已知最优的梯度复杂度 $O(\epsilon^{-1})$。针对随机双层问题,我们提出结合镜像下降与方差缩减技术的高效自适应随机投影辅助梯度方法(AdaVSPAG),并证明其达到目前已知最优的梯度复杂度 $O(\epsilon^{-3/2})$。由于 PL 条件放宽了强凸性,我们的算法也可用于非凸-强凸双层优化。在一些温和条件下,我们为所提方法给出了有用的收敛性分析框架,并证明其具有 $O(\frac{1}{T})$ 的快速收敛率,其中 $T$ 表示迭代次数。
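
For reference, a generic mirror-descent update with Bregman divergence $D_\psi$ induced by a mirror map $\psi$ has the form below; the adaptive step sizes, projection aids, and variance reduction used by AdaPAG/AdaVSPAG are built on top of updates of this type.

$$
x_{t+1} \;=\; \arg\min_{x \in \mathcal{X}} \Big\{ \langle g_t, x \rangle + \tfrac{1}{\eta_t}\, D_{\psi}(x, x_t) \Big\},
\qquad
D_{\psi}(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y), x - y \rangle .
$$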

Towards Democratizing AI: A Comparative Analysis of AI as a Service Platforms and the Open Space for Machine Learning Approach

  • paper_url: http://arxiv.org/abs/2311.04518
  • repo_url: None
  • paper_authors: Dennis Rall, Bernhard Bauer, Thomas Fraunholz
  • for: 旨在推动人工智能的真正普及:现有的 AI 即服务(AIaaS)平台仍存在不足,因此本文比较了若干流行的 AIaaS 平台,并归纳出实现人工智能真正普及所需的关键要求。
  • methods: 提出 “Open Space for Machine Learning” 平台,基于 Kubernetes、Kubeflow Pipelines 和 Ludwig 等前沿技术构建,以克服普及人工智能所面临的挑战。
  • results: 本研究的分析显示,自主主机选项、高可扩展性和开源性是普及人工智能的关键要求。此外,我们的方法比现有的AIaaS平台更加全面和有效地满足了普及人工智能的需求。
    Abstract Recent AI research has significantly reduced the barriers to apply AI, but the process of setting up the necessary tools and frameworks can still be a challenge. While AI-as-a-Service platforms have emerged to simplify the training and deployment of AI models, they still fall short of achieving true democratization of AI. In this paper, we aim to address this gap by comparing several popular AI-as-a-Service platforms and identifying the key requirements for a platform that can achieve true democratization of AI. Our analysis highlights the need for self-hosting options, high scalability, and openness. To address these requirements, we propose our approach: the "Open Space for Machine Learning" platform. Our platform is built on cutting-edge technologies such as Kubernetes, Kubeflow Pipelines, and Ludwig, enabling us to overcome the challenges of democratizing AI. We argue that our approach is more comprehensive and effective in meeting the requirements of democratizing AI than existing AI-as-a-Service platforms.
    摘要 现代人工智能研究已经大幅降低了应用人工智能的门槛,但是设置必要的工具和框架仍然是一个挑战。而AIaaS平台已经出现以简化训练和部署人工智能模型的过程,但它们仍然无法实现真正的人工智能民主化。在这篇论文中,我们想要解决这个差距,我们对多个流行的AIaaS平台进行比较,并确定了实现真正的民主化人工智能的关键需求。我们的分析表明,自主主机、可扩展性和开放性是必需的。为了解决这些需求,我们提出了我们的方法:“机器学习开放空间”平台。我们的平台基于最新的技术,如Kubernetes、Kubeflow Pipelines和Ludwig,使我们能够超越民主化人工智能的挑战。我们认为,我们的方法比现有的AIaaS平台更加全面和有效地满足了民主化人工智能的需求。

Strategies for Parallelizing the Big-Means Algorithm: A Comprehensive Tutorial for Effective Big Data Clustering

  • paper_url: http://arxiv.org/abs/2311.04517
  • repo_url: None
  • paper_authors: Ravil Mussabayev, Rustam Mussabayev
  • for: 这个研究旨在优化大量数据集 clustering 的 Big-means 算法,探讨了四种不同的并行策略。
  • methods: 对四种不同的并行化策略进行了广泛的实验评估,考察每种方法的计算效率、可扩展性与聚类性能,揭示其优势与局限。
  • results: 阐明了各并行化策略的优缺点,分析了计算效率与聚类质量之间的权衡及各种因素的影响;这些发现可为依据可用资源与数据集特性选择最佳并行化策略提供实用指导。
    Abstract This study focuses on the optimization of the Big-means algorithm for clustering large-scale datasets, exploring four distinct parallelization strategies. We conducted extensive experiments to assess the computational efficiency, scalability, and clustering performance of each approach, revealing their benefits and limitations. The paper also delves into the trade-offs between computational efficiency and clustering quality, examining the impacts of various factors. Our insights provide practical guidance on selecting the best parallelization strategy based on available resources and dataset characteristics, contributing to a deeper understanding of parallelization techniques for the Big-means algorithm.
    摘要 本研究聚焦于面向大规模数据集聚类的 Big-means 算法的优化,考察了四种不同的并行化策略。我们通过大量实验评估了每种方法的计算效率、可扩展性与聚类性能,揭示了它们的优势与局限。文章还深入讨论了计算效率与聚类质量之间的权衡,并考察了各种因素的影响。这些发现可为依据可用资源与数据集特性选择最佳并行化策略提供实用指导,有助于加深对 Big-means 算法并行化技术的理解。
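
One natural parallelization pattern for sample-based clustering of this kind is sketched below: worker processes run k-means on random samples in parallel and the centers with the lowest objective are kept. The sampling scheme, worker count, and use of scikit-learn's KMeans are illustrative assumptions, not the exact strategies benchmarked in the paper.

```python
import numpy as np
from multiprocessing import Pool
from sklearn.cluster import KMeans

def kmeans_on_sample(args):
    """Run k-means on a random sample of the big dataset; return (inertia, centers)."""
    X, k, sample_size, seed = args
    rng = np.random.default_rng(seed)
    sample = X[rng.choice(len(X), size=sample_size, replace=False)]
    km = KMeans(n_clusters=k, n_init=3, random_state=seed).fit(sample)
    return km.inertia_, km.cluster_centers_

def parallel_big_means(X, k=10, sample_size=5000, n_workers=4):
    jobs = [(X, k, sample_size, seed) for seed in range(n_workers)]
    with Pool(n_workers) as pool:
        results = pool.map(kmeans_on_sample, jobs)
    return min(results, key=lambda r: r[0])[1]     # centers with the best (lowest) objective

if __name__ == "__main__":
    X = np.random.randn(200_000, 16)
    centers = parallel_big_means(X)
    print(centers.shape)                           # (10, 16)
```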

Solution of FPK Equation for Stochastic Dynamics Subjected to Additive Gaussian Noise via Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2311.04511
  • repo_url: None
  • paper_authors: Amir H. Khodabakhsh, Seid H. Pourtakdoust
  • for: 求解高维 Fokker-Planck-Kolmogorov(FPK)方程,并利用物理规律构建深度学习网络(FPK-DP Net)。
  • methods: 提出物理信息网络 FPK-DP Net,将物理洞见(由物理定律导出的受约束微分方程)编码进深度神经网络;该网络是无网格的学习方法,无需任何事先仿真数据即可求解受加性高斯白噪声驱动的随机动力学的密度演化问题,并采用降维后的 FPK 方程,因而可用于高维实际问题。
  • results: 通过在五个不同基准问题上的数值实现,验证了 FPK-DP Net 的准确性与有效性。
    Abstract The Fokker-Plank-Kolmogorov (FPK) equation is an idealized model representing many stochastic systems commonly encountered in the analysis of stochastic structures as well as many other applications. Its solution thus provides an invaluable insight into the performance of many engineering systems. Despite its great importance, the solution of the FPK equation is still extremely challenging. For systems of practical significance, the FPK equation is usually high dimensional, rendering most of the numerical methods ineffective. In this respect, the present work introduces the FPK-DP Net as a physics-informed network that encodes the physical insights, i.e. the governing constrained differential equations emanated out of physical laws, into a deep neural network. FPK-DP Net is a mesh-free learning method that can solve the density evolution of stochastic dynamics subjected to additive white Gaussian noise without any prior simulation data and can be used as an efficient surrogate model afterward. FPK-DP Net uses the dimension-reduced FPK equation. Therefore, it can be used to address high-dimensional practical problems as well. To demonstrate the potential applicability of the proposed framework, and to study its accuracy and efficacy, numerical implementations on five different benchmark problems are investigated.
    摘要 Fokker-Planck-Kolmogorov(FPK)方程是一种理想化模型,刻画了随机结构分析及许多其他应用中常见的随机系统,其解能够为众多工程系统的性能提供宝贵的洞见。尽管意义重大,FPK 方程的求解仍然极具挑战性:对于具有实际意义的系统,FPK 方程通常是高维的,使得大多数数值方法失效。为此,本文提出 FPK-DP Net,这是一种将物理洞见(即由物理定律导出的受约束微分方程)编码进深度神经网络的物理信息网络。FPK-DP Net 是一种无网格的学习方法,无需任何事先仿真数据即可求解受加性高斯白噪声驱动的随机动力学的密度演化问题,并可在之后作为高效的代理模型使用。由于 FPK-DP Net 采用降维后的 FPK 方程,它同样适用于高维实际问题。为展示所提框架的潜在适用性并考察其精度与有效性,本文在五个不同的基准问题上进行了数值实现。
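
For reference, for stochastic dynamics $d\mathbf{X} = \mathbf{f}(\mathbf{X},t)\,dt + \boldsymbol{\sigma}\,d\mathbf{W}$ driven by additive Gaussian noise, the FPK equation governs the evolution of the probability density $p(\mathbf{x},t)$; FPK-DP Net encodes a (dimension-reduced) residual of this equation into its training loss.

$$
\frac{\partial p(\mathbf{x},t)}{\partial t}
= -\sum_{i} \frac{\partial}{\partial x_{i}}\big[f_{i}(\mathbf{x},t)\,p(\mathbf{x},t)\big]
+ \frac{1}{2}\sum_{i,j} D_{ij}\,\frac{\partial^{2} p(\mathbf{x},t)}{\partial x_{i}\,\partial x_{j}},
\qquad D = \boldsymbol{\sigma}\boldsymbol{\sigma}^{\top}.
$$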

Constrained Adaptive Attacks: Realistic Evaluation of Adversarial Examples and Robust Training of Deep Neural Networks for Tabular Data

  • paper_url: http://arxiv.org/abs/2311.04503
  • repo_url: None
  • paper_authors: Thibault Simonetto, Salah Ghamizi, Antoine Desjardins, Maxime Cordy, Yves Le Traon
  • for: 评估表格数据上深度学习模型的对抗鲁棒性,以及其在不同攻击者能力水平下的表现。
  • methods: 提出 CAA,即首个面向带约束的表格深度学习模型的高效规避攻击(evasion attack)。CAA 是一种无需调参的迭代攻击,结合梯度攻击与搜索攻击,在约束条件下生成对抗样本。
  • results: 利用 CAA 构建了覆盖信用评分、钓鱼检测与僵尸网络攻击检测三个常见应用场景的深度表格模型鲁棒性基准,支持十种攻击者能力递增的威胁模型,并反映各场景下的真实攻击情形。结果表明,领域知识、对抗训练与攻击预算等因素对深度表格模型的鲁棒性评估有重要影响,并为安全从业者提供了一组提升深度表格模型抵御各类规避攻击能力的建议。
    Abstract State-of-the-art deep learning models for tabular data have recently achieved acceptable performance to be deployed in industrial settings. However, the robustness of these models remains scarcely explored. Contrary to computer vision, there is to date no realistic protocol to properly evaluate the adversarial robustness of deep tabular models due to intrinsic properties of tabular data such as categorical features, immutability, and feature relationship constraints. To fill this gap, we propose CAA, the first efficient evasion attack for constrained tabular deep learning models. CAA is an iterative parameter-free attack that combines gradient and search attacks to generate adversarial examples under constraints. We leverage CAA to build a benchmark of deep tabular models across three popular use cases: credit scoring, phishing and botnet attacks detection. Our benchmark supports ten threat models with increasing capabilities of the attacker, and reflects real-world attack scenarios for each use case. Overall, our results demonstrate how domain knowledge, adversarial training, and attack budgets impact the robustness assessment of deep tabular models and provide security practitioners with a set of recommendations to improve the robustness of deep tabular models against various evasion attack scenarios.
    摘要 面向表格数据的先进深度学习模型近来已达到可部署于工业场景的性能水平,但其鲁棒性仍鲜有研究。与计算机视觉不同,由于表格数据的内在特性(如类别特征、不可变特征以及特征间的关系约束),目前尚无切合实际的协议来评估深度表格模型的对抗鲁棒性。为填补这一空白,我们提出 CAA,即首个面向带约束的表格深度学习模型的高效规避攻击。CAA 是一种无需调参的迭代攻击,结合梯度攻击与搜索攻击,在约束条件下生成对抗样本。我们利用 CAA 构建了覆盖信用评分、钓鱼检测与僵尸网络攻击检测三个常见应用场景的深度表格模型基准。该基准支持十种攻击者能力递增的威胁模型,并反映各应用场景下的真实攻击情形。总体而言,我们的结果展示了领域知识、对抗训练与攻击预算如何影响深度表格模型的鲁棒性评估,并为安全从业者提供了一组建议,以提升深度表格模型在各类规避攻击场景下的鲁棒性。
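
The core difficulty of crafting adversarial examples that respect tabular constraints can be illustrated with a projected-gradient sketch that freezes immutable features and clips mutable ones to valid ranges. This is a generic constrained PGD illustration, not the CAA attack itself (which additionally combines gradient and search attacks and handles feature-relationship constraints); the mask, bounds, and toy model are assumptions.

```python
import torch

def constrained_pgd(model, x, y, mutable_mask, lower, upper,
                    eps=0.1, alpha=0.01, steps=40):
    """PGD on tabular inputs: only mutable features move, and values stay in [lower, upper]."""
    x_adv = x.clone()
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign() * mutable_mask   # immutable features frozen
            x_adv = torch.clamp(x_adv, x - eps, x + eps)         # stay in the eps-ball
            x_adv = torch.clamp(x_adv, lower, upper)             # feature-range constraints
    return x_adv.detach()

# Toy usage with a linear classifier on 6 features, 2 of them immutable.
model = torch.nn.Linear(6, 2)
x = torch.rand(8, 6)
y = torch.randint(0, 2, (8,))
mask = torch.tensor([1., 1., 0., 1., 0., 1.])      # 0 = immutable (e.g., identity fields)
x_adv = constrained_pgd(model, x, y, mask, lower=torch.zeros(6), upper=torch.ones(6))
```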

Autonomous Advanced Aerial Mobility – An End-to-end Autonomy Framework for UAVs and Beyond

  • paper_url: http://arxiv.org/abs/2311.04472
  • repo_url: None
  • paper_authors: Sakshi Mishra, Praveen Palanisamy
  • for: 本研究旨在开拓全自动无人飞行器在交通领域的应用,包括城市空中交通、快递、监测等领域。
  • methods: 本文提出了一个扩展和可扩展的自主框架,包括感知、识别、规划和控制四个主要块,以实现全自动无人飞行器的飞行和任务执行。
  • results: 本文对多种应用场景进行了分析和评估,并探讨了多个自主飞行器群体操作和管理的挑战和机遇,以及自主飞行系统的测试、验证和证明问题。
    Abstract Developing aerial robots that can both safely navigate and execute assigned mission without any human intervention - i.e., fully autonomous aerial mobility of passengers and goods - is the larger vision that guides the research, design, and development efforts in the aerial autonomy space. However, it is highly challenging to concurrently operationalize all types of aerial vehicles that are operating fully autonomously sharing the airspace. Full autonomy of the aerial transportation sector includes several aspects, such as design of the technology that powers the vehicles, operations of multi-agent fleets, and process of certification that meets stringent safety requirements of aviation sector. Thereby, Autonomous Advanced Aerial Mobility is still a vague term and its consequences for researchers and professionals are ambiguous. To address this gap, we present a comprehensive perspective on the emerging field of autonomous advanced aerial mobility, which involves the use of unmanned aerial vehicles (UAVs) and electric vertical takeoff and landing (eVTOL) aircraft for various applications, such as urban air mobility, package delivery, and surveillance. The article proposes a scalable and extensible autonomy framework consisting of four main blocks: sensing, perception, planning, and controls. Furthermore, the article discusses the challenges and opportunities in multi-agent fleet operations and management, as well as the testing, validation, and certification aspects of autonomous aerial systems. Finally, the article explores the potential of monolithic models for aerial autonomy and analyzes their advantages and limitations. The perspective aims to provide a holistic picture of the autonomous advanced aerial mobility field and its future directions.
    摘要 研发既能安全导航又能在无人干预下执行指定任务的空中机器人,即实现乘客与货物的完全自主空中出行,是引领空中自主领域研究、设计与开发工作的宏大愿景。然而,让各类完全自主运行的飞行器同时共享空域是极具挑战性的。空中运输领域的完全自主涉及多个方面,例如驱动飞行器的技术设计、多智能体机队的运行,以及满足航空领域严格安全要求的认证流程。因此,“自主高级空中出行”仍是一个模糊的概念,其对研究者与从业者的影响尚不明确。为弥补这一空白,我们对自主高级空中出行这一新兴领域给出了一个全面的视角,其应用涵盖利用无人机(UAV)与电动垂直起降(eVTOL)飞行器实现城市空中交通、包裹配送与监测等场景。文章提出了一个可扩展且可延伸的自主框架,由传感、感知、规划与控制四个主要模块组成;进而讨论了多智能体机队运行与管理方面的挑战与机遇,以及自主空中系统的测试、验证与认证问题;最后探讨了单体(monolithic)模型用于空中自主性的潜力,并分析其优势与局限。该视角旨在为自主高级空中出行领域及其未来方向勾勒一幅整体图景。

Solving High Frequency and Multi-Scale PDEs with Gaussian Processes

  • paper_url: http://arxiv.org/abs/2311.04465
  • repo_url: None
  • paper_authors: Shikai Fang, Madison Cooley, Da Long, Shibo Li, Robert Kirby, Shandian Zhe
  • for: 该文章旨在解决基于机器学习的方法(特别是物理信息神经网络 PINNs)求解偏微分方程(PDE)时因训练中的频谱偏差而带来的精度与效率问题。
  • methods: 作者转向高斯过程(GP)框架,用 student t 混合或高斯混合来建模 PDE 解的功率谱,再通过傅里叶逆变换得到协方差函数,并在对数域用 Jeffreys 先验估计混合权重;同时将配点布置在网格上,利用 GP 条件均值预测解及其导数。
  • results: 作者在系统性实验中展示了该方法的优势,包括无频谱偏差地求解高频与多尺度 PDE,以及借助 Kronecker 乘积性质与多重线性代数提升计算效率与可扩展性。
    Abstract Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a student t mixture or Gaussian mixture. We then apply the inverse Fourier transform to obtain the covariance function (according to the Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. We are the first to discover its rationale and effectiveness for PDE solving. Next,we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining toward the ground truth. Third, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid, and multiply our covariance function at each input dimension. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to greatly promote computational efficiency and scalability, without any low-rank approximations. We show the advantage of our method in systematic experiments.
    摘要 基于机器学习的求解器在物理仿真与科学计算中备受关注,物理信息神经网络(PINNs)便是突出的例子。然而,PINNs 往往难以求解高频与多尺度的偏微分方程(PDE),其原因之一是神经网络训练中的频谱偏差。为解决这一问题,我们转向高斯过程(GP)框架。为灵活刻画主导频率,我们用 student t 混合或高斯混合来建模 PDE 解的功率谱,随后依据 Wiener-Khinchin 定理,通过傅里叶逆变换得到协方差函数。由高斯混合谱导出的协方差对应于著名的谱混合核,我们首次揭示了其在 PDE 求解中的原理与有效性。其次,我们在对数域中估计混合权重,并证明这等价于施加 Jeffreys 先验:它会自动诱导稀疏性、剪除多余频率,并将保留的频率调整至真实值附近。第三,为在海量配点(对捕捉高频至关重要)上实现高效且可扩展的计算,我们将配点布置在网格上,并在每个输入维度上对协方差函数取乘积,同时利用 GP 条件均值预测解及其导数,以拟合边界条件与方程本身。由此,协方差矩阵呈现 Kronecker 乘积结构;我们利用 Kronecker 乘积性质与多重线性代数大幅提升计算效率与可扩展性,而无需任何低秩近似。系统性的实验展示了我们方法的优势。
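
Concretely, in one dimension, modeling the (symmetrized) power spectrum as a mixture of $Q$ Gaussians with weights $w_q$, means $\mu_q$, and variances $\sigma_q^2$ and applying the inverse Fourier transform (Wiener-Khinchin) yields the known spectral mixture kernel referenced in the abstract:

$$
k(\tau) \;=\; \sum_{q=1}^{Q} w_{q}\, \exp\!\big(-2\pi^{2}\tau^{2}\sigma_{q}^{2}\big)\,\cos\!\big(2\pi \mu_{q} \tau\big),
\qquad \tau = x - x' .
$$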

A Hierarchical Spatial Transformer for Massive Point Samples in Continuous Space

  • paper_url: http://arxiv.org/abs/2311.04434
  • repo_url: https://github.com/spatialdatasciencegroup/hst
  • paper_authors: Wenchong He, Zhe Jiang, Tingsong Xiao, Zelin Xu, Shigang Chen, Ronald Fick, Miles Medina, Christine Angelini
  • for: 面向连续空间中的海量(可达百万级)点样本,这类数据广泛存在于环境科学(如传感器观测)、数值模拟(如含粒子流动、天体物理)与基于位置的服务(如 POI 与轨迹)中。为此类数据设计 Transformer 需要应对若干挑战:连续空间中不规则点之间隐式的长程与多尺度依赖、非均匀的点分布、在海量点上计算全对注意力的高计算代价,以及点密度变化带来的过度自信预测风险。
  • methods: 提出一种新的层次空间 Transformer:在四叉树层级内进行多分辨率表示学习,并通过粗粒度近似实现高效的空间注意力;同时设计了不确定性量化分支,用于估计与输入特征噪声及点稀疏程度相关的预测置信度。
  • results: 给出了计算时间复杂度与内存开销的理论分析;在真实与合成数据集上的大量实验表明,该方法在预测精度上优于多个基线,并可在单块 NVIDIA A100 GPU 上扩展至一百万个点。代码见 https://github.com/spatialdatasciencegroup/HST。
    Abstract Transformers are widely used deep learning architectures. Existing transformers are mostly designed for sequences (texts or time series), images or videos, and graphs. This paper proposes a novel transformer model for massive (up to a million) point samples in continuous space. Such data are ubiquitous in environment sciences (e.g., sensor observations), numerical simulations (e.g., particle-laden flow, astrophysics), and location-based services (e.g., POIs and trajectories). However, designing a transformer for massive spatial points is non-trivial due to several challenges, including implicit long-range and multi-scale dependency on irregular points in continuous space, a non-uniform point distribution, the potential high computational costs of calculating all-pair attention across massive points, and the risks of over-confident predictions due to varying point density. To address these challenges, we propose a new hierarchical spatial transformer model, which includes multi-resolution representation learning within a quad-tree hierarchy and efficient spatial attention via coarse approximation. We also design an uncertainty quantification branch to estimate prediction confidence related to input feature noise and point sparsity. We provide a theoretical analysis of computational time complexity and memory costs. Extensive experiments on both real-world and synthetic datasets show that our method outperforms multiple baselines in prediction accuracy and our model can scale up to one million points on one NVIDIA A100 GPU. The code is available at \url{https://github.com/spatialdatasciencegroup/HST}.
    摘要 Transformer 是应用广泛的深度学习架构,现有的 Transformer 大多面向序列(文本或时间序列)、图像、视频以及图数据设计。本文提出一种新的 Transformer 模型,用于处理连续空间中的海量(可达百万级)点样本。这类数据广泛存在于环境科学(如传感器观测)、数值模拟(如含粒子流动、天体物理)以及基于位置的服务(如 POI 与轨迹)中。然而,为海量空间点设计 Transformer 并非易事,其挑战包括:连续空间中不规则点之间隐式的长程与多尺度依赖、非均匀的点分布、在海量点上计算全对注意力可能带来的高计算代价,以及点密度变化导致的过度自信预测风险。为应对这些挑战,我们提出一种新的层次空间 Transformer:在四叉树层级内进行多分辨率表示学习,并通过粗粒度近似实现高效的空间注意力;我们还设计了不确定性量化分支,用于估计与输入特征噪声及点稀疏程度相关的预测置信度。我们给出了计算时间复杂度与内存开销的理论分析。在真实与合成数据集上的大量实验表明,我们的方法在预测精度上优于多个基线,且模型可在单块 NVIDIA A100 GPU 上扩展至一百万个点。代码见 \url{https://github.com/spatialdatasciencegroup/HST}。
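
The multi-resolution backbone (a quad-tree over irregular 2-D points, with coarser cells summarizing many points) can be sketched as below; the capacity threshold, maximum depth, and centroid summary are illustrative choices, and the attention layers built on top of the hierarchy are omitted.

```python
import numpy as np

class QuadTreeNode:
    """Recursively partition 2-D points into four children until few points remain."""
    def __init__(self, points, bounds, capacity=32, depth=0, max_depth=8):
        self.bounds = bounds                       # (xmin, ymin, xmax, ymax)
        self.points = points
        self.centroid = points.mean(axis=0) if len(points) else None  # coarse summary
        self.children = []
        if len(points) > capacity and depth < max_depth:
            xmin, ymin, xmax, ymax = bounds
            xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
            quadrants = [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                         (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]
            for (x0, y0, x1, y1) in quadrants:
                mask = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
                        (points[:, 1] >= y0) & (points[:, 1] < y1))
                if mask.any():
                    self.children.append(QuadTreeNode(points[mask], (x0, y0, x1, y1),
                                                      capacity, depth + 1, max_depth))

points = np.random.rand(100_000, 2)
root = QuadTreeNode(points, (0.0, 0.0, 1.0, 1.0))
print(len(root.children))                          # 4 quadrants at the first level
```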

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs

  • paper_url: http://arxiv.org/abs/2311.04417
  • repo_url: None
  • paper_authors: Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker, Ang Li
  • for: 现代 AI/ML 算法日益增长的复杂度与计算需求,促使人们设计专用硬件加速器,以提升 AI/ML 任务的性能与能效。
  • methods: 对 Graphcore 智能处理单元(IPU)、Sambanova 可重构数据流单元(RDU)以及增强的 GPU 平台等商业加速器进行初步评估与比较,剖析其硬件与软件设计特点(包括创新的数据流体系结构及其他设计优化)。
  • results: 通过在常见 DNN 算子及其他 AI/ML 工作负载上的一系列基准测试,阐明数据流体系结构相对传统处理器设计的优势,并给出各平台的性能权衡;这些发现可为研究原型的设计与性能预期提供参考,助力面向不断演进的 AI/ML 应用研发下一代加速器。
    Abstract The relentless advancement of artificial intelligence (AI) and machine learning (ML) applications necessitates the development of specialized hardware accelerators capable of handling the increasing complexity and computational demands. Traditional computing architectures, based on the von Neumann model, are being outstripped by the requirements of contemporary AI/ML algorithms, leading to a surge in the creation of accelerators like the Graphcore Intelligence Processing Unit (IPU), Sambanova Reconfigurable Dataflow Unit (RDU), and enhanced GPU platforms. These hardware accelerators are characterized by their innovative data-flow architectures and other design optimizations that promise to deliver superior performance and energy efficiency for AI/ML tasks. This research provides a preliminary evaluation and comparison of these commercial AI/ML accelerators, delving into their hardware and software design features to discern their strengths and unique capabilities. By conducting a series of benchmark evaluations on common DNN operators and other AI/ML workloads, we aim to illuminate the advantages of data-flow architectures over conventional processor designs and offer insights into the performance trade-offs of each platform. The findings from our study will serve as a valuable reference for the design and performance expectations of research prototypes, thereby facilitating the development of next-generation hardware accelerators tailored for the ever-evolving landscape of AI/ML applications. Through this analysis, we aspire to contribute to the broader understanding of current accelerator technologies and to provide guidance for future innovations in the field.
    摘要 人工智能(AI)和机器学习(ML)应用的不断发展需要特化的硬件加速器来处理不断增长的复杂性和计算需求。传统的计算架构,基于 von Neumann 模型,被当代 AI/ML 算法的要求所超越,导致加速器的创造,如 Graphcore 智能处理器(IPU)、Sambanova 可重新配置数据流处理器(RDU)和增强 GPU 平台。这些硬件加速器具有创新的数据流架构和其他设计优化,以提供对 AI/ML 任务的超越性性能和能效性。本研究提供了这些商业 AI/ML 加速器的初步评估和比较,探讨其硬件和软件设计特点,以确定它们的优势和特殊能力。通过对常见深度神经网络(DNN)操作和其他 AI/ML 工作负荷进行 benchmark 评估,我们希望通过探讨数据流架构的优势和传统处理器设计的缺陷,为研究人员提供参考,以便设计和实现下一代特化于 AI/ML 应用的硬件加速器。我们的研究结果将对广泛的硬件加速器技术产生影响,并为未来在这个领域的创新提供指导。

Likelihood Ratio Confidence Sets for Sequential Decision Making

  • paper_url: http://arxiv.org/abs/2311.04402
  • repo_url: None
  • paper_authors: Nicolas Emmenegger, Mojmír Mutný, Andreas Krause
  • for: This paper aims to provide certifiable, adaptive uncertainty estimates for sequential decision-making algorithms.
  • methods: The method revisits the likelihood-based inference principle and uses likelihood ratios to construct any-time valid confidence sequences, without requiring specialized treatment for each application scenario.
  • results: The method is especially suitable for problems with well-specified likelihoods, and the resulting sets always maintain the prescribed coverage in a model-agnostic manner. The sequence of estimators entering the likelihood ratio can be chosen in a provably good way, with connections to online convex optimization algorithms such as Follow-the-Regularized-Leader. Moreover, a reweighting scheme allows deployment in non-parametric settings such as RKHS function classes.
    Abstract Certifiable, adaptive uncertainty estimates for unknown quantities are an essential ingredient of sequential decision-making algorithms. Standard approaches rely on problem-dependent concentration results and are limited to a specific combination of parameterization, noise family, and estimator. In this paper, we revisit the likelihood-based inference principle and propose to use likelihood ratios to construct any-time valid confidence sequences without requiring specialized treatment in each application scenario. Our method is especially suitable for problems with well-specified likelihoods, and the resulting sets always maintain the prescribed coverage in a model-agnostic manner. The size of the sets depends on a choice of estimator sequence in the likelihood ratio. We discuss how to provably choose the best sequence of estimators and shed light on connections to online convex optimization with algorithms such as Follow-the-Regularized-Leader. To counteract the initially large bias of the estimators, we propose a reweighting scheme that also opens up deployment in non-parametric settings such as RKHS function classes. We provide a non-asymptotic analysis of the likelihood ratio confidence sets size for generalized linear models, using insights from convex duality and online learning. We showcase the practical strength of our method on generalized linear bandit problems, survival analysis, and bandits with various additive noise distributions.
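
The abstract's recipe, a predictable estimator sequence plugged into a running likelihood ratio with Ville's inequality supplying anytime-valid coverage, can be illustrated on a toy Bernoulli-mean problem. The sketch below uses a smoothed running mean as the plug-in estimator and evaluates the ratio on a parameter grid; both choices are simplifying assumptions for illustration, not the paper's FTRL-style estimator sequence or its reweighting scheme.

```python
import numpy as np

def lr_confidence_sets(x, alpha=0.05, grid=None):
    """Anytime-valid confidence sets for a Bernoulli mean via a plug-in
    likelihood ratio: theta_hat at step t uses only x_1..x_{t-1}, so the
    running ratio is a nonnegative martingale under the true parameter and
    Ville's inequality gives uniform (1 - alpha) coverage over all times."""
    if grid is None:
        grid = np.linspace(1e-3, 1 - 1e-3, 999)       # candidate parameters
    log_ratio = np.zeros_like(grid)                   # running log-LR against each candidate
    threshold = np.log(1.0 / alpha)
    sets, n_ones = [], 0
    for t, xt in enumerate(x):
        theta_hat = (n_ones + 0.5) / (t + 1.0)        # predictable, smoothed plug-in estimate
        log_ratio += (xt * np.log(theta_hat) + (1 - xt) * np.log(1 - theta_hat)
                      - xt * np.log(grid) - (1 - xt) * np.log(1 - grid))
        # Keep candidates not yet rejected; one may also intersect with earlier
        # sets, and coverage holds either way.
        sets.append(grid[log_ratio <= threshold])
        n_ones += xt
    return sets

rng = np.random.default_rng(1)
data = rng.binomial(1, 0.3, size=500)
sets = lr_confidence_sets(data)
print(sets[-1].min(), sets[-1].max())                 # a set concentrating around the true mean 0.3
```

Because `theta_hat` at step t depends only on earlier observations, the accumulated ratio has unit conditional expectation at the true parameter, so the sets are valid simultaneously at every stopping time, which is exactly the property a sequential decision-making algorithm needs.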