cs.LG - 2023-10-17

  • paper_url: http://arxiv.org/abs/2310.11612
  • repo_url: https://github.com/yimuwangcs/Better_Cross_Modal_Retrieval
  • paper_authors: Yimu Wang, Xiangru Jian, Bo Xue
  • for: This work addresses the hubness problem in cross-modal retrieval, where a small number of gallery data points appear in retrieval results far too often, degrading retrieval performance.
  • methods: We propose a new framework, Dual Bank Normalization (DBNorm), together with two new methods, dual inverted softmax and dual dynamic inverted softmax, which normalize similarity based on two banks. These methods reduce the similarity between hubs and query samples while improving the similarity between non-hubs and query samples.
  • results: Extensive experiments on diverse language-grounded benchmarks, including text-image, text-video, and text-audio, show that our approach addresses hubness better than previous methods and boosts retrieval performance. Our code is available at https://github.com/yimuwangcs/Better_Cross_Modal_Retrieval.
    Abstract In this work, we present a post-processing solution to address the hubness problem in cross-modal retrieval, a phenomenon where a small number of gallery data points are frequently retrieved, resulting in a decline in retrieval performance. We first theoretically demonstrate the necessity of incorporating both the gallery and query data for addressing hubness as hubs always exhibit high similarity with gallery and query data. Second, building on our theoretical results, we propose a novel framework, Dual Bank Normalization (DBNorm). While previous work has attempted to alleviate hubness by only utilizing the query samples, DBNorm leverages two banks constructed from the query and gallery samples to reduce the occurrence of hubs during inference. Next, to complement DBNorm, we introduce two novel methods, dual inverted softmax and dual dynamic inverted softmax, for normalizing similarity based on the two banks. Specifically, our proposed methods reduce the similarity between hubs and queries while improving the similarity between non-hubs and queries. Finally, we present extensive experimental results on diverse language-grounded benchmarks, including text-image, text-video, and text-audio, demonstrating the superior performance of our approaches compared to previous methods in addressing hubness and boosting retrieval performance. Our code is available at https://github.com/yimuwangcs/Better_Cross_Modal_Retrieval.
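To make the bank-based normalization idea concrete, here is a minimal NumPy sketch of inverted-softmax-style re-scoring with two banks. The exact DBNorm formulas are given in the paper; in particular the rule for combining the two denominators below is an illustrative assumption, not the authors' formulation.

```python
import numpy as np

def softmax_denominator(sim, beta):
    """Sum of exp(beta * sim) over the bank axis (axis 0)."""
    return np.exp(beta * sim).sum(axis=0)

def dual_bank_normalize(sim_q2g, sim_qbank2g, sim_gbank2g, beta=20.0):
    """Hub-aware re-scoring of query-to-gallery similarities.

    sim_q2g:     (Q, G) raw similarities between queries and gallery items
    sim_qbank2g: (Bq, G) similarities between a bank of query samples and the gallery
    sim_gbank2g: (Bg, G) similarities between a bank of gallery samples and the gallery

    Gallery items that are similar to everything in the banks (hubs) receive a large
    denominator and are down-weighted; summing the two denominators is an illustrative
    choice, not the paper's exact rule.
    """
    denom = softmax_denominator(sim_qbank2g, beta) + softmax_denominator(sim_gbank2g, beta)
    return np.exp(beta * sim_q2g) / denom  # shape (Q, G)

# toy usage with random unit-norm embeddings
rng = np.random.default_rng(0)
q, g = rng.normal(size=(5, 64)), rng.normal(size=(100, 64))
q /= np.linalg.norm(q, axis=1, keepdims=True)
g /= np.linalg.norm(g, axis=1, keepdims=True)
scores = dual_bank_normalize(q @ g.T, q @ g.T, g[:20] @ g.T)
ranking = scores.argsort(axis=1)[:, ::-1]  # per-query gallery ranking
```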

In defense of parameter sharing for model-compression

  • paper_url: http://arxiv.org/abs/2310.11611
  • repo_url: None
  • paper_authors: Aditya Desai, Anshumali Shrivastava
  • for: This paper comprehensively evaluates methods for reducing a model's memory footprint and argues for wider adoption of randomized parameter-sharing (RPS) in model compression.
  • methods: The paper compares RPS, pruning techniques, and simply building smaller models as ways to reduce memory footprint.
  • results: RPS consistently outperforms or matches smaller models and moderately informed pruning strategies across the entire compression range, with the advantage most pronounced in high-compression settings. The paper also identifies and provably fixes stability and Pareto-continuity issues in the state-of-the-art RPS technique ROAST, yielding STABLE-RPS.
    Abstract When considering a model architecture, there are several ways to reduce its memory footprint. Historically, popular approaches included selecting smaller architectures and creating sparse networks through pruning. More recently, randomized parameter-sharing (RPS) methods have gained traction for model compression at start of training. In this paper, we comprehensively assess the trade-off between memory and accuracy across RPS, pruning techniques, and building smaller models. Our findings demonstrate that RPS, which is both data and model-agnostic, consistently outperforms/matches smaller models and all moderately informed pruning strategies, such as MAG, SNIP, SYNFLOW, and GRASP, across the entire compression range. This advantage becomes particularly pronounced in higher compression scenarios. Notably, even when compared to highly informed pruning techniques like Lottery Ticket Rewinding (LTR), RPS exhibits superior performance in high compression settings. This points out inherent capacity advantage that RPS enjoys over sparse models. Theoretically, we establish RPS as a superior technique in terms of memory-efficient representation when compared to pruning for linear models. This paper argues in favor of paradigm shift towards RPS based models. During our rigorous evaluation of RPS, we identified issues in the state-of-the-art RPS technique ROAST, specifically regarding stability (ROAST's sensitivity to initialization hyperparameters, often leading to divergence) and Pareto-continuity (ROAST's inability to recover the accuracy of the original model at zero compression). We provably address both of these issues. We refer to the modified RPS, which incorporates our improvements, as STABLE-RPS.
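The core RPS idea is to tie many virtual weights to a small shared, trainable parameter store at initialization. Below is a minimal HashedNets-style PyTorch sketch of that idea; ROAST/STABLE-RPS use more elaborate, hardware-friendly mappings, so treat this only as an illustration of randomized parameter sharing.

```python
import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    """Linear layer whose (out * in) virtual weights are shared via a fixed random
    mapping into a much smaller trainable parameter array (illustrative RPS sketch)."""
    def __init__(self, in_features, out_features, compressed_size):
        super().__init__()
        self.shape = (out_features, in_features)
        self.store = nn.Parameter(torch.randn(compressed_size) * 0.01)
        idx = torch.randint(0, compressed_size, (out_features * in_features,))
        sign = torch.randint(0, 2, idx.shape) * 2 - 1   # random +/-1 to decorrelate shared slots
        self.register_buffer("idx", idx)
        self.register_buffer("sign", sign.float())
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # materialize the virtual weight matrix from the compressed store
        w = (self.store[self.idx] * self.sign).view(self.shape)
        return x @ w.t() + self.bias

layer = HashedLinear(784, 256, compressed_size=20_000)   # ~10x fewer trainable weights
out = layer(torch.randn(32, 784))
```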

Reflection-Equivariant Diffusion for 3D Structure Determination from Isotopologue Rotational Spectra in Natural Abundance

  • paper_url: http://arxiv.org/abs/2310.11609
  • repo_url: https://github.com/aspuru-guzik-group/kreed
  • paper_authors: Austin Cheng, Alston Lo, Santiago Miret, Brooks Pate, Alán Aspuru-Guzik
  • for: This paper aims to determine the 3D structures of small organic molecules from rotational spectroscopy.
  • methods: Rotational spectroscopy provides accurate 3D information about small organic molecules via their moments of inertia, and Kraitchman analysis is used to obtain the unsigned substitution coordinates of heavy atoms.
  • results: The paper develops a generative diffusion model (KREED) that infers a molecule's complete 3D structure from its molecular formula, moments of inertia, and unsigned substitution coordinates of heavy atoms. Its top-1 predictions reach >98% accuracy on the QM9 and GEOM datasets.
    Abstract Structure determination is necessary to identify unknown organic molecules, such as those in natural products, forensic samples, the interstellar medium, and laboratory syntheses. Rotational spectroscopy enables structure determination by providing accurate 3D information about small organic molecules via their moments of inertia. Using these moments, Kraitchman analysis determines isotopic substitution coordinates, which are the unsigned $|x|,|y|,|z|$ coordinates of all atoms with natural isotopic abundance, including carbon, nitrogen, and oxygen. While unsigned substitution coordinates can verify guesses of structures, the missing $+/-$ signs make it challenging to determine the actual structure from the substitution coordinates alone. To tackle this inverse problem, we develop KREED (Kraitchman REflection-Equivariant Diffusion), a generative diffusion model that infers a molecule's complete 3D structure from its molecular formula, moments of inertia, and unsigned substitution coordinates of heavy atoms. KREED's top-1 predictions identify the correct 3D structure with >98% accuracy on the QM9 and GEOM datasets when provided with substitution coordinates of all heavy atoms with natural isotopic abundance. When substitution coordinates are restricted to only a subset of carbons, accuracy is retained at 91% on QM9 and 32% on GEOM. On a test set of experimentally measured substitution coordinates gathered from the literature, KREED predicts the correct all-atom 3D structure in 25 of 33 cases, demonstrating experimental applicability for context-free 3D structure determination with rotational spectroscopy.
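The forward check underlying this inverse problem is simple: place a candidate structure in its principal-axis frame and compare absolute coordinates to the unsigned substitution coordinates. A rough NumPy sketch is below (it ignores isotopic-mass subtleties and axis-ordering conventions and uses a crude greedy matching); KREED learns to invert this check with a diffusion model rather than enumerating candidates.

```python
import numpy as np

def principal_axis_frame(coords, masses):
    """Translate to the center of mass and rotate onto the principal axes of inertia."""
    coords = coords - np.average(coords, axis=0, weights=masses)
    r2 = (coords ** 2).sum(axis=1)
    # inertia tensor I = sum_i m_i (|r_i|^2 * Id - r_i r_i^T)
    inertia = (masses[:, None, None] *
               (r2[:, None, None] * np.eye(3) - coords[:, :, None] * coords[:, None, :])).sum(0)
    _, axes = np.linalg.eigh(inertia)
    return coords @ axes

def consistent_with_substitution(coords, masses, unsigned_sub, tol=0.05):
    """Check whether |x|,|y|,|z| of atoms in the principal-axis frame match the
    (unordered) unsigned substitution coordinates within a tolerance."""
    abs_coords = np.abs(principal_axis_frame(coords, masses))
    used = set()
    for target in unsigned_sub:                        # greedy one-to-one matching
        dists = np.linalg.norm(abs_coords - target, axis=1)
        if used:
            dists[list(used)] = np.inf
        j = int(dists.argmin())
        if dists[j] > tol:
            return False
        used.add(j)
    return True

# toy usage: a fake 4-atom structure with heavy-atom masses
coords = np.array([[0.0, 0.0, 0.0], [1.4, 0.0, 0.0], [2.1, 1.2, 0.0], [0.7, -1.2, 0.3]])
masses = np.array([12.0, 12.0, 16.0, 14.0])
sub = np.abs(principal_axis_frame(coords, masses))        # pretend these were measured
print(consistent_with_substitution(coords, masses, sub))  # True by construction
```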

TK-KNN: A Balanced Distance-Based Pseudo Labeling Approach for Semi-Supervised Intent Classification

  • paper_url: http://arxiv.org/abs/2310.11607
  • repo_url: https://github.com/servicenow/tk-knn
  • paper_authors: Nicholas Botzer, David Vasquez, Tim Weninger, Issam Laradji
  • for: The paper aims to improve intent detection in dialogue systems by using semi-supervised learning to label unlabeled data.
  • methods: The paper proposes Top-K K-Nearest Neighbor (TK-KNN), a more robust pseudo-labeling approach based on distance in the embedding space that maintains a balanced set of pseudo-labeled examples across classes through a ranking-based approach.
  • results: Experiments on several datasets show that TK-KNN outperforms existing models, particularly when labeled data is scarce, on popular datasets such as CLINC150 and Banking77.
    Abstract The ability to detect intent in dialogue systems has become increasingly important in modern technology. These systems often generate a large amount of unlabeled data, and manually labeling this data requires substantial human effort. Semi-supervised methods attempt to remedy this cost by using a model trained on a few labeled examples and then by assigning pseudo-labels to further a subset of unlabeled examples that has a model prediction confidence higher than a certain threshold. However, one particularly perilous consequence of these methods is the risk of picking an imbalanced set of examples across classes, which could lead to poor labels. In the present work, we describe Top-K K-Nearest Neighbor (TK-KNN), which uses a more robust pseudo-labeling approach based on distance in the embedding space while maintaining a balanced set of pseudo-labeled examples across classes through a ranking-based approach. Experiments on several datasets show that TK-KNN outperforms existing models, particularly when labeled data is scarce on popular datasets such as CLINC150 and Banking77. Code is available at https://github.com/ServiceNow/tk-knn
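The core selection step can be sketched as follows: score each unlabeled example by its embedding-space distance to labeled examples of each class, then keep only the top-K candidates per class so the pseudo-labeled set stays balanced. The paper's full method combines this with model confidence and an iterative schedule, so this is only the general idea.

```python
import numpy as np

def tk_knn_pseudo_label(unlab_emb, lab_emb, lab_y, num_classes, k_per_class=50, n_neighbors=5):
    """Balanced, distance-based pseudo-labeling sketch.

    For every unlabeled embedding, compute its mean distance to the n_neighbors closest
    labeled embeddings of each class; assign the nearest class and keep only the
    k_per_class best-ranked examples per class."""
    dists = np.linalg.norm(unlab_emb[:, None, :] - lab_emb[None, :, :], axis=-1)  # (U, L)
    class_scores = np.full((len(unlab_emb), num_classes), np.inf)
    for c in range(num_classes):
        d_c = dists[:, lab_y == c]
        if d_c.shape[1] == 0:
            continue
        nn = np.sort(d_c, axis=1)[:, :n_neighbors]
        class_scores[:, c] = nn.mean(axis=1)
    pseudo_y = class_scores.argmin(axis=1)

    selected_idx, selected_y = [], []
    for c in range(num_classes):
        cand = np.where(pseudo_y == c)[0]
        order = cand[np.argsort(class_scores[cand, c])]   # closest candidates first
        keep = order[:k_per_class]
        selected_idx.extend(keep.tolist())
        selected_y.extend([c] * len(keep))
    return np.array(selected_idx), np.array(selected_y)
```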

Towards Inferring Users’ Impressions of Robot Performance in Navigation Scenarios

  • paper_url: http://arxiv.org/abs/2310.11590
  • repo_url: None
  • paper_authors: Qiping Zhang, Nathan Tsoi, Booyeon Choi, Jie Tan, Hao-Tien Lewis Chiang, Marynel Vázquez
  • for: This paper studies whether people's impressions of robot performance can be predicted from non-verbal behavioral cues using machine learning.
  • methods: The authors contribute the SEAN TOGETHER dataset, containing observations of interactions between a person and a mobile robot in a Virtual Reality simulation along with user ratings of robot performance, and analyze how well humans and supervised learning models can predict perceived robot performance from different combinations of observation types.
  • results: Facial expressions alone provide useful information about human impressions of robot performance, but in the navigation scenarios tested, spatial features are the most critical piece of information for this inference task. When ratings are evaluated as binary rather than multiclass classification, the F1 scores of human predictions and machine learning models more than double, indicating that both are better at telling the directionality of robot performance than at predicting exact ratings.
    Abstract Human impressions of robot performance are often measured through surveys. As a more scalable and cost-effective alternative, we study the possibility of predicting people's impressions of robot behavior using non-verbal behavioral cues and machine learning techniques. To this end, we first contribute the SEAN TOGETHER Dataset consisting of observations of an interaction between a person and a mobile robot in a Virtual Reality simulation, together with impressions of robot performance provided by users on a 5-point scale. Second, we contribute analyses of how well humans and supervised learning techniques can predict perceived robot performance based on different combinations of observation types (e.g., facial, spatial, and map features). Our results show that facial expressions alone provide useful information about human impressions of robot performance; but in the navigation scenarios we tested, spatial features are the most critical piece of information for this inference task. Also, when evaluating results as binary classification (rather than multiclass classification), the F1-Score of human predictions and machine learning models more than doubles, showing that both are better at telling the directionality of robot performance than predicting exact performance ratings. Based on our findings, we provide guidelines for implementing these predictions models in real-world navigation scenarios.

Partially Observable Stochastic Games with Neural Perception Mechanisms

  • paper_url: http://arxiv.org/abs/2310.11566
  • repo_url: None
  • paper_authors: Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska
  • for: This paper is written for researchers and practitioners interested in multi-agent decision-making under uncertainty, with a focus on partial observability and data-driven perception.
  • methods: The paper proposes a new model called neuro-symbolic partially-observable stochastic games (NS-POSGs), which incorporates perception mechanisms and is applicable to one-sided settings with discrete, data-driven observations. The paper also introduces a new point-based method called one-sided NS-HSVI for approximating values of NS-POSGs.
  • results: The paper presents experimental results demonstrating the practical applicability of the proposed method for neural networks whose preimage is in polyhedral form. The results show that the one-sided NS-HSVI method is effective in approximating values of NS-POSGs and can be used to solve real-world problems involving partial observability and data-driven perception.
    Abstract Stochastic games are a well established model for multi-agent sequential decision making under uncertainty. In reality, though, agents have only partial observability of their environment, which makes the problem computationally challenging, even in the single-agent setting of partially observable Markov decision processes. Furthermore, in practice, agents increasingly perceive their environment using data-driven approaches such as neural networks trained on continuous data. To tackle this problem, we propose the model of neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of continuous-space concurrent stochastic games that explicitly incorporates perception mechanisms. We focus on a one-sided setting, comprising a partially-informed agent with discrete, data-driven observations and a fully-informed agent with continuous observations. We present a new point-based method, called one-sided NS-HSVI, for approximating values of one-sided NS-POSGs and implement it based on the popular particle-based beliefs, showing that it has closed forms for computing values of interest. We provide experimental results to demonstrate the practical applicability of our method for neural networks whose preimage is in polyhedral form.

Online Algorithms with Uncertainty-Quantified Predictions

  • paper_url: http://arxiv.org/abs/2310.11558
  • repo_url: None
  • paper_authors: Bo Sun, Jerry Huang, Nicolas Christianson, Mohammad Hajiesmaili, Adam Wierman
  • for: This paper focuses on developing online algorithms that incorporate uncertainty-quantified predictions to achieve high-quality performance guarantees while maintaining bounded worst-case guarantees.
  • methods: The paper explores the use of uncertainty-quantified predictions in online algorithms, specifically for ski rental and online search problems. The authors propose non-trivial modifications to algorithm design to fully leverage the probabilistic predictions.
  • results: The paper demonstrates the effectiveness of the proposed methods through theoretical analysis and experimental evaluations. The results show that the algorithms achieve better performance guarantees compared to traditional online algorithms, and the uncertainty-quantified predictions provide valuable information for making optimal decisions in multi-instance settings.
    Abstract Online algorithms with predictions have become a trending topic in the field of beyond worst-case analysis of algorithms. These algorithms incorporate predictions about the future to obtain performance guarantees that are of high quality when the predictions are good, while still maintaining bounded worst-case guarantees when predictions are arbitrarily poor. In general, the algorithm is assumed to be unaware of the prediction's quality. However, recent developments in the machine learning literature have studied techniques for providing uncertainty quantification on machine-learned predictions, which describes how certain a model is about its quality. This paper examines the question of how to optimally utilize uncertainty-quantified predictions in the design of online algorithms. In particular, we consider predictions augmented with uncertainty quantification describing the likelihood of the ground truth falling in a certain range, designing online algorithms with these probabilistic predictions for two classic online problems: ski rental and online search. In each case, we demonstrate that non-trivial modifications to algorithm design are needed to fully leverage the probabilistic predictions. Moreover, we consider how to utilize more general forms of uncertainty quantification, proposing a framework based on online learning that learns to exploit uncertainty quantification to make optimal decisions in multi-instance settings.
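As a concrete instance of the setting, consider ski rental with an uncertainty-quantified prediction: instead of a point forecast of the number of ski days, the algorithm receives an interval that contains the truth with probability at least 1 - delta. The toy expected-cost heuristic below illustrates how such a prediction can inform the buy decision; it is not the paper's algorithm, which additionally carries worst-case consistency/robustness guarantees.

```python
def ski_rental_buy_day(buy_cost, lo, hi, delta, horizon=10_000):
    """Pick a day on which to buy skis, given an interval prediction [lo, hi] that
    contains the true number of ski days with probability >= 1 - delta.

    Toy heuristic: assume the truth is uniform on [lo, hi] w.p. 1 - delta and
    adversarially long otherwise, and minimize the resulting expected cost."""
    def expected_cost(buy_day):
        def cost(n):
            # rent (1/day) until buy_day, then buy; if the season ends earlier, only rent
            return n if n < buy_day else (buy_day - 1) + buy_cost
        in_interval = sum(cost(n) for n in range(lo, hi + 1)) / max(hi - lo + 1, 1)
        worst_case = cost(horizon)
        return (1 - delta) * in_interval + delta * worst_case
    return min(range(1, buy_cost + hi + 2), key=expected_cost)

print(ski_rental_buy_day(buy_cost=100, lo=40, hi=80, delta=0.1))
```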

Bias and Error Mitigation in Software-Generated Data: An Advanced Search and Optimization Framework Leveraging Generative Code Models

  • paper_url: http://arxiv.org/abs/2310.11546
  • repo_url: None
  • paper_authors: Ernesto Giralt Hernández
  • for: correcting errors and biases in software systems specializing in data analysis and generation
  • methods: Solomonoff Induction, Kolmogorov Conditional Complexity, and generative code models (LLMs)
  • results: incrementally improves the quality of output results when the framework is applied repeatedly to the same software system
    Abstract Data generation and analysis is a fundamental aspect of many industries and disciplines, from strategic decision making in business to research in the physical and social sciences. However, data generated using software and algorithms can be subject to biases and errors. These can be due to problems with the original software, default settings that do not align with the specific needs of the situation, or even deeper problems with the underlying theories and models. This paper proposes an advanced search and optimization framework aimed at generating and choosing optimal source code capable of correcting errors and biases from previous versions to address typical problems in software systems specializing in data analysis and generation, especially those in the corporate and data science world. Applying this framework multiple times on the same software system would incrementally improve the quality of the output results. It uses Solomonoff Induction as a sound theoretical basis, extending it with Kolmogorov Conditional Complexity, a novel adaptation, to evaluate a set of candidate programs. We propose the use of generative models for the creation of this set of programs, with special emphasis on the capabilities of Large Language Models (LLMs) to generate high quality code.

Thin and Deep Gaussian Processes

  • paper_url: http://arxiv.org/abs/2310.11527
  • repo_url: https://github.com/spectraldani/thindeepgps
  • paper_authors: Daniel Augusto de Souza, Alexander Nikitin, ST John, Magnus Ross, Mauricio A. Álvarez, Marc Peter Deisenroth, João P. P. Gomes, Diego Mesquita, César Lincoln C. Mattos
  • for: The paper proposes Thin and Deep Gaussian Processes (TDGP), a new deep GP construction that addresses shortcomings of existing deep GPs such as loss of interpretability and pathological behavior.
  • methods: TDGP successively parameterizes kernel lengthscales with Gaussian process layers through locally linear transformations of the original inputs, which lets it learn low-dimensional embeddings of the inputs while retaining the interpretability of kernel lengthscales.
  • results: TDGP performs well on standard benchmark datasets, behaves well as the number of layers increases, and learns low-dimensional representations without the pathologies of prior constructions.
    Abstract Gaussian processes (GPs) can provide a principled approach to uncertainty quantification with easy-to-interpret kernel hyperparameters, such as the lengthscale, which controls the correlation distance of function values. However, selecting an appropriate kernel can be challenging. Deep GPs avoid manual kernel engineering by successively parameterizing kernels with GP layers, allowing them to learn low-dimensional embeddings of the inputs that explain the output data. Following the architecture of deep neural networks, the most common deep GPs warp the input space layer-by-layer but lose all the interpretability of shallow GPs. An alternative construction is to successively parameterize the lengthscale of a kernel, improving the interpretability but ultimately giving away the notion of learning lower-dimensional embeddings. Unfortunately, both methods are susceptible to particular pathologies which may hinder fitting and limit their interpretability. This work proposes a novel synthesis of both previous approaches: Thin and Deep GP (TDGP). Each TDGP layer defines locally linear transformations of the original input data maintaining the concept of latent embeddings while also retaining the interpretation of lengthscales of a kernel. Moreover, unlike the prior solutions, TDGP induces non-pathological manifolds that admit learning lower-dimensional representations. We show with theoretical and experimental results that i) TDGP is, unlike previous models, tailored to specifically discover lower-dimensional manifolds in the input data, ii) TDGP behaves well when increasing the number of layers, and iii) TDGP performs well in standard benchmark datasets.
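The "parameterize the lengthscale" ingredient can be made concrete with a non-stationary (Gibbs-type) kernel whose lengthscale is itself a function of the input; in TDGP that function is given a GP prior and combined with locally linear embeddings, so the snippet below shows only the kernel-level idea, not the full model.

```python
import numpy as np

def gibbs_rbf_kernel(x1, x2, lengthscale_fn, variance=1.0):
    """Non-stationary RBF kernel with input-dependent lengthscale l(x), for 1-D inputs:
    k(x, x') = variance * sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
                        * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))"""
    l1 = lengthscale_fn(x1)[:, None]          # (N, 1)
    l2 = lengthscale_fn(x2)[None, :]          # (1, M)
    sq = (x1[:, None] - x2[None, :]) ** 2
    denom = l1 ** 2 + l2 ** 2
    return variance * np.sqrt(2 * l1 * l2 / denom) * np.exp(-sq / denom)

# toy: shorter lengthscales near the origin, longer far away
lengthscale = lambda x: 0.1 + 0.5 * np.abs(x)
x = np.linspace(-2, 2, 50)
K = gibbs_rbf_kernel(x, x, lengthscale)       # (50, 50) positive semi-definite covariance
```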

Value-Biased Maximum Likelihood Estimation for Model-based Reinforcement Learning in Discounted Linear MDPs

  • paper_url: http://arxiv.org/abs/2310.11515
  • repo_url: None
  • paper_authors: Yu-Heng Hung, Ping-Chun Hsieh, Akshay Mete, P. R. Kumar
  • for: linear Markov Decision Processes (MDPs) with infinite horizon and linearly parameterized transition probabilities
  • methods: Value-Biased Maximum Likelihood Estimation (VBMLE)
  • results: $\widetilde{O}(d\sqrt{T})$ regret, computationally more efficient than existing regression-based approaches, and a generic convergence result of MLE in linear MDPs through a novel supermartingale construct.
    Abstract We consider the infinite-horizon linear Markov Decision Processes (MDPs), where the transition probabilities of the dynamic model can be linearly parameterized with the help of a predefined low-dimensional feature mapping. While the existing regression-based approaches have been theoretically shown to achieve nearly-optimal regret, they are computationally rather inefficient due to the need for a large number of optimization runs in each time step, especially when the state and action spaces are large. To address this issue, we propose to solve linear MDPs through the lens of Value-Biased Maximum Likelihood Estimation (VBMLE), which is a classic model-based exploration principle in the adaptive control literature for resolving the well-known closed-loop identification problem of Maximum Likelihood Estimation. We formally show that (i) VBMLE enjoys $\widetilde{O}(d\sqrt{T})$ regret, where $T$ is the time horizon and $d$ is the dimension of the model parameter, and (ii) VBMLE is computationally more efficient as it only requires solving one optimization problem in each time step. In our regret analysis, we offer a generic convergence result of MLE in linear MDPs through a novel supermartingale construct and uncover an interesting connection between linear MDPs and online learning, which could be of independent interest. Finally, the simulation results show that VBMLE significantly outperforms the benchmark method in terms of both empirical regret and computation time.
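For reference, the value-biased estimate takes the classic Kumar-Becker form of adding a value bias to the log-likelihood, so that the estimator favors model parameters that both fit the observed transitions and promise high return. Schematically (the exact bias weight $\alpha_t$ and likelihood are as specified in the paper):

$$\hat{\theta}_t \in \arg\max_{\theta \in \Theta} \Big\{ \sum_{s=1}^{t} \log p_{\theta}(x_{s+1} \mid x_s, a_s) \;+\; \alpha_t\, J^{*}(\theta) \Big\},$$

where $J^{*}(\theta)$ denotes the optimal discounted value under the model $\theta$ and $\alpha_t > 0$ controls the bias toward value-optimistic models. Solving this single optimization per time step is what makes the approach computationally lighter than confidence-set or regression-based alternatives.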

Stochastic Quantum Sampling for Non-Logconcave Distributions and Estimating Partition Functions

  • paper_url: http://arxiv.org/abs/2310.11445
  • repo_url: None
  • paper_authors: Guneykan Ozgul, Xiantao Li, Mehrdad Mahdavi, Chunhao Wang
  • for: The paper designs quantum algorithms for sampling from non-logconcave probability distributions of the form $\pi(x) \propto \exp(-\beta f(x))$.
  • methods: The approach is based on quantum simulated annealing over slowly varying Markov chains derived from unadjusted Langevin algorithms, with a stochastic gradient oracle that implements the quantum walk operators inexactly using only mini-batch gradients.
  • results: The quantum algorithms exhibit polynomial speedups in both dimension and precision dependencies compared to the best-known classical algorithms.
    Abstract We present quantum algorithms for sampling from non-logconcave probability distributions in the form of $\pi(x) \propto \exp(-\beta f(x))$. Here, $f$ can be written as a finite sum $f(x):= \frac{1}{N}\sum_{k=1}^N f_k(x)$. Our approach is based on quantum simulated annealing on slowly varying Markov chains derived from unadjusted Langevin algorithms, removing the necessity for function evaluations which can be computationally expensive for large data sets in mixture modeling and multi-stable systems. We also incorporate a stochastic gradient oracle that implements the quantum walk operators inexactly by only using mini-batch gradients. As a result, our stochastic gradient based algorithm only accesses small subsets of data points in implementing the quantum walk. One challenge of quantizing the resulting Markov chains is that they do not satisfy the detailed balance condition in general. Consequently, the mixing time of the algorithm cannot be expressed in terms of the spectral gap of the transition density, making the quantum algorithms nontrivial to analyze. To overcome these challenges, we first build a hypothetical Markov chain that is reversible, and also converges to the target distribution. Then, we quantified the distance between our algorithm's output and the target distribution by using this hypothetical chain as a bridge to establish the total complexity. Our quantum algorithms exhibit polynomial speedups in terms of both dimension and precision dependencies when compared to the best-known classical algorithms.

Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

  • paper_url: http://arxiv.org/abs/2310.11431
  • repo_url: None
  • paper_authors: David Klindt, Sophia Sanborn, Francisco Acosta, Frédéric Poitevin, Nina Miolane
  • for: The paper investigates the interpretability of single neurons in deep neural networks and whether multiple unrelated features are represented in superposition within the same neurons.
  • methods: The authors propose an automated method for quantifying visual interpretability, validated against a large database of human psychophysics judgments of neuron interpretability, together with an approach for finding meaningful directions in network activation space.
  • results: The method discovers directions in convolutional neural networks that are more intuitively meaningful than individual neurons. Applied to three recent datasets of visual neural responses in the brain, the conclusions largely transfer to real neural data, suggesting that superposition might be deployed by the brain; this links to disentanglement and raises fundamental questions about robust, efficient, and factorized representations.
    Abstract Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features. However, many neurons exhibit $\textit{mixed selectivity}$, i.e., they represent multiple unrelated features. A recent hypothesis proposes that features in deep networks may be represented in $\textit{superposition}$, i.e., on non-orthogonal axes by multiple neurons, since the number of possible interpretable features in natural data is generally larger than the number of neurons in a given network. Accordingly, we should be able to find meaningful directions in activation space that are not aligned with individual neurons. Here, we propose (1) an automated method for quantifying visual interpretability that is validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space. We leverage these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, as we confirm and investigate in a series of analyses. Moreover, we apply the same method to three recent datasets of visual neural responses in the brain and find that our conclusions largely transfer to real neural data, suggesting that superposition might be deployed by the brain. This also provides a link with disentanglement and raises fundamental questions about robust, efficient and factorized representations in both artificial and biological neural systems.

Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression

  • paper_url: http://arxiv.org/abs/2310.11428
  • repo_url: None
  • paper_authors: Adam Block, Dylan J. Foster, Akshay Krishnamurthy, Max Simchowitz, Cyril Zhang
  • for: This paper studies the issue of training instability in behavior cloning with deep neural networks, specifically the sharp oscillations in long-horizon rewards that can occur during training.
  • methods: The authors use minibatch SGD updates to the policy network during training, and empirically disentangle the statistical and computational causes of the oscillations. They also test several standard mitigation techniques and find an exponential moving average (EMA) of iterates to be effective in alleviating the issue.
  • results: The authors show that GVA is a common phenomenon in both continuous control and autoregressive language generation, and that EMA can effectively mitigate it. They also provide theoretical vignettes to explain the benefits of EMA in alleviating GVA and shed light on the extent to which classical convex models can help in understanding the benefits of iterate averaging in deep learning.
    Abstract This work studies training instabilities of behavior cloning with deep neural networks. We observe that minibatch SGD updates to the policy network during training result in sharp oscillations in long-horizon rewards, despite negligibly affecting the behavior cloning loss. We empirically disentangle the statistical and computational causes of these oscillations, and find them to stem from the chaotic propagation of minibatch SGD noise through unstable closed-loop dynamics. While SGD noise is benign in the single-step action prediction objective, it results in catastrophic error accumulation over long horizons, an effect we term gradient variance amplification (GVA). We show that many standard mitigation techniques do not alleviate GVA, but find an exponential moving average (EMA) of iterates to be surprisingly effective at doing so. We illustrate the generality of this phenomenon by showing the existence of GVA and its amelioration by EMA in both continuous control and autoregressive language generation. Finally, we provide theoretical vignettes that highlight the benefits of EMA in alleviating GVA and shed light on the extent to which classical convex models can help in understanding the benefits of iterate averaging in deep learning.
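The mitigation that works in the paper's experiments, an exponential moving average (EMA) of the parameter iterates, is only a few lines to implement. The snippet below is a generic sketch (the decay value is an arbitrary illustrative choice); evaluation and rollouts use the EMA weights rather than the raw SGD iterate.

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.999):
    """Exponential moving average of parameters: ema <- decay * ema + (1 - decay) * theta."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1.0 - decay)

model = torch.nn.Linear(10, 2)
ema_model = copy.deepcopy(model)            # frozen copy used for evaluation/rollouts
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):                        # toy training loop
    loss = model(torch.randn(32, 10)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    update_ema(ema_model, model)            # deploy ema_model, not the noisy iterate
```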

Group-blind optimal transport to group parity and its constrained variants

  • paper_url: http://arxiv.org/abs/2310.11407
  • repo_url: None
  • paper_authors: Quan Zhou, Jakub Marecek
  • for: The goal is to build machine learning models whose outcomes are fair with respect to sensitive human attributes such as gender and race.
  • methods: The paper designs a single group-blind projection map that aligns the feature distributions of both groups in the source data, achieving (demographic) group parity without requiring the protected attribute value of individual samples, either when computing the map or when applying it.
  • results: Numerical experiments on real and synthetic data demonstrate that the approach achieves group parity without access to individual sensitive attributes.
    Abstract Fairness holds a pivotal role in the realm of machine learning, particularly when it comes to addressing groups categorised by sensitive attributes, e.g., gender, race. Prevailing algorithms in fair learning predominantly hinge on accessibility or estimations of these sensitive attributes, at least in the training process. We design a single group-blind projection map that aligns the feature distributions of both groups in the source data, achieving (demographic) group parity, without requiring values of the protected attribute for individual samples in the computation of the map, as well as its use. Instead, our approach utilises the feature distributions of the privileged and unprivileged groups in a broader population and the essential assumption that the source data are unbiased representation of the population. We present numerical results on synthetic data and real data.

Enhancing Group Fairness in Online Settings Using Oblique Decision Forests

  • paper_url: http://arxiv.org/abs/2310.11401
  • repo_url: None
  • paper_authors: Somnath Basu Roy Chowdhury, Nicholas Monath, Ahmad Beirami, Rahul Kidambi, Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi
  • for: The paper targets online settings where a machine learning system must make group-fair decisions as instances arrive one at a time.
  • methods: It proposes Aranyani, an ensemble of oblique decision trees, for fair online decision-making. The hierarchical tree structure enables parameter isolation and allows fairness gradients to be computed efficiently from aggregate statistics of previous decisions, eliminating the need for additional storage and extra forward/backward passes.
  • results: Empirical evaluations on 5 publicly available benchmarks show that Aranyani achieves a better accuracy-fairness trade-off than baseline approaches.
    Abstract Fairness, especially group fairness, is an important consideration in the context of machine learning systems. The most commonly adopted group fairness-enhancing techniques are in-processing methods that rely on a mixture of a fairness objective (e.g., demographic parity) and a task-specific objective (e.g., cross-entropy) during the training process. However, when data arrives in an online fashion -- one instance at a time -- optimizing such fairness objectives poses several challenges. In particular, group fairness objectives are defined using expectations of predictions across different demographic groups. In the online setting, where the algorithm has access to a single instance at a time, estimating the group fairness objective requires additional storage and significantly more computation (e.g., forward/backward passes) than the task-specific objective at every time step. In this paper, we propose Aranyani, an ensemble of oblique decision trees, to make fair decisions in online settings. The hierarchical tree structure of Aranyani enables parameter isolation and allows us to efficiently compute the fairness gradients using aggregate statistics of previous decisions, eliminating the need for additional storage and forward/backward passes. We also present an efficient framework to train Aranyani and theoretically analyze several of its properties. We conduct empirical evaluations on 5 publicly available benchmarks (including vision and language datasets) to show that Aranyani achieves a better accuracy-fairness trade-off compared to baseline approaches.
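The aggregate-statistics trick can be illustrated independently of the tree architecture: keep running counts of positive predictions per group and use them to form a demographic-parity penalty for the incoming instance, without replaying past data. The sketch below uses a plain logistic model instead of oblique trees and is only an illustration of the online-fairness bookkeeping, not Aranyani itself.

```python
import torch
import torch.nn.functional as F

class OnlineParityTracker:
    """Running aggregates (positive-prediction rate per group) giving a demographic-parity
    gap without storing or replaying past instances."""
    def __init__(self, num_groups):
        self.pos = torch.zeros(num_groups)
        self.cnt = torch.zeros(num_groups)

    def gap_with(self, group, prob):
        """Parity gap if the current prediction were added to its group (keeps grad through prob)."""
        one_hot = F.one_hot(torch.tensor(group), self.pos.numel()).float()
        rates = (self.pos + one_hot * prob) / (self.cnt + one_hot).clamp(min=1.0)
        return rates.max() - rates.min()

    def update(self, group, prob):
        self.pos[group] += float(prob)
        self.cnt[group] += 1.0

def toy_stream(n=500):
    for _ in range(n):
        g = int(torch.randint(0, 2, ()).item())
        x = torch.randn(16) + 0.5 * g                  # group-shifted features
        y = (x.mean() > 0.25).float()
        yield x, y, g

model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
tracker = OnlineParityTracker(num_groups=2)
lam = 1.0                                              # fairness weight (illustrative)

for x, y, g in toy_stream():                           # one instance at a time
    prob = torch.sigmoid(model(x)).squeeze()
    loss = F.binary_cross_entropy(prob, y) + lam * tracker.gap_with(g, prob)
    opt.zero_grad()
    loss.backward()
    opt.step()
    tracker.update(g, prob.detach())
```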

Last One Standing: A Comparative Analysis of Security and Privacy of Soft Prompt Tuning, LoRA, and In-Context Learning

  • paper_url: http://arxiv.org/abs/2310.11397
  • repo_url: None
  • paper_authors: Rui Wen, Tianhao Wang, Michael Backes, Yang Zhang, Ahmed Salem
  • for: The paper examines the privacy and security implications of adapting Large Language Models (LLMs) with private data.
  • methods: It compares three established adaptation techniques, Low-Rank Adaptation (LoRA), Soft Prompt Tuning (SPT), and In-Context Learning (ICL), against membership inference, backdoor, and model stealing attacks.
  • results: No single adaptation technique satisfies all privacy and security requirements; each has different strengths and weaknesses.
    Abstract Large Language Models (LLMs) are powerful tools for natural language processing, enabling novel applications and user experiences. However, to achieve optimal performance, LLMs often require adaptation with private data, which poses privacy and security challenges. Several techniques have been proposed to adapt LLMs with private data, such as Low-Rank Adaptation (LoRA), Soft Prompt Tuning (SPT), and In-Context Learning (ICL), but their comparative privacy and security properties have not been systematically investigated. In this work, we fill this gap by evaluating the robustness of LoRA, SPT, and ICL against three types of well-established attacks: membership inference, which exposes data leakage (privacy); backdoor, which injects malicious behavior (security); and model stealing, which can violate intellectual property (privacy and security). Our results show that there is no silver bullet for privacy and security in LLM adaptation and each technique has different strengths and weaknesses.
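To make the first threat model concrete, the simplest membership-inference baseline thresholds the per-example loss, exploiting the fact that training members tend to have lower loss than non-members. The attacks evaluated in the paper are more sophisticated; this sketch only illustrates the threat.

```python
import numpy as np

def loss_threshold_membership_inference(member_losses, nonmember_losses, threshold):
    """Flag an example as a training member when its loss falls below a threshold.
    Returns the attack's true-positive and false-positive rates."""
    tpr = float(np.mean(member_losses < threshold))
    fpr = float(np.mean(nonmember_losses < threshold))
    return tpr, fpr

# toy illustration: members drawn with systematically lower loss than non-members
rng = np.random.default_rng(0)
members, nonmembers = rng.gamma(2.0, 0.3, 1000), rng.gamma(2.0, 0.6, 1000)
print(loss_threshold_membership_inference(members, nonmembers, threshold=0.5))
```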

VaR and CVaR Estimation in a Markov Cost Process: Lower and Upper Bounds

  • paper_url: http://arxiv.org/abs/2310.11389
  • repo_url: None
  • paper_authors: Sanjay Bhat, Prashanth L. A., Gugan Thoppe
  • for: The paper studies estimation of the Value-at-Risk (VaR) and Conditional Value-at-Risk (CVaR) of the infinite-horizon discounted cost in a Markov cost process.
  • methods: It first derives a minimax lower bound of $\Omega(1/\sqrt{n})$ that holds in both an expected and a probabilistic sense, and then uses a finite-horizon truncation scheme to derive an upper bound on the CVaR estimation error that matches the lower bound up to constant factors.
  • results: In Markovian settings, the work provides lower and upper bounds on the estimation error for general risk measures satisfying a continuity criterion, including spectral risk measures and utility-based shortfall risk; the lower bounds also extend to the mean of infinite-horizon discounted costs and improve upon the existing $\Omega(1/n)$ result [13].
    Abstract We tackle the problem of estimating the Value-at-Risk (VaR) and the Conditional Value-at-Risk (CVaR) of the infinite-horizon discounted cost within a Markov cost process. First, we derive a minimax lower bound of $\Omega(1/\sqrt{n})$ that holds both in an expected and in a probabilistic sense. Then, using a finite-horizon truncation scheme, we derive an upper bound for the error in CVaR estimation, which matches our lower bound up to constant factors. Finally, we discuss an extension of our estimation scheme that covers more general risk measures satisfying a certain continuity criterion, e.g., spectral risk measures, utility-based shortfall risk. To the best of our knowledge, our work is the first to provide lower and upper bounds on the estimation error for any risk measure within Markovian settings. We remark that our lower bounds also extend to the infinite-horizon discounted costs' mean. Even in that case, our result $\Omega(1/\sqrt{n}) $ improves upon the existing result $\Omega(1/n)$[13].
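The estimation scheme analyzed here can be sketched directly: simulate n truncated trajectories of the discounted cost, then take an empirical quantile (VaR) and the corresponding tail average (CVaR). The paper's bounds quantify how the truncation horizon and n must scale; the snippet below is only the plain Monte Carlo estimator.

```python
import numpy as np

def estimate_var_cvar(sample_cost, n=10_000, horizon=200, gamma=0.95, alpha=0.95, seed=0):
    """Monte Carlo VaR/CVaR of the (truncated) infinite-horizon discounted cost.
    sample_cost(rng) must yield per-step costs for one trajectory (assumed bounded)."""
    rng = np.random.default_rng(seed)
    totals = np.empty(n)
    for i in range(n):
        costs = sample_cost(rng)
        totals[i] = sum((gamma ** t) * next(costs) for t in range(horizon))
    var = np.quantile(totals, alpha)           # Value-at-Risk at level alpha
    cvar = totals[totals >= var].mean()        # Conditional Value-at-Risk (tail mean)
    return var, cvar

# toy Markov cost process: two states with different per-step costs
def sample_cost(rng, p_switch=0.1):
    state = 0
    while True:
        yield 1.0 if state == 0 else 3.0
        if rng.random() < p_switch:
            state = 1 - state

print(estimate_var_cvar(sample_cost))
```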

Faster Algorithms for Generalized Mean Densest Subgraph Problem

  • paper_url: http://arxiv.org/abs/2310.11377
  • repo_url: None
  • paper_authors: Chenglin Fan, Ping Li, Hanyu Peng
  • for: The paper studies the $p$-mean densest subgraph problem: finding the subgraph of a graph with the highest average $p$-th-power degree, a generalization of the standard densest subgraph objective.
  • methods: The paper introduces a new generalized peeling algorithm (GENPEEL) and a faster variant (GENPEEL++), running in $O(mn)$ time for $p \geq 1$ and in $O(m(\log n))$ time for $p \in [1, +\infty)$, with approximation ratios of $(p+1)^{1/p}$ and $(2(p+1))^{1/p}$, respectively.
  • results: The paper shows that for $0 < p < 1$ the standard peeling algorithm still provides a $2^{1/p}$-approximation for the generalized objective, while GENPEEL and GENPEEL++ give $(p+1)^{1/p}$- and $(2(p+1))^{1/p}$-approximation guarantees for $p \geq 1$ and $p \in [1, +\infty)$, respectively.
    Abstract The densest subgraph of a large graph usually refers to some subgraph with the highest average degree, which has been extended to the family of $p$-means dense subgraph objectives by Veldt et al. (2021). The $p$-mean densest subgraph problem seeks a subgraph with the highest average $p$-th-power degree, whereas the standard densest subgraph problem seeks a subgraph with a simple highest average degree. It was shown that the standard peeling algorithm can perform arbitrarily poorly on the generalized objective when $p>1$ but uncertain when $0<p<1$.
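The baseline that these results revolve around is the standard peeling strategy: repeatedly remove a vertex (for the classic objective, a minimum-degree vertex) and return the best intermediate subgraph under the $p$-mean objective. GENPEEL/GENPEEL++ use more careful removal rules and amortization to obtain the stated runtimes and ratios, so the sketch below shows only this baseline idea.

```python
import networkx as nx

def p_mean_density(G, p):
    """(1/|S|) * sum_v deg_S(v)^p, the p-mean densest subgraph objective (up to the 1/p power)."""
    if G.number_of_nodes() == 0:
        return 0.0
    return sum(d ** p for _, d in G.degree()) / G.number_of_nodes()

def greedy_peel(G, p=1.0):
    """Standard peeling baseline: delete a minimum-degree vertex at each step and
    keep the best intermediate subgraph seen along the way."""
    H = G.copy()
    best_nodes, best_val = set(H.nodes), p_mean_density(H, p)
    while H.number_of_nodes() > 1:
        v = min(H.nodes, key=H.degree)       # naive O(n) selection per step
        H.remove_node(v)
        val = p_mean_density(H, p)
        if val > best_val:
            best_val, best_nodes = val, set(H.nodes)
    return best_nodes, best_val

G = nx.barbell_graph(10, 2)                  # two dense cliques joined by a short path
print(greedy_peel(G, p=2.0))
```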

Lie Group Decompositions for Equivariant Neural Networks

  • paper_url: http://arxiv.org/abs/2310.11366
  • repo_url: None
  • paper_authors: Mircea Mironenco, Patrick Forré
  • for: The goal is to build neural network models that are invariant and equivariant to geometrical transformations, especially in the low-data regime.
  • methods: The paper extends equivariant models to Lie groups using their Lie algebras and the group exponential and logarithm maps, and exploits the structure and geometry of Lie groups and their homogeneous spaces to realize invariant integration and a global parametrization by decomposing the groups into subgroups and submanifolds.
  • results: On the standard affine-invariant benchmark classification task, the model outperforms previous equivariant models and Capsule Network proposals, demonstrating robustness and out-of-distribution generalization.
    Abstract Invariance and equivariance to geometrical transformations have proven to be very useful inductive biases when training (convolutional) neural network models, especially in the low-data regime. Much work has focused on the case where the symmetry group employed is compact or abelian, or both. Recent work has explored enlarging the class of transformations used to the case of Lie groups, principally through the use of their Lie algebra, as well as the group exponential and logarithm maps. The applicability of such methods to larger transformation groups is limited by the fact that depending on the group of interest $G$, the exponential map may not be surjective. Further limitations are encountered when $G$ is neither compact nor abelian. Using the structure and geometry of Lie groups and their homogeneous spaces, we present a framework by which it is possible to work with such groups primarily focusing on the Lie groups $G = \text{GL}^{+}(n, \mathbb{R})$ and $G = \text{SL}(n, \mathbb{R})$, as well as their representation as affine transformations $\mathbb{R}^{n} \rtimes G$. Invariant integration as well as a global parametrization is realized by decomposing the `larger` groups into subgroups and submanifolds which can be handled individually. Under this framework, we show how convolution kernels can be parametrized to build models equivariant with respect to affine transformations. We evaluate the robustness and out-of-distribution generalisation capability of our model on the standard affine-invariant benchmark classification task, where we outperform all previous equivariant models as well as all Capsule Network proposals.

Contextualized Machine Learning

  • paper_url: http://arxiv.org/abs/2310.11340
  • repo_url: https://github.com/SAP/contextual-ai
  • paper_authors: Benjamin Lengerich, Caleb N. Ellington, Andrea Rubbi, Manolis Kellis, Eric P. Xing
  • for: The paper presents Contextualized Machine Learning (ML), a paradigm for learning heterogeneous and context-dependent effects.
  • methods: The approach applies deep learning to the meta-relationship between contextual information and context-specific parametric models, translating sample context into model parameters; this varying-coefficient view unifies existing frameworks, including cluster analysis and cohort modeling, via two reusable concepts: a context encoder and a sample-specific model.
  • results: The paper reviews the development of contextualized models, nonparametric inference from them, and their identifiability conditions, and releases the open-source PyTorch package ContextualizedML.
    Abstract We examine Contextualized Machine Learning (ML), a paradigm for learning heterogeneous and context-dependent effects. Contextualized ML estimates heterogeneous functions by applying deep learning to the meta-relationship between contextual information and context-specific parametric models. This is a form of varying-coefficient modeling that unifies existing frameworks including cluster analysis and cohort modeling by introducing two reusable concepts: a context encoder which translates sample context into model parameters, and sample-specific model which operates on sample predictors. We review the process of developing contextualized models, nonparametric inference from contextualized models, and identifiability conditions of contextualized models. Finally, we present the open-source PyTorch package ContextualizedML.
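The two reusable pieces are easy to picture in code: a context encoder maps a sample's context to the parameters of a simple sample-specific model (here, a linear model), and predictions use those per-sample parameters. The PyTorch sketch below illustrates the varying-coefficient idea only; it is not the API of the ContextualizedML package.

```python
import torch
import torch.nn as nn

class ContextualizedLinear(nn.Module):
    """Context encoder -> per-sample linear-model parameters (varying coefficients)."""
    def __init__(self, context_dim, feature_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, feature_dim + 1),     # per-sample weights plus bias
        )

    def forward(self, context, x):
        params = self.encoder(context)              # (B, feature_dim + 1)
        w, b = params[:, :-1], params[:, -1]
        return (w * x).sum(dim=1) + b               # sample-specific linear prediction

model = ContextualizedLinear(context_dim=5, feature_dim=10)
context, x, y = torch.randn(32, 5), torch.randn(32, 10), torch.randn(32)
loss = ((model(context, x) - y) ** 2).mean()
loss.backward()
```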

Non-ergodicity in reinforcement learning: robustness via ergodicity transformations

  • paper_url: http://arxiv.org/abs/2310.11335
  • repo_url: None
  • paper_authors: Dominik Baumann, Erfaun Noorani, James Price, Ole Peters, Colm Connaughton, Thomas B. Schön
  • for: This paper aims to address the issue of non-robustness in reinforcement learning (RL) algorithms, specifically in real-world applications such as autonomous driving, precision agriculture, and finance.
  • methods: The authors propose a new algorithm for learning ergodicity transformations from data, which enables the optimization of long-term return for individual agents rather than the average across infinitely many trajectories.
  • results: The proposed algorithm is demonstrated to be effective in an instructive, non-ergodic environment and on standard RL benchmarks.Here’s the simplified Chinese text:
  • for: 本研究旨在解决反对式学习(RL)算法在实际应用中的不稳定性问题,特别是在自动驾驶、精准农业和金融等领域。
  • methods: 作者提出了一种基于数据学习Ergodicity变换的算法,以便优化个体代理的长期返回,而不是权衡无穷多个轨迹的平均返回。
  • results: 提出的算法在一个教育性的非ergodic环境以及标准RL benchmark上得到了证明。
    Abstract Envisioned application areas for reinforcement learning (RL) include autonomous driving, precision agriculture, and finance, which all require RL agents to make decisions in the real world. A significant challenge hindering the adoption of RL methods in these domains is the non-robustness of conventional algorithms. In this paper, we argue that a fundamental issue contributing to this lack of robustness lies in the focus on the expected value of the return as the sole "correct" optimization objective. The expected value is the average over the statistical ensemble of infinitely many trajectories. For non-ergodic returns, this average differs from the average over a single but infinitely long trajectory. Consequently, optimizing the expected value can lead to policies that yield exceptionally high returns with probability zero but almost surely result in catastrophic outcomes. This problem can be circumvented by transforming the time series of collected returns into one with ergodic increments. This transformation enables learning robust policies by optimizing the long-term return for individual agents rather than the average across infinitely many trajectories. We propose an algorithm for learning ergodicity transformations from data and demonstrate its effectiveness in an instructive, non-ergodic environment and on standard RL benchmarks.
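The canonical example of such a transformation, from the ergodicity-economics literature rather than the learned transformation proposed here, is multiplicative dynamics: the ensemble-average return grows even though almost every individual trajectory decays, while log-wealth increments have a well-defined time average. A small NumPy simulation of the classic coin-toss gamble illustrates the gap.

```python
import numpy as np

rng = np.random.default_rng(0)

def coin_toss_wealth(n_steps, n_agents, up=1.5, down=0.6):
    """Multiplicative gamble: each step, wealth is multiplied by 1.5 or 0.6 with prob 1/2."""
    factors = rng.choice([up, down], size=(n_agents, n_steps))
    return factors.cumprod(axis=1)

wealth = coin_toss_wealth(n_steps=50, n_agents=100_000)
# expected value grows like 1.05^50 ~= 11.5 (driven by rare lucky runs, so the sample mean is noisy)
print("ensemble-average final wealth:", wealth[:, -1].mean())
# the typical (median) trajectory decays like exp(50 * -0.053) ~= 0.07
print("median final wealth:         ", np.median(wealth[:, -1]))
# ergodic transformation: log increments; time-average growth ~= 0.5*log(1.5)+0.5*log(0.6) ~= -0.053
print("time-average log growth rate:", np.log(wealth[:, -1]).mean() / 50)
```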

Elucidating The Design Space of Classifier-Guided Diffusion Generation

  • paper_url: http://arxiv.org/abs/2310.11311
  • repo_url: https://github.com/alexmaols/elucd
  • paper_authors: Jiajun Ma, Tianyang Hu, Wenjia Wang, Jiacheng Sun
  • for: Improving the sample quality and controllability of diffusion models: mainstream guidance methods require extra training on labeled data, while training-free methods have not yet demonstrated comparable performance.
  • methods: Through a comprehensive investigation of the design space, the authors show that off-the-shelf pretrained classifiers can be leveraged in a training-free fashion, combining the strengths of both families, and propose several pre-conditioning techniques, with calibration as a general guideline, to better exploit such classifiers for guiding diffusion generation.
  • results: Extensive experiments on ImageNet show that the proposed method improves state-of-the-art diffusion models (DDPM, EDM, DiT) by up to 20% with barely any extra computational cost; with the proliferation of publicly available pretrained classifiers, the approach can readily be scaled up to text-to-image generation. Code is available at https://github.com/AlexMaOLS/EluCD/tree/main.
    Abstract Guidance in conditional diffusion generation is of great importance for sample quality and controllability. However, existing guidance schemes are to be desired. On one hand, mainstream methods such as classifier guidance and classifier-free guidance both require extra training with labeled data, which is time-consuming and unable to adapt to new conditions. On the other hand, training-free methods such as universal guidance, though more flexible, have yet to demonstrate comparable performance. In this work, through a comprehensive investigation into the design space, we show that it is possible to achieve significant performance improvements over existing guidance schemes by leveraging off-the-shelf classifiers in a training-free fashion, enjoying the best of both worlds. Employing calibration as a general guideline, we propose several pre-conditioning techniques to better exploit pretrained off-the-shelf classifiers for guiding diffusion generation. Extensive experiments on ImageNet validate our proposed method, showing that state-of-the-art diffusion models (DDPM, EDM, DiT) can be further improved (up to 20%) using off-the-shelf classifiers with barely any extra computational cost. With the proliferation of publicly available pretrained classifiers, our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks. The code is available at https://github.com/AlexMaOLS/EluCD/tree/main.
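For background, vanilla classifier guidance shifts each denoising step by the gradient of a classifier's log-probability of the target class; the paper's contribution is calibration-style pre-conditioning that makes an off-the-shelf, clean-image classifier effective in this role. Below is a generic sketch of one guided noise prediction using a predicted-x0 trick so that a clean-image classifier can be plugged in; this is an illustrative assumption, not the authors' exact pre-conditioning.

```python
import torch

def guided_epsilon(eps_model, classifier, x_t, t, y, alpha_bar_t, guidance_scale=2.0):
    """Classifier-guided noise prediction:
    eps_hat = eps - scale * sqrt(1 - alpha_bar_t) * grad_x log p(y | x)."""
    eps = eps_model(x_t, t)
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        # predicted clean image x0 from the current noisy sample (DDPM parameterization)
        x0_pred = (x_in - (1 - alpha_bar_t) ** 0.5 * eps_model(x_in, t)) / alpha_bar_t ** 0.5
        log_prob = torch.log_softmax(classifier(x0_pred), dim=-1)
        selected = log_prob[torch.arange(len(y)), y].sum()
        grad = torch.autograd.grad(selected, x_in)[0]
    return eps - guidance_scale * (1 - alpha_bar_t) ** 0.5 * grad

# toy usage with dummy networks
eps_model = lambda x, t: torch.zeros_like(x)
classifier = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10))
x_t = torch.randn(4, 3, 8, 8)
eps_hat = guided_epsilon(eps_model, classifier, x_t, t=torch.zeros(4),
                         y=torch.tensor([1, 2, 3, 4]), alpha_bar_t=torch.tensor(0.5))
```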

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

  • paper_url: http://arxiv.org/abs/2310.11291
  • repo_url: None
  • paper_authors: Zhao Song, Chiwun Yang
  • for: The study aims to improve optimization in neural network training, in particular the convergence issues that the delta-bar-delta learning rate schedule encounters in mini-batch optimization.
  • methods: The work investigates the delta-bar-delta algorithm and proposes RDBD (Regrettable Delta-Bar-Delta), which promptly corrects biased learning rate adjustments and ensures convergence of the optimization process.
  • results: Extensive experimental evaluations show that RDBD speeds up and stabilizes optimization and can be seamlessly integrated with different optimization algorithms.
    Abstract The delta-bar-delta algorithm is recognized as a learning rate adaptation technique that enhances the convergence speed of the training process in optimization by dynamically scheduling the learning rate based on the difference between the current and previous weight updates. While this algorithm has demonstrated strong competitiveness in full data optimization when compared to other state-of-the-art algorithms like Adam and SGD, it may encounter convergence issues in mini-batch optimization scenarios due to the presence of noisy gradients. In this study, we thoroughly investigate the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization. To address any potential convergence challenges, we propose a novel approach called RDBD (Regrettable Delta-Bar-Delta). Our approach allows for prompt correction of biased learning rate adjustments and ensures the convergence of the optimization process. Furthermore, we demonstrate that RDBD can be seamlessly integrated with any optimization algorithm and significantly improve the convergence speed. By conducting extensive experiments and evaluations, we validate the effectiveness and efficiency of our proposed RDBD approach. The results showcase its capability to overcome convergence issues in mini-batch optimization and its potential to enhance the convergence speed of various optimization algorithms. This research contributes to the advancement of optimization techniques in neural network training, providing practitioners with a reliable automatic learning rate scheduler for achieving faster convergence and improved optimization outcomes.
    摘要 delta-bar-delta 算法是一种学习率自适应技术,它根据当前与先前权重更新之差动态调度学习率,从而加快训练过程的收敛;在全量数据优化中,它与 Adam 和 SGD 等先进算法相比表现出很强的竞争力。然而,在小批量优化场景下,由于梯度带有噪声,这种算法可能会遇到收敛问题。在这项研究中,我们对 delta-bar-delta 算法在实际神经网络优化中的收敛行为进行了全面的调查。为了解决可能出现的收敛挑战,我们提出了一种新的方法,即 RDBD(Regrettable Delta-Bar-Delta)。我们的方法可以及时纠正有偏的学习率调整,并确保优化过程的收敛。此外,我们证明了 RDBD 可以轻松地与任何优化算法结合使用,并显著提高收敛速度。通过广泛的实验和评估,我们验证了 RDBD 的有效性和效率:它能够克服小批量优化中的收敛问题,并有潜力提高各种优化算法的收敛速度。这项研究对神经网络训练中优化技术的进步做出了贡献,为实践者提供了一个可靠的自动学习率调度器,以实现更快的收敛和更好的优化结果。
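
下面给出经典 delta-bar-delta 学习率自适应规则(Jacobs 风格)的一个极简 numpy 示意,仅用于说明"按当前梯度与历史平滑梯度是否同号来加/减学习率"这一思想;其中的超参数取值与玩具目标函数均为假设示例,并不包含论文提出的 RDBD 纠偏机制。

```python
import numpy as np

def delta_bar_delta_step(w, grad, lr, delta_bar,
                         kappa=1e-3, phi=0.1, theta=0.7):
    """经典 delta-bar-delta 规则的单步更新(每个参数维护独立学习率):
    当前梯度与历史平滑梯度同号时加法增大学习率,异号时乘法减小。"""
    sign = grad * delta_bar
    lr = np.where(sign > 0, lr + kappa,             # 同号:加法增大
         np.where(sign < 0, lr * (1.0 - phi), lr))  # 异号:乘法减小
    w = w - lr * grad                               # 参数更新
    delta_bar = (1.0 - theta) * grad + theta * delta_bar  # 平滑梯度
    return w, lr, delta_bar

# 玩具示例:最小化 f(w) = 0.5 * ||w||^2,其梯度恰为 w
w = np.array([3.0, -2.0])
lr = np.full_like(w, 0.05)
delta_bar = np.zeros_like(w)
for _ in range(100):
    w, lr, delta_bar = delta_bar_delta_step(w, w.copy(), lr, delta_bar)
print(w)  # 应收敛到 0 附近
```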

Evaluating the Impact of Humanitarian Aid on Food Security

  • paper_url: http://arxiv.org/abs/2310.11287
  • repo_url: None
  • paper_authors: Jordi Cerdà-Bautista, José María Tárraga, Vasileios Sitokonstantinou, Gustau Camps-Valls
  • for: 评估气候变化引起的干旱地区食品安全问题,需要紧急人道主义帮助。
  • methods: 该文章提出了一种 causal inference 框架,用于评估投入到食品危机中的金钱支付政策的影响。文章包括识别食品安全系统中的 causal 关系、完善全面的数据库、和估计人道援助政策对于营养不良的 causal 效应。
  • results: 研究发现,这些投入没有显著影响,可能因为样本规模太小、数据质量不佳、和 causal 图不完善,即我们对多学科系统食品安全的理解还有限。这说明需要进一步提高数据收集和精细化 causal 模型,以便更有效地进行未来的投入和政策,提高人道援助的透明度和责任感。
    Abstract In the face of climate change-induced droughts, vulnerable regions encounter severe threats to food security, demanding urgent humanitarian assistance. This paper introduces a causal inference framework for the Horn of Africa, aiming to assess the impact of cash-based interventions on food crises. Our contributions encompass identifying causal relationships within the food security system, harmonizing a comprehensive database, and estimating the causal effect of humanitarian interventions on malnutrition. Our results revealed no significant effects, likely due to limited sample size, suboptimal data quality, and an imperfect causal graph resulting from our limited understanding of multidisciplinary systems like food security. This underscores the need to enhance data collection and refine causal models with domain experts for more effective future interventions and policies, improving transparency and accountability in humanitarian aid.
    摘要 面对气候变化引发的干旱,脆弱地区的粮食安全受到严重威胁,亟需人道主义援助。本文针对非洲之角提出了一种因果推断框架,旨在评估现金援助干预对粮食危机的影响。我们的贡献包括:识别粮食安全系统中的因果关系、整合一个全面的数据库,以及估计人道主义干预对营养不良的因果效应。结果显示没有发现显著效应,这很可能源于样本量有限、数据质量欠佳,以及由于我们对粮食安全这类多学科系统理解有限而导致的不完善因果图。这凸显了与领域专家一起加强数据收集、完善因果模型的必要性,以便未来的干预和政策更加有效,并提高人道主义援助的透明度与问责性。

Self-supervision meets kernel graph neural models: From architecture to augmentations

  • paper_url: http://arxiv.org/abs/2310.11281
  • repo_url: None
  • paper_authors: Jiawang Dan, Ruofan Wu, Yunpeng Liu, Baokun Wang, Changhua Meng, Tengfei Liu, Tianyi Zhang, Ningtao Wang, Xing Fu, Qi Li, Weiqiang Wang
  • for: 这 paper 的目的是提高 kernel graph neural networks (KGNNs) 的设计和学习方法,以提高 graph representation learning 的效果。
  • methods: 本 paper 使用了一种更加灵活的 graph-level similarity定义,以及一种更加简洁的优化目标函数,以解决 MPNNs 中的一些挑战。此外,paper 还提出了一种新的自我监督学习方法 called latent graph augmentation (LGA),以提高 KGNNs 的表达能力。
  • results: 实验结果表明,提出的方法可以在 graph classification 任务中达到有竞争力的性能,在部分任务上甚至优于现有的最先进方法。此外,与其他已有的 graph data augmentation 方法相比,LGA 增强方案能够更好地捕捉 graph-level 的不变语义。
    Abstract Graph representation learning has now become the de facto standard when handling graph-structured data, with the framework of message-passing graph neural networks (MPNN) being the most prevailing algorithmic tool. Despite its popularity, the family of MPNNs suffers from several drawbacks such as transparency and expressivity. Recently, the idea of designing neural models on graphs using the theory of graph kernels has emerged as a more transparent as well as sometimes more expressive alternative to MPNNs known as kernel graph neural networks (KGNNs). Developments on KGNNs are currently a nascent field of research, leaving several challenges from algorithmic design and adaptation to other learning paradigms such as self-supervised learning. In this paper, we improve the design and learning of KGNNs. Firstly, we extend the algorithmic formulation of KGNNs by allowing a more flexible graph-level similarity definition that encompasses former proposals like random walk graph kernel, as well as providing a smoother optimization objective that alleviates the need of introducing combinatorial learning procedures. Secondly, we enhance KGNNs through the lens of self-supervision via developing a novel structure-preserving graph data augmentation method called latent graph augmentation (LGA). Finally, we perform extensive empirical evaluations to demonstrate the efficacy of our proposed mechanisms. Experimental results over benchmark datasets suggest that our proposed model achieves competitive performance that is comparable to or sometimes outperforming state-of-the-art graph representation learning frameworks with or without self-supervision on graph classification tasks. Comparisons against other previously established graph data augmentation methods verify that the proposed LGA augmentation scheme captures better semantics of graph-level invariance.
    摘要 图表示学习现已成为处理图结构数据的标准方法,其中消息传递图神经网络(MPNN)框架是最具影响力的算法工具。然而,MPNN 家族存在透明度和表达能力等方面的不足。近年来,基于图核(graph kernel)理论在图上设计神经模型的思路被认为是比 MPNN 更透明、有时也更具表达能力的替代方案,称为核图神经网络(KGNN)。KGNN 的发展目前仍处于起步阶段,在算法设计以及向自监督学习等其他学习范式的适配方面仍有许多挑战。在这篇论文中,我们改进了 KGNN 的设计和学习。首先,我们扩展了 KGNN 的算法表述,允许更灵活的图级相似性定义,涵盖随机游走图核等既有方案,并提供更平滑的优化目标,以避免引入组合式学习过程。其次,我们从自监督的视角增强 KGNN,提出了一种新的保结构图数据增强方法,即隐式图增强(latent graph augmentation,LGA)。最后,我们进行了广泛的实验评估,以证明所提机制的有效性。实验结果表明,无论是否使用自监督,我们的模型在图分类任务上都能达到与最先进图表示学习框架相当甚至更优的表现。与其他既有的图数据增强方法进行比较,也验证了 LGA 增强方案能更好地捕捉图级不变性的语义。
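
作为摘要中提到的"随机游走图核"这一既有图级相似度的参考,下面给出几何随机游走图核的一个朴素 numpy 示意(基于直积图的闭式求解);衰减系数 lam 的选取方式与示例图均为假设,且该实现与论文提出的更灵活的图级相似度并无直接对应。

```python
import numpy as np

def random_walk_kernel(A1, A2, lam=None):
    """几何随机游走图核的朴素实现:
    k(G1, G2) = 1^T (I - lam * (A1 ⊗ A2))^{-1} 1,
    其中 lam 需小于直积图邻接矩阵谱半径的倒数以保证级数收敛。"""
    Ax = np.kron(A1, A2)                                   # 直积图的邻接矩阵
    rho = max(np.abs(np.linalg.eigvals(Ax)).max(), 1e-12)
    if lam is None:
        lam = 0.9 / rho                                    # 自动取一个可收敛的衰减系数
    n = Ax.shape[0]
    ones = np.ones(n)
    return float(ones @ np.linalg.solve(np.eye(n) - lam * Ax, ones))

# 两个小图:三角形 K3 与路径图 P3
A_tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(random_walk_kernel(A_tri, A_tri), random_walk_kernel(A_tri, A_path))
```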

Learning to Sample Better

  • paper_url: http://arxiv.org/abs/2310.11232
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Michael S. Albergo, Eric Vanden-Eijnden
  • for: 本文(讲义)介绍了基于测度动态传输的近期生成建模方法,这类方法把来自简单基础分布的样本映射为目标分布的样本。
  • methods: 本文利用 MC 采样生成的数据以变分方式学习该映射,并反过来用学到的映射改进采样效率。
  • results: 本文在MCMC和重要采样方面获得了改进的结果。
    Abstract These lecture notes provide an introduction to recent advances in generative modeling methods based on the dynamical transportation of measures, by means of which samples from a simple base measure are mapped to samples from a target measure of interest. Special emphasis is put on the applications of these methods to Monte-Carlo (MC) sampling techniques, such as importance sampling and Markov Chain Monte-Carlo (MCMC) schemes. In this context, it is shown how the maps can be learned variationally using data generated by MC sampling, and how they can in turn be used to improve such sampling in a positive feedback loop.
    摘要 这份讲义介绍了基于测度动态传输的生成建模方法的最新进展:通过学习一个传输映射,将来自简单基础分布的样本映射为感兴趣的目标分布的样本。讲义特别强调这些方法在蒙特卡洛(MC)采样技术中的应用,例如重要性采样和马尔可夫链蒙特卡洛(MCMC)方案。在这一背景下,讲义展示了如何利用 MC 采样生成的数据以变分方式学习这些映射,并反过来利用学到的映射改进采样,形成正反馈循环。
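
为了说明讲义所讨论的蒙特卡洛采样技术之一,下面是自归一化重要性采样的最小 numpy 示意:以标准正态作为提议分布、一个玩具非归一化密度作为目标分布(两者均为假设);实际方法中提议分布/映射可由学习到的传输模型给出。

```python
import numpy as np

rng = np.random.default_rng(0)

# 目标分布:未归一化对数密度 log p(x) = -x^2/2 + log(1 + sin(3x)^2)
def log_p(x):
    return -0.5 * x**2 + np.log1p(np.sin(3.0 * x)**2)

# 提议(基础)分布:标准正态
x = rng.standard_normal(20000)
log_q = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)

# 自归一化重要性权重(目标密度无需归一化常数)
log_w = log_p(x) - log_q
w = np.exp(log_w - log_w.max())
w /= w.sum()

est = np.sum(w * x**2)          # 目标分布下 E[x^2] 的估计
ess = 1.0 / np.sum(w**2)        # 有效样本量,用于诊断提议分布的质量
print(f"E[x^2] ≈ {est:.3f}, ESS ≈ {ess:.0f} / {x.size}")
```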

Zipformer: A faster and better encoder for automatic speech recognition

  • paper_url: http://arxiv.org/abs/2310.11230
  • repo_url: https://github.com/k2-fsa/icefall
  • paper_authors: Zengwei Yao, Liyong Guo, Xiaoyu Yang, Wei Kang, Fangjun Kuang, Yifan Yang, Zengrui Jin, Long Lin, Daniel Povey
  • for: 这个论文是为了提出一种更快速、更具有内存效率的 transformer 模型,即 Zipformer,用于自动语音识别(ASR)。
  • methods: 该模型使用了以下方法:1)U-Net-like Encoder结构,中间堆叠在较低帧率下运行; 2)重新排序块结构,增加更多模块,并在每个模块中重用注意力权重以实现更好的效率; 3)修改了 LayerNorm 为 BiasNorm,以保留一些长度信息; 4)新的激活函数 SwooshR 和 SwooshL 比 Swish 更好。
  • results: 对 LibriSpeech、Aishell-1 和 WenetSpeech 数据集进行了广泛的实验,证明了 Zipformer 模型优于其他最先进的 ASR 模型。
    Abstract The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better-performing transformer, called Zipformer. Modeling changes include: 1) a U-Net-like encoder structure where middle stacks operate at lower frame rates; 2) reorganized block structure with more modules, within which we re-use attention weights for efficiency; 3) a modified form of LayerNorm called BiasNorm allows us to retain some length information; 4) new activation functions SwooshR and SwooshL work better than Swish. We also propose a new optimizer, called ScaledAdam, which scales the update by each tensor's current scale to keep the relative change about the same, and also explictly learns the parameter scale. It achieves faster convergence and better performance than Adam. Extensive experiments on LibriSpeech, Aishell-1, and WenetSpeech datasets demonstrate the effectiveness of our proposed Zipformer over other state-of-the-art ASR models. Our code is publicly available at https://github.com/k2-fsa/icefall.
    摘要 Conformer 已成为自动语音识别(ASR)最流行的编码器模型,它在 Transformer 中加入卷积模块,以同时学习局部和全局依赖关系。在这项工作中,我们描述了一种更快、更省内存且性能更好的 Transformer,即 Zipformer。模型上的改动包括:1)类 U-Net 的编码器结构,中间的堆叠以较低的帧率运行;2)重新组织的块结构,包含更多模块,并在模块间复用注意力权重以提高效率;3)改进的 LayerNorm 形式 BiasNorm,以保留部分长度信息;4)新的激活函数 SwooshR 和 SwooshL,效果优于 Swish。我们还提出了一种新的优化器 ScaledAdam,它按每个张量当前的尺度缩放更新量,使相对变化保持大致一致,并显式地学习参数尺度;它比 Adam 收敛更快、性能更好。在 LibriSpeech、Aishell-1 和 WenetSpeech 数据集上的大量实验证明了我们提出的 Zipformer 优于其他最先进的 ASR 模型。我们的代码公开在 https://github.com/k2-fsa/icefall。

Federated Learning with Nonvacuous Generalisation Bounds

  • paper_url: http://arxiv.org/abs/2310.11203
  • repo_url: None
  • paper_authors: Pierre Jobic, Maxime Haddouche, Benjamin Guedj
  • for: 本研究旨在采用隐私保护技术来实现 federated learning,每个节点都保持着自己的训练数据私有,而不会泄露给其他节点。
  • methods: 本研究使用随机生成器来训练 federated learning 模型,每个节点都生成了一个本地隐私predictor,而不把训练数据分享给其他节点。然后,我们建立了一个全局的随机生成器,该随机生成器继承了本地私有predictor的性质,即PAC-Bayesian泛化 bound。
  • results: 我们通过一系列数值实验显示,我们的方法可以在不共享数据集的情况下,达到与批处理方法(所有数据集在节点间共享)相当的预测性能;同时预测器伴随数值上非空洞(nonvacuous)的泛化界,并保护每个节点的隐私。我们还显式计算了批处理与联邦设置之间预测性能和泛化界的差距,以凸显为保护隐私所付出的代价。
    Abstract We introduce a novel strategy to train randomised predictors in federated learning, where each node of the network aims at preserving its privacy by releasing a local predictor but keeping secret its training dataset with respect to the other nodes. We then build a global randomised predictor which inherits the properties of the local private predictors in the sense of a PAC-Bayesian generalisation bound. We consider the synchronous case where all nodes share the same training objective (derived from a generalisation bound), and the asynchronous case where each node may have its own personalised training objective. We show through a series of numerical experiments that our approach achieves a comparable predictive performance to that of the batch approach where all datasets are shared across nodes. Moreover the predictors are supported by numerically nonvacuous generalisation bounds while preserving privacy for each node. We explicitly compute the increment on predictive performance and generalisation bounds between batch and federated settings, highlighting the price to pay to preserve privacy.
    摘要 我们提出了一种新的策略,用于在联合学习中训练随机预测器,每个网络节点都希望保持自己的隐私,通过发布本地预测器,而不把训练数据集与其他节点分享。然后,我们构建了一个全局随机预测器,该预测器继承了本地私有预测器的性质,即PAC-Bayesian泛化约束。我们考虑了同步和异步两种情况,在同步情况下,所有节点共享同一个训练目标(基于一个泛化约束),在异步情况下,每个节点可能有自己的个性化训练目标。我们通过一系列数值实验表示,我们的方法可以与批处理方法(所有数据集在节点间共享)相比,具有相似的预测性能,同时保持隐私性。我们显式计算了批处理和联合学习之间的增量预测性能和泛化约束,强调保护隐私的代价。
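
为帮助理解摘要中"PAC-Bayesian 泛化界"的含义,下面用 Python 计算经典 McAllester 型 PAC-Bayes 界的数值;示例中的经验风险、KL 散度和样本量均为假设数字,论文中针对联邦/个性化目标的具体界与此不同,这里仅作背景参考。

```python
import numpy as np

def mcallester_bound(emp_risk, kl, n, delta=0.05):
    """经典 McAllester 型 PAC-Bayes 界:以概率至少 1 - delta,
    期望风险 <= 经验风险 + sqrt((KL(rho || pi) + ln(2*sqrt(n)/delta)) / (2n))。"""
    slack = np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))
    return emp_risk + slack

# 假设的数值:经验风险 0.08,后验与先验之间的 KL 为 25,单节点样本量 5000
print(mcallester_bound(0.08, kl=25.0, n=5000))   # ≈ 0.137,属于"非空洞"的界
```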

A Modified EXP3 and Its Adaptive Variant in Adversarial Bandits with Multi-User Delayed Feedback

  • paper_url: http://arxiv.org/abs/2310.11188
  • repo_url: https://github.com/chubbro/mud-exp3
  • paper_authors: Yandi Li, Jianxiong Guo
  • for: 本研究假设了延迟反馈问题中的多用户情况,即每个用户的反馈可能会在不同的延迟时间内提供,而这些延迟时间都是未知的。
  • methods: 我们采用了修改后EXP3算法,称之为MUD-EXP3算法,它在每个轮次中基于不同用户的重要性权重来做决策。
  • results: 我们证明了在已知终止轮次 $T$、用户数 $M$、臂数 $N$ 以及延迟上界 $d_{max}$ 的情况下,我们的算法的遗憾(regret)为 $\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$。此外,对于更常见的 $T$ 未知情形,我们还提出了一种自适应算法 AMUD-EXP3,其遗憾关于 $T$ 是次线性的。最后,我们进行了广泛的实验来验证算法的正确性和有效性。
    Abstract For the adversarial multi-armed bandit problem with delayed feedback, we consider that the delayed feedback results are from multiple users and are unrestricted on internal distribution. As the player picks an arm, feedback from multiple users may not be received instantly yet after an arbitrary delay of time which is unknown to the player in advance. For different users in a round, the delays in feedback have no latent correlation. Thus, we formulate an adversarial multi-armed bandit problem with multi-user delayed feedback and design a modified EXP3 algorithm named MUD-EXP3, which makes a decision at each round by considering the importance-weighted estimator of the received feedback from different users. On the premise of known terminal round index $T$, the number of users $M$, the number of arms $N$, and upper bound of delay $d_{max}$, we prove a regret of $\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$. Furthermore, for the more common case of unknown $T$, an adaptive algorithm named AMUD-EXP3 is proposed with a sublinear regret with respect to $T$. Finally, extensive experiments are conducted to indicate the correctness and effectiveness of our algorithms.
    摘要 针对具有延迟反馈的对抗性多臂赌博机问题,我们考虑延迟反馈来自多个用户、且其内部分布不受限制的情形。玩家选择某个臂后,来自多个用户的反馈不会立即到达,而是在玩家事先未知的任意延迟之后才收到;同一轮中不同用户的反馈延迟之间没有潜在相关性。因此,我们将其形式化为多用户延迟反馈下的对抗性多臂赌博机问题,并设计了一种改进的 EXP3 算法 MUD-EXP3:它在每一轮依据来自不同用户反馈的重要性加权估计量做出决策。在已知终止轮次 $T$、用户数 $M$、臂数 $N$ 和延迟上界 $d_{max}$ 的前提下,我们证明了 $\mathcal{O}(\sqrt{TM^2\ln{N}(N\mathrm{e}+4d_{max})})$ 的遗憾界。此外,针对更常见的 $T$ 未知情形,我们提出了自适应算法 AMUD-EXP3,其遗憾关于 $T$ 是次线性的。最后,大量实验验证了所提算法的正确性与有效性。
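
作为背景,下面给出标准 EXP3 算法的一个极简实现(玩具对抗性奖励环境为假设),用于说明"重要性加权奖励估计 + 指数权重更新"的基本机制;论文中的 MUD-EXP3 / AMUD-EXP3 针对多用户延迟反馈做了进一步修改,此处未包含。

```python
import numpy as np

def exp3(reward_fn, N, T, gamma=0.1, seed=0):
    """标准 EXP3:混合均匀探索 + 重要性加权奖励估计 + 指数权重更新。
    reward_fn(t, arm) 返回 [0, 1] 区间内的(可能对抗性的)奖励。"""
    rng = np.random.default_rng(seed)
    w = np.ones(N)
    total = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / N
        arm = int(rng.choice(N, p=p))
        x = reward_fn(t, arm)
        total += x
        xhat = x / p[arm]                       # 只对被选中的臂做无偏奖励估计
        w[arm] *= np.exp(gamma * xhat / N)
        w /= w.max()                            # 归一化保证数值稳定(不改变 p)
    return total

# 玩具环境:第 2 号臂的期望奖励最高
def rewards(t, a):
    return float(np.random.default_rng(t * 7 + a).random() < [0.3, 0.5, 0.7][a])

print(exp3(rewards, N=3, T=5000))
```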

Efficiently Visualizing Large Graphs

  • paper_url: http://arxiv.org/abs/2310.11186
  • repo_url: https://github.com/charlie-xiao/embedding-visualization-test
  • paper_authors: Xinyu Li, Yao Xiao, Yuchen Zhou
  • for: 这篇论文提出了一种基于降维的图可视化方法,用于清晰地展示图的结构。
  • methods: 该方法基于t-SNE算法,但是它采用了邻域结构来降低时间复杂度,从而支持更大的图。此外,该方法还结合了laplacian eigenmaps和最短路算法,以获得高维度的图嵌入。
  • results: 通过使用这种方法,可以在 5 分钟内可视化 300K 个节点和 1M 条边的图,并在可视化质量上取得约 10% 的提升。代码和数据可以在 https://github.com/Charlie-XIAO/embedding-visualization-test 上获取。
    Abstract Most existing graph visualization methods based on dimension reduction are limited to relatively small graphs due to performance issues. In this work, we propose a novel dimension reduction method for graph visualization, called t-Distributed Stochastic Graph Neighbor Embedding (t-SGNE). t-SGNE is specifically designed to visualize cluster structures in the graph. As a variant of the standard t-SNE method, t-SGNE avoids the time-consuming computations of pairwise similarity. Instead, it uses the neighbor structures of the graph to reduce the time complexity from quadratic to linear, thus supporting larger graphs. In addition, to suit t-SGNE, we combined Laplacian Eigenmaps with the shortest path algorithm in graphs to form the graph embedding algorithm ShortestPath Laplacian Eigenmaps Embedding (SPLEE). Performing SPLEE to obtain a high-dimensional embedding of the large-scale graph and then using t-SGNE to reduce its dimension for visualization, we are able to visualize graphs with up to 300K nodes and 1M edges within 5 minutes and achieve approximately 10% improvement in visualization quality. Codes and data are available at https://github.com/Charlie-XIAO/embedding-visualization-test.
    摘要 现有的基于降维的图可视化方法由于性能问题,通常只能处理相对较小的图。在这项工作中,我们提出了一种新的用于图可视化的降维方法,即 t-Distributed Stochastic Graph Neighbor Embedding(t-SGNE)。t-SGNE 专门用于展现图中的簇结构。作为标准 t-SNE 方法的变体,t-SGNE 避免了对两两相似度的耗时计算,转而利用图中的邻居结构,将时间复杂度从二次降至线性,从而支持更大规模的图。此外,为了配合 t-SGNE,我们将拉普拉斯特征映射与图上的最短路径算法结合,构成图嵌入算法 ShortestPath Laplacian Eigenmaps Embedding(SPLEE)。先用 SPLEE 得到大规模图的高维嵌入,再用 t-SGNE 将其降维进行可视化,我们能够在 5 分钟内可视化 300K 个节点、1M 条边的图,并取得约 10% 的可视化质量提升。代码和数据见 https://github.com/Charlie-XIAO/embedding-visualization-test。
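
下面是拉普拉斯特征映射(Laplacian Eigenmaps)的最小 numpy 示意,用来说明 SPLEE 所依托的谱嵌入思想;示例图与嵌入维度均为假设,且未包含论文中与最短路径算法的结合。

```python
import numpy as np

def laplacian_eigenmaps(A, dim=2):
    """拉普拉斯特征映射的最小示意:取归一化拉普拉斯矩阵最小的几个
    非平凡特征向量作为节点的低维嵌入(跳过特征值为 0 的平凡方向)。"""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)             # 特征值按升序排列
    return vecs[:, 1:dim + 1]

# 玩具图:两个三角形由一条边相连,嵌入应能把两个簇分开
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
print(laplacian_eigenmaps(A, dim=2))
```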

Serenade: A Model for Human-in-the-loop Automatic Chord Estimation

  • paper_url: http://arxiv.org/abs/2310.11165
  • repo_url: None
  • paper_authors: Hendrik Vincent Koops, Gianluca Micchi, Ilaria Manco, Elio Quinton
  • for: 这篇论文的目的是提高音乐信息检索(MIR)任务(如自动分割、语料分析和自动和弦标签估计)中和声分析的质量。
  • methods: 这篇论文提出了一种人类与自回归模型共同创建和声标注的新方法:模型先自动生成和声预测,人类对模型置信度较低的部分进行稀疏标注,模型再根据人类指导修正其预测。
  • results: 这 paper 在一个流行音乐数据集上进行了评估,并显示了人类和模型共同创建和声标注的方法可以提高和声分析性能,并且人类的贡献被模型的第二次、受限预测所强调。
    Abstract Computational harmony analysis is important for MIR tasks such as automatic segmentation, corpus analysis and automatic chord label estimation. However, recent research into the ambiguous nature of musical harmony, causing limited inter-rater agreement, has made apparent that there is a glass ceiling for common metrics such as accuracy. Commonly, these issues are addressed either in the training data itself by creating majority-rule annotations or during the training phase by learning soft targets. We propose a novel alternative approach in which a human and an autoregressive model together co-create a harmonic annotation for an audio track. After automatically generating harmony predictions, a human sparsely annotates parts with low model confidence and the model then adjusts its predictions following human guidance. We evaluate our model on a dataset of popular music and we show that, with this human-in-the-loop approach, harmonic analysis performance improves over a model-only approach. The human contribution is amplified by the second, constrained prediction of the model.
    摘要 计算和声分析对自动分割、语料分析和自动和弦标签估计等音乐信息检索(MIR)任务十分重要。然而,近期关于音乐和声歧义性的研究表明,标注者之间的一致性有限,使得准确率等常用指标存在难以突破的"玻璃天花板"。这些问题通常要么通过在训练数据中构造多数规则标注来解决,要么在训练阶段通过学习软目标来解决。我们提出了一种新的替代方案:由人类与自回归模型共同为音频曲目创建和声标注。模型先自动生成和声预测,随后人类对模型置信度较低的部分进行稀疏标注,模型再根据人类指导调整其预测。我们在一个流行音乐数据集上评估了该模型,结果表明这种人在回路(human-in-the-loop)的方式相比纯模型方法能够提升和声分析性能,而人类的贡献通过模型的第二次受约束预测得到放大。

A new high-resolution indoor radon map for Germany using a machine learning based probabilistic exposure model

  • paper_url: http://arxiv.org/abs/2310.11143
  • repo_url: None
  • paper_authors: Eric Petermann, Peter Bossew, Joachim Kemski, Valeria Gruber, Nils Suhr, Bernd Hoffmann
  • for: 这项研究旨在更准确地估计室内氡(Radon)浓度分布,并实现高空间分辨率的估计。
  • methods: 这项研究使用了两阶段建模方法:先以环境与建筑数据为预测变量,用分位数回归森林(quantile regression forest)估计每栋住宅每个楼层室内氡浓度的概率分布;再用概率性的蒙特卡洛抽样对楼层级预测进行组合与人口加权。
  • results: 研究结果显示,室内氡浓度的算术平均值为 63 Bq/m3,几何平均值为 41 Bq/m3,95 百分位数为 180 Bq/m3;超过 100 Bq/m3 和 300 Bq/m3 的概率分别为 12.5%(约 1050 万人)和 2.2%(约 190 万人)。由于人口在楼层上的分布不同,大城市中个人的室内氡暴露通常低于乡村地区。
    Abstract Radon is a carcinogenic, radioactive gas that can accumulate indoors. Indoor radon exposure at the national scale is usually estimated on the basis of extensive measurement campaigns. However, characteristics of the sample often differ from the characteristics of the population due to the large number of relevant factors such as the availability of geogenic radon or floor level. Furthermore, the sample size usually does not allow exposure estimation with high spatial resolution. We propose a model-based approach that allows a more realistic estimation of indoor radon distribution with a higher spatial resolution than a purely data-based approach. We applied a two-stage modelling approach: 1) a quantile regression forest using environmental and building data as predictors was applied to estimate the probability distribution function of indoor radon for each floor level of each residential building in Germany; (2) a probabilistic Monte Carlo sampling technique enabled the combination and population weighting of floor-level predictions. In this way, the uncertainty of the individual predictions is effectively propagated into the estimate of variability at the aggregated level. The results give an arithmetic mean of 63 Bq/m3, a geometric mean of 41 Bq/m3 and a 95 %ile of 180 Bq/m3. The exceedance probability for 100 Bq/m3 and 300 Bq/m3 are 12.5 % (10.5 million people) and 2.2 % (1.9 million people), respectively. In large cities, individual indoor radon exposure is generally lower than in rural areas, which is a due to the different distribution of the population on floor levels. The advantages of our approach are 1) an accurate exposure estimation even if the survey was not fully representative with respect to the main controlling factors, and 2) an estimate of the exposure distribution with a much higher spatial resolution than basic descriptive statistics.
    摘要 氡(Radon)是一种致癌的放射性气体,可在室内积聚。全国尺度的室内氡暴露通常基于大规模测量活动来估计。然而,由于地质氡源的可得性、楼层等诸多相关因素,样本特征往往与人口特征存在差异;而且样本量通常不足以支撑高空间分辨率的暴露估计。我们提出了一种基于模型的方法,相比纯数据驱动的方法,能以更高的空间分辨率、更符合实际地估计室内氡分布。我们采用两阶段建模:1)以环境和建筑数据为预测变量,用分位数回归森林估计德国每栋住宅每个楼层室内氡的概率分布函数;2)用概率性的蒙特卡洛抽样技术对楼层级预测进行组合与人口加权,从而把单个预测的不确定性有效传递到聚合层面的变异性估计中。结果显示:算术平均值为 63 Bq/m3,几何平均值为 41 Bq/m3,95 百分位数为 180 Bq/m3;超过 100 Bq/m3 和 300 Bq/m3 的概率分别为 12.5%(约 1050 万人)和 2.2%(约 190 万人)。由于人口在楼层上的分布不同,大城市中的个人室内氡暴露一般低于乡村地区。该方法的优点是:1)即使调查在主要控制因素上不完全具有代表性,也能给出准确的暴露估计;2)能够以远高于基本描述统计的空间分辨率估计暴露分布。
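
下面的 numpy 片段示意论文第二阶段"概率蒙特卡洛抽样 + 人口加权"的思路:从各楼层的预测分位数近似采样,再按人口加权估计超标概率。其中的分位数、人口数与逆变换采样方式均为假设的玩具设定,并非论文的实际模型。

```python
import numpy as np

rng = np.random.default_rng(42)

# 假设的输入:每个楼层室内氡浓度的预测分位数(对应分位数回归森林的输出)与居住人口
quantile_levels = np.array([0.05, 0.25, 0.50, 0.75, 0.95])
pred_quantiles = np.array([[15, 28, 45,  80, 170],    # 楼层 A
                           [10, 20, 35,  60, 120],    # 楼层 B
                           [25, 50, 90, 160, 320]])   # 楼层 C
population = np.array([120, 300, 80])

def sample_from_quantiles(q_levels, q_values, n_draws, rng):
    """用预测分位数做粗略的逆变换抽样,近似每个楼层的浓度分布。"""
    u = rng.random(n_draws)
    return np.interp(u, q_levels, q_values)

draws = np.stack([sample_from_quantiles(quantile_levels, q, 10000, rng)
                  for q in pred_quantiles])           # (楼层数, 抽样数)

weights = population / population.sum()               # 人口权重
for thr in (100, 300):
    p_exceed = np.sum(weights * (draws > thr).mean(axis=1))
    print(f"P(浓度 > {thr} Bq/m3) ≈ {p_exceed:.3f}")
```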

Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control

  • paper_url: http://arxiv.org/abs/2310.11138
  • repo_url: None
  • paper_authors: Chao Li, Chen Gong, Qiang He, Xinwen Hou
  • for: 本研究旨在分析深度强化学习(DRL)集成方法取得实证成功的原因,并进一步提升集成策略的鲁棒性与价值函数估计的准确性。
  • methods: 本研究提出了 Trajectories-awarE Ensemble exploratioN(TEEN)算法,其目的是在最大化期望回报的同时促进更多样的轨迹,从而提升集成策略的表现。
  • results: 实验结果显示,TEEN 可以提高 ensemble 政策的表现,比基eline ensemble DRL 算法高出41%。
    Abstract The combination of deep reinforcement learning (DRL) with ensemble methods has been proved to be highly effective in addressing complex sequential decision-making problems. This success can be primarily attributed to the utilization of multiple models, which enhances both the robustness of the policy and the accuracy of value function estimation. However, there has been limited analysis of the empirical success of current ensemble RL methods thus far. Our new analysis reveals that the sample efficiency of previous ensemble DRL algorithms may be limited by sub-policies that are not as diverse as they could be. Motivated by these findings, our study introduces a new ensemble RL algorithm, termed \textbf{T}rajectories-awar\textbf{E} \textbf{E}nsemble exploratio\textbf{N} (TEEN). The primary goal of TEEN is to maximize the expected return while promoting more diverse trajectories. Through extensive experiments, we demonstrate that TEEN not only enhances the sample diversity of the ensemble policy compared to using sub-policies alone but also improves the performance over ensemble RL algorithms. On average, TEEN outperforms the baseline ensemble DRL algorithms by 41\% in performance on the tested representative environments.
    摘要 深度强化学习(DRL)与集成方法的结合在复杂的序贯决策问题上表现出很高的效果。这种成功主要归功于多个模型的使用,它增强了策略的鲁棒性和价值函数估计的准确性。然而,目前对现有集成 RL 方法实证成功的分析仍然有限。我们的新分析表明,以往集成 DRL 算法的样本效率可能受限于欠缺多样性的子策略。受这些发现的启发,本研究提出了一种新的集成 RL 算法,称为轨迹感知集成探索(Trajectories-awarE Ensemble exploratioN,TEEN)。TEEN 的主要目标是在最大化期望回报的同时促进更多样的轨迹。大量实验表明,与仅使用子策略相比,TEEN 不仅提升了集成策略的样本多样性,也提升了相对于其他集成 RL 算法的性能。在所测试的代表性环境中,TEEN 平均比基线集成 DRL 算法高出 41% 的性能。

Non-parametric Conditional Independence Testing for Mixed Continuous-Categorical Variables: A Novel Method and Numerical Evaluation

  • paper_url: http://arxiv.org/abs/2310.11132
  • repo_url: None
  • paper_authors: Oana-Iuliana Popescu, Andreas Gerhardus, Jakob Runge
  • for: 本研究关注混合类型(同时包含连续变量与类别变量)数据集上的条件独立性检验(CIT)。
  • methods: 本研究使用条件互信息(CMI)估计器结合局部置换方案进行非参数 CIT,并比较了两种基于 k 近邻(k-NN)的混合类型 CMI 估计器:一种对类别变量做 one-hot 编码后计算距离,另一种用熵项表达 CMI、类别变量仅作为条件出现;在此基础上提出了前者的一个改进变体,不再把类别变量当作数值处理。
  • results: 研究发现,所提出的变体在不同的数据分布和预处理方式下都能更稳健地检测依赖关系。
    Abstract Conditional independence testing (CIT) is a common task in machine learning, e.g., for variable selection, and a main component of constraint-based causal discovery. While most current CIT approaches assume that all variables are numerical or all variables are categorical, many real-world applications involve mixed-type datasets that include numerical and categorical variables. Non-parametric CIT can be conducted using conditional mutual information (CMI) estimators combined with a local permutation scheme. Recently, two novel CMI estimators for mixed-type datasets based on k-nearest-neighbors (k-NN) have been proposed. As with any k-NN method, these estimators rely on the definition of a distance metric. One approach computes distances by a one-hot encoding of the categorical variables, essentially treating categorical variables as discrete-numerical, while the other expresses CMI by entropy terms where the categorical variables appear as conditions only. In this work, we study these estimators and propose a variation of the former approach that does not treat categorical variables as numeric. Our numerical experiments show that our variant detects dependencies more robustly across different data distributions and preprocessing types.
    摘要 条件独立性检验(CIT)是机器学习中的常见任务,例如用于变量选择,也是基于约束的因果发现的核心组成部分。现有的大多数 CIT 方法假设所有变量都是数值型或都是类别型,但许多实际应用涉及同时包含数值变量和类别变量的混合类型数据集。非参数 CIT 可以通过条件互信息(CMI)估计器结合局部置换方案来进行。最近,有人提出了两种基于 k 近邻(k-NN)的混合类型数据 CMI 估计器。与所有 k-NN 方法一样,这些估计器依赖于距离度量的定义:一种方法对类别变量做 one-hot(独热)编码来计算距离,实质上把类别变量当作离散数值处理;另一种方法用熵项来表达 CMI,类别变量只作为条件出现。在这项工作中,我们研究了这些估计器,并提出了前一种方法的变体,不再把类别变量当作数值处理。数值实验表明,我们的变体在不同的数据分布和预处理方式下都能更稳健地检测依赖关系。

FROST: Towards Energy-efficient AI-on-5G Platforms – A GPU Power Capping Evaluation

  • paper_url: http://arxiv.org/abs/2310.11131
  • repo_url: None
  • paper_authors: Ioannis Mavromatis, Stefano De Feo, Pietro Carnelli, Robert J. Piechocki, Aftab Khan
  • for: 该论文面向预计将在未来数年快速增长的开放无线接入网(O-RAN)市场,旨在优化 O-RAN 生态中机器学习(ML)流水线的能耗。
  • methods: 该论文提出了名为 FROST(Flexible Reconfiguration method with Online System Tuning)的解决方案:FROST 能够对 ML 流水线的能耗进行剖析,并据此优化硬件配置,从而限制功耗。
  • results: 根据论文的结果,FROST 可以在不损害模型精度、也不引入明显时间延迟的情况下,实现最高 26.4% 的能耗节省。
    Abstract The Open Radio Access Network (O-RAN) is a burgeoning market with projected growth in the upcoming years. RAN has the highest CAPEX impact on the network and, most importantly, consumes 73% of its total energy. That makes it an ideal target for optimisation through the integration of Machine Learning (ML). However, the energy consumption of ML is frequently overlooked in such ecosystems. Our work addresses this critical aspect by presenting FROST - Flexible Reconfiguration method with Online System Tuning - a solution for energy-aware ML pipelines that adhere to O-RAN's specifications and principles. FROST is capable of profiling the energy consumption of an ML pipeline and optimising the hardware accordingly, thereby limiting the power draw. Our findings indicate that FROST can achieve energy savings of up to 26.4% without compromising the model's accuracy or introducing significant time delays.
    摘要 开放无线接入网(O-RAN)是一个快速发展的市场,预计未来几年将持续增长。RAN 在网络中的资本支出(CAPEX)占比最高,更重要的是它消耗了网络总能耗的 73%,因此是通过融合机器学习(ML)进行优化的理想对象。然而,ML 自身的能耗在这类生态系统中经常被忽略。我们的工作针对这一关键问题,提出了 FROST(Flexible Reconfiguration method with Online System Tuning),一种遵循 O-RAN 规范和原则、具备能耗意识的 ML 流水线解决方案。FROST 能够对 ML 流水线的能耗进行剖析,并据此优化硬件,从而限制功耗。我们的结果表明,FROST 可以在不损害模型精度、也不引入明显时间延迟的情况下,实现最高 26.4% 的能耗节省。

Topological Expressivity of ReLU Neural Networks

  • paper_url: http://arxiv.org/abs/2310.11130
  • repo_url: None
  • paper_authors: Ekin Ergen, Moritz Grillo
  • for: 本研究探讨了ReLU神经网络在二分类问题中的表达能力,从拓扑角度来看。
  • methods: 本研究使用了Betti数来度量神经网络对数据集的拓扑简化程度。
  • results: 研究结果表明,在二分类问题中,深层 ReLU 神经网络的拓扑简化能力比浅层网络呈指数级增强。这为深层神经网络为何能更好地处理复杂且拓扑结构丰富的数据集提供了数学上严格的解释。
    Abstract We study the expressivity of ReLU neural networks in the setting of a binary classification problem from a topological perspective. Recently, empirical studies showed that neural networks operate by changing topology, transforming a topologically complicated data set into a topologically simpler one as it passes through the layers. This topological simplification has been measured by Betti numbers, which are algebraic invariants of a topological space. We use the same measure to establish lower and upper bounds on the topological simplification a ReLU neural network can achieve with a given architecture. We therefore contribute to a better understanding of the expressivity of ReLU neural networks in the context of binary classification problems by shedding light on their ability to capture the underlying topological structure of the data. In particular the results show that deep ReLU neural networks are exponentially more powerful than shallow ones in terms of topological simplification. This provides a mathematically rigorous explanation why deeper networks are better equipped to handle complex and topologically rich datasets.
    摘要 我们从拓扑角度研究 ReLU 神经网络在二分类问题中的表达能力。最近的实证研究表明,神经网络通过改变拓扑来工作:当数据穿过各层时,拓扑上复杂的数据集被转化为拓扑上更简单的数据集。这种拓扑简化可以用 Betti 数来度量,Betti 数是拓扑空间的代数不变量。我们使用同样的度量,为给定架构的 ReLU 神经网络所能达到的拓扑简化建立了下界和上界,从而揭示了 ReLU 网络捕捉数据内在拓扑结构的能力,加深了对其在二分类问题中表达能力的理解。特别地,结果表明在拓扑简化方面,深层 ReLU 神经网络比浅层网络强大得多(呈指数级差距),这从数学上严格解释了为何更深的网络更适合处理复杂且拓扑结构丰富的数据集。

On the Temperature of Bayesian Graph Neural Networks for Conformal Prediction

  • paper_url: http://arxiv.org/abs/2310.11479
  • repo_url: None
  • paper_authors: Seohyeon Cha, Honggu Kang, Joonhyuk Kang
  • for: 提高图神经网络(GNN)中的不确定性量化,尤其是在 GNN 被频繁使用的高风险领域。
  • methods: 使用保形预测(Conformal Prediction,CP)框架提供有效的预测集,从而在形式上保证预测集以期望的概率包含真实标签;并在 CP 框架中为贝叶斯 GNN 引入温度参数。
  • results: 实验表明,存在能使预测集更加高效的温度取值;此外,论文还分析了造成低效性的因素,揭示了 CP 性能与模型校准之间的关系。
    Abstract Accurate uncertainty quantification in graph neural networks (GNNs) is essential, especially in high-stakes domains where GNNs are frequently employed. Conformal prediction (CP) offers a promising framework for quantifying uncertainty by providing $\textit{valid}$ prediction sets for any black-box model. CP ensures formal probabilistic guarantees that a prediction set contains a true label with a desired probability. However, the size of prediction sets, known as $\textit{inefficiency}$, is influenced by the underlying model and data generating process. On the other hand, Bayesian learning also provides a credible region based on the estimated posterior distribution, but this region is $\textit{well-calibrated}$ only when the model is correctly specified. Building on a recent work that introduced a scaling parameter for constructing valid credible regions from posterior estimate, our study explores the advantages of incorporating a temperature parameter into Bayesian GNNs within CP framework. We empirically demonstrate the existence of temperatures that result in more efficient prediction sets. Furthermore, we conduct an analysis to identify the factors contributing to inefficiency and offer valuable insights into the relationship between CP performance and model calibration.
    摘要 在图神经网络(GNN)中进行准确的不确定性量化至关重要,尤其是在 GNN 被频繁使用的高风险领域。保形预测(CP)通过为任意黑盒模型提供有效的预测集,为不确定性量化提供了一个有前景的框架:CP 在形式上保证预测集以期望的概率包含真实标签。然而,预测集的大小(即低效性)受底层模型和数据生成过程的影响。另一方面,贝叶斯学习也能基于估计的后验分布给出可信区域,但只有在模型设定正确时该区域才是良好校准的。基于近期一项利用缩放参数从后验估计构造有效可信区域的工作,本研究探索了在 CP 框架中为贝叶斯 GNN 引入温度参数的优势。我们通过实验证明存在能够产生更高效预测集的温度取值。此外,我们还分析了造成低效性的因素,为 CP 性能与模型校准之间的关系提供了有价值的见解。
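
下面给出划分式(split)保形预测在分类任务上的最小示意,并带有一个可调的温度参数 T,用于直观展示温度与预测集大小(低效性)之间的关系;logits 与标签均为随机生成的假设数据,该片段并不涉及论文中的贝叶斯 GNN 或图结构。

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def conformal_sets(cal_logits, cal_labels, test_logits, alpha=0.1, T=1.0):
    """划分式保形预测:以 1 - 温度缩放后真实类别概率 作为非一致性得分,
    在校准集上取保形分位数阈值,再为测试样本构造预测集。"""
    n = len(cal_labels)
    cal_p = softmax(cal_logits, T)
    scores = 1.0 - cal_p[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    qhat = np.sort(scores)[min(k, n) - 1]
    test_p = softmax(test_logits, T)
    return [np.where(1.0 - test_p[i] <= qhat)[0] for i in range(len(test_p))]

# 随机生成的 3 类玩具数据:调节温度 T,可观察预测集大小的变化
rng = np.random.default_rng(1)
cal_logits, test_logits = rng.normal(size=(200, 3)), rng.normal(size=(5, 3))
cal_labels = rng.integers(0, 3, size=200)
for T in (0.5, 1.0, 2.0):
    sizes = [len(s) for s in conformal_sets(cal_logits, cal_labels, test_logits, T=T)]
    print(T, sizes)
```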

Sensitivity-Aware Amortized Bayesian Inference

  • paper_url: http://arxiv.org/abs/2310.11122
  • repo_url: None
  • paper_authors: Lasse Elsemüller, Hans Olischläger, Marvin Schmitt, Paul-Christian Bürkner, Ullrich Köthe, Stefan T. Radev
  • for: 这篇论文的目的是提出一种可以在不同假设下进行 Bayesian 推理的方法,以便更好地了解不同假设下的结果之间的关系。
  • methods: 这篇论文使用了 neural network 来实现 simulation-based inference,并利用 weight sharing 技术来编码结构相似性。
  • results: 该方法可以快速地评估不同假设下的结果之间的敏感性,并且可以使用 neural network ensemble 来评估模型的变化。
    Abstract Bayesian inference is a powerful framework for making probabilistic inferences and decisions under uncertainty. Fundamental choices in modern Bayesian workflows concern the specification of the likelihood function and prior distributions, the posterior approximator, and the data. Each choice can significantly influence model-based inference and subsequent decisions, thereby necessitating sensitivity analysis. In this work, we propose a multifaceted approach to integrate sensitivity analyses into amortized Bayesian inference (ABI, i.e., simulation-based inference with neural networks). First, we utilize weight sharing to encode the structural similarities between alternative likelihood and prior specifications in the training process with minimal computational overhead. Second, we leverage the rapid inference of neural networks to assess sensitivity to various data perturbations or pre-processing procedures. In contrast to most other Bayesian approaches, both steps circumvent the costly bottleneck of refitting the model(s) for each choice of likelihood, prior, or dataset. Finally, we propose to use neural network ensembles to evaluate variation in results induced by unreliable approximation on unseen data. We demonstrate the effectiveness of our method in applied modeling problems, ranging from the estimation of disease outbreak dynamics and global warming thresholds to the comparison of human decision-making models. Our experiments showcase how our approach enables practitioners to effectively unveil hidden relationships between modeling choices and inferential conclusions.
    摘要 贝叶斯推理是在不确定性下进行概率推断和决策的强大框架。现代贝叶斯工作流程中的基本选择包括似然函数与先验分布的设定、后验近似器以及数据本身。每一项选择都可能显著影响基于模型的推断及后续决策,因此需要敏感性分析。在这项工作中,我们提出了一种多方面的方法,将敏感性分析整合进摊销式贝叶斯推理(ABI,即基于神经网络的模拟推断)。首先,我们利用权重共享,在训练过程中以极小的计算开销编码不同似然与先验设定之间的结构相似性。其次,我们利用神经网络的快速推断,评估结果对各种数据扰动或预处理流程的敏感性。与大多数其他贝叶斯方法不同,这两个步骤都避开了为每一种似然、先验或数据集选择重新拟合模型这一代价高昂的瓶颈。最后,我们建议使用神经网络集成来评估在未见数据上由不可靠近似引起的结果变动。我们在多个应用建模问题上展示了该方法的有效性,涵盖疾病爆发动态与全球变暖阈值的估计,以及人类决策模型的比较。实验表明,我们的方法能帮助实践者有效揭示建模选择与推断结论之间的潜在关系。

Minimally Informed Linear Discriminant Analysis: training an LDA model with unlabelled data

  • paper_url: http://arxiv.org/abs/2310.11110
  • repo_url: None
  • paper_authors: Nicolas Heintz, Tom Francart, Alexander Bertrand
  • for: 这篇论文说明了在只有无标签数据的情况下,如何训练线性判别分析(LDA)模型用于分类。
  • methods: 这篇论文提出了名为 Minimally Informed Linear Discriminant Analysis(MILDA)的模型,只需极少量的先验信息,即可在没有标签数据的情况下计算出 LDA 投影向量。
  • results: 该 paper 的实验结果表明,MILDA 模型可以准确地模型分类问题,并且可以快速适应非站ARY 数据,这使得它成为一个可靠的 adaptive classifier。
    Abstract Linear Discriminant Analysis (LDA) is one of the oldest and most popular linear methods for supervised classification problems. In this paper, we demonstrate that it is possible to compute the exact projection vector from LDA models based on unlabelled data, if some minimal prior information is available. More precisely, we show that only one of the following three pieces of information is actually sufficient to compute the LDA projection vector if only unlabelled data are available: (1) the class average of one of the two classes, (2) the difference between both class averages (up to a scaling), or (3) the class covariance matrices (up to a scaling). These theoretical results are validated in numerical experiments, demonstrating that this minimally informed Linear Discriminant Analysis (MILDA) model closely matches the performance of a supervised LDA model. Furthermore, we show that the MILDA projection vector can be computed in a closed form with a computational cost comparable to LDA and is able to quickly adapt to non-stationary data, making it well-suited to use as an adaptive classifier.
    摘要 线性判别分析(LDA)是最古老、也最常用的监督分类线性方法之一。本文证明:在只有无标签数据的情况下,只要具备一定的最小先验信息,就可以从 LDA 模型中精确计算出投影向量。更确切地说,以下三种信息中只需任意一种即可:(1)其中一个类别的类均值;(2)两个类别均值之差(至多相差一个缩放因子);(3)类别协方差矩阵(至多相差一个缩放因子)。这些理论结果在数值实验中得到了验证,表明这种最小信息线性判别分析(MILDA)模型的性能与监督 LDA 模型十分接近。此外,MILDA 的投影向量可以用闭式解计算,计算代价与 LDA 相当,并且能够快速适应非平稳数据,因此非常适合用作自适应分类器。
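
下面的 numpy 片段示意论文中先验信息情形(2)(仅已知两类均值之差)的核心思想:利用无标签数据的总协方差代替类内协方差,所得投影方向与监督 LDA 方向一致(至多相差缩放)。数据为合成高斯样本,属于说明性假设,并非论文的完整 MILDA 算法。

```python
import numpy as np

rng = np.random.default_rng(0)

# 合成两类高斯数据,但训练时假装只有无标签样本 X 可用
mu0, mu1 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
cov = np.array([[1.0, 0.4], [0.4, 0.8]])
X0 = rng.multivariate_normal(mu0, cov, 500)
X1 = rng.multivariate_normal(mu1, cov, 500)
X = np.vstack([X0, X1])                      # 无标签数据

delta_mu = mu1 - mu0                         # 先验信息之一:两类均值之差(可差一个缩放)

# 仅用无标签数据的总协方差 + 均值差,即可得到与监督 LDA 同方向的投影向量:
# 由 Sherman-Morrison 可知 Sigma_total^{-1} delta_mu 与 Sigma_within^{-1} delta_mu 平行
w_unlabelled = np.linalg.solve(np.cov(X.T), delta_mu)

# 对照:使用标签的监督 LDA 方向
Sw = 0.5 * (np.cov(X0.T) + np.cov(X1.T))
w_supervised = np.linalg.solve(Sw, X1.mean(axis=0) - X0.mean(axis=0))

cos = w_unlabelled @ w_supervised / (np.linalg.norm(w_unlabelled) * np.linalg.norm(w_supervised))
print(f"两个投影方向的余弦相似度 ≈ {cos:.4f}")   # 应非常接近 1
```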

Local Lipschitz Constant Computation of ReLU-FNNs: Upper Bound Computation with Exactness Verification

  • paper_url: http://arxiv.org/abs/2310.11104
  • repo_url: None
  • paper_authors: Yoshio Ebihara, Xin Dai, Victor Magron, Dimitri Peaucelle, Sophie Tarbouriech
  • for: 这篇论文关注激活函数为修正线性单元(ReLU)的前馈神经网络(FNN)的局部 Lipschitz 常数的计算。针对目标输入的局部 Lipschitz 常数是定量评估 FNN 可靠性的一个合理指标。
  • methods: 我们首先将局部 Lipschitz 常数上界的计算问题转化为一个半正定规划问题(SDP),并引入新的 copositive 乘子以准确刻画 ReLU 的行为;随后,通过分析该 SDP 的对偶问题,给出一个可行的检验方法,用于判定所计算的上界是否精确。
  • results: 我们通过数值实验证明了方法的有效性,并通过实际 FNN 的算例验证了方法的可行性。
    Abstract This paper is concerned with the computation of the local Lipschitz constant of feedforward neural networks (FNNs) with activation functions being rectified linear units (ReLUs). The local Lipschitz constant of an FNN for a target input is a reasonable measure for its quantitative evaluation of the reliability. By following a standard procedure using multipliers that capture the behavior of ReLUs,we first reduce the upper bound computation problem of the local Lipschitz constant into a semidefinite programming problem (SDP). Here we newly introduce copositive multipliers to capture the ReLU behavior accurately. Then, by considering the dual of the SDP for the upper bound computation, we second derive a viable test to conclude the exactness of the computed upper bound. However, these SDPs are intractable for practical FNNs with hundreds of ReLUs. To address this issue, we further propose a method to construct a reduced order model whose input-output property is identical to the original FNN over a neighborhood of the target input. We finally illustrate the effectiveness of the model reduction and exactness verification methods with numerical examples of practical FNNs.
    摘要 本文关注以 ReLU 为激活函数的前馈神经网络(FNN)局部 Lipschitz 常数的计算,它是对 FNN 针对目标输入可靠性进行定量评估的合理指标。按照利用乘子刻画 ReLU 行为的标准流程,我们首先把局部 Lipschitz 常数上界的计算问题归结为一个半正定规划问题(SDP),并新引入 copositive 乘子以准确刻画 ReLU 的行为。随后,通过考虑该上界计算 SDP 的对偶问题,我们给出了一个可行的检验,用以判定所得上界是否精确。然而,对于包含数百个 ReLU 的实际 FNN,这些 SDP 难以求解。为此,我们进一步提出了一种构造降阶模型的方法,使其在目标输入邻域内与原 FNN 的输入输出特性一致。最后,我们用实际 FNN 的数值算例展示了模型降阶与精确性验证方法的有效性。
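
作为对照,下面给出 ReLU 前馈网络 Lipschitz 常数的一个众所周知的朴素上界(各层权重谱范数之积)的 numpy 示意;权重矩阵为随机生成的假设。它只是一个粗糙的全局上界,远不如论文中基于 SDP 与 copositive 乘子的局部上界方法精细。

```python
import numpy as np

def naive_lipschitz_upper_bound(weights):
    """ReLU 前馈网络 Lipschitz 常数的朴素全局上界:各层权重谱范数之积
    (ReLU 为 1-Lipschitz)。仅作对照基线使用。"""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, 2)     # 谱范数 = 最大奇异值
    return bound

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(32, 16)),   # 16 -> 32
           rng.normal(scale=0.5, size=(32, 32)),   # 32 -> 32
           rng.normal(scale=0.5, size=(1, 32))]    # 32 -> 1
print(naive_lipschitz_upper_bound(weights))
```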

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

  • paper_url: http://arxiv.org/abs/2310.11096
  • repo_url: https://github.com/samsunglabs/sparse-multi-dnn-scheduling
  • paper_authors: Hongxiang Fan, Stylianos I. Venieris, Alexandros Kouris, Nicholas D. Lane
  • for: 本研究旨在探讨多个稀疏深度神经网络(DNN)在 Edge 设备和数据中心之间的平衡执行。
  • methods: 本文系统分析了多个稀疏 DNN 并行执行的应用场景,并提出了 Dysta 调度器:它同时利用静态稀疏模式与动态稀疏信息,在软件与硬件两个层面协同设计,以改进稀疏多 DNN 工作负载的调度。
  • results: 在覆盖多种部署场景的稀疏多 DNN 基准上,所提方法优于现有最优方法,可将延迟约束违反率降低最多约 10%,并将平均归一化周转时间减少近 4 倍。
    Abstract Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices, such as mobile phones where multiple tasks serve a single user for daily activities, and data centers, where various requests are raised from millions of users, as seen with large language models. To reduce the costly computational and memory requirements of these workloads, various efficient sparsification approaches have been introduced, resulting in widespread sparsity across different types of DNN models. In this context, there is an emerging need for scheduling sparse multi-DNN workloads, a problem that is largely unexplored in previous literature. This paper systematically analyses the use-cases of multiple sparse DNNs and investigates the opportunities for optimizations. Based on these findings, we propose Dysta, a novel bi-level dynamic and static scheduler that utilizes both static sparsity patterns and dynamic sparsity information for the sparse multi-DNN scheduling. Both static and dynamic components of Dysta are jointly designed at the software and hardware levels, respectively, to improve and refine the scheduling approach. To facilitate future progress in the study of this class of workloads, we construct a public benchmark that contains sparse multi-DNN workloads across different deployment scenarios, spanning from mobile phones and AR/VR wearables to data centers. A comprehensive evaluation on the sparse multi-DNN benchmark demonstrates that our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time. Our artifacts and code are publicly available at: https://github.com/SamsungLabs/Sparse-Multi-DNN-Scheduling.
    摘要 并行运行多个深度神经网络(DNN)已成为一类新兴工作负载:在边缘设备(如手机)上,多个任务为同一用户的日常活动服务;在数据中心,数以百万计的用户会发起各种请求(如大语言模型场景)。为了降低这些工作负载高昂的计算和内存开销,业界提出了多种高效的稀疏化方法,使各类 DNN 模型普遍具有稀疏性。在此背景下,稀疏多 DNN 工作负载的调度成为一个新需求,而这一问题在以往文献中鲜有探讨。本文系统分析了多个稀疏 DNN 的应用场景,并考察其中的优化机会。基于这些发现,我们提出了 Dysta——一种新颖的双层动静结合调度器,同时利用静态稀疏模式与动态稀疏信息进行稀疏多 DNN 调度;其静态与动态组件分别在软件和硬件层面协同设计,以改进和细化调度方案。为了推动对这类工作负载的后续研究,我们构建了一个公开基准,涵盖从手机、AR/VR 可穿戴设备到数据中心等不同部署场景下的稀疏多 DNN 工作负载。在该基准上的全面评估表明,所提方法优于现有最优方法,可将延迟约束违反率降低最多 10%,并将平均归一化周转时间减少近 4 倍。代码与相关材料公开于:https://github.com/SamsungLabs/Sparse-Multi-DNN-Scheduling。

Relearning Forgotten Knowledge: on Forgetting, Overfit and Training-Free Ensembles of DNNs

  • paper_url: http://arxiv.org/abs/2310.11094
  • repo_url: None
  • paper_authors: Uri Stern, Daphna Weinshall
  • for: 这个论文的目的是解释深度神经网络中的过拟合现象,并提出一种新的评估过拟合方法。
  • methods: 这篇论文使用了一种新的评估过拟合方法,它基于 validate 数据集中模型忘记率的评估。
  • results: 该论文的实验结果表明,无论验证准确率是否下降,过拟合都可能在训练过程中发生,且可能比先前认为的更为常见。此外,论文还提出了一种仅基于单个网络训练历史的新集成方法,可在不增加训练时间成本的情况下提升性能。
    Abstract The infrequent occurrence of overfit in deep neural networks is perplexing. On the one hand, theory predicts that as models get larger they should eventually become too specialized for a specific training set, with ensuing decrease in generalization. In contrast, empirical results in image classification indicate that increasing the training time of deep models or using bigger models almost never hurts generalization. Is it because the way we measure overfit is too limited? Here, we introduce a novel score for quantifying overfit, which monitors the forgetting rate of deep models on validation data. Presumably, this score indicates that even while generalization improves overall, there are certain regions of the data space where it deteriorates. When thus measured, we show that overfit can occur with and without a decrease in validation accuracy, and may be more common than previously appreciated. This observation may help to clarify the aforementioned confusing picture. We use our observations to construct a new ensemble method, based solely on the training history of a single network, which provides significant improvement in performance without any additional cost in training time. An extensive empirical evaluation with modern deep models shows our method's utility on multiple datasets, neural networks architectures and training schemes, both when training from scratch and when using pre-trained networks in transfer learning. Notably, our method outperforms comparable methods while being easier to implement and use, and further improves the performance of competitive networks on Imagenet by 1\%.
    摘要 深度神经网络中过拟合现象出现得并不频繁,这令人困惑。一方面,理论预测随着模型变大,它们最终会对特定训练集过度特化,从而导致泛化能力下降;另一方面,图像分类中的实证结果表明,增加深度模型的训练时间或使用更大的模型几乎从不损害泛化。这是否是因为我们衡量过拟合的方式过于有限?在这里,我们引入一种新的过拟合量化指标,用于监测深度模型在验证数据上的遗忘率。该指标表明,即使整体泛化在提升,数据空间中仍存在某些区域的泛化在恶化。以这种方式度量时,我们发现无论验证准确率是否下降,过拟合都可能发生,并且可能比以往认为的更常见。这一观察有助于澄清上述令人困惑的现象。基于这些观察,我们构建了一种仅依赖单个网络训练历史的新集成方法,在不增加任何训练时间成本的情况下显著提升性能。对现代深度模型的大量实证评估表明,无论是从零开始训练还是在迁移学习中使用预训练网络,我们的方法在多个数据集、网络架构和训练方案上都有效。值得注意的是,该方法在更容易实现和使用的同时优于同类方法,并使有竞争力的网络在 ImageNet 上的性能进一步提升 1%。
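
下面用 numpy 给出两个相关概念的示意:一是基于验证集"先对后错"转变的朴素遗忘比例度量,二是对同一训练历史中若干检查点的 softmax 输出做简单平均的免训练集成。二者都是常见做法的草图,具体指标与集成构造未必与论文中的定义一致;示例数据均为随机假设。

```python
import numpy as np

def forgetting_fraction(correct_history):
    """correct_history: (epoch 数, 验证样本数) 的布尔矩阵,记录每个 epoch
    模型对每个验证样本是否分类正确。返回每个 epoch 的"遗忘比例":
    此前某个 epoch 曾被分类正确、当前却被错分的样本占比。"""
    ever_correct = np.maximum.accumulate(correct_history, axis=0)
    forgotten = ever_correct & ~correct_history
    return forgotten.mean(axis=1)

def checkpoint_ensemble(prob_history):
    """prob_history: (检查点数, 样本数, 类别数) 的 softmax 输出;
    对同一网络训练历史中的若干检查点做简单平均,得到无需额外训练的集成预测。"""
    return prob_history.mean(axis=0).argmax(axis=1)

# 随机生成的玩具数据,仅演示接口
rng = np.random.default_rng(0)
correct_history = rng.random((20, 1000)) < np.linspace(0.6, 0.9, 20)[:, None]
print(forgetting_fraction(correct_history)[-3:])
probs = rng.dirichlet(np.ones(10), size=(5, 8))   # 5 个检查点、8 个样本、10 类
print(checkpoint_ensemble(probs))
```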

Data Drift Monitoring for Log Anomaly Detection Pipelines

  • paper_url: http://arxiv.org/abs/2310.14893
  • repo_url: None
  • paper_authors: Dipak Wani, Samuel Ackerman, Eitan Farchi, Xiaotong Liu, Hau-wen Chang, Sarasi Lalithsena
  • for: 本文提出了一种基于贝叶斯因子(Bayes factor)的漂移检测方法,用于检测日志活动的漂移,帮助站点可靠性工程师(SRE)进行系统诊断。
  • methods: 本文使用了 Bayes Factor 来检测日志活动偏移,并且提出了一种基于人工干预的方法来更新 LAD 模型。
  • results: 本文通过使用真实收集的日志数据和 simulate 的活动序列来证明了该方法的可靠性和有效性。
    Abstract Logs enable the monitoring of infrastructure status and the performance of associated applications. Logs are also invaluable for diagnosing the root causes of any problems that may arise. Log Anomaly Detection (LAD) pipelines automate the detection of anomalies in logs, providing assistance to site reliability engineers (SREs) in system diagnosis. Log patterns change over time, necessitating updates to the LAD model defining the `normal' log activity profile. In this paper, we introduce a Bayes Factor-based drift detection method that identifies when intervention, retraining, and updating of the LAD model are required with human involvement. We illustrate our method using sequences of log activity, both from unaltered data, and simulated activity with controlled levels of anomaly contamination, based on real collected log data.
    摘要 日志可以监控基础设施状态和相关应用程序的性能。日志也是解决问题的根本原因的诊断的重要工具。日志异常检测(LAD)管道自动检测日志中的异常情况,为站点可靠工程师(SRE)提供帮助。日志模式随着时间的变化,因此LAD模型需要定期更新。在这篇文章中,我们介绍了基于 bayes 因子的漂移检测方法,可以在人工参与下确定是否需要介入、重新训练和更新 LAD 模型。我们使用了日志活动序列,包括未修改数据和 simulated 活动,以及控制了异常污染的水平,基于实际收集的日志数据来示例我们的方法。
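
下面给出一个基于 Beta-Binomial 边际似然的贝叶斯因子草图,用于比较"参考窗口与当前窗口的异常事件率相同/不同"两个假设,作为对"基于贝叶斯因子的漂移检测"思想的通俗说明;先验参数与示例计数均为假设,并非论文的具体统计量。

```python
import numpy as np
from scipy.special import betaln

def log_marginal(k, n, a=1.0, b=1.0):
    """Beta(a, b) 先验下二项观测 (k 次事件 / n 条日志) 的对数边际似然(组合数在比值中约去,故省略)。"""
    return betaln(a + k, b + n - k) - betaln(a, b)

def log_bayes_factor(k_ref, n_ref, k_cur, n_cur, a=1.0, b=1.0):
    """比较 H1(两窗口事件率不同)与 H0(事件率相同)的对数贝叶斯因子;
    明显大于 0 时提示日志活动发生漂移,可触发人工介入与 LAD 模型更新。"""
    log_m1 = log_marginal(k_ref, n_ref, a, b) + log_marginal(k_cur, n_cur, a, b)
    log_m0 = log_marginal(k_ref + k_cur, n_ref + n_cur, a, b)
    return log_m1 - log_m0

# 示例:参考窗口 10000 条日志中 120 条异常,当前窗口 10000 条中 480 条
print(log_bayes_factor(120, 10000, 480, 10000))   # 远大于 0:提示漂移
print(log_bayes_factor(120, 10000, 130, 10000))   # 通常小于 0:无明显漂移
```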

CSG: Curriculum Representation Learning for Signed Graph

  • paper_url: http://arxiv.org/abs/2310.11083
  • repo_url: None
  • paper_authors: Zeyu Zhang, Jiamou Liu, Kaiqi Zhao, Yifei Wang, Pengqian Han, Xianda Zheng, Qiqi Wang, Zijian Zhang
  • for: 本研究旨在提高Signed Graph Neural Networks(SGNNs)的精度和稳定性,通过设计一种基于课程的训练方法,以便更好地处理复杂的签名图。
  • methods: 本研究提出了一种基于课程的训练方法,其中样本按照难度从易到复杂进行排序,以便SGNN模型在处理不同难度的样本上进行学习。此外,我们还引入了一种轻量级的机制,以量化图的学习难度。
  • results: 经验验证表明,我们的训练方法可以提高 SGNN 模型的准确率,在链接符号预测任务上将 AUC 最多提升 23.7%,并显著降低 AUC 得分的标准差。
    Abstract Signed graphs are valuable for modeling complex relationships with positive and negative connections, and Signed Graph Neural Networks (SGNNs) have become crucial tools for their analysis. However, prior to our work, no specific training plan existed for SGNNs, and the conventional random sampling approach did not address varying learning difficulties within the graph's structure. We proposed a curriculum-based training approach, where samples progress from easy to complex, inspired by human learning. To measure learning difficulty, we introduced a lightweight mechanism and created the Curriculum representation learning framework for Signed Graphs (CSG). This framework optimizes the order in which samples are presented to the SGNN model. Empirical validation across six real-world datasets showed impressive results, enhancing SGNN model accuracy by up to 23.7% in link sign prediction (AUC) and significantly improving stability with an up to 8.4 reduction in the standard deviation of AUC scores.
    摘要 签名图是用于模型复杂关系的工具,它们可以表示正有负连接。然而,在我们的工作之前,没有专门的培训计划 для签名图神经网络(SGNN),而常见的随机抽样方法也不能处理图结构中的变化学习困难。我们提出了一种学习级别的培训方法,其中样本从简单到复杂进行排序,这是基于人类学习的启发。为了测量学习困难,我们引入了一种轻量级机制,并创建了签名图表示学习框架(CSG)。这个框架优化了SGNN模型被抽样的顺序。经验 validate 在六个实际数据集上表现出色,SGNN 模型的准确率提高了最多 23.7%(AUC),并有显著提高稳定性,AUC 标准差下降了最多 8.4。

Resampling Stochastic Gradient Descent Cheaply for Efficient Uncertainty Quantification

  • paper_url: http://arxiv.org/abs/2310.11065
  • repo_url: None
  • paper_authors: Henry Lam, Zitong Wang
  • for: 本研究旨在对通过随机梯度下降(SGD)训练得到的模型参数及随机优化问题的解进行不确定性量化。
  • methods: 我们提出了两种计算代价低廉的基于重采样的方法来构造 SGD 解的置信区间:一种通过对数据做有放回重采样,并行运行多个(但数量很少的)SGD;另一种以在线方式完成同样的操作。这些方法可以视为对既有 bootstrap 方案的改进,大幅减少所需的重采样计算量,同时绕过现有分批(batching)方法中复杂的混合条件。
  • results: 我们借助最近提出的"便宜 bootstrap"思想以及针对 SGD 的 Berry-Esseen 型界来实现上述目标,从而在显著降低计算开销的同时快速构造高质量的置信区间。
    Abstract Stochastic gradient descent (SGD) or stochastic approximation has been widely used in model training and stochastic optimization. While there is a huge literature on analyzing its convergence, inference on the obtained solutions from SGD has only been recently studied, yet is important due to the growing need for uncertainty quantification. We investigate two computationally cheap resampling-based methods to construct confidence intervals for SGD solutions. One uses multiple, but few, SGDs in parallel via resampling with replacement from the data, and another operates this in an online fashion. Our methods can be regarded as enhancements of established bootstrap schemes to substantially reduce the computation effort in terms of resampling requirements, while at the same time bypassing the intricate mixing conditions in existing batching methods. We achieve these via a recent so-called cheap bootstrap idea and Berry-Esseen-type bound for SGD.
    摘要 随机梯度下降(SGD,亦即随机逼近)在模型训练和随机优化中被广泛使用。尽管关于其收敛性分析的文献十分丰富,但对 SGD 所得解进行推断的研究直到最近才出现,而随着不确定性量化需求的增长,这一问题愈发重要。我们研究了两种计算代价低廉的基于重采样的方法来构造 SGD 解的置信区间:一种通过对数据做有放回重采样,并行运行多个(但数量很少的)SGD;另一种以在线方式完成同样的操作。这些方法可以看作对既有 bootstrap 方案的增强,在大幅降低重采样计算需求的同时,绕过了现有分批方法中复杂的混合条件。我们借助最近提出的"便宜 bootstrap"思想以及针对 SGD 的 Berry-Esseen 型界来实现这些目标。
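
下面按照"便宜 bootstrap"的一般思想给出示意:只用很少的 B 次重采样重跑(此处以一个玩具 SGD 均值估计作为被推断对象),并用自由度为 B 的 t 分位数构造置信区间。这只是对该思想的草图,未涵盖论文中针对 SGD 的在线变体与理论条件。

```python
import numpy as np
from scipy import stats

def sgd_mean(data, lr=0.1, epochs=3, seed=0):
    """用 SGD 最小化平均损失 0.5*(theta - x)^2,作为"被推断的 SGD 解"的玩具示例。"""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(epochs):
        for x in rng.permutation(data):
            theta -= lr * (theta - x)
    return theta

def cheap_bootstrap_ci(data, B=5, alpha=0.05, seed=0):
    """"便宜 bootstrap"思想的示意:仅需 B 次(很少的)重采样重跑,
    置信区间为 theta_hat ± t_{B, 1-alpha/2} * sqrt(mean_b (theta_b* - theta_hat)^2)。"""
    rng = np.random.default_rng(seed)
    theta_hat = sgd_mean(data)
    reps = np.array([sgd_mean(rng.choice(data, size=len(data), replace=True), seed=b + 1)
                     for b in range(B)])
    s = np.sqrt(np.mean((reps - theta_hat) ** 2))
    t = stats.t.ppf(1 - alpha / 2, df=B)
    return theta_hat - t * s, theta_hat + t * s

data = np.random.default_rng(42).normal(loc=1.5, scale=2.0, size=2000)
print(cheap_bootstrap_ci(data, B=5))   # 区间应覆盖真实均值 1.5 附近
```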

Locally Differentially Private Graph Embedding

  • paper_url: http://arxiv.org/abs/2310.11060
  • repo_url: None
  • paper_authors: Zening Li, Rong-Hua Li, Meihao Liao, Fusheng Jin, Guoren Wang
  • for: 本研究旨在开发一种满足本地散列保护(LDP)的图像抽象算法,以保护图据中敏感信息的隐私。
  • methods: 本文提出了一种名为LDP-GE的隐私保护图像抽象框架,包括一种LDP机制来隐蔽节点数据,以及使用个性化PageRank来学习节点表示。
  • results: 对多个真实世界图据集进行了广泛的实验,并证明LDP-GE在节点分类和链接预测任务中达到了有利的隐私-用途质量比。
    Abstract Graph embedding has been demonstrated to be a powerful tool for learning latent representations for nodes in a graph. However, despite its superior performance in various graph-based machine learning tasks, learning over graphs can raise significant privacy concerns when graph data involves sensitive information. To address this, in this paper, we investigate the problem of developing graph embedding algorithms that satisfy local differential privacy (LDP). We propose LDP-GE, a novel privacy-preserving graph embedding framework, to protect the privacy of node data. Specifically, we propose an LDP mechanism to obfuscate node data and adopt personalized PageRank as the proximity measure to learn node representations. Then, we theoretically analyze the privacy guarantees and utility of the LDP-GE framework. Extensive experiments conducted over several real-world graph datasets demonstrate that LDP-GE achieves favorable privacy-utility trade-offs and significantly outperforms existing approaches in both node classification and link prediction tasks.
    摘要 图嵌入已被证明是学习图中节点潜在表示的有力工具。然而,尽管它在各类基于图的机器学习任务中表现优异,当图数据包含敏感信息时,在图上进行学习可能引发严重的隐私问题。为此,本文研究满足本地差分隐私(LDP)的图嵌入算法。我们提出了 LDP-GE,一种新颖的隐私保护图嵌入框架,用于保护节点数据的隐私。具体而言,我们提出了一种 LDP 机制来扰动节点数据,并采用个性化 PageRank 作为邻近度度量来学习节点表示。随后,我们从理论上分析了 LDP-GE 框架的隐私保证与效用。在多个真实世界图数据集上的大量实验表明,LDP-GE 在隐私与效用之间取得了良好的权衡,并在节点分类和链接预测任务上显著优于现有方法。
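
下面给出本地差分隐私中经典拉普拉斯机制的最小示意:每个节点在本地对自己的特征加噪后再上报,以说明"用 LDP 机制扰动节点数据"的一般含义;敏感度的设定与隐私预算的分配方式均为简化假设,并非论文中 LDP-GE 的具体机制。

```python
import numpy as np

def laplace_perturb(x, epsilon, lo=0.0, hi=1.0, rng=None):
    """本地差分隐私中经典的拉普拉斯机制示意:节点在本地对自己的 d 维特征加噪后再上报。
    这里把隐私预算均分到 d 个维度(简化假设),每维噪声尺度为 (hi - lo) * d / epsilon。"""
    rng = rng or np.random.default_rng()
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    d = x.shape[-1]
    scale = (hi - lo) * d / epsilon
    return x + rng.laplace(0.0, scale, size=x.shape)

rng = np.random.default_rng(0)
node_features = rng.random((5, 8))   # 5 个节点、8 维特征,取值范围 [0, 1]
print(laplace_perturb(node_features, epsilon=4.0, rng=rng).round(2))
```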

Causal Feature Selection via Transfer Entropy

  • paper_url: http://arxiv.org/abs/2310.11059
  • repo_url: None
  • paper_authors: Paolo Bonetti, Alberto Maria Metelli, Marcello Restelli
  • for: 本研究旨在提出一种新的方法,旨在Feature Selection和Causal Discovery之间的交叉点上,用于处理时间序列数据。
  • methods: 我们提出了一种新的因果特征选择方法,该方法基于前向与后向特征选择流程,并利用传输熵(transfer entropy)来估计时间序列中从特征流向目标的因果信息流。
  • results: 我们在精确情形与有限样本情形下,对所提方法的回归与分类误差给出了理论保证;并在合成与真实回归问题上进行了数值验证,结果相对于所考虑的基线方法具有竞争力。
    Abstract Machine learning algorithms are designed to capture complex relationships between features. In this context, the high dimensionality of data often results in poor model performance, with the risk of overfitting. Feature selection, the process of selecting a subset of relevant and non-redundant features, is, therefore, an essential step to mitigate these issues. However, classical feature selection approaches do not inspect the causal relationship between selected features and target, which can lead to misleading results in real-world applications. Causal discovery, instead, aims to identify causal relationships between features with observational data. In this paper, we propose a novel methodology at the intersection between feature selection and causal discovery, focusing on time series. We introduce a new causal feature selection approach that relies on the forward and backward feature selection procedures and leverages transfer entropy to estimate the causal flow of information from the features to the target in time series. Our approach enables the selection of features not only in terms of mere model performance but also captures the causal information flow. In this context, we provide theoretical guarantees on the regression and classification errors for both the exact and the finite-sample cases. Finally, we present numerical validations on synthetic and real-world regression problems, showing results competitive w.r.t. the considered baselines.
    摘要 机器学习算法旨在捕捉特征之间的复杂关系。在这种情形下,数据的高维度往往导致模型性能不佳,并存在过拟合的风险。因此,特征选择(即选取相关且不冗余的特征子集)是缓解这些问题的关键步骤。然而,经典的特征选择方法并不考察所选特征与目标之间的因果关系,在实际应用中可能得到误导性的结果。因果发现则旨在利用观测数据识别特征之间的因果关系。本文提出一种处于特征选择与因果发现交叉点上的新方法,聚焦于时间序列。我们引入一种新的因果特征选择方法,它基于前向与后向特征选择过程,并利用传递熵来估计时间序列中特征到目标的因果信息流。该方法不仅依据模型性能选择特征,还能刻画因果信息流。我们在精确情形和有限样本情形下给出了回归与分类误差的理论保证。最后,我们在合成与真实回归问题上进行了数值验证,结果与所考虑的基线方法相比具有竞争力。
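
A minimal plug-in estimator of lag-1 transfer entropy on discretized series gives a feel for the causal score that a forward/backward selection loop would rank features by; the binning, lag and estimator below are illustrative assumptions rather than the authors' estimator.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=4):
    """Plug-in estimate of TE_{X->Y} with lag 1 on quantile-discretized series (nats)."""
    xd = np.digitize(x, np.quantile(x, np.linspace(0, 1, bins + 1)[1:-1]))
    yd = np.digitize(y, np.quantile(y, np.linspace(0, 1, bins + 1)[1:-1]))
    triples = list(zip(yd[1:], yd[:-1], xd[:-1]))   # (y_t, y_{t-1}, x_{t-1})
    n = len(triples)
    c_xyz = Counter(triples)
    c_yy = Counter((a, b) for a, b, _ in triples)   # (y_t, y_{t-1})
    c_yx = Counter((b, c) for _, b, c in triples)   # (y_{t-1}, x_{t-1})
    c_y = Counter(b for _, b, _ in triples)         # (y_{t-1},)
    te = 0.0
    for (yt, yp, xp), cnt in c_xyz.items():
        te += (cnt / n) * np.log((cnt * c_y[yp]) / (c_yy[(yt, yp)] * c_yx[(yp, xp)]))
    return te

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = np.roll(x, 1) + 0.5 * rng.normal(size=2000)      # y_t depends on x_{t-1}
print(round(transfer_entropy(x, y), 3), round(transfer_entropy(y, x), 3))
```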

Matrix Compression via Randomized Low Rank and Low Precision Factorization

  • paper_url: http://arxiv.org/abs/2310.11028
  • repo_url: None
  • paper_authors: Rajarshi Saha, Varun Srivastava, Mert Pilanci
  • for: 这篇论文提出一种矩阵压缩算法,用于降低大规模矩阵的存储与处理开销。
  • methods: 该算法利用矩阵的近似低秩结构,通过对矩阵的列进行随机草图(sketching)获得近似的列空间基,并对该基及投影系数进行低精度量化,从而得到低秩且低精度的矩阵分解。
  • results: 该算法可以实现低至每个矩阵元素一比特的压缩率,同时保持或超过传统压缩技术的性能。
    Abstract Matrices are exceptionally useful in various fields of study as they provide a convenient framework to organize and manipulate data in a structured manner. However, modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage. Although prohibitively large, such matrices are often approximately low rank. We propose an algorithm that exploits this structure to obtain a low rank decomposition of any matrix $\mathbf{A}$ as $\mathbf{A} \approx \mathbf{L}\mathbf{R}$, where $\mathbf{L}$ and $\mathbf{R}$ are the low rank factors. The total number of elements in $\mathbf{L}$ and $\mathbf{R}$ can be significantly less than that in $\mathbf{A}$. Furthermore, the entries of $\mathbf{L}$ and $\mathbf{R}$ are quantized to low precision formats $--$ compressing $\mathbf{A}$ by giving us a low rank and low precision factorization. Our algorithm first computes an approximate basis of the range space of $\mathbf{A}$ by randomly sketching its columns, followed by a quantization of the vectors constituting this basis. It then computes approximate projections of the columns of $\mathbf{A}$ onto this quantized basis. We derive upper bounds on the approximation error of our algorithm, and analyze the impact of target rank and quantization bit-budget. The tradeoff between compression ratio and approximation accuracy allows for flexibility in choosing these parameters based on specific application requirements. We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LlaMa-$7$b. Our results illustrate that we can achieve compression ratios as aggressive as one bit per matrix coordinate, all while surpassing or maintaining the performance of traditional compression techniques.
    摘要 矩阵在许多研究领域中都非常有用,因为它们为组织和处理数据提供了便利的结构化框架。然而,现代矩阵可能包含数十亿个元素,其存储和处理对计算资源和内存的需求非常高。尽管规模巨大,这类矩阵通常近似低秩。我们提出一种利用这种结构的算法,将任意矩阵 $\mathbf{A}$ 分解为 $\mathbf{A} \approx \mathbf{L}\mathbf{R}$,其中 $\mathbf{L}$ 和 $\mathbf{R}$ 是低秩因子,其元素总数可以远少于 $\mathbf{A}$。此外,$\mathbf{L}$ 和 $\mathbf{R}$ 的元素以低精度格式量化,从而通过低秩且低精度的分解来压缩 $\mathbf{A}$。算法首先通过对 $\mathbf{A}$ 的列进行随机草图来计算其列空间的近似基,然后对构成该基的向量进行量化,再计算 $\mathbf{A}$ 的列在该量化基上的近似投影。我们推导了近似误差的上界,并分析了目标秩与量化比特预算的影响。压缩率与近似精度之间的权衡使得这些参数可以根据具体应用需求灵活选择。我们在图像压缩、图像与文本嵌入的最近邻分类以及 LlaMa-$7$b 各层的压缩上实证展示了算法的有效性。结果表明,我们可以实现低至每个矩阵元素一比特的压缩率,同时超过或保持传统压缩技术的性能。
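
The pipeline (randomized sketch of the column space, quantization of the basis, quantized projections) can be sketched in a few lines of NumPy. The uniform scalar quantizer and least-squares projection below are simplifying assumptions; the paper derives error bounds for its own quantization scheme.

```python
import numpy as np

def uniform_quantize(M, bits):
    """Uniform scalar quantization of a matrix to the given bit budget."""
    lo, hi = M.min(), M.max()
    levels = 2 ** bits - 1
    q = np.round((M - lo) / (hi - lo + 1e-12) * levels)
    return q / levels * (hi - lo) + lo

def randomized_lowrank_lowprecision(A, rank, bits=4, seed=0):
    """A ~= L @ R with L and R stored at low precision (simplified sketch)."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((A.shape[1], rank))
    Q, _ = np.linalg.qr(A @ S)                              # approximate basis of range(A)
    L = uniform_quantize(Q, bits)                           # quantized basis
    R = uniform_quantize(np.linalg.lstsq(L, A, rcond=None)[0], bits)  # quantized projections
    return L, R

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 200))   # approximately rank-30
L, R = randomized_lowrank_lowprecision(A, rank=30, bits=6)
print("relative Frobenius error:", round(np.linalg.norm(A - L @ R) / np.linalg.norm(A), 3))
```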

SignGT: Signed Attention-based Graph Transformer for Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2310.11025
  • repo_url: None
  • paper_authors: Jinsong Chen, Gaichao Li, John E. Hopcroft, Kun He
  • for: 这篇论文旨在提出一种基于有符号自注意力的图Transformer,以适应不同图上复杂的节点关系。
  • methods: 该论文提出了一种新的有符号自注意力机制(SignSA),根据节点对的语义相关性生成带符号的注意力值;此外还提出了一种结构感知前馈网络(SFFN),通过引入邻域偏置来保留局部拓扑信息。
  • results: 大量实验结果表明,SignGT 在节点级和图级任务上均优于当前最先进的图Transformer以及先进的GNN。
    Abstract The emerging graph Transformers have achieved impressive performance for graph representation learning over graph neural networks (GNNs). In this work, we regard the self-attention mechanism, the core module of graph Transformers, as a two-step aggregation operation on a fully connected graph. Due to the property of generating positive attention values, the self-attention mechanism is equal to conducting a smooth operation on all nodes, preserving the low-frequency information. However, only capturing the low-frequency information is inefficient in learning complex relations of nodes on diverse graphs, such as heterophily graphs where the high-frequency information is crucial. To this end, we propose a Signed Attention-based Graph Transformer (SignGT) to adaptively capture various frequency information from the graphs. Specifically, SignGT develops a new signed self-attention mechanism (SignSA) that produces signed attention values according to the semantic relevance of node pairs. Hence, the diverse frequency information between different node pairs could be carefully preserved. Besides, SignGT proposes a structure-aware feed-forward network (SFFN) that introduces the neighborhood bias to preserve the local topology information. In this way, SignGT could learn informative node representations from both long-range dependencies and local topology information. Extensive empirical results on both node-level and graph-level tasks indicate the superiority of SignGT against state-of-the-art graph Transformers as well as advanced GNNs.
    摘要 新兴的图Transformer在图表示学习上已取得超越图神经网络(GNN)的出色表现。在这项工作中,我们将图Transformer的核心模块(自注意力机制)视为在全连接图上的两步聚合操作。由于注意力值恒为正,自注意力机制等价于对所有节点进行平滑操作,只保留低频信息。然而,仅捕捉低频信息不足以学习多样化图(例如高频信息至关重要的异配图)上节点之间的复杂关系。为此,我们提出有符号注意力图Transformer(SignGT),自适应地从图中捕捉不同频率的信息。具体而言,SignGT设计了一种新的有符号自注意力机制(SignSA),根据节点对的语义相关性生成带符号的注意力值,从而细致地保留不同节点对之间的多样频率信息。此外,SignGT还提出了结构感知前馈网络(SFFN),通过引入邻域偏置来保留局部拓扑信息。这样,SignGT能够同时从长程依赖和局部拓扑信息中学习有信息量的节点表示。在节点级和图级任务上的大量实验结果表明,SignGT优于当前最先进的图Transformer以及先进的GNN。
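
The following toy layer illustrates the idea of attention weights that keep a per-pair sign, so dissimilar node pairs can contribute negatively instead of only being smoothed; it is an illustrative variant under my own assumptions, not the exact SignSA formulation or the SFFN.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def signed_attention(X, Wq, Wk, Wv):
    """Attention whose weights keep a sign per node pair (illustrative variant).

    Magnitudes come from a softmax over |scores|, while the sign of the raw
    score is re-applied, so negatively related pairs act as a high-pass term.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    weights = np.sign(scores) * softmax(np.abs(scores), axis=-1)
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                          # 5 nodes, 8-dim features
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(signed_attention(X, Wq, Wk, Wv).shape)         # (5, 8)
```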

Pure Exploration in Asynchronous Federated Bandits

  • paper_url: http://arxiv.org/abs/2310.11015
  • repo_url: None
  • paper_authors: Zichen Wang, Chuanhao Li, Chenyu Song, Lianghui Wang, Quanquan Gu, Huazheng Wang
  • for: 本论文研究多智能体赌博机(bandit)中的联邦纯探索问题,即多个智能体通过与中央服务器通信协作,识别最优臂。
  • methods: 论文提出了首个在完全异步环境下、具有固定置信度的联邦多臂赌博机与线性赌博机纯探索算法,以增强对实际场景中智能体延迟与不可用情况的鲁棒性。
  • results: 理论分析表明,所提算法在异步环境下可达到近似最优的样本复杂度与高效的通信开销;基于合成与真实数据的实验进一步验证了算法的有效性与通信成本效率。
    Abstract We study the federated pure exploration problem of multi-armed bandits and linear bandits, where $M$ agents cooperatively identify the best arm via communicating with the central server. To enhance the robustness against latency and unavailability of agents that are common in practice, we propose the first federated asynchronous multi-armed bandit and linear bandit algorithms for pure exploration with fixed confidence. Our theoretical analysis shows the proposed algorithms achieve near-optimal sample complexities and efficient communication costs in a fully asynchronous environment. Moreover, experimental results based on synthetic and real-world data empirically elucidate the effectiveness and communication cost-efficiency of the proposed algorithms.
    摘要 我们研究多臂赌博机与线性赌博机的联邦纯探索问题,其中 $M$ 个智能体通过与中央服务器通信协作识别最优臂。为增强对实践中常见的智能体延迟与不可用情况的鲁棒性,我们提出了首个在固定置信度设定下的联邦异步多臂赌博机与线性赌博机纯探索算法。理论分析表明,所提算法在完全异步环境中可达到近似最优的样本复杂度与高效的通信开销。此外,基于合成与真实数据的实验结果进一步验证了算法的有效性与通信成本效率。

Hyperspectral In-Memory Computing with Optical Frequency Combs and Programmable Optical Memories

  • paper_url: http://arxiv.org/abs/2310.11014
  • repo_url: None
  • paper_authors: Mostafa Honari Latifpour, Byoung Jun Park, Yoshihisa Yamamoto, Myoung-Gyun Suh
  • for: This paper aims to develop a highly parallel, programmable, and scalable optical computing system capable of handling matrix-vector multiplication operations for deep learning and optimization tasks.
  • methods: The proposed hyperspectral in-memory computing architecture integrates space multiplexing with frequency multiplexing of optical frequency combs and uses spatial light modulators as a programmable optical memory.
  • results: The authors have experimentally demonstrated multiply-accumulate operations with higher than 4-bit precision in both matrix-vector and matrix-matrix multiplications, which suggests the system's potential for a wide variety of deep learning and optimization tasks.
    Abstract The rapid advancements in machine learning across numerous industries have amplified the demand for extensive matrix-vector multiplication operations, thereby challenging the capacities of traditional von Neumann computing architectures. To address this, researchers are currently exploring alternatives such as in-memory computing systems to develop faster and more energy-efficient hardware. In particular, there is renewed interest in computing systems based on optics, which could potentially handle matrix-vector multiplication in a more energy-efficient way. Despite promising initial results, developing a highly parallel, programmable, and scalable optical computing system capable of rivaling electronic computing hardware still remains elusive. In this context, we propose a hyperspectral in-memory computing architecture that integrates space multiplexing with frequency multiplexing of optical frequency combs and uses spatial light modulators as a programmable optical memory, thereby boosting the computational throughput and the energy efficiency. We have experimentally demonstrated multiply-accumulate operations with higher than 4-bit precision in both matrix-vector and matrix-matrix multiplications, which suggests the system's potential for a wide variety of deep learning and optimization tasks. This system exhibits extraordinary modularity, scalability, and programmability, effectively transcending the traditional limitations of optics-based computing architectures. Our approach demonstrates the potential to scale beyond peta operations per second, marking a significant step towards achieving high-throughput energy-efficient optical computing.
    摘要 机器学习在各行业的快速发展加大了对大规模矩阵-向量乘法运算的需求,从而对传统冯·诺依曼计算架构的能力构成挑战。为此,研究者正在探索替代方案,例如存内计算系统,以开发更快、更节能的硬件。特别地,基于光学的计算系统重新受到关注,它有望以更高的能效完成矩阵-向量乘法。尽管初步结果令人鼓舞,开发能与电子计算硬件相竞争的高度并行、可编程且可扩展的光学计算系统仍然困难。在此背景下,我们提出一种超光谱存内计算架构,将空间复用与光频梳的频率复用相结合,并使用空间光调制器作为可编程光学存储器,从而提升计算吞吐量与能效。我们实验演示了在矩阵-向量与矩阵-矩阵乘法中精度高于4比特的乘加运算,这表明该系统适用于多种深度学习与优化任务。该系统具有出色的模块化、可扩展性与可编程性,有效超越了传统光学计算架构的限制。我们的方法有潜力扩展到每秒千万亿次(peta级)运算以上,朝着高吞吐、高能效的光学计算迈出了重要一步。

LPFormer: An Adaptive Graph Transformer for Link Prediction

  • paper_url: http://arxiv.org/abs/2310.11009
  • repo_url: https://github.com/harryshomer/lpformer
  • paper_authors: Harry Shomer, Yao Ma, Haitao Mao, Juanhui Li, Bo Wu, Jiliang Tang
  • for: 链接预测是图结构数据上的常见任务,在多个领域有广泛应用;早期方法通常依赖人工设计的启发式规则进行预测。
  • methods: 近期方法结合了消息传递神经网络(MPNN)与启发式方法的优点,将MPNN的输出与刻画候选链接中节点间关系的"成对编码"(pairwise encoding)结合起来进行预测,在许多数据集上取得了强劲表现。然而,现有的成对编码往往带有较强的归纳偏置,用同一组潜在因素对所有链接进行分类,限制了模型对由不同因素形成的多样链接的分类能力。
  • results: LPFormer 在许多数据集上达到了SOTA性能,同时保持了效率。
    Abstract Link prediction is a common task on graph-structured data that has seen applications in a variety of domains. Classically, hand-crafted heuristics were used for this task. Heuristic measures are chosen such that they correlate well with the underlying factors related to link formation. In recent years, a new class of methods has emerged that combines the advantages of message-passing neural networks (MPNN) and heuristics methods. These methods perform predictions by using the output of an MPNN in conjunction with a "pairwise encoding" that captures the relationship between nodes in the candidate link. They have been shown to achieve strong performance on numerous datasets. However, current pairwise encodings often contain a strong inductive bias, using the same underlying factors to classify all links. This limits the ability of existing methods to learn how to properly classify a variety of different links that may form from different factors. To address this limitation, we propose a new method, LPFormer, which attempts to adaptively learn the pairwise encodings for each link. LPFormer models the link factors via an attention module that learns the pairwise encoding that exists between nodes by modeling multiple factors integral to link prediction. Extensive experiments demonstrate that LPFormer can achieve SOTA performance on numerous datasets while maintaining efficiency.
    摘要 链接预测是图结构数据上的常见任务,在多个领域有应用。早期通常使用人工设计的启发式规则完成这一任务,所选的启发式度量需与链接形成的潜在因素高度相关。近年来出现了一类新方法,结合了消息传递神经网络(MPNN)与启发式方法的优点:它们利用MPNN的输出,并结合刻画候选链接中节点间关系的"成对编码"进行预测,在许多数据集上表现出色。然而,现有的成对编码往往带有较强的归纳偏置,用同一组潜在因素对所有链接进行分类,这限制了现有方法学习如何正确区分由不同因素形成的多样链接的能力。为了解决这一限制,我们提出了一种新方法LPFormer,尝试为每条链接自适应地学习成对编码。LPFormer通过一个注意力模块对链接形成的多种关键因素建模,从而学习节点间的成对编码。大量实验表明,LPFormer能够在众多数据集上取得SOTA性能,同时保持高效。

Spatially-resolved hyperlocal weather prediction and anomaly detection using IoT sensor networks and machine learning techniques

  • paper_url: http://arxiv.org/abs/2310.11001
  • repo_url: None
  • paper_authors: Anita B. Agarwal, Rohit Rajesh, Nitin Arul
  • for: 本研究旨在提供高精度、快速更新的本地天气预测,以满足各种应用需求,如农业、灾害管理等。
  • methods: 本研究提出了一种结合本地天气预测与异常检测的新方法,基于物联网(IoT)传感器网络与先进的机器学习技术。该方法利用多个空间上分散但彼此相对邻近的位置的传感器数据,构建高分辨率天气模型,用于预测短期、局地化的天气条件(如温度、气压和湿度)。
  • results: 研究表明,该系统能够提升天气预测的空间分辨率,并实时检测异常天气情况。此外,系统还利用无监督学习算法识别异常天气模式,及时发出警报,为决策提供支持。
    Abstract Accurate and timely hyperlocal weather predictions are essential for various applications, ranging from agriculture to disaster management. In this paper, we propose a novel approach that combines hyperlocal weather prediction and anomaly detection using IoT sensor networks and advanced machine learning techniques. Our approach leverages data from multiple spatially-distributed yet relatively close locations and IoT sensors to create high-resolution weather models capable of predicting short-term, localized weather conditions such as temperature, pressure, and humidity. By monitoring changes in weather parameters across these locations, our system is able to enhance the spatial resolution of predictions and effectively detect anomalies in real-time. Additionally, our system employs unsupervised learning algorithms to identify unusual weather patterns, providing timely alerts. Our findings indicate that this system has the potential to enhance decision-making.
    摘要 准确且及时的超本地化天气预测对农业、灾害管理等多种应用至关重要。本文提出一种结合超本地化天气预测与异常检测的新方法,利用物联网传感器网络和先进的机器学习技术。该方法利用多个空间上分散但相对邻近位置的物联网传感器数据,构建高分辨率天气模型,可预测短期、局地化的天气条件,如温度、气压和湿度。通过监测这些位置之间天气参数的变化,系统提升了预测的空间分辨率,并能实时有效地检测异常。此外,系统采用无监督学习算法识别异常天气模式,及时发出警报。我们的结果表明,该系统有望提升决策水平。
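
A compact way to prototype the anomaly-detection half of such a system is to treat simultaneous readings from nearby sites as one feature vector and fit an unsupervised detector. IsolationForest below stands in for the unspecified unsupervised algorithm, and the sensor layout, magnitudes and contamination rate are assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# toy readings from 3 nearby sites x (temperature, pressure, humidity)
base = rng.normal([22.0, 1013.0, 55.0], [1.5, 2.0, 5.0], size=(500, 3))
sites = [base + rng.normal(0, 0.3, size=base.shape) for _ in range(3)]
X = np.hstack(sites)                        # one row = simultaneous readings at all sites

detector = IsolationForest(contamination=0.01, random_state=0).fit(X)

# a localized anomaly: the second site reports a sudden pressure drop
reading = X[-1].copy()
reading[4] -= 25.0                          # pressure column of the second site
print(detector.predict([X[0], reading]))    # the second reading should be flagged as -1
```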

Program Translation via Code Distillation

  • paper_url: http://arxiv.org/abs/2310.11476
  • repo_url: None
  • paper_authors: Yufan Huang, Mengnan Qi, Yongqiang Yao, Maoquan Wang, Bin Gu, Colin Clement, Neel Sundaresan
  • for: 这篇论文主要写于如何使用Code Distillation(CoDist)模型进行软件版本迁移和程序翻译。
  • methods: 这篇论文提出了代码蒸馏(Code Distillation, CoDist)模型,在一种与语言无关的中间表示中捕捉代码的语义与结构等价性。这种中间表示可以作为翻译的中转(pivot),从而自动构建平行训练语料,并可扩展到任何编程语言。
  • results: 根据CodeXGLUE和TransCoder GeeksForGeeks翻译测试 benchmark,这种方法可以达到当前最佳性能水平,与TransCoder-ST相比,增加了12.7%的平均绝对提升。
    Abstract Software version migration and program translation are an important and costly part of the lifecycle of large codebases. Traditional machine translation relies on parallel corpora for supervised translation, which is not feasible for program translation due to a dearth of aligned data. Recent unsupervised neural machine translation techniques have overcome data limitations by included techniques such as back translation and low level compiler intermediate representations (IR). These methods face significant challenges due to the noise in code snippet alignment and the diversity of IRs respectively. In this paper we propose a novel model called Code Distillation (CoDist) whereby we capture the semantic and structural equivalence of code in a language agnostic intermediate representation. Distilled code serves as a translation pivot for any programming language, leading by construction to parallel corpora which scale to all available source code by simply applying the distillation compiler. We demonstrate that our approach achieves state-of-the-art performance on CodeXGLUE and TransCoder GeeksForGeeks translation benchmarks, with an average absolute increase of 12.7% on the TransCoder GeeksforGeeks translation benchmark compare to TransCoder-ST.
    摘要 软件版本迁移和程序翻译是大型代码库生命周期中重要而昂贵的环节。传统机器翻译依赖平行语料进行监督翻译,但由于缺乏对齐数据,这对程序翻译并不可行。近期的无监督神经机器翻译技术通过回译和低层编译器中间表示(IR)等手段突破了数据限制,但分别面临代码片段对齐噪声大和IR种类繁多的挑战。本文提出一种名为代码蒸馏(CoDist)的新模型,在与语言无关的中间表示中捕捉代码的语义与结构等价性。蒸馏后的代码可作为任何编程语言的翻译中转,从而通过应用蒸馏编译器自动构建可扩展到所有可用源代码的平行语料。我们的方法在CodeXGLUE和TransCoder GeeksForGeeks翻译基准上达到了最先进水平,在TransCoder GeeksforGeeks基准上相比TransCoder-ST平均绝对提升12.7%。

Why Do Students Drop Out? University Dropout Prediction and Associated Factor Analysis Using Machine Learning Techniques

  • paper_url: http://arxiv.org/abs/2310.10987
  • repo_url: None
  • paper_authors: Sean Kim, Eliot Yoo, Samuel Kim
  • for: 本研究旨在预测大学学生的毕业和退学情况,以便帮助教育机构和学生更好地规划教学和学习计划。
  • methods: 本研究使用学术、人口统计、社会经济和宏观经济等数据类型进行大学生毕业与退学预测,并进行相关因素分析,以评估各类数据对机器学习模型预测表现的影响。
  • results: 我们使用这些特征训练了四个二分类器,以确定学生会毕业或退学。总的来说,这些模型在预测退学状况时的ROC-AUC分数为0.935。对于学术数据类型,模型性能最高,当排除所有学术相关特征时,模型性能下降到0.811。初步结果表明,数据类型和退学状况之间存在相关性。
    Abstract Graduation and dropout rates have always been a serious consideration for educational institutions and students. High dropout rates negatively impact both the lives of individual students and institutions. To address this problem, this study examined university dropout prediction using academic, demographic, socioeconomic, and macroeconomic data types. Additionally, we performed associated factor analysis to analyze which type of data would be most influential on the performance of machine learning models in predicting graduation and dropout status. These features were used to train four binary classifiers to determine if students would graduate or drop out. The overall performance of the classifiers in predicting dropout status had an average ROC-AUC score of 0.935. The data type most influential to the model performance was found to be academic data, with the average ROC-AUC score dropping from 0.935 to 0.811 when excluding all academic-related features from the data set. Preliminary results indicate that a correlation does exist between data types and dropout status.
    摘要 毕业率和退学率一直是教育机构和学生关注的重要问题。高退学率对学生个人和教育机构都有负面影响。为了解决这一问题,本研究利用学术、人口统计、社会经济和宏观经济等数据类型对大学退学进行预测,并进行相关因素分析,以考察哪类数据对机器学习模型预测毕业与退学状态的表现影响最大。这些特征被用于训练四个二分类器,以判断学生会毕业还是退学。分类器在预测退学状态上的平均ROC-AUC得分为0.935。对模型表现影响最大的是学术数据:当从数据集中剔除所有学术相关特征后,平均ROC-AUC得分从0.935降至0.811。初步结果表明,数据类型与退学状态之间存在相关性。
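
The experimental protocol (train binary classifiers, score with ROC-AUC, then drop one feature group at a time) is easy to reproduce on stand-in data; the synthetic features, the column grouping and the two models below are assumptions, not the study's dataset or classifiers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# stand-in data: columns 0-9 "academic", 10-14 "demographic", 15-19 "socioeconomic"
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8, random_state=0)
groups = {"academic": range(0, 10), "demographic": range(10, 15), "socioeconomic": range(15, 20)}

models = {"logreg": LogisticRegression(max_iter=1000),
          "forest": RandomForestClassifier(n_estimators=200, random_state=0)}

for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: ROC-AUC = {auc:.3f}")

# ablation: drop one feature group at a time and measure the AUC change
for gname, cols in groups.items():
    keep = [c for c in range(X.shape[1]) if c not in cols]
    auc = cross_val_score(models["forest"], X[:, keep], y, cv=5, scoring="roc_auc").mean()
    print(f"without {gname}: ROC-AUC = {auc:.3f}")
```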

Exact nonlinear state estimation

  • paper_url: http://arxiv.org/abs/2310.10976
  • repo_url: None
  • paper_authors: Hristo G. Chipilski
  • for: 提高数据同化方法的准确性和稳定性,尤其是在高维模式中。
  • methods: 受生成式人工智能技术启发的新非线性估计理论,超越现有的高斯分布假设,从而提供更准确、更稳健的数据同化方法。
  • results: 在理想化的统计实验中进行了验证,结果表明,当观测误差相对预报不确定性较小、且状态变量存在强非线性相互依赖时,ECTF 能带来最大的改进。
    Abstract The majority of data assimilation (DA) methods in the geosciences are based on Gaussian assumptions. While these assumptions facilitate efficient algorithms, they cause analysis biases and subsequent forecast degradations. Non-parametric, particle-based DA algorithms have superior accuracy, but their application to high-dimensional models still poses operational challenges. Drawing inspiration from recent advances in the field of generative artificial intelligence (AI), this article introduces a new nonlinear estimation theory which attempts to bridge the existing gap in DA methodology. Specifically, a Conjugate Transform Filter (CTF) is derived and shown to generalize the celebrated Kalman filter to arbitrarily non-Gaussian distributions. The new filter has several desirable properties, such as its ability to preserve statistical relationships in the prior state and convergence to highly accurate observations. An ensemble approximation of the new theory (ECTF) is also presented and validated using idealized statistical experiments that feature bounded quantities with non-Gaussian distributions, a prevalent challenge in Earth system models. Results from these experiments indicate that the greatest benefits from ECTF occur when observation errors are small relative to the forecast uncertainty and when state variables exhibit strong nonlinear dependencies. Ultimately, the new filtering theory offers exciting avenues for improving conventional DA algorithms through their principled integration with AI techniques.
    摘要 地球科学中的大多数数据同化(DA)方法基于高斯假设。这些假设虽然便于设计高效算法,但会造成分析偏差并进而降低预报质量。非参数的、基于粒子的DA算法精度更高,但将其应用于高维模式仍面临业务化挑战。受生成式人工智能(AI)领域最新进展的启发,本文提出一种新的非线性估计理论,以期弥合现有DA方法论的空白。具体而言,我们推导出共轭变换滤波器(CTF),并证明它将著名的卡尔曼滤波器推广到任意非高斯分布。新滤波器具有若干理想性质,例如能够保持先验状态中的统计关系,并在观测高度精确时收敛。本文还提出了该理论的集合近似(ECTF),并用理想化统计实验加以验证;这些实验涉及具有非高斯分布的有界量,这是地球系统模式中的普遍难题。实验结果表明,当观测误差相对预报不确定性较小、且状态变量存在强非线性依赖时,ECTF的优势最为明显。总之,新的滤波理论为通过与AI技术的原则性结合来改进传统DA算法提供了令人期待的途径。

SD-PINN: Deep Learning based Spatially Dependent PDEs Recovery

  • paper_url: http://arxiv.org/abs/2310.10970
  • repo_url: None
  • paper_authors: Ruixian Liu, Peter Gerstoft
  • for: 这篇论文旨在扩展物理信息神经网络(PINN),使其能够直接从物理测量数据中恢复偏微分方程(PDE)的系数。
  • methods: 该方法提出了空间依赖的物理信息神经网络(SD-PINN),只用单个神经网络即可恢复空间依赖PDE的系数,无需领域特定的物理专业知识;同时利用物理约束来降低噪声的影响。
  • results: 该方法还可以利用系数空间变化的低秩假设,在没有测量数据的位置恢复PDE系数。
    Abstract The physics-informed neural network (PINN) is capable of recovering partial differential equation (PDE) coefficients that remain constant throughout the spatial domain directly from physical measurements. In this work, we propose a spatially dependent physics-informed neural network (SD-PINN), which enables the recovery of coefficients in spatially-dependent PDEs using a single neural network, eliminating the requirement for domain-specific physical expertise. The proposed method exhibits robustness to noise owing to the incorporation of physical constraints. It can also incorporate the low-rank assumption of the spatial variation for the PDE coefficients to recover the coefficients at locations without available measurements.
    摘要 物理信息神经网络(PINN)能够直接从物理测量中恢复在整个空间域内保持常数的偏微分方程(PDE)系数。在这项工作中,我们提出空间依赖的物理信息神经网络(SD-PINN),使得只用单个神经网络即可恢复空间依赖PDE的系数,无需领域特定的物理专业知识。由于引入了物理约束,所提方法对噪声具有鲁棒性。它还可以利用PDE系数空间变化的低秩假设,在没有可用测量的位置恢复系数。

The neural network models with delays for solving absolute value equations

  • paper_url: http://arxiv.org/abs/2310.10965
  • repo_url: None
  • paper_authors: Dongmei Yu, Gehao Zhang, Cairong Chen, Deren Han
  • for: 解决绝对值方程 ($Ax - |x| - b = 0$)
  • methods: 使用带混合时滞的无逆(inverse-free)神经网络模型(含离散时滞模型作为特例),并结合 Lyapunov-Krasovskii 理论和线性矩阵不等式(LMI)方法进行分析
  • results: 所提模型指数收敛于绝对值方程的解,并能求解一类满足 $\|A^{-1}\|>1$ 的绝对值方程
    Abstract An inverse-free neural network model with mixed delays is proposed for solving the absolute value equation (AVE) $Ax -|x| - b =0$, which includes an inverse-free neural network model with discrete delay as a special case. By using the Lyapunov-Krasovskii theory and the linear matrix inequality (LMI) method, the developed neural network models are proved to be exponentially convergent to the solution of the AVE. Compared with the existing neural network models for solving the AVE, the proposed models feature the ability of solving a class of AVE with $\|A^{-1}\|>1$. Numerical simulations are given to show the effectiveness of the two delayed neural network models.
    摘要 本文提出一种带混合时滞的无逆神经网络模型,用于求解绝对值方程(AVE)$Ax - |x| - b = 0$;带离散时滞的无逆神经网络模型是其特例。利用 Lyapunov-Krasovskii 理论和线性矩阵不等式(LMI)方法,我们证明了所提神经网络模型指数收敛于AVE的解。与现有求解AVE的神经网络模型相比,所提模型能够求解一类满足 $\|A^{-1}\|>1$ 的AVE。数值仿真展示了两种时滞神经网络模型的有效性。

A Local Graph Limits Perspective on Sampling-Based GNNs

  • paper_url: http://arxiv.org/abs/2310.10953
  • repo_url: None
  • paper_authors: Yeganeh Alimohammadi, Luana Ruiz, Amin Saberi
  • for: 这项研究旨在提供一种训练图神经网络(GNNs)的理论框架,用于处理大输入图。
  • methods: 这项研究使用了采样方法,将大输入图分解成小固定大小的子图进行训练。
  • results: 研究发现,通过采样训练GNNs,可以在减少训练时间和数据量的情况下,保持模型的性能。在一个节点分类任务中,研究发现,使用小型子图进行采样训练,可以达到与直接训练在原始图上的性能相似的水平。
    Abstract We propose a theoretical framework for training Graph Neural Networks (GNNs) on large input graphs via training on small, fixed-size sampled subgraphs. This framework is applicable to a wide range of models, including popular sampling-based GNNs, such as GraphSAGE and FastGCN. Leveraging the theory of graph local limits, we prove that, under mild assumptions, parameters learned from training sampling-based GNNs on small samples of a large input graph are within an $\epsilon$-neighborhood of the outcome of training the same architecture on the whole graph. We derive bounds on the number of samples, the size of the graph, and the training steps required as a function of $\epsilon$. Our results give a novel theoretical understanding for using sampling in training GNNs. They also suggest that by training GNNs on small samples of the input graph, practitioners can identify and select the best models, hyperparameters, and sampling algorithms more efficiently. We empirically illustrate our results on a node classification task on large citation graphs, observing that sampling-based GNNs trained on local subgraphs 12$\times$ smaller than the original graph achieve comparable performance to those trained on the input graph.
    摘要 我们提出一种理论框架,通过在小型、固定大小的采样子图上训练,来在大输入图上训练图神经网络(GNN)。该框架适用于多种模型,包括GraphSAGE和FastGCN等流行的基于采样的GNN。借助图局部极限理论,我们证明在较温和的假设下,在大输入图的小样本上训练采样式GNN所学到的参数,与在整个图上训练同一架构所得结果之间相差不超过一个 $\epsilon$-邻域。我们推导了所需的样本数量、图的规模以及训练步数关于 $\epsilon$ 的上界。我们的结果为在GNN训练中使用采样提供了新的理论理解,并表明通过在输入图的小样本上训练GNN,实践者可以更高效地确定最佳模型、超参数与采样算法。我们在大型引文图的节点分类任务上进行了实证验证,发现在比原图小12倍的局部子图上训练的采样式GNN可以达到与在原图上训练相当的性能。

Restricted Tweedie Stochastic Block Models

  • paper_url: http://arxiv.org/abs/2310.10952
  • repo_url: None
  • paper_authors: Jie Jian, Mu Zhu, Peijun Sang
  • for: 本文旨在提出一种基于受限 Tweedie 分布的随机块模型(SBM),用于对边权为非负、零膨胀连续值的网络进行社区检测。
  • methods: 该模型使用受限 Tweedie 分布对边权(如国家间贸易额)建模,并纳入节点信息(如国家之间的地理距离)及其对边权的动态影响。
  • results: 在大量仿真和真实国际贸易数据上展示了该模型的有效性。特别地,当节点数足够多时,计算最大似然估计时协变量效应的估计与各节点的社区标签无关,据此可以设计一种高效的两步算法,将协变量效应的估计与其他参数分离。
    Abstract The stochastic block model (SBM) is a widely used framework for community detection in networks, where the network structure is typically represented by an adjacency matrix. However, conventional SBMs are not directly applicable to an adjacency matrix that consists of non-negative zero-inflated continuous edge weights. To model the international trading network, where edge weights represent trading values between countries, we propose an innovative SBM based on a restricted Tweedie distribution. Additionally, we incorporate nodal information, such as the geographical distance between countries, and account for its dynamic effect on edge weights. Notably, we show that given a sufficiently large number of nodes, estimating this covariate effect becomes independent of community labels of each node when computing the maximum likelihood estimator of parameters in our model. This result enables the development of an efficient two-step algorithm that separates the estimation of covariate effects from other parameters. We demonstrate the effectiveness of our proposed method through extensive simulation studies and an application to real-world international trading data.
    摘要 随机块模型(SBM)是网络社区检测中广泛使用的框架,网络结构通常由邻接矩阵表示。然而,传统SBM并不能直接处理由非负、零膨胀的连续边权构成的邻接矩阵。为了对国际贸易网络(其中边权表示国家之间的贸易额)建模,我们提出一种基于受限 Tweedie 分布的新型SBM。此外,我们纳入节点信息(如国家之间的地理距离),并考虑其对边权的动态影响。值得注意的是,我们证明当节点数足够大时,在计算模型参数的最大似然估计时,该协变量效应的估计与各节点的社区标签无关。这一结果使我们能够设计一种高效的两步算法,将协变量效应的估计与其他参数分离。我们通过大量仿真研究以及对真实国际贸易数据的应用,展示了所提方法的有效性。

Combat Urban Congestion via Collaboration: Heterogeneous GNN-based MARL for Coordinated Platooning and Traffic Signal Control

  • paper_url: http://arxiv.org/abs/2310.10948
  • repo_url: None
  • paper_authors: Xianyue Peng, Hang Gao, Hao Wang, H. Michael Zhang
  • for: 提高交通流量和减少拥堵
  • methods: 使用多智能体学习和交通理论来解决异质性和协调问题,并设计了各自的观察、操作和奖励函数来优化交通流量
  • results: 通过 SUMO 模拟,实现了交通流量等多个指标的协调结果,并与独立的信号控制或队列控制相比,表现更佳。
    Abstract Over the years, reinforcement learning has emerged as a popular approach to develop signal control and vehicle platooning strategies either independently or in a hierarchical way. However, jointly controlling both in real-time to alleviate traffic congestion presents new challenges, such as the inherent physical and behavioral heterogeneity between signal control and platooning, as well as coordination between them. This paper proposes an innovative solution to tackle these challenges based on heterogeneous graph multi-agent reinforcement learning and traffic theories. Our approach involves: 1) designing platoon and signal control as distinct reinforcement learning agents with their own set of observations, actions, and reward functions to optimize traffic flow; 2) designing coordination by incorporating graph neural networks within multi-agent reinforcement learning to facilitate seamless information exchange among agents on a regional scale. We evaluate our approach through SUMO simulation, which shows a convergent result in terms of various transportation metrics and better performance over sole signal or platooning control.
    摘要 多年来,强化学习已成为开发信号控制与车辆队列(platooning)策略的流行方法,二者或独立开发,或以分层方式结合。然而,为缓解交通拥堵而对二者进行实时联合控制带来了新的挑战,例如信号控制与车辆队列之间固有的物理与行为异质性,以及二者之间的协调问题。本文基于异质图多智能体强化学习和交通理论,提出一种创新的解决方案:(1)将车辆队列与信号控制设计为各自拥有观测、动作与奖励函数的独立强化学习智能体,以优化交通流;(2)在多智能体强化学习中引入图神经网络,实现区域尺度上智能体之间的无缝信息交换,从而完成协调。我们通过SUMO仿真对该方法进行评估,结果在多项交通指标上收敛,且优于仅使用信号控制或仅使用车辆队列控制。

Multi-point Feedback of Bandit Convex Optimization with Hard Constraints

  • paper_url: http://arxiv.org/abs/2310.10946
  • repo_url: None
  • paper_authors: Yasunari Hikima
  • for: 本文研究带约束的赌博机凸优化(bandit convex optimization)问题:学习者在只能部分观测损失函数信息的情况下生成决策序列,使累积损失与累积约束违反同时尽量小。
  • methods: 我们采用累积硬约束违反度 $\sum_{t=1}^{T} \max\{g_t(\boldsymbol{x}_t), 0\}$ 作为约束违反的度量。由于取最大值运算,严格可行的解无法抵消已发生的约束违反,这与传统的长期约束违反度量不同。我们提出一种基于罚函数的近端梯度下降方法,利用两点函数值评估来估计梯度。
  • results: 对于凸损失函数与时变约束,我们的算法可达到 $O(d^2T^{\max\{c,1-c\}})$ 的遗憾界和 $O(d^2T^{1-\frac{c}{2}})$ 的累积硬约束违反界,其中 $d$ 是可行域的维度,$c\in[\frac{1}{2}, 1)$ 为用户设定的参数。我们还将结果推广到强凸损失函数的情形,并证明遗憾与约束违反界均可进一步降低。
    Abstract This paper studies bandit convex optimization with constraints, where the learner aims to generate a sequence of decisions under partial information of loss functions such that the cumulative loss is reduced as well as the cumulative constraint violation is simultaneously reduced. We adopt the cumulative \textit{hard} constraint violation as the metric of constraint violation, which is defined by $\sum_{t=1}^{T} \max\{g_t(\boldsymbol{x}_t), 0\}$. Owing to the maximum operator, a strictly feasible solution cannot cancel out the effects of violated constraints compared to the conventional metric known as \textit{long-term} constraints violation. We present a penalty-based proximal gradient descent method that attains a sub-linear growth of both regret and cumulative hard constraint violation, in which the gradient is estimated with a two-point function evaluation. Precisely, our algorithm attains $O(d^2T^{\max\{c,1-c\}})$ regret bounds and $O(d^2T^{1-\frac{c}{2}})$ cumulative hard constraint violation bounds for convex loss functions and time-varying constraints, where $d$ is the dimensionality of the feasible region and $c\in[\frac{1}{2}, 1)$ is a user-determined parameter. We also extend the result for the case where the loss functions are strongly convex and show that both regret and constraint violation bounds can be further reduced.
    摘要 本文研究带约束的赌博机凸优化问题:学习者在仅能部分观测损失函数信息的情况下生成决策序列,同时降低累积损失与累积约束违反。我们采用累积硬约束违反 $\sum_{t=1}^{T} \max\{g_t(\boldsymbol{x}_t), 0\}$ 作为约束违反的度量;由于取最大值运算,严格可行解无法像传统的长期约束违反度量那样抵消已发生的约束违反。我们提出一种基于罚函数的近端梯度下降方法,其梯度通过两点函数值评估进行估计,使遗憾与累积硬约束违反均呈次线性增长。具体而言,对于凸损失函数与时变约束,算法可达到 $O(d^2T^{\max\{c,1-c\}})$ 的遗憾界和 $O(d^2T^{1-\frac{c}{2}})$ 的累积硬约束违反界,其中 $d$ 为可行域的维度,$c\in[\frac{1}{2}, 1)$ 为用户设定的参数。我们还将结果推广到强凸损失函数的情形,并证明遗憾与约束违反界均可进一步降低。
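
The core estimator is the classical two-point gradient estimate $\hat g = \frac{d}{2\delta}\,(f(x+\delta u) - f(x-\delta u))\,u$ with $u$ uniform on the unit sphere. The sketch below plugs it into a simple penalized projected-gradient loop and tracks the cumulative hard violation; the step size, penalty weight and projection set are illustrative assumptions rather than the paper's tuned algorithm.

```python
import numpy as np

def two_point_gradient(f, x, delta, rng):
    """Two-point gradient estimate: (d / (2*delta)) * (f(x+du) - f(x-du)) * u."""
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    return d / (2 * delta) * (f(x + delta * u) - f(x - delta * u)) * u

def project_ball(x, radius=1.0):
    n = np.linalg.norm(x)
    return x if n <= radius else x * (radius / n)

rng = np.random.default_rng(0)
f = lambda x: np.sum((x - 0.3) ** 2)          # unknown convex loss, value-only access
g = lambda x: np.sum(x) - 0.5                 # constraint g(x) <= 0
x, eta, delta, lam = np.zeros(5), 0.05, 0.01, 5.0
violation = 0.0
for t in range(2000):
    grad = two_point_gradient(lambda z: f(z) + lam * max(g(z), 0.0), x, delta, rng)
    x = project_ball(x - eta * grad)          # penalized bandit gradient step
    violation += max(g(x), 0.0)               # cumulative hard constraint violation
print(np.round(x, 3), round(violation, 3))
```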

Reaching the Limit in Autonomous Racing: Optimal Control versus Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.10943
  • repo_url: None
  • paper_authors: Yunlong Song, Angel Romero, Matthias Mueller, Vladlen Koltun, Davide Scaramuzza
  • for: This paper aims to design a control system for an agile mobile robot, specifically in the context of autonomous drone racing.
  • methods: The paper uses reinforcement learning (RL) to train a neural network controller, which outperforms optimal control (OC) methods in this setting.
  • results: The RL controller achieves superhuman control performance within minutes of training on a standard workstation, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour.
    Abstract A central question in robotics is how to design a control system for an agile mobile robot. This paper studies this question systematically, focusing on a challenging setting: autonomous drone racing. We show that a neural network controller trained with reinforcement learning (RL) outperformed optimal control (OC) methods in this setting. We then investigated which fundamental factors have contributed to the success of RL or have limited OC. Our study indicates that the fundamental advantage of RL over OC is not that it optimizes its objective better but that it optimizes a better objective. OC decomposes the problem into planning and control with an explicit intermediate representation, such as a trajectory, that serves as an interface. This decomposition limits the range of behaviors that can be expressed by the controller, leading to inferior control performance when facing unmodeled effects. In contrast, RL can directly optimize a task-level objective and can leverage domain randomization to cope with model uncertainty, allowing the discovery of more robust control responses. Our findings allowed us to push an agile drone to its maximum performance, achieving a peak acceleration greater than 12 times the gravitational acceleration and a peak velocity of 108 kilometers per hour. Our policy achieved superhuman control within minutes of training on a standard workstation. This work presents a milestone in agile robotics and sheds light on the role of RL and OC in robot control.
    摘要 机器人学的一个核心问题是如何为敏捷移动机器人设计控制系统。本文系统地研究了这一问题,聚焦于一个具有挑战性的场景:自主无人机竞速。我们表明,使用强化学习(RL)训练的神经网络控制器在该场景中优于最优控制(OC)方法。随后,我们研究了促成RL成功或限制OC的基本因素。研究表明,RL相对OC的根本优势并非在于更好地优化其目标,而在于它优化了一个更好的目标。OC将问题分解为规划与控制两部分,并以轨迹等显式中间表示作为接口;这种分解限制了控制器可表达的行为范围,在面对未建模效应时导致控制性能下降。相比之下,RL可以直接优化任务级目标,并借助域随机化来应对模型不确定性,从而发现更鲁棒的控制响应。这些发现使我们能够将敏捷无人机推向其性能极限,达到超过12倍重力加速度的峰值加速度和108公里/小时的峰值速度。我们的策略在标准工作站上训练数分钟内即实现了超越人类的控制水平。这项工作是敏捷机器人领域的一个里程碑,并揭示了RL与OC在机器人控制中的作用。

Fast and Simple Spectral Clustering in Theory and Practice

  • paper_url: http://arxiv.org/abs/2310.10939
  • repo_url: https://github.com/pmacg/fast-spectral-clustering
  • paper_authors: Peter Macgregor
  • for: 在图中寻找 k 个簇
  • methods: 使用谱聚类方法,通过幂法(power method)计算顶点嵌入,再对嵌入后的顶点进行聚类
  • results: 可以快速地找到图中的簇,并准确地恢复真实的聚类结构
    Abstract Spectral clustering is a popular and effective algorithm designed to find $k$ clusters in a graph $G$. In the classical spectral clustering algorithm, the vertices of $G$ are embedded into $\mathbb{R}^k$ using $k$ eigenvectors of the graph Laplacian matrix. However, computing this embedding is computationally expensive and dominates the running time of the algorithm. In this paper, we present a simple spectral clustering algorithm based on a vertex embedding with $O(\log(k))$ vectors computed by the power method. The vertex embedding is computed in nearly-linear time with respect to the size of the graph, and the algorithm provably recovers the ground truth clusters under natural assumptions on the input graph. We evaluate the new algorithm on several synthetic and real-world datasets, finding that it is significantly faster than alternative clustering algorithms, while producing results with approximately the same clustering accuracy.
    摘要 谱聚类是一种流行且有效的算法,用于在图 $G$ 中寻找 $k$ 个簇。在经典的谱聚类算法中,图的顶点利用图拉普拉斯矩阵的 $k$ 个特征向量被嵌入到 $\mathbb{R}^k$ 中。然而,计算该嵌入的代价高昂,主导了算法的运行时间。本文提出一种简单的谱聚类算法,其顶点嵌入由幂法计算得到的 $O(\log(k))$ 个向量构成。该顶点嵌入可在关于图规模的近线性时间内计算,并且在对输入图的自然假设下,算法可被证明能够恢复真实的簇。我们在多个合成与真实数据集上评估了新算法,发现它显著快于其他聚类算法,同时聚类精度与之大致相当。
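
The spirit of the method, smoothing a handful of random vectors with a lazy random-walk matrix via the power method and then running k-means on the resulting O(log k)-dimensional embedding, can be sketched as follows. The iteration count and the exact smoothing operator are assumptions, not the constants analyzed in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def power_method_embedding(A, k, iters=30, seed=0):
    """Embed vertices with O(log k) vectors produced by repeated smoothing (power method)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
    M = 0.5 * (np.eye(n) + A / deg)              # lazy random-walk matrix
    t = max(1, int(np.ceil(np.log2(k))) + 1)     # O(log k) random start vectors
    Y = rng.standard_normal((n, t))
    for _ in range(iters):
        Y = M @ Y
    return Y

def fast_spectral_clustering(A, k):
    Y = power_method_embedding(A, k)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Y)

# two dense blocks connected by a single edge
A = np.zeros((20, 20))
A[:10, :10] = 1; A[10:, 10:] = 1; np.fill_diagonal(A, 0); A[0, 10] = A[10, 0] = 1
print(fast_spectral_clustering(A, k=2))
```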

Machine Learning in the Quantum Age: Quantum vs. Classical Support Vector Machines

  • paper_url: http://arxiv.org/abs/2310.10910
  • repo_url: None
  • paper_authors: Davut Emre Tasar, Kutan Koruyan, Ceren Ocal Tasar
  • for: 本研究探讨机器学习算法在经典和量子计算范式下的效率。具体来说,以支持向量机(SVM)为重点,我们在Iris数据集上比较了经典SVM与运行于量子硬件的量子支持向量机(QSVM)的分类能力。
  • methods: 本研究通过Qiskit库实现并开展了一系列实验,并进行了超参数优化。
  • results: 研究发现,在某些情形下,QSVM可以达到与经典SVM相当的精度,但目前执行时间仍然较长;此外,提升量子计算能力与并行规模可以显著改善量子机器学习算法的性能。本研究为量子时代机器学习应用的现状与未来潜力提供了有价值的见解。
    Abstract This work endeavors to juxtapose the efficacy of machine learning algorithms within classical and quantum computational paradigms. Particularly, by emphasizing on Support Vector Machines (SVM), we scrutinize the classification prowess of classical SVM and Quantum Support Vector Machines (QSVM) operational on quantum hardware over the Iris dataset. The methodology embraced encapsulates an extensive array of experiments orchestrated through the Qiskit library, alongside hyperparameter optimization. The findings unveil that in particular scenarios, QSVMs extend a level of accuracy that can vie with classical SVMs, albeit the execution times are presently protracted. Moreover, we underscore that augmenting quantum computational capacity and the magnitude of parallelism can markedly ameliorate the performance of quantum machine learning algorithms. This inquiry furnishes invaluable insights regarding the extant scenario and future potentiality of machine learning applications in the quantum epoch. Colab: https://t.ly/QKuz0
    摘要 本研究旨在比较机器学习算法在经典与量子计算范式下的效果。特别地,以支持向量机(SVM)为重点,我们在Iris数据集上考察了经典SVM与运行于量子硬件的量子支持向量机(QSVM)的分类能力。所采用的方法包括通过Qiskit库开展的大量实验以及超参数优化。结果表明,在特定情形下,QSVM可以达到与经典SVM相当的精度,但目前的执行时间仍然较长。此外,我们强调,提升量子计算能力与并行规模可以显著改善量子机器学习算法的性能。这项研究为量子时代机器学习应用的现状与未来潜力提供了宝贵的见解。Colab: https://t.ly/QKuz0
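
The classical half of such a comparison is a few lines of scikit-learn; a QSVM replaces the classical kernel with a fidelity-based quantum kernel. The commented qiskit-machine-learning names below are indicative only and depend on the installed version.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_tr, y_tr)   # classical baseline
print("classical SVM accuracy:", clf.score(X_te, y_te))

# A QSVM swaps the kernel for a quantum one, e.g. (exact imports depend on the
# qiskit-machine-learning version installed):
#   from qiskit.circuit.library import ZZFeatureMap
#   from qiskit_machine_learning.kernels import FidelityQuantumKernel
#   from qiskit_machine_learning.algorithms import QSVC
#   qsvc = QSVC(quantum_kernel=FidelityQuantumKernel(feature_map=ZZFeatureMap(4)))
#   qsvc.fit(X_tr, y_tr)
```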

Heterogenous Memory Augmented Neural Networks

  • paper_url: http://arxiv.org/abs/2310.10909
  • repo_url: https://github.com/qiuzh20/hma
  • paper_authors: Zihan Qiu, Zhen Liu, Shuicheng Yan, Shanghang Zhang, Jie Fu
  • for: 本研究旨在提出一种基于异构记忆增强的神经网络方法,以提高神经网络在数据稀缺和分布外(OOD)场景中的表现。
  • methods: 该方法通过引入学习记忆标记和注意机制,使得神经网络可以更好地处理大量数据,而无需增加巨大的计算成本。
  • results: 经过广泛的实验证明,该方法可以与不同的底层模型(MLP、CNN、GNN和Transformer)结合使用,并在图像和图格等任务下显示出竞争性的表现,特别是在数据稀缺和OOD情况下。
    Abstract It has been shown that semi-parametric methods, which combine standard neural networks with non-parametric components such as external memory modules and data retrieval, are particularly helpful in data scarcity and out-of-distribution (OOD) scenarios. However, existing semi-parametric methods mostly depend on independent raw data points - this strategy is difficult to scale up due to both high computational costs and the incapacity of current attention mechanisms with a large number of tokens. In this paper, we introduce a novel heterogeneous memory augmentation approach for neural networks which, by introducing learnable memory tokens with attention mechanism, can effectively boost performance without huge computational overhead. Our general-purpose method can be seamlessly combined with various backbones (MLP, CNN, GNN, and Transformer) in a plug-and-play manner. We extensively evaluate our approach on various image and graph-based tasks under both in-distribution (ID) and OOD conditions and show its competitive performance against task-specific state-of-the-art methods. Code is available at \url{https://github.com/qiuzh20/HMA}.
    摘要 研究表明,将标准神经网络与外部记忆模块、数据检索等非参数组件相结合的半参数方法,在数据稀缺和分布外(OOD)场景中尤其有效。然而,现有的半参数方法大多依赖于相互独立的原始数据点,这一策略由于计算开销高、且当前注意力机制难以处理大量token而难以扩展。本文提出一种新颖的异构记忆增强方法:通过引入可学习的记忆token与注意力机制,在不带来巨大计算开销的情况下有效提升性能。该通用方法可以即插即用地与多种骨干网络(MLP、CNN、GNN和Transformer)结合。我们在多种图像与图任务上、在分布内(ID)和OOD条件下对该方法进行了广泛评估,结果显示其性能可与针对特定任务的最先进方法相媲美。代码见 \url{https://github.com/qiuzh20/HMA}。
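
A minimal sketch of the memory-token idea: backbone features cross-attend to a bank of learnable memory tokens and fuse the retrieved content through a residual connection. This uses a single homogeneous memory bank and standard multi-head attention, which simplifies the paper's heterogeneous design; dimensions and token counts are assumptions.

```python
import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    """Backbone features cross-attend to learnable memory tokens (illustrative sketch)."""

    def __init__(self, dim, num_memory_tokens=16, num_heads=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(num_memory_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, seq, dim) backbone features
        mem = self.memory.unsqueeze(0).expand(x.size(0), -1, -1)
        retrieved, _ = self.attn(query=x, key=mem, value=mem)
        return self.norm(x + retrieved)         # residual fusion of retrieved memory

block = MemoryAugmentedBlock(dim=64)
out = block(torch.randn(8, 10, 64))
print(out.shape)                                # torch.Size([8, 10, 64])
```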

Surrogate Active Subspaces for Jump-Discontinuous Functions

  • paper_url: http://arxiv.org/abs/2310.10907
  • repo_url: None
  • paper_authors: Nathan Wycoff
  • for: 该研究旨在探讨代理模型(surrogate)与活跃子空间在计算社会科学中的应用,尤其是针对输出离散、存在不连续性的基于智能体的模型(Agent-Based Models),以及这些技术在处理不连续函数时的局限。
  • methods: 本研究使用高斯过程估计活跃子空间,并将活跃子空间的定义推广到不连续函数,从而澄清此类分析实际估计的量。研究在合成测试函数上进行了数值比较,并将方法应用于难民迁移的基于智能体模型Flee。
  • results: 研究发现,在不连续函数上用高斯过程估计活跃子空间仍能提供有用的信息,帮助理解模型中重要的参数;在Flee模型上的应用为非洲和中东的8次流离失所危机带来了新的发现。
    Abstract Surrogate modeling and active subspaces have emerged as powerful paradigms in computational science and engineering. Porting such techniques to computational models in the social sciences brings into sharp relief their limitations in dealing with discontinuous simulators, such as Agent-Based Models, which have discrete outputs. Nevertheless, prior applied work has shown that surrogate estimates of active subspaces for such estimators can yield interesting results. But given that active subspaces are defined by way of gradients, it is not clear what quantity is being estimated when this methodology is applied to a discontinuous simulator. We begin this article by showing some pathologies that can arise when conducting such an analysis. This motivates an extension of active subspaces to discontinuous functions, clarifying what is actually being estimated in such analyses. We also conduct numerical experiments on synthetic test functions to compare Gaussian process estimates of active subspaces on continuous and discontinuous functions. Finally, we deploy our methodology on Flee, an agent-based model of refugee movement, yielding novel insights into which parameters of the simulation are most important across 8 displacement crises in Africa and the Middle East.
    摘要 代理模型(surrogate modeling)与活跃子空间已成为计算科学与工程中的有力范式。将这些技术移植到社会科学的计算模型上,凸显了它们在处理不连续模拟器(例如输出离散的基于智能体的模型)时的局限。尽管如此,已有应用研究表明,对此类模拟器的活跃子空间进行代理估计仍能得到有趣的结果。但由于活跃子空间是通过梯度定义的,当这一方法应用于不连续模拟器时,其所估计的量并不明确。本文首先展示了在进行此类分析时可能出现的一些病理现象,由此引出将活跃子空间推广到不连续函数的做法,并澄清此类分析中实际估计的量。我们还在合成测试函数上进行数值实验,比较高斯过程在连续与不连续函数上对活跃子空间的估计。最后,我们将该方法应用于难民迁移的基于智能体模型Flee,就非洲和中东的8次流离失所危机中哪些模拟参数最为重要得到了新的见解。
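
For a smooth test function, the classical active-subspace recipe (eigendecomposition of the averaged outer product of gradients) looks as follows; finite-difference gradients stand in for the surrogate/GP gradients, and this sketch does not cover the paper's extension to jump-discontinuous simulators.

```python
import numpy as np

def finite_diff_grad(f, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def active_subspace(f, dim, n_samples=500, seed=0):
    """Eigendecomposition of C = E[grad f grad f^T] over uniform samples of [-1, 1]^dim."""
    rng = np.random.default_rng(seed)
    C = np.zeros((dim, dim))
    for _ in range(n_samples):
        x = rng.uniform(-1, 1, size=dim)
        g = finite_diff_grad(f, x)
        C += np.outer(g, g) / n_samples
    eigvals, eigvecs = np.linalg.eigh(C)
    return eigvals[::-1], eigvecs[:, ::-1]       # sorted, largest eigenvalue first

# f varies only along the direction w, so the active subspace is one-dimensional
w = np.array([0.7, -0.7, 0.1, 0.1, 0.0])
f = lambda x: np.exp(x @ w)
vals, vecs = active_subspace(f, dim=5)
print(np.round(vals, 3))
print(np.round(vecs[:, 0], 3))                   # dominant direction, proportional to w
```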

Analyzing Modularity Maximization in Approximation, Heuristic, and Graph Neural Network Algorithms for Community Detection

  • paper_url: http://arxiv.org/abs/2310.10898
  • repo_url: None
  • paper_authors: Samin Aref, Mahdi Mostajabdaveh
  • for: 本研究旨在探讨不同模块度最大化算法在网络社区检测中的性能。
  • methods: 本研究使用了104个网络,包括来自不同领域的真实网络实例以及具有模块结构的合成图。研究将10种非精确的模块度最大化算法(8种启发式算法、图神经网络算法的2种变体以及Bayan近似算法的若干变体)与一个全局优化模块度的精确整数规划基线进行了对比。
  • results: 研究发现,即使网络具有模块结构,大多数常用的模块度最大化算法也很少能得到最优划分或接近最优划分的结果。如果要使用模块度来发现社区,采用近似优化算法在方法学上更为合理。
    Abstract Community detection, a fundamental problem in computational sciences, finds applications in various domains. Heuristics are often employed to detect communities through maximizing an objective function, modularity, over partitions of network nodes. Our research delves into the performance of different modularity maximization algorithms in achieving optimal partitions. We use 104 networks, comprising real-world instances from diverse contexts and synthetic graphs with modular structures. We analyze ten inexact modularity-based algorithms against an exact baseline which is an exact integer programming method that globally optimizes modularity. The ten algorithms analyzed include eight heuristics, two variations of a graph neural network algorithm, and several variations of the Bayan approximation algorithm. Our analysis uncovers substantial dissimilarities between the partitions obtained by most commonly used modularity-based methods and any optimal partition of the networks, as indicated by both adjusted and reduced mutual information metrics. Importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of the commonly used unguaranteed modularity-based methods for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition even on networks with modular structures. If modularity is to be used for detecting communities, approximate optimization algorithms are recommendable for a more methodologically sound usage of modularity within its applicability limits.
    摘要 社区检测是计算科学中的一个基础问题,在众多领域有应用。人们常通过在网络节点划分上最大化模块度这一目标函数的启发式方法来检测社区。本研究考察了不同模块度最大化算法在获得最优划分方面的表现。我们使用104个网络,包括来自不同背景的真实实例以及具有模块结构的合成图,将10种非精确的基于模块度的算法(8种启发式算法、图神经网络算法的2种变体以及Bayan近似算法的若干变体)与一个全局优化模块度的精确整数规划基线进行对比。调整互信息与约简互信息指标均表明,最常用的基于模块度的方法所得到的划分与任何最优划分之间往往存在显著差异;更重要的是,接近最优的划分也常常与最优划分尤其不相似。总的来说,我们的分析指出了常用的、无最优性保证的模块度方法在社区发现中的一个关键局限:即使在具有模块结构的网络上,它们也很少能得到最优划分或与最优划分相近的划分。若要使用模块度来检测社区,建议采用近似优化算法,以便在其适用范围内更符合方法学规范地使用模块度。
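
The comparison protocol can be reproduced in miniature with NetworkX: run a modularity heuristic (here CNM greedy agglomeration) on a planted-partition graph, then compare the recovered partition to the planted one. AMI is used below; the paper also reports reduced mutual information, which is not available in scikit-learn, and the graph parameters are assumptions.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity
from sklearn.metrics import adjusted_mutual_info_score

# planted partition: 4 communities of 30 nodes each, dense inside, sparse across
G = nx.planted_partition_graph(4, 30, p_in=0.3, p_out=0.02, seed=0)
truth = [node // 30 for node in G.nodes()]

parts = greedy_modularity_communities(G)        # CNM heuristic (one of the inexact methods)
labels = {n: i for i, block in enumerate(parts) for n in block}
found = [labels[n] for n in G.nodes()]

print("modularity of heuristic partition:", round(modularity(G, parts), 3))
print("AMI vs planted communities:", round(adjusted_mutual_info_score(truth, found), 3))
```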