cs.LG - 2023-07-18

Enhancing Pattern Classification in Support Vector Machines through Matrix Formulation

  • paper_url: http://arxiv.org/abs/2307.09372
  • repo_url: None
  • paper_authors: Sambhav Jain, Reshma Rastogi
  • for: To propose a matrix formulation of the Support Vector Machine (Matrix SVM) that overcomes the limitations of existing SVM models in multiclass and multilabel settings.
  • methods: The Matrix-SVM problem is solved efficiently by applying the Accelerated Gradient Descent method in the dual.
  • results: Experiments on multilabel and multiclass datasets show that Matrix SVM achieves superior time efficiency while delivering results comparable to Binary Relevance SVM; the matrix formulation also reveals insights and advantages that are not readily apparent in traditional vector-based notation.
    Abstract Support Vector Machines (SVM) have gathered significant acclaim as classifiers due to their successful implementation of Statistical Learning Theory. However, in the context of multiclass and multilabel settings, the reliance on vector-based formulations in existing SVM-based models poses limitations regarding flexibility and ease of incorporating additional terms to handle specific challenges. To overcome these limitations, our research paper focuses on introducing a matrix formulation for SVM that effectively addresses these constraints. By employing the Accelerated Gradient Descent method in the dual, we notably enhance the efficiency of solving the Matrix-SVM problem. Experimental evaluations on multilabel and multiclass datasets demonstrate that Matrix SVM achieves superior time efficacy while delivering similar results to Binary Relevance SVM. Moreover, our matrix formulation unveils crucial insights and advantages that may not be readily apparent in traditional vector-based notations. We emphasize that numerous multilabel models can be viewed as extensions of SVM, with customised modifications to meet specific requirements. The matrix formulation presented in this paper establishes a solid foundation for developing more sophisticated models capable of effectively addressing the distinctive challenges encountered in multilabel learning.
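To make the optimization step concrete, here is a minimal sketch of projected Nesterov-accelerated ascent on a standard (bias-free) soft-margin SVM dual with a linear kernel. It illustrates the general idea of accelerated gradient methods in the dual; the setup is illustrative and is not the paper's Matrix-SVM formulation.

```python
# Minimal sketch: projected Nesterov-accelerated gradient ascent on a
# bias-free soft-margin SVM dual with a linear kernel. Illustrative only;
# not the paper's Matrix-SVM objective.
import numpy as np

def svm_dual_accelerated(X, y, C=1.0, lr=1e-3, n_iter=500):
    """X: (n, d) features, y: (n,) labels in {-1, +1}."""
    n = X.shape[0]
    G = y[:, None] * X                      # signed data matrix
    Q = G @ G.T                             # Q_ij = y_i y_j <x_i, x_j>
    alpha = np.zeros(n)
    alpha_prev = np.zeros(n)
    for t in range(1, n_iter + 1):
        beta = (t - 1) / (t + 2)            # Nesterov momentum coefficient
        z = alpha + beta * (alpha - alpha_prev)
        grad = 1.0 - Q @ z                  # gradient of the dual objective
        alpha_prev = alpha
        alpha = np.clip(z + lr * grad, 0.0, C)   # project onto the box [0, C]^n
    w = (alpha[:, None] * G).sum(axis=0)    # recover the primal weight vector
    return alpha, w

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=40) > 0, 1.0, -1.0)
alpha, w = svm_dual_accelerated(X, y)
print(np.mean(np.sign(X @ w) == y))         # training accuracy of the toy fit
```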

Explanation-Guided Fair Federated Learning for Transparent 6G RAN Slicing

  • paper_url: http://arxiv.org/abs/2307.09494
  • repo_url: None
  • paper_authors: Swastika Roy, Hatim Chergui, Christos Verikoukis
  • for: To establish transparency and trustworthiness in zero-touch 6G network automation by using explainable AI (XAI) to build trust in the AI black boxes.
  • methods: The paper combines closed-loop automation with explanation-guided learning (EGL) in an explanation-guided federated learning (EGFL) scheme, using the Jensen-Shannon (JS) divergence to exploit model explanations during training.
  • results: Simulations show that the proposed EGFL-JS scheme improves the reliability and fairness of per-slice RAN dropped-traffic probability predictions in 6G networks, achieving more than a 50% gain in comprehensiveness over literature baselines and improving the recall score.
    Abstract Future zero-touch artificial intelligence (AI)-driven 6G network automation requires building trust in the AI black boxes via explainable artificial intelligence (XAI), where it is expected that AI faithfulness would be a quantifiable service-level agreement (SLA) metric along with telecommunications key performance indicators (KPIs). This entails exploiting the XAI outputs to generate transparent and unbiased deep neural networks (DNNs). Motivated by closed-loop (CL) automation and explanation-guided learning (EGL), we design an explanation-guided federated learning (EGFL) scheme to ensure trustworthy predictions by exploiting the model explanation emanating from XAI strategies during the training run time via Jensen-Shannon (JS) divergence. Specifically, we predict per-slice RAN dropped traffic probability to exemplify the proposed concept while respecting fairness goals formulated in terms of the recall metric which is included as a constraint in the optimization task. Finally, the comprehensiveness score is adopted to measure and validate the faithfulness of the explanations quantitatively. Simulation results show that the proposed EGFL-JS scheme has achieved more than $50\%$ increase in terms of comprehensiveness compared to different baselines from the literature, especially the variant EGFL-KL that is based on the Kullback-Leibler Divergence. It has also improved the recall score with more than $25\%$ relatively to unconstrained-EGFL.
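For context, the Jensen-Shannon divergence used to compare explanation distributions can be computed as below. This is a generic sketch of the JS term, not the paper's exact loss or federated setup.

```python
# Minimal sketch of the Jensen-Shannon divergence between two normalized
# attribution (explanation) vectors; a generic example of the JS term that
# an explanation-guided loss could penalize. Not the paper's exact objective.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log((a + eps) / (b + eps))))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# e.g. compare feature attributions from two training rounds of a slice model
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(js_divergence(p, q))
```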

Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives

  • paper_url: http://arxiv.org/abs/2307.09366
  • repo_url: None
  • paper_authors: Kayhan Behdin, Wenyu Chen, Rahul Mazumder
  • For: The paper aims to estimate the inverse covariance matrix of a multivariate Gaussian distribution, assuming it is sparse.
  • Methods: The proposed method, GraphL0BnB, is based on an $\ell_0$-penalized version of the pseudolikelihood function and uses a custom nonlinear branch-and-bound framework to solve the resulting mixed integer program.
  • Results: Numerical experiments on real and synthetic datasets demonstrate that GraphL0BnB solves the problem to near-optimality, even for large instances with $p = 10^4$ variables, and compare its performance with various state-of-the-art approaches.
    Abstract We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (aka precision matrix), assuming it is sparse (i.e., has a few nonzero entries). We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function, while most earlier approaches are based on the $\ell_1$-relaxation. Our estimator can be formulated as a convex mixed integer program (MIP) which can be difficult to compute at scale using off-the-shelf commercial solvers. To solve the MIP, we propose a custom nonlinear branch-and-bound (BnB) framework that solves node relaxations with tailored first-order methods. As a by-product of our BnB framework, we propose large-scale solvers for obtaining good primal solutions that are of independent interest. We derive novel statistical guarantees (estimation and variable selection) for our estimator and discuss how our approach improves upon existing estimators. Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$ -- corresponding to a symmetric matrix of size $p \times p$ with $p^2/2$ binary variables. We demonstrate the usefulness of GraphL0BnB versus various state-of-the-art approaches on a range of datasets.
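For comparison, the $\ell_1$-relaxation that most earlier approaches rely on is available off the shelf; the sketch below estimates a sparse precision matrix with scikit-learn's graphical lasso. GraphL0BnB itself solves an $\ell_0$-penalized pseudolikelihood with a custom branch-and-bound, which is not shown here.

```python
# Minimal sketch of the l1-relaxation baseline the paper contrasts with:
# estimating a sparse precision matrix via the graphical lasso in scikit-learn.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Ground-truth sparse precision matrix on p = 5 variables
theta = np.eye(5)
theta[0, 1] = theta[1, 0] = 0.4
X = rng.multivariate_normal(np.zeros(5), np.linalg.inv(theta), size=500)

model = GraphicalLasso(alpha=0.05).fit(X)
print(np.round(model.precision_, 2))   # estimated (sparse) inverse covariance
```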

An Evaluation of Zero-Cost Proxies – from Neural Architecture Performance to Model Robustness

  • paper_url: http://arxiv.org/abs/2307.09365
  • repo_url: None
  • paper_authors: Jovita Lukasik, Michael Moeller, Margret Keuper
  • for: To study zero-cost proxies for neural architecture search, in particular for the joint search over robustness and clean accuracy.
  • methods: Common zero-cost proxies are evaluated as performance predictors in the NAS-Bench-201 search space, and the feature importance of the proxies is analysed.
  • results: Predicting robustness from a single proxy is considerably harder than predicting clean accuracy; several proxies must be combined to predict a model's robustness, whereas clean accuracy can be regressed from a single such feature.
    Abstract Zero-cost proxies are nowadays frequently studied and used to search for neural architectures. They show an impressive ability to predict the performance of architectures by making use of their untrained weights. These techniques allow for immense search speed-ups. So far the joint search for well-performing and robust architectures has received much less attention in the field of NAS. Therefore, the main focus of zero-cost proxies is the clean accuracy of architectures, whereas the model robustness should play an evenly important part. In this paper, we analyze the ability of common zero-cost proxies to serve as performance predictors for robustness in the popular NAS-Bench-201 search space. We are interested in the single prediction task for robustness and the joint multi-objective of clean and robust accuracy. We further analyze the feature importance of the proxies and show that predicting the robustness makes the prediction task from existing zero-cost proxies more challenging. As a result, the joint consideration of several proxies becomes necessary to predict a model's robustness while the clean accuracy can be regressed from a single such feature.
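As an illustration of what a zero-cost proxy computes, the sketch below scores an untrained PyTorch network with a simple gradient-norm proxy on one mini-batch. It is one representative proxy, not the paper's full proxy set or benchmark.

```python
# Minimal sketch of a gradient-norm zero-cost proxy: score an untrained
# network by the total gradient norm on a single random mini-batch.
import torch
import torch.nn as nn

def grad_norm_proxy(model, batch, targets, loss_fn=nn.CrossEntropyLoss()):
    model.zero_grad()
    loss = loss_fn(model(batch), targets)
    loss.backward()
    # Sum of gradient norms over all parameters with gradients.
    return sum(p.grad.norm().item() for p in model.parameters() if p.grad is not None)

net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(8, 3, 32, 32)
y = torch.randint(0, 10, (8,))
print(grad_norm_proxy(net, x, y))   # higher scores are used to rank architectures
```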

MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments

  • paper_url: http://arxiv.org/abs/2307.09361
  • repo_url: None
  • paper_authors: Spyros Gidaris, Andrei Bursuc, Oriane Simeoni, Antonin Vobecky, Nikos Komodakis, Matthieu Cord, Patrick Pérez
  • for: To reduce the appetite of Vision Transformer networks for very large fully-annotated datasets by means of self-supervised learning.
  • methods: MOCA, a single-stage and standalone method, unifies mask-and-predict objectives defined on high-level features (instead of pixel-level details), capturing both good contextual reasoning and invariance to image perturbations in a synergistic, computation-efficient way.
  • results: New state-of-the-art results in low-shot settings and strong results across various evaluation protocols, with training at least 3 times faster than prior methods.
    Abstract Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks for very large fully-annotated datasets. Different classes of self-supervised learning offer representations with either good contextual reasoning properties, e.g., using masked image modeling strategies, or invariance to image perturbations, e.g., with contrastive methods. In this work, we propose a single-stage and standalone method, MOCA, which unifies both desired properties using novel mask-and-predict objectives defined with high-level features (instead of pixel-level details). Moreover, we show how to effectively employ both learning paradigms in a synergistic and computation-efficient way. Doing so, we achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols with a training that is at least 3 times faster than prior methods.

Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

  • paper_url: http://arxiv.org/abs/2307.09357
  • repo_url: https://github.com/IBM/aihwkit
  • paper_authors: Manuel Le Gallo, Corey Lammie, Julian Buechel, Fabio Carta, Omobayode Fagbohungbe, Charles Mackin, Hsinyu Tsai, Vijay Narayanan, Abu Sebastian, Kaoutar El Maghraoui, Malte J. Rasch
  • for: To show how to deploy deep neural network (DNN) inference and training on Analog In-Memory Computing (AIMC) hardware while achieving accuracy equivalent to digital computing.
  • methods: The tutorial uses IBM's Analog Hardware Acceleration Kit (AIHWKit), a Python library that simulates DNN inference and training on AIMC, and describes its design, functionality, and best practices.
  • results: The tutorial demonstrates inference and training with AIHWKit, introduces the Analog AI Cloud Composer, which offers the AIHWKit simulation platform in a fully managed cloud setting, and shows how users can extend and customise AIHWKit.
    Abstract Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal peripheral circuitry in AIMC chips, require adapting DNNs to be deployed on such hardware to achieve equivalent accuracy to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design, functionality, and best practices to properly perform inference and training. We also present an overview of the Analog AI Cloud Composer, that provides the benefits of using the AIHWKit simulation platform in a fully managed cloud setting. Finally, we show examples on how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit, which can be downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial.
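The sketch below shows the flavour of training a small analog layer with AIHWKit, loosely following the style of the library's README; exact class and method names may differ between AIHWKit versions, so treat the API usage as an assumption to check against the repository.

```python
# Minimal AIHWKit-style training loop (adapted from the style of the project
# README; verify class/method names against your AIHWKit version).
from torch import Tensor
from torch.nn.functional import mse_loss

from aihwkit.nn import AnalogLinear        # analog fully-connected layer
from aihwkit.optim import AnalogSGD        # analog-aware SGD optimizer

x = Tensor([[0.1, 0.2, 0.4, 0.3], [0.2, 0.1, 0.1, 0.3]])
y = Tensor([[1.0, 0.5], [0.7, 0.3]])

model = AnalogLinear(4, 2)                 # weights live on simulated AIMC tiles
opt = AnalogSGD(model.parameters(), lr=0.1)
opt.regroup_param_groups(model)

for epoch in range(10):
    opt.zero_grad()
    loss = mse_loss(model(x), y)
    loss.backward()
    opt.step()
    print(f"loss: {loss:.6f}")
```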

Learning to Select SAT Encodings for Pseudo-Boolean and Linear Integer Constraints

  • paper_url: http://arxiv.org/abs/2307.09342
  • repo_url: https://github.com/felixvuo/lease-data
  • paper_authors: Felix Ulrich-Oltean, Peter Nightingale, James Alfred Walker
  • for: Solving complex constraint satisfaction and optimisation problems via SAT encodings.
  • methods: A supervised machine learning approach is used to select encodings for pseudo-Boolean and linear constraints.
  • results: The approach compares favourably to AutoFolio and can select encodings effectively, even for unseen problem classes.
    Abstract Many constraint satisfaction and optimisation problems can be solved effectively by encoding them as instances of the Boolean Satisfiability problem (SAT). However, even the simplest types of constraints have many encodings in the literature with widely varying performance, and the problem of selecting suitable encodings for a given problem instance is not trivial. We explore the problem of selecting encodings for pseudo-Boolean and linear constraints using a supervised machine learning approach. We show that it is possible to select encodings effectively using a standard set of features for constraint problems; however we obtain better performance with a new set of features specifically designed for the pseudo-Boolean and linear constraints. In fact, we achieve good results when selecting encodings for unseen problem classes. Our results compare favourably to AutoFolio when using the same feature set. We discuss the relative importance of instance features to the task of selecting the best encodings, and compare several variations of the machine learning method.
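The selection step can be framed as ordinary supervised classification over instance features, as in the generic sketch below. The features, labels, and learner here are placeholders rather than the paper's feature sets or pipeline.

```python
# Generic sketch of encoding selection as supervised classification: given
# per-instance features, predict which encoding to use. Placeholder data only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                  # per-instance constraint features
best_encoding = rng.integers(0, 4, size=200)   # label: index of the fastest encoding

X_tr, X_te, y_tr, y_te = train_test_split(X, best_encoding, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```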

Towards Automated Semantic Segmentation in Mammography Images

  • paper_url: http://arxiv.org/abs/2307.10296
  • repo_url: None
  • paper_authors: Cesar A. Sierra-Franco, Jan Hurtado, Victor de A. Thomaz, Leonardo C. da Cruz, Santiago V. Silva, Alberto B. Raposo
  • for: Detecting non-palpable breast lesions and supporting diagnosis and assessment of image adequacy.
  • methods: A deep learning framework automatically segments the nipple, pectoral muscle, fibroglandular tissue, and fatty tissue on standard-view mammography images.
  • results: Accurate segmentation performance across varied and challenging cases and different model architectures, indicating that the framework can be integrated into clinical practice.
    Abstract Mammography images are widely used to detect non-palpable breast lesions or nodules, preventing cancer and providing the opportunity to plan interventions when necessary. The identification of some structures of interest is essential to make a diagnosis and evaluate image adequacy. Thus, computer-aided detection systems can be helpful in assisting medical interpretation by automatically segmenting these landmark structures. In this paper, we propose a deep learning-based framework for the segmentation of the nipple, the pectoral muscle, the fibroglandular tissue, and the fatty tissue on standard-view mammography images. We introduce a large private segmentation dataset and extensive experiments considering different deep-learning model architectures. Our experiments demonstrate accurate segmentation performance on variate and challenging cases, showing that this framework can be integrated into clinical practice.

Exploiting Field Dependencies for Learning on Categorical Data

  • paper_url: http://arxiv.org/abs/2307.09321
  • repo_url: https://github.com/csiro-robotics/mdl
  • paper_authors: Zhibin Li, Piotr Koniusz, Lu Zhang, Daniel Edward Pagendam, Peyman Moghadam
  • for: Learning dependencies between fields in categorical data to improve model accuracy and robustness.
  • methods: A novel method learns a global field dependency matrix and then refines it at the instance level with different weights (local dependency modelling) to better model dependencies between fields.
  • results: The method outperforms several state-of-the-art methods on six popular dataset benchmarks; detailed ablation studies provide additional insights.
    Abstract Traditional approaches for learning on categorical data underexploit the dependencies between columns (\aka fields) in a dataset because they rely on the embedding of data points driven alone by the classification/regression loss. In contrast, we propose a novel method for learning on categorical data with the goal of exploiting dependencies between fields. Instead of modelling statistics of features globally (i.e., by the covariance matrix of features), we learn a global field dependency matrix that captures dependencies between fields and then we refine the global field dependency matrix at the instance-wise level with different weights (so-called local dependency modelling) w.r.t. each field to improve the modelling of the field dependencies. Our algorithm exploits the meta-learning paradigm, i.e., the dependency matrices are refined in the inner loop of the meta-learning algorithm without the use of labels, whereas the outer loop intertwines the updates of the embedding matrix (the matrix performing projection) and global dependency matrix in a supervised fashion (with the use of labels). Our method is simple yet it outperforms several state-of-the-art methods on six popular dataset benchmarks. Detailed ablation studies provide additional insights into our method.

Biomaker CA: a Biome Maker project using Cellular Automata

  • paper_url: http://arxiv.org/abs/2307.09320
  • repo_url: None
  • paper_authors: Ettore Randazzo, Alexander Mordvintsev
  • for: A study of simulating the growth and evolution of plant-like organisms using cellular automata (CA).
  • methods: CA rules on 2D grids are parallelized on GPUs through the Python JAX framework, with support for different environments and laws of 'physics', as well as different model architectures and mutation strategies.
  • results: Simulations across different environments and physics show that plant agents can grow, survive, reproduce, and evolve in nutrient-starved environments, and that models can also be evolved interactively by a user.
    Abstract We introduce Biomaker CA: a Biome Maker project using Cellular Automata (CA). In Biomaker CA, morphogenesis is a first class citizen and small seeds need to grow into plant-like organisms to survive in a nutrient starved environment and eventually reproduce with variation so that a biome survives for long timelines. We simulate complex biomes by means of CA rules in 2D grids and parallelize all of its computation on GPUs through the Python JAX framework. We show how this project allows for several different kinds of environments and laws of 'physics', alongside different model architectures and mutation strategies. We further analyze some configurations to show how plant agents can grow, survive, reproduce, and evolve, forming stable and unstable biomes. We then demonstrate how one can meta-evolve models to survive in a harsh environment either through end-to-end meta-evolution or by a more surgical and efficient approach, called Petri dish meta-evolution. Finally, we show how to perform interactive evolution, where the user decides how to evolve a plant model interactively and then deploys it in a larger environment. We open source Biomaker CA at: https://tinyurl.com/2x8yu34s .
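As a minimal illustration of running CA rules in parallel with JAX, the sketch below jit-compiles one update step of a Conway-style automaton on a 2D grid; Biomaker CA's rules, environments, and nutrient dynamics are far richer than this toy rule.

```python
# Minimal sketch of a parallel 2D cellular-automaton update in JAX
# (Conway-style rule as a stand-in for Biomaker CA's richer update rules).
import jax
import jax.numpy as jnp

def step(grid):
    # Sum the 8 neighbours by rolling the grid in each direction.
    neighbours = sum(
        jnp.roll(jnp.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    born = (grid == 0) & (neighbours == 3)
    survive = (grid == 1) & ((neighbours == 2) | (neighbours == 3))
    return (born | survive).astype(grid.dtype)

key = jax.random.PRNGKey(0)
grid = (jax.random.uniform(key, (64, 64)) < 0.3).astype(jnp.int32)
grid = jax.jit(step)(grid)   # jit-compiled; runs on GPU/TPU if available
print(grid.sum())
```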

Multi-Modal Discussion Transformer: Integrating Text, Images and Graph Transformers to Detect Hate Speech on Social Media

  • paper_url: http://arxiv.org/abs/2307.09312
  • repo_url: https://github.com/liamhebert/multimodaldiscussiontransformer
  • paper_authors: Liam Hebert, Gaurav Sahu, Nanda Kishore Sreenivas, Lukasz Golab, Robin Cohen
  • for: To develop a hate speech detection model based on multimodal graph representations that captures abusive language in online social networks.
  • methods: The model uses graph transformers to capture the contextual relationships in the entire discussion surrounding a comment, with interwoven fusion layers that combine text and image embeddings instead of processing the modalities separately.
  • results: Compared to text-only baselines, the model shows significantly improved performance in detecting hate speech; extensive ablation studies are also conducted.
    Abstract We present the Multi-Modal Discussion Transformer (mDT), a novel multi-modal graph-based transformer model for detecting hate speech in online social networks. In contrast to traditional text-only methods, our approach to labelling a comment as hate speech centers around the holistic analysis of text and images. This is done by leveraging graph transformers to capture the contextual relationships in the entire discussion that surrounds a comment, with interwoven fusion layers to combine text and image embeddings instead of processing different modalities separately. We compare the performance of our model to baselines that only process text; we also conduct extensive ablation studies. We conclude with future work for multimodal solutions to deliver social value in online contexts, arguing that capturing a holistic view of a conversation greatly advances the effort to detect anti-social behavior.

Automatic Differentiation for Inverse Problems with Applications in Quantum Transport

  • paper_url: http://arxiv.org/abs/2307.09311
  • repo_url: None
  • paper_authors: Ivan Williams, Eric Polizzi
  • for: inverse quantum transport problem
  • methods: neural solver and differentiable simulation
  • results: engineering continuous transmission properties and current-voltage characteristics
    Abstract A neural solver and differentiable simulation of the quantum transmitting boundary model is presented for the inverse quantum transport problem. The neural solver is used to engineer continuous transmission properties and the differentiable simulation is used to engineer current-voltage characteristics.

EigenTrajectory: Low-Rank Descriptors for Multi-Modal Trajectory Forecasting

  • paper_url: http://arxiv.org/abs/2307.09306
  • repo_url: https://github.com/inhwanbae/eigentrajectory
  • paper_authors: Inhwan Bae, Jean Oh, Hae-Gon Jeon
  • For: To improve the accuracy and reliability of pedestrian trajectory forecasting by using a novel trajectory descriptor that reduces the dimensionality of trajectories.
  • Methods: Pedestrian trajectories are transformed into a compact $\mathbb{ET}$ space using the new descriptor and then fed into off-the-shelf trajectory forecasting models; a trajectory anchor-based refinement method is also proposed to cover all possible futures.
  • Results: The EigenTrajectory predictor significantly improves both the prediction accuracy and reliability of existing trajectory forecasting models, indicating that the proposed descriptor is suited to represent pedestrian behaviour. Code is available at https://github.com/inhwanbae/EigenTrajectory .
    Abstract Capturing high-dimensional social interactions and feasible futures is essential for predicting trajectories. To address this complex nature, several attempts have been devoted to reducing the dimensionality of the output variables via parametric curve fitting such as the B\'ezier curve and B-spline function. However, these functions, which originate in computer graphics fields, are not suitable to account for socially acceptable human dynamics. In this paper, we present EigenTrajectory ($\mathbb{ET}$), a trajectory prediction approach that uses a novel trajectory descriptor to form a compact space, known here as $\mathbb{ET}$ space, in place of Euclidean space, for representing pedestrian movements. We first reduce the complexity of the trajectory descriptor via a low-rank approximation. We transform the pedestrians' history paths into our $\mathbb{ET}$ space represented by spatio-temporal principle components, and feed them into off-the-shelf trajectory forecasting models. The inputs and outputs of the models as well as social interactions are all gathered and aggregated in the corresponding $\mathbb{ET}$ space. Lastly, we propose a trajectory anchor-based refinement method to cover all possible futures in the proposed $\mathbb{ET}$ space. Extensive experiments demonstrate that our EigenTrajectory predictor can significantly improve both the prediction accuracy and reliability of existing trajectory forecasting models on public benchmarks, indicating that the proposed descriptor is suited to represent pedestrian behaviors. Code is publicly available at https://github.com/inhwanbae/EigenTrajectory .
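The core idea of a compact, low-rank trajectory space can be illustrated with plain SVD/PCA, as in the sketch below. It conveys the general construction of spatio-temporal principal components, not the paper's exact $\mathbb{ET}$ descriptor or anchor refinement.

```python
# Minimal sketch of a low-rank trajectory descriptor: flatten each observed
# (x, y) history, keep the top-k principal components, and represent each
# trajectory by its coefficients in that basis. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
trajs = rng.normal(size=(500, 8, 2))                # 500 pedestrians, 8 steps, (x, y)
flat = trajs.reshape(len(trajs), -1)                # (500, 16)

mean = flat.mean(axis=0)
U, S, Vt = np.linalg.svd(flat - mean, full_matrices=False)
k = 4
basis = Vt[:k]                                      # top-k spatio-temporal components

coeffs = (flat - mean) @ basis.T                    # compact descriptors, shape (500, k)
recon = coeffs @ basis + mean                       # map back to trajectory space
print("mean reconstruction error:", np.abs(recon - flat).mean())
```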

Conformal prediction under ambiguous ground truth

  • paper_url: http://arxiv.org/abs/2307.09302
  • repo_url: None
  • paper_authors: David Stutz, Abhijit Guha Roy, Tatiana Matejovicova, Patricia Strachan, Ali Taylan Cemgil, Arnaud Doucet
  • for: To propose a conformal prediction method for settings with ambiguous ground truth, enabling uncertainty quantification when definitive labels are unavailable.
  • methods: The method relies on an approximation of the underlying posterior distribution of labels given inputs to handle the absence of crisp ground truth labels.
  • results: On synthetic and real datasets, the method produces well-calibrated prediction sets, including in a case study of skin condition classification in dermatology.
    Abstract In safety-critical classification tasks, conformal prediction allows to perform rigorous uncertainty quantification by providing confidence sets including the true class with a user-specified probability. This generally assumes the availability of a held-out calibration set with access to ground truth labels. Unfortunately, in many domains, such labels are difficult to obtain and usually approximated by aggregating expert opinions. In fact, this holds true for almost all datasets, including well-known ones such as CIFAR and ImageNet. Applying conformal prediction using such labels underestimates uncertainty. Indeed, when expert opinions are not resolvable, there is inherent ambiguity present in the labels. That is, we do not have ``crisp'', definitive ground truth labels and this uncertainty should be taken into account during calibration. In this paper, we develop a conformal prediction framework for such ambiguous ground truth settings which relies on an approximation of the underlying posterior distribution of labels given inputs. We demonstrate our methodology on synthetic and real datasets, including a case study of skin condition classification in dermatology.
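For background, the sketch below implements standard split conformal prediction with crisp labels; the paper's contribution is to generalize this calibration step to ambiguous ground truth via an approximate posterior over labels, which is not shown here.

```python
# Minimal sketch of split conformal prediction with crisp calibration labels.
# The paper extends this calibration step to ambiguous (non-crisp) ground truth.
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """cal_probs/test_probs: (n, K) softmax scores; cal_labels: (n,) ints."""
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))          # conformal quantile rank
    q = np.sort(scores)[min(k, n) - 1]
    return test_probs >= 1.0 - q                     # (m, K) membership mask

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=200)
cal_labels = cal_probs.argmax(axis=1)                # toy "ground truth"
test_probs = rng.dirichlet(np.ones(3), size=5)
print(conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1))
```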

FlexiAST: Flexibility is What AST Needs

  • paper_url: http://arxiv.org/abs/2307.09286
  • repo_url: https://github.com/JiuFengSC/FlexiAST_INTERSPEECH23
  • paper_authors: Jiu Feng, Mehmet Hamza Erol, Joon Son Chung, Arda Senocak
  • for: To give Audio Spectrogram Transformer (AST) models flexibility with respect to patch size, so that performance is maintained across patch sizes.
  • methods: A training procedure based on random patch size selection and resizing of patch and positional embedding weights, which allows standard AST models to work with various patch sizes at inference without architectural changes.
  • results: Experiments show that FlexiAST gives performance similar to standard AST models while maintaining its evaluation ability at various patch sizes across different audio classification datasets.
    Abstract The objective of this work is to give patch-size flexibility to Audio Spectrogram Transformers (AST). Recent advancements in ASTs have shown superior performance in various audio-based tasks. However, the performance of standard ASTs degrades drastically when evaluated using different patch sizes from that used during training. As a result, AST models are typically re-trained to accommodate changes in patch sizes. To overcome this limitation, this paper proposes a training procedure to provide flexibility to standard AST models without architectural changes, allowing them to work with various patch sizes at the inference stage - FlexiAST. This proposed training approach simply utilizes random patch size selection and resizing of patch and positional embedding weights. Our experiments show that FlexiAST gives similar performance to standard AST models while maintaining its evaluation ability at various patch sizes on different datasets for audio classification tasks.
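The weight-resizing part can be sketched with ordinary bilinear interpolation of the patch-projection kernel and positional embeddings, as below. FlexiAST additionally randomizes the patch size during training, and the exact resizing scheme in the paper may differ.

```python
# Minimal sketch of adapting a ViT/AST-style model to a new patch size by
# bilinearly resizing the patch-embedding kernel and positional embeddings.
import torch
import torch.nn.functional as F

def resize_patch_embed(weight, new_patch):
    # weight: (embed_dim, in_chans, p, p) conv kernel of the patch projection
    return F.interpolate(weight, size=(new_patch, new_patch),
                         mode="bilinear", align_corners=False)

def resize_pos_embed(pos, old_grid, new_grid):
    # pos: (1, old_grid*old_grid, dim) -> (1, new_grid*new_grid, dim)
    dim = pos.shape[-1]
    pos = pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    pos = F.interpolate(pos, size=(new_grid, new_grid),
                        mode="bilinear", align_corners=False)
    return pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)

w = torch.randn(768, 1, 16, 16)           # spectrograms are single-channel
print(resize_patch_embed(w, 8).shape)     # torch.Size([768, 1, 8, 8])
pos = torch.randn(1, 12 * 12, 768)
print(resize_pos_embed(pos, 12, 24).shape)
```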

End-to-End Neural Network Training for Hyperbox-Based Classification

  • paper_url: http://arxiv.org/abs/2307.09269
  • repo_url: https://github.com/mlde-ms/hypernn
  • paper_authors: Denis Mayr Lima Martins, Christian Lülf, Fabian Gieseke
  • for: To propose a new, fully differentiable framework for hyperbox-based classification that can efficiently handle large volumes of data.
  • methods: Hyperbox models are made differentiable and trained end-to-end with neural networks.
  • results: Significantly reduced training times and superior classification results compared to existing hyperbox-based methods.
    Abstract Hyperbox-based classification has been seen as a promising technique in which decisions on the data are represented as a series of orthogonal, multidimensional boxes (i.e., hyperboxes) that are often interpretable and human-readable. However, existing methods are no longer capable of efficiently handling the increasing volume of data many application domains face nowadays. We address this gap by proposing a novel, fully differentiable framework for hyperbox-based classification via neural networks. In contrast to previous work, our hyperbox models can be efficiently trained in an end-to-end fashion, which leads to significantly reduced training times and superior classification results.
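A differentiable hyperbox can be approximated with a "soft" membership function so that the box bounds are trainable by gradient descent, as in the generic sketch below. This illustrates the idea of end-to-end training rather than the paper's specific architecture.

```python
# Generic sketch of a differentiable "soft" hyperbox membership: a point is
# inside a box if it lies between the lower and upper bounds in every
# dimension, relaxed here with sigmoids so the bounds can be trained.
import torch
import torch.nn as nn

class SoftHyperbox(nn.Module):
    def __init__(self, dim, n_boxes, sharpness=10.0):
        super().__init__()
        self.lower = nn.Parameter(torch.rand(n_boxes, dim))
        self.upper = nn.Parameter(torch.rand(n_boxes, dim) + 0.5)
        self.k = sharpness

    def forward(self, x):                       # x: (batch, dim)
        x = x.unsqueeze(1)                      # (batch, 1, dim)
        inside = torch.sigmoid(self.k * (x - self.lower)) * \
                 torch.sigmoid(self.k * (self.upper - x))
        return inside.prod(dim=-1)              # (batch, n_boxes) membership

boxes = SoftHyperbox(dim=2, n_boxes=3)
print(boxes(torch.rand(4, 2)))                  # differentiable memberships
```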

Mobility-Aware Joint User Scheduling and Resource Allocation for Low Latency Federated Learning

  • paper_url: http://arxiv.org/abs/2307.09263
  • repo_url: None
  • paper_authors: Kecheng Fan, Wen Chen, Jun Li, Xiumei Deng, Xuefeng Han, Ming Ding
  • for: To propose a practical approach that addresses the degradation of federated learning (FL) training performance caused by user mobility.
  • methods: A realistic model of user mobility across multiple base stations, together with a joint user scheduling and resource allocation method (the delay-aware greedy search algorithm, DAGSA) that minimizes training delay under constrained communication resources.
  • results: Simulation results show that the proposed algorithm outperforms state-of-the-art baselines and that a certain level of user mobility can improve training performance.
    Abstract As an efficient distributed machine learning approach, Federated learning (FL) can obtain a shared model by iterative local model training at the user side and global model aggregating at the central server side, thereby protecting privacy of users. Mobile users in FL systems typically communicate with base stations (BSs) via wireless channels, where training performance could be degraded due to unreliable access caused by user mobility. However, existing work only investigates a static scenario or random initialization of user locations, which fail to capture mobility in real-world networks. To tackle this issue, we propose a practical model for user mobility in FL across multiple BSs, and develop a user scheduling and resource allocation method to minimize the training delay with constrained communication resources. Specifically, we first formulate an optimization problem with user mobility that jointly considers user selection, BS assignment to users, and bandwidth allocation to minimize the latency in each communication round. This optimization problem turned out to be NP-hard and we proposed a delay-aware greedy search algorithm (DAGSA) to solve it. Simulation results show that the proposed algorithm achieves better performance than the state-of-the-art baselines and a certain level of user mobility could improve training performance.

Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds

  • paper_url: http://arxiv.org/abs/2307.09259
  • repo_url: None
  • paper_authors: Naoki Nishikawa, Yuichi Ike, Kenji Yamanishi
  • for: To improve the accuracy of machine learning on point clouds, with applications in shape recognition and materials science.
  • methods: A neural network learns a filtration adaptively, with an architecture designed so that the resulting persistent homology is isometry-invariant.
  • results: The framework performs well on several classification tasks, demonstrating its efficacy.
    Abstract Machine learning for point clouds has been attracting much attention, with many applications in various fields, such as shape recognition and material science. To enhance the accuracy of such machine learning methods, it is known to be effective to incorporate global topological features, which are typically extracted by persistent homology. In the calculation of persistent homology for a point cloud, we need to choose a filtration for the point clouds, an increasing sequence of spaces. Because the performance of machine learning methods combined with persistent homology is highly affected by the choice of a filtration, we need to tune it depending on data and tasks. In this paper, we propose a framework that learns a filtration adaptively with the use of neural networks. In order to make the resulting persistent homology isometry-invariant, we develop a neural network architecture with such invariance. Additionally, we theoretically show a finite-dimensional approximation result that justifies our architecture. Experimental results demonstrated the efficacy of our framework in several classification tasks.
    摘要 In this paper, we propose a framework that learns a filtration adaptively using neural networks. To ensure the resulting persistent homology isometry-invariant, we develop a neural network architecture with such invariance. Additionally, we provide a finite-dimensional approximation result that justifies our architecture. Experimental results show the effectiveness of our framework in several classification tasks.Here's the translation in Simplified Chinese:机器学习 для点云已经吸引了很多关注,并在各种领域上有广泛的应用,如形状识别和材料科学。为了提高机器学习方法的准确性,通常需要包含全局拓扑特征,通常通过不变 homology 来提取。在计算不变 homology 中,我们需要选择一个筛选器,是一个增长序列的空间。由于选择筛选器的性能会高度影响机器学习方法的性能,因此需要根据数据和任务进行调整。在这篇论文中,我们提出了一种框架,通过使用神经网络来自适应地学习筛选器。为确保结果的不变 homology 尺度 invariants,我们开发了一种具有不变性的神经网络架构。此外,我们也提供了一个数学上的有限维approximation 结果,证明了我们的架构的正确性。实验结果表明,我们的框架在多个分类任务中表现出色。

PAC Neural Prediction Set Learning to Quantify the Uncertainty of Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.09254
  • repo_url: None
  • paper_authors: Sangdon Park, Taesoo Kim
  • for: To improve the reliability and trustworthiness of generative language models by quantifying their uncertainty.
  • methods: Prediction set models parameterized by neural networks, which achieve more precise uncertainty quantification while satisfying a probably approximately correct (PAC) guarantee.
  • results: The method improves the quantified uncertainty by 63% on average over a standard baseline across four language datasets and six models.
    Abstract Uncertainty learning and quantification of models are crucial tasks to enhance the trustworthiness of the models. Importantly, the recent surge of generative language models (GLMs) emphasizes the need for reliable uncertainty quantification due to the concerns on generating hallucinated facts. In this paper, we propose to learn neural prediction set models that comes with the probably approximately correct (PAC) guarantee for quantifying the uncertainty of GLMs. Unlike existing prediction set models, which are parameterized by a scalar value, we propose to parameterize prediction sets via neural networks, which achieves more precise uncertainty quantification but still satisfies the PAC guarantee. We demonstrate the efficacy of our method on four types of language datasets and six types of models by showing that our method improves the quantified uncertainty by $63\%$ on average, compared to a standard baseline method.

UniTabE: Pretraining a Unified Tabular Encoder for Heterogeneous Tabular Data

  • paper_url: http://arxiv.org/abs/2307.09249
  • repo_url: None
  • paper_authors: Yazheng Yang, Yuqi Wang, Guang Liu, Ledell Wu, Qi Liu
  • for: To extend pretraining methods from natural language processing (NLP) to tabular data and improve the semantic representation of tables.
  • methods: UniTabE represents each basic table element with a TabUnit module followed by a Transformer encoder, adapts to heterogeneous table structures, and supports pretraining and finetuning through free-form prompts; pretraining uses a corpus of roughly 13 billion samples gathered from Kaggle.
  • results: UniTabE outperforms several baseline models across multiple benchmark datasets, showing that it effectively enhances the semantic representation of tabular data.
    Abstract Recent advancements in Natural Language Processing (NLP) have witnessed the groundbreaking impact of pretrained models, yielding impressive outcomes across various tasks. This study seeks to extend the power of pretraining methodologies to tabular data, a domain traditionally overlooked, yet inherently challenging due to the plethora of table schemas intrinsic to different tasks. The primary research questions underpinning this work revolve around the adaptation to heterogeneous table structures, the establishment of a universal pretraining protocol for tabular data, the generalizability and transferability of learned knowledge across tasks, the adaptation to diverse downstream applications, and the incorporation of incremental columns over time. In response to these challenges, we introduce UniTabE, a pioneering method designed to process tables in a uniform manner, devoid of constraints imposed by specific table structures. UniTabE's core concept relies on representing each basic table element with a module, termed TabUnit. This is subsequently followed by a Transformer encoder to refine the representation. Moreover, our model is designed to facilitate pretraining and finetuning through the utilization of free-form prompts. In order to implement the pretraining phase, we curated an expansive tabular dataset comprising approximately 13 billion samples, meticulously gathered from the Kaggle platform. Rigorous experimental testing and analyses were performed under a myriad of scenarios to validate the effectiveness of our methodology. The experimental results demonstrate UniTabE's superior performance against several baseline models across a multitude of benchmark datasets. This, therefore, underscores UniTabE's potential to significantly enhance the semantic representation of tabular data, thereby marking a significant stride in the field of tabular data analysis.

Application of BERT in Wind Power Forecasting-Teletraan’s Solution in Baidu KDD Cup 2022

  • paper_url: http://arxiv.org/abs/2307.09248
  • repo_url: https://github.com/longxingtan/kdd2022-baidu
  • paper_authors: Longxing Tan, Hongying Yue
  • for: To support the reliability and sustainability of wind power systems by producing accurate forecasts of wind farm output.
  • methods: A BERT model predicts wind power output, and daily fluctuation is added to the predictions through post-processing so that the results follow the daily periodicity.
  • results: The solution achieved third place out of 2490 teams in the Baidu KDD Cup 2022, demonstrating its effectiveness and accuracy.
    Abstract Nowadays, wind energy has drawn increasing attention as its important role in carbon neutrality and sustainable development. When wind power is integrated into the power grid, precise forecasting is necessary for the sustainability and security of the system. However, the unpredictable nature and long sequence prediction make it especially challenging. In this technical report, we introduce the BERT model applied for Baidu KDD Cup 2022, and the daily fluctuation is added by post-processing to make the predicted results in line with daily periodicity. Our solution achieves 3rd place of 2490 teams. The code is released athttps://github.com/LongxingTan/KDD2022-Baidu

Towards Sustainable Deep Learning for Multi-Label Classification on NILM

  • paper_url: http://arxiv.org/abs/2307.09244
  • repo_url: None
  • paper_authors: Anže Pirnat, Blaž Bertalanič, Gregor Cerar, Mihael Mohorčič, Carolina Fortuna
  • For: To improve the computational and energy efficiency of deep learning (DL) models for multi-label classification in non-intrusive load monitoring (NILM).
  • Methods: A novel DL model for enhanced multi-label NILM classification, designed to reduce computational and energy demands during both training and operation, together with a testing methodology based on data synthesized from measurement datasets to better represent real-world scenarios.
  • Results: Compared to the state of the art, the proposed model reduces the carbon footprint by more than 23% while improving performance by approximately 8 percentage points on average on data derived from the REFIT and UK-DALE datasets.
    Abstract Non-intrusive load monitoring (NILM) is the process of obtaining appliance-level data from a single metering point, measuring total electricity consumption of a household or a business. Appliance-level data can be directly used for demand response applications and energy management systems as well as for awareness raising and motivation for improvements in energy efficiency and reduction in the carbon footprint. Recently, classical machine learning and deep learning (DL) techniques became very popular and proved as highly effective for NILM classification, but with the growing complexity these methods are faced with significant computational and energy demands during both their training and operation. In this paper, we introduce a novel DL model aimed at enhanced multi-label classification of NILM with improved computation and energy efficiency. We also propose a testing methodology for comparison of different models using data synthesized from the measurement datasets so as to better represent real-world scenarios. Compared to the state-of-the-art, the proposed model has its carbon footprint reduced by more than 23% while providing on average approximately 8 percentage points in performance improvement when testing on data derived from REFIT and UK-DALE datasets.

Fusing Hand and Body Skeletons for Human Action Recognition in Assembly

  • paper_url: http://arxiv.org/abs/2307.09238
  • repo_url: None
  • paper_authors: Dustin Aganian, Mona Köhler, Benedict Stephan, Markus Eisenbach, Horst-Michael Gross
  • for: To improve human-robot collaboration by enabling cobots to recognize human actions and assist with assembly tasks.
  • methods: Less detailed body skeletons are combined with highly detailed hand skeletons, using CNNs and transformers; the latter are particularly adept at extracting and combining information from both skeleton types via attention.
  • results: The proposed approach enhances action recognition in assembly scenarios.
    Abstract As collaborative robots (cobots) continue to gain popularity in industrial manufacturing, effective human-robot collaboration becomes crucial. Cobots should be able to recognize human actions to assist with assembly tasks and act autonomously. To achieve this, skeleton-based approaches are often used due to their ability to generalize across various people and environments. Although body skeleton approaches are widely used for action recognition, they may not be accurate enough for assembly actions where the worker's fingers and hands play a significant role. To address this limitation, we propose a method in which less detailed body skeletons are combined with highly detailed hand skeletons. We investigate CNNs and transformers, the latter of which are particularly adept at extracting and combining important information from both skeleton types using attention. This paper demonstrates the effectiveness of our proposed approach in enhancing action recognition in assembly scenarios.

Detecting Throat Cancer from Speech Signals Using Machine Learning: A Reproducible Literature Review

  • paper_url: http://arxiv.org/abs/2307.09230
  • repo_url: None
  • paper_authors: Mary Paterson, James Moor, Luisa Cutillo
  • for: A scoping review of the existing literature on detecting throat cancer from speech recordings using machine learning and artificial intelligence.
  • methods: The reviewed papers most commonly implement neural networks, and a range of features is extracted from the audio before classification, with mel-frequency cepstral coefficients being the most common.
  • results: Using transfer learning on a multi-class problem (three pathologies and healthy controls), the authors' own publicly released classifier achieves an unweighted average recall of 53.54%, sensitivity of 83.14%, and specificity of 64.00%, comparable to results obtained on the same dataset.
    Abstract In this work we perform a scoping review of the current literature on the detection of throat cancer from speech recordings using machine learning and artificial intelligence. We find 22 papers within this area and discuss their methods and results. We split these papers into two groups - nine performing binary classification, and 13 performing multi-class classification. The papers present a range of methods with neural networks being most commonly implemented. Many features are also extracted from the audio before classification, with the most common bring mel-frequency cepstral coefficients. None of the papers found in this search have associated code repositories and as such are not reproducible. Therefore, we create a publicly available code repository of our own classifiers. We use transfer learning on a multi-class problem, classifying three pathologies and healthy controls. Using this technique we achieve an unweighted average recall of 53.54%, sensitivity of 83.14%, and specificity of 64.00%. We compare our classifiers with the results obtained on the same dataset and find similar results.
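Since mel-frequency cepstral coefficients are the most common feature in the reviewed papers, the sketch below extracts a simple per-recording MFCC feature vector with librosa; the synthetic tone stands in for a real speech recording, which would be loaded with librosa.load.

```python
# Minimal sketch: per-recording MFCC features with librosa. A synthetic tone
# stands in for a speech recording (load a real file with librosa.load).
import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr).astype(np.float32)  # 1 s tone

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # (13, n_frames)
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)                                     # (26,) classifier input
```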

How Many Neurons Does it Take to Approximate the Maximum?

  • paper_url: http://arxiv.org/abs/2307.09212
  • repo_url: None
  • paper_authors: Itay Safran, Daniel Reichman, Paul Valiant
  • for: To study the size of a ReLU neural network needed to approximate the maximum function over $d$ inputs in the most basic setting: approximation with respect to the $L_2$ norm for continuous distributions.
  • methods: New lower and upper bounds on the width required for approximation at various depths, together with a depth-$\mathcal{O}(\log(\log(d)))$, width-$\mathcal{O}(d)$ construction that approximates the maximum function efficiently.
  • results: The results establish new depth separations between depth 2 and depth 3 networks and between depth 3 and depth 5 networks, and the depth-$\mathcal{O}(\log(\log(d)))$ construction significantly improves on the best previously known depth bounds for networks with linearly bounded width.
    Abstract We study the size of a neural network needed to approximate the maximum function over $d$ inputs, in the most basic setting of approximating with respect to the $L_2$ norm, for continuous distributions, for a network that uses ReLU activations. We provide new lower and upper bounds on the width required for approximation across various depths. Our results establish new depth separations between depth 2 and 3, and depth 3 and 5 networks, as well as providing a depth $\mathcal{O}(\log(\log(d)))$ and width $\mathcal{O}(d)$ construction which approximates the maximum function, significantly improving upon the depth requirements of the best previously known bounds for networks with linearly-bounded width. Our depth separation results are facilitated by a new lower bound for depth 2 networks approximating the maximum function over the uniform distribution, assuming an exponential upper bound on the size of the weights. Furthermore, we are able to use this depth 2 lower bound to provide tight bounds on the number of neurons needed to approximate the maximum by a depth 3 network. Our lower bounds are of potentially broad interest as they apply to the widely studied and used \emph{max} function, in contrast to many previous results that base their bounds on specially constructed or pathological functions and distributions.
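As background (a standard identity, not taken from the paper): two inputs can be maximized exactly by a depth-2 ReLU network, since
$$\max(a,b)=\frac{(a+b)+|a-b|}{2},\qquad |t|=\mathrm{ReLU}(t)+\mathrm{ReLU}(-t),$$
so that
$$\max(a,b)=\tfrac{1}{2}\big(\mathrm{ReLU}(a-b)+\mathrm{ReLU}(b-a)+\mathrm{ReLU}(a+b)-\mathrm{ReLU}(-a-b)\big).$$
Applying this pairwise maximum recursively computes the maximum of $d$ inputs exactly at depth $\mathcal{O}(\log d)$; the paper shows that, for $L_2$ approximation, depth $\mathcal{O}(\log(\log(d)))$ with width $\mathcal{O}(d)$ suffices, and it proves separations at small depths.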

Automated Ableism: An Exploration of Explicit Disability Biases in Sentiment and Toxicity Analysis Models

  • paper_url: http://arxiv.org/abs/2307.09209
  • repo_url: None
  • paper_authors: Pranav Narayanan Venkit, Mukund Srinath, Shomir Wilson
  • for: To examine whether sentiment analysis and toxicity detection models exhibit explicit bias against people with disability (PWD).
  • methods: Perturbation Sensitivity Analysis is applied to conversations about PWD on Twitter and Reddit to study how disability bias is disseminated in real-world social settings, and the Bias Identification Test in Sentiment (BITS) corpus is created to quantify explicit disability bias in sentiment analysis and toxicity detection models.
  • results: All of the audited systems, four open AIaaS sentiment analysis tools (TextBlob, VADER, Google Cloud Natural Language API, DistilBERT) and two toxicity detection models (two versions of Toxic-BERT), exhibit statistically significant explicit bias against PWD.
    Abstract We analyze sentiment analysis and toxicity detection models to detect the presence of explicit bias against people with disability (PWD). We employ the bias identification framework of Perturbation Sensitivity Analysis to examine conversations related to PWD on social media platforms, specifically Twitter and Reddit, in order to gain insight into how disability bias is disseminated in real-world social settings. We then create the \textit{Bias Identification Test in Sentiment} (BITS) corpus to quantify explicit disability bias in any sentiment analysis and toxicity detection models. Our study utilizes BITS to uncover significant biases in four open AIaaS (AI as a Service) sentiment analysis tools, namely TextBlob, VADER, Google Cloud Natural Language API, DistilBERT and two toxicity detection models, namely two versions of Toxic-BERT. Our findings indicate that all of these models exhibit statistically significant explicit bias against PWD.
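The perturbation idea can be illustrated with one of the audited tools: swap identity terms into a fixed template and track how the sentiment score shifts, as in the sketch below. The template and term list are illustrative only, not the BITS corpus.

```python
# Minimal sketch of perturbation sensitivity analysis: substitute different
# identity terms into a template and measure how an off-the-shelf sentiment
# score shifts. TextBlob is one of the tools audited in the paper.
from textblob import TextBlob

template = "I met a {} person at the park today."
terms = ["tall", "blind", "deaf", "wheelchair-using", "autistic"]

baseline = TextBlob(template.format("tall")).sentiment.polarity
for term in terms:
    score = TextBlob(template.format(term)).sentiment.polarity
    print(f"{term:>16}: polarity={score:+.3f}  shift={score - baseline:+.3f}")
```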

Context-Conditional Navigation with a Learning-Based Terrain- and Robot-Aware Dynamics Model

  • paper_url: http://arxiv.org/abs/2307.09206
  • repo_url: None
  • paper_authors: Suresh Guttikonda, Jan Achterhold, Haolong Li, Joschka Boedecker, Joerg Stueckler
  • for: 本研究目的是开发一种能够适应不同环境和机器人属性变化的自主导航方法。
  • methods: 本研究使用了基于神经过程的元学习前向动力学模型,以适应不同地形和机器人动力学的变化。
  • results: 实验表明,提出的模型在长期轨迹预测任务中的预测误差较低,而且在导航规划任务中能够规划出控制效率更高的路径。
    Abstract In autonomous navigation settings, several quantities can be subject to variations. Terrain properties such as friction coefficients may vary over time depending on the location of the robot. Also, the dynamics of the robot may change due to, e.g., different payloads, changing the system's mass, or wear and tear, changing actuator gains or joint friction. An autonomous agent should thus be able to adapt to such variations. In this paper, we develop a novel probabilistic, terrain- and robot-aware forward dynamics model, termed TRADYN, which is able to adapt to the above-mentioned variations. It builds on recent advances in meta-learning forward dynamics models based on Neural Processes. We evaluate our method in a simulated 2D navigation setting with a unicycle-like robot and different terrain layouts with spatially varying friction coefficients. In our experiments, the proposed model exhibits lower prediction error for the task of long-horizon trajectory prediction, compared to non-adaptive ablation models. We also evaluate our model on the downstream task of navigation planning, which demonstrates improved performance in planning control-efficient paths by taking robot and terrain properties into account.
    摘要 在自主导航设置下,许多量值可能会受到变化。地形特性,如摩擦系数,随时间的变化可能会影响机器人的位置。此外,机器人的动力学也可能会发生变化,例如不同的负荷、系统质量的变化、 actuator gain 或 JOINT 摩擦的变化。因此,一个自主智能体应该能够适应这些变化。在这篇论文中,我们开发了一种新的probabilistic,地形和机器人意识的前瞻动力学模型,称为 TRADYN,它能够适应以上所 mention 的变化。它基于最近的前瞻动力学模型基于神经过程的meta-学进步。我们在一个模拟的2D导航设置中使用了一种unicycle-like 机器人和不同的地形布局,并对其进行了评估。在我们的实验中,提议的模型在长期轨迹预测任务中表现出较低的预测错误,相比非适应模型。此外,我们还评估了我们的模型在导航规划任务中的表现,其表现出了改进的控制效率的规划路径。

Learning Dynamic Attribute-factored World Models for Efficient Multi-object Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.09205
  • repo_url: None
  • paper_authors: Fan Feng, Sara Magliacane
  • for: 这个论文的目的是提高强化学习任务中agent的扩展性和可重复性,使其能够在不同的物体和属性下进行学习和执行任务。
  • methods: 这个论文使用了对象中心表示学习来提取视觉输入中的物体,并将其分类为不同的类别。然后,对每个类别的物体,学习一个类模板图,描述了这种物体的动力和奖励如何因属性分解。还学习了对象之间的互动模式图,描述了不同类别的物体之间的互动。通过这些图和动态互动图,学习出一个策略,可以在新环境中直接应用。
  • results: 在三个标准 dataset上测试了这个框架,并证明了它在未seen的物体、属性和潜在参数下进行扩展和可重复性的任务时表现出色,以及在组合已知任务时的表现也是比较好的。
    Abstract In many reinforcement learning tasks, the agent has to learn to interact with many objects of different types and generalize to unseen combinations and numbers of objects. Often a task is a composition of previously learned tasks (e.g. block stacking). These are examples of compositional generalization, in which we compose object-centric representations to solve complex tasks. Recent works have shown the benefits of object-factored representations and hierarchical abstractions for improving sample efficiency in these settings. On the other hand, these methods do not fully exploit the benefits of factorization in terms of object attributes. In this paper, we address this opportunity and introduce the Dynamic Attribute FacTored RL (DAFT-RL) framework. In DAFT-RL, we leverage object-centric representation learning to extract objects from visual inputs. We learn to classify them in classes and infer their latent parameters. For each class of object, we learn a class template graph that describes how the dynamics and reward of an object of this class factorize according to its attributes. We also learn an interaction pattern graph that describes how objects of different classes interact with each other at the attribute level. Through these graphs and a dynamic interaction graph that models the interactions between objects, we can learn a policy that can then be directly applied in a new environment by just estimating the interactions and latent parameters. We evaluate DAFT-RL in three benchmark datasets and show our framework outperforms the state-of-the-art in generalizing across unseen objects with varying attributes and latent parameters, as well as in the composition of previously learned tasks.
    摘要 在许多强化学习任务中,机器人需要学习与多种不同类型的物体交互,并泛化到未经见过的组合和数量。经常情况下,任务是一个各种已经学习过的任务的组合(例如堆叠块)。这些任务是物体中心的泛化,在这些情况下,我们可以通过物体中心的表示学习来解决复杂任务。在这篇论文中,我们提出了动态特征划分RL(DAFT-RL)框架。在DAFT-RL中,我们利用物体中心的表示学习来提取视觉输入中的物体。我们可以将它们分类为类别,并且从属特征参数的推断。对于每个类型的物体,我们学习一个类型图,该图描述了物体的动力学和奖励因为其特征的分解。我们还学习了不同类型物体之间的交互图,该图描述了物体之间的特征级别交互。通过这些图和动态交互图,我们可以学习一个策略,该策略可以在新环境中直接应用,只需要估计交互和隐藏参数。我们在三个标准数据集上进行了评估,并证明了我们的框架在未经见过的物体特征和隐藏参数的泛化,以及在组合已经学习过的任务中表现出色。

Federated Learning for Computationally-Constrained Heterogeneous Devices: A Survey

  • paper_url: http://arxiv.org/abs/2307.09182
  • repo_url: None
  • paper_authors: Kilian Pfeiffer, Martin Rapp, Ramin Khalili, Jörg Henkel
  • for: 提高用户隐私和减少中心服务器的负担,实现在设备上进行神经网络训练。
  • methods: 联邦学习(Federated Learning)技术,通过共享设备之间的知识,保持用户隐私,同时提高模型精度。
  • results: 在具有多种设备的不同硬件和软件环境中,联邦学习技术面临着多种差异和挑战,需要采取多种策略来减少这些差异,以提高模型精度和可靠性。
    Abstract With an increasing number of smart devices like internet of things (IoT) devices deployed in the field, offloadingtraining of neural networks (NNs) to a central server becomes more and more infeasible. Recent efforts toimprove users' privacy have led to on-device learning emerging as an alternative. However, a model trainedonly on a single device, using only local data, is unlikely to reach a high accuracy. Federated learning (FL)has been introduced as a solution, offering a privacy-preserving trade-off between communication overheadand model accuracy by sharing knowledge between devices but disclosing the devices' private data. Theapplicability and the benefit of applying baseline FL are, however, limited in many relevant use cases dueto the heterogeneity present in such environments. In this survey, we outline the heterogeneity challengesFL has to overcome to be widely applicable in real-world applications. We especially focus on the aspect ofcomputation heterogeneity among the participating devices and provide a comprehensive overview of recentworks on heterogeneity-aware FL. We discuss two groups: works that adapt the NN architecture and worksthat approach heterogeneity on a system level, covering Federated Averaging (FedAvg), distillation, and splitlearning-based approaches, as well as synchronous and asynchronous aggregation schemes.
    摘要 随着物联网(IoT)等智能设备在现场部署的数量不断增加,将神经网络(NN)的训练卸载到中央服务器变得越来越不可行。近年来,为了保护用户隐私,在设备上进行学习(on-device learning)成为一种替代方案。然而,仅用单个设备上的本地数据训练的模型很难达到较高的精度。联邦学习(FL)为此被提出:它通过在设备之间共享知识而不泄露设备的私有数据,在通信开销与模型精度之间提供一种保护隐私的折中。然而,由于此类环境中存在的异构性,基础 FL 的适用性和收益在许多相关应用场景中受到限制。在这篇综述中,我们梳理了 FL 要想在现实应用中广泛落地所必须克服的异构性挑战,尤其关注参与设备之间的计算异构性,并对近期的异构感知 FL 工作进行了全面综述。我们将其分为两组:一组是调整神经网络架构的方法,另一组是在系统层面应对异构性的方法,涵盖联邦平均(FedAvg)、知识蒸馏与拆分学习等途径,以及同步与异步聚合方案。
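
Since Federated Averaging (FedAvg) is the baseline that the surveyed heterogeneity-aware methods build on, here is a minimal NumPy sketch of one FedAvg round (local training followed by data-size-weighted averaging at the server); the linear-regression local update and client data are toy placeholders.

```python
import numpy as np

def local_update(weights, data, labels, lr=0.1, epochs=1):
    """Stub local training: a few gradient steps of linear least squares."""
    w = weights.copy()
    for _ in range(epochs):
        grad = data.T @ (data @ w - labels) / len(labels)
        w -= lr * grad
    return w

def fedavg_round(global_w, client_data):
    """One FedAvg round: clients train locally, server averages weighted by data size."""
    client_weights, client_sizes = [], []
    for X, y in client_data:
        client_weights.append(local_update(global_w, X, y))
        client_sizes.append(len(y))
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(np.stack(client_weights), axis=0, weights=sizes / sizes.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=5)
    clients = []
    for n in (50, 200, 80):  # heterogeneous client dataset sizes
        X = rng.normal(size=(n, 5))
        clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))
    w = np.zeros(5)
    for _ in range(100):
        w = fedavg_round(w, clients)
    print("distance to true weights:", np.linalg.norm(w - true_w))
```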

ECSIC: Epipolar Cross Attention for Stereo Image Compression

  • paper_url: http://arxiv.org/abs/2307.10284
  • repo_url: None
  • paper_authors: Matthias Wödlinger, Jan Kotera, Manuel Keglevic, Jan Xu, Robert Sablatnig
  • for: 这个论文是为了提出一种新的学习基于方法,用于压缩立体图像。
  • methods: 该方法利用了两个 Stereo Context 模块和一个 Stereo Cross Attention(SCA)模块来联合压缩左右图像。SCA 模块将交叉注意力限制在两幅图像相对应的极线(epipolar line)上,并对其并行处理。
  • results: 对比其他方法,ECSIC 在 Cityscapes 和 InStereo2k 两个 популяр的立体图像数据集上达到了最佳性能,同时允许快速编码和解码,非常适合实时应用。
    Abstract In this paper, we present ECSIC, a novel learned method for stereo image compression. Our proposed method compresses the left and right images in a joint manner by exploiting the mutual information between the images of the stereo image pair using a novel stereo cross attention (SCA) module and two stereo context modules. The SCA module performs cross-attention restricted to the corresponding epipolar lines of the two images and processes them in parallel. The stereo context modules improve the entropy estimation of the second encoded image by using the first image as a context. We conduct an extensive ablation study demonstrating the effectiveness of the proposed modules and a comprehensive quantitative and qualitative comparison with existing methods. ECSIC achieves state-of-the-art performance among stereo image compression models on the two popular stereo image datasets Cityscapes and InStereo2k while allowing for fast encoding and decoding, making it highly practical for real-time applications.
    摘要 在这篇论文中,我们提出了一种新的学习式立体图像压缩方法,称为 ECSIC。该方法利用立体图像对中两幅图像之间的互信息,通过一个新的立体交叉注意力(SCA)模块和两个立体上下文模块,对左右图像进行联合压缩。SCA 模块将交叉注意力限制在两幅图像相对应的极线(epipolar line)上,并对其并行处理。两个立体上下文模块通过将第一幅图像作为上下文,改进了第二幅编码图像的熵估计。我们进行了广泛的消融实验,验证了所提模块的有效性,并与现有方法进行了全面的定量和定性比较。ECSIC 在 Cityscapes 和 InStereo2k 两个流行的立体图像数据集上达到了立体图像压缩模型中的最先进性能,同时支持快速编码和解码,因此非常适合实时应用。
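
A minimal PyTorch sketch of the cross-attention pattern described above, under the common assumption of a rectified stereo pair (so corresponding epipolar lines are image rows): every row of left-image features attends only to the same row of right-image features. This is a single-head illustration, not the authors' ECSIC module.

```python
import torch
import torch.nn as nn

class EpipolarCrossAttention(nn.Module):
    """Single-head cross attention restricted to matching rows of a rectified stereo pair."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Linear(channels, channels)
        self.k = nn.Linear(channels, channels)
        self.v = nn.Linear(channels, channels)
        self.scale = channels ** -0.5

    def forward(self, feat_left, feat_right):
        # feat_*: (B, C, H, W) feature maps of the left/right images
        B, C, H, W = feat_left.shape
        # Treat every row (epipolar line) as an independent attention problem.
        q = self.q(feat_left.permute(0, 2, 3, 1).reshape(B * H, W, C))
        k = self.k(feat_right.permute(0, 2, 3, 1).reshape(B * H, W, C))
        v = self.v(feat_right.permute(0, 2, 3, 1).reshape(B * H, W, C))
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B*H, W, W)
        out = attn @ v                                                    # (B*H, W, C)
        return out.reshape(B, H, W, C).permute(0, 3, 1, 2)                # back to (B, C, H, W)

if __name__ == "__main__":
    sca = EpipolarCrossAttention(channels=32)
    left, right = torch.randn(2, 32, 16, 24), torch.randn(2, 32, 16, 24)
    print(sca(left, right).shape)  # torch.Size([2, 32, 16, 24])
```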

Towards Trustworthy Dataset Distillation

  • paper_url: http://arxiv.org/abs/2307.09165
  • repo_url: None
  • paper_authors: Shijie Ma, Fei Zhu, Zhen Cheng, Xu-Yao Zhang
  • for: Trustworthy Dataset Distillation (TrustDD) aims to reduce training costs and enhance the trustworthiness of deep learning models in real-world applications by distilling both in-distribution (InD) samples and outliers.
  • methods: The proposed method utilizes dataset distillation (DD) to condense large datasets into tiny synthetic datasets, and introduces Pseudo-Outlier Exposure (POE) to generate pseudo-outliers and enhance OOD detection.
  • results: Comprehensive experiments on various settings demonstrate the effectiveness of TrustDD, and the proposed POE surpasses the state-of-the-art method Outlier Exposure (OE). TrustDD is more trustworthy and applicable to real open-world scenarios compared to preceding DD methods.
    Abstract Efficiency and trustworthiness are two eternal pursuits when applying deep learning in real-world applications. With regard to efficiency, dataset distillation (DD) endeavors to reduce training costs by distilling the large dataset into a tiny synthetic dataset. However, existing methods merely concentrate on in-distribution (InD) classification in a closed-world setting, disregarding out-of-distribution (OOD) samples. On the other hand, OOD detection aims to enhance models' trustworthiness, which is always inefficiently achieved in full-data settings. For the first time, we simultaneously consider both issues and propose a novel paradigm called Trustworthy Dataset Distillation (TrustDD). By distilling both InD samples and outliers, the condensed datasets are capable to train models competent in both InD classification and OOD detection. To alleviate the requirement of real outlier data and make OOD detection more practical, we further propose to corrupt InD samples to generate pseudo-outliers and introduce Pseudo-Outlier Exposure (POE). Comprehensive experiments on various settings demonstrate the effectiveness of TrustDD, and the proposed POE surpasses state-of-the-art method Outlier Exposure (OE). Compared with the preceding DD, TrustDD is more trustworthy and applicable to real open-world scenarios. Our code will be publicly available.
    摘要 “效率和可靠性是深度学习应用实际场景中的两大永恒追求。在这个领域,数据集缩写(DD)尝试通过缩写大数据集为一个小型的合成数据集来减少训练成本。然而,现有方法仅关注在关闭世界设定下的内部分布(InD)类别,忽略了外部分布(OOD)样本。然而,OOD检测的目的是增强模型的可靠性,这通常在全数据设定下是不fficient的。为了解决这些问题,我们同时考虑了这两个问题,并提出了一种新的思路called Trustworthy Dataset Distillation(TrustDD)。通过缩写InD样本和异常样本,缩写后的数据集可以训练能够在InD类别和OOD检测中具备竞争力。为了避免实际异常数据的需求和使OOD检测更实用,我们进一步提出了 Pseudo-Outlier Exposure(POE)。我们对不同的设定进行了广泛的实验,并证明了 TrustDD 的有效性,而我们提出的 POE 超过了现有的 Outlier Exposure(OE)方法。相比之下,TrustDD 更加可靠和适用于真实的开放世界场景。我们的代码将在公共可用。”

MVA2023 Small Object Detection Challenge for Spotting Birds: Dataset, Methods, and Results

  • paper_url: http://arxiv.org/abs/2307.09143
  • repo_url: https://github.com/iim-ttij/mva2023smallobjectdetection4spottingbirds
  • paper_authors: Yuki Kondo, Norimichi Ukita, Takayuki Yamaguchi, Hao-Yu Hou, Mu-Yi Shen, Chia-Chi Hsu, En-Ming Huang, Yu-Chen Huang, Yu-Cheng Xia, Chien-Yao Wang, Chun-Yi Lee, Da Huo, Marc A. Kastner, Tingwei Liu, Yasutomo Kawanishi, Takatsugu Hirayama, Takahiro Komamizu, Ichiro Ide, Yosuke Shinya, Xinyao Liu, Guang Liang, Syusuke Yasui
  • for: 本研究旨在提出一个新的小物体检测数据集,以便进行远程小物体检测的实际应用。
  • methods: 本文提出了一种新的小物体检测方法,并在223名参与者的挑战中评测了其效果。
  • results: 研究发现,使用这种新方法可以在远程小物体检测中获得优秀的效果,并且提供了一个大规模的小物体检测数据集和基线代码以便进一步研究。
    Abstract Small Object Detection (SOD) is an important machine vision topic because (i) a variety of real-world applications require object detection for distant objects and (ii) SOD is a challenging task due to the noisy, blurred, and less-informative image appearances of small objects. This paper proposes a new SOD dataset consisting of 39,070 images including 137,121 bird instances, which is called the Small Object Detection for Spotting Birds (SOD4SB) dataset. The detail of the challenge with the SOD4SB dataset is introduced in this paper. In total, 223 participants joined this challenge. This paper briefly introduces the award-winning methods. The dataset, the baseline code, and the website for evaluation on the public testset are publicly available.
    摘要 小物体检测(SOD)是机器视觉领域的重要话题,因为(i)许多现实世界应用需要对远距离的物体进行检测,以及(ii)SOD是一项复杂的任务,因为小物体的图像表现具有噪声、模糊和不具有很多信息。这篇论文提出了一个新的SOD数据集,包括39,070张图像和137,121只鸟类实例,称为Small Object Detection for Spotting Birds(SOD4SB)数据集。本文介绍了SOD4SB数据集的挑战。总共有223名参与者参加了这个挑战。本文 briefly introduce了获奖方法。数据集、基线代码和评估网站对公共测试集进行评估是公共可用的。

Characterization of partial wetting by CMAS droplets using multiphase many-body dissipative particle dynamics and data-driven discovery based on PINNs

  • paper_url: http://arxiv.org/abs/2307.09142
  • repo_url: None
  • paper_authors: Elham Kiyani, Mahdi Kooshkbaghi, Khemraj Shukla, Rahul Babu Koneru, Zhen Li, Luis Bravo, Anindya Ghoshal, George Em Karniadakis, Mikko Karttunen
  • for: 本研究探讨了高粘度熔融 CMAS 液滴在不同初始尺寸和平衡接触角下的润湿动力学。
  • methods: 本研究使用多相多体耗散粒子动力学(mDPD)模拟研究 CMAS 液滴的润湿动力学,利用物理信息神经网络(PINN)框架识别描述铺展半径行为的参数,并使用符号回归给出参数间的关系。
  • results: 研究刻画了 CMAS 液滴的铺展半径行为,并使用贝叶斯 PINN(B-PINN)评估和量化相关参数的不确定性。本研究将润湿动力学模拟与机器学习技术相结合,为高温应用提供了新的思路。
    Abstract The molten sand, a mixture of calcia, magnesia, alumina, and silicate, known as CMAS, is characterized by its high viscosity, density, and surface tension. The unique properties of CMAS make it a challenging material to deal with in high-temperature applications, requiring innovative solutions and materials to prevent its buildup and damage to critical equipment. Here, we use multiphase many-body dissipative particle dynamics (mDPD) simulations to study the wetting dynamics of highly viscous molten CMAS droplets. The simulations are performed in three dimensions, with varying initial droplet sizes and equilibrium contact angles. We propose a coarse parametric ordinary differential equation (ODE) that captures the spreading radius behavior of the CMAS droplets. The ODE parameters are then identified based on the Physics-Informed Neural Network (PINN) framework. Subsequently, the closed form dependency of parameter values found by PINN on the initial radii and contact angles are given using symbolic regression. Finally, we employ Bayesian PINNs (B-PINNs) to assess and quantify the uncertainty associated with the discovered parameters. In brief, this study provides insight into spreading dynamics of CMAS droplets by fusing simple parametric ODE modeling and state-of-the-art machine learning techniques.
    摘要 熔融砂(CMAS)是钙、镁、铝和硅酸盐的混合物,具有高粘度、高密度和高表面张力。CMAS 的这些特性使其在高温应用中难以处理,需要创新的解决方案和材料来防止其积聚并损坏关键设备。在这里,我们使用多相多体耗散粒子动力学(mDPD)模拟来研究高粘度熔融 CMAS 液滴的润湿动力学。模拟在三维空间中进行,初始液滴尺寸和平衡接触角均有所变化。我们提出了一个粗粒度的参数化常微分方程(ODE)来刻画 CMAS 液滴铺展半径的行为,随后基于物理信息神经网络(PINN)框架识别 ODE 参数,并使用符号回归给出 PINN 所得参数值与初始半径和接触角之间的封闭形式依赖关系。最后,我们采用贝叶斯 PINN(B-PINN)来评估和量化所发现参数的不确定性。简而言之,本研究通过将简单的参数化 ODE 建模与最先进的机器学习技术相结合,为 CMAS 液滴的铺展动力学提供了深入见解。
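
A minimal PyTorch sketch of the PINN-style parameter identification step described above: a small network approximates the spreading radius R(t) while an unknown rate constant of an assumed relaxation ODE is fitted through the physics residual. The ODE form dR/dt = k (R_eq - R) and the synthetic data are hypothetical placeholders for the paper's coarse parametric ODE and mDPD trajectories.

```python
import torch
import torch.nn as nn

# Hypothetical relaxation law dR/dt = k * (R_eq - R); the paper's actual ODE form differs.
R_EQ, K_TRUE = 1.0, 2.0
t_data = torch.linspace(0.0, 2.0, 40).unsqueeze(1)
r_data = R_EQ - (R_EQ - 0.1) * torch.exp(-K_TRUE * t_data)        # synthetic "measurements"

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
log_k = torch.zeros(1, requires_grad=True)                         # unknown ODE parameter
opt = torch.optim.Adam(list(net.parameters()) + [log_k], lr=1e-2)

for step in range(3000):
    opt.zero_grad()
    t = t_data.clone().requires_grad_(True)
    r = net(t)
    drdt = torch.autograd.grad(r, t, torch.ones_like(r), create_graph=True)[0]
    residual = drdt - torch.exp(log_k) * (R_EQ - r)                 # physics (ODE) residual
    loss = ((r - r_data) ** 2).mean() + (residual ** 2).mean()      # data loss + physics loss
    loss.backward()
    opt.step()

print("recovered k:", torch.exp(log_k).item())                      # should approach 2.0
```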

Mining of Single-Class by Active Learning for Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.09109
  • repo_url: None
  • paper_authors: Hugues Lambert, Emma Slade
  • for: 本研究的目的是提出一种基于深度强化学习的主动学习策略,以提升针对特定类别的模型训练效果。
  • methods: 本研究提出的 MiSiCAL 方法通过深度强化学习构建主动学习策略,利用数量与精度之间的相关性来构建可训练出高性能模型的数据集。MiSiCAL 不需要多次重新训练目标模型,因此特别适用于大批量场景。
  • results: 结果表明,在 COCO10k 的 171 个类别中,MiSiCAL 在 150 个类别上优于随机策略,而最强的基线方法仅在 101 个类别上优于随机策略。
    Abstract Several Active Learning (AL) policies require retraining a target model several times in order to identify the most informative samples and rarely offer the option to focus on the acquisition of samples from underrepresented classes. Here the Mining of Single-Class by Active Learning (MiSiCAL) paradigm is introduced where an AL policy is constructed through deep reinforcement learning and exploits quantity-accuracy correlations to build datasets on which high-performance models can be trained with regards to specific classes. MiSiCAL is especially helpful in the case of very large batch sizes since it does not require repeated model training sessions as is common in other AL methods. This is thanks to its ability to exploit fixed representations of the candidate data points. We find that MiSiCAL is able to outperform a random policy on 150 out of 171 COCO10k classes, while the strongest baseline only outperforms random on 101 classes.
    摘要 多种主动学习(AL)策略需要多次重新训练目标模型以确定最具信息量的样本,并且很少提供专注于从代表性不足类别中获取样本的选项。在这里,我们介绍了一种名为单类挖掘主动学习(MiSiCAL)的范式:通过深度强化学习构建 AL 策略,并利用数量与精度之间的相关性来构建可针对特定类别训练出高性能模型的数据集。MiSiCAL 在批量非常大时尤其有用,因为它不像其他 AL 方法那样需要反复的模型训练过程;这得益于它能够利用候选数据点的固定表示。我们发现,在 COCO10k 的 171 个类别中,MiSiCAL 在 150 个类别上优于随机策略,而最强的基线方法仅在 101 个类别上优于随机策略。

  • paper_url: http://arxiv.org/abs/2307.09093
  • repo_url: None
  • paper_authors: Saeed Ghoorchian, Setareh Maghsudi
  • for: The paper is written for solving the problem of sequential decision-making under uncertainty with long feedback delays, particularly in non-stationary environments with structural dependencies amongst the reward distributions.
  • methods: The paper proposes a policy that learns the causal relations between the arms using a stationary structural equation model, and utilizes this knowledge to optimize the decision-making while adapting to drifts.
  • results: The paper proves a regret bound for the performance of the proposed algorithm, and evaluates the method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
    Abstract Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This problem becomes significantly challenging in a non-stationary environment with structural dependencies amongst the reward distributions associated with the arms. Therefore, besides adapting to delays and environmental changes, learning the causal relations alleviates the adverse effects of feedback delay on the decision-making process. We formalize the described setting as a non-stationary and delayed combinatorial semi-bandit problem with causally related rewards. We model the causal relations by a directed graph in a stationary structural equation model. The agent maximizes the long-term average payoff, defined as a linear function of the base arms' rewards. We develop a policy that learns the structural dependencies from delayed feedback and utilizes that to optimize the decision-making while adapting to drifts. We prove a regret bound for the performance of the proposed algorithm. Besides, we evaluate our method via numerical analysis using synthetic and real-world datasets to detect the regions that contribute the most to the spread of Covid-19 in Italy.
    摘要 不确定性下的序贯决策往往伴随着较长的反馈延迟。这类延迟会降低学习智能体在长期中识别具有最优整体回报的臂子集的能力。在奖励分布之间存在结构性依赖的非平稳环境中,该问题变得尤为困难。因此,除了适应延迟和环境变化之外,学习因果关系还能减轻反馈延迟对决策过程的不利影响。我们将上述设定形式化为奖励之间存在因果关联的非平稳、延迟组合半臂老虎机问题,并用平稳的结构方程模型中的有向图来刻画因果关系。智能体最大化长期平均收益,该收益定义为基础臂奖励的线性函数。我们设计了一种策略,从延迟反馈中学习结构依赖关系,并利用该知识在适应漂移的同时优化决策。我们证明了所提算法性能的遗憾界。此外,我们使用合成数据和真实数据进行数值分析,以检测对意大利 COVID-19 传播贡献最大的区域。

A Federated learning model for Electric Energy management using Blockchain Technology

  • paper_url: http://arxiv.org/abs/2307.09080
  • repo_url: None
  • paper_authors: Muhammad Shoaib Farooq, Azeen Ahmed Hayat
  • for: The paper aims to address energy shortfall and electricity load shedding in developing countries by improving energy management and increasing the use of renewable energy sources.
  • methods: The paper proposes the use of federated learning and blockchain technology to forecast energy requirements and to ensure transparency, traceability, and security in energy transactions between prosumers and consumers.
  • results: The experimental results show that renewable energy sources produce results that are better than or comparable to other, non-renewable energy resources.
    Abstract Energy shortfall and electricity load shedding are the main problems for developing countries. The main causes are lack of management in the energy sector and the use of non-renewable energy sources. The improved energy management and use of renewable sources can be significant to resolve energy crisis. It is necessary to increase the use of renewable energy sources (RESs) to meet the increasing energy demand due to high prices of fossil-fuel based energy. Federated learning (FL) is the most emerging technique in the field of artificial intelligence. Federated learning helps to generate global model at server side by ensemble locally trained models at remote edges sites while preserving data privacy. The global model used to predict energy demand to satisfy the needs of consumers. In this article, we have proposed Blockchain based safe distributed ledger technology for transaction of data between prosumer and consumer to ensure their transparency, traceability and security. Furthermore, we have also proposed a Federated learning model to forecast the energy requirements of consumer and prosumer. Moreover, Blockchain has been used to store excess energy data from prosumer for better management of energy between prosumer and grid. Lastly, the experiment results revealed that renewable energy sources have produced better and comparable results to other non-renewable energy resources.
    摘要 能源短缺和拉闸限电是发展中国家面临的主要问题,其主要原因是能源部门管理不足以及对非可再生能源的依赖。改进能源管理并提高可再生能源的使用,对缓解能源危机具有重要意义。由于化石燃料能源价格高企,有必要增加可再生能源(RES)的使用以满足日益增长的能源需求。联邦学习(FL)是人工智能领域最新兴的技术之一:它通过聚合远程边缘节点上本地训练的模型,在服务器端生成全局模型,同时保护数据隐私;该全局模型可用于预测能源需求,以满足消费者的需要。在本文中,我们提出了基于区块链的安全分布式账本技术,用于产消者(prosumer)与消费者之间的数据交易,以确保其透明性、可追溯性和安全性。此外,我们还提出了一个联邦学习模型,用于预测消费者和产消者的能源需求,并利用区块链存储产消者的富余能源数据,以便更好地管理产消者与电网之间的能源。最后,实验结果表明,可再生能源取得了优于或可与其他非可再生能源相当的结果。

DiTTO: Diffusion-inspired Temporal Transformer Operator

  • paper_url: http://arxiv.org/abs/2307.09072
  • repo_url: None
  • paper_authors: Oded Ovadia, Eli Turkel, Adar Kahana, George Em Karniadakis
  • for: 用于求解时间依赖的偏微分方程(PDE)。
  • methods: 提出了一种数据驱动的算子学习方法,受潜在扩散模型启发并结合 Transformer 架构,无需任何时间离散化即可在时间上连续求解。
  • results: 在一维 Burgers 方程、二维 Navier-Stokes 方程以及二维和三维声波方程等多维问题上达到了最先进的精度;此外,该方法还能在时间上实现零样本超分辨率。
    Abstract Solving partial differential equations (PDEs) using a data-driven approach has become increasingly common. The recent development of the operator learning paradigm has enabled the solution of a broader range of PDE-related problems. We propose an operator learning method to solve time-dependent PDEs continuously in time without needing any temporal discretization. The proposed approach, named DiTTO, is inspired by latent diffusion models. While diffusion models are usually used in generative artificial intelligence tasks, their time-conditioning mechanism is extremely useful for PDEs. The diffusion-inspired framework is combined with elements from the Transformer architecture to improve its capabilities. We demonstrate the effectiveness of the new approach on a wide variety of PDEs in multiple dimensions, namely the 1-D Burgers' equation, 2-D Navier-Stokes equations, and the acoustic wave equation in 2-D and 3-D. DiTTO achieves state-of-the-art results in terms of accuracy for these problems. We also present a method to improve the performance of DiTTO by using fast sampling concepts from diffusion models. Finally, we show that DiTTO can accurately perform zero-shot super-resolution in time.
    摘要 使用数据驱动方法求解偏微分方程(PDE)已日益普遍。算子学习范式的最新发展使得求解更广泛的 PDE 相关问题成为可能。我们提出一种名为 DiTTO 的算子学习方法,可在不需要任何时间离散化的情况下,在时间上连续求解时间依赖的 PDE。该方法受潜在扩散模型启发;尽管扩散模型通常用于生成式人工智能任务,但其时间条件机制对 PDE 十分有用。我们将这一扩散启发的框架与 Transformer 架构中的元素相结合,以提升其能力。我们在多维 PDE 上进行了广泛的实验,包括一维 Burgers 方程、二维 Navier-Stokes 方程以及二维和三维的声波方程,DiTTO 在这些问题上均达到了最先进的精度。我们还提出了利用扩散模型中的快速采样概念来进一步提升 DiTTO 性能的方法。最后,我们展示了 DiTTO 能够准确地在时间上执行零样本超分辨率。
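
A minimal PyTorch sketch of the general idea of conditioning an operator network on a continuous time value (here via a small time-embedding MLP added to the input tokens of a Transformer encoder), so that solutions can be queried at arbitrary times without temporal discretization; this is an illustrative stand-in, not the DiTTO architecture.

```python
import torch
import torch.nn as nn

class TimeConditionedOperator(nn.Module):
    """Maps an initial condition u0 (sampled on a 1-D grid) and a time t to u(t)."""
    def __init__(self, grid_size=64, dim=64, depth=2):
        super().__init__()
        self.embed = nn.Linear(1, dim)                      # lift point values to tokens
        self.time_mlp = nn.Sequential(nn.Linear(1, dim), nn.GELU(), nn.Linear(dim, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.readout = nn.Linear(dim, 1)

    def forward(self, u0, t):
        # u0: (B, N) initial condition, t: (B, 1) query times
        tokens = self.embed(u0.unsqueeze(-1))               # (B, N, dim)
        tokens = tokens + self.time_mlp(t).unsqueeze(1)     # broadcast time conditioning
        return self.readout(self.encoder(tokens)).squeeze(-1)

if __name__ == "__main__":
    model = TimeConditionedOperator()
    u0, t = torch.randn(8, 64), torch.rand(8, 1)
    print(model(u0, t).shape)  # torch.Size([8, 64])
```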

Evaluate Fine-tuning Strategies for Fetal Head Ultrasound Image Segmentation with U-Net

  • paper_url: http://arxiv.org/abs/2307.09067
  • repo_url: https://github.com/13204942/ft_methods_for_fetal_head_segmentation
  • paper_authors: Fangyijie Wang, Guénolé Silvestre, Kathleen M. Curran
  • for: 本研究旨在提高妊娠期胎头周长(HC)测量的效率,利用迁移学习(Transfer Learning,TL)方法改善医学生物测量的精度。
  • methods: 本研究使用卷积神经网络(CNN)模型,以轻量级的 MobileNet 作为 U-Net 的编码器并进行微调。
  • results: 研究发现,借助迁移学习,只需有限的训练开销即可在胎头超声图像上训练出高精度的 U-Net 分割网络;同时,该微调策略还能显著减小可训练参数规模,并保持可靠的分割性能。
    Abstract Fetal head segmentation is a crucial step in measuring the fetal head circumference (HC) during gestation, an important biometric in obstetrics for monitoring fetal growth. However, manual biometry generation is time-consuming and results in inconsistent accuracy. To address this issue, convolutional neural network (CNN) models have been utilized to improve the efficiency of medical biometry. But training a CNN network from scratch is a challenging task, we proposed a Transfer Learning (TL) method. Our approach involves fine-tuning (FT) a U-Net network with a lightweight MobileNet as the encoder to perform segmentation on a set of fetal head ultrasound (US) images with limited effort. This method addresses the challenges associated with training a CNN network from scratch. It suggests that our proposed FT strategy yields segmentation performance that is comparable when trained with a reduced number of parameters by 85.8%. And our proposed FT strategy outperforms other strategies with smaller trainable parameter sizes below 4.4 million. Thus, we contend that it can serve as a dependable FT approach for reducing the size of models in medical image analysis. Our key findings highlight the importance of the balance between model performance and size in developing Artificial Intelligence (AI) applications by TL methods. Code is available at https://github.com/13204942/FT_Methods_for_Fetal_Head_Segmentation.
    摘要 胎头分割是妊娠期测量胎头周长(HC)的关键步骤,而 HC 是产科中监测胎儿生长的重要生物测量指标。然而,手动测量耗时费力且精度不稳定。为解决这一问题,卷积神经网络(CNN)已被用于提高医学生物测量的效率。但从零开始训练 CNN 网络是一项具有挑战性的任务,因此我们提出了一种迁移学习(TL)方法:以轻量级的 MobileNet 作为编码器,对 U-Net 网络进行微调(FT),从而以较小的代价在一组胎头超声(US)图像上完成分割。结果表明,在可训练参数数量减少 85.8% 的情况下,我们提出的 FT 策略仍能取得相当的分割性能,并且优于其他可训练参数低于 440 万的策略。因此,我们认为它可以作为医学图像分析中缩减模型规模的可靠 FT 方法。我们的关键发现强调了在用 TL 方法开发人工智能(AI)应用时,需要在模型性能与模型大小之间取得平衡。代码见 https://github.com/13204942/FT_Methods_for_Fetal_Head_Segmentation。

Learning Adaptive Neighborhoods for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.09065
  • repo_url: None
  • paper_authors: Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden
  • for: 在图结构数据上实现端到端学习,但许多工作假设图结构是已知的。当输入图带有噪声或无法取得时,一种方法是构建或学习潜在的图结构。然而这些方法通常对整个图固定每个节点的度数,这并不理想。
  • methods: 我们提出了一个 novel end-to-end differentiable graph generator,可以建构 graph 结构,每个 node 可以选择它的邻居和大小。
  • results: 我们将我们的模组 integrate 到 trajectory prediction, point cloud classification 和 node classification pipeline 中,实现了与其他 structure-learning 方法相比的提高精度,在各种数据集和 GCN 背景下。
    Abstract Graph convolutional networks (GCNs) enable end-to-end learning on graph structured data. However, many works assume a given graph structure. When the input graph is noisy or unavailable, one approach is to construct or learn a latent graph structure. These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiable graph generator which builds graph topologies where each node selects both its neighborhood and its size. Our module can be readily integrated into existing pipelines involving graph convolution operations, replacing the predetermined or existing adjacency matrix with one that is learned, and optimized, as part of the general objective. As such it is applicable to any GCN. We integrate our module into trajectory prediction, point cloud classification and node classification pipelines resulting in improved accuracy over other structure-learning methods across a wide range of datasets and GCN backbones.
    摘要 图卷积网络(GCN)可以在图结构数据上实现端到端学习。然而,许多工作假设给定了图结构。当输入图带有噪声或不可用时,一种做法是构建或学习潜在的图结构。这些方法通常对整个图固定节点度数的选择,这并不理想。相反,我们提出了一种新颖的端到端可微图生成器,它构建的图拓扑允许每个节点同时选择自己的邻居及其数量。我们的模块可以方便地集成到任何涉及图卷积操作的现有流程中,用一个作为总体目标一部分被学习和优化的邻接矩阵取代预先给定或已有的邻接矩阵,因此适用于任何 GCN。我们将该模块集成到轨迹预测、点云分类和节点分类流程中,在广泛的数据集和 GCN 骨干网络上取得了优于其他结构学习方法的精度。
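
A simplified, fully differentiable stand-in for the idea described above: each node learns its own threshold and temperature over pairwise feature similarities, yielding a soft adjacency whose per-node effective neighborhood size is optimized end-to-end together with the downstream GCN. This sketch is not the paper's generator, only an illustration of learning node-dependent neighborhoods.

```python
import torch
import torch.nn as nn

class SoftNeighborhoodGenerator(nn.Module):
    """Per-node learnable threshold/temperature over feature similarities -> soft adjacency."""
    def __init__(self, num_nodes):
        super().__init__()
        self.threshold = nn.Parameter(torch.zeros(num_nodes, 1))
        self.log_temp = nn.Parameter(torch.zeros(num_nodes, 1))

    def forward(self, x):
        # x: (N, F) node features
        x_norm = nn.functional.normalize(x, dim=-1)
        sim = x_norm @ x_norm.t()                                   # cosine similarities
        temp = torch.exp(self.log_temp)
        adj = torch.sigmoid((sim - self.threshold) / temp)          # soft, differentiable edges
        adj = adj * (1 - torch.eye(x.size(0), device=x.device))     # no self-loops
        return adj                                                  # feed into any GCN layer

if __name__ == "__main__":
    gen = SoftNeighborhoodGenerator(num_nodes=10)
    x = torch.randn(10, 16)
    adj = gen(x)
    print(adj.shape, adj.sum(dim=1))  # learned (soft) neighborhood sizes per node
```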

Extreme heatwave sampling and prediction with analog Markov chain and comparisons with deep learning

  • paper_url: http://arxiv.org/abs/2307.09060
  • repo_url: None
  • paper_authors: George Miloshevich, Dario Lucente, Pascal Yiou, Freddy Bouchet
  • for: The paper aims to develop a data-driven emulator, called a stochastic weather generator (SWG), to estimate the probabilities of prolonged heatwaves in France and Scandinavia.
  • methods: The SWG emulator uses the method of analogs of circulation, which is combined with temperature and soil moisture as predictor fields. The emulator is trained on an intermediate complexity climate model run, the performance is evaluated using a proper score appropriate for rare events, and dimensionality reduction techniques are applied to accelerate the computation of analogs.
  • results: The probabilistic prediction achieved with SWG is compared with the one achieved with a Convolutional Neural Network (CNN). The SWG emulator trained on 80 years of data is capable of estimating extreme return times of the order of thousands of years for heatwaves longer than several days more precisely than the fit based on the generalised extreme value distribution. The quality of its synthetic extreme teleconnection patterns is studied, and two examples of such patterns for heatwaves in France and Scandinavia are provided.
    Abstract We present a data-driven emulator, stochastic weather generator (SWG), suitable for estimating probabilities of prolonged heatwaves in France and Scandinavia. This emulator is based on the method of analogs of circulation to which we add temperature and soil moisture as predictor fields. We train the emulator on an intermediate complexity climate model run and show that it is capable of predicting conditional probabilities (forecasting) of heatwaves out of sample. Special attention is payed that this prediction is evaluated using proper score appropriate for rare events. To accelerate the computation of analogs dimensionality reduction techniques are applied and the performance is evaluated. The probabilistic prediction achieved with SWG is compared with the one achieved with Convolutional Neural Network (CNN). With the availability of hundreds of years of training data CNNs perform better at the task of probabilistic prediction. In addition, we show that the SWG emulator trained on 80 years of data is capable of estimating extreme return times of order of thousands of years for heatwaves longer than several days more precisely than the fit based on generalised extreme value distribution. Finally, the quality of its synthetic extreme teleconnection patterns obtained with stochastic weather generator is studied. We showcase two examples of such synthetic teleconnection patterns for heatwaves in France and Scandinavia that compare favorably to the very long climate model control run.
    摘要 我们提出了一个数据驱动的模拟器——随机天气生成器(SWG),用于估计法国和斯堪的纳维亚地区持续热浪的概率。该模拟器基于环流类比(analogs of circulation)方法,并加入温度和土壤湿度作为预测场。我们在一个中等复杂度气候模式的模拟数据上训练该模拟器,并表明它能够对样本外的热浪进行条件概率预测。我们特别注意使用适合罕见事件的恰当评分来评估这种预测。为加速类比的计算,我们应用了降维技术并评估了其性能。我们将 SWG 的概率预测与卷积神经网络(CNN)的预测进行比较:在拥有数百年训练数据的情况下,CNN 在概率预测任务上表现更好。此外,我们还表明,仅用 80 年数据训练的 SWG 模拟器,对持续数天以上的热浪,能够比基于广义极值分布的拟合更精确地估计量级达数千年的极端重现期。最后,我们研究了随机天气生成器生成的合成极端遥相关模式的质量,并展示了法国和斯堪的纳维亚热浪的两个此类合成遥相关模式示例,它们与非常长的气候模式控制运行相比表现良好。
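
A minimal NumPy sketch of the analog-Markov-chain idea behind a stochastic weather generator: from the current state, find its k nearest analogs in a library of past states and jump to the successor of a randomly chosen analog. The toy autoregressive "climate" and the crude extreme-event counter are placeholders for the circulation/temperature/soil-moisture fields and the proper rare-event scores used in the paper.

```python
import numpy as np

def build_library(n=5000, seed=0):
    """Toy 2-D 'climate' trajectory standing in for circulation/temperature fields."""
    rng = np.random.default_rng(seed)
    x = np.zeros((n, 2))
    for i in range(1, n):
        x[i] = 0.95 * x[i - 1] + 0.3 * rng.normal(size=2)
    return x

def analog_step(state, library, k=10, rng=None):
    """Jump to the successor of a randomly chosen one of the k nearest analogs."""
    rng = rng or np.random.default_rng()
    dists = np.linalg.norm(library[:-1] - state, axis=1)   # exclude last (no successor)
    analog_idx = np.argpartition(dists, k)[:k]
    return library[rng.choice(analog_idx) + 1]

if __name__ == "__main__":
    lib = build_library()
    rng = np.random.default_rng(1)
    state, hot_days = lib[0], 0
    for _ in range(10000):                                  # long synthetic trajectory
        state = analog_step(state, lib, k=10, rng=rng)
        hot_days += state[0] > 2.0                          # crude 'extreme' event counter
    print("estimated extreme-state frequency:", hot_days / 10000)
```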

Deep learning for unsupervised domain adaptation in medical imaging: Recent advancements and future perspectives

  • paper_url: http://arxiv.org/abs/2308.01265
  • repo_url: None
  • paper_authors: Suruchi Kumari, Pravendra Singh
  • for: 本文主要探讨了医学成像领域中最新的深度学习领域适应(Unsupervised Domain Adaptation,UDA)技术,以及它们在各种医学成像任务中的应用。
  • methods: 本文分析了医学成像领域中最新的UDA方法,包括特征对应、图像翻译、自我超vision和分解表示方法等多种方法。
  • results: 本文对各种UDA方法进行了技术分析和评估,并将其分为六个类别,包括图像分类、生物marks检测、肿瘤识别、脑成像分析、肠胃成像分析等多种任务。
    Abstract Deep learning has demonstrated remarkable performance across various tasks in medical imaging. However, these approaches primarily focus on supervised learning, assuming that the training and testing data are drawn from the same distribution. Unfortunately, this assumption may not always hold true in practice. To address these issues, unsupervised domain adaptation (UDA) techniques have been developed to transfer knowledge from a labeled domain to a related but unlabeled domain. In recent years, significant advancements have been made in UDA, resulting in a wide range of methodologies, including feature alignment, image translation, self-supervision, and disentangled representation methods, among others. In this paper, we provide a comprehensive literature review of recent deep UDA approaches in medical imaging from a technical perspective. Specifically, we categorize current UDA research in medical imaging into six groups and further divide them into finer subcategories based on the different tasks they perform. We also discuss the respective datasets used in the studies to assess the divergence between the different domains. Finally, we discuss emerging areas and provide insights and discussions on future research directions to conclude this survey.
    摘要 深度学习在医疗影像领域的各类任务中已经表现出惊人的性能。然而,这些方法主要依赖有监督学习,假设训练和测试数据来自同一个分布,而这种假设在实践中并不总是成立。为了解决这些问题,无监督领域自适应(UDA)技术被开发出来,以将知识从有标注的域迁移到相关但无标注的域。近年来,UDA 领域取得了大量进展,涌现出特征对齐、图像翻译、自监督和解耦表示等多种方法。在本文中,我们从技术角度对医疗影像领域近期的深度 UDA 方法进行了全面的文献回顾。具体来说,我们将当前医疗影像中的 UDA 研究分为六组,并根据所执行的不同任务进一步细分。我们还讨论了各研究所使用的数据集,以评估不同域之间的差异。最后,我们讨论了新兴方向,并就未来研究方向给出见解与讨论,以此结束本综述。

Globally solving the Gromov-Wasserstein problem for point clouds in low dimensional Euclidean spaces

  • paper_url: http://arxiv.org/abs/2307.09057
  • repo_url: None
  • paper_authors: Martin Ryner, Jan Kronqvist, Johan Karlsson
  • for: Computes the Gromov-Wasserstein problem between two sets of points in low dimensional spaces, to quantify the similarity between two formations or shapes.
  • methods: Reformulates the Quadratic Assignment Problem (QAP) as an optimization problem with a low-dimensional domain, leveraging the fact that the problem can be expressed as a concave quadratic optimization problem with low rank.
  • results: Scales well with the number of points and can be used to find the global solution for large-scale problems with thousands of points.
    Abstract This paper presents a framework for computing the Gromov-Wasserstein problem between two sets of points in low dimensional spaces, where the discrepancy is the squared Euclidean norm. The Gromov-Wasserstein problem is a generalization of the optimal transport problem that finds the assignment between two sets preserving pairwise distances as much as possible. This can be used to quantify the similarity between two formations or shapes, a common problem in AI and machine learning. The problem can be formulated as a Quadratic Assignment Problem (QAP), which is in general computationally intractable even for small problems. Our framework addresses this challenge by reformulating the QAP as an optimization problem with a low-dimensional domain, leveraging the fact that the problem can be expressed as a concave quadratic optimization problem with low rank. The method scales well with the number of points, and it can be used to find the global solution for large-scale problems with thousands of points. We compare the computational complexity of our approach with state-of-the-art methods on synthetic problems and apply it to a near-symmetrical problem which is of particular interest in computational biology.
    摘要 本文提出了一个计算低维空间中两组点之间 Gromov-Wasserstein 问题的框架,其中差异度量为欧氏距离的平方。Gromov-Wasserstein 问题是最优传输问题的推广,旨在寻找尽可能保持两组点之间成对距离的对应关系,可用于量化两个构型或形状之间的相似性,这是 AI 与机器学习中的常见问题。该问题可表述为二次指派问题(QAP),即使对于小规模问题通常也是计算上不可行的。我们的框架通过将 QAP 重新表述为定义在低维域上的优化问题来应对这一挑战,其依据是该问题可表示为一个低秩的凹二次优化问题。该方法随点数的扩展性良好,可用于求解包含数千个点的大规模问题的全局解。我们在合成问题上与最先进方法比较了计算复杂度,并将其应用于计算生物学中特别令人感兴趣的近对称问题。
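
To make the objective concrete, the NumPy sketch below evaluates the QAP-style Gromov-Wasserstein cost of an assignment between two small point clouds (squared Euclidean pairwise distances, squared discrepancy) and brute-forces the best permutation for a tiny example; the paper's contribution is finding the global optimum efficiently, which this sketch does not attempt.

```python
import numpy as np
from itertools import permutations

def pairwise_sq_dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(-1)

def gw_cost(perm, C1, C2):
    """QAP-style Gromov-Wasserstein cost of a permutation assignment."""
    C2p = C2[np.ix_(perm, perm)]
    return ((C1 - C2p) ** 2).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 2))
    # Y is a rotated + shuffled copy of X, so a zero-cost assignment exists.
    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    shuffle = rng.permutation(6)
    Y = (X @ R.T)[shuffle]
    C1, C2 = pairwise_sq_dists(X), pairwise_sq_dists(Y)
    best = min(permutations(range(6)), key=lambda p: gw_cost(np.array(p), C1, C2))
    print("best cost:", gw_cost(np.array(best), C1, C2))   # ~0 for the matching permutation
```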

Outlier-Robust Tensor Low-Rank Representation for Data Clustering

  • paper_url: http://arxiv.org/abs/2307.09055
  • repo_url: None
  • paper_authors: Tong Wu
  • for: 提取受损张量数据中的异常点和分 clustering
  • methods: 基于张量特征值分解(t-SVD)的异常点检测和张量数据分 clustering
  • results: 能够精确地恢复受损张量数据的行空间并检测异常点,且可以处理数据部分缺失的情况。
    Abstract Low-rank tensor analysis has received widespread attention with many practical applications. However, the tensor data are often contaminated by outliers or sample-specific corruptions. How to recover the tensor data that are corrupted by outliers and perform data clustering remains a challenging problem. This paper develops an outlier-robust tensor low-rank representation (OR-TLRR) method for simultaneous outlier detection and tensor data clustering based on the tensor singular value decomposition (t-SVD) algebraic framework. It is motivated by the recently proposed tensor-tensor product induced by invertible linear transforms that satisfy certain conditions. For tensor observations with arbitrary outlier corruptions, OR-TLRR has provable performance guarantee for exactly recovering the row space of clean data and detecting outliers under mild conditions. Moreover, an extension of OR-TLRR is also proposed to handle the case when parts of the data are missing. Finally, extensive experimental results on both synthetic and real data demonstrate the effectiveness of the proposed algorithms.
    摘要 低级张量分析已经广泛受到关注,有很多实际应用。然而,张量数据经常受到异常值或样本特定的损害。如何修复受损的张量数据,并对其进行分类仍然是一个困难的问题。这篇论文开发了一种对异常值敏感的张量低级表示(OR-TLRR)方法,用于同时检测异常值和张量数据的分类。它是基于张量单值分解(t-SVD)的代数框架的。对于受到 произвольными异常损害的张量观察数据,OR-TLRR有可证明的性能保证,可以准确地恢复干净数据的列空间和检测异常值,只要异常值的干扰程度不太大。此外,论文还提出了处理缺失数据的扩展方法。最后,论文的实验结果表明了提议的算法的效果。
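
A minimal NumPy sketch of the t-product that underlies the t-SVD algebraic framework mentioned above, using the standard FFT along the third mode as the invertible linear transform, together with a naive circular-convolution reference implementation as a correctness check.

```python
import numpy as np

def t_product(A, B):
    """Tensor-tensor product of A (n1 x n2 x n3) and B (n2 x n4 x n3):
    FFT along the third mode, facewise matrix products, inverse FFT."""
    assert A.shape[1] == B.shape[0] and A.shape[2] == B.shape[2]
    A_hat = np.fft.fft(A, axis=2)
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.einsum("ikt,kjt->ijt", A_hat, B_hat)
    return np.real(np.fft.ifft(C_hat, axis=2))

def t_product_naive(A, B):
    """Reference definition: circular convolution of tube fibers."""
    n1, n2, n3 = A.shape
    n4 = B.shape[1]
    C = np.zeros((n1, n4, n3))
    for i in range(n1):
        for j in range(n4):
            for k in range(n2):
                for t in range(n3):
                    for s in range(n3):
                        C[i, j, t] += A[i, k, s] * B[k, j, (t - s) % n3]
    return C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, B = rng.normal(size=(4, 3, 5)), rng.normal(size=(3, 2, 5))
    print(np.allclose(t_product(A, B), t_product_naive(A, B)))  # True
```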

qecGPT: decoding Quantum Error-correcting Codes with Generative Pre-trained Transformers

  • paper_url: http://arxiv.org/abs/2307.09025
  • repo_url: https://github.com/chy-i/qecgpt
  • paper_authors: Hanyan Cao, Feng Pan, Yijia Wang, Pan Zhang
  • for: 提出了一个使用生成模型来解码量子纠错码的通用框架。
  • methods: 借助自然语言处理技术,特别是 Transformer,以无监督方式学习逻辑算符与症状(syndrome)的联合概率。
  • results: 可以高效地计算给定症状下逻辑算符的似然,并直接生成最有可能的逻辑算符,计算复杂度为 $\mathcal O(2k)$,显著优于需要 $\mathcal O(4^k)$ 计算的传统最大似然译码算法。
    Abstract We propose a general framework for decoding quantum error-correcting codes with generative modeling. The model utilizes autoregressive neural networks, specifically Transformers, to learn the joint probability of logical operators and syndromes. This training is in an unsupervised way, without the need for labeled training data, and is thus referred to as pre-training. After the pre-training, the model can efficiently compute the likelihood of logical operators for any given syndrome, using maximum likelihood decoding. It can directly generate the most-likely logical operators with computational complexity $\mathcal O(2k)$ in the number of logical qubits $k$, which is significantly better than the conventional maximum likelihood decoding algorithms that require $\mathcal O(4^k)$ computation. Based on the pre-trained model, we further propose refinement to achieve more accurately the likelihood of logical operators for a given syndrome by directly sampling the stabilizer operators. We perform numerical experiments on stabilizer codes with small code distances, using both depolarizing error models and error models with correlated noise. The results show that our approach provides significantly better decoding accuracy than the minimum weight perfect matching and belief-propagation-based algorithms. Our framework is general and can be applied to any error model and quantum codes with different topologies such as surface codes and quantum LDPC codes. Furthermore, it leverages the parallelization capabilities of GPUs, enabling simultaneous decoding of a large number of syndromes. Our approach sheds light on the efficient and accurate decoding of quantum error-correcting codes using generative artificial intelligence and modern computational power.
    摘要 我们提出了一个使用生成模型解码量子纠错码的通用框架。该模型利用自回归神经网络(具体为 Transformer)来学习逻辑算符与症状的联合概率。这一训练以无监督方式进行,不需要带标签的训练数据,因此称为预训练。预训练完成后,模型可以利用最大似然译码,高效地计算任意给定症状下逻辑算符的似然,并能以关于逻辑量子比特数 $k$ 为 $\mathcal O(2k)$ 的计算复杂度直接生成最有可能的逻辑算符,显著优于需要 $\mathcal O(4^k)$ 计算的传统最大似然译码算法。基于预训练模型,我们进一步提出通过直接对稳定子算符采样进行细化,以更精确地得到给定症状下逻辑算符的似然。我们在码距较小的稳定子码上进行了数值实验,使用了去极化误差模型以及带相关噪声的误差模型。结果表明,与最小权重完美匹配和基于置信传播的算法相比,我们的方法具有显著更高的译码准确率。该框架具有通用性,可应用于任意误差模型以及表面码、量子 LDPC 码等不同拓扑的量子码,并能利用 GPU 的并行能力同时对大量症状进行译码。我们的方法为利用生成式人工智能和现代算力对量子纠错码进行高效、准确的译码提供了启示。

U-shaped Transformer: Retain High Frequency Context in Time Series Analysis

  • paper_url: http://arxiv.org/abs/2307.09019
  • repo_url: None
  • paper_authors: Qingkui Chen, Yiqin Zhang
  • for: 本研究旨在增进时间序列预测领域中的 neural network 性能,通过综合利用 transformer 和 MLP 两种网络结构。
  • methods: 本研究采用了 skip-layer 连接和 patch merge 和 split 操作,以提高 transformer 的低频特征表示能力,并使用更大的数据集来充分利用 transformer 背景。
  • results: 实验结果表明,模型在多个数据集上表现出了高水平的性能,而且比 traditional transformer 更加高效。
    Abstract Time series prediction plays a crucial role in various industrial fields. In recent years, neural networks with a transformer backbone have achieved remarkable success in many domains, including computer vision and NLP. In time series analysis domain, some studies have suggested that even the simplest MLP networks outperform advanced transformer-based networks on time series forecast tasks. However, we believe these findings indicate there to be low-rank properties in time series sequences. In this paper, we consider the low-pass characteristics of transformers and try to incorporate the advantages of MLP. We adopt skip-layer connections inspired by Unet into traditional transformer backbone, thus preserving high-frequency context from input to output, namely U-shaped Transformer. We introduce patch merge and split operation to extract features with different scales and use larger datasets to fully make use of the transformer backbone. Our experiments demonstrate that the model performs at an advanced level across multiple datasets with relatively low cost.
    摘要 时间序列预测在各个产业领域发挥着重要作用。近年来,以 transformer 为骨干的神经网络在计算机视觉和自然语言处理等多个领域取得了很大成功。然而,在时间序列分析领域,一些研究表明,即使是最简单的 MLP 网络,在时间序列预测任务上也能超越先进的基于 transformer 的网络。我们认为这些发现反映了时间序列中存在低秩特性。在这篇论文中,我们考虑了 transformer 的低通特性,并尝试融合 MLP 的优点。我们将受 Unet 启发的跨层(skip-layer)连接引入传统的 transformer 骨干,从而保留从输入到输出的高频上下文,即 U-shaped Transformer。我们还引入 patch 合并与拆分操作,以提取不同尺度的特征,并使用更大的数据集来充分发挥 transformer 骨干的作用。实验表明,该模型在多个数据集上以相对较低的成本取得了先进水平的性能。
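
A minimal PyTorch sketch of the U-shaped structure described above for a univariate series: patch tokens pass through a fine-scale Transformer stage, are merged into coarser tokens, processed again, split back, and combined with the fine-scale features through a Unet-style skip connection. The layer sizes and the single merge level are illustrative choices, not the authors' configuration.

```python
import torch
import torch.nn as nn

class UShapedTSTransformer(nn.Module):
    """Tiny U-shaped encoder/decoder over patch tokens with a skip connection."""
    def __init__(self, patch_len=8, dim=64):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, dim)
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.enc_fine = nn.TransformerEncoder(enc_layer, num_layers=1)    # fine scale
        self.merge = nn.Linear(2 * dim, dim)                              # patch merge (downsample)
        self.enc_coarse = nn.TransformerEncoder(enc_layer, num_layers=1)  # coarse scale
        self.split = nn.Linear(dim, 2 * dim)                              # patch split (upsample)
        self.head = nn.Linear(dim, patch_len)

    def forward(self, x):
        # x: (B, L) with L divisible by 2 * patch_len
        B, L = x.shape
        tokens = self.embed(x.reshape(B, L // self.patch_len, self.patch_len))
        fine = self.enc_fine(tokens)                                      # keep high-frequency context
        coarse = self.merge(fine.reshape(B, fine.size(1) // 2, -1))       # merge pairs of patches
        coarse = self.enc_coarse(coarse)
        up = self.split(coarse).reshape(B, fine.size(1), -1)              # split back to fine scale
        out = self.head(up + fine)                                        # Unet-style skip connection
        return out.reshape(B, L)

if __name__ == "__main__":
    model = UShapedTSTransformer()
    series = torch.randn(4, 128)
    print(model(series).shape)  # torch.Size([4, 128])
```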

Multimodal LLMs for health grounded in individual-specific data

  • paper_url: http://arxiv.org/abs/2307.09018
  • repo_url: None
  • paper_authors: Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg Corrado, Andrew Carroll, Cory Y. McLean, Nicholas A. Furlotte
  • for: 这篇研究的目的是为了创建能够处理多种资料模式的大语言模型(LLMs),以解决各种领域中的问题,包括健康领域。
  • methods: 这篇研究使用了一个名为HeLM(Health Large Language Model for Multimodal Understanding)的框架,它可以将高维度的医疗资料与LLMs集成,以估计个人疾病风险。HeLM使用了一个Encoder来转换资料模式,将复杂的资料模式转换为LLM的token embedding空间,并将简单的资料模式转换为文本。
  • results: 在 UK Biobank 数据上,HeLM 能够有效利用人口统计与临床特征以及高维时间序列数据来估计疾病风险。例如,结合表格数据与肺活量图(spirogram)两种模态时,HeLM 的哮喘预测 AUROC 达到 0.75,高于仅使用表格数据时的 0.49。总体而言,在所选的八个二分类性状上,HeLM 的表现均优于或持平于经典机器学习方法。此外,研究还考察了该模型对分布外性状的泛化能力,及其支撑围绕个人健康与保健的对话的能力。
    Abstract Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space and for simple modalities like tabular data by serializing the data into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.
    摘要 基于大语言模型(LLM)的研究表明,LLM可以在各种领域解决问题,包括健康领域。为了在个人化医疗方面有效地解决问题,LLM需要能够处理个人健康状况相关的多种数据类型。在这篇论文中,我们开发了一个框架(HeLM:健康大语言模型 для多模态理解),帮助LLM使用个人化数据来估计疾病风险。HeLM将复杂的数据类型编码为将其映射到LLM的符号空间中,而简单的数据类型则通过将数据序列化为文本来实现。使用UK Biobank数据,我们显示了HeLM可以有效地使用人口和临床特征以及高维时序数据来估计疾病风险。例如,当 combining 表格和气流数据模式时,HeLM的 AUC 为 0.75,而只使用表格数据时的 AUC 为 0.49。总的来说,我们发现 HeLM 在选择的八个二分类特征上表现出色,并且与传统机器学习方法相当或超过其表现。此外,我们还 investigate HeLM 的下游应用,包括其对非标型特征的普适性和在个人医疗和健康谈话中的应用能力。
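
A minimal PyTorch sketch of the multimodal interface described above: tabular features are serialized into a text prompt, while a high-dimensional modality (a spirogram-like vector here) is mapped by a small learned encoder into a few "soft tokens" in the token-embedding space and prepended to the text embeddings. The tokenizer, embedding table, and field names are placeholders, not the components used for HeLM.

```python
import torch
import torch.nn as nn

class SpirogramEncoder(nn.Module):
    """Maps a high-dimensional time series into a few soft tokens in the LLM embedding space."""
    def __init__(self, input_len=1000, n_tokens=4, embed_dim=256):
        super().__init__()
        self.n_tokens, self.embed_dim = n_tokens, embed_dim
        self.proj = nn.Sequential(nn.Linear(input_len, 512), nn.GELU(),
                                  nn.Linear(512, n_tokens * embed_dim))

    def forward(self, x):                      # x: (B, input_len)
        return self.proj(x).view(-1, self.n_tokens, self.embed_dim)

def serialize_tabular(row):
    """Simple modalities become text, following the strategy described in the abstract."""
    return "age: {age}, sex: {sex}, BMI: {bmi:.1f}. Estimate asthma risk.".format(**row)

if __name__ == "__main__":
    vocab_size, embed_dim = 32000, 256
    token_embedding = nn.Embedding(vocab_size, embed_dim)        # stand-in for the LLM embeddings
    encoder = SpirogramEncoder(embed_dim=embed_dim)

    prompt = serialize_tabular({"age": 54, "sex": "female", "bmi": 27.3})
    text_ids = torch.randint(0, vocab_size, (1, 24))             # placeholder tokenization of `prompt`
    text_embeds = token_embedding(text_ids)                      # (1, 24, 256)
    spiro_embeds = encoder(torch.randn(1, 1000))                 # (1, 4, 256) soft tokens
    llm_input = torch.cat([spiro_embeds, text_embeds], dim=1)    # fed to the LLM via inputs_embeds
    print(prompt)
    print(llm_input.shape)                                       # torch.Size([1, 28, 256])
```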

PLiNIO: A User-Friendly Library of Gradient-based Methods for Complexity-aware DNN Optimization

  • paper_url: http://arxiv.org/abs/2307.09488
  • repo_url: None
  • paper_authors: Daniele Jahier Pagliari, Matteo Risso, Beatrice Alessandra Motetti, Alessio Burrello
  • for: 这篇论文主要是为了提供一个开源的深度神经网络设计自动化库(PLiNIO),来搭配各种渐渐变小的优化技术,以提高深度神经网络在紧缩边缘设备上的效能。
  • methods: 这篇论文使用了许多现代的深度神经网络设计自动化技术,包括预测精度估计、优化搜索、阶层优化、卷积优化等,并将这些技术集成到一个开源库中,提供了一个易用的用户界面。
  • results: 根据实验结果,PLiNIO可以实现优化深度神经网络的体积和精度,并且可以实现大约94.34%的内存减少,却只有<1%的精度损失比基eline架构。
    Abstract Accurate yet efficient Deep Neural Networks (DNNs) are in high demand, especially for applications that require their execution on constrained edge devices. Finding such DNNs in a reasonable time for new applications requires automated optimization pipelines since the huge space of hyper-parameter combinations is impossible to explore extensively by hand. In this work, we propose PLiNIO, an open-source library implementing a comprehensive set of state-of-the-art DNN design automation techniques, all based on lightweight gradient-based optimization, under a unified and user-friendly interface. With experiments on several edge-relevant tasks, we show that combining the various optimizations available in PLiNIO leads to rich sets of solutions that Pareto-dominate the considered baselines in terms of accuracy vs model size. Noteworthy, PLiNIO achieves up to 94.34% memory reduction for a <1% accuracy drop compared to a baseline architecture.
    摘要 高效减少内存的深度神经网络(DNN)在边缘设备上的应用越来越受欢迎,特别是在有限的边缘设备上运行。手动搜索大量的超参数组合是不可能的,因此需要自动优化管道。在这种情况下,我们提出PLiNIO,一个开源库,实现了现代DNN设计自动化技术的总集,所有基于轻量级的梯度基于优化,通过统一和易用的界面进行实现。经过一些边缘相关任务的实验,我们发现,PLiNIO中的多种优化技术的组合可以生成高精度和轻量级的解决方案,与考虑的基准架构相比,PLiNIO可以实现94.34%的内存减少,仅带来<1%的准确率下降。

How is ChatGPT’s behavior changing over time?

  • paper_url: http://arxiv.org/abs/2307.09009
  • repo_url: https://github.com/lchen001/llmdrift
  • paper_authors: Lingjiao Chen, Matei Zaharia, James Zou
  • for: 评估 GPT-3.5 和 GPT-4 两个大语言模型在不同时间点上的变化。
  • methods: 使用多种多样化任务评估 GPT-3.5 和 GPT-4 在不同时间点上的表现。
  • results: 发现 GPT-4 和 GPT-3.5 在不同时间点上的表现和行为可能会有很大的变化,如 prime vs. composite numbers 识别 task 中 GPT-4 (3月2023) 的表现比 GPT-4 (6月2023) 更好,但是 GPT-3.5 在6月的表现更好于3月。
    Abstract GPT-3.5 and GPT-4 are the two most widely used large language model (LLM) services. However, when and how these models are updated over time is opaque. Here, we evaluate the March 2023 and June 2023 versions of GPT-3.5 and GPT-4 on several diverse tasks: 1) math problems, 2) sensitive/dangerous questions, 3) opinion surveys, 4) multi-hop knowledge-intensive questions, 5) generating code, 6) US Medical License tests, and 7) visual reasoning. We find that the performance and behavior of both GPT-3.5 and GPT-4 can vary greatly over time. For example, GPT-4 (March 2023) was reasonable at identifying prime vs. composite numbers (84% accuracy) but GPT-4 (June 2023) was poor on these same questions (51% accuracy). This is partly explained by a drop in GPT-4's amenity to follow chain-of-thought prompting. Interestingly, GPT-3.5 was much better in June than in March in this task. GPT-4 became less willing to answer sensitive questions and opinion survey questions in June than in March. GPT-4 performed better at multi-hop questions in June than in March, while GPT-3.5's performance dropped on this task. Both GPT-4 and GPT-3.5 had more formatting mistakes in code generation in June than in March. Overall, our findings show that the behavior of the "same" LLM service can change substantially in a relatively short amount of time, highlighting the need for continuous monitoring of LLMs.
    摘要 GPT-3.5 和 GPT-4 是目前使用最广泛的两个大语言模型(LLM)服务,但它们何时以及如何被更新并不透明。我们在多种不同的任务上评估了 2023 年 3 月版和 6 月版的 GPT-3.5 与 GPT-4:1)数学问题;2)敏感/危险问题;3)意见调查;4)多跳知识密集型问题;5)代码生成;6)美国医师执照考试;7)视觉推理。我们发现,GPT-3.5 和 GPT-4 的表现和行为都会随时间发生很大变化。例如,GPT-4(2023 年 3 月)在区分质数与合数上表现尚可(准确率 84%),而 GPT-4(2023 年 6 月)在同样的问题上表现很差(准确率 51%),这部分归因于其遵循思维链提示的意愿下降;有趣的是,GPT-3.5 在这项任务上 6 月的表现明显好于 3 月。与 3 月相比,GPT-4 在 6 月更不愿意回答敏感问题和意见调查问题;GPT-4 在 6 月的多跳问题上表现优于 3 月,而 GPT-3.5 在该任务上的表现则有所下降;两者在 6 月的代码生成中都比 3 月出现更多的格式错误。总体而言,我们的发现表明,“同一个” LLM 服务的行为可能在相对较短的时间内发生显著变化,这凸显了对 LLM 进行持续监测的必要性。

OxfordVGG Submission to the EGO4D AV Transcription Challenge

  • paper_url: http://arxiv.org/abs/2307.09006
  • repo_url: https://github.com/m-bain/whisperx
  • paper_authors: Jaesung Huh, Max Bain, Andrew Zisserman
  • for: 本研究报告提供了2023年EGO4D音频视觉自动语音识别挑战(AV-ASR)中oxfordvgg团队的技术细节。
  • methods: 本研究使用了 WhisperX 系统,对长音频进行高效的语音转写并提供词级时间对齐,同时使用了两个公开可用的文本标准化器。
  • results: 本研究在挑战测试集上取得 56.0% 的词错误率(WER),在排行榜上排名第一。所有基线代码和模型可以在 https://github.com/m-bain/whisperX 上获取。
    Abstract This report presents the technical details of our submission on the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team. We present WhisperX, a system for efficient speech transcription of long-form audio with word-level time alignment, along with two text normalisers which are publicly available. Our final submission obtained 56.0% of the Word Error Rate (WER) on the challenge test set, ranked 1st on the leaderboard. All baseline codes and models are available on https://github.com/m-bain/whisperX.
    摘要 这份报告介绍了牛津 VGG 团队在 2023 年 EGO4D 音视频(AV)自动语音识别挑战中的提交技术细节。我们提出了 WhisperX——一个对长音频进行高效语音转写并带有词级时间对齐的系统,以及两个公开可用的文本标准化器。我们的最终提交在挑战测试集上取得 56.0% 的词错误率(WER),在排行榜上名列第一。所有基线代码和模型可在 https://github.com/m-bain/whisperX 获取。

Zero-shot Domain-sensitive Speech Recognition with Prompt-conditioning Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.10274
  • repo_url: https://github.com/mtkresearch/clairaudience
  • paper_authors: Feng-Ting Liao, Yung-Chieh Chan, Yi-Chang Chen, Chan-Jan Hsu, Da-shan Shiu
  • for: 这 paper 是为了创建域专注的语音识别模型,利用文本域信息进行 Conditioning 生成。
  • methods: 这 paper 使用了精心调整的预训练、端到端模型(Whisper),通过示例示出学习域特定的示例来学习。
  • results: 这 paper 表明这种能力可以在不同的域和不同的示例上进行泛化,模型在未经见过的数据集上 achieve Word Error Rate (WER) 减少达 33%,并且通过文本Only fine-tuning 来实现域敏感和域适应。
    Abstract In this work, we propose a method to create domain-sensitive speech recognition models that utilize textual domain information by conditioning its generation on a given text prompt. This is accomplished by fine-tuning a pre-trained, end-to-end model (Whisper) to learn from demonstrations with prompt examples. We show that this ability can be generalized to different domains and even various prompt contexts, with our model gaining a Word Error Rate (WER) reduction of up to 33% on unseen datasets from various domains, such as medical conversation, air traffic control communication, and financial meetings. Considering the limited availability of audio-transcript pair data, we further extend our method to text-only fine-tuning to achieve domain sensitivity as well as domain adaptation. We demonstrate that our text-only fine-tuned model can also attend to various prompt contexts, with the model reaching the most WER reduction of 29% on the medical conversation dataset.
    摘要 在这项工作中,我们提出了一种方法,用于创建对领域敏感的语音识别模型:通过在给定文本提示上进行条件生成,利用文本形式的领域信息。我们通过让预训练的端到端模型(Whisper)在带有提示示例的演示数据上进行微调来实现这一点。我们表明,这种能力可以泛化到不同领域乃至不同的提示上下文:在医疗对话、空中交通管制通信和金融会议等多个领域的未见数据集上,我们的模型取得了最高达 33% 的词错误率(WER)降低。考虑到音频-转写文本成对数据的有限可用性,我们进一步将该方法扩展为仅用文本进行微调,以同时实现领域敏感性和领域自适应。我们证明,仅用文本微调的模型同样能够关注不同的提示上下文,在医疗对话数据集上取得最高 29% 的 WER 降低。

Oracle Efficient Online Multicalibration and Omniprediction

  • paper_url: http://arxiv.org/abs/2307.08999
  • repo_url: None
  • paper_authors: Sumegha Garg, Christopher Jung, Omer Reingold, Aaron Roth
  • for: 本研究的目的是研究在线对抗设置(online adversarial setting)中的 omniprediction 算法,及其与 multicalibration 的关系。
  • methods: 本研究综合使用了 multicalibration 与 omniprediction 两种概念,以及若干学习理论中的工具。
  • results: 本研究得到了一种新的在线 multicalibration 算法,它对无限基准类 $F$ 也有良好定义,并且是 oracle 高效的(即对任何类 $F$,算法都可高效归约到一个针对 $F$ 的无悔(no-regret)学习算法);对于线性函数类 $F$,该算法还可在最坏情况下高效实现。此外,本研究还给出了上界和下界,用于刻画这类算法性能可改进的程度。
    Abstract A recent line of work has shown a surprising connection between multicalibration, a multi-group fairness notion, and omniprediction, a learning paradigm that provides simultaneous loss minimization guarantees for a large family of loss functions. Prior work studies omniprediction in the batch setting. We initiate the study of omniprediction in the online adversarial setting. Although there exist algorithms for obtaining notions of multicalibration in the online adversarial setting, unlike batch algorithms, they work only for small finite classes of benchmark functions $F$, because they require enumerating every function $f \in F$ at every round. In contrast, omniprediction is most interesting for learning theoretic hypothesis classes $F$, which are generally continuously large. We develop a new online multicalibration algorithm that is well defined for infinite benchmark classes $F$, and is oracle efficient (i.e. for any class $F$, the algorithm has the form of an efficient reduction to a no-regret learning algorithm for $F$). The result is the first efficient online omnipredictor -- an oracle efficient prediction algorithm that can be used to simultaneously obtain no regret guarantees to all Lipschitz convex loss functions. For the class $F$ of linear functions, we show how to make our algorithm efficient in the worst case. Also, we show upper and lower bounds on the extent to which our rates can be improved: our oracle efficient algorithm actually promises a stronger guarantee called swap-omniprediction, and we prove a lower bound showing that obtaining $O(\sqrt{T})$ bounds for swap-omniprediction is impossible in the online setting. On the other hand, we give a (non-oracle efficient) algorithm which can obtain the optimal $O(\sqrt{T})$ omniprediction bounds without going through multicalibration, giving an information theoretic separation between these two solution concepts.
    摘要 最近的一系列研究揭示了 multicalibration(一种多群体公平性概念)与 omniprediction(一种可同时为一大类损失函数提供损失最小化保证的学习范式)之间的意外联系。先前的工作主要研究批处理(batch)设置下的 omniprediction,我们则开启了在线对抗设置下的 omniprediction 研究。虽然已有算法可以在在线对抗设置中获得 multicalibration,但与批处理算法不同,它们只适用于较小的有限基准函数类 $F$,因为它们需要在每一轮枚举 $F$ 中的每个函数;相比之下,omniprediction 最有意义的场景是学习理论中的假设类 $F$,而这些类通常是连续无限大的。我们提出了一种新的在线 multicalibration 算法,它对无限基准类 $F$ 也有良好定义,并且是 oracle 高效的(即对任何类 $F$,该算法都可高效归约到一个针对 $F$ 的无悔学习算法)。由此我们得到了首个高效的在线 omnipredictor,即一个 oracle 高效的预测算法,可同时对所有 Lipschitz 凸损失函数给出无悔保证。对于线性函数类 $F$,我们展示了如何使算法在最坏情况下依然高效。我们还给出了关于速率可改进程度的上界和下界:我们的 oracle 高效算法实际上提供了更强的 swap-omniprediction 保证,而我们证明了在在线设置中不可能获得 $O(\sqrt{T})$ 的 swap-omniprediction 界。另一方面,我们给出了一个(非 oracle 高效的)算法,它可以不经由 multicalibration 而直接获得最优的 $O(\sqrt{T})$ omniprediction 界,从而在信息论意义上分离了这两个解的概念。
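  • 代码示意: 为直观理解 multicalibration 这一概念,下面给出一个与论文算法无关的最小 numpy 草图,用于度量某个预测器在“群体 × 预测水平”网格上的校准偏差;其中的群体划分、分桶数量等均为假设的示例参数,并非论文中的 oracle 高效在线算法。

```python
import numpy as np

def multicalibration_error(p, y, groups, n_buckets=10):
    """粗略度量预测器 p 在各群体与各预测水平上的(多)校准误差。

    p: 预测概率, 形状 (n,); y: 0/1 标签; groups: dict, 名称 -> 布尔掩码。
    返回所有 (群体, 预测桶) 组合中 |E[y - p]| 的最大值。
    """
    bucket = np.clip((p * n_buckets).astype(int), 0, n_buckets - 1)
    worst = 0.0
    for name, mask in groups.items():
        for b in range(n_buckets):
            idx = mask & (bucket == b)
            if idx.sum() < 20:          # 样本太少的格子跳过, 避免噪声
                continue
            worst = max(worst, abs((y[idx] - p[idx]).mean()))
    return worst

# 示例: 一个带有群体偏差的预测器
rng = np.random.default_rng(0)
x = rng.uniform(size=5000)
g = rng.integers(0, 2, size=5000)               # 两个群体
y = rng.binomial(1, np.clip(x + 0.1 * g, 0, 1))
p_biased = x                                    # 忽略群体信息的预测器
print(multicalibration_error(p_biased, y, {"g0": g == 0, "g1": g == 1}))
```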

GraphCL-DTA: a graph contrastive learning with molecular semantics for drug-target binding affinity prediction

  • paper_url: http://arxiv.org/abs/2307.08989
  • repo_url: None
  • paper_authors: Xinxing Yang, Genke Yang, Jian Chu
  • for: 预测药物与靶标之间的结合亲和力,以便在药物发现的早期阶段快速评估新药候选物的潜力。
  • methods: 我们提出了一种基于分子图的对比学习框架 GraphCL-DTA,通过该框架学习药物的分子图表示并保留分子图的语义;此外,我们还设计了一种新的损失函数,可直接调整药物和靶标表示的均匀性(uniformity)。
  • results: 我们在两个真实数据集(KIBA 和 Davis)上验证了 GraphCL-DTA 的有效性。结果显示 GraphCL-DTA 在这些数据集上表现出色,优于现有最先进模型,具有更高的准确性和更好的可靠性。
    Abstract Drug-target binding affinity prediction plays an important role in the early stages of drug discovery, which can infer the strength of interactions between new drugs and new targets. However, the performance of previous computational models is limited by the following drawbacks. The learning of drug representation relies only on supervised data, without taking into account the information contained in the molecular graph itself. Moreover, most previous studies tended to design complicated representation learning module, while uniformity, which is used to measure representation quality, is ignored. In this study, we propose GraphCL-DTA, a graph contrastive learning with molecular semantics for drug-target binding affinity prediction. In GraphCL-DTA, we design a graph contrastive learning framework for molecular graphs to learn drug representations, so that the semantics of molecular graphs are preserved. Through this graph contrastive framework, a more essential and effective drug representation can be learned without additional supervised data. Next, we design a new loss function that can be directly used to smoothly adjust the uniformity of drug and target representations. By directly optimizing the uniformity of representations, the representation quality of drugs and targets can be improved. The effectiveness of the above innovative elements is verified on two real datasets, KIBA and Davis. The excellent performance of GraphCL-DTA on the above datasets suggests its superiority to the state-of-the-art model.
    摘要 药物-靶标结合亲和力预测在药物发现的早期阶段具有重要作用,可用于推断新药与新靶标之间相互作用的强度。然而,现有计算模型的性能受到以下缺点限制:药物表示的学习仅依赖于有监督数据,而没有利用分子图本身所包含的信息;此外,多数已有研究倾向于设计复杂的表示学习模块,却忽略了用于衡量表示质量的均匀性。在本研究中,我们提出了 GraphCL-DTA,一种融合分子语义的图对比学习框架,用于药物-靶标结合亲和力预测。在 GraphCL-DTA 中,我们为分子图设计了图对比学习框架来学习药物表示,从而保留分子图的语义;通过该框架,无需额外的有监督数据即可学习到更本质、更有效的药物表示。随后,我们设计了一种新的损失函数,可直接用于平滑地调整药物和靶标表示的均匀性,通过直接优化表示的均匀性来提升表示质量。上述创新点的有效性在 KIBA 和 Davis 两个真实数据集上得到了验证,GraphCL-DTA 的出色表现表明其优于现有最先进模型。
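  • 代码示意: 下面是一个极简的 PyTorch 草图,示意摘要中提到的两个组成部分:两个增广视图之间的图级对比损失,以及仿照 Wang & Isola 风格的均匀性正则项。具体网络结构、损失形式与超参数均为假设,并非论文 GraphCL-DTA 的官方实现。

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """两组增广视图嵌入之间的 InfoNCE/NT-Xent 对比损失(简化版)。"""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                  # (B, B) 相似度矩阵
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def uniformity(z, t=2.0):
    """Wang & Isola 式的均匀性度量: 值越小, 嵌入在超球面上分布越均匀。"""
    z = F.normalize(z, dim=1)
    sq_dist = torch.pdist(z).pow(2)             # 成对欧氏距离的平方
    return sq_dist.mul(-t).exp().mean().log()

# 示例: 总损失 = 对比损失 + lambda * (药物/靶标表示的均匀性)
drug_v1, drug_v2 = torch.randn(64, 128), torch.randn(64, 128)  # 两个增广视图的图级嵌入
target_emb = torch.randn(64, 128)
loss = nt_xent(drug_v1, drug_v2) + 0.1 * (uniformity(drug_v1) + uniformity(target_emb))
print(loss.item())
```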

Neural Network Pruning as Spectrum Preserving Process

  • paper_url: http://arxiv.org/abs/2307.08982
  • repo_url: None
  • paper_authors: Shibo Yao, Dantong Yu, Ioannis Koutis
  • for: 本研究旨在提出一种基于矩阵谱(spectrum)的神经网络剪枝视角与方法,以提升神经网络在边缘设备上的运行效率。
  • methods: 本文利用矩阵谱来分析全连接层与卷积层的训练过程,指出权重剪枝本质上是保持谱的矩阵稀疏化过程,并据此提出一种面向神经网络剪枝的矩阵稀疏化算法。
  • results: 实验结果表明,该算法可以更好地保留神经网络的重要参数,并提高神经网络在边缘设备上的运行效率。
    Abstract Neural networks have achieved remarkable performance in various application domains. Nevertheless, a large number of weights in pre-trained deep neural networks prohibit them from being deployed on smartphones and embedded systems. It is highly desirable to obtain lightweight versions of neural networks for inference in edge devices. Many cost-effective approaches were proposed to prune dense and convolutional layers that are common in deep neural networks and dominant in the parameter space. However, a unified theoretical foundation for the problem mostly is missing. In this paper, we identify the close connection between matrix spectrum learning and neural network training for dense and convolutional layers and argue that weight pruning is essentially a matrix sparsification process to preserve the spectrum. Based on the analysis, we also propose a matrix sparsification algorithm tailored for neural network pruning that yields better pruning result. We carefully design and conduct experiments to support our arguments. Hence we provide a consolidated viewpoint for neural network pruning and enhance the interpretability of deep neural networks by identifying and preserving the critical neural weights.
    摘要 神经网络在多个应用领域取得了卓越的性能。然而,预训练深度神经网络中庞大的权重数量使其难以部署到智能手机和嵌入式系统上,因此非常需要获得可在边缘设备上推理的轻量级神经网络。已有许多低成本方法被提出,用于剪枝深度神经网络中常见且在参数空间中占主导地位的全连接层和卷积层,但该问题在很大程度上仍缺乏统一的理论基础。在本文中,我们指出了矩阵谱学习与全连接层、卷积层的神经网络训练之间的紧密联系,并论证权重剪枝本质上是一种保持谱的矩阵稀疏化过程。基于这一分析,我们还提出了一种专为神经网络剪枝设计的矩阵稀疏化算法,能够取得更好的剪枝效果。我们精心设计并开展了实验来支持上述论点,从而为神经网络剪枝提供了一个统一的视角,并通过识别和保留关键的神经权重增强了深度神经网络的可解释性。
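  • 代码示意: 为说明“剪枝近似于保持谱的矩阵稀疏化”这一观点,下面用 numpy 给出一个与论文算法无关的小实验:对一个随机全连接权重做幅值剪枝,并比较剪枝前后的奇异值谱;稀疏率、矩阵规模等参数均为假设。

```python
import numpy as np

def magnitude_sparsify(W, sparsity=0.8):
    """按绝对值大小保留前 (1 - sparsity) 比例的权重, 其余置零。"""
    k = int(W.size * (1 - sparsity))
    thresh = np.sort(np.abs(W), axis=None)[-k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256)) / np.sqrt(256)    # 模拟一个全连接层权重
W_sparse = magnitude_sparsify(W, sparsity=0.8)

s_dense = np.linalg.svd(W, compute_uv=False)
s_sparse = np.linalg.svd(W_sparse, compute_uv=False)
# 用前 20 个奇异值的相对偏差粗略衡量“谱”被保留的程度
rel_err = np.abs(s_dense[:20] - s_sparse[:20]) / s_dense[:20]
print("top-20 奇异值平均相对偏差:", rel_err.mean())
```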

A Unifying Framework for Differentially Private Sums under Continual Observation

  • paper_url: http://arxiv.org/abs/2307.08970
  • repo_url: None
  • paper_authors: Monika Henzinger, Jalaj Upadhyay, Sarvagya Upadhyay
  • for: 本研究考虑在持续观察(continual observation)下维护差分隐私的衰减和(decaying sum)的问题。
  • methods: 我们为任意足够光滑的函数提出了一个统一框架和高效算法。我们的算法是首个对多项式衰减权重不产生乘法误差的差分隐私衰减和算法。
  • results: 我们的算法改进了此前所有关于持续观察下差分隐私衰减和的工作,并可作为推论精确恢复持续计数(continual counting)特例中的加性误差(Henzinger et al., SODA 2023)。该算法是因子化机制的一个变体,其误差取决于底层矩阵的 $\gamma_2$ 和 $\gamma_F$ 范数。我们给出了可构造的证明,对一大类下三角矩阵得到近乎精确的 $\gamma_2$、$\gamma_F$ 范数上界与近乎紧的 $\gamma_2$ 范数下界;这是首个针对非零元素不全相同的下三角矩阵的非平凡下界。
    Abstract We study the problem of maintaining a differentially private decaying sum under continual observation. We give a unifying framework and an efficient algorithm for this problem for \emph{any sufficiently smooth} function. Our algorithm is the first differentially private algorithm that does not have a multiplicative error for polynomially-decaying weights. Our algorithm improves on all prior works on differentially private decaying sums under continual observation and recovers exactly the additive error for the special case of continual counting from Henzinger et al. (SODA 2023) as a corollary. Our algorithm is a variant of the factorization mechanism whose error depends on the $\gamma_2$ and $\gamma_F$ norm of the underlying matrix. We give a constructive proof for an almost exact upper bound on the $\gamma_2$ and $\gamma_F$ norm and an almost tight lower bound on the $\gamma_2$ norm for a large class of lower-triangular matrices. This is the first non-trivial lower bound for lower-triangular matrices whose non-zero entries are not all the same. It includes matrices for all continual decaying sums problems, resulting in an upper bound on the additive error of any differentially private decaying sums algorithm under continual observation. We also explore some implications of our result in discrepancy theory and operator algebra. Given the importance of the $\gamma_2$ norm in computer science and the extensive work in mathematics, we believe our result will have further applications.
    摘要 我们研究在持续观察下维护差分隐私衰减和的问题。我们为任意足够光滑的函数给出了一个统一框架和高效算法。我们的算法是首个对多项式衰减权重不产生乘法误差的差分隐私算法,改进了此前所有关于持续观察下差分隐私衰减和的工作,并且作为推论可以精确恢复 Henzinger 等人(SODA 2023)在持续计数这一特例中的加性误差。我们的算法是因子化机制的一个变体,其误差取决于底层矩阵的 $\gamma_2$ 与 $\gamma_F$ 范数。我们给出了可构造的证明,对一大类下三角矩阵得到近乎精确的 $\gamma_2$、$\gamma_F$ 范数上界以及近乎紧的 $\gamma_2$ 范数下界。这是首个针对非零元素不全相同的下三角矩阵的非平凡下界,它涵盖了所有持续衰减和问题对应的矩阵,从而给出了任何持续观察下差分隐私衰减和算法加性误差的上界。我们还探讨了该结果在差异理论(discrepancy theory)与算子代数中的一些推论。鉴于 $\gamma_2$ 范数在计算机科学中的重要性以及数学界的大量相关工作,我们相信该结果将有更多应用。
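  • 代码示意: 下面给出因子化机制在持续计数特例上的一个原理性草图(numpy/scipy):将全 1 下三角矩阵 $A$ 分解为 $A = L R$(此处取 $L = R = A^{1/2}$),并按 $R$ 的最大列范数用标准高斯机制加噪。矩阵分解方式与噪声标定公式均为常见做法的假设;论文的贡献在于 $\gamma_2$、$\gamma_F$ 范数的分析,而非这段代码本身。

```python
import numpy as np
from scipy.linalg import sqrtm

def factorization_mechanism(x, eps=1.0, delta=1e-6, rng=None):
    """持续计数的因子化机制草图: A = L @ R, 输出 L @ (R @ x + 噪声)。

    取 L = R = sqrtm(A), A 为全 1 下三角矩阵(前缀和矩阵);
    噪声按 R 的最大列 2-范数(L2 敏感度)用高斯机制标定, 仅作原理演示。
    """
    T = len(x)
    A = np.tril(np.ones((T, T)))
    R = np.real(sqrtm(A))
    L = R                                         # A = R @ R
    sens = np.linalg.norm(R, axis=0).max()        # 改变一个输入最多影响 R 的一列
    sigma = sens * np.sqrt(2 * np.log(1.25 / delta)) / eps
    rng = rng or np.random.default_rng(0)
    noisy = L @ (R @ x + rng.normal(0, sigma, size=T))
    return noisy, A @ x                           # 带噪前缀和与真值

noisy, exact = factorization_mechanism(np.ones(256))
print("最大绝对误差:", np.abs(noisy - exact).max())
```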

AutoAlign: Fully Automatic and Effective Knowledge Graph Alignment enabled by Large Language Models

  • paper_url: http://arxiv.org/abs/2307.11772
  • repo_url: None
  • paper_authors: Rui Zhang, Yixin Su, Bayu Distiawan Trisedya, Xiaoyan Zhao, Min Yang, Hong Cheng, Jianzhong Qi
  • for: The paper is written for the task of entity alignment between knowledge graphs (KGs), specifically proposing a fully automatic method that does not require manually crafted seed alignments.
  • methods: The proposed method, AutoAlign, uses predicate embeddings and entity embeddings to align entities between two KGs. Specifically, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs, and shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes.
  • results: The paper reports that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods, as demonstrated through experiments using real-world KGs.
    Abstract The task of entity alignment between knowledge graphs (KGs) aims to identify every pair of entities from two different KGs that represent the same entity. Many machine learning-based methods have been proposed for this task. However, to our best knowledge, existing methods all require manually crafted seed alignments, which are expensive to obtain. In this paper, we propose the first fully automatic alignment method named AutoAlign, which does not require any manually crafted seed alignments. Specifically, for predicate embeddings, AutoAlign constructs a predicate-proximity-graph with the help of large language models to automatically capture the similarity between predicates across two KGs. For entity embeddings, AutoAlign first computes the entity embeddings of each KG independently using TransE, and then shifts the two KGs' entity embeddings into the same vector space by computing the similarity between entities based on their attributes. Thus, both predicate alignment and entity alignment can be done without manually crafted seed alignments. AutoAlign is not only fully automatic, but also highly effective. Experiments using real-world KGs show that AutoAlign improves the performance of entity alignment significantly compared to state-of-the-art methods.
    摘要 知识图谱(KG)之间的实体对齐任务旨在找出两个不同 KG 中表示同一实体的所有实体对。已有许多基于机器学习的方法被提出,但据我们所知,现有方法都需要人工构造的种子对齐,而这类种子的获取成本高昂。本文提出了首个完全自动的对齐方法 AutoAlign,它不需要任何人工构造的种子对齐。具体而言,在谓词嵌入方面,AutoAlign 借助大语言模型构建谓词邻近图,自动捕捉两个 KG 之间谓词的相似性;在实体嵌入方面,AutoAlign 先用 TransE 独立计算每个 KG 的实体嵌入,再基于实体属性计算实体间相似度,将两个 KG 的实体嵌入变换到同一向量空间。因此,谓词对齐和实体对齐都无需人工种子即可完成。AutoAlign 不仅完全自动,而且非常有效:在真实世界 KG 上的实验表明,AutoAlign 相比最先进方法显著提升了实体对齐的性能。
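  • 代码示意: 摘要提到 AutoAlign 的实体嵌入部分使用 TransE。下面是标准 TransE 打分与间隔损失的 PyTorch 草图(实体数、嵌入维度、负采样方式等均为假设);谓词邻近图与基于属性的向量空间对齐并未展示。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TransE(nn.Module):
    """TransE: 以 h + r ≈ t 为目标学习实体/关系嵌入(对应 AutoAlign 的实体嵌入部分)。"""
    def __init__(self, n_ent, n_rel, dim=100, margin=1.0):
        super().__init__()
        self.ent = nn.Embedding(n_ent, dim)
        self.rel = nn.Embedding(n_rel, dim)
        nn.init.xavier_uniform_(self.ent.weight)
        nn.init.xavier_uniform_(self.rel.weight)
        self.margin = margin

    def score(self, h, r, t):
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=1, dim=-1)

    def forward(self, pos, neg):
        # pos/neg: (batch, 3) 的 (头实体, 关系, 尾实体) 三元组
        d_pos = self.score(pos[:, 0], pos[:, 1], pos[:, 2])
        d_neg = self.score(neg[:, 0], neg[:, 1], neg[:, 2])
        return F.relu(self.margin + d_pos - d_neg).mean()

model = TransE(n_ent=1000, n_rel=50)
pos = torch.randint(0, 1000, (32, 3)); pos[:, 1] = torch.randint(0, 50, (32,))
neg = pos.clone(); neg[:, 2] = torch.randint(0, 1000, (32,))   # 随机替换尾实体作负样本
print(model(pos, neg).item())
```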

Landscape Surrogate: Learning Decision Losses for Mathematical Optimization Under Partial Information

  • paper_url: http://arxiv.org/abs/2307.08964
  • repo_url: https://github.com/facebookresearch/lancer
  • paper_authors: Arman Zharmagambetov, Brandon Amos, Aaron Ferber, Taoan Huang, Bistra Dilkina, Yuandong Tian
  • for: 针对优化问题仅被部分观察、或通用优化器在缺乏专家调参时表现不佳的场景,学习一个以 $f$ 为目标的优化器 $\mathbf{g}$ 来求解这类问题,可以借助过去经验显著加速优化过程。
  • methods: 使用一个可学习的地形代理 $M$ 来替代 $f\circ \mathbf{g}$;该地形代理计算更快,能在训练中提供稠密且平滑的梯度,可泛化到未见过的优化问题,并可通过交替优化高效地学习。
  • results: 在合成问题(如最短路径和多维背包)以及投资组合优化等真实问题上的测试表明,与最先进基线相比,该方法可达到相当或更高的目标值,同时减少了对 $\mathbf{g}$ 的调用次数;尤其在高维且计算代价高昂的问题上表现突出。
    Abstract Recent works in learning-integrated optimization have shown promise in settings where the optimization problem is only partially observed or where general-purpose optimizers perform poorly without expert tuning. By learning an optimizer $\mathbf{g}$ to tackle these challenging problems with $f$ as the objective, the optimization process can be substantially accelerated by leveraging past experience. The optimizer can be trained with supervision from known optimal solutions or implicitly by optimizing the compound function $f\circ \mathbf{g}$. The implicit approach may not require optimal solutions as labels and is capable of handling problem uncertainty; however, it is slow to train and deploy due to frequent calls to optimizer $\mathbf{g}$ during both training and testing. The training is further challenged by sparse gradients of $\mathbf{g}$, especially for combinatorial solvers. To address these challenges, we propose using a smooth and learnable Landscape Surrogate $M$ as a replacement for $f\circ \mathbf{g}$. This surrogate, learnable by neural networks, can be computed faster than the solver $\mathbf{g}$, provides dense and smooth gradients during training, can generalize to unseen optimization problems, and is efficiently learned via alternating optimization. We test our approach on both synthetic problems, including shortest path and multidimensional knapsack, and real-world problems such as portfolio optimization, achieving comparable or superior objective values compared to state-of-the-art baselines while reducing the number of calls to $\mathbf{g}$. Notably, our approach outperforms existing methods for computationally expensive high-dimensional problems.
    摘要 近期关于学习融合优化(learning-integrated optimization)的工作表明,在优化问题仅被部分观察、或通用优化器在缺乏专家调参时表现不佳的场景中,这类方法颇具前景。通过学习一个以 $f$ 为目标的优化器 $\mathbf{g}$ 来求解这些困难问题,可以借助过去经验大幅加速优化过程。优化器既可以用已知最优解作监督来训练,也可以通过优化复合函数 $f\circ \mathbf{g}$ 隐式地训练。隐式方法可能不需要以最优解作为标签,并且能够处理问题的不确定性;但由于训练和测试时都要频繁调用优化器 $\mathbf{g}$,其训练和部署都很慢,而 $\mathbf{g}$ 的梯度稀疏(尤其是组合求解器)进一步加大了训练难度。为了解决这些挑战,我们提出用一个光滑、可学习的地形代理(Landscape Surrogate)$M$ 来替代 $f\circ \mathbf{g}$。该代理可由神经网络学习,计算速度快于求解器 $\mathbf{g}$,在训练中提供稠密且平滑的梯度,能够泛化到未见过的优化问题,并可通过交替优化高效地学习。我们在合成问题(包括最短路径和多维背包)以及投资组合优化等真实问题上测试了该方法,其目标值与最先进基线相当或更优,同时减少了对 $\mathbf{g}$ 的调用次数。值得注意的是,我们的方法在高维且计算代价高昂的问题上表现尤为突出。

REX: Rapid Exploration and eXploitation for AI Agents

  • paper_url: http://arxiv.org/abs/2307.08962
  • repo_url: None
  • paper_authors: Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
  • for: 提高 AI 代理的快速探索与利用(exploitation)能力,解决现有 AutoGPT 风格技术的缺陷,例如决策过度依赖精确描述,以及缺乏对试错过程的系统化利用。
  • methods: 提出一种增强的 Rapid Exploration and eXploitation(REX)方法,通过添加奖励层并引入基于置信上界(UCB)的概念,使 AI 代理的表现更加稳健和高效。REX 方法不需要模型微调,可以利用日志数据,并能与现有基础模型无缝协作。
  • results: 对比分析表明,基于 REX 的方法与 Chain-of-Thoughts(CoT)、Reasoning viA Planning(RAP)等现有方法性能相当,在一些情况下甚至更优,同时显著减少执行时间,提升了在多样化场景下的实际可用性。
    Abstract In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making, and the lack of a systematic approach to leverage try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach has the advantage of enabling the utilization of offline behaviors from logs and allowing seamless integration with existing foundation models while it does not require any model fine-tuning. Through comparative analysis with existing methods such as Chain-of-Thoughts(CoT) and Reasoning viA Planning(RAP), REX-based methods demonstrate comparable performance and, in certain cases, even surpass the results achieved by these existing techniques. Notably, REX-based methods exhibit remarkable reductions in execution time, enhancing their practical applicability across a diverse set of scenarios.
    摘要 本文提出了一种用于 AI 代理的增强方法:快速探索与利用(Rapid Exploration and eXploitation,REX)。现有的 AutoGPT 风格技术存在固有局限,例如决策过度依赖精确的描述,以及缺乏像传统强化学习(RL)那样系统化利用试错过程的机制。REX 引入了额外的奖励层,并融入了类似置信上界(UCB)分数的概念,使 AI 代理的表现更加稳健和高效。该方法的优点在于能够利用日志中的离线行为,且无需任何模型微调即可与现有基础模型无缝集成。与 Chain-of-Thoughts(CoT)和 Reasoning viA Planning(RAP)等现有方法的对比分析表明,基于 REX 的方法性能相当,在某些情况下甚至更优;尤其是在执行时间上有显著缩短,使其在多种场景下更具实用性。
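  • 代码示意: 下面用几行 Python 展示 REX 所借鉴的 UCB 式探索-利用打分的一般形式(动作名称与回报均为假设),并非 REX 代理本身的实现。

```python
import math
from collections import defaultdict

class UCBActionSelector:
    """按 UCB 分数在候选动作/子步骤之间做探索-利用权衡的最小示意。"""
    def __init__(self, c=1.4):
        self.c = c
        self.count = defaultdict(int)     # 动作被尝试的次数
        self.value = defaultdict(float)   # 动作的平均回报

    def select(self, actions):
        total = sum(self.count[a] for a in actions) + 1
        def ucb(a):
            if self.count[a] == 0:
                return float("inf")       # 未尝试过的动作优先探索
            return self.value[a] + self.c * math.sqrt(math.log(total) / self.count[a])
        return max(actions, key=ucb)

    def update(self, action, reward):
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]  # 增量均值

selector = UCBActionSelector()
for step in range(20):
    a = selector.select(["search", "write_code", "ask_user"])
    reward = {"search": 0.3, "write_code": 0.8, "ask_user": 0.1}[a]  # 假设的回报
    selector.update(a, reward)
print(selector.value)
```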

Discretization-based ensemble model for robust learning in IoT

  • paper_url: http://arxiv.org/abs/2307.08955
  • repo_url: None
  • paper_authors: Anahita Namvar, Chandra Thapa, Salil S. Kanhere
  • for: 提高 IoT 设备识别模型的安全性,抵御黑盒和白盒攻击。
  • methods: integrate discretization techniques and ensemble methods to improve the robustness of machine learning models for IoT device identification.
  • results: 提高了 ML 模型对 IoT 设备识别的可靠性和安全性,抵御了黑盒和白盒攻击。
    Abstract IoT device identification is the process of recognizing and verifying connected IoT devices to the network. This is an essential process for ensuring that only authorized devices can access the network, and it is necessary for network management and maintenance. In recent years, machine learning models have been used widely for automating the process of identifying devices in the network. However, these models are vulnerable to adversarial attacks that can compromise their accuracy and effectiveness. To better secure device identification models, discretization techniques enable reduction in the sensitivity of machine learning models to adversarial attacks contributing to the stability and reliability of the model. On the other hand, Ensemble methods combine multiple heterogeneous models to reduce the impact of remaining noise or errors in the model. Therefore, in this paper, we integrate discretization techniques and ensemble methods and examine it on model robustness against adversarial attacks. In other words, we propose a discretization-based ensemble stacking technique to improve the security of our ML models. We evaluate the performance of different ML-based IoT device identification models against white box and black box attacks using a real-world dataset comprised of network traffic from 28 IoT devices. We demonstrate that the proposed method enables robustness to the models for IoT device identification.
    摘要 物联网(IoT)设备识别是对接入网络的 IoT 设备进行识别和验证的过程。这是确保只有授权设备才能访问网络的关键环节,也是网络管理和维护所必需的。近年来,机器学习模型被广泛用于自动化网络中的设备识别。然而,这些模型容易受到对抗攻击的影响,从而降低其准确性和有效性。为了更好地保护设备识别模型,离散化技术可以降低机器学习模型对对抗攻击的敏感度,从而提升模型的稳定性和可靠性;另一方面,集成方法通过组合多个异构模型来减少残余噪声或误差的影响。因此,本文将离散化技术与集成方法相结合,并考察其对模型抗对抗攻击鲁棒性的影响。换言之,我们提出了一种基于离散化的集成堆叠技术,以提升机器学习模型的安全性。我们使用包含 28 台 IoT 设备网络流量的真实数据集,评估了不同机器学习 IoT 设备识别模型在白盒和黑盒攻击下的表现。结果表明,所提方法能够增强 IoT 设备识别模型的鲁棒性。
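  • 代码示意: 下面用 scikit-learn 给出“特征离散化 + 集成堆叠”这一组合方式的一个最小可运行草图;数据为合成数据,基学习器与分箱数均为假设,并非论文所用的 IoT 流量特征与模型配置。

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.tree import DecisionTreeClassifier

# 用合成数据代替真实的 IoT 流量特征, 仅演示“离散化 + 集成堆叠”的组合方式
X, y = make_classification(n_samples=2000, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 离散化用于降低模型对输入小扰动(对抗噪声)的敏感度
disc = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=8, random_state=0)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000))

model = make_pipeline(disc, stack)
model.fit(X_tr, y_tr)
print("测试集准确率:", model.score(X_te, y_te))
```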

Knowledge-infused Deep Learning Enables Interpretable Landslide Forecasting

  • paper_url: http://arxiv.org/abs/2307.08951
  • repo_url: None
  • paper_authors: Zhengjing Ma, Gang Mei
  • for: 预测滑坡如何随时间演化或是否会失稳是一项复杂的任务,因为其受到众多内部和外部因素的影响。
  • methods: 本文使用一种名为 LFIT 的基于 Transformer 的深度学习网络,该网络能够学习非线性关系,并兼具可解释性和多源数据处理能力。
  • results: 文章表明,结合先验知识可以改进整体滑坡预测,并能刻画不同地区滑坡的行为响应和时间模式。文章利用形变观测数据验证了该方法的可靠性与可解释性。
    Abstract Forecasting how landslides will evolve over time or whether they will fail is a challenging task due to a variety of factors, both internal and external. Despite their considerable potential to address these challenges, deep learning techniques lack interpretability, undermining the credibility of the forecasts they produce. The recent development of transformer-based deep learning offers untapped possibilities for forecasting landslides with unprecedented interpretability and nonlinear feature learning capabilities. Here, we present a deep learning pipeline that is capable of predicting landslide behavior holistically, which employs a transformer-based network called LFIT to learn complex nonlinear relationships from prior knowledge and multiple source data, identifying the most relevant variables, and demonstrating a comprehensive understanding of landslide evolution and temporal patterns. By integrating prior knowledge, we provide improvement in holistic landslide forecasting, enabling us to capture diverse responses to various influencing factors in different local landslide areas. Using deformation observations as proxies for measuring the kinetics of landslides, we validate our approach by training models to forecast reservoir landslides in the Three Gorges Reservoir and creeping landslides on the Tibetan Plateau. When prior knowledge is incorporated, we show that interpretable landslide forecasting effectively identifies influential factors across various landslides. It further elucidates how local areas respond to these factors, making landslide behavior and trends more interpretable and predictable. The findings from this study will contribute to understanding landslide behavior in a new way and make the proposed approach applicable to other complex disasters influenced by internal and external factors in the future.
    摘要 预测滑坡如何随时间演化或是否会失稳是一项具有挑战性的任务,因为其受到多种内部和外部因素的影响。尽管深度学习技术在应对这些挑战方面潜力巨大,但其缺乏可解释性,削弱了预测结果的可信度。近期基于 Transformer 的深度学习的发展,为以前所未有的可解释性和非线性特征学习能力开展滑坡预测提供了尚未开发的可能。本文提出了一个能够整体预测滑坡行为的深度学习流程,采用名为 LFIT 的基于 Transformer 的网络,从先验知识和多源数据中学习复杂的非线性关系,识别最相关的变量,并对滑坡演化和时间模式形成全面的理解。通过融合先验知识,我们改进了整体滑坡预测,能够刻画不同局部滑坡区域对各类影响因素的多样化响应。以形变观测作为滑坡动力学的代理变量,我们在三峡库区的水库滑坡和青藏高原的蠕变滑坡上训练模型进行预测,验证了该方法。结果表明,在融入先验知识后,可解释的滑坡预测能够有效识别影响各类滑坡的关键因素,并进一步阐明局部区域如何响应这些因素,使滑坡行为和趋势更加可解释、可预测。本研究的发现将有助于以新的方式理解滑坡行为,并使所提方法未来可推广到其他受内外部因素影响的复杂灾害。

Alioth: A Machine Learning Based Interference-Aware Performance Monitor for Multi-Tenancy Applications in Public Cloud

  • paper_url: http://arxiv.org/abs/2307.08949
  • repo_url: https://github.com/sthowling/alioth
  • paper_authors: Tianyao Shi, Yingxuan Yang, Yunlong Cheng, Xiaofeng Gao, Zhen Fang, Yongqiang Yang
  • for: This paper aims to monitor the performance degradation of cloud applications in public clouds caused by co-location interference.
  • methods: The proposed method, Alioth, uses a novel machine learning framework that includes interference generators, denoising auto-encoders, domain adaptation neural networks, and SHAP explainers to monitor performance degradation.
  • results: Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage, outperforming baseline methods. It also demonstrates robustness in signaling quality-of-service violation under dynamicity.
    Abstract Multi-tenancy in public clouds may lead to co-location interference on shared resources, which possibly results in performance degradation of cloud applications. Cloud providers want to know when such events happen and how serious the degradation is, to perform interference-aware migrations and alleviate the problem. However, virtual machines (VM) in Infrastructure-as-a-Service public clouds are black-boxes to providers, where application-level performance information cannot be acquired. This makes performance monitoring intensely challenging as cloud providers can only rely on low-level metrics such as CPU usage and hardware counters. We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications. To feed the data-hungry models, we first elaborate interference generators and conduct comprehensive co-location experiments on a testbed to build Alioth-dataset which reflects the complexity and dynamicity in real-world scenarios. Then we construct Alioth by (1) augmenting features via recovering low-level metrics under no interference using denoising auto-encoders, (2) devising a transfer learning model based on domain adaptation neural network to make models generalize on test cases unseen in offline training, and (3) developing a SHAP explainer to automate feature selection and enhance model interpretability. Experiments show that Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage, outperforming the baseline methods. Alioth is also robust in signaling quality-of-service violation under dynamicity. Finally, we demonstrate a possible application of Alioth's interpretability, providing insights to benefit the decision-making of cloud operators. The dataset and code of Alioth have been released on GitHub.
    摘要 公有云中的多租户可能导致共享资源上的共置干扰,进而造成云应用性能下降。云服务商希望知道此类事件何时发生、性能下降有多严重,以便进行干扰感知的迁移来缓解问题。然而,基础设施即服务(IaaS)公有云中的虚拟机(VM)对服务商而言是黑盒,无法获取应用层的性能信息。这使得性能监控极具挑战,云服务商只能依赖 CPU 使用率和硬件计数器等低层指标。我们提出了一种新的机器学习框架 Alioth,用于监控云应用的性能下降。为了给数据需求量大的模型提供训练数据,我们首先精心设计了干扰生成器,并在测试床上开展了全面的共置实验,构建了能反映真实场景复杂性与动态性的 Alioth 数据集。随后,我们通过以下方式构建 Alioth:(1)利用去噪自编码器恢复无干扰情形下的低层指标,以增广特征;(2)设计基于领域自适应神经网络的迁移学习模型,使模型能够泛化到离线训练中未见过的测试用例;(3)开发 SHAP 解释器,自动完成特征选择并增强模型的可解释性。实验表明,Alioth 的平均绝对误差在离线评估中为 5.29%,在训练阶段未见过的应用上为 10.8%,优于基线方法;在动态环境下,Alioth 也能稳健地发出服务质量违规信号。最后,我们展示了 Alioth 可解释性的一个潜在应用,为云运营商的决策提供有价值的洞察。Alioth 的数据集与代码已在 GitHub 上发布。
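  • 代码示意: 下面是 Alioth 中“用去噪自编码器恢复无干扰低层指标”这一组件思路的 PyTorch 草图;网络结构、指标维度以及把共置干扰建模为加性噪声的方式均为假设。

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """去噪自编码器草图: 从受干扰的低层指标中恢复近似无干扰的指标。"""
    def __init__(self, n_metrics=32, hidden=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_metrics, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, n_metrics))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

clean = torch.rand(1024, 32)                    # 假设: 无干扰时采集的低层指标
noisy = clean + 0.2 * torch.randn_like(clean)   # 假设: 共置干扰建模为加性噪声

for epoch in range(50):
    opt.zero_grad()
    loss = loss_fn(model(noisy), clean)         # 以无干扰指标为重建目标
    loss.backward()
    opt.step()
print("重建误差:", loss.item())
```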

Mitigating Label Bias via Decoupled Confident Learning

  • paper_url: http://arxiv.org/abs/2307.08945
  • repo_url: None
  • paper_authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky
  • for: 本研究旨在提出一种能够缓解标签偏差的分类方法,以减少医疗、招聘等重要领域中的算法偏差。
  • methods: 本研究提出了一种名为解耦置信学习(Decoupled Confident Learning,DeCoLe)的剪除方法,用于缓解标签偏差的影响。
  • results: 在合成数据集上验证 DeCoLe 的性能后,作者将其应用于标签偏差问题突出的仇恨言论检测任务;结果显示它能成功识别有偏差的标签,并优于其他对比方法。
    Abstract Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are correct. This is problematic because bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation. In particular, human-generated labels are prone to encoding societal biases. While the presence of labeling bias has been discussed conceptually, there is a lack of methodologies to address this problem. We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias. After illustrating its performance on a synthetic dataset, we apply DeCoLe in the context of hate speech detection, where label bias has been recognized as an important challenge, and show that it successfully identifies biased labels and outperforms competing approaches.
    摘要 对算法公平性的日益关注催生了大量旨在缓解算法偏差的方法。然而,这些方法大多假设训练数据中观察到的标签是正确的。这是有问题的,因为标签偏差在医疗、招聘和内容审核等重要领域普遍存在,尤其是人工生成的标签很容易编码社会偏见。尽管标签偏差的存在已在概念层面被讨论,但仍缺乏解决该问题的方法。我们提出了一种专门为缓解标签偏差而设计的剪除方法,即解耦置信学习(DeCoLe)。在合成数据集上展示其性能后,我们将 DeCoLe 应用于标签偏差被公认为重要挑战的仇恨言论检测任务,结果表明它能成功识别有偏差的标签,并优于竞争方法。
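  • 代码示意: 下面用 numpy 给出一个按“置信学习”思路、在每个群体内分别估计类别阈值并标记疑似偏差标签的粗略草图。它只是对这一思路的自行简化近似,并非论文 DeCoLe 算法本身;阈值定义、数据与群体划分均为假设。

```python
import numpy as np

def flag_suspect_labels(probs, labels, groups):
    """在每个群体内按置信学习思路标记疑似有偏差的标签(粗略示意)。

    probs: (n, K) 交叉验证得到的类别概率; labels: (n,) 观测标签;
    groups: (n,) 群体标识。返回布尔数组, True 表示疑似错误/有偏差的标签。
    """
    n, K = probs.shape
    suspect = np.zeros(n, dtype=bool)
    for g in np.unique(groups):
        m = groups == g
        # 每个类别的置信阈值: 该群体内被标为 k 的样本对类别 k 的平均预测概率
        thresh = np.array([probs[m & (labels == k), k].mean()
                           if (m & (labels == k)).any() else 1.0 for k in range(K)])
        for i in np.where(m)[0]:
            confident = np.where(probs[i] >= thresh)[0]   # 被置信地归入的类别
            if confident.size and labels[i] not in confident:
                suspect[i] = True
    return suspect

rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[2, 2], size=1000)
labels = (probs[:, 1] > 0.5).astype(int)
labels[:50] = 1 - labels[:50]                             # 人为注入 5% 的标签噪声
groups = rng.integers(0, 2, size=1000)
print("被标记的样本数:", flag_suspect_labels(probs, labels, groups).sum())
```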

Siamese Networks for Weakly Supervised Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2307.08944
  • repo_url: None
  • paper_authors: Taoran Sheng, Manfred Huber
  • for: 这篇论文旨在将深度学习应用于人体活动识别,但训练深度神经网络需要大量显式标注数据,而这类数据难以获取。
  • methods: 论文提出了一种由多个孪生(Siamese)网络组成的模型,仅利用数据样本对之间的相似性信息进行训练,得到的模型可作为多种聚类算法的度量。
  • results: 论文在三个数据集上进行了评估,验证了该模型在连续人体活动序列的分割与识别中的有效性。
    Abstract Deep learning has been successfully applied to human activity recognition. However, training deep neural networks requires explicitly labeled data which is difficult to acquire. In this paper, we present a model with multiple siamese networks that are trained by using only the information about the similarity between pairs of data samples without knowing the explicit labels. The trained model maps the activity data samples into fixed size representation vectors such that the distance between the vectors in the representation space approximates the similarity of the data samples in the input space. Thus, the trained model can work as a metric for a wide range of different clustering algorithms. The training process minimizes a similarity loss function that forces the distance metric to be small for pairs of samples from the same kind of activity, and large for pairs of samples from different kinds of activities. We evaluate the model on three datasets to verify its effectiveness in segmentation and recognition of continuous human activity sequences.
    摘要 深度学习已成功应用于人体活动识别。然而,训练深度神经网络需要显式标注的数据,而这类数据难以获取。本文提出了一种由多个孪生网络组成的模型,仅利用数据样本对之间相似与否的信息进行训练,而无需显式标签。训练后的模型将活动数据样本映射为固定长度的表示向量,使表示空间中向量之间的距离近似输入空间中数据样本的相似度,因此可以作为多种聚类算法的度量。训练过程最小化一个相似性损失函数,使同类活动样本对的距离较小、不同类活动样本对的距离较大。我们在三个数据集上评估了该模型,验证了其在连续人体活动序列分割与识别中的有效性。
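  • 代码示意: 下面的 PyTorch 草图展示仅用“样本对是否同类”这一弱监督信号训练嵌入网络的对比(间隔)损失;编码器结构、输入维度与间隔值等均为占位假设,并非论文的具体网络。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """将一段传感器窗口映射为固定维度表示的简单编码器(占位结构)。"""
    def __init__(self, in_dim=128, emb_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same, margin=1.0):
    """same=1 时拉近两个嵌入, same=0 时把距离推到 margin 之外。"""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

enc = Encoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)
x1, x2 = torch.randn(64, 128), torch.randn(64, 128)    # 一批活动窗口对
same = torch.randint(0, 2, (64,)).float()              # 仅有“是否同类活动”的弱监督
loss = contrastive_loss(enc(x1), enc(x2), same)
loss.backward(); opt.step()
print(loss.item())
```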

NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning

  • paper_url: http://arxiv.org/abs/2307.08941
  • repo_url: https://github.com/weitianxin/mlp_fusion
  • paper_authors: Tianxin Wei, Zeming Guo, Yifan Chen, Jingrui He
  • for: 本文旨在通过神经正切核(NTK)近似的 MLP 融合来构造轻量级预训练语言模型(PLM),以降低 PLM 微调的计算和内存开销。
  • methods: 本文借助 NTK 分析 PLM 中的多层感知器(MLP)模块,提出将 MLP 视作若干子 MLP 的集合并聚类为给定数量的质心,从而得到轻量级 PLM 的方法。
  • results: 在自然语言理解(NLU)和生成(NLG)任务上的微调实验验证了所提 MLP 融合方法的有效性。
    Abstract Fine-tuning a pre-trained language model (PLM) emerges as the predominant strategy in many natural language processing applications. However, even fine-tuning the PLMs and doing inference are expensive, especially on edge devices with low computing power. Some general approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning, while very few one-shot compression techniques are explored. In this paper, we investigate the neural tangent kernel (NTK)--which reveals the gradient descent dynamics of neural networks--of the multilayer perceptrons (MLP) modules in a PLM and propose to coin a lightweight PLM through NTK-approximating MLP fusion. To achieve this, we reconsider the MLP as a bundle of sub-MLPs, and cluster them into a given number of centroids, which can then be restored as a compressed MLP and surprisingly shown to well approximate the NTK of the original PLM. Extensive experiments of PLM fine-tuning on both natural language understanding (NLU) and generation (NLG) tasks are provided to verify the effectiveness of the proposed method MLP fusion. Our code is available at https://github.com/weitianxin/MLP_Fusion.
    摘要 微调预训练语言模型(PLM)已成为众多自然语言处理应用中的主流策略。然而,即便只是微调 PLM 和执行推理,其开销也很高,在算力有限的边缘设备上尤为明显。为降低 PLM 微调的计算/内存开销,量化和蒸馏等通用方法已被广泛研究,而一次性(one-shot)压缩技术则鲜有探索。本文研究 PLM 中多层感知器(MLP)模块的神经正切核(NTK)(它刻画了神经网络的梯度下降动力学),并提出通过 NTK 近似的 MLP 融合来构造轻量级 PLM。为此,我们将 MLP 重新看作若干子 MLP 组成的集合,并将它们聚类为给定数量的质心;这些质心可被还原为一个压缩后的 MLP,且出乎意料地能很好地近似原始 PLM 的 NTK。我们在自然语言理解(NLU)和生成(NLG)任务上开展了大量 PLM 微调实验,验证了所提 MLP 融合方法的有效性。代码可在 https://github.com/weitianxin/MLP_Fusion 获取。
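  • 代码示意: 下面是对“把隐藏神经元聚类为若干质心并合并成压缩 MLP”这一思路的自行简化草图(numpy/scikit-learn):按神经元的输入/输出权重做 k-means,同簇输出权重求和、输入权重取均值。它并非论文中以近似 NTK 为目标的融合过程,仅用于说明压缩的形式;矩阵规模与聚类数均为假设。

```python
import numpy as np
from sklearn.cluster import KMeans

def fuse_mlp(W1, b1, W2, k=16):
    """把隐藏层神经元聚成 k 个质心并合并, 得到压缩后的 (W1', b1', W2')。

    W1: (d_in, h), b1: (h,), W2: (h, d_out)。每个神经元用其输入/输出权重
    拼接成特征向量做 k-means; 同簇神经元的输出权重相加、输入权重取均值。
    """
    feats = np.concatenate([W1.T, W2], axis=1)          # (h, d_in + d_out)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)
    d_in, d_out = W1.shape[0], W2.shape[1]
    W1c, b1c, W2c = np.zeros((d_in, k)), np.zeros(k), np.zeros((k, d_out))
    for c in range(k):
        idx = km.labels_ == c
        W1c[:, c] = W1[:, idx].mean(axis=1)
        b1c[c] = b1[idx].mean()
        W2c[c] = W2[idx].sum(axis=0)                    # 输出侧求和以近似原有贡献
    return W1c, b1c, W2c

rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(64, 256)), rng.normal(size=256), rng.normal(size=(256, 10))
W1c, b1c, W2c = fuse_mlp(W1, b1, W2, k=32)
x = rng.normal(size=(5, 64))
full = np.maximum(x @ W1 + b1, 0) @ W2
comp = np.maximum(x @ W1c + b1c, 0) @ W2c
print("输出相对误差:", np.linalg.norm(full - comp) / np.linalg.norm(full))
```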

Experimental Security Analysis of DNN-based Adaptive Cruise Control under Context-Aware Perception Attacks

  • paper_url: http://arxiv.org/abs/2307.08939
  • repo_url: None
  • paper_authors: Xugui Zhou, Anqi Chen, Maxfield Kouzel, Haotian Ren, Morgan McCarty, Cristina Nita-Rotaru, Homa Alemzadeh
  • for: 评估基于深度神经网络(DNN)的自适应巡航控制(ACC)系统在隐蔽感知攻击下的安全性,这类攻击通过向摄像头数据策略性地注入扰动来诱发前向碰撞。
  • methods: 提出了一种知识驱动与数据驱动相结合的方法,用于选择最关键的攻击触发时机,以及一种基于优化、在运行时自适应生成图像扰动的新方法。
  • results: 基于真实驾驶数据集、搭载量产 ACC 控制软件的高保真仿真平台和实车驾驶模拟器的实验(考虑驾驶员干预以及自动紧急制动 AEB、前向碰撞预警 FCW 等安全功能)表明,所提攻击造成事故的成功率是随机攻击的 142.9 倍,被安全功能缓解的比例低 89.6%,且对真实世界因素和环境动态变化具有鲁棒性并保持隐蔽。该研究为理解人类驾驶员和基础安全干预在防御攻击中的作用提供了洞见。
    Abstract Adaptive Cruise Control (ACC) is a widely used driver assistance feature for maintaining desired speed and safe distance to the leading vehicles. This paper evaluates the security of the deep neural network (DNN) based ACC systems under stealthy perception attacks that strategically inject perturbations into camera data to cause forward collisions. We present a combined knowledge-and-data-driven approach to design a context-aware strategy for the selection of the most critical times for triggering the attacks and a novel optimization-based method for the adaptive generation of image perturbations at run-time. We evaluate the effectiveness of the proposed attack using an actual driving dataset and a realistic simulation platform with the control software from a production ACC system and a physical-world driving simulator while considering interventions by the driver and safety features such as Automatic Emergency Braking (AEB) and Forward Collision Warning (FCW). Experimental results show that the proposed attack achieves 142.9x higher success rate in causing accidents than random attacks and is mitigated 89.6% less by the safety features while being stealthy and robust to real-world factors and dynamic changes in the environment. This study provides insights into the role of human operators and basic safety interventions in preventing attacks.
    摘要 本文评估了基于深度神经网络(DNN)的自适应巡航控制(ACC)系统在隐蔽感知攻击下的安全性,这类攻击有策略地向摄像头数据注入扰动以诱发前向碰撞。我们提出了一种知识与数据相结合的方法,用于设计情境感知的攻击触发时机选择策略,以及一种在运行时自适应生成图像扰动的新型优化方法。我们使用真实驾驶数据集和搭载量产 ACC 控制软件的真实感仿真平台及实车驾驶模拟器评估攻击效果,并考虑了驾驶员干预以及自动紧急制动(AEB)、前向碰撞预警(FCW)等安全功能。实验结果显示,所提攻击造成事故的成功率是随机攻击的 142.9 倍,被安全功能缓解的比例低 89.6%,且对真实世界因素和环境动态变化具有鲁棒性并保持隐蔽。本研究为理解人类驾驶员和基础安全干预在防御攻击中的作用提供了洞见。

Multi-stage Neural Networks: Function Approximator of Machine Precision

  • paper_url: http://arxiv.org/abs/2307.08934
  • repo_url: None
  • paper_authors: Yongji Wang, Ching-Yao Lai
  • for: 本文旨在提高神经网络在科学问题中的求解精度,并利用多阶段神经网络缓解谱偏差(spectral bias)。
  • methods: 本文将训练过程划分为多个阶段,每个阶段使用一个新的网络来拟合上一阶段遗留的残差。
  • results: 本文表明,多阶段训练可将预测误差降至接近双精度浮点机器精度 $O(10^{-16})$ 的水平,这是单个神经网络难以达到的。
    Abstract Deep learning techniques are increasingly applied to scientific problems, where the precision of networks is crucial. Despite being deemed as universal function approximators, neural networks, in practice, struggle to reduce the prediction errors below $O(10^{-5})$ even with large network size and extended training iterations. To address this issue, we developed the multi-stage neural networks that divides the training process into different stages, with each stage using a new network that is optimized to fit the residue from the previous stage. Across successive stages, the residue magnitudes decreases substantially and follows an inverse power-law relationship with the residue frequencies. The multi-stage neural networks effectively mitigate the spectral biases associated with regular neural networks, enabling them to capture the high frequency feature of target functions. We demonstrate that the prediction error from the multi-stage training for both regression problems and physics-informed neural networks can nearly reach the machine-precision $O(10^{-16})$ of double-floating point within a finite number of iterations. Such levels of accuracy are rarely attainable using single neural networks alone.
    摘要 深度学习技术正越来越多地应用于科学问题,其中网络的精度至关重要。尽管神经网络被视为通用函数逼近器,但在实践中,即便使用大规模网络和更长的训练轮数,其预测误差也难以降到 $O(10^{-5})$ 以下。为解决这一问题,我们提出了多阶段神经网络:将训练过程划分为多个阶段,每个阶段使用一个新的网络来拟合上一阶段留下的残差。随着阶段的推进,残差幅度大幅下降,并与残差频率呈反幂律关系。多阶段神经网络有效缓解了常规神经网络的谱偏差,使其能够捕捉目标函数的高频特征。我们证明,无论是回归问题还是物理信息神经网络,多阶段训练都能在有限的迭代次数内将预测误差降至接近双精度浮点的机器精度 $O(10^{-16})$,而这一精度水平仅靠单个神经网络几乎无法达到。
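  • 代码示意: 下面用 scikit-learn 的 MLPRegressor 给出“逐阶段拟合残差、预测相加”这一核心思想的简化示例;目标函数、阶段数与残差归一化方式均为假设,论文中的具体网络结构与训练技巧并未体现。

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

x = np.linspace(0, 1, 2000).reshape(-1, 1)
# 含低频与高频成分的目标函数
y = np.sin(2 * np.pi * x).ravel() + 0.1 * np.sin(40 * np.pi * x).ravel()

stages, residual, preds = [], y.copy(), np.zeros_like(y)
for s in range(3):                                   # 逐阶段拟合上一阶段的残差
    scale = np.abs(residual).max()                   # 残差归一化后再训练, 便于优化
    net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=s)
    net.fit(x, residual / scale)
    preds += net.predict(x) * scale
    residual = y - preds
    stages.append(net)
    print(f"stage {s}: 残差均方根 = {np.sqrt((residual ** 2).mean()):.2e}")
```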

IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness

  • paper_url: http://arxiv.org/abs/2307.08933
  • repo_url: https://github.com/sri-aic/23-xai-ixdrl-data
  • paper_authors: Pedro Sequeira, Melinda Gervasio
  • for: The paper aims to provide a more explainable deep reinforcement learning (xDRL) framework to help human operators understand the competence of RL agents in complex decision-making tasks.
  • methods: The proposed framework is based on interestingness analysis and is applicable to a wide range of RL algorithms, natively supporting the popular RLLib toolkit.
  • results: The approach can identify agent behavior patterns and competency-controlling conditions, and the task elements mostly responsible for an agent's competence, based on global and local analyses of interestingness. The framework provides agent designers with insights about RL agent competence, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.
    Abstract In recent years, advances in deep learning have resulted in a plethora of successes in the use of reinforcement learning (RL) to solve complex sequential decision tasks with high-dimensional inputs. However, existing systems lack the necessary mechanisms to provide humans with a holistic view of their competence, presenting an impediment to their adoption, particularly in critical applications where the decisions an agent makes can have significant consequences. Yet, existing RL-based systems are essentially competency-unaware in that they lack the necessary interpretation mechanisms to allow human operators to have an insightful, holistic view of their competency. Towards more explainable Deep RL (xDRL), we propose a new framework based on analyses of interestingness. Our tool provides various measures of RL agent competence stemming from interestingness analysis and is applicable to a wide range of RL algorithms, natively supporting the popular RLLib toolkit. We showcase the use of our framework by applying the proposed pipeline in a set of scenarios of varying complexity. We empirically assess the capability of the approach in identifying agent behavior patterns and competency-controlling conditions, and the task elements mostly responsible for an agent's competence, based on global and local analyses of interestingness. Overall, we show that our framework can provide agent designers with insights about RL agent competence, both their capabilities and limitations, enabling more informed decisions about interventions, additional training, and other interactions in collaborative human-machine settings.
    摘要 近年来,深度学习的进展使强化学习(RL)在求解具有高维输入的复杂序贯决策任务上取得了大量成功。然而,现有系统缺乏向人类提供其能力全貌的必要机制,这阻碍了其落地应用,尤其是在智能体决策可能带来重大后果的关键场景中。现有基于 RL 的系统本质上是“能力无感”的:它们缺乏必要的解释机制,使人类操作者难以对其能力形成有洞察力的整体认识。为了迈向更可解释的深度强化学习(xDRL),我们提出了一个基于有趣度(interestingness)分析的新框架。该工具提供了源自有趣度分析的多种 RL 智能体能力度量,适用于广泛的 RL 算法,并原生支持流行的 RLLib 工具包。我们在一组复杂度各异的场景中应用所提流程,基于有趣度的全局与局部分析,实证评估了该方法在识别智能体行为模式、控制能力的条件以及对智能体能力影响最大的任务要素方面的能力。总体而言,该框架能让智能体设计者洞察 RL 智能体的能力及其局限,从而在人机协作场景中就干预、追加训练及其他交互做出更明智的决策。

Submodular Maximization under the Intersection of Matroid and Knapsack Constraints

  • paper_url: http://arxiv.org/abs/2307.09487
  • repo_url: None
  • paper_authors: Yu-Ran Gu, Chao Bian, Chao Qian
  • for: 本文旨在求解 $k$-matroid 约束与 $m$-knapsack 约束交集下的次模最大化(submodular maximization)问题。
  • methods: 作者提出了一种名为 SPROUT 的新算法,将部分枚举(partial enumeration)融入同步贪心(simultaneous greedy)框架来求解该问题。
  • results: 作者证明 SPROUT 可在多项式时间内取得优于现有最佳算法的近似保证;在此基础上引入随机枚举和平滑技术得到 SPROUT++ 算法,它能在保持相近近似保证的同时显著提升实际效率。
    Abstract Submodular maximization arises in many applications, and has attracted a lot of research attentions from various areas such as artificial intelligence, finance and operations research. Previous studies mainly consider only one kind of constraint, while many real-world problems often involve several constraints. In this paper, we consider the problem of submodular maximization under the intersection of two commonly used constraints, i.e., $k$-matroid constraint and $m$-knapsack constraint, and propose a new algorithm SPROUT by incorporating partial enumeration into the simultaneous greedy framework. We prove that SPROUT can achieve a polynomial-time approximation guarantee better than the state-of-the-art algorithms. Then, we introduce the random enumeration and smooth techniques into SPROUT to improve its efficiency, resulting in the SPROUT++ algorithm, which can keep a similar approximation guarantee. Experiments on the applications of movie recommendation and weighted max-cut demonstrate the superiority of SPROUT++ in practice.
    摘要 次模最大化出现在许多应用中,吸引了人工智能、金融和运筹学等多个领域的大量研究关注。以往研究大多只考虑单一类型的约束,而许多现实问题往往同时涉及多种约束。本文考虑在 $k$-matroid 约束与 $m$-knapsack 约束这两类常用约束的交集下的次模最大化问题,并提出一种将部分枚举融入同步贪心框架的新算法 SPROUT。我们证明 SPROUT 能在多项式时间内取得优于现有最佳算法的近似保证。随后,我们在 SPROUT 中引入随机枚举与平滑技术以提升效率,得到 SPROUT++ 算法,它能保持相近的近似保证。在电影推荐和加权最大割应用上的实验表明了 SPROUT++ 在实践中的优越性。
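  • 代码示意: 为把“拟阵 + 背包约束下的次模最大化”这一问题设定具体化,下面给出一个普通的代价敏感贪心基线(以覆盖函数为例);它不包含 SPROUT 的部分枚举、同步贪心与平滑技术,数据与代价均为假设,仅作问题示意。

```python
def greedy_submodular(items, f, costs, budget, k):
    """基数约束 k(最简单的拟阵)加单个背包预算下的代价敏感贪心基线。

    f: 单调次模的集合函数(此处用覆盖函数); costs: 元素代价; budget: 背包预算。
    每轮选择“边际增益 / 代价”最大的可行元素。
    """
    chosen, spent = [], 0.0
    while len(chosen) < k:
        best, best_ratio, base = None, 0.0, f(chosen)
        for e in items:
            if e in chosen or spent + costs[e] > budget:
                continue
            ratio = (f(chosen + [e]) - base) / costs[e]
            if ratio > best_ratio:
                best, best_ratio = e, ratio
        if best is None:
            break
        chosen.append(best); spent += costs[best]
    return chosen, f(chosen)

# 示例: 覆盖函数, 每个元素覆盖一组“用户”, 目标是覆盖尽可能多的用户
coverage = {0: {1, 2, 3}, 1: {3, 4}, 2: {5, 6, 7, 8}, 3: {1, 8, 9}, 4: {2, 4, 6}}
costs = {0: 2.0, 1: 1.0, 2: 3.0, 3: 1.5, 4: 2.5}
f = lambda S: len(set().union(*(coverage[e] for e in S))) if S else 0
print(greedy_submodular(list(coverage), f, costs, budget=5.0, k=3))
```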

On-the-fly machine learning for parametrization of the effective Hamiltonian

  • paper_url: http://arxiv.org/abs/2307.08929
  • repo_url: None
  • paper_authors: Xingyue Ma, L. Bellaiche, Di Wu, Yurong Yang
  • for: 本研究旨在开发一种基于机器学习的在线(on-the-fly)有效哈密顿量参数化方法,用于预测和模拟铁电体与弛豫铁电体的性质。
  • methods: 该方法使用贝叶斯线性回归来参数化有效哈密顿量,参数化在分子动力学模拟中完成,并在每一步预测能量、力和应力及其不确定性;当不确定性较大时,调用第一性原理计算来重新训练参数。
  • results: 该方法可为任何所考虑的体系(包括传统方法无法处理的复杂体系)自动计算有效哈密顿量参数。以 BaTiO3 和 Pb(Sc,Ta)O3 为例,其准确性与传统第一性原理参数化方法相当。
    Abstract The first-principles-based effective Hamiltonian is widely used to predict and simulate the properties of ferroelectrics and relaxor ferroelectrics. However, the parametrization method of the effective Hamiltonian is complicated and hardly can resolve the systems with complex interactions and/or complex components. Here, we developed an on-the-fly machine learning approach to parametrize the effective Hamiltonian based on Bayesian linear regression. The parametrization is completed in molecular dynamics simulations, with the energy, forces and stress predicted at each step along with their uncertainties. First-principles calculations are executed when the uncertainties are large to retrain the parameters. This approach provides a universal and automatic way to compute the effective Hamiltonian parameters for any considered systems including complex systems which previous methods can not handle. BaTiO3 and Pb(Sc,Ta)O3 are taken as examples to show the accurateness of this approach comparing with conventional first-principles parametrization method.
    摘要 基于第一性原理的有效哈密顿量被广泛用于预测和模拟铁电体与弛豫铁电体的性质。然而,有效哈密顿量的参数化方法十分复杂,难以处理具有复杂相互作用和/或复杂组分的体系。本文基于贝叶斯线性回归,开发了一种在线(on-the-fly)机器学习方法来参数化有效哈密顿量:参数化在分子动力学模拟中完成,每一步都预测能量、力和应力及其不确定性;当不确定性较大时,执行第一性原理计算并重新训练参数。该方法为任何所考虑的体系(包括以往方法无法处理的复杂体系)提供了一种通用且自动化的有效哈密顿量参数计算途径。我们以 BaTiO3 和 Pb(Sc,Ta)O3 为例,将该方法与传统第一性原理参数化方法对比,展示了其准确性。
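  • 代码示意: 下面用 scikit-learn 的 BayesianRidge 勾勒“在线参数化”的主循环:在模拟的时间步中预测能量及其不确定性,当不确定性超过阈值时调用第一性原理计算(此处用玩具函数代替)并重新拟合;描述符维度、阈值等均为假设,所有物理细节均被省略。

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

def dft_energy(x):
    """占位: 用一个简单函数假装是昂贵的第一性原理计算。"""
    return 0.5 * (x ** 2).sum() + 0.1 * np.sin(x).sum()

rng = np.random.default_rng(0)
X_train = rng.normal(size=(8, 6))                        # 初始少量“第一性原理”数据
y_train = np.array([dft_energy(x) for x in X_train])
model = BayesianRidge().fit(X_train, y_train)

threshold, n_dft_calls = 0.3, 0
for step in range(200):                                  # 模拟分子动力学的时间步
    x = rng.normal(size=(1, 6))                          # 假设: 当前构型的描述符
    e_pred, e_std = model.predict(x, return_std=True)
    if e_std[0] > threshold:                             # 不确定性过大, 触发重训练
        X_train = np.vstack([X_train, x])
        y_train = np.append(y_train, dft_energy(x[0]))
        model.fit(X_train, y_train)
        n_dft_calls += 1
print("触发第一性原理计算的次数:", n_dft_calls)
```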

Federated Large Language Model: A Position Paper

  • paper_url: http://arxiv.org/abs/2307.08925
  • repo_url: None
  • paper_authors: Chaochao Chen, Xiaohua Feng, Jun Zhou, Jianwei Yin, Xiaolin Zheng
  • for: 本研究旨在解决大规模语言模型(LLM)开发在现实应用中遇到的挑战,例如公共领域数据的稀缺,以及对私有领域数据隐私的保护需求。
  • methods: 本研究提出了“联邦大语言模型”(federated LLM)的概念,它包含三个关键组成部分:联邦 LLM 预训练、联邦 LLM 微调和联邦 LLM 提示工程。对每个组成部分,文中讨论了其相对传统 LLM 训练方法的优势,并提出了具体的工程实现策略。
  • results: 本研究阐述了联邦 LLM 的优点,包括应对现实应用中的挑战并兼顾隐私与数据安全;同时也指出 FL 与 LLM 的结合会引入新的挑战与障碍,并分析了现有解决方案及其潜在局限。
    Abstract Large scale language models (LLM) have received significant attention and found diverse applications across various domains, but their development encounters challenges in real-world scenarios. These challenges arise due to the scarcity of public domain data availability and the need to maintain privacy with respect to private domain data. To address these issues, federated learning (FL) has emerged as a promising technology that enables collaborative training of shared models while preserving decentralized data. We propose the concept of federated LLM, which comprises three key components, i.e., federated LLM pre-training, federated LLM fine-tuning, and federated LLM prompt engineering. For each component, we discuss its advantage over traditional LLM training methods and propose specific engineering strategies for implementation. Furthermore, we explore the novel challenges introduced by the integration of FL and LLM. We analyze existing solutions and identify potential obstacles faced by these solutions within the context of federated LLM.
    摘要 大规模语言模型(LLM)受到了广泛关注,并在各个领域获得了多样化的应用,但其开发在现实场景中面临挑战:公共领域数据稀缺,且需要对私有领域数据保持隐私。为了解决这些问题,联邦学习(FL)作为一项有前景的技术应运而生,它能够在保持数据去中心化的同时协作训练共享模型。我们提出了联邦 LLM 的概念,它包含三个关键组成部分:联邦 LLM 预训练、联邦 LLM 微调和联邦 LLM 提示工程。针对每个组成部分,我们讨论了其相对传统 LLM 训练方法的优势,并提出了具体的工程实现策略。此外,我们还探讨了 FL 与 LLM 结合所带来的新挑战,分析了现有解决方案,并指出了这些方案在联邦 LLM 情境下可能面临的障碍。

Learning to Sample Tasks for Meta Learning

  • paper_url: http://arxiv.org/abs/2307.08924
  • repo_url: https://github.com/ZJLAB-AMMI/HS-OMRL
  • paper_authors: Jingyao Wang, Zeen Song, Xingzhe Su, Lingyu Si, Hongwei Dong, Wenwen Qiang, Changwen Zheng
  • for: 通过对多种元学习方法、任务采样器和小样本学习任务的实验,本文得出三个结论:首先,不存在能够普遍保证元学习模型性能的任务采样策略;其次,任务多样性可能导致模型在训练中欠拟合或过拟合;最后,模型的泛化性能受任务散度、任务熵和任务难度的影响。
  • methods: 作者提出了一种名为自适应采样器(Adaptive Sampler,ASr)的新任务采样器,它依据任务散度、任务熵和任务难度来采样任务;为了优化 ASr,作者还提出了一种简单而通用的元学习算法。
  • results: 大量实验证明了所提 ASr 的有效性。
    Abstract Through experiments on various meta-learning methods, task samplers, and few-shot learning tasks, this paper arrives at three conclusions. Firstly, there are no universal task sampling strategies to guarantee the performance of meta-learning models. Secondly, task diversity can cause the models to either underfit or overfit during training. Lastly, the generalization performance of the models are influenced by task divergence, task entropy, and task difficulty. In response to these findings, we propose a novel task sampler called Adaptive Sampler (ASr). ASr is a plug-and-play task sampler that takes task divergence, task entropy, and task difficulty to sample tasks. To optimize ASr, we rethink and propose a simple and general meta-learning algorithm. Finally, a large number of empirical experiments demonstrate the effectiveness of the proposed ASr.
    摘要 通过对多种元学习方法、任务采样器和小样本学习任务的实验,本文得出三个结论。首先,不存在能够普遍保证元学习模型性能的任务采样策略;其次,任务多样性可能使模型在训练中欠拟合或过拟合;最后,模型的泛化性能受任务散度、任务熵和任务难度的影响。针对这些发现,我们提出了一种名为自适应采样器(Adaptive Sampler,ASr)的新任务采样器。ASr 是一个即插即用的任务采样器,依据任务散度、任务熵和任务难度来采样任务。为了优化 ASr,我们重新思考并提出了一种简单而通用的元学习算法。最后,大量实证实验证明了所提 ASr 的有效性。

Optimistic Estimate Uncovers the Potential of Nonlinear Models

  • paper_url: http://arxiv.org/abs/2307.08921
  • repo_url: None
  • paper_authors: Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu
  • for: 提出一种乐观估计,用于评估非线性模型可能达到的最佳拟合性能。
  • methods: 给出乐观样本量估计,即用非线性模型拟合/恢复目标函数所需的最小样本量。
  • results: 对矩阵分解模型、深度模型和深度神经网络(DNN)估计了乐观样本量,并证实这些模型可在过参数化下拟合特定的目标子集。此外,研究还揭示了 DNN 的两个特殊性质,即宽度上的自由表达能力与连接上的高代价表达能力,由此得到 DNN 的结构设计原则:(一)可放心添加神经元/卷积核;(二)应克制神经元之间的连接。基于这一框架,作者预计未来能更深入理解众多非线性模型为何以及如何在实践中有效发挥其潜力。
    Abstract We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size that quantifies the smallest possible sample size to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convolutional architecture. For each nonlinear model, our estimates predict a specific subset of targets that can be fitted at overparameterization, which are confirmed by our experiments. Our optimistic estimate reveals two special properties of the DNN models -- free expressiveness in width and costly expressiveness in connection. These properties suggest the following architecture design principles of DNNs: (i) feel free to add neurons/kernels; (ii) restrain from connecting neurons. Overall, our optimistic estimate theoretically unveils the vast potential of nonlinear models in fitting at overparameterization. Based on this framework, we anticipate gaining a deeper understanding of how and why numerous nonlinear models such as DNNs can effectively realize their potential in practice in the near future.
    摘要 我们提出了一种乐观估计,用于评估非线性模型可能达到的最佳拟合性能。它给出一个乐观样本量,刻画用非线性模型拟合/恢复目标函数所需的最小样本量。我们对矩阵分解模型、深度模型以及全连接或卷积结构的深度神经网络(DNN)估计了乐观样本量。对每种非线性模型,我们的估计都预测出一个可在过参数化下被拟合的特定目标子集,并得到了实验的证实。我们的乐观估计揭示了 DNN 模型的两个特殊性质:宽度上的自由表达能力与连接上的高代价表达能力。这些性质启示了如下 DNN 结构设计原则:(一)可以放心地增加神经元/卷积核;(二)应克制神经元之间的连接。总体而言,我们的乐观估计从理论上揭示了非线性模型在过参数化下拟合的巨大潜力。基于这一框架,我们期望在不久的将来更深入地理解 DNN 等众多非线性模型如何以及为何能在实践中有效发挥其潜力。

Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

  • paper_url: http://arxiv.org/abs/2307.08920
  • repo_url: None
  • paper_authors: Brent A. Wallace, Jennie Si
  • for: 本文旨在提出一类新的连续时间非线性最优控制设计算法,用于控制仿射非线性系统。
  • methods: 该方法将物理系统分解为更小的子问题,并引入一种新的激励框架,以改善持续激励(persistence of excitation)与数值条件性能。
  • results: 这些算法提供了收敛性和闭环稳定性保证,并在控制一架不稳定、非最小相位的高超声速飞行器(HSV)上进行了示例应用。
    Abstract Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. However, a recent comprehensive analysis of state-of-the-art continuous-time RL (CT-RL) methods, namely, adaptive dynamic programming (ADP)-based CT-RL algorithms, reveals they face significant design challenges due to their complexity, numerical conditioning, and dimensional scaling issues. Despite advanced theoretical results, existing ADP CT-RL synthesis methods are inadequate in solving even small, academic problems. The goal of this work is thus to introduce a suite of new CT-RL algorithms for control of affine nonlinear systems. Our design approach relies on two important factors. First, our methods are applicable to physical systems that can be partitioned into smaller subproblems. This constructive consideration results in reduced dimensionality and greatly improved intuitiveness of design. Second, we introduce a new excitation framework to improve persistence of excitation (PE) and numerical conditioning performance via classical input/output insights. Such a design-centric approach is the first of its kind in the ADP CT-RL community. In this paper, we progressively introduce a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms. We provide convergence and closed-loop stability guarantees, and we demonstrate these guarantees on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV).
    摘要 连续时间非线性最优控制问题在现实应用中前景广阔。经过数十年的发展,强化学习(RL)作为一种通用的非线性控制设计方法取得了不少重大成功。然而,最近对最先进的连续时间 RL(CT-RL)方法(即基于自适应动态规划 ADP 的 CT-RL 算法)的全面分析表明,它们因复杂性、数值条件以及维度扩展问题而面临严重的设计挑战。尽管理论结果先进,现有的 ADP CT-RL 综合方法甚至难以求解小规模的学术问题。因此,本工作的目标是提出一套用于仿射非线性系统控制的新 CT-RL 算法。我们的设计依赖两个重要因素:其一,方法适用于可划分为更小子问题的物理系统,这种构造性考虑降低了维度并大幅提升了设计的直观性;其二,我们借助经典输入/输出洞见引入一种新的激励框架,以改善持续激励(PE)和数值条件性能。这种以设计为中心的思路在 ADP CT-RL 社区中尚属首次。本文逐步给出一套(去中心化的)可激励积分强化学习(EIRL)算法,提供了收敛性和闭环稳定性保证,并在控制一架不稳定、非最小相位的高超声速飞行器(HSV)这一重要应用问题上验证了这些保证。

Accuracy versus time frontiers of semi-supervised and self-supervised learning on medical images

  • paper_url: http://arxiv.org/abs/2307.08919
  • repo_url: https://github.com/tufts-ml/ssl-vs-ssl-benchmark
  • paper_authors: Zhe Huang, Ruijie Jiang, Shuchin Aeron, Michael C. Hughes
  • for: 本研究面向资源受限、注重结果的医学图像分类场景,考察利用自监督学习与半监督学习提升分类器性能的可行性。
  • methods: 本研究在 3 个医学图像数据集上比较了 6 种半监督方法和 5 种自监督方法,并以仅使用标注数据的强基线作为对照。
  • results: 研究发现,MixMatch、SimCLR 和 BYOL 是表现突出的选择,可在数小时内取得优秀的性能;在此基础上,通过选择合适的超参数并投入更多训练时间,还能获得进一步的小幅提升。
    Abstract For many applications of classifiers to medical images, a trustworthy label for each image can be difficult or expensive to obtain. In contrast, images without labels are more readily available. Two major research directions both promise that additional unlabeled data can improve classifier performance: self-supervised learning pretrains useful representations on unlabeled data only, then fine-tunes a classifier on these representations via the labeled set; semi-supervised learning directly trains a classifier on labeled and unlabeled data simultaneously. Recent methods from both directions have claimed significant gains on non-medical tasks, but do not systematically assess medical images and mostly compare only to methods in the same direction. This study contributes a carefully-designed benchmark to help answer a practitioner's key question: given a small labeled dataset and a limited budget of hours to spend on training, what gains from additional unlabeled images are possible and which methods best achieve them? Unlike previous benchmarks, ours uses realistic-sized validation sets to select hyperparameters, assesses runtime-performance tradeoffs, and bridges two research fields. By comparing 6 semi-supervised methods and 5 self-supervised methods to strong labeled-only baselines on 3 medical datasets with 30-1000 labels per class, we offer insights to resource-constrained, results-focused practitioners: MixMatch, SimCLR, and BYOL represent strong choices that were not surpassed by more recent methods. After much effort selecting hyperparameters on one dataset, we publish settings that enable strong methods to perform well on new medical tasks within a few hours, with further search over dozens of hours delivering modest additional gains.
    摘要 在许多将分类器应用于医学图像的场景中,为每幅图像获取可信的标签可能困难或代价高昂;相比之下,无标签图像更容易获得。两大研究方向都宣称额外的无标签数据可以提升分类器性能:自监督学习先仅在无标签数据上预训练有用的表示,再利用标注集在这些表示上微调分类器;半监督学习则直接同时在标注与无标签数据上训练分类器。两个方向的近期方法在非医学任务上均声称取得了显著增益,但它们并未系统地评估医学图像,而且大多只与同方向的方法比较。本研究贡献了一个精心设计的基准,以回答从业者的关键问题:给定一个小的标注数据集和有限的训练时间预算,额外的无标签图像能带来多大增益、哪些方法能最好地实现这些增益?与以往基准不同,我们使用贴近现实规模的验证集来选择超参数,评估运行时间与性能的权衡,并打通了两个研究领域。通过在 3 个每类 30-1000 个标签的医学数据集上,将 6 种半监督方法和 5 种自监督方法与强有力的仅标注基线进行比较,我们为资源受限、注重结果的从业者提供了洞见:MixMatch、SimCLR 和 BYOL 是未被更新方法超越的有力选择。在一个数据集上花费大量精力选择超参数后,我们公开了相应设置,使强方法能在数小时内在新的医学任务上取得良好表现;再经过数十小时的进一步搜索,还能带来适度的额外增益。

Towards the Sparseness of Projection Head in Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.08913
  • repo_url: None
  • paper_authors: Zeen Song, Xingzhe Su, Jingyao Wang, Wenwen Qiang, Changwen Zheng, Fuchun Sun
  • for: 提高自动学习(SSL)方法中的表示性能
  • methods: 结合实证分析与理论研究,探讨对比学习中投影头的内部机制及其与维度坍缩现象之间的关系
  • results: 提出了一种假设,即在最小化小批量数据的对比损失时只需要一部分特征;理论分析进一步表明稀疏的投影头有助于泛化,据此提出了一种名为SparseHead的正则项,用于约束投影头的稀疏性,可与任意SSL方法无缝结合,从而提高其表示性能。
    Abstract In recent years, self-supervised learning (SSL) has emerged as a promising approach for extracting valuable representations from unlabeled data. One successful SSL method is contrastive learning, which aims to bring positive examples closer while pushing negative examples apart. Many current contrastive learning approaches utilize a parameterized projection head. Through a combination of empirical analysis and theoretical investigation, we provide insights into the internal mechanisms of the projection head and its relationship with the phenomenon of dimensional collapse. Our findings demonstrate that the projection head enhances the quality of representations by performing contrastive loss in a projected subspace. Therefore, we propose an assumption that only a subset of features is necessary when minimizing the contrastive loss of a mini-batch of data. Theoretical analysis further suggests that a sparse projection head can enhance generalization, leading us to introduce SparseHead - a regularization term that effectively constrains the sparsity of the projection head, and can be seamlessly integrated with any self-supervised learning (SSL) approaches. Our experimental results validate the effectiveness of SparseHead, demonstrating its ability to improve the performance of existing contrastive methods.
    摘要
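
A minimal sketch of the idea described above, assuming a standard InfoNCE contrastive setup: an L1 penalty on the projection head's weights stands in for a SparseHead-style sparsity constraint. The head architecture, penalty weight, and loss form are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: InfoNCE contrastive loss plus an L1 sparsity penalty on the
# projection head, in the spirit of a SparseHead-style regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionHead(nn.Module):
    def __init__(self, dim_in=512, dim_hidden=512, dim_out=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, dim_hidden), nn.ReLU(),
            nn.Linear(dim_hidden, dim_out),
        )
    def forward(self, h):
        return self.net(h)

def info_nce(z1, z2, temperature=0.5):
    # z1, z2: projections of two augmented views, shape (N, D)
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # (N, N) similarity matrix
    labels = torch.arange(z1.size(0))       # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

def sparsehead_style_loss(head, h1, h2, sparse_lambda=1e-3):
    z1, z2 = head(h1), head(h2)
    l1 = sum(p.abs().sum() for p in head.parameters())   # sparsity penalty (assumed form)
    return info_nce(z1, z2) + sparse_lambda * l1

head = ProjectionHead()
h1, h2 = torch.randn(32, 512), torch.randn(32, 512)      # toy backbone features
loss = sparsehead_style_loss(head, h1, h2)
loss.backward()
```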

Sharpness-Aware Graph Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2307.08910
  • repo_url: None
  • paper_authors: Huiyuan Chen, Chin-Chia Michael Yeh, Yujie Fan, Yan Zheng, Junpeng Wang, Vivian Lai, Mahashweta Das, Hao Yang
  • for: 提高Graph Neural Networks(GNNs)在协同缓存中的表现。
  • methods: 提出了一种有效的训练方案 gSAM,基于权重损失地形(loss landscape)的平坦性来优化GNNs。
  • results: 实验结果表明,gSAM可以提高GNNs的表现。
    Abstract Graph Neural Networks (GNNs) have achieved impressive performance in collaborative filtering. However, GNNs tend to yield inferior performance when the distributions of training and test data are not aligned well. Also, training GNNs requires optimizing non-convex neural networks with an abundance of local and global minima, which may differ widely in their performance at test time. Thus, it is essential to choose the minima carefully. Here we propose an effective training schema, called gSAM, under the principle that flatter minima have a better generalization ability than sharper ones. To achieve this goal, gSAM regularizes the flatness of the weight loss landscape by forming a bi-level optimization: the outer problem conducts the standard model training while the inner problem helps the model jump out of the sharp minima. Experimental results show the superiority of our gSAM.
    摘要 图神经网络(GNNs)在协同过滤中表现出色。然而,当训练和测试数据分布不匹配时,GNNs往往表现不佳。此外,训练GNNs需要优化非凸神经网络,其存在大量局部和全局极小值,这些极小值在测试时的表现可能差异很大。因此,谨慎地选择极小值非常重要。我们提出了一种有效的训练方案 gSAM,其原则是更平坦的极小值比更尖锐的极小值具有更好的泛化能力。为实现这一目标,gSAM通过构建双层优化来正则化权重损失地形的平坦性:外层问题进行标准的模型训练,而内层问题帮助模型跳出尖锐的极小值。实验结果显示了gSAM的优越性。
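
The abstract does not spell out gSAM's exact bi-level update, so the following is a generic sharpness-aware (SAM-style) step illustrating the flat-minima principle it builds on; the perturbation radius `rho` and the toy model are assumptions.

```python
# Sketch of a generic SAM-style update: perturb the parameters toward the
# locally "sharp" direction, then descend using the gradient evaluated at
# the perturbed point.
import torch

def sam_step(model, loss_fn, data, target, base_opt, rho=0.05):
    # 1) gradient at the current weights
    loss_fn(model(data), target).backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12

    # 2) ascend to the nearby worst-case point w + rho * g / ||g||
    with torch.no_grad():
        eps = [rho * g / norm for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # 3) gradient at the perturbed weights, then undo the perturbation
    base_opt.zero_grad()
    loss_fn(model(data), target).backward()
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)

    # 4) standard descent step using the sharpness-aware gradient
    base_opt.step()
    base_opt.zero_grad()

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 10), torch.randint(0, 2, (16,))
sam_step(model, torch.nn.functional.cross_entropy, x, y, opt)
```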

Solving multiphysics-based inverse problems with learned surrogates and constraints

  • paper_url: http://arxiv.org/abs/2307.11099
  • repo_url: None
  • paper_authors: Ziyi Yin, Rafael Orozco, Mathias Louboutin, Felix J. Herrmann
  • for: 这 paper 是用于解决地质碳存储监测中的多物理 inverse problem 的。
  • methods: 这 paper 使用了 computationally cheap 的 learned surrogates 和 learned constraints 来解决这些问题。
  • results: 结果表明,这种组合可以大幅改进对关键流体流动属性(渗透率)的反演,并且可以联合反演多模态数据,包括井中测量和主动源时移地震数据。此外,由于使用了训练好的深度神经网络(正规化流)来约束模型迭代,该方法在保持计算可行的同时仍能保证准确性。
    Abstract Solving multiphysics-based inverse problems for geological carbon storage monitoring can be challenging when multimodal time-lapse data are expensive to collect and costly to simulate numerically. We overcome these challenges by combining computationally cheap learned surrogates with learned constraints. Not only does this combination lead to vastly improved inversions for the important fluid-flow property, permeability, it also provides a natural platform for inverting multimodal data including well measurements and active-source time-lapse seismic data. By adding a learned constraint, we arrive at a computationally feasible inversion approach that remains accurate. This is accomplished by including a trained deep neural network, known as a normalizing flow, which forces the model iterates to remain in-distribution, thereby safeguarding the accuracy of trained Fourier neural operators that act as surrogates for the computationally expensive multiphase flow simulations involving partial differential equation solves. By means of carefully selected experiments, centered around the problem of geological carbon storage, we demonstrate the efficacy of the proposed constrained optimization method on two different data modalities, namely time-lapse well and time-lapse seismic data. While permeability inversions from both these two modalities have their pluses and minuses, their joint inversion benefits from either, yielding valuable superior permeability inversions and CO2 plume predictions near, and far away, from the monitoring wells.
    摘要 当多模态时移数据的采集成本高昂、数值模拟代价巨大时,求解地质碳储监测中基于多物理场的反演问题会非常困难。我们将计算廉价的学习代理模型与学习约束相结合来克服这些挑战。这一组合不仅大幅改进了对关键流体流动属性——渗透率——的反演,还为联合反演多模态数据(包括井中测量和主动源时移地震数据)提供了自然的平台。通过加入学习约束,我们得到了一种在保持准确性的同时计算可行的反演方法:其中引入了训练好的深度神经网络(正规化流),迫使模型迭代保持在训练分布之内,从而保障了作为多相流偏微分方程模拟代理的傅里叶神经算子的准确性。围绕地质碳储问题精心设计的实验表明,所提出的约束优化方法在时移井数据和时移地震数据两种模态上均有效。虽然这两种模态各自的渗透率反演各有优缺点,但它们的联合反演兼取两者之长,在监测井附近及远处都能得到更优的渗透率反演和CO2羽流预测。

Basal-Bolus Advisor for Type 1 Diabetes (T1D) Patients Using Multi-Agent Reinforcement Learning (RL) Methodology

  • paper_url: http://arxiv.org/abs/2307.08897
  • repo_url: None
  • paper_authors: Mehrad Jaloli, Marzia Cescon
  • for: 本研究旨在开发一种基于多智能体强化学习(RL)的个性化血糖控制方法,以改善1型糖尿病(T1D)患者的血糖控制。
  • methods: 该方法使用一个闭环系统,包括血糖代谢模型和作为基础-餐时胰岛素顾问的多智能体 soft actor-critic RL模型。
  • results: 研究结果表明,基于RL的基础-餐时胰岛素顾问可以有效改善血糖控制,降低血糖波动性,并增加血糖水平处于目标范围(70-180 mg/dL)内的时间。同时,RL方法可以有效预防低血糖事件,并减少严重高血糖事件。此外,与常规疗法相比,RL方法还显著降低了每日基础胰岛素剂量。这些发现表明RL方法可以帮助1型糖尿病患者实现更好的血糖控制,并降低严重高血糖的风险。
    Abstract This paper presents a novel multi-agent reinforcement learning (RL) approach for personalized glucose control in individuals with type 1 diabetes (T1D). The method employs a closed-loop system consisting of a blood glucose (BG) metabolic model and a multi-agent soft actor-critic RL model acting as the basal-bolus advisor. Performance evaluation is conducted in three scenarios, comparing the RL agents to conventional therapy. Evaluation metrics include glucose levels (minimum, maximum, and mean), time spent in different BG ranges, and average daily bolus and basal insulin dosages. Results demonstrate that the RL-based basal-bolus advisor significantly improves glucose control, reducing glycemic variability and increasing time spent within the target range (70-180 mg/dL). Hypoglycemia events are effectively prevented, and severe hyperglycemia events are reduced. The RL approach also leads to a statistically significant reduction in average daily basal insulin dosage compared to conventional therapy. These findings highlight the effectiveness of the multi-agent RL approach in achieving better glucose control and mitigating the risk of severe hyperglycemia in individuals with T1D.
    摘要

Evaluating unsupervised disentangled representation learning for genomic discovery and disease risk prediction

  • paper_url: http://arxiv.org/abs/2307.08893
  • repo_url: None
  • paper_authors: Taedong Yun
  • for: 这个论文主要是为了研究高维клиниче数据中的生物marks,以及使用深度学习技术进行遗传学研究。
  • methods: 这个论文使用了多种无监督学习方法,包括自动编码器、VAE、β-VAE和factorVAE,以学习分离的表示。
  • results: 研究发现,与标准VAE或非变分自编码器相比,使用FactorVAE或β-VAE可以改进遗传学研究的结果,包括全基因组显著位点数量、遗传力(heritability)以及多基因风险评分的预测性能。FactorVAE在不同的正则化超参数取值下均表现良好,而β-VAE对超参数取值更为敏感。
    Abstract High-dimensional clinical data have become invaluable resources for genetic studies, due to their accessibility in biobank-scale datasets and the development of high performance modeling techniques especially using deep learning. Recent work has shown that low dimensional embeddings of these clinical data learned by variational autoencoders (VAE) can be used for genome-wide association studies and polygenic risk prediction. In this work, we consider multiple unsupervised learning methods for learning disentangled representations, namely autoencoders, VAE, beta-VAE, and FactorVAE, in the context of genetic association studies. Using spirograms from UK Biobank as a running example, we observed improvements in the number of genome-wide significant loci, heritability, and performance of polygenic risk scores for asthma and chronic obstructive pulmonary disease by using FactorVAE or beta-VAE, compared to standard VAE or non-variational autoencoders. FactorVAEs performed effectively across multiple values of the regularization hyperparameter, while beta-VAEs were much more sensitive to the hyperparameter values.
    摘要 高维临床数据已成为遗传学研究中不可或缺的资源,这主要归功于生物样本库规模数据集的可获取性以及高性能建模技术(尤其是深度学习)的发展。近期研究表明,利用变分自编码器(VAE)学习的这类临床数据的低维嵌入可以用于全基因组关联研究和多基因风险预测。在本工作中,我们在遗传关联研究的背景下考察了多种学习解耦表示的无监督方法,包括自编码器、VAE、β-VAE 和 FactorVAE。以 UK Biobank 的肺量图(spirogram)为例,我们发现与标准 VAE 或非变分自编码器相比,使用 FactorVAE 或 β-VAE 可以提高哮喘和慢性阻塞性肺疾病的全基因组显著位点数量、遗传力以及多基因风险评分的表现。FactorVAE 在多个正则化超参数取值下均表现有效,而 β-VAE 对超参数取值更为敏感。
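
For reference, a minimal beta-VAE objective of the kind compared in this study: the KL term is scaled by beta, and beta = 1 recovers a standard VAE. Dimensions, the value of beta, and the toy encoder/decoder are illustrative, not the study's configuration.

```python
# Minimal beta-VAE loss sketch: a standard VAE objective with the KL term
# scaled by beta, the knob tuned when learning disentangled embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    def __init__(self, dim_in=256, dim_z=32):
        super().__init__()
        self.enc = nn.Linear(dim_in, 2 * dim_z)   # outputs mean and log-variance
        self.dec = nn.Linear(dim_z, dim_in)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

def beta_vae_loss(model, x, beta=4.0):
    recon, mu, logvar = model(x)
    recon_loss = F.mse_loss(recon, x, reduction="sum") / x.size(0)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1).mean()
    return recon_loss + beta * kl     # beta = 1 recovers the standard VAE

x = torch.randn(64, 256)              # stand-in for spirogram-derived features
loss = beta_vae_loss(BetaVAE(), x)
loss.backward()
```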

The Predicted-Deletion Dynamic Model: Taking Advantage of ML Predictions, for Free

  • paper_url: http://arxiv.org/abs/2307.08890
  • repo_url: None
  • paper_authors: Quanquan C. Liu, Vaidehi Srinivas
  • for: 这篇 paper 的目的是利用对动态图中边更新的预测来提高动态算法的效率。
  • methods: 这篇 paper 提出了预测删除动态模型(Predicted-Deletion Dynamic Model),并基于该模型给出了一种框架,能够以很小的开销将部分动态算法"升级"到完全动态设定中。
  • results: 这篇 paper 的算法在多种问题上都取得了更好的效率界:当预测质量较高时,其摊还更新时间以高概率与部分动态算法相当;当预测质量较低时,其表现也不差于现有的完全动态算法。
    Abstract The main bottleneck in designing efficient dynamic algorithms is the unknown nature of the update sequence. In particular, there are some problems, like 3-vertex connectivity, planar digraph all pairs shortest paths, and others, where the separation in runtime between the best partially dynamic solutions and the best fully dynamic solutions is polynomial, sometimes even exponential. In this paper, we formulate the predicted-deletion dynamic model, motivated by a recent line of empirical work about predicting edge updates in dynamic graphs. In this model, edges are inserted and deleted online, and when an edge is inserted, it is accompanied by a "prediction" of its deletion time. This models real world settings where services may have access to historical data or other information about an input and can subsequently use such information make predictions about user behavior. The model is also of theoretical interest, as it interpolates between the partially dynamic and fully dynamic settings, and provides a natural extension of the algorithms with predictions paradigm to the dynamic setting. We give a novel framework for this model that "lifts" partially dynamic algorithms into the fully dynamic setting with little overhead. We use our framework to obtain improved efficiency bounds over the state-of-the-art dynamic algorithms for a variety of problems. In particular, we design algorithms that have amortized update time that scales with a partially dynamic algorithm, with high probability, when the predictions are of high quality. On the flip side, our algorithms do no worse than existing fully-dynamic algorithms when the predictions are of low quality. Furthermore, our algorithms exhibit a graceful trade-off between the two cases. Thus, we are able to take advantage of ML predictions asymptotically "for free.''
    摘要 设计高效动态算法的主要瓶颈在于更新序列的未知性。特别是对某些问题,如3-点连通性、平面有向图全对最短路等,最佳部分动态解法与最佳完全动态解法之间的运行时间差距是多项式级的,有时甚至是指数级的。在这篇论文中,我们提出了预测删除动态模型,其动机来自近期关于预测动态图中边更新的一系列实证工作。在该模型中,边在线插入和删除,且每条边插入时都附带一个对其删除时间的"预测"。这刻画了现实场景:服务可以利用历史数据或其他信息来预测用户行为。该模型在理论上也很有意义,因为它在部分动态和完全动态设定之间进行插值,并为带预测的算法范式提供了向动态设定的自然扩展。我们给出了一个新的框架,能够以很小的开销将部分动态算法"提升"到完全动态设定。利用该框架,我们对多种问题获得了优于现有动态算法的效率界:当预测质量高时,我们的算法以高概率获得与部分动态算法相当的摊还更新时间;当预测质量低时,我们的算法也不差于现有的完全动态算法;并且在两种情形之间具有平滑的折中。因此,我们能够渐近地"免费"利用机器学习预测。

Examining the Effects of Degree Distribution and Homophily in Graph Learning Models

  • paper_url: http://arxiv.org/abs/2307.08881
  • repo_url: https://github.com/google-research/graphworld
  • paper_authors: Mustafa Yasir, John Palowitch, Anton Tsitsulin, Long Tran-Thanh, Bryan Perozzi
  • for: This paper aims to improve the evaluation of graph neural network (GNN) models by expanding the coverage of graph space within the GraphWorld framework.
  • methods: The paper uses three synthetic graph generators: the Stochastic Block Model (SBM), LFR, and CABAM. These generators are integrated into the GraphWorld framework to create more diverse populations of synthetic graphs for benchmarking GNN tasks.
  • results: The paper generates 300,000 graphs to benchmark 11 GNN models on a node classification task, and finds variations in GNN performance in response to homophily, degree distribution, and feature signal. The paper classifies GNN models based on their sensitivity to the new generators under these properties.
  • for: 这篇论文目标是提高图神经网络(GNN)模型的评估,通过扩展图WORLD框架中的图空间覆盖。
  • methods: 这篇论文使用了三种 sintetic 图生成器:Stochastic Block Model(SBM)、LFR 和 CABAM。这些生成器被 integrate 到图WORLD框架中,以创造更多的多样化的 sintetic 图来 benchmark GNN 任务。
  • results: 这篇论文通过生成 300,000 个图,对 11 种 GNN 模型进行节点分类任务的评估,发现 GNN 性能响应于同类性、度分布和特征信号。 paper 根据这些特性将 GNN 模型分类为敏感性。
    Abstract Despite a surge in interest in GNN development, homogeneity in benchmarking datasets still presents a fundamental issue to GNN research. GraphWorld is a recent solution which uses the Stochastic Block Model (SBM) to generate diverse populations of synthetic graphs for benchmarking any GNN task. Despite its success, the SBM imposed fundamental limitations on the kinds of graph structure GraphWorld could create. In this work we examine how two additional synthetic graph generators can improve GraphWorld's evaluation; LFR, a well-established model in the graph clustering literature and CABAM, a recent adaptation of the Barabasi-Albert model tailored for GNN benchmarking. By integrating these generators, we significantly expand the coverage of graph space within the GraphWorld framework while preserving key graph properties observed in real-world networks. To demonstrate their effectiveness, we generate 300,000 graphs to benchmark 11 GNN models on a node classification task. We find GNN performance variations in response to homophily, degree distribution and feature signal. Based on these findings, we classify models by their sensitivity to the new generators under these properties. Additionally, we release the extensions made to GraphWorld on the GitHub repository, offering further evaluation of GNN performance on new graphs.
    摘要 尽管GNN发展中的兴趣增长,仍然存在基本问题,即 benchmarking 数据集的同质性。GraphWorld 是一种最近的解决方案,使用 Stochastic Block Model(SBM)生成多样化的 synthetic graph 用于任何 GNN 任务的 benchmarking。尽管它成功,但 SBM 强制性限制 GraphWorld 可以创建的图结构类型。在这种工作中,我们检查了两种额外的 sintethic graph 生成器是如何提高 GraphWorld 的评估。LFR 是一种已有的图分群模型,CABAM 是一种对 Barabasi-Albert 模型的最近适应,专门用于 GNN benchmarking。通过将这些生成器纳入 GraphWorld 框架,我们可以覆盖图空间的扩展,保持真实世界网络中观察到的关键图属性。为了证明其效果,我们生成了 300,000 个图用于对 11 种 GNN 模型进行节点分类任务的 benchmarking。我们发现 GNN 模型在同质性、度分布和特征信号下的性能变化。根据这些发现,我们将模型分为它们对新生成器下的性能响应。此外,我们在 GitHub 仓库中发布了 GraphWorld 的扩展,以便进一步评估 GNN 性能在新的图上。
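
A small sketch of the kind of synthetic graph the benchmark relies on: an SBM graph generated with networkx, with edge homophily and degree statistics measured afterwards. Block sizes and edge probabilities are arbitrary illustrative values, not GraphWorld's actual sampling ranges.

```python
# Generate a Stochastic Block Model graph and measure two properties the
# benchmark varies: edge homophily and the degree distribution.
import networkx as nx
import numpy as np

sizes = [50, 50, 50]                       # three communities
p_in, p_out = 0.10, 0.01                   # intra- vs inter-community edge prob.
probs = [[p_in if i == j else p_out for j in range(3)] for i in range(3)]
g = nx.stochastic_block_model(sizes, probs, seed=0)

labels = {n: d["block"] for n, d in g.nodes(data=True)}
same = sum(labels[u] == labels[v] for u, v in g.edges())
homophily = same / g.number_of_edges()     # fraction of intra-community edges
degrees = np.array([d for _, d in g.degree()])

print(f"edge homophily: {homophily:.2f}")
print(f"mean degree: {degrees.mean():.1f}, max degree: {degrees.max()}")
```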

Modular Neural Network Approaches for Surgical Image Recognition

  • paper_url: http://arxiv.org/abs/2307.08880
  • repo_url: None
  • paper_authors: Nosseiba Ben Salem, Younes Bennani, Joseph Karkazan, Abir Barbara, Charles Dacheux, Thomas Gregory
  • for: 这个研究是为了提出一种基于深度学习的脑网络架构,以解决现代问题的复杂化和数据不足问题。
  • methods: 这个研究使用了自训练(self-training)方法来解决数据不足问题,并且将问题分解为更简单的子任务,以提高模型的泛化能力和可解释性。
  • results: 研究发现,使用模块学习方法可以提高分类性能,并且可以实现近乎完美的分类。另外,这种方法还可以提高数据分类的速度和可解释性。
    Abstract Deep learning-based applications have seen a lot of success in recent years. Text, audio, image, and video have all been explored with great success using deep learning approaches. The use of convolutional neural networks (CNN) in computer vision, in particular, has yielded reliable results. In order to achieve these results, a large amount of data is required. However, the dataset cannot always be accessible. Moreover, annotating data can be difficult and time-consuming. Self-training is a semi-supervised approach that managed to alleviate this problem and achieve state-of-the-art performances. Theoretical analysis even proved that it may result in a better generalization than a normal classifier. Another problem neural networks can face is the increasing complexity of modern problems, requiring a high computational and storage cost. One way to mitigate this issue, a strategy that has been inspired by human cognition known as modular learning, can be employed. The principle of the approach is to decompose a complex problem into simpler sub-tasks. This approach has several advantages, including faster learning, better generalization, and enables interpretability. In the first part of this paper, we introduce and evaluate different architectures of modular learning for Dorsal Capsulo-Scapholunate Septum (DCSS) instability classification. Our experiments have shown that modular learning improves performances compared to non-modular systems. Moreover, we found that weighted modular, that is to weight the output using the probabilities from the gating module, achieved an almost perfect classification. In the second part, we present our approach for data labeling and segmentation with self-training applied on shoulder arthroscopy images.
    摘要 深度学习基于应用在过去几年中取得了很多成功。文本、音频、图像和视频都被使用深度学习方法进行了成功的探索。特别是在计算机视觉方面,卷积神经网络(CNN)的使用已经取得了可靠的结果。但是,获取数据的问题仍然存在。另外,标注数据可能会是困难的和耗时的。自学习是一种半监督学习方法,可以解决这个问题,并达到状态艺术的性能。理论分析还证明,它可能会在普通分类器之上取得更好的泛化性。另一个问题是现代问题的复杂性,需要高度的计算和存储成本。一种可以 mitigate这个问题的方法是模块学习。这种方法的原理是将复杂问题分解成更简单的子任务。这种方法有很多优点,包括更快的学习、更好的泛化和可读性。在本文的第一部分,我们介绍了不同的模块学习架构,并对DCSS不稳定性分类问题进行了评估。我们的实验结果表明,模块学习可以提高性能,而且使用权重模块可以达到几乎完美的分类。在第二部分,我们介绍了我们的自动标注和分割方法,使用自学习在肩镜像中进行了应用。
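
A hedged sketch of the weighted-modular idea from the abstract: a gating network weights the outputs of simpler sub-task modules by their probabilities. The architecture below is a generic mixture-of-experts-style stand-in, not the paper's DCSS classifier.

```python
# Weighted modular classification: the gate produces a probability over
# modules and the prediction is the probability-weighted sum of module
# outputs. Feature sizes are placeholders for backbone image features.
import torch
import torch.nn as nn

class WeightedModularClassifier(nn.Module):
    def __init__(self, dim_in=512, n_modules=3, n_classes=2):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim_in, n_modules), nn.Softmax(dim=1))
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU(), nn.Linear(64, n_classes))
            for _ in range(n_modules)
        )

    def forward(self, x):
        weights = self.gate(x)                                      # (N, M)
        outs = torch.stack([m(x) for m in self.subnets], dim=1)     # (N, M, C)
        return (weights.unsqueeze(-1) * outs).sum(dim=1)            # weighted logits

model = WeightedModularClassifier()
logits = model(torch.randn(8, 512))   # 8 feature vectors from a backbone
print(logits.shape)                   # torch.Size([8, 2])
```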

  • paper_url: http://arxiv.org/abs/2307.08877
  • repo_url: https://github.com/chatterjeeayan/upna
  • paper_authors: Ayan Chatterjee, Robin Walters, Giulia Menichetti, Tina Eliassi-Rad
  • for: 链接预测是图机器学习中关键的任务,具有广泛的应用。本文研究节点特征和图结构之间的互动,并证明在包含预训节点特征的情况下,链接预测模型的泛化能力得到提高。
  • methods: 我们提出的方法是UPNA(无监督节点属性预训练),它通过学习一个以两个节点的属性为输入、预测两节点之间存在边的概率的函数来解决归纳式链接预测问题。与传统的图神经网络(GNN)不同,UPNA不易在具有幂律度分布的图中陷入拓扑捷径。
  • results: 我们的实验表明,UPNA在多个基准数据集上取得3倍至34倍的提升,超过当前最优方法。此外,UPNA可以应用于多种成对学习任务,并可与现有的链接预测模型集成,提高其泛化能力并增强图生成模型。
    Abstract Link prediction is a crucial task in graph machine learning with diverse applications. We explore the interplay between node attributes and graph topology and demonstrate that incorporating pre-trained node attributes improves the generalization power of link prediction models. Our proposed method, UPNA (Unsupervised Pre-training of Node Attributes), solves the inductive link prediction problem by learning a function that takes a pair of node attributes and predicts the probability of an edge, as opposed to Graph Neural Networks (GNN), which can be prone to topological shortcuts in graphs with power-law degree distribution. In this manner, UPNA learns a significant part of the latent graph generation mechanism since the learned function can be used to add incoming nodes to a growing graph. By leveraging pre-trained node attributes, we overcome observational bias and make meaningful predictions about unobserved nodes, surpassing state-of-the-art performance (3X to 34X improvement on benchmark datasets). UPNA can be applied to various pairwise learning tasks and integrated with existing link prediction models to enhance their generalizability and bolster graph generative models.
    摘要 链接预测是图机器学习中的关键任务,具有广泛的应用。我们研究节点属性与图拓扑之间的相互作用,并证明纳入预训练的节点属性可以提高链接预测模型的泛化能力。我们提出的方法UPNA(无监督节点属性预训练)通过学习一个以一对节点属性为输入、输出边存在概率的函数来解决归纳式链接预测问题;相比之下,图神经网络(GNN)在具有幂律度分布的图中容易依赖拓扑捷径。由此,UPNA学习了潜在图生成机制的重要部分,因为所学函数可以用来向不断增长的图中添加新加入的节点。通过利用预训练的节点属性,我们克服了观测偏差,能够对未观测节点做出有意义的预测,并超越当前最优表现(在基准数据集上提升3倍至34倍)。UPNA可应用于多种成对学习任务,并可与现有链接预测模型集成,以增强其泛化能力并强化图生成模型。
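
A minimal sketch of the UPNA-style formulation described above: an MLP maps a pair of (pre-trained) node attribute vectors to an edge probability and is trained with binary cross-entropy on positive and sampled negative pairs. The dimensions and toy pair sampling are assumptions for illustration.

```python
# Pairwise edge predictor over node attributes, trained with BCE on
# positive and negative node pairs.
import torch
import torch.nn as nn

class PairwiseEdgePredictor(nn.Module):
    def __init__(self, dim_attr=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim_attr, 128), nn.ReLU(), nn.Linear(128, 1)
        )
    def forward(self, x_u, x_v):
        return self.mlp(torch.cat([x_u, x_v], dim=1)).squeeze(1)  # edge logits

attrs = torch.randn(100, 64)              # stand-in for pre-trained node attributes
pos = torch.randint(0, 100, (256, 2))     # toy "observed edge" pairs
neg = torch.randint(0, 100, (256, 2))     # toy negative pairs
pairs = torch.cat([pos, neg])
y = torch.cat([torch.ones(256), torch.zeros(256)])

model = PairwiseEdgePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
logits = model(attrs[pairs[:, 0]], attrs[pairs[:, 1]])
loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
loss.backward()
opt.step()
```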

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

  • paper_url: http://arxiv.org/abs/2307.08875
  • repo_url: None
  • paper_authors: Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian
  • for: 这个论文的目的是解决模型匹配问题,即在训练环境和测试环境之间的模型差异问题,以确定一个可靠性高的策略。
  • methods: 该论文提出了两种新的不确定集形式,一种基于双抽样,另一种基于积分概率度量。这两种不确定集形式使得大规模的Robust reinforcement learning(RL)变得可 tractable,即使只有训练环境。该论文还提出了一种 robust natural actor-critic(RNAC)方法,该方法包括新的不确定集形式和函数approximation。
  • results: 该论文的实验结果显示,RNAC方法可以在多个 MuJoCo 环境和一个实际世界的TurtleBot导航任务中提供良好的Robust性性能。
    Abstract We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.
    摘要 我们研究了一种robust reinforcement learning(RL)方法,以确定在训练环境和测试环境之间存在模型匹配不良的情况下,可以达到良好的策略。之前的策略基于RL算法主要在表格设定下进行了不确定性集的使用,但是当状态数量增加时,这些算法就不再可行了。为此,我们提出了两种新的不确定性集形式,一种基于双抽样,另一种基于 интеграル概率度量。这两种形式使得大规模的RL问题变得可 tractable,即使只有训练环境的模型。我们提出了一种robust natural actor-critic(RNAC)方法,该方法包括新的不确定性集和函数近似。我们提供了finite-time converges guarantees,表明RNAC算法在函数近似误差下可以在有限时间内 converges到最佳robust策略。最后,我们在多个MuJoCo环境和一个实际世界TurtleBot导航任务中证明了我们提出的RNAC策略的robust性。

Latent Space Representations of Neural Algorithmic Reasoners

  • paper_url: http://arxiv.org/abs/2307.08874
  • repo_url: https://github.com/mirjanic/nar-latent-spaces
  • paper_authors: Vladimir V. Mirjanić, Razvan Pascanu, Petar Veličković
  • for: 这个研究探讨了神经算法逻辑(NAR)领域中使用神经网络架构来可靠地捕捉经典计算方法的问题。
  • methods: 该研究使用图神经网络(GNN)架构,将输入编码成高维隐藏空间,并在执行算法时进行重复转换。
  • results: 研究发现GNN架构中的隐藏空间结构存在两种可能的失败模式:(1)loss of resolution,导致同样的值很难分辨;(2)无法处理训练期间未见到的值。提议使用softmax汇聚器和衰减隐藏空间来解决这两种问题,并证明这些改进可以在CLRS-30标准测试集上提高大多数算法的性能。
    Abstract Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work we perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms. We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training. We propose to solve the first issue by relying on a softmax aggregator, and propose to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor. Our code is available at https://github.com/mirjanic/nar-latent-spaces.
    摘要
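
To illustrate the softmax-aggregator fix for the loss-of-resolution failure mode, here is a temperature-controlled softmax aggregation over a node's incoming messages: a smooth stand-in for max aggregation. The tensors and temperature values are illustrative.

```python
# Softmax aggregation: low temperature approaches the element-wise max,
# high temperature approaches the element-wise mean.
import torch

def softmax_aggregate(messages, temperature=1.0):
    # messages: (num_neighbors, dim) incoming messages for one node
    weights = torch.softmax(messages / temperature, dim=0)  # per-feature weights
    return (weights * messages).sum(dim=0)                  # soft maximum

msgs = torch.tensor([[1.0, 0.2], [0.9, 0.3], [0.1, 2.0]])
print(softmax_aggregate(msgs, temperature=0.1))   # close to the element-wise max
print(softmax_aggregate(msgs, temperature=10.0))  # close to the element-wise mean
```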

An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

  • paper_url: http://arxiv.org/abs/2307.08873
  • repo_url: None
  • paper_authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
  • for: 在风险规避的强化学习中约束策略回报的波动,同时避免基于方差的度量对数值尺度敏感、妨碍策略学习等局限。
  • methods: 使用一种新的风险度量——吉尼偏差(Gini deviation)——来替代传统的基于方差的风险度量,并推导相应的策略梯度算法对其进行最小化。
  • results: 在具有明确的风险偏好的领域中,通过对吉尼偏度进行优化,实现高回报低风险的策略学习。其他方法在这些领域中很难学习一个合理的策略。
    Abstract Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance. Recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindering of policy learning, and propose to use an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined, shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
    摘要 限制策略回报的方差是风险规避强化学习(RL)中的常见选择,因为其数学定义明确且易于解释。传统方法直接限制总回报的方差,最近的方法则以每步奖励的方差作为代理进行限制。我们详细考察了这些基于方差的方法的局限性,例如对数值尺度敏感以及妨碍策略学习,并提出以另一种风险度量——吉尼偏差——作为替代。我们研究了这一新风险度量的多种性质,并推导了最小化该度量的策略梯度算法。在可以明确定义风险规避的领域中进行的实验表明,当其他方法难以学到合理策略时,我们的算法能够缓解基于方差的风险度量的局限,并在方差和吉尼偏差两个意义下实现高回报、低风险。
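
As a reference point, an empirical Gini deviation of sampled returns; a common convention is half the mean absolute difference between independent copies of the return, though the paper's exact normalization may differ.

```python
# Empirical Gini deviation of a sample of episode returns.
import numpy as np

def gini_deviation(returns):
    x = np.asarray(returns, dtype=float)
    diffs = np.abs(x[:, None] - x[None, :])   # all pairwise |x_i - x_j|
    return 0.5 * diffs.mean()

rng = np.random.default_rng(0)
low_risk = rng.normal(10.0, 1.0, size=1000)    # steady returns
high_risk = rng.normal(10.0, 5.0, size=1000)   # volatile returns
print(gini_deviation(low_risk), gini_deviation(high_risk))  # second is larger
```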

Meta-Value Learning: a General Framework for Learning with Learning Awareness

  • paper_url: http://arxiv.org/abs/2307.08863
  • repo_url: https://github.com/metavaluelearning/metavaluelearning
  • paper_authors: Tim Cooijmans, Milad Aghajohari, Aaron Courville
  • for: 本文旨在解决多智能体系统中基于梯度的学习问题:梯度来自一阶模型,而该模型没有考虑各智能体学习过程之间的相互作用。
  • methods: 本文扩展了 LOLA 的思想,提出了一种完全通用的基于价值的优化方法。其核心是我们称为元价值的函数,它在联合策略空间的每个点上为每个智能体给出其目标在未来优化步骤上的折扣和。我们认为元价值的梯度比原始目标的梯度提供了更可靠的改进方向,因为元价值源自对优化效果的实际观察。
  • results: 我们在 Logistic Game 和 Iterated Prisoner's Dilemma 两个问题上分析并展示了该方法的行为。
    Abstract Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We extend the ideas of LOLA and develop a fully-general value-based approach to optimization. At the core is a function we call the meta-value, which at each point in joint-policy space gives for each agent a discounted sum of its objective over future optimization steps. We argue that the gradient of the meta-value gives a more reliable improvement direction than the gradient of the original objective, because the meta-value derives from empirical observations of the effects of optimization. We show how the meta-value can be approximated by training a neural network to minimize TD error along optimization trajectories in which agents follow the gradient of the meta-value. We analyze the behavior of our method on the Logistic Game and on the Iterated Prisoner's Dilemma.
    摘要 gradient-based learning in multi-agent systems 困难,因为梯度来自于一个第一阶模型,这个模型不考虑多个代理机器学习过程之间的互动。LOLA(arXiv:1709.04326)提出了一种解决方案,通过一步优化差分。我们在LOLA的基础上发展了一种完全普遍的价值基于方法,其核心是一个我们称为“元价值”的函数,每个代理机器在联合策略空间中的每个点处给每个代理机器一个折扣的未来优化步骤中的目标减少和。我们 argue that the gradient of the meta-value gives a more reliable improvement direction than the gradient of the original objective, because the meta-value derives from empirical observations of the effects of optimization. We show how the meta-value can be approximated by training a neural network to minimize TD error along optimization trajectories in which agents follow the gradient of the meta-value。我们分析了我们的方法在Logistic Game和Iterated Prisoner's Dilemma中的行为。

Curriculum Learning for Graph Neural Networks: A Multiview Competence-based Approach

  • paper_url: http://arxiv.org/abs/2307.08859
  • repo_url: https://github.com/CLU-UML/MCCL
  • paper_authors: Nidhi Vakil, Hadi Amiri
  • for: 本研究旨在提出一种基于图复杂度形式和模型能力的新方法,以便在图神经网络训练中进行有效的课程学习。
  • methods: 该方法使用一种调度方案来确定有效的课程,在训练过程中同时考虑样本难度的不同视角(基于图复杂度的难度标准)和模型能力。
  • results: 在真实世界的链接预测和节点分类任务上的实验结果表明,该方法能够带来更好的效果。
    Abstract A curriculum is a planned sequence of learning materials and an effective one can make learning efficient and effective for both humans and machines. Recent studies developed effective data-driven curriculum learning approaches for training graph neural networks in language applications. However, existing curriculum learning approaches often employ a single criterion of difficulty in their training paradigms. In this paper, we propose a new perspective on curriculum learning by introducing a novel approach that builds on graph complexity formalisms (as difficulty criteria) and model competence during training. The model consists of a scheduling scheme which derives effective curricula by accounting for different views of sample difficulty and model competence during training. The proposed solution advances existing research in curriculum learning for graph neural networks with the ability to incorporate a fine-grained spectrum of graph difficulty criteria in their training paradigms. Experimental results on real-world link prediction and node classification tasks illustrate the effectiveness of the proposed approach.
    摘要 一个课程是一个规划的学习材料序列,一个有效的课程可以使学习变得更加效率和有效。现代研究已经开发出了训练图型神经网络的数据驱动课程学习方法。然而,现有的课程学习方法通常使用单一的难度标准来训练。在这篇论文中,我们提出了一个新的观点,即基于图型复杂性形式(难度标准)和模型能力的课程学习方法。我们的方法包括一个时间表,它可以从训练中的不同角度来评估题目难度和模型能力,并从中 derivate 有效的课程。我们的解决方案超越了现有的课程学习研究,可以将图型难度标准细分为训练中的不同角度。实验结果显示,我们的方法在实际的连接预测和节点分类任务中具有优秀的效果。
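
The scheduling scheme is not specified in the abstract, so the sketch below uses a standard competence-based scheduler from the curriculum-learning literature: model competence grows over training, and only samples whose difficulty score falls below the current competence are eligible. The square-root schedule and thresholding rule are assumptions, not necessarily this paper's exact design.

```python
# Competence-based curriculum scheduling over per-sample difficulty scores.
import numpy as np

def competence(step, total_steps, c0=0.1):
    t = min(step / total_steps, 1.0)
    return min(1.0, np.sqrt(t * (1.0 - c0 ** 2) + c0 ** 2))

def eligible_indices(difficulties, step, total_steps):
    # difficulties: per-sample scores in [0, 1], e.g. normalized graph complexity
    c = competence(step, total_steps)
    return np.where(np.asarray(difficulties) <= c)[0]

difficulties = np.random.default_rng(0).random(1000)
for step in (0, 2500, 5000, 10000):
    idx = eligible_indices(difficulties, step, total_steps=10000)
    print(f"step {step}: competence={competence(step, 10000):.2f}, "
          f"{len(idx)} samples eligible")
```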

An Admissible Shift-Consistent Method for Recommender Systems

  • paper_url: http://arxiv.org/abs/2307.08857
  • repo_url: None
  • paper_authors: Tung Nguyen, Jeffrey Uhlmann
  • for: solves matrix/tensor completion problems in the context of recommender systems
  • methods: proposes a new constraint called shift-consistency, and provides a rigorous mathematical description of the method
  • results: provably guarantees several key mathematical properties, including satisfaction of an admissibility criterion, fairness, and robustness
    Abstract In this paper, we propose a new constraint, called shift-consistency, for solving matrix/tensor completion problems in the context of recommender systems. Our method provably guarantees several key mathematical properties: (1) satisfies a recently established admissibility criterion for recommender systems; (2) satisfies a definition of fairness that eliminates a specific class of potential opportunities for users to maliciously influence system recommendations; and (3) offers robustness by exploiting provable uniqueness of missing-value imputation. We provide a rigorous mathematical description of the method, including its generalization from matrix to tensor form to permit representation and exploitation of complex structural relationships among sets of user and product attributes. We argue that our analysis suggests a structured means for defining latent-space projections that can permit provable performance properties to be established for machine learning methods.
    摘要 在这篇论文中,我们提出了一个新的约束,称为偏移一致性,用于解决Matrix/Tensor completion问题在推荐系统中。我们的方法可以证明满足以下几个关键数学性质:(1)满足推荐系统中最近确立的适用性标准;(2)满足一种定义的公平性,以消除用户恶意影响推荐系统的可能性;(3)具有耐用性,通过利用缺失值填充的可证明唯一性来抗衡。我们提供了一个严格的数学描述,包括矩阵到多重形式的普遍化,以利用用户和产品特征之间的复杂结构关系。我们认为,我们的分析表明了一种结构化的方式,可以让 latent-space 投影具有可证明性能特性。

Autoregressive Diffusion Model for Graph Generation

  • paper_url: http://arxiv.org/abs/2307.08849
  • repo_url: None
  • paper_authors: Lingkai Kong, Jiaming Cui, Haotian Sun, Yuchen Zhuang, B. Aditya Prakash, Chao Zhang
  • for: 本文提出了一种用于图生成的自回归扩散模型。
  • methods: 该模型使用自回归扩散过程,直接在离散图空间中操作。在前向扩散过程中,我们设计了一个数据依赖的节点吸收排序网络,用于从图拓扑中学习节点吸收顺序;在反向生成过程中,我们设计了一个去噪网络,通过逐个预测新节点的类型及其与已去噪节点之间的边来高效地重建图。
  • results: 我们在六个不同的通用图数据集和两个分子数据集上进行了实验,结果显示,我们的模型取得了优于或可比于此前最优方法的生成性能,同时具有快速的生成速度。
    Abstract Diffusion-based graph generative models have recently obtained promising results for graph generation. However, existing diffusion-based graph generative models are mostly one-shot generative models that apply Gaussian diffusion in the dequantized adjacency matrix space. Such a strategy can suffer from difficulty in model training, slow sampling speed, and incapability of incorporating constraints. We propose an \emph{autoregressive diffusion} model for graph generation. Unlike existing methods, we define a node-absorbing diffusion process that operates directly in the discrete graph space. For forward diffusion, we design a \emph{diffusion ordering network}, which learns a data-dependent node absorbing ordering from graph topology. For reverse generation, we design a \emph{denoising network} that uses the reverse node ordering to efficiently reconstruct the graph by predicting the node type of the new node and its edges with previously denoised nodes at a time. Based on the permutation invariance of graph, we show that the two networks can be jointly trained by optimizing a simple lower bound of data likelihood. Our experiments on six diverse generic graph datasets and two molecule datasets show that our model achieves better or comparable generation performance with previous state-of-the-art, and meanwhile enjoys fast generation speed.
    摘要 Diffusion-based图生成模型在最近得到了优秀的结果,但现有的扩散基于的图生成模型多数是一次性的生成模型,它们在减量化的相对位图空间内应用扩散。这种策略可能会受到训练模型的困难,慢速的采样速度和约束的不具备。我们提出了一种“自适应扩散”模型,与现有方法不同,我们在离散图空间直接定义节点吸引扩散过程。对于前进扩散,我们设计了一个“扩散排序网络”,它学习从图ptopology得到数据依赖的节点吸引排序。对于逆生成,我们设计了一个“除噪网络”,它使用反向节点排序来高效地重建图, predicting the node type of the new node and its edges with previously denoised nodes at a time。基于图的幂等性,我们表明了这两个网络可以同时训练,通过优化数据可能性函数的简单下界来优化。我们在六种多样化的生成图据集和两个分子数据集上进行了实验,结果表明我们的模型可以与之前的状态时的性能相当或更好,同时具有快速的生成速度。

Privacy-preserving patient clustering for personalized federated learning

  • paper_url: http://arxiv.org/abs/2307.08847
  • repo_url: https://github.com/g2lab/pcfbl
  • paper_authors: Ahmed Elhussein, Gamze Gursoy
  • For: 这个研究旨在解决 Federated Learning (FL) 中数据非独立同分布(non-IID)的问题,并提出 Privacy-preserving Community-Based Federated machine Learning (PCBFL) 框架,可以跨医院对病人分组训练模型,同时保护隐私。
    Abstract Federated Learning (FL) is a machine learning framework that enables multiple organizations to train a model without sharing their data with a central server. However, it experiences significant performance degradation if the data is non-identically independently distributed (non-IID). This is a problem in medical settings, where variations in the patient population contribute significantly to distribution differences across hospitals. Personalized FL addresses this issue by accounting for site-specific distribution differences. Clustered FL, a Personalized FL variant, was used to address this problem by clustering patients into groups across hospitals and training separate models on each group. However, privacy concerns remained as a challenge as the clustering process requires exchange of patient-level information. This was previously solved by forming clusters using aggregated data, which led to inaccurate groups and performance degradation. In this study, we propose Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel Clustered FL framework that can cluster patients using patient-level data while protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic technique, to securely calculate patient-level similarity scores across hospitals. We then evaluate PCBFL by training a federated mortality prediction model using 20 sites from the eICU dataset. We compare the performance gain from PCBFL against traditional and existing Clustered FL frameworks. Our results show that PCBFL successfully forms clinically meaningful cohorts of low, medium, and high-risk patients. PCBFL outperforms traditional and existing Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC improvement of 7.8%.
    摘要 federated learning (FL) 是一种机器学习框架,允许多个组织共同训练模型,无需将数据分享到中央服务器。然而,如果数据不是非 identical independently distributed (non-IID),FL 会经受显著性能下降。这是医疗设置中的问题,Variations in the patient population contribute significantly to distribution differences across hospitals。personalized FL 解决了这个问题,通过考虑各地点特定的分布差异。clustered FL,一种个人化 FL 变体,使用 clustering 方法将患者分组,并在每个组上训练 separating 模型。然而,隐私问题仍然成为挑战,因为 clustering 过程需要交换患者级别信息。这已经解决了通过使用聚合数据来组成 clusters,但这会导致不准确的组和性能下降。在本研究中,我们提出了隐私保护的社区基于 Federated 机器学习 (PCBFL),一种新的 clustering FL 框架,可以在患者级别数据上 clustering 患者,同时保护隐私。PCBFL 使用 Secure Multiparty Computation,一种密码学技术,以安全地计算各地点患者相似度分数。我们然后评估 PCBFL,通过在 20 个 eICU 数据集中训练一个联邦 Mortality 预测模型。我们比较 PCBFL 的性能与传统和现有的 clustering FL 框架。我们的结果表明,PCBFL 成功划分了低、中、高风险患者的临床意义full cohort。PCBFL 与传统和现有的 clustering FL 框架相比,平均 AUC 提高4.3%,AUPRC 提高7.8%。
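
PCBFL's secure similarity computation builds on secure multiparty computation; the toy below only illustrates the underlying additive secret-sharing primitive (splitting a private embedding into shares that individually look like noise), not the full cross-hospital similarity protocol described in the paper.

```python
# Additive secret sharing of a private vector: shares sum back to the
# secret, while any single share carries essentially no information.
import numpy as np

rng = np.random.default_rng(0)

def make_shares(secret, n_parties=3):
    shares = [rng.normal(size=secret.shape) for _ in range(n_parties - 1)]
    shares.append(secret - np.sum(shares, axis=0))  # last share completes the sum
    return shares

embedding = rng.normal(size=8)                         # toy patient-level embedding
shares = make_shares(embedding)
print(np.allclose(np.sum(shares, axis=0), embedding))  # True: exact reconstruction
print(np.round(shares[0], 2))                          # one share alone looks like noise
```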

Bayesian Safe Policy Learning with Chance Constrained Optimization: Application to Military Security Assessment during the Vietnam War

  • paper_url: http://arxiv.org/abs/2307.08840
  • repo_url: None
  • paper_authors: Zeyang Jia, Eli Ben-Michael, Kosuke Imai
  • For: The paper aims to improve a security assessment algorithm used during the Vietnam War by using outcomes measured immediately after its introduction in late 1969.* Methods: The paper introduces the Average Conditional Risk (ACRisk) to quantify the risk of worse outcomes for subgroups of individual units, and a Bayesian policy learning framework to maximize the posterior expected value while controlling the ACRisk.* Results: The learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors, compared to the actual algorithm used during the Vietnam War.Here are the three points in Simplified Chinese text:* For: 该文章目标是通过1969年底引入的出口来改进越南战争期间安全评估算法。* Methods: 该文章提出了 Conditional Risk (ACRisk) 来衡量各个单位 subgroup 的输出风险,以及 Bayesian 政策学习框架来控制 ACrisk 并最大化 posterior 期望值。* Results: 学习的算法认为大多数地区更安全,并且强调经济和政治因素比军事因素更重要,与实际使用的算法不同。
    Abstract Algorithmic and data-driven decisions and recommendations are commonly used in high-stakes decision-making settings such as criminal justice, medicine, and public policy. We investigate whether it would have been possible to improve a security assessment algorithm employed during the Vietnam War, using outcomes measured immediately after its introduction in late 1969. This empirical application raises several methodological challenges that frequently arise in high-stakes algorithmic decision-making. First, before implementing a new algorithm, it is essential to characterize and control the risk of yielding worse outcomes than the existing algorithm. Second, the existing algorithm is deterministic, and learning a new algorithm requires transparent extrapolation. Third, the existing algorithm involves discrete decision tables that are common but difficult to optimize over. To address these challenges, we introduce the Average Conditional Risk (ACRisk), which first quantifies the risk that a new algorithmic policy leads to worse outcomes for subgroups of individual units and then averages this over the distribution of subgroups. We also propose a Bayesian policy learning framework that maximizes the posterior expected value while controlling the posterior expected ACRisk. This framework separates the estimation of heterogeneous treatment effects from policy optimization, enabling flexible estimation of effects and optimization over complex policy classes. We characterize the resulting chance-constrained optimization problem as a constrained linear programming problem. Our analysis shows that compared to the actual algorithm used during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors.
    摘要 高科技和数据驱动的决策和建议在高风险决策场景中广泛应用,如刑事司法、医疗和公共政策。我们调查了越南战争期间使用的安全评估算法是否可以改进,使用实际实施后的1969年底的结果。这种实践中出现了一些高风险算法决策中常见的方法学挑战。首先,在实施新算法之前,需要评估和控制新算法可能导致差化的风险。第二,现有的算法是 deterministic,需要透明地推断新算法。第三,现有的算法包含精确的决策表,这些表difficult to optimize。为了解决这些挑战,我们引入了 Conditional Risk (ACRisk),它首先评估新算法政策对各个单位的 subgroup 的风险差化,然后平均这些风险。我们还提出了 Bayesian 政策学习框架,该框架在控制 posterior 预期值时最大化预期值,并且可以灵活地估计影响和优化复杂的政策类型。我们将这种机会constrained optimization问题 characterized as a linear programming problem。我们的分析表明,相比 actual algorithm 使用 during the Vietnam War, the learned algorithm assesses most regions as more secure and emphasizes economic and political factors over military factors。

A Meta-Learning Based Precoder Optimization Framework for Rate-Splitting Multiple Access

  • paper_url: http://arxiv.org/abs/2307.08822
  • repo_url: None
  • paper_authors: Rafael Cerna Loli, Bruno Clerckx
  • for: 提出一种基于元学习的RSMA前处理优化框架,直接在无法知道整个通道状态情况的情况下优化RSMA前处理器。
  • methods: 利用紧凑神经网络的过拟合来最大化显式的平均和速率(ASR)表达式,从而绕过对额外训练数据的需求。
  • results: 数值结果显示,元学习基于的解决方案在中等规模场景下与传统前处理优化相当,在大规模场景下明显超越低复杂度前处理算法。
    Abstract In this letter, we propose the use of a meta-learning based precoder optimization framework to directly optimize the Rate-Splitting Multiple Access (RSMA) precoders with partial Channel State Information at the Transmitter (CSIT). By exploiting the overfitting of the compact neural network to maximize the explicit Average Sum-Rate (ASR) expression, we effectively bypass the need for any other training data while minimizing the total running time. Numerical results reveal that the meta-learning based solution achieves similar ASR performance to conventional precoder optimization in medium-scale scenarios, and significantly outperforms sub-optimal low complexity precoder algorithms in the large-scale regime.
    摘要 在这封信中,我们提议使用基于元学习的预编码器优化框架,在发射端仅有部分信道状态信息(CSIT)的情况下直接优化速率分割多址(RSMA)预编码器。通过利用紧凑神经网络的过拟合来最大化显式的平均和速率(ASR)表达式,我们无需任何额外训练数据,同时将总运行时间降至最低。数值结果表明,基于元学习的方案在中等规模场景下可取得与传统预编码器优化相近的ASR性能,而在大规模场景下则明显优于次优的低复杂度预编码算法。

Towards Accelerating Benders Decomposition via Reinforcement Learning Surrogate Models

  • paper_url: http://arxiv.org/abs/2307.08816
  • repo_url: None
  • paper_authors: Stephen Mak, Kyle Mana, Parisa Zehtabi, Michael Cashmore, Daniele Magazzeni, Manuela Veloso
  • for: 这篇论文是为了提出一种快速化Benders decomposition(BD)方法,以解决受不确定性影响的数值优化问题。
  • methods: 本论文使用的方法是BD方法,并利用一个代理模型来取代NP困难的整数主问题,以加速BD方法的执行。
  • results: 在实验中,这种加速BD方法可以让解决随机存储管理问题的时间提高30%,比其他加速BD实现方法更快。
    Abstract Stochastic optimization (SO) attempts to offer optimal decisions in the presence of uncertainty. Often, the classical formulation of these problems becomes intractable due to (a) the number of scenarios required to capture the uncertainty and (b) the discrete nature of real-world planning problems. To overcome these tractability issues, practitioners turn to decomposition methods that divide the problem into smaller, more tractable sub-problems. The focal decomposition method of this paper is Benders decomposition (BD), which decomposes stochastic optimization problems on the basis of scenario independence. In this paper we propose a method of accelerating BD with the aid of a surrogate model in place of an NP-hard integer master problem. Through the acceleration method we observe 30% faster average convergence when compared to other accelerated BD implementations. We introduce a reinforcement learning agent as a surrogate and demonstrate how it can be used to solve a stochastic inventory management problem.
    摘要 随机优化(SO)试图在不确定性存在的情况下给出最优决策。然而,这类问题的经典表述往往变得不可求解,原因在于:(a) 刻画不确定性所需的场景数量庞大;(b) 现实规划问题的离散性质。为了解决这些可求解性问题,实践者通常采用分解方法,将问题划分为更易处理的子问题。本文关注的分解方法是按场景独立性进行分解的 Benders 分解(BD)。我们提出用一个代理模型取代NP难的整数主问题,从而加速BD。与其他加速BD实现相比,我们观察到平均收敛速度提高约30%。我们引入强化学习智能体作为代理模型,并展示了如何用其求解一个随机库存管理问题。

Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge

  • paper_url: http://arxiv.org/abs/2307.08813
  • repo_url: https://github.com/boxorange/bioie-llm
  • paper_authors: Gilchan Park, Byung-Jun Yoon, Xihaier Luo, Vanessa López-Marrero, Patrick Johnstone, Shinjae Yoo, Francis J. Alexander
  • for: 这项研究的目的是使用大型自然语言模型来自动从科学文献中提取蛋白质相互作用、蛋白质通路和基因调控关系的知识。
  • methods: 本研究使用了不同的大型自然语言模型来完成蛋白质相互作用、蛋白质通路和基因调控关系的识别任务。
  • results: 研究发现了不同的大型自然语言模型在完成这些任务时的效果,并提供了一些显著的发现和未来的机会,以及仍然存在的挑战。
  • for: The goal of this study is to use large language models to automatically extract knowledge of protein interactions, pathways, and gene regulatory relations from scientific literature.
  • methods: The study uses different large language models to complete tasks of recognizing protein interactions, pathways, and gene regulatory relations.
  • results: The study finds the effectiveness of different language models in completing these tasks, provides significant findings, and discusses future opportunities and remaining challenges.
    Abstract Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, in this work, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, pathways, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM
    摘要 理解蛋白交互和生物路径知识是生物系统复杂性的关键,帮助我们探索生物功能和复杂疾病的基础机理。现有数据库提供了文献和其他来源中的生物数据,但这些数据库经常受到不完整性和维护劳动的限制,需要新的方法。在本研究中,我们利用大型自然语言模型来解决这些问题,自动从相关的科学文献中提取生物知识。为达到这个目标,我们在这篇论文中评估了不同的大型自然语言模型在蛋白交互、生物路径和蛋白质调控关系的识别任务中的效果。我们仔细评估了各模型的表现,披露了重要的发现,并讨论了这种方法的未来机会和仍然存在的挑战。代码和数据可以在 GitHub 上获取:

DeepMem: ML Models as storage channels and their (mis-)applications

  • paper_url: http://arxiv.org/abs/2307.08811
  • repo_url: None
  • paper_authors: Md Abdullah Al Mamun, Quazi Mishkatul Alam, Erfan Shaigani, Pedram Zaree, Ihsen Alouani, Nael Abu-Ghazaleh
  • for: 本文提出了一种新的信息理论视角,视 ML 模型为一个存储通道,并研究了在这个存储通道上进行隐藏信息的存储和检测。
  • methods: 作者使用了一种黑盒访问方式,通过在训练时嵌入隐藏信息,并在部署后使用黑盒访问来检测和提取隐藏信息。
  • results: 作者分析了存储 primitives 和检测 primitives,并提出了一种基于 ML 特有的替换基于错误 correction 协议来提高存储 primitives 的可靠性。
    Abstract Machine learning (ML) models are overparameterized to support generality and avoid overfitting. Prior works have shown that these additional parameters can be used for both malicious (e.g., hiding a model covertly within a trained model) and beneficial purposes (e.g., watermarking a model). In this paper, we propose a novel information theoretic perspective of the problem; we consider the ML model as a storage channel with a capacity that increases with overparameterization. Specifically, we consider a sender that embeds arbitrary information in the model at training time, which can be extracted by a receiver with a black-box access to the deployed model. We derive an upper bound on the capacity of the channel based on the number of available parameters. We then explore black-box write and read primitives that allow the attacker to: (i) store data in an optimized way within the model by augmenting the training data at the transmitter side, and (ii) to read it by querying the model after it is deployed. We also analyze the detectability of the writing primitive and consider a new version of the problem which takes information storage covertness into account. Specifically, to obtain storage covertness, we introduce a new constraint such that the data augmentation used for the write primitives minimizes the distribution shift with the initial (baseline task) distribution. This constraint introduces a level of "interference" with the initial task, thereby limiting the channel's effective capacity. Therefore, we develop optimizations to improve the capacity in this case, including a novel ML-specific substitution based error correction protocol. We believe that the proposed modeling of the problem offers new tools to better understand and mitigate potential vulnerabilities of ML, especially in the context of increasingly large models.
    摘要 机器学习(ML)模型通常被过参数化,以支持通用性并避免过拟合。先前的研究表明,这些额外参数既可以被用于恶意目的(如在已训练模型中隐蔽地藏入另一个模型),也可以用于有益目的(如为模型加水印)。在这篇论文中,我们提出了一种新的信息论视角:将ML模型视为一个存储通道,其容量随过参数化程度的增加而增大。具体而言,我们考虑一个发送方在训练时将任意信息嵌入模型中,而接收方可以通过对已部署模型的黑盒访问来提取这些信息。我们基于可用参数的数量推导出该通道容量的上界。随后我们研究了黑盒的写与读原语,使攻击者能够:(i) 在发送端通过增广训练数据,以优化的方式在模型中存储数据;(ii) 在模型部署后通过查询模型来读取这些数据。我们还分析了写原语的可检测性,并考虑了该问题的一个新版本,即将信息存储的隐蔽性纳入考虑。具体来说,为了获得存储隐蔽性,我们引入一个新的约束,使写原语所用的数据增广与初始(基线任务)分布之间的分布偏移最小化。该约束会对初始任务产生一定的"干扰",从而限制通道的有效容量。因此,我们为这种情形开发了提升容量的优化方法,其中包括一种新颖的、面向ML的基于替换的纠错协议。我们相信,所提出的问题建模为更好地理解并缓解ML的潜在脆弱性提供了新的工具,尤其是在模型规模日益增大的背景下。

Operator Guidance Informed by AI-Augmented Simulations

  • paper_url: http://arxiv.org/abs/2307.08810
  • repo_url: None
  • paper_authors: Samuel J. Edwards, Michael Levine
  • for: 这篇论文提出了一种多保真度、数据自适应的方法,用于估计船舶在双峰、双向海况下的响应统计量。
  • methods: 这个论文使用了Long Short-Term Memory(LSTM)神经网络,以及一个快速低精度的计算工具SimpleCode,以及一个更高精度的计算工具Large Amplitude Motion Program(LAMP)。
  • results: 研究发现,使用LSTM神经网络可以准确地估计船舶响应统计数据,并且可以在不同的海洋条件下提供高精度的结果。
    Abstract This paper will present a multi-fidelity, data-adaptive approach with a Long Short-Term Memory (LSTM) neural network to estimate ship response statistics in bimodal, bidirectional seas. The study will employ a fast low-fidelity, volume-based tool SimpleCode and a higher-fidelity tool known as the Large Amplitude Motion Program (LAMP). SimpleCode and LAMP data were generated by common bi-modal, bi-directional sea conditions in the North Atlantic as training data. After training an LSTM network with LAMP ship motion response data, a sample route was traversed and randomly sampled historical weather was input into SimpleCode and the LSTM network, and compared against the higher fidelity results.
    摘要 这篇论文将介绍一种多模精度、数据适应的方法,使用长期快速响应(LSTM)神经网络来估算船舶响应统计在双模态、双向海域中。这项研究将使用快速低精度的SimpleCode工具和高精度的Large Amplitude Motion Program(LAMP)工具。SimpleCode和LAMP数据都是通过共同的双模态、双向海域条件在北大西洋中生成的训练数据。 после训练LSTM网络使用LAMP船舶运动数据,一个示例路线被跨越,并将SimpleCode和LSTM网络输入历史气象数据,并与更高精度结果进行比较。
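
A minimal sketch of the surrogate component: an LSTM regressor mapping a sequence of low-fidelity inputs to a higher-fidelity response statistic. The feature sizes and random data are placeholders, not the SimpleCode/LAMP pipeline itself.

```python
# LSTM sequence regression: sequences of low-fidelity inputs in, a single
# response statistic out (predicted from the final hidden state).
import torch
import torch.nn as nn

class ResponseLSTM(nn.Module):
    def __init__(self, dim_in=4, dim_hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(dim_in, dim_hidden, batch_first=True)
        self.head = nn.Linear(dim_hidden, 1)

    def forward(self, x):             # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict a statistic from the last state

model = ResponseLSTM()
x = torch.randn(16, 200, 4)           # 16 sequences of 200 time steps
target = torch.randn(16, 1)           # corresponding high-fidelity statistics
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
```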

Local or Global: Selective Knowledge Assimilation for Federated Learning with Limited Labels

  • paper_url: http://arxiv.org/abs/2307.08809
  • repo_url: None
  • paper_authors: Yae Jee Cho, Gauri Joshi, Dimitrios Dimitriadis
  • for: 提高 federated learning 的效果,尤其是在 client 有限的标签数据的情况下。
  • methods: 提出 FedLabel 方法,使 client 可以选择本地或全局模型来pseudo-标签未标签数据,并通过全局-本地一致常量正则化来利用两个模型的知识。
  • results: 在 cross-device 和 cross-silo Setting 中,FedLabel 比其他 semi-supervised FL 基线方法提高 $8$-$24%$,甚至超过了标准全部标签 FL 基线($100%$ 标签数据),只使用 $5$-$20%$ 的标签数据。
    Abstract Many existing FL methods assume clients with fully-labeled data, while in realistic settings, clients have limited labels due to the expensive and laborious process of labeling. Limited labeled local data of the clients often leads to their local model having poor generalization abilities to their larger unlabeled local data, such as having class-distribution mismatch with the unlabeled data. As a result, clients may instead look to benefit from the global model trained across clients to leverage their unlabeled data, but this also becomes difficult due to data heterogeneity across clients. In our work, we propose FedLabel where clients selectively choose the local or global model to pseudo-label their unlabeled data depending on which is more of an expert of the data. We further utilize both the local and global models' knowledge via global-local consistency regularization which minimizes the divergence between the two models' outputs when they have identical pseudo-labels for the unlabeled data. Unlike other semi-supervised FL baselines, our method does not require additional experts other than the local or global model, nor require additional parameters to be communicated. We also do not assume any server-labeled data or fully labeled clients. For both cross-device and cross-silo settings, we show that FedLabel outperforms other semi-supervised FL baselines by $8$-$24\%$, and even outperforms standard fully supervised FL baselines ($100\%$ labeled data) with only $5$-$20\%$ of labeled data.
    摘要
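
A hedged sketch of the selection rule described in the abstract: each unlabeled example is pseudo-labeled by whichever of the local or global model is more confident, with a consistency term applied when the two pseudo-labels agree (here a squared-difference stand-in for the divergence). The confidence criterion and loss weight are assumptions, not FedLabel's exact implementation.

```python
# Select the pseudo-label source per example, then add a global-local
# consistency term on examples where the two models agree.
import torch
import torch.nn.functional as F

def fedlabel_style_loss(local_logits, global_logits, lam=1.0):
    p_local, p_global = local_logits.softmax(1), global_logits.softmax(1)
    conf_local, y_local = p_local.max(1)
    conf_global, y_global = p_global.max(1)

    use_local = conf_local >= conf_global               # pick the more "expert" model
    pseudo = torch.where(use_local, y_local, y_global)
    ce = F.cross_entropy(local_logits, pseudo)          # train on pseudo-labels

    agree = (y_local == y_global).float()               # identical pseudo-labels only
    consistency = (agree * (p_local - p_global).pow(2).sum(1)).mean()
    return ce + lam * consistency

local_logits = torch.randn(32, 10, requires_grad=True)  # local model outputs
global_logits = torch.randn(32, 10)                     # global model outputs
loss = fedlabel_style_loss(local_logits, global_logits)
loss.backward()
```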

Anomaly Detection with Selective Dictionary Learning

  • paper_url: http://arxiv.org/abs/2307.08807
  • repo_url: https://github.com/denisilie94/pyod-dl
  • paper_authors: Denis C. Ilie-Ablachim, Bogdan Dumitrescu
  • for: 本研究提出了基于词典学习(DL)和核心词典学习(KDL)的新型异常检测方法。
  • methods: 本研究使用了已知的DL和KDL算法,并将其改进为无监督的异常检测方法。此外,我们还提出了一种减少kernel版本(RKDL),用于解决大数据集问题。
  • results: 我们的所有算法都被纳入一个异常检测工具箱中,并与标准基准结果进行了比较。
    Abstract In this paper we present new methods of anomaly detection based on Dictionary Learning (DL) and Kernel Dictionary Learning (KDL). The main contribution consists in the adaption of known DL and KDL algorithms in the form of unsupervised methods, used for outlier detection. We propose a reduced kernel version (RKDL), which is useful for problems with large data sets, due to the large kernel matrix. We also improve the DL and RKDL methods by the use of a random selection of signals, which aims to eliminate the outliers from the training procedure. All our algorithms are introduced in an anomaly detection toolbox and are compared to standard benchmark results.
    摘要 在这篇论文中,我们提出了基于字典学习(DL)和核字典学习(KDL)的新型异常检测方法。我们的主要贡献在于将已知的DL和KDL算法改造为用于离群点检测的无监督方法。我们还提出了一种精简核版本(RKDL),由于核矩阵规模庞大,该版本适用于大数据集问题。此外,我们通过对信号进行随机选择来改进DL和RKDL方法,其目的是将离群点从训练过程中剔除。我们的所有算法都被纳入一个异常检测工具箱中,并与标准基准结果进行了比较。
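
For orientation, a dictionary-learning anomaly score of the kind the DL variant builds on: learn a dictionary on (mostly normal) data, sparse-code test samples, and rank them by reconstruction error. Parameters are illustrative; the kernel (KDL/RKDL) variants and the random signal selection are not shown here.

```python
# Dictionary-learning anomaly scoring via sparse-coding reconstruction error.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 20))                     # mostly normal samples
X_test = np.vstack([rng.normal(size=(10, 20)),           # normal test points
                    rng.normal(loc=6.0, size=(5, 20))])  # obvious outliers

dl = DictionaryLearning(n_components=15, transform_algorithm="omp",
                        transform_n_nonzero_coefs=5, random_state=0, max_iter=20)
dl.fit(X_train)
codes_test = dl.transform(X_test)

recon = codes_test @ dl.components_
scores = np.linalg.norm(X_test - recon, axis=1)          # anomaly score per sample
print(np.round(scores, 2))                               # outliers should score largest
```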

Towards Automated Design of Riboswitches

  • paper_url: http://arxiv.org/abs/2307.08801
  • repo_url: None
  • paper_authors: Frederic Runge, Jörg K. H. Franke, Frank Hutter
  • for: 本研究旨在开发一种新的计算方法,以降低核糖开关(riboswitch)实验筛选与选择的成本,提高新型核糖开关的发现效率。
  • methods: 本研究提出了一种新的基于结构的设计方法,同时考虑全局性质以及期望的序列与结构特征。
  • results: 研究人员使用 libLEARNA 方法,按照此前发表的实验方案设计了茶碱(theophylline)核糖开关文库,获得了多出 30% 的高质量独特候选序列。
    Abstract Experimental screening and selection pipelines for the discovery of novel riboswitches are expensive, time-consuming, and inefficient. Using computational methods to reduce the number of candidates for the screen could drastically decrease these costs. However, existing computational approaches do not fully satisfy all requirements for the design of such initial screening libraries. In this work, we present a new method, libLEARNA, capable of providing RNA focus libraries of diverse variable-length qualified candidates. Our novel structure-based design approach considers global properties as well as desired sequence and structure features. We demonstrate the benefits of our method by designing theophylline riboswitch libraries, following a previously published protocol, and yielding 30% more unique high-quality candidates.
    摘要 用于发现新型核糖开关的实验筛选与选择流程成本高、耗时长且效率低下。利用计算方法减少进入实验筛选的候选数量,可以大幅降低这些成本。然而,现有的计算方法尚不能完全满足构建此类初始筛选文库的全部要求。在这项工作中,我们提出了一种新方法 libLEARNA,能够提供由多样化、可变长度合格候选序列组成的 RNA 聚焦文库。我们新的基于结构的设计方法同时考虑全局性质以及期望的序列与结构特征。我们按照此前发表的实验方案设计茶碱核糖开关文库,获得了多出 30% 的高质量独特候选序列,从而展示了该方法的优势。

regulAS: A Bioinformatics Tool for the Integrative Analysis of Alternative Splicing Regulome using RNA-Seq data

  • paper_url: http://arxiv.org/abs/2307.08800
  • repo_url: https://github.com/slipnitskaya/regulas
  • paper_authors: Sofya Lipnitskaya
  • for: regulAS is designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations in cancer and healthy human donors.
  • methods: regulAS uses integrative analysis of large-scale RNA-Seq data from TCGA and GTEx projects, with features such as RNA-Seq data retrieval, predictive modeling, and flexible reporting.
  • results: regulAS provides automated solutions for alternative splicing and cancer biology studies, enhancing efficiency, reproducibility, and customization of experimental design, with the extensibility to tailor the software to specific research needs.
    Abstract The regulAS software package is a bioinformatics tool designed to support computational biology researchers in investigating regulatory mechanisms of splicing alterations through integrative analysis of large-scale RNA-Seq data from cancer and healthy human donors, characterized by TCGA and GTEx projects. This technical report provides a comprehensive overview of regulAS, focusing on its core functionality, basic modules, experiment configuration, further extensibility and customisation. The core functionality of regulAS enables the automation of computational experiments, efficient results storage and processing, and streamlined workflow management. Integrated basic modules extend regulAS with features such as RNA-Seq data retrieval from the public multi-omics UCSC Xena data repository, predictive modeling and feature ranking capabilities using the scikit-learn package, and flexible reporting generation for analysing gene expression profiles and relevant modulations of alternative splicing aberrations across tissues and cancer types. Experiment configuration is handled through YAML files with the Hydra and OmegaConf libraries, offering a user-friendly approach. Additionally, regulAS allows for the development and integration of custom modules to handle specialized tasks. In conclusion, regulAS provides an automated solution for alternative splicing and cancer biology studies, enhancing efficiency, reproducibility, and customization of experimental design, while the extensibility of the pipeline enables researchers to further tailor the software package to their specific needs. Source code is available under the MIT license at https://github.com/slipnitskaya/regulAS.
    摘要 regulAS 软件包是一款生物信息学工具,旨在支持计算生物学研究人员基于 TCGA 与 GTEx 项目所刻画的癌症与健康人类供体的大规模 RNA-Seq 数据,整合分析可变剪接改变的调控机制。本技术报告对 regulAS 进行了全面介绍,重点阐述其核心功能、基础模块、实验配置以及进一步的可扩展性与自定义能力。regulAS 的核心功能包括计算实验的自动化、结果的高效存储与处理,以及精简的工作流管理。集成的基础模块为其扩展了以下功能:从公共多组学数据仓库 UCSC Xena 获取 RNA-Seq 数据、基于 scikit-learn 包的预测建模与特征排序,以及灵活的报告生成,用于分析不同组织和癌种中的基因表达谱及相关的可变剪接异常。实验配置通过 YAML 文件结合 Hydra 与 OmegaConf 库完成,对用户十分友好。此外,regulAS 还允许开发并集成自定义模块以处理专门任务。总之,regulAS 为可变剪接与癌症生物学研究提供了自动化解决方案,提高了实验设计的效率、可重复性与可定制性,其流水线的可扩展性也使研究人员能够根据自身需求进一步定制该软件包。源代码以 MIT 许可证发布于 https://github.com/slipnitskaya/regulAS。

Reduced Kernel Dictionary Learning

  • paper_url: http://arxiv.org/abs/2307.08798
  • repo_url: https://github.com/denisilie94/rkdl
  • paper_authors: Denis C. Ilie-Ablachim, Bogdan Dumitrescu
  • for: 这篇论文旨在解决核字典学习在大规模数据集上面临的常见问题:核矩阵规模过大。
  • methods: 本文提出了一种新方法,通过对输入信号训练稀疏表示来获得规模更小的非线性表示;具体而言,在 KDL 过程中直接用梯度下降步骤优化核向量。
  • results: 在三个数据集上的实验表明,即使只使用少量核向量,我们的方法也能提供更好的表示,并且相比 KDL 降低了执行时间。
    Abstract In this paper we present new algorithms for training reduced-size nonlinear representations in the Kernel Dictionary Learning (KDL) problem. Standard KDL has the drawback of a large size of the kernel matrix when the data set is large. There are several ways of reducing the kernel size, notably Nystr\"om sampling. We propose here a method more in the spirit of dictionary learning, where the kernel vectors are obtained with a trained sparse representation of the input signals. Moreover, we optimize directly the kernel vectors in the KDL process, using gradient descent steps. We show with three data sets that our algorithms are able to provide better representations, despite using a small number of kernel vectors, and also decrease the execution time with respect to KDL.
    摘要 在这篇论文中,我们提出了用于在核字典学习(KDL)问题中训练缩减规模非线性表示的新算法。当数据集较大时,标准 KDL 的核矩阵会变得过大。减小核规模的方法有多种,其中较著名的是 Nyström 采样。我们在此提出一种更接近字典学习思路的方法:核向量通过对输入信号训练稀疏表示获得。此外,我们在 KDL 过程中使用梯度下降步骤直接优化核向量。在三个数据集上的实验表明,即使只使用少量核向量,我们的算法也能提供更好的表示,并且相比 KDL 降低了执行时间。
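摘要中提到 Nyström 采样是缩小核矩阵的常见方法之一。下面的 NumPy 草图仅演示这一基线思路:用少量"地标"信号的两个小核块近似完整核矩阵;它并非本文提出的 RKDL 训练过程,gamma 与地标数量等参数均为随意假设。

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 30))
landmarks = X[rng.choice(len(X), size=100, replace=False)]   # 少量"地标"信号

K_nm = rbf_kernel(X, landmarks, gamma=0.1)                   # (2000, 100)
K_mm = rbf_kernel(landmarks, landmarks, gamma=0.1)           # (100, 100)

# 完整核矩阵 K (2000 x 2000) 近似为 K_nm @ pinv(K_mm) @ K_nm.T,
# 因此只需存储上面两个小块。这里抽查第一行的近似误差:
approx_row0 = K_nm[0] @ np.linalg.pinv(K_mm) @ K_nm.T
exact_row0 = rbf_kernel(X[:1], X, gamma=0.1)[0]
print("max abs error on row 0:", np.abs(approx_row0 - exact_row0).max())
```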

Classification with Incoherent Kernel Dictionary Learning

  • paper_url: http://arxiv.org/abs/2307.08796
  • repo_url: https://github.com/denisilie94/incoherent-kernel-dictionary-learning
  • paper_authors: Denis C. Ilie-Ablachim, Bogdan Dumitrescu
  • for: 这个论文提出了一种基于字典学习(DL)的新的分类方法。
  • methods: 该方法给出了非相干字典学习(incoherent DL)的核版本,由其标准线性版本推导而来;此外,还改进了 AK-SVD 算法中的表示更新步骤。
  • results: 我们对多个流行的分类问题数据库进行了测试,并得到了优秀的结果。
    Abstract In this paper we present a new classification method based on Dictionary Learning (DL). The main contribution consists of a kernel version of incoherent DL, derived from its standard linear counterpart. We also propose an improvement of the AK-SVD algorithm concerning the representation update. Our algorithms are tested on several popular databases of classification problems.
    摘要 在这篇论文中,我们提出了一种基于字典学习(DL)的新分类方法。我们的主要贡献是由标准线性非相干 DL 推导出的核版本。此外,我们还提出了 AK-SVD 算法中表示更新步骤的改进。我们的算法在多个常用的分类问题数据库上进行了测试。

Non-Stationary Policy Learning for Multi-Timescale Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.08794
  • repo_url: None
  • paper_authors: Patrick Emami, Xiangyu Zhang, David Biagioni, Ahmed S. Zamzam
  • for: This paper is written for learning non-stationary policies in multi-timescale multi-agent reinforcement learning (MARL) environments.
  • methods: The paper proposes a simple framework for learning non-stationary policies, using available information about agent timescales to define a periodic time encoding. The proposed algorithm uses phase-functioned neural networks to parameterize the actor and critic, providing an inductive bias for periodicity.
  • results: The paper demonstrates the effectiveness of the proposed framework in learning multi-timescale policies through simulations in a gridworld and building energy management environment.
    Abstract In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effectively learn multi-timescale policies is validated on a gridworld and building energy management environment.
    摘要 在多时间尺度多智能体强化学习(MARL)中,智能体在不同的时间尺度上交互。一般而言,由多个时间尺度等时间依赖行为所诱导的策略是非平稳的。学习非平稳策略具有挑战性,通常需要复杂或低效的算法。受这一控制问题在现实复杂系统中普遍存在的启发,我们提出了一个用于学习多时间尺度 MARL 非平稳策略的简单框架。我们利用已知的智能体时间尺度信息来定义周期性时间编码,并从理论上证明,多时间尺度引入的非平稳效应可以由周期性多智能体策略学习得到。为学习这类策略,我们提出了一种策略梯度算法,使用相位函数神经网络来参数化 actor 与 critic,从而为周期性提供归纳偏置。该框架在网格世界和建筑能源管理环境中验证了其有效学习多时间尺度策略的能力。
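下面是一个极简的周期性时间编码示例(NumPy):利用已知的智能体时间尺度把时间步映射为 sin/cos 相位特征,供策略网络使用。论文采用的相位函数神经网络要复杂得多,此处仅示意"周期性归纳偏置"这一点;函数名与时间尺度取值均为假设。

```python
import numpy as np

def periodic_time_encoding(t, timescales):
    """把整数时间步 t 编码为每个时间尺度一对 sin/cos 相位。"""
    timescales = np.asarray(timescales, dtype=float)
    phases = 2.0 * np.pi * (float(t) % timescales) / timescales
    return np.concatenate([np.sin(phases), np.cos(phases)])

# 两个智能体分别每 5 步和每 24 步行动一次:编码随这些周期重复,
# 为策略提供周期性的归纳偏置。lcm(5, 24) = 120。
timescales = [5, 24]
print(periodic_time_encoding(7, timescales))
print(np.allclose(periodic_time_encoding(7, timescales),
                  periodic_time_encoding(7 + 120, timescales)))  # True
```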

Quarl: A Learning-Based Quantum Circuit Optimizer

  • paper_url: http://arxiv.org/abs/2307.10120
  • repo_url: None
  • paper_authors: Zikun Li, Jinjun Peng, Yixuan Mei, Sina Lin, Yi Wu, Oded Padon, Zhihao Jia
  • for: 优化量子Circuit是一个具有很大搜索空间的函数等价Circuit的优化问题,需要应用变换来实现最终性能提高。这篇论文介绍了Quarl,一种基于学习的量子Circuit优化器。
  • methods: Quarl 使用强化学习(RL)来优化量子电路,但 RL 在量子电路优化中面临两个主要挑战:巨大且多变的动作空间,以及不统一的状态表示。Quarl 通过一种新的神经网络架构和 RL 训练流程来解决这些问题。
  • results: 评估显示,Quarl 在绝大多数基准电路上显著超越了现有的电路优化器。此外,Quarl 还能学会执行旋转合并(rotation merging),这是一种复杂的非局部电路优化,在现有优化器中需要作为单独的优化步骤实现。
    Abstract Optimizing quantum circuits is challenging due to the very large search space of functionally equivalent circuits and the necessity of applying transformations that temporarily decrease performance to achieve a final performance improvement. This paper presents Quarl, a learning-based quantum circuit optimizer. Applying reinforcement learning (RL) to quantum circuit optimization raises two main challenges: the large and varying action space and the non-uniform state representation. Quarl addresses these issues with a novel neural architecture and RL-training procedure. Our neural architecture decomposes the action space into two parts and leverages graph neural networks in its state representation, both of which are guided by the intuition that optimization decisions can be mostly guided by local reasoning while allowing global circuit-wide reasoning. Our evaluation shows that Quarl significantly outperforms existing circuit optimizers on almost all benchmark circuits. Surprisingly, Quarl can learn to perform rotation merging, a complex, non-local circuit optimization implemented as a separate pass in existing optimizers.
    摘要 优化量子电路是一项具有挑战性的任务:功能等价电路的搜索空间极其庞大,而且常常需要先应用暂时降低性能的变换,才能获得最终的性能提升。本文介绍 Quarl,一种基于学习的量子电路优化器。将强化学习(RL)应用于量子电路优化面临两个主要挑战:巨大且多变的动作空间,以及不统一的状态表示。Quarl 通过一种新的神经网络架构和 RL 训练流程来解决这些问题:其神经网络架构将动作空间分解为两部分,并在状态表示中使用图神经网络;两者都基于这样的直觉,即优化决策大多可以由局部推理引导,同时仍允许全电路范围的全局推理。评估表明,Quarl 在绝大多数基准电路上显著超越了现有优化器。令人惊讶的是,Quarl 还能学会执行旋转合并,这是一种复杂的非局部电路优化,在现有优化器中以单独的优化步骤实现。

A DPLL(T) Framework for Verifying Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10266
  • repo_url: https://github.com/dynaroars/neuralsat-solver
  • paper_authors: Hai Duong, Linhan Li, ThanhVu Nguyen, Matthew Dwyer
  • for: 这个论文是为了提出一种新的深度神经网络验证方法,帮助检测和修复神经网络中的漏洞和攻击。
  • methods: 该方法基于DPLL(T)算法,包括冲突学习、抽象和理论解决,可以看作是一种基于SMT的神经网络验证框架。
  • results: 初步结果表明,NeuralSAT 原型与当前最先进的验证器相比具有竞争力。我们希望经过适当的优化与工程化,NeuralSAT 能够把现代 SAT/SMT 求解器的能力与成功带到神经网络验证领域。
    Abstract Deep Neural Networks (DNNs) have emerged as an effective approach to tackling real-world problems. However, like human-written software, automatically-generated DNNs can have bugs and be attacked. This thus attracts many recent interests in developing effective and scalable DNN verification techniques and tools. In this work, we introduce a NeuralSAT, a new constraint solving approach to DNN verification. The design of NeuralSAT follows the DPLL(T) algorithm used modern SMT solving, which includes (conflict) clause learning, abstraction, and theory solving, and thus NeuralSAT can be considered as an SMT framework for DNNs. Preliminary results show that the NeuralSAT prototype is competitive to the state-of-the-art. We hope, with proper optimization and engineering, NeuralSAT will carry the power and success of modern SAT/SMT solvers to DNN verification. NeuralSAT is avaliable from: https://github.com/dynaroars/neuralsat-solver
    摘要 深度神经网络(DNN)已成为解决现实世界问题的有效方法。然而,如人工写的软件一样,自动生成的 DNN 也可能具有错误和攻击性。这引起了许多最近的关注,旨在开发有效和扩展性的 DNN 验证技术和工具。在这项工作中,我们介绍了一种名为 NeuralSAT 的新的约束解决方法。NeuralSAT 的设计基于现代 SMT 解决方法中的 DPLL(T) 算法,包括(冲突)条件学习、抽象和理论解决,因此 NeuralSAT 可以视为 DNN 的 SMT 框架。初步结果表明,NeuralSAT 原型在竞争力方面与现状保持紧密。我们希望,通过适当的优化和工程,NeuralSAT 能够将现代 SAT/SMT 解决方法的力量和成功带到 DNN 验证中。NeuralSAT 可以从以下地址获取:https://github.com/dynaroars/neuralsat-solver
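为说明 DPLL(T) 所扩展的命题逻辑骨架,下面给出一个极简的 DPLL SAT 求解器草图(仅含单元传播与分支,不含冲突子句学习与理论求解),它只是教科书式的示意,不是 NeuralSAT 本身。

```python
def dpll(clauses, assignment=None):
    """clauses: 由非零整数组成的子句列表(负数表示取反的变量)。"""
    assignment = dict(assignment or {})
    changed = True
    while changed:                                   # 单元传播
        changed = False
        simplified = []
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                             # 子句已满足
            rest = [l for l in clause if abs(l) not in assignment]
            if not rest:
                return None                          # 冲突
            if len(rest) == 1:
                assignment[abs(rest[0])] = rest[0] > 0
                changed = True
            simplified.append(rest)
        clauses = simplified

    if not clauses:
        return assignment                            # 所有子句均已满足
    var = abs(clauses[0][0])                         # 选一个未赋值变量进行分支
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(dpll([[1, 2], [-1, 3], [-2, -3]]))
```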

A mixed policy to improve performance of language models on math problems

  • paper_url: http://arxiv.org/abs/2307.08767
  • repo_url: https://github.com/vividitytech/math_lm_rl
  • paper_authors: Gang Chen
  • for: 解决 math 问题时,语言模型通常采用采样策略来预测下一个词的 conditional probabilities。但是,在 math 理解步骤中,这种方法可能会导致错误答案。因此,我们提出了一种混合策略探索方法,使用 reinforcement learning 解决 math 问题。
  • methods: 我们提出了一种两级 token 探索策略:抽象层的策略以概率采样决定下一个 token 是运算符还是操作数,而第二层则以贪心方式确定性地选择该类别中得分最高的 token。
  • results: 我们在 GSM8K 数据集上测试了我们的方法,使用 GPT-2 模型,并证明了更高于 $2%$ 的性能提升。我们的实现可以在 https://github.com/vividitytech/math_lm_rl 上找到。
    Abstract When to solve math problems, most language models take a sampling strategy to predict next word according conditional probabilities. In the math reasoning step, it may generate wrong answer. Considering math problems are deterministic, we propose a mixed policy exploration approach to solve math problems with reinforcement learning. In peculiar, we propose a two level token exploration policy: the abstract level explores next token with probability and the second level is deterministic. Specifically, the abstract level policy will decide whether the token is operator or operand with probability sampling, while the second level is deterministic to select next token with the highest score in a greedy way. We test our method on GSM8K dataset with GPT-2 model, and demonstrate more than $2\%$ performance gain. Our implementation is available at https://github.com/vividitytech/math_lm_rl.
    摘要 在求解数学题时,大多数语言模型按条件概率采样来预测下一个词,这在数学推理步骤中可能产生错误答案。考虑到数学题是确定性的,我们提出一种混合策略探索方法,利用强化学习来求解数学题。具体而言,我们提出两级 token 探索策略:抽象层以概率采样决定下一个 token 是运算符还是操作数,第二层则以贪心方式确定性地选择得分最高的 token。我们在 GSM8K 数据集上用 GPT-2 模型测试了该方法,取得了超过 2% 的性能提升。实现代码见 https://github.com/vividitytech/math_lm_rl 。
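下面用一个玩具词表示意两级 token 策略的思路:第一层按聚合概率随机决定"运算符还是操作数",第二层在该类别内贪心取分最高的 token。词表、logits 与函数名均为示意性假设,并非论文实现。

```python
import numpy as np

vocab = ["+", "-", "*", "2", "3", "x"]
is_operator = np.array([True, True, True, False, False, False])

def next_token(token_logits, rng):
    probs = np.exp(token_logits - token_logits.max())
    probs /= probs.sum()

    # 第一层(随机):按聚合概率采样"运算符 vs. 操作数"
    p_operator = probs[is_operator].sum()
    choose_operator = rng.random() < p_operator

    # 第二层(确定性):在选中的类别内贪心取概率最大的 token
    group = is_operator if choose_operator else ~is_operator
    idx = np.flatnonzero(group)[probs[group].argmax()]
    return vocab[idx]

rng = np.random.default_rng(0)
print(next_token(np.array([0.1, 0.2, 0.3, 1.5, 0.4, 0.2]), rng))
```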

Quality Assessment of Photoplethysmography Signals For Cardiovascular Biomarkers Monitoring Using Wearable Devices

  • paper_url: http://arxiv.org/abs/2307.08766
  • repo_url: None
  • paper_authors: Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Douglas A. Almeida, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: 本研究旨在评估光电容积脉搏波(PPG)信号的质量,以支持利用可穿戴设备监测心血管生物标志物。
  • methods: 从 PPG 信号中提取 27 个统计特征,用于训练机器学习模型(XGBoost、CatBoost 和随机森林),以区分被标注为高质量或低质量的 PPG 信号。
  • results: XGBoost、CatBoost 和随机森林的 Se、PPV 和 F1-score 分别为 94.4/95.6/95.0、94.7/95.9/95.3 和 93.7/91.3/92.5,与文献中的最新水平相当但模型更为简单,表明机器学习模型有望用于开发远程、无创的连续测量设备。
    Abstract Photoplethysmography (PPG) is a non-invasive technology that measures changes in blood volume in the microvascular bed of tissue. It is commonly used in medical devices such as pulse oximeters and wrist worn heart rate monitors to monitor cardiovascular hemodynamics. PPG allows for the assessment of parameters (e.g., heart rate, pulse waveform, and peripheral perfusion) that can indicate conditions such as vasoconstriction or vasodilation, and provides information about microvascular blood flow, making it a valuable tool for monitoring cardiovascular health. However, PPG is subject to a number of sources of variations that can impact its accuracy and reliability, especially when using a wearable device for continuous monitoring, such as motion artifacts, skin pigmentation, and vasomotion. In this study, we extracted 27 statistical features from the PPG signal for training machine-learning models based on gradient boosting (XGBoost and CatBoost) and Random Forest (RF) algorithms to assess quality of PPG signals that were labeled as good or poor quality. We used the PPG time series from a publicly available dataset and evaluated the algorithm s performance using Sensitivity (Se), Positive Predicted Value (PPV), and F1-score (F1) metrics. Our model achieved Se, PPV, and F1-score of 94.4, 95.6, and 95.0 for XGBoost, 94.7, 95.9, and 95.3 for CatBoost, and 93.7, 91.3 and 92.5 for RF, respectively. Our findings are comparable to state-of-the-art reported in the literature but using a much simpler model, indicating that ML models are promising for developing remote, non-invasive, and continuous measurement devices.
    摘要 光电容积脉搏波(PPG)是一种无创技术,用于测量组织微血管床中血容量的变化。它常用于脉搏血氧仪和腕戴式心率监测器等医疗设备,以监测心血管血流动力学。PPG 可以评估心率、脉搏波形和外周灌注等参数,这些参数能够提示血管收缩或血管舒张等状态,并提供微血管血流的信息,因此是监测心血管健康的有用工具。然而,PPG 受到多种变化来源的影响,尤其是在使用可穿戴设备进行连续监测时,例如运动伪影、皮肤色素沉着和血管舒缩。在本研究中,我们从 PPG 信号中提取了 27 个统计特征,用于训练基于梯度提升(XGBoost 和 CatBoost)以及随机森林(RF)的机器学习模型,以区分被标注为高质量或低质量的 PPG 信号。我们使用公开数据集中的 PPG 时间序列,并用灵敏度(Se)、阳性预测值(PPV)和 F1 分数评估算法性能。XGBoost、CatBoost 和 RF 的 Se、PPV 和 F1 分数分别为 94.4/95.6/95.0、94.7/95.9/95.3 和 93.7/91.3/92.5。我们的结果与文献中的最新水平相当,但所用模型简单得多,表明机器学习模型有望用于开发远程、无创、连续测量的设备。
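下面的草图只演示整体流程:从 PPG 窗口提取若干统计特征、训练树集成分类器,并用 scikit-learn 计算 Se(召回率)、PPV(精确率)与 F1。这里的特征与合成数据均为占位示例,并非论文使用的 27 个特征或其数据集。

```python
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score, f1_score

def window_features(sig):
    # 每个 PPG 窗口的简单统计特征(占位,非论文的 27 个特征)
    return [sig.mean(), sig.std(), stats.skew(sig), stats.kurtosis(sig),
            np.ptp(sig), np.median(np.abs(np.diff(sig)))]

rng = np.random.default_rng(0)
good = [np.sin(np.linspace(0, 8 * np.pi, 300)) + 0.05 * rng.normal(size=300)
        for _ in range(200)]                       # 近似周期的"高质量"波形
poor = [rng.normal(size=300) for _ in range(200)]  # 类似运动伪影的噪声
X = np.array([window_features(s) for s in good + poor])
y = np.array([1] * 200 + [0] * 200)                # 1 = 高质量

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("Se (recall):", recall_score(y_te, pred),
      "PPV (precision):", precision_score(y_te, pred),
      "F1:", f1_score(y_te, pred))
```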

A Novel Application of Conditional Normalizing Flows: Stellar Age Inference with Gyrochronology

  • paper_url: http://arxiv.org/abs/2307.08753
  • repo_url: None
  • paper_authors: Phil Van-Lane, Joshua S. Speagle, Stephanie Douglas
  • for: 用于推算低质量主序星的年龄
  • methods: 将条件归一化流(conditional normalizing flows)应用于疏散星团的测光数据,以数据驱动的方式约束 gyrochronology 年龄
  • results: 推断得到的年龄与文献值吻合良好,展示了概率性数据驱动方法在恒星年龄推算中的可靠性
    Abstract Stellar ages are critical building blocks of evolutionary models, but challenging to measure for low mass main sequence stars. An unexplored solution in this regime is the application of probabilistic machine learning methods to gyrochronology, a stellar dating technique that is uniquely well suited for these stars. While accurate analytical gyrochronological models have proven challenging to develop, here we apply conditional normalizing flows to photometric data from open star clusters, and demonstrate that a data-driven approach can constrain gyrochronological ages with a precision comparable to other standard techniques. We evaluate the flow results in the context of a Bayesian framework, and show that our inferred ages recover literature values well. This work demonstrates the potential of a probabilistic data-driven solution to widen the applicability of gyrochronological stellar dating.
    摘要 恒星年龄是恒星演化模型的关键组成部分,但对低质量主序星而言却难以测量。在这一领域尚未被探索的一种方案,是将概率机器学习方法应用于 gyrochronology(自转定年法),这是一种特别适合此类恒星的定年技术。尽管精确的解析 gyrochronology 模型一直难以建立,我们将条件归一化流应用于疏散星团的测光数据,证明数据驱动的方法能够以与其他标准技术相当的精度约束 gyrochronology 年龄。我们在贝叶斯框架下评估流模型的结果,发现推断得到的年龄与文献值吻合良好。这项工作展示了概率性数据驱动方案在扩展 gyrochronology 恒星定年适用范围方面的潜力。

Flow Matching in Latent Space

  • paper_url: http://arxiv.org/abs/2307.08698
  • repo_url: https://github.com/vinairesearch/lfm
  • paper_authors: Quan Dao, Hao Phung, Binh Nguyen, Anh Tran
  • for: train generative models with improved computational efficiency and scalability for high-resolution image synthesis
  • methods: apply flow matching in the latent spaces of pretrained autoencoders, integrate various conditions for conditional generation tasks
  • results: effective in both quantitative and qualitative results on various datasets, provide theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and true data distribution
    Abstract Flow matching is a recent framework to train generative models that exhibits impressive empirical performance while being relatively easier to train compared with diffusion-based models. Despite its advantageous properties, prior methods still face the challenges of expensive computing and a large number of function evaluations of off-the-shelf solvers in the pixel space. Furthermore, although latent-based generative methods have shown great success in recent years, this particular model type remains underexplored in this area. In this work, we propose to apply flow matching in the latent spaces of pretrained autoencoders, which offers improved computational efficiency and scalability for high-resolution image synthesis. This enables flow-matching training on constrained computational resources while maintaining their quality and flexibility. Additionally, our work stands as a pioneering contribution in the integration of various conditions into flow matching for conditional generation tasks, including label-conditioned image generation, image inpainting, and semantic-to-image generation. Through extensive experiments, our approach demonstrates its effectiveness in both quantitative and qualitative results on various datasets, such as CelebA-HQ, FFHQ, LSUN Church & Bedroom, and ImageNet. We also provide a theoretical control of the Wasserstein-2 distance between the reconstructed latent flow distribution and true data distribution, showing it is upper-bounded by the latent flow matching objective. Our code will be available at https://github.com/VinAIResearch/LFM.git.
    摘要 流匹配(flow matching)是近期提出的一种生成模型训练框架,具有出色的实验性能,且相比基于扩散的模型更容易训练。尽管具有这些优点,现有方法在像素空间中仍面临计算代价高昂、需要对现成求解器进行大量函数评估的问题。另一方面,基于隐空间的生成方法近年来取得了巨大成功,但这一类模型在该方向上仍未被充分探索。在这项工作中,我们提出在预训练自编码器的隐空间中应用流匹配,从而提升高分辨率图像合成的计算效率与可扩展性。这使得流匹配可以在受限的计算资源上训练,同时保持其质量与灵活性。此外,我们的工作率先将多种条件整合进流匹配,以支持条件生成任务,包括类别条件图像生成、图像修补以及语义到图像生成。大量实验表明,我们的方法在 CelebA-HQ、FFHQ、LSUN Church & Bedroom 和 ImageNet 等多个数据集上,在定量与定性结果上均表现出色。我们还给出了重建隐空间流分布与真实数据分布之间 Wasserstein-2 距离的理论控制,证明其上界由隐空间流匹配目标给出。代码将发布于 https://github.com/VinAIResearch/LFM.git。
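下面是一个最小化的流匹配训练步骤草图(PyTorch),采用常见的线性插值路径:在噪声端点与数据(此处用玩具"隐变量")之间插值,并回归该路径的速度场。论文中的自编码器、条件机制与网络结构在此全部省略,网络与超参数均为假设。

```python
import torch
import torch.nn as nn

dim = 8
velocity_net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))
opt = torch.optim.Adam(velocity_net.parameters(), lr=1e-3)

def flow_matching_step(z1):
    """z1: 一个批次的"隐变量",视作数据分布的样本。"""
    z0 = torch.randn_like(z1)                  # 噪声端点
    t = torch.rand(z1.size(0), 1)              # [0, 1] 内的插值时间
    zt = (1.0 - t) * z0 + t * z1               # 直线路径上的点
    target_v = z1 - z0                         # 该路径的速度
    pred_v = velocity_net(torch.cat([zt, t], dim=1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

for _ in range(100):
    loss = flow_matching_step(torch.randn(128, dim) * 2.0 + 1.0)
print("final loss:", loss)
```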

A Multiobjective Reinforcement Learning Framework for Microgrid Energy Management

  • paper_url: http://arxiv.org/abs/2307.08692
  • repo_url: None
  • paper_authors: M. Vivienne Liu, Patrick M. Reed, David Gold, Garret Quist, C. Lindsay Anderson
  • for: 提供一种能够处理多目标冲突的微电网运行方法
  • methods: 利用外生信息和数据驱动的强化学习来探索高维目标空间,揭示相互冲突目标之间的权衡
  • results: 在所有考虑的目标上均优于现状运行方式,并提供多样、自适应且可解释的运行策略
    Abstract The emergence of microgrids (MGs) has provided a promising solution for decarbonizing and decentralizing the power grid, mitigating the challenges posed by climate change. However, MG operations often involve considering multiple objectives that represent the interests of different stakeholders, leading to potentially complex conflicts. To tackle this issue, we propose a novel multi-objective reinforcement learning framework that explores the high-dimensional objective space and uncovers the tradeoffs between conflicting objectives. This framework leverages exogenous information and capitalizes on the data-driven nature of reinforcement learning, enabling the training of a parametric policy without the need for long-term forecasts or knowledge of the underlying uncertainty distribution. The trained policies exhibit diverse, adaptive, and coordinative behaviors with the added benefit of providing interpretable insights on the dynamics of their information use. We employ this framework on the Cornell University MG (CU-MG), which is a combined heat and power MG, to evaluate its effectiveness. The results demonstrate performance improvements in all objectives considered compared to the status quo operations and offer more flexibility in navigating complex operational tradeoffs.
    摘要 微电网(MG)的出现为电网的脱碳与去中心化提供了有前景的解决方案,有助于应对气候变化带来的挑战。然而,微电网运行往往需要同时考虑代表不同利益相关方的多个目标,这些目标之间可能存在复杂的冲突。为了解决这一问题,我们提出了一种新的多目标强化学习框架,用于探索高维目标空间并揭示相互冲突目标之间的权衡。该框架利用外生信息,并充分发挥强化学习的数据驱动特性,无需长期预测或不确定性分布的先验知识即可训练参数化策略。训练得到的策略表现出多样、自适应且相互协调的行为,并能对其信息使用方式给出可解释的洞见。我们在康奈尔大学微电网(CU-MG,一个热电联产微电网)上应用该框架以评估其有效性。结果表明,与现状运行方式相比,所有考虑的目标均获得了性能提升,并为权衡复杂的运行取舍提供了更大的灵活性。

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

  • paper_url: http://arxiv.org/abs/2307.08691
  • repo_url: https://github.com/dao-ailab/flash-attention
  • paper_authors: Tri Dao
  • for: 提高Transformers的Sequence length scaling,以提高语言模型和高分辨率图像理解的性能,以及开启代码、音频和视频生成等新应用。
  • methods: 利用 GPU 非对称内存层次结构,在不做任何近似的情况下,将内存占用从二次降为线性并带来 2-4 倍的运行时加速;FlashAttention-2 进一步改进了线程块与 warp 之间的工作划分。
  • results: 对比于优化baselines,FlashAttention-2可以达到2-4倍的运行时间减速,并且在A100 GPU上达到50-73%的理论最大FLOPs/s,接近GEMM操作的效率。
    Abstract Scaling Transformers to longer sequence lengths has been a major problem in the last several years, promising to improve performance in language modeling and high-resolution image understanding, as well as to unlock new applications in code, audio, and video generation. The attention layer is the main bottleneck in scaling to longer sequences, as its runtime and memory increase quadratically in the sequence length. FlashAttention exploits the asymmetric GPU memory hierarchy to bring significant memory saving (linear instead of quadratic) and runtime speedup (2-4$\times$ compared to optimized baselines), with no approximation. However, FlashAttention is still not nearly as fast as optimized matrix-multiply (GEMM) operations, reaching only 25-40\% of the theoretical maximum FLOPs/s. We observe that the inefficiency is due to suboptimal work partitioning between different thread blocks and warps on the GPU, causing either low-occupancy or unnecessary shared memory reads/writes. We propose FlashAttention-2, with better work partitioning to address these issues. In particular, we (1) tweak the algorithm to reduce the number of non-matmul FLOPs (2) parallelize the attention computation, even for a single head, across different thread blocks to increase occupancy, and (3) within each thread block, distribute the work between warps to reduce communication through shared memory. These yield around 2$\times$ speedup compared to FlashAttention, reaching 50-73\% of the theoretical maximum FLOPs/s on A100 and getting close to the efficiency of GEMM operations. We empirically validate that when used end-to-end to train GPT-style models, FlashAttention-2 reaches training speed of up to 225 TFLOPs/s per A100 GPU (72\% model FLOPs utilization).
    摘要 将 Transformer 扩展到更长的序列长度是近几年的一个核心问题,它有望提升语言建模和高分辨率图像理解的性能,并开启代码、音频和视频生成等新应用。注意力层是扩展到长序列的主要瓶颈:其运行时间和内存随序列长度呈二次增长。FlashAttention 利用 GPU 非对称的内存层次结构,带来显著的内存节省(线性而非二次)和运行时加速(相比优化过的基线提速 2-4 倍),且不引入任何近似。然而,FlashAttention 的速度仍远不及优化过的矩阵乘法(GEMM)操作,仅能达到理论峰值 FLOPs/s 的 25-40%。我们观察到,这种低效源于 GPU 上不同线程块与 warp 之间的工作划分不够理想,导致占用率偏低或产生不必要的共享内存读写。为此我们提出 FlashAttention-2,通过更好的工作划分来解决这些问题:(1)调整算法以减少非矩阵乘法的 FLOPs;(2)即使对单个注意力头,也将注意力计算并行到不同的线程块,以提高占用率;(3)在每个线程块内部,将工作分配给各个 warp,以减少经由共享内存的通信。这些改进相比 FlashAttention 带来约 2 倍的加速,在 A100 上达到理论峰值 FLOPs/s 的 50-73%,接近 GEMM 操作的效率。我们通过实验验证,端到端用于训练 GPT 风格模型时,FlashAttention-2 在每块 A100 GPU 上可达 225 TFLOPs/s 的训练速度(模型 FLOPs 利用率 72%)。
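FlashAttention 系列内核依赖的关键技巧之一是分块的"在线 softmax":按块流式处理键/值,同时维护运行中的最大值与归一化因子,从而得到与一次性 softmax 完全一致的结果。下面的 NumPy 草图只演示这一数值技巧,不涉及线程块/warp 划分等 GPU 细节,分块大小为随意假设。

```python
import numpy as np

def blockwise_attention(q, K, V, block=64):
    """q: (d,), K/V: (n, d)。分块计算 softmax(q @ K.T) @ V,结果与一次性计算一致。"""
    m, l = -np.inf, 0.0                  # 运行中的最大分数与 softmax 归一化因子
    acc = np.zeros(V.shape[1])           # 运行中的未归一化输出
    for start in range(0, len(K), block):
        s = q @ K[start:start + block].T
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)        # 把旧的累积量重新缩放到新的最大值下
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[start:start + block]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=8), rng.normal(size=(1000, 8)), rng.normal(size=(1000, 8))
s = q @ K.T
reference = (np.exp(s - s.max()) / np.exp(s - s.max()).sum()) @ V
print(np.allclose(blockwise_attention(q, K, V), reference))   # True
```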

COLLIE: Systematic Construction of Constrained Text Generation Tasks

  • paper_url: http://arxiv.org/abs/2307.08689
  • repo_url: https://github.com/princeton-nlp/Collie
  • paper_authors: Shunyu Yao, Howard Chen, Austin W. Hanjie, Runzhe Yang, Karthik Narasimhan
  • for: 本研究旨在提供一种 grammar-based 框架,用于 specifying 复杂的、compositional 约束,以便在自然语言处理中进行 Text generation under constraints。
  • methods: 本研究使用了 grammar-based 框架 COLLIE,可以Specify 多种层次的约束(word、sentence、paragraph、passage)和模型挑战(语言理解、逻辑推理、计数、semantic planning)。此外,还开发了一些自动提取任务实例的工具,以便使用 COLLIE 进行数据生成。
  • results: 通过使用 COLLIE,研究人员编译了 COLLIE-v1 数据集,包含 2080 个任务实例,其中每个任务实例包含 13 种约束结构。通过对 five 种 instruction-tuned 语言模型进行系统性的实验和分析,发现这些模型在处理 COLLIE 数据集时存在缺陷。 COLLIE 框架设计为轻量级和可扩展,希望社区可以通过开发更复杂的约束和评价方法来进一步提高自然语言处理技术。
    Abstract Text generation under constraints have seen increasing interests in natural language processing, especially with the rapidly improving capabilities of large language models. However, existing benchmarks for constrained generation usually focus on fixed constraint types (e.g.,generate a sentence containing certain words) that have proved to be easy for state-of-the-art models like GPT-4. We present COLLIE, a grammar-based framework that allows the specification of rich, compositional constraints with diverse generation levels (word, sentence, paragraph, passage) and modeling challenges (e.g.,language understanding, logical reasoning, counting, semantic planning). We also develop tools for automatic extraction of task instances given a constraint structure and a raw text corpus. Using COLLIE, we compile the COLLIE-v1 dataset with 2080 instances comprising 13 constraint structures. We perform systematic experiments across five state-of-the-art instruction-tuned language models and analyze their performances to reveal shortcomings. COLLIE is designed to be extensible and lightweight, and we hope the community finds it useful to develop more complex constraints and evaluations in the future.
    摘要 约束文本生成在自然语言处理中受到越来越多的关注,尤其是在大语言模型能力快速提升的背景下。然而,现有的约束生成基准通常只关注固定类型的约束(例如生成包含特定词语的句子),而这类约束对 GPT-4 等最先进模型来说已被证明过于简单。我们提出 COLLIE,一个基于语法的框架,允许指定丰富的组合式约束,涵盖不同的生成层级(词、句、段落、篇章)和建模挑战(例如语言理解、逻辑推理、计数、语义规划)。我们还开发了工具,可在给定约束结构和原始文本语料的情况下自动抽取任务实例。借助 COLLIE,我们构建了 COLLIE-v1 数据集,包含 13 种约束结构、共 2080 个实例。我们在五个经过指令微调的最先进语言模型上进行了系统实验,并分析其表现以揭示不足之处。COLLIE 的设计轻量且可扩展,我们希望社区能够利用它在未来开发更复杂的约束与评测。
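下面用几行 Python 示意"组合式约束"的含义:把词级、句级检查组合成一个约束并对候选文本求值。这只是概念演示,并非 COLLIE 的语法或 API,函数名均为假设。

```python
import re

def word_count_between(lo, hi):
    return lambda text: lo <= len(text.split()) <= hi

def contains_words(*words):
    return lambda text: all(w.lower() in text.lower().split() for w in words)

def sentence_count_is(n):
    return lambda text: len([s for s in re.split(r"[.!?]+", text) if s.strip()]) == n

def all_of(*checks):
    # 组合式约束:所有子约束同时满足
    return lambda text: all(check(text) for check in checks)

constraint = all_of(sentence_count_is(2), contains_words("model"), word_count_between(8, 20))
print(constraint("The model was trained overnight. It converged quickly."))  # True
print(constraint("Short text."))                                             # False
```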

An R package for parametric estimation of causal effects

  • paper_url: http://arxiv.org/abs/2307.08686
  • repo_url: None
  • paper_authors: Joshua Wolff Anderson, Cyril Rakovski
  • for: 本文旨在介绍R包CausalModels,用于估计 causal effect。
  • methods: 本文实现了多种常用的统计方法,包括标准化、逆概率(IP)加权、G-估计、结局回归、工具变量和倾向评分匹配等。
  • results: 本文提供了一个简单易用的框架,在单个 R 包中为多种估计因果效应的统计方法提供一致的建模流程。
    Abstract This article explains the usage of R package CausalModels, which is publicly available on the Comprehensive R Archive Network. While packages are available for sufficiently estimating causal effects, there lacks a package that provides a collection of structural models using the conventional statistical approach developed by Hernan and Robins (2020). CausalModels addresses this deficiency of software in R concerning causal inference by offering tools for methods that account for biases in observational data without requiring extensive statistical knowledge. These methods should not be ignored and may be more appropriate or efficient in solving particular problems. While implementations of these statistical models are distributed among a number of causal packages, CausalModels introduces a simple and accessible framework for a consistent modeling pipeline among a variety of statistical methods for estimating causal effects in a single R package. It consists of common methods including standardization, IP weighting, G-estimation, outcome regression, instrumental variables and propensity matching.
    摘要 本文介绍 R 软件包 CausalModels 的使用方法,该包已在 Comprehensive R Archive Network 上公开。尽管已有若干软件包可以较好地估计因果效应,但一直缺少一个按照 Hernan 与 Robins(2020)发展的传统统计方法提供一系列结构模型的软件包。CausalModels 弥补了 R 在因果推断方面的这一空缺,为处理观测数据偏倚的方法提供了无需深厚统计知识即可使用的工具;这些方法不应被忽视,在特定问题上可能更合适或更高效。虽然这些统计模型的实现分散在多个因果推断软件包中,CausalModels 在单个 R 包内为标准化、逆概率加权、G-估计、结局回归、工具变量与倾向评分匹配等多种方法引入了一个简单易用、流程一致的建模框架。
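CausalModels 是 R 包;为说明其中列出的一种方法(逆概率加权)的思路,下面给出一个与该包无关的通用 Python 草图:先用逻辑回归估计倾向评分,再用逆概率权重估计平均处理效应。数据为模拟数据,真实效应设为 2,仅作示意。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                                   # 混杂变量
propensity_true = 1 / (1 + np.exp(-(x[:, 0] - 0.5 * x[:, 1])))
a = rng.binomial(1, propensity_true)                          # 处理
y = 2.0 * a + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)    # 真实效应 = 2

# 估计倾向评分,然后按 1/P(A=a|X) 对样本加权
ps = LogisticRegression().fit(x, a).predict_proba(x)[:, 1]
w = np.where(a == 1, 1 / ps, 1 / (1 - ps))
ate = np.average(y, weights=w * (a == 1)) - np.average(y, weights=w * (a == 0))
print("IP-weighted ATE estimate:", round(ate, 2))             # 应接近 2.0
```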

A Rubik’s Cube inspired approach to Clifford synthesis

  • paper_url: http://arxiv.org/abs/2307.08684
  • repo_url: https://github.com/gshartnett/rubiks-clifford-synthesis
  • paper_authors: Ning Bao, Gavin S. Hartnett
  • for: 解决Clifford元素的分解问题,即Clifford合成问题。
  • methods: 采用机器学习方法,通过学习"到单位元距离"的近似来引导 Clifford 合成。
  • results: 比现有算法更具有灵活性,可以适应特定设备的gate集、设备拓扑和gate精度。
    Abstract The problem of decomposing an arbitrary Clifford element into a sequence of Clifford gates is known as Clifford synthesis. Drawing inspiration from similarities between this and the famous Rubik's Cube problem, we develop a machine learning approach for Clifford synthesis based on learning an approximation to the distance to the identity. This approach is probabilistic and computationally intensive. However, when a decomposition is successfully found, it often involves fewer gates than existing synthesis algorithms. Additionally, our approach is much more flexible than existing algorithms in that arbitrary gate sets, device topologies, and gate fidelities may incorporated, thus allowing for the approach to be tailored to a specific device.
    摘要 将任意 Clifford 元素分解为一系列 Clifford 门的问题称为 Clifford 合成。受该问题与著名的魔方问题之间相似性的启发,我们提出一种机器学习方法,通过学习"到单位元距离"的近似来进行 Clifford 合成。该方法是概率性的且计算开销较大;但当成功找到分解时,所用门数往往少于现有合成算法。此外,该方法比现有算法灵活得多:可以纳入任意门集合、设备拓扑与门保真度,从而针对特定设备进行定制。

Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations

  • paper_url: http://arxiv.org/abs/2307.08678
  • repo_url: None
  • paper_authors: Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown
  • for: 这个论文旨在研究大型自然语言模型(LLM)是否可以解释自己的决策过程。
  • methods: 作者提出用反事实可模拟性(counterfactual simulatability)来评估自然语言解释,即检验解释能否帮助人类准确推断模型在所解释输入的各种反事实变体上的输出,并据此实现了精确度与普适性两个指标。
  • results: 研究发现,LLM 解释的精确度较低,且精确度与解释的合理性并不相关,因此单纯优化人类认可(例如 RLHF)可能并不足够。
    Abstract Large language models (LLMs) are trained to imitate humans to explain human decisions. However, do LLMs explain themselves? Can they help humans build mental models of how LLMs process different inputs? To answer these questions, we propose to evaluate $\textbf{counterfactual simulatability}$ of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input. For example, if a model answers "yes" to the input question "Can eagles fly?" with the explanation "all birds can fly", then humans would infer from the explanation that it would also answer "yes" to the counterfactual input "Can penguins fly?". If the explanation is precise, then the model's answer should match humans' expectations. We implemented two metrics based on counterfactual simulatability: precision and generality. We generated diverse counterfactuals automatically using LLMs. We then used these metrics to evaluate state-of-the-art LLMs (e.g., GPT-4) on two tasks: multi-hop factual reasoning and reward modeling. We found that LLM's explanations have low precision and that precision does not correlate with plausibility. Therefore, naively optimizing human approvals (e.g., RLHF) may not be a sufficient solution.
    摘要 大型自然语言模型(LLM)在训练时尝试模仿人类的决策,但是 LLM 是否能够解释自己的处理逻辑?可以使用 counterfactual simulatability 来评估 LLM 的解释能力。我们定义 counterfactual simulatability 为:一个解释是否能够帮助人类建立模型处理输入的精准模型。例如,如果一个模型对 input 问题 "Can eagles fly?" 的答案是 "yes",并且提供解释 "all birds can fly",那么人类就可以从解释中推断出模型对 counterfactual input "Can penguins fly?" 的答案是什么。如果解释准确,那么模型的答案应该与人类的预期相符。为了评估 LLM 的 counterfactual simulatability,我们提出了两种指标:精度和通用性。我们使用 LLM 自动生成了多个 counterfactual,然后使用这些指标来评估当前 state-of-the-art LLM (例如 GPT-4)在 multi-hop factual reasoning 和 reward modeling 两个任务上的表现。我们发现 LLM 的解释准确率很低,而且准确率与可能性无关。因此,直接优化人类的批准(例如 RLHF)可能并不是一个充分的解决方案。
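下面用一个手工构造的小例子示意"精确度"指标的含义:在生成的反事实输入上,读者依据解释推断出的答案与模型实际输出一致的比例。数据纯属占位,并非论文的评测数据或评测流程。

```python
counterfactual_evals = [
    # (依据解释推断出的答案, 模型的实际答案)
    ("yes", "yes"),   # "Can sparrows fly?"
    ("yes", "no"),    # "Can penguins fly?"  解释"所有鸟都会飞"在此过度泛化
    ("yes", "yes"),   # "Can hawks fly?"
]

def simulatability_precision(pairs):
    return sum(inferred == actual for inferred, actual in pairs) / len(pairs)

print(simulatability_precision(counterfactual_evals))  # 0.666...
```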

TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT

  • paper_url: http://arxiv.org/abs/2307.08674
  • repo_url: None
  • paper_authors: Liangyu Zha, Junlin Zhou, Liyao Li, Rui Wang, Qingyi Huang, Saisai Yang, Jing Yuan, Changbao Su, Xiang Li, Aofeng Su, Tao Zhang, Chen Zhou, Kaizhe Shou, Miao Wang, Wufang Zhu, Guoshan Lu, Chao Ye, Yali Ye, Wentao Ye, Yiming Zhang, Xinglong Deng, Jie Xu, Haobo Wang, Gang Chen, Junbo Zhao
  • for: 论文旨在提供一个可以通过自然语言输入操作表格的框架,使用大语言模型(LLMs)来理解和处理表格。
  • methods: 该框架基于全新的全局表格表示方式,通过在表格与文本两种模态上对 LLMs 进行联合训练,使其能够深入理解表格数据,并通过指令链执行复杂的表格操作。
  • results: TableGPT 提供了简单易用的表格操作方式,包括问答、数据操作、数据可视化、分析报告生成和自动预测等,从而提升了用户使用表格数据的便利性与可及性。
    Abstract Tables are prevalent in real-world databases, requiring significant time and effort for humans to analyze and manipulate. The advancements in large language models (LLMs) have made it possible to interact with tables using natural language input, bringing this capability closer to reality. In this paper, we present TableGPT, a unified fine-tuned framework that enables LLMs to understand and operate on tables using external functional commands. It introduces the capability to seamlessly interact with tables, enabling a wide range of functionalities such as question answering, data manipulation (e.g., insert, delete, query, and modify operations), data visualization, analysis report generation, and automated prediction. TableGPT aims to provide convenience and accessibility to users by empowering them to effortlessly leverage tabular data. At the core of TableGPT lies the novel concept of global tabular representations, which empowers LLMs to gain a comprehensive understanding of the entire table beyond meta-information. By jointly training LLMs on both table and text modalities, TableGPT achieves a deep understanding of tabular data and the ability to perform complex operations on tables through chain-of-command instructions. Importantly, TableGPT offers the advantage of being a self-contained system rather than relying on external API interfaces. Moreover, it supports efficient data process flow, query rejection (when appropriate) and private deployment, enabling faster domain data fine-tuning and ensuring data privacy, which enhances the framework's adaptability to specific use cases.

CohortFinder: an open-source tool for data-driven partitioning of biomedical image cohorts to yield robust machine learning models

  • paper_url: http://arxiv.org/abs/2307.08673
  • repo_url: None
  • paper_authors: Fan Fan, Georgia Martinez, Thomas Desilvio, John Shin, Yijiang Chen, Bangchen Wang, Takaya Ozeki, Maxime W. Lafarge, Viktor H. Koelzer, Laura Barisoni, Anant Madabhushi, Satish E. Viswanath, Andrew Janowczyk
  • for: 缓解批次效应(batch effects)对机器学习模型泛化能力的负面影响
  • methods: 使用数据驱动的队列(cohort)划分方法来缓解批次效应的影响
  • results: 在医疗影像处理任务中,使用 CohortFinder 可以提高机器学习模型的性能
    Abstract Batch effects (BEs) refer to systematic technical differences in data collection unrelated to biological variations whose noise is shown to negatively impact machine learning (ML) model generalizability. Here we release CohortFinder, an open-source tool aimed at mitigating BEs via data-driven cohort partitioning. We demonstrate CohortFinder improves ML model performance in downstream medical image processing tasks. CohortFinder is freely available for download at cohortfinder.com.
    摘要 批次效应(BE)指数据采集过程中与生物学差异无关的系统性技术差异,其噪声已被证明会对机器学习(ML)模型的泛化能力产生负面影响。我们发布开源工具 CohortFinder,旨在通过数据驱动的队列划分来缓解批次效应。我们展示了 CohortFinder 能够提升下游医学图像处理任务中 ML 模型的性能。CohortFinder 可在 cohortfinder.com 免费下载。

Neural Image Compression: Generalization, Robustness, and Spectral Biases

  • paper_url: http://arxiv.org/abs/2307.08657
  • repo_url: None
  • paper_authors: Kelsey Lieberman, James Diffenderfer, Charles Godfrey, Bhavya Kailkhura
  • for: 评估神经图像压缩(NIC)模型在真实场景中的泛化性与鲁棒性(分布外性能)。
  • methods: 提供了一个完整的基准套件(CLIC-C 与 Kodak-C,向常用基准引入 15 种失真)来评估图像压缩方法的分布外(OOD)性能,并提出了受频谱启发的检查工具,以深入分析压缩方法引入的误差及其 OOD 表现。
  • results: 对一种经典编解码器和多种 NIC 变体进行了详细的性能比较,得到了一些挑战我们当前对 NIC 优势与局限认识的有趣发现,并通过理论分析阐明了 NIC 的 OOD 性能与数据频谱特性之间的关系。
    Abstract Recent neural image compression (NIC) advances have produced models which are starting to outperform traditional codecs. While this has led to growing excitement about using NIC in real-world applications, the successful adoption of any machine learning system in the wild requires it to generalize (and be robust) to unseen distribution shifts at deployment. Unfortunately, current research lacks comprehensive datasets and informative tools to evaluate and understand NIC performance in real-world settings. To bridge this crucial gap, first, this paper presents a comprehensive benchmark suite to evaluate the out-of-distribution (OOD) performance of image compression methods. Specifically, we provide CLIC-C and Kodak-C by introducing 15 corruptions to popular CLIC and Kodak benchmarks. Next, we propose spectrally inspired inspection tools to gain deeper insight into errors introduced by image compression methods as well as their OOD performance. We then carry out a detailed performance comparison of a classical codec with several NIC variants, revealing intriguing findings that challenge our current understanding of the strengths and limitations of NIC. Finally, we corroborate our empirical findings with theoretical analysis, providing an in-depth view of the OOD performance of NIC and its dependence on the spectral properties of the data. Our benchmarks, spectral inspection tools, and findings provide a crucial bridge to the real-world adoption of NIC. We hope that our work will propel future efforts in designing robust and generalizable NIC methods. Code and data will be made available at https://github.com/klieberman/ood_nic.

A General Framework for Learning under Corruption: Label Noise, Attribute Noise, and Beyond

  • paper_url: http://arxiv.org/abs/2307.08643
  • repo_url: None
  • paper_authors: Laura Iacovissi, Nan Lu, Robert C. Williamson
  • for: 本研究旨在系统地分析损害模型在分布水平上的影响,提供一个涵盖所有损害模型的通用框架,并研究损害对标准预测学习的影响。
  • methods: 本研究使用Markov kernel来形式地分析损害模型,并发现了 Label和特征上的复杂相互作用和依赖关系,这些关系通常被之前的研究所忽略。
  • results: 研究发现,损害对标准预测学习会导致 bayes 风险的变化,并提供了对不同损害实例的loss correction的理论分析。
    Abstract Corruption is frequently observed in collected data and has been extensively studied in machine learning under different corruption models. Despite this, there remains a limited understanding of how these models relate such that a unified view of corruptions and their consequences on learning is still lacking. In this work, we formally analyze corruption models at the distribution level through a general, exhaustive framework based on Markov kernels. We highlight the existence of intricate joint and dependent corruptions on both labels and attributes, which are rarely touched by existing research. Further, we show how these corruptions affect standard supervised learning by analyzing the resulting changes in Bayes Risk. Our findings offer qualitative insights into the consequences of "more complex" corruptions on the learning problem, and provide a foundation for future quantitative comparisons. Applications of the framework include corruption-corrected learning, a subcase of which we study in this paper by theoretically analyzing loss correction with respect to different corruption instances.
    摘要 数据损坏在收集到的数据中十分常见,并且已在机器学习中针对不同的损坏模型进行了广泛研究。尽管如此,我们对这些模型之间关系的理解仍然有限,缺乏一个统一的视角来刻画各类损坏及其对学习的影响。在这项工作中,我们基于 Markov 核,在分布层面上通过一个通用且完备的框架对损坏模型进行形式化分析。我们指出了标签与属性上存在复杂的联合及相依损坏,而这类损坏几乎没有被现有研究触及。进一步地,我们通过分析贝叶斯风险的变化,说明这些损坏如何影响标准监督学习。我们的发现为"更复杂"的损坏对学习问题的后果提供了定性的洞见,并为未来的定量比较奠定了基础。该框架的应用之一是损坏校正学习,本文从理论上分析了针对不同损坏实例的损失校正,作为其中的一个子情形。
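下面的 NumPy 草图示意"用 Markov 核刻画标签损坏":行随机的转移矩阵 T 给出干净标签到观测标签的条件分布,按 T 对标签重新采样即得到损坏后的数据。矩阵取值仅为示例,与论文的一般框架无直接对应。

```python
import numpy as np

rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1, 0.0],     # T[i, j] = P(观测标签 j | 干净标签 i)
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])

clean = rng.integers(0, 3, size=10000)
noisy = np.array([rng.choice(3, p=T[y]) for y in clean])
print("empirical flip rate:", (noisy != clean).mean())   # 约 0.13

# 损失校正类方法会利用 T(或其估计)来"抵消"这一核,
# 例如校正类别概率估计或直接校正损失函数。
```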

LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization

  • paper_url: http://arxiv.org/abs/2307.08637
  • repo_url: None
  • paper_authors: Ivan Carvalho, Ramon Lawrence
  • for: 本文 analyze 和 parallelize LearnedSort algorithm,一种使用机器学习模型来实现排序的新算法。
  • methods: 本文在"带预测的算法"(algorithms with predictions)的视角下分析 LearnedSort,并论证它是一种学习增强的 SampleSort。
  • results: 对 synthetic 和实际 dataset 进行了 benchmark, parallel LearnedSort 比 IPS4o 和其他排序算法具有更高的并发性能。
    Abstract This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is argued that LearnedSort is a learning-augmented SampleSort. A parallel LearnedSort algorithm is developed combining LearnedSort with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on synthetic and real-world datasets demonstrate improved parallel performance for parallel LearnedSort compared to IPS4o and other sorting algorithms.
    摘要 本工作对 LearnedSort 进行了分析与并行化。LearnedSort 是一种新算法,利用基于累积分布函数的机器学习模型来排序。本文在带预测的算法的视角下分析 LearnedSort,并论证它是一种学习增强的 SampleSort。我们将 LearnedSort 与最先进的 SampleSort 实现 IPS4o 相结合,开发出并行的 LearnedSort 算法。在合成数据集和真实数据集上的基准测试表明,并行 LearnedSort 的并行性能优于 IPS4o 及其他排序算法。
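下面是一个玩具草图,用以说明"把 LearnedSort 看作 SampleSort"的直觉:用样本上的经验 CDF 充当学到的 CDF 模型,把每个键路由到对应的桶,再对各桶排序后拼接。这并非论文的 LearnedSort 或 IPS4o 实现,桶数与采样规模均为随意假设。

```python
import numpy as np

def learned_bucket_sort(keys, n_buckets=100, sample_size=1000, seed=0):
    rng = np.random.default_rng(seed)
    sample = np.sort(rng.choice(keys, size=min(sample_size, len(keys)), replace=False))
    cdf = np.searchsorted(sample, keys) / len(sample)          # "学到的" CDF:键在样本中的位置
    bucket_ids = np.minimum((cdf * n_buckets).astype(int), n_buckets - 1)

    buckets = [[] for _ in range(n_buckets)]
    for key, b in zip(keys, bucket_ids):
        buckets[b].append(key)
    # CDF 关于键值单调,因此桶之间保持全局有序:桶内排序后直接拼接即可
    return np.concatenate([np.sort(b) for b in buckets if b])

data = np.random.default_rng(1).normal(size=100_000)
print(np.array_equal(learned_bucket_sort(data), np.sort(data)))  # True
```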

Retentive Network: A Successor to Transformer for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.08621
  • repo_url: https://github.com/microsoft/unilm
  • paper_authors: Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
  • for: This paper proposes a new architecture called Retentive Network (RetNet) for large language models, which simultaneously achieves training parallelism, low-cost inference, and good performance.
  • methods: The paper uses a retention mechanism for sequence modeling, which supports three computation paradigms: parallel, recurrent, and chunkwise recurrent. The parallel representation allows for training parallelism, while the recurrent representation enables low-cost $O(1)$ inference. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity.
  • results: The paper shows that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. Experimental results on language modeling demonstrate the effectiveness of RetNet, making it a strong successor to Transformer for large language models.
    Abstract In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at https://aka.ms/retnet.
    摘要 在这个工作中,我们提议Retentive Network(RetNet)作为大语言模型的基础架构,同时实现培训并行、低成本推理和好性能。我们理论上 derivates了回忆和注意力之间的连接。然后我们提议了保留机制,用于序列模型化,该机制支持三种计算方式,即并行、循环和块级循环。Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost $O(1)$ inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks.实验结果表明,RetNet实现了有利扩展性、并行培训、低成本部署和高效推理。RetNet的特有性使其成为Transformer的强 successor for large language models。代码将提供在https://aka.ms/retnet.

Understanding the impacts of crop diversification in the context of climate change: a machine learning approach

  • paper_url: http://arxiv.org/abs/2307.08617
  • repo_url: None
  • paper_authors: Georgios Giannarakis, Ilias Tsoumas, Stelios Neophytides, Christiana Papoutsa, Charalampos Kontoes, Diofantos Hadjimitsis
  • for: 这个论文是为了研究农业可持续强化的方法,以及这些方法在气候变化的情况下的影响。
  • methods: 这篇论文利用多源异构的地球观测数据,提出一种基于因果机器学习的数据驱动方法,用于理解作物多样化的影响在未来可能如何变化。
  • results: 论文发现,作物多样化平均使作物净初级生产力显著提高 2.8%,该效应通常与更高的最高气温和更低的土壤湿度协同;在更暖、更易干旱的气候下,作物多样化展现出可观的适应潜力。
    Abstract The concept of sustainable intensification in agriculture necessitates the implementation of management practices that prioritize sustainability without compromising productivity. However, the effects of such practices are known to depend on environmental conditions, and are therefore expected to change as a result of a changing climate. We study the impact of crop diversification on productivity in the context of climate change. We leverage heterogeneous Earth Observation data and contribute a data-driven approach based on causal machine learning for understanding how crop diversification impacts may change in the future. We apply this method to the country of Cyprus throughout a 4-year period. We find that, on average, crop diversification significantly benefited the net primary productivity of crops, increasing it by 2.8%. The effect generally synergized well with higher maximum temperatures and lower soil moistures. In a warmer and more drought-prone climate, we conclude that crop diversification exhibits promising adaptation potential and is thus a sensible policy choice with regards to agricultural productivity for present and future.
    摘要 农业可持续集约化的理念要求采取在不牺牲产量的前提下优先保证可持续性的管理措施。然而,这类措施的效果取决于环境条件,因此预计会随着气候变化而改变。我们在气候变化的背景下研究作物多样化对生产力的影响。我们利用多源异构的地球观测数据,提出一种基于因果机器学习的数据驱动方法,用于理解作物多样化的影响在未来可能如何变化,并将该方法应用于塞浦路斯全国、为期 4 年的数据。我们发现,作物多样化平均使作物净初级生产力显著提高了 2.8%,且该效应通常与更高的最高气温和更低的土壤湿度产生良好的协同作用。我们据此认为,在更暖、更易干旱的气候下,作物多样化展现出可观的适应潜力,因而无论对当前还是未来的农业生产力而言,都是一种合理的政策选择。

Temporal and Geographical Analysis of Real Economic Activities in the Bitcoin Blockchain

  • paper_url: http://arxiv.org/abs/2307.08616
  • repo_url: None
  • paper_authors: Rafael Ramos Tubino, Remy Cazabet, Natkamon Tovanich, Celine Robardet
  • for: The paper focuses on the real economic activity in the Bitcoin blockchain, specifically on transactions between retail users and their neighbors, rather than between organizations.
  • methods: The paper introduces a heuristic method to classify Bitcoin players into three main categories: Frequent Receivers (FR), Neighbors of FR, and Others.
  • results: Most real transactions involve Frequent Receivers, who represent a small fraction of the total value exchanged but a significant fraction of all payments, which raises concerns about the centralization of the Bitcoin ecosystem. Additionally, the paper conducts a weekly pattern analysis of activity to provide insights into the geographical location of Bitcoin users and to quantify the bias of a well-known dataset for actor identification.
    Abstract We study the real economic activity in the Bitcoin blockchain that involves transactions from/to retail users rather than between organizations such as marketplaces, exchanges, or other services. We first introduce a heuristic method to classify Bitcoin players into three main categories: Frequent Receivers (FR), Neighbors of FR, and Others. We show that most real transactions involve Frequent Receivers, representing a small fraction of the total value exchanged according to the blockchain, but a significant fraction of all payments, raising concerns about the centralization of the Bitcoin ecosystem. We also conduct a weekly pattern analysis of activity, providing insights into the geographical location of Bitcoin users and allowing us to quantify the bias of a well-known dataset for actor identification.
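The player classification heuristic can be sketched roughly as follows; the receive-count threshold and the (sender, receiver) input format are assumptions made for illustration, not the paper's exact rule.

```python
from collections import Counter

def classify_players(transactions, fr_threshold=100):
    """transactions: list of (sender_address, receiver_address) pairs."""
    recv_counts = Counter(receiver for _, receiver in transactions)
    frequent_receivers = {a for a, c in recv_counts.items() if c >= fr_threshold}

    # Direct payers of a Frequent Receiver that are not FRs themselves.
    neighbors = {
        sender
        for sender, receiver in transactions
        if receiver in frequent_receivers and sender not in frequent_receivers
    }

    labels = {}
    for sender, receiver in transactions:
        for addr in (sender, receiver):
            if addr in frequent_receivers:
                labels[addr] = "FR"
            elif addr in neighbors:
                labels[addr] = "Neighbor of FR"
            else:
                labels[addr] = "Other"
    return labels
```

A full analysis would typically operate on clustered entities rather than raw addresses and weight results by transferred value; those steps are omitted here.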

Hyperparameter Tuning Cookbook: A guide for scikit-learn, PyTorch, river, and spotPython

  • paper_url: http://arxiv.org/abs/2307.10262
  • repo_url: https://github.com/sequential-parameter-optimization/spotpython
  • paper_authors: Thomas Bartz-Beielstein
  • for: This document is a comprehensive guide to hyperparameter tuning with spotPython for scikit-learn, PyTorch, and river.
  • methods: It introduces spotPython's surrogate-model-based optimization process and then walks through the hyperparameter tuning workflow step by step.
  • results: Several case studies are presented, covering scikit-learn models such as Support Vector Classification, Random Forests, Gradient Boosting (XGB), and K-nearest neighbors (KNN), as well as a Hoeffding Adaptive Tree Regressor from river; the integration of spotPython into the PyTorch and PyTorch Lightning training workflow is also discussed.
    Abstract This document provides a comprehensive guide to hyperparameter tuning using spotPython for scikit-learn, PyTorch, and river. The first part introduces spotPython's surrogate model-based optimization process, while the second part focuses on hyperparameter tuning. Several case studies are presented, including hyperparameter tuning for sklearn models such as Support Vector Classification, Random Forests, Gradient Boosting (XGB), and K-nearest neighbors (KNN), as well as a Hoeffding Adaptive Tree Regressor from river. The integration of spotPython into the PyTorch and PyTorch Lightning training workflow is also discussed. With a hands-on approach and step-by-step explanations, this cookbook serves as a practical starting point for anyone interested in hyperparameter tuning with Python. Highlights include the interplay between Tensorboard, PyTorch Lightning, spotPython, and river. This publication is under development, with updates available on the corresponding webpage.
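spotPython's own surrogate-model-based tuner is not reproduced here; as a generic stand-in under that caveat, the sketch below tunes a Support Vector Classifier with plain scikit-learn random search over the kind of hyperparameter space the cookbook covers.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
model = make_pipeline(StandardScaler(), SVC())

# Random search over C, gamma, and kernel; a surrogate-based tuner such as spotPython
# would instead fit a model of the objective and propose promising configurations.
search = RandomizedSearchCV(
    model,
    param_distributions={
        "svc__C": loguniform(1e-2, 1e3),
        "svc__gamma": loguniform(1e-4, 1e1),
        "svc__kernel": ["rbf", "poly"],
    },
    n_iter=25,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```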

Artificial Intelligence for the Electron Ion Collider (AI4EIC)

  • paper_url: http://arxiv.org/abs/2307.08593
  • repo_url: None
  • paper_authors: C. Allaire, R. Ammendola, E. -C. Aschenauer, M. Balandat, M. Battaglieri, J. Bernauer, M. Bondì, N. Branson, T. Britton, A. Butter, I. Chahrour, P. Chatagnon, E. Cisbani, E. W. Cline, S. Dash, C. Dean, W. Deconinck, A. Deshpande, M. Diefenthaler, R. Ent, C. Fanelli, M. Finger, M. Finger, Jr., E. Fol, S. Furletov, Y. Gao, J. Giroux, N. C. Gunawardhana Waduge, R. Harish, O. Hassan, P. L. Hegde, R. J. Hernández-Pinto, A. Hiller Blin, T. Horn, J. Huang, D. Jayakodige, B. Joo, M. Junaid, P. Karande, B. Kriesten, R. Kunnawalkam Elayavalli, M. Lin, F. Liu, S. Liuti, G. Matousek, M. McEneaney, D. McSpadden, T. Menzo, T. Miceli, V. Mikuni, R. Montgomery, B. Nachman, R. R. Nair, J. Niestroy, S. A. Ochoa Oregon, J. Oleniacz, J. D. Osborn, C. Paudel, C. Pecar, C. Peng, G. N. Perdue, W. Phelps, M. L. Purschke, K. Rajput, Y. Ren, D. F. Renteria-Estrada, D. Richford, B. J. Roy, D. Roy, N. Sato, T. Satogata, G. Sborlini, M. Schram, D. Shih, J. Singh, R. Singh, A. Siodmok, P. Stone, J. Stevens, L. Suarez, K. Suresh, A. -N. Tawfik, F. Torales Acosta, N. Tran, R. Trotta, F. J. Twagirayezu, R. Tyson, S. Volkova, A. Vossen, E. Walter, D. Whiteson, M. Williams, S. Wu, N. Zachariou, P. Zurita
  • for: The paper is written for the EIC community, discussing the potential applications of AI/ML in the facility’s experiments and commissioning processes.
  • methods: The paper covers various R&D projects and approaches currently being explored in the EIC community, including cutting-edge techniques from other experiments.
  • results: The paper provides an overview of the goals and strategies regarding AI/ML in the EIC community, as well as the potential benefits and insights that can be gained from their application.
    Abstract The Electron-Ion Collider (EIC), a state-of-the-art facility for studying the strong force, is expected to begin commissioning its first experiments in 2028. This is an opportune time for artificial intelligence (AI) to be included from the start at this facility and in all phases that lead up to the experiments. The second annual workshop organized by the AI4EIC working group, which recently took place, centered on exploring all current and prospective application areas of AI for the EIC. This workshop is not only beneficial for the EIC, but also provides valuable insights for the newly established ePIC collaboration at EIC. This paper summarizes the different activities and R&D projects covered across the sessions of the workshop and provides an overview of the goals, approaches and strategies regarding AI/ML in the EIC community, as well as cutting-edge techniques currently studied in other experiments.

Snapshot Spectral Clustering – a costless approach to deep clustering ensembles generation

  • paper_url: http://arxiv.org/abs/2307.08591
  • repo_url: None
  • paper_authors: Adam Piróg, Halina Kwaśnicka
  • for: The paper investigates combining deep learning with clustering, aiming to improve the accuracy and stability of clustering results.
  • methods: It proposes a novel deep clustering ensemble method, Snapshot Spectral Clustering, which generates multiple views of the data with deep neural networks and combines them to obtain a stronger, more stable clustering.
  • results: Experiments indicate that Snapshot Spectral Clustering keeps the cost of building the ensemble low while improving clustering accuracy and stability compared to classical clustering and deep learning baselines; an accompanying hyperparameter study gives practical guidance for choosing suitable values.
    Abstract Despite tremendous advancements in Artificial Intelligence, learning from large sets of data in an unsupervised manner remains a significant challenge. Classical clustering algorithms often fail to discover complex dependencies in large datasets, especially considering sparse, high-dimensional spaces. However, deep learning techniques proved to be successful when dealing with large quantities of data, efficiently reducing their dimensionality without losing track of underlying information. Several interesting advancements have already been made to combine deep learning and clustering. Still, the idea of enhancing the clustering results by combining multiple views of the data generated by deep neural networks appears to be insufficiently explored yet. This paper aims to investigate this direction and bridge the gap between deep neural networks, clustering techniques and ensemble learning methods. To achieve this goal, we propose a novel deep clustering ensemble method - Snapshot Spectral Clustering, designed to maximize the gain from combining multiple data views while minimizing the computational costs of creating the ensemble. Comparative analysis and experiments described in this paper prove the proposed concept, while the conducted hyperparameter study provides a valuable intuition to follow when selecting proper values.
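One way to read the ensemble idea is as a consensus over several cheaply obtained clusterings of the same data. The toy sketch below is our interpretation under that reading, not the authors' implementation: it accumulates a co-association matrix over several base clusterings (here simply k-means runs with different seeds, standing in for deep-network snapshots) and feeds it to spectral clustering as a precomputed affinity.

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
n_samples, n_views, k = len(X), 10, 4

# Co-association matrix: fraction of views in which two points share a cluster.
co_assoc = np.zeros((n_samples, n_samples))
for seed in range(n_views):
    labels = KMeans(n_clusters=k, n_init=5, random_state=seed).fit_predict(X)
    co_assoc += (labels[:, None] == labels[None, :])
co_assoc /= n_views

# Spectral consensus step on the accumulated affinity.
consensus = SpectralClustering(n_clusters=k, affinity="precomputed", random_state=0)
final_labels = consensus.fit_predict(co_assoc)
print(np.bincount(final_labels))
```

In the snapshot setting, the base clusterings would instead come from embeddings taken at different checkpoints of a single training run (e.g. with a cyclic learning rate), which is presumably what keeps the ensemble essentially free to generate.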