cs.LG - 2023-10-05

Improved prediction of ligand-protein binding affinities by meta-modeling

  • paper_url: http://arxiv.org/abs/2310.03946
  • repo_url: https://github.com/lee1701/lee2023a
  • paper_authors: Ho-Joon Lee, Prashant S. Emani, Mark B. Gerstein
  • for: This study develops a meta-modeling framework that integrates structure-based docking and sequence-based deep learning models to improve prediction of binding affinities between drug ligands and their target proteins.
  • methods: The study combines published empirical structure-based docking and sequence-based deep learning models, and evaluates many combinations of individual base models, training databases, and linear and nonlinear meta-modeling approaches.
  • results: Ensembles of diverse models achieve better binding affinity prediction than the individual base models while allowing control over input features such as physicochemical properties or molecular descriptors.
    Abstract The accurate screening of candidate drug ligands against target proteins through computational approaches is of prime interest to drug development efforts, as filtering potential candidates would save time and expenses for finding drugs. Such virtual screening depends in part on methods to predict the binding affinity between ligands and proteins. Given many computational models for binding affinity prediction with varying results across targets, we herein develop a meta-modeling framework by integrating published empirical structure-based docking and sequence-based deep learning models. In building this framework, we evaluate many combinations of individual models, training databases, and linear and nonlinear meta-modeling approaches. We show that many of our meta-models significantly improve affinity predictions over individual base models. Our best meta-models achieve comparable performance to state-of-the-art exclusively structure-based deep learning tools. Overall, we demonstrate that diverse modeling approaches can be ensembled together to gain substantial improvement in binding affinity prediction while allowing control over input features such as physicochemical properties or molecular descriptors.
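
As an illustration of the linear and nonlinear meta-modeling idea (a sketch, not the authors' pipeline), the snippet below stacks the outputs of two hypothetical base predictors (a docking score and a sequence-model score) with either a linear or a nonlinear meta-model; the base predictions and affinities here are synthetic placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Placeholder base-model outputs for n ligand-protein pairs:
# column 0 ~ a docking score, column 1 ~ a sequence-model prediction.
rng = np.random.default_rng(0)
n = 500
base_predictions = rng.normal(size=(n, 2))
true_affinity = (0.6 * base_predictions[:, 0]
                 + 0.3 * base_predictions[:, 1]
                 + 0.1 * rng.normal(size=n))

# Linear meta-model: a regularized linear combination of base scores.
# Nonlinear meta-model: gradient-boosted trees over the same stacked features.
meta_models = {"linear": Ridge(alpha=1.0),
               "nonlinear": GradientBoostingRegressor(n_estimators=200, max_depth=3)}

for name, meta in meta_models.items():
    r2 = cross_val_score(meta, base_predictions, true_affinity, cv=5, scoring="r2").mean()
    print(f"{name} meta-model cross-validated R^2: {r2:.3f}")
```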

On Wasserstein distances for affine transformations of random vectors

  • paper_url: http://arxiv.org/abs/2310.03945
  • repo_url: None
  • paper_authors: Keaton Hamm, Andrzej Korzeniowski
  • for: This paper studies lower bounds on the quadratic Wasserstein distance between random vectors in $\mathbb{R}^n$, with an emphasis on affine transformations used in manifold learning of data in Wasserstein space.
  • methods: The paper computes the Bures metric between the covariance matrices of the random vectors to obtain concrete lower bounds, and derives upper bounds for compositions of affine maps.
  • results: The paper provides lower and upper bounds for the quadratic Wasserstein distance and applies them to various distributions, including ones lying on a 1-dimensional manifold in $\mathbb{R}^2$. Additionally, it gives a framework for mimicking handwritten digit or alphabet datasets for use in a manifold learning framework.
    Abstract We expound on some known lower bounds of the quadratic Wasserstein distance between random vectors in $\mathbb{R}^n$ with an emphasis on affine transformations that have been used in manifold learning of data in Wasserstein space. In particular, we give concrete lower bounds for rotated copies of random vectors in $\mathbb{R}^2$ with uncorrelated components by computing the Bures metric between the covariance matrices. We also derive upper bounds for compositions of affine maps which yield a fruitful variety of diffeomorphisms applied to an initial data measure. We apply these bounds to various distributions including those lying on a 1-dimensional manifold in $\mathbb{R}^2$ and illustrate the quality of the bounds. Finally, we give a framework for mimicking handwritten digit or alphabet datasets that can be applied in a manifold learning framework.
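
For reference, the squared Bures metric between two covariance matrices, which underlies the lower bounds discussed above, can be computed directly; the snippet below is a small numerical check with arbitrary example covariances (for centered Gaussians this also equals the squared 2-Wasserstein distance).

```python
import numpy as np
from scipy.linalg import sqrtm

def bures_distance_sq(sigma1: np.ndarray, sigma2: np.ndarray) -> float:
    """Squared Bures metric between positive semi-definite covariance matrices:
    d_B^2 = tr(S1) + tr(S2) - 2 tr((S1^{1/2} S2 S1^{1/2})^{1/2})."""
    root1 = sqrtm(sigma1)
    cross = sqrtm(root1 @ sigma2 @ root1)
    return float(np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.trace(cross).real)

# Example: a diagonal covariance and a copy rotated by 45 degrees.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
sigma = np.diag([2.0, 0.5])
print(bures_distance_sq(sigma, R @ sigma @ R.T))
```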

LaTeX: Language Pattern-aware Triggering Event Detection for Adverse Experience during Pandemics

  • paper_url: http://arxiv.org/abs/2310.03941
  • repo_url: None
  • paper_authors: Kaiqun Fu, Yangxiao Bai, Weiwei Zhang, Deepthi Kolady
  • for: This paper aims to explore the role of social media platforms in highlighting and addressing socioeconomic disparities during the COVID-19 pandemic, specifically focusing on four major types of adverse experiences: loss of employment income, food scarcity, housing insecurity, and unmet needs for mental health services.
  • methods: The paper uses real-time data from Twitter to analyze language patterns related to the four adverse experiences, and proposes a sparsity optimization problem to extract low-level language features. The authors also propose novel constraints on feature similarity based on prior knowledge about the similarity of language patterns among the adverse experiences.
  • results: The proposed formulation is challenging to solve due to the non-convex objective and non-smooth penalties, so the authors develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve it. Extensive experiments and comparisons to other models on real-world social media data justify the efficacy of the model in detecting adverse experiences.
    Abstract The COVID-19 pandemic has accentuated socioeconomic disparities across various racial and ethnic groups in the United States. While previous studies have utilized traditional survey methods like the Household Pulse Survey (HPS) to elucidate these disparities, this paper explores the role of social media platforms in both highlighting and addressing these challenges. Drawing from real-time data sourced from Twitter, we analyzed language patterns related to four major types of adverse experiences: loss of employment income (LI), food scarcity (FS), housing insecurity (HI), and unmet needs for mental health services (UM). We first formulate a sparsity optimization problem that extracts low-level language features from social media data sources. Second, we propose novel constraints on feature similarity exploiting prior knowledge about the similarity of the language patterns among the adverse experiences. The proposed problem is challenging to solve due to the non-convexity objective and non-smoothness penalties. We develop an algorithm based on the alternating direction method of multipliers (ADMM) framework to solve the proposed formulation. Extensive experiments and comparisons to other models on real-world social media and the detection of adverse experiences justify the efficacy of our model.
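
The ADMM framework referenced above splits a composite objective into subproblems with a dual update; the sketch below applies the standard ADMM iterations to a convex lasso-style stand-in (not the paper's non-convex formulation with similarity constraints), just to illustrate the splitting pattern.

```python
import numpy as np

def admm_lasso(A, b, lam=0.1, rho=1.0, n_iter=200):
    """ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1, a simplified convex
    stand-in for sparsity-regularized feature extraction."""
    m, n = A.shape
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))  # quadratic x-update
        z = soft(x + u, lam / rho)                                         # proximal (soft-threshold) z-update
        u = u + x - z                                                      # dual ascent on the consensus constraint
    return z

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.normal(size=100)
print(np.round(admm_lasso(A, b)[:8], 2))  # first five entries recovered as nonzero
```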

Improving classifier decision boundaries using nearest neighbors

  • paper_url: http://arxiv.org/abs/2310.03927
  • repo_url: None
  • paper_authors: Johannes Schneider
  • for: The paper aims to improve the decision boundaries learned by neural networks.
  • methods: The proposed method takes a weighted average of the predictions of a sample and of its nearest neighbors, computed in latent space, to improve the performance of neural networks.
  • results: The method improves resistance to label noise, robustness against adversarial attacks, classification accuracy, and, to some degree, interpretability. The improvements are not necessarily large in all four areas, but the approach is simple and requires no modification to the network architecture, training procedure, or dataset.
    Abstract Neural networks are not learning optimal decision boundaries. We show that decision boundaries are situated in areas of low training data density. They are impacted by few training samples which can easily lead to overfitting. We provide a simple algorithm performing a weighted average of the prediction of a sample and its nearest neighbors' (computed in latent space) leading to minor favorable outcomes for a variety of important measures for neural networks. In our evaluation, we employ various self-trained and pre-trained convolutional neural networks to show that our approach improves (i) resistance to label noise, (ii) robustness against adversarial attacks, (iii) classification accuracy, and to some degree even (iv) interpretability. While improvements are not necessarily large in all four areas, our approach is conceptually simple, i.e., improvements come without any modification to network architecture, training procedure or dataset. Furthermore, they are in stark contrast to prior works that often require trade-offs among the four objectives or provide valuable, but non-actionable insights.
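
A minimal sketch of the prediction-averaging step, assuming a classifier that exposes latent features and class probabilities; the weighting scheme, k, and the random stand-in data are placeholders rather than the paper's exact choices.

```python
import numpy as np

def knn_smoothed_prediction(latents, probs, query_latent, query_prob, k=5, alpha=0.5):
    """Blend a sample's predicted class probabilities with those of its k nearest
    neighbors in latent space; `latents` and `probs` come from a reference set
    (e.g., the training data) and `alpha` weights the sample's own prediction."""
    dists = np.linalg.norm(latents - query_latent, axis=1)
    nn_idx = np.argsort(dists)[:k]
    neighbor_avg = probs[nn_idx].mean(axis=0)
    return alpha * query_prob + (1.0 - alpha) * neighbor_avg

# Toy usage with random stand-ins for latent features and softmax outputs.
rng = np.random.default_rng(0)
latents = rng.normal(size=(100, 16))
probs = rng.dirichlet(np.ones(10), size=100)
smoothed = knn_smoothed_prediction(latents, probs, latents[0], probs[0])
print(smoothed.argmax(), smoothed.sum())  # predicted class; probabilities still sum to 1
```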

Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control

  • paper_url: http://arxiv.org/abs/2310.03915
  • repo_url: None
  • paper_authors: Neehal Tumma, Mathias Lechner, Noel Loo, Ramin Hasani, Daniela Rus
  • for: This paper studies the development of autonomous agents that can interact with changing environments, a setting in which robustness is particularly important.
  • methods: The paper uses recurrent neural networks and examines how a parameterization of their recurrent connectivity, in terms of rank and sparsity, influences robustness in closed-loop settings.
  • results: Modulating the rank and sparsity of the connectivity makes the networks more stable and robust online, and closed-form continuous-time networks (CfCs) with fewer parameters can outperform their full-rank, fully-connected counterparts under distribution shift.
    Abstract Developing autonomous agents that can interact with changing environments is an open challenge in machine learning. Robustness is particularly important in these settings as agents are often fit offline on expert demonstrations but deployed online where they must generalize to the closed feedback loop within the environment. In this work, we explore the application of recurrent neural networks to tasks of this nature and understand how a parameterization of their recurrent connectivity influences robustness in closed-loop settings. Specifically, we represent the recurrent connectivity as a function of rank and sparsity and show both theoretically and empirically that modulating these two variables has desirable effects on network dynamics. The proposed low-rank, sparse connectivity induces an interpretable prior on the network that proves to be most amenable for a class of models known as closed-form continuous-time neural networks (CfCs). We find that CfCs with fewer parameters can outperform their full-rank, fully-connected counterparts in the online setting under distribution shift. This yields memory-efficient and robust agents while opening a new perspective on how we can modulate network dynamics through connectivity.
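
A minimal sketch, assuming NumPy, of one way to parameterize a recurrent weight matrix by rank and sparsity: a low-rank factorization masked by a fixed sparse pattern. The specific factorization and mask are illustrative and not necessarily the paper's exact construction.

```python
import numpy as np

def low_rank_sparse_recurrent(n_units=64, rank=4, density=0.2, seed=0):
    """Recurrent weights W = M * (U @ V.T): U, V form a rank-`rank` factor and
    M is a fixed binary mask keeping roughly a `density` fraction of entries."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=1.0 / np.sqrt(rank), size=(n_units, rank))
    V = rng.normal(scale=1.0 / np.sqrt(n_units), size=(n_units, rank))
    mask = rng.random((n_units, n_units)) < density
    return mask * (U @ V.T)

W = low_rank_sparse_recurrent()
print(W.shape, np.count_nonzero(W) / W.size)  # (64, 64), about 0.2 nonzero fraction
```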

PyDCM: Custom Data Center Models with Reinforcement Learning for Sustainability

  • paper_url: http://arxiv.org/abs/2310.03906
  • repo_url: None
  • paper_authors: Avisek Naug, Antonio Guillen, Ricardo Luna Gutiérrez, Vineet Gundecha, Dejan Markovikj, Lekhapriya Dheeraj Kashyap, Lorenz Krause, Sahand Ghorbanpour, Sajad Mousavi, Ashwin Ramesh Babu, Soumyendu Sarkar
  • for: This paper aims to optimize the energy consumption and computational workloads of data centers.
  • methods: The paper presents PyDCM, a customizable data center model implemented in Python that lets users create unique configurations of IT equipment with custom server specifications and geometric arrangements of IT cabinets.
  • results: Thanks to vectorized thermal calculations, PyDCM is about 30 times faster than existing EnergyPlus modeling implementations and scales sublinearly with the number of CPUs. PyDCM also enables deep reinforcement learning for data center cooling via a Gymnasium wrapper and offers a user-friendly platform for testing data center design prototypes.
    Abstract The increasing global emphasis on sustainability and reducing carbon emissions is pushing governments and corporations to rethink their approach to data center design and operation. Given their high energy consumption and exponentially large computational workloads, data centers are prime candidates for optimizing power consumption, especially in areas such as cooling and IT energy usage. A significant challenge in this pursuit is the lack of a configurable and scalable thermal data center model that offers an end-to-end pipeline. Data centers consist of multiple IT components whose geometric configuration and heat dissipation make thermal modeling difficult. This paper presents PyDCM, a customizable Data Center Model implemented in Python, that allows users to create unique configurations of IT equipment with custom server specifications and geometric arrangements of IT cabinets. The use of vectorized thermal calculations makes PyDCM orders of magnitude faster (30 times) than current Energy Plus modeling implementations and scales sublinearly with the number of CPUs. Also, PyDCM enables the use of Deep Reinforcement Learning via the Gymnasium wrapper to optimize data center cooling and offers a user-friendly platform for testing various data center design prototypes.
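
The Gymnasium wrapper mentioned above exposes the thermal model through the standard environment interface. The skeleton below is a hypothetical, toy cooling environment (placeholder dynamics and reward, not PyDCM's actual API) showing how such a model plugs into a reinforcement learning loop.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ToyCoolingEnv(gym.Env):
    """Hypothetical cooling environment: the agent sets a cooling setpoint,
    a one-line placeholder stands in for the thermal model, and the reward
    penalizes cooling energy plus overheating."""
    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=60.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Box(low=15.0, high=30.0, shape=(1,), dtype=np.float32)
        self.temp = 30.0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.temp = 30.0
        return np.array([self.temp], dtype=np.float32), {}

    def step(self, action):
        setpoint = float(action[0])
        self.temp += 0.5 * (setpoint - self.temp) + 1.0   # placeholder dynamics + IT heat load
        energy = max(0.0, 30.0 - setpoint)                # colder setpoints cost more energy
        reward = -(energy + max(0.0, self.temp - 35.0))   # penalize energy use and overheating
        return np.array([self.temp], dtype=np.float32), reward, False, False, {}

env = ToyCoolingEnv()
obs, _ = env.reset()
obs, reward, *_ = env.step(env.action_space.sample())
print(obs, reward)
```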

Provable benefits of annealing for estimating normalizing constants: Importance Sampling, Noise-Contrastive Estimation, and beyond

  • paper_url: http://arxiv.org/abs/2310.03902
  • repo_url: https://github.com/l-omar-chehab/annealing-normalizing-constants
  • paper_authors: Omar Chehab, Aapo Hyvarinen, Andrej Risteski
  • for: This work studies annealing-based Monte Carlo methods for estimating normalizing constants (partition functions).
  • methods: The methods considered include annealed importance sampling and annealed noise-contrastive estimation (NCE); the design choices are which estimator to use, which path of distributions to use, and whether to use a path at all.
  • results: The paper evaluates each design choice by its asymptotic estimation error. Using NCE is more efficient than the importance sampling estimator, though the difference vanishes in the limit of infinitesimal path steps; the geometric path reduces the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal distributions; and the rarely used arithmetic path can offer optimality properties, being optimal in a particular limit. Finally, a two-step estimator is proposed to approximate the optimal path efficiently.
    Abstract Recent research has developed several Monte Carlo methods for estimating the normalization constant (partition function) based on the idea of annealing. This means sampling successively from a path of distributions that interpolate between a tractable "proposal" distribution and the unnormalized "target" distribution. Prominent estimators in this family include annealed importance sampling and annealed noise-contrastive estimation (NCE). Such methods hinge on a number of design choices: which estimator to use, which path of distributions to use and whether to use a path at all; so far, there is no definitive theory on which choices are efficient. Here, we evaluate each design choice by the asymptotic estimation error it produces. First, we show that using NCE is more efficient than the importance sampling estimator, but in the limit of infinitesimal path steps, the difference vanishes. Second, we find that using the geometric path brings down the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal distributions. Third, we find that the arithmetic path, while rarely used, can offer optimality properties over the universally-used geometric path. In fact, in a particular limit, the optimal path is arithmetic. Based on this theory, we finally propose a two-step estimator to approximate the optimal path in an efficient way.
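
A minimal sketch of annealed importance sampling along a geometric path, for a one-dimensional Gaussian example where the normalizing constant is known in closed form; the proposal, target, and Metropolis step are placeholders chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Proposal p0: standard normal (normalized). Unnormalized target f1: Gaussian, mean 3, std 0.7.
log_p0 = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)
log_f1 = lambda x: -0.5 * ((x - 3.0) / 0.7) ** 2            # true Z1 = sqrt(2*pi) * 0.7

def log_f(x, beta):
    """Geometric path: log f_beta = (1 - beta) * log p0 + beta * log f1."""
    return (1 - beta) * log_p0(x) + beta * log_f1(x)

def annealed_importance_sampling(n_particles=2000, n_steps=100, step=0.5):
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    x = rng.normal(size=n_particles)                        # exact samples from p0
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        log_w += log_f(x, b) - log_f(x, b_prev)             # incremental importance weights
        prop = x + step * rng.normal(size=n_particles)      # one Metropolis step targeting f_b
        accept = np.log(rng.random(n_particles)) < log_f(prop, b) - log_f(x, b)
        x = np.where(accept, prop, x)
    return np.exp(log_w).mean()                             # unbiased estimate of Z1 / Z0 (= Z1 here)

print(annealed_importance_sampling(), np.sqrt(2 * np.pi) * 0.7)  # estimate vs. exact constant
```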

CrysFormer: Protein Structure Prediction via 3d Patterson Maps and Partial Structure Attention

  • paper_url: http://arxiv.org/abs/2310.03899
  • repo_url: None
  • paper_authors: Chen Dun, Qiutai Pan, Shikai Jin, Ria Stevens, Mitchell D. Miller, George N. Phillips, Jr., Anastasios Kyrillidis
  • for: This paper studies how a transformer neural network architecture can use protein crystallographic information to predict the electron density maps of proteins.
  • methods: The paper proposes a transformer-based model that directly uses protein crystallography and partial structure information to predict electron density maps.
  • results: On two new datasets of peptide fragments (2-residue and 15-residue), the method achieves accurate predictions while requiring a much smaller training dataset and lower computation costs than traditional approaches.
    Abstract Determining the structure of a protein has been a decades-long open question. A protein's three-dimensional structure often poses nontrivial computation costs, when classical simulation algorithms are utilized. Advances in the transformer neural network architecture -- such as AlphaFold2 -- achieve significant improvements for this problem, by learning from a large dataset of sequence information and corresponding protein structures. Yet, such methods only focus on sequence information; other available prior knowledge, such as protein crystallography and partial structure of amino acids, could be potentially utilized. To the best of our knowledge, we propose the first transformer-based model that directly utilizes protein crystallography and partial structure information to predict the electron density maps of proteins. Via two new datasets of peptide fragments (2-residue and 15-residue) , we demonstrate our method, dubbed \texttt{CrysFormer}, can achieve accurate predictions, based on a much smaller dataset size and with reduced computation costs.

Class-Incremental Learning Using Generative Experience Replay Based on Time-aware Regularization

  • paper_url: http://arxiv.org/abs/2310.03898
  • repo_url: None
  • paper_authors: Zizhao Hu, Mohammad Rostami
  • for: This work addresses the challenge of accumulating new tasks without forgetting in continual learning, by synthesizing pseudo-data points for past learned tasks and replaying them alongside the new tasks' data during training.
  • methods: Inspired by mechanisms of the biological nervous system, the paper introduces a time-aware regularization method that dynamically fine-tunes the three training objective terms used for generative replay: supervised learning, latent regularization, and data reconstruction.
  • results: Experiments show that the method improves memory retention and average performance over continually arriving tasks under a strict class-incremental setting with (i) constant model size, (ii) no pre-training dataset, and (iii) no memory buffer for storing past tasks' data.
    Abstract Learning new tasks accumulatively without forgetting remains a critical challenge in continual learning. Generative experience replay addresses this challenge by synthesizing pseudo-data points for past learned tasks and later replaying them for concurrent training along with the new tasks' data. Generative replay is the best strategy for continual learning under a strict class-incremental setting when certain constraints need to be met: (i) constant model size, (ii) no pre-training dataset, and (iii) no memory buffer for storing past tasks' data. Inspired by the biological nervous system mechanisms, we introduce a time-aware regularization method to dynamically fine-tune the three training objective terms used for generative replay: supervised learning, latent regularization, and data reconstruction. Experimental results on major benchmarks indicate that our method pushes the limit of brain-inspired continual learners under such strict settings, improves memory retention, and increases the average performance over continually arriving tasks.

Information Geometry for the Working Information Theorist

  • paper_url: http://arxiv.org/abs/2310.03884
  • repo_url: None
  • paper_authors: Kumar Vijay Mishra, M. Ashok Kumar, Ting-Kam Leonard Wong
  • for: This article gives an overview of information geometry to introduce information theorists to this area of research.
  • methods: The article explains divergences on statistical manifolds and generalized notions of distances, orthogonality, and geodesics.
  • results: The article highlights recent information-geometric developments, with applications in areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport.
    Abstract Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community.
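
As a concrete anchor for the notions above, the Fisher information matrix is what equips a statistical manifold with its Riemannian metric, and the KL divergence between nearby distributions is quadratic in that metric:

$$ g_{ij}(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \frac{\partial \log p_\theta(x)}{\partial \theta_i} \, \frac{\partial \log p_\theta(x)}{\partial \theta_j} \right], \qquad D_{\mathrm{KL}}\!\left(p_\theta \,\|\, p_{\theta + d\theta}\right) = \tfrac{1}{2}\, d\theta^\top g(\theta)\, d\theta + o(\|d\theta\|^2). $$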

Non Commutative Convolutional Signal Models in Neural Networks: Stability to Small Deformations

  • paper_url: http://arxiv.org/abs/2310.03879
  • repo_url: None
  • paper_authors: Alejandro Parada-Mayorga, Landon Butler, Alejandro Ribeiro
  • for: This paper considers algebraic signal models (ASMs) based on non-commutative algebras and their use in convolutional neural networks.
  • methods: Relying on general tools from algebraic signal processing (ASP), the paper studies the filtering and stability properties of non-commutative convolutional filters.
  • results: Non-commutative filters can be stable to small perturbations, and, as in commutative models, there is a trade-off between selectivity and stability; numerical experiments validate these results.
    Abstract In this paper we discuss the results recently published in~[1] about algebraic signal models (ASMs) based on non commutative algebras and their use in convolutional neural networks. Relying on the general tools from algebraic signal processing (ASP), we study the filtering and stability properties of non commutative convolutional filters. We show how non commutative filters can be stable to small perturbations on the space of operators. We also show that although the spectral components of the Fourier representation in a non commutative signal model are associated to spaces of dimension larger than one, there is a trade-off between stability and selectivity similar to that observed for commutative models. Our results have direct implications for group neural networks, multigraph neural networks and quaternion neural networks, among other non commutative architectures. We conclude by corroborating these results through numerical experiments.

Model Complexity of Program Phases

  • paper_url: http://arxiv.org/abs/2310.03865
  • repo_url: None
  • paper_authors: Arjun Karuvally, J. Eliot B. Moss
  • for: In resource-limited computing systems, sequence prediction models must operate under tight constraints. Various models reduce implementation cost in different ways, and in practice these resource-constrained sequence prediction models exhibit a fundamental trade-off between implementation cost and prediction quality.
  • methods: The paper formulates the necessary theory and an associated empirical procedure to explore this trade-off space for a particular family of machine learning models, such as deep neural networks.
  • results: Knowledge of the behavior of this trade-off is expected to help in understanding the theoretical and practical limits of creating and deploying models for resource-constrained tasks.
    Abstract In resource limited computing systems, sequence prediction models must operate under tight constraints. Various models are available that cater to prediction under these conditions that in some way focus on reducing the cost of implementation. These resource constrained sequence prediction models, in practice, exhibit a fundamental tradeoff between the cost of implementation and the quality of its predictions. This fundamental tradeoff seems to be largely unexplored for models for different tasks. Here we formulate the necessary theory and an associated empirical procedure to explore this tradeoff space for a particular family of machine learning models such as deep neural networks. We anticipate that the knowledge of the behavior of this tradeoff may be beneficial in understanding the theoretical and practical limits of creation and deployment of models for resource constrained tasks.

Variational Barycentric Coordinates

  • paper_url: http://arxiv.org/abs/2310.03861
  • repo_url: None
  • paper_authors: Ana Dodik, Oded Stein, Vincent Sitzmann, Justin Solomon
  • for: Optimize for generalized barycentric coordinates while providing additional control compared to existing models.
  • methods: Use a variational technique and parameterize the continuous function that maps any coordinate in a polytope's interior to its barycentric coordinates using a neural field.
  • results: Demonstrate the flexibility of the model using a variety of objective functions, present a thorough validation of the algorithm, and show several applications.
    Abstract We propose a variational technique to optimize for generalized barycentric coordinates that offers additional control compared to existing models. Prior work represents barycentric coordinates using meshes or closed-form formulae, in practice limiting the choice of objective function. In contrast, we directly parameterize the continuous function that maps any coordinate in a polytope's interior to its barycentric coordinates using a neural field. This formulation is enabled by our theoretical characterization of barycentric coordinates, which allows us to construct neural fields that parameterize the entire function class of valid coordinates. We demonstrate the flexibility of our model using a variety of objective functions, including multiple smoothness and deformation-aware energies; as a side contribution, we also present mathematically-justified means of measuring and minimizing objectives like total variation on discontinuous neural fields. We offer a practical acceleration strategy, present a thorough validation of our algorithm, and demonstrate several applications.
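
A minimal sketch, assuming PyTorch, of a neural field that maps interior points to nonnegative weights summing to one over a polytope's vertices. The softmax only guarantees nonnegativity and partition of unity; the paper's full construction additionally enforces the reproduction property (weighted vertices recover the query point) and optimizes user-chosen energies, which this sketch omits.

```python
import torch
import torch.nn as nn

class BarycentricField(nn.Module):
    """Neural field q -> weight vector over n_vertices; softmax gives
    nonnegative weights that sum to one for every query point."""
    def __init__(self, dim=2, n_vertices=6, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_vertices),
        )

    def forward(self, q):
        return torch.softmax(self.net(q), dim=-1)

# Toy usage: a hexagon's vertices (these would enter through the omitted constraints) and random queries.
angles = torch.linspace(0, 2 * torch.pi, 7)[:-1]
vertices = torch.stack([torch.cos(angles), torch.sin(angles)], dim=1)
field = BarycentricField(n_vertices=len(vertices))
q = 0.2 * torch.randn(8, 2)
w = field(q)
print(w.shape, w.sum(dim=-1))  # torch.Size([8, 6]), each row sums to 1
```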

Euclid: Identification of asteroid streaks in simulated images using deep learning

  • paper_url: http://arxiv.org/abs/2310.03845
  • repo_url: None
  • paper_authors: M. Pöntinen, M. Granvik, A. A. Nucita, L. Conversi, B. Altieri, B. Carry, C. M. O’Riordan, D. Scott, N. Aghanim, A. Amara, L. Amendola, N. Auricchio, M. Baldi, D. Bonino, E. Branchini, M. Brescia, S. Camera, V. Capobianco, C. Carbone, J. Carretero, M. Castellano, S. Cavuoti, A. Cimatti, R. Cledassou, G. Congedo, Y. Copin, L. Corcione, F. Courbin, M. Cropper, A. Da Silva, H. Degaudenzi, J. Dinis, F. Dubath, X. Dupac, S. Dusini, S. Farrens, S. Ferriol, M. Frailis, E. Franceschi, M. Fumana, S. Galeotta, B. Garilli, W. Gillard, B. Gillis, C. Giocoli, A. Grazian, S. V. H. Haugan, W. Holmes, F. Hormuth, A. Hornstrup, K. Jahnke, M. Kümmel, S. Kermiche, A. Kiessling, T. Kitching, R. Kohley, M. Kunz, H. Kurki-Suonio, S. Ligori, P. B. Lilje, I. Lloro, E. Maiorano, O. Mansutti, O. Marggraf, K. Markovic, F. Marulli, R. Massey, E. Medinaceli, S. Mei, M. Melchior, Y. Mellier, M. Meneghetti, G. Meylan, M. Moresco, L. Moscardini, E. Munari, S. -M. Niemi, T. Nutma, C. Padilla, S. Paltani, F. Pasian, K. Pedersen, V. Pettorino, S. Pires, G. Polenta, M. Poncet, F. Raison, A. Renzi, J. Rhodes, G. Riccio, E. Romelli, M. Roncarelli, E. Rossetti, R. Saglia, D. Sapone, B. Sartoris, P. Schneider, A. Secroun, G. Seidel, S. Serrano, C. Sirignano, G. Sirri, L. Stanco, P. Tallada-Crespí, A. N. Taylor, I. Tereno, R. Toledo-Moreo, F. Torradeflot, I. Tutusaus, L. Valenziano, T. Vassallo, G. Verdoes Kleijn, Y. Wang, J. Weller, G. Zamorani, J. Zoubian, V. Scottez
  • for: To help the Euclid space telescope detect more asteroids.
  • methods: A deep learning pipeline consisting of a convolutional neural network, a recurrent neural network, and gradient-boosted trees.
  • results: The pipeline achieves higher completeness and similar purity compared to the StreakDet software, detects asteroids 0.25-0.5 magnitudes fainter, and could increase the number of detected asteroids by 50%.
    Abstract Up to 150000 asteroids will be visible in the images of the ESA Euclid space telescope, and the instruments of Euclid offer multiband visual to near-infrared photometry and slitless spectra of these objects. Most asteroids will appear as streaks in the images. Due to the large number of images and asteroids, automated detection methods are needed. A non-machine-learning approach based on the StreakDet software was previously tested, but the results were not optimal for short and/or faint streaks. We set out to improve the capability to detect asteroid streaks in Euclid images by using deep learning. We built, trained, and tested a three-step machine-learning pipeline with simulated Euclid images. First, a convolutional neural network (CNN) detected streaks and their coordinates in full images, aiming to maximize the completeness (recall) of detections. Then, a recurrent neural network (RNN) merged snippets of long streaks detected in several parts by the CNN. Lastly, gradient-boosted trees (XGBoost) linked detected streaks between different Euclid exposures to reduce the number of false positives and improve the purity (precision) of the sample. The deep-learning pipeline surpasses the completeness and reaches a similar level of purity of a non-machine-learning pipeline based on the StreakDet software. Additionally, the deep-learning pipeline can detect asteroids 0.25-0.5 magnitudes fainter than StreakDet. The deep-learning pipeline could result in a 50% increase in the number of detected asteroids compared to the StreakDet software. There is still scope for further refinement, particularly in improving the accuracy of streak coordinates and enhancing the completeness of the final stage of the pipeline, which involves linking detections across multiple exposures.

Chameleon: Increasing Label-Only Membership Leakage with Adaptive Poisoning

  • paper_url: http://arxiv.org/abs/2310.03838
  • repo_url: None
  • paper_authors: Harsh Chaudhari, Giorgio Severi, Alina Oprea, Jonathan Ullman
  • for: This work aims to improve the accuracy of existing label-only membership inference attacks, particularly in the low false positive rate (FPR) regime.
  • methods: The paper proposes a new attack, Chameleon, which leverages a novel adaptive data poisoning strategy and an efficient query selection method to achieve more accurate membership inference in the label-only setting.
  • results: Across multiple experiments, Chameleon performs membership inference significantly more accurately than existing label-only attacks, especially at low FPRs.
    Abstract The integration of machine learning (ML) in numerous critical applications introduces a range of privacy concerns for individuals who provide their datasets for model training. One such privacy risk is Membership Inference (MI), in which an attacker seeks to determine whether a particular data sample was included in the training dataset of a model. Current state-of-the-art MI attacks capitalize on access to the model's predicted confidence scores to successfully perform membership inference, and employ data poisoning to further enhance their effectiveness. In this work, we focus on the less explored and more realistic label-only setting, where the model provides only the predicted label on a queried sample. We show that existing label-only MI attacks are ineffective at inferring membership in the low False Positive Rate (FPR) regime. To address this challenge, we propose a new attack Chameleon that leverages a novel adaptive data poisoning strategy and an efficient query selection method to achieve significantly more accurate membership inference than existing label-only attacks, especially at low FPRs.

Learning A Disentangling Representation For PU Learning

  • paper_url: http://arxiv.org/abs/2310.03833
  • repo_url: None
  • paper_authors: Omar Zamzam, Haleh Akrami, Mahdi Soltanolkotabi, Richard Leahy
  • For: Addresses the problem of learning a binary classifier given Positive and Unlabeled data (PU learning) in high-dimensional settings.
  • Methods: Proposes a neural network-based data representation trained with a loss function that projects the unlabeled data into two clusters, with the separation amplified by vector quantization.
  • Results: Demonstrates improved performance compared to current state-of-the-art approaches on simulated PU data, with theoretical justification for the two-cluster-based approach and the algorithmic choices.
    Abstract In this paper, we address the problem of learning a binary (positive vs. negative) classifier given Positive and Unlabeled data commonly referred to as PU learning. Although rudimentary techniques like clustering, out-of-distribution detection, or positive density estimation can be used to solve the problem in low-dimensional settings, their efficacy progressively deteriorates with higher dimensions due to the increasing complexities in the data distribution. In this paper we propose to learn a neural network-based data representation using a loss function that can be used to project the unlabeled data into two (positive and negative) clusters that can be easily identified using simple clustering techniques, effectively emulating the phenomenon observed in low-dimensional settings. We adopt a vector quantization technique for the learned representations to amplify the separation between the learned unlabeled data clusters. We conduct experiments on simulated PU data that demonstrate the improved performance of our proposed method compared to the current state-of-the-art approaches. We also provide some theoretical justification for our two cluster-based approach and our algorithmic choices.

Logical Languages Accepted by Transformer Encoders with Hard Attention

  • paper_url: http://arxiv.org/abs/2310.03817
  • repo_url: None
  • paper_authors: Pablo Barcelo, Alexander Kozachinskiy, Anthony Widjaja Lin, Vladimir Podolskii
  • for: This work studies which formal languages can be recognized by transformer encoders.
  • methods: The paper considers two self-attention mechanisms: Unique Hard Attention Transformers (UHAT) and Average Hard Attention Transformers (AHAT). UHAT encoders recognize only languages inside the circuit complexity class ${\sf AC}^0$, whereas AHAT encoders can recognize languages outside ${\sf AC}^0$, though their expressive power still lies within the larger class ${\sf TC}^0$.
  • results: The paper first shows that there is an ${\sf AC}^0$ language that UHAT encoders cannot recognize. On the positive side, UHAT encoders can recognize a rich fragment of ${\sf AC}^0$ languages: all languages definable in first-order logic with arbitrary unary numerical predicates, which includes all regular languages in ${\sf AC}^0$. AHAT encoders can recognize all languages of this logic even when it is enriched with counting terms. These results yield new conclusions about the expressive power of UHAT and AHAT up to permutation of letters (Parikh images).
    Abstract We contribute to the study of formal languages that can be recognized by transformer encoders. We focus on two self-attention mechanisms: (1) UHAT (Unique Hard Attention Transformers) and (2) AHAT (Average Hard Attention Transformers). UHAT encoders are known to recognize only languages inside the circuit complexity class ${\sf AC}^0$, i.e., accepted by a family of poly-sized and depth-bounded boolean circuits with unbounded fan-ins. On the other hand, AHAT encoders can recognize languages outside ${\sf AC}^0$, but their expressive power still lies within the bigger circuit complexity class ${\sf TC}^0$, i.e., ${\sf AC}^0$-circuits extended by majority gates. We first show a negative result that there is an ${\sf AC}^0$-language that cannot be recognized by an UHAT encoder. On the positive side, we show that UHAT encoders can recognize a rich fragment of ${\sf AC}^0$-languages, namely, all languages definable in first-order logic with arbitrary unary numerical predicates. This logic includes, for example, all regular languages from ${\sf AC}^0$. We then show that AHAT encoders can recognize all languages of our logic even when we enrich it with counting terms. We apply these results to derive new results on the expressive power of UHAT and AHAT up to permutation of letters (a.k.a. Parikh images).

Fishnets: Information-Optimal, Scalable Aggregation for Sets and Graphs

  • paper_url: http://arxiv.org/abs/2310.03812
  • repo_url: None
  • paper_authors: T. Lucas Makinen, Justin Alsing, Benjamin D. Wandelt
  • for: This paper addresses the design of information-optimal embeddings of sets and graphs for statistical inference and data aggregation.
  • methods: The paper proposes Fishnets, an aggregation strategy for learning information-optimal embeddings of sets of data for both Bayesian inference and graph aggregation.
  • results: Experiments show that Fishnets neural summaries scale optimally to an arbitrary number of data objects, are robust to changes in the data distribution (unlike standard Deepsets), saturate the Bayesian information content, and can be used as a drop-in aggregation scheme within GNNs, achieving state-of-the-art performance on ogbn-protein data with a fraction of the learnable parameters and faster training time.
    Abstract Set-based learning is an essential component of modern deep learning and network science. Graph Neural Networks (GNNs) and their edge-free counterparts Deepsets have proven remarkably useful on ragged and topologically challenging datasets. The key to learning informative embeddings for set members is a specified aggregation function, usually a sum, max, or mean. We propose Fishnets, an aggregation strategy for learning information-optimal embeddings for sets of data for both Bayesian inference and graph aggregation. We demonstrate that i) Fishnets neural summaries can be scaled optimally to an arbitrary number of data objects, ii) Fishnets aggregations are robust to changes in data distribution, unlike standard deepsets, iii) Fishnets saturate Bayesian information content and extend to regimes where MCMC techniques fail and iv) Fishnets can be used as a drop-in aggregation scheme within GNNs. We show that by adopting a Fishnets aggregation scheme for message passing, GNNs can achieve state-of-the-art performance versus architecture size on ogbn-protein data over existing benchmarks with a fraction of learnable parameters and faster training time.

Droplets of Good Representations: Grokking as a First Order Phase Transition in Two Layer Networks

  • paper_url: http://arxiv.org/abs/2310.03789
  • repo_url: None
  • paper_authors: Noa Rubin, Inbar Seroussi, Zohar Ringel
  • for: This work studies the ability of deep neural networks (DNNs) to learn new features during training, a property that stands out most clearly in the recently reported Grokking phenomenon.
  • methods: The paper applies a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models with cubic-polynomial and modular addition teachers.
  • results: The analysis provides predictions on the feature learning and Grokking properties of these models and maps Grokking onto the theory of phase transitions: after Grokking, the DNN's state is analogous to the mixed phase following a first-order phase transition, in which the network generates useful internal representations of the teacher that are sharply distinct from those before the transition.
    Abstract A key property of deep neural networks (DNNs) is their ability to learn new features during training. This intriguing aspect of deep learning stands out most clearly in recently reported Grokking phenomena. While mainly reflected as a sudden increase in test accuracy, Grokking is also believed to be a beyond lazy-learning/Gaussian Process (GP) phenomenon involving feature learning. Here we apply a recent development in the theory of feature learning, the adaptive kernel approach, to two teacher-student models with cubic-polynomial and modular addition teachers. We provide analytical predictions on feature learning and Grokking properties of these models and demonstrate a mapping between Grokking and the theory of phase transitions. We show that after Grokking, the state of the DNN is analogous to the mixed phase following a first-order phase transition. In this mixed phase, the DNN generates useful internal representations of the teacher that are sharply distinct from those before the transition.

The Un-Kidnappable Robot: Acoustic Localization of Sneaking People

  • paper_url: http://arxiv.org/abs/2310.03743
  • repo_url: None
  • paper_authors: Mengyu Yang, Patrick Grady, Samarth Brahmbhatt, Arun Balajee Vasudevan, Charles C. Kemp, James Hays
  • for: The study examines whether people can be detected from the incidental sounds they produce as they move, even when they try to be quiet, using only a robot's microphones.
  • methods: The authors collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings, and train models that predict whether a moving person is nearby and their location using only audio.
  • results: The method is implemented on a robot, allowing it to track a single person moving quietly with only passive audio sensing; demonstration videos are available on the project page: https://sites.google.com/view/unkidnappable-robot.
    Abstract How easy is it to sneak up on a robot? We examine whether we can detect people using only the incidental sounds they produce as they move, even when they try to be quiet. We collect a robotic dataset of high-quality 4-channel audio paired with 360 degree RGB data of people moving in different indoor settings. We train models that predict if there is a moving person nearby and their location using only audio. We implement our method on a robot, allowing it to track a single person moving quietly with only passive audio sensing. For demonstration videos, see our project page: https://sites.google.com/view/unkidnappable-robot

Stochastic interpolants with data-dependent couplings

  • paper_url: http://arxiv.org/abs/2310.03725
  • repo_url: None
  • paper_authors: Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, Eric Vanden-Eijnden
  • for: 这个论文旨在构建基于动态运输概率的 conditional generative models,以便使用class标签或连续嵌入来拟合target density。
  • methods: 该论文使用stochastic interpolants的框架来正式地\textit{couple} base density和target density,然后通过解决一个简单的方差损失问题来学习transport maps。
  • results: 实验表明,通过建立dependent coupling,可以在super-resolution和in-painting中获得更好的结果。
    Abstract Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
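
A minimal sketch of the square-loss regression behind this kind of dynamical-transport training, assuming PyTorch and the simplest linear interpolant between paired base and target samples; the coupling, network, and data here are toy placeholders rather than the paper's exact construction.

```python
import torch
import torch.nn as nn

# Velocity field v_theta(x, t), here a small MLP taking (x, t) concatenated.
velocity = nn.Sequential(nn.Linear(3, 64), nn.SiLU(),
                         nn.Linear(64, 64), nn.SiLU(),
                         nn.Linear(64, 2))
opt = torch.optim.Adam(velocity.parameters(), lr=1e-3)

def sample_pairs(n):
    """Toy dependent coupling: base x0 ~ N(0, I) paired with a shifted target x1."""
    x0 = torch.randn(n, 2)
    x1 = x0 + torch.tensor([3.0, 0.0])
    return x0, x1

for step in range(2000):
    x0, x1 = sample_pairs(256)
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1                      # linear interpolant x_t
    target = x1 - x0                                # its time derivative
    pred = velocity(torch.cat([xt, t], dim=-1))
    loss = ((pred - target) ** 2).mean()            # simple square-loss regression
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))  # the learned field can then be integrated to transport base samples to the target
```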

Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance

  • paper_url: http://arxiv.org/abs/2310.03722
  • repo_url: None
  • paper_authors: Hongjian Wang, Aaditya Ramdas
  • for: This paper develops confidence sequences and anytime-valid tests for the mean of a Gaussian distribution with unknown variance.
  • methods: The paper revisits Lai's construction using generalized nonintegrable martingales and an extended Ville's inequality, and develops two new e-processes and confidence sequences for the same setting.
  • results: The new constructions are obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ for the maximum likelihood estimate under the null. The paper also analyzes the width of the resulting confidence sequences, which has a curious dependence on the error probability $\alpha$, and provides numerical experiments comparing and contrasting the approaches.
    Abstract In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.
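
For context, the (ordinary) Ville inequality that makes such constructions anytime-valid states that a nonnegative supermartingale $(M_t)_{t \ge 0}$ with $M_0 = 1$ rarely ever exceeds $1/\alpha$; the extended version used in the paper relaxes the integrability requirement:

$$ \mathbb{P}\!\left( \exists\, t \ge 0 : M_t \ge \tfrac{1}{\alpha} \right) \le \alpha, \qquad 0 < \alpha \le 1. $$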

HeaP: Hierarchical Policies for Web Actions using LLMs

  • paper_url: http://arxiv.org/abs/2310.03720
  • repo_url: None
  • paper_authors: Paloma Sodhi, S. R. K. Branavan, Ryan McDonald
  • for: This paper addresses the fundamental challenges of performing tasks on the web: combinatorially large open-world tasks and variations across web interfaces.
  • methods: The paper leverages large language models (LLMs) to decompose web tasks into a collection of sub-tasks, each solved by a low-level, closed-loop policy; these policies constitute a shared grammar, so new web tasks can be expressed as compositions of them.
  • results: The framework is evaluated against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, and live website interactions, and outperforms prior work while using orders of magnitude less data.
    Abstract Large language models (LLMs) have demonstrated remarkable capabilities in performing a range of instruction following tasks in few and zero-shot settings. However, teaching LLMs to perform tasks on the web presents fundamental challenges -- combinatorially large open-world tasks and variations across web interfaces. We tackle these challenges by leveraging LLMs to decompose web tasks into a collection of sub-tasks, each of which can be solved by a low-level, closed-loop policy. These policies constitute a shared grammar across tasks, i.e., new web tasks can be expressed as a composition of these policies. We propose a novel framework, Hierarchical Policies for Web Actions using LLMs (HeaP), that learns a set of hierarchical LLM prompts from demonstrations for planning high-level tasks and executing them via a sequence of low-level policies. We evaluate HeaP against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, as well as live website interactions, and show that it is able to outperform prior works using orders of magnitude less data.

Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities

  • paper_url: http://arxiv.org/abs/2310.03696
  • repo_url: None
  • paper_authors: Rahul Parhi, Michael Unser
  • For: Investigate the variational optimality (specifically, the Banach space optimality) of neural architectures with multivariate nonlinearities/activation functions.
  • Methods: Construct a new family of Banach spaces using regularization operators and the $k$-plane transform, and prove a representer theorem.
  • Results: The optimal neural architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models; the results shed light on the regularity of functions learned by neural networks with multivariate nonlinearities trained on data.
    Abstract We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.

Multimarginal generative modeling with stochastic interpolants

  • paper_url: http://arxiv.org/abs/2310.03695
  • repo_url: None
  • paper_authors: Michael S. Albergo, Nicholas M. Boffi, Michael Lindsey, Eric Vanden-Eijnden
  • for: Learn a joint distribution over multiple prescribed probability densities that recovers these densities as marginals.
  • methods: The approach generalizes the stochastic interpolant framework based on dynamical transport of measure: generative models are defined by velocity and score fields, characterized as minimizers of simple quadratic objectives, on a simplex that generalizes the time variable.
  • results: The resulting multimarginal generative models can extract multi-way correspondences among the marginals, with applications to style transfer, algorithmic fairness, and data decorruption; the multimarginal perspective also enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting. Numerical examples demonstrate these capabilities.
    Abstract Given a set of $K$ probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers these densities as marginals. The structure of this joint distribution should identify multi-way correspondences among the prescribed marginals. We formalize an approach to this task within a generalization of the stochastic interpolant framework, leading to efficient learning algorithms built upon dynamical transport of measure. Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple quadratic objectives, and they are defined on a simplex that generalizes the time variable in the usual dynamical transport framework. The resulting transport on the simplex is influenced by all marginals, and we show that multi-way correspondences can be extracted. The identification of such correspondences has applications to style transfer, algorithmic fairness, and data decorruption. In addition, the multimarginal perspective enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting. We demonstrate these capacities with several numerical examples.
    摘要 给定 $K$ 个概率密度,我们考虑多边缘生成建模问题:学习一个以这些密度为边缘分布的联合分布,其结构应当刻画各给定边缘之间的多向对应关系。我们在随机插值(stochastic interpolant)框架的一个推广中形式化这一任务,从而得到基于测度动态传输的高效学习算法。我们的生成模型由速度场和得分场定义,二者可以刻画为简单二次目标的最小化解,并且定义在一个推广了通常动态传输框架中时间变量的单纯形上。该单纯形上的传输受所有边缘分布影响,我们证明由此可以提取多向对应关系,这在风格迁移、算法公平性和数据修复等方面具有应用价值。此外,多边缘视角还使得在普通的双边缘设定下可以用一种高效算法降低动态传输成本。我们通过若干数值示例展示了这些能力。
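To make the construction above concrete, the sketch below trains a velocity field for the ordinary two-marginal special case of a stochastic interpolant (linear interpolant, quadratic regression objective) and then pushes the base Gaussian forward by integrating the learned ODE. It is a toy reading of the framework under our own assumptions (no score field, Euler integration, made-up Gaussian marginals), not the authors' multimarginal simplex construction.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 2, 4096
x0 = torch.randn(n, d)                        # samples of the base marginal rho_0
x1 = 0.5 * torch.randn(n, d) + 3.0            # samples of the target marginal rho_1

# Velocity field v(x, t); here t plays the role of a point on the 1-simplex.
v = nn.Sequential(nn.Linear(d + 1, 128), nn.SiLU(),
                  nn.Linear(128, 128), nn.SiLU(),
                  nn.Linear(128, d))
opt = torch.optim.Adam(v.parameters(), lr=1e-3)

for step in range(3000):
    i = torch.randint(0, n, (256,))
    a, b = x0[i], x1[i]
    t = torch.rand(256, 1)
    xt = (1 - t) * a + t * b                  # linear interpolant between the two marginals
    target = b - a                            # time derivative of the interpolant
    loss = ((v(torch.cat([xt, t], dim=1)) - target) ** 2).mean()   # simple quadratic objective
    opt.zero_grad()
    loss.backward()
    opt.step()

# Generation: push rho_0 forward by integrating dx/dt = v(x, t) with Euler steps.
x = torch.randn(1000, d)
with torch.no_grad():
    for k in range(100):
        t = torch.full((1000, 1), k / 100.0)
        x = x + 0.01 * v(torch.cat([x, t], dim=1))
print("generated mean (target is ~[3, 3]):", x.mean(dim=0))
```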

Hadamard Domain Training with Integers for Class Incremental Quantized Learning

  • paper_url: http://arxiv.org/abs/2310.03675
  • repo_url: None
  • paper_authors: Martin Schiemer, Clemens JS Schaefer, Jayden Parker Vap, Mark James Horeni, Yu Emma Wang, Juan Ye, Siddharth Joshi
  • for: 提高Edge平台上的继续学习性能,满足隐私和延迟低的应用需求。
  • methods: 利用低成本的哈达玛(Hadamard)变换,仅用整数矩阵乘法实现低精度训练;确定哪些张量需要随机舍入,并采用分块矩阵乘法以支持低位宽累加器。
  • results: 在人体活动识别和 CIFAR100 等数据集的类增量学习设置中,在把所有矩阵乘法输入量化到 4 位、累加器为 8 位的情况下,持续学习的精度下降分别不超过 0.5% 和 3%。
    Abstract Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constraint edge platforms. Reducing computational precision through fully quantized training (FQT) simultaneously reduces memory footprint and increases compute efficiency for both training and inference. However, aggressive quantization especially integer FQT typically degrades model accuracy to unacceptable levels. In this paper, we propose a technique that leverages inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications. We further determine which tensors need stochastic rounding and propose tiled matrix multiplication to enable low-bit width accumulators. We demonstrate the effectiveness of our technique on several human activity recognition datasets and CIFAR100 in a class incremental learning setting. We achieve less than 0.5% and 3% accuracy degradation while we quantize all matrix multiplications inputs down to 4-bits with 8-bit accumulators.
    摘要 持续学习是许多现代机器学习应用所需要的能力,它支持现场适应与更新,涵盖应对分布偏移、微调以及学习新任务等场景。对于有隐私和低时延要求的应用,持续学习带来的计算与内存开销对资源受限的边缘平台而言可能过于昂贵。通过全量化训练(FQT)降低计算精度,可以同时减小内存占用并提升训练和推理的计算效率;然而激进的量化(尤其是整数 FQT)通常会把模型精度降到不可接受的水平。本文提出一种利用低成本哈达玛变换的技术,使低精度训练仅需整数矩阵乘法即可完成。我们进一步确定了哪些张量需要随机舍入,并提出分块矩阵乘法以支持低位宽累加器。我们在多个人体活动识别数据集和 CIFAR100 的类增量学习设置中验证了该技术的有效性:在把所有矩阵乘法输入量化到 4 位、累加器为 8 位的情况下,精度下降分别不超过 0.5% 和 3%。
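The core trick named in the abstract — rotating matrix-multiplication inputs into the Hadamard domain before aggressive integer quantization — can be illustrated with a small NumPy sketch. This is a toy illustration of the general idea, not the paper's training pipeline; the fast Walsh-Hadamard routine, the symmetric per-tensor quantizer, and all function names are our own assumptions, and stochastic rounding and tiled accumulation are omitted.

```python
import numpy as np

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along the last axis (length must be a power of two)."""
    x = np.asarray(x, dtype=np.float64).copy()
    n = x.shape[-1]
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            a = x[..., i:i + h].copy()
            b = x[..., i + h:i + 2 * h].copy()
            x[..., i:i + h] = a + b
            x[..., i + h:i + 2 * h] = a - b
        h *= 2
    return x

def quantize(x, bits=4):
    """Symmetric per-tensor quantization to signed integers of the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax + 1e-12
    return np.clip(np.round(x / scale), -qmax, qmax).astype(np.int64), scale

def hadamard_quantized_matmul(a, w, bits=4):
    """Compute a @ w with low-bit inputs by quantizing in the Hadamard domain.

    Rotating both operands along the shared dimension spreads outliers across
    coordinates, which makes low-bit quantization less lossy; since H H^T = n I
    for the unnormalized transform, a @ w = (a H)(H w) / n up to quantization error.
    """
    n = a.shape[1]
    a_q, sa = quantize(fwht(a), bits)
    w_q, sw = quantize(fwht(w.T), bits)     # transform columns of w along the shared dimension
    acc = a_q @ w_q.T                       # pure integer accumulation
    return acc * (sa * sw / n)

a = np.random.randn(4, 16)
w = np.random.randn(16, 8)
print("max abs error vs float matmul:", np.max(np.abs(hadamard_quantized_matmul(a, w) - a @ w)))
```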

Strategic Evaluation: Subjects, Evaluators, and Society

  • paper_url: http://arxiv.org/abs/2310.03655
  • repo_url: None
  • paper_authors: Benjamin Laufer, Jon Kleinberg, Karen Levy, Helen Nissenbaum
  • For: The paper is written to explore the idea that evaluations can be understood as strategic interactions between the evaluator and the subject of evaluation, and how this can lead to misaligned goals and moral judgments.* Methods: The paper uses a model with three interacting agents - the decision subject, the evaluator, and society - to represent the process of evaluation and the strategic behaviors that can arise.* Results: The paper highlights the applicability of the model to a number of social systems where one or two players strategically undermine the others’ interests to advance their own, and argues that the moral standing of strategic behaviors depends on the moral standing of the evaluations and incentives that provoke such behaviors.
    Abstract A broad current application of algorithms is in formal and quantitative measures of murky concepts -- like merit -- to make decisions. When people strategically respond to these sorts of evaluations in order to gain favorable decision outcomes, their behavior can be subjected to moral judgments. They may be described as 'gaming the system' or 'cheating,' or (in other cases) investing 'honest effort' or 'improving.' Machine learning literature on strategic behavior has tried to describe these dynamics by emphasizing the efforts expended by decision subjects hoping to obtain a more favorable assessment -- some works offer ways to preempt or prevent such manipulations, some differentiate 'gaming' from 'improvement' behavior, while others aim to measure the effort burden or disparate effects of classification systems. We begin from a different starting point: that the design of an evaluation itself can be understood as furthering goals held by the evaluator which may be misaligned with broader societal goals. To develop the idea that evaluation represents a strategic interaction in which both the evaluator and the subject of their evaluation are operating out of self-interest, we put forward a model that represents the process of evaluation using three interacting agents: a decision subject, an evaluator, and society, representing a bundle of values and oversight mechanisms. We highlight our model's applicability to a number of social systems where one or two players strategically undermine the others' interests to advance their own. Treating evaluators as themselves strategic allows us to re-cast the scrutiny directed at decision subjects, towards the incentives that underpin institutional designs of evaluations. The moral standing of strategic behaviors often depend on the moral standing of the evaluations and incentives that provoke such behaviors.
    摘要 算法的一类广泛应用,是用正式的、可量化的指标来度量"优劣"等模糊概念并据此做出决策。当人们为了获得更有利的决策结果而策略性地回应这类评估时,其行为可能受到道德评判:他们或被描述为"钻系统空子"或"作弊",或(在另一些情形下)被视为付出"真实努力"或"自我改进"。机器学习文献在描述这类策略行为时,通常强调被评估者为获得更有利评估所付出的努力——有的工作提出预防或阻止此类操纵的方法,有的区分"钻空子"与"改进"行为,另一些则试图度量分类系统带来的努力负担或差异性影响。我们从一个不同的出发点开始:评估的设计本身可以被理解为服务于评估者所持有的目标,而这些目标可能与更广泛的社会目标不一致。为了阐述"评估是一种策略互动、评估者与被评估者都出于自身利益行事"这一观点,我们提出一个由三个相互作用主体构成的模型:决策主体、评估者和社会(后者代表一组价值观与监督机制)。我们说明该模型适用于多种社会系统,其中一方或两方通过策略性手段损害他方利益以谋取自身利益。把评估者本身视为策略性主体,使我们能够把原本针对决策主体的审视,转向支撑评估制度设计的激励机制。策略行为的道德地位,往往取决于引发这些行为的评估与激励机制本身的道德地位。

Extreme sparsification of physics-augmented neural networks for interpretable model discovery in mechanics

  • paper_url: http://arxiv.org/abs/2310.03652
  • repo_url: None
  • paper_authors: Jan N. Fuhg, Reese E. Jones, Nikolaos Bouklas
  • for: 提出一种基于神经网络的数据驱动本构(constitutive)模型,使其既能方便地纳入物理与机理约束,又能避免人工构造既困难又耗时、且难以准确刻画材料响应的唯象本构律。
  • methods: 在物理增强的神经网络本构模型上,采用平滑化的 $L^{0}$ 正则化进行训练,在保持物理约束带来的可信性的同时获得可解释性。
  • results: 论文表明,这种方法可靠地获得可解释性和信任性的 constitutive 模型,并且可以应用于压缩和不压缩的 hyperelasticity、yield 函数和硬化模型。
    Abstract Data-driven constitutive modeling with neural networks has received increased interest in recent years due to its ability to easily incorporate physical and mechanistic constraints and to overcome the challenging and time-consuming task of formulating phenomenological constitutive laws that can accurately capture the observed material response. However, even though neural network-based constitutive laws have been shown to generalize proficiently, the generated representations are not easily interpretable due to their high number of trainable parameters. Sparse regression approaches exist that allow obtaining interpretable expressions, but the user is tasked with creating a library of model forms which by construction limits their expressiveness to the functional forms provided in the libraries. In this work, we propose to train regularized physics-augmented neural network-based constitutive models utilizing a smoothed version of $L^{0}$-regularization. This aims to maintain the trustworthiness inherited by the physical constraints, but also enables interpretability which has not been possible thus far on any type of machine learning-based constitutive model where model forms were not assumed a priori but were actually discovered. During the training process, the network simultaneously fits the training data and penalizes the number of active parameters, while also ensuring constitutive constraints such as thermodynamic consistency. We show that the method can reliably obtain interpretable and trustworthy constitutive models for compressible and incompressible hyperelasticity, yield functions, and hardening models for elastoplasticity, for synthetic and experimental data.
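The central ingredient — penalizing the number of active parameters with a smoothed, differentiable stand-in for the $L^{0}$ "norm" — can be sketched as follows. The particular surrogate (a Gaussian-bump relaxation), the toy regression problem, and the threshold used to count active weights are illustrative assumptions; the paper's physics-augmented constitutive networks and thermodynamic constraints are not reproduced here.

```python
import torch
import torch.nn as nn

def smoothed_l0(params, sigma=0.1):
    """Differentiable surrogate for the number of non-zero parameters.

    Each term tends to 1 when |w| >> sigma and to 0 when w -> 0, so the sum
    approaches the L0 'norm' as sigma -> 0 while remaining differentiable.
    """
    return sum((1.0 - torch.exp(-(p ** 2) / sigma ** 2)).sum() for p in params)

# Toy regression stand-in for a constitutive model: y = f(strain invariants).
torch.manual_seed(0)
x = torch.rand(512, 3)
y = 2.0 * x[:, :1] + 0.5 * x[:, 1:2] ** 2        # only two of the three inputs matter

model = nn.Sequential(nn.Linear(3, 32), nn.Softplus(), nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
lam = 1e-3                                       # sparsity weight

for step in range(3000):
    pred = model(x)
    loss = ((pred - y) ** 2).mean() + lam * smoothed_l0(model.parameters())
    opt.zero_grad()
    loss.backward()
    opt.step()

active = sum((p.abs() > 1e-2).sum().item() for p in model.parameters())
print("approximately active parameters:", active)
```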

Rethinking Fairness for Human-AI Collaboration

  • paper_url: http://arxiv.org/abs/2310.03647
  • repo_url: None
  • paper_authors: Haosen Ge, Hamsa Bastani, Osbert Bastani
  • for: Ensuring equitable outcomes in human-AI collaboration, especially when human decision-makers do not comply perfectly with algorithmic decisions.
  • methods: Propose a new approach called compliance-robustly fair algorithmic recommendations, which are guaranteed to improve fairness in decisions regardless of the human’s compliance pattern. An optimization strategy is also proposed to identify the best performance-improving compliance-robustly fair policy.
  • results: Show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy, which means that enforcing traditional fairness constraints may not be desirable if our goal is to improve the equity and accuracy of human-AI collaboration.
    Abstract Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.
    摘要 现有的算法公平性方法旨在保证:当人类决策者完全服从算法决策时,结果是公平的。然而,在人机协作中,完全服从算法既很少成为现实,也未必是理想的结果。近期研究表明,对公平算法的选择性服从,反而可能相对于原有的人类决策放大歧视。因此,要保证公平的结果,需要根本不同的算法设计原则,使算法对决策者(事先未知的)服从模式具有鲁棒性。我们定义了"服从鲁棒公平"(compliance-robustly fair)的算法建议:无论人类的服从模式如何,这类建议都保证(弱地)改进决策的公平性。我们提出一种简单的优化策略,用于找出性能提升最大的服从鲁棒公平策略。然而我们也证明,可能无法设计出同时满足孤立公平、服从鲁棒公平且比人类策略更准确的算法建议;因此,如果目标是提升人机协作的公平性和准确性,强制执行传统的公平性约束未必是可取的。

Distributional PAC-Learning from Nisan’s Natural Proofs

  • paper_url: http://arxiv.org/abs/2310.03641
  • repo_url: None
  • paper_authors: Ari Karchmer
  • for: 这个论文的目的是为了研究自然证明是否可以导致有效地学习Lambda-Circuit,并且是否可以扩展到Lambda不等于AC^0[p]和Valiant的PAC模型中。
  • methods: 这篇论文使用了自然证明的概念,并使用了一种通信复杂度Argument来提出了一种新的分布型PAC模型,以及一些有关这种模型的性质和应用。
  • results: 论文的主要结论是,在某些特定的自然证明情况下,可以有效地学习Lambda-Circuit在新的分布型PAC模型中,并且可以应用于深度2的多数票电路、多面体和DNF等问题。此外,论文还证明了这种模型的一些重要性质和应用。
    Abstract (Abridged) Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for \Lambda imply efficient algorithms for learning \Lambda-circuits, but only over the uniform distribution, with membership queries, and provided \AC^0[p] \subseteq \Lambda. We consider whether this implication can be generalized to \Lambda \not\supseteq \AC^0[p], and to learning algorithms in Valiant's PAC model, which use only random examples and learn over arbitrary example distributions. We give results of both positive and negative flavor. On the negative side, we observe that if, for every circuit class \Lambda, the implication from natural proofs for \Lambda to learning \Lambda-circuits in Valiant's PAC model holds, then there is a polynomial time solution to O(n^{1.5})-uSVP (unique Shortest Vector Problem), and polynomial time quantum solutions to O(n^{1.5})-SVP (Shortest Vector Problem) and O(n^{1.5})-SIVP (Shortest Independent Vector Problem). This indicates that whether natural proofs for \Lambda imply efficient learning algorithms for \Lambda in Valiant's PAC model may depend on \Lambda. On the positive side, our main result is that specific natural proofs arising from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits) imply PAC-learning algorithms in a new distributional variant of Valiant's model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being boosting-friendly. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.
    摘要 (节选) Carmosino 等人(2016)证明:对电路类 \Lambda 的电路下界的自然证明,蕴涵在均匀分布下、借助成员查询学习 \Lambda 电路的高效算法,前提是 \AC^0[p] \subseteq \Lambda。我们考虑这一蕴涵能否推广到 \Lambda 不包含 \AC^0[p] 的情形,以及推广到 Valiant 的 PAC 模型(只使用随机样本、并在任意样本分布上学习)中的学习算法。我们给出了正反两方面的结果。反面结果:如果对每个电路类 \Lambda,"\Lambda 的自然证明蕴涵在 Valiant PAC 模型中学习 \Lambda 电路"都成立,那么 O(n^{1.5})-uSVP(唯一最短向量问题)存在多项式时间解法,且 O(n^{1.5})-SVP(最短向量问题)和 O(n^{1.5})-SIVP(最短线性无关向量问题)存在多项式时间量子解法。这表明自然证明是否蕴涵 Valiant PAC 模型中的高效学习,可能依赖于 \Lambda。正面结果:我们的主要结论是,源自某类通信复杂性论证的特定自然证明(例如 Nisan(1993)针对深度 2 多数电路的证明),蕴涵在 Valiant 模型的一个新的分布式变体中的 PAC 学习算法。我们的分布式 PAC 模型强于 Blum 等人(1993)的平均情形预测模型和 Nanashima(2021)的启发式 PAC 模型,并具有若干使其本身具有独立意义的重要性质(例如与 boosting 兼容)。主要应用包括:针对自然目标分布的深度 2 多数电路、多面体和 DNF 的新的分布式 PAC 学习算法,以及可由深度 2 多数电路求值的编码输入弱伪随机函数(weak PRF)不存在。

CLASSify: A Web-Based Tool for Machine Learning

  • paper_url: http://arxiv.org/abs/2310.03618
  • repo_url: https://github.com/Aastha2104/Parkinson-Disease-Prediction
  • paper_authors: Aaron D. Mullen, Samuel E. Armstrong, Jeff Talbert, V. K. Cody Bumgardner
  • for: 用于简化机器学习分类问题的解决方案,帮助研究者不需具备深入的机器学习知识就可以使用这种技术。
  • methods: 使用自动化工具,提供多种模型和方法,同时支持生成synthetic数据,进行特征评估,并生成解释性分数以显示影响输出最大的特征。
  • results: 提供了一个开源的工具,可以帮助研究者更容易地解决分类问题,不需要深入的机器学习知识。
    Abstract Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.
    摘要 machine learning 分类问题在生物信息学中广泛,但技术知识要求进行模型训练、优化和推理可能会阻碍研究人员使用这种技术。本文介绍了一个自动化工具来简化机器学习分类问题的训练模型和获得结果,并提供了有用的可视化和数据分析。这个工具支持 binary 和多类分类问题,并提供了多种模型和方法。在界面中,您可以生成假数据来填充缺失的值、平衡类别标签或生成全新的数据集。此外,它还提供了特征评估和生成可读性分数,以指示影响输出的特征。我们介绍了 CLASSify,一个开源的工具来简化解决分类问题的用户体验,无需了解机器学习知识。

Adversarial Machine Learning for Social Good: Reframing the Adversary as an Ally

  • paper_url: http://arxiv.org/abs/2310.03614
  • repo_url: None
  • paper_authors: Shawqi Al-Maliki, Adnan Qayyum, Hassan Ali, Mohamed Abdallah, Junaid Qadir, Dinh Thai Hoang, Dusit Niyato, Ala Al-Fuqaha
  • for: 这篇论文旨在探讨 AdvML for Social Good (AdvML4G) 这一新兴领域:它把对抗机器学习(AdvML)中的"漏洞"重新加以利用,用于开发一系列有益社会(pro-social)的应用。
  • methods: 本论文使用了一种稠密的分析方法,涵盖了 AdvML4G 领域的各种研究和应用,包括一个分类法和一个综述。
  • results: 研究发现,AdvML4G 领域的工作具有很高的创新性和可行性,但同时也存在一些挑战和未解决的问题,需要进一步的研究和发展。
    Abstract Deep Neural Networks (DNNs) have been the driving force behind many of the recent advances in machine learning. However, research has shown that DNNs are vulnerable to adversarial examples -- input samples that have been perturbed to force DNN-based models to make errors. As a result, Adversarial Machine Learning (AdvML) has gained a lot of attention, and researchers have investigated these vulnerabilities in various settings and modalities. In addition, DNNs have also been found to incorporate embedded bias and often produce unexplainable predictions, which can result in anti-social AI applications. The emergence of new AI technologies that leverage Large Language Models (LLMs), such as ChatGPT and GPT-4, increases the risk of producing anti-social applications at scale. AdvML for Social Good (AdvML4G) is an emerging field that repurposes the AdvML bug to invent pro-social applications. Regulators, practitioners, and researchers should collaborate to encourage the development of pro-social applications and hinder the development of anti-social ones. In this work, we provide the first comprehensive review of the emerging field of AdvML4G. This paper encompasses a taxonomy that highlights the emergence of AdvML4G, a discussion of the differences and similarities between AdvML4G and AdvML, a taxonomy covering social good-related concepts and aspects, an exploration of the motivations behind the emergence of AdvML4G at the intersection of ML4G and AdvML, and an extensive summary of the works that utilize AdvML4G as an auxiliary tool for innovating pro-social applications. Finally, we elaborate upon various challenges and open research issues that require significant attention from the research community.
    摘要 深度神经网络(DNN)是近年来机器学习诸多进展的驱动力。然而,研究表明 DNN 容易受到对抗样本的影响——即经过扰动、迫使基于 DNN 的模型出错的输入样本。因此,对抗机器学习(AdvML)受到广泛关注,研究者在多种场景和模态下考察了这些脆弱性。此外,DNN 还被发现会内嵌偏见并常常给出不可解释的预测,从而可能导致有害社会的 AI 应用。利用大语言模型(LLM)的新 AI 技术(如 ChatGPT 和 GPT-4)的出现,进一步增加了大规模产生有害社会应用的风险。AdvML for Social Good(AdvML4G)是一个新兴领域,它把 AdvML 的"漏洞"重新用于发明有益社会的应用。监管者、从业者和研究者应当共同合作,鼓励有益社会应用的开发,并阻碍有害社会应用的发展。本文对 AdvML4G 这一新兴领域给出了首个全面综述,内容包括:刻画 AdvML4G 兴起的分类体系,AdvML4G 与 AdvML 的异同讨论,涵盖社会公益相关概念与方面的分类体系,对 AdvML4G 在 ML4G 与 AdvML 交汇处兴起的动因的探讨,以及对将 AdvML4G 用作创造有益社会应用的辅助工具的工作的系统总结。最后,我们阐述了需要研究社区重点关注的若干挑战与开放问题。

GENER: A Parallel Layer Deep Learning Network To Detect Gene-Gene Interactions From Gene Expression Data

  • paper_url: http://arxiv.org/abs/2310.03611
  • repo_url: https://github.com/ahmedfakhry47/gener
  • paper_authors: Ahmed Fakhry, Raneem Khafagy, Adriaan-Alexander Ludl
  • for: identifying novel gene-gene interactions based on known gene expressions and interaction data
  • methods: parallel-layer deep learning network (GENER) using gene expression data
  • results: outperformed competing methods with an average AUROC score of 0.834 on the combined BioGRID&DREAM5 dataset
    Abstract Detecting and discovering new gene interactions based on known gene expressions and gene interaction data presents a significant challenge. Various statistical and deep learning methods have attempted to tackle this challenge by leveraging the topological structure of gene interactions and gene expression patterns to predict novel gene interactions. In contrast, some approaches have focused exclusively on utilizing gene expression profiles. In this context, we introduce GENER, a parallel-layer deep learning network designed exclusively for the identification of gene-gene relationships using gene expression data. We conducted two training experiments and compared the performance of our network with that of existing statistical and deep learning approaches. Notably, our model achieved an average AUROC score of 0.834 on the combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting gene-gene interactions.
    摘要 基于已知基因表达和基因相互作用数据来检测并发现新的基因—基因相互作用,是一项具有挑战性的任务。各种统计方法和深度学习方法尝试利用基因相互作用的拓扑结构与基因表达模式来预测新的相互作用,也有一些方法只利用基因表达谱。在此背景下,我们提出 GENER:一种专为利用基因表达数据识别基因—基因关系而设计的并行层深度学习网络。我们进行了两组训练实验,并把所提网络的性能与现有的统计方法和深度学习方法进行比较。值得注意的是,我们的模型在合并的 BioGRID&DREAM5 数据集上取得了 0.834 的平均 AUROC 分数,在预测基因—基因相互作用方面优于同类方法。
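A gene-gene interaction classifier that runs the two expression profiles through parallel branches before a joint prediction head — in the spirit of the parallel-layer design named above — might look like the following sketch. Layer sizes, the way the branches are merged, and the toy training step are our own assumptions, not the GENER architecture.

```python
import torch
import torch.nn as nn

class ParallelPairClassifier(nn.Module):
    """Two parallel encoders over the expression vectors of gene A and gene B,
    followed by a joint head that predicts whether the pair interacts."""
    def __init__(self, n_samples, hidden=64):
        super().__init__()
        self.branch_a = nn.Sequential(nn.Linear(n_samples, hidden), nn.ReLU())
        self.branch_b = nn.Sequential(nn.Linear(n_samples, hidden), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, expr_a, expr_b):
        h = torch.cat([self.branch_a(expr_a), self.branch_b(expr_b)], dim=1)
        return self.head(h).squeeze(1)          # logit for "this pair interacts"

# Toy usage with random expression profiles measured over 100 samples.
model = ParallelPairClassifier(n_samples=100)
expr_a, expr_b = torch.randn(32, 100), torch.randn(32, 100)
labels = torch.randint(0, 2, (32,)).float()     # stand-in interaction labels
loss = nn.functional.binary_cross_entropy_with_logits(model(expr_a, expr_b), labels)
loss.backward()
```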

Comparing Time-Series Analysis Approaches Utilized in Research Papers to Forecast COVID-19 Cases in Africa: A Literature Review

  • paper_url: http://arxiv.org/abs/2310.03606
  • repo_url: None
  • paper_authors: Ali Ebadi, Ebrahim Sahafizadeh
  • for: 本文旨在比较在非洲预测COVID-19病例的不同时间序分析方法。
  • methods: 本研究使用英语论文,从2020年1月至2023年7月进行了系统性的搜索,特意搜索了在非洲COVID-19数据集上使用时间序分析方法的论文。使用了PubMed、Google Scholar、Scopus和Web of Science等数据库。研究论文经过了评估程序,以提取相关的时间序分析模型实施和性能信息。
  • results: 本研究发现了不同的方法ologies,评估了它们在预测病毒传播的有效性和局限性。结果可能为预测COVID-19病例提供更深入的理解,未来研究应考虑这些理解,以提高时间序分析模型和探索不同方法的集成,以提高公共卫生决策。
    Abstract This literature review aimed to compare various time-series analysis approaches utilized in forecasting COVID-19 cases in Africa. The study involved a methodical search for English-language research papers published between January 2020 and July 2023, focusing specifically on papers that utilized time-series analysis approaches on COVID-19 datasets in Africa. A variety of databases including PubMed, Google Scholar, Scopus, and Web of Science were utilized for this process. The research papers underwent an evaluation process to extract relevant information regarding the implementation and performance of the time-series analysis models. The study highlighted the different methodologies employed, evaluating their effectiveness and limitations in forecasting the spread of the virus. The result of this review could contribute deeper insights into the field, and future research should consider these insights to improve time series analysis models and explore the integration of different approaches for enhanced public health decision-making.
    摘要 本文献综述旨在比较用于预测非洲 COVID-19 病例的各种时间序列分析方法。研究系统检索了 2020 年 1 月至 2023 年 7 月间发表的英文论文,重点是对非洲 COVID-19 数据集应用时间序列分析方法的论文,检索使用了 PubMed、Google Scholar、Scopus 和 Web of Science 等数据库。我们对入选论文进行评估,提取有关时间序列分析模型实现与性能的信息,总结了不同方法及其在预测病毒传播方面的有效性与局限性。该综述的结果可以为这一领域提供更深入的理解;未来研究应参考这些结论,以改进时间序列分析模型,并探索不同方法的集成,从而支持公共卫生决策。

Sampling via Gradient Flows in the Space of Probability Measures

  • paper_url: http://arxiv.org/abs/2310.03597
  • repo_url: None
  • paper_authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich, Andrew M Stuart
  • for: 从计算科学与工程中的基本问题——对归一化常数未知的目标概率分布进行采样——出发,研究由概率测度空间中的梯度流导出的采样算法的三个设计要素:能量泛函、度量以及对流的数值近似。
  • methods: 证明以 Kullback-Leibler 散度作为能量泛函时,所得梯度流不依赖目标分布的归一化常数;从不变性角度研究度量的选取,在微分同胚不变的 Fisher-Rao 度量之外,引入一种放宽的仿射不变性质,并构造多种仿射不变的 Wasserstein 与 Stein 梯度流;进一步研究梯度流的高斯近似,并据此设计高效算法,作为粒子方法的替代。
  • results: 理论分析与粒子法实验均表明,在采样高度各向异性分布时,仿射不变梯度流优于非仿射不变的对应方法;同时建立了多种高斯近似梯度流之间、以及它们与参数化变分推断所产生的梯度方法之间的联系,并从理论和数值两方面研究了其收敛性质。
    Abstract Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
    摘要 对归一化常数未知的目标概率分布进行采样,是计算科学与工程中的基本挑战。近期研究表明,从概率测度空间中的梯度流出发推导算法,为算法设计开辟了新途径。本文从三个方面对这一采样思路做出贡献。任何用于采样的梯度流都需要一个能量泛函和一个度量来确定流,并需要对流进行数值近似以导出算法。我们的第一项贡献是证明:以 Kullback-Leibler 散度作为能量泛函时(在所有 f-散度中唯一地),由其导出的梯度流不依赖目标分布的归一化常数。第二项贡献是从不变性的角度研究度量的选取:Fisher-Rao 度量是(在缩放意义下)唯一的微分同胚不变度量;作为计算上更可行的替代,我们为度量和梯度流引入一种放宽的仿射不变性质,并构造了多种仿射不变的 Wasserstein 与 Stein 梯度流。理论分析和粒子法实验均表明,在采样高度各向异性分布时,仿射不变梯度流的表现优于非仿射不变的对应方法。第三项贡献是研究梯度流的高斯近似并据此发展高效算法,从而得到粒子方法之外的另一类方案;我们建立了多种高斯近似梯度流之间的联系,讨论了它们与参数化变分推断所产生的梯度方法的关系,并从理论和数值两方面研究了其收敛性质。
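One concrete way to see why affine invariance helps on anisotropic targets is an ensemble Langevin scheme whose drift and noise are preconditioned by the ensemble covariance, so that an affine change of variables leaves the dynamics essentially unchanged. The sketch below applies this covariance-preconditioned update to a stretched Gaussian; it is only one simple instance of an affine-invariant flow (and omits the finite-ensemble correction term), not necessarily the Wasserstein or Stein constructions of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Anisotropic Gaussian target N(0, diag(25, 0.04)); grad log p(x) = -Sigma^{-1} x.
cov_target = np.diag([25.0, 0.04])
prec = np.linalg.inv(cov_target)
grad_log_p = lambda x: -x @ prec                  # rows of x are particles

n, d, dt = 200, 2, 0.01
x = rng.normal(size=(n, d))                       # initial ensemble

for step in range(3000):
    c = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)     # ensemble covariance preconditioner
    chol = np.linalg.cholesky(c)
    # dx = C grad log p dt + sqrt(2 dt) C^{1/2} dW   (finite-ensemble correction omitted)
    x = x + dt * grad_log_p(x) @ c + np.sqrt(2.0 * dt) * rng.normal(size=(n, d)) @ chol.T

print("empirical covariance of the ensemble:\n", np.cov(x, rowvar=False))
```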

TimeGPT-1

  • paper_url: http://arxiv.org/abs/2310.03589
  • repo_url: None
  • paper_authors: Azul Garza, Max Mergenthaler-Canseco
  • for: 这个论文旨在提出一种基于时间序列的基础模型,能够生成准确的预测结果,并不需要训练数据集包含该预测目标。
  • methods: 该论文使用了一种基于Transformer的强大预测模型,并通过预训练和自适应训练来提高模型的性能。
  • results: 论文的实验结果表明,TimeGPT模型在不同的时间序列数据集上的预测性能强于传统的统计学、机器学习和深度学习方法,同时具有较高的效率和简洁性。
    Abstract In this paper, we introduce TimeGPT, the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training. We evaluate our pre-trained model against established statistical, machine learning, and deep learning methods, demonstrating that TimeGPT zero-shot inference excels in performance, efficiency, and simplicity. Our study provides compelling evidence that insights from other domains of artificial intelligence can be effectively applied to time series analysis. We conclude that large-scale time series models offer an exciting opportunity to democratize access to precise predictions and reduce uncertainty by leveraging the capabilities of contemporary advancements in deep learning.
    摘要 在本文中,我们介绍了 TimeGPT,首个适用于时间序列的基础模型,能够生成准确的预测结果,无需见到训练数据。我们对我们的预训练模型进行了对比,与统计学、机器学习和深度学习方法进行了比较,结果显示,TimeGPT零shot推理性能高效简单。我们的研究表明,从其他人工智能领域的技术可以有效应用于时间序列分析。我们认为,大规模的时间序列模型将为精确预测和减少不确定性提供了一个激动人心的机会,通过利用当代深度学习技术。

Smoothing Methods for Automatic Differentiation Across Conditional Branches

  • paper_url: http://arxiv.org/abs/2310.03585
  • repo_url: https://github.com/philipp-andelfinger/discograd
  • paper_authors: Justin N. Kreikemeyer, Philipp Andelfinger
  • for: 解决由条件分支等控制流结构引入的不连续性给假定目标函数响应面光滑的数学优化方法带来的困难。
  • methods: 将平滑解释(smooth interpretation, SI)——一种用高斯核对程序输出做卷积的抽象解释——与自动微分(AD)相结合,以高效计算平滑后程序的梯度;并提出一种结合 AD 与采样的新蒙特卡洛估计器,以避免 SI 为保证可计算性所做的近似假设。
  • results: 借助将简单 C++ 程序自动转换为平滑可微形式的工具 DiscoGrad,在四个本质上不连续的问题(从经典的基于仿真的优化到神经网络驱动的控制)上与现有的无梯度方法和随机方法进行比较;基于 SI 的估计器的优化进展取决于程序控制流的复杂程度,而所提蒙特卡洛估计器在所有问题上都具有竞争力,并在维度最高的问题上以明显优势收敛最快。
    Abstract Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables the direct gradient-based parameter synthesis for branching programs, allowing for instance the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.
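A minimal way to see how smoothing rescues gradient-based optimization on a branching program is the Gaussian-smoothing (score-function) estimator below: perturb the parameters, run the discontinuous program, and estimate the gradient of the smoothed objective from samples. This stand-alone estimator is in the spirit of, but much cruder than, the combination of smooth interpretation, AD, and the DiscoGrad Monte Carlo estimator described in the abstract; the toy program and step sizes are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def program(theta):
    """A discontinuous 'simulation': a conditional branch decides which cost applies."""
    x, y = theta
    if x + y > 1.0:                       # control-flow discontinuity
        return (x - 2.0) ** 2 + y ** 2
    return (x + 1.0) ** 2 + (y - 3.0) ** 2 + 5.0

def smoothed_grad(theta, sigma=0.3, n_samples=256):
    """Monte Carlo gradient of E_eps[ program(theta + sigma*eps) ] via the Gaussian score."""
    eps = rng.normal(size=(n_samples, theta.size))
    vals = np.array([program(theta + sigma * e) for e in eps])
    vals = vals - vals.mean()             # baseline for variance reduction (keeps the estimate unbiased)
    return (vals[:, None] * eps).mean(axis=0) / sigma

theta = np.array([0.0, 0.0])
for step in range(300):
    theta = theta - 0.05 * smoothed_grad(theta)   # descend the smoothed objective
print("optimized parameters:", theta)
```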

Targeted Adversarial Attacks on Generalizable Neural Radiance Fields

  • paper_url: http://arxiv.org/abs/2310.03578
  • repo_url: None
  • paper_authors: Andras Horvath, Csaba M. Jozsa
  • for: This paper discusses the vulnerability of Neural Radiance Fields (NeRFs) to adversarial attacks, and demonstrates the effectiveness of both low-intensity and targeted attacks.
  • methods: The paper uses NeRFs to synthesize high-quality images from sparse 2D observations, and employs both low-intensity and targeted adversarial attacks to evaluate the model’s robustness.
  • results: The paper shows that NeRFs can be vulnerable to both low-intensity and targeted adversarial attacks, and that the attacks can be robust enough to be used in real-world applications. Additionally, the paper demonstrates the ability to generate specific, predefined output scenes using targeted attacks.
    Abstract Neural Radiance Fields (NeRFs) have recently emerged as a powerful tool for 3D scene representation and rendering. These data-driven models can learn to synthesize high-quality images from sparse 2D observations, enabling realistic and interactive scene reconstructions. However, the growing usage of NeRFs in critical applications such as augmented reality, robotics, and virtual environments could be threatened by adversarial attacks. In this paper we present how generalizable NeRFs can be attacked by both low-intensity adversarial attacks and adversarial patches, where the later could be robust enough to be used in real world applications. We also demonstrate targeted attacks, where a specific, predefined output scene is generated by these attack with success.
    摘要 neural radiance fields (NeRFs) 近期出现为3D场景表示和渲染的强大工具。这些数据驱动模型可以从稀疏的2D观察中学习生成高质量图像,使得场景重建变得真实和交互式。然而,随着NeRFs在扩展实际应用,如增强现实、机器人和虚拟环境中的使用,它们可能会受到恶意攻击。 在这篇论文中,我们表明了通用NeRFs可以受到低强度攻击和攻击贴图的威胁。其中,攻击贴图可能够在实际应用中使用,并且可以实现特定、预先定义的输出场景。我们还示出了 Targeted 攻击,即通过攻击NeRFs来生成特定的场景。

Analysis of learning a flow-based generative model from limited sample complexity

  • paper_url: http://arxiv.org/abs/2310.03575
  • repo_url: https://github.com/spoc-group/diffusion_gmm
  • paper_authors: Hugo Cui, Florent Krzakala, Eric Vanden-Eijnden, Lenka Zdeborová
  • for: 研究训练一个由两层自编码器参数化的基于流的生成模型,使其从高维高斯混合分布中采样的问题。
  • methods: 对在目标分布的有限个($n$ 个)样本上训练的浅层去噪自编码器,给出其所学速度场的紧致闭式刻画,并据此精确描述相应的生成流。
  • results: 给出了生成混合分布均值与目标混合分布均值之间距离的闭式公式,证明该距离以 $\Theta_n(1/n)$ 的速度衰减,并进一步证明这一速率是贝叶斯最优的。
    Abstract We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.
    摘要 我们研究如下问题:训练一个由两层自编码器参数化的基于流的生成模型,使其从高维高斯混合分布中采样。我们对该问题给出了严格的端到端分析。首先,当速度场由在目标分布的有限个($n$ 个)样本上训练的浅层去噪自编码器参数化时,我们给出其紧致的闭式刻画。在此基础上,我们精确描述了相应的生成流,它把基准高斯密度推送为目标密度的一个近似。特别地,我们给出了生成混合分布均值与目标混合分布均值之间距离的闭式公式,并证明该距离以 $\Theta_n(\frac{1}{n})$ 的速度衰减。最后,我们证明这一速率实际上是贝叶斯最优的。

Residual Multi-Fidelity Neural Network Computing

  • paper_url: http://arxiv.org/abs/2310.03572
  • repo_url: None
  • paper_authors: Owen Davis, Mohammad Motamed, Raul Tempone
  • for: constructed a neural network surrogate model using multi-fidelity information
  • methods: residual multi-fidelity computational framework, two neural networks working in concert
  • results: dramatic savings in computational cost, accurate predictions within small tolerances
    Abstract In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Given an inexpensive low-fidelity and an expensive high-fidelity computational model, we present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function, a possibly non-linear mapping between 1) the shared input space of the models together with the low-fidelity model output and 2) the discrepancy between the two model outputs. To accomplish this, we train two neural networks to work in concert. The first network learns the residual function on a small set of high-fidelity and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of a second network. This second network, once trained, acts as our surrogate for the high-fidelity quantity of interest. We present three numerical examples to demonstrate the power of the proposed framework. In particular, we show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.
    摘要 在这项工作中,我们考虑利用多保真度信息构造神经网络代理模型这一一般性问题。给定一个低成本的低保真度计算模型和一个高成本的高保真度计算模型,我们提出一个残差多保真度计算框架,把两个模型之间的关联表述为一个残差函数,即从(两模型共享的输入空间与低保真度模型输出)到(两模型输出之差)的一个可能非线性的映射。为此,我们训练两个协同工作的神经网络:第一个网络在少量高、低保真度数据上学习该残差函数;训练完成后,用它生成额外的合成高保真度数据,再用这些数据训练第二个网络;训练好的第二个网络即作为目标高保真度量的代理模型。我们给出三个数值算例来展示该框架的能力,特别是当要求输出预测达到较小容差时,可以显著节省计算成本。
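A bare-bones version of the two-network recipe — learn the low-to-high-fidelity residual from a few expensive samples, use it to synthesize extra high-fidelity data, then fit the final surrogate — could look like this. The 1-D test functions, the scikit-learn MLPs, and all sizes are illustrative assumptions rather than the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def f_high(x):   # expensive high-fidelity model
    return np.sin(8.0 * x) + x ** 2

def f_low(x):    # cheap low-fidelity approximation
    return np.sin(8.0 * x)

# Lots of cheap inputs, very little expensive data.
x_lo = rng.uniform(0, 1, 400).reshape(-1, 1)
x_hi = rng.uniform(0, 1, 15).reshape(-1, 1)

# Stage 1: learn the residual (discrepancy) as a function of (x, low-fidelity output).
feats_hi = np.hstack([x_hi, f_low(x_hi)])
residual_net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
residual_net.fit(feats_hi, (f_high(x_hi) - f_low(x_hi)).ravel())

# Use it to generate synthetic high-fidelity labels on the abundant cheap inputs.
feats_lo = np.hstack([x_lo, f_low(x_lo)])
y_synth = f_low(x_lo).ravel() + residual_net.predict(feats_lo)

# Stage 2: train the surrogate of the high-fidelity quantity on the enlarged data set.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
surrogate.fit(x_lo, y_synth)

x_test = np.linspace(0, 1, 200).reshape(-1, 1)
print("max abs error:", np.abs(surrogate.predict(x_test) - f_high(x_test).ravel()).max())
```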

Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective

  • paper_url: http://arxiv.org/abs/2310.03556
  • repo_url: None
  • paper_authors: Kutay Bölat, Simon H. Tindemans, Peter Palensky
  • for: 提出一种能够适应不同密度区域的概率密度估计模型,以增强数据驱动方法在电力系统运行与规划建模中的能力。
  • methods: 采用每个核具有独立带宽的自适应核密度估计(KDE),并用留一最大对数似然(LOO-MLL)准则代替常规 MLL 准则以避免奇异解,同时为各核引入可学习权重,并用改进的期望最大化(EM)算法加速优化。
  • results: 在两个电力系统数据集上用多种统计检验与高斯混合模型进行比较,结果表明所提模型性能良好,并且具有避免奇异解的理论保证。
    Abstract Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data lacks this, it is desired to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and it is proven that LOO-MLL prevents these. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably. The performance of the proposed method and models are exhibited on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
    摘要 电力系统运行与规划过程的概率建模依赖数据驱动方法,而后者需要足够大的数据集。当历史数据不足时,人们希望把底层的数据生成机制建模为一个概率分布,以评估数据质量,并在需要时生成更多数据。基于核密度估计(KDE)的模型是这类任务的常用选择,但它们难以适应密度随区域变化的数据。本文采用一种自适应 KDE 模型来克服这一问题,其中每个核都有自己的带宽。我们提出用留一最大对数似然(LOO-MLL)准则来避免常规 MLL 准则所导致的奇异解,并证明 LOO-MLL 确实能避免这些奇异解。在这一鲁棒性保证的基础上,我们进一步为各核引入可学习的权重,并采用一种改进的期望最大化(EM)算法以可靠地加快优化速度。我们在两个电力系统数据集上,用不同的统计检验并与高斯混合模型比较,展示了所提方法与模型的性能:除了具有避免奇异解的保证之外,所提模型还表现出良好的效果。
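The leave-one-out objective can be written down directly: for a Gaussian KDE whose kernels carry individual bandwidths, the LOO log-likelihood of point $x_i$ sums over all kernels except the one centred at $x_i$, which removes the degenerate zero-bandwidth maximizer of the ordinary likelihood. The sketch below evaluates that objective and adjusts per-kernel bandwidths by a crude finite-difference ascent; the learnable kernel weights and the modified EM algorithm of the paper are not reproduced, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.3, 40), rng.normal(5.0, 1.5, 40)])  # two clusters of different spread
n = x.size

def loo_mll(log_h):
    """Leave-one-out log-likelihood of a Gaussian KDE with one bandwidth h_j per kernel."""
    h = np.exp(log_h)
    diff = x[:, None] - x[None, :]                               # diff[i, j] = x_i - x_j
    k = np.exp(-0.5 * (diff / h[None, :]) ** 2) / (np.sqrt(2 * np.pi) * h[None, :])
    np.fill_diagonal(k, 0.0)                                     # drop kernel j = i: leave-one-out
    return np.log(k.sum(axis=1) / (n - 1)).sum()

log_h = np.zeros(n)                                              # optimize log-bandwidths (keeps h > 0)
for step in range(100):
    base = loo_mll(log_h)
    grad = np.zeros(n)
    for j in range(n):                                           # finite-difference gradient, slow but simple
        e = np.zeros(n)
        e[j] = 1e-4
        grad[j] = (loo_mll(log_h + e) - base) / 1e-4
    log_h += 0.2 * grad / np.sqrt((grad ** 2).sum() + 1e-12)     # normalized ascent step

h = np.exp(log_h)
print("mean bandwidth, tight cluster:", h[:40].mean(), "| wide cluster:", h[40:].mean())
```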

Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models

  • paper_url: http://arxiv.org/abs/2310.03546
  • repo_url: https://github.com/marien-renaud/pnp_ula_posterior_law_sensivity
  • paper_authors: Marien Renaud, Jiaming Liu, Valentin de Bortoli, Andrés Almansa, Ulugbek S. Kamilov
  • for: 本研究探讨了 posterior sampling 在媒体 inverse problems 中的应用,以及 recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) 的应用在 Monte Carlo 样本和最小平均方差 (MMSE) 估计中。
  • methods: 本研究使用了 posterior-L2 pseudometric,以量化 PnP-ULA 下的样本分布匹配度和数据准确性。
  • results: 数值 validate 结果表明,PnP-ULA 的样本分布对于不符合的测量模型和滤波器有明确的敏感性。
    Abstract Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.
    摘要 后验采样已被证明是求解成像反问题的一种强有力的贝叶斯方法。近期提出的即插即用未调整朗之万算法(PnP-ULA)把物理观测模型与用图像去噪器指定的深度学习先验相结合,成为蒙特卡洛采样和最小均方误差(MMSE)估计的一种有前景的方法。然而,PnP-ULA 的采样分布与失配的数据保真项及去噪器之间的复杂关系尚未得到理论分析。我们通过提出一种后验 L2 伪度量来填补这一空白,并用它给出后验分布失配情形下 PnP-ULA 的显式误差界。我们在从高斯混合模型采样和图像去模糊等多个反问题上对理论进行了数值验证。结果表明,PnP-ULA 采样分布对观测模型与去噪器失配的敏感性可以被精确刻画。
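To make the object under study concrete, here is a minimal PnP-ULA iteration for a denoising-type inverse problem y = x + noise: the update combines the data-fidelity gradient, a prior step built from a plug-in denoiser, and Langevin noise. The Gaussian-blur "denoiser" below is a deliberately mismatched stand-in (the kind of mismatch the paper quantifies), and the step sizes follow common practice rather than the paper's exact setup.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(0)

# Ground-truth piecewise-constant signal, observed with Gaussian noise: y = x + n.
x_true = np.repeat([0.0, 2.0, -1.0, 1.0], 64)
sigma_obs = 0.5
y = x_true + sigma_obs * rng.normal(size=x_true.size)

def denoiser(x):
    """Plug-and-play prior: any off-the-shelf denoiser; a Gaussian blur is used as a stand-in."""
    return gaussian_filter1d(x, sigma=2.0)

def grad_loglik(x):
    return (y - x) / sigma_obs ** 2          # gradient of log N(y | x, sigma_obs^2 I)

delta = 0.1                                  # step size
eps = 0.5                                    # strength of the denoiser-based prior term
x = y.copy()
samples = []
for k in range(5000):
    x = (x + delta * grad_loglik(x)
           + (delta / eps) * (denoiser(x) - x)
           + np.sqrt(2.0 * delta) * rng.normal(size=x.size))
    if k > 1000:                             # discard burn-in
        samples.append(x.copy())

posterior_mean = np.mean(samples, axis=0)    # MMSE estimate from the chain
print("RMSE of posterior mean:", np.sqrt(np.mean((posterior_mean - x_true) ** 2)))
```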

Distribution-free risk assessment of regression-based machine learning algorithms

  • paper_url: http://arxiv.org/abs/2310.03545
  • repo_url: None
  • paper_authors: Sukrita Singh, Neeraj Sarna, Yuanyuan Li, Yang Li, Agni Orfanoudaki, Michael Berger
  • for: 该论文旨在解决机器学习模型在实际应用中的风险评估问题,特别是在医学和工程等高风险应用中。
  • methods: 该论文使用了 конформаль预测方法来解决风险评估问题,提供了一种保证报告的预测范围内包含真实标签的预测方法。
  • results: 该论文通过对具有不同模型化情况、数据集大小和 конформаль预测方法的实验,证明了其方法的准确性和可靠性。
    Abstract Machine learning algorithms have grown in sophistication over the years and are increasingly deployed for real-life applications. However, when using machine learning techniques in practical settings, particularly in high-risk applications such as medicine and engineering, obtaining the failure probability of the predictive model is critical. We refer to this problem as the risk-assessment task. We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction. We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability. Using this coverage property, we prove that our approximated failure probability is conservative in the sense that it is not lower than the true failure probability of the ML algorithm. We conduct extensive experiments to empirically study the accuracy of the proposed method for problems with and without covariate shift. Our analysis focuses on different modeling regimes, dataset sizes, and conformal prediction methodologies.
    摘要 机器学习算法近年来日益复杂,并被越来越多地部署于实际应用。然而,在实际场景(尤其是医疗和工程等高风险应用)中使用机器学习技术时,获得预测模型的失效概率至关重要,我们把这一问题称为风险评估任务。本文关注回归算法,研究如下风险评估任务:计算真实标签落在以模型预测值为中心的某一区间内的概率。我们采用保形预测(conformal prediction)方法来解决该问题,它给出的预测区间能以给定概率包含真实标签。利用这一覆盖性质,我们证明所得的近似失效概率是保守的,即不低于机器学习算法的真实失效概率。我们通过大量实验,对有无协变量偏移的情形实证考察了所提方法的准确性,分析涵盖不同的建模设定、数据集规模和保形预测方法。
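The coverage guarantee behind the abstract comes from split conformal prediction: calibrate the interval half-width on held-out residuals so that the interval contains the true label with probability at least 1-alpha. The sketch below shows that basic recipe with a generic regressor; it illustrates the coverage property only, not the paper's specific risk-assessment procedure, and the data, model, and alpha are our own choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 3000
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.3 * rng.normal(size=n)

# Split into proper training, calibration, and test sets.
X_tr, y_tr = X[:1500], y[:1500]
X_cal, y_cal = X[1500:2250], y[1500:2250]
X_te, y_te = X[2250:], y[2250:]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Calibrate: (1 - alpha) empirical quantile of absolute residuals, with finite-sample correction.
alpha = 0.1
resid = np.abs(y_cal - model.predict(X_cal))
k = int(np.ceil((1 - alpha) * (len(resid) + 1))) - 1
q = np.sort(resid)[min(k, len(resid) - 1)]

# Prediction intervals [f(x) - q, f(x) + q] contain y with probability >= 1 - alpha.
pred = model.predict(X_te)
covered = np.abs(y_te - pred) <= q
print("empirical coverage:", covered.mean(), "| interval half-width:", q)
```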

Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks

  • paper_url: http://arxiv.org/abs/2310.03530
  • repo_url: None
  • paper_authors: Sho Sonoda, Hideyuki Ishi, Isao Ishikawa, Masahiro Ikeda
  • for: 研究内存网络中数据对称和几何的编码方式
  • methods: 使用 JOINT GROUP INVARIANT FUNCTION 系统地找到数据域中的 dual group action
  • results: 提出了一种新的 GROUP THEORETIC PROOF 用于证明普遍性定理,连接了几何深度学习与抽象的幂分析Note: “ JOINT GROUP INVARIANT FUNCTION” and “GROUP THEORETIC PROOF” are in English, as there is no direct Simplified Chinese translation for these terms.
    Abstract The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
    摘要 输入数据的对称性与几何结构被认为编码在神经网络内部的数据表示之中,但具体的编码规则此前研究较少。通过关注数据—参数域上的联合群不变函数,我们给出一条系统性的规则,用以从数据域上的群作用找到参数域上的对偶群作用。进一步,我们引入由联合不变函数导出的广义神经网络,并利用 Schur 引理给出其万有逼近定理的一个新的群论证明。由于传统的万有逼近定理是基于泛函分析方法证明的,本研究揭示了逼近理论的群论侧面,把几何深度学习与抽象调和分析联系起来。

Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks

  • paper_url: http://arxiv.org/abs/2310.03529
  • repo_url: None
  • paper_authors: Sho Sonoda, Yuka Hashimoto, Isao Ishikawa, Masahiro Ikeda
  • for: 这篇论文是研究深度神经网络(DNN)中隐藏层的发现和分析的。
  • methods: 作者使用了群作用理论来表述DNN的结构,并通过使用Schur的 lemma进行了一种简单的证明,证明了DNN的 universality。
  • results: 研究表明,DNN可以被视为一种 dual voice transform,与 Koopman 运算器相关的 linear 表示。这种表示可以捕捉到DNN中隐藏的层结构,并且可以用于分析DNN的行为。
    Abstract We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of those DNNs.
    摘要 我们把深度神经网络(DNN)内部的隐藏层与数据空间上的群作用对应起来,并把 DNN 表述为关于 Koopman 算子(群作用的一种线性表示)的对偶语音变换(dual voice transform)。基于群论的论证,特别是利用 Schur 引理,我们给出了这类 DNN 万有逼近性的一个简洁证明。

High-dimensional Bayesian Optimization with Group Testing

  • paper_url: http://arxiv.org/abs/2310.03515
  • repo_url: https://github.com/gtboauthors/gtbo
  • paper_authors: Erik Orm Hellsten, Carl Hvarfner, Leonard Papenmeier, Luigi Nardi
  • for: 优化高维黑盒函数,尤其是在高维问题中,因为目标函数模型受到维度的咒语,准确模型具有困难。
  • methods: 提出组测试贝叶斯优化(GTBO)方法:先系统地选取并测试变量组,判断其是否影响目标函数,为此把经典的组测试理论推广到取值连续的函数;随后在优化阶段更加侧重被判定为活跃的维度。
  • results: 借助坐标轴对齐子空间假设,GTBO 在多个合成和真实世界的高维优化任务上与最先进方法相比具有竞争力,并且能帮助从业者发现活跃参数,从而加深对问题的理解。
    Abstract Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
    摘要 bayesian 优化是一种有效的优化昂贵黑盒函数的方法。高维问题特别困难,因为目标函数的模型受到维度之咒的影响,准确模型困难。我们提出了一种组测试方法,以便在这些领域中高效地优化。我们称之为组测试 bayesian 优化(GTBO)。在第一个测试阶段,GTBO首先运行一系列的组测试,选择并测试变量的组合,以确定影响目标函数的变量。然后,在第二个优化阶段,GTBO通过强调活跃维度来导航优化。通过利用轴对齐的子空间假设,GTBO与当前状态艺术方法竞争。此外,GTBO可以帮助发现应用中活跃参数,从而增强实践者对问题的理解。
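The first ("testing") phase can be imitated with a very crude screening loop: perturb a randomly chosen group of coordinates around a reference point, check whether the objective moves by more than the noise would explain, and accumulate per-variable evidence. The acceptance threshold and scoring rule below are ad-hoc stand-ins, not GTBO's statistically grounded group-testing procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, active = 30, [3, 11, 27]                        # ground truth: only 3 of 30 dimensions matter

def objective(x):
    return sum((x[i] - 0.5) ** 2 for i in active) + 0.01 * rng.normal()

x_ref = np.full(dim, 0.2)
f_ref = np.mean([objective(x_ref) for _ in range(5)])
scores = np.zeros(dim)

for test in range(300):
    group = rng.choice(dim, size=5, replace=False)   # test a random group of variables
    x = x_ref.copy()
    x[group] += rng.uniform(0.2, 0.4, size=5)        # perturb only this group
    changed = abs(objective(x) - f_ref) > 0.05       # did the group influence the objective?
    scores[group] += 1.0 if changed else -1.0        # accumulate evidence per variable

estimated_active = np.where(scores > 0)[0]
print("estimated active dimensions:", estimated_active)
```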

Otago Exercises Monitoring for Older Adults by a Single IMU and Hierarchical Machine Learning Models

  • paper_url: http://arxiv.org/abs/2310.03512
  • repo_url: None
  • paper_authors: Meng Shang, Lenore Dedeyne, Jolan Dupont, Laura Vercauteren, Nadjia Amini, Laurence Lapauw, Evelien Gielen, Sabine Verschueren, Carolina Varon, Walter De Raedt, Bart Vanrumste
  • for: 这个研究的目的是建立一个不侵入的和准确的系统,用于监测老年人参与奥塔哥体能计划 (OEP)。
  • methods: 这个研究使用了一个单个腰部安装的加速度计 (IMU) 收集数据,并使用了深度学习模型来识别老年人是否在进行 OEP 或者日常生活活动 (ADLs)。
  • results: 研究发现,使用 10 分钟滑动窗口可以在室内和家庭场景下分别达到 window-wise f1-scores 高于 0.95 和 Intersection-over-Union (IoU) f1-scores 高于 0.85。另外,使用 6 秒滑动窗口可以在家庭场景下识别四种 OEP subclass。结果表明,使用单个 IMU 可以准确地监测老年人参与 OEP 的情况,并且可以进行进一步的分析。
    Abstract Otago Exercise Program (OEP) is a rehabilitation program for older adults to improve frailty, sarcopenia, and balance. Accurate monitoring of patient involvement in OEP is challenging, as self-reports (diaries) are often unreliable. With the development of wearable sensors, Human Activity Recognition (HAR) systems using wearable sensors have revolutionized healthcare. However, their usage for OEP still shows limited performance. The objective of this study is to build an unobtrusive and accurate system to monitor OEP for older adults. Data was collected from older adults wearing a single waist-mounted Inertial Measurement Unit (IMU). Two datasets were collected, one in a laboratory setting, and one at the homes of the patients. A hierarchical system is proposed with two stages: 1) using a deep learning model to recognize whether the patients are performing OEP or activities of daily life (ADLs) using a 10-minute sliding window; 2) based on stage 1, using a 6-second sliding window to recognize the OEP sub-classes performed. The results showed that in stage 1, OEP could be recognized with window-wise f1-scores over 0.95 and Intersection-over-Union (IoU) f1-scores over 0.85 for both datasets. In stage 2, for the home scenario, four activities could be recognized with f1-scores over 0.8: ankle plantarflexors, abdominal muscles, knee bends, and sit-to-stand. The results showed the potential of monitoring the compliance of OEP using a single IMU in daily life. Also, some OEP sub-classes are possible to be recognized for further analysis.
    摘要 奥塔哥运动计划(OEP)是一项面向老年人的康复计划,旨在改善衰弱、肌少症和平衡能力。准确监测患者对 OEP 的参与情况颇具挑战,因为自我报告(日记)往往不可靠。随着可穿戴传感器的发展,基于可穿戴传感器的人体活动识别(HAR)系统给医疗健康领域带来了变革,但其用于 OEP 监测的效果仍然有限。本研究的目标是构建一个无侵扰且准确的系统来监测老年人的 OEP。数据来自佩戴单个腰部惯性测量单元(IMU)的老年人,共采集两个数据集:一个在实验室环境中,另一个在患者家中。我们提出一个两阶段的层级系统:1)用深度学习模型在 10 分钟滑动窗口上识别患者是在进行 OEP 还是日常生活活动(ADL);2)在第一阶段的基础上,用 6 秒滑动窗口识别所进行的 OEP 子类。结果表明:在第一阶段,两个数据集上 OEP 的窗口级 f1 分数均超过 0.95,IoU f1 分数均超过 0.85;在第二阶段的居家场景中,踝关节跖屈、腹肌、屈膝和坐立转换四种动作的 f1 分数均超过 0.8。结果显示了在日常生活中用单个 IMU 监测 OEP 依从性的潜力,并且部分 OEP 子类可以被识别以供进一步分析。
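The two-resolution, two-stage structure described above can be expressed as a small pipeline: long (10-minute) windows feed a binary OEP-vs-ADL classifier, and 6-second windows inside OEP segments feed a sub-class classifier. The sketch below only shows the windowing and the hierarchical dispatch, with generic classifiers on coarse features and random stand-in labels; it does not reproduce the paper's deep models or IMU feature extraction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def sliding_windows(signal, win, step):
    """Return an array of shape (n_windows, win, channels) from an IMU stream."""
    starts = range(0, len(signal) - win + 1, step)
    return np.stack([signal[s:s + win] for s in starts])

def window_features(w):
    """Very coarse features per window: mean and standard deviation of each channel."""
    return np.concatenate([w.mean(axis=0), w.std(axis=0)])

fs = 50                                            # 50 Hz IMU
stream = np.random.randn(fs * 60 * 30, 6)          # 30 minutes of fake 6-axis IMU data

# Stage 1: 10-minute windows -> OEP vs. activities of daily living (toy labels).
long_w = sliding_windows(stream, win=fs * 600, step=fs * 60)
X1 = np.array([window_features(w) for w in long_w])
stage1 = RandomForestClassifier().fit(X1, np.random.randint(0, 2, len(X1)))

# Stage 2: within OEP windows, 6-second windows -> one of four OEP sub-classes (toy labels).
sub0 = sliding_windows(long_w[0], win=fs * 6, step=fs * 3)
X2 = np.array([window_features(w) for w in sub0])
stage2 = RandomForestClassifier().fit(X2, np.random.randint(0, 4, len(X2)))

is_oep = stage1.predict(X1)                        # stage-1 decision per 10-minute window
for i in np.where(is_oep == 1)[0][:1]:             # for an OEP window, classify its 6-second sub-windows
    sub = sliding_windows(long_w[i], win=fs * 6, step=fs * 3)
    sub_feats = np.array([window_features(w) for w in sub])
    print("sub-class predictions:", stage2.predict(sub_feats)[:10])
```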

Deep Generative Models of Music Expectation

  • paper_url: http://arxiv.org/abs/2310.03500
  • repo_url: None
  • paper_authors: Ninon Lizé Masclef, T. Anderson Keller
  • for: 这个研究的目的是开发现代深度生成模型来计算音乐的惊喜度和预期。
  • methods: 这个研究使用了扩散模型,它是一种基于深度神经网络的概率生成模型,可以直接从训练集中学习复杂的非线性特征。
  • results: 研究发现,使用预训练的扩散模型可以计算出音乐惊喜度值,并且这些值与人类听众的喜好度评分之间呈现负二次关系。
    Abstract A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have been limited to compute exact probabilities through hand-crafted features or restricted to linear models which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the 'surprisal' of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. In this work we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state of the art methods such as IDyOM. We therefore present this model a preliminary step in developing modern deep generative models of music expectation and subjective likability.
    摘要 一种关于音乐情感反应的主流理论围绕意外性与期望展开。在先前的工作中,这一想法被实现为音乐的概率模型,可以基于由先前音乐或文化经验构成的"训练集",精确计算歌曲(或逐音符)的概率。然而,迄今为止,这些模型要么依赖手工设计的特征来计算精确概率,要么局限于线性模型,难以表达音乐中复杂的条件分布。在这项工作中,我们提议使用现代深度概率生成模型,即扩散模型,来计算音乐输入序列的近似似然。与先前工作不同,这种由深度神经网络参数化的生成模型可以直接从训练集中学习复杂的非线性特征。我们预计,这类模型能够更准确地表达音乐对人类听众的意外性。文献表明,意外性与听众对一首歌曲的喜好程度之间存在倒U型关系。在这项工作中,我们证明了预训练扩散模型给出的音乐意外性值与实测的听众喜好评分之间呈现负二次关系,且该关系的质量可与IDyOM等最先进方法相媲美。因此,我们将该模型作为发展现代深度生成音乐期望与主观喜好模型的初步工作。
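
A small worked example of the reported inverted-U (negative quadratic) relationship between surprisal and liking: fitting a quadratic to synthetic surprisal scores and ratings recovers a negative leading coefficient. The data here are simulated, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-song surprisal scores (e.g., negative log-likelihoods from a
# pretrained generative model) and listener liking ratings following an inverted U.
surprisal = rng.uniform(0.0, 10.0, size=200)
liking = -0.4 * (surprisal - 5.0) ** 2 + 8.0 + rng.normal(0.0, 1.0, size=200)

# Fit liking = a * s^2 + b * s + c; a negative quadratic coefficient reproduces
# the inverted-U relationship reported in the paper.
a, b, c = np.polyfit(surprisal, liking, deg=2)
print(f"quadratic coefficient a = {a:.3f} (expected to be negative)")
```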

TPDR: A Novel Two-Step Transformer-based Product and Class Description Match and Retrieval Method

  • paper_url: http://arxiv.org/abs/2310.03491
  • repo_url: None
  • paper_authors: Washington Cunha, Celso França, Leonardo Rocha, Marcos André Gonçalves
  • for: 该论文是为了解决企业间产品描述标准化问题,即将客户提供的产品描述与产品目录中的描述匹配。
  • methods: 该论文提出了一种基于Transformer的两步产品和类别描述检索方法(TPDR),利用注意力机制和对比学习来探索 semantic correspondence between IS和SD。
  • results: 该论文在11个真实公司的应用上实现了71%的正确检索和80%的正确分类,并且与纯粹的语法或semantic基线比较而言,效果提高达3.7倍。
    Abstract There is a niche of companies responsible for intermediating the purchase of large batches of varied products for other companies, for which the main challenge is to perform product description standardization, i.e., matching an item described by a client with a product described in a catalog. The problem is complex since the client's product description may be: (1) potentially noisy; (2) short and uninformative (e.g., missing information about model and size); and (3) cross-language. In this paper, we formalize this problem as a ranking task: given an initial client product specification (query), return the most appropriate standardized descriptions (response). In this paper, we propose TPDR, a two-step Transformer-based Product and Class Description Retrieval method that is able to explore the semantic correspondence between IS and SD, by exploiting attention mechanisms and contrastive learning. First, TPDR employs the transformers as two encoders sharing the embedding vector space: one for encoding the IS and another for the SD, in which corresponding pairs (IS, SD) must be close in the vector space. Closeness is further enforced by a contrastive learning mechanism leveraging a specialized loss function. TPDR also exploits a (second) re-ranking step based on syntactic features that are very important for the exact matching (model, dimension) of certain products that may have been neglected by the transformers. To evaluate our proposal, we consider 11 datasets from a real company, covering different application contexts. Our solution was able to retrieve the correct standardized product before the 5th ranking position in 71% of the cases and its correct category in the first position in 80% of the situations. Moreover, the effectiveness gains over purely syntactic or semantic baselines reach up to 3.7 times, solving cases that none of the approaches in isolation can do by themselves.
    摘要 有一类公司专门为其他公司代理采购大批量、种类繁多的产品,其主要挑战在于产品描述的标准化,即将客户描述的商品与产品目录中描述的产品进行匹配。这个问题很复杂,因为客户的产品描述可能:(1)含有噪声;(2)简短且信息不足(例如缺少型号和尺寸信息);(3)跨语言。在这篇论文中,我们将该问题形式化为一个排序任务:给定客户的初始产品规格(查询),返回最合适的标准化描述(响应)。我们提出了TPDR,一种基于Transformer的两步产品与类别描述检索方法,它利用注意力机制和对比学习来挖掘IS与SD之间的语义对应关系。首先,TPDR使用两个共享嵌入向量空间的Transformer编码器,一个用于编码IS(初始规格),另一个用于编码SD(标准描述),要求对应的(IS,SD)对在向量空间中彼此接近;这种接近性进一步通过使用特殊损失函数的对比学习机制来强化。TPDR还利用(第二步)基于句法特征的重排序,这些特征对于某些产品(型号、尺寸)的精确匹配非常重要,而这可能被Transformer忽略。为评估我们的方案,我们使用了来自一家真实公司的11个数据集,覆盖不同的应用场景。我们的方案在71%的情况下能够在前5名内检索到正确的标准化产品,并在80%的情况下将正确的类别排在第一位。此外,与纯句法或纯语义基线相比,效果提升最高可达3.7倍,解决了任何单一方法都无法独立解决的案例。
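
A compact sketch of the two ideas in the abstract: a dual-encoder contrastive (InfoNCE-style) loss that pulls matching (IS, SD) pairs together in a shared embedding space, followed by a syntactic re-ranking step. The toy token-overlap re-ranker and the random embeddings stand in for the paper's Transformer encoders and specialized loss.

```python
import torch
import torch.nn.functional as F

def info_nce(query_emb, doc_emb, temperature=0.07):
    """Contrastive loss pulling matching (IS, SD) pairs together in the shared space."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(q.size(0))         # i-th query matches i-th description
    return F.cross_entropy(logits, targets)

def rerank(query, candidates, semantic_scores, alpha=0.5):
    """Second step: blend semantic scores with exact-token overlap (model/dimension terms)."""
    q_tokens = set(query.lower().split())
    out = []
    for cand, s in zip(candidates, semantic_scores):
        overlap = len(q_tokens & set(cand.lower().split())) / max(len(q_tokens), 1)
        out.append((cand, alpha * s + (1 - alpha) * overlap))
    return sorted(out, key=lambda t: t[1], reverse=True)

# Toy usage with random "embeddings" standing in for the two Transformer encoders.
loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
print(rerank("drill 500w model x200",
             ["electric drill x200 500w", "hammer 2kg"],
             semantic_scores=[0.9, 0.3]))
```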

The Geometric Structure of Fully-Connected ReLU-Layers

  • paper_url: http://arxiv.org/abs/2310.03482
  • repo_url: None
  • paper_authors: Jonatan Vallin, Karl Larsson, Mats G. Larson
  • for: 这篇论文主要针对$d$-维充满ReLU层在神经网络中的几何结构进行了正式化和解释。
  • methods: 这篇论文使用了ReLU层的参数来自然地将输入空间分成多个部分,并在每个部分中简化ReLU层,从而导致了ReLU层的几何解释。
  • results: 这篇论文提出了一种简化表达折射面和折射平面的方法,并证明了这种结构可以在分类设置下描述决策边界。此外,论文还研究了一个具有一个隐藏层的普通feedforward网络的决策边界的几何复杂性,以及证明了这种网络只能生成$d$个不同的决策边界。最后,论文还讨论了增加更多层的影响。
    Abstract We formalize and interpret the geometric structure of $d$-dimensional fully connected ReLU-layers in neural networks. The parameters of a ReLU-layer induce a natural partition of the input domain, such that in each sector of the partition, the ReLU-layer can be greatly simplified. This leads to a geometric interpretation of a ReLU-layer as a projection onto a polyhedral cone followed by an affine transformation, in line with the description in [doi:10.48550/arXiv.1905.08922] for convolutional networks with ReLU activations. Further, this structure facilitates simplified expressions for preimages of the intersection between partition sectors and hyperplanes, which is useful when describing decision boundaries in a classification setting. We investigate this in detail for a feed-forward network with one hidden ReLU-layer, where we provide results on the geometric complexity of the decision boundary generated by such networks, as well as proving that modulo an affine transformation, such a network can only generate $d$ different decision boundaries. Finally, the effect of adding more layers to the network is discussed.
    摘要 我们形式化并解释了神经网络中$d$维全连接ReLU层的几何结构。ReLU层的参数会在输入域上诱导出一个自然的分割,在分割的每个扇区内,ReLU层可以被大大简化。这给出了ReLU层的几何解释:先投影到一个多面锥上,再施加一个仿射变换,这与[doi:10.48550/arXiv.1905.08922]中对带ReLU激活的卷积网络的描述一致。此外,这种结构简化了分割扇区与超平面交集的原像表达式,这在分类设定中描述决策边界时非常有用。我们针对带有一个隐藏ReLU层的前馈网络进行了详细研究,给出了此类网络生成的决策边界的几何复杂度结果,并证明在相差一个仿射变换的意义下,此类网络只能生成$d$种不同的决策边界。最后,我们讨论了增加更多层的影响。
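
A short sketch of the geometric fact the paper formalizes: within the sector determined by an input's activation pattern, a fully connected ReLU layer coincides with an affine map obtained by masking the inactive rows. The weights here are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out = 4, 6
W, b = rng.normal(size=(d_out, d_in)), rng.normal(size=d_out)

def relu_layer(x):
    return np.maximum(W @ x + b, 0.0)

def local_affine(x):
    """On the sector (activation pattern) containing x, the layer equals D W x + D b,
    where D masks the inactive rows -- the affine map in the decomposition above."""
    pattern = (W @ x + b > 0).astype(float)
    D = np.diag(pattern)
    return D @ W, D @ b, pattern

x = rng.normal(size=d_in)
A, c, pattern = local_affine(x)
assert np.allclose(relu_layer(x), A @ x + c)          # exact on this sector
x_near = x + 1e-6 * rng.normal(size=d_in)             # same sector with overwhelming probability
assert np.allclose(relu_layer(x_near), A @ x_near + c)
print("activation pattern:", pattern)
```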

The Cadenza ICASSP 2024 Grand Challenge

  • paper_url: http://arxiv.org/abs/2310.03480
  • repo_url: None
  • paper_authors: Gerardo Roa Dabike, Michael A. Akeroyd, Scott Bannister, Jon Barker, Trevor J. Cox, Bruno Fazenda, Jennifer Firth, Simone Graetzer, Alinka Greasley, Rebecca Vos, William Whitmer
  • for: 该论文旨在推动听力障碍人群音频质量提高,通过音乐分解和个性化重新混音来提高听力障碍人群对音乐的听众体验。
  • methods: 该论文提出了一种基于ICASSP SP Cadenza Challenge的音乐分解和重新混音方法,通过分解音乐为 vocals、bass、drums 等组成部分,并通过个性化的混音方法来提高音频质量。
  • results: 该论文通过使用HAAQI指标进行评估,发现该方法可以提高听力障碍人群对音乐的听众体验。
    Abstract The Cadenza project aims to enhance the audio quality of music for individuals with hearing loss. As part of this, the project is organizing the ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The challenge can be tackled by decomposing the music at the hearing aid microphones into vocals, bass, drums, and other components. These can then be intelligently remixed in a personalized manner to improve audio quality. Alternatively, an end-to-end approach could be used. Processes need to consider the music itself, the gain applied to each component, and the listener's hearing loss. The submitted entries will be evaluated using the intrusive objective metric, the Hearing Aid Audio Quality Index (HAAQI). This paper outlines the challenge.
    摘要 Cadenza 项目旨在为听力损失人群提升音乐的音频质量。作为其中的一部分,该项目正在组织 ICASSP SP Cadenza Challenge:面向助听器的音乐分离/重混音挑战。该挑战可以通过把助听器麦克风处的音乐分解为人声、贝斯、鼓和其他成分来解决,然后以个性化的方式智能地重新混音以提升音频质量;也可以采用端到端的方法。处理过程需要考虑音乐本身、施加于各成分的增益以及听者的听力损失情况。提交的作品将使用侵入式客观指标——助听器音频质量指数(HAAQI)进行评估。本文介绍了这一挑战。

The Blame Problem in Evaluating Local Explanations, and How to Tackle it

  • paper_url: http://arxiv.org/abs/2310.03466
  • repo_url: None
  • paper_authors: Amir Hossein Akhavan Rahnama
  • for: 本研究提出了一个新的本地解释评价分类,以解决当前本地解释技术的评价问题。
  • methods: 本研究梳理了多类评价方法,包括鲁棒性评价、基于合成数据集和可解释模型的真实标签评价、模型随机化评价以及基于人类的评价。
  • results: 研究发现,除了基于可解释模型真实标签的评价外,其他各类评价方法均受到一种被称为"责任问题"的影响。此外,即使是这类评价方法,也仍然存在进一步的局限性。
    Abstract The number of local model-agnostic explanation techniques proposed has grown rapidly recently. One main reason is that the bar for developing new explainability techniques is low due to the lack of optimal evaluation measures. Without rigorous measures, it is hard to have concrete evidence of whether the new explanation techniques can significantly outperform their predecessors. Our study proposes a new taxonomy for evaluating local explanations: robustness, evaluation using ground truth from synthetic datasets and interpretable models, model randomization, and human-grounded evaluation. Using this proposed taxonomy, we highlight that all categories of evaluation methods, except those based on the ground truth from interpretable models, suffer from a problem we call the "blame problem." In our study, we argue that this category of evaluation measure is a more reasonable method for evaluating local model-agnostic explanations. However, we show that even this category of evaluation measures has further limitations. The evaluation of local explanations remains an open research problem.
    摘要 近来,提出的与模型无关的局部解释技术数量迅速增长。一个主要原因是,由于缺乏理想的评价指标,开发新解释技术的门槛很低。没有严格的指标,就很难有确凿的证据表明新的解释技术能够显著超越其前辈。我们的研究提出了一个新的局部解释评价分类法:鲁棒性、基于合成数据集和可解释模型的真实标签评价、模型随机化评价,以及基于人类的评价。使用这一分类法,我们指出除基于可解释模型真实标签的方法外,其余所有类别的评价方法都受到一种我们称为"责任问题"的影响。在我们的研究中,我们认为这一类评价指标是评价与模型无关的局部解释的更合理方法,但我们也表明即使这一类评价指标也存在进一步的局限。局部解释的评价仍然是一个开放的研究问题。

Which mode is better for federated learning? Centralized or Decentralized

  • paper_url: http://arxiv.org/abs/2310.03461
  • repo_url: None
  • paper_authors: Yan Sun, Li Shen, Dacheng Tao
  • for: 这个论文主要研究了 federated learning(FL)中中心化和分布式方法的比较,以及它们在不同场景下的表现。
  • methods: 这个论文使用了 optimization 和 generalization 两个方面进行了 joint 分析,并提出了一些新的结论和建议。
  • results: 研究发现,在光滑非凸目标函数上,中心化联邦学习(CFL)总是比去中心化联邦学习(DFL)具有更好的泛化性;在 CFL 中,采用部分参与优于全部参与;而在 DFL 中,需要对网络拓扑提出必要的要求,以避免随训练规模增大而出现的性能崩溃。
    Abstract Both centralized and decentralized approaches have shown excellent performance and great application value in federated learning (FL). However, current studies do not provide sufficient evidence to show which one performs better. Although from the optimization perspective, decentralized methods can approach the comparable convergence of centralized methods with less communication, its test performance has always been inefficient in empirical studies. To comprehensively explore their behaviors in FL, we study their excess risks, including the joint analysis of both optimization and generalization. We prove that on smooth non-convex objectives, 1) centralized FL (CFL) always generalizes better than decentralized FL (DFL); 2) from perspectives of the excess risk and test error in CFL, adopting partial participation is superior to full participation; and, 3) there is a necessary requirement for the topology in DFL to avoid performance collapse as the training scale increases. Based on some simple hardware metrics, we could evaluate which framework is better in practice. Extensive experiments are conducted on common setups in FL to validate that our theoretical analysis is contextually valid in practical scenarios.
    摘要 中心化和去中心化方法都在联邦学习(FL)中表现出色,并具有巨大的应用价值,但现有研究尚未提供足够的证据来说明哪一种表现更好。尽管从优化角度来看,去中心化方法可以用更少的通信量达到与中心化方法相当的收敛性,但其测试性能在实证研究中一直不理想。为全面探讨二者在FL中的行为,我们研究了它们的过剩风险,对优化与泛化进行了联合分析。我们证明了以下结论:1)在光滑非凸目标函数上,中心化联邦学习(CFL)的泛化性总是优于去中心化联邦学习(DFL);2)从CFL的过剩风险和测试误差的角度看,采用部分参与优于全部参与;3)在DFL中,为避免随训练规模增大而出现的性能崩溃,需要对网络拓扑提出必要的要求。基于一些简单的硬件指标,我们可以评估哪种框架在实践中更好。我们在FL的常见设置下进行了广泛的实验,验证了我们的理论分析在实际场景中是成立的。

FLAIM: AIM-based Synthetic Data Generation in the Federated Setting

  • paper_url: http://arxiv.org/abs/2310.03447
  • repo_url: https://github.com/Samuel-Maddock/flaim
  • paper_authors: Samuel Maddock, Graham Cormode, Carsten Maple
  • for: 防止个人隐私泄露,启用合作数据分享是组织所需。
  • methods: 使用合成数据生成(Synthetic Data Generation)技术生成人工数据,同时保留私有数据的统计性质。
  • results: 提出了基于差分隐私的联邦合成表格数据生成方法 DistAIM 与 FLAIM,可以降低开销,并在不同程度的异质性下提升数据效用。
    Abstract Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a SOTA central method known as AIM, we present DistAIM and FLAIM. We show it is straightforward to distribute AIM, extending a recent approach based on secure multi-party computation which necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show this can improve utility while reducing overhead.
    摘要 在保护个人隐私的同时实现协作数据共享,对组织来说至关重要。合成数据生成是一种解决方案,它产生能够反映私有数据统计性质的人工数据。虽然在差分隐私框架下已经提出了许多技术,但它们大多假设数据是集中存放的;而实际中数据往往以联邦方式分布在多个客户端上。在这项工作中,我们开启了联邦合成表格数据生成的研究。基于最先进的集中式方法 AIM,我们提出了 DistAIM 和 FLAIM。我们证明了分布式地实现 AIM 较为直接,它扩展了一种基于安全多方计算的近期方法,但需要额外开销,因而不太适合联邦场景。我们进一步证明,简单地联邦化 AIM 在存在异质性时会导致效用大幅下降。为同时缓解这两个问题,我们提出了增强的 FLAIM 方法,它维护一个关于异质性的隐私代理量。我们在一系列基准数据集上、不同程度的异质性下对所提方法进行了模拟,结果表明该方法能够在降低开销的同时提升效用。

Variational Inference for GARCH-family Models

  • paper_url: http://arxiv.org/abs/2310.03435
  • repo_url: None
  • paper_authors: Martin Magris, Alexandros Iosifidis
  • for: 这篇论文是为了检验Variational Inference是否能够成为GARCH-like模型的 bayesian估计的可靠和可行的替代方法。
  • methods: 这篇论文使用了多种Variational Inference优化器、多种波动模型和一个案例研究来证明Variational Inference是一种可靠、非常准确和竞争力强的bayesian学习方法。
  • results: 经过大规模的实验,这篇论文显示了Variational Inference在S&P 500指数的组成部分上的性能非常出色,并且与蒙特卡洛样本的 bayesian估计相比,Variational Inference具有更高的准确性和更好的可靠性。
    Abstract The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.
    摘要 GARCH族模型的贝叶斯估计通常通过蒙特卡洛采样来实现。变分推断作为复杂机器学习模型中一种稳健的贝叶斯推断方法正日益受到关注,但其在计量经济学和金融领域的应用仍然有限。本文讨论了变分推断在多大程度上可以作为蒙特卡洛采样的可靠且可行的替代方案,用于GARCH类模型的贝叶斯推断。通过针对S&P 500指数成分股的大规模实验、多种变分推断优化器、多种波动率模型以及一个案例研究,我们表明变分推断是一种有吸引力、校准良好且具有竞争力的贝叶斯学习方法。
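
For readers unfamiliar with the model family, a minimal sketch of the GARCH(1,1) volatility recursion and its Gaussian log-likelihood, i.e. the quantity that either Monte Carlo sampling or variational inference would target as (part of) the posterior; the parameter values and simulated returns are illustrative, not from the paper.

```python
import numpy as np

def garch11_loglik(returns, omega, alpha, beta):
    """Gaussian log-likelihood of GARCH(1,1): sigma2_t = omega + alpha*r_{t-1}^2 + beta*sigma2_{t-1}."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()  # common initialization choice (an assumption here)
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + returns ** 2 / sigma2)

rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=1000)          # stand-in for daily returns
print(garch11_loglik(r, omega=1e-6, alpha=0.05, beta=0.9))
```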

Over-the-Air Federated Learning with Compressed Sensing: Is Sparsification Necessary?

  • paper_url: http://arxiv.org/abs/2310.03410
  • repo_url: None
  • paper_authors: Adrian Edin, Zheng Chen
  • for: 这篇论文主要研究空中计算(Over-the-Air,OtA)联邦学习(Federated Learning,FL)系统,其中多个代理利用空中计算将模型更新传输到公共边缘服务器。
  • methods: 作者使用了线性处理和信号水平混合,以减少通道上传输的数据样本。他们还使用了压缩感知(Compressed Sensing,CS)方法来减少数据量。
  • results: 研究发现,不需要将原始模型更新向量先将其简化,而直接将非零元素发送即可以达到更好的性能,即使在同样的总功率限制下。此外,作者还发现,在某些情况下,不使用线性压缩并直接发送简化后的模型更新也可以达到更好的性能。
    Abstract Over-the-Air (OtA) Federated Learning (FL) refers to an FL system where multiple agents apply OtA computation for transmitting model updates to a common edge server. Two important features of OtA computation, namely linear processing and signal-level superposition, motivate the use of linear compression with compressed sensing (CS) methods to reduce the number of data samples transmitted over the channel. The previous works on applying CS methods in OtA FL have primarily assumed that the original model update vectors are sparse, or they have been sparsified before compression. However, it is unclear whether linear compression with CS-based reconstruction is more effective than directly sending the non-zero elements in the sparsified update vectors, under the same total power constraint. In this study, we examine and compare several communication designs with or without sparsification. Our findings demonstrate that sparsification before compression is not necessary. Alternatively, sparsification without linear compression can also achieve better performance than the commonly considered setup that combines both.
    摘要 空中计算(Over-the-Air,OtA)联邦学习(FL)指的是多个代理利用OtA计算将模型更新传输到公共边缘服务器的FL系统。OtA计算的两个重要特性,即线性处理和信号级叠加,促使人们使用线性压缩与压缩感知(CS)方法来减少通过信道传输的数据样本数量。先前在OtA FL中应用CS方法的工作大多假设原始模型更新向量是稀疏的,或者在压缩前已对其进行了稀疏化。然而,在相同的总功率约束下,基于CS重建的线性压缩是否比直接发送稀疏化更新向量中的非零元素更有效,目前尚不清楚。在这项研究中,我们考察并比较了若干包含或不包含稀疏化的通信设计。我们的结果表明,在压缩之前进行稀疏化并不是必要的;相反,只进行稀疏化而不进行线性压缩,也能取得比通常考虑的"稀疏化加压缩"组合更好的性能。
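
A minimal sketch of the top-k sparsification that the paper compares against linear CS compression: only the largest-magnitude entries of a local model update are kept before transmission. The update vector and the choice of k are placeholders.

```python
import numpy as np

def top_k_sparsify(update, k):
    """Zero out all but the k largest-magnitude entries of a model update vector."""
    idx = np.argpartition(np.abs(update), -k)[-k:]
    sparse = np.zeros_like(update)
    sparse[idx] = update[idx]
    return sparse, idx

rng = np.random.default_rng(0)
update = rng.normal(size=10_000)              # one agent's model update (placeholder)
sparse, kept = top_k_sparsify(update, k=500)  # only the (index, value) pairs in `kept` are sent
print(np.count_nonzero(sparse), np.linalg.norm(update - sparse) / np.linalg.norm(update))
```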

RUSOpt: Robotic UltraSound Probe Normalization with Bayesian Optimization for In-plane and Out-plane Scanning

  • paper_url: http://arxiv.org/abs/2310.03406
  • repo_url: None
  • paper_authors: Deepak Raina, Abhishek Mathur, Richard M. Voyles, Juan Wachs, SH Chandrashekhara, Subir Kumar Saha
  • for: 提高自主无人式超声系统中图像质量的问题
  • methods: 使用贝叶斯优化在扫描表面上进行采样高效的搜索,自动将超声探头的朝向调整至接触点的法线方向,从而改善声学耦合
  • results: 实验结果表明,所提方法在膀胱体模和3D人体网格模型上的平均(±SD)绝对角度误差分别为2.4±0.7度和2.1±1.3度,有助于在不同患者上获得高质量的超声图像。
    Abstract The one of the significant challenges faced by autonomous robotic ultrasound systems is acquiring high-quality images across different patients. The proper orientation of the robotized probe plays a crucial role in governing the quality of ultrasound images. To address this challenge, we propose a sample-efficient method to automatically adjust the orientation of the ultrasound probe normal to the point of contact on the scanning surface, thereby improving the acoustic coupling of the probe and resulting image quality. Our method utilizes Bayesian Optimization (BO) based search on the scanning surface to efficiently search for the normalized probe orientation. We formulate a novel objective function for BO that leverages the contact force measurements and underlying mechanics to identify the normal. We further incorporate a regularization scheme in BO to handle the noisy objective function. The performance of the proposed strategy has been assessed through experiments on urinary bladder phantoms. These phantoms included planar, tilted, and rough surfaces, and were examined using both linear and convex probes with varying search space limits. Further, simulation-based studies have been carried out using 3D human mesh models. The results demonstrate that the mean ($\pm$SD) absolute angular error averaged over all phantoms and 3D models is $\boldsymbol{2.4\pm0.7^\circ}$ and $\boldsymbol{2.1\pm1.3^\circ}$, respectively.
    摘要 自主机器人超声系统面临的一个重要挑战是在不同患者身上获得高质量图像。机器人化探头的正确朝向对超声图像质量起着关键作用。为了解决这一挑战,我们提出了一种采样高效的方法,自动将超声探头的朝向调整至扫描表面接触点的法线方向,从而改善探头的声学耦合并提升图像质量。我们利用基于贝叶斯优化(BO)的扫描面搜索来高效地寻找法向的探头朝向。我们为BO定义了一个新的目标函数,该函数利用接触力测量和底层力学来确定法向,并在BO中引入正则化方案以处理带噪声的目标函数。所提策略的性能通过在膀胱体模上的实验进行了评估,这些体模包括平面、倾斜和粗糙表面,并在不同的搜索空间限制下使用线阵和凸阵探头进行检验;此外还基于3D人体网格模型开展了仿真研究。结果表明,在所有体模和3D模型上,平均(±SD)绝对角度误差分别为 $\boldsymbol{2.4\pm0.7^\circ}$ 和 $\boldsymbol{2.1\pm1.3^\circ}$。

EAG-RS: A Novel Explainability-guided ROI-Selection Framework for ASD Diagnosis via Inter-regional Relation Learning

  • paper_url: http://arxiv.org/abs/2310.03404
  • repo_url: https://github.com/ku-milab/eag-rs
  • paper_authors: Wonsik Jung, Eunjin Jeon, Eunsong Kang, Heung-Il Suk
  • for: The paper aims to develop a novel explainability-guided region of interest (ROI) selection framework for brain disease identification using resting-state functional magnetic resonance imaging (rs-fMRI).
  • methods: The proposed framework includes three steps: inter-regional relation learning, explainable connection-wise relevance score estimation, and non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning. The framework leverages an explainable artificial intelligence technique to identify non-linear high-order functional associations among brain regions and select class-discriminative regions for brain disease identification.
  • results: The proposed method outperforms other comparative methods in terms of various evaluation metrics, and qualitative analysis of the selected ROIs identifies ASD subtypes linked to previous neuroscientific studies.
    Abstract Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information while using linear low-order FC as inputs to the model, not considering individual characteristics (i.e., different symptoms or varying stages of severity) among patients with ASD, and the non-explainability of the decision process. To cover these limitations, we propose a novel explainability-guided region of interest (ROI) selection (EAG-RS) framework that identifies non-linear high-order functional associations among brain regions by leveraging an explainable artificial intelligence technique and selects class-discriminative regions for brain disease identification. The proposed framework includes three steps: (i) inter-regional relation learning to estimate non-linear relations through random seed-based network masking, (ii) explainable connection-wise relevance score estimation to explore high-order relations between functional connections, and (iii) non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning to identify ASD. We validated the effectiveness of our proposed method by conducting experiments using the Autism Brain Imaging Database Exchange (ABIDE) dataset, demonstrating that the proposed method outperforms other comparative methods in terms of various evaluation metrics. Furthermore, we qualitatively analyzed the selected ROIs and identified ASD subtypes linked to previous neuroscientific studies.
    摘要 基于静息态功能磁共振成像(rs-fMRI)的深度学习模型已被广泛用于诊断脑部疾病,特别是自闭症谱系障碍(ASD)。现有研究利用rs-fMRI的功能连接(FC),取得了显著的分类性能。然而,这些研究存在明显的局限,包括:以线性低阶FC作为模型输入导致信息不充分;未考虑ASD患者之间的个体差异(例如不同的症状或不同的严重程度);以及决策过程缺乏可解释性。为弥补这些局限,我们提出了一种新颖的可解释性引导的感兴趣区域(ROI)选择框架(EAG-RS),它借助可解释人工智能技术识别脑区之间的非线性高阶功能关联,并选择具有类别区分性的脑区用于脑疾病识别。所提框架包括三步:(i)通过基于随机种子的网络屏蔽进行脑区间关系学习,以估计非线性关系;(ii)进行可解释的逐连接相关性分数估计,以探索功能连接之间的高阶关系;(iii)基于非线性高阶FC选择对诊断有信息量的ROI并学习分类器,以识别ASD。我们在Autism Brain Imaging Database Exchange(ABIDE)数据集上开展实验,验证了所提方法的有效性,结果表明该方法在多项评价指标上优于其他对比方法。此外,我们对所选ROI进行了定性分析,并识别出与先前神经科学研究相关联的ASD亚型。

Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning

  • paper_url: http://arxiv.org/abs/2310.03400
  • repo_url: None
  • paper_authors: Huan Ma, Changqing Zhang, Huazhu Fu, Peilin Zhao, Bingzhe Wu
  • for: 本研究的目的是提供私有部署的语言模型 fine-tuning 的实现细节,以便在各个领域进行域specific研究。
  • methods: 本研究使用 Large Language Models (LLMs) 进行语言模型 fine-tuning,并 explore 不同的处理方法以便在私有部署中使用更强大的 LLMs 生成的理由。
  • results: 本研究发现,在私有部署中使用更强大的 LLMs 生成的理由可以提高语言模型的性能,但是需要根据不同的处理方法进行调整。
    Abstract Nowadays, billions of people engage in communication and express their opinions on the internet daily. Unfortunately, not all of these expressions are friendly or compliant, making content moderation an indispensable task. With the successful development of Large Language Models (LLMs) in recent years, LLM-based methods have become a feasible solution for handling tasks in various domains. However, in the field of content moderation, there is still a lack of detailed work that systematically introduces implementation details. In this paper, we introduce how to fine-tune an LLM model that can be privately deployed for content moderation. Specifically, we discuss whether incorporating reasons during the fine-tuning process would be better or if it should be treated as a classification task directly. We also explore the benefits of utilizing reasons generated by more powerful LLMs for fine-tuning privately deployed models and the impact of different processing approaches when the answers generated by the more powerful LLMs are incorrect. We report the entire research process and the key findings in this paper, hoping to provide valuable experience for researchers who are fine-tuning privately deployed models in their domain-specific research.
    摘要 现在,每天有数十亿人在互联网上进行交流和表达自己的意见。然而,不幸的是,不所有的表达都是友好或合法的,因此内容审核成为不可或缺的任务。随着大语言模型(LLM)的成功发展,LLM基本方法在各个领域中成为了可能的解决方案。然而,在内容审核领域,还缺乏详细的实施细节。在这篇论文中,我们介绍了如何私有部署LLM模型进行内容审核。specifically,我们讨论了在细化过程中是否应该包含理由,或者直接将其视为分类任务。我们还探讨了使用更强大LLM生成的理由来细化私有部署模型的效果,以及不同处理方法的影响,当更强大LLM生成的答案错误时。我们报告了整个研究过程和关键发现,希望为审核私有部署模型的研究人员提供有价值的经验。

Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein

  • paper_url: http://arxiv.org/abs/2310.03398
  • repo_url: None
  • paper_authors: Hugues Van Assel, Cédric Vincent-Cuaz, Titouan Vayer, Rémi Flamary, Nicolas Courty
  • for: simultaneous reduction of both sample and feature sizes
  • methods: semi-relaxed Gromov-Wasserstein optimal transport (OT) problem
  • results: competitive hard clustering and summarization of real data
    Abstract We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondances between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    摘要 我们提出了对现有降维(DR)目标函数的一种通用改造,使其能够同时降低样本数量和特征数量。输入样本与嵌入样本之间的对应关系通过一个半松弛的Gromov-Wasserstein最优传输(OT)问题来计算。当嵌入样本数量与输入相同时,我们的模型可以还原出经典的流行降维模型;当嵌入维度不受约束时,我们证明OT传输计划能够给出有竞争力的硬聚类。我们强调融合降维与聚类的中间阶段对于概括真实数据的重要性,并将我们的方法应用于图像数据集的可视化。

Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations

  • paper_url: http://arxiv.org/abs/2310.03393
  • repo_url: None
  • paper_authors: Lorenc Kapllani, Long Teng, Matthias Rottmann
  • for: This paper aims to study uncertainty quantification (UQ) for deep learning-based numerical schemes used to solve high-dimensional backward stochastic differential equations (BSDEs).
  • methods: The paper uses a UQ model that efficiently estimates the standard deviation (STD) of the approximate solution using only a single run of the algorithm, as well as estimates the mean of the approximate solution.
  • results: The numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes, and can identify hyperparameter values for which the scheme achieves good approximations. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values.
    Abstract Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently raised plenty of scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.
    摘要 近来,用于求解高维倒向随机微分方程(BSDE)的基于深度学习的数值方法引起了大量的科学关注。这类方法能够逼近维度很高的BSDE,但其可靠性尚未得到研究,因而也未被充分理解。在这项工作中,我们研究一类基于深度学习的BSDE数值方法的不确定性量化(UQ)。具体而言,我们梳理了这类方法中涉及的不确定性来源,并数值研究了不同来源的影响。通常的做法是用不同数据集多次运行算法,计算近似解的标准差(STD)来刻画不确定性,但这种做法计算代价很高,对高维问题尤其如此。因此,我们建立了一个UQ模型,只需运行一次算法即可高效地估计近似解的标准差,同时还能估计近似解的均值,后者可用于初始化算法并改进优化过程。数值实验表明,对于所考虑的这类基于深度学习的BSDE数值方法,该UQ模型能够给出近似解均值和标准差的可靠估计;估计得到的标准差涵盖了多种不确定性来源,证明了其量化不确定性的有效性。此外,该模型还能基于估计的标准差比较不同数值格式的性能,并识别使格式取得良好逼近效果的超参数取值。

Machine learning the interaction network in coupled dynamical systems

  • paper_url: http://arxiv.org/abs/2310.03378
  • repo_url: None
  • paper_authors: Pawan R. Bhure, M. S. Santhanam
  • for: 这个研究旨在掌握互动动力系统中的相互作用网络信息,以便更好地理解它们之间的相互作用。
  • methods: 这种自监督神经网络模型仅利用观测到的轨迹数据,即可恢复相互作用网络并预测个体的动力学。
  • results: 该模型被应用于两个动力系统:通过胡克定律相互作用耦合的粒子系统,以及耦合相位(Kuramoto)振荡器系统。
    Abstract The study of interacting dynamical systems continues to attract research interest in various fields of science and engineering. In a collection of interacting particles, the interaction network contains information about how various components interact with one another. Inferring the information about the interaction network from the dynamics of agents is a problem of long-standing interest. In this work, we employ a self-supervised neural network model to achieve two outcomes: to recover the interaction network and to predict the dynamics of individual agents. Both these information are inferred solely from the observed trajectory data. This work presents an application of the Neural Relational Inference model to two dynamical systems: coupled particles mediated by Hooke's law interaction and coupled phase (Kuramoto) oscillators.
    摘要 相互作用动力系统的研究持续吸引着科学与工程各领域的研究兴趣。在一组相互作用的粒子中,相互作用网络包含了各组成部分之间如何相互作用的信息。从个体的动力学中推断相互作用网络的信息,是一个长期受到关注的问题。在这项工作中,我们采用一种自监督神经网络模型,实现两个目标:恢复相互作用网络,并预测个体的动力学。这两类信息均仅从观测到的轨迹数据中推断得到。本文将神经关系推断(Neural Relational Inference)模型应用于两个动力系统:通过胡克定律相互作用耦合的粒子系统,以及耦合相位(Kuramoto)振荡器系统。

An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples

  • paper_url: http://arxiv.org/abs/2310.03349
  • repo_url: None
  • paper_authors: Armin Ettenhofer, Jan-Philipp Schulze, Karla Pizzi
  • for: 这篇论文的目的是提出一种在生成步骤中结合心理声学模型和房间脉冲响应(RIR)的音频对抗样本生成算法,以提高样本对空中播放攻击的鲁棒性,同时降低其对人类听众的可感知性。
  • methods: 这个论文使用了心理听觉模型和RIR在生成步骤中,通过动态生成房间冲击响应来模拟物理环境,以强化示例的抗击性。
  • results: 这个论文的实验结果表明,包含心理听觉因素和抗击性的算法在语音识别系统的robustness和人类听众的perceptibility两个方面均显示出改善,但是在word error rate(WER)方面受到了一定的影响。
    Abstract Audio adversarial examples are audio files that have been manipulated to fool an automatic speech recognition (ASR) system, while still sounding benign to a human listener. Most methods to generate such samples are based on a two-step algorithm: first, a viable adversarial audio file is produced, then, this is fine-tuned with respect to perceptibility and robustness. In this work, we present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step. The RIRs are dynamically created by a neural network during the generation process to simulate a physical environment to harden our examples against transformations experienced in over-the-air attacks. We compare the different approaches in three experiments: in a simulated environment and in a realistic over-the-air scenario to evaluate the robustness, and in a human study to evaluate the perceptibility. Our algorithms considering psychoacoustics only or in addition to the robustness show an improvement in the signal-to-noise ratio (SNR) as well as in the human perception study, at the cost of an increased word error rate (WER).
    摘要 音频对抗样本是指经过人为操纵、能够欺骗自动语音识别(ASR)系统、但对人类听众而言仍然听起来无害的音频文件。大多数生成此类样本的方法基于两步算法:首先生成一个可行的对抗音频文件,然后再针对可感知性和鲁棒性对其进行微调。在这项工作中,我们提出了一种集成算法,在生成步骤中就使用心理声学模型和房间脉冲响应(RIR)。RIR由一个神经网络在生成过程中动态生成,用于模拟物理环境,从而增强样本对空中播放攻击中所经历变换的鲁棒性。我们在三个实验中比较了不同方法:在模拟环境和真实的空中播放场景中评估鲁棒性,并通过人类实验评估可感知性。只考虑心理声学、或同时考虑心理声学与鲁棒性的算法,在信噪比(SNR)和人类感知研究中均表现出改善,但代价是词错误率(WER)有所上升。
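
A sketch of the over-the-air simulation step described above: a candidate adversarial waveform is convolved with a room impulse response inside the generation loop so that the example survives playback. The exponentially decaying synthetic RIR is a stand-in for the paper's dynamically generated, network-produced RIRs.

```python
import numpy as np

def synthetic_rir(fs=16_000, rt60=0.3, length_s=0.4, seed=0):
    """Toy room impulse response: exponentially decaying noise (a stand-in for a real/learned RIR)."""
    rng = np.random.default_rng(seed)
    n = int(length_s * fs)
    t = np.arange(n) / fs
    return rng.normal(size=n) * np.exp(-6.9 * t / rt60)

def simulate_over_the_air(audio, rir):
    """Apply the room acoustics to a candidate adversarial waveform."""
    out = np.convolve(audio, rir)[: len(audio)]
    return out / (np.max(np.abs(out)) + 1e-9)

fs = 16_000
audio = np.random.randn(fs * 2)              # 2 s of placeholder audio with a perturbation baked in
played = simulate_over_the_air(audio, synthetic_rir(fs))
print(played.shape)                          # the ASR model would then be attacked on `played`
```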

LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework

  • paper_url: http://arxiv.org/abs/2310.03342
  • repo_url: https://github.com/beanie00/lesson
  • paper_authors: Woojun Kim, Jeonghye Kim, Youngchul Sung
  • for: 提出了一个统一的探索框架 для reinforcement learning(RL),基于选择批评模型。
  • methods: 提出了一种能够集成多种多样化的探索策略,使RL代理人可以适应不同任务的适当探索-利用平衡。
  • results: 通过在MiniGrid和Atari环境中的多种实验,证明了提案的探索框架的效果。
    Abstract In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
    摘要 本文基于option-critic模型提出了一个统一的强化学习(RL)探索框架。所提框架学习如何整合一组多样化的探索策略,使智能体能够随时间自适应地选择最有效的探索策略,从而针对每个给定任务实现合适的探索-利用权衡。在MiniGrid和Atari环境中的多种实验证明了所提探索框架的有效性。

Probabilistic Forecasting of Day-Ahead Electricity Prices and their Volatility with LSTMs

  • paper_url: http://arxiv.org/abs/2310.03339
  • repo_url: None
  • paper_authors: Julius Trebbien, Sebastian Pütz, Benjamin Schäfer, Heidi S. Nygård, Leonardo Rydin Gorjão, Dirk Witthaut
  • for: 预测电力价格的准确性是电力系统管理和智能应用的关键。欧洲电力价格增长很大,变得非常波动,挑战了已有的预测方法。
  • methods: 我们使用了Long Short-Term Memory(LSTM)模型来预测德国卢森堡日前电力价格,以适应这些挑战。LSTM模型的循环结构允许模型适应趋势,并jointly predicting both mean and standard deviation允许probabilistic prediction。
  • results: 使用物理启发的方法——超 statistics来解释价格的统计性,我们显示了LSTM模型能够准确地预测价格和其波动性。
    Abstract Accurate forecasts of electricity prices are crucial for the management of electric power systems and the development of smart applications. European electricity prices have risen substantially and became highly volatile after the Russian invasion of Ukraine, challenging established forecasting methods. Here, we present a Long Short-Term Memory (LSTM) model for the German-Luxembourg day-ahead electricity prices addressing these challenges. The recurrent structure of the LSTM allows the model to adapt to trends, while the joint prediction of both mean and standard deviation enables a probabilistic prediction. Using a physics-inspired approach - superstatistics - to derive an explanation for the statistics of prices, we show that the LSTM model faithfully reproduces both prices and their volatility.
    摘要 准确预测电力价格对电力系统管理和智能应用的发展非常重要。欧洲电力价格在俄罗斯入侵乌克兰后高涨并变得极为不稳,挑战了传统预测方法。我们在这篇文章中介绍了一个基于Long Short-Term Memory(LSTM)模型的德国卢森堡日前电力价格预测方法,以解决这些挑战。LSTM模型的循环结构使其能够适应趋势,而联合预测两者的平均值和标准差使得预测变得 probabilistic。通过基于物理学的方法——超统计学——获得价格统计的解释,我们表明LSTM模型能够准确地复制价格和其不稳。
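
A minimal PyTorch sketch of the joint mean/standard-deviation prediction described above, trained with a Gaussian negative log-likelihood so that both the price and its volatility are forecast; the layer sizes and data are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ProbLSTM(nn.Module):
    """Predicts a mean and a (positive) standard deviation for the next price."""
    def __init__(self, n_features=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)   # -> (mu, log_sigma)

    def forward(self, x):
        h, _ = self.lstm(x)
        mu, log_sigma = self.head(h[:, -1]).chunk(2, dim=-1)
        return mu.squeeze(-1), torch.exp(log_sigma).squeeze(-1)

def gaussian_nll(y, mu, sigma):
    return (torch.log(sigma) + 0.5 * ((y - mu) / sigma) ** 2).mean()

model = ProbLSTM()
x = torch.randn(32, 24, 8)                 # 32 sequences of 24 hourly steps, 8 features (placeholder)
y = torch.randn(32)                        # next-day price target (placeholder)
mu, sigma = model(x)
loss = gaussian_nll(y, mu, sigma)
loss.backward()
print(float(loss))
```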

Untargeted White-box Adversarial Attack with Heuristic Defence Methods in Real-time Deep Learning based Network Intrusion Detection System

  • paper_url: http://arxiv.org/abs/2310.03334
  • repo_url: None
  • paper_authors: Khushnaseeb Roshan, Aasim Zafar, Sheikh Burhan Ul Haque
  • for: This research work aims to increase the robustness of Machine Learning (ML) and Deep Learning (DL) based Network Intrusion Detection Systems (NIDS) against adversarial attacks.
  • methods: The research uses four powerful adversarial attack techniques (FGSM, JSMA, PGD, and C&W) to evaluate the performance of NIDS under adversarial attack situations. It also employs three heuristic defence strategies (AT, GDA, and HC) to improve the NIDS robustness.
  • results: The research demonstrates the complete workflow of the proposed approach in a real-time network with data packet flow and evaluates the performance of NIDS under adversarial attacks using various performance metrics.
    Abstract Network Intrusion Detection System (NIDS) is a key component in securing the computer network from various cyber security threats and network attacks. However, consider an unfortunate situation where the NIDS is itself attacked and vulnerable more specifically, we can say, How to defend the defender?. In Adversarial Machine Learning (AML), the malicious actors aim to fool the Machine Learning (ML) and Deep Learning (DL) models to produce incorrect predictions with intentionally crafted adversarial examples. These adversarial perturbed examples have become the biggest vulnerability of ML and DL based systems and are major obstacles to their adoption in real-time and mission-critical applications such as NIDS. AML is an emerging research domain, and it has become a necessity for the in-depth study of adversarial attacks and their defence strategies to safeguard the computer network from various cyber security threads. In this research work, we aim to cover important aspects related to NIDS, adversarial attacks and its defence mechanism to increase the robustness of the ML and DL based NIDS. We implemented four powerful adversarial attack techniques, namely, Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W) in NIDS. We analyzed its performance in terms of various performance metrics in detail. Furthermore, the three heuristics defence strategies, i.e., Adversarial Training (AT), Gaussian Data Augmentation (GDA) and High Confidence (HC), are implemented to improve the NIDS robustness under adversarial attack situations. The complete workflow is demonstrated in real-time network with data packet flow. This research work provides the overall background for the researchers interested in AML and its implementation from a computer network security point of view.
    摘要 在这项研究中,我们将探讨 NIDS 相关的重要方面,包括反对攻击和防御策略,以提高 ML 和 DL 基础的 NIDS Robustness。我们在 NIDS 中实现了四种强大的反对攻击技术,分别是 Fast Gradient Sign Method (FGSM)、Jacobian Saliency Map Attack (JSMA)、Projected Gradient Descent (PGD) 和 Carlini & Wagner (C&W)。我们在详细的性能指标下进行了分析。此外,我们还实现了三种较为有效的防御策略,即 Adversarial Training (AT)、Gaussian Data Augmentation (GDA) 和 High Confidence (HC)。整个工作流程在实时网络中进行了示例。本研究提供了对 AML 的实现和应用在计算机网络安全方面的全面背景,为研究者提供了一个好的入门点。
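
A standard FGSM sketch (one of the four attacks evaluated above) applied to a generic differentiable classifier; the tiny network and random "traffic features" are placeholders for an actual NIDS model and dataset.

```python
import torch
import torch.nn as nn

def fgsm(model, x, y, epsilon=0.05):
    """Untargeted FGSM: perturb inputs in the direction of the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 2))  # toy NIDS classifier
x = torch.rand(16, 40)                    # 16 flows, 40 normalized traffic features (placeholder)
y = torch.randint(0, 2, (16,))            # benign / attack labels
x_adv = fgsm(model, x, y)
print((model(x).argmax(1) == y).float().mean(),
      (model(x_adv).argmax(1) == y).float().mean())  # accuracy before vs. after the attack
```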

Fine-tune Language Models to Approximate Unbiased In-context Learning

  • paper_url: http://arxiv.org/abs/2310.03331
  • repo_url: None
  • paper_authors: Timothy Chu, Zhao Song, Chiwun Yang
  • for: 提高大语言模型(LLM)的上下文学习(ICL)性能,并解决ICL中输入示例偏差的问题。
  • methods: 提出了一种重加权算法 RICL(Reweighted In-context Learning),利用无偏验证集微调语言模型,为每个输入-输出示例确定最优权重,从而逼近无偏的上下文学习。此外,还提出了一种低成本的线性最优权重近似算法 LARICL(Linear Approximation of Reweighted In-context Learning),其训练开销极小,同时能提供有效的结果。
  • results: 通过在数值数据集上的实验验证了算法的性能,结果表明,与基于普通提示的上下文学习以及经典微调方法等基准相比,所提算法的性能有显著提升。
    Abstract In-context learning (ICL) is an astonishing emergent ability of large language models (LLMs). By presenting a prompt that includes multiple input-output pairs as examples and introducing a new query input, models can generate the corresponding output. However, the performance of models heavily relies on the quality of the input prompt when implementing in-context learning. Biased or imbalanced input prompts can significantly degrade the performance of language models. To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning). This algorithm fine-tunes language models using an unbiased validation set to determine the optimal weight for each input-output example to approximate unbiased in-context learning. Furthermore, we also introduce a low-cost reweighted algorithm, a linear optimal weight approximation algorithm called LARICL (Linear Approximation of Reweighted In-context Learning). This algorithm requires minimal training cost while providing effective results. We prove the convergence of our algorithm and validate its performance through experiments conducted on a numerical dataset. The experimental findings reveal a substantial improvement in comparison to benchmarks including the performance of casual prompt-based in-context learning and the performance of a classic fine-tuning method.
    摘要 上下文学习(ICL)是大语言模型(LLM)一种惊人的涌现能力。通过在提示中给出多个输入-输出示例并引入一个新的查询输入,模型即可生成相应的输出。然而,在进行上下文学习时,模型的性能在很大程度上依赖于输入提示的质量:有偏或不均衡的输入提示会显著降低语言模型的性能。为解决这一问题,我们提出了一种名为 RICL(Reweighted In-context Learning)的重加权算法。该算法利用一个无偏验证集对语言模型进行微调,为每个输入-输出示例确定最优权重,以逼近无偏的上下文学习。此外,我们还提出了一种低成本的重加权算法,即线性最优权重近似算法 LARICL(Linear Approximation of Reweighted In-context Learning),它只需极少的训练成本即可提供有效的结果。我们证明了算法的收敛性,并通过在数值数据集上的实验验证了其性能。实验结果表明,与基于普通提示的上下文学习以及经典微调方法等基准相比,该算法带来了显著的性能提升。

BioBridge: Bridging Biomedical Foundation Models via Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2310.03320
  • repo_url: None
  • paper_authors: Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N. Ioannidis, Huzefa Rangwala, Rishita Anubhai
  • for: 本研究旨在超越生物医学领域Foundation models(FMs)的局限性,即独立地训练和使用不同类型数据进行任务。
  • methods: 本研究提出了一种新的参数效率学习框架——BioBridge,通过知识图(KG)学习对一种单Modal FM和另一种单Modal FM之间的转换,而不需要 Fine-tune任何下层单Modal FM。
  • results: 实验结果表明,BioBridge可以在多模态检索任务中超过基eline KG嵌入方法(在 average 约76.3%),并且 BioBridge 还示出了在不同模式或关系上的外部泛化能力。此外,BioBridge 还可以用于生物医学多模态问答以及引导生成新药的帮助。
    Abstract Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves it by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also identify BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we also show that BioBridge presents itself as a general purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.
    摘要 基础模型(FM)可以利用大量未标注数据来实现广泛的任务表现优秀。然而,生物医学领域中的FM主要是单模态的,即独立地训练和使用对蛋白序列、小分子结构或医疗数据进行任务。为了超越生物医学领域中的FM限制,我们提出了 BioBridge,一种新的参数效率学习框架,用于将独立地训练的单模态FM连接起来,以实现多模态行为。BioBridge通过使用知识图(KG)来学习模态之间的变换,而无需修改任何基础模型。我们的实验结果表明,BioBridge可以在多模态检索任务中击败最佳基eline KG嵌入方法(平均提高约76.3%)。此外,我们也发现 BioBridge具有跨领域泛化能力,可以在未看到的模式或关系上进行推断。此外,我们还证明 BioBridge可以作为生物医学多模态问答系统中的通用检索器,以及生成新药的指导生成助手。

Certifiably Robust Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.03312
  • repo_url: https://github.com/ventr1c/res-gcl
  • paper_authors: Minhua Lin, Teng Xiao, Enyan Dai, Xiang Zhang, Suhang Wang
  • for: 这篇论文主要targets Graph Contrastive Learning (GCL),提出了一种可证明Robustness的方法来增强GCL的可靠性。
  • methods: 作者首先提出了一种综合的评估和证明GCL模型的可靠性的标准,然后提出了一种名为Randomized Edgedrop Smoothing(RES)的新技术,可以确保GCL模型的可证明Robustness。
  • results: 作者通过实验表明,RES可以有效地提高GCL模型的可靠性,并且可以在下游任务中保持证明Robustness。
    Abstract Graph Contrastive Learning (GCL) has emerged as a popular unsupervised graph representation learning method. However, it has been shown that GCL is vulnerable to adversarial attacks on both the graph structure and node attributes. Although empirical approaches have been proposed to enhance the robustness of GCL, the certifiable robustness of GCL is still remain unexplored. In this paper, we develop the first certifiably robust framework in GCL. Specifically, we first propose a unified criteria to evaluate and certify the robustness of GCL. We then introduce a novel technique, RES (Randomized Edgedrop Smoothing), to ensure certifiable robustness for any GCL model, and this certified robustness can be provably preserved in downstream tasks. Furthermore, an effective training method is proposed for robust GCL. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed method in providing effective certifiable robustness and enhancing the robustness of any GCL model. The source code of RES is available at https://github.com/ventr1c/RES-GCL.
    摘要 图对比学习(GCL)已经成为一种流行的无监督图表示学习方法。然而,研究表明GCL在图结构和节点属性两方面都容易受到对抗攻击。尽管已有一些经验性方法被提出来增强GCL的鲁棒性,但GCL的可认证鲁棒性仍未被探索。在本文中,我们提出了GCL领域第一个可认证鲁棒的框架。具体而言,我们首先提出了一个统一的准则来评估和认证GCL的鲁棒性;随后我们提出了一种新技术RES(Randomized Edgedrop Smoothing),可以为任意GCL模型提供可认证的鲁棒性,并且这种认证的鲁棒性可以被证明地保留到下游任务中。此外,我们还提出了一种用于鲁棒GCL的有效训练方法。在真实数据集上的大量实验表明,所提方法能够提供有效的可认证鲁棒性,并增强任意GCL模型的鲁棒性。RES的源代码见 https://github.com/ventr1c/RES-GCL。
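
A sketch of the randomized edge-dropping that RES builds on: each edge is independently removed with some probability before the encoder sees the graph, and predictions are aggregated over many such draws. The drop probability, number of samples, and the parity-based placeholder classifier are assumptions; the actual certification argument is in the paper.

```python
import numpy as np

def drop_edges(edge_index, p, rng):
    """Randomly remove each edge with probability p. edge_index: (2, E) array."""
    keep = rng.random(edge_index.shape[1]) >= p
    return edge_index[:, keep]

def smoothed_predict(classify, edge_index, p=0.3, n_samples=100, seed=0):
    """Majority vote over predictions on randomly edge-dropped graphs."""
    rng = np.random.default_rng(seed)
    votes = [classify(drop_edges(edge_index, p, rng)) for _ in range(n_samples)]
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)], counts.max() / n_samples

edges = np.array([[0, 0, 1, 2, 3], [1, 2, 2, 3, 4]])   # toy graph
classify = lambda ei: int(ei.shape[1] % 2)             # placeholder for a GCL encoder + classifier
label, agreement = smoothed_predict(classify, edges)
print(label, agreement)
```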

Deep Variational Multivariate Information Bottleneck – A Framework for Variational Losses

  • paper_url: http://arxiv.org/abs/2310.03311
  • repo_url: None
  • paper_authors: Eslam Abdelaleem, Ilya Nemenman, K. Michael Martini
  • for: 这种方法的目的是用信息论来重新推导和推广现有的降维方法,并设计新的方法。
  • methods: 该方法基于对多元信息瓶颈的一种解释,其中两个贝叶斯网络相互折中权衡:第一个是编码图,指定了压缩数据时需要保留的信息;第二个是解码图,指定了数据的生成模型。基于这一解释,可以重新推导出现有的降维方法,包括深度变分信息瓶颈(DVIB)、β-VAE 和深度变分典型相关分析(DVCCA);此外,还推导出一种新的降维方法——深度变分对称信息瓶颈(DVSIB),它同时压缩两个变量,以保留二者压缩表示之间的信息。
  • results: 作者实现了所有这些算法,并在一个修改后的含噪 MNIST 数据集上进行评估。结果显示,更加契合数据结构的算法(β-DVCCA 和 DVSIB)能够生成更好的低维隐空间,其衡量标准为分类准确率和隐变量的维度。作者认为这一框架可以用来统一其他多视图表示学习算法,并为推导针对特定问题的损失函数提供一个直观的框架。
    Abstract Variational dimensionality reduction methods are known for their high accuracy, generative abilities, and robustness. These methods have many theoretical justifications. Here we introduce a unifying principle rooted in information theory to rederive and generalize existing variational methods and design new ones. We base our framework on an interpretation of the multivariate information bottleneck, in which two Bayesian networks are traded off against one another. We interpret the first network as an encoder graph, which specifies what information to keep when compressing the data. We interpret the second network as a decoder graph, which specifies a generative model for the data. Using this framework, we rederive existing dimensionality reduction methods such as the deep variational information bottleneck (DVIB), beta variational auto-encoders (beta-VAE), and deep variational canonical correlation analysis (DVCCA). The framework naturally introduces a trade-off parameter between compression and reconstruction in the DVCCA family of algorithms, resulting in the new beta-DVCCA family. In addition, we derive a new variational dimensionality reduction method, deep variational symmetric informational bottleneck (DVSIB), which simultaneously compresses two variables to preserve information between their compressed representations. We implement all of these algorithms and evaluate their ability to produce shared low dimensional latent spaces on a modified noisy MNIST dataset. We show that algorithms that are better matched to the structure of the data (beta-DVCCA and DVSIB) produce better latent spaces as measured by classification accuracy and the dimensionality of the latent variables. We believe that this framework can be used to unify other multi-view representation learning algorithms. Additionally, it provides a straightforward framework for deriving problem-specific loss functions.
    摘要 变分降维方法以其高精度、生成能力和稳健性而著称,并有许多理论依据。本文提出一个植根于信息论的统一原理,用以重新推导并推广现有的变分方法,并设计新的方法。我们的框架基于对多元信息瓶颈的一种解释,其中两个贝叶斯网络相互折中权衡:第一个网络被解释为编码图,指定压缩数据时应保留哪些信息;第二个网络被解释为解码图,指定数据的生成模型。利用这一框架,我们重新推导出现有的降维方法,如深度变分信息瓶颈(DVIB)、β变分自编码器(β-VAE)和深度变分典型相关分析(DVCCA)。该框架自然地在DVCCA族算法中引入了压缩与重构之间的权衡参数,从而得到新的β-DVCCA族方法。此外,我们还推导出一种新的变分降维方法——深度变分对称信息瓶颈(DVSIB),它同时压缩两个变量,以保留二者压缩表示之间的信息。我们实现了所有这些算法,并在一个修改后的含噪MNIST数据集上评估它们生成共享低维隐空间的能力。结果表明,与数据结构更契合的算法(β-DVCCA和DVSIB)能够生成更好的隐空间,其衡量标准为分类准确率和隐变量的维度。我们相信该框架可用于统一其他多视图表示学习算法,并为推导针对特定问题的损失函数提供一个直观的框架。

Learning Energy Decompositions for Partial Inference of GFlowNets

  • paper_url: http://arxiv.org/abs/2310.03301
  • repo_url: None
  • paper_authors: Hyosoon Jang, Minsu Kim, Sungsoo Ahn
  • for: This paper aims to improve Generative Flow Networks (GFlowNets) for sampling objects from the Boltzmann energy distribution using partial inference.
  • methods: The paper proposes a novel approach called Learning Energy Decompositions for GFlowNets (LED-GFN), which decomposes the energy of an object into learnable potential functions defined on state transitions and reparameterizes the flow functions using these potential functions.
  • results: The proposed LED-GFN method is empirically verified to be superior to traditional GFlowNets in five problems, including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences.
    Abstract This paper studies generative flow networks (GFlowNets) to sample objects from the Boltzmann energy distribution via a sequence of actions. In particular, we focus on improving GFlowNet with partial inference: training flow functions with the evaluation of the intermediate states or transitions. To this end, the recently developed forward-looking GFlowNet reparameterizes the flow functions based on evaluating the energy of intermediate states. However, such an evaluation of intermediate energies may (i) be too expensive or impossible to evaluate and (ii) even provide misleading training signals under large energy fluctuations along the sequence of actions. To resolve this issue, we propose learning energy decompositions for GFlowNets (LED-GFN). Our main idea is to (i) decompose the energy of an object into learnable potential functions defined on state transitions and (ii) reparameterize the flow functions using the potential functions. In particular, to produce informative local credits, we propose to regularize the potential to change smoothly over the sequence of actions. It is also noteworthy that training GFlowNet with our learned potential can preserve the optimal policy. We empirically verify the superiority of LED-GFN in five problems including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences.
    摘要 The main idea of LED-GFN is to decompose the energy of an object into learnable potential functions defined on state transitions, and reparameterize the flow functions using these potential functions. To produce informative local credits, the authors regularize the potential to change smoothly over the sequence of actions. The authors also show that training GFlowNet with the learned potential can preserve the optimal policy.The paper is evaluated on five problems, including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences, and the results demonstrate the superiority of LED-GFN over traditional GFlowNet.

A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling

  • paper_url: http://arxiv.org/abs/2310.03298
  • repo_url: None
  • paper_authors: Yi-Ping Chen, Liwei Wang, Yigitcan Comlek, Wei Chen
  • for: 通过多保真度(Multi-fidelity)方法提高代理建模与设计优化的精度和效率。
  • methods: 使用潜变量高斯过程(Latent Variable Gaussian Process)将不同保真度的模型映射到一个可解释的潜空间中,从而在不假设保真度层级关系的情况下刻画它们之间的相关性;并在每次填充采样迭代中利用预后验分析确定下一个样本的最佳选择。
  • results: 在测试问题中,所提方法在多保真度问题上比基准方法具有更高的准确率和更好的稳健性,并且可以通过更换采集函数在多保真度全局拟合与贝叶斯优化(Bayesian Optimization)问题之间灵活切换。
    Abstract Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in the exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and utilize it to quantify the value of the future samples and navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximize the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.
    摘要 多保真度(MF)方法通过结合多个低保真度(LF)模型的数据来增强代理建模和设计优化,正得到越来越广泛的应用。大多数现有MF方法假设数据集固定,而能够在不同保真度模型之间动态分配资源的自适应采样方法,可以更高效地探索和利用设计空间。然而,现有MF方法大多依赖保真度层次结构的假设,或者无法捕捉并利用多个保真度之间的相关性来量化未来样本的价值、指导自适应采样。为此,我们提出了一个基于不同保真度模型潜在嵌入及其预后验分析的框架,显式利用它们之间的相关性进行自适应采样。在该框架中,每次填充采样迭代包括两步:首先利用高保真度(HF)模型确定具有最大改进潜力的位置,然后在所有保真度级别中搜索在该位置上单位成本改进最大的下一个样本。这一切由单个隐变量高斯过程(LVGP)模型实现,它将不同保真度模型映射到可解释的潜在空间以捕捉其相关性,而无需假设保真度层次结构。LVGP使我们能够通过预后验分析评估低保真度候选样本将如何影响高保真度响应,并确定收益成本比最佳的下一个样本。测试案例表明,所提方法在多保真度全局拟合(GF)和贝叶斯优化(BO)问题上的收敛速度和稳健性均优于基准方法;此外,只需更换采集函数即可在GF与BO之间切换。
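
The two-step infill criterion can be sketched as follows; this is an illustrative simplification, assuming the HF surrogate's predictive mean/standard deviation and a pre-posterior estimate of HF-variance reduction per fidelity are already available (both stubbed as arrays here), rather than the paper's actual LVGP machinery.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Expected improvement for minimization under a Gaussian predictive distribution."""
    z = (best - mu) / np.maximum(sigma, 1e-12)
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def select_next_sample(candidates, hf_mu, hf_sigma, best_so_far, preposterior_gain, costs):
    """candidates: (N, d); hf_mu, hf_sigma: HF surrogate predictions (N,);
    preposterior_gain[l]: expected HF-variance reduction at each x for fidelity l;
    costs[l]: evaluation cost of fidelity l."""
    # Step 1: location with the greatest potential improvement under the HF model.
    ei = expected_improvement(hf_mu, hf_sigma, best_so_far)
    i_star = int(np.argmax(ei))
    # Step 2: fidelity with the best improvement per unit cost at that location.
    ratios = {l: preposterior_gain[l][i_star] / costs[l] for l in costs}
    l_star = max(ratios, key=ratios.get)
    return candidates[i_star], l_star

# toy usage: 3 candidate locations, 2 fidelity levels (0 = LF, 1 = HF)
X = np.array([[0.1], [0.5], [0.9]])
x_next, fid = select_next_sample(
    X, hf_mu=np.array([1.0, 0.4, 0.8]), hf_sigma=np.array([0.2, 0.3, 0.1]),
    best_so_far=0.6,
    preposterior_gain={0: np.array([0.02, 0.05, 0.01]), 1: np.array([0.05, 0.09, 0.03])},
    costs={0: 1.0, 1: 10.0})
```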

Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations

  • paper_url: http://arxiv.org/abs/2310.03285
  • repo_url: None
  • paper_authors: Ahmed Abusnaina, Yizhen Wang, Sunpreet Arora, Ke Wang, Mihai Christodorescu, David Mohaisen
  • for: 为了实现稳健的恶意软件检测并防止攻击者利用攻击表面规避防御,我们专注于探索现有恶意软件检测系统的攻击表面。
  • methods: 我们对实际的二进制级黑盒对抗恶意软件样本进行根因分析,以发现潜在的攻击表面;同时揭示检测引擎中易变(volatile)特征的敏感性,并证明其可被利用。
  • results: 实验结果显示,传统检测模型在对抗性攻击下基本失效;但通过消除易变信息通道可以大幅缩小攻击表面。为此,我们提出了若干简单而有效的方法来减轻二进制篡改攻击的影响。总体而言,我们基于图的恶意软件检测方案可以达到88.32%的AUC,并在组合二进制篡改攻击下仍保持88.19%。
    Abstract Toward robust malware detection, we explore the attack surface of existing malware detection systems. We conduct root-cause analyses of the practical binary-level black-box adversarial malware examples. Additionally, we uncover the sensitivity of volatile features within the detection engines and exhibit their exploitability. Highlighting volatile information channels within the software, we introduce three software pre-processing steps to eliminate the attack surface, namely, padding removal, software stripping, and inter-section information resetting. Further, to counter the emerging section injection attacks, we propose a graph-based section-dependent information extraction scheme for software representation. The proposed scheme leverages aggregated information within various sections in the software to enable robust malware detection and mitigate adversarial settings. Our experimental results show that traditional malware detection models are ineffective against adversarial threats. However, the attack surface can be largely reduced by eliminating the volatile information. Therefore, we propose simple-yet-effective methods to mitigate the impacts of binary manipulation attacks. Overall, our graph-based malware detection scheme can accurately detect malware with an area under the curve score of 88.32\% and a score of 88.19% under a combination of binary manipulation attacks, exhibiting the efficiency of our proposed scheme.
    摘要 为了实现稳健的恶意软件检测,我们研究了现有恶意软件检测系统的攻击表面。我们对实际的二进制级黑盒对抗恶意软件样本进行了根因分析,并揭示了检测引擎中易变特征的敏感性及其可利用性。针对软件中的易变信息通道,我们提出了三种软件预处理步骤来消除攻击表面,即移除填充(padding)、软件剥离(stripping)和节间信息重置。此外,为了应对新兴的节(section)注入攻击,我们提出了一种基于图的、按节提取信息的软件表示方案。该方案利用软件中各节的聚合信息,实现稳健的恶意软件检测并缓解对抗环境的影响。实验结果表明,传统的恶意软件检测模型在对抗性威胁下无效,但通过消除易变信息可以大幅缩小攻击表面。因此,我们提出了简单而有效的方法来减轻二进制篡改攻击的影响。总体而言,我们基于图的恶意软件检测方案能够准确检测恶意软件,AUC达88.32%,在组合二进制篡改攻击下仍达88.19%,体现了所提方案的有效性。
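
A minimal sketch of a section-level graph representation for a Windows PE binary, assuming the `pefile` and `networkx` packages. The per-section features (byte entropy, size) and the fully connected edge scheme are illustrative stand-ins for the paper's section-dependent information extraction, not its exact design.

```python
import math
from collections import Counter
import networkx as nx
import pefile

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string (0..8 bits)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def section_graph(path: str) -> nx.Graph:
    pe = pefile.PE(path)
    g = nx.Graph()
    for i, sec in enumerate(pe.sections):
        data = sec.get_data()
        g.add_node(i,
                   name=sec.Name.rstrip(b"\x00").decode(errors="replace"),
                   entropy=byte_entropy(data),
                   size=len(data))
    # connect every pair of sections so a downstream classifier aggregates
    # information across sections rather than trusting any single one
    nodes = list(g.nodes)
    for i in nodes:
        for j in nodes:
            if i < j:
                g.add_edge(i, j)
    return g

# usage: graph = section_graph("sample.exe"); node features then feed a GNN classifier
```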

Mitigating Pilot Contamination and Enabling IoT Scalability in Massive MIMO Systems

  • paper_url: http://arxiv.org/abs/2310.03278
  • repo_url: None
  • paper_authors: Muhammad Kamran Saeed, Ahmed E. Kamal, Ashfaq Khokhar
  • for: 这篇论文关注大规模MIMO系统中的导频污染和物联网(IoT)可扩展性问题。
  • methods: 论文提出了一种基于设备数据传输模式的新导频分配方案,将正交导频序列分配给设备簇而非单个设备;并将导频分配问题建模为图着色问题,利用最大k割(max k-cut)图划分方法来缓解多小区干扰。
  • results: 所提方案显著提升了大规模MIMO系统的频谱效率和可扩展性;例如,使用十个正交导频序列即可容纳200个设备,遗漏率仅为12.5%。
    Abstract Massive MIMO is expected to play an important role in the development of 5G networks. This paper addresses the issue of pilot contamination and scalability in massive MIMO systems. The current practice of reusing orthogonal pilot sequences in adjacent cells leads to difficulty in differentiating incoming inter- and intra-cell pilot sequences. One possible solution is to increase the number of orthogonal pilot sequences, which results in dedicating more space of coherence block to pilot transmission than data transmission. This, in turn, also hinders the scalability of massive MIMO systems, particularly in accommodating a large number of IoT devices within a cell. To overcome these challenges, this paper devises an innovative pilot allocation scheme based on the data transfer patterns of IoT devices. The scheme assigns orthogonal pilot sequences to clusters of devices instead of individual devices, allowing multiple devices to utilize the same pilot for periodically transmitting data. Moreover, we formulate the pilot assignment problem as a graph coloring problem and use the max k-cut graph partitioning approach to overcome the pilot contamination in a multicell massive MIMO system. The proposed scheme significantly improves the spectral efficiency and enables the scalability of massive MIMO systems; for instance, by using ten orthogonal pilot sequences, we are able to accommodate 200 devices with only a 12.5% omission rate.
    摘要 大规模MIMO将在5G网络的发展中扮演重要角色。本文针对大规模MIMO系统中的导频污染与可扩展性问题。目前在相邻小区中复用正交导频序列的做法,使得难以区分来自小区内和小区间的导频。一种可能的解决方案是增加正交导频序列的数量,但这会使相干块中更多的空间被用于导频传输而非数据传输,进而阻碍大规模MIMO系统的可扩展性,尤其是在单个小区内容纳大量物联网设备时。为克服这些挑战,本文设计了一种基于物联网设备数据传输模式的创新导频分配方案,将正交导频序列分配给设备簇而非单个设备,使多个设备可以共用同一导频周期性地传输数据。此外,我们将导频分配问题表述为图着色问题,并采用最大k割图划分方法来克服多小区大规模MIMO系统中的导频污染。所提方案显著提升了频谱效率并实现了系统的可扩展性;例如,使用十个正交导频序列即可容纳200个设备,遗漏率仅为12.5%。
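
A minimal sketch of the pilot-assignment idea: treat device clusters as graph nodes with pairwise interference weights and approximate a max k-cut with a greedy heuristic, so that heavily interfering clusters avoid sharing a pilot. The weight matrix and the greedy rule are illustrative; the paper's exact partitioning procedure may differ.

```python
import numpy as np

def greedy_max_k_cut(weights: np.ndarray, k: int) -> np.ndarray:
    """weights[i, j]: interference (contamination) penalty if clusters i and j
    share a pilot. Returns a pilot index (0..k-1) for each cluster."""
    n = weights.shape[0]
    assign = -np.ones(n, dtype=int)
    # visit clusters in order of total interference, heaviest first
    order = np.argsort(-weights.sum(axis=1))
    for i in order:
        # cost of each pilot = interference with clusters already using it;
        # picking the cheapest pilot greedily maximizes the cut weight
        costs = [weights[i, assign == p].sum() for p in range(k)]
        assign[i] = int(np.argmin(costs))
    return assign

# toy usage: 8 device clusters share 3 orthogonal pilots
rng = np.random.default_rng(0)
W = rng.random((8, 8))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
pilots = greedy_max_k_cut(W, k=3)
```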

Fragment-based Pretraining and Finetuning on Molecular Graphs

  • paper_url: http://arxiv.org/abs/2310.03274
  • repo_url: https://github.com/lvkd84/graphfp
  • paper_authors: Kha-Dinh Luong, Ambuj Singh
  • for: 本研究旨在提高Graph Neural Networks(GNNs)在分子图上的预测性能,通过在分子图上预训练GNNs的方法。
  • methods: 本研究提出了一种基于分子图的fragment预训练方法(GraphFP),通过在分子图上预训练GNNs,并在 fragments上进行预测任务,以提高GNNs的预测性能。
  • results: 研究发现,GraphFP在8个常用分子基准中的5个上提升了性能,并在长距离生物基准上将性能提升至少11.5%。
    Abstract Property prediction on molecular graphs is an important application of Graph Neural Networks. Recently, unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain. In this work, we propose pretraining GNNs at the fragment level, a promising middle ground to overcome the limitations of node-level and graph-level pretraining. Borrowing techniques from recent work on principal subgraph mining, we obtain a compact vocabulary of prevalent fragments from a large pretraining dataset. From the extracted vocabulary, we introduce several fragment-based contrastive and predictive pretraining tasks. The contrastive learning task jointly pretrains two different GNNs: one on molecular graphs and the other on fragment graphs, which represents higher-order connectivity within molecules. By enforcing consistency between the fragment embedding and the aggregated embedding of the corresponding atoms from the molecular graphs, we ensure that the embeddings capture structural information at multiple resolutions. The structural information of fragment graphs is further exploited to extract auxiliary labels for graph-level predictive pretraining. We employ both the pretrained molecular-based and fragment-based GNNs for downstream prediction, thus utilizing the fragment information during finetuning. Our graph fragment-based pretraining (GraphFP) advances the performances on 5 out of 8 common molecular benchmarks and improves the performances on long-range biological benchmarks by at least 11.5%. Code is available at: https://github.com/lvkd84/GraphFP.
    摘要 分子图上的性质预测是图神经网络(GNN)的重要应用之一。近年来,无标注分子数据日益丰富,推动了化学领域中GNN自监督学习的快速发展。本文提出在片段(fragment)层面预训练GNN,这是克服节点级与图级预训练局限性的一个有前景的折中方案。借鉴近期主子图挖掘的技术,我们从大规模预训练数据集中提取出常见片段的紧凑词表,并在此基础上引入多种基于片段的对比式和预测式预训练任务。对比学习任务同时预训练两个GNN:一个作用于分子图,另一个作用于表示分子内高阶连接关系的片段图;通过约束片段嵌入与分子图中对应原子聚合嵌入的一致性,使嵌入能够在多种分辨率上刻画结构信息。片段图的结构信息还被用于提取辅助标签,以进行图级预测式预训练。在下游预测中,我们同时使用预训练好的分子图GNN和片段图GNN,从而在微调阶段利用片段信息。我们的图片段预训练方法(GraphFP)在8个常用分子基准中的5个上提升了性能,并在长距离生物基准上至少提升11.5%。代码见:https://github.com/lvkd84/GraphFP。
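
A minimal sketch of the fragment-level consistency objective: the embedding of each fragment (from the fragment-graph GNN) is contrasted against the mean-pooled embeddings of its constituent atoms (from the molecular-graph GNN) with an InfoNCE-style loss. The GNN encoders are replaced by random tensors here; only the loss is shown.

```python
import torch
import torch.nn.functional as F

def fragment_contrastive_loss(frag_emb, atom_emb, frag_assign, temperature=0.2):
    """frag_emb: (F, d) fragment embeddings; atom_emb: (A, d) atom embeddings;
    frag_assign: (A,) index of the fragment each atom belongs to."""
    num_frags, d = frag_emb.shape
    # mean-pool the atoms belonging to each fragment
    pooled = torch.zeros(num_frags, d).index_add_(0, frag_assign, atom_emb)
    counts = torch.zeros(num_frags).index_add_(0, frag_assign, torch.ones(len(atom_emb)))
    pooled = pooled / counts.clamp(min=1).unsqueeze(-1)
    # InfoNCE: each fragment should match its own pooled atoms, not the others'
    logits = F.normalize(frag_emb, dim=-1) @ F.normalize(pooled, dim=-1).T / temperature
    targets = torch.arange(num_frags)
    return F.cross_entropy(logits, targets)

# toy usage: 4 fragments covering 10 atoms, 32-dim embeddings
frag_emb = torch.randn(4, 32, requires_grad=True)
atom_emb = torch.randn(10, 32)
assign = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
loss = fragment_contrastive_loss(frag_emb, atom_emb, assign)
```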

UniPredict: Large Language Models are Universal Tabular Predictors

  • paper_url: http://arxiv.org/abs/2310.03266
  • repo_url: None
  • paper_authors: Ruiyu Wang, Zifeng Wang, Jimeng Sun
  • for: 这篇论文旨在开发一种通用的表格数据预测系统,可以在不同的预测任务下进行快速适应和高效的预测。
  • methods: 这篇论文提出了一种基于生成模型的方法,即UniPredict,通过训练一个大型语言模型(LLM)来建立通用的表格数据预测模型,可以理解多种表格输入和预测目标变量。
  • results: 实验表明,UniPredict 在169个具有不同目标列的表格数据集上,相比最佳树提升(tree-boosting)基线和最佳神经网络基线分别取得了5.4%到13.4%的优势。此外,UniPredict 在另外62个表格数据集上的少样本学习中同样表现出色,能够快速适应新任务,在低资源设置下比 XGBoost 高出100%以上,并显著优于所有基线。
    Abstract Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and operate under the assumption of a fixed target column, necessitating re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper exploits the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. Here, we show that scaling up an LLM to extensive tabular datasets with the capability of comprehending diverse tabular inputs and predicting for target variables following the input instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe this versatile UniPredict model demonstrates an advantage over other models, ranging from 5.4% to 13.4%, when compared with the best tree-boosting baseline and the best neural network baseline, respectively. We further test UniPredict in few-shot learning settings on another 62 tabular datasets. Our method achieves strong performance in quickly adapting to new tasks, where our method outperforms XGBoost over 100% on the low-resource setup and shows a significant margin over all baselines. We envision that UniPredict sheds light on developing a universal tabular data prediction system that learns from data at scale and serves a wide range of prediction tasks.
    摘要 表格数据预测是许多应用中的基本机器学习任务。现有方法主要采用判别式建模,并假设目标列固定,因此每个新的预测任务都需要重新训练。受大型语言模型(LLMs)生成能力的启发,本文提出了基于生成式建模的通用表格数据预测器 UniPredict。我们在169个具有不同目标的表格数据集上训练了单一的 LLM,使其能够理解多样的表格输入并按照输入指令预测目标变量,并与在每个数据集上单独训练的基线进行比较。结果表明,这个通用的 UniPredict 模型相比最佳树提升基线和最佳神经网络基线分别具有5.4%到13.4%的优势。我们还在另外62个表格数据集上测试了 UniPredict 的少样本学习能力:该方法能够快速适应新任务,在低资源设置下比 XGBoost 高出100%以上,并显著优于所有基线。我们希望 UniPredict 能为构建一个从大规模数据中学习、服务于广泛预测任务的通用表格数据预测系统提供启示。
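
A minimal sketch of how a tabular example might be serialized into an instruction-style prompt for a generative predictor; the template wording and function names are assumptions, not the paper's actual prompt format.

```python
from typing import Dict, Optional, Sequence

def serialize_example(features: Dict[str, object], target_name: str,
                      target_description: str,
                      classes: Optional[Sequence[str]] = None) -> str:
    """Turn one table row plus target metadata into a text prompt."""
    cells = "; ".join(f"{k} is {v}" for k, v in features.items())
    task = f"Predict '{target_name}': {target_description}"
    if classes:
        task += f" Choose one of: {', '.join(map(str, classes))}."
    return f"{task}\nRow: {cells}\nAnswer:"

prompt = serialize_example(
    {"age": 47, "bmi": 28.1, "smoker": "no"},
    target_name="diabetes",
    target_description="whether the patient develops diabetes within 5 years.",
    classes=["yes", "no"],
)
# The prompt is then fed to the fine-tuned LLM, whose generated token(s) are
# parsed back into a class label or numeric value.
```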

Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets

  • paper_url: http://arxiv.org/abs/2310.03258
  • repo_url: None
  • paper_authors: Song Wei, Xiangrui Kong, Sarah A Huestis-Mitchell, Shixiang Zhu, Yao Xie, Alinson Santos Xavier, Feng Qiu
  • for: The paper is written to address the challenges of identifying systematic biases in the energy sector, particularly in low-income and elderly-populated areas, using a novel approach for counterfactual causal analysis centered on energy justice.
  • methods: The paper uses subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup.
  • results: The paper finds that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions, highlighting existing biases in the power system and the need for focused improvements in areas with economic challenges.
    Abstract Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in treatment effects, and limited data availability. To address these challenges, we introduce a novel approach for counterfactual causal analysis centered on energy justice. We use subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup. In our numerical analysis, we apply our method to a large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions. This points to existing biases in the power system and highlights the need for focused improvements in areas with economic challenges.
    摘要 能源正义是跨学科能源研究中快速发展的方向。然而,由于混杂变量、处理效应的复杂异质性以及数据可得性有限,识别能源领域的系统性偏差仍然具有挑战性。为解决这些挑战,我们提出了一种以能源正义为核心的反事实因果分析新方法。我们使用子群体分析来处理多种因素,并借助迁移学习的思想缓解各子群体中的数据稀缺问题。在数值分析中,我们将该方法应用于大规模的客户级停电数据集,研究收入和人口年龄等人口统计因素对停电时长的反事实效应。结果表明,无论天气条件如何,低收入和老年人口聚居地区总是经历更长的停电时间。这揭示了电力系统中存在的偏差,并凸显了在经济困难地区进行针对性改进的必要性。
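
A minimal sketch of subgroup analysis with shrinkage toward a pooled estimate, illustrating how information can be transferred into data-scarce subgroups; it is a stand-in for the paper's transfer counterfactual learning procedure, not a reproduction of it, and all data below are synthetic.

```python
import numpy as np

def subgroup_effects(durations, treated, groups, shrink=20.0):
    """durations: outage durations; treated: 1 if the area is low-income/elderly;
    groups: subgroup id (e.g. weather regime) per observation."""
    pooled = durations[treated == 1].mean() - durations[treated == 0].mean()
    effects = {}
    for g in np.unique(groups):
        m = groups == g
        n_g = m.sum()
        raw = durations[m & (treated == 1)].mean() - durations[m & (treated == 0)].mean()
        w = n_g / (n_g + shrink)           # small subgroups lean on the pooled estimate
        effects[g] = w * raw + (1 - w) * pooled
    return effects

# toy data: three weather regimes, treated areas have longer outages on average
rng = np.random.default_rng(1)
n = 500
treated = rng.integers(0, 2, n)
groups = rng.integers(0, 3, n)
durations = 2.0 + 0.8 * treated + rng.exponential(1.0, n)
print(subgroup_effects(durations, treated, groups))
```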

Molecule Design by Latent Prompt Transformer

  • paper_url: http://arxiv.org/abs/2310.03253
  • repo_url: None
  • paper_authors: Deqian Kong, Yuhao Huang, Jianwen Xie, Ying Nian Wu
  • for: 本文提出了一种潜在提示(latent prompt)Transformer模型,用于解决分子设计等困难优化问题,目标是找到在某一可由现有软件计算的化学或生物性质上取得最优值的分子。
  • methods: 该模型由三部分组成:(1)一个潜在向量,其先验分布由对高斯白噪声向量施加U-Net变换来建模;(2)一个分子生成模型,以(1)中的潜在向量为条件生成分子的字符串表示,采用以该潜在向量作为提示的因果(causal)Transformer;(3)一个性质预测模型,基于(1)中潜在向量的非线性回归来预测分子的目标性质值。
  • results: 实验表明,所提模型在多个基准分子设计任务上达到了最先进的性能。
    Abstract This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by an existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state of the art performances on several benchmark molecule design tasks.
    摘要 The proposed latent prompt Transformer model consists of three components: (1) a latent vector whose prior distribution is modeled by a U-Net transformation of a Gaussian white noise vector; (2) a molecule generation model that generates the string-based representation of a molecule conditional on the latent vector, using a causal Transformer that takes the latent vector as a prompt; and (3) a property prediction model that predicts the value of the target property via non-linear regression on the latent vector. After initial training on existing molecules and their property values, the model distribution is gradually shifted toward the region that supports desired values of the target property, achieving state-of-the-art performance on several benchmark molecule design tasks.
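
A minimal sketch of the three components with toy dimensions; the prior transform is a small MLP standing in for the paper's U-Net, and the generator is a causal Transformer that receives the latent vector as a prompt token. All sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class LatentPromptTransformer(nn.Module):
    def __init__(self, vocab=64, d=128, latent_dim=16, max_len=32):
        super().__init__()
        # (1) latent prompt produced from white noise (MLP stand-in for the U-Net prior)
        self.prior = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, d))
        # (2) causal Transformer generator over string tokens, prompted by the latent
        self.embed = nn.Embedding(vocab, d)
        self.pos = nn.Parameter(torch.zeros(1, max_len + 1, d))
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d, vocab)
        # (3) non-linear regression from the latent to the target property
        self.prop_head = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, noise, tokens):
        z = self.prior(noise)                                  # (B, d) latent prompt
        x = torch.cat([z.unsqueeze(1), self.embed(tokens)], dim=1)
        x = x + self.pos[:, : x.size(1)]
        L = x.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.decoder(x, mask=causal)
        return self.lm_head(h[:, :-1]), self.prop_head(z)      # next-token logits, property

model = LatentPromptTransformer()
logits, prop = model(torch.randn(2, 16), torch.randint(0, 64, (2, 12)))
```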

Relational Convolutional Networks: A framework for learning representations of hierarchical relations

  • paper_url: http://arxiv.org/abs/2310.03240
  • repo_url: https://github.com/awni00/relational-neural-networks
  • paper_authors: Awni Altabaa, John Lafferty
  • for: 本研究探讨了深度学习中关系特征的显式表示方法的发展。
  • methods: 本文提出了一种称为"关系卷积网络"的架构,用于学习层次化的关系表示。给定一个对象序列,"多维内积关系"模块生成一个描述所有成对关系的关系张量;随后,"关系卷积"层将该关系张量转换为新的对象序列,其中每个新对象刻画上一层中某组对象内部的关系。与卷积神经网络中的滤波器类似,graphlet滤波器表示一组关系模板,在每个分组处与关系张量进行比较。重复这一过程即可获得更高阶、层次化的关系表示。
  • results: 我们提供了 architecture 的背景和细节,以及一些实验来证明 relational convolutional networks 可以有效地处理具有层次结构的关系任务。
    Abstract A maturing area of research in deep learning is the development of architectures that can learn explicit representations of relational features. In this paper, we focus on the problem of learning representations of hierarchical relations, proposing an architectural framework we call "relational convolutional networks". Given a sequence of objects, a "multi-dimensional inner product relation" module produces a relation tensor describing all pairwise relations. A "relational convolution" layer then transforms the relation tensor into a sequence of new objects, each describing the relations within some group of objects at the previous layer. Graphlet filters, analogous to filters in convolutional neural networks, represent a template of relations against which the relation tensor is compared at each grouping. Repeating this yields representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.
    摘要 深度学习中一个日趋成熟的研究方向是设计能够学习显式关系特征表示的架构。本文关注层次关系的表示学习问题,提出了"关系卷积网络"这一架构框架。给定一个对象序列,"多维内积关系"模块生成一个描述所有成对关系的关系张量;"关系卷积"层再将该关系张量转换为新的对象序列,其中每个对象刻画上一层中某组对象内部的关系。graphlet滤波器与卷积神经网络中的滤波器类似,表示在每个分组处与关系张量进行比较的关系模板。重复这一过程可以得到更高阶、层次化的关系表示。我们介绍了该架构的动机与细节,并通过一系列实验展示关系卷积网络能够为具有层次结构的关系任务提供有效的建模框架。
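
A minimal sketch of the two core operations: a multi-dimensional inner-product relation module that builds a relation tensor, and a relational convolution that scores groups of objects against learned graphlet filters. Dimensions and the fixed grouping scheme are illustrative.

```python
import torch
import torch.nn as nn

class RelationTensor(nn.Module):
    """Pairwise relations r[i, j, k] = <W_k x_i, V_k x_j> over k relation channels."""
    def __init__(self, dim, channels):
        super().__init__()
        self.W = nn.Parameter(torch.randn(channels, dim, dim) / dim**0.5)
        self.V = nn.Parameter(torch.randn(channels, dim, dim) / dim**0.5)

    def forward(self, x):                                    # x: (n, dim)
        left = torch.einsum("kde,ne->knd", self.W, x)
        right = torch.einsum("kde,ne->knd", self.V, x)
        return torch.einsum("knd,kmd->nmk", left, right)     # (n, n, channels)

class RelationalConv(nn.Module):
    """Scores each group of objects against graphlet filters (relation templates)."""
    def __init__(self, group_size, channels, n_filters):
        super().__init__()
        self.filters = nn.Parameter(torch.randn(n_filters, group_size, group_size, channels))

    def forward(self, rel, groups):                          # rel: (n, n, k)
        outs = []
        for g in groups:
            idx = torch.tensor(g)
            patch = rel[idx][:, idx]                         # relations within the group
            outs.append((self.filters * patch).sum(dim=(1, 2, 3)))  # one score per filter
        return torch.stack(outs)                             # (num_groups, n_filters)

# toy usage: 6 objects, two groups of three, 8 graphlet filters
x = torch.randn(6, 10)
rel = RelationTensor(10, channels=4)(x)
new_objects = RelationalConv(group_size=3, channels=4, n_filters=8)(rel, [(0, 1, 2), (3, 4, 5)])
```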

Observatory: Characterizing Embeddings of Relational Tables

  • paper_url: http://arxiv.org/abs/2310.07736
  • repo_url: https://github.com/superctj/observatory
  • paper_authors: Tianji Cong, Madelon Hulsebos, Zhenjie Sun, Paul Groth, H. V. Jagadish
  • for: 这篇论文是为了提供一种系统性地分析关系表嵌入表示的形式框架,以便更好地理解关系表嵌入模型的特点和局限性,从而更好地选择合适的模型进行下游任务。
  • methods: 这篇论文使用了八种基本属性和相应的度量来系统地分析关系表嵌入表示,这些属性包括关系数据模型的不变量和数据分布的统计因素。同时,这篇论文还提出了一种扩展性的评价框架,以便评估语言和表嵌入模型。
  • results: 分析结果显示,一些模型对列顺序等表格结构较为敏感,函数依赖关系很少体现在嵌入中,而专门的表嵌入模型的样本保真度相对较低。这些发现可以帮助研究者和实践者更好地预判模型行为、为下游任务选择合适的模型,并为新模型的开发提供指导。
    Abstract Language models and specialized table embedding models have recently demonstrated strong performance on many tasks over tabular data. Researchers and practitioners are keen to leverage these models in many new application contexts; but limited understanding of the strengths and weaknesses of these models, and the table representations they generate, makes the process of finding a suitable model for a given task reliant on trial and error. There is an urgent need to gain a comprehensive understanding of these models to minimize inefficiency and failures in downstream usage. To address this need, we propose Observatory, a formal framework to systematically analyze embedding representations of relational tables. Motivated both by invariants of the relational data model and by statistical considerations regarding data distributions, we define eight primitive properties, and corresponding measures to quantitatively characterize table embeddings for these properties. Based on these properties, we define an extensible framework to evaluate language and table embedding models. We collect and synthesize a suite of datasets and use Observatory to analyze seven such models. Our analysis provides insights into the strengths and weaknesses of learned representations over tables. We find, for example, that some models are sensitive to table structure such as column order, that functional dependencies are rarely reflected in embeddings, and that specialized table embedding models have relatively lower sample fidelity. Such insights help researchers and practitioners better anticipate model behaviors and select appropriate models for their downstream tasks, while guiding researchers in the development of new models.
    摘要 语言模型和专门的表嵌入模型近来在许多表格数据任务上表现出色,研究者和实践者希望在更多新的应用场景中利用这些模型。然而,对这些模型及其生成的表格表示的优缺点缺乏了解,使得为给定任务挑选合适模型只能依赖反复试错。为尽量减少下游使用中的低效与失败,亟需对这些模型建立全面的认识。为此,我们提出了 Observatory,一个系统分析关系表嵌入表示的形式化框架。基于关系数据模型的不变量和数据分布的统计考虑,我们定义了八个基本属性及相应的度量,用以定量刻画表嵌入在这些属性上的表现;并在此基础上构建了一个可扩展的框架来评估语言模型和表嵌入模型。我们收集并合成了一组数据集,使用 Observatory 分析了七种模型。分析结果揭示了表格表示学习的优势与不足,例如某些模型对列顺序等表格结构敏感、函数依赖关系很少体现在嵌入中、专门的表嵌入模型样本保真度相对较低等。这些洞见有助于研究者和实践者更好地预判模型行为、为下游任务选择合适的模型,并指导新模型的开发。
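
A minimal sketch of one Observatory-style property measure: sensitivity of column embeddings to column order, computed as the cosine similarity between embeddings of a table and its column-permuted copies. The embedding function is a stub to be replaced by a real table embedding model.

```python
import numpy as np

def embed_columns(table, rng):
    """Stub: returns one embedding per column. Replace with a real model call."""
    return {col: rng.normal(size=64) for col in table}

def column_order_sensitivity(table, n_perm=10, seed=0):
    rng = np.random.default_rng(seed)
    base = embed_columns(table, rng)
    cols = list(table)
    sims = []
    for _ in range(n_perm):
        perm = [cols[i] for i in rng.permutation(len(cols))]
        shuffled = {c: table[c] for c in perm}          # same table, new column order
        emb = embed_columns(shuffled, rng)
        for c in cols:
            a, b = base[c], emb[c]
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean(sims))   # 1.0 means perfectly column-order invariant

table = {"city": ["Ann Arbor", "Delft"], "population": [123851, 106792]}
print(column_order_sensitivity(table))   # near 0 with the random stub above
```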

History Matching for Geological Carbon Storage using Data-Space Inversion with Spatio-Temporal Data Parameterization

  • paper_url: http://arxiv.org/abs/2310.03228
  • repo_url: None
  • paper_authors: Su Jiang, Louis J. Durlofsky
  • for: 本文关注如何利用监测数据进行历史匹配以降低不确定性,从而改进工业规模碳捕集与封存作业中含水层的管理。
  • methods: 论文采用数据空间反演(DSI)技术,直接从观测数据推断历史匹配后的目标量(如条件于观测的后验压力场与CO2饱和度场),而无需构建后验地质模型;并引入基于深度学习的参数化方法来表示时空压力场与饱和度场。
  • results: 研究表明,这种新的深度学习参数化方法能显著降低后验压力场和饱和度场的不确定性,并高效地给出后验预测;该方法适用于多种不同的地质情景,并能高效处理大规模数据。
    Abstract History matching based on monitoring data will enable uncertainty reduction, and thus improved aquifer management, in industrial-scale carbon storage operations. In traditional model-based data assimilation, geomodel parameters are modified to force agreement between flow simulation results and observations. In data-space inversion (DSI), history-matched quantities of interest, e.g., posterior pressure and saturation fields conditioned to observations, are inferred directly, without constructing posterior geomodels. This is accomplished efficiently using a set of O(1000) prior simulation results, data parameterization, and posterior sampling within a Bayesian setting. In this study, we develop and implement (in DSI) a deep-learning-based parameterization to represent spatio-temporal pressure and CO2 saturation fields at a set of time steps. The new parameterization uses an adversarial autoencoder (AAE) for dimension reduction and a convolutional long short-term memory (convLSTM) network to represent the spatial distribution and temporal evolution of the pressure and saturation fields. This parameterization is used with an ensemble smoother with multiple data assimilation (ESMDA) in the DSI framework to enable posterior predictions. A realistic 3D system characterized by prior geological realizations drawn from a range of geological scenarios is considered. A local grid refinement procedure is introduced to estimate the error covariance term that appears in the history matching formulation. Extensive history matching results are presented for various quantities, for multiple synthetic true models. Substantial uncertainty reduction in posterior pressure and saturation fields is achieved in all cases. The framework is applied to efficiently provide posterior predictions for a range of error covariance specifications. Such an assessment would be expensive using a model-based approach.
    摘要 基于监测数据的历史匹配能够降低不确定性,从而改进工业规模碳封存作业中含水层的管理。在传统的基于模型的数据同化中,需要修改地质模型参数以使流动模拟结果与观测一致;而在数据空间反演(DSI)中,直接推断条件于观测的目标量(例如后验压力场和饱和度场),无需构建后验地质模型。这通过约1000个先验模拟结果、数据参数化以及贝叶斯框架下的后验采样高效实现。本文在DSI中开发并实现了一种基于深度学习的参数化方法,用于表示一组时间步上的时空压力场与CO2饱和度场:利用对抗自编码器(AAE)进行降维,并用卷积长短期记忆网络(convLSTM)刻画压力与饱和度场的空间分布及其时间演化。该参数化与多次数据同化的集合平滑器(ESMDA)结合,在DSI框架中实现后验预测。我们考虑了一个由多种地质情景的先验地质实现所刻画的真实三维系统,并引入局部网格加密过程来估计历史匹配公式中的误差协方差项。针对多个合成真实模型和多种目标量给出了大量历史匹配结果,在所有情形下后验压力场和饱和度场的不确定性都显著降低。该框架还能高效地针对一系列误差协方差设定给出后验预测,而用基于模型的方法完成这样的评估代价高昂。
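
A minimal sketch of a single ESMDA update as used inside a data-space inversion loop: the ensemble of parameterized (latent) variables is shifted toward the observations using the ensemble cross-covariance and an inflated error covariance. Dimensions and the diagonal data-error covariance are illustrative.

```python
import numpy as np

def esmda_update(m_ens, d_ens, d_obs, cd_diag, alpha, rng):
    """m_ens: (n_ens, n_param) latent/DSI variables; d_ens: (n_ens, n_data)
    simulated data; d_obs: (n_data,) observations; cd_diag: data-error variances;
    alpha: covariance inflation factor for this assimilation step."""
    n_ens = m_ens.shape[0]
    dm = m_ens - m_ens.mean(axis=0)
    dd = d_ens - d_ens.mean(axis=0)
    c_md = dm.T @ dd / (n_ens - 1)                        # cross-covariance
    c_dd = dd.T @ dd / (n_ens - 1)                        # data auto-covariance
    kalman = c_md @ np.linalg.inv(c_dd + alpha * np.diag(cd_diag))
    # perturb observations with inflated noise, one realization per ensemble member
    noise = rng.normal(scale=np.sqrt(alpha * cd_diag), size=d_ens.shape)
    return m_ens + (d_obs + noise - d_ens) @ kalman.T

# toy usage: 200 prior realizations of 50 DSI variables and 10 observed data points
rng = np.random.default_rng(0)
m = rng.normal(size=(200, 50))
d = m[:, :10] + 0.1 * rng.normal(size=(200, 10))          # toy forward "simulation"
m_post = esmda_update(m, d, d_obs=np.zeros(10),
                      cd_diag=0.01 * np.ones(10), alpha=4.0, rng=rng)
```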

TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design

  • paper_url: http://arxiv.org/abs/2310.03223
  • repo_url: None
  • paper_authors: Tony Shen, Mohit Pandey, Martin Ester
  • for: 本研究旨在自动生成与特定蛋白质口袋(pocket)目标相匹配的类药分子。现有方法通常是在有限数据集上近似蛋白质-分子分布,因此很难生成绑定亲和力显著优于训练数据的分子。本研究将口袋条件下的分子生成任务表述为强化学习问题,并提出了目标条件生成流网络模型 TacoGFN。
  • methods: 我们开发了基于Transformer的对接分数预测器,以加速对接分数的计算;并进行多轮主动学习,利用对接 oracle 对生成样本进行标注,从而改进对接分数预测。该方法能够在可承受的计算开销内高效地探索分子空间。
  • results: 与基线方法相比,使用 TacoGFN 及其变体生成的分子在所有性质指标(对接分数、QED、SA、Lipinski)上都显著更优,且生成速度快出若干数量级。
    Abstract We seek to automate the generation of drug-like compounds conditioned to specific protein pocket targets. Most current methods approximate the protein-molecule distribution of a finite dataset and, therefore struggle to generate molecules with significant binding improvement over the training dataset. We instead frame the pocket-conditioned molecular generation task as an RL problem and develop TacoGFN, a target conditional Generative Flow Network model. Our method is explicitly encouraged to generate molecules with desired properties as opposed to fitting on a pre-existing data distribution. To this end, we develop transformer-based docking score prediction to speed up docking score computation and propose TacoGFN to explore molecule space efficiently. Furthermore, we incorporate several rounds of active learning where generated samples are queried using a docking oracle to improve the docking score prediction. This approach allows us to accurately explore as much of the molecule landscape as we can afford computationally. Empirically, molecules generated using TacoGFN and its variants significantly outperform all baseline methods across every property (Docking score, QED, SA, Lipinski), while being orders of magnitude faster.
    摘要 我们致力于自动生成与特定蛋白质口袋目标相匹配的类药分子。现有方法大多在有限数据集上近似蛋白质-分子分布,因此难以生成绑定效果显著优于训练数据的分子。我们将口袋条件下的分子生成任务表述为强化学习问题,并提出了目标条件生成流网络模型 TacoGFN。我们的方法被显式地引导去生成具有期望性质的分子,而不是拟合已有的数据分布。为此,我们开发了基于Transformer的对接分数预测器以加速对接分数计算,并利用 TacoGFN 高效探索分子空间。此外,我们进行多轮主动学习,用对接 oracle 对生成样本进行查询,以改进对接分数预测。这使我们能够在可承受的计算预算内尽可能准确地探索分子空间。实验表明,使用 TacoGFN 及其变体生成的分子在所有性质指标(对接分数、QED、SA、Lipinski)上都显著优于所有基线方法,同时速度快出若干数量级。
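
A minimal sketch of the active-learning loop around a fast docking-score proxy: candidates sampled from the generator are ranked by the proxy, a small batch is labeled by the expensive docking oracle, and the proxy is refit on the growing labeled set. The generator, proxy, and oracle below are toy stubs.

```python
import random

def active_learning_rounds(generate, proxy_fit, proxy_predict, dock_oracle,
                           n_rounds=3, batch=256, n_query=16):
    labelled = []                                      # (molecule, true docking score)
    for _ in range(n_rounds):
        candidates = generate(batch)                   # sample from the generative policy
        scores = proxy_predict(candidates)
        # query the oracle on the molecules the proxy believes bind best (lowest score)
        ranked = sorted(zip(scores, candidates))[:n_query]
        labelled += [(mol, dock_oracle(mol)) for _, mol in ranked]
        proxy_fit(labelled)                            # refit / finetune the proxy
    return labelled

# toy stubs: "molecules" are integer ids, the oracle returns a hidden true score
truth = {i: random.gauss(-7, 2) for i in range(10_000)}
labelled = active_learning_rounds(
    generate=lambda n: random.sample(list(truth), n),
    proxy_fit=lambda data: None,                               # no-op stub
    proxy_predict=lambda mols: [truth[m] + random.gauss(0, 1) for m in mols],
    dock_oracle=lambda m: truth[m],
)
```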

Formal and Practical Elements for the Certification of Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2310.03217
  • repo_url: None
  • paper_authors: Jean-Guillaume Durand, Arthur Dubois, Robert J. Moss
  • for: 这篇论文探讨如何在自主飞行中使用机器学习模型,并为其安全性和可靠性提供认证依据。
  • methods: 这篇论文使用了一种基于统计学的验证器,以确保机器学习模型的正确性和可靠性。这种验证器是模型无关的,工具无关的,可以适用于多种应用场景。
  • results: 这篇论文通过基于视觉的着陆(vision-based landing)这一应用展示了其认证框架的有效性。
    Abstract Over the past decade, machine learning has demonstrated impressive results, often surpassing human capabilities in sensing tasks relevant to autonomous flight. Unlike traditional aerospace software, the parameters of machine learning models are not hand-coded nor derived from physics but learned from data. They are automatically adjusted during a training phase, and their values do not usually correspond to physical requirements. As a result, requirements cannot be directly traced to lines of code, hindering the current bottom-up aerospace certification paradigm. This paper attempts to address this gap by 1) demystifying the inner workings and processes to build machine learning models, 2) formally establishing theoretical guarantees given by those processes, and 3) complementing these formal elements with practical considerations to develop a complete certification argument for safety-critical machine learning systems. Based on a scalable statistical verifier, our proposed framework is model-agnostic and tool-independent, making it adaptable to many use cases in the industry. We demonstrate results on a widespread application in autonomous flight: vision-based landing.
    摘要 过去十年间,机器学习展现出了令人瞩目的成果,在与自主飞行相关的感知任务上常常超越人类能力。与传统航空航天软件不同,机器学习模型的参数既不是手工编写的,也不是由物理规律推导而来,而是从数据中学习得到:它们在训练阶段被自动调整,其取值通常并不对应具体的物理需求。因此,需求无法直接追溯到代码行,这妨碍了当前自底向上的航空认证范式。本文试图弥合这一差距:1)阐明构建机器学习模型的内部机制与流程;2)正式建立这些流程所能提供的理论保证;3)结合实际考虑,为安全关键的机器学习系统构建完整的认证论证。基于一个可扩展的统计验证器,我们提出的框架与具体模型和工具无关,可适用于业界多种场景。我们在自主飞行中的一个广泛应用(基于视觉的着陆)上展示了结果。
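
A minimal sketch of a model-agnostic statistical verifier: sample n scenarios, count requirement violations, and report a one-sided Clopper-Pearson upper confidence bound on the violation probability. The scenario sampler and requirement check are toy stand-ins for a vision-based landing pipeline.

```python
import random
from scipy.stats import beta

def verify(run_scenario, requirement_ok, n=10_000, alpha=1e-3, seed=0):
    """Returns the failure count and a (1 - alpha) upper bound on P(violation)."""
    random.seed(seed)
    failures = sum(0 if requirement_ok(run_scenario()) else 1 for _ in range(n))
    upper = 1.0 if failures == n else beta.ppf(1 - alpha, failures + 1, n - failures)
    return failures, upper

# toy example: a landing-pose estimator whose lateral error must stay below 1 meter;
# with this noise model the true violation rate is roughly 0.4%
fails, p_upper = verify(
    run_scenario=lambda: abs(random.gauss(0.0, 0.35)),
    requirement_ok=lambda err: err < 1.0,
)
print(f"{fails} failures; P(violation) <= {p_upper:.4f} with 99.9% confidence")
```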