cs.LG - 2023-09-02

Efficient Covariance Matrix Reconstruction with Iterative Spatial Spectrum Sampling

  • paper_url: http://arxiv.org/abs/2309.01040
  • repo_url: None
  • paper_authors: S. Mohammadzadeh, V. H. Nascimento, R. C. de Lamare, O. Kukrer
  • for: This work proposes a robust and cost-effective design method for adaptive beamforming algorithms, aimed at suppressing interference arriving close to the direction of the signal of interest.
  • methods: The method reconstructs the interference-plus-noise covariance (INC) matrix via covariance matrix reconstruction with an iterative spatial power spectrum (CMR-ISPS), using a simplified maximum entropy power spectral density function to shape the beamformer's directional response.
  • results: Simulations show that the proposed CMR-ISPS beamformer suppresses interferers close to the direction of the SOI by placing sufficiently deep notches in the array's directional response and maintains robust performance across different interference levels.
    Abstract This work presents a cost-effective technique for designing robust adaptive beamforming algorithms based on efficient covariance matrix reconstruction with iterative spatial power spectrum (CMR-ISPS). The proposed CMR-ISPS approach reconstructs the interference-plus-noise covariance (INC) matrix based on a simplified maximum entropy power spectral density function that can be used to shape the directional response of the beamformer. Firstly, we estimate the directions of arrival (DoAs) of the interfering sources with the available snapshots. We then develop an algorithm to reconstruct the INC matrix using a weighted sum of outer products of steering vectors whose coefficients can be estimated in the vicinity of the DoAs of the interferences which lie in a small angular sector. We also devise a cost-effective adaptive algorithm based on conjugate gradient techniques to update the beamforming weights and a method to obtain estimates of the signal of interest (SOI) steering vector from the spatial power spectrum. The proposed CMR-ISPS beamformer can suppress interferers close to the direction of the SOI by producing notches in the directional response of the array with sufficient depths. Simulation results are provided to confirm the validity of the proposed method and make a comparison to existing approaches
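    Code sketch: a minimal NumPy illustration of the INC reconstruction step, in which the interference-plus-noise covariance is rebuilt as a weighted sum of outer products of steering vectors sampled in a small angular sector around each estimated interference DoA. The uniform linear array, the sector width, and the uniform in-sector weights are assumptions made for the sketch; the paper derives the coefficients from its maximum-entropy spatial power spectrum and updates the beamforming weights with a conjugate gradient algorithm.

```python
import numpy as np

def steering_vector(theta_deg, n_sensors, spacing=0.5):
    """Steering vector of a uniform linear array with half-wavelength spacing (assumed)."""
    theta = np.deg2rad(theta_deg)
    return np.exp(-2j * np.pi * spacing * np.arange(n_sensors) * np.sin(theta))

def reconstruct_inc(doas_deg, powers, n_sensors, noise_var=1.0,
                    sector_width=4.0, n_grid=9):
    """Rebuild the interference-plus-noise covariance (INC) matrix as a weighted
    sum of outer products of steering vectors sampled around each interference DoA."""
    inc = noise_var * np.eye(n_sensors, dtype=complex)
    for doa, power in zip(doas_deg, powers):
        for angle in np.linspace(doa - sector_width / 2, doa + sector_width / 2, n_grid):
            a = steering_vector(angle, n_sensors)
            # Uniform in-sector weights here; the paper estimates these coefficients
            # from its simplified maximum-entropy spatial power spectrum instead.
            inc += (power / n_grid) * np.outer(a, a.conj())
    return inc

# Example: two interferers near -30 and 20 degrees, 10-sensor array (values assumed).
R_in = reconstruct_inc(doas_deg=[-30.0, 20.0], powers=[10.0, 5.0], n_sensors=10)
a_soi = steering_vector(5.0, 10)              # assumed SOI steering vector
w = np.linalg.solve(R_in, a_soi)              # MVDR-style weights from the reconstructed INC
w /= a_soi.conj() @ w
print("output interference-plus-noise power:", float(np.real(w.conj() @ R_in @ w)))
```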

Online Adaptive Mahalanobis Distance Estimation

  • paper_url: http://arxiv.org/abs/2309.01030
  • repo_url: None
  • paper_authors: Lianke Qin, Aravind Reddy, Zhao Song
  • For: This paper studies dimension reduction for Mahalanobis metrics and provides efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances.
  • Methods: The paper builds a randomized Monte Carlo data structure and adapts it to handle sequences of adaptive queries and online updates to both the Mahalanobis metric matrix and the data points.
  • Results: The paper provides efficient data structures for solving the ADE problem for Mahalanobis distances, which can be used in conjunction with prior algorithms for online learning of Mahalanobis metrics.
    Abstract Mahalanobis metrics are widely used in machine learning in conjunction with methods like $k$-nearest neighbors, $k$-means clustering, and $k$-medians clustering. Despite their importance, there has not been any prior work on applying sketching techniques to speed up algorithms for Mahalanobis metrics. In this paper, we initiate the study of dimension reduction for Mahalanobis metrics. In particular, we provide efficient data structures for solving the Approximate Distance Estimation (ADE) problem for Mahalanobis distances. We first provide a randomized Monte Carlo data structure. Then, we show how we can adapt it to provide our main data structure which can handle sequences of \textit{adaptive} queries and also online updates to both the Mahalanobis metric matrix and the data points, making it amenable to be used in conjunction with prior algorithms for online learning of Mahalanobis metrics.
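    Code sketch: a rough illustration of why sketching helps for Mahalanobis distances (not the paper's actual data structure): a Mahalanobis distance under A = L^T L is a Euclidean distance after mapping points through L, so a Johnson-Lindenstrauss-style random projection of the mapped points yields fast Monte Carlo distance estimates. The metric, sketch dimension, and data below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n, k = 64, 1000, 16                 # ambient dim, number of points, sketch dim (assumed)
X = rng.normal(size=(n, d))

# A Mahalanobis metric A = L^T L; L is a random factor chosen only for illustration.
L = rng.normal(size=(d, d)) / np.sqrt(d)
A = L.T @ L

# Monte Carlo sketch: project the L-mapped points with a Johnson-Lindenstrauss matrix.
S = rng.normal(size=(k, d)) / np.sqrt(k)
sketch = X @ L.T @ S.T                 # n x k, computed once and reusable for all queries

def mahalanobis_exact(x, y):
    diff = x - y
    return np.sqrt(diff @ A @ diff)

def mahalanobis_sketched(i, q_sketch):
    # ||S L (x_i - q)|| approximates ||L (x_i - q)||, the exact Mahalanobis distance
    return np.linalg.norm(sketch[i] - q_sketch)

q = rng.normal(size=d)
q_sketch = S @ (L @ q)
print("exact   :", mahalanobis_exact(X[7], q))
print("sketched:", mahalanobis_sketched(7, q_sketch))
```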

On the training and generalization of deep operator networks

  • paper_url: http://arxiv.org/abs/2309.01020
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Sanghyun Lee, Yeonjong Shin
  • for: This paper proposes a new training method for deep operator networks (DeepONets).
  • methods: The method first trains the trunk network and then sequentially trains the branch network. The key idea is a divide-and-conquer decomposition of the complex training task into two subtasks of reduced complexity, with a Gram-Schmidt orthonormalization step.
  • results: The training method improves the stability and generalization ability of DeepONets across a range of settings and better handles the nonconvex and nonlinear nature of the training problem.
    Abstract We present a novel training method for deep operator networks (DeepONets), one of the most popular neural network models for operators. DeepONets are constructed by two sub-networks, namely the branch and trunk networks. Typically, the two sub-networks are trained simultaneously, which amounts to solving a complex optimization problem in a high dimensional space. In addition, the nonconvex and nonlinear nature makes training very challenging. To tackle such a challenge, we propose a two-step training method that trains the trunk network first and then sequentially trains the branch network. The core mechanism is motivated by the divide-and-conquer paradigm and is the decomposition of the entire complex training task into two subtasks with reduced complexity. Therein the Gram-Schmidt orthonormalization process is introduced which significantly improves stability and generalization ability. On the theoretical side, we establish a generalization error estimate in terms of the number of training data, the width of DeepONets, and the number of input and output sensors. Numerical examples are presented to demonstrate the effectiveness of the two-step training method, including Darcy flow in heterogeneous porous media.
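    Code sketch: a schematic of the two-step idea with closed-form least squares standing in for gradient-based training: trunk features are fixed, their outputs are orthonormalized with Gram-Schmidt (via a thin QR factorization), and branch coefficients are then fitted against the orthonormal basis. The random-feature trunk, the linear branch, and the toy smoothing operator are assumptions made to keep the sketch self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy operator data: input functions u sampled at m sensors, outputs G(u)(y) at n_y points.
# G is a simple smoothing operator, chosen only to have something to fit.
m, n_y, n_funcs = 32, 50, 200
xs, ys = np.linspace(0, 1, m), np.linspace(0, 1, n_y)
U = rng.normal(size=(n_funcs, m))
kernel = np.exp(-((ys[:, None] - xs[None, :]) ** 2) / 0.02)
G = U @ kernel.T / m                                   # targets, shape (n_funcs, n_y)

# Step 1: trunk features of the query location y (random Fourier features stand in
# for a trained trunk network), then Gram-Schmidt orthonormalization via thin QR.
p = 20
W, b = rng.normal(size=p), rng.uniform(0, 2 * np.pi, size=p)
T = np.cos(np.outer(ys, W) + b)                        # (n_y, p) trunk outputs
Q, R = np.linalg.qr(T)                                 # Q has orthonormal columns

# Step 2: fit the branch against the orthonormal trunk basis. A linear branch is
# solved in closed form here; the paper trains a branch network sequentially.
C = G @ Q                                              # per-sample optimal coefficients
B, *_ = np.linalg.lstsq(U, C, rcond=None)              # linear branch weights, (m, p)

pred = (U @ B) @ Q.T
print("relative training error:", np.linalg.norm(pred - G) / np.linalg.norm(G))
```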

MPTopic: Improving topic modeling via Masked Permuted pre-training

  • paper_url: http://arxiv.org/abs/2309.01015
  • repo_url: None
  • paper_authors: Xinche Zhang, Evangelos Milios
  • For: The paper aims to improve the quality of topic modeling in text analysis by addressing the limitations of existing methods such as BERTopic and Top2Vec.
  • Methods: The paper introduces a new measure called TF-RDF (Term Frequency - Relative Document Frequency) to assess the relevance of terms within a document, and uses it to drive a clustering algorithm called MPTopic.
  • Results: Comprehensive evaluation shows that the topic keywords identified by MPTopic with TF-RDF outperform those extracted by BERTopic and Top2Vec.
    Abstract Topic modeling is pivotal in discerning hidden semantic structures within texts, thereby generating meaningful descriptive keywords. While innovative techniques like BERTopic and Top2Vec have recently emerged in the forefront, they manifest certain limitations. Our analysis indicates that these methods might not prioritize the refinement of their clustering mechanism, potentially compromising the quality of derived topic clusters. To illustrate, Top2Vec designates the centroids of clustering results to represent topics, whereas BERTopic harnesses C-TF-IDF for its topic extraction. In response to these challenges, we introduce "TF-RDF" (Term Frequency - Relative Document Frequency), a distinctive approach to assess the relevance of terms within a document. Building on the strengths of TF-RDF, we present MPTopic, a clustering algorithm intrinsically driven by the insights of TF-RDF. Through comprehensive evaluation, it is evident that the topic keywords identified with the synergy of MPTopic and TF-RDF outperform those extracted by both BERTopic and Top2Vec.

Streaming Active Learning for Regression Problems Using Regression via Classification

  • paper_url: http://arxiv.org/abs/2309.01013
  • repo_url: None
  • paper_authors: Shota Horiguchi, Kota Dohi, Yohei Kawaguchi
  • for: This paper proposes a streaming active learning approach for regression, so that regression models can maintain their performance as the operating environment changes.
  • methods: The regression problem is transformed into a classification problem (regression via classification), so that streaming active learning methods developed for classification can be applied directly.
  • results: Experiments on real data sets show that the proposed method achieves higher regression accuracy at the same annotation cost.
    Abstract One of the challenges in deploying a machine learning model is that the model's performance degrades as the operating environment changes. To maintain the performance, streaming active learning is used, in which the model is retrained by adding a newly annotated sample to the training dataset if the prediction of the sample is not certain enough. Although many streaming active learning methods have been proposed for classification, few efforts have been made for regression problems, which are often handled in the industrial field. In this paper, we propose to use the regression-via-classification framework for streaming active learning for regression. Regression-via-classification transforms regression problems into classification problems so that streaming active learning methods proposed for classification problems can be applied directly to regression problems. Experimental validation on four real data sets shows that the proposed method can perform regression with higher accuracy at the same annotation cost.
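    Code sketch: a hedged toy version of regression-via-classification in a streaming active learning loop: the target is discretized into bins, a probabilistic classifier predicts a distribution over bins, a sample is annotated (and the model refit) only when predictive entropy is high, and the regression estimate is the expected bin center. The bin count, entropy threshold, and logistic-regression classifier are illustrative choices, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_bins, threshold = 8, 1.2                 # assumed discretization and entropy threshold
edges = np.linspace(-1, 1, n_bins + 1)
centers = 0.5 * (edges[:-1] + edges[1:])

def to_bin(y):
    return np.clip(np.digitize(y, edges) - 1, 0, n_bins - 1)

def true_fn(x):                            # hidden regression target, values in [-0.8, 0.8]
    return np.sin(3 * x[..., 0]) * 0.8

def full_proba(model, x):
    p = np.zeros(n_bins)
    p[model.classes_] = model.predict_proba(x)[0]
    return p

# Small labelled seed set, then a stream of unlabeled samples.
X_train = rng.uniform(-2, 2, size=(30, 2))
y_train = to_bin(true_fn(X_train))
model = LogisticRegression(max_iter=500).fit(X_train, y_train)

n_queries = 0
for _ in range(300):
    x = rng.uniform(-2, 2, size=(1, 2))
    p = full_proba(model, x)
    entropy = -np.sum(p * np.log(p + 1e-12))
    if entropy > threshold:                # prediction too uncertain: request an annotation
        n_queries += 1
        X_train = np.vstack([X_train, x])
        y_train = np.append(y_train, to_bin(true_fn(x)))
        model = LogisticRegression(max_iter=500).fit(X_train, y_train)

x_test = np.array([[0.5, 0.0]])
estimate = full_proba(model, x_test) @ centers   # expected bin center = regression output
print(f"queries: {n_queries} | prediction: {estimate:.3f} | truth: {true_fn(x_test)[0]:.3f}")
```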

Bayesian sparsity and class sparsity priors for dictionary learning and coding

  • paper_url: http://arxiv.org/abs/2309.00999
  • repo_url: None
  • paper_authors: Alberto Bocchinfuso, Daniela Calvetti, Erkki Somersalo
  • for: solves challenging inverse problems using dictionary learning methods
  • methods: uses sparse coding techniques and dictionary compression to reduce computational complexity
  • results: effectively identifies relevant subdictionaries and reduces computational complexity in real-world applications such as glitch detection and hyperspectral remote sensing
    Abstract Dictionary learning methods continue to gain popularity for the solution of challenging inverse problems. In the dictionary learning approach, the computational forward model is replaced by a large dictionary of possible outcomes, and the problem is to identify the dictionary entries that best match the data, akin to traditional query matching in search engines. Sparse coding techniques are used to guarantee that the dictionary matching identifies only few of the dictionary entries, and dictionary compression methods are used to reduce the complexity of the matching problem. In this article, we propose a work flow to facilitate the dictionary matching process. First, the full dictionary is divided into subdictionaries that are separately compressed. The error introduced by the dictionary compression is handled in the Bayesian framework as a modeling error. Furthermore, we propose a new Bayesian data-driven group sparsity coding method to help identify subdictionaries that are not relevant for the dictionary matching. After discarding irrelevant subdictionaries, the dictionary matching is addressed as a deflated problem using sparse coding. The compression and deflation steps can lead to substantial decreases of the computational complexity. The effectiveness of compensating for the dictionary compression error and using the novel group sparsity promotion to deflate the original dictionary are illustrated by applying the methodology to real world problems, the glitch detection in the LIGO experiment and hyperspectral remote sensing.
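    Code sketch: a rough, simplified version of the workflow on synthetic data: the dictionary is split into subdictionaries, each is compressed with a truncated SVD, subdictionaries whose compressed atoms capture little of the signal are discarded, and sparse coding (orthogonal matching pursuit here) runs on the deflated dictionary. The SVD rank, the relevance rule, and OMP are stand-ins for the paper's Bayesian treatment of the compression error and its group-sparsity prior.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

n, n_groups, atoms_per_group, rank = 100, 6, 40, 8          # assumed sizes
subdicts = [rng.normal(size=(n, atoms_per_group)) for _ in range(n_groups)]
subdicts = [D / np.linalg.norm(D, axis=0) for D in subdicts]

# Synthetic signal built from two atoms of subdictionary 2, plus noise.
y = 1.5 * subdicts[2][:, 0] - 0.8 * subdicts[2][:, 5] + 0.01 * rng.normal(size=n)

# Compress each subdictionary separately with a truncated SVD.
bases = []
for D in subdicts:
    U, _, _ = np.linalg.svd(D, full_matrices=False)
    bases.append(U[:, :rank])

# Crude relevance rule (assumed): energy of y captured by each compressed subdictionary.
scores = np.array([np.linalg.norm(B.T @ y) for B in bases])
keep = scores > 0.5 * scores.max()
print("subdictionaries kept after deflation:", np.flatnonzero(keep))

# Sparse coding on the deflated dictionary (OMP as a simple sparse coder).
D_deflated = np.hstack([D for D, kept in zip(subdicts, keep) if kept])
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=3).fit(D_deflated, y)
print("nonzero coefficient indices:", np.flatnonzero(omp.coef_))
```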

Switch and Conquer: Efficient Algorithms By Switching Stochastic Gradient Oracles For Decentralized Saddle Point Problems

  • paper_url: http://arxiv.org/abs/2309.00997
  • repo_url: https://github.com/chhavisharma123/c-dpssg-cdc2023
  • paper_authors: Chhavi Sharma, Vishnu Narayanan, P. Balamurugan
  • for: The paper targets non-smooth strongly convex-strongly concave saddle point problems in a decentralized setting without a central server.
  • methods: The authors propose an inexact primal-dual hybrid gradient (inexact PDHG) procedure that allows generic gradient computation oracles to update the primal and dual variables, with a switching rule between a generalized stochastic gradient (GSG) oracle and a stochastic variance reduction gradient (SVRG) oracle.
  • results: The authors prove that the proposed algorithm, the Decentralized Proximal Switching Stochastic Gradient method with Compression (C-DPSSG), converges to an $\epsilon$-accurate saddle point solution at a linear rate, and show that it is well suited for obtaining solutions of low/medium accuracy faster.
    Abstract We consider a class of non-smooth strongly convex-strongly concave saddle point problems in a decentralized setting without a central server. To solve a consensus formulation of problems in this class, we develop an inexact primal dual hybrid gradient (inexact PDHG) procedure that allows generic gradient computation oracles to update the primal and dual variables. We first investigate the performance of inexact PDHG with stochastic variance reduction gradient (SVRG) oracle. Our numerical study uncovers a significant phenomenon of initial conservative progress of iterates of IPDHG with SVRG oracle. To tackle this, we develop a simple and effective switching idea, where a generalized stochastic gradient (GSG) computation oracle is employed to hasten the iterates' progress to a saddle point solution during the initial phase of updates, followed by a switch to the SVRG oracle at an appropriate juncture. The proposed algorithm is named Decentralized Proximal Switching Stochastic Gradient method with Compression (C-DPSSG), and is proven to converge to an $\epsilon$-accurate saddle point solution with linear rate. Apart from delivering highly accurate solutions, our study reveals that utilizing the best convergence phases of GSG and SVRG oracles makes C-DPSSG well suited for obtaining solutions of low/medium accuracy faster, useful for certain applications. Numerical experiments on two benchmark machine learning applications show C-DPSSG's competitive performance which validate our theoretical findings. The codes used in the experiments can be found \href{https://github.com/chhavisharma123/C-DPSSG-CDC2023}{here}.

A Boosted Machine Learning Framework for the Improvement of Phase and Crystal Structure Prediction of High Entropy Alloys Using Thermodynamic and Configurational Parameters

  • paper_url: http://arxiv.org/abs/2309.00993
  • repo_url: None
  • paper_authors: Debsundar Dey, Suchandan Das, Anik Pal, Santanu Dey, Chandan Kumar Raul, Arghya Chatterjee
  • for: This paper aims to predict the phases and crystal structures of High-Entropy Alloys (HEAs) using machine learning (ML) techniques.
  • methods: The study employs five distinct boosting algorithms (XGBoost, LightGBM, Random Forest, Gradient Boosting, and CatBoost) to predict phases and crystal structures, and introduces a methodical framework using the Pearson correlation coefficient to select strongly co-related features for improved accuracy.
  • results: The study achieves an accuracy of 94.05% for phase prediction and 90.07% for crystal structure prediction, and provides a new approach to quantify the influence of parameters on the model’s accuracy.
    Abstract The reason behind the remarkable properties of High-Entropy Alloys (HEAs) is rooted in the diverse phases and the crystal structures they contain. In the realm of material informatics, employing machine learning (ML) techniques to classify phases and crystal structures of HEAs has gained considerable significance. In this study, we assembled a new collection of 1345 HEAs with varying compositions to predict phases. Within this collection, there were 705 sets of data that were utilized to predict the crystal structures with the help of thermodynamics and electronic configuration. Our study introduces a methodical framework i.e., the Pearson correlation coefficient that helps in selecting the strongly co-related features to increase the prediction accuracy. This study employed five distinct boosting algorithms to predict phases and crystal structures, offering an enhanced guideline for improving the accuracy of these predictions. Among all these algorithms, XGBoost gives the highest accuracy of prediction (94.05%) for phases and LightGBM gives the highest accuracy of prediction of crystal structure of the phases (90.07%). The quantification of the influence exerted by parameters on the model's accuracy was conducted and a new approach was made to elucidate the contribution of individual parameters in the process of phase prediction and crystal structure prediction.
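    Code sketch: the feature-selection-plus-boosting pipeline illustrated with scikit-learn on synthetic data: Pearson correlation between each descriptor and the encoded phase label selects strongly correlated features, and a gradient-boosting classifier is trained on the reduced set. The synthetic descriptors, the 0.2 correlation cutoff, and sklearn's GradientBoostingClassifier (rather than the five boosting libraries benchmarked in the paper) are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for thermodynamic/configurational descriptors of HEAs.
n = 600
df = pd.DataFrame({
    "mixing_entropy": rng.normal(12, 2, n),
    "mixing_enthalpy": rng.normal(-5, 3, n),
    "atomic_size_diff": rng.normal(5, 1.5, n),
    "electronegativity_diff": rng.normal(0.12, 0.04, n),
    "vec": rng.normal(7.5, 1.0, n),            # valence electron concentration
})
# Synthetic phase label loosely driven by two descriptors (purely illustrative).
phase = (df["vec"] + 0.3 * df["mixing_entropy"] + rng.normal(0, 1, n) > 11).astype(int)

# Pearson-correlation-based feature selection.
corr = df.corrwith(phase).abs().sort_values(ascending=False)
selected = corr[corr > 0.2].index.tolist()     # cutoff value is an assumption
print("selected features:", selected)

X_tr, X_te, y_tr, y_te = train_test_split(df[selected], phase, test_size=0.2,
                                          random_state=0, stratify=phase)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
print("phase prediction accuracy:", clf.score(X_te, y_te))
```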

An Ensemble Score Filter for Tracking High-Dimensional Nonlinear Dynamical Systems

  • paper_url: http://arxiv.org/abs/2309.00983
  • repo_url: https://github.com/zezhongzhang/ensf
  • paper_authors: Feng Bao, Zezhong Zhang, Guannan Zhang
  • for: An accurate filtering method for solving high-dimensional nonlinear filtering problems.
  • methods: A score-based diffusion model characterizes the evolution of the filtering density, and a training-free mini-batch Monte Carlo estimator approximates the score function directly, without training neural networks.
  • results: On high-dimensional Lorenz systems, EnSF reliably tracks extremely high-dimensional states with highly nonlinear observation processes and delivers accurate filtering results, a setting that challenges existing filtering methods.
    Abstract We propose an ensemble score filter (EnSF) for solving high-dimensional nonlinear filtering problems with superior accuracy. A major drawback of existing filtering methods, e.g., particle filters or ensemble Kalman filters, is the low accuracy in handling high-dimensional and highly nonlinear problems. EnSF attacks this challenge by exploiting the score-based diffusion model, defined in a pseudo-temporal domain, to characterizing the evolution of the filtering density. EnSF stores the information of the recursively updated filtering density function in the score function, in stead of storing the information in a set of finite Monte Carlo samples (used in particle filters and ensemble Kalman filters). Unlike existing diffusion models that train neural networks to approximate the score function, we develop a training-free score estimation that uses mini-batch-based Monte Carlo estimator to directly approximate the score function at any pseudo-spatial-temporal location, which provides sufficient accuracy in solving high-dimensional nonlinear problems as well as saves tremendous amount of time spent on training neural networks. Another essential aspect of EnSF is its analytical update step, gradually incorporating data information into the score function, which is crucial in mitigating the degeneracy issue faced when dealing with very high-dimensional nonlinear filtering problems. High-dimensional Lorenz systems are used to demonstrate the performance of our method. EnSF provides surprisingly impressive performance in reliably tracking extremely high-dimensional Lorenz systems (up to 1,000,000 dimension) with highly nonlinear observation processes, which is a well-known challenging problem for existing filtering methods.

  • paper_url: http://arxiv.org/abs/2309.00976
  • repo_url: None
  • paper_authors: Kaiwen Dong, Zhichun Guo, Nitesh V. Chawla
  • for: This work aims to improve the performance of Message Passing Neural Networks (MPNNs) on link prediction, a task where MPNNs are often outperformed by simple heuristics such as Common Neighbor (CN).
  • methods: The authors propose a purely message-passing-based model, the Message Passing Link Predictor (MPLP), which exploits quasi-orthogonal vectors to estimate link-level structural features while preserving node-level complexities.
  • results: Experiments on benchmark datasets from various domains show that the method consistently outperforms baseline approaches on link prediction.
    Abstract Message Passing Neural Networks (MPNNs) have emerged as the {\em de facto} standard in graph representation learning. However, when it comes to link prediction, they often struggle, surpassed by simple heuristics such as Common Neighbor (CN). This discrepancy stems from a fundamental limitation: while MPNNs excel in node-level representation, they stumble with encoding the joint structural features essential to link prediction, like CN. To bridge this gap, we posit that, by harnessing the orthogonality of input vectors, pure message-passing can indeed capture joint structural features. Specifically, we study the proficiency of MPNNs in approximating CN heuristics. Based on our findings, we introduce the Message Passing Link Predictor (MPLP), a novel link prediction model. MPLP taps into quasi-orthogonal vectors to estimate link-level structural features, all while preserving the node-level complexities. Moreover, our approach demonstrates that leveraging message-passing to capture structural features could offset MPNNs' expressiveness limitations at the expense of estimation variance. We conduct experiments on benchmark datasets from various domains, where our method consistently outperforms the baseline methods.
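    Code sketch: the core observation (one round of message passing over quasi-orthogonal node vectors estimates the Common Neighbor count) verified in NumPy: each node gets a random vector whose pairwise inner products are approximately 1 for identical nodes and 0 otherwise, one aggregation step sums neighbours' vectors, and the inner product of two aggregates concentrates around the number of common neighbours. The toy graph and vector dimension are arbitrary; this shows only the estimation principle, not the MPLP model.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes, dim = 200, 4096            # a larger dimension lowers the estimation variance
# Random undirected graph (Erdos-Renyi style) as an adjacency matrix.
A = (rng.random((n_nodes, n_nodes)) < 0.05).astype(float)
A = np.triu(A, 1)
A = A + A.T

# Quasi-orthogonal node vectors: i.i.d. Gaussian entries scaled so that
# E[x_u . x_v] equals 1 when u == v and 0 otherwise.
X = rng.normal(size=(n_nodes, dim)) / np.sqrt(dim)

# One message-passing step: each node sums its neighbours' vectors.
H = A @ X

u, v = 3, 17
estimate = H[u] @ H[v]              # concentrates around |N(u) ∩ N(v)|
exact = int(A[u] @ A[v])
print(f"estimated common neighbours: {estimate:.2f} | exact: {exact}")
```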

Network Topology Inference with Sparsity and Laplacian Constraints

  • paper_url: http://arxiv.org/abs/2309.00960
  • repo_url: None
  • paper_authors: Jiaxi Ying, Xi Han, Rui Zhou, Xiwen Wang, Hing Cheung So
  • for: This paper addresses the network topology inference problem using Laplacian-constrained Gaussian graphical models, which recast the task as estimating a precision matrix in the form of a graph Laplacian.
  • methods: A graph Laplacian estimation method incorporating an $\ell_0$-norm constraint is proposed, with an efficient gradient projection algorithm to solve the resulting optimization problem under sparsity and Laplacian constraints.
  • results: Numerical experiments on synthetic and financial time-series data show that the proposed method is effective for network topology inference and is more robust than traditional $\ell_1$-norm approaches.
    Abstract We tackle the network topology inference problem by utilizing Laplacian constrained Gaussian graphical models, which recast the task as estimating a precision matrix in the form of a graph Laplacian. Recent research \cite{ying2020nonconvex} has uncovered the limitations of the widely used $\ell_1$-norm in learning sparse graphs under this model: empirically, the number of nonzero entries in the solution grows with the regularization parameter of the $\ell_1$-norm; theoretically, a large regularization parameter leads to a fully connected (densest) graph. To overcome these challenges, we propose a graph Laplacian estimation method incorporating the $\ell_0$-norm constraint. An efficient gradient projection algorithm is developed to solve the resulting optimization problem, characterized by sparsity and Laplacian constraints. Through numerical experiments with synthetic and financial time-series datasets, we demonstrate the effectiveness of the proposed method in network topology inference.

Index-aware learning of circuits

  • paper_url: http://arxiv.org/abs/2309.00958
  • repo_url: None
  • paper_authors: Idoia Cortes Garcia, Peter Förster, Lennart Jansen, Wil Schilders, Sebastian Schöps
  • for: This paper describes how machine learning can support electrical circuit design and how existing knowledge about the system can be exploited to reduce the complexity of learning.
  • methods: Circuits are described via modified nodal analysis as systems of differential-algebraic equations (DAEs); a dissection concept decouples the DAE so that only the differential variables are learned, while the algebraic variables are reconstructed from the decoupling relations.
  • results: The approach reduces the complexity of learning while guaranteeing that the algebraic constraints are fulfilled up to the accuracy of the nonlinear system solver.
    Abstract Electrical circuits are present in a variety of technologies, making their design an important part of computer aided engineering. The growing number of tunable parameters that affect the final design leads to a need for new approaches of quantifying their impact. Machine learning may play a key role in this regard, however current approaches often make suboptimal use of existing knowledge about the system at hand. In terms of circuits, their description via modified nodal analysis is well-understood. This particular formulation leads to systems of differential-algebraic equations (DAEs) which bring with them a number of peculiarities, e.g. hidden constraints that the solution needs to fulfill. We aim to use the recently introduced dissection concept for DAEs that can decouple a given system into ordinary differential equations, only depending on differential variables, and purely algebraic equations that describe the relations between differential and algebraic variables. The idea then is to only learn the differential variables and reconstruct the algebraic ones using the relations from the decoupling. This approach guarantees that the algebraic constraints are fulfilled up to the accuracy of the nonlinear system solver, which represents the main benefit highlighted in this article.

Emergent Linear Representations in World Models of Self-Supervised Sequence Models

  • paper_url: http://arxiv.org/abs/2309.00941
  • repo_url: https://github.com/ajyl/mech_int_othellogpt
  • paper_authors: Neel Nanda, Andrew Lee, Martin Wattenberg
  • for: This paper investigates how sequence models represent their decision-making process, providing evidence of a closely related linear representation of the board state.
  • methods: The paper studies an Othello-playing neural network and uses probing of internal activations to understand the model's internal state.
  • results: Probing for "my colour" vs. "opponent's colour" gives a simple yet powerful way to interpret the model's internal state, allows the model's behaviour to be controlled with simple vector arithmetic, and demonstrates that linear representations enable significant interpretability progress.
    Abstract How do sequence models represent their decision-making process? Prior work suggests that Othello-playing neural network learned nonlinear models of the board state (Li et al., 2023). In this work, we provide evidence of a closely related linear representation of the board. In particular, we show that probing for "my colour" vs. "opponent's colour" may be a simple yet powerful way to interpret the model's internal state. This precise understanding of the internal representations allows us to control the model's behaviour with simple vector arithmetic. Linear representations enable significant interpretability progress, which we demonstrate with further exploration of how the world model is computed.
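    Code sketch: the probing recipe, independent of the Othello model itself: cache the network's internal activations for a set of board positions, then fit one linear (logistic) probe per board square to predict empty vs. "my colour" vs. "opponent's colour". The random arrays below are placeholders for real OthelloGPT activations and board states, so the probes here only illustrate the procedure (accuracy is at chance on random data).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_positions, d_model, n_squares = 1000, 128, 64     # assumed sizes
# Placeholders: real usage would cache transformer residual-stream activations and board states.
activations = rng.normal(size=(n_positions, d_model))
board_labels = rng.integers(0, 3, size=(n_positions, n_squares))  # 0 empty, 1 mine, 2 opponent

split = int(0.8 * n_positions)
probes, accuracies = [], []
for square in range(n_squares):
    probe = LogisticRegression(max_iter=200).fit(activations[:split],
                                                 board_labels[:split, square])
    probes.append(probe)
    accuracies.append(probe.score(activations[split:], board_labels[split:, square]))

# On random placeholder data this is chance level; with real activations the probe
# accuracy (and its weight directions, usable for activation edits) is the finding.
print("mean probe accuracy over squares:", float(np.mean(accuracies)))
```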

Short-term power load forecasting method based on CNN-SAEDN-Res

  • paper_url: http://arxiv.org/abs/2309.07140
  • repo_url: None
  • paper_authors: Yang Cui, Han Zhu, Yijian Wang, Lu Zhang, Yang Li
  • for: This paper proposes a short-term power load forecasting method based on a convolutional neural network (CNN), a self-attention encoder-decoder network (SAEDN), and residual refinement (Res), addressing the difficulty sequence models have with load data containing non-temporal factors and improving prediction accuracy.
  • methods: A two-dimensional convolutional neural network extracts local correlations and high-dimensional features; a self-attention encoder-decoder network with a feedforward network produces the initial load forecast by capturing global correlations; and a residual-refinement module generates residual load values to optimize the initial forecast.
  • results: Simulation results show that the proposed method has clear advantages in prediction accuracy and prediction stability compared with previous approaches.
    Abstract In deep learning, the load data with non-temporal factors are difficult to process by sequence models. This problem results in insufficient precision of the prediction. Therefore, a short-term load forecasting method based on convolutional neural network (CNN), self-attention encoder-decoder network (SAEDN) and residual-refinement (Res) is proposed. In this method, the feature extraction module is composed of a two-dimensional convolutional neural network, which is used to mine the local correlation between data and obtain high-dimensional data features. The initial load forecasting module consists of a self-attention encoder-decoder network and a feedforward neural network (FFN). The module utilizes self-attention mechanisms to encode high-dimensional features. This operation can obtain the global correlation between data. Therefore, the model is able to retain important information based on the coupling relationship between the data in data mixed with non-time series factors. Then, self-attention decoding is performed and the feedforward neural network is used to regress the initial load. This paper introduces the residual mechanism to build the load optimization module. The module generates residual load values to optimize the initial load. The simulation results show that the proposed load forecasting method has advantages in terms of prediction accuracy and prediction stability.

A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading

  • paper_url: http://arxiv.org/abs/2309.00907
  • repo_url: https://github.com/qiyu3816/MTFNN-CO
  • paper_authors: Ruihuai Liang, Bo Yang, Zhiwen Yu, Xuelin Cao, Derrick Wing Kwan Ng, Chau Yuen
  • for: This work designs an optimal computation offloading strategy to improve the performance of mobile/multi-access edge computing (MEC).
  • methods: The offloading decision and resource allocation are formulated as a mixed-integer nonlinear programming (MINLP) problem, solved by online inference with a deep neural network (DNN) model.
  • results: The proposed multi-head ensemble multi-task learning (MEMTL) approach efficiently solves the joint problem in time-varying wireless environments and outperforms benchmark methods in inference accuracy and mean squared error without requiring additional training data.
    Abstract Computation offloading has become a popular solution to support computationally intensive and latency-sensitive applications by transferring computing tasks to mobile edge servers (MESs) for execution, which is known as mobile/multi-access edge computing (MEC). To improve the MEC performance, it is required to design an optimal offloading strategy that includes offloading decision (i.e., whether offloading or not) and computational resource allocation of MEC. The design can be formulated as a mixed-integer nonlinear programming (MINLP) problem, which is generally NP-hard and its effective solution can be obtained by performing online inference through a well-trained deep neural network (DNN) model. However, when the system environments change dynamically, the DNN model may lose efficacy due to the drift of input parameters, thereby decreasing the generalization ability of the DNN model. To address this unique challenge, in this paper, we propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs). Specifically, the shared backbone will be invariant during the PHs training and the inferred results will be ensembled, thereby significantly reducing the required training overhead and improving the inference performance. As a result, the joint optimization problem for offloading decision and resource allocation can be efficiently solved even in a time-varying wireless environment. Experimental results show that the proposed MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
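    Code sketch: a minimal PyTorch sketch of the shared-backbone, multi-head ensemble idea: the backbone stays fixed while several prediction heads are trained, and inference averages the heads' outputs. The layer sizes, the toy regression targets, the random (untrained) backbone, and the simple averaging rule are assumptions; the paper trains the heads for offloading decisions and resource allocation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, d_hidden, d_out, n_heads = 8, 64, 3, 4        # assumed sizes

# The backbone would be pretrained in practice; here it is random and simply frozen.
backbone = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                         nn.Linear(d_hidden, d_hidden), nn.ReLU())
for p in backbone.parameters():
    p.requires_grad_(False)

heads = nn.ModuleList([nn.Linear(d_hidden, d_out) for _ in range(n_heads)])

# Toy stand-in for offloading data: features -> continuous decision/allocation targets.
X = torch.randn(512, d_in)
Y = torch.sin(X[:, :d_out]) + 0.1 * torch.randn(512, d_out)

opt = torch.optim.Adam(heads.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for _ in range(200):
    opt.zero_grad()
    feats = backbone(X)                              # backbone stays invariant
    loss = sum(loss_fn(head(feats), Y) for head in heads)
    loss.backward()
    opt.step()

with torch.no_grad():                                # inference: ensemble the heads
    pred = torch.stack([head(backbone(X)) for head in heads]).mean(dim=0)
print("ensembled MSE:", loss_fn(pred, Y).item())
```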

Discovering Predictive Relational Object Symbols with Symbolic Attentive Layers

  • paper_url: http://arxiv.org/abs/2309.00889
  • repo_url: None
  • paper_authors: Alper Ahmetoglu, Batuhan Celik, Erhan Oztop, Emre Ugur
  • for: This work develops a new deep learning architecture that discovers symbolic representations of objects and their relations from a robot's self-supervised interaction with objects.
  • methods: A self-attention layer computes discrete attention weights over object features, which are treated as relational symbols and used to aggregate learned object symbols and predict the effects of executed actions.
  • results: In a simulated tabletop environment, the model predicts action effects better than baselines while discovering both object symbols and relational symbols; analysis shows that the learned symbols relate to the relative positions of objects, object types, and their horizontal alignment on the table.
    Abstract In this paper, we propose and realize a new deep learning architecture for discovering symbolic representations for objects and their relations based on the self-supervised continuous interaction of a manipulator robot with multiple objects on a tabletop environment. The key feature of the model is that it can handle a changing number number of objects naturally and map the object-object relations into symbolic domain explicitly. In the model, we employ a self-attention layer that computes discrete attention weights from object features, which are treated as relational symbols between objects. These relational symbols are then used to aggregate the learned object symbols and predict the effects of executed actions on each object. The result is a pipeline that allows the formation of object symbols and relational symbols from a dataset of object features, actions, and effects in an end-to-end manner. We compare the performance of our proposed architecture with state-of-the-art symbol discovery methods in a simulated tabletop environment where the robot needs to discover symbols related to the relative positions of objects to predict the observed effect successfully. Our experiments show that the proposed architecture performs better than other baselines in effect prediction while forming not only object symbols but also relational symbols. Furthermore, we analyze the learned symbols and relational patterns between objects to learn about how the model interprets the environment. Our analysis shows that the learned symbols relate to the relative positions of objects, object types, and their horizontal alignment on the table, which reflect the regularities in the environment.

Tight Bounds for Machine Unlearning via Differential Privacy

  • paper_url: http://arxiv.org/abs/2309.00886
  • repo_url: None
  • paper_authors: Yiyang Huang, Clément L. Canonne
  • for: This paper studies machine unlearning, i.e., the problem of removing the effect of selected points from the training data of an already trained model.
  • methods: The authors analyze machine unlearning algorithms based on differential privacy (DP).
  • results: The authors close the gap between upper and lower bounds on the deletion capacity of DP-based machine unlearning algorithms, obtaining tight bounds on the deletion capacity achievable by these algorithms.
    Abstract We consider the formulation of "machine unlearning" of Sekhari, Acharya, Kamath, and Suresh (NeurIPS 2021), which formalizes the so-called "right to be forgotten" by requiring that a trained model, upon request, should be able to "unlearn" a number of points from the training data, as if they had never been included in the first place. Sekhari et al. established some positive and negative results about the number of data points that can be successfully unlearnt by a trained model without impacting the model's accuracy (the "deletion capacity"), showing that machine unlearning could be achieved by using differentially private (DP) algorithms. However, their results left open a gap between upper and lower bounds on the deletion capacity of these algorithms: our work fully closes this gap, obtaining tight bounds on the deletion capacity achievable by DP-based machine unlearning algorithms.

Towards Certified Probabilistic Robustness with High Accuracy

  • paper_url: http://arxiv.org/abs/2309.00879
  • repo_url: None
  • paper_authors: Ruihan Zhang, Peixin Zhang, Jun Sun
  • for: This paper aims to build certifiably robust yet accurate neural network models, which is an open problem in the field of adversarial examples.
  • methods: The proposed approach consists of two parts: a probabilistic robust training method that minimizes variance in terms of divergence, and a runtime inference method for certified probabilistic robustness of the prediction.
  • results: The proposed approach significantly outperforms existing approaches in terms of both certification rate and accuracy, and is reasonably efficient. The approach works for a variety of perturbations and is applicable to multiple models trained on different datasets.
    Abstract Adversarial examples pose a security threat to many critical systems built on neural networks (such as face recognition systems, and self-driving cars). While many methods have been proposed to build robust models, how to build certifiably robust yet accurate neural network models remains an open problem. For example, adversarial training improves empirical robustness, but they do not provide certification of the model's robustness. On the other hand, certified training provides certified robustness but at the cost of a significant accuracy drop. In this work, we propose a novel approach that aims to achieve both high accuracy and certified probabilistic robustness. Our method has two parts, i.e., a probabilistic robust training method with an additional goal of minimizing variance in terms of divergence and a runtime inference method for certified probabilistic robustness of the prediction. The latter enables efficient certification of the model's probabilistic robustness at runtime with statistical guarantees. This is supported by our training objective, which minimizes the variance of the model's predictions in a given vicinity, derived from a general definition of model robustness. Our approach works for a variety of perturbations and is reasonably efficient. Our experiments on multiple models trained on different datasets demonstrate that our approach significantly outperforms existing approaches in terms of both certification rate and accuracy.

Pretraining Representations for Bioacoustic Few-shot Detection using Supervised Contrastive Learning

  • paper_url: http://arxiv.org/abs/2309.00878
  • repo_url: https://github.com/ilyassmoummad/dcase23_task5_scl
  • paper_authors: Ilyass Moummad, Romain Serizel, Nicolas Farrugia
  • for: This paper addresses few-shot bioacoustic sound event detection, where only a handful of labelled examples are available for each class of interest.
  • methods: A rich feature extractor is learned from scratch by combining data augmentation with a supervised contrastive learning framework.
  • results: The method obtains an F-score of 63.46% on the validation set and 42.7% on the test set, ranking second in the DCASE challenge.
    Abstract Deep learning has been widely used recently for sound event detection and classification. Its success is linked to the availability of sufficiently large datasets, possibly with corresponding annotations when supervised learning is considered. In bioacoustic applications, most tasks come with few labelled training data, because annotating long recordings is time consuming and costly. Therefore supervised learning is not the best suited approach to solve bioacoustic tasks. The bioacoustic community recasted the problem of sound event detection within the framework of few-shot learning, i.e. training a system with only few labeled examples. The few-shot bioacoustic sound event detection task in the DCASE challenge focuses on detecting events in long audio recordings given only five annotated examples for each class of interest. In this paper, we show that learning a rich feature extractor from scratch can be achieved by leveraging data augmentation using a supervised contrastive learning framework. We highlight the ability of this framework to transfer well for five-shot event detection on previously unseen classes in the training data. We obtain an F-score of 63.46\% on the validation set and 42.7\% on the test set, ranking second in the DCASE challenge. We provide an ablation study for the critical choices of data augmentation techniques as well as for the learning strategy applied on the training set.
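    Code sketch: a compact PyTorch version of the supervised contrastive loss used to pre-train the feature extractor: embeddings are L2-normalized and each anchor is pulled toward all other embeddings that share its class label while being pushed away from the rest. The temperature and the random embeddings are placeholders; the paper pairs this loss with bioacoustic data augmentation and a trained encoder.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss (Khosla et al. style): each anchor is attracted to
    all other samples with the same label and repelled from the remaining batch."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                          # (B, B) scaled cosine similarities
    batch = labels.shape[0]
    self_mask = torch.eye(batch, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))      # exclude the anchor itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss_per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
    return loss_per_anchor[pos_mask.any(dim=1)].mean()   # skip anchors without positives

# Toy usage: 16 embeddings from 4 (assumed) sound-event classes.
torch.manual_seed(0)
embeddings = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 4, (16,))
loss = supervised_contrastive_loss(embeddings, labels)
loss.backward()
print("SupCon loss:", loss.item())
```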

Tutorial: a priori estimation of sample size, effect size, and statistical power for cluster analysis, latent class analysis, and multivariate mixture models

  • paper_url: http://arxiv.org/abs/2309.00866
  • repo_url: https://github.com/esdalmaijer/cluster_power_tutorial
  • paper_authors: Edwin S Dalmaijer
  • For: The paper is written for researchers who want to determine the sample size and effect size for analyses that identify subgroups.
  • Methods: The paper provides a roadmap for determining sample size and effect size using a procedure that formalizes expectations about effect sizes in a specific domain, and establishes the minimum sample size for subgroup analyses using simulations.
  • Results: The paper provides a reference table for the most popular subgroup analyses, including k-means, Ward agglomerative hierarchical clustering, c-means fuzzy clustering, latent class analysis, latent profile analysis, and Gaussian mixture modeling. The table shows the minimum numbers of observations per expected subgroup and features to achieve acceptable statistical power.
    Abstract Before embarking on data collection, researchers typically compute how many individual observations they should do. This is vital for doing studies with sufficient statistical power, and often a cornerstone in study pre-registrations and grant applications. For traditional statistical tests, one would typically determine an acceptable level of statistical power, (gu)estimate effect size, and then use both values to compute the required sample size. However, for analyses that identify subgroups, statistical power is harder to establish. Once sample size reaches a sufficient threshold, effect size is primarily determined by the number of measured features and the underlying subgroup separation. As a consequence, a priori computations of statistical power are notoriously complex. In this tutorial, I will provide a roadmap to determining sample size and effect size for analyses that identify subgroups. First, I introduce a procedure that allows researchers to formalise their expectations about effect sizes in their domain of choice, and use this to compute the minimally required number of measured variables. Next, I outline how to establish the minimum sample size in subgroup analyses. Finally, I use simulations to provide a reference table for the most popular subgroup analyses: k-means, Ward agglomerative hierarchical clustering, c-means fuzzy clustering, latent class analysis, latent profile analysis, and Gaussian mixture modelling. The table shows the minimum numbers of observations per expected subgroup (sample size) and features (measured variables) to achieve acceptable statistical power, and can be readily used in study design.
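    Code sketch: the simulation recipe behind such a reference table, reproduced in a few lines: draw samples from Gaussian subgroups at a chosen separation, run the clustering algorithm of interest, and record how often subgroup recovery exceeds a success criterion; the proportion of successes plays the role of statistical power. The separation value, the adjusted-Rand threshold of 0.9, and k-means are illustrative choices, not the tutorial's exact settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

def simulate_power(n_per_group=50, n_features=4, n_groups=3, separation=2.0,
                   n_sims=200, ari_threshold=0.9):
    """Fraction of simulations in which k-means recovers the planted subgroups
    (adjusted Rand index above a threshold); this fraction plays the role of power."""
    successes = 0
    for _ in range(n_sims):
        centers = rng.normal(scale=separation, size=(n_groups, n_features))
        X = np.vstack([rng.normal(loc=c, size=(n_per_group, n_features)) for c in centers])
        truth = np.repeat(np.arange(n_groups), n_per_group)
        pred = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(X)
        successes += adjusted_rand_score(truth, pred) >= ari_threshold
    return successes / n_sims

for n in (20, 50, 100):
    print(f"n per subgroup = {n:3d}  ->  estimated power = {simulate_power(n_per_group=n):.2f}")
```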

DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal

  • paper_url: http://arxiv.org/abs/2309.00855
  • repo_url: https://github.com/wwweiwei/dora
  • paper_authors: Wei-Wei Du, Wei-Yao Wang, Wen-Chih Peng
  • for: This paper proposes a domain-based self-supervised learning framework for low-resource real estate appraisal.
  • methods: The model is pre-trained with an intra-sample geographic prediction pretext task and uses inter-sample contrastive learning so that the learned representations generalize from limited transactions.
  • results: On real-world transaction data, the model significantly outperforms tabular SSL baselines, graph-based methods, and supervised approaches in few-shot scenarios.
    Abstract The marketplace system connecting demands and supplies has been explored to develop unbiased decision-making in valuing properties. Real estate appraisal serves as one of the high-cost property valuation tasks for financial institutions since it requires domain experts to appraise the estimation based on the corresponding knowledge and the judgment of the market. Existing automated valuation models reducing the subjectivity of domain experts require a large number of transactions for effective evaluation, which is predominantly limited to not only the labeling efforts of transactions but also the generalizability of new developing and rural areas. To learn representations from unlabeled real estate sets, existing self-supervised learning (SSL) for tabular data neglects various important features, and fails to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with an intra-sample geographic prediction as the pretext task based on the metadata of the real estate for equipping the real estate representations with prior domain knowledge. Furthermore, inter-sample contrastive learning is employed to generalize the representations to be robust for limited transactions of downstream tasks. Our benchmark results on three property types of real-world transactions show that DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches in the few-shot scenarios by at least 7.6% for MAPE, 11.59% for MAE, and 3.34% for HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for properties that are newly built and have limited records. The source code is available at https://github.com/wwweiwei/DoRA.

A Unifying Variational Framework for Gaussian Process Motion Planning

  • paper_url: http://arxiv.org/abs/2309.00854
  • repo_url: None
  • paper_authors: Lucas Cosier, Rares Iordan, Sicelukwanda Zwane, Giovanni Franzese, James T. Wilson, Marc Peter Deisenroth, Alexander Terenin, Yasemin Bekiroglu
  • for: This paper proposes a robot motion planning framework based on variational Gaussian processes that handles the constraints and uncertainty inherent in motion planning.
  • methods: The framework unifies probabilistic-inference-based motion planning and incorporates equality-based, inequality-based, and soft motion-planning constraints during end-to-end training, providing both interval-based and Monte-Carlo-based uncertainty estimates.
  • results: Experiments across different environments and robots show that, compared with baseline approaches, the proposed method yields a good balance between success rates and path quality.
    Abstract To control how a robot moves, motion planning algorithms must compute paths in high-dimensional state spaces while accounting for physical constraints related to motors and joints, generating smooth and stable motions, avoiding obstacles, and preventing collisions. A motion planning algorithm must therefore balance competing demands, and should ideally incorporate uncertainty to handle noise, model errors, and facilitate deployment in complex environments. To address these issues, we introduce a framework for robot motion planning based on variational Gaussian Processes, which unifies and generalizes various probabilistic-inference-based motion planning algorithms. Our framework provides a principled and flexible way to incorporate equality-based, inequality-based, and soft motion-planning constraints during end-to-end training, is straightforward to implement, and provides both interval-based and Monte-Carlo-based uncertainty estimates. We conduct experiments using different environments and robots, comparing against baseline approaches based on the feasibility of the planned paths, and obstacle avoidance quality. Results show that our proposed approach yields a good balance between success rates and path quality.

Autonomous Soft Tissue Retraction Using Demonstration-Guided Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.00837
  • repo_url: None
  • paper_authors: Amritpal Singh, Wenqi Shi, May D Wang
  • for: This paper works toward surgical robot systems that can manipulate soft bodies, focusing on autonomous soft tissue retraction.
  • methods: A ROS-compatible physics simulation environment supporting rigid and soft body interactions is built, and demonstration-guided reinforcement learning (RL) algorithms are used to learn soft tissue interactions with the patient-side manipulator of the DaVinci surgical robot.
  • results: In silico trials provide a proof of concept for autonomous surgical soft tissue retraction and demonstrate the feasibility of learning soft body manipulation with reinforcement learning agents.
    Abstract In the context of surgery, robots can provide substantial assistance by performing small, repetitive tasks such as suturing, needle exchange, and tissue retraction, thereby enabling surgeons to concentrate on more complex aspects of the procedure. However, existing surgical task learning mainly pertains to rigid body interactions, whereas the advancement towards more sophisticated surgical robots necessitates the manipulation of soft bodies. Previous work focused on tissue phantoms for soft tissue task learning, which can be expensive and can be an entry barrier to research. Simulation environments present a safe and efficient way to learn surgical tasks before their application to actual tissue. In this study, we create a Robot Operating System (ROS)-compatible physics simulation environment with support for both rigid and soft body interactions within surgical tasks. Furthermore, we investigate the soft tissue interactions facilitated by the patient-side manipulator of the DaVinci surgical robot. Leveraging the pybullet physics engine, we simulate kinematics and establish anchor points to guide the robotic arm when manipulating soft tissue. Using demonstration-guided reinforcement learning (RL) algorithms, we investigate their performance in comparison to traditional reinforcement learning algorithms. Our in silico trials demonstrate a proof-of-concept for autonomous surgical soft tissue retraction. The results corroborate the feasibility of learning soft body manipulation through the application of reinforcement learning agents. This work lays the foundation for future research into the development and refinement of surgical robots capable of managing both rigid and soft tissue interactions. Code is available at https://github.com/amritpal-001/tissue_retract.

Approximating Fair $k$-Min-Sum-Radii in $\mathbb{R}^d$

  • paper_url: http://arxiv.org/abs/2309.00834
  • repo_url: None
  • paper_authors: Lukas Drexler, Annika Hennes, Abhiruk Lahiri, Melanie Schmidt, Julian Wargalla
  • for: The paper studies the $k$-min-sum-radii problem in the context of fair clustering.
  • methods: The paper proposes a PTAS (polynomial-time approximation scheme) for the fair $k$-min-sum-radii problem in Euclidean spaces of arbitrary dimension, for a constant number of clusters $k$.
  • results: The proposed algorithm is the first PTAS for the fair $k$-min-sum-radii problem, and it works for different notions of group fairness.
    Abstract The $k$-center problem is a classical clustering problem in which one is asked to find a partitioning of a point set $P$ into $k$ clusters such that the maximum radius of any cluster is minimized. It is well-studied. But what if we add up the radii of the clusters instead of only considering the cluster with maximum radius? This natural variant is called the $k$-min-sum-radii problem. It has become the subject of more and more interest in recent years, inspiring the development of approximation algorithms for the $k$-min-sum-radii problem in its plain version as well as in constrained settings. We study the problem for Euclidean spaces $\mathbb{R}^d$ of arbitrary dimension but assume the number $k$ of clusters to be constant. In this case, a PTAS for the problem is known (see Bandyapadhyay, Lochet and Saurabh, SoCG, 2023). Our aim is to extend the knowledge base for $k$-min-sum-radii to the domain of fair clustering. We study several group fairness constraints, such as the one introduced by Chierichetti et al. (NeurIPS, 2017). In this model, input points have an additional attribute (e.g., colors such as red and blue), and clusters have to preserve the ratio between different attribute values (e.g., have the same fraction of red and blue points as the ground set). Different variants of this general idea have been studied in the literature. To the best of our knowledge, no approximative results for the fair $k$-min-sum-radii problem are known, despite the immense amount of work on the related fair $k$-center problem. We propose a PTAS for the fair $k$-min-sum-radii problem in Euclidean spaces of arbitrary dimension for the case of constant $k$. To the best of our knowledge, this is the first PTAS for the problem. It works for different notions of group fairness.
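To make the objective and the fairness constraint concrete, the sketch below brute-forces a tiny fair $k$-min-sum-radii instance: it enumerates assignments, keeps only colour-balanced clusterings, and sums per-cluster radii (using the cluster's own points as candidate centres). This is exhaustive search for illustration only, not the paper's PTAS; the data, colours, and balance tolerance are assumptions.

```python
import itertools
import numpy as np

def cluster_radius(points):
    """Radius of the smallest ball centred at one of the cluster's own points
    (a standard 2-approximation of the minimum enclosing ball radius)."""
    if len(points) == 0:
        return 0.0
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    return d.max(axis=1).min()

def is_fair(colors, labels, k, tol=0.2):
    """Chierichetti-style balance: each cluster's group fraction stays near the global one."""
    global_frac = colors.mean()
    for c in range(k):
        in_c = colors[labels == c]
        if len(in_c) == 0 or abs(in_c.mean() - global_frac) > tol:
            return False
    return True

rng = np.random.default_rng(1)
pts = rng.normal(size=(8, 2))                 # tiny toy instance
colors = np.array([0, 1] * 4)                 # two demographic groups (assumed)
k = 2

best = (np.inf, None)
for labels in itertools.product(range(k), repeat=len(pts)):
    labels = np.array(labels)
    if not is_fair(colors, labels, k):
        continue
    cost = sum(cluster_radius(pts[labels == c]) for c in range(k))
    if cost < best[0]:
        best = (cost, labels)

print("fair k-min-sum-radii cost:", round(best[0], 3), "labels:", best[1])
```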

Trustworthiness-Driven Graph Convolutional Networks for Signed Network Embedding

  • paper_url: http://arxiv.org/abs/2309.00816
  • repo_url: https://github.com/kmj0792/trustsgcn
  • paper_authors: Min-Jeong Kim, Yeon-Chang Lee, David Y. Kang, Sang-Wook Kim
  • for: This paper targets the problem of representing nodes in a signed network as low-dimensional vectors and proposes a novel GCN-based approach, named TrustSGCN, to correct for incorrect embedding propagation.
  • methods: The paper proposes three modules: generating each node's extended ego-network (M1), measuring the trustworthiness of edge signs (M2), and trustworthiness-aware embedding propagation (M3).
  • results: Experiments show that TrustSGCN consistently outperforms five state-of-the-art GCN-based SNE methods on four real-world signed network datasets.
    Abstract The problem of representing nodes in a signed network as low-dimensional vectors, known as signed network embedding (SNE), has garnered considerable attention in recent years. While several SNE methods based on graph convolutional networks (GCN) have been proposed for this problem, we point out that they significantly rely on the assumption that the decades-old balance theory always holds in the real-world. To address this limitation, we propose a novel GCN-based SNE approach, named as TrustSGCN, which corrects for incorrect embedding propagation in GCN by utilizing the trustworthiness on edge signs for high-order relationships inferred by the balance theory. The proposed approach consists of three modules: (M1) generation of each node's extended ego-network; (M2) measurement of trustworthiness on edge signs; and (M3) trustworthiness-aware propagation of embeddings. Furthermore, TrustSGCN learns the node embeddings by leveraging two well-known societal theories, i.e., balance and status. The experiments on four real-world signed network datasets demonstrate that TrustSGCN consistently outperforms five state-of-the-art GCN-based SNE methods. The code is available at https://github.com/kmj0792/TrustSGCN.
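The balance-theory assumption that TrustSGCN scrutinizes can be illustrated with a small sketch that predicts the sign of a relationship along a two-hop path as the product of the edge signs and counts how often observed edges agree; the toy signed graph is an illustrative assumption, and the resulting agreement rate is only an informal proxy for the paper's trustworthiness measure.

```python
import numpy as np

# Signed adjacency: +1 friend, -1 foe, 0 no edge (toy example).
A = np.array([
    [ 0,  1, -1,  0],
    [ 1,  0, -1,  1],
    [-1, -1,  0, -1],
    [ 0,  1, -1,  0],
])

n = A.shape[0]
agree, total = 0, 0
for i in range(n):
    for j in range(n):
        if i == j or A[i, j] == 0:
            continue
        # Balance theory: sign(i, j) should equal sign(i, k) * sign(k, j) for an intermediate k.
        for k in range(n):
            if k in (i, j) or A[i, k] == 0 or A[k, j] == 0:
                continue
            predicted = A[i, k] * A[k, j]
            total += 1
            agree += int(predicted == A[i, j])

# A per-edge version of this agreement rate is one way to think about the
# trustworthiness of an inferred sign before propagating embeddings along it.
print(f"balance-consistent 2-hop paths: {agree}/{total}")
```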

Fairness Implications of Heterogeneous Treatment Effect Estimation with Machine Learning Methods in Policy-making

  • paper_url: http://arxiv.org/abs/2309.00805
  • repo_url: None
  • paper_authors: Patrick Rehill, Nicholas Biddle
  • for: The paper is written for governments trying to make and implement policy using causal machine learning methods, and for researchers and practitioners working in this area.
  • methods: The paper discusses the use of AI Fairness methods to protect against unintended consequences in machine learning models, but argues that these methods are not suitable for all causal machine learning applications. Instead, the paper proposes a definition of fairness for indirect decision-making scenarios, where the causal machine learning model only has indirect power.
  • results: The paper argues that the complexity of causal machine learning models can make it difficult to achieve fairness in policy-making, and suggests that careful modelling and awareness of decision-making biases are necessary to address this challenge.
    Abstract Causal machine learning methods which flexibly generate heterogeneous treatment effect estimates could be very useful tools for governments trying to make and implement policy. However, as the critical artificial intelligence literature has shown, governments must be very careful of unintended consequences when using machine learning models. One way to try and protect against unintended bad outcomes is with AI Fairness methods which seek to create machine learning models where sensitive variables like race or gender do not influence outcomes. In this paper we argue that standard AI Fairness approaches developed for predictive machine learning are not suitable for all causal machine learning applications because causal machine learning generally (at least so far) uses modelling to inform a human who is the ultimate decision-maker while AI Fairness approaches assume a model that is making decisions directly. We define these scenarios as indirect and direct decision-making respectively and suggest that policy-making is best seen as a joint decision where the causal machine learning model usually only has indirect power. We lay out a definition of fairness for this scenario - a model that provides the information a decision-maker needs to accurately make a value judgement about just policy outcomes - and argue that the complexity of causal machine learning models can make this difficult to achieve. The solution here is not traditional AI Fairness adjustments, but careful modelling and awareness of some of the decision-making biases that these methods might encourage which we describe.

Deep Learning and Inverse Problems

  • paper_url: http://arxiv.org/abs/2309.00802
  • repo_url: https://github.com/alexpapados/Physics-Informed-Deep-Learning-Solid-and-Fluid-Mechanics
  • paper_authors: Ali Mohammad-Djafari, Ning Chu, Li Wang, Liang Yu
  • for: This paper focuses on the application of deep learning (DL) and neural networks (NN) to inverse problems.
  • methods: The paper uses NN and DL surrogate models, together with approximate computation, to solve inverse problems.
  • results: The paper describes two cases: first, the case where a known forward operator is used as a physics constraint, and second, more general data-driven DL methods.
    Abstract Machine Learning (ML) methods and tools have achieved great success in many data, signal, image and video processing tasks, such as classification, clustering, object detection, semantic segmentation, language processing, human-machine interfaces, etc. In computer vision, image and video processing, these methods are mainly based on Neural Networks (NN), in particular Convolutional NN (CNN), and more generally Deep NN. Inverse problems arise wherever we have indirect measurements. Since such inverse problems are, in general, ill-posed, obtaining satisfactory solutions for them requires prior information. Different regularization methods have been proposed, where the problem becomes the optimization of a criterion with a likelihood term and a regularization term. The main difficulty, however, in high-dimensional real applications remains the computational cost. Using NN, and in particular Deep Learning (DL) surrogate models and approximate computation, can be very helpful. In this work, we focus on NN and DL particularly adapted for inverse problems. We consider two cases: first, the case where the forward operator is known and used as a physics constraint; second, more general data-driven DL methods.
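A minimal sketch of the first case mentioned above — a known forward operator used as a physics constraint — is shown below: a regularized least-squares inverse problem solved by plain gradient descent. The operator, data, and regularization weight are illustrative assumptions; in the DL setting, a learned surrogate or network prior would replace or augment this hand-coded loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known forward operator A (e.g., a measurement/blurring matrix) and noisy data y = A x* + noise.
n, m = 30, 40
A = rng.normal(size=(m, n)) / np.sqrt(n)
x_true = np.zeros(n); x_true[5:10] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=m)

# Minimize ||A x - y||^2 + lam * ||x||^2 by gradient descent; the physics constraint is A itself.
lam, step = 0.05, 0.05
x = np.zeros(n)
for _ in range(2000):
    grad = 2 * A.T @ (A @ x - y) + 2 * lam * x
    x -= step * grad

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```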

League of Legends: Real-Time Result Prediction

  • paper_url: http://arxiv.org/abs/2309.02449
  • repo_url: None
  • paper_authors: Jailson B. S. Junior, Claudio E. C. Campelo
  • for: This study aims to predict match outcomes of the electronic game League of Legends (LoL) in real time using machine learning techniques.
  • methods: The study is built on previously unpublished data and considers different variables and stages of the match to explore real-time prediction.
  • results: A LightGBM model achieved the best performance, with an average accuracy of 81.62% when 60%-80% of the match time had elapsed, while Logistic Regression and Gradient Boosting models performed better in the early stages of the game, with promising results.
    Abstract This paper presents a study on the prediction of outcomes in matches of the electronic game League of Legends (LoL) using machine learning techniques. With the aim of exploring the ability to predict real-time results, considering different variables and stages of the match, we highlight the use of unpublished data as a fundamental part of this process. With the increasing popularity of LoL and the emergence of tournaments, betting related to the game has also emerged, making the investigation in this area even more relevant. A variety of models were evaluated and the results were encouraging. A model based on LightGBM showed the best performance, achieving an average accuracy of 81.62\% in intermediate stages of the match when the percentage of elapsed time was between 60\% and 80\%. On the other hand, the Logistic Regression and Gradient Boosting models proved to be more effective in early stages of the game, with promising results. This study contributes to the field of machine learning applied to electronic games, providing valuable insights into real-time prediction in League of Legends. The results obtained may be relevant for both players seeking to improve their strategies and the betting industry related to the game.
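A minimal sketch of the modelling setup is shown below: a LightGBM classifier trained on mid-game match-state features to output a real-time win probability. The synthetic data and feature names (gold/kill/tower/dragon differences) are illustrative assumptions, not the paper's unpublished dataset.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for mid-game match state (feature names are illustrative).
n = 5000
X = np.column_stack([
    rng.normal(0, 3000, n),    # gold difference
    rng.normal(0, 5, n),       # kill difference
    rng.integers(0, 4, n),     # tower difference
    rng.integers(0, 3, n),     # dragon difference
])
logit = 0.0008 * X[:, 0] + 0.15 * X[:, 1] + 0.4 * X[:, 2] + 0.3 * X[:, 3]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)   # win label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05, num_leaves=31)
model.fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]        # real-time win probability
print("accuracy:", round(accuracy_score(y_te, proba > 0.5), 3))
```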

Diffusion Modeling with Domain-conditioned Prior Guidance for Accelerated MRI and qMRI Reconstruction

  • paper_url: http://arxiv.org/abs/2309.00783
  • repo_url: None
  • paper_authors: Wanyu Bian, Albert Jang, Fang Liu
  • for: The method targets image reconstruction, particularly reconstruction at high acceleration factors.
  • methods: The method is based on a diffusion model conditioned on the native data domain, using domain-conditioned diffusion in the frequency and parameter domains. MRI physics is embedded in the diffusion model to enforce data consistency and guide the training and sampling process.
  • results: The method shows notable error reduction and preserved accuracy for multi-coil MRI and quantitative MRI reconstruction, especially at high acceleration factors, and maintains efficiency and accuracy across diverse anatomical structures.
    Abstract This study introduces a novel approach for image reconstruction based on a diffusion model conditioned on the native data domain. Our method is applied to multi-coil MRI and quantitative MRI reconstruction, leveraging the domain-conditioned diffusion model within the frequency and parameter domains. The prior MRI physics are used as embeddings in the diffusion model, enforcing data consistency to guide the training and sampling process, characterizing MRI k-space encoding in MRI reconstruction, and leveraging MR signal modeling for qMRI reconstruction. Furthermore, a gradient descent optimization is incorporated into the diffusion steps, enhancing feature learning and improving denoising. The proposed method demonstrates a significant promise, particularly for reconstructing images at high acceleration factors. Notably, it maintains great reconstruction accuracy and efficiency for static and quantitative MRI reconstruction across diverse anatomical structures. Beyond its immediate applications, this method provides potential generalization capability, making it adaptable to inverse problems across various domains.
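A minimal sketch of the data-consistency mechanism the abstract refers to is shown below: after each denoising step, the acquired k-space samples are re-imposed on the current estimate. The 1-D signal, random undersampling mask, and the simple smoothing filter standing in for the learned reverse-diffusion denoiser are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth 1-D "image", its k-space, and an undersampling mask (roughly 4x acceleration).
n = 128
x_true = np.zeros(n); x_true[40:80] = 1.0
k_full = np.fft.fft(x_true)
mask = rng.uniform(size=n) < 0.25
mask[:8] = True                                  # keep low frequencies (assumed)
k_meas = k_full * mask

def data_consistency(x_est):
    """Replace the estimate's k-space values with the acquired samples wherever measured."""
    k_est = np.fft.fft(x_est)
    k_est[mask] = k_meas[mask]
    return np.real(np.fft.ifft(k_est))

def denoise(x_est):
    """Stand-in for the learned reverse-diffusion denoiser: a simple smoothing step."""
    return np.convolve(x_est, np.ones(5) / 5, mode="same")

x = np.real(np.fft.ifft(k_meas))                 # zero-filled starting point
for _ in range(50):
    x = data_consistency(denoise(x))

print("zero-filled error:", round(np.linalg.norm(np.real(np.fft.ifft(k_meas)) - x_true), 3))
print("iterated error:   ", round(np.linalg.norm(x - x_true), 3))
```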

Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction

  • paper_url: http://arxiv.org/abs/2309.00781
  • repo_url: None
  • paper_authors: Alejandro Rodriguez Dominguez, Muhammad Shahzad, Xia Hong
  • for: This work addresses multi-modal regression problems, in particular forecasting nonstationary processes or data with a complex mixture of distributions.
  • methods: The work proposes a Structured Radial Basis Function Network that combines multiple hypotheses predictors and proves that this model can approximate the multiple hypotheses target distribution.
  • results: The model achieves strong generalization performance and computational efficiency using only two-layer neural networks as predictors, with diversity controlled parametrically; a loss-agnostic gradient-descent approach is also introduced, and experiments show it outperforms the top competitors in the literature.
    Abstract Multi-modal regression is important in forecasting nonstationary processes or processes with a complex mixture of distributions. It can be tackled with multiple-hypotheses frameworks, but combining these efficiently in a learning model is difficult. A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems. The predictors are regression models of any type that can form centroidal Voronoi tessellations, which are a function of their losses during training. It is proved that this structured model can efficiently interpolate this tessellation and approximate the multiple hypotheses target distribution, and that it is equivalent to interpolating the meta-loss of the predictors, the loss being a zero set of the interpolation error. This model has a fixed-point iteration algorithm between the predictors and the centers of the basis functions. Diversity in learning can be controlled parametrically by truncating the tessellation formation with the losses of individual predictors. A closed-form solution with least squares is presented, which, to the authors' knowledge, is the fastest solution in the literature for multiple hypotheses and structured predictions. Superior generalization performance and computational efficiency are achieved using only two-layer neural networks as predictors, controlling diversity as a key component of success. A gradient-descent approach is introduced that is loss-agnostic regarding the predictors. The expected value for the loss of the structured model with Gaussian basis functions is computed, finding that correlation between predictors is not an appropriate tool for diversification. The experiments show outperformance with respect to the top competitors in the literature.
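For orientation, the sketch below shows the closed-form least-squares fit of a plain network of Gaussian radial basis functions on a toy problem; the centers, bandwidth, ridge term, and data are illustrative assumptions, and the paper's structured multiple-hypotheses ensemble and Voronoi-tessellation machinery are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data.
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + 0.1 * rng.normal(size=x.shape)

# Gaussian radial basis functions placed on a fixed grid of centers (assumed, not learned here).
centers = np.linspace(-3, 3, 15)
width = 0.5

def design_matrix(x):
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)

# Closed-form least-squares fit of the basis-function weights (small ridge term for stability).
Phi = design_matrix(x)
lam = 1e-6
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ y)

x_test = np.linspace(-3, 3, 7)
y_pred = design_matrix(x_test) @ w
print("prediction:", np.round(y_pred, 2))
print("target:    ", np.round(np.sin(x_test), 2))
```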

Non-Asymptotic Bounds for Adversarial Excess Risk under Misspecified Models

  • paper_url: http://arxiv.org/abs/2309.00771
  • repo_url: None
  • paper_authors: Changyu Liu, Yuling Jiao, Junhui Wang, Jian Huang
  • for: Evaluating the performance of robust estimators based on adversarial losses under misspecified models.
  • methods: The evaluation uses a distributional adversarial attack and adversarial training.
  • results: A general evaluation approach is proposed, and non-asymptotic upper bounds are established for the adversarial excess risk associated with Lipschitz loss functions.
    Abstract We propose a general approach to evaluating the performance of robust estimators based on adversarial losses under misspecified models. We first show that adversarial risk is equivalent to the risk induced by a distributional adversarial attack under certain smoothness conditions. This ensures that the adversarial training procedure is well-defined. To evaluate the generalization performance of the adversarial estimator, we study the adversarial excess risk. Our proposed analysis method includes investigations on both generalization error and approximation error. We then establish non-asymptotic upper bounds for the adversarial excess risk associated with Lipschitz loss functions. In addition, we apply our general results to adversarial training for classification and regression problems. For the quadratic loss in nonparametric regression, we show that the adversarial excess risk bound can be improved over those for a general loss.
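A minimal sketch of adversarial training under a slightly misspecified model is shown below: an FGSM-style worst-case $\ell_\infty$ perturbation of the inputs is applied inside each gradient step for a linear regression. The data, perturbation radius, and learning rate are illustrative assumptions; the paper's analysis concerns distributional attacks and non-asymptotic excess-risk bounds rather than this particular procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear regression data with a misspecified model (an extra nonlinearity in the target).
n, d = 500, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.2 * np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

eps, lr, epochs = 0.1, 0.05, 200
w = np.zeros(d)
for _ in range(epochs):
    # Inner step: FGSM-style l_inf perturbation of each input for the squared loss.
    residual = X @ w - y
    X_adv = X + eps * np.sign(residual[:, None] * w[None, :])
    # Outer step: gradient descent on the adversarial squared loss.
    grad = 2 * X_adv.T @ (X_adv @ w - y) / n
    w -= lr * grad

adv_risk = np.mean((X_adv @ w - y) ** 2)
clean_risk = np.mean((X @ w - y) ** 2)
print("clean risk:", round(clean_risk, 4), "adversarial risk:", round(adv_risk, 4))
```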

Physics-informed machine learning of the correlation functions in bulk fluids

  • paper_url: http://arxiv.org/abs/2309.00767
  • repo_url: None
  • paper_authors: Wenqian Chen, Peiyuan Gao, Panos Stinis
  • for: This paper focuses on solving the Ornstein-Zernike (OZ) equation of the modern integral equation theory for bulk fluids.
  • methods: The paper uses machine learning models, specifically physics-informed neural networks and physics-informed neural operator networks, to solve the forward and inverse OZ problems.
  • results: The machine learning models show high accuracy and efficiency in solving the OZ equation and demonstrate significant potential for applications in thermodynamic state theory.
    Abstract The Ornstein-Zernike (OZ) equation is the fundamental equation for pair correlation function computations in the modern integral equation theory for liquids. In this work, machine learning models, notably physics-informed neural networks and physics-informed neural operator networks, are explored to solve the OZ equation. The physics-informed machine learning models demonstrate great accuracy and high efficiency in solving the forward and inverse OZ problems of various bulk fluids. The results highlight the significant potential of physics-informed machine learning for applications in thermodynamic state theory.
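For context, the sketch below is a classical (non-ML) reference solver for the forward problem the paper targets: the OZ equation for a hard-sphere fluid with the Percus-Yevick closure, solved by Picard iteration with radial Fourier transforms. The grid, packing fraction, and mixing parameter are illustrative assumptions; the paper's physics-informed networks learn to solve this kind of problem (and its inverse) rather than iterating it explicitly.

```python
import numpy as np

# Grid and hard-sphere parameters (illustrative): diameter sigma = 1, packing fraction eta = 0.3.
N, dr = 1024, 0.01
r = dr * np.arange(1, N + 1)
dk = np.pi / (N * dr)
k = dk * np.arange(1, N + 1)
sigma, eta = 1.0, 0.3
rho = 6.0 * eta / (np.pi * sigma**3)            # number density

S = np.sin(np.outer(k, r))                      # shared sin(k*r) table for both transforms

def fourier(f_r):
    """Radial Fourier transform F(k) = (4*pi/k) * int r f(r) sin(kr) dr (Riemann sum)."""
    return 4.0 * np.pi / k * (S * (r * f_r)).sum(axis=1) * dr

def inv_fourier(F_k):
    """Inverse transform f(r) = 1/(2*pi^2*r) * int k F(k) sin(kr) dk (Riemann sum)."""
    return (S.T * (k * F_k)).sum(axis=1) * dk / (2.0 * np.pi**2 * r)

mayer = np.where(r < sigma, -1.0, 0.0)          # e^{-beta*u(r)} - 1 for hard spheres

gamma = np.zeros(N)                              # gamma(r) = h(r) - c(r)
for _ in range(500):
    c = mayer * (1.0 + gamma)                   # Percus-Yevick closure
    c_hat = fourier(c)
    gamma_hat = rho * c_hat**2 / (1.0 - rho * c_hat)   # OZ relation in Fourier space
    gamma_new = inv_fourier(gamma_hat)
    if np.max(np.abs(gamma_new - gamma)) < 1e-6:
        gamma = gamma_new
        break
    gamma = 0.5 * gamma + 0.5 * gamma_new       # Picard mixing for stability

g = 1.0 + gamma + mayer * (1.0 + gamma)         # pair correlation g(r) = 1 + gamma(r) + c(r)
idx = int(np.argmin(np.abs(r - 1.05 * sigma)))
print("g(r) near contact (r = 1.05 sigma):", round(g[idx], 3))
```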