cs.LG - 2023-11-27

Learning Multimodal Latent Dynamics for Human-Robot Interaction

  • paper_url: http://arxiv.org/abs/2311.16380
  • repo_url: None
  • paper_authors: Vignesh Prasad, Lea Heitlinger, Dorothea Koert, Ruth Stock-Homburg, Jan Peters, Georgia Chalvatzaki
  • for: Learning well-coordinated Human-Robot Interaction (HRI) from Human-Human Interaction (HHI) demonstrations.
  • methods: Hidden Markov Models (HMMs) serve as latent-space priors for a Variational Autoencoder that models the joint distribution over the interacting agents; interaction dynamics learned from HHI are leveraged for HRI, and the conditional generation of robot motions from human observations is incorporated into training.
  • results: The approach predicts more accurate robot trajectories and adapts the generated motions to human observations, with Inverse Kinematics ensuring the desired physical proximity; for contact-rich interactions, HMM segmentation modulates the robot's stiffness for compliant behaviour. A user study on a humanoid robot shows users perceive the method as more human-like, timely, and accurate, and rank it above the baselines.
    Abstract This article presents a method for learning well-coordinated Human-Robot Interaction (HRI) from Human-Human Interactions (HHI). We devise a hybrid approach using Hidden Markov Models (HMMs) as the latent space priors for a Variational Autoencoder to model a joint distribution over the interacting agents. We leverage the interaction dynamics learned from HHI to learn HRI and incorporate the conditional generation of robot motions from human observations into the training, thereby predicting more accurate robot trajectories. The generated robot motions are further adapted with Inverse Kinematics to ensure the desired physical proximity with a human, combining the ease of joint space learning and accurate task space reachability. For contact-rich interactions, we modulate the robot's stiffness using HMM segmentation for a compliant interaction. We verify the effectiveness of our approach deployed on a Humanoid robot via a user study. Our method generalizes well to various humans despite being trained on data from just two humans. We find that Users perceive our method as more human-like, timely, and accurate and rank our method with a higher degree of preference over other baselines.

Bayesian Formulations for Graph Spectral Denoising

  • paper_url: http://arxiv.org/abs/2311.16378
  • repo_url: None
  • paper_authors: Sam Leone, Xingzhi Sun, Michael Perlmutter, Smita Krishnaswamy
  • for: Smoothing noisy signals defined on the vertices of a graph, covering Gaussian, dropout, and uniformly distributed noise.
  • methods: Maximum A Posteriori (M.A.P.) estimation, pairing a frequency-domain smoothness prior with the three noise-generation models to estimate the true signal from noisy data (a minimal sketch of the Gaussian case follows the abstract).
  • results: The algorithms effectively restore white noise on image data and recover from severe dropout in toy and EHR data.
    Abstract We consider noisy signals which are defined on the vertices of a graph and present smoothing algorithms for the cases of Gaussian, dropout, and uniformly distributed noise. The signals are assumed to follow a prior distribution defined in the frequency domain which favors signals which are smooth across the edges of the graph. By pairing this prior distribution with our three models of noise generation, we propose \textit{Maximum A Posteriori} (M.A.P.) estimates of the true signal in the presence of noisy data and provide algorithms for computing the M.A.P. Finally, we demonstrate the algorithms' ability to effectively restore white noise on image data, and from severe dropout in toy \& EHR data.
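The Gaussian-noise case admits a closed-form M.A.P. estimate when the smoothness prior is a Gaussian with precision proportional to the graph Laplacian — one common instance of a frequency-domain smoothness prior, though the paper's exact prior may differ. A minimal NumPy sketch under that assumption:

```python
import numpy as np

def map_denoise_gaussian(y, A, lam=1.0):
    """MAP estimate of a graph signal under Gaussian noise.

    Prior: x ~ N(0, (lam * L)^{-1}) with L the combinatorial Laplacian,
    i.e. signals that are smooth across edges are favored.
    Likelihood: y = x + eps, eps ~ N(0, I).
    MAP: argmin_x ||y - x||^2 + lam * x^T L x  =>  x = (I + lam * L)^{-1} y
    """
    L = np.diag(A.sum(axis=1)) - A          # combinatorial graph Laplacian
    n = A.shape[0]
    return np.linalg.solve(np.eye(n) + lam * L, y)

# toy example: a path graph carrying a noisy constant signal
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0
y = np.ones(5) + 0.3 * np.random.randn(5)
print(map_denoise_gaussian(y, A, lam=2.0))
```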

Physics-Informed Neural Network for Discovering Systems with Unmeasurable States with Application to Lithium-Ion Batteries

  • paper_url: http://arxiv.org/abs/2311.16374
  • repo_url: None
  • paper_authors: Yuichi Kajiura, Jorge Espin, Dong Zhang
  • for: Robust training of physics-informed neural networks (PINNs) for systems with unmeasurable states, such as lithium-ion batteries, to support modeling and control.
  • methods: A PINN training method with fewer loss terms: instead of one loss per differential equation, the known dynamics are embedded into a single loss on the mismatch between observed and predicted system outputs, obtained by numerically integrating the NN-predicted states; the system parameters can be added to the optimization targets (a minimal sketch follows the abstract).
  • results: Applied to a lithium-ion battery model, the method concurrently estimates states and parameters, demonstrating its suitability for modeling and control of systems with unobservable states.
    Abstract Combining machine learning with physics is a trending approach for discovering unknown dynamics, and one of the most intensively studied frameworks is the physics-informed neural network (PINN). However, PINN often fails to optimize the network due to its difficulty in concurrently minimizing multiple losses originating from the system's governing equations. This problem can be more serious when the system's states are unmeasurable, like lithium-ion batteries (LiBs). In this work, we introduce a robust method for training PINN that uses fewer loss terms and thus constructs a less complex landscape for optimization. In particular, instead of having loss terms from each differential equation, this method embeds the dynamics into a loss function that quantifies the error between observed and predicted system outputs. This is accomplished by numerically integrating the predicted states from the neural network(NN) using known dynamics and transforming them to obtain a sequence of predicted outputs. Minimizing such a loss optimizes the NN to predict states consistent with observations given the physics. Further, the system's parameters can be added to the optimization targets. To demonstrate the ability of this method to perform various modeling and control tasks, we apply it to a battery model to concurrently estimate its states and parameters.
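A minimal sketch of the described training loop on a toy one-dimensional system. The dynamics f, output map h, and parameter a below are illustrative stand-ins, not the paper's battery model: the network predicts states, the known dynamics propagate them one Euler step, and a single output-mismatch loss trains both the network and the system parameter.

```python
import torch

# Toy system standing in for the battery model (names are illustrative only):
#   dx/dt = -a * x + u(t),   measured output y = x + 0.5
def f(x, u, a):   # known dynamics, unknown parameter a
    return -a * x + u

def h(x):         # known output map (the state itself is never measured)
    return x + 0.5

t = torch.linspace(0.0, 2.0, 80).unsqueeze(-1)
dt = (t[1] - t[0]).item()
u = torch.sin(3.0 * t)

# synthetic measurements from the true system (a = 1.5)
with torch.no_grad():
    x_true = torch.zeros_like(t)
    for k in range(len(t) - 1):
        x_true[k + 1] = x_true[k] + dt * f(x_true[k], u[k], 1.5)
    y_obs = h(x_true) + 0.01 * torch.randn_like(x_true)

net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))          # predicts states x(t)
a_hat = torch.nn.Parameter(torch.tensor(0.5))               # co-optimized system parameter
opt = torch.optim.Adam(list(net.parameters()) + [a_hat], lr=1e-2)

for step in range(3000):
    x_hat = net(t)
    # propagate each predicted state one Euler step with the *known* dynamics,
    # then compare the implied outputs with the measurements: a single loss term
    x_next = x_hat[:-1] + dt * f(x_hat[:-1], u[:-1], a_hat)
    loss = torch.mean((h(x_next) - y_obs[1:]) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
```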

Making Self-supervised Learning Robust to Spurious Correlation via Learning-speed Aware Sampling

  • paper_url: http://arxiv.org/abs/2311.16361
  • repo_url: None
  • paper_authors: Weicheng Zhu, Sheng Liu, Carlos Fernandez-Granda, Narges Razavian
  • for: Studying self-supervised learning (SSL) on data that exhibits spurious correlations between attributes and downstream labels.
  • methods: An analysis of SSL learning dynamics under spurious correlations, and a learning-speed aware SSL (LA-SSL) approach that samples each training point with a probability inversely related to its learning speed (a sketch of the sampling rule follows the abstract).
  • results: The SSL training loss can be minimized by capturing only a subset of conspicuous features tied to sensitive attributes, even when other important predictive features are present; LA-SSL improves the robustness of pretrained representations on downstream classification across three datasets with spurious correlations.
    Abstract Self-supervised learning (SSL) has emerged as a powerful technique for learning rich representations from unlabeled data. The data representations are able to capture many underlying attributes of data, and be useful in downstream prediction tasks. In real-world settings, spurious correlations between some attributes (e.g. race, gender and age) and labels for downstream tasks often exist, e.g. cancer is usually more prevalent among elderly patients. In this paper, we investigate SSL in the presence of spurious correlations and show that the SSL training loss can be minimized by capturing only a subset of the conspicuous features relevant to those sensitive attributes, despite the presence of other important predictive features for the downstream tasks. To address this issue, we investigate the learning dynamics of SSL and observe that the learning is slower for samples that conflict with such correlations (e.g. elder patients without cancer). Motivated by these findings, we propose a learning-speed aware SSL (LA-SSL) approach, in which we sample each training data with a probability that is inversely related to its learning speed. We evaluate LA-SSL on three datasets that exhibit spurious correlations between different attributes, demonstrating that it improves the robustness of pretrained representations on downstream classification tasks.
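A minimal sketch of the sampling rule. The softmax-of-inverse-speed form and the learning-speed proxy are illustrative assumptions; the paper only specifies that the sampling probability is inversely related to learning speed.

```python
import numpy as np

def sampling_probs(learning_speed, temperature=1.0, eps=1e-8):
    """Sampling probabilities inversely related to per-sample learning speed.

    `learning_speed` could be, e.g., a running estimate of how quickly each
    sample's SSL loss decreases across epochs (a proxy; the paper's exact
    definition may differ). Slow-to-learn samples -- those conflicting with
    the spurious correlation -- are up-weighted.
    """
    logits = np.log(1.0 / (learning_speed + eps)) / temperature
    p = np.exp(logits - logits.max())
    return p / p.sum()

# toy: 6 samples, two of which are learned much more slowly
speed = np.array([0.9, 0.8, 0.85, 0.1, 0.15, 0.95])
p = sampling_probs(speed)
batch_idx = np.random.choice(len(speed), size=4, replace=True, p=p)
```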

Cross Entropy in Deep Learning of Classifiers Is Unnecessary – ISBE Error is All You Need

  • paper_url: http://arxiv.org/abs/2311.16357
  • repo_url: None
  • paper_authors: Wladyslaw Skarbek
  • for: Examining the cost function of deep-learning classifiers and showing that the cross-entropy computation can be omitted.
  • methods: An ISBE unit replacing the usual SoftMax + CrossEntropy pipeline: the entropy is never computed, and during back-propagation the error is sent directly to the model network instead of being transformed backwards through the normalization unit (a minimal sketch follows the abstract).
  • results: On MNIST classifiers (perceptron and convolutional networks), results are not degraded with SoftMax or with other activations such as Sigmoid, Tanh, HardSigmoid, and HardTanh, and up to three percent of the total forward-plus-backward time is saved.
    Abstract In deep learning classifiers, the cost function usually takes the form of a combination of SoftMax and CrossEntropy functions. The SoftMax unit transforms the scores predicted by the model network into assessments of the degree (probabilities) of an object's membership to a given class. On the other hand, CrossEntropy measures the divergence of this prediction from the distribution of target scores. This work introduces the ISBE functionality, justifying the thesis about the redundancy of cross entropy computation in deep learning of classifiers. Not only can we omit the calculation of entropy, but also, during back-propagation, there is no need to direct the error to the normalization unit for its backward transformation. Instead, the error is sent directly to the model's network. Using examples of perceptron and convolutional networks as classifiers of images from the MNIST collection, it is observed for ISBE that results are not degraded with SoftMax only, but also with other activation functions such as Sigmoid, Tanh, or their hard variants HardSigmoid and HardTanh. Moreover, up to three percent of time is saved within the total time of forward and backward stages. The article is addressed mainly to programmers and students interested in deep model learning. For example, it illustrates in code snippets possible ways to implement ISBE units, but also formally proves that the softmax trick only applies to the class of softmax functions with relocations.
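A minimal PyTorch sketch of an ISBE-style step, assuming the error is the residual between the activation output and the one-hot target. For SoftMax this coincides with the usual cross-entropy gradient (the "softmax trick"), which is why no entropy needs to be computed; model sizes and the random batch are illustrative.

```python
import torch
import torch.nn.functional as F

# ISBE-style training step: no entropy is computed, and the error is not
# back-propagated through the normalization (SoftMax) unit -- the score error
# e = activation(z) - target is injected directly into the model network.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(28 * 28, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 1, 28, 28)                      # stand-in for an MNIST batch
y = F.one_hot(torch.randint(0, 10, (64,)), 10).float()

z = model(x)                                        # raw scores
# dividing by the batch size matches the mean-reduced cross-entropy gradient
e = (torch.softmax(z, dim=1) - y).detach() / x.shape[0]
opt.zero_grad()
z.backward(gradient=e)                              # inject e as dL/dz, skipping CE entirely
opt.step()
```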

From Reactive to Proactive Volatility Modeling with Hemisphere Neural Networks

  • paper_url: http://arxiv.org/abs/2311.16333
  • repo_url: https://github.com/theaionxgit/aionx
  • paper_authors: Philippe Goulet Coulombe, Mikael Frenette, Karin Klieber
  • for: Reinvigorating maximum likelihood estimation (MLE) for macroeconomic density forecasting.
  • methods: A Hemisphere Neural Network (HNN) with dedicated mean and variance hemispheres sharing a common core that accommodates time variation in the error variance; a volatility-emphasis constraint breaks mean/variance indeterminacy, a blocked out-of-bag reality check curbs overfitting, and standard deep-learning software handles large data sets computationally and statistically (a minimal sketch of the two-hemisphere likelihood follows the abstract).
  • results: HNN delivers proactive volatility forecasts from leading indicators when it can and reactive volatility from the magnitude of past prediction errors when it must; in an extensive out-of-sample experiment it consistently provides accurate mean/variance forecasts for all targets and horizons with reliable probabilistic forecasts, and the machinery can be merged with structured deep-learning models such as the Neural Phillips Curve of Goulet Coulombe (2022).
    Abstract We reinvigorate maximum likelihood estimation (MLE) for macroeconomic density forecasting through a novel neural network architecture with dedicated mean and variance hemispheres. Our architecture features several key ingredients making MLE work in this context. First, the hemispheres share a common core at the entrance of the network which accommodates for various forms of time variation in the error variance. Second, we introduce a volatility emphasis constraint that breaks mean/variance indeterminacy in this class of overparametrized nonlinear models. Third, we conduct a blocked out-of-bag reality check to curb overfitting in both conditional moments. Fourth, the algorithm utilizes standard deep learning software and thus handles large data sets - both computationally and statistically. Ergo, our Hemisphere Neural Network (HNN) provides proactive volatility forecasts based on leading indicators when it can, and reactive volatility based on the magnitude of previous prediction errors when it must. We evaluate point and density forecasts with an extensive out-of-sample experiment and benchmark against a suite of models ranging from classics to more modern machine learning-based offerings. In all cases, HNN fares well by consistently providing accurate mean/variance forecasts for all targets and horizons. Studying the resulting volatility paths reveals its versatility, while probabilistic forecasting evaluation metrics showcase its enviable reliability. Finally, we also demonstrate how this machinery can be merged with other structured deep learning models by revisiting Goulet Coulombe (2022)'s Neural Phillips Curve.
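A minimal sketch of the two-hemisphere likelihood: a shared core feeding separate mean and variance heads, trained by Gaussian maximum likelihood. The volatility-emphasis constraint and the blocked out-of-bag check are omitted, and layer sizes are illustrative.

```python
import torch

class HemisphereNet(torch.nn.Module):
    """Shared core with dedicated mean and (log-)variance hemispheres."""
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        self.core = torch.nn.Sequential(torch.nn.Linear(d_in, d_hidden), torch.nn.ReLU())
        self.mean_hemisphere = torch.nn.Linear(d_hidden, 1)
        self.logvar_hemisphere = torch.nn.Linear(d_hidden, 1)

    def forward(self, x):
        h = self.core(x)
        return self.mean_hemisphere(h), self.logvar_hemisphere(h)

def gaussian_nll(mu, logvar, y):
    # negative Gaussian log-likelihood (constants dropped)
    return 0.5 * (logvar + (y - mu) ** 2 / logvar.exp()).mean()

x = torch.randn(256, 20)                    # stand-in for macro indicators
y = torch.randn(256, 1)                     # stand-in for the target series
net = HemisphereNet(20)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(200):
    mu, logvar = net(x)
    loss = gaussian_nll(mu, logvar, y)
    opt.zero_grad(); loss.backward(); opt.step()
```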

Target-Free Compound Activity Prediction via Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2311.16328
  • repo_url: None
  • paper_authors: Peter Eckmann, Jake Anderson, Michael K. Gilson, Rose Yu
  • for: Predicting compound activities against protein-based or phenotypic assays from only a few known compounds and their activities (target-free drug discovery).
  • methods: A neural architecture (FS-CAP) meta-trained across large bioactivity datasets: encodings of the known compounds and their activities are aggregated to capture assay information, and a separate encoder handles the query compound; continuous activities are predicted rather than binary active/inactive labels.
  • results: FS-CAP outperforms traditional similarity-based techniques and state-of-the-art few-shot learning methods across a variety of target-free drug-discovery settings and datasets.
    Abstract Predicting the activities of compounds against protein-based or phenotypic assays using only a few known compounds and their activities is a common task in target-free drug discovery. Existing few-shot learning approaches are limited to predicting binary labels (active/inactive). However, in real-world drug discovery, degrees of compound activity are highly relevant. We study Few-Shot Compound Activity Prediction (FS-CAP) and design a novel neural architecture to meta-learn continuous compound activities across large bioactivity datasets. Our model aggregates encodings generated from the known compounds and their activities to capture assay information. We also introduce a separate encoder for the unknown compound. We show that FS-CAP surpasses traditional similarity-based techniques as well as other state of the art few-shot learning methods on a variety of target-free drug discovery settings and datasets.

Quantum-classical simulation of quantum field theory by quantum circuit learning

  • paper_url: http://arxiv.org/abs/2311.16297
  • repo_url: None
  • paper_authors: Kazuki Ikeda
  • for: simulate quantum field theories (QFTs) on quantum computers
  • methods: quantum circuit learning, compact configuration of qubits, low-depth quantum circuits
  • results: accurate predictions of quench dynamics, chiral dynamics, and jet production, close alignment with classical calculations, high degree of accuracy
    Abstract We employ quantum circuit learning to simulate quantum field theories (QFTs). Typically, when simulating QFTs with quantum computers, we encounter significant challenges due to the technical limitations of quantum devices when implementing the Hamiltonian using Pauli spin matrices. To address this challenge, we leverage quantum circuit learning, employing a compact configuration of qubits and low-depth quantum circuits to predict real-time dynamics in quantum field theories. The key advantage of this approach is that a single-qubit measurement can accurately forecast various physical parameters, including fully-connected operators. To demonstrate the effectiveness of our method, we use it to predict quench dynamics, chiral dynamics and jet production in a 1+1-dimensional model of quantum electrodynamics. We find that our predictions closely align with the results of rigorous classical calculations, exhibiting a high degree of accuracy. This hybrid quantum-classical approach illustrates the feasibility of efficiently simulating large-scale QFTs on cutting-edge quantum devices.

A statistical approach to latent dynamic modeling with differential equations

  • paper_url: http://arxiv.org/abs/2311.16286
  • repo_url: https://github.com/maren-ha/latentdynamics.jl
  • paper_authors: Maren Hackenberg, Astrid Pechmann, Clemens Kreutz, Janbernd Kirschner, Harald Binder
  • for: Mechanistic modeling of temporally local changes in longitudinal cohort data with ordinary differential equations (ODEs).
  • methods: ODE-based dynamic modeling combined with neural networks that provide a low-dimensional latent space and patient-specific ODE parameters from baseline variables; differentiable programming enables simultaneous identification of the dynamic model and the latent space.
  • results: Each observation in the course of time is used as an initial value to obtain multiple local ODE solutions, which are combined into an estimator of the underlying dynamics; the approach is illustrated on spinal muscular atrophy patients and a corresponding simulation study.
    Abstract Ordinary differential equations (ODEs) can provide mechanistic models of temporally local changes of processes, where parameters are often informed by external knowledge. While ODEs are popular in systems modeling, they are less established for statistical modeling of longitudinal cohort data, e.g., in a clinical setting. Yet, modeling of local changes could also be attractive for assessing the trajectory of an individual in a cohort in the immediate future given its current status, where ODE parameters could be informed by further characteristics of the individual. However, several hurdles so far limit such use of ODEs, as compared to regression-based function fitting approaches. The potentially higher level of noise in cohort data might be detrimental to ODEs, as the shape of the ODE solution heavily depends on the initial value. In addition, larger numbers of variables multiply such problems and might be difficult to handle for ODEs. To address this, we propose to use each observation in the course of time as the initial value to obtain multiple local ODE solutions and build a combined estimator of the underlying dynamics. Neural networks are used for obtaining a low-dimensional latent space for dynamic modeling from a potentially large number of variables, and for obtaining patient-specific ODE parameters from baseline variables. Simultaneous identification of dynamic models and of a latent space is enabled by recently developed differentiable programming techniques. We illustrate the proposed approach in an application with spinal muscular atrophy patients and a corresponding simulation study. In particular, modeling of local changes in health status at any point in time is contrasted to the interpretation of functions obtained from a global regression. This more generally highlights how different application settings might demand different modeling strategies.

Practical Layout-Aware Analog/Mixed-Signal Design Automation with Bayesian Neural Networks

  • paper_url: http://arxiv.org/abs/2311.17073
  • repo_url: None
  • paper_authors: Ahmet F. Budak, Keren Zhu, David Z. Pan
  • for: Improving the efficiency of practical analog/mixed-signal design automation, where circuit simulations are expensive.
  • methods: A learning-based algorithm trainable from a small amount of data, using Bayesian Neural Networks as regression models of circuit performance; layout-aware optimization is treated as a multi-fidelity problem that exploits correlations with cheaper evaluations, and the schematic-level sizing problem is also addressed.
  • results: On three test cases, the approach is more efficient than conventional baselines and state-of-the-art algorithms for post-layout performance optimization.
    Abstract The high simulation cost has been a bottleneck of practical analog/mixed-signal design automation. Many learning-based algorithms require thousands of simulated data points, which is impractical for expensive to simulate circuits. We propose a learning-based algorithm that can be trained using a small amount of data and, therefore, scalable to tasks with expensive simulations. Our efficient algorithm solves the post-layout performance optimization problem where simulations are known to be expensive. Our comprehensive study also solves the schematic-level sizing problem. For efficient optimization, we utilize Bayesian Neural Networks as a regression model to approximate circuit performance. For layout-aware optimization, we handle the problem as a multi-fidelity optimization problem and improve efficiency by exploiting the correlations from cheaper evaluations. We present three test cases to demonstrate the efficiency of our algorithms. Our tests prove that the proposed approach is more efficient than conventional baselines and state-of-the-art algorithms.

Have we built machines that think like people?

  • paper_url: http://arxiv.org/abs/2311.16093
  • repo_url: https://github.com/lsbuschoff/multimodal
  • paper_authors: Luca M. Schulze Buschoff, Elif Akata, Matthias Bethge, Eric Schulz
  • for: Evaluating how well modern vision-based large language models perform in intuitive physics, causal reasoning, and intuitive psychology.
  • methods: A series of controlled experiments probing the models' grasp of complex physical interactions, causal relationships, and intuitive understanding of others' preferences.
  • results: The models process and interpret visual data proficiently but still fall short of human capabilities: they show only a rudimentary understanding of physical laws and causal relationships, lack deeper insight, and fail altogether on tasks requiring an intuitive theory of mind.
    Abstract A chief goal of artificial intelligence is to build machines that think like people. Yet it has been argued that deep neural network architectures fail to accomplish this. Researchers have asserted these models' limitations in the domains of causal reasoning, intuitive physics, and intuitive psychology. Yet recent advancements, namely the rise of large language models, particularly those designed for visual processing, have rekindled interest in the potential to emulate human-like cognitive abilities. This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning, and intuitive psychology. Through a series of controlled experiments, we investigate the extent to which these modern models grasp complex physical interactions, causal relationships, and intuitive understanding of others' preferences. Our findings reveal that, while these models demonstrate a notable proficiency in processing and interpreting visual data, they still fall short of human capabilities in these areas. The models exhibit a rudimentary understanding of physical laws and causal relationships, but their performance is hindered by a lack of deeper insights-a key aspect of human cognition. Furthermore, in tasks requiring an intuitive theory of mind, the models fail altogether. Our results emphasize the need for integrating more robust mechanisms for understanding causality, physical dynamics, and social cognition into modern-day, vision-based language models, and point out the importance of cognitively-inspired benchmarks.

XLB: Distributed Multi-GPU Lattice Boltzmann Simulation Framework for Differentiable Scientific Machine Learning

  • paper_url: http://arxiv.org/abs/2311.16080
  • repo_url: https://github.com/autodesk/xlb
  • paper_authors: Mohammadmehdi Ataei, Hesam Salehipour
  • for: Introducing XLB, a Python-based differentiable lattice Boltzmann method (LBM) library that scales across CPU, multi-GPU, and distributed multi-GPU systems.
  • methods: Built on the JAX framework with an architecture emphasizing accessibility, extensibility, and computational performance; it can be augmented with new boundary conditions, collision models, or simulation capabilities, and integrates with JAX's machine-learning ecosystem and automatic differentiation for physics-based machine learning, optimization, and inverse problems.
  • results: XLB has been scaled to simulations with billions of cells, reaching giga-scale lattice updates per second; it is released under the Apache-2.0 license.
    Abstract The lattice Boltzmann method (LBM) has emerged as a prominent technique for solving fluid dynamics problems due to its algorithmic potential for computational scalability. We introduce XLB framework, a Python-based differentiable LBM library which harnesses the capabilities of the JAX framework. The architecture of XLB is predicated upon ensuring accessibility, extensibility, and computational performance, enabling scaling effectively across CPU, multi-GPU, and distributed multi-GPU systems. The framework can be readily augmented with novel boundary conditions, collision models, or simulation capabilities. XLB offers the unique advantage of integration with JAX's extensive machine learning echosystem, and the ability to utilize automatic differentiation for tackling physics-based machine learning, optimization, and inverse problems. XLB has been successfully scaled to handle simulations with billions of cells, achieving giga-scale lattice updates per second. XLB is released under the permissive Apache-2.0 license and is available on GitHub at https://github.com/Autodesk/XLB.

DGR: Tackling Drifted and Correlated Noise in Quantum Error Correction via Decoding Graph Re-weighting

  • paper_url: http://arxiv.org/abs/2311.16214
  • repo_url: None
  • paper_authors: Hanrui Wang, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jonathan Baker, Frederic T. Chong, Song Han
  • for: Improving quantum error correction (QEC) decoding on real hardware, where noise drifts over time and is correlated, by keeping the decoding graph aligned with the actual error statistics.
  • methods: DGR, an efficient decoding-graph edge re-weighting strategy with no quantum overhead: occurrences of edges and edge pairs in decoded matchings are counted to statistically estimate up-to-date edge probabilities and correlations, followed by an alignment re-weighting and a correlation re-weighting step (a sketch of the alignment step follows the abstract).
  • results: Extensive evaluation on surface and honeycomb codes shows DGR reduces the logical error rate by 3.6x under average-case noise mismatch, with over 5000x improvement under worst-case mismatch.
    Abstract Quantum hardware suffers from high error rates and noise, which makes directly running applications on them ineffective. Quantum Error Correction (QEC) is a critical technique towards fault tolerance which encodes the quantum information distributively in multiple data qubits and uses syndrome qubits to check parity. Minimum-Weight-Perfect-Matching (MWPM) is a popular QEC decoder that takes the syndromes as input and finds the matchings between syndromes that infer the errors. However, there are two paramount challenges for MWPM decoders. First, as noise in real quantum systems can drift over time, there is a potential misalignment with the decoding graph's initial weights, leading to a severe performance degradation in the logical error rates. Second, while the MWPM decoder addresses independent errors, it falls short when encountering correlated errors typical on real hardware, such as those in the 2Q depolarizing channel. We propose DGR, an efficient decoding graph edge re-weighting strategy with no quantum overhead. It leverages the insight that the statistics of matchings across decoding iterations offer rich information about errors on real quantum hardware. By counting the occurrences of edges and edge pairs in decoded matchings, we can statistically estimate the up-to-date probabilities of each edge and the correlations between them. The reweighting process includes two vital steps: alignment re-weighting and correlation re-weighting. The former updates the MWPM weights based on statistics to align with actual noise, and the latter adjusts the weight considering edge correlations. Extensive evaluations on surface code and honeycomb code under various settings show that DGR reduces the logical error rate by 3.6x on average-case noise mismatch with exceeding 5000x improvement under worst-case mismatch.
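A sketch of the alignment re-weighting step, assuming standard MWPM weights w = log((1 - p)/p) and simple frequency smoothing; the paper's exact estimator and the correlation re-weighting over edge pairs are not shown.

```python
import math
from collections import Counter

def alignment_reweight(decoded_matchings, num_rounds, prior_weights, blend=0.5):
    """Alignment re-weighting (sketch).

    `decoded_matchings` is a list of matchings, each a set of decoding-graph
    edges chosen by MWPM in one decoding round. Edge occurrence frequencies
    give an up-to-date estimate of each edge's error probability, from which
    new MWPM weights w = log((1 - p)/p) are computed and blended with the
    prior weights derived from the initial noise model.
    """
    counts = Counter(e for m in decoded_matchings for e in m)
    new_weights = {}
    for edge, w_prior in prior_weights.items():
        p_hat = (counts[edge] + 1) / (num_rounds + 2)      # smoothed frequency
        w_hat = math.log((1 - p_hat) / p_hat)
        new_weights[edge] = blend * w_hat + (1 - blend) * w_prior
    return new_weights
```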

Metric Space Magnitude for Evaluating Unsupervised Representation Learning

  • paper_url: http://arxiv.org/abs/2311.16054
  • repo_url: None
  • paper_authors: Katharina Limbeck, Rayna Andreeva, Rik Sarkar, Bastian Rieck
  • for: Evaluating unsupervised representation learning, and dimensionality reduction in particular, by capturing both geometric and topological properties of data through the magnitude of a metric space.
  • methods: A novel notion of dissimilarity between magnitude functions of finite metric spaces, yielding a quality measure for dimensionality reduction that is provably stable under perturbations of the data, efficiently computable, and enables a rigorous multi-scale comparison of embeddings (a sketch of the magnitude function follows the abstract).
  • results: The measure's utility is demonstrated in an experimental suite spanning different domains and tasks, including the comparison of data visualisations.
    Abstract The magnitude of a metric space was recently established as a novel invariant, providing a measure of the `effective size' of a space across multiple scales. By capturing both geometrical and topological properties of data, magnitude is poised to address challenges in unsupervised representation learning tasks. We formalise a novel notion of dissimilarity between magnitude functions of finite metric spaces and use them to derive a quality measure for dimensionality reduction tasks. Our measure is provably stable under perturbations of the data, can be efficiently calculated, and enables a rigorous multi-scale comparison of embeddings. We show the utility of our measure in an experimental suite that comprises different domains and tasks, including the comparison of data visualisations.
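The magnitude of a finite metric space at scale t can be computed directly from the pairwise distances; a sketch follows. The mean absolute difference used at the end is an illustrative stand-in for the paper's dissimilarity between magnitude functions.

```python
import numpy as np

def magnitude_function(X, ts):
    """Magnitude of the finite metric space X (rows = points) across scales ts.

    For scale t, the similarity matrix is Z = exp(-t * D) with D the pairwise
    distance matrix; the magnitude is the sum of the entries of Z^{-1}.
    """
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    out = []
    for t in ts:
        Z = np.exp(-t * D)
        out.append(np.linalg.inv(Z).sum())
    return np.array(out)

# compare two embeddings of the same data by how their magnitude functions differ
ts = np.linspace(0.1, 5.0, 25)
emb_a = np.random.randn(100, 2)     # e.g. a 2-D visualisation
emb_b = np.random.randn(100, 10)    # e.g. a higher-dimensional embedding
diff = np.abs(magnitude_function(emb_a, ts) - magnitude_function(emb_b, ts)).mean()
```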

A Neural Framework for Generalized Causal Sensitivity Analysis

  • paper_url: http://arxiv.org/abs/2311.16026
  • repo_url: None
  • paper_authors: Dennis Frauen, Fergus Imrie, Alicia Curth, Valentyn Melnychuk, Stefan Feuerriegel, Mihaela van der Schaar
  • for: Generalized causal sensitivity analysis, i.e., drawing causal conclusions under unobserved confounding with mathematical guarantees.
  • methods: NeuralCSA, a neural framework that learns a latent distribution shift corresponding to a treatment intervention using two conditional normalizing flows; it is compatible with a large class of sensitivity models (the marginal sensitivity model, f-sensitivity models, and Rosenbaum's sensitivity model), binary and continuous treatments, and different causal queries including (conditional) average treatment effects and simultaneous effects on multiple outcomes.
  • results: Theoretical guarantees that NeuralCSA infers valid bounds on the causal query of interest, confirmed empirically on both simulated and real-world data.
    Abstract Unobserved confounding is common in many applications, making causal inference from observational data challenging. As a remedy, causal sensitivity analysis is an important tool to draw causal conclusions under unobserved confounding with mathematical guarantees. In this paper, we propose NeuralCSA, a neural framework for generalized causal sensitivity analysis. Unlike previous work, our framework is compatible with (i) a large class of sensitivity models, including the marginal sensitivity model, f-sensitivity models, and Rosenbaum's sensitivity model; (ii) different treatment types (i.e., binary and continuous); and (iii) different causal queries, including (conditional) average treatment effects and simultaneous effects on multiple outcomes. The generality of \frameworkname is achieved by learning a latent distribution shift that corresponds to a treatment intervention using two conditional normalizing flows. We provide theoretical guarantees that NeuralCSA is able to infer valid bounds on the causal query of interest and also demonstrate this empirically using both simulated and real-world data.

Scheduling and Communication Schemes for Decentralized Federated Learning

  • paper_url: http://arxiv.org/abs/2311.16021
  • repo_url: None
  • paper_authors: Bahaa-Eldin Ali Abdelghany, Ana Fernández-Vilas, Manuel Fernández-Veiga, Nashwa El-Bendary, Ammar M. Hassan, Walid M. Abdelmoez
  • for: Improving the scalability of federated learning (FL) when a single central server cannot maintain connectivity with all clients.
  • methods: A decentralized federated learning (DFL) model based on stochastic gradient descent (SGD) over a network of agents with arbitrary topology, together with three proposed scheduling policies for the communication between clients and parallel servers (a sketch of one decentralized SGD round follows the abstract).
  • results: In a totally decentralized implementation of SGD, the proposed scheduling policies affect both the speed of convergence and the final global model.
    Abstract Federated learning (FL) is a distributed machine learning paradigm in which a large number of clients coordinate with a central server to learn a model without sharing their own training data. One central server is not enough, due to problems of connectivity with clients. In this paper, a decentralized federated learning (DFL) model with the stochastic gradient descent (SGD) algorithm has been introduced, as a more scalable approach to improve the learning performance in a network of agents with arbitrary topology. Three scheduling policies for DFL have been proposed for communications between the clients and the parallel servers, and the convergence, accuracy, and loss have been tested in a totally decentralized mplementation of SGD. The experimental results show that the proposed scheduling polices have an impact both on the speed of convergence and in the final global model.
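A sketch of one decentralized SGD round with gossip averaging over a fixed topology. The mixing matrix stands in for a scheduling policy (which links are active in a round); the paper's client/parallel-server setup and its three policies may differ in detail.

```python
import numpy as np

def decentralized_sgd_round(params, grads, mixing_matrix, lr=0.1):
    """One round of decentralized SGD over a network of agents (sketch).

    Each agent takes a local gradient step, then averages its parameters with
    its neighbours according to a doubly-stochastic mixing matrix W derived
    from the network topology; a scheduling policy determines which entries
    of W are active in a given round.
    """
    stepped = params - lr * grads                 # local SGD steps, one row per agent
    return mixing_matrix @ stepped                # gossip averaging with neighbours

# toy: 4 agents on a ring, uniform weights over self + two neighbours
W = np.array([[1/3, 1/3, 0.0, 1/3],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [1/3, 0.0, 1/3, 1/3]])
params = np.random.randn(4, 5)                    # each agent holds a 5-dim model
grads = np.random.randn(4, 5)
params = decentralized_sgd_round(params, grads, W)
```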

Using Decentralized Aggregation for Federated Learning with Differential Privacy

  • paper_url: http://arxiv.org/abs/2311.16008
  • repo_url: None
  • paper_authors: Hadeel Abd El-Kareem, Abd El-Moaty Saleh, Ana Fernández-Vilas, Manuel Fernández-Veiga, asser El-Sonbaty
  • for: An experimental environment for federated learning (FL) with differential privacy (DP), to preserve user privacy in settings that combine exchanged communications, large databases, and distributed, collaborative (P2P) machine learning.
  • methods: FL, which already retains data at the local node, is combined with DP to provide a stronger level of privacy against attacks such as membership inference, using benchmark datasets (a sketch of the per-client Gaussian mechanism follows the abstract).
  • results: A classification example shows that the choice of DP parameters and techniques is central to the trade-off between privacy and utility.
    Abstract Nowadays, the ubiquitous usage of mobile devices and networks have raised concerns about the loss of control over personal data and research advance towards the trade-off between privacy and utility in scenarios that combine exchange communications, big databases and distributed and collaborative (P2P) Machine Learning techniques. On the other hand, although Federated Learning (FL) provides some level of privacy by retaining the data at the local node, which executes a local training to enrich a global model, this scenario is still susceptible to privacy breaches as membership inference attacks. To provide a stronger level of privacy, this research deploys an experimental environment for FL with Differential Privacy (DP) using benchmark datasets. The obtained results show that the election of parameters and techniques of DP is central in the aforementioned trade-off between privacy and utility by means of a classification example.
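A sketch of the per-client Gaussian mechanism commonly used to combine FL with DP: clip each client's update to bound sensitivity, then add calibrated noise before aggregation. The clipping threshold and noise multiplier are exactly the kind of parameters whose choice the study highlights; the precise mechanism and privacy accounting used in the paper may differ.

```python
import numpy as np

def dp_client_update(local_update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Privatize a single client's model update with the Gaussian mechanism (sketch)."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(local_update)
    clipped = local_update * min(1.0, clip_norm / (norm + 1e-12))   # bound sensitivity
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape)
    return clipped + noise

# decentralized aggregation: a node averages the privatized updates it receives
updates = [np.random.randn(10) * 0.1 for _ in range(5)]
aggregated = np.mean([dp_client_update(u) for u in updates], axis=0)
```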

Improved Data Generation for Enhanced Asset Allocation: A Synthetic Dataset Approach for the Fixed Income Universe

  • paper_url: http://arxiv.org/abs/2311.16004
  • repo_url: None
  • paper_authors: Szymon Kubiak, Tillman Weyde, Oleksandr Galkin, Dan Philps, Ram Gopal
  • for: Generating synthetic datasets tailored to assess asset allocation methods and construct portfolios within the fixed-income universe.
  • methods: The CorrGAN model is enhanced to generate synthetic correlation matrices, and an Encoder-Decoder model then samples additional data conditioned on a given correlation matrix.
  • results: The resulting synthetic dataset enables in-depth analyses of asset allocation methods across diverse asset universes; a case study shows how it improves portfolios constructed within a simulation-based asset allocation process.
    Abstract We present a novel process for generating synthetic datasets tailored to assess asset allocation methods and construct portfolios within the fixed income universe. Our approach begins by enhancing the CorrGAN model to generate synthetic correlation matrices. Subsequently, we propose an Encoder-Decoder model that samples additional data conditioned on a given correlation matrix. The resulting synthetic dataset facilitates in-depth analyses of asset allocation methods across diverse asset universes. Additionally, we provide a case study that exemplifies the use of the synthetic dataset to improve portfolios constructed within a simulation-based asset allocation process.

Closing the ODE-SDE gap in score-based diffusion models through the Fokker-Planck equation

  • paper_url: http://arxiv.org/abs/2311.15996
  • repo_url: None
  • paper_authors: Teo Deveney, Jan Stanczuk, Lisa Maria Kreusser, Chris Budd, Carola-Bibiane Schönlieb
  • for: Characterizing the dynamics and approximations that arise when training score-based diffusion models, and explaining the gap between their ODE- and SDE-based samplers.
  • methods: The true SDE dynamics, the neural approximations, the resulting approximate particle dynamics, and their associated Fokker-Planck equations are analyzed systematically; the ODE-SDE difference is linked to a Fokker-Planck residual, and a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions is derived (the two samplers are written out after the abstract).
  • results: Conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions; adding the Fokker-Planck residual as a regularisation term closes this gap and can improve the ODE-generated distribution, though possibly at the cost of degraded SDE sample quality.
    Abstract Score-based diffusion models have emerged as one of the most promising frameworks for deep generative modelling, due to their state-of-the art performance in many generation tasks while relying on mathematical foundations such as stochastic differential equations (SDEs) and ordinary differential equations (ODEs). Empirically, it has been reported that ODE based samples are inferior to SDE based samples. In this paper we rigorously describe the range of dynamics and approximations that arise when training score-based diffusion models, including the true SDE dynamics, the neural approximations, the various approximate particle dynamics that result, as well as their associated Fokker--Planck equations and the neural network approximations of these Fokker--Planck equations. We systematically analyse the difference between the ODE and SDE dynamics of score-based diffusion models, and link it to an associated Fokker--Planck equation. We derive a theoretical upper bound on the Wasserstein 2-distance between the ODE- and SDE-induced distributions in terms of a Fokker--Planck residual. We also show numerically that conventional score-based diffusion models can exhibit significant differences between ODE- and SDE-induced distributions which we demonstrate using explicit comparisons. Moreover, we show numerically that reducing the Fokker--Planck residual by adding it as an additional regularisation term leads to closing the gap between ODE- and SDE-induced distributions. Our experiments suggest that this regularisation can improve the distribution generated by the ODE, however that this can come at the cost of degraded SDE sample quality.
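For reference, the two samplers being compared, written in the usual score-SDE notation with forward SDE dx = f(x,t) dt + g(t) dW_t and a learned score s_θ(x,t) ≈ ∇_x log p_t(x). These are the standard forms from the score-SDE framework: with the exact score both induce the same marginals, and the paper bounds their gap under an approximate score by a Fokker-Planck residual.

```latex
\text{reverse-time SDE sampler:}\quad
\mathrm{d}x = \bigl[\, f(x,t) - g(t)^2\, s_\theta(x,t) \,\bigr]\,\mathrm{d}t
              + g(t)\,\mathrm{d}\bar{W}_t,
\qquad
\text{probability-flow ODE sampler:}\quad
\frac{\mathrm{d}x}{\mathrm{d}t} = f(x,t) - \tfrac{1}{2}\, g(t)^2\, s_\theta(x,t).
```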

Sensitivity-Based Layer Insertion for Residual and Feedforward Neural Networks

  • paper_url: http://arxiv.org/abs/2311.15995
  • repo_url: https://github.com/leoniekreis/layer_insertion_sensitivity_based
  • paper_authors: Evelyn Herberg, Roland Herzog, Frederik Köhne, Leonie Kreis, Anton Schiela
  • for: Making neural-network training more efficient and less dependent on manually choosing a fixed architecture before training.
  • methods: A systematic method to insert new layers during training, borrowing techniques from constrained optimization and based on first-order sensitivity information of the objective with respect to the virtual parameters that additional layers would offer; it applies to fully connected feedforward networks and residual neural networks.
  • results: In numerical experiments, the sensitivity-based layer insertion technique exhibits improved training decay compared to not inserting the layer, at reduced computational effort compared to inserting the layer from the beginning.
    Abstract The training of neural networks requires tedious and often manual tuning of the network architecture. We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training. Our technique borrows techniques from constrained optimization and is based on first-order sensitivity information of the objective with respect to the virtual parameters that additional layers, if inserted, would offer. We consider fully connected feedforward networks with selected activation functions as well as residual neural networks. In numerical experiments, the proposed sensitivity-based layer insertion technique exhibits improved training decay, compared to not inserting the layer. Furthermore, the computational effort is reduced in comparison to inserting the layer from the beginning. The code is available at \url{https://github.com/LeonieKreis/layer_insertion_sensitivity_based}.

Should We Learn Most Likely Functions or Parameters?

  • paper_url: http://arxiv.org/abs/2311.15990
  • repo_url: https://github.com/activatedgeek/function-space-map
  • paper_authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson
  • for: Comparing standard maximum a posteriori (MAP) estimation over parameters with directly estimating the most likely function implied by the model and the data.
  • methods: A function-space MAP estimation procedure, together with an analysis of when it is well-behaved and a scalable approximation; the study notes that the most likely parameters do not generally correspond to the most likely induced function, and that a model can be re-parametrized so that any parameter setting maximizes the parameter posterior.
  • results: The naive procedure can lead to pathological solutions with neural networks, but under the derived conditions function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.
    Abstract Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generally correspond to the most likely function induced by the parameter posterior. In fact, we can re-parametrize a model such that any setting of parameters can maximize the parameter posterior. As an alternative, we investigate the benefits and drawbacks of directly estimating the most likely function implied by the model and the data. We show that this procedure leads to pathological solutions when using neural networks and prove conditions under which the procedure is well-behaved, as well as a scalable approximation. Under these conditions, we find that function-space MAP estimation can lead to flatter minima, better generalization, and improved robustness to overfitting.

Towards Transfer Learning for Large-Scale Image Classification Using Annealing-based Quantum Boltzmann Machines

  • paper_url: http://arxiv.org/abs/2311.15966
  • repo_url: None
  • paper_authors: Daniëlle Schuman, Leo Sünkel, Philipp Altmann, Jonas Stein, Christoph Roch, Thomas Gabor, Claudia Linnhoff-Popien
  • for: This paper is written for image classification tasks, specifically using quantum transfer learning (QTL) and quantum annealing (QA) to improve the performance of classification on large-scale data such as medical images.
  • methods: The paper proposes using annealing-based Quantum Boltzmann Machines as part of a hybrid quantum-classical pipeline for supervised training to learn the classification of real-world data. Simulated Annealing is used as a stand-in for actual QA.
  • results: The paper demonstrates the approach on the three-class COVID-CT-MD dataset, a collection of lung Computed Tomography (CT) scan slices, and compares the performance of the quantum-classical approach to a classical baseline. The results show that the proposed approach consistently outperforms the classical baseline in terms of test accuracy and AUC-ROC-Score, and requires fewer training epochs to achieve this.
    Abstract Quantum Transfer Learning (QTL) recently gained popularity as a hybrid quantum-classical approach for image classification tasks by efficiently combining the feature extraction capabilities of large Convolutional Neural Networks with the potential benefits of Quantum Machine Learning (QML). Existing approaches, however, only utilize gate-based Variational Quantum Circuits for the quantum part of these procedures. In this work we present an approach to employ Quantum Annealing (QA) in QTL-based image classification. Specifically, we propose using annealing-based Quantum Boltzmann Machines as part of a hybrid quantum-classical pipeline to learn the classification of real-world, large-scale data such as medical images through supervised training. We demonstrate our approach by applying it to the three-class COVID-CT-MD dataset, a collection of lung Computed Tomography (CT) scan slices. Using Simulated Annealing as a stand-in for actual QA, we compare our method to classical transfer learning, using a neural network of the same order of magnitude, to display its improved classification performance. We find that our approach consistently outperforms its classical baseline in terms of test accuracy and AUC-ROC-Score and needs less training epochs to do this.

Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift

  • paper_url: http://arxiv.org/abs/2311.15961
  • repo_url: None
  • paper_authors: Jiawei Ge, Shange Tang, Jianqing Fan, Cong Ma, Chi Jin
  • for: Determining which algorithms are most effective for out-of-distribution (OOD) generalization under covariate shift.
  • methods: Classical Maximum Likelihood Estimation (MLE) using only source data, without any modification, analyzed in the well-specified setting for a rich class of parametric models and instantiated for linear regression, logistic regression, and phase retrieval (a toy comparison with weighted MLE follows the abstract).
  • results: MLE is minimax optimal for covariate shift in the well-specified setting (up to a constant factor) and requires no boundedness condition on the density ratio; under misspecification, MLE is no longer optimal and the Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.
    Abstract A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization -- generalizing to target data whose distribution differs from that of source data. Despite its significant importance, the fundamental question of ``what are the most effective algorithms for OOD generalization'' remains open even under the standard setting of covariate shift. This paper addresses this fundamental question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves the minimax optimality for covariate shift under the well-specified setting. That is, no algorithm performs better than MLE in this setting (up to a constant factor), justifying MLE is all you need. Our result holds for a very rich class of parametric models, and does not require any boundedness condition on the density ratio. We illustrate the wide applicability of our framework by instantiating it to three concrete examples -- linear regression, logistic regression, and phase retrieval. This paper further complement the study by proving that, under the misspecified setting, MLE is no longer the optimal choice, whereas Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.
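A toy contrast between plain MLE and the weighted estimator on a well-specified linear model under covariate shift. Gaussian covariates are assumed so the density ratio is available in closed form; this illustrates the two estimators being compared, not the paper's minimax results.

```python
import numpy as np

# Well-specified linear model y = x @ beta + noise; source and target covariate
# distributions differ (covariate shift). MLE is plain least squares on source
# data; MWLE re-weights each source point by p_target(x) / p_source(x).
rng = np.random.default_rng(0)
beta = np.array([1.0, -2.0])

x_src = rng.normal(0.0, 1.0, size=(500, 2))           # source covariates
y_src = x_src @ beta + 0.1 * rng.normal(size=500)

def gaussian_density_ratio(x, mu_t, mu_s, sigma=1.0):
    # p_target / p_source for isotropic Gaussians differing only in mean
    return np.exp((x @ (mu_t - mu_s) - 0.5 * (mu_t @ mu_t - mu_s @ mu_s)) / sigma**2)

w = gaussian_density_ratio(x_src, mu_t=np.array([1.0, 1.0]), mu_s=np.zeros(2))

beta_mle = np.linalg.lstsq(x_src, y_src, rcond=None)[0]                 # plain MLE
W = np.diag(w)
beta_mwle = np.linalg.solve(x_src.T @ W @ x_src, x_src.T @ W @ y_src)   # weighted MLE
```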

GloNets: Globally Connected Neural Networks

  • paper_url: http://arxiv.org/abs/2311.15947
  • repo_url: https://github.com/antoniodicecco/glonet
  • paper_authors: Antonio Di Cecco, Carlo Metta, Marco Fantozzi, Francesco Morandin, Maurizio Parton
  • for: Overcoming depth-related performance degradation in deep neural networks.
  • methods: Globally Connected Neural Networks (GloNet), an architecture designed to be superimposed on any model: the network's head uniformly receives information from all parts of the network regardless of their level of abstraction, enhancing depth without increasing complexity or reducing performance (a minimal sketch follows the abstract).
  • results: GloNet self-regulates information flow during training, reducing the influence of less effective deeper layers and allowing stable training irrespective of network depth; experiments show resilience to depth-related learning challenges, suggesting GloNet as a strong alternative to architectures such as ResNets.
    Abstract Deep learning architectures suffer from depth-related performance degradation, limiting the effective depth of neural networks. Approaches like ResNet are able to mitigate this, but they do not completely eliminate the problem. We introduce Globally Connected Neural Networks (GloNet), a novel architecture overcoming depth-related issues, designed to be superimposed on any model, enhancing its depth without increasing complexity or reducing performance. With GloNet, the network's head uniformly receives information from all parts of the network, regardless of their level of abstraction. This enables GloNet to self-regulate information flow during training, reducing the influence of less effective deeper layers, and allowing for stable training irrespective of network depth. This paper details GloNet's design, its theoretical basis, and a comparison with existing similar architectures. Experiments show GloNet's self-regulation ability and resilience to depth-related learning challenges, like performance degradation. Our findings suggest GloNet as a strong alternative to traditional architectures like ResNets.
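A minimal sketch of the global connection. Whether GloNet aggregates by summation or another operator is an assumption here; the key point illustrated is that the head sees every block's output, not only the last one, so ineffective deep blocks can be down-weighted during training.

```python
import torch

class GloBlock(torch.nn.Module):
    def __init__(self, d):
        super().__init__()
        self.layer = torch.nn.Sequential(torch.nn.Linear(d, d), torch.nn.ReLU())

    def forward(self, x):
        return self.layer(x)

class GloNetSketch(torch.nn.Module):
    """Head receives an aggregation of every block's output (sketch)."""
    def __init__(self, d_in, d_hidden, n_blocks, n_out):
        super().__init__()
        self.stem = torch.nn.Linear(d_in, d_hidden)
        self.blocks = torch.nn.ModuleList([GloBlock(d_hidden) for _ in range(n_blocks)])
        self.head = torch.nn.Linear(d_hidden, n_out)

    def forward(self, x):
        h = self.stem(x)
        collected = h                      # global connection accumulates all levels
        for block in self.blocks:
            h = block(h)
            collected = collected + h      # head sees every level of abstraction
        return self.head(collected)

model = GloNetSketch(d_in=32, d_hidden=64, n_blocks=20, n_out=10)
logits = model(torch.randn(8, 32))
```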

Over-Squashing in Riemannian Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.15945
  • repo_url: None
  • paper_authors: Julia Balla
  • for: Mitigating over-squashing in graph neural networks (GNNs), where node features become insensitive to information from distant nodes in the graph.
  • methods: Rather than rewiring the graph, the embedding space is changed: Hyperbolic GNNs (HGNNs) are generalized to Riemannian manifolds of variable curvature whose geometry is faithful to the graph's topology, and bounds on the sensitivity of node features are derived as the number of layers increases.
  • results: The analysis yields promising theoretical and empirical results for alleviating over-squashing in graphs with negative curvature.
    Abstract Most graph neural networks (GNNs) are prone to the phenomenon of over-squashing in which node features become insensitive to information from distant nodes in the graph. Recent works have shown that the topology of the graph has the greatest impact on over-squashing, suggesting graph rewiring approaches as a suitable solution. In this work, we explore whether over-squashing can be mitigated through the embedding space of the GNN. In particular, we consider the generalization of Hyperbolic GNNs (HGNNs) to Riemannian manifolds of variable curvature in which the geometry of the embedding space is faithful to the graph's topology. We derive bounds on the sensitivity of the node features in these Riemannian GNNs as the number of layers increases, which yield promising theoretical and empirical results for alleviating over-squashing in graphs with negative curvature.

Physics-informed neural networks for transformed geometries and manifolds

  • paper_url: http://arxiv.org/abs/2311.15940
  • repo_url: https://github.com/samuelburbulla/trafo-pinn
  • paper_authors: Samuel Burbulla
  • for: Modeling complex physical systems, in particular systems with deformed geometries.
  • methods: integrate geometric transformations within physics-informed neural networks (PINNs) to robustly accommodate geometric variations, including diffeomorphism as a mapping of a reference domain and adapting the derivative computation of the physics-informed loss function.
  • results: enhance the flexibility of PINNs under geometric variations, demonstrated through several examples including Eikonal equation on Archimedean spiral, Poisson problem on surface manifold, Incompressible Stokes flow in deformed tube, and shape optimization with Laplace operator.
    Abstract Physics-informed neural networks (PINNs) effectively embed physical principles into machine learning, but often struggle with complex or alternating geometries. We propose a novel method for integrating geometric transformations within PINNs to robustly accommodate geometric variations. Our method incorporates a diffeomorphism as a mapping of a reference domain and adapts the derivative computation of the physics-informed loss function. This generalizes the applicability of PINNs not only to smoothly deformed domains, but also to lower-dimensional manifolds and allows for direct shape optimization while training the network. We demonstrate the effectivity of our approach on several problems: (i) Eikonal equation on Archimedean spiral, (ii) Poisson problem on surface manifold, (iii) Incompressible Stokes flow in deformed tube, and (iv) Shape optimization with Laplace operator. Through these examples, we demonstrate the enhanced flexibility over traditional PINNs, especially under geometric variations. The proposed framework presents an outlook for training deep neural operators over parametrized geometries, paving the way for advanced modeling with PDEs on complex geometries in science and engineering.
    摘要 物理学 Informed Neural Networks (PINNs) 能够嵌入物理原理到机器学习中,但经常遇到复杂或交替的几何结构。我们提议一种将几何变换 incorporated 到 PINNs 中,以强健地承受几何变化。我们的方法通过 diffeomorphism 将参照领域映射到Reference domain,并修改物理 Informed 损失函数的导数计算。这将 PINNs 扩展到不仅是平滑做变形领域,还有 Lower-dimensional manifold 和直接在训练网络时进行形状优化。我们通过以下几个问题的示例来证明我们的方法的有效性:(i)Archimedean spiral 上的 Eikonal 方程,(ii)Surface manifold 上的 Poisson 问题,(iii)弯曲管道上的压缩流动,(iv)Laplace 算子上的形状优化。通过这些示例,我们证明了我们的方法在几何变化时的强大灵活性,特别是比传统 PINNs 更加灵活。我们的框架开出了训练深度神经操作符的可能性,为科学和工程中的复杂几何问题培育出了新的模型化方法。
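As a rough illustration of the derivative adaptation described above, the sketch below (my own toy example, not the linked trafo-pinn repository) trains a 1D PINN for $-u''(x)=1$ on a domain obtained by mapping the reference interval $[0,1]$ through an assumed diffeomorphism, recovering physical-coordinate derivatives from reference-coordinate derivatives via the chain rule.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def T(s):  # assumed diffeomorphism from reference coordinate s to physical x
    return s + 0.3 * torch.sin(torch.pi * s)

def physical_derivatives(s):
    """Return u, du/dx and d2u/dx2 evaluated at x = T(s), via the chain rule."""
    x = T(s)
    u = net(x)
    du_ds = torch.autograd.grad(u, s, torch.ones_like(u), create_graph=True)[0]
    dx_ds = torch.autograd.grad(x, s, torch.ones_like(x), create_graph=True)[0]
    du_dx = du_ds / dx_ds
    d2u_dsdx = torch.autograd.grad(du_dx, s, torch.ones_like(du_dx), create_graph=True)[0]
    d2u_dx2 = d2u_dsdx / dx_ds
    return u, du_dx, d2u_dx2

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2000):
    s = torch.rand(128, 1, requires_grad=True)       # collocation points in the reference domain
    _, _, u_xx = physical_derivatives(s)
    pde_loss = ((-u_xx - 1.0) ** 2).mean()           # residual of -u'' = 1 in physical coordinates
    sb = torch.tensor([[0.0], [1.0]])                # reference boundary points
    bc_loss = (net(T(sb)) ** 2).mean()               # u = 0 on the deformed boundary
    loss = pde_loss + 10.0 * bc_loss
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final loss: {loss.item():.4e}")
```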

Towards Responsible Governance of Biological Design Tools

  • paper_url: http://arxiv.org/abs/2311.15936
  • repo_url: None
  • paper_authors: Richard Moulange, Max Langenkamp, Tessa Alexanian, Samuel Curtis, Morgan Livingston
  • for: 这些研究旨在适应生物设计工具(BDT)的快速进步,以提高蛋白质结构和序列预测模型的预测精度和设计能力。
  • methods: 这些研究使用了生成式机器学习技术,以提高BDT的预测精度和设计能力。
  • results: 这些研究获得了对BDT的预测精度和设计能力的新的发现和设计能力,但同时也存在 dual-use 风险,需要更好地考虑公共安全和创新之间的 equilibrio。
    Abstract Recent advancements in generative machine learning have enabled rapid progress in biological design tools (BDTs) such as protein structure and sequence prediction models. The unprecedented predictive accuracy and novel design capabilities of BDTs present new and significant dual-use risks. For example, their predictive accuracy allows biological agents, whether vaccines or pathogens, to be developed more quickly, while the design capabilities could be used to discover drugs or evade DNA screening techniques. Similar to other dual-use AI systems, BDTs present a wicked problem: how can regulators uphold public safety without stifling innovation? We highlight how current regulatory proposals that are primarily tailored toward large language models may be less effective for BDTs, which require fewer computational resources to train and are often developed in an open-source manner. We propose a range of measures to mitigate the risk that BDTs are misused, across the areas of responsible development, risk assessment, transparency, access management, cybersecurity, and investing in resilience. Implementing such measures will require close coordination between developers and governments.
    摘要 Current regulatory proposals, which are primarily tailored towards large language models, may be less effective for BDTs, which require fewer computational resources to train and are often developed in an open-source manner. To mitigate the risk of misuse, we propose a range of measures, including:1. Responsible development: Ensuring that BDTs are developed in a responsible and ethical manner, with consideration for their potential risks and benefits.2. Risk assessment: Conducting thorough risk assessments to identify potential hazards and vulnerabilities in BDTs, and implementing appropriate mitigation measures.3. Transparency: Ensuring that the development and use of BDTs are transparent, so that the public and regulators can understand the potential risks and benefits.4. Access management: Implementing appropriate access controls to prevent unauthorized access to BDTs and their associated data.5. Cybersecurity: Ensuring that BDTs are secure from cyber threats, and implementing appropriate incident response plans.6. Investing in resilience: Investing in the development of resilience strategies to address potential risks and vulnerabilities in BDTs.Implementing these measures will require close coordination between developers and governments, as well as a willingness to adapt to the rapidly evolving landscape of generative machine learning and biological design tools.

The Graph Convolutional Network with Multi-representation Alignment for Drug Synergy Prediction

  • paper_url: http://arxiv.org/abs/2311.16207
  • repo_url: None
  • paper_authors: Xinxing Yang, Genke Yang, Jian Chu
  • for: 预测药物组合的 synergy 效果
  • methods: 使用图 convolutional neural network with multi-representation alignment (GCNMRA) 模型
  • results: 提出了一种适于药物组合 synergy 预测任务的多表示匹配函数,并考虑了 vector 模轮的辐射优化策略,以提高预测结果的准确性和模型的收敛速度。
    Abstract Drug combination refers to the use of two or more drugs to treat a specific disease at the same time. It is currently the mainstream way to treat complex diseases. Compared with single drugs, drug combinations have better efficacy and can better inhibit toxicity and drug resistance. The computational model based on deep learning concatenates the representation of multiple drugs and the corresponding cell line feature as input, and the output is whether the drug combination can have an inhibitory effect on the cell line. However, this strategy of concatenating multiple representations has the following defects: the alignment of drug representation and cell line representation is ignored, resulting in the synergistic relationship not being reflected positionally in the embedding space. Moreover, the alignment measurement function in deep learning cannot be suitable for drug synergy prediction tasks due to differences in input types. Therefore, in this work, we propose a graph convolutional network with multi-representation alignment (GCNMRA) for predicting drug synergy. In the GCNMRA model, we designed a multi-representation alignment function suitable for the drug synergy prediction task so that the positional relationship between drug representations and cell line representation is reflected in the embedding space. In addition, the vector modulus of drug representations and cell line representation is considered to improve the accuracy of calculation results and accelerate model convergence. Finally, many relevant experiments were run on multiple drug synergy datasets to verify the effectiveness of the above innovative elements and the excellence of the GCNMRA model.
    摘要 Drug combinations are the mainstream way to treat complex diseases, offering better efficacy and better suppression of toxicity and drug resistance than single drugs. Deep-learning models typically concatenate the representations of the drugs and the cell-line features, but this ignores the alignment between drug and cell-line representations, so the synergistic relationship is not reflected positionally in the embedding space, and standard alignment measures are unsuited to the mixed input types of this task. The proposed GCNMRA model therefore introduces a multi-representation alignment function tailored to drug synergy prediction and takes the vector moduli of the drug and cell-line representations into account, improving accuracy and accelerating convergence; extensive experiments on multiple drug synergy datasets verify the effectiveness of these elements.

FLASC: A Flare-Sensitive Clustering Algorithm: Extending HDBSCAN* for Detecting Branches in Clusters

  • paper_url: http://arxiv.org/abs/2311.15887
  • repo_url: None
  • paper_authors: D. M. Bot, J. Peeters, J. Liesenborgs, J. Aerts
  • for: 本文提出了一种针对爆发性 clustering 的算法 FLASC,用于检测数据中爆发性的分布模式。
  • methods: 该算法基于 HDBSCAN*,通过后处理步骤 diferenciate 分支在检测到的群集 manifold 中,以添加一种可以发现的模式。 本文提出了两种变体,其中一种是更加计算成本高,另一种是更加具有噪声鲁棒性。
  • results: 作者们通过 synthetic 数据集和实际数据集的实验,证明了 FLASC 算法可以具有类似于 HDBSCAN* 的计算成本和稳定性,并且在数据探索中比 HDBSCAN* 更有优势。
    Abstract We present FLASC, an algorithm for flare-sensitive clustering. Our algorithm builds upon HDBSCAN* -- which provides high-quality density-based clustering performance -- through a post-processing step that differentiates branches within the detected clusters' manifold, adding a type of pattern that can be discovered. Two variants of the algorithm are presented, which trade computational cost for noise robustness. We show that both variants scale similarly to HDBSCAN* in terms of computational cost and provide stable outputs using synthetic data sets, resulting in an efficient flare-sensitive clustering algorithm. In addition, we demonstrate the algorithm's benefit in data exploration over HDBSCAN* clustering on two real-world data sets.
    摘要 我们介绍FLASC算法,用于敏感折叠 clustering。我们的算法基于HDBSCAN*,提供高质量浸泡基于分布 clustering性能,通过一个后处理步骤, differentiate clusters的分支在检测到的群集 manifold 中,添加一种可以发现的模式。我们提出了两个变体的算法,这两个变体在计算成本和噪声鲁棒性之间进行了交换。我们表明,这两个变体在计算成本方面与HDBSCAN*相当,并且在数据集中提供了稳定的输出。此外,我们还证明了该算法在数据探索中的优势,比HDBSCAN* clustering在两个真实世界数据集中表现出更好的效果。

Nodal Hydraulic Head Estimation through Unscented Kalman Filter for Data-driven Leak Localization in Water Networks

  • paper_url: http://arxiv.org/abs/2311.15875
  • repo_url: None
  • paper_authors: Luis Romero-Ben, Paul Irofti, Florin Stoican, Vicenç Puig
  • for: 这个论文是为了提出一种基于Unscented Kalman Filter(UKF)的水分配网络(WDN)中节点 hidraulic head估算方法,并应用于泄漏定位。
  • methods: 该方法使用UKF方法来精细地估算水网的 hidraulic state,并考虑了预测模型以及可用的压力和需求测量。
  • results: 在模拟实际情况下测试表明,该方法可以有效地提高水网的状态估算和数据驱动的泄漏定位。
    Abstract In this paper, we present a nodal hydraulic head estimation methodology for water distribution networks (WDN) based on an Unscented Kalman Filter (UKF) scheme with application to leak localization. The UKF refines an initial estimation of the hydraulic state by considering the prediction model, as well as available pressure and demand measurements. To this end, it provides customized prediction and data assimilation steps. Additionally, the method is enhanced by dynamically updating the prediction function weight matrices. Performance testing on the Modena benchmark under realistic conditions demonstrates the method's effectiveness in enhancing state estimation and data-driven leak localization.
    摘要 在这篇论文中,我们提出了基于Unscented Kalman Filter(UKF)方法的水配网(WDN)节点液压头估算方法,用于泄露定位。UKF利用预测模型以及可用的压力和需求测量,进一步细化初始估算的液压状态。为此,它提供了个性化预测和数据吸收步骤。此外,方法还通过动态更新预测函数权重矩阵来增强性。在模拟实际条件下测试 Modena bench mark,表明该方法可以有效地提高状态估算和数据驱动的泄露定位。
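A minimal sketch of the idea, assuming the filterpy package and a toy 3-node network with sensors at two nodes (the process model, noise levels and numbers are illustrative, not the paper's Modena setup):

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

n_nodes, measured = 3, [0, 2]           # heads at 3 nodes, sensors at nodes 0 and 2

def fx(x, dt):
    return x                             # quasi-static prediction: heads persist between steps

def hx(x):
    return x[measured]                   # sensors read the head at instrumented nodes

points = MerweScaledSigmaPoints(n=n_nodes, alpha=1e-3, beta=2.0, kappa=0.0)
ukf = UnscentedKalmanFilter(dim_x=n_nodes, dim_z=len(measured), dt=1.0,
                            fx=fx, hx=hx, points=points)
ukf.x = np.array([30.0, 28.0, 27.0])     # initial head estimate (m)
ukf.P *= 4.0                             # initial uncertainty
ukf.R = np.eye(len(measured)) * 0.05     # sensor noise
ukf.Q = np.eye(n_nodes) * 0.01           # process noise

true_heads = np.array([29.4, 27.1, 26.6])    # pretend truth with a leak-induced drop at node 1
for _ in range(20):
    z = true_heads[measured] + np.random.normal(0, 0.05, size=len(measured))
    ukf.predict()
    ukf.update(z)

print("estimated heads:", np.round(ukf.x, 2))
```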

A precise symbolic emulator of the linear matter power spectrum

  • paper_url: http://arxiv.org/abs/2311.15865
  • repo_url: https://github.com/deaglanbartlett/symbolic_pofk
  • paper_authors: Deaglan J. Bartlett, Lukas Kammerer, Gabriel Kronberger, Harry Desmond, Pedro G. Ferreira, Benjamin D. Wandelt, Bogdan Burlacu, David Alonso, Matteo Zennaro
  • for: 这个论文主要是为了计算宇宙学参数的物质能谱,以便更好地进行宇宙学分析。
  • methods: 这个论文使用了生物进化程序基于 симвоlic regression 框架,以探索宇宙学参数下的可能的数学表达,以approximate 宇宙学参数和 $\sigma_8$。
  • results: The paper obtains an analytic approximation to the linear power spectrum with a root-mean-squared fractional error of 0.2%, together with a similarly simple expression that approximates $\sigma_8$ to within 0.4% over the same range of cosmologies; physical interpretations are given for the various terms, and the $\sigma_8$ expression is easily inverted to recover $A_{\rm s}$.
    Abstract Computing the matter power spectrum, $P(k)$, as a function of cosmological parameters can be prohibitively slow in cosmological analyses, hence emulating this calculation is desirable. Previous analytic approximations are insufficiently accurate for modern applications, so black-box, uninterpretable emulators are often used. We utilise an efficient genetic programming based symbolic regression framework to explore the space of potential mathematical expressions which can approximate the power spectrum and $\sigma_8$. We learn the ratio between an existing low-accuracy fitting function for $P(k)$ and that obtained by solving the Boltzmann equations and thus still incorporate the physics which motivated this earlier approximation. We obtain an analytic approximation to the linear power spectrum with a root mean squared fractional error of 0.2% between $k = 9\times10^{-3} - 9 \, h\,{\rm Mpc}^{-1}$ and across a wide range of cosmological parameters, and we provide physical interpretations for various terms in the expression. We also provide a simple analytic approximation for $\sigma_8$ with a similar accuracy, with a root mean squared fractional error of just 0.4% when evaluated across the same range of cosmologies. This function is easily invertible to obtain $A_{\rm s}$ as a function of $\sigma_8$ and the other cosmological parameters, if preferred. It is possible to obtain symbolic approximations to a seemingly complex function at a precision required for current and future cosmological analyses without resorting to deep-learning techniques, thus avoiding their black-box nature and large number of parameters. Our emulator will be usable long after the codes on which numerical approximations are built become outdated.
    摘要 Computing the matter power spectrum $P(k)$ as a function of cosmological parameters can be prohibitively slow, so emulation is desirable, but earlier analytic approximations are not accurate enough and black-box emulators are hard to interpret. Using a genetic-programming symbolic regression framework, the authors learn the ratio between an existing low-accuracy fitting function for $P(k)$ and the Boltzmann-equation solution, retaining the physics behind the earlier approximation. The resulting analytic expression reproduces the linear power spectrum to a root-mean-squared fractional error of 0.2% over $k = 9\times10^{-3} - 9 \, h\,{\rm Mpc}^{-1}$ and a wide range of cosmologies, and a similarly simple expression approximates $\sigma_8$ to 0.4% and can be inverted to give $A_{\rm s}$. Symbolic emulators of this precision avoid the black-box nature and large parameter counts of deep-learning emulators and will remain usable after the underlying numerical codes become outdated.
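A minimal sketch of the ratio-learning idea with genetic-programming symbolic regression, assuming the gplearn package and purely synthetic stand-in curves (the functions and hyperparameters below are illustrative, not the paper's pipeline or data):

```python
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(0)
k = rng.uniform(0.01, 5.0, size=(2000, 1))                # wavenumber-like input (toy)
crude = 1.0 / (1.0 + k[:, 0] ** 2)                        # stand-in low-accuracy fitting function
reference = crude * (1.0 + 0.1 * np.log(k[:, 0] + 1.0))   # stand-in "exact" curve
ratio = reference / crude                                 # target: correction factor, not the full spectrum

est = SymbolicRegressor(population_size=500, generations=10,
                        function_set=('add', 'sub', 'mul', 'div', 'log'),
                        parsimony_coefficient=0.001, random_state=0)
est.fit(np.log(k), ratio)
print(est._program)                                        # human-readable symbolic expression
rmse = np.sqrt(np.mean((est.predict(np.log(k)) - ratio) ** 2))
print(f"fit RMSE: {rmse:.4f}")
```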

Multi-Agent Reinforcement Learning for Power Control in Wireless Networks via Adaptive Graphs

  • paper_url: http://arxiv.org/abs/2311.15858
  • repo_url: None
  • paper_authors: Lorenzo Mario Amorosa, Marco Skocaj, Roberto Verdone, Deniz Gündüz
  • for: 提高无线网络中的高质量和多样化通信服务的需求,促进了广泛的动态优化策略研究。
  • methods: 使用多智能深度学习(MADRL)方法来解决无线网络中复杂的优化问题,如电力控制。
  • results: 通过使用图 структуры来实现分布式代理之间的交互,提高了MADRL方法的稳定性和普适性。
    Abstract The ever-increasing demand for high-quality and heterogeneous wireless communication services has driven extensive research on dynamic optimization strategies in wireless networks. Among several possible approaches, multi-agent deep reinforcement learning (MADRL) has emerged as a promising method to address a wide range of complex optimization problems like power control. However, the seamless application of MADRL to a variety of network optimization problems faces several challenges related to convergence. In this paper, we present the use of graphs as communication-inducing structures among distributed agents as an effective means to mitigate these challenges. Specifically, we harness graph neural networks (GNNs) as neural architectures for policy parameterization to introduce a relational inductive bias in the collective decision-making process. Most importantly, we focus on modeling the dynamic interactions among sets of neighboring agents through the introduction of innovative methods for defining a graph-induced framework for integrated communication and learning. Finally, the superior generalization capabilities of the proposed methodology to larger networks and to networks with different user categories is verified through simulations.
    摘要 随着无线通信服务的需求不断增长和多样化,研究在无线网络中的动态优化策略已经得到了广泛的关注。其中,多代理深度学习(MADRL)被认为是解决许多复杂优化问题的有效方法。然而,在应用MADRL到各种网络优化问题时,融合问题的挑战仍然存在。在这篇论文中,我们提出使用图structure来实现分布式代理之间的通信协调,以解决这些挑战。具体来说,我们利用图神经网络(GNNs)作为政策参数化的神经网络,以引入关系拟合假设,从而在集体决策过程中引入对称性。此外,我们还提出了定义图strucure-induced框架,以便在集成通信和学习过程中模型多个邻居代理之间的动态互动。最后,我们通过实验证明了我们的方法在更大的网络和不同用户类型的网络中的普适性。

A systematic study comparing hyperparameter optimization engines on tabular data

  • paper_url: http://arxiv.org/abs/2311.15854
  • repo_url: None
  • paper_authors: Balazs Kegl
  • for: 这个论文主要是为了比较所有可用于 hyperparameter 优化(hyperopt)的引擎,并evaluate它们的性能。
  • methods: 作者使用了 Ray Tune 库中available的所有 hyperopt 引擎,并引入了两种 normalize 和综合统计方法,一种是rank-based,另一种是将得分与随机搜索得分和全格子搜索得分之间进行比较。
  • results: 作者发现大多数引擎都超过随机搜索,但只有 HEBO、AX 和 BlendSearch 三个引擎明显出众。此外,作者发现一些引擎似乎专门适用于hyperopt 某些学习算法,这使得在比较研究中使用 hyperopt 技术可能会带来一定的偏袋。
    Abstract We run an independent comparison of all hyperparameter optimization (hyperopt) engines available in the Ray Tune library. We introduce two ways to normalize and aggregate statistics across data sets and models, one rank-based, and another one sandwiching the score between the random search score and the full grid search score. This affords us i) to rank the hyperopt engines, ii) to make generalized and statistically significant statements on how much they improve over random search, and iii) to make recommendations on which engine should be used to hyperopt a given learning algorithm. We find that most engines beat random search, but that only three of them (HEBO, AX, and BlendSearch) clearly stand out. We also found that some engines seem to specialize in hyperopting certain learning algorithms, which makes it tricky to use hyperopt in comparison studies, since the choice of the hyperopt technique may favor some of the models in the comparison.
    摘要 我们对RAY Tune库中的所有优化引擎进行独立比较。我们引入了两种Normalize和综合统计方法,一种基于排名,另一种将得分置于随机搜索得分和全格子搜索得分之间。这使得我们能够i) 将优化引擎排名,ii) 对优化引擎进行通用和统计siginficant的评价,iii) 为给定学习算法进行优化引擎选择。我们发现大多数引擎超过随机搜索,但只有HEBO、AX和BlendSearch三个引擎明显出众。我们还发现了一些引擎在hyperopt中特化于某些学习算法,这使得在比较研究中使用hyperopt变得复杂,因为选择优化技术可能会偏爱某些模型。
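The sandwich normalization and the rank-based aggregation described above are easy to state in code; the numbers below are invented toy scores, not results from the paper:

```python
import numpy as np

# scores[engine][j] = test score of an engine on dataset/model pair j (toy numbers)
engines = {
    "HEBO":         np.array([0.83, 0.74, 0.91]),
    "AX":           np.array([0.82, 0.73, 0.90]),
    "RandomSearch": np.array([0.78, 0.70, 0.88]),
}
random_score = np.array([0.78, 0.70, 0.88])   # random-search baseline per pair
grid_score   = np.array([0.85, 0.76, 0.92])   # full grid-search "ceiling" per pair

def sandwich(score):
    # 0 = no better than random search, 1 = as good as the full grid search
    return (score - random_score) / (grid_score - random_score)

for name, s in engines.items():
    norm = sandwich(s)
    print(f"{name:>13}: per-pair {np.round(norm, 2)}, mean {norm.mean():.2f}")

# rank-based aggregation: rank engines per pair (1 = best), then average ranks
score_matrix = np.vstack([engines[name] for name in engines])
ranks = score_matrix.shape[0] - np.argsort(np.argsort(score_matrix, axis=0), axis=0)
for i, name in enumerate(engines):
    print(f"{name:>13}: mean rank {ranks[i].mean():.2f}")
```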

Temporal Action Localization for Inertial-based Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2311.15831
  • repo_url: None
  • paper_authors: Marius Bock, Michael Moeller, Kristof Van Laerhoven
  • for: 这篇论文是为了应用机器学习概念到其他领域而写的,特别是将TAL模型应用于穿戴式各种传感器的人体活动识别。
  • methods: 这篇论文使用了现有的TAL模型,并将其应用于 raw 的惯性数据,以进行人体活动识别。
  • results: 论文的结果表明,使用TAL模型可以在 4 个 из 6 个穿戴式活动识别数据集上表现出色,与各种传感器模型相比,提高了 F1 分数的25%。此外,引入 TAL 社区最受欢迎的度量,即平均准确率,表明 TAL 模型能够生成更准确的分段,并且在所有数据集上提高了 NULL 类准确率。
    Abstract A persistent trend in Deep Learning has been the applicability of machine learning concepts to areas other than those they were originally introduced for. As of today, state-of-the-art activity recognition from wearable sensors relies on classifiers being trained on fixed windows of data. Contrarily, video-based Human Activity Recognition has followed a segment-based prediction approach, localizing activity occurrences from start to end. This paper is the first to systematically demonstrate the applicability of state-of-the-art TAL models for wearable Human Activity Recognition (HAR) using raw inertial data as input. Our results show that state-of-the-art TAL models are able to outperform popular inertial models on 4 out of 6 wearable activity recognition benchmark datasets, with improvements of as much as 25% in F1-score. Introducing the TAL community's most popular metric to inertial-based HAR, namely mean Average Precision, our analysis shows that TAL models are able to produce more coherent segments along with an overall higher NULL-class accuracy across all datasets. Being the first to provide such an analysis, this work shows that the TAL community offers an interesting new perspective on inertial-based HAR, with yet-to-be-explored design choices and training concepts that could be of significant value for the inertial-based HAR community.
    摘要 历史上,深度学习领域的一个持续趋势是将机器学习概念应用到原来没有预期的领域。到今天为止,最新的活动识别从佩戴式传感器中使用的批处理方法仍然基于固定窗口的数据进行训练。相反,视频基于人体活动识别采用了分割预测方法,将活动发生时间段化为开始和结束。这篇论文是首次系统地证明了现有的TAL模型在佩戴式人体活动识别(HAR)中使用原始各种数据作为输入时的可靠性。我们的结果显示,现有的TAL模型在4个出于6个佩戴式活动识别数据集上能够超越流行的各种各样的模型,改进率可达25%。引入TAL社区最受欢迎的度量,即平均准确率,我们的分析表明TAL模型能够生成更具一致性的分割,并在所有数据集上提高NULL类准确率。这是首次提供这种分析,TAL社区提供了新的视角,可能对无力感器基的HAR社区具有很大的价值。
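Since segment-level mean Average Precision may be new to inertial-HAR readers, here is a compact reference implementation of AP at a single tIoU threshold (my own sketch following common TAL evaluation practice; it is not necessarily the exact protocol used in the paper):

```python
def t_iou(a, b):
    """Temporal IoU of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, thr=0.5):
    """preds: list of (start, end, score); gts: list of (start, end)."""
    preds = sorted(preds, key=lambda p: -p[2])
    matched = [False] * len(gts)
    tp = fp = 0
    ap = prev_recall = 0.0
    for start, end, _ in preds:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if not matched[j]:
                iou = t_iou((start, end), g)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_iou >= thr:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1
        precision, recall = tp / (tp + fp), tp / len(gts)
        ap += precision * (recall - prev_recall)   # all-point interpolation
        prev_recall = recall
    return ap

gt = [(2.0, 5.0), (8.0, 12.0)]
pred = [(2.2, 5.1, 0.9), (7.5, 11.0, 0.8), (20.0, 21.0, 0.4)]
print(f"AP@0.5 = {average_precision(pred, gt):.3f}")
```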

Exploring Artificial Intelligence Methods for Energy Prediction in Healthcare Facilities: An In-Depth Extended Systematic Review

  • paper_url: http://arxiv.org/abs/2311.15807
  • repo_url: None
  • paper_authors: Marjan FatehiJananloo, Helen Stopps, J. J. McArthur
  • for: 这个研究旨在描述用机器学习和人工智能技术预测医院建筑物的能源消耗。
  • methods: 该研究通过PRISMA框架进行了全面的文献综述,检查了1884篇文献中的17篇,以确定预测能源的状态 искус智能技术的现状和未来研究方向。
  • results: 该综述发现了影响能源预测的多种数据输入,其中occupancy和气象数据是重要的预测因素。然而,许多研究未能深入探讨数据选择的影响,并且存在了时间动态、运行状态和数据整合方面的缺失。机器学习,特别是深度学习模型如ANNs,在这个领域有潜力,但也存在解释性和计算占用的挑战。
    Abstract Hospitals, due to their complexity and unique requirements, play a pivotal role in global energy consumption patterns. This study conducted a comprehensive literature review, utilizing the PRISMA framework, of articles that employed machine learning and artificial intelligence techniques for predicting energy consumption in hospital buildings. Of the 1884 publications identified, 17 were found to address this specific domain and have been thoroughly reviewed to establish the state-of-the-art and identify gaps where future research is needed. This review revealed a diverse range of data inputs influencing energy prediction, with occupancy and meteorological data emerging as significant predictors. However, many studies failed to delve deep into the implications of their data choices, and gaps were evident regarding the understanding of time dynamics, operational status, and preprocessing methods. Machine learning, especially deep learning models like ANNs, have shown potential in this domain, yet they come with challenges, including interpretability and computational demands. The findings underscore the immense potential of AI in optimizing hospital energy consumption but also highlight the need for more comprehensive and granular research. Key areas for future research include the optimization of ANN approaches, new optimization and data integration techniques, the integration of real-time data into Intelligent Energy Management Systems, and increasing focus on long-term energy forecasting.
    摘要 医院由于其复杂性和特殊需求,在全球能源消耗模式中扮演着重要的角色。这项研究通过使用PRISMA框架,对使用机器学习和人工智能技术预测医院建筑物的能源消耗进行了全面的文献评估。从1884篇文献中,17篇文献addressed这个具体领域,并进行了住持阅读,以确定领域的现状和未来研究方向。这项评估发现了各种数据输入对能源预测的影响,其中占用和气象数据是重要的预测因素。然而,许多研究未能深入探讨数据选择的影响,并且存在时间动态、运行状态和数据处理方法等方面的缺口。机器学习,特别是深度学习模型如人工神经网络,在这个领域表现出了潜力,但同时也存在解释性和计算需求等挑战。发现结果表明,AI在优化医院能源消耗方面存在巨大的潜力,但也需要进一步的全面和细化研究。未来研究的关键领域包括ANN方法优化、新的优化和数据集成技术、实时数据 integration到智能能源管理系统,以及更多关注长期能源预测。

Rethinking Privacy in Machine Learning Pipelines from an Information Flow Control Perspective

  • paper_url: http://arxiv.org/abs/2311.15792
  • repo_url: None
  • paper_authors: Lukas Wutschitz, Boris Köpf, Andrew Paverd, Saravan Rajmohan, Ahmed Salem, Shruti Tople, Santiago Zanella-Béguelin, Menglin Xia, Victor Rühle
  • for: 本研究旨在提出一种基于信息流控制的机器学习系统模型,以便利用 metadata 来提供明确的隐私和安全保证,并且可以在多个参与者之间共享敏感信息并实现细化的访问控制。
  • methods: 本研究使用了两种方法来实现用户级不互相侵犯:1)精细调整每个用户的模型,2)在推理时使用检索增强的模型来访问用户特定的数据集。与基线相比,这两种方法可以提供更好的实用性、扩展性和灵活性,同时坚持严格的隐私和安全保证。
  • results: 在两个科学文献数据集上进行了评估,结果显示,使用检索增强的模型可以提供最好的实用性、扩展性和灵活性,同时坚持严格的隐私和安全保证。
    Abstract Modern machine learning systems use models trained on ever-growing corpora. Typically, metadata such as ownership, access control, or licensing information is ignored during training. Instead, to mitigate privacy risks, we rely on generic techniques such as dataset sanitization and differentially private model training, with inherent privacy/utility trade-offs that hurt model performance. Moreover, these techniques have limitations in scenarios where sensitive information is shared across multiple participants and fine-grained access control is required. By ignoring metadata, we therefore miss an opportunity to better address security, privacy, and confidentiality challenges. In this paper, we take an information flow control perspective to describe machine learning systems, which allows us to leverage metadata such as access control policies and define clear-cut privacy and confidentiality guarantees with interpretable information flows. Under this perspective, we contrast two different approaches to achieve user-level non-interference: 1) fine-tuning per-user models, and 2) retrieval augmented models that access user-specific datasets at inference time. We compare these two approaches to a trivially non-interfering zero-shot baseline using a public model and to a baseline that fine-tunes this model on the whole corpus. We evaluate trained models on two datasets of scientific articles and demonstrate that retrieval augmented architectures deliver the best utility, scalability, and flexibility while satisfying strict non-interference guarantees.
    摘要 现代机器学习系统通常使用已经增长到极限的训练集来训练模型。通常情况下,模型训练过程中忽略Metadata如拥有、访问控制和许可信息。相反,以保护隐私为由,我们通常采用通用技术如数据清洁和不同化公民模型训练,这些技术具有内置的隐私/实用负担,这会增加模型性能的限制。此外,这些技术在多个参与者之间共享敏感信息并需要细化访问控制时存在限制。由于忽略Metadata,我们因此失去了利用Metadata来更好地解决安全、隐私和机密性挑战的机会。在本文中,我们采用信息流控制 perspective来描述机器学习系统,这allow us可以利用Metadata如访问控制策略来定义可读性的信息流。根据这种视角,我们比较了两种不同的方法来实现用户级非互斥:1)精细调整每个用户的模型,和2)在推理时使用用户特定的数据来提高模型性能。我们与这两种方法相比较了一个极其不互斥的零 shot baseline,使用公共模型,以及一个基于整个训练集来练习这个模型的基eline。我们在两个科学文献 dataset 上训练模型,并证明了使用提取式扩展的架构可以提供最好的实用性、扩展性和灵活性,同时满足严格的非互斥保证。

Attend Who is Weak: Enhancing Graph Condensation via Cross-Free Adversarial Training

  • paper_url: http://arxiv.org/abs/2311.15772
  • repo_url: None
  • paper_authors: Xinglin Li, Kun Wang, Hanhui Deng, Yuxuan Liang, Di Wu
  • for: This work targets the graph condensation problem, aiming to compress a large, complex graph into a concise, synthetic representation that preserves the most essential and discriminative information of structure and features.
  • methods: The authors propose a Shock Absorber, a type of perturbation that enhances the robustness and stability of the original graph by selectively perturbing its underrepresented or insufficiently informative parts. In an adversarial-training fashion, the gradients of GNNs trained on the synthetic graph are forcibly matched to those on the original training graph at regularly spaced intervals, and before each synthetic-graph update the Shock Absorber acts as a gradient attacker that maximizes the distance between the synthetic dataset and the original graph.
  • results: Across 8 datasets (3 graph and 5 node classification datasets) the framework achieves prominent results, e.g. gains of roughly 1.13% to 5.03% over SOTA models on Cora, Citeseer and Ogbn-Arxiv, while adding only about 0.2% to 2.2% extra time overhead and improving time efficiency by nearly 4x compared with plain adversarial training.
    Abstract In this paper, we study the \textit{graph condensation} problem by compressing the large, complex graph into a concise, synthetic representation that preserves the most essential and discriminative information of structure and features. We seminally propose the concept of Shock Absorber (a type of perturbation) that enhances the robustness and stability of the original graphs against changes in an adversarial training fashion. Concretely, (I) we forcibly match the gradients between pre-selected graph neural networks (GNNs) trained on a synthetic, simplified graph and the original training graph at regularly spaced intervals. (II) Before each update synthetic graph point, a Shock Absorber serves as a gradient attacker to maximize the distance between the synthetic dataset and the original graph by selectively perturbing the parts that are underrepresented or insufficiently informative. We iteratively repeat the above two processes (I and II) in an adversarial training fashion to maintain the highly-informative context without losing correlation with the original dataset. More importantly, our shock absorber and the synthesized graph parallelly share the backward process in a free training manner. Compared to the original adversarial training, it introduces almost no additional time overhead. We validate our framework across 8 datasets (3 graph and 5 node classification datasets) and achieve prominent results: for example, on Cora, Citeseer and Ogbn-Arxiv, we can gain nearly 1.13% to 5.03% improvements compare with SOTA models. Moreover, our algorithm adds only about 0.2% to 2.2% additional time overhead over Flicker, Citeseer and Ogbn-Arxiv. Compared to the general adversarial training, our approach improves time efficiency by nearly 4-fold.
    摘要 在这篇论文中,我们研究了图像压缩(graph condensation)问题,通过压缩大型复杂图进而生成一个简洁、摘要的表示,保留原始结构和特征的关键信息。我们提出了一种叫做冲动吸收器(Shock Absorber)的概念,该概念可以增强原始图的稳定性和Robustness。具体来说,我们在批处理频率 régulièrement spaced intervals中强制匹配预选的图神经网络(GNNs)在原始训练图和生成图之间的梯度。(II)在每个生成图点之前,冲动吸收器会选择性地干扰原始图中不足缺乏信息的部分,以最大化生成图与原始图之间的距离。我们在对抗训练方式下重复这两个过程,以保持高度有用的上下文,而不失去与原始数据集的相关性。另外,我们的冲动吸收器和生成图在自由训练方式下并行进行反向过程,从而减少了对原始对抗训练的额外时间开销。我们在8个数据集(3个图数据集和5个节点分类数据集)上验证了我们的框架,并取得了显著的成果:例如,在Cora、Citeseer和Ogbn-Arxiv上,我们可以相对于SOTA模型提高约1.13%到5.03%。此外,我们的算法增加了约0.2%到2.2%的额外时间开销,相比于Flicker、Citeseer和Ogbn-Arxiv。相比于通用对抗训练,我们的方法提高了时间效率约4倍。

Learning Multi-Frequency Partial Correlation Graphs

  • paper_url: http://arxiv.org/abs/2311.15756
  • repo_url: https://github.com/officiallydac/bspcg
  • paper_authors: Gabriele D’Acunto, Paolo Di Lorenzo, Francesco Bonchi, Stefania Sardellitti, Sergio Barbarossa
  • for: 提高时间序列之间的相互关系学习的研究,并能够在不同频率带中划分相互关系。
  • methods: 提出了一种块稀、频率依赖的偏 correlate图,其中层次分别对应不同的频率带,并且可以在一些层次上存在偏 correlate。
  • results: 对synthetic数据进行了数值实验,并证明了提出的方法可以超过当前状态的艺术。此外,对金融时间序列进行分析,证明了在特定的频率带内存在偏 correlate,表明了我们的方法可以提供有价值的洞察。
    Abstract Despite the large research effort devoted to learning dependencies between time series, the state of the art still faces a major limitation: existing methods learn partial correlations but fail to discriminate across distinct frequency bands. Motivated by many applications in which this differentiation is pivotal, we overcome this limitation by learning a block-sparse, frequency-dependent, partial correlation graph, in which layers correspond to different frequency bands, and partial correlations can occur over just a few layers. To this aim, we formulate and solve two nonconvex learning problems: the first has a closed-form solution and is suitable when there is prior knowledge about the number of partial correlations; the second hinges on an iterative solution based on successive convex approximation, and is effective for the general case where no prior knowledge is available. Numerical results on synthetic data show that the proposed methods outperform the current state of the art. Finally, the analysis of financial time series confirms that partial correlations exist only within a few frequency bands, underscoring how our methods enable the gaining of valuable insights that would be undetected without discriminating along the frequency domain.
    摘要 尽管有大量研究投入了时间序列之间的学习依赖关系,现状下的最佳方法仍面临一个主要限制:现有方法只学习部分相关性,而不能区分不同频率带。为了解决这个限制,我们超越了现状,通过学习块简单、频率相关的部分相关图,其层次对应不同频率带,而且部分相关性只能在几层之间发生。为达到这个目标,我们提出并解决了两个非核心学习问题:第一个问题有关闭式解决方案,适用于有相关性数量的先验知识的情况;第二个问题基于Successive Convex Approximation的迭代解法,适用于一般情况下,无先验知识的情况。数据分析表明,我们提出的方法在synthetic数据上超过当前状态的表现。最后,对金融时间序列的分析表明,partial相关性只存在于一些频率带中,这 подтвержда了我们的方法可以获得不可靠的启示,而不可能通过不区分频率域来获得。

Tabular Two-Dimensional Correlation Analysis for Multifaceted Characterization Data

  • paper_url: http://arxiv.org/abs/2311.15703
  • repo_url: None
  • paper_authors: Shun Muroga, Satoshi Yamazaki, Koji Michishio, Hideaki Nakajima, Takahiro Morimoto, Nagayasu Oshima, Kazufumi Kobashi, Toshiya Okazaki
  • for: 本研究旨在提出二维相关分析方法,用于从多方面Characterization数据中提取特征,以更好地理解材料性能。
  • methods: 该方法使用 Hierarchical clustering 和异步相关分析,将结构参数变化的相似性和阶段延迟 visualized 为热图,以便更好地理解材料的层次结构。
  • results: 通过应用该方法于碳纳米管(CNTs)膜在不同温度下的数据集,发现了这些材料的层次结构的复杂性,包括void、bundle 和杂质碳。分析结果解决了对 strucutral 变化的顺序问题,尤其是在多方面Characterization 数据中,11 个结构参数由 8 种测量方法产生的复杂行为。
    Abstract We propose tabular two-dimensional correlation analysis for extracting features from multifaceted characterization data, essential for understanding material properties. This method visualizes similarities and phase lags in structural parameter changes through heatmaps, combining hierarchical clustering and asynchronous correlations. We applied the proposed method to datasets of carbon nanotube (CNTs) films annealed at various temperatures and revealed the complexity of their hierarchical structures, which include elements like voids, bundles, and amorphous carbon. Our analysis addresses the challenge of attempting to understand the sequence of structural changes, especially in multifaceted characterization data where 11 structural parameters derived from 8 characterization methods interact with complex behavior. The results show how phase lags (asynchronous changes from stimuli) and parameter similarities can illuminate the sequence of structural changes in materials, providing insights into phenomena like the removal of amorphous carbon and graphitization in annealed CNTs. This approach is beneficial even with limited data and holds promise for a wide range of material analyses, demonstrating its potential in elucidating complex material behaviors and properties.
    摘要 我们提议使用二维 correlate 分析方法,从多方面Characterization数据中提取特征,这些特征是物质性质的理解所必需的。这种方法通过热图显示相似性和阶段延迟,在层次 clustering 和 asynchronous 相关性之间进行组合。我们对碳纳米管(CNTs)电泳膜的不同温度处理数据进行了应用,并揭示了这些结构的层次结构的复杂性,包括void、 bundle 和杂质碳。我们的分析解决了对 strucutral 变化的顺序问题,特别是在多方面Characterization数据中,11个结构参数由8种测量方法的复杂行为相互作用。结果显示了阶段延迟(异步变化)和参数相似性可以照明材料中的结构变化的顺序,提供了对于如果摘除杂质碳和Graphitization在电泳膜中的phenomena的深入理解。这种方法具有有限数据的优点,并且在各种材料分析中具有潜在的潜力,因此可以用于解释复杂的材料行为和性质。
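For readers unfamiliar with generalized two-dimensional correlation analysis, the synchronous and asynchronous maps have a compact matrix form (Noda's formulation via the Hilbert-Noda matrix); the sketch below computes both on toy parameter trajectories and clusters parameters hierarchically. It illustrates the standard 2D-correlation formulas, not the paper's specific tabular pipeline:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
m, n = 12, 5                               # 12 perturbation steps (e.g. temperatures), 5 structural parameters
t = np.linspace(0, 1, m)[:, None]
X = np.hstack([t, t**2, np.sin(3 * t), 1 - t, rng.normal(0, 0.05, (m, 1))])  # toy parameter trajectories
X = X - X.mean(axis=0)                     # mean-center along the perturbation axis

sync = X.T @ X / (m - 1)                   # synchronous 2D correlation map
N = np.zeros((m, m))                       # Hilbert-Noda transformation matrix
for j in range(m):
    for k in range(m):
        if j != k:
            N[j, k] = 1.0 / (np.pi * (k - j))
asyn = X.T @ N @ X / (m - 1)               # asynchronous map: sign indicates lead/lag (phase lag)

print("synchronous map:\n", np.round(sync, 3))
print("asynchronous map:\n", np.round(asyn, 3))

# Hierarchical clustering of parameters by similarity of their responses
corr = sync / np.sqrt(np.outer(np.diag(sync), np.diag(sync)))
dist = squareform(1 - np.abs(corr), checks=False)
labels = fcluster(linkage(dist, method="average"), t=0.5, criterion="distance")
print("cluster labels per parameter:", labels)
```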

Automated discovery of trade-off between utility, privacy and fairness in machine learning models

  • paper_url: http://arxiv.org/abs/2311.15691
  • repo_url: None
  • paper_authors: Bogdan Ficiu, Neil D. Lawrence, Andrei Paleyes
  • for: 这个论文的目的是为了解决机器学习模型在做出决策和政策时是如何保证它们的决策是公正的和遵守政府法规的问题。
  • methods: 这个论文使用了 bayesian 优化来解决机器学习模型中的公正性、隐私和性能之间的贸易决策问题,并提出了 PFairDP 管道来找到这些模型的 pareto 优点点。
  • results: 论文通过实验表明,PFairDP 可以成功地找到机器学习模型中的公正性、隐私和性能之间的平衡点,并且可以用来复现手动设置约束的结果。
    Abstract Machine learning models are deployed as a central component in decision making and policy operations with direct impact on individuals' lives. In order to act ethically and comply with government regulations, these models need to make fair decisions and protect the users' privacy. However, such requirements can come with decrease in models' performance compared to their potentially biased, privacy-leaking counterparts. Thus the trade-off between fairness, privacy and performance of ML models emerges, and practitioners need a way of quantifying this trade-off to enable deployment decisions. In this work we interpret this trade-off as a multi-objective optimization problem, and propose PFairDP, a pipeline that uses Bayesian optimization for discovery of Pareto-optimal points between fairness, privacy and utility of ML models. We show how PFairDP can be used to replicate known results that were achieved through manual constraint setting process. We further demonstrate effectiveness of PFairDP with experiments on multiple models and datasets.
    摘要 机器学习模型在决策和政策操作中作为中心组件,直接影响个人生活。为了 acted ethically 和遵循政府法规,这些模型需要做出公平的决策并保护用户隐私。然而,这些需求可能会导致模型性能下降,与可能带有偏见和隐私泄露的模型相比。因此,机器学习模型的公平、隐私和性能之间存在负面的贸易,需要一种方法来衡量这个贸易,以便进行部署决策。在这种情况下,我们将这个贸易解释为多目标优化问题,并提出了PFairDP,一个使用 Bayesian 优化的管道,用于发现机器学习模型的公平、隐私和用户之间的 pareto 优点点。我们证明了PFairDP可以用来复制通过手动约束设定过程实现的知名结果。此外,我们还通过多个模型和数据集的实验证明了PFairDP的有效性。
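The object PFairDP searches for is a Pareto front over utility, fairness and privacy; as a rough illustration, the sketch below extracts the non-dominated points from a set of toy candidate configurations (random numbers, not the paper's Bayesian-optimization pipeline):

```python
import numpy as np

rng = np.random.default_rng(1)
# Each row: (utility, fairness_violation, privacy_epsilon) of one trained configuration (toy values)
candidates = np.column_stack([
    rng.uniform(0.6, 0.95, 50),   # utility: higher is better
    rng.uniform(0.0, 0.3, 50),    # fairness violation: lower is better
    rng.uniform(0.5, 8.0, 50),    # DP epsilon: lower is better
])

costs = candidates.copy()
costs[:, 0] = -costs[:, 0]        # negate utility so all objectives are minimized

def pareto_mask(costs):
    """Boolean mask of non-dominated rows (all objectives minimized)."""
    n = costs.shape[0]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        dominates_i = np.all(costs <= costs[i], axis=1) & np.any(costs < costs[i], axis=1)
        if dominates_i.any():
            keep[i] = False
    return keep

front = candidates[pareto_mask(costs)]
print(f"{len(front)} Pareto-optimal configurations out of {len(candidates)}")
print(np.round(front[np.argsort(-front[:, 0])], 3))   # sorted by utility, best first
```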

The Battleship Approach to the Low Resource Entity Matching Problem

  • paper_url: http://arxiv.org/abs/2311.15685
  • repo_url: https://github.com/bargenossar/the-battleship-approach-to-al-of-em-problem
  • paper_authors: Bar Genossar, Avigdor Gal, Roee Shraga
  • for: 解决low resource entity matching问题,即寻找有限数据量下的实体匹配问题。
  • methods: 提出了一种新的活动学习方法,基于 tuple pair 的分布式表示,以优化 entity matching 的匹配率。
  • results: 对比 state-of-the-art 活动学习解决方案,提出的算法在低资源实体匹配问题上表现出色,并且使用 fewer samples 可以与 fully trained 算法匹配。
    Abstract Entity matching, a core data integration problem, is the task of deciding whether two data tuples refer to the same real-world entity. Recent advances in deep learning methods, using pre-trained language models, were proposed for resolving entity matching. Although demonstrating unprecedented results, these solutions suffer from a major drawback as they require large amounts of labeled data for training, and, as such, are inadequate to be applied to low resource entity matching problems. To overcome the challenge of obtaining sufficient labeled data we offer a new active learning approach, focusing on a selection mechanism that exploits unique properties of entity matching. We argue that a distributed representation of a tuple pair indicates its informativeness when considered among other pairs. This is used consequently in our approach that iteratively utilizes space-aware considerations. Bringing it all together, we treat the low resource entity matching problem as a Battleship game, hunting indicative samples, focusing on positive ones, through awareness of the latent space along with careful planning of next sampling iterations. An extensive experimental analysis shows that the proposed algorithm outperforms state-of-the-art active learning solutions to low resource entity matching, and although using less samples, can be as successful as state-of-the-art fully trained known algorithms.
    摘要 实体匹配问题是数据集合中核心的数据集成问题,即判断两个数据元素是否对应于真实世界中的同一个实体。 current deep learning methods 使用预训练语言模型,已经提出了解决实体匹配问题的新方法,但这些方法受到一个主要的缺点,即需要大量的标注数据进行训练,因此对低资源实体匹配问题不适用。 为了解决获得充足的标注数据的挑战,我们提出了一种新的活动学习方法,关注选择机制,利用实体匹配的特有性。我们 argue That a distributed representation of a tuple pair indicates its informativeness when considered among other pairs。这种方法 iteratively utilizes space-aware considerations。将其总结起来,我们对低资源实体匹配问题进行了一种战舰游戏的尝试,猎捕指示性样本,专注于正面样本,通过观察隐藏空间以及精心规划下一轮采样迭代。 experimental analysis 表明,我们提出的算法可以在低资源实体匹配问题中超越现有的活动学习解决方案,并且使用更少的样本,可以与完全训练的知道算法匹配。

Information theoretic study of the neural geometry induced by category learning

  • paper_url: http://arxiv.org/abs/2311.15682
  • repo_url: None
  • paper_authors: Laurent Bonnasse-Gahot, Jean-Pierre Nadal
  • for: 本研究探讨了生物 neural network 和人工 neural network 中 categorization 的重要性,通过信息理论的方法来评估表示induced的效率。
  • methods: 本研究使用了信息理论的方法,将相关的 Bayesian 成本 decomposes 为两部分,一部分是代码部分,另一部分是解码部分。最小化代码成本意味着 maximize neural activities 和类别之间的共识信息。我们analytically 表明,这个共识信息可以写作两个项目,第一项是找到合适的表示空间,第二项是在这个空间上建立一个合适的表示,基于神经 Fisher 信息。
  • results: 一个重要结论是,category learning 会导致决策边界附近的神经空间扩展。此外,我们提供了数据示例,表明 Fisher 信息 coding 神经 populations 与类别边界对齐。
    Abstract Categorization is an important topic both for biological and artificial neural networks. Here, we take an information theoretic approach to assess the efficiency of the representations induced by category learning. We show that one can decompose the relevant Bayesian cost into two components, one for the coding part and one for the decoding part. Minimizing the coding cost implies maximizing the mutual information between the set of categories and the neural activities. We analytically show that this mutual information can be written as the sum of two terms that can be interpreted as (i) finding an appropriate representation space, and, (ii) building a representation with the appropriate metrics, based on the neural Fisher information on this space. One main consequence is that category learning induces an expansion of neural space near decision boundaries. Finally, we provide numerical illustrations that show how Fisher information of the coding neural population aligns with the boundaries between categories.
    摘要 categorization 是生物和人工神经网络中的重要话题。我们采用信息学方法来评估分类学习所导致的表示的效率。我们表明,可以将相关的 bayesian 成本 decomposing 为两个部分:一部分是编码成本,另一部分是解码成本。最小化编码成本意味着最大化 neural activities 和 category set 之间的 mutual information。我们analytically 表明,这个 mutual information 可以写作在 representation space 上找到适当的表示空间,并在这个空间上建立适当的 metric 基于神经 Fisher information。这一点的主要后果是,分类学习会导致决策边界附近的神经空间扩展。最后,我们提供了数字图示,显示了编码神经 популяция的 Fisher information 与分类边界align。

Accelerating Hierarchical Associative Memory: A Deep Equilibrium Approach

  • paper_url: http://arxiv.org/abs/2311.15673
  • repo_url: https://github.com/cgoemaere/hamdeq
  • paper_authors: Cédric Goemaere, Johannes Deleu, Thomas Demeester
  • for: 提高 Hierarchical Associative Memory 模型在数字硬件上的实现效率,以便未来的研究和应用。
  • methods: 提议两种加速 Hierarchical Associative Memory 模型的记忆检索方法,包括将其映射到深度平衡模型,以及循环优化偶数和奇数层。
  • results: 通过使用这两种方法,可以大幅提高 Hierarchical Associative Memory 模型的能量最小化速度,并提供证明性实验结果。
    Abstract Hierarchical Associative Memory models have recently been proposed as a versatile extension of continuous Hopfield networks. In order to facilitate future research on such models, especially at scale, we focus on increasing their simulation efficiency on digital hardware. In particular, we propose two strategies to speed up memory retrieval in these models, which corresponds to their use at inference, but is equally important during training. First, we show how they can be cast as Deep Equilibrium Models, which allows using faster and more stable solvers. Second, inspired by earlier work, we show that alternating optimization of the even and odd layers accelerates memory retrieval by a factor close to two. Combined, these two techniques allow for a much faster energy minimization, as shown in our proof-of-concept experimental results. The code is available at https://github.com/cgoemaere/hamdeq
    摘要 Hierarchical Associative Memory models 最近被提议作为连续式豪维尔网络的扩展。为便于未来研究这些模型,特别是在大规模上,我们主要关注增加其在数字硬件上的 simulate 效率。具体来说,我们提出了两种加快 память检索的策略,这与它们在推理中的使用相同,但 equally important during training。首先,我们表明它们可以被转换为深度平衡模型,这使得使用更快和稳定的解决方案。其次, drawing on earlier work, we show that alternating optimization of the even and odd layers can accelerate memory retrieval by a factor close to two。总的来说,这两种技术可以使得能量减少得到了很大的加速,如我们的证明性实验结果所示。代码可以在 https://github.com/cgoemaere/hamdeq 中找到。

Universal Event Detection in Time Series

  • paper_url: http://arxiv.org/abs/2311.15654
  • repo_url: https://github.com/menouarazib/eventdetector
  • paper_authors: Menouar Azib, Benjamin Renard, Philippe Garnier, Vincent Génot, Nicolas André
  • for: 这 paper 是为了检测多变量时间序列数据中的事件而写的。
  • methods: 这 paper 使用了一种监督式深度学习方法,而不是二分类。这种简化可以避免整个数据集中每个点的标签需要 manually 标注,而是仅仅使用时间点或时间间隔的ground truth事件。
  • results: 这 paper 证明了这种方法是 universally 适用的,可以准确地检测任何类型的事件,只要时间序列具备某些稳定性假设。这些事件可以包括变点、诈骗、异常、物理现象和更多。这些结论得到了 FFN 的 universality 应用 theorem 的支持,并且通过实验来证明。
    Abstract In our previously published work, we introduced a supervised deep learning method for event detection in multivariate time series data, employing regression instead of binary classification. This simplification avoids the need for point-wise labels throughout the entire dataset, relying solely on ground truth events defined as time points or intervals. In this paper, we establish mathematically that our method is universal, and capable of detecting any type of event with arbitrary precision under mild continuity assumptions on the time series. These events may encompass change points, frauds, anomalies, physical occurrences, and more. We substantiate our theoretical results using the universal approximation theorem for feed-forward neural networks (FFN). Additionally, we provide empirical validations that confirm our claims, demonstrating that our method, with a limited number of parameters, outperforms other deep learning approaches, particularly for rare events and imbalanced datasets from different domains.
    摘要 在我们之前发表的工作中,我们介绍了一种监督式深度学习方法用于多变量时间序列数据中的事件检测,使用回归而不是二分类。这种简化可以避免整个数据集中每个点的点wise标签,仅仅基于时间点或时间间隔的真实事件标注。在这篇论文中,我们证明了我们的方法是通用的,能够检测任何类型的事件,并且可以在某些假设下保证有任何程度的准确性。这些事件可以包括时间变化点、诈骗、异常、物理现象和更多。我们通过卷积神经网络的通用approximation定理(FFN)证明了我们的理论结果。此外,我们还提供了实际验证,证明我们的方法,只需少量参数,可以超越其他深度学习方法,特别是对于罕见事件和不均衡数据集。
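The key simplification, regressing against interval-level ground truth instead of point-wise labels, can be illustrated in a few lines; the sketch below (my own construction, not the released eventdetector code) turns a list of event intervals into per-window regression targets equal to the fraction of each sliding window covered by an event:

```python
import numpy as np

def window_targets(event_intervals, t_end, width, stride):
    """Fraction of each sliding window [t, t+width) covered by any event interval."""
    starts = np.arange(0.0, t_end - width + 1e-9, stride)
    targets = np.zeros_like(starts)
    for i, w0 in enumerate(starts):
        w1 = w0 + width
        covered = 0.0
        for e0, e1 in event_intervals:
            covered += max(0.0, min(w1, e1) - max(w0, e0))
        targets[i] = min(1.0, covered / width)
    return starts, targets

events = [(12.0, 14.5), (30.0, 31.0)]          # ground-truth events in seconds
starts, y = window_targets(events, t_end=40.0, width=5.0, stride=1.0)
for t0, v in zip(starts, y):
    if v > 0:
        print(f"window [{t0:4.1f}, {t0 + 5.0:4.1f}) -> target {v:.2f}")
```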

Bandits Meet Mechanism Design to Combat Clickbait in Online Recommendation

  • paper_url: http://arxiv.org/abs/2311.15647
  • repo_url: None
  • paper_authors: Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
  • for: This paper studies a strategic variant of the multi-armed bandit problem, coined the strategic click-bandit, motivated by online recommendation settings in which each arm's click-rate is chosen strategically by the arm itself to maximize how often it gets clicked.
  • methods: The authors design an incentive-aware learning algorithm, UCB-S, which simultaneously (a) incentivizes desirable arm behavior under uncertainty and (b) minimizes regret by learning the unknown parameters.
  • results: The paper characterizes all approximate Nash equilibria among arms under UCB-S and shows a $\tilde{\mathcal{O}}(\sqrt{KT})$ regret bound uniformly in every equilibrium, whereas incentive-unaware algorithms generally fail to achieve low regret in this setting; simulations of strategic arm behavior confirm the effectiveness and robustness of the proposed incentive design.
    Abstract We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit. This model is motivated by applications in online recommendation where the choice of recommended items depends on both the click-through rates and the post-click rewards. Like in classical bandits, rewards follow a fixed unknown distribution. However, we assume that the click-rate of each arm is chosen strategically by the arm (e.g., a host on Airbnb) in order to maximize the number of times it gets clicked. The algorithm designer does not know the post-click rewards nor the arms' actions (i.e., strategically chosen click-rates) in advance, and must learn both values over time. To solve this problem, we design an incentive-aware learning algorithm, UCB-S, which achieves two goals simultaneously: (a) incentivizing desirable arm behavior under uncertainty; (b) minimizing regret by learning unknown parameters. We characterize all approximate Nash equilibria among arms under UCB-S and show a $\tilde{\mathcal{O}}(\sqrt{KT})$ regret bound uniformly in every equilibrium. We also show that incentive-unaware algorithms generally fail to achieve low regret in the strategic click-bandit. Finally, we support our theoretical results by simulations of strategic arm behavior which confirm the effectiveness and robustness of our proposed incentive design.
    摘要 This work studies a strategic variant of the multi-armed bandit problem, the strategic click-bandit, motivated by online recommendation: rewards follow a fixed unknown distribution, but each arm strategically chooses its own click-rate (e.g., a host on Airbnb) to maximize how often it is clicked, and the designer must learn both the click-rates and the post-click rewards over time. The proposed incentive-aware algorithm UCB-S both incentivizes desirable arm behavior under uncertainty and minimizes regret; all approximate Nash equilibria among arms under UCB-S are characterized, a $\tilde{\mathcal{O}}(\sqrt{KT})$ regret bound holds uniformly in every equilibrium, incentive-unaware algorithms are shown to generally fail to achieve low regret, and simulations of strategic arm behavior confirm the effectiveness and robustness of the incentive design.
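To make the setting concrete, the sketch below simulates a clickbait environment and runs plain UCB1 on the realized rewards; note this is only an incentive-unaware baseline for illustration, not the paper's UCB-S algorithm, and the click-rates, post-click rewards and horizon are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
click_rate  = np.array([0.9, 0.5, 0.3])    # chosen by the arms (the clickbait arm 0 exaggerates)
post_reward = np.array([0.2, 0.6, 0.8])    # post-click reward quality of each arm
K, T = len(click_rate), 20_000

counts = np.zeros(K)
means = np.zeros(K)
total = 0.0
for t in range(1, T + 1):
    if t <= K:
        a = t - 1                                          # play each arm once
    else:
        ucb = means + np.sqrt(2 * np.log(t) / counts)      # UCB1 index
        a = int(np.argmax(ucb))
    clicked = rng.random() < click_rate[a]
    reward = float(clicked) * (rng.random() < post_reward[a])   # reward observed only after a click
    counts[a] += 1
    means[a] += (reward - means[a]) / counts[a]
    total += reward

best = np.max(click_rate * post_reward)
print("empirical means:", np.round(means, 3))
print(f"regret vs. best arm: {best * T - total:.1f}")
```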

Leveraging Out-of-Domain Data for Domain-Specific Prompt Tuning in Multi-Modal Fake News Detection

  • paper_url: http://arxiv.org/abs/2311.16496
  • repo_url: None
  • paper_authors: Debarshi Brahma, Amartya Bhattacharya, Suraj Nagaje Mahadev, Anmol Asati, Vikas Verma, Soma Biswas
  • for: 本研究旨在解决现代信息泛洪中假新闻散布的问题,即使只有有限的注释数据。
  • methods: 本研究提出了一种名为DPOD(域pecificPrompt-tuning using Out-of-Domain data)的新框架,通过修改CLIP图像语言模型,以实现图像和相应的文本描述的标签意识对齐。此外,还提出了一种基于所有可用域的域pecific prompt学习技术,以利用这些域的训练样本来提高检测性能。
  • results: 对于大规模的NewsClippings benchmark dataset,DPOD Framework实现了状态的最佳性能,明显超过现有的方法。
    Abstract The spread of fake news using out-of-context images has become widespread and is a challenging task in this era of information overload. Since annotating huge amounts of such data requires significant time of domain experts, it is imperative to develop methods which can work in limited annotated data scenarios. In this work, we explore whether out-of-domain data can help to improve out-of-context misinformation detection (termed here as multi-modal fake news detection) of a desired domain, eg. politics, healthcare, etc. Towards this goal, we propose a novel framework termed DPOD (Domain-specific Prompt-tuning using Out-of-Domain data). First, to compute generalizable features, we modify the Vision-Language Model, CLIP to extract features that helps to align the representations of the images and corresponding text captions of both the in-domain and out-of-domain data in a label-aware manner. Further, we propose a domain-specific prompt learning technique which leverages the training samples of all the available domains based on the the extent they can be useful to the desired domain. Extensive experiments on a large-scale benchmark dataset, namely NewsClippings demonstrate that the proposed framework achieves state of-the-art performance, significantly surpassing the existing approaches for this challenging task.
    摘要 The spread of fake news that pairs real images with out-of-context text is widespread, and annotating large amounts of such data requires significant expert time, so methods that work with limited annotated data are needed. This work asks whether out-of-domain data can improve out-of-context misinformation detection (multi-modal fake news detection) in a desired domain such as politics or healthcare. The proposed DPOD framework modifies the vision-language model CLIP to extract features that align images and their captions in a label-aware manner across in-domain and out-of-domain data, and learns domain-specific prompts that weight the training samples of all available domains by how useful they are to the target domain. On the large-scale NewsClippings benchmark, DPOD achieves state-of-the-art performance, significantly surpassing existing approaches to this challenging task.

VeryFL: A Verify Federated Learning Framework Embedded with Blockchain

  • paper_url: http://arxiv.org/abs/2311.15617
  • repo_url: https://github.com/gtmllab/veryfl
  • paper_authors: Yihao Li, Yanyi Lai, Chuan Chen, Zibin Zheng
  • for: This paper is written for researchers and developers who are interested in exploring the application of blockchain technology in federated learning. It aims to provide a blockchain-based federated learning framework that is compatible with existing federated learning training tasks.
  • methods: The paper proposes a blockchain-based federated learning framework called VeryFL, which embeds the Ethereum network and provides a code practice paradigm for combining federated learning with blockchain. The framework also includes mechanisms for model ownership authentication and watermarking to protect intellectual property rights.
  • results: The paper presents the overall structure of the VeryFL framework and demonstrates its feasibility by implementing some blockchain federated learning algorithms on smart contracts. The framework is designed to provide a verifiable training, aggregation, and incentive distribution procedure, which can help ensure the integrity and security of the federated learning process.
    Abstract Blockchain-empowered federated learning (FL) has provoked extensive research recently. Various blockchain-based federated learning algorithms, architectures and mechanisms have been designed to solve issues like single point of failure and data falsification brought by the centralized FL paradigm. Moreover, it is easier to allocate incentives to nodes with the help of the blockchain. Various centralized federated learning frameworks, like FedML, have emerged in the community to help boost research on FL. However, a decentralized blockchain-based federated learning framework is still missing, which makes it inconvenient for researchers to reproduce or verify algorithm performance on blockchain. Inspired by the above issues, we have designed and developed a blockchain-based federated learning framework by embedding the Ethereum network. This report presents the overall structure of this framework, which proposes a code practice paradigm for combining FL with blockchain while remaining compatible with normal FL training tasks. In addition to implementing some blockchain federated learning algorithms on smart contracts to help execute FL training, we also propose a model ownership authentication architecture based on blockchain and model watermarking to protect the intellectual property rights of models. These mechanisms show the underlying support that blockchain provides for federated learning, namely a verifiable training, aggregation and incentive distribution procedure, and thus we named this framework VeryFL (A Verify Federated Learning Framework Embedded with Blockchain). The source code is available at https://github.com/GTMLLab/VeryFL.
    摘要 带有区块链 empowered 联合学习 (FL) 在最近几年内得到了广泛的研究。不同的区块链基础的联合学习算法、架构和机制被设计用于解决中央式 FL 模式中的问题,如单点故障和数据forge。此外,通过区块链可以更方便地分配激励。社区中出现了多种中央式联合学习框架,如 FedML,以帮助提高 FL 的研究。然而,基于区块链的分布式联合学习框架仍然缺失,这使得研究人员无法根据区块链来重现或验证算法性能。以上问题为我们提供了灵感,我们设计了一个基于区块链的联合学习框架,并将 Ethereum 网络 embedding 到该框架中。本报告将介绍该框架的总结ucture,包括一种将 FL 与区块链结合的代码实践范式,同时与传统的 FL 训练任务相容。此外,我们还在智能合约中实现了一些区块链联合学习算法,以帮助执行 FL 训练任务。此外,我们还提出了一种基于区块链的模型所有权鉴定体系和模型纹理,以保护模型的知识产权。这些基于区块链的机制表明了区块链对联合学习的支持,并为训练、聚合和激励分布式进行验证可重复的过程提供了下layers。因此,我们将这个框架称为 VeryFL(一个验证联合学习框架,嵌入区块链)。源代码可以在 中找到。

Bayesian Approach to Linear Bayesian Networks

  • paper_url: http://arxiv.org/abs/2311.15610
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Seyong Hwang, Kyoungjae Lee, Sunmin Oh, Gunwoong Park
  • for: 本研究提出了第一个 bayesian 方法来学习高维线性 bayesian 网络。
  • methods: 该方法 iteratively 估算每个 topological ordering 元素从后向前,使用 inverse partial covariance matrix。
  • results: 该方法可以成功恢复底层结构,并且在 bayesian 正则化下应用 unequal shrinkage 时显示了有效性。 Specifically, 研究发现,要求样本数 $n = \Omega( d_M^2 \log p)$ 和 $n = \Omega(d_M^2 p^{2/m})$,才能使用提posed algorithm 学习线性 bayesian 网络。
    Abstract This study proposes the first Bayesian approach for learning high-dimensional linear Bayesian networks. The proposed approach iteratively estimates each element of the topological ordering from backward and its parent using the inverse of a partial covariance matrix. The proposed method successfully recovers the underlying structure when Bayesian regularization for the inverse covariance matrix with unequal shrinkage is applied. Specifically, it shows that the number of samples $n = \Omega( d_M^2 \log p)$ and $n = \Omega(d_M^2 p^{2/m})$ are sufficient for the proposed algorithm to learn linear Bayesian networks with sub-Gaussian and 4m-th bounded-moment error distributions, respectively, where $p$ is the number of nodes and $d_M$ is the maximum degree of the moralized graph. The theoretical findings are supported by extensive simulation studies including real data analysis. Furthermore the proposed method is demonstrated to outperform state-of-the-art frequentist approaches, such as the BHLSM, LISTEN, and TD algorithms in synthetic data.

Optimal Clustering of Discrete Mixtures: Binomial, Poisson, Block Models, and Multi-layer Networks

  • paper_url: http://arxiv.org/abs/2311.15598
  • repo_url: None
  • paper_authors: Zhongyuan Lyu, Ting Li, Dong Xia
  • for: This paper studies the fundamental limit of clustering networks when a multi-layer network is present, and proposes a novel two-stage clustering method to achieve the minimax optimal clustering error rate.
  • methods: The proposed method uses a tensor-based initialization algorithm involving both node and sample splitting, followed by a refinement procedure using likelihood-based Lloyd algorithm. The method can handle extreme network sparsity under the mixture multi-layer stochastic block model (MMSBM).
  • results: The proposed method outperforms existing methods in terms of clustering error rate, and can handle count-type weights on edges. The optimal clustering error rates in discrete mixtures can also be achieved by the proposed method.
    Abstract In this paper, we first study the fundamental limit of clustering networks when a multi-layer network is present. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate, which takes an exponential form and is characterized by the Renyi divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method including a tensor-based initialization algorithm involving both node and sample splitting and a refinement procedure by likelihood-based Lloyd algorithm. Network clustering must be accompanied by node community detection. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under MMSBM. Numerical simulations and real data experiments both validate that our method outperforms existing methods. Oftentimes, the edges of networks carry count-type weights. We then extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixture of discrete distributions including Binomial, Poisson, and multi-layer Poisson networks. The minimax optimal clustering error rates in these discrete mixtures all take the same exponential form characterized by the Renyi divergences. These optimal clustering error rates in discrete mixtures can also be achieved by our proposed two-stage clustering algorithm.

Quantum Langevin Dynamics for Optimization

  • paper_url: http://arxiv.org/abs/2311.15587
  • repo_url: None
  • paper_authors: Zherui Chen, Yuchen Lu, Hao Wang, Yizhou Liu, Tongyang Li
  • for: Solving optimization problems, especially those with non-convex objective functions, using Quantum Langevin Dynamics (QLD).
  • methods: Quantum Langevin Dynamics (QLD), which couples the system to an infinite heat bath; the induced quantum noise and deterministic damping nudge the system toward the global minimum of the objective function.
  • results: Convergence of QLD is proven theoretically in convex landscapes, showing that the average energy of the system approaches zero in the low-temperature limit; numerical experiments demonstrate QLD's energy-dissipation capability and analyze the impact of each parameter; finally, a time-dependent QLD is proposed that converges better in broader non-convex landscapes and outperforms a series of state-of-the-art quantum and classical optimization algorithms.
    Abstract We initiate the study of utilizing Quantum Langevin Dynamics (QLD) to solve optimization problems, particularly those non-convex objective functions that present substantial obstacles for traditional gradient descent algorithms. Specifically, we examine the dynamics of a system coupled with an infinite heat bath. This interaction induces both random quantum noise and a deterministic damping effect on the system, which nudge the system towards a steady state that hovers near the global minimum of the objective function. We theoretically prove the convergence of QLD in convex landscapes, demonstrating that the average energy of the system can approach zero in the low-temperature limit, with an exponential decay rate correlated with the evolution time. Numerically, we first show the energy dissipation capability of QLD by retracing its origins to spontaneous emission. Furthermore, we conduct a detailed discussion of the impact of each parameter. Finally, based on observations from comparing QLD with the classical Fokker-Planck-Smoluchowski equation, we propose a time-dependent QLD by making temperature and $\hbar$ time-dependent parameters, which can be theoretically proven to converge better than the time-independent case and also outperforms a series of state-of-the-art quantum and classical optimization algorithms in many non-convex landscapes.
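Simulating the quantum dynamics themselves would require a Lindblad-equation solver, so the sketch below is only a classical analogue: overdamped Langevin descent with an annealed temperature, loosely mirroring the time-dependent variant's idea of cooling the bath over time. It is not the QLD algorithm, and the cooling schedule, step size, and test objective are arbitrary illustrative choices.

```python
import numpy as np

def annealed_langevin(grad, x0, steps=5000, lr=1e-3, t0=1.0, t_min=1e-3, seed=0):
    """Classical overdamped Langevin descent with an annealed temperature.

    Illustrative analogue only: noise plus damping let the iterate escape
    shallow local minima before the temperature is cooled down.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for k in range(steps):
        temp = max(t_min, t0 / (1.0 + 0.01 * k))          # simple cooling schedule
        noise = rng.standard_normal(x.shape)
        x = x - lr * grad(x) + np.sqrt(2.0 * lr * temp) * noise
    return x

# Example: a tilted 1-D double well f(x) = (x^2 - 1)^2 + 0.3 * x
grad = lambda x: 4.0 * x * (x ** 2 - 1.0) + 0.3
x_star = annealed_langevin(grad, x0=np.array([1.2]))      # tends toward the deeper well near x = -1
```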

A Simple Geometric-Aware Indoor Positioning Interpolation Algorithm Based on Manifold Learning

  • paper_url: http://arxiv.org/abs/2311.15583
  • repo_url: None
  • paper_authors: Suorong Yang, Geng Zhang, Jian Zhao, Furao Shen
  • for: Improving the accuracy and efficiency of interpolation in indoor positioning systems.
  • methods: Using manifold learning principles to exploit the geometric attributes of the local topological manifold, enabling simpler and more efficient estimation of indoor positioning points.
  • results: Systematic experiments and performance analyses on both simulated and real-world datasets show that the proposed algorithm interpolates indoor positions more accurately and efficiently than commonly used approaches, with practical promise for real-time indoor positioning scenarios.
    Abstract Interpolation methodologies have been widely used within the domain of indoor positioning systems. However, existing indoor positioning interpolation algorithms exhibit several inherent limitations, including reliance on complex mathematical models, limited flexibility, and relatively low precision. To enhance the accuracy and efficiency of indoor positioning interpolation techniques, this paper proposes a simple yet powerful geometric-aware interpolation algorithm for indoor positioning tasks. The key to our algorithm is to exploit the geometric attributes of the local topological manifold using manifold learning principles. Therefore, instead of constructing complicated mathematical models, the proposed algorithm facilitates the more precise and efficient estimation of points grounded in the local topological manifold. Moreover, our proposed method can be effortlessly integrated into any indoor positioning system, thereby bolstering its adaptability. Through a systematic array of experiments and comprehensive performance analyses conducted on both simulated and real-world datasets, we demonstrate that the proposed algorithm consistently outperforms the most commonly used and representative interpolation approaches regarding interpolation accuracy and efficiency. Furthermore, the experimental results also underscore the substantial practical utility of our method and its potential applicability in real-time indoor positioning scenarios.
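The abstract does not spell out the interpolation formula, so the following is only one plausible way to make "geometry-aware" concrete: approximate geodesic structure with a k-nearest-neighbour graph over reference fingerprint/position pairs and weight the reference positions by inverse graph distance to the query. All names and parameters are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def manifold_interpolate(ref_feats, ref_pos, query_feat, k=5):
    """Interpolate a position for query_feat from its neighbourhood on the data manifold.

    Illustrative sketch: build a k-NN graph over reference feature vectors (e.g. RSSI
    fingerprints), append the query, compute graph shortest-path (approximate geodesic)
    distances, and average the reference positions with inverse-distance weights.
    ref_feats: (N, d) features; ref_pos: (N, 2) coordinates; query_feat: (d,) features.
    """
    feats = np.vstack([ref_feats, query_feat])              # last row is the query
    graph = kneighbors_graph(feats, n_neighbors=k, mode="distance")
    geo = shortest_path(graph, directed=False)[-1, :-1]     # query -> each reference
    w = 1.0 / (geo + 1e-9)                                  # unreachable references get ~0 weight
    w /= w.sum()
    return w @ np.asarray(ref_pos, dtype=float)             # weighted average of positions
```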

Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice

  • paper_url: http://arxiv.org/abs/2311.15582
  • repo_url: None
  • paper_authors: Yi-Heng Lin, Wen-Hsuan Tseng, Li-Chin Chen, Ching-Ting Tan, Yu Tsao
  • for: Improving the standardization and reproducibility of clinical voice quality assessment, thereby making diagnosis and evaluation more accurate and reliable.
  • methods: Lightly weighted automatic audio parameter extraction to increase clinical relevance, reduce complexity, and improve the interpretability of voice quality assessment. The study uses age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing, with a classical machine learning approach for classification.
  • results: The approach performs on par with state-of-the-art (SOTA) methods and outperforms the latent representations obtained from popular pre-trained audio models, demonstrating the feasibility of different feature extraction approaches for voice evaluation. Audio parameters such as jitter and the HNR prove suitable for characterizing voice quality attributes, whereas pre-trained models show limitations in handling noise-related scorings.
    Abstract The Consensus Auditory-Perceptual Evaluation of Voice is a widely employed tool in clinical voice quality assessment that is important for streamlining communication among clinical professionals and for benchmarking to determine further treatment. Currently, because the assessment relies on experienced clinicians, it tends to be inconsistent and thus difficult to standardize. To address this problem, we propose to leverage lightly weighted automatic audio parameter extraction to increase the clinical relevance, reduce the complexity, and enhance the interpretability of voice quality assessment. The proposed method utilizes age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing. A classical machine learning approach is employed. The results reveal that our approach performs similarly to state-of-the-art (SOTA) methods and outperforms the latent representations obtained from popular pre-trained audio models. This approach provides insight into the feasibility of different feature extraction approaches for voice evaluation. Audio parameters such as jitter and the HNR are proven suitable for characterizing voice quality attributes, such as roughness and strain. Conversely, pre-trained models exhibit limitations in effectively addressing noise-related scorings. This study contributes toward more comprehensive and precise voice quality evaluations, achieved by comprehensively exploring diverse assessment methodologies.
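A minimal sketch of this kind of pipeline, assuming the seven features (age, sex, and the five audio parameters) have already been extracted into a feature table with a tool such as Praat; the placeholder data, severity labels, and choice of classifier are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative feature table: one row per voice sample.
# Columns: age, sex (0/1), jitter, absolute jitter, shimmer, HNR, zero-crossing rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))              # placeholder for extracted acoustic parameters
y = rng.integers(0, 4, size=200)           # placeholder CAPE-V-style severity grades

# Classical machine learning baseline on hand-crafted features.
clf = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=200, random_state=0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```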

Streaming Lossless Volumetric Compression of Medical Images Using Gated Recurrent Convolutional Neural Network

  • paper_url: http://arxiv.org/abs/2311.16200
  • repo_url: None
  • paper_authors: Qianhao Chen, Jietao Chen
  • for: Developing a practical and efficient lossless compression method for medical volumetric images, with hardware/software co-design in mind.
  • methods: A hardware-friendly streaming lossless compression framework that uses only one-thousandth of the model weights of prior learning-based compression frameworks. A gated recurrent convolutional neural network combines diverse convolutional structures with fusion gate mechanisms to capture inter-slice dependencies in volumetric images; conditioned on this context, the pixel-by-pixel distribution is predicted for entropy coding.
  • results: The method outperforms traditional lossless volumetric compressors and prior learning-based lossless compression methods across various medical image benchmarks, while exhibiting robust generalization, competitive compression speed, and improved real-time performance.
    Abstract Deep learning-based lossless compression methods offer substantial advantages in compressing medical volumetric images. Nevertheless, many learning-based algorithms encounter a trade-off between practicality and compression performance. This paper introduces a hardware-friendly streaming lossless volumetric compression framework, utilizing merely one-thousandth of the model weights compared to other learning-based compression frameworks. We propose a gated recurrent convolutional neural network that combines diverse convolutional structures and fusion gate mechanisms to capture the inter-slice dependencies in volumetric images. Based on such contextual information, we can predict the pixel-by-pixel distribution for entropy coding. Guided by hardware/software co-design principles, we implement the proposed framework on a Field Programmable Gate Array to achieve enhanced real-time performance. Extensive experimental results indicate that our method outperforms traditional lossless volumetric compressors and state-of-the-art learning-based lossless compression methods across various medical image benchmarks. Additionally, our method exhibits robust generalization ability and competitive compression speed.
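The gated recurrent convolutional idea (convolutions inside the recurrence so the hidden feature map carries context from slice to slice) can be sketched with a generic ConvGRU cell in PyTorch; this is not the paper's architecture or its fusion-gate design, just an illustration of the mechanism.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Generic convolutional GRU cell: the hidden state is a feature map updated
    slice by slice, letting the model carry inter-slice context forward."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        pad = k // 2
        self.gates = nn.Conv2d(in_ch + hid_ch, 2 * hid_ch, k, padding=pad)  # update + reset gates
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, k, padding=pad)       # candidate state
        self.hid_ch = hid_ch

    def forward(self, x, h=None):
        if h is None:
            h = x.new_zeros(x.shape[0], self.hid_ch, *x.shape[2:])
        z, r = torch.chunk(torch.sigmoid(self.gates(torch.cat([x, h], dim=1))), 2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

# Usage sketch: feed slices of a volume in order, reusing the hidden state.
cell, h = ConvGRUCell(1, 16), None
volume = torch.randn(1, 64, 1, 128, 128)        # (batch, slices, channels, H, W)
for t in range(volume.shape[1]):
    h = cell(volume[:, t], h)                   # h now summarizes slices 0..t
```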

Experimental Analysis of Large-scale Learnable Vector Storage Compression

  • paper_url: http://arxiv.org/abs/2311.15578
  • repo_url: https://github.com/hugozhl/hetu
  • paper_authors: Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui
  • for: Compressing embedding vectors to reduce memory consumption during model training and deployment.
  • methods: A new taxonomy of embedding compression techniques and a modular benchmarking framework that integrates 14 representative methods for fair comparison and evaluation.
  • results: The benchmark reveals the strengths and weaknesses of existing compression methods under different memory budgets and recommends the best method per use case; it also uncovers the limitations of current methods and suggests directions for future research.
    Abstract Learnable embedding vector is one of the most important applications in machine learning, and is widely used in various database-related domains. However, the high dimensionality of sparse data in recommendation tasks and the huge volume of corpus in retrieval-related tasks lead to a large memory consumption of the embedding table, which poses a great challenge to the training and deployment of models. Recent research has proposed various methods to compress the embeddings at the cost of a slight decrease in model quality or the introduction of other overheads. Nevertheless, the relative performance of these methods remains unclear. Existing experimental comparisons only cover a subset of these methods and focus on limited metrics. In this paper, we perform a comprehensive comparative analysis and experimental evaluation of embedding compression. We introduce a new taxonomy that categorizes these techniques based on their characteristics and methodologies, and further develop a modular benchmarking framework that integrates 14 representative methods. Under a uniform test environment, our benchmark fairly evaluates each approach, presents their strengths and weaknesses under different memory budgets, and recommends the best method based on the use case. In addition to providing useful guidelines, our study also uncovers the limitations of current methods and suggests potential directions for future research.
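As one concrete example of the kind of technique such a benchmark covers, the hashing trick maps a large ID vocabulary onto a much smaller shared table; the sketch below is a generic illustration, not one of the 14 benchmarked implementations, and the hash function and sizes are arbitrary choices.

```python
import torch
import torch.nn as nn

class HashedEmbedding(nn.Module):
    """Compress an embedding table by hashing IDs into n_buckets shared rows.

    Generic illustration of one compression family (hashing): memory drops from
    vocab_size x dim to n_buckets x dim at the cost of collisions between IDs.
    """
    def __init__(self, n_buckets, dim, seed=0x9E3779B1):
        super().__init__()
        self.table = nn.Embedding(n_buckets, dim)
        self.n_buckets = n_buckets
        self.seed = seed

    def forward(self, ids):
        hashed = (ids * self.seed) % self.n_buckets     # cheap multiplicative hash
        return self.table(hashed)

# A 10M-ID vocabulary served by a 100k-row table (~100x fewer embedding parameters).
emb = HashedEmbedding(n_buckets=100_000, dim=32)
vectors = emb(torch.randint(0, 10_000_000, (4,)))
```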

Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for Molecule Generation

  • paper_url: http://arxiv.org/abs/2311.16199
  • repo_url: None
  • paper_authors: Ameya Daigavane, Song Kim, Mario Geiger, Tess Smidt
  • for: An autoregressive generative model for 3D molecular geometries.
  • methods: Message passing with higher-degree $E(3)$-equivariant features, using spherical harmonic signals to efficiently model the 3D geometry of molecules.
  • results: Symphony accurately generates small molecules from the QM9 dataset, outperforming existing autoregressive models and approaching the performance of diffusion models.
    Abstract We present Symphony, an $E(3)$-equivariant autoregressive generative model for 3D molecular geometries that iteratively builds a molecule from molecular fragments. Existing autoregressive models such as G-SchNet and G-SphereNet for molecules utilize rotationally invariant features to respect the 3D symmetries of molecules. In contrast, Symphony uses message-passing with higher-degree $E(3)$-equivariant features. This allows a novel representation of probability distributions via spherical harmonic signals to efficiently model the 3D geometry of molecules. We show that Symphony is able to accurately generate small molecules from the QM9 dataset, outperforming existing autoregressive models and approaching the performance of diffusion models.
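The core representational idea, encoding an angular distribution for the next atom's placement as spherical-harmonic coefficients evaluated on candidate directions, can be illustrated with a few lines of NumPy (real spherical harmonics up to degree 1 only; the coefficients are arbitrary stand-ins for network outputs, and this is not Symphony's model code).

```python
import numpy as np

def real_sph_harm_deg1(dirs):
    """Real spherical harmonics up to degree 1 at unit direction vectors.

    dirs: array (N, 3) of unit vectors. Returns (N, 4) columns [Y00, Y1-1, Y10, Y11].
    """
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=1)

# An angular "signal" is just a coefficient vector; these values are placeholders
# for what a network head would predict.
coeffs = np.array([0.3, 0.0, 1.2, -0.4])            # [c00, c1-1, c10, c11]

# Evaluate the signal on candidate unit directions and softmax it into a
# probability distribution over where to place the next atom.
cands = np.random.default_rng(0).normal(size=(64, 3))
cands /= np.linalg.norm(cands, axis=1, keepdims=True)
logits = real_sph_harm_deg1(cands) @ coeffs
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```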

Ultra-short-term multi-step wind speed prediction for wind farms based on adaptive noise reduction technology and temporal convolutional network

  • paper_url: http://arxiv.org/abs/2311.16198
  • repo_url: https://github.com/jethrojames/wind-speed-forecast-tcn_gru
  • paper_authors: Haojian Huang
  • for: Improving the utilization of wind power to help cope with the energy crisis and environmental pollution.
  • methods: A wind speed prediction model built from adaptive data noise reduction (P-SSA), a temporal convolutional network (TCN), and a gated recurrent unit (GRU).
  • results: Validated on three wind farms in Shandong Province, the proposed model outperforms traditional models and other TCN-based models, delivering high-precision, highly stable wind speed predictions that can support the operation and management of wind farms.
    Abstract As an important clean and renewable kind of energy, wind power plays an important role in coping with the energy crisis and environmental pollution. However, the volatility and intermittency of wind speed restrict the development of wind power. To improve the utilization of wind power, this study proposes a new wind speed prediction model based on data noise reduction technology, a temporal convolutional network (TCN), and a gated recurrent unit (GRU). Firstly, an adaptive data noise reduction algorithm, P-SSA, is proposed based on singular spectrum analysis (SSA) and the Pearson correlation coefficient. The original wind speed is decomposed into multiple subsequences by SSA and then reconstructed. When the Pearson correlation coefficient between the reconstructed sequence and the original sequence is greater than 0.99, the remaining noise subsequences are deleted to complete the data denoising. Next, the receptive field of the samples is expanded through the causal and dilated convolutions of the TCN, extracting the characteristics of wind speed change. A GRU then extracts the temporal features of the sequence and predicts the wind speed, forming the P-SSA-TCN-GRU wind speed prediction model. The proposed model was validated on three wind farms in Shandong Province. The experimental results show that the prediction performance of the proposed model is better than that of traditional models and other TCN-based models, achieving high-precision and highly stable wind speed prediction for wind farms. The wind speed predictions of this model have the potential to support the operation and management of wind farms. The code is available at the linked repository.
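The P-SSA denoising step described above can be sketched directly: embed the series in a Hankel trajectory matrix, take an SVD, and reconstruct with the fewest leading components whose reconstruction attains a Pearson correlation above 0.99 with the original series. Only the 0.99 threshold comes from the abstract; the window length and loop structure are illustrative assumptions.

```python
import numpy as np

def ssa_denoise(x, window=24, corr_thresh=0.99):
    """Adaptive SSA denoising sketch: embed, SVD, and reconstruct with the fewest
    leading components whose reconstruction has Pearson r > corr_thresh with x."""
    x = np.asarray(x, dtype=float)
    n, k = len(x), len(x) - window + 1
    traj = np.column_stack([x[i:i + window] for i in range(k)])   # Hankel trajectory matrix
    u, s, vt = np.linalg.svd(traj, full_matrices=False)

    def reconstruct(r):
        approx = (u[:, :r] * s[:r]) @ vt[:r]                      # rank-r approximation
        # Anti-diagonal averaging back to a 1-D series (Hankelization)
        out, counts = np.zeros(n), np.zeros(n)
        for i in range(window):
            for j in range(k):
                out[i + j] += approx[i, j]
                counts[i + j] += 1
        return out / counts

    for r in range(1, len(s) + 1):
        rec = reconstruct(r)
        if np.corrcoef(rec, x)[0, 1] > corr_thresh:
            return rec                                            # remaining components treated as noise
    return x
```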

Learning with Complementary Labels Revisited: A Consistent Approach via Negative-Unlabeled Learning

  • paper_url: http://arxiv.org/abs/2311.15502
  • repo_url: None
  • paper_authors: Wei Wang, Takashi Ishida, Yu-Jie Zhang, Gang Niu, Masashi Sugiyama
  • for: Solving the weakly supervised learning problem in which each training example is associated with one or more complementary labels indicating the classes it does not belong to.
  • methods: Expressing complementary-label learning as a set of negative-unlabeled binary classification problems via the one-versus-rest strategy; a risk-consistent approach with theoretical guarantees is proposed, together with a risk-correction approach to mitigate overfitting.
  • results: Extensive experiments on synthetic and real-world benchmark datasets validate the superiority of the proposed method over state-of-the-art approaches, and its statistical consistency and convergence rate are established.
    Abstract Complementary-label learning is a weakly supervised learning problem in which each training example is associated with one or multiple complementary labels indicating the classes to which it does not belong. Existing consistent approaches have relied on the uniform distribution assumption to model the generation of complementary labels, or on an ordinary-label training set to estimate the transition matrix. However, both conditions may not be satisfied in real-world scenarios. In this paper, we propose a novel complementary-label learning approach that does not rely on these conditions. We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems when using the one-versus-rest strategy. This observation allows us to propose a risk-consistent approach with theoretical guarantees. Furthermore, we introduce a risk correction approach to address overfitting problems when using complex models. We also prove the statistical consistency and convergence rate of the corrected risk estimator. Extensive experimental results on both synthetic and real-world benchmark datasets validate the superiority of our proposed approach over state-of-the-art methods.
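The reduction itself is easy to make concrete: under one-versus-rest, an example whose complementary label is k is a confirmed negative for the binary problem "class k vs. rest", while every other example is unlabeled for that problem. The sketch below only builds these K negative-unlabeled label vectors; the paper's risk-consistent estimator and risk correction are not reproduced here.

```python
import numpy as np

def to_negative_unlabeled(comp_labels, n_classes):
    """Decompose complementary labels into K negative-unlabeled binary problems.

    comp_labels: list of sets; comp_labels[i] holds the complementary labels of
    example i (classes example i is known NOT to belong to).
    Returns, per class k, a vector with -1 = confirmed negative, 0 = unlabeled.
    """
    n = len(comp_labels)
    problems = np.zeros((n_classes, n), dtype=int)
    for i, cls_set in enumerate(comp_labels):
        for k in cls_set:
            problems[k, i] = -1          # "not class k" => negative for problem k
    return problems

# Example: 4 examples, 3 classes.
comp = [{0}, {2}, {0, 1}, {1}]
print(to_negative_unlabeled(comp, n_classes=3))
# Problem 0 marks examples 0 and 2 as negatives; the rest stay unlabeled.
```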