cs.LG - 2023-07-15

MixupExplainer: Generalizing Explanations for Graph Neural Networks with Data Augmentation

  • paper_url: http://arxiv.org/abs/2307.07832
  • repo_url: https://github.com/jz48/mixupexplainer
  • paper_authors: Jiaxing Zhang, Dongsheng Luo, Hua Wei
  • for: This paper aims to address the issue of distribution shifting in post-hoc instance-level explanation methods for Graph Neural Networks (GNNs), which can lead to poor explanation quality in real-world applications with tight decision boundaries.
  • methods: The proposed approach is based on a generalized Graph Information Bottleneck (GIB) form that includes a label-independent graph variable, which is equivalent to the vanilla GIB. The approach also uses a graph mixup method called MixupExplainer, which has a theoretical guarantee to resolve the distribution shifting issue.
  • results: The proposed MixupExplainer approach is validated through extensive experiments on both synthetic and real-world datasets, and is shown to be effective in addressing the distribution shifting issue and improving explanation quality. Additionally, the paper provides a detailed analysis of how the proposed approach alleviates the distribution shifting issue.
    Abstract Graph Neural Networks (GNNs) have received increasing attention due to their ability to learn from graph-structured data. However, their predictions are often not interpretable. Post-hoc instance-level explanation methods have been proposed to understand GNN predictions. These methods seek to discover substructures that explain the prediction behavior of a trained GNN. In this paper, we shed light on the existence of the distribution shifting issue in existing methods, which affects explanation quality, particularly in applications on real-life datasets with tight decision boundaries. To address this issue, we introduce a generalized Graph Information Bottleneck (GIB) form that includes a label-independent graph variable, which is equivalent to the vanilla GIB. Driven by the generalized GIB, we propose a graph mixup method, MixupExplainer, with a theoretical guarantee to resolve the distribution shifting issue. We conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of our proposed mixup approach over existing approaches. We also provide a detailed analysis of how our proposed approach alleviates the distribution shifting issue.
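
The core operation, mixing the explanation subgraph of one instance with the label-independent remainder of another, can be sketched in a few lines. The snippet below is a minimal NumPy illustration under simplifying assumptions (dense adjacency matrices, hard explanation masks, and a block-diagonal merge with no cross edges); it is not the paper's implementation, which works with learned soft masks inside the GIB objective.

```python
import numpy as np

def mixup_graphs(adj_a, mask_a, adj_b, mask_b):
    """Combine the explanation subgraph of graph A with the label-independent
    remainder of graph B.  Here the two parts are simply placed block-diagonally;
    how (or whether) the paper adds connecting edges is not specified in the abstract."""
    exp_a = adj_a * mask_a               # explanation part of A
    rest_b = adj_b * (1.0 - mask_b)      # label-independent part of B
    n_a, n_b = exp_a.shape[0], rest_b.shape[0]
    mixed = np.zeros((n_a + n_b, n_a + n_b))
    mixed[:n_a, :n_a] = exp_a
    mixed[n_a:, n_a:] = rest_b
    return mixed

rng = np.random.default_rng(0)
adj_a = (rng.random((5, 5)) < 0.4).astype(float)     # toy adjacency matrices
adj_b = (rng.random((4, 4)) < 0.4).astype(float)
mask_a = (rng.random((5, 5)) < 0.5).astype(float)    # toy hard explanation masks
mask_b = (rng.random((4, 4)) < 0.5).astype(float)
print(mixup_graphs(adj_a, mask_a, adj_b, mask_b).shape)   # (9, 9)
```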

Minimal Random Code Learning with Mean-KL Parameterization

  • paper_url: http://arxiv.org/abs/2307.07816
  • repo_url: None
  • paper_authors: Jihao Andreas Lin, Gergely Flamich, José Miguel Hernández-Lobato
  • for: This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks.
  • methods: MIRACLE uses a powerful, conditionally Gaussian variational approximation of the weight posterior $Q_{\mathbf{w}}$ and relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. Instead of the conventional mean-variance parameterization, $Q_{\mathbf{w}}$ is parameterized by its mean and its KL divergence from $P_{\mathbf{w}}$, so the compression cost is constrained to the desired value by construction (see the sketch after the abstract).
  • results: Variational training with the Mean-KL parameterization converges twice as fast and maintains predictive performance after compression; Mean-KL also yields more meaningful variational distributions with heavier tails and compressed weight samples that are more robust to pruning.
    Abstract This paper studies the qualitative behavior and robustness of two variants of Minimal Random Code Learning (MIRACLE) used to compress variational Bayesian neural networks. MIRACLE implements a powerful, conditionally Gaussian variational approximation for the weight posterior $Q_{\mathbf{w}}$ and uses relative entropy coding to compress a weight sample from the posterior using a Gaussian coding distribution $P_{\mathbf{w}}$. To achieve the desired compression rate, $D_{\mathrm{KL}}[Q_{\mathbf{w}} \Vert P_{\mathbf{w}}]$ must be constrained, which requires a computationally expensive annealing procedure under the conventional mean-variance (Mean-Var) parameterization for $Q_{\mathbf{w}}$. Instead, we parameterize $Q_{\mathbf{w}}$ by its mean and KL divergence from $P_{\mathbf{w}}$ to constrain the compression cost to the desired value by construction. We demonstrate that variational training with Mean-KL parameterization converges twice as fast and maintains predictive performance after compression. Furthermore, we show that Mean-KL leads to more meaningful variational distributions with heavier tails and compressed weight samples which are more robust to pruning.
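
The key idea, parameterizing $Q_{\mathbf{w}}$ by its mean and its KL divergence from $P_{\mathbf{w}}$ so the compression budget holds by construction, can be illustrated in one dimension. The sketch below (plain NumPy/SciPy) recovers the standard deviation implied by a chosen mean and KL budget; the scalar setting, the branch choice $\sigma_q \le \sigma_p$, and all names are assumptions for illustration, not the paper's multivariate implementation.

```python
import numpy as np
from scipy.optimize import brentq

def kl_gauss(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalars."""
    return (np.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5)

def sigma_from_mean_kl(mu_q, target_kl, mu_p=0.0, sigma_p=1.0):
    """Mean-KL parameterization in 1-D: given the mean and a KL budget,
    recover sigma_q (taking the under-dispersed branch sigma_q <= sigma_p)."""
    kl_at_sigma_p = kl_gauss(mu_q, sigma_p, mu_p, sigma_p)   # smallest attainable KL
    assert target_kl >= kl_at_sigma_p, "KL budget not attainable for this mean"
    f = lambda s: kl_gauss(mu_q, s, mu_p, sigma_p) - target_kl
    return brentq(f, 1e-12, sigma_p)

mu, budget = 0.3, 0.5                          # per-weight mean and KL budget (nats)
sigma = sigma_from_mean_kl(mu, budget)
print(sigma, kl_gauss(mu, sigma, 0.0, 1.0))    # the KL matches the budget by construction
```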

Machine Learning Meets Mental Training – A Proof of Concept Applied to Memory Sports

  • paper_url: http://arxiv.org/abs/2307.08712
  • repo_url: None
  • paper_authors: Emanuele Regnani
  • for: This work aims to bring machine learning and memory sports together by presenting a practical application of machine learning to competitive memory training.
  • methods: The study applies machine learning algorithms, including support vector machines and tree-based models, to analyse memory-sports data.
  • results: The study suggests that machine learning can improve the effectiveness and precision of memory training and can be used to predict competitive memory performance.
    Abstract This work aims to combine these two fields together by presenting a practical implementation of machine learning to the particular form of mental training that is the art of memory, taken in its competitive version called "Memory Sports". Such a fusion, on the one hand, strives to raise awareness about both realms, while on the other it seeks to encourage research in this mixed field as a way to, ultimately, drive forward the development of this seemingly underestimated sport.

Graph Automorphism Group Equivariant Neural Networks

  • paper_url: http://arxiv.org/abs/2307.07810
  • repo_url: None
  • paper_authors: Edward Pearce-Crump
  • for: For any graph $G$ with $n$ vertices and automorphism group $\textrm{Aut}(G)$, the goal is to fully characterise all possible $\textrm{Aut}(G)$-equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$.
  • methods: The characterisation is obtained via a spanning set for the learnable, linear, $\textrm{Aut}(G)$-equivariant layer functions between such tensor power spaces.
  • results: For any graph $G$ and its automorphism group $\textrm{Aut}(G)$, there is a spanning set of matrices for the learnable, linear, $\textrm{Aut}(G)$-equivariant layer functions between the tensor power spaces of $\mathbb{R}^{n}$, expressed in the standard basis of $\mathbb{R}^{n}$.
    Abstract For any graph $G$ having $n$ vertices and its automorphism group $\textrm{Aut}(G)$, we provide a full characterisation of all of the possible $\textrm{Aut}(G)$-equivariant neural networks whose layers are some tensor power of $\mathbb{R}^{n}$. In particular, we find a spanning set of matrices for the learnable, linear, $\textrm{Aut}(G)$-equivariant layer functions between such tensor power spaces in the standard basis of $\mathbb{R}^{n}$.
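
As background (this statement of the constraint is standard and not quoted from the paper): a learnable linear layer function $f : (\mathbb{R}^{n})^{\otimes k} \to (\mathbb{R}^{n})^{\otimes l}$ is $\textrm{Aut}(G)$-equivariant when it commutes with the action of the automorphism group, i.e. $f(\rho_k(\sigma)\, v) = \rho_l(\sigma)\, f(v)$ for all $\sigma \in \textrm{Aut}(G)$ and $v \in (\mathbb{R}^{n})^{\otimes k}$, where $\rho_k(\sigma)$ permutes each tensor factor of $(\mathbb{R}^{n})^{\otimes k}$ according to $\sigma$. The spanning set found in the paper spans exactly the space of matrices satisfying this constraint.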

$\text{EFO}_{k}$-CQA: Towards Knowledge Graph Complex Query Answering beyond Set Operation

  • paper_url: http://arxiv.org/abs/2307.13701
  • repo_url: https://github.com/hkust-knowcomp/efok-cqa
  • paper_authors: Hang Yin, Zihao Wang, Weizhi Fei, Yangqiu Song
  • for: The goal of this work is to provide a comprehensive framework covering existential first-order queries with multiple variables ($\text{EFO}_{k}$) and to evaluate query-answering methods within it.
  • methods: The framework spans data generation, model training, and method evaluation, extending existing learning-based knowledge graph reasoning methods to $\text{EFO}_{k}$ queries.
  • results: The work introduces a new dataset, $\text{EFO}_{k}$-CQA, with 741 query types, and the benchmark results provide new insights into how query hardness affects performance. The results also show that the existing dataset construction process is biased, highlighting the importance of the proposed framework.
    Abstract To answer complex queries on knowledge graphs, logical reasoning over incomplete knowledge is required due to the open-world assumption. Learning-based methods are essential because they are capable of generalizing over unobserved knowledge. Therefore, an appropriate dataset is fundamental to both obtaining and evaluating such methods under this paradigm. In this paper, we propose a comprehensive framework for data generation, model training, and method evaluation that covers the combinatorial space of Existential First-order Queries with multiple variables ($\text{EFO}_{k}$). The combinatorial query space in our framework significantly extends those defined by set operations in the existing literature. Additionally, we construct a dataset, $\text{EFO}_{k}$-CQA, with 741 types of query for empirical evaluation, and our benchmark results provide new insights into how query hardness affects the results. Furthermore, we demonstrate that the existing dataset construction process is systematically biased that hinders the appropriate development of query-answering methods, highlighting the importance of our work. Our code and data are provided in~\url{https://github.com/HKUST-KnowComp/EFOK-CQA}.

The Interpolating Information Criterion for Overparameterized Models

  • paper_url: http://arxiv.org/abs/2307.07785
  • repo_url: None
  • paper_authors: Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
  • for: Model selection for interpolating estimators, where the number of model parameters exceeds the size of the dataset.
  • methods: A Bayesian duality between overparameterized and underparameterized models with the same marginal likelihood, which lets classical information-criterion ideas be applied in the overparameterized setting.
  • results: A new information criterion, the Interpolating Information Criterion (IIC), that accounts for prior misspecification and the geometric and spectral properties of the model, and is numerically consistent with known empirical and theoretical behavior in the overparameterized regime.
    Abstract The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit, penalizing model size. However, these criteria are not appropriate in modern settings where overparameterized models tend to perform well. For any overparameterized model, we show that there exists a dual underparameterized model that possesses the same marginal likelihood, thus establishing a form of Bayesian duality. This enables more classical methods to be used in the overparameterized setting, revealing the Interpolating Information Criterion, a measure of model quality that naturally incorporates the choice of prior into the model selection. Our new information criterion accounts for prior misspecification, geometric and spectral properties of the model, and is numerically consistent with known empirical and theoretical behavior in this regime.

CatBoost Versus XGBoost and LightGBM: Developing Enhanced Predictive Models for Zero-Inflated Insurance Claim Data

  • paper_url: http://arxiv.org/abs/2307.07771
  • repo_url: None
  • paper_authors: Banghee So
  • for: This paper addresses the construction of insurance claim predictive models, which is challenging because positive claims follow a highly right-skewed distribution with excess zeros.
  • methods: The paper combines "zero-inflated" models, which merge a traditional count model with a binary model, with gradient boosting (XGBoost, LightGBM, and CatBoost) to handle insurance claim data, including zero-inflated telematics data, more effectively.
  • results: Across two distinct datasets, CatBoost performed best for building auto claim frequency models, and zero-inflated Poisson boosted tree models, with differing assumptions about the relationship between the inflation probability and the distribution mean, outperformed the alternatives depending on data characteristics.
    Abstract In the property and casualty insurance industry, some challenges are presented in constructing claim predictive models due to a highly right-skewed distribution of positive claims with excess zeros. Traditional models, such as Poisson or negative binomial Generalized Linear Models(GLMs), frequently struggle with inflated zeros. In response to this, researchers in actuarial science have employed ``zero-inflated" models that merge a traditional count model and a binary model to address these datasets more effectively. This paper uses boosting algorithms to process insurance claim data, including zero-inflated telematics data, in order to construct claim frequency models. We evaluated and compared three popular gradient boosting libraries - XGBoost, LightGBM, and CatBoost - with the aim of identifying the most suitable library for training insurance claim data and fitting actuarial frequency models. Through a rigorous analysis of two distinct datasets, we demonstrated that CatBoost is superior in developing auto claim frequency models based on predictive performance. We also found that Zero-inflated Poisson boosted tree models, with variations in their assumptions about the relationship between inflation probability and distribution mean, outperformed others depending on data characteristics. Furthermore, by using a specific CatBoost tool, we explored the effects and interactions of different risk features on the frequency model when using telematics data.
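
As a rough illustration of the modelling setup (not the paper's zero-inflated formulation or its data), the snippet below fits a CatBoost model with a Poisson objective to synthetic zero-inflated claim counts; the synthetic data-generating process and all parameter choices are assumptions.

```python
import numpy as np
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 5))                           # toy policyholder features
lam = np.exp(0.3 * X[:, 0] - 0.2 * X[:, 1])           # Poisson claim frequency
zero_inflated = rng.random(n) < 0.4                   # excess zeros
y = np.where(zero_inflated, 0, rng.poisson(lam))      # observed claim counts

model = CatBoostRegressor(loss_function="Poisson", iterations=300, verbose=0)
model.fit(X, y)
print(model.predict(X[:5]))                           # predicted claim frequencies
```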

randomHAR: Improving Ensemble Deep Learners for Human Activity Recognition with Sensor Selection and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.07770
  • repo_url: None
  • paper_authors: Yiran Huang, Yexu Zhou, Till Riedel, Likun Fang, Michael Beigl
  • for: To improve performance in human activity recognition (HAR) and outperform architectures that require manual feature engineering.
  • methods: A series of deep learning models with the same architecture is trained on randomly selected sensor data from the given dataset, and an agent trained with reinforcement learning selects the optimal subset of models for runtime prediction.
  • results: A comparison against two HAR algorithms on six HAR benchmark datasets shows that the proposed approach outperforms the state-of-the-art method, ensembleLSTM.
    Abstract Deep learning has proven to be an effective approach in the field of Human activity recognition (HAR), outperforming other architectures that require manual feature engineering. Despite recent advancements, challenges inherent to HAR data, such as noisy data, intra-class variability and inter-class similarity, remain. To address these challenges, we propose an ensemble method, called randomHAR. The general idea behind randomHAR is training a series of deep learning models with the same architecture on randomly selected sensor data from the given dataset. Besides, an agent is trained with the reinforcement learning algorithm to identify the optimal subset of the trained models that are utilized for runtime prediction. In contrast to existing work, this approach optimizes the ensemble process rather than the architecture of the constituent models. To assess the performance of the approach, we compare it against two HAR algorithms, including the current state of the art, on six HAR benchmark datasets. The result of the experiment demonstrates that the proposed approach outperforms the state-of-the-art method, ensembleLSTM.

Variational Monte Carlo on a Budget – Fine-tuning pre-trained Neural Wavefunctions

  • paper_url: http://arxiv.org/abs/2307.09337
  • repo_url: https://github.com/mdsunivie/deeperwin
  • paper_authors: Michael Scherbela, Leon Gerard, Philipp Grohs
  • for: This paper aims to make deep-learning-based variational Monte Carlo (DL-VMC) for computational quantum chemistry accurate at a much lower computational cost.
  • methods: A DL-VMC model is pre-trained with self-supervised wavefunction optimization on a large and chemically diverse set of molecules and then applied to new molecular instances; only a few fine-tuning steps are needed for accurate relative energies.
  • results: Applied to new molecules without any optimization, the model yields wavefunctions and absolute energies that outperform established methods such as CCSD(T)-2Z, improves zero-shot accuracy by two orders of magnitude over the state of the art, and is evaluated for accuracy, scalability, and limitations on a wide variety of test systems.
    Abstract Obtaining accurate solutions to the Schr\"odinger equation is the key challenge in computational quantum chemistry. Deep-learning-based Variational Monte Carlo (DL-VMC) has recently outperformed conventional approaches in terms of accuracy, but only at large computational cost. Whereas in many domains models are trained once and subsequently applied for inference, accurate DL-VMC so far requires a full optimization for every new problem instance, consuming thousands of GPUhs even for small molecules. We instead propose a DL-VMC model which has been pre-trained using self-supervised wavefunction optimization on a large and chemically diverse set of molecules. Applying this model to new molecules without any optimization, yields wavefunctions and absolute energies that outperform established methods such as CCSD(T)-2Z. To obtain accurate relative energies, only few fine-tuning steps of this base model are required. We accomplish this with a fully end-to-end machine-learned model, consisting of an improved geometry embedding architecture and an existing SE(3)-equivariant model to represent molecular orbitals. Combining this architecture with continuous sampling of geometries, we improve zero-shot accuracy by two orders of magnitude compared to the state of the art. We extensively evaluate the accuracy, scalability and limitations of our base model on a wide variety of test systems.

Real-time Traffic Classification for 5G NSA Encrypted Data Flows With Physical Channel Records

  • paper_url: http://arxiv.org/abs/2307.07756
  • repo_url: None
  • paper_authors: Xiao Fei, Philippe Martins, Jialiang Lu
  • for: 5G-NR mobile network traffic classification for QoS management and dynamic resource allocation
  • methods: real-time encrypted traffic classification using physical channel records and decision-tree-based gradient boosting algorithms
  • results: 95% accuracy with state-of-the-art response time of 10ms using Light Gradient Boosting Machine (LGBM)
    Abstract The classification of fifth-generation New-Radio (5G-NR) mobile network traffic is an emerging topic in the field of telecommunications. It can be utilized for quality of service (QoS) management and dynamic resource allocation. However, traditional approaches such as Deep Packet Inspection (DPI) can not be directly applied to encrypted data flows. Therefore, new real-time encrypted traffic classification algorithms need to be investigated to handle dynamic transmission. In this study, we examine the real-time encrypted 5G Non-Standalone (NSA) application-level traffic classification using physical channel records. Due to the vastness of their features, decision-tree-based gradient boosting algorithms are a viable approach for classification. We generate a noise-limited 5G NSA trace dataset with traffic from multiple applications. We develop a new pipeline to convert sequences of physical channel records into numerical vectors. A set of machine learning models are tested, and we propose our solution based on Light Gradient Boosting Machine (LGBM) due to its advantages in fast parallel training and low computational burden in practical scenarios. Our experiments demonstrate that our algorithm can achieve 95% accuracy on the classification task with a state-of-the-art response time as quick as 10ms.
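
To make the pipeline concrete, the sketch below turns toy sequences of physical channel records into fixed-length vectors via summary statistics and trains a LightGBM classifier; the featurization, data shapes, and parameters are assumptions for illustration rather than the paper's actual pipeline.

```python
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(0)
n_flows, seq_len, n_metrics = 2000, 50, 4                   # toy stand-in dimensions
records = rng.normal(size=(n_flows, seq_len, n_metrics))    # per-slot channel records
labels = rng.integers(0, 5, size=n_flows)                   # five application classes (toy)

# Flatten each sequence of records into a numerical vector of summary statistics.
features = np.concatenate([records.mean(1), records.std(1),
                           records.min(1), records.max(1)], axis=1)

clf = LGBMClassifier(n_estimators=200, learning_rate=0.1)
clf.fit(features, labels)
print(clf.predict(features[:5]))
```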

Learning Expressive Priors for Generalization and Uncertainty Estimation in Neural Networks

  • paper_url: http://arxiv.org/abs/2307.07753
  • repo_url: https://github.com/dlr-rm/bpnn
  • paper_authors: Dominik Schnaus, Jongseok Lee, Daniel Cremers, Rudolph Triebel
  • for: This paper proposes a novel prior learning method for improving generalization and uncertainty estimation in deep neural networks.
  • methods: Scalable, structured posteriors of neural networks are exploited as informative priors with generalization guarantees; the learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of models pre-trained on ImageNet, and produce non-vacuous generalization bounds. Key technical enablers are sums-of-Kronecker-product computations and the derivation and optimization of tractable objectives (see the sketch after the abstract).
  • results: Extensive experiments demonstrate the effectiveness of the method for uncertainty estimation and generalization.
    Abstract In this work, we propose a novel prior learning method for advancing generalization and uncertainty estimation in deep neural networks. The key idea is to exploit scalable and structured posteriors of neural networks as informative priors with generalization guarantees. Our learned priors provide expressive probabilistic representations at large scale, like Bayesian counterparts of pre-trained models on ImageNet, and further produce non-vacuous generalization bounds. We also extend this idea to a continual learning framework, where the favorable properties of our priors are desirable. Major enablers are our technical contributions: (1) the sums-of-Kronecker-product computations, and (2) the derivations and optimizations of tractable objectives that lead to improved generalization bounds. Empirically, we exhaustively show the effectiveness of this method for uncertainty estimation and generalization.
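
The abstract does not spell out the sums-of-Kronecker-product computations, but the reason such structure is attractive is the standard identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(B X A^{\top})$, which avoids ever materialising the large Kronecker factor. A small NumPy check of that identity (an illustration of the structure being exploited, not of the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.normal(size=(3, 3)), rng.normal(size=(4, 4))
X = rng.normal(size=(4, 3))                        # vec(X) has length 4 * 3 = 12

# Naive: materialise the 12x12 Kronecker product and multiply.
naive = np.kron(A, B) @ X.flatten(order="F")

# Structured: (A kron B) vec(X) = vec(B X A^T), without forming the big matrix.
structured = (B @ X @ A.T).flatten(order="F")

print(np.allclose(naive, structured))              # True
```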

Probabilistic Black-Box Checking via Active MDP Learning

  • paper_url: http://arxiv.org/abs/2308.07930
  • repo_url: None
  • paper_authors: Junya Shijubo, Masaki Waga, Kohei Suenaga
  • for: Testing the probabilistic behavior of stochastic black-box systems, as frequently encountered in embedded systems.
  • methods: Active Markov decision process (MDP) learning, probabilistic model checking, and statistical hypothesis testing, integrated so that the components exchange information about the black-box system.
  • results: ProbBBC outperforms an existing method, especially for systems with limited observation.
    Abstract We introduce a novel methodology for testing stochastic black-box systems, frequently encountered in embedded systems. Our approach enhances the established black-box checking (BBC) technique to address stochastic behavior. Traditional BBC primarily involves iteratively identifying an input that breaches the system's specifications by executing the following three phases: the learning phase to construct an automaton approximating the black box's behavior, the synthesis phase to identify a candidate counterexample from the learned automaton, and the validation phase to validate the obtained candidate counterexample and the learned automaton against the original black-box system. Our method, ProbBBC, refines the conventional BBC approach by (1) employing an active Markov Decision Process (MDP) learning method during the learning phase, (2) incorporating probabilistic model checking in the synthesis phase, and (3) applying statistical hypothesis testing in the validation phase. ProbBBC uniquely integrates these techniques rather than merely substituting each method in the traditional BBC; for instance, the statistical hypothesis testing and the MDP learning procedure exchange information regarding the black-box system's observation with one another. The experiment results suggest that ProbBBC outperforms an existing method, especially for systems with limited observation.

On the Utility Gain of Iterative Bayesian Update for Locally Differentially Private Mechanisms

  • paper_url: http://arxiv.org/abs/2307.07744
  • repo_url: https://github.com/hharcolezi/multi-freq-ldpy
  • paper_authors: Héber H. Arcolezi, Selene Cerna, Catuscia Palamidessi
  • for: This work investigates the utility gain of using Iterative Bayesian Update (IBU) for estimating private discrete distributions from data obfuscated with Locally Differentially Private (LDP) mechanisms.
  • methods: The performance of IBU is compared with Matrix Inversion (MI), a standard estimation technique, for seven LDP mechanisms designed for one-time data collection and seven designed for multiple data collections (e.g., RAPPOR), varying the utility metric, the number of users n, the domain size k, and the privacy parameter {\epsilon}, on both synthetic and real-world data (a minimal IBU sketch follows the abstract).
  • results: IBU can be a useful post-processing tool that improves the utility of LDP mechanisms in different scenarios without any additional privacy cost; for example, in high privacy regimes (i.e., small {\epsilon}), IBU provides better utility than MI. The study offers guidance for practitioners on combining IBU with existing LDP mechanisms for more accurate and privacy-preserving data analysis. IBU was also implemented for all fourteen LDP mechanisms in the state-of-the-art multi-freq-ldpy Python package (https://pypi.org/project/multi-freq-ldpy/), and all experiment code is open-sourced as tutorials.
    Abstract This paper investigates the utility gain of using Iterative Bayesian Update (IBU) for private discrete distribution estimation using data obfuscated with Locally Differentially Private (LDP) mechanisms. We compare the performance of IBU to Matrix Inversion (MI), a standard estimation technique, for seven LDP mechanisms designed for one-time data collection and for other seven LDP mechanisms designed for multiple data collections (e.g., RAPPOR). To broaden the scope of our study, we also varied the utility metric, the number of users n, the domain size k, and the privacy parameter {\epsilon}, using both synthetic and real-world data. Our results suggest that IBU can be a useful post-processing tool for improving the utility of LDP mechanisms in different scenarios without any additional privacy cost. For instance, our experiments show that IBU can provide better utility than MI, especially in high privacy regimes (i.e., when {\epsilon} is small). Our paper provides insights for practitioners to use IBU in conjunction with existing LDP mechanisms for more accurate and privacy-preserving data analysis. Finally, we implemented IBU for all fourteen LDP mechanisms into the state-of-the-art multi-freq-ldpy Python package (https://pypi.org/project/multi-freq-ldpy/) and open-sourced all our code used for the experiments as tutorials.
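
Iterative Bayesian Update itself is a short EM-style procedure. The sketch below (plain NumPy, using k-ary randomized response as the LDP mechanism) shows the update on toy data; it is an illustration of the algorithm, not the multi-freq-ldpy implementation, and the domain size, privacy parameter, and distribution are arbitrary choices.

```python
import numpy as np

def krr_channel(k, eps):
    """k-ary randomized response: M[i, j] = P(report j | true value i)."""
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    M = np.full((k, k), (1 - p) / (k - 1))
    np.fill_diagonal(M, p)
    return M

def ibu(obs_freq, M, n_iter=1000, tol=1e-12):
    """Iterative Bayesian Update: EM estimate of the true distribution."""
    k = M.shape[0]
    theta = np.full(k, 1.0 / k)                      # uniform initialization
    for _ in range(n_iter):
        joint = M * theta[:, None]                   # P(true i) * P(report j | true i)
        posterior = joint / joint.sum(axis=0)        # P(true i | report j)
        new_theta = posterior @ obs_freq             # average posterior over reports
        if np.abs(new_theta - theta).max() < tol:
            return new_theta
        theta = new_theta
    return theta

rng = np.random.default_rng(0)
k, eps, n_users = 8, 1.0, 100_000
true_dist = np.array([0.4, 0.2, 0.1, 0.1, 0.08, 0.06, 0.04, 0.02])
M = krr_channel(k, eps)
data = rng.choice(k, size=n_users, p=true_dist)
reports = np.array([rng.choice(k, p=M[v]) for v in data])
obs_freq = np.bincount(reports, minlength=k) / n_users
print(np.round(ibu(obs_freq, M), 3))                 # close to true_dist
```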

Knowledge Graph Enhanced Intelligent Tutoring System Based on Exercise Representativeness and Informativeness

  • paper_url: http://arxiv.org/abs/2307.15076
  • repo_url: None
  • paper_authors: Linqing Li, Zhifeng Wang
  • for: To improve student performance by recommending exercises suited to different students' learning needs.
  • methods: A knowledge-graph-based framework that accounts for multiple relation types, combining exercise informativeness, exercise representativeness, and knowledge importance components with a novel neural attentive cognitive diagnosis model.
  • results: Extensive experiments on two public educational datasets show that the framework recommends appropriate exercises to students, leading to improved student performance.
    Abstract Presently, knowledge graph-based recommendation algorithms have garnered considerable attention among researchers. However, these algorithms solely consider knowledge graphs with single relationships and do not effectively model exercise-rich features, such as exercise representativeness and informativeness. Consequently, this paper proposes a framework, namely the Knowledge-Graph-Exercise Representativeness and Informativeness Framework, to address these two issues. The framework consists of four intricate components and a novel cognitive diagnosis model called the Neural Attentive cognitive diagnosis model. These components encompass the informativeness component, exercise representation component, knowledge importance component, and exercise representativeness component. The informativeness component evaluates the informational value of each question and identifies the candidate question set that exhibits the highest exercise informativeness. Furthermore, the skill embeddings are employed as input for the knowledge importance component. This component transforms a one-dimensional knowledge graph into a multi-dimensional one through four class relations and calculates skill importance weights based on novelty and popularity. Subsequently, the exercise representativeness component incorporates exercise weight knowledge coverage to select questions from the candidate question set for the tested question set. Lastly, the cognitive diagnosis model leverages exercise representation and skill importance weights to predict student performance on the test set and estimate their knowledge state. To evaluate the effectiveness of our selection strategy, extensive experiments were conducted on two publicly available educational datasets. The experimental results demonstrate that our framework can recommend appropriate exercises to students, leading to improved student performance.

Promotion/Inhibition Effects in Networks: A Model with Negative Probabilities

  • paper_url: http://arxiv.org/abs/2307.07738
  • repo_url: None
  • paper_authors: Anqi Dong, Tryphon T. Georgiou, Allen Tannenbaum
  • for: This work addresses the inverse problem of determining network edge weights in gene-style networks from a sign-indefinite adjacency matrix and expression levels at the nodes.
  • methods: The work adopts the framework of "negative probabilities" advocated by P. Dirac and R. Feynman and sets up a likelihood formalism to obtain values for the sought edge weights.
  • results: The resulting optimization problem can be solved via a generalization of the well-known Sinkhorn algorithm, in which the diagonal scalings are multiplicative or inverse-multiplicative depending on the sign of the corresponding adjacency entries, with values computed as the positive root of a quadratic polynomial.
    Abstract Biological networks often encapsulate promotion/inhibition as signed edge-weights of a graph. Nodes may correspond to genes assigned expression levels (mass) of respective proteins. The promotion/inhibition nature of co-expression between nodes is encoded in the sign of the corresponding entry of a sign-indefinite adjacency matrix, though the strength of such co-expression (i.e., the precise value of edge weights) cannot typically be directly measured. Herein we address the inverse problem to determine network edge-weights based on a sign-indefinite adjacency and expression levels at the nodes. While our motivation originates in gene networks, the framework applies to networks where promotion/inhibition dictates a stationary mass distribution at the nodes. In order to identify suitable edge-weights we adopt a framework of ``negative probabilities,'' advocated by P.\ Dirac and R.\ Feynman, and we set up a likelihood formalism to obtain values for the sought edge-weights. The proposed optimization problem can be solved via a generalization of the well-known Sinkhorn algorithm; in our setting the Sinkhorn-type ``diagonal scalings'' are multiplicative or inverse-multiplicative, depending on the sign of the respective entries in the adjacency matrix, with value computed as the positive root of a quadratic polynomial.

Measuring Perceived Trust in XAI-Assisted Decision-Making by Eliciting a Mental Model

  • paper_url: http://arxiv.org/abs/2307.11765
  • repo_url: None
  • paper_authors: Mohsen Abbaspour Onari, Isel Grau, Marco S. Nobile, Yingqian Zhang
  • for: This paper aims to measure users’ perceived trust in an Explainable Artificial Intelligence (XAI) model by eliciting their mental models using Fuzzy Cognitive Maps (FCMs).
  • methods: The paper uses an interpretable Machine Learning (ML) model to classify suspected COVID-19 patients and then evaluates the impact of interpretations on perceived trust through a survey of Medical Experts’ (MEs) explanation satisfaction attributes. Fuzzy linguistic variables are used to determine the strength of influences in MEs’ mental subjectivity.
  • results: The paper obtains quantified values to measure the perceived trust of each ME and analyzes the behavior of MEs in completing diagnostic tasks based on the quantified values. The results show that the quantified values can determine whether MEs trust or distrust the XAI model.
    Abstract This empirical study proposes a novel methodology to measure users' perceived trust in an Explainable Artificial Intelligence (XAI) model. To do so, users' mental models are elicited using Fuzzy Cognitive Maps (FCMs). First, we exploit an interpretable Machine Learning (ML) model to classify suspected COVID-19 patients into positive or negative cases. Then, Medical Experts' (MEs) conduct a diagnostic decision-making task based on their knowledge and then prediction and interpretations provided by the XAI model. In order to evaluate the impact of interpretations on perceived trust, explanation satisfaction attributes are rated by MEs through a survey. Then, they are considered as FCM's concepts to determine their influences on each other and, ultimately, on the perceived trust. Moreover, to consider MEs' mental subjectivity, fuzzy linguistic variables are used to determine the strength of influences. After reaching the steady state of FCMs, a quantified value is obtained to measure the perceived trust of each ME. The results show that the quantified values can determine whether MEs trust or distrust the XAI model. We analyze this behavior by comparing the quantified values with MEs' performance in completing diagnostic tasks.
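
The quantified trust value comes from iterating a fuzzy cognitive map to a steady state. The sketch below uses one common FCM update rule ($x_{t+1} = \sigma(W x_t)$) with placeholder concepts and weights; the paper's exact update rule, concept set, and fuzzy-linguistic weight elicitation are not reproduced here.

```python
import numpy as np

def fcm_steady_state(W, x0, max_iter=200, tol=1e-6):
    """Iterate a fuzzy cognitive map x_{t+1} = sigmoid(W @ x_t) to a fixed point."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = sigmoid(W @ x)
        if np.abs(x_new - x).max() < tol:
            return x_new
        x = x_new
    return x

# Toy map: three explanation-satisfaction concepts feeding a "perceived trust" concept.
# Row i holds the weights of the influences *into* concept i (placeholder values).
W = np.array([
    [0.0, 0.2, 0.0, 0.0],
    [0.1, 0.0, 0.3, 0.0],
    [0.0, 0.4, 0.0, 0.0],
    [0.5, 0.6, 0.3, 0.0],   # perceived trust, driven by the three concepts
])
x0 = [0.7, 0.5, 0.8, 0.0]   # initial activations elicited from one expert (toy)
print(np.round(fcm_steady_state(W, x0), 3))   # last entry = quantified perceived trust
```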

Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation

  • paper_url: http://arxiv.org/abs/2308.07929
  • repo_url: None
  • paper_authors: Victor Gallego
  • for: How to personalize large multimodal models such as CLIP and Stable Diffusion for specific tasks or preferences.
  • methods: A fast adaptation method based on the Bradley-Terry preference model that efficiently fine-tunes the original model with few examples and minimal computing resources.
  • results: Experiments in several domains of multimodal text and image understanding, including preference prediction as a reward model and generation tasks, demonstrate the capabilities of the framework.
    Abstract Recently, large multimodal models, such as CLIP and Stable Diffusion have experimented tremendous successes in both foundations and applications. However, as these models increase in parameter size and computational requirements, it becomes more challenging for users to personalize them for specific tasks or preferences. In this work, we address the problem of adapting the previous models towards sets of particular human preferences, aligning the retrieved or generated images with the preferences of the user. We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model, with few examples and with minimal computing resources. Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding, including preference prediction as a reward model, and generation tasks.
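
The Bradley-Terry model underlying the adaptation assigns each candidate a score $s_i$ and models $P(i \succ j) = \sigma(s_i - s_j)$. The snippet below fits such scores to toy pairwise preferences by gradient ascent; it only illustrates the preference model itself, not the paper's fine-tuning of CLIP or Stable Diffusion, and the data and function names are assumptions.

```python
import numpy as np

def fit_bradley_terry(n_items, comparisons, lr=0.5, n_iter=500):
    """Fit Bradley-Terry scores s from (winner, loser) pairs by gradient ascent
    on the log-likelihood, with P(i beats j) = sigmoid(s_i - s_j)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    s = np.zeros(n_items)
    for _ in range(n_iter):
        grad = np.zeros(n_items)
        for winner, loser in comparisons:
            p = sigmoid(s[winner] - s[loser])   # model prob. the observed winner wins
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        s += lr * grad / len(comparisons)
        s -= s.mean()                           # scores are identifiable only up to a shift
    return s

# Toy preferences over four candidate images: item 0 is preferred most often.
comparisons = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (0, 1), (1, 3)]
print(np.round(fit_bradley_terry(4, comparisons), 3))
```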

A Nearly-Linear Time Algorithm for Structured Support Vector Machines

  • paper_url: http://arxiv.org/abs/2307.07735
  • repo_url: https://github.com/ljinstat/Structured_Data_Random_Features_for_Large-Scale_Kernel_Machines
  • paper_authors: Yuzhou Gu, Zhao Song, Lichen Zhang
  • for: quadratic programming with low-rank factorization or low-treewidth, and a small number of linear constraints
  • methods: nearly-linear time algorithm
  • results: nearly-linear time algorithms for low-treewidth or low-rank SVMs
    Abstract Quadratic programming is a fundamental problem in the field of convex optimization. Many practical tasks can be formulated as quadratic programming, for example, the support vector machine (SVM). Linear SVM is one of the most popular tools over the last three decades in machine learning before deep learning method dominating. In general, a quadratic program has input size $\Theta(n^2)$ (where $n$ is the number of variables), thus takes $\Omega(n^2)$ time to solve. Nevertheless, quadratic programs coming from SVMs has input size $O(n)$, allowing the possibility of designing nearly-linear time algorithms. Two important classes of SVMs are programs admitting low-rank kernel factorizations and low-treewidth programs. Low-treewidth convex optimization has gained increasing interest in the past few years (e.g.~linear programming [Dong, Lee and Ye 2021] and semidefinite programming [Gu and Song 2022]). Therefore, an important open question is whether there exist nearly-linear time algorithms for quadratic programs with these nice structures. In this work, we provide the first nearly-linear time algorithm for solving quadratic programming with low-rank factorization or low-treewidth, and a small number of linear constraints. Our results imply nearly-linear time algorithms for low-treewidth or low-rank SVMs.

Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection

  • paper_url: http://arxiv.org/abs/2307.07726
  • repo_url: None
  • paper_authors: Shijin Gong, Xinyu Zhang
  • for: Understanding the effectiveness of neural network models
  • methods: Selecting hyperparameters via the common practice of sample splitting
  • results: The optimal hyperparameters derived from sample splitting enable a neural network model that asymptotically minimizes the prediction risk, as confirmed by experiments across different application scenarios and network architectures
    Abstract When artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks by discovering the mystery underlying a common practice during neural network model construction: sample splitting. Our theory demonstrates that, the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results manifest our theory's effectiveness.
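
The practice the paper analyses is the familiar one: hold out part of the training data purely for choosing hyperparameters, then keep the configuration with the lowest held-out risk. A minimal scikit-learn sketch of that practice (the task, grid, and model sizes are arbitrary assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.normal(size=2000)

# Sample splitting: a validation split used only for hyperparameter selection.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best = None
for width in (8, 32, 128):
    for alpha in (1e-4, 1e-2):
        model = MLPRegressor(hidden_layer_sizes=(width,), alpha=alpha,
                             max_iter=1000, random_state=0).fit(X_tr, y_tr)
        risk = mean_squared_error(y_val, model.predict(X_val))
        if best is None or risk < best[0]:
            best = (risk, width, alpha)

print("selected (validation MSE, width, alpha):", best)
```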

Visual Analytics For Machine Learning: A Data Perspective Survey

  • paper_url: http://arxiv.org/abs/2307.07712
  • repo_url: None
  • paper_authors: Junpeng Wang, Shixia Liu, Wei Zhang
  • for: This paper is a systematic review of the past decade of visualization (VIS) research for interpreting machine learning (ML) models, organized from the data perspective.
  • methods: The survey categorizes the data commonly handled by ML models into five types, explains the unique features of each type, and highlights the ML models that are good at learning from them.
  • results: Analyzing 143 surveyed papers across the five data types and six data-centric tasks at different stages of the ML pipeline, the review identifies prospective research directions and envisions future research trends.
    Abstract The past decade has witnessed a plethora of works that leverage the power of visualization (VIS) to interpret machine learning (ML) models. The corresponding research topic, VIS4ML, keeps growing at a fast pace. To better organize the enormous works and shed light on the developing trend of VIS4ML, we provide a systematic review of these works through this survey. Since data quality greatly impacts the performance of ML models, our survey focuses specifically on summarizing VIS4ML works from the data perspective. First, we categorize the common data handled by ML models into five types, explain the unique features of each type, and highlight the corresponding ML models that are good at learning from them. Second, from the large number of VIS4ML works, we tease out six tasks that operate on these types of data (i.e., data-centric tasks) at different stages of the ML pipeline to understand, diagnose, and refine ML models. Lastly, by studying the distribution of 143 surveyed papers across the five data types, six data-centric tasks, and their intersections, we analyze the prospective research directions and envision future research trends.

Identification of Stochasticity by Matrix-decomposition: Applied on Black Hole Data

  • paper_url: http://arxiv.org/abs/2307.07703
  • repo_url: https://github.com/sunilvengalil/ts_analysis_pca_eig
  • paper_authors: Sai Pradeep Chakka, Sunil Kumar Vengalil, Neelam Sinha
  • for: This work proposes a two-legged matrix-decomposition-based algorithm for classifying time series as stochastic or non-stochastic.
  • methods: The algorithm combines two complementary techniques: singular value decomposition (SVD) with topological analysis (Betti numbers) of the singular vectors, and temporal-ordering-agnostic principal component analysis (PCA) whose derived features are classified with a support vector machine (SVM).
  • results: The method is validated on synthetic data; on the 12 temporal classes of black hole GRS 1915+105, the SVD-label and the PCA-label show a high degree of concurrence, agreeing on 11 of the 12 classes.
    Abstract Timeseries classification as stochastic (noise-like) or non-stochastic (structured), helps understand the underlying dynamics, in several domains. Here we propose a two-legged matrix decomposition-based algorithm utilizing two complementary techniques for classification. In Singular Value Decomposition (SVD) based analysis leg, we perform topological analysis (Betti numbers) on singular vectors containing temporal information, leading to SVD-label. Parallely, temporal-ordering agnostic Principal Component Analysis (PCA) is performed, and the proposed PCA-derived features are computed. These features, extracted from synthetic timeseries of the two labels, are observed to map the timeseries to a linearly separable feature space. Support Vector Machine (SVM) is used to produce PCA-label. The proposed methods have been applied to synthetic data, comprising 41 realisations of white-noise, pink-noise (stochastic), Logistic-map at growth-rate 4 and Lorentz-system (non-stochastic), as proof-of-concept. Proposed algorithm is applied on astronomical data: 12 temporal-classes of timeseries of black hole GRS 1915+105, obtained from RXTE satellite with average length 25000. For a given timeseries, if SVD-label and PCA-label concur, then the label is retained; else deemed "Uncertain". Comparison of obtained results with those in literature are presented. It's found that out of 12 temporal classes of GRS 1915+105, concurrence between SVD-label and PCA-label is obtained on 11 of them.

NeurASP: Embracing Neural Networks into Answer Set Programming

  • paper_url: http://arxiv.org/abs/2307.07700
  • repo_url: None
  • paper_authors: Zhun Yang, Adam Ishay, Joohyung Lee
  • for: This paper advances the integration of Answer Set Programming (ASP) and neural networks by proposing NeurASP, a simple extension of answer set programs.
  • methods: Neural network outputs are treated as probability distributions over atomic facts in answer set programs, integrating sub-symbolic and symbolic computation; the paper shows how pre-trained neural networks can be used in symbolic computation with ASP rules, and how ASP rules can be used to train neural networks.
  • results: NeurASP can improve a neural network's perception results by applying symbolic reasoning, and training with ASP rules lets a neural network learn not only from implicit correlations in the data but also from explicit complex semantic constraints expressed by the rules.
    Abstract We present NeurASP, a simple extension of answer set programs by embracing neural networks. By treating the neural network output as the probability distribution over atomic facts in answer set programs, NeurASP provides a simple and effective way to integrate sub-symbolic and symbolic computation. We demonstrate how NeurASP can make use of a pre-trained neural network in symbolic computation and how it can improve the neural network's perception result by applying symbolic reasoning in answer set programming. Also, NeurASP can be used to train a neural network better by training with ASP rules so that a neural network not only learns from implicit correlations from the data but also from the explicit complex semantic constraints expressed by the rules.

The Growth of E-Bike Use: A Machine Learning Approach

  • paper_url: http://arxiv.org/abs/2308.02034
  • repo_url: None
  • paper_authors: Aditya Gupta, Samarth Chitgopekar, Alexander Kim, Joseph Jiang, Megan Wang, Christopher Grattoni
  • for: The purpose of this study is to inform policymakers in the United States about electric bicycles (e-bikes), so they can better understand their growth and impact and make more informed decisions when developing sustainable energy plans.
  • methods: The study uses an ARIMA model, a supervised machine-learning algorithm, to forecast the growth of e-bike sales, a Random Forest regression model to analyze the factors behind that growth, and Monte Carlo simulations to estimate environmental and health impacts.
  • results: Projected U.S. e-bike sales are 1.3 million units in 2025 and 2.113 million units in 2028; disposable personal income and popularity were the most significant factors influencing sales growth. E-bike use in 2022 is estimated to have avoided 15,737.82 kilograms of CO2 emissions and burned approximately 716,630.727 kilocalories.
    Abstract We present our work on electric bicycles (e-bikes) and their implications for policymakers in the United States. E-bikes have gained significant popularity as a fast and eco-friendly transportation option. As we strive for a sustainable energy plan, understanding the growth and impact of e-bikes is crucial for policymakers. Our mathematical modeling offers insights into the value of e-bikes and their role in the future. Using an ARIMA model, a supervised machine-learning algorithm, we predicted the growth of e-bike sales in the U.S. Our model, trained on historical sales data from January 2006 to December 2022, projected sales of 1.3 million units in 2025 and 2.113 million units in 2028. To assess the factors contributing to e-bike usage, we employed a Random Forest regression model. The most significant factors influencing e-bike sales growth were disposable personal income and popularity. Furthermore, we examined the environmental and health impacts of e-bikes. Through Monte Carlo simulations, we estimated the reduction in carbon emissions due to e-bike use and the calories burned through e-biking. Our findings revealed that e-bike usage in the U.S. resulted in a reduction of 15,737.82 kilograms of CO2 emissions in 2022. Additionally, e-bike users burned approximately 716,630.727 kilocalories through their activities in the same year. Our research provides valuable insights for policymakers, emphasizing the potential of e-bikes as a sustainable transportation solution. By understanding the growth factors and quantifying the environmental and health benefits, policymakers can make informed decisions about integrating e-bikes into future energy and transportation strategies.
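
For readers who want to reproduce the general flavour of the forecasting step, the snippet below fits an ARIMA model to a synthetic monthly sales series and projects it forward with statsmodels; the series, the (1, 1, 1) order, and the horizon are assumptions, not the paper's data or model settings.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly e-bike sales standing in for the Jan 2006 - Dec 2022 data.
idx = pd.date_range("2006-01-01", "2022-12-01", freq="MS")
rng = np.random.default_rng(0)
trend = np.linspace(5_000, 90_000, len(idx))
sales = pd.Series(trend * (1 + 0.1 * rng.normal(size=len(idx))), index=idx)

model = ARIMA(sales, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=36)              # project three years ahead
print(forecast.tail(12).round(0))                # projected monthly sales, final year
```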

Reducing operator complexity in Algebraic Multigrid with Machine Learning Approaches

  • paper_url: http://arxiv.org/abs/2307.07695
  • repo_url: None
  • paper_authors: Ru Huang, Kai Chang, Huan He, Ruipeng Li, Yuanzhe Xi
  • for: solving parametric partial differential equation (PDE) problems with algebraic multigrid (AMG), addressing the well-known issue of increasing coarse-grid operator complexity.
  • methods: utilizes neural networks (NNs) combined with smooth test vectors from multigrid eigenvalue problems to compute non-Galerkin coarse-grid operators.
  • results: reduces the complexity of coarse-grid operators while maintaining overall AMG convergence.
    Abstract We propose a data-driven and machine-learning-based approach to compute non-Galerkin coarse-grid operators in algebraic multigrid (AMG) methods, addressing the well-known issue of increasing operator complexity. Guided by the AMG theory on spectrally equivalent coarse-grid operators, we have developed novel ML algorithms that utilize neural networks (NNs) combined with smooth test vectors from multigrid eigenvalue problems. The proposed method demonstrates promise in reducing the complexity of coarse-grid operators while maintaining overall AMG convergence for solving parametric partial differential equation (PDE) problems. Numerical experiments on anisotropic rotated Laplacian and linear elasticity problems are provided to showcase the performance and compare with existing methods for computing non-Galerkin coarse-grid operators.

Creating a Dataset for High-Performance Computing Code Translation: A Bridge Between HPC Fortran and C++

  • paper_url: http://arxiv.org/abs/2307.07686
  • repo_url: https://github.com/bin123apple/fortran-cpp-hpc-code-translation-dataset
  • paper_authors: Bin Lei, Caiwen Ding, Le Chen, Pei-Hung Lin, Chunhua Liao
  • for: This study presents a new dataset for training machine learning models that translate between OpenMP Fortran and C++ code.
  • methods: To ensure reliability and applicability, the dataset is initially refined using a meticulous code similarity test.
  • results: The dataset's effectiveness is assessed with both quantitative (CodeBLEU) and qualitative (human evaluation) methods; it improves the translation capabilities of large language models by $\mathbf{\times 5.1}$ for models with no prior coding knowledge and $\mathbf{\times 9.9}$ for models with some coding familiarity. The dataset is available at https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset.
    Abstract In this study, we present a novel dataset for training machine learning models translating between OpenMP Fortran and C++ code. To ensure reliability and applicability, the dataset is initially refined using a meticulous code similarity test. The effectiveness of our dataset is assessed using both quantitative (CodeBLEU) and qualitative (human evaluation) methods. We demonstrate how this dataset can significantly improve the translation capabilities of large-scale language models, with improvements of $\mathbf{\times 5.1}$ for models with no prior coding knowledge and $\mathbf{\times 9.9}$ for models with some coding familiarity. Our work highlights the potential of this dataset to advance the field of code translation for high-performance computing. The dataset is available at https://github.com/bin123apple/Fortran-CPP-HPC-code-translation-dataset

Learning Subjective Time-Series Data via Utopia Label Distribution Approximation

  • paper_url: http://arxiv.org/abs/2307.07682
  • repo_url: None
  • paper_authors: Wenxin Xu, Hexin Jiang, Xuefeng Liang, Ying Zhou, Yin Zhao, Jie Zhang
  • for: Subjective time-series regression (STR) tasks
  • methods: Utopia Label Distribution Approximation (ULDA), with Time-slice Normal Sampling (TNS) and a Convolutional Weighted Loss (CWL)
  • results: Lifts the state-of-the-art performance on two STR tasks and three benchmark datasets
    Abstract Subjective time-series regression (STR) tasks have gained increasing attention recently. However, most existing methods overlook the label distribution bias in STR data, which results in biased models. Emerging studies on imbalanced regression tasks, such as age estimation and depth estimation, hypothesize that the prior label distribution of the dataset is uniform. However, we observe that the label distributions of training and test sets in STR tasks are likely to be neither uniform nor identical. This distinct feature calls for new approaches that estimate more reasonable distributions to train a fair model. In this work, we propose Utopia Label Distribution Approximation (ULDA) for time-series data, which makes the training label distribution closer to real-world but unknown (utopia) label distribution. This would enhance the model's fairness. Specifically, ULDA first convolves the training label distribution by a Gaussian kernel. After convolution, the required sample quantity at each regression label may change. We further devise the Time-slice Normal Sampling (TNS) to generate new samples when the required sample quantity is greater than the initial sample quantity, and the Convolutional Weighted Loss (CWL) to lower the sample weight when the required sample quantity is less than the initial quantity. These two modules not only assist the model training on the approximated utopia label distribution, but also maintain the sample continuity in temporal context space. To the best of our knowledge, ULDA is the first method to address the label distribution bias in time-series data. Extensive experiments demonstrate that ULDA lifts the state-of-the-art performance on two STR tasks and three benchmark datasets.
    摘要 受到媒体关注的主观时序回归(STR)任务在最近几年来得到了越来越多的关注。然而,大多数现有方法忽略了STR数据中标签分布偏见,导致模型偏向。新诞听学者认为STR任务中的标签分布是均匀的,但我们发现STR任务中的训练和测试集标签分布很可能不均匀,也不是完全相同的。这种特殊特点需要新的方法来训练公正的模型。在这种情况下,我们提出了UTopia标签分布近似(ULDA)方法,用于在时序数据上训练公正的模型。ULDA方法首先将训练标签分布通过 Gaussian 核函数进行混合。在混合后,每个回归标签的样本数量可能会改变。我们还提出了时间扁平分布(TNS)和卷积权重损失(CWL)两个模块,用于生成新的样本和更正模型的训练。这两个模块不仅帮助模型在训练中使用更加公正的标签分布,还保持了样本在时间上的连续性。到目前为止,ULDA方法是首个强调STR任务中标签分布偏见的方法。我们对 STR 任务中的三个标准数据集进行了广泛的实验,结果表明ULDA方法可以超越当前的状态势。
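
The core mechanical step of ULDA described above is convolving the empirical training label distribution with a Gaussian kernel and then comparing required versus available sample counts per label bin (more samples needed → generate new ones, fewer needed → down-weight). The following minimal NumPy sketch illustrates only that smoothing-and-reweighting step on a toy histogram; the bin width, kernel bandwidth, and weighting rule are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    # Discrete, normalized Gaussian kernel over 2*radius + 1 bins.
    offsets = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (offsets / sigma) ** 2)
    return k / k.sum()

def utopia_counts(label_counts, sigma=2.0, radius=5):
    # Convolve the empirical label histogram with a Gaussian kernel to get
    # the smoothed ("required") sample count per label bin.
    smoothed = np.convolve(label_counts, gaussian_kernel(radius, sigma), mode="same")
    # Rescale so the total number of samples is unchanged.
    return smoothed * label_counts.sum() / smoothed.sum()

# Toy example: a skewed histogram over 20 discretized regression labels.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=np.linspace(50, 2, 20)).astype(float)
required = utopia_counts(counts)

# Bins needing more samples than available would be filled by sampling new
# ones (TNS in the paper); bins with a surplus get down-weighted (CWL-style).
need_oversampling = required > counts
weights = np.minimum(required / np.maximum(counts, 1e-8), 1.0)
print(np.round(required, 1), need_oversampling, np.round(weights, 2), sep="\n")
```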

Data-centric Operational Design Domain Characterization for Machine Learning-based Aeronautical Products

  • paper_url: http://arxiv.org/abs/2307.07681
  • repo_url: None
  • paper_authors: Fateh Kaakai, Shridhar “Shreeder” Adibhatla, Ganesh Pai, Emmanuelle Escorihuela
  • for: This paper provides a first rigorous way to characterize the Operational Design Domain (ODD) of machine learning (ML)-based aeronautical products.
  • methods: The approach is data-centric rather than scenario-based: it proposes the dimensions along which ODD-defining parameters can be captured explicitly, together with a categorization of the data an ML-based application can encounter in operation and their system-level relevance and impact.
  • results: The paper shows how these data categories help determine the requirements that drive ML Model (MLM) design, the potential effects on MLMs and higher levels of the system hierarchy, the learning assurance processes that may be needed, and system architectural considerations, illustrated with an aircraft flight envelope example.
    Abstract We give a first rigorous characterization of Operational Design Domains (ODDs) for Machine Learning (ML)-based aeronautical products. Unlike in other application sectors (such as self-driving road vehicles) where ODD development is scenario-based, our approach is data-centric: we propose the dimensions along which the parameters that define an ODD can be explicitly captured, together with a categorization of the data that ML-based applications can encounter in operation, whilst identifying their system-level relevance and impact. Specifically, we discuss how those data categories are useful to determine: the requirements necessary to drive the design of ML Models (MLMs); the potential effects on MLMs and higher levels of the system hierarchy; the learning assurance processes that may be needed, and system architectural considerations. We illustrate the underlying concepts with an example of an aircraft flight envelope.
    摘要 我们给出了基于机器学习(ML)的航空产品的操作设计域(ODD)的首次严格刻画。与其他应用领域(如自动驾驶道路车辆)中基于场景的ODD开发不同,我们的方法以数据为中心:我们提出了可以显式刻画ODD定义参数的各个维度,并对基于ML的应用在运行中可能遇到的数据进行分类,同时指出它们在系统层面的相关性和影响。具体而言,我们讨论了这些数据类别如何用于确定:驱动ML模型(MLM)设计的需求;对MLM及系统层级更高层次的潜在影响;可能需要的学习保证过程;以及系统架构方面的考虑。我们以飞机飞行包线为例说明这些基本概念。

Sequence-Based Nanobody-Antigen Binding Prediction

  • paper_url: http://arxiv.org/abs/2308.01920
  • repo_url: None
  • paper_authors: Usama Sardar, Sarwan Ali, Muhammad Sohaib Ayub, Muhammad Shoaib, Khurram Bashir, Imdad Ullah Khan, Murray Patterson
  • for: This paper aims to develop a machine-learning method to predict the binding of nanobodies (Nb) to antigens based solely on sequence data.
  • methods: The authors curated a comprehensive dataset of Nb-antigen binding and non-binding data and devised an embedding method based on gapped k-mers to predict binding from the sequences of the nanobody and the antigen alone (see the sketch after this entry).
  • results: The approach achieved up to 90% accuracy in binding prediction and was significantly more efficient than the widely used computational docking technique.
    Abstract Nanobodies (Nb) are monomeric heavy-chain fragments derived from heavy-chain only antibodies naturally found in Camelids and Sharks. Their considerably small size (~3-4 nm; 13 kDa) and favorable biophysical properties make them attractive targets for recombinant production. Furthermore, their unique ability to bind selectively to specific antigens, such as toxins, chemicals, bacteria, and viruses, makes them powerful tools in cell biology, structural biology, medical diagnostics, and future therapeutic agents in treating cancer and other serious illnesses. However, a critical challenge in nanobodies production is the unavailability of nanobodies for a majority of antigens. Although some computational methods have been proposed to screen potential nanobodies for given target antigens, their practical application is highly restricted due to their reliance on 3D structures. Moreover, predicting nanobodyantigen interactions (binding) is a time-consuming and labor-intensive task. This study aims to develop a machine-learning method to predict Nanobody-Antigen binding solely based on the sequence data. We curated a comprehensive dataset of Nanobody-Antigen binding and nonbinding data and devised an embedding method based on gapped k-mers to predict binding based only on sequences of nanobody and antigen. Our approach achieves up to 90% accuracy in binding prediction and is significantly more efficient compared to the widely-used computational docking technique.
    摘要 纳诺体(Nb)是含有重链只的轻链抗体的自然存在的哺乳动物和鲨鱼中的蛋白质。它们的非常小的大小(约3-4奈米,13 kDa)和有利的生物物理性质使其成为了重点生产的目标。此外,它们可以特异性地绑定到特定抗原,如毒素、化学物质、细菌和病毒,使其成为了细胞生物、结构生物、医学诊断和未来的疾病治疗的有力工具。然而,纳诺体生产中的主要挑战是缺乏纳诺体对大多数抗原的可用性。虽然一些计算方法已经被提出来屏选纳诺体对给定抗原的可能性,但它们的实际应用受到了三维结构的限制,而且预测纳诺体-抗原交互(绑定)是一项时间consuming和劳动密集的任务。本研究旨在开发一种基于序列数据的机器学习方法,以预测纳诺体-抗原绑定。我们收集了一个完整的纳诺体-抗原绑定和非绑定数据集,并采用基于异常词的嵌入方法来预测绑定基于纳诺体和抗原的序列数据。我们的方法可以达到90%的准确率,与广泛使用的计算协同技术相比,效率明显高于。
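
The embedding above is built from gapped k-mers of the nanobody and antigen sequences. The sketch below shows a generic gapped k-mer count featurization; the paper's exact choices of k, gap placement, normalization, and downstream classifier are not specified here, and the example sequences are hypothetical.

```python
from itertools import combinations
from collections import Counter

def gapped_kmer_features(seq, k=3, g=1):
    """Count gapped k-mers: length-(k+g) windows in which g positions are
    masked with '_' in every possible way. A generic sketch only."""
    span = k + g
    feats = Counter()
    for i in range(len(seq) - span + 1):
        window = seq[i:i + span]
        for gap_positions in combinations(range(span), g):
            key = "".join("_" if j in gap_positions else c
                          for j, c in enumerate(window))
            feats[key] += 1
    return feats

# Toy nanobody / antigen fragments (hypothetical sequences).
nb = "QVQLVESGGGLVQAGGSLRLSCAAS"
ag = "MKTIIALSYIFCLVFA"
pair_features = gapped_kmer_features(nb) + gapped_kmer_features(ag)
print(len(pair_features), "distinct gapped 3-mers for this Nb-antigen pair")
```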

Sharp Convergence Rates for Matching Pursuit

  • paper_url: http://arxiv.org/abs/2307.07679
  • repo_url: None
  • paper_authors: Jason M. Klusowski, Jonathan W. Siegel
  • for: This paper studies the fundamental limits of matching pursuit, i.e., approximating a target function by a sparse linear combination of dictionary elements (see the sketch after this entry). When the target lies in the variation space of the dictionary, decades of work have produced upper and lower bounds on the error of matching pursuit, but these bounds do not match; the paper's main contribution is to close this gap and obtain a sharp characterization of the decay rate.
  • methods: The paper constructs a worst-case dictionary showing that the existing best upper bound cannot be significantly improved. It turns out that, unlike other greedy algorithm variants, the convergence rate of matching pursuit is suboptimal and is determined by the solution to a certain non-linear equation.
  • results: As a consequence, any amount of shrinkage improves matching pursuit in the worst case: regardless of the dictionary choice, plain matching pursuit exhibits a suboptimal worst-case decay rate, in contrast with prior analyses that treated it as optimal in certain settings.
    Abstract We study the fundamental limits of matching pursuit, or the pure greedy algorithm, for approximating a target function by a sparse linear combination of elements from a dictionary. When the target function is contained in the variation space corresponding to the dictionary, many impressive works over the past few decades have obtained upper and lower bounds on the error of matching pursuit, but they do not match. The main contribution of this paper is to close this gap and obtain a sharp characterization of the decay rate of matching pursuit. Specifically, we construct a worst case dictionary which shows that the existing best upper bound cannot be significantly improved. It turns out that, unlike other greedy algorithm variants, the converge rate is suboptimal and is determined by the solution to a certain non-linear equation. This enables us to conclude that any amount of shrinkage improves matching pursuit in the worst case.
    摘要 我们研究基本限制的匹配追求(也称为纯格列批处理),用一个简单的线性组合来近似目标函数。当目标函数在字典的变换空间中存在时,过去几十年有很多出色的成果,得到了误差的上下限,但是它们不匹配。本文的主要贡献是关于匹配追求的衰减率的锐化特征化。我们构建了最坏情况的字典,显示现有的最佳上限不能得到显著改进。结果表明,与其他格列算法变体不同,匹配追求的 converges率是不优的,并且取决于一个非线性方程的解。这使得我们能够 conclued 任何Amount of shrinkage 都会提高匹配追求的性能在最坏情况下。
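
For reference, the "pure greedy algorithm" analyzed above is the textbook matching pursuit iteration: repeatedly pick the dictionary atom most correlated with the current residual and subtract its projection. The sketch below also exposes a shrinkage factor, since the results discuss shrinkage; the dictionary and target here are random placeholders, not the paper's worst-case construction.

```python
import numpy as np

def matching_pursuit(f, D, n_iter=50, shrinkage=1.0):
    """Pure greedy algorithm: at each step pick the unit-norm atom (column of D)
    most correlated with the residual and subtract a (possibly shrunk) multiple
    of its projection."""
    residual = f.copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual
        j = int(np.argmax(np.abs(corr)))
        step = shrinkage * corr[j]
        coeffs[j] += step
        residual -= step * D[:, j]
    return coeffs, residual

rng = np.random.default_rng(0)
D = rng.normal(size=(100, 300))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
f = D[:, :5] @ rng.normal(size=5)       # target in the span of a few atoms
_, r = matching_pursuit(f, D, n_iter=100, shrinkage=0.9)
print("relative residual:", np.linalg.norm(r) / np.linalg.norm(f))
```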

On the Robustness of Epoch-Greedy in Multi-Agent Contextual Bandit Mechanisms

  • paper_url: http://arxiv.org/abs/2307.07675
  • repo_url: None
  • paper_authors: Yinglun Xu, Bhuvesh Kumar, Jacob Abernethy
  • for: This paper studies learning in multi-armed bandit mechanisms such as pay-per-click auctions, which face three challenges simultaneously: inducing truthful bidding behavior (incentives), exploiting user personalization (context), and resisting manipulated click patterns (corruptions).
  • methods: The paper draws on truthful multi-armed bandit mechanisms, contextual bandit algorithms, and bandits with adversarial corruptions, and extends the $\epsilon$-greedy algorithm to handle strategic arms in the contextual multi-armed bandit mechanism setting (a generic $\epsilon$-greedy sketch follows this entry).
  • results: The extended $\epsilon$-greedy algorithm is shown to be inherently robust to adversarial data-corruption attacks, with performance degrading linearly in the amount of corruption.
    Abstract Efficient learning in multi-armed bandit mechanisms such as pay-per-click (PPC) auctions typically involves three challenges: 1) inducing truthful bidding behavior (incentives), 2) using personalization in the users (context), and 3) circumventing manipulations in click patterns (corruptions). Each of these challenges has been studied orthogonally in the literature; incentives have been addressed by a line of work on truthful multi-armed bandit mechanisms, context has been extensively tackled by contextual bandit algorithms, while corruptions have been discussed via a recent line of work on bandits with adversarial corruptions. Since these challenges co-exist, it is important to understand the robustness of each of these approaches in addressing the other challenges, provide algorithms that can handle all simultaneously, and highlight inherent limitations in this combination. In this work, we show that the most prominent contextual bandit algorithm, $\epsilon$-greedy can be extended to handle the challenges introduced by strategic arms in the contextual multi-arm bandit mechanism setting. We further show that $\epsilon$-greedy is inherently robust to adversarial data corruption attacks and achieves performance that degrades linearly with the amount of corruption.
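
As background, the base algorithm being extended is $\epsilon$-greedy with per-arm reward models. The sketch below is a plain contextual $\epsilon$-greedy loop with linear reward estimates; the mechanism-design (strategic arms, truthful payments) and corruption aspects studied in the paper are not modeled, and all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_arms, d, T, eps = 5, 8, 5000, 0.05
theta_true = rng.normal(size=(n_arms, d))          # unknown arm parameters

# Per-arm ridge-regression statistics for reward estimation.
A = np.stack([np.eye(d) for _ in range(n_arms)])
b = np.zeros((n_arms, d))

for t in range(T):
    x = rng.normal(size=d)                          # context
    if rng.random() < eps:                          # explore
        a = int(rng.integers(n_arms))
    else:                                           # exploit current estimates
        est = np.array([np.linalg.solve(A[k], b[k]) @ x for k in range(n_arms)])
        a = int(np.argmax(est))
    reward = theta_true[a] @ x + rng.normal(scale=0.1)
    A[a] += np.outer(x, x)
    b[a] += reward * x

print("learned vs true (arm 0, first 3 coords):",
      np.round(np.linalg.solve(A[0], b[0])[:3], 2), np.round(theta_true[0][:3], 2))
```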

An Empirical Study of the Effectiveness of Using a Replay Buffer on Mode Discovery in GFlowNets

  • paper_url: http://arxiv.org/abs/2307.07674
  • repo_url: None
  • paper_authors: Nikhil Vemgal, Elaine Lau, Doina Precup
  • for: This paper studies how a replay buffer can be used to speed up mode discovery in GFlowNets (see the sketch after this entry).
  • methods: Empirical studies explore various replay buffer sampling techniques and evaluate their impact on the speed of mode discovery and the quality of the discovered modes.
  • results: Experiments in the Hypergrid toy domain and a molecule-synthesis environment show that training with a replay buffer significantly improves both the speed of mode discovery and the quality of the discovered modes.
    Abstract Reinforcement Learning (RL) algorithms aim to learn an optimal policy by iteratively sampling actions to learn how to maximize the total expected return, $R(x)$. GFlowNets are a special class of algorithms designed to generate diverse candidates, $x$, from a discrete set, by learning a policy that approximates the proportional sampling of $R(x)$. GFlowNets exhibit improved mode discovery compared to conventional RL algorithms, which is very useful for applications such as drug discovery and combinatorial search. However, since GFlowNets are a relatively recent class of algorithms, many techniques which are useful in RL have not yet been associated with them. In this paper, we study the utilization of a replay buffer for GFlowNets. We explore empirically various replay buffer sampling techniques and assess the impact on the speed of mode discovery and the quality of the modes discovered. Our experimental results in the Hypergrid toy domain and a molecule synthesis environment demonstrate significant improvements in mode discovery when training with a replay buffer, compared to training only with trajectories generated on-policy.
    摘要 强化学习(RL)算法的目标是通过反复样本动作来学习最佳策略,以 maximize the total expected return, $R(x)$. GFlowNets 是一种特殊的算法,用于生成自 discrete 集合中的多个候选者,$x$, 通过学习一个策略,来近似 proportional sampling of $R(x)$. GFlowNets 在模式发现方面表现出了改善,这对于应用如药物发现和 combinatorial search 非常有用。然而,由于 GFlowNets 是一种相对较新的算法,许多RL中的技巧还没有与其相关。在这篇论文中,我们研究了 GFlowNets 中使用 replay buffer 的利用。我们通过 empirical 方式研究了不同的 replay buffer 采样技术的影响,以及它们对速度模式发现和模式质量的影响。我们的实验结果在 Hypergrid 玩家领域和一个分子合成环境中表明,在训练中使用 replay buffer 可以比训练只使用在政策上的 trajectories 更快地发现模式,并且模式质量也更高。
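
A replay buffer in this setting simply stores previously generated terminal objects and their rewards and mixes them back into training batches alongside freshly sampled on-policy trajectories. The sketch below shows one such buffer with reward-prioritized sampling as an example strategy; the paper compares several sampling techniques, and the capacity, priority rule, and placeholder "rollout" here are assumptions.

```python
import random

class TrajectoryReplayBuffer:
    """Minimal replay buffer storing (terminal_object, reward) pairs.
    Reward-prioritized sampling is just one of the strategies one might try."""
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.items = []

    def add(self, obj, reward):
        self.items.append((obj, reward))
        if len(self.items) > self.capacity:
            self.items.pop(0)            # drop oldest entry

    def sample(self, k, prioritized=True):
        if prioritized:
            weights = [r for _, r in self.items]
            return random.choices(self.items, weights=weights, k=k)
        return random.sample(self.items, min(k, len(self.items)))

# Usage sketch: mix replayed objects with freshly sampled (on-policy) ones.
buffer = TrajectoryReplayBuffer()
for step in range(100):
    obj, reward = f"object_{step}", 1.0 + (step % 7)   # placeholder rollout
    buffer.add(obj, reward)
batch = buffer.sample(8) + [("fresh_object", 3.0)]     # replayed + on-policy
print(len(batch), "items in training batch")
```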

Efficient Adversarial Attacks on Online Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.07670
  • repo_url: None
  • paper_authors: Guanlin Liu, Lifeng Lai
  • for: investigate the impact of adversarial attacks on MARL
  • methods: action poisoning, reward poisoning, mixed attack strategy
  • results: efficient attack on MARL agents even with no prior information about the environment and agents’ algorithms
    Abstract Due to the broad range of applications of multi-agent reinforcement learning (MARL), understanding the effects of adversarial attacks against MARL model is essential for the safe applications of this model. Motivated by this, we investigate the impact of adversarial attacks on MARL. In the considered setup, there is an exogenous attacker who is able to modify the rewards before the agents receive them or manipulate the actions before the environment receives them. The attacker aims to guide each agent into a target policy or maximize the cumulative rewards under some specific reward function chosen by the attacker, while minimizing the amount of manipulation on feedback and action. We first show the limitations of the action poisoning only attacks and the reward poisoning only attacks. We then introduce a mixed attack strategy with both the action poisoning and the reward poisoning. We show that the mixed attack strategy can efficiently attack MARL agents even if the attacker has no prior information about the underlying environment and the agents' algorithms.

Efficient Action Robust Reinforcement Learning with Probabilistic Policy Execution Uncertainty

  • paper_url: http://arxiv.org/abs/2307.07666
  • repo_url: None
  • paper_authors: Guanlin Liu, Zhihan Zhou, Han Liu, Lifeng Lai
  • for: The goal is to find policies that optimize worst-case performance in the face of uncertainty.
  • methods: The paper considers probabilistic policy execution uncertainty, in which the action specified by the policy is executed with probability $1-\rho$ and an adversarial action with probability $\rho$ (see the sketch after this entry). It establishes the existence of an optimal policy for such action-robust MDPs and develops the Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm, which achieves minimax-optimal regret and sample complexity.
  • results: Numerical experiments validate the robustness of the approach, showing that ARRLC outperforms non-robust RL algorithms under action perturbations and converges faster than a robust TD algorithm.
    Abstract Robust reinforcement learning (RL) aims to find a policy that optimizes the worst-case performance in the face of uncertainties. In this paper, we focus on action robust RL with the probabilistic policy execution uncertainty, in which, instead of always carrying out the action specified by the policy, the agent will take the action specified by the policy with probability $1-\rho$ and an alternative adversarial action with probability $\rho$. We establish the existence of an optimal policy on the action robust MDPs with probabilistic policy execution uncertainty and provide the action robust Bellman optimality equation for its solution. Furthermore, we develop Action Robust Reinforcement Learning with Certificates (ARRLC) algorithm that achieves minimax optimal regret and sample complexity. Furthermore, we conduct numerical experiments to validate our approach's robustness, demonstrating that ARRLC outperforms non-robust RL algorithms and converges faster than the robust TD algorithm in the presence of action perturbations.
    摘要 鲁棒强化学习(RL)的目标是在面临不确定性时找到能优化最差情况性能的策略。本文关注具有随机策略执行不确定性的动作鲁棒RL:智能体以概率$1-\rho$执行策略指定的动作,以概率$\rho$执行一个对抗性动作。我们证明了在这种不确定性下动作鲁棒MDP存在最优策略,并给出了相应的动作鲁棒Bellman最优性方程。此外,我们提出了Action Robust Reinforcement Learning with Certificates(ARRLC)算法,可实现极小极大最优的遗憾和样本复杂度。数值实验验证了方法的鲁棒性:在存在动作扰动时,ARRLC优于非鲁棒RL算法,并比鲁棒TD算法收敛更快。
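
The probabilistic policy execution model from the problem setup is easy to state in code: the intended action goes through with probability $1-\rho$, otherwise an adversarially chosen action is taken. The sketch below illustrates just that execution model (not the ARRLC algorithm); the "adversary" here is a uniform placeholder rather than a worst-case minimizer.

```python
import numpy as np

def execute_action(policy_action, adversarial_action, rho, rng):
    """Probabilistic policy execution: the intended action is carried out with
    probability 1 - rho, otherwise the adversary's action is taken."""
    return adversarial_action if rng.random() < rho else policy_action

rng = np.random.default_rng(0)
rho, n_actions, policy_action = 0.1, 4, 2
# A worst-case adversary would pick the value-minimizing action; here we just
# pick a different action uniformly as a placeholder.
adversarial_action = int(rng.choice([a for a in range(n_actions) if a != policy_action]))

taken = [execute_action(policy_action, adversarial_action, rho, rng) for _ in range(10_000)]
print("fraction of perturbed executions:", np.mean([a != policy_action for a in taken]))
```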

Machine learning for option pricing: an empirical investigation of network architectures

  • paper_url: http://arxiv.org/abs/2307.07657
  • repo_url: None
  • paper_authors: Laurens Van Mieghem, Antonis Papapantoleon, Jonas Papazoglou-Hennig
  • for: Supervised learning of option prices or implied volatilities from input data (model parameters) and corresponding output data (option prices or implied volatilities).
  • methods: Compares neural network architectures, including plain feed-forward networks, generalized highway networks inspired by image classification, and recent machine-learning methods for PDEs (a highway-block sketch follows this entry).
  • results: Empirically, for option pricing under the Black-Scholes and Heston models, the generalized highway network architecture performs best in terms of mean squared error and training time; for computing implied volatilities, after a necessary transformation, a variant of the DGM architecture performs best.
    Abstract We consider the supervised learning problem of learning the price of an option or the implied volatility given appropriate input data (model parameters) and corresponding output data (option prices or implied volatilities). The majority of articles in this literature considers a (plain) feed forward neural network architecture in order to connect the neurons used for learning the function mapping inputs to outputs. In this article, motivated by methods in image classification and recent advances in machine learning methods for PDEs, we investigate empirically whether and how the choice of network architecture affects the accuracy and training time of a machine learning algorithm. We find that for option pricing problems, where we focus on the Black--Scholes and the Heston model, the generalized highway network architecture outperforms all other variants, when considering the mean squared error and the training time as criteria. Moreover, for the computation of the implied volatility, after a necessary transformation, a variant of the DGM architecture outperforms all other variants, when considering again the mean squared error and the training time as criteria.
    摘要 我们考虑一个监督学习问题:给定输入数据(模型参数)和对应的输出数据(期权价格或隐含波动率),学习从输入到输出的函数映射。该领域的大多数文章采用(普通的)前馈神经网络架构来连接用于学习该映射的神经元。在本文中,受图像分类方法以及近期用于PDE的机器学习方法的启发,我们通过实验研究网络架构的选择是否以及如何影响机器学习算法的精度和训练时间。我们发现,对于Black-Scholes和Heston模型下的期权定价问题,以均方误差和训练时间为标准,通用高速公路网络架构优于其他所有变体;此外,在经过必要的变换后计算隐含波动率时,DGM架构的一个变体在同样的标准下表现最佳。
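
The highway architecture referenced above interpolates between a transformed representation and the identity via a learned gate, y = T(x)·H(x) + (1−T(x))·x. Below is a plain highway block in PyTorch stacked into a pricing-style regression head; the "generalized" variant used in the paper, the input dimension, and the layer sizes here are assumptions.

```python
import torch
import torch.nn as nn

class HighwayBlock(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x, with a sigmoid transform gate T."""
    def __init__(self, dim):
        super().__init__()
        self.transform = nn.Linear(dim, dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        return t * h + (1.0 - t) * x

# Regression head mapping model inputs (e.g. Heston parameters, strike,
# maturity -- 7 features assumed here) to an option price.
model = nn.Sequential(
    nn.Linear(7, 64),
    HighwayBlock(64), HighwayBlock(64), HighwayBlock(64),
    nn.Linear(64, 1),
)
price = model(torch.randn(32, 7))   # batch of 32 hypothetical inputs
print(price.shape)
```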

DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates

  • paper_url: http://arxiv.org/abs/2307.07652
  • repo_url: https://github.com/anonymous404404/digestcode
  • paper_authors: Peyman Gholami, Hulya Seferoglu
  • for: This paper addresses decentralized learning over distributed (possibly non-iid) data and proposes DIGEST, an asynchronous decentralized learning mechanism designed to be fast and communication-efficient.
  • methods: DIGEST combines ideas from Gossip and random-walk based learning and focuses on stochastic gradient descent (SGD); it is an asynchronous decentralized algorithm built on local-SGD algorithms, in single-stream and multi-stream variants (a local-SGD sketch follows this entry).
  • results: Convergence and communication overhead are analyzed for single- and multi-stream DIGEST, showing both approach the optimal solution asymptotically for iid and non-iid data; experiments on logistic regression and ResNet20 show that multi-stream DIGEST has convergence time better than or comparable to the baselines in the iid setting and outperforms them in the non-iid setting.
    Abstract Two widely considered decentralized learning algorithms are Gossip and random walk-based learning. Gossip algorithms (both synchronous and asynchronous versions) suffer from high communication cost, while random-walk based learning experiences increased convergence time. In this paper, we design a fast and communication-efficient asynchronous decentralized learning mechanism DIGEST by taking advantage of both Gossip and random-walk ideas, and focusing on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized algorithm building on local-SGD algorithms, which are originally designed for communication efficient centralized learning. We design both single-stream and multi-stream DIGEST, where the communication overhead may increase when the number of streams increases, and there is a convergence and communication overhead trade-off which can be leveraged. We analyze the convergence of single- and multi-stream DIGEST, and prove that both algorithms approach to the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network ResNet20. The simulation results confirm that multi-stream DIGEST has nice convergence properties; i.e., its convergence time is better than or comparable to the baselines in iid setting, and outperforms the baselines in non-iid setting.
    摘要 “两种广泛被考虑的分布式学习算法是聊天和随机游走学习。聊天算法(同步和异步版本)具有高通信成本,而随机游走学习则具有增长的收敛时间。在这篇论文中,我们设计了一种快速和通信效率高的异步分布式学习机制DIGEST,通过融合聊天和随机游走的想法,专注于随机梯度下降(SGD)。DIGEST是一种异步分布式算法,基于本地SGD算法,原本设计用于通信效率高的中央化学习。我们设计了单流和多流DIGEST,其通信开销随着流数增加,并且存在一种收敛和通信开销贸易,可以利用。我们分析了单流和多流DIGEST的收敛,并证明它们在iid和非iid数据分布下都能够向优化解决方案 asymptotically。我们对单流和多流DIGEST进行了逻辑回归和深度神经网络ResNet20的性能评估。实验结果表明,多流DIGEST具有良好的收敛性质,即其收敛时间在iid设定下比基eline更快,并在非iid设定下超过基eline。”
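
DIGEST builds on local-SGD algorithms, in which each node takes several local gradient steps between (infrequent) model exchanges. The sketch below shows the synchronous local-SGD building block with periodic parameter averaging on a toy least-squares problem; DIGEST's asynchronous gossip/random-walk machinery is not shown, and the shard construction, step counts, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, d, local_steps, rounds, lr = 4, 10, 20, 50, 0.05
w_true = rng.normal(size=d)

# Each node holds its own (non-iid, shifted) least-squares data shard.
shards = []
for i in range(n_nodes):
    X = rng.normal(loc=i * 0.2, size=(200, d))
    y = X @ w_true + rng.normal(scale=0.1, size=200)
    shards.append((X, y))

w = np.zeros(d)
for _ in range(rounds):
    local_models = []
    for X, y in shards:
        w_local = w.copy()
        for _ in range(local_steps):                 # local SGD steps
            idx = rng.integers(len(y), size=32)
            grad = X[idx].T @ (X[idx] @ w_local - y[idx]) / 32
            w_local -= lr * grad
        local_models.append(w_local)
    w = np.mean(local_models, axis=0)                # periodic averaging
print("parameter error:", np.linalg.norm(w - w_true))
```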

SALC: Skeleton-Assisted Learning-Based Clustering for Time-Varying Indoor Localization

  • paper_url: http://arxiv.org/abs/2307.07650
  • repo_url: None
  • paper_authors: An-Hung Hsiao, Li-Hsiang Shen, Chen-Yi Chang, Chun-Jie Chiu, Kai-Ten Feng
  • for: The paper aims to establish a sustainable and accurate indoor localization system that can adapt to highly-changing environments.
  • methods: It proposes a skeleton-assisted learning-based clustering localization (SALC) system that jointly considers similarities from the skeleton-based shortest path (SSP) and time-varying RSS measurements across reference points (RPs); the system includes RSS-oriented map-assisted clustering (ROMAC), cluster-based online database establishment (CODE), and cluster-scaled location estimation (CsLE).
  • results: The proposed SALC system can effectively reconstruct the fingerprint database with enhanced location estimation accuracy, outperforming existing schemes in the open literature in both simulations and experiments.
    Abstract Wireless indoor localization has attracted significant amount of attention in recent years. Using received signal strength (RSS) obtained from WiFi access points (APs) for establishing fingerprinting database is a widely utilized method in indoor localization. However, the time-variant problem for indoor positioning systems is not well-investigated in existing literature. Compared to conventional static fingerprinting, the dynamicallyreconstructed database can adapt to a highly-changing environment, which achieves sustainability of localization accuracy. To deal with the time-varying issue, we propose a skeleton-assisted learning-based clustering localization (SALC) system, including RSS-oriented map-assisted clustering (ROMAC), cluster-based online database establishment (CODE), and cluster-scaled location estimation (CsLE). The SALC scheme jointly considers similarities from the skeleton-based shortest path (SSP) and the time-varying RSS measurements across the reference points (RPs). ROMAC clusters RPs into different feature sets and therefore selects suitable monitor points (MPs) for enhancing location estimation. Moreover, the CODE algorithm aims for establishing adaptive fingerprint database to alleviate the timevarying problem. Finally, CsLE is adopted to acquire the target position by leveraging the benefits of clustering information and estimated signal variations in order to rescale the weights fromweighted k-nearest neighbors (WkNN) method. Both simulation and experimental results demonstrate that the proposed SALC system can effectively reconstruct the fingerprint database with an enhanced location estimation accuracy, which outperforms the other existing schemes in the open literature.
    摘要 无线室内定位系统在过去几年内吸引了广泛的关注。使用WiFi接入点(AP)获得的接收信号强度(RSS)来建立指纹库是室内定位系统中广泛使用的方法。然而,现有文献对室内定位系统中的时间变化问题研究不足。相比于传统的静态指纹,动态重建的指纹库可以适应高度变化的环境,实现定位精度的可持续性。为解决时间变化问题,我们提出一种骨架辅助的基于学习的聚类定位系统(SALC),包括RSS导向的地图辅助聚类(ROMAC)、基于聚类的在线数据库建立(CODE)和聚类缩放位置估计(CsLE)。SALC方案同时考虑基于骨架最短路径(SSP)的相似性和参考点(RPs)之间时变的RSS测量值。ROMAC将RPs划分为不同的特征集,并据此选择合适的监测点(MPs)以改进位置估计。此外,CODE算法旨在建立自适应的指纹库,以缓解时间变化问题。最后,CsLE利用聚类信息和估计的信号变化来重新调整加权k近邻(WkNN)方法中的权重,从而获得目标位置。仿真和实验结果均表明,所提出的SALC系统可以有效地重建指纹库并提高定位精度,优于公开文献中的其他现有方案。

DistTGL: Distributed Memory-Based Temporal Graph Neural Network Training

  • paper_url: http://arxiv.org/abs/2307.07649
  • repo_url: None
  • paper_authors: Hongkuan Zhou, Da Zheng, Xiang Song, George Karypis, Viktor Prasanna
  • for: This paper proposes a scalable approach for training memory-based Temporal Graph Neural Networks on distributed GPU clusters, improving both training efficiency and accuracy.
  • methods: DistTGL combines three improvements over existing solutions: an enhanced TGNN model, a novel training algorithm, and an optimized system.
  • results: In experiments, DistTGL achieves near-linear speedup, outperforming the state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
    Abstract Memory-based Temporal Graph Neural Networks are powerful tools in dynamic graph representation learning and have demonstrated superior performance in many real-world applications. However, their node memory favors smaller batch sizes to capture more dependencies in graph events and needs to be maintained synchronously across all trainers. As a result, existing frameworks suffer from accuracy loss when scaling to multiple GPUs. Evenworse, the tremendous overhead to synchronize the node memory make it impractical to be deployed to distributed GPU clusters. In this work, we propose DistTGL -- an efficient and scalable solution to train memory-based TGNNs on distributed GPU clusters. DistTGL has three improvements over existing solutions: an enhanced TGNN model, a novel training algorithm, and an optimized system. In experiments, DistTGL achieves near-linear convergence speedup, outperforming state-of-the-art single-machine method by 14.5% in accuracy and 10.17x in training throughput.
    摘要 基于记忆的时序图神经网络(Memory-based Temporal Graph Neural Networks)是动态图表示学习中的强大工具,在多个实际应用中表现出了优越的性能。然而,它们的节点记忆偏好较小的batch size以捕捉更多的图事件依赖关系,并需要在所有训练器之间同步维护。因此,现有框架在扩展到多个GPU时会出现精度损失;更糟的是,同步节点记忆带来的巨大开销使其难以部署到分布式GPU集群。为此,我们提出了DistTGL——一种在分布式GPU集群上训练基于记忆的TGNN的高效且可扩展的方案。DistTGL包含三方面改进:改进的TGNN模型、新的训练算法以及优化的系统。实验表明,DistTGL实现了近线性的加速,相比最先进的单机方法,精度提高14.5%,训练吞吐量提高10.17倍。

  • paper_url: http://arxiv.org/abs/2307.10219
  • repo_url: None
  • paper_authors: Zifeng Ding, Jingcheng Wu, Jingpei Wu, Yan Xia, Volker Tresp
  • for: This paper targets understanding and reasoning over hyper-relational knowledge graphs (HKGs) and temporal knowledge graphs (TKGs).
  • methods: The authors introduce two new benchmark datasets (Wiki-hy and YAGO-hy) and a hyper-relational TKG (HTKG) reasoning model that effectively handles both temporal information and qualifiers.
  • results: Experiments show the model substantially outperforms previous related methods on HTKG link prediction, and performance can be further improved by jointly leveraging time-invariant relational knowledge and temporal information.
    Abstract Stemming from traditional knowledge graphs (KGs), hyper-relational KGs (HKGs) provide additional key-value pairs (i.e., qualifiers) for each KG fact that help to better restrict the fact validity. In recent years, there has been an increasing interest in studying graph reasoning over HKGs. In the meantime, due to the ever-evolving nature of world knowledge, extensive parallel works have been focusing on reasoning over temporal KGs (TKGs), where each TKG fact can be viewed as a KG fact coupled with a timestamp (or time period) specifying its time validity. The existing HKG reasoning approaches do not consider temporal information because it is not explicitly specified in previous benchmark datasets. Besides, all the previous TKG reasoning methods only lay emphasis on temporal reasoning and have no way to learn from qualifiers. To this end, we aim to fill the gap between TKG reasoning and HKG reasoning. We develop two new benchmark hyper-relational TKG (HTKG) datasets, i.e., Wiki-hy and YAGO-hy, and propose a HTKG reasoning model that efficiently models both temporal facts and qualifiers. We further exploit additional time-invariant relational knowledge from the Wikidata knowledge base and study its effectiveness in HTKG reasoning. Time-invariant relational knowledge serves as the knowledge that remains unchanged in time (e.g., Sasha Obama is the child of Barack Obama), and it has never been fully explored in previous TKG reasoning benchmarks and approaches. Experimental results show that our model substantially outperforms previous related methods on HTKG link prediction and can be enhanced by jointly leveraging both temporal and time-invariant relational knowledge.
    摘要 traditional知识 graphs (KGs)的核心思想,hyper-relational知识 graphs (HKGs)提供每个KG事实的额外键值对(即资格),以更好地限定事实的有效性。近年来,研究图像理解在HKGs上有增加的兴趣。同时,由于世界知识的演化性,广泛的平行工作在图像理解过程中强调时间因素。现有的HKG理解方法不考虑时间信息,而且所有以前的TKG理解方法只是强调时间理解,没有考虑资格。为了填补这一空白,我们的目标是将HKG理解和TKG理解联系起来。我们开发了两个新的Benchmark hyper-relational TKG(HTKG)数据集,即Wiki-hy和YAGO-hy,并提出了一种HTKG理解模型,该模型能够有效地处理时间因素和资格。此外,我们还利用Wikidata知识库中的时间不变的关系知识,并研究其在HTKG理解中的效果。时间不变的关系知识是指不会随着时间的变化(例如萨沙·奥巴马是巴拉克·奥巴马的孩子),这种知识从未在过去的TKG理解benchmark和方法中被完全探索。实验结果表明,我们的模型在HTKG链接预测任务上显著超越了相关方法,并且可以通过同时利用时间因素和时间不变的关系知识来进一步提高性能。

Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning

  • paper_url: http://arxiv.org/abs/2307.07631
  • repo_url: None
  • paper_authors: Davide Giacomini, Maeesha Binte Hashem, Jeremiah Suarez, Swarup Bhunia, Amit Ranjan Trivedi
  • for: Easing the deployment of deep neural network models on resource-constrained devices.
  • methods: Memorization-based inference (MBI), which is compute-free and only requires lookups: key-value pairs are stored in a table and read out at inference time instead of recomputing activations (see the sketch after this entry).
  • results: Compared with compute-in-memory (CIM) approaches, MBI improves energy efficiency by about 2.7x over a multilayer perceptron (MLP)-CIM baseline and by about 83x over ResNet20-CIM for MNIST character recognition.
    Abstract The rapid advancement of deep neural networks has significantly improved various tasks, such as image and speech recognition. However, as the complexity of these models increases, so does the computational cost and the number of parameters, making it difficult to deploy them on resource-constrained devices. This paper proposes a novel memorization-based inference (MBI) that is compute free and only requires lookups. Specifically, our work capitalizes on the inference mechanism of the recurrent attention model (RAM), where only a small window of input domain (glimpse) is processed in a one time step, and the outputs from multiple glimpses are combined through a hidden vector to determine the overall classification output of the problem. By leveraging the low-dimensionality of glimpse, our inference procedure stores key value pairs comprising of glimpse location, patch vector, etc. in a table. The computations are obviated during inference by utilizing the table to read out key-value pairs and performing compute-free inference by memorization. By exploiting Bayesian optimization and clustering, the necessary lookups are reduced, and accuracy is improved. We also present in-memory computing circuits to quickly look up the matching key vector to an input query. Compared to competitive compute-in-memory (CIM) approaches, MBI improves energy efficiency by almost 2.7 times than multilayer perceptions (MLP)-CIM and by almost 83 times than ResNet20-CIM for MNIST character recognition.
    摘要 深度神经网络的快速进步大大提高了各种任务,如图像和语音识别。然而,随着模型的复杂度增加,计算成本和参数数量也在增加,使得在有限资源的设备上部署变得困难。这篇论文提出了一种新的记忆化推理(MBI),它是计算免的,只需要lookups。我们的工作利用回卷注意力模型(RAM)的推理机制,只处理一次步骤中的小窗口输入领域(印象),并将多个印象的输出组合到一个隐藏向量中,以确定问题的总分类输出。我们利用印象的低维度,将推理过程中的关键值对存储在一个表中。在推理过程中,通过利用表来读取关键值对和计算免的推理。通过对搜索和分区进行优化,减少了必要的lookups,提高了准确率。我们还提出了内存计算电路,快速查找输入查询对应的匹配键向量。与与计算在内存(CIM)方法相比,MBI提高了能效率,相对于多层感知(MLP)-CIM的2.7倍,相对于ResNet20-CIM的83倍。
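
The core idea above is replacing computation with table lookups: low-dimensional glimpse descriptors act as keys, and stored per-class contributions act as values that are simply read out and combined. The sketch below illustrates that lookup-and-accumulate pattern with random stand-in data; the RAM-style glimpse extraction, Bayesian optimization, clustering, and in-memory lookup circuits described in the paper are not modeled.

```python
import numpy as np

class MemorizationTable:
    """Key-value table for compute-free inference: keys are low-dimensional
    glimpse descriptors, values are stored per-class score contributions."""
    def __init__(self, keys, values):
        self.keys = np.asarray(keys)      # (n_entries, key_dim)
        self.values = np.asarray(values)  # (n_entries, n_classes)

    def lookup(self, query):
        # Nearest-key lookup; real designs use dedicated in-memory circuits.
        idx = int(np.argmin(np.linalg.norm(self.keys - query, axis=1)))
        return self.values[idx]

rng = np.random.default_rng(0)
table = MemorizationTable(rng.normal(size=(1000, 16)),
                          rng.normal(size=(1000, 10)))

# Classify by summing looked-up contributions over several glimpses.
glimpse_queries = rng.normal(size=(4, 16))            # hypothetical glimpse keys
scores = sum(table.lookup(q) for q in glimpse_queries)
print("predicted class:", int(np.argmax(scores)))
```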

Generalizable Embeddings with Cross-batch Metric Learning

  • paper_url: http://arxiv.org/abs/2307.07620
  • repo_url: https://github.com/yetigurbuz/xml-dml
  • paper_authors: Yeti Z. Gurbuz, A. Aydin Alatan
  • for: The paper studies the global average pooling (GAP) component in deep metric learning (DML) and how the learned entities it aggregates can generalize to unseen classes.
  • methods: GAP is formulated as a convex combination of learnable prototypes, and prototype learning is expressed as a recursive process fitting a linear predictor to a batch of samples; training is regularized across batches of disjoint classes by expressing one batch's samples with prototypes fitted to the other (see the sketch after this entry).
  • results: The approach is validated on four popular DML benchmarks, achieving good results.
    Abstract Global average pooling (GAP) is a popular component in deep metric learning (DML) for aggregating features. Its effectiveness is often attributed to treating each feature vector as a distinct semantic entity and GAP as a combination of them. Albeit substantiated, such an explanation's algorithmic implications to learn generalizable entities to represent unseen classes, a crucial DML goal, remain unclear. To address this, we formulate GAP as a convex combination of learnable prototypes. We then show that the prototype learning can be expressed as a recursive process fitting a linear predictor to a batch of samples. Building on that perspective, we consider two batches of disjoint classes at each iteration and regularize the learning by expressing the samples of a batch with the prototypes that are fitted to the other batch. We validate our approach on 4 popular DML benchmarks.
    摘要 全球平均池化(GAP)是深度度量学(DML)中常用的一个组件,用于Feature集合。其效果通常被归结到对每个特征向量视为不同的semantic实体,并将GAP视为它们的组合。虽然这种解释得到了证明,但是它的算法逻辑来学习可 generalized Entities来表示未经看过的类,深度度量学的重要目标,仍然不清楚。为此,我们将GAP表示为可学习的原型的吞合权重的 convex combination。我们然后证明了这种原型学习可以表示为一个递归过程,对一个批处理样本适应一个线性预测器。从这个角度出发,我们考虑了两个不同的批处理,并在每个迭代阶段对学习进行正则化,使用这些批处理中的样本表示另一个批处理中的原型。我们验证了我们的方法在4个深度度量学标准测试集上。
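
GAP pools a feature map as a uniform convex combination of its spatial feature vectors; the paper replaces those uniform weights with weights derived from learnable prototypes. The sketch below is one way to realize such prototype-weighted pooling in PyTorch and is my own illustrative formulation, not the paper's exact parameterization or its cross-batch regularization.

```python
import torch
import torch.nn as nn

class PrototypePooling(nn.Module):
    """Pool a (B, C, H, W) feature map as a convex combination of its spatial
    feature vectors, weighted by similarity to learnable prototypes.
    With uniform weights this reduces to global average pooling (GAP)."""
    def __init__(self, channels, n_prototypes=8):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, channels))

    def forward(self, feats):
        B, C, H, W = feats.shape
        x = feats.flatten(2).transpose(1, 2)            # (B, H*W, C)
        sim = x @ self.prototypes.t()                   # (B, H*W, K)
        w = torch.softmax(sim.mean(dim=-1), dim=1)      # convex weights over locations
        return (w.unsqueeze(-1) * x).sum(dim=1)         # (B, C) pooled embedding

pool = PrototypePooling(channels=128)
emb = pool(torch.randn(4, 128, 7, 7))
print(emb.shape)   # torch.Size([4, 128])
```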

Efficiently Factorizing Boolean Matrices using Proximal Gradient Descent

  • paper_url: http://arxiv.org/abs/2307.07615
  • repo_url: https://github.com/sdall/elbmf-python
  • paper_authors: Sebastian Dalleiger, Jilles Vreeken
  • for: addresses the interpretability problem of NMF on Boolean data
  • methods: uses Boolean algebra to decompose the input into low-rank Boolean factor matrices, with a novel elastic-binary regularizer and proximal gradient algorithm
  • results: demonstrates good performance in practice, with quick convergence, precise recovery of ground truth, and exact estimation of simulated rank; improves upon the state of the art in recall, loss, and runtime, and provides easily interpretable and semantically meaningful results on real-world data.
    Abstract Addressing the interpretability problem of NMF on Boolean data, Boolean Matrix Factorization (BMF) uses Boolean algebra to decompose the input into low-rank Boolean factor matrices. These matrices are highly interpretable and very useful in practice, but they come at the high computational cost of solving an NP-hard combinatorial optimization problem. To reduce the computational burden, we propose to relax BMF continuously using a novel elastic-binary regularizer, from which we derive a proximal gradient algorithm. Through an extensive set of experiments, we demonstrate that our method works well in practice: On synthetic data, we show that it converges quickly, recovers the ground truth precisely, and estimates the simulated rank exactly. On real-world data, we improve upon the state of the art in recall, loss, and runtime, and a case study from the medical domain confirms that our results are easily interpretable and semantically meaningful.

Towards Generalizable Detection of Urgency of Discussion Forum Posts

  • paper_url: http://arxiv.org/abs/2307.07614
  • repo_url: https://github.com/pcla-code/forum-posts-urgency
  • paper_authors: Valdemar Švábenský, Ryan S. Baker, Andrés Zambrano, Yishan Zou, Stefan Slater
  • for: Improving teaching quality in online courses by helping instructors better support student learning.
  • methods: Predictive models automatically determine the urgency of each discussion forum post, on a 7-point scale, so that urgent posts can be brought to instructors' attention and answered more effectively (see the sketch after this entry).
  • results: A support vector regressor trained on Universal Sentence Encoder embeddings of the posts predicts post urgency well across datasets, helping instructors focus their time and better support student learning.
    Abstract Students who take an online course, such as a MOOC, use the course's discussion forum to ask questions or reach out to instructors when encountering an issue. However, reading and responding to students' questions is difficult to scale because of the time needed to consider each message. As a result, critical issues may be left unresolved, and students may lose the motivation to continue in the course. To help address this problem, we build predictive models that automatically determine the urgency of each forum post, so that these posts can be brought to instructors' attention. This paper goes beyond previous work by predicting not just a binary decision cut-off but a post's level of urgency on a 7-point scale. First, we train and cross-validate several models on an original data set of 3,503 posts from MOOCs at University of Pennsylvania. Second, to determine the generalizability of our models, we test their performance on a separate, previously published data set of 29,604 posts from MOOCs at Stanford University. While the previous work on post urgency used only one data set, we evaluated the prediction across different data sets and courses. The best-performing model was a support vector regressor trained on the Universal Sentence Encoder embeddings of the posts, achieving an RMSE of 1.1 on the training set and 1.4 on the test set. Understanding the urgency of forum posts enables instructors to focus their time more effectively and, as a result, better support student learning.
    摘要 在线学习者,如MOOC课程的学生,通常会使用课程的讨论 форуم来提问或与教师联系,当遇到问题时。然而,为了考虑每条消息,评估每个消息的时间成本很高,因此可能会有重要的问题被忽略。为了解决这个问题,我们构建了预测模型,以自动确定讨论 форум的优先级,以便将这些消息引导给教师的注意。这篇论文超过了之前的工作,不仅预测了一个二分类决策阈值,而且预测了每条消息的优先级水平,从1到7的7个级别。首先,我们训练和十分之检验了多种模型,使用大学 Pennsylvania的MOOC课程的原始数据集3,503条消息。其次,为了证明我们的模型的一致性,我们测试了它们的性能在另一个,已经发表的数据集29,604条消息中。而之前的帖子优先级预测工作只使用了一个数据集,我们在不同的数据集和课程之间评估预测。最佳性能的模型是使用Universe Sentence Encoder嵌入的支持向量回归模型,在训练集上的RMSE为1.1,测试集上的RMSE为1.4。了解讨论 форум中的帖子优先级,可以帮助教师更有效地利用时间,从而更好地支持学生学习。
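
The best-performing model above is a support vector regressor over Universal Sentence Encoder embeddings. The sketch below assumes the 512-dimensional post embeddings have already been computed (random stand-ins are used here) and fits an SVR with scikit-learn; the kernel and hyperparameters are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

# Assume each forum post has already been embedded, e.g. with the Universal
# Sentence Encoder (512-dim vectors); random stand-ins are used here.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 512))                 # post embeddings
y = rng.uniform(1, 7, size=500)                 # urgency labels on a 1-7 scale

model = SVR(kernel="rbf", C=1.0, epsilon=0.1)   # hyperparameters are illustrative
rmse = np.sqrt(-cross_val_score(model, X, y, cv=5,
                                scoring="neg_mean_squared_error"))
print("cross-validated RMSE:", rmse.mean())
```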

First-order Methods for Affinely Constrained Composite Non-convex Non-smooth Problems: Lower Complexity Bound and Near-optimal Methods

  • paper_url: http://arxiv.org/abs/2307.07605
  • repo_url: None
  • paper_authors: Wei Liu, Qihang Lin, Yangyang Xu
  • for: This paper targets composite non-convex non-smooth optimization problems with linear (and/or nonlinear) function constraints.
  • methods: It studies first-order methods (FOMs) for this problem class and establishes lower complexity bounds for producing (near) $\epsilon$-stationary points under two different first-order oracles.
  • results: It gives the first lower complexity bounds for FOMs on composite non-convex non-smooth optimization with linear constraints, and proposes an inexact proximal gradient (IPG) method whose oracle complexity matches the lower bounds up to a logarithmic factor, so the bounds and the method are almost non-improvable.
    Abstract Many recent studies on first-order methods (FOMs) focus on \emph{composite non-convex non-smooth} optimization with linear and/or nonlinear function constraints. Upper (or worst-case) complexity bounds have been established for these methods. However, little can be claimed about their optimality as no lower bound is known, except for a few special \emph{smooth non-convex} cases. In this paper, we make the first attempt to establish lower complexity bounds of FOMs for solving a class of composite non-convex non-smooth optimization with linear constraints. Assuming two different first-order oracles, we establish lower complexity bounds of FOMs to produce a (near) $\epsilon$-stationary point of a problem (and its reformulation) in the considered problem class, for any given tolerance $\epsilon>0$. In addition, we present an inexact proximal gradient (IPG) method by using the more relaxed one of the two assumed first-order oracles. The oracle complexity of the proposed IPG, to find a (near) $\epsilon$-stationary point of the considered problem and its reformulation, matches our established lower bounds up to a logarithmic factor. Therefore, our lower complexity bounds and the proposed IPG method are almost non-improvable.
    摘要 很多最近的研究对于首项方法(FOMs)强调 composite non-convex non-smooth 优化问题,包括线性和/或非线性函数约束。然而,对于这些方法的优化性没有很多研究,只有一些特殊的平滑非几何优化问题例外。在这篇论文中,我们首次尝试确定 FOMs 对于解决 composite non-convex non-smooth 优化问题的类型的下界复杂度。我们假设两种不同的首项或acles,并确定 FOMs 的下界复杂度,以便在任何给定的 tolerance ε > 0 下,生成 (near) ε-站点。此外,我们还提出了一种不准确的 proximal Gradient(IPG)方法,使用更松的一个首项或acles。我们的 IPG 方法的 oracle 复杂度与我们确定的下界复杂度几乎相同,只有一个对数性logarithmic factor。因此,我们的下界复杂度和提出的 IPG 方法在优化性方面几乎不可改进。

Smooth Lower Bounds for Differentially Private Algorithms via Padding-and-Permuting Fingerprinting Codes

  • paper_url: http://arxiv.org/abs/2307.07604
  • repo_url: None
  • paper_authors: Naty Peter, Eliad Tsfadia, Jonathan Ullman
  • for: This paper provides a simple method for generating hard instances that yield sharper (smoother) lower bounds for differentially private (DP) algorithms.
  • methods: Hard instances are generated by applying a padding-and-permuting transformation to a fingerprinting code, together with a new, stronger fingerprinting lemma from which the lower bounds are proved directly.
  • results: New lower bounds are obtained in several settings, including DP averaging in the low-accuracy regime (implying a new bound for the private 1-cluster problem), the additive error of DP approximate k-means clustering as a function of the multiplicative error, and DP estimation of the top singular vector of a matrix (a special case of DP subspace estimation).
    Abstract Fingerprinting arguments, first introduced by Bun, Ullman, and Vadhan (STOC 2014), are the most widely used method for establishing lower bounds on the sample complexity or error of approximately differentially private (DP) algorithms. Still, there are many problems in differential privacy for which we don't know suitable lower bounds, and even for problems that we do, the lower bounds are not smooth, and usually become vacuous when the error is larger than some threshold. In this work, we present a simple method to generate hard instances by applying a padding-and-permuting transformation to a fingerprinting code. We illustrate the applicability of this method by providing new lower bounds in various settings: 1. A tight lower bound for DP averaging in the low-accuracy regime, which in particular implies a new lower bound for the private 1-cluster problem introduced by Nissim, Stemmer, and Vadhan (PODS 2016). 2. A lower bound on the additive error of DP algorithms for approximate k-means clustering, as a function of the multiplicative error, which is tight for a constant multiplication error. 3. A lower bound for estimating the top singular vector of a matrix under DP in low-accuracy regimes, which is a special case of DP subspace estimation studied by Singhal and Steinke (NeurIPS 2021). Our main technique is to apply a padding-and-permuting transformation to a fingerprinting code. However, rather than proving our results using a black-box access to an existing fingerprinting code (e.g., Tardos' code), we develop a new fingerprinting lemma that is stronger than those of Dwork et al. (FOCS 2015) and Bun et al. (SODA 2017), and prove our lower bounds directly from the lemma. Our lemma, in particular, gives a simpler fingerprinting code construction with optimal rate (up to polylogarithmic factors) that is of independent interest.
    摘要 “指纹Argument”,最早由布恩、奥尔曼和 вадан(STOC 2014)引入,是最广泛使用的方法来确定下界或错误率的约 differentially private(DP)算法的下界。然而,有很多 differential privacy 问题,我们还没有知道合适的下界,而且甚至对已知的问题,下界不是平滑的,通常在误差大于某个阈值时变得无效。在这项工作中,我们提出了一种简单的方法,通过对指纹编码进行补充和排序转换来生成困难实例。我们通过以下几个方面证明了这种方法的应用性:1. 对DP抽象平均在低精度 régime中的下界,具体是对Nissim、Stemmer和 вадан(PODS 2016)所引入的私人1-集问题的新下界。2. DP算法对 Approximate k-means 集群化的添加性误差下界,其中multiplicative error是常数多项式的。3. DP算法对矩阵 top singular vector 的估计在低精度 régime中的下界,这是特殊的DP subspace estimation问题,与Singhal和Steinke(NeurIPS 2021)的研究相关。我们的主要技巧是对指纹编码进行补充和排序转换。而不是通过黑盒访问现有的指纹编码(例如Tardos的代码)来证明我们的结果(例如Dwork等人(FOCS 2015)和布恩等人(SODA 2017)的结果),我们开发了一个新的指纹 lemmatheorem,该lemmatheorem是Dwork等人(FOCS 2015)和布恩等人(SODA 2017)的lemmatheorem更强,并直接从lemmatheorem prove我们的下界。具体来说,我们的lemmatheorem提供了一种更简单的指纹编码建构,具有最佳率(即polylogarithmic factor),这是独立有价值的。

Training Discrete Energy-Based Models with Energy Discrepancy

  • paper_url: http://arxiv.org/abs/2307.07595
  • repo_url: None
  • paper_authors: Tobias Schröder, Zijing Ou, Yingzhen Li, Andrew B. Duncan
  • for: This work proposes a new contrastive loss for training energy-based models (EBMs) on discrete spaces.
  • methods: Energy discrepancy (ED) only requires evaluating the energy function at data points and their perturbed counterparts, avoiding MCMC-style sampling; three perturbation processes are studied: Bernoulli noise, deterministic transforms, and neighbourhood structures (a schematic loss sketch follows this entry).
  • results: The perturbation variants are compared on lattice Ising models, binary synthetic data, and discrete image datasets, demonstrating the effectiveness of ED.
    Abstract Training energy-based models (EBMs) on discrete spaces is challenging because sampling over such spaces can be difficult. We propose to train discrete EBMs with energy discrepancy (ED), a novel type of contrastive loss functional which only requires the evaluation of the energy function at data points and their perturbed counter parts, thus not relying on sampling strategies like Markov chain Monte Carlo (MCMC). Energy discrepancy offers theoretical guarantees for a broad class of perturbation processes of which we investigate three types: perturbations based on Bernoulli noise, based on deterministic transforms, and based on neighbourhood structures. We demonstrate their relative performance on lattice Ising models, binary synthetic data, and discrete image data sets.
    摘要 培训能量基于模型(EBM)在极性空间上是具有挑战性的,因为抽样这些空间可能困难。我们提议使用能量差(ED),一种新的对比损失函数,只需评估能量函数在数据点和其扰动版本之间,因此不需要采用样本策略如Markov链 Monte Carlo(MCMC)。能量差提供了对广泛类型扰动过程的理论保证,我们investigate三种类型的扰动过程:基于 Bernoulli 噪声、基于 deterministic transforms 和基于 neighbor structure。我们在邻居 Ising 模型、二进制 synthetic 数据和极性图像数据集上证明了它们的相对性能。
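
To make the "data point vs. perturbed counterpart" idea concrete, the sketch below implements a Bernoulli-perturbation contrastive loss in the spirit of (stabilized) energy discrepancy: each data point's energy is compared against the energies of several bit-flipped copies of it, with no MCMC. Treat this exact functional form, the flip probability, and the toy linear energy as assumptions; the paper's precise loss may differ.

```python
import torch

def bernoulli_perturb(x, p=0.1, n_samples=8):
    # x: (B, D) in {0, 1}; flip each bit independently with probability p.
    x_rep = x.unsqueeze(1).expand(-1, n_samples, -1)
    flips = (torch.rand_like(x_rep) < p).float()
    return (x_rep + flips) % 2.0

def energy_discrepancy_loss(energy_fn, x, p=0.1, n_samples=8, w=1.0):
    # Compare energies at data points with energies at perturbed counterparts.
    # This specific stabilized form is an assumption, not the paper's exact loss.
    e_data = energy_fn(x)                                        # (B,)
    x_pert = bernoulli_perturb(x, p, n_samples)                  # (B, M, D)
    e_pert = energy_fn(x_pert.reshape(-1, x.shape[1])).reshape(x.shape[0], n_samples)
    ratio = torch.exp(e_data.unsqueeze(1) - e_pert).mean(dim=1)
    return torch.log(w / n_samples + ratio).mean()

# Toy energy over 16 binary variables; a real EBM would be a neural network.
theta = torch.nn.Parameter(torch.zeros(16))
energy = lambda v: v @ theta
x = (torch.rand(32, 16) < 0.5).float()
loss = energy_discrepancy_loss(energy, x)
loss.backward()
print(float(loss))
```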

A Quantitative Approach to Predicting Representational Learning and Performance in Neural Networks

  • paper_url: http://arxiv.org/abs/2307.07575
  • repo_url: None
  • paper_authors: Ryan Pyle, Sebastian Musslick, Jonathan D. Cohen, Ankit B. Patel
  • for: This work studies how neural networks (biological and artificial) learn to represent and manipulate input information to solve tasks; since different representations suit different tasks, identifying and understanding learned representations is critical for understanding and designing useful networks.
  • methods: A new pseudo-kernel-based tool is proposed for analyzing and predicting learned representations based only on the network's initial conditions and the training curriculum.
  • results: After validation on a simple test case, the tool is used to predict how the scale of weight initialization and the training curriculum affect representational learning and downstream sequential single-task versus concurrent multitask performance.
    Abstract A key property of neural networks (both biological and artificial) is how they learn to represent and manipulate input information in order to solve a task. Different types of representations may be suited to different types of tasks, making identifying and understanding learned representations a critical part of understanding and designing useful networks. In this paper, we introduce a new pseudo-kernel based tool for analyzing and predicting learned representations, based only on the initial conditions of the network and the training curriculum. We validate the method on a simple test case, before demonstrating its use on a question about the effects of representational learning on sequential single versus concurrent multitask performance. We show that our method can be used to predict the effects of the scale of weight initialization and training curriculum on representational learning and downstream concurrent multitasking performance.
    摘要 神经网络(生物的和人工的)的一个关键性质是它们如何学习表示并处理输入信息以解决任务。不同类型的表示可能适用于不同类型的任务,因此识别和理解学习到的表示是理解和设计有用网络的关键部分。在这篇论文中,我们介绍了一种新的基于伪核(pseudo-kernel)的工具,仅根据网络的初始条件和训练课程来分析和预测学习到的表示。我们先在一个简单的测试场景中验证了该方法,然后将其用于研究表示学习对串行单任务与并行多任务性能影响的问题。我们表明,该方法可以预测权重初始化规模和训练课程对表示学习及下游并行多任务性能的影响。

Harpa: High-Rate Phase Association with Travel Time Neural Fields

  • paper_url: http://arxiv.org/abs/2307.07572
  • repo_url: https://github.com/dadacheng/phase_association
  • paper_authors: Cheng Shi, Maarten V. de Hoop, Ivan Dokmanić
  • for: This paper addresses phase association for small, high-rate seismic events, which carry fundamental information about earthquake dynamics.
  • methods: Harpa uses deep neural fields to build generative models of wave speeds and associated travel times, first solving a joint spatio-temporal source localization and wave-speed recovery problem and then associating; arrival times are treated as probability measures with an optimal transport loss, and the non-convex optimization is handled with stochastic gradient Langevin dynamics (an SGLD sketch follows this entry).
  • results: Association is shown to be possible at rates much higher than previously reported even when the wave speed is unknown; numerical experiments show Harpa efficiently associates high-rate seismicity clouds over complex, unknown wave speeds and gracefully handles noisy and missing picks.
    Abstract Phase association groups seismic wave arrivals according to their originating earthquakes. It is a fundamental task in a seismic data processing pipeline, but challenging to perform for smaller, high-rate seismic events which carry fundamental information about earthquake dynamics, especially with a commonly assumed inaccurate wave speed model. As a consequence, most association methods focus on larger events that occur at a lower rate and are thus easier to associate, even though microseismicity provides a valuable description of the elastic medium properties in the subsurface. In this paper, we show that association is possible at rates much higher than previously reported even when the wave speed is unknown. We propose Harpa, a high-rate seismic phase association method which leverages deep neural fields to build generative models of wave speeds and associated travel times, and first solves a joint spatio--temporal source localization and wave speed recovery problem, followed by association. We obviate the need for associated phases by interpreting arrival time data as probability measures and using an optimal transport loss to enforce data fidelity. The joint recovery problem is known to admit a unique solution under certain conditions but due to the non-convexity of the corresponding loss a simple gradient scheme converges to poor local minima. We show that this is effectively mitigated by stochastic gradient Langevin dynamics (SGLD). Numerical experiments show that \harpa~efficiently associates high-rate seismicity clouds over complex, unknown wave speeds and graciously handles noisy and missing picks.
    摘要 phasic association groups seismic wave arrivals based on their originating earthquakes. It is a fundamental task in a seismic data processing pipeline, but challenging to perform for smaller, high-rate seismic events which carry fundamental information about earthquake dynamics, especially with a commonly assumed inaccurate wave speed model. As a consequence, most association methods focus on larger events that occur at a lower rate and are thus easier to associate, even though microseismicity provides a valuable description of the elastic medium properties in the subsurface. In this paper, we show that association is possible at rates much higher than previously reported even when the wave speed is unknown. We propose Harpa, a high-rate seismic phase association method which leverages deep neural fields to build generative models of wave speeds and associated travel times, and first solves a joint spatio--temporal source localization and wave speed recovery problem, followed by association. We obviate the need for associated phases by interpreting arrival time data as probability measures and using an optimal transport loss to enforce data fidelity. The joint recovery problem is known to admit a unique solution under certain conditions but due to the non-convexity of the corresponding loss a simple gradient scheme converges to poor local minima. We show that this is effectively mitigated by stochastic gradient Langevin dynamics (SGLD). Numerical experiments show that \harpa~efficiently associates high-rate seismicity clouds over complex, unknown wave speeds and graciously handles noisy and missing picks.

Variational Prediction

  • paper_url: http://arxiv.org/abs/2307.07568
  • repo_url: https://github.com/piyushpathak03/Recommendation-systems
  • paper_authors: Alexander A. Alemi, Ben Poole
  • for: This paper examines the benefits of Bayesian inference together with its computational costs.
  • methods: Variational Prediction directly learns a variational approximation to the posterior predictive distribution using a variational bound.
  • results: The approach can provide good predictive distributions without test-time marginalization costs, demonstrated on an illustrative toy example.
    Abstract Bayesian inference offers benefits over maximum likelihood, but it also comes with computational costs. Computing the posterior is typically intractable, as is marginalizing that posterior to form the posterior predictive distribution. In this paper, we present variational prediction, a technique for directly learning a variational approximation to the posterior predictive distribution using a variational bound. This approach can provide good predictive distributions without test time marginalization costs. We demonstrate Variational Prediction on an illustrative toy example.

Reconstruction of 3-Axis Seismocardiogram from Right-to-left and Head-to-foot Components Using A Long Short-Term Memory Network

  • paper_url: http://arxiv.org/abs/2307.07566
  • repo_url: None
  • paper_authors: Mohammad Muntasir Rahman, Amirtahà Taebi
  • for: This study aims to develop a deep learning model for predicting seismocardiogram (SCG) signals in the dorsoventral direction.
  • methods: A dataset from 15 healthy adult subjects was used to train and validate the model; SCG signals were recorded with tri-axial accelerometers, segmented using electrocardiogram R waves, then downsampled, normalized, and centered around zero.
  • results: The trained LSTM network maps 100 time steps of SCG signals (one cardiac cycle) to the dorsoventral SCG component with a mean square error of 0.09, demonstrating that deep learning models can reconstruct 3-axis SCG signals from dual-axis accelerometer data.
    Abstract This pilot study aims to develop a deep learning model for predicting seismocardiogram (SCG) signals in the dorsoventral direction from the SCG signals in the right-to-left and head-to-foot directions ($\textrm{SCG}_x$ and $\textrm{SCG}_y$). The dataset used for the training and validation of the model was obtained from 15 healthy adult subjects. The SCG signals were recorded using tri-axial accelerometers placed on the chest of each subject. The signals were then segmented using electrocardiogram R waves, and the segments were downsampled, normalized, and centered around zero. The resulting dataset was used to train and validate a long short-term memory (LSTM) network with two layers and a dropout layer to prevent overfitting. The network took as input 100-time steps of $\textrm{SCG}_x$ and $\textrm{SCG}_y$, representing one cardiac cycle, and outputted a vector that mapped to the target variable being predicted. The results showed that the LSTM model had a mean square error of 0.09 between the predicted and actual SCG segments in the dorsoventral direction. The study demonstrates the potential of deep learning models for reconstructing 3-axis SCG signals using the data obtained from dual-axis accelerometers.
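To make the described architecture concrete, the sketch below shows a two-layer LSTM with dropout that maps 100 time steps of the two lateral SCG components to a dorsoventral output, trained with mean squared error. This is a minimal PyTorch sketch under stated assumptions: the hidden size, dropout rate, and per-time-step output head are illustrative choices, not the authors' reported configuration.

```python
import torch
import torch.nn as nn

class SCGReconstructor(nn.Module):
    """Maps (batch, 100, 2) segments of SCG_x / SCG_y to a 100-sample
    dorsoventral SCG_z segment (layer sizes are illustrative)."""
    def __init__(self, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size,
                            num_layers=2, batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden_size, 1)   # one output sample per time step

    def forward(self, x):                  # x: (batch, 100, 2)
        out, _ = self.lstm(x)              # (batch, 100, hidden)
        return self.head(out).squeeze(-1)  # (batch, 100)

model = SCGReconstructor()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 100, 2)   # dummy SCG_x / SCG_y segments
y = torch.randn(8, 100)      # dummy SCG_z targets
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```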

Expressive Monotonic Neural Networks

  • paper_url: http://arxiv.org/abs/2307.07512
  • repo_url: https://github.com/niklasnolte/hlt_2track
  • paper_authors: Ouail Kitouni, Niklas Nolte, Michael Williams
  • for: This paper aims to build neural network architectures that provably guarantee monotonic dependence of the outputs on selected inputs, which improves interpretability and fairness in a wide range of application scenarios.
  • methods: The paper proposes a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence; the weight-constraint scheme directly controls the network's Lipschitz constant, providing an additional robustness benefit.
  • results: The method trains powerful, robust, and interpretable discriminators that achieve performance competitive with state-of-the-art approaches across a variety of benchmarks.
    Abstract The monotonic dependence of the outputs of a neural network on some of its inputs is a crucial inductive bias in many scenarios where domain knowledge dictates such behavior. This is especially important for interpretability and fairness considerations. In a broader context, scenarios in which monotonicity is important can be found in finance, medicine, physics, and other disciplines. It is thus desirable to build neural network architectures that implement this inductive bias provably. In this work, we propose a weight-constrained architecture with a single residual connection to achieve exact monotonic dependence in any subset of the inputs. The weight constraint scheme directly controls the Lipschitz constant of the neural network and thus provides the additional benefit of robustness. Compared to currently existing techniques used for monotonicity, our method is simpler in implementation and in theory foundations, has negligible computational overhead, is guaranteed to produce monotonic dependence, and is highly expressive. We show how the algorithm is used to train powerful, robust, and interpretable discriminators that achieve competitive performance compared to current state-of-the-art methods across various benchmarks, from social applications to the classification of the decays of subatomic particles produced at the CERN Large Hadron Collider.
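One common way to realize the construction sketched in the abstract is to add a fixed positive slope on the monotone inputs through a residual connection and bound the Lipschitz constant of the remaining network below that slope. The code below is a simplified sketch of that idea, not the authors' exact weight-constraint scheme; the infinity-norm rescaling, ReLU activations, and choice of lambda are assumptions.

```python
import torch
import torch.nn as nn

class LipschitzLinear(nn.Linear):
    """Linear layer whose weight is rescaled so its induced infinity-norm
    (max absolute row sum) is at most `scale`."""
    def __init__(self, in_f, out_f, scale=1.0):
        super().__init__(in_f, out_f)
        self.scale = scale

    def forward(self, x):
        norm = self.weight.abs().sum(dim=1).max()
        w = self.weight * (self.scale / torch.clamp(norm, min=self.scale))
        return nn.functional.linear(x, w, self.bias)

class MonotonicNet(nn.Module):
    """f(x) = g(x) + lam * sum of the monotone inputs, with Lip(g) <= lam,
    which guarantees f is non-decreasing in those inputs."""
    def __init__(self, in_f, monotone_idx, lam=1.0, hidden=32):
        super().__init__()
        self.monotone_idx = monotone_idx
        self.lam = lam
        # product of per-layer bounds keeps Lip(g) <= lam (ReLU is 1-Lipschitz)
        self.g = nn.Sequential(
            LipschitzLinear(in_f, hidden, scale=lam ** 0.5), nn.ReLU(),
            LipschitzLinear(hidden, 1, scale=lam ** 0.5),
        )

    def forward(self, x):
        residual = self.lam * x[:, self.monotone_idx].sum(dim=1, keepdim=True)
        return self.g(x) + residual

net = MonotonicNet(in_f=5, monotone_idx=[0, 2])
out = net(torch.randn(8, 5))   # non-decreasing in inputs 0 and 2
```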

MGit: A Model Versioning and Management System

  • paper_url: http://arxiv.org/abs/2307.07507
  • repo_url: None
  • paper_authors: Wei Hao, Daniel Mendoza, Rafael da Silva, Deepak Narayanan, Amar Phanishayee
  • for: This paper presents a versioning and management system for model derivatives in machine learning (ML), helping users store, test, update, and collaborate on derived models.
  • methods: The system uses a lineage graph to record provenance and versioning information between models, implements optimizations for efficiently storing model parameters, and provides abstractions over the lineage graph for testing, updating, and collaboration.
  • results: The system reduces the lineage graph's storage footprint by up to 7x and automatically updates downstream models in response to updates to upstream models.
    Abstract Models derived from other models are extremely common in machine learning (ML) today. For example, transfer learning is used to create task-specific models from "pre-trained" models through finetuning. This has led to an ecosystem where models are related to each other, sharing structure and often even parameter values. However, it is hard to manage these model derivatives: the storage overhead of storing all derived models quickly becomes onerous, prompting users to get rid of intermediate models that might be useful for further analysis. Additionally, undesired behaviors in models are hard to track down (e.g., is a bug inherited from an upstream model?). In this paper, we propose a model versioning and management system called MGit that makes it easier to store, test, update, and collaborate on model derivatives. MGit introduces a lineage graph that records provenance and versioning information between models, optimizations to efficiently store model parameters, as well as abstractions over this lineage graph that facilitate relevant testing, updating and collaboration functionality. MGit is able to reduce the lineage graph's storage footprint by up to 7x and automatically update downstream models in response to updates to upstream models.
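To illustrate what a lineage graph buys in practice, here is a toy sketch of provenance edges with update propagation to downstream models. It is purely illustrative: the class names, the `rederive` callback, and the traversal are hypothetical and do not reflect MGit's actual data model or API.

```python
from dataclasses import dataclass, field

@dataclass
class ModelNode:
    name: str
    version: int = 0
    parents: list = field(default_factory=list)   # upstream models
    children: list = field(default_factory=list)  # derived models

class LineageGraph:
    """Toy lineage graph: records which models were derived from which,
    and re-derives descendants when an upstream model is updated."""
    def __init__(self):
        self.nodes = {}

    def add_model(self, name, parents=()):
        node = ModelNode(name, parents=[self.nodes[p] for p in parents])
        for p in node.parents:
            p.children.append(node)
        self.nodes[name] = node
        return node

    def update(self, name, rederive):
        """Bump a model's version and propagate to all descendants.
        `rederive(child, parent)` stands in for re-running finetuning."""
        root = self.nodes[name]
        root.version += 1
        stack = list(root.children)
        while stack:
            child = stack.pop()
            rederive(child, root)
            child.version += 1
            stack.extend(child.children)

g = LineageGraph()
g.add_model("bert-base")
g.add_model("sentiment-head", parents=["bert-base"])
g.update("bert-base", rederive=lambda c, p: print(f"re-finetune {c.name} from {p.name}"))
```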

Brain Tumor Detection using Convolutional Neural Networks with Skip Connections

  • paper_url: http://arxiv.org/abs/2307.07503
  • repo_url: None
  • paper_authors: Aupam Hamran, Marzieh Vaeztourshizi, Amirhossein Esmaili, Massoud Pedram
  • for: Classify brain tumors into benign and malignant types using CNNs
  • methods: Uses MRI scans and applies different CNN architecture optimization techniques (widening, deepening, and skip connections) for classification
  • results: A judiciously chosen subset of these techniques allows the CNN model to outperform a baseline model on this task
    Abstract In this paper, we present different architectures of Convolutional Neural Networks (CNN) to analyze and classify the brain tumors into benign and malignant types using the Magnetic Resonance Imaging (MRI) technique. Different CNN architecture optimization techniques such as widening and deepening of the network and adding skip connections are applied to improve the accuracy of the network. Results show that a subset of these techniques can judiciously be used to outperform a baseline CNN model used for the same purpose.
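As a generic illustration of the skip-connection idea applied to this task (not the authors' exact architecture), a minimal residual convolutional block and classifier for single-channel MRI slices might look like the following; channel counts, input size, and the classifier head are assumptions.

```python
import torch
import torch.nn as nn

class SkipBlock(nn.Module):
    """Two 3x3 convolutions with a skip (residual) connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection

class TumorClassifier(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.blocks = nn.Sequential(SkipBlock(16), SkipBlock(16))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(16, num_classes))

    def forward(self, x):            # x: (batch, 1, H, W) MRI slice
        return self.head(self.blocks(self.stem(x)))

logits = TumorClassifier()(torch.randn(2, 1, 128, 128))
```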

Reinforcement Learning for Photonic Component Design

  • paper_url: http://arxiv.org/abs/2307.11075
  • repo_url: None
  • paper_authors: Donald Witt, Jeff Young, Lukas Chrostowski
  • for: This work develops a fab-in-the-loop reinforcement learning algorithm for designing nano-photonic components on a 220nm silicon-on-insulator single etch platform, accounting for imperfections in the nanofabrication process.
  • methods: The algorithm places fabrication in the learning loop so that imperfections introduced by the nanofabrication process are compensated for during component design.
  • results: The algorithm reduces the insertion loss from 8.8 dB to 3.24 dB, and its widest-bandwidth designs cover a 150 nm bandwidth with less than 10.2 dB of loss at their lowest point.
    Abstract We present a new fab-in-the-loop reinforcement learning algorithm for the design of nano-photonic components that accounts for the imperfections present in nanofabrication processes. As a demonstration of the potential of this technique, we apply it to the design of photonic crystal grating couplers (PhCGC) fabricated on a 220nm silicon on insulator (SOI) single etch platform. This fab-in-the-loop algorithm improves the insertion loss from 8.8 dB to 3.24 dB. The widest bandwidth designs produced using our fab-in-the-loop algorithm are able to cover a 150nm bandwidth with less than 10.2 dB of loss at their lowest point.

PseudoCal: A Source-Free Approach to Unsupervised Uncertainty Calibration in Domain Adaptation

  • paper_url: http://arxiv.org/abs/2307.07489
  • repo_url: None
  • paper_authors: Dapeng Hu, Jian Liang, Xinchao Wang, Chuan-Sheng Foo
  • for: This paper focuses on improving the calibration of predictive uncertainty in unsupervised domain adaptation (UDA) models, specifically in source-free UDA settings.
  • methods: The proposed method, PseudoCal, relies exclusively on unlabeled target data to calibrate UDA models. It transforms the unsupervised calibration problem into a supervised one by generating a labeled pseudo-target set that captures the structure of the real target.
  • results: Extensive experiments on 10 UDA methods show that PseudoCal consistently exhibits significantly reduced calibration error compared to existing calibration methods, both in traditional UDA settings and recent source-free UDA scenarios.
    Abstract Unsupervised domain adaptation (UDA) has witnessed remarkable advancements in improving the accuracy of models for unlabeled target domains. However, the calibration of predictive uncertainty in the target domain, a crucial aspect of the safe deployment of UDA models, has received limited attention. The conventional in-domain calibration method, \textit{temperature scaling} (TempScal), encounters challenges due to domain distribution shifts and the absence of labeled target domain data. Recent approaches have employed importance-weighting techniques to estimate the target-optimal temperature based on re-weighted labeled source data. Nonetheless, these methods require source data and suffer from unreliable density estimates under severe domain shifts, rendering them unsuitable for source-free UDA settings. To overcome these limitations, we propose PseudoCal, a source-free calibration method that exclusively relies on unlabeled target data. Unlike previous approaches that treat UDA calibration as a \textit{covariate shift} problem, we consider it as an unsupervised calibration problem specific to the target domain. Motivated by the factorization of the negative log-likelihood (NLL) objective in TempScal, we generate a labeled pseudo-target set that captures the structure of the real target. By doing so, we transform the unsupervised calibration problem into a supervised one, enabling us to effectively address it using widely-used in-domain methods like TempScal. Finally, we thoroughly evaluate the calibration performance of PseudoCal by conducting extensive experiments on 10 UDA methods, considering both traditional UDA settings and recent source-free UDA scenarios. The experimental results consistently demonstrate the superior performance of PseudoCal, exhibiting significantly reduced calibration error compared to existing calibration methods.
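PseudoCal's central move is to turn the unsupervised target-domain problem into a supervised one so that standard in-domain calibration applies. The sketch below shows only that last step: ordinary temperature scaling fit by minimizing NLL on a (pseudo-)labeled set. How the pseudo-target set itself is synthesized is not shown, and the optimizer and step count are assumptions.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=200, lr=0.01):
    """Standard temperature scaling: find T > 0 minimizing the NLL of
    softmax(logits / T) on a labeled (here: pseudo-labeled) set."""
    log_t = torch.zeros(1, requires_grad=True)   # optimize log T so T stays positive
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        opt.step()
    return log_t.exp().item()

# Usage sketch: `pseudo_logits` / `pseudo_labels` would come from the
# synthesized pseudo-target set; T then calibrates real target predictions.
pseudo_logits = torch.randn(512, 10)
pseudo_labels = torch.randint(0, 10, (512,))
T = fit_temperature(pseudo_logits, pseudo_labels)
calibrated_probs = F.softmax(pseudo_logits / T, dim=1)
```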

DreamTeacher: Pretraining Image Backbones with Deep Generative Models

  • paper_url: http://arxiv.org/abs/2307.07487
  • repo_url: None
  • paper_authors: Daiqing Li, Huan Ling, Amlan Kar, David Acuna, Seung Wook Kim, Karsten Kreis, Antonio Torralba, Sanja Fidler
  • for: This work proposes DreamTeacher, a self-supervised feature representation learning framework that uses generative networks to pre-train downstream image backbones.
  • methods: Two types of knowledge distillation are proposed: 1) distilling learned generative features onto target image backbones as an alternative to pre-training those backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto the logits of target backbones.
  • results: Across multiple generative models, dense prediction benchmarks, and pre-training regimes, DreamTeacher significantly outperforms existing self-supervised representation learning approaches; unsupervised ImageNet pre-training with DreamTeacher yields significant improvements on downstream datasets without requiring manual annotation.
    Abstract In this work, we introduce a self-supervised feature representation learning framework DreamTeacher that utilizes generative networks for pre-training downstream image backbones. We propose to distill knowledge from a trained generative model into standard image backbones that have been well engineered for specific perception tasks. We investigate two types of knowledge distillation: 1) distilling learned generative features onto target image backbones as an alternative to pretraining these backbones on large labeled datasets such as ImageNet, and 2) distilling labels obtained from generative networks with task heads onto logits of target backbones. We perform extensive analyses on multiple generative models, dense prediction benchmarks, and several pre-training regimes. We empirically find that our DreamTeacher significantly outperforms existing self-supervised representation learning approaches across the board. Unsupervised ImageNet pre-training with DreamTeacher leads to significant improvements over ImageNet classification pre-training on downstream datasets, showcasing generative models, and diffusion generative models specifically, as a promising approach to representation learning on large, diverse datasets without requiring manual annotation.
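The first distillation variant, regressing backbone features onto features produced by a trained generative model, can be sketched generically as follows; the backbone, projection head, and MSE objective here are placeholder assumptions for illustration rather than DreamTeacher's actual feature-distillation recipe.

```python
import torch
import torch.nn as nn

def feature_distill_step(backbone, projector, gen_feats, images, optimizer):
    """Regress backbone features onto (precomputed) generative-model features
    for the same images: a generic feature-distillation step."""
    optimizer.zero_grad()
    student = projector(backbone(images))        # (batch, d)
    loss = nn.functional.mse_loss(student, gen_feats)
    loss.backward()
    optimizer.step()
    return loss.item()

# Placeholder modules (illustrative shapes only).
backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
projector = nn.Linear(32, 128)
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(projector.parameters()), lr=0.1)

images = torch.randn(4, 3, 64, 64)
gen_feats = torch.randn(4, 128)  # stands in for features from a trained generative model
feature_distill_step(backbone, projector, gen_feats, images, optimizer)
```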

Population Expansion for Training Language Models with Private Federated Learning

  • paper_url: http://arxiv.org/abs/2307.07477
  • repo_url: None
  • paper_authors: Tatsuki Koga, Congzheng Song, Martin Pelikan, Mona Chitnis
  • for: This work aims to improve the training efficiency and final model quality of machine learning (ML) with federated learning (FL) combined with differential privacy (DP), which offers a formal privacy guarantee, particularly for applications with small populations.
  • methods: The study expands the population using domain adaptation techniques to speed up training and improve the final model quality when training with small populations.
  • results: The proposed techniques improve model utility by 13% to 30% on real-world language modeling datasets.
    Abstract Federated learning (FL) combined with differential privacy (DP) offers machine learning (ML) training with distributed devices and with a formal privacy guarantee. With a large population of devices, FL with DP produces a performant model in a timely manner. However, for applications with a smaller population, not only does the model utility degrade as the DP noise is inversely proportional to population, but also the training latency increases since waiting for enough clients to become available from a smaller pool is slower. In this work, we thus propose expanding the population based on domain adaptation techniques to speed up the training and improves the final model quality when training with small populations. We empirically demonstrate that our techniques can improve the utility by 13% to 30% on real-world language modeling datasets.

Structured Pruning of Neural Networks for Constraints Learning

  • paper_url: http://arxiv.org/abs/2307.07457
  • repo_url: None
  • paper_authors: Matteo Cacciola, Antonio Frangioni, Andrea Lodi
  • for: This paper addresses the integration of machine learning (ML) models with operations research (OR) tools, with applications such as cancer treatment, algorithmic configuration, and chemical process optimization.
  • methods: The study applies pruning to artificial neural networks (ANNs) before embedding them in Mixed Integer Programming (MIP) formulations, in order to speed up solving those formulations.
  • results: Experiments on constructing adversarial examples with multi-layer feed-forward neural networks show that pruning yields remarkable reductions in solution times without hindering the quality of the final decision.
    Abstract In recent years, the integration of Machine Learning (ML) models with Operation Research (OR) tools has gained popularity across diverse applications, including cancer treatment, algorithmic configuration, and chemical process optimization. In this domain, the combination of ML and OR often relies on representing the ML model output using Mixed Integer Programming (MIP) formulations. Numerous studies in the literature have developed such formulations for many ML predictors, with a particular emphasis on Artificial Neural Networks (ANNs) due to their significant interest in many applications. However, ANNs frequently contain a large number of parameters, resulting in MIP formulations that are impractical to solve, thereby impeding scalability. In fact, the ML community has already introduced several techniques to reduce the parameter count of ANNs without compromising their performance, since the substantial size of modern ANNs presents challenges for ML applications as it significantly impacts computational efforts during training and necessitates significant memory resources for storage. In this paper, we showcase the effectiveness of pruning, one of these techniques, when applied to ANNs prior to their integration into MIPs. By pruning the ANN, we achieve significant improvements in the speed of the solution process. We discuss why pruning is more suitable in this context compared to other ML compression techniques, and we identify the most appropriate pruning strategies. To highlight the potential of this approach, we conduct experiments using feed-forward neural networks with multiple layers to construct adversarial examples. Our results demonstrate that pruning offers remarkable reductions in solution times without hindering the quality of the final decision, enabling the resolution of previously unsolvable instances.
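As a deliberately simple illustration of structured pruning applied to a feed-forward network before it is encoded as a MIP, the sketch below drops the hidden units with the smallest L1 weight norms and shrinks the adjacent layers to match; the magnitude criterion and keep ratio are assumptions, not necessarily the pruning strategy the paper identifies as most appropriate.

```python
import torch
import torch.nn as nn

def prune_hidden_units(layer_in: nn.Linear, layer_out: nn.Linear, keep_ratio=0.5):
    """Structured pruning: drop hidden units of `layer_in` with the smallest
    L1 row norms, and remove the matching columns of `layer_out`.
    Returns two new, smaller Linear layers."""
    scores = layer_in.weight.abs().sum(dim=1)            # one score per hidden unit
    k = max(1, int(keep_ratio * scores.numel()))
    keep = torch.topk(scores, k).indices.sort().values

    new_in = nn.Linear(layer_in.in_features, k)
    new_out = nn.Linear(k, layer_out.out_features)
    with torch.no_grad():
        new_in.weight.copy_(layer_in.weight[keep])
        new_in.bias.copy_(layer_in.bias[keep])
        new_out.weight.copy_(layer_out.weight[:, keep])
        new_out.bias.copy_(layer_out.bias)
    return new_in, new_out

# A smaller network means fewer binary variables and constraints in the MIP encoding.
fc1, fc2 = nn.Linear(20, 64), nn.Linear(64, 1)
fc1, fc2 = prune_hidden_units(fc1, fc2, keep_ratio=0.25)   # 64 -> 16 hidden units
```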

Generative adversarial networks for data-scarce spectral applications

  • paper_url: http://arxiv.org/abs/2307.07454
  • repo_url: None
  • paper_authors: Juan José García-Esteban, Juan Carlos Cuevas, Jorge Bravo-Abad
  • for: This work applies generative adversarial networks (GANs) to synthetic spectral data generation, addressing the data scarcity found in various scientific contexts.
  • methods: The study uses Wasserstein GANs (WGANs) and conditional WGANs (CWGANs), combined with a simple feed-forward neural network (FFNN), to enhance model performance.
  • results: CWGAN-based data augmentation significantly improves FFNN performance under limited data availability, and the CWGAN itself acts as a surrogate model with better performance than a simple FFNN in the low-data regime.
    Abstract Generative adversarial networks (GANs) are one of the most robust and versatile techniques in the field of generative artificial intelligence. In this work, we report on an application of GANs in the domain of synthetic spectral data generation, offering a solution to the scarcity of data found in various scientific contexts. We demonstrate the proposed approach by applying it to an illustrative problem within the realm of near-field radiative heat transfer involving a multilayered hyperbolic metamaterial. We find that a successful generation of spectral data requires two modifications to conventional GANs: (i) the introduction of Wasserstein GANs (WGANs) to avoid mode collapse, and, (ii) the conditioning of WGANs to obtain accurate labels for the generated data. We show that a simple feed-forward neural network (FFNN), when augmented with data generated by a CWGAN, enhances significantly its performance under conditions of limited data availability, demonstrating the intrinsic value of CWGAN data augmentation beyond simply providing larger datasets. In addition, we show that CWGANs can act as a surrogate model with improved performance in the low-data regime with respect to simple FFNNs. Overall, this work highlights the potential of generative machine learning algorithms in scientific applications beyond image generation and optimization.
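A minimal conditional WGAN training step of the general kind described might look as follows; the layer sizes, the weight-clipping variant of the Wasserstein objective, and conditioning by concatenation are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

Z_DIM, C_DIM, X_DIM = 16, 4, 100   # latent, condition, spectrum length (illustrative)

G = nn.Sequential(nn.Linear(Z_DIM + C_DIM, 128), nn.ReLU(), nn.Linear(128, X_DIM))
D = nn.Sequential(nn.Linear(X_DIM + C_DIM, 128), nn.ReLU(), nn.Linear(128, 1))
opt_g = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_d = torch.optim.RMSprop(D.parameters(), lr=5e-5)

def critic_step(real_x, cond, clip=0.01):
    """WGAN critic update: maximize D(real) - D(fake), then clip weights."""
    opt_d.zero_grad()
    z = torch.randn(real_x.size(0), Z_DIM)
    fake_x = G(torch.cat([z, cond], dim=1)).detach()
    loss = -(D(torch.cat([real_x, cond], dim=1)).mean()
             - D(torch.cat([fake_x, cond], dim=1)).mean())
    loss.backward()
    opt_d.step()
    with torch.no_grad():
        for p in D.parameters():
            p.clamp_(-clip, clip)

def generator_step(cond):
    """Generator update: maximize the critic's score on generated spectra."""
    opt_g.zero_grad()
    z = torch.randn(cond.size(0), Z_DIM)
    fake_x = G(torch.cat([z, cond], dim=1))
    loss = -D(torch.cat([fake_x, cond], dim=1)).mean()
    loss.backward()
    opt_g.step()

real_spectra, conditions = torch.randn(32, X_DIM), torch.randn(32, C_DIM)
for _ in range(5):
    critic_step(real_spectra, conditions)
generator_step(conditions)
```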

Differentially Private Clustering in Data Streams

  • paper_url: http://arxiv.org/abs/2307.07449
  • repo_url: None
  • paper_authors: Alessandro Epasto, Tamalika Mukherjee, Peilin Zhong
  • for: This paper addresses differentially private clustering in the streaming model, motivated by data privacy requirements in real-world applications.
  • methods: The paper proposes differentially private streaming clustering algorithms for $k$-means and $k$-median that make a single pass over large-scale data streams, built on a framework that only requires an offline DP coreset algorithm as a black box.
  • results: The resulting algorithms achieve a $(1+\gamma)$-multiplicative approximation for any $\gamma>0$ with $poly(k,d,\log(T))$ additive error, using $poly(k,d,\log(T))$ space.
    Abstract The streaming model is an abstraction of computing over massive data streams, which is a popular way of dealing with large-scale modern data analysis. In this model, there is a stream of data points, one after the other. A streaming algorithm is only allowed one pass over the data stream, and the goal is to perform some analysis during the stream while using as small space as possible. Clustering problems (such as $k$-means and $k$-median) are fundamental unsupervised machine learning primitives, and streaming clustering algorithms have been extensively studied in the past. However, since data privacy becomes a central concern in many real-world applications, non-private clustering algorithms are not applicable in many scenarios. In this work, we provide the first differentially private streaming algorithms for $k$-means and $k$-median clustering of $d$-dimensional Euclidean data points over a stream with length at most $T$ using $poly(k,d,\log(T))$ space to achieve a {\it constant} multiplicative error and a $poly(k,d,\log(T))$ additive error. In particular, we present a differentially private streaming clustering framework which only requires an offline DP coreset algorithm as a blackbox. By plugging in existing DP coreset results via Ghazi, Kumar, Manurangsi 2020 and Kaplan, Stemmer 2018, we achieve (1) a $(1+\gamma)$-multiplicative approximation with $\tilde{O}_\gamma(poly(k,d,\log(T)))$ space for any $\gamma>0$, and the additive error is $poly(k,d,\log(T))$ or (2) an $O(1)$-multiplicative approximation with $\tilde{O}(k \cdot poly(d,\log(T)))$ space and $poly(k,d,\log(T))$ additive error. In addition, our algorithmic framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.

Can Large Language Models Empower Molecular Property Prediction?

  • paper_url: http://arxiv.org/abs/2307.07443
  • repo_url: https://github.com/chnq/llm4mol
  • paper_authors: Chen Qian, Huayi Tang, Zhirui Yang, Hong Liang, Yong Liu
  • for: This work investigates how large language models (LLMs) can empower molecular property prediction.
  • methods: Two perspectives are explored: zero/few-shot molecular classification with LLMs, and using LLM-generated explanations as molecular representations to fine-tune a small-scale LM for downstream tasks.
  • results: Experiments show that using text explanations as molecular representations achieves superior performance across multiple benchmark datasets, confirming the strong potential of LLMs for molecular property prediction tasks.
    Abstract Molecular property prediction has gained significant attention due to its transformative potential in multiple scientific disciplines. Conventionally, a molecule graph can be represented either as a graph-structured data or a SMILES text. Recently, the rapid development of Large Language Models (LLMs) has revolutionized the field of NLP. Although it is natural to utilize LLMs to assist in understanding molecules represented by SMILES, the exploration of how LLMs will impact molecular property prediction is still in its early stage. In this work, we advance towards this objective through two perspectives: zero/few-shot molecular classification, and using the new explanations generated by LLMs as representations of molecules. To be specific, we first prompt LLMs to do in-context molecular classification and evaluate their performance. After that, we employ LLMs to generate semantically enriched explanations for the original SMILES and then leverage that to fine-tune a small-scale LM model for multiple downstream tasks. The experimental results highlight the superiority of text explanations as molecular representations across multiple benchmark datasets, and confirm the immense potential of LLMs in molecular property prediction tasks. Codes are available at \url{https://github.com/ChnQ/LLM4Mol}.

Atlas-Based Interpretable Age Prediction

  • paper_url: http://arxiv.org/abs/2307.07439
  • repo_url: None
  • paper_authors: Sophie Starck, Yadunandan Vivekanand Kini, Jessica Johanna Maria Ritter, Rickmer Braren, Daniel Rueckert, Tamara Mueller
  • for: This work targets age prediction for medical assessments and research, where the discrepancy between chronological and biological age can aid in detecting diseases and abnormal ageing.
  • methods: The study uses whole-body images, applies the Grad-CAM interpretability method to determine which body areas are most predictive of age, and employs registration techniques to generate population-wide interpretability maps.
  • results: Three primary areas of interest are identified: the spine, the autochthonous back muscles, and the cardiac region, with the cardiac region being the most important. The model achieves state-of-the-art whole-body age prediction with a mean absolute error of 2.76 years.
    Abstract Age prediction is an important part of medical assessments and research. It can aid in detecting diseases as well as abnormal ageing by highlighting the discrepancy between chronological and biological age. To gain a comprehensive understanding of age-related changes observed in various body parts, we investigate them on a larger scale by using whole-body images. We utilise the Grad-CAM interpretability method to determine the body areas most predictive of a person's age. We expand our analysis beyond individual subjects by employing registration techniques to generate population-wide interpretability maps. Furthermore, we set state-of-the-art whole-body age prediction with a model that achieves a mean absolute error of 2.76 years. Our findings reveal three primary areas of interest: the spine, the autochthonous back muscles, and the cardiac region, which exhibits the highest importance.
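For reference, a bare-bones Grad-CAM computation for a scalar regression output (such as predicted age) is sketched below; the tiny placeholder backbone and head are assumptions, and real use would take the feature maps of the trained whole-body model's last convolutional layer.

```python
import torch
import torch.nn as nn

def grad_cam(feature_maps, score):
    """Grad-CAM for a scalar prediction (e.g., predicted age):
    weight each feature map by the mean gradient of the score w.r.t. it,
    sum over channels, and apply ReLU to get a coarse importance map."""
    grads, = torch.autograd.grad(score, feature_maps)
    weights = grads.mean(dim=(2, 3), keepdim=True)          # (batch, C, 1, 1)
    cam = torch.relu((weights * feature_maps).sum(dim=1))    # (batch, H, W)
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

# Placeholder regression model: conv features -> pooled -> age estimate.
features = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

image = torch.randn(1, 1, 224, 96)   # stand-in for a whole-body image slice
fmap = features(image)
age = head(fmap).sum()
cam = grad_cam(fmap, age)            # (1, 224, 96) importance map
```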