cs.LG - 2023-09-20

Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening

  • paper_url: http://arxiv.org/abs/2309.11687
  • repo_url: None
  • paper_authors: Zhonglin Cao, Simone Sciabola, Ye Wang
  • for: This study aims to improve the accuracy and sample efficiency of active learning and Bayesian optimization for virtual screening by using pretrained transformer-based language models and graph neural networks.
  • methods: Virtual screening with active learning and Bayesian optimization, using pretrained transformer-based language models and graph neural networks as the surrogate models.
  • results: The pretrained transformer-based language models and graph neural networks perform strongly within the Bayesian optimization active learning framework, improving accuracy and sample efficiency in virtual screening by 8% over the previous state-of-the-art baseline.
    Abstract Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, brute-force virtual screening using traditional tools such as docking becomes infeasible in terms of time and computational resources. Active learning and Bayesian optimization have recently been proven to be effective methods for narrowing down the search space. An essential component in those methods is a surrogate machine learning model that is trained with a small subset of the library to predict the desired properties of compounds. An accurate model can achieve high sample efficiency by finding the most promising compounds with only a fraction of the whole library being virtually screened. In this study, we examined the performance of pretrained transformer-based language models and graph neural networks in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50000 compounds by docking score after screening only 0.6% of an ultra-large library containing 99.5 million compounds, improving 8% over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Such a model can serve as a boost to the accuracy and sample efficiency of active learning based molecule virtual screening.
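To make the surrogate-driven screening loop above concrete, here is a minimal sketch of a greedy active-learning screen on a synthetic fingerprint library. A scikit-learn RandomForestRegressor stands in for the paper's pretrained transformer/GNN surrogates, and the library, features, "docking scores", and batch sizes are all invented assumptions; the point is only the select-dock-retrain loop.

```python
# Hedged sketch of a greedy surrogate-based active-learning screen.  The
# "library", fingerprint-like features, and "docking scores" are synthetic,
# and a RandomForestRegressor stands in for the paper's pretrained
# transformer/GNN surrogates; lower score = better compound.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_library, n_features = 20_000, 64
X = rng.integers(0, 2, size=(n_library, n_features)).astype(float)   # toy fingerprints
w = rng.normal(size=n_features)
true_score = X @ w + 0.5 * rng.normal(size=n_library)                # stand-in docking scores

budget, n_rounds = 200, 5
docked = {int(i): true_score[i] for i in rng.choice(n_library, budget, replace=False)}

for _ in range(n_rounds):
    idx = np.array(sorted(docked))
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
    surrogate.fit(X[idx], np.array([docked[i] for i in idx]))
    preds = surrogate.predict(X)
    preds[idx] = np.inf                             # never re-select docked compounds
    for i in np.argsort(preds)[:budget]:            # greedy: "dock" the most promising batch
        docked[int(i)] = true_score[i]

top_true = set(np.argsort(true_score)[:500].tolist())
recall = len(top_true & set(docked)) / 500
print(f"screened {len(docked) / n_library:.1%} of the library, "
      f"recovered {recall:.1%} of the true top-500")
```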

Popularity Degradation Bias in Local Music Recommendation

  • paper_url: http://arxiv.org/abs/2309.11671
  • repo_url: https://github.com/asem010/legend-pice
  • paper_authors: April Trainor, Douglas Turnbull
  • for: This study examines popularity degradation bias in local music recommendation.
  • methods: Two top-performing recommendation algorithms are studied: Weight Relevance Matrix Factorization (WRMF) and the Multinomial Variational Autoencoder (Mult-VAE).
  • results: Both algorithms recommend more popular artists more accurately and therefore exhibit popularity degradation bias; Mult-VAE shows better relative performance for less popular artists, making it preferable for local (long-tail) music artist recommendation.
    Abstract In this paper, we study the effect of popularity degradation bias in the context of local music recommendations. Specifically, we examine how accurate two top-performing recommendation algorithms, Weight Relevance Matrix Factorization (WRMF) and Multinomial Variational Autoencoder (Mult-VAE), are at recommending artists as a function of artist popularity. We find that both algorithms improve recommendation performance for more popular artists and, as such, exhibit popularity degradation bias. While both algorithms produce a similar level of performance for more popular artists, Mult-VAE shows better relative performance for less popular artists. This suggests that this algorithm should be preferred for local (long-tail) music artist recommendation.

GLM Regression with Oblivious Corruptions

  • paper_url: http://arxiv.org/abs/2309.11657
  • repo_url: None
  • paper_authors: Ilias Diakonikolas, Sushrut Karmalkar, Jongho Park, Christos Tzamos
  • for: This paper addresses regression for generalized linear models (GLMs) when the labels are corrupted by additive oblivious noise.
  • methods: A new algorithm that solves the problem in the most general distribution-independent setting.
  • results: The algorithm returns an accurate estimate of the solution whenever it is identifiable (and otherwise a small list of candidates, one of which is close to the true solution), and it can handle more than half of the samples being arbitrarily corrupted.
    Abstract We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, the noisy labels are of the form $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ and satisfies $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover a parameter vector $w$ such that the function $g(w \cdot x)$ has arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles this problem in its most general distribution-independent setting, where the solution may not even be identifiable. Our algorithm returns an accurate estimate of the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we provide a necessary and sufficient condition for identifiability, which holds in broad settings. Specifically, the problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.
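The noise model in the abstract is concrete enough to simulate. The sketch below only generates data of the form $y = g(w^* \cdot x) + \xi + \epsilon$, with oblivious noise $\xi$ that is zero with some probability and heavy-tailed otherwise; the link $g = \tanh$, the Cauchy corruption, and all constants are illustrative assumptions, and none of the paper's recovery algorithm is implemented here.

```python
# Minimal data-generation sketch for the noise model in the abstract:
#   y = g(w* . x) + xi + eps, with xi drawn independently of x (oblivious),
#   Pr[xi = 0] >= alpha, and eps ~ N(0, sigma^2).
# The link g = tanh, the Cauchy tail for xi, and all constants are
# illustrative assumptions; the paper's recovery algorithm is not shown.
import numpy as np

rng = np.random.default_rng(1)
n, d, alpha, sigma = 5_000, 10, 0.3, 0.1
g = np.tanh
w_star = rng.normal(size=d)

X = rng.normal(size=(n, d))
clean = g(X @ w_star)

mask = rng.random(n) < alpha                        # labels left uncorrupted by xi
xi = np.where(mask, 0.0, 10.0 * rng.standard_cauchy(n))
eps = sigma * rng.normal(size=n)
y = clean + xi + eps

print(f"fraction of labels with xi = 0: {mask.mean():.2f} (can be well below 1/2)")
```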

Drift Control of High-Dimensional RBM: A Computational Method Based on Neural Networks

  • paper_url: http://arxiv.org/abs/2309.11651
  • repo_url: https://github.com/nian-si/rbmsolver
  • paper_authors: Baris Ata, J. Michael Harrison, Nian Si
  • for: The goal is to choose a drift control policy that minimizes expected discounted cost over an infinite planning horizon, after which the corresponding ergodic control problem is treated.
  • methods: A simulation-based computational method relying heavily on deep neural networks, illustrated on a set of test problems.
  • results: The method is accurate to within a fraction of one percent and remains computationally feasible in dimensions up to at least $d=30$.
    Abstract Motivated by applications in queueing theory, we consider a stochastic control problem whose state space is the $d$-dimensional positive orthant. The controlled process $Z$ evolves as a reflected Brownian motion whose covariance matrix is exogenously specified, as are its directions of reflection from the orthant's boundary surfaces. A system manager chooses a drift vector $\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem formulation, the objective is to minimize expected discounted cost over an infinite planning horizon, after which we treat the corresponding ergodic control problem. Extending earlier work by Han et al. (Proceedings of the National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a simulation-based computational method that relies heavily on deep neural network technology. For test problems studied thus far, our method is accurate to within a fraction of one percent, and is computationally feasible in dimensions up to at least $d=30$.
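As a loose, low-dimensional illustration of the controlled dynamics, the sketch below simulates a one-dimensional reflected Brownian motion under a constant drift via an Euler scheme with reflection at zero, and estimates the expected discounted cost for a few drift choices. The quadratic holding cost, the drift penalty, and the constant-drift "policy" are assumptions made for the demo; the paper computes state-dependent drift policies with deep neural networks in dimensions up to at least $d=30$.

```python
# Toy 1-D illustration of the controlled dynamics: a reflected Brownian motion
# on [0, inf) under a constant drift theta, simulated by an Euler scheme with
# reflection at 0.  The quadratic holding cost, the drift penalty, and the
# constant-drift "policy" are assumptions for this demo.
import numpy as np

rng = np.random.default_rng(2)
T, dt, sigma = 50.0, 0.01, 1.0
n_steps = int(T / dt)

def discounted_cost(theta, z0=1.0, discount=0.1, drift_penalty=0.5):
    z, cost = z0, 0.0
    for k in range(n_steps):
        cost += np.exp(-discount * k * dt) * (z**2 + drift_penalty * theta**2) * dt
        z = max(0.0, z + theta * dt + sigma * np.sqrt(dt) * rng.normal())  # reflect at 0
    return cost

for theta in (-0.5, -1.0, -2.0):
    avg = np.mean([discounted_cost(theta) for _ in range(20)])
    print(f"theta = {theta:+.1f}: estimated discounted cost ~ {avg:.2f}")
```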

Potential and limitations of random Fourier features for dequantizing quantum machine learning

  • paper_url: http://arxiv.org/abs/2309.11647
  • repo_url: None
  • paper_authors: Ryan Sweke, Erik Recio, Sofiene Jerbi, Elies Gil-Fuster, Bryce Fuller, Jens Eisert, Johannes Jakob Meyer
  • for: This paper concerns variational quantum machine learning on near-term quantum devices.
  • methods: Parameterized quantum circuits (PQCs) are used as learning models, and the paper studies when such PQC models can be efficiently dequantized via random Fourier features (RFF).
  • results: Necessary and sufficient conditions under which RFF provides an efficient dequantization of variational quantum machine learning for regression, leading to concrete suggestions for PQC architecture design and to structures that a regression problem must have to admit a potential quantum advantage via PQC-based optimization.
    Abstract Quantum machine learning is arguably one of the most explored applications of near-term quantum devices. Much focus has been put on notions of variational quantum machine learning where parameterized quantum circuits (PQCs) are used as learning models. These PQC models have a rich structure which suggests that they might be amenable to efficient dequantization via random Fourier features (RFF). In this work, we establish necessary and sufficient conditions under which RFF does indeed provide an efficient dequantization of variational quantum machine learning for regression. We build on these insights to make concrete suggestions for PQC architecture design, and to identify structures which are necessary for a regression problem to admit a potential quantum advantage via PQC based optimization.
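For readers unfamiliar with the dequantization tool involved, here is a standard random Fourier features sketch (in the Rahimi-Recht sense) that approximates an RBF kernel by an explicit finite-dimensional feature map. It illustrates only the generic RFF machinery the paper analyzes; the frequency spectra induced by parameterized quantum circuits are not modeled.

```python
# Standard random Fourier features (RFF): approximate the RBF kernel
# k(x, y) = exp(-||x - y||^2 / (2 l^2)) by z(x)^T z(y) with
# z(x) = sqrt(2/D) * cos(W x + b), where W is drawn from the kernel's
# spectral density.  Generic RFF only; no PQC-specific spectrum is modeled.
import numpy as np

rng = np.random.default_rng(3)
d, D, length_scale = 5, 2_000, 1.0

W = rng.normal(scale=1.0 / length_scale, size=(D, d))   # frequencies ~ spectral density
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

def rff(X):
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

X = rng.normal(size=(200, d))
Y = rng.normal(size=(200, d))

exact = np.exp(-((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1) / (2 * length_scale**2))
approx = rff(X) @ rff(Y).T
print("max abs error of kernel approximation:", np.abs(exact - approx).max())
```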

Early diagnosis of autism spectrum disorder using machine learning approaches

  • paper_url: http://arxiv.org/abs/2309.11646
  • repo_url: https://github.com/diponkor-bala/autism-spectrum-disorder
  • paper_authors: Rownak Ara Rasul, Promy Saha, Diponkor Bala, S M Rakib Ul Karim, Ibrahim Abdullah, Bishwajit Saha
  • for: This paper aims to utilize machine learning algorithms to identify and automate the diagnostic process for Autistic Spectrum Disorder (ASD).
  • methods: The paper employs six classification models and five popular clustering methods to analyze ASD datasets, and evaluates their performance using various metrics such as accuracy, precision, recall, specificity, F1-score, AUC, kappa, and log loss.
  • results: The paper achieves a 100% accuracy rate when hyperparameters are carefully tuned for each model, and finds that spectral clustering outperforms the other benchmarked clustering models in terms of NMI and ARI metrics while demonstrating comparability to the optimal SC achieved by k-means.
    Abstract Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities. The severity of these difficulties varies, and those with this diagnosis face unique challenges. While its primary origin lies in genetics, identifying and addressing it early can contribute to the enhancement of the condition. In recent years, machine learning-driven intelligent diagnosis has emerged as a supplement to conventional clinical approaches, aiming to address the potential drawbacks of time-consuming and costly traditional methods. In this work, we utilize different machine learning algorithms to find the most significant traits responsible for ASD and to automate the diagnostic process. We study six classification models to see which model works best to identify ASD and also study five popular clustering methods to get a meaningful insight of these ASD datasets. To find the best classifier for these binary datasets, we evaluate the models using accuracy, precision, recall, specificity, F1-score, AUC, kappa and log loss metrics. Our evaluation demonstrates that five out of the six selected models perform exceptionally, achieving a 100% accuracy rate on the ASD datasets when hyperparameters are meticulously tuned for each model. As almost all classification models are able to get 100% accuracy, we become interested in observing the underlying insights of the datasets by implementing some popular clustering algorithms on these datasets. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) & Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmarking clustering models in terms of NMI & ARI metrics and it also demonstrates comparability to the optimal SC achieved by k-means.
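The evaluation pipeline described above (clustering plus NMI/ARI/SC scoring) is easy to reproduce in outline. The sketch below runs spectral clustering and k-means on a stand-in scikit-learn dataset, not the ASD screening datasets used in the paper, and reports the same three clustering metrics.

```python
# Hedged sketch of the clustering comparison on a stand-in dataset (sklearn's
# breast-cancer data, not the ASD screening datasets used in the paper):
# spectral clustering vs. k-means scored with NMI, ARI and the silhouette
# coefficient, as in the paper's evaluation.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import (adjusted_rand_score, normalized_mutual_info_score,
                             silhouette_score)

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

models = {
    "k-means": KMeans(n_clusters=2, n_init=10, random_state=0),
    "spectral": SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                                   random_state=0),
}
for name, model in models.items():
    labels = model.fit_predict(X)
    print(f"{name:>8s}  NMI={normalized_mutual_info_score(y, labels):.3f}  "
          f"ARI={adjusted_rand_score(y, labels):.3f}  "
          f"SC={silhouette_score(X, labels):.3f}")
```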

Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation

  • paper_url: http://arxiv.org/abs/2309.11623
  • repo_url: None
  • paper_authors: Pavan Seshadri, Peter Knees
  • for: This paper focuses on improving sequential music recommendation by incorporating negative session-level feedback using transformer-based self-attentive architectures and contrastive learning.
  • methods: The paper proposes using transformer-based self-attentive models to learn implicit session-level information and incorporating negative feedback through a contrastive learning task.
  • results: The paper shows that incorporating negative feedback through contrastive learning results in consistent performance gains over baseline architectures ignoring negative user feedback.
    Abstract Music streaming services heavily rely on their recommendation engines to continuously provide content to their consumers. Sequential recommendation consequently has seen considerable attention in current literature, where state of the art approaches focus on self-attentive models leveraging contextual information such as long and short-term user history and item features; however, most of these studies focus on long-form content domains (retail, movie, etc.) rather than short-form, such as music. Additionally, many do not explore incorporating negative session-level feedback during training. In this study, we investigate the use of transformer-based self-attentive architectures to learn implicit session-level information for sequential music recommendation. We additionally propose a contrastive learning task to incorporate negative feedback (e.g skipped tracks) to promote positive hits and penalize negative hits. This task is formulated as a simple loss term that can be incorporated into a variety of deep learning architectures for sequential recommendation. Our experiments show that this results in consistent performance gains over the baseline architectures ignoring negative user feedback.
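The abstract notes that the negative-feedback signal enters as "a simple loss term". The snippet below shows one plausible softplus/BPR-style form of such a term on top of session and item embeddings; the exact loss, the embedding shapes, and the random inputs are assumptions for illustration and may differ from the paper's formulation.

```python
# Illustrative (assumed) form of a contrastive loss term that rewards played
# tracks and penalizes skipped ones on top of session embeddings produced by
# any self-attentive encoder.  The softplus/BPR-style form, the embedding
# shapes, and the random inputs are assumptions; the paper's exact loss may
# differ.
import numpy as np

def negative_feedback_loss(session_emb, played_emb, skipped_emb):
    """session_emb, played_emb, skipped_emb: arrays of shape (batch, dim)."""
    s_pos = np.sum(session_emb * played_emb, axis=1)    # score of a played (positive) item
    s_neg = np.sum(session_emb * skipped_emb, axis=1)   # score of a skipped (negative) item
    return np.mean(np.log1p(np.exp(s_neg - s_pos)))     # softplus(s_neg - s_pos)

rng = np.random.default_rng(4)
B, d = 32, 16
loss = negative_feedback_loss(rng.normal(size=(B, d)),
                              rng.normal(size=(B, d)),
                              rng.normal(size=(B, d)))
print(f"example loss value: {loss:.3f}")
```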

Latent Diffusion Models for Structural Component Design

  • paper_url: http://arxiv.org/abs/2309.11601
  • repo_url: None
  • paper_authors: Ethan Herron, Jaydeep Rade, Anushrut Jignasu, Baskar Ganapathysubramanian, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy
  • for: This paper proposes a generative design framework for structural components that satisfy problem-specific loading conditions.
  • methods: A Latent Diffusion model is used to generate candidate component designs that meet the specified loading conditions.
  • results: The approach permits editing of existing designs and yields strong structural performance; quantitative results show that the generated designs are inherently near-optimal.
    Abstract Recent advances in generative modeling, namely Diffusion models, have revolutionized generative modeling, enabling high-quality image generation tailored to user needs. This paper proposes a framework for the generative design of structural components. Specifically, we employ a Latent Diffusion model to generate potential designs of a component that can satisfy a set of problem-specific loading conditions. One of the distinct advantages our approach offers over other generative approaches, such as generative adversarial networks (GANs), is that it permits the editing of existing designs. We train our model using a dataset of geometries obtained from structural topology optimization utilizing the SIMP algorithm. Consequently, our framework generates inherently near-optimal designs. Our work presents quantitative results that support the structural performance of the generated designs and the variability in potential candidate designs. Furthermore, we provide evidence of the scalability of our framework by operating over voxel domains with resolutions varying from $32^3$ to $128^3$. Our framework can be used as a starting point for generating novel near-optimal designs similar to topology-optimized designs.

Multiplying poles to avoid unwanted points in root finding and optimization

  • paper_url: http://arxiv.org/abs/2309.11475
  • repo_url: None
  • paper_authors: Tuyen Trung Truong
  • for: This paper targets the problem of avoiding the basin of attraction of an unwanted point (or closed set) in root finding and optimization.
  • methods: A new method that divides the cost function by an appropriate power of the distance to the set $A$, so that subsequent runs of the algorithm are no longer attracted to $A$.
  • results: A new algorithm that helps iterative descent algorithms avoid the basin of attraction of unwanted points in root finding and optimization, covering both the case where the minimum of the cost function is zero and the case where it is nonzero, together with an algorithm for escaping the basin of attraction of a positive-dimensional component to reach another component.
    Abstract In root finding and optimization, there are many cases where there is a closed set $A$ to which one does not want the sequence constructed by one's favourite method to converge (here, we do not assume extra properties of $A$ such as being convex or connected). For example, if one wants to find roots, and one chooses initial points in the basin of attraction for one root $x^*$ (a fact which one may not know beforehand), then one will always end up in that root. In this case, one would like to have a mechanism to avoid this point $x^*$ in the next runs of one's algorithm. In this paper, we propose a new method aiming to achieve this: we divide the cost function by an appropriate power of the distance function to $A$. This idea is inspired by how one would try to find all roots of a function in one variable. We first explain the heuristic for this method in the case where the minimum of the cost function is exactly 0, and then explain how to proceed if the minimum is non-zero (allowing both positive and negative values). The method is very suitable for iterative algorithms which have the descent property. We also propose, based on this, an algorithm to escape the basin of attraction of a component of positive dimension to reach another component. Along the way, we compare with the main existing relevant methods in the current literature. We provide several examples to illustrate the usefulness of the new approach.
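The core trick, dividing the cost by a power of the distance to the set one wants to avoid, can be seen in a short worked example for the case where the minimum of the cost is exactly 0. Below, $f(x) = (x-1)^2(x+2)^2$ has roots at $1$ and $-2$; after a first run converges to $x^* = 1$, minimizing $f(x)/|x-1|^2 = (x+2)^2$ from the same starting point reaches the other root. The step size, starting point, and numerical gradient are arbitrary choices for the demo, not the paper's settings.

```python
# Worked 1-D illustration of the idea for the case min f = 0:
# f(x) = (x-1)^2 (x+2)^2 has roots at x = 1 and x = -2.  After a first run
# converges to x* = 1, divide the cost by |x - 1|^2; the modified cost equals
# (x+2)^2, so gradient descent from the same start now reaches the other root.
import numpy as np

f = lambda x: (x - 1.0)**2 * (x + 2.0)**2
x_star = 1.0                                        # root found in the first run
F = lambda x: f(x) / (x - x_star)**2                # = (x + 2)^2 away from x*

def grad_descent(fun, x0, lr=0.01, steps=2000, h=1e-6):
    x = x0
    for _ in range(steps):
        g = (fun(x + h) - fun(x - h)) / (2 * h)     # numerical gradient
        x -= lr * g
    return x

x0 = 0.5                                            # lies in the basin of x* = 1
print("minimizing f         from x0 = 0.5 ->", round(grad_descent(f, x0), 4))
print("minimizing f/|x-1|^2 from x0 = 0.5 ->", round(grad_descent(F, x0), 4))
```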

Model-free tracking control of complex dynamical trajectories with machine learning

  • paper_url: http://arxiv.org/abs/2309.11470
  • repo_url: https://github.com/Zheng-Meng/TrackingControl
  • paper_authors: Zheng-Meng Zhai, Mohammadamin Moradi, Ling-Wei Kong, Bryan Glaz, Mulugeta Haile, Ying-Cheng Lai
  • for: Controlling a two-arm robotic manipulator so that it tracks desired trajectories, with applications across a range of civil and defense domains.
  • methods: A model-free, machine-learning framework that controls the two-arm manipulator using only partially observed states, with the controller realized by reservoir computing.
  • results: The effectiveness of the control framework is demonstrated on a variety of periodic and chaotic signals, and its robustness against measurement noise, disturbances, and uncertainties is established in the testing (deployment) phase.
    Abstract Nonlinear tracking control enabling a dynamical system to track a desired trajectory is fundamental to robotics, serving a wide range of civil and defense applications. In control engineering, designing tracking control requires complete knowledge of the system model and equations. We develop a model-free, machine-learning framework to control a two-arm robotic manipulator using only partially observed states, where the controller is realized by reservoir computing. Stochastic input is exploited for training, which consists of the observed partial state vector as the first and its immediate future as the second component so that the neural machine regards the latter as the future state of the former. In the testing (deployment) phase, the immediate-future component is replaced by the desired observational vector from the reference trajectory. We demonstrate the effectiveness of the control framework using a variety of periodic and chaotic signals, and establish its robustness against measurement noise, disturbances, and uncertainties.
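Since the controller is realized by reservoir computing, a minimal echo-state-network sketch may help readers unfamiliar with that machinery: a fixed random reservoir is driven by an observed signal, and only a ridge-regression readout is trained. This shows the reservoir component only; the paper's controller maps partial observations and the desired future observation to control inputs for a two-arm manipulator, which is not reproduced here, and all hyperparameters below are arbitrary.

```python
# Minimal echo-state-network (reservoir computing) sketch: a fixed random
# reservoir is driven by the observed signal and only a ridge-regression
# readout is trained, here for one-step-ahead prediction of a toy signal.
import numpy as np

rng = np.random.default_rng(5)
N, spectral_radius, leak = 300, 0.9, 0.3
w_in = rng.uniform(-0.5, 0.5, size=N)
W = rng.normal(size=(N, N))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius

t = np.arange(0, 60, 0.02)
u = np.sin(t) * np.cos(0.3 * t)                 # toy "observed" signal
states = np.zeros((len(u), N))
x = np.zeros(N)
for k in range(len(u) - 1):                     # drive the reservoir with u
    x = (1 - leak) * x + leak * np.tanh(W @ x + w_in * u[k])
    states[k + 1] = x

washout, split, lam = 200, 2500, 1e-6
S, y_train = states[washout:split], u[washout:split]          # state at k -> u[k]
W_out = np.linalg.solve(S.T @ S + lam * np.eye(N), S.T @ y_train)
pred = states[split:-1] @ W_out
rmse = np.sqrt(np.mean((pred - u[split:-1]) ** 2))
print(f"one-step-ahead prediction RMSE on held-out segment: {rmse:.4f}")
```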

Digital twins of nonlinear dynamical systems: A perspective

  • paper_url: http://arxiv.org/abs/2309.11461
  • repo_url: None
  • paper_authors: Ying-Cheng Lai
  • for: Predicting and preventing potentially catastrophic emergent behaviors of nonlinear dynamical systems.
  • methods: Two approaches are discussed: sparse optimization and machine learning.
  • results: Digital twins can forecast and help prevent potentially catastrophic emergent events in nonlinear dynamical systems, providing early warnings and enabling predictive problem solving.
    Abstract Digital twins have attracted a great deal of recent attention from a wide range of fields. A basic requirement for digital twins of nonlinear dynamical systems is the ability to generate the system evolution and predict potentially catastrophic emergent behaviors so as to provide early warnings. The digital twin can then be used for system "health" monitoring in real time and for predictive problem solving. In particular, if the digital twin forecasts a possible system collapse in the future due to parameter drifting as caused by environmental changes or perturbations, an optimal control strategy can be devised and executed as early intervention to prevent the collapse. Two approaches exist for constructing digital twins of nonlinear dynamical systems: sparse optimization and machine learning. The basics of these two approaches are described and their advantages and caveats are discussed.

Multi-Step Model Predictive Safety Filters: Reducing Chattering by Increasing the Prediction Horizon

  • paper_url: http://arxiv.org/abs/2309.11453
  • repo_url: https://github.com/federico-pizarrobejarano/safe-control-gym
  • paper_authors: Federico Pizarro Bejarano, Lukas Brunke, Angela P. Schoellig
  • for: This paper aims to improve the safety guarantees of learning-based controllers by reducing chattering in model predictive safety filters (MPSFs).
  • methods: The proposed approach considers input corrections over a longer horizon and uses techniques from robust MPC to prove recursive feasibility, reducing chattering by more than a factor of 4 compared to previous MPSF formulations.
  • results: The proposed approach is verified through extensive simulation and quadrotor experiments, demonstrating the preservation of desired safety guarantees and a significant reduction in chattering compared to previous MPSF formulations.
    Abstract Learning-based controllers have demonstrated superior performance compared to classical controllers in various tasks. However, providing safety guarantees is not trivial. Safety, the satisfaction of state and input constraints, can be guaranteed by augmenting the learned control policy with a safety filter. Model predictive safety filters (MPSFs) are a common safety filtering approach based on model predictive control (MPC). MPSFs seek to guarantee safety while minimizing the difference between the proposed and applied inputs in the immediate next time step. This limited foresight can lead to jerky motions and undesired oscillations close to constraint boundaries, known as chattering. In this paper, we reduce chattering by considering input corrections over a longer horizon. Under the assumption of bounded model uncertainties, we prove recursive feasibility using techniques from robust MPC. We verified the proposed approach in both extensive simulation and quadrotor experiments. In experiments with a Crazyflie 2.0 drone, we show that, in addition to preserving the desired safety guarantees, the proposed MPSF reduces chattering by more than a factor of 4 compared to previous MPSF formulations.

Distribution and volume based scoring for Isolation Forests

  • paper_url: http://arxiv.org/abs/2309.11450
  • repo_url: https://github.com/porscheofficial/distribution_and_volume_based_isolation_forest
  • paper_authors: Hichem Dhouib, Alissa Wilms, Paul Boes
  • for: This work makes two contributions to the Isolation Forest method to improve anomaly and outlier detection.
  • methods: The first contribution is an information-theoretically motivated generalisation of the score function used to aggregate scores across random tree estimators, which takes the whole distribution into account rather than only the ensemble average; the second replaces the depth-based scoring of individual isolation trees with a score based on the hyper-volumes associated with an isolation tree's leaf nodes.
  • results: Evaluated on generated data and on the 34 datasets of the "ADBench" benchmark, both variants significantly improve over the standard Isolation Forest on some datasets, and one of the two variants improves on average across all datasets; code to reproduce the results is provided with the submission.
    Abstract We make two contributions to the Isolation Forest method for anomaly and outlier detection. The first contribution is an information-theoretically motivated generalisation of the score function that is used to aggregate the scores across random tree estimators. This generalisation allows one to take into account not just the ensemble average across trees but instead the whole distribution. The second contribution is an alternative scoring function at the level of the individual tree estimator, in which we replace the depth-based scoring of the Isolation Forest with one based on hyper-volumes associated to an isolation tree's leaf nodes. We motivate the use of both of these methods on generated data and also evaluate them on 34 datasets from the recent and exhaustive ``ADBench'' benchmark, finding significant improvement over the standard isolation forest for both variants on some datasets and improvement on average across all datasets for one of the two variants. The code to reproduce our results is made available as part of the submission.
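To see why a volume-based leaf score can separate outliers much like the classical depth score, here is a toy hand-rolled isolation tree (not the authors' implementation) that records, for each leaf, both its depth and the hyper-volume of its bounding box. The volume-per-point score used below is a simplified assumption for illustration, not the paper's exact scoring function.

```python
# Toy hand-rolled isolation tree contrasting the classical depth-based score
# with a volume-style score attached to leaf nodes.  "Volume per point in the
# leaf" is a simplified stand-in for the paper's hyper-volume-based score.
import numpy as np

rng = np.random.default_rng(6)

def grow(X, lo, hi, depth, max_depth=8):
    if len(X) <= 1 or depth >= max_depth:
        return {"leaf": True, "depth": depth, "vol": float(np.prod(hi - lo)), "n": len(X)}
    j = rng.integers(X.shape[1])
    s = rng.uniform(X[:, j].min(), X[:, j].max())
    hi_left, lo_right = hi.copy(), lo.copy()
    hi_left[j], lo_right[j] = s, s
    return {"leaf": False, "j": j, "s": s,
            "left": grow(X[X[:, j] <= s], lo, hi_left, depth + 1, max_depth),
            "right": grow(X[X[:, j] > s], lo_right, hi, depth + 1, max_depth)}

def find_leaf(node, x):
    while not node["leaf"]:
        node = node["left"] if x[node["j"]] <= node["s"] else node["right"]
    return node

X = np.vstack([rng.normal(size=(500, 2)), [[6.0, 6.0]]])   # 500 inliers + 1 obvious outlier
lo, hi = X.min(axis=0) - 1e-9, X.max(axis=0) + 1e-9
trees = [grow(X, lo, hi, 0) for _ in range(100)]

for name, point in [("inlier", X[0]), ("outlier", X[-1])]:
    leaves = [find_leaf(t, point) for t in trees]
    mean_depth = np.mean([nd["depth"] for nd in leaves])
    mean_vol = np.mean([nd["vol"] / nd["n"] for nd in leaves])   # volume per point in leaf
    print(f"{name:>7s}: mean isolation depth {mean_depth:.2f}, mean leaf volume/point {mean_vol:.2f}")
```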

Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models

  • paper_url: http://arxiv.org/abs/2309.11420
  • repo_url: None
  • paper_authors: Song Mei, Yuchen Wu
  • for: This work studies how efficiently deep neural networks can approximate score functions in diffusion-based generative modeling; existing approximation theories rely on the smoothness of the score function and suffer from the curse of dimensionality for intrinsically high-dimensional data such as graphical models (e.g., Markov random fields), which are common for image distributions.
  • methods: The authors observe that in graphical models the score function can often be well-approximated by variational inference denoising algorithms, which in turn admit efficient neural network representations; this is demonstrated for Ising models, conditional Ising models, restricted Boltzmann machines, and sparse encoding models.
  • results: Combined with off-the-shelf discretization error bounds for diffusion-based sampling, this yields an efficient sample complexity bound for diffusion-based generative modeling when the score function is learned by deep neural networks.
    Abstract We investigate the approximation efficiency of score functions by deep neural networks in diffusion-based generative modeling. While existing approximation theories utilize the smoothness of score functions, they suffer from the curse of dimensionality for intrinsically high-dimensional data. This limitation is pronounced in graphical models such as Markov random fields, common for image distributions, where the approximation efficiency of score functions remains unestablished. To address this, we observe score functions can often be well-approximated in graphical models through variational inference denoising algorithms. Furthermore, these algorithms are amenable to efficient neural network representation. We demonstrate this in examples of graphical models, including Ising models, conditional Ising models, restricted Boltzmann machines, and sparse encoding models. Combined with off-the-shelf discretization error bounds for diffusion-based sampling, we provide an efficient sample complexity bound for diffusion-based generative modeling when the score function is learned by deep neural networks.

Transformers versus LSTMs for electronic trading

  • paper_url: http://arxiv.org/abs/2309.11400
  • repo_url: None
  • paper_authors: Paul Bilokon, Yitao Qiu
  • for: This study asks whether Transformer-based models can replace LSTMs for financial time series prediction, comparing a range of LSTM-based and Transformer-based models on multiple financial prediction tasks.
  • methods: Several LSTM-based and Transformer-based models are evaluated on high-frequency limit order book data, including a new LSTM-based model called DLSTM and a new Transformer architecture adapted for financial prediction.
  • results: The experiments show that Transformer-based models have only a limited advantage in absolute price sequence prediction, while LSTM-based models deliver better and more robust performance on difference sequence prediction, such as price differences and price movements.
    Abstract With the rapid development of artificial intelligence, long short term memory (LSTM), one kind of recurrent neural network (RNN), has been widely applied in time series prediction. Like RNN, Transformer is designed to handle the sequential data. As Transformer achieved great success in Natural Language Processing (NLP), researchers got interested in Transformer's performance on time series prediction, and plenty of Transformer-based solutions on long time series forecasting have come out recently. However, when it comes to financial time series prediction, LSTM is still a dominant architecture. Therefore, the question this study wants to answer is: whether the Transformer-based model can be applied in financial time series prediction and beat LSTM. To answer this question, various LSTM-based and Transformer-based models are compared on multiple financial prediction tasks based on high-frequency limit order book data. A new LSTM-based model called DLSTM is built and new architecture for the Transformer-based model is designed to adapt for financial prediction. The experiment result reflects that the Transformer-based model only has the limited advantage in absolute price sequence prediction. The LSTM-based models show better and more robust performance on difference sequence prediction, such as price difference and price movement.

SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On

  • paper_url: http://arxiv.org/abs/2309.12218
  • repo_url: https://github.com/rickyskywalker/sr-predictao-official
  • paper_authors: Ruida Wang, Raymond Chi-Wing Wong, Weile Tan
  • for: This work proposes a session-based recommendation framework that predicts the user's next action from a single session while mitigating the effect of random user behavior present in existing models.
  • methods: A new framework called SR-PredictAO with a high-capability predictor module that alleviates the effect of random user behavior on prediction; the module can be added on top of any existing encoder-predictor model, leaving room for further optimization.
  • results: Extensive experiments on two real benchmark datasets with three state-of-the-art base models show that SR-PredictAO outperforms the current state of the art by up to 2.9% in HR@20 and 2.3% in MRR@20, with consistent improvements across almost all existing models and datasets, which can be regarded as a significant contribution to the field.
    Abstract Session-based recommendation, aiming at making the prediction of the user's next item click based on the information in a single session only even in the presence of some random user's behavior, is a complex problem. This complex problem requires a high-capability model of predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm where all studies focus on how to optimize the encoder module extensively in the paradigm but they ignore how to optimize the predictor module. In this paper, we discover the existing critical issue of the low-capability predictor module among existing models. Motivated by this, we propose a novel framework called \emph{\underline{S}ession-based \underline{R}ecommendation with \underline{Pred}ictor \underline{A}dd-\underline{O}n} (SR-PredictAO). In this framework, we propose a high-capability predictor module which could alleviate the effect of random user's behavior for prediction. It is worth mentioning that this framework could be applied to any existing models, which could give opportunities for further optimizing the framework. Extensive experiments on two real benchmark datasets for three state-of-the-art models show that \emph{SR-PredictAO} out-performs the current state-of-the-art model by up to 2.9\% in HR@20 and 2.3\% in MRR@20. More importantly, the improvement is consistent across almost all the existing models on all datasets, which could be regarded as a significant contribution in the field.

Learning Patient Static Information from Time-series EHR and an Approach for Safeguarding Privacy and Fairness

  • paper_url: http://arxiv.org/abs/2309.11373
  • repo_url: None
  • paper_authors: Wei Liao, Joel Voldman
  • for: This study investigates the ability of time-series electronic health record (EHR) data to predict patient static information, and develops a general approach to protect patient-sensitive attribute information for downstream tasks.
  • methods: Time-series EHR data and machine learning models are evaluated across multiple tasks, cohorts, model architectures, and databases, and a variational autoencoder-based approach is used to learn a structured latent space that disentangles patient-sensitive attributes from the time-series data.
  • results: Both the raw time-series data and the representations learned by machine learning models are highly predictive of patient static information, including biological sex, age, and self-reported race; this predictive performance extends to a wide range of comorbidity factors and persists even when the models were trained for different tasks, on different cohorts, with different architectures and databases.
    Abstract Recent work in machine learning for healthcare has raised concerns about patient privacy and algorithmic fairness. For example, previous work has shown that patient self-reported race can be predicted from medical data that does not explicitly contain racial information. However, the extent of data identification is unknown, and we lack ways to develop models whose outcomes are minimally affected by such information. Here we systematically investigated the ability of time-series electronic health record data to predict patient static information. We found that not only the raw time-series data, but also learned representations from machine learning models, can be trained to predict a variety of static information with area under the receiver operating characteristic curve as high as 0.851 for biological sex, 0.869 for binarized age and 0.810 for self-reported race. Such high predictive performance can be extended to a wide range of comorbidity factors and exists even when the model was trained for different tasks, using different cohorts, using different model architectures and databases. Given the privacy and fairness concerns these findings pose, we develop a variational autoencoder-based approach that learns a structured latent space to disentangle patient-sensitive attributes from time-series data. Our work thoroughly investigates the ability of machine learning models to encode patient static information from time-series electronic health records and introduces a general approach to protect patient-sensitive attribute information for downstream tasks.

Using Property Elicitation to Understand the Impacts of Fairness Constraints

  • paper_url: http://arxiv.org/abs/2309.11343
  • repo_url: None
  • paper_authors: Jessie Finocchiaro
  • for: This work seeks to understand how adding regularization functions to the training objective of a predictive algorithm changes the minimizer of that objective.
  • methods: Property elicitation is used to study the joint relationship between loss and regularization functions and the optimal decision, including how data distribution changes and the hardness of the constraints affect algorithmic decisions.
  • results: A necessary and sufficient condition on loss and regularizer pairs for when the elicited property changes with the addition of the regularizer, an examination of regularizers standard in the fair machine learning literature that satisfy this condition, and empirical demonstrations of how algorithmic decision-making changes as a function of data distribution changes and constraint hardness.
    Abstract Predictive algorithms are often trained by optimizing some loss function, to which regularization functions are added to impose a penalty for violating constraints. As expected, the addition of such regularization functions can change the minimizer of the objective. It is not well-understood which regularizers change the minimizer of the loss, and, when the minimizer does change, how it changes. We use property elicitation to take first steps towards understanding the joint relationship between the loss and regularization functions and the optimal decision for a given problem instance. In particular, we give a necessary and sufficient condition on loss and regularizer pairs for when a property changes with the addition of the regularizer, and examine some regularizers satisfying this condition standard in the fair machine learning literature. We empirically demonstrate how algorithmic decision-making changes as a function of both data distribution changes and hardness of the constraints.

WFTNet: Exploiting Global and Local Periodicity in Long-term Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2309.11319
  • repo_url: None
  • paper_authors: Peiyuan Liu, Beiliang Wu, Naiqi Li, Tao Dai, Fengmao Lei, Jigang Bao, Yong Jiang, Shu-Tao Xia
  • for: Long-term time series forecasting with a Wavelet-Fourier Transform Network (WFTNet) that captures comprehensive temporal-frequency information.
  • methods: Both wavelet and Fourier transforms are used to extract temporal-frequency information, with the Fourier transform capturing global periodic patterns and the wavelet transform capturing local ones; a Periodicity-Weighted Coefficient (PWC) adaptively balances the importance of global and local frequency patterns.
  • results: Extensive experiments on various time series datasets show that WFTNet consistently outperforms other state-of-the-art baselines.
    Abstract Recent CNN and Transformer-based models tried to utilize frequency and periodicity information for long-term time series forecasting. However, most existing work is based on Fourier transform, which cannot capture fine-grained and local frequency structure. In this paper, we propose a Wavelet-Fourier Transform Network (WFTNet) for long-term time series forecasting. WFTNet utilizes both Fourier and wavelet transforms to extract comprehensive temporal-frequency information from the signal, where Fourier transform captures the global periodic patterns and wavelet transform captures the local ones. Furthermore, we introduce a Periodicity-Weighted Coefficient (PWC) to adaptively balance the importance of global and local frequency patterns. Extensive experiments on various time series datasets show that WFTNet consistently outperforms other state-of-the-art baseline.
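The global-versus-local decomposition that motivates WFTNet can be illustrated without the network itself: the FFT gives a global frequency view, a discrete wavelet transform gives time-localized detail energies, and some weight balances the two. The sketch below uses PyWavelets and an invented spectral-concentration heuristic in place of the paper's learned Periodicity-Weighted Coefficient.

```python
# Illustration of the global/local decomposition only: the FFT provides a
# global frequency view, a discrete wavelet transform (PyWavelets) provides
# time-localized detail energies, and a toy "periodicity weight" balances the
# two.  The heuristic weight below is an invented stand-in for the paper's
# learned Periodicity-Weighted Coefficient; WFTNet itself is not shown.
import numpy as np
import pywt   # PyWavelets

t = np.arange(0, 20, 0.01)
x = np.sin(2 * np.pi * 0.5 * t)                               # global periodic component
x[1200:1300] += 2.0 * np.sin(2 * np.pi * 8.0 * t[1200:1300])  # short local burst

# Global view: Fourier amplitude spectrum and its dominant frequency.
spec = np.abs(np.fft.rfft(x - x.mean()))
freqs = np.fft.rfftfreq(len(x), d=0.01)
dominant = freqs[np.argmax(spec)]

# Local view: wavelet detail coefficients per scale (time-localized energy).
coeffs = pywt.wavedec(x, "db4", level=4)                      # [cA4, cD4, cD3, cD2, cD1]
detail_energy = [float(np.sum(c ** 2)) for c in coeffs[1:]]

pwc = spec.max() / spec.sum()                                 # toy periodicity weight
print(f"dominant global frequency ~ {dominant:.2f} Hz, toy periodicity weight = {pwc:.3f}")
print("detail-band energies (coarse -> fine):", [round(e, 1) for e in detail_energy])
```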

Create and Find Flatness: Building Flat Training Spaces in Advance for Continual Learning

  • paper_url: http://arxiv.org/abs/2309.11305
  • repo_url: https://github.com/Eric8932/Create-and-Find-Flatness
  • paper_authors: Wenhang Shi, Yiren Chen, Zhe Zhao, Wei Lu, Kimmo Yan, Xiaoyong Du
  • for: This work addresses catastrophic forgetting in continual learning, improving a neural network's ability to retain knowledge of earlier tasks while assimilating new ones.
  • methods: A novel Create and Find Flatness (C&F) framework builds a flat training space for each task in advance: while learning the current task, it adaptively creates a flat region around the minimum in the loss landscape and then determines parameter importance for the current task based on flatness; when adapting to a new task, constraints are applied according to the flatness, and a flat space is simultaneously prepared for the impending task.
  • results: C&F achieves state-of-the-art performance as a standalone continual learning approach and is also effective as a framework incorporating other methods, preserving knowledge of earlier tasks while learning new ones across datasets.
    Abstract Catastrophic forgetting remains a critical challenge in the field of continual learning, where neural networks struggle to retain prior knowledge while assimilating new information. Most existing studies emphasize mitigating this issue only when encountering new tasks, overlooking the significance of the pre-task phase. Therefore, we shift the attention to the current task learning stage, presenting a novel framework, C&F (Create and Find Flatness), which builds a flat training space for each task in advance. Specifically, during the learning of the current task, our framework adaptively creates a flat region around the minimum in the loss landscape. Subsequently, it finds the parameters' importance to the current task based on their flatness degrees. When adapting the model to a new task, constraints are applied according to the flatness and a flat space is simultaneously prepared for the impending task. We theoretically demonstrate the consistency between the created and found flatness. In this manner, our framework not only accommodates ample parameter space for learning new tasks but also preserves the preceding knowledge of earlier tasks. Experimental results exhibit C&F's state-of-the-art performance as a standalone continual learning approach and its efficacy as a framework incorporating other methods. Our work is available at https://github.com/Eric8932/Create-and-Find-Flatness.

Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information

  • paper_url: http://arxiv.org/abs/2309.11294
  • repo_url: None
  • paper_authors: Sarwan Ali
  • for: This paper proposes a method to measure the representation capacity of embeddings, i.e., how well they preserve structural and contextual information, so that researchers and practitioners can make informed decisions when selecting embedding models for specific applications.
  • methods: Extrinsic evaluation methods (classification and clustering) are combined with t-SNE-based neighborhood analysis (neighborhood agreement and trustworthiness) for a comprehensive assessment, and Bayesian optimization is used to optimize the weights of the evaluation metrics, ensuring an objective, data-driven combination.
  • results: Evaluation on three real-world biological sequence (protein and nucleotide) datasets with four embedding methods from the literature (Spike2Vec, Spaced k-mers, PWM2Vec, and AutoEncoder) shows that the approach provides a quantitative measure of how effectively embeddings capture structural and contextual information.
    Abstract Effective representation of data is crucial in various machine learning tasks, as it captures the underlying structure and context of the data. Embeddings have emerged as a powerful technique for data representation, but evaluating their quality and capacity to preserve structural and contextual information remains a challenge. In this paper, we address this need by proposing a method to measure the \textit{representation capacity} of embeddings. The motivation behind this work stems from the importance of understanding the strengths and limitations of embeddings, enabling researchers and practitioners to make informed decisions in selecting appropriate embedding models for their specific applications. By combining extrinsic evaluation methods, such as classification and clustering, with t-SNE-based neighborhood analysis, such as neighborhood agreement and trustworthiness, we provide a comprehensive assessment of the representation capacity. Additionally, the use of optimization techniques (bayesian optimization) for weight optimization (for classification, clustering, neighborhood agreement, and trustworthiness) ensures an objective and data-driven approach in selecting the optimal combination of metrics. The proposed method not only contributes to advancing the field of embedding evaluation but also empowers researchers and practitioners with a quantitative measure to assess the effectiveness of embeddings in capturing structural and contextual information. For the evaluation, we use $3$ real-world biological sequence (proteins and nucleotide) datasets and performed representation capacity analysis of $4$ embedding methods from the literature, namely Spike2Vec, Spaced $k$-mers, PWM2Vec, and AutoEncoder.
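A rough sketch of how the extrinsic and neighborhood-based metrics can be combined into a single "representation capacity" score is given below. The stand-in embedding is a PCA projection of scikit-learn's digits data rather than the biological sequence embeddings from the paper, and the fixed weights replace the Bayesian-optimized weights described in the abstract.

```python
# Hedged sketch of combining extrinsic and neighborhood-based metrics into one
# weighted "representation capacity" score.  The "embedding" is a PCA
# projection of sklearn's digits data, and the fixed weights stand in for the
# paper's Bayesian-optimized weights.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score
from sklearn.manifold import trustworthiness

X, y = load_digits(return_X_y=True)
emb = PCA(n_components=16, random_state=0).fit_transform(X)   # stand-in embedding

acc = cross_val_score(KNeighborsClassifier(), emb, y, cv=5).mean()      # classification
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(emb)
nmi = normalized_mutual_info_score(y, labels)                           # clustering
trust = trustworthiness(X, emb, n_neighbors=5)                          # neighborhood preservation

weights = {"accuracy": 0.4, "nmi": 0.3, "trustworthiness": 0.3}         # fixed, illustrative
capacity = weights["accuracy"] * acc + weights["nmi"] * nmi + weights["trustworthiness"] * trust
print(f"accuracy={acc:.3f}  NMI={nmi:.3f}  trustworthiness={trust:.3f}  capacity={capacity:.3f}")
```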

Grassroots Operator Search for Model Edge Adaptation

  • paper_url: http://arxiv.org/abs/2309.11246
  • repo_url: None
  • paper_authors: Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar
  • for: This paper proposes a hardware-aware neural architecture search (HW-NAS) methodology, grounded in mathematical foundations, for adapting deep learning models to edge devices.
  • methods: The Grassroots Operator Search (GOS) methodology expresses each operator as a set of mathematical instructions capturing its behavior, and uses these instructions as the basis for searching and selecting efficient replacement operators that maintain the accuracy of the original model while reducing computational complexity.
  • results: Across various deep learning models, the method achieves at least a 2.2x speedup on two edge devices (Redmi Note 7S and Raspberry Pi 3) while maintaining high accuracy; in a pulse rate estimation use case on wristband devices it reaches state-of-the-art performance with reduced computational complexity, demonstrating its effectiveness in practical applications.
    Abstract Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used to design efficient deep learning architectures. An efficient and flexible search space is crucial to the success of HW-NAS. Current approaches focus on designing a macro-architecture and searching for the architecture's hyperparameters based on a set of possible values. This approach is biased by the expertise of deep learning (DL) engineers and standard modeling approaches. In this paper, we present a Grassroots Operator Search (GOS) methodology. Our HW-NAS adapts a given model for edge devices by searching for efficient operator replacement. We express each operator as a set of mathematical instructions that capture its behavior. The mathematical instructions are then used as the basis for searching and selecting efficient replacement operators that maintain the accuracy of the original model while reducing computational complexity. Our approach is grassroots since it relies on the mathematical foundations to construct new and efficient operators for DL architectures. We demonstrate on various DL models, that our method consistently outperforms the original models on two edge devices, namely Redmi Note 7S and Raspberry Pi3, with a minimum of 2.2x speedup while maintaining high accuracy. Additionally, we showcase a use case of our GOS approach in pulse rate estimation on wristband devices, where we achieve state-of-the-art performance, while maintaining reduced computational complexity, demonstrating the effectiveness of our approach in practical applications.

Towards a Prediction of Machine Learning Training Time to Support Continuous Learning Systems Development

  • paper_url: http://arxiv.org/abs/2309.11226
  • repo_url: None
  • paper_authors: Francesca Marzi, Giordano d’Aloisio, Antinisca Di Marco, Giovanni Stilo
  • for: Predicting the training time of machine learning (ML) models has become highly relevant to the scientific community: an a priori prediction would enable automatic selection of the best model in terms of both energy efficiency and performance, for instance in MLOps architectures. This paper describes work in that direction.
  • methods: An extensive empirical study of the Full Parameter Time Complexity (FPTC) approach by Zheng et al., which is, to the best of the authors' knowledge, the only approach formalizing ML model training time as a function of both dataset and model parameters; the formulations proposed for the Logistic Regression and Random Forest classifiers are studied, and the main strengths and weaknesses of the approach are highlighted.
  • results: The study finds that training time prediction is strictly related to the context (i.e., the involved dataset) and that the FPTC approach does not generalize.
    Abstract The problem of predicting the training time of machine learning (ML) models has become extremely relevant in the scientific community. Being able to predict a priori the training time of an ML model would enable the automatic selection of the best model both in terms of energy efficiency and in terms of performance in the context of, for instance, MLOps architectures. In this paper, we present the work we are conducting towards this direction. In particular, we present an extensive empirical study of the Full Parameter Time Complexity (FPTC) approach by Zheng et al., which is, to the best of our knowledge, the only approach formalizing the training time of ML models as a function of both dataset's and model's parameters. We study the formulations proposed for the Logistic Regression and Random Forest classifiers, and we highlight the main strengths and weaknesses of the approach. Finally, we observe how, from the conducted study, the prediction of training time is strictly related to the context (i.e., the involved dataset) and how the FPTC approach is not generalizable.

A Model-Based Machine Learning Approach for Assessing the Performance of Blockchain Applications

  • paper_url: http://arxiv.org/abs/2309.11205
  • repo_url: https://github.com/AlbshriAdel/BlockchainPerformanceML
  • paper_authors: Adel Albshri, Ali Alzubaidi, Ellis Solaiman
  • for: This study aims to provide a reliable modelling approach to support the development and evaluation of blockchain-based applications.
  • methods: Two machine learning model-based methods are used: first, $k$ nearest neighbour ($k$NN) and support vector machine (SVM) models are trained to predict blockchain performance from predetermined configuration parameters; second, a salp swarm optimization (SO) model, enhanced with rough set theory (ISO), is used to investigate optimal blockchain configurations for achieving a required performance level.
  • results: Statistical comparisons indicate that the models have a competitive edge: the $k$NN model outperforms SVM by 5%, and ISO reduces the inaccuracy deviation by 4% compared to regular SO.
    Abstract The recent advancement of Blockchain technology consolidates its status as a viable alternative for various domains. However, evaluating the performance of blockchain applications can be challenging due to the underlying infrastructure's complexity and distributed nature. Therefore, a reliable modelling approach is needed to boost Blockchain-based applications' development and evaluation. While simulation-based solutions have been researched, machine learning (ML) model-based techniques are rarely discussed in conjunction with evaluating blockchain application performance. Our novel research makes use of two ML model-based methods. Firstly, we train a $k$ nearest neighbour ($k$NN) and support vector machine (SVM) to predict blockchain performance using predetermined configuration parameters. Secondly, we employ the salp swarm optimization (SO) ML model which enables the investigation of optimal blockchain configurations for achieving the required performance level. We use rough set theory to enhance SO, hereafter called ISO, which we demonstrate to prove achieving an accurate recommendation of optimal parameter configurations; despite uncertainty. Finally, statistical comparisons indicate that our models have a competitive edge. The $k$NN model outperforms SVM by 5\% and the ISO also demonstrates a reduction of 4\% inaccuracy deviation compared to regular SO.
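The first, prediction-oriented part of the approach amounts to regressing a performance metric on configuration parameters. The sketch below does this with scikit-learn's $k$NN and SVM regressors on an invented configuration-to-throughput relationship; the features, the data-generating formula, and the hyperparameters are assumptions, since the paper trains on measured blockchain data.

```python
# Sketch of the prediction-oriented step: regress a performance metric
# (synthetic "throughput") on configuration parameters with kNN and an SVM
# regressor.  The configuration features and the data-generating formula are
# invented for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)
n = 800
config = np.column_stack([
    rng.integers(4, 65, n),        # hypothetical: number of nodes
    rng.integers(50, 501, n),      # hypothetical: block size (tx per block)
    rng.uniform(0.5, 5.0, n),      # hypothetical: block interval (seconds)
])
throughput = (config[:, 1] / config[:, 2]) * (1 - 0.004 * config[:, 0]) \
             + rng.normal(0, 2, n)                     # invented relationship + noise

Xtr, Xte, ytr, yte = train_test_split(config, throughput, random_state=0)
for name, model in [("kNN", KNeighborsRegressor(n_neighbors=7)),
                    ("SVM", SVR(C=100.0))]:
    pipe = make_pipeline(StandardScaler(), model).fit(Xtr, ytr)
    print(f"{name}: test MAE = {mean_absolute_error(yte, pipe.predict(Xte)):.2f} tx/s")
```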

RHALE: Robust and Heterogeneity-aware Accumulated Local Effects

  • paper_url: http://arxiv.org/abs/2309.11193
  • repo_url: https://github.com/givasile/RHALE
  • paper_authors: Vasilis Gkolemis, Theodore Dalamagas, Eirini Ntoutsi, Christos Diou
  • for: This work aims to improve the reliability of feature-effect explainability methods, particularly in the presence of correlated features.
  • methods: Robust and Heterogeneity-aware ALE (RHALE) estimates the average (global) effect of a feature on the output while quantifying heterogeneity, the deviation of instance-level (local) effects from the average, via the standard deviation of the local effects, and automatically determines an optimal variable-size bin splitting.
  • results: On synthetic and real datasets, RHALE outperforms other methods, especially in the presence of correlated features, with the automatic bin splitting balancing estimation bias and variance.
    Abstract Accumulated Local Effects (ALE) is a widely-used explainability method for isolating the average effect of a feature on the output, because it handles cases with correlated features well. However, it has two limitations. First, it does not quantify the deviation of instance-level (local) effects from the average (global) effect, known as heterogeneity. Second, for estimating the average effect, it partitions the feature domain into user-defined, fixed-sized bins, where different bin sizes may lead to inconsistent ALE estimations. To address these limitations, we propose Robust and Heterogeneity-aware ALE (RHALE). RHALE quantifies the heterogeneity by considering the standard deviation of the local effects and automatically determines an optimal variable-size bin-splitting. In this paper, we prove that to achieve an unbiased approximation of the standard deviation of local effects within each bin, bin splitting must follow a set of sufficient conditions. Based on these conditions, we propose an algorithm that automatically determines the optimal partitioning, balancing the estimation bias and variance. Through evaluations on synthetic and real datasets, we demonstrate the superiority of RHALE compared to other methods, including the advantages of automatic bin splitting, especially in cases with correlated features.
    摘要 累积局部效应(ALE)是一种广泛使用的可解释性方法,用于隔离某一特征对输出的平均效应,因为它能较好地处理特征相关的情形。然而,它有两个局限。首先,它没有量化实例级(局部)效应与平均(全局)效应之间的偏差,即异质性。其次,在估计平均效应时,它将特征取值范围划分为用户定义的固定大小区间,不同的区间大小可能导致不一致的 ALE 估计。为解决这些局限,我们提出了稳健且异质性感知的 ALE(RHALE)。RHALE 通过局部效应的标准差来量化异质性,并自动确定最优的可变大小区间划分。在本文中,我们证明,要在每个区间内对局部效应的标准差进行无偏估计,区间划分必须满足一组充分条件。基于这些条件,我们提出了一种自动确定最优划分的算法,以平衡估计的偏差与方差。通过在合成数据和真实数据上的评估,我们展示了 RHALE 相对于其他方法的优势,尤其是自动区间划分在特征相关情形下的好处。
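The core quantity RHALE adds to ALE is the per-bin spread of local effects. The sketch below estimates local effects of one feature by finite differences on a toy black-box model and reports, per bin, the mean (the ALE slope) and the standard deviation (heterogeneity). Fixed-size bins and the toy model are simplifying assumptions; the paper's contribution, automatic variable-size bin splitting, is not implemented here.
```python
import numpy as np

def model(X):                       # toy black box: the interaction term makes effects heterogeneous
    return 3 * X[:, 0] + 2 * X[:, 0] * X[:, 1]

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(2000, 2))

eps = 1e-3
X_hi, X_lo = X.copy(), X.copy()
X_hi[:, 0] += eps
X_lo[:, 0] -= eps
local_effects = (model(X_hi) - model(X_lo)) / (2 * eps)   # d f / d x1 at each sample

bins = np.linspace(-1, 1, 6)                               # 5 fixed-size bins over x1
idx = np.clip(np.digitize(X[:, 0], bins) - 1, 0, 4)
for b in range(5):
    le = local_effects[idx == b]
    print(f"bin {b}: mean effect {le.mean():.2f}, heterogeneity (std) {le.std():.2f}")
```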

Investigating Personalization Methods in Text to Music Generation

  • paper_url: http://arxiv.org/abs/2309.11140
  • repo_url: https://github.com/zelaki/DreamSound
  • paper_authors: Manos Plitsis, Theodoros Kouzelis, Georgios Paraskevopoulos, Vassilis Katsouros, Yannis Panagakis
  • for: 这个研究探讨了在几个shot设定下个性化文本到音乐扩散模型的问题。
  • methods: 研究使用了已有的个性化方法的组合,以及音频专门的数据增强技术。
  • results: 研究发现,相似度指标与用户偏好相吻合,且现有的个性化方法更容易学习节奏性的音乐结构,而不是旋律。
    Abstract In this work, we investigate the personalization of text-to-music diffusion models in a few-shot setting. Motivated by recent advances in the computer vision domain, we are the first to explore the combination of pre-trained text-to-audio diffusers with two established personalization methods. We experiment with the effect of audio-specific data augmentation on the overall system performance and assess different training strategies. For evaluation, we construct a novel dataset with prompts and music clips. We consider both embedding-based and music-specific metrics for quantitative evaluation, as well as a user study for qualitative evaluation. Our analysis shows that similarity metrics are in accordance with user preferences and that current personalization approaches tend to learn rhythmic music constructs more easily than melody. The code, dataset, and example material of this study are open to the research community.
    摘要 在这项工作中,我们研究了文本到音乐扩散模型在小样本(few-shot)设定下的个性化。受计算机视觉领域最新进展的启发,我们首次探索将预训练的文本到音频扩散模型与两种已有的个性化方法相结合。我们实验考察了针对音频的数据增强对系统整体性能的影响,并评估了不同的训练策略。为了进行评估,我们构建了一个包含提示词和音乐片段的新数据集。我们同时采用基于嵌入的指标和音乐特有的指标进行定量评估,并通过用户研究进行定性评估。分析表明,相似度指标与用户偏好相符,且现有的个性化方法更容易学习节奏性的音乐结构,而不是旋律。本研究的代码、数据集和示例材料向研究社区开放。

Ano-SuPs: Multi-size anomaly detection for manufactured products by identifying suspected patches

  • paper_url: http://arxiv.org/abs/2309.11120
  • repo_url: None
  • paper_authors: Hao Xu, Juan Du, Andi Wang
  • for: This paper aims to address the challenges of existing matrix decomposition methods in image-based anomaly detection, particularly in the presence of complex backgrounds and various anomaly patterns.
  • methods: The proposed method uses a two-stage strategy that involves detecting suspected patches (Ano-SuPs) by reconstructing the input image twice: the first step is to obtain a set of normal patches by removing suspected patches, and the second step is to use those normal patches to refine the identification of patches with anomalies.
  • results: The proposed method is evaluated systematically through simulation experiments and case studies, demonstrating its effectiveness in detecting anomalies in image-based systems. The key parameters and designed steps that impact the model’s performance and efficiency are also identified.
    Abstract Image-based systems have gained popularity owing to their capacity to provide rich manufacturing status information, low implementation costs and high acquisition rates. However, the complexity of the image background and various anomaly patterns pose new challenges to existing matrix decomposition methods, which are inadequate for modeling requirements. Moreover, the uncertainty of the anomaly can cause anomaly contamination problems, making the designed model and method highly susceptible to external disturbances. To address these challenges, we propose a two-stage strategy anomaly detection method that detects anomalies by identifying suspected patches (Ano-SuPs). Specifically, we propose to detect the patches with anomalies by reconstructing the input image twice: the first step is to obtain a set of normal patches by removing those suspected patches, and the second step is to use those normal patches to refine the identification of the patches with anomalies. To demonstrate its effectiveness, we evaluate the proposed method systematically through simulation experiments and case studies. We further identified the key parameters and designed steps that impact the model's performance and efficiency.
    摘要 基于图像的系统因能够提供丰富的制造状态信息、实现成本低且采集速率高而日益普及。然而,图像背景的复杂性和多样的异常模式给现有的矩阵分解方法带来了新的挑战,使其难以满足建模需求。此外,异常的不确定性会导致异常污染问题,使所设计的模型和方法极易受到外部干扰。为应对这些挑战,我们提出了一种两阶段策略的异常检测方法,通过识别可疑补丁(Ano-SuPs)来检测异常。具体而言,我们通过对输入图像进行两次重构来检测含有异常的补丁:第一步,剔除可疑补丁以获得一组正常补丁;第二步,利用这些正常补丁来细化对异常补丁的识别。为验证其有效性,我们通过仿真实验和案例研究对所提方法进行了系统评估,并进一步识别了影响模型性能和效率的关键参数与设计步骤。
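A hedged sketch of the two-stage "suspected patches" idea: reconstruct all patches, drop those with the largest reconstruction error (suspected), refit the reconstruction model on the remaining normal patches, then re-score everything. PCA stands in for the paper's reconstruction model, and the synthetic image, patch size, and thresholds are illustrative assumptions.
```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
img = rng.normal(0, 0.1, size=(64, 64))
img[40:48, 16:24] += 3.0                       # synthetic defect in one patch

P = 8
patches = np.array([img[i:i + P, j:j + P].ravel()
                    for i in range(0, 64, P) for j in range(0, 64, P)])

def recon_error(train, score, k=5):
    """Fit a low-rank reconstruction on `train` and score squared error on `score`."""
    pca = PCA(n_components=k).fit(train)
    rec = pca.inverse_transform(pca.transform(score))
    return ((score - rec) ** 2).mean(axis=1)

# Stage 1: flag the most poorly reconstructed patches as "suspected"
err1 = recon_error(patches, patches)
suspected = err1 > np.quantile(err1, 0.90)

# Stage 2: refit on normal patches only, then re-score all patches
err2 = recon_error(patches[~suspected], patches)
anomalous = err2 > err2[~suspected].mean() + 3 * err2[~suspected].std()
print("anomalous patch indices:", np.where(anomalous)[0])
```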

Bold but Cautious: Unlocking the Potential of Personalized Federated Learning through Cautiously Aggressive Collaboration

  • paper_url: http://arxiv.org/abs/2309.11103
  • repo_url: https://github.com/kxzxvbk/Fling
  • paper_authors: Xinghao Wu, Xuefeng Liu, Jianwei Niu, Guogang Zhu, Shaojie Tang
  • for: 这篇论文主要关注在对多个客户进行协同学习时,减少非独立同分布(non-IID)资料的影响,并且将客户训练的个人化模型与其他客户进行协同学习。
  • methods: 这篇论文提出了一个新的协作指南,与现有方法不同的是,它允许客户端与其他客户端共享更多的参数,从而提高模型性能。论文还提出了一个名为FedCAC的新方法,它使用量化指标评估每个参数对非独立同分布数据的敏感度,并基于该评估结果谨慎地选择协作者。
  • results: 实验结果显示,FedCAC比现有的方法更好地将客户的参数与其他客户共享,从而提高模型的性能,特别是在客户的资料分布不同时。
    Abstract Personalized federated learning (PFL) reduces the impact of non-independent and identically distributed (non-IID) data among clients by allowing each client to train a personalized model when collaborating with others. A key question in PFL is to decide which parameters of a client should be localized or shared with others. In current mainstream approaches, all layers that are sensitive to non-IID data (such as classifier layers) are generally personalized. The reasoning behind this approach is understandable, as localizing parameters that are easily influenced by non-IID data can prevent the potential negative effect of collaboration. However, we believe that this approach is too conservative for collaboration. For example, for a certain client, even if its parameters are easily influenced by non-IID data, it can still benefit by sharing these parameters with clients having similar data distribution. This observation emphasizes the importance of considering not only the sensitivity to non-IID data but also the similarity of data distribution when determining which parameters should be localized in PFL. This paper introduces a novel guideline for client collaboration in PFL. Unlike existing approaches that prohibit all collaboration of sensitive parameters, our guideline allows clients to share more parameters with others, leading to improved model performance. Additionally, we propose a new PFL method named FedCAC, which employs a quantitative metric to evaluate each parameter's sensitivity to non-IID data and carefully selects collaborators based on this evaluation. Experimental results demonstrate that FedCAC enables clients to share more parameters with others, resulting in superior performance compared to state-of-the-art methods, particularly in scenarios where clients have diverse distributions.
    摘要 个性化联邦学习(PFL)允许每个客户端在与其他客户端协作的同时训练个性化模型,从而降低客户端之间非独立同分布(non-IID)数据的影响。PFL 中的一个关键问题是决定客户端的哪些参数应当本地化、哪些应当与他人共享。当前主流方法通常将所有对 non-IID 数据敏感的层(例如分类器层)都进行个性化。这种做法的理由不难理解:将容易受 non-IID 数据影响的参数本地化,可以避免协作可能带来的负面影响。然而,我们认为这种做法对协作而言过于保守。例如,对于某个客户端,即使其参数容易受 non-IID 数据影响,只要与数据分布相似的客户端共享这些参数,它仍然可以从中获益。这一观察强调,在决定 PFL 中哪些参数应本地化时,不仅要考虑参数对 non-IID 数据的敏感度,还要考虑数据分布的相似性。本文为 PFL 中的客户端协作提出了一条新的指导原则:与禁止共享所有敏感参数的现有方法不同,我们的原则允许客户端与他人共享更多参数,从而提升模型性能。此外,我们提出了一种名为 FedCAC 的新 PFL 方法,它使用量化指标评估每个参数对 non-IID 数据的敏感度,并据此谨慎地选择协作者。实验结果表明,FedCAC 使客户端能够与他人共享更多参数,性能优于当前最优方法,尤其是在客户端数据分布差异较大的场景下。
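A conceptual sketch of selective parameter sharing: a per-parameter sensitivity score decides which entries are averaged with collaborators and which stay local, and collaborators are picked by similarity on the shareable entries. The score used here (normalized drift of local training away from the global model), the threshold, and the similarity test are illustrative stand-ins, not FedCAC's exact definitions.
```python
import numpy as np

rng = np.random.default_rng(3)
d = 10
global_w = rng.normal(size=d)
# local models of 3 clients after one round of local training (toy drift)
local_ws = [global_w + rng.normal(scale=s, size=d) for s in (0.05, 0.06, 0.8)]

def sensitivity(local_w, global_w):
    drift = np.abs(local_w - global_w)
    return drift / (drift.max() + 1e-12)        # in [0, 1]; high = non-IID sensitive (assumed metric)

tau = 0.5                                       # share parameters with sensitivity below tau
client = 0
sens = sensitivity(local_ws[client], global_w)
share_mask = sens < tau

# collaborators: clients whose shareable parameters look similar to ours
collaborators = [w for i, w in enumerate(local_ws) if i != client
                 and np.linalg.norm((w - local_ws[client])[share_mask]) < 1.0]
new_w = local_ws[client].copy()
if collaborators:
    stacked = np.vstack([local_ws[client]] + collaborators)
    new_w[share_mask] = stacked[:, share_mask].mean(axis=0)   # aggregate only the shared entries
print("shared", int(share_mask.sum()), "of", d, "parameters with", len(collaborators), "collaborators")
```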

Delays in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.11096
  • repo_url: https://github.com/reiniscimurs/DRL-robot-navigation
  • paper_authors: Pierre Liotet
  • for: 这个论文主要研究了延迟在动态系统中的影响,以及如何在延迟的情况下进行决策。
  • methods: 该论文使用了马可夫决策过程(MDP)作为基础框架,并研究了延迟在这种决策过程中的影响。
  • results: 该论文分析了延迟对动态系统的影响,并提出了若干可能的解决方案;同时,论文还将强化学习文献中的经典框架与延迟问题的框架联系起来。
    Abstract Delays are inherent to most dynamical systems. Besides shifting the process in time, they can significantly affect their performance. For this reason, it is usually valuable to study the delay and account for it. Because they are dynamical systems, it is of no surprise that sequential decision-making problems such as Markov decision processes (MDP) can also be affected by delays. These processes are the foundational framework of reinforcement learning (RL), a paradigm whose goal is to create artificial agents capable of learning to maximise their utility by interacting with their environment. RL has achieved strong, sometimes astonishing, empirical results, but delays are seldom explicitly accounted for. The understanding of the impact of delay on the MDP is limited. In this dissertation, we propose to study the delay in the agent's observation of the state of the environment or in the execution of the agent's actions. We will repeatedly change our point of view on the problem to reveal some of its structure and peculiarities. A wide spectrum of delays will be considered, and potential solutions will be presented. This dissertation also aims to draw links between celebrated frameworks of the RL literature and the one of delays.
    摘要 延迟是大多数动态系统固有的特性。除了使过程在时间上发生偏移之外,延迟还会显著影响系统的性能。因此,研究并考虑延迟通常是有价值的。由于马尔可夫决策过程(MDP)等序贯决策问题本身也是动态系统,它们同样会受到延迟的影响。MDP 是强化学习(RL)的基础框架,而 RL 的目标是构建能够通过与环境交互学习最大化自身效用的智能体。RL 已经取得了强大、有时甚至令人惊叹的实验结果,但延迟却很少被显式考虑,人们对延迟如何影响 MDP 的理解也很有限。在这篇学位论文中,我们提出研究智能体在观测环境状态时或在执行动作时出现的延迟。我们将不断变换看待该问题的视角,以揭示其结构和特性;我们将考虑范围广泛的各类延迟,并给出可能的解决方案。本论文还力图在强化学习文献中的经典框架与延迟框架之间建立联系。
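For context, here is a sketch of one standard construction from the delayed-RL literature: with a constant observation delay of k steps, the agent acts on an augmented state made of the last delayed observation plus its k pending actions. The toy random-walk environment is an assumption used only to make the wrapper runnable; the dissertation studies a much broader range of delays.
```python
from collections import deque
import random

class RandomWalkEnv:
    def reset(self):
        self.s = 0
        return self.s
    def step(self, action):           # action in {-1, +1}
        self.s += action + random.choice([-1, 0, 1])
        reward = -abs(self.s)         # stay near the origin
        return self.s, reward

class DelayedObsWrapper:
    """Delays observations by k steps and exposes the augmented state."""
    def __init__(self, env, k):
        self.env, self.k = env, k
    def reset(self):
        s0 = self.env.reset()
        self.obs_buf = deque([s0] * (self.k + 1), maxlen=self.k + 1)
        self.act_buf = deque([0] * self.k, maxlen=self.k)
        return (self.obs_buf[0], tuple(self.act_buf))
    def step(self, action):
        s, r = self.env.step(action)
        self.obs_buf.append(s)
        self.act_buf.append(action)
        # agent sees the observation from k steps ago plus its pending actions
        return (self.obs_buf[0], tuple(self.act_buf)), r

env = DelayedObsWrapper(RandomWalkEnv(), k=3)
state = env.reset()
for _ in range(5):
    state, reward = env.step(random.choice([-1, 1]))
    print(state, reward)
```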

GPSINDy: Data-Driven Discovery of Equations of Motion

  • paper_url: http://arxiv.org/abs/2309.11076
  • repo_url: None
  • paper_authors: Junette Hsin, Shubhankar Agarwal, Adam Thorpe, David Fridovich-Keil
  • for: 本研究旨在寻找含有噪声数据的非线性动力系统模型。
  • methods: 我们将 Gaussian 过程回归、SINDy 参数学习方法结合起来,以便从数据中找到非线性动力系统模型。
  • results: 我们在一个 Lotka-Volterra 模型和一个 unicycle 动力系统上进行了实验和硬件数据处理,并证明了我们的方法可以更好地找到系统动力和预测未来轨迹。
    Abstract In this paper, we consider the problem of discovering dynamical system models from noisy data. The presence of noise is known to be a significant problem for symbolic regression algorithms. We combine Gaussian process regression, a nonparametric learning method, with SINDy, a parametric learning approach, to identify nonlinear dynamical systems from data. The key advantages of our proposed approach are its simplicity coupled with the fact that it demonstrates improved robustness properties with noisy data over SINDy. We demonstrate our proposed approach on a Lotka-Volterra model and a unicycle dynamic model in simulation and on an NVIDIA JetRacer system using hardware data. We demonstrate improved performance over SINDy for discovering the system dynamics and predicting future trajectories.
    摘要 在这篇论文中,我们研究从含噪数据中发现动力系统模型的问题。众所周知,噪声会严重影响符号回归算法。我们将高斯过程回归(一种非参数学习方法)与 SINDy(一种参数学习方法)相结合,用于从数据中辨识非线性动力系统。该方法的主要优点是简单,并且在含噪数据上比 SINDy 表现出更好的鲁棒性。我们在仿真中的 Lotka-Volterra 模型和独轮车动力学模型上,以及在使用硬件数据的 NVIDIA JetRacer 系统上验证了所提方法,并证明其在发现系统动力学和预测未来轨迹方面优于 SINDy。
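A hedged sketch of the GP-then-SINDy pipeline on the Lotka-Volterra example: denoise the trajectories with Gaussian process regression, differentiate the smoothed signal, and run sequentially thresholded least squares over a polynomial library. The kernel, library, and threshold are illustrative choices rather than the paper's exact settings.
```python
import numpy as np
from scipy.integrate import odeint
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def lotka_volterra(z, t, a=1.5, b=1.0, c=3.0, d=1.0):
    x, y = z
    return [a * x - b * x * y, -c * y + d * x * y]

t = np.linspace(0, 10, 400)
Z = odeint(lotka_volterra, [1.0, 1.0], t)
Z_noisy = Z + np.random.default_rng(4).normal(0, 0.05, Z.shape)

# GP-smooth each state variable and differentiate the smoothed signal
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.05 ** 2)
Z_smooth = np.column_stack([
    GaussianProcessRegressor(kernel=kernel).fit(t[:, None], Z_noisy[:, i]).predict(t[:, None])
    for i in range(2)])
dZ = np.gradient(Z_smooth, t, axis=0)

# candidate library and sequentially thresholded least squares (SINDy)
x, y = Z_smooth.T
Theta = np.column_stack([np.ones_like(x), x, y, x * y, x ** 2, y ** 2])
Xi = np.linalg.lstsq(Theta, dZ, rcond=None)[0]
for _ in range(10):
    Xi[np.abs(Xi) < 0.2] = 0.0
    for j in range(2):
        big = np.abs(Xi[:, j]) > 0
        Xi[big, j] = np.linalg.lstsq(Theta[:, big], dZ[:, j], rcond=None)[0]
print(np.round(Xi, 2))   # rows: [1, x, y, xy, x^2, y^2]; columns: dx/dt, dy/dt
```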

InkStream: Real-time GNN Inference on Streaming Graphs via Incremental Update

  • paper_url: http://arxiv.org/abs/2309.11071
  • repo_url: None
  • paper_authors: Dan Wu, Zhaoying Li, Tulika Mitra
  • for: 本文旨在提出一种面向图神经网络(GNN)的实时推理方法,以适应流式图的更新。
  • methods: 本方法基于两点关键见解:(1)当使用 min 或 max 作为汇聚函数时,$k$-hop 邻域内的大多数节点不会受到被修改边的影响;(2)当模型权重保持不变而图结构发生变化时,节点嵌入可以随时间增量演化,只需计算邻域中受影响的部分。基于这两点见解,我们提出了名为 InkStream 的新方法,用于实时推理,具有最少的内存访问和计算量,同时保证输出与传统方法完全一致。InkStream 基于事件驱动系统,控制层间效应传播和层内节点嵌入的增量更新。InkStream 具有高度的可配置性和可扩展性,允许用户创建和处理自定义事件。
  • results: 我们在四个大图上使用三种 GNN 模型进行实验,结果显示 InkStream 在 CPU 集群上加速 2.5-427 倍,在两个不同的 GPU 集群上加速 2.4-343 倍,且输出与在最新图快照上进行传统 GNN 推理的结果完全一致。
    Abstract Classic Graph Neural Network (GNN) inference approaches, designed for static graphs, are ill-suited for streaming graphs that evolve with time. The dynamism intrinsic to streaming graphs necessitates constant updates, posing unique challenges to acceleration on GPU. We address these challenges based on two key insights: (1) Inside the $k$-hop neighborhood, a significant fraction of the nodes is not impacted by the modified edges when the model uses min or max as aggregation function; (2) When the model weights remain static while the graph structure changes, node embeddings can incrementally evolve over time by computing only the impacted part of the neighborhood. With these insights, we propose a novel method, InkStream, designed for real-time inference with minimal memory access and computation, while ensuring an identical output to conventional methods. InkStream operates on the principle of propagating and fetching data only when necessary. It uses an event-based system to control inter-layer effect propagation and intra-layer incremental updates of node embedding. InkStream is highly extensible and easily configurable by allowing users to create and process customized events. We showcase that less than 10 lines of additional user code are needed to support popular GNN models such as GCN, GraphSAGE, and GIN. Our experiments with three GNN models on four large graphs demonstrate that InkStream accelerates by 2.5-427$\times$ on a CPU cluster and 2.4-343$\times$ on two different GPU clusters while producing identical outputs as GNN model inference on the latest graph snapshot.
    摘要 为静态图设计的传统图神经网络(GNN)推理方法并不适用于随时间演化的流式图。流式图固有的动态性要求不断更新,这给 GPU 上的加速带来了独特的挑战。我们基于两点关键见解来应对这些挑战:(1)当模型使用 min 或 max 作为汇聚函数时,$k$-hop 邻域内的很大一部分节点不会受到被修改边的影响;(2)当模型权重保持不变而图结构发生变化时,只需计算邻域中受影响的部分,节点嵌入便可随时间增量演化。基于这些见解,我们提出了一种新方法 InkStream,用于实时推理,内存访问和计算量最少,同时保证输出与传统方法完全一致。InkStream 遵循"只在必要时传播和读取数据"的原则,利用事件驱动系统控制层间效应传播和层内节点嵌入的增量更新。InkStream 具有高度的可扩展性和可配置性,用户只需不到 10 行额外代码即可支持 GCN、GraphSAGE 和 GIN 等常见 GNN 模型。我们在四个大图上对三种 GNN 模型进行的实验表明,InkStream 在 CPU 集群上加速 2.5-427 倍,在两个不同的 GPU 集群上加速 2.4-343 倍,且输出与在最新图快照上进行的 GNN 模型推理完全一致。
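A toy, single-layer illustration of the monotone-aggregation insight behind InkStream: with a max aggregator, inserting an edge only changes a node's aggregated value if the new neighbour's feature exceeds its current max, so recomputation can be limited to the affected nodes. Scalar features and a single layer are simplifications; the real system propagates such events through multiple GNN layers and also handles deletions.
```python
feat = {0: 1.0, 1: 4.0, 2: 2.5, 3: 0.5}            # node features
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}         # undirected adjacency

# full pass: h[v] = max over neighbours (monotone aggregation)
h = {v: max((feat[u] for u in adj[v]), default=0.0) for v in adj}

def add_edge(u, v):
    """Insert edge (u, v) and update aggregates only where the max can change."""
    affected = []
    for a, b in ((u, v), (v, u)):
        adj[a].add(b)
        if feat[b] > h[a]:                          # max aggregation: cheap check
            h[a] = feat[b]
            affected.append(a)                      # only these would be pushed as events
    return affected                                 # ...to the next layer in a multi-layer GNN

print("before:", h)
print("affected by inserting (3, 1):", add_edge(3, 1))   # node 3 changes, node 1 does not
print("after:", h)
```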

Extreme Scenario Selection in Day-Ahead Power Grid Operational Planning

  • paper_url: http://arxiv.org/abs/2309.11067
  • repo_url: None
  • paper_authors: Guillermo Terrén-Serrano, Michael Ludkovski
  • for: 本研究旨在为短期电网规划选择极端情况,以降低运营风险。
  • methods: 本研究使用统计函数深度指标来筛选极端情况,以确定最有可能导致网络运营风险的情况。
  • results: 实验结果表明,使用统计函数深度指标可以有效筛选出高风险场景,并可用于评估切负荷、运行成本、备用容量短缺和可变可再生能源削减等运行风险。
    Abstract We propose and analyze the application of statistical functional depth metrics for the selection of extreme scenarios in day-ahead grid planning. Our primary motivation is screening of probabilistic scenarios for realized load and renewable generation, in order to identify scenarios most relevant for operational risk mitigation. To handle the high-dimensionality of the scenarios across asset classes and intra-day periods, we employ functional measures of depth to sub-select outlying scenarios that are most likely to be the riskiest for the grid operation. We investigate a range of functional depth measures, as well as a range of operational risks, including load shedding, operational costs, reserves shortfall and variable renewable energy curtailment. The effectiveness of the proposed screening approach is demonstrated through a case study on the realistic Texas-7k grid.
    摘要 我们提出并分析了将统计函数深度指标用于日前电网规划中极端场景筛选的方法。我们的主要动机是对实际负荷和可再生能源出力的概率场景进行筛选,以识别与运行风险缓解最相关的场景。为了处理跨资产类别和日内时段的高维场景,我们利用函数深度度量来挑选最可能给电网运行带来风险的离群场景。我们考察了一系列函数深度度量,以及包括切负荷、运行成本、备用容量短缺和可变可再生能源削减在内的多种运行风险。通过在贴近实际的 Texas-7k 电网上的案例研究,我们验证了所提筛选方法的有效性。
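A sketch of screening with one common functional depth, the modified band depth (MBD, J=2): compute each scenario curve's depth within the ensemble and keep the least-deep (most outlying) curves as candidate extreme scenarios. The synthetic load ensemble is an assumption, and MBD is just one of the range of depth measures the paper investigates.
```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
T = np.linspace(0, 24, 48)                               # intra-day grid (hours)
base = 60 + 20 * np.sin(np.pi * (T - 6) / 12)
scenarios = base + rng.normal(0, 3, (50, T.size)).cumsum(axis=1) * 0.3

def modified_band_depth(curves):
    """MBD with J=2: average containment of each curve in the band of every pair."""
    n = curves.shape[0]
    depth = np.zeros(n)
    for i, j in combinations(range(n), 2):
        lo = np.minimum(curves[i], curves[j])
        hi = np.maximum(curves[i], curves[j])
        inside = (curves >= lo) & (curves <= hi)          # n x T booleans
        depth += inside.mean(axis=1)
    return depth / (n * (n - 1) / 2)

depth = modified_band_depth(scenarios)
extreme_idx = np.argsort(depth)[:5]                       # 5 most outlying scenarios
print("screened extreme scenarios:", extreme_idx)
```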

Containing Analog Data Deluge at Edge through Frequency-Domain Compression in Collaborative Compute-in-Memory Networks

  • paper_url: http://arxiv.org/abs/2309.11048
  • repo_url: None
  • paper_authors: Nastaran Darabi, Amit R. Trivedi
  • for: This paper aims to improve area efficiency in deep learning inference tasks for edge computing applications, specifically addressing the challenges of limited storage and computing resources in edge devices.
  • methods: The proposed method employs two key strategies: (1) Frequency domain learning using binarized Walsh-Hadamard Transforms, which reduces the necessary parameters for DNN and enables compute-in-SRAM, and (2) a memory-immersed collaborative digitization method among CiM arrays to reduce the area overheads of conventional ADCs.
  • results: The proposed method achieves significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC, as demonstrated using a 65 nm CMOS test chip. The results show that it is possible to process analog data more efficiently and selectively retain valuable data from sensors, alleviating the challenges posed by the analog data deluge.
    Abstract Edge computing is a promising solution for handling high-dimensional, multispectral analog data from sensors and IoT devices for applications such as autonomous drones. However, edge devices' limited storage and computing resources make it challenging to perform complex predictive modeling at the edge. Compute-in-memory (CiM) has emerged as a principal paradigm to minimize energy for deep learning-based inference at the edge. Nevertheless, integrating storage and processing complicates memory cells and/or memory peripherals, essentially trading off area efficiency for energy efficiency. This paper proposes a novel solution to improve area efficiency in deep learning inference tasks. The proposed method employs two key strategies. Firstly, a Frequency domain learning approach uses binarized Walsh-Hadamard Transforms, reducing the necessary parameters for DNN (by 87% in MobileNetV2) and enabling compute-in-SRAM, which better utilizes parallelism during inference. Secondly, a memory-immersed collaborative digitization method is described among CiM arrays to reduce the area overheads of conventional ADCs. This facilitates more CiM arrays in limited footprint designs, leading to better parallelism and reduced external memory accesses. Different networking configurations are explored, where Flash, SA, and their hybrid digitization steps can be implemented using the memory-immersed scheme. The results are demonstrated using a 65 nm CMOS test chip, exhibiting significant area and energy savings compared to a 40 nm-node 5-bit SAR ADC and 5-bit Flash ADC. By processing analog data more efficiently, it is possible to selectively retain valuable data from sensors and alleviate the challenges posed by the analog data deluge.
    摘要 边缘计算是一种有前景的解决方案,可用于处理来自传感器和物联网设备的高维、多光谱模拟数据,例如自主无人机等应用。然而,边缘设备的存储和计算资源有限,难以在边缘执行复杂的预测建模。存内计算(CiM)已成为降低边缘深度学习推理能耗的主要范式。然而,将存储与处理集成会使存储单元和/或外围电路变得复杂,实质上是以面积效率换取能效。本文提出了一种提升深度学习推理面积效率的新方案,采用两项关键策略:其一,采用基于二值化沃尔什-哈达玛变换的频域学习方法,大幅减少 DNN 所需参数(MobileNetV2 中减少 87%),并支持在 SRAM 内进行计算,从而在推理时更好地利用并行性;其二,提出一种在 CiM 阵列之间进行存内协同数字化的方法,以降低传统 ADC 的面积开销。这使得在有限面积的设计中可以容纳更多 CiM 阵列,带来更好的并行性并减少外部存储访问。我们还探讨了不同的网络配置,其中 Flash、SA 及其混合数字化步骤均可通过该存内方案实现。结果在一块 65 nm CMOS 测试芯片上得到验证,相比 40 nm 节点的 5 位 SAR ADC 和 5 位 Flash ADC,展示了显著的面积与能耗节省。通过更高效地处理模拟数据,可以有选择地保留传感器中有价值的数据,从而缓解模拟数据洪流带来的挑战。
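For the frequency-domain ingredient, here is a minimal fast Walsh-Hadamard transform (WHT): a butterfly computes Hx in O(n log n) using only additions and subtractions, which is what makes it attractive for compute-in-SRAM. Keeping a subset of coefficients below stands in for the parameter reduction; the paper's binarized-WHT layer design and hardware mapping are not reproduced here.
```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (length must be a power of two)."""
    x = x.astype(float).copy()
    h = 1
    while h < len(x):
        for i in range(0, len(x), h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

rng = np.random.default_rng(6)
x = rng.normal(size=64)
X = fwht(x)

# keep only the largest-magnitude frequency components (illustrative compression)
k = 8
mask = np.zeros_like(X)
top = np.argsort(np.abs(X))[-k:]
mask[top] = X[top]
x_rec = fwht(mask) / len(x)                 # H is self-inverse up to a factor of n
print("relative reconstruction error:", np.linalg.norm(x - x_rec) / np.linalg.norm(x))
```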

A Region-Shrinking-Based Acceleration for Classification-Based Derivative-Free Optimization

  • paper_url: http://arxiv.org/abs/2309.11036
  • repo_url: None
  • paper_authors: Tianyi Han, Jingya Li, Zhipeng Guo, Yuan Jin
  • for: 这篇论文主要关注科学与工程设计优化问题中的无导数(derivative-free)优化算法框架。
  • methods: 本文研究基于分类的无导数优化算法,引入了"假设-目标打散率"(hypothesis-target shattering rate)这一概念,重新审视了此类算法的计算复杂度上界;在此基础上提出了新算法 "RACE-CARS",它相比 "SRACOS" 增加了一个随机区域收缩步骤。
  • results: 实验表明,"RACE-CARS" 比 "SRACOS" 更快,并在合成函数的黑盒优化以及"语言模型即服务"的黑盒调参任务上得到了实证验证。此外,文章还对引入的超参数进行了消融实验,揭示了 "RACE-CARS" 的机制,并给出了经验性的超参数调节指导。
    Abstract Derivative-free optimization algorithms play an important role in scientific and engineering design optimization problems, especially when derivative information is not accessible. In this paper, we study the framework of classification-based derivative-free optimization algorithms. By introducing a concept called hypothesis-target shattering rate, we revisit the computational complexity upper bound of this type of algorithms. Inspired by the revisited upper bound, we propose an algorithm named "RACE-CARS", which adds a random region-shrinking step compared with "SRACOS" (Hu et al., 2017).. We further establish a theorem showing the acceleration of region-shrinking. Experiments on the synthetic functions as well as black-box tuning for language-model-as-a-service demonstrate empirically the efficiency of "RACE-CARS". An ablation experiment on the introduced hyperparameters is also conducted, revealing the mechanism of "RACE-CARS" and putting forward an empirical hyperparameter-tuning guidance.
    摘要 无导数优化算法在科学与工程设计优化问题中扮演着重要角色,尤其是在无法获得导数信息的情况下。本文研究基于分类的无导数优化算法框架。通过引入"假设-目标打散率"这一概念,我们重新审视了此类算法的计算复杂度上界。受此启发,我们提出了名为 "RACE-CARS" 的算法,它在 "SRACOS"(Hu et al., 2017)的基础上增加了一个随机区域收缩步骤,并进一步给出了刻画区域收缩加速效果的定理。在合成函数以及"语言模型即服务"黑盒调参上的实验从经验上证明了 "RACE-CARS" 的高效性。我们还对引入的超参数进行了消融实验,揭示了 "RACE-CARS" 的机制,并给出了经验性的超参数调节指导。

The Topology and Geometry of Neural Representations

  • paper_url: http://arxiv.org/abs/2309.11028
  • repo_url: https://github.com/neurreps/awesome-neural-geometry
  • paper_authors: Baihan Lin, Nikolaus Kriegeskorte
  • for: 本研究旨在刻画大脑对知觉与认知内容的表征,并在对噪声和个体差异保持稳健的前提下区分不同的功能脑区。
  • methods: 研究提出了拓扑表征相似性分析(tRSA),它是表征相似性分析(RSA)的扩展,利用一族几何-拓扑概括统计量在弱化几何信息的同时刻画大脑表征的拓扑结构。
  • results: 研究发现,这类新统计量对噪声和个体间差异具有稳健性,同时对不同神经网络层和脑区各自独特的表征特征保持出色的敏感性。
    Abstract A central question for neuroscience is how to characterize brain representations of perceptual and cognitive content. An ideal characterization should distinguish different functional regions with robustness to noise and idiosyncrasies of individual brains that do not correspond to computational differences. Previous studies have characterized brain representations by their representational geometry, which is defined by the representational dissimilarity matrix (RDM), a summary statistic that abstracts from the roles of individual neurons (or responses channels) and characterizes the discriminability of stimuli. Here we explore a further step of abstraction: from the geometry to the topology of brain representations. We propose topological representational similarity analysis (tRSA), an extension of representational similarity analysis (RSA) that uses a family of geo-topological summary statistics that generalizes the RDM to characterize the topology while de-emphasizing the geometry. We evaluate this new family of statistics in terms of the sensitivity and specificity for model selection using both simulations and functional MRI (fMRI) data. In the simulations, the ground truth is a data-generating layer representation in a neural network model and the models are the same and other layers in different model instances (trained from different random seeds). In fMRI, the ground truth is a visual area and the models are the same and other areas measured in different subjects. Results show that topology-sensitive characterizations of population codes are robust to noise and interindividual variability and maintain excellent sensitivity to the unique representational signatures of different neural network layers and brain regions.
    摘要 神经科学的一个核心问题是如何刻画大脑对知觉与认知内容的表征。理想的刻画方式应当能够区分不同的功能脑区,并对噪声以及与计算差异无关的个体大脑特异性保持稳健。以往的研究通过表征几何来刻画大脑表征,其定义为表征差异矩阵(RDM)——一种从单个神经元(或响应通道)的具体作用中抽象出来、刻画刺激可分辨性的概括统计量。在此我们探索更进一步的抽象:从大脑表征的几何走向其拓扑。我们提出拓扑表征相似性分析(tRSA),它是表征相似性分析(RSA)的扩展,利用一族推广了 RDM 的几何-拓扑概括统计量,在弱化几何信息的同时刻画拓扑结构。我们通过仿真和功能磁共振成像(fMRI)数据,从模型选择的敏感性和特异性两方面评估这一新的统计量族。在仿真中,真实情况是神经网络模型中生成数据的某一层表征,候选模型则是不同模型实例(以不同随机种子训练)中的同一层和其他层;在 fMRI 中,真实情况是某个视觉脑区,候选模型则是在不同被试中测得的同一脑区和其他脑区。结果表明,对群体编码进行拓扑敏感的刻画既能抵抗噪声和个体间差异,又能对不同神经网络层和脑区独特的表征特征保持出色的敏感性。
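To illustrate the flavour of a geo-topological summary statistic, the sketch below compares two representations after transforming their RDMs so that very small and very large dissimilarities are flattened, de-emphasizing geometry while keeping which conditions are close. The specific bounds and the Spearman comparison are illustrative choices within the family the paper explores, not tRSA's exact statistics.
```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(7)
stimuli = rng.normal(size=(30, 5))                       # 30 conditions, latent features
rep_a = stimuli @ rng.normal(size=(5, 40))               # "region A" response patterns
rep_b = np.tanh(stimuli @ rng.normal(size=(5, 60)))      # "region B" response patterns

def geo_topo_rdm(responses, lo_q=0.2, hi_q=0.8):
    d = pdist(responses, metric="correlation")           # condition-by-condition RDM (condensed)
    lo, hi = np.quantile(d, [lo_q, hi_q])
    return np.clip((d - lo) / (hi - lo), 0.0, 1.0)       # flatten the tails, keep mid-range order

rho_geo = spearmanr(pdist(rep_a, "correlation"), pdist(rep_b, "correlation"))[0]
rho_topo = spearmanr(geo_topo_rdm(rep_a), geo_topo_rdm(rep_b))[0]
print(f"RSA (geometry): {rho_geo:.2f}   tRSA-style (topology-weighted): {rho_topo:.2f}")
```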

Information Leakage from Data Updates in Machine Learning Models

  • paper_url: http://arxiv.org/abs/2309.11022
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Tian Hui, Farhad Farokhi, Olga Ohrimenko
  • for: 本研究考虑机器学习模型在更新后的数据集上重新训练(以纳入最新信息或反映分布变化)的场景,探讨能否由此推断训练数据中的更新信息。
  • methods: 我们提出了基于两个模型快照之间预测置信度差异的攻击方法,并在两个公开数据集上使用多层感知机和逻辑回归模型进行评估。
  • results: 我们发现,相较于只能访问更新后的模型,同时拥有两个模型快照会导致更多的信息泄露;具有罕见属性取值的数据记录更容易受到攻击;而重复的更改会留下更大的痕迹,从而带来更大的泄露。
    Abstract In this paper we consider the setting where machine learning models are retrained on updated datasets in order to incorporate the most up-to-date information or reflect distribution shifts. We investigate whether one can infer information about these updates in the training data (e.g., changes to attribute values of records). Here, the adversary has access to snapshots of the machine learning model before and after the change in the dataset occurs. Contrary to the existing literature, we assume that an attribute of a single or multiple training data points are changed rather than entire data records are removed or added. We propose attacks based on the difference in the prediction confidence of the original model and the updated model. We evaluate our attack methods on two public datasets along with multi-layer perceptron and logistic regression models. We validate that two snapshots of the model can result in higher information leakage in comparison to having access to only the updated model. Moreover, we observe that data records with rare values are more vulnerable to attacks, which points to the disparate vulnerability of privacy attacks in the update setting. When multiple records with the same original attribute value are updated to the same new value (i.e., repeated changes), the attacker is more likely to correctly guess the updated values since repeated changes leave a larger footprint on the trained model. These observations point to vulnerability of machine learning models to attribute inference attacks in the update setting.
    摘要 在这篇论文中,我们考虑机器学习模型在更新后的数据集上重新训练,以纳入最新信息或反映分布变化的场景,并研究能否由此推断训练数据中的更新信息(例如记录属性值的更改)。在该设定下,攻击者可以访问数据集更改前后的两个模型快照。与已有文献不同,我们假设是单条或多条训练记录的某个属性发生了变化,而不是整条数据记录被删除或添加。我们提出了基于原始模型与更新模型预测置信度差异的攻击方法,并在两个公开数据集以及多层感知机和逻辑回归模型上进行了评估。我们验证了:相较于只能访问更新后的模型,拥有两个模型快照会导致更高的信息泄露;此外,属性取值罕见的数据记录更容易受到攻击,这表明在更新场景下隐私攻击的脆弱性并不均衡。当多条记录的同一原始属性值被更新为相同的新值(即重复更改)时,攻击者更有可能正确猜出更新后的值,因为重复更改会在训练得到的模型上留下更大的痕迹。这些观察表明,机器学习模型在更新场景下易受属性推断攻击。
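A hedged sketch of snapshot-difference attribute inference: one record's attribute is changed and the model retrained; an adversary who holds the record's original version (and, as assumed here, its label) probes each candidate value and picks the one where the updated snapshot gains the most confidence over the old one. The data, the random-forest model, and these threat-model details are illustrative stand-ins for the paper's setup.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(8)
n, n_values = 80, 4
X = np.column_stack([rng.integers(0, n_values, n), rng.normal(size=(n, 3))]).astype(float)
y = ((X[:, 0] > 1.5).astype(float) + X[:, 1] + rng.normal(0, 0.3, n) > 0.8).astype(int)

model_before = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

target = 7                                             # record whose attribute gets updated
true_new_value = (int(X[target, 0]) + 2) % n_values    # guaranteed to differ from the old value
X_new = X.copy()
X_new[target, 0] = true_new_value
model_after = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_new, y)

# adversary: try each candidate value for attribute 0 of the original record
gains = []
for v in range(n_values):
    probe = X[target].copy()
    probe[0] = v
    p_after = model_after.predict_proba([probe])[0, y[target]]
    p_before = model_before.predict_proba([probe])[0, y[target]]
    gains.append(p_after - p_before)                   # confidence gained by the updated snapshot
print("guessed new value:", int(np.argmax(gains)), "| true new value:", true_new_value)
```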

3D-U-SAM Network For Few-shot Tooth Segmentation in CBCT Images

  • paper_url: http://arxiv.org/abs/2309.11015
  • repo_url: None
  • paper_authors: Yifu Zhang, Zuozhu Liu, Yang Feng, Renjing Xu
  • for: 针对3D牙科图像分割任务中样本数量过少的问题,提出一种基于SAM预训练网络的3D-U-SAM网络。
  • methods: 为了解决在3D数据集上使用2D预训练权重的问题,采用了一种卷积近似方法;为了保留更多细节,参照U-Net设计了跳跃连接,以融合各层特征。
  • results: 消融实验、对比实验和样本量实验表明,所提方法能够更好地完成3D牙科图像分割任务。
    Abstract Accurate representation of tooth position is extremely important in treatment. 3D dental image segmentation is a widely used method, however labelled 3D dental datasets are a scarce resource, leading to the problem of small samples that this task faces in many cases. To this end, we address this problem with a pretrained SAM and propose a novel 3D-U-SAM network for 3D dental image segmentation. Specifically, in order to solve the problem of using 2D pre-trained weights on 3D datasets, we adopted a convolution approximation method; in order to retain more details, we designed skip connections to fuse features at all levels with reference to U-Net. The effectiveness of the proposed method is demonstrated in ablation experiments, comparison experiments, and sample size experiments.
    摘要 在治疗中,精确表示牙齿位置极为重要。3D牙科图像分割是一种广泛使用的方法,但带标注的3D牙科数据集十分稀缺,导致该任务在很多情况下面临小样本问题。为此,我们借助预训练的SAM来解决这一问题,并提出了一种用于3D牙科图像分割的新型3D-U-SAM网络。具体而言,为了解决在3D数据集上使用2D预训练权重的问题,我们采用了一种卷积近似方法;为了保留更多细节,我们参照U-Net设计了跳跃连接,以融合各层特征。所提方法的有效性在消融实验、对比实验和样本量实验中得到了验证。
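One common way to reuse 2-D pretrained convolution weights in a 3-D network is "weight inflation": replicate the 2-D kernel along the depth axis and rescale so the response to a depth-constant input is preserved. The sketch below shows this generic approximation in PyTorch; the paper's exact convolution approximation may differ.
```python
import torch
import torch.nn as nn

conv2d = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
# pretend conv2d.weight holds pretrained 2-D weights of shape (8, 1, 3, 3)

depth = 3
conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(depth, 3, 3),
                   padding=(depth // 2, 1, 1))
with torch.no_grad():
    inflated = conv2d.weight.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
    conv3d.weight.copy_(inflated)                  # (8, 1, 3, 3, 3)
    conv3d.bias.copy_(conv2d.bias)

# sanity check: a volume that is constant along depth gives (away from the
# depth borders) the same response as the original 2-D layer on one slice
slice2d = torch.randn(1, 1, 32, 32)
vol3d = slice2d.unsqueeze(2).repeat(1, 1, 8, 1, 1)
out2d = conv2d(slice2d)
out3d = conv3d(vol3d)
print(torch.allclose(out2d, out3d[:, :, 4], atol=1e-5))
```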

It’s Simplex! Disaggregating Measures to Improve Certified Robustness

  • paper_url: http://arxiv.org/abs/2309.11005
  • repo_url: None
  • paper_authors: Andrew C. Cullen, Paul Montague, Shijie Liu, Sarah M. Erfani, Benjamin I. P. Rubinstein
  • for: 经验性防御在对抗攻击面前往往十分脆弱,认证鲁棒性通过为模型预测提供在一定攻击规模内类别不变的保证来规避这一问题;本研究旨在改进对这类认证机制的分析与评估。
  • methods: 本研究通过考察被认证模型的潜在输出空间,提出了两种改进认证机制分析的途径,分别给出与数据集无关和依赖于数据集的认证性能度量,并由此引出有望将可认证半径提高到现有最优水平两倍以上的新认证方法。
  • results: 实验证明,新的认证方法在噪声尺度 $\sigma = 1$ 时可以多认证 9% 的样本,且预测任务越困难,相对提升越大。
    Abstract Certified robustness circumvents the fragility of defences against adversarial attacks, by endowing model predictions with guarantees of class invariance for attacks up to a calculated size. While there is value in these certifications, the techniques through which we assess their performance do not present a proper accounting of their strengths and weaknesses, as their analysis has eschewed consideration of performance over individual samples in favour of aggregated measures. By considering the potential output space of certified models, this work presents two distinct approaches to improve the analysis of certification mechanisms, that allow for both dataset-independent and dataset-dependent measures of certification performance. Embracing such a perspective uncovers new certification approaches, which have the potential to more than double the achievable radius of certification, relative to current state-of-the-art. Empirical evaluation verifies that our new approach can certify $9\%$ more samples at noise scale $\sigma = 1$, with greater relative improvements observed as the difficulty of the predictive task increases.
    摘要 认证鲁棒性通过为模型预测提供在一定计算得到的攻击规模内类别不变的保证,规避了对抗攻击防御手段的脆弱性。尽管这类认证很有价值,但我们用来评估其性能的技术并未恰当地呈现其优缺点:已有分析只关注聚合指标,而忽略了对单个样本上表现的考量。本工作通过考察被认证模型的潜在输出空间,提出了两种改进认证机制分析的不同途径,使得认证性能既可以用与数据集无关的度量、也可以用依赖于数据集的度量来刻画。采用这一视角还引出了新的认证方法,其可认证半径有望达到当前最优方法的两倍以上。实证评估验证了我们的新方法在噪声尺度 $\sigma = 1$ 时可以多认证 9% 的样本,且预测任务越困难,相对提升越大。
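For context, here is the baseline style of certification such work builds on: the randomized-smoothing certified radius $R = \tfrac{\sigma}{2}\bigl(\Phi^{-1}(p_A) - \Phi^{-1}(p_B)\bigr)$ computed from Monte Carlo class counts. The paper's improved, simplex-aware bounds (and the usual confidence correction on the probabilities) are not reproduced in this sketch.
```python
import numpy as np
from scipy.stats import norm

def certified_radius(class_counts, sigma):
    counts = np.asarray(class_counts, dtype=float)
    probs = counts / counts.sum()
    order = np.argsort(probs)[::-1]
    p_a, p_b = probs[order[0]], probs[order[1]]
    if p_a <= p_b:
        return order[0], 0.0                       # abstain: no certifiable radius
    radius = sigma / 2 * (norm.ppf(p_a) - norm.ppf(p_b))
    return order[0], radius

# e.g. 10k noisy samples at sigma = 1.0, most voting for class 2
top_class, radius = certified_radius([300, 450, 9250], sigma=1.0)
print(f"predicted class {top_class}, certified l2 radius {radius:.3f}")
```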

Towards Data-centric Graph Machine Learning: Review and Outlook

  • paper_url: http://arxiv.org/abs/2309.10979
  • repo_url: None
  • paper_authors: Xin Zheng, Yixin Liu, Zhifeng Bao, Meng Fang, Xia Hu, Alan Wee-Chung Liew, Shirui Pan
  • for: 这篇论文主要关注数据驱动AI的发展,尤其是Graph数据结构的应用。
  • methods: 论文提出了一种系统化框架,名为Data-centric Graph Machine Learning(DC-GML),该框架包括Graph数据生命周期中的所有阶段,包括数据收集、探索、改进、利用和维护。
  • results: 论文提供了一份完整的taxonomy,用于回答三个关键的Graph数据中心问题:1)如何提高Graph数据的可用性和质量;2)如何从限量可用和低质量的Graph数据中学习;3)如何建立基于Graph数据的Machine Learning操作系统。
    Abstract Data-centric AI, with its primary focus on the collection, management, and utilization of data to drive AI models and applications, has attracted increasing attention in recent years. In this article, we conduct an in-depth and comprehensive review, offering a forward-looking outlook on the current efforts in data-centric AI pertaining to graph data-the fundamental data structure for representing and capturing intricate dependencies among massive and diverse real-life entities. We introduce a systematic framework, Data-centric Graph Machine Learning (DC-GML), that encompasses all stages of the graph data lifecycle, including graph data collection, exploration, improvement, exploitation, and maintenance. A thorough taxonomy of each stage is presented to answer three critical graph-centric questions: (1) how to enhance graph data availability and quality; (2) how to learn from graph data with limited-availability and low-quality; (3) how to build graph MLOps systems from the graph data-centric view. Lastly, we pinpoint the future prospects of the DC-GML domain, providing insights to navigate its advancements and applications.
    摘要 "数据驱动AI"以数据的收集、管理和利用来驱动AI模型与应用为核心,近年来受到越来越多的关注。在本文中,我们对面向图数据的数据驱动AI研究进行了深入而全面的综述,并给出前瞻性的展望;图数据是表示和刻画海量、多样的现实实体之间复杂依赖关系的基本数据结构。我们提出了一个贯穿图数据生命周期全部阶段的系统框架——数据驱动图机器学习(DC-GML),涵盖图数据的收集、探索、改进、利用和维护。我们为每个阶段给出了详尽的分类体系,以回答三个关键的图数据核心问题:(1)如何提升图数据的可用性和质量;(2)如何在图数据可用性有限、质量不高的情况下进行学习;(3)如何从图数据中心的视角构建图 MLOps 系统。最后,我们指出了 DC-GML 领域的未来前景,为其发展和应用提供指引。

PAGER: A Framework for Failure Analysis of Deep Regression Models

  • paper_url: http://arxiv.org/abs/2309.10977
  • repo_url: None
  • paper_authors: Jayaraman J. Thiagarajan, Vivek Narayanaswamy, Puja Trivedi, Rushil Anirudh
  • for: 本文旨在提出一种检测深度回归模型预测错误的框架,以确保人工智能模型的安全部署。
  • methods: 本文使用了建立在深度模型中的稳定点的想法,并结合了知识 uncertainty 和非 conformity 分数,将样本分为不同的风险 régime。
  • results: 对于 synthetic 和实际 benchmark 进行了评估,结果显示了 PAGER 可以准确地检测出深度回归模型的预测错误,并且可以在不同的风险 régime 中分类样本。
    Abstract Safe deployment of AI models requires proactive detection of potential prediction failures to prevent costly errors. While failure detection in classification problems has received significant attention, characterizing failure modes in regression tasks is more complicated and less explored. Existing approaches rely on epistemic uncertainties or feature inconsistency with the training distribution to characterize model risk. However, we show that uncertainties are necessary but insufficient to accurately characterize failure, owing to the various sources of error. In this paper, we propose PAGER (Principled Analysis of Generalization Errors in Regressors), a framework to systematically detect and characterize failures in deep regression models. Built upon the recently proposed idea of anchoring in deep models, PAGER unifies both epistemic uncertainties and novel, complementary non-conformity scores to organize samples into different risk regimes, thereby providing a comprehensive analysis of model errors. Additionally, we introduce novel metrics for evaluating failure detectors in regression tasks. We demonstrate the effectiveness of PAGER on synthetic and real-world benchmarks. Our results highlight the capability of PAGER to identify regions of accurate generalization and detect failure cases in out-of-distribution and out-of-support scenarios.
    摘要 要安全部署人工智能模型,就需要主动检测潜在的预测失败,以避免代价高昂的错误。分类问题中的失败检测已受到广泛关注,但刻画回归任务中的失败模式更为复杂,研究也更少。现有方法依赖认知不确定性或特征与训练分布的不一致来刻画模型风险;然而我们发现,由于误差来源多样,不确定性虽是刻画失败的必要条件,却并不充分。本文提出 PAGER(Principled Analysis of Generalization Errors in Regressors),一个系统地检测并刻画深度回归模型失败的框架。PAGER 建立在最近提出的深度模型锚定(anchoring)思想之上,将认知不确定性与新颖且互补的非一致性分数统一起来,把样本组织到不同的风险等级中,从而对模型误差进行全面分析。此外,我们还提出了用于评估回归任务失败检测器的新指标。我们在合成与真实基准上验证了 PAGER 的有效性,结果表明 PAGER 能够识别泛化准确的区域,并在分布外和支撑外场景中检测出失败案例。
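A conceptual sketch of sorting regression test samples into risk regimes by combining an epistemic-uncertainty score with a complementary score, in the spirit of PAGER. A bootstrap ensemble stands in for the paper's anchored models, and the second score (distance to the training support) plus all thresholds and regime names are illustrative assumptions rather than PAGER's non-conformity scores.
```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(9)
f = lambda x: np.sin(3 * x) + 0.1 * x ** 2
X_train = rng.uniform(-2, 2, 300)
y_train = f(X_train) + rng.normal(0, 0.1, 300)
X_cal = rng.uniform(-2, 2, 100)                           # held-out, in-distribution calibration set
X_test = np.concatenate([rng.uniform(-2, 2, 50), rng.uniform(3, 4, 50)])   # in + out of support

ensemble = []
for s in range(5):                                        # bootstrap ensemble
    idx = rng.integers(0, len(X_train), len(X_train))
    ensemble.append(GradientBoostingRegressor(random_state=s).fit(X_train[idx, None], y_train[idx]))

def scores(X):
    preds = np.stack([m.predict(X[:, None]) for m in ensemble])
    uncertainty = preds.std(axis=0)                       # member disagreement (epistemic proxy)
    support_dist = np.abs(X[:, None] - X_train[None, :]).min(axis=1)
    return uncertainty, support_dist

u_cal, d_cal = scores(X_cal)
u_hi, d_hi = np.quantile(u_cal, 0.95), np.quantile(d_cal, 0.95)
u_test, d_test = scores(X_test)
regime = np.where(d_test > d_hi, "out of support",
                  np.where(u_test > u_hi, "moderate risk", "low risk"))
for name in ("low risk", "moderate risk", "out of support"):
    print(name, ":", int((regime == name).sum()), "samples")
```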

Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2309.10976
  • repo_url: None
  • paper_authors: Puja Trivedi, Mark Heimann, Rushil Anirudh, Danai Koutra, Jayaraman J. Thiagarajan
  • for: 这个论文的目的是提高图 neural network(GNN)的安全部署,特别是在分布shift时提供准确的信任度指标(CI)。
  • methods: 这篇论文使用了一种case study来研究GNN CI的准确性,并证明了增加表达能力或模型大小不总是能提高CI性能。而是使用epistemic uncertainty量化(UQ)方法来调整CI。提出了一种新的单模型UQ方法——G-$\Delta$UQ,它基于最近提出的随机中心框架,支持结构化数据和部分随机性。
  • results: 对于covariate、concept和图大小shift,G-$\Delta$UQ不仅在获得准确的CI方面表现出色,还在使用CI进行泛化差分预测和OOD检测方面表现更好于其他popular UQ方法。总的来说,这篇论文不仅介绍了一种新的GNN UQ方法,还提供了图 neural network在安全关键任务上的新的理解。
    Abstract Safe deployment of graph neural networks (GNNs) under distribution shift requires models to provide accurate confidence indicators (CI). However, while it is well-known in computer vision that CI quality diminishes under distribution shift, this behavior remains understudied for GNNs. Hence, we begin with a case study on CI calibration under controlled structural and feature distribution shifts and demonstrate that increased expressivity or model size do not always lead to improved CI performance. Consequently, we instead advocate for the use of epistemic uncertainty quantification (UQ) methods to modulate CIs. To this end, we propose G-$\Delta$UQ, a new single model UQ method that extends the recently proposed stochastic centering framework to support structured data and partial stochasticity. Evaluated across covariate, concept, and graph size shifts, G-$\Delta$UQ not only outperforms several popular UQ methods in obtaining calibrated CIs, but also outperforms alternatives when CIs are used for generalization gap prediction or OOD detection. Overall, our work not only introduces a new, flexible GNN UQ method, but also provides novel insights into GNN CIs on safety-critical tasks.
    摘要 安全部署图 neural network (GNN) 需要模型提供准确的信任指标 (CI)。然而,虽然在计算机视觉中已经证明了 CI 质量下降于分布转移,但这一点尚未得到对 GNN 的研究。因此,我们开始了一项案例研究,探讨了 CI 准确性下降的情况,并发现增加表达能力或模型大小不一定能提高 CI 性能。因此,我们建议使用 epistemic 不确定性量化 (UQ) 方法来调整 CIs。为此,我们提出了 G-ΔUQ,一种新的单模型 UQ 方法,扩展了最近提出的随机中心框架,以支持结构化数据和部分随机性。经过 covariate、概念和图大小转移的评估,G-ΔUQ 不仅在获得准确的 CIs 方面超过了许多流行的 UQ 方法,还在用 CIs 进行泛化差分预测或 OOD 探测时表现更好。总的来说,我们不仅提出了一种新的、灵活的 GNN UQ 方法,而且为安全关键任务提供了新的思路和发现。

SPFQ: A Stochastic Algorithm and Its Error Analysis for Neural Network Quantization

  • paper_url: http://arxiv.org/abs/2309.10975
  • repo_url: None
  • paper_authors: Jinjie Zhang, Rayan Saab
  • for: 本文旨在提出一种高效的神经网络量化方法,以减少过参数化神经网络中的冗余。
  • methods: 本文使用一种快速随机算法来量化已训练神经网络的权重。该方法将贪婪的路径跟踪机制与随机量化器相结合,其计算复杂度仅与网络中权重数量成线性关系,因此可以高效地量化大型神经网络。
  • results: 本文首次在无限字母表条件以及对权重和输入数据的最小假设下给出了全网络误差界。作为该结果的一个应用,本文证明:对具有高斯权重的多层网络进行量化时,相对平方量化误差随过参数化程度的增加呈线性衰减。此外,本文还证明,只需每个权重约 $\log\log N$ 比特($N$ 为单层中最大神经元数),即可达到与无限字母表情形相当的误差界。
    Abstract Quantization is a widely used compression method that effectively reduces redundancies in over-parameterized neural networks. However, existing quantization techniques for deep neural networks often lack a comprehensive error analysis due to the presence of non-convex loss functions and nonlinear activations. In this paper, we propose a fast stochastic algorithm for quantizing the weights of fully trained neural networks. Our approach leverages a greedy path-following mechanism in combination with a stochastic quantizer. Its computational complexity scales only linearly with the number of weights in the network, thereby enabling the efficient quantization of large networks. Importantly, we establish, for the first time, full-network error bounds, under an infinite alphabet condition and minimal assumptions on the weights and input data. As an application of this result, we prove that when quantizing a multi-layer network having Gaussian weights, the relative square quantization error exhibits a linear decay as the degree of over-parametrization increases. Furthermore, we demonstrate that it is possible to achieve error bounds equivalent to those obtained in the infinite alphabet case, using on the order of a mere $\log\log N$ bits per weight, where $N$ represents the largest number of neurons in a layer.
    摘要 量化是一种广泛使用的压缩方法,可以有效减少过参数化神经网络中的冗余。然而,由于存在非凸损失函数和非线性激活,现有的深度神经网络量化技术往往缺乏全面的误差分析。在本文中,我们提出一种快速随机算法,用于量化已训练完成的神经网络的权重。该方法将贪婪的路径跟踪机制与随机量化器相结合,其计算复杂度仅与网络中权重数量成线性关系,因此能够高效地量化大型网络。重要的是,我们首次在无限字母表条件以及对权重和输入数据的最小假设下建立了全网络误差界。作为该结果的一个应用,我们证明:对具有高斯权重的多层网络进行量化时,相对平方量化误差随过参数化程度的增加呈线性衰减。此外,我们还证明,只需每个权重约 $\log\log N$ 比特(其中 $N$ 表示单层中最大神经元数),即可达到与无限字母表情形等价的误差界。
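A simplified single-neuron sketch in the spirit of greedy path-following quantization with a stochastic rounding quantizer: weights are quantized one coordinate at a time while a running residual tracks the error of the quantized pre-activations on sample data. The alphabet, step size, and data are illustrative assumptions; the full method applies this layer by layer across the network and comes with the error analysis described above.
```python
import numpy as np

rng = np.random.default_rng(10)
m, d = 256, 64                       # samples, input dimension
X = rng.normal(size=(m, d))          # layer inputs observed on data
w = rng.normal(size=d)               # trained (unquantized) weights of one neuron

step = 0.05
levels = step * np.arange(-40, 41)   # finite quantization alphabet (assumed)

def stochastic_round(c):
    """Round c to a neighbouring level; unbiased (E[q] = c) inside the alphabet."""
    lo = levels[np.clip(np.searchsorted(levels, c) - 1, 0, len(levels) - 2)]
    hi = lo + step
    p_hi = np.clip((c - lo) / step, 0.0, 1.0)
    return hi if rng.random() < p_hi else lo

q = np.zeros(d)
u = np.zeros(m)                      # running residual of the pre-activations
for t in range(d):
    x_t = X[:, t]
    c_t = (u + w[t] * x_t) @ x_t / (x_t @ x_t)       # greedy target for this coordinate
    q[t] = stochastic_round(c_t)
    u += w[t] * x_t - q[t] * x_t                     # carry the error forward

rel_err = np.linalg.norm(X @ w - X @ q) / np.linalg.norm(X @ w)
print(f"relative pre-activation error after quantization: {rel_err:.3f}")
```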