cs.LG - 2023-11-19

Physics-Enhanced TinyML for Real-Time Detection of Ground Magnetic Anomalies

  • paper_url: http://arxiv.org/abs/2311.11452
  • repo_url: None
  • paper_authors: Talha Siddique, MD Shaad Mahmud
  • for: This paper aims to develop a physics-guided Tiny Machine Learning (TinyML) framework to improve the accuracy and reliability of space weather forecasting.
  • methods: The paper combines physics-based regularization with model compression in a TinyML framework to enable efficient on-device data processing and model prediction.
  • results: The results show that the proposed framework improves forecast accuracy and reliability; a comparison with its traditional counterpart highlights its promise for real-time space weather forecasting.
    Abstract Space weather phenomena like geomagnetic disturbances (GMDs) and geomagnetically induced currents (GICs) pose significant risks to critical technological infrastructure. While traditional predictive models, grounded in simulation, hold theoretical robustness, they grapple with challenges, notably the assimilation of imprecise data and extensive computational complexities. In recent years, Tiny Machine Learning (TinyML) has been adopted to develop Machine Learning (ML)-enabled magnetometer systems for predicting real-time terrestrial magnetic perturbations as a proxy measure for GIC. While TinyML offers efficient, real-time data processing, its intrinsic limitations prevent the utilization of robust methods with high computational needs. This paper developed a physics-guided TinyML framework to address the above challenges. This framework integrates physics-based regularization at the stages of model training and compression, thereby augmenting the reliability of predictions. The developed pruning scheme within the framework harnesses the inherent physical characteristics of the domain, striking a balance between model size and robustness. The study presents empirical results, drawing a comprehensive comparison between the accuracy and reliability of the developed framework and its traditional counterpart. Such a comparative analysis underscores the prospective applicability of the developed framework in conceptualizing robust, ML-enabled magnetometer systems for real-time space weather forecasting.
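The abstract mentions physics-based regularization at the training and compression stages without giving the specific constraint. As a rough, generic illustration of the training-time idea only, where the penalty form, the `physics_residual` function, and the weight `lam` are placeholders of mine rather than the authors' formulation:

```python
import torch

def physics_guided_loss(pred, target, physics_residual, lam=0.1):
    """Generic template for a physics-guided training loss (illustrative only).

    physics_residual: a differentiable function measuring how strongly the
    prediction violates a known physical relation; its exact form for the
    magnetometer setting is not specified in the abstract and is assumed here.
    """
    data_loss = torch.nn.functional.mse_loss(pred, target)
    return data_loss + lam * physics_residual(pred).pow(2).mean()
```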

Weight Norm Control

  • paper_url: http://arxiv.org/abs/2311.11446
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Ilya Loshchilov
  • for: This work argues that setting the target weight norm to zero is not necessarily optimal, and that other target norm values are worth considering for improving model performance.
  • methods: The paper frames decoupled weight decay regularization as a special case of weight norm control and shows how the more general scheme combines with optimizers such as Adam (yielding AdamWN).
  • results: Experiments indicate that weight norm control can improve model performance; any training run in which AdamW reaches a particular weight norm can be challenged by AdamWN scheduled to reach a comparable norm.
    Abstract We note that decoupled weight decay regularization is a particular case of weight norm control where the target norm of weights is set to 0. Any optimization method (e.g., Adam) which uses decoupled weight decay regularization (respectively, AdamW) can be viewed as a particular case of a more general algorithm with weight norm control (respectively, AdamWN). We argue that setting the target norm of weights to 0 can be suboptimal and other target norm values can be considered. For instance, any training run where AdamW achieves a particular norm of weights can be challenged by AdamWN scheduled to achieve a comparable norm of weights. We discuss various implications of introducing weight norm control instead of weight decay.
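As a minimal sketch of the relationship the abstract states: decoupled weight decay multiplies the weights by (1 - lr * k) each step, which corresponds to pulling the weight norm toward a target of zero. The update below generalizes that to an arbitrary target norm; it is one plausible reading of weight norm control, not the authors' AdamWN implementation, and any scheduling of `target_norm` is left out.

```python
import torch

def decoupled_norm_control_step(param, lr, k, target_norm=0.0):
    """One decoupled weight-norm-control step (illustrative sketch).

    With target_norm == 0 this reduces to decoupled weight decay as in AdamW,
    w <- (1 - lr * k) * w; with target_norm > 0 the weights are instead nudged
    toward the sphere of the chosen norm. `k` plays the role of the decay factor.
    """
    with torch.no_grad():
        current = param.norm()
        if current > 0:
            param.mul_(1.0 - lr * k * (1.0 - target_norm / current))
```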

Duality of Bures and Shape Distances with Implications for Comparing Neural Representations

  • paper_url: http://arxiv.org/abs/2311.11436
  • repo_url: None
  • paper_authors: Sarah E. Harvey, Brett W. Larsen, Alex H. Williams
  • for: This paper aims to unify two broad categories of (dis)similarity measures between neural network representations and to explore the relationship between them.
  • methods: The paper relates the Riemannian shape distance (category 1) to the normalized Bures similarity (NBS, category 2) through the cosine function, yielding new interpretations and a comparison with the CKA similarity measure.
  • results: The study shows that the cosine of the Riemannian shape distance equals the normalized Bures similarity; this identity leads to new interpretations of both measures and to contrasts with CKA.
    Abstract A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape. Most of these measures fall into one of two categories. First, measures such as linear regression, canonical correlations analysis (CCA), and shape distances, all learn explicit mappings between neural units to quantify similarity while accounting for expected invariances. Second, measures such as representational similarity analysis (RSA), centered kernel alignment (CKA), and normalized Bures similarity (NBS) all quantify similarity in summary statistics, such as stimulus-by-stimulus kernel matrices, which are already invariant to expected symmetries. Here, we take steps towards unifying these two broad categories of methods by observing that the cosine of the Riemannian shape distance (from category 1) is equal to NBS (from category 2). We explore how this connection leads to new interpretations of shape distances and NBS, and draw contrasts of these measures with CKA, a popular similarity measure in the deep learning literature.
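For concreteness, a small numerical sketch of the normalized Bures similarity between two stimulus-by-stimulus kernel matrices, using the standard definition NBS(K_X, K_Y) = tr((K_X^{1/2} K_Y K_X^{1/2})^{1/2}) / sqrt(tr(K_X) tr(K_Y)); per the abstract, this quantity equals the cosine of the Riemannian shape distance. The example data and function name are mine.

```python
import numpy as np
from scipy.linalg import sqrtm

def normalized_bures_similarity(Kx, Ky):
    """NBS(Kx, Ky) = tr((Kx^{1/2} Ky Kx^{1/2})^{1/2}) / sqrt(tr(Kx) tr(Ky))."""
    root_Kx = np.real(sqrtm(Kx))
    fidelity = np.real(np.trace(sqrtm(root_Kx @ Ky @ root_Kx)))
    return fidelity / np.sqrt(np.trace(Kx) * np.trace(Ky))

# Kernel matrices built from two sets of (stimulus x neuron) responses
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((50, 30)), rng.standard_normal((50, 40))
print(normalized_bures_similarity(X @ X.T, Y @ Y.T))
```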

Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training

  • paper_url: http://arxiv.org/abs/2311.11429
  • repo_url: None
  • paper_authors: Lianke Qin, Saayan Mitra, Zhao Song, Yuanyuan Yang, Tianyi Zhou
  • for: This paper addresses a heavy inner product identification problem that generalizes the Light Bulb problem (\cite{prr89}): given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $|A|=|B|=n$, if there are exactly $k$ pairs $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$, where $\rho \in (0,1)$ is the threshold, the goal is to identify those $k$ heavy inner product pairs.
  • methods: The paper provides an algorithm, based on matrix multiplication and random sampling techniques, that runs in $O(n^{2 \omega / 3+ o(1)})$ time and finds, with high probability, the $k$ pairs whose inner product exceeds $\rho \cdot d$, where $\omega$ is the current matrix multiplication exponent.
  • results: By solving this problem, the algorithm can speed up the training of neural networks with ReLU activation functions.
    Abstract In this paper, we consider a heavy inner product identification problem, which generalizes the Light Bulb problem~(\cite{prr89}): Given two sets $A \subset \{-1,+1\}^d$ and $B \subset \{-1,+1\}^d$ with $|A|=|B| = n$, if there are exact $k$ pairs whose inner product passes a certain threshold, i.e., $\{(a_1, b_1), \cdots, (a_k, b_k)\} \subset A \times B$ such that $\forall i \in [k], \langle a_i,b_i \rangle \geq \rho \cdot d$, for a threshold $\rho \in (0,1)$, the goal is to identify those $k$ heavy inner products. We provide an algorithm that runs in $O(n^{2 \omega / 3+ o(1)})$ time to find the $k$ inner product pairs that surpass $\rho \cdot d$ threshold with high probability, where $\omega$ is the current matrix multiplication exponent. By solving this problem, our method speed up the training of neural networks with ReLU activation function.
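To make the problem statement concrete, here is the naive O(n^2 d) baseline that simply checks every pair against the threshold; the paper's contribution is to find the same heavy pairs in subquadratic O(n^{2ω/3 + o(1)}) time, which is not reproduced here.

```python
import numpy as np

def heavy_pairs_bruteforce(A, B, rho):
    """Return all (i, j) with <A[i], B[j]> >= rho * d -- the naive O(n^2 d) baseline."""
    d = A.shape[1]
    G = A @ B.T                      # all n x n pairwise inner products
    return list(zip(*np.nonzero(G >= rho * d)))

rng = np.random.default_rng(1)
n, d, rho = 200, 64, 0.75
A = rng.choice([-1, 1], size=(n, d))
B = rng.choice([-1, 1], size=(n, d))
B[7] = A[3]                          # plant one heavy pair
print(heavy_pairs_bruteforce(A, B, rho))   # -> [(3, 7)] with high probability
```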

Tensor-Aware Energy Accounting

  • paper_url: http://arxiv.org/abs/2311.11424
  • repo_url: https://github.com/project-smaragdine/smaragdine
  • paper_authors: Timur Babakol, Yu David Liu
  • for: This paper introduces Smaragdine, a novel energy accounting system for tensor-based deep learning (DL) programs implemented with TensorFlow, to improve the energy efficiency of DL applications.
  • methods: Smaragdine uses a novel white-box methodology of energy accounting that is aware of the internal structure of the DL program, allowing energy consumption to be broken down by units aligned with the logical hierarchical decomposition structure.
  • results: Applied to BERT, a widely used language model, Smaragdine identifies the highest energy/power-consuming components layer by layer and tensor by tensor. Two case studies further demonstrate its usefulness for downstream toolchain building: one compares the energy impact of hyperparameter tuning of BERT, and the other analyzes how the energy behavior evolves as BERT evolves to its next generation, ALBERT.
    Abstract With the rapid growth of Artificial Intelligence (AI) applications supported by deep learning (DL), the energy efficiency of these applications has an increasingly large impact on sustainability. We introduce Smaragdine, a new energy accounting system for tensor-based DL programs implemented with TensorFlow. At the heart of Smaragdine is a novel white-box methodology of energy accounting: Smaragdine is aware of the internal structure of the DL program, which we call tensor-aware energy accounting. With Smaragdine, the energy consumption of a DL program can be broken down into units aligned with its logical hierarchical decomposition structure. We apply Smaragdine for understanding the energy behavior of BERT, one of the most widely used language models. Layer-by-layer and tensor-by-tensor, Smaragdine is capable of identifying the highest energy/power-consuming components of BERT. Furthermore, we conduct two case studies on how Smaragdine supports downstream toolchain building, one on the comparative energy impact of hyperparameter tuning of BERT, the other on the energy behavior evolution when BERT evolves to its next generation, ALBERT.

Offline Reinforcement Learning for Wireless Network Optimization with Mixture Datasets

  • paper_url: http://arxiv.org/abs/2311.11423
  • repo_url: None
  • paper_authors: Kun Yang, Cong Shen, Jing Yang, Shu-ping Yeh, Jerry Sydir
  • for: The goal of this paper is to study the application of offline reinforcement learning (RL) to wireless radio resource management (RRM) problems in wireless communication systems.
  • methods: The paper evaluates several state-of-the-art offline RL algorithms, including behavior constrained Q-learning (BCQ), conservative Q-learning (CQL), and implicit Q-learning (IQL), on a specific RRM problem whose objective is to maximize a linear combination of sum and 5-percentile rates via user scheduling.
  • results: The paper finds that the performance of offline RL on the RRM problem depends critically on the behavior policy used for data collection, and proposes a novel offline RL solution that mixes heterogeneous datasets collected by different behavior policies; with a proper mixture, offline RL produces a near-optimal policy even when all involved behavior policies are highly suboptimal.
    Abstract The recent development of reinforcement learning (RL) has boosted the adoption of online RL for wireless radio resource management (RRM). However, online RL algorithms require direct interactions with the environment, which may be undesirable given the potential performance loss due to the unavoidable exploration in RL. In this work, we first investigate the use of \emph{offline} RL algorithms in solving the RRM problem. We evaluate several state-of-the-art offline RL algorithms, including behavior constrained Q-learning (BCQ), conservative Q-learning (CQL), and implicit Q-learning (IQL), for a specific RRM problem that aims at maximizing a linear combination {of sum and} 5-percentile rates via user scheduling. We observe that the performance of offline RL for the RRM problem depends critically on the behavior policy used for data collection, and further propose a novel offline RL solution that leverages heterogeneous datasets collected by different behavior policies. We show that with a proper mixture of the datasets, offline RL can produce a near-optimal RL policy even when all involved behavior policies are highly suboptimal.
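The key ingredient highlighted in the abstract is mixing datasets collected by different behavior policies. A minimal sketch of one way to do that, drawing training batches in fixed proportions from several offline datasets; the proportional sampling rule and array layout are assumptions, not the paper's exact procedure.

```python
import numpy as np

def sample_mixture_batch(datasets, weights, batch_size, rng=np.random.default_rng(0)):
    """Draw a batch from a weighted mixture of offline datasets.

    datasets: list of transition arrays, each collected by a different behavior
    policy; weights: mixture proportions summing to 1.  The resulting batch can
    be fed to any offline RL learner (e.g. BCQ, CQL, IQL).
    """
    counts = rng.multinomial(batch_size, weights)
    parts = [d[rng.integers(0, len(d), size=c)] for d, c in zip(datasets, counts)]
    return np.concatenate(parts, axis=0)

# e.g. two behavior-policy datasets of flattened (state, action, reward, next_state) rows
d1, d2 = np.random.randn(5000, 10), np.random.randn(2000, 10)
batch = sample_mixture_batch([d1, d2], weights=[0.7, 0.3], batch_size=256)
```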

Precision at the indistinguishability threshold: a method for evaluating classification algorithms

  • paper_url: http://arxiv.org/abs/2311.11422
  • repo_url: None
  • paper_authors: David J. T. Sumpter
  • for: This study addresses shortcomings of existing evaluation metrics for classification algorithms and proposes a new metric for measuring classifier performance.
  • methods: The study introduces a new metric, "precision at the indistinguishability threshold", which measures precision at the decision threshold where the positively labelled examples become indistinguishable from the true positives.
  • results: The study argues that the new metric characterizes classifier performance better than existing metrics such as AUC and the F1-score, avoiding pitfalls that arise from how those metrics handle the trade-off between precision and recall.
    Abstract There exist a wide range of single number metrics for assessing performance of classification algorithms, including AUC and the F1-score (Wikipedia lists 17 such metrics, with 27 different names). In this article, I propose a new metric to answer the following question: when an algorithm is tuned so that it can no longer distinguish labelled cats from real cats, how often does a randomly chosen image that has been labelled as containing a cat actually contain a cat? The steps to construct this metric are as follows. First, we set a threshold score such that when the algorithm is shown two randomly-chosen images -- one that has a score greater than the threshold (i.e. a picture labelled as containing a cat) and another from those pictures that really does contain a cat -- the probability that the image with the highest score is the one chosen from the set of real cat images is 50\%. At this decision threshold, the set of positively labelled images are indistinguishable from the set of images which are positive. Then, as a second step, we measure performance by asking how often a randomly chosen picture from those labelled as containing a cat actually contains a cat. This metric can be thought of as {\it precision at the indistinguishability threshold}. While this new metric doesn't address the tradeoff between precision and recall inherent to all such metrics, I do show why this method avoids pitfalls that can occur when using, for example AUC, and it is better motivated than, for example, the F1-score.
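The construction in the abstract is explicit enough to sketch: find the score threshold at which a randomly chosen positively-labelled item and a randomly chosen true positive are equally likely to outscore each other, then report the precision of the positively-labelled set at that threshold. The code below is a straightforward reading of that description, not the author's reference implementation.

```python
import numpy as np

def precision_at_indistinguishability(scores, labels, grid=200):
    """Estimate 'precision at the indistinguishability threshold' from scores and labels."""
    scores, labels = np.asarray(scores, dtype=float), np.asarray(labels, dtype=bool)
    pos_scores = scores[labels]

    def win_prob(t):
        flagged = scores[scores >= t]
        # P(random true positive outscores random positively-labelled item), ties split evenly
        wins = (pos_scores[:, None] > flagged[None, :]).mean()
        ties = (pos_scores[:, None] == flagged[None, :]).mean()
        return wins + 0.5 * ties

    candidates = np.quantile(scores, np.linspace(0.01, 0.99, grid))
    t = min(candidates, key=lambda c: abs(win_prob(c) - 0.5))  # indistinguishability point
    flagged = scores >= t
    return labels[flagged].mean(), t
```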

Large Pre-trained time series models for cross-domain Time series analysis tasks

  • paper_url: http://arxiv.org/abs/2311.11413
  • repo_url: None
  • paper_authors: Harshavardhan Kamarthi, B. Aditya Prakash
  • for: The goal of this paper is to pre-train a general time-series model that delivers stronger performance on time-series analysis tasks across different domains.
  • methods: The paper proposes a novel adaptive segmentation strategy that automatically identifies an optimal dataset-specific way of partitioning time series from different domains into segments, learned with a self-supervised loss during pre-training.
  • results: Experiments show that the adaptive segmentation strategy with self-supervised pre-training yields performance similar to or better than domain-specific state-of-the-art models on a wide range of time-series analysis tasks, while requiring less training time and data.
    Abstract Large pre-trained models have been instrumental in significant advancements in domains like language and vision making model training for individual downstream tasks more efficient as well as provide superior performance. However, tackling time-series analysis tasks usually involves designing and training a separate model from scratch leveraging training data and domain expertise specific to the task. We tackle a significant challenge for pre-training a general time-series model from multiple heterogeneous time-series dataset: providing semantically useful inputs to models for modeling time series of different dynamics from different domains. We observe that partitioning time-series into segments as inputs to sequential models produces semantically better inputs and propose a novel model LPTM that automatically identifies optimal dataset-specific segmentation strategy leveraging self-supervised learning loss during pre-training. LPTM provides performance similar to or better than domain-specific state-of-art model and is significantly more data and compute efficient taking up to 40% less data as well as 50% less training time to achieve state-of-art performance in a wide range of time-series analysis tasks from multiple disparate domain.

Negotiated Representations for Machine Learning Application

  • paper_url: http://arxiv.org/abs/2311.11410
  • repo_url: https://github.com/nurikorhan/negotiated-representations
  • paper_authors: Nuri Korhan, Samet Bayram
  • for: To improve the classification accuracy of machine learning models and reduce overfitting.
  • methods: The model negotiates its output representations of the samples with the previously determined class labels, improving interpretability and generalization.
  • results: Experiments on low-regime machine learning problems with overfitting scenarios generated from CIFAR 10, CIFAR 100, and MNIST show increased classification accuracy and reduced overfitting, suggesting the paradigm is applicable to other research areas as well.
    Abstract Overfitting is a phenomenon that occurs when a machine learning model is trained for too long and focused too much on the exact fitness of the training samples to the provided training labels and cannot keep track of the predictive rules that would be useful on the test data. This phenomenon is commonly attributed to memorization of particular samples, memorization of the noise, and forced fitness into a data set of limited samples by using a high number of neurons. While it is true that the model encodes various peculiarities as the training process continues, we argue that most of the overfitting occurs in the process of reconciling sharply defined membership ratios. In this study, we present an approach that increases the classification accuracy of machine learning models by allowing the model to negotiate output representations of the samples with previously determined class labels. By setting up a negotiation between the models interpretation of the inputs and the provided labels, we not only increased average classification accuracy but also decreased the rate of overfitting without applying any other regularization tricks. By implementing our negotiation paradigm approach to several low regime machine learning problems by generating overfitting scenarios from publicly available data sets such as CIFAR 10, CIFAR 100, and MNIST we have demonstrated that the proposed paradigm has more capacity than its intended purpose. We are sharing the experimental results and inviting the machine learning community to explore the limits of the proposed paradigm. We also aim to incentive the community to exploit the negotiation paradigm to overcome the learning related challenges in other research fields such as continual learning. The Python code of the experimental setup is uploaded to GitHub.

Towards interpretable-by-design deep learning algorithms

  • paper_url: http://arxiv.org/abs/2311.11396
  • repo_url: None
  • paper_authors: Plamen Angelov, Dmitry Kangin, Ziyang Zhang
  • for: This paper proposes a framework for building interpretable-by-design models on top of deep learning models.
  • methods: The approach reuses the latent spaces of existing large neural networks (foundation models such as vision transformers) and recasts classification as a function of similarity to a set of prototypes, which makes the predictions explainable.
  • results: The results show that such deep learning models can be turned into conceptually simpler, prototype-based interpretable models, and that class-incremental learning and transfer learning are possible without fine-tuning the feature space on the target dataset.
    Abstract The proposed framework named IDEAL (Interpretable-by-design DEep learning ALgorithms) recasts the standard supervised classification problem into a function of similarity to a set of prototypes derived from the training data, while taking advantage of existing latent spaces of large neural networks forming so-called Foundation Models (FM). This addresses the issue of explainability (stage B) while retaining the benefits from the tremendous achievements offered by DL models (e.g., visual transformers, ViT) pre-trained on huge data sets such as IG-3.6B + ImageNet-1K or LVD-142M (stage A). We show that one can turn such DL models into conceptually simpler, explainable-through-prototypes ones. The key findings can be summarized as follows: (1) the proposed models are interpretable through prototypes, mitigating the issue of confounded interpretations, (2) the proposed IDEAL framework circumvents the issue of catastrophic forgetting allowing efficient class-incremental learning, and (3) the proposed IDEAL approach demonstrates that ViT architectures narrow the gap between finetuned and non-finetuned models allowing for transfer learning in a fraction of time \textbf{without} finetuning of the feature space on a target dataset with iterative supervised methods.
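A minimal sketch of the similarity-to-prototypes idea over frozen foundation-model features. Prototypes here are just class-mean feature vectors and the similarity is cosine similarity; both choices are stand-ins of mine, not the IDEAL reference implementation, but they illustrate why class-incremental updates need no finetuning of the feature space.

```python
import numpy as np

class PrototypeClassifier:
    """Classify by similarity to class prototypes computed on frozen features."""

    def fit(self, feats, labels):
        self.classes = np.unique(labels)
        self.prototypes = np.stack([feats[labels == c].mean(axis=0) for c in self.classes])
        return self

    def add_class(self, feats, label):
        # Class-incremental update: append a prototype, no feature-space finetuning.
        self.classes = np.append(self.classes, label)
        self.prototypes = np.vstack([self.prototypes, feats.mean(axis=0)])

    def predict(self, feats):
        a = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        b = self.prototypes / np.linalg.norm(self.prototypes, axis=1, keepdims=True)
        return self.classes[(a @ b.T).argmax(axis=1)]  # nearest prototype by cosine similarity
```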

Addressing the speed-accuracy simulation trade-off for adaptive spiking neurons

  • paper_url: http://arxiv.org/abs/2311.11390
  • repo_url: https://github.com/webstorms/blocks
  • paper_authors: Luke Taylor, Andrew J King, Nicol S Harper
  • for: The goal of this paper is to address the speed-accuracy trade-off in simulating adaptive spiking neuron models in computational neuroscience, speeding up training without losing accuracy.
  • methods: The paper algorithmically reinterprets the ALIF model, reducing the sequential simulation complexity and enabling more efficient parallelisation on GPUs.
  • results: The implementation achieves over a 50x training speedup when using small discretisation time-steps (DT), matches the standard ALIF implementation on several supervised classification tasks in a fraction of the training time, and can quickly and accurately fit real electrophysiological recordings where capturing exact spike timing is crucial.
    Abstract The adaptive leaky integrate-and-fire (ALIF) model is fundamental within computational neuroscience and has been instrumental in studying our brains $\textit{in silico}$. Due to the sequential nature of simulating these neural models, a commonly faced issue is the speed-accuracy trade-off: either accurately simulate a neuron using a small discretisation time-step (DT), which is slow, or more quickly simulate a neuron using a larger DT and incur a loss in simulation accuracy. Here we provide a solution to this dilemma, by algorithmically reinterpreting the ALIF model, reducing the sequential simulation complexity and permitting a more efficient parallelisation on GPUs. We computationally validate our implementation to obtain over a $50\times$ training speedup using small DTs on synthetic benchmarks. We also obtained a comparable performance to the standard ALIF implementation on different supervised classification tasks - yet in a fraction of the training time. Lastly, we showcase how our model makes it possible to quickly and accurately fit real electrophysiological recordings of cortical neurons, where very fine sub-millisecond DTs are crucial for capturing exact spike timing.
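The abstract does not spell out the model equations, but one common discretisation of the adaptive leaky integrate-and-fire neuron (following standard formulations; the paper's exact parameterisation may differ) makes the sequential bottleneck concrete: both the membrane potential $u_t$ and the adaptation variable $a_t$ depend on the previous spike $s_{t-1}$, which is what forces step-by-step simulation and what the paper's algorithmic reinterpretation targets.

```latex
\begin{aligned}
u_t &= \alpha\, u_{t-1} + (1-\alpha)\, I_t - \vartheta_{t-1}\, s_{t-1}, &\quad \alpha &= e^{-\mathrm{DT}/\tau_m},\\
a_t &= \rho\, a_{t-1} + (1-\rho)\, s_{t-1}, &\quad \rho &= e^{-\mathrm{DT}/\tau_a},\\
\vartheta_t &= \vartheta_0 + \beta\, a_t, &\quad s_t &= \Theta(u_t - \vartheta_t).
\end{aligned}
```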

Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts

  • paper_url: http://arxiv.org/abs/2311.11385
  • repo_url: None
  • paper_authors: Ahmed Hendawy, Jan Peters, Carlo D’Eramo
  • for: To endow agents in multi-task reinforcement learning with skills that can be reused across tasks.
  • methods: The Mixture Of Orthogonal Experts (MOORE) approach uses a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts; task-specific information is then used to generate relevant representations from this shared subspace.
  • results: On the MiniGrid and MetaWorld multi-task reinforcement learning benchmarks, MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld.
    Abstract Multi-Task Reinforcement Learning (MTRL) tackles the long-standing problem of endowing agents with skills that generalize across a variety of problems. To this end, sharing representations plays a fundamental role in capturing both unique and common characteristics of the tasks. Tasks may exhibit similarities in terms of skills, objects, or physical properties while leveraging their representations eases the achievement of a universal policy. Nevertheless, the pursuit of learning a shared set of diverse representations is still an open challenge. In this paper, we introduce a novel approach for representation learning in MTRL that encapsulates common structures among the tasks using orthogonal representations to promote diversity. Our method, named Mixture Of Orthogonal Experts (MOORE), leverages a Gram-Schmidt process to shape a shared subspace of representations generated by a mixture of experts. When task-specific information is provided, MOORE generates relevant representations from this shared subspace. We assess the effectiveness of our approach on two MTRL benchmarks, namely MiniGrid and MetaWorld, showing that MOORE surpasses related baselines and establishes a new state-of-the-art result on MetaWorld.
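A minimal sketch of the core operation: run a Gram-Schmidt step (via QR, an equivalent and numerically stabler route) over the representations produced by a mixture of experts so that, per sample, the expert vectors span an orthogonal shared subspace, then combine them with task-conditioned weights. The shapes and the softmax weighting are assumptions for illustration, not the MOORE reference code.

```python
import torch

def orthogonal_expert_representations(expert_outputs):
    """Orthogonalize per-sample expert outputs of shape (num_experts, batch, dim)."""
    stacked = expert_outputs.permute(1, 2, 0)      # (batch, dim, num_experts)
    q, _ = torch.linalg.qr(stacked)                # orthonormal columns per sample
    return q.permute(2, 0, 1)                      # back to (num_experts, batch, dim)

experts = torch.randn(4, 32, 128)                  # 4 experts, batch of 32, 128-dim features
ortho = orthogonal_expert_representations(experts)
w_task = torch.softmax(torch.randn(4), dim=0)      # placeholder task-conditioned weights
task_repr = torch.einsum("e,ebd->bd", w_task, ortho)
```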

Optimal Locally Private Nonparametric Classification with Public Data

  • paper_url: http://arxiv.org/abs/2311.11369
  • repo_url: https://github.com/karlmyh/lpct
  • paper_authors: Yuheng Ma, Hanfang Yang
  • for: investigate the problem of public data-assisted non-interactive LDP (Local Differential Privacy) learning with a focus on non-parametric classification.
  • methods: derive the mini-max optimal convergence rate with LDP constraint, and present a novel approach, the locally private classification tree, which attains the mini-max optimal convergence rate.
  • results: comprehensive experiments conducted on synthetic and real datasets show the superior performance of our proposed method, and both theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, leading to practical suggestions for prioritizing non-private data collection.
    Abstract In this work, we investigate the problem of public data-assisted non-interactive LDP (Local Differential Privacy) learning with a focus on non-parametric classification. Under the posterior drift assumption, we for the first time derive the mini-max optimal convergence rate with LDP constraint. Then, we present a novel approach, the locally private classification tree, which attains the mini-max optimal convergence rate. Furthermore, we design a data-driven pruning procedure that avoids parameter tuning and produces a fast converging estimator. Comprehensive experiments conducted on synthetic and real datasets show the superior performance of our proposed method. Both our theoretical and experimental findings demonstrate the effectiveness of public data compared to private data, which leads to practical suggestions for prioritizing non-private data collection.

Self-Supervised Pretraining for Heterogeneous Hypergraph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.11368
  • repo_url: None
  • paper_authors: Abdalgader Abubaker, Takanori Maehara, Madhav Nimishakavi, Vassilis Plachouras
  • for: The goal of this work is to propose a self-supervised pretraining framework (SPHH) for heterogeneous hypergraph neural networks (HyperGNNs).
  • methods: The framework uses two self-supervised pretraining tasks that simultaneously learn local and global representations of the entities in the hypergraph, using informative representations derived from the hypergraph structure; the pretrained models are evaluated on downstream tasks such as node classification and link prediction.
  • results: Experiments on two real-world benchmarks with four different HyperGNN models show that SPHH consistently improves performance across models and downstream tasks, demonstrating the robustness of the framework.
    Abstract Recently, pretraining methods for the Graph Neural Networks (GNNs) have been successful at learning effective representations from unlabeled graph data. However, most of these methods rely on pairwise relations in the graph and do not capture the underling higher-order relations between entities. Hypergraphs are versatile and expressive structures that can effectively model higher-order relationships among entities in the data. Despite the efforts to adapt GNNs to hypergraphs (HyperGNN), there are currently no fully self-supervised pretraining methods for HyperGNN on heterogeneous hypergraphs. In this paper, we present SPHH, a novel self-supervised pretraining framework for heterogeneous HyperGNNs. Our method is able to effectively capture higher-order relations among entities in the data in a self-supervised manner. SPHH is consist of two self-supervised pretraining tasks that aim to simultaneously learn both local and global representations of the entities in the hypergraph by using informative representations derived from the hypergraph structure. Overall, our work presents a significant advancement in the field of self-supervised pretraining of HyperGNNs, and has the potential to improve the performance of various graph-based downstream tasks such as node classification and link prediction tasks which are mapped to hypergraph configuration. Our experiments on two real-world benchmarks using four different HyperGNN models show that our proposed SPHH framework consistently outperforms state-of-the-art baselines in various downstream tasks. The results demonstrate that SPHH is able to improve the performance of various HyperGNN models in various downstream tasks, regardless of their architecture or complexity, which highlights the robustness of our framework.

Symmetry-invariant quantum machine learning force fields

  • paper_url: http://arxiv.org/abs/2311.11362
  • repo_url: None
  • paper_authors: Isabel Nha Minh Le, Oriel Kiss, Julian Schuhmacher, Ivano Tavernelli, Francesco Tacchino
  • for: Computing efficient and accurate force fields for atomistic simulations using machine learning techniques and quantum computational methods.
  • methods: Using variational quantum learning models to predict potential energy surfaces and atomic forces from ab initio training data, and incorporating physically relevant symmetries in quantum neural networks.
  • results: Outperforming generic quantum learning models on individual molecules of growing complexity, and demonstrating the versatility of the approach on a water dimer as a minimal example of a system with multiple components.
    Abstract Machine learning techniques are essential tools to compute efficient, yet accurate, force fields for atomistic simulations. This approach has recently been extended to incorporate quantum computational methods, making use of variational quantum learning models to predict potential energy surfaces and atomic forces from ab initio training data. However, the trainability and scalability of such models are still limited, due to both theoretical and practical barriers. Inspired by recent developments in geometric classical and quantum machine learning, here we design quantum neural networks that explicitly incorporate, as a data-inspired prior, an extensive set of physically relevant symmetries. We find that our invariant quantum learning models outperform their more generic counterparts on individual molecules of growing complexity. Furthermore, we study a water dimer as a minimal example of a system with multiple components, showcasing the versatility of our proposed approach and opening the way towards larger simulations. Our results suggest that molecular force fields generation can significantly profit from leveraging the framework of geometric quantum machine learning, and that chemical systems represent, in fact, an interesting and rich playground for the development and application of advanced quantum machine learning tools.

Coverage-Validity-Aware Algorithmic Recourse

  • paper_url: http://arxiv.org/abs/2311.11349
  • repo_url: None
  • paper_authors: Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen
  • for: To improve the explainability, transparency, and hence ethics of machine learning models.
  • methods: A novel framework generates model-agnostic recourses that remain valid under model shifts, by building a coverage-validity-aware linear surrogate of the nonlinear black-box model and generating the recourse with respect to that surrogate.
  • results: The proposed recourse is robust to model changes; by prescribing different covariance robustness, the framework recovers popular regularizations for minimax probability machines, including $\ell_2$-regularization and class-reweighting, while producing intuitive and interpretable recourses.
    Abstract Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid respective to the present model may become invalid for the future model. To resolve this issue, we propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model; then, the recourse is generated with respect to the linear surrogate. We establish a theoretical connection between our coverage-validity-aware linear surrogate and the minimax probability machines (MPM). We then prove that by prescribing different covariance robustness, the proposed framework recovers popular regularizations for MPM, including the $\ell_2$-regularization and class-reweighting. Furthermore, we show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses. The numerical results demonstrate the usefulness and robustness of our framework.

A Generative Model for Accelerated Inverse Modelling Using a Novel Embedding for Continuous Variables

  • paper_url: http://arxiv.org/abs/2311.11343
  • repo_url: None
  • paper_authors: Sébastien Bompas and Stefan Sandfeld
  • for: Accelerating materials design.
  • methods: The paper uses a generative machine learning model and compares an existing approach with a novel embedding strategy for generative models based on the binary representation of floating point numbers.
  • results: The approach provides a versatile embedding space for conditioning the generative model, offering fine control over the generated microstructure images and contributing to accelerated materials design.
    Abstract In materials science, the challenge of rapid prototyping materials with desired properties often involves extensive experimentation to find suitable microstructures. Additionally, finding microstructures for given properties is typically an ill-posed problem where multiple solutions may exist. Using generative machine learning models can be a viable solution which also reduces the computational cost. This comes with new challenges because, e.g., a continuous property variable as conditioning input to the model is required. We investigate the shortcomings of an existing method and compare this to a novel embedding strategy for generative models that is based on the binary representation of floating point numbers. This eliminates the need for normalization, preserves information, and creates a versatile embedding space for conditioning the generative model. This technique can be applied to condition a network on any number, to provide fine control over generated microstructure images, thereby contributing to accelerated materials design.
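A minimal sketch of conditioning on a continuous variable via its binary representation, which sidesteps normalization as described in the abstract. The abstract does not fix the bit layout; the IEEE-754 float32 encoding below is an assumption of mine.

```python
import struct
import numpy as np

def float_to_bit_embedding(value):
    """Embed a continuous conditioning variable as its 32 IEEE-754 float32 bits (0/1 floats)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", float(value)))
    return np.array([(bits >> i) & 1 for i in range(31, -1, -1)], dtype=np.float32)

# A 32-dimensional 0/1 vector that can be concatenated to a generator's conditioning input
print(float_to_bit_embedding(0.15))
```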

On the Communication Complexity of Decentralized Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2311.11342
  • repo_url: None
  • paper_authors: Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao
  • for: To make decentralized bilevel optimization more practical for real-world tasks by reducing its communication cost.
  • methods: The paper proposes a decentralized stochastic bilevel gradient descent algorithm for the heterogeneous setting that incurs a small communication cost in each round and needs only a small number of communication rounds; the algorithm is further extended to decentralized multi-level optimization.
  • results: Experimental results confirm the effectiveness of the algorithm, which achieves a much better communication complexity than existing decentralized bilevel methods.
    Abstract Decentralized bilevel optimization has been actively studied in the past few years since it has widespread applications in machine learning. However, existing algorithms suffer from large communication complexity caused by the estimation of stochastic hypergradient, limiting their application to real-world tasks. To address this issue, we develop a novel decentralized stochastic bilevel gradient descent algorithm under the heterogeneous setting, which enjoys a small communication cost in each round and small communication rounds. As such, it can achieve a much better communication complexity than existing algorithms. Moreover, we extend our algorithm to the more challenging decentralized multi-level optimization. To the best of our knowledge, this is the first time achieving these theoretical results under the heterogeneous setting. At last, the experimental results confirm the efficacy of our algorithm.

Self-Distilled Representation Learning for Time Series

  • paper_url: http://arxiv.org/abs/2311.11335
  • repo_url: None
  • paper_authors: Felix Pieper, Konstantin Ditschuneit, Martin Genzel, Alexandra Lindt, Johannes Otterbach
  • for: This work explores the potential of self-supervised learning for time-series data with a non-contrastive approach based on the data2vec self-distillation framework.
  • methods: The proposed non-contrastive method follows a student-teacher scheme in which the latent representation of an input time series is predicted from masked views of the same time series.
  • results: Comparisons with state-of-the-art self-supervised methods on the UCR and UEA archives as well as the ETT and Electricity datasets demonstrate the competitiveness of the approach for classification and forecasting as downstream tasks.
    Abstract Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision. While most existing works in this area focus on contrastive learning, we propose a conceptually simple yet powerful non-contrastive approach, based on the data2vec self-distillation framework. The core of our method is a student-teacher scheme that predicts the latent representation of an input time series from masked views of the same time series. This strategy avoids strong modality-specific assumptions and biases typically introduced by the design of contrastive sample pairs. We demonstrate the competitiveness of our approach for classification and forecasting as downstream tasks, comparing with state-of-the-art self-supervised learning methods on the UCR and UEA archives as well as the ETT and Electricity datasets.
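A minimal sketch of the non-contrastive student-teacher scheme described above, in the spirit of data2vec: the student encodes a masked view of the series and regresses the teacher's latents at the masked positions, while the teacher is an exponential-moving-average copy of the student. The backbone, masking rule, and hyperparameters are placeholders, not the paper's configuration.

```python
import copy
import torch
import torch.nn as nn

encoder = nn.GRU(input_size=1, hidden_size=64, batch_first=True)  # placeholder backbone
teacher = copy.deepcopy(encoder)
for p in teacher.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

def self_distillation_step(x, mask_ratio=0.3, tau=0.999):
    """One data2vec-style step: predict the teacher's latents from a masked view."""
    mask = (torch.rand(x.shape[:2]) < mask_ratio).unsqueeze(-1)
    student_out, _ = encoder(x.masked_fill(mask, 0.0))   # masked view through the student
    with torch.no_grad():
        target, _ = teacher(x)                           # full view through the teacher
    loss = nn.functional.mse_loss(student_out[mask.squeeze(-1)], target[mask.squeeze(-1)])
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                # EMA update of the teacher
        for pt, ps in zip(teacher.parameters(), encoder.parameters()):
            pt.mul_(tau).add_(ps, alpha=1 - tau)
    return loss.item()

loss = self_distillation_step(torch.randn(8, 100, 1))    # batch of 8 series of length 100
```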

LABCAT: Locally adaptive Bayesian optimization using principal component-aligned trust regions

  • paper_url: http://arxiv.org/abs/2311.11328
  • repo_url: https://github.com/aemiliusretiarius/labcat
  • paper_authors: E. Visser, C. E. van Daalen, J. C. Schoeman
  • for: Optimizing expensive black-box functions.
  • methods: The paper extends trust-region-based Bayesian optimization (BO) with principal-component-aligned rotation and an adaptive rescaling strategy based on the length-scales of a local Gaussian process surrogate, addressing known limitations of BO.
  • results: Extensive numerical experiments show that the LABCAT algorithm outperforms state-of-the-art BO and other black-box optimization algorithms.
    Abstract Bayesian optimization (BO) is a popular method for optimizing expensive black-box functions. BO has several well-documented shortcomings, including computational slowdown with longer optimization runs, poor suitability for non-stationary or ill-conditioned objective functions, and poor convergence characteristics. Several algorithms have been proposed that incorporate local strategies, such as trust regions, into BO to mitigate these limitations; however, none address all of them satisfactorily. To address these shortcomings, we propose the LABCAT algorithm, which extends trust-region-based BO by adding principal-component-aligned rotation and an adaptive rescaling strategy based on the length-scales of a local Gaussian process surrogate model with automatic relevance determination. Through extensive numerical experiments using a set of synthetic test functions and the well-known COCO benchmarking software, we show that the LABCAT algorithm outperforms several state-of-the-art BO and other black-box optimization algorithms.

Large Learning Rates Improve Generalization: But How Large Are We Talking About?

  • paper_url: http://arxiv.org/abs/2311.11303
  • repo_url: None
  • paper_authors: Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov
  • for: This paper examines the hypothesis that starting neural network training with a large learning rate (LR) achieves the best generalization.
  • methods: The study examines this hypothesis in detail and identifies the initial LR ranges that give optimal results for subsequent training with a small LR or weight averaging; the main experiments are conducted in a simplified setup with precise control of the LR hyperparameter, and the key findings are validated in a more practical setting.
  • results: The optimal initial LR ranges turn out to be significantly narrower than generally assumed.
    Abstract Inspired by recent research that recommends starting neural networks training with large learning rates (LRs) to achieve the best generalization, we explore this hypothesis in detail. Our study clarifies the initial LR ranges that provide optimal results for subsequent training with a small LR or weight averaging. We find that these ranges are in fact significantly narrower than generally assumed. We conduct our main experiments in a simplified setup that allows precise control of the learning rate hyperparameter and validate our key findings in a more practical setting.

From Categories to Classifier: Name-Only Continual Learning by Exploring the Web

  • paper_url: http://arxiv.org/abs/2311.11293
  • repo_url: None
  • paper_authors: Ameya Prabhu, Hasan Abed Al Kader Hammoud, Ser-Nam Lim, Bernard Ghanem, Philip H. S. Torr, Adel Bibi
  • for: To overcome the limitations of manual data annotation and improve the feasibility and efficiency of continual learning.
  • methods: The approach queries the internet for category names and downloads uncurated webly-supervised data, which is then used to build classifiers without manually annotated training data.
  • results: The web data proves comparable, and in some cases superior, to manually annotated datasets; support sets built from the web surpass state-of-the-art name-only classification approaches that rely on generative models or image retrieval from LAION-5B, with up to a 25% boost in accuracy. Across varied continual learning contexts the method shows only a small performance gap relative to models trained on manually annotated data. The paper also introduces EvoTrends, a class-incremental web-derived dataset capturing real-world trends and created in just minutes. Overall, the paper shows that uncurated webly-supervised data can mitigate the challenges of manual data labeling in continual learning.
    Abstract Continual Learning (CL) often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice. We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation. In this scenario, learners adapt to new category shifts using only category names without the luxury of annotated training data. Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification. We investigate the reliability of our web data and find them comparable, and in some cases superior, to manually annotated datasets. Additionally, we show that by harnessing the web, we can create support sets that surpass state-of-the-art name-only classification that create support sets using generative models or image retrieval from LAION-5B, achieving up to 25% boost in accuracy. When applied across varied continual learning contexts, our method consistently exhibits a small performance gap in comparison to models trained on manually annotated datasets. We present EvoTrends, a class-incremental dataset made from the web to capture real-world trends, created in just minutes. Overall, this paper underscores the potential of using uncurated webly-supervised data to mitigate the challenges associated with manual data labeling in continual learning.

TimeSQL: Improving Multivariate Time Series Forecasting with Multi-Scale Patching and Smooth Quadratic Loss

  • paper_url: http://arxiv.org/abs/2311.11285
  • repo_url: None
  • paper_authors: Site Mo, Haoxin Wang, Bixiong Li, Songhai Fan, Yuankai Wu, Xianggen Liu
  • for: The goal of this paper is to propose a simple and effective framework for multivariate time series forecasting.
  • methods: The framework combines multi-scale patching with a smooth quadratic loss (SQL) to handle real-world time series that are noisy and exhibit complicated local and global temporal dynamics, which make forecasting from historical observations difficult.
  • results: Supported by theoretical analysis and experiments, the framework achieves new state-of-the-art performance on eight real-world benchmark datasets.
    Abstract Time series is a special type of sequence data, a sequence of real-valued random variables collected at even intervals of time. The real-world multivariate time series comes with noises and contains complicated local and global temporal dynamics, making it difficult to forecast the future time series given the historical observations. This work proposes a simple and effective framework, coined as TimeSQL, which leverages multi-scale patching and smooth quadratic loss (SQL) to tackle the above challenges. The multi-scale patching transforms the time series into two-dimensional patches with different length scales, facilitating the perception of both locality and long-term correlations in time series. SQL is derived from the rational quadratic kernel and can dynamically adjust the gradients to avoid overfitting to the noises and outliers. Theoretical analysis demonstrates that, under mild conditions, the effect of the noises on the model with SQL is always smaller than that with MSE. Based on the two modules, TimeSQL achieves new state-of-the-art performance on the eight real-world benchmark datasets. Further ablation studies indicate that the key modules in TimeSQL could also enhance the results of other models for multivariate time series forecasting, standing as plug-and-play techniques.
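A minimal sketch of the multi-scale patching step: the same series is cut into non-overlapping patches at several lengths so a model can see both local detail and longer-range structure. The patch lengths and the non-overlapping layout are illustrative choices, and the smooth quadratic loss itself is not reproduced here since its exact form is not given in the abstract.

```python
import numpy as np

def multi_scale_patches(x, patch_lengths=(4, 16, 64)):
    """Split a (time, channels) series into non-overlapping patches at several scales.

    Returns a dict mapping patch length L to an array of shape (num_patches, L, channels).
    """
    T, C = x.shape
    out = {}
    for L in patch_lengths:
        n = T // L
        out[L] = x[: n * L].reshape(n, L, C)
    return out

series = np.random.randn(512, 7)            # e.g. a 7-variate series of length 512
patches = multi_scale_patches(series)
print({L: p.shape for L, p in patches.items()})
```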

Multi-Timescale Control and Communications with Deep Reinforcement Learning – Part I: Communication-Aware Vehicle Control

  • paper_url: http://arxiv.org/abs/2311.11281
  • repo_url: None
  • paper_authors: Tong Liu, Lei Lei, Kan Zheng, Xuemin Shen
  • for: This work aims to develop an intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications to achieve safe and efficient autonomous driving (AD).
  • methods: The study proposes a multi-timescale control and communications (MTCC) framework based on Deep Reinforcement Learning (DRL), focusing in this part on the communication-aware platoon control (PC) sub-problem.
  • results: In experiments, the MTCC-PC algorithm is compared with baseline DRL algorithms and shown to improve PC performance under random observation delay.
    Abstract An intelligent decision-making system enabled by Vehicle-to-Everything (V2X) communications is essential to achieve safe and efficient autonomous driving (AD), where two types of decisions have to be made at different timescales, i.e., vehicle control and radio resource allocation (RRA) decisions. The interplay between RRA and vehicle control necessitates their collaborative design. In this two-part paper (Part I and Part II), taking platoon control (PC) as an example use case, we propose a joint optimization framework of multi-timescale control and communications (MTCC) based on Deep Reinforcement Learning (DRL). In this paper (Part I), we first decompose the problem into a communication-aware DRL-based PC sub-problem and a control-aware DRL-based RRA sub-problem. Then, we focus on the PC sub-problem assuming an RRA policy is given, and propose the MTCC-PC algorithm to learn an efficient PC policy. To improve the PC performance under random observation delay, the PC state space is augmented with the observation delay and PC action history. Moreover, the reward function with respect to the augmented state is defined to construct an augmented state Markov Decision Process (MDP). It is proved that the optimal policy for the augmented state MDP is optimal for the original PC problem with observation delay. Different from most existing works on communication-aware control, the MTCC-PC algorithm is trained in a delayed environment generated by the fine-grained embedded simulation of C-V2X communications rather than by a simple stochastic delay model. Finally, experiments are performed to compare the performance of MTCC-PC with those of the baseline DRL algorithms.

Multi-Timescale Control and Communications with Deep Reinforcement Learning – Part II: Control-Aware Radio Resource Allocation

  • paper_url: http://arxiv.org/abs/2311.11280
  • repo_url: None
  • paper_authors: Lei Lei, Tong Liu, Kan Zheng, Xuemin Shen
  • for: This paper addresses the multi-timescale control and communications (MTCC) problem in Cellular Vehicle-to-Everything (C-V2X) systems.
  • methods: The paper solves the MTCC problem with Deep Reinforcement Learning (DRL), decomposing it into two interrelated sub-problems: a PC sub-problem, in which a DRL algorithm learns an optimal platoon control policy, and an RRA sub-problem, in which a DRL algorithm learns an optimal radio resource allocation policy; this part focuses on the control-aware RRA sub-problem given a PC policy.
  • results: In experiments using real driving data for the leading vehicle, the proposed MTCC algorithm is compared with baseline DRL algorithms and shown to be effective.
    Abstract In Part I of this two-part paper (Multi-Timescale Control and Communications with Deep Reinforcement Learning -- Part I: Communication-Aware Vehicle Control), we decomposed the multi-timescale control and communications (MTCC) problem in Cellular Vehicle-to-Everything (C-V2X) system into a communication-aware Deep Reinforcement Learning (DRL)-based platoon control (PC) sub-problem and a control-aware DRL-based radio resource allocation (RRA) sub-problem. We focused on the PC sub-problem and proposed the MTCC-PC algorithm to learn an optimal PC policy given an RRA policy. In this paper (Part II), we first focus on the RRA sub-problem in MTCC assuming a PC policy is given, and propose the MTCC-RRA algorithm to learn the RRA policy. Specifically, we incorporate the PC advantage function in the RRA reward function, which quantifies the amount of PC performance degradation caused by observation delay. Moreover, we augment the state space of RRA with PC action history for a more well-informed RRA policy. In addition, we utilize reward shaping and reward backpropagation prioritized experience replay (RBPER) techniques to efficiently tackle the multi-agent and sparse reward problems, respectively. Finally, a sample- and computational-efficient training approach is proposed to jointly learn the PC and RRA policies in an iterative process. In order to verify the effectiveness of the proposed MTCC algorithm, we performed experiments using real driving data for the leading vehicle, where the performance of MTCC is compared with those of the baseline DRL algorithms.

Uncertainty quantification for noisy inputs-outputs in physics-informed neural networks and neural operators

  • paper_url: http://arxiv.org/abs/2311.11262
  • repo_url: None
  • paper_authors: Zongren Zou, Xuhui Meng, George Em Karniadakis
  • for: This paper addresses uncertainty quantification (UQ) in scientific machine learning (SciML) models, specifically uncertainty caused by noisy inputs in physics-informed neural networks (PINNs) and neural operators (NOs).
  • methods: The paper proposes a Bayesian approach to quantify uncertainty arising from noisy inputs-outputs in PINNs and NOs. This approach seamlessly integrates into PINNs and NOs, allowing them to address problems where the observed coordinate or input functions are subject to noise.
  • results: The proposed approach enables PINNs and NOs to handle noisy measurements for both input and output functions, providing reliable and trustworthy deployment of these models in applications involving physical knowledge.
    Abstract Uncertainty quantification (UQ) in scientific machine learning (SciML) becomes increasingly critical as neural networks (NNs) are being widely adopted in addressing complex problems across various scientific disciplines. Representative SciML models are physics-informed neural networks (PINNs) and neural operators (NOs). While UQ in SciML has been increasingly investigated in recent years, very few works have focused on addressing the uncertainty caused by the noisy inputs, such as spatial-temporal coordinates in PINNs and input functions in NOs. The presence of noise in the inputs of the models can pose significantly more challenges compared to noise in the outputs of the models, primarily due to the inherent nonlinearity of most SciML algorithms. As a result, UQ for noisy inputs becomes a crucial factor for reliable and trustworthy deployment of these models in applications involving physical knowledge. To this end, we introduce a Bayesian approach to quantify uncertainty arising from noisy inputs-outputs in PINNs and NOs. We show that this approach can be seamlessly integrated into PINNs and NOs, when they are employed to encode the physical information. PINNs incorporate physics by including physics-informed terms via automatic differentiation, either in the loss function or the likelihood, and often take as input the spatial-temporal coordinate. Therefore, the present method equips PINNs with the capability to address problems where the observed coordinate is subject to noise. On the other hand, pretrained NOs are also commonly employed as equation-free surrogates in solving differential equations and Bayesian inverse problems, in which they take functions as inputs. The proposed approach enables them to handle noisy measurements for both input and output functions with UQ.
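A minimal sketch of the noisy-input treatment, assuming Gaussian noise on both the observed coordinates and the outputs and a standard-normal prior on the network weights (the noise models, scales, and function names are illustrative, not the paper's exact setup):

```python
import torch

def log_posterior(net, x_latent, x_obs, y_obs, sigma_x=0.05, sigma_y=0.05):
    """Unnormalized log-posterior for a PINN-style model with noisy inputs:
    the clean coordinates x_latent are treated as latent variables and
    inferred jointly with the network weights; x_obs and y_obs are the
    noisy coordinate and output measurements. A physics-informed residual
    term would be added analogously. Noise scales and the standard-normal
    weight prior are illustrative choices."""
    log_lik_y = -0.5 * ((net(x_latent) - y_obs) ** 2).sum() / sigma_y ** 2
    log_lik_x = -0.5 * ((x_latent - x_obs) ** 2).sum() / sigma_x ** 2
    log_prior_w = -0.5 * sum((p ** 2).sum() for p in net.parameters())
    return log_lik_y + log_lik_x + log_prior_w
```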

BOIS: Bayesian Optimization of Interconnected Systems

  • paper_url: http://arxiv.org/abs/2311.11254
  • repo_url: None
  • paper_authors: Leonardo D. González, Victor M. Zavala
  • for: This paper develops an efficient approach for optimizing expensive-to-sample systems.
  • methods: It builds on Bayesian optimization (BO), using Gaussian processes (GPs) to characterize model uncertainty and thereby guide the learning and search process.
  • results: The paper introduces a new BO paradigm, BOIS, that makes efficient use of composite functions by employing adaptive linearizations to obtain closed-form expressions for their statistical moments. Evaluated on a chemical process optimization case study against standard BO and sampling-based approaches, BOIS achieves performance gains and accurately captures the statistics of composite functions.
    Abstract Bayesian optimization (BO) has proven to be an effective paradigm for the global optimization of expensive-to-sample systems. One of the main advantages of BO is its use of Gaussian processes (GPs) to characterize model uncertainty which can be leveraged to guide the learning and search process. However, BO typically treats systems as black-boxes and this limits the ability to exploit structural knowledge (e.g., physics and sparse interconnections). Composite functions of the form $f(x, y(x))$, wherein GP modeling is shifted from the performance function $f$ to an intermediate function $y$, offer an avenue for exploiting structural knowledge. However, the use of composite functions in a BO framework is complicated by the need to generate a probability density for $f$ from the Gaussian density of $y$ calculated by the GP (e.g., when $f$ is nonlinear it is not possible to obtain a closed-form expression). Previous work has handled this issue using sampling techniques; these are easy to implement and flexible but are computationally intensive. In this work, we introduce a new paradigm which allows for the efficient use of composite functions in BO; this uses adaptive linearizations of $f$ to obtain closed-form expressions for the statistical moments of the composite function. We show that this simple approach (which we call BOIS) enables the exploitation of structural knowledge, such as that arising in interconnected systems as well as systems that embed multiple GP models and combinations of physics and GP models. Using a chemical process optimization case study, we benchmark the effectiveness of BOIS against standard BO and sampling approaches. Our results indicate that BOIS achieves performance gains and accurately captures the statistics of composite functions.
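The closed-form moments obtained from an adaptive linearization can be written down directly. Under a local linearization of $f$ in $y$ around the GP posterior mean $\mu_y(x)$ with posterior covariance $\Sigma_y(x)$ (a sketch of the idea, not necessarily the paper's exact derivation), the first-order moments are

$$\mu_f(x) \approx f\bigl(x, \mu_y(x)\bigr), \qquad \sigma_f^2(x) \approx \nabla_y f\bigl(x, \mu_y(x)\bigr)^{\top}\, \Sigma_y(x)\, \nabla_y f\bigl(x, \mu_y(x)\bigr),$$

which can then be plugged into a standard acquisition function such as expected improvement.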

A Universal Framework for Accurate and Efficient Geometric Deep Learning of Molecular Systems

  • paper_url: http://arxiv.org/abs/2311.11228
  • repo_url: https://github.com/XieResearchGroup/Physics-aware-Multiplex-GNN
  • paper_authors: Shuo Zhang, Yang Liu, Lei Xie
  • for: Accurate and efficient learning of representations of three-dimensional (3D) molecules of varying sizes and types.
  • methods: A molecular-mechanics-inspired, physics-informed bias that explicitly models local and non-local interactions and their combined effects.
  • results: Outperforms state-of-the-art baselines across diverse molecular science applications, including small-molecule properties, RNA 3D structures, and protein-ligand binding affinities, while remaining time- and memory-efficient.
    Abstract Molecular sciences address a wide range of problems involving molecules of different types and sizes and their complexes. Recently, geometric deep learning, especially Graph Neural Networks, has shown promising performance in molecular science applications. However, most existing works often impose targeted inductive biases to a specific molecular system, and are inefficient when applied to macromolecules or large-scale tasks, thereby limiting their applications to many real-world problems. To address these challenges, we present PAMNet, a universal framework for accurately and efficiently learning the representations of three-dimensional (3D) molecules of varying sizes and types in any molecular system. Inspired by molecular mechanics, PAMNet induces a physics-informed bias to explicitly model local and non-local interactions and their combined effects. As a result, PAMNet can reduce expensive operations, making it time and memory efficient. In extensive benchmark studies, PAMNet outperforms state-of-the-art baselines regarding both accuracy and efficiency in three diverse learning tasks: small molecule properties, RNA 3D structures, and protein-ligand binding affinities. Our results highlight the potential for PAMNet in a broad range of molecular science applications.
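The following is a generic sketch, not PAMNet's actual architecture, of the idea of modeling local and non-local interactions separately and then fusing their combined effect; the layer names, aggregation scheme, and edge representation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TwoRangeInteraction(nn.Module):
    """Generic sketch: combine messages from a short-range ("local") neighbor
    set and a long-range ("non-local") neighbor set, then fuse them. This
    mirrors the idea of modeling local and non-local interactions and their
    combined effect, but is not PAMNet's actual architecture."""

    def __init__(self, dim):
        super().__init__()
        self.local_msg = nn.Linear(2 * dim, dim)
        self.global_msg = nn.Linear(2 * dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, h, local_edges, global_edges):
        # h: (num_atoms, dim); *_edges: (num_edges, 2) long tensors of (src, dst).
        def aggregate(edges, mlp):
            src, dst = edges[:, 0], edges[:, 1]
            msg = mlp(torch.cat([h[src], h[dst]], dim=-1))
            return torch.zeros_like(h).index_add(0, dst, msg)
        m_local = aggregate(local_edges, self.local_msg)
        m_global = aggregate(global_edges, self.global_msg)
        return h + self.fuse(torch.cat([m_local, m_global], dim=-1))

# Hypothetical usage on a toy 5-atom graph with 16-dimensional features.
h = torch.randn(5, 16)
local_e = torch.tensor([[0, 1], [1, 2]])
global_e = torch.tensor([[0, 4], [2, 4]])
print(TwoRangeInteraction(16)(h, local_e, global_e).shape)  # torch.Size([5, 16])
```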

TextGuard: Provable Defense against Backdoor Attacks on Text Classification

  • paper_url: http://arxiv.org/abs/2311.11225
  • repo_url: https://github.com/ai-secure/textguard
  • paper_authors: Hengzhi Pei, Jinyuan Jia, Wenbo Guo, Bo Li, Dawn Song
  • for: Defending machine learning models against backdoor attacks on text classification.
  • methods: Partitions the training data into sub-training sets by splitting each training sentence into sub-sentences, trains a base classifier on each sub-training set, and uses their ensemble for the final prediction.
  • results: Achieves certified accuracy on three text classification benchmarks that surpasses existing certified defenses against backdoor attacks; additional strategies are proposed to further improve TextGuard's empirical performance.
    Abstract Backdoor attacks have become a major security threat for deploying machine learning models in security-critical applications. Existing research endeavors have proposed many defenses against backdoor attacks. Despite demonstrating certain empirical defense efficacy, none of these techniques could provide a formal and provable security guarantee against arbitrary attacks. As a result, they can be easily broken by strong adaptive attacks, as shown in our evaluation. In this work, we propose TextGuard, the first provable defense against backdoor attacks on text classification. In particular, TextGuard first divides the (backdoored) training data into sub-training sets, achieved by splitting each training sentence into sub-sentences. This partitioning ensures that a majority of the sub-training sets do not contain the backdoor trigger. Subsequently, a base classifier is trained from each sub-training set, and their ensemble provides the final prediction. We theoretically prove that when the length of the backdoor trigger falls within a certain threshold, TextGuard guarantees that its prediction will remain unaffected by the presence of the triggers in training and testing inputs. In our evaluation, we demonstrate the effectiveness of TextGuard on three benchmark text classification tasks, surpassing the certification accuracy of existing certified defenses against backdoor attacks. Furthermore, we propose additional strategies to enhance the empirical performance of TextGuard. Comparisons with state-of-the-art empirical defenses validate the superiority of TextGuard in countering multiple backdoor attacks. Our code and data are available at https://github.com/AI-secure/TextGuard.
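A hedged sketch of the core partition-and-ensemble idea is given below; the grouping rule (word index modulo the number of groups) and the toy keyword classifiers are illustrative assumptions, not necessarily TextGuard's exact scheme.

```python
from collections import Counter

def split_into_groups(sentence, num_groups=3):
    """Sketch of the partitioning step: deterministically split a sentence's
    words into disjoint sub-sentences so that a short trigger can land in
    only a few groups. The grouping rule here (word index mod num_groups)
    is an illustrative choice."""
    words = sentence.split()
    groups = [[] for _ in range(num_groups)]
    for i, w in enumerate(words):
        groups[i % num_groups].append(w)
    return [" ".join(g) for g in groups]

def ensemble_predict(classifiers, sentence):
    """Majority vote over base classifiers, each applied to its own group."""
    groups = split_into_groups(sentence, num_groups=len(classifiers))
    votes = [clf(g) for clf, g in zip(classifiers, groups)]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical usage with three toy keyword-rule "classifiers".
clfs = [lambda s: "spam" if "free" in s else "ham"] * 3
print(ensemble_predict(clfs, "free money free gifts free trial"))  # spam
```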

Robust Network Slicing: Multi-Agent Policies, Adversarial Attacks, and Defensive Strategies

  • paper_url: http://arxiv.org/abs/2311.11206
  • repo_url: None
  • paper_authors: Feng Wang, M. Cenk Gursoy, Senem Velipasalar
  • for: This paper proposes a multi-agent deep reinforcement learning (deep RL) framework for network slicing in a dynamic environment with multiple base stations and multiple users.
  • methods: A novel deep RL framework with multiple actors and a centralized critic (MACC), in which the actors are implemented as pointer networks to accommodate the varying dimension of the input.
  • results: Simulations show that the proposed deep RL algorithm controls network slicing effectively. The paper further develops a deep RL based jammer with limited prior information and a limited power budget that degrades the slicing agents' transmission rates, and devises a Nash-equilibrium-supervised policy ensemble mixed strategy profile for both network slicing and jamming, which achieves good performance in simulations.
    Abstract In this paper, we present a multi-agent deep reinforcement learning (deep RL) framework for network slicing in a dynamic environment with multiple base stations and multiple users. In particular, we propose a novel deep RL framework with multiple actors and centralized critic (MACC) in which actors are implemented as pointer networks to fit the varying dimension of input. We evaluate the performance of the proposed deep RL algorithm via simulations to demonstrate its effectiveness. Subsequently, we develop a deep RL based jammer with limited prior information and limited power budget. The goal of the jammer is to minimize the transmission rates achieved with network slicing and thus degrade the network slicing agents' performance. We design a jammer with both listening and jamming phases and address jamming location optimization as well as jamming channel optimization via deep RL. We evaluate the jammer at the optimized location, generating interference attacks in the optimized set of channels by switching between the jamming phase and listening phase. We show that the proposed jammer can significantly reduce the victims' performance without direct feedback or prior knowledge on the network slicing policies. Finally, we devise a Nash-equilibrium-supervised policy ensemble mixed strategy profile for network slicing (as a defensive measure) and jamming. We evaluate the performance of the proposed policy ensemble algorithm by applying on the network slicing agents and the jammer agent in simulations to show its effectiveness.
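As a rough illustration of the jammer's alternation between listening and jamming phases, consider the toy loop below; the activity model, smoothing factor, and all constants are assumptions for illustration, and the actual attack in the paper is learned with deep RL rather than hand-coded.

```python
import random

def jammer_episode(num_channels=8, steps=100, listen_every=5, jam_k=2):
    """Toy sketch of the listening/jamming alternation: on a listening step
    the jammer refreshes per-channel activity estimates; otherwise it jams
    the currently busiest channels. All numbers and the simulated activity
    model are illustrative assumptions, not the paper's optimized policy."""
    activity = [0.0] * num_channels
    jam_counts = [0] * num_channels
    for t in range(steps):
        if t % listen_every == 0:
            observed = [random.random() for _ in range(num_channels)]  # stand-in for sensing
            activity = [0.9 * a + 0.1 * o for a, o in zip(activity, observed)]
        else:
            targets = sorted(range(num_channels), key=lambda c: activity[c],
                             reverse=True)[:jam_k]
            for c in targets:
                jam_counts[c] += 1  # interference would be transmitted here
    return jam_counts

print(jammer_episode())
```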

Scale-free networks: improved inference

  • paper_url: http://arxiv.org/abs/2311.11200
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Nixon Jerez-Lillo, Francisco A. Rodrigues, Pedro L. Ramos
  • for: The paper investigates whether a network's degree distribution follows a power-law distribution and proposes improved Bayesian inference methods that yield accurate parameter estimates and credibility intervals.
  • methods: Bayesian inference methods, derived for both continuous and discrete distributions, that provide accurate model parameter estimates and precise credibility intervals.
  • results: The objective Bayesian approach yields nearly unbiased parameter estimates. Applied to degree distributions of more than 5,000 synthetic networks and over 3,000 real networks, the method proves more suitable in practice, with an acceptance frequency close to the specified nominal level.
    Abstract The power-law distribution plays a crucial role in complex networks as well as various applied sciences. Investigating whether the degree distribution of a network follows a power-law distribution is an important concern. The commonly used inferential methods for estimating the model parameters often yield biased estimates, which can lead to the rejection of the hypothesis that a model conforms to a power-law. In this paper, we discuss improved methods that utilize Bayesian inference to obtain accurate estimates and precise credibility intervals. The inferential methods are derived for both continuous and discrete distributions. These methods reveal that objective Bayesian approaches return nearly unbiased estimates for the parameters of both models. Notably, in the continuous case, we identify an explicit posterior distribution. This work enhances the power of goodness-of-fit tests, enabling us to accurately discern whether a network or any other dataset adheres to a power-law distribution. We apply the proposed approach to fit degree distributions for more than 5,000 synthetic networks and over 3,000 real networks. The results indicate that our method is more suitable in practice, as it yields a frequency of acceptance close to the specified nominal level.
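For the continuous case with a known lower cutoff, the Bayesian fit admits an explicit posterior. The sketch below uses a Gamma prior on the exponent shift (an illustrative choice, not necessarily the objective prior studied in the paper) to obtain a closed-form Gamma posterior, a point estimate, and a credibility interval.

```python
import numpy as np
from scipy import stats

def powerlaw_posterior(x, x_min, a0=1.0, b0=1e-3):
    """Sketch: Bayesian inference for the continuous power-law (Pareto)
    exponent with known x_min. With a Gamma(a0, b0) prior on (alpha - 1),
    the likelihood (alpha - 1)^n * exp(-(alpha - 1) * S), where
    S = sum(log(x / x_min)), gives a Gamma posterior. The prior is an
    illustrative choice, not the paper's objective prior."""
    x = np.asarray(x, dtype=float)
    x = x[x >= x_min]
    n = x.size
    S = np.log(x / x_min).sum()
    post = stats.gamma(a=a0 + n, scale=1.0 / (b0 + S))  # posterior of alpha - 1
    mean_alpha = 1.0 + post.mean()
    ci = 1.0 + np.array(post.interval(0.95))             # 95% credibility interval
    return mean_alpha, ci

# Hypothetical usage: fit synthetic Pareto data with true alpha = 2.5.
rng = np.random.default_rng(0)
data = 1.0 * (1.0 - rng.random(2000)) ** (-1.0 / (2.5 - 1.0))
print(powerlaw_posterior(data, x_min=1.0))
```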

Testing with Non-identically Distributed Samples

  • paper_url: http://arxiv.org/abs/2311.11194
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Shivam Garg, Chirag Pabbaraju, Kirankumar Shiragur, Gregory Valiant
  • for: The paper studies sublinear-sample property testing and estimation when samples are drawn independently but not identically.
  • methods: A distributional property testing framework in which, for distributions $\textbf{p}_1, \textbf{p}_2, \ldots, \textbf{p}_T$ over a discrete support of size $k$, $c$ independent samples are drawn from each $\textbf{p}_i$, and properties of the average distribution $\textbf{p}_{\mathrm{avg}}$ are learned or tested.
  • results: For $c=1$, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance, while testing uniformity or identity requires a number of samples linear in $k$. For $c \ge 2$, $O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ samples suffice, matching the optimal i.i.d. sample complexity in the regime $\varepsilon \ge k^{-1/4}$. Moreover, for $c=2$ there is a constant $\rho > 0$ such that even with $\rho k$ samples, no tester that only sees the multiset of samples (ignoring which samples came from the same $\textbf{p}_i$) can perform uniformity testing.
    Abstract We examine the extent to which sublinear-sample property testing and estimation applies to settings where samples are independently but not identically distributed. Specifically, we consider the following distributional property testing framework: Suppose there is a set of distributions over a discrete support of size $k$, $\textbf{p}_1, \textbf{p}_2,\ldots,\textbf{p}_T$, and we obtain $c$ independent draws from each distribution. Suppose the goal is to learn or test a property of the average distribution, $\textbf{p}_{\mathrm{avg}}$. This setup models a number of important practical settings where the individual distributions correspond to heterogeneous entities -- either individuals, chronologically distinct time periods, spatially separated data sources, etc. From a learning standpoint, even with $c=1$ samples from each distribution, $\Theta(k/\varepsilon^2)$ samples are necessary and sufficient to learn $\textbf{p}_{\mathrm{avg}}$ to within error $\varepsilon$ in TV distance. To test uniformity or identity -- distinguishing the case that $\textbf{p}_{\mathrm{avg}}$ is equal to some reference distribution, versus has $\ell_1$ distance at least $\varepsilon$ from the reference distribution, we show that a linear number of samples in $k$ is necessary given $c=1$ samples from each distribution. In contrast, for $c \ge 2$, we recover the usual sublinear sample testing of the i.i.d. setting: we show that $O(\sqrt{k}/\varepsilon^2 + 1/\varepsilon^4)$ samples are sufficient, matching the optimal sample complexity in the i.i.d. case in the regime where $\varepsilon \ge k^{-1/4}$. Additionally, we show that in the $c=2$ case, there is a constant $\rho > 0$ such that even in the linear regime with $\rho k$ samples, no tester that considers the multiset of samples (ignoring which samples were drawn from the same $\textbf{p}_i$) can perform uniformity testing.
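To make the sampling model concrete, here is a small Python sketch of the pooled estimate of $\textbf{p}_{\mathrm{avg}}$ and of a collision statistic that uses the grouping of samples by source, which the abstract shows is essential for sublinear uniformity testing when $c = 2$. The function names and the tiny example are hypothetical.

```python
import numpy as np

def empirical_p_avg(samples):
    """Pooled empirical estimate of the average distribution p_avg: with c
    draws from each p_i over support {0, ..., k-1}, the pooled histogram is
    an unbiased estimate of p_avg. `samples` is a list of integer arrays,
    samples[i] holding the draws from p_i; k is inferred from the data."""
    pooled = np.concatenate([np.asarray(s) for s in samples])
    counts = np.bincount(pooled, minlength=int(pooled.max()) + 1)
    return counts / counts.sum()

def same_source_collision_rate(samples):
    """For c = 2 draws per distribution, count collisions only within pairs
    drawn from the same p_i. This is the kind of statistic that exploits
    the grouping information which, per the abstract, multiset-only testers
    cannot use. Illustrative sketch, not the paper's exact tester."""
    pairs = [(s[0], s[1]) for s in samples if len(s) >= 2]
    return sum(int(a == b) for a, b in pairs) / max(len(pairs), 1)

# Hypothetical usage: T = 4 sources, c = 2 draws each, support size k = 3.
draws = [np.array([0, 0]), np.array([1, 2]), np.array([2, 2]), np.array([1, 0])]
print(empirical_p_avg(draws), same_source_collision_rate(draws))
```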