cs.LG - 2023-10-08

Adversarial Attacks on Combinatorial Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2310.05308
  • repo_url: None
  • paper_authors: Rishab Balasubramanian, Jiawei Li, Prasad Tadepalli, Huazheng Wang, Qingyun Wu, Haoyu Zhao
  • for: This paper studies reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB).
  • methods: The paper provides a sufficient and necessary condition for the attackability of CMAB, along with an attack algorithm for attackable CMAB instances.
  • results: The paper reveals the surprising fact that the attackability of a CMAB instance also depends on whether the environment is known or unknown to the adversary; this implies that attacks on CMAB are difficult in practice and that no general attack strategy exists. The theoretical findings are validated via experiments.
    Abstract We study reward poisoning attacks on Combinatorial Multi-armed Bandits (CMAB). We first provide a sufficient and necessary condition for the attackability of CMAB, which depends on the intrinsic properties of the corresponding CMAB instance such as the reward distributions of super arms and outcome distributions of base arms. Additionally, we devise an attack algorithm for attackable CMAB instances. Contrary to prior understanding of multi-armed bandits, our work reveals a surprising fact that the attackability of a specific CMAB instance also depends on whether the bandit instance is known or unknown to the adversary. This finding indicates that adversarial attacks on CMAB are difficult in practice and a general attack strategy for any CMAB instance does not exist since the environment is mostly unknown to the adversary. We validate our theoretical findings via extensive experiments on real-world CMAB applications including probabilistic maximum covering problem, online minimum spanning tree, cascading bandits for online ranking, and online shortest path.

Successive Data Injection in Conditional Quantum GAN Applied to Time Series Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.05307
  • repo_url: None
  • paper_authors: Benjamin Kalfon, Soumaya Cherkaoui, Jean-Frédéric Laprade, Ola Ahmad, Shengrui Wang
  • for: This paper addresses anomaly detection with quantum generative adversarial networks (QGANs), particularly on time series data collected from communication networks.
  • methods: The paper proposes a new high-dimensional encoding approach, named Successive Data Injection (SuDaI), which explores a larger portion of the quantum state through repeated data injections, adapting QGANs to much higher-dimensional time series data.
  • results: The method enables anomaly detection on high-dimensional time series and also applies to other types of high-dimensional time series beyond anomaly detection and QGANs, opening up multiple fields of application.
    Abstract Classical GAN architectures have shown interesting results for solving anomaly detection problems in general and for time series anomalies in particular, such as those arising in communication networks. In recent years, several quantum GAN architectures have been proposed in the literature. When detecting anomalies in time series using QGANs, huge challenges arise due to the limited number of qubits compared to the size of the data. To address these challenges, we propose a new high-dimensional encoding approach, named Successive Data Injection (SuDaI). In this approach, we explore a larger portion of the quantum state than that in the conventional angle encoding, the method used predominantly in the literature, through repeated data injections into the quantum state. SuDaI encoding allows us to adapt the QGAN for anomaly detection with network data of a much higher dimensionality than with the existing known QGANs implementations. In addition, SuDaI encoding applies to other types of high-dimensional time series and can be used in contexts beyond anomaly detection and QGANs, opening up therefore multiple fields of application.

Clustering Three-Way Data with Outliers

  • paper_url: http://arxiv.org/abs/2310.05288
  • repo_url: None
  • paper_authors: Katharine M. Clark, Paul D. McNicholas
  • for: clustering matrix-variate normal data with outliers
  • methods: uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data, and applies an iterative approach to detect and trim outliers
  • results: effectively detects and trims outliers in matrix-variate normal data
    Abstract Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.
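To illustrate the iterative detect-and-trim idea in a simplified setting, the sketch below fits a single Gaussian to vectorised data and repeatedly removes the lowest-log-likelihood point. This is only an analogue of the approach: the paper works with matrix-variate normal mixtures and the distribution of subset log-likelihoods, and the function name and trimming budget here are illustrative.

```python
# Minimal sketch of iterative outlier trimming via log-likelihoods, assuming a single
# Gaussian component and vectorised (not matrix-variate) data; the paper's OCLUST
# extension works with matrix-variate normal mixtures and subset log-likelihoods.
import numpy as np
from scipy.stats import multivariate_normal

def trim_outliers(X, n_trim=5):
    """Iteratively remove the point with the lowest fitted log-likelihood."""
    X = np.asarray(X, dtype=float)
    for _ in range(n_trim):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        ll = multivariate_normal.logpdf(X, mean=mu, cov=cov)
        X = np.delete(X, np.argmin(ll), axis=0)  # trim the most outlying point
    return X

rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 4))
outliers = rng.normal(loc=8.0, size=(5, 4))
trimmed = trim_outliers(np.vstack([clean, outliers]), n_trim=5)
print(trimmed.shape)  # (200, 4)
```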

Learning force laws in many-body systems

  • paper_url: http://arxiv.org/abs/2310.05273
  • repo_url: https://github.com/wyu54/many-body-force-infer
  • paper_authors: Wentao Yu, Eslam Abdelaleem, Ilya Nemenman, Justin C. Burton
  • for: The paper is written to demonstrate a machine learning (ML) approach for discovering force laws in dusty plasma experiments.
  • methods: The paper uses 3D particle trajectories to train an ML model that incorporates physical intuition to infer the effective non-reciprocal forces between particles, accounting for inherent symmetries and non-identical particles.
  • results: The model accurately learns the force laws and extracts each particle’s mass and charge with an accuracy of R^2 > 0.99, indicating new physics in dusty plasma beyond the resolution of current theories and demonstrating the potential of ML-powered approaches for guiding new routes of scientific discovery in many-body systems.
    Abstract Scientific laws describing natural systems may be more complex than our intuition can handle, and thus how we discover laws must change. Machine learning (ML) models can analyze large quantities of data, but their structure should match the underlying physical constraints to provide useful insight. Here we demonstrate a ML approach that incorporates such physical intuition to infer force laws in dusty plasma experiments. Trained on 3D particle trajectories, the model accounts for inherent symmetries and non-identical particles, accurately learns the effective non-reciprocal forces between particles, and extracts each particle's mass and charge. The model's accuracy (R^2 > 0.99) points to new physics in dusty plasma beyond the resolution of current theories and demonstrates how ML-powered approaches can guide new routes of scientific discovery in many-body systems.

Simplifying GNN Performance with Low Rank Kernel Models

  • paper_url: http://arxiv.org/abs/2310.05250
  • repo_url: https://github.com/lucianoavinas/lowrank-gnn-kernels
  • paper_authors: Luciano Vinas, Arash A. Amini
  • for: This work revisits recent spectral GNN approaches to semi-supervised node classification (SSNC), positing that many current GNN architectures may be over-engineered.
  • methods: Simpler, traditional nonparametric estimation techniques are applied in the spectral domain in place of many deep-learning-inspired GNN designs; these conventional techniques suit a variety of graph types and reach state-of-the-art performance on many common SSNC benchmarks.
  • results: The study shows that recent performance improvements in GNN approaches may be partially attributed to shifts in evaluation conventions, and an ablative study is conducted on the hyperparameters of GNN spectral filtering techniques. Code is available at https://github.com/lucianoAvinas/lowrank-gnn-kernels.
    Abstract We revisit recent spectral GNN approaches to semi-supervised node classification (SSNC). We posit that many of the current GNN architectures may be over-engineered. Instead, simpler, traditional methods from nonparametric estimation, applied in the spectral domain, could replace many deep-learning inspired GNN designs. These conventional techniques appear to be well suited for a variety of graph types reaching state-of-the-art performance on many of the common SSNC benchmarks. Additionally, we show that recent performance improvements in GNN approaches may be partially attributed to shifts in evaluation conventions. Lastly, an ablative study is conducted on the various hyperparameters associated with GNN spectral filtering techniques. Code available at: https://github.com/lucianoAvinas/lowrank-gnn-kernels
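As a rough illustration of "nonparametric estimation in the spectral domain", the sketch below projects node labels onto the leading eigenvectors of a symmetrically normalised adjacency matrix and fits ridge regression on those spectral features. It is not the exact model from the lowrank-gnn-kernels repository; the rank, regularisation, and normalisation choices are assumptions.

```python
# Minimal sketch of spectral, low-rank node classification: project labels onto the
# leading eigenvectors of a normalised adjacency matrix and fit ridge regression.
import numpy as np

def spectral_ridge_ssnc(A, y_onehot, train_mask, rank=16, lam=1e-2):
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]    # normalised adjacency
    evals, evecs = np.linalg.eigh(S)
    U = evecs[:, -rank:]                                  # top-`rank` spectral features
    Ut, Yt = U[train_mask], y_onehot[train_mask]
    W = np.linalg.solve(Ut.T @ Ut + lam * np.eye(rank), Ut.T @ Yt)  # ridge fit
    return (U @ W).argmax(axis=1)                         # predicted class per node

# toy demo: two 4-cliques joined by one edge, one labelled node per clique
A = np.zeros((8, 8))
edges = [(a, b) for a in range(4) for b in range(a)]
edges += [(a, b) for a in range(4, 8) for b in range(4, a)] + [(3, 4)]
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
Y = np.zeros((8, 2)); Y[0, 0] = 1.0; Y[7, 1] = 1.0
mask = np.array([True] + [False] * 6 + [True])
print(spectral_ridge_ssnc(A, Y, mask, rank=4))
```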

Enhancing Kernel Flexibility via Learning Asymmetric Locally-Adaptive Kernels

  • paper_url: http://arxiv.org/abs/2310.05236
  • repo_url: https://github.com/hefansjtu/labrbf_kernel
  • paper_authors: Fan He, Mingzhen He, Lei Shi, Xiaolin Huang, Johan A. K. Suykens
  • for: The paper aims to enhance the flexibility of kernel-based learning by augmenting Radial Basis Function (RBF) kernels with trainable Locally-Adaptive Bandwidths (LAB).
  • methods: It establishes an asymmetric kernel ridge regression framework and introduces an iterative kernel learning algorithm to train the locally adaptive bandwidths.
  • results: Experiments on real datasets show remarkable performance, with better scalability on large-scale data than Nyström approximation-based algorithms, higher regression accuracy than existing kernel-based learning methods, and results that even surpass residual neural networks.
    Abstract The lack of sufficient flexibility is the key bottleneck of kernel-based learning that relies on manually designed, pre-given, and non-trainable kernels. To enhance kernel flexibility, this paper introduces the concept of Locally-Adaptive-Bandwidths (LAB) as trainable parameters to enhance the Radial Basis Function (RBF) kernel, giving rise to the LAB RBF kernel. The parameters in LAB RBF kernels are data-dependent, and its number can increase with the dataset, allowing for better adaptation to diverse data patterns and enhancing the flexibility of the learned function. This newfound flexibility also brings challenges, particularly with regards to asymmetry and the need for an efficient learning algorithm. To address these challenges, this paper for the first time establishes an asymmetric kernel ridge regression framework and introduces an iterative kernel learning algorithm. This novel approach not only reduces the demand for extensive support data but also significantly improves generalization by training bandwidths on the available training data. Experimental results on real datasets underscore the remarkable performance of the proposed algorithm, showcasing its superior capability in handling large-scale datasets compared to Nystr\"om approximation-based algorithms. Moreover, it demonstrates a significant improvement in regression accuracy over existing kernel-based learning methods and even surpasses residual neural networks.
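A minimal sketch of the kernel's key ingredient is given below: an RBF kernel whose bandwidth depends on the support (centre) point only, which makes K(X, Z) asymmetric, plugged into a ridge-regression solve. The bandwidths are fixed here for illustration; the paper's contribution is the iterative algorithm that trains them, which is omitted.

```python
# Minimal sketch of an asymmetric, locally-adaptive-bandwidth RBF kernel used inside
# ridge regression. The per-support-point bandwidths `theta` are what the paper trains
# iteratively; here they are simply fixed to illustrate the kernel's asymmetry.
import numpy as np

def lab_rbf(X, Z, theta):
    # K[i, j] = exp(-theta_j^2 * ||x_i - z_j||^2): the bandwidth depends on the support
    # point z_j only, so K(X, Z) != K(Z, X).T in general.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-(theta[None, :] ** 2) * d2)

rng = np.random.default_rng(0)
Z = rng.uniform(-3, 3, size=(30, 1))          # support (centre) points
theta = rng.uniform(0.5, 2.0, size=30)        # locally adaptive bandwidths (trainable in the paper)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

K = lab_rbf(X, Z, theta)                      # 200 x 30 asymmetric design matrix
alpha = np.linalg.solve(K.T @ K + 1e-3 * np.eye(30), K.T @ y)   # ridge solution
y_hat = K @ alpha
print(np.mean((y_hat - y) ** 2))
```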

Global Convergence of Policy Gradient Methods in Reinforcement Learning, Games and Control

  • paper_url: http://arxiv.org/abs/2310.05230
  • repo_url: None
  • paper_authors: Shicong Cen, Yuejie Chi
  • for: Policy gradient methods for policy optimization in sequential decision making across reinforcement learning, games, and control.
  • methods: The policy of interest is sought by maximizing value functions using first-order information.
  • results: Recent progress includes global optimality guarantees for policy gradient methods and finite-time convergence rates with respect to salient problem parameters.
    Abstract Policy gradient methods, where one searches for the policy of interest by maximizing the value functions using first-order information, become increasingly popular for sequential decision making in reinforcement learning, games, and control. Guaranteeing the global optimality of policy gradient methods, however, is highly nontrivial due to nonconcavity of the value functions. In this exposition, we highlight recent progresses in understanding and developing policy gradient methods with global convergence guarantees, putting an emphasis on their finite-time convergence rates with regard to salient problem parameters.
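As a tiny, concrete instance of the first-order idea surveyed here, the sketch below runs exact gradient ascent on the expected reward of a softmax policy over a three-armed bandit; it is meant only to make the update explicit, not to represent any particular algorithm analysed in the exposition.

```python
# Gradient ascent on the expected reward of a softmax policy over a 3-armed bandit,
# using the exact policy gradient. Real settings replace this with sampled returns
# over a Markov decision process.
import numpy as np

r = np.array([1.0, 0.2, -0.5])     # per-arm rewards (the "value" being maximised)
theta = np.zeros(3)                # softmax policy parameters
lr = 0.5

for t in range(200):
    pi = np.exp(theta - theta.max())
    pi /= pi.sum()
    value = pi @ r
    grad = pi * (r - value)        # d/d theta_a of E_pi[r] for the softmax parameterisation
    theta += lr * grad             # first-order (gradient ascent) update

print(np.round(pi, 3))             # probability mass concentrates on the best arm
```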

Accelerating Machine Learning Primitives on Commodity Hardware

  • paper_url: http://arxiv.org/abs/2310.05218
  • repo_url: None
  • paper_authors: Roman Snytsar
  • for: This paper studies the Sliding Window convolution technique for Deep Neural Networks (DNNs) as a more efficient alternative to the commonly used GEMM-based convolution.
  • methods: Pooling and convolution primitives are expressed as sliding window sums evaluated by compute kernels with a shared structure, and the performance of a range of implementations, including custom kernels for specific filter sizes, is evaluated.
  • results: The Sliding Window technique addresses the memory bloating problem and demonstrates a significant speedup in 2-D convolution, outperforming GEMM-based convolution on CPUs and even on dedicated hardware accelerators; this could promote wider adoption of AI on low-power and low-memory devices without specialized hardware.
    Abstract Sliding Window Sum algorithms have been successfully used for training and inference of Deep Neural Networks. We have shown before how both pooling and convolution 1-D primitives could be expressed as sliding sums and evaluated by the compute kernels with a shared structure. In this paper, we present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs). The Sliding Window technique addresses the memory bloating problem and demonstrates a significant speedup in 2-D convolution. We explore the performance of this technique on a range of implementations, including custom kernels for specific filter sizes. Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators. This could promote a wider adoption of AI on low-power and low-memory devices without the need for specialized hardware. We also discuss the compatibility of model compression methods and optimized network architectures with the Sliding Window technique, encouraging further research in these areas.
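A reference sketch of the sliding-window idea for 1-D convolution is shown below: each output is computed directly as a windowed dot product rather than via an im2col matrix and GEMM, and the result is checked against np.convolve. It illustrates the computation pattern only, not the paper's vectorised CPU or accelerator kernels.

```python
# 1-D convolution as a sliding window sum: each output element is a windowed dot
# product, computed directly rather than by materialising an im2col matrix and
# calling GEMM. Reference implementation only.
import numpy as np

def sliding_window_conv1d(x, w):
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

rng = np.random.default_rng(0)
x = rng.normal(size=64)
w = rng.normal(size=5)

out = sliding_window_conv1d(x, w)
ref = np.convolve(x, w[::-1], mode="valid")   # cross-correlation via np.convolve, for checking
print(np.allclose(out, ref))                  # True
```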

Towards Optimizing with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.05204
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Pei-Fu Guo, Ying-Hsuan Chen, Yun-Da Tsai, Shou-De Lin
  • for: This work assesses the optimization capabilities of LLMs across various tasks and data sizes.
  • methods: Interactive prompting is used: at each optimization step, the LLM generates new solutions from previously generated solutions and their values, and the new solutions are then evaluated and considered in the next step. Three metrics are introduced for a comprehensive assessment of task performance from various perspectives; they apply across a broad spectrum of optimization tasks and are less sensitive to variations in test samples.
  • results: LLMs exhibit strong optimization capabilities on small-sized samples, but their performance is significantly influenced by factors like data size and values, underscoring the need for further research on optimization tasks for LLMs.
    Abstract In this work, we conduct an assessment of the optimization capabilities of LLMs across various tasks and data sizes. Each of these tasks corresponds to unique optimization domains, and LLMs are required to execute these tasks with interactive prompting. That is, in each optimization step, the LLM generates new solutions from the past generated solutions with their values, and then the new solutions are evaluated and considered in the next optimization step. Additionally, we introduce three distinct metrics for a comprehensive assessment of task performance from various perspectives. These metrics offer the advantage of being applicable for evaluating LLM performance across a broad spectrum of optimization tasks and are less sensitive to variations in test samples. By applying these metrics, we observe that LLMs exhibit strong optimization capabilities when dealing with small-sized samples. However, their performance is significantly influenced by factors like data size and values, underscoring the importance of further research in the domain of optimization tasks for LLMs.
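The interactive prompting loop described here can be sketched as follows; the prompt wording, the toy objective, and the query_llm helper are hypothetical placeholders standing in for the paper's actual prompts and whatever LLM client is used.

```python
# Sketch of the interactive prompting loop: at every step the LLM sees previously
# generated solutions with their objective values and proposes a new one, which is
# scored and appended to the history. `query_llm` is a hypothetical placeholder.
def objective(x):                      # toy black-box objective to be maximised
    return -(x - 3.0) ** 2

def query_llm(prompt: str) -> str:     # hypothetical LLM call; replace with a real client
    raise NotImplementedError

def llm_optimize(steps=10):
    history = [(0.0, objective(0.0))]  # (solution, value) pairs
    for _ in range(steps):
        past = "\n".join(f"x={x:.3f}, f(x)={v:.3f}" for x, v in history)
        prompt = (
            "Here are previous solutions and their scores:\n"
            f"{past}\n"
            "Propose a new value of x that achieves a higher score. Reply with a number only."
        )
        x_new = float(query_llm(prompt))
        history.append((x_new, objective(x_new)))
    return max(history, key=lambda p: p[1])
```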

Lifelong Learning for Fog Load Balancing: A Transfer Learning Approach

  • paper_url: http://arxiv.org/abs/2310.05187
  • repo_url: None
  • paper_authors: Maad Ebrahim, Abdelhakim Senhaji Hafid, Mohamed Riduan Abid
  • for: This paper improves privacy-aware Reinforcement Learning (RL) agents for load balancing (LB) in fog computing environments to optimize overall system performance.
  • methods: Privacy-aware RL agents minimize the waiting delay of IoT applications without observing per-node queue lengths or compute capabilities, within a proposed lifelong learning framework that uses lightweight inference models during deployment and Transfer Learning (TL) to reduce training cost and adapt to environmental changes.
  • results: Experiments show that TL substantially reduces the agents' training time and failure probability compared to learning from scratch, while keeping the agents robust to environmental changes.
    Abstract Fog computing emerged as a promising paradigm to address the challenges of processing and managing data generated by the Internet of Things (IoT). Load balancing (LB) plays a crucial role in Fog computing environments to optimize the overall system performance. It requires efficient resource allocation to improve resource utilization, minimize latency, and enhance the quality of service for end-users. In this work, we improve the performance of privacy-aware Reinforcement Learning (RL) agents that optimize the execution delay of IoT applications by minimizing the waiting delay. To maintain privacy, these agents optimize the waiting delay by minimizing the change in the number of queued requests in the whole system, i.e., without explicitly observing the actual number of requests that are queued in each Fog node nor observing the compute resource capabilities of those nodes. Besides improving the performance of these agents, we propose in this paper a lifelong learning framework for these agents, where lightweight inference models are used during deployment to minimize action delay and only retrained in case of significant environmental changes. To improve the performance, minimize the training cost, and adapt the agents to those changes, we explore the application of Transfer Learning (TL). TL transfers the knowledge acquired from a source domain and applies it to a target domain, enabling the reuse of learned policies and experiences. TL can be also used to pre-train the agent in simulation before fine-tuning it in the real environment; this significantly reduces failure probability compared to learning from scratch in the real environment. To our knowledge, there are no existing efforts in the literature that use TL to address lifelong learning for RL-based Fog LB; this is one of the main obstacles in deploying RL LB solutions in Fog systems.

Unified speech and gesture synthesis using flow matching

  • paper_url: http://arxiv.org/abs/2310.05181
  • repo_url: None
  • paper_authors: Shivam Mehta, Ruibo Tu, Simon Alexanderson, Jonas Beskow, Éva Székely, Gustav Eje Henter
  • for: This paper presents a unified multimodal synthesis method that jointly generates speech and co-speech gesture motion from text.
  • methods: The model is trained using optimal-transport conditional flow matching (OT-CFM); the architecture is simpler than the previous state of the art, has a smaller memory footprint, and captures the joint distribution of speech and gestures, generating both modalities in a single process.
  • results: Uni- and multimodal subjective tests demonstrate improved speech naturalness, gesture human-likeness, and cross-modal appropriateness compared to existing benchmarks, with better synthesis quality in far fewer network evaluations.
    Abstract As text-to-speech technologies achieve remarkable naturalness in read-aloud tasks, there is growing interest in multimodal synthesis of verbal and non-verbal communicative behaviour, such as spontaneous speech and associated body gestures. This paper presents a novel, unified architecture for jointly synthesising speech acoustics and skeleton-based 3D gesture motion from text, trained using optimal-transport conditional flow matching (OT-CFM). The proposed architecture is simpler than the previous state of the art, has a smaller memory footprint, and can capture the joint distribution of speech and gestures, generating both modalities together in one single process. The new training regime, meanwhile, enables better synthesis quality in much fewer steps (network evaluations) than before. Uni- and multimodal subjective tests demonstrate improved speech naturalness, gesture human-likeness, and cross-modal appropriateness compared to existing benchmarks.
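A minimal sketch of a conditional flow-matching training step is given below: noise and data samples are interpolated along a straight path and a network is regressed onto the path's velocity. The paper's OT-CFM additionally uses an optimal-transport coupling and conditions on text to generate speech and gesture jointly; those parts, and the actual architecture, are not reproduced here.

```python
# Conditional flow-matching training step: sample noise x0 and data x1, interpolate
# x_t = (1 - t) * x0 + t * x1, and regress the network onto the target velocity x1 - x0.
import torch
import torch.nn as nn

dim = 16
net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def cfm_step(x1):
    x0 = torch.randn_like(x1)                       # noise sample
    t = torch.rand(x1.shape[0], 1)                  # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                      # point on the straight path
    target_v = x1 - x0                              # velocity of that path
    pred_v = net(torch.cat([xt, t], dim=-1))        # network predicts the velocity field
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

data = torch.randn(256, dim) * 0.5 + 2.0            # stand-in for (speech, gesture) features
for epoch in range(5):
    print(cfm_step(data))
```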

Distributional Reinforcement Learning with Online Risk-awareness Adaption

  • paper_url: http://arxiv.org/abs/2310.05179
  • repo_url: None
  • paper_authors: Yupeng Wu, Wenjie Huang
  • for: This work proposes a new distributional RL framework that dynamically adapts the level of epistemic risk during learning, enabling reliable optimal policies in safety-critical environments and avoiding the sub-optimality of a static risk level.
  • methods: Building on distributional RL, the framework quantifies aleatory and epistemic uncertainties compositely and dynamically selects epistemic risk levels by solving a total variation minimization problem online; the selection is carried out via grid search with a Follow-The-Leader type algorithm, whose offline oracle relates to a "satisficing measure" under a special modification of the loss function.
  • results: Across multiple classes of tasks, DRL-ORA outperforms methods that rely on a fixed risk level or manually predetermined risk level adaptation, and the simplicity of the modifications allows the framework to be incorporated into most RL algorithm variants.
    Abstract The use of reinforcement learning (RL) in practical applications requires considering sub-optimal outcomes, which depend on the agent's familiarity with the uncertain environment. Dynamically adjusting the level of epistemic risk over the course of learning can tactically achieve reliable optimal policy in safety-critical environments and tackle the sub-optimality of a static risk level. In this work, we introduce a novel framework, Distributional RL with Online Risk Adaption (DRL-ORA), which can quantify the aleatory and epistemic uncertainties compositely and dynamically select the epistemic risk levels via solving a total variation minimization problem online. The risk level selection can be efficiently achieved through grid search using a Follow-The-Leader type algorithm, and its offline oracle is related to "satisficing measure" (in the decision analysis community) under a special modification of the loss function. We show multiple classes of tasks where DRL-ORA outperforms existing methods that rely on either a fixed risk level or manually predetermined risk level adaption. Given the simplicity of our modifications, we believe the framework can be easily incorporated into most RL algorithm variants.

Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity

  • paper_url: http://arxiv.org/abs/2310.05175
  • repo_url: https://github.com/luuyin/owl
  • paper_authors: Lu Yin, You Wu, Zhenyu Zhang, Cheng-Yu Hsieh, Yaqing Wang, Yiling Jia, Mykola Pechenizkiy, Yi Liang, Zhangyang Wang, Shiwei Liu
  • for: The paper aims to improve the practical deployment of large language models (LLMs) by applying traditional network pruning techniques.
  • methods: The paper introduces a novel LLM pruning methodology called Outlier Weighed Layerwise sparsity (OWL), which incorporates non-uniform layerwise sparsity ratios tailored for LLM pruning.
  • results: The paper demonstrates the distinct advantages offered by OWL over previous methods, surpassing the state-of-the-art Wanda and SparseGPT by 61.22 and 6.80 perplexity, respectively, at a high sparsity level of 70%.
    Abstract Large Language Models (LLMs), renowned for their remarkable performance, present a challenge due to their colossal model size when it comes to practical deployment. In response to this challenge, efforts have been directed toward the application of traditional network pruning techniques to LLMs, uncovering a massive number of parameters can be pruned in one-shot without hurting performance. Building upon insights gained from pre-LLM models, prevailing LLM pruning strategies have consistently adhered to the practice of uniformly pruning all layers at equivalent sparsity. However, this observation stands in contrast to the prevailing trends observed in the field of vision models, where non-uniform layerwise sparsity typically yields substantially improved results. To elucidate the underlying reasons for this disparity, we conduct a comprehensive analysis of the distribution of token features within LLMs. In doing so, we discover a strong correlation with the emergence of outliers, defined as features exhibiting significantly greater magnitudes compared to their counterparts in feature dimensions. Inspired by this finding, we introduce a novel LLM pruning methodology that incorporates a tailored set of non-uniform layerwise sparsity ratios specifically designed for LLM pruning, termed as Outlier Weighed Layerwise sparsity (OWL). The sparsity ratio of OWL is directly proportional to the outlier ratio observed within each layer, facilitating a more effective alignment between layerwise weight sparsity and outlier ratios. Our empirical evaluation, conducted across the LLaMA-V1 family and OPT, spanning various benchmarks, demonstrates the distinct advantages offered by OWL over previous methods. For instance, our approach exhibits a remarkable performance gain, surpassing the state-of-the-art Wanda and SparseGPT by 61.22 and 6.80 perplexity at a high sparsity level of 70%, respectively.
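A rough sketch of the OWL recipe is given below: compute a per-layer outlier ratio (fraction of weights whose magnitude exceeds several times the layer mean), then assign non-uniform layerwise sparsities so that outlier-heavy layers are pruned less, before magnitude pruning each layer. The threshold multiplier, the deviation clipping, and the use of plain magnitude scores instead of Wanda/SparseGPT-style scores are simplifying assumptions, not the paper's exact procedure.

```python
# Sketch of outlier-weighed layerwise sparsity: layers with more outliers get lower
# sparsity, approximately preserving the target average sparsity, followed by simple
# magnitude pruning. Threshold M, clipping, and scoring are illustrative choices.
import numpy as np

def owl_sparsities(layers, target_sparsity=0.7, M=5.0, max_dev=0.08):
    ratios = np.array([np.mean(np.abs(W) > M * np.abs(W).mean()) for W in layers])
    dev = ratios - ratios.mean()
    dev = np.clip(dev / (np.abs(dev).max() + 1e-12) * max_dev, -max_dev, max_dev)
    return target_sparsity - dev            # more outliers -> prune that layer less

def magnitude_prune(W, sparsity):
    k = int(sparsity * W.size)
    thresh = np.partition(np.abs(W).ravel(), k)[k]
    return W * (np.abs(W) >= thresh)

rng = np.random.default_rng(0)
layers = [rng.normal(size=(256, 256)) for _ in range(3)]
layers[2][:, :8] *= 20.0                    # inject outlier features into one layer
for W, s in zip(layers, owl_sparsities(layers)):
    print(round(float(s), 3), round(float(np.mean(magnitude_prune(W, s) == 0.0)), 3))
```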

Investigating the Ability of PINNs To Solve Burgers’ PDE Near Finite-Time BlowUp

  • paper_url: http://arxiv.org/abs/2310.05169
  • repo_url: None
  • paper_authors: Dibyakanti Kumar, Anirbit Mukherjee
  • for: This paper investigates the stability of Physics Informed Neural Networks (PINNs) when solving partial differential equations (PDEs) that can develop finite-time blow-ups.
  • methods: The authors derive generalization bounds for PINNs on Burgers' PDE, in arbitrary dimensions, under conditions that allow for a finite-time blow-up, and test these bounds experimentally against the $\ell_2$-distance of the neurally found surrogate from the true blow-up solution.
  • results: The bounds are significantly correlated with the surrogate's $\ell_2$-distance from the true solution on sequences of PDEs that approach a blow-up, suggesting that PINNs can help detect finite-time blow-ups.
    Abstract Physics Informed Neural Networks (PINNs) have been achieving ever newer feats of solving complicated PDEs numerically while offering an attractive trade-off between accuracy and speed of inference. A particularly challenging aspect of PDEs is that there exist simple PDEs which can evolve into singular solutions in finite time starting from smooth initial conditions. In recent times some striking experiments have suggested that PINNs might be good at even detecting such finite-time blow-ups. In this work, we embark on a program to investigate this stability of PINNs from a rigorous theoretical viewpoint. Firstly, we derive generalization bounds for PINNs for Burgers' PDE, in arbitrary dimensions, under conditions that allow for a finite-time blow-up. Then we demonstrate via experiments that our bounds are significantly correlated to the $\ell_2$-distance of the neurally found surrogate from the true blow-up solution, when computed on sequences of PDEs that are getting increasingly close to a blow-up.
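For context, the sketch below shows the PDE-residual part of a PINN loss for 1-D viscous Burgers' equation, $u_t + u u_x = \nu u_{xx}$, computed with automatic differentiation at random collocation points. Initial/boundary losses, the network size, and the sampling domain are simplified assumptions, and the paper's generalization bounds are not reproduced.

```python
# PINN residual loss for 1-D viscous Burgers' equation u_t + u u_x = nu * u_xx,
# evaluated at random collocation points with autograd. Initial and boundary losses
# are omitted for brevity.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
nu = 0.01

def burgers_residual_loss(n_points=512):
    x = torch.rand(n_points, 1) * 2 - 1                  # x in [-1, 1]
    t = torch.rand(n_points, 1)                          # t in [0, 1]
    xt = torch.cat([x, t], dim=1).requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = grads[:, :1], grads[:, 1:]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][:, :1]
    residual = u_t + u * u_x - nu * u_xx
    return (residual ** 2).mean()

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(100):
    loss = burgers_residual_loss()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```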

A Corrected Expected Improvement Acquisition Function Under Noisy Observations

  • paper_url: http://arxiv.org/abs/2310.05166
  • repo_url: https://github.com/han678/correctednoisyei
  • paper_authors: Han Zhou, Xingchen Ma, Matthew B Blaschko
  • for: The paper addresses a key difficulty in Bayesian optimization: using the expected improvement (EI) policy under noisy observations.
  • methods: It corrects the closed-form EI expression by incorporating the covariance information provided by the Gaussian Process (GP) model, yielding an acquisition function that covers both noisy and noise-free settings and specializes to the classical noise-free result.
  • results: Theoretically, the method achieves a sublinear convergence rate on the cumulative regret bound under heteroscedastic observation noise; empirically, it outperforms EI in the presence of noisy observations on black-box optimization benchmarks and on parameter search for neural network model compression.
    Abstract Sequential maximization of expected improvement (EI) is one of the most widely used policies in Bayesian optimization because of its simplicity and ability to handle noisy observations. In particular, the improvement function often uses the best posterior mean as the best incumbent in noisy settings. However, the uncertainty associated with the incumbent solution is often neglected in many analytic EI-type methods: a closed-form acquisition function is derived in the noise-free setting, but then applied to the setting with noisy observations. To address this limitation, we propose a modification of EI that corrects its closed-form expression by incorporating the covariance information provided by the Gaussian Process (GP) model. This acquisition function specializes to the classical noise-free result, and we argue should replace that formula in Bayesian optimization software packages, tutorials, and textbooks. This enhanced acquisition provides good generality for noisy and noiseless settings. We show that our method achieves a sublinear convergence rate on the cumulative regret bound under heteroscedastic observation noise. Our empirical results demonstrate that our proposed acquisition function can outperform EI in the presence of noisy observations on benchmark functions for black-box optimization, as well as on parameter search for neural network model compression.
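For reference, a standard closed-form EI (for minimisation, with the best posterior mean as incumbent) looks as follows; this is the formula the paper argues should be corrected by incorporating the GP covariance between the candidate and the incumbent under noisy observations, and that corrected expression is not reproduced here.

```python
# Classical closed-form Expected Improvement with the best posterior mean as incumbent.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_mu, xi=0.0):
    """EI for minimisation: improvement = best_mu - mu."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best_mu - mu - xi) / sigma
    return (best_mu - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Example: posterior over 5 candidate points from some fitted GP
mu = np.array([0.3, 0.1, 0.5, -0.2, 0.0])
sigma = np.array([0.05, 0.2, 0.4, 0.3, 0.1])
best_mu = mu.min()
print(expected_improvement(mu, sigma, best_mu))
```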

Transferable Availability Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2310.05141
  • repo_url: https://github.com/trustmlrg/transpoison
  • paper_authors: Yiyong Liu, Michael Backes, Xiao Zhang
  • for: The paper studies availability poisoning attacks, where an adversary crafts small perturbations to training data to degrade the overall test accuracy of a machine learning model.
  • methods: It proposes Transferable Poisoning, which generates high-frequency poisoning perturbations by alternately leveraging gradient information from two algorithms selected from the supervised and unsupervised contrastive learning paradigms.
  • results: The effectiveness of prior poisoning attacks drops sharply when the victim trains with a different learning paradigm, whereas extensive experiments on benchmark image datasets show that the proposed attack produces poisoned samples with significantly improved transferability, applicable to learners and even paradigms beyond the two used to devise the attack.
    Abstract We consider availability data poisoning attacks, where an adversary aims to degrade the overall test accuracy of a machine learning model by crafting small perturbations to its training data. Existing poisoning strategies can achieve the attack goal but assume the victim to employ the same learning method as what the adversary uses to mount the attack. In this paper, we argue that this assumption is strong, since the victim may choose any learning algorithm to train the model as long as it can achieve some targeted performance on clean data. Empirically, we observe a large decrease in the effectiveness of prior poisoning attacks if the victim uses a different learning paradigm to train the model and show marked differences in frequency-level characteristics between perturbations generated with respect to different learners and attack methods. To enhance the attack transferability, we propose Transferable Poisoning, which generates high-frequency poisoning perturbations by alternately leveraging the gradient information with two specific algorithms selected from supervised and unsupervised contrastive learning paradigms. Through extensive experiments on benchmark image datasets, we show that our transferable poisoning attack can produce poisoned samples with significantly improved transferability, not only applicable to the two learners used to devise the attack but also for learning algorithms and even paradigms beyond.

How Graph Neural Networks Learn: Lessons from Training Dynamics in Function Space

  • paper_url: http://arxiv.org/abs/2310.05105
  • repo_url: None
  • paper_authors: Chenxiao Yang, Qitian Wu, David Wipf, Ruoyu Sun, Junchi Yan
  • for: The goal is to characterize the learning behavior of black-box deep models in a more interpretable manner; for graph neural networks (GNNs) in particular, it remains unclear whether and how they learn desired functions during optimization.
  • methods: Using the analytic framework of overparameterization, the authors study GNN learning dynamics in function space and show that the seemingly complicated training process can be re-cast into a more familiar label propagation framework, owing to the graph inductive bias implicit in the process; sparsifying and implementing these dynamics yields a minimalist semi-supervised learning algorithm.
  • results: The analysis explains why learned GNN functions generalize successfully and why they behave pathologically on heterophilic graphs, consistent with observations; the resulting algorithm combines the efficiency of classic algorithms with the effectiveness of modern GNNs.
    Abstract A long-standing goal in deep learning has been to characterize the learning behavior of black-box models in a more interpretable manner. For graph neural networks (GNNs), considerable advances have been made in formalizing what functions they can represent, however it remains less clear whether and how GNNs learn desired functions during the optimization process. To fill this critical gap, we study the learning dynamics of GNNs in function space via the analytic framework of overparameterization. In particular, we find that the seemingly complicated training process of GNNs can be re-cast into a more familiar label propagation framework, due to the graph inductive bias implicit in this process. From this vantage point, we provide explanations for why the learned GNN functions successfully generalize and for their pathological behavior on heterophilic graphs, which are consistent with observations. Practically, sparsifying and implementing the learning dynamics lead to a minimalist semi-supervised learning algorithm with the efficiency of classic algorithms and the effectiveness of modern GNNs.
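For intuition, the kind of label propagation scheme the training dynamics are related to can be sketched as below: labels are repeatedly diffused over a normalised adjacency matrix and re-anchored to the known labels. The paper's precise correspondence (via kernels in function space) is more involved than this toy iteration.

```python
# Label propagation sketch: iterate F <- alpha * S @ F + (1 - alpha) * Y on a
# symmetrically normalised adjacency S.
import numpy as np

def label_propagation(A, Y, alpha=0.9, iters=50):
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    S = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    F = Y.copy()
    for _ in range(iters):
        F = alpha * (S @ F) + (1 - alpha) * Y   # propagate, then re-anchor to known labels
    return F.argmax(axis=1)

# Two triangles joined by one edge; one labelled node per triangle
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
Y = np.zeros((6, 2)); Y[0, 0] = 1.0; Y[5, 1] = 1.0
print(label_propagation(A, Y))   # nodes split into the two communities
```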

Asymmetrically Decentralized Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05093
  • repo_url: None
  • paper_authors: Qinglun Li, Miao Zhang, Nan Yin, Quanjun Yin, Li Shen
  • for: This paper aims to address the communication burden and privacy concerns associated with centralized servers in Federated Learning (FL) by proposing a Decentralized Federated Learning (DFL) algorithm based on asymmetric topologies and the Push-Sum protocol.
  • methods: The proposed DFedSGPSM algorithm combines the Sharpness Aware Minimization (SAM) optimizer and local momentum to improve algorithm performance and alleviate local heterogeneous overfitting in FL. The SAM optimizer employs gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, while the local momentum accelerates the optimization process.
  • results: The paper demonstrates the superior performance of the proposed DFedSGPSM algorithm compared to state-of-the-art optimizers through extensive experiments on the MNIST, CIFAR10, and CIFAR100 datasets. The theoretical analysis also proves that the algorithm achieves a convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in a non-convex smooth setting under mild assumptions, and that better topological connectivity achieves tighter upper bounds.
    Abstract To address the communication burden and privacy concerns associated with the centralized server in Federated Learning (FL), Decentralized Federated Learning (DFL) has emerged, which discards the server with a peer-to-peer (P2P) communication framework. However, most existing DFL algorithms are based on symmetric topologies, such as ring and grid topologies, which can easily lead to deadlocks and are susceptible to the impact of network link quality in practice. To address these issues, this paper proposes the DFedSGPSM algorithm, which is based on asymmetric topologies and utilizes the Push-Sum protocol to effectively solve consensus optimization problems. To further improve algorithm performance and alleviate local heterogeneous overfitting in Federated Learning (FL), our algorithm combines the Sharpness Aware Minimization (SAM) optimizer and local momentum. The SAM optimizer employs gradient perturbations to generate locally flat models and searches for models with uniformly low loss values, mitigating local heterogeneous overfitting. The local momentum accelerates the optimization process of the SAM optimizer. Theoretical analysis proves that DFedSGPSM achieves a convergence rate of $\mathcal{O}(\frac{1}{\sqrt{T}})$ in a non-convex smooth setting under mild assumptions. This analysis also reveals that better topological connectivity achieves tighter upper bounds. Empirically, extensive experiments are conducted on the MNIST, CIFAR10, and CIFAR100 datasets, demonstrating the superior performance of our algorithm compared to state-of-the-art optimizers.
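As a reference point for one ingredient, the sketch below implements a single SAM update: ascend to a perturbed weight vector along the normalised gradient, recompute the gradient there, then undo the perturbation and descend. The local momentum term and the Push-Sum mixing over the asymmetric topology, which are central to DFedSGPSM, are omitted.

```python
# One Sharpness Aware Minimization (SAM) step: perturb the weights along the normalised
# gradient direction, recompute the gradient at the perturbed point, and apply it at the
# original point.
import torch

def sam_step(model, loss_fn, batch, lr=0.05, rho=0.05):
    x, y = batch
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    with torch.no_grad():                       # ascent: move to the locally "sharpest" point
        eps = [rho * g / grad_norm for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    loss_perturbed = loss_fn(model(x), y)       # gradient at the perturbed weights
    grads_p = torch.autograd.grad(loss_perturbed, list(model.parameters()))

    with torch.no_grad():                       # undo the perturbation, then descend
        for p, e, g in zip(model.parameters(), eps, grads_p):
            p.sub_(e)
            p.sub_(lr * g)
    return loss.item()
```

In the decentralized setting, each client would run such local steps (with momentum) before exchanging Push-Sum weighted models with its out-neighbours over the asymmetric topology.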

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?

  • paper_url: http://arxiv.org/abs/2310.05079
  • repo_url: https://github.com/chengzhang-98/llm-mixed-q
  • paper_authors: Cheng Zhang, Jianyi Cheng, Ilia Shumailov, George A. Constantinides, Yiren Zhao
  • for: The paper aims to curtail the immense computation and memory costs of large language model (LLM) inference through sub-8-bit quantisation.
  • methods: By analysing the statistical and learning properties of LLM layers, the authors attribute the bottleneck of LLM quantisation to numerical scaling offsets, and adapt block quantisation, a family of methods that share scaling factors across packed numbers, to reduce these offsets purely from an arithmetic perspective without additional treatment in the computational path.
  • results: Nearly-lossless 6-bit quantised LLMs achieve a $19\times$ higher arithmetic density and $5\times$ memory density than the float32 baseline, surpassing prior 8-bit quantisation by $2.5\times$ in arithmetic density and $1.2\times$ in memory density without data calibration or re-training; insights on sub-8-bit quantisation (activation/weight distribution mismatch, optimal fine-tuning strategies, and a lower quantisation granularity inherent in LLM statistics) further enable nearly-lossless 4-bit LLMs on downstream tasks.
    Abstract The inference of Large language models (LLMs) requires immense computation and memory resources. To curtail these costs, quantisation has merged as a promising solution, but existing LLM quantisation mainly focuses on 8-bit. In this work, we explore the statistical and learning properties of the LLM layer and attribute the bottleneck of LLM quantisation to numerical scaling offsets. To address this, we adapt block quantisations for LLMs, a family of methods that share scaling factors across packed numbers. Block quantisations efficiently reduce the numerical scaling offsets solely from an arithmetic perspective, without additional treatments in the computational path. Our nearly-lossless quantised 6-bit LLMs achieve a $19\times$ higher arithmetic density and $5\times$ memory density than the float32 baseline, surpassing the prior art 8-bit quantisation by $2.5\times$ in arithmetic density and $1.2\times$ in memory density, without requiring any data calibration or re-training. We also share our insights into sub-8-bit LLM quantisation, including the mismatch between activation and weight distributions, optimal fine-tuning strategies, and a lower quantisation granularity inherent in the statistical properties of LLMs. The latter two tricks enable nearly-lossless 4-bit LLMs on downstream tasks. Our code is open-sourced.
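A minimal sketch of block quantisation is shown below: values are grouped into fixed-size blocks, each block shares one scaling factor derived from its largest magnitude, and values are rounded to a low-bit integer grid. The block size, 6-bit symmetric rounding, and flattening scheme are illustrative choices rather than the paper's exact formats.

```python
# Block quantisation sketch: each block of packed values shares one scaling factor,
# which keeps the numerical scaling offset local to the block.
import numpy as np

def block_quantise(x, block_size=16, bits=6):
    qmax = 2 ** (bits - 1) - 1                        # symmetric signed range, e.g. [-31, 31]
    flat = x.ravel()
    pad = (-len(flat)) % block_size
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax + 1e-12
    q = np.clip(np.round(blocks / scales), -qmax, qmax)
    return (q * scales).ravel()[:x.size].reshape(x.shape)   # de-quantised reconstruction

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128)).astype(np.float32)
w[0, 0] = 50.0                                        # an outlier only inflates its own block's scale
err = np.abs(block_quantise(w) - w)
print(err.mean(), err.max())
```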

FedFed: Feature Distillation against Data Heterogeneity in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.05077
  • repo_url: https://github.com/visitworld123/fedfed
  • paper_authors: Zhiqin Yang, Yonggang Zhang, Yu Zheng, Xinmei Tian, Hao Peng, Tongliang Liu, Bo Han
  • for: This paper tackles data heterogeneity in federated learning (FL), i.e., distribution shifts among clients, while balancing privacy preservation and model performance.
  • methods: It proposes Federated Feature Distillation (FedFed), which partitions data into performance-sensitive features (contributing greatly to model performance) and performance-robust features (contributing only limitedly); the performance-sensitive features are globally shared to mitigate data heterogeneity while the performance-robust features are kept locally, and clients train models over both local and shared data.
  • results: Comprehensive experiments demonstrate the efficacy of FedFed in promoting model performance.
    Abstract Federated learning (FL) typically faces data heterogeneity, i.e., distribution shifting among clients. Sharing clients' information has shown great potentiality in mitigating data heterogeneity, yet incurs a dilemma in preserving privacy and promoting model performance. To alleviate the dilemma, we raise a fundamental question: \textit{Is it possible to share partial features in the data to tackle data heterogeneity?} In this work, we give an affirmative answer to this question by proposing a novel approach called {\textbf{Fed}erated \textbf{Fe}ature \textbf{d}istillation} (FedFed). Specifically, FedFed partitions data into performance-sensitive features (i.e., greatly contributing to model performance) and performance-robust features (i.e., limitedly contributing to model performance). The performance-sensitive features are globally shared to mitigate data heterogeneity, while the performance-robust features are kept locally. FedFed enables clients to train models over local and shared data. Comprehensive experiments demonstrate the efficacy of FedFed in promoting model performance.

Towards Scalable Wireless Federated Learning: Challenges and Solutions

  • paper_url: http://arxiv.org/abs/2310.05076
  • repo_url: None
  • paper_authors: Yong Zhou, Yuanming Shi, Haibo Zhou, Jingjing Wang, Liqun Fu, Yang Yang
  • for: This article discusses the challenges and solutions of achieving scalable wireless federated learning (FL) at the network edge.
  • methods: Two perspectives are taken: network design, where task-oriented model aggregation and effective wireless techniques reduce model aggregation distortion and improve device participation to enhance communication scalability; and resource orchestration, where the limitations of existing optimization-based algorithms are identified and computation-efficient resource allocation is pursued to enhance algorithmic scalability.
  • results: Three task-oriented learning algorithms are proposed to achieve computation-efficient resource allocation for wireless FL, and several open research issues deserving further study are highlighted.
    Abstract The explosive growth of smart devices (e.g., mobile phones, vehicles, drones) with sensing, communication, and computation capabilities gives rise to an unprecedented amount of data. The generated massive data together with the rapid advancement of machine learning (ML) techniques spark a variety of intelligent applications. To distill intelligence for supporting these applications, federated learning (FL) emerges as an effective distributed ML framework, given its potential to enable privacy-preserving model training at the network edge. In this article, we discuss the challenges and solutions of achieving scalable wireless FL from the perspectives of both network design and resource orchestration. For network design, we discuss how task-oriented model aggregation affects the performance of wireless FL, followed by proposing effective wireless techniques to enhance the communication scalability via reducing the model aggregation distortion and improving the device participation. For resource orchestration, we identify the limitations of the existing optimization-based algorithms and propose three task-oriented learning algorithms to enhance the algorithmic scalability via achieving computation-efficient resource allocation for wireless FL. We highlight several potential research issues that deserve further study.

Robust-GBDT: A Novel Gradient Boosting Model for Noise-Robust Classification

  • paper_url: http://arxiv.org/abs/2310.05067
  • repo_url: https://github.com/luojiaqimath/robust-gbdt
  • paper_authors: Jiaqi Luo, Yuedong Quan, Shixin Xu
  • for: This work proposes a noise-robust gradient boosting model that handles label noise in classification, including multi-class tasks.
  • methods: It shows that the loss function in advanced Gradient Boosting Decision Trees (GBDT) only needs convexity within a specific region rather than globally, allowing nonconvex robust loss functions to be integrated; the resulting Robust-GBDT model also includes a new Robust Focal Loss designed to address class imbalance.
  • results: Robust-GBDT generates more accurate predictions and generalizes better, especially under label noise and class imbalance, handles complex datasets with improved computational efficiency, and numerous experiments confirm its superiority over other noise-robust methods.
    Abstract Robust boosting algorithms have emerged as alternative solutions to traditional boosting techniques for addressing label noise in classification tasks. However, these methods have predominantly focused on binary classification, limiting their applicability to multi-class tasks. Furthermore, they encounter challenges with imbalanced datasets, missing values, and computational efficiency. In this paper, we establish that the loss function employed in advanced Gradient Boosting Decision Trees (GBDT), particularly Newton's method-based GBDT, need not necessarily exhibit global convexity. Instead, the loss function only requires convexity within a specific region. Consequently, these GBDT models can leverage the benefits of nonconvex robust loss functions, making them resilient to noise. Building upon this theoretical insight, we introduce a new noise-robust boosting model called Robust-GBDT, which seamlessly integrates the advanced GBDT framework with robust losses. Additionally, we enhance the existing robust loss functions and introduce a novel robust loss function, Robust Focal Loss, designed to address class imbalance. As a result, Robust-GBDT generates more accurate predictions, significantly enhancing its generalization capabilities, especially in scenarios marked by label noise and class imbalance. Furthermore, Robust-GBDT is user-friendly and can easily integrate existing open-source code, enabling it to effectively handle complex datasets while improving computational efficiency. Numerous experiments confirm the superiority of Robust-GBDT over other noise-robust methods.

Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain

  • paper_url: http://arxiv.org/abs/2310.05063
  • repo_url: None
  • paper_authors: Gerald Woo, Chenghao Liu, Akshat Kumar, Doyen Sahoo
  • for: This paper introduces large-scale time series forecasting datasets to enable further study of pre-training and scaling of time series models.
  • methods: Three large-scale forecasting datasets from the cloud operations (CloudOps) domain are introduced, the largest containing billions of observations, and the empirical groundwork is built by identifying a promising candidate architecture.
  • results: The pre-trained method is a strong zero-shot baseline, benefits from further scaling in both model and dataset size, and achieves a 27% reduction in error on the largest dataset compared to classical and deep learning baselines.
    Abstract Time series has been left behind in the era of pre-training and transfer learning. While research in the fields of natural language processing and computer vision are enjoying progressively larger datasets to train massive models, the most popular time series datasets consist of only tens of thousands of time steps, limiting our ability to study the effectiveness of pre-training and scaling. Recent studies have also cast doubt on the need for expressive models and scale. To alleviate these issues, we introduce three large-scale time series forecasting datasets from the cloud operations (CloudOps) domain, the largest having billions of observations, enabling further study into pre-training and scaling of time series models. We build the empirical groundwork for studying pre-training and scaling of time series models and pave the way for future research by identifying a promising candidate architecture. We show that it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size. Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method - achieving a 27% reduction in error on the largest dataset. Code and datasets will be released.

Online Learning in Contextual Second-Price Pay-Per-Click Auctions

  • paper_url: http://arxiv.org/abs/2310.05047
  • repo_url: None
  • paper_authors: Mengxiao Zhang, Haipeng Luo
  • for: The paper studies online learning in contextual pay-per-click auctions: at each of the $T$ rounds the learner receives a context and a set of ads, estimates their click-through rates (CTRs) in order to run a second-price pay-per-click auction, and aims to minimize regret, defined as the gap between her total revenue and that of an oracle strategy that always makes perfect CTR predictions.
  • methods: The authors first show that $\sqrt{T}$-regret is obtainable via a computationally inefficient algorithm and is unavoidable, since the problem is no easier than the classical multi-armed bandit problem; borrowing ideas from recent advances in efficient contextual bandit algorithms, they then develop two practically efficient contextual auction algorithms: one using the exponential weight scheme with optimistic square errors that keeps the same $\sqrt{T}$-regret bound, and one that reduces the problem to online regression via a simple epsilon-greedy strategy, albeit with a worse regret bound.
  • results: Experiments on a synthetic dataset showcase the effectiveness and superior performance of the proposed algorithms.
    Abstract We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads and needs to make an estimate on their click-through rate (CTR) in order to run a second-price pay-per-click auction. The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy that always makes perfect CTR predictions. We first show that $\sqrt{T}$-regret is obtainable via a computationally inefficient algorithm and that it is unavoidable since our algorithm is no easier than the classical multi-armed bandit problem. A by-product of our results is a $\sqrt{T}$-regret bound for the simpler non-contextual setting, improving upon a recent work of [Feng et al., 2023] by removing the inverse CTR dependency that could be arbitrarily large. Then, borrowing ideas from recent advances on efficient contextual bandit algorithms, we develop two practically efficient contextual auction algorithms: the first one uses the exponential weight scheme with optimistic square errors and maintains the same $\sqrt{T}$-regret bound, while the second one reduces the problem to online regression via a simple epsilon-greedy strategy, albeit with a worse regret bound. Finally, we conduct experiments on a synthetic dataset to showcase the effectiveness and superior performance of our algorithms.
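For concreteness, a common form of the second-price pay-per-click mechanism the learner runs can be sketched as below: ads are ranked by bid times estimated CTR, and the winner pays, per click, the runner-up's score divided by the winner's CTR. The CTR estimation and the regret analysis are the paper's contribution and are not shown.

```python
# Second-price pay-per-click allocation and payment with estimated CTRs: rank ads by
# bid * CTR; the winner pays, per click, the lowest bid that would still have won.
import numpy as np

def second_price_ppc(bids, ctr_estimates):
    scores = np.asarray(bids) * np.asarray(ctr_estimates)
    order = np.argsort(scores)[::-1]
    winner, runner_up = order[0], order[1]
    pay_per_click = scores[runner_up] / ctr_estimates[winner]
    return winner, pay_per_click

bids = [2.0, 1.5, 3.0]
ctrs = [0.10, 0.20, 0.05]          # learner's CTR estimates for the current context
winner, price = second_price_ppc(bids, ctrs)
print(winner, round(price, 3))     # ad 1 wins (score 0.30) and pays 0.20 / 0.20 = 1.0 per click
```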

Deep Reinforcement Learning Based Cross-Layer Design in Terahertz Mesh Backhaul Networks

  • paper_url: http://arxiv.org/abs/2310.05034
  • repo_url: None
  • paper_authors: Zhifeng Hu, Chong Han, Xudong Wang
  • for: This paper addresses cross-layer routing and long-term resource allocation in Terahertz (THz) mesh backhaul networks, which empower integrated access and backhaul (IAB) for next-generation wireless systems.
  • methods: The proposed DEFLECT scheme uses deep reinforcement learning (DRL): a heuristic routing metric first facilitates resource efficiency in energy and sub-array usage, and a DRL-based resource allocation algorithm with a multi-task structure handles joint power and sub-array allocation, while a hierarchical architecture provides tailored per-base-station allocation and learned knowledge transfer for fast recovery.
  • results: Simulations show that DEFLECT routing consumes fewer resources than the minimal hop-count metric, achieves long-term resource-efficiency maximization with no packet loss and millisecond-level latency, and recovers resource-efficient backhaul from broken links within 1 s.
    Abstract Supporting ultra-high data rates and flexible reconfigurability, Terahertz (THz) mesh networks are attractive for next-generation wireless backhaul systems that empower the integrated access and backhaul (IAB). In THz mesh backhaul networks, the efficient cross-layer routing and long-term resource allocation is yet an open problem due to dynamic traffic demands as well as possible link failures caused by the high directivity and high non-line-of-sight (NLoS) path loss of THz spectrum. In addition, unpredictable data traffic and the mixed integer programming property with the NP-hard nature further challenge the effective routing and long-term resource allocation design. In this paper, a deep reinforcement learning (DRL) based cross-layer design in THz mesh backhaul networks (DEFLECT) is proposed, by considering dynamic traffic demands and possible sudden link failures. In DEFLECT, a heuristic routing metric is first devised to facilitate resource efficiency (RE) enhancement regarding energy and sub-array usages. Furthermore, a DRL based resource allocation algorithm is developed to realize long-term RE maximization and fast recovery from broken links. Specifically in the DRL method, the exploited multi-task structure cooperatively benefits joint power and sub-array allocation. Additionally, the leveraged hierarchical architecture realizes tailored resource allocation for each base station and learned knowledge transfer for fast recovery. Simulation results show that DEFLECT routing consumes less resource, compared to the minimal hop-count metric. Moreover, unlike conventional DRL methods causing packet loss and second-level latency, DEFLECT DRL realizes the long-term RE maximization with no packet loss and millisecond-level latency, and recovers resource-efficient backhaul from broken links within 1s.
    摘要 太赫兹(THz)网状网络支持超高数据速率和灵活的可重构性,是赋能集成接入与回程(IAB)的下一代无线回程系统的有力候选。在THz网状回程网络中,由于动态的流量需求以及THz频谱的高方向性和高非视距(NLoS)路径损耗可能导致的链路故障,高效的跨层路由和长期资源分配仍是一个开放问题。此外,不可预测的数据流量以及该问题混合整数规划、NP困难的性质,进一步加大了有效路由和长期资源分配设计的难度。本文提出了一种基于深度强化学习(DRL)的THz网状回程网络跨层设计方法(DEFLECT),同时考虑动态流量需求和可能突发的链路故障。在DEFLECT中,首先设计了一种启发式路由度量,以提升能量和子阵列使用方面的资源效率(RE);进而开发了一种基于DRL的资源分配算法,实现长期RE最大化和链路中断后的快速恢复。具体而言,DRL方法中的多任务结构协同促进功率与子阵列的联合分配;分层架构则实现了面向每个基站的定制化资源分配与已学知识迁移,从而加快恢复。仿真结果表明,与最小跳数度量相比,DEFLECT路由消耗更少的资源;并且与造成丢包和秒级延迟的传统DRL方法不同,DEFLECT的DRL方法在不丢包且仅有毫秒级延迟的情况下实现长期RE最大化,并能在1秒内从链路中断中恢复资源高效的回程传输。
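For illustration only, a toy multi-task policy head in the spirit of the joint power and sub-array allocation described above; the observation size, discrete action grids, and network shape are assumptions, not DEFLECT's actual design.

```python
import torch
import torch.nn as nn

class JointAllocationHead(nn.Module):
    """Illustrative multi-task policy head: one shared trunk, two action heads
    (transmit power level and number of active sub-arrays) per base station."""
    def __init__(self, obs_dim=16, n_power_levels=8, n_subarrays=4, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.power_head = nn.Linear(hidden, n_power_levels)     # task 1: power level
        self.subarray_head = nn.Linear(hidden, n_subarrays)     # task 2: sub-array usage

    def forward(self, obs):
        z = self.trunk(obs)                                     # shared features serve both tasks
        return self.power_head(z), self.subarray_head(z)

obs = torch.randn(3, 16)                                        # 3 base stations' local observations
power_logits, subarray_logits = JointAllocationHead()(obs)
power = torch.distributions.Categorical(logits=power_logits).sample()
subarrays = torch.distributions.Categorical(logits=subarray_logits).sample()
print(power.tolist(), subarrays.tolist())
```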

Compressed online Sinkhorn

  • paper_url: http://arxiv.org/abs/2310.05019
  • repo_url: None
  • paper_authors: Fengpei Wang, Clarice Poon, Tony Shardlow
  • for: This paper focuses on the use of optimal transport (OT) distances and the Sinkhorn algorithm for large-scale data processing.
  • methods: The paper revisits the online Sinkhorn algorithm introduced by Mensch and Peyré in 2020, and improves the convergence analysis with a faster rate under certain parameter choices. Additionally, the paper proposes a compressed online Sinkhorn algorithm that combines measure compression techniques with the online Sinkhorn algorithm.
  • results: The paper provides numerical results to verify the sharpness of the improved convergence rate, as well as practical numerical gains and theoretical guarantees on the efficiency of the compressed online Sinkhorn algorithm.
    Abstract The use of optimal transport (OT) distances, and in particular entropic-regularised OT distances, is an increasingly popular evaluation metric in many areas of machine learning and data science. Their use has largely been driven by the availability of efficient algorithms such as the Sinkhorn algorithm. One of the drawbacks of the Sinkhorn algorithm for large-scale data processing is that it is a two-phase method, where one first draws a large stream of data from the probability distributions, before applying the Sinkhorn algorithm to the discrete probability measures. More recently, there have been several works developing stochastic versions of Sinkhorn that directly handle continuous streams of data. In this work, we revisit the recently introduced online Sinkhorn algorithm of [Mensch and Peyr\'e, 2020]. Our contributions are twofold: We improve the convergence analysis for the online Sinkhorn algorithm, the new rate that we obtain is faster than the previous rate under certain parameter choices. We also present numerical results to verify the sharpness of our result. Secondly, we propose the compressed online Sinkhorn algorithm which combines measure compression techniques with the online Sinkhorn algorithm. We provide numerical experiments to show practical numerical gains, as well as theoretical guarantees on the efficiency of our approach.
    摘要 使用最优传输(OT)距离,特别是熵正则化OT距离,作为评价指标在机器学习和数据科学的许多领域中越来越受欢迎。其流行主要得益于Sinkhorn算法等高效算法。然而,Sinkhorn算法在大规模数据处理中的一个缺点是它分为两个阶段:需要先从概率分布中抽取大量数据,然后再对离散概率测度应用Sinkhorn算法。最近,有若干工作提出了可以直接处理连续数据流的随机Sinkhorn算法。在本文中,我们重新审视了Mensch和Peyré于2020年提出的在线Sinkhorn算法。我们的贡献有两点:首先,我们改进了在线Sinkhorn算法的收敛分析,在某些参数选择下得到比已有结果更快的速率,并给出数值结果验证该结果的紧致性。其次,我们提出了压缩在线Sinkhorn算法,将测度压缩技术与在线Sinkhorn算法相结合,并提供了数值实验展示其实际数值增益以及关于效率的理论保证。
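As background for the discussion above, a minimal NumPy sketch of the classical (offline, two-phase) Sinkhorn iteration on discrete measures that the online and compressed variants improve upon; the regularisation strength and sample sizes are arbitrary.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, n_iter=500):
    """Entropic-regularised OT between discrete measures a, b with cost matrix C."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)             # alternating scaling updates
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]   # transport plan
    return np.sum(P * C)              # transport cost under the entropic plan

rng = np.random.default_rng(0)
x, y = rng.normal(size=(50, 2)), rng.normal(loc=1.0, size=(60, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)     # squared Euclidean cost
a, b = np.full(50, 1 / 50), np.full(60, 1 / 60)
print(f"entropic OT cost ~ {sinkhorn(a, b, C):.3f}")
```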

Human-in-the-loop: The future of Machine Learning in Automated Electron Microscopy

  • paper_url: http://arxiv.org/abs/2310.05018
  • repo_url: None
  • paper_authors: Sergei V. Kalinin, Yongtao Liu, Arpan Biswas, Gerd Duscher, Utkarsh Pratiush, Kevin Roccapriore, Maxim Ziatdinov, Rama Vasudevan
  • for: 这篇论文主要是为了介绍机器学习技术在电子显微镜中的应用,以及如何通过人工智能自动化实验来提高实验效率和准确性。
  • methods: 该论文使用的方法包括机器学习算法和APIs,用于实时分析和控制显微镜的数据和操作。
  • results: 该论文提出了一种新的实验范式,称为人在回路自动实验(human-in-the-loop automated experiments, hAE),其中机器学习智能体直接控制电子束位置及图像与能谱采集,人类操作员监督实验进程,并通过调整机器学习智能体的策略来引导实验朝特定目标前进。
    Abstract Machine learning methods are progressively gaining acceptance in the electron microscopy community for de-noising, semantic segmentation, and dimensionality reduction of data post-acquisition. The introduction of the APIs by major instrument manufacturers now allows the deployment of ML workflows in microscopes, not only for data analytics but also for real-time decision-making and feedback for microscope operation. However, the number of use cases for real-time ML remains remarkably small. Here, we discuss some considerations in designing ML-based active experiments and pose that the likely strategy for the next several years will be human-in-the-loop automated experiments (hAE). In this paradigm, the ML learning agent directly controls beam position and image and spectroscopy acquisition functions, and human operator monitors experiment progression in real- and feature space of the system and tunes the policies of the ML agent to steer the experiment towards specific objectives.
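A toy sketch of the human-in-the-loop automated experiment (hAE) loop described in the abstract: an agent proposes beam positions from a tunable acquisition policy, and a human operator re-weights that policy mid-run. The surrogate model, objective weights, and 2-D search domain are all hypothetical stand-ins for real microscope control.

```python
import numpy as np

rng = np.random.default_rng(0)

def acquire(beam_xy):
    """Stand-in for an image/spectroscopy acquisition at a beam position (hypothetical)."""
    return np.exp(-np.sum((beam_xy - np.array([0.7, 0.3])) ** 2) / 0.05) + 0.02 * rng.normal()

def surrogate(c, X, y):
    """Crude distance-weighted prediction of signal strength from past measurements."""
    if not X:
        return 0.0
    d = np.linalg.norm(np.array(X) - c, axis=1)
    w = np.exp(-d / 0.2)
    return float(w @ np.array(y) / w.sum())

policy = {"signal": 1.0, "novelty": 0.3}        # weights tunable by the human operator
X, y = [], []
for step in range(10):
    cands = rng.uniform(0, 1, size=(128, 2))
    novelty = np.array([min((np.linalg.norm(c - np.array(x)) for x in X), default=1.0)
                        for c in cands])
    scores = policy["signal"] * np.array([surrogate(c, X, y) for c in cands]) \
             + policy["novelty"] * novelty
    beam = cands[int(np.argmax(scores))]        # agent controls the beam position
    X.append(beam); y.append(acquire(beam))
    if step == 4:                               # human-in-the-loop: operator re-tunes the policy
        policy["novelty"] = 0.05                # e.g. switch from exploration to exploitation
print("best measured position:", np.array(X)[int(np.argmax(y))].round(3))
```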

Prompt-augmented Temporal Point Process for Streaming Event Sequence

  • paper_url: http://arxiv.org/abs/2310.04993
  • repo_url: https://github.com/yanyanSann/PromptTPP
  • paper_authors: Siqiao Xue, Yan Wang, Zhixuan Chu, Xiaoming Shi, Caigao Jiang, Hongyan Hao, Gangwei Jiang, Xiaoyun Feng, James Y. Zhang, Jun Zhou
  • for: 本研究旨在解决流式事件序列场景下对神经时序点过程(Neural TPP)进行持续监测与学习的挑战,同时满足隐私与内存方面的约束。
  • methods: 我们提出了一种简单而有效的框架 PromptTPP,它将基础 TPP 与一个连续时间检索提示池(continuous-time retrieval prompt pool)集成,从而能够随时间顺序地学习流式事件序列,而无需缓存过去的样本。
  • results: 我们在三个实际用户行为数据集上展示了 PromptTPP 的优秀性能,并且在 Privacy 和 Memory 约束下实现了 Continual Learning。
    Abstract Neural Temporal Point Processes (TPPs) are the prevalent paradigm for modeling continuous-time event sequences, such as user activities on the web and financial transactions. In real-world applications, event data is typically received in a \emph{streaming} manner, where the distribution of patterns may shift over time. Additionally, \emph{privacy and memory constraints} are commonly observed in practical scenarios, further compounding the challenges. Therefore, the continuous monitoring of a TPP to learn the streaming event sequence is an important yet under-explored problem. Our work paper addresses this challenge by adopting Continual Learning (CL), which makes the model capable of continuously learning a sequence of tasks without catastrophic forgetting under realistic constraints. Correspondingly, we propose a simple yet effective framework, PromptTPP\footnote{Our code is available at {\small \url{ https://github.com/yanyanSann/PromptTPP}}, by integrating the base TPP with a continuous-time retrieval prompt pool. The prompts, small learnable parameters, are stored in a memory space and jointly optimized with the base TPP, ensuring that the model learns event streams sequentially without buffering past examples or task-specific attributes. We present a novel and realistic experimental setup for modeling event streams, where PromptTPP consistently achieves state-of-the-art performance across three real user behavior datasets.
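A minimal sketch of a retrieval prompt pool in the spirit of the abstract (not the released PromptTPP code): learnable key/prompt pairs are retrieved by similarity to a query embedding and prepended to the event-sequence representation; all sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

class PromptPool(torch.nn.Module):
    """Learnable (key, prompt) pairs; the top-k prompts closest to a query embedding
    are prepended to the embedded event sequence."""
    def __init__(self, pool_size=20, key_dim=32, prompt_len=4, embed_dim=32, top_k=3):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(pool_size, key_dim))
        self.prompts = torch.nn.Parameter(torch.randn(pool_size, prompt_len, embed_dim))
        self.top_k = top_k

    def forward(self, query, seq_embed):
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        idx = sim.topk(self.top_k, dim=-1).indices               # (batch, top_k)
        chosen = self.prompts[idx].flatten(1, 2)                 # concatenate retrieved prompts
        return torch.cat([chosen, seq_embed], dim=1)             # prompt-augmented sequence

pool = PromptPool()
query = torch.randn(8, 32)            # e.g. mean-pooled embedding of the recent event history
seq_embed = torch.randn(8, 50, 32)    # embedded event sequence
print(pool(query, seq_embed).shape)   # torch.Size([8, 62, 32])
```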

Waveformer for modelling dynamical systems

  • paper_url: http://arxiv.org/abs/2310.04990
  • repo_url: None
  • paper_authors: N Navaneeth, Souvik Chakraborty
  • for: 学习动力系统(偏微分方程)的解算子
  • methods: 使用wavelet变换和transformers来捕捉解的空间多尺度行为和远距离动态
  • results: 在涉及Burgers方程、KS方程、Allen-Cahn方程和Navier-Stokes方程的四个数值算例中,waveformer能够高精度地学习解算子,性能超过现有最先进的算子学习算法,最多高出一个数量级,其优势在外推区域尤为明显。
    Abstract Neural operators have gained recognition as potent tools for learning solutions of a family of partial differential equations. The state-of-the-art neural operators excel at approximating the functional relationship between input functions and the solution space, potentially reducing computational costs and enabling real-time applications. However, they often fall short when tackling time-dependent problems, particularly in delivering accurate long-term predictions. In this work, we propose "waveformer", a novel operator learning approach for learning solutions of dynamical systems. The proposed waveformer exploits wavelet transform to capture the spatial multi-scale behavior of the solution field and transformers for capturing the long horizon dynamics. We present four numerical examples involving Burgers's equation, KS-equation, Allen Cahn equation, and Navier Stokes equation to illustrate the efficacy of the proposed approach. Results obtained indicate the capability of the proposed waveformer in learning the solution operator and show that the proposed Waveformer can learn the solution operator with high accuracy, outperforming existing state-of-the-art operator learning algorithms by up to an order, with its advantage particularly visible in the extrapolation region
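An illustrative pipeline, not the authors' architecture: a wavelet decomposition exposes the multi-scale spatial structure of a 1-D field, and a transformer encoder operates on the resulting coefficient tokens. The per-scale tokenisation and linear embedding below are simplifying assumptions.

```python
import numpy as np
import pywt
import torch
import torch.nn as nn

# Toy snapshot of a 1-D field (e.g. one time slice of a Burgers-type solution).
x = np.linspace(0, 2 * np.pi, 256)
u = np.sin(x) + 0.3 * np.sin(8 * x)

# Step 1: wavelet transform captures the spatial multi-scale structure of the field.
coeffs = pywt.wavedec(u, "db4", level=3)            # [approx, detail_3, detail_2, detail_1]
tokens = [torch.tensor(c, dtype=torch.float32) for c in coeffs]

# Step 2: a transformer encoder models dependence across the (padded) coefficient tokens.
max_len = max(len(c) for c in coeffs)
padded = torch.stack([nn.functional.pad(t, (0, max_len - len(t))) for t in tokens])
embed = nn.Linear(max_len, 64)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True), num_layers=2)
out = encoder(embed(padded).unsqueeze(0))           # (1, n_scales, 64)
print(out.shape)

# Step 3 (not shown): a decoder / inverse wavelet step would map the transformed
# coefficients to the predicted solution field at a later time.
```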

Data-centric Graph Learning: A Survey

  • paper_url: http://arxiv.org/abs/2310.04987
  • repo_url: None
  • paper_authors: Cheng Yang, Deyu Bo, Jixi Liu, Yufei Peng, Boyu Chen, Haoran Dai, Ao Sun, Yue Yu, Yixin Xiao, Qi Zhang, Chunchen Wang, Yuxin Guo, Chuan Shi
  • for: 本文旨在探讨在深度学习时代如何更好地处理图数据,以提高图模型的能力。
  • methods: 本文使用数据中心的方法,包括修改图数据的方法,以提高图模型的性能。
  • results: 本文提出了一种基于图学习管道的新分类法,并分析了图数据中的一些潜在问题,以及如何在数据中心的方法下解决这些问题。
    Abstract The history of artificial intelligence (AI) has witnessed the significant impact of high-quality data on various deep learning models, such as ImageNet for AlexNet and ResNet. Recently, instead of designing more complex neural architectures as model-centric approaches, the attention of AI community has shifted to data-centric ones, which focuses on better processing data to strengthen the ability of neural models. Graph learning, which operates on ubiquitous topological data, also plays an important role in the era of deep learning. In this survey, we comprehensively review graph learning approaches from the data-centric perspective, and aim to answer two crucial questions: (1) when to modify graph data and (2) how to modify graph data to unlock the potential of various graph models. Accordingly, we propose a novel taxonomy based on the stages in the graph learning pipeline, and highlight the processing methods for different data structures in the graph data, i.e., topology, feature and label. Furthermore, we analyze some potential problems embedded in graph data and discuss how to solve them in a data-centric manner. Finally, we provide some promising future directions for data-centric graph learning.
    摘要 人工智能(AI)的历史见证了高质量数据对各种深度学习模型的重要影响,如ImageNet对AlexNet和ResNet。在最近,AI社区的注意力转移到了数据中心的方法,而不是设计更复杂的神经网络模型。图学习,它在深度学习时代处理普遍的 topological 数据,也扮演着重要的角色。在本综述中,我们从数据中心的角度全面回顾图学习方法,并试图回答两个关键问题:(1)何时修改图数据,以及(2)如何修改图数据以解锁不同图模型的潜力。因此,我们提出了一种新的分类方法,基于图学习管道中的阶段,并高亮了不同数据结构在图数据中的处理方法,即 topological、特征和标签。此外,我们分析了图数据中的一些可能的问题,并讨论了如何在数据中心的方法下解决这些问题。最后,我们提出了一些未来的可能性,以推动数据中心的图学习发展。
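Two concrete examples of the data-centric operations the survey categorises, modifying topology and features rather than the model; the tiny graph and preprocessing choices are illustrative.

```python
import numpy as np

# A tiny graph: adjacency A and node features X.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0, 2.0], [0.5, 0.0], [3.0, 1.0]])

# Topology-level modification: add self-loops and symmetrically normalise,
# a standard data-centric preprocessing step for message-passing models.
A_hat = A + np.eye(len(A))
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

# Feature-level modification: row-normalise node features.
X_norm = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)

print(np.round(A_norm, 3))
print(np.round(X_norm, 3))
```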

Model-adapted Fourier sampling for generative compressed sensing

  • paper_url: http://arxiv.org/abs/2310.04984
  • repo_url: None
  • paper_authors: Aaron Berk, Simone Brugiapaglia, Yaniv Plan, Matthew Scott, Xia Sheng, Ozgur Yilmaz
  • for: 研究生成式压缩感知,其中测量矩阵由酉矩阵(unitary matrix)随机下采样得到,DFT为重要特殊情况。
  • methods: 构建适配模型的采样策略,将采样复杂度改进为$\textit{O}(kd\|\boldsymbol{\alpha}\|_{2}^{2})$。这由两个步骤组成:首先为非均匀随机采样分布建立新的理论恢复保证,然后优化采样分布以最小化满足这些保证所需的测量次数。
  • results: 得到了适用于自然信号类别的采样复杂度,这类信号通常与低频傅里叶分量近乎最大相干。此外,还考虑了一种替代(surrogate)采样方案,并在CelebA数据集的恢复实验中验证了其性能。
    Abstract We study generative compressed sensing when the measurement matrix is randomly subsampled from a unitary matrix (with the DFT as an important special case). It was recently shown that $\textit{O}(kdn\| \boldsymbol{\alpha}\|_{\infty}^{2})$ uniformly random Fourier measurements are sufficient to recover signals in the range of a neural network $G:\mathbb{R}^k \to \mathbb{R}^n$ of depth $d$, where each component of the so-called local coherence vector $\boldsymbol{\alpha}$ quantifies the alignment of a corresponding Fourier vector with the range of $G$. We construct a model-adapted sampling strategy with an improved sample complexity of $\textit{O}(kd\| \boldsymbol{\alpha}\|_{2}^{2})$ measurements. This is enabled by: (1) new theoretical recovery guarantees that we develop for nonuniformly random sampling distributions and then (2) optimizing the sampling distribution to minimize the number of measurements needed for these guarantees. This development offers a sample complexity applicable to natural signal classes, which are often almost maximally coherent with low Fourier frequencies. Finally, we consider a surrogate sampling scheme, and validate its performance in recovery experiments using the CelebA dataset.
    摘要 我们研究生成式压缩感知,其中测量矩阵由酉矩阵随机下采样得到(DFT 为重要特殊情况)。最近的研究表明,$\textit{O}(kdn\|\boldsymbol{\alpha}\|_{\infty}^{2})$ 个均匀随机 Fourier 测量足以恢复位于深度为 $d$ 的神经网络 $G:\mathbb{R}^k \to \mathbb{R}^n$ 值域中的信号,其中所谓局部相干向量 $\boldsymbol{\alpha}$ 的每个分量刻画了对应 Fourier 向量与 $G$ 值域的对齐程度。我们构建了适配模型的采样策略,将采样复杂度改进为 $\textit{O}(kd\|\boldsymbol{\alpha}\|_{2}^{2})$ 个测量。这由以下两个步骤实现:首先,我们为非均匀随机采样分布建立了新的理论恢复保证;其次,我们优化采样分布,以最小化满足这些保证所需的测量数量。这一采样复杂度适用于自然信号类别,这类信号通常与低频 Fourier 分量近乎最大相干。最后,我们考虑了一种替代采样方案,并通过 CelebA 数据集上的恢复实验验证其性能。
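A sketch of the model-adapted idea under a strong simplification: the generator's range is replaced by a random linear subspace, local coherences of the unitary DFT rows with that subspace are computed, and frequencies are drawn with probability proportional to their squared coherence. This illustrates the sampling principle only, not the paper's treatment of deep generative priors.

```python
import numpy as np

n, k = 256, 8
rng = np.random.default_rng(0)

# Stand-in for the generator's range: a random k-dimensional subspace of R^n.
U, _ = np.linalg.qr(rng.normal(size=(n, k)))

# Local coherence of each DFT row with the subspace: alpha_j = ||U^T f_j||_2.
F = np.fft.fft(np.eye(n)) / np.sqrt(n)            # unitary DFT matrix
alpha = np.linalg.norm(F.conj() @ U, axis=1)

# Model-adapted sampling: draw frequencies with probability proportional to alpha^2.
p = alpha ** 2 / np.sum(alpha ** 2)
m = 4 * k
rows = rng.choice(n, size=m, replace=False, p=p)

# Recover a signal in the subspace from the subsampled Fourier measurements
# (least squares on the measured rows; importance re-weighting omitted for brevity).
x_true = U @ rng.normal(size=k)
y = F[rows] @ x_true
c_hat = np.linalg.lstsq(F[rows] @ U, y, rcond=None)[0]
print("relative error:", np.linalg.norm(U @ c_hat - x_true) / np.linalg.norm(x_true))
```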

Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift

  • paper_url: http://arxiv.org/abs/2310.04971
  • repo_url: None
  • paper_authors: Yihao Xue, Siddharth Joshi, Dang Nguyen, Baharan Mirzasoleiman
  • for: 本文研究多模态对比学习(MMCL)方法(以CLIP为代表)所学表示为何对分布偏移具有鲁棒性并能泛化到新领域。
  • methods: 本文通过严格的理论分析探讨MMCL鲁棒性背后的机制,发现了两种机制:类内对比(intra-class contrasting)和类间特征共享(inter-class feature sharing)。
  • results: 本文的理论结果和实验表明,丰富的图像描述(rich captions)以及对不同类型细节的标注可以提升模型的鲁棒性及分布偏移下的零样本分类准确率。
    Abstract Recently, multimodal contrastive learning (MMCL) approaches, such as CLIP, have achieved a remarkable success in learning representations that are robust against distribution shift and generalize to new domains. Despite the empirical success, the mechanism behind learning such generalizable representations is not understood. In this work, we rigorously analyze this problem and uncover two mechanisms behind MMCL's robustness: \emph{intra-class contrasting}, which allows the model to learn features with a high variance, and \emph{inter-class feature sharing}, where annotated details in one class help learning other classes better. Both mechanisms prevent spurious features that are over-represented in the training data to overshadow the generalizable core features. This yields superior zero-shot classification accuracy under distribution shift. Furthermore, we theoretically demonstrate the benefits of using rich captions on robustness and explore the effect of annotating different types of details in the captions. We validate our theoretical findings through experiments, including a well-designed synthetic experiment and an experiment involving training CLIP on MS COCO and evaluating the model on variations of shifted ImageNet.
    摘要 近期,多模态对比学习(MMCL)方法(如CLIP)在学习对分布偏移鲁棒、可泛化到新领域的表示方面取得了显著成功。尽管有这些经验上的成功,但其学习可泛化表示的机制尚不清楚。在这项工作中,我们对该问题进行了严格分析,揭示了MMCL鲁棒性背后的两种机制:类内对比(intra-class contrasting),使模型能够学习高方差的特征;以及类间特征共享(inter-class feature sharing),即一个类别中标注的细节有助于更好地学习其他类别。这两种机制防止训练数据中被过度代表的虚假特征掩盖可泛化的核心特征,从而在分布偏移下获得更优的零样本分类准确率。此外,我们从理论上论证了丰富的图像描述对鲁棒性的好处,并探讨了在描述中标注不同类型细节的效果。我们通过实验验证了理论结论,包括一个精心设计的合成实验,以及在MS COCO上训练CLIP并在多种偏移版本的ImageNet上评估模型的实验。
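For reference, a minimal implementation of the symmetric CLIP-style contrastive loss that the analysis above studies; embedding sizes and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    """Symmetric InfoNCE loss used by CLIP-style multimodal contrastive learning:
    matched image/caption pairs are pulled together, all other pairs in the batch
    are pushed apart."""
    img = F.normalize(image_embeds, dim=-1)
    txt = F.normalize(text_embeds, dim=-1)
    logits = img @ txt.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(len(img))
    loss_i = F.cross_entropy(logits, targets)       # image -> matching caption
    loss_t = F.cross_entropy(logits.t(), targets)   # caption -> matching image
    return 0.5 * (loss_i + loss_t)

img_emb, txt_emb = torch.randn(16, 128), torch.randn(16, 128)
print(float(clip_contrastive_loss(img_emb, txt_emb)))
```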

Improved Active Learning via Dependent Leverage Score Sampling

  • paper_url: http://arxiv.org/abs/2310.04966
  • repo_url: None
  • paper_authors: Atsushi Shimizu, Xiaoou Cheng, Christopher Musco, Jonathan Weare
  • for: 这篇论文旨在提出一种改进的主动学习方法,用于在agnostic(对抗噪声)设定中提升学习效果。
  • methods: 该论文将边际杠杆分数采样(marginal leverage score sampling)与促进空间覆盖的非独立采样策略相结合。具体来说,提出了一种基于 pivotal sampling 算法、易于实现的方法,并在受参数化 PDE 学习方法和不确定性量化启发的问题上进行了测试。
  • results: 与独立采样相比,该方法可将达到给定目标精度所需的样本数量减少最多50%。此外,论文还给出两个理论结果:其一,任何满足弱单侧 $\ell_{\infty}$ 独立性条件的非独立杠杆分数采样方法(包括 pivotal sampling)都可以用 $O(d\log d)$ 个样本主动学习 $d$ 维线性函数,与独立采样相当;该结果扩展了近期关于 $\ell_{\infty}$ 独立性下矩阵 Chernoff 界的工作,也可能有助于分析其他采样策略。其二,对于重要的多项式回归问题,pivotal 方法可获得改进的 $O(d)$ 样本界。
    Abstract We show how to obtain improved active learning methods in the agnostic (adversarial noise) setting by combining marginal leverage score sampling with non-independent sampling strategies that promote spatial coverage. In particular, we propose an easily implemented method based on the pivotal sampling algorithm, which we test on problems motivated by learning-based methods for parametric PDEs and uncertainty quantification. In comparison to independent sampling, our method reduces the number of samples needed to reach a given target accuracy by up to $50\%$. We support our findings with two theoretical results. First, we show that any non-independent leverage score sampling method that obeys a weak one-sided $\ell_{\infty}$ independence condition (which includes pivotal sampling) can actively learn $d$ dimensional linear functions with $O(d\log d)$ samples, matching independent sampling. This result extends recent work on matrix Chernoff bounds under $\ell_{\infty}$ independence, and may be of interest for analyzing other sampling strategies beyond pivotal sampling. Second, we show that, for the important case of polynomial regression, our pivotal method obtains an improved bound of $O(d)$ samples.
    摘要 我们展示了如何在agnostic(对抗噪声)设定中,通过将边际杠杆分数(marginal leverage score)抽样与促进空间覆盖的非独立抽样策略相结合,获得改进的主动学习方法。具体而言,我们提出了一个基于pivotal抽样算法、易于实现的方法,并在受参数化PDE学习方法和不确定性量化启发的问题上进行测试。与独立抽样相比,我们的方法可将达到给定目标精度所需的样本数量最多减少50%。我们用两个理论结果支持这一发现:首先,我们证明任何满足弱单侧 $\ell_{\infty}$ 独立性条件的非独立杠杆分数抽样方法(包括pivotal抽样),都可以用 $O(d\log d)$ 个样本主动学习 $d$ 维线性函数,与独立抽样相当。该结果扩展了近期关于 $\ell_{\infty}$ 独立性下矩阵Chernoff界的研究,并可能适用于分析其他抽样策略。其次,我们证明,在重要的多项式回归问题上,我们的pivotal方法可获得改进的 $O(d)$ 样本界。
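A small demonstration of leverage score sampling for actively learning a polynomial, using independent sampling as the baseline the paper improves on; the pivotal (non-independent) variant is not shown, and the target function and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Active learning setting: any point may be queried, but each label is expensive,
# so query locations are chosen by leverage score sampling.
n, degree = 2000, 8
x = np.linspace(-1, 1, n)
V = np.vander(x, degree + 1, increasing=True)          # design matrix of candidate queries

Q, _ = np.linalg.qr(V)
tau = np.sum(Q ** 2, axis=1)                           # leverage scores (sum = degree + 1)
p = tau / tau.sum()

m = 4 * (degree + 1)
idx = rng.choice(n, size=m, replace=True, p=p)
f = lambda t: np.cos(3 * t) + 0.1 * rng.normal(size=np.shape(t))   # noisy target (illustrative)
w = 1.0 / np.sqrt(m * p[idx])                          # importance weights
coef, *_ = np.linalg.lstsq(w[:, None] * V[idx], w * f(x[idx]), rcond=None)

err = np.linalg.norm(V @ coef - np.cos(3 * x)) / np.linalg.norm(np.cos(3 * x))
print(f"relative L2 error with {m} labels: {err:.3f}")
```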

Towards Explainable Machine Learning: The Effectiveness of Reservoir Computing in Wireless Receive Processing

  • paper_url: http://arxiv.org/abs/2310.04956
  • repo_url: None
  • paper_authors: Shashank Jere, Karim Said, Lizhong Zheng, Lingjia Liu
  • for: This paper aims to improve the performance of channel equalization in wireless communications using a learning-based technique called Reservoir Computing (RC) and provide a first principles-based understanding of its operation.
  • methods: The paper uses an echo state network (ESN) as a channel equalizer and incorporates available domain knowledge in the form of wireless channel statistics into the weights of the ESN model. This optimized initialization of the model weights leads to improved receive processing/symbol detection performance.
  • results: The paper shows improved performance in receive processing/symbol detection through simulations, demonstrating the effectiveness of the proposed approach. This is a first step towards explainable machine learning (XML) and assigning practical model interpretability that can be utilized to improve performance and enhance detection reliability.
    Abstract Deep learning has seen a rapid adoption in a variety of wireless communications applications, including at the physical layer. While it has delivered impressive performance in tasks such as channel equalization and receive processing/symbol detection, it leaves much to be desired when it comes to explaining this superior performance. In this work, we investigate the specific task of channel equalization by applying a popular learning-based technique known as Reservoir Computing (RC), which has shown superior performance compared to conventional methods and other learning-based approaches. Specifically, we apply the echo state network (ESN) as a channel equalizer and provide a first principles-based signal processing understanding of its operation. With this groundwork, we incorporate the available domain knowledge in the form of the statistics of the wireless channel directly into the weights of the ESN model. This paves the way for optimized initialization of the ESN model weights, which are traditionally untrained and randomly initialized. Finally, we show the improvement in receive processing/symbol detection performance with this optimized initialization through simulations. This is a first step towards explainable machine learning (XML) and assigning practical model interpretability that can be utilized together with the available domain knowledge to improve performance and enhance detection reliability.
    摘要 深度学习已在包括物理层在内的多种无线通信应用中得到迅速采用。尽管它在信道均衡和接收处理/符号检测等任务中表现出色,但在解释这种优越性能方面仍有很多不足。在这项工作中,我们研究信道均衡这一具体任务,采用一种流行的基于学习的技术——储备池计算(Reservoir Computing, RC),它相比传统方法和其他基于学习的方法表现出更优的性能。具体来说,我们使用回声状态网络(ESN)作为信道均衡器,并从信号处理的第一性原理角度给出其工作机理的理解。在此基础上,我们将以无线信道统计特性形式存在的领域知识直接融入ESN模型的权重中,从而为传统上不经训练、随机初始化的ESN权重提供优化的初始化。最后,我们通过仿真展示了由此带来的接收处理/符号检测性能提升。这是迈向可解释机器学习(XML)的第一步,使模型具备可实际利用的可解释性,并可与已有领域知识结合,以提高性能并增强检测可靠性。
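A generic echo state network equalizer sketch for a toy linear channel, trained with a ridge-regression readout. The channel taps, reservoir size, and random (untrained) input weights are assumptions; the paper's contribution of initialising those weights from wireless channel statistics is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy channel with memory plus noise, and an ESN trained to undo it.
symbols = rng.choice([-1.0, 1.0], size=3000)                   # BPSK symbols
channel = np.array([1.0, 0.5, -0.2])                           # assumed channel impulse response
received = np.convolve(symbols, channel, mode="full")[:len(symbols)]
received += 0.05 * rng.normal(size=len(received))

# Echo state network: fixed random reservoir, only the linear readout is trained.
N, rho, leak = 100, 0.8, 1.0
W_in = rng.uniform(-0.5, 0.5, size=(N, 1))
W = rng.normal(size=(N, N))
W *= rho / max(abs(np.linalg.eigvals(W)))                      # scale spectral radius to rho

def run_reservoir(u):
    states, x = np.zeros((len(u), N)), np.zeros(N)
    for t, ut in enumerate(u):
        x = (1 - leak) * x + leak * np.tanh(W_in[:, 0] * ut + W @ x)
        states[t] = x
    return states

S = run_reservoir(received)
train, lam = 2000, 1e-2
# Ridge-regression readout mapping reservoir states to the transmitted symbols.
W_out = np.linalg.solve(S[:train].T @ S[:train] + lam * np.eye(N), S[:train].T @ symbols[:train])
detected = np.sign(S[train:] @ W_out)
print(f"bit error rate on held-out symbols: {np.mean(detected != symbols[train:]):.4f}")
```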

Information-Theoretic Bounds on The Removal of Attribute-Specific Bias From Neural Networks

  • paper_url: http://arxiv.org/abs/2310.04955
  • repo_url: None
  • paper_authors: Jiazhi Li, Mahyar Khayatkhoei, Jiageng Zhu, Hanchen Xie, Mohamed E. Hussein, Wael AbdAlmageed
  • for: 本研究旨在探讨避免基于保护特征(如种族、性别、年龄)的神经网络预测中的偏见问题。
  • methods: 本研究对现有有前景的属性偏见消除方法进行数学与实验分析,推导出以偏见强度为变量的一般性信息论性能上界。
  • results: 研究发现,现有的属性偏见消除方法仅在数据集固有偏见相对较弱时才有效。这提醒我们在可能出现强属性偏见的小型数据集上慎用这些方法,并凸显了开发能克服这一局限的方法的必要性。
    Abstract Ensuring a neural network is not relying on protected attributes (e.g., race, sex, age) for predictions is crucial in advancing fair and trustworthy AI. While several promising methods for removing attribute bias in neural networks have been proposed, their limitations remain under-explored. In this work, we mathematically and empirically reveal an important limitation of attribute bias removal methods in presence of strong bias. Specifically, we derive a general non-vacuous information-theoretical upper bound on the performance of any attribute bias removal method in terms of the bias strength. We provide extensive experiments on synthetic, image, and census datasets to verify the theoretical bound and its consequences in practice. Our findings show that existing attribute bias removal methods are effective only when the inherent bias in the dataset is relatively weak, thus cautioning against the use of these methods in smaller datasets where strong attribute bias can occur, and advocating the need for methods that can overcome this limitation.
    摘要 确保神经网络的预测不依赖受保护属性(例如种族、性别、年龄),是推进公平可信人工智能的关键。虽然已经提出了一些有前景的神经网络属性偏见消除方法,但它们的局限性仍未得到充分探讨。在这项工作中,我们从数学和实验两方面揭示了属性偏见消除方法在强偏见存在时的一个重要局限。具体而言,我们推导出一个一般性的、非平凡的信息论上界,用以刻画任何属性偏见消除方法的性能与偏见强度的关系。我们在合成、图像和人口普查数据集上进行了广泛实验,以验证该理论上界及其在实践中的后果。我们的发现表明,现有属性偏见消除方法仅在数据集固有偏见相对较弱时才有效,因此在可能出现强属性偏见的小型数据集上应慎用这些方法,同时也表明需要开发能够克服这一局限的方法。

A framework to generate sparsity-inducing regularizers for enhanced low-rank matrix completion

  • paper_url: http://arxiv.org/abs/2310.04954
  • repo_url: None
  • paper_authors: Zhi-Yong Wang, Hing Cheung So
  • for: 提出了一个框架,用于生成具有闭式邻近算子(proximity operator)的稀疏诱导正则化项(SIR),并应用于低秩矩阵补全。
  • methods: 基于半二次(half-quadratic)优化生成相应的正则化项,并基于交替方向乘子法(ADMM)开发求解算法。
  • results: 进行了大量数值实验,证明了该方法在恢复性能和运行时间方面的优势。
    Abstract Applying half-quadratic optimization to loss functions can yield the corresponding regularizers, while these regularizers are usually not sparsity-inducing regularizers (SIRs). To solve this problem, we devise a framework to generate an SIR with closed-form proximity operator. Besides, we specify our framework using several commonly-used loss functions, and produce the corresponding SIRs, which are then adopted as nonconvex rank surrogates for low-rank matrix completion. Furthermore, algorithms based on the alternating direction method of multipliers are developed. Extensive numerical results show the effectiveness of our methods in terms of recovery performance and runtime.
    摘要 对损失函数应用半二次优化可以得到相应的正则化项,但这些正则化项通常不是稀疏诱导正则化项(SIR)。为解决这个问题,我们设计了一个框架,用于生成具有闭式邻近算子的SIR。此外,我们用若干常用的损失函数来具体化该框架,生成相应的SIR,并将其作为非凸的秩代理(rank surrogate)用于低秩矩阵补全。进一步地,我们开发了基于交替方向乘子法(ADMM)的算法。大量数值实验表明,我们的方法在恢复性能和运行时间方面均表现出色。
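As a point of reference for the framework above, a minimal ADMM sketch for matrix completion using the nuclear norm, whose proximity operator is singular value thresholding; the paper instead plugs in nonconvex SIRs generated by its framework, which is not reproduced here. Problem sizes and penalties are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth low-rank matrix and a random observation mask.
m, n, r = 60, 50, 3
M = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
mask = rng.random((m, n)) < 0.4                      # about 40% of entries observed

def svt(Y, tau):
    """Singular value thresholding: the closed-form proximity operator of the
    nuclear norm, used here in place of the paper's nonconvex SIR surrogates."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# ADMM for: minimize ||X||_* subject to X agreeing with M on observed entries (split X = Z).
X = np.zeros((m, n)); Z = np.zeros((m, n)); Lmbd = np.zeros((m, n))
rho = 1.0
for _ in range(200):
    X = svt(Z - Lmbd, 1.0 / rho)      # X-update: proximity operator of the regularizer
    Z = X + Lmbd
    Z[mask] = M[mask]                 # Z-update: project onto the data-consistency set
    Lmbd += X - Z                     # dual update

print("relative recovery error:", np.linalg.norm(X - M) / np.linalg.norm(M))
```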