cs.LG - 2023-10-04

PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks

  • paper_url: http://arxiv.org/abs/2310.03212
  • repo_url: None
  • paper_authors: Samaneh Javadinia, Amirali Baniasadi
  • for: Improving the performance of Capsule Networks (CapsNets) on image classification tasks while reducing their computational cost.
  • methods: Introduces and studies a Parallel Dynamic Routing (PDR) strategy that mitigates the computational complexity of CapsNet and improves its scalability (the baseline routing loop it parallelizes is sketched after this entry).
  • results: Compared with CapsNet on the CIFAR-10 dataset, PDR-CapsNet achieves 83.55% accuracy with 87.26% fewer parameters, 32.27% and 47.40% fewer MACs and FLOPs respectively, 3x faster inference, and 7.29 J less energy consumption on a 2080Ti GPU.
    Abstract Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks. However, they are limited in their ability to handle rotational and viewpoint variations due to information loss in max-pooling layers. Capsule Networks (CapsNets) employ a computationally-expensive iterative process referred to as dynamic routing to address these issues. CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs. To overcome these challenges, we introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet that offers superior performance, less energy consumption, and lower overfitting rates. By leveraging a parallelization strategy, PDR-CapsNet mitigates the computational complexity of CapsNet and increases throughput, efficiently using hardware resources. As a result, we achieve 83.55\% accuracy while requiring 87.26\% fewer parameters, 32.27\% and 47.40\% fewer MACs, and Flops, achieving 3x faster inference and 7.29J less energy consumption on a 2080Ti GPU with 11GB VRAM compared to CapsNet and for the CIFAR-10 dataset.
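
A brief sketch of the routing-by-agreement loop that CapsNet relies on, and that PDR-CapsNet parallelizes, may help place the contribution. The function names, tensor shapes, and three routing iterations below are illustrative assumptions rather than the paper's implementation; PDR-CapsNet's parallel branches are not shown.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Non-linear "squashing": short vectors shrink toward 0, long vectors toward unit length.
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

def dynamic_routing(u_hat, num_iters=3):
    # u_hat: prediction vectors, shape (batch, in_caps, out_caps, out_dim).
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)   # routing logits
    for _ in range(num_iters):
        c = F.softmax(b, dim=2)                              # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)             # weighted sum over input capsules
        v = squash(s)                                        # output capsules: (batch, out_caps, out_dim)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)         # agreement update
    return v

v = dynamic_routing(torch.randn(4, 1152, 10, 16))            # e.g. 1152 primary -> 10 digit capsules
print(v.shape)                                               # torch.Size([4, 10, 16])
```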

Regret Analysis of Distributed Online Control for LTI Systems with Adversarial Disturbances

  • paper_url: http://arxiv.org/abs/2310.03206
  • repo_url: None
  • paper_authors: Ting-Jui Chang, Shahin Shahrampour
  • for: Distributed online control over a network of linear time-invariant (LTI) systems, possibly with unknown dynamics, under adversarial disturbances, with the goal of competing with the best centralized control policy in hindsight.
  • methods: A fully distributed disturbance feedback controller for the known-dynamics case; for unknown dynamics, a distributed explore-then-commit scheme in which all agents jointly estimate the system dynamics before applying the controller with each agent's estimate (the regret objective being minimized is written out after this entry).
  • results: Regret bounds of $O(\sqrt{T}\log T)$ for known dynamics and $O(T^{2/3} \text{poly}(\log T))$ for unknown dynamics, showing that the distributed controllers can track the best centralized policy over the horizon.
    Abstract This paper addresses the distributed online control problem over a network of linear time-invariant (LTI) systems (with possibly unknown dynamics) in the presence of adversarial perturbations. There exists a global network cost that is characterized by a time-varying convex function, which evolves in an adversarial manner and is sequentially and partially observed by local agents. The goal of each agent is to generate a control sequence that can compete with the best centralized control policy in hindsight, which has access to the global cost. This problem is formulated as a regret minimization. For the case of known dynamics, we propose a fully distributed disturbance feedback controller that guarantees a regret bound of $O(\sqrt{T}\log T)$, where $T$ is the time horizon. For the unknown dynamics case, we design a distributed explore-then-commit approach, where in the exploration phase all agents jointly learn the system dynamics, and in the learning phase our proposed control algorithm is applied using each agent system estimate. We establish a regret bound of $O(T^{2/3} \text{poly}(\log T))$ for this setting.
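
For reference, the regret being minimized can be written in the standard online-control form below; the notation is generic and the paper's comparator class may be defined slightly differently.

$$\text{Regret}(T) \;=\; \sum_{t=1}^{T} c_t(x_t, u_t) \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\big(x_t^{\pi}, u_t^{\pi}\big),$$

where $c_t$ is the adversarially chosen, time-varying convex global cost, $(x_t, u_t)$ is the state-input trajectory generated by the distributed controllers, and $\Pi$ is the class of centralized policies with access to the global cost in hindsight. The paper's guarantees are $\text{Regret}(T) = O(\sqrt{T}\log T)$ with known dynamics and $O(T^{2/3}\,\text{poly}(\log T))$ with unknown dynamics.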

ProGO: Probabilistic Global Optimizer

  • paper_url: http://arxiv.org/abs/2310.04457
  • repo_url: None
  • paper_authors: Xinyu Zhang, Sujit Ghosh
  • for: Global optimization faces challenges from non-convex objective functions, high computational complexity, and unavailable gradient information; these limitations often cause existing algorithms to return suboptimal solutions or fail to converge.
  • methods: Develops a sequence of multidimensional integration-based methods that converge to the global optima under mild regularity conditions. The probabilistic approach is gradient-free and rests on a rigorous convergence framework built on the properties of the nascent optima distribution; to alleviate the cost of multidimensional integration, a latent slice sampler with a geometric rate of convergence generates samples from this distribution, which are used to approximate the global optima.
  • results: On a variety of popular non-convex test functions with finite global optima, the method outperforms many state-of-the-art approaches by an order of magnitude in regret value and speed of convergence; however, it may not be suitable for functions that are expensive to evaluate.
    Abstract In the field of global optimization, many existing algorithms face challenges posed by non-convex target functions and high computational complexity or unavailability of gradient information. These limitations, exacerbated by sensitivity to initial conditions, often lead to suboptimal solutions or failed convergence. This is true even for Metaheuristic algorithms designed to amalgamate different optimization techniques to improve their efficiency and robustness. To address these challenges, we develop a sequence of multidimensional integration-based methods that we show to converge to the global optima under some mild regularity conditions. Our probabilistic approach does not require the use of gradients and is underpinned by a mathematically rigorous convergence framework anchored in the nuanced properties of nascent optima distribution. In order to alleviate the problem of multidimensional integration, we develop a latent slice sampler that enjoys a geometric rate of convergence in generating samples from the nascent optima distribution, which is used to approximate the global optima. The proposed Probabilistic Global Optimizer (ProGO) provides a scalable unified framework to approximate the global optima of any continuous function defined on a domain of arbitrary dimension. Empirical illustrations of ProGO across a variety of popular non-convex test functions (having finite global optima) reveal that the proposed algorithm outperforms, by order of magnitude, many existing state-of-the-art methods, including gradient-based, zeroth-order gradient-free, and some Bayesian Optimization methods, in term regret value and speed of convergence. It is, however, to be noted that our approach may not be suitable for functions that are expensive to compute.

Digital Ethics in Federated Learning

  • paper_url: http://arxiv.org/abs/2310.03178
  • repo_url: None
  • paper_authors: Liangqi Yuan, Ziran Wang, Christopher G. Brinton
  • for: Examines Federated Learning (FL) as a privacy-preserving, data-efficient technique for the Internet of Things (IoT), and the digital ethics issues that arise when human-centric devices serve as FL clients.
  • methods: Considers FL's sharing of machine learning (ML) model parameters for multi-party collaboration, and analyzes the challenges caused by differing perspectives and objectives between clients and the server, together with possible solutions.
  • results: Analyzes client-side challenges in both centralized and decentralized FL, and discusses future opportunities for FL in human-centric IoT.
    Abstract The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potential in privacy preservation and learning efficiency enhancement. In this paper, we highlight the digital ethics concerns that arise when human-centric devices serve as clients in FL. More specifically, challenges of game dynamics, fairness, incentive, and continuity arise in FL due to differences in perspectives and objectives between clients and the server. We analyze these challenges and their solutions from the perspectives of both the client and the server, and through the viewpoints of centralized and decentralized FL. Finally, we explore the opportunities in FL for human-centric IoT as directions for future development.

Test Case Recommendations with Distributed Representation of Code Syntactic Features

  • paper_url: http://arxiv.org/abs/2310.03174
  • repo_url: https://github.com/mosabrezaei/test-case-recommendation
  • paper_authors: Mosab Rezaei, Hamed Alhoori, Mona Rahimi
  • for: Improving the efficiency and effectiveness of software testing by automating the generation and maintenance of unit tests.
  • methods: Trains a neural network to embed method-level source code and unit tests while preserving their structural and semantic features, then computes cosine similarity between embeddings to retrieve the most similar existing unit tests (a retrieval sketch follows this entry).
  • results: On the Methods2Test dataset, the approach retrieves the most similar existing test cases for a given method, reducing the developers' effort in writing the expected test cases.
    Abstract Frequent modifications of unit test cases are inevitable due to software's continuous underlying changes in source code, design, and requirements. Since manually maintaining software test suites is tedious, timely, and costly, automating the process of generation and maintenance of test units will significantly impact the effectiveness and efficiency of software testing processes. To this end, we propose an automated approach which exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to the developers. The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedded vectors) while preserving the importance of the structure in the code. Retrieving the semantic and structural properties of a given method, the approach computes cosine similarity between the method's embedding and the previously-embedded training instances. Further, according to the similarity scores between the embedding vectors, the model identifies the closest methods of embedding and the associated unit tests as the most similar recommendations. The results on the Methods2Test dataset showed that, while there is no guarantee to have similar relevant test cases for the group of similar methods, the proposed approach extracts the most similar existing test cases for a given method in the dataset, and evaluations show that recommended test cases decrease the developers' effort to generating expected test cases.
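
A minimal retrieval sketch of the recommendation step, assuming embeddings have already been produced by a separately trained, structure-aware code encoder; the function and variable names here are hypothetical.

```python
import numpy as np

def recommend_tests(method_vec, train_method_vecs, train_tests, k=5):
    """Return the unit tests paired with the k training methods most similar
    to the query method embedding (cosine similarity)."""
    q = method_vec / np.linalg.norm(method_vec)
    M = train_method_vecs / np.linalg.norm(train_method_vecs, axis=1, keepdims=True)
    scores = M @ q                      # cosine similarity to every training method
    top = np.argsort(-scores)[:k]       # indices of the k closest methods
    return [(train_tests[i], float(scores[i])) for i in top]

# Usage with toy embeddings (in the paper these come from the trained code encoder):
rng = np.random.default_rng(0)
train_vecs = rng.normal(size=(100, 64))
train_tests = [f"testCase_{i}" for i in range(100)]
print(recommend_tests(rng.normal(size=64), train_vecs, train_tests, k=3))
```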

Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors

  • paper_url: http://arxiv.org/abs/2310.03166
  • repo_url: https://github.com/advmlphish/raze_to_the_ground_aisec23
  • paper_authors: Biagio Montaruli, Luca Demetrio, Maura Pintor, Luca Compagna, Davide Balzarotti, Battista Biggio
  • for: Enabling a fairer robustness evaluation of machine-learning phishing webpage detectors (ML-PWD) by improving on previously proposed adversarial HTML attacks.
  • methods: Designs a novel set of fine-grained HTML manipulations that preserve the page's malicious functionality and visual appearance, and selects which manipulations to apply with a query-efficient black-box optimization algorithm (a generic greedy query loop is sketched after this entry).
  • results: Experiments show the attack razes the performance of current state-of-the-art ML-PWD to the ground using just 30 queries, overcoming the weaker attacks developed in previous work.
    Abstract Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness due to their lack of optimizing the usage of the adopted manipulations, and they focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations which allow to modify the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.
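
The paper's optimizer is more sophisticated than the following, but a generic greedy black-box loop illustrates the query accounting. Here `manipulations` stands in for the functionality- and rendering-preserving transformations, and `detector_score` is an assumed oracle returning the detector's phishing probability; both names are hypothetical.

```python
def greedy_blackbox_attack(html, manipulations, detector_score, budget=30, threshold=0.5):
    """At each step, try each manipulation, keep the one that lowers the detector's
    phishing score the most, and stop when the page is classified benign or the
    query budget is exhausted. `manipulations` are callables html -> html."""
    best, best_score, queries = html, detector_score(html), 1
    while best_score >= threshold and queries < budget:
        candidates = []
        for m in manipulations:
            if queries >= budget:
                break
            cand = m(best)
            candidates.append((detector_score(cand), cand))
            queries += 1
        if not candidates:
            break
        score, cand = min(candidates, key=lambda t: t[0])
        if score >= best_score:          # no manipulation helped; stop
            break
        best, best_score = cand, score
    return best, best_score, queries
```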

Enhancing Accuracy in Deep Learning Using Random Matrix Theory

  • paper_url: http://arxiv.org/abs/2310.03165
  • repo_url: None
  • paper_authors: Leonid Berlyand, Etienne Sandier, Yitzchak Shmalo, Lei Zhang
  • For: The paper explores the application of random matrix theory (RMT) in the training of deep neural networks (DNNs) to simplify DNN architecture and improve accuracy.
  • Methods: The paper uses techniques from RMT to determine the number of singular values to be removed from the weight layers of a DNN during training, specifically via singular value decomposition (SVD).
  • Results: The paper shows that the proposed method can be applied to any fully connected or convolutional layer of a pretrained DNN, reducing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. Empirical evidence is provided on the MNIST and Fashion MNIST datasets.
    Abstract In this study, we explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning to simplify DNN architecture and loss landscape. RMT, recently used to address overfitting in deep learning, enables the examination of DNN's weight layer spectra. We use these techniques to optimally determine the number of singular values to be removed from the weight layers of a DNN during training via singular value decomposition (SVD). This process aids in DNN simplification and accuracy enhancement, as evidenced by training simple DNN models on the MNIST and Fashion MNIST datasets. Our method can be applied to any fully connected or convolutional layer of a pretrained DNN, decreasing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. By discarding small singular values based on RMT criteria, the accuracy of the test set remains consistent, facilitating more efficient DNN training without compromising performance. We provide both theoretical and empirical evidence supporting our claim that the elimination of small singular values based on RMT does not negatively impact the DNN's accuracy. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
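
A hedged sketch of the core pruning step: truncate a weight layer's SVD at a Marchenko-Pastur-style noise edge, a common RMT criterion. The noise-scale estimate and threshold below are illustrative assumptions; the paper's exact criterion may differ.

```python
import numpy as np

def prune_layer_svd(W, sigma=None):
    """Keep only singular values above the Marchenko-Pastur bulk edge.
    W is a 2-D weight matrix; conv kernels would first be reshaped to 2-D."""
    n, m = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    if sigma is None:
        # crude noise-scale estimate from the singular-value median (assumption)
        sigma = np.median(s) / np.sqrt(min(n, m))
    mp_edge = sigma * (np.sqrt(n) + np.sqrt(m))   # approx. largest singular value of pure noise
    keep = s > mp_edge
    W_pruned = (U[:, keep] * s[keep]) @ Vt[keep]
    return W_pruned, int(keep.sum())

W = np.random.randn(256, 128) * 0.05
W[:, :4] += 1.0                                   # inject a few strong "signal" directions
W_pruned, rank = prune_layer_svd(W)
print(rank, np.linalg.norm(W - W_pruned) / np.linalg.norm(W))
```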

FedNAR: Federated Optimization with Normalized Annealing Regularization

  • paper_url: http://arxiv.org/abs/2310.03163
  • repo_url: https://github.com/ljb121002/fednar
  • paper_authors: Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric P. Xing, Hongyi Wang
  • for: Improving generalization in Federated Learning (FL) and preventing overfitting on local clients.
  • methods: Proposes Normalized Annealing Regularization (FedNAR), a simple yet effective algorithmic plug-in that can be integrated into any existing FL algorithm; FedNAR regulates the magnitude of each update by co-clipping the gradient and the weight-decay term (a co-clipping sketch follows this entry).
  • results: Extensive experiments on vision and language datasets with different backbone federated optimization algorithms show that FedNAR accelerates convergence and improves model accuracy; it can also self-adjust the weight decay when the initial setting is not optimal, whereas the accuracy of traditional FL algorithms declines markedly.
    Abstract Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution. To address this challenge, we develop {\it Federated optimization with Normalized Annealing Regularization} (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithms. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our codes are released at \href{https://github.com/ljb121002/fednar}{https://github.com/ljb121002/fednar}.
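
A minimal sketch of the co-clipping idea on a single local step, assuming decoupled weight decay; FedNAR's actual normalization and annealing schedule are more involved, and the function name and constants here are illustrative.

```python
import torch

def fednar_style_step(param, grad, lr=0.1, weight_decay=1e-3, max_norm=1.0):
    """Combine the gradient and the weight-decay term into one update vector
    whose norm is then capped, so weight decay cannot dominate the step."""
    update = grad + weight_decay * param           # gradient + decoupled weight decay
    norm = update.norm()
    if norm > max_norm:
        update = update * (max_norm / norm)        # co-clipping of both terms together
    with torch.no_grad():
        param -= lr * update
    return param
```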

FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent

  • paper_url: http://arxiv.org/abs/2310.03156
  • repo_url: None
  • paper_authors: Ziyao Wang, Jianyu Wang, Ang Li
  • for: Addresses hyperparameter optimization, one of the key practical challenges of Federated Learning (FL), focusing on learning rate adaptation.
  • methods: Proposes FedHyper, a hypergradient-based learning rate scheduling algorithm for FL that adapts both global and local learning rates as training progresses and remains robust across a range of initial learning rate settings (the classical hypergradient update it builds on is sketched after this entry).
  • results: Experiments on vision and language benchmark datasets show FedHyper converges 1.1-3x faster than FedAvg and competing baselines while reaching higher final accuracy, and improves accuracy by up to 15% over FedAvg under suboptimal initial learning rate settings.
    Abstract The theoretical landscape of federated learning (FL) undergoes rapid evolution, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of these critical challenges. Amongst the diverse adjustments in hyperparameters, the adaptation of the learning rate emerges as a crucial component, holding the promise of significantly enhancing the efficacy of FL systems. In response to this critical need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as the training progresses. In addition, FedHyper not only showcases unparalleled robustness to a spectrum of initial learning rate configurations but also significantly alleviates the necessity for laborious empirical learning rate adjustments. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FEDHYPER consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper catalyzes a remarkable surge in accuracy, augmenting it by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
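
FedHyper builds on hypergradient descent; the classical single-node update rule is sketched below. The step size and clipping bounds are illustrative assumptions, and FedHyper applies analogous rules to both the server-side (global) and client-side (local) rates.

```python
import torch

def hypergradient_lr_update(lr, grad, prev_grad, beta=1e-3, lr_min=1e-5, lr_max=1.0):
    """Increase the learning rate when consecutive gradients point in similar
    directions, decrease it when they oppose each other."""
    hypergrad = torch.dot(grad.flatten(), prev_grad.flatten())
    new_lr = lr + beta * hypergrad.item()
    return float(min(max(new_lr, lr_min), lr_max))
```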

Towards out-of-distribution generalizable predictions of chemical kinetics properties

  • paper_url: http://arxiv.org/abs/2310.03152
  • repo_url: None
  • paper_authors: Zihao Wang, Yongqiang Chen, Yang Duan, Weijiang Li, Bo Han, James Cheng, Hanghang Tong
  • for: Proposes machine-learning-based prediction of chemical kinetics properties to support the design of high-throughput chemical synthesis processes.
  • methods: Categorizes out-of-distribution (OOD) kinetic property prediction into three levels (structure, condition, and mechanism) and builds comprehensive datasets to benchmark state-of-the-art reaction prediction and graph OOD methods in this setting.
  • results: The benchmarks reveal the challenges and opportunities of existing machine-learning methods at each OOD level, and the released datasets can support further research in this direction.
    Abstract Machine Learning (ML) techniques have found applications in estimating chemical kinetics properties. With the accumulated drug molecules identified through "AI4drug discovery", the next imperative lies in AI-driven design for high-throughput chemical synthesis processes, with the estimation of properties of unseen reactions with unexplored molecules. To this end, the existing ML approaches for kinetics property prediction are required to be Out-Of-Distribution (OOD) generalizable. In this paper, we categorize the OOD kinetic property prediction into three levels (structure, condition, and mechanism), revealing unique aspects of such problems. Under this framework, we create comprehensive datasets to benchmark (1) the state-of-the-art ML approaches for reaction prediction in the OOD setting and (2) the state-of-the-art graph OOD methods in kinetics property prediction problems. Our results demonstrated the challenges and opportunities in OOD kinetics property prediction. Our datasets and benchmarks can further support research in this direction.

Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly

  • paper_url: http://arxiv.org/abs/2310.03150
  • repo_url: None
  • paper_authors: Herbert Woisetschläger, Alexander Isenko, Shiqiang Wang, Ruben Mayer, Hans-Arno Jacobsen
  • for: This paper explores the use of Federated Learning (FL) to bring large language models (LLMs) to modern edge computing systems.
  • methods: The paper fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. The study also provides a micro-level hardware benchmark and compares the model FLOP utilization to a state-of-the-art data center GPU.
  • results: The paper evaluates the current capabilities of edge computing systems and their potential for LLM FL workloads, and demonstrates the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
    Abstract Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
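
Model FLOP utilization (MFU), which the study uses to compare edge devices with a data-center GPU, is the ratio of FLOPs the model actually executes per second to the hardware's peak. A back-of-the-envelope calculation, with all numbers illustrative rather than taken from the paper:

```python
def mfu(params, tokens_per_second, peak_flops_per_second):
    # Rough transformer training cost: ~6 FLOPs per parameter per token
    # (forward + backward); the paper's own accounting may be more detailed.
    achieved = 6 * params * tokens_per_second
    return achieved / peak_flops_per_second

# e.g. a ~250M-parameter model at 2,000 tokens/s on a device with 10 TFLOP/s peak:
print(f"{mfu(250e6, 2_000, 10e12):.1%}")   # -> 30.0%
```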

Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data

  • paper_url: http://arxiv.org/abs/2310.03146
  • repo_url: None
  • paper_authors: Adam Wang, Son Nguyen, Albert Montillo
  • for: This paper aims to address two core problems in traditional deep learning (DL) by introducing a mixed effects deep learning (MEDL) framework that promotes fairness and robustness.
  • methods: The MEDL framework separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) using a cluster adversary and a Bayesian neural network. The framework also incorporates adversarial debiasing to promote equality-of-odds fairness across fairness-sensitive variables.
  • results: The paper shows that the MEDL framework notably enhances fairness across all sensitive variables, increasing fairness up to 82% for age, 43% for race, 86% for sex, and 27% for marital-status, while maintaining robust performance and clarity. The framework is versatile and suitable for various dataset types and tasks, making it broadly applicable.
    Abstract Traditional deep learning (DL) suffers from two core problems. Firstly, it assumes training samples are independent and identically distributed. However, numerous real-world datasets group samples by shared measurements (e.g., study participants or cells), violating this assumption. In these scenarios, DL can show compromised performance, limited generalization, and interpretability issues, coupled with cluster confounding causing Type 1 and 2 errors. Secondly, models are typically trained for overall accuracy, often neglecting underrepresented groups and introducing biases in crucial areas like loan approvals or determining health insurance rates, such biases can significantly impact one's quality of life. To address both of these challenges simultaneously, we present a mixed effects deep learning (MEDL) framework. MEDL separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through the introduction of: 1) a cluster adversary which encourages the learning of cluster-invariant FE, 2) a Bayesian neural network which quantifies the RE, and a mixing function combining the FE an RE into a mixed-effect prediction. We marry this MEDL with adversarial debiasing, which promotes equality-of-odds fairness across FE, RE, and ME predictions for fairness-sensitive variables. We evaluated our approach using three datasets: two from census/finance focusing on income classification and one from healthcare predicting hospitalization duration, a regression task. Our framework notably enhances fairness across all sensitive variables-increasing fairness up to 82% for age, 43% for race, 86% for sex, and 27% for marital-status. Besides promoting fairness, our method maintains the robust performance and clarity of MEDL. It's versatile, suitable for various dataset types and tasks, making it broadly applicable. Our GitHub repository houses the implementation.

OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials

  • paper_url: http://arxiv.org/abs/2310.03121
  • repo_url: None
  • paper_authors: Peter Eastman, Raimondas Galvelis, Raúl P. Peláez, Charlles R. A. Abreu, Stephen E. Farr, Emilio Gallicchio, Anton Gorenko, Michael M. Henry, Frank Hu, Jing Huang, Andreas Krämer, Julien Michel, Joshua A. Mitchell, Vijay S. Pande, João PGLM Rodrigues, Jaime Rodriguez-Guerra, Andrew C. Simmonett, Jason Swails, Ivy Zhang, John D. Chodera, Gianni De Fabritiis, Thomas E. Markland
  • for: Presents the new version of the OpenMM molecular dynamics toolkit, which adds support for machine learning potentials to improve the accuracy of molecular simulations.
  • methods: Arbitrary PyTorch models can be added to a simulation and used to compute forces and energies; a higher-level interface lets users model their molecules of interest with general-purpose pretrained potentials, and optimized CUDA kernels and custom PyTorch operations greatly improve simulation speed (a hedged usage sketch follows this entry).
  • results: The features are demonstrated on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water, showing that machine learning can improve simulation accuracy at only a modest increase in cost.
    Abstract Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.
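
A hedged usage sketch based on the openmm-ml and openmm-torch add-on packages that provide this functionality; the input file, the chosen pretrained potential, and the exact argument names are assumptions that should be checked against the OpenMM 8 documentation.

```python
from openmm import LangevinMiddleIntegrator, unit
from openmm.app import PDBFile, Simulation
from openmmml import MLPotential                 # higher-level interface to pretrained ML potentials

pdb = PDBFile("molecule.pdb")                    # hypothetical input structure
potential = MLPotential("ani2x")                 # a general-purpose pretrained potential
system = potential.createSystem(pdb.topology)    # forces and energies come from the ML model

# Alternatively, an arbitrary TorchScript model mapping positions (nm) to energy (kJ/mol)
# can be wrapped with openmmtorch.TorchForce and added to an existing System.

integrator = LangevinMiddleIntegrator(300 * unit.kelvin, 1 / unit.picosecond,
                                      1 * unit.femtoseconds)
sim = Simulation(pdb.topology, system, integrator)
sim.context.setPositions(pdb.positions)
sim.step(1000)                                   # 1,000 MD steps with ML-computed forces
```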

Crossed-IoT device portability of Electromagnetic Side Channel Analysis: Challenges and Dataset

  • paper_url: http://arxiv.org/abs/2310.03119
  • repo_url: None
  • paper_authors: Tharindu Lakshan Yasarathna, Lojenaa Navanesan, Simon Barque, Assanka Sayakkara, Nhien-An Le-Khac
  • for: This paper is written for the purpose of investigating the limitations of Electromagnetic Side-Channel Analysis (EM-SCA) approaches for IoT forensics, specifically the impact of device variability on the accuracy and reliability of EM-SCA results.
  • methods: The paper uses machine-learning (ML) based approaches for EM-SCA and collects EM-SCA datasets to evaluate the limitations of current EM-SCA approaches and datasets. The study also employs transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices.
  • results: The paper contributes a new dataset for using deep learning models in analysing Electromagnetic Side-Channel data with regards to the cross-device portability matter, and demonstrates the feasibility of using transfer learning to improve the accuracy and reliability of EM-SCA results in IoT forensics.
    Abstract IoT (Internet of Things) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT Forensics is collecting and analyzing digital evidence from IoT devices to investigate cybercrimes, security breaches, and other malicious activities that may have taken place on these connected devices. In particular, EM-SCA has become an essential tool for IoT forensics due to its ability to reveal confidential information about the internal workings of IoT devices without interfering these devices or wiretapping their networks. However, the accuracy and reliability of EM-SCA results can be limited by device variability, environmental factors, and data collection and processing methods. Besides, there is very few research on these limitations that affects significantly the accuracy of EM-SCA approaches for the crossed-IoT device portability as well as limited research on the possible solutions to address such challenge. Therefore, this empirical study examines the impact of device variability on the accuracy and reliability of EM-SCA approaches, in particular machine-learning (ML) based approaches for EM-SCA. We firstly presents the background, basic concepts and techniques used to evaluate the limitations of current EM-SCA approaches and datasets. Our study then addresses one of the most important limitation, which is caused by the multi-core architecture of the processors (SoC). We present an approach to collect the EM-SCA datasets and demonstrate the feasibility of using transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices. Our study moreover contributes a new dataset for using deep learning models in analysing Electromagnetic Side-Channel data with regards to the cross-device portability matter.

Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation

  • paper_url: http://arxiv.org/abs/2310.03112
  • repo_url: https://github.com/slds-lmu/mbt_comparison
  • paper_authors: Julia Herbinger, Susanne Dandl, Fiona K. Ewald, Sofia Loibl, Giuseppe Casalicchio
  • for: Describes how model-based trees can serve as interpretable surrogate models for retrospectively explaining black-box machine learning models via model distillation.
  • methods: Compares four model-based tree algorithms (SLIM, GUIDE, MOB, and CTree) that partition the feature space into interpretable regions via decision rules and fit interpretable additive main-effect models within each region to approximate the black-box model (a minimal distillation sketch follows this entry).
  • results: The algorithms are assessed on fidelity, interpretability, stability, and their ability to capture interaction effects through appropriate splits, and the paper closes with user-specific recommendations.
    Abstract Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striking for an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
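
The scikit-learn sketch below mimics the distillation idea with off-the-shelf pieces: a shallow tree partitions the feature space, and a linear main-effects model is fitted to the black-box predictions inside each region. The real model-based tree algorithms (SLIM, GUIDE, MOB, CTree) grow the partition and the leaf models jointly, so this is only an illustration under those assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor   # stand-in black box
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(2000, 3))
y = np.sin(X[:, 0]) * (X[:, 1] > 0) + 0.5 * X[:, 2] + 0.05 * rng.normal(size=2000)

black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
y_bb = black_box.predict(X)                           # distill the black-box predictions, not the labels

# Surrogate: a shallow tree defines the regions; a linear (main-effects) model per leaf.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_bb)
leaves = tree.apply(X)
leaf_models = {leaf: LinearRegression().fit(X[leaves == leaf], y_bb[leaves == leaf])
               for leaf in np.unique(leaves)}

def surrogate_predict(X_new):
    out = np.empty(len(X_new))
    new_leaves = tree.apply(X_new)
    for leaf in np.unique(new_leaves):
        idx = new_leaves == leaf
        out[idx] = leaf_models[leaf].predict(X_new[idx])
    return out

print("fidelity R^2:", 1 - np.var(y_bb - surrogate_predict(X)) / np.var(y_bb))
```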

Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data

  • paper_url: http://arxiv.org/abs/2310.03111
  • repo_url: None
  • paper_authors: Rabia Gondur, Usama Bin Sikandar, Evan Schaffer, Mikio Christian Aoi, Stephen L Keeley
  • for: Characterizing the relationship between neural population activity and behavioral data, a central goal in neuroscience.
  • methods: An unsupervised latent variable model (LVM) that combines Gaussian Process Factor Analysis (GPFA) with Gaussian Process Variational Autoencoders (GP-VAEs) to extract temporally evolving shared and independent latent structure from distinct, simultaneously recorded experimental modalities, with latents parameterized in the Fourier domain.
  • results: The model accurately separates shared and independent latent structure in simulated multi-modal data and provides good reconstructions on held-out trials; it is further applied to two experimental settings: Drosophila whole-brain calcium imaging with tracked limb positions, and Manduca sexta spike recordings from ten wing muscles during visual tracking.
    Abstract Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically only designed for a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space, but admit rich expressivity due to a deep neural network mapping to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared between or independent to each modality. We parameterize the latents of our model in the Fourier domain, and show improved latent identification using this approach over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) is able to not only identify the shared and independent latent structure across modalities accurately, but provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework on two real world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.

DP-SGD for non-decomposable objective functions

  • paper_url: http://arxiv.org/abs/2310.03104
  • repo_url: None
  • paper_authors: William Kong, Andrés Muñoz Medina, Mónica Ribero
  • for: Developing privacy-preserving training for computer vision models and large language models that rely on similarity-based (non-decomposable) loss functions.
  • methods: A new DP-SGD variant for similarity-based losses, in particular the contrastive loss, that manipulates gradients of the objective in a novel way so that the summed gradient has $O(1)$ $L_2$ sensitivity with respect to the batch size.
  • results: On CIFAR-10 pre-training and CIFAR-100 fine-tuning tasks, the method performs close to a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
    Abstract Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their $L_2$ sensitivity can grow with increasing batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity based loss functions -- in particular the commonly used contrastive loss -- that manipulates gradients of the objective function in a novel way to obtain a senstivity of the summed gradient that is $O(1)$ for batch size $n$. We test our DP-SGD variant on some preliminary CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.

Dual Prompt Tuning for Domain-Aware Federated Learning

  • paper_url: http://arxiv.org/abs/2310.03103
  • repo_url: None
  • paper_authors: Guoyizhe Wei, Feng Wang, Anshul Shah, Rama Chellappa
  • for: Addressing domain shift in federated learning, where multiple clients collaboratively train a shared model on data originating from different domains.
  • methods: Leverages prompt learning on a pre-trained vision-language model, applying both visual and textual prompt tuning to facilitate domain adaptation over decentralized data (Federated Dual Prompt Tuning, Fed-DPT).
  • results: With a pre-trained CLIP model (ViT-Base image encoder), Fed-DPT attains 68.4% average accuracy over six domains of the DomainNet dataset, improving over the original CLIP by a margin of 14.8%.
    Abstract Federated learning is a distributed machine learning paradigm that allows multiple clients to collaboratively train a shared model with their local data. Nonetheless, conventional federated learning algorithms often struggle to generalize well due to the ubiquitous domain shift across clients. In this work, we consider a challenging yet realistic federated learning scenario where the training data of each client originates from different domains. We address the challenges of domain shift by leveraging the technique of prompt learning, and propose a novel method called Federated Dual Prompt Tuning (Fed-DPT). Specifically, Fed-DPT employs a pre-trained vision-language model and then applies both visual and textual prompt tuning to facilitate domain adaptation over decentralized data. Extensive experiments of Fed-DPT demonstrate its significant effectiveness in domain-aware federated learning. With a pre-trained CLIP model (ViT-Base as image encoder), the proposed Fed-DPT attains 68.4% average accuracy over six domains in the DomainNet dataset, which improves the original CLIP by a large margin of 14.8%.

Physics-Informed Neural Networks for Accelerating Power System State Estimation

  • paper_url: http://arxiv.org/abs/2310.03088
  • repo_url: None
  • paper_authors: Solon Falas, Markos Asprou, Charalambos Konstantinou, Maria K. Michael
  • for: Accelerating power system state estimation, which provides the operating condition of the system in consecutive time intervals and is the cornerstone of the control center.
  • methods: Incorporates physical laws as prior knowledge through physics-informed neural networks (PINNs), reducing the computational complexity of state estimation while maintaining high accuracy (a generic PINN training-loss sketch follows this entry).
  • results: Experiments on the IEEE 14-bus system show up to 11% higher accuracy, 75% lower standard deviation of results, and 30% faster convergence.
    Abstract State estimation is the cornerstone of the power system control center since it provides the operating condition of the system in consecutive time intervals. This work investigates the application of physics-informed neural networks (PINNs) for accelerating power systems state estimation in monitoring the operation of power systems. Traditional state estimation techniques often rely on iterative algorithms that can be computationally intensive, particularly for large-scale power systems. In this paper, a novel approach that leverages the inherent physical knowledge of power systems through the integration of PINNs is proposed. By incorporating physical laws as prior knowledge, the proposed method significantly reduces the computational complexity associated with state estimation while maintaining high accuracy. The proposed method achieves up to 11% increase in accuracy, 75% reduction in standard deviation of results, and 30% faster convergence, as demonstrated by comprehensive experiments on the IEEE 14-bus system.
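
A generic sketch of a physics-informed training loss for state estimation; the measurement model `h` and physics residual `r` (e.g. the power-flow mismatch equations of the grid) are placeholders, and the network size and weighting are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class StateEstimator(nn.Module):
    # Maps a vector of measurements z to an estimated system state x_hat.
    def __init__(self, n_meas, n_state, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_meas, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_state))
    def forward(self, z):
        return self.net(z)

def pinn_loss(model, z, h, r, lam=1.0):
    x_hat = model(z)
    data_term = ((h(x_hat) - z) ** 2).mean()     # reproduce the measurements
    physics_term = (r(x_hat) ** 2).mean()        # satisfy the grid's physical laws
    return data_term + lam * physics_term

# Toy placeholders: identity measurement model, zero physics residual.
model = StateEstimator(n_meas=8, n_state=8)
z = torch.randn(32, 8)
loss = pinn_loss(model, z, h=lambda x: x, r=lambda x: torch.zeros_like(x))
loss.backward()
```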

Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making

  • paper_url: http://arxiv.org/abs/2310.03022
  • repo_url: None
  • paper_authors: Jeonghye Kim, Suyoung Lee, Woojun Kim, Youngchul Sung
  • for: Improving offline reinforcement learning (RL) models by better capturing the local dependence structure of trajectories modeled as a Markov decision process.
  • methods: Building on the Transformer-based Decision Transformer (DT), proposes Decision ConvFormer (DC), an action sequence predictor based on the MetaFormer architecture that replaces DT's attention module with local convolution filtering as the token mixer (a block sketch follows this entry).
  • results: DC achieves state-of-the-art performance across various standard RL benchmarks while requiring fewer resources, better captures the underlying meaning in the data, and exhibits enhanced generalization.
    Abstract The recent success of Transformer in natural language processing has sparked its use in various domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising model based on Transformer. However, we discovered that the attention module of DT is not appropriate to capture the inherent local dependence pattern in trajectories of RL modeled as a Markov decision process. To overcome the limitations of DT, we propose a novel action sequence predictor, named Decision ConvFormer (DC), based on the architecture of MetaFormer, which is a general structure to process multiple entities in parallel and understand the interrelationship among the multiple entities. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations of the RL dataset. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in data and exhibits enhanced generalization capability.
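
A minimal PyTorch sketch of a MetaFormer-style block whose token mixer is a causal depthwise 1-D convolution instead of attention, which is the core substitution DC makes; the window size, MLP ratio, and normalization placement here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvMixerBlock(nn.Module):
    # MetaFormer-style block: local depthwise Conv1d token mixer instead of attention.
    def __init__(self, dim, window=6):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mixer = nn.Conv1d(dim, dim, kernel_size=window, padding=window - 1, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                          # x: (batch, seq, dim)
        h = self.norm1(x).transpose(1, 2)          # (batch, dim, seq)
        h = self.mixer(h)[..., :x.shape[1]]        # trim right padding -> causal local filtering
        x = x + h.transpose(1, 2)
        return x + self.mlp(self.norm2(x))

block = ConvMixerBlock(dim=128, window=6)
x = torch.randn(2, 20, 128)                        # (batch, sequence of RL tokens, embedding dim)
print(block(x).shape)                              # torch.Size([2, 20, 128])
```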

High-dimensional SGD aligns with emerging outlier eigenspaces

  • paper_url: http://arxiv.org/abs/2310.03010
  • repo_url: None
  • paper_authors: Gerard Ben Arous, Reza Gheissari, Jiaoyang Huang, Aukosh Jagannath
  • for: Studies the joint evolution of SGD training dynamics and the spectra of empirical Hessian and gradient matrices when training classifiers on multi-class high-dimensional mixtures.
  • methods: Analyzes stochastic gradient descent (SGD) trajectories together with the eigenspaces of the empirical Hessian and gradient matrices in two canonical classification tasks with 1- or 2-layer neural networks.
  • results: The SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient matrices; in multi-layer settings this alignment occurs per layer, with the final layer's outlier eigenspace evolving over training and exhibiting rank deficiency when SGD converges to sub-optimal classifiers, confirming predictions from a decade of numerical studies.
    Abstract We rigorously study the joint evolution of training dynamics via stochastic gradient descent (SGD) and the spectra of empirical Hessian and gradient matrices. We prove that in two canonical classification tasks for multi-class high-dimensional mixtures and either 1 or 2-layer neural networks, the SGD trajectory rapidly aligns with emerging low-rank outlier eigenspaces of the Hessian and gradient matrices. Moreover, in multi-layer settings this alignment occurs per layer, with the final layer's outlier eigenspace evolving over the course of training, and exhibiting rank deficiency when the SGD converges to sub-optimal classifiers. This establishes some of the rich predictions that have arisen from extensive numerical studies in the last decade about the spectra of Hessian and information matrices over the course of training in overparametrized networks.

Learning characteristic parameters and dynamics of centrifugal pumps under multi-phase flow using physics-informed neural networks

  • paper_url: http://arxiv.org/abs/2310.03001
  • repo_url: None
  • paper_authors: Felipe de Castro Teixeira Carvalho, Kamaljyoti Nath, Alberto Luiz Serpa, George Em Karniadakis
  • for: Accurate modeling of electrical submersible pump (ESP) systems under multi-phase flow, which is crucial for optimizing oil production and implementing control strategies.
  • methods: A physics-informed neural network (PINN) model that estimates crucial system parameters indirectly from intake and discharge pressure measurements, evaluated on both simulated and experimental data for different water-oil ratios, together with structural and practical identifiability analyses.
  • results: The PINN model recovers the state dynamics and unknown parameters, and could reduce the need for expensive field laboratory tests used to estimate fluid properties.
    Abstract Electrical submersible pumps (ESP) are the second most used artificial lifting equipment in the oil and gas industry due to their high flow rates and boost pressures. They often have to handle multiphase flows, which usually contain a mixture of hydrocarbons, water, and/or sediments. Given these circumstances, emulsions are commonly formed. It is a liquid-liquid flow composed of two immiscible fluids whose effective viscosity and density differ from the single phase separately. In this context, accurate modeling of ESP systems is crucial for optimizing oil production and implementing control strategies. However, real-time and direct measurement of fluid and system characteristics is often impractical due to time constraints and economy. Hence, indirect methods are generally considered to estimate the system parameters. In this paper, we formulate a machine learning model based on Physics-Informed Neural Networks (PINNs) to estimate crucial system parameters. In order to study the efficacy of the proposed PINN model, we conduct computational studies using not only simulated but also experimental data for different water-oil ratios. We evaluate the state variable's dynamics and unknown parameters for various combinations when only intake and discharge pressure measurements are available. We also study structural and practical identifiability analyses based on commonly available pressure measurements. The PINN model could reduce the requirement of expensive field laboratory tests used to estimate fluid properties.

IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning

  • paper_url: http://arxiv.org/abs/2310.02995
  • repo_url: https://github.com/ibcl-anon/ibcl
  • paper_authors: Pengyuan Lu, Michele Caprio, Eric Eaton, Insup Lee
  • For: The paper focuses on continual learning, specifically addressing the trade-off between different tasks and proposing a new method called Imprecise Bayesian Continual Learning (IBCL) to improve the efficiency of continual learning.
  • Methods: IBCL updates a knowledge base in the form of a convex hull of model parameter distributions and obtains particular models to address task trade-off preferences with zero-shot, without requiring additional training overhead.
  • Results: The paper shows that models obtained by IBCL have guarantees in identifying the Pareto optimal parameters, and experiments on standard image classification and NLP tasks support this guarantee. Additionally, IBCL improves average per-task accuracy by at most 23% and peak per-task accuracy by at most 15% with respect to the baseline methods, with steadily near-zero or positive backward transfer.
    Abstract Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some previous tasks. This means that there exist multiple models that are Pareto-optimal at different times, each addressing a distinct task performance trade-off. Researchers have discussed how to train particular models to address specific trade-off preferences. However, existing algorithms require training overheads proportional to the number of preferences -- a large burden when there are multiple, possibly infinitely many, preferences. As a response, we propose Imprecise Bayesian Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base in the form of a convex hull of model parameter distributions and (2) obtains particular models to address task trade-off preferences with zero-shot. That is, IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base. We show that models obtained by IBCL have guarantees in identifying the Pareto optimal parameters. Moreover, experiments on standard image classification and NLP tasks support this guarantee. Statistically, IBCL improves average per-task accuracy by at most 23\% and peak per-task accuracy by at most 15\% with respect to the baseline methods, with steadily near-zero or positive backward transfer. Most importantly, IBCL significantly reduces the training overhead from training 1 model per preference to at most 3 models for all preferences.
    摘要 LIKE 普通多任务学习,连续学习具有多目标优化的性质,因此面临当前任务分布优化时可能需要牺牲之前任务的性能。这意味着存在多个 Pareto-优质的模型,每个模型在不同的时间都能够满足不同的任务性能质量。研究人员已经讨论了如何训练特定的模型以满足特定的任务质量让权。然而,现有的算法需要训练负担与有多个偏好相对应的训练负担成比例。为回应这个问题,我们提出了不准确杯状泛化学习(IBCL)。在新任务时,IBCL 会(1)更新知识库,其形式为模型参数分布的 convex hull,并(2)在零执行下获取任务质量让权的特定模型。即,IBCL 不需要额外的训练负担来生成根据偏好训练的模型。我们证明了由 IBCL 获取的模型具有确定 Pareto 优质参数的保证。此外,我们在标准图像识别和自然语言处理任务上进行了实验,并证明了这个保证。统计 speaking,IBCL 可以提高每个任务的均值性能 by at most 23%,并且 peak 性能 by at most 15%,与基eline 方法相比。此外,IBCL 可以大幅减少训练负担,从训练 1 个模型到训练所有偏好的模型。
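As a loose, toy-level reading of the zero-shot step — assuming the knowledge base simply stores a few parameter vectors (extreme points of its convex hull), which simplifies away the distributional machinery of the actual method — a preference vector then selects a model as a convex combination, with no training at query time:

```python
import numpy as np

def zero_shot_model(extreme_points, preference):
    """Toy sketch: combine stored parameter vectors (extreme points of the
    knowledge base's convex hull) with a task-preference weight vector."""
    w = np.asarray(preference, dtype=float)
    w = w / w.sum()                      # project the preference onto the simplex
    P = np.stack(extreme_points)         # (k, d) stored parameter vectors
    return w @ P                         # convex combination -> preference model

# Example: two stored models, 70/30 trade-off between task 1 and task 2.
theta = zero_shot_model([np.zeros(5), np.ones(5)], preference=[0.7, 0.3])
```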

Variance Reduced Halpern Iteration for Finite-Sum Monotone Inclusions

  • paper_url: http://arxiv.org/abs/2310.02987
  • repo_url: None
  • paper_authors: Xufeng Cai, Ahmet Alacaoglu, Jelena Diakonikolas
  • for: This paper focuses on solving game-theoretic equilibrium problems, particularly in machine learning applications with finite-sum structure.
  • methods: The paper proposes variants of the classical Halpern iteration that utilize variance reduction to improve the complexity guarantees. These methods are based on the properties of cocoercivity and Lipschitz continuity of the component operators.
  • results: The paper achieves improved complexity guarantees of $\widetilde{\mathcal{O}}(n + \sqrt{n}L\varepsilon^{-1})$ for finite-sum monotone inclusions, which is near-optimal up to poly-logarithmic factors. This is the first variance reduction-type result for general finite-sum monotone inclusions and for specific problems like convex-concave optimization.
    Abstract Machine learning approaches relying on such criteria as adversarial robustness or multi-agent settings have raised the need for solving game-theoretic equilibrium problems. Of particular relevance to these applications are methods targeting finite-sum structure, which generically arises in empirical variants of learning problems in these contexts. Further, methods with computable approximation errors are highly desirable, as they provide verifiable exit criteria. Motivated by these applications, we study finite-sum monotone inclusion problems, which model broad classes of equilibrium problems. Our main contributions are variants of the classical Halpern iteration that employ variance reduction to obtain improved complexity guarantees in which $n$ component operators in the finite sum are ``on average'' either cocoercive or Lipschitz continuous and monotone, with parameter $L$. The resulting oracle complexity of our methods, which provide guarantees for the last iterate and for a (computable) operator norm residual, is $\widetilde{\mathcal{O}}(n + \sqrt{n}L\varepsilon^{-1})$, which improves upon existing methods by a factor up to $\sqrt{n}$. This constitutes the first variance reduction-type result for general finite-sum monotone inclusions and for more specific problems such as convex-concave optimization when operator norm residual is the optimality measure. We further argue that, up to poly-logarithmic factors, this complexity is unimprovable in the monotone Lipschitz setting; i.e., the provided result is near-optimal.
    摘要 依赖对抗鲁棒性或多智能体设定等准则的机器学习方法，提出了求解博弈论均衡问题的需求。在这些应用中，针对有限和（finite-sum）结构的方法尤为重要，因为这类结构普遍出现在相应学习问题的经验形式中。此外，具有可计算近似误差的方法也非常理想，因为它们提供了可验证的停止准则。受这些应用的启发，我们研究有限和单调包含（monotone inclusion）问题，它们可以建模广泛的均衡问题。我们的主要贡献是经典 Halpern 迭代的若干变体，它们利用方差缩减获得改进的复杂度保证，其中有限和中的 $n$ 个分量算子在“平均意义上”要么是余矫（cocoercive）的，要么是 Lipschitz 连续且单调的，参数为 $L$。我们的方法对最后一次迭代以及（可计算的）算子范数残差给出保证，其 oracle 复杂度为 $\widetilde{\mathcal{O}}(n + \sqrt{n}L\varepsilon^{-1})$，相比现有方法最多改进 $\sqrt{n}$ 倍。这是针对一般有限和单调包含问题、以及以算子范数残差为最优性度量的凸-凹优化等更具体问题的首个方差缩减类结果。我们进一步论证，在单调 Lipschitz 设定下，该复杂度在多项对数因子意义下不可再改进，即所给结果是近似最优的。
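For orientation, a minimal NumPy sketch of the classical (deterministic) Halpern iteration that the paper builds on; the variance-reduced variants and the finite-sum structure are not implemented here, and the step size and anchoring schedule are illustrative choices.

```python
import numpy as np

def halpern(F, x0, eta=0.5, iters=1000):
    """Anchored Halpern iteration for a monotone operator F."""
    x = x0.copy()
    for k in range(iters):
        lam = 1.0 / (k + 2)                      # standard anchoring schedule
        x = lam * x0 + (1 - lam) * (x - eta * F(x))
    return x

# Example: F(x) = A x with A monotone (its symmetric part is PSD),
# so the solution of F(x) = 0 is x = 0.
A = np.array([[2.0, 1.0], [-1.0, 2.0]])
x_star = halpern(lambda x: A @ x, x0=np.array([1.0, -1.0]))
print(np.linalg.norm(A @ x_star))                # residual norm, should be small
```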

Fast, Expressive SE$(n)$ Equivariant Networks through Weight-Sharing in Position-Orientation Space

  • paper_url: http://arxiv.org/abs/2310.02970
  • repo_url: https://github.com/ebekkers/ponita
  • paper_authors: Erik J Bekkers, Sharvaree Vadgama, Rob D Hesselink, Putri A van der Linden, David W Romero
  • For: The paper derives geometrically optimal edge attributes for flexible message passing frameworks and develops an efficient equivariant group convolutional network for processing 3D point clouds.
  • Methods: The paper uses the theory of homogeneous spaces to formalize the notion of weight sharing in convolutional networks and to derive attributes that uniquely identify equivalence classes of point-pairs. It also uses group convolutions with feature maps over the homogeneous space of positions, positions and orientations, and the group SE$(3)$ itself.
  • Results: The paper achieves state-of-the-art results in accuracy and speed on three different benchmarks: interatomic potential energy prediction, trajectory forecasting in N-body systems, and generating molecules via equivariant diffusion models. In particular, using the homogeneous space of positions and orientations significantly enhances computational efficiency compared to indexing features on the full SE$(3)$ group.
    Abstract Based on the theory of homogeneous spaces we derive \textit{geometrically optimal edge attributes} to be used within the flexible message passing framework. We formalize the notion of weight sharing in convolutional networks as the sharing of message functions over point-pairs that should be treated equally. We define equivalence classes of point-pairs that are identical up to a transformation in the group and derive attributes that uniquely identify these classes. Weight sharing is then obtained by conditioning message functions on these attributes. As an application of the theory, we develop an efficient equivariant group convolutional network for processing 3D point clouds. The theory of homogeneous spaces tells us how to do group convolutions with feature maps over the homogeneous space of positions $\mathbb{R}^3$, position and orientations $\mathbb{R}^3 {\times} S^2$, and the group SE$(3)$ itself. Among these, $\mathbb{R}^3 {\times} S^2$ is an optimal choice due to the ability to represent directional information, which $\mathbb{R}^3$ methods cannot, and it significantly enhances computational efficiency compared to indexing features on the full SE$(3)$ group. We empirically support this claim by reaching state-of-the-art results -- in accuracy and speed -- on three different benchmarks: interatomic potential energy prediction, trajectory forecasting in N-body systems, and generating molecules via equivariant diffusion models.
    摘要 基于齐性空间（homogeneous spaces）理论，我们推导出可在灵活的消息传递框架中使用的几何最优边属性。我们将卷积网络中的权重共享形式化为：对应被视为等同的点对共享消息函数。我们定义了在群变换下等价的点对等价类，并推导出能唯一标识这些等价类的属性；随后通过让消息函数以这些属性为条件来实现权重共享。作为理论的应用，我们开发了一个高效的等变群卷积网络，用于处理 3D 点云。齐性空间理论告诉我们如何在位置空间 $\mathbb{R}^3$、位置与朝向空间 $\mathbb{R}^3 \times S^2$ 以及群 SE$(3)$ 本身上用特征图进行群卷积。其中 $\mathbb{R}^3 \times S^2$ 是最优选择：它能够表示 $\mathbb{R}^3$ 方法无法表示的方向信息，同时相比在完整 SE$(3)$ 群上索引特征显著提升了计算效率。我们在三个不同的基准上取得了精度与速度方面的最新成果，以实证支持这一论断：原子间势能预测、N 体系统轨迹预测，以及通过等变扩散模型生成分子。

Dual Conic Proxies for AC Optimal Power Flow

  • paper_url: http://arxiv.org/abs/2310.02969
  • repo_url: None
  • paper_authors: Guancheng Qiu, Mathieu Tanneau, Pascal Van Hentenryck
  • for: 该研究旨在开发一种基于机器学习的优化跟踪器,用于提高交流电力网络优化问题(AC-OPF)的解决方案。
  • methods: 该研究使用了一种新的 dual 架构,它可以提供有效的 dual 下界,并结合了一种自动化学习方案,以避免高成本的训练数据生成。
  • results: 对于媒体和大型电力网络,该研究的数值实验表明,提档的方法可以实现高效率和可扩展性。
    Abstract In recent years, there has been significant interest in the development of machine learning-based optimization proxies for AC Optimal Power Flow (AC-OPF). Although significant progress has been achieved in predicting high-quality primal solutions, no existing learning-based approach can provide valid dual bounds for AC-OPF. This paper addresses this gap by training optimization proxies for a convex relaxation of AC-OPF. Namely, the paper considers a second-order cone (SOC) relaxation of ACOPF, and proposes a novel dual architecture that embeds a fast, differentiable (dual) feasibility recovery, thus providing valid dual bounds. The paper combines this new architecture with a self-supervised learning scheme, which alleviates the need for costly training data generation. Extensive numerical experiments on medium- and large-scale power grids demonstrate the efficiency and scalability of the proposed methodology.
    摘要 近年来，针对交流最优潮流（AC-OPF）问题的机器学习优化代理模型受到广泛关注。尽管在预测高质量原始（primal）解方面已取得显著进展，但现有基于学习的方法都无法为 AC-OPF 提供有效的对偶（dual）下界。本文通过为 AC-OPF 的凸松弛训练优化代理来弥补这一空白。具体而言，本文考虑 AC-OPF 的二阶锥（SOC）松弛，并提出了一种新颖的对偶架构，其中嵌入了快速、可微的对偶可行性恢复，从而给出有效的对偶下界。本文还将这一新架构与自监督学习方案相结合，避免了代价高昂的训练数据生成。针对中大型电网的大量数值实验表明了所提方法的高效性与可扩展性。

Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning

  • paper_url: http://arxiv.org/abs/2310.02964
  • repo_url: https://github.com/zihan-liu-00/repcon
  • paper_authors: Zihan Liu, Ge Wang, Jiaqi Wang, Jiangbin Zheng, Stan Z. Li
  • for: 提出一种基于对比学习的肽（peptide）共建模方法 RepCon，以增强序列表示与图表示之间的一致性，从而提升下游任务的判别性能。
  • methods: 在对比学习框架下，将序列模型与图模型视为两位专家，对同一肽样本的两种表示进行融合，以最大化解耦的序列与图端到端模型表示之间的互信息。
  • results: 实验表明 RepCon 优于单独建模，并且在共建模框架下优于其他共建模方法；模型归因分析进一步验证了该方法的有效性。
    Abstract Peptides are formed by the dehydration condensation of multiple amino acids. The primary structure of a peptide can be represented either as an amino acid sequence or as a molecular graph consisting of atoms and chemical bonds. Previous studies have indicated that deep learning routes specific to sequential and graphical peptide forms exhibit comparable performance on downstream tasks. Despite the fact that these models learn representations of the same modality of peptides, we find that they explain their predictions differently. Considering sequential and graphical models as two experts making inferences from different perspectives, we work on fusing expert knowledge to enrich the learned representations for improving the discriminative performance. To achieve this, we propose a peptide co-modeling method, RepCon, which employs a contrastive learning-based framework to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models. It considers representations from the sequential encoder and the graphical encoder for the same peptide sample as a positive pair and learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs. Empirical studies of RepCon and other co-modeling methods are conducted on open-source discriminative datasets, including aggregation propensity, retention time, antimicrobial peptide prediction, and family classification from Peptide Database. Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework. In addition, the attribution on RepCon further corroborates the validity of the approach at the level of model explanation.
    摘要 peptides 是由多个氨基酸的干扰凝结形成的。peptide的主要结构可以表示为氨基酸序列或化学链图,由多个氨基酸组成。之前的研究表明,深度学习模型专门针对序列和图形式的peptide模型具有相似的性能。尽管这些模型学习的是同一种modal peptide的表示,但它们对其预测的解释不同。我们认为这些模型可以视为两个专家,从不同的角度对peptide进行推理。为了融合这两个专家的知识,我们提出了一种peptide共模型方法,即RepCon,该方法使用了对比学习框架来增强对应的抽象表示之间的共识性。它将序列Encoder和图形Encoder对同一个peptide样本的表示视为正例对,并学习增强正例对之间的共识性,同时减弱负例对之间的共识性。我们对RepCon和其他共模型方法进行了实验,并对开源的推理数据集进行了测试,包括积累性、保留时间、抗微生物蛋白预测和家族分类。我们的结果表明,共模型方法比独立模型更高效,而RepCon比其他共模型方法更有优势。此外,对RepCon的解释也证明了该方法的正确性。
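A schematic sketch of the contrastive co-modeling idea: representations of the same peptide from a sequential encoder and a graphical encoder form a positive pair, other in-batch pairs are negatives (InfoNCE-style). The encoders, temperature value, and symmetric loss are placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn.functional as F

def co_modeling_contrastive_loss(z_seq, z_graph, temperature=0.1):
    z_seq = F.normalize(z_seq, dim=1)        # (B, d) sequential-route embeddings
    z_graph = F.normalize(z_graph, dim=1)    # (B, d) graphical-route embeddings
    logits = z_seq @ z_graph.t() / temperature
    targets = torch.arange(z_seq.size(0))    # i-th row should match i-th column
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Usage with random stand-ins for the two encoders' outputs:
loss = co_modeling_contrastive_loss(torch.randn(8, 64), torch.randn(8, 64))
```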

A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces

  • paper_url: http://arxiv.org/abs/2310.02951
  • repo_url: None
  • paper_authors: Bekzhan Kerimkulov, James-Michael Leahy, David Siska, Lukasz Szpruch, Yufei Zhang
  • for: 该论文研究了无穷 horizon entropy-regularized Markov decision processes 的 Fisher-Rao 政策梯度流的全球收敛性。
  • methods: 该论文使用了一种 continuous-time 的 policy mirror descent 方法,并证明了其在全球well-posed 和 exponential convergence 的性质。
  • results: 该论文证明了该流在优化策略时的稳定性和对 gradient evaluation 的稳定性,并且提供了一种基于 log-linear 政策参数化的性能评估方法。
    Abstract We study the global convergence of a Fisher-Rao policy gradient flow for infinite-horizon entropy-regularised Markov decision processes with Polish state and action space. The flow is a continuous-time analogue of a policy mirror descent method. We establish the global well-posedness of the gradient flow and demonstrate its exponential convergence to the optimal policy. Moreover, we prove the flow is stable with respect to gradient evaluation, offering insights into the performance of a natural policy gradient flow with log-linear policy parameterisation. To overcome challenges stemming from the lack of the convexity of the objective function and the discontinuity arising from the entropy regulariser, we leverage the performance difference lemma and the duality relationship between the gradient and mirror descent flows.
    摘要 我们研究 Fisher-Rao 策略梯度流在具有波兰（Polish）状态与动作空间的无限时域熵正则化马尔可夫决策过程中的全局收敛性。该流是策略镜像下降（policy mirror descent）方法的连续时间类比。我们建立了该梯度流的全局适定性，并证明其以指数速度收敛到最优策略。此外，我们证明该流对梯度评估误差是稳定的，从而为采用对数线性策略参数化的自然策略梯度流的性能提供了洞见。为克服目标函数非凸以及熵正则项带来的不连续性等困难，我们利用了性能差异引理（performance difference lemma）以及梯度流与镜像下降流之间的对偶关系。

HappyFeat – An interactive and efficient BCI framework for clinical applications

  • paper_url: http://arxiv.org/abs/2310.02948
  • repo_url: None
  • paper_authors: Arthur Desbois, Tristan Venot, Fabrizio De Vico Fallani, Marie-Constance Corsi
  • for: 本研究是为了提高Brain-Computer Interface(BCI)系统的性能和易用性,特别是在临床环境中。
  • methods: 本研究使用 HappyFeat 软件，它将 BCI 实验与分析参数的设置和调整集中到一个界面中并加以自动化，从而提升 BCI 性能。同时，HappyFeat 还提供基于功能连接（Functional Connectivity）的特征，可与传统的功率谱密度（PSD）特征进行比较或结合。
  • results: 研究表明,HappyFeat可以帮助在时间紧张的环境中快速选择最佳特征,从而提高BCI性能。此外,HappyFeat还可以作为一种有效的工具来比较不同的信号特征,以便在训练分类算法时进行选择。
    Abstract Brain-Computer Interface (BCI) systems allow users to perform actions by translating their brain activity into commands. Such systems usually need a training phase, consisting in training a classification algorithm to discriminate between mental states using specific features from the recorded signals. This phase of feature selection and training is crucial for BCI performance and presents specific constraints to be met in a clinical context, such as post-stroke rehabilitation. In this paper, we present HappyFeat, a software making Motor Imagery (MI) based BCI experiments easier, by gathering all necessary manipulations and analysis in a single convenient GUI and via automation of experiment or analysis parameters. The resulting workflow allows for effortlessly selecting the best features, helping to achieve good BCI performance in time-constrained environments. Alternative features based on Functional Connectivity can be used and compared or combined with Power Spectral Density, allowing a network-oriented approach. We then give details of HappyFeat's main mechanisms, and a review of its performances in typical use cases. We also show that it can be used as an efficient tool for comparing different metrics extracted from the signals, to train the classification algorithm. To this end, we show a comparison between the commonly-used Power Spectral Density and network metrics based on Functional Connectivity. HappyFeat is available as an open-source project which can be freely downloaded on GitHub.
    摘要 Brain-Computer Interface (BCI) 系统允许用户透过识别大脑活动的信号翻译为指令。通常需要一个训练阶段,包括对特定特征的录取信号进行分类学习。这个阶段在临床上有特定的限制,如rehabilitation after stroke。 在这篇文章中,我们介绍了 HappyFeat,一个软件,使得基于想像运动 (MI) 的 BCI实验更加容易,通过集成所有必要的操作和分析到一个易用的Graphical User Interface (GUI) 中,并通过自动化实验或分析参数的自动化,以提高BCI性能。这个工作流程可以帮助在时间紧迫的环境中取得好的BCI性能。此外,HappyFeat 还可以使用不同的功能连接度来检查和比较不同的特征,以推动网络对应的方法。我们随后详细介绍了 HappyFeat 的主要机制,以及它在一般使用情况下的表现。我们还展示了它可以作为一个有效的工具,用于比较不同的信号特征,并训练分类器。为此,我们比较了通常使用的功能spectral density和基于功能连接度的网络特征。HappyFeat 为一个开源项目,可以免费下载在 GitHub 上。
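As a small illustration of the kind of feature pipeline HappyFeat lets users compare (not HappyFeat's own code), the sketch below computes per-channel Welch band-power features from motor-imagery epochs and trains a linear classifier; sampling rate, frequency band, and classifier choice are illustrative.

```python
import numpy as np
from scipy.signal import welch
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def psd_band_power(epochs, fs=250, band=(8, 30)):
    """epochs: (n_trials, n_channels, n_samples) -> (n_trials, n_channels) band power."""
    freqs, pxx = welch(epochs, fs=fs, nperseg=fs, axis=-1)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return pxx[..., mask].mean(axis=-1)

rng = np.random.default_rng(0)
X = psd_band_power(rng.standard_normal((40, 8, 1000)))   # fake 40 trials, 8 channels
y = rng.integers(0, 2, size=40)                           # fake motor-imagery labels
clf = LinearDiscriminantAnalysis().fit(X, y)
```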

Online Constraint Tightening in Stochastic Model Predictive Control: A Regression Approach

  • paper_url: http://arxiv.org/abs/2310.02942
  • repo_url: None
  • paper_authors: Alexandre Capone, Tim Brüdigam, Sandra Hirche
  • for: 解决可难控的测度随机控制问题,因为无法找到几个特殊情况下的分析解。
  • methods: 使用改进了的约束紧张参数的随机控制方法,通过将机会约束 rewrite 为硬式约束。
  • results: 提出了一种在控制过程中线上学习约束紧张参数的方法,通过使用高度表达能力的GP模型来近似最小约束紧张参数,并且可以 garantuee 约束紧张参数满足机会约束。在数值实验中,该方法比三种现状顶尖方法优于其他三种现状顶尖方法。
    Abstract Solving chance-constrained stochastic optimal control problems is a significant challenge in control. This is because no analytical solutions exist except for a handful of special cases. A common and computationally efficient approach for tackling chance-constrained stochastic optimal control problems consists of reformulating the chance constraints as hard constraints with a constraint-tightening parameter. However, in such approaches, the choice of constraint-tightening parameter remains challenging, and guarantees can mostly be obtained assuming that the process noise distribution is known a priori. Moreover, the chance constraints are often not tightly satisfied, leading to unnecessarily high costs. This work proposes a data-driven approach for learning the constraint-tightening parameters online during control. To this end, we reformulate the choice of constraint-tightening parameter for the closed-loop as a binary regression problem. We then leverage a highly expressive Gaussian process (GP) model for binary regression to approximate the smallest constraint-tightening parameters that satisfy the chance constraints. By tuning the algorithm parameters appropriately, we show that the resulting constraint-tightening parameters satisfy the chance constraints up to an arbitrarily small margin with high probability. Our approach yields constraint-tightening parameters that tightly satisfy the chance constraints in numerical experiments, resulting in a lower average cost than three other state-of-the-art approaches.
    摘要 求解机会约束随机最优控制问题是控制领域的一大挑战，因为除少数特殊情形外不存在解析解。一种常见且计算高效的做法是借助约束收紧参数，将机会约束改写为硬约束；但此类方法中收紧参数的选取仍然困难，且通常只有在过程噪声分布先验已知时才能给出保证。此外，机会约束往往无法被紧致地满足，导致不必要的高成本。本文提出一种数据驱动方法，在控制过程中在线学习约束收紧参数。为此，我们将闭环下收紧参数的选取改写为一个二元回归问题，并利用表达能力很强的高斯过程（GP）二元回归模型来逼近满足机会约束的最小收紧参数。通过适当调节算法参数，我们证明所得收紧参数能以高概率在任意小的裕度内满足机会约束。数值实验表明，我们的方法给出的收紧参数能紧致地满足机会约束，其平均成本低于其他三种最新方法。
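A rough sketch of the regression step: treat observed (tightening parameter, constraint satisfied?) pairs as a binary regression problem with a GP classifier, then pick the smallest candidate parameter whose predicted satisfaction probability clears the chance-constraint level. The kernel, synthetic data, and candidate grid are illustrative, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0, size=(200, 1))              # previously tried tightening parameters
satisfied = (theta[:, 0] + 0.1 * rng.standard_normal(200) > 0.8).astype(int)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(0.5)).fit(theta, satisfied)

candidates = np.linspace(0.0, 2.0, 101).reshape(-1, 1)
p_sat = gpc.predict_proba(candidates)[:, 1]
delta = 0.05                                              # chance-constraint violation level
feasible = candidates[p_sat >= 1 - delta]
theta_star = feasible.min() if feasible.size else candidates.max()
```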

Hoeffding’s Inequality for Markov Chains under Generalized Concentrability Condition

  • paper_url: http://arxiv.org/abs/2310.02941
  • repo_url: None
  • paper_authors: Hao Chen, Abhishek Gupta, Yin Sun, Ness Shroff
  • for: 这个论文研究了基于通用集中性Conditions(IPM)的韦夫丁不等式,该条件可以扩展和修正现有的马克夫chain韦夫丁不等式假设。
  • methods: 论文使用了基于IPM的通用集中性Conditions来推广韦夫丁不等式的应用范围,并在机器学习领域中应用到了一些非 asymptotic 分析问题。
  • results: 论文通过应用通用集中性Conditions来提供了一些非 asymptotic 的韦夫丁不等式,包括empirical risk minimization with Markovian samples、Ployak-Ruppert averaging of SGD和rested Markovian bandits with general state space。
    Abstract This paper studies Hoeffding's inequality for Markov chains under the generalized concentrability condition defined via integral probability metric (IPM). The generalized concentrability condition establishes a framework that interpolates and extends the existing hypotheses of Markov chain Hoeffding-type inequalities. The flexibility of our framework allows Hoeffding's inequality to be applied beyond the ergodic Markov chains in the traditional sense. We demonstrate the utility by applying our framework to several non-asymptotic analyses arising from the field of machine learning, including (i) a generalization bound for empirical risk minimization with Markovian samples, (ii) a finite sample guarantee for Ployak-Ruppert averaging of SGD, and (iii) a new regret bound for rested Markovian bandits with general state space.
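For reference, the classical i.i.d. Hoeffding inequality that such results generalise, stated for bounded independent variables; the paper's contribution is to relax independence to Markov chains satisfying the generalized concentrability condition defined via an integral probability metric.

```latex
% Classical Hoeffding inequality for independent X_i with a_i <= X_i <= b_i:
\Pr\!\Big(\Big|\sum_{i=1}^{n} X_i - \mathbb{E}\Big[\textstyle\sum_{i=1}^{n} X_i\Big]\Big| \ge t\Big)
\;\le\; 2\exp\!\Big(-\frac{2t^2}{\sum_{i=1}^{n}(b_i-a_i)^2}\Big).
```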

Optimal Transport with Adaptive Regularisation

  • paper_url: http://arxiv.org/abs/2310.02925
  • repo_url: None
  • paper_authors: Hugues Van Assel, Titouan Vayer, Remi Flamary, Nicolas Courty
  • for: The paper addresses the imbalance in how regularised optimal transport (OT) spreads mass across points, which can be detrimental in applications where a minimum amount of smoothing is required per point.
  • methods: The paper proposes OT with Adaptive RegularIsation (OTARI), a new formulation of OT that imposes constraints on the mass going in and/or out of each point.
  • results: The paper demonstrates the benefits of OTARI for domain adaptation.
    Abstract Regularising the primal formulation of optimal transport (OT) with a strictly convex term leads to enhanced numerical complexity and a denser transport plan. Many formulations impose a global constraint on the transport plan, for instance by relying on entropic regularisation. As it is more expensive to diffuse mass for outlier points compared to central ones, this typically results in a significant imbalance in the way mass is spread across the points. This can be detrimental for some applications where a minimum of smoothing is required per point. To remedy this, we introduce OT with Adaptive RegularIsation (OTARI), a new formulation of OT that imposes constraints on the mass going in or/and out of each point. We then showcase the benefits of this approach for domain adaptation.
    摘要 在最优传输（OT）的原始（primal）形式中加入严格凸的正则项，可以改善数值求解性质，但也会得到更稠密的传输方案。许多表述对传输方案施加全局约束，例如采用熵正则化。由于向离群点扩散质量的代价高于向中心点扩散，这通常导致质量在各点之间的分布出现明显失衡，而这对一些要求每个点都具有最低平滑程度的应用是不利的。为缓解这一问题，我们提出自适应正则化最优传输（OTARI），这一新的 OT 表述对每个点的流入和/或流出质量施加约束。随后我们展示了该方法在领域自适应（domain adaptation）中的优势。
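For context, a compact NumPy sketch of the globally entropic-regularised OT baseline (Sinkhorn iterations) that OTARI is contrasted with; OTARI's per-point adaptive constraints are not implemented here, and the cost matrix and regularisation strength are illustrative.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.05, iters=500):
    """a, b: source/target histograms; C: cost matrix; returns the transport plan."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

n = 5
a = np.full(n, 1 / n)
b = np.full(n, 1 / n)
C = (np.arange(n)[:, None] - np.arange(n)[None, :]) ** 2 / n**2
P = sinkhorn(a, b, C)
print(P.sum(axis=1))   # row marginals, approximately equal to a
```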

Enhancing Ayurvedic Diagnosis using Multinomial Naive Bayes and K-modes Clustering: An Investigation into Prakriti Types and Dosha Overlapping

  • paper_url: http://arxiv.org/abs/2310.02920
  • repo_url: None
  • paper_authors: Pranav Bidve, Shalini Mishra, Annapurna J
  • for: 这项研究的目的是提高医学预测和诊断的准确率,通过使用机器学习算法和医学知识来分类人体内的三种基本类型(VATT-Dosha、PITT-Dosha和KAPH-Dosha)。
  • methods: 该研究使用 Multinomial Naive Bayes（MNB）分类器与 K-modes 聚类算法，并采用卡方检验（chi-square test）对分类数据进行特征选择。
  • results: 实验结果显示，MNB 分类器在七个类别上取得 0.90 的准确率、0.81 的精确率、0.91 的 F1 分数和 0.90 的召回率。
    Abstract The identification of Prakriti types for the human body is a long-lost medical practice in finding the harmony between the nature of human beings and their behaviour. There are 3 fundamental Prakriti types of individuals. A person can belong to any Dosha. In the existing models, researchers have made use of SVM, KNN, PCA, Decision Tree, and various other algorithms. The output of these algorithms was quite decent, but it can be enhanced with the help of Multinomial Naive Bayes and K-modes clustering. Most of the researchers have confined themselves to 3 basic classes. This might not be accurate in the real-world scenario, where overlapping might occur. Considering these, we have classified the Doshas into 7 categories, which includes overlapping of Doshas. These are namely, VATT-Dosha, PITT-Dosha, KAPH-Dosha, VATT-PITT-Dosha, PITT-KAPH-Dosha, KAPH-VATT-Dosha, and VATT-PITT-KAPH-Dosha. The data used contains a balanced set of all individual entries on which preprocessing steps of machine learning have been performed. Chi-Square test for handling categorical data is being used for feature selection. For model fitting, the method used in this approach is K-modes clustering. The empirical results demonstrate a better result while using the MNB classifier. All key findings of this work have achieved 0.90 accuracy, 0.81 precision, 0.91 F-score, and 0.90 recall. The discussion suggests a provident analysis of the seven clusters and predicts their occurrence. The results have been consolidated to improve the Ayurvedic advancements with machine learning.
    摘要 辨识人体的 Prakriti 类型是一项古老的医学实践，旨在寻找人的天性与其行为之间的协调。Prakriti 基本类型有三种，一个人可以属于任一 Dosha。现有模型中，研究者使用过 SVM、KNN、PCA、决策树等多种算法，其结果尚可，但借助 Multinomial Naive Bayes 与 K-modes 聚类仍有提升空间。多数研究者将样本限定在三个基本类别，这在现实场景中可能不够准确，因为类别之间可能存在重叠。考虑到这一点，我们将 Dosha 分为七类，包含重叠类型：VATT-Dosha、PITT-Dosha、KAPH-Dosha、VATT-PITT-Dosha、PITT-KAPH-Dosha、KAPH-VATT-Dosha 和 VATT-PITT-KAPH-Dosha。所用数据包含均衡的个体样本，并经过机器学习的预处理步骤；特征选择采用适用于分类数据的卡方检验，模型拟合采用 K-modes 聚类。实证结果表明使用 MNB 分类器效果更好，本工作的关键指标达到 0.90 的准确率、0.81 的精确率、0.91 的 F1 分数和 0.90 的召回率。讨论部分对七个聚类进行了前瞻性分析并预测其出现情况。这些结果有助于借助机器学习推动阿育吠陀（Ayurveda）医学的发展。
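A sketch of the classification stage described above, using synthetic categorical stand-in data (the real questionnaire features and labels are not shown here); the K-modes clustering step is omitted and only chi-square feature selection followed by Multinomial Naive Bayes over the seven Dosha categories is illustrated.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(700, 30))     # 30 categorical questionnaire-style features
y = rng.integers(0, 7, size=700)           # 7 (possibly overlapping) Dosha classes

model = make_pipeline(SelectKBest(chi2, k=15), MultinomialNB())
model.fit(X, y)
print(model.score(X, y))
```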

Attention-based Multi-task Learning for Base Editor Outcome Prediction

  • paper_url: http://arxiv.org/abs/2310.02919
  • repo_url: None
  • paper_authors: Amina Mollaysa, Ahmed Allam, Michael Krauthammer
  • for: 该论文旨在提高基因编辑技术的精度和效率,以便更好地治疗人类遗传疾病。
  • methods: 该论文提出了一种基于注意力的两阶段机器学习模型,可以预测给定目标基因序列的编辑结果的可能性。同时,该模型还可以同时学习多种基因编辑器(即变种)。
  • results: 该模型的预测结果与实验结果在多个数据集和基因编辑器变种上均显示了强相关性,这表明该模型可以有效地加速和提高基因编辑设计的过程。
    Abstract Human genetic diseases often arise from point mutations, emphasizing the critical need for precise genome editing techniques. Among these, base editing stands out as it allows targeted alterations at the single nucleotide level. However, its clinical application is hindered by low editing efficiency and unintended mutations, necessitating extensive trial-and-error experimentation in the laboratory. To speed up this process, we present an attention-based two-stage machine learning model that learns to predict the likelihood of all possible editing outcomes for a given genomic target sequence. We further propose a multi-task learning schema to jointly learn multiple base editors (i.e. variants) at once. Our model's predictions consistently demonstrated a strong correlation with the actual experimental results on multiple datasets and base editor variants. These results provide further validation for the models' capacity to enhance and accelerate the process of refining base editing designs.
    摘要 人类遗传疾病常由点变化引起,强调了精准基因编辑技术的急需。其中,基因编辑技术最引人注目,因为它可以 Targeted 修改Single nucleotide level。然而,其临床应用受到低编辑效率和无意义变化的限制,导致室内实验室中需要进行广泛的试验和尝试。为了加速这个过程,我们提出了一种基于注意力的两阶段机器学习模型,可以预测给定 genomic 目标序列中的所有可能的编辑结果的可能性。我们还提议一种多任务学习 schema,可以同时学习多种基因编辑器(即变体)。我们的模型预测结果与实验结果在多个数据集和基因编辑器变体上具有强相关性。这些结果提供了进一步的验证,证明我们的模型有助于提高和加速基因编辑设计的过程。

ELUQuant: Event-Level Uncertainty Quantification in Deep Inelastic Scattering

  • paper_url: http://arxiv.org/abs/2310.02913
  • repo_url: None
  • paper_authors: Cristiano Fanelli, James Giroux
  • for: 用于深入的 uncertainty 量化和事件筛选
  • methods: 使用物理信息贝叶斯神经网络（BNN），并以乘法归一化流（multiplicative normalizing flows, MNF）近似后验分布，在事件级别进行不确定性量化
  • results: 能够准确地提取 kinematic 变量 x、Q^2 和 y,并且能够提供精细的物理性 uncertainty 描述,这对于决策和数据质量监测等任务都非常有用。
    Abstract We introduce a physics-informed Bayesian Neural Network (BNN) with flow approximated posteriors using multiplicative normalizing flows (MNF) for detailed uncertainty quantification (UQ) at the physics event-level. Our method is capable of identifying both heteroskedastic aleatoric and epistemic uncertainties, providing granular physical insights. Applied to Deep Inelastic Scattering (DIS) events, our model effectively extracts the kinematic variables $x$, $Q^2$, and $y$, matching the performance of recent deep learning regression techniques but with the critical enhancement of event-level UQ. This detailed description of the underlying uncertainty proves invaluable for decision-making, especially in tasks like event filtering. It also allows for the reduction of true inaccuracies without directly accessing the ground truth. A thorough DIS simulation using the H1 detector at HERA indicates possible applications for the future EIC. Additionally, this paves the way for related tasks such as data quality monitoring and anomaly detection. Remarkably, our approach effectively processes large samples at high rates.
    摘要 我们介绍了一种具有物理学信息的泛化神经网络(BNN),使用多项式正常化流(MNF)来实现细致的不确定性评估(UQ)在物理事件级别。我们的方法可以识别和分解不同类型的不确定性,包括异质灵活的不确定性和知识不确定性,并提供物理学上的细致信息。在深入刺激(DIS)事件中应用了我们的模型,能够有效提取kinematic变量$x$, $Q^2$,和$y$,与最新的深度学习回归技术相当,但是具有物理事件级别的不确定性评估的重要优势。这种细致的不确定性描述对决策非常重要,特别是在事件筛选任务中。此外,它还允许降低实际错误,而不需要直接访问真实的真实值。在使用HERA的H1探测器进行深入刺激模拟中,我们发现了可能的应用于未来EIC。此外,这种方法还可以应用于数据质量监测和异常检测等相关任务。很 satisfactory的是,我们的方法可以高效处理大量样本,并且可以在高速下进行处理。

Spline-based neural network interatomic potentials: blending classical and machine learning models

  • paper_url: http://arxiv.org/abs/2310.02904
  • repo_url: None
  • paper_authors: Joshua A. Vita, Dallas R. Trinkle
  • for: 这种研究旨在证明Machine Learning interatomic potentials (IPs) 的复杂性是否必须高于first-principles数据的随机噪声水平以确定高质量IPs。
  • methods: 该研究引入了一种新的Machine Learning interatomic potentials (MLIP)框架,它将spline-based MEAM potentials (s-MEAM) 的简单性与神经网络(NN)架构相结合。这种框架被称为spline-based neural network potential (s-NNP),可以用来描述复杂的数据集,并且可以在计算上高效地进行。
  • results: 该研究表明,使用spline filters来编码原子环境可以得到一个易于解释的嵌入层,可以与修改NN结构来涵盖预期的物理行为,从而提高总体的解释性。此外,研究还发现,可以在多个化学系统之间共享spline filters,以便提供一个便利的参照点,从而实现跨系统分析。
    Abstract While machine learning (ML) interatomic potentials (IPs) are able to achieve accuracies nearing the level of noise inherent in the first-principles data to which they are trained, it remains to be shown if their increased complexities are strictly necessary for constructing high-quality IPs. In this work, we introduce a new MLIP framework which blends the simplicity of spline-based MEAM (s-MEAM) potentials with the flexibility of a neural network (NN) architecture. The proposed framework, which we call the spline-based neural network potential (s-NNP), is a simplified version of the traditional NNP that can be used to describe complex datasets in a computationally efficient manner. We demonstrate how this framework can be used to probe the boundary between classical and ML IPs, highlighting the benefits of key architectural changes. Furthermore, we show that using spline filters for encoding atomic environments results in a readily interpreted embedding layer which can be coupled with modifications to the NN to incorporate expected physical behaviors and improve overall interpretability. Finally, we test the flexibility of the spline filters, observing that they can be shared across multiple chemical systems in order to provide a convenient reference point from which to begin performing cross-system analyses.
    摘要 机器学习(ML)间位能预测(IP)可以达到近似于初始物理数据中的精度水平,但是需要证明其增加复杂性是否必要 для建立高质量IP。在这种工作中,我们介绍了一个新的MLIP框架,它将spline-based MEAM potentials(s-MEAM)的简单性与神经网络(NN)架构融合在一起。我们称之为spline-based neural network potential(s-NNP)。这个框架是传统NNP的简化版,可以用来描述复杂的数据集,并且具有计算效率。我们示出了如何使用spline filters来编码原子环境,并将其与NN结合以捕捉预期的物理行为,从而提高总体解释性。此外,我们观察了spline filters的共享性,可以在多个化学系统之间共享,以便在不同系统之间进行跨系统分析。

FroSSL: Frobenius Norm Minimization for Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2310.02903
  • repo_url: None
  • paper_authors: Oscar Skean, Aayush Dhakal, Nathan Jacobs, Luis Gonzalo Sanchez Giraldo
  • for: 这篇论文提出一个新的自监督学习（SSL）目标函数 FroSSL，并证明使用 FroSSL 可以更快地训练出良好的表示。
  • methods: FroSSL 在嵌入归一化的意义下同时具有样本对比与维度对比的性质：通过最小化协方差矩阵的 Frobenius 范数来避免表示坍缩，并通过最小化均方误差来保证对数据增广的不变性。
  • results: FroSSL 比多种其他 SSL 方法收敛更快，并在 linear probe 评估中学到具有竞争力的表示。
    Abstract Self-supervised learning (SSL) is an increasingly popular paradigm for representation learning. Recent methods can be classified as sample-contrastive, dimension-contrastive, or asymmetric network-based, with each family having its own approach to avoiding informational collapse. While dimension-contrastive methods converge to similar solutions as sample-contrastive methods, it can be empirically shown that some methods require more epochs of training to converge. Motivated by closing this divide, we present the objective function FroSSL which is both sample- and dimension-contrastive up to embedding normalization. FroSSL works by minimizing covariance Frobenius norms for avoiding collapse and minimizing mean-squared error for augmentation invariance. We show that FroSSL converges more quickly than a variety of other SSL methods and provide theoretical and empirical support that this faster convergence is due to how FroSSL affects the eigenvalues of the embedding covariance matrices. We also show that FroSSL learns competitive representations on linear probe evaluation when used to train a ResNet18 on the CIFAR-10, CIFAR-100, STL-10, and ImageNet datasets.
    摘要 自适应学习(SSL)是现代表示学习的一种受欢迎的方法。当前的方法可以分为样本对照、维度对照和异形网络基于的三大类别,每个家族都有自己的方法来避免信息归一化。虽然维度对照方法和样本对照方法可以达到相同的解决方案,但是可以经验性地表明一些方法需要更多的训练集数据来融合。为了bridging这个差距,我们提出了一个名为 FroSSL 的目标函数,它同时是样本对照和维度对照的,并且可以保证均值平方差和协方差 Frobenius 范数的最小化。我们表明了 FroSSL 比许多其他 SSL 方法更快地 converges,并提供了理论和实验支持,这更快的 converges 是由 FroSSL 对嵌入协方差矩阵的 eigenvalues 的影响所致。此外,我们还证明了 FroSSL 在 Linear Probe 评估中学习出了竞争力强的表示。
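A schematic PyTorch reading of the stated objective: mean-squared error between two augmented views for invariance, plus the Frobenius norm of each view's embedding covariance (after normalisation) to discourage collapse. The weighting, centring, and normalisation details are guesses, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def frossl_style_loss(z_a, z_b, lam=1.0):
    """z_a, z_b: (B, d) embeddings of two augmented views of the same batch."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    invariance = F.mse_loss(z_a, z_b)

    def frob(z):
        z = z - z.mean(dim=0)
        cov = (z.t() @ z) / (z.size(0) - 1)
        return torch.linalg.norm(cov, ord="fro")

    return invariance + lam * (frob(z_a) + frob(z_b))

loss = frossl_style_loss(torch.randn(256, 128), torch.randn(256, 128))
```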

Recovery of Training Data from Overparameterized Autoencoders: An Inverse Problem Perspective

  • paper_url: http://arxiv.org/abs/2310.02897
  • repo_url: None
  • paper_authors: Koren Abitbul, Yehuda Dar
  • for: recovery of training data from overparameterized autoencoder models
  • methods: use trained autoencoder to implicitly define a regularizer for the particular training dataset, and iteratively apply the trained autoencoder and simple computations to estimate and address the unknown degradation operator
  • results: significantly outperforms previous methods for training data recovery from autoencoders, and improves recovery performance in challenging settings that were previously considered highly challenging and impractical
    Abstract We study the recovery of training data from overparameterized autoencoder models. Given a degraded training sample, we define the recovery of the original sample as an inverse problem and formulate it as an optimization task. In our inverse problem, we use the trained autoencoder to implicitly define a regularizer for the particular training dataset that we aim to retrieve from. We develop the intricate optimization task into a practical method that iteratively applies the trained autoencoder and relatively simple computations that estimate and address the unknown degradation operator. We evaluate our method for blind inpainting where the goal is to recover training images from degradation of many missing pixels in an unknown pattern. We examine various deep autoencoder architectures, such as fully connected and U-Net (with various nonlinearities and at diverse train loss values), and show that our method significantly outperforms previous methods for training data recovery from autoencoders. Importantly, our method greatly improves the recovery performance also in settings that were previously considered highly challenging, and even impractical, for such retrieval.
    摘要 我们研究从过参数化 autoencoder 模型中恢复训练数据。给定一个受损训练样本,我们定义恢复原始样本为一个逆问题,并将其转换为一个优化任务。在我们的逆问题中,我们使用训练 autoencoder 来隐式定义特定训练数据集的规 regularizer。我们将这个具有复杂的优化任务转换为一个实用的方法,该方法在每次迭代中运用训练 autoencoder 和一些简单的计算来估计和处理未知的损坏算子。我们将我们的方法应用于隐形填充问题,其目标是从 autoencoder 中恢复训练图像,并且有许多遗传的 pixels 在未知的模式中遗传。我们考虑了不同的深度 autoencoder 架构,例如完全连接和 U-Net (with 多标的非线性和在多个训练损失值),并证明我们的方法在训练数据恢复方面具有重要的进步。特别是,我们的方法在以前考虑为高度困难或不可能的设定中也具有很好的恢复性。
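A toy sketch of the iterative idea: alternately apply the trained autoencoder (acting as an implicit regulariser for its training data) and re-impose the observed pixels. For simplicity the degradation mask is assumed known here, whereas the paper additionally estimates the unknown degradation operator.

```python
import torch

@torch.no_grad()
def recover(autoencoder, degraded, mask, iters=100):
    """degraded: image with missing pixels; mask: 1 where a pixel was observed."""
    x = degraded.clone()
    for _ in range(iters):
        x = autoencoder(x)                       # pull the estimate toward the training manifold
        x = mask * degraded + (1 - mask) * x     # keep the pixels we actually observed
    return x
```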

CoLiDE: Concomitant Linear DAG Estimation

  • paper_url: http://arxiv.org/abs/2310.02895
  • repo_url: https://github.com/ignavier/golem
  • paper_authors: Seyed Saman Saboksayr, Gonzalo Mateos, Mariano Tepper
  • for: 这篇论文的目的是从服从线性结构方程模型（SEM）的观测数据中学习有向无环图（DAG）结构。
  • methods: 论文提出一种新的凸评分函数，用于稀疏感知的线性 DAG 学习；该函数同时（concomitantly）估计噪声尺度，从而将稀疏性参数与外生噪声水平有效解耦，并结合光滑的非凸无环性惩罚项，在连续约束优化框架下高效搜索 DAG 空间。
  • results: 所提出的 CoLiDE 方法在 DAG 规模较大且噪声水平异质的情形下优于最先进方法，且不增加额外复杂度；同时在多个领域指标上表现出更小的标准差，体现了更好的稳定性。
    Abstract We deal with the combinatorial problem of learning directed acyclic graph (DAG) structure from observational data adhering to a linear structural equation model (SEM). Leveraging advances in differentiable, nonconvex characterizations of acyclicity, recent efforts have advocated a continuous constrained optimization paradigm to efficiently explore the space of DAGs. Most existing methods employ lasso-type score functions to guide this search, which (i) require expensive penalty parameter retuning when the $\textit{unknown}$ SEM noise variances change across problem instances; and (ii) implicitly rely on limiting homoscedasticity assumptions. In this work, we propose a new convex score function for sparsity-aware learning of linear DAGs, which incorporates concomitant estimation of scale and thus effectively decouples the sparsity parameter from the exogenous noise levels. Regularization via a smooth, nonconvex acyclicity penalty term yields CoLiDE ($\textbf{Co}$ncomitant $\textbf{Li}$near $\textbf{D}$AG $\textbf{E}$stimation), a regression-based criterion amenable to efficient gradient computation and closed-form estimation of noise variances in heteroscedastic scenarios. Our algorithm outperforms state-of-the-art methods without incurring added complexity, especially when the DAGs are larger and the noise level profile is heterogeneous. We also find CoLiDE exhibits enhanced stability manifested via reduced standard deviations in several domain-specific metrics, underscoring the robustness of our novel linear DAG estimator.
    摘要 我们面临了对于观测数据的组合problem,即从线性结构方程模型(SEM)中学习统计学Graph(DAG)的结构。发掘最新的差分可条件优化方法,以有效地探索DAG的空间。大多数现有方法使用lasso类数值函数来导引搜寻,这些方法(一)需要耗费贵重的罚 Parameters 重新设定当 unknown SEM 杂质值 changed across problem instances; 和(二)隐式地假设 homoscedasticity 的限制。在这个工作中,我们提出了一个新的对称数值函数,用于精简DAG的学习,这个函数包含了共同估计标准差,因此实际上将精简 Parameters 与外生杂质水平 decoupled。通过非凸的统一矩阵 penalty 函数,我们得到了 CoLiDE(共同统一 linear DAG 估计),这是一个可以实现高效的梯度计算和关闭式估计杂质水平的条件估计。我们的算法在较大的DAG和不同杂质水平下表现更好,不需要添加额外的复杂性。我们也发现 CoLiDE 具有更好的稳定性,通过减少不同领域的标准差,强调了我们的新的线性DAG估计器的稳定性。

Something for (almost) nothing: Improving deep ensemble calibration using unlabeled data

  • paper_url: http://arxiv.org/abs/2310.02885
  • repo_url: https://github.com/konstantinos-p/something_for_almost_nothing
  • paper_authors: Konstantinos Pitas, Julyan Arbel
  • for: 在训练数据较少、但有无标签数据可用的情形下，改进深度集成（deep ensembles）的校准。
  • methods: 对每个无标签数据点，为每个集成成员随机选取一个不同的标签并加以拟合。
  • results: 实验表明，在小到中等规模的训练集上，我们的集成更加多样化，并提供比标准集成更好的校准，有时改进幅度相当显著。
    Abstract We present a method to improve the calibration of deep ensembles in the small training data regime in the presence of unlabeled data. Our approach is extremely simple to implement: given an unlabeled set, for each unlabeled data point, we simply fit a different randomly selected label with each ensemble member. We provide a theoretical analysis based on a PAC-Bayes bound which guarantees that if we fit such a labeling on unlabeled data, and the true labels on the training data, we obtain low negative log-likelihood and high ensemble diversity on testing samples. Empirically, through detailed experiments, we find that for low to moderately-sized training sets, our ensembles are more diverse and provide better calibration than standard ensembles, sometimes significantly.
    摘要 我们提出了一种方法来提高深度 ensemble 的准确性在小训练数据 régime 中,采用了不 labels 数据。我们的方法非常简单实现:对每个不 labels 数据点,我们 simply 随机选择不同的标签并与每个 ensemble 成员进行适应。我们提供了基于 PAC-Bayes bound 的理论分析,证明如果我们在不 labels 数据上适应这种标签,以及真实标签在训练数据上,我们就可以在测试样本上获得低负逻辑概率和高 ensemble 多样性。实际上,通过详细的实验,我们发现,对于小到中等训练集,我们的集成比标准集成更加多样化,提供了更好的准确性,有时甚至是显著的。
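A direct sketch of the recipe as described: each ensemble member is trained on the labelled data plus the unlabelled pool, where every unlabelled point gets an independently drawn random label per member. `make_model` and `train` are placeholders for whatever architecture and training loop are in use.

```python
import numpy as np

def build_diverse_ensemble(X_train, y_train, X_unlabeled, n_classes, n_members,
                           make_model, train, seed=0):
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_members):
        y_rand = rng.integers(0, n_classes, size=len(X_unlabeled))  # fresh random labels per member
        X = np.concatenate([X_train, X_unlabeled])
        y = np.concatenate([y_train, y_rand])
        ensemble.append(train(make_model(), X, y))
    return ensemble
```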

Stationarity without mean reversion: Improper Gaussian process regression and improper kernels

  • paper_url: http://arxiv.org/abs/2310.02877
  • repo_url: None
  • paper_authors: Luca Ambrogioni
  • for: This paper aims to address the pathological behavior of mean-reverting Gaussian process regression by introducing improper kernels that are stationary but not mean reverting.
  • methods: The paper proposes the use of improper kernels, including the Smooth Walk kernel and a family of improper Matérn kernels, which can be defined only in this improper regime. The resulting posterior distributions can be computed analytically with a simple correction of the usual formulas.
  • results: The paper demonstrates that these improper kernels solve some known pathologies of mean-reverting GP regression while retaining most of the favorable properties of ordinary smooth stationary kernels, as shown through synthetic and real data analysis.
    Abstract Gaussian processes (GP) regression has gained substantial popularity in machine learning applications. The behavior of a GP regression depends on the choice of covariance function. Stationary covariance functions are favorite in machine learning applications. However, (non-periodic) stationary covariance functions are always mean reverting and can therefore exhibit pathological behavior when applied to data that does not relax to a fixed global mean value. In this paper, we show that it is possible to use improper GP prior with infinite variance to define processes that are stationary but not mean reverting. To this aim, we introduce a large class of improper kernels that can only be defined in this improper regime. Specifically, we introduce the Smooth Walk kernel, which produces infinitely smooth samples, and a family of improper Mat\'ern kernels, which can be defined to be $j$-times differentiable for any integer $j$. The resulting posterior distributions can be computed analytically and it involves a simple correction of the usual formulas. By analyzing both synthetic and real data, we demonstrate that these improper kernels solve some known pathologies of mean reverting GP regression while retaining most of the favourable properties of ordinary smooth stationary kernels.

Harmonic Control Lyapunov Barrier Functions for Constrained Optimal Control with Reach-Avoid Specifications

  • paper_url: http://arxiv.org/abs/2310.02869
  • repo_url: None
  • paper_authors: Amartya Mukherjee, Ruikun Zhou, Jun Liu
  • for: 这个论文旨在解决受限制的控制问题,如达到避免的问题。
  • methods: 这个论文提出了含有响应最大原理的响应CLBF函数,它们可以在实验开始时被初始化而不是基于示例轨迹进行训练。控制输入选择系统动力学与最大下降方向夹快内积最大化。
  • results: 实验结果显示,含有响应最大原理的响应CLBF函数可以快速进入安全区域,并且有高概率进入目标区域。
    Abstract This paper introduces harmonic control Lyapunov barrier functions (harmonic CLBF) that aid in constrained control problems such as reach-avoid problems. Harmonic CLBFs exploit the maximum principle that harmonic functions satisfy to encode the properties of control Lyapunov barrier functions (CLBFs). As a result, they can be initiated at the start of an experiment rather than trained based on sample trajectories. The control inputs are selected to maximize the inner product of the system dynamics with the steepest descent direction of the harmonic CLBF. Numerical results are presented with four different systems under different reach-avoid environments. Harmonic CLBFs show a significantly low risk of entering unsafe regions and a high probability of entering the goal region.

Estimation of Models with Limited Data by Leveraging Shared Structure

  • paper_url: http://arxiv.org/abs/2310.02864
  • repo_url: None
  • paper_authors: Maryann Rui, Thibaut Horel, Munther Dahleh
  • for: 这篇论文是为了处理具有多个系统、但每个系统只有少量数据的现代数据集而写的。
  • methods: 该论文提出了一种基于共同结构的方法,通过利用其他系统的数据来估计高维参数,从而解决单个系统的数据不充分的问题。该方法包括三步骤:首先估计系统参数之间的低维子空间,然后使用这些参数估计系统的参数,最后使用迭代估计法提高参数的精度。
  • results: 该论文提供了finite sample subspace estimation error guarantees,并通过实验 validate 该方法的正确性。
    Abstract Modern data sets, such as those in healthcare and e-commerce, are often derived from many individuals or systems but have insufficient data from each source alone to separately estimate individual, often high-dimensional, model parameters. If there is shared structure among systems however, it may be possible to leverage data from other systems to help estimate individual parameters, which could otherwise be non-identifiable. In this paper, we assume systems share a latent low-dimensional parameter space and propose a method for recovering $d$-dimensional parameters for $N$ different linear systems, even when there are only $T
    摘要 现代数据集,如医疗和电商,经常来自多个个体或系统,但每个源数据不够以alone来估计高维度模型参数。如果这些系统具有共同结构,那么可能可以通过其他系统的数据来帮助估计个体参数,这些参数可能否认。在这篇论文中,我们假设这些系统共享一个低维度的 latent 参数空间,并提出一种方法来回归 $d$-维度参数的 $N$ 个不同的线性系统,即使只有每个系统 $T

Conformal Predictions for Longitudinal Data

  • paper_url: http://arxiv.org/abs/2310.02863
  • repo_url: None
  • paper_authors: Devesh Batra, Salvatore Mercuri, Raad Khraishi
  • for: 这篇论文是针对长期资料的复准预测方法,专门适用于医学、金融和供应链管理等领域。
  • methods: 本论文提出了一种新的分布自由复准预测算法,named Longitudinal Predictive Conformal Inference (LPCI),它可以确保 both longitudinal and cross-sectional coverage without resorting to infinitely wide intervals。LPCI 使用了 quantile fixed-effects regression 模型,实现了预测interval 的建立。
  • results: 实验结果显示 LPCI 可以实现有效的 cross-sectional coverage 和 longitudinal coverage rates,并且比现有的参考模型perform 更好。论文还提供了 asymptotic coverage guarantees 的理论分析,证明 LPCI 在两个维度上具有有限宽度的预测 интерVAL。
    Abstract We introduce Longitudinal Predictive Conformal Inference (LPCI), a novel distribution-free conformal prediction algorithm for longitudinal data. Current conformal prediction approaches for time series data predominantly focus on the univariate setting, and thus lack cross-sectional coverage when applied individually to each time series in a longitudinal dataset. The current state-of-the-art for longitudinal data relies on creating infinitely-wide prediction intervals to guarantee both cross-sectional and asymptotic longitudinal coverage. The proposed LPCI method addresses this by ensuring that both longitudinal and cross-sectional coverages are guaranteed without resorting to infinitely wide intervals. In our approach, we model the residual data as a quantile fixed-effects regression problem, constructing prediction intervals with a trained quantile regressor. Our extensive experiments demonstrate that LPCI achieves valid cross-sectional coverage and outperforms existing benchmarks in terms of longitudinal coverage rates. Theoretically, we establish LPCI's asymptotic coverage guarantees for both dimensions, with finite-width intervals. The robust performance of LPCI in generating reliable prediction intervals for longitudinal data underscores its potential for broad applications, including in medicine, finance, and supply chain management.
    摘要 我们提出纵向预测一致性推断（Longitudinal Predictive Conformal Inference, LPCI），一种面向纵向数据、不依赖分布假设的新型一致性预测（conformal prediction）算法。现有针对时间序列的一致性预测方法主要关注单变量设定，若对纵向数据集中每条时间序列单独应用，则无法保证横截面（cross-sectional）覆盖率。目前针对纵向数据的最新方法依赖构造无限宽的预测区间来同时保证横截面覆盖与渐近纵向覆盖。所提出的 LPCI 方法解决了这一问题：它在不依赖无限宽区间的前提下同时保证纵向与横截面覆盖。在我们的方法中，残差数据被建模为一个分位数固定效应回归问题，并用训练好的分位数回归器构造预测区间。大量实验表明，LPCI 实现了有效的横截面覆盖，并在纵向覆盖率方面优于现有基准方法。理论上，我们建立了 LPCI 在两个维度上的渐近覆盖保证，且预测区间宽度有限。LPCI 在为纵向数据生成可靠预测区间方面的稳健表现，显示了其在医学、金融和供应链管理等领域的广泛应用潜力。
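A simplified, conformalized-quantile-regression-flavoured sketch of building intervals from quantile regression on residuals; LPCI's actual construction uses a quantile fixed-effects model over individuals and carries the coverage guarantees described above, none of which is reproduced here. Feature construction (including any individual identifier) is left abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_interval_model(features, residuals, alpha=0.1):
    """features: calibration covariates; residuals: y - point_prediction on the same data."""
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(features, residuals)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(features, residuals)
    return lo, hi

def predict_interval(point_pred, features, lo, hi):
    """Shift the point prediction by the estimated residual quantiles."""
    return point_pred + lo.predict(features), point_pred + hi.predict(features)
```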

Multi-Domain Causal Representation Learning via Weak Distributional Invariances

  • paper_url: http://arxiv.org/abs/2310.02854
  • repo_url: None
  • paper_authors: Kartik Ahuja, Amin Mansouri, Yixin Wang
  • for: 本研究旨在探讨 causal 表示学习在多个领域数据集上的优势。
  • methods: 本文使用 autoencoder 来学习 causal 表示,并利用数据集中具有稳定分布性质的子集来提高表示学习的可靠性。
  • results: 研究人员发现,通过 incorporating 稳定分布性质的子集,autoencoder 可以在不同设定下提取稳定的 latent 表示。
    Abstract Causal representation learning has emerged as the center of action in causal machine learning research. In particular, multi-domain datasets present a natural opportunity for showcasing the advantages of causal representation learning over standard unsupervised representation learning. While recent works have taken crucial steps towards learning causal representations, they often lack applicability to multi-domain datasets due to over-simplifying assumptions about the data; e.g. each domain comes from a different single-node perfect intervention. In this work, we relax these assumptions and capitalize on the following observation: there often exists a subset of latents whose certain distributional properties (e.g., support, variance) remain stable across domains; this property holds when, for example, each domain comes from a multi-node imperfect intervention. Leveraging this observation, we show that autoencoders that incorporate such invariances can provably identify the stable set of latents from the rest across different settings.
    摘要 因果表示学习已成为因果机器学习研究的核心。特别地，多领域数据集为展示因果表示学习相对于标准无监督表示学习的优势提供了天然的机会。尽管近期工作在学习因果表示方面迈出了关键一步，但由于对数据做了过于简化的假设（例如假设每个领域来自不同的单节点完美干预），这些方法往往难以应用于多领域数据集。在这项工作中，我们放宽了这些假设，并利用如下观察：通常存在一部分潜变量，其某些分布性质（如支撑集、方差）在不同领域间保持稳定；例如当每个领域来自多节点非完美干预时，这一性质即成立。基于这一观察，我们证明了引入此类不变性的自编码器能够在不同设定下可证明地将这部分稳定潜变量与其余潜变量区分开来。

Learning to Scale Logits for Temperature-Conditional GFlowNets

  • paper_url: http://arxiv.org/abs/2310.02823
  • repo_url: None
  • paper_authors: Minsu Kim, Joohwan Ko, Dinghuai Zhang, Ling Pan, Taeyoung Yun, Woochang Kim, Jinkyoo Park, Yoshua Bengio
  • for: 这种论文的目的是为了学习一种可能性极高的概率模型,用于顺序生成化学结构,以实现更好的生成多种生物化学过程中的多样性。
  • methods: 这种模型使用的方法是基于温度的流程网络(GFlowNets),它们是一种基于温度的政策序列生成的模型,可以通过调整温度来控制模型的探索和利用行为。
  • results: 作者们提出了一种新的架构设计方法,即学习温度Scaling Logits(LSL-GFN),可以快速加速温度控制的GFlowNets训练。这种方法基于对温度的conditioning进行数值化处理,从而大幅提高了GFlowNets的性能,在多种生物化学任务中都达到了或超过了其他基elines和抽样方法的水平。
    Abstract GFlowNets are probabilistic models that learn a stochastic policy that sequentially generates compositional structures, such as molecular graphs. They are trained with the objective of sampling such objects with probability proportional to the object's reward. Among GFlowNets, the temperature-conditional GFlowNets represent a family of policies indexed by temperature, and each is associated with the correspondingly tempered reward function. The major benefit of temperature-conditional GFlowNets is the controllability of GFlowNets' exploration and exploitation through adjusting temperature. We propose Learning to Scale Logits for temperature-conditional GFlowNets (LSL-GFN), a novel architectural design that greatly accelerates the training of temperature-conditional GFlowNets. It is based on the idea that previously proposed temperature-conditioning approaches introduced numerical challenges in the training of the deep network because different temperatures may give rise to very different gradient profiles and ideal scales of the policy's logits. We find that the challenge is greatly reduced if a learned function of the temperature is used to scale the policy's logits directly. We empirically show that our strategy dramatically improves the performances of GFlowNets, outperforming other baselines, including reinforcement learning and sampling methods, in terms of discovering diverse modes in multiple biochemical tasks.
    摘要 GFlowNets 是一类概率模型，它们学习一个随机策略，逐步生成诸如分子图之类的组合结构，其训练目标是使采样得到某一对象的概率与该对象的奖励成正比。在 GFlowNets 中，温度条件化 GFlowNets 表示一族由温度索引的策略，每个策略对应相应温度调节后的奖励函数；其主要优点是可以通过调节温度来控制 GFlowNets 的探索与利用。我们提出面向温度条件化 GFlowNets 的对数几率缩放学习（Learning to Scale Logits, LSL-GFN），这是一种能显著加速温度条件化 GFlowNets 训练的新架构设计。其出发点在于：以往的温度条件化方式会给深度网络的训练带来数值上的困难，因为不同温度可能导致截然不同的梯度特性以及策略对数几率的理想尺度。我们发现，若直接用一个关于温度的可学习函数来缩放策略的对数几率，这一困难将大大缓解。实验表明，我们的策略显著提升了 GFlowNets 的性能，在多个生化任务中发现多样化模式方面优于包括强化学习与采样方法在内的其他基线。
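A minimal sketch of the core architectural idea as described: a small learned function of the temperature produces a positive scale that multiplies the policy logits, rather than feeding the temperature in only as an ordinary conditioning input. Layer sizes and the softplus choice are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemperatureScaledPolicy(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, n_actions))
        self.scaler = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1), nn.Softplus())

    def forward(self, state, temperature):
        logits = self.backbone(state)        # (B, n_actions) raw policy logits
        scale = self.scaler(temperature)     # (B, 1) learned positive scale of the logits
        return F.log_softmax(scale * logits, dim=-1)

policy = TemperatureScaledPolicy(state_dim=16, n_actions=4)
log_probs = policy(torch.randn(8, 16), torch.rand(8, 1))
```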

A Deep Instance Generative Framework for MILP Solvers Under Limited Data Availability

  • paper_url: http://arxiv.org/abs/2310.02807
  • repo_url: https://github.com/miralab-ustc/l2o-g2milp
  • paper_authors: Zijie Geng, Xijun Li, Jie Wang, Xiao Li, Yongdong Zhang, Feng Wu
  • for: 本文旨在提出一种深度生成框架,用于生成混合整数线性 програм(MILP)实例。
  • methods: 本文使用 masked variational autoencoder 来生成 MILP 实例,并且可以不基于专家设计的 формуLA。
  • results: 实验表明,我们的方法可以生成与实际数据有相似结构和计算困难的 MILP 实例,同时能够保持实际数据的特性。
    Abstract In the past few years, there has been an explosive surge in the use of machine learning (ML) techniques to address combinatorial optimization (CO) problems, especially mixed-integer linear programs (MILPs). Despite the achievements, the limited availability of real-world instances often leads to sub-optimal decisions and biased solver assessments, which motivates a suite of synthetic MILP instance generation techniques. However, existing methods either rely heavily on expert-designed formulations or struggle to capture the rich features of real-world instances. To tackle this problem, we propose G2MILP, the first deep generative framework for MILP instances. Specifically, G2MILP represents MILP instances as bipartite graphs, and applies a masked variational autoencoder to iteratively corrupt and replace parts of the original graphs to generate new ones. The appealing feature of G2MILP is that it can learn to generate novel and realistic MILP instances without prior expert-designed formulations, while preserving the structures and computational hardness of real-world datasets, simultaneously. Thus the generated instances can facilitate downstream tasks for enhancing MILP solvers under limited data availability. We design a suite of benchmarks to evaluate the quality of the generated MILP instances. Experiments demonstrate that our method can produce instances that closely resemble real-world datasets in terms of both structures and computational hardness. The deliverables are released at https://miralab-ustc.github.io/L2O-G2MILP.
    摘要 近年来，利用机器学习（ML）技术求解组合优化（CO）问题、尤其是混合整数线性规划（MILP）问题的研究呈爆发式增长。尽管已取得诸多成果，但真实算例的有限可得性往往导致次优的决策与有偏的求解器评估，这促使人们发展了一系列合成 MILP 算例生成技术。然而，现有方法要么严重依赖专家设计的公式，要么难以刻画真实算例的丰富特征。为此，我们提出 G2MILP，这是首个面向 MILP 算例的深度生成框架。具体而言，G2MILP 将 MILP 算例表示为二分图，并利用掩码变分自编码器迭代地破坏并替换原图的部分结构以生成新算例。G2MILP 的优点在于：无需事先由专家设计公式，即可学会生成新颖且逼真的 MILP 算例，同时保留真实数据集的结构与计算难度；所生成的算例因而能够在数据有限的情况下促进改进 MILP 求解器的下游任务。我们设计了一套基准来评估所生成 MILP 算例的质量。实验表明，我们的方法能够生成在结构与计算难度上都与真实数据集十分接近的算例。相关成果发布于 https://miralab-ustc.github.io/L2O-G2MILP 。

A Data-facilitated Numerical Method for Richards Equation to Model Water Flow Dynamics in Soil

  • paper_url: http://arxiv.org/abs/2310.02806
  • repo_url: None
  • paper_authors: Zeyuan Song, Zheyu Jiang
  • for: 该论文主要针对Root-zone soil moisture监测和精细耕作、智能灌溉和旱灾防治。
  • methods: 该论文提出了一种基于数据优化的数值方法,称为D-GRW方法,该方法 synergistically integrate了适应性线性化方案、神经网络和全球随机漫步在finite volume分割框架中,以生成Richards方程的精准数值解决方案,并保证了合理的假设下的有限 convergence。
  • results: 该论文通过三个示例,证明了D-GRW方法的精度和质量,并与参照方法和商业解决方案进行比较。
    Abstract Root-zone soil moisture monitoring is essential for precision agriculture, smart irrigation, and drought prevention. Modeling the spatiotemporal water flow dynamics in soil is typically achieved by solving a hydrological model, such as the Richards equation which is a highly nonlinear partial differential equation (PDE). In this paper, we present a novel data-facilitated numerical method for solving the mixed-form Richards equation. This numerical method, which we call the D-GRW (Data-facilitated global Random Walk) method, synergistically integrates adaptive linearization scheme, neural networks, and global random walk in a finite volume discretization framework to produce accurate numerical solutions of the Richards equation with guaranteed convergence under reasonable assumptions. Through three illustrative examples, we demonstrate and discuss the superior accuracy and mass conservation performance of our D-GRW method and compare it with benchmark numerical methods and commercial solver.
    摘要 根区土壤湿度监测是精细农业、智能灌溉和抗旱防治的关键。土壤中水流的时空动态通常通过求解水文模型来模拟，例如 Richards 方程，这是一个高度非线性的偏微分方程（PDE）。在这篇文章中，我们介绍了一种新的数据促进数值方法，称为 D-GRW（数据促进全球随机步行）方法，该方法将自适应线性化方案、神经网络和全球随机步行协同集成到有限体积离散框架中，以生成 Richards 方程的精确数值解，并在合理假设下保证收敛。通过三个示例，我们展示并讨论了 D-GRW 方法在精度和质量守恒方面的优越性能，并与基准数值方法和商业求解器进行比较。

MAD Max Beyond Single-Node: Enabling Large Machine Learning Model Acceleration on Distributed Systems

  • paper_url: http://arxiv.org/abs/2310.02784
  • repo_url: None
  • paper_authors: Samuel Hsia, Alicia Golden, Bilge Acun, Newsha Ardalani, Zachary DeVito, Gu-Yeon Wei, David Brooks, Carole-Jean Wu
  • for: 这份研究是为了提高大机器学习(ML)模型的训练和部署效率,并且需要大量的分布式计算基础设施。
  • methods: 本研究使用了现实世界中大型机器学习模型的训练数据中心设备,并开发了一个敏捷性性能模型框架,以帮助并行化和硬件软件共同设计策略。
  • results: 根据实际的大型机器学习模型和现代GPU训练硬件,本研究显示了预训练和测试场景中的2.24倍和5.27倍的throughput提升潜力。
    Abstract Training and deploying large machine learning (ML) models is time-consuming and requires significant distributed computing infrastructures. Based on real-world large model training on datacenter-scale infrastructures, we show 14~32% of all GPU hours are spent on communication with no overlapping computation. To minimize the outstanding communication latency, in this work, we develop an agile performance modeling framework to guide parallelization and hardware-software co-design strategies. Using the suite of real-world large ML models on state-of-the-art GPU training hardware, we demonstrate 2.24x and 5.27x throughput improvement potential for pre-training and inference scenarios, respectively.
    摘要 培训和部署大型机器学习(ML)模型需要很长时间,并且需要庞大的分布式计算基础设施。根据实际的大型模型在数据中心级别基础设施上的训练实践,我们发现14%-32%的所有GPU时间都被沟通占用,没有重叠计算。为了减少待机时间,在这项工作中,我们开发了一个轻松性性能模型框架,以导引并行化和硬件软件共设策略。使用现代GPU训练硬件的集成体系,我们示出了2.24倍和5.27倍的throughput提升潜力,在预训练和推理场景中。
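
The abstract's point about exposed (non-overlapped) communication can be illustrated with a toy analytical model; the sketch below, with made-up timings, contrasts per-step time with and without overlapping communication and computation. It is only a back-of-the-envelope illustration, not the paper's performance-modeling framework.

```python
# Toy per-step time model: no overlap vs. perfect overlap of communication with
# computation. The timings are hypothetical placeholders.
def step_time(compute_s: float, comm_s: float, overlap: bool) -> float:
    return max(compute_s, comm_s) if overlap else compute_s + comm_s

compute_s, comm_s = 0.80, 0.25          # hypothetical seconds per training step
no_overlap = step_time(compute_s, comm_s, overlap=False)
full_overlap = step_time(compute_s, comm_s, overlap=True)
print(f"speedup from overlapping: {no_overlap / full_overlap:.2f}x")
```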

Expected flow networks in stochastic environments and two-player zero-sum games

  • paper_url: http://arxiv.org/abs/2310.02779
  • repo_url: None
  • paper_authors: Marco Jiralerspong, Bilun Sun, Danilo Vucetic, Tianyu Zhang, Yoshua Bengio, Gauthier Gidel, Nikolay Malkin
  • for: 这个论文是为了应用在不同的结构化物件生成任务中,使用组合抽样方法来实现快速抽样高质量的物件。
  • methods: 该论文使用了组合抽样网络(GFlowNets),并且提出了对应数据分布的预期流网络(EFlowNets),以及在抽样中对抗环境中的抽样网络(AFlowNets)。
  • results: 该论文显示了EFlowNets在数据分布中实现了更好的表现,并且在游戏中实现了更高的胜率(大于80%)。
    Abstract Generative flow networks (GFlowNets) are sequential sampling models trained to match a given distribution. GFlowNets have been successfully applied to various structured object generation tasks, sampling a diverse set of high-reward objects quickly. We propose expected flow networks (EFlowNets), which extend GFlowNets to stochastic environments. We show that EFlowNets outperform other GFlowNet formulations in stochastic tasks such as protein design. We then extend the concept of EFlowNets to adversarial environments, proposing adversarial flow networks (AFlowNets) for two-player zero-sum games. We show that AFlowNets learn to find above 80% of optimal moves in Connect-4 via self-play and outperform AlphaZero in tournaments.
    摘要 generative flow networks (GFlowNets) 是一种顺序采样模型,用于匹配给定的分布。 GFlowNets 在不同的结构化对象生成任务中成功应用,快速采样出高奖对象的多样性。我们提出了预期流网络(EFlowNets),它们extend GFlowNets 到随机环境中。我们表明,EFlowNets 在随机任务中比其他 GFlowNet 形式更高效。然后,我们扩展了 EFlowNets 的概念,提出了对抗流网络(AFlowNets),用于两个玩家的零SUM游戏。我们表明,AFlowNets 在 Connect-4 游戏中通过自游戏和AlphaZero比赛中,找到了大于 80% 的优化移动。

Graph Neural Networks and Time Series as Directed Graphs for Quality Recognition

  • paper_url: http://arxiv.org/abs/2310.02774
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Angelica Simonetti, Ferdinando Zanchetta
  • for: 这篇论文旨在探讨时序序列中的图 neural network (GNN) 的应用,并与现有算法相结合,如 temporally convolutional networks 和 recurrent neural networks。
  • methods: 该论文将时序序列视为导向图,从而利用 GNN 架构来捕捉时间相关性。开发了两种不同的几何深度学习模型,一种是监督类型的分类器,另一种是 autoencoder-like 模型用于信号重建。
  • results: 在质量识别问题中应用这两种模型,得到了有效的结果。
    Abstract Graph Neural Networks (GNNs) are becoming central in the study of time series, coupled with existing algorithms as Temporal Convolutional Networks and Recurrent Neural Networks. In this paper, we see time series themselves as directed graphs, so that their topology encodes time dependencies and we start to explore the effectiveness of GNNs architectures on them. We develop two distinct Geometric Deep Learning models, a supervised classifier and an autoencoder-like model for signal reconstruction. We apply these models on a quality recognition problem.
    摘要 图神经网络（GNNs）在时间序列研究中正变得日益核心，并与现有算法（如时间卷积网络和循环神经网络）相结合。在这篇论文中，我们将时间序列本身视为有向图，其拓扑编码了时间依赖关系，并开始探索 GNN 架构在其上的效果。我们开发了两种不同的几何深度学习模型，一个是有监督分类器，另一个是用于信号重建的类自编码器模型。我们将这些模型应用于一个质量识别问题。
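
A minimal sketch of the "time series as a directed graph" viewpoint follows: each time step becomes a node with a directed edge to its successor, and one hand-rolled message-passing step mixes each node with its predecessor. This only illustrates the encoding, not the paper's models.

```python
# Each time step is a node; a directed edge t -> t+1 encodes the time dependency.
import numpy as np

def directed_chain_adjacency(T: int) -> np.ndarray:
    A = np.zeros((T, T))
    A[np.arange(T - 1), np.arange(1, T)] = 1.0   # edge from t to t+1
    return A

x = np.random.randn(8, 1)                        # 8 time steps, 1 feature each
A = directed_chain_adjacency(len(x))
# One message-passing step: each node keeps its own feature and receives the
# feature of its predecessor (A.T routes messages along edge directions).
h = x + A.T @ x
```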

Deep Reinforcement Learning Algorithms for Hybrid V2X Communication: A Benchmarking Study

  • paper_url: http://arxiv.org/abs/2310.03767
  • repo_url: None
  • paper_authors: Fouzi Boukhalfa, Reda Alami, Mastane Achab, Eric Moulines, Mehdi Bennis
  • for: 提高自动驾驶车辆的安全性水平,以达到飞行器级别的可靠性。
  • methods: 利用多种通信技术的 redundancy 来实现高可靠性,并使用深度强化学习算法解决 vertical handover 问题。
  • results: 比起现有的状态艺术方法,使用 DRL 算法可以更好地增加 V-VLC 头灯的可用性和重复率,从而减少通信成本while maintaining a high level of reliability。
    Abstract In today's era, autonomous vehicles demand a safety level on par with aircraft. Taking a cue from the aerospace industry, which relies on redundancy to achieve high reliability, the automotive sector can also leverage this concept by building redundancy in V2X (Vehicle-to-Everything) technologies. Given the current lack of reliable V2X technologies, this idea is particularly promising. By deploying multiple RATs (Radio Access Technologies) in parallel, the ongoing debate over the standard technology for future vehicles can be put to rest. However, coordinating multiple communication technologies is a complex task due to dynamic, time-varying channels and varying traffic conditions. This paper addresses the vertical handover problem in V2X using Deep Reinforcement Learning (DRL) algorithms. The goal is to assist vehicles in selecting the most appropriate V2X technology (DSRC/V-VLC) in a serpentine environment. The results show that the benchmarked algorithms outperform the current state-of-the-art approaches in terms of redundancy and usage rate of V-VLC headlights. This result is a significant reduction in communication costs while maintaining a high level of reliability. These results provide strong evidence for integrating advanced DRL decision mechanisms into the architecture as a promising approach to solving the vertical handover problem in V2X.
    摘要

Kernel-based function learning in dynamic and non stationary environments

  • paper_url: http://arxiv.org/abs/2310.02767
  • repo_url: None
  • paper_authors: Alberto Giaretta, Mauro Bisiacco, Gianluigi Pillonetto
  • for: 这个论文是关于Function estimation from sparse and noisy data的研究,具体来说是关于supervised learning的研究,其中每个训练集元素都是一个输入位置和一个输出响应的couple。
  • methods: 这篇论文使用了kernel-based ridge regression方法,并derived convergence conditions under non-stationary distributions,包括在不同时间点上的探索-利用问题。
  • results: 这篇论文提出了一些关于函数估计的结果,包括对non-stationary distribution下的函数估计的研究,以及在探索-利用问题中的应用。
    Abstract One central theme in machine learning is function estimation from sparse and noisy data. An example is supervised learning where the elements of the training set are couples, each containing an input location and an output response. In the last decades, a substantial amount of work has been devoted to design estimators for the unknown function and to study their convergence to the optimal predictor, also characterizing the learning rate. These results typically rely on stationary assumptions where input locations are drawn from a probability distribution that does not change in time. In this work, we consider kernel-based ridge regression and derive convergence conditions under non stationary distributions, addressing also cases where stochastic adaption may happen infinitely often. This includes the important exploration-exploitation problems where e.g. a set of agents/robots has to monitor an environment to reconstruct a sensorial field and their movements rules are continuously updated on the basis of the acquired knowledge on the field and/or the surrounding environment.
    摘要 机器学习中的一个中心主题是从稀疏且含噪的数据中进行函数估计。例如，在有监督学习中，训练集中的每个元素是一对输入位置和输出响应。过去几十年中，大量工作致力于设计未知函数的估计器，研究其向最优预测器的收敛，并刻画学习率。这些结果通常依赖平稳性假设，即输入位置从一个不随时间变化的概率分布中采样。在本工作中，我们考虑基于核的岭回归（kernel ridge regression），推导非平稳分布下的收敛条件，并涵盖随机自适应可能无限次发生的情形。这包括重要的探索-利用问题，例如一组智能体/机器人需要监测环境以重建感知场，其运动规则基于对该场和/或周围环境所获取的知识不断更新。
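
For readers unfamiliar with the estimator being analyzed, the following is a standard kernel ridge regression sketch (Gaussian kernel, NumPy); the paper's convergence results concern this kind of estimator under non-stationary input distributions. Hyperparameters are placeholders.

```python
# Standard kernel ridge regression: alpha = (K + n*lambda*I)^{-1} y.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_krr(X, y, lam=1e-2, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return lambda Xq: rbf_kernel(Xq, X, gamma) @ alpha

X = np.random.rand(50, 1)
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(50)
predict = fit_krr(X, y)
print(predict(np.array([[0.25]])))
```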

Fair Feature Selection: A Comparison of Multi-Objective Genetic Algorithms

  • paper_url: http://arxiv.org/abs/2310.02752
  • repo_url: None
  • paper_authors: James Brookhouse, Alex Freitas
  • for: 这个研究paper是为了提出一种新的公平特征选择方法,以提高分类器的准确率和公平性。
  • methods: 这个paper使用了两种不同的多目标优化方法:Pareto优化和lexicographic优化,以选择最佳的特征子集。
  • results: 比较两种方法的结果显示,lexicographic优化方法在精度方面表现较好,而不会对公平性造成影响。这是一个重要的结果,因为现在大多数的GA均基于Pareto方法,这个结果显示了一个新的进展方向。
    Abstract Machine learning classifiers are widely used to make decisions with a major impact on people's lives (e.g. accepting or denying a loan, hiring decisions, etc). In such applications,the learned classifiers need to be both accurate and fair with respect to different groups of people, with different values of variables such as sex and race. This paper focuses on fair feature selection for classification, i.e. methods that select a feature subset aimed at maximising both the accuracy and the fairness of the predictions made by a classifier. More specifically, we compare two recently proposed Genetic Algorithms (GAs) for fair feature selection that are based on two different multi-objective optimisation approaches: (a) a Pareto dominance-based GA; and (b) a lexicographic optimisation-based GA, where maximising accuracy has higher priority than maximising fairness. Both GAs use the same measures of accuracy and fairness, allowing for a controlled comparison. As far as we know, this is the first comparison between the Pareto and lexicographic approaches for fair classification. The results show that, overall, the lexicographic GA outperformed the Pareto GA with respect to accuracy without degradation of the fairness of the learned classifiers. This is an important result because at present nearly all GAs for fair classification are based on the Pareto approach, so these results suggest a promising new direction for research in this area.
    摘要
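
The difference between the two multi-objective criteria compared in this paper can be shown in a few lines; the toy sketch below checks Pareto dominance and a lexicographic ordering (accuracy first, fairness as tie-breaker) on hypothetical (accuracy, fairness) pairs. The tolerance value is an arbitrary placeholder.

```python
# Toy contrast of the two selection criteria: Pareto dominance vs. lexicographic
# ordering with accuracy prioritized over fairness.
def pareto_dominates(a, b):
    """a, b are (accuracy, fairness) tuples; higher is better for both."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def lexicographic_better(a, b, acc_tolerance=1e-3):
    # Prefer higher accuracy; break (near-)ties on fairness.
    if abs(a[0] - b[0]) > acc_tolerance:
        return a[0] > b[0]
    return a[1] > b[1]

s1, s2 = (0.91, 0.70), (0.90, 0.85)     # two candidate feature subsets
print(pareto_dominates(s1, s2), lexicographic_better(s1, s2))  # False, True
```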

Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel

  • paper_url: http://arxiv.org/abs/2310.03054
  • repo_url: https://github.com/fabianaltekrueger/conditional_mmd_flows
  • paper_authors: Paul Hagemann, Johannes Hertrich, Fabian Altekrüger, Robert Beinert, Jannis Chemseddine, Gabriele Steidl
  • for: Conditional generative modeling and posterior sampling
  • methods: Discrete Wasserstein gradient flows and negative distance kernel
  • results: Efficient computation, error bound for posterior distributions, and demonstrated power through numerical examples in various applications such as conditional image generation and inverse problems like superresolution, inpainting, and computed tomography.
  • for: 本文提出了基于最大均方差(MMD)的 conditional flows,用于 posterior sampling 和 conditional generative modeling。MMD 具有许多优点,如高效计算via slicing 和 sorting。
  • methods: 我们使用柔性 Wasserstein 泛化流动来近似 JOINT 分布函数,并建立了这个分布函数的错误 bound。此外,我们证明了我们的 particle flow 实际上是一个 Wasserstein 泛化流动。
  • results: 我们通过数学示例展示了我们的方法的力量,包括 conditional image generation 和 inverse problems 如超分辨、填充和计算Tomography 在低剂量和有限角度设置下。
    Abstract We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.
    摘要 我们提出基于带负距离核的最大均值差异（MMD）的条件流，用于后验采样和条件生成建模。这种 MMD（也称能量距离）具有许多有利的性质，例如可以通过切片和排序高效计算。我们使用离散 Wasserstein 梯度流来近似真实值与观测值的联合分布，并为后验分布建立误差界。此外，我们证明了我们的粒子流确实是某个适当泛函的 Wasserstein 梯度流。我们通过数值示例展示了该方法的能力，包括条件图像生成，以及超分辨率、图像修复和低剂量/有限角度计算机断层成像等逆问题。
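
As a rough illustration of the MMD with negative distance kernel (energy distance) and of the slicing idea mentioned in the abstract, the sketch below estimates the distance on random 1D projections; the paper additionally exploits sorting for a fast exact 1D computation, which is omitted here. All sizes and seeds are arbitrary.

```python
# Energy distance (MMD with k(x, y) = -|x - y|) estimated on random 1D slices.
import numpy as np

def energy_distance_1d(x: np.ndarray, y: np.ndarray) -> float:
    d_xy = np.abs(x[:, None] - y[None, :]).mean()
    d_xx = np.abs(x[:, None] - x[None, :]).mean()
    d_yy = np.abs(y[:, None] - y[None, :]).mean()
    return 2.0 * d_xy - d_xx - d_yy

def sliced_energy_distance(X: np.ndarray, Y: np.ndarray, n_proj: int = 64) -> float:
    rng = np.random.default_rng(0)
    vals = []
    for _ in range(n_proj):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)
        vals.append(energy_distance_1d(X @ theta, Y @ theta))
    return float(np.mean(vals))

X, Y = np.random.randn(128, 2), np.random.randn(128, 2) + 1.0
print(sliced_energy_distance(X, Y))
```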

SALSA: Semantically-Aware Latent Space Autoencoder

  • paper_url: http://arxiv.org/abs/2310.02744
  • repo_url: None
  • paper_authors: Kathryn E. Kirchoff, Travis Maxfield, Alexander Tropsha, Shawn M. Gomez
  • for: 该研究旨在提高深度学习在药物发现中的应用，特别是在化学数据的表示方面。
  • methods: 该研究使用了自编码器（Autoencoder）和变换器（Transformer），并添加了一个对比任务来学习分子之间的结构相似性。
  • results: 研究表明，通过添加对比任务，自编码器可以学习更有意义的分子表示，更好地捕捉分子之间的结构相似性，这些表示有助于进一步提升药物发现的效果。
    Abstract In deep learning for drug discovery, chemical data are often represented as simplified molecular-input line-entry system (SMILES) sequences which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations that are semantically meaningful, where semantics are defined by the structural (graph-to-graph) similarities between molecules. We demonstrate by example that autoencoders may map structurally similar molecules to distant codes, resulting in an incoherent latent space that does not respect the structural similarities between molecules. To address this shortcoming we propose Semantically-Aware Latent Space Autoencoder (SALSA), a transformer-autoencoder modified with a contrastive task, tailored specifically to learn graph-to-graph similarity between molecules. Formally, the contrastive objective is to map structurally similar molecules (separated by a single graph edit) to nearby codes in the latent space. To accomplish this, we generate a novel dataset comprised of sets of structurally similar molecules and opt for a supervised contrastive loss that is able to incorporate full sets of positive samples. We compare SALSA to its ablated counterparts, and show empirically that the composed training objective (reconstruction and contrastive task) leads to a higher quality latent space that is more 1) structurally-aware, 2) semantically continuous, and 3) property-aware.
    摘要 深度学习在药物发现中使用化学数据,通常将化学数据表示为简化的分子输入线入系统(SMILES)序列,这使得自然语言处理方法可以直接实现。然而,我们发现训练自动编码器solely on SMILES是不够学习分子表示,其中表示是指分子之间的结构相似性。我们通过示例显示,自动编码器可能将结构相似分子映射到远离的编码中,导致latent空间无法尊重分子之间的结构相似性。为解决这个缺陷,我们提议使用Semantically-Aware Latent Space Autoencoder(SALSA),一种基于变换器的自动编码器,并添加了一个对比 зада务,特意用于学习分子之间的结构相似性。正式来说,对比目标是将结构相似分子(分开一个分子编辑)映射到latent空间中的近处编码。为实现这一点,我们生成了一个新的数据集,其中包含了结构相似分子的集合,并选择了一种监督的对比损失,可以包含全集的正样本。我们比较了SALSA与其简化版本,并证明了组合的训练目标(重建和对比任务)会导致一个更高质量的latent空间,其1) 结构意识更强,2) 更加连续,3) 性质意识更好。

Reward Model Ensembles Help Mitigate Overoptimization

  • paper_url: http://arxiv.org/abs/2310.02743
  • repo_url: None
  • paper_authors: Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
  • for: 这个研究是用来研究使用强化学习自人给大型自然语言模型的调整方法,以便让模型能够更好地遵循人类的指令。
  • methods: 这个研究使用了一个人工设定的人类反馈系统,并使用了两种优化方法:(a)最佳 sampling(BoN)和(b) proximal policy optimization(PPO)。此外,这个研究还使用了一个大型“真实” reward model,以模拟人类的偏好。
  • results: 研究发现,使用ensemble-based conservative optimization可以有效地遏制 reward model 的过优化,并提高表现的精度。尤其是在使用 BoN 优化时,ensemble-based conservative optimization 可以提高表现的精度最多达 70%。此外,研究还发现,在添加25% 标签错误时,ensemble-based conservative optimization 仍然能够有效地遏制过优化。
    Abstract Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning large language models to follow instructions. As part of this process, learned reward models are used to approximately model human preferences. However, as imperfect representations of the "true" reward, these learned reward models are susceptible to \textit{overoptimization}. Gao et al. (2023) studied this phenomenon in a synthetic human feedback setup with a significantly larger "gold" reward model acting as the true reward (instead of humans) and showed that overoptimization remains a persistent problem regardless of the size of the proxy reward model and training data used. Using a similar setup, we conduct a systematic study to evaluate the efficacy of using ensemble-based conservative optimization objectives, specifically worst-case optimization (WCO) and uncertainty-weighted optimization (UWO), for mitigating reward model overoptimization when using two optimization methods: (a) best-of-n sampling (BoN) (b) proximal policy optimization (PPO). We additionally extend the setup of Gao et al. (2023) to include 25% label noise to better mirror real-world conditions. Both with and without label noise, we find that conservative optimization practically eliminates overoptimization and improves performance by up to 70% for BoN sampling. For PPO, ensemble-based conservative optimization always reduces overoptimization and outperforms single reward model optimization. Moreover, combining it with a small KL penalty successfully prevents overoptimization at no performance cost. Overall, our results demonstrate that ensemble-based conservative optimization can effectively counter overoptimization.
    摘要 人工回馈学习(RLHF)是一种标准的精细调整大语言模型,以便跟进 instrucciones。在这个过程中,学习的奖励模型用于 aproximately 模型人类偏好。然而,由于这些学习的奖励模型是不完美的 "真实" 奖励的表示,因此它们容易过价 optimize。GAO et al. (2023) 在一个 sintética setup 中研究了这种现象,并显示了在不同的 proxy 奖励模型和训练数据使用的情况下,过价 optimize 仍然是一个持续存在的问题。使用相似的 setup,我们进行了一项系统的研究,以评估使用ensemble-based conservative optimization objective,特别是worst-case optimization(WCO)和uncertainty-weighted optimization(UWO)来 mitigate 奖励模型过价 optimize 问题,并使用两种优化方法:(a) best-of-n sampling (BoN)(b) proximal policy optimization (PPO)。我们还扩展了 GAO et al. 的 setup,包括25% 标签噪音,以更好地镜像实际条件。无论含有标签噪音或不含,我们发现ensemble-based conservative optimization 实际上消除了过价 optimize 问题,并提高性能达70% 。对于 PPO,ensemble-based conservative optimization 总是减少过价 optimize,并且在不产生性能损失的情况下,成功阻止过价 optimize。总的来说,我们的结果表明,ensemble-based conservative optimization 可以有效地解决过价 optimize 问题。
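
The two conservative objectives named in the abstract are easy to state in code; the sketch below shows worst-case optimization (minimum over ensemble members) and an uncertainty-weighted variant that penalizes ensemble disagreement. The penalty coefficient and tensor shapes are illustrative, not the paper's settings.

```python
# Ensemble-conservative reward shaping: WCO takes the minimum ensemble reward,
# UWO subtracts a weighted intra-ensemble variance.
import torch

def wco_reward(ensemble_rewards: torch.Tensor) -> torch.Tensor:
    # ensemble_rewards: (n_models, batch) rewards from each proxy reward model.
    return ensemble_rewards.min(dim=0).values

def uwo_reward(ensemble_rewards: torch.Tensor, coef: float = 1.0) -> torch.Tensor:
    mean = ensemble_rewards.mean(dim=0)
    var = ensemble_rewards.var(dim=0, unbiased=False)
    return mean - coef * var

rewards = torch.randn(3, 8)              # 3 reward models, batch of 8 responses
print(wco_reward(rewards).shape, uwo_reward(rewards).shape)
```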

Comparative Analysis of Imbalanced Malware Byteplot Image Classification using Transfer Learning

  • paper_url: http://arxiv.org/abs/2310.02742
  • repo_url: None
  • paper_authors: Jayasudha M, Ayesha Shaik, Gaurav Pendharkar, Soham Kumar, Muhesh Kumar B, Sudharshanan Balaji
  • for: 本研究旨在比较六种多类分类模型在Malimg数据集、Blended数据集和Malevis数据集上的性能,以了解类别不均衡对模型性能和融合的影响。
  • methods: 本研究使用了六种多类分类模型,包括ResNet50、EfficientNetB0和DenseNet169,并对不均衡和均衡数据进行比较。
  • results: 研究发现,当类别不均衡时,需要更少的轮数才能达到最高精度(97%),并且存在类型之间的差异。此外,ResNet50、EfficientNetB0和DenseNet169模型在不均衡和均衡数据上都能够表现良好。
    Abstract Cybersecurity is a major concern due to the increasing reliance on technology and interconnected systems. Malware detectors help mitigate cyber-attacks by comparing malware signatures. Machine learning can improve these detectors by automating feature extraction, identifying patterns, and enhancing dynamic analysis. In this paper, the performance of six multiclass classification models is compared on the Malimg dataset, Blended dataset, and Malevis dataset to gain insights into the effect of class imbalance on model performance and convergence. It is observed that the more the class imbalance less the number of epochs required for convergence and a high variance across the performance of different models. Moreover, it is also observed that for malware detectors ResNet50, EfficientNetB0, and DenseNet169 can handle imbalanced and balanced data well. A maximum precision of 97% is obtained for the imbalanced dataset, a maximum precision of 95% is obtained on the intermediate imbalance dataset, and a maximum precision of 95% is obtained for the perfectly balanced dataset.
    摘要 信息安全是一个主要的担忧,因为随着技术和相互连接系统的使用量的增加,黑客可以利用漏洞和攻击系统。恶意软件检测器可以减轻cyber攻击的影响,通过比较恶意软件签名。机器学习可以改进这些检测器,通过自动提取特征、识别模式和动态分析提高检测效果。在这篇论文中,我们比较了六种多类分类模型在Malimg数据集、Blended数据集和Malevis数据集上的性能,以了解类别不均衡对模型性能和融合的影响。我们发现,当类别不均衡时,模型的融合需要更少的epoch数,并且模型之间的性能差异较大。此外,我们还发现,为恶意软件检测器来说,ResNet50、EfficientNetB0和DenseNet169可以处理不均衡和均衡的数据都很好。在不均衡数据集上,最高的准确率为97%,在中等不均衡数据集上最高的准确率为95%,而在完全均衡数据集上最高的准确率为95%。
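
A typical transfer-learning setup of the kind benchmarked here can be sketched with torchvision: load a pretrained ResNet50 and replace its final layer for the malware-family classes. The class count and freezing strategy below are examples, not the paper's exact configuration.

```python
# Pretrained ResNet50 backbone with a new classification head for byteplot images.
import torch.nn as nn
from torchvision import models

num_classes = 25                                   # e.g., Malimg has 25 families
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optionally freeze the backbone and fine-tune only the new head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc")
```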

Extracting Rules from Event Data for Study Planning

  • paper_url: http://arxiv.org/abs/2310.02735
  • repo_url: https://github.com/m4jidRafiei/AIStudyBuddy-RuleExtractor
  • paper_authors: Majid Rafiei, Duygu Bayrak, Mahsa Pourbafrani, Gyunam Park, Hayyan Helal, Gerhard Lakemeyer, Wil M. P. van der Aalst
  • for: 这个研究旨在使用校内管理系统事件数据分析高等教育学生的学习路径。主要目的是为学生提供有用的学习规划建议。
  • methods: 本研究使用处理和数据探索技术来探索学生选择的课程序列对学术成就的影响。使用决策树模型生成基于数据的建议,并与建议的学习规划进行比较。
  • results: 以RWTH亚琛工业大学计算机科学学士学位课程的学生为对象进行评估，发现所提出的课程序列特征能有效地解释学术成就指标。此外，研究结果也显示可以发展更灵活的学习规划。
    Abstract In this study, we examine how event data from campus management systems can be used to analyze the study paths of higher education students. The main goal is to offer valuable guidance for their study planning. We employ process and data mining techniques to explore the impact of sequences of taken courses on academic success. Through the use of decision tree models, we generate data-driven recommendations in the form of rules for study planning and compare them to the recommended study plan. The evaluation focuses on RWTH Aachen University computer science bachelor program students and demonstrates that the proposed course sequence features effectively explain academic performance measures. Furthermore, the findings suggest avenues for developing more adaptable study plans.
    摘要 在这项研究中，我们研究了如何使用校园管理系统事件数据来分析高等教育学生的学习路径，主要目标是为学生的学习规划提供有价值的指导。我们使用过程挖掘和数据挖掘技术来探索学生所选课程序列对学术成绩的影响。通过使用决策树模型，我们生成了基于数据的规则形式建议，并与推荐的学习计划进行比较。评估针对RWTH亚琛工业大学计算机科学学士学位的学生，证明了我们提出的课程序列特征能有效地解释学术成绩指标。此外，研究结果也提示了开发更灵活学习计划的方向。
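
To illustrate how data-driven planning rules can be read off a decision tree, here is a minimal scikit-learn sketch; the features and labels are invented placeholders, not the study's event-log features.

```python
# Train a small decision tree and print its if-then rules as human-readable text.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 0, 6], [0, 1, 4], [1, 1, 7], [0, 0, 3]]      # toy course-sequence features
y = [1, 0, 1, 0]                                       # 1 = good academic outcome
feature_names = ["math_before_programming", "retook_course", "courses_per_semester"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))  # rules usable for study planning
```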

End-to-End Training of a Neural HMM with Label and Transition Probabilities

  • paper_url: http://arxiv.org/abs/2310.02724
  • repo_url: https://github.com/danenergetics/returnn
  • paper_authors: Daniel Mann, Tina Raissi, Wilfried Michel, Ralf Schlüter, Hermann Ney
  • for: 这 paper 的目的是提出一种基于隐藏马尔可夫模型 (HMM) 的端到端神经网络训练方法。
  • methods: 这 paper 使用的方法是在隐藏状态之间Explicitly modeling and learning transition probabilities。
  • results: 这 paper 的结果表明,虽然transition model training不提高了识别性能,但它对Alignment quality有着正面的影响,并且生成的 alignments 可以作为state-of-the-art Viterbi trainings 的可靠目标。
    Abstract We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM) where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach there are explicit, learnable probabilities for transitions between segments as opposed to a blank label that implicitly encodes duration statistics. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We investigate recognition results and additionally Viterbi alignments of our models. We find that while the transition model training does not improve recognition performance, it has a positive impact on the alignment quality. The generated alignments are shown to be viable targets in state-of-the-art Viterbi trainings.
    摘要 我们研究了一种新的建模方法，用于基于隐马尔可夫模型（HMM）的端到端神经网络训练，其中隐藏状态之间的转移概率被显式建模和学习。大多数当代序列到序列模型通过在给定拓扑中对所有可能的标签切分求和来实现从零开始的训练。在我们的方法中，片段之间的转移具有显式、可学习的概率，而不是用一个隐式编码时长统计的空白标签。我们实现了基于GPU的前向-后向算法，可以同时训练标签概率和转移概率。我们考察了模型的识别结果以及Viterbi对齐。我们发现，虽然转移模型的训练并不会提高识别性能，但它会改善对齐质量，且生成的对齐可以作为最先进Viterbi训练的可靠目标。
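
A compact sketch of the quantity being trained: a log-space HMM forward pass in which both the per-frame label log-probabilities and the transition matrix are explicit tensors (and hence learnable). Dimensions are illustrative and this is not the paper's implementation.

```python
# Log-space forward recursion with explicit transition log-probabilities.
import torch

def hmm_forward_logprob(log_emit: torch.Tensor, log_trans: torch.Tensor,
                        log_init: torch.Tensor) -> torch.Tensor:
    """log_emit: (T, S) per-frame label log-probs, log_trans: (S, S), log_init: (S,)."""
    alpha = log_init + log_emit[0]                         # (S,)
    for t in range(1, log_emit.shape[0]):
        # alpha_t(j) = logsumexp_i(alpha_{t-1}(i) + log_trans[i, j]) + log_emit[t, j]
        alpha = torch.logsumexp(alpha[:, None] + log_trans, dim=0) + log_emit[t]
    return torch.logsumexp(alpha, dim=0)                   # total sequence log-likelihood

T, S = 10, 4
log_emit = torch.log_softmax(torch.randn(T, S), dim=-1)
log_trans = torch.log_softmax(torch.randn(S, S), dim=-1)   # learnable parameters in practice
log_init = torch.log_softmax(torch.randn(S), dim=-1)
print(hmm_forward_logprob(log_emit, log_trans, log_init))
```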

Leveraging Temporal Graph Networks Using Module Decoupling

  • paper_url: http://arxiv.org/abs/2310.02721
  • repo_url: None
  • paper_authors: Or Feldman, Chaim Baskin
  • for: The paper is written for learning on dynamic graphs, specifically addressing the issue of using batches in modern approaches and the degradation of model performance.
  • methods: The paper proposes a decoupling strategy that enables models to update frequently while using batches, achieved by decoupling the core modules of temporal graph networks and implementing them with a minimal number of learnable parameters.
  • results: The proposed Lightweight Decoupled Temporal Graph Network (LDTGN) achieves comparable or state-of-the-art results with significantly higher throughput than previous art, outperforming previous approaches by more than 20% on benchmarks that require rapid model update rates.
  • for: 这篇论文是为了学习动态图而写的,具体是解决现代方法中使用批处理的问题,并且模型性能下降的问题。
  • methods: 这篇论文提出了一种分解策略,使得模型在使用批处理时能够频繁更新,通过分解核心模块的时间图网络,并使用最小化参数来实现。
  • results: 提出的轻量级解 Coupled Temporal Graph Network (LDTGN) 在多个动态图标准 bencmarks 上得到了相似或者状态艺术的结果,并且与之前的方法比较高过的 Throughput 得到了较高的提升,比如 USLegis 或 UNTrade 等benchmarks,提升了 más de 20%。
    Abstract Modern approaches for learning on dynamic graphs have adopted the use of batches instead of applying updates one by one. The use of batches allows these techniques to become helpful in streaming scenarios where updates to graphs are received at extreme speeds. Using batches, however, forces the models to update infrequently, which results in the degradation of their performance. In this work, we suggest a decoupling strategy that enables the models to update frequently while using batches. By decoupling the core modules of temporal graph networks and implementing them using a minimal number of learnable parameters, we have developed the Lightweight Decoupled Temporal Graph Network (LDTGN), an exceptionally efficient model for learning on dynamic graphs. LDTG was validated on various dynamic graph benchmarks, providing comparable or state-of-the-art results with significantly higher throughput than previous art. Notably, our method outperforms previous approaches by more than 20\% on benchmarks that require rapid model update rates, such as USLegis or UNTrade. The code to reproduce our experiments is available at \href{https://orfeld415.github.io/module-decoupling}{this http url}.
    摘要 现代方法 для学习动态图使用了批处理而不是一个个更新。使用批处理可以使这些技术在流动enario中变得有用,但是它们会让模型更新不够频繁,从而导致性能下降。在这项工作中,我们提出了一种解耦策略,允许模型在使用批处理时频繁更新。我们通过对核心模块的时间图网络进行解耦,并使用最小化的学习参数来实现,开发了轻量级解耦时间图网络(LDTGN)。LDTG在各种动态图标准测试上验证,提供了相似或现有的成绩,同时具有明显高于前一代的吞吐量。特别是,我们的方法在需要快速模型更新率的标准测试上,比前一代方法提高了更多于20%。我们的实验代码可以在 \href{https://orfeld415.github.io/module-decoupling}{这个HTTP URL} 上复制。

Local Search GFlowNets

  • paper_url: http://arxiv.org/abs/2310.02710
  • repo_url: https://github.com/dbsxodud-11/ls_gfn
  • paper_authors: Minsu Kim, Taeyoung Yun, Emmanuel Bengio, Dinghuai Zhang, Yoshua Bengio, Sungsoo Ahn, Jinkyoo Park
  • for: 提高Generative Flow Networks(GFlowNets)的性能,尤其是在生成高奖励Sample的问题上。
  • methods: 使用本地搜索,通过破坏和重建导向高奖励解决方案,从而偏向生成高奖励样本。
  • results: 在 biochemical tasks 中显著提高性能。
    Abstract Generative Flow Networks (GFlowNets) are amortized sampling methods that learn a distribution over discrete objects proportional to their rewards. GFlowNets exhibit a remarkable ability to generate diverse samples, yet occasionally struggle to consistently produce samples with high rewards due to over-exploration on wide sample space. This paper proposes to train GFlowNets with local search which focuses on exploiting high rewarded sample space to resolve this issue. Our main idea is to explore the local neighborhood via destruction and reconstruction guided by backward and forward policies, respectively. This allows biasing the samples toward high-reward solutions, which is not possible for a typical GFlowNet solution generation scheme which uses the forward policy to generate the solution from scratch. Extensive experiments demonstrate a remarkable performance improvement in several biochemical tasks. Source code is available: \url{https://github.com/dbsxodud-11/ls_gfn}.
    摘要 流式网络(GFlowNets)是一种抽象采样方法,它们学习一个对 discrete 对象的分布,该分布与对象的奖励相对。GFlowNets 显示出了强大的多样性生成能力,但有时会因为扫描范围太广而偶尔难以保持高奖励样本的生成。这篇论文提议通过在本地搜索中使用破坏和重建,以便通过后向和前向策略分别导航,偏好生成高奖励的样本。这与一般 GFlowNet 的解决方案生成方式不同,后者使用前向策略从零开始生成解决方案。广泛的实验表明,这种方法可以在多个生物化学任务中显著提高性能。源代码可以在以下链接中找到:\url{https://github.com/dbsxodud-11/ls_gfn}。
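
The destroy-and-reconstruct loop can be summarized generically; in the sketch below, `backward_destroy`, `forward_rebuild`, and `reward` are assumed user-supplied callables standing in for the GFlowNet's backward policy, forward policy, and reward function. The acceptance rule shown is a simple greedy filter, not necessarily the paper's.

```python
# Generic local search around a sampled object: partially destroy it, rebuild it,
# and keep the higher-reward result.
def local_search(x, reward, backward_destroy, forward_rebuild, n_steps=8):
    best, best_r = x, reward(x)
    for _ in range(n_steps):
        partial = backward_destroy(best)          # undo a few construction steps
        candidate = forward_rebuild(partial)      # complete the object again
        r = reward(candidate)
        if r > best_r:                            # greedy acceptance (filtering)
            best, best_r = candidate, r
    return best, best_r
```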

Tackling Hybrid Heterogeneity on Federated Optimization via Gradient Diversity Maximization

  • paper_url: http://arxiv.org/abs/2310.02702
  • repo_url: https://github.com/Zengdun-cs/FedAWARE
  • paper_authors: Dun Zeng, Zenglin Xu, Yu Pan, Qifan Wang, Xiaoying Tang
  • for: This paper focuses on addressing the challenges of hybrid heterogeneity in federated learning, specifically by developing a novel server-side gradient-based optimizer called \textsc{FedAWARE} to mitigate the negative effects of statistical and system heterogeneity on federated optimization.
  • methods: The proposed optimizer uses adaptive gradient diversity maximization in the server update direction to improve the efficiency of federated learning in heterogeneous settings. Theoretical guarantees are provided to support the effectiveness of the proposed method.
  • results: Extensive experiments in heterogeneous federated learning scenarios demonstrate that \textsc{FedAWARE} significantly enhances the performance of federated learning across varying degrees of hybrid heterogeneity, outperforming existing methods in terms of convergence rate and final model accuracy.
    Abstract Federated learning refers to a distributed machine learning paradigm in which data samples are decentralized and distributed among multiple clients. These samples may exhibit statistical heterogeneity, which refers to data distributions are not independent and identical across clients. Additionally, system heterogeneity, or variations in the computational power of the clients, introduces biases into federated learning. The combined effects of statistical and system heterogeneity can significantly reduce the efficiency of federated optimization. However, the impact of hybrid heterogeneity is not rigorously discussed. This paper explores how hybrid heterogeneity affects federated optimization by investigating server-side optimization. The theoretical results indicate that adaptively maximizing gradient diversity in server update direction can help mitigate the potential negative consequences of hybrid heterogeneity. To this end, we introduce a novel server-side gradient-based optimizer \textsc{FedAWARE} with theoretical guarantees provided. Intensive experiments in heterogeneous federated settings demonstrate that our proposed optimizer can significantly enhance the performance of federated learning across varying degrees of hybrid heterogeneity.
    摘要

Exploring Federated Optimization by Reducing Variance of Adaptive Unbiased Client Sampling

  • paper_url: http://arxiv.org/abs/2310.02698
  • repo_url: https://github.com/Zengdun-cs/K-Vib
  • paper_authors: Dun Zeng, Zenglin Xu, Yu Pan, Qifan Wang, Xiaoying Tang
  • for: 这篇论文主要针对 Federated Learning (FL) 系统中的客户端采样问题,即在训练过程中采样一部分客户端以构建全局模型。
  • methods: 本论文提出了一系列”免费”的适应客户端采样技术,其中服务器可以在不需要进一步的本地通信和计算的前提下,建立有前途的采样概率和可靠的全局估计。
  • results: 根据这些技术，本论文提出了一种名为 K-Vib 的新采样器，它在联邦优化中求解一个考虑客户端采样的在线凸优化问题，并达到了 $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ 的 regret bound，其中 $K$ 是通信预算。这意味着它可以大幅提高联邦优化的性能。
    Abstract Federated Learning (FL) systems usually sample a fraction of clients to conduct a training process. Notably, the variance of global estimates for updating the global model built on information from sampled clients is highly related to federated optimization quality. This paper explores a line of "free" adaptive client sampling techniques in federated optimization, where the server builds promising sampling probability and reliable global estimates without requiring additional local communication and computation. We capture a minor variant in the sampling procedure and improve the global estimation accordingly. Based on that, we propose a novel sampler called K-Vib, which solves an online convex optimization respecting client sampling in federated optimization. It achieves an improved linear speedup in the regret bound $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ with communication budget $K$. As a result, it significantly improves the performance of federated optimization. Theoretical improvements and intensive experiments on classic federated tasks demonstrate our findings.
    摘要 联合学习（Federated Learning，FL）系统通常会抽出一部分客户进行训练过程。需要注意的是，训练过程中用于更新全球模型的全球估计的方差与联合优化质量高度相关。本文探讨了一类"免费"的自适应客户抽样技术在联合优化中的应用，其中服务器在不需要额外本地通信和计算的情况下，建立有前景的抽样概率和可靠的全球估计。我们捕捉了抽样程序中的一个小变种，并据此改进全球估计。基于这，我们提出了一个名为K-Vib的新抽样器，它在联合优化中求解一个考虑客户抽样的在线凸优化问题，并在通信预算 $K$ 下实现了 $\tilde{\mathcal{O}}\big(N^{\frac{1}{3}}T^{\frac{2}{3}}/K^{\frac{4}{3}}\big)$ 的线性加速遗憾界。因此，它可以对联合优化带来明显的改进。理论上的改进和在经典联合任务上的大量实验证实了我们的发现。
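
The unbiasedness that adaptive client sampling must preserve comes from importance reweighting; the sketch below draws clients with probabilities p_i and rescales each sampled update by 1/(N p_i) so the aggregate matches full participation in expectation. It illustrates the general principle, not the K-Vib sampler itself, and the probabilities are arbitrary placeholders.

```python
# Unbiased aggregation under non-uniform client sampling.
import numpy as np

def sample_and_aggregate(client_updates: np.ndarray, p: np.ndarray, K: int, seed=0):
    """client_updates: (N, d) local updates, p: (N,) sampling probabilities (sum to 1)."""
    rng = np.random.default_rng(seed)
    N = len(p)
    idx = rng.choice(N, size=K, replace=True, p=p)
    # Each sampled update is reweighted so the mean is unbiased for (1/N) sum_i g_i.
    return np.mean([client_updates[i] / (N * p[i]) for i in idx], axis=0)

updates = np.random.randn(10, 5)
p = np.full(10, 0.1)                      # uniform probabilities as a placeholder
global_update = sample_and_aggregate(updates, p, K=4)
```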

Probabilistic Block Term Decomposition for the Modelling of Higher-Order Arrays

  • paper_url: http://arxiv.org/abs/2310.02694
  • repo_url: None
  • paper_authors: Jesper Løve Hinrich, Morten Mørup
  • for: 本研究旨在提出一种可效的 bayesian 幂级分解方法,用于robust地推断多linear数据中的pattern。
  • methods: 该方法基于 von-Mises Fisher 矩阵分布,以实现多linear Tucker 部分中的正交性。
  • results: 在synthetic和实际数据上,我们验证了 bayesian 推断过程和提出的 pBTD 方法,并在噪声数据和模型顺序量化问题上进行了应用。结果表明, probabilistic BTD 可以量化适当的多linear结构,提供一种可靠的推断多linear数据中的pattern。
    Abstract Tensors are ubiquitous in science and engineering and tensor factorization approaches have become important tools for the characterization of higher order structure. Factorizations includes the outer-product rank Canonical Polyadic Decomposition (CPD) as well as the multi-linear rank Tucker decomposition in which the Block-Term Decomposition (BTD) is a structured intermediate interpolating between these two representations. Whereas CPD, Tucker, and BTD have traditionally relied on maximum-likelihood estimation, Bayesian inference has been use to form probabilistic CPD and Tucker. We propose, an efficient variational Bayesian probabilistic BTD, which uses the von-Mises Fisher matrix distribution to impose orthogonality in the multi-linear Tucker parts forming the BTD. On synthetic and two real datasets, we highlight the Bayesian inference procedure and demonstrate using the proposed pBTD on noisy data and for model order quantification. We find that the probabilistic BTD can quantify suitable multi-linear structures providing a means for robust inference of patterns in multi-linear data.
    摘要 tensor 是科学和工程领域中的普遍存在,tensor factorization 方法已成为高阶结构的特征化工具。这些分解包括外积级 Canonical Polyadic Decomposition (CPD) 以及多线性级 Tucker 分解,其中 Block-Term Decomposition (BTD) 是这两种表示之间的结构化中间件。而 CPDT, Tucker 和 BTD 传统上采用最大化可信度估计,我们则使用 Bayesian 推断来建立probabilistic CPD 和 Tucker。我们提议了一种高效的 Bayesian 推断可变 BTD,使用 von-Mises Fisher 矩阵分布来强制多线性 Tucker 部分的正交性。在一些Synthetic和两个实际数据集上,我们展示了 Bayesian 推断过程并使用我们提议的 pBTD 处理噪声数据和模型顺序量化。我们发现可变 BTD 可以量化适当的多线性结构,提供一种robust的 Pattern 推断方法。

Robust Ocean Subgrid-Scale Parameterizations Using Fourier Neural Operators

  • paper_url: http://arxiv.org/abs/2310.02691
  • repo_url: https://github.com/vikvador/ocean-subgrid-parameterizations-in-an-idealized-model-using-machine-learning
  • paper_authors: Victor Mangeleer, Gilles Louppe
  • for: 这paper是为了研究小规模过程对海洋动力的影响,但直接计算这些过程仍然是计算成本高的问题。
  • methods: 本paper使用Fourier Neural Operators来开发参数化方法,并证明其精度和通用性比其他方法更高。
  • results: 本paper的结果表明,Fourier Neural Operators可以准确地捕捉小规模过程的影响,并且在长期预测中具有较少的误差。
    Abstract In climate simulations, small-scale processes shape ocean dynamics but remain computationally expensive to resolve directly. For this reason, their contributions are commonly approximated using empirical parameterizations, which lead to significant errors in long-term projections. In this work, we develop parameterizations based on Fourier Neural Operators, showcasing their accuracy and generalizability in comparison to other approaches. Finally, we discuss the potential and limitations of neural networks operating in the frequency domain, paving the way for future investigation.
    摘要 在气候模拟中，小尺度过程塑造着海洋动力学，但直接求解它们的计算代价仍然很高。因此，通常使用经验性参数化来近似其贡献，而这会导致长期预测中出现显著误差。在这项工作中，我们开发了基于傅里叶神经算子（Fourier Neural Operators）的参数化方案，并展示了其相对于其他方法的准确性和泛化能力。最后，我们讨论了在频域中运行的神经网络的潜力与局限性，为未来的研究铺平道路。
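
For context, a minimal 1D spectral-convolution layer in the spirit of a Fourier Neural Operator is sketched below: transform to Fourier space, apply learned complex weights to the lowest modes, and transform back. Channel and mode counts are arbitrary; this is not the paper's parameterization.

```python
# Minimal 1D spectral convolution block (FNO-style): rfft -> weight low modes -> irfft.
import torch
import torch.nn as nn

class SpectralConv1d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, n_modes: int):
        super().__init__()
        self.n_modes = n_modes
        scale = 1.0 / (in_ch * out_ch)
        self.weights = nn.Parameter(
            scale * torch.randn(in_ch, out_ch, n_modes, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, in_ch, length)
        x_ft = torch.fft.rfft(x, dim=-1)                   # (batch, in_ch, length//2 + 1)
        out_ft = torch.zeros(x.shape[0], self.weights.shape[1], x_ft.shape[-1],
                             dtype=torch.cfloat, device=x.device)
        out_ft[..., :self.n_modes] = torch.einsum(
            "bim,iom->bom", x_ft[..., :self.n_modes], self.weights
        )
        return torch.fft.irfft(out_ft, n=x.shape[-1], dim=-1)

layer = SpectralConv1d(in_ch=3, out_ch=8, n_modes=16)
y = layer(torch.randn(2, 3, 64))                           # -> (2, 8, 64)
```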

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

  • paper_url: http://arxiv.org/abs/2310.02671
  • repo_url: None
  • paper_authors: Sara Klein, Simon Weissmann, Leif Döring
  • for: 这篇论文是关于Sequential Decision-Making Problems的Formal Framework,尤其是在 finite-time horizon 中的 Optimal Stopping 和Specific Supply Chain Problems 等领域。
  • methods: 这篇论文使用了 dynamic programming 和 policy gradient 的结合,将parameters逐步训练 backwards in time。
  • results: 研究发现,使用 dynamic policy gradient 训练 much better 利用 finite-time problems 的结构,实现 improved convergence bounds。
    Abstract Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant for instance for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite horizon MDPs optimal policies are not stationary, policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that the use of dynamic policy gradient training much better exploits the structure of finite-time problems which is reflected in improved convergence bounds.
    摘要

Hire When You Need to: Gradual Participant Recruitment for Auction-based Federated Learning

  • paper_url: http://arxiv.org/abs/2310.02651
  • repo_url: None
  • paper_authors: Xavier Tan, Han Yu
  • for: 这篇论文的目的是提出一个名为Gradual Participant Selection scheme for Auction-based Federated Learning(GPS-AFL),用于解决联邦学习(Federated Learning,FL)中数据所有者(Data Owner,DO)的选择问题。
  • methods: 这篇论文使用的方法包括Gradual Participant Selection scheme,它在多次训练轮中逐渐选择需要的DO,以提高选择的多样性和对绩效的影响。
  • results: 实验结果显示,GPS-AFL可以对联邦学习中的成本优化,并提高绩效。相比最佳先进方法,GPS-AFL可以降低成本33.65%,并提高绩效2.91%。
    Abstract The success of federated Learning (FL) depends on the quantity and quality of the data owners (DOs) as well as their motivation to join FL model training. Reputation-based FL participant selection methods have been proposed. However, they still face the challenges of the cold start problem and potential selection bias towards highly reputable DOs. Such a bias can result in lower reputation DOs being prematurely excluded from future FL training rounds, thereby reducing the diversity of training data and the generalizability of the resulting models. To address these challenges, we propose the Gradual Participant Selection scheme for Auction-based Federated Learning (GPS-AFL). Unlike existing AFL incentive mechanisms which generally assume that all DOs required for an FL task must be selected in one go, GPS-AFL gradually selects the required DOs over multiple rounds of training as more information is revealed through repeated interactions. It is designed to strike a balance between cost saving and performance enhancement, while mitigating the drawbacks of selection bias in reputation-based FL. Extensive experiments based on real-world datasets demonstrate the significant advantages of GPS-AFL, which reduces costs by 33.65% and improved total utility by 2.91%, on average compared to the best-performing state-of-the-art approach.
    摘要 federated learning(FL)的成功取决于数据所有者(DO)的数量和质量以及他们参与FL模型训练的动机。基于声誉的FL参与者选择方法已被提议,但它们仍面临冷启动问题和可能的选择偏袋向高声誉DOs。这种偏袋会导致低声誉DOs在未来FL训练回合中被排除,从而减少了训练数据的多样性和模型的泛化性。为解决这些挑战,我们提出了 Gradual Participant Selection scheme for Auction-based Federated Learning(GPS-AFL)。与现有的AFL奖励机制不同,GPS-AFL在多个训练回合中逐渐选择需要参与FL任务的DO,以便在更多信息的披露下进行更加精准的选择。它旨在寻求成本节省和性能提高的平衡,同时减少选择偏袋的问题。基于实际数据集的实验表明,GPS-AFL可以规避选择偏袋问题,同时节省33.65%的成本和提高总用度2.91%,相比最佳现有方法。

Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs

  • paper_url: http://arxiv.org/abs/2310.02619
  • repo_url: None
  • paper_authors: Ilan Naiman, N. Benjamin Erichson, Pu Ren, Michael W. Mahoney, Omri Azencot
  • for: 本研究旨在提出一种基于koopman理论的生成框架,以提高时间序列生成的质量。
  • methods: 该方法使用了varational autoencoder(VAE)和生成对抗网络(GAN)的组合,并通过使用spectral工具来制约Conditional Prior动态的linear map。
  • results: 实验结果表明,koopman VAE(KVAE)在多个Synthetic和实际时间序列生成 benchmarck上表现出色,并且可以在Regular和Irregular数据上进行优化。KVAE可以提高时间序列的discriminative和predictive metric,并且可以更好地模拟实际分布。
    Abstract Generating realistic time series data is important for many engineering and scientific applications. Existing work tackles this problem using generative adversarial networks (GANs). However, GANs are often unstable during training, and they can suffer from mode collapse. While variational autoencoders (VAEs) are known to be more robust to these issues, they are (surprisingly) less often considered for time series generation. In this work, we introduce Koopman VAE (KVAE), a new generative framework that is based on a novel design for the model prior, and that can be optimized for either regular and irregular training data. Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map. Our approach enhances generative modeling with two desired features: (i) incorporating domain knowledge can be achieved by leverageing spectral tools that prescribe constraints on the eigenvalues of the linear map; and (ii) studying the qualitative behavior and stablity of the system can be performed using tools from dynamical systems theory. Our results show that KVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks. Whether trained on regular or irregular data, KVAE generates time series that improve both discriminative and predictive metrics. We also present visual evidence suggesting that KVAE learns probability density functions that better approximate empirical ground truth distributions.
    摘要 生成逼真的时间序列数据对许多工程和科学应用都很重要。现有的工作使用生成对抗网络（GAN）解决这个问题。然而，GAN 在训练时常常不稳定，而且容易出现模式塌缩。变分自编码器（VAE）通常被认为对这些问题更为稳健，但它们在时间序列生成中却出人意料地较少被考虑。在这项工作中，我们引入了 Koopman VAE（KVAE），一种基于新的模型先验设计的生成框架，可以针对规则或不规则采样的训练数据进行优化。受 Koopman 理论启发，我们用一个线性映射来表示潜变量条件先验的动力学。我们的方法带来两个理想特性：（一）可以利用谱工具对该线性映射的特征值施加约束，从而将领域知识引入模型；（二）可以借助动力系统理论研究系统的定性行为与稳定性。我们的结果表明，KVAE 在多个合成与真实时间序列生成基准上表现出色；无论在规则还是不规则数据上训练，KVAE 生成的时间序列都能改进判别性和预测性指标。我们还提供了可视化证据，表明 KVAE 学习到的概率密度函数能更好地逼近经验真实分布。
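
A toy version of the Koopman-style prior is sketched below: latent states are advanced by a single linear map whose spectrum can be inspected or constrained. The spectral-radius rescaling shown is just one simple way to impose stability and is not necessarily the paper's mechanism; all dimensions are illustrative.

```python
# Linear latent dynamics z_{t+1} = A z_t with a simple spectral-radius constraint.
import torch
import torch.nn as nn

class LinearLatentDynamics(nn.Module):
    def __init__(self, latent_dim: int, max_spectral_radius: float = 0.99):
        super().__init__()
        self.A = nn.Parameter(torch.randn(latent_dim, latent_dim) / latent_dim ** 0.5)
        self.max_rho = max_spectral_radius

    def constrained_A(self) -> torch.Tensor:
        rho = torch.linalg.eigvals(self.A).abs().max()     # spectral radius of A
        return self.A * torch.clamp(self.max_rho / rho, max=1.0)

    def forward(self, z: torch.Tensor, steps: int) -> torch.Tensor:
        A = self.constrained_A()
        traj = [z]
        for _ in range(steps):
            traj.append(traj[-1] @ A.T)                     # advance the latent state
        return torch.stack(traj, dim=1)                     # (batch, steps + 1, latent_dim)

dyn = LinearLatentDynamics(latent_dim=16)
rollout = dyn(torch.randn(4, 16), steps=10)
```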

Learning adjacency matrix for dynamic graph neural network

  • paper_url: http://arxiv.org/abs/2310.02606
  • repo_url: None
  • paper_authors: Osama Ahmad, Omer Abdul Jalil, Usman Nazir, Murtaza Taj
  • for: This paper aims to address the challenge of modeling spatio-temporal data using Graph Convolutional Networks (GCNs) by introducing an encoder block to learn missing temporal links in the Block Adjacency Matrix (BA).
  • methods: The proposed method uses an encoder block to process the BA and predict connections between previously unconnected subgraphs, resulting in a Spatio-Temporal Block Adjacency Matrix (STBAM). The STBAM is then fed into a GNN to capture the complex spatio-temporal topology of the network.
  • results: The proposed method achieves superior results compared to state-of-the-art results on benchmark datasets, surgVisDom and C2D2, with slightly higher complexity. However, the computational overhead remains significantly lower than conventional non-graph-based methodologies for spatio-temporal data.
  • for: 这篇论文目标是使用图解 convolutional Neural Networks (GCNs) 来处理空间-时间数据,并提出一种encoder块来学习缺失的时间链接。
  • methods: 提议的方法使用encoder块处理块相互邻接矩阵(BA),并预测连接在不同时间步的子图,得到一个Spatio-Temporal Block Adjacency Matrix (STBAM)。然后将STBAM feed into GNN来捕捉复杂的空间-时间网络。
  • results: 提议的方法在benchmark数据集上(surgVisDom和C2D2)达到了与当前最佳方法相当的成绩,但略有更高的复杂性。然而,计算 overhead仍然远低于非图基于的方法。
    Abstract In recent work, [1] introduced the concept of using a Block Adjacency Matrix (BA) for the representation of spatio-temporal data. While their method successfully concatenated adjacency matrices to encapsulate spatio-temporal relationships in a single graph, it formed a disconnected graph. This limitation hampered the ability of Graph Convolutional Networks (GCNs) to perform message passing across nodes belonging to different time steps, as no temporal links were present. To overcome this challenge, we introduce an encoder block specifically designed to learn these missing temporal links. The encoder block processes the BA and predicts connections between previously unconnected subgraphs, resulting in a Spatio-Temporal Block Adjacency Matrix (STBAM). This enriched matrix is then fed into a Graph Neural Network (GNN) to capture the complex spatio-temporal topology of the network. Our evaluations on benchmark datasets, surgVisDom and C2D2, demonstrate that our method, with slightly higher complexity, achieves superior results compared to state-of-the-art results. Our approach's computational overhead remains significantly lower than conventional non-graph-based methodologies for spatio-temporal data.
    摘要 最近的工作中,[1] 提出了使用块邻接矩阵(BA)来表示空间时间数据的思想。他们的方法可以将邻接矩阵 concatenate 成一个图,以便在单个图上捕捉空间时间关系。然而,这种方法会形成离散图,这限制了图解决方法(GCNs)在不同时间步中的信息传递。为了解决这个挑战,我们提出了一个专门为了学习缺失的时间链接而设计的编码块。该编码块处理BA,预测了之前不相连的子图之间的连接,从而生成了一个具有空间时间特征的块邻接矩阵(STBAM)。这个充气的矩阵然后被 fed 给图神经网络(GNN),以捕捉复杂的空间时间 topology。我们对 benchmark 数据集 surgVisDom 和 C2D2 进行了评估,得出了较高的比较结果,而且与传统非图基的方法相比,我们的方法的计算开销仍然很低。

  • paper_url: http://arxiv.org/abs/2310.05978
  • repo_url: None
  • paper_authors: Nofar Piterman, Tamar Makov, Michael Fire
  • for: This paper aims to explore the development of volunteer-based social networks and the behavior of key users in these networks.
  • methods: The authors developed two novel algorithms to analyze the behavior of key users in volunteer-based social networks, including a pattern-based algorithm and a machine learning-based forecasting model.
  • results: The authors used data from a peer-to-peer food-sharing platform to evaluate their algorithms and found that they could accurately predict future behavior of key users, with an accuracy of up to 89.6%. They identified four main types of key user behavior patterns and were able to forecast which users would become active donors or change their behavior to become mainly recipients.
    Abstract Online social networks usage has increased significantly in the last decade and continues to grow in popularity. Multiple social platforms use volunteers as a central component. The behavior of volunteers in volunteer-based networks has been studied extensively in recent years. Here, we explore the development of volunteer-based social networks, primarily focusing on their key users' behaviors and activities. We developed two novel algorithms: the first reveals key user behavior patterns over time; the second utilizes machine learning methods to generate a forecasting model that can predict the future behavior of key users, including whether they will remain active donors or change their behavior to become mainly recipients, and vice-versa. These algorithms allowed us to analyze the factors that significantly influence behavior predictions. To evaluate our algorithms, we utilized data from over 2.4 million users on a peer-to-peer food-sharing online platform. Using our algorithm, we identified four main types of key user behavior patterns that occur over time. Moreover, we succeeded in forecasting future active donor key users and predicting the key users that would change their behavior to donors, with an accuracy of up to 89.6%. These findings provide valuable insights into the behavior of key users in volunteer-based social networks and pave the way for more effective communities-building in the future, while using the potential of machine learning for this goal.
    摘要 在过去一个十年中,在线社交网络的使用量增长了非常 significatively,并且继续增长受欢迎。多个社交平台都使用志愿者作为中心组件。志愿者在志愿者基于的网络中的行为已经得到了广泛的研究。在这里,我们探讨了志愿者基于的社交网络的发展,主要关注针对键用户的行为和活动。我们开发了两个新的算法:第一个显示针对时间的键用户行为模式;第二个使用机器学习方法生成预测未来键用户行为的预测模型,包括未来是否继续为主要捐赠者或者变为主要接收者,并且vice versa。这些算法让我们可以分析预测行为的因素。为了评估我们的算法,我们使用了在线 peer-to-peer 食物分享平台上的超过240万名用户的数据。使用我们的算法,我们发现了四种主要的键用户行为模式,并且成功预测了未来活跃捐赠键用户和变为主要接收者的键用户,准确率达到89.6%。这些发现提供了志愿者基于社交网络的行为的有价值的洞察,并为未来建立更有效的社区帮助做出了重要贡献。

Machine Learning-Enabled Precision Position Control and Thermal Regulation in Advanced Thermal Actuators

  • paper_url: http://arxiv.org/abs/2310.02583
  • repo_url: None
  • paper_authors: Seyed Mo Mirvakili, Ehsan Haghighat, Douglas Sim
  • for: 这个论文是为了研究和开发一种基于机器学习的开环控制器，用于控制尼龙（Nylon）人工肌。
  • methods: 该论文使用了一种基于机器学习的ensemble encoder-style feed-forward neural network来映射所需的位移轨迹到所需的功率。
  • results: 研究人员通过对一种尼龙人工肌进行位置控制，证明了该控制器可以在没有外部传感器的情况下实现精准的位移控制。
    Abstract With their unique combination of characteristics - an energy density almost 100 times that of human muscle, and a power density of 5.3 kW/kg, similar to a jet engine's output - Nylon artificial muscles stand out as particularly apt for robotics applications. However, the necessity of integrating sensors and controllers poses a limitation to their practical usage. Here we report a constant power open-loop controller based on machine learning. We show that we can control the position of a nylon artificial muscle without external sensors. To this end, we construct a mapping from a desired displacement trajectory to a required power using an ensemble encoder-style feed-forward neural network. The neural controller is carefully trained on a physics-based denoised dataset and can be fine-tuned to accommodate various types of thermal artificial muscles, irrespective of the presence or absence of hysteresis.
    摘要 借助其独特的特点 - 能量密度接近人体肌肉的100倍，功率密度达 5.3 kW/kg、与喷气发动机的输出相当 - 尼龙（Nylon）人工肌特别适合机器人应用。However, the need to integrate sensors and controllers poses a practical limitation. We report a constant power open-loop controller based on machine learning. We show that we can control the position of a nylon artificial muscle without external sensors. To achieve this, we establish a mapping from a desired displacement trajectory to a required power using an ensemble encoder-style feed-forward neural network. The neural controller is carefully trained on a physics-based denoised dataset and can be fine-tuned to accommodate various types of thermal artificial muscles, regardless of the presence or absence of hysteresis.

Online Estimation and Inference for Robust Policy Evaluation in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.02581
  • repo_url: None
  • paper_authors: Weidong Liu, Jiyuan Tu, Yichen Zhang, Xi Chen
  • for: 本研究旨在探讨对强化学习策略评估中的统计推断,特别是使用强化学习算法计算出来的参数估计的统计推断。
  • methods: 本研究使用了Robust statistics和强化学习的概率评估方法,并提出了一种在线的robust政策评估方法,以及一种基于极限分布的 Statistical Inference 方法。
  • results: 本研究通过实验验证了其算法的有效性,并提供了一种更加多样化和可靠的强化学习策略评估方法。
    Abstract Recently, reinforcement learning has gained prominence in modern statistics, with policy evaluation being a key component. Unlike traditional machine learning literature on this topic, our work places emphasis on statistical inference for the parameter estimates computed using reinforcement learning algorithms. While most existing analyses assume random rewards to follow standard distributions, limiting their applicability, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop an online robust policy evaluation procedure, and establish the limiting distribution of our estimator, based on its Bahadur representation. Furthermore, we develop a fully-online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments conducted in real-world reinforcement learning experiments.
    摘要 最近，强化学习在现代统计中得到了广泛的应用，其中策略评估是关键组成部分。Unlike traditional machine learning literature on this topic, our work emphasizes statistical inference for the parameter estimates computed using reinforcement learning algorithms. While most existing analyses assume random rewards follow standard distributions, limiting their applicability, we embrace the concept of robust statistics in reinforcement learning by simultaneously addressing issues of outlier contamination and heavy-tailed rewards within a unified framework. In this paper, we develop an online robust policy evaluation procedure and establish the limiting distribution of our estimator based on its Bahadur representation. Furthermore, we develop a fully-online procedure to efficiently conduct statistical inference based on the asymptotic distribution. This paper bridges the gap between robust statistics and statistical inference in reinforcement learning, offering a more versatile and reliable approach to policy evaluation. Finally, we validate the efficacy of our algorithm through numerical experiments conducted in real-world reinforcement learning experiments.

Improving Knowledge Distillation with Teacher’s Explanation

  • paper_url: http://arxiv.org/abs/2310.02572
  • repo_url: None
  • paper_authors: Sayantan Chowdhury, Ben Liang, Ali Tizghadam, Ilijc Albanese
  • for: 提高一个低复杂度学生模型的性能,通过一个更强大的教师模型帮助。
  • methods: 使用知识填充distillation(KD)方法,但是限制了传递的知识量。这个研究提出了一种新的知识解释填充(KED)框架,允许学生模型不仅从教师模型的预测中学习,还可以从教师模型的解释中获得知识。
  • results: 我们的实验表明,KED学生模型可以在多个数据集上substantially outperform KD学生模型相同的复杂度。
    Abstract Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This limits the amount of transferred knowledge. In this work, we introduce a novel Knowledge Explaining Distillation (KED) framework, which allows the student to learn not only from the teacher's predictions but also from the teacher's explanations. We propose a class of superfeature-explaining teachers that provide explanation over groups of features, along with the corresponding student model. We also present a method for constructing the superfeatures. We then extend KED to reduce complexity in convolutional neural networks, to allow augmentation with hidden-representation distillation methods, and to work with a limited amount of training data using chimeric sets. Our experiments over a variety of datasets show that KED students can substantially outperform KD students of similar complexity.
    摘要 知识填充(KD)可以提高一个低复杂度学生模型的性能,通过一个更强大的教师模型的帮助。教师模型在KD中是一个黑盒模型,只通过其预测来传递知识给学生。这限制了知识的传递量。在这项工作中,我们介绍了一种新的知识解释填充(KED)框架,允许学生不仅从教师模型的预测中学习,还可以从教师模型的解释中获得知识。我们提议一类超特征解释教师,这些教师可以对特征组提供解释,同时与学生模型一起提供。我们还提出了超特征的构造方法。然后,我们将KED扩展到减少卷积神经网络的复杂性,使其可以与隐藏表示填充方法结合使用,并使用有限的训练数据使用 chimera 集。我们的实验表明,KED学生可以在多个数据集上明显超越KD学生。
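
As a rough illustration of learning from both predictions and explanations, the sketch below combines a standard softened-prediction distillation term with an explanation-matching term over feature-group (superfeature) attributions. The weights, temperature, and the way attributions are computed are placeholders; the paper's actual KED objective and superfeature construction are not reproduced here.

```python
import torch
import torch.nn.functional as F

def ked_style_loss(student_logits, teacher_logits, student_expl, teacher_expl,
                   labels, T=4.0, alpha=0.5, beta=0.5):
    """Illustrative distillation objective with three terms:
    - cross-entropy on ground-truth labels,
    - standard KD on softened teacher predictions,
    - an explanation-matching term over superfeature (feature-group) attributions.
    student_expl / teacher_expl: per-example attribution vectors over feature
    groups (how these are computed is left abstract here).
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    expl = F.mse_loss(student_expl, teacher_expl)
    return ce + alpha * kd + beta * expl

# Toy shapes: batch of 8, 10 classes, 6 superfeature groups.
s_logits, t_logits = torch.randn(8, 10), torch.randn(8, 10)
s_expl, t_expl = torch.randn(8, 6), torch.randn(8, 6)
labels = torch.randint(0, 10, (8,))
print(ked_style_loss(s_logits, t_logits, s_expl, t_expl, labels))
```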

Practical, Private Assurance of the Value of Collaboration

  • paper_url: http://arxiv.org/abs/2310.02563
  • repo_url: None
  • paper_authors: Hassan Jameel Asghar, Zhigang Lu, Zhongrui Zhao, Dali Kaafar
  • for: The paper is written for the problem of collaborative machine learning between two parties who want to improve the accuracy of their prediction models by sharing their datasets, but they do not want to reveal their models and datasets to each other beforehand.
  • methods: The paper proposes an interactive protocol based on fully homomorphic encryption (FHE) and label differential privacy to enable collaborative machine learning while preserving the privacy of the parties’ models and datasets. The protocol uses a neural network as the underlying machine learning model.
  • results: The paper shows that the proposed protocol achieves a significant improvement in accuracy compared to a protocol using entirely FHE operations, and the results are obtained with a time that is many orders of magnitude faster. The security of the protocol is proven in the universal composability framework assuming honest-but-curious parties, but with one party having no expertise in labeling its initial dataset.
    Abstract Two parties wish to collaborate on their datasets. However, before they reveal their datasets to each other, the parties want to have the guarantee that the collaboration would be fruitful. We look at this problem from the point of view of machine learning, where one party is promised an improvement on its prediction model by incorporating data from the other party. The parties would only wish to collaborate further if the updated model shows an improvement in accuracy. Before this is ascertained, the two parties would not want to disclose their models and datasets. In this work, we construct an interactive protocol for this problem based on the fully homomorphic encryption scheme over the Torus (TFHE) and label differential privacy, where the underlying machine learning model is a neural network. Label differential privacy is used to ensure that computations are not done entirely in the encrypted domain, which is a significant bottleneck for neural network training according to the current state-of-the-art FHE implementations. We prove the security of our scheme in the universal composability framework assuming honest-but-curious parties, but where one party may not have any expertise in labelling its initial dataset. Experiments show that we can obtain the output, i.e., the accuracy of the updated model, with time many orders of magnitude faster than a protocol using entirely FHE operations.
    摘要 To address this problem, we construct an interactive protocol based on fully homomorphic encryption over the torus (TFHE) and label differential privacy. Label differential privacy ensures that computations are not performed entirely in the encrypted domain, which is a significant bottleneck for neural network training according to current state-of-the-art FHE implementations. We prove the security of our scheme in the universal composability framework, assuming honest-but-curious parties, but where one party may not have any expertise in labeling its initial dataset.Experiments show that we can obtain the output, i.e., the accuracy of the updated model, with a time many orders of magnitude faster than a protocol using entirely FHE operations.
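
The label differential privacy ingredient mentioned above can be illustrated with the standard k-ary randomized-response mechanism, sketched below; the paper combines such label privacy with TFHE-based computation, which is not modeled here. The epsilon value and class count are arbitrary example settings.

```python
import numpy as np

def randomized_response_labels(labels, num_classes, epsilon, rng=None):
    """Label-DP via k-ary randomized response: each label is kept with
    probability p = e^eps / (e^eps + k - 1) and otherwise replaced by a
    uniformly random different class; this satisfies epsilon label-DP."""
    rng = rng or np.random.default_rng()
    k = num_classes
    p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    out = labels.copy()
    flip = rng.random(len(labels)) >= p_keep
    repl = rng.integers(0, k - 1, size=flip.sum())
    repl = repl + (repl >= labels[flip])  # shift so the true label is skipped
    out[flip] = repl
    return out

# Example: privatize 10 labels over 5 classes at epsilon = 1.
y = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
print(randomized_response_labels(y, num_classes=5, epsilon=1.0,
                                 rng=np.random.default_rng(0)))
```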

Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning Framework

  • paper_url: http://arxiv.org/abs/2310.02559
  • repo_url: None
  • paper_authors: Jingheng Zheng, Wanli Ni, Hui Tian, Deniz Gunduz, Tony Q. S. Quek, Zhu Han
  • for: 该论文旨在提出一种 semi-federated learning(SemiFL)模型,以充分利用基站(BS)和设备之间的计算资源,并且在同时采用中央学习(CL)和分布式学习(FL)的 hybrid 实现方式下,提高学习效率和质量。
  • methods: 该论文设计了一种 novel transceiver structure,将空中计算(over-the-air computation)与非正交多址接入(NOMA)技术相结合,以提高通信效率。此外,论文还给出了基于 closed-form optimality gap 的收敛性分析,并通过联合优化发射功率与接收波束成形器来求解由此产生的非凸优化问题。
  • results: 论文的实验结果表明,提出的 SemiFL 方法优于 conventional FL,在 MNIST 数据集上相比 state-of-the-art benchmarks 取得了 3.2% 的准确率提升。
    Abstract Under the organization of the base station (BS), wireless federated learning (FL) enables collaborative model training among multiple devices. However, the BS is merely responsible for aggregating local updates during the training process, which incurs a waste of the computational resource at the BS. To tackle this issue, we propose a semi-federated learning (SemiFL) paradigm to leverage the computing capabilities of both the BS and devices for a hybrid implementation of centralized learning (CL) and FL. Specifically, each device sends both local gradients and data samples to the BS for training a shared global model. To improve communication efficiency over the same time-frequency resources, we integrate over-the-air computation for aggregation and non-orthogonal multiple access for transmission by designing a novel transceiver structure. To gain deep insights, we conduct convergence analysis by deriving a closed-form optimality gap for SemiFL and extend the result to two extra cases. In the first case, the BS uses all accumulated data samples to calculate the CL gradient, while a decreasing learning rate is adopted in the second case. Our analytical results capture the destructive effect of wireless communication and show that both FL and CL are special cases of SemiFL. Then, we formulate a non-convex problem to reduce the optimality gap by jointly optimizing the transmit power and receive beamformers. Accordingly, we propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers. Extensive simulation results on two real-world datasets corroborate our theoretical analysis, and show that the proposed SemiFL outperforms conventional FL and achieves 3.2% accuracy gain on the MNIST dataset compared to state-of-the-art benchmarks.
    摘要 在基站(BS)的组织下,无线联合学习(FL)可以在多个设备之间进行协同模型训练。然而,BS在训练过程中只负责聚合本地更新,这会造成BS计算资源的浪费。为解决这个问题,我们提出了半联合学习(SemiFL)范式,利用BS与设备双方的计算能力,实现中央学习(CL)与FL的混合训练。Specifically, each device sends both local gradients and data samples to the BS for training a shared global model. To improve communication efficiency over the same time-frequency resources, we integrate over-the-air computation for aggregation and non-orthogonal multiple access for transmission by designing a novel transceiver structure. To gain deep insights, we conduct convergence analysis by deriving a closed-form optimality gap for SemiFL and extend the result to two extra cases. In the first case, the BS uses all accumulated data samples to calculate the CL gradient, while a decreasing learning rate is adopted in the second case. Our analytical results capture the destructive effect of wireless communication and show that both FL and CL are special cases of SemiFL. Then, we formulate a non-convex problem to reduce the optimality gap by jointly optimizing the transmit power and receive beamformers. Accordingly, we propose a two-stage algorithm to solve this intractable problem, in which we provide the closed-form solutions to the beamformers. Extensive simulation results on two real-world datasets corroborate our theoretical analysis, and show that the proposed SemiFL outperforms conventional FL and achieves 3.2% accuracy gain on the MNIST dataset compared to state-of-the-art benchmarks.
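
To make the hybrid idea concrete at a very high level of abstraction, the toy sketch below combines an over-the-air-style aggregate of device gradients (modeled simply as a noisy average) with a centralized gradient computed from uploaded samples. Beamforming, NOMA transmission, power control, and the paper's two-stage optimization are deliberately not modeled; the learning rate, noise level, and `grad_fn` are hypothetical.

```python
import numpy as np

def semifl_round(theta, device_grads, uploaded_batches, grad_fn,
                 eta=0.1, noise_std=0.01, rng=None):
    """One illustrative SemiFL round (toy abstraction):
    - device gradients are aggregated "over the air", modeled here as an
      average corrupted by additive receiver noise;
    - the BS also computes a centralized-learning gradient from the data
      samples the devices uploaded;
    - the global model takes a step on the mean of the two contributions."""
    rng = rng or np.random.default_rng()
    ota = np.sum(device_grads, axis=0) / len(device_grads)
    ota += noise_std * rng.standard_normal(theta.shape)   # wireless distortion
    cl = np.mean([grad_fn(theta, batch) for batch in uploaded_batches], axis=0)
    return theta - eta * 0.5 * (ota + cl)

# Toy usage: per-sample loss 0.5*||theta - x||^2, so grad = theta - mean(batch).
theta = np.zeros(3)
device_grads = [theta - np.array([1.0, 1.0, 1.0]), theta - np.array([2.0, 0.0, 1.0])]
uploaded = [np.array([[1.5, 0.5, 1.0], [1.0, 1.0, 1.0]])]
grad_fn = lambda th, batch: th - np.mean(batch, axis=0)
print(semifl_round(theta, device_grads, uploaded, grad_fn,
                   rng=np.random.default_rng(0)))
```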

Heterogeneous Federated Learning Using Knowledge Codistillation

  • paper_url: http://arxiv.org/abs/2310.02549
  • repo_url: None
  • paper_authors: Jared Lichtarge, Ehsan Amid, Shankar Kumar, Tien-Ju Yang, Rohan Anil, Rajiv Mathews
  • for: 提高 federated learning 中客户端之间模型 Architecture 的共享性,以提高模型性能。
  • methods: 提出一种方法,通过在整个池中训练小型模型,并在一些客户端上训练更大型模型,通过双向知识传递,使用无标签数据集在服务器上进行交互,不需要客户端参数的共享。
  • results: 在图像分类和自然语言处理任务上,提出了两种变种方法,可以超越 federated averaging 的限制,并且在只有部分out-of-domain或有限域知识传递数据时,也可以达到良好的效果。同时,双向知识传递允许模型在不同池中的客户端引入域转换。
    Abstract Federated Averaging, and many federated learning algorithm variants which build upon it, have a limitation: all clients must share the same model architecture. This results in unused modeling capacity on many clients, which limits model performance. To address this issue, we propose a method that involves training a small model on the entire pool and a larger model on a subset of clients with higher capacity. The models exchange information bidirectionally via knowledge distillation, utilizing an unlabeled dataset on a server without sharing parameters. We present two variants of our method, which improve upon federated averaging on image classification and language modeling tasks. We show this technique can be useful even if only out-of-domain or limited in-domain distillation data is available. Additionally, the bi-directional knowledge distillation allows for domain transfer between the models when different pool populations introduce domain shift.
    摘要 Federated Averaging 和许多联合学习算法的变种都有一个限制:所有客户端都必须使用同一个模型结构。这会导致许多客户端的可用模型容量不被利用,从而限制模型性能。为解决这个问题,我们提议一种方法,即在整个池中训练一个小型模型,并在一些客户端上训练一个更大的模型,这些客户端具有更高的计算能力。这两个模型通过知识传承进行双向交互,无需在服务器上分享参数。我们提出了两种变种,可以超越联合平均值在图像分类和自然语言处理任务上的性能。我们表明,即使只有部分客户端的数据可用,这种技术仍然可以获得有用的效果。此外,双向知识传承允许模型在不同的池中引入域转移。
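
A minimal sketch of the bidirectional, parameter-free knowledge exchange described above is given below: a small and a large model are each nudged toward the other's softened predictions on unlabeled server-side data. The toy architectures, temperature, and optimizers are placeholders and do not reflect the paper's federated training setup.

```python
import torch
import torch.nn.functional as F

def codistill_step(small, large, unlabeled_x, opt_small, opt_large, T=2.0):
    """One illustrative server-side codistillation step: knowledge flows in
    both directions via softened predictions, without exchanging parameters."""
    with torch.no_grad():
        p_small = F.softmax(small(unlabeled_x) / T, dim=1)
        p_large = F.softmax(large(unlabeled_x) / T, dim=1)

    # large learns from small
    loss_l = F.kl_div(F.log_softmax(large(unlabeled_x) / T, dim=1),
                      p_small, reduction="batchmean") * T * T
    opt_large.zero_grad()
    loss_l.backward()
    opt_large.step()

    # small learns from large
    loss_s = F.kl_div(F.log_softmax(small(unlabeled_x) / T, dim=1),
                      p_large, reduction="batchmean") * T * T
    opt_small.zero_grad()
    loss_s.backward()
    opt_small.step()
    return loss_s.item(), loss_l.item()

# Toy usage with a small linear model and a larger MLP on 20-dim inputs, 5 classes.
small = torch.nn.Linear(20, 5)
large = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 5))
x = torch.randn(16, 20)
print(codistill_step(small, large, x,
                     torch.optim.SGD(small.parameters(), lr=0.1),
                     torch.optim.SGD(large.parameters(), lr=0.1)))
```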

Exact and soft boundary conditions in Physics-Informed Neural Networks for the Variable Coefficient Poisson equation

  • paper_url: http://arxiv.org/abs/2310.02548
  • repo_url: None
  • paper_authors: Sebastian Barschkis
  • for: 本研究旨在比较使用soft loss基于的边界条件(BC)和精确距离函数基于的BC在物理学信息泛化网络(PINN)中的效果。
  • methods: 本研究使用了变量系数Poisson方程作为目标偏微分方程,并对PINN模型进行了训练。两种不同的BC决策方法被比较,即soft loss基于的BC和精确距离函数基于的BC。
  • results: 研究发现,soft loss基于的BC和精确距离函数基于的BC具有不同的优劣点,选择合适的BC决策方法可以提高PINN模型的拟合精度。此外,本研究还提供了实践中如何实现这些PINN模型的代码和步骤示例。
    Abstract Boundary conditions (BCs) are a key component in every Physics-Informed Neural Network (PINN). By defining the solution to partial differential equations (PDEs) along domain boundaries, BCs constrain the underlying boundary value problem (BVP) that a PINN tries to approximate. Without them, unique PDE solutions may not exist and finding approximations with PINNs would be a challenging, if not impossible task. This study examines how soft loss-based and exact distance function-based BC imposition approaches differ when applied in PINNs. The well known variable coefficient Poisson equation serves as the target PDE for all PINN models trained in this work. Besides comparing BC imposition approaches, the goal of this work is to also provide resources on how to implement these PINNs in practice. To this end, Keras models with Tensorflow backend as well as a Python notebook with code examples and step-by-step explanations on how to build soft/exact BC PINNs are published alongside this review.
    摘要 边界条件(BC)是每个物理信息神经网络(PINN)中的关键组成部分。通过定义偏微分方程(PDE)在区域边界上的解,BC 约束了 PINN 所要逼近的边值问题(BVP)。若没有 BC,PDE 可能不存在唯一解,用 PINN 求近似解将变得困难甚至不可能。本研究比较了基于 soft loss 与基于精确距离函数的两种 BC 施加方法在 PINN 中的差异,并以变系数 Poisson 方程作为所有 PINN 模型的目标 PDE。此外,本研究还提供了实现这些 PINN 的实践资源,包括使用 TensorFlow 后端的 Keras 模型,以及一个包含代码示例和逐步说明的 Python notebook,演示如何构建 soft/exact BC PINN。
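
The two BC imposition strategies compared in the paper can be sketched compactly. The paper's published resources use Keras with a TensorFlow backend; the minimal sketch below uses PyTorch instead, for a 1D variable coefficient Poisson problem, and the coefficient a(x), source term f(x), penalty weight, and network size are placeholder choices. The point is the structural difference: a soft BC adds a boundary penalty to the residual loss, while an exact BC composes the network with a distance function so the boundary values hold by construction.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                          torch.nn.Linear(32, 1))

a = lambda x: 1.0 + 0.5 * x             # hypothetical variable coefficient
f = lambda x: torch.sin(torch.pi * x)   # hypothetical source term
g0, g1 = 0.0, 0.0                       # Dirichlet values on the boundary of [0, 1]

def residual(u_fn, x):
    """PDE residual of d/dx( a(x) du/dx ) - f(x) via automatic differentiation."""
    x = x.requires_grad_(True)
    u = u_fn(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    flux = a(x) * du
    dflux = torch.autograd.grad(flux.sum(), x, create_graph=True)[0]
    return dflux - f(x)

x_col = torch.rand(128, 1)              # interior collocation points

# (1) Soft BCs: add a weighted penalty on the boundary misfit.
u_soft = lambda x: net(x)
xb = torch.tensor([[0.0], [1.0]])
loss_soft = residual(u_soft, x_col).pow(2).mean() + \
            100.0 * (u_soft(xb) - torch.tensor([[g0], [g1]])).pow(2).mean()

# (2) Exact BCs: compose the network with a distance function that vanishes on
#     the boundary, so u(0)=g0 and u(1)=g1 hold by construction.
u_exact = lambda x: g0 * (1 - x) + g1 * x + x * (1 - x) * net(x)
loss_exact = residual(u_exact, x_col).pow(2).mean()

print(loss_soft.item(), loss_exact.item())
```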

Joint Design of Protein Sequence and Structure based on Motifs

  • paper_url: http://arxiv.org/abs/2310.02546
  • repo_url: None
  • paper_authors: Zhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei Li
  • for: 本研究旨在设计具有感兴趣功能的蛋白质,尤其是蛋白质序列和结构共设计。
  • methods: 本研究提出了GeoPro方法,它可以同时设计蛋白质脊梁结构和序列。GeoPro利用了三维脊梁结构具有同质性的编码器和蛋白质序列带动器,以确保蛋白质序列和结构之间的互相约束。
  • results: 实验结果表明,GeoPro方法在两个具有生物学意义的金属蛋白数据集(β-内酰胺酶和肌红蛋白)上都超过了多个强基线。特别是,我们的方法发现了不存在于蛋白质数据库(PDB)和UniProt中的新的β-内酰胺酶和肌红蛋白,这些蛋白质具有稳定的折叠和活性位点环境,表明它们具有出色的生物功能。
    Abstract Designing novel proteins with desired functions is crucial in biology and chemistry. However, most existing work focus on protein sequence design, leaving protein sequence and structure co-design underexplored. In this paper, we propose GeoPro, a method to design protein backbone structure and sequence jointly. Our motivation is that protein sequence and its backbone structure constrain each other, and thus joint design of both can not only avoid nonfolding and misfolding but also produce more diverse candidates with desired functions. To this end, GeoPro is powered by an equivariant encoder for three-dimensional (3D) backbone structure and a protein sequence decoder guided by 3D geometry. Experimental results on two biologically significant metalloprotein datasets, including $\beta$-lactamases and myoglobins, show that our proposed GeoPro outperforms several strong baselines on most metrics. Remarkably, our method discovers novel $\beta$-lactamases and myoglobins which are not present in protein data bank (PDB) and UniProt. These proteins exhibit stable folding and active site environments reminiscent of those of natural proteins, demonstrating their excellent potential to be biologically functional.
    摘要 设计具有期望功能的新蛋白质是生物学和化学中的关键问题。然而,大多数现有工作都集中在蛋白质序列设计上,蛋白质序列与结构的协同设计仍未得到充分探索。在本文中,我们提出了GeoPro方法,它可以同时设计蛋白质主链结构和序列。我们的动机是蛋白质序列与其主链结构相互约束,因此同时设计两者不仅可以避免不折叠与错误折叠,还能生成更多具有期望功能的多样化候选者。为此,GeoPro采用了针对三维主链结构的等变编码器,以及由3D几何信息引导的蛋白质序列解码器。实验结果表明,我们提出的GeoPro在两个生物学上重要的金属蛋白数据集(包括β-内酰胺酶和肌红蛋白)上,在大多数指标上都优于多个强基线。特别是,我们的方法发现了蛋白质数据库(PDB)和UniProt中不存在的新的β-内酰胺酶和肌红蛋白,这些蛋白质具有与天然蛋白质相似的稳定折叠和活性位点环境,表明它们具有出色的生物功能潜力。

Provable Tensor Completion with Graph Information

  • paper_url: http://arxiv.org/abs/2310.02543
  • repo_url: None
  • paper_authors: Kaidong Wang, Yao Wang, Xiuwu Liao, Shaojie Tang, Can Yang, Deyu Meng
  • for: 本文研究的目标是 tensor completion problem with graph information,即使用图信息来减少缺失数据的问题。
  • methods: 本文提出了一个整体的框架,包括模型、理论和算法,用于解决动态图常量缺失问题。模型基于变换的 t-SVD tensor decomposition模型,并添加了一种新的图 Orientated smoothness regularization。
  • results: 本文提供了一种有效的算法,并证明了其统计准确性。在 synthetic 数据和实际数据上进行了深入的数值实验,证明了模型的强大性。
    Abstract Graphs, depicting the interrelations between variables, has been widely used as effective side information for accurate data recovery in various matrix/tensor recovery related applications. In this paper, we study the tensor completion problem with graph information. Current research on graph-regularized tensor completion tends to be task-specific, lacking generality and systematic approaches. Furthermore, a recovery theory to ensure performance remains absent. Moreover, these approaches overlook the dynamic aspects of graphs, treating them as static akin to matrices, even though graphs could exhibit dynamism in tensor-related scenarios. To confront these challenges, we introduce a pioneering framework in this paper that systematically formulates a novel model, theory, and algorithm for solving the dynamic graph regularized tensor completion problem. For the model, we establish a rigorous mathematical representation of the dynamic graph, based on which we derive a new tensor-oriented graph smoothness regularization. By integrating this regularization into a tensor decomposition model based on transformed t-SVD, we develop a comprehensive model simultaneously capturing the low-rank and similarity structure of the tensor. In terms of theory, we showcase the alignment between the proposed graph smoothness regularization and a weighted tensor nuclear norm. Subsequently, we establish assurances of statistical consistency for our model, effectively bridging a gap in the theoretical examination of the problem involving tensor recovery with graph information. In terms of the algorithm, we develop a solution of high effectiveness, accompanied by a guaranteed convergence, to address the resulting model. To showcase the prowess of our proposed model in contrast to established ones, we provide in-depth numerical experiments encompassing synthetic data as well as real-world datasets.
    摘要 图表, displaying the relationships between variables, 已广泛用于精准数据恢复应用中作为有效的侧信息。在这篇论文中,我们研究了tensor completion问题中的图信息。现有的研究 tend to be task-specific, lacking generality and systematic approaches. Furthermore, a recovery theory to ensure performance remains absent. Moreover, these approaches overlook the dynamic aspects of graphs, treating them as static akin to matrices, even though graphs could exhibit dynamism in tensor-related scenarios. To confront these challenges, we introduce a pioneering framework in this paper that systematically formulates a novel model, theory, and algorithm for solving the dynamic graph regularized tensor completion problem.For the model, we establish a rigorous mathematical representation of the dynamic graph, based on which we derive a new tensor-oriented graph smoothness regularization. By integrating this regularization into a tensor decomposition model based on transformed t-SVD, we develop a comprehensive model simultaneously capturing the low-rank and similarity structure of the tensor.In terms of theory, we showcase the alignment between the proposed graph smoothness regularization and a weighted tensor nuclear norm. Subsequently, we establish assurances of statistical consistency for our model, effectively bridging a gap in the theoretical examination of the problem involving tensor recovery with graph information.In terms of the algorithm, we develop a solution of high effectiveness, accompanied by a guaranteed convergence, to address the resulting model. To showcase the prowess of our proposed model in contrast to established ones, we provide in-depth numerical experiments encompassing synthetic data as well as real-world datasets.
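
A toy matrix-valued sketch of the graph-smoothness ingredient is shown below: the regularizer tr(X^T L X) with a graph Laplacian L is added to the observed-entry fit and minimized by plain gradient steps. The paper's actual model works on tensors with a transformed t-SVD low-rank structure and comes with recovery guarantees, none of which is reproduced here; the graph, step size, and weight `lam` are arbitrary example values.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized Laplacian L = D - W of a symmetric adjacency matrix W."""
    return np.diag(W.sum(axis=1)) - W

def completion_step(X, M, mask, L, lam=0.1, lr=0.05):
    """One gradient step on  0.5*||mask*(X - M)||_F^2 + 0.5*lam*tr(X^T L X).
    (The low t-SVD rank constraint from the paper is omitted in this sketch.)"""
    grad = mask * (X - M) + lam * (L @ X)
    return X - lr * grad

# Toy usage: 4 graph-connected rows, roughly half of the entries observed.
rng = np.random.default_rng(0)
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
L = graph_laplacian(W)
M = np.outer(np.arange(4.0), np.ones(3))        # smooth ground truth over the graph
mask = rng.random(M.shape) < 0.5
X = np.where(mask, M, 0.0)
for _ in range(200):
    X = completion_step(X, M, mask, L)
print(np.round(X, 2))
```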

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data

  • paper_url: http://arxiv.org/abs/2310.02541
  • repo_url: None
  • paper_authors: Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu
  • for: 这个论文探讨了使用梯度下降(GD)训练神经网络时,神经网络表现出了一些奇异的泛化行为。
  • methods: 该论文使用了两层ReLU网络和梯度下降(GD)训练方法。
  • results: 研究发现,在XOR集群数据上,一部分训练标签被随机变化,并且使用GD训练神经网络时,神经网络可以在训练数据上达到100%的准确率,但在测试数据上却具有近乎随机的性能。在后续训练步骤中,神经网络的测试准确率逐渐提高,并且仍然可以fitRandom labels in the training data,这是一种“搁废”现象。这是神经网络分类时,当数据分布不可分离时,首次出现的恰当过拟合现象。
    Abstract Neural networks trained by gradient descent (GD) have exhibited a number of surprising generalization behaviors. First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a "grokking" phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps.
    摘要 通过梯度下降(GD)训练的神经网络表现出一些令人惊讶的泛化行为。First, they can achieve a perfect fit to noisy training data and still generalize near-optimally, showing that overfitting can sometimes be benign. Second, they can undergo a period of classical, harmful overfitting -- achieving a perfect fit to training data with near-random performance on test data -- before transitioning ("grokking") to near-optimal generalization later in training. In this work, we show that both of these phenomena provably occur in two-layer ReLU networks trained by GD on XOR cluster data where a constant fraction of the training labels are flipped. In this setting, we show that after the first step of GD, the network achieves 100% training accuracy, perfectly fitting the noisy labels in the training data, but achieves near-random test accuracy. At a later training step, the network achieves near-optimal test accuracy while still fitting the random labels in the training data, exhibiting a "grokking" phenomenon. This provides the first theoretical result of benign overfitting in neural network classification when the data distribution is not linearly separable. Our proofs rely on analyzing the feature learning process under GD, which reveals that the network implements a non-generalizable linear classifier after one step and gradually learns generalizable features in later steps.
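
The experimental setup can be sketched as follows: XOR cluster data with a fraction of flipped labels, a two-layer ReLU network trained by full-batch gradient descent, and train/test accuracy tracked over steps. The dimensions, cluster separation, width, and learning rate below are arbitrary and are not tuned to reproduce the paper's regime (in particular, the one-step 100% fit and the exact grokking point); the sketch only shows the shape of the experiment.

```python
import torch

torch.manual_seed(0)
d, n, flip_frac = 50, 400, 0.15

def xor_cluster(n):
    """XOR cluster data: the first two coordinates sit near one of four cluster
    means (+/-3, +/-3); the label says whether the two mean signs agree (an
    XOR-type, not linearly separable rule); remaining coordinates are noise."""
    signs = torch.randint(0, 2, (n, 2)) * 2 - 1
    x = torch.randn(n, d)
    x[:, :2] = 3.0 * signs + 0.5 * x[:, :2]
    y = (signs[:, 0] * signs[:, 1] > 0).long()
    return x, y

x_tr, y_tr = xor_cluster(n)
x_te, y_te = xor_cluster(2000)
noisy = y_tr.clone()
idx = torch.randperm(n)[: int(flip_frac * n)]
noisy[idx] = 1 - noisy[idx]                      # flip a constant fraction of labels

net = torch.nn.Sequential(torch.nn.Linear(d, 200), torch.nn.ReLU(),
                          torch.nn.Linear(200, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)  # full-batch GD
for step in range(1, 2001):
    opt.zero_grad()
    torch.nn.functional.cross_entropy(net(x_tr), noisy).backward()
    opt.step()
    if step in (1, 10, 100, 500, 2000):
        with torch.no_grad():
            tr = (net(x_tr).argmax(1) == noisy).float().mean().item()
            te = (net(x_te).argmax(1) == y_te).float().mean().item()
        print(f"step {step:5d}  fit-to-noisy-train {tr:.2f}  clean-test {te:.2f}")
```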

Quantifying and mitigating the impact of label errors on model disparity metrics

  • paper_url: http://arxiv.org/abs/2310.02533
  • repo_url: None
  • paper_authors: Julius Adebayo, Melissa Hall, Bowen Yu, Bobbie Chern
  • for: 这个论文主要研究了标签错误对模型的不同性指标的影响。
  • methods: 作者使用了现有的方法来mitigate标签错误对模型的影响,并提出了一种新的方法来衡量标签错误对模型的影响。
  • results: 研究发现,标签错误会导致模型的不同性指标受到影响,特别是对少数群体的影响。此外,作者还提出了一种方法来衡量标签错误对模型的影响,并证明了这种方法可以提高模型的不同性指标。
    Abstract Errors in labels obtained via human annotation adversely affect a model's performance. Existing approaches propose ways to mitigate the effect of label error on a model's downstream accuracy, yet little is known about its impact on a model's disparity metrics. Here we study the effect of label error on a model's disparity metrics. We empirically characterize how varying levels of label error, in both training and test data, affect these disparity metrics. We find that group calibration and other metrics are sensitive to train-time and test-time label error -- particularly for minority groups. This disparate effect persists even for models trained with noise-aware algorithms. To mitigate the impact of training-time label error, we present an approach to estimate the influence of a training input's label on a model's group disparity metric. We empirically assess the proposed approach on a variety of datasets and find significant improvement, compared to alternative approaches, in identifying training inputs that improve a model's disparity metric. We complement the approach with an automatic relabel-and-finetune scheme that produces updated models with, provably, improved group calibration error.
    摘要 由人工标注得到的 label 错误会损害模型的性能。现有方法提出了减轻 label error 对模型下游准确率影响的办法,但其对模型差异度(disparity)指标的影响仍不清楚。在这里,我们研究了 label error 对模型差异度指标的影响。我们经验性地刻画了训练数据和测试数据中不同水平的 label error 如何影响这些差异度指标。我们发现,group calibration 和其他指标对训练时与测试时的 label error 都很敏感,对少数群体尤为明显;即使模型使用噪声感知(noise-aware)算法训练,这种差异性影响仍然存在。为了减轻训练时 label error 的影响,我们提出了一种估计训练样本标签对模型群体差异度指标影响的方法。我们在多个数据集上对该方法进行了经验评估,发现与其他方法相比,它能更好地识别出那些可以改进模型差异度指标的训练样本。我们还补充了一种自动重新标注并微调的方案,可以生成更新后的模型,并可证明地改进其群体校准误差。
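
One concrete group disparity metric, illustrated in the sketch below, is per-group expected calibration error, together with a toy demonstration that flipping a fraction of the minority group's labels inflates that group's error. The binning scheme, group proportions, and flip rate are arbitrary example choices, and the influence-estimation and relabel-and-finetune procedures from the paper are not reproduced.

```python
import numpy as np

def group_calibration_error(probs, labels, groups, n_bins=10):
    """Per-group expected calibration error for binary predictions: within each
    group, compare the mean predicted probability with the empirical positive
    rate inside equally spaced confidence bins."""
    errs = {}
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    for g in np.unique(groups):
        p, y = probs[groups == g], labels[groups == g]
        ece, idx = 0.0, np.digitize(p, bins[1:-1])
        for b in range(n_bins):
            in_bin = idx == b
            if in_bin.any():
                ece += in_bin.mean() * abs(p[in_bin].mean() - y[in_bin].mean())
        errs[int(g)] = ece
    return errs

# Toy check: flipping 20% of the minority group's labels inflates its ECE.
rng = np.random.default_rng(0)
probs = rng.random(2000)
labels = (rng.random(2000) < probs).astype(int)       # labels drawn to be calibrated
groups = (rng.random(2000) < 0.2).astype(int)         # group 1 is the minority
noisy = labels.copy()
flip = (groups == 1) & (rng.random(2000) < 0.2)
noisy[flip] = 1 - noisy[flip]
print("clean labels:", group_calibration_error(probs, labels, groups))
print("noisy labels:", group_calibration_error(probs, noisy, groups))
```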

QuATON: Quantization Aware Training of Optical Neurons

  • paper_url: http://arxiv.org/abs/2310.03049
  • repo_url: None
  • paper_authors: Hasindu Kariyawasam, Ramith Hettiarachchi, Dushan Wadduwage
  • for: 这个研究旨在实现智能测量的光学神经网络(ONA),但实现设计时的制约可以是问题。
  • methods: 我们提出了一个基于物理限制的量化测试框架,考虑了物理限制在训练过程中的影响,从而实现了可靠的设计。
  • results: 我们在文献中提出的Diffractive Deep Neural Network(D2NN)的实现中,运用了我们的方法,实现了对于光学阶层实像和阶层物体的分类。我们在不同的量化水平和数据集上进行了广泛的实验,展示了我们的方法能够实现ONA设计的稳定性。
    Abstract Optical neural architectures (ONAs) use coding elements with optimized physical parameters to perform intelligent measurements. However, fabricating ONAs while maintaining design performances is challenging. Limitations in fabrication techniques often limit the realizable precision of the trained parameters. Physical constraints may also limit the range of values the physical parameters can hold. Thus, ONAs should be trained within the implementable constraints. However, such physics-based constraints reduce the training objective to a constrained optimization problem, making it harder to optimize with existing gradient-based methods. To alleviate these critical issues that degrade performance from simulation to realization we propose a physics-informed quantization-aware training framework. Our approach accounts for the physical constraints during the training process, leading to robust designs. We evaluate our approach on an ONA proposed in the literature, named a diffractive deep neural network (D2NN), for all-optical phase imaging and for classification of phase objects. With extensive experiments on different quantization levels and datasets, we show that our approach leads to ONA designs that are robust to quantization noise.
    摘要 光学神经网络架构(ONA)利用经过优化物理参数的编码元件来实现智能测量。然而,在保持设计性能的同时制造ONA十分困难:制造技术的限制通常会限制已训练参数可实现的精度,物理约束也可能限制物理参数的取值范围。因此,ONA应当在可实现的约束下进行训练。然而,这类基于物理的约束会将训练目标转化为受约束的优化问题,使现有的梯度方法难以优化。为了解决这些导致性能从仿真到实现过程中退化的关键问题,我们提出了一种物理信息引导的量化感知训练框架。我们的方法在训练过程中考虑物理约束,从而得到鲁棒的设计。我们在文献中提出的一种 diffractive deep neural network(D2NN)上评估了该方法,用于全光学相位成像和相位物体分类。通过在不同量化级别和数据集上的大量实验,我们证明了该方法能够得到对量化噪声鲁棒的ONA设计。
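
A generic sketch of quantization-aware training with physically bounded parameters is given below: each optical parameter is clamped to a realizable range and snapped to a fixed number of levels in the forward pass, while gradients flow through a straight-through estimator. This is a standard QAT pattern, not the paper's specific physics-informed scheme, and the phase range, level count, and toy fitting task are assumptions.

```python
import torch

class QuantizedPhase(torch.nn.Module):
    """Illustrative quantization-aware optical parameter: phases are clamped to
    a physically realizable range and rounded to a fixed number of fabrication
    levels in the forward pass; gradients use a straight-through estimator."""
    def __init__(self, n, levels=8, lo=0.0, hi=2 * torch.pi):
        super().__init__()
        self.theta = torch.nn.Parameter(torch.rand(n) * (hi - lo) + lo)
        self.levels, self.lo, self.hi = levels, lo, hi

    def forward(self):
        t = self.theta.clamp(self.lo, self.hi)                   # physical range
        step = (self.hi - self.lo) / (self.levels - 1)
        q = torch.round((t - self.lo) / step) * step + self.lo   # quantize
        return t + (q - t).detach()                              # straight-through

# Toy usage: fit the quantized phases to a target phase profile.
target = torch.linspace(0, 2 * torch.pi, 16)
layer = QuantizedPhase(16, levels=4)
opt = torch.optim.Adam(layer.parameters(), lr=0.05)
for _ in range(500):
    opt.zero_grad()
    ((layer() - target) ** 2).mean().backward()
    opt.step()
print("quantized phases:", layer().detach())
```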

Parameterized Convex Minorant for Objective Function Approximation in Amortized Optimization

  • paper_url: http://arxiv.org/abs/2310.02519
  • repo_url: https://github.com/jinraekim/pcmao
  • paper_authors: Jinrae Kim, Youdan Kim
  • for: 这 paper 的目的是提出一种基于Parameterized convex minorant(PCM)方法的可编程优化方法,用于approximating objective function。
  • methods: 该方法使用PCM和非负差函数的总和来approximate objective function,其中PCM convex在优化变量下bounded from below。这个approximator是continuous functions的universal approximator,global minimizer of PCM可以通过单个convex optimization获得。
  • results: 在numerical simulation中,该方法可以快速和可靠地learn objective functions和global minimizer,并且可以用于non-parameterized-convex objective function approximation和learning-based nonlinear model predictive control。
    Abstract Parameterized convex minorant (PCM) method is proposed for the approximation of the objective function in amortized optimization. In the proposed method, the objective function approximator is expressed by the sum of a PCM and a nonnegative gap function, where the objective function approximator is bounded from below by the PCM convex in the optimization variable. The proposed objective function approximator is a universal approximator for continuous functions, and the global minimizer of the PCM attains the global minimum of the objective function approximator. Therefore, the global minimizer of the objective function approximator can be obtained by a single convex optimization. As a realization of the proposed method, extended parameterized log-sum-exp network is proposed by utilizing a parameterized log-sum-exp network as the PCM. Numerical simulation is performed for non-parameterized-convex objective function approximation and for learning-based nonlinear model predictive control to demonstrate the performance and characteristics of the proposed method. The simulation results support that the proposed method can be used to learn objective functions and to find the global minimizer reliably and quickly by using convex optimization algorithms.
    摘要 参数化凸下界(Parameterized convex minorant, PCM)方法被提出用于摊销优化(amortized optimization)中目标函数的逼近。在所提出的方法中,目标函数逼近器表示为PCM与一个非负gap函数之和,其中目标函数逼近器被在优化变量上为凸的PCM从下方界定。所提出的目标函数逼近器是连续函数的通用逼近器,且PCM的全局最小点即为目标函数逼近器的全局最小点,因此目标函数逼近器的全局最小点可以通过单次凸优化获得。作为该方法的一个实现,本文提出了扩展的参数化log-sum-exp网络,其中参数化log-sum-exp网络充当PCM。针对非参数化凸目标函数逼近以及基于学习的非线性模型预测控制进行了数值仿真,以展示该方法的性能与特性。仿真结果表明,该方法可以利用凸优化算法可靠、快速地学习目标函数并找到全局最小点。
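
The log-sum-exp construction can be sketched directly: T * logsumexp((A x + b) / T) is convex in x for any fixed parameters, so it can serve as the parameterized convex minorant, and minimizing it over x is a single convex problem. The sketch below is generic and does not use the released pcmao code; the +/- paired rows, the particular gap term, and the optimizer settings are illustrative assumptions.

```python
import torch

def pcm_logsumexp(x, A, b, T=1.0):
    """Parameterized log-sum-exp function  T * logsumexp((A x + b) / T):
    convex in x for any fixed (A, b), so it can play the role of the PCM."""
    return T * torch.logsumexp((A @ x + b) / T, dim=0)

def gap(x, W):
    """A simple nonnegative gap term, so PCM + gap is bounded below by the PCM."""
    return (W @ x).pow(2).sum()

torch.manual_seed(0)
A0 = torch.randn(3, 3)
A = torch.cat([A0, -A0])          # +/- row pairs keep the minorant coercive
b = torch.randn(6)
W = torch.randn(4, 3)

def objective_approx(x):
    return pcm_logsumexp(x, A, b) + gap(x, W)

# Minimizing the PCM alone is a convex problem, so plain gradient descent on x
# reaches its global minimizer; the paper constructs the approximator so that
# this point also attains the approximator's global minimum (not shown here).
x = torch.zeros(3, requires_grad=True)
opt = torch.optim.SGD([x], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    pcm_logsumexp(x, A, b).backward()
    opt.step()
print("PCM minimizer:", x.detach(),
      "PCM value:", pcm_logsumexp(x, A, b).item(),
      "approximator value:", objective_approx(x).item())
```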

Stochastic Thermodynamics of Learning Generative Parametric Probabilistic Models

  • paper_url: http://arxiv.org/abs/2310.19802
  • repo_url: None
  • paper_authors: Shervin Sadat Parsi
  • for: 本研究用 parametric probabilistic models (PPMs) 来描述生成机器学习问题,并研究这些问题的热力学特性。
  • methods: 研究人员使用 Stochastic Gradient Descent (SGD) 优化器和训练集来控制 PPMs 的时间演化。
  • results: 研究发现,SGD 优化器在生成样本时释放热量,导致 PPMs 参数Subsystem 的热力学 entropy 增加,从而确定模型学习的概率分布。这种方法为权重过气化模型的泛化能力提供了热力学意义的视角。
    Abstract We have formulated generative machine learning problems as the time evolution of Parametric Probabilistic Models (PPMs), inherently rendering a thermodynamic process. Then, we have studied the thermodynamic exchange between the model's parameters, denoted as $\Theta$, and the model's generated samples, denoted as $X$. We demonstrate that the training dataset and the action of the Stochastic Gradient Descent (SGD) optimizer serve as a work source that governs the time evolution of these two subsystems. Our findings reveal that the model learns through the dissipation of heat during the generation of samples $X$, leading to an increase in the entropy of the model's parameters, $\Theta$. Thus, the parameter subsystem acts as a heat reservoir, effectively storing the learned information. Furthermore, the role of the model's parameters as a heat reservoir provides valuable thermodynamic insights into the generalization power of over-parameterized models. This approach offers an unambiguous framework for computing information-theoretic quantities within deterministic neural networks by establishing connections with thermodynamic variables. To illustrate the utility of this framework, we introduce two information-theoretic metrics: Memorized-information (M-info) and Learned-information (L-info), which trace the dynamic flow of information during the learning process of PPMs.
    摘要 我们将生成式机器学习问题表述为参数化概率模型(Parametric Probabilistic Models, PPMs)的时间演化,从而自然地构成一个热力学过程。随后,我们研究了模型参数 $\Theta$ 与模型生成样本 $X$ 之间的热力学交换。我们证明,训练数据集与随机梯度下降(SGD)优化器的作用充当了支配这两个子系统时间演化的功源。我们的发现表明,模型在生成样本 $X$ 的过程中通过热耗散进行学习,导致模型参数 $\Theta$ 的熵增加。因此,参数子系统充当热库,有效地存储所学信息。此外,模型参数作为热库的角色为理解过参数化模型的泛化能力提供了有价值的热力学视角。该方法通过与热力学变量建立联系,为在确定性神经网络中计算信息论量提供了一个明确的框架。为了展示该框架的实用性,我们引入了两个信息论度量:Memorized-information(M-info)和 Learned-information(L-info),用以追踪PPMs学习过程中信息的动态流动。

A Recipe for Improved Certifiable Robustness: Capacity and Data

  • paper_url: http://arxiv.org/abs/2310.02513
  • repo_url: https://github.com/hukkai/liresnet
  • paper_authors: Kai Hu, Klas Leino, Zifan Wang, Matt Fredrikson
  • for: 本研究旨在提高鲁棒训练的性能,特别是在 Lipschitz 约束下。
  • methods: 本文使用了一系列新技术和设计优化,包括 Cholesky-orthogonalized residual dense 层和 filtered generative data augmentation。
  • results: 本研究在多种数据集和扰动大小上显著提高了鲁棒训练的性能,并达到了最新的鲁棒精度标准(VRA)。 Specifically, the addition of large Cholesky-orthogonalized residual dense layers and filtered generative data augmentation improved the state-of-the-art verified robust accuracy by up to 8.5 percentage points.
    Abstract A key challenge, supported both theoretically and empirically, is that robustness demands greater network capacity and more data than standard training. However, effectively adding capacity under stringent Lipschitz constraints has proven more difficult than it may seem, evident by the fact that state-of-the-art approach tend more towards \emph{underfitting} than overfitting. Moreover, we posit that a lack of careful exploration of the design space for Lipshitz-based approaches has left potential performance gains on the table. In this work, we provide a more comprehensive evaluation to better uncover the potential of Lipschitz-based certification methods. Using a combination of novel techniques, design optimizations, and synthesis of prior work, we are able to significantly improve the state-of-the-art \emph{verified robust accuracy} (VRA) for deterministic certification on a variety of benchmark datasets, and over a range of perturbation sizes. Of particular note, we discover that the addition of large "Cholesky-orthogonalized residual dense" layers to the end of existing state-of-the-art Lipschitz-controlled ResNet architectures is especially effective for increasing network capacity and performance. Combined with filtered generative data augmentation, our final results further the state of the art deterministic VRA by up to 8.5 percentage points. Code is available at \url{https://github.com/hukkai/liresnet}.
    摘要 “一大挑战,理论上和实践上都支持,是耐性需要更大的网络容量和更多的数据,但是在严格的Lipschitz限制下添加容量并不那么容易。事实上,现今的state-of-the-art方法更倾向于\"下养\"而不是\"过养\"。此外,我们认为现有的Lipschitz基本方法的设计空间探索并未充分,导致可能的性能提升被忽略了。在这个研究中,我们提供了更完整的评估,以更好地探索Lipschitz基本方法的潜在性能。使用了一些新的技术、设计优化和对先前工作的 sintesis,我们能够在多种 benchmark 数据集上提高 state-of-the-art 的\"证实确的精度\"(VRA),并在不同的扰动大小下实现 significiant 的性能提升。尤其是,我们发现在现有的Lipschitz控制 ResNet 架构的尾端添加大量\"Cholesky-orthogonalized residual dense\"层可以增加网络容量和性能。在这些层的帮助下,我们使用了筛选的生成数据增强技术,最终的结果还是进一步推进了 state-of-the-art 的 VRA,高于8.5%。代码可以在 \url{https://github.com/hukkai/liresnet} 上找到。”

Ophiuchus: Scalable Modeling of Protein Structures through Hierarchical Coarse-graining SO(3)-Equivariant Autoencoders

  • paper_url: http://arxiv.org/abs/2310.02508
  • repo_url: None
  • paper_authors: Allan dos Santos Costa, Ilan Mitnikov, Mario Geiger, Manvitha Ponnapati, Tess Smidt, Joseph Jacobson
  • for: 这篇论文的目的是提出一种可扩展的蛋白质模型,以便更好地模拟和生成蛋白质结构。
  • methods: 该论文使用了一种名为Ophiuchus的SO(3)-等价变换模型,通过对蛋白质重原子进行本地卷积减分来模型序列-模式交互作用,并通过适应压缩率来实现高级别的结构嵌入。
  • results: 该论文通过对PDB蛋白质残余进行训练,实现了对不同压缩率的结构重建,并通过对折扣扩展的Latent space进行验证,证明了Ophiuchus的可扩展性和可靠性。
    Abstract Three-dimensional native states of natural proteins display recurring and hierarchical patterns. Yet, traditional graph-based modeling of protein structures is often limited to operate within a single fine-grained resolution, and lacks hourglass neural architectures to learn those high-level building blocks. We narrow this gap by introducing Ophiuchus, an SO(3)-equivariant coarse-graining model that efficiently operates on all heavy atoms of standard protein residues, while respecting their relevant symmetries. Our model departs from current approaches that employ graph modeling, instead focusing on local convolutional coarsening to model sequence-motif interactions in log-linear length complexity. We train Ophiuchus on contiguous fragments of PDB monomers, investigating its reconstruction capabilities across different compression rates. We examine the learned latent space and demonstrate its prompt usage in conformational interpolation, comparing interpolated trajectories to structure snapshots from the PDBFlex dataset. Finally, we leverage denoising diffusion probabilistic models (DDPM) to efficiently sample readily-decodable latent embeddings of diverse miniproteins. Our experiments demonstrate Ophiuchus to be a scalable basis for efficient protein modeling and generation.
    摘要 天然蛋白质的三维天然态呈现出反复出现且具有层次性的模式。然而,传统的基于图的蛋白质结构建模往往局限于在单一细粒度分辨率上操作,缺乏能够学习这些高层结构单元的沙漏式(hourglass)神经网络架构。我们通过提出 Ophiuchus 来缩小这一差距,an SO(3)-equivariant coarse-graining model that efficiently operates on all heavy atoms of standard protein residues, while respecting their relevant symmetries。我们的模型不同于当前采用图建模的方法,instead focusing on local convolutional coarsening to model sequence-motif interactions in log-linear length complexity。我们在 contiguous fragments of PDB monomers 上训练 Ophiuchus,investigating its reconstruction capabilities across different compression rates。我们 examine the learned latent space and demonstrate its prompt usage in conformational interpolation, comparing interpolated trajectories to structure snapshots from the PDBFlex dataset。最后,我们利用 denoising diffusion probabilistic models (DDPM) to efficiently sample readily-decodable latent embeddings of diverse miniproteins。我们的实验表明 Ophiuchus is a scalable basis for efficient protein modeling and generation。

Learning to Reach Goals via Diffusion

  • paper_url: http://arxiv.org/abs/2310.02505
  • repo_url: None
  • paper_authors: Vineet Jain, Siamak Ravanbakhsh
  • for: 这个论文是用于解决目标定位问题的。
  • methods: 这个论文使用了扩散模型,并在这个模型中学习了一个目标条件政策。
  • results: 这个论文的实验结果与现有方法竞争力相当,这表明这种 diffusion 的视角在sequential decision-making 中是一种简单、可扩展、有效的方向。
    Abstract Diffusion models are a powerful class of generative models capable of mapping random noise in high-dimensional spaces to a target manifold through iterative denoising. In this work, we present a novel perspective on goal-conditioned reinforcement learning by framing it within the context of diffusion modeling. Analogous to the diffusion process, where Gaussian noise is used to create random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states. We then learn a goal-conditioned policy analogous to the score function. This approach, which we call Merlin, can reach predefined or novel goals from an arbitrary initial state without learning a separate value function. We consider three choices for the noise model to replace Gaussian noise in diffusion - reverse play from the buffer, reverse dynamics model, and a novel non-parametric approach. We theoretically justify our approach and validate it on offline goal-reaching tasks. Empirical results are competitive with state-of-the-art methods, which suggests this perspective on diffusion for RL is a simple, scalable, and effective direction for sequential decision-making.
    摘要 Diffusion模型是一类强大的生成模型,能够通过迭代去噪将高维空间中的随机噪声映射到目标流形上。在这篇文章中,我们提出了一种新的视角,将目标条件强化学习置于diffusion建模的框架之中。与diffusion过程中用高斯噪声构造远离数据流形的随机轨迹类似,我们构建远离潜在目标状态的轨迹,然后学习一个类似于score函数的目标条件策略。我们称这种方法为 Merlin,它可以从任意初始状态到达预定义或新的目标,而无需学习单独的值函数。我们考虑了三种替代diffusion中高斯噪声的噪声模型:reverse play from the buffer、reverse dynamics model,以及一种新的非参数方法。我们从理论上论证了该方法,并在 offline goal-reaching 任务上对其进行了验证。实验结果可与当前最佳方法相竞争,这表明这种面向强化学习的diffusion视角是一种简单、可扩展且有效的序列决策方向。

Towards an Interpretable Representation of Speaker Identity via Perceptual Voice Qualities

  • paper_url: http://arxiv.org/abs/2310.02497
  • repo_url: None
  • paper_authors: Robin Netzorg, Bohan Yu, Andrea Guzman, Peter Wu, Luna McNulty, Gopala Anumanchipalli
  • for: 这个论文是为了提出一种可解释的 speaker identity 表示方法,基于语音质量特征 (PQ)。
  • methods: 这个方法使用了加入性别化 PQ 的 CAPE-V 协议,从而提供了一个抽象水平的语音特征空间,可以作为高级人类特征和低级语音、物理或学习表示之间的中间层。
  • results: 研究发现,这种 PQ-based 方法可以被多个非专业人群听到,并且表明这种信息在不同的语音表示中是预测可能的。
    Abstract Unlike other data modalities such as text and vision, speech does not lend itself to easy interpretation. While lay people can understand how to describe an image or sentence via perception, non-expert descriptions of speech often end at high-level demographic information, such as gender or age. In this paper, we propose a possible interpretable representation of speaker identity based on perceptual voice qualities (PQs). By adding gendered PQs to the pathology-focused Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) protocol, our PQ-based approach provides a perceptual latent space of the character of adult voices that is an intermediary of abstraction between high-level demographics and low-level acoustic, physical, or learned representations. Contrary to prior belief, we demonstrate that these PQs are hearable by ensembles of non-experts, and further demonstrate that the information encoded in a PQ-based representation is predictable by various speech representations.
    摘要 不同于文本和视觉等其他数据模态,语音不太容易被解释。非专业人士通常只能用性别或年龄等高级人口统计信息来描述语音。在这篇论文中,我们提议一种基于感知嗓音质量(PQ)的可解释说话人身份表示方法。通过将性别化的PQ加入以病理评估为核心的CAPE-V协议,我们基于PQ的方法提供了一个刻画成人嗓音特性的感知潜在空间,其抽象层级介于高级人口统计信息与低级声学、物理或学习表示之间。与以往的看法相反,我们证明了这些PQ可以被非专业人士群体听辨出来,并进一步证明了基于PQ的表示中编码的信息可以由多种语音表示预测。