cs.LG - 2023-10-03

DON-LSTM: Multi-Resolution Learning with DeepONets and Long Short-Term Memory Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02491
  • repo_url: https://github.com/katarzynamichalowska/don_lstm
  • paper_authors: Katarzyna Michałowska, Somdatta Goswami, George Em Karniadakis, Signe Riemer-Sørensen
  • for: Long-time-evolution modeling of multiple non-linear systems.
  • methods: Combines the DeepONet and LSTM architectures to leverage multi-resolution data and capture temporal dependencies in long sequences.
  • results: On long-time-evolution modeling of multiple non-linear systems, the multi-resolution DON-LSTM achieves significantly lower generalization error and requires fewer high-resolution samples than its vanilla counterparts.
    Abstract Deep operator networks (DeepONets, DONs) offer a distinct advantage over traditional neural networks in their ability to be trained on multi-resolution data. This property becomes especially relevant in real-world scenarios where high-resolution measurements are difficult to obtain, while low-resolution data is more readily available. Nevertheless, DeepONets alone often struggle to capture and maintain dependencies over long sequences compared to other state-of-the-art algorithms. We propose a novel architecture, named DON-LSTM, which extends the DeepONet with a long short-term memory network (LSTM). Combining these two architectures, we equip the network with explicit mechanisms to leverage multi-resolution data, as well as capture temporal dependencies in long sequences. We test our method on long-time-evolution modeling of multiple non-linear systems and show that the proposed multi-resolution DON-LSTM achieves significantly lower generalization error and requires fewer high-resolution samples compared to its vanilla counterparts.
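A minimal sketch of the architectural idea: a DeepONet-style branch/trunk pair whose per-time-step outputs are refined by an LSTM. Layer widths, the latent size p, and the way outputs are stacked over time are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class DONLSTMSketch(nn.Module):
    def __init__(self, n_sensors, p=64, hidden=128):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(n_sensors, 128), nn.Tanh(), nn.Linear(128, p))
        self.trunk = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, p))
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, u, t):
        # u: (batch, n_sensors) sampled input function; t: (steps, 1) query times
        b = self.branch(u)                         # (batch, p)
        tr = self.trunk(t)                         # (steps, p)
        don_out = b @ tr.T                         # (batch, steps) DeepONet prediction
        seq, _ = self.lstm(don_out.unsqueeze(-1))  # refine the sequence with an LSTM
        return self.head(seq).squeeze(-1)          # (batch, steps)

u = torch.randn(8, 100)                       # 8 sampled input functions on 100 sensors
t = torch.linspace(0, 1, 50).unsqueeze(-1)    # 50 query times
print(DONLSTMSketch(n_sensors=100)(u, t).shape)   # torch.Size([8, 50])
```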

Splitting the Difference on Adversarial Training

  • paper_url: http://arxiv.org/abs/2310.02480
  • repo_url: https://github.com/matanle51/splitting-the-difference-on-adversarial-training
  • paper_authors: Matan Levi, Aryeh Kontorovich
  • for: To improve the adversarial robustness of deep neural networks; unlike conventional robust training, the perturbed examples of each class are treated as a class of their own, which simplifies the decision boundaries.
  • methods: The perturbed examples of each class are learned as a separate class, splitting every class into a "clean" and an "adversarial" class; each original decision boundary thus becomes two simpler ones.
  • results: Experiments show the method preserves natural accuracy while improving robustness; on CIFAR-10 it obtains near-optimal natural accuracy of 95.01% alongside significant robustness across multiple tasks.
    Abstract The existence of adversarial examples points to a basic weakness of deep neural networks. One of the most effective defenses against such examples, adversarial training, entails training models with some degree of robustness, usually at the expense of a degraded natural accuracy. Most adversarial training methods aim to learn a model that finds, for each class, a common decision boundary encompassing both the clean and perturbed examples. In this work, we take a fundamentally different approach by treating the perturbed examples of each class as a separate class to be learned, effectively splitting each class into two classes: "clean" and "adversarial." This split doubles the number of classes to be learned, but at the same time considerably simplifies the decision boundaries. We provide a theoretical plausibility argument that sheds some light on the conditions under which our approach can be expected to be beneficial. Likewise, we empirically demonstrate that our method learns robust models while attaining optimal or near-optimal natural accuracy, e.g., on CIFAR-10 we obtain near-optimal natural accuracy of $95.01\%$ alongside significant robustness across multiple tasks. The ability to achieve such near-optimal natural accuracy, while maintaining a significant level of robustness, makes our method applicable to real-world applications where natural accuracy is at a premium. As a whole, our main contribution is a general method that confers a significant level of robustness upon classifiers with only minor or negligible degradation of their natural accuracy.
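A sketch of the class-splitting idea: adversarial examples of class c are relabeled as class c + num_classes, so a standard classifier is trained over 2 * num_classes labels. The perturbation used here is a placeholder; the paper's attack and training recipe are not reproduced.

```python
import torch

def split_labels(clean_x, clean_y, adv_x, num_classes):
    """Stack clean and adversarial batches; adversarial copies get shifted labels."""
    adv_y = clean_y + num_classes          # "adversarial" version of each class
    x = torch.cat([clean_x, adv_x], dim=0)
    y = torch.cat([clean_y, adv_y], dim=0)
    return x, y                            # train a 2*num_classes-way classifier on (x, y)

def fold_prediction(logits, num_classes):
    """At test time, fold the doubled logits back onto the original classes."""
    probs = logits.softmax(dim=-1)
    return (probs[:, :num_classes] + probs[:, num_classes:]).argmax(dim=-1)

x_clean = torch.randn(4, 3, 32, 32)
y_clean = torch.tensor([0, 1, 2, 3])
x_adv = x_clean + 0.03 * torch.sign(torch.randn_like(x_clean))   # placeholder perturbation
xb, yb = split_labels(x_clean, y_clean, x_adv, num_classes=10)
print(xb.shape, yb.tolist())   # labels 0..3 for clean copies, 10..13 for adversarial
```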

ML4EJ: Decoding the Role of Urban Features in Shaping Environmental Injustice Using Interpretable Machine Learning

  • paper_url: http://arxiv.org/abs/2310.02476
  • repo_url: None
  • paper_authors: Yu-Hsuan Ho, Zhewei Liu, Cheng-Chun Lee, Ali Mostafavi
  • for: This paper aims to examine the effects of various urban features and their non-linear interactions on exposure disparities of three primary hazards: air pollution, urban heat, and flooding.
  • methods: The study uses an interpretable machine learning model, combining Random Forest and XGBoost, with data from six metropolitan counties in the United States to train and test the models.
  • results: The analysis reveals that features related to social-demographic characteristics are the most prominent urban features that shape hazard extent, while features related to infrastructure distribution and land cover are relatively important for urban heat and air pollution exposure, respectively. The study also finds limited transferability among different regions and hazards, highlighting the intricate differences among hazards and regions and the way in which urban features shape hazard exposures.
    Abstract Understanding the key factors shaping environmental hazard exposures and their associated environmental injustice issues is vital for formulating equitable policy measures. Traditional perspectives on environmental injustice have primarily focused on the socioeconomic dimensions, often overlooking the influence of heterogeneous urban characteristics. This limited view may obstruct a comprehensive understanding of the complex nature of environmental justice and its relationship with urban design features. To address this gap, this study creates an interpretable machine learning model to examine the effects of various urban features and their non-linear interactions to the exposure disparities of three primary hazards: air pollution, urban heat, and flooding. The analysis trains and tests models with data from six metropolitan counties in the United States using Random Forest and XGBoost. The performance is used to measure the extent to which variations of urban features shape disparities in environmental hazard levels. In addition, the analysis of feature importance reveals features related to social-demographic characteristics as the most prominent urban features that shape hazard extent. Features related to infrastructure distribution and land cover are relatively important for urban heat and air pollution exposure respectively. Moreover, we evaluate the models' transferability across different regions and hazards. The results highlight limited transferability, underscoring the intricate differences among hazards and regions and the way in which urban features shape hazard exposures. The insights gleaned from this study offer fresh perspectives on the relationship among urban features and their interplay with environmental hazard exposure disparities, informing the development of more integrated urban design policies to enhance social equity and environmental injustice issues.
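An illustrative sketch of the interpretable-ML step: fit a tree ensemble on urban features and read off feature importances. The feature names and synthetic data below are placeholders, not the paper's census-tract datasets, and only Random Forest is shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = ["pct_minority", "median_income", "impervious_cover",
            "tree_canopy", "road_density", "building_age"]
X = rng.normal(size=(500, len(features)))
y = 0.6 * X[:, 0] - 0.4 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(features, model.feature_importances_), key=lambda kv: -kv[1])
for name, imp in ranked:
    print(f"{name:18s} {imp:.3f}")   # social-demographic proxies dominate in this toy setup
```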

Prompting-based Efficient Temporal Domain Generalization

  • paper_url: http://arxiv.org/abs/2310.02473
  • repo_url: None
  • paper_authors: Sepidehsadat Hosseini, Mengyao Zhai, Hossein Hajimirsadegh, Frederick Tung
  • for: To address temporal domain generalization, i.e., the drop in generalization when the data distribution shifts over time between training and testing.
  • methods: A prompting-based temporal domain generalization method that needs no access to target-domain data (unseen future time periods) during training and extends to diverse tasks (classification, regression, time series forecasting). Global prompts, domain-specific prompts, and drift-aware prompts are learned to capture the underlying temporal dynamics.
  • results: The method sets a new state-of-the-art benchmark in temporal domain generalization while remaining parameter- and time-efficient; the code repository will be publicly shared.
    Abstract Machine learning traditionally assumes that training and testing data are distributed independently and identically. However, in many real-world settings, the data distribution can shift over time, leading to poor generalization of trained models in future time periods. Our paper presents a novel prompting-based approach to temporal domain generalization that is parameter-efficient, time-efficient, and does not require access to the target domain data (i.e., unseen future time periods) during training. Our method adapts a target pre-trained model to temporal drift by learning global prompts, domain-specific prompts, and drift-aware prompts that capture underlying temporal dynamics. It is compatible across diverse tasks, such as classification, regression, and time series forecasting, and sets a new state-of-the-art benchmark in temporal domain generalization. The code repository will be publicly shared.
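A hedged sketch of the prompt composition: a global prompt shared across time, a per-domain prompt, and a drift-aware prompt predicted from the time index are prepended to the input tokens of a (frozen) encoder. The dimensions, the linear drift network, and the concatenation scheme are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

d, n_domains, prompt_len = 64, 5, 4
global_prompt = nn.Parameter(torch.randn(prompt_len, d))
domain_prompts = nn.Parameter(torch.randn(n_domains, prompt_len, d))
drift_net = nn.Linear(1, prompt_len * d)   # maps a scalar time index to a prompt

def build_prompted_input(x_tokens, domain_idx, t):
    # x_tokens: (batch, seq, d); domain_idx: (batch,) long; t: (batch, 1) normalized time
    b = x_tokens.shape[0]
    drift = drift_net(t).view(b, prompt_len, d)
    g = global_prompt.expand(b, -1, -1)
    dom = domain_prompts[domain_idx]        # (batch, prompt_len, d)
    return torch.cat([g, dom, drift, x_tokens], dim=1)  # fed to a frozen encoder

x = torch.randn(2, 10, d)
out = build_prompted_input(x, torch.tensor([0, 3]), torch.tensor([[0.1], [0.7]]))
print(out.shape)   # torch.Size([2, 22, 64])
```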

Differentiable Chemical Physics by Geometric Deep Learning for Gradient-based Property Optimization of Mixtures

  • paper_url: http://arxiv.org/abs/2310.03047
  • repo_url: None
  • paper_authors: Shang Zhu, Bharath Ramsundar, Emil Annevelink, Hongyi Lin, Adarsh Dave, Pin-Wen Guan, Kevin Gering, Venkatasubramanian Viswanathan
  • for: Modeling chemical mixtures that must satisfy multi-objective performance metrics and constraints, for use in chemical processes and electrochemical devices.
  • methods: Geometric deep learning (GDL) maps molecular species, compositions, and environment conditions to physical coefficients, extending mixture thermodynamic and transport laws with learnable coefficients.
  • results: Higher prediction accuracy and better model robustness than purely data-driven variants, and efficient gradient-based optimization of electrolyte transport properties.
    Abstract Chemical mixtures, satisfying multi-objective performance metrics and constraints, enable their use in chemical processes and electrochemical devices. In this work, we develop a differentiable chemical-physics framework for modeling chemical mixtures, DiffMix, where geometric deep learning (GDL) is leveraged to map from molecular species, compositions and environment conditions, to physical coefficients in the mixture physics laws. In particular, we extend mixture thermodynamic and transport laws by creating learnable physical coefficients, where we use graph neural networks as the molecule encoder and enforce component-wise permutation-invariance. We start our model evaluations with thermodynamics of binary mixtures, and further benchmarked multicomponent electrolyte mixtures on their transport properties, in order to test the model generalizability. We show improved prediction accuracy and model robustness of DiffMix than its purely data-driven variants. Furthermore, we demonstrate the efficient optimization of electrolyte transport properties, built on the gradient obtained using DiffMix auto-differentiation. Our simulation runs are then backed up by the data generated by a robotic experimentation setup, Clio. By combining mixture physics and GDL, DiffMix expands the predictive modeling methods for chemical mixtures and provides low-cost optimization approaches in large chemical spaces.

Distributionally Safe Reinforcement Learning under Model Uncertainty: A Single-Level Approach by Differentiable Convex Programming

  • paper_url: http://arxiv.org/abs/2310.02459
  • repo_url: None
  • paper_authors: Alaa Eddine Chriat, Chuangchuang Sun
  • for: To provide a tractable distributionally safe reinforcement learning framework that guarantees safety under distributional shift.
  • methods: Duality theory and differentiable convex programming reduce the bi-level distributionally robust safety problem to a single-level one, handled by a convex quadratic program for safety followed by projected gradient ascent for the worst-case uncertainty.
  • results: The framework provides effective safety guarantees and shows significant improvement over uncertainty-agnostic policies on first- and second-order systems of varying complexity.
    Abstract Safety assurance is uncompromisable for safety-critical environments with the presence of drastic model uncertainties (e.g., distributional shift), especially with humans in the loop. However, incorporating uncertainty in safe learning will naturally lead to a bi-level problem, where at the lower level the (worst-case) safety constraint is evaluated within the uncertainty ambiguity set. In this paper, we present a tractable distributionally safe reinforcement learning framework to enforce safety under a distributional shift measured by a Wasserstein metric. To improve the tractability, we first use duality theory to transform the lower-level optimization from infinite-dimensional probability space where distributional shift is measured, to a finite-dimensional parametric space. Moreover, by differentiable convex programming, the bi-level safe learning problem is further reduced to a single-level one with two sequential computationally efficient modules: a convex quadratic program to guarantee safety followed by a projected gradient ascent to simultaneously find the worst-case uncertainty. This end-to-end differentiable framework with safety constraints, to the best of our knowledge, is the first tractable single-level solution to address distributional safety. We test our approach on first and second-order systems with varying complexities and compare our results with the uncertainty-agnostic policies, where our approach demonstrates a significant improvement on safety guarantees.

Dual-stage Flows-based Generative Modeling for Traceable Urban Planning

  • paper_url: http://arxiv.org/abs/2310.02453
  • repo_url: None
  • paper_authors: Xuanming Hu, Wei Fan, Dongjie Wang, Pengyang Wang, Yong Li, Yanjie Fu
  • for: To develop an automated urban planning technique that relieves the complexity and burden of traditional, human-conducted urban planning.
  • methods: A novel generative framework based on normalizing flows, the Dual-stage Urban Flows (DSUF) framework. The first stage uses zone-level urban planning flows to generate urban functional zones from the surrounding context and human guidance; an Information Fusion Module then captures the relationships among functional zones and fuses information from different aspects; the second stage uses configuration-level urban planning flows to obtain land-use configurations from the fused information.
  • results: Experiments indicate that the framework outperforms other generative models on the urban planning task.
    Abstract Urban planning, which aims to design feasible land-use configurations for target areas, has become increasingly essential due to the high-speed urbanization process in the modern era. However, the traditional urban planning conducted by human designers can be a complex and onerous task. Thanks to the advancement of deep learning algorithms, researchers have started to develop automated planning techniques. While these models have exhibited promising results, they still grapple with a couple of unresolved limitations: 1) Ignoring the relationship between urban functional zones and configurations and failing to capture the relationship among different functional zones. 2) Less interpretable and stable generation process. To overcome these limitations, we propose a novel generative framework based on normalizing flows, namely Dual-stage Urban Flows (DSUF) framework. Specifically, the first stage is to utilize zone-level urban planning flows to generate urban functional zones based on given surrounding contexts and human guidance. Then we employ an Information Fusion Module to capture the relationship among functional zones and fuse the information of different aspects. The second stage is to use configuration-level urban planning flows to obtain land-use configurations derived from fused information. We design several experiments to indicate that our framework can outperform compared to other generative models for the urban planning task.

Feather: An Elegant Solution to Effective DNN Sparsification

  • paper_url: http://arxiv.org/abs/2310.02448
  • repo_url: https://github.com/athglentis/feather
  • paper_authors: Athanasios Glentis Georgoulakis, George Retsinas, Petros Maragos
  • for: To propose an efficient neural network sparsification module that produces compact models suitable for resource-limited environments while preserving high performance.
  • methods: Feather, a simple sparse training module built around the Straight-Through Estimator, combined with a new thresholding operator and a gradient scaling technique, yielding robust, out-of-the-box sparsification.
  • results: Evaluated with various architectures on the CIFAR datasets; on ImageNet it achieves state-of-the-art Top-1 validation accuracy with ResNet-50, surpassing existing methods, including more complex and computationally heavy ones, by a considerable margin.
    Abstract Neural Network pruning is an increasingly popular way for producing compact and efficient models, suitable for resource-limited environments, while preserving high performance. While the pruning can be performed using a multi-cycle training and fine-tuning process, the recent trend is to encompass the sparsification process during the standard course of training. To this end, we introduce Feather, an efficient sparse training module utilizing the powerful Straight-Through Estimator as its core, coupled with a new thresholding operator and a gradient scaling technique, enabling robust, out-of-the-box sparsification performance. Feather's effectiveness and adaptability is demonstrated using various architectures on the CIFAR dataset, while on ImageNet it achieves state-of-the-art Top-1 validation accuracy using the ResNet-50 architecture, surpassing existing methods, including more complex and computationally heavy ones, by a considerable margin. Code is publicly available at https://github.com/athglentis/feather .
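A minimal sketch of straight-through-estimator (STE) sparsification: the forward pass uses magnitude-thresholded weights while the backward pass treats the thresholding as identity, so the dense weights keep receiving gradients. Feather's specific thresholding operator and gradient scaling are not reproduced here.

```python
import torch

class STEPrune(torch.autograd.Function):
    @staticmethod
    def forward(ctx, weight, sparsity):
        k = int(sparsity * weight.numel())
        if k == 0:
            return weight
        threshold = weight.abs().flatten().kthvalue(k).values
        return torch.where(weight.abs() > threshold, weight, torch.zeros_like(weight))

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None   # straight-through: pass gradients to the dense weights

w = torch.randn(64, 64, requires_grad=True)
w_sparse = STEPrune.apply(w, 0.9)        # ~90% of entries zeroed in the forward pass
loss = (w_sparse ** 2).sum()
loss.backward()
print((w_sparse == 0).float().mean().item(), w.grad.shape)
```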

Machine learning assist nyc subway navigation safer and faster

  • paper_url: http://arxiv.org/abs/2310.02447
  • repo_url: None
  • paper_authors: Wencheng Bao, Shi Feng
  • for: Mainstream navigation software such as Google and Apple Maps typically lacks routes that prioritize safety, yet safety remains a paramount concern for many; the goal is a method that balances safety and efficiency.
  • methods: An Integer Programming model that accounts for both the shortest path and the safest route. Safety coefficients are derived with machine learning, using generalized linear models, linear regression, and recurrent neural networks; models are compared by Root Mean Square Error (RMSE) across subway stations to select the most accurate safety-coefficient estimator.
  • results: A comprehensive review of different shortest-path algorithms, assessed on time complexity and real-world data, to determine their suitability for merging safety and time efficiency.
    Abstract Mainstream navigation software, like Google and Apple Maps, often lacks the ability to provide routes prioritizing safety. However, safety remains a paramount concern for many. Our aim is to strike a balance between safety and efficiency. To achieve this, we're devising an Integer Programming model that takes into account both the shortest path and the safest route. We will harness machine learning to derive safety coefficients, employing methodologies such as generalized linear models, linear regression, and recurrent neural networks. Our evaluation will be based on the Root Mean Square Error (RMSE) across various subway stations, helping us identify the most accurate model for safety coefficient estimation. Furthermore, we'll conduct a comprehensive review of different shortest-path algorithms, assessing them based on time complexity and real-world data to determine their appropriateness in merging both safety and time efficiency.
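A toy sketch of routing on a combined cost: each edge carries a travel time and a learned safety coefficient, and the route minimizes time + lambda * risk. The graph, weights, and lambda below are made-up illustrations, not the paper's model or data.

```python
import networkx as nx

G = nx.Graph()
edges = [  # (station_a, station_b, minutes, safety_risk)
    ("A", "B", 4, 0.9), ("B", "C", 3, 0.2), ("A", "D", 6, 0.1),
    ("D", "C", 5, 0.1), ("C", "E", 2, 0.3),
]
lam = 5.0  # weight on the safety term
for u, v, minutes, risk in edges:
    G.add_edge(u, v, minutes=minutes, risk=risk, cost=minutes + lam * risk)

fastest = nx.shortest_path(G, "A", "E", weight="minutes")
balanced = nx.shortest_path(G, "A", "E", weight="cost")
print("fastest:", fastest)    # ['A', 'B', 'C', 'E']
print("balanced:", balanced)  # ['A', 'D', 'C', 'E'] -- slower but safer
```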

GenCO: Generating Diverse Solutions to Design Problems with Combinatorial Nature

  • paper_url: http://arxiv.org/abs/2310.02442
  • repo_url: None
  • paper_authors: Aaron Ferber, Arman Zharmagambetov, Taoan Huang, Bistra Dilkina, Yuandong Tian
  • for: This paper aims to address the challenge of generating diverse and high-quality solutions for combinatorial optimization problems, which are common in computer graphics, animation, industrial design, and other fields.
  • methods: The proposed method, called GenCO, integrates deep generative models with embedded combinatorial solvers to generate instances of combinatorial optimization problems. The method differs from conventional generative models in that it focuses on generating combinatorial solutions rather than final objects.
  • results: The authors demonstrate the effectiveness of GenCO on a variety of generative tasks characterized by combinatorial intricacies, including game level generation and map creation for path planning. The results show that GenCO can generate diverse, high-quality solutions that reliably adhere to user-specified combinatorial properties.
    Abstract Generating diverse objects (e.g., images) using generative models (such as GAN or VAE) has achieved impressive results in the recent years, to help solve many design problems that are traditionally done by humans. Going beyond image generation, we aim to find solutions to more general design problems, in which both the diversity of the design and conformity of constraints are important. Such a setting has applications in computer graphics, animation, industrial design, material science, etc, in which we may want the output of the generator to follow discrete/combinatorial constraints and penalize any deviation, which is non-trivial with existing generative models and optimization solvers. To address this, we propose GenCO, a novel framework that conducts end-to-end training of deep generative models integrated with embedded combinatorial solvers, aiming to uncover high-quality solutions aligned with nonlinear objectives. While structurally akin to conventional generative models, GenCO diverges in its role - it focuses on generating instances of combinatorial optimization problems rather than final objects (e.g., images). This shift allows finer control over the generated outputs, enabling assessments of their feasibility and introducing an additional combinatorial loss component. We demonstrate the effectiveness of our approach on a variety of generative tasks characterized by combinatorial intricacies, including game level generation and map creation for path planning, consistently demonstrating its capability to yield diverse, high-quality solutions that reliably adhere to user-specified combinatorial properties.

EGraFFBench: Evaluation of Equivariant Graph Neural Network Force Fields for Atomistic Simulations

  • paper_url: http://arxiv.org/abs/2310.02428
  • repo_url: None
  • paper_authors: Vaibhav Bihani, Utkarsh Pratiush, Sajid Mannan, Tao Du, Zhimin Chen, Santiago Miret, Matthieu Micoulaut, Morten M Smedskjaer, Sayan Ranu, N M Anoop Krishnan
  • for: This paper aims to evaluate the performance of equivariant graph neural network (EGraFF) force fields for real-world atomistic simulations, and to provide a systematic benchmarking of six EGraFF algorithms.
  • methods: The paper uses eight existing datasets and releases two new benchmark datasets to evaluate the performance of EGraFF models. The authors also propose four new metrics and three new challenging tasks to assess the models’ capabilities and limitations.
  • results: The study finds that no single EGraFF model outperforms others on all datasets and tasks, and that the performance of all models on out-of-distribution datasets is unreliable. The authors highlight the need for developing a foundation model for force fields that can be used in real-world simulations.
    Abstract Equivariant graph neural networks force fields (EGraFFs) have shown great promise in modelling complex interactions in atomic systems by exploiting the graphs' inherent symmetries. Recent works have led to a surge in the development of novel architectures that incorporate equivariance-based inductive biases alongside architectural innovations like graph transformers and message passing to model atomic interactions. However, thorough evaluations of these deploying EGraFFs for the downstream task of real-world atomistic simulations, is lacking. To this end, here we perform a systematic benchmarking of 6 EGraFF algorithms (NequIP, Allegro, BOTNet, MACE, Equiformer, TorchMDNet), with the aim of understanding their capabilities and limitations for realistic atomistic simulations. In addition to our thorough evaluation and analysis on eight existing datasets based on the benchmarking literature, we release two new benchmark datasets, propose four new metrics, and three new challenging tasks. The new datasets and tasks evaluate the performance of EGraFF to out-of-distribution data, in terms of different crystal structures, temperatures, and new molecules. Interestingly, evaluation of the EGraFF models based on dynamic simulations reveals that having a lower error on energy or force does not guarantee stable or reliable simulation or faithful replication of the atomic structures. Moreover, we find that no model clearly outperforms other models on all datasets and tasks. Importantly, we show that the performance of all the models on out-of-distribution datasets is unreliable, pointing to the need for the development of a foundation model for force fields that can be used in real-world simulations. In summary, this work establishes a rigorous framework for evaluating machine learning force fields in the context of atomic simulations and points to open research challenges within this domain.

Delta-AI: Local objectives for amortized inference in sparse graphical models

  • paper_url: http://arxiv.org/abs/2310.02423
  • repo_url: https://github.com/gfnorg/delta-ai
  • paper_authors: Jean-Pierre Falet, Hae Beom Lee, Nikolay Malkin, Chen Sun, Dragos Secrieru, Dinghuai Zhang, Guillaume Lajoie, Yoshua Bengio
  • for: To propose a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
  • methods: Viewing the sampling of variables in a PGM as a sequence of actions taken by an agent, the sparsity of the PGM enables local credit assignment in the policy-learning objective. This yields local losses in the style of generative flow networks (GFlowNets) that allow off-policy training without instantiating all random variables for each parameter update, speeding up training considerably.
  • results: The algorithm samples effectively from synthetic PGMs and trains latent variable models with sparse factor structure, and the learned sampler enables inference over partial subsets of variables.
    Abstract We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs), which we call $\Delta$-amortized inference ($\Delta$-AI). Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective. This yields a local constraint that can be turned into a local loss in the style of generative flow networks (GFlowNets) that enables off-policy training but avoids the need to instantiate all the random variables for each parameter update, thus speeding up training considerably. The $\Delta$-AI objective matches the conditional distribution of a variable given its Markov blanket in a tractable learned sampler, which has the structure of a Bayesian network, with the same conditional distribution under the target PGM. As such, the trained sampler recovers marginals and conditional distributions of interest and enables inference of partial subsets of variables. We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.

Automated Bug Generation in the era of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.02407
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Ali Reza Ibrahimzada, Yang Chen, Ryan Rong, Reyhaneh Jabbarvand
  • for: To generate diverse, complex bugs for software-engineering research on bug detection and repair.
  • methods: BugFarm uses LLMs to mutate code in multiple locations so that the resulting bugs are hard to repair; guided by the underlying model's attention, it changes only the least-attended locations so that the code representation barely shifts and the bugs remain hard to detect.
  • results: Evaluated on 320k+ bugs from over 2.5M mutants and compared with two alternative approaches, BugFarm generates bugs that are harder to detect by learning-based bug prediction and harder to repair by state-of-the-art learning-based program repair techniques.
    Abstract Bugs are essential in software engineering; many research studies in the past decades have been proposed to detect, localize, and repair bugs in software systems. Effectiveness evaluation of such techniques requires complex bugs, i.e., those that are hard to detect through testing and hard to repair through debugging. From the classic software engineering point of view, a hard-to-repair bug differs from the correct code in multiple locations, making it hard to localize and repair. Hard-to-detect bugs, on the other hand, manifest themselves under specific test inputs and reachability conditions. These two objectives, i.e., generating hard-to-detect and hard-to-repair bugs, are mostly aligned; a bug generation technique can change multiple statements to be covered only under a specific set of inputs. However, these two objectives are conflicting for learning-based techniques: A bug should have a similar code representation to the correct code in the training data to challenge a bug prediction model to distinguish them. The hard-to-repair bug definition remains the same but with a caveat: the more a bug differs from the original code (at multiple locations), the more distant their representations are and easier to be detected. We propose BugFarm, to transform arbitrary code into multiple complex bugs. BugFarm leverages LLMs to mutate code in multiple locations (hard-to-repair). To ensure that multiple modifications do not notably change the code representation, BugFarm analyzes the attention of the underlying model and instructs LLMs to only change the least attended locations (hard-to-detect). Our comprehensive evaluation of 320k+ bugs from over 2.5M mutants generated by BugFarm and two alternative approaches demonstrates our superiority in generating bugs that are hard to detect by learning-based bug prediction approaches and hard to repair by SOTA learning-based program repair technique.
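A hedged sketch of "mutate the least-attended locations": given per-token attention scores from a code model, select the k positions with the lowest attention as mutation sites. The attention scores below are random stand-ins, and the actual prompting of the LLM to perform the edits is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["if", "(", "count", ">", "0", ")", "{", "total", "+=", "count", ";", "}"]
attention = rng.random(len(tokens))          # placeholder for model attention over tokens

k = 3
least_attended = np.argsort(attention)[:k]   # indices of the k least-attended tokens
print([tokens[i] for i in sorted(least_attended)])   # candidate locations to mutate
```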

On the Parallel Complexity of Multilevel Monte Carlo in Stochastic Gradient Descent

  • paper_url: http://arxiv.org/abs/2310.02402
  • repo_url: None
  • paper_authors: Kei Ishikawa
  • for: Sequential simulations inside stochastic gradient descent (SGD), such as neural stochastic differential equations (NSDEs), trained with the Multilevel Monte Carlo (MLMC) method.
  • methods: A delayed MLMC gradient estimator that recycles previously computed gradient components from earlier SGD steps, drastically reducing the parallel complexity of MLMC, which otherwise matches that of the naive Monte Carlo method.
  • results: In numerical experiments on a deep hedging example, the method shows superior parallel complexity compared with standard MLMC in SGD, at the cost of a slightly worse per-iteration convergence rate.
    Abstract In the stochastic gradient descent (SGD) for sequential simulations such as the neural stochastic differential equations, the Multilevel Monte Carlo (MLMC) method is known to offer better theoretical computational complexity compared to the naive Monte Carlo approach. However, in practice, MLMC scales poorly on massively parallel computing platforms such as modern GPUs, because of its large parallel complexity which is equivalent to that of the naive Monte Carlo method. To cope with this issue, we propose the delayed MLMC gradient estimator that drastically reduces the parallel complexity of MLMC by recycling previously computed gradient components from earlier steps of SGD. The proposed estimator provably reduces the average parallel complexity per iteration at the cost of a slightly worse per-iteration convergence rate. In our numerical experiments, we use an example of deep hedging to demonstrate the superior parallel complexity of our method compared to the standard MLMC in SGD.
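For background, a sketch of a plain multilevel Monte Carlo estimator of E[f(X_T)] for an SDE via Euler-Maruyama with coupled coarse/fine paths, using the telescoping identity E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}]. The step counts, sample counts, and toy SDE (geometric Brownian motion) are illustrative; the paper's delayed gradient recycling is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_pair(n_steps, n_samples, x0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """Simulate fine (n_steps) and coarse (n_steps // 2) paths driven by the same noise."""
    dt = T / n_steps
    dW = rng.normal(scale=np.sqrt(dt), size=(n_samples, n_steps))
    fine = np.full(n_samples, x0)
    coarse = np.full(n_samples, x0)
    for i in range(n_steps):
        fine = fine * (1 + mu * dt + sigma * dW[:, i])
        if i % 2 == 1:  # coarse level uses the summed Brownian increments
            coarse = coarse * (1 + mu * 2 * dt + sigma * (dW[:, i - 1] + dW[:, i]))
    return fine, coarse

levels = [(2, 4000), (4, 2000), (8, 1000), (16, 500)]   # (steps, samples) per level
estimate = euler_pair(levels[0][0], levels[0][1])[0].mean()
for steps, n in levels[1:]:
    fine, coarse = euler_pair(steps, n)
    estimate += (fine - coarse).mean()                   # telescoping correction
print("MLMC estimate of E[X_T]:", estimate)              # ~ exp(0.05) for this toy GBM
```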

Reducing Intraspecies and Interspecies Covariate Shift in Traumatic Brain Injury EEG of Humans and Mice Using Transfer Euclidean Alignment

  • paper_url: http://arxiv.org/abs/2310.02398
  • repo_url: None
  • paper_authors: Manoj Vishwanath, Steven Cao, Nikil Dutt, Amir M. Rahmani, Miranda M. Lim, Hung Cao
  • for: To propose a transfer learning technique that addresses the scarcity of high-quality biomedical data and improves how machine learning and deep learning models generalize across datasets.
  • methods: Transfer Euclidean Alignment is evaluated with rule-based classical machine learning models and an EEGNet-based deep learning model on different datasets, including human and mouse EEG, for binary classification of individuals with versus without traumatic brain injury (TBI).
  • results: Transfer learning yields notable improvements, with an average increase of 14.42% for intraspecies datasets and 5.53% for interspecies datasets.
    Abstract While analytics of sleep electroencephalography (EEG) holds certain advantages over other methods in clinical applications, high variability across subjects poses a significant challenge when it comes to deploying machine learning models for classification tasks in the real world. In such instances, machine learning models that exhibit exceptional performance on a specific dataset may not necessarily demonstrate similar proficiency when applied to a distinct dataset for the same task. The scarcity of high-quality biomedical data further compounds this challenge, making it difficult to evaluate the model's generality comprehensively. In this paper, we introduce Transfer Euclidean Alignment - a transfer learning technique to tackle the problem of the dearth of human biomedical data for training deep learning models. We tested the robustness of this transfer learning technique on various rule-based classical machine learning models as well as the EEGNet-based deep learning model by evaluating on different datasets, including human and mouse data in a binary classification task of detecting individuals with versus without traumatic brain injury (TBI). By demonstrating notable improvements with an average increase of 14.42% for intraspecies datasets and 5.53% for interspecies datasets, our findings underscore the importance of the use of transfer learning to improve the performance of machine learning and deep learning models when using diverse datasets for training.
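For context, a sketch of the standard Euclidean Alignment step used in EEG transfer learning: whiten each subject's trials by the inverse square root of that subject's mean trial covariance, so covariance matrices across subjects (or species) are re-centered at identity. The shapes and random trials are placeholders, and this is not the paper's full pipeline.

```python
import numpy as np

def euclidean_align(trials):
    """trials: (n_trials, n_channels, n_samples) EEG from one subject."""
    cov = np.mean([x @ x.T / x.shape[1] for x in trials], axis=0)
    vals, vecs = np.linalg.eigh(cov)
    r_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T   # reference matrix R^{-1/2}
    return np.stack([r_inv_sqrt @ x for x in trials])

rng = np.random.default_rng(0)
subject = rng.normal(size=(20, 8, 256))      # 20 trials, 8 channels, 256 samples
aligned = euclidean_align(subject)
mean_cov = np.mean([x @ x.T / x.shape[1] for x in aligned], axis=0)
print(np.allclose(mean_cov, np.eye(8)))      # True: mean covariance is the identity
```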

Implicit regularization of multi-task learning and finetuning in overparameterized neural networks

  • paper_url: http://arxiv.org/abs/2310.02396
  • repo_url: None
  • paper_authors: Jack W. Lindsey, Samuel Lippl
  • for: The inductive biases of auxiliary-task learning, both simultaneous (multi-task learning, MTL) and sequential (pretraining followed by finetuning, PT+FT).
  • methods: Two-layer diagonal linear networks trained with gradient descent, with the qualitative behaviors also reproduced in ReLU networks.
  • results: MTL and PT+FT both carry implicit regularization penalties that incentivize feature sharing between tasks and sparsity in learned task-specific features. During finetuning, networks operate in a hybrid of the kernel (or "lazy") regime and the feature-learning ("rich") regime, and PT+FT can additionally exhibit a novel "nested feature learning" behavior that extracts a sparse subset of the features learned during pretraining. PT+FT (but not MTL) is biased toward features correlated with, yet distinct from, those needed for the auxiliary task, whereas MTL tends to reuse identical features for both tasks; as a result, MTL generalizes better when little data is available for the task of interest, while PT+FT outperforms it with more data. These conclusions also hold qualitatively for a deep architecture trained on image classification.
    Abstract It is common in deep learning to train networks on auxiliary tasks with the expectation that the learning will transfer, at least partially, to another task of interest. In this work, we investigate the inductive biases that result from learning auxiliary tasks, either simultaneously (multi-task learning, MTL) or sequentially (pretraining and subsequent finetuning, PT+FT). In the simplified setting of two-layer diagonal linear networks trained with gradient descent, we identify implicit regularization penalties associated with MTL and PT+FT, both of which incentivize feature sharing between tasks and sparsity in learned task-specific features. Notably, our results imply that during finetuning, networks operate in a hybrid of the kernel (or "lazy") regime and the feature learning ("rich") regime identified in prior work. Moreover, PT+FT can exhibit a novel "nested feature learning" behavior not captured by either regime, which biases it to extract a sparse subset of the features learned during pretraining. In ReLU networks, we reproduce all of these qualitative behaviors. We also observe that PT+FT (but not MTL) is biased to learn features that are correlated with (but distinct from) those needed for the auxiliary task, while MTL is biased toward using identical features for both tasks. As a result, we find that in realistic settings, MTL generalizes better when comparatively little data is available for the task of interest, while PT+FT outperforms it with more data available. We show that our findings hold qualitatively for a deep architecture trained on image classification tasks. Our characterization of the nested feature learning regime also motivates a modification to PT+FT that we find empirically improves performance. Overall, our results shed light on the impact of auxiliary task learning and suggest ways to leverage it more effectively.

Secure and Effective Data Appraisal for Machine Learning

  • paper_url: http://arxiv.org/abs/2310.02373
  • repo_url: None
  • paper_authors: Xu Ouyang, Changhong Yang, Felix Xiaozhu Lin, Yangfeng Ji
  • for: To enable privacy-preserving selection and evaluation of training data before a transaction between the data owner and the model owner is finalized.
  • methods: Multi-Party Computation (MPC) is used to evaluate the target model, together with a new approach that makes confidential data selection practical during that evaluation.
  • results: Compared with direct MPC-based evaluation of the target model, the new method reduces evaluation time from thousands of hours to mere tens of hours, with only a nominal 0.20% dip in accuracy when training with the selected data.
    Abstract Essential for an unfettered data market is the ability to discreetly select and evaluate training data before finalizing a transaction between the data owner and model owner. To safeguard the privacy of both data and model, this process involves scrutinizing the target model through Multi-Party Computation (MPC). While prior research has posited that the MPC-based evaluation of Transformer models is excessively resource-intensive, this paper introduces an innovative approach that renders data selection practical. The contributions of this study encompass three pivotal elements: (1) a groundbreaking pipeline for confidential data selection using MPC, (2) replicating intricate high-dimensional operations with simplified low-dimensional MLPs trained on a limited subset of pertinent data, and (3) implementing MPC in a concurrent, multi-phase manner. The proposed method is assessed across an array of Transformer models and NLP/CV benchmarks. In comparison to the direct MPC-based evaluation of the target model, our approach substantially reduces the time required, from thousands of hours to mere tens of hours, with only a nominal 0.20% dip in accuracy when training with the selected data.

Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation

  • paper_url: http://arxiv.org/abs/2310.02368
  • repo_url: None
  • paper_authors: Benjamin Steenhoek, Michele Tufano, Neel Sundaresan, Alexey Svyatkovskiy
  • for: To improve the quality of test cases generated by Large Language Models (LLMs) using Reinforcement Learning (RL) and static quality metrics.
  • methods: A novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM): the authors analyze anti-patterns generated by LLMs, train a specific reward model for each static quality metric, use Proximal Policy Optimization (PPO) to optimize one quality metric at a time, and then amalgamate the rewards into a unified reward model capturing different best practices.
  • results: RL-optimized models consistently generate higher-quality test cases than the base LLM, improving the model by up to 21% and producing nearly 100% syntactically correct code; RLSQM also outperformed GPT-4 on four out of seven metrics.
    Abstract Software testing is a crucial aspect of software development, and the creation of high-quality tests that adhere to best practices is essential for effective maintenance. Recently, Large Language Models (LLMs) have gained popularity for code generation, including the automated creation of test cases. However, these LLMs are often trained on vast amounts of publicly available code, which may include test cases that do not adhere to best practices and may even contain test smells (anti-patterns). To address this issue, we propose a novel technique called Reinforcement Learning from Static Quality Metrics (RLSQM). To begin, we analyze the anti-patterns generated by the LLM and show that LLMs can generate undesirable test smells. Thus, we train specific reward models for each static quality metric, then utilize Proximal Policy Optimization (PPO) to train models for optimizing a single quality metric at a time. Furthermore, we amalgamate these rewards into a unified reward model aimed at capturing different best practices and quality aspects of tests. By comparing RL-trained models with those trained using supervised learning, we provide insights into how reliably utilize RL to improve test generation quality and into the effects of various training strategies. Our experimental results demonstrate that the RL-optimized model consistently generated high-quality test cases compared to the base LLM, improving the model by up to 21%, and successfully generates nearly 100% syntactically correct code. RLSQM also outperformed GPT-4 on four out of seven metrics. This represents a significant step towards enhancing the overall efficiency and reliability of software testing through Reinforcement Learning and static quality metrics. Our data are available at this link: https://figshare.com/s/ded476c8d4c221222849.

Stochastic force inference via density estimation

  • paper_url: http://arxiv.org/abs/2310.02366
  • repo_url: None
  • paper_authors: Victor Chardès, Suryanarayana Maddu, Michael J. Shelley
  • for: Inferring dynamical models from low-resolution temporal data, a significant challenge in biophysics and particularly in transcriptomics, where separating molecular programs from noise remains an important open problem.
  • methods: Given cross-sectional samples at a few time points assumed to come from a latent diffusion process, the probability flow of that process is used to infer an autonomous, nonlinear force field interpolating between the distributions; with a prior on the noise model, score matching separates the force field from the intrinsic noise.
  • results: On relevant biophysical examples, the method extracts non-conservative forces from non-stationary data, learns equilibrium dynamics when applied to steady-state data, and handles both additive and multiplicative noise models.
    Abstract Inferring dynamical models from low-resolution temporal data continues to be a significant challenge in biophysics, especially within transcriptomics, where separating molecular programs from noise remains an important open problem. We explore a common scenario in which we have access to an adequate amount of cross-sectional samples at a few time-points, and assume that our samples are generated from a latent diffusion process. We propose an approach that relies on the probability flow associated with an underlying diffusion process to infer an autonomous, nonlinear force field interpolating between the distributions. Given a prior on the noise model, we employ score-matching to differentiate the force field from the intrinsic noise. Using relevant biophysical examples, we demonstrate that our approach can extract non-conservative forces from non-stationary data, that it learns equilibrium dynamics when applied to steady-state data, and that it can do so with both additive and multiplicative noise models.

Investigating Speed Deviation Patterns During Glucose Episodes: A Quantile Regression Approach

  • paper_url: http://arxiv.org/abs/2310.02351
  • repo_url: None
  • paper_authors: Aparna Joshi, Jennifer Merickel, Cyrus V. Desouza, Matthew Rizzo, Pujitha Gunaratne, Anuj Sharma
  • for: To examine behavioral differences in driving among people with diabetes and understand how diabetes affects driving ability.
  • methods: Distribution-based analytic methods that capture drivers' speed-deviation patterns, advancing prior literature focused on the conventional average-speed approach.
  • results: Drivers with diabetes show greater speed deviation and a higher risk of impaired driving performance during episodes of poor glucose control.
    Abstract Given the growing prevalence of diabetes, there has been significant interest in determining how diabetes affects instrumental daily functions, like driving. Complication of glucose control in diabetes includes hypoglycemic and hyperglycemic episodes, which may impair cognitive and psychomotor functions needed for safe driving. The goal of this paper was to determine patterns of diabetes speed behavior during acute glucose to drivers with diabetes who were euglycemic or control drivers without diabetes in a naturalistic driving environment. By employing distribution-based analytic methods which capture distribution patterns, our study advances prior literature that has focused on conventional approach of average speed to explore speed deviation patterns.
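A hedged sketch of the distribution-based idea via quantile regression: fit several conditional quantiles of speed deviation against a glucose-episode indicator, instead of only the conditional mean. The synthetic data and the single predictor are illustrative, not the study's naturalistic driving dataset.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
glucose_flag = rng.integers(0, 2, size=400)            # 1 = acute glucose episode
speed_dev = 2.0 + 1.5 * glucose_flag + rng.gumbel(scale=1.0 + glucose_flag, size=400)

X = glucose_flag.reshape(-1, 1).astype(float)
for q in (0.25, 0.5, 0.9):
    model = QuantileRegressor(quantile=q, alpha=0.0).fit(X, speed_dev)
    print(f"q={q:.2f}  episode effect on speed deviation: {model.coef_[0]:+.2f}")
# Larger coefficients at upper quantiles indicate the episodes widen the upper tail,
# a pattern an average-speed analysis would understate.
```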

Learning unitaries with quantum statistical queries

  • paper_url: http://arxiv.org/abs/2310.02254
  • repo_url: None
  • paper_authors: Armando Angrisani
  • for: The problem of learning unitary operators from quantum statistical queries (QSQs) with respect to their Choi-Jamiolkowski state.
  • methods: Quantum statistical queries, which return only noisy estimates of expected measurement values rather than direct access to the unitary and its inverse; a new technique estimates the Fourier mass of a unitary on a subset of Pauli strings with a single quantum statistical query, generalizing a previous result for uniform quantum examples.
  • results: Several algorithms for learning unitaries, including a QSQ implementation of the quantum Goldreich-Levin algorithm and sample-efficient learning of constant-depth circuits; $\mathcal{O}(\log n)$-juntas and quantum Boolean functions with constant total influence are shown to be efficiently learnable, while exponential and double-exponential lower bounds are proved for tasks such as learning phase-oracle unitaries and testing the unitarity of channels.
    Abstract We propose several algorithms for learning unitary operators from quantum statistical queries (QSQs) with respect to their Choi-Jamiolkowski state. Quantum statistical queries capture the capabilities of a learner with limited quantum resources, which receives as input only noisy estimates of expected values of measurements. Our methods hinge on a novel technique for estimating the Fourier mass of a unitary on a subset of Pauli strings with a single quantum statistical query, generalizing a previous result for uniform quantum examples. Exploiting this insight, we show that the quantum Goldreich-Levin algorithm can be implemented with quantum statistical queries, whereas the prior version of the algorithm involves oracle access to the unitary and its inverse. Moreover, we prove that $\mathcal{O}(\log n)$-juntas and quantum Boolean functions with constant total influence are efficiently learnable in our model, and constant-depth circuits are learnable sample-efficiently with quantum statistical queries. On the other hand, all previous algorithms for these tasks require direct access to the Choi-Jamiolkowski state or oracle access to the unitary. In addition, our upper bounds imply that the actions of those classes of unitaries on locally scrambled ensembles can be efficiently learned. We also demonstrate that, despite these positive results, quantum statistical queries lead to an exponentially larger sample complexity for certain tasks, compared to separable measurements to the Choi-Jamiolkowski state. In particular, we show an exponential lower bound for learning a class of phase-oracle unitaries and a double exponential lower bound for testing the unitarity of channels, adapting to our setting previous arguments for quantum states. Finally, we propose a new definition of average-case surrogate models, showing a potential application of our results to hybrid quantum machine learning.

Why do autoencoders work?

  • paper_url: http://arxiv.org/abs/2310.02250
  • repo_url: None
  • paper_authors: Matthew D. Kvalheim, Eduardo D. Sontag
  • for: To explain why deep neural network autoencoders, routinely used computationally for model reduction, can recognize the intrinsic dimension of data and project it onto a lower-dimensional space.
  • methods: A deep autoencoder consisting of an encoding layer that maps the input space $\mathbb{R}^n$ into a lower-dimensional latent space $\mathbb{R}^k$ (the bottleneck) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, trained by adjusting weights to minimize the discrepancy between the input and the reconstructed output.
  • results: Although topological obstructions rule out perfect reconstruction in general, the paper shows that, up to small errors, the method is guaranteed to work, with the explanation resting on facts from differential geometry; a computational example illustrates the ideas.
    Abstract Deep neural network autoencoders are routinely used computationally for model reduction. They allow recognizing the intrinsic dimension of data that lie in a $k$-dimensional subset $K$ of an input Euclidean space $\mathbb{R}^n$. The underlying idea is to obtain both an encoding layer that maps $\mathbb{R}^n$ into $\mathbb{R}^k$ (called the bottleneck layer or the space of latent variables) and a decoding layer that maps $\mathbb{R}^k$ back into $\mathbb{R}^n$, in such a way that the input data from the set $K$ is recovered when composing the two maps. This is achieved by adjusting parameters (weights) in the network to minimize the discrepancy between the input and the reconstructed output. Since neural networks (with continuous activation functions) compute continuous maps, the existence of a network that achieves perfect reconstruction would imply that $K$ is homeomorphic to a $k$-dimensional subset of $\mathbb{R}^k$, so clearly there are topological obstructions to finding such a network. On the other hand, in practice the technique is found to "work" well, which leads one to ask if there is a way to explain this effectiveness. We show that, up to small errors, indeed the method is guaranteed to work. This is done by appealing to certain facts from differential geometry. A computational example is also included to illustrate the ideas.
    摘要 深度神经网络自动编码器被广泛用于模型约简的计算。它们可以识别位于输入欧几里得空间 $\mathbb{R}^n$ 的 $k$ 维子集 $K$ 中的数据的内在维度。其基本思想是同时获得一个把 $\mathbb{R}^n$ 映射到 $\mathbb{R}^k$ 的编码层(称为瓶颈层或潜变量空间)和一个把 $\mathbb{R}^k$ 映射回 $\mathbb{R}^n$ 的解码层,使得二者复合后能恢复来自集合 $K$ 的输入数据。这通过调整网络参数(权重)、最小化输入与重建输出之间的差异来实现。由于(使用连续激活函数的)神经网络计算的是连续映射,若存在能实现完美重建的网络,就意味着 $K$ 同胚于 $\mathbb{R}^k$ 的一个 $k$ 维子集,因此寻找这样的网络显然存在拓扑障碍。但在实践中这一技术却"行之有效",这促使人们追问是否能解释这种有效性。我们借助微分几何中的一些事实证明:在允许小误差的意义下,该方法确实可以保证有效。文中还给出了一个计算示例来说明这些想法。
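
For readers who want the construction in code, here is a minimal autoencoder sketch matching the encoder/decoder composition described above; the ambient dimension n=3, latent dimension k=1, the helix-shaped set K, and the network sizes are all illustrative assumptions, not part of the paper.

```python
# Minimal autoencoder sketch in PyTorch: data lying near a 1-dimensional curve K
# embedded in R^3 is mapped to a latent space R^1 and reconstructed back to R^3.
import torch
import torch.nn as nn

n, k = 3, 1  # ambient and latent dimensions (assumed for illustration)

encoder = nn.Sequential(nn.Linear(n, 32), nn.Tanh(), nn.Linear(32, k))
decoder = nn.Sequential(nn.Linear(k, 32), nn.Tanh(), nn.Linear(32, n))
params = list(encoder.parameters()) + list(decoder.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

# Synthetic data on a helix-like curve in R^3 (a 1-d subset K of R^3).
t = torch.rand(512, 1) * 4.0
X = torch.cat([torch.cos(t), torch.sin(t), 0.3 * t], dim=1)

for step in range(2000):
    opt.zero_grad()
    X_hat = decoder(encoder(X))          # compose the two maps
    loss = ((X_hat - X) ** 2).mean()     # reconstruction discrepancy
    loss.backward()
    opt.step()

print("final reconstruction MSE:", float(loss))
```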

Learning quantum Hamiltonians at any temperature in polynomial time

  • paper_url: http://arxiv.org/abs/2310.02243
  • repo_url: None
  • paper_authors: Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang
  • for: 在已知逆温度 $\beta > 0$ 的情况下,利用吉布斯态 $\rho = e^{-\beta H}/\text{tr}(e^{-\beta H})$ 的多个副本来学习局部量子哈密顿量 $H$。
  • methods: 我们提出了一种新的对指数函数的平坦多项式逼近,并建立了多元标量多项式与嵌套对易子之间的转换,从而把哈密顿量学习表述为一个多项式方程组;随后证明,求解该方程组的低阶 sum-of-squares 松弛即可准确学习哈密顿量。
  • results: 我们完全解决了这一问题:给出一个多项式时间算法,对任意常数 $\beta > 0$,仅用多项式数量的吉布斯态副本即可把 $H$ 学习到精度 $\epsilon$,即实现了计算上高效的哈密顿量学习。
    Abstract We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $\rho = e^{-\beta H}/\textrm{tr}(e^{-\beta H})$ at a known inverse temperature $\beta>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $\epsilon$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)], with prior work only resolving this in the limited cases of high temperature [Haah, Kothari, Tang'21 (arXiv:2108.04842)] or commuting terms [Anshu, Arunachalam, Kuwahara, Soleimanifar'21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $\epsilon$ from polynomially many copies of the Gibbs state at any constant $\beta > 0$. Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian.
    摘要 我们研究如下问题:在已知逆温度 $\beta>0$ 的情况下,利用吉布斯态 $\rho = e^{-\beta H}/\text{tr}(e^{-\beta H})$ 的副本学习局部量子哈密顿量 $H$。Anshu、Arunachalam、Kuwahara 和 Soleimanifar(arXiv:2004.07266)给出了一个算法,仅用多项式数量的吉布斯态副本即可把 $n$ 量子比特上的哈密顿量学习到精度 $\epsilon$,但其运行时间是指数级的。获得计算上高效的算法一直是一个重要的公开问题 [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)],此前的工作仅在高温 [Haah, Kothari, Tang'21 (arXiv:2108.04842)] 或对易项 [Anshu, Arunachalam, Kuwahara, Soleimanifar'21] 等受限情形下解决了该问题。我们完全解决了这一问题:给出一个多项式时间算法,对任意常数 $\beta > 0$,仅用多项式数量的吉布斯态副本即可把 $H$ 学习到精度 $\epsilon$。我们的主要技术贡献是一种新的对指数函数的平坦多项式逼近,以及多元标量多项式与嵌套对易子之间的转换,这使我们能够把哈密顿量学习表述为一个多项式方程组;随后我们证明,求解该方程组的低阶 sum-of-squares 松弛即可准确学习哈密顿量。
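
The sketch below only sets up the learning problem, not the paper's sum-of-squares algorithm: it builds a tiny 3-qubit local Hamiltonian in the Pauli basis, forms its Gibbs state at a known inverse temperature, and lists the local expectation values that copies of that state make accessible. The terms and coefficients are arbitrary.

```python
# Setup of the learning problem only: H = sum_a lambda_a E_a over local Pauli terms,
# its Gibbs state rho = exp(-beta H)/tr(exp(-beta H)), and the expectation values
# tr(rho E_a) that measurements on copies of rho give access to.
import numpy as np
from scipy.linalg import expm

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_all(ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Local terms on 3 qubits: ZZ couplings and X fields (coefficients are arbitrary).
terms = [kron_all([Z, Z, I]), kron_all([I, Z, Z]),
         kron_all([X, I, I]), kron_all([I, X, I]), kron_all([I, I, X])]
coeffs = np.array([0.7, -0.4, 0.3, 0.5, -0.2])

H = sum(c * E for c, E in zip(coeffs, terms))
beta = 1.0                                   # known inverse temperature
rho = expm(-beta * H)
rho /= np.trace(rho).real                    # Gibbs state

# Measurement data available to a learner holding copies of rho:
expectations = [np.trace(rho @ E).real for E in terms]
print("tr(rho E_a):", np.round(expectations, 4))
```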

Generalized Schrödinger Bridge Matching

  • paper_url: http://arxiv.org/abs/2310.02233
  • repo_url: None
  • paper_authors: Guan-Horng Liu, Yaron Lipman, Maximilian Nickel, Brian Karrer, Evangelos A. Theodorou, Ricky T. Q. Chen
  • for: 本文旨在提出一种新的分布匹配算法,用于直接在 diffusion 或 flow 模型中训练分布。
  • methods: 本文使用 Generalized Schrödinger Bridge (GSB) 问题设置,并提出 Generalized Schrödinger Bridge Matching (GSBM) 算法,这种算法可以扩展到考虑任务特定的状态成本。
  • results: 作者在多个实验设置中证明了 GSBM 算法的可靠性和可扩展性,并且在许多情况下显示了改进的扩展性和稳定性。
    Abstract Modern distribution matching algorithms for training diffusion or flow models directly prescribe the time evolution of the marginal distributions between two boundary distributions. In this work, we consider a generalized distribution matching setup, where these marginals are only implicitly described as a solution to some task-specific objective function. The problem setup, known as the Generalized Schr\"odinger Bridge (GSB), appears prevalently in many scientific areas both within and without machine learning. We propose Generalized Schr\"odinger Bridge Matching (GSBM), a new matching algorithm inspired by recent advances, generalizing them beyond kinetic energy minimization and to account for task-specific state costs. We show that such a generalization can be cast as solving conditional stochastic optimal control, for which efficient variational approximations can be used, and further debiased with the aid of path integral theory. Compared to prior methods for solving GSB problems, our GSBM algorithm always preserves a feasible transport map between the boundary distributions throughout training, thereby enabling stable convergence and significantly improved scalability. We empirically validate our claims on an extensive suite of experimental setups, including crowd navigation, opinion depolarization, LiDAR manifolds, and image domain transfer. Our work brings new algorithmic opportunities for training diffusion models enhanced with task-specific optimality structures.
    摘要 现代用于训练扩散或流模型的分布匹配算法,会直接规定两个边界分布之间边缘分布的时间演化。本文考虑一种广义的分布匹配设定,其中这些边缘分布仅作为某个任务特定目标函数的解被隐式刻画。该问题设定被称为广义薛定谔桥(GSB)问题,在机器学习内外的许多科学领域中普遍出现。我们提出广义薛定谔桥匹配(GSBM)算法,它受近期进展启发,将其推广到动能最小化之外,并纳入任务特定的状态代价。我们证明这种推广可以转化为求解条件随机最优控制问题,从而可以使用高效的变分近似,并借助路径积分理论进一步去偏。与先前求解 GSB 问题的方法相比,GSBM 在训练全程始终保持两个边界分布之间的一个可行传输映射,从而实现稳定收敛并显著提升可扩展性。我们在人群导航、观点去极化、LiDAR 流形和图像域迁移等一系列实验设定上验证了这些论断。我们的工作为训练带有任务特定最优性结构的扩散模型带来了新的算法机遇。

HoloNets: Spectral Convolutions do extend to Directed Graphs

  • paper_url: http://arxiv.org/abs/2310.02232
  • repo_url: None
  • paper_authors: Christian Koke, Daniel Cremers
  • for: directed graph convolutional networks
  • methods: advanced tools from complex analysis and spectral theory
  • results: new state of the art results for heterophilic node classification on many datasets, stable to resolution-scale varying topological perturbations.
    Abstract Within the graph learning community, conventional wisdom dictates that spectral convolutional networks may only be deployed on undirected graphs: Only there could the existence of a well-defined graph Fourier transform be guaranteed, so that information may be translated between spatial- and spectral domains. Here we show this traditional reliance on the graph Fourier transform to be superfluous and -- making use of certain advanced tools from complex analysis and spectral theory -- extend spectral convolutions to directed graphs. We provide a frequency-response interpretation of newly developed filters, investigate the influence of the basis used to express filters and discuss the interplay with characteristic operators on which networks are based. In order to thoroughly test the developed theory, we conduct experiments in real world settings, showcasing that directed spectral convolutional networks provide new state of the art results for heterophilic node classification on many datasets and -- as opposed to baselines -- may be rendered stable to resolution-scale varying topological perturbations.
    摘要 在图学习社区中,传统观点认为谱图卷积网络只能用于无向图:只有在无向图上才能保证存在定义良好的图傅里叶变换,从而在空间域与谱域之间传递信息。本文指出对图傅里叶变换的这种传统依赖其实并非必要,并借助复分析与谱理论中的一些高级工具,把谱卷积推广到有向图。我们给出新滤波器的频率响应解释,研究表示滤波器所用基的影响,并讨论其与网络所基于的特征算子之间的相互作用。为了充分检验所提出的理论,我们在真实场景中进行了实验,结果表明有向谱卷积网络在许多数据集上为异配(heterophilic)节点分类带来了新的最先进结果,并且与基线不同,可以对随分辨率尺度变化的拓扑扰动保持稳定。
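
The toy sketch below is not the HoloNets construction (which relies on holomorphic functional calculus); it only illustrates the weaker point that a polynomial-in-the-adjacency filter is already well defined on a directed graph, since it never requires an eigendecomposition or a graph Fourier transform.

```python
# A deliberately simple sketch: y = sum_k theta_k * A_hat^k x is well defined on a
# *directed* graph because it only needs matrix powers, never a graph Fourier
# transform. This is NOT the HoloNets filter; it just fixes intuition.
import numpy as np

def directed_poly_filter(A: np.ndarray, x: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Apply sum_k theta[k] * A_hat^k to node features x (A is a directed adjacency)."""
    deg_out = A.sum(axis=1, keepdims=True)
    A_hat = A / np.maximum(deg_out, 1.0)      # row-normalised (random-walk style)
    y = np.zeros_like(x)
    power = np.eye(A.shape[0])                # A_hat^0
    for t in theta:
        y = y + t * (power @ x)
        power = power @ A_hat
    return y

# Tiny directed graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
x = np.array([[1.0], [0.0], [0.0]])           # a one-hot node signal
print(directed_poly_filter(A, x, theta=np.array([0.5, 0.3, 0.2])))
```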

Structurally guided task decomposition in spatial navigation tasks

  • paper_url: http://arxiv.org/abs/2310.02221
  • repo_url: None
  • paper_authors: Ruiqi He, Carlos G. Correa, Thomas L. Griffiths, Mark K. Ho
  • for: 研究人员想要了解人们如何快速准备计划,即使有限的认知资源。
  • methods: 研究人员通过扩展现有的人任务剖分模型来解释更复杂的计划问题,并在更复杂的 Navigation 领域中应用该模型。
  • results: 研究人员在在线实验中发现,使用该模型可以正确预测大多数参与者的导航策略。
    Abstract How are people able to plan so efficiently despite limited cognitive resources? We aimed to answer this question by extending an existing model of human task decomposition that can explain a wide range of simple planning problems by adding structure information to the task to facilitate planning in more complex tasks. The extended model was then applied to a more complex planning domain of spatial navigation. Our results suggest that our framework can correctly predict the navigation strategies of the majority of the participants in an online experiment.
    摘要 在认知资源有限的情况下,人们为何还能如此高效地进行规划?为了回答这一问题,我们扩展了一个已有的人类任务分解模型:通过在任务中加入结构信息,使其能够解释更复杂的规划问题。随后我们把扩展后的模型应用到更复杂的空间导航规划领域。结果表明,我们的框架能够正确预测一项在线实验中大多数参与者的导航策略。

An experimental system for detection and localization of hemorrhage using ultra-wideband microwaves with deep learning

  • paper_url: http://arxiv.org/abs/2310.02215
  • repo_url: None
  • paper_authors: Eisa Hedayati, Fatemeh Safari, George Verghese, Vito R. Ciancia, Daniel K. Sodickson, Seena Dehkharghani, Leeor Alon
  • for: stroke detection
  • methods: 使用低能量微波探测技术和深度学习算法
  • results: 达到了检测的可靠性和准确性,具有1.65毫米的准确性Error
    Abstract Stroke is a leading cause of mortality and disability. Emergent diagnosis and intervention are critical, and predicated upon initial brain imaging; however, existing clinical imaging modalities are generally costly, immobile, and demand highly specialized operation and interpretation. Low-energy microwaves have been explored as low-cost, small form factor, fast, and safe probes of tissue dielectric properties, with both imaging and diagnostic potential. Nevertheless, challenges inherent to microwave reconstruction have impeded progress, hence microwave imaging (MWI) remains an elusive scientific aim. Herein, we introduce a dedicated experimental framework comprising a robotic navigation system to translate blood-mimicking phantoms within an anatomically realistic human head model. An 8-element ultra-wideband (UWB) array of modified antipodal Vivaldi antennas was developed and driven by a two-port vector network analyzer spanning 0.6-9.0 GHz at an operating power of 1 mW. Complex scattering parameters were measured, and dielectric signatures of hemorrhage were learned using a dedicated deep neural network for prediction of hemorrhage classes and localization. An overall sensitivity and specificity for detection >0.99 was observed, with Rayleigh mean localization error of 1.65 mm. The study establishes the feasibility of a robust experimental model and deep learning solution for UWB microwave stroke detection.
    摘要 卒中是死亡和残疾的主要原因之一。急诊诊断与干预至关重要,并依赖于初始的脑部影像;然而,现有的临床影像手段通常成本高、不可移动,且需要高度专业的操作与解读。低能量微波作为一种低成本、小体积、快速且安全的组织介电特性探测手段已被广泛研究,兼具成像与诊断潜力。然而,微波重建固有的困难阻碍了进展,因此微波成像(MWI)仍是一个难以实现的科学目标。本文构建了一个专门的实验框架,利用机器人导航系统在解剖学上逼真的人头模型内移动模拟血液的体模。我们研制了由 8 个改进型对趾 Vivaldi 天线组成的超宽带(UWB)阵列,由双端口矢量网络分析仪驱动,频率范围 0.6-9.0 GHz,工作功率为 1 mW。我们测量了复散射参数,并用专门的深度神经网络学习出血的介电特征,以预测出血类别并进行定位。检测的总体灵敏度和特异度均大于 0.99,瑞利平均定位误差为 1.65 mm。该研究证明了用于 UWB 微波卒中检测的稳健实验模型与深度学习方案的可行性。

Chunking: Forgetting Matters in Continual Learning even without Changing Tasks

  • paper_url: http://arxiv.org/abs/2310.02206
  • repo_url: None
  • paper_authors: Thomas L. Lee, Amos Storkey
  • for: This paper focuses on the problem of continual learning (CL) with dynamically-changing data distribution, specifically addressing the chunking of data and its impact on CL performance.
  • methods: The paper analyzes the chunking sub-problem in CL and shows that current CL algorithms do not effectively address this issue, leading to performance drops. The authors propose per-chunk weight averaging as a solution to improve performance in the chunking setting and demonstrate its transfer to the full CL setting.
  • results: The paper reveals that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in the authors’ experiments. The proposed per-chunk weight averaging method improves performance in the chunking setting and transfers to the full CL setting, demonstrating its effectiveness in addressing the chunking sub-problem.
    Abstract Work on continual learning (CL) has largely focused on the problems arising from the dynamically-changing data distribution. However, CL can be decomposed into two sub-problems: (a) shifts in the data distribution, and (b) dealing with the fact that the data is split into chunks and so only a part of the data is available to be trained on at any point in time. In this work, we look at the latter sub-problem -- the chunking of data -- and note that previous analysis of chunking in the CL literature is sparse. We show that chunking is an important part of CL, accounting for around half of the performance drop from offline learning in our experiments. Furthermore, our results reveal that current CL algorithms do not address the chunking sub-problem, only performing as well as plain SGD training when there is no shift in the data distribution. We analyse why performance drops when learning occurs on chunks of data, and find that forgetting, which is often seen to be a problem due to distribution shift, still arises and is a significant problem. Motivated by an analysis of the linear case, we show that per-chunk weight averaging improves performance in the chunking setting and that this performance transfers to the full CL setting. Hence, we argue that work on chunking can help advance CL in general.
    摘要 持续学习(CL)的研究大多聚焦于数据分布动态变化所带来的问题。然而,CL 可以分解为两个子问题:(a)数据分布的漂移;(b)数据被切分成块(chunk),任一时刻只有一部分数据可用于训练。本文关注后一个子问题,即数据的分块,并指出以往 CL 文献中对分块的分析很少。我们的实验表明,分块是 CL 的重要组成部分,大约贡献了相对于离线学习的一半性能下降。此外,结果显示现有 CL 算法并未解决分块子问题:在没有分布漂移时,它们的表现与普通 SGD 训练相当。我们分析了在数据块上学习时性能下降的原因,发现通常被归因于分布漂移的遗忘现象在这里仍然出现,并且是一个显著的问题。受线性情形分析的启发,我们证明按块进行权重平均可以提升分块设定下的性能,并且这种提升可以迁移到完整的 CL 设定。因此,我们认为对分块的研究有助于推动 CL 的整体进展。
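
A minimal sketch of the per-chunk weight averaging idea, on a linear regression stream whose distribution is fixed but whose data arrives in chunks; chunk sizes, learning rate, and dimensions below are illustrative.

```python
# Per-chunk weight averaging on a chunked linear-regression stream (no task shift).
# After training on each chunk with plain SGD we also maintain a running average of
# the weights across chunks, which is the kind of averaging the paper studies.
import numpy as np

rng = np.random.default_rng(0)
d, n_chunks, chunk_size, lr = 20, 10, 64, 0.05
w_true = rng.normal(size=d)

w = np.zeros(d)          # weights trained sequentially with plain SGD
w_avg = np.zeros(d)      # running average of per-chunk weights

for c in range(1, n_chunks + 1):
    X = rng.normal(size=(chunk_size, d))
    y = X @ w_true + 0.1 * rng.normal(size=chunk_size)
    for i in range(chunk_size):             # SGD over the current chunk only
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad
    w_avg += (w - w_avg) / c                # per-chunk running average

X_test = rng.normal(size=(1000, d))
y_test = X_test @ w_true
print("MSE, last-chunk weights:", np.mean((X_test @ w - y_test) ** 2))
print("MSE, chunk-averaged    :", np.mean((X_test @ w_avg - y_test) ** 2))
```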

Probabilistically Rewired Message-Passing Neural Networks

  • paper_url: http://arxiv.org/abs/2310.02156
  • repo_url: https://github.com/chendiqian/PR-MPNN
  • paper_authors: Chendi Qian, Andrei Manolache, Kareem Ahmed, Zhe Zeng, Guy Van den Broeck, Mathias Niepert, Christopher Morris
  • for: 本研究旨在提出一种可学习图结构重连的消息传递图神经网络(MPNN)模型,以应对固定输入图中可能存在的噪声与缺失信息。
  • methods: 该模型利用近期的精确且可微分的 $k$-子集采样方法,学习添加对预测任务有用的边,同时略去益处较小的边。
  • results: 理论分析表明 PR-MPNN 可以增强表达能力,我们并给出了其优于纯随机方法的具体条件。实验显示,该方法能有效缓解过度压缩(over-squashing)与信息触达不足(under-reaching)问题;在多个知名的真实数据集上,其预测性能与传统 MPNN 模型及近期的图 Transformer 架构相比具有竞争力或更优。
    Abstract Message-passing graph neural networks (MPNNs) emerged as powerful tools for processing graph-structured input. However, they operate on a fixed input graph structure, ignoring potential noise and missing information. Furthermore, their local aggregation mechanism can lead to problems such as over-squashing and limited expressive power in capturing relevant graph structures. Existing solutions to these challenges have primarily relied on heuristic methods, often disregarding the underlying data distribution. Hence, devising principled approaches for learning to infer graph structures relevant to the given prediction task remains an open challenge. In this work, leveraging recent progress in exact and differentiable $k$-subset sampling, we devise probabilistically rewired MPNNs (PR-MPNNs), which learn to add relevant edges while omitting less beneficial ones. For the first time, our theoretical analysis explores how PR-MPNNs enhance expressive power, and we identify precise conditions under which they outperform purely randomized approaches. Empirically, we demonstrate that our approach effectively mitigates issues like over-squashing and under-reaching. In addition, on established real-world datasets, our method exhibits competitive or superior predictive performance compared to traditional MPNN models and recent graph transformer architectures.
    摘要 message-passing图 neural networks(MPNNs)已经出现为处理图结构输入的强大工具。然而,它们使用固定的输入图结构,忽略了可能的噪声和缺失信息。此外,它们的本地聚合机制可能会导致过抑压和限制表达力,使得不能够捕捉相关的图结构。现有的解决方案主要依靠了规则性的方法,经常忽略了下面数据分布。因此,把学习推理出相关的图结构与给定预测任务相关 remains an open challenge。在这种情况下,我们利用最近的精确和可微的 $k$-subset sampling技术,设计出 probabilistically rewired MPNNs(PR-MPNNs),它可以学习添加相关的边而忽略不重要的边。我们的理论分析表明,PR-MPNNs可以增强表达力,并且我们确定了其表达力超过了各种随机化方法的条件。实际上,我们的方法可以有效地解决过抑压和下降的问题,并且在已知的实际世界数据上显示出与传统 MPNN 模型和最新的图 transformer 架构相当或更高的预测性能。
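
As a rough illustration of drawing a k-subset of candidate edges from learned scores, the sketch below uses the Gumbel-top-k trick. Note that the hard top-k here is not differentiable, whereas PR-MPNNs rely on exact and differentiable k-subset sampling; the edge scores would normally come from an upstream network rather than random numbers.

```python
# Sketch of k-subset sampling of candidate edges with the Gumbel-top-k trick.
# PR-MPNNs use exact *differentiable* k-subset sampling; the hard top-k below is
# only meant to show what "probabilistically adding k edges" looks like.
import numpy as np

def sample_k_edges(scores: np.ndarray, k: int, rng) -> np.ndarray:
    """scores: (num_candidate_edges,) unnormalised logits. Returns indices of k edges."""
    gumbel = -np.log(-np.log(rng.uniform(size=scores.shape)))
    return np.argsort(scores + gumbel)[-k:]          # top-k of perturbed logits

rng = np.random.default_rng(0)
num_nodes = 5
candidates = [(i, j) for i in range(num_nodes) for j in range(num_nodes) if i != j]
scores = rng.normal(size=len(candidates))            # stand-in for an MLP's edge scores

idx = sample_k_edges(scores, k=3, rng=rng)
added_edges = [candidates[i] for i in idx]
print("rewired-in edges:", added_edges)
```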

Graph Neural Network-based EEG Classification: A Survey

  • paper_url: http://arxiv.org/abs/2310.02152
  • repo_url: None
  • paper_authors: Dominik Klepl, Min Wu, Fei He
  • for: 这篇论文旨在系统地审查和分类使用图 neural network (GNN) 来分类 EEG 数据的方法。
  • methods: 论文使用了各种方法来设计 GNN 基类器,包括 spectral graph convolutional layers 和 differential entropy 等。
  • results: 论文总结了这些方法之间的相似与差异,并归纳了常见的节点特征形式,其中原始 EEG 信号和微分熵最为常用。论文还提出了若干有前景的研究方向,如迁移学习方法以及对跨频交互的恰当建模。
    Abstract Graph neural networks (GNN) are increasingly used to classify EEG for tasks such as emotion recognition, motor imagery and neurological diseases and disorders. A wide range of methods have been proposed to design GNN-based classifiers. Therefore, there is a need for a systematic review and categorisation of these approaches. We exhaustively search the published literature on this topic and derive several categories for comparison. These categories highlight the similarities and differences among the methods. The results suggest a prevalence of spectral graph convolutional layers over spatial. Additionally, we identify standard forms of node features, with the most popular being the raw EEG signal and differential entropy. Our results summarise the emerging trends in GNN-based approaches for EEG classification. Finally, we discuss several promising research directions, such as exploring the potential of transfer learning methods and appropriate modelling of cross-frequency interactions.
    摘要 Graph neural networks (GNN) 是越来越多地用于类型化 EEG,用于情绪识别、motor imagery 和 neuroscience diseases 和疾病。许多方法已经提议用于设计 GNN-based 分类器。因此,有需要一篇系统性的评论和分类这些方法。我们对这些方法进行了广泛的文献搜索,并 derivated 出一些比较categories。这些类别 highlights 这些方法之间的相似性和差异。结果表明 spectral graph convolutional layers 的使用更为普遍,而 node features 的标准化也有 Raw EEG signal 和 differential entropy 等。我们的结果总结了 GNN-based EEG 分类的emerging trends,并提出了一些有前途的研究方向,如通过 transfer learning 方法和cross-frequency interactions 的合适模elling。

Symmetric Single Index Learning

  • paper_url: http://arxiv.org/abs/2310.02117
  • repo_url: None
  • paper_authors: Aaron Zweig, Joan Bruna
  • for: 本文研究对称神经网络中单指标(single-index)模型的学习问题。
  • methods: 论文采用梯度流方法来分析单指标模型的学习。
  • results: 论文证明,在对激活函数与链接函数的适当假设下,梯度流能够恢复隐藏的植入方向。
    Abstract Few neural architectures lend themselves to provable learning with gradient based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well-understood, whereby the so-called information exponent of the link function governs a polynomial sample complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to our setting that controls the efficiency of learning.
    摘要 很少有神经网络结构适合用基于梯度的方法进行可证明的学习。一种流行的模型是单指标模型:标签由一个未知的线性投影与一个(可能未知的)标量链接函数复合而成。用 SGD 学习这一模型的理论已较为完善,其中链接函数的信息指数(information exponent)决定了多项式级的样本复杂度。然而,把这类分析推广到更深或更复杂的结构仍然困难。本文研究对称神经网络设定下的单指标学习。在对激活函数的解析性假设和对链接函数的最高次数假设下,我们证明梯度流能够恢复隐藏的植入方向,该方向表示为幂和多项式特征空间中一个具有有限支撑的向量。我们还刻画了一个适应该设定的信息指数概念,用以控制学习的效率。
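
For background, the display below recalls the standard (Euclidean) single-index model and the usual Hermite-based definition of the information exponent from the prior literature; the paper's contribution is an adaptation of this notion to symmetric networks and power-sum features, which is not captured here.

\[
y = g\big(\langle \theta^{*}, x\rangle\big), \quad x \sim \mathcal{N}(0, I_d), \qquad
g(z) = \sum_{j \ge 0} \alpha_j\, \mathrm{He}_j(z), \qquad
k^{*} = \min\{\, j \ge 1 : \alpha_j \neq 0 \,\},
\]

where the $\mathrm{He}_j$ are Hermite polynomials and $k^{*}$ is the information exponent of the link $g$, governing the polynomial sample complexity of SGD in this standard setting.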

Hierarchical Concept Discovery Models: A Concept Pyramid Scheme

  • paper_url: http://arxiv.org/abs/2310.02116
  • repo_url: None
  • paper_authors: Konstantinos P. Panousis, Dino Ienco, Diego Marcos
  • for: 本研究旨在提高深度学习模型的 ante hoc 解释性,具体来说是基于概念瓶颈模型(CBM)。
  • methods: 该研究提议一种基于图文模型的新型层次概念发现方法,通过数据驱动和稀疏化 bayesian 理论来实现多级划分概念选择。
  • results: 实验结果表明,提议的构建不仅能够超越当前CBM方法,还提供了一个原则性的解释性框架。
    Abstract Deep Learning algorithms have recently gained significant attention due to their impressive performance. However, their high complexity and un-interpretable mode of operation hinders their confident deployment in real-world safety-critical tasks. This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs). Our goal is to design a framework that admits a highly interpretable decision making process with respect to human understandable concepts, on multiple levels of granularity. To this end, we propose a novel hierarchical concept discovery formulation leveraging: (i) recent advances in image-text models, and (ii) an innovative formulation for multi-level concept selection via data-driven and sparsity inducing Bayesian arguments. Within this framework, concept information does not solely rely on the similarity between the whole image and general unstructured concepts; instead, we introduce the notion of concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene. As we experimentally show, the proposed construction not only outperforms recent CBM approaches, but also yields a principled framework towards interpetability.
    摘要 深度学习算法因其出色的性能而受到广泛关注,但其高复杂度和难以解释的工作方式阻碍了它们在现实世界安全攸关任务中的可靠部署。本文关注事前(ante hoc)可解释性,特别是概念瓶颈模型(CBM)。我们的目标是设计一个框架,使决策过程能够在多个粒度层级上,基于人类可理解的概念进行高度可解释的推理。为此,我们提出一种新的层次概念发现方法,它利用:(i)图文模型的最新进展;(ii)一种基于数据驱动且诱导稀疏的贝叶斯论证的多层概念选择机制。在该框架中,概念信息不再仅依赖整幅图像与一般性非结构化概念之间的相似度;我们引入概念层次的概念,以挖掘并利用图像场景中各局部区域所蕴含的更细粒度的概念信息。实验表明,所提出的构造不仅优于近期的 CBM 方法,还提供了一个有原则的可解释性框架。
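
For orientation, here is a plain single-level concept bottleneck model; the paper's contribution is a hierarchy of such concept layers discovered with image-text models and sparsity-inducing Bayesian arguments, which this sketch does not attempt. All sizes are hypothetical.

```python
# A plain (single-level) concept bottleneck model: input features -> interpretable
# concept scores -> label, with a linear head over the concepts.
import torch
import torch.nn as nn

n_features, n_concepts, n_classes = 512, 20, 10   # assumed sizes

class ConceptBottleneck(nn.Module):
    def __init__(self):
        super().__init__()
        self.to_concepts = nn.Linear(n_features, n_concepts)  # x -> concept scores
        self.to_label = nn.Linear(n_concepts, n_classes)      # interpretable head

    def forward(self, x):
        concepts = torch.sigmoid(self.to_concepts(x))  # each unit = one named concept
        return self.to_label(concepts), concepts

model = ConceptBottleneck()
x = torch.randn(4, n_features)                      # e.g. frozen image embeddings
logits, concepts = model(x)
print(logits.shape, concepts.shape)                 # torch.Size([4, 10]) torch.Size([4, 20])
```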

FLEDGE: Ledger-based Federated Learning Resilient to Inference and Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2310.02113
  • repo_url: None
  • paper_authors: Jorge Castillo, Phillip Rieger, Hossein Fereidooni, Qian Chen, Ahmad Sadeghi
  • for: 抵御联邦学习中的推断攻击与投毒攻击,在多方协作训练时保护各参与方的数据隐私与模型安全。
  • methods: 提出基于账本的联邦学习框架 FLEDGE,利用加密货币机制惩罚恶意行为、奖励良性行为,使各参与方对自身行为负责,从而在合理的开销下同时缓解推断与投毒攻击。
  • results: 在四个公开数据集上的大量评估表明,FLEDGE 能在不牺牲模型效用的前提下为模型更新提供强隐私保证,成功抵御多种投毒攻击,并提供独特的奖励机制以促进训练与聚合过程中的良性行为。
    Abstract Federated learning (FL) is a distributed learning process that uses a trusted aggregation server to allow multiple parties (or clients) to collaboratively train a machine learning model without having them share their private data. Recent research, however, has demonstrated the effectiveness of inference and poisoning attacks on FL. Mitigating both attacks simultaneously is very challenging. State-of-the-art solutions have proposed the use of poisoning defenses with Secure Multi-Party Computation (SMPC) and/or Differential Privacy (DP). However, these techniques are not efficient and fail to address the malicious intent behind the attacks, i.e., adversaries (curious servers and/or compromised clients) seek to exploit a system for monetization purposes. To overcome these limitations, we present a ledger-based FL framework known as FLEDGE that allows making parties accountable for their behavior and achieve reasonable efficiency for mitigating inference and poisoning attacks. Our solution leverages crypto-currency to increase party accountability by penalizing malicious behavior and rewarding benign conduct. We conduct an extensive evaluation on four public datasets: Reddit, MNIST, Fashion-MNIST, and CIFAR-10. Our experimental results demonstrate that (1) FLEDGE provides strong privacy guarantees for model updates without sacrificing model utility; (2) FLEDGE can successfully mitigate different poisoning attacks without degrading the performance of the global model; and (3) FLEDGE offers unique reward mechanisms to promote benign behavior during model training and/or model aggregation.
    摘要 federated learning (FL) 是一种分布式学习过程,使用一个可信的聚合服务器,让多个方(或客户端)共同训练一个机器学习模型,无需共享私人数据。然而, latest research 表明,FL 受到推理和毒击攻击的威胁。 simultaneously mitigating both attacks 是非常困难的。 current solutions 提出使用毒素防御技术(SMPC)和/或差分隐私(DP),但这些技术不是高效的,而且不能 Addressing the malicious intent behind the attacks, i.e., adversaries (curious servers and/or compromised clients) seek to exploit the system for monetization purposes。To overcome these limitations, we present a ledger-based FL framework known as FLEDGE that allows making parties accountable for their behavior and achieve reasonable efficiency for mitigating inference and poisoning attacks. Our solution leverages crypto-currency to increase party accountability by penalizing malicious behavior and rewarding benign conduct. We conduct an extensive evaluation on four public datasets: Reddit, MNIST, Fashion-MNIST, and CIFAR-10. Our experimental results demonstrate that:1. FLEDGE provides strong privacy guarantees for model updates without sacrificing model utility;2. FLEDGE can successfully mitigate different poisoning attacks without degrading the performance of the global model;3. FLEDGE offers unique reward mechanisms to promote benign behavior during model training and/or model aggregation.

Stochastic Gradient Descent with Preconditioned Polyak Step-size

  • paper_url: http://arxiv.org/abs/2310.02093
  • repo_url: https://github.com/fxrshed/scaledsps
  • paper_authors: Farshed Abdukhakimov, Chulu Xiang, Dmitry Kamzolov, Martin Takáč
  • for: 提高 Stochastic Gradient Descent 的性能在 badly scaled 和/或 ill-conditioned 数据集上
  • methods: 使用 preconditioning 技术,如 Hutchinson’s method、Adam 和 AdaGrad,提高 SPS 的性能
  • results: 提高 SPS 的性能,使其在 badly scaled 和/或 ill-conditioned 数据集上更高效
    Abstract Stochastic Gradient Descent (SGD) is one of the many iterative optimization methods that are widely used in solving machine learning problems. These methods display valuable properties and attract researchers and industrial machine learning engineers with their simplicity. However, one of the weaknesses of this type of methods is the necessity to tune learning rate (step-size) for every loss function and dataset combination to solve an optimization problem and get an efficient performance in a given time budget. Stochastic Gradient Descent with Polyak Step-size (SPS) is a method that offers an update rule that alleviates the need of fine-tuning the learning rate of an optimizer. In this paper, we propose an extension of SPS that employs preconditioning techniques, such as Hutchinson's method, Adam, and AdaGrad, to improve its performance on badly scaled and/or ill-conditioned datasets.
    摘要 随机梯度下降(SGD)是求解机器学习问题时广泛使用的迭代优化方法之一。这类方法性质优良、实现简单,因而深受研究者和工业界机器学习工程师的欢迎。然而,其弱点之一在于:针对每一种损失函数与数据集的组合,都需要调节学习率(步长),才能在给定时间预算内高效地求解优化问题。带 Polyak 步长的随机梯度下降(SPS)提供了一种更新规则,减轻了对优化器学习率精细调参的需求。本文提出 SPS 的一种扩展,引入 Hutchinson 方法、Adam、AdaGrad 等预条件技术,以改进其在尺度失衡和/或病态数据集上的表现。
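
Below is a sketch of one natural way to combine a Polyak step size with a diagonal (AdaGrad-style) preconditioner, on a noiseless linear regression problem where each per-sample optimal loss is zero; the exact update used in the paper may differ in details such as the choice of preconditioner and the handling of the per-sample optimum.

```python
# SGD with a preconditioned Polyak step size on noiseless linear regression
# (realizable problem, so the per-sample optimum f_i^* = 0). The diagonal
# AdaGrad-style preconditioner is one illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 30
X = rng.normal(size=(n, d)) * np.logspace(0, 3, d)   # badly scaled features
w_true = rng.normal(size=d)
y = X @ w_true                                        # realizable => f_i^* = 0

w = np.zeros(d)
v = np.zeros(d)                         # accumulator for the diagonal preconditioner
eps = 1e-8

for t in range(5000):
    i = rng.integers(n)
    r = X[i] @ w - y[i]
    f_i = 0.5 * r ** 2                  # per-sample loss
    g = r * X[i]                        # per-sample gradient
    v += g ** 2
    B_inv = 1.0 / (np.sqrt(v) + eps)    # diagonal preconditioner (AdaGrad-style)
    gamma = f_i / (g @ (B_inv * g) + eps)   # preconditioned Polyak step size
    w -= gamma * (B_inv * g)

print("relative parameter error:", np.linalg.norm(w - w_true) / np.linalg.norm(w_true))
```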

1D-CapsNet-LSTM: A Deep Learning-Based Model for Multi-Step Stock Index Forecasting

  • paper_url: http://arxiv.org/abs/2310.02090
  • repo_url: None
  • paper_authors: Cheng Zhang, Nilam Nur Amir Sjarif, Roslina Ibrahim
  • for: 预测股票市场指数价格的多步预测任务是金融领域的关键任务,对各种金融活动的决策起着关键性的作用。然而,预测结果经常不满足要求,归因于数据的随机和抖抖性。研究人员已经尝试了各种方法,这一过程仍在继续。
  • methods: 这项研究借鉴了利用一维卷积做特征提取以提升性能的卷积-长短期记忆网络(CNN-LSTM),并引入胶囊网络(CapsNet)作为高级特征提取器,再结合 LSTM 层以捕捉时间相关性。为了保持不同时间步预测值之间的随机相关性,该模型采用了多输入多输出(MIMO)策略。
  • results: 该研究对实际的股票市场指数进行了评估,包括标普500指数(S&P 500)、道琼工业指数(DJIA)、纳斯达克股票指数(IXIC)和纽约股票交易所指数(NYSE)。与基准模型 such as LSTM、RNN和CNN-LSTM进行比较,结果表明1D-CapsNet-LSTM模型在各种评价指标上表现出色,并有很大的潜力在复杂预测任务中。
    Abstract Multi-step forecasting of stock market index prices is a crucial task in the financial sector, playing a pivotal role in decision-making across various financial activities. However, forecasting results are often unsatisfactory owing to the stochastic and volatile nature of the data. Researchers have made various attempts, and this process is ongoing. Inspired by convolutional neural network long short-term memory (CNN-LSTM) networks that utilize a 1D CNN for feature extraction to boost model performance, this study explores the use of a capsule network (CapsNet) as an advanced feature extractor in an LSTM-based forecasting model to enhance multi-step predictions. To this end, a novel neural architecture called 1D-CapsNet-LSTM was introduced, which combines a 1D CapsNet to extract high-level features from 1D sequential data and an LSTM layer to capture the temporal dependencies between the previously extracted features and uses a multi-input multi-output (MIMO) strategy to maintain the stochastic dependencies between the predicted values at different time steps. The proposed model was evaluated based on several real-world stock market indices, including Standard & Poor's 500 (S&P 500), Dow Jones Industrial Average (DJIA), Nasdaq Composite Index (IXIC), and New York Stock Exchange (NYSE), and was compared with baseline models such as LSTM, recurrent neural network (RNN), and CNN-LSTM in terms of various evaluation metrics. The comparison results suggest that the 1D-CapsNet-LSTM model outperforms the baseline models and has immense potential for the effective handling of complex prediction tasks.
    摘要 多步预测股票市场指数价格是金融部门中一项重要的任务,对各种金融活动的决策具有关键性。然而,预测结果经常不满足 expectations due to the stochastic and volatile nature of the data. Researchers have made various attempts, and this process is ongoing.引ourg CNN-LSTM 网络,这种网络使用一个 1D CNN 来提取特征,以提高模型性能。本研究则探索使用 Capsule Network (CapsNet) 作为高级特征提取器,并与 LSTM 层结合使用,以增强多步预测。为此,我们提出了一种新的神经网络架构,称为 1D-CapsNet-LSTM,它将 1D CapsNet 用于提取高级特征,并将 LSTM 层用于捕捉时间间隔中的相互关系。此外,我们采用 MIMO 策略,以维护预测值之间的随机相关性。我们对实际的股票市场指数进行了评估,包括 Standard & Poor's 500 (S&P 500)、Dow Jones Industrial Average (DJIA)、Nasdaq Composite Index (IXIC) 和 New York Stock Exchange (NYSE)。与基准模型 such as LSTM、RNN 和 CNN-LSTM 进行比较,我们发现 1D-CapsNet-LSTM 模型在各种评价指标上表现出色,并且有潜在的应用前景。

Learning Quantum Processes with Quantum Statistical Queries

  • paper_url: http://arxiv.org/abs/2310.02075
  • repo_url: https://github.com/chirag-w/qpsq-learning
  • paper_authors: Chirag Wadhwa, Mina Doosti
  • for: 本文为了研究量子过程学习而提出了首个学习框架,并提供了量子统计查询(QSQ)模型下的首个 formally定义的统计查询(QPSQ)。
  • methods: 该框架使得可以提出高效的QPSQ学习算法,并提供了一个可靠性保证。数据示范也验证了该算法的有效性。
  • results: 本文通过应用于密码分析中,演示了该框架的实际 relevance,揭示了 classical-readout量子物理不可克隆函数(CR-QPUF)的漏洞,解决了量子硬件安全领域的一个重要开问题。这项工作对量子过程学习的理解和安全性带来了重要的进展。
    Abstract Learning complex quantum processes is a central challenge in many areas of quantum computing and quantum machine learning, with applications in quantum benchmarking, cryptanalysis, and variational quantum algorithms. This paper introduces the first learning framework for studying quantum process learning within the Quantum Statistical Query (QSQ) model, providing the first formal definition of statistical queries to quantum processes (QPSQs). The framework allows us to propose an efficient QPSQ learner for arbitrary quantum processes accompanied by a provable performance guarantee. We also provide numerical simulations to demonstrate the efficacy of this algorithm. The practical relevance of this framework is exemplified through application in cryptanalysis, highlighting vulnerabilities of Classical-Readout Quantum Physical Unclonable Functions (CR-QPUFs), addressing an important open question in the field of quantum hardware security. This work marks a significant step towards understanding the learnability of quantum processes and shedding light on their security implications.
    摘要 学习复杂量子过程是许多量子计算和量子机器学习领域的中心挑战,具有应用于量子准则测试、破解和变量量子算法等领域。本文介绍了首个量子过程学习框架,基于量子统计查询(QSQ)模型,提供了首个量子过程统计查询(QPSQ)的正式定义。这个框架允许我们提出一种高效的QPSQ学习算法,并提供了可证明性能保证。我们还提供了数字实验来证明该算法的效果。这个框架的实用性被应用于密码分析中,揭示了类型ical-Readout量子物理不可复制函数(CR-QPUFs)的漏洞,解决了量子硬件安全领域中的一个重要问题。这项工作标志着量子过程学习的理解和安全性的重要进展。

ACE: A fast, skillful learned global atmospheric model for climate prediction

  • paper_url: http://arxiv.org/abs/2310.02074
  • repo_url: None
  • paper_authors: Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah D. Brenowitz, Karthik Kashinath, Michael S. Pritchard, Boris Bonev, Matthew E. Peters, Christopher S. Bretherton
  • for: 该论文是为了提出一种基于人工智能的气候预测模型,以提高气候预测的稳定性和物理一致性。
  • methods: 论文使用一个 2 亿参数的自回归机器学习模拟器,来仿真一个现有的 100 公里分辨率综合全球大气模式。该模型的表述方式便于检验质量守恒、水汽守恒等物理规律。
  • results: 结果显示,ACE 可以稳定运行 10 年,在不加显式约束的情况下近似保持柱内水汽守恒,并忠实复现参考模式的气候,在 80% 以上的跟踪变量上超过一个有挑战性的基线;同时,在通常可用的资源下,其所需墙钟时间约为参考模式的百分之一,能耗效率约高 100 倍。
    Abstract Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass and moisture. The emulator is stable for 10 years, nearly conserves column moisture without explicit constraints and faithfully reproduces the reference model's climate, outperforming a challenging baseline on over 80% of tracked variables. ACE requires nearly 100x less wall clock time and is 100x more energy efficient than the reference model using typically available resources.
    摘要 现有的基于机器学习的大气模型并不适用于气候预测,因为气候预测要求长期稳定性和物理一致性。我们提出 ACE(AI2 Climate Emulator),一个拥有 2 亿参数的自回归机器学习模拟器,用于仿真一个现有的 100 公里分辨率综合全球大气模式。ACE 的表述方式便于检验质量守恒、水汽守恒等物理规律。该模拟器可稳定运行 10 年,在不加显式约束的情况下近似保持柱内水汽守恒,并忠实复现参考模式的气候,在 80% 以上的跟踪变量上优于一个有挑战性的基线。在通常可用的资源下,ACE 所需墙钟时间约为参考模式的百分之一,能耗效率约高 100 倍。

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores

  • paper_url: http://arxiv.org/abs/2310.02065
  • repo_url: https://github.com/udc-gac/venom
  • paper_authors: Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler
  • for: 提高深度学习模型的计算效率和能力
  • methods: 提出 V:N:M 稀疏格式与高性能稀疏库 Spatha,并结合二阶剪枝算法,以利用稀疏张量核心(Sparse Tensor Cores)等专用稀疏向量单元的硬件支持
  • results: 使稀疏张量核心能够执行任意 N:M 比例,Spatha 相比 cuBLAS 最高可达 37 倍加速,并能在现代 Transformer 中以极小甚至无精度损失实现高稀疏率
    Abstract The increasing success and scaling of Deep Learning models demands higher computational efficiency and power. Sparsification can lead to both smaller models as well as higher compute efficiency, and accelerated hardware is becoming available. However, exploiting it efficiently requires kernel implementations, pruning algorithms, and storage formats, to utilize hardware support of specialized sparse vector units. An example of those are the NVIDIA's Sparse Tensor Cores (SPTCs), which promise a 2x speedup. However, SPTCs only support the 2:4 format, limiting achievable sparsity ratios to 50%. We present the V:N:M format, which enables the execution of arbitrary N:M ratios on SPTCs. To efficiently exploit the resulting format, we propose Spatha, a high-performance sparse-library for DL routines. We show that Spatha achieves up to 37x speedup over cuBLAS. We also demonstrate a second-order pruning technique that enables sparsification to high sparsity ratios with V:N:M and little to no loss in accuracy in modern transformers.
    摘要 深度学习模型的持续成功与规模扩大要求更高的计算效率和算力。稀疏化既能得到更小的模型,也能带来更高的计算效率,相应的加速硬件也日益可用。然而,要高效利用这些硬件,需要配套的核函数实现、剪枝算法和存储格式,以发挥专用稀疏向量单元的硬件支持。NVIDIA 的稀疏张量核心(SPTC)就是一例,其承诺 2 倍加速,但只支持 2:4 格式,把可达到的稀疏率限制在 50%。我们提出 V:N:M 格式,使 SPTC 能够执行任意 N:M 比例的计算。为了高效利用这一格式,我们提出了面向深度学习例程的高性能稀疏库 Spatha。实验表明 Spatha 相比 cuBLAS 可达 37 倍加速。我们还展示了一种二阶剪枝技术,可在 V:N:M 格式下把现代 Transformer 稀疏化到很高的稀疏率,而精度几乎没有损失。
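
For concreteness, the sketch below applies the fixed 2:4 pattern that Sparse Tensor Cores support: keep the two largest-magnitude weights in every group of four. VENOM's V:N:M format generalises this to arbitrary N:M ratios with a vectorised layout, which is not reproduced here.

```python
# N:M structured sparsity (here 2:4): in every group of M=4 consecutive weights
# along a row, keep only the N=2 largest-magnitude entries and zero the rest.
import numpy as np

def prune_n_m(W: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Zero out all but the n largest-magnitude weights in each group of m (per row)."""
    rows, cols = W.shape
    assert cols % m == 0
    groups = W.reshape(rows, cols // m, m)
    order = np.argsort(np.abs(groups), axis=-1)       # ascending by magnitude
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, order[..., : m - n], False, axis=-1)  # drop m-n smallest
    return (groups * mask).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
W_sparse = prune_n_m(W)
print("sparsity:", 1.0 - np.count_nonzero(W_sparse) / W_sparse.size)  # 0.5
```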

Lessons Learned from EXMOS User Studies: A Technical Report Summarizing Key Takeaways from User Studies Conducted to Evaluate The EXMOS Platform

  • paper_url: http://arxiv.org/abs/2310.02063
  • repo_url: None
  • paper_authors: Aditya Bhattacharya, Simone Stumpf, Lucija Gosak, Gregor Stiglic, Katrien Verbert
  • for: 本研究旨在探讨如何在交互式机器学习系统中提供解释,以帮助域专家更好地调试和改进预测模型。
  • methods: 本研究采用了两个用户研究,包括量化分析和质量评估,以探索不同类型的解释对域专家的影响。
  • results: 研究发现,全局模型中心解释独立无法有效地指导用户进行数据配置。相比之下,数据中心解释能够增强系统变化的理解,但是组合两者最高效地带来了域专家对模型改进的信任、理解和能力。
    Abstract In the realm of interactive machine-learning systems, the provision of explanations serves as a vital aid in the processes of debugging and enhancing prediction models. However, the extent to which various global model-centric and data-centric explanations can effectively assist domain experts in detecting and resolving potential data-related issues for the purpose of model improvement has remained largely unexplored. In this technical report, we summarise the key findings of our two user studies. Our research involved a comprehensive examination of the impact of global explanations rooted in both data-centric and model-centric perspectives within systems designed to support healthcare experts in optimising machine learning models through both automated and manual data configurations. To empirically investigate these dynamics, we conducted two user studies, comprising quantitative analysis involving a sample size of 70 healthcare experts and qualitative assessments involving 30 healthcare experts. These studies were aimed at illuminating the influence of different explanation types on three key dimensions: trust, understandability, and model improvement. Results show that global model-centric explanations alone are insufficient for effectively guiding users during the intricate process of data configuration. In contrast, data-centric explanations exhibited their potential by enhancing the understanding of system changes that occur post-configuration. However, a combination of both showed the highest level of efficacy for fostering trust, improving understandability, and facilitating model enhancement among healthcare experts. We also present essential implications for developing interactive machine-learning systems driven by explanations. These insights can guide the creation of more effective systems that empower domain experts to harness the full potential of machine learning
    摘要 在机器学习系统中的互动实现中,提供说明是一项非常重要的帮助工具,用于系统调试和改进预测模型。然而,许多全球模型中心和数据中心的说明是否能够有效地帮助领域专家检测和解决数据相关问题,以便提高模型的改进,还未得到充分探讨。在这份技术报告中,我们Summarize了我们的两项用户研究的关键发现。我们的研究涵盖了全球说明在支持医疗专家优化机器学习模型的自动和手动数据配置系统中的影响。为了实际检查这些动态,我们进行了两项用户研究,包括70名医疗专家的量化分析和30名医疗专家的质量评估。这些研究的目的是为了突出不同类型的说明对三个关键维度的影响:信任、理解度和模型改进。结果表明,全球模型中心的说明独立无法有效地引导用户进行数据配置过程中的繁杂过程。相比之下,数据中心的说明在系统改变后的理解方面表现出了潜在的优势。然而,两者的组合显示出了最高水平的效果,即使用途中心的医疗专家建立信任,提高理解度,并且促进模型改进。我们还提出了开发基于说明的互动机器学习系统的重要建议。这些发现可以导引创造更有效的系统,以便领域专家更好地利用机器学习的潜力。

The Inhibitor: ReLU and Addition-Based Attention for Efficient Transformers

  • paper_url: http://arxiv.org/abs/2310.02041
  • repo_url: None
  • paper_authors: Rickard Brännvall
  • for: 提高量化Transformer的计算效率
  • methods: 使用添加和ReLU活化代替点积和Softmax基于的注意力机制
  • results: 实现了与传统点积注意力相对的测试集预测分数,并且在 encryption 下实现了显著的计算成本减少。
    Abstract To enhance the computational efficiency of quantized Transformers, we replace the dot-product and Softmax-based attention with an alternative mechanism involving addition and ReLU activation only. This side-steps the expansion to double precision often required by matrix multiplication and avoids costly Softmax evaluations but maintains much of the core functionality of conventional dot-product attention. It can enable more efficient execution and support larger quantized Transformer models on resource-constrained hardware or alternative arithmetic systems like homomorphic encryption. Training experiments on four common benchmark tasks show test set prediction scores comparable to those of conventional Transformers with dot-product attention. Our scaling experiments also suggest significant computational savings, both in plaintext and under encryption. In particular, we believe that the ReLU and addition-based attention mechanism introduced in this paper may enable privacy-preserving AI applications operating under homomorphic encryption by avoiding the costly multiplication of encrypted variables.
    摘要 为提高量化Transformers的计算效率,我们将点乘和Softmax基于的注意机制换为一种另外的机制,只包括加法和ReLU活动。这种方法可以避免矩阵乘法中的扩展到双精度,并避免贵重的Softmax评估,但是保持了大部分普通点乘注意的核心功能。它可以启用更高效的执行和支持更大的量化Transformer模型在资源受限的硬件或非标准数学系统上。训练实验在四个常见的任务上表明,测试集预测得分与普通点乘注意的模型相比几乎相同。我们的扩展实验还表明,使用我们在这篇论文中提出的ReLU和加法基于的注意机制可以实现隐私保护AI应用程序在同质加密下运行,避免了EncryptedVariable的贵重乘法。
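
The toy sketch below conveys the flavour of attention built from additions, subtractions, and ReLU, with a simple sum normalisation instead of Softmax; it is not the paper's exact inhibitor formulation, and the threshold z is an invented hyperparameter.

```python
# Toy additive/ReLU attention: similarity from an L1 distance (no dot products),
# weights from ReLU and a plain sum normalisation (no Softmax). Illustrative only;
# the paper's inhibitor mechanism may differ in its scoring and normalisation.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def additive_relu_attention(Q, K, V, z: float = 2.0):
    dist = np.abs(Q[:, None, :] - K[None, :, :]).mean(-1)     # (Tq, Tk) L1-style distance
    score = relu(z - dist)                                     # inhibit far-away keys
    weights = score / np.maximum(score.sum(-1, keepdims=True), 1e-9)
    return weights @ V

rng = np.random.default_rng(0)
T, d = 4, 8
Q, K, V = rng.normal(size=(3, T, d))
print(additive_relu_attention(Q, K, V).shape)   # (4, 8)
```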

aSAGA: Automatic Sleep Analysis with Gray Areas

  • paper_url: http://arxiv.org/abs/2310.02032
  • repo_url: None
  • paper_authors: Matias Rusanen, Gabriel Jouan, Riku Huttunen, Sami Nikkonen, Sigríður Sigurðardóttir, Juha Töyräs, Brett Duce, Sami Myllymaa, Erna Sif Arnardottir, Timo Leppänen, Anna Sigridur Islind, Samu Kainulainen, Henri Korkalainen
  • for: 这个研究旨在提出一个人工智能与人类合作的睡眠分析方法,以实现 automatization 的睡眠分析,并且能够与职业睡眠技术员之间的互动进行协同运作。
  • methods: 这个研究使用了自动睡眠分析模型(aSAGA),该模型能够对来自临床波形测量和家用睡眠测量的睡眠资料进行自动分析,并且能够处理不同类型的睡眠资料。
  • results: 研究发现,使用这个自动睡眠分析模型可以与人类职业睡眠技术员的分析结果相互匹配,并且可以运用不同类型的睡眠资料进行自动分析。此外,这个研究还发现了一个称为“灰色区域”的概念,这个概念可以用来描述自动睡眠分析中存在着不确定性的部分,并且可以帮助睡眠技术员更好地处理这些部分。
    Abstract State-of-the-art automatic sleep staging methods have already demonstrated comparable reliability and superior time efficiency to manual sleep staging. However, fully automatic black-box solutions are difficult to adapt into clinical workflow and the interaction between explainable automatic methods and the work of sleep technologists remains underexplored and inadequately conceptualized. Thus, we propose a human-in-the-loop concept for sleep analysis, presenting an automatic sleep staging model (aSAGA), that performs effectively with both clinical polysomnographic recordings and home sleep studies. To validate the model, extensive testing was conducted, employing a preclinical validation approach with three retrospective datasets; open-access, clinical, and research-driven. Furthermore, we validate the utilization of uncertainty mapping to identify ambiguous regions, conceptualized as gray areas, in automatic sleep analysis that warrants manual re-evaluation. The results demonstrate that the automatic sleep analysis achieved a comparable level of agreement with manual analysis across different sleep recording types. Moreover, validation of the gray area concept revealed its potential to enhance sleep staging accuracy and identify areas in the recordings where sleep technologists struggle to reach a consensus. In conclusion, this study introduces and validates a concept from explainable artificial intelligence into sleep medicine and provides the basis for integrating human-in-the-loop automatic sleep staging into clinical workflows, aiming to reduce black-box criticism and the burden associated with manual sleep staging.
    摘要 最先进的自动睡眠分期方法已经表现出与人工分期相当的可靠性和更高的时间效率。然而,完全自动的黑箱方案难以融入临床工作流程,可解释的自动方法与睡眠技师工作之间的交互仍缺乏探索和清晰的概念化。因此,我们提出一种人机协同(human-in-the-loop)的睡眠分析思路,并给出一个自动睡眠分期模型(aSAGA),它在临床多导睡眠图记录和家庭睡眠监测数据上均表现良好。为了验证该模型,我们采用临床前验证方式,在开放获取、临床和研究驱动的三个回顾性数据集上进行了大量测试。此外,我们验证了利用不确定性映射来识别自动睡眠分析中需要人工复核的模糊区域,即"灰色区域"这一概念。结果显示,自动睡眠分析在不同类型的睡眠记录上都能达到与人工分析相当的一致性;对灰色区域概念的验证则表明,它有潜力提高睡眠分期的准确性,并定位记录中睡眠技师难以达成一致的部分。总之,本研究把可解释人工智能中的一个概念引入睡眠医学并加以验证,为把人机协同的自动睡眠分期整合进临床工作流程奠定了基础,旨在减少对黑箱的批评以及人工睡眠分期带来的负担。
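
A minimal sketch of the gray-area idea: flag epochs whose automatic prediction is too uncertain and route them to a sleep technologist. Predictive entropy with a fixed threshold is used here purely for illustration; aSAGA's actual uncertainty mapping and calibration may differ.

```python
# Flagging "gray area" epochs: route low-confidence automatic stagings to a human.
import numpy as np

def flag_gray_areas(probs: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """probs: (n_epochs, n_stages) predicted class probabilities. Returns a boolean
    mask of epochs whose predictive entropy (in nats) exceeds the threshold."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return entropy > threshold

probs = np.array([
    [0.90, 0.05, 0.03, 0.01, 0.01],   # confident epoch -> keep automatic label
    [0.35, 0.30, 0.20, 0.10, 0.05],   # ambiguous epoch -> manual re-evaluation
])
print(flag_gray_areas(probs))          # [False  True]
```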

Between accurate prediction and poor decision making: the AI/ML gap

  • paper_url: http://arxiv.org/abs/2310.02029
  • repo_url: None
  • paper_authors: Gianluca Bontempi
  • for: 本研究旨在探讨人工智能代理人在做出决策时的假设准确性对最终决策的影响,以及 Utility 评估的不准确性对决策的影响。
  • methods: 本研究使用了 AI/ML 技术来预测可能的动作的后果,并对决策策略进行优化。研究者还使用了 teoretic 和 simulations 来评估 Utility 评估的不准确性对决策的影响。
  • results: 研究结果显示,假设 Utility 评估不准确可能导致决策失败,而且这种情况可能比假设概率评估不准确的情况更加严重。研究者建议人工智能社区做出一个关注 Utility 评估的偏移,强调在决策过程中准确评估 Utility。
    Abstract Intelligent agents rely on AI/ML functionalities to predict the consequence of possible actions and optimise the policy. However, the effort of the research community in addressing prediction accuracy has been so intense (and successful) that it created the illusion that the more accurate the learner prediction (or classification) the better would have been the final decision. Now, such an assumption is valid only if the (human or artificial) decision maker has complete knowledge of the utility of the possible actions. This paper argues that AI/ML community has taken so far a too unbalanced approach by devoting excessive attention to the estimation of the state (or target) probability to the detriment of accurate and reliable estimations of the utility. In particular, few evidence exists about the impact of a wrong utility assessment on the resulting expected utility of the decision strategy. This situation is creating a substantial gap between the expectations and the effective impact of AI solutions, as witnessed by recent criticisms and emphasised by the regulatory legislative efforts. This paper aims to study this gap by quantifying the sensitivity of the expected utility to the utility uncertainty and comparing it to the one due to probability estimation. Theoretical and simulated results show that an inaccurate utility assessment may as (and sometimes) more harmful than a poor probability estimation. The final recommendation to the community is then to undertake a focus shift from a pure accuracy-driven (or obsessed) approach to a more utility-aware methodology.
    摘要 智能体依赖 AI/ML 功能来预测各种可能行动的后果并优化策略。然而,研究界在提升预测精度上的投入如此之大(也如此成功),以至于造成了一种错觉:学习器的预测(或分类)越准确,最终的决策就必然越好。事实上,只有当(人类或人工)决策者对各种可能行动的效用拥有完整知识时,这一假设才成立。本文认为,AI/ML 社区迄今采取了过于失衡的做法:过度关注状态(或目标)概率的估计,而忽视了对效用的准确、可靠估计。特别是,关于错误的效用评估会如何影响决策策略的期望效用,目前几乎没有证据。这一状况正在造成人们对 AI 方案的期望与其实际影响之间的巨大落差,近期的批评和监管立法的努力都印证了这一点。本文旨在研究这一落差:量化期望效用对效用不确定性的敏感度,并与概率估计误差带来的影响进行比较。理论与模拟结果表明,不准确的效用评估可能与糟糕的概率估计同样有害,有时甚至更有害。因此,我们对社区的最终建议是:从单纯以精度为导向(乃至为之痴迷)的方法,转向更加关注效用的方法论。

DeepHGCN: Toward Deeper Hyperbolic Graph Convolutional Networks

  • paper_url: http://arxiv.org/abs/2310.02027
  • repo_url: None
  • paper_authors: Jiaxu Liu, Xinping Yi, Xiaowei Huang
  • for: 本研究旨在提出一种深度多层扩展的几何图 convolutional neural network (HGCN),以解决现有HGCN的昂贵的几何操作和深度增加导致的过拟合问题。
  • methods: 本研究提出了两个关键技术来推动深度HGCN的发展:首先,一种新的几何特征变换层,可以快速和准确地 Linear Map;其次,在几何GCN中使用几何偏置和regulation,通过高效的几何中点方法来实现。
  • results: 对于链接预测和节点分类任务,DeepHGCN在比对欧式和浅几何GCN variant时获得了显著的改善。
    Abstract Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, developing a hyperbolic therapy presents distinct challenges since operations should be carefully designed to fit the hyperbolic nature. Addressing the above challenges, in this work, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially alleviated over-smoothing effect. DeepHGCN presents two key enablers of deep HGCNs: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) Techniques such as hyperbolic residual connections and regularization for both weights and features facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants.
    摘要 双曲图卷积网络(HGCN)在从层次化图中提取信息方面展现出巨大潜力。然而,由于双曲运算代价高昂,且随着深度增加会出现过平滑问题,现有的 HGCN 只能采用浅层结构。尽管在欧氏 GCN 中已有缓解过平滑的手段,但为双曲网络设计对应的"疗法"面临独特的挑战,因为相关运算必须精心设计以契合双曲几何的性质。针对上述挑战,本文提出 DeepHGCN——第一个深度多层 HGCN 结构,其计算效率大幅提升,过平滑效应显著缓解。DeepHGCN 的两个关键推动因素是:(1)一种新的双曲特征变换层,能实现快速且精确的线性映射;(2)借助高效的双曲中点方法实现的双曲残差连接以及对权重和特征的正则化。大量实验表明,DeepHGCN 在链路预测和节点分类任务上相比欧氏及浅层双曲 GCN 变体均取得了显著提升。

DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training

  • paper_url: http://arxiv.org/abs/2310.02025
  • repo_url: https://github.com/Phoveran/DeepZero
  • paper_authors: Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Jiancheng Liu, Konstantinos Parasyris, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu
  • for: This paper aims to develop a principled zero-order (ZO) deep learning framework for training deep neural networks (DNNs) without a significant decrease in performance.
  • methods: The proposed framework, called DeepZero, uses three primary innovations to scale ZO optimization to DNN training: coordinate-wise gradient estimation (CGE), a sparsity-induced ZO training protocol, and feature reuse and forward parallelization.
  • results: The proposed DeepZero framework achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching first-order (FO) training performance for the first time. Additionally, the framework demonstrates practical utility in applications such as certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA.
    Abstract Zeroth-order (ZO) optimization has become a popular technique for solving machine learning (ML) problems when first-order (FO) information is difficult or impossible to obtain. However, the scalability of ZO optimization remains an open problem: Its use has primarily been limited to relatively small-scale ML problems, such as sample-wise adversarial attack generation. To our best knowledge, no prior work has demonstrated the effectiveness of ZO optimization in training deep neural networks (DNNs) without a significant decrease in performance. To overcome this roadblock, we develop DeepZero, a principled ZO deep learning (DL) framework that can scale ZO optimization to DNN training from scratch through three primary innovations. First, we demonstrate the advantages of coordinate-wise gradient estimation (CGE) over randomized vector-wise gradient estimation in training accuracy and computational efficiency. Second, we propose a sparsity-induced ZO training protocol that extends the model pruning methodology using only finite differences to explore and exploit the sparse DL prior in CGE. Third, we develop the methods of feature reuse and forward parallelization to advance the practical implementations of ZO training. Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. Furthermore, we show the practical utility of DeepZero in applications of certified adversarial defense and DL-based partial differential equation error correction, achieving 10-20% improvement over SOTA. We believe our results will inspire future research on scalable ZO optimization and contribute to advancing DL with black box.
    摘要 Zero-order(ZO)优化已成为机器学习(ML)问题的受欢迎技术,当第一顺(FO)信息困难或不可能获得时。然而,ZO优化的扩展性仍然是一个打开的问题:它的使用主要受到了相对较小规模的ML问题的限制,如采样计算中的对抗攻击生成。据我们所知,没有任何先前的工作可以证明ZO优化在训练深度学习网络(DNN)时不会导致性能下降的情况。为了突破这个障碍,我们开发了DeepZero,一个理解ZO深度学习(DL)框架,可以扩展ZO优化到DNN训练从头开始。我们通过以下三个主要创新来实现这一目标:1. 在训练精度和计算效率方面,coordinate-wise gradient estimation(CGE)比随机化向量化gradient estimation(RVE)更有利。2. 我们提出了基于简单差分的模型剔除方法,通过激活 sparse DL prior 来扩展模型剔除方法。3. 我们开发了Feature Reuse和Forward Parallelization等方法,以提高ZO训练的实际应用。我们的广泛实验表明,DeepZero可以在ResNet-20 trained on CIFAR-10上达到SOTA精度,并且在FO训练性能附近。此外,我们还证明了DeepZero在证明性防御和DL基于partial differential equation error correction应用中的实用性,提高了10-20%。我们认为我们的结果将激励未来关于可扩展ZO优化的研究,并为深度学习的扩展做出贡献。
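
The core zeroth-order ingredient, coordinate-wise gradient estimation, is simple enough to show directly; the sketch below approximates each partial derivative with a forward finite difference and leaves out the sparsity, feature-reuse, and parallelization components that make DeepZero scale.

```python
# Coordinate-wise gradient estimation (CGE): g_i = (f(x + mu e_i) - f(x)) / mu,
# using only function evaluations. `coords` can restrict estimation to a sparse
# subset of coordinates, which is where sparsity-induced ZO training helps.
import numpy as np

def cge_gradient(f, x: np.ndarray, mu: float = 1e-4, coords=None) -> np.ndarray:
    g = np.zeros_like(x)
    fx = f(x)
    coords = range(x.size) if coords is None else coords
    for i in coords:
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (f(x + e) - fx) / mu
    return g

# Example: a quadratic f(x) = 0.5 x^T A x, whose true gradient is A x.
rng = np.random.default_rng(0)
A = np.diag(rng.uniform(1.0, 3.0, size=10))
f = lambda x: 0.5 * x @ A @ x
x = rng.normal(size=10)
print(np.max(np.abs(cge_gradient(f, x) - A @ x)))   # small finite-difference error
```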

Nash Regret Guarantees for Linear Bandits

  • paper_url: http://arxiv.org/abs/2310.02023
  • repo_url: None
  • paper_authors: Ayush Sawarni, Soumybrata Pal, Siddharth Barman
  • for: 这个论文主要目标是提供一种在Stochastic Linear Bandits框架下实现的公平的帮助Algorithm,并且提供了一个准确的 regret bound。
  • methods: 这个论文使用了Successive Elimination方法,并且提供了一些新的技术解决方案,包括tailored concentration bounds和使用John ellipsoid sampling。
  • results: 这个论文得到了一个 Nash regret upper bound of $O\left( \sqrt{\frac{d\nu}{T} \log( T |X|)\right)$,并且在 bounded, positive rewards 下也能够获得一个 Nash regret upper bound of $O\left( \frac{d^\frac{5}{4}\nu^{\frac{1}{2}}{\sqrt{T} \log(T)\right)$。
    Abstract We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening -- referred to as Nash regret -- is defined as the difference between the (a priori unknown) optimum and the geometric mean of expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms and, hence, an upper bound on Nash regret provides a principled fairness guarantee. We consider the stochastic linear bandits problem over a horizon of $T$ rounds and with set of arms ${X}$ in ambient dimension $d$. Furthermore, we focus on settings in which the stochastic reward -- associated with each arm in ${X}$ -- is a non-negative, $\nu$-sub-Poisson random variable. For this setting, we develop an algorithm that achieves a Nash regret of $O\left( \sqrt{\frac{d\nu}{T} \log( T |X|)\right)$. In addition, addressing linear bandit instances in which the set of arms ${X}$ is not necessarily finite, we obtain a Nash regret upper bound of $O\left( \frac{d^\frac{5}{4}\nu^{\frac{1}{2}}{\sqrt{T} \log(T)\right)$. Since bounded random variables are sub-Poisson, these results hold for bounded, positive rewards. Our linear bandit algorithm is built upon the successive elimination method with novel technical insights, including tailored concentration bounds and the use of sampling via John ellipsoid in conjunction with the Kiefer-Wolfowitz optimal design.
    摘要 我们获得了一种 Essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening, referred to as Nash regret, is defined as the difference between the (a priori unknown) optimum and the geometric mean of expected rewards accumulated by the linear bandit algorithm. Since the geometric mean corresponds to the well-studied Nash social welfare (NSW) function, this formulation quantifies the performance of a bandit algorithm as the collective welfare it generates across rounds. NSW is known to satisfy fairness axioms, and therefore, an upper bound on Nash regret provides a principled fairness guarantee.我们考虑了在 $T$ 个轮次和 $X$ 个武器(arm)上的随机线性带刺问题。具体来说,我们考虑的是在每个轮次上,每个武器的奖励是一个非负、 $\nu$-次减Poisson随机变量。为这种设定,我们开发了一个算法,其 Nash regret 为 $O\left( \sqrt{\frac{d\nu}{T} \log( T |X|)\right)$。此外,我们还考虑了线性带刺问题中的非确定武器集 ${X}$,并获得了一个 Nash regret Upper bound of $O\left( \frac{d^\frac{5}{4}\nu^{\frac{1}{2}}{\sqrt{T} \log(T)\right)$。由于减Poisson随机变量是bounded random variables的特例,这些结果适用于带有 bounded, positive 奖励的情况。我们的线性带刺算法基于成功排除方法,并具有新的技术积umi,包括专门的集中散度 bounds 和 John ellipsoid 的采样。

Ranking a Set of Objects using Heterogeneous Workers: QUITE an Easy Problem

  • paper_url: http://arxiv.org/abs/2310.02016
  • repo_url: None
  • paper_authors: Alessandro Nordio, Alberto tarable, Emilio Leonardi
  • for: 本研究旨在解决对 $N$ 个对象进行排名的问题,从一群不均衡的工作者提供的不准确对比数据开始。
  • methods: 本研究提出了一种非适应式排名算法 QUITE,该算法同时估计工作者的可靠性和对象的质量。
  • results: 对于不同场景,QUITE 的表现与之前提出的算法进行比较,并且可以自然地做出适应式改进。
    Abstract We focus on the problem of ranking $N$ objects starting from a set of noisy pairwise comparisons provided by a crowd of unequal workers, each worker being characterized by a specific degree of reliability, which reflects her ability to rank pairs of objects. More specifically, we assume that objects are endowed with intrinsic qualities and that the probability with which an object is preferred to another depends both on the difference between the qualities of the two competitors and on the reliability of the worker. We propose QUITE, a non-adaptive ranking algorithm that jointly estimates workers' reliabilities and qualities of objects. Performance of QUITE is compared in different scenarios against previously proposed algorithms. Finally, we show how QUITE can be naturally made adaptive.

Spectral operator learning for parametric PDEs without data reliance

  • paper_url: http://arxiv.org/abs/2310.02013
  • repo_url: None
  • paper_authors: Junho Choi, Taehyun Yun, Namjung Kim, Youngjoon Hong
  • for: solves parametric partial differential equations (PDEs) without the need for data harnessing.
  • methods: employs expansions using orthogonal functions, such as Fourier series and Legendre polynomials, and merges the merits of spectral methods with the prowess of deep neural networks.
  • results: accurately predicts solutions of complex parametric PDEs, including singularly perturbed convection-diffusion equations and the Navier-Stokes equations, without the need for paired input-output training data.
    Abstract In this paper, we introduce the Spectral Coefficient Learning via Operator Network (SCLON), a novel operator learning-based approach for solving parametric partial differential equations (PDEs) without the need for data harnessing. The cornerstone of our method is the spectral methodology that employs expansions using orthogonal functions, such as Fourier series and Legendre polynomials, enabling accurate PDE solutions with fewer grid points. By merging the merits of spectral methods - encompassing high accuracy, efficiency, generalization, and the exact fulfillment of boundary conditions - with the prowess of deep neural networks, SCLON offers a transformative strategy. Our approach not only eliminates the need for paired input-output training data, which typically requires extensive numerical computations, but also effectively learns and predicts solutions of complex parametric PDEs, ranging from singularly perturbed convection-diffusion equations to the Navier-Stokes equations. The proposed framework demonstrates superior performance compared to existing scientific machine learning techniques, offering solutions for multiple instances of parametric PDEs without harnessing data. The mathematical framework is robust and reliable, with a well-developed loss function derived from the weak formulation, ensuring accurate approximation of solutions while exactly satisfying boundary conditions. The method's efficacy is further illustrated through its ability to accurately predict intricate natural behaviors like the Kolmogorov flow and boundary layers. In essence, our work pioneers a compelling avenue for parametric PDE solutions, serving as a bridge between traditional numerical methodologies and cutting-edge machine learning techniques in the realm of scientific computation.
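The spectral-expansion building block can be illustrated in isolation: a smooth profile on $[-1,1]$ is captured by a handful of Legendre coefficients. In SCLON a network predicts such coefficients from the PDE parameters and is trained with a weak-form loss; the sketch below only fits the coefficients by least squares to show the efficiency of the representation, and the example function is invented.

```python
# The spectral-expansion building block behind SCLON, in isolation: a smooth
# function on [-1, 1] is captured by a few Legendre coefficients. In SCLON a
# neural network would predict such coefficients from the PDE parameters; here
# we simply fit them by least squares to illustrate the representation.
import numpy as np
from numpy.polynomial import legendre

x = np.linspace(-1.0, 1.0, 200)
u = np.exp(-3 * x) * np.sin(4 * np.pi * x)     # a stand-in "solution" profile

degree = 20
coeffs = legendre.legfit(x, u, degree)          # spectral (Legendre) coefficients
u_hat = legendre.legval(x, coeffs)              # reconstruction from coefficients

print("number of coefficients:", degree + 1)
print("max reconstruction error:", np.abs(u - u_hat).max())
# A few dozen coefficients reproduce the profile to high accuracy, which is the
# efficiency that spectral operator learning exploits.
```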

fmeffects: An R Package for Forward Marginal Effects

  • paper_url: http://arxiv.org/abs/2310.02008
  • repo_url: None
  • paper_authors: Holger Löwe, Christian A. Scholbeck, Christian Heumann, Bernd Bischl, Giuseppe Casalicchio
  • for: Comprehensible and actionable model explanations of the form: if we change $x$ by an amount $h$, what is the change in the predicted outcome $\widehat{y}$?
  • methods: A model-agnostic interpretation method based on forward marginal effects (FMEs), provided as an R package implementation.
  • results: The fmeffects R package, the first software implementation of FMEs, which lets users compute and analyze FME-based explanations.
    Abstract Forward marginal effects (FMEs) have recently been introduced as a versatile and effective model-agnostic interpretation method. They provide comprehensible and actionable model explanations in the form of: If we change $x$ by an amount $h$, what is the change in predicted outcome $\widehat{y}$? We present the R package fmeffects, the first software implementation of FMEs. The relevant theoretical background, package functionality and handling, as well as the software design and options for future extensions are discussed in this paper.
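The quantity an FME measures is easy to compute for any fitted model. The sketch below is a Python illustration with a hypothetical scikit-learn regressor and synthetic data, not the fmeffects R API.

```python
# A Python sketch of the quantity a forward marginal effect (FME) measures:
# change feature x_j by a step h and record the change in the prediction.
# This mirrors the idea implemented by the fmeffects R package but uses a
# hypothetical scikit-learn model rather than the package's API.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def forward_marginal_effect(model, X, feature, h):
    """FME_i = f(x_i with x_feature += h) - f(x_i), one value per observation."""
    X_step = X.copy()
    X_step[:, feature] += h
    return model.predict(X_step) - model.predict(X)

fme = forward_marginal_effect(model, X, feature=0, h=0.5)
print("average FME for feature 0 and step h=0.5:", fme.mean())
# For the data-generating process above the expected answer is roughly 2 * 0.5 = 1.
```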

Benign Overfitting in Two-Layer ReLU Convolutional Neural Networks for XOR Data

  • paper_url: http://arxiv.org/abs/2310.01975
  • repo_url: None
  • paper_authors: Xuran Meng, Difan Zou, Yuan Cao
  • for: This paper aims to study the “benign overfitting” phenomenon in deep learning models, specifically in the context of XOR-type classification tasks with label-flipping noises.
  • methods: The paper uses an over-parameterized ReLU CNN trained by gradient descent to achieve near Bayes-optimal accuracy in the XOR problems, and establishes a matching lower bound result to demonstrate the efficiency of the CNN in learning the tasks.
  • results: The paper shows that, under certain conditions on the sample complexity and signal-to-noise ratio, the over-parameterized CNN can achieve near Bayes-optimal accuracy in the XOR problems, and establishes a lower bound result to demonstrate the absolute constant gap between the CNN’s accuracy and the Bayes-optimal rate.
    Abstract Modern deep learning models are usually highly over-parameterized so that they can overfit the training data. Surprisingly, such overfitting neural networks can usually still achieve high prediction accuracy. To study this "benign overfitting" phenomenon, a line of recent works has theoretically studied the learning of linear models and two-layer neural networks. However, most of these analyses are still limited to the very simple learning problems where the Bayes-optimal classifier is linear. In this work, we investigate a class of XOR-type classification tasks with label-flipping noises. We show that, under a certain condition on the sample complexity and signal-to-noise ratio, an over-parameterized ReLU CNN trained by gradient descent can achieve near Bayes-optimal accuracy. Moreover, we also establish a matching lower bound result showing that when the previous condition is not satisfied, the prediction accuracy of the obtained CNN is an absolute constant away from the Bayes-optimal rate. Our result demonstrates that CNNs have a remarkable capacity to efficiently learn XOR problems, even in the presence of highly correlated features.
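A toy version of the studied setting can be set up in a few lines: XOR-type labels with label-flipping noise, fit by an over-parameterized two-layer ReLU network. Note the paper analyzes a two-layer ReLU CNN; the fully connected network, data sizes, and training schedule below are simplifications chosen only for illustration.

```python
# A toy version of the setting studied in the paper: XOR-type data with
# label-flipping noise, fit by an over-parameterized two-layer ReLU network.
# The paper analyzes a two-layer ReLU *CNN*; a fully connected network is used
# here only to keep the illustration short.
import torch

torch.manual_seed(0)
n, d, width, flip_prob = 400, 2, 200, 0.1

X = torch.randint(0, 2, (n, d)).float() * 2 - 1        # entries in {-1, +1}
y_clean = X[:, 0] * X[:, 1]                            # XOR-type labels in {-1, +1}
flips = (torch.rand(n) < flip_prob).float() * -2 + 1   # -1 with prob flip_prob
y = y_clean * flips
X = X + 0.3 * torch.randn(n, d)                        # feature noise

model = torch.nn.Sequential(
    torch.nn.Linear(d, width),
    torch.nn.ReLU(),
    torch.nn.Linear(width, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    out = model(X).squeeze(-1)
    loss = torch.nn.functional.softplus(-y * out).mean()   # logistic loss
    loss.backward()
    opt.step()

train_acc = (model(X).squeeze(-1).sign() == y).float().mean().item()
clean_acc = (model(X).squeeze(-1).sign() == y_clean).float().mean().item()
print(f"accuracy on (noisy) training labels: {train_acc:.2f}")
print(f"accuracy on clean labels:            {clean_acc:.2f}")
# In the regime the paper studies, benign overfitting means fitting the noisy
# labels while accuracy against the clean labels stays near Bayes-optimal.
```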

Federated Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2310.01973
  • repo_url: None
  • paper_authors: Alain Rakotomamonjy, Kimia Nadjahi, Liva Ralaivola
  • for: Computing the Wasserstein distance between two distributions stored on different devices/clients in a federated manner.
  • methods: A central server orchestrates the computation without accessing the samples, exploiting geometric properties of the Wasserstein distance (the triangle inequality) and the associated geodesics.
  • results: The resulting FedWad algorithm estimates the Wasserstein distance effectively and is used to build federated coresets and to boost the performance of popular federated learning algorithms.
    Abstract We introduce a principled way of computing the Wasserstein distance between two distributions in a federated manner. Namely, we show how to estimate the Wasserstein distance between two samples stored and kept on different devices/clients whilst a central entity/server orchestrates the computations (again, without having access to the samples). To achieve this feat, we take advantage of the geometric properties of the Wasserstein distance -- in particular, the triangle inequality -- and that of the associated {\em geodesics}: our algorithm, FedWad (for Federated Wasserstein Distance), iteratively approximates the Wasserstein distance by manipulating and exchanging distributions from the space of geodesics in lieu of the input samples. In addition to establishing the convergence properties of FedWad, we provide empirical results on federated coresets and federate optimal transport dataset distance, that we respectively exploit for building a novel federated model and for boosting performance of popular federated learning algorithms.

Epidemic Learning: Boosting Decentralized Learning with Randomized Communication

  • paper_url: http://arxiv.org/abs/2310.01972
  • repo_url: https://github.com/sacs-epfl/decentralizepy
  • paper_authors: Martijn de Vos, Sadegh Farhadkhani, Rachid Guerraoui, Anne-Marie Kermarrec, Rafael Pires, Rishi Sharma
  • for: Epidemic Learning (EL), a decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence than conventional DL approaches.
  • methods: In each round, every node sends its model update to a random sample of $s$ other nodes (out of $n$), so the communication topology changes from round to round.
  • results: For smooth non-convex losses, EL needs only $O(n^3/s^2)$ transient iterations to reach asymptotic linear speedup, improving the best known $O(n^3)$ bound by a factor of $s^2$; in a 96-node network, EL converges up to 1.7x faster than baseline DL algorithms and attains 2.2% higher accuracy for the same communication volume.
    Abstract We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to the state-of-the-art (static and dynamic) topologies. Considering smooth non-convex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $O(n^3/s^2)$ which outperforms the best-known bound $O(n^3)$ by a factor of $s^2$, indicating the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results illustrate that EL converges up to $ 1.7\times$ quicker than baseline DL algorithms and attains $2.2 $\% higher accuracy for the same communication volume.
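The communication pattern is simple to simulate: in each round every node takes a local step and pushes its model to a random sample of $s$ other nodes, which then average what they received. The sketch below uses a toy quadratic objective per node as a stand-in for real training; it illustrates the randomized topology, not the paper's analysis.

```python
# A sketch of Epidemic Learning's communication pattern: in every round each
# node takes a local step and pushes its model to a random sample of s other
# nodes; each node then averages the models it received. The local objective
# (a simple quadratic per node) is a stand-in for real training.
import numpy as np

rng = np.random.default_rng(0)
n, s, dim, rounds, lr = 16, 4, 10, 50, 0.1

targets = rng.normal(size=(n, dim))          # node i's local loss: ||x - targets[i]||^2
models = np.zeros((n, dim))

for _ in range(rounds):
    # local gradient step on each node's own objective
    models = models - lr * 2 * (models - targets)
    # each node pushes its model to s randomly chosen other nodes (changing topology)
    inbox = [[models[i]] for i in range(n)]  # every node keeps its own model
    for i in range(n):
        receivers = rng.choice([j for j in range(n) if j != i], size=s, replace=False)
        for j in receivers:
            inbox[j].append(models[i])
    # each node averages whatever it received this round
    models = np.array([np.mean(msgs, axis=0) for msgs in inbox])

print("max distance of node models to the average of local optima:",
      np.linalg.norm(models - targets.mean(axis=0), axis=1).max())
```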

Beyond Labeling Oracles: What does it mean to steal ML models?

  • paper_url: http://arxiv.org/abs/2310.01959
  • repo_url: None
  • paper_authors: Avital Shafran, Ilia Shumailov, Murat A. Erdogdu, Nicolas Papernot
  • for: Model extraction attacks, which aim to steal trained ML models through query access such as the APIs offered by ML-as-a-Service providers.
  • methods: A thorough empirical evaluation of the factors that influence the success of model extraction, in particular the adversary's prior knowledge in the form of access to in-distribution data versus the attack policy used to choose queries.
  • results: Prior knowledge dominates other factors such as the attack policy, so attackers rarely achieve the presumed savings on data acquisition; with low labeling costs, the practical usefulness of such attacks is questionable, and a benchmark is proposed to evaluate attack policies directly.
    Abstract Model extraction attacks are designed to steal trained models with only query access, as is often provided through APIs that ML-as-a-Service providers offer. ML models are expensive to train, in part because data is hard to obtain, and a primary incentive for model extraction is to acquire a model while incurring less cost than training from scratch. Literature on model extraction commonly claims or presumes that the attacker is able to save on both data acquisition and labeling costs. We show that the attacker often does not. This is because current attacks implicitly rely on the adversary being able to sample from the victim model's data distribution. We thoroughly evaluate factors influencing the success of model extraction. We discover that prior knowledge of the attacker, i.e. access to in-distribution data, dominates other factors like the attack policy the adversary follows to choose which queries to make to the victim model API. Thus, an adversary looking to develop an equally capable model with a fixed budget has little practical incentive to perform model extraction, since for the attack to work they need to collect in-distribution data, saving only on the cost of labeling. With low labeling costs in the current market, the usefulness of such attacks is questionable. Ultimately, we demonstrate that the effect of prior knowledge needs to be explicitly decoupled from the attack policy. To this end, we propose a benchmark to evaluate attack policy directly.

Causal Inference with Conditional Front-Door Adjustment and Identifiable Variational Autoencoder

  • paper_url: http://arxiv.org/abs/2310.01937
  • repo_url: None
  • paper_authors: Ziqi Xu, Debo Cheng, Jiuyong Li, Jixue Liu, Lin Liu, Kui Yu
  • for: Causal effect estimation from observational data in the presence of unobserved confounding variables.
  • methods: Conditional front-door (CFD) adjustment, with a theorem guaranteeing identifiability of the causal effect, and CFDiVAE, an identifiable Variational AutoEncoder that learns the representation of the CFD adjustment variable directly from data.
  • results: Experiments on synthetic datasets show that CFDiVAE outperforms existing methods and is less sensitive to the causal strength of unobserved confounders; an application to a real-world dataset demonstrates its practical potential.
    Abstract An essential and challenging problem in causal inference is causal effect estimation from observational data. The problem becomes more difficult with the presence of unobserved confounding variables. The front-door adjustment is a practical approach for dealing with unobserved confounding variables. However, the restriction for the standard front-door adjustment is difficult to satisfy in practice. In this paper, we relax some of the restrictions by proposing the concept of conditional front-door (CFD) adjustment and develop the theorem that guarantees the causal effect identifiability of CFD adjustment. Furthermore, as it is often impossible for a CFD variable to be given in practice, it is desirable to learn it from data. By leveraging the ability of deep generative models, we propose CFDiVAE to learn the representation of the CFD adjustment variable directly from data with the identifiable Variational AutoEncoder and formally prove the model identifiability. Extensive experiments on synthetic datasets validate the effectiveness of CFDiVAE and its superiority over existing methods. The experiments also show that the performance of CFDiVAE is less sensitive to the causal strength of unobserved confounding variables. We further apply CFDiVAE to a real-world dataset to demonstrate its potential application.
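For reference, the classical (unconditional) front-door adjustment that CFD adjustment relaxes can be computed directly from a discrete dataset, as in the sketch below; the data-generating process and variable names are invented for illustration, and this is not the CFDiVAE model itself.

```python
# The classical front-door adjustment that conditional front-door (CFD)
# adjustment generalizes: with treatment X, mediator M and outcome Y,
#   P(y | do(x)) = sum_m P(m | x) * sum_x' P(y | x', m) * P(x').
# The discrete toy data below (with a hidden confounder U) is made up.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
u = rng.binomial(1, 0.5, n)                       # unobserved confounder
x = rng.binomial(1, 0.2 + 0.6 * u)                # treatment depends on U
m = rng.binomial(1, 0.1 + 0.7 * x)                # mediator depends only on X
y = rng.binomial(1, 0.1 + 0.5 * m + 0.3 * u)      # outcome depends on M and U
df = pd.DataFrame({"x": x, "m": m, "y": y})

def front_door(df, x_val):
    p_x_prime = df["x"].value_counts(normalize=True)
    p_m_given_x = df[df["x"] == x_val]["m"].value_counts(normalize=True)
    est = 0.0
    for m_val, p_m in p_m_given_x.items():
        inner = sum(
            df[(df["x"] == xp) & (df["m"] == m_val)]["y"].mean() * p_xp
            for xp, p_xp in p_x_prime.items()
        )
        est += p_m * inner
    return est

ate = front_door(df, 1) - front_door(df, 0)
print("front-door estimate of the ATE:", round(ate, 3))
# Ground truth for this toy process: E[Y|do(x=1)] - E[Y|do(x=0)] = 0.5 * 0.7 = 0.35.
```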

Unsupervised Complex Semi-Binary Matrix Factorization for Activation Sequence Recovery of Quasi-Stationary Sources

  • paper_url: http://arxiv.org/abs/2310.02295
  • repo_url: None
  • paper_authors: Romain Delabeye, Martin Ghienne, Olivia Penas, Jean-Luc Dion
  • for: Improving the understanding of industrial processes and manufacturing systems, in particular locating energy-intensive subsystems and operations, in support of energy sustainability.
  • methods: Sparse regression and dictionary learning techniques that extract the leading dynamics from sequential sensor data.
  • results: The approach recovers the individual binary activation sequences of multiple actuators from mixed, non-intrusive sensor signals.
    Abstract Advocating for a sustainable, resilient and human-centric industry, the three pillars of Industry 5.0 call for an increased understanding of industrial processes and manufacturing systems, as well as their energy sustainability. One of the most fundamental elements of comprehension is knowing when the systems are operated, as this is key to locating energy intensive subsystems and operations. Such knowledge is often lacking in practice. Activation statuses can be recovered from sensor data though. Some non-intrusive sensors (accelerometers, current sensors, etc.) acquire mixed signals containing information about multiple actuators at once. Despite their low cost as regards the fleet of systems they monitor, additional signal processing is required to extract the individual activation sequences. To that end, sparse regression techniques can extract leading dynamics in sequential data. Notorious dictionary learning algorithms have proven effective in this regard. This paper considers different industrial settings in which the identification of binary subsystem activation sequences is sought. In this context, it is assumed that each sensor measures an extensive physical property, source signals are periodic, quasi-stationary and independent, albeit these signals may be correlated and their noise distribution is arbitrary. Existing methods either restrict these assumptions, e.g., by imposing orthogonality or noise characteristics, or lift them using additional assumptions, typically using nonlinear transforms.

Synthetic CT Generation via Variant Invertible Network for All-digital Brain PET Attenuation Correction

  • paper_url: http://arxiv.org/abs/2310.01885
  • repo_url: None
  • paper_authors: Yu Guan, Bohui Shen, Xinchong Shi, Xiangsong Zhang, Bingxuan Li, Qiegen Liu
  • for: Attenuation correction (AC) for positron emission tomography (PET), so that artifact-free and quantitatively accurate brain PET images can be obtained without anatomical (CT) imaging.
  • methods: A deep-learning approach that generates continuously valued CT images from non-attenuation-corrected PET images using an invertible network combined with a variable augmentation strategy (IVNAC).
  • results: In a study of 1440 datasets from 37 clinical patients, the proposed invertible network outperforms comparative AC models such as Cycle-GAN and Pix2pix, demonstrating the feasibility of CT-free brain PET AC.
    Abstract Attenuation correction (AC) is essential for the generation of artifact-free and quantitatively accurate positron emission tomography (PET) images. However, AC of PET faces challenges including inter-scan motion and erroneous transformation of structural voxel-intensities to PET attenuation-correction factors. Nowadays, the problem of AC for quantitative PET have been solved to a large extent after the commercial availability of devices combining PET with computed tomography (CT). Meanwhile, considering the feasibility of a deep learning approach for PET AC without anatomical imaging, this paper develops a PET AC method, which uses deep learning to generate continuously valued CT images from non-attenuation corrected PET images for AC on brain PET imaging. Specifically, an invertible network combined with the variable augmentation strategy that can achieve the bidirectional inference processes is proposed for synthetic CT generation (IVNAC). To evaluate the performance of the proposed algorithm, we conducted a comprehensive study on a total of 1440 data from 37 clinical patients using comparative algorithms (such as Cycle-GAN and Pix2pix). Perceptual analysis and quantitative evaluations illustrate that the invertible network for PET AC outperforms other existing AC models, which demonstrates the potential of the proposed method and the feasibility of achieving brain PET AC without CT.

AutoCast++: Enhancing World Event Prediction with Zero-shot Ranking-based Context Retrieval

  • paper_url: http://arxiv.org/abs/2310.01880
  • repo_url: None
  • paper_authors: Qi Yan, Raihan Seraj, Jiawei He, Lili Meng, Tristan Sylvain
  • for: Machine-based prediction of real-world events to support informed decision-making; recent language models enable forecasting from unstructured text rather than only structured time series.
  • methods: AutoCast++, a zero-shot ranking-based context retrieval system that re-ranks news articles by question-passage relevance, prefers more recent articles, and applies zero-shot summarization to obtain concise context, all with a pre-trained language model and no domain-specific training.
  • results: Marked improvements across multiple metrics, with performance gains of up to 48% on multiple-choice questions (MCQ) and up to 8% on true/false (TF) questions.
    Abstract Machine-based prediction of real-world events is garnering attention due to its potential for informed decision-making. Whereas traditional forecasting predominantly hinges on structured data like time-series, recent breakthroughs in language models enable predictions using unstructured text. In particular, (Zou et al., 2022) unveils AutoCast, a new benchmark that employs news articles for answering forecasting queries. Nevertheless, existing methods still trail behind human performance. The cornerstone of accurate forecasting, we argue, lies in identifying a concise, yet rich subset of news snippets from a vast corpus. With this motivation, we introduce AutoCast++, a zero-shot ranking-based context retrieval system, tailored to sift through expansive news document collections for event forecasting. Our approach first re-ranks articles based on zero-shot question-passage relevance, honing in on semantically pertinent news. Following this, the chosen articles are subjected to zero-shot summarization to attain succinct context. Leveraging a pre-trained language model, we conduct both the relevance evaluation and article summarization without needing domain-specific training. Notably, recent articles can sometimes be at odds with preceding ones due to new facts or unanticipated incidents, leading to fluctuating temporal dynamics. To tackle this, our re-ranking mechanism gives preference to more recent articles, and we further regularize the multi-passage representation learning to align with human forecaster responses made on different dates. Empirical results underscore marked improvements across multiple metrics, improving the performance for multiple-choice questions (MCQ) by 48% and true/false (TF) questions by up to 8%.
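A toy version of the retrieval step looks as follows: score candidate news snippets against the forecasting question, prefer more recent articles, and keep the top-k as context. In the sketch below, TF-IDF cosine similarity stands in for the zero-shot language-model relevance scoring, and the articles, dates, and recency decay rate are all invented.

```python
# A toy version of the retrieval step described above: score news snippets
# against the forecasting question, prefer more recent articles, and keep the
# top-k as context. TF-IDF cosine similarity is a stand-in for the zero-shot
# LM-based relevance scoring used by AutoCast++; the articles are invented.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

question = "Will the central bank raise interest rates at its next meeting?"
articles = [
    ("2023-09-30", "Central bank officials signal another interest rate hike."),
    ("2023-06-11", "Inflation eases slightly but remains above target."),
    ("2023-09-28", "Markets price in a pause on rate increases."),
    ("2022-01-05", "Historic review of monetary policy decisions."),
]

texts = [question] + [text for _, text in articles]
tfidf = TfidfVectorizer().fit_transform(texts)
relevance = cosine_similarity(tfidf[0], tfidf[1:]).ravel()

# Recency preference: discount older articles with an exponential decay over
# the article's age in days (the decay rate is an arbitrary choice here).
ages = np.array([(np.datetime64("2023-10-01") - np.datetime64(d)).astype(int)
                 for d, _ in articles])
score = relevance * np.exp(-ages / 180.0)

top_k = np.argsort(-score)[:2]
for rank, idx in enumerate(top_k, 1):
    print(rank, articles[idx][0], articles[idx][1])
```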

DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01870
  • repo_url: None
  • paper_authors: Albert Garde, Esben Kran, Fazl Barez
  • for: An accessible and transparent tool for analyzing neuron activations in transformer-based large language models (LLMs).
  • methods: DeepDecipher, an API and interface for probing neurons in the MLP layers of transformer models, which exposes the outputs of advanced interpretability techniques and supports inspecting neurons and comparing models (e.g., against tools such as Neuroscope and OpenAI's Neuron Explainer).
  • results: Researchers, engineers, and developers can quickly diagnose issues, audit systems, and gain insights into model behavior, making LLMs more transparent, trustworthy, and safe.
    Abstract As large language models (LLMs) become more capable, there is an urgent need for interpretable and transparent tools. Current methods are difficult to implement, and accessible tools to analyze model internals are lacking. To bridge this gap, we present DeepDecipher - an API and interface for probing neurons in transformer models' MLP layers. DeepDecipher makes the outputs of advanced interpretability techniques for LLMs readily available. The easy-to-use interface also makes inspecting these complex models more intuitive. This paper outlines DeepDecipher's design and capabilities. We demonstrate how to analyze neurons, compare models, and gain insights into model behavior. For example, we contrast DeepDecipher's functionality with similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher enables efficient, scalable analysis of LLMs. By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe. Researchers, engineers, and developers can quickly diagnose issues, audit systems, and advance the field.

High-Probability Convergence for Composite and Distributed Stochastic Minimization and Variational Inequalities with Heavy-Tailed Noise

  • paper_url: http://arxiv.org/abs/2310.01860
  • repo_url: None
  • paper_authors: Eduard Gorbunov, Abdurakhmon Sadiev, Marina Danilova, Samuel Horváth, Gauthier Gidel, Pavel Dvurechensky, Alexander Gasnikov, Peter Richtárik
  • for: High-probability analysis of stochastic first-order methods for composite and distributed stochastic minimization and variational inequalities under heavy-tailed noise.
  • methods: New stochastic methods based on clipping of stochastic gradient differences, with high-probability convergence proofs.
  • results: Tight (including nearly optimal) high-probability convergence guarantees that cover important special cases, such as strongly convex problems, which previous results for composite/distributed problems did not handle optimally.
    Abstract High-probability analysis of stochastic first-order optimization methods under mild assumptions on the noise has been gaining a lot of attention in recent years. Typically, gradient clipping is one of the key algorithmic ingredients to derive good high-probability guarantees when the noise is heavy-tailed. However, if implemented na\"ively, clipping can spoil the convergence of the popular methods for composite and distributed optimization (Prox-SGD/Parallel SGD) even in the absence of any noise. Due to this reason, many works on high-probability analysis consider only unconstrained non-distributed problems, and the existing results for composite/distributed problems do not include some important special cases (like strongly convex problems) and are not optimal. To address this issue, we propose new stochastic methods for composite and distributed optimization based on the clipping of stochastic gradient differences and prove tight high-probability convergence results (including nearly optimal ones) for the new methods. Using similar ideas, we also develop new methods for composite and distributed variational inequalities and analyze the high-probability convergence of these methods.
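The clipping operator at the core of such high-probability guarantees is simple; the sketch below applies it to stochastic gradient differences in a toy SGD-like loop on a quadratic with heavy-tailed noise. The loop illustrates the clipping ingredient only and does not reproduce the paper's composite/distributed methods or step-size choices.

```python
# The clipping operator at the heart of high-probability guarantees under
# heavy-tailed noise, applied in a small SGD-like loop on a toy quadratic.
# The paper's methods clip stochastic gradient *differences* inside more
# elaborate composite/distributed schemes; this loop only shows the ingredient.
import numpy as np

rng = np.random.default_rng(0)

def clip(v, lam):
    """Scale v down so that its norm is at most lam."""
    norm = np.linalg.norm(v)
    return v if norm <= lam else v * (lam / norm)

dim, steps, lr, lam = 10, 2000, 0.05, 5.0
x_star = np.ones(dim)
x = np.zeros(dim)
g_prev = np.zeros(dim)

for _ in range(steps):
    noise = rng.standard_t(df=2.1, size=dim)      # heavy-tailed noise
    g = 2 * (x - x_star) + noise                  # stochastic gradient
    # clip the gradient difference, then re-add the previous estimate
    g_est = g_prev + clip(g - g_prev, lam)
    x = x - lr * g_est
    g_prev = g_est

print("distance to optimum after clipped updates:", np.linalg.norm(x - x_star))
```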

Variational Gaussian approximation of the Kushner optimal filter

  • paper_url: http://arxiv.org/abs/2310.01859
  • repo_url: None
  • paper_authors: Marc Lambert, Silvère Bonnabel, Francis Bach
  • for: Approximating the solution of the Kushner equation, which describes the evolution of the probability density of the state of a dynamical system given continuous-time observations.
  • methods: Two tractable variational Gaussian approximations, one based on a proximal loss with the Wasserstein metric and one with the Fisher metric; the two variational updates can be fused and satisfy a set of stochastic differential equations (SDEs) on the Gaussian's mean and covariance matrix.
  • results: The resulting Gaussian flow is consistent with the Kalman-Bucy and Riccati flows in the linear case and generalizes them in the nonlinear case.
    Abstract In estimation theory, the Kushner equation provides the evolution of the probability density of the state of a dynamical system given continuous-time observations. Building upon our recent work, we propose a new way to approximate the solution of the Kushner equation through tractable variational Gaussian approximations of two proximal losses associated with the propagation and Bayesian update of the probability density. The first is a proximal loss based on the Wasserstein metric and the second is a proximal loss based on the Fisher metric. The solution to this last proximal loss is given by implicit updates on the mean and covariance that we proposed earlier. These two variational updates can be fused and shown to satisfy a set of stochastic differential equations on the Gaussian's mean and covariance matrix. This Gaussian flow is consistent with the Kalman-Bucy and Riccati flows in the linear case and generalize them in the nonlinear one.

Score-based Data Assimilation for a Two-Layer Quasi-Geostrophic Model

  • paper_url: http://arxiv.org/abs/2310.01853
  • repo_url: https://github.com/francois-rozet/sda
  • paper_authors: François Rozet, Gilles Louppe
  • for: Identifying plausible state trajectories of dynamical systems given noisy or incomplete observations.
  • methods: Score-based data assimilation (SDA), with modifications to the score network architecture that significantly reduce memory consumption and execution time.
  • results: Promising results for a two-layer quasi-geostrophic model.
    Abstract Data assimilation addresses the problem of identifying plausible state trajectories of dynamical systems given noisy or incomplete observations. In geosciences, it presents challenges due to the high-dimensionality of geophysical dynamical systems, often exceeding millions of dimensions. This work assesses the scalability of score-based data assimilation (SDA), a novel data assimilation method, in the context of such systems. We propose modifications to the score network architecture aimed at significantly reducing memory consumption and execution time. We demonstrate promising results for a two-layer quasi-geostrophic model.

EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis

  • paper_url: http://arxiv.org/abs/2310.01835
  • repo_url: https://github.com/crowdstrike/embersim-databank
  • paper_authors: Dragos Georgian Corlatescu, Alexandru Dinu, Mihaela Gaman, Paul Sumedrea
  • for: Boosting similarity research on binary files in order to strengthen malware detection.
  • methods: The EMBER malware classification dataset is enhanced with similarity information and with malware class tags determined automatically using the open-source tool AVClass on VirusTotal data.
  • results: EMBERSim, a large-scale databank for similarity search in malware analysis, together with the implementation of a class scoring technique and a leaf similarity method.
    Abstract In recent years there has been a shift from heuristics-based malware detection towards machine learning, which proves to be more robust in the current heavily adversarial threat landscape. While we acknowledge machine learning to be better equipped to mine for patterns in the increasingly high amounts of similar-looking files, we also note a remarkable scarcity of the data available for similarity-targeted research. Moreover, we observe that the focus in the few related works falls on quantifying similarity in malware, often overlooking the clean data. This one-sided quantification is especially dangerous in the context of detection bypass. We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER - one of the largest malware classification data sets. We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space. Our contribution is threefold: (1) we publish EMBERSim, an augmented version of EMBER, that includes similarity-informed tags; (2) we enrich EMBERSim with automatically determined malware class tags using the open-source tool AVClass on VirusTotal data and (3) we describe and share the implementation for our class scoring technique and leaf similarity method.

Towards Robust Fidelity for Evaluating Explainability of Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01820
  • repo_url: None
  • paper_authors: Xu Zheng, Farhad Shirani, Tianchun Wang, Wei Cheng, Zhuomin Chen, Haifeng Chen, Hua Wei, Dongsheng Luo
  • for: Evaluating GNN explanation functions with fidelity measures that align with a formal, information-theoretic definition of explainability.
  • methods: An information-theoretic definition of explainability and a robust class of fidelity measures that avoid the distribution-shift issues affecting existing metrics such as $Fid_+$, $Fid_-$, and $Fid_\Delta$.
  • results: Analysis and extensive experiments on synthetic and real datasets show that the proposed metrics are resilient to distribution shift and more coherent with gold-standard metrics than previous fidelity measures.
    Abstract Graph Neural Networks (GNNs) are neural models that leverage the dependency structure in graphical data via message passing among the graph nodes. GNNs have emerged as pivotal architectures in analyzing graph-structured data, and their expansive application in sensitive domains requires a comprehensive understanding of their decision-making processes -- necessitating a framework for GNN explainability. An explanation function for GNNs takes a pre-trained GNN along with a graph as input, to produce a `sufficient statistic' subgraph with respect to the graph label. A main challenge in studying GNN explainability is to provide fidelity measures that evaluate the performance of these explanation functions. This paper studies this foundational challenge, spotlighting the inherent limitations of prevailing fidelity metrics, including $Fid_+$, $Fid_-$, and $Fid_\Delta$. Specifically, a formal, information-theoretic definition of explainability is introduced and it is shown that existing metrics often fail to align with this definition across various statistical scenarios. The reason is due to potential distribution shifts when subgraphs are removed in computing these fidelity measures. Subsequently, a robust class of fidelity measures are introduced, and it is shown analytically that they are resilient to distribution shift issues and are applicable in a wide range of scenarios. Extensive empirical analysis on both synthetic and real datasets are provided to illustrate that the proposed metrics are more coherent with gold standard metrics.
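For orientation, the sketch below computes a common probability-based variant of $Fid_+$ and $Fid_-$ against a generic classifier callable; the tiny stand-in "model" is fabricated so the snippet runs, and the exact definitions analyzed in the paper may differ in notation.

```python
# A sketch of the kind of perturbation-based fidelity computation the paper
# analyzes: compare the model's prediction on the full graph with its
# prediction when the explanation subgraph is removed (Fid+) or kept (Fid-).
# `model` is any callable returning class probabilities for an edge mask; the
# tiny "model" below is fabricated purely so the snippet runs.
import numpy as np

def model(edge_mask):
    """Stand-in classifier: probability of class 1 grows with the masked-in edges."""
    score = edge_mask @ np.array([2.0, -1.0, 0.5, 1.5])
    p1 = 1.0 / (1.0 + np.exp(-score))
    return np.array([1.0 - p1, p1])

def fidelity(model, explanation_mask, target_class):
    full = np.ones_like(explanation_mask)
    p_full = model(full)[target_class]
    # Fid+: drop in confidence when the explanation edges are removed
    fid_plus = p_full - model(full - explanation_mask)[target_class]
    # Fid-: drop in confidence when only the explanation edges are kept
    fid_minus = p_full - model(explanation_mask)[target_class]
    return fid_plus, fid_minus

explanation = np.array([1.0, 0.0, 0.0, 1.0])   # edges selected by an explainer
print(fidelity(model, explanation, target_class=1))
# High Fid+ and low Fid- indicate that the selected edges carry the evidence the
# classifier relies on; the paper shows how such measures can be distorted when
# the perturbed graphs fall outside the data distribution.
```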

AutoLoRa: A Parameter-Free Automated Robust Fine-Tuning Framework

  • paper_url: http://arxiv.org/abs/2310.01818
  • repo_url: None
  • paper_authors: Xilie Xu, Jingfeng Zhang, Mohan Kankanhalli
  • for: Obtaining adversarial robustness in downstream applications through robust fine-tuning (RFT), without large computational resources or extensive data collection.
  • methods: A low-rank (LoRa) branch that disentangles RFT into optimizing natural objectives via the LoRa branch and adversarial objectives via the feature extractor, together with heuristic strategies that automate the scheduling of the learning rate and the scalars of the loss terms.
  • results: The proposed automated AutoLoRa achieves new state-of-the-art results across a range of downstream tasks and converts a pre-trained feature extractor into an adversarially robust model without hyperparameter search.
    Abstract Robust Fine-Tuning (RFT) is a low-cost strategy to obtain adversarial robustness in downstream applications, without requiring a lot of computational resources and collecting significant amounts of data. This paper uncovers an issue with the existing RFT, where optimizing both adversarial and natural objectives through the feature extractor (FE) yields significantly divergent gradient directions. This divergence introduces instability in the optimization process, thereby hindering the attainment of adversarial robustness and rendering RFT highly sensitive to hyperparameters. To mitigate this issue, we propose a low-rank (LoRa) branch that disentangles RFT into two distinct components: optimizing natural objectives via the LoRa branch and adversarial objectives via the FE. Besides, we introduce heuristic strategies for automating the scheduling of the learning rate and the scalars of loss terms. Extensive empirical evaluations demonstrate that our proposed automated RFT disentangled via the LoRa branch (AutoLoRa) achieves new state-of-the-art results across a range of downstream tasks. AutoLoRa holds significant practical utility, as it automatically converts a pre-trained FE into an adversarially robust model for downstream tasks without the need for searching hyperparameters.

What Determines the Price of NFTs?

  • paper_url: http://arxiv.org/abs/2310.01815
  • repo_url: https://github.com/paralleluniversium/pulproject
  • paper_authors: Vivian Ziemke, Benjamin Estermann, Roger Wattenhofer, Ye Wang
  • for: Understanding what determines the price of Non-Fungible Tokens (NFTs).
  • methods: Analysis of on-chain and off-chain data, including text and image data, of NFT collections traded on OpenSea.
  • results: Text and image features explain price variation within collections but do not generalize to new, unseen collections; a collection's trading volume often relates to its online presence, such as social media followers and website traffic.
    Abstract In the evolving landscape of digital art, Non-Fungible Tokens (NFTs) have emerged as a groundbreaking platform, bridging the realms of art and technology. NFTs serve as the foundational framework that has revolutionized the market for digital art, enabling artists to showcase and monetize their creations in unprecedented ways. NFTs combine metadata stored on the blockchain with off-chain data, such as images, to create a novel form of digital ownership. It is not fully understood how these factors come together to determine NFT prices. In this study, we analyze both on-chain and off-chain data of NFT collections trading on OpenSea to understand what influences NFT pricing. Our results show that while text and image data of the NFTs can be used to explain price variations within collections, the extracted features do not generalize to new, unseen collections. Furthermore, we find that an NFT collection's trading volume often relates to its online presence, like social media followers and website traffic.

Simulation-based Inference with the Generalized Kullback-Leibler Divergence

  • paper_url: http://arxiv.org/abs/2310.01808
  • repo_url: None
  • paper_authors: Benjamin Kurt Miller, Marco Federici, Christoph Weniger, Patrick Forré
  • for: Solving the inverse problem in simulation-based inference, where the likelihood is only known implicitly.
  • methods: A generalized Kullback-Leibler divergence that accounts for the normalization constant of unnormalized distributions, recovering Neural Posterior Estimation when the model class is normalized and unifying it with Neural Ratio Estimation in a single objective; a hybrid model learns a normalized base distribution together with a learned ratio.
  • results: Benchmark results show that the hybrid model offers the best of both worlds by combining a normalized base distribution with a learned ratio.
    Abstract In Simulation-based Inference, the goal is to solve the inverse problem when the likelihood is only known implicitly. Neural Posterior Estimation commonly fits a normalized density estimator as a surrogate model for the posterior. This formulation cannot easily fit unnormalized surrogates because it optimizes the Kullback-Leibler divergence. We propose to optimize a generalized Kullback-Leibler divergence that accounts for the normalization constant in unnormalized distributions. The objective recovers Neural Posterior Estimation when the model class is normalized and unifies it with Neural Ratio Estimation, combining both into a single objective. We investigate a hybrid model that offers the best of both worlds by learning a normalized base distribution and a learned ratio. We also present benchmark results.

GNNX-BENCH: Unravelling the Utility of Perturbation-based GNN Explainers through In-depth Benchmarking

  • paper_url: http://arxiv.org/abs/2310.01794
  • repo_url: None
  • paper_authors: Mert Kosan, Samidha Verma, Burouj Armgaan, Khushbu Pahwa, Ambuj Singh, Sourav Medya, Sayan Ranu
  • for: This paper aims to provide a comprehensive understanding of the state-of-the-art explainability methods for Graph Neural Networks (GNNs), and to identify potential research problems for further enhancement.
  • methods: The paper presents a benchmarking study on perturbation-based explainability methods for GNNs, evaluating and comparing a wide range of explainability techniques.
  • results: The study reveals that all algorithms are affected by stability issues when faced with noisy data, and that current counterfactual explainers often fail to provide feasible recourses due to violations of topological constraints encoded by domain-specific considerations.
    Abstract Numerous explainability methods have been proposed to shed light on the inner workings of GNNs. Despite the inclusion of empirical evaluations in all the proposed algorithms, the interrogative aspects of these evaluations lack diversity. As a result, various facets of explainability pertaining to GNNs, such as a comparative analysis of counterfactual reasoners, their stability to variational factors such as different GNN architectures, noise, stochasticity in non-convex loss surfaces, feasibility amidst domain constraints, and so forth, have yet to be formally investigated. Motivated by this need, we present a benchmarking study on perturbation-based explainability methods for GNNs, aiming to systematically evaluate and compare a wide range of explainability techniques. Among the key findings of our study, we identify the Pareto-optimal methods that exhibit superior efficacy and stability in the presence of noise. Nonetheless, our study reveals that all algorithms are affected by stability issues when faced with noisy data. Furthermore, we have established that the current generation of counterfactual explainers often fails to provide feasible recourses due to violations of topological constraints encoded by domain-specific considerations. Overall, this benchmarking study empowers stakeholders in the field of GNNs with a comprehensive understanding of the state-of-the-art explainability methods, potential research problems for further enhancement, and the implications of their application in real-world scenarios.

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

  • paper_url: http://arxiv.org/abs/2310.01769
  • repo_url: None
  • paper_authors: Nuoya Xiong, Lijun Ding, Simon S. Du
  • for: A rigorous account of how over-parameterization changes the convergence behavior of gradient descent (GD) for matrix sensing, i.e., recovering an unknown low-rank ground-truth matrix $M^{*}$ from near-isotropic linear measurements.
  • methods: A novel $\Omega(1/T^2)$ lower bound for randomly initialized GD in the over-parameterized symmetric setting ($k>r$), together with global exact convergence results for the asymmetric parameterization $FG^\top$, building on prior work.
  • results: With exact parameterization ($k=r$) GD converges at rate $\exp(-\Omega(T))$, whereas the over-parameterized symmetric setting slows to $\Omega(1/T^2)$; the asymmetric over-parameterized case converges at rate $\exp(-\Omega(\alpha^2 T))$, where $\alpha$ is the initialization scale, and a one-step modification of GD achieves a convergence rate independent of $\alpha$.
    Abstract This paper rigorously shows how over-parameterization changes the convergence behaviors of gradient descent (GD) for the matrix sensing problem, where the goal is to recover an unknown low-rank ground-truth matrix from near-isotropic linear measurements. First, we consider the symmetric setting with the symmetric parameterization where $M^* \in \mathbb{R}^{n \times n}$ is a positive semi-definite unknown matrix of rank $r \ll n$, and one uses a symmetric parameterization $XX^\top$ to learn $M^*$. Here $X \in \mathbb{R}^{n \times k}$ with $k > r$ is the factor matrix. We give a novel $\Omega (1/T^2)$ lower bound of randomly initialized GD for the over-parameterized case ($k >r$) where $T$ is the number of iterations. This is in stark contrast to the exact-parameterization scenario ($k=r$) where the convergence rate is $\exp (-\Omega (T))$. Next, we study asymmetric setting where $M^* \in \mathbb{R}^{n_1 \times n_2}$ is the unknown matrix of rank $r \ll \min\{n_1,n_2\}$, and one uses an asymmetric parameterization $FG^\top$ to learn $M^*$ where $F \in \mathbb{R}^{n_1 \times k}$ and $G \in \mathbb{R}^{n_2 \times k}$. Building on prior work, we give a global exact convergence result of randomly initialized GD for the exact-parameterization case ($k=r$) with an $\exp (-\Omega(T))$ rate. Furthermore, we give the first global exact convergence result for the over-parameterization case ($k>r$) with an $\exp(-\Omega(\alpha^2 T))$ rate where $\alpha$ is the initialization scale. This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence. On the other hand, we propose a novel method that only modifies one step of GD and obtains a convergence rate independent of $\alpha$, recovering the rate in the exact-parameterization case.
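The contrast between exact and over-parameterized factorizations is easy to reproduce numerically. The numpy sketch below runs randomly initialized GD on the symmetric parameterization $XX^\top$ for $k=r$ and $k>r$; the dimensions, step size, initialization scale, and number of measurements are arbitrary illustration choices, not the paper's settings.

```python
# Randomly initialized gradient descent for symmetric matrix sensing with the
# parameterization X X^T, comparing the exact-parameterized (k = r) and the
# over-parameterized (k > r) factorizations. Problem sizes, step size and the
# number of measurements are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 400                               # dimension, rank, measurements

U = rng.normal(size=(n, r))
M_star = U @ U.T / n                               # ground-truth PSD matrix of rank r
A = rng.normal(size=(m, n, n)) / np.sqrt(m)        # Gaussian sensing matrices
b = np.einsum("mij,ij->m", A, M_star)              # measurements <A_i, M*>

def run_gd(k, steps=3000, lr=0.2, init_scale=1e-3):
    X = init_scale * rng.normal(size=(n, k))
    for _ in range(steps):
        residual = np.einsum("mij,ij->m", A, X @ X.T) - b
        grad_M = np.einsum("m,mij->ij", residual, A)
        X = X - lr * (grad_M + grad_M.T) @ X       # chain rule through X X^T
    return np.linalg.norm(X @ X.T - M_star) / np.linalg.norm(M_star)

print("relative error, exact parameterization (k=r): ", run_gd(k=r))
print("relative error, over-parameterization (k=2r): ", run_gd(k=2 * r))
# In the regime analyzed in the paper, the exact parameterization converges at a
# linear exp(-Omega(T)) rate while the over-parameterized run slows to a
# polynomial Omega(1/T^2) rate.
```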

Backdiff: a diffusion model for generalized transferable protein backmapping

  • paper_url: http://arxiv.org/abs/2310.01768
  • repo_url: None
  • paper_authors: Yikai Liu, Ming Chen, Guang Lin
  • for: A generalized, transferable backmapping method that can be applied across different coarse-grained (CG) models and proteins.
  • methods: BackDiff, a generative model based on a conditional score-based diffusion model with geometric representations, trained in a self-supervised framework that adapts to different CG atoms and constrains the diffusion sampling paths with arbitrary CG auxiliary variables as conditions.
  • results: Comprehensive experiments over multiple popular CG models show that BackDiff outperforms existing state-of-the-art approaches and offers a generalization and flexibility they cannot, allowing efficient sampling across proteins and CG models without retraining.
    Abstract Coarse-grained (CG) models play a crucial role in the study of protein structures, protein thermodynamic properties, and protein conformation dynamics. Due to the information loss in the coarse-graining process, backmapping from CG to all-atom configurations is essential in many protein design and drug discovery applications when detailed atomic representations are needed for in-depth studies. Despite recent progress in data-driven backmapping approaches, devising a backmapping method that can be universally applied across various CG models and proteins remains unresolved. In this work, we propose BackDiff, a new generative model designed to achieve generalization and reliability in the protein backmapping problem. BackDiff leverages the conditional score-based diffusion model with geometric representations. Since different CG models can contain different coarse-grained sites which include selected atoms (CG atoms) and simple CG auxiliary functions of atomistic coordinates (CG auxiliary variables), we design a self-supervised training framework to adapt to different CG atoms, and constrain the diffusion sampling paths with arbitrary CG auxiliary variables as conditions. Our method facilitates end-to-end training and allows efficient sampling across different proteins and diverse CG models without the need for retraining. Comprehensive experiments over multiple popular CG models demonstrate BackDiff's superior performance to existing state-of-the-art approaches, and generalization and flexibility that these approaches cannot achieve. A pretrained BackDiff model can offer a convenient yet reliable plug-and-play solution for protein researchers, enabling them to investigate further from their own CG models.

Exploring Counterfactual Alignment Loss towards Human-centered AI

  • paper_url: http://arxiv.org/abs/2310.01766
  • repo_url: None
  • paper_authors: Mingzhou Liu, Xinwei Sun, Ching-Wen Lee, Yu Qiao, Yizhou Wang
  • for: Improving the trustworthiness of deep neural networks in supervised learning tasks, especially in safety-critical domains such as healthcare.
  • methods: A human-centered framework that uses counterfactual generation for causal attribution, introducing a CounterFactual Alignment (CF-Align) loss that aligns the features attributed by the counterfactual generation with human annotations; the loss is optimized via the implicit function theorem for backpropagation and is architecture-agnostic.
  • results: Experiments on a lung cancer diagnosis dataset demonstrate faithful alignment with human annotations.
    Abstract Deep neural networks have demonstrated impressive accuracy in supervised learning tasks. However, their lack of transparency makes it hard for humans to trust their results, especially in safe-critic domains such as healthcare. To address this issue, recent explanation-guided learning approaches proposed to align the gradient-based attention map to image regions annotated by human experts, thereby obtaining an intrinsically human-centered model. However, the attention map these methods are based on may fail to causally attribute the model predictions, thus compromising their validity for alignment. To address this issue, we propose a novel human-centered framework based on counterfactual generation. In particular, we utilize the counterfactual generation's ability for causal attribution to introduce a novel loss called the CounterFactual Alignment (CF-Align) loss. This loss guarantees that the features attributed by the counterfactual generation for the classifier align with the human annotations. To optimize the proposed loss that entails a counterfactual generation with an implicit function form, we leverage the implicit function theorem for backpropagation. Our method is architecture-agnostic and, therefore can be applied to any neural network. We demonstrate the effectiveness of our method on a lung cancer diagnosis dataset, showcasing faithful alignment to humans.

Data Cleaning and Machine Learning: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2310.01765
  • repo_url: None
  • paper_authors: Pierre-Olivier Côté, Amin Nikanjam, Nafisa Ahmed, Dmytro Humeniuk, Foutse Khomh
  • for: This paper summarizes the latest approaches to data cleaning for ML and to using ML for data cleaning.
  • methods: A systematic literature review of papers published between 2016 and 2022 inclusive.
  • results: The review covers 101 papers spanning data cleaning activities such as feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning, and offers 24 recommendations for future work.
    Abstract Context: Machine Learning (ML) is integrated into a growing number of systems for various applications. Because the performance of an ML model is highly dependent on the quality of the data it has been trained on, there is a growing interest in approaches to detect and repair data errors (i.e., data cleaning). Researchers are also exploring how ML can be used for data cleaning; hence creating a dual relationship between ML and data cleaning. To the best of our knowledge, there is no study that comprehensively reviews this relationship. Objective: This paper's objectives are twofold. First, it aims to summarize the latest approaches for data cleaning for ML and ML for data cleaning. Second, it provides future work recommendations. Method: We conduct a systematic literature review of the papers published between 2016 and 2022 inclusively. We identify different types of data cleaning activities with and for ML: feature cleaning, label cleaning, entity matching, outlier detection, imputation, and holistic data cleaning. Results: We summarize the content of 101 papers covering various data cleaning activities and provide 24 future work recommendations. Our review highlights many promising data cleaning techniques that can be further extended. Conclusion: We believe that our review of the literature will help the community develop better approaches to clean data.
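To make the "ML for data cleaning" direction concrete, the snippet below applies two of the surveyed activities, ML-based imputation and outlier detection, to synthetic data with scikit-learn; it is an illustrative example unrelated to any specific paper in the review.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
X[::50] += 8.0                             # inject a few outliers
X[rng.random(X.shape) < 0.05] = np.nan     # inject missing values

X_imputed = IterativeImputer(random_state=0).fit_transform(X)           # ML-based imputation
inlier = IsolationForest(random_state=0).fit_predict(X_imputed) == 1    # outlier detection
X_clean = X_imputed[inlier]
print(X.shape, "->", X_clean.shape)
```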

Sampling Multimodal Distributions with the Vanilla Score: Benefits of Data-Based Initialization

  • paper_url: http://arxiv.org/abs/2310.01762
  • repo_url: None
  • paper_authors: Frederic Koehler, Thuy-Duong Vuong
  • for: This paper studies learning and sampling data distributions with models based on the vanilla score function.
  • methods: It builds on Hyvärinen's vanilla score matching and analyzes Langevin dynamics run on a score function estimated from data.
  • results: It proves that the Langevin diffusion with early stopping, initialized at the empirical distribution of the data, successfully generates natural multimodal distributions (mixtures of log-concave distributions).
    Abstract There is a long history, as well as a recent explosion of interest, in statistical and generative modeling approaches based on score functions -- derivatives of the log-likelihood of a distribution. In seminal works, Hyvärinen proposed vanilla score matching as a way to learn distributions from data by computing an estimate of the score function of the underlying ground truth, and established connections between this method and established techniques like Contrastive Divergence and Pseudolikelihood estimation. It is by now well-known that vanilla score matching has significant difficulties learning multimodal distributions. Although there are various ways to overcome this difficulty, the following question has remained unanswered -- is there a natural way to sample multimodal distributions using just the vanilla score? Inspired by a long line of related experimental works, we prove that the Langevin diffusion with early stopping, initialized at the empirical distribution, and run on a score function estimated from data successfully generates natural multimodal distributions (mixtures of log-concave distributions).
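The positive result is easy to illustrate on a toy example. The sketch below runs the Langevin diffusion with early stopping, initialized at an empirical sample from a three-mode Gaussian mixture; for simplicity it uses the exact mixture score in place of a score estimated from data, which is an assumption of this illustration rather than the paper's setting.

```python
import numpy as np

def score_mixture(x, means, sigma=1.0):
    """Score (d/dx log p) of an equal-weight 1-D Gaussian mixture; stands in for a
    score function estimated from data."""
    d = x[:, None] - means[None, :]
    w = np.exp(-0.5 * (d / sigma) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return -(w * d).sum(axis=1) / sigma**2

rng = np.random.default_rng(0)
means = np.array([-6.0, 0.0, 6.0])
data = rng.choice(means, size=2000) + rng.normal(size=2000)   # empirical sample

x = data.copy()                      # data-based initialization
step, n_steps = 1e-2, 200            # early stopping: only a modest number of steps
for _ in range(n_steps):
    x = x + step * score_mixture(x, means) + np.sqrt(2 * step) * rng.normal(size=x.shape)

# fraction of samples in each of the three modes after the run
print(np.histogram(x, bins=[-np.inf, -3, 3, np.inf])[0] / len(x))
```

The final histogram should still place roughly a third of the mass in each mode, which is the qualitative behavior the theorem guarantees.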

Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling

  • paper_url: http://arxiv.org/abs/2310.01758
  • repo_url: None
  • paper_authors: Cunzhi Zhao, Xingpeng Li
  • for: This paper addresses the difficulty of solving optimization problems that embed neural network models with ReLU activation functions.
  • methods: Four linearization methods tailored to the ReLU activation function are developed, analyzed, and compared.
  • results: Results show that the proposed linearization methods effectively remove the nonlinearity introduced by the neural network model and make the embedded optimization problem tractable.
    Abstract Neural networks have been widely applied in the power system area. They can be used for better predicting input information and modeling system performance with increased accuracy. In some applications such as battery degradation neural network-based microgrid day-ahead energy scheduling, the input features of the trained learning model are variables to be solved in optimization models that enforce limits on the output of the same learning model. This will create a neural network-embedded optimization problem; the use of nonlinear activation functions in the neural network will make such problems extremely hard to solve if not unsolvable. To address this emerging challenge, this paper investigated different methods for linearizing the nonlinear activation functions with a particular focus on the widely used rectified linear unit (ReLU) function. Four linearization methods tailored for the ReLU activation function are developed, analyzed and compared in this paper. Each method employs a set of linear constraints to replace the ReLU function, effectively linearizing the optimization problem, which can overcome the computational challenges associated with the nonlinearity of the neural network model. These proposed linearization methods provide valuable tools for effectively solving optimization problems that integrate neural network models with ReLU activation functions.
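For context, one widely used way to linearize y = max(0, x) inside a mixed-integer program, given finite bounds l <= x <= u, is the textbook big-M encoding below with a binary indicator z; the four methods developed in the paper are tailored variants and need not coincide with this formulation.

```latex
% Big-M encoding of y = ReLU(x) = max(0, x), assuming known bounds \ell \le x \le u with \ell < 0 < u:
\begin{aligned}
  & y \ge 0, \qquad y \ge x, \\
  & y \le x - \ell\,(1 - z), \qquad y \le u\, z, \qquad z \in \{0, 1\}.
\end{aligned}
```

When z = 1 the constraints collapse to y = x with x >= 0, and when z = 0 they collapse to y = 0 with x <= 0, so the binary variable selects the active piece of the ReLU and the neural network constraint becomes mixed-integer linear.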

CausalTime: Realistically Generated Time-series for Benchmarking of Causal Discovery

  • paper_url: http://arxiv.org/abs/2310.01753
  • repo_url: https://github.com/jarrycyx/unn
  • paper_authors: Yuxiao Cheng, Ziqian Wang, Tingxiong Xiao, Qin Zhong, Jinli Suo, Kunlun He
  • for: This work aims to provide a reliable way to evaluate time-series causal discovery (TSCD) algorithms on data that reflects real applications.
  • methods: It introduces CausalTime, a data-generation pipeline that produces time series highly resembling real data together with ground-truth causal graphs.
  • results: Qualitative and quantitative experiments confirm the fidelity of the generated data, which is then used to benchmark existing TSCD algorithms; a user-friendly website (www.causaltime.cc) makes the approach easy to use.
    Abstract Time-series causal discovery (TSCD) is a fundamental problem of machine learning. However, existing synthetic datasets cannot properly evaluate or predict the algorithms' performance on real data. This study introduces the CausalTime pipeline to generate time-series that highly resemble the real data and with ground truth causal graphs for quantitative performance evaluation. The pipeline starts from real observations in a specific scenario and produces a matching benchmark dataset. Firstly, we harness deep neural networks along with normalizing flow to accurately capture realistic dynamics. Secondly, we extract hypothesized causal graphs by performing importance analysis on the neural network or leveraging prior knowledge. Thirdly, we derive the ground truth causal graphs by splitting the causal model into causal term, residual term, and noise term. Lastly, using the fitted network and the derived causal graph, we generate corresponding versatile time-series proper for algorithm assessment. In the experiments, we validate the fidelity of the generated data through qualitative and quantitative experiments, followed by a benchmarking of existing TSCD algorithms using these generated datasets. CausalTime offers a feasible solution to evaluating TSCD algorithms in real applications and can be generalized to a wide range of fields. For easy use of the proposed approach, we also provide a user-friendly website, hosted on www.causaltime.cc.
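The last step of such a pipeline, generating benchmark series from a causal model whose graph is known, can be sketched in a few lines; the hand-fixed tanh dynamics and adjacency matrix below are illustrative stand-ins, whereas CausalTime fits the dynamics with neural networks and normalizing flows and derives the graph from real observations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vars, T, lag = 4, 1000, 1
A = np.array([[0.7, 0.0, 0.0, 0.0],    # ground-truth graph: A[i, j] != 0 means x_j(t-1) -> x_i(t)
              [0.4, 0.6, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.3, 0.6]])

x = np.zeros((T, n_vars))
for t in range(lag, T):
    x[t] = np.tanh(A @ x[t - 1]) + 0.1 * rng.normal(size=n_vars)   # causal term + noise term

true_graph = (A != 0).astype(int)   # what a TSCD algorithm would be scored against
```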

5G Network Slicing: Analysis of Multiple Machine Learning Classifiers

  • paper_url: http://arxiv.org/abs/2310.01747
  • repo_url: None
  • paper_authors: Mirsad Malkoc, Hisham A. Kholidy
  • for: This study investigates the accuracy and precision of different machine learning algorithms for detecting 5G network slices.
  • methods: Several classifiers are assessed, including logistic regression, linear discriminant analysis, k-nearest neighbors, decision tree, random forest, SVC BernoulliNB, and GaussianNB models.
  • results: The SVC BernoulliNB and GaussianNB models achieve the best accuracy and precision in detecting network slices, while the remaining algorithms perform comparatively worse.
    Abstract The division of one physical 5G communications infrastructure into several virtual network slices with distinct characteristics such as bandwidth, latency, reliability, security, and service quality is known as 5G network slicing. Each slice is a separate logical network that meets the requirements of specific services or use cases, such as virtual reality, gaming, autonomous vehicles, or industrial automation. The network slice can be adjusted dynamically to meet the changing demands of the service, resulting in a more cost-effective and efficient approach to delivering diverse services and applications over a shared infrastructure. This paper assesses various machine learning techniques, including the logistic regression model, linear discriminant model, k-nearest neighbor's model, decision tree model, random forest model, SVC BernoulliNB model, and GaussianNB model, to investigate the accuracy and precision of each model on detecting network slices. The report also gives an overview of 5G network slicing.
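A comparison of this kind is straightforward to set up with scikit-learn. The sketch below evaluates the same family of classifiers with cross-validation on synthetic data; the paper instead uses a labeled 5G network-slice dataset, so the numbers printed here say nothing about its results.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data; the paper uses a labeled 5G network-slice dataset instead.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           n_classes=3, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "kNN": KNeighborsClassifier(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "SVC": SVC(),
    "BernoulliNB": BernoulliNB(),
    "GaussianNB": GaussianNB(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:20s} accuracy = {scores.mean():.3f}")
```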

Randomized Dimension Reduction with Statistical Guarantees

  • paper_url: http://arxiv.org/abs/2310.01739
  • repo_url: None
  • paper_authors: Yijun Dong
  • for: This thesis develops algorithms for fast dimension reduction and for efficient use of limited data, pushing the limit of algorithm potency.
  • methods: It uses matrix sketching to design fast randomized low-rank decomposition algorithms for large matrices, and studies learning algorithms that incorporate data augmentation in various ways.
  • results: The resulting randomized decompositions and the adaptively weighted data augmentation consistency regularization improve computational efficiency, sample efficiency, generalization, and distributional robustness.
    Abstract Large models and enormous data are essential driving forces of the unprecedented successes achieved by modern algorithms, especially in scientific computing and machine learning. Nevertheless, the growing dimensionality and model complexity, as well as the non-negligible workload of data pre-processing, also bring formidable costs to such successes in both computation and data aggregation. As the deceleration of Moore's Law slackens the cost reduction of computation from the hardware level, fast heuristics for expensive classical routines and efficient algorithms for exploiting limited data are increasingly indispensable for pushing the limit of algorithm potency. This thesis explores some of such algorithms for fast execution and efficient data utilization. From the computational efficiency perspective, we design and analyze fast randomized low-rank decomposition algorithms for large matrices based on "matrix sketching", which can be regarded as a dimension reduction strategy in the data space. These include the randomized pivoting-based interpolative and CUR decomposition discussed in Chapter 2 and the randomized subspace approximations discussed in Chapter 3. From the sample efficiency perspective, we focus on learning algorithms with various incorporations of data augmentation that improve generalization and distributional robustness provably. Specifically, Chapter 4 presents a sample complexity analysis for data augmentation consistency regularization where we view sample efficiency from the lens of dimension reduction in the function space. Then in Chapter 5, we introduce an adaptively weighted data augmentation consistency regularization algorithm for distributionally robust optimization with applications in medical image segmentation.
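The matrix-sketching idea behind the fast low-rank decompositions of Chapters 2 and 3 can be illustrated with a basic randomized SVD (random range finder plus a small dense SVD); the thesis's pivoting-based interpolative and CUR variants differ in their details, so treat this only as a sketch of the common underlying mechanism.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    """Sketch-based low-rank SVD in the Halko-Martinsson-Tropp style."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    Omega = rng.normal(size=(n, rank + oversample))   # random sketching matrix
    Q, _ = np.linalg.qr(A @ Omega)                    # orthonormal basis for the approximate range of A
    B = Q.T @ A                                       # small (rank + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(0)
A = rng.normal(size=(2000, 50)) @ rng.normal(size=(50, 800))   # numerically rank-50 matrix
U, s, Vt = randomized_svd(A, rank=50, rng=rng)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))    # relative error, near machine precision
```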

Large Language Models for Test-Free Fault Localization

  • paper_url: http://arxiv.org/abs/2310.01726
  • repo_url: https://github.com/squareslab/llmao
  • paper_authors: Aidan Z. H. Yang, Ruben Martins, Claire Le Goues, Vincent J. Hellendoorn
  • for: This work aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Prior FL techniques assume the provision of input tests and often require extensive program analysis, program instrumentation, or data preprocessing, while earlier deep learning approaches for APR struggle to learn from small datasets and give limited results on real-world programs.
  • methods: It leverages the ability of large language models (LLMs) to adapt to new tasks from very few examples, fine-tuning a small set of bidirectional adapter layers on top of the representations learned by the LLMs to produce LLMAO, the first language-model-based fault localization approach that requires no test coverage information.
  • results: Fine-tuning LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs, the technique gains substantially more confidence on the larger models, with bug localization performance scaling consistently with LLM size; LLMAO improves Top-1 results over state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4% and Top-5 results by 14.4%-35.6%, and is the first FL technique built on a language model architecture that can detect security vulnerabilities down to the code line level.
    Abstract Fault Localization (FL) aims to automatically localize buggy lines of code, a key first step in many manual and automatic debugging tasks. Previous FL techniques assume the provision of input tests, and often require extensive program analysis, program instrumentation, or data preprocessing. Prior work on deep learning for APR struggles to learn from small datasets and produces limited results on real-world programs. Inspired by the ability of large language models (LLMs) of code to adapt to new tasks based on very few examples, we investigate the applicability of LLMs to line level fault localization. Specifically, we propose to overcome the left-to-right nature of LLMs by fine-tuning a small set of bidirectional adapter layers on top of the representations learned by LLMs to produce LLMAO, the first language model based fault localization approach that locates buggy lines of code without any test coverage information. We fine-tune LLMs with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs such as the Defects4J corpus. We observe that our technique achieves substantially more confidence in fault localization when built on the larger models, with bug localization performance scaling consistently with the LLM size. Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%. LLMAO is also the first FL technique trained using a language model architecture that can detect security vulnerabilities down to the code line level.
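The adapter idea can be sketched as a small bidirectional module over frozen per-line LLM representations that emits one suspiciousness logit per code line. The pooling of token states into line states, the LSTM choice, and all dimensions below are assumptions made for illustration; they are not the actual LLMAO architecture.

```python
import torch
import torch.nn as nn

class LineFaultHead(nn.Module):
    """Bidirectional adapter over frozen LLM hidden states, one logit per code line."""
    def __init__(self, hidden_dim, adapter_dim=256):
        super().__init__()
        self.rnn = nn.LSTM(hidden_dim, adapter_dim, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * adapter_dim, 1)

    def forward(self, line_states):          # (batch, n_lines, hidden_dim), from a frozen LLM
        h, _ = self.rnn(line_states)          # bidirectional context over the file's lines
        return self.head(h).squeeze(-1)       # (batch, n_lines) logits: "is this line buggy?"

line_states = torch.randn(2, 120, 4096)       # stand-in for pooled per-line LLM representations
logits = LineFaultHead(hidden_dim=4096)(line_states)
ranked_lines = logits.argsort(dim=-1, descending=True)   # Top-k suspicious lines per file
```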

Large Language Models as Analogical Reasoners

  • paper_url: http://arxiv.org/abs/2310.01714
  • repo_url: None
  • paper_authors: Michihiro Yasunaga, Xinyun Chen, Yujia Li, Panupong Pasupat, Jure Leskovec, Percy Liang, Ed H. Chi, Denny Zhou
  • for: Improving the reasoning ability of large language models.
  • methods: Prompting the model to self-generate relevant exemplars and knowledge in context before solving the given problem.
  • results: Outperforms 0-shot CoT and manual few-shot CoT on a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.
    Abstract Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, Analogical Prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.
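In practice the method amounts to a single prompt that asks the model to recall and solve related problems before tackling the target one. The template below is an illustrative paraphrase, not the exact wording used in the paper.

```python
PROBLEM = "A train travels 120 km in 1.5 hours. At the same speed, how long does 200 km take?"

prompt = f"""# Problem
{PROBLEM}

# Instructions
1. Recall two or three relevant example problems you have seen before.
   For each, state the problem and walk through its solution.
2. Then solve the problem above step by step, drawing on those examples.
"""
# `prompt` would be sent to an LLM; no labeled exemplars are retrieved or hand-written.
```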

On Representation Complexity of Model-based and Model-free Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.01706
  • repo_url: None
  • paper_authors: Hanlin Zhu, Baihe Huang, Stuart Russell
  • for: This work studies the representation complexity of model-based and model-free reinforcement learning (RL) through the lens of circuit complexity.
  • methods: It proves that there is a broad class of MDPs whose transition and reward functions can be represented by constant-depth circuits of polynomial size, while the optimal Q-function suffers exponential circuit complexity in constant-depth circuits.
  • results: The theory offers a representation-complexity explanation of why model-based algorithms usually enjoy better sample complexity than model-free ones: in some cases the environment's ground-truth model is simple to represent while quantities such as the Q-function are complex. Empirical comparisons across several MuJoCo environments corroborate this, with the approximation errors of the transition kernel and reward function consistently lower than those of the optimal Q-function.
    Abstract We study the representation complexity of model-based and model-free reinforcement learning (RL) in the context of circuit complexity. We prove theoretically that there exists a broad class of MDPs such that their underlying transition and reward functions can be represented by constant depth circuits with polynomial size, while the optimal $Q$-function suffers an exponential circuit complexity in constant-depth circuits. By drawing attention to the approximation errors and building connections to complexity theory, our theory provides unique insights into why model-based algorithms usually enjoy better sample complexity than model-free algorithms from a novel representation complexity perspective: in some cases, the ground-truth rule (model) of the environment is simple to represent, while other quantities, such as $Q$-function, appear complex. We empirically corroborate our theory by comparing the approximation error of the transition kernel, reward function, and optimal $Q$-function in various Mujoco environments, which demonstrates that the approximation errors of the transition kernel and reward function are consistently lower than those of the optimal $Q$-function. To the best of our knowledge, this work is the first to study the circuit complexity of RL, which also provides a rigorous framework for future research.