cs.LG - 2023-09-01

Universal Normalization Enhanced Graph Representation Learning for Gene Network Prediction

  • paper_url: http://arxiv.org/abs/2309.00738
  • repo_url: None
  • paper_authors: Zehao Dong, Muhan Zhang, Qihang Zhao, Philip R. O. Payne, Michael Province, Carlos Cruchaga, Tianyu Zhao, Yixin Chen, Fuhai Li
  • for: This paper aims to improve gene network representation learning in bioinformatics by applying graph normalization, enhancing both the stability and the expressive power of gene network representation models.
  • methods: The paper proposes a novel UNGNN (Universal Normalized GNN) framework that applies universal graph normalization to gene networks to maximize the expressive power of gene network representation models.
  • results: Experiments show that UNGNN outperforms popular GNN benchmarks on gene-network-based tasks, with an average performance improvement of 16% over previous state-of-the-art baselines, and it also achieves superior performance on other graph datasets where the universal graph normalization is solvable.
    Abstract Effective gene network representation learning is of great importance in bioinformatics to predict/understand the relation of gene profiles and disease phenotypes. Though graph neural networks (GNNs) have been the dominant architecture for analyzing various graph-structured data like social networks, their predicting on gene networks often exhibits subpar performance. In this paper, we formally investigate the gene network representation learning problem and characterize a notion of \textit{universal graph normalization}, where graph normalization can be applied in an universal manner to maximize the expressive power of GNNs while maintaining the stability. We propose a novel UNGNN (Universal Normalized GNN) framework, which leverages universal graph normalization in both the message passing phase and readout layer to enhance the performance of a base GNN. UNGNN has a plug-and-play property and can be combined with any GNN backbone in practice. A comprehensive set of experiments on gene-network-based bioinformatical tasks demonstrates that our UNGNN model significantly outperforms popular GNN benchmarks and provides an overall performance improvement of 16 $\%$ on average compared to previous state-of-the-art (SOTA) baselines. Furthermore, we also evaluate our theoretical findings on other graph datasets where the universal graph normalization is solvable, and we observe that UNGNN consistently achieves the superior performance.
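
The abstract does not spell out the normalization itself, so the following is only a minimal sketch of where a graph normalization can plug into message passing; the `NormalizedMPLayer` class, the use of `LayerNorm` as the normalizer, and the sum aggregation are illustrative assumptions, not the UNGNN design.

```python
import torch
import torch.nn as nn

class NormalizedMPLayer(nn.Module):
    """Illustrative message-passing layer: aggregate neighbor messages, then
    normalize the aggregate before the node update. LayerNorm is only a
    stand-in for the paper's universal graph normalization."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)  # placeholder normalizer

    def forward(self, x, edge_index):
        # x: (num_nodes, dim); edge_index: (2, num_edges) as (src, dst) rows
        src, dst = edge_index
        messages = self.lin(x)[src]                              # per-edge messages
        agg = torch.zeros_like(x).index_add_(0, dst, messages)   # sum aggregation
        return torch.relu(self.norm(agg) + x)                    # normalize + residual

x = torch.randn(5, 8)                                   # 5 nodes, 8 features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])
out = NormalizedMPLayer(8)(x, edge_index)
```

A readout that applies the same kind of normalization to the pooled graph embedding would mirror the paper's second use of normalization.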

Prediction Error Estimation in Random Forests

  • paper_url: http://arxiv.org/abs/2309.00736
  • repo_url: https://github.com/iankrupkin/Prediction-Error-Estimation-in-Random-Forests
  • paper_authors: Ian Krupkin, Johanna Hardin
  • for: This paper studies prediction error estimation in Random Forests.
  • methods: Building on the initial theoretical framework of Bates et al. (2023), the true error rate and expected error rate are investigated theoretically and empirically under a variety of error estimation methods common to Random Forests.
  • results: In the classification case, Random Forests' estimates of prediction error are on average closer to the true error rate than to the average prediction error, which is the opposite of the finding of Bates et al. (2023) for logistic regression. The result holds across different error estimation strategies such as cross-validation, bagging, and data splitting.
    Abstract In this paper, error estimates of classification Random Forests are quantitatively assessed. Based on the initial theoretical framework built by Bates et al. (2023), the true error rate and expected error rate are theoretically and empirically investigated in the context of a variety of error estimation methods common to Random Forests. We show that in the classification case, Random Forests' estimates of prediction error is closer on average to the true error rate instead of the average prediction error. This is opposite the findings of Bates et al. (2023) which were given for logistic regression. We further show that this result holds across different error estimation strategies such as cross-validation, bagging, and data splitting.
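
As an illustration of the comparison the abstract describes (common error estimates versus the true error rate), here is a small, hedged scikit-learn sketch; the dataset, forest size, and use of a large holdout as a proxy for the true error rate are assumptions of this example, not the authors' experimental design.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Compare cross-validation and out-of-bag error estimates with a holdout proxy
# for the true error rate of a forest trained on 500 points.
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=500, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
cv_error = 1 - cross_val_score(rf, X_tr, y_tr, cv=10).mean()           # CV estimate
oob_error = 1 - RandomForestClassifier(n_estimators=200, oob_score=True,
                                        random_state=0).fit(X_tr, y_tr).oob_score_
true_error = 1 - rf.fit(X_tr, y_tr).score(X_te, y_te)                  # holdout proxy
print(f"CV: {cv_error:.3f}  OOB: {oob_error:.3f}  true: {true_error:.3f}")
```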

Tempestas ex machina: A review of machine learning methods for wavefront control

  • paper_url: http://arxiv.org/abs/2309.00730
  • repo_url: None
  • paper_authors: J. Fowler, Rico Landman
  • for: To support the development of the next generation of adaptive optics systems aimed at imaging rocky Earth-like planets, and to examine how wavefront control techniques can help adaptive optics systems achieve higher image quality.
  • methods: The paper surveys machine learning techniques for improving wavefront control in adaptive optics systems and discusses how the most common machine learning methods apply to this problem.
  • results: The paper reviews roughly three decades of work on machine learning for wavefront control (including linear implementations under various aliases) and surveys the literature on novel machine learning approaches for improving image quality in adaptive optics systems.
    Abstract As we look to the next generation of adaptive optics systems, now is the time to develop and explore the technologies that will allow us to image rocky Earth-like planets; wavefront control algorithms are not only a crucial component of these systems, but can benefit our adaptive optics systems without requiring increased detector speed and sensitivity or more effective and efficient deformable mirrors. To date, most observatories run the workhorse of their wavefront control as a classic integral controller, which estimates a correction from wavefront sensor residuals, and attempts to apply that correction as fast as possible in closed-loop. An integrator of this nature fails to address temporal lag errors that evolve over scales faster than the correction time, as well as vibrations or dynamic errors within the system that are not encapsulated in the wavefront sensor residuals; these errors impact high contrast imaging systems with complex coronagraphs. With the rise in popularity of machine learning, many are investigating applying modern machine learning methods to wavefront control. Furthermore, many linear implementations of machine learning methods (under varying aliases) have been in development for wavefront control for the last 30-odd years. With this work we define machine learning in its simplest terms, explore the most common machine learning methods applied in the context of this problem, and present a review of the literature concerning novel machine learning approaches to wavefront control.
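
The "workhorse" integral controller the abstract describes can be made concrete with a short sketch; the gain, leak factor, and reconstructor matrix below are illustrative placeholders rather than values from any real instrument.

```python
import numpy as np

def integrator_step(command, wfs_residual, reconstructor, gain=0.4, leak=0.99):
    """One closed-loop step of a classic (leaky) integral controller: map the
    wavefront-sensor residual to deformable-mirror space and accumulate it
    into the command. Gain and leak values are illustrative only."""
    correction = reconstructor @ wfs_residual     # residuals -> DM modes/actuators
    return leak * command - gain * correction     # integrate the correction

# toy closed loop: 10 actuators driven from 20 sensor slopes
rng = np.random.default_rng(0)
R = rng.normal(size=(10, 20))                     # stand-in reconstructor matrix
cmd = np.zeros(10)
for _ in range(100):
    slopes = rng.normal(size=20)                  # stand-in for measured residuals
    cmd = integrator_step(cmd, slopes, R)
```

The temporal-lag and vibration errors discussed in the abstract are exactly what this kind of controller cannot see, which is what motivates the machine learning methods reviewed in the paper.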

Learning Shared Safety Constraints from Multi-task Demonstrations

  • paper_url: http://arxiv.org/abs/2309.00711
  • repo_url: None
  • paper_authors: Konwoo Kim, Gokul Swamy, Zuxin Liu, Ding Zhao, Sanjiban Choudhury, Zhiwei Steven Wu
  • for: Learning shared safety constraints so that agents avoid unsafe behavior while completing their tasks.
  • methods: Constraints are learned from expert demonstrations of safe task completion by extending inverse reinforcement learning (IRL) techniques to the space of constraints.
  • results: Leveraging the diverse demonstrations that naturally occur in multi-task settings yields a tighter set of constraints, avoiding the overly conservative constraints that would otherwise prevent the agent from completing its tasks.
    Abstract Regardless of the particular task we want them to perform in an environment, there are often shared safety constraints we want our agents to respect. For example, regardless of whether it is making a sandwich or clearing the table, a kitchen robot should not break a plate. Manually specifying such a constraint can be both time-consuming and error-prone. We show how to learn constraints from expert demonstrations of safe task completion by extending inverse reinforcement learning (IRL) techniques to the space of constraints. Intuitively, we learn constraints that forbid highly rewarding behavior that the expert could have taken but chose not to. Unfortunately, the constraint learning problem is rather ill-posed and typically leads to overly conservative constraints that forbid all behavior that the expert did not take. We counter this by leveraging diverse demonstrations that naturally occur in multi-task settings to learn a tighter set of constraints. We validate our method with simulation experiments on high-dimensional continuous control tasks.

Randomized Polar Codes for Anytime Distributed Machine Learning

  • paper_url: http://arxiv.org/abs/2309.00682
  • repo_url: None
  • paper_authors: Burak Bartan, Mert Pilanci
  • for: The paper proposes a new distributed computing framework that is robust to slow compute nodes and supports both approximate and exact computation of linear operations.
  • methods: The mechanism combines randomized sketching with polar codes in the context of coded computation; a sequential decoding algorithm handles real-valued data at low computational complexity, and an anytime estimator produces provably accurate estimates even when the set of available node outputs is not decodable.
  • results: The authors demonstrate the framework on applications such as large-scale matrix multiplication and black-box optimization, implement the methods on a serverless cloud computing system, and provide numerical results (including ImageNet-scale computations) showing its scalability in practice.
    Abstract We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery. Additionally, we provide an anytime estimator that can generate provably accurate estimates even when the set of available node outputs is not decodable. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization. We present the implementation of these methods on a serverless cloud computing system and provide numerical results to demonstrate their scalability in practice, including ImageNet scale computations.

Bayesian deep learning for cosmic volumes with modified gravity

  • paper_url: http://arxiv.org/abs/2309.00612
  • repo_url: https://github.com/JavierOrjuela/Bayesian-Neural-Net-with-MNFs-for-f-R-
  • paper_authors: Jorge Enrique García-Farieta, Héctor J Hortúa, Francisco-Shu Kitaura
  • for: This paper aims to extract cosmological parameters from modified gravity (MG) simulations using deep neural networks, with a focus on uncertainty estimation.
  • methods: The paper uses Bayesian neural networks (BNNs) with an enriched approximate posterior distribution, and trains the networks with real-space density fields and power spectra from a suite of 2000 dark-matter-only particle mesh $N$-body simulations.
  • results: BNNs accurately predict $\Omega_m$ and $\sigma_8$ and their correlation with the MG parameter, and yield well-calibrated uncertainty estimates. The presence of the MG parameter leads to a significant degeneracy with $\sigma_8$, and ignoring MG shifts the relative errors in $\Omega_m$ and $\sigma_8$ by at least $30\%$.
    Abstract The new generation of galaxy surveys will provide unprecedented data allowing us to test gravity at cosmological scales. A robust cosmological analysis of the large-scale structure demands exploiting the nonlinear information encoded in the cosmic web. Machine Learning techniques provide such tools, however, do not provide a priori assessment of uncertainties. This study aims at extracting cosmological parameters from modified gravity (MG) simulations through deep neural networks endowed with uncertainty estimations. We implement Bayesian neural networks (BNNs) with an enriched approximate posterior distribution considering two cases: one with a single Bayesian last layer (BLL), and another one with Bayesian layers at all levels (FullB). We train both BNNs with real-space density fields and power-spectra from a suite of 2000 dark matter only particle mesh $N$-body simulations including modified gravity models relying on MG-PICOLA covering 256 $h^{-1}$ Mpc side cubical volumes with 128$^3$ particles. BNNs excel in accurately predicting parameters for $\Omega_m$ and $\sigma_8$ and their respective correlation with the MG parameter. We find out that BNNs yield well-calibrated uncertainty estimates overcoming the over- and under-estimation issues in traditional neural networks. We observe that the presence of MG parameter leads to a significant degeneracy with $\sigma_8$ being one of the possible explanations of the poor MG predictions. Ignoring MG, we obtain a deviation of the relative errors in $\Omega_m$ and $\sigma_8$ by at least $30\%$. Moreover, we report consistent results from the density field and power spectra analysis, and comparable results between BLL and FullB experiments which permits us to save computing time by a factor of two. This work contributes in setting the path to extract cosmological parameters from complete small cosmic volumes towards the highly nonlinear regime.

Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair

  • paper_url: http://arxiv.org/abs/2309.00608
  • repo_url: https://github.com/ise-uiuc/Repilot
  • paper_authors: Yuxiang Wei, Chunqiu Steven Xia, Lingming Zhang
  • for: The paper aims to improve automated program repair (APR), in particular patch synthesis for general-purpose programming languages.
  • methods: It proposes a framework called Repilot that further copilots the AI "copilots" (i.e., large language models) to synthesize more valid patches: the LLM generates tokens, and a Completion Engine prunes infeasible tokens and proactively completes tokens based on its suggestions.
  • results: On the Defects4j 1.2 and 2.0 datasets, Repilot fixes 66 and 50 bugs respectively, surpassing the best-performing baseline by 14 and 16 bugs, and produces more valid and correct patches than the base LLM under the same generation budget.
    Abstract During Automated Program Repair (APR), it can be challenging to synthesize correct patches for real-world systems in general-purpose programming languages. Recent Large Language Models (LLMs) have been shown to be helpful "copilots" in assisting developers with various coding tasks, and have also been directly applied for patch synthesis. However, most LLMs treat programs as sequences of tokens, meaning that they are ignorant of the underlying semantics constraints of the target programming language. This results in plenty of statically invalid generated patches, impeding the practicality of the technique. Therefore, we propose Repilot, a framework to further copilot the AI "copilots" (i.e., LLMs) by synthesizing more valid patches during the repair process. Our key insight is that many LLMs produce outputs autoregressively (i.e., token by token), resembling human writing programs, which can be significantly boosted and guided through a Completion Engine. Repilot synergistically synthesizes a candidate patch through the interaction between an LLM and a Completion Engine, which 1) prunes away infeasible tokens suggested by the LLM and 2) proactively completes the token based on the suggestions provided by the Completion Engine. Our evaluation on a subset of the widely-used Defects4j 1.2 and 2.0 datasets shows that Repilot fixes 66 and 50 bugs, respectively, surpassing the best-performing baseline by 14 and 16 bugs fixed. More importantly, Repilot is capable of producing more valid and correct patches than the base LLM when given the same generation budget.

Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

  • paper_url: http://arxiv.org/abs/2309.00591
  • repo_url: None
  • paper_authors: Qining Zhang, Lei Ying
  • for: The paper considers a stochastic multi-armed bandit (MAB) problem with two simultaneous objectives: quickly identifying and committing to the optimal arm, and maximizing reward over a sequence of $T$ consecutive rounds.
  • methods: It introduces Regret Optimal Best Arm Identification (ROBAI) to capture both objectives and proposes the $\mathsf{EOCP}$ algorithm and its variants, which achieve asymptotically optimal regret in both Gaussian and general bandits and commit to the optimal arm within $\mathcal{O}(\log T)$ rounds.
  • results: Lower bounds on the commitment time of ROBAI show that $\mathsf{EOCP}$ and its variants are sample optimal with a pre-determined stopping time and almost sample optimal with an adaptive stopping time. Numerical results show that $\mathsf{EOCP}$ outperforms the classic $\mathsf{UCB}$ algorithm, suggesting that over-exploration is unnecessary and can harm system performance.
    Abstract This paper considers a stochastic multi-armed bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. This paper introduces \emph{Regret Optimal Best Arm Identification} (ROBAI) which aims to achieve these dual objectives. To solve ROBAI with both pre-determined stopping time and adaptive stopping time requirements, we present the $\mathsf{EOCP}$ algorithm and its variants respectively, which not only achieve asymptotic optimal regret in both Gaussian and general bandits, but also commit to the optimal arm in $\mathcal{O}(\log T)$ rounds with pre-determined stopping time and $\mathcal{O}(\log^2 T)$ rounds with adaptive stopping time. We further characterize lower bounds on the commitment time (equivalent to sample complexity) of ROBAI, showing that $\mathsf{EOCP}$ and its variants are sample optimal with pre-determined stopping time, and almost sample optimal with adaptive stopping time. Numerical results confirm our theoretical analysis and reveal an interesting ``over-exploration'' phenomenon carried by classic $\mathsf{UCB}$ algorithms, such that $\mathsf{EOCP}$ has smaller regret even though it stops exploration much earlier than $\mathsf{UCB}$ ($\mathcal{O}(\log T)$ versus $\mathcal{O}(T)$), which suggests over-exploration is unnecessary and potentially harmful to system performance.
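
To make the "commit early" idea concrete, here is a textbook explore-then-commit baseline; it is not the paper's $\mathsf{EOCP}$ algorithm (whose stopping rule and analysis differ), and the arm means, horizon, and exploration budget are arbitrary choices for illustration.

```python
import numpy as np

def explore_then_commit(means, T, m, rng):
    """Pull each arm m times, then commit to the empirically best arm for the
    rest of the horizon. Returns the expected (pseudo-)regret of that run."""
    K = len(means)
    sample_means = [np.mean(rng.normal(mu, 1.0, size=m)) for mu in means]
    best = int(np.argmax(sample_means))
    pulls = [m] * K
    pulls[best] += T - m * K                       # commit for the remaining rounds
    return max(means) * T - sum(p * mu for p, mu in zip(pulls, means))

rng = np.random.default_rng(0)
print("ETC regret:", explore_then_commit([0.5, 0.4, 0.3], T=100_000, m=200, rng=rng))
```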

PolyGET: Accelerating Polymer Simulations by Accurate and Generalizable Forcefield with Equivariant Transformer

  • paper_url: http://arxiv.org/abs/2309.00585
  • repo_url: None
  • paper_authors: Rui Feng, Huan Tran, Aubrey Toland, Binghong Chen, Qi Zhu, Rampi Ramprasad, Chao Zhang
  • for: To develop a new polymer forcefield model that improves both the accuracy and efficiency of polymer simulations.
  • methods: The paper introduces a framework called PolyGET, which uses generalizable Equivariant Transformers to capture complex quantum interactions between atoms and to generalize across polymer families; training follows a force-centric objective that optimizes forces only, rather than jointly optimizing forces and energy.
  • results: On a large-scale dataset of 24 distinct polymer types, PolyGET achieves state-of-the-art performance and generalizes across polymer types; it can also simulate large polymers with high fidelity to the reference ab initio DFT method while generalizing to unseen polymers.
    Abstract Polymer simulation with both accuracy and efficiency is a challenging task. Machine learning (ML) forcefields have been developed to achieve both the accuracy of ab initio methods and the efficiency of empirical force fields. However, existing ML force fields are usually limited to single-molecule settings, and their simulations are not robust enough. In this paper, we present PolyGET, a new framework for Polymer Forcefields with Generalizable Equivariant Transformers. PolyGET is designed to capture complex quantum interactions between atoms and generalize across various polymer families, using a deep learning model called Equivariant Transformers. We propose a new training paradigm that focuses exclusively on optimizing forces, which is different from existing methods that jointly optimize forces and energy. This simple force-centric objective function avoids competing objectives between energy and forces, thereby allowing for learning a unified forcefield ML model over different polymer families. We evaluated PolyGET on a large-scale dataset of 24 distinct polymer types and demonstrated state-of-the-art performance in force accuracy and robust MD simulations. Furthermore, PolyGET can simulate large polymers with high fidelity to the reference ab initio DFT method while being able to generalize to unseen polymers.
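
The force-centric objective can be sketched in a few lines of PyTorch: predict forces as the negative gradient of a learned energy and regress them against reference forces, with no energy term in the loss. The stand-in MLP energy model below is an assumption for illustration only; the paper uses an Equivariant Transformer.

```python
import torch

def force_only_loss(energy_model, positions, target_forces):
    """Force-matching loss: forces = -dE/dpositions, regressed against
    reference (e.g. DFT) forces; no energy term appears in the objective."""
    positions = positions.detach().clone().requires_grad_(True)
    energy = energy_model(positions).sum()            # total energy (scalar)
    forces = -torch.autograd.grad(energy, positions, create_graph=True)[0]
    return torch.mean((forces - target_forces) ** 2)

# toy per-atom energy model: maps (n_atoms, 3) positions to per-atom energies
model = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Tanh(),
                            torch.nn.Linear(16, 1))
pos, f_ref = torch.randn(8, 3), torch.randn(8, 3)
loss = force_only_loss(model, pos, f_ref)
loss.backward()   # gradients w.r.t. model parameters via create_graph=True
```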

Laminar: A New Serverless Stream-based Framework with Semantic Code Search and Code Completion

  • paper_url: http://arxiv.org/abs/2309.00584
  • repo_url: None
  • paper_authors: Zaynab Zahra, Zihao Li, Rosa Filgueira
  • for: To present Laminar, a new serverless framework built on dispel4py, a parallel stream-based dataflow library.
  • methods: The framework manages streaming workflows and components efficiently through a dedicated registry, offering a seamless serverless experience.
  • results: Large language models enhance the framework with semantic code search, code summarization, and code completion, simplifying the execution of streaming computations, managing data streams more efficiently, and providing a valuable tool for both researchers and practitioners.
    Abstract This paper introduces Laminar, a novel serverless framework based on dispel4py, a parallel stream-based dataflow library. Laminar efficiently manages streaming workflows and components through a dedicated registry, offering a seamless serverless experience. Leveraging large language models, Laminar enhances the framework with semantic code search, code summarization, and code completion. This contribution enhances serverless computing by simplifying the execution of streaming computations, managing data streams more efficiently, and offering a valuable tool for both researchers and practitioners.

Geometry-Informed Neural Operator for Large-Scale 3D PDEs

  • paper_url: http://arxiv.org/abs/2309.00583
  • repo_url: None
  • paper_authors: Zongyi Li, Nikola Borislavov Kovachki, Chris Choy, Boyi Li, Jean Kossaifi, Shourya Prakash Otta, Mohammad Amin Nabian, Maximilian Stadler, Christian Hundt, Kamyar Azizzadenesheli, Anima Anandkumar
  • for: To learn the solution operator of large-scale partial differential equations with varying geometries.
  • methods: The method represents the input shape with a signed distance function and point clouds, and learns the solution operator with neural operators based on graph and Fourier architectures.
  • results: The method applies efficiently to large-scale fluid dynamics simulations and provides accurate results across varying geometries, including a roughly 26,000x speed-up over optimized GPU-based CFD solvers for computing drag coefficients.
    Abstract We propose the geometry-informed neural operator (GINO), a highly efficient approach to learning the solution operator of large-scale partial differential equations with varying geometries. GINO uses a signed distance function and point-cloud representations of the input shape and neural operators based on graph and Fourier architectures to learn the solution operator. The graph neural operator handles irregular grids and transforms them into and from regular latent grids on which Fourier neural operator can be efficiently applied. GINO is discretization-convergent, meaning the trained model can be applied to arbitrary discretization of the continuous domain and it converges to the continuum operator as the discretization is refined. To empirically validate the performance of our method on large-scale simulation, we generate the industry-standard aerodynamics dataset of 3D vehicle geometries with Reynolds numbers as high as five million. For this large-scale 3D fluid simulation, numerical methods are expensive to compute surface pressure. We successfully trained GINO to predict the pressure on car surfaces using only five hundred data points. The cost-accuracy experiments show a $26,000 \times$ speed-up compared to optimized GPU-based computational fluid dynamics (CFD) simulators on computing the drag coefficient. When tested on new combinations of geometries and boundary conditions (inlet velocities), GINO obtains a one-fourth reduction in error rate compared to deep neural network approaches.

Consistency of Lloyd’s Algorithm Under Perturbations

  • paper_url: http://arxiv.org/abs/2309.00578
  • repo_url: None
  • paper_authors: Dhruv Patel, Hui Shen, Shankar Bhamidi, Yufeng Liu, Vladas Pipiras
  • for: The paper studies the correctness of Lloyd's algorithm in unsupervised learning when the true samples are unobserved and must first be learned from the data via pre-processing pipelines such as spectral methods, before clustering with Lloyd's algorithm.
  • methods: The analysis covers Lloyd's algorithm combined with such pre-processing pipelines, e.g., spectral methods on appropriate data matrices, for clustering problems in unsupervised learning.
  • results: After $O(\log(n))$ iterations, the mis-clustering rate of Lloyd's algorithm drops to a bound of order $\frac{C}{n^{1/4}}$, where $C$ is a constant and $n$ is the number of samples. The paper also derives implications for applications such as high-dimensional time series, multi-dimensional scaling, and community detection in sparse networks via spectral clustering.
    Abstract In the context of unsupervised learning, Lloyd's algorithm is one of the most widely used clustering algorithms. It has inspired a plethora of work investigating the correctness of the algorithm under various settings with ground truth clusters. In particular, in 2016, Lu and Zhou have shown that the mis-clustering rate of Lloyd's algorithm on $n$ independent samples from a sub-Gaussian mixture is exponentially bounded after $O(\log(n))$ iterations, assuming proper initialization of the algorithm. However, in many applications, the true samples are unobserved and need to be learned from the data via pre-processing pipelines such as spectral methods on appropriate data matrices. We show that the mis-clustering rate of Lloyd's algorithm on perturbed samples from a sub-Gaussian mixture is also exponentially bounded after $O(\log(n))$ iterations under the assumptions of proper initialization and that the perturbation is small relative to the sub-Gaussian noise. In canonical settings with ground truth clusters, we derive bounds for algorithms such as $k$-means$++$ to find good initializations and thus leading to the correctness of clustering via the main result. We show the implications of the results for pipelines measuring the statistical significance of derived clusters from data such as SigClust. We use these general results to derive implications in providing theoretical guarantees on the misclustering rate for Lloyd's algorithm in a host of applications, including high-dimensional time series, multi-dimensional scaling, and community detection for sparse networks via spectral clustering.
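
For reference, a minimal NumPy version of Lloyd's algorithm with $k$-means++ seeding (the kind of initialization the guarantees assume); this is the textbook procedure, not the paper's analysis, and the Gaussian-mixture toy data are an assumption of the example.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: sample new centers proportionally to squared distance."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

def lloyd(X, k, n_iter=50, rng=None):
    """Plain Lloyd iterations on the (possibly perturbed) samples X."""
    rng = rng or np.random.default_rng(0)
    centers = kmeans_pp_init(X, k, rng)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(200, 2)) for c in ([0, 0], [5, 0], [0, 5])])
labels, centers = lloyd(X, k=3, rng=rng)
```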

Interpretation of High-Dimensional Linear Regression: Effects of Nullspace and Regularization Demonstrated on Battery Data

  • paper_url: http://arxiv.org/abs/2309.00564
  • repo_url: https://github.com/joachimschaeffer/hdreganalytics
  • paper_authors: Joachim Schaeffer, Eric Lenz, William C. Chueh, Martin Z. Bazant, Rolf Findeisen, Richard D. Braatz
  • for: This paper is written for researchers and practitioners who work with high-dimensional linear regression in various scientific fields, particularly those who deal with discrete measured data of underlying smooth latent processes.
  • methods: The paper proposes an optimization formulation to compare regression coefficients and to understand the relationship between the nullspace and regularization in high-dimensional linear regression. The authors also use physical engineering knowledge to interpret the regression results.
  • results: The case studies show that regularization and z-scoring are important design choices that can lead to interpretable regression results, while the combination of the nullspace and regularization can hinder interpretability. Additionally, the paper demonstrates that regression methods that do not produce coefficients orthogonal to the nullspace, such as fused lasso, can improve interpretability.
    Abstract High-dimensional linear regression is important in many scientific fields. This article considers discrete measured data of underlying smooth latent processes, as is often obtained from chemical or biological systems. Interpretation in high dimensions is challenging because the nullspace and its interplay with regularization shapes regression coefficients. The data's nullspace contains all coefficients that satisfy $\mathbf{Xw}=\mathbf{0}$, thus allowing very different coefficients to yield identical predictions. We developed an optimization formulation to compare regression coefficients and coefficients obtained by physical engineering knowledge to understand which part of the coefficient differences are close to the nullspace. This nullspace method is tested on a synthetic example and lithium-ion battery data. The case studies show that regularization and z-scoring are design choices that, if chosen corresponding to prior physical knowledge, lead to interpretable regression results. Otherwise, the combination of the nullspace and regularization hinders interpretability and can make it impossible to obtain regression coefficients close to the true coefficients when there is a true underlying linear model. Furthermore, we demonstrate that regression methods that do not produce coefficients orthogonal to the nullspace, such as fused lasso, can improve interpretability. In conclusion, the insights gained from the nullspace perspective help to make informed design choices for building regression models on high-dimensional data and reasoning about potential underlying linear models, which are important for system optimization and improving scientific understanding.
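
The nullspace viewpoint can be illustrated with a short SVD-based decomposition of a coefficient difference into its nullspace and row-space parts; this is only a sketch of the idea, not the authors' full optimization formulation, and the toy data are assumed.

```python
import numpy as np

def nullspace_decomposition(X, w_diff, tol=1e-10):
    """Split a coefficient difference into its nullspace component (invisible to
    predictions, since X @ w_null = 0) and its row-space component (which does
    change predictions)."""
    _, s, Vt = np.linalg.svd(X, full_matrices=True)
    rank = int(np.sum(s > tol * s[0]))
    V_row, V_null = Vt[:rank].T, Vt[rank:].T       # row-space / nullspace bases
    w_row = V_row @ (V_row.T @ w_diff)
    w_null = V_null @ (V_null.T @ w_diff)
    return w_row, w_null

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 500))                     # n << p, so the nullspace is large
w_diff = rng.normal(size=500)
w_row, w_null = nullspace_decomposition(X, w_diff)
print(np.linalg.norm(X @ w_null))                  # ~0: nullspace part is invisible
```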

Interactive and Concentrated Differential Privacy for Bandits

  • paper_url: http://arxiv.org/abs/2309.00557
  • repo_url: None
  • paper_authors: Achraf Azize, Debabrota Basu
  • for: The paper studies protecting user privacy in bandit problems with a trusted centralized decision-maker.
  • methods: Privacy is analyzed through the lens of interactive differential privacy (DP), in particular zero Concentrated DP (zCDP).
  • results: The paper provides minimax and problem-dependent regret lower bounds for finite-armed and linear bandits that quantify the cost of $\rho$-global zCDP, revealing two hardness regimes depending on the privacy budget $\rho$ and suggesting that $\rho$-global zCDP incurs less regret than pure $\epsilon$-global DP. It also proposes two $\rho$-global zCDP bandit algorithms, AdaC-UCB and AdaC-GOPE, both based on the Gaussian mechanism with adaptive episodes; AdaC-UCB matches the problem-dependent lower bound up to multiplicative constants, while AdaC-GOPE matches the minimax lower bound up to poly-logarithmic factors. Experiments in different settings validate the theoretical results.
    Abstract Bandits play a crucial role in interactive learning schemes and modern recommender systems. However, these systems often rely on sensitive user data, making privacy a critical concern. This paper investigates privacy in bandits with a trusted centralized decision-maker through the lens of interactive Differential Privacy (DP). While bandits under pure $\epsilon$-global DP have been well-studied, we contribute to the understanding of bandits under zero Concentrated DP (zCDP). We provide minimax and problem-dependent lower bounds on regret for finite-armed and linear bandits, which quantify the cost of $\rho$-global zCDP in these settings. These lower bounds reveal two hardness regimes based on the privacy budget $\rho$ and suggest that $\rho$-global zCDP incurs less regret than pure $\epsilon$-global DP. We propose two $\rho$-global zCDP bandit algorithms, AdaC-UCB and AdaC-GOPE, for finite-armed and linear bandits respectively. Both algorithms use a common recipe of Gaussian mechanism and adaptive episodes. We analyze the regret of these algorithms to show that AdaC-UCB achieves the problem-dependent regret lower bound up to multiplicative constants, while AdaC-GOPE achieves the minimax regret lower bound up to poly-logarithmic factors. Finally, we provide experimental validation of our theoretical results under different settings.
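
The Gaussian-mechanism ingredient can be sketched as follows: release a private empirical mean of one arm's bounded rewards under $\rho$-zCDP. The function below shows only this privacy step; the adaptive episode schedules of AdaC-UCB and AdaC-GOPE are not reproduced, and the reward range and budget are illustrative.

```python
import numpy as np

def private_mean(rewards, rho, rng, reward_range=1.0):
    """Release the empirical mean of one arm's bounded rewards under rho-zCDP
    via the Gaussian mechanism: noise on the sum, sensitivity = reward_range,
    sigma = sensitivity / sqrt(2 * rho)."""
    sigma = reward_range / np.sqrt(2.0 * rho)
    noisy_sum = np.sum(rewards) + rng.normal(scale=sigma)
    return noisy_sum / len(rewards)

rng = np.random.default_rng(0)
arm_rewards = rng.uniform(0, 1, size=1000)
print(private_mean(arm_rewards, rho=0.1, rng=rng), arm_rewards.mean())
```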

Adaptive function approximation based on the Discrete Cosine Transform (DCT)

  • paper_url: http://arxiv.org/abs/2309.00530
  • repo_url: None
  • paper_authors: Ana I. Pérez-Neira, Marc Martinez-Gost, Miguel Ángel Lagunas
  • for: The paper studies the cosine as a basis function for approximating univariate, continuous, memoryless functions.
  • methods: The approximation coefficients are obtained through supervised learning instead of the Discrete Cosine Transform (DCT).
  • results: Thanks to the finite dynamics and orthogonality of the cosine basis functions, simple gradient algorithms such as the Normalized Least Mean Squares (NLMS) achieve a controlled and predictable convergence time and error misadjustment. The simplicity of the technique makes it an attractive building block for more complex supervised learning systems.
    Abstract This paper studies the cosine as basis function for the approximation of univariate and continuous functions without memory. This work studies a supervised learning to obtain the approximation coefficients, instead of using the Discrete Cosine Transform (DCT). Due to the finite dynamics and orthogonality of the cosine basis functions, simple gradient algorithms, such as the Normalized Least Mean Squares (NLMS), can benefit from it and present a controlled and predictable convergence time and error misadjustment. Due to its simplicity, the proposed technique ranks as the best in terms of learning quality versus complexity, and it is presented as an attractive technique to be used in more complex supervised learning systems. Simulations illustrate the performance of the approach. This paper celebrates the 50th anniversary of the publication of the DCT by Nasir Ahmed in 1973.
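
A minimal sketch of the idea, assuming inputs on $[0,1]$ and a truncated cosine basis: the expansion coefficients are learned online with NLMS instead of being computed with the DCT. The step size, basis size, and target function are illustrative choices, not values from the paper.

```python
import numpy as np

def cosine_features(x, K):
    """Truncated cosine basis cos(k*pi*x), k = 0..K-1, for x in [0, 1]."""
    return np.cos(np.pi * np.arange(K) * x)

def nlms_fit(f, K=16, mu=0.5, n_steps=20000, rng=None):
    """Learn the cosine-expansion coefficients of f online with NLMS."""
    rng = rng or np.random.default_rng(0)
    w = np.zeros(K)
    for _ in range(n_steps):
        x = rng.uniform(0.0, 1.0)
        phi = cosine_features(x, K)
        err = f(x) - w @ phi                        # instantaneous error
        w += mu * err * phi / (phi @ phi + 1e-8)    # normalized LMS update
    return w

w = nlms_fit(lambda x: np.exp(-3 * x) * np.sin(6 * x))
print(w @ cosine_features(0.3, len(w)), np.exp(-0.9) * np.sin(1.8))  # approx vs exact
```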

Online Distributed Learning over Random Networks

  • paper_url: http://arxiv.org/abs/2309.00520
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Nicola Bastianello, Diego Deplano, Mauro Franceschelli, Karl H. Johansson
  • for: The goal is to solve distributed learning problems in multi-agent systems where agents do not share their data directly but cooperatively train a model.
  • methods: The study develops the Distributed Operator Theoretical (DOT) version of the Alternating Direction Method of Multipliers (ADMM), called the DOT-ADMM algorithm, to address practical challenges such as online learning, asynchronous agent computations, unreliable and limited communications, and inexact local computations.
  • results: The DOT-ADMM algorithm is proven to converge with a linear rate for a class of convex learning problems (e.g., linear and logistic regression) toward a bounded neighborhood of the optimal time-varying solution, with a characterization of how the neighborhood depends on (i)-(iv). Numerical experiments comparing DOT-ADMM with other state-of-the-art algorithms show that only the proposed algorithm is robust to (i)-(iv).
    Abstract The recent deployment of multi-agent systems in a wide range of scenarios has enabled the solution of learning problems in a distributed fashion. In this context, agents are tasked with collecting local data and then cooperatively train a model, without directly sharing the data. While distributed learning offers the advantage of preserving agents' privacy, it also poses several challenges in terms of designing and analyzing suitable algorithms. This work focuses specifically on the following challenges motivated by practical implementation: (i) online learning, where the local data change over time; (ii) asynchronous agent computations; (iii) unreliable and limited communications; and (iv) inexact local computations. To tackle these challenges, we introduce the Distributed Operator Theoretical (DOT) version of the Alternating Direction Method of Multipliers (ADMM), which we call the DOT-ADMM Algorithm. We prove that it converges with a linear rate for a large class of convex learning problems (e.g., linear and logistic regression problems) toward a bounded neighborhood of the optimal time-varying solution, and characterize how the neighborhood depends on~$\text{(i)--(iv)}$. We corroborate the theoretical analysis with numerical simulations comparing the DOT-ADMM Algorithm with other state-of-the-art algorithms, showing that only the proposed algorithm exhibits robustness to (i)--(iv).
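
For orientation, here is the classical synchronous consensus ADMM applied to distributed least squares; it is not the DOT-ADMM algorithm (which additionally handles online data, asynchrony, unreliable links, and inexact local updates), and the problem data are synthetic.

```python
import numpy as np

def consensus_admm_least_squares(As, bs, rho=1.0, n_iter=200):
    """Textbook consensus ADMM for distributed least squares: agent i holds
    (A_i, b_i) and only local iterates (x_i + u_i) are exchanged."""
    d = As[0].shape[1]
    xs = [np.zeros(d) for _ in As]       # local primal variables
    us = [np.zeros(d) for _ in As]       # scaled dual variables
    z = np.zeros(d)                      # consensus variable
    for _ in range(n_iter):
        for i, (A, b) in enumerate(zip(As, bs)):
            # local solve: argmin 0.5*||A x - b||^2 + (rho/2)*||x - z + u_i||^2
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(d),
                                    A.T @ b + rho * (z - us[i]))
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        us = [u + x - z for x, u in zip(xs, us)]
    return z

rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
As = [rng.normal(size=(40, 5)) for _ in range(4)]
bs = [A @ w_true + 0.01 * rng.normal(size=40) for A in As]
print(np.round(consensus_admm_least_squares(As, bs), 3), np.round(w_true, 3))
```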

Solving multiscale elliptic problems by sparse radial basis function neural networks

  • paper_url: http://arxiv.org/abs/2309.03107
  • repo_url: None
  • paper_authors: Zhiwen Wang, Minxin Chen, Jingrun Chen
  • for: Solving multiscale elliptic partial differential equations (PDEs).
  • methods: A sparse radial basis function neural network (RBFNN) method: inspired by the deep mixed residual method, the second-order problem is rewritten as a first-order system, and multiple RBFNNs are used to approximate the unknown functions in the system.
  • results: An $\ell_1$ regularization term on the RBF weights avoids overfitting and lets the solution be represented with fewer RBFs; the method provides reliable numerical solutions in three dimensions, where classical methods are typically unaffordable, and is more accurate and robust than most other available machine learning methods.
    Abstract Machine learning has been successfully applied to various fields of scientific computing in recent years. In this work, we propose a sparse radial basis function neural network method to solve elliptic partial differential equations (PDEs) with multiscale coefficients. Inspired by the deep mixed residual method, we rewrite the second-order problem into a first-order system and employ multiple radial basis function neural networks (RBFNNs) to approximate unknown functions in the system. To aviod the overfitting due to the simplicity of RBFNN, an additional regularization is introduced in the loss function. Thus the loss function contains two parts: the $L_2$ loss for the residual of the first-order system and boundary conditions, and the $\ell_1$ regularization term for the weights of radial basis functions (RBFs). An algorithm for optimizing the specific loss function is introduced to accelerate the training process. The accuracy and effectiveness of the proposed method are demonstrated through a collection of multiscale problems with scale separation, discontinuity and multiple scales from one to three dimensions. Notably, the $\ell_1$ regularization can achieve the goal of representing the solution by fewer RBFs. As a consequence, the total number of RBFs scales like $\mathcal{O}(\varepsilon^{-n\tau})$, where $\varepsilon$ is the smallest scale, $n$ is the dimensionality, and $\tau$ is typically smaller than $1$. It is worth mentioning that the proposed method not only has the numerical convergence and thus provides a reliable numerical solution in three dimensions when a classical method is typically not affordable, but also outperforms most other available machine learning methods in terms of accuracy and robustness.
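
A hedged 1D sketch of the sparsification idea: fit RBF weights with an $\ell_1$ penalty via proximal gradient (ISTA) on a least-squares loss. The paper instead minimizes the residual of a first-order PDE system plus boundary terms; the loss, centers, and widths below are assumptions of this toy example.

```python
import numpy as np

def rbf_features(x, centers, width):
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2 * width ** 2))

def fit_sparse_rbf(x, y, centers, width, lam=1e-3, n_iter=5000):
    """Minimize ||Phi w - y||^2 / n + lam * ||w||_1 with proximal gradient (ISTA),
    so that only a subset of RBFs carries nonzero weight."""
    Phi, n = rbf_features(x, centers, width), len(y)
    lr = 1.0 / (2 * np.linalg.norm(Phi, 2) ** 2 / n)            # 1 / Lipschitz const.
    w = np.zeros(len(centers))
    for _ in range(n_iter):
        w -= lr * (2.0 / n) * Phi.T @ (Phi @ w - y)             # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold
    return w

x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.1 * np.sin(40 * np.pi * x)        # two scales
centers = np.linspace(0, 1, 100)
w = fit_sparse_rbf(x, y, centers, width=0.02)
print("active RBFs:", int(np.sum(np.abs(w) > 1e-6)), "of", len(centers))
```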

Structure and Gradient Dynamics Near Global Minima of Two-layer Neural Networks

  • paper_url: http://arxiv.org/abs/2309.00508
  • repo_url: None
  • paper_authors: Leyang Zhang, Yaoyu Zhang, Tao Luo
  • for: To study the structure of the loss landscape of two-layer neural networks near global minima and determine the set of parameters that achieve perfect generalization.
  • methods: New techniques are used to explore the complicated loss landscape, revealing how the model, target function, samples, and initialization affect the training dynamics differently.
  • results: Based on these findings, the paper explains why (overparameterized) neural networks can generalize well.
    Abstract Under mild assumptions, we investigate the structure of loss landscape of two-layer neural networks near global minima, determine the set of parameters which give perfect generalization, and fully characterize the gradient flows around it. With novel techniques, our work uncovers some simple aspects of the complicated loss landscape and reveals how model, target function, samples and initialization affect the training dynamics differently. Based on these results, we also explain why (overparametrized) neural networks could generalize well.

Application of Deep Learning Methods in Monitoring and Optimization of Electric Power Systems

  • paper_url: http://arxiv.org/abs/2309.00498
  • repo_url: None
  • paper_authors: Ognjen Kundacina
  • for: This PhD thesis examines deep learning techniques for the monitoring and optimization of electric power systems, specifically to advance power system state estimation and dynamic distribution network reconfiguration.
  • methods: Graph neural networks are applied to enhance power system state estimation, and reinforcement learning is used for dynamic distribution network reconfiguration.
  • results: Extensive experiments and simulations confirm the effectiveness of the proposed methods for power system monitoring and optimization.
    Abstract This PhD thesis thoroughly examines the utilization of deep learning techniques as a means to advance the algorithms employed in the monitoring and optimization of electric power systems. The first major contribution of this thesis involves the application of graph neural networks to enhance power system state estimation. The second key aspect of this thesis focuses on utilizing reinforcement learning for dynamic distribution network reconfiguration. The effectiveness of the proposed methods is affirmed through extensive experimentation and simulations.

How Does Forecasting Affect the Convergence of DRL Techniques in O-RAN Slicing?

  • paper_url: http://arxiv.org/abs/2309.00489
  • repo_url: None
  • paper_authors: Ahmad M. Nagib, Hatem Abou-Zeid, Hossam S. Hassanein
  • for: This paper focuses on improving the convergence of deep reinforcement learning (DRL) agents in open radio access network (O-RAN) architectures, specifically for immersive applications such as virtual reality (VR) gaming and metaverse services.
  • methods: The authors use time series forecasting of traffic demands to improve the convergence of the DRL-based slicing agents. They propose a novel forecasting-aided DRL approach and provide an exhaustive experiment that supports multiple services, including real VR gaming traffic.
  • results: The proposed approach shows significant improvements in the average initial reward value, convergence rate, and number of converged scenarios compared to the implemented baselines. The results also demonstrate the approach’s robustness against forecasting errors and the feasibility of using imperfect forecasting models.
    Abstract The success of immersive applications such as virtual reality (VR) gaming and metaverse services depends on low latency and reliable connectivity. To provide seamless user experiences, the open radio access network (O-RAN) architecture and 6G networks are expected to play a crucial role. RAN slicing, a critical component of the O-RAN paradigm, enables network resources to be allocated based on the needs of immersive services, creating multiple virtual networks on a single physical infrastructure. In the O-RAN literature, deep reinforcement learning (DRL) algorithms are commonly used to optimize resource allocation. However, the practical adoption of DRL in live deployments has been sluggish. This is primarily due to the slow convergence and performance instabilities suffered by the DRL agents both upon initial deployment and when there are significant changes in network conditions. In this paper, we investigate the impact of time series forecasting of traffic demands on the convergence of the DRL-based slicing agents. For that, we conduct an exhaustive experiment that supports multiple services including real VR gaming traffic. We then propose a novel forecasting-aided DRL approach and its respective O-RAN practical deployment workflow to enhance DRL convergence. Our approach shows up to 22.8%, 86.3%, and 300% improvements in the average initial reward value, convergence rate, and number of converged scenarios respectively, enhancing the generalizability of the DRL agents compared with the implemented baselines. The results also indicate that our approach is robust against forecasting errors and that forecasting models do not have to be ideal.

Geometry-aware Line Graph Transformer Pre-training for Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2309.00483
  • repo_url: None
  • paper_authors: Peizhen Bai, Xianyuan Liu, Haiping Lu
  • for: To improve molecular representation learning and strengthen molecular property prediction.
  • methods: A self-supervised pre-training framework that extracts molecular information from both 2D and 3D modalities.
  • results: Compared against six baselines on twelve property prediction benchmarks, the method consistently outperforms all baselines, demonstrating its effectiveness.
    Abstract Molecular property prediction with deep learning has gained much attention over the past years. Owing to the scarcity of labeled molecules, there has been growing interest in self-supervised learning methods that learn generalizable molecular representations from unlabeled data. Molecules are typically treated as 2D topological graphs in modeling, but it has been discovered that their 3D geometry is of great importance in determining molecular functionalities. In this paper, we propose the Geometry-aware line graph transformer (Galformer) pre-training, a novel self-supervised learning framework that aims to enhance molecular representation learning with 2D and 3D modalities. Specifically, we first design a dual-modality line graph transformer backbone to encode the topological and geometric information of a molecule. The designed backbone incorporates effective structural encodings to capture graph structures from both modalities. Then we devise two complementary pre-training tasks at the inter and intra-modality levels. These tasks provide properly supervised information and extract discriminative 2D and 3D knowledge from unlabeled molecules. Finally, we evaluate Galformer against six state-of-the-art baselines on twelve property prediction benchmarks via downstream fine-tuning. Experimental results show that Galformer consistently outperforms all baselines on both classification and regression tasks, demonstrating its effectiveness.

Polynomial-Model-Based Optimization for Blackbox Objectives

  • paper_url: http://arxiv.org/abs/2309.00663
  • repo_url: None
  • paper_authors: Janina Schreiber, Damar Wicaksono, Michael Hecht
  • for: The paper addresses black-box optimization, where the structure of systems such as neural networks or complex simulations is unknown and the goal is to find (hyper-)parameters that minimize a pre-defined objective function.
  • methods: It proposes Polynomial-Model-Based Optimization (PMBO), a black-box optimizer that fits a polynomial surrogate to the objective function; motivated by Bayesian optimization, the model is iteratively updated according to the Expected Improvement acquisition function, balancing exploitation and exploration and providing an uncertainty estimate of the model.
  • results: Benchmarked against other state-of-the-art algorithms on a set of artificial, analytical functions, PMBO competes successfully and even outperforms all of them in some cases, suggesting it is an attractive choice for black-box optimization tasks across a wide range of disciplines.
    Abstract For a wide range of applications the structure of systems like Neural Networks or complex simulations, is unknown and approximation is costly or even impossible. Black-box optimization seeks to find optimal (hyper-) parameters for these systems such that a pre-defined objective function is minimized. Polynomial-Model-Based Optimization (PMBO) is a novel blackbox optimizer that finds the minimum by fitting a polynomial surrogate to the objective function. Motivated by Bayesian optimization the model is iteratively updated according to the acquisition function Expected Improvement, thus balancing the exploitation and exploration rate and providing an uncertainty estimate of the model. PMBO is benchmarked against other state-of-the-art algorithms for a given set of artificial, analytical functions. PMBO competes successfully with those algorithms and even outperforms all of them in some cases. As the results suggest, we believe PMBO is the pivotal choice for solving blackbox optimization tasks occurring in a wide range of disciplines.
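
A much-simplified 1D surrogate loop in the spirit of PMBO, shown only to fix ideas: fit a polynomial to the evaluations so far and query its minimizer, alternating with random exploration. The real PMBO uses multivariate polynomial surrogates and an Expected-Improvement acquisition with an uncertainty estimate, which this sketch omits; all names and settings below are illustrative.

```python
import numpy as np

def surrogate_optimize_1d(f, bounds, n_init=5, n_iter=20, degree=4, rng=None):
    """Fit a polynomial surrogate to all evaluations, query its minimizer on a
    grid, and alternate with a random exploration point."""
    rng = rng or np.random.default_rng(0)
    lo, hi = bounds
    xs = list(rng.uniform(lo, hi, size=n_init))
    ys = [f(x) for x in xs]
    grid = np.linspace(lo, hi, 2000)
    for t in range(n_iter):
        coeffs = np.polyfit(xs, ys, deg=min(degree, len(xs) - 1))
        x_next = grid[np.argmin(np.polyval(coeffs, grid))] if t % 2 == 0 \
                 else rng.uniform(lo, hi)
        xs.append(x_next)
        ys.append(f(x_next))
    best = int(np.argmin(ys))
    return xs[best], ys[best]

print(surrogate_optimize_1d(lambda x: (x - 1.3) ** 2 + 0.1 * np.sin(8 * x), (-3, 3)))
```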

A Locality-based Neural Solver for Optical Motion Capture

  • paper_url: http://arxiv.org/abs/2309.00428
  • repo_url: https://github.com/non-void/localmocap
  • paper_authors: Xiaoyu Pan, Bowen Zheng, Xinwei Jiang, Guanglong Xu, Xianli Gu, Jingxiang Li, Qilong Kou, He Wang, Tianjia Shao, Kun Zhou, Xiaogang Jin
  • for: The paper proposes a locality-based method for cleaning and solving optical motion capture (OMC) data, mitigating the impact of marker errors and occlusions.
  • methods: A heterogeneous graph neural network (HGNN) treats markers and joints as different types of nodes, using graph convolution operations to extract local features of markers and joints and transform them into clean motions.
  • results: Extensive comparisons on multiple datasets show that the method predicts occluded marker positions with high accuracy across multiple metrics and reduces the error of the reconstructed joint rotations and positions by 30%. Code and data are available at https://github.com/non-void/LocalMoCap.
    Abstract We present a novel locality-based learning method for cleaning and solving optical motion capture data. Given noisy marker data, we propose a new heterogeneous graph neural network which treats markers and joints as different types of nodes, and uses graph convolution operations to extract the local features of markers and joints and transform them to clean motions. To deal with anomaly markers (e.g. occluded or with big tracking errors), the key insight is that a marker's motion shows strong correlations with the motions of its immediate neighboring markers but less so with other markers, a.k.a. locality, which enables us to efficiently fill missing markers (e.g. due to occlusion). Additionally, we also identify marker outliers due to tracking errors by investigating their acceleration profiles. Finally, we propose a training regime based on representation learning and data augmentation, by training the model on data with masking. The masking schemes aim to mimic the occluded and noisy markers often observed in the real data. Finally, we show that our method achieves high accuracy on multiple metrics across various datasets. Extensive comparison shows our method outperforms state-of-the-art methods in terms of prediction accuracy of occluded marker position error by approximately 20%, which leads to a further error reduction on the reconstructed joint rotations and positions by 30%. The code and data for this paper are available at https://github.com/non-void/LocalMoCap.
    摘要 我们提出了一种新的地域性学习方法,用于清洁和解决光学动作捕捉数据。给定含有噪声的标记数据,我们提议一种新的异类图 neural network,其中标记和关节被视为不同类型的节点,并使用图 convolution 操作来提取标记和关节的本地特征,并将其转化为清洁动作。为了处理异常标记(例如受到遮盖或大跟踪错误),我们的关键发现是,标记的运动具有强相关性,与其邻近的标记的运动相关,而与其他标记的运动相关性较弱,这使得我们能够高效地填充缺失的标记(例如由遮盖所致)。此外,我们还可以识别标记异常(例如由跟踪错误所致),通过研究它们的加速度轨迹。最后,我们建议一种基于表示学习和数据扩展的训练方法,通过在数据上进行掩码训练。掩码方案的目的是模拟实际数据中的受遮盖和噪声标记。我们的方法在多个维度上达到高精度,相比之前的方法,我们的方法在填充缺失标记和跟踪错误方面的预测精度提高约20%,这导致了再次的误差减少在重建关节旋转和位置上约30%。代码和数据可以在https://github.com/non-void/LocalMoCap中获取。
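The abstract mentions identifying marker outliers from their acceleration profiles but does not give the exact criterion; the sketch below uses finite-difference acceleration and a robust median/MAD threshold as one plausible instantiation. The threshold rule and sampling rate are assumptions.

```python
import numpy as np

def flag_acceleration_outliers(traj, dt=1.0 / 120.0, z_thresh=6.0):
    """Flag frames whose marker acceleration magnitude is anomalously large.

    traj: (T, 3) array of one marker's positions; dt: frame interval.
    Returns a boolean mask of length T (True = suspected tracking error).
    """
    acc = np.gradient(np.gradient(traj, dt, axis=0), dt, axis=0)  # finite-difference acceleration
    mag = np.linalg.norm(acc, axis=1)
    med = np.median(mag)
    mad = np.median(np.abs(mag - med)) + 1e-9                     # robust spread estimate
    return (mag - med) / (1.4826 * mad) > z_thresh

# toy usage: a smooth trajectory with one corrupted frame
t = np.linspace(0, 1, 240)
traj = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)
traj[100] += np.array([0.3, -0.2, 0.4])  # simulated tracking glitch
print(np.where(flag_acceleration_outliers(traj))[0])
```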

Advancing Personalized Federated Learning: Group Privacy, Fairness, and Beyond

  • paper_url: http://arxiv.org/abs/2309.00416
  • repo_url: None
  • paper_authors: Filippo Galli, Kangsoo Jung, Sayan Biswas, Catuscia Palamidessi, Tommaso Cucinotta
  • for: 本研究旨在探讨在分布式学习框架下,如何兼顾个人化、隐私保障和公平性。
  • methods: 本研究使用了 $d$-privacy(也称为 metric privacy)来保证客户端数据的隐私,通过对模型更新施加基于度量的混淆,在提供形式化隐私保障的同时实现个性化模型训练。
  • results: 研究发现,通过使用 $d$-privacy,可以在分布式学习框架下实现个人化模型训练,同时提供正式的隐私保障和较好的群体公平性。
    Abstract Federated learning (FL) is a framework for training machine learning models in a distributed and collaborative manner. During training, a set of participating clients process their data stored locally, sharing only the model updates obtained by minimizing a cost function over their local inputs. FL was proposed as a stepping-stone towards privacy-preserving machine learning, but it has been shown vulnerable to issues such as leakage of private information, lack of personalization of the model, and the possibility of having a trained model that is fairer to some groups than to others. In this paper, we address the triadic interaction among personalization, privacy guarantees, and fairness attained by models trained within the FL framework. Differential privacy and its variants have been studied and applied as cutting-edge standards for providing formal privacy guarantees. However, clients in FL often hold very diverse datasets representing heterogeneous communities, making it important to protect their sensitive information while still ensuring that the trained model upholds the aspect of fairness for the users. To attain this objective, a method is put forth that introduces group privacy assurances through the utilization of $d$-privacy (aka metric privacy). $d$-privacy represents a localized form of differential privacy that relies on a metric-oriented obfuscation approach to maintain the original data's topological distribution. This method, besides enabling personalized model training in a federated approach and providing formal privacy guarantees, possesses significantly better group fairness measured under a variety of standard metrics than a global model trained within a classical FL template. Theoretical justifications for the applicability are provided, as well as experimental validation on real-world datasets to illustrate the working of the proposed method.
    摘要 联邦学习(FL)是一种以分布式、协作方式训练机器学习模型的框架。在训练过程中,参与训练的客户端在本地处理自己的数据,仅共享通过在本地输入上最小化代价函数得到的模型更新。FL 被提出为迈向隐私保护机器学习的一个途径,但它已被证明存在一些问题,如隐私信息泄露、模型缺乏个性化、以及训练出的模型可能对某些群体比其他群体更公平。本文探讨在 FL 框架中训练的模型在个性化、隐私保障和公平性之间的三元交互。差分隐私及其变体已被研究和应用,成为提供形式化隐私保障的前沿标准。然而,FL 中的客户端通常持有代表异质社区的多样化数据集,因此在保护其敏感信息的同时,仍需确保训练出的模型对用户保持公平。为解决这个问题,我们提出了基于 $d$-隐私(又称度量隐私,metric privacy)引入群体隐私保证的方法。$d$-隐私是一种局部化的差分隐私形式,通过面向度量的混淆方式保持原始数据的拓扑分布。该方法不仅支持在联邦框架下进行个性化模型训练并提供形式化隐私保障,而且在多种标准度量下衡量的群体公平性显著优于经典 FL 模板下训练的全局模型。我们还给出了适用性的理论论证,并在真实数据集上进行了实验验证,以说明所提方法的工作机制。
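The abstract does not spell out the obfuscation mechanism; a standard way to achieve $d$-privacy for vector-valued client updates is to add noise calibrated to a chosen metric. The sketch below uses the componentwise Laplace mechanism, which satisfies $\epsilon \cdot d_1(x, x')$-privacy for the $L_1$ metric — an illustration of metric privacy in general, not necessarily the mechanism used in the paper.

```python
import numpy as np

def d_private_release(update, epsilon, rng=None):
    """Obfuscate a client's model update under d-privacy w.r.t. the L1 metric.

    Adding i.i.d. Laplace(0, 1/epsilon) noise to each coordinate gives
    epsilon * d_1(x, x') indistinguishability for any pair of updates x, x'
    (a standard metric-privacy mechanism; the paper's concrete choice may differ).
    """
    rng = np.random.default_rng() if rng is None else rng
    return update + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=update.shape)

# toy usage: a client perturbs its update before sending it to the server
update = np.array([0.12, -0.05, 0.31])
print(d_private_release(update, epsilon=5.0, rng=np.random.default_rng(0)))
```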

Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds

  • paper_url: http://arxiv.org/abs/2309.00380
  • repo_url: None
  • paper_authors: Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega
  • for: 这篇论文旨在提出基于深度隐变量模型的多模态数据生成模型,学习能够联合解释多个模态的隐表示。
  • methods: 论文以多模态变分自编码器(VAEs)作为生成模型,并采用 Product-of-Experts(PoE)或 Mixture-of-Experts(MoE)等聚合方案来编码来自不同模态子集的隐变量。
  • results: 论文提出了更灵活的聚合方案和更紧的变分下界,以提高多模态生成模型的生成质量与跨模态一致性。
    Abstract Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations which jointly explain multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. In order to encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational bound that can tightly lower bound the data log-likelihood. We develop more flexible aggregation schemes that generalise PoE or MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational bounds and various aggregation schemes. We show that tighter variational bounds and more flexible aggregation models can become beneficial when one wants to approximate the true joint distribution over observed modalities and latent variables in identifiable models.
    摘要 为多模态数据设计深度隐变量模型一直是机器学习研究的长期主题。多模态变分自编码器(VAEs)是一类受欢迎的生成模型,它学习能够联合解释多个模态的隐表示。针对这类模型,人们提出了多种目标函数,通常由多模态数据对数似然的下界或信息论的考虑所驱动。为了从不同的模态子集中编码隐变量,Product-of-Experts(PoE)或 Mixture-of-Experts(MoE)等聚合方案被广泛使用,并表现出不同的权衡,例如在生成质量或跨模态一致性方面。在这项工作中,我们考虑一种能够紧致下界多模态数据对数似然的变分界。我们开发了更灵活的聚合方案,通过置换不变的神经网络来组合来自不同模态的编码特征,从而推广 PoE 和 MoE 方法。数值实验展示了多模态变分界与各种聚合方案之间的权衡。我们表明,当需要在可辨识模型中逼近观测模态与隐变量的真实联合分布时,更紧的变分界和更灵活的聚合模型会带来收益。
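A permutation-invariant aggregation of modality encodings can be built in the Deep Sets style: apply a shared map to each modality's encoding, sum the results, and map the pooled vector to posterior parameters. The toy sketch below (plain NumPy, hypothetical dimensions) only illustrates the invariance property, not the paper's architecture or variational bound.

```python
import numpy as np

rng = np.random.default_rng(0)

def phi(h, W1, b1):
    # per-modality feature map (shared across modalities)
    return np.tanh(h @ W1 + b1)

def rho(pooled, W2, b2, latent_dim):
    # map the pooled representation to Gaussian posterior parameters
    out = pooled @ W2 + b2
    return out[:latent_dim], out[latent_dim:]   # mu, log_var

# hypothetical dimensions: 3 modalities, 16-dim encodings, 4-dim latent
enc_dim, hid_dim, latent_dim = 16, 32, 4
W1, b1 = rng.normal(size=(enc_dim, hid_dim)), np.zeros(hid_dim)
W2, b2 = rng.normal(size=(hid_dim, 2 * latent_dim)), np.zeros(2 * latent_dim)

modality_encodings = [rng.normal(size=enc_dim) for _ in range(3)]

# sum-pooling over phi(.) makes the aggregation invariant to modality order
pooled = sum(phi(h, W1, b1) for h in modality_encodings)
mu, log_var = rho(pooled, W2, b2, latent_dim)
print(mu.shape, log_var.shape)  # (4,) (4,)
```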

Anomaly detection with semi-supervised classification based on risk estimators

  • paper_url: http://arxiv.org/abs/2309.00379
  • repo_url: None
  • paper_authors: Le Thi Khanh Hien, Sukanya Patra, Souhaib Ben Taieb
  • for: 本研究旨在克服一类分类(one-class)异常检测方法的一个重要局限,即其假设未标注的训练数据只包含正常实例。
  • methods: 我们提出了两种新的基于分类的异常检测方法:一种是基于无偏风险估计器的半监督浅层异常检测方法,另一种是利用非负(有偏)风险估计器的半监督深度异常检测方法。
  • results: 我们对两种风险估计的选择和正则化参数的选择进行了严格的分析,并通过广泛的实验证明了异常检测方法的有效性。
    Abstract A significant limitation of one-class classification anomaly detection methods is their reliance on the assumption that unlabeled training data only contains normal instances. To overcome this impractical assumption, we propose two novel classification-based anomaly detection methods. Firstly, we introduce a semi-supervised shallow anomaly detection method based on an unbiased risk estimator. Secondly, we present a semi-supervised deep anomaly detection method utilizing a nonnegative (biased) risk estimator. We establish estimation error bounds and excess risk bounds for both risk minimizers. Additionally, we propose techniques to select appropriate regularization parameters that ensure the nonnegativity of the empirical risk in the shallow model under specific loss functions. Our extensive experiments provide strong evidence of the effectiveness of the risk-based anomaly detection methods.
    摘要 一类分类(one-class)异常检测方法的一个重要局限在于其假设未标注的训练数据只包含正常实例。为了突破这个不切实际的假设,我们提出了两种新的基于分类的异常检测方法。首先,我们介绍了一种基于无偏风险估计器的半监督浅层异常检测方法。其次,我们介绍了一种利用非负(有偏)风险估计器的半监督深度异常检测方法。我们为这两种风险最小化方法建立了估计误差界和超额风险界,并提出了在特定损失函数下选择合适正则化参数、以保证浅层模型经验风险非负性的技术。广泛的实验有力地证明了这些基于风险估计的异常检测方法的有效性。
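The abstract does not state the estimators' exact form; the snippet below only illustrates the general idea of a nonnegative risk estimate in the style of non-negative PU learning (Kiryo et al., 2017), where the unlabeled-data term is clipped at zero. It is a stand-in for intuition, not the estimator derived in the paper, and the anomaly prior `pi` is a made-up parameter.

```python
import numpy as np

def sigmoid_loss(scores, labels):
    # smooth surrogate loss; labels in {+1, -1}
    return 1.0 / (1.0 + np.exp(labels * scores))

def nonnegative_risk(scores_lab, scores_unl, pi):
    """Non-negative risk estimate in the nnPU style (illustrative only).

    scores_lab: classifier scores on labeled anomalies,
    scores_unl: scores on unlabeled data, pi: assumed anomaly prior.
    """
    risk_pos = sigmoid_loss(scores_lab, +1).mean()
    risk_unl_neg = sigmoid_loss(scores_unl, -1).mean()
    risk_pos_neg = sigmoid_loss(scores_lab, -1).mean()
    # clipping at zero keeps the empirical risk nonnegative
    return pi * risk_pos + max(0.0, risk_unl_neg - pi * risk_pos_neg)

rng = np.random.default_rng(1)
print(nonnegative_risk(rng.normal(1.0, 1.0, 50), rng.normal(-0.5, 1.0, 500), pi=0.05))
```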

Where Did the Gap Go? Reassessing the Long-Range Graph Benchmark

  • paper_url: http://arxiv.org/abs/2309.00367
  • repo_url: https://github.com/toenshoff/lrgb
  • paper_authors: Jan Tönshoff, Martin Ritzert, Eran Rosenbluth, Martin Grohe
  • for: establish a higher standard of empirical rigor within the graph machine learning community
  • methods: carefully reevaluate multiple MPGNN baselines and the Graph Transformer GPS
  • results: the reported performance gap is overestimated due to suboptimal hyperparameter choices, and the performance gap completely vanishes after basic hyperparameter optimization.
  • for: 在图机器学习社区中建立更高的实证严谨标准
  • methods: 仔细重新评估多个 MPGNN 基线以及图 Transformer GPS
  • results: 由于超参数选择欠佳,报告的性能差距被高估;经过基本的超参数优化后,该差距完全消失
    Abstract The recent Long-Range Graph Benchmark (LRGB, Dwivedi et al. 2022) introduced a set of graph learning tasks strongly dependent on long-range interaction between vertices. Empirical evidence suggests that on these tasks Graph Transformers significantly outperform Message Passing GNNs (MPGNNs). In this paper, we carefully reevaluate multiple MPGNN baselines as well as the Graph Transformer GPS (Ramp\'a\v{s}ek et al. 2022) on LRGB. Through a rigorous empirical analysis, we demonstrate that the reported performance gap is overestimated due to suboptimal hyperparameter choices. It is noteworthy that across multiple datasets the performance gap completely vanishes after basic hyperparameter optimization. In addition, we discuss the impact of lacking feature normalization for LRGB's vision datasets and highlight a spurious implementation of LRGB's link prediction metric. The principal aim of our paper is to establish a higher standard of empirical rigor within the graph machine learning community.
    摘要 最近提出的长距离图基准(LRGB,Dwivedi et al. 2022)引入了一组强烈依赖顶点间长距离交互的图学习任务。经验证据表明,在这些任务上图 Transformer 明显优于消息传递 GNN(MPGNN)。在这篇论文中,我们在 LRGB 上仔细重新评估了多个 MPGNN 基线以及图 Transformer GPS(Rampášek et al. 2022)。通过严格的实证分析,我们表明所报告的性能差距由于超参数选择欠佳而被高估。值得注意的是,在多个数据集上,这一性能差距在基本的超参数优化之后完全消失。此外,我们讨论了 LRGB 视觉数据集缺少特征归一化的影响,并指出了 LRGB 链接预测指标的一处有缺陷的实现。本文的主要目标是在图机器学习社区中建立更高的实证严谨标准。
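The feature-normalization issue mentioned for LRGB's vision datasets refers to standard preprocessing of node features; a typical z-score normalization fitted on the training split is sketched below with hypothetical arrays (not the benchmark's actual data loaders).

```python
import numpy as np

def fit_zscore(train_feats, eps=1e-8):
    # compute per-dimension mean/std on the training split only
    return train_feats.mean(axis=0), train_feats.std(axis=0) + eps

def apply_zscore(feats, mean, std):
    return (feats - mean) / std

# hypothetical node-feature matrices (num_nodes x feature_dim)
rng = np.random.default_rng(0)
train_x, test_x = rng.normal(5.0, 3.0, (1000, 14)), rng.normal(5.0, 3.0, (200, 14))
mean, std = fit_zscore(train_x)
train_x, test_x = apply_zscore(train_x, mean, std), apply_zscore(test_x, mean, std)
print(train_x.mean(axis=0).round(2)[:3], train_x.std(axis=0).round(2)[:3])
```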

FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning

  • paper_url: http://arxiv.org/abs/2309.00363
  • repo_url: https://github.com/alibaba/federatedscope
  • paper_authors: Weirui Kuang, Bingchen Qian, Zitao Li, Daoyuan Chen, Dawei Gao, Xuchen Pan, Yuexiang Xie, Yaliang Li, Bolin Ding, Jingren Zhou
  • for: This paper focuses on the challenges of fine-tuning large language models (LLMs) in federated learning (FL) settings, and proposes a package called FS-LLM to address these challenges.
  • methods: The paper introduces several components of the FS-LLM package, including an end-to-end benchmarking pipeline, federated parameter-efficient fine-tuning algorithms, and resource-efficient operators for fine-tuning LLMs with limited resources.
  • results: The paper conducts extensive experiments to validate the effectiveness of FS-LLM and compares it with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings. The results show that FS-LLM achieves better performance with lower communication and computation costs, and provides valuable insights into federated fine-tuning LLMs for the research community.
    Abstract LLMs have demonstrated great capabilities in various NLP tasks. Different entities can further improve the performance of those LLMs on their specific downstream tasks by fine-tuning LLMs. When several entities have similar interested tasks, but their data cannot be shared because of privacy concerns regulations, federated learning (FL) is a mainstream solution to leverage the data of different entities. However, fine-tuning LLMs in federated learning settings still lacks adequate support from existing FL frameworks because it has to deal with optimizing the consumption of significant communication and computational resources, data preparation for different tasks, and distinct information protection demands. This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution, which consists of the following components: (1) we build an end-to-end benchmarking pipeline, automizing the processes of dataset preprocessing, federated fine-tuning execution, and performance evaluation on federated LLM fine-tuning; (2) we provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios with low communication and computation costs, even without accessing the full model; (3) we adopt several accelerating and resource-efficient operators for fine-tuning LLMs with limited resources and the flexible pluggable sub-routines for interdisciplinary study. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings, which also yields valuable insights into federated fine-tuning LLMs for the research community. To facilitate further research and adoption, we release FS-LLM at https://github.com/alibaba/FederatedScope/tree/llm.
    摘要 LLMs 已经在各种自然语言处理任务中表现出了强大的能力。不同的实体可以通过在各自的下游任务上微调 LLMs 来进一步提升性能。当多个实体有相似的目标任务,但由于隐私法规限制无法共享数据时,联邦学习(FL)成为利用不同实体数据的主流解决方案。然而,在联邦学习设置下微调 LLMs 仍然缺乏现有 FL 框架的有效支持,因为需要优化巨大的通信和计算资源消耗、为不同任务准备数据,并满足不同的信息保护要求。本文首先讨论了联邦微调 LLMs 所面临的这些挑战,并介绍我们的工具包 FS-LLM 作为主要贡献,该工具包包括以下三个组件:1. 我们建立了一个端到端的基准测试管道,自动化了数据预处理、联邦微调执行以及联邦 LLM 微调的性能评估;2. 我们提供了全面的联邦参数高效微调算法实现和多样化的编程接口,以便在低通信和计算成本、甚至无法访问完整模型的 FL 场景中进行未来扩展;3. 我们采用了多种加速和资源高效的算子,以便在资源有限的情况下微调 LLMs,并提供可插拔的子例程以支持跨学科研究。我们进行了广泛的实验来验证 FS-LLM 的有效性,并在 FL 设置下用最先进的参数高效微调算法对先进的 LLMs 进行了基准测试,为研究社区提供了关于联邦微调 LLMs 的有价值见解。为了促进进一步的研究和应用,我们已将 FS-LLM 发布在 https://github.com/alibaba/FederatedScope/tree/llm。
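FS-LLM's actual interfaces live in the linked repository; the sketch below only illustrates the general recipe behind communication-efficient federated fine-tuning, where clients train and exchange small low-rank (LoRA-style) adapters instead of full model weights. All shapes, the toy local objective, and the plain averaging of adapter factors are simplifying assumptions, not FS-LLM's API.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, num_clients, rounds = 64, 4, 5, 3   # hidden size and adapter rank are hypothetical

def client_update(A, B, data, lr=0.01, steps=5):
    # toy local training: fit the low-rank map A @ B to this client's (X, Y) pairs
    X, Y = data
    for _ in range(steps):
        grad = X.T @ (X @ (A @ B) - Y) / len(X)   # gradient w.r.t. the product A @ B
        gA, gB = grad @ B.T, A.T @ grad
        A, B = A - lr * gA, B - lr * gB
    return A, B

A = rng.normal(scale=0.01, size=(d, r))
B = rng.normal(scale=0.01, size=(r, d))
clients = [(rng.normal(size=(32, d)), rng.normal(size=(32, d))) for _ in range(num_clients)]

for _ in range(rounds):
    updates = [client_update(A.copy(), B.copy(), data) for data in clients]
    # the server averages only the adapters (2*d*r numbers), never a full d*d weight
    A = np.mean([a for a, _ in updates], axis=0)
    B = np.mean([b for _, b in updates], axis=0)

print("parameters communicated per client per round:", 2 * d * r)
```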

Local and adaptive mirror descents in extensive-form games

  • paper_url: http://arxiv.org/abs/2309.00656
  • repo_url: None
  • paper_authors: Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko
  • for: 这篇论文研究如何在零和不完全信息博弈(IIG)中,基于轨迹反馈学习 $\epsilon$-最优策略。
  • methods: 论文采用固定抽样方法:玩家在固定数量的回合中依据观测逐步更新策略,但观测通过一个给定的固定抽样策略获得;更新基于一种自适应的 Online Mirror Descent(OMD)算法,在每个信息集上局部应用,并使用逐个递减的学习率和带正则化的损失。
  • results: 论文证明该方法能以高概率达到 $\tilde{\mathcal{O}}(T^{-1/2})$ 的收敛速度,并且对博弈参数具有接近最优的依赖关系。
    Abstract We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by $T$. Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer et al., 2022). To reduce this variance, we consider a fixed sampling approach, where players still update their policies over time, but with observations obtained through a given fixed sampling policy. Our approach is based on an adaptive Online Mirror Descent (OMD) algorithm that applies OMD locally to each information set, using individually decreasing learning rates and a regularized loss. We show that this approach guarantees a convergence rate of $\tilde{\mathcal{O}}(T^{-1/2})$ with high probability and has a near-optimal dependence on the game parameters when applied with the best theoretical choices of learning rates and sampling policies. To achieve these results, we generalize the notion of OMD stabilization, allowing for time-varying regularization with convex increments.
    摘要 我们研究如何在零和不完全信息博弈中基于轨迹反馈学习 $\epsilon$-最优策略,其中玩家在固定数量的回合 $T$ 内依据观测序列依次更新策略。现有方法由于对动作序列使用重要性采样(importance sampling)而存在高方差。为了降低这种方差,我们考虑一种固定抽样方法:玩家仍然随时间更新策略,但观测通过给定的固定抽样策略获得。我们的方法基于一种自适应 Online Mirror Descent(OMD)算法,该算法在每个信息集上局部应用 OMD,使用逐个递减的学习率和带正则化的损失。我们证明这种方法以高概率保证 $\tilde{\mathcal{O}}(T^{-1/2})$ 的收敛速度,并且在采用理论上最优的学习率和抽样策略时,对博弈参数具有接近最优的依赖性。为了得到这些结果,我们推广了 OMD 稳定性的概念,允许带凸增量的时变正则化。
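The local update can be pictured as an entropic (exponentiated-gradient) OMD step on each information set's action simplex with an individually decreasing learning rate. The sketch below shows that standard step only; the paper's specific regularized loss, loss estimates, and fixed sampling policy are not reproduced.

```python
import numpy as np

def local_omd_update(policy, loss_estimate, visit_count, base_lr=0.5):
    """One entropic OMD step at a single information set.

    policy: current distribution over actions at this infoset,
    loss_estimate: estimated per-action losses for the episode,
    visit_count: number of previous updates (gives a decreasing step size).
    """
    lr = base_lr / np.sqrt(visit_count + 1)          # individually decreasing learning rate
    new_policy = policy * np.exp(-lr * loss_estimate)
    return new_policy / new_policy.sum()             # Bregman projection onto the simplex

policy = np.ones(3) / 3
for t, losses in enumerate([np.array([1.0, 0.2, 0.5]), np.array([0.8, 0.1, 0.9])]):
    policy = local_omd_update(policy, losses, visit_count=t)
print(policy)
```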

Bespoke Nanoparticle Synthesis and Chemical Knowledge Discovery Via Autonomous Experimentations

  • paper_url: http://arxiv.org/abs/2309.00349
  • repo_url: None
  • paper_authors: Hyuk Jun Yoo, Nayeon Kim, Heeseung Lee, Daeho Kim, Leslie Tiong Ching Ow, Hyobin Nam, Chansoo Kim, Seung Yong Lee, Kwan-Young Lee, Donghun Kim, Sang Soo Han
  • for: 本研究旨在开发一种自主实验平台,用于定制合成具有目标光学性质的纳米材料。
  • methods: 该平台以闭环(closed-loop)方式运行,将纳米粒子批量合成模块与 UV-Vis 光谱测量模块相连,并基于人工智能优化模型的反馈迭代调整合成条件。
  • results: 以银(Ag)纳米粒子为例,我们展示了带提前停止准则的贝叶斯优化器在对五种合成试剂进行优化时的高效性:仅需约 200 轮迭代即可精确获得具有目标吸收光谱的纳米粒子。此外,我们还发现了一种新的化学效应:柠檬酸盐(citrate)的用量是控制球形与片状纳米粒子之间竞争的关键,因而也影响吸收光谱的形状。
    Abstract The optimization of nanomaterial synthesis using numerous synthetic variables is considered to be an extremely laborious task because the conventional combinatorial explorations are prohibitively expensive. In this work, we report an autonomous experimentation platform developed for the bespoke design of nanoparticles (NPs) with targeted optical properties. This platform operates in a closed-loop manner between a batch synthesis module of NPs and a UV-Vis spectroscopy module, based on the feedback of the AI optimization modeling. With silver (Ag) NPs as a representative example, we demonstrate that the Bayesian optimizer implemented with the early stopping criterion can efficiently produce Ag NPs precisely possessing the desired absorption spectra within only 200 iterations (when optimizing among five synthetic reagents). In addition to the outstanding material developmental efficiency, the analysis of synthetic variables further reveals a novel chemistry involving the effects of citrate in Ag NP synthesis. The amount of citrate is a key to controlling the competitions between spherical and plate-shaped NPs and, as a result, affects the shapes of the absorption spectra as well. Our study highlights both capabilities of the platform to enhance search efficiencies and to provide a novel chemical knowledge by analyzing datasets accumulated from the autonomous experimentations.
    摘要 纳米材料合成涉及众多合成变量,其优化被认为是一项极其繁重的任务,因为传统的组合式探索成本过高。在这项工作中,我们报道了一种自主实验平台,用于定制设计具有目标光学性质的纳米粒子(NPs)。该平台以闭环方式将纳米粒子批量合成模块与 UV-Vis 光谱模块相连,基于人工智能优化模型的反馈运行。以银(Ag)纳米粒子为例,我们展示了带提前停止准则的贝叶斯优化器在对五种合成试剂进行优化时,仅需约 200 轮迭代即可高效地生成具有目标吸收光谱的银纳米粒子。除了突出的材料开发效率之外,对合成变量的分析还揭示了 citrate 在银纳米粒子合成中的新化学效应:citrate 的用量是控制球形与片状纳米粒子之间竞争的关键,因而也影响吸收光谱的形状。我们的研究表明,该平台既能提升搜索效率,又能通过分析自主实验积累的数据提供新的化学知识。
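Implementation details of the platform are not public in the abstract; the skeleton below only illustrates the closed-loop structure — propose reagent volumes, synthesize and measure a spectrum, score it against the target, stop early once improvement stalls. The spectrum-matching loss, the patience-based stopping rule, and the stand-in "measurement" function are all assumptions.

```python
import numpy as np

def spectral_loss(measured, target):
    # distance between a measured UV-Vis spectrum and the target spectrum
    return float(np.mean((measured - target) ** 2))

def closed_loop(propose, synthesize_and_measure, target, budget=200, patience=20, tol=1e-4):
    # propose -> synthesize -> measure -> update, with a simple early-stopping rule
    history, best, stall = [], np.inf, 0
    for _ in range(budget):
        params = propose(history)
        loss = spectral_loss(synthesize_and_measure(params), target)
        history.append((params, loss))
        best, stall = (loss, 0) if loss < best - tol else (best, stall + 1)
        if stall >= patience:   # assumed form of the early-stopping criterion
            break
    return min(history, key=lambda h: h[1])

# stand-ins for the robotic synthesis + spectroscopy modules and the optimizer
rng = np.random.default_rng(0)
wavelengths = np.linspace(300, 800, 100)
fake_measure = lambda p: np.exp(-((wavelengths - (400 + 200 * p[0])) / 40) ** 2)
target = fake_measure(np.array([0.42, 0.0]))
random_propose = lambda history: rng.random(2)   # placeholder for the Bayesian proposer

best_params, best_loss = closed_loop(random_propose, fake_measure, target)
print(best_params.round(3), round(best_loss, 6))
```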

Multitask Deep Learning for Accurate Risk Stratification and Prediction of Next Steps for Coronary CT Angiography Patients

  • paper_url: http://arxiv.org/abs/2309.00330
  • repo_url: None
  • paper_authors: Juan Lu, Mohammed Bennamoun, Jonathon Stewart, JasonK. Eshraghian, Yanbin Liu, Benjamin Chow, Frank M. Sanfilippo, Girish Dwivedi
  • for: 本研究旨在改进疑似及已确诊冠状动脉疾病(CAD)患者的风险分层与临床诊疗决策。
  • methods: 这篇研究使用了多任务深度学习模型,以支持风险评估和下游测试选择。
  • results: 研究结果显示,该模型能够基于 CCTA 报告数据进行风险分层和下游测试推荐:CAD 风险分层的 AUC 为 0.76,下游测试预测的 AUC 为 0.72。
    Abstract Diagnostic investigation has an important role in risk stratification and clinical decision making of patients with suspected and documented Coronary Artery Disease (CAD). However, the majority of existing tools are primarily focused on the selection of gatekeeper tests, whereas only a handful of systems contain information regarding the downstream testing or treatment. We propose a multi-task deep learning model to support risk stratification and down-stream test selection for patients undergoing Coronary Computed Tomography Angiography (CCTA). The analysis included 14,021 patients who underwent CCTA between 2006 and 2017. Our novel multitask deep learning framework extends the state-of-the art Perceiver model to deal with real-world CCTA report data. Our model achieved an Area Under the receiver operating characteristic Curve (AUC) of 0.76 in CAD risk stratification, and 0.72 AUC in predicting downstream tests. Our proposed deep learning model can accurately estimate the likelihood of CAD and provide recommended downstream tests based on prior CCTA data. In clinical practice, the utilization of such an approach could bring a paradigm shift in risk stratification and downstream management. Despite significant progress using deep learning models for tabular data, they do not outperform gradient boosting decision trees, and further research is required in this area. However, neural networks appear to benefit more readily from multi-task learning than tree-based models. This could offset the shortcomings of using single task learning approach when working with tabular data.
    摘要 医学诊断检查在疑似及已确诊冠状动脉疾病(CAD)患者的风险分层和临床决策中扮演着重要角色。然而,现有大多数工具主要关注门控检查(gatekeeper test)的选择,只有少数系统包含下游检查或治疗的信息。我们提出一种多任务深度学习模型,用于支持接受冠状动脉 CT 血管造影(CCTA)患者的风险分层和下游检查选择。分析纳入了 2006 年至 2017 年间接受 CCTA 的 14,021 名患者。我们的多任务深度学习框架将最先进的 Perceiver 模型扩展到真实的 CCTA 报告数据上。该模型在 CAD 风险分层方面取得了 0.76 的受试者工作特征曲线下面积(AUC),在下游检查预测方面取得了 0.72 的 AUC。我们提出的深度学习模型能够基于既有 CCTA 数据准确估计 CAD 的可能性,并给出建议的下游检查。在临床实践中,采用这种方法有望为风险分层和下游管理带来范式转变。尽管深度学习模型在表格数据上取得了显著进展,但其表现仍未超过梯度提升决策树,这一方向需要进一步研究。不过,神经网络似乎比树模型更能从多任务学习中获益,这有望弥补单任务学习在表格数据上的不足。
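As a rough picture of the multitask setup, a shared encoder can feed two heads — one for CAD risk, one for the recommended downstream test — trained with a weighted sum of losses. The sketch below uses a small feed-forward network and made-up feature dimensions; it is not the paper's Perceiver-based model, and the loss weighting is an assumption.

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    # shared encoder with two task heads (illustrative, not the paper's architecture)
    def __init__(self, in_dim=64, hidden=128, n_tests=5):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.risk_head = nn.Linear(hidden, 1)        # binary CAD risk
        self.test_head = nn.Linear(hidden, n_tests)  # recommended downstream test

    def forward(self, x):
        z = self.shared(x)
        return self.risk_head(z).squeeze(-1), self.test_head(z)

model = MultiTaskHead()
x = torch.randn(8, 64)                     # hypothetical encoded CCTA report features
risk_y = torch.randint(0, 2, (8,)).float()
test_y = torch.randint(0, 5, (8,))
risk_logit, test_logit = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(risk_logit, risk_y) \
       + 0.5 * nn.functional.cross_entropy(test_logit, test_y)  # weighted multitask loss
loss.backward()
print(float(loss))
```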

Mi-Go: Test Framework which uses YouTube as Data Source for Evaluating Speech Recognition Models like OpenAI’s Whisper

  • paper_url: http://arxiv.org/abs/2309.00329
  • repo_url: None
  • paper_authors: Tomasz Wojnar, Jaroslaw Hryszko, Adam Roman
  • for: 评估语音识别机器学习模型在多种语言、方言、说话风格和音质水平下的性能与适应性。
  • methods: 利用 YouTube 作为数据源,涵盖多种语言、口音、方言、说话风格和音质水平,并以 OpenAI 开发的 Whisper 模型为测试对象。
  • results: 结果表明 YouTube 是评估语音识别模型稳健性、准确性和适应性的有价值测试平台;通过将机器转写与人工字幕对比,该框架还能帮助发现 YouTube 字幕被滥用(例如用于搜索引擎优化)的情况。
    Abstract This article introduces Mi-Go, a novel testing framework aimed at evaluating the performance and adaptability of general-purpose speech recognition machine learning models across diverse real-world scenarios. The framework leverages YouTube as a rich and continuously updated data source, accounting for multiple languages, accents, dialects, speaking styles, and audio quality levels. To demonstrate the effectiveness of the framework, the Whisper model, developed by OpenAI, was employed as a test object. The tests involve using a total of 124 YouTube videos to test all Whisper model versions. The results underscore the utility of YouTube as a valuable testing platform for speech recognition models, ensuring their robustness, accuracy, and adaptability to diverse languages and acoustic conditions. Additionally, by contrasting the machine-generated transcriptions against human-made subtitles, the Mi-Go framework can help pinpoint potential misuse of YouTube subtitles, like Search Engine Optimization.
    摘要 本文介绍了 Mi-Go,一种新的测试框架,旨在评估通用语音识别机器学习模型在多种真实场景下的性能与适应性。该框架利用 YouTube 作为丰富且持续更新的数据源,涵盖多种语言、口音、方言、说话风格和音质水平。为验证框架的有效性,我们以 OpenAI 开发的 Whisper 模型为测试对象,共使用 124 个 YouTube 视频测试了所有 Whisper 模型版本。结果表明,YouTube 是语音识别模型的有价值测试平台,可用于保证模型在多种语言和声学条件下的稳健性、准确性和适应性。此外,通过将机器生成的转写与人工字幕进行对比,Mi-Go 框架还能帮助发现 YouTube 字幕的潜在滥用,例如搜索引擎优化。
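The core comparison of machine transcriptions against human-made subtitles can be expressed as a word error rate; a self-contained WER routine with hypothetical strings is shown below (the Whisper invocation itself is omitted). Unusually high WER on a video can also hint that its subtitles do not match the audio, which is the subtitle-misuse signal the abstract mentions.

```python
def word_error_rate(reference, hypothesis):
    # Levenshtein distance over words, normalized by the reference length
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# hypothetical machine transcript vs. human-made subtitle line
machine = "speech recognition models are evaluated on youtube data"
subtitle = "speech recognition models are evaluated using youtube data"
print(round(word_error_rate(subtitle, machine), 3))  # 0.125: one substitution over 8 reference words
```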

Multi-fidelity reduced-order surrogate modeling

  • paper_url: http://arxiv.org/abs/2309.00325
  • repo_url: https://github.com/contipaolo/multifidelity_pod
  • paper_authors: Paolo Conti, Mengwu Guo, Andrea Manzoni, Attilio Frangi, Steven L. Brunton, J. Nathan Kutz
  • for: 这篇论文提出了一种基于多保真度神经网络的降阶代理建模方法,用于在计算预算有限、高保真度数据稀缺时,借助低保真度模型提高解的预测精度。
  • methods: 该方法首先对高保真度解快照应用本征正交分解(POD)生成空间基,然后使用多保真度长短期记忆(LSTM)网络来近似降阶状态(即 POD 基的时变展开系数)的动力学行为。
  • results: 该方法能够有效捕捉低保真度模型难以刻画的失稳与瞬态过程,并以非侵入的方式在时间和参数变化下重建完整的解场。
    Abstract High-fidelity numerical simulations of partial differential equations (PDEs) given a restricted computational budget can significantly limit the number of parameter configurations considered and/or time window evaluated for modeling a given system. Multi-fidelity surrogate modeling aims to leverage less accurate, lower-fidelity models that are computationally inexpensive in order to enhance predictive accuracy when high-fidelity data are limited or scarce. However, low-fidelity models, while often displaying important qualitative spatio-temporal features, fail to accurately capture the onset of instability and critical transients observed in the high-fidelity models, making them impractical as surrogate models. To address this shortcoming, we present a new data-driven strategy that combines dimensionality reduction with multi-fidelity neural network surrogates. The key idea is to generate a spatial basis by applying the classical proper orthogonal decomposition (POD) to high-fidelity solution snapshots, and approximate the dynamics of the reduced states - time-parameter-dependent expansion coefficients of the POD basis - using a multi-fidelity long-short term memory (LSTM) network. By mapping low-fidelity reduced states to their high-fidelity counterpart, the proposed reduced-order surrogate model enables the efficient recovery of full solution fields over time and parameter variations in a non-intrusive manner. The generality and robustness of this method is demonstrated by a collection of parametrized, time-dependent PDE problems where the low-fidelity model can be defined by coarser meshes and/or time stepping, as well as by misspecified physical features. Importantly, the onset of instabilities and transients are well captured by this surrogate modeling technique.
    摘要 在计算预算受限的情况下,偏微分方程(PDE)的高保真度数值模拟会显著限制可考虑的参数配置数量和可评估的时间窗口。多保真度代理建模旨在利用计算廉价但精度较低的低保真度模型,在高保真度数据有限或稀缺时提升预测精度。然而,低保真度模型虽然往往能展现重要的时空定性特征,却无法准确捕捉高保真度模型中出现的失稳起始与关键瞬态,因而难以直接作为代理模型使用。为解决这个缺点,我们提出了一种新的数据驱动策略,将降维与多保真度神经网络代理相结合。其核心思想是:对高保真度解快照应用经典的本征正交分解(POD)生成空间基,并使用多保真度长短期记忆(LSTM)网络来近似降阶状态(即 POD 基的随时间和参数变化的展开系数)的动力学行为。通过将低保真度降阶状态映射到对应的高保真度状态,所提出的降阶代理模型能够以非侵入的方式在时间和参数变化下高效重建完整解场。我们在一系列参数化、时间依赖的 PDE 问题上验证了该方法的通用性与稳健性,其中低保真度模型可以由更粗的网格、更大的时间步长或未准确刻画的物理特征来定义。重要的是,该代理建模技术能够很好地捕捉失稳起始与瞬态过程。
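The first stage — extracting a POD basis from high-fidelity snapshots and projecting onto it to obtain the reduced states that the multi-fidelity LSTM would then model — can be written in a few lines of linear algebra. The snapshot matrix below is synthetic and the LSTM part is omitted.

```python
import numpy as np

def pod_basis(snapshots, rank):
    # POD modes of a snapshot matrix (columns = high-fidelity solution snapshots)
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return U[:, :rank], energy[rank - 1]

# synthetic high-fidelity snapshots: 2000 spatial dofs x 50 time steps
x = np.linspace(0, 1, 2000)[:, None]
t = np.linspace(0, 1, 50)[None, :]
snapshots = np.sin(2 * np.pi * (x - t)) + 0.3 * np.sin(6 * np.pi * (x + 0.5 * t))

Phi, captured = pod_basis(snapshots, rank=4)
reduced_states = Phi.T @ snapshots            # time-dependent expansion coefficients
reconstruction = Phi @ reduced_states         # lift reduced states back to the full field
print(reduced_states.shape, f"energy captured: {captured:.4f}")
```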

SortedNet, a Place for Every Network and Every Network in its Place: Towards a Generalized Solution for Training Many-in-One Neural Networks

  • paper_url: http://arxiv.org/abs/2309.00255
  • repo_url: None
  • paper_authors: Mojtaba Valipour, Mehdi Rezagholizadeh, Hossein Rajabzadeh, Marzieh Tahaei, Boxing Chen, Ali Ghodsi
  • for: 这篇论文旨在提出一种通用且可扩展的方法,在内存与计算资源受限的情况下,高效训练可动态调整计算量的深度学习模型。
  • methods: 本论文提出的 sorted training 方法具有以下特点:(1)采用嵌套架构,将共享参数的多个子网络与主模型一同训练;(2)训练过程中以排序、概率化的方式随机采样子网络,并结合梯度累积以提高训练效率;(3)排序训练使推理时无需搜索即可选择子网络,嵌套结构也带来极小的存储开销和高效的子网络切换。
  • results: 实验结果显示,sorted training 方法能高效训练深度学习模型,并优于以往的动态训练方法。具体而言,该方法可同时训练 160 个不同的子网络,并保持约 96% 的模型性能。
    Abstract As the size of deep learning models continues to grow, finding optimal models under memory and computation constraints becomes increasingly more important. Although usually the architecture and constituent building blocks of neural networks allow them to be used in a modular way, their training process is not aware of this modularity. Consequently, conventional neural network training lacks the flexibility to adapt the computational load of the model during inference. This paper proposes SortedNet, a generalized and scalable solution to harness the inherent modularity of deep neural networks across various dimensions for efficient dynamic inference. Our training considers a nested architecture for the sub-models with shared parameters and trains them together with the main model in a sorted and probabilistic manner. This sorted training of sub-networks enables us to scale the number of sub-networks to hundreds using a single round of training. We utilize a novel updating scheme during training that combines random sampling of sub-networks with gradient accumulation to improve training efficiency. Furthermore, the sorted nature of our training leads to a search-free sub-network selection at inference time; and the nested architecture of the resulting sub-networks leads to minimal storage requirement and efficient switching between sub-networks at inference. Our general dynamic training approach is demonstrated across various architectures and tasks, including large language models and pre-trained vision models. Experimental results show the efficacy of the proposed approach in achieving efficient sub-networks while outperforming state-of-the-art dynamic training approaches. Our findings demonstrate the feasibility of training up to 160 different sub-models simultaneously, showcasing the extensive scalability of our proposed method while maintaining 96% of the model performance.
    摘要 随着深度学习模型的规模不断增长,在内存和计算限制下找到最佳模型变得越来越重要。尽管神经网络的架构和组成模块允许它们以模块化方式使用,但其训练过程并不感知这种模块性。因此,传统的神经网络训练缺乏在推理时调整模型计算负荷的灵活性。这篇论文提出了 SortedNet,一种通用且可扩展的解决方案,以便在多个维度上利用深度神经网络的内在模块性进行高效的动态推理。我们的训练方法采用嵌套架构,子网络与主模型共享参数,并以排序且概率化的方式共同训练。这种排序训练方法使得我们可以在单一轮训练中将子网络数量扩展到数百个。我们还提出了一种新的训练更新方法,将子网络的随机采样与梯度累积相结合,以提高训练效率。此外,排序训练使得推理时无需搜索即可选择子网络;嵌套架构也带来最小的存储需求和高效的子网络切换。我们的通用动态训练方法在多种架构和任务上进行了验证,包括大语言模型和预训练视觉模型。实验结果表明,所提方法能够高效地得到高性能的子网络,并超越当前最先进的动态训练方法。我们的发现表明可以同时训练多达 160 个不同的子模型,在保持 96% 模型性能的同时展示了所提方法的高度可扩展性。
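One way to picture sorted/nested training is that every sub-model is a leading slice of the same shared parameters, and each optimization step samples a few widths and accumulates their gradients into those shared tensors. The PyTorch sketch below does this for a single linear layer with made-up widths; it is a toy illustration, not the paper's full scheme.

```python
import torch

torch.manual_seed(0)
max_width, in_dim = 64, 16
W = torch.randn(max_width, in_dim, requires_grad=True)   # shared "many-in-one" weights
head = torch.randn(1, max_width, requires_grad=True)
opt = torch.optim.SGD([W, head], lr=0.05)

def forward(x, width):
    # a sub-network of a given width is just the leading slice of the shared parameters
    h = torch.relu(x @ W[:width].T)
    return h @ head[:, :width].T

x, y = torch.randn(32, in_dim), torch.randn(32, 1)
widths = [8, 16, 32, 64]   # nested sub-models, smallest to largest

for step in range(100):
    opt.zero_grad()
    for i in torch.randint(0, len(widths), (2,)).tolist():   # sample a couple of sub-networks
        loss = torch.nn.functional.mse_loss(forward(x, widths[i]), y)
        loss.backward()   # gradients from every sampled width accumulate into W and head
    opt.step()

print("full-width loss:", float(torch.nn.functional.mse_loss(forward(x, max_width), y)))
```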

Data-Driven Projection for Reducing Dimensionality of Linear Programs: Generalization Bound and Learning Methods

  • paper_url: http://arxiv.org/abs/2309.00203
  • repo_url: None
  • paper_authors: Shinsaku Sakaue, Taihei Oki
  • for: 这个论文研究了一种基于数据的高维线性 програм(LP)解决方法。给定过去的 $n$-维LP数据,我们学习了一个 $n\times k$ 的投影矩阵 ($n > k$),将高维问题降维到低维问题。然后,我们通过解决低维LP问题并通过投影矩阵恢复高维解决方案。这种方法可以与任何用户喜欢的LP解决方法结合使用,因此可以快速解决LP问题。
  • methods: 我们提出了一种基于数据驱动的LP解决方法,包括使用PCA和梯度下降两种方法学习投影矩阵。PCA方法简单效率高,而梯度下降方法可能会提供更高质量的解决方案。
  • results: 我们的实验表明,学习投影矩阵可以快速和精准地解决LP问题。具体来说,我们可以在减少LP解决时间的同时保持高质量解决方案。此外,我们还发现在某些情况下,使用梯度下降方法可以提供更高质量的解决方案。
    Abstract This paper studies a simple data-driven approach to high-dimensional linear programs (LPs). Given data of past $n$-dimensional LPs, we learn an $n\times k$ \textit{projection matrix} ($n > k$), which reduces the dimensionality from $n$ to $k$. Then, we address future LP instances by solving $k$-dimensional LPs and recovering $n$-dimensional solutions by multiplying the projection matrix. This idea is compatible with any user-preferred LP solvers, hence a versatile approach to faster LP solving. One natural question is: how much data is sufficient to ensure the recovered solutions' quality? We address this question based on the idea of \textit{data-driven algorithm design}, which relates the amount of data sufficient for generalization guarantees to the \textit{pseudo-dimension} of performance metrics. We present an $\tilde{\mathrm{O}}(nk^2)$ upper bound on the pseudo-dimension ($\tilde{\mathrm{O}}$ compresses logarithmic factors) and complement it by an $\Omega(nk)$ lower bound, hence tight up to an $\tilde{\mathrm{O}}(k)$ factor. On the practical side, we study two natural methods for learning projection matrices: PCA- and gradient-based methods. While the former is simple and efficient, the latter sometimes leads to better solution quality. Experiments confirm that learned projection matrices are beneficial for reducing the time for solving LPs while maintaining high solution quality.
    摘要 A natural question is how much data is needed to ensure the quality of the recovered solutions. We address this using the idea of data-driven algorithm design, which relates the amount of data needed for generalization guarantees to the pseudo-dimension of performance metrics. We provide an $\tilde{\mathrm{O}}(nk^2)$ upper bound on the pseudo-dimension and complement it with an $\Omega(nk)$ lower bound, giving a tight bound up to an $\tilde{\mathrm{O}}(k)$ factor. In practice, we examine two methods for learning projection matrices: principal component analysis (PCA)-based and gradient-based methods. While PCA-based methods are simple and efficient, gradient-based methods sometimes lead to better solution quality. Our experiments show that learned projection matrices can significantly reduce the time required to solve LPs while maintaining high solution quality.
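The PCA variant of the approach can be sketched directly: build the projection from past optimal solutions, solve the reduced LP $\min_y (P^\top c)^\top y$ subject to $(AP) y \le b$, and recover $x = Py$. The instance below is synthetic, and a generous box on $y$ is added only to keep the toy reduced LP bounded.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, k, m = 200, 10, 80   # original dimension, reduced dimension, number of constraints

# synthetic "past" optimal solutions concentrated in a low-dimensional subspace
basis = rng.normal(size=(n, k))
past_solutions = basis @ rng.normal(size=(k, 50))
P, _, _ = np.linalg.svd(past_solutions, full_matrices=False)
P = P[:, :k]   # learned n x k projection (uncentered PCA of past solutions)

# a new LP instance:  min c^T x  subject to  A x <= b
c, A = rng.normal(size=n), rng.normal(size=(m, n))
b = A @ (basis @ rng.normal(size=k)) + 1.0   # guarantees a feasible point in span(P)

# reduced k-dimensional LP:  min (P^T c)^T y  s.t.  (A P) y <= b,  then recover  x = P y
res = linprog(P.T @ c, A_ub=A @ P, b_ub=b,
              bounds=[(-1e3, 1e3)] * k,   # generous box just to keep the toy LP bounded
              method="highs")
x_recovered = P @ res.x
print("reduced LP solved:", res.status == 0,
      "| feasible in original LP:", bool(np.all(A @ x_recovered <= b + 1e-6)))
```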

Deep-learning-based Early Fixing for Gas-lifted Oil Production Optimization: Supervised and Weakly-supervised Approaches

  • paper_url: http://arxiv.org/abs/2309.00197
  • repo_url: None
  • paper_authors: Bruno Machado Pacheco, Laio Oriel Seman, Eduardo Camponogara
  • for: 提高气举(gas-lifted)油井的产油量,需要求解混合整数线性规划(Mixed-Integer Linear Programs, MILPs)。
  • methods: 提出一种基于深度学习模型的定制化启发式策略:模型根据变化的油井参数给出全部整数变量的取值,提前固定(early-fixing)这些整数变量,从而将原问题化简为线性规划(LP)。论文给出两种训练该学习启发式的方式:一种是监督学习方法,需要训练集中包含原问题多个实例的最优整数取值;另一种是弱监督学习方法,只需对整数变量随机赋值后的提前固定线性问题求解即可。
  • results: 对比结果表明,学习得到的启发式可将运行时间减少 71.11%;弱监督模型尽管在训练中从未见过最优整数取值,仍能为提前固定提供有效的整数赋值。
    Abstract Maximizing oil production from gas-lifted oil wells entails solving Mixed-Integer Linear Programs (MILPs). As the parameters of the wells, such as the basic-sediment-to-water ratio and the gas-oil ratio, are updated, the problems must be repeatedly solved. Instead of relying on costly exact methods or the accuracy of general approximate methods, in this paper, we propose a tailor-made heuristic solution based on deep learning models trained to provide values to all integer variables given varying well parameters, early-fixing the integer variables and, thus, reducing the original problem to a linear program (LP). We propose two approaches for developing the learning-based heuristic: a supervised learning approach, which requires the optimal integer values for several instances of the original problem in the training set, and a weakly-supervised learning approach, which requires only solutions for the early-fixed linear problems with random assignments for the integer variables. Our results show a runtime reduction of 71.11%. Furthermore, the weakly-supervised learning model provided significant values for early fixing, despite never seeing the optimal values during training.
    摘要 最大化气举(gas-lifted)油井的产油量需要求解混合整数线性规划(MILP)问题。随着油井参数(如基本沉积物与水比、气油比)的更新,这些问题必须被反复求解。在这篇论文中,我们不依赖昂贵的精确方法或一般近似方法的精度,而是提出一种定制的启发式解法:深度学习模型根据变化的油井参数给出所有整数变量的取值,提前固定这些整数变量,从而将原问题化简为线性规划(LP)。我们提出了两种训练该学习启发式的方法:一种是监督学习方法,需要训练集中包含原问题多个实例的最优整数取值;另一种是弱监督学习方法,只需求解对整数变量随机赋值后的提前固定线性问题。我们的结果显示,使用学习模型可将运行时间减少 71.11%。此外,弱监督学习模型尽管在训练过程中从未见过最优整数取值,仍能为提前固定提供有效的整数赋值。
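The early-fixing idea itself is easy to sketch: a predictor assigns values to the binary variables, those values are substituted into the constraints, and the remaining continuous problem is solved as an ordinary LP. Below, the "predictor" is a random stand-in and the MILP is generic, so the example shows the mechanics only, not the gas-lift formulation or the paper's trained network.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_cont, n_bin, m = 30, 6, 20

# a generic MILP:  min c^T x + q^T z  s.t.  A x + B z <= b,  0 <= x <= 10,  z in {0, 1}
c, q = rng.normal(size=n_cont), rng.normal(size=n_bin)
A, B = rng.normal(size=(m, n_cont)), rng.normal(size=(m, n_bin))
b = A @ rng.uniform(0, 1, n_cont) + B @ rng.integers(0, 2, n_bin) + 1.0

def predict_binaries(well_params):
    # stand-in for the learned model mapping well parameters to integer assignments
    logits = well_params @ rng.normal(size=(well_params.shape[-1], n_bin))
    return (logits > 0).astype(float)

z_fixed = predict_binaries(rng.normal(size=8))   # early-fix every integer variable
# with z fixed, what remains is a plain LP over the continuous variables
res = linprog(c, A_ub=A, b_ub=b - B @ z_fixed,
              bounds=[(0, 10)] * n_cont, method="highs")
obj = None if res.x is None else float(c @ res.x + q @ z_fixed)
print("LP feasible under the fixed assignment:", res.status == 0, "| objective:", obj)
```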