cs.LG - 2023-11-26

Frobenius-Type Norms and Inner Products of Matrices and Linear Maps with Applications to Neural Network Training

  • paper_url: http://arxiv.org/abs/2311.15419
  • repo_url: None
  • paper_authors: Roland Herzog, Frederik Köhne, Leonie Kreis, Anton Schiela
  • for: This paper addresses the use of the Frobenius norm and inner product in neural network training.
  • methods: The Frobenius inner product is typically used to evaluate gradients with respect to matrix variables; the paper introduces a family of more general Frobenius-type norms and inner products that subsumes the classical Frobenius norm (a small numerical sketch follows the abstract below).
  • results: The paper establishes how Frobenius-type norms and inner products of linear maps depend on the inner products chosen in the domain and co-domain spaces, and uses the resulting extra freedom to precondition neural network training.
    Abstract The Frobenius norm is a frequent choice of norm for matrices. In particular, the underlying Frobenius inner product is typically used to evaluate the gradient of an objective with respect to a matrix variable, such as those occurring in the training of neural networks. We provide a broader view on the Frobenius norm and inner product for linear maps or matrices, and establish their dependence on inner products in the domain and co-domain spaces. This shows that the classical Frobenius norm is merely one special element of a family of more general Frobenius-type norms. The significant extra freedom furnished by this realization can be used, among other things, to precondition neural network training.
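The abstract leaves the concrete form of the generalized inner product to the paper; one natural construction, assumed here only for illustration, equips the domain and co-domain with symmetric positive-definite matrices $M$ and $N$ and sets $\langle A, B\rangle_{M,N} = \mathrm{tr}(M^{-1} A^{\top} N B)$, which reduces to the classical Frobenius inner product when $M = N = I$. A minimal NumPy sketch:

```python
import numpy as np

def frobenius_type_inner(A, B, M, N):
    """Inner product of linear maps A, B : (R^n, M) -> (R^m, N).

    M (n x n) and N (m x m) are symmetric positive-definite matrices defining
    the domain and co-domain inner products.  With M = N = I this reduces to
    the classical Frobenius inner product tr(A^T B).
    """
    return np.trace(np.linalg.solve(M, A.T) @ N @ B)

rng = np.random.default_rng(0)
m, n = 4, 3
A, B = rng.standard_normal((m, n)), rng.standard_normal((m, n))

# Classical case: identity inner products in domain and co-domain.
assert np.isclose(frobenius_type_inner(A, B, np.eye(n), np.eye(m)),
                  np.trace(A.T @ B))

# A non-trivial choice of domain/co-domain inner products.
M = np.diag([1.0, 2.0, 0.5])            # domain metric (n x n)
N = np.eye(m) + 0.1 * np.ones((m, m))   # co-domain metric (m x m), SPD
print("weighted Frobenius-type inner product:", frobenius_type_inner(A, B, M, N))
print("induced norm of A:", np.sqrt(frobenius_type_inner(A, A, M, N)))
```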

Applying statistical learning theory to deep learning

  • paper_url: http://arxiv.org/abs/2311.15404
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Cédric Gerbelot, Avetik Karagulyan, Stefani Karp, Kavya Ravichandran, Menachem Stern, Nathan Srebro
  • for: These lectures provide an overview of deep learning from a learning-theory perspective, in particular how different architectures may lead to inductive bias when trained with gradient-based methods.
  • methods: The lectures use the notion of implicit bias to describe the behavior of gradient-based training, and present the mirror descent algorithm as a way of moving back and forth between the parameter space and the corresponding function space, with the geometry of the learning problem represented by a metric tensor (a minimal mirror-descent sketch follows the abstract below).
  • results: A detailed study of gradient descent on linear diagonal networks shows that the loss function, the scale of the parameters at initialization, and the depth of the network lead to different forms of implicit bias, in particular a transition between kernel and feature learning.
    Abstract Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient based methods. The goal of these lectures is to provide an overview of some of the main questions that arise when attempting to understand deep learning from a learning theory perspective. After a brief reminder on statistical learning theory and stochastic optimization, we discuss implicit bias in the context of benign overfitting. We then move to a general description of the mirror descent algorithm, showing how we may go back and forth between a parameter space and the corresponding function space for a given learning problem, as well as how the geometry of the learning problem may be represented by a metric tensor. Building on this framework, we provide a detailed study of the implicit bias of gradient descent on linear diagonal networks for various regression tasks, showing how the loss function, scale of parameters at initialization and depth of the network may lead to various forms of implicit bias, in particular transitioning between kernel or feature learning.
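The mirror descent algorithm described in the abstract can be illustrated with the textbook negative-entropy mirror map on the probability simplex (exponentiated gradient); the toy problem below is illustrative only and is not the lectures' specific setting.

```python
import numpy as np

def mirror_descent_simplex(grad, x0, lr=0.2, steps=500):
    """Mirror descent with the negative-entropy mirror map psi(x) = sum_i x_i log x_i.

    The update x_{t+1} = argmin_x  lr*<g_t, x> + D_psi(x, x_t) over the
    probability simplex has the exponentiated-gradient closed form:
    x_{t+1} proportional to x_t * exp(-lr * g_t).
    """
    x = x0.copy()
    for _ in range(steps):
        g = grad(x)
        x = x * np.exp(-lr * g)   # step taken in the dual ("mirror") space
        x = x / x.sum()           # map back to the simplex (primal space)
    return x

# Toy problem: minimize f(x) = 0.5 * ||A x - b||^2 over the probability simplex.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5)) / np.sqrt(20)
target = np.array([0.1, 0.0, 0.6, 0.3, 0.0])   # a mixture on the simplex
b = A @ target
grad = lambda x: A.T @ (A @ x - b)

x_hat = mirror_descent_simplex(grad, x0=np.ones(5) / 5)
print("iterate approaches the target mixture:", np.round(x_hat, 3))
```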

Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

  • paper_url: http://arxiv.org/abs/2311.15390
  • repo_url: None
  • paper_authors: Zhihang Li, Zhao Song, Zifan Wang, Junze Yin
  • for: The paper analyzes a two-layer regression problem in which the first layer is activated by a softmax unit, taking a step toward the analysis of regression models built from softmax-based activations.
  • methods: The model is trained with an approximate Newton method that minimizes the regularized training loss; the Hessian is shown to be positive definite and Lipschitz continuous under certain assumptions (a generic approximate-Newton sketch follows the abstract below).
  • results: The paper establishes local convergence guarantees: with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, the algorithm finds an $\epsilon$-approximate minimizer of the training loss with high probability, with each iteration taking approximately $O(\mathrm{nnz}(C) + d^\omega)$ time.
    Abstract There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation, sentiment analysis, and question-answering. The accomplishments of LLMs have led to a substantial increase in research efforts in this domain. One specific two-layer regression problem has been well-studied in prior works, where the first layer is activated by a ReLU unit, and the second layer is activated by a softmax unit. While previous works provide a solid analysis of building a two-layer regression, there is still a gap in the analysis of constructing regression problems with more than two layers. In this paper, we take a crucial step toward addressing this problem: we provide an analysis of a two-layer regression problem. In contrast to previous works, our first layer is activated by a softmax unit. This sets the stage for future analyses of creating more activation functions based on the softmax function. Rearranging the softmax function leads to significantly different analyses. Our main results involve analyzing the convergence properties of an approximate Newton method used to minimize the regularized training loss. We prove that the loss function for the Hessian matrix is positive definite and Lipschitz continuous under certain assumptions. This enables us to establish local convergence guarantees for the proposed training algorithm. Specifically, with an appropriate initialization and after $O(\log(1/\epsilon))$ iterations, our algorithm can find an $\epsilon$-approximate minimizer of the training loss with high probability. Each iteration requires approximately $O(\mathrm{nnz}(C) + d^\omega)$ time, where $d$ is the model size, $C$ is the input matrix, and $\omega < 2.374$ is the matrix multiplication exponent.
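The paper's model (softmax first layer) and its exact approximate Newton scheme are not reproduced here; as a generic stand-in, the sketch below runs an approximate Newton iteration, with a subsampled Hessian plus an $\ell_2$ regularization term that keeps it positive definite, on regularized logistic regression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approx_newton_logreg(X, y, lam=0.1, hess_frac=0.3, iters=20, seed=0):
    """Approximate Newton for L2-regularized logistic regression.

    The gradient uses all n rows; the Hessian is approximated from a random
    subsample of rows (plus lam*I, which keeps it positive definite),
    mimicking the idea of cheap approximate Newton steps.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n + lam * w
        idx = rng.choice(n, size=max(d, int(hess_frac * n)), replace=False)
        Xs, ps = X[idx], sigmoid(X[idx] @ w)
        H = Xs.T @ (Xs * (ps * (1 - ps))[:, None]) / len(idx) + lam * np.eye(d)
        w -= np.linalg.solve(H, grad)   # (approximate) Newton step
        if t % 5 == 0:
            print(f"iter {t:2d}  ||grad|| = {np.linalg.norm(grad):.2e}")
        if np.linalg.norm(grad) < 1e-10:
            break
    return w

rng = np.random.default_rng(1)
n, d = 500, 10
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (sigmoid(X @ w_true) > rng.random(n)).astype(float)
w_hat = approx_newton_logreg(X, y)
```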

Spectro-ViT: A Vision Transformer Model for GABA-edited MRS Reconstruction Using Spectrograms

  • paper_url: http://arxiv.org/abs/2311.15386
  • repo_url: None
  • paper_authors: Gabriel Dias, Rodrigo Pommot Berto, Mateus Oliveira, Lucas Ueda, Sergio Dertkigil, Paula D. P. Costa, Amirmohammad Shamaei, Roberto Souza, Ashley Harris, Leticia Rittner
  • for: To use a Vision Transformer (ViT) to reconstruct/denoise GABA-edited magnetic resonance spectroscopy (MRS) data from a quarter of the typically acquired number of transients.
  • methods: The GABA-edited MRS transients are converted to a spectrogram image representation with the Short-Time Fourier Transform (STFT), which allows a pre-trained ViT to be adapted for reconstructing GABA-edited spectra (Spectro-ViT); the model is fine-tuned and tested on in vivo GABA-edited MRS data (a toy STFT sketch follows the abstract below).
  • results: Spectro-ViT outperformed all compared models on four out of five quantitative metrics (mean squared error, shape score, GABA+/water fit error, and full width at half maximum), and the estimated metabolite concentrations were consistent with those obtained from GABA-edited MRS scans reconstructed with the full number of transients.
    Abstract Purpose: To investigate the use of a Vision Transformer (ViT) to reconstruct/denoise GABA-edited magnetic resonance spectroscopy (MRS) from a quarter of the typically acquired number of transients using spectrograms. Theory and Methods: A quarter of the typically acquired number of transients collected in GABA-edited MRS scans are pre-processed and converted to a spectrogram image representation using the Short-Time Fourier Transform (STFT). The image representation of the data allows the adaptation of a pre-trained ViT for reconstructing GABA-edited MRS spectra (Spectro-ViT). The Spectro-ViT is fine-tuned and then tested using \textit{in vivo} GABA-edited MRS data. The Spectro-ViT performance is compared against other models in the literature using spectral quality metrics and estimated metabolite concentration values. Results: The Spectro-ViT model significantly outperformed all other models in four out of five quantitative metrics (mean squared error, shape score, GABA+/water fit error, and full width at half maximum). The metabolite concentrations estimated (GABA+/water, GABA+/Cr, and Glx/water) were consistent with the metabolite concentrations estimated using typical GABA-edited MRS scans reconstructed with the full amount of typically collected transients. Conclusion: The proposed Spectro-ViT model achieved state-of-the-art results in reconstructing GABA-edited MRS, and the results indicate these scans could be up to four times faster.
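The STFT step that turns a one-dimensional transient into a spectrogram image can be sketched as follows; the synthetic decaying-sinusoid signal and the STFT parameters are illustrative assumptions, not the paper's acquisition settings.

```python
import numpy as np
from scipy.signal import stft

# Synthetic stand-in for a single MRS transient (FID-like signal): two
# exponentially decaying sinusoids plus noise.  Parameters are illustrative.
fs = 2000.0                         # sampling rate in Hz (assumed)
t = np.arange(0, 1.0, 1.0 / fs)
fid = (np.exp(-t / 0.25) * np.cos(2 * np.pi * 120 * t)
       + 0.5 * np.exp(-t / 0.10) * np.cos(2 * np.pi * 300 * t)
       + 0.05 * np.random.default_rng(0).standard_normal(t.size))

# Short-Time Fourier Transform -> complex time-frequency matrix Zxx.
f, tau, Zxx = stft(fid, fs=fs, nperseg=128, noverlap=96)

# Magnitude spectrogram: the "image" a ViT could consume after resizing
# and stacking channels (e.g. real/imaginary or magnitude/phase).
spectrogram = np.abs(Zxx)
print("frequency bins x time frames:", spectrogram.shape)
```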

Robust and Automatic Data Clustering: Dirichlet Process meets Median-of-Means

  • paper_url: http://arxiv.org/abs/2311.15384
  • repo_url: None
  • paper_authors: Supratik Basu, Jyotishka Ray Choudhury, Debolina Paul, Swagatam Das
  • for: To propose an efficient and automatic clustering technique that is robust to noise and outliers and does not require the number of clusters to be specified in advance.
  • methods: The method integrates principles of model-based (Bayesian nonparametric) and centroid-based clustering, using the Median-of-Means (MoM) estimator to mitigate the effect of noise and outliers on clustering quality (a toy MoM sketch follows the abstract below).
  • results: Statistical guarantees on an upper bound of the clustering error, together with rigorous assessment on simulated and real datasets, indicate advantages of the proposed method over existing state-of-the-art clustering algorithms.
    Abstract Clustering stands as one of the most prominent challenges within the realm of unsupervised machine learning. Among the array of centroid-based clustering algorithms, the classic $k$-means algorithm, rooted in Lloyd's heuristic, takes center stage as one of the extensively employed techniques in the literature. Nonetheless, both $k$-means and its variants grapple with noteworthy limitations. These encompass a heavy reliance on initial cluster centroids, susceptibility to converging into local minima of the objective function, and sensitivity to outliers and noise in the data. When confronted with data containing noisy or outlier-laden observations, the Median-of-Means (MoM) estimator emerges as a stabilizing force for any centroid-based clustering framework. On a different note, a prevalent constraint among existing clustering methodologies resides in the prerequisite knowledge of the number of clusters prior to analysis. Utilizing model-based methodologies, such as Bayesian nonparametric models, offers the advantage of infinite mixture models, thereby circumventing the need for such requirements. Motivated by these facts, in this article, we present an efficient and automatic clustering technique by integrating the principles of model-based and centroid-based methodologies that mitigates the effect of noise on the quality of clustering while ensuring that the number of clusters need not be specified in advance. Statistical guarantees on the upper bound of clustering error, and rigorous assessment through simulated and real datasets suggest the advantages of our proposed method over existing state-of-the-art clustering algorithms.
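The Median-of-Means idea the paper builds on can be sketched as: split the points into blocks, average each block, and take the coordinate-wise median of the block means. The toy comparison below against the plain mean under a few outliers is only illustrative.

```python
import numpy as np

def median_of_means(X, n_blocks=10, seed=None):
    """Coordinate-wise median of block means (a simple MoM variant)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    blocks = np.array_split(idx, n_blocks)
    block_means = np.stack([X[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

rng = np.random.default_rng(0)
clean = rng.normal(loc=[2.0, -1.0], scale=0.5, size=(300, 2))
outliers = rng.normal(loc=[50.0, 50.0], scale=1.0, size=(5, 2))
X = np.vstack([clean, outliers])

print("true centre     :", [2.0, -1.0])
print("plain mean      :", np.round(X.mean(axis=0), 2))        # pulled by outliers
print("median-of-means :", np.round(median_of_means(X, n_blocks=15, seed=1), 2))
```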

Evaluating Multi-Global Server Architecture for Federated Learning

  • paper_url: http://arxiv.org/abs/2311.15382
  • repo_url: None
  • paper_authors: Asfia Kawnine, Hung Cao, Atah Nuh Mih, Monica Wachowicz
  • for: To improve the reliability and efficiency of federated learning systems for collaborative learning and model training across many devices.
  • methods: The paper proposes a federated learning framework built on multiple global servers, aiming to exploit local collaboration and knowledge aggregation and to handle the communication failures that can bring down a single-server architecture (a toy aggregation sketch follows the abstract below).
  • results: Experiments on electric vehicle charging event data show that the difference in model performance attributable to the multi-server architecture is less than 1%, while the added rule for handling communication challenges addresses the error-tolerance issue of the single-server setup.
    Abstract Federated learning (FL) with a single global server framework is currently a popular approach for training machine learning models on decentralized environment, such as mobile devices and edge devices. However, the centralized server architecture poses a risk as any challenge on the central/global server would result in the failure of the entire system. To minimize this risk, we propose a novel federated learning framework that leverages the deployment of multiple global servers. We posit that implementing multiple global servers in federated learning can enhance efficiency by capitalizing on local collaborations and aggregating knowledge, and the error tolerance in regard to communication failure in the single server framework would be handled. We therefore propose a novel framework that leverages the deployment of multiple global servers. We conducted a series of experiments using a dataset containing the event history of electric vehicle (EV) charging at numerous stations. We deployed a federated learning setup with multiple global servers and client servers, where each client-server strategically represented a different region and a global server was responsible for aggregating local updates from those devices. Our preliminary results of the global models demonstrate that the difference in performance attributed to multiple servers is less than 1%. While the hypothesis of enhanced model efficiency was not as expected, the rule for handling communication challenges added to the algorithm could resolve the error tolerance issue. Future research can focus on identifying specific uses for the deployment of multiple global servers.
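The exact aggregation topology is not spelled out in the abstract; the sketch below assumes one natural reading, in which each global server averages the updates of its regional clients (FedAvg) and the global servers then reconcile by a size-weighted average among themselves.

```python
import numpy as np

def fedavg(weights, sizes):
    """Weighted average of client model vectors (FedAvg-style aggregation)."""
    sizes = np.asarray(sizes, dtype=float)
    return np.average(np.stack(weights), axis=0, weights=sizes)

rng = np.random.default_rng(0)
d = 8                                      # toy model dimension
# Three regions (one global server each), a few (model, dataset size) clients per region.
regions = {
    "region_A": [(rng.standard_normal(d), 120), (rng.standard_normal(d), 80)],
    "region_B": [(rng.standard_normal(d), 200), (rng.standard_normal(d), 60)],
    "region_C": [(rng.standard_normal(d), 150)],
}

# Step 1: each global server aggregates its own regional clients.
server_models, server_sizes = [], []
for name, clients in regions.items():
    w = fedavg([c[0] for c in clients], [c[1] for c in clients])
    server_models.append(w)
    server_sizes.append(sum(c[1] for c in clients))

# Step 2: the global servers reconcile among themselves (again size-weighted),
# so the loss of any single server does not take down the whole system.
global_model = fedavg(server_models, server_sizes)
print("reconciled global model:", np.round(global_model, 3))
```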

Untargeted Code Authorship Evasion with Seq2Seq Transformation

  • paper_url: http://arxiv.org/abs/2311.15366
  • repo_url: None
  • paper_authors: Soohyeon Choi, Rhongho Jang, DaeHun Nyang, David Mohaisen
  • for: To present an untargeted code authorship evasion technique that obfuscates the stylistic features used by code authorship attribution methods.
  • methods: The proposed SCAE customizes StructCoder, a Seq2Seq code transformer originally designed for function-level code translation (e.g., Java to C#), using transfer learning to transform code while preserving its functionality.
  • results: SCAE improves efficiency at a slight accuracy degradation compared to existing work, reducing processing time by about 68% while maintaining an 85% transformation success rate and up to a 95.77% evasion success rate in the untargeted setting.
    Abstract Code authorship attribution is the problem of identifying authors of programming language codes through the stylistic features in their codes, a topic that recently witnessed significant interest with outstanding performance. In this work, we present SCAE, a code authorship obfuscation technique that leverages a Seq2Seq code transformer called StructCoder. SCAE customizes StructCoder, a system designed initially for function-level code translation from one language to another (e.g., Java to C#), using transfer learning. SCAE improved the efficiency at a slight accuracy degradation compared to existing work. We also reduced the processing time by about 68% while maintaining an 85% transformation success rate and up to 95.77% evasion success rate in the untargeted setting.

A Convergence result of a continuous model of deep learning via Łojasiewicz–Simon inequality

  • paper_url: http://arxiv.org/abs/2311.15365
  • repo_url: None
  • paper_authors: Noboru Isobe
  • for: The study concerns a Wasserstein-type gradient flow that represents the optimization process of a continuous model of a deep neural network (DNN).
  • methods: The authors first establish the existence of a minimizer of the model's average loss under $L^2$-regularization and then show the existence of a curve of maximal slope of the loss; the main convergence result relies on a \L{}ojasiewicz--Simon gradient inequality, derived by assuming analyticity of the networks and loss functions (the standard form of the inequality is recalled after the abstract below).
  • results: The flow converges to a critical point of the loss as time goes to infinity, and the proofs offer a new approach for analyzing the asymptotic behavior of Wasserstein-type gradient flows for nonconvex functionals.
    Abstract This study focuses on a Wasserstein-type gradient flow, which represents an optimization process of a continuous model of a Deep Neural Network (DNN). First, we establish the existence of a minimizer for an average loss of the model under $L^2$-regularization. Subsequently, we show the existence of a curve of maximal slope of the loss. Our main result is the convergence of flow to a critical point of the loss as time goes to infinity. An essential aspect of proving this result involves the establishment of the \L{}ojasiewicz--Simon gradient inequality for the loss. We derive this inequality by assuming the analyticity of NNs and loss functions. Our proofs offer a new approach for analyzing the asymptotic behavior of Wasserstein-type gradient flows for nonconvex functionals.
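For reference, the Łojasiewicz--Simon gradient inequality invoked above takes the following standard form (stated generically; the paper's precise Wasserstein-space formulation may differ).

```latex
% Lojasiewicz--Simon gradient inequality (standard form).
% If E is analytic near a critical point u^*, there exist
% theta in (0, 1/2], C > 0 and sigma > 0 such that
\[
  \bigl| E(u) - E(u^*) \bigr|^{1-\theta} \;\le\; C \,\bigl\| E'(u) \bigr\|
  \qquad \text{whenever } \|u - u^*\| < \sigma .
\]
% Along a gradient flow u'(t) = -E'(u(t)) this inequality converts decay of
% the energy into convergence of the trajectory itself as t -> infinity.
```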

Generative Modelling of Stochastic Actions with Arbitrary Constraints in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.15341
  • repo_url: None
  • paper_authors: Changyu Chen, Ramesha Karunasena, Thanh Hong Nguyen, Arunesh Sinha, Pradeep Varakantham
  • for: The work targets reinforcement learning problems with large discrete, multidimensional, unordered action spaces, such as the randomized allocation of security resources or emergency response units, where the realized action must also satisfy validity constraints that are difficult to express in closed form.
  • methods: A (state-)conditional normalizing flow compactly represents the stochastic policy: the network produces a single sampled action together with its log probability, which is then used by an actor-critic method; in addition, an invalid-action rejection method (via a valid-action oracle), enabled by a modified policy gradient, updates the base policy (a toy rejection-sampling sketch follows the abstract below).
  • results: Experiments show that the approach scales to large discrete action spaces better than prior methods and can enforce arbitrary state-conditional constraints on the support of the action distribution in any state.
    Abstract Many problems in Reinforcement Learning (RL) seek an optimal policy with large discrete multidimensional yet unordered action spaces; these include problems in randomized allocation of resources such as placements of multiple security resources and emergency response units, etc. A challenge in this setting is that the underlying action space is categorical (discrete and unordered) and large, for which existing RL methods do not perform well. Moreover, these problems require validity of the realized action (allocation); this validity constraint is often difficult to express compactly in a closed mathematical form. The allocation nature of the problem also prefers stochastic optimal policies, if one exists. In this work, we address these challenges by (1) applying a (state) conditional normalizing flow to compactly represent the stochastic policy -- the compactness arises due to the network only producing one sampled action and the corresponding log probability of the action, which is then used by an actor-critic method; and (2) employing an invalid action rejection method (via a valid action oracle) to update the base policy. The action rejection is enabled by a modified policy gradient that we derive. Finally, we conduct extensive experiments to show the scalability of our approach compared to prior methods and the ability to enforce arbitrary state-conditional constraints on the support of the distribution of actions in any state.
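The paper's policy is a conditional normalizing flow trained with a modified policy gradient, which is not reproduced here; the sketch below only illustrates the simpler ingredient of rejecting invalid sampled actions via a validity oracle while keeping the accepted action's log-probability for an actor-critic update. The action set and oracle are made up.

```python
import numpy as np

def sample_valid_action(probs, is_valid, rng, max_tries=100):
    """Sample from a state-conditional categorical policy, rejecting actions
    that a validity oracle flags as infeasible.

    Returns the accepted action and its log-probability under the
    (unmodified) base policy, as an actor-critic update would need.
    """
    for _ in range(max_tries):
        a = rng.choice(len(probs), p=probs)
        if is_valid(a):
            return a, np.log(probs[a])
    raise RuntimeError("no valid action found; oracle may be too restrictive")

# Toy setting: 6 discrete "allocation" actions, of which the oracle only
# accepts even-indexed ones (a stand-in for resource-validity rules).
rng = np.random.default_rng(0)
probs = np.array([0.05, 0.30, 0.10, 0.25, 0.10, 0.20])
oracle = lambda a: a % 2 == 0

samples = [sample_valid_action(probs, oracle, rng)[0] for _ in range(5000)]
print("empirical frequencies of accepted actions:",
      np.round(np.bincount(samples, minlength=6) / len(samples), 3))
```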

FRAC-Q-Learning: A Reinforcement Learning with Boredom Avoidance Processes for Social Robots

  • paper_url: http://arxiv.org/abs/2311.15327
  • repo_url: None
  • paper_authors: Akinari Onishi
  • for: To develop a reinforcement learning algorithm suited to social robots that avoids boring their users.
  • methods: The proposed FRAC-Q-learning adds a forgetting process to randomizing and categorizing processes, and is evaluated against traditional Q-learning (a heavily simplified sketch follows the abstract below).
  • results: FRAC-Q-learning showed a significantly higher trend of interest score and was significantly harder to bore users with than traditional Q-learning, suggesting it can contribute to social robots that do not bore users; the algorithm may also find applications in web-based communication and educational systems.
    Abstract The reinforcement learning algorithms have often been applied to social robots. However, most reinforcement learning algorithms were not optimized for the use of social robots, and consequently they may bore users. We proposed a new reinforcement learning method specialized for the social robot, the FRAC-Q-learning, that can avoid user boredom. The proposed algorithm consists of a forgetting process in addition to randomizing and categorizing processes. This study evaluated interest and boredom hardness scores of the FRAC-Q-learning by a comparison with the traditional Q-learning. The FRAC-Q-learning showed a significantly higher trend of interest score, and proved significantly harder to bore users with than the traditional Q-learning. Therefore, the FRAC-Q-learning can contribute to developing a social robot that will not bore users. The proposed algorithm can also find applications in Web-based communication and educational systems. This paper presents the entire process, detailed implementation and a detailed evaluation method of the FRAC-Q-learning for the first time.
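The published forgetting, randomizing and categorizing processes are not reproduced here; the sketch below only shows, as an assumption, how a "forgetting" step that decays Q-values back toward their initial values could be attached to a standard tabular Q-learning update.

```python
import numpy as np

def q_update_with_forgetting(Q, s, a, r, s_next, Q_init,
                             alpha=0.1, gamma=0.95, forget=0.01):
    """One tabular Q-learning update followed by a 'forgetting' step.

    The forgetting step pulls every entry slightly back toward its initial
    value, so rarely revisited behaviours fade over time.  This is an
    illustrative stand-in, not the published FRAC-Q rule.
    """
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])      # standard Q-learning update
    Q += forget * (Q_init - Q)                    # forgetting process
    return Q

n_states, n_actions = 4, 3
Q_init = np.zeros((n_states, n_actions))
Q = Q_init.copy()
rng = np.random.default_rng(0)
s = 0
for _ in range(200):                              # toy random-walk interaction
    a = rng.integers(n_actions)
    s_next = rng.integers(n_states)
    r = 1.0 if s_next == 3 else 0.0
    Q = q_update_with_forgetting(Q, s, a, r, s_next, Q_init)
    s = s_next
print(np.round(Q, 2))
```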

Generalized Graph Prompt: Toward a Unification of Pre-Training and Downstream Tasks on Graphs

  • paper_url: http://arxiv.org/abs/2311.15317
  • repo_url: https://github.com/gmcmt/graph_prompt_extension
  • paper_authors: Xingtong Yu, Zhenghao Liu, Yuan Fang, Zemin Liu, Sihong Chen, Xinming Zhang
  • for: To propose GraphPrompt, a pre-training and prompting framework on graphs that unifies pre-training and downstream tasks under a common task template.
  • methods: GraphPrompt employs a learnable prompt to help a downstream task locate the most relevant knowledge from the pre-trained model in a task-specific manner; GraphPrompt+ further generalizes several popular graph pre-training tasks and places prompt vectors within every layer of the pre-trained graph encoder (a toy prompt-weighted readout sketch follows the abstract below).
  • results: Extensive experiments on five public datasets evaluate and analyze GraphPrompt and GraphPrompt+.
    Abstract Graph neural networks have emerged as a powerful tool for graph representation learning, but their performance heavily relies on abundant task-specific supervision. To reduce labeling requirement, the "pre-train, prompt" paradigms have become increasingly common. However, existing study of prompting on graphs is limited, lacking a universal treatment to appeal to different downstream tasks. In this paper, we propose GraphPrompt, a novel pre-training and prompting framework on graphs. GraphPrompt not only unifies pre-training and downstream tasks into a common task template but also employs a learnable prompt to assist a downstream task in locating the most relevant knowledge from the pre-trained model in a task-specific manner. To further enhance GraphPrompt in these two stages, we extend it into GraphPrompt+ with two major enhancements. First, we generalize several popular graph pre-training tasks beyond simple link prediction to broaden the compatibility with our task template. Second, we propose a more generalized prompt design that incorporates a series of prompt vectors within every layer of the pre-trained graph encoder, in order to capitalize on the hierarchical information across different layers beyond just the readout layer. Finally, we conduct extensive experiments on five public datasets to evaluate and analyze GraphPrompt and GraphPrompt+.
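As a rough illustration of a learnable prompt assisting the readout of a frozen graph encoder, the sketch below element-wise reweights node embeddings with a prompt vector before summation; this is an assumption-level simplification, not the full GraphPrompt/GraphPrompt+ framework.

```python
import numpy as np

def prompt_readout(H, prompt):
    """Prompt-weighted graph readout.

    H      : (num_nodes, dim) node embeddings from a (frozen) graph encoder.
    prompt : (dim,) learnable task-specific prompt vector.
    The prompt element-wise reweights node features before summation, so
    different downstream tasks can emphasise different feature dimensions
    of the same pre-trained encoder.
    """
    return (H * prompt[None, :]).sum(axis=0)

rng = np.random.default_rng(0)
H = rng.standard_normal((7, 16))          # 7 nodes, 16-dim embeddings
plain = H.sum(axis=0)                     # conventional sum readout
prompt = np.ones(16)
prompt[:4] = 3.0                          # a task that cares about dims 0-3
prompted = prompt_readout(H, prompt)
print("plain readout (first dims)   :", np.round(plain[:4], 2))
print("prompted readout (first dims):", np.round(prompted[:4], 2))
```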

Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs

  • paper_url: http://arxiv.org/abs/2311.15310
  • repo_url: None
  • paper_authors: Yizheng Zhu, Yuncheng Wu, Zhaojing Luo, Beng Chin Ooi, Xiaokui Xiao
  • for: To provide a secure and verifiable federated learning (FL) solution that enables data collaboration for data analytics while guaranteeing input privacy and integrity.
  • methods: The proposed RiseFL combines a probabilistic integrity check, which significantly reduces the cost of zero-knowledge proof generation and verification, with a hybrid commitment scheme that provides Byzantine robustness; the security guarantee of the solution is proved theoretically.
  • results: Experiments on synthetic and real-world datasets show that the solution is effective and highly efficient in both client computation and communication; for instance, RiseFL is up to 28x, 53x, and 164x faster than the state-of-the-art baselines ACORN, RoFL, and EIFFeL in client computation.
    Abstract Organizations are increasingly recognizing the value of data collaboration for data analytics purposes. Yet, stringent data protection laws prohibit the direct exchange of raw data. To facilitate data collaboration, federated learning (FL) emerges as a viable solution, which enables multiple clients to collaboratively train a machine learning (ML) model under the supervision of a central server while ensuring the confidentiality of their raw data. However, existing studies have unveiled two main risks: (i) the potential for the server to infer sensitive information from the client's uploaded updates (i.e., model gradients), compromising client input privacy, and (ii) the risk of malicious clients uploading malformed updates to poison the FL model, compromising input integrity. Recent works utilize secure aggregation with zero-knowledge proofs (ZKP) to guarantee input privacy and integrity in FL. Nevertheless, they suffer from extremely low efficiency and, thus, are impractical for real deployment. In this paper, we propose a novel and highly efficient solution, RiseFL, for secure and verifiable data collaboration, ensuring input privacy and integrity simultaneously. Firstly, we devise a probabilistic integrity check method that significantly reduces the cost of ZKP generation and verification. Secondly, we design a hybrid commitment scheme to satisfy Byzantine robustness with improved performance. Thirdly, we theoretically prove the security guarantee of the proposed solution. Extensive experiments on synthetic and real-world datasets suggest that our solution is effective and is highly efficient in both client computation and communication. For instance, RiseFL is up to 28x, 53x and 164x faster than three state-of-the-art baselines ACORN, RoFL and EIFFeL for the client computation.

Controllable Expensive Multi-objective Optimization with Warm-starting Gaussian Processes

  • paper_url: http://arxiv.org/abs/2311.15297
  • repo_url: None
  • paper_authors: Quang-Huy Nguyen, Long P. Hoang, Hoang V. Viet, Dung D. Le
  • for: To propose a controllable Pareto set learning method that addresses the instability and inefficiency of existing Pareto set learning approaches for expensive multi-objective optimization problems.
  • methods: The method, Co-PSL, consists of two stages: (1) warm-starting Bayesian optimization to obtain quality Gaussian process priors, and (2) controllable Pareto set learning to accurately acquire a parametric mapping from preferences to the corresponding Pareto solutions.
  • results: On synthetic and real-world multi-objective optimization problems, Co-PSL improves the efficiency and stability of expensive multi-objective optimization while supporting real-time trade-off control between conflicting objectives.
    Abstract Pareto Set Learning (PSL) is a promising approach for approximating the entire Pareto front in multi-objective optimization (MOO) problems. However, existing derivative-free PSL methods are often unstable and inefficient, especially for expensive black-box MOO problems where objective function evaluations are costly. In this work, we propose to address the instability and inefficiency of existing PSL methods with a novel controllable PSL method, called Co-PSL. Particularly, Co-PSL consists of two stages: (1) warm-starting Bayesian optimization to obtain quality Gaussian Processes priors and (2) controllable Pareto set learning to accurately acquire a parametric mapping from preferences to the corresponding Pareto solutions. The former is to help stabilize the PSL process and reduce the number of expensive function evaluations. The latter is to support real-time trade-off control between conflicting objectives. Performances across synthesis and real-world MOO problems showcase the effectiveness of our Co-PSL for expensive multi-objective optimization tasks.

A Nearly Optimal and Low-Switching Algorithm for Reinforcement Learning with General Function Approximation

  • paper_url: http://arxiv.org/abs/2311.15238
  • repo_url: None
  • paper_authors: Heyang Zhao, Jiafan He, Quanquan Gu
  • for: To address the exploration-exploitation dilemma in reinforcement learning with complex model classes.
  • methods: The paper proposes Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for reinforcement learning with general function approximation; its key designs are (1) a general deterministic policy-switching strategy with low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency.
  • results: MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large and near-optimal policy switching cost of $\tilde{O}(dH)$, where $d$ is the eluder dimension of the function class, $H$ is the planning horizon, and $K$ is the number of episodes; the work sheds light on designing sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.
    Abstract The exploration-exploitation dilemma has been a central challenge in reinforcement learning (RL) with complex model classes. In this paper, we propose a new algorithm, Monotonic Q-Learning with Upper Confidence Bound (MQL-UCB) for RL with general function approximation. Our key algorithmic design includes (1) a general deterministic policy-switching strategy that achieves low switching cost, (2) a monotonic value function structure with carefully controlled function class complexity, and (3) a variance-weighted regression scheme that exploits historical trajectories with high data efficiency. MQL-UCB achieves minimax optimal regret of $\tilde{O}(d\sqrt{HK})$ when $K$ is sufficiently large and near-optimal policy switching cost of $\tilde{O}(dH)$, with $d$ being the eluder dimension of the function class, $H$ being the planning horizon, and $K$ being the number of episodes. Our work sheds light on designing provably sample-efficient and deployment-efficient Q-learning with nonlinear function approximation.

Decision Tree Psychological Risk Assessment in Currency Trading

  • paper_url: http://arxiv.org/abs/2311.15222
  • repo_url: https://github.com/jp9621/pricalc
  • paper_authors: Jai Pal
  • for: To explore the integration of artificial intelligence (AI) into currency trading through personalized AI models that act as intelligent personal assistants, giving individual traders a more accurate and insightful assessment of psychological risk.
  • methods: A classifying decision tree is crafted to provide clearer decision-making boundaries, and the user's chronological trade entries are incorporated so the model can identify critical junctures when psychological risks are heightened (a toy decision-tree sketch follows the abstract below).
  • results: The real-time nature of the calculations makes the model a proactive tool that offers timely alerts to traders about impending moments of psychological risk; the approach may also extend to other industries where personalized modeling supports decision making.
    Abstract This research paper focuses on the integration of Artificial Intelligence (AI) into the currency trading landscape, positing the development of personalized AI models, essentially functioning as intelligent personal assistants tailored to the idiosyncrasies of individual traders. The paper posits that AI models are capable of identifying nuanced patterns within the trader's historical data, facilitating a more accurate and insightful assessment of psychological risk dynamics in currency trading. The Psychological Risk Index (PRI) is a dynamic metric that experiences fluctuations in response to market conditions that foster psychological fragility among traders. By employing sophisticated techniques, a classifying decision tree is crafted, enabling clearer decision-making boundaries within the tree structure. By incorporating the user's chronological trade entries, the model becomes adept at identifying critical junctures when psychological risks are heightened. The real-time nature of the calculations enhances the model's utility as a proactive tool, offering timely alerts to traders about impending moments of psychological risk. The implications of this research extend beyond the confines of currency trading, reaching into the realms of other industries where the judicious application of personalized modeling emerges as an efficient and strategic approach. This paper positions itself at the intersection of cutting-edge technology and the intricate nuances of human psychology, offering a transformative paradigm for decision making support in dynamic and high-pressure environments.
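The paper's features come from a trader's chronological trade history; the sketch below uses invented features (drawdown, trade rate, position sizing) and an invented labeling rule purely to show a classifying decision tree producing psychological-risk flags.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 400
# Hypothetical per-window features from a trader's history (illustrative only):
drawdown   = rng.uniform(0, 0.4, n)      # recent drawdown fraction
trade_rate = rng.uniform(0, 30, n)       # trades per day
sizing     = rng.uniform(0.5, 3.0, n)    # position size vs. usual size
X = np.column_stack([drawdown, trade_rate, sizing])

# Synthetic "high psychological risk" label: large drawdown combined with
# over-trading or over-sizing (a made-up rule standing in for real labels).
y = ((drawdown > 0.2) & ((trade_rate > 15) | (sizing > 2.0))).astype(int)

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf, feature_names=["drawdown", "trade_rate", "sizing"]))
print("flag for a calm window :", clf.predict([[0.05, 5.0, 1.0]])[0])
print("flag for a risky window:", clf.predict([[0.30, 25.0, 2.5]])[0])
```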

The Local Landscape of Phase Retrieval Under Limited Samples

  • paper_url: http://arxiv.org/abs/2311.15221
  • repo_url: None
  • paper_authors: Kaizhao Liu, Zihao Wang, Lei Wu
  • for: The paper studies the local landscape of phase retrieval under limited samples, aiming to determine the minimal sample size that guarantees a benign local landscape around the global minima in high dimensions.
  • methods: The authors analyze the local convexity and the one-point strong convexity of the loss near the ground truth (a standard formulation of the phase retrieval loss is recalled after the abstract below).
  • results: When the sample size is $n=o(d\log d)$, the local landscape is highly non-convex; when $n=\omega(d)$, the landscape is with high probability one-point strongly convex in the local annulus $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$, so gradient descent initialized in this domain converges to an $o_d(1)$-loss solution exponentially fast; moreover, when $n=o(d\log d)$, one-point convexity breaks within a smaller ball of radius $\widetilde\Theta(\sqrt{1/d})$, so convergence to the exact $w^*$ cannot be established from one-point convexity alone.
    Abstract In this paper, we provide a fine-grained analysis of the local landscape of phase retrieval under the regime with limited samples. Our aim is to ascertain the minimal sample size necessary to guarantee a benign local landscape surrounding global minima in high dimensions. Let $n$ and $d$ denote the sample size and input dimension, respectively. We first explore the local convexity and establish that when $n=o(d\log d)$, for almost every fixed point in the local ball, the Hessian matrix must have negative eigenvalues as long as $d$ is sufficiently large. Consequently, the local landscape is highly non-convex. We next consider the one-point strong convexity and show that as long as $n=\omega(d)$, with high probability, the landscape is one-point strongly convex in the local annulus: $\{w\in\mathbb{R}^d: o_d(1)\leqslant \|w-w^*\|\leqslant c\}$, where $w^*$ is the ground truth and $c$ is an absolute constant. This implies that gradient descent initialized from any point in this domain can converge to an $o_d(1)$-loss solution exponentially fast. Furthermore, we show that when $n=o(d\log d)$, there is a radius of $\widetilde\Theta\left(\sqrt{1/d}\right)$ such that one-point convexity breaks in the corresponding smaller local ball. This indicates an impossibility to establish a convergence to exact $w^*$ for gradient descent under limited samples by relying solely on one-point convexity.
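For concreteness, a standard empirical loss for noiseless real-valued phase retrieval with measurement vectors $a_i$ and ground truth $w^*$ is given below; whether the paper uses exactly this quartic formulation is an assumption based on the abstract.

```latex
\[
  L(w) \;=\; \frac{1}{4n} \sum_{i=1}^{n}
    \Bigl( (a_i^{\top} w)^2 - y_i \Bigr)^2,
  \qquad y_i = (a_i^{\top} w^*)^2 .
\]
% The abstract's regimes compare the sample size n with the dimension d:
%   n = o(d log d)  -> the local landscape around w^* is highly non-convex;
%   n = omega(d)    -> one-point strong convexity holds on the annulus
%                      { w : o_d(1) <= ||w - w^*|| <= c }.
```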

Solve Large-scale Unit Commitment Problems by Physics-informed Graph Learning

  • paper_url: http://arxiv.org/abs/2311.15216
  • repo_url: None
  • paper_authors: Jingtao Qin, Nanpeng Yu
  • for: solves large-scale unit commitment (UC) problems with improved performance and scalability.
  • methods: leverages physics-informed hierarchical graph convolutional networks (PI-GCN) for neural diving and model-based graph convolutional networks (MB-GCN) for neural branching.
  • results: achieves better performance and scalability than the baseline MB-GCN on neural diving, and outperforms a modern MIP solver for all testing days after combining it with the proposed neural diving model and the baseline neural branching model.
    Abstract Unit commitment (UC) problems are typically formulated as mixed-integer programs (MIP) and solved by the branch-and-bound (B&B) scheme. The recent advances in graph neural networks (GNN) make it possible to enhance the B&B algorithm in modern MIP solvers by learning to dive and branch. Existing GNN models that tackle MIP problems are mostly constructed from mathematical formulation, which is computationally expensive when dealing with large-scale UC problems. In this paper, we propose a physics-informed hierarchical graph convolutional network (PI-GCN) for neural diving that leverages the underlying features of various components of power systems to find high-quality variable assignments. Furthermore, we adopt the MIP model-based graph convolutional network (MB-GCN) for neural branching to select the optimal variables for branching at each node of the B&B tree. Finally, we integrate neural diving and neural branching into a modern MIP solver to establish a novel neural MIP solver designed for large-scale UC problems. Numerical studies show that PI-GCN has better performance and scalability than the baseline MB-GCN on neural diving. Moreover, the neural MIP solver yields the lowest operational cost and outperforms a modern MIP solver for all testing days after combining it with our proposed neural diving model and the baseline neural branching model.
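As background for "UC problems are typically formulated as mixed-integer programs", a deliberately tiny UC instance with binary on/off variables, generation limits, and a demand-balance constraint can be written with the PuLP modelling library as below; the generator data and time horizon are invented, and this is unrelated to the paper's learning components.

```python
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, value

T = 4                                   # hours
demand = [150, 300, 250, 180]           # MW per hour (invented)
gens = {                                # (p_min, p_max, cost $/MWh, fixed $ per hour on)
    "g1": (50, 200, 20, 100),
    "g2": (30, 150, 35, 60),
    "g3": (20, 100, 50, 40),
}

prob = LpProblem("toy_unit_commitment", LpMinimize)
u = {(g, t): LpVariable(f"on_{g}_{t}", cat="Binary") for g in gens for t in range(T)}
p = {(g, t): LpVariable(f"p_{g}_{t}", lowBound=0) for g in gens for t in range(T)}

# Objective: variable generation cost plus fixed commitment cost.
prob += lpSum(gens[g][2] * p[g, t] + gens[g][3] * u[g, t]
              for g in gens for t in range(T))

for t in range(T):
    prob += lpSum(p[g, t] for g in gens) == demand[t]          # demand balance
    for g in gens:
        prob += p[g, t] >= gens[g][0] * u[g, t]                # min output if on
        prob += p[g, t] <= gens[g][1] * u[g, t]                # max output if on

prob.solve()                                                    # default CBC solver
print("total cost:", value(prob.objective))
for t in range(T):
    print(t, {g: round(value(p[g, t]), 1) for g in gens})
```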

A Novel Normalized-Cut Solver with Nearest Neighbor Hierarchical Initialization

  • paper_url: http://arxiv.org/abs/2311.15214
  • repo_url: None
  • paper_authors: Feiping Nie, Jitao Lu, Danyang Wu, Rong Wang, Xuelong Li
  • for: To propose a new Normalized-Cut (N-Cut) solver that avoids two problems of traditional solvers: (1) two-stage methods solve a relaxed version of the original problem and therefore cannot obtain good solutions to the original N-Cut problem, and (2) solving the relaxed problem requires an eigenvalue decomposition with $\mathcal{O}(n^3)$ time complexity ($n$ is the number of nodes).
  • methods: The solver is designed around the coordinate descent method, with accelerating strategies that reduce the time complexity to $\mathcal{O}(|E|)$ ($|E|$ is the number of edges), and an efficient initialization method with deterministic outputs that avoids the uncertainty introduced by random initialization (the N-Cut objective is recalled after the abstract below).
  • results: Experiments on several benchmark datasets show that the proposed solver obtains larger N-Cut objective values while achieving better clustering performance than traditional solvers.
    Abstract Normalized-Cut (N-Cut) is a famous model of spectral clustering. The traditional N-Cut solvers are two-stage: 1) calculating the continuous spectral embedding of normalized Laplacian matrix; 2) discretization via $K$-means or spectral rotation. However, this paradigm brings two vital problems: 1) two-stage methods solve a relaxed version of the original problem, so they cannot obtain good solutions for the original N-Cut problem; 2) solving the relaxed problem requires eigenvalue decomposition, which has $\mathcal{O}(n^3)$ time complexity ($n$ is the number of nodes). To address the problems, we propose a novel N-Cut solver designed based on the famous coordinate descent method. Since the vanilla coordinate descent method also has $\mathcal{O}(n^3)$ time complexity, we design various accelerating strategies to reduce the time complexity to $\mathcal{O}(|E|)$ ($|E|$ is the number of edges). To avoid reliance on random initialization which brings uncertainties to clustering, we propose an efficient initialization method that gives deterministic outputs. Extensive experiments on several benchmark datasets demonstrate that the proposed solver can obtain larger objective values of N-Cut, meanwhile achieving better clustering performance compared to traditional solvers.
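For reference, the Normalized-Cut objective in its usual form is recalled below; the notation follows the standard graph-clustering convention rather than the paper's.

```latex
\[
  \mathrm{NCut}(A_1,\dots,A_k) \;=\; \sum_{i=1}^{k}
    \frac{\mathrm{cut}(A_i,\;V \setminus A_i)}{\mathrm{vol}(A_i)},
  \qquad
  \mathrm{cut}(A,B) = \sum_{u \in A,\, v \in B} w_{uv},
  \quad
  \mathrm{vol}(A) = \sum_{u \in A} \sum_{v \in V} w_{uv}.
\]
% Two-stage spectral solvers work with a relaxation of this objective via the
% eigenvectors of the normalized Laplacian and then discretise (k-means or
% spectral rotation); the paper instead attacks the discrete problem with
% coordinate descent.
```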

Topology combined machine learning for consonant recognition

  • paper_url: http://arxiv.org/abs/2311.15210
  • repo_url: https://github.com/AnnFeng233/TDA_Consonant_Recognition
  • paper_authors: Pingyao Feng, Siheng Yi, Qingrui Qu, Zhiwang Yu, Yifei Zhu
  • for: Researchers and practitioners in artificial intelligence and signal processing, particularly those interested in using topological methods for machine learning.
  • methods: The paper proposes TopCap, which combines time-delay embedding and persistent homology to capture the most salient topological features of time series; the method is designed to be transparent and broadly applicable and can capture features that are rarely detected in datasets with low intrinsic dimensionality (a time-delay embedding sketch follows the abstract below).
  • results: TopCap classifies voiced and voiceless consonants with an accuracy exceeding 96% and is geared towards designing topological convolutional layers for deep learning of speech and audio signals.
    Abstract In artificial-intelligence-aided signal processing, existing deep learning models often exhibit a black-box structure, and their validity and comprehensibility remain elusive. The integration of topological methods, despite its relatively nascent application, serves a dual purpose of making models more interpretable as well as extracting structural information from time-dependent data for smarter learning. Here, we provide a transparent and broadly applicable methodology, TopCap, to capture the most salient topological features inherent in time series for machine learning. Rooted in high-dimensional ambient spaces, TopCap is capable of capturing features rarely detected in datasets with low intrinsic dimensionality. Applying time-delay embedding and persistent homology, we obtain descriptors which encapsulate information such as the vibration of a time series, in terms of its variability of frequency, amplitude, and average line, demonstrated with simulated data. This information is then vectorised and fed into multiple machine learning algorithms such as k-nearest neighbours and support vector machine. Notably, in classifying voiced and voiceless consonants, TopCap achieves an accuracy exceeding 96% and is geared towards designing topological convolutional layers for deep learning of speech and audio signals.
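The time-delay embedding step of TopCap can be sketched in a few lines; the delay, embedding dimension and toy signal below are arbitrary choices, and the resulting point cloud would then be passed to a persistent-homology library (e.g., ripser or gudhi), which is not shown.

```python
import numpy as np

def time_delay_embedding(x, dim=3, delay=5):
    """Map a 1-D time series into R^dim using Takens-style delay vectors:
    X_t = (x[t], x[t+delay], ..., x[t+(dim-1)*delay])."""
    n = len(x) - (dim - 1) * delay
    return np.stack([x[i: i + n] for i in range(0, dim * delay, delay)], axis=1)

# Toy "voiced-like" signal: a quasi-periodic waveform with slight noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 2000)
signal = (np.sin(2 * np.pi * 110 * t)
          + 0.3 * np.sin(2 * np.pi * 220 * t)
          + 0.02 * rng.standard_normal(t.size))

cloud = time_delay_embedding(signal, dim=3, delay=4)
print("point cloud shape:", cloud.shape)   # (n_points, 3)
# A periodic signal traces a loop in the embedding space; its prominent
# 1-dimensional persistent-homology feature (a long-lived H1 bar) is the kind
# of descriptor TopCap vectorises for downstream classifiers.
```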

Efficient interpolation of molecular properties across chemical compound space with low-dimensional descriptors

  • paper_url: http://arxiv.org/abs/2311.15207
  • repo_url: None
  • paper_authors: Yun-Wen Mao, Roman V. Krems
  • for: To build accurate, data-starved models of molecular properties for interpolation across chemical compound space with low-dimensional descriptors.
  • methods: Three-dimensional, universal physical descriptors derived from the eigenvalue distributions of Coulomb matrices are combined with six-dimensional features informed by the Gershgorin circle theorem; the resulting nine-dimensional descriptors are used for Gaussian process regression with kernels of variable functional form (a Coulomb-matrix sketch follows the abstract below).
  • results: Models trained with 100 molecules predict the product of entropy and temperature ($S \times T$) and the zero-point vibrational energy (ZPVE) with an absolute error under 1 kcal mol$^{-1}$ for more than 78% and under 1.3 kcal mol$^{-1}$ for more than 92% of the test molecules.
    Abstract We demonstrate accurate data-starved models of molecular properties for interpolation in chemical compound spaces with low-dimensional descriptors. Our starting point is based on three-dimensional, universal, physical descriptors derived from the properties of the distributions of the eigenvalues of Coulomb matrices. To account for the shape and composition of molecules, we combine these descriptors with six-dimensional features informed by the Gershgorin circle theorem. We use the nine-dimensional descriptors thus obtained for Gaussian process regression based on kernels with variable functional form, leading to extremely efficient, low-dimensional interpolation models. The resulting models trained with 100 molecules are able to predict the product of entropy and temperature ($S \times T$) and zero point vibrational energy (ZPVE) with the absolute error under 1 kcal mol$^{-1}$ for $> 78$ \% and under 1.3 kcal mol$^{-1}$ for $> 92$ \% of molecules in the test data. The test data comprises 20,000 molecules with complexity varying from three atoms to 29 atoms and the ranges of $S \times T$ and ZPVE covering 36 kcal mol$^{-1}$ and 161 kcal mol$^{-1}$, respectively. We also illustrate that the descriptors based on the Gershgorin circle theorem yield more accurate models of molecular entropy than those based on graph neural networks that explicitly account for the atomic connectivity of molecules.
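The sketch below computes the standard Coulomb matrix (with the usual $0.5\,Z_i^{2.4}$ diagonal) and its sorted eigenvalues for a toy water geometry; the paper's three-dimensional summary of the eigenvalue distribution and the Gershgorin-based features are not reproduced, and distances here are in Angstrom purely for illustration (atomic units would normally be used).

```python
import numpy as np

def coulomb_matrix(Z, R):
    """Standard Coulomb matrix: C_ii = 0.5*Z_i^2.4, C_ij = Z_i*Z_j/|R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        C = np.outer(Z, Z) / dist          # diagonal becomes inf here ...
    np.fill_diagonal(C, 0.5 * Z ** 2.4)    # ... and is overwritten with 0.5*Z^2.4
    return C

# Toy water molecule (coordinates roughly at the equilibrium geometry).
Z = [8, 1, 1]                              # O, H, H
R = [[0.000,  0.000,  0.117],
     [0.000,  0.757, -0.469],
     [0.000, -0.757, -0.469]]

C = coulomb_matrix(Z, R)
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]
print("sorted Coulomb-matrix eigenvalues:", np.round(eigvals, 3))
# Summary statistics of such spectra (plus Gershgorin-circle information from C)
# supply the low-dimensional descriptors used for Gaussian process regression.
```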

A Data-Driven Approach for High-Impedance Fault Localization in Distribution Systems

  • paper_url: http://arxiv.org/abs/2311.15168
  • repo_url: None
  • paper_authors: Yuqi Zhou, Yuqing Dong, Rui Yang
  • for: To propose a data-driven method for localizing high-impedance faults (HIFs), supporting the reliable operation of distribution systems.
  • methods: The voltage-current trajectory is first approximated with piecewise functions obtained by solving optimization problems; the features of all segments are then collected as inputs to a support vector machine that identifies HIFs at different locations (a toy piecewise-fit-plus-SVM sketch follows the abstract below).
  • results: Numerical studies on the IEEE 123-node test feeder demonstrate the validity and accuracy of the proposed approach for real-time HIF identification.
    Abstract Accurate and quick identification of high-impedance faults is critical for the reliable operation of distribution systems. Unlike other faults in power grids, HIFs are very difficult to detect by conventional overcurrent relays due to the low fault current. Although HIFs can be affected by various factors, the voltage current characteristics can substantially imply how the system responds to the disturbance and thus provides opportunities to effectively localize HIFs. In this work, we propose a data-driven approach for the identification of HIF events. To tackle the nonlinearity of the voltage current trajectory, first, we formulate optimization problems to approximate the trajectory with piecewise functions. Then we collect the function features of all segments as inputs and use the support vector machine approach to efficiently identify HIFs at different locations. Numerical studies on the IEEE 123-node test feeder demonstrate the validity and accuracy of the proposed approach for real-time HIF identification.
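The paper obtains its piecewise approximation by solving optimization problems over the measured voltage-current trajectory; the sketch below uses a cruder stand-in, least-squares line fits on fixed segments of synthetic V-I curves, and feeds the per-segment slopes and intercepts to a support vector machine, only to illustrate the segment-features-then-classify pipeline.

```python
import numpy as np
from sklearn.svm import SVC

def segment_features(v, i, n_segments=4):
    """Fit a line to each segment of a V-I trajectory and return the
    concatenated (slope, intercept) pairs as a feature vector."""
    feats = []
    for vs, cs in zip(np.array_split(v, n_segments), np.array_split(i, n_segments)):
        slope, intercept = np.polyfit(vs, cs, deg=1)
        feats += [slope, intercept]
    return np.array(feats)

rng = np.random.default_rng(0)
v = np.linspace(-1.0, 1.0, 200)

def make_trajectory(hif):
    """Synthetic V-I curves: near-linear for normal operation, and a
    saturating nonlinear shape (plus noise) standing in for an HIF."""
    if hif:
        i = 0.4 * np.tanh(3.0 * v) + 0.05 * rng.standard_normal(v.size)
    else:
        i = 0.8 * v + 0.02 * rng.standard_normal(v.size)
    return segment_features(v, i)

X = np.stack([make_trajectory(hif=(k % 2 == 1)) for k in range(200)])
y = np.array([k % 2 for k in range(200)])

clf = SVC(kernel="rbf").fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```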