cs.LG - 2023-09-09

Symplectic Structure-Aware Hamiltonian (Graph) Embeddings

paper_url: http://arxiv.org/abs/2309.04885
repo_url: None
paper_authors: Jiaxu Liu, Xinping Yi, Tianle Zhang, Xiaowei Huang
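for: This work aims to make graph neural networks (GNNs) more flexible so that they adapt better to diverse graph geometries.
methods: Node features are updated through Hamiltonian equations, and Riemannian optimization on the symplectic Stiefel manifold adaptively learns the underlying symplectic structure during training.
results: The method shows strong performance and adaptability in node classification across multiple types of graph datasets, and it conserves energy during training.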

Abstract
In traditional Graph Neural Networks (GNNs), the assumption of a fixed embedding manifold often limits their adaptability to diverse graph geometries. Recently, Hamiltonian system-inspired GNNs are proposed to address the dynamic nature of such embeddings by incorporating physical laws into node feature updates. In this work, we present SAH-GNN, a novel approach that generalizes Hamiltonian dynamics for more flexible node feature updates. Unlike existing Hamiltonian-inspired GNNs, SAH-GNN employs Riemannian optimization on the symplectic Stiefel manifold to adaptively learn the underlying symplectic structure during training, circumventing the limitations of existing Hamiltonian GNNs that rely on a pre-defined form of standard symplectic structure. This innovation allows SAH-GNN to automatically adapt to various graph datasets without extensive hyperparameter tuning. Moreover, it conserves energy during training such that the implicit Hamiltonian system is physically meaningful. To this end, we empirically validate SAH-GNN's superior performance and adaptability in node classification tasks across multiple types of graph datasets.

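Summary
Traditional graph neural networks (GNNs) assume a fixed embedding manifold, which often limits their adaptability to diverse graph geometries. Hamiltonian system-inspired GNNs were recently proposed to address the dynamic nature of such embeddings by integrating physical laws into node feature updates. This work proposes SAH-GNN, a new method that generalizes Hamiltonian dynamics for more flexible node feature updates. Unlike existing Hamiltonian-inspired GNNs, SAH-GNN uses Riemannian optimization on the symplectic Stiefel manifold to learn the underlying symplectic structure during training. This lets SAH-GNN adapt automatically to different types of graph datasets without extensive hyperparameter tuning. It also conserves energy, so that the implicit Hamiltonian system is physically meaningful. The superiority and adaptability of SAH-GNN are validated empirically on multiple types of graph datasets.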
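
The energy-conservation property mentioned above can be illustrated with a toy system. The sketch below is not SAH-GNN: it assumes the standard (fixed) symplectic structure and a hypothetical quadratic Hamiltonian H(q, p) = (q^2 + p^2) / 2, and compares a symplectic leapfrog update with a plain explicit Euler update. All function names are illustrative.

```python
def grad_potential(q):
    # For H(q, p) = p**2 / 2 + q**2 / 2, dH/dq = q.
    return q

def leapfrog_step(q, p, dt):
    # Symplectic kick-drift-kick update for dq/dt = dH/dp, dp/dt = -dH/dq.
    p = p - 0.5 * dt * grad_potential(q)
    q = q + dt * p
    p = p - 0.5 * dt * grad_potential(q)
    return q, p

def euler_step(q, p, dt):
    # Non-symplectic explicit Euler update, for comparison.
    return q + dt * p, p - dt * grad_potential(q)

def energy(q, p):
    return 0.5 * (p * p + q * q)

def simulate(step, steps=1000, dt=0.1):
    # Start from (q, p) = (1, 0), where the energy is 0.5.
    q, p = 1.0, 0.0
    for _ in range(steps):
        q, p = step(q, p, dt)
    return energy(q, p)

leapfrog_energy = simulate(leapfrog_step)  # stays close to 0.5
euler_energy = simulate(euler_step)        # grows without bound
```

With these settings the leapfrog energy stays near its initial value, while the Euler energy is multiplied by (1 + dt^2) at every step.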

A Gentle Introduction to Gradient-Based Optimization and Variational Inequalities for Machine Learning

paper_url: http://arxiv.org/abs/2309.04877
repo_url: None
paper_authors: Neha S. Wadia, Yatin Dandi, Michael I. Jordan
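for: The paper addresses the shift in machine learning from pattern recognition to decision-making and multi-agent problems, viewed from the standpoint of optimization.
methods: It starts from saddle points and monotone games and proceeds to general variational inequalities, which capture the new mathematical challenges that arise in these settings.
results: The paper gives a broader framework for understanding gradient-based algorithms in machine learning. Convergence proofs are provided for several algorithms, but the main focus is on motivation and intuition.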

Abstract
The rapid progress in machine learning in recent years has been based on a highly productive connection to gradient-based optimization. Further progress hinges in part on a shift in focus from pattern recognition to decision-making and multi-agent problems. In these broader settings, new mathematical challenges emerge that involve equilibria and game theory instead of optima. Gradient-based methods remain essential -- given the high dimensionality and large scale of machine-learning problems -- but simple gradient descent is no longer the point of departure for algorithm design. We provide a gentle introduction to a broader framework for gradient-based algorithms in machine learning, beginning with saddle points and monotone games, and proceeding to general variational inequalities. While we provide convergence proofs for several of the algorithms that we present, our main focus is that of providing motivation and intuition.

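Summary
The rapid progress of machine learning in recent years has rested on a highly productive connection to gradient-based optimization. Further progress depends in part on a shift in focus from pattern recognition to decision-making and multi-agent problems. In these broader settings, new mathematical challenges arise that involve equilibria and game theory rather than optima. Gradient-based methods remain essential for machine learning problems, but simple gradient descent is no longer the starting point for algorithm design. The paper gives a gentle introduction that begins with saddle points and monotone games and proceeds to general variational inequalities. Convergence proofs are provided for several of the algorithms, but the main focus is on motivation and intuition.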
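
Why simple gradient descent stops being the point of departure can be seen on the smallest saddle-point problem, min over x, max over y of f(x, y) = x * y. The sketch below is an illustration chosen here, not an example taken from the paper: it compares simultaneous gradient descent-ascent with the extragradient method, a standard algorithm for monotone variational inequalities. Step size and iteration count are arbitrary assumptions.

```python
import math

def gda_step(x, y, lr):
    # f(x, y) = x * y, so df/dx = y and df/dy = x.
    # Descend in x, ascend in y, both from the current point.
    return x - lr * y, y + lr * x

def extragradient_step(x, y, lr):
    # Look-ahead (extrapolation) step, then update using the look-ahead gradients.
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half

def run(step, iters=200, lr=0.5):
    x, y = 1.0, 1.0
    for _ in range(iters):
        x, y = step(x, y, lr)
    # Distance to the unique saddle point (0, 0).
    return math.hypot(x, y)

gda_distance = run(gda_step)                 # spirals outward
extragradient_distance = run(extragradient_step)  # contracts toward (0, 0)
```

For this bilinear problem each descent-ascent step multiplies the squared distance to the saddle point by 1 + lr^2, while each extragradient step multiplies it by 1 - lr^2 + lr^4, which is below 1 for any step size lr smaller than 1.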

Approximating ReLU on a Reduced Ring for Efficient MPC-based Private Inference

paper_url: http://arxiv.org/abs/2309.04875
repo_url: None
paper_authors: Kiwan Maeng, G. Edward Suh
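for: The paper aims to speed up machine learning inference on untrusted servers while protecting users' privacy-sensitive data.
methods: It uses secure multi-party computation (MPC) and a framework named HummingBird, which greatly reduces the communication needed to evaluate ReLU.
results: On a real multi-server MPC setup, HummingBird achieves an average 2.03-2.67x end-to-end speedup without introducing errors, and up to 8.64x when some accuracy degradation can be tolerated.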

Abstract
Secure multi-party computation (MPC) allows users to offload machine learning inference on untrusted servers without having to share their privacy-sensitive data. Despite its strong security properties, MPC-based private inference has not been widely adopted in the real world due to its high communication overhead. When evaluating ReLU layers, MPC protocols incur a significant amount of communication between the parties, making the end-to-end execution time multiple orders of magnitude slower than its non-private counterpart. This paper presents HummingBird, an MPC framework that reduces the ReLU communication overhead significantly by using only a subset of the bits to evaluate ReLU on a smaller ring. Based on theoretical analyses, HummingBird identifies bits in the secret share that are not crucial for accuracy and excludes them during ReLU evaluation to reduce communication. With its efficient search engine, HummingBird discards 87--91% of the bits during ReLU and still maintains high accuracy. On a real MPC setup involving multiple servers, HummingBird achieves on average 2.03--2.67x end-to-end speedup without introducing any errors, and up to 8.64x average speedup when some amount of accuracy degradation can be tolerated, due to its up to 8.76x communication reduction.

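Summary
Secure multi-party computation (MPC) lets users run machine learning inference on untrusted servers without sharing their privacy-sensitive data. Despite its strong security properties, MPC-based private inference has not been widely adopted in practice because of its high communication overhead. When evaluating ReLU layers, MPC protocols exchange a large amount of data between the parties, which makes end-to-end execution several orders of magnitude slower than non-private inference. The paper presents HummingBird, a framework that reduces the ReLU communication overhead by evaluating ReLU on a smaller ring. Based on theoretical analysis, HummingBird identifies bits in the secret shares that are not crucial for accuracy and excludes them during ReLU evaluation. With its efficient search engine, HummingBird discards 87--91% of the bits during ReLU evaluation and still maintains high accuracy. On a multi-server MPC setup, it achieves on average a 2.03--2.67x end-to-end speedup without introducing errors, and up to 8.64x when some accuracy degradation is tolerated, owing to a communication reduction of up to 8.76x.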
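
The idea of evaluating ReLU on a smaller ring can be sketched with two-party additive secret sharing. The code below is a toy simulation under stated assumptions, not HummingBird's protocol: it assumes a 64-bit ring, assumes the retained bits form one contiguous window [low, high), and reconstructs the reduced value in the clear where a real system would run a secure comparison. The function names are hypothetical.

```python
import secrets

RING_BITS = 64  # full ring Z_{2^64} (assumed size)

def share(x, bits=RING_BITS):
    """Split integer x into two additive shares modulo 2**bits."""
    mask = (1 << bits) - 1
    s0 = secrets.randbits(bits)
    s1 = (x - s0) & mask
    return s0, s1

def keep_bits(s, low, high):
    """Keep only bits [low, high) of a share, mapping it to a ring of size 2**(high - low)."""
    return (s >> low) & ((1 << (high - low)) - 1)

def approx_relu(x, low, high):
    s0, s1 = share(x)
    width = high - low
    # Each party truncates its own share locally; this step needs no communication.
    reduced = (keep_bits(s0, low, high) + keep_bits(s1, low, high)) & ((1 << width) - 1)
    # Illustration only: a real protocol would test the sign with a secure
    # comparison on the small ring instead of reconstructing the value.
    non_negative = (reduced >> (width - 1)) == 0
    return x if non_negative else 0
```

In this sketch the sign is recovered exactly whenever 2^low <= |x| < 2^(high-1) - 2^low. Small non-negative inputs below 2^low may be treated as negative because the carry from the dropped low bits is lost, and inputs with magnitude near or above 2^(high-1) wrap around. That bounded error is what is traded for a narrower comparison.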

Approximation Results for Gradient Descent trained Neural Networks

paper_url: http://arxiv.org/abs/2309.04860
repo_url: None
paper_authors: G. Welper
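for: The paper provides approximation guarantees for neural networks that approximate Sobolev smooth functions.
methods: The analysis uses gradient flow and a neural tangent kernel (NTK) argument.
results: For Sobolev smooth targets, networks trained with gradient flow admit approximation guarantees in an under-parametrized regime, at the cost of a lower approximation rate than established approximation methods.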

Abstract
The paper contains approximation guarantees for neural networks that are trained with gradient flow, with error measured in the continuous $L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and targets that are Sobolev smooth. The networks are fully connected of constant depth and increasing width. Although all layers are trained, the gradient flow convergence is based on a neural tangent kernel (NTK) argument for the non-convex second but last layer. Unlike standard NTK analysis, the continuous error norm implies an under-parametrized regime, possible by the natural smoothness assumption required for approximation. The typical over-parametrization re-enters the results in form of a loss in approximation rate relative to established approximation methods for Sobolev smooth functions.

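Summary
The paper gives approximation guarantees for neural networks trained with gradient flow, with error measured in the continuous $L_2(\mathbb{S}^{d-1})$-norm on the $d$-dimensional unit sphere and with Sobolev smooth targets. The networks are fully connected, of constant depth and increasing width. Although all layers are trained, the convergence of gradient flow is based on a neural tangent kernel (NTK) argument for the non-convex second-to-last layer. Unlike standard NTK analysis, the continuous error norm implies an under-parametrized regime, made possible by the natural smoothness assumption required for approximation. The typical over-parametrization re-enters the results as a loss in approximation rate relative to established approximation methods for Sobolev smooth functions.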
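
For readers unfamiliar with the neural tangent kernel, the sketch below computes an empirical NTK Gram matrix, K(x, x') = <grad f(x), grad f(x')>, for a one-hidden-layer ReLU network on points of the unit sphere. This is a generic illustration, not the deeper architecture or the argument of the paper. The network f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x), the width m = 64, and the random seed are all assumptions.

```python
import math
import random

def unit_vector(d, rng):
    # Random point on the unit sphere S^{d-1}.
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def param_gradient(x, W, a):
    """Gradient of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x) w.r.t. all parameters."""
    scale = 1.0 / math.sqrt(len(W))
    grad = []
    for w_j, a_j in zip(W, a):
        pre = sum(wi * xi for wi, xi in zip(w_j, x))
        active = 1.0 if pre > 0 else 0.0
        grad.extend(scale * a_j * active * xi for xi in x)  # df/dw_j
        grad.append(scale * max(pre, 0.0))                  # df/da_j
    return grad

def empirical_ntk(points, W, a):
    # Gram matrix of parameter gradients.
    grads = [param_gradient(x, W, a) for x in points]
    return [[sum(u * v for u, v in zip(g1, g2)) for g2 in grads] for g1 in grads]

rng = random.Random(0)
d, m = 3, 64
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
a = [rng.choice((-1.0, 1.0)) for _ in range(m)]
points = [unit_vector(d, rng) for _ in range(4)]
K = empirical_ntk(points, W, a)
```

Because K is a Gram matrix of gradient vectors, it is symmetric and positive semi-definite by construction.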

HAct: Out-of-Distribution Detection with Neural Net Activation Histograms

paper_url: http://arxiv.org/abs/2309.04837
repo_url: None
paper_authors: Sudeepta Mondal, Ganesh Sundaramoorthi
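for: Detecting out-of-distribution (OOD) data for trained neural networks.
methods: A simple, efficient, and accurate OOD detection method based on activation histograms (HAct), that is, histograms of the output values of neural network layers.
results: HAct is more accurate than the state of the art on multiple OOD image classification benchmarks. With ResNet-50 it reaches a true positive rate (TPR) of 95% with only 0.05% false positives, improving on the previous state of the art by 20.66% in false positive rate at the same TPR.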

Abstract
We propose a simple, efficient, and accurate method for detecting out-of-distribution (OOD) data for trained neural networks, a potential first step in methods for OOD generalization. We propose a novel descriptor, HAct - activation histograms, for OOD detection, that is, probability distributions (approximated by histograms) of output values of neural network layers under the influence of incoming data. We demonstrate that HAct is significantly more accurate than state-of-the-art on multiple OOD image classification benchmarks. For instance, our approach achieves a true positive rate (TPR) of 95% with only 0.05% false-positives using Resnet-50 on standard OOD benchmarks, outperforming previous state-of-the-art by 20.66% in the false positive rate (at the same TPR of 95%). The low computational complexity and the ease of implementation make HAct suitable for online implementation in monitoring deployed neural networks in practice at scale.
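
A minimal sketch of the descriptor idea: summarize a layer's outputs as a normalized histogram and score a new input by how far its histogram lies from a reference histogram built on in-distribution data. The synthetic activations, the bin layout, and the L1 distance used as the score are all assumptions made for illustration. The abstract does not specify how HAct histograms are compared.

```python
import random

def histogram(values, bins=16, lo=0.0, hi=8.0):
    """Normalized histogram over [lo, hi); out-of-range values fall into the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return [c / len(values) for c in counts]

def relu_activations(mean, n, rng):
    # Hypothetical stand-in for one layer's post-ReLU outputs.
    return [max(0.0, rng.gauss(mean, 1.0)) for _ in range(n)]

def ood_score(sample_hist, reference_hist):
    # L1 distance between histograms (assumed choice of distance).
    return sum(abs(p - q) for p, q in zip(sample_hist, reference_hist))

rng = random.Random(0)
reference = histogram(relu_activations(0.0, 20000, rng))  # in-distribution profile
in_dist = histogram(relu_activations(0.0, 512, rng))      # new in-distribution input
shifted = histogram(relu_activations(3.0, 512, rng))      # input with shifted activations

score_in = ood_score(in_dist, reference)
score_out = ood_score(shifted, reference)
```

An input would be flagged as OOD when its score exceeds a threshold chosen on held-out in-distribution data. Here score_out is much larger than score_in.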

Correcting sampling biases via importance reweighting for spatial modeling

paper_url: http://arxiv.org/abs/2309.04824
repo_url: None
paper_authors: Boris Prokhorov, Diana Koldasbayeva, Alexey Zaytsev
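for: The paper addresses distribution bias in error estimation for machine learning models, particularly for spatial data in environmental studies.
methods: Based on the idea of importance sampling, the method accounts for the difference between the desired error and the available data, reweights the error at each sample point, and neutralizes the distribution shift. Importance sampling and kernel density estimation are used for the reweighting.
results: The method is validated on artificial data that resemble real-world spatial datasets. The overall prediction error dropped from 7% to 2%, and it becomes smaller as the sample size grows.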

Abstract
In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to obtain an unbiased estimate of the target error. By taking into account the difference between the desirable error and the available data, our method reweights errors at each sample point and neutralizes the shift. An importance sampling technique and kernel density estimation were used for reweighting. We validate the effectiveness of our approach using artificial data that resemble real-world spatial datasets. Our findings demonstrate advantages of the proposed approach for the estimation of the target error, offering a solution to a distribution shift problem. The overall error of predictions dropped from 7% to just 2%, and it gets smaller for larger samples.

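Summary
In machine learning models, error estimation is often affected by distribution bias, particularly for spatial data in environmental studies. The paper introduces an approach based on importance sampling to obtain an unbiased estimate of the target error. By accounting for the difference between the desired error and the available data, the method reweights the error at each sample point. Importance sampling and kernel density estimation are used for the reweighting. The approach is validated on artificial data that resemble real-world spatial datasets. The overall prediction error dropped from 7% to 2%, and it becomes smaller as the sample size grows.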
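
The reweighting step can be sketched in one dimension. The example below assumes a target region that is uniform on [0, 1], sample locations that cluster near 0, a toy error that grows with location, and a hand-picked bandwidth. None of these come from the paper, and the hand-written Gaussian KDE is only a stand-in for whatever density estimator the authors use.

```python
import math
import random

def kde(points, x, bandwidth):
    """Gaussian kernel density estimate at x."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

rng = random.Random(0)
# Biased sampling: squaring a uniform draw pushes locations toward 0.
train_x = [rng.random() ** 2 for _ in range(500)]
# Toy per-sample model error that is larger where samples are sparse.
errors = list(train_x)

target_density = 1.0  # uniform target on [0, 1], assumed known
weights = [target_density / kde(train_x, x, bandwidth=0.05) for x in train_x]

naive_error = sum(errors) / len(errors)
# Self-normalized importance-weighted estimate of the target error.
reweighted_error = sum(w * e for w, e in zip(weights, errors)) / sum(weights)
true_target_error = 0.5  # mean of x under the uniform target
```

The naive average underestimates the target error because sparse, high-error locations are under-represented. Dividing by the estimated sampling density moves the estimate toward the true target value.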

Detecting Violations of Differential Privacy for Quantum Algorithms

paper_url: http://arxiv.org/abs/2309.04819
repo_url: None
paper_authors: Ji Guan, Wang Fang, Mingyu Huang, Mingsheng Ying
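for: The paper proposes a formal framework for detecting violations of differential privacy in quantum algorithms.
methods: The detection algorithm uses tensor networks as its data structure and is implemented on the quantum computing platforms TensorFlow Quantum and TorchQuantum. When a violation is reported, it automatically generates bugging information that shows its cause.
results: Experiments confirm the effectiveness and efficiency of the algorithm on almost all types of quantum algorithms already implemented on realistic quantum computers, including quantum supremacy algorithms, quantum machine learning models, quantum approximate optimization algorithms, and variational quantum eigensolvers with up to 21 qubits.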

Abstract
Quantum algorithms for solving a wide range of practical problems have been proposed in the last ten years, such as data search and analysis, product recommendation, and credit scoring. The concern about privacy and other ethical issues in quantum computing naturally rises up. In this paper, we define a formal framework for detecting violations of differential privacy for quantum algorithms. A detection algorithm is developed to verify whether a (noisy) quantum algorithm is differentially private and automatically generate bugging information when the violation of differential privacy is reported. The information consists of a pair of quantum states that violate the privacy, to illustrate the cause of the violation. Our algorithm is equipped with Tensor Networks, a highly efficient data structure, and executed both on TensorFlow Quantum and TorchQuantum which are the quantum extensions of famous machine learning platforms -- TensorFlow and PyTorch, respectively. The effectiveness and efficiency of our algorithm are confirmed by the experimental results of almost all types of quantum algorithms already implemented on realistic quantum computers, including quantum supremacy algorithms (beyond the capability of classical algorithms), quantum machine learning models, quantum approximate optimization algorithms, and variational quantum eigensolvers with up to 21 quantum bits.

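Summary
Over the last ten years, quantum algorithms have been proposed for many practical problems, such as data search and analysis, product recommendation, and credit scoring. As quantum computing develops, concerns about privacy and other ethical issues naturally arise. The paper defines a formal framework for detecting violations of differential privacy in quantum algorithms. A detection algorithm verifies whether a (noisy) quantum algorithm is differentially private and automatically generates information about any violation it reports. This information consists of a pair of quantum states that illustrates the cause of the violation. The algorithm uses tensor networks, a highly efficient data structure, and runs on TensorFlow Quantum and TorchQuantum, the quantum extensions of TensorFlow and PyTorch, respectively. Experimental results confirm the effectiveness and efficiency of the algorithm.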

Neural Latent Geometry Search: Product Manifold Inference via Gromov-Hausdorff-Informed Bayesian Optimization

paper_url: http://arxiv.org/abs/2309.04810
repo_url: None
paper_authors: Haitz Saez de Ocariz Borde, Alvaro Arroyo, Ismael Morales, Ingmar Posner, Xiaowen Dong
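for: Improving the performance of machine learning models by aligning the geometry of the latent space with the underlying data structure.
methods: The paper introduces a new formulation, neural latent geometry search (NLGS). It defines a distance between candidate latent geometries based on the Gromov-Hausdorff distance from metric geometry and uses it to search automatically for the best latent space.
results: Experiments show that NLGS efficiently identifies the optimal latent geometry for multiple machine learning problems.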

Abstract
Recent research indicates that the performance of machine learning models can be improved by aligning the geometry of the latent space with the underlying data structure. Rather than relying solely on Euclidean space, researchers have proposed using hyperbolic and spherical spaces with constant curvature, or combinations thereof, to better model the latent space and enhance model performance. However, little attention has been given to the problem of automatically identifying the optimal latent geometry for the downstream task. We mathematically define this novel formulation and coin it as neural latent geometry search (NLGS). More specifically, we introduce a principled method that searches for a latent geometry composed of a product of constant curvature model spaces with minimal query evaluations. To accomplish this, we propose a novel notion of distance between candidate latent geometries based on the Gromov-Hausdorff distance from metric geometry. In order to compute the Gromov-Hausdorff distance, we introduce a mapping function that enables the comparison of different manifolds by embedding them in a common high-dimensional ambient space. Finally, we design a graph search space based on the calculated distances between candidate manifolds and use Bayesian optimization to search for the optimal latent geometry in a query-efficient manner. This is a general method which can be applied to search for the optimal latent geometry for a variety of models and downstream tasks. Extensive experiments on synthetic and real-world datasets confirm the efficacy of our method in identifying the optimal latent geometry for multiple machine learning problems.

Summary
We propose a novel approach called neural latent geometry search (NLGS) to automatically identify the optimal latent geometry for a downstream task. NLGS is a principled method that searches for a latent geometry composed of a product of constant curvature model spaces with minimal query evaluations. To accomplish this, we introduce a new notion of distance between candidate latent geometries based on the Gromov-Hausdorff distance from metric geometry. This distance measure allows us to compare different manifolds by embedding them in a common high-dimensional ambient space. We then design a graph search space based on the calculated distances between candidate manifolds and use Bayesian optimization to search for the optimal latent geometry in a query-efficient manner. This method is general and can be applied to search for the optimal latent geometry for a variety of models and downstream tasks. Extensive experiments on synthetic and real-world datasets confirm the effectiveness of our method in identifying the optimal latent geometry for multiple machine learning problems.
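
A candidate latent geometry here is a product of constant curvature model spaces. The sketch below shows how distances on such a product can be computed, using the standard geodesic formulas for Euclidean space, the unit sphere (curvature +1), and the Poincare ball (curvature -1), combined by the usual l2 rule for product manifolds. The signature encoding ("E", "S", "H") and the function names are assumptions, not the paper's API.

```python
import math

def _sq_norm(v):
    return sum(c * c for c in v)

def euclidean_distance(x, y):
    return math.dist(x, y)

def spherical_distance(x, y):
    """Geodesic distance on the unit sphere; x and y must be unit vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.acos(max(-1.0, min(1.0, dot)))

def hyperbolic_distance(x, y):
    """Geodesic distance in the Poincare ball model; requires |x| < 1 and |y| < 1."""
    diff = _sq_norm([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - _sq_norm(x)) * (1.0 - _sq_norm(y))))

COMPONENT_DISTANCES = {
    "E": euclidean_distance,
    "S": spherical_distance,
    "H": hyperbolic_distance,
}

def product_distance(signature, x_parts, y_parts):
    """Distance on a product manifold: l2 combination of the component distances."""
    return math.sqrt(sum(
        COMPONENT_DISTANCES[kind](x, y) ** 2
        for kind, x, y in zip(signature, x_parts, y_parts)
    ))

# One point pair on the product E^2 x S^1 x H^2.
d = product_distance(
    "ESH",
    [(0.0, 0.0), (1.0, 0.0), (0.0, 0.0)],
    [(3.0, 4.0), (0.0, 1.0), (0.5, 0.0)],
)
```

In this example the three component distances are 5, pi/2, and ln 3, so d is the square root of the sum of their squares.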

Stochastic Gradient Descent outperforms Gradient Descent in recovering a high-dimensional signal in a glassy energy landscape

paper_url: http://arxiv.org/abs/2309.04788
repo_url: None
paper_authors: Persia Jana Kamali, Pierfrancesco Urbani
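for: The paper studies how crucial stochastic gradient descent (SGD) is for training artificial neural networks and how well it optimizes high-dimensional non-convex cost functions.
methods: It uses dynamical mean field theory to analyze the performance of SGD in the high-dimensional limit.
results: SGD largely outperforms gradient descent (GD) on a high-dimensional non-convex signal recovery problem. A power-law fit of the relaxation time shows that the recovery threshold of SGD with small batch size is smaller than that of GD.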

Abstract
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used extensively to train artificial neural networks. However, very little is known about the extent to which SGD is crucial to the success of this technology and, in particular, how effective it is in optimizing high-dimensional non-convex cost functions as compared to other optimization algorithms such as Gradient Descent (GD). In this work we leverage dynamical mean field theory to analyze exactly its performance in the high-dimensional limit. We consider the problem of recovering a hidden high-dimensional non-linearly encrypted signal, a prototype high-dimensional non-convex hard optimization problem. We compare the performance of SGD to GD and we show that SGD largely outperforms GD. In particular, a power law fit of the relaxation time of these algorithms shows that the recovery threshold for SGD with small batch size is smaller than the corresponding one of GD.

RRCNN$^{+}$: An Enhanced Residual Recursive Convolutional Neural Network for Non-stationary Signal Decomposition

paper_url: http://arxiv.org/abs/2309.04782
repo_url: https://github.com/zhoudafa08/RRCNN_plus
paper_authors: Feng Zhou, Antonio Cicone, Haomin Zhou
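for: The paper addresses the challenges of time-frequency analysis for nonlinear and non-stationary signals.
methods: It builds on the residual recursive convolutional neural network (RRCNN), a deep learning method in the line of empirical mode decomposition, and improves it with techniques from deep learning and optimization.
results: According to the abstract, RRCNN achieves more stable decomposition than existing methods while batch processing large-scale signals at low computational cost. RRCNN+ aims to overcome some of its limitations.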

Abstract
Time-frequency analysis is an important and challenging task in many applications. Fourier and wavelet analysis are two classic methods that have achieved remarkable success in many fields. They also exhibit limitations when applied to nonlinear and non-stationary signals. To address this challenge, a series of nonlinear and adaptive methods, pioneered by the empirical mode decomposition method have been proposed. Their aim is to decompose a non-stationary signal into quasi-stationary components which reveal better features in the time-frequency analysis. Recently, inspired by deep learning, we proposed a novel method called residual recursive convolutional neural network (RRCNN). Not only RRCNN can achieve more stable decomposition than existing methods while batch processing large-scale signals with low computational cost, but also deep learning provides a unique perspective for non-stationary signal decomposition. In this study, we aim to further improve RRCNN with the help of several nimble techniques from deep learning and optimization to ameliorate the method and overcome some of the limitations of this technique.

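Summary
Time-frequency analysis is an important and challenging task in many applications. Fourier and wavelet analysis are two classic methods that have achieved great success in many fields, but they show limitations on nonlinear and non-stationary signals. To address this, a series of nonlinear and adaptive methods, pioneered by the empirical mode decomposition method, have been proposed. Their aim is to decompose a non-stationary signal into quasi-stationary components that reveal better features in time-frequency analysis. Recently, inspired by deep learning, the authors proposed the residual recursive convolutional neural network (RRCNN). RRCNN achieves more stable decomposition than existing methods while batch processing large-scale signals at low computational cost, and deep learning offers a unique perspective on non-stationary signal decomposition. This study further improves RRCNN with techniques from deep learning and optimization to overcome some of its limitations.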

A Comprehensive Survey on Deep Learning Techniques in Educational Data Mining

paper_url: http://arxiv.org/abs/2309.04761
repo_url: None
paper_authors: Yuanguo Lin, Hong Chen, Wei Xia, Fan Lin, Pengcheng Wu, Zongyue Wang, Yong Li
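for: This survey systematically reviews the state of the art in Educational Data Mining (EDM) with Deep Learning.
methods: It reviews Deep Learning techniques for analyzing and modeling educational data in four typical educational scenarios: knowledge tracing, undesirable student detecting, performance prediction, and personalized recommendation.
results: It gives a comprehensive overview of public datasets and processing tools for EDM and points out emerging trends and future directions in the field.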

Abstract
Educational Data Mining (EDM) has emerged as a vital field of research, which harnesses the power of computational techniques to analyze educational data. With the increasing complexity and diversity of educational data, Deep Learning techniques have shown significant advantages in addressing the challenges associated with analyzing and modeling this data. This survey aims to systematically review the state-of-the-art in EDM with Deep Learning. We begin by providing a brief introduction to EDM and Deep Learning, highlighting their relevance in the context of modern education. Next, we present a detailed review of Deep Learning techniques applied in four typical educational scenarios, including knowledge tracing, undesirable student detecting, performance prediction, and personalized recommendation. Furthermore, a comprehensive overview of public datasets and processing tools for EDM is provided. Finally, we point out emerging trends and future directions in this research area.

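Summary
Educational Data Mining (EDM) has become an important field of research that uses computational techniques to analyze educational data. As educational data grow in complexity and diversity, Deep Learning techniques have shown significant advantages in analyzing and modeling them. This survey systematically reviews the state of the art in EDM with Deep Learning. It first gives a brief introduction to EDM and Deep Learning and highlights their relevance to modern education. It then reviews Deep Learning techniques applied in four typical educational scenarios: knowledge tracing, undesirable student detecting, performance prediction, and personalized recommendation. It also provides a comprehensive overview of public datasets and processing tools for EDM. Finally, it points out emerging trends and future directions in this research area.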

Gromov-Hausdorff Distances for Comparing Product Manifolds of Model Spaces

paper_url: http://arxiv.org/abs/2309.05678
repo_url: None
paper_authors: Haitz Saez de Ocariz Borde, Alvaro Arroyo, Ismael Morales, Ingmar Posner, Xiaowen Dong
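for: Improving the performance of machine learning models by aligning the geometric characteristics of the latent space with the underlying data structure.
methods: The approach uses non-Euclidean spaces of constant curvature (spherical and hyperbolic) or their combinations (known as product manifolds), and a graph search space to search for the optimal latent geometry.
results: The paper proposes a new notion of distance between candidate latent geometries, based on the Gromov-Hausdorff distance from metric geometry, and describes an algorithm for computing it together with its computational implementation.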

Abstract
Recent studies propose enhancing machine learning models by aligning the geometric characteristics of the latent space with the underlying data structure. Instead of relying solely on Euclidean space, researchers have suggested using hyperbolic and spherical spaces with constant curvature, or their combinations (known as product manifolds), to improve model performance. However, there exists no principled technique to determine the best latent product manifold signature, which refers to the choice and dimensionality of manifold components. To address this, we introduce a novel notion of distance between candidate latent geometries using the Gromov-Hausdorff distance from metric geometry. We propose using a graph search space that uses the estimated Gromov-Hausdorff distances to search for the optimal latent geometry. In this work we focus on providing a description of an algorithm to compute the Gromov-Hausdorff distance between model spaces and its computational implementation.

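Summary
Recent studies propose improving machine learning models by aligning the geometric characteristics of the latent space with the underlying data structure. Rather than relying solely on Euclidean space, researchers have suggested using hyperbolic and spherical spaces of constant curvature, or their combinations (product manifolds), to improve model performance. However, there is no principled technique for determining the best latent product manifold signature, that is, the choice and dimensionality of the manifold components. To address this, the paper introduces a novel notion of distance between candidate latent geometries based on the Gromov-Hausdorff distance from metric geometry, and proposes a graph search space that uses the estimated Gromov-Hausdorff distances to search for the optimal latent geometry. The focus of the paper is an algorithm for computing the Gromov-Hausdorff distance between model spaces and its computational implementation.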
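
The Gromov-Hausdorff distance is the infimum of Hausdorff distances over all isometric embeddings of two metric spaces into a common space. The sketch below computes only the inner ingredient, the Hausdorff distance between two finite point samples in one fixed ambient space, which upper-bounds the Gromov-Hausdorff distance between the two samples equipped with their ambient metrics. It is not the paper's algorithm. The circle samples are a hypothetical example.

```python
import math

def directed_hausdorff(A, B):
    """Largest distance from a point of A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    # Symmetric Hausdorff distance between two finite point sets.
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

def circle_samples(radius, n=64):
    # n evenly spaced points on a circle of the given radius in R^2.
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]

inner = circle_samples(1.0)
outer = circle_samples(2.0)
distance = hausdorff(inner, outer)
```

For two concentric circles sampled at the same angles, every point's nearest neighbor on the other circle lies at the same angle, so the distance equals the difference of the radii.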

Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in ReLU Networks

paper_url: http://arxiv.org/abs/2309.04742
repo_url: None
paper_authors: Diksha Bhandari, Jakiw Pidstrigach, Sebastian Reich
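for: Bayesian inference for logistic regression using extensions of the ensemble Kalman filter.
methods: Two interacting particle systems sample from an approximate posterior, and quantitative convergence rates to their mean-field limit are proved as the number of particles tends to infinity.
results: The techniques are applied to ReLU networks, and their effectiveness as Bayesian approximation methods for quantifying predictive uncertainty is examined.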

Abstract
We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. Two interacting particle systems are proposed that sample from an approximate posterior and prove quantitative convergence rates of these interacting particle systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in ReLU networks.

Summary
We consider Bayesian inference for logistic regression using extensions of the ensemble Kalman filter. We propose two interacting particle systems that sample from an approximate posterior and prove quantitative convergence rates of these systems to their mean-field limit as the number of particles tends to infinity. We also apply these techniques and examine their effectiveness for quantifying predictive uncertainty in ReLU networks.

Training of Spiking Neural Network joint Curriculum Learning Strategy

paper_url: http://arxiv.org/abs/2309.04737
repo_url: None
paper_authors: Lingling Tang, Jielei Chu, Zhiguo Gong, Tianrui Li
for: The paper aims to enhance the biological plausibility of Spiking Neural Networks (SNNs) by introducing Curriculum Learning (CL) into SNNs.
methods: The proposed CL-SNN model uses a confidence-aware loss to measure and process samples with different difficulty levels, allowing the model to learn more like humans and with higher biological interpretability.
results: The authors conducted experiments on static image datasets (MNIST, Fashion-MNIST, CIFAR10) and neuromorphic datasets (N-MNIST, CIFAR10-DVS, DVS-Gesture), and report promising results. To the best of the authors' knowledge, this is the first proposal to enhance the biological plausibility of SNNs by introducing CL.

Abstract
Starting with small and simple concepts, and gradually introducing complex and difficult concepts is the natural process of human learning. Spiking Neural Networks (SNNs) aim to mimic the way humans process information, but current SNNs models treat all samples equally, which does not align with the principles of human learning and overlooks the biological plausibility of SNNs. To address this, we propose a CL-SNN model that introduces Curriculum Learning(CL) into SNNs, making SNNs learn more like humans and providing higher biological interpretability. CL is a training strategy that advocates presenting easier data to models before gradually introducing more challenging data, mimicking the human learning process. We use a confidence-aware loss to measure and process the samples with different difficulty levels. By learning the confidence of different samples, the model reduces the contribution of difficult samples to parameter optimization automatically. We conducted experiments on static image datasets MNIST, Fashion-MNIST, CIFAR10, and neuromorphic datasets N-MNIST, CIFAR10-DVS, DVS-Gesture. The results are promising. To our best knowledge, this is the first proposal to enhance the biologically plausibility of SNNs by introducing CL.

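Summary
The natural process of human learning starts with small and simple concepts and gradually introduces complex and difficult ones. Spiking Neural Networks (SNNs) aim to mimic the way humans process information, but current SNN models treat all samples equally, which does not match the principles of human learning and overlooks the biological plausibility of SNNs. To address this, the paper proposes the CL-SNN model, which introduces Curriculum Learning (CL) into SNNs so that they learn more like humans and offer higher biological interpretability. CL is a training strategy that presents easier data to the model first and then gradually introduces more challenging data, similar to the human learning process. A confidence-aware loss is used to measure and process samples of different difficulty levels. By learning the confidence of different samples, the model automatically reduces the contribution of difficult samples to parameter optimization. Experiments were run on the static image datasets MNIST, Fashion-MNIST, and CIFAR10 and the neuromorphic datasets N-MNIST, CIFAR10-DVS, and DVS-Gesture, with promising results. To the authors' best knowledge, this is the first proposal to enhance the biological plausibility of SNNs by introducing CL.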

MultiCaM-Vis: Visual Exploration of Multi-Classification Model with High Number of Classes

paper_url: http://arxiv.org/abs/2309.05676
repo_url: None
paper_authors: Syed Ahsan Ali Dilawer, Shah Rukh Humayoun
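for: The paper aims to help machine learning experts use visual analysis to identify the root cause of problems that occur during the learning phase, such as misclassification.
methods: It presents an interactive visual analytics tool, MultiCaM-Vis, which provides overview+detail style parallel coordinate views and a chord diagram for exploring and inspecting class-level misclassification of instances.
results: The paper also reports a preliminary user study with 12 participants.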

Abstract
Visual exploration of multi-classification models with a large number of classes would help machine learning experts in identifying the root cause of a problem that occurs during the learning phase, such as misclassification of instances. Most of the previous visual analytics solutions targeted only a few classes. In this paper, we present our interactive visual analytics tool, called MultiCaM-Vis, that provides overview+detail style parallel coordinate views and a Chord diagram for exploration and inspection of class-level misclassification of instances. We also present results of a preliminary user study with 12 participants.
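
The class-level view described above needs, at minimum, a count of how many instances of each true class were predicted as each other class. The sketch below computes those directed counts, which is the kind of data a chord diagram of misclassifications would plot. It is an illustration with made-up labels, not code from MultiCaM-Vis.

```python
from collections import Counter

def misclassification_flows(y_true, y_pred):
    """Count (true_class, predicted_class) pairs for misclassified instances only."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)

# Hypothetical labels for seven instances.
y_true = ["cat", "cat", "dog", "dog", "fox", "fox", "fox"]
y_pred = ["cat", "dog", "dog", "cat", "dog", "fox", "dog"]

flows = misclassification_flows(y_true, y_pred)
# The heaviest flow points at the class pair most worth inspecting first.
worst_pair, worst_count = flows.most_common(1)[0]
```

Each key is a directed pair, so ("fox", "dog") and ("dog", "fox") are counted separately. With many classes, sorting the flows by count gives a natural overview before drilling into individual instances.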

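Summary
Visual exploration of multi-classification models with a large number of classes would help machine learning experts identify the root cause of problems that occur during the learning phase, such as misclassification of instances. Most previous visual analytics solutions targeted only a few classes. This paper presents an interactive visual analytics tool, MultiCaM-Vis, which provides overview+detail style parallel coordinate views and a chord diagram for exploring and inspecting class-level misclassification of instances. Results of a preliminary user study with 12 participants are also reported.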

Weak-PDE-LEARN: A Weak Form Based Approach to Discovering PDEs From Noisy, Limited Data

paper_url: http://arxiv.org/abs/2309.04699
repo_url: https://github.com/punkduckable/weak_pde_learn
paper_authors: Robert Stephany, Christopher Earls
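for: Identifying non-linear partial differential equations (PDEs) directly from noisy, limited measurements of their solutions.
methods: An adaptive loss function based on weak forms trains a neural network to approximate the PDE solution while simultaneously identifying the governing PDE.
results: The algorithm is robust to noise and discovers a range of benchmark PDEs directly from noisy, limited data.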

Abstract
We introduce Weak-PDE-LEARN, a Partial Differential Equation (PDE) discovery algorithm that can identify non-linear PDEs from noisy, limited measurements of their solutions. Weak-PDE-LEARN uses an adaptive loss function based on weak forms to train a neural network, $U$, to approximate the PDE solution while simultaneously identifying the governing PDE. This approach yields an algorithm that is robust to noise and can discover a range of PDEs directly from noisy, limited measurements of their solutions. We demonstrate the efficacy of Weak-PDE-LEARN by learning several benchmark PDEs.

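Summary
We introduce Weak-PDE-LEARN, a partial differential equation (PDE) discovery algorithm that can identify non-linear PDEs from noisy, limited measurements of their solutions. Weak-PDE-LEARN uses an adaptive loss function based on weak forms to train a neural network, $U$, to approximate the PDE solution while simultaneously identifying the governing PDE. The resulting algorithm is robust to noise and can discover PDEs directly from noisy, limited measurements of their solutions. We demonstrate its efficacy by learning several benchmark PDEs.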
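
The appeal of a weak form is that derivatives are moved from the (possibly noisy) solution onto a smooth test function through integration by parts: if phi vanishes at the boundary, the integral of u' * phi equals minus the integral of u * phi'. The sketch below checks this identity numerically in one dimension. The choices u = sin, phi(x) = (1 - x^2)^2 on [-1, 1], and the trapezoid rule are assumptions for illustration and are not taken from the paper.

```python
import math

def trapezoid(f, a, b, n=2000):
    # Composite trapezoid rule with n equal subintervals.
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

def phi(x):
    # Test function that vanishes at x = -1 and x = 1.
    return (1.0 - x * x) ** 2

def dphi(x):
    # Exact derivative of the test function.
    return -4.0 * x * (1.0 - x * x)

u, du = math.sin, math.cos

# Strong side needs the derivative of u.
strong_side = trapezoid(lambda x: du(x) * phi(x), -1.0, 1.0)
# Weak side needs only u itself; the derivative sits on the known test function.
weak_side = -trapezoid(lambda x: u(x) * dphi(x), -1.0, 1.0)
```

The two sides agree up to quadrature error, so a loss built from the weak side never has to differentiate measured data.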

Redundancy-Free Self-Supervised Relational Learning for Graph Clustering

paper_url: http://arxiv.org/abs/2309.04694
repo_url: https://github.com/yisiyu95/r2fgc
paper_authors: Si-Yu Yi, Wei Ju, Yifang Qin, Xiao Luo, Luchen Liu, Yong-Dao Zhou, Ming Zhang
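for: The paper proposes a self-supervised deep graph clustering method, based on an autoencoder and a graph autoencoder, that better extracts and exploits the semantic information in graph-structured data.
methods: The method, Relational Redundancy-Free Graph Clustering (R$^2$FGC), extracts attribute- and structure-level relational information from both global and local views. It preserves the consistent relation among augmented nodes and reduces the redundant relation to learn discriminative embeddings. It also uses a simple yet effective strategy to alleviate the over-smoothing issue.
results: On widely used benchmark datasets, R$^2$FGC outperforms state-of-the-art baselines and makes better use of the semantic information in graph-structured data.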

Abstract
Graph clustering, which learns the node representations for effective cluster assignments, is a fundamental yet challenging task in data analysis and has received considerable attention accompanied by graph neural networks in recent years. However, most existing methods overlook the inherent relational information among the non-independent and non-identically distributed nodes in a graph. Due to the lack of exploration of relational attributes, the semantic information of the graph-structured data fails to be fully exploited which leads to poor clustering performance. In this paper, we propose a novel self-supervised deep graph clustering method named Relational Redundancy-Free Graph Clustering (R$^2$FGC) to tackle the problem. It extracts the attribute- and structure-level relational information from both global and local views based on an autoencoder and a graph autoencoder. To obtain effective representations of the semantic information, we preserve the consistent relation among augmented nodes, whereas the redundant relation is further reduced for learning discriminative embeddings. In addition, a simple yet valid strategy is utilized to alleviate the over-smoothing issue. Extensive experiments are performed on widely used benchmark datasets to validate the superiority of our R$^2$FGC over state-of-the-art baselines. Our codes are available at https://github.com/yisiyu95/R2FGC.

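Summary
Graph clustering, which learns node representations for effective cluster assignment, is a fundamental yet challenging task in data analysis and has received wide attention in recent years along with graph neural networks. However, most existing methods overlook the inherent relational information among the nodes of a graph, so the semantic information in graph-structured data is not fully exploited and clustering performance suffers. This paper proposes a new self-supervised deep graph clustering method, Relational Redundancy-Free Graph Clustering (R$^2$FGC), to address the problem. R$^2$FGC uses an autoencoder and a graph autoencoder to extract attribute- and structure-level relational information from both global and local views. To obtain effective representations of the semantic information, it preserves the consistent relation among augmented nodes, while the redundant relation is further reduced to learn discriminative embeddings. A simple yet valid strategy is used to alleviate the over-smoothing issue. Extensive experiments on widely used benchmark datasets validate the superiority of R$^2$FGC. The code is available at https://github.com/yisiyu95/R2FGC.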

Compact: Approximating Complex Activation Functions for Secure Computation

paper_url: http://arxiv.org/abs/2309.04664
repo_url: None
paper_authors: Mazharul Islam, Sunpreet S. Arora, Rahul Chatterjee, Peter Rindal, Maliheh Shirvanian
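for: Providing privacy-preserving queries to deep neural network (DNN) models hosted on a public cloud.
methods: Compact produces piece-wise polynomial approximations of complex activation functions (AFs) so that they can be used efficiently with state-of-the-art secure multi-party computation (MPC) techniques.
results: Compact imposes no restriction on model training. In an extensive evaluation on four machine-learning tasks, it incurs negligible accuracy loss compared to DNN-specific approaches for handling complex non-linear AFs, and it provides a 2x-5x computation speedup over the state-of-the-art approximation approach for non-linear functions.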

Abstract
Secure multi-party computation (MPC) techniques can be used to provide data privacy when users query deep neural network (DNN) models hosted on a public cloud. State-of-the-art MPC techniques can be directly leveraged for DNN models that use simple activation functions (AFs) such as ReLU. However, DNN model architectures designed for cutting-edge applications often use complex and highly non-linear AFs. Designing efficient MPC techniques for such complex AFs is an open problem. Towards this, we propose Compact, which produces piece-wise polynomial approximations of complex AFs to enable their efficient use with state-of-the-art MPC techniques. Compact neither requires nor imposes any restriction on model training and results in near-identical model accuracy. We extensively evaluate Compact on four different machine-learning tasks with DNN architectures that use the popular complex AFs SiLU, GeLU, and Mish. Our experimental results show that Compact incurs negligible accuracy loss compared to DNN-specific approaches for handling complex non-linear AFs. We also incorporate Compact in two state-of-the-art MPC libraries for privacy-preserving inference and demonstrate that Compact provides 2x-5x speedup in computation compared to the state-of-the-art approximation approach for non-linear functions, while providing similar or better accuracy for DNN models with a large number of hidden layers.

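Summary
Secure multi-party computation (MPC) techniques can protect data privacy when users query deep neural network (DNN) models hosted on a public cloud. State-of-the-art MPC techniques apply directly to DNN models that use simple activation functions (AFs) such as ReLU. DNN architectures designed for cutting-edge applications, however, often use complex and highly non-linear AFs, and designing efficient MPC techniques for them is an open problem. The paper proposes Compact, which produces piece-wise polynomial approximations of complex AFs so that they can be used efficiently with state-of-the-art MPC techniques. Compact neither requires nor imposes any restriction on model training and yields near-identical model accuracy. It is evaluated extensively on four machine-learning tasks with DNN architectures that use SiLU, GeLU, and Mish, where it incurs negligible accuracy loss compared to DNN-specific approaches for handling complex non-linear AFs. Integrated into two state-of-the-art MPC libraries, Compact provides a 2x-5x computation speedup over the state-of-the-art approximation approach for non-linear functions, with similar or better accuracy for DNN models with a large number of hidden layers.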
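
The idea of replacing a complex activation with piece-wise polynomials can be illustrated with a minimal NumPy sketch. This is not Compact's actual procedure: the fitting range [-8, 8], the eight equal-width pieces, the cubic degree, the constant/identity tails, and the helper names `fit_piecewise` and `eval_piecewise` are all assumptions made for illustration. The appeal for MPC is that a polynomial needs only additions and multiplications, plus comparisons to select the piece.

```python
import numpy as np


def silu(x):
    """Reference SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def fit_piecewise(f, lo, hi, pieces, degree):
    """Least-squares fit of one polynomial per equal-width interval of [lo, hi]."""
    edges = np.linspace(lo, hi, pieces + 1)
    coeffs = []
    for a, b in zip(edges[:-1], edges[1:]):
        xs = np.linspace(a, b, 200)  # sample points inside this piece
        coeffs.append(np.polyfit(xs, f(xs), degree))
    return edges, coeffs


def eval_piecewise(x, edges, coeffs):
    """Evaluate the piece-wise approximation of SiLU.

    Outside [edges[0], edges[-1]] the tails are assumed to be 0 (far left)
    and x (far right), which SiLU approaches asymptotically.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.where(x < edges[0], 0.0, x)
    # Index of the interval that contains each input.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(coeffs) - 1)
    inside = (x >= edges[0]) & (x <= edges[-1])
    for k, c in enumerate(coeffs):
        mask = inside & (idx == k)
        out[mask] = np.polyval(c, x[mask])
    return out


edges, coeffs = fit_piecewise(silu, -8.0, 8.0, pieces=8, degree=3)
grid = np.linspace(-10.0, 10.0, 2001)
max_err = np.max(np.abs(eval_piecewise(grid, edges, coeffs) - silu(grid)))
print(f"max abs error on [-10, 10]: {max_err:.4f}")
```

More pieces or a higher degree lower the approximation error but add comparisons and multiplications, which is the accuracy/cost trade-off such an approach has to balance.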

Intelligent upper-limb exoskeleton using deep learning to predict human intention for sensory-feedback augmentation

paper_url: http://arxiv.org/abs/2309.04655
repo_url: None
paper_authors: Jinwoo Lee, Kangkyu Kwon, Ira Soltis, Jared Matthews, Yoonjae Lee, Hojoong Kim, Lissette Romero, Nathan Zavanelli, Youngjin Kwon, Shinjae Kwon, Jimin Lee, Yewon Na, Sung Hoon Lee, Ki Jun Yu, Minoru Shinohara, Frank L. Hammond, Woon-Hong Yeo
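for: This study aims to develop an intelligent upper-limb exoskeleton system, based on cloud computing and sensory feedback, that augments human upper-limb strength.
methods: The system uses cloud-based deep learning to predict the user's intended movement, and embedded soft wearable sensors collect real-time muscle signals to provide sensory feedback.
results: The system predicts four upper-limb joint motions with an average accuracy of 96.2% at a response time of 200-250 milliseconds and augments human strength by 5.15 times on average.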

Abstract
The age- and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily human tasks using the upper extremities. Although there are a few examples of exoskeletons, they need manual operations due to the absence of sensor feedback and no intention prediction of movements. Here, we introduce an intelligent upper-limb exoskeleton system that uses cloud-based deep learning to predict human intention for strength augmentation. The embedded soft wearable sensors provide sensory feedback by collecting real-time muscle signals, which are simultaneously computed to determine the user's intended movement. The cloud-based deep learning predicts four upper-limb joint motions with an average accuracy of 96.2% at a 200-250 millisecond response time, suggesting that the exoskeleton operates just by human intention. In addition, an array of soft pneumatics assists the intended movements by providing 897 newtons of force and 78.7 millimeters of displacement at maximum. Collectively, the intent-driven exoskeleton can augment human strength by 5.15 times on average compared to the unassisted exoskeleton. This report demonstrates an exoskeleton robot that augments the upper-limb joint movements by human intention based on machine-learning cloud computing and sensory feedback.

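Summary
Age- and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily tasks with the upper extremities. The few existing exoskeletons need manual operation because they lack sensor feedback and cannot predict intended movements. This work introduces an intelligent upper-limb exoskeleton system that uses cloud-based deep learning to predict human intention for strength augmentation. Embedded soft wearable sensors provide sensory feedback by collecting real-time muscle signals, which are processed at the same time to determine the user's intended movement. The cloud-based deep learning predicts four upper-limb joint motions with an average accuracy of 96.2% at a response time of 200-250 milliseconds, suggesting that the exoskeleton operates on human intention alone. An array of soft pneumatics assists the intended movements with up to 897 newtons of force and 78.7 millimeters of displacement. Overall, the intent-driven exoskeleton augments human strength by 5.15 times on average compared to the unassisted exoskeleton. The report demonstrates an exoskeleton robot that augments upper-limb joint movements by human intention, based on machine-learning cloud computing and sensory feedback.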

Towards Understanding Neural Collapse: The Effects of Batch Normalization and Weight Decay

paper_url: http://arxiv.org/abs/2309.04644
repo_url: None
paper_authors: Leyan Pan, Xinyuan Cao
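for: This paper studies how batch normalization and weight decay affect the emergence of Neural Collapse in the final layer of neural network classifiers.
methods: The paper proposes a geometrically intuitive intra-class and inter-class cosine similarity measure that captures multiple core aspects of Neural Collapse, and gives theoretical guarantees of Neural Collapse with last-layer batch normalization and weight decay when the regularized cross-entropy loss is near optimal.
results: Experiments show that Neural Collapse is most significant in models with batch normalization and high weight-decay values.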

Abstract
Neural Collapse is a recently observed geometric structure that emerges in the final layer of neural network classifiers. Specifically, Neural Collapse states that at the terminal phase of neural network training, 1) the intra-class variability of last-layer features tends to zero, 2) the class feature means form an Equiangular Tight Frame (ETF), 3) last-layer class features and weights become equal up to scaling, and 4) classification behavior collapses to the nearest class center (NCC) decision rule. This paper investigates the effect of batch normalization and weight decay on the emergence of Neural Collapse. We propose the geometrically intuitive intra-class and inter-class cosine similarity measure which captures multiple core aspects of Neural Collapse. With this measure, we provide theoretical guarantees of Neural Collapse emergence with last-layer batch normalization and weight decay when the regularized cross-entropy loss is near optimal. We also perform further experiments to show that Neural Collapse is most significant in models with batch normalization and high weight-decay values. Collectively, our results imply that batch normalization and weight decay may be fundamental factors in the emergence of Neural Collapse.

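Summary
Neural Collapse is a recently observed geometric structure that emerges in the final layer of neural network classifiers. Specifically, at the terminal phase of training, 1) the intra-class variability of last-layer features tends to zero, 2) the class feature means form an Equiangular Tight Frame (ETF), 3) last-layer class features and weights become equal up to scaling, and 4) classification behavior collapses to the nearest class center (NCC) decision rule. This paper studies the effect of batch normalization and weight decay on the emergence of Neural Collapse. It proposes a geometrically intuitive intra-class and inter-class cosine similarity measure that captures multiple core aspects of Neural Collapse. With this measure, it gives theoretical guarantees of Neural Collapse with last-layer batch normalization and weight decay when the regularized cross-entropy loss is near optimal. Further experiments show that Neural Collapse is most significant in models with batch normalization and high weight-decay values. Together, the results suggest that batch normalization and weight decay may be fundamental factors in the emergence of Neural Collapse.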
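
An intra-class and inter-class cosine similarity measure of the kind described above can be sketched in a few lines of NumPy. This is one plausible reading, not the paper's exact definition: the function name `cosine_similarity_stats` and the choice to center features by the global mean are assumptions. Under ideal collapse with a simplex ETF over C classes, same-class features have cosine similarity 1 and features of different classes have cosine similarity -1/(C-1).

```python
import numpy as np


def cosine_similarity_stats(features, labels):
    """Average pairwise cosine similarity within and across classes.

    features: array of shape (num_samples, dim), last-layer features.
    labels: array of shape (num_samples,), integer class labels.
    Centering by the global mean is an assumption of this sketch.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    centered = features - features.mean(axis=0, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    sims = unit @ unit.T  # all pairwise cosine similarities
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # exclude self-pairs
    intra = sims[same & off_diag].mean()
    inter = sims[~same].mean()
    return intra, inter


# Perfectly collapsed toy features: 3 classes whose means form a simplex ETF,
# with 4 identical samples per class.
num_classes, per_class = 3, 4
means = np.eye(num_classes) - 1.0 / num_classes
feats = np.repeat(means, per_class, axis=0)
labs = np.repeat(np.arange(num_classes), per_class)
intra, inter = cosine_similarity_stats(feats, labs)
print(intra, inter)  # close to 1.0 and -0.5 for this collapsed example
```

On features from a trained network, values of `intra` near 1 and `inter` near -1/(C-1) would indicate that the representation is close to the collapsed geometry.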