cs.LG - 2023-09-14

How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent

  • paper_url: http://arxiv.org/abs/2309.08044
  • repo_url: None
  • paper_authors: Mike Nguyen, Nicole Mücke
  • for: Analyzing the generalization properties of two-layer neural networks trained with gradient descent (GD) in the neural tangent kernel (NTK) regime.
  • methods: Training with gradient descent (GD); for early-stopped GD, fast rates of convergence are derived that are known to be minimax optimal in the framework of non-parametric regression in reproducing kernel Hilbert spaces.
  • results: The number of hidden neurons required for generalization is tracked precisely, improving over existing results; the weights are shown to remain in a vicinity of initialization during training, with a radius depending on structural assumptions such as the smoothness of the regression function and the eigenvalue decay of the integral operator associated to the NTK.
    Abstract We analyze the generalization properties of two-layer neural networks in the neural tangent kernel (NTK) regime, trained with gradient descent (GD). For early stopped GD we derive fast rates of convergence that are known to be minimax optimal in the framework of non-parametric regression in reproducing kernel Hilbert spaces. On our way, we precisely keep track of the number of hidden neurons required for generalization and improve over existing results. We further show that the weights during training remain in a vicinity around initialization, the radius being dependent on structural assumptions such as degree of smoothness of the regression function and eigenvalue decay of the integral operator associated to the NTK.
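
A minimal sketch of the setting (not the paper's experiments): early-stopped full-batch gradient descent on a wide two-layer network with a fixed second layer. The width, step size, target function, and stopping rule below are illustrative assumptions.

```python
# Minimal sketch of early-stopped full-batch GD on a wide two-layer network
# (NTK-style setting). Width, step size, target, and stopping rule are
# illustrative assumptions, not the paper's configuration.
import torch

torch.manual_seed(0)
n, d, width = 200, 5, 2048                   # samples, input dimension, hidden neurons
X = torch.randn(n, d)
y = torch.sin(X.sum(dim=1, keepdim=True))    # smooth regression target
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

W = torch.randn(d, width) / d ** 0.5         # trained first layer
W.requires_grad_(True)
a = torch.randn(width, 1) / width ** 0.5     # fixed second layer (NTK-style scaling)

def f(inputs):
    return torch.relu(inputs @ W) @ a

opt = torch.optim.SGD([W], lr=0.1)           # full-batch gradient descent
best_val, patience, bad = float("inf"), 50, 0
for step in range(10000):
    opt.zero_grad()
    torch.mean((f(X_tr) - y_tr) ** 2).backward()
    opt.step()
    with torch.no_grad():
        val = torch.mean((f(X_val) - y_val) ** 2).item()
    if val < best_val - 1e-6:
        best_val, bad = val, 0
    else:
        bad += 1
        if bad >= patience:                  # early stopping acts as regularization
            break
print(f"stopped at step {step}, validation MSE {best_val:.4f}")
```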

On Prediction Feature Assignment in the Heckman Selection Model

  • paper_url: http://arxiv.org/abs/2309.08043
  • repo_url: None
  • paper_authors: Huy Mai, Xintao Wu
  • for: Handle missing-not-at-random (MNAR) sample selection bias in prediction models.
  • methods: Heckman selection model and its variants, with a novel data-driven framework called Heckman-FA to obtain prediction features.
  • results: A robust regression model under MNAR sample selection bias, demonstrated by experimental results on real-world datasets.
    Abstract Under missing-not-at-random (MNAR) sample selection bias, the performance of a prediction model is often degraded. This paper focuses on one classic instance of MNAR sample selection bias where a subset of samples have non-randomly missing outcomes. The Heckman selection model and its variants have commonly been used to handle this type of sample selection bias. The Heckman model uses two separate equations to model the prediction and selection of samples, where the selection features include all prediction features. When using the Heckman model, the prediction features must be properly chosen from the set of selection features. However, choosing the proper prediction features is a challenging task for the Heckman model. This is especially the case when the number of selection features is large. Existing approaches that use the Heckman model often provide a manually chosen set of prediction features. In this paper, we propose Heckman-FA as a novel data-driven framework for obtaining prediction features for the Heckman model. Heckman-FA first trains an assignment function that determines whether or not a selection feature is assigned as a prediction feature. Using the parameters of the trained function, the framework extracts a suitable set of prediction features based on the goodness-of-fit of the prediction model given the chosen prediction features and the correlation between noise terms of the prediction and selection equations. Experimental results on real-world datasets show that Heckman-FA produces a robust regression model under MNAR sample selection bias.
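
For background, the sketch below implements the classic two-step Heckman correction that Heckman-FA builds on, not Heckman-FA itself; the synthetic data, the error correlation, and the choice of which selection features serve as prediction features are illustrative assumptions.

```python
# Sketch of the classic two-step Heckman correction (the base model that
# Heckman-FA builds on); the data and feature choices here are illustrative.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 3))                     # selection features
X = Z[:, :2]                                    # prediction features (a chosen subset)
e_sel, e_out = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n).T

selected = Z @ np.array([1.0, -0.5, 0.8]) + e_sel > 0     # MNAR selection rule
y = X @ np.array([2.0, 1.0]) + e_out                      # outcome, observed only if selected

# Step 1: probit model of selection on all selection features.
probit = sm.Probit(selected.astype(int), sm.add_constant(Z)).fit(disp=0)
xb = sm.add_constant(Z) @ probit.params
imr = norm.pdf(xb) / norm.cdf(xb)               # inverse Mills ratio

# Step 2: outcome regression on selected samples, augmented with the IMR.
X_sel = sm.add_constant(np.column_stack([X[selected], imr[selected]]))
ols = sm.OLS(y[selected], X_sel).fit()
print(ols.params)   # intercept, prediction-feature coefficients, IMR coefficient
```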

USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models

  • paper_url: http://arxiv.org/abs/2309.08023
  • repo_url: None
  • paper_authors: Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang
  • for: Detecting speaker changes and performing automatic speech recognition (ASR) for 96 languages.
  • methods: A multilingual speaker change detection model (USM-SCD) adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, and fine-tuned for the downstream tasks of speaker change detection and ASR.
  • results: The USM-SCD model achieves more than 75% average speaker change detection F1 score across a test set of 96 languages, with an 85.8% speaker change detection F1 score on American English. The model also exhibits state-of-the-art ASR quality compared to a strong public ASR baseline, making it suitable for handling both tasks with negligible additional computational cost.
    Abstract We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model is adapted from a speech foundation model trained on a large quantity of supervised and unsupervised data, demonstrating the utility of fine-tuning from a large generic foundation model for a downstream task. We analyze the performance of this multilingual speaker change detection model through a series of ablation studies. We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages. On American English, the USM-SCD model can achieve an 85.8% speaker change detection F1 score across various public and internal test sets, beating the previous monolingual baseline model by 21% relative. We also show that we only need to fine-tune one-quarter of the trainable model parameters to achieve the best model performance. The USM-SCD model exhibits state-of-the-art ASR quality compared with a strong public ASR baseline, making it suitable to handle both tasks with negligible additional computational cost.

CRYPTO-MINE: Cryptanalysis via Mutual Information Neural Estimation

  • paper_url: http://arxiv.org/abs/2309.08019
  • repo_url: None
  • paper_authors: Benjamin D. Kim, Vipindev Adat Vasudevan, Jongchan Woo, Alejandro Cohen, Rafael G. L. D’Oliveira, Thomas Stahlbuhk, Muriel Médard
  • for: Evaluating the computational security of cryptosystems.
  • methods: Neural networks are used to estimate the mutual information between plaintext and ciphertext in a chosen plaintext attack.
  • results: Empirical analysis of multiple encryption schemes and baseline approaches, including novel network coding-based cryptosystems, together with a study of the relationship between input distribution and information leakage.
    Abstract The use of Mutual Information (MI) as a measure to evaluate the efficiency of cryptosystems has an extensive history. However, estimating MI between unknown random variables in a high-dimensional space is challenging. Recent advances in machine learning have enabled progress in estimating MI using neural networks. This work presents a novel application of MI estimation in the field of cryptography. We propose applying this methodology directly to estimate the MI between plaintext and ciphertext in a chosen plaintext attack. The leaked information, if any, from the encryption could potentially be exploited by adversaries to compromise the computational security of the cryptosystem. We evaluate the efficiency of our approach by empirically analyzing multiple encryption schemes and baseline approaches. Furthermore, we extend the analysis to novel network coding-based cryptosystems that provide individual secrecy and study the relationship between information leakage and input distribution.
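
As a hedged sketch of the underlying estimator, the example below trains a MINE-style network on the Donsker-Varadhan lower bound to estimate the mutual information between a toy "plaintext" and a deliberately leaky "ciphertext"; the toy cipher, network size, and training settings are assumptions rather than the paper's setup.

```python
# MINE-style mutual information estimator (Donsker-Varadhan lower bound) on a
# toy "plaintext"/"ciphertext" pair; the toy leaky cipher and the network are
# illustrative assumptions, not the paper's experimental setup.
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 4096
plaintext = torch.rand(n, 1)
ciphertext = plaintext + 0.3 * torch.randn(n, 1)     # deliberately leaky "encryption"

T = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(T.parameters(), lr=1e-3)

for step in range(2000):
    joint = torch.cat([plaintext, ciphertext], dim=1)
    # Marginal samples: pair plaintexts with shuffled ciphertexts.
    marginal = torch.cat([plaintext, ciphertext[torch.randperm(n)]], dim=1)
    mi_lower_bound = T(joint).mean() - (torch.logsumexp(T(marginal), dim=0) - math.log(n)).squeeze()
    loss = -mi_lower_bound                           # maximize the lower bound
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"estimated MI lower bound: {mi_lower_bound.item():.3f} nats")
```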

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

  • paper_url: http://arxiv.org/abs/2309.07988
  • repo_url: None
  • paper_authors: Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Ernie Chang, Yangyang Shi, Vikas Chandra
  • for: Optimizing Transformer-based streaming speech recognition models for on-device efficiency.
  • methods: A technique called folding attention that targets the linear projection layers of multi-head attention and feedforward networks, significantly reducing model size and improving memory and power efficiency.
  • results: Experiments on on-device streaming speech recognition models show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, without compromising model accuracy or computation overhead.
    Abstract Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer inference, typically for long-context applications, center on simplifying attention score calculations. However, streaming speech recognition models usually process a limited number of tokens each time, making attention score calculation less of a bottleneck. Instead, the bottleneck lies in the linear projection layers of multi-head attention and feedforward networks, constituting a substantial portion of the model size and contributing significantly to computation, memory, and power usage. To address this bottleneck, we propose folding attention, a technique targeting these linear layers, significantly reducing model size and improving memory and power efficiency. Experiments on on-device Transformer-based streaming speech recognition models show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, all without compromising model accuracy or computation overhead.

SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems

  • paper_url: http://arxiv.org/abs/2309.07983
  • repo_url: https://github.com/s3l-official/slmia-sr
  • paper_authors: Guangke Chen, Yedi Zhang, Fu Song
  • for: Proposing SLMIA-SR, the first membership inference attack tailored to speaker recognition (SR), featuring speaker-level rather than example-level membership inference.
  • methods: Two groups of carefully engineered features, driven by the intra-closeness and inter-farness training objectives of SR, quantify the differences between training and non-training speakers; a novel mixing-ratio training strategy improves generalizability, and voice chunk splitting copes with a limited number of inference voices.
  • results: Extensive experiments show the attack effectively determines whether any voices of a given speaker were involved in training an SR model, in both white-box and black-box scenarios; two novel techniques further reduce the number of black-box queries while maintaining attack performance.
    Abstract Membership inference attacks allow adversaries to determine whether a particular example was contained in the model's training dataset. While previous works have confirmed the feasibility of such attacks in various applications, none has focused on speaker recognition (SR), a promising voice-based biometric recognition technique. In this work, we propose SLMIA-SR, the first membership inference attack tailored to SR. In contrast to conventional example-level attack, our attack features speaker-level membership inference, i.e., determining if any voices of a given speaker, either the same as or different from the given inference voices, have been involved in the training of a model. It is particularly useful and practical since the training and inference voices are usually distinct, and it is also meaningful considering the open-set nature of SR, namely, the recognition speakers were often not present in the training data. We utilize intra-closeness and inter-farness, two training objectives of SR, to characterize the differences between training and non-training speakers and quantify them with two groups of features driven by carefully-established feature engineering to mount the attack. To improve the generalizability of our attack, we propose a novel mixing ratio training strategy to train attack models. To enhance the attack performance, we introduce voice chunk splitting to cope with the limited number of inference voices and propose to train attack models dependent on the number of inference voices. Our attack is versatile and can work in both white-box and black-box scenarios. Additionally, we propose two novel techniques to reduce the number of black-box queries while maintaining the attack performance. Extensive experiments demonstrate the effectiveness of SLMIA-SR.

Uncertainty quantification for learned ISTA

  • paper_url: http://arxiv.org/abs/2309.07982
  • repo_url: None
  • paper_authors: Frederik Hoppe, Claudio Mayrink Verdun, Felix Krahmer, Hannah Laus, Holger Rauhut
  • for: Solving inverse problems by combining model-based methods with deep learning, improving efficiency and interpretability.
  • methods: Algorithm unrolling schemes, a model-based deep learning technique that incorporates interpretable prior domain knowledge into the training process.
  • results: A rigorous way to obtain confidence intervals for the LISTA estimator, taking a step towards uncertainty quantification for unrolled algorithms.
    Abstract Model-based deep learning solutions to inverse problems have attracted increasing attention in recent years as they bridge state-of-the-art numerical performance with interpretability. In addition, the incorporated prior domain knowledge can make the training more efficient as the smaller number of parameters allows the training step to be executed with smaller datasets. Algorithm unrolling schemes stand out among these model-based learning techniques. Despite their rapid advancement and their close connection to traditional high-dimensional statistical methods, they lack certainty estimates and a theory for uncertainty quantification is still elusive. This work provides a step towards closing this gap proposing a rigorous way to obtain confidence intervals for the LISTA estimator.
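
For readers unfamiliar with the estimator under study, the sketch below unrolls ISTA into a small learned network (LISTA); the dimensions, depth, and synthetic training setup are illustrative assumptions, and the sketch does not include the paper's confidence-interval construction.

```python
# Minimal LISTA sketch: ISTA iterations unrolled into learnable layers.
# Dimensions, depth, and the synthetic training setup are illustrative.
import torch
import torch.nn as nn

def soft_threshold(z, theta):
    return torch.sign(z) * torch.clamp(z.abs() - theta, min=0.0)

class LISTA(nn.Module):
    def __init__(self, m, n, n_layers=8):
        super().__init__()
        self.We = nn.Linear(m, n, bias=False)      # plays the role of (1/L) A^T
        self.S = nn.Linear(n, n, bias=False)       # plays the role of I - (1/L) A^T A
        self.theta = nn.Parameter(0.1 * torch.ones(n_layers))  # per-layer thresholds
        self.n_layers = n_layers

    def forward(self, y):
        x = torch.zeros(y.shape[0], self.S.in_features, device=y.device)
        for k in range(self.n_layers):
            x = soft_threshold(self.We(y) + self.S(x), self.theta[k])
        return x

# Train on synthetic sparse-recovery data: y = A x + noise.
torch.manual_seed(0)
m, n, batch = 32, 64, 256
A = torch.randn(m, n) / m ** 0.5
model = LISTA(m, n)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(1000):
    x_true = torch.randn(batch, n) * (torch.rand(batch, n) < 0.1)   # ~10% nonzeros
    y = x_true @ A.T + 0.01 * torch.randn(batch, m)
    loss = torch.mean((model(y) - x_true) ** 2)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final reconstruction MSE: {loss.item():.4f}")
```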

Improving physics-informed DeepONets with hard constraints

  • paper_url: http://arxiv.org/abs/2309.07899
  • repo_url: None
  • paper_authors: Rüdiger Brecht, Dmytro R. Popovych, Alex Bihlo, Roman O. Popovych
  • for: Improving current physics-informed (standard or operator) neural networks so that the initial conditions of the system do not need to be learned.
  • methods: A physics-informed deep learning strategy in which the initial conditions are represented exactly in the predicted solution rather than learned.
  • results: The strategy improves the accuracy of existing physics-informed networks and guarantees that when a DeepONet is applied repeatedly to time-step a solution, the resulting function is continuous.
    Abstract Current physics-informed (standard or operator) neural networks still rely on accurately learning the initial conditions of the system they are solving. In contrast, standard numerical methods evolve such initial conditions without needing to learn these. In this study, we propose to improve current physics-informed deep learning strategies such that initial conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, this method guarantees that when a DeepONet is applied multiple times to time step a solution, the resulting function is continuous.
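
One common way to impose an initial condition exactly, in PINN- or DeepONet-style models alike, is to build it into the output ansatz so that it holds for any network parameters. The particular ansatz below is an illustrative assumption; the paper's construction, which also enforces continuity across repeated DeepONet time steps, may differ.

```latex
\hat{u}_\theta(x, t) \;=\; u_0(x) \,+\, t\,\mathcal{N}_\theta(x, t),
\qquad\text{so that}\qquad
\hat{u}_\theta(x, 0) \;=\; u_0(x)\ \text{ exactly, for any } \theta .
```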

Choosing a Proxy Metric from Past Experiments

  • paper_url: http://arxiv.org/abs/2309.07893
  • repo_url: None
  • paper_authors: Nilesh Tripuraneni, Lee Richardson, Alexander D’Amour, Jacopo Soriano, Steve Yadlowsky
  • for: Proposing a new statistical framework to define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments.
  • methods: The construction of an optimal proxy metric in a given experiment is reduced to a portfolio optimization problem that depends on the true latent treatment effects and the noise level of the experiment; observed treatment effects of the long-term metric and the proxies in a historical corpus of randomized experiments are denoised to extract estimates of the latent treatment effects.
  • results: Instantiated and evaluated on a large corpus of randomized experiments from an industrial recommendation system, the framework constructs proxy metrics that perform favorably relative to several baselines.
    Abstract In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near-term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not apriori fixed; rather it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.

Some notes concerning a generalized KMM-type optimization method for density ratio estimation

  • paper_url: http://arxiv.org/abs/2309.07887
  • repo_url: https://github.com/cdalecsa/generalized-kmm
  • paper_authors: Cristian Daniel Alecsa
  • for: Introducing new optimization algorithms for the task of density ratio estimation.
  • methods: Extending the well-known KMM method through the construction of a suitable loss function, in order to cover more general situations involving density ratio estimation with respect to subsets of the training data and test data.
  • results: The resulting loss functions allow the density ratio between training and test data to be estimated in these more general settings.
    Abstract In the present paper we introduce new optimization algorithms for the task of density ratio estimation. More precisely, we consider extending the well-known KMM method using the construction of a suitable loss function, in order to encompass more general situations involving the estimation of density ratio with respect to subsets of the training data and test data, respectively. The associated codes can be found at https://github.com/CDAlecsa/Generalized-KMM.
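
For background, here is the standard kernel mean matching estimate that the generalized method extends; dropping KMM's usual box and normalization constraints, the RBF kernel width, and the ridge term are simplifying assumptions.

```python
# Standard kernel mean matching (KMM) sketch for density ratio estimation,
# without the usual box/normalization constraints; kernel width and ridge
# regularization are illustrative choices.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(500, 1))    # training (denominator) samples
x_test = rng.normal(0.5, 1.0, size=(400, 1))     # test (numerator) samples

gamma = 0.5
K = rbf_kernel(x_train, x_train, gamma=gamma)                      # (n_tr, n_tr)
kappa = (len(x_train) / len(x_test)) * rbf_kernel(x_train, x_test, gamma=gamma).sum(axis=1)

# Unconstrained KMM: minimize 0.5 b^T K b - kappa^T b  =>  (K + eps I) b = kappa.
beta = np.linalg.solve(K + 1e-3 * np.eye(len(x_train)), kappa)     # ratio estimates at x_train
print("estimated ratio near x=0 vs x=1:",
      beta[np.abs(x_train[:, 0]) < 0.1].mean(),
      beta[np.abs(x_train[:, 0] - 1.0) < 0.1].mean())
```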

Identifying the Group-Theoretic Structure of Machine-Learned Symmetries

  • paper_url: http://arxiv.org/abs/2309.07860
  • repo_url: None
  • paper_authors: Roy T. Forestano, Konstantin T. Matchev, Katia Matcheva, Alexander Roman, Eyup B. Unlu, Sarunas Verner
  • for: Examining and identifying the group-theoretic structure of symmetry transformations discovered by deep learning.
  • methods: Loss functions that probe the subalgebra structure, applied either during the deep-learning stage of symmetry discovery or in a subsequent post-processing stage.
  • results: The methods are illustrated with examples from the U(n) Lie group family, obtaining the respective subalgebra decompositions, and applied in particle physics to identify the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries such as SU(3) and SU(5).
    Abstract Deep learning was recently successfully used in deriving symmetry transformations that preserve important physics quantities. Being completely agnostic, these techniques postpone the identification of the discovered symmetries to a later stage. In this letter we propose methods for examining and identifying the group-theoretic structure of such machine-learned symmetries. We design loss functions which probe the subalgebra structure either during the deep learning stage of symmetry discovery or in a subsequent post-processing stage. We illustrate the new methods with examples from the U(n) Lie group family, obtaining the respective subalgebra decompositions. As an application to particle physics, we demonstrate the identification of the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries like SU(3) and SU(5) which are commonly used in model building.

Complex-Valued Neural Networks for Data-Driven Signal Processing and Signal Understanding

  • paper_url: http://arxiv.org/abs/2309.07948
  • repo_url: None
  • paper_authors: Josiah W. Smith
  • for: Providing a PyTorch-based library with lightweight interfaces for common complex-valued neural network operations and architectures.
  • methods: Efficient implementations of linear, convolution, and attention modules, activation functions, and normalization layers such as batchnorm and layernorm, as well as manifold-based complex-valued layers that remain relatively unexplored.
  • results: A lightweight set of tools and documentation for data-driven signal processing research; although the focus is on 1-D data tensors for signal processing, communications, and radar, many routines are also implemented for 2-D and 3-D data.
    Abstract Complex-valued neural networks have emerged boasting superior modeling performance for many tasks across the signal processing, sensing, and communications arenas. However, developing complex-valued models currently demands development of basic deep learning operations, such as linear or convolution layers, as modern deep learning frameworks like PyTorch and Tensor flow do not adequately support complex-valued neural networks. This paper overviews a package built on PyTorch with the intention of implementing light-weight interfaces for common complex-valued neural network operations and architectures. Similar to natural language understanding (NLU), which as recently made tremendous leaps towards text-based intelligence, RF Signal Understanding (RFSU) is a promising field extending conventional signal processing algorithms using a hybrid approach of signal mechanics-based insight with data-driven modeling power. Notably, we include efficient implementations for linear, convolution, and attention modules in addition to activation functions and normalization layers such as batchnorm and layernorm. Additionally, we include efficient implementations of manifold-based complex-valued neural network layers that have shown tremendous promise but remain relatively unexplored in many research contexts. Although there is an emphasis on 1-D data tensors, due to a focus on signal processing, communications, and radar data, many of the routines are implemented for 2-D and 3-D data as well. Specifically, the proposed approach offers a useful set of tools and documentation for data-driven signal processing research and practical implementation.
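
As a hedged illustration (this is not the package's actual API), a complex-valued linear layer can be written in plain PyTorch by keeping separate real and imaginary weight matrices:

```python
# Illustrative complex-valued linear layer in plain PyTorch; this is not the
# package's API, just one standard way to realize the operation.
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.re = nn.Linear(in_features, out_features, bias=False)  # real part of W
        self.im = nn.Linear(in_features, out_features, bias=False)  # imaginary part of W

    def forward(self, z):                      # z: complex-valued tensor
        zr, zi = z.real, z.imag
        # (Wr + i Wi)(zr + i zi) = (Wr zr - Wi zi) + i (Wr zi + Wi zr)
        return torch.complex(self.re(zr) - self.im(zi), self.re(zi) + self.im(zr))

layer = ComplexLinear(4, 2)
z = torch.randn(8, 4, dtype=torch.cfloat)      # batch of complex inputs
print(layer(z).shape, layer(z).dtype)          # torch.Size([8, 2]) torch.complex64
```

The same real/imaginary decomposition extends to convolution and attention modules, at the cost of roughly twice the real-valued parameter count per layer.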

Learning to Warm-Start Fixed-Point Optimization Algorithms

  • paper_url: http://arxiv.org/abs/2309.07835
  • repo_url: https://github.com/stellatogrp/l2ws_fixed_point
  • paper_authors: Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato
  • for: Proposing a machine-learning framework to warm-start fixed-point optimization algorithms.
  • methods: A neural network maps problem parameters to warm starts, followed by a predefined number of fixed-point iterations; two loss functions minimize either the fixed-point residual or the distance to a ground-truth solution.
  • results: Applied to well-known problems in control, statistics, and signal processing, learned warm starts significantly reduce the number of iterations and the solution time; PAC-Bayes generalization bounds are given for contractive, linearly convergent, and averaged fixed-point operators.
    Abstract We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground truth solution. In this way, the neural network predicts warm starts with the end-to-end goal of minimizing the downstream loss. An important feature of our architecture is its flexibility, in that it can predict a warm start for fixed-point algorithms run for any number of steps, without being limited to the number of steps it has been trained on. We provide PAC-Bayes generalization bounds on unseen data for common classes of fixed-point operators: contractive, linearly convergent, and averaged. Applying this framework to well-known applications in control, statistics, and signal processing, we observe a significant reduction in the number of iterations and solution time required to solve these problems, through learned warm starts.
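
A minimal sketch of the idea under illustrative assumptions: the fixed-point operator below is a gradient step on a parametrized least-squares problem, the network architecture is arbitrary, and the fixed-point-residual loss is one of the two losses described in the abstract.

```python
# Sketch: a network maps problem parameters to a warm start, then a fixed
# number of fixed-point iterations are run; training minimizes the final
# fixed-point residual. Operator, sizes, and loss are illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, k_steps, alpha = 10, 5, 0.1
A = torch.randn(d, d) / d ** 0.5
A = A.T @ A + 0.5 * torch.eye(d)               # fixed positive-definite matrix

def fixed_point_step(z, b):
    # Gradient step for min_z 0.5 z^T A z - b^T z; its fixed point solves A z = b.
    return z - alpha * (z @ A.T - b)

warmstarter = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
opt = torch.optim.Adam(warmstarter.parameters(), lr=1e-3)

for step in range(2000):
    b = torch.randn(256, d)                    # problem parameters
    z = warmstarter(b)                         # predicted warm start
    for _ in range(k_steps):
        z = fixed_point_step(z, b)
    residual = fixed_point_step(z, b) - z      # fixed-point residual loss
    loss = (residual ** 2).sum(dim=1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(f"mean squared fixed-point residual after {k_steps} steps: {loss.item():.5f}")
```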

Directed Scattering for Knowledge Graph-based Cellular Signaling Analysis

  • paper_url: http://arxiv.org/abs/2309.07813
  • repo_url: None
  • paper_authors: Aarthi Venkat, Joyce Chew, Ferran Cardoso Rodriguez, Christopher J. Tape, Michael Perlmutter, Smita Krishnaswamy
  • for: Describing a new method, the Directed Scattering Autoencoder (DSAE), for learning latent hierarchies in scientific knowledge graphs such as cellular signaling networks.
  • methods: A directed version of the geometric scattering transform, combined with the non-linear dimensionality reduction of an autoencoder and the geometric properties of hyperbolic space.
  • results: The method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.
    Abstract Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of the hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.

Communication Efficient Private Federated Learning Using Dithering

  • paper_url: http://arxiv.org/abs/2309.07809
  • repo_url: None
  • paper_authors: Burak Hasircioglu, Deniz Gunduz
  • for: Preserving privacy while ensuring efficient communication in federated learning.
  • methods: A quantization scheme based on subtractive dithering at the clients, which replicates the normal noise addition process at the aggregator.
  • results: The approach guarantees the same level of differential privacy against other clients while substantially reducing the required communication, and experiments show its accuracy matches that of the full-precision gradient method.
    Abstract The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full precision gradient method.
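
A hedged sketch of subtractive dithered quantization itself, the mechanism the scheme builds on rather than the full private federated learning protocol; the step size and the shared-seed construction are standard, but the specific values are assumptions.

```python
# Subtractive dithered quantization sketch: client and server share a seed,
# the client quantizes (g + u) and the server subtracts the same dither u.
# Step size and vector sizes are illustrative.
import numpy as np

delta = 0.05                                     # quantization step size
g = np.random.default_rng(1).normal(size=1000)   # client's gradient vector

# Shared randomness: both sides can regenerate the same dither from the seed.
shared_seed = 42
u = np.random.default_rng(shared_seed).uniform(-delta / 2, delta / 2, size=g.shape)

q = np.round((g + u) / delta)                    # integers actually transmitted
g_hat = q * delta - u                            # server reconstructs, subtracting the dither

err = g_hat - g
print(f"max |error|: {np.abs(err).max():.4f} (bounded by delta/2 = {delta / 2})")
print(f"error statistics: mean {err.mean():.5f}, var {err.var():.6f} "
      f"(uniform-dither variance = {delta ** 2 / 12:.6f})")
```

Because the reconstruction error is uniform on [-delta/2, delta/2] and independent of the gradient, it mimics noise added centrally at the aggregator while only low-precision integers are transmitted.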

Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

  • paper_url: http://arxiv.org/abs/2309.07742
  • repo_url: None
  • paper_authors: Emanuele Marconato, Andrea Passerini, Stefano Teso
  • for: Providing a mathematical framework for acquiring interpretable representations, suitable for both post-hoc explainers and concept-based neural networks.
  • methods: Building on recent advances in causal representation learning, the framework explicitly models the human stakeholder as an external observer, yielding a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human.
  • results: A simple and intuitive "name transfer game" links alignment and interpretability, and alignment is related to disentanglement, concept leakage, and content-style separation through an information-theoretic reformulation.
    Abstract Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglment. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.

Slow Invariant Manifolds of Singularly Perturbed Systems via Physics-Informed Machine Learning

  • paper_url: http://arxiv.org/abs/2309.07946
  • repo_url: None
  • paper_authors: Dimitrios G. Patsatzis, Gianluca Fabiani, Lucia Russo, Constantinos Siettos
  • for: Proposing a physics-informed machine learning (PIML) approach for approximating the slow invariant manifolds (SIMs) of singularly perturbed systems.
  • methods: Two neural network structures, feedforward neural networks (FNNs) and random projection neural networks (RPNNs), solve the invariance equation within the Geometric Singular Perturbation Theory (GSPT) framework, with symbolic differentiation used to compute the gradients required for learning.
  • results: The approach provides approximations of equivalent or even higher accuracy than traditional GSPT-based methods and, for practical purposes, is not affected by the magnitude of the perturbation parameter; a comparison of the computational costs of symbolic, automatic, and numerical approximation of the required derivatives is also provided.
    Abstract We present a physics-informed machine-learning (PIML) approach for the approximation of slow invariant manifolds (SIMs) of singularly perturbed systems, providing functionals in an explicit form that facilitate the construction and numerical integration of reduced order models (ROMs). The proposed scheme solves a partial differential equation corresponding to the invariance equation (IE) within the Geometric Singular Perturbation Theory (GSPT) framework. For the solution of the IE, we used two neural network structures, namely feedforward neural networks (FNNs), and random projection neural networks (RPNNs), with symbolic differentiation for the computation of the gradients required for the learning process. The efficiency of our PIML method is assessed via three benchmark problems, namely the Michaelis-Menten, the target mediated drug disposition reaction mechanism, and the 3D Sel'kov model. We show that the proposed PIML scheme provides approximations, of equivalent or even higher accuracy, than those provided by other traditional GSPT-based methods, and importantly, for any practical purposes, it is not affected by the magnitude of the perturbation parameter. This is of particular importance, as there are many systems for which the gap between the fast and slow timescales is not that big, but still ROMs can be constructed. A comparison of the computational costs between symbolic, automatic and numerical approximation of the required derivatives in the learning process is also provided.

Understanding Vector-Valued Neural Networks and Their Relationship with Real and Hypercomplex-Valued Neural Networks

  • paper_url: http://arxiv.org/abs/2309.07716
  • repo_url: None
  • paper_authors: Marcos Eduardo Valle
  • for: This paper presents a broad framework for vector-valued neural networks (V-nets) that can naturally consider the intercorrelation between feature channels.
  • methods: The paper explains the relationship between vector-valued and traditional neural networks, and shows how V-nets can be implemented in current deep-learning libraries as real-valued networks.
  • results: By using vector-valued neural networks with fewer parameters, the paper provides more robust training for deep learning models.
    Abstract Despite the many successful applications of deep learning models for multidimensional signal and image processing, most traditional neural networks process data represented by (multidimensional) arrays of real numbers. The intercorrelation between feature channels is usually expected to be learned from the training data, requiring numerous parameters and careful training. In contrast, vector-valued neural networks are conceived to process arrays of vectors and naturally consider the intercorrelation between feature channels. Consequently, they usually have fewer parameters and often undergo more robust training than traditional neural networks. This paper aims to present a broad framework for vector-valued neural networks, referred to as V-nets. In this context, hypercomplex-valued neural networks are regarded as vector-valued models with additional algebraic properties. Furthermore, this paper explains the relationship between vector-valued and traditional neural networks. Precisely, a vector-valued neural network can be obtained by placing restrictions on a real-valued model to consider the intercorrelation between feature channels. Finally, we show how V-nets, including hypercomplex-valued neural networks, can be implemented in current deep-learning libraries as real-valued networks.

Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context

  • paper_url: http://arxiv.org/abs/2309.07708
  • repo_url: None
  • paper_authors: Haochong Xia, Shuo Sun, Xinrun Wang, Bo An
  • for: Enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making through controllable financial market data generation.
  • methods: A Contextual Market Dataset (market dynamics, stock ticker, and history state as context) extracted with linear regression and Dynamic Time Warping clustering; Market-GAN, which combines a generative adversarial network for controllable generation with context, an autoencoder for learning low-dimensional features, and supervisors for knowledge transfer; and a two-stage training scheme.
  • results: Evaluated on Dow Jones Industrial Average data from 2000 to 2023, Market-GAN shows superior performance compared to 4 state-of-the-art time-series generative models.
    Abstract Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making. Despite the development of financial market simulation methodologies, existing frameworks often struggle with adapting to specialized simulation context. We pinpoint the challenges as i) current financial datasets do not contain context labels; ii) current techniques are not designed to generate financial data with context as control, which demands greater precision compared to other modalities; iii) the inherent difficulties in generating context-aligned, high-fidelity data given the non-stationary, noisy nature of financial data. To address these challenges, our contributions are: i) we proposed the Contextual Market Dataset with market dynamics, stock ticker, and history state as context, leveraging a market dynamics modeling method that combines linear regression and Dynamic Time Warping clustering to extract market dynamics; ii) we present Market-GAN, a novel architecture incorporating a Generative Adversarial Networks (GAN) for the controllable generation with context, an autoencoder for learning low-dimension features, and supervisors for knowledge transfer; iii) we introduce a two-stage training scheme to ensure that Market-GAN captures the intrinsic market distribution with multiple objectives. In the pertaining stage, with the use of the autoencoder and supervisors, we prepare the generator with a better initialization for the adversarial training stage. We propose a set of holistic evaluation metrics that consider alignment, fidelity, data usability on downstream tasks, and market facts. We evaluate Market-GAN with the Dow Jones Industrial Average data from 2000 to 2023 and showcase superior performance in comparison to 4 state-of-the-art time-series generative models.

Causal Entropy and Information Gain for Measuring Causal Control

  • paper_url: http://arxiv.org/abs/2309.07703
  • repo_url: None
  • paper_authors: Francisco Nunes Ferreira Quialheiro Simoes, Mehdi Dastani, Thijs van Ommen
  • for: Proposing information-theoretic quantities that incorporate the causal structure of the system, so that the causal importance of features for a given outcome variable can be evaluated and model interpretability improved.
  • methods: Causal versions of entropy and mutual information, termed causal entropy and causal information gain, designed to assess how much control a feature provides over the outcome variable; fundamental results connect these quantities to the existence of causal effects.
  • results: In feature selection, causal information gain outperforms standard mutual information in revealing which features provide control over a chosen outcome variable.
    Abstract Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.
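
One plausible formalization consistent with the abstract (the paper's exact definitions, and in particular the intervention distribution \pi in the display below, are assumptions on our part):

```latex
H\big(Y \mid \mathrm{do}(X)\big) \;=\; \mathbb{E}_{x \sim \pi}\Big[\, H\big(Y \mid \mathrm{do}(X = x)\big) \Big],
\qquad
I_c(X \to Y) \;=\; H(Y) \;-\; H\big(Y \mid \mathrm{do}(X)\big),
```

so that the causal information gain measures the reduction in the entropy of the outcome obtained by intervening on a feature, rather than by merely observing it.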

FedFNN: Faster Training Convergence Through Update Predictions in Federated Recommender Systems

  • paper_url: http://arxiv.org/abs/2309.08635
  • repo_url: None
  • paper_authors: Francesco Fabbri, Xianghang Liu, Jack R. McKenzie, Bartlomiej Twardowski, Tri Kurniawan Wijaya
  • for: Accelerating decentralized model training in federated recommender systems, improving the timeliness of online personalization while preserving user data privacy.
  • methods: FedFNN employs supervised learning to predict the weight updates of unsampled users from the updates of the sampled set.
  • results: Experiments on real and synthetic data show FedFNN achieves training speeds 5x faster than leading methods while maintaining or improving accuracy, performs consistently across client cluster variations, and converges more quickly when client availability is limited.
    Abstract Federated Learning (FL) has emerged as a key approach for distributed machine learning, enhancing online personalization while ensuring user data privacy. Instead of sending private data to a central server as in traditional approaches, FL decentralizes computations: devices train locally and share updates with a global server. A primary challenge in this setting is achieving fast and accurate model training - vital for recommendation systems where delays can compromise user engagement. This paper introduces FedFNN, an algorithm that accelerates decentralized model training. In FL, only a subset of users are involved in each training epoch. FedFNN employs supervised learning to predict weight updates from unsampled users, using updates from the sampled set. Our evaluations, using real and synthetic data, show: 1. FedFNN achieves training speeds 5x faster than leading methods, maintaining or improving accuracy; 2. the algorithm's performance is consistent regardless of client cluster variations; 3. FedFNN outperforms other methods in scenarios with limited client availability, converging more quickly.

A DenseNet-based method for decoding auditory spatial attention with EEG

  • paper_url: http://arxiv.org/abs/2309.07690
  • repo_url: https://github.com/xuxiran/asad_densenet
  • paper_authors: Xiran Xu, Bo Wang, Yujie Yan, Xihong Wu, Jing Chen
  • for: Improving the performance of auditory spatial attention detection (ASAD) by extracting temporal and spatial features of the neural representation for the attended locations.
  • methods: The proposed method transforms the original EEG channels into a 2D spatial topological map and then uses a 3D deep convolutional neural network (DenseNet-3D) to extract temporal and spatial features of the neural representation for the attended locations.
  • results: The proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with a 1-second decision window on the widely used KULeuven (KUL) dataset.
    Abstract Auditory spatial attention detection (ASAD) aims to decode the attended spatial location with EEG in a multiple-speaker setting. ASAD methods are inspired by the brain lateralization of cortical neural responses during the processing of auditory spatial attention, and show promising performance for the task of auditory attention decoding (AAD) with neural recordings. In the previous ASAD methods, the spatial distribution of EEG electrodes is not fully exploited, which may limit the performance of these methods. In the present work, by transforming the original EEG channels into a two-dimensional (2D) spatial topological map, the EEG data is transformed into a three-dimensional (3D) arrangement containing spatial-temporal information. And then a 3D deep convolutional neural network (DenseNet-3D) is used to extract temporal and spatial features of the neural representation for the attended locations. The results show that the proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with 1-second decision window for the widely used KULeuven (KUL) dataset, and the code to implement our work is available on Github: https://github.com/xuxiran/ASAD_DenseNet

Benchmarking machine learning models for quantum state classification

  • paper_url: http://arxiv.org/abs/2309.07679
  • repo_url: None
  • paper_authors: Edoardo Pedicillo, Andrea Pasquale, Stefano Carrazza
  • for: Classifying the two-level quantum states (qubits) used in quantum computing.
  • methods: Multiple classification techniques are applied to discriminate the ground state from the excited state in measured data.
  • results: The classification techniques are benchmarked on real quantum devices.
    Abstract Quantum computing is a growing field where the information is processed by two-levels quantum states known as qubits. Current physical realizations of qubits require a careful calibration, composed by different experiments, due to noise and decoherence phenomena. Among the different characterization experiments, a crucial step is to develop a model to classify the measured state by discriminating the ground state from the excited state. In this proceedings we benchmark multiple classification techniques applied to real quantum devices.
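
As a toy illustration of the task on synthetic data (not the paper's benchmark on real devices), single-shot qubit readout is commonly summarized as a point in the IQ plane, and discriminating the ground state from the excited state becomes a two-class problem:

```python
# Toy ground/excited state discrimination on synthetic IQ-plane readout data;
# the blob locations, noise level, and classifier choices are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
ground = rng.normal([0.0, 0.0], 0.6, size=(n, 2))     # |0> shots in the IQ plane
excited = rng.normal([1.5, 1.0], 0.6, size=(n, 2))    # |1> shots
X = np.vstack([ground, excited])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for clf in (LogisticRegression(), QuadraticDiscriminantAnalysis()):
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{clf.__class__.__name__}: assignment fidelity (accuracy) = {acc:.3f}")
```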

Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis

  • paper_url: http://arxiv.org/abs/2309.07675
  • repo_url: None
  • paper_authors: Mehdi Zadem, Sergio Mover, Sao Mai Nguyen
  • for: Proposing a developmental mechanism that autonomously discovers a symbolic goal representation and uses it for hierarchical reinforcement learning.
  • methods: A Feudal HRL algorithm concurrently learns the goal representation and a hierarchical policy, using symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the representation.
  • results: On complex navigation tasks, the learned representation is interpretable, transferable, and results in data-efficient learning.
    Abstract Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing the learned representation is interpretable, transferrable and results in data efficient learning.

Physics-constrained robust learning of open-form PDEs from limited and noisy data

  • paper_url: http://arxiv.org/abs/2309.07672
  • repo_url: None
  • paper_authors: Mengge Du, Longfeng Nie, Siyu Lou, Yuntian Chenc, Dongxiao Zhang
  • for: This paper aims to propose a framework for robustly uncovering open-form partial differential equations (PDEs) from limited and noisy data, which is a significant challenge in nonlinear dynamic systems.
  • methods: The proposed framework, called R-DISCOVER, uses two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training.
  • results: The numerical experiments demonstrate that the proposed framework can uncover governing equations from nonlinear dynamic systems with limited and highly noisy data and outperform other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.
    Abstract Unveiling the underlying governing equations of nonlinear dynamic systems remains a significant challenge, especially when encountering noisy observations and no prior knowledge available. This study proposes R-DISCOVER, a framework designed to robustly uncover open-form partial differential equations (PDEs) from limited and noisy data. The framework operates through two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. PDEs with superior fits are utilized to iteratively optimize the generator via the RL method and the best-performing PDE is selected by a parameter-free stability metric. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training. The traversal of PDE trees automates the construction of the computational graph and the embedding process without human intervention. Numerical experiments demonstrate our framework's capability to uncover governing equations from nonlinear dynamic systems with limited and highly noisy data and outperform other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.

Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE

  • paper_url: http://arxiv.org/abs/2309.07663
  • repo_url: None
  • paper_authors: Yuma Ichikawa, Koji Hukushima
  • for: 避免Variational Autoencoder(VAE)中的 posterior collapse,提高表示学习质量。
  • methods: 在高维极限下分析一个最小化的 VAE，通过闭式表达式刻画 beta 参数与数据集规模、后验坍塌（posterior collapse）及率失真（rate-distortion）曲线之间的关系。
  • results: 与通常的正则化参数不同，beta 参数本身即可诱发后验坍塌。beta 越大，泛化误差曲线上出现的平台期越长，超过某个 beta 阈值后平台期变为无限长。这意味着 beta 需要谨慎调节；此外，要获得高码率下的率失真曲线，需要相对较大的数据集。
    Abstract In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence on the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
    摘要 在变分自编码器（VAE）中，变分后验常常与先验高度一致，这一现象称为后验坍塌（posterior collapse），会损害表示学习的质量。为缓解该问题，VAE 中引入了可调超参数 beta。本文通过在高维极限下分析一个最小化的 VAE，给出了刻画 beta、数据集规模、后验坍塌与率失真（rate-distortion）曲线之间关系的闭式表达式。结果表明，当 beta 相对较大时，泛化误差会出现较长的平台期；随着 beta 增大，平台期不断延长，并在超过某个 beta 阈值后变为无限长。这意味着与通常的正则化参数不同，beta 的选取可能在任意数据集规模下诱发后验坍塌，因此 beta 是一个需要谨慎调节的高风险参数。此外，考虑率失真曲线对数据集规模的依赖性，要获得高码率下的率失真曲线需要相对较大的数据集。大量数值实验支持了我们的分析。
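A minimal PyTorch-style sketch of the beta-weighted VAE objective the abstract analyzes, with a linear encoder/decoder and Gaussian terms. The architecture and dimensions are illustrative assumptions, not the paper's analytical high-dimensional setup.

```python
# Minimal sketch of a beta-weighted VAE objective with linear encoder/decoder
# (illustrative; not the paper's analytical setting in the high-dimensional limit).
import torch
import torch.nn as nn

class LinearVAE(nn.Module):
    def __init__(self, dim_x, dim_z):
        super().__init__()
        self.enc_mu = nn.Linear(dim_x, dim_z)
        self.enc_logvar = nn.Linear(dim_x, dim_z)
        self.dec = nn.Linear(dim_z, dim_x)

    def forward(self, x):
        mu, logvar = self.enc_mu(x), self.enc_logvar(x)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization
        return self.dec(z), mu, logvar

def beta_vae_loss(x, x_hat, mu, logvar, beta):
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()                  # Gaussian reconstruction term
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
    return recon + beta * kl       # large beta pushes the posterior toward the prior (collapse risk)

x = torch.randn(256, 20)
model = LinearVAE(dim_x=20, dim_z=5)
x_hat, mu, logvar = model(x)
print(beta_vae_loss(x, x_hat, mu, logvar, beta=4.0))
```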

Structure-Preserving Transformers for Sequences of SPD Matrices

  • paper_url: http://arxiv.org/abs/2309.07579
  • repo_url: https://github.com/mathieuseraphim/spdtransnet
  • paper_authors: Mathieu Seraphim, Alexis Lechervy, Florian Yger, Luc Brun, Olivier Etard
  • for: 这 paper 是用于分类Symmetric Positive Definite matrices 的sequence,保持它们的Riemannian geometry。
  • methods: 这 paper 使用Transformer-based auto-attention机制,将EEG-derived covariance matrices 转化为sequence,并进行自动睡眠阶段分类。
  • results: 这 paper 在使用标准数据集上实现了高水平的stage-wise性能。
    Abstract In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.
    摘要 近年来，基于 Transformer 的自注意力机制已成功应用于文本、图像乃至非欧几里得几何数据等多种依赖上下文的数据类型的分析。本文提出一种此类机制，用于对对称正定（SPD）矩阵序列进行分类，并在整个分析过程中保持其黎曼几何结构。我们将该方法应用于基于脑电（EEG）协方差矩阵时间序列的自动睡眠分期，在标准数据集上取得了较高的分期性能。
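A short sketch of one standard way to form the SPD inputs the abstract refers to: covariance matrices from EEG epochs, mapped with the matrix logarithm so that downstream processing respects their Riemannian geometry. The shapes, shrinkage value, and the log-Euclidean mapping are assumptions for illustration, not the paper's structure-preserving attention mechanism.

```python
# Sketch: turn EEG epochs into a sequence of SPD covariance matrices and map them to a
# tangent space with the matrix logarithm -- one common way to respect their Riemannian
# geometry before a sequence model. Shapes and the shrinkage value are assumptions.
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
epochs = rng.standard_normal((30, 8, 256))        # 30 epochs, 8 channels, 256 samples

def spd_covariance(epoch, shrinkage=1e-3):
    c = np.cov(epoch)                              # channels x channels
    return c + shrinkage * np.eye(c.shape[0])      # keep it strictly positive definite

def log_vectorize(spd):
    """Log-Euclidean map: matrix log, then flatten the upper triangle."""
    l = logm(spd).real
    return l[np.triu_indices_from(l)]

sequence = np.stack([log_vectorize(spd_covariance(e)) for e in epochs])
print(sequence.shape)                              # (30, 36): one token per epoch
```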

Naturalistic Robot Arm Trajectory Generation via Representation Learning

  • paper_url: http://arxiv.org/abs/2309.07550
  • repo_url: None
  • paper_authors: Jayjun Lee, Adam J. Spiers
  • for: 这篇论文旨在让家庭环境中的辅助机器人具有更可预测、更接近人类的运动，特别是支持瘫痪患者独立生活的轮椅式辅助机器人。
  • methods: 本文采用自监督模仿学习方法，使用自回归时空图神经网络从人类示范中学习；利用佩戴在人臂上的 IMU 传感器记录的、不含动作标签的任务示范数据进行训练。
  • results: 实验表明，该方法能够基于多名参与者的手臂运动数据，为 UR5e 机械臂生成自然且实用的饮水动作轨迹。
    Abstract The integration of manipulator robots in household environments suggests a need for more predictable and human-like robot motion. This holds especially true for wheelchair-mounted assistive robots that can support the independence of people with paralysis. One method of generating naturalistic motion trajectories is via the imitation of human demonstrators. This paper explores a self-supervised imitation learning method using an autoregressive spatio-temporal graph neural network for an assistive drinking task. We address learning from diverse human motion trajectory data that were captured via wearable IMU sensors on a human arm as the action-free task demonstrations. Observed arm motion data from several participants is used to generate natural and functional drinking motion trajectories for a UR5e robot arm.
    摘要 在家庭环境中引入机械臂机器人，意味着机器人需要具有更可预测、更接近人类的运动方式；对于安装在轮椅上、用于支持瘫痪人士独立生活的辅助机器人尤其如此。生成自然运动轨迹的一种方法是模仿人类示范者。本文针对辅助饮水任务，研究了一种基于自回归时空图神经网络的自监督模仿学习方法。我们利用佩戴在人体手臂上的 IMU 传感器采集的、不含动作标签的任务示范数据进行学习，并使用多名参与者的手臂运动数据，为 UR5e 机械臂生成自然且实用的饮水运动轨迹。

Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering

  • paper_url: http://arxiv.org/abs/2309.07548
  • repo_url: None
  • paper_authors: Yuki Akiyama, Konstantinos Slavakis
  • For: 本研究旨在探讨强化学习（RL）的算法与理论核心，特别是引入近端贝尔曼映射（proximal Bellman mappings）；这类映射定义在再生核希尔伯特空间（RKHS）中，可以利用 RKHS 丰富的逼近性质与内积结构获得更好的近似。
  • Methods: 本研究提出了这类新的近端贝尔曼映射，并基于它构建了一种近似策略迭代算法，用于在线地为线性自适应滤波中的 $p$-范数损失选择最优指数 $p$，以对抗离群值。
  • Results: 合成数据上的数值实验表明，基于所提映射与策略迭代算法的方法优于非 RL 方法以及基于核的 RL 方法。
    Abstract This paper aims at the algorithmic/theoretical core of reinforcement learning (RL) by introducing the novel class of proximal Bellman mappings. These mappings are defined in reproducing kernel Hilbert spaces (RKHSs), to benefit from the rich approximation properties and inner product of RKHSs, they are shown to belong to the powerful Hilbertian family of (firmly) nonexpansive mappings, regardless of the values of their discount factors, and possess ample degrees of design freedom to even reproduce attributes of the classical Bellman mappings and to pave the way for novel RL designs. An approximate policy-iteration scheme is built on the proposed class of mappings to solve the problem of selecting online, at every time instance, the "optimal" exponent $p$ in a $p$-norm loss to combat outliers in linear adaptive filtering, without training data and any knowledge on the statistical properties of the outliers. Numerical tests on synthetic data showcase the superior performance of the proposed framework over several non-RL and kernel-based RL schemes.
    摘要 本文聚焦强化学习（RL）的算法与理论核心，提出了一类新的近端贝尔曼映射（proximal Bellman mappings）。这些映射定义在再生核希尔伯特空间（RKHS）中，从而得以利用 RKHS 丰富的逼近性质与内积结构；无论折扣因子取何值，它们都属于希尔伯特空间中（强）非扩张映射这一强大类别，并拥有充分的设计自由度，既能复现经典贝尔曼映射的性质，也为新的 RL 设计铺平道路。基于所提出的映射类，我们构建了一种近似策略迭代方案，用于在线地在每个时刻为线性自适应滤波中的 $p$-范数损失选择“最优”指数 $p$，以对抗离群值，且无需训练数据，也无需任何关于离群值统计特性的先验知识。在合成数据上的数值实验表明，所提框架优于若干非 RL 方法以及基于核的 RL 方法。

VerilogEval: Evaluating Large Language Models for Verilog Code Generation

  • paper_url: http://arxiv.org/abs/2309.07544
  • repo_url: None
  • paper_authors: Mingjie Liu, Nathaniel Pinckney, Brucek Khailany, Haoxing Ren
  • for: This paper is written for evaluating the performance of large language models (LLMs) in generating Verilog code for hardware design and verification.
  • methods: The paper proposes a benchmarking framework for LLMs that includes a comprehensive evaluation dataset of 156 problems from the Verilog instructional website HDLBits, and a method for automatically testing the generated Verilog code for functional correctness.
  • results: The paper shows that the Verilog code generation capability of pretrained language models can be improved with supervised fine-tuning by bootstrapping with LLM-generated synthetic problem-code pairs.
  • for: 这篇论文是为了评估大语言模型(LLM)在硬件设计和验证中的Verilog代码生成性能而写的。
  • methods: 论文提出了一个面向 LLM 的基准评测框架，包含来自 Verilog 教学网站 HDLBits 的 156 道题目构成的评测集，以及一种通过与标准（golden）解对比瞬态仿真输出来自动检验生成代码功能正确性的方法。
  • results: 论文表明，利用 LLM 生成的合成“问题-代码”对进行监督微调，可以提升预训练语言模型的 Verilog 代码生成能力。
    Abstract The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.
    摘要 大语言模型（LLM）的日益流行为其在多个领域的应用铺平了道路。本文提出一个专门用于评估 LLM 在硬件设计与验证场景下生成 Verilog 代码能力的基准评测框架。我们给出一个包含 156 道题目的完整评测集，题目来自 Verilog 教学网站 HDLBits，涵盖从简单组合电路到复杂有限状态机等多种 Verilog 代码生成任务。生成的 Verilog 代码可以通过将其瞬态仿真输出与标准（golden）解进行对比来自动检验功能正确性。我们还证明，利用 LLM 生成的合成“问题-代码”对进行监督微调，可以提升预训练语言模型的 Verilog 代码生成能力。
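Benchmarks that score generated code by functional correctness are commonly summarized with the unbiased pass@k estimator; a short sketch follows. The estimator itself is standard, but its exact use in VerilogEval is an assumption, and the example counts are made up.

```python
# Sketch of the standard unbiased pass@k estimator often used to summarize
# functional-correctness benchmarks (its exact use in VerilogEval is an assumption).
from math import comb

def pass_at_k(n, c, k):
    """n = completions generated per problem, c = completions that passed, k = sampling budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 completions per problem, 5 pass simulation against the golden solution.
print(pass_at_k(n=20, c=5, k=1))   # ~0.25
print(pass_at_k(n=20, c=5, k=10))  # much higher with a larger budget
```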

Adaptive approximation of monotone functions

  • paper_url: http://arxiv.org/abs/2309.07530
  • repo_url: None
  • paper_authors: Pierre Gaillard, Sébastien Gerchinovitz, Étienne de Montbrun
  • for: 本研究针对一个经典问题：通过顺序查询函数值，在 $L^p(\mu)$ 范数下逼近一个非减函数 $f$。
  • methods: 我们提出了新的 GreedyBox 算法，它是 Novak（1992）数值积分算法的推广，并证明 GreedyBox 对任意函数 $f$ 都具有最优样本复杂度（至多相差对数因子）。
  • results: 我们揭示了自适应与非自适应算法、光滑与分段光滑函数、单调与非单调函数之间的多个性能差距，并对分段光滑函数显式计算了最优的极小极大逼近速率。
    Abstract We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $\mu$ on $\cX$. For any function~$f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge on the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone or non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.
    摘要 我们研究一个经典问题：在已知紧实区间 $\mathcal{X}$、$\mathcal{Y}$ 以及 $\mathcal{X}$ 上已知概率测度 $\mu$ 的条件下，通过顺序查询函数值来以 $L^p(\mu)$ 范数逼近一个非减函数 $f: \mathcal{X} \to \mathcal{Y}$。对任意函数 $f$，我们刻画了算法为保证停止后得到 $L^p(\mu)$ 误差低于 $\epsilon$ 的近似 $\hat{f}$ 所需的最少查询次数。与对所有 $f$ 一致成立的最坏情形结果不同，我们的复杂度度量依赖于每个具体函数 $f$。为解决该问题，我们提出 GreedyBox 算法，它是 Novak（1992）最初为数值积分提出的算法的推广。我们证明 GreedyBox 对任意函数 $f$ 都达到最优样本复杂度（至多相差对数因子）。此外，我们还得到了关于分段光滑函数的结果：对分段 $C^2$ 函数，GreedyBox 的 $L^p(\mu)$ 误差下降速度远快于算法本身（在不知道 $f$ 光滑性的情况下）所预言的速度；一个简单的修改甚至能对此类函数达到最优的极小极大逼近速率，我们对其进行了显式计算。特别地，我们的结果揭示了自适应与非自适应算法、光滑与分段光滑函数、单调与非单调函数之间的多个性能差距。最后，我们给出了数值实验以支持理论结果。
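A schematic greedy refinement loop in the spirit of the interval-splitting idea: repeatedly query the midpoint of the cell whose monotonicity-based error proxy is largest. The proxy, the uniform measure, and the stopping rule below are simplifications for illustration, not the paper's GreedyBox algorithm or its guarantees.

```python
# Schematic greedy refinement for approximating a non-decreasing f on [0, 1] under the
# uniform measure: always split the cell with the largest proxy (length x increment), which
# bounds how much L1 error a monotone f can hide there. Not the paper's GreedyBox.
import heapq

def greedy_monotone_approx(f, budget):
    a, b = 0.0, 1.0
    fa, fb = f(a), f(b)
    heap = [(-(b - a) * (fb - fa), a, b, fa, fb)]   # max-heap via negated proxy
    queries = 2
    while queries < budget:
        _, lo, hi, flo, fhi = heapq.heappop(heap)
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        queries += 1
        heapq.heappush(heap, (-(mid - lo) * (fmid - flo), lo, mid, flo, fmid))
        heapq.heappush(heap, (-(hi - mid) * (fhi - fmid), mid, hi, fmid, fhi))
    # piecewise-constant approximant: on each cell, use the midrange of f's observed values
    return sorted((lo, hi, 0.5 * (flo + fhi)) for _, lo, hi, flo, fhi in heap)

cells = greedy_monotone_approx(lambda x: x ** 3, budget=20)
print(len(cells), cells[:3])
```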

Learning Beyond Similarities: Incorporating Dissimilarities between Positive Pairs in Self-Supervised Time Series Learning

  • paper_url: http://arxiv.org/abs/2309.07526
  • repo_url: None
  • paper_authors: Adrian Atienza, Jakob Bardram, Sadasivan Puthusserypady
  • for: 改进时间序列自监督学习得到的表征，以更好地检测心房颤动（AFib）等心血管疾病相关的动态特征
  • methods: 提出 DEBS 自监督学习方法，在优化中不仅利用正样本对之间的相似性，还引入其差异性，从而编码时间序列的动态特征
  • results: 在心电图（ECG）信号上，跨受试者的心房颤动（AFib）检测精度提升约 10%，并为面向时间序列的自监督学习方法开辟了新方向
    Abstract By identifying similarities between successive inputs, Self-Supervised Learning (SSL) methods for time series analysis have demonstrated their effectiveness in encoding the inherent static characteristics of temporal data. However, an exclusive emphasis on similarities might result in representations that overlook the dynamic attributes critical for modeling cardiovascular diseases within a confined subject cohort. Introducing Distilled Encoding Beyond Similarities (DEBS), this paper pioneers an SSL approach that transcends mere similarities by integrating dissimilarities among positive pairs. The framework is applied to electrocardiogram (ECG) signals, leading to a notable enhancement of +10\% in the detection accuracy of Atrial Fibrillation (AFib) across diverse subjects. DEBS underscores the potential of attaining a more refined representation by encoding the dynamic characteristics of time series data, tapping into dissimilarities during the optimization process. Broadly, the strategy delineated in this study holds the promise of unearthing novel avenues for advancing SSL methodologies tailored to temporal data.
    摘要 通过识别相邻输入之间的相似性，面向时间序列分析的自监督学习（SSL）方法已被证明能够有效编码时间数据中固有的静态特征。然而，只强调相似性可能使得到的表示忽略了在有限受试者群体中建模心血管疾病所需的动态属性。本文提出 DEBS（Distilled Encoding Beyond Similarities），率先在 SSL 中超越单纯的相似性，将正样本对之间的差异性纳入学习。将该框架应用于心电图（ECG）信号后，不同受试者上心房颤动（AFib）的检测精度显著提升约 10%。DEBS 表明，通过在优化过程中利用差异性来编码时间序列数据的动态特征，可以获得更精细的表示。总体而言，本研究所述策略有望为面向时间数据的 SSL 方法开辟新的途径。
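A schematic PyTorch sketch of a positive-pair contrastive term plus a penalty that keeps positive embeddings from collapsing onto identical vectors. The added penalty is only a stand-in for the "dissimilarities between positive pairs" idea; the actual DEBS objective is not specified in the abstract, so everything below is an assumption for illustration.

```python
# Schematic sketch: InfoNCE-style positive-pair term plus a penalty that discourages positive
# embeddings from becoming identical -- a stand-in for encoding dissimilarities between
# positive pairs. This is NOT the exact DEBS objective.
import torch
import torch.nn.functional as F

def contrastive_with_dissimilarity(z1, z2, temperature=0.1, dissim_weight=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature              # batch x batch similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    infonce = F.cross_entropy(logits, targets)      # pull matched pairs together
    pos_sim = (z1 * z2).sum(dim=1)                  # cosine similarity of each positive pair
    dissim_penalty = F.relu(pos_sim - 0.9).mean()   # penalize pairs that are *too* similar
    return infonce + dissim_weight * dissim_penalty

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(contrastive_with_dissimilarity(z1, z2))
```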

Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering

  • paper_url: http://arxiv.org/abs/2309.07486
  • repo_url: None
  • paper_authors: Sepideh Aghamolaei, Mohammad Ghodsi
  • for: 本文研究带有 $k$ 个标签的点集的热图排序问题，即在保持聚类（标签）不被拆分或合并的前提下重排并合并点与维度。
  • methods: 证明该问题是 NP 难的，并在大规模并行计算模型下给出一个常数轮数的固定参数算法；同时针对一个 NP 难的特殊情形给出近似算法，实验中借助局部敏感哈希进行降维。
  • results: 在多个电子邮件网络与计算机网络的有向和无向图上，与 k-means 和 DBSCAN 在聚类质量与运行时间方面进行了实验比较。
    Abstract Given a set of points labeled with $k$ labels, we introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. We prove the problem is NP-hard and we give a fixed-parameter algorithm with a constant number of rounds in the massively parallel computation model, where each machine has a sublinear memory and the total memory of the machines is linear. We give an approximation algorithm for a NP-hard special case of the problem. We empirically compare our algorithm with k-means and density-based clustering (DBSCAN) using a dimensionality reduction via locality-sensitive hashing on several directed and undirected graphs of email and computer networks.
    摘要 给定一个带有 $k$ 个标签的点集，我们提出热图排序问题：在保持聚类（标签）的前提下，对点和维度进行重排与合并。若一个聚类仍保持连通，即既未被拆分成多个聚类、也没有两个聚类被合并，则称该聚类被保持。我们证明该问题是 NP 难的，并在大规模并行计算模型（每台机器内存为次线性、总内存为线性）下给出一个常数轮数的固定参数算法；同时为该问题的一个 NP 难特殊情形给出近似算法。我们借助局部敏感哈希进行降维，在多个电子邮件与计算机网络的有向和无向图上，将所提算法与 k-means 和基于密度的聚类（DBSCAN）进行了实验比较。
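A sketch of the baseline pipeline the abstract compares against: a random-projection (LSH-style) dimensionality reduction followed by k-means and DBSCAN. The projection family, dataset, and parameter values are assumptions for illustration.

```python
# Sketch of the baseline comparison mentioned in the abstract: reduce dimensionality with a
# random-projection (LSH-style) map, then cluster with k-means and DBSCAN.
import numpy as np
from sklearn.random_projection import SparseRandomProjection
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, size=(100, 50)) for c in (0.0, 4.0, 8.0)])
labels = np.repeat([0, 1, 2], 100)

X_low = SparseRandomProjection(n_components=8, random_state=0).fit_transform(X)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_low)
db = DBSCAN(eps=3.0, min_samples=5).fit_predict(X_low)
print("k-means ARI:", adjusted_rand_score(labels, km))
print("DBSCAN  ARI:", adjusted_rand_score(labels, db))
```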

Improved Auto-Encoding using Deterministic Projected Belief Networks

  • paper_url: http://arxiv.org/abs/2309.07481
  • repo_url: None
  • paper_authors: Paul M Baggenstoss
  • for: 本研究利用决定性投影信念网络(D-PBN)的特有特性,全面利用可训练复合启动函数(TCAs)。
  • methods: 本研究使用D-PBN auto-encoder,并利用TCAs作为活化函数。
  • results: 研究发现,使用D-PBN auto-encoder和TCAs可以明显超越标准 auto-encoder,包括变量 auto-encoder。
    Abstract In this paper, we exploit the unique properties of a deterministic projected belief network (D-PBN) to take full advantage of trainable compound activation functions (TCAs). A D-PBN is a type of auto-encoder that operates by "backing up" through a feed-forward neural network. TCAs are activation functions with complex monotonic-increasing shapes that change the distribution of the data so that the linear transformation that follows is more effective. Because a D-PBN operates by "backing up", the TCAs are inverted in the reconstruction process, restoring the original distribution of the data, thus taking advantage of a given TCA in both analysis and reconstruction. In this paper, we show that a D-PBN auto-encoder with TCAs can significantly out-perform standard auto-encoders including variational auto-encoders.
    摘要 本文利用确定性投影信念网络（D-PBN）的独特性质，充分发挥可训练复合激活函数（TCA）的作用。D-PBN 是一类自编码器，其工作方式是沿前馈神经网络“反向回溯”。TCA 是具有复杂单调递增形状的激活函数，能够改变数据的分布，使其后的线性变换更为有效。由于 D-PBN 以“反向回溯”的方式运行，TCA 在重建过程中被取逆，从而恢复数据的原始分布，因此同一个 TCA 在分析和重建两个阶段都能发挥作用。本文表明，带有 TCA 的 D-PBN 自编码器可以显著优于包括变分自编码器在内的标准自编码器。

SC-MAD: Mixtures of Higher-order Networks for Data Augmentation

  • paper_url: http://arxiv.org/abs/2309.07453
  • repo_url: None
  • paper_authors: Madeline Navarro, Santiago Segarra
  • for: 这个论文旨在扩展基于图的对等连接到高阶关系上,以满足复杂系统的研究需求。
  • methods: 该论文提出了一种基于 simplicial complex 的数据增强方法,包括线性和非线性混合机制,以生成混合样本。此外,它还提出了一种几何归一化混合方法来实现数据集之间的关系。
  • results: 研究人员通过对实验数据集进行混合,并对混合样本进行分类,发现混合样本可以在Homomorphism densities上 interpolate among existing data。
    Abstract The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
    摘要 大量具有多方交互的复杂系统促使我们将基于图的成对连接推广到高阶关系。特别地，单纯复形（simplicial complex）启发了将图神经网络（GNN）推广到基于单纯复形的模型。在此类系统上学习需要大量数据，而这些数据可能代价高昂甚至无法获得。我们提出通过线性与非线性 mixup 机制对单纯复形进行数据增强，返回已有带标签样本的混合。除传统的成对 mixup 外，我们还提出一种凸聚类 mixup 方法，以数据驱动的方式刻画多个单纯复形之间的关系。我们在理论上证明，所生成的合成单纯复形在同态密度（homomorphism density）意义下对已有数据进行插值。我们在合成数据集和真实数据集上的单纯复形分类任务中验证了该方法。
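A short sketch of plain pairwise mixup on labeled samples, i.e. the linear mixing mechanism the abstract builds on. The simplicial-complex-aware and convex-clustering variants are not reproduced; feature shapes and the Beta parameter are assumptions.

```python
# Sketch of pairwise mixup on labeled samples -- the linear mixing mechanism referenced in the
# abstract; the simplicial-complex and convex-clustering variants are not reproduced here.
import numpy as np

def pairwise_mixup(X, y_onehot, alpha=0.2, rng=None):
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)                    # mixing coefficient
    perm = rng.permutation(len(X))
    X_mix = lam * X + (1.0 - lam) * X[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return X_mix, y_mix

X = np.random.default_rng(0).standard_normal((8, 16))   # e.g. features of 8 complexes
y = np.eye(2)[[0, 0, 0, 0, 1, 1, 1, 1]]
X_mix, y_mix = pairwise_mixup(X, y)
print(X_mix.shape, y_mix[:2])
```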

Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?

  • paper_url: http://arxiv.org/abs/2309.07452
  • repo_url: None
  • paper_authors: Lianke Qin, Zhao Song, Baocheng Sun
  • for: 本研究的目的是证明NTK regression和GNN Training是等价的。
  • methods: 本文使用NTK kernel方法来研究图学习,并提出了三个新的理论结论。
  • results: 本文证明了NTK regression和GNN Training是等价的,并提供了首个NTK formulation for node-level regression。
    Abstract A rising trend in theoretical deep learning is to understand why deep learning works through Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely-wide neural network. NTK is a major step forward in the theoretical deep learning because it allows researchers to use traditional mathematical tools to analyze properties of deep neural networks and to explain various neural network techniques from a theoretical view. A natural extension of NTK on graph learning is \textit{Graph Neural Tangent Kernel (GNTK)}, and researchers have already provide GNTK formulation for graph-level regression and show empirically that this kernel method can achieve similar accuracy as GNNs on various bioinformatics datasets [dhs+19]. The remaining question now is whether solving GNTK regression is equivalent to training an infinite-wide multi-layer GNN using gradient descent. In this paper, we provide three new theoretical results. First, we formally prove this equivalence for graph-level regression. Second, we present the first GNTK formulation for node-level regression. Finally, we prove the equivalence for node-level regression.
    摘要 理论深度学习中一个新兴的方向是借助神经正切核（NTK）[jgh18] 来理解深度学习为何有效；NTK 是一种核方法，等价于用梯度下降训练一个无限宽的多层神经网络。NTK 是理论深度学习的一大进步，它使研究者能够用传统数学工具分析深度神经网络的性质，并从理论角度解释各种神经网络技术。NTK 在图学习上的自然推广是图神经正切核（GNTK）。已有工作给出了图级回归的 GNTK 形式，并在多个生物信息学数据集上实证表明该核方法可以达到与 GNN 相当的精度 [dhs+19]。剩下的问题是：求解 GNTK 回归是否等价于用梯度下降训练一个无限宽的多层 GNN。本文给出三个新的理论结果：第一，我们对图级回归正式证明了这一等价性；第二，我们给出了首个面向节点级回归的 GNTK 形式；第三，我们证明了节点级回归情形下的等价性。
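A sketch of what "solving (G)NTK regression" means operationally: kernel ridge regression with a precomputed NTK/GNTK Gram matrix. The Gram matrix below is a random positive-definite stand-in; computing the actual GNTK is not shown.

```python
# Sketch of kernel ridge regression given a precomputed (G)NTK Gram matrix -- the
# "solve the kernel regression instead of training the network" side of the equivalence.
import numpy as np

def kernel_ridge_predict(K_train, y_train, K_test_train, ridge=1e-6):
    """K_train: (n, n) Gram matrix; K_test_train: (m, n) kernel between test and train points."""
    alpha = np.linalg.solve(K_train + ridge * np.eye(len(K_train)), y_train)
    return K_test_train @ alpha

rng = np.random.default_rng(0)
F = rng.standard_normal((60, 5))                 # placeholder features
K = F @ F.T + np.eye(60)                         # stand-in positive-definite "NTK" Gram matrix
y = rng.standard_normal(60)
pred = kernel_ridge_predict(K[:50, :50], y[:50], K[50:, :50])
print(pred.shape)                                # predictions for the 10 held-out points
```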

TensorFlow Chaotic Prediction and Blow Up

  • paper_url: http://arxiv.org/abs/2309.07450
  • repo_url: None
  • paper_authors: M. Andrecut
  • for: 预测高维非线性系统的混沌动力学行为
  • methods: 使用TensorFlow库进行深度神经网络训练和预测
  • results: 短时预测能够得到有效结果,但长时预测会因TensorFlow库的非束定性导致结果快速衰减和爆炸
    Abstract Predicting the dynamics of chaotic systems is one of the most challenging tasks for neural networks, and machine learning in general. Here we aim to predict the spatiotemporal chaotic dynamics of a high-dimensional non-linear system. In our attempt we use the TensorFlow library, representing the state of the art for deep neural networks training and prediction. While our results are encouraging, and show that the dynamics of the considered system can be predicted for short time, we also indirectly discovered an unexpected and undesirable behavior of the TensorFlow library. More specifically, the longer term prediction of the system's chaotic behavior quickly deteriorates and blows up due to the nondeterministic behavior of the TensorFlow library. Here we provide numerical evidence of the short time prediction ability, and of the longer term predictability blow up.
    摘要 预测混沌系统的动力学行为是神经网络乃至整个机器学习领域中最具挑战性的任务之一。本文尝试预测一个高维非线性系统的时空混沌动力学。我们使用 TensorFlow 库，它代表了深度神经网络训练与预测的最新水平。我们的结果令人鼓舞，表明所考虑系统的动力学可以在短时间内被预测；但同时我们也间接发现了 TensorFlow 库一个意料之外且不受欢迎的行为：由于该库的非确定性行为，对系统混沌行为的较长期预测会迅速恶化并发散（blow up）。本文给出了短时预测能力以及长期可预测性发散的数值证据。
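If the divergence traces back to run-to-run nondeterminism in GPU kernels, recent TensorFlow versions (2.8+) expose switches that request deterministic behavior; a short sketch follows. Whether these switches remove the behavior reported in the paper is not claimed here, and the toy model is an assumption for illustration.

```python
# Sketch: requesting deterministic behavior in TensorFlow (available in TF >= 2.8). This only
# removes one source of run-to-run nondeterminism; it is not claimed to fix the blow-up above.
import tensorflow as tf

tf.keras.utils.set_random_seed(42)                 # seeds Python, NumPy and TF RNGs
tf.config.experimental.enable_op_determinism()     # ask for deterministic op kernels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="tanh"),
    tf.keras.layers.Dense(3),
])
model.compile(optimizer="adam", loss="mse")
# Closed-loop (autoregressive) prediction feeds each output back as the next input, so any
# tiny numerical discrepancy between runs is amplified by the chaotic dynamics.
```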

A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time

  • paper_url: http://arxiv.org/abs/2309.07418
  • repo_url: None
  • paper_authors: Yeqi Gao, Zhao Song, Weixin Wang, Junze Yin
  • for: 这个论文主要针对的是优化大型语言模型(LLM)中的注意力 regression 问题。
  • methods: 作者提出了一种迭代贪心算法（iterative greedy algorithm），用于训练损失函数 $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$，该损失函数是单层注意力网络的目标函数。
  • results: 作者给出的训练算法可在 $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ 时间内将损失 $L(X,Y)$ 降至 $\epsilon$ 以内。该算法可以应用于大型语言模型。
    Abstract Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is the Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, and $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$ and $b_{j_0,i_0} \in \mathbb{R}$ is the entry at the $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-th column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train the loss function $L(X,Y)$ up to $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying an $a \times b$ matrix by another $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.
    摘要 大型语言模型（LLM）在变革我们日常生活的诸多方面中发挥了关键作用。求解注意力回归是优化 LLM 的一项基础任务。本文致力于为单层注意力网络的目标函数 $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$ 给出可证明的保证。其中 $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ 是 $A_1 \in \mathbb{R}^{n \times d}$ 与 $A_2 \in \mathbb{R}^{n \times d}$ 的 Kronecker 积；$A_3 \in \mathbb{R}^{n \times d}$；$\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ 是 $\mathsf{A}$ 的第 $j_0$ 个块；$X, Y \in \mathbb{R}^{d \times d}$ 是待学习的变量；$B \in \mathbb{R}^{n \times d}$，$b_{j_0,i_0} \in \mathbb{R}$ 是 $B$ 第 $j_0$ 行第 $i_0$ 列的元素，$Y_{*,i_0} \in \mathbb{R}^d$ 是 $Y$ 的第 $i_0$ 列向量，$x \in \mathbb{R}^{d^2}$ 是 $X$ 的向量化。在多层 LLM 网络中，矩阵 $B$ 可视为某一层的输出，$A_1= A_2 = A_3$ 可视为该层的输入；$x$ 的矩阵形式可视为 $QK^\top$，$Y$ 可视为 $V$。我们给出一种迭代贪心算法，可在 $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ 时间内将损失函数 $L(X,Y)$ 训练至 $\epsilon$ 以内。其中 ${\cal T}_{\mathrm{mat}}(a,b,c)$ 表示 $a \times b$ 矩阵与 $b \times c$ 矩阵相乘所需的时间，$\omega\approx 2.37$ 为矩阵乘法指数。
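A direct NumPy evaluation of the stated loss $L(X,Y)$ on tiny random matrices, useful for checking how the blocks of the Kronecker product enter. The row-major flattening used for $x = \mathrm{vec}(X)$ is an assumption; only the loss value is computed, not the paper's iterative greedy training algorithm.

```python
# Direct NumPy evaluation of the stated loss L(X, Y) on tiny random matrices. The row-major
# flattening for x = vec(X) is an assumption; the training algorithm is not reproduced.
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 3
A1, A2, A3 = (rng.standard_normal((n, d)) for _ in range(3))
B = rng.standard_normal((n, d))
X, Y = rng.standard_normal((d, d)), rng.standard_normal((d, d))

A = np.kron(A1, A2)                                # (n^2, d^2)
x = X.reshape(-1)                                  # vec(X), row-major by assumption

def attention_loss(x_vec, Y):
    loss = 0.0
    for j0 in range(n):
        Aj0 = A[j0 * n:(j0 + 1) * n, :]            # j0-th (n, d^2) block of A
        u = np.exp(Aj0 @ x_vec)
        s = u / u.sum()                            # softmax-normalized attention vector
        for i0 in range(d):
            loss += (s @ (A3 @ Y[:, i0]) - B[j0, i0]) ** 2
    return loss

print(attention_loss(x, Y))
```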

Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy

  • paper_url: http://arxiv.org/abs/2309.07402
  • repo_url: None
  • paper_authors: Jiaren Xiao, Quanyu Dai, Xiao Shen, Xiaochen Xie, Jing Dai, James Lam, Ka-Wai Kwok
  • for: 本研究旨在解决图上数据标注成本高、标签稀缺的问题，采用半监督领域自适应（SSDA）技术来改进目标图中节点的分类。
  • methods: 本研究提出了名为 SemiGCL 的新方法，它通过对比图的局部视图与全局视图所学到的节点表示来生成有信息量的节点表示；此外，SemiGCL 还利用无标签目标节点的熵损失进行对抗优化，以缩小领域差异。
  • results: 实验结果表明，SemiGCL 在 SSDA 任务上优于现有的基线方法。
    Abstract Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid in node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs. However, to date, this challenging research problem has yet to be formally considered by the existing approaches designed for cross-graph node classification. To tackle the SSDA problem on graphs, a novel method called SemiGCL is proposed, which benefits from graph contrastive learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms the state-of-the-art baselines on the SSDA tasks.
    摘要 由于数据标注成本高昂，图上的标签稀缺问题在实际应用中十分常见。为此，图上的半监督领域自适应（SSDA）旨在利用带标签的源图知识，辅助标签有限的目标图上的节点分类。SSDA 任务需要克服源图与目标图之间的领域差异。然而，现有面向跨图节点分类的方法尚未正式考虑这一具有挑战性的研究问题。为解决图上的 SSDA 问题，我们提出一种名为 SemiGCL 的新方法，它结合了图对比学习与极小极大熵训练。SemiGCL 通过对比从图的局部视图与全局视图学到的表示来生成有信息量的节点表示；同时，SemiGCL 以无标签目标节点的熵损失进行对抗优化，以缩小领域差异。基准数据集上的实验结果表明，SemiGCL 在 SSDA 任务上优于现有最先进的基线方法。
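A sketch of the minimax-entropy ingredient: the entropy of class predictions on unlabeled target nodes, which one part of the model minimizes while another maximizes. How SemiGCL alternates the two players and combines this with graph contrastive learning is not reproduced here; shapes are assumptions.

```python
# Sketch of the entropy term used in minimax-entropy adaptation: entropy of class predictions
# on unlabeled target nodes. SemiGCL's full alternating optimization is not reproduced.
import torch
import torch.nn.functional as F

def target_entropy(logits_unlabeled):
    p = F.softmax(logits_unlabeled, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

logits = torch.randn(128, 5)            # predictions for 128 unlabeled target nodes, 5 classes
h = target_entropy(logits)
print(h)                                 # one player minimizes this term, the other maximizes it
```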

Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

  • paper_url: http://arxiv.org/abs/2309.07937
  • repo_url: None
  • paper_authors: Soumi Maiti, Yifan Peng, Shukjae Choi, Jee-weon Jung, Xuankai Chang, Shinji Watanabe
  • for: 本研究提出了一种基于语音特征的多任务语言模型(VoxtLM),可以执行四项任务:语音识别、语音生成、文本生成和语音续写。
  • methods: VoxtLM 将文本词表与来自自监督语音特征的离散语音 token 结合，并使用特殊 token 实现多任务学习。与单任务模型相比，VoxtLM 在语音合成上有显著改进：语音可懂度从 28.9 改善至 5.6，客观质量从 2.68 提升至 3.90。
  • results: VoxtLM 在语音生成、语音识别和文本生成等方面均优于单任务模型。模型使用公开数据和训练配方，并将开源模型检查点，以便完全复现。
    Abstract We propose a decoder-only language model, \textit{VoxtLM}, that can perform four tasks: speech recognition, speech synthesis, text generation, and speech continuation. VoxtLM integrates text vocabulary with discrete speech tokens from self-supervised speech features and uses special tokens to enable multitask learning. Compared to a single-task model, VoxtLM exhibits a significant improvement in speech synthesis, with improvements in both speech intelligibility from 28.9 to 5.6 and objective quality from 2.68 to 3.90. VoxtLM also improves speech generation and speech recognition performance over the single-task counterpart. VoxtLM is trained with publicly available data and training recipes and model checkpoints will be open-sourced to make fully reproducible work.
    摘要 我们提出一种仅含解码器的语言模型 VoxtLM，可以执行四项任务：语音识别、语音合成、文本生成和语音续写。VoxtLM 将文本词表与来自自监督语音特征的离散语音 token 相结合，并使用特殊 token 实现多任务学习。与单任务模型相比，VoxtLM 在语音合成上有显著改进：语音可懂度从 28.9 改善至 5.6，客观质量从 2.68 提升至 3.90；同时在语音生成和语音识别上也优于单任务模型。VoxtLM 使用公开可得的数据和训练配方进行训练，模型检查点将开源，以保证工作完全可复现。
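A sketch of how a decoder-only multitask training sequence might be assembled from text tokens, discrete speech tokens, and task-specific special tokens. All special-token names and the sequence layout below are assumptions made for illustration; they are not VoxtLM's actual vocabulary or format.

```python
# Sketch of assembling decoder-only training sequences that mix text tokens, discrete speech
# tokens and task-specific special tokens. All special-token names here are assumptions.
SPECIAL = {"asr": "<asr>", "tts": "<tts>", "sep": "<sep>", "eos": "<eos>"}

def build_sequence(task, speech_tokens, text_tokens):
    if task == "asr":     # speech in, text out
        return [SPECIAL["asr"], *speech_tokens, SPECIAL["sep"], *text_tokens, SPECIAL["eos"]]
    if task == "tts":     # text in, speech out
        return [SPECIAL["tts"], *text_tokens, SPECIAL["sep"], *speech_tokens, SPECIAL["eos"]]
    raise ValueError(task)

speech = [f"<s{i}>" for i in (17, 4, 256, 91)]     # placeholder discrete speech units
text = ["hello", "world"]
print(build_sequence("asr", speech, text))
```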

EnCodecMAE: Leveraging neural codecs for universal audio representation learning

  • paper_url: http://arxiv.org/abs/2309.07391
  • repo_url: https://github.com/habla-liaa/encodecmae
  • paper_authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer
  • for: 这项研究的目的是学习一种通用的音频表示学习模型,可以用于各种音频处理任务,如语音、音乐和环境声。
  • methods: 这项研究借鉴了 NLP 中的自监督模型（如 BERT）并将其适配到音频。这类模型依赖文本的离散性，因此将其用于音频处理时，需要改变学习目标或将音频信号映射到一组离散类别。本研究使用 EnCodec 神经音频编解码器生成离散目标，并用掩码自编码器（MAE）学习通用音频模型。
  • results: 研究在各种音频任务中表现出色,包括语音、音乐和环境声,并与现有音频表示模型的性能相比或更好。
    Abstract The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music or environmental sounds. To approach this problem, methods inspired by self-supervised models from NLP, like BERT, are often used and adapted to audio. These models rely on the discrete nature of text, hence adopting this type of approach for audio processing requires either a change in the learning objective or mapping the audio signal to a set of discrete classes. In this work, we explore the use of EnCodec, a neural audio codec, to generate discrete targets for learning an universal audio model based on a masked autoencoder (MAE). We evaluate this approach, which we call EncodecMAE, on a wide range of audio tasks spanning speech, music and environmental sounds, achieving performances comparable or better than leading audio representation models.
    摘要 通用音频表示学习的目标是获得可用于语音、音乐或环境声等多种下游任务的基础模型。为解决这一问题，人们通常借鉴 NLP 中的自监督模型（如 BERT）并将其适配到音频。这类模型依赖文本的离散性，因此要将这种方法用于音频处理，要么改变学习目标，要么将音频信号映射到一组离散类别。在本工作中，我们探索使用神经音频编解码器 EnCodec 生成离散目标，并基于掩码自编码器（MAE）学习通用音频模型。我们将这种方法称为 EncodecMAE，并在涵盖语音、音乐和环境声的多种音频任务上进行评估，其性能与领先的音频表示模型相当或更好。

Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.07383
  • repo_url: None
  • paper_authors: Ali Bouland, Shengyuan Niu, Sai Tej Paruchuri, Andrew Kurdila, John Burns, Eugenio Schuster
  • for: 研究了一些值函数近似的收敛率,其出现在一组嵌入kernel空间(RKHS)$H(\Omega)$中的优化控制问题中。
  • methods: 通过将优化控制问题划入特定的本地空间中, derive strong rates of convergence for the operator equation that enables offline approximations in policy iteration。
  • results: derive explicit upper bounds on error in value function approximations in terms of power function $\Pwr_{H,N}$ for the space of finite dimensional approximants $H_N$ in the native space $H(\Omega)$. These bounds are geometric in nature and refine some well-known, now classical results concerning convergence of approximations of value functions.
    Abstract This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$. By casting an optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the operator equation that enables offline approximations that appear in policy iteration. Explicit upper bounds on error in value function approximations are derived in terms of power function $\Pwr_{H,N}$ for the space of finite dimensional approximants $H_N$ in the native space $H(\Omega)$. These bounds are geometric in nature and refine some well-known, now classical results concerning convergence of approximations of value functions.
    摘要 本文研究在一类再生核希尔伯特空间（RKHS）$H(\Omega)$ 中出现的若干值函数近似的收敛速率。通过将最优控制问题置于特定的本征空间（native space）中，我们为支撑策略迭代中离线近似的算子方程导出了较强的收敛速率，并以本征空间 $H(\Omega)$ 中有限维近似子空间 $H_N$ 的幂函数 $\Pwr_{H,N}$ 给出了值函数近似误差的显式上界。这些上界具有几何性质，改进了一些关于值函数近似收敛性的经典结果。
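For reference, the standard pointwise power-function bound from RKHS interpolation theory that this type of result refines, written in the abstract's notation. The paper's precise statements and rates may differ; this is only the generic bound the error estimates are expressed in terms of.

```latex
% Generic power-function bound from RKHS interpolation theory, in the abstract's notation
% (the paper's exact statements and rates may differ): for the value function V in the
% native space H(\Omega) and its approximation V_N in the finite-dimensional subspace H_N,
\[
  \bigl| V(x) - V_N(x) \bigr| \;\le\; \Pwr_{H,N}(x)\, \| V \|_{H(\Omega)}
  \qquad \text{for all } x \in \Omega .
\]
```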

Beta quantile regression for robust estimation of uncertainty in the presence of outliers

  • paper_url: http://arxiv.org/abs/2309.07374
  • repo_url: None
  • paper_authors: Haleh Akrami, Omar Zamzam, Anand Joshi, Sergul Aydore, Richard Leahy
  • For: The paper is written for estimating aleatoric uncertainty in deep neural networks and generating prediction intervals, with a focus on critical applications such as clinical diagnosis.
  • Methods: The paper proposes a robust solution for quantile regression that incorporates concepts from robust divergence, and compares its performance with two existing methods (least trimmed quantile regression and robust regression based on case-specific parameter regularization) on a simple real dataset and a medical imaging translation task using diffusion models.
  • Results: The proposed method is shown to be effective in addressing the problem of outlier features in deep learning regression problems such as style translation, image reconstruction, and deep anomaly detection, and can provide more accurate and robust results than existing methods.
    Abstract Quantile Regression (QR) can be used to estimate aleatoric uncertainty in deep neural networks and can generate prediction intervals. Quantifying uncertainty is particularly important in critical applications such as clinical diagnosis, where a realistic assessment of uncertainty is essential in determining disease status and planning the appropriate treatment. The most common application of quantile regression models is in cases where the parametric likelihood cannot be specified. Although quantile regression is quite robust to outlier response observations, it can be sensitive to outlier covariate observations (features). Outlier features can compromise the performance of deep learning regression problems such as style translation, image reconstruction, and deep anomaly detection, potentially leading to misleading conclusions. To address this problem, we propose a robust solution for quantile regression that incorporates concepts from robust divergence. We compare the performance of our proposed method with (i) least trimmed quantile regression and (ii) robust regression based on the regularization of case-specific parameters in a simple real dataset in the presence of outlier. These methods have not been applied in a deep learning framework. We also demonstrate the applicability of the proposed method by applying it to a medical imaging translation task using diffusion models.
    摘要 分位数回归（QR）可用于估计深度神经网络中的偶然不确定性（aleatoric uncertainty），并生成预测区间。在临床诊断等关键应用中，量化不确定性尤为重要：对不确定性的真实评估是判断疾病状态和规划相应治疗的关键。分位数回归模型最常见的应用场景是无法指定参数化似然的情形。虽然分位数回归对响应变量中的离群观测相当稳健，但它可能对协变量（特征）中的离群观测较为敏感。离群特征会损害风格迁移、图像重建和深度异常检测等深度学习回归问题的性能，可能导致误导性的结论。为解决这一问题，我们提出一种融合稳健散度（robust divergence）思想的稳健分位数回归方法。我们在一个含离群值的简单真实数据集上，将所提方法与（i）最小截断分位数回归和（ii）基于逐样本参数正则化的稳健回归进行比较；这些方法此前尚未在深度学习框架中应用。我们还将所提方法应用于基于扩散模型的医学影像转换任务，展示了其适用性。
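The standard pinball (quantile) loss that quantile regression minimizes, shown below as a short sketch; the paper's robust-divergence modification for outlier features is not reproduced, and the tensors are placeholders.

```python
# The standard pinball (quantile) loss minimized by quantile regression; the paper's
# robust-divergence modification for outlier features is not reproduced here.
import torch

def pinball_loss(pred, target, tau):
    err = target - pred
    return torch.mean(torch.maximum(tau * err, (tau - 1.0) * err))

pred, target = torch.randn(100), torch.randn(100)
print(pinball_loss(pred, target, tau=0.1))   # lower quantile
print(pinball_loss(pred, target, tau=0.9))   # upper quantile -> together, a prediction interval
```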

Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing

  • paper_url: http://arxiv.org/abs/2309.08477
  • repo_url: None
  • paper_authors: Hadar Szostak, Kobi Cohen
  • for: 本研究旨在解决多智能体主动假设检验（AHT）问题：多个智能体从环境中收集带噪观测，以识别正确的假设。
  • methods: 本研究采用深度多智能体强化学习（deep multi-agent reinforcement learning, MARL）方法求解 AHT 问题：每个智能体使用训练好的深度神经网络将自身状态映射为动作（采样规则或停止规则），以最小化贝叶斯风险。
  • results: 实验结果显示，智能体能够通过 MARLA 学习协作策略并提升性能，且优于单智能体学习方法。
    Abstract We consider a decentralized formulation of the active hypothesis testing (AHT) problem, where multiple agents gather noisy observations from the environment with the purpose of identifying the correct hypothesis. At each time step, agents have the option to select a sampling action. These different actions result in observations drawn from various distributions, each associated with a specific hypothesis. The agents collaborate to accomplish the task, where message exchanges between agents are allowed over a rate-limited communications channel. The objective is to devise a multi-agent policy that minimizes the Bayes risk. This risk comprises both the cost of sampling and the joint terminal cost incurred by the agents upon making a hypothesis declaration. Deriving optimal structured policies for AHT problems is generally mathematically intractable, even in the context of a single agent. As a result, recent efforts have turned to deep learning methodologies to address these problems, which have exhibited significant success in single-agent learning scenarios. In this paper, we tackle the multi-agent AHT formulation by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning. This algorithm, named Multi-Agent Reinforcement Learning for AHT (MARLA), operates at each time step by having each agent map its state to an action (sampling rule or stopping rule) using a trained deep neural network with the goal of minimizing the Bayes risk. We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance using MARLA. Furthermore, we demonstrate the superiority of MARLA over single-agent learning approaches. Finally, we provide an open-source implementation of the MARLA framework, for the benefit of researchers and developers in related domains.
    摘要 我们考虑主动假设检验（AHT）问题的一种去中心化形式：多个智能体从环境中收集带噪观测，目的是识别正确的假设。在每个时间步，智能体可以选择一个采样动作；不同的动作会得到服从不同分布的观测，每个分布对应一个特定的假设。智能体相互协作完成任务，并可通过速率受限的通信信道交换消息。目标是设计一种多智能体策略，使贝叶斯风险最小化；该风险既包括采样代价，也包括智能体宣布假设时产生的联合终端代价。即使在单智能体情形下，为 AHT 问题推导最优的结构化策略在数学上通常也是不可处理的，因此近期的研究转向深度学习方法，这类方法在单智能体学习场景中已取得显著成功。本文针对多智能体 AHT 形式，提出一种基于深度多智能体强化学习框架的新算法，称为 MARLA（Multi-Agent Reinforcement Learning for AHT）：在每个时间步，每个智能体使用训练好的深度神经网络将其状态映射为动作（采样规则或停止规则），以最小化贝叶斯风险。我们给出了一组完整的实验结果，展示了智能体通过 MARLA 学习协作策略并提升性能的能力，并证明 MARLA 优于单智能体学习方法。最后，我们提供了 MARLA 框架的开源实现，以供相关领域的研究者和开发者使用。