cs.LG - 2023-11-07

Device Sampling and Resource Optimization for Federated Learning in Cooperative Edge Networks

  • paper_url: http://arxiv.org/abs/2311.04350
  • repo_url: None
  • paper_authors: Su Wang, Roberto Morabito, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton
  • for: Improve federated learning (FedL) training accuracy while accounting for the heterogeneous computation/communication resources and overlapping local data distributions found in contemporary wireless networks.
  • methods: A novel optimization methodology that combines intelligent device sampling with device-to-device (D2D) offloading, selecting the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption (a minimal sketch of weighted device sampling follows this entry).
  • results: Theoretical analysis of the D2D offloading subproblem yields new FedL convergence bounds and an efficient sequential convex optimizer; a sampling method based on graph convolutional networks (GCNs) is developed to maximize FedL accuracy. Experiments show the methodology outperforms popular device sampling methods from the literature in ML model performance, data processing overhead, and energy consumption.
    Abstract The conventional federated learning (FedL) architecture distributes machine learning (ML) across worker devices by having them train local models that are periodically aggregated by a server. FedL ignores two important characteristics of contemporary wireless networks, however: (i) the network may contain heterogeneous communication/computation resources, and (ii) there may be significant overlaps in devices' local data distributions. In this work, we develop a novel optimization methodology that jointly accounts for these factors via intelligent device sampling complemented by device-to-device (D2D) offloading. Our optimization methodology aims to select the best combination of sampled nodes and data offloading configuration to maximize FedL training accuracy while minimizing data processing and D2D communication resource consumption subject to realistic constraints on the network topology and device capabilities. Theoretical analysis of the D2D offloading subproblem leads to new FedL convergence bounds and an efficient sequential convex optimizer. Using these results, we develop a sampling methodology based on graph convolutional networks (GCNs) which learns the relationship between network attributes, sampled nodes, and D2D data offloading to maximize FedL accuracy. Through evaluation on popular datasets and real-world network measurements from our edge testbed, we find that our methodology outperforms popular device sampling methodologies from literature in terms of ML model performance, data processing overhead, and energy consumption.
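    To make the device-sampling idea concrete, the following toy sketch runs a FedAvg-style loop in which devices are drawn with non-uniform probabilities before each aggregation round. It is an illustration only: the sampling weights, the linear-regression task, and all function names are assumptions, and the paper's GCN-based sampler and D2D offloading are not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each device holds a local dataset for a shared linear model.
n_devices, dim, n_local = 20, 5, 50
true_w = rng.normal(size=dim)
devices = []
for _ in range(n_devices):
    X = rng.normal(size=(n_local, dim))
    y = X @ true_w + 0.1 * rng.normal(size=n_local)
    devices.append((X, y))

# Illustrative (non-GCN) sampling weights, e.g. proportional to assumed data "importance".
sample_probs = rng.uniform(0.5, 1.5, size=n_devices)
sample_probs /= sample_probs.sum()

def local_update(w, X, y, lr=0.05, steps=5):
    """A few local gradient steps on the device's squared-error loss."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(dim)
for rnd in range(30):
    # Sample a subset of devices according to the (assumed) sampling distribution.
    chosen = rng.choice(n_devices, size=5, replace=False, p=sample_probs)
    updates = [local_update(w_global.copy(), *devices[k]) for k in chosen]
    # Plain average of the sampled devices' models (FedAvg-style aggregation).
    w_global = np.mean(updates, axis=0)

print("distance to true model:", np.linalg.norm(w_global - true_w))
```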

InstrumentGen: Generating Sample-Based Musical Instruments From Text

  • paper_url: http://arxiv.org/abs/2311.04339
  • repo_url: None
  • paper_authors: Shahan Nercessian, Johannes Imort
  • for: Generate sample-based musical instruments from textual prompts (the text-to-instrument task).
  • methods: Proposes InstrumentGen, a model that extends a text-prompted generative audio framework to condition on instrument family, source type, pitch (across an 88-key spectrum), velocity, and a joint text/audio embedding.
  • results: The results establish a foundational text-to-instrument baseline, extending research on automatic sample-based instrument generation.
    Abstract We introduce the text-to-instrument task, which aims at generating sample-based musical instruments based on textual prompts. Accordingly, we propose InstrumentGen, a model that extends a text-prompted generative audio framework to condition on instrument family, source type, pitch (across an 88-key spectrum), velocity, and a joint text/audio embedding. Furthermore, we present a differentiable loss function to evaluate the intra-instrument timbral consistency of sample-based instruments. Our results establish a foundational text-to-instrument baseline, extending research in the domain of automatic sample-based instrument generation.

Convex Methods for Constrained Linear Bandits

  • paper_url: http://arxiv.org/abs/2311.04338
  • repo_url: None
  • paper_authors: Amirhossein Afsharrad, Ahmadreza Moradipari, Sanjay Lall
  • for: Real-world safety-critical systems that involve repeated interactions with humans, where bandit optimization must be implemented efficiently in practice.
  • methods: A framework that leverages convex programming tools to create computationally efficient policies for the safe linear bandit problem.
  • results: The properties of the optimal policy for the safe linear bandit problem are characterized, and an end-to-end pipeline of safe linear bandit algorithms is proposed that only involves solving convex problems; the methods are also evaluated numerically.
    Abstract Recently, bandit optimization has received significant attention in real-world safety-critical systems that involve repeated interactions with humans. While there exist various algorithms with performance guarantees in the literature, practical implementation of the algorithms has not received as much attention. This work presents a comprehensive study on the computational aspects of safe bandit algorithms, specifically safe linear bandits, by introducing a framework that leverages convex programming tools to create computationally efficient policies. In particular, we first characterize the properties of the optimal policy for safe linear bandit problem and then propose an end-to-end pipeline of safe linear bandit algorithms that only involves solving convex problems. We also numerically evaluate the performance of our proposed methods.

Lie Point Symmetry and Physics Informed Networks

  • paper_url: http://arxiv.org/abs/2311.04293
  • repo_url: None
  • paper_authors: Tara Akhound-Sadegh, Laurence Perreault-Levasseur, Johannes Brandstetter, Max Welling, Siamak Ravanbakhsh
  • for: Improve the generalization of neural networks so that they can better solve partial differential equations (PDEs).
  • methods: Integrates Lie point symmetries of PDEs into physics-informed neural networks (PINNs), a major family of neural PDE solvers.
  • results: A loss function is proposed that informs the network about Lie point symmetries, so that once the network learns a solution it also learns the neighbouring solutions generated by those symmetries; this inductive bias greatly boosts the sample efficiency of PINNs (a minimal sketch of a symmetry-augmented loss follows this entry).
    Abstract Symmetries have been leveraged to improve the generalization of neural networks through different mechanisms from data augmentation to equivariant architectures. However, despite their potential, their integration into neural solvers for partial differential equations (PDEs) remains largely unexplored. We explore the integration of PDE symmetries, known as Lie point symmetries, in a major family of neural solvers known as physics-informed neural networks (PINNs). We propose a loss function that informs the network about Lie point symmetries in the same way that PINN models try to enforce the underlying PDE through a loss function. Intuitively, our symmetry loss ensures that the infinitesimal generators of the Lie group conserve the PDE solutions. Effectively, this means that once the network learns a solution, it also learns the neighbouring solutions generated by Lie point symmetries. Empirical evaluations indicate that the inductive bias introduced by the Lie point symmetries of the PDEs greatly boosts the sample efficiency of PINNs.
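    As a rough illustration of a symmetry-augmented PINN loss, the sketch below trains a small network on the 1D heat equation and adds a term penalizing the derivative of the PDE residual along the space-translation generator (one of the equation's Lie point symmetries). This is a simplified stand-in for the paper's symmetry loss, not its exact formulation; the network size, the weighting factor, and the omission of boundary/initial conditions are all assumptions.

```python
import torch

torch.manual_seed(0)

# Small MLP u_theta(x, t) for the 1D heat equation u_t - u_xx = 0.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def pde_residual(x, t):
    """Residual r = u_t - u_xx at collocation points, via autograd."""
    u = net(torch.cat([x, t], dim=1))
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    return u_t - u_xx

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(200):
    x = torch.rand(256, 1, requires_grad=True)
    t = torch.rand(256, 1, requires_grad=True)
    r = pde_residual(x, t)
    loss_pde = (r ** 2).mean()
    # Symmetry term (illustrative): the heat equation admits space translations,
    # so applying the generator d/dx to the residual should also give zero on solutions.
    r_x = torch.autograd.grad(r.sum(), x, create_graph=True)[0]
    loss_sym = (r_x ** 2).mean()
    loss = loss_pde + 0.1 * loss_sym   # 0.1 is an assumed weighting; BC/IC losses omitted
    opt.zero_grad(); loss.backward(); opt.step()

print("final PDE loss:", float(loss_pde))
```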

Compilation of product-formula Hamiltonian simulation via reinforcement learning

  • paper_url: http://arxiv.org/abs/2311.04285
  • repo_url: https://github.com/leamarion/rl-for-compilation-of-product-formula-hamiltonian-simulation
  • paper_authors: Lea M. Trenkwalder, Eleanor Scerri, Thomas E. O’Brien, Vedran Dunjko
  • for: Hamiltonian simulation is one of the first tasks where quantum computers are expected to yield a quantum advantage.
  • methods: Trotterization relies on the approximation $e^{i\sum_j A_j} \sim \prod_j e^{iA_j}$ and higher-order corrections, but leaves open the order of operations (the order of the product over $j$), which is known to affect the quality of the approximation. In some cases this order is fixed by the desire to minimize the approximation error; when it is not, the authors propose choosing the order to optimize compilation to a native quantum architecture. This order-agnostic quantum circuit compilation problem is proven NP-hard in the worst case (a minimal ordering sketch follows this entry).
  • results: We compare three methods of heuristic optimization of compilation: simulated annealing, Monte Carlo tree search, and reinforcement learning. While two of the methods outperform a naive heuristic, reinforcement learning clearly outperforms all others, with a gain of around 12% with respect to the second-best method and of around 50% compared to the naive heuristic in terms of the gate count. We also test the ability of RL to generalize across instances of the compilation problem, and find that a single learner is able to solve entire problem families. This demonstrates the ability of machine learning techniques to provide assistance in an order-agnostic quantum compilation task.
    Abstract Hamiltonian simulation is believed to be one of the first tasks where quantum computers can yield a quantum advantage. One of the most popular methods of Hamiltonian simulation is Trotterization, which makes use of the approximation $e^{i\sum_jA_j}\sim \prod_je^{iA_j}$ and higher-order corrections thereto. However, this leaves open the question of the order of operations (i.e. the order of the product over $j$, which is known to affect the quality of approximation). In some cases this order is fixed by the desire to minimise the error of approximation; when it is not the case, we propose that the order can be chosen to optimize compilation to a native quantum architecture. This presents a new compilation problem -- order-agnostic quantum circuit compilation -- which we prove is NP-hard in the worst case. In lieu of an easily-computable exact solution, we turn to methods of heuristic optimization of compilation. We focus on reinforcement learning due to the sequential nature of the compilation task, comparing it to simulated annealing and Monte Carlo tree search. While two of the methods outperform a naive heuristic, reinforcement learning clearly outperforms all others, with a gain of around 12% with respect to the second-best method and of around 50% compared to the naive heuristic in terms of the gate count. We further test the ability of RL to generalize across instances of the compilation problem, and find that a single learner is able to solve entire problem families. This demonstrates the ability of machine learning techniques to provide assistance in an order-agnostic quantum compilation task.
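    The effect of term ordering in a first-order product formula can be seen in a few lines of NumPy/SciPy: the same set of non-commuting terms yields different approximation errors under different orderings (on hardware, different orderings would also compile to different gate counts, which is the quantity the paper optimizes). The matrices and orderings below are arbitrary illustrations.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)

def random_hermitian(d):
    m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (m + m.conj().T) / 2

# A handful of non-commuting Hamiltonian terms A_j.
d, n_terms = 8, 4
terms = [0.2 * random_hermitian(d) for _ in range(n_terms)]

exact = expm(1j * sum(terms))

def trotter(order):
    """First-order product formula prod_j exp(i A_j) in the given term order."""
    u = np.eye(d, dtype=complex)
    for j in order:
        u = expm(1j * terms[j]) @ u
    return u

for order in [(0, 1, 2, 3), (3, 1, 0, 2)]:
    err = np.linalg.norm(trotter(order) - exact, ord=2)
    print(order, "approximation error:", round(err, 4))
```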

Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems

  • paper_url: http://arxiv.org/abs/2311.04161
  • repo_url: None
  • paper_authors: Nikita Puchkin, Eduard Gorbunov, Nikolay Kutuzov, Alexander Gasnikov
  • for: Stochastic optimization problems with heavy-tailed noise with structured density.
  • methods: Stochastic gradients are stabilized using smoothed medians of means, yielding estimates with negligible bias and controllable variance that are incorporated into clipped-SGD and clipped-SSTM (a minimal robust-gradient sketch follows this entry).
  • results: When the stochastic gradients have finite moments of order $\alpha \in (1, 2]$ (and even when the noise norm has unbounded expectation), convergence rates faster than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$ are obtained, together with new high-probability complexity bounds.
    Abstract We consider stochastic optimization problems with heavy-tailed noise with structured density. For such problems, we show that it is possible to get faster rates of convergence than $\mathcal{O}(K^{-2(\alpha - 1)/\alpha})$, when the stochastic gradients have finite moments of order $\alpha \in (1, 2]$. In particular, our analysis allows the noise norm to have an unbounded expectation. To achieve these results, we stabilize stochastic gradients, using smoothed medians of means. We prove that the resulting estimates have negligible bias and controllable variance. This allows us to carefully incorporate them into clipped-SGD and clipped-SSTM and derive new high-probability complexity bounds in the considered setup.
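    A minimal sketch of the robust-gradient idea, assuming a plain (unsmoothed) median of block means inside a clipped SGD loop on a quadratic with heavy-tailed Student-t gradient noise; the paper's smoothed-median estimator, the clipped-SSTM variant, and the step-size/clipping choices are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def median_of_means(samples, n_blocks=5):
    """Median of block means: a robust (plain, not smoothed) gradient estimate."""
    blocks = np.array_split(samples, n_blocks)
    return np.median([b.mean(axis=0) for b in blocks], axis=0)

def clip(v, lam):
    norm = np.linalg.norm(v)
    return v if norm <= lam else v * (lam / norm)

# Minimise f(x) = 0.5 ||x||^2 with heavy-tailed (Student-t, df=1.5) gradient noise.
x = np.ones(10)
for k in range(500):
    noise = rng.standard_t(df=1.5, size=(25, 10))   # moments finite only up to order < 1.5
    grad_samples = x + noise                         # noisy copies of the true gradient x
    g = clip(median_of_means(grad_samples), lam=2.0)
    x = x - 0.05 * g

print("final distance to optimum:", np.linalg.norm(x))
```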

Computing Approximate $\ell_p$ Sensitivities

  • paper_url: http://arxiv.org/abs/2311.04158
  • repo_url: None
  • paper_authors: Swati Padmanabhan, David P. Woodruff, Qiuyi Zhang
  • for: Sensitivity-based dimensionality reduction for regression, which offers provable guarantees on approximation quality after removing low-sensitivity datapoints via subsampling.
  • methods: Efficient algorithms for approximating $\ell_p$ sensitivities and related summary statistics, including an $\alpha$-approximation of the $\ell_1$ sensitivities at the cost of $O(n/\alpha)$ sensitivity computations and an importance-sampling algorithm based on $\ell_p$ Lewis weights for estimating the total $\ell_p$ sensitivity (a leverage-score sketch for the classical $\ell_2$ case follows this entry).
  • results: Experiments show that for a wide class of real-world matrices the total sensitivity can be quickly approximated and is significantly smaller than the theoretical prediction, indicating that real-world datasets have low intrinsic effective dimensionality.
    Abstract Recent works in dimensionality reduction for regression tasks have introduced the notion of sensitivity, an estimate of the importance of a specific datapoint in a dataset, offering provable guarantees on the quality of the approximation after removing low-sensitivity datapoints via subsampling. However, fast algorithms for approximating $\ell_p$ sensitivities, which we show is equivalent to approximate $\ell_p$ regression, are known for only the $\ell_2$ setting, in which they are termed leverage scores. In this work, we provide efficient algorithms for approximating $\ell_p$ sensitivities and related summary statistics of a given matrix. In particular, for a given $n \times d$ matrix, we compute $\alpha$-approximation to its $\ell_1$ sensitivities at the cost of $O(n/\alpha)$ sensitivity computations. For estimating the total $\ell_p$ sensitivity (i.e. the sum of $\ell_p$ sensitivities), we provide an algorithm based on importance sampling of $\ell_p$ Lewis weights, which computes a constant factor approximation to the total sensitivity at the cost of roughly $O(\sqrt{d})$ sensitivity computations. Furthermore, we estimate the maximum $\ell_1$ sensitivity, up to a $\sqrt{d}$ factor, using $O(d)$ sensitivity computations. We generalize all these results to $\ell_p$ norms for $p > 1$. Lastly, we experimentally show that for a wide class of matrices in real-world datasets, the total sensitivity can be quickly approximated and is significantly smaller than the theoretical prediction, demonstrating that real-world datasets have low intrinsic effective dimensionality.
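    For orientation, the classical $\ell_2$ case can be computed exactly: $\ell_2$ sensitivities coincide with statistical leverage scores, obtainable from a thin QR factorization. The sketch below shows only this known baseline, not the paper's new $\ell_p$ algorithms or Lewis-weight sampling.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 10))

# l2 sensitivities equal statistical leverage scores:
# s_i = a_i^T (A^T A)^{-1} a_i, computed here via a thin QR factorisation.
Q, _ = np.linalg.qr(A)               # columns of Q span the column space of A
leverage = np.sum(Q ** 2, axis=1)    # squared row norms of Q

print("total l2 sensitivity (= rank):", leverage.sum())   # equals d = 10 here
print("max leverage score:", leverage.max())
```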

Kernel-, mean- and noise-marginalised Gaussian processes for exoplanet transits and $H_0$ inference

  • paper_url: http://arxiv.org/abs/2311.04153
  • repo_url: https://github.com/zwei-beiner/transdimensional_sampler
  • paper_authors: Namu Kroupa, David Yallup, Will Handley, Michael Hobson
  • for: Extend Gaussian Process regression to marginalise over the kernel choice and kernel hyperparameters.
  • methods: A fully Bayesian approach with model comparison via the evidence; the joint posterior is computed with a transdimensional sampler that simultaneously samples the discrete kernel choice and its hyperparameters by embedding them in a higher-dimensional space explored with nested sampling (a minimal evidence-based kernel comparison follows this entry).
  • results: On synthetic exoplanet transit light curves, the true kernel is recovered in the low-noise regime while no kernel is preferred at larger noise, and the physical exoplanet hyperparameters are inferred; in the high-noise regime the bias in the posteriors is removed, the posteriors are broadened, or the inference accuracy is increased, and the uncertainty of the mean-function predictive distribution grows due to kernel uncertainty. The method is then extended to marginalisation over mean functions and noise models and applied to inferring the present-day Hubble parameter $H_0$ from cosmic chronometer and baryon acoustic oscillation measurements.
    Abstract Using a fully Bayesian approach, Gaussian Process regression is extended to include marginalisation over the kernel choice and kernel hyperparameters. In addition, Bayesian model comparison via the evidence enables direct kernel comparison. The calculation of the joint posterior was implemented with a transdimensional sampler which simultaneously samples over the discrete kernel choice and their hyperparameters by embedding these in a higher-dimensional space, from which samples are taken using nested sampling. This method was explored on synthetic data from exoplanet transit light curve simulations. The true kernel was recovered in the low noise region while no kernel was preferred for larger noise. Furthermore, inference of the physical exoplanet hyperparameters was conducted. In the high noise region, either the bias in the posteriors was removed, the posteriors were broadened or the accuracy of the inference was increased. In addition, the uncertainty in mean function predictive distribution increased due to the uncertainty in the kernel choice. Subsequently, the method was extended to marginalisation over mean functions and noise models and applied to the inference of the present-day Hubble parameter, $H_0$, from real measurements of the Hubble parameter as a function of redshift, derived from the cosmologically model-independent cosmic chronometer and {\Lambda}CDM-dependent baryon acoustic oscillation observations. The inferred $H_0$ values from the cosmic chronometers, baryon acoustic oscillations and combined datasets are $H_0$ = 66$\pm$6 km/s/Mpc, $H_0$ = 67$\pm$10 km/s/Mpc and $H_0$ = 69$\pm$6 km/s/Mpc, respectively. The kernel posterior of the cosmic chronometers dataset prefers a non-stationary linear kernel. Finally, the datasets are shown to be not in tension with ln(R)=12.17$\pm$0.02.
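    The core idea of comparing kernels by their evidence can be sketched with a hand-rolled Gaussian-process log marginal likelihood; the toy below fixes the hyperparameters and compares an RBF and a linear kernel on synthetic data, whereas the paper additionally marginalises over hyperparameters, mean functions, and noise models with a transdimensional nested sampler. The kernel parameters and toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a linear trend plus noise (a crude stand-in for H(z) measurements).
x = np.sort(rng.uniform(0, 10, size=30))
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=30)

def rbf_kernel(a, b, ell=2.0, amp=5.0):
    return amp**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def linear_kernel(a, b, amp=1.0):
    return amp**2 * np.outer(a, b)

def log_evidence(K, y, noise=1.0):
    """Gaussian-process log marginal likelihood log p(y | kernel) at fixed hyperparameters."""
    C = K + noise**2 * np.eye(len(y))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * len(y) * np.log(2 * np.pi))

for name, K in [("RBF", rbf_kernel(x, x)), ("linear", linear_kernel(x, x))]:
    print(name, "log evidence:", round(log_evidence(K, y), 2))
```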

HyperS2V: A Framework for Structural Representation of Nodes in Hyper Networks

  • paper_url: http://arxiv.org/abs/2311.04149
  • repo_url: https://github.com/liushu2019/hypers2v
  • paper_authors: Shu Liu, Cameron Lai, Fujio Toriumi
  • for: Propose a structural-similarity-based node embedding method (HyperS2V) so that machine learning approaches can be applied to hyper networks, which capture more complex relationships among nodes.
  • methods: HyperS2V introduces hyper-degrees to capture the structural properties of nodes in hyper networks, formulates a novel function to measure structural similarity between different hyper-degree values, and generates structural embeddings with a multi-scale random walk framework.
  • results: Intrinsic and extrinsic experiments on toy and real networks show superior interpretability and applicability to downstream tasks.
    Abstract In contrast to regular (simple) networks, hyper networks possess the ability to depict more complex relationships among nodes and store extensive information. Such networks are commonly found in real-world applications, such as in social interactions. Learning embedded representations for nodes involves a process that translates network structures into more simplified spaces, thereby enabling the application of machine learning approaches designed for vector data to be extended to network data. Nevertheless, there remains a need to delve into methods for learning embedded representations that prioritize structural aspects. This research introduces HyperS2V, a node embedding approach that centers on the structural similarity within hyper networks. Initially, we establish the concept of hyper-degrees to capture the structural properties of nodes within hyper networks. Subsequently, a novel function is formulated to measure the structural similarity between different hyper-degree values. Lastly, we generate structural embeddings utilizing a multi-scale random walk framework. Moreover, a series of experiments, both intrinsic and extrinsic, are performed on both toy and real networks. The results underscore the superior performance of HyperS2V in terms of both interpretability and applicability to downstream tasks.

Multi-resolution Time-Series Transformer for Long-term Forecasting

  • paper_url: http://arxiv.org/abs/2311.04147
  • repo_url: None
  • paper_authors: Yitian Zhang, Liheng Ma, Soumyasundar Pal, Yingxue Zhang, Mark Coates
  • for: Improve long-term time-series forecasting, building on patch-based segmentation techniques that let transformers learn complex temporal patterns.
  • methods: Proposes the Multi-resolution Time-Series Transformer (MTST), a multi-branch architecture that simultaneously models diverse temporal patterns at different resolutions and uses relative positional encoding, which is better suited to extracting periodic components at different scales (a minimal multi-resolution patching sketch follows this entry).
  • results: Extensive experiments on several real-world datasets demonstrate the effectiveness of MTST compared with state-of-the-art forecasting techniques.
    Abstract The performance of transformers for time-series forecasting has improved significantly. Recent architectures learn complex temporal patterns by segmenting a time-series into patches and using the patches as tokens. The patch size controls the ability of transformers to learn the temporal patterns at different frequencies: shorter patches are effective for learning localized, high-frequency patterns, whereas mining long-term seasonalities and trends requires longer patches. Inspired by this observation, we propose a novel framework, Multi-resolution Time-Series Transformer (MTST), which consists of a multi-branch architecture for simultaneous modeling of diverse temporal patterns at different resolutions. In contrast to many existing time-series transformers, we employ relative positional encoding, which is better suited for extracting periodic components at different scales. Extensive experiments on several real-world datasets demonstrate the effectiveness of MTST in comparison to state-of-the-art forecasting techniques.
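    The patching step that MTST applies at several resolutions can be sketched in a few lines: shorter patches yield many tokens capturing high-frequency detail, longer patches yield few tokens capturing long-range structure. The patch lengths below are arbitrary assumptions, and the transformer branches and relative positional encoding are omitted.

```python
import torch

# A batch of univariate series: (batch, length).
x = torch.randn(4, 96)

def patchify(series, patch_len):
    """Split each series into non-overlapping patches: (batch, n_patches, patch_len)."""
    b, length = series.shape
    n = length // patch_len
    return series[:, : n * patch_len].reshape(b, n, patch_len)

# One branch per resolution: short patches for localized, high-frequency patterns,
# long patches for seasonalities and trends; each branch would feed its own
# transformer encoder in an MTST-like model.
for patch_len in (8, 24, 48):
    tokens = patchify(x, patch_len)
    print(f"patch_len={patch_len}: tokens per series = {tokens.shape[1]}")
```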

Generative learning for nonlinear dynamics

  • paper_url: http://arxiv.org/abs/2311.04128
  • repo_url: None
  • paper_authors: William Gilpin
  • for: Connect classical tools from nonlinear dynamics to the analysis of large-scale generative statistical models.
  • methods: Revisits tools from information theory and nonlinear dynamics originally developed to infer properties of chaotic attractors from time series, including attractor reconstruction and symbolic approximations of minimal discrete generators.
  • results: Classical attractor reconstruction mirrors constraints on latent representations learned by state space models of time series, and symbolic approximations relate to modern efforts to distill and interpret black-box statistical models; emerging interdisciplinary work includes operator-theoretic methods for complex fluid flows and detection of broken detailed balance in biological datasets (a minimal delay-embedding sketch follows this entry).
    Abstract Modern generative machine learning models demonstrate surprising ability to create realistic outputs far beyond their training data, such as photorealistic artwork, accurate protein structures, or conversational text. These successes suggest that generative models learn to effectively parametrize and sample arbitrarily complex distributions. Beginning half a century ago, foundational works in nonlinear dynamics used tools from information theory to infer properties of chaotic attractors from time series, motivating the development of algorithms for parametrizing chaos in real datasets. In this perspective, we aim to connect these classical works to emerging themes in large-scale generative statistical learning. We first consider classical attractor reconstruction, which mirrors constraints on latent representations learned by state space models of time series. We next revisit early efforts to use symbolic approximations to compare minimal discrete generators underlying complex processes, a problem relevant to modern efforts to distill and interpret black-box statistical models. Emerging interdisciplinary works bridge nonlinear dynamics and learning theory, such as operator-theoretic methods for complex fluid flows, or detection of broken detailed balance in biological datasets. We anticipate that future machine learning techniques may revisit other classical concepts from nonlinear dynamics, such as transinformation decay and complexity-entropy tradeoffs.
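    The classical attractor-reconstruction tool the entry refers to is delay embedding; a minimal sketch on a toy oscillatory signal is shown below. The embedding dimension and delay are arbitrary choices, and a real analysis would use a chaotic time series.

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Takens-style delay embedding: rows are (x_t, x_{t+tau}, ..., x_{t+(dim-1)tau})."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# Scalar observable from a toy oscillatory signal (stands in for a chaotic time series).
t = np.linspace(0, 60, 3000)
x = np.sin(t) + 0.5 * np.sin(2.1 * t)

emb = delay_embed(x, dim=3, tau=25)
print("embedded point cloud shape:", emb.shape)   # proxy for the reconstructed attractor
```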

Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection

  • paper_url: http://arxiv.org/abs/2311.04109
  • repo_url: None
  • paper_authors: Benjamin Steenhoek, Md Mahbubur Rahman, Shaila Sharmin, Wei Le
  • for: This paper aims to analyze the alignment of pretrained language models with bug semantics in the context of vulnerability detection.
  • methods: The paper uses three distinct methods to analyze the models: interpretability tools, attention analysis, and interaction matrix analysis.
  • results: The paper finds that better-performing models align better with potentially vulnerable statements (PVS), but overall the models do not align strongly with PVS and do not align with buggy paths at all. Two annotation methods that highlight bug semantics inside the model's inputs improve the models' performance in most settings.
    Abstract Recently, pretrained language models have shown state-of-the-art performance on the vulnerability detection task. These models are pretrained on a large corpus of source code, then fine-tuned on a smaller supervised vulnerability dataset. Due to the different training objectives and the performance of the models, it is interesting to consider whether the models have learned the semantics of code relevant to vulnerability detection, namely bug semantics, and if so, how the alignment to bug semantics relates to model performance. In this paper, we analyze the models using three distinct methods: interpretability tools, attention analysis, and interaction matrix analysis. We compare the models' influential feature sets with the bug semantic features which define the causes of bugs, including buggy paths and Potentially Vulnerable Statements (PVS). We find that (1) better-performing models also aligned better with PVS, (2) the models failed to align strongly to PVS, and (3) the models failed to align at all to buggy paths. Based on our analysis, we developed two annotation methods which highlight the bug semantics inside the model's inputs. We evaluated our approach on four distinct transformer models and four vulnerability datasets and found that our annotations improved the models' performance in the majority of settings - 11 out of 16, with up to 9.57 points improvement in F1 score compared to conventional fine-tuning. We further found that with our annotations, the models aligned up to 232% better to potentially vulnerable statements. Our findings indicate that it is helpful to provide the model with information of the bug semantics, that the model can attend to it, and motivate future work in learning more complex path-based bug semantics. Our code and data are available at https://figshare.com/s/4a16a528d6874aad51a0.

Time-Efficient Reinforcement Learning with Stochastic Stateful Policies

  • paper_url: http://arxiv.org/abs/2311.04082
  • repo_url: None
  • paper_authors: Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo
  • for: A new approach to training stateful policies that avoids the drawbacks of Backpropagation Through Time (BPTT), such as slow training due to sequential gradient propagation and vanishing or exploding gradients.
  • methods: Decomposes a stateful policy into a stochastic internal state kernel and a stateless policy, jointly optimized by following the stateful policy gradient; different versions of the stateful policy gradient theorem allow stateful variants of popular reinforcement learning and imitation learning algorithms.
  • results: On complex continuous control tasks such as humanoid locomotion, the gradient estimator scales effectively with task complexity while offering a faster and simpler alternative to BPTT.
    Abstract Stateful policies play an important role in reinforcement learning, such as handling partially observable environments, enhancing robustness, or imposing an inductive bias directly into the policy structure. The conventional method for training stateful policies is Backpropagation Through Time (BPTT), which comes with significant drawbacks, such as slow training due to sequential gradient propagation and the occurrence of vanishing or exploding gradients. The gradient is often truncated to address these issues, resulting in a biased policy update. We present a novel approach for training stateful policies by decomposing the latter into a stochastic internal state kernel and a stateless policy, jointly optimized by following the stateful policy gradient. We introduce different versions of the stateful policy gradient theorem, enabling us to easily instantiate stateful variants of popular reinforcement learning and imitation learning algorithms. Furthermore, we provide a theoretical analysis of our new gradient estimator and compare it with BPTT. We evaluate our approach on complex continuous control tasks, e.g., humanoid locomotion, and demonstrate that our gradient estimator scales effectively with task complexity while offering a faster and simpler alternative to BPTT.

Estimator-Coupled Reinforcement Learning for Robust Purely Tactile In-Hand Manipulation

  • paper_url: http://arxiv.org/abs/2311.04060
  • repo_url: None
  • paper_authors: Lennart Röstel, Johannes Pitz, Leon Sievers, Berthold Bäuml
  • for: Robotic in-hand manipulation, specifically purely tactile, goal-conditioned, dexterous in-hand reorientation with the hand pointing downwards, where limited sensing makes accurate state estimation difficult.
  • methods: The control policy is coupled to the state estimator already during training in simulation, rather than training them separately and combining them at test time, which improves state estimation and task performance while retaining an interpretability advantage over end-to-end policy learning; the GPU-accelerated implementation trains from scratch in a median of 6.5 hours on a single low-cost GPU.
  • results: In sim2real transfer, four significantly different object shapes are rotated to all 24 orientations in the $\pi/2$ discretization of SO(3), and a cube is reoriented consecutively to nine goals (median), which was beyond the reach of previous methods in this challenging setting.
    Abstract This paper identifies and addresses the problems with naively combining (reinforcement) learning-based controllers and state estimators for robotic in-hand manipulation. Specifically, we tackle the challenging task of purely tactile, goal-conditioned, dextrous in-hand reorientation with the hand pointing downwards. Due to the limited sensing available, many control strategies that are feasible in simulation when having full knowledge of the object's state do not allow for accurate state estimation. Hence, separately training the controller and the estimator and combining the two at test time leads to poor performance. We solve this problem by coupling the control policy to the state estimator already during training in simulation. This approach leads to more robust state estimation and overall higher performance on the task while maintaining an interpretability advantage over end-to-end policy learning. With our GPU-accelerated implementation, learning from scratch takes a median training time of only 6.5 hours on a single, low-cost GPU. In simulation experiments with the DLR-Hand II and for four significantly different object shapes, we provide an in-depth analysis of the performance of our approach. We demonstrate the successful sim2real transfer by rotating the four objects to all 24 orientations in the $\pi/2$ discretization of SO(3), which has never been achieved for such a diverse set of shapes. Finally, our method can reorient a cube consecutively to nine goals (median), which was beyond the reach of previous methods in this challenging setting.

Feature Space Renormalization for Semi-supervised Learning

  • paper_url: http://arxiv.org/abs/2311.04055
  • repo_url: None
  • paper_authors: Jun Sun, Zhongjie Mao, Chao Li, Chao Zhou, Xiao-Jun Wu
  • for: A new semi-supervised learning (SSL) method that leverages unlabelled data to reduce models' dependence on large labelled datasets.
  • methods: A feature space renormalization (FSR) mechanism replaces the commonly used consistency regularization mechanism to learn better discriminative features; combined with pseudo-labelling, this yields a novel SSL model named FreMatch.
  • results: Experiments show better performance on a variety of standard SSL benchmark datasets, and the proposed feature space renormalization mechanism also enhances the performance of other SSL approaches.
    Abstract Semi-supervised learning (SSL) has been proven to be a powerful method for leveraging unlabelled data to alleviate models' dependence on large labelled datasets. The common framework among recent approaches is to train the model on a large amount of unlabelled data with consistency regularization to constrain the model predictions to be invariant to input perturbation. However, the existing SSL frameworks still have room for improvement in the consistency regularization method. Instead of regularizing category predictions in the label space as in existing frameworks, this paper proposes a feature space renormalization (FSR) mechanism for SSL. First, we propose a feature space renormalization mechanism to substitute for the commonly used consistency regularization mechanism to learn better discriminative features. To apply this mechanism, we start by building a basic model and an empirical model and then introduce our mechanism to renormalize the feature learning of the basic model with the guidance of the empirical model. Second, we combine the proposed mechanism with pseudo-labelling to obtain a novel effective SSL model named FreMatch. The experimental results show that our method can achieve better performance on a variety of standard SSL benchmark datasets, and the proposed feature space renormalization mechanism can also enhance the performance of other SSL approaches.

Extracting human interpretable structure-property relationships in chemistry using XAI and large language models

  • paper_url: http://arxiv.org/abs/2311.04047
  • repo_url: https://github.com/geemi725/xpertai
  • paper_authors: Geemi P. Wellawatte, Philippe Schwaller
  • for: Address the opaque nature of machine learning models by combining XAI methods with large language models (LLMs) to automatically generate accessible natural language explanations of raw chemical data.
  • methods: The XpertAI framework integrates XAI methods with LLMs that access the scientific literature to generate readable natural-language explanations of structure-property relationships.
  • results: Across 5 case studies, XpertAI combines the strengths of LLMs and XAI tools in generating specific, scientific, and interpretable explanations.
    Abstract Explainable Artificial Intelligence (XAI) is an emerging field in AI that aims to address the opaque nature of machine learning models. Furthermore, it has been shown that XAI can be used to extract input-output relationships, making them a useful tool in chemistry to understand structure-property relationships. However, one of the main limitations of XAI methods is that they are developed for technically oriented users. We propose the XpertAI framework that integrates XAI methods with large language models (LLMs) accessing scientific literature to generate accessible natural language explanations of raw chemical data automatically. We conducted 5 case studies to evaluate the performance of XpertAI. Our results show that XpertAI combines the strengths of LLMs and XAI tools in generating specific, scientific, and interpretable explanations.

Discordance Minimization-based Imputation Algorithms for Missing Values in Rating Data

  • paper_url: http://arxiv.org/abs/2311.04035
  • repo_url: None
  • paper_authors: Young Woong Park, Jinhak Kim, Dan Zhu
  • for: Handle missing ratings that arise when multiple rating lists are combined, since most lists do not rate every subject in the combined list.
  • methods: Analyzes missing-value patterns in six real-world datasets and proposes optimization models and algorithms that impute missing ratings using only the known rating information, by minimizing the total rating discordance across rating providers, defined as the sum of a pairwise discordance metric that can be written as a quadratic function.
  • results: Computational experiments based on real-world and synthetic rating data sets show that the proposed methods outperform the state-of-the-art general imputation methods in the literature in terms of imputation accuracy.
    Abstract Ratings are frequently used to evaluate and compare subjects in various applications, from education to healthcare, because ratings provide succinct yet credible measures for comparing subjects. However, when multiple rating lists are combined or considered together, subjects often have missing ratings, because most rating lists do not rate every subject in the combined list. In this study, we propose analyses on missing value patterns using six real-world data sets in various applications, as well as the conditions for applicability of imputation algorithms. Based on the special structures and properties derived from the analyses, we propose optimization models and algorithms that minimize the total rating discordance across rating providers to impute missing ratings in the combined rating lists, using only the known rating information. The total rating discordance is defined as the sum of the pairwise discordance metric, which can be written as a quadratic function. Computational experiments based on real-world and synthetic rating data sets show that the proposed methods outperform the state-of-the-art general imputation methods in the literature in terms of imputation accuracy.

Joint model for longitudinal and spatio-temporal survival data

  • paper_url: http://arxiv.org/abs/2311.04008
  • repo_url: None
  • paper_authors: Victor Medina-Olivares, Finn Lindgren, Raffaella Calabrese, Jonathan Crook
  • for: Survival models for credit risk analysis, in particular modelling a borrower's time-to-event when time-varying covariates are endogenous and borrowers' geographical information is available.
  • methods: Proposes the Spatio-Temporal Joint Model (STJM), a Bayesian hierarchical joint model for longitudinal and survival data that captures spatial and temporal effects and their interaction, accounting for unobserved heterogeneity among borrowers located in the same region at a particular time; estimation on large datasets uses the Integrated Nested Laplace Approximation (INLA).
  • results: On a dataset of 57,258 US mortgage borrowers with more than 2.5 million observations, including spatial effects consistently improves the performance of the joint model for predicting time to full prepayment, but the gains are less definitive when spatio-temporal interactions are also included.
    Abstract In credit risk analysis, survival models with fixed and time-varying covariates are widely used to predict a borrower's time-to-event. When the time-varying drivers are endogenous, modelling jointly the evolution of the survival time and the endogenous covariates is the most appropriate approach, also known as the joint model for longitudinal and survival data. In addition to the temporal component, credit risk models can be enhanced when including borrowers' geographical information by considering spatial clustering and its variation over time. We propose the Spatio-Temporal Joint Model (STJM) to capture spatial and temporal effects and their interaction. This Bayesian hierarchical joint model reckons the survival effect of unobserved heterogeneity among borrowers located in the same region at a particular time. To estimate the STJM model for large datasets, we consider the Integrated Nested Laplace Approximation (INLA) methodology. We apply the STJM to predict the time to full prepayment on a large dataset of 57,258 US mortgage borrowers with more than 2.5 million observations. Empirical results indicate that including spatial effects consistently improves the performance of the joint model. However, the gains are less definitive when we additionally include spatio-temporal interactions.

An Initialization Schema for Neuronal Networks on Tabular Data

  • paper_url: http://arxiv.org/abs/2311.03996
  • repo_url: None
  • paper_authors: Wolfgang Fuhl
  • for: Using neural networks for tabular data prediction, particularly for regression and classification tasks on heterogeneous data.
  • methods: Proposes a binomial initialization scheme for the first hidden layer of neural networks on tabular data, and introduces gradient masking together with a binomially initialized last layer for joint ensemble training (a minimal initialization sketch follows this entry).
  • results: Experiments on multiple public datasets demonstrate that the approach outperforms other neural network-based methods.
    Abstract Nowadays, many modern applications require heterogeneous tabular data, which is still a challenging task in terms of regression and classification. Many approaches have been proposed to adapt neural networks for this task, but still, boosting and bagging of decision trees are the best-performing methods for this task. In this paper, we show that a binomial initialized neural network can be used effectively on tabular data. The proposed approach shows a simple but effective approach for initializing the first hidden layer in neural networks. We also show that this initializing schema can be used to jointly train ensembles by adding gradient masking to batch entries and using the binomial initialization for the last layer in a neural network. For this purpose, we modified the hinge binary loss and the soft max loss to make them applicable for joint ensemble training. We evaluate our approach on multiple public datasets and showcase the improved performance compared to other neural network-based approaches. In addition, we discuss the limitations and possible further research of our approach for improving the applicability of neural networks to tabular data. Link: https://es-cloud.cs.uni-tuebingen.de/d/8e2ab8c3fdd444e1a135/?p=%2FInitializationNeuronalNetworksTabularData&mode=list
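    The abstract does not spell out the exact binomial initialization scheme, so the sketch below simply draws centred, scaled binomial values for the first hidden layer as one plausible reading; the distribution parameters and scaling are assumptions, and the gradient-masking ensemble training is not shown.

```python
import torch

def binomial_init_(weight, n=8, p=0.5, scale=0.1):
    """Illustrative first-layer init: centred, scaled binomial draws.

    n, p and scale are assumptions for illustration, not the paper's values.
    """
    with torch.no_grad():
        draws = torch.distributions.Binomial(n, torch.full_like(weight, p)).sample()
        weight.copy_(scale * (draws - n * p))

model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(),   # first hidden layer gets the special init
    torch.nn.Linear(64, 2),
)
binomial_init_(model[0].weight)
print("first-layer weight mean/std:",
      float(model[0].weight.mean()), float(model[0].weight.std()))
```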

Bandit Pareto Set Identification: the Fixed Budget Setting

  • paper_url: http://arxiv.org/abs/2311.03992
  • repo_url: None
  • paper_authors: Cyrille Kone, Emilie Kaufmann, Laura Richert
  • for: A multi-objective pure exploration problem in a multi-armed bandit model: identifying the Pareto optimal set among arms with unknown multivariate distributions.
  • methods: Proposes and analyzes the first algorithms for the fixed-budget Pareto Set Identification task; the Empirical Gap Elimination family combines a careful estimate of the "hardness to classify" each arm in or out of the Pareto set with a generic elimination scheme (a minimal Pareto-set computation sketch follows this entry).
  • results: Two instances, EGE-SR and EGE-SH, have an error probability that decays exponentially fast with the budget, with an exponent supported by an information-theoretic lower bound; an empirical study on real-world and synthetic datasets shows the good performance of the algorithms.
    Abstract We study a multi-objective pure exploration problem in a multi-armed bandit model. Each arm is associated to an unknown multi-variate distribution and the goal is to identify the distributions whose mean is not uniformly worse than that of another distribution: the Pareto optimal set. We propose and analyze the first algorithms for the \emph{fixed budget} Pareto Set Identification task. We propose Empirical Gap Elimination, a family of algorithms combining a careful estimation of the ``hardness to classify'' each arm in or out of the Pareto set with a generic elimination scheme. We prove that two particular instances, EGE-SR and EGE-SH, have a probability of error that decays exponentially fast with the budget, with an exponent supported by an information theoretic lower-bound. We complement these findings with an empirical study using real-world and synthetic datasets, which showcase the good performance of our algorithms.
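    For reference, the quantity the algorithms must identify, the Pareto optimal set given (empirical) mean vectors, can be computed directly; the sketch below shows only this target computation, not the EGE-SR/EGE-SH sampling and elimination logic. The toy mean vectors are assumptions.

```python
import numpy as np

def pareto_set(means):
    """Indices of arms whose mean vector is not dominated by any other arm.

    Arm i is dominated if some arm j is >= in every objective and > in at least one.
    """
    K = len(means)
    optimal = []
    for i in range(K):
        dominated = any(
            np.all(means[j] >= means[i]) and np.any(means[j] > means[i])
            for j in range(K) if j != i
        )
        if not dominated:
            optimal.append(i)
    return optimal

# Toy two-objective bandit: empirical means after pulling each arm.
means = np.array([[0.8, 0.2], [0.5, 0.5], [0.3, 0.7], [0.4, 0.4]])
print("Pareto optimal arms:", pareto_set(means))   # arm 3 is dominated by arm 1
```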

Cup Curriculum: Curriculum Learning on Model Capacity

  • paper_url: http://arxiv.org/abs/2311.03956
  • repo_url: https://github.com/luca-scharr/cupcurriculum
  • paper_authors: Luca Scharr, Vanessa Toborek
  • for: Improve learning performance on natural language processing tasks by applying curriculum learning to model capacity.
  • methods: A first training phase uses a variation of iterative magnitude pruning to reduce model capacity; the pruned weights are reintroduced in a second phase, so that model capacity traces a cup-shaped curve over the training iterations (a minimal pruning-schedule sketch follows this entry).
  • results: The cup curriculum reliably outperforms early stopping while exhibiting high resilience to overfitting.
    Abstract Curriculum learning (CL) aims to increase the performance of a learner on a given task by applying a specialized learning strategy. This strategy focuses on either the dataset, the task, or the model. There is little to no work analysing the possibilities to apply CL on the model capacity in natural language processing. To close this gap, we propose the cup curriculum. In a first phase of training we use a variation of iterative magnitude pruning to reduce model capacity. These weights are reintroduced in a second phase, resulting in the model capacity to show a cup-shaped curve over the training iterations. We empirically evaluate different strategies of the cup curriculum and show that it outperforms early stopping reliably while exhibiting a high resilience to overfitting.
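    A minimal sketch of the capacity schedule: weights are masked by magnitude with increasing sparsity in the first phase and the mask is relaxed again in the second, tracing the cup shape. The sparsity schedule is an assumption, and the actual training steps between phases (which would update the dense weights) are omitted.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(128, 64)
dense_weight = layer.weight.data.clone()   # keep the dense weights; pruning is a mask

def magnitude_mask(weight, sparsity):
    """Mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    if sparsity <= 0.0:
        return torch.ones_like(weight)
    k = int(sparsity * weight.numel())
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

# Capacity first shrinks (pruning phase), then grows back (reintroduction phase),
# tracing the cup-shaped capacity curve; the schedule values are assumptions.
schedule = [0.0, 0.3, 0.6, 0.8, 0.6, 0.3, 0.0]

for step, sparsity in enumerate(schedule):
    mask = magnitude_mask(dense_weight, sparsity)
    layer.weight.data = dense_weight * mask   # masked weights used for this phase's training
    print(f"step {step}: sparsity={sparsity:.1f}, active weights={int(mask.sum())}")
```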

Blind Federated Learning via Over-the-Air q-QAM

  • paper_url: http://arxiv.org/abs/2311.04253
  • repo_url: None
  • paper_authors: Saeed Razavikia, José Mairton Barros Da Silva Júnior, Carlo Fischione
  • for: Federated edge learning over a fading multiple access channel.
  • methods: A digital over-the-air computation strategy using q-ary quadrature amplitude modulation for uplink transmission to the edge server, without channel state information at the devices, reducing the communication burden between edge devices and the access point; multiple antennas at the edge server mitigate the fading.
  • results: A non-asymptotic upper bound on the mean squared error is proven under noisy and fading conditions and used to characterize the convergence rate for non-convex losses; numerical experiments show that increasing the number of antennas at the edge server and adopting higher-order modulations improves model accuracy by up to 60%.
    Abstract In this work, we investigate federated edge learning over a fading multiple access channel. To alleviate the communication burden between the edge devices and the access point, we introduce a pioneering digital over-the-air computation strategy employing q-ary quadrature amplitude modulation, culminating in a low latency communication scheme. Indeed, we propose a new federated edge learning framework in which edge devices use digital modulation for over-the-air uplink transmission to the edge server while they have no access to the channel state information. Furthermore, we incorporate multiple antennas at the edge server to overcome the fading inherent in wireless communication. We analyze the number of antennas required to mitigate the fading impact effectively. We prove a non-asymptotic upper bound for the mean squared error for the proposed federated learning with digital over-the-air uplink transmissions under both noisy and fading conditions. Leveraging the derived upper bound, we characterize the convergence rate of the learning process of a non-convex loss function in terms of the mean square error of gradients due to the fading channel. Furthermore, we substantiate the theoretical assurances through numerical experiments concerning mean square error and the convergence efficacy of the digital federated edge learning framework. Notably, the results demonstrate that augmenting the number of antennas at the edge server and adopting higher-order modulations improve the model accuracy up to 60\%.

CNN-Based Structural Damage Detection using Time-Series Sensor Data

  • paper_url: http://arxiv.org/abs/2311.04252
  • repo_url: None
  • paper_authors: Ishan Pathak, Ishan Jha, Aditya Sadana, Basuraj Bhowmik
  • for: A new structural damage detection method based on a novel Convolutional Neural Network (CNN) algorithm.
  • methods: The CNN is trained to recognize long-term temporal connections and extract deep spatial features from time-series sensor data, combining spatial and temporal features to improve discrimination over methods relying solely on deep spatial features; time series are classified as undamaged or damaged (a minimal 1D-CNN classifier sketch follows this entry).
  • results: On a benchmark dataset derived from a three-floor structure at Los Alamos National Laboratory (LANL), the new CNN algorithm detects structural degradation with high accuracy.
    Abstract Structural Health Monitoring (SHM) is vital for evaluating structural condition, aiming to detect damage through sensor data analysis. It aligns with predictive maintenance in modern industry, minimizing downtime and costs by addressing potential structural issues. Various machine learning techniques have been used to extract valuable information from vibration data, often relying on prior structural knowledge. This research introduces an innovative approach to structural damage detection, utilizing a new Convolutional Neural Network (CNN) algorithm. In order to extract deep spatial features from time series data, CNNs are taught to recognize long-term temporal connections. This methodology combines spatial and temporal features, enhancing discrimination capabilities when compared to methods solely reliant on deep spatial features. Time series data are divided into two categories using the proposed neural network: undamaged and damaged. To validate its efficacy, the method's accuracy was tested using a benchmark dataset derived from a three-floor structure at Los Alamos National Laboratory (LANL). The outcomes show that the new CNN algorithm is very accurate in spotting structural degradation in the examined structure.
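    Since the abstract does not give architectural details, the sketch below is a generic 1D CNN mapping a fixed-length sensor window to an undamaged/damaged label; all layer sizes, the window length, and the random batch are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal 1D CNN: acceleration window -> undamaged (0) / damaged (1).
model = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=16, kernel_size=7, padding=3),
    nn.ReLU(),
    nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=5, padding=2),
    nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),   # collapse the time axis
    nn.Flatten(),
    nn.Linear(32, 2),
)

# Fake batch: 8 sensor windows of 1024 samples each, with random labels.
x = torch.randn(8, 1, 1024)
labels = torch.randint(0, 2, (8,))

logits = model(x)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()
print("logits shape:", tuple(logits.shape), "loss:", float(loss))
```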

Structure of universal formulas

  • paper_url: http://arxiv.org/abs/2311.03910
  • repo_url: https://github.com/smith86n/wiki-is-mostly-fake-radom-words-word-genrationr-
  • paper_authors: Dmitry Yarotsky
  • for: This paper is written to analyze the essential structural elements of highly expressive models, such as neural networks, and to study their approximating capabilities.
  • methods: The paper uses a hierarchy of expressiveness classes to connect the global approximability property to the weaker property of infinite VC dimension, and proves a series of classification results for several increasingly complex functional families.
  • results: The paper shows that fixed-size neural networks with not more than one layer of neurons having transcendental activations cannot in general approximate functions on arbitrary finite sets, but gives examples of functional families, including two-hidden-layer neural networks, that approximate functions on arbitrary finite sets but fail to do so on the whole domain of definition.
    Abstract By universal formulas we understand parameterized analytic expressions that have a fixed complexity, but nevertheless can approximate any continuous function on a compact set. There exist various examples of such formulas, including some in the form of neural networks. In this paper we analyze the essential structural elements of these highly expressive models. We introduce a hierarchy of expressiveness classes connecting the global approximability property to the weaker property of infinite VC dimension, and prove a series of classification results for several increasingly complex functional families. In particular, we introduce a general family of polynomially-exponentially-algebraic functions that, as we prove, is subject to polynomial constraints. As a consequence, we show that fixed-size neural networks with not more than one layer of neurons having transcendental activations (e.g., sine or standard sigmoid) cannot in general approximate functions on arbitrary finite sets. On the other hand, we give examples of functional families, including two-hidden-layer neural networks, that approximate functions on arbitrary finite sets, but fail to do that on the whole domain of definition.

Learning-Based Latency-Constrained Fronthaul Compression Optimization in C-RAN

  • paper_url: http://arxiv.org/abs/2311.03899
  • repo_url: None
  • paper_authors: Axel Grönland, Bleron Klaiqi, Xavier Gelabert
  • for: This paper addresses the cloudification of wireless mobile networks, where Radio Access Network (RAN) functions can be hosted at central or distributed locations, bringing benefits such as low-cost deployment, higher capacity, and improved hardware utilization.
  • methods: A deep reinforcement learning (DRL) based controller is proposed that dynamically adjusts fronthaul (FH) compression through configuration parameters, adapting to different FH load levels.
  • results: Simulations show that the DRL-based controller achieves high FH utilization (68.7% on average), meets the predefined FH latency constraint (260 $\mu$s in this case), and maintains good air-interface throughput across different FH load levels.
    Abstract The evolution of wireless mobile networks towards cloudification, where Radio Access Network (RAN) functions can be hosted at either central or distributed locations, offers many benefits such as low-cost deployment, higher capacity, and improved hardware utilization. Nevertheless, the flexibility in the functional deployment comes at the cost of stringent fronthaul (FH) capacity and latency requirements. One possible approach to deal with these rigorous constraints is to use FH compression techniques. To ensure that FH capacity and latency requirements are met, more FH compression is applied during high load, while less compression is applied during medium and low load to improve FH utilization and air interface performance. In this paper, a model-free deep reinforcement learning (DRL) based FH compression (DRL-FC) framework is proposed that dynamically controls FH compression through various configuration parameters such as modulation order, precoder granularity, and precoder weight quantization that affect both FH load and air interface performance. Simulation results show that DRL-FC exhibits significantly higher FH utilization (68.7% on average) and air interface throughput than a reference scheme (i.e. with no applied compression) across different FH load levels. At the same time, the proposed DRL-FC framework is able to meet the predefined FH latency constraints (in our case set to 260 $\mu$s) under various FH loads.
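
The control loop can be pictured as follows. This is only a schematic sketch: the toy compression/latency model, the reward weights, and the action grid over modulation order, precoder granularity, and quantization bits are assumptions, not the authors' simulator or DRL agent.

```python
# Schematic sketch of a DRL-FC style control loop (illustrative only).
# The environment dynamics, reward weights, and the handling of the
# 260-microsecond latency budget below are assumptions, not the paper's setup.
import random

ACTIONS = [  # (modulation_order_bits, precoder_granularity, weight_quant_bits)
    (mod, gran, q)
    for mod in (4, 6, 8)          # e.g. 16QAM / 64QAM / 256QAM
    for gran in (1, 4, 16)        # PRBs per precoder update
    for q in (4, 8, 12)
]

def step(fh_load: float, action):
    """Return (fh_utilization, latency_us, throughput_proxy) for one interval."""
    mod, gran, q = action
    compression = 1.0 / (mod * q / (gran ** 0.5))    # toy model: more bits -> less compression
    fh_util = min(1.0, fh_load * (1.0 - compression))
    latency = 100 + 400 * fh_util                    # toy latency model, microseconds
    throughput = mod * q / 12.0                      # toy air-interface quality proxy
    return fh_util, latency, throughput

def reward(fh_util, latency, throughput, latency_budget_us=260.0):
    penalty = 10.0 if latency > latency_budget_us else 0.0
    return throughput + 0.5 * fh_util - penalty

# One random rollout; a DQN/PPO agent would replace the random policy.
for tti in range(5):
    load = random.uniform(0.2, 1.0)
    a = random.choice(ACTIONS)
    print(a, reward(*step(load, a)))
```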

An Explainable Framework for Machine learning-Based Reactive Power Optimization of Distribution Network

  • paper_url: http://arxiv.org/abs/2311.03863
  • repo_url: None
  • paper_authors: Wenlong Liao, Benjamin Schäfer, Dalin Qin, Gonghao Zhang, Zhixian Wang, Zhe Yang
  • for: Machine learning models are increasingly used for reactive power optimization of distribution networks, but they are usually treated as black boxes, making it hard for power system operators to identify and understand potential biases or errors in the models' decision-making process.
  • methods: An explainable machine learning framework for reactive power optimization of distribution networks is proposed. First, a Shapley additive explanation framework measures the contribution of each input feature to the reactive power optimization solution produced by the machine learning model. Second, a model-agnostic approximation method is developed to estimate Shapley values and avoid the heavy computational burden of computing them directly.
  • results: With the Shapley additive explanation framework, the solutions of machine learning based reactive power optimization can be explained accurately from both global and instance perspectives. Moreover, the proposed explainable framework is model-agnostic and therefore applicable to different models (e.g., neural networks).
    Abstract To reduce the heavy computational burden of reactive power optimization of distribution networks, machine learning models are receiving increasing attention. However, most machine learning models (e.g., neural networks) are usually considered as black boxes, making it challenging for power system operators to identify and comprehend potential biases or errors in the decision-making process of machine learning models. To address this issue, an explainable machine-learning framework is proposed to optimize the reactive power in distribution networks. Firstly, a Shapley additive explanation framework is presented to measure the contribution of each input feature to the solution of reactive power optimizations generated from machine learning models. Secondly, a model-agnostic approximation method is developed to estimate Shapley values, so as to avoid the heavy computational burden associated with direct calculations of Shapley values. The simulation results show that the proposed explainable framework can accurately explain the solution of the machine learning model-based reactive power optimization by using visual analytics, from both global and instance perspectives. Moreover, the proposed explainable framework is model-agnostic, and thus applicable to various models (e.g., neural networks).
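
The model-agnostic explanation step can be approximated with standard permutation sampling of Shapley values, as in the hedged sketch below; this is a generic estimator, not the specific approximation method proposed in the paper.

```python
# Minimal Monte Carlo (permutation sampling) approximation of Shapley values
# for a black-box model f; absent features are filled from a background
# dataset. A generic sketch, not the paper's estimator.
import numpy as np

def shapley_sampling(f, x, background, n_permutations=200, rng=None):
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_permutations):
        order = rng.permutation(d)
        current = background[rng.integers(len(background))].copy()
        prev_val = f(current[None, :])[0]
        for j in order:                    # add features one by one
            current[j] = x[j]
            new_val = f(current[None, :])[0]
            phi[j] += new_val - prev_val
            prev_val = new_val
    return phi / n_permutations

# Toy usage with a linear stand-in for the trained dispatch model (illustrative).
rng = np.random.default_rng(0)
X_bg = rng.normal(size=(256, 5))           # background load/voltage features
w = np.array([0.5, -1.0, 2.0, 0.0, 0.3])
f = lambda X: X @ w                        # stand-in for the trained ML model
x = rng.normal(size=5)
print(shapley_sampling(f, x, X_bg, rng=1)) # roughly w * (x - mean of background)
```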

Improved MDL Estimators Using Fiber Bundle of Local Exponential Families for Non-exponential Families

  • paper_url: http://arxiv.org/abs/2311.03852
  • repo_url: None
  • paper_authors: Kohei Miyamoto, Andrew R. Barron, Jun’ichi Takeuchi
  • for: This paper analyzes Minimum Description Length (MDL) estimators based on two-part codes for universal coding.
  • methods: The two-part code is constructed using an augmented structure of the target family: a fiber bundle of local exponential families used for data description.
  • results: Based on the theory introduced by Barron and Cover in 1991, tight upper bounds on the risk and loss of the MDL estimators are obtained, and the results are shown to apply to mixture families, a typical example of non-exponential families.
    Abstract Minimum Description Length (MDL) estimators, using two-part codes for universal coding, are analyzed. For general parametric families under certain regularity conditions, we introduce a two-part code whose regret is close to the minimax regret, where regret of a code with respect to a target family M is the difference between the code length of the code and the ideal code length achieved by an element in M. This is a generalization of the result for exponential families by Gr\"unwald. Our code is constructed by using an augmented structure of M with a bundle of local exponential families for data description, which is not needed for exponential families. This result gives a tight upper bound on risk and loss of the MDL estimators based on the theory introduced by Barron and Cover in 1991. Further, we show that we can apply the result to mixture families, which are a typical example of non-exponential families.
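
For reference, the regret of a code with respect to a target family, and the minimax regret it is compared against, can be written as follows (notation assumed; a sketch of the standard definitions, not text from the paper):

```latex
% Notation assumed for illustration: q is the coding distribution of the two-part
% code, M = \{p_\theta\} the target family, x^n the data.
\mathrm{regret}(q, M; x^n)
  \;=\; -\log q(x^n) \;-\; \min_{\theta \in M} \bigl(-\log p_\theta(x^n)\bigr),
\qquad
\text{minimax regret} \;=\; \min_{q}\,\max_{x^n}\, \mathrm{regret}(q, M; x^n).
```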

User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates

  • paper_url: http://arxiv.org/abs/2311.03797
  • repo_url: None
  • paper_authors: Hilal Asi, Daogao Liu
  • for: This paper studies user-level differentially private stochastic convex optimization (DP-SCO), where each user may hold multiple data items.
  • methods: New algorithms for user-level DP-SCO are developed that run in polynomial time and require the number of users to grow only logarithmically in the dimension.
  • results: The algorithms obtain optimal rates for both convex and strongly convex functions in polynomial time, and are the first to obtain optimal rates for non-smooth functions in polynomial time. They are based on multiple-pass DP-SGD combined with a novel private mean estimation procedure for concentrated data, which applies an outlier removal step before estimating the mean of the gradients.
    Abstract We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy, where each user may hold multiple data items. Existing work for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al. (2023)] or requires the number of users to grow polynomially with the dimensionality of the problem with additional strict assumptions [Bassily et al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain optimal rates for both convex and strongly convex functions in polynomial time and require the number of users to grow only logarithmically in the dimension. Moreover, our algorithms are the first to obtain optimal rates for non-smooth functions in polynomial time. These algorithms are based on multiple-pass DP-SGD, combined with a novel private mean estimation procedure for concentrated data, which applies an outlier removal step before estimating the mean of the gradients.
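
The flavor of the private mean estimation step (outlier removal, then clipping and noise) might look like the sketch below; the thresholds, noise scale, and outlier rule are placeholders and do not reproduce the paper's algorithm or its privacy accounting.

```python
# Rough sketch of "outlier removal, then noisy mean" over per-user gradient
# averages. Thresholds, clipping, and noise scale are placeholders; this does
# NOT reproduce the paper's algorithm or a formal privacy analysis.
import numpy as np

def private_concentrated_mean(user_grads, clip=1.0, noise_std=0.1, k_mads=5.0, rng=None):
    rng = np.random.default_rng(rng)
    G = np.asarray(user_grads)                    # shape (n_users, dim)
    center = np.median(G, axis=0)                 # coarse center (not privatized here)
    dists = np.linalg.norm(G - center, axis=1)
    mad = np.median(np.abs(dists - np.median(dists))) + 1e-12
    kept = G[dists <= np.median(dists) + k_mads * mad]   # drop far-away users
    norms = np.linalg.norm(kept, axis=1, keepdims=True)
    clipped = kept * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    mean = clipped.mean(axis=0)
    return mean + rng.normal(scale=noise_std, size=mean.shape)  # Gaussian noise

# Toy usage: 100 users, 10-dim gradients, a few corrupted users.
rng = np.random.default_rng(0)
grads = rng.normal(loc=0.3, scale=0.05, size=(100, 10))
grads[:3] += 50.0                                  # outliers
print(private_concentrated_mean(grads, rng=1))
```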

Neuro-GPT: Developing A Foundation Model for EEG

  • paper_url: http://arxiv.org/abs/2311.03764
  • repo_url: None
  • paper_authors: Wenhui Cui, Woojae Jeong, Philipp Thölke, Takfarinas Medani, Karim Jerbi, Anand A. Joshi, Richard M. Leahy
  • for: Addressing the challenges of data scarcity and heterogeneity in Brain-Computer Interface (BCI) tasks using Electroencephalography (EEG) data.
  • methods: Using a foundation model consisting of an EEG encoder and a GPT model, pre-trained on a large-scale public EEG dataset with a self-supervised task, and fine-tuning the model on a Motor Imagery Classification task with only 9 subjects.
  • results: Significant improvement in classification performance compared to a model trained from scratch, demonstrating the advanced generalizability of the foundation model and its ability to address data scarcity and heterogeneity.
    Abstract To handle the scarcity and heterogeneity of electroencephalography (EEG) data in Brain-Computer Interface (BCI) tasks, and to harness the vast public data, we propose Neuro-GPT, a foundation model consisting of an EEG encoder and a GPT model. The foundation model is pre-trained on a large-scale public EEG dataset, using a self-supervised task which learns how to reconstruct the masked chunk in EEG. We then fine-tune the foundation model on a Motor Imagery Classification task where only 9 subjects are available. Experiments demonstrated that applying foundation model can significantly improve classification performance compared to the model trained from scratch, which provides evidence for the advanced generalizability of foundation model and the ability to address the challenges of data scarcity and heterogeneity.
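
A compact sketch of the masked-chunk reconstruction objective is given below; the encoder, transformer sizes, and masking scheme are illustrative and do not reflect Neuro-GPT's actual configuration.

```python
# Compact sketch of a masked-chunk reconstruction objective for EEG
# (illustrative sizes and masking; not Neuro-GPT's actual configuration).
import torch
import torch.nn as nn

class ChunkAutoencoderGPT(nn.Module):
    def __init__(self, n_channels=22, chunk_len=250, d_model=128, n_layers=4):
        super().__init__()
        self.embed = nn.Linear(n_channels * chunk_len, d_model)   # simple EEG "encoder"
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.gpt = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.decode = nn.Linear(d_model, n_channels * chunk_len)

    def forward(self, chunks):                      # chunks: (B, T, C*L)
        T = chunks.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.gpt(self.embed(chunks), mask=causal)
        return self.decode(h)

model = ChunkAutoencoderGPT()
x = torch.randn(4, 8, 22 * 250)                     # 4 trials, 8 chunks each
inputs = x.clone()
inputs[:, -1] = 0.0                                  # mask the last chunk
pred = model(inputs)
loss = nn.MSELoss()(pred[:, -1], x[:, -1])           # reconstruct the masked chunk
loss.backward()
```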

Posterior Sampling-Based Bayesian Optimization with Tighter Bayesian Regret Bounds

  • paper_url: http://arxiv.org/abs/2311.03760
  • repo_url: None
  • paper_authors: Shion Takeno, Yu Inatsu, Masayuki Karasuyama, Ichiro Takeuchi
  • for: This paper proposes a new acquisition function, PIMS (probability of improvement from the maximum of a sample path), to address the hyperparameter tuning and over-exploration issues of GP-UCB and Thompson sampling (TS) in Bayesian optimization.
  • methods: Building on results for a randomized variant of GP-UCB, the paper analyzes the proposed acquisition function PIMS both theoretically and empirically.
  • results: PIMS achieves the tighter Bayesian cumulative regret (BCR) bound while avoiding the hyperparameter tuning of GP-UCB and mitigating the over-exploration of TS; a wide range of experiments demonstrates its effectiveness in practice.
    Abstract Among various acquisition functions (AFs) in Bayesian optimization (BO), Gaussian process upper confidence bound (GP-UCB) and Thompson sampling (TS) are well-known options with established theoretical properties regarding Bayesian cumulative regret (BCR). Recently, it has been shown that a randomized variant of GP-UCB achieves a tighter BCR bound compared with GP-UCB, which we call the tighter BCR bound for brevity. Inspired by this study, this paper first shows that TS achieves the tighter BCR bound. On the other hand, GP-UCB and TS often practically suffer from manual hyperparameter tuning and over-exploration issues, respectively. To overcome these difficulties, we propose yet another AF called a probability of improvement from the maximum of a sample path (PIMS). We show that PIMS achieves the tighter BCR bound and avoids the hyperparameter tuning, unlike GP-UCB. Furthermore, we demonstrate a wide range of experiments, focusing on the effectiveness of PIMS that mitigates the practical issues of GP-UCB and TS.
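
One plausible reading of a PIMS-style acquisition is sketched below: draw a single posterior sample path, take its maximum as the improvement threshold, and score candidates by the probability of improvement over that threshold. This is an illustration built on scikit-learn's GP tools, not necessarily the authors' exact definition.

```python
# Illustrative PIMS-style acquisition (one plausible reading, not necessarily
# the authors' exact definition): one GP posterior sample path provides the
# improvement threshold; candidates are scored by probability of improvement.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x) + 0.5 * np.cos(7 * x)       # toy objective

X_train = rng.uniform(0, 2, size=(6, 1))
y_train = f(X_train).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.3), alpha=1e-6).fit(X_train, y_train)

X_cand = np.linspace(0, 2, 400).reshape(-1, 1)
path = gp.sample_y(X_cand, n_samples=1, random_state=1).ravel()  # one sample path
tau = path.max()                                                  # threshold from the path's max

mu, sigma = gp.predict(X_cand, return_std=True)
pi = norm.cdf((mu - tau) / np.maximum(sigma, 1e-12))              # probability of improvement
x_next = X_cand[np.argmax(pi)]
print("next query:", x_next)
```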

Manifold learning: what, how, and why

  • paper_url: http://arxiv.org/abs/2311.03757
  • repo_url: None
  • paper_authors: Marina Meilă, Hanyu Zhang
  • for: This survey presents the principles, methods, and statistical foundations of manifold learning (ML), also known as non-linear dimension reduction.
  • methods: It covers the main manifold learning methods, including ISOMAP, LLE, and t-SNE, together with their statistical foundations.
  • results: By describing the application scenarios and characteristics of these methods, the survey helps readers understand, visualize, and interpret the geometric structure of high-dimensional data.
    Abstract Manifold learning (ML), known also as non-linear dimension reduction, is a set of methods to find the low dimensional structure of data. Dimension reduction for large, high dimensional data is not merely a way to reduce the data; the new representations and descriptors obtained by ML reveal the geometric shape of high dimensional point clouds, and allow one to visualize, de-noise and interpret them. This survey presents the principles underlying ML, the representative methods, as well as their statistical foundations from a practicing statistician's perspective. It describes the trade-offs, and what theory tells us about the parameter and algorithmic choices we make in order to obtain reliable conclusions.
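
As a quick usage example of one of the surveyed methods, the snippet below embeds a synthetic Swiss-roll manifold with scikit-learn's Isomap:

```python
# Quick usage example of one of the surveyed methods (Isomap) on a classic
# synthetic manifold, using scikit-learn's implementation for brevity.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, color = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)
embedding = Isomap(n_neighbors=12, n_components=2).fit_transform(X)
print(embedding.shape)   # (2000, 2): the "unrolled" 2-D coordinates
```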

Enhanced physics-informed neural networks with domain scaling and residual correction methods for multi-frequency elliptic problems

  • paper_url: http://arxiv.org/abs/2311.03746
  • repo_url: None
  • paper_authors: Deok-Kyu Jang, Hyea Hyun Kim, Kyungsoo Kim
  • for: This paper develops neural network based methods for solving elliptic partial differential equations with multi-frequency solutions.
  • methods: Neural network approximation methods are used because they impose few restrictions on the form of the differential equation or the shape and dimension of the problem domain. To address the difficulty caused by multi-frequency solutions, domain scaling and residual correction methods are proposed.
  • results: Numerical experiments show that the proposed methods solve multi-frequency model problems efficiently and accurately.
    Abstract In this paper, neural network approximation methods are developed for elliptic partial differential equations with multi-frequency solutions. Neural network approximation methods have advantages over classical approaches in that they can be applied without much concern about the form of the differential equations or the shape or dimension of the problem domain. When applied to problems with multi-frequency solutions, the performance and accuracy of neural network approximation methods are strongly affected by the contrast of the high- and low-frequency parts in the solutions. To address this issue, domain scaling and residual correction methods are proposed. The efficiency and accuracy of the proposed methods are demonstrated for multi-frequency model problems.
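
The residual-correction idea can be illustrated in a plain regression setting, as below: fit a base network, then fit a second network to the remaining high-frequency residual on a scaled domain. The scaling factor and targets are toy choices; the paper's method operates on PDE residual losses with a specific domain-scaling construction not reproduced here.

```python
# Toy illustration of residual correction with domain scaling in a plain
# regression setting. The paper's method works with PDE residual losses and a
# specific domain-scaling construction not shown here.
import torch
import torch.nn as nn

def mlp():
    return nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                         nn.Linear(64, 64), nn.Tanh(),
                         nn.Linear(64, 1))

x = torch.linspace(0, 1, 512).unsqueeze(1)
u = torch.sin(2 * torch.pi * x) + 0.1 * torch.sin(50 * torch.pi * x)   # multi-frequency target

def fit(net, inputs, targets, steps=2000, lr=1e-3):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(inputs), targets)
        loss.backward()
        opt.step()
    return net

base = fit(mlp(), x, u)                        # stage 1: captures the low-frequency part
residual = u - base(x).detach()
scale = 25.0                                   # domain scaling: stretch x so the residual oscillates slowly per unit input
corr = fit(mlp(), scale * x, residual)         # stage 2: learn the correction on the scaled domain

u_hat = base(x) + corr(scale * x)
print("final MSE:", nn.functional.mse_loss(u_hat, u).item())
```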

Improved weight initialization for deep and narrow feedforward neural network

  • paper_url: http://arxiv.org/abs/2311.03733
  • repo_url: None
  • paper_authors: Hyunwoo Lee, Yunho Kim, Seungyeop Yang, Hayoung Choi
  • for: Improving the effectiveness and efficiency of training and deploying deep neural network models.
  • methods: A new weight initialization method is proposed, and properties of the proposed initial weight matrix are proved, with the aim of mitigating the dying ReLU problem.
  • results: A series of experiments and comparisons with existing methods demonstrate the effectiveness of the new initialization method.
    Abstract Appropriate weight initialization settings, along with the ReLU activation function, have been a cornerstone of modern deep learning, making it possible to train and deploy highly effective and efficient neural network models across diverse artificial intelligence. The problem of dying ReLU, where ReLU neurons become inactive and yield zero output, presents a significant challenge in the training of deep neural networks with ReLU activation function. Theoretical research and various methods have been introduced to address the problem. However, even with these methods and research, training remains challenging for extremely deep and narrow feedforward networks with ReLU activation function. In this paper, we propose a new weight initialization method to address this issue. We prove the properties of the proposed initial weight matrix and demonstrate how these properties facilitate the effective propagation of signal vectors. Through a series of experiments and comparisons with existing methods, we demonstrate the effectiveness of the new initialization method.
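
The paper's initialization scheme is not reproduced here; the sketch below only shows how a custom initializer plugs into a deep, narrow ReLU network and how to measure dead units at initialization, which is the failure mode being addressed. The two initializers shown are generic stand-ins.

```python
# How a custom initializer plugs into a deep, narrow ReLU MLP, plus a simple
# dead-unit measurement at initialization. The initializers below are generic
# stand-ins, not the paper's proposed scheme.
import torch
import torch.nn as nn

def make_mlp(depth=50, width=8, init_fn=None):
    layers = []
    for _ in range(depth):
        lin = nn.Linear(width, width)
        if init_fn is not None:
            init_fn(lin.weight, lin.bias)
        layers += [lin, nn.ReLU()]
    return nn.Sequential(*layers)

def dead_unit_fraction(model, n=4096, width=8):
    x = torch.randn(n, width)
    with torch.no_grad():
        out = model(x)
    return (out <= 0).all(dim=0).float().mean().item()   # output units that never activate

def he_init(w, b):
    nn.init.kaiming_normal_(w, nonlinearity="relu")
    nn.init.zeros_(b)

def orthogonal_init(w, b):          # a stand-in alternative, not the paper's scheme
    nn.init.orthogonal_(w, gain=2 ** 0.5)
    nn.init.zeros_(b)

torch.manual_seed(0)
print("He init, dead fraction:        ", dead_unit_fraction(make_mlp(init_fn=he_init)))
print("Orthogonal init, dead fraction:", dead_unit_fraction(make_mlp(init_fn=orthogonal_init)))
```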

Pipeline Parallelism for DNN Inference with Practical Performance Guarantees

  • paper_url: http://arxiv.org/abs/2311.03703
  • repo_url: None
  • paper_authors: Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu
  • for: Optimizing pipeline parallelism for deep neural network (DNN) inference by partitioning the model graph into $k$ stages and minimizing the running time of the bottleneck stage, including communication.
  • methods: Practical algorithms are proposed for this NP-hard problem and shown to be nearly optimal in practice by comparison against strong lower bounds obtained via novel mixed-integer programming (MIP) formulations.
  • results: Applied to production models, the approach yields substantially improved approximation guarantees compared with standard combinatorial lower bounds; for example, evaluated via geometric means across production data with $k=16$ pipeline stages, the MIP formulations more than double the lower bounds, improving the approximation ratio from $2.175$ to $1.058$. This suggests that while max-throughput partitioning is theoretically hard, the algorithmic side is well in hand in practice, and much of the remaining challenge lies in developing more accurate cost models to feed into the partitioning algorithms.
    Abstract We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We design practical algorithms for this NP-hard problem and show that they are nearly optimal in practice by comparing against strong lower bounds obtained via novel mixed-integer programming (MIP) formulations. We apply these algorithms and lower-bound methods to production models to achieve substantially improved approximation guarantees compared to standard combinatorial lower bounds. For example, evaluated via geometric means across production data with $k=16$ pipeline stages, our MIP formulations more than double the lower bounds, improving the approximation ratio from $2.175$ to $1.058$. This work shows that while max-throughput partitioning is theoretically hard, we have a handle on the algorithmic side of the problem in practice and much of the remaining challenge is in developing more accurate cost models to feed into the partitioning algorithms.
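
The core combinatorial subproblem, stripped of communication costs and general graph structure, is minimizing the bottleneck cost when splitting a chain of per-op costs into $k$ contiguous stages. A standard binary-search solution is sketched below; it is a baseline illustration, not the paper's algorithm or MIP formulation.

```python
# Simplest form of the underlying combinatorial problem: split a chain of
# per-op costs into k contiguous stages minimizing the bottleneck stage cost.
# Binary search on the bottleneck plus a greedy feasibility check. The paper's
# setting also includes communication costs and general graphs (not shown).
def feasible(costs, k, limit):
    stages, current = 1, 0.0
    for c in costs:
        if c > limit:
            return False
        if current + c > limit:
            stages += 1
            current = c
        else:
            current += c
    return stages <= k

def min_bottleneck_partition(costs, k, tol=1e-9):
    lo, hi = max(costs), sum(costs)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(costs, k, mid):
            hi = mid
        else:
            lo = mid
    return hi

op_costs = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # per-layer inference costs
print(min_bottleneck_partition(op_costs, k=3))         # bottleneck of the best 3-stage split
```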

Dynamic Non-monotone Submodular Maximization

  • paper_url: http://arxiv.org/abs/2311.03685
  • repo_url: None
  • paper_authors: Kiarash Banihashem, Leyla Biabani, Samira Goudarzi, MohammadTaghi Hajiaghayi, Peyman Jabbarzade, Morteza Monemizadeh
  • for: This paper studies dynamic algorithms for non-monotone submodular maximization under a cardinality constraint.
  • methods: The problem of maximizing a non-monotone submodular function under the cardinality constraint $k$ is reduced to maximizing a monotone submodular function under the same constraint, yielding dynamic algorithms.
  • results: The resulting algorithms are the first dynamic algorithms for non-monotone submodular maximization under a cardinality constraint; they maintain an $(8+\epsilon)$-approximate solution using expected amortized $O(\epsilon^{-3}k^3\log^3(n)\log(k))$ or $O(\epsilon^{-1}k^2\log^3(k))$ oracle queries per update, and are demonstrated on video summarization and max-cut problems on several real-world data sets.
    Abstract Maximizing submodular functions has been increasingly used in many applications of machine learning, such as data summarization, recommendation systems, and feature selection. Moreover, there has been a growing interest in both submodular maximization and dynamic algorithms. In 2020, Monemizadeh and Lattanzi, Mitrovic, Norouzi{-}Fard, Tarnawski, and Zadimoghaddam initiated developing dynamic algorithms for the monotone submodular maximization problem under the cardinality constraint $k$. Recently, there have been some improvements on the topic made by Banihashem, Biabani, Goudarzi, Hajiaghayi, Jabbarzade, and Monemizadeh. In 2022, Chen and Peng studied the complexity of this problem and raised an important open question: "Can we extend [fully dynamic] results (algorithm or hardness) to non-monotone submodular maximization?". We affirmatively answer their question by demonstrating a reduction from maximizing a non-monotone submodular function under the cardinality constraint $k$ to maximizing a monotone submodular function under the same constraint. Through this reduction, we obtain the first dynamic algorithms to solve the non-monotone submodular maximization problem under the cardinality constraint $k$. Our algorithms maintain an $(8+\epsilon)$-approximate of the solution and use expected amortized $O(\epsilon^{-3}k^3\log^3(n)\log(k))$ or $O(\epsilon^{-1}k^2\log^3(k))$ oracle queries per update, respectively. Furthermore, we showcase the benefits of our dynamic algorithm for video summarization and max-cut problems on several real-world data sets.

Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural Networks

  • paper_url: http://arxiv.org/abs/2311.03683
  • repo_url: None
  • paper_authors: Ahmad Rashid, Serena Hacker, Guojun Zhang, Agustinus Kristiadi, Pascal Poupart
  • for: Preventing arbitrarily high confidence of discriminatively trained neural networks on out-of-distribution (OOD) data.
  • methods: An extra output corresponding to the logit of an additional class is added to the network and designed to dominate the logits of the original classes as inputs move away from the training data.
  • results: The method performs strongly against competitive baselines on several benchmarks, on both far-away and realistic OOD data.
    Abstract Discriminatively trained, deterministic neural networks are the de facto choice for classification problems. However, even though they achieve state-of-the-art results on in-domain test sets, they tend to be overconfident on out-of-distribution (OOD) data. For instance, ReLU networks -- a popular class of neural network architectures -- have been shown to almost always yield high confidence predictions when the test data are far away from the training set, even when they are trained with OOD data. We overcome this problem by adding a term to the output of the neural network that corresponds to the logit of an extra class, which we design to dominate the logits of the original classes as we move away from the training data. This technique provably prevents arbitrarily high confidence on far-away test data while maintaining a simple discriminative point-estimate training. Evaluation on various benchmarks demonstrates strong performance against competitive baselines on both far-away and realistic OOD data.
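
A minimal way to graft the extra logit onto a K-class head is sketched below. How that logit is parameterized so that it provably dominates far from the training data is the paper's contribution; the distance-to-centroids term here is only a placeholder assumption.

```python
# Minimal sketch of appending an extra "far-away" logit to a K-class head.
# The distance-to-centroids term is a placeholder assumption, not the authors'
# construction for making the extra logit dominate far from the data.
import torch
import torch.nn as nn

class ExtraClassHead(nn.Module):
    def __init__(self, feat_dim: int, n_classes: int, n_centroids: int = 16):
        super().__init__()
        self.linear = nn.Linear(feat_dim, n_classes)
        self.centroids = nn.Parameter(torch.randn(n_centroids, feat_dim))
        self.scale = nn.Parameter(torch.tensor(1.0))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        class_logits = self.linear(features)                       # (B, K)
        d = torch.cdist(features, self.centroids).min(dim=1).values
        extra = (self.scale * d).unsqueeze(1)                      # grows with distance
        return torch.cat([class_logits, extra], dim=1)             # (B, K + 1)

head = ExtraClassHead(feat_dim=32, n_classes=10)
feats = torch.randn(4, 32)
probs = torch.softmax(head(feats), dim=1)          # in-distribution confidence is
print(probs[:, :10].max(dim=1).values)             # capped by the extra class's mass
```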

Graph Neural Networks for Power Grid Operational Risk Assessment

  • paper_url: http://arxiv.org/abs/2311.03661
  • repo_url: None
  • paper_authors: Yadong Zhang, Pranav M Karve, Sankaran Mahadevan
  • for: investigate the utility of graph neural network (GNN) surrogates for Monte Carlo (MC) sampling-based risk quantification in daily operations of power grid
  • methods: train GNN surrogates using supervised learning, and use them to obtain MC samples of the quantities of interest (operating reserve, transmission line flow) given the (hours-ahead) probabilistic wind generation and load forecast
  • results: GNN surrogates are sufficiently accurate for predicting the (bus-level, branch-level and system-level) grid state and enable fast as well as accurate operational risk quantification for power grids
    Abstract In this article, the utility of graph neural network (GNN) surrogates for Monte Carlo (MC) sampling-based risk quantification in daily operations of power grid is investigated. The MC simulation process necessitates solving a large number of optimal power flow (OPF) problems corresponding to the sample values of stochastic grid variables (power demand and renewable generation), which is computationally prohibitive. Computationally inexpensive surrogates of the OPF problem provide an attractive alternative for expedited MC simulation. GNN surrogates are especially suitable due to their superior ability to handle graph-structured data. Therefore, GNN surrogates of OPF problem are trained using supervised learning. They are then used to obtain Monte Carlo (MC) samples of the quantities of interest (operating reserve, transmission line flow) given the (hours-ahead) probabilistic wind generation and load forecast. The utility of GNN surrogates is evaluated by comparing OPF-based and GNN-based grid reliability and risk for IEEE Case118 synthetic grid. It is shown that the GNN surrogates are sufficiently accurate for predicting the (bus-level, branch-level and system-level) grid state and enable fast as well as accurate operational risk quantification for power grids. The article thus develops various tools for fast reliability and risk quantification for real-world power grids using GNNs.

  • paper_url: http://arxiv.org/abs/2311.03639
  • repo_url: None
  • paper_authors: Amirhossein Mollaali, Izzet Sahin, Iqrar Raza, Christian Moya, Guillermo Paniagua, Guang Lin
  • for: This paper aims to develop a deep operator learning-based framework for achieving high-fidelity results in computational simulations with limited computational resources.
  • methods: The proposed framework uses a physics-guided, bi-fidelity, Fourier-featured Deep Operator Network (DeepONet) that combines low- and high-fidelity datasets to learn the foundational solution patterns and refine the initial low-fidelity output. The network utilizes an extensive dataset for foundational learning and a small high-fidelity dataset for refinement.
  • results: The proposed approach is validated using a well-known 2D benchmark cylinder problem, and the results show that the physics-guided Fourier-featured deep operator network possesses superior predictive capability for the lift and drag coefficients compared to data-driven counterparts.
    Abstract In the pursuit of accurate experimental and computational data while minimizing effort, there is a constant need for high-fidelity results. However, achieving such results often requires significant computational resources. To address this challenge, this paper proposes a deep operator learning-based framework that requires a limited high-fidelity dataset for training. We introduce a novel physics-guided, bi-fidelity, Fourier-featured Deep Operator Network (DeepONet) framework that effectively combines low and high-fidelity datasets, leveraging the strengths of each. In our methodology, we began by designing a physics-guided Fourier-featured DeepONet, drawing inspiration from the intrinsic physical behavior of the target solution. Subsequently, we train this network to primarily learn the low-fidelity solution, utilizing an extensive dataset. This process ensures a comprehensive grasp of the foundational solution patterns. Following this foundational learning, the low-fidelity deep operator network's output is enhanced using a physics-guided Fourier-featured residual deep operator network. This network refines the initial low-fidelity output, achieving the high-fidelity solution by employing a small high-fidelity dataset for training. Notably, in our framework, we employ the Fourier feature network as the Trunk network for the DeepONets, given its proficiency in capturing and learning the oscillatory nature of the target solution with high precision. We validate our approach using a well-known 2D benchmark cylinder problem, which aims to predict the time trajectories of lift and drag coefficients. The results highlight that the physics-guided Fourier-featured deep operator network, serving as a foundational building block of our framework, possesses superior predictive capability for the lift and drag coefficients compared to its data-driven counterparts.
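
The Fourier-feature mapping applied to the trunk inputs might look like the sketch below; the frequency matrix and its scale are assumptions, and the DeepONet branch/trunk wiring itself is omitted.

```python
# Small sketch of a Fourier-feature mapping of trunk inputs, of the kind used
# to capture oscillatory targets. The frequency matrix B and its scale are
# assumptions; the full DeepONet branch/trunk wiring is omitted.
import numpy as np

def fourier_features(x, B):
    """x: (n, d) coordinates; B: (d, m) random frequency matrix."""
    proj = 2.0 * np.pi * x @ B
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)   # (n, 2m)

rng = np.random.default_rng(0)
B = rng.normal(scale=10.0, size=(2, 64))         # scale sets the emphasized frequency band
xt = rng.uniform(size=(5, 2))                    # e.g. (x, t) trunk coordinates
print(fourier_features(xt, B).shape)             # (5, 128)
```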

Counterfactual Data Augmentation with Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.03630
  • repo_url: None
  • paper_authors: Ahmed Aloui, Juncheng Dong, Cat P. Le, Vahid Tarokh
  • for: Addressing the statistical disparity between treatment groups when estimating Conditional Average Treatment Effects (CATE).
  • methods: A model-agnostic data augmentation method is introduced that imputes counterfactual outcomes for a selected subset of individuals; contrastive learning is used to learn a representation space and similarity measure under which close individuals have similar potential outcomes, enabling reliable imputation from close neighbors in the alternative treatment group.
  • results: Theoretical analysis and experiments on synthetic and semi-synthetic benchmarks show significant improvements in both performance and robustness to overfitting across state-of-the-art models.
    Abstract Statistical disparity between distinct treatment groups is one of the most significant challenges for estimating Conditional Average Treatment Effects (CATE). To address this, we introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals. Specifically, we utilize contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes. This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group. By augmenting the original dataset with these reliable imputations, we can effectively reduce the discrepancy between different treatment groups, while inducing minimal imputation error. The augmented dataset is subsequently employed to train CATE estimation models. Theoretical analysis and experimental studies on synthetic and semi-synthetic benchmarks demonstrate that our method achieves significant improvements in both performance and robustness to overfitting across state-of-the-art models.
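
Once a representation and similarity measure have been learned, the imputation step can be sketched as below: each unit's counterfactual outcome is copied from its nearest neighbor in the opposite treatment group, measured in the learned space. The identity representation used here is a placeholder; the contrastive training of the representation is not shown.

```python
# Minimal sketch of the counterfactual-imputation step given a learned
# representation phi(.). Contrastive training of phi is not shown; the identity
# representation below is a placeholder.
import numpy as np

def augment_with_counterfactuals(X, t, y, phi=lambda x: x, max_dist=np.inf):
    Z = phi(X)
    rows = []
    for g, other in ((1, 0), (0, 1)):                    # treated <-> control
        idx_g, idx_o = np.where(t == g)[0], np.where(t == other)[0]
        for i in idx_g:
            d = np.linalg.norm(Z[idx_o] - Z[i], axis=1)
            j = idx_o[np.argmin(d)]
            if d.min() <= max_dist:                      # only trust close neighbors
                rows.append((X[i], other, y[j]))         # imputed counterfactual sample
    X_cf, t_cf, y_cf = map(np.array, zip(*rows))
    return (np.vstack([X, X_cf]),
            np.concatenate([t, t_cf]),
            np.concatenate([y, y_cf]))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
t = rng.integers(0, 2, size=200)
y = X[:, 0] + t * (1.0 + X[:, 1]) + 0.1 * rng.normal(size=200)
X_aug, t_aug, y_aug = augment_with_counterfactuals(X, t, y)
print(X_aug.shape, t_aug.shape, y_aug.shape)     # original plus imputed samples
```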

Are Words Enough? On the semantic conditioning of affective music generation

  • paper_url: http://arxiv.org/abs/2311.03624
  • repo_url: None
  • paper_authors: Jorge Forero, Gilberto Bernardes, Mónica Mendes
  • for: This scoping review examines the possibilities of automatic music generation, with a focus on music generation conditioned by emotions.
  • methods: It reviews the two main paradigms adopted in automatic music generation, rule-based models and machine-learning models, with particular attention to deep learning architectures that generate high-fidelity music from textual descriptions.
  • results: The review concludes that combining deep learning with natural language processing can provide the creative industries with powerful tools for prompting and generating new musical works.
    Abstract Music has been commonly recognized as a means of expressing emotions. In this sense, an intense debate emerges from the need to verbalize musical emotions. This concern seems highly relevant today, considering the exponential growth of natural language processing using deep learning models where it is possible to prompt semantic propositions to generate music automatically. This scoping review aims to analyze and discuss the possibilities of music generation conditioned by emotions. To address this topic, we propose a historical perspective that encompasses the different disciplines and methods contributing to this topic. In detail, we review two main paradigms adopted in automatic music generation: rules-based and machine-learning models. Of note are the deep learning architectures that aim to generate high-fidelity music from textual descriptions. These models raise fundamental questions about the expressivity of music, including whether emotions can be represented with words or expressed through them. We conclude that, despite the limitations and ambiguity of language in expressing emotions through music, the use of deep learning with natural language has the potential to impact the creative industries by providing powerful tools to prompt and generate new musical works.

Exploring Latent Spaces of Tonal Music using Variational Autoencoders

  • paper_url: http://arxiv.org/abs/2311.03621
  • repo_url: https://github.com/nadiacarvalho/latent-tonal-music
  • paper_authors: Nádia Carvalho, Gilberto Bernardes
  • for: This paper assesses how well Variational Autoencoders (VAEs) produce latent representations of cognitive and semantic value for tonal music.
  • methods: VAEs are trained on different corpus encodings -- piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch class distributions -- and the resulting latent spaces are compared.
  • results: The ABC encoding reconstructs the original data best, while the Pitch DFT encoding appears to capture more information from the latent space. An evaluation over 12 major or minor transpositions per piece shows that the Pitch DFT VAE latent space aligns best with cognitive pitch spaces and provides a common-tone space in which the components of each key follow a well-defined order of structural significance or stability, i.e., a tonal hierarchy.
    Abstract Variational Autoencoders (VAEs) have proven to be effective models for producing latent representations of cognitive and semantic value. We assess the degree to which VAEs trained on a prototypical tonal music corpus of 371 Bach's chorales define latent spaces representative of the circle of fifths and the hierarchical relation of each key component pitch as drawn in music cognition. In detail, we compare the latent space of different VAE corpus encodings -- Piano roll, MIDI, ABC, Tonnetz, DFT of pitch, and pitch class distributions -- in providing a pitch space for key relations that align with cognitive distances. We evaluate the model performance of these encodings using objective metrics to capture accuracy, mean square error (MSE), KL-divergence, and computational cost. The ABC encoding performs the best in reconstructing the original data, while the Pitch DFT seems to capture more information from the latent space. Furthermore, an objective evaluation of 12 major or minor transpositions per piece is adopted to quantify the alignment of 1) intra- and inter-segment distances per key and 2) the key distances to cognitive pitch spaces. Our results show that Pitch DFT VAE latent spaces align best with cognitive spaces and provide a common-tone space where overlapping objects within a key are fuzzy clusters, which impose a well-defined order of structural significance or stability -- i.e., a tonal hierarchy. Tonal hierarchies of different keys can be used to measure key distances and the relationships of their in-key components at multiple hierarchies (e.g., notes and chords). The implementation of our VAE and the encodings framework are made available online.