cs.LG - 2023-11-21

Training Deep 3D Convolutional Neural Networks to Extract BSM Physics Parameters Directly from HEP Data: a Proof-of-Concept Study Using Monte Carlo Simulations

  • paper_url: http://arxiv.org/abs/2311.13060
  • repo_url: None
  • paper_authors: S. Dubey, T. E. Browder, S. Kohani, R. Mandal, A. Sibidanov, R. Sinha
  • for: This paper presents a novel application of computer vision techniques to extract beyond the Standard Model (BSM) parameters directly from high energy physics (HEP) flavor data.
  • methods: The authors develop a method of transforming angular and kinematic distributions into "quasi-images" that can be used to train a convolutional neural network for regression tasks, similar to fitting. This contrasts with the classification tasks for which ML/AI is typically used in HEP.
  • results: As a proof of concept, a 34-layer Residual Neural Network is trained to regress the Wilson Coefficient $C_{9}$ in MC (Monte Carlo) simulations of $B \rightarrow K^{*}\mu^{+}\mu^{-}$ decays. The technique can be generalized and may find applications in other HEP experiments.
    Abstract We report on a novel application of computer vision techniques to extract beyond the Standard Model (BSM) parameters directly from high energy physics (HEP) flavor data. We develop a method of transforming angular and kinematic distributions into "quasi-images" that can be used to train a convolutional neural network to perform regression tasks, similar to fitting. This contrasts with the usual classification functions performed using ML/AI in HEP. As a proof-of-concept, we train a 34-layer Residual Neural Network to regress on these images and determine the Wilson Coefficient $C_{9}$ in MC (Monte Carlo) simulations of $B \rightarrow K^{*}\mu^{+}\mu^{-}$ decays. The technique described here can be generalized and may find applicability across various HEP experiments and elsewhere.
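As a rough illustration of the quasi-image idea (not the paper's pipeline), the sketch below bins three assumed decay angles into a 3D histogram and feeds it to a small 3D CNN regressor; the variable names, bin count, and tiny network are placeholders for the 34-layer ResNet used in the paper.

```python
import numpy as np
import torch
import torch.nn as nn

def make_quasi_image(cos_theta_l, cos_theta_k, phi, bins=16):
    """Bin per-event angular variables into a 3D histogram ("quasi-image")."""
    sample = np.stack([cos_theta_l, cos_theta_k, phi], axis=1)
    ranges = [(-1, 1), (-1, 1), (-np.pi, np.pi)]
    image, _ = np.histogramdd(sample, bins=bins, range=ranges)
    return image.astype(np.float32) / len(cos_theta_l)   # normalize counts

class QuasiImageRegressor(nn.Module):
    """Small 3D CNN that maps a quasi-image to a single scalar (e.g. C9)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(16, 1),
        )
    def forward(self, x):            # x: (batch, 1, bins, bins, bins)
        return self.net(x).squeeze(-1)

# toy usage on random "events"
rng = np.random.default_rng(0)
img = make_quasi_image(rng.uniform(-1, 1, 5000),
                       rng.uniform(-1, 1, 5000),
                       rng.uniform(-np.pi, np.pi, 5000))
model = QuasiImageRegressor()
pred = model(torch.from_numpy(img)[None, None])   # predicted scalar for this sample
```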

A note on estimating the dimension from a random geometric graph

  • paper_url: http://arxiv.org/abs/2311.13059
  • repo_url: None
  • paper_authors: Caelan Atamanchuk, Luc Devroye, Gabor Lugosi
  • for: The paper studies the problem of estimating the dimension d of the underlying space from a random geometric graph, given access to the adjacency matrix but not to the threshold r_n or the vectors X_i.
  • methods: The paper constructs an estimator of d from the adjacency matrix of the random geometric graph that requires no knowledge of the density and converges to d as $n^{3/2} r_n^d \to \infty$.
  • results: The main result is that, for all densities with $\int f^5 < \infty$, there exists an estimator that converges to d in probability whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$. Moreover, without any condition on the density, a consistent estimator exists when $n r_n^d \to \infty$ and $r_n = o(1)$.
    Abstract Let $G_n$ be a random geometric graph with vertex set $[n]$ based on $n$ i.i.d.\ random vectors $X_1,\ldots,X_n$ drawn from an unknown density $f$ on $\R^d$. An edge $(i,j)$ is present when $\|X_i -X_j\| \le r_n$, for a given threshold $r_n$ possibly depending upon $n$, where $\| \cdot \|$ denotes Euclidean distance. We study the problem of estimating the dimension $d$ of the underlying space when we have access to the adjacency matrix of the graph but do not know $r_n$ or the vectors $X_i$. The main result of the paper is that there exists an estimator of $d$ that converges to $d$ in probability as $n \to \infty$ for all densities with $\int f^5 < \infty$ whenever $n^{3/2} r_n^d \to \infty$ and $r_n = o(1)$. The conditions allow very sparse graphs since when $n^{3/2} r_n^d \to 0$, the graph contains isolated edges only, with high probability. We also show that, without any condition on the density, a consistent estimator of $d$ exists when $n r_n^d \to \infty$ and $r_n = o(1)$.
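For context, here is a minimal sketch of the setup (not the paper's estimator): it builds the adjacency matrix of $G_n$ from i.i.d. points and a threshold $r$, which is the only information the estimator studied in the paper is allowed to use. The uniform density and the particular $n$, $d$, $r$ are arbitrary choices.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def random_geometric_graph(n, d, r, rng=None):
    """Adjacency matrix of G_n: X_1..X_n i.i.d. (here uniform on [0,1]^d),
    with an edge (i, j) whenever ||X_i - X_j|| <= r."""
    rng = np.random.default_rng(rng)
    X = rng.random((n, d))
    A = (squareform(pdist(X)) <= r).astype(int)
    np.fill_diagonal(A, 0)
    return A

A = random_geometric_graph(n=2000, d=3, r=0.1, rng=0)
# The estimator in the paper works from A alone, without knowing r or the X_i.
```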

Multi-fidelity Bayesian Optimization in Engineering Design

  • paper_url: http://arxiv.org/abs/2311.13050
  • repo_url: None
  • paper_authors: Bach Do, Ruda Zhang
  • for: The paper surveys multi-fidelity Bayesian optimization (MF BO), which combines multi-fidelity optimization (MFO) and Bayesian optimization (BO) to solve expensive engineering design optimization problems, exploiting advantages such as incorporating physical and mathematical understanding, saving resources, addressing the exploitation-exploration trade-off, handling uncertainty, and supporting parallel computing.
  • methods: The paper reviews recent developments of the two essential ingredients of MF BO: Gaussian process (GP) based multi-fidelity surrogates and acquisition functions. It first categorizes existing MF modeling methods and MFO strategies to place MF BO within the larger family of surrogate-based optimization and MFO algorithms, then exploits the properties shared by methods in each ingredient to describe important GP-based MF surrogate models and review various acquisition functions.
  • results: The survey provides a structured understanding of MF BO and identifies aspects that require further research for applying MF BO to intricate yet important design optimization problems, including constrained optimization, high-dimensional optimization, optimization under uncertainty, and multi-objective optimization.
    Abstract Resided at the intersection of multi-fidelity optimization (MFO) and Bayesian optimization (BO), MF BO has found a niche in solving expensive engineering design optimization problems, thanks to its advantages in incorporating physical and mathematical understandings of the problems, saving resources, addressing exploitation-exploration trade-off, considering uncertainty, and processing parallel computing. The increasing number of works dedicated to MF BO suggests the need for a comprehensive review of this advanced optimization technique. In this paper, we survey recent developments of two essential ingredients of MF BO: Gaussian process (GP) based MF surrogates and acquisition functions. We first categorize the existing MF modeling methods and MFO strategies to locate MF BO in a large family of surrogate-based optimization and MFO algorithms. We then exploit the common properties shared between the methods from each ingredient of MF BO to describe important GP-based MF surrogate models and review various acquisition functions. By doing so, we expect to provide a structured understanding of MF BO. Finally, we attempt to reveal important aspects that require further research for applications of MF BO in solving intricate yet important design optimization problems, including constrained optimization, high-dimensional optimization, optimization under uncertainty, and multi-objective optimization.
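As a toy illustration of the ingredients the survey covers, the sketch below fits a GP to cheap low-fidelity data and a second GP to the high-fidelity residual, then queries a UCB-style acquisition; this additive-correction scheme and all functions and hyperparameters are assumptions made for illustration, not a method taken from the survey.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy objective with a cheap low-fidelity (LF) and an expensive high-fidelity (HF) version.
def f_hf(x): return np.sin(8 * x) + x
def f_lf(x): return 0.8 * f_hf(x) + 0.3 * np.cos(3 * x)     # biased, cheap approximation

rng = np.random.default_rng(0)
x_lf = rng.random((40, 1))            # many cheap LF evaluations
x_hf = rng.random((6, 1))             # few expensive HF evaluations

gp_lf = GaussianProcessRegressor(kernel=RBF(0.2)).fit(x_lf, f_lf(x_lf).ravel())
# Model the HF data as LF prediction + a GP on the residual (a simple additive correction).
resid = f_hf(x_hf).ravel() - gp_lf.predict(x_hf)
gp_err = GaussianProcessRegressor(kernel=RBF(0.2)).fit(x_hf, resid)

def mf_surrogate(x):
    mu_lf = gp_lf.predict(x)
    mu_err, std_err = gp_err.predict(x, return_std=True)
    return mu_lf + mu_err, std_err    # mean and residual uncertainty for an acquisition function

x_grid = np.linspace(0, 1, 200)[:, None]
mu, std = mf_surrogate(x_grid)
x_next = x_grid[np.argmax(mu + 2.0 * std)]   # UCB-style acquisition picks the next HF query
```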

Favour: FAst Variance Operator for Uncertainty Rating

  • paper_url: http://arxiv.org/abs/2311.13036
  • repo_url: None
  • paper_authors: Thomas D. Ahle, Sahar Karimi, Peter Tak Peter Tang
  • for: The paper aims to broaden the adoption of Bayesian neural networks (BNNs), which estimate the uncertainty of a prediction by sampling from the posterior distribution.
  • methods: The paper proposes a more principled variance propagation framework based on "spiked covariance matrices", which smoothly interpolates between quality and inference time. The framework relies on a new fast algorithm for updating a diagonal-plus-low-rank matrix approximation under various operations.
  • results: Tested against sampling-based MC Dropout and Variational Inference on downstream uncertainty-themed tasks such as calibration and out-of-distribution testing, Favour is as fast as performing 2-3 inference samples while matching the performance of 10-100 samples, enabling the use of BNNs in performance-critical tasks.
    Abstract Bayesian Neural Networks (BNN) have emerged as a crucial approach for interpreting ML predictions. By sampling from the posterior distribution, data scientists may estimate the uncertainty of an inference. Unfortunately many inference samples are often needed, the overhead of which greatly hinder BNN's wide adoption. To mitigate this, previous work proposed propagating the first and second moments of the posterior directly through the network. However, on its own this method is even slower than sampling, so the propagated variance needs to be approximated such as assuming independence between neural nodes. The resulting trade-off between quality and inference time did not match even plain Monte Carlo sampling. Our contribution is a more principled variance propagation framework based on "spiked covariance matrices", which smoothly interpolates between quality and inference time. This is made possible by a new fast algorithm for updating a diagonal-plus-low-rank matrix approximation under various operations. We tested our algorithm against sampling based MC Dropout and Variational Inference on a number of downstream uncertainty themed tasks, such as calibration and out-of-distribution testing. We find that Favour is as fast as performing 2-3 inference samples, while matching the performance of 10-100 samples. In summary, this work enables the use of BNN in the realm of performance critical tasks where they have previously been out of reach.
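The following sketch only illustrates the diagonal-plus-low-rank ("spiked") covariance representation and how it passes through a linear layer: the low-rank part propagates exactly, while the dense part is re-approximated by its diagonal. It is a simplified illustration of the representation, not Favour's update algorithm.

```python
import numpy as np

def propagate_linear(d, U, W):
    """Push a covariance approximated as diag(d) + U @ U.T through y = W x.

    The exact output covariance is W diag(d) W^T + (W U)(W U)^T.  The low-rank
    ("spiked") part propagates exactly; here we keep only the diagonal of
    W diag(d) W^T so the output stays in diagonal-plus-low-rank form.
    """
    U_out = W @ U                 # (m, k) low-rank factor, exact
    d_out = (W ** 2) @ d          # diag(W diag(d) W^T), computed elementwise
    return d_out, U_out

# Toy check of the diagonal against the dense computation.
rng = np.random.default_rng(0)
n, m, k = 6, 4, 2
d = rng.random(n); U = rng.normal(size=(n, k)); W = rng.normal(size=(m, n))
d_out, U_out = propagate_linear(d, U, W)
dense = W @ (np.diag(d) + U @ U.T) @ W.T
assert np.allclose(np.diag(dense), d_out + np.sum(U_out ** 2, axis=1))
```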

Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

  • paper_url: http://arxiv.org/abs/2311.17894
  • repo_url: None
  • paper_authors: Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore
  • for: The study uses a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms when stimulated by the electron beam of a scanning transmission electron microscope (STEM).
  • methods: The approach is data-centric: data samples collected on a STEM are processed and filtered to produce symbolic representations, which are used to train a neural network that predicts transition probabilities.
  • results: Empirical analyses show that the learned transition dynamics can guide a single silicon atom through the lattice to pre-determined target destinations, demonstrating the efficacy and generality of the approach.
    Abstract We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural network to predict transition probabilities. These learned transition dynamics are then leveraged to guide a single silicon atom throughout the lattice to pre-determined target destinations. We present empirical analyses that demonstrate the efficacy and generality of our approach.

Fast and Interpretable Mortality Risk Scores for Critical Care Patients

  • paper_url: http://arxiv.org/abs/2311.13015
  • repo_url: https://github.com/muhangtian/gfr-experiments
  • paper_authors: Chloe Qinyu Zhu, Muhang Tian, Lesia Semenova, Jiachang Liu, Jack Xu, Joseph Scarpa, Cynthia Rudin
  • for: Predicting mortality risk for ICU patients is an important task in critical care medicine, where risk scores support patient monitoring and care.
  • methods: The authors build on modern interpretable machine learning techniques to design accurate and interpretable mortality risk scores, leveraging the largest public ICU monitoring datasets, MIMIC III and eICU, and evaluating risk across medical centers to study generalization across domains.
  • results: The proposed GroupFasterRisk algorithm produces highly interpretable mortality risk scores within hours, with prediction performance comparable to black-box ML models while being much sparser. It offers direct control over the number of features, group sparsity for more cohesive models, optional monotonicity correction to incorporate domain knowledge, and produces many equally good models at once so that domain experts can choose among them; the resulting scores outperform risk scores currently used in hospitals.
    Abstract Prediction of mortality in intensive care unit (ICU) patients is an important task in critical care medicine. Prior work in creating mortality risk models falls into two major categories: domain-expert-created scoring systems, and black box machine learning (ML) models. Both of these have disadvantages: black box models are unacceptable for use in hospitals, whereas manual creation of models (including hand-tuning of logistic regression parameters) relies on humans to perform high-dimensional constrained optimization, which leads to a loss in performance. In this work, we bridge the gap between accurate black box models and hand-tuned interpretable models. We build on modern interpretable ML techniques to design accurate and interpretable mortality risk scores. We leverage the largest existing public ICU monitoring datasets, namely the MIMIC III and eICU datasets. By evaluating risk across medical centers, we are able to study generalization across domains. In order to customize our risk score models, we develop a new algorithm, GroupFasterRisk, which has several important benefits: (1) it uses hard sparsity constraint, allowing users to directly control the number of features; (2) it incorporates group sparsity to allow more cohesive models; (3) it allows for monotonicity correction on models for including domain knowledge; (4) it produces many equally-good models at once, which allows domain experts to choose among them. GroupFasterRisk creates its risk scores within hours, even on the large datasets we study here. GroupFasterRisk's risk scores perform better than risk scores currently used in hospitals, and have similar prediction performance to black box ML models (despite being much sparser). Because GroupFasterRisk produces a variety of risk scores and handles constraints, it allows design flexibility, which is the key enabler of practical and trustworthy model creation.
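For flavor, here is a generic integer-point scorecard sketch of the kind of model such methods produce: a sparse logistic regression whose scaled coefficients are rounded to points. This is not the GroupFasterRisk algorithm, and the binary features and data are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical binary features, e.g. "age > 65", "lactate elevated", ...
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 5))
y = (rng.random(1000) < 1 / (1 + np.exp(-(X @ [1.2, 0.8, 0.0, 0.5, -0.7] - 1)))).astype(int)

# Sparse logistic model, then round scaled coefficients to small integer "points".
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
scale = 2.0
points = np.round(clf.coef_.ravel() * scale).astype(int)

def risk(x_row):
    """Scorecard-style risk: add up integer points, map back through the logistic."""
    score = clf.intercept_[0] * scale + points @ x_row
    return 1 / (1 + np.exp(-score / scale))

print("points per feature:", points)
print("risk for an example patient:", risk(X[0]))
```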

How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks

  • paper_url: http://arxiv.org/abs/2311.12997
  • repo_url: None
  • paper_authors: Rahul Ramesh, Mikail Khona, Robert P. Dick, Hidenori Tanaka, Ekdeep Singh Lubana
  • for: The study assesses how capable a Transformer can become, i.e., whether it can learn to compose well-defined capabilities into more complex operations.
  • methods: The authors train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities, and run extensive, systematic experiments on this process.
  • results: Autoregressive Transformers can learn compositional structures from the training data and generalize to exponentially or even combinatorially many functions; composing functions by generating intermediate outputs is more effective for generalizing to unseen compositions than generating no intermediate outputs; the training data strongly affects the model's ability to compose unseen combinations of functions; and the attention layers in the latter half of the model are critical to compositionality.
    Abstract Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e.g., performing simple logical operations. Given the inherent compositional nature of language, one can expect the model to learn to compose these capabilities, potentially yielding a combinatorial explosion of what operations it can perform on an input. Motivated by the above, we aim to assess in this paper "how capable can a transformer become?". Specifically, we train autoregressive Transformer models on a data-generating process that involves compositions of a set of well-defined monolithic capabilities. Through a series of extensive and systematic experiments on this data-generating process, we show that: (1) autoregressive Transformers can learn compositional structures from the training data and generalize to exponentially or even combinatorially many functions; (2) composing functions by generating intermediate outputs is more effective at generalizing to unseen compositions, compared to generating no intermediate outputs; (3) the training data has a significant impact on the model's ability to compose unseen combinations of functions; and (4) the attention layers in the latter half of the model are critical to compositionality.
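A minimal sketch of a compositional data-generating process in the spirit described above: hypothetical primitive operations on digit sequences are composed, with targets emitted either step by step (intermediate outputs) or as the final answer only. The primitives and token format are assumptions, not the paper's.

```python
import random

# Hypothetical "monolithic capabilities" on digit sequences; the paper's exact
# primitives differ, this only illustrates the compositional idea.
CAPABILITIES = {
    "rev": lambda s: s[::-1],
    "inc": lambda s: [(t + 1) % 10 for t in s],
    "dup": lambda s: s + s,
}

def make_example(depth=2, length=4, with_intermediate=True, rng=random):
    x = [rng.randrange(10) for _ in range(length)]
    ops = [rng.choice(list(CAPABILITIES)) for _ in range(depth)]
    prompt = ops + ["|"] + x + ["->"]
    target, state = [], x
    for op in ops:
        state = CAPABILITIES[op](state)
        if with_intermediate:                 # step-by-step supervision
            target += state + [";"]
    if not with_intermediate:                 # direct-answer supervision
        target = state
    return prompt, target

random.seed(0)
print(make_example(with_intermediate=True))
print(make_example(with_intermediate=False))
```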

Hierarchical Learning for Quantum ML: Novel Training Technique for Large-Scale Variational Quantum Circuits

  • paper_url: http://arxiv.org/abs/2311.12929
  • repo_url: None
  • paper_authors: Hrant Gharibyan, Vincent Su, Hayk Tepanyan
  • for: Efficient training of large-scale variational quantum circuits.
  • methods: A novel hierarchical-learning variational architecture, benchmarked on distribution loading with quantum circuit Born machines (QCBMs), in which the most significant qubits are learned first.
  • results: A QCBM is trained to reproduce a 3-dimensional multivariate Gaussian distribution on 27 qubits to roughly 4% total variation distance, and hierarchical learning is demonstrated as a resource-efficient way to load distributions on existing quantum hardware (IBM's 7- and 27-qubit devices) in tandem with Fire Opal optimizations.
    Abstract We present hierarchical learning, a novel variational architecture for efficient training of large-scale variational quantum circuits. We test and benchmark our technique for distribution loading with quantum circuit born machines (QCBMs). With QCBMs, probability distributions are loaded into the squared amplitudes of computational basis vectors represented by bitstrings. Our key insight is to take advantage of the fact that the most significant (qu)bits have a greater effect on the final distribution and can be learned first. One can think of it as a generalization of layerwise learning, where some parameters of the variational circuit are learned first to prevent the phenomena of barren plateaus. We briefly review adjoint methods for computing the gradient, in particular for loss functions that are not expectation values of observables. We first compare the role of connectivity in the variational ansatz for the task of loading a Gaussian distribution on nine qubits, finding that 2D connectivity greatly outperforms qubits arranged on a line. Based on our observations, we then implement this strategy on large-scale numerical experiments with GPUs, training a QCBM to reproduce a 3-dimensional multivariate Gaussian distribution on 27 qubits up to $\sim4\%$ total variation distance. Though barren plateau arguments do not strictly apply here due to the objective function not being tied to an observable, this is to our knowledge the first practical demonstration of variational learning on large numbers of qubits. We also demonstrate hierarchical learning as a resource-efficient way to load distributions for existing quantum hardware (IBM's 7 and 27 qubit devices) in tandem with Fire Opal optimizations.

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

  • paper_url: http://arxiv.org/abs/2311.12786
  • repo_url: None
  • paper_authors: Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger
  • for: The study asks how fine-tuning alters the capabilities a model learns during pretraining: does fine-tuning yield entirely novel capabilities, or does it merely modulate existing ones?
  • methods: The authors run experiments in synthetic, controlled settings and use mechanistic interpretability tools (e.g., network pruning and probing) to track how the model's underlying capabilities change during pretraining and fine-tuning.
  • results: They find that (i) fine-tuning rarely alters the underlying model capabilities; (ii) a minimal transformation, called a 'wrapper', is typically learned on top of the underlying capabilities, creating the illusion that they have been modified; and (iii) further fine-tuning on a task where such hidden capabilities are relevant leads to a sample-efficient 'revival' of the capability after only a few gradient steps. This implies that practitioners can unintentionally remove a model's safety wrapper merely by fine-tuning it on a superficially unrelated downstream task. Experiments on language models trained on the TinyStories dataset support these claims in a more realistic setup.
    Abstract Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy. Despite its clear importance, there has been minimal work that explains how fine-tuning alters the underlying capabilities learned by a model during pretraining: does fine-tuning yield entirely novel capabilities or does it just modulate existing ones? We address this question empirically in synthetic, controlled settings where we can use mechanistic interpretability tools (e.g., network pruning and probing) to understand how the model's underlying capabilities are changing. We perform an extensive analysis of the effects of fine-tuning in these settings, and show that: (i) fine-tuning rarely alters the underlying model capabilities; (ii) a minimal transformation, which we call a 'wrapper', is typically learned on top of the underlying model capabilities, creating the illusion that they have been modified; and (iii) further fine-tuning on a task where such hidden capabilities are relevant leads to sample-efficient 'revival' of the capability, i.e., the model begins reusing these capability after only a few gradient steps. This indicates that practitioners can unintentionally remove a model's safety wrapper merely by fine-tuning it on a, e.g., superficially unrelated, downstream task. We additionally perform analysis on language models trained on the TinyStories dataset to support our claims in a more realistic setup.

Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+α$ Moments

  • paper_url: http://arxiv.org/abs/2311.12784
  • repo_url: None
  • paper_authors: Trung Dang, Jasper C. H. Lee, Maoyuan Song, Paul Valiant
  • for: The paper studies mean estimation in $\mathbb{R}$, asking whether algorithms can leverage useful features of the input distribution to beat the sub-Gaussian error rate without explicit knowledge of those features.
  • methods: The paper constructs, for any distribution $p$ with finite mean, a distribution $q$ whose mean is well separated from $p$'s yet is indistinguishable from $p$ with high probability while preserving $p$'s moments up to constants, and introduces a new definitional framework, "neighborhood optimality", for analyzing the fine-grained optimality of algorithms.
  • results: No reasonable estimator can asymptotically beat the sub-Gaussian error rate for any distribution, matching the worst-case result of [LV22]; within the new framework, median-of-means is shown to be neighborhood optimal up to constant factors.
    Abstract There is growing interest in improving our algorithmic understanding of fundamental statistical problems such as mean estimation, driven by the goal of understanding the limits of what we can extract from valuable data. The state of the art results for mean estimation in $\mathbb{R}$ are 1) the optimal sub-Gaussian mean estimator by [LV22], with the tight sub-Gaussian constant for all distributions with finite but unknown variance, and 2) the analysis of the median-of-means algorithm by [BCL13] and a lower bound by [DLLO16], characterizing the big-O optimal errors for distributions for which only a $1+\alpha$ moment exists for $\alpha \in (0,1)$. Both results, however, are optimal only in the worst case. We initiate the fine-grained study of the mean estimation problem: Can algorithms leverage useful features of the input distribution to beat the sub-Gaussian rate, without explicit knowledge of such features? We resolve this question with an unexpectedly nuanced answer: "Yes in limited regimes, but in general no". For any distribution $p$ with a finite mean, we construct a distribution $q$ whose mean is well-separated from $p$'s, yet $p$ and $q$ are not distinguishable with high probability, and $q$ further preserves $p$'s moments up to constants. The main consequence is that no reasonable estimator can asymptotically achieve better than the sub-Gaussian error rate for any distribution, matching the worst-case result of [LV22]. More generally, we introduce a new definitional framework to analyze the fine-grained optimality of algorithms, which we call "neighborhood optimality", interpolating between the unattainably strong "instance optimality" and the trivially weak "admissibility" definitions. Applying the new framework, we show that median-of-means is neighborhood optimal, up to constant factors. It is open to find a neighborhood-optimal estimator without constant factor slackness.
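For reference, a minimal sketch of the median-of-means estimator discussed above, applied to heavy-tailed data with finite mean but infinite variance; the block count and data distribution are arbitrary choices.

```python
import numpy as np

def median_of_means(x, k, rng=None):
    """Median-of-means: split the sample into k blocks, average each block,
    and return the median of the block means."""
    rng = np.random.default_rng(rng)
    blocks = np.array_split(rng.permutation(x), k)
    return np.median([b.mean() for b in blocks])

# Heavy-tailed example: Pareto data with finite mean but infinite variance.
rng = np.random.default_rng(1)
x = rng.pareto(1.5, size=100_000) + 1      # classical Pareto(a=1.5, x_m=1), true mean = 3
print("empirical mean :", x.mean())
print("median of means:", median_of_means(x, k=32, rng=0))
```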

Generative Machine Learning for Multivariate Equity Returns

  • paper_url: http://arxiv.org/abs/2311.14735
  • repo_url: None
  • paper_authors: Ruslan Tepelyan, Achintya Gopal
  • for: The paper uses modern machine learning methods to model the distribution of equity returns, specifically the joint distribution of the returns of all members of the S&P 500, i.e., a 500-dimensional joint distribution.
  • methods: The paper uses conditional importance weighted autoencoders (a variant of variational autoencoders) and conditional normalizing flows.
  • results: The resulting generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization.
    Abstract The use of machine learning to generate synthetic data has grown in popularity with the proliferation of text-to-image models and especially large language models. The core methodology these models use is to learn the distribution of the underlying data, similar to the classical methods common in finance of fitting statistical models to data. In this work, we explore the efficacy of using modern machine learning methods, specifically conditional importance weighted autoencoders (a variant of variational autoencoders) and conditional normalizing flows, for the task of modeling the returns of equities. The main problem we work to address is modeling the joint distribution of all the members of the S&P 500, or, in other words, learning a 500-dimensional joint distribution. We show that this generative model has a broad range of applications in finance, including generating realistic synthetic data, volatility and correlation estimation, risk analysis (e.g., value at risk, or VaR, of portfolios), and portfolio optimization.

Deep Learning-Based Real-Time Quality Control of Standard Video Compression for Live Streaming

  • paper_url: http://arxiv.org/abs/2311.12918
  • repo_url: None
  • paper_authors: Matin Mortaheb, Mohammad A. Amir Khojastepour, Srimat T. Chakradhar, Sennur Ulukus
  • for: Real-time quality control of standard video compression for live streaming to wireless users.
  • methods: A real-time deep learning-based controller dynamically estimates the optimal H.264 encoder parameters (specifically the quantization parameter, QP) for each video chunk, keeping the PSNR of the encoded video above a specified threshold while minimizing the average bitrate.
  • results: Experiments on the QCIF dataset and a diverse range of random videos from public datasets show improvements of up to 2.5 times in average bandwidth usage compared to state-of-the-art adaptive bitrate streaming, with a non-conformance probability below $10^{-2}$.
    Abstract Ensuring high-quality video content for wireless users has become increasingly vital. Nevertheless, maintaining a consistent level of video quality faces challenges due to the fluctuating encoded bitrate, primarily caused by dynamic video content, especially in live streaming scenarios. Video compression is typically employed to eliminate unnecessary redundancies within and between video frames, thereby reducing the required bandwidth for video transmission. The encoded bitrate and the quality of the compressed video depend on encoder parameters, specifically, the quantization parameter (QP). Poor choices of encoder parameters can result in reduced bandwidth efficiency and high likelihood of non-conformance. Non-conformance refers to the violation of the peak signal-to-noise ratio (PSNR) constraint for an encoded video segment. To address these issues, a real-time deep learning-based H.264 controller is proposed. This controller dynamically estimates the optimal encoder parameters based on the content of a video chunk with minimal delay. The objective is to maintain video quality in terms of PSNR above a specified threshold while minimizing the average bitrate of the compressed video. Experimental results, conducted on both QCIF dataset and a diverse range of random videos from public datasets, validate the effectiveness of this approach. Notably, it achieves improvements of up to 2.5 times in average bandwidth usage compared to the state-of-the-art adaptive bitrate video streaming, with a negligible non-conformance probability below $10^{-2}$.
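The sketch below only illustrates the conformance constraint: the PSNR definition and a naive feedback rule that nudges the quantization parameter (QP) up when quality has slack and down when a chunk would violate the threshold. The paper instead predicts QP with a learned controller; the threshold and QP bounds here are assumptions.

```python
import numpy as np

def psnr(original, compressed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit frames."""
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def adjust_qp(qp, measured_psnr, threshold=37.0, qp_min=18, qp_max=42):
    """Naive feedback rule: raise QP (lower bitrate) while PSNR has slack,
    lower it when the chunk would be non-conformant."""
    qp += 1 if measured_psnr > threshold + 1.0 else -2
    return int(np.clip(qp, qp_min, qp_max))

# Toy usage with random "frames" standing in for a decoded chunk.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, (144, 176), dtype=np.uint8)                 # QCIF-sized frame
decoded = np.clip(frame + rng.normal(0, 3, frame.shape), 0, 255).astype(np.uint8)
print(psnr(frame, decoded), adjust_qp(28, psnr(frame, decoded)))
```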

Neural-Integrated Meshfree (NIM) Method: A differentiable programming-based hybrid solver for computational mechanics

  • paper_url: http://arxiv.org/abs/2311.12915
  • repo_url: None
  • paper_authors: Honghui Du, QiZhi He
  • for: The study proposes a differentiable programming-based hybrid meshfree method for solving computational mechanics problems.
  • methods: The neural-integrated meshfree (NIM) method integrates traditional physics-based meshfree discretization techniques with deep learning architectures, using a hybrid NeuroPU approximation that combines continuous DNN representations with partition of unity (PU) basis functions; two truly meshfree solvers are proposed, the strong form-based S-NIM and the local variational form-based V-NIM.
  • results: Numerical experiments on stationary and transient benchmark problems show that NIM improves solution accuracy and training efficiency, with good scalability, generalizability, and convergence; V-NIM in particular enhances both accuracy and efficiency relative to other physics-informed machine learning methods.
    Abstract We present the neural-integrated meshfree (NIM) method, a differentiable programming-based hybrid meshfree approach within the field of computational mechanics. NIM seamlessly integrates traditional physics-based meshfree discretization techniques with deep learning architectures. It employs a hybrid approximation scheme, NeuroPU, to effectively represent the solution by combining continuous DNN representations with partition of unity (PU) basis functions associated with the underlying spatial discretization. This neural-numerical hybridization not only enhances the solution representation through functional space decomposition but also reduces both the size of DNN model and the need for spatial gradient computations based on automatic differentiation, leading to a significant improvement in training efficiency. Under the NIM framework, we propose two truly meshfree solvers: the strong form-based NIM (S-NIM) and the local variational form-based NIM (V-NIM). In the S-NIM solver, the strong-form governing equation is directly considered in the loss function, while the V-NIM solver employs a local Petrov-Galerkin approach that allows the construction of variational residuals based on arbitrary overlapping subdomains. This ensures both the satisfaction of underlying physics and the preservation of meshfree property. We perform extensive numerical experiments on both stationary and transient benchmark problems to assess the effectiveness of the proposed NIM methods in terms of accuracy, scalability, generalizability, and convergence properties. Moreover, comparative analysis with other physics-informed machine learning methods demonstrates that NIM, especially V-NIM, significantly enhances both accuracy and efficiency in end-to-end predictive capabilities.

Learning to Optimise Wind Farms with Graph Transformers

  • paper_url: http://arxiv.org/abs/2311.12750
  • repo_url: None
  • paper_authors: Siyi Li, Arnaud Robert, A. Aldo Faisal, Matthew D. Piggott
  • for: A data-driven model that accurately predicts the power generation of all wind turbines in wind farms of arbitrary layout, yaw angle configuration, and wind conditions.
  • methods: The wind farm is encoded as a fully-connected graph and processed with a graph transformer.
  • results: The resulting surrogate model generalizes well, uncovers latent structural patterns in the graph representation, and can be used with genetic algorithms to optimize yaw angle configurations, reaching accuracy similar to industrially-standard wind farm simulation tools at a fraction of the computational cost.
    Abstract This work proposes a novel data-driven model capable of providing accurate predictions for the power generation of all wind turbines in wind farms of arbitrary layout, yaw angle configurations and wind conditions. The proposed model functions by encoding a wind farm into a fully-connected graph and processing the graph representation through a graph transformer. The graph transformer surrogate is shown to generalise well and is able to uncover latent structural patterns within the graph representation of wind farms. It is demonstrated how the resulting surrogate model can be used to optimise yaw angle configurations using genetic algorithms, achieving similar levels of accuracy to industrially-standard wind farm simulation tools while only taking a fraction of the computational cost.

Exploring Graph Classification Techniques Under Low Data Constraints: A Comprehensive Study

  • paper_url: http://arxiv.org/abs/2311.12737
  • repo_url: None
  • paper_authors: Kush Kothari, Bhavya Mehta, Reshmika Nambiar, Seema Shrawne
  • for: The survey provides a brief overview of recent research on graph data augmentation and few-shot learning.
  • methods: It covers various graph data augmentation techniques, including node and edge perturbation, graph coarsening, and graph generation, as well as recent few-shot learning developments such as meta-learning and model-agnostic meta-learning.
  • results: Graph augmentation techniques are surveyed under rule-based and learning-based approaches, while few-shot learning on graphs is studied in terms of metric-learning and optimization-based techniques. Overall, the paper provides an extensive array of techniques for graph processing problems in low-data scenarios.
    Abstract This survey paper presents a brief overview of recent research on graph data augmentation and few-shot learning. It covers various techniques for graph data augmentation, including node and edge perturbation, graph coarsening, and graph generation, as well as the latest developments in few-shot learning, such as meta-learning and model-agnostic meta-learning. The paper explores these areas in depth and delves into further sub classifications. Rule based approaches and learning based approaches are surveyed under graph augmentation techniques. Few-Shot Learning on graphs is also studied in terms of metric learning techniques and optimization-based techniques. In all, this paper provides an extensive array of techniques that can be employed in solving graph processing problems faced in low-data scenarios.
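As a small example of the rule-based augmentations surveyed, the sketch below perturbs a graph by randomly dropping and adding edges in a symmetric adjacency matrix; the probabilities are arbitrary.

```python
import numpy as np

def perturb_edges(adj, drop_prob=0.1, add_prob=0.01, rng=None):
    """Rule-based graph augmentation: randomly drop existing edges and add new
    ones in a symmetric adjacency matrix (no self-loops)."""
    rng = np.random.default_rng(rng)
    n = adj.shape[0]
    upper = np.triu(adj, k=1)                               # work on i < j, then mirror
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    drop = mask & (upper == 1) & (rng.random((n, n)) < drop_prob)
    add = mask & (upper == 0) & (rng.random((n, n)) < add_prob)
    new_upper = np.where(drop, 0, np.where(add, 1, upper))
    return new_upper + new_upper.T

# Toy symmetric graph and one augmented view of it.
adj = (np.random.default_rng(0).random((8, 8)) < 0.3).astype(int)
adj = np.triu(adj, 1); adj = adj + adj.T
aug = perturb_edges(adj, rng=1)
```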

Non-Sequential Ensemble Kalman Filtering using Distributed Arrays

  • paper_url: http://arxiv.org/abs/2311.12909
  • repo_url: None
  • paper_authors: Cédric Travelletti, Jörg Franke, David Ginsbourger, Stefan Brönnimann
  • for: The paper proposes a new distributed implementation of the Ensemble Kalman Filter (EnKF) that allows non-sequential assimilation of large datasets in high-dimensional problems.
  • methods: The implementation leverages recent advances in distributed computing to construct and use the full model error covariance matrix in distributed memory, enabling single-batch assimilation of all observations and eliminating dependency on observation ordering.
  • results: Comparative performance assessments, on both synthetic and real-world paleoclimatic reconstruction applications, show that the new non-sequential implementation outperforms the traditional sequential one.
    Abstract This work introduces a new, distributed implementation of the Ensemble Kalman Filter (EnKF) that allows for non-sequential assimilation of large datasets in high-dimensional problems. The traditional EnKF algorithm is computationally intensive and exhibits difficulties in applications requiring interaction with the background covariance matrix, prompting the use of methods like sequential assimilation which can introduce unwanted consequences, such as dependency on observation ordering. Our implementation leverages recent advancements in distributed computing to enable the construction and use of the full model error covariance matrix in distributed memory, allowing for single-batch assimilation of all observations and eliminating order dependencies. Comparative performance assessments, involving both synthetic and real-world paleoclimatic reconstruction applications, indicate that the new, non-sequential implementation outperforms the traditional, sequential one.
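For background, a single-node, textbook stochastic EnKF analysis step with perturbed observations is sketched below; the paper's contribution is a distributed, non-sequential implementation of this kind of update using a covariance matrix held in distributed memory, which the sketch does not attempt.

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng=None):
    """One stochastic EnKF analysis step with perturbed observations.

    X : (n_state, n_ens) forecast ensemble
    y : (n_obs,)         observation vector
    H : (n_obs, n_state) linear observation operator
    R : (n_obs, n_obs)   observation error covariance
    """
    rng = np.random.default_rng(rng)
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)               # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                            # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)         # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)                           # analysis ensemble

# Toy example: 50-dimensional state, 10 observed components, 30 members.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))
H = np.eye(10, 50)
R = 0.1 * np.eye(10)
y = rng.normal(size=10)
Xa = enkf_analysis(X, y, H, R, rng=1)
```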

Attacks of fairness in Federated Learning

  • paper_url: http://arxiv.org/abs/2311.12715
  • repo_url: https://github.com/slkdfjslkjfd/fl_fairness_attacks
  • paper_authors: Joseph Rance, Filip Svoboda
  • for: The study examines attacks on the fairness of models trained with Federated Learning, building on the known result that controlling only a small subset of clients can introduce a backdoor.
  • methods: Using a threat model similar to that of a backdoor attack, the authors show that an attacker can influence the aggregated model to have an unfair performance distribution between any given set of attributes.
  • results: The attack is shown to be possible while controlling only a single client, and the authors argue that defending against attacks on fairness should be a critical consideration whenever unfairness in a trained model could benefit a participant in its training.
    Abstract Federated Learning is an important emerging distributed training paradigm that keeps data private on clients. It is now well understood that by controlling only a small subset of FL clients, it is possible to introduce a backdoor to a federated learning model, in the presence of certain attributes. In this paper, we present a new type of attack that compromises the fairness of the trained model. Fairness is understood to be the attribute-level performance distribution of a trained model. It is particularly salient in domains where, for example, skewed accuracy discrimination between subpopulations could have disastrous consequences. We find that by employing a threat model similar to that of a backdoor attack, an attacker is able to influence the aggregated model to have an unfair performance distribution between any given set of attributes. Furthermore, we find that this attack is possible by controlling only a single client. While combating naturally induced unfairness in FL has previously been discussed in depth, its artificially induced kind has been neglected. We show that defending against attacks on fairness should be a critical consideration in any situation where unfairness in a trained model could benefit a user who participated in its training.

Regression-Based Analysis of Multimodal Single-Cell Data Integration Strategies

  • paper_url: http://arxiv.org/abs/2311.12711
  • repo_url: None
  • paper_authors: Bhavya Mehta, Nirmit Deliwala, Madhav Chandane
  • for: The paper addresses the computational and analytical challenges of integrating multimodal single-cell data and modeling the interrelationships between modalities, which matter for disease biomarker detection and drug discovery.
  • methods: Distinct machine learning techniques, including echo state networks, are used to model the co-variation from DNA to RNA and finally to surface proteins in single cells during hematopoietic stem cell development.
  • results: On a curated subset of a 300,000-cell time course dataset, echo state networks achieve state-of-the-art correlation scores of 0.94 and 0.895 on the Multi-omic and CiteSeq datasets, findings that may advance understanding of cellular differentiation and function.
    Abstract Multimodal single-cell technologies enable the simultaneous collection of diverse data types from individual cells, enhancing our understanding of cellular states. However, the integration of these datatypes and modeling the interrelationships between modalities presents substantial computational and analytical challenges in disease biomarker detection and drug discovery. Established practices rely on isolated methodologies to investigate individual molecular aspects separately, often resulting in inaccurate analyses. To address these obstacles, distinct Machine Learning Techniques are leveraged, each of its own kind to model the co-variation of DNA to RNA, and finally to surface proteins in single cells during hematopoietic stem cell development, which simplifies understanding of underlying cellular mechanisms and immune responses. Experiments conducted on a curated subset of a 300,000-cell time course dataset, highlights the exceptional performance of Echo State Networks, boasting a remarkable state-of-the-art correlation score of 0.94 and 0.895 on Multi-omic and CiteSeq datasets. Beyond the confines of this study, these findings hold promise for advancing comprehension of cellular differentiation and function, leveraging the potential of Machine Learning.

On the Out-of-Distribution Coverage of Combining Split Conformal Prediction and Bayesian Deep Learning

  • paper_url: http://arxiv.org/abs/2311.12688
  • repo_url: None
  • paper_authors: Paul Scemama, Ariel Kapusta
  • for: The paper studies combining Bayesian deep learning with conformal prediction, two methods used to convey uncertainty and increase safety in machine learning systems.
  • methods: The paper combines split conformal prediction with neural networks trained with (i) stochastic gradient descent, (ii) deep ensembles, and (iii) mean-field variational inference, and studies the effect on out-of-distribution coverage in multiclass image classification.
  • results: If the model is generally underconfident on the calibration set, the resulting conformal sets may exhibit worse out-of-distribution coverage than simple predictive credible sets; conversely, if the model is overconfident on the calibration set, conformal prediction may improve out-of-distribution coverage. Combining Bayesian deep learning with split conformal prediction can therefore have unintended consequences such as reduced out-of-distribution coverage.
    Abstract Bayesian deep learning and conformal prediction are two methods that have been used to convey uncertainty and increase safety in machine learning systems. We focus on combining Bayesian deep learning with split conformal prediction and how this combination effects out-of-distribution coverage; particularly in the case of multiclass image classification. We suggest that if the model is generally underconfident on the calibration set, then the resultant conformal sets may exhibit worse out-of-distribution coverage compared to simple predictive credible sets. Conversely, if the model is overconfident on the calibration set, the use of conformal prediction may improve out-of-distribution coverage. We evaluate prediction sets as a result of combining split conformal methods and neural networks trained with (i) stochastic gradient descent, (ii) deep ensembles, and (iii) mean-field variational inference. Our results suggest that combining Bayesian deep learning models with split conformal prediction can, in some cases, cause unintended consequences such as reducing out-of-distribution coverage.
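For reference, a minimal sketch of split conformal prediction for classification using the simple 1 - p(true class) score: the calibration set yields a threshold, and test-time prediction sets include every class below it. The Dirichlet "probabilities" stand in for any classifier, Bayesian or not; the paper's exact score and models may differ.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration with the score s = 1 - p_model(true class).

    cal_probs : (n, K) predicted class probabilities on a held-out calibration set
    cal_labels: (n,)   true labels for the calibration set
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    q = np.ceil((n + 1) * (1 - alpha)) / n                 # finite-sample corrected level
    return np.quantile(scores, min(q, 1.0), method="higher")

def prediction_set(test_probs, qhat):
    """Include every class whose score 1 - p is at most the calibrated threshold."""
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]

# Toy usage with random softmax outputs standing in for a classifier.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=500)
cal_labels = rng.integers(0, 5, size=500)
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
sets = prediction_set(rng.dirichlet(np.ones(5), size=3), qhat)
```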

Managing ML-Based Application Non-Functional Behavior: A Multi-Model Approach

  • paper_url: http://arxiv.org/abs/2311.12686
  • repo_url: None
  • paper_authors: Marco Anisetti, Claudio A. Ardagna, Nicola Bena, Ernesto Damiani, Paolo G. Panero
  • for: Guaranteeing stable non-functional behavior (e.g., privacy, confidentiality, fairness, explainability) of ML-based applications over time and across model changes, throughout the application life cycle from design to operation.
  • methods: A multi-model approach built on dynamic classifier selection: multiple ML models with similar non-functional properties are made available to the application, and one is selected over time according to dynamic and unpredictable contextual changes. The approach is a two-step process operating at run time, where model assessment verifies the non-functional properties of models trained and selected at development time, and model substitution guarantees continuous and stable support of those properties.
  • results: The solution is experimentally evaluated in a real-world scenario focusing on the non-functional property of fairness.
    Abstract Modern applications are increasingly driven by Machine Learning (ML) models whose non-deterministic behavior is affecting the entire application life cycle from design to operation. The pervasive adoption of ML is urgently calling for approaches that guarantee a stable non-functional behavior of ML-based applications over time and across model changes. To this aim, non-functional properties of ML models, such as privacy, confidentiality, fairness, and explainability, must be monitored, verified, and maintained. This need is even more pressing when modern applications operate in the edge-cloud continuum, increasing their complexity and dynamicity. Existing approaches mostly focus on i) implementing classifier selection solutions according to the functional behavior of ML models, ii) finding new algorithmic solutions to this need, such as continuous re-training. In this paper, we propose a multi-model approach built on dynamic classifier selection, where multiple ML models showing similar non-functional properties are made available to the application and one model is selected over time according to (dynamic and unpredictable) contextual changes. Our solution goes beyond the state of the art by providing an architectural and methodological approach that continuously guarantees a stable non-functional behavior of ML-based applications, is applicable to different ML models, and is driven by non-functional properties assessed on the models themselves. It consists of a two-step process working during application operation, where model assessment verifies non-functional properties of ML models trained and selected at development time, and model substitution guarantees a continuous and stable support of non-functional properties. We experimentally evaluate our solution in a real-world scenario focusing on non-functional property fairness.

Adversarial Reweighting Guided by Wasserstein Distance for Bias Mitigation

  • paper_url: http://arxiv.org/abs/2311.12684
  • repo_url: None
  • paper_authors: Xuan Zhao, Simone Fabbrizzi, Paula Reyero Lobo, Siamak Ghodsi, Klaus Broelemann, Steffen Staab, Gjergji Kasneci
  • for: The work addresses the unequal representation of different groups in a sample population, which can lead machine learning models to discriminate against minority groups when making automated decisions.
  • methods: A novel adversarial reweighting method balances the data distribution between majority and minority groups by de-emphasizing majority samples, preferring majority samples that are close to the minority group as evaluated by the Wasserstein distance.
  • results: Experiments show that the approach mitigates representation bias without sacrificing classification accuracy, outperforming related state-of-the-art methods on image and tabular benchmark datasets.
    Abstract The unequal representation of different groups in a sample population can lead to discrimination of minority groups when machine learning models make automated decisions. To address these issues, fairness-aware machine learning jointly optimizes two (or more) metrics aiming at predictive effectiveness and low unfairness. However, the inherent under-representation of minorities in the data makes the disparate treatment of subpopulations less noticeable and difficult to deal with during learning. In this paper, we propose a novel adversarial reweighting method to address such \emph{representation bias}. To balance the data distribution between the majority and the minority groups, our approach deemphasizes samples from the majority group. To minimize empirical risk, our method prefers samples from the majority group that are close to the minority group as evaluated by the Wasserstein distance. Our theoretical analysis shows the effectiveness of our adversarial reweighting approach. Experiments demonstrate that our approach mitigates bias without sacrificing classification accuracy, outperforming related state-of-the-art methods on image and tabular benchmark datasets.
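As a rough illustration (not the paper's adversarial scheme), the sketch below measures the gap between majority and minority score distributions with the 1-D Wasserstein distance and down-weights majority samples far from the minority group with a simple monotone heuristic; the distributions and the weighting rule are assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
scores_majority = rng.normal(0.0, 1.0, 1000)     # e.g. 1-D model scores or projected features
scores_minority = rng.normal(0.7, 1.2, 120)

# 1-D Wasserstein distance between the two group distributions, used here as a bias measure.
print("W1 distance:", wasserstein_distance(scores_majority, scores_minority))

# Heuristic reweighting: majority samples close to the minority group get larger weights.
d_to_minority = np.abs(scores_majority[:, None] - scores_minority[None, :]).min(axis=1)
weights = np.exp(-d_to_minority)
weights /= weights.sum()
```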

Interpretation of the Transformer and Improvement of the Extractor

  • paper_url: http://arxiv.org/abs/2311.12678
  • repo_url: None
  • paper_authors: Zhe Chen
  • for: The paper aims to provide a deeper understanding and comprehensive interpretation of the Transformer architecture as a basis for improving it.
  • methods: The authors interpret the Transformer architecture comprehensively in plain words based on their understanding and experience, and prove and verify these interpretations, which also cover the Extractor, a family of drop-in replacements for multi-head self-attention. They then propose an improvement to one type of Extractor that outperforms self-attention without introducing additional trainable parameters.
  • results: Experimental results show that the improved Extractor performs even better than self-attention without additional trainable parameters, pointing to a way of improving the Transformer architecture.
    Abstract It has been over six years since the Transformer architecture was put forward. Surprisingly, the vanilla Transformer architecture is still widely used today. One reason is that the lack of deep understanding and comprehensive interpretation of the Transformer architecture makes it more challenging to improve the Transformer architecture. In this paper, we first interpret the Transformer architecture comprehensively in plain words based on our understanding and experiences. The interpretations are further proved and verified. These interpretations also cover the Extractor, a family of drop-in replacements for the multi-head self-attention in the Transformer architecture. Then, we propose an improvement on a type of the Extractor that outperforms the self-attention, without introducing additional trainable parameters. Experimental results demonstrate that the improved Extractor performs even better, showing a way to improve the Transformer architecture.

Contrastive Left-Right Wearable Sensors (IMUs) Consistency Matching for HAR

  • paper_url: http://arxiv.org/abs/2311.12674
  • repo_url: None
  • paper_authors: Dominique Nshimyimana, Vitor Fortes Rey, Paul Lukowic
  • for: The paper shows how real data can be used for self-supervised learning without any transformations, by exploiting the symmetry present in the activities.
  • methods: The approach performs contrastive matching of two different sensors (left- and right-wrist or leg-worn IMUs), making representations of co-occurring sensor data more similar and those of non-co-occurring sensor data more different.
  • results: On MM-Fit the method shows significant improvement over both the supervised baseline and the self-supervised method SimCLR; on Opportunity it shows significant improvement over the supervised baseline and slight improvement over SimCLR. The method also improves supervised baselines even when only a small amount of data is used for training.
    Abstract Machine learning algorithms are improving rapidly, but annotating training data remains a bottleneck for many applications. In this paper, we show how real data can be used for self-supervised learning without any transformations by taking advantage of the symmetry present in the activities. Our approach involves contrastive matching of two different sensors (left and right wrist or leg-worn IMUs) to make representations of co-occurring sensor data more similar and those of non-co-occurring sensor data more different. We test our approach on the Opportunity and MM-Fit datasets. In MM-Fit we show significant improvement over the baseline supervised and self-supervised method SimCLR, while for Opportunity there is significant improvement over the supervised baseline and slight improvement when compared to SimCLR. Moreover, our method improves supervised baselines even when using only a small amount of the data for training. Future work should explore under which conditions our method is beneficial for human activity recognition systems and other related applications.
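A minimal sketch of the left/right contrastive idea: co-occurring windows from the two IMUs are positives, all other pairs in the batch are negatives, trained with an InfoNCE-style loss. The encoder, window shape, and temperature are assumptions, not the paper's architecture.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_left, z_right, temperature=0.1):
    """InfoNCE-style loss: window i from the left sensor should match window i
    from the right sensor (co-occurring), and differ from all other windows."""
    z_left = F.normalize(z_left, dim=1)
    z_right = F.normalize(z_right, dim=1)
    logits = z_left @ z_right.T / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(z_left.size(0))             # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

# Toy usage: a tiny shared encoder over IMU windows of shape (channels=6, time=128).
encoder = torch.nn.Sequential(
    torch.nn.Conv1d(6, 32, kernel_size=5, padding=2), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool1d(1), torch.nn.Flatten(), torch.nn.Linear(32, 64),
)
left = torch.randn(16, 6, 128)    # simultaneous windows from the left-worn IMU
right = torch.randn(16, 6, 128)   # ... and from the right-worn IMU
loss = contrastive_loss(encoder(left), encoder(right))
```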

Towards a more inductive world for drug repurposing approaches

  • paper_url: http://arxiv.org/abs/2311.12670
  • repo_url: https://github.com/ubioinformat/graphemb
  • paper_authors: Jesus de la Fuente, Guillermo Serrano, Uxía Veleiro, Mikel Casals, Laura Vera, Marija Pizurica, Antonio Pineda-Lucena, Idoia Ochoa, Silve Vicent, Olivier Gevaert, Mikel Hernaez
  • for: Drug-target interaction (DTI) prediction, aimed at reducing the cost and time commitment of drug repurposing.
  • methods: Learning on graph models, with a robust benchmarking of current DTI datasets and prediction models and a novel biologically-driven strategy for negative edge subsampling.
  • results: The benchmarking shows that DTI prediction methods based on transductive models lack generalization and yield inflated performance when evaluated as previously done in the literature, making them unsuited for drug repurposing; the newly proposed negative edge subsampling strategy is validated in vitro, confirming that newly discovered interactions are true. All generated resources and tools are publicly available as a Python package.
    Abstract Drug-target interaction (DTI) prediction is a challenging, albeit essential task in drug repurposing. Learning on graph models have drawn special attention as they can significantly reduce drug repurposing costs and time commitment. However, many current approaches require high-demanding additional information besides DTIs that complicates their evaluation process and usability. Additionally, structural differences in the learning architecture of current models hinder their fair benchmarking. In this work, we first perform an in-depth evaluation of current DTI datasets and prediction models through a robust benchmarking process, and show that DTI prediction methods based on transductive models lack generalization and lead to inflated performance when evaluated as previously done in the literature, hence not being suited for drug repurposing approaches. We then propose a novel biologically-driven strategy for negative edge subsampling and show through in vitro validation that newly discovered interactions are indeed true. We envision this work as the underpinning for future fair benchmarking and robust model design. All generated resources and tools are publicly available as a python package.

SSVEP-DAN: A Data Alignment Network for SSVEP-based Brain Computer Interfaces

  • paper_url: http://arxiv.org/abs/2311.12666
  • repo_url: https://github.com/cecnl/ssvep-dan
  • paper_authors: Sung-Yu Chen, Chi-Min Chang, Kuan-Jung Chiang, Chun-Shu Wei
  • for: The paper addresses the challenge of data insufficiency in steady-state visual-evoked potential (SSVEP)-based brain-computer interfaces (BCIs) by proposing a dedicated neural network model called SSVEP-DAN.
  • methods: SSVEP-DAN is designed to align SSVEP data across different domains, including various sessions, subjects, or devices, transforming existing source SSVEP data into supplementary calibration data.
  • results: Experiments across multiple cross-domain scenarios demonstrate that SSVEP-DAN significantly enhances SSVEP decoding accuracy when calibration data is limited, suggesting it can be a catalyst for practical SSVEP-based BCI applications with minimal calibration.
    Abstract Steady-state visual-evoked potential (SSVEP)-based brain-computer interfaces (BCIs) offer a non-invasive means of communication through high-speed speller systems. However, their efficiency heavily relies on individual training data obtained during time-consuming calibration sessions. To address the challenge of data insufficiency in SSVEP-based BCIs, we present SSVEP-DAN, the first dedicated neural network model designed for aligning SSVEP data across different domains, which can encompass various sessions, subjects, or devices. Our experimental results across multiple cross-domain scenarios demonstrate SSVEP-DAN's capability to transform existing source SSVEP data into supplementary calibration data, significantly enhancing SSVEP decoding accuracy in scenarios with limited calibration data. We envision SSVEP-DAN as a catalyst for practical SSVEP-based BCI applications with minimal calibration. The source codes in this work are available at: https://github.com/CECNL/SSVEP-DAN.

Carbohydrate NMR chemical shift predictions using E(3) equivariant graph neural networks

  • paper_url: http://arxiv.org/abs/2311.12657
  • repo_url: https://github.com/mariabankestad/geqshift
  • paper_authors: Maria Bånkestad, Keven M. Dorst, Göran Widmalm, Jerk Rönnols
  • for: Understanding the molecular structure of carbohydrates and their characterization by NMR spectroscopy.
  • methods: Uses E(3) equivariant graph neural networks to predict carbohydrate NMR spectra.
  • results: The new approach reduces mean absolute error by up to threefold compared to traditional methods based on two-dimensional molecular structure, and shows strong robustness and generalization even with limited data.
    Abstract Carbohydrates, vital components of biological systems, are well-known for their structural diversity. Nuclear Magnetic Resonance (NMR) spectroscopy plays a crucial role in understanding their intricate molecular arrangements and is essential in assessing and verifying the molecular structure of organic molecules. An important part of this process is to predict the NMR chemical shift from the molecular structure. This work introduces a novel approach that leverages E(3) equivariant graph neural networks to predict carbohydrate NMR spectra. Notably, our model achieves a substantial reduction in mean absolute error, up to threefold, compared to traditional models that rely solely on two-dimensional molecular structure. Even with limited data, the model excels, highlighting its robustness and generalization capabilities. The implications are far-reaching and go beyond an advanced understanding of carbohydrate structures and spectral interpretation. For example, it could accelerate research in pharmaceutical applications, biochemistry, and structural biology, offering a faster and more reliable analysis of molecular structures. Furthermore, our approach is a key step towards a new data-driven era in spectroscopy, potentially influencing spectroscopic techniques beyond NMR.

FedDRO: Federated Compositional Optimization for Distributionally Robust Learning

  • paper_url: http://arxiv.org/abs/2311.12652
  • repo_url: None
  • paper_authors: Prashant Khanduri, Chengyin Li, Rafi Ibn Sultan, Yao Qiang, Joerg Kliewer, Dongxiao Zhu
  • For: The paper is written to address the challenges of solving compositional optimization (CO) problems in the federated learning (FL) setting, where large-scale and distributed data is available.
  • Methods: The paper proposes efficient FedAvg-type algorithms for solving non-convex CO problems in the FL setting, utilizing the DRO problem structure to design a communication strategy that controls the bias in the estimation of the compositional gradient.
  • Results: The paper achieves $\mathcal{O}(\epsilon^{-2})$ sample and $\mathcal{O}(\epsilon^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients, and corroborates the theoretical findings with empirical studies on large-scale DRO problems.
    Abstract Recently, compositional optimization (CO) has gained popularity because of its applications in distributionally robust optimization (DRO) and many other machine learning problems. Large-scale and distributed availability of data demands the development of efficient federated learning (FL) algorithms for solving CO problems. Developing FL algorithms for CO is particularly challenging because of the compositional nature of the objective. Moreover, current state-of-the-art methods to solve such problems rely on large batch gradients (depending on the solution accuracy) not feasible for most practical settings. To address these challenges, in this work, we propose efficient FedAvg-type algorithms for solving non-convex CO in the FL setting. We first establish that vanilla FedAvg is not suitable to solve distributed CO problems because of the data heterogeneity in the compositional objective at each client which leads to the amplification of bias in the local compositional gradient estimates. To this end, we propose a novel FL framework FedDRO that utilizes the DRO problem structure to design a communication strategy that allows FedAvg to control the bias in the estimation of the compositional gradient. A key novelty of our work is to develop solution accuracy-independent algorithms that do not require large batch gradients (and function evaluations) for solving federated CO problems. We establish $\mathcal{O}(\epsilon^{-2})$ sample and $\mathcal{O}(\epsilon^{-3/2})$ communication complexity in the FL setting while achieving linear speedup with the number of clients. We corroborate our theoretical findings with empirical studies on large-scale DRO problems.
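For context, the sketch below runs plain FedAvg on a toy least-squares problem with heterogeneous clients; this is the vanilla baseline the paper argues is unsuitable for compositional objectives, and none of FedDRO's bias-controlling communication strategy is reproduced. All data and hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_clients, dim, local_steps, lr = 5, 3, 10, 0.05

# Heterogeneous client data: each client holds its own least-squares problem (A_i, b_i).
clients = [(rng.normal(size=(20, dim)), rng.normal(size=20)) for _ in range(n_clients)]

def local_update(w, A, b):
    """A few local gradient steps on the client objective 0.5 * ||A w - b||^2 / n."""
    for _ in range(local_steps):
        w = w - lr * A.T @ (A @ w - b) / len(b)
    return w

w_global = np.zeros(dim)
for _ in range(20):                                   # communication rounds
    local_models = [local_update(w_global.copy(), A, b) for A, b in clients]
    w_global = np.mean(local_models, axis=0)          # FedAvg: average the client models

avg_loss = np.mean([0.5 * np.mean((A @ w_global - b) ** 2) for A, b in clients])
print("average client loss after FedAvg:", round(float(avg_loss), 3))
```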

Careful Selection and Thoughtful Discarding: Graph Explicit Pooling Utilizing Discarded Nodes

  • paper_url: http://arxiv.org/abs/2311.12644
  • repo_url: None
  • paper_authors: Chuang Liu, Wenhang Yu, Kuang Gao, Xueqi Ma, Yibing Zhan, Jia Wu, Bo Du, Wenbin Hu
  • for: Improving graph representation learning in graph neural networks (GNNs) to boost performance on graph classification tasks.
  • methods: Proposes Graph Explicit Pooling (GrePool), which selects nodes by explicitly leveraging their influence on the final representation vector rather than relying solely on additional graph convolutional networks or multilayer perceptrons, and an extended version (GrePool+) that applies a uniform loss to the discarded nodes to improve classification accuracy during training.
  • results: Extensive experiments on 12 widely used datasets, including the Open Graph Benchmark datasets, show that GrePool outperforms 14 baseline methods on most datasets, and GrePool+ further improves performance without additional computational cost.
    Abstract Graph pooling has been increasingly recognized as crucial for Graph Neural Networks (GNNs) to facilitate hierarchical graph representation learning. Existing graph pooling methods commonly consist of two stages: selecting top-ranked nodes and discarding the remaining to construct coarsened graph representations. However, this paper highlights two key issues with these methods: 1) The process of selecting nodes to discard frequently employs additional Graph Convolutional Networks or Multilayer Perceptrons, lacking a thorough evaluation of each node's impact on the final graph representation and subsequent prediction tasks. 2) Current graph pooling methods tend to directly discard the noise segment (dropped) of the graph without accounting for the latent information contained within these elements. To address the first issue, we introduce a novel Graph Explicit Pooling (GrePool) method, which selects nodes by explicitly leveraging the relationships between the nodes and final representation vectors crucial for classification. The second issue is addressed using an extended version of GrePool (i.e., GrePool+), which applies a uniform loss on the discarded nodes. This addition is designed to augment the training process and improve classification accuracy. Furthermore, we conduct comprehensive experiments across 12 widely used datasets to validate our proposed method's effectiveness, including the Open Graph Benchmark datasets. Our experimental results uniformly demonstrate that GrePool outperforms 14 baseline methods for most datasets. Likewise, implementing GrePool+ enhances GrePool's performance without incurring additional computational costs.
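A rough numpy illustration of the two ideas: score nodes by their contribution to a graph-level readout, keep the top-k as the coarsened graph, and apply a uniform-target cross-entropy loss to the classifier outputs of the discarded nodes. The scoring rule, shapes, and classifier are assumptions for illustration, not the authors' GrePool/GrePool+ implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, n_classes, k = 8, 4, 3, 3

H = rng.normal(size=(n_nodes, dim))        # node embeddings after message passing (assumed)
W_cls = rng.normal(size=(dim, n_classes))  # shared node-level classifier weights (assumed)

# Score each node by its alignment with the mean readout vector.
readout = H.mean(axis=0)
scores = H @ readout
keep = np.argsort(scores)[-k:]                  # top-k nodes form the coarsened graph
drop = np.setdiff1d(np.arange(n_nodes), keep)   # discarded nodes are kept track of, not thrown away

pooled = H[keep].mean(axis=0)                   # coarsened graph-level representation

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Uniform loss on discarded nodes: cross-entropy of their class distribution
# against the uniform target 1/n_classes (the GrePool+ idea, in spirit).
p_drop = softmax(H[drop] @ W_cls)
uniform_loss = -np.mean(np.sum(np.log(p_drop + 1e-12) / n_classes, axis=1))

print("kept nodes:", keep, "| pooled representation shape:", pooled.shape)
print("uniform loss on discarded nodes:", round(float(uniform_loss), 3))
```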

Hierarchical Joint Graph Learning and Multivariate Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2311.12630
  • repo_url: None
  • paper_authors: Juhyeon Kim, Hyungeun Lee, Seungwon Yu, Ung Hwang, Wooyul Jung, Miseon Park, Kijung Yoon
  • for: Modeling multivariate time series to better forecast long-term trends.
  • methods: Represents the signals as nodes in a graph and uses graph neural networks (GNNs) with attention mechanisms, together with hierarchical signal decompositions, to learn the interdependencies within the time series data.
  • results: Experiments show an average 23% reduction in mean squared error (MSE) compared to existing models on long-term forecasting benchmarks.
    Abstract Multivariate time series is prevalent in many scientific and industrial domains. Modeling multivariate signals is challenging due to their long-range temporal dependencies and intricate interactions--both direct and indirect. To confront these complexities, we introduce a method of representing multivariate signals as nodes in a graph with edges indicating interdependency between them. Specifically, we leverage graph neural networks (GNN) and attention mechanisms to efficiently learn the underlying relationships within the time series data. Moreover, we suggest employing hierarchical signal decompositions running over the graphs to capture multiple spatial dependencies. The effectiveness of our proposed model is evaluated across various real-world benchmark datasets designed for long-term forecasting tasks. The results consistently showcase the superiority of our model, achieving an average 23\% reduction in mean squared error (MSE) compared to existing models.
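As a minimal picture of "series as graph nodes", the sketch below builds an adjacency matrix from pairwise correlations, performs one graph-aggregation step, and fits a per-node linear autoregression for one-step forecasting. It is an assumed toy pipeline, not the paper's hierarchical GNN-with-attention model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_series, T, lag = 4, 200, 5

# Hypothetical multivariate series sharing a common trend plus individual noise.
base = np.cumsum(rng.normal(size=T))
X = np.stack([base + 0.5 * rng.normal(size=T) for _ in range(n_series)])

# Graph over the series: edges weighted by (clipped) correlation, row-normalised.
A = np.clip(np.corrcoef(X), 0.0, None)
A = A / A.sum(axis=1, keepdims=True)
X_agg = A @ X                       # one graph-aggregation (neighbour-smoothing) step

# Per-node linear AR(lag) model on the aggregated signal, one-step-ahead forecast.
forecasts = []
for i in range(n_series):
    s = X_agg[i]
    windows = np.stack([s[t - lag:t] for t in range(lag, T)])
    coef, *_ = np.linalg.lstsq(windows, s[lag:], rcond=None)
    forecasts.append(s[-lag:] @ coef)
print("one-step forecasts:", np.round(forecasts, 3))
```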

Bridging Algorithmic Information Theory and Machine Learning: A New Approach to Kernel Learning

  • paper_url: http://arxiv.org/abs/2311.12624
  • repo_url: None
  • paper_authors: Boumediene Hamzi, Marcus Hutter, Houman Owhadi
  • for: Exploring how machine learning (ML) and algorithmic information theory (AIT) view complexity.
  • methods: Adopts an AIT perspective on the problem of learning kernels from data, in particular the method of Sparse Kernel Flows in kernel ridge regression.
  • results: Shows that Sparse Kernel Flows is the natural approach for learning kernels from data, without needing the statistical route for its derivation, by working directly with code lengths and complexities from AIT and examining the similarities and differences between Minimal Description Length (MDL) and regularization in machine learning (RML).
    Abstract Machine Learning (ML) and Algorithmic Information Theory (AIT) look at Complexity from different points of view. We explore the interface between AIT and Kernel Methods (that are prevalent in ML) by adopting an AIT perspective on the problem of learning kernels from data, in kernel ridge regression, through the method of Sparse Kernel Flows. In particular, by looking at the differences and commonalities between Minimal Description Length (MDL) and Regularization in Machine Learning (RML), we prove that the method of Sparse Kernel Flows is the natural approach to adopt to learn kernels from data. This paper shows that it is not necessary to use the statistical route to derive Sparse Kernel Flows and that one can directly work with code-lengths and complexities that are concepts that show up in AIT.
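Since the paper works in the setting of kernel ridge regression, the sketch below shows that base estimator with a fixed Gaussian (RBF) kernel. The Sparse Kernel Flows / MDL machinery that actually learns the kernel is not reproduced, and the lengthscale and regularisation strength are arbitrary.

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.5):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * lengthscale ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)

lam = 1e-2                                                            # ridge strength (arbitrary)
alpha = np.linalg.solve(rbf_kernel(X, X) + lam * np.eye(len(X)), y)   # (K + lam I)^{-1} y

X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
print("predictions :", np.round(rbf_kernel(X_test, X) @ alpha, 3))
print("ground truth:", np.round(np.sin(X_test[:, 0]), 3))
```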

Koopman Learning with Episodic Memory

  • paper_url: http://arxiv.org/abs/2311.12615
  • repo_url: None
  • paper_authors: William T. Redman, Dean Huang, Maria Fonoberova, Igor Mezić
  • for: Learning to forecast non-stationary time series.
  • methods: Koopman operator methods equipped with an episodic memory mechanism.
  • results: Significant improvements in prediction performance on synthetic and real-world data.
    Abstract Koopman operator theory, a data-driven dynamical systems framework, has found significant success in learning models from complex, real-world data sets, enabling state-of-the-art prediction and control. The greater interpretability and lower computational costs of these models, compared to traditional machine learning methodologies, make Koopman learning an especially appealing approach. Despite this, little work has been performed on endowing Koopman learning with the ability to learn from its own mistakes. To address this, we equip Koopman methods - developed for predicting non-stationary time-series - with an episodic memory mechanism, enabling global recall of (or attention to) periods in time where similar dynamics previously occurred. We find that a basic implementation of Koopman learning with episodic memory leads to significant improvements in prediction on synthetic and real-world data. Our framework has considerable potential for expansion, allowing for future advances, and opens exciting new directions for Koopman learning.
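The sketch below fits a Koopman-style linear operator by least squares on snapshot pairs (as in dynamic mode decomposition) and adds a crude "episodic memory": the stored episode whose per-step rotation best matches the current data is recalled and used to refit the operator. It only conveys the flavour of combining Koopman learning with episodic recall; the episode signature, regimes, and recall rule are assumptions, not the authors' mechanism.

```python
import numpy as np

def fit_koopman(X):
    """Least-squares linear operator K with X[:, t+1] ~ K X[:, t] (DMD-style)."""
    return X[:, 1:] @ np.linalg.pinv(X[:, :-1])

def rotate(theta, n, x0):
    R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    xs = [x0]
    for _ in range(n - 1):
        xs.append(R @ xs[-1])
    return np.stack(xs, axis=1)

def step_angle(X):
    """Average signed rotation per step, used as a crude episode signature."""
    x0, y0, x1, y1 = X[0, :-1], X[1, :-1], X[0, 1:], X[1, 1:]
    return np.mean(np.arctan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1))

# Episodic memory: snapshot matrices from two previously seen dynamical regimes.
memory = {"slow": rotate(0.1, 50, np.array([1.0, 0.0])),
          "fast": rotate(0.5, 50, np.array([1.0, 0.0]))}

# A short new trajectory from the "fast" regime: recall the best-matching episode and refit.
current = rotate(0.5, 8, np.array([0.0, 1.0]))
best = min(memory, key=lambda name: abs(step_angle(memory[name]) - step_angle(current)))
K = fit_koopman(memory[best])
print("recalled episode:", best, "| one-step prediction:", np.round(K @ current[:, -1], 3))
```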

Decentralised Q-Learning for Multi-Agent Markov Decision Processes with a Satisfiability Criterion

  • paper_url: http://arxiv.org/abs/2311.12613
  • repo_url: None
  • paper_authors: Keshav P. Keval, Vivek S. Borkar
  • for: Driving the time-average cost of each agent in a multi-agent Markov decision process (MMDP) below a pre-specified agent-specific bound.
  • methods: Combines Q-learning on a weighted combination of the agents' costs with a gossip algorithm whose averaging matrix is modulated by the Metropolis-Hastings or Multiplicative Weights formalisms, using multiple timescales; under mild conditions the algorithm approximately achieves the desired bound for each agent.
  • results: The algorithm attains time-average costs below the pre-specified agent-specific bounds, and its empirical performance is demonstrated in the more general setting of MMDPs with jointly controlled per-stage costs.
    Abstract In this paper, we propose a reinforcement learning algorithm to solve a multi-agent Markov decision process (MMDP). The goal, inspired by Blackwell's Approachability Theorem, is to lower the time average cost of each agent to below a pre-specified agent-specific bound. For the MMDP, we assume the state dynamics to be controlled by the joint actions of agents, but the per-stage costs to only depend on the individual agent's actions. We combine the Q-learning algorithm for a weighted combination of the costs of each agent, obtained by a gossip algorithm with the Metropolis-Hastings or Multiplicative Weights formalisms to modulate the averaging matrix of the gossip. We use multiple timescales in our algorithm and prove that under mild conditions, it approximately achieves the desired bounds for each of the agents. We also demonstrate the empirical performance of this algorithm in the more general setting of MMDPs having jointly controlled per-stage costs.
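As a bare-bones illustration of the two ingredients, the sketch below runs tabular, cost-minimising Q-learning for two agents, each with its own per-stage cost on its own copy of a toy MDP, and periodically gossip-averages their Q-tables with a doubly stochastic matrix. The jointly controlled dynamics, Metropolis-Hastings / Multiplicative Weights modulation, multiple timescales, and cost-bound guarantees of the paper are not modelled; everything here is an assumed toy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_agents = 4, 2, 2
gamma, alpha, eps = 0.9, 0.1, 0.2

# Shared toy transition kernel; each agent has its own per-stage cost (all generated at random).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state dist
costs = rng.uniform(0, 1, size=(n_agents, n_states, n_actions))

W = np.array([[0.5, 0.5], [0.5, 0.5]])   # doubly stochastic gossip matrix for two agents

Q = np.zeros((n_agents, n_states, n_actions))
s = np.zeros(n_agents, dtype=int)
for t in range(5000):
    for i in range(n_agents):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[i, s[i]].argmin())
        s_next = rng.choice(n_states, p=P[s[i], a])
        target = costs[i, s[i], a] + gamma * Q[i, s_next].min()   # cost-minimising Q-learning
        Q[i, s[i], a] += alpha * (target - Q[i, s[i], a])
        s[i] = s_next
    if t % 10 == 0:
        Q = np.einsum("ij,jsa->isa", W, Q)   # gossip: mix the agents' Q-tables

print("max disagreement between agents' Q-tables:", round(float(np.abs(Q[0] - Q[1]).max()), 4))
```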

A New Type Of Upper And Lower Bounds On Right-Tail Probabilities Of Continuous Random Variables

  • paper_url: http://arxiv.org/abs/2311.12612
  • repo_url: None
  • paper_authors: Nikola Zlatanov
  • for: Providing a completely new type of upper and lower bounds on the right-tail probabilities of continuous random variables with unbounded support or with support semi-bounded from the left.
  • methods: The bounds depend only on the probability density function (PDF), its first derivative, and two parameters used for tightening them, and hold under conditions involving the PDF, its first and second derivatives, and the two parameters.
  • results: The new tail bounds are shown to be tight for a wide range of continuous random variables via numerical examples.
    Abstract In this paper, I present a completely new type of upper and lower bounds on the right-tail probabilities of continuous random variables with unbounded support and with semi-bounded support from the left. The presented upper and lower right-tail bounds depend only on the probability density function (PDF), its first derivative, and two parameters that are used for tightening the bounds. These tail bounds hold under certain conditions that depend on the PDF, its first and second derivatives, and the two parameters. The new tail bounds are shown to be tight for a wide range of continuous random variables via numerical examples.

ChronoPscychosis: Temporal Segmentation and Its Impact on Schizophrenia Classification Using Motor Activity Data

  • paper_url: http://arxiv.org/abs/2311.12590
  • repo_url: None
  • paper_authors: Pradnya Rajendra Jadhav, Raviprasad Aduri
  • for: This paper aims to identify reliable biomarkers for the accurate classification of Schizophrenia patients using motor activity data.
  • methods: The paper uses temporal pattern analysis and machine learning models to classify individuals with Schizophrenia based on their motor activity data. The dataset contains per-minute motor activity measurements collected for an average of 12.7 days in a row for each participant, and the authors use segmentation techniques to divide each day into six parts. They employ sixteen statistical features within these temporal segments and train seven machine learning models to evaluate their impact on classification.
  • results: The results show that temporal segmentation significantly improves the classification of Schizophrenia patients and controls, with the LightGBM model outperforming the other six models. The authors find that distinguishing between diurnal and nocturnal segments amplifies the differences between Schizophrenia patients and controls, but further subdivisions into smaller time segments do not affect the AUC-ROC significantly. The paper concludes that extensive temporal classification beyond distinguishing between day and night does not yield substantial results, offering an efficient approach for further classification, early diagnosis, and monitoring of Schizophrenia.
    Abstract Schizophrenia is a complicated mental illness characterized by a broad spectrum of symptoms affecting cognition, behavior, and emotion. The task of identifying reliable biomarkers to classify Schizophrenia accurately continues to be a challenge in the field of psychiatry. We investigate the temporal patterns within the motor activity data as a potential key to enhancing the categorization of individuals with Schizophrenia, using the dataset having motor activity recordings of 22 Schizophrenia patients and 32 control subjects. The dataset contains per-minute motor activity measurements collected for an average of 12.7 days in a row for each participant. We dissect each day into segments (Twelve, Eight, six, four, three, and two parts) and evaluate their impact on classification. We employ sixteen statistical features within these temporal segments and train them on Seven machine learning models to get deeper insights. LightGBM model outperforms the other six models. Our results indicate that the temporal segmentation significantly improves the classification, with AUC-ROC = 0.93, F1 score = 0.84( LightGBM- without any segmentation) and AUC-ROC = 0.98, F1 score = 0.93( LightGBM- with segmentation). Distinguishing between diurnal and nocturnal segments amplifies the differences between Schizophrenia patients and controls. However, further subdivisions into smaller time segments do not affect the AUC- ROC significantly. Morning, afternoon, evening, and night partitioning gives similar classification performance to day-night partitioning. These findings are valuable as they indicate that extensive temporal classification beyond distinguishing between day and night does not yield substantial results, offering an efficient approach for further classification, early diagnosis, and monitoring of Schizophrenia.
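The preprocessing at the heart of the study is easy to sketch: split a day of per-minute actigraphy into equal temporal segments and compute summary statistics per segment. The snippet below does this on simulated activity with a handful of illustrative features, not the sixteen features or the real recordings used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
minutes_per_day = 24 * 60

# Simulated per-minute motor activity: low at night, higher and noisier during the day.
t = np.arange(minutes_per_day)
activity = np.clip(50 + 40 * np.sin(2 * np.pi * (t - 6 * 60) / minutes_per_day)
                   + rng.normal(0, 15, minutes_per_day), 0, None)

def segment_features(day, n_segments):
    """Split one day into n_segments equal parts and compute simple statistics per part."""
    parts = np.array_split(day, n_segments)
    return np.array([[p.mean(), p.std(), p.max(), np.median(p), (p == 0).mean()]
                     for p in parts]).ravel()

for n in (2, 4, 6):  # day/night, quarters, and the paper's six-part split
    print(f"{n} segments -> feature vector of length {len(segment_features(activity, n))}")
```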

Nonlinear System Identification of Swarm of UAVs Using Deep Learning Methods

  • paper_url: http://arxiv.org/abs/2311.12906
  • repo_url: None
  • paper_authors: Saman Yazdannik, Morteza Tayefi, Mojtaba Farrokh
  • for: Designing and evaluating nonlinear system identification techniques for modeling the dynamics of a UAV swarm in planar space.
  • methods: Explores and compares learning methods such as RNNs, CNNs, and Neural ODEs, using both transient and steady-state data from swarm simulations.
  • results: A Neural ODE model well trained on transient data is robust to varying initial conditions and outperforms the other learning methods in accurately predicting swarm stability.
    Abstract This study designs and evaluates multiple nonlinear system identification techniques for modeling the UAV swarm system in planar space. learning methods such as RNNs, CNNs, and Neural ODE are explored and compared. The objective is to forecast future swarm trajectories by accurately approximating the nonlinear dynamics of the swarm model. The modeling process is performed using both transient and steady-state data from swarm simulations. Results show that the combination of Neural ODE with a well-trained model using transient data is robust for varying initial conditions and outperforms other learning methods in accurately predicting swarm stability.

Machine-Guided Discovery of a Real-World Rogue Wave Model

  • paper_url: http://arxiv.org/abs/2311.12579
  • repo_url: https://github.com/dionhaefner/rogue-wave-discovery
  • paper_authors: Dion Häfner, Johannes Gemmrich, Markus Jochum
  • for: Exploring how machine learning models can be used for scientific discovery, with a focus on ocean rogue-wave forecasting.
  • methods: Uses causal analysis, deep learning, parsimony-guided model selection, and symbolic regression to discover a new symbolic model from data.
  • results: A neural network trained on causal features from an extensive wave-buoy dataset is distilled via symbolic regression into an interpretable mathematical equation that retains the network's predictive capabilities, reproduces known behavior, and achieves better predictive scores than current theory on unseen data.
    Abstract Big data and large-scale machine learning have had a profound impact on science and engineering, particularly in fields focused on forecasting and prediction. Yet, it is still not clear how we can use the superior pattern matching abilities of machine learning models for scientific discovery. This is because the goals of machine learning and science are generally not aligned. In addition to being accurate, scientific theories must also be causally consistent with the underlying physical process and allow for human analysis, reasoning, and manipulation to advance the field. In this paper, we present a case study on discovering a new symbolic model for oceanic rogue waves from data using causal analysis, deep learning, parsimony-guided model selection, and symbolic regression. We train an artificial neural network on causal features from an extensive dataset of observations from wave buoys, while selecting for predictive performance and causal invariance. We apply symbolic regression to distill this black-box model into a mathematical equation that retains the neural network's predictive capabilities, while allowing for interpretation in the context of existing wave theory. The resulting model reproduces known behavior, generates well-calibrated probabilities, and achieves better predictive scores on unseen data than current theory. This showcases how machine learning can facilitate inductive scientific discovery, and paves the way for more accurate rogue wave forecasting.

BEND: Benchmarking DNA Language Models on biologically meaningful tasks

  • paper_url: http://arxiv.org/abs/2311.12570
  • repo_url: https://github.com/frederikkemarin/bend
  • paper_authors: Frederikke Isa Marin, Felix Teufel, Marc Horlacher, Dennis Madsen, Dennis Pultz, Ole Winther, Wouter Boomsma
  • for: Evaluating DNA language models and providing a reliable benchmark for them.
  • methods: Introduces BEND, a benchmark featuring a collection of realistic and biologically meaningful downstream tasks defined on the human genome.
  • results: Embeddings from current DNA LMs can approach the performance of expert methods on some tasks, but capture only limited information about long-range features.
    Abstract The genome sequence contains the blueprint for governing cellular processes. While the availability of genomes has vastly increased over the last decades, experimental annotation of the various functional, non-coding and regulatory elements encoded in the DNA sequence remains both expensive and challenging. This has sparked interest in unsupervised language modeling of genomic DNA, a paradigm that has seen great success for protein sequence data. Although various DNA language models have been proposed, evaluation tasks often differ between individual works, and might not fully recapitulate the fundamental challenges of genome annotation, including the length, scale and sparsity of the data. In this study, we introduce BEND, a Benchmark for DNA language models, featuring a collection of realistic and biologically meaningful downstream tasks defined on the human genome. We find that embeddings from current DNA LMs can approach performance of expert methods on some tasks, but only capture limited information about long-range features. BEND is available at https://github.com/frederikkemarin/BEND.
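Benchmarking frozen embeddings typically means training a light-weight probe on top of them. The sketch below fits a logistic-regression probe on synthetic embedding vectors for a binary genomic label; it shows the general shape of such an evaluation, not BEND's actual tasks, data, or models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_windows, dim = 1000, 64

# Hypothetical per-window embeddings from a frozen DNA language model, with a weak
# planted linear signal for a binary label (e.g. "regulatory region" vs. background).
X = rng.normal(size=(n_windows, dim))
w_true = rng.normal(size=dim)
y = (X @ w_true + rng.normal(scale=5.0, size=n_windows) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("linear-probe AUC:", round(roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1]), 3))
```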

Differentiable Sampling of Categorical Distributions Using the CatLog-Derivative Trick

  • paper_url: http://arxiv.org/abs/2311.12569
  • repo_url: None
  • paper_authors: Lennert De Smet, Emanuele Sansone, Pedro Zuidberg Dos Martires
  • for: Learning with categorical random variables, which represent the discrete and uncertain aspects of data in discrete latent variable models.
  • methods: Builds on the Log-Derivative trick and introduces the CatLog-Derivative trick, a variation tailored towards categorical distributions, to estimate gradients through them.
  • results: Proposes IndeCateR, a novel unbiased gradient estimator for products of independent categorical distributions with provably lower variance than REINFORCE, which can be implemented efficiently and whose gradient estimates have significantly lower bias and variance for the same number of samples.
    Abstract Categorical random variables can faithfully represent the discrete and uncertain aspects of data as part of a discrete latent variable model. Learning in such models necessitates taking gradients with respect to the parameters of the categorical probability distributions, which is often intractable due to their combinatorial nature. A popular technique to estimate these otherwise intractable gradients is the Log-Derivative trick. This trick forms the basis of the well-known REINFORCE gradient estimator and its many extensions. While the Log-Derivative trick allows us to differentiate through samples drawn from categorical distributions, it does not take into account the discrete nature of the distribution itself. Our first contribution addresses this shortcoming by introducing the CatLog-Derivative trick - a variation of the Log-Derivative trick tailored towards categorical distributions. Secondly, we use the CatLog-Derivative trick to introduce IndeCateR, a novel and unbiased gradient estimator for the important case of products of independent categorical distributions with provably lower variance than REINFORCE. Thirdly, we empirically show that IndeCateR can be efficiently implemented and that its gradient estimates have significantly lower bias and variance for the same number of samples compared to the state of the art.
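For readers unfamiliar with the baseline being improved on, the sketch below estimates the gradient of E over x ~ Cat(p) of f(x) with respect to the logits using the score-function (Log-Derivative / REINFORCE) estimator and checks it against the exact gradient. The CatLog-Derivative trick and IndeCateR themselves are not reproduced here, and the payoff vector f is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.array([0.2, -0.5, 1.0])
f = np.array([1.0, 3.0, -2.0])   # arbitrary per-category payoff f(x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax(logits)

# Exact gradient of E[f(x)] = sum_k p_k f_k with respect to the logits.
exact_grad = p * (f - p @ f)

# Log-Derivative / REINFORCE: grad ~ mean_n f(x_n) * d log p(x_n)/d logits,
# where d log p(k)/d logits = onehot(k) - p.
N = 50_000
samples = rng.choice(len(p), size=N, p=p)
reinforce_grad = np.mean(f[samples][:, None] * (np.eye(len(p))[samples] - p), axis=0)

print("exact gradient    :", np.round(exact_grad, 4))
print("REINFORCE estimate:", np.round(reinforce_grad, 4))
```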

Variational Elliptical Processes

  • paper_url: http://arxiv.org/abs/2311.12566
  • repo_url: None
  • paper_authors: Maria Bånkestad, Jens Sjölund, Jalil Taghia, Thomas B. Schön
  • for: Describing elliptical processes, a family of non-parametric probabilistic models that subsume Gaussian processes and Student's t processes and include a range of new heavy-tailed behaviors.
  • methods: Represents elliptical distributions as a continuous mixture of Gaussian distributions, parameterizes the mixture with a spline normalizing flow, and trains it using variational inference.
  • results: Regression and classification experiments show that elliptical processes scale to large problems and can supersede Gaussian processes in settings with non-Gaussian likelihoods or where accurate tail modeling is essential.
    Abstract We present elliptical processes, a family of non-parametric probabilistic models that subsume Gaussian processes and Student's t processes. This generalization includes a range of new heavy-tailed behaviors while retaining computational tractability. Elliptical processes are based on a representation of elliptical distributions as a continuous mixture of Gaussian distributions. We parameterize this mixture distribution as a spline normalizing flow, which we train using variational inference. The proposed form of the variational posterior enables a sparse variational elliptical process applicable to large-scale problems. We highlight advantages compared to Gaussian processes through regression and classification experiments. Elliptical processes can supersede Gaussian processes in several settings, including cases where the likelihood is non-Gaussian or when accurate tail modeling is essential.

Summary of the DISPLACE Challenge 2023 – DIarization of SPeaker and LAnguage in Conversational Environments

  • paper_url: http://arxiv.org/abs/2311.12564
  • repo_url: None
  • paper_authors: Shikha Baghel, Shreyas Ramoji, Somil Jain, Pratik Roy Chowdhuri, Prachi Singh, Deepu Vijayasenan, Sriram Ganapathy
  • for: Extracting information from multilingual, multi-speaker conversations.
  • methods: Benchmarks speaker and language diarization technologies on multilingual, multi-speaker conversational far-field speech and provides a baseline system for evaluation.
  • results: Describes the DISPLACE open challenge, with Track-1 on speaker diarization (SD) and Track-2 on language diarization (LD), both evaluated on the same underlying audio data; the challenge received 42 registrations worldwide and 19 combined submissions, and the paper gives a concise overview of the submitted systems and their performance.
    Abstract In multi-lingual societies, where multiple languages are spoken in a small geographic vicinity, informal conversations often involve mix of languages. Existing speech technologies may be inefficient in extracting information from such conversations, where the speech data is rich in diversity with multiple languages and speakers. The DISPLACE (DIarization of SPeaker and LAnguage in Conversational Environments) challenge constitutes an open-call for evaluating and bench-marking the speaker and language diarization technologies on this challenging condition. The challenge entailed two tracks: Track-1 focused on speaker diarization (SD) in multilingual situations while, Track-2 addressed the language diarization (LD) in a multi-speaker scenario. Both the tracks were evaluated using the same underlying audio data. To facilitate this evaluation, a real-world dataset featuring multilingual, multi-speaker conversational far-field speech was recorded and distributed. Furthermore, a baseline system was made available for both SD and LD task which mimicked the state-of-art in these tasks. The challenge garnered a total of $42$ world-wide registrations and received a total of $19$ combined submissions for Track-1 and Track-2. This paper describes the challenge, details of the datasets, tasks, and the baseline system. Additionally, the paper provides a concise overview of the submitted systems in both tracks, with an emphasis given to the top performing systems. The paper also presents insights and future perspectives for SD and LD tasks, focusing on the key challenges that the systems need to overcome before wide-spread commercial deployment on such conversations.

Explainable Anomaly Detection using Masked Latent Generative Modeling

  • paper_url: http://arxiv.org/abs/2311.12550
  • repo_url: None
  • paper_authors: Daesoo Lee, Sara Malacarne, Erlend Aune
  • for: Proposing a new time series anomaly detection method that achieves excellent detection accuracy while offering a superior level of explainability.
  • methods: Adapts masked generative modeling from the TimeVQVAE time series generation method; the prior model operates on the discrete latent space of a time-frequency domain, preserving its dimensional semantics so that anomaly scores can be computed across different frequency bands, and its generative nature allows sampling likely normal states as counterfactual explanations.
  • results: On the UCR Time Series Anomaly archive, TimeVQVAE-AD significantly surpasses existing methods in both detection accuracy and explainability.
    Abstract We present a novel time series anomaly detection method that achieves excellent detection accuracy while offering a superior level of explainability. Our proposed method, TimeVQVAE-AD, leverages masked generative modeling adapted from the cutting-edge time series generation method known as TimeVQVAE. The prior model is trained on the discrete latent space of a time-frequency domain. Notably, the dimensional semantics of the time-frequency domain are preserved in the latent space, enabling us to compute anomaly scores across different frequency bands, which provides a better insight into the detected anomalies. Additionally, the generative nature of the prior model allows for sampling likely normal states for detected anomalies, enhancing the explainability of the detected anomalies through counterfactuals. Our experimental evaluation on the UCR Time Series Anomaly archive demonstrates that TimeVQVAE-AD significantly surpasses the existing methods in terms of detection accuracy and explainability.

Learning to Compute Gröbner Bases

  • paper_url: http://arxiv.org/abs/2311.12904
  • repo_url: None
  • paper_authors: Hiroshi Kera, Yuki Ishihara, Yuta Kambe, Tristan Vaccon, Kazuhiro Yokoyama
  • for: Computing Gröbner bases of multivariate polynomial systems by training a transformer, to reduce the computational cost.
  • methods: Addresses two novel algebraic problems needed to build training data, random generation of Gröbner bases and their transformation into non-Gröbner polynomial systems (the backward Gröbner problem), resolved via zero-dimensional radical ideals.
  • results: In the five-variate case, the proposed dataset generation method is five orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gröbner bases.
    Abstract Solving a polynomial system, or computing an associated Gr\"obner basis, has been a fundamental task in computational algebra. However, it is also known for its notoriously expensive computational cost -- doubly exponential time complexity in the number of variables in the worst case. In this paper, we achieve for the first time Gr\"obner basis computation through the training of a transformer. The training requires many pairs of a polynomial system and the associated Gr\"obner basis, thus motivating us to address two novel algebraic problems: random generation of Gr\"obner bases and the transformation of them into non-Gr\"obner polynomial systems, termed as \textit{backward Gr\"obner problem}. We resolve these problems with zero-dimensional radical ideals, the ideals appearing in various applications. The experiments show that in the five-variate case, the proposed dataset generation method is five orders of magnitude faster than a naive approach, overcoming a crucial challenge in learning to compute Gr\"obner bases.
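To make the target object concrete, the snippet below computes a Gröbner basis for a small two-variable system with SymPy's classical symbolic (non-learned) routine. The example system is arbitrary, and neither the transformer model nor the backward-Gröbner dataset generation of the paper is shown.

```python
from sympy import groebner, symbols

x, y = symbols("x y")
# A small, arbitrary two-variable polynomial system (assumed example).
system = [x**2 + y**2 - 1, x*y - 2]

# Lexicographic Groebner basis of the ideal generated by the system.
G = groebner(system, x, y, order="lex")
print(G)
```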

An efficient likelihood-free Bayesian inference method based on sequential neural posterior estimation

  • paper_url: http://arxiv.org/abs/2311.12530
  • repo_url: https://github.com/yifei-xiong/efficient-snpe
  • paper_authors: Yifei Xiong, Xiliang Yang, Sanguo Zhang, Zhijian He
  • for: Bayesian inference for simulation-based models with intractable likelihoods.
  • methods: Uses sequential neural posterior estimation (SNPE), learning the posterior from sequential simulation with neural network-based conditional density estimators.
  • results: Proposes an SNPE method with an adaptive calibration kernel and several variance reduction techniques, improving training stability and efficiency; numerical experiments show a better posterior approximation than the original SNPE method and some existing competitors.
    Abstract Sequential neural posterior estimation (SNPE) techniques have been recently proposed for dealing with simulation-based models with intractable likelihoods. Unlike approximate Bayesian computation, SNPE techniques learn the posterior from sequential simulation using neural network-based conditional density estimators by minimizing a specific loss function. The SNPE method proposed by Lueckmann et al. (2017) used a calibration kernel to boost the sample weights around the observed data, resulting in a concentrated loss function. However, the use of calibration kernels may increase the variances of both the empirical loss and its gradient, making the training inefficient. To improve the stability of SNPE, this paper proposes to use an adaptive calibration kernel and several variance reduction techniques. The proposed method greatly speeds up the process of training, and provides a better approximation of the posterior than the original SNPE method and some existing competitors as confirmed by numerical experiments.

Inverse Problems with Learned Forward Operators

  • paper_url: http://arxiv.org/abs/2311.12528
  • repo_url: None
  • paper_authors: Simon Arridge, Andreas Hauptmann, Yury Korolev
  • for: Solving inverse problems when accurate forward models are computationally expensive, using cheaper learned variants that do not compromise reconstruction quality.
  • methods: Reviews reconstruction methods with learned forward operators following two paradigms: one that is agnostic to the forward operator, learning its restriction to the subspace spanned by the training data and reconstructing via regularisation by projection, and one that uses a simplified physical model of the measurement process and learns only a model correction from the training data.
  • results: Presents the theory of both approaches and compares them numerically; a common theme is that both methods require, or at least benefit from, training data not only for the forward operator but also for its adjoint.
    Abstract Solving inverse problems requires knowledge of the forward operator, but accurate models can be computationally expensive and hence cheaper variants are desired that do not compromise reconstruction quality. This chapter reviews reconstruction methods in inverse problems with learned forward operators that follow two different paradigms. The first one is completely agnostic to the forward operator and learns its restriction to the subspace spanned by the training data. The framework of regularisation by projection is then used to find a reconstruction. The second one uses a simplified model of the physics of the measurement process and only relies on the training data to learn a model correction. We present the theory of these two approaches and compare them numerically. A common theme emerges: both methods require, or at least benefit from, training data not only for the forward operator, but also for its adjoint.
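The first paradigm has a compact finite-dimensional illustration: given training pairs (x_i, A x_i), the forward operator is only known on the span of the x_i, and the reconstruction is sought within that span. The numpy sketch below does exactly this for a toy matrix operator; it is a schematic of regularisation by projection under assumed dimensions, not the chapter's general formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 20, 15, 6   # signal dim, data dim, number of training pairs (all assumed)

A = rng.normal(size=(m, n))          # "true" forward operator, used only to generate data
X_train = rng.normal(size=(n, k))    # training signals spanning a k-dimensional subspace
Y_train = A @ X_train                # their measurements: the learned restriction of A

# A ground-truth signal lying in the training subspace, and its noisy measurement.
c_true = rng.normal(size=k)
x_true = X_train @ c_true
y = A @ x_true + 0.01 * rng.normal(size=m)

# Regularisation by projection: write x = X_train @ c and fit c from y using only Y_train.
c_hat, *_ = np.linalg.lstsq(Y_train, y, rcond=None)
x_hat = X_train @ c_hat

rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print("relative reconstruction error:", round(float(rel_err), 4))
```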

Local Convolution Enhanced Global Fourier Neural Operator For Multiscale Dynamic Spaces Prediction

  • paper_url: http://arxiv.org/abs/2311.12902
  • repo_url: None
  • paper_authors: Xuanle Zhao, Yue Sun, Tielin Zhang, Bo Xu
  • for: Solving multiscale partial differential equations (PDEs), i.e., predicting multiscale dynamic spaces.
  • methods: Proposes a hierarchical neural operator based on the Fourier Neural Operator (FNO), integrating improved Fourier layers with attention mechanisms to capture both rapid local changes and features at various scales.
  • results: Experiments on forward and reverse problems of multiscale elliptic equations, Navier-Stokes equations, and other physical scenarios show superior performance on existing PDE benchmarks, especially for equations with rapidly varying coefficients.
    Abstract Neural operators extend the capabilities of traditional neural networks by allowing them to handle mappings between function spaces for the purpose of solving partial differential equations (PDEs). One of the most notable methods is the Fourier Neural Operator (FNO), which is inspired by Green's function method and approximate operator kernel directly in the frequency domain. In this work, we focus on predicting multiscale dynamic spaces, which is equivalent to solving multiscale PDEs. Multiscale PDEs are characterized by rapid coefficient changes and solution space oscillations, which are crucial for modeling atmospheric convection and ocean circulation. To solve this problem, models should have the ability to capture rapid changes and process them at various scales. However, the FNO only approximates kernels in the low-frequency domain, which is insufficient when solving multiscale PDEs. To address this challenge, we propose a novel hierarchical neural operator that integrates improved Fourier layers with attention mechanisms, aiming to capture all details and handle them at various scales. These mechanisms complement each other in the frequency domain and encourage the model to solve multiscale problems. We perform experiments on dynamic spaces governed by forward and reverse problems of multiscale elliptic equations, Navier-Stokes equations and some other physical scenarios, and reach superior performance in existing PDE benchmarks, especially equations characterized by rapid coefficient variations.
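The building block being enhanced here is the FNO's spectral convolution. The sketch below applies a single 1D Fourier layer: transform, multiply the lowest Fourier modes by (random, untrained) complex weights, and transform back. The local-convolution branch, attention mechanism, and hierarchy of the proposed model are not included, and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_modes = 128, 16   # grid size and number of retained low-frequency modes

x = np.linspace(0, 2 * np.pi, n_points, endpoint=False)
u = np.sin(3 * x) + 0.5 * np.cos(7 * x)   # a sample input function on a periodic grid

def fourier_layer(u, weights):
    """Spectral convolution: FFT, multiply the kept low modes by learned weights, inverse FFT."""
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    out_hat[:n_modes] = u_hat[:n_modes] * weights   # only the low modes carry parameters
    return np.fft.irfft(out_hat, n=len(u))

# Random complex weights standing in for the learned spectral parameters.
weights = rng.normal(size=n_modes) + 1j * rng.normal(size=n_modes)
v = fourier_layer(u, weights)
print("output shape:", v.shape)
```

Because only the low-frequency modes are parameterised, a pure Fourier layer struggles with rapidly varying coefficients, which is the gap the local convolution and attention components are meant to close.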

From Microbes to Methane: AI-Based Predictive Modeling of Feed Additive Efficacy in Dairy Cows

  • paper_url: http://arxiv.org/abs/2311.12901
  • repo_url: None
  • paper_authors: Yaniv Altshuler, Tzruya Calvao Chebach, Shalom Cohen
  • for: The paper aims to optimize livestock feed for enhancing yield and minimizing environmental impact in sustainable agriculture.
  • methods: The study uses rumen microbiome data to predict the efficacy of feed additives in dairy cattle, with a dataset of methane emissions from 2,190 Holstein cows across 34 sites. The experimental groups were administered four leading commercial feed additives, and methane emissions were measured before and after the administration of additives. The study also used deep metagenomic shotgun sequencing of rumen microbiome samples from 510 cows to develop a predictive model for additive efficacy.
  • results: The study found that using targeted feed additive strategies can significantly reduce methane emissions and optimize dairy yield and milk composition. The predictive model developed in the study demonstrates the potential for reducing overall emissions by over 27% by guiding the assignment of additives to farms where they are most effective.
    Abstract In an era of increasing pressure to achieve sustainable agriculture, the optimization of livestock feed for enhancing yield and minimizing environmental impact is a paramount objective. This study presents a pioneering approach towards this goal, using rumen microbiome data to predict the efficacy of feed additives in dairy cattle. We collected an extensive dataset that includes methane emissions from 2,190 Holstein cows distributed across 34 distinct sites. The cows were divided into control and experimental groups in a double-blind, unbiased manner, accounting for variables such as age, days in lactation, and average milk yield. The experimental groups were administered one of four leading commercial feed additives: Agolin, Kexxtone, Allimax, and Relyon. Methane emissions were measured individually both before the administration of additives and over a subsequent 12-week period. To develop our predictive model for additive efficacy, rumen microbiome samples were collected from 510 cows from the same herds prior to the study's onset. These samples underwent deep metagenomic shotgun sequencing, yielding an average of 15.7 million reads per sample. Utilizing innovative artificial intelligence techniques we successfully estimated the efficacy of these feed additives across different farms. The model's robustness was further confirmed through validation with independent cohorts, affirming its generalizability and reliability. Our results underscore the transformative capability of using targeted feed additive strategies to both optimize dairy yield and milk composition, and to significantly reduce methane emissions. Specifically, our predictive model demonstrates a scenario where its application could guide the assignment of additives to farms where they are most effective. In doing so, we could achieve an average potential reduction of over 27\% in overall emissions.

Fair Polylog-Approximate Low-Cost Hierarchical Clustering

  • paper_url: http://arxiv.org/abs/2311.12501
  • repo_url: None
  • paper_authors: Marina Knittel, Max Springer, John Dickerson, MohammadTaghi Hajiaghayi
  • for: Proposing a fair, low-cost hierarchical clustering algorithm, motivated by the ethical concerns raised by modern intelligent systems.
  • methods: Optimizes Dasgupta's cost function for hierarchical clustering under fairness constraints, giving a new polylogarithmic-approximate algorithm for the cost.
  • results: The algorithm achieves the first truly polylogarithmic-approximate low-cost fair hierarchical clustering, greatly bridging the gap between the best fair and vanilla hierarchical clustering approximations.
    Abstract Research in fair machine learning, and particularly clustering, has been crucial in recent years given the many ethical controversies that modern intelligent systems have posed. Ahmadian et al. [2020] established the study of fairness in \textit{hierarchical} clustering, a stronger, more structured variant of its well-known flat counterpart, though their proposed algorithm that optimizes for Dasgupta's [2016] famous cost function was highly theoretical. Knittel et al. [2023] then proposed the first practical fair approximation for cost, however they were unable to break the polynomial-approximate barrier they posed as a hurdle of interest. We break this barrier, proposing the first truly polylogarithmic-approximate low-cost fair hierarchical clustering, thus greatly bridging the gap between the best fair and vanilla hierarchical clustering approximations.
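The objective being approximated is Dasgupta's cost: each pair's similarity is weighted by the number of leaves under the pair's lowest common ancestor in the hierarchy. The sketch below evaluates that cost for a tiny, hand-made similarity matrix and binary trees written as nested tuples; the fairness constraints and the approximation algorithm itself are not modelled.

```python
import numpy as np

# Similarity matrix over four points (assumed example); higher means more similar.
W = np.array([[0, 5, 1, 1],
              [5, 0, 1, 1],
              [1, 1, 0, 4],
              [1, 1, 4, 0]], dtype=float)

def leaves(node):
    return [node] if isinstance(node, int) else [l for child in node for l in leaves(child)]

def dasgupta_cost(node, W):
    """Sum over internal nodes of |leaves| times the similarity crossing its two children."""
    if isinstance(node, int):
        return 0.0
    left, right = node
    cross = sum(W[i, j] for i in leaves(left) for j in leaves(right))
    return len(leaves(node)) * cross + dasgupta_cost(left, W) + dasgupta_cost(right, W)

# A sensible hierarchy groups the similar pairs (0,1) and (2,3) first; a bad one splits them.
print("sensible tree:", dasgupta_cost(((0, 1), (2, 3)), W))   # lower cost
print("bad tree     :", dasgupta_cost(((0, 2), (1, 3)), W))   # higher cost
```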

Multi-Objective Reinforcement Learning based on Decomposition: A taxonomy and framework

  • paper_url: http://arxiv.org/abs/2311.12495
  • repo_url: https://github.com/lucasalegre/morl-baselines
  • paper_authors: Florian Felten, El-Ghazali Talbi, Grégoire Danoy
  • for: Studying how multi-objective reinforcement learning (MORL) helps agents make compromises among conflicting objectives.
  • methods: Introduces MORL based on decomposition (MORL/D), bridging RL and decomposition-based multi-objective optimization (MOO/D), together with a comprehensive taxonomy for categorizing existing and future MORL research and a flexible framework derived from it.
  • results: Instantiations of MORL/D across various configurations achieve performance comparable to state-of-the-art approaches on benchmark problems, with significantly greater versatility.
    Abstract Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies making different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To tackle such an issue, this paper introduces Multi-Objective Reinforcement Learning based on Decomposition (MORL/D), a novel methodology bridging RL and MOO literature. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Implementation across various configurations demonstrates its versatility, assessed against benchmark problems. Results indicate MORL/D instantiations achieve comparable performance with significantly greater versatility than current state-of-the-art approaches. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL, contributing to the continued advancement of this field.

Heuristics for Detecting CoinJoin Transactions on the Bitcoin Blockchain

  • paper_url: http://arxiv.org/abs/2311.12491
  • repo_url: None
  • paper_authors: Hugo Schnoering, Michalis Vazirgiannis
  • for: Examining Bitcoin's decentralized peer-to-peer network, its blockchain, and the privacy challenges that its transparency poses for users.
  • methods: Analyzes the Bitcoin blockchain and the open-source implementations of CoinJoin protocols such as JoinMarket, Wasabi, and Whirlpool, each of which presents distinct challenges due to its unique transaction structure.
  • results: Develops refined heuristics for identifying CoinJoin transactions on the blockchain and analyzes them exhaustively on transactions up to block 760,000.
    Abstract This research delves into the intricacies of Bitcoin, a decentralized peer-to-peer network, and its associated blockchain, which records all transactions since its inception. While this ensures integrity and transparency, the transparent nature of Bitcoin potentially compromises users' privacy rights. To address this concern, users have adopted CoinJoin, a method that amalgamates multiple transaction intents into a single, larger transaction to bolster transactional privacy. This process complicates individual transaction tracing and disrupts many established blockchain analysis heuristics. Despite its significance, limited research has been conducted on identifying CoinJoin transactions. Particularly noteworthy are varied CoinJoin implementations such as JoinMarket, Wasabi, and Whirlpool, each presenting distinct challenges due to their unique transaction structures. This study delves deeply into the open-source implementations of these protocols, aiming to develop refined heuristics for identifying their transactions on the blockchain. Our exhaustive analysis covers transactions up to block 760,000, offering a comprehensive insight into CoinJoin transactions and their implications for Bitcoin blockchain analysis.

Harnessing FPGA Technology for Enhanced Biomedical Computation

  • paper_url: http://arxiv.org/abs/2311.12439
  • repo_url: None
  • paper_authors: Nisanur Alici, Kayode Inadagbo, Murat Isik
  • for: This study investigates sophisticated neural network frameworks, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), and Deep Belief Networks (DBNs), for improved analysis of electrocardiogram (ECG) data.
  • methods: The MIT-BIH Arrhythmia Database is used for training and evaluation, with Gaussian noise added to heighten the algorithms' resilience. The architectures incorporate various layers for specific processing and categorization functions and employ strategies such as the EarlyStopping callback and Dropout layers to prevent overfitting. The paper also details a tailored Tensor Compute Unit (TCU) accelerator developed for the PYNQ Z1 platform.
  • results: The study demonstrates the efficacy of FPGAs in advanced biomedical computing and provides a comprehensive methodology for FPGA-based machine learning, evaluating the models with performance indicators such as latency and throughput.
    Abstract This research delves into sophisticated neural network frameworks like Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory Networks (LSTMs), and Deep Belief Networks (DBNs) for improved analysis of ECG signals via Field Programmable Gate Arrays (FPGAs). The MIT-BIH Arrhythmia Database serves as the foundation for training and evaluating our models, with added Gaussian noise to heighten the algorithms' resilience. The developed architectures incorporate various layers for specific processing and categorization functions, employing strategies such as the EarlyStopping callback and Dropout layer to prevent overfitting. Additionally, this paper details the creation of a tailored Tensor Compute Unit (TCU) accelerator for the PYNQ Z1 platform. It provides a thorough methodology for implementing FPGA-based machine learning, encompassing the configuration of the Tensil toolchain in Docker, selection of architectures, PS-PL configuration, and the compilation and deployment of models. By evaluating performance indicators like latency and throughput, we showcase the efficacy of FPGAs in advanced biomedical computing. This study ultimately serves as a comprehensive guide to optimizing neural network operations on FPGAs across various fields.
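
A minimal sketch of the training-side setup described above (Gaussian-noise augmentation, Dropout, and the EarlyStopping callback) is shown below; the layer sizes and the synthetic stand-in data are assumptions, not the paper's architecture, and the Tensil/TCU deployment on the PYNQ Z1 is not covered.

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for MIT-BIH beats: 1000 windows of 187 samples, 5 classes.
x = np.random.randn(1000, 187, 1).astype("float32")
y = np.random.randint(0, 5, size=1000)

# Gaussian noise augmentation to heighten resilience.
x_noisy = x + np.random.normal(0.0, 0.05, size=x.shape).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(187, 1)),
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dropout(0.3),          # regularization against overfitting
    tf.keras.layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)
model.fit(x_noisy, y, validation_split=0.2, epochs=50, batch_size=64,
          callbacks=[early_stop], verbose=0)
```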

Deep State-Space Model for Predicting Cryptocurrency Price

  • paper_url: http://arxiv.org/abs/2311.14731
  • repo_url: None
  • paper_authors: Shalini Sharma, Angshul Majumdar, Emilie Chouzenoux, Victor Elvira
  • for: Predicting day-ahead cryptocurrency prices.
  • methods: A deep state-space model is proposed, keeping the probabilistic state-space formulation (for uncertainty quantification) while using deep neural networks for function approximation.
  • results: Experiments show the proposed approach predicts cryptocurrency prices accurately and yields the best overall accuracy compared with state-of-the-art and classical dynamical modeling techniques.
    Abstract Our work presents two fundamental contributions. On the application side, we tackle the challenging problem of predicting day-ahead crypto-currency prices. On the methodological side, a new dynamical modeling approach is proposed. Our approach keeps the probabilistic formulation of the state-space model, which provides uncertainty quantification on the estimates, and the function approximation ability of deep neural networks. We call the proposed approach the deep state-space model. The experiments are carried out on established cryptocurrencies (obtained from Yahoo Finance). The goal of the work has been to predict the price for the next day. Benchmarking has been done with both state-of-the-art and classical dynamical modeling techniques. Results show that the proposed approach yields the best overall results in terms of accuracy.
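
A minimal, deterministic sketch of the state-space idea is given below: a learned latent transition plus a learned emission, trained by prediction error. The paper's model keeps a full probabilistic formulation with uncertainty quantification, which this toy omits; all dimensions, names, and the synthetic series are placeholders.

```python
import torch
import torch.nn as nn

class NeuralStateSpace(nn.Module):
    """Deterministic sketch of a deep state-space model: a learned latent transition
    and a learned emission. The probabilistic/uncertainty part is omitted for brevity."""
    def __init__(self, state_dim=8):
        super().__init__()
        self.transition = nn.Sequential(nn.Linear(state_dim, 32), nn.Tanh(), nn.Linear(32, state_dim))
        self.emission = nn.Linear(state_dim, 1)      # maps latent state to observed price

    def forward(self, z0, steps):
        z, preds = z0, []
        for _ in range(steps):
            z = self.transition(z)                   # latent dynamics: z_{t+1} = f(z_t)
            preds.append(self.emission(z))           # observation:      y_t = g(z_t)
        return torch.stack(preds, dim=1)

# Toy usage: fit the model to a synthetic price series by prediction error.
torch.manual_seed(0)
prices = torch.cumsum(torch.randn(1, 100, 1), dim=1)   # stand-in for a crypto price series
model = NeuralStateSpace()
z0 = torch.zeros(1, 8)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(z0, steps=prices.shape[1]), prices)
    loss.backward()
    opt.step()
```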

Classifier Calibration with ROC-Regularized Isotonic Regression

  • paper_url: http://arxiv.org/abs/2311.12436
  • repo_url: None
  • paper_authors: Eugene Berta, Francis Bach, Michael Jordan
  • for: This work aims to make machine learning classifiers reliable and interpretable by calibrating them, bridging the gap between model confidence and actual probabilities.
  • methods: Isotonic regression (IR) is used to calibrate binary classifiers by minimizing the cross entropy on a calibration set via monotone transformations; IR acts as an adaptive binning procedure that can achieve a calibration error of zero.
  • results: The paper proves that IR preserves the convex hull of the ROC curve, guaranteeing calibration while controlling for overfitting of the calibration set. It also presents a generalization of IR to classifiers with K classes, which constructs a multidimensional adaptive binning scheme on the probability simplex and achieves a multi-class calibration error of zero. The algorithm is regularized by a form of monotonicity that preserves the classifier's K-dimensional ROC surface, and empirical results show this criterion strikes a balance between reducing cross entropy loss and avoiding overfitting of the calibration set.
    Abstract Calibration of machine learning classifiers is necessary to obtain reliable and interpretable predictions, bridging the gap between model confidence and actual probabilities. One prominent technique, isotonic regression (IR), aims at calibrating binary classifiers by minimizing the cross entropy on a calibration set via monotone transformations. IR acts as an adaptive binning procedure, which allows achieving a calibration error of zero, but leaves open the issue of the effect on performance. In this paper, we first prove that IR preserves the convex hull of the ROC curve -- an essential performance metric for binary classifiers. This ensures that a classifier is calibrated while controlling for overfitting of the calibration set. We then present a novel generalization of isotonic regression to accommodate classifiers with K classes. Our method constructs a multidimensional adaptive binning scheme on the probability simplex, again achieving a multi-class calibration error equal to zero. We regularize this algorithm by imposing a form of monotony that preserves the K-dimensional ROC surface of the classifier. We show empirically that this general monotony criterion is effective in striking a balance between reducing cross entropy loss and avoiding overfitting of the calibration set.
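
For the binary case, the baseline calibration step discussed above can be sketched with scikit-learn's `IsotonicRegression`; the paper's ROC-regularized multi-class generalization is not available there, so this only illustrates the monotone-map starting point. Data and models below are placeholders.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Fit a binary classifier, then calibrate its scores on a held-out calibration set.
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores_cal = clf.predict_proba(X_cal)[:, 1]

# Isotonic regression learns a monotone map from raw scores to calibrated probabilities;
# monotonicity is what preserves the ROC convex hull discussed in the paper.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(scores_cal, y_cal)

scores_new = clf.predict_proba(X_cal[:5])[:, 1]
print(iso.predict(scores_new))  # calibrated probabilities for a few examples
```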

Looped Transformers are Better at Learning Learning Algorithms

  • paper_url: http://arxiv.org/abs/2311.12424
  • repo_url: None
  • paper_authors: Liu Yang, Kangwook Lee, Robert Nowak, Dimitris Papailiopoulos
  • for: Solving in-context data-fitting problems from various (latent) models, as reported by Garg et al.
  • methods: A looped transformer architecture and an associated training methodology are used to incorporate iterative characteristics, in the spirit of the iterative algorithms common in traditional machine learning, into the transformer architecture.
  • results: Experiments show that the looped transformer matches the standard transformer on various data-fitting problems while using less than 10% of the parameter count.
    Abstract Transformers have demonstrated effectiveness in \emph{in-context solving} data-fitting problems from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture presents a challenge in emulating the iterative algorithms, which are commonly employed in traditional machine learning methods. To address this, we propose the utilization of \emph{looped} transformer architecture and its associated training methodology, with the aim of incorporating iterative characteristics into the transformer architectures. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer in solving various data-fitting problems, while utilizing less than 10\% of the parameter count.
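
A minimal sketch of the looping idea is shown below: one weight-tied transformer block applied repeatedly with input injection, so effective depth comes from iteration rather than extra parameters. The block configuration, loop count, and training details are placeholders rather than the authors' setup.

```python
import torch
import torch.nn as nn

class LoopedTransformer(nn.Module):
    """Sketch of the looping idea: a single (weight-tied) transformer block applied
    repeatedly, with the original input re-injected at every iteration. Losses over
    iterations and other training details follow the paper and are not reproduced."""
    def __init__(self, d_model=64, nhead=4, n_loops=12):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        h = torch.zeros_like(x)
        for _ in range(self.n_loops):
            h = self.block(h + x)   # input injection keeps the prompt visible at each loop
        return h

# Toy usage on a batch of in-context prompts (sequences of token embeddings).
tokens = torch.randn(8, 20, 64)     # 8 prompts, 20 tokens, embedding dim 64
out = LoopedTransformer()(tokens)
print(out.shape)                    # torch.Size([8, 20, 64])
```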

Federated Learning via Consensus Mechanism on Heterogeneous Data: A New Perspective on Convergence

  • paper_url: http://arxiv.org/abs/2311.12358
  • repo_url: https://github.com/fedcome/fedcome
  • paper_authors: Shu Zheng, Tiandi Ye, Xiang Li, Ming Gao
  • for: Improving federated learning (FL) on heterogeneous (non-IID) data, in particular guaranteeing a risk decrease for every client after each training round.
  • methods: FedCOME introduces a consensus mechanism that slightly adjusts each client's gradient on the server side so that the corrected gradient forms acute angles with the other clients' gradients, enforcing a risk decrease for each client. A novel client sampling strategy is also proposed to select the clients most representative of the global data distribution in the partial participation setting.
  • results: Extensive experiments on four benchmark datasets show that FedCOME outperforms other state-of-the-art methods in effectiveness, efficiency, and fairness, reducing risk across clients under heterogeneous data distributions. The source code is available at \url{https://github.com/fedcome/fedcome}.
    Abstract Federated learning (FL) on heterogeneous data (non-IID data) has recently received great attention. Most existing methods focus on studying the convergence guarantees for the global objective. While these methods can guarantee the decrease of the global objective in each communication round, they fail to ensure risk decrease for each client. In this paper, to address the problem,we propose FedCOME, which introduces a consensus mechanism to enforce decreased risk for each client after each training round. In particular, we allow a slight adjustment to a client's gradient on the server side, which generates an acute angle between the corrected gradient and the original ones of other clients. We theoretically show that the consensus mechanism can guarantee the convergence of the global objective. To generalize the consensus mechanism to the partial participation FL scenario, we devise a novel client sampling strategy to select the most representative clients for the global data distribution. Training on these selected clients with the consensus mechanism could empirically lead to risk decrease for clients that are not selected. Finally, we conduct extensive experiments on four benchmark datasets to show the superiority of FedCOME against other state-of-the-art methods in terms of effectiveness, efficiency and fairness. For reproducibility, we make our source code publicly available at: \url{https://github.com/fedcome/fedcome}.
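
The paper's exact consensus correction is not reproduced here; as a rough illustration of the "acute angle" idea, the sketch below applies a PCGrad-style projection that removes components of a client gradient pointing against other clients' gradients. The rule, names, and aggregation step are assumptions for illustration only.

```python
import numpy as np

def reduce_gradient_conflict(grads):
    """Illustrative only (not FedCOME's exact correction): for each client gradient,
    project out components that point against the other clients' original gradients
    (a PCGrad-style step). This pushes pairwise angles toward being acute; it does not
    reproduce the paper's consensus mechanism or its convergence guarantee."""
    originals = [g.astype(float) for g in grads]
    corrected = []
    for i, g in enumerate(originals):
        g = g.copy()
        for j, other in enumerate(originals):
            if i != j and g @ other < 0:          # obtuse pair: remove the conflicting part
                g -= (g @ other) / (other @ other + 1e-12) * other
        corrected.append(g)
    return corrected

# Toy usage with three clients' flattened gradients, aggregated FedAvg-style afterwards.
rng = np.random.default_rng(0)
client_grads = [rng.normal(size=10) for _ in range(3)]
adjusted = reduce_gradient_conflict(client_grads)
server_update = np.mean(adjusted, axis=0)
pairwise = [(i, j, round(float(adjusted[i] @ client_grads[j]), 3))
            for i in range(3) for j in range(3) if i != j]
print(pairwise)  # inspect inner products between corrected and original gradients
```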

Random Linear Projections Loss for Hyperplane-Based Optimization in Regression Neural Networks

  • paper_url: http://arxiv.org/abs/2311.12356
  • repo_url: https://github.com/ahmedaloui1997/randomlinearprojections
  • paper_authors: Shyam Venkatasubramanian, Ahmed Aloui, Vahid Tarokh
  • for: This work proposes a loss function, termed Random Linear Projections (RLP) loss, that mitigates overfitting of regression neural networks on complex datasets.
  • methods: With RLP loss, the distance between sets of hyperplanes connecting fixed-size subsets of the network's feature-prediction pairs and feature-label pairs is minimized.
  • results: Experiments show that neural networks trained with RLP loss achieve better performance, require fewer data samples, and are more robust to additive noise; a theoretical analysis supporting the empirical findings is also provided.
    Abstract Despite their popularity across a wide range of domains, regression neural networks are prone to overfitting complex datasets. In this work, we propose a loss function termed Random Linear Projections (RLP) loss, which is empirically shown to mitigate overfitting. With RLP loss, the distance between sets of hyperplanes connecting fixed-size subsets of the neural network's feature-prediction pairs and feature-label pairs is minimized. The intuition behind this loss derives from the notion that if two functions share the same hyperplanes connecting all subsets of feature-label pairs, then these functions must necessarily be equivalent. Our empirical studies, conducted across benchmark datasets and representative synthetic examples, demonstrate the improvements of the proposed RLP loss over mean squared error (MSE). Specifically, neural networks trained with the RLP loss achieve better performance while requiring fewer data samples and are more robust to additive noise. We provide theoretical analysis supporting our empirical findings.
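
One possible reading of the RLP loss is sketched below: for random fixed-size subsets of a batch, fit the hyperplane through the (feature, prediction) pairs and through the (feature, label) pairs by regularized least squares and penalize the gap between the two. This is an interpretation for illustration, not the authors' reference implementation; subset counts and sizes are arbitrary.

```python
import torch

def rlp_loss(features, predictions, labels, n_subsets=32, subset_size=8, eps=1e-6):
    """Sketch of a Random Linear Projections style loss (one reading of the idea):
    for random fixed-size subsets, compare the hyperplane fit to (features, predictions)
    with the hyperplane fit to (features, labels)."""
    n, d = features.shape
    X = torch.cat([features, torch.ones(n, 1)], dim=1)           # add bias column
    loss = 0.0
    for _ in range(n_subsets):
        idx = torch.randperm(n)[:subset_size]
        A = X[idx]                                               # (k, d+1)
        gram_inv = torch.linalg.inv(A.T @ A + eps * torch.eye(d + 1))
        w_pred = gram_inv @ A.T @ predictions[idx]               # hyperplane of feature-prediction pairs
        w_true = gram_inv @ A.T @ labels[idx]                    # hyperplane of feature-label pairs
        loss = loss + ((w_pred - w_true) ** 2).sum()
    return loss / n_subsets

# Toy usage inside a training step of a small regression network.
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(5, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
x, y = torch.randn(64, 5), torch.randn(64, 1)
loss = rlp_loss(x, net(x), y)
loss.backward()   # gradients flow through the predictions
print(float(loss))
```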

Acceleration and Implicit Regularization in Gaussian Phase Retrieval

  • paper_url: http://arxiv.org/abs/2311.12888
  • repo_url: None
  • paper_authors: Tyler Maunu, Martin Molina-Fructuoso
  • for: Studying accelerated optimization methods for the Gaussian phase retrieval problem.
  • methods: Gradient methods with Polyak (heavy-ball) or Nesterov momentum are analyzed and shown to enjoy an implicit regularization similar to that of gradient descent, keeping the iterates in a region where the cost is strongly convex and smooth.
  • results: Experiments confirm that the accelerated methods converge faster than gradient descent in practice.
    Abstract We study accelerated optimization methods in the Gaussian phase retrieval problem. In this setting, we prove that gradient methods with Polyak or Nesterov momentum have similar implicit regularization to gradient descent. This implicit regularization ensures that the algorithms remain in a nice region, where the cost function is strongly convex and smooth despite being nonconvex in general. This ensures that these accelerated methods achieve faster rates of convergence than gradient descent. Experimental evidence demonstrates that the accelerated methods converge faster than gradient descent in practice.
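
A small numerical sketch of the setting is given below: Gaussian measurements, the standard amplitude-squared objective, a spectral-style initialization, and gradient steps with Nesterov momentum. The step size, momentum value, and iteration count are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 400
x_true = rng.normal(size=n)
A = rng.normal(size=(m, n))                   # Gaussian measurement vectors a_i as rows
y = (A @ x_true) ** 2                         # phaseless measurements y_i = (a_i^T x)^2

def grad(x):
    r = A @ x
    return (A.T @ ((r ** 2 - y) * r)) / m     # gradient of f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2

# Spectral initialization: leading eigenvector of (1/m) sum_i y_i a_i a_i^T, rescaled.
Y = (A.T * y) @ A / m
x = np.linalg.eigh(Y)[1][:, -1] * np.sqrt(np.mean(y))

v, eta, beta = x.copy(), 0.05 / np.mean(y), 0.8   # illustrative step size and Nesterov momentum
for _ in range(1500):
    x_new = v - eta * grad(v)                 # gradient step at the look-ahead point
    v = x_new + beta * (x_new - x)            # Nesterov extrapolation
    x = x_new

# The global sign is unrecoverable from squared measurements, so compare up to sign.
err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true)) / np.linalg.norm(x_true)
print(f"relative error (up to sign): {err:.2e}")
```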

Graph Neural Ordinary Differential Equations-based method for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2311.12329
  • repo_url: None
  • paper_authors: Ke Xu, Yuanjie Zhu, Weizhi Zhang, Philip S. Yu
  • for: collaborative filtering
  • methods: Graph Neural Ordinary Differential Equation-based method (GODE-CF)
  • results: outperforms competitive baselines, including GCN-based models and other state-of-the-art CF methods, with advantages in simplicity, efficiency, and training time.
    Abstract Graph Convolution Networks (GCNs) are widely considered state-of-the-art for collaborative filtering. Although several GCN-based methods have been proposed and achieved state-of-the-art performance in various tasks, they can be computationally expensive and time-consuming to train if too many layers are created. However, since the linear GCN model can be interpreted as a differential equation, it is possible to transfer it to an ODE problem. This inspired us to address the computational limitations of GCN-based models by designing a simple and efficient NODE-based model that can skip some GCN layers to reach the final state, thus avoiding the need to create many layers. In this work, we propose a Graph Neural Ordinary Differential Equation-based method for Collaborative Filtering (GODE-CF). This method estimates the final embedding by utilizing the information captured by one or two GCN layers. To validate our approach, we conducted experiments on multiple datasets. The results demonstrate that our model outperforms competitive baselines, including GCN-based models and other state-of-the-art CF methods. Notably, our proposed GODE-CF model has several advantages over traditional GCN-based models. It is simple, efficient, and has a fast training time, making it a practical choice for real-world situations.
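
To illustrate the idea of replacing stacked GCN layers with an ODE solve, the sketch below Euler-integrates a common continuous-time relaxation of GCN propagation, $dE/dt = (\hat{A} - I)E$, on a toy user-item graph. The exact GODE-CF dynamics, solver, and training objective are in the paper; the graph, names, and step counts here are placeholders.

```python
import torch

def normalized_adj(edge_index, n_nodes):
    """Symmetrically normalized adjacency D^{-1/2} A D^{-1/2} (dense, for clarity)."""
    A = torch.zeros(n_nodes, n_nodes)
    A[edge_index[0], edge_index[1]] = 1.0
    A[edge_index[1], edge_index[0]] = 1.0
    deg = A.sum(dim=1).clamp(min=1.0)
    d_inv_sqrt = deg.pow(-0.5)
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def graph_ode_embeddings(E0, A_hat, t_end=1.0, n_steps=4):
    """Euler-integrate dE/dt = (A_hat - I) E; a few integration steps play the role of
    stacked GCN layers. This is an illustration of the idea, not the exact GODE-CF solver."""
    E, h = E0, t_end / n_steps
    eye = torch.eye(A_hat.shape[0])
    for _ in range(n_steps):
        E = E + h * ((A_hat - eye) @ E)
    return E

# Toy usage: a tiny user-item bipartite graph flattened into one node set.
n_users, n_items, dim = 4, 6, 16
edge_index = torch.tensor([[0, 0, 1, 2, 3, 3],
                           [4, 5, 6, 7, 8, 9]])            # user -> item edges (items offset by n_users)
E0 = torch.randn(n_users + n_items, dim, requires_grad=True) * 0.1
A_hat = normalized_adj(edge_index, n_users + n_items)
E_final = graph_ode_embeddings(E0, A_hat)
scores = E_final[:n_users] @ E_final[n_users:].T            # user-item preference scores
print(scores.shape)                                         # torch.Size([4, 6])
```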

Power grid operational risk assessment using graph neural network surrogates

  • paper_url: http://arxiv.org/abs/2311.12309
  • repo_url: None
  • paper_authors: Yadong Zhang, Pranav M Karve, Sankaran Mahadevan
  • for: Investigating whether graph neural networks (GNNs) can serve as surrogates for power grid operational decision-making algorithms (optimal power flow (OPF) and security-constrained unit commitment (SCUC)) to enable rigorous quantification of operational risk.
  • methods: Numerous Monte Carlo (MC) samples are drawn from the forecasted probability distributions of spatio-temporally correlated stochastic grid variables, and traditional OPF and SCUC solvers generate the corresponding solutions as training data for the GNN models.
  • results: The GNN models provide fast and accurate predictions of the quantities of interest (QoIs), including thermal power generation and load shedding at the system and individual zone level, and the GNN-based reliability and risk estimates closely match those obtained from the OPF/SCUC solutions.
    Abstract We investigate the utility of graph neural networks (GNNs) as proxies of power grid operational decision-making algorithms (optimal power flow (OPF) and security-constrained unit commitment (SCUC)) to enable rigorous quantification of the operational risk. To conduct principled risk analysis, numerous Monte Carlo (MC) samples are drawn from the (foretasted) probability distributions of spatio-temporally correlated stochastic grid variables. The corresponding OPF and SCUC solutions, which are needed to quantify the risk, are generated using traditional OPF and SCUC solvers to generate data for training GNN model(s). The GNN model performance is evaluated in terms of the accuracy of predicting quantities of interests (QoIs) derived from the decision variables in OPF and SCUC. Specifically, we focus on thermal power generation and load shedding at system and individual zone level. We also perform reliability and risk quantification based on GNN predictions and compare with that obtained from OPF/SCUC solutions. Our results demonstrate that GNNs are capable of providing fast and accurate prediction of QoIs and thus can be good surrogate models for OPF and SCUC. The excellent accuracy of GNN-based reliability and risk assessment further suggests that GNN surrogate has the potential to be applied in real-time and hours-ahead risk quantification.

Mapping “Brain Coral” Regions on Mars using Deep Learning

  • paper_url: http://arxiv.org/abs/2311.12292
  • repo_url: https://github.com/pearsonkyle/mars-brain-coral-network
  • paper_authors: Kyle A. Pearson, Eldar Noe, Daniel Zhao, Alphan Altinok, Alex Morgan
  • for: Searching for evidence of past or present life on Mars.
  • methods: Convolutional neural networks are used to detect "Brain Coral" terrain on the Martian surface, and a Fourier-domain classifier that exploits JPEG compression (blocks of discrete cosine transform coefficients) speeds up processing.
  • results: Searching roughly 28 TB of imagery (over 52,000 images, about 5% of the Martian surface) yielded detections in over 200 images, while the hybrid pipeline maintained ~93% accuracy and cut total processing time by ~95%.
    Abstract One of the main objectives of the Mars Exploration Program is to search for evidence of past or current life on the planet. To achieve this, Mars exploration has been focusing on regions that may have liquid or frozen water. A set of critical areas may have seen cycles of ice thawing in the relatively recent past in response to periodic changes in the obliquity of Mars. In this work, we use convolutional neural networks to detect surface regions containing "Brain Coral" terrain, a landform on Mars whose similarity in morphology and scale to sorted stone circles on Earth suggests that it may have formed as a consequence of freeze/thaw cycles. We use large images (~100-1000 megapixels) from the Mars Reconnaissance Orbiter to search for these landforms at resolutions close to a few tens of centimeters per pixel (~25--50 cm). Over 52,000 images (~28 TB) were searched (~5% of the Martian surface) where we found detections in over 200 images. To expedite the processing we leverage a classifier network (prior to segmentation) in the Fourier domain that can take advantage of JPEG compression by leveraging blocks of coefficients from a discrete cosine transform in lieu of decoding the entire image at the full spatial resolution. The hybrid pipeline approach maintains ~93% accuracy while cutting down on ~95% of the total processing time compared to running the segmentation network at the full resolution on every image. The timely processing of big data sets helps inform mission operations, geologic surveys to prioritize candidate landing sites, avoid hazardous areas, or map the spatial extent of certain terrain. The segmentation masks and source code are available on Github for the community to explore and build upon.
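
As a toy illustration of classifying from DCT coefficients rather than fully decoded pixels, the sketch below splits a grayscale tile into 8x8 blocks (the JPEG block size), keeps a few low-frequency DCT coefficients per block, and feeds them to a simple classifier. The paper's network, data handling, and direct use of the JPEG bitstream are not reproduced; the synthetic tiles are stand-ins.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.linear_model import LogisticRegression

def block_dct_features(image, block=8, n_coeffs=10):
    """Split a grayscale image into 8x8 blocks and keep the first few DCT coefficients
    of each block (row-major order). This mimics working from DCT coefficients instead
    of decoding full-resolution pixels; reading coefficients straight from the JPEG
    bitstream, as in the paper's pipeline, is not shown here."""
    h, w = image.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            coeffs = dctn(image[i:i + block, j:j + block], norm="ortho")
            feats.append(coeffs.ravel()[:n_coeffs])
    return np.array(feats)

# Toy usage: featurize synthetic tiles and fit a per-block classifier.
rng = np.random.default_rng(0)
tile_pos = rng.normal(size=(64, 64)) + np.sin(np.arange(64) / 3.0)   # textured stand-in for "brain coral"
tile_neg = rng.normal(size=(64, 64))                                  # smoother stand-in for background
X = np.vstack([block_dct_features(tile_pos), block_dct_features(tile_neg)])
y = np.array([1] * 64 + [0] * 64)                                     # 64 blocks per 64x64 tile
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y))
```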

A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series

  • paper_url: http://arxiv.org/abs/2311.12290
  • repo_url: None
  • paper_authors: Trang H. Tran, Lam M. Nguyen, Kyongmin Yeo, Nam Nguyen, Roman Vaculin
  • for: Extending the success of foundation models to time series, where the main challenge is effectively extracting representations and transferring knowledge from pretraining datasets to the target fine-tuning dataset.
  • methods: A pretraining procedure based on supervised contrastive learning distinguishes features within each pretraining dataset and yields a probabilistic similarity metric; guided by this metric, a fine-tuning procedure aligns the target data with the learned dynamics of the pretraining datasets.
  • results: Experiments show the approach improves prediction accuracy on the target data.
    Abstract Foundation models have recently gained attention within the field of machine learning thanks to its efficiency in broad data processing. While researchers had attempted to extend this success to time series models, the main challenge is effectively extracting representations and transferring knowledge from pretraining datasets to the target finetuning dataset. To tackle this issue, we introduce a novel pretraining procedure that leverages supervised contrastive learning to distinguish features within each pretraining dataset. This pretraining phase enables a probabilistic similarity metric, which assesses the likelihood of a univariate sample being closely related to one of the pretraining datasets. Subsequently, using this similarity metric as a guide, we propose a fine-tuning procedure designed to enhance the accurate prediction of the target data by aligning it more closely with the learned dynamics of the pretraining datasets. Our experiments have shown promising results which demonstrate the efficacy of our approach.
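
A sketch of the pretraining loss is given below using the standard supervised contrastive (SupCon) formulation, with each time-series window labeled by the pretraining dataset it comes from (our reading of the abstract, not necessarily the authors' exact setup); the encoder and data are placeholders.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive (SupCon) loss: windows sharing a label are pulled
    together, others pushed apart. Here the label is the pretraining dataset of origin."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.T / temperature                                    # pairwise similarities
    n = z.shape[0]
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask   # same-dataset pairs
    sim = sim.masked_fill(self_mask, float("-inf"))                # exclude the anchor itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)
    return loss.mean()

# Toy usage: an encoder embedding univariate windows from three pretraining datasets.
torch.manual_seed(0)
encoder = torch.nn.Sequential(torch.nn.Linear(50, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32))
windows = torch.randn(24, 50)                        # 24 windows of length 50
dataset_ids = torch.randint(0, 3, (24,))             # which pretraining dataset each window came from
loss = supervised_contrastive_loss(encoder(windows), dataset_ids)
loss.backward()
print(float(loss))
```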

Orthogonally weighted $\ell_{2,1}$ regularization for rank-aware joint sparse recovery: algorithm and analysis

  • paper_url: http://arxiv.org/abs/2311.12282
  • repo_url: https://github.com/a-petr/owl
  • paper_authors: Armenak Petrosyan, Konstantin Pieper, Hoang Tran
  • for: Solving the joint sparse recovery problem.
  • methods: A new regularization-based method, the orthogonally weighted $\ell_{2,1}$ ($\mathit{ow}\ell_{2,1}$) penalty, is used; it is specifically designed to take the rank of the solution matrix into account.
  • results: An efficient algorithm is proposed, together with proofs and numerical experiments demonstrating its effectiveness.
    Abstract We propose and analyze an efficient algorithm for solving the joint sparse recovery problem using a new regularization-based method, named orthogonally weighted $\ell_{2,1}$ ($\mathit{ow}\ell_{2,1}$), which is specifically designed to take into account the rank of the solution matrix. This method has applications in feature extraction, matrix column selection, and dictionary learning, and it is distinct from commonly used $\ell_{2,1}$ regularization and other existing regularization-based approaches because it can exploit the full rank of the row-sparse solution matrix, a key feature in many applications. We provide a proof of the method's rank-awareness, establish the existence of solutions to the proposed optimization problem, and develop an efficient algorithm for solving it, whose convergence is analyzed. We also present numerical experiments to illustrate the theory and demonstrate the effectiveness of our method on real-life problems.
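
For context, the widely used baseline that the $\mathit{ow}\ell_{2,1}$ approach departs from is the $\ell_{2,1}$-regularized joint sparse recovery problem,

$$
\min_{X \in \mathbb{R}^{N \times K}} \; \tfrac{1}{2}\,\| A X - B \|_F^2 \;+\; \lambda \sum_{i=1}^{N} \| X_{i,:} \|_2 ,
$$

where $A$ is the measurement matrix, $B$ collects the joint measurements, and the penalty $\sum_i \|X_{i,:}\|_2 = \|X\|_{2,1}$ promotes a common row support across the $K$ channels. The orthogonally weighted variant modifies this row-norm penalty so that it is sensitive to the rank of $X$; its precise form and the associated algorithm are given in the paper.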

Beyond Simulated Drivers: Evaluating the Impact of Real-World Car-Following in Mixed Traffic Control

  • paper_url: http://arxiv.org/abs/2311.12261
  • repo_url: None
  • paper_authors: Bibek Poudel, Weizi Li
  • for: Studying how robot vehicles (RVs) can mitigate the congestion amplified by human-driven vehicles in mixed traffic, improving safety, efficiency, and stability.
  • methods: Real-world human driving trajectories are analyzed to extract a wide range of car-following acceleration behaviors, which are incorporated into simulations where RVs from prior studies mitigate congestion; a reinforcement-learning-based RV is also introduced that uses a congestion stage classifier neural network to optimize either "safety+stability" or "efficiency" in the presence of these diverse human driving behaviors.
  • results: The proposed RVs are evaluated in two mixed traffic control environments at various densities, configurations, and penetration rates and compared with existing RVs.
    Abstract Human-driven vehicles can amplify naturally occurring perturbations in traffic, leading to congestion and consequently increased fuel consumption, higher collision risks, and reduced capacity utilization. While previous research has highlighted that a fraction of Robot Vehicles (RVs) can mitigate these issues, they often rely on simulations with simplistic, model-based Human-driven Vehicles (HVs) during car-following scenarios. Diverging from this trend, in this study, we analyze real-world human driving trajectories, extracting a wide range of acceleration behaviors during car-following. We then incorporate these behaviors in simulation where RVs from prior studies are employed to mitigate congestion, and evaluate their safety, efficiency, and stability. Further, we also introduce a reinforcement learning based RV that utilizes a congestion stage classifier neural network to optimize either "safety+stability" or "efficiency" in the presence of the diverse human driving behaviors. We evaluate the proposed RVs in two different mixed traffic control environments at various densities, configurations, and penetration rates and compare with the existing RVs.

  • paper_url: http://arxiv.org/abs/2311.12255
  • repo_url: https://github.com/silencex12138/time-granularity-on-temporal-graphs
  • paper_authors: Xiangjian Jiang, Yanyi Pu
  • for: Investigating how temporal information, in particular the choice of time granularity, affects the performance and robustness of dynamic graph neural networks (DGNNs) on prediction tasks over dynamic graphs.
  • methods: Dynamic graphs from various domains and three different DGNNs are compared against a baseline model across four time granularities, examining the interplay between time granularity, model architecture, and negative sampling strategy.
  • results: A sophisticated memory mechanism and a proper time granularity are found to be crucial for a DGNN to deliver competitive and robust performance on dynamic link prediction; the paper also discusses limitations of the considered models and datasets and proposes promising directions for future research on the time granularity of temporal graphs.
    Abstract Dynamic Graph Neural Networks (DGNNs) have emerged as the predominant approach for processing dynamic graph-structured data. However, the influence of temporal information on model performance and robustness remains insufficiently explored, particularly regarding how models address prediction tasks with different time granularities. In this paper, we explore the impact of time granularity when training DGNNs on dynamic graphs through extensive experiments. We examine graphs derived from various domains and compare three different DGNNs to the baseline model across four varied time granularities. We mainly consider the interplay between time granularities, model architectures, and negative sampling strategies to obtain general conclusions. Our results reveal that a sophisticated memory mechanism and proper time granularity are crucial for a DGNN to deliver competitive and robust performance in the dynamic link prediction task. We also discuss drawbacks in considered models and datasets and propose promising directions for future research on the time granularity of temporal graphs.

The limitation of neural nets for approximation and optimization

  • paper_url: http://arxiv.org/abs/2311.12253
  • repo_url: https://github.com/sohaboumaima/basesnnapproxforopt
  • paper_authors: Tommaso Giovannelli, Oumaima Sohab, Luis Nunes Vicente
  • for: Assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems.
  • methods: The study first determines the best activation function for approximating the objective functions of popular nonlinear optimization test problems, with the evidence pointing to SiLU; it then analyzes the accuracy of function value, gradient, and Hessian approximations obtained from interpolation/regression models and from neural networks.
  • results: Neural networks deliver competitive zero- and first-order approximations (at a high training cost) but underperform on second-order approximation; combining a neural network activation function with the natural basis for quadratic interpolation/regression removes the need for cross terms, yielding models with fewer parameters; finally, the performance of a state-of-the-art derivative-free optimization algorithm can hardly be improved when the gradient is approximated using any of the considered surrogate models, including neural networks.
    Abstract We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems. While neural networks are widely used for machine learning tasks such as classification and regression, their application in solving optimization problems has been limited. Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems, and the evidence provided shows that~SiLU has the best performance. We then analyze the accuracy of function value, gradient, and Hessian approximations for such objective functions obtained through interpolation/regression models and neural networks. When compared to interpolation/regression models, neural networks can deliver competitive zero- and first-order approximations (at a high training cost) but underperform on second-order approximation. However, it is shown that combining a neural net activation function with the natural basis for quadratic interpolation/regression can waive the necessity of including cross terms in the natural basis, leading to models with fewer parameters to determine. Lastly, we provide evidence that the performance of a state-of-the-art derivative-free optimization algorithm can hardly be improved when the gradient of an objective function is approximated using any of the surrogate models considered, including neural networks.
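
For reference, the activation function the study identifies as best for this purpose is SiLU (the sigmoid-weighted linear unit),

$$
\mathrm{SiLU}(x) \;=\; x\,\sigma(x) \;=\; \frac{x}{1 + e^{-x}},
$$

which is smooth, so the surrogate's gradients and Hessians (the first- and second-order approximations studied above) are well defined everywhere.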