cs.LG - 2023-12-03

Recurrent Distance-Encoding Neural Networks for Graph Representation Learning

paper_url: http://arxiv.org/abs/2312.01538
repo_url: https://github.com/skeletondyh/GRED
paper_authors: Yuhui Ding, Antonio Orvieto, Bobby He, Thomas Hofmann
for: 本研究旨在提出一种新的图 neural network 架构，以解决图Transformers 的计算复杂性和 inductive bias 问题。
methods: 本研究使用了深度状态空间模型的突破性，对目标节点进行最短距离 aggregation，并使用并行化的线性循环网络对距离链进行自然编码。无需 позицион编码，我们的模型在多个 benchmark 上实现了与现状册方法竞争的性能，并且理论上更加表达能力。
results: 我们的模型在多个 benchmark 上实现了与现状册方法竞争的性能，并且理论上更加表达能力。与之前的一 hop message passing нейрон网络相比，我们的模型具有更高的计算效率和表达能力。

Abstract
Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but suffer from high computational complexity and have to rely on ad-hoc positional encoding to bake in the graph inductive bias. In this paper, we propose a new architecture to reconcile these challenges. Our approach stems from the recent breakthroughs in long-range modeling provided by deep state-space models on sequential data: for a given target node, our model aggregates other nodes by their shortest distances to the target and uses a parallelizable linear recurrent network over the chain of distances to provide a natural encoding of its neighborhood structure. With no need for positional encoding, we empirically show that the performance of our model is highly competitive compared with that of state-of-the-art graph transformers on various benchmarks, at a drastically reduced computational complexity. In addition, we show that our model is theoretically more expressive than one-hop message passing neural networks.

摘要
基于迭代一遍消息传递的图神经网络在距离节点很远时很难以有效地利用信息。相反，图转换器允许每个节点直接关注所有其他节点，但是它们具有高计算复杂性，并且需要采用特定的位置编码来带入图逻辑偏好。在这篇论文中，我们提出了一种新的建议，它基于序列数据中的深度状态空间模型的最近突破。对于给定的目标节点，我们的模型将其他节点通过它们与目标节点的最短距离来聚合，并使用并行化的线性循环网络来提供一种自然的编码方式。无需位置编码，我们实验表明，我们的模型与当前领导的图转换器在各种测试集上的性能非常竞争，并且计算复杂性减少了极多。此外，我们也证明了我们的模型在一遍消息传递神经网络中是更加表达力强的。

Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings

paper_url: http://arxiv.org/abs/2312.01530
repo_url: None
paper_authors: Henrik von Kleist, Alireza Zamanian, Ilya Shpitser, Narges Ahmidi
for: 这篇论文的目的是解决人工智能代理人在医疗领域中选择功能特征的问题，以便在投入资源方面做出合理的决策。
methods: 这篇论文使用了活动特征获取（AFA）代理人，并研究了在不同假设下（即无直接效应假设和无不观察混合假设）下的性能评估方法。
results: 研究人员提出了三种新的估计器：直接方法（DM）、反比权重估计器（IPW）和双强化学习估计器（DRL），这些估计器在不同假设下具有更好的数据效率。

Abstract
Machine learning methods often assume input features are available at no cost. However, in domains like healthcare, where acquiring features could be expensive or harmful, it is necessary to balance a feature's acquisition cost against its predictive value. The task of training an AI agent to decide which features to acquire is called active feature acquisition (AFA). By deploying an AFA agent, we effectively alter the acquisition strategy and trigger a distribution shift. To safely deploy AFA agents under this distribution shift, we present the problem of active feature acquisition performance evaluation (AFAPE). We examine AFAPE under i) a no direct effect (NDE) assumption, stating that acquisitions don't affect the underlying feature values; and ii) a no unobserved confounding (NUC) assumption, stating that retrospective feature acquisition decisions were only based on observed features. We show that one can apply offline reinforcement learning under the NUC assumption and missing data methods under the NDE assumption. When NUC and NDE hold, we propose a novel semi-offline reinforcement learning framework, which requires a weaker positivity assumption and yields more data-efficient estimators. We introduce three novel estimators: a direct method (DM), an inverse probability weighting (IPW), and a double reinforcement learning (DRL) estimator.

摘要

无直接效应（NDE）假设：策略不会影响下游特征值。2. 无隐藏干扰（NUC）假设：决策只基于观察特征。我们表明，可以在NUC假设下应用线上强化学习，并在NDE假设下使用失去数据方法。当NUC和NDE假设成立时，我们提出了一种新的半线上强化学习框架，需要较弱的正确性假设，并可以更高效地处理数据。我们 introduce three novel estimators：1. 直接方法（DM）。2. 反比权重（IPW）。3. 双强化学习（DRL）计算器。

Learn2Extend: Extending sequences by retaining their statistical properties with mixture models

paper_url: http://arxiv.org/abs/2312.01507
repo_url: None
paper_authors: Dimitris Vartziotis, George Dasoulas, Florian Pausinger
for: 本研究旨在推广通常有限长的实数序列，保持其内在的统计性质，通过机器学习技术。
methods: 本研究使用了自适应扩展混合模型（SEMM），直接估计条件概率 Distribution，而不是激发函数。
results: 对多种点过程进行比较实验，包括波尔兹曼、本地吸引和本地抗拒绝序列，并在里曼ζ函数零点预测中进行了实践研究。结果表明，提案的混合模型在保持特定统计性质的同时，对序列扩展表现出了超越传统神经网络架构的能力。

Abstract
This paper addresses the challenge of extending general finite sequences of real numbers within a subinterval of the real line, maintaining their inherent statistical properties by employing machine learning. Our focus lies on preserving the gap distribution and pair correlation function of these point sets. Leveraging advancements in deep learning applied to point processes, this paper explores the use of an auto-regressive \textit{Sequence Extension Mixture Model} (SEMM) for extending finite sequences, by estimating directly the conditional density, instead of the intensity function. We perform comparative experiments on multiple types of point processes, including Poisson, locally attractive, and locally repelling sequences, and we perform a case study on the prediction of Riemann $\zeta$ function zeroes. The results indicate that the proposed mixture model outperforms traditional neural network architectures in sequence extension with the retention of statistical properties. Given this motivation, we showcase the capabilities of a mixture model to extend sequences, maintaining specific statistical properties, i.e. the gap distribution, and pair correlation indicators.

摘要

The mechanistic basis of data dependence and abrupt learning in an in-context classification task

paper_url: http://arxiv.org/abs/2312.03002
repo_url: None
paper_authors: Gautam Reddy
for: 本研究探讨了Transformer模型在语言处理中的增Context学习能力，即基于输入序列中的示例例子来预测新的查询结果。
methods: 本研究使用了一个简化的注意力网络，并通过证明了特定的语言分布特性（如快速变化、大 dictionaries和不均匀排名分布）控制了Context vs weights学习的贸易或同时出现。
results: 研究发现，在一个简化的注意力网络中，Context学习是通过突然出现的概念头quarters而实现的，并与 weights学习进行竞争。通过采用进步度量和targeted实验，研究构建了一个两参数模型，可以模拟全数据分布相关性 displayed by attention-based network。

Abstract
Transformer models exhibit in-context learning: the ability to accurately predict the response to a novel query based on illustrative examples in the input sequence. In-context learning contrasts with traditional in-weights learning of query-output relationships. What aspects of the training data distribution and architecture favor in-context vs in-weights learning? Recent work has shown that specific distributional properties inherent in language, such as burstiness, large dictionaries and skewed rank-frequency distributions, control the trade-off or simultaneous appearance of these two forms of learning. We first show that these results are recapitulated in a minimal attention-only network trained on a simplified dataset. In-context learning (ICL) is driven by the abrupt emergence of an induction head, which subsequently competes with in-weights learning. By identifying progress measures that precede in-context learning and targeted experiments, we construct a two-parameter model of an induction head which emulates the full data distributional dependencies displayed by the attention-based network. A phenomenological model of induction head formation traces its abrupt emergence to the sequential learning of three nested logits enabled by an intrinsic curriculum. We propose that the sharp transitions in attention-based networks arise due to a specific chain of multi-layer operations necessary to achieve ICL, which is implemented by nested nonlinearities sequentially learned during training.

摘要
启发器模型展示在上下文学习能力：可以准确预测基于输入序列中的示例例子的响应。在上下文学习与传统的查询输出关系学习之间的区别。哪些特征在训练数据分布和架构中帮助启发器学习？最近的研究表明，语言中的特定分布特性，如快速峰值、大字典和偏斜分布，控制了这两种学习形式之间的贸易或同时出现。我们首先表明，这些结果在一个简化的注意力网络上进行训练后被复现。启发器学习（ICL）是由突然出现的一个引导头quarters，然后与查询输出学习竞争。我们通过识别进度度量和Targeted Experiments，构建了一个两个参数的引导头模型，该模型模拟了全数据分布上的所有关系依赖项。一种现象学模型跟踪了引导头的突然出现，它是在训练中逐层学习三个嵌入的假设的结果。我们建议，在注意力网络中的突然变化 arise from a specific chain of multi-layer operations necessary to achieve ICL，这是通过内置的课程学习进行嵌入非线性sequentially during training。

Normed Spaces for Graph Embedding

paper_url: http://arxiv.org/abs/2312.01502
repo_url: https://github.com/andyweizhao/graphs-normed-spaces
paper_authors: Diaaeldin Taha, Wei Zhao, J. Maxwell Riestenberg, Michael Strube
for: 这篇论文旨在探讨 normed space 是否可以作为 Riemannian manifold 的替代方案，以实现更高效和更 flexible 的图像学习。
methods: 这篇论文使用 theoretical results from discrete geometry 和 normed space 的 abstraction 来提出一种新的图像学习方法，并对比了这种方法与传统的 Riemannian manifold 方法。
results: 实验结果表明，normed space embeddings 可以在各种 synthetic 和实际世界的图像 reconstruction 任务上表现出色，并且需要更少的计算资源。此外， authors 还验证了 normed space embeddings 在不同类型的图像中的表现，并证明了它们在图像大小增加时的表现。最后， authors 还应用了 normed space embeddings 在两个实际任务中，namely link prediction 和 recommender systems。

Abstract
Theoretical results from discrete geometry suggest that normed spaces can abstractly embed finite metric spaces with surprisingly low theoretical bounds on distortion in low dimensions. In this paper, inspired by this theoretical insight, we highlight normed spaces as a more flexible and computationally efficient alternative to several popular Riemannian manifolds for learning graph embeddings. Normed space embeddings significantly outperform several popular manifolds on a large range of synthetic and real-world graph reconstruction benchmark datasets while requiring significantly fewer computational resources. We also empirically verify the superiority of normed space embeddings on growing families of graphs associated with negative, zero, and positive curvature, further reinforcing the flexibility of normed spaces in capturing diverse graph structures as graph sizes increase. Lastly, we demonstrate the utility of normed space embeddings on two applied graph embedding tasks, namely, link prediction and recommender systems. Our work highlights the potential of normed spaces for geometric graph representation learning, raises new research questions, and offers a valuable tool for experimental mathematics in the field of finite metric space embeddings. We make our code and data publically available.

摘要
理论结果表明，归一化空间可以抽象地嵌入有限度度的 метри空间，并且这些嵌入的扭曲约束在低维度下是有较低的理论上的下界。本文根据这一理论启示，提出了使用归一化空间来取代一些受欢迎的里曼纹理 manifold 来学习图像嵌入。归一化空间嵌入在许多 sintetic 和实际世界的图像重建 benchmark 数据集上表现出色，而且需要 significanly fewer 计算资源。我们还通过对增长的 familiy OF 图像进行 empirical 验证，证明了归一化空间嵌入的优越性，并且在不同的 curvature 下也能够 Capture 多样化的图像结构。最后，我们示例了归一化空间嵌入在两个应用任务上，namely, 链接预测和推荐系统。我们的工作高光了归一化空间在finite metric space嵌入中的潜力，提出了新的研究问题，并提供了实际数学中的一 valuable 工具。我们将我们的代码和数据公开发布。

OpenVoice: Versatile Instant Voice Cloning

paper_url: http://arxiv.org/abs/2312.01479
repo_url: https://github.com/myshell-ai/openvoice
paper_authors: Zengyi Qin, Wenliang Zhao, Xumin Yu, Xin Sun
for: 这篇论文的目的是提出一种新的语音模仿方法，可以使用短 audio clip 来模仿引用 speaker 的语音。
methods: 这种方法使用了一种新的语音模仿技术，可以在不同的语言中进行语音模仿，并且可以控制语音的情感、口音、节奏、停顿和声调。
results: 该方法可以在不同的语言中进行语音模仿，并且可以实现零批量跨语言语音模仿。此外，该方法的计算成本比较低，只需要一些时间和计算资源。

Abstract
We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are not directly copied from and constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require extensive massive-speaker multi-lingual (MSML) dataset for all languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that offer even inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results in our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell.ai.

摘要
我们介绍OpenVoice，一种多元化的声音模仿方法，仅需从参照说话者的短时间声音抓取来模仿他们的声音和生成多种语言的话语。OpenVoice解决了以下开放挑战：1）灵活的声音风格控制。OpenVoice允许精确地控制声音风格，包括情感、口音、节奏、停顿和声调，而不是直接从参照说话者的风格 копи写。 previous approaches lacked the ability to flexibly manipulate voice styles after cloning。2）零次跨语言声音模仿。OpenVoice实现了零次跨语言声音模仿，即无需大量多语言听力训练数据（MSML）来模仿新语言的声音。OpenVoice也是计算效率高，成本只有商业可用的API的一半。为促进更多的研究，我们将源代码和训练模型公开 accessible。我们还提供了质量结果在我们的 demo website。在2023年5月至10月之前，我们内部版本的OpenVoice已经用户数量为 tens of millions，作为MyShell.ai的后端。

Regularity as Intrinsic Reward for Free Play

paper_url: http://arxiv.org/abs/2312.01473
repo_url: https://github.com/martius-lab/rair-mbrl
paper_authors: Cansu Sancaktar, Justus Piater, Georg Martius
for: 本研究提出了一种新的奖励信号，即规律性，用于自适应学习中的内在动机。
methods: 我们基于儿童发展的想法，认为寻求结构和秩序可以导引探索到一个不受普通的不确定性基于内在奖励的任务空间。我们扩展了这个思想，并在模型基本学习中实现了它。
results: 在一个synthetic环境中，我们展示了追求规律性可以产生各种结构化模式。在多对象Robot抓取环境中，我们也证明了我们的方法可以提高零下任务性能。在自由环境中，我们将RaIR作为内在奖励，并与模型的认知不确定性相结合，从而在建筑塔等常见结构中自动搭建了各种结构。

Abstract
We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.

摘要
我们提议使用常规性作为激励学习中的新的奖励信号。受儿development的 inspirations，我们认为渴望结构和秩序可以导引探索到一个由欠知的uncertainty-based intrinsic reward不受欢迎的任务子空间。我们的通用形式的常规性作为内在奖励（RaIR）让我们在模型基 Reinforcement learning中实现了它。在一个 sintetic环境中，我们展示了追求常规性的多种结构化模式的能力。我们还在多个目标 robotic manipulation环境中证明了我们的方法的强大性。我们将 RaIR incorporated into free play，并使其与模型的epistemic uncertainty作为内在奖励。这样做，我们在自由玩家中见到了无需任务指导的自主建筑塔和其他常规结构，这导致了零上下游任务性能的明显提高。

Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits

paper_url: http://arxiv.org/abs/2312.01457
repo_url: https://github.com/faaizt/mr-ope
paper_authors: Muhammad Faaiz Taufiq, Arnaud Doucet, Rob Cornish, Jean-Francois Ton
for: 评估新策略使用现有数据而无需高成本实验。
methods: 引入一种新的评估方法——Marginal Ratio（MR）计算器，关注在结果分布的变化中。
results: 相比传统方法 like Inverse Probability Weighting（IPW）和 Doubly Robust（DR）计算器，MR计算器具有更低的准确性。在 causal inference 中，MR 计算器可以更好地估计干扰效应。我们的实验结果表明，MR 计算器在 contextual bandits 中的应用有所提升。

Abstract
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new policies using existing data without costly experimentation. However, current OPE methods, such as Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators, suffer from high variance, particularly in cases of low overlap between target and behavior policies or large action and context spaces. In this paper, we introduce a new OPE estimator for contextual bandits, the Marginal Ratio (MR) estimator, which focuses on the shift in the marginal distribution of outcomes $Y$ instead of the policies themselves. Through rigorous theoretical analysis, we demonstrate the benefits of the MR estimator compared to conventional methods like IPW and DR in terms of variance reduction. Additionally, we establish a connection between the MR estimator and the state-of-the-art Marginalized Inverse Propensity Score (MIPS) estimator, proving that MR achieves lower variance among a generalized family of MIPS estimators. We further illustrate the utility of the MR estimator in causal inference settings, where it exhibits enhanced performance in estimating Average Treatment Effects (ATE). Our experiments on synthetic and real-world datasets corroborate our theoretical findings and highlight the practical advantages of the MR estimator in OPE for contextual bandits.

摘要
Contextual bandits 的 Off-Policy Evaluation (OPE) 是非常重要，可以使用现有数据来评估新政策，而不需要昂贵的实验。然而，现有的 OPE 方法，如反杂 probabilities Weighting (IPW) 和 Doubly Robust (DR) 估计器，在 target 和 behavior 政策之间的覆盖率低或 action 和 context 空间较大时，具有高弹性。在这篇论文中，我们介绍了一种新的 OPE 估计器，即 Margin Ratio (MR) 估计器，它关注在结果 Y 的 marginal 分布上而不是政策本身。通过严谨的理论分析，我们证明 MR 估计器与传统的 IPW 和 DR 估计器相比，可以减少弹性。此外，我们证明 MR 估计器与 state-of-the-art Marginalized Inverse Propensity Score (MIPS) 估计器之间存在连接，MR 估计器在一个扩展的 MIPS 家族中具有更低的弹性。我们还证明 MR 估计器在 causal inference 设置中可以更好地估计 Average Treatment Effects (ATE)。我们在 synthetic 和实际数据上进行了实验，并证明了 MR 估计器在 OPE 中的实际优势。

Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees

paper_url: http://arxiv.org/abs/2312.01456
repo_url: https://github.com/mlech26l/neural_martingales
paper_authors: Đorđe Žikelić, Mathias Lechner, Abhinav Verma, Krishnendu Chatterjee, Thomas A. Henzinger
for: 学习 neural network 策略的复杂控制任务
methods: 使用 SpectRL 的逻辑特征来学习图状的 probabilistic reach-avoid 特征，并与 reach-avoid supermartingales (RASM) 一起学习 global 策略
results: 提供了一种 formal certificate， guaranteeing that the policy’s behavior satisfies the desired probability level, and also derived a tighter lower bound on the probability of reach-avoidance compared to previous work.

Abstract
Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.

摘要
<> translate "Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment."into Simplified Chinese.下面是文本的简化中文翻译：Reinforcement learning已经在复杂控制任务上显示出了承诺的结果，但缺乏正式的保证性 guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph's sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.

Classification of Home Network Problems with Transformers

paper_url: http://arxiv.org/abs/2312.01445
repo_url: None
paper_authors: Jeremias Dötterl, Zahra Hemmati Fard
for: 本研究旨在开发一种可以识别家庭网络问题的分类模型，以便帮助用户快速地检测和解决问题。
methods: 本研究使用了深度学习的transformer架构，特别是我们提议的预处理器来将工具输出分割成Token序列。
results: 我们的模型在实验中达到了高准确率，证明了使用transformer进行网络问题分类的高潜力。

Abstract
We propose a classifier that can identify ten common home network problems based on the raw textual output of networking tools such as ping, dig, and ip. Our deep learning model uses an encoder-only transformer architecture with a particular pre-tokenizer that we propose for splitting the tool output into token sequences. The use of transformers distinguishes our approach from related work on network problem classification, which still primarily relies on non-deep-learning methods. Our model achieves high accuracy in our experiments, demonstrating the high potential of transformer-based problem classification for the home network.

摘要
我们提出一种分类器，可以基于网络工具的原始文本输出（如 ping、dig 和 ip）识别家庭网络上的十种常见问题。我们的深度学习模型采用一个Encoder-only transformer架构，并使用我们提议的特定预处理器将工具输出切分成Token序列。与相关的工作相比，我们的方法使用transformer对网络问题分类进行了划算，并且在我们的实验中，模型实现了高准确率，这表明了transformer-based问题分类在家庭网络中的高潜力。

Fast Dual Subgradient Optimization of the Integrated Transportation Distance Between Stochastic Kernels

paper_url: http://arxiv.org/abs/2312.01432
repo_url: None
paper_authors: Zhengqi Lin, Andrzej Ruszczynski
for: 本研究旨在开发一种基于温水态metric的约化技术，用于简化泊尔马尔可夫系统中的分布kernel的计算。
methods: 本研究使用了一种泊尔马尔可夫系统中的约化技术，基于温水态metric的定义，以及一种特殊的双重算法来构建约化kernel。
results: 本研究的实验结果表明，使用本方法可以快速和高效地简化泊尔马尔可夫系统中的分布kernel，而无需进行计算ем实杂的矩阵操作。此外，本方法还可以在实际应用中实现高效的分布kernel的替换和合理的拟合。

Abstract
A generalization of the Wasserstein metric, the integrated transportation distance, establishes a novel distance between probability kernels of Markov systems. This metric serves as the foundation for an efficient approximation technique, enabling the replacement of the original system's kernel with a kernel with a discrete support of limited cardinality. To facilitate practical implementation, we present a specialized dual algorithm capable of constructing these approximate kernels quickly and efficiently, without requiring computationally expensive matrix operations. Finally, we demonstrate the efficacy of our method through several illustrative examples, showcasing its utility in practical scenarios. This advancement offers new possibilities for the streamlined analysis and manipulation of stochastic systems represented by kernels.

摘要
一种泛化 Wasserstein 距离 metric，综合运输距离，可以将普洛布渠系统中的概率密度函数之间建立一种新的距离。这个距离serve as the foundation for an efficient approximation technique，可以将原始系统的kernel replaced with a kernel having a discrete support of limited cardinality。为了实现实际应用，我们提出了一种特殊的 dual 算法，可以快速和高效地构造这些精简化kernel，不需要计算成本高的矩阵操作。最后，我们通过一些示例证明了我们的方法的有用性，展示了它在实际场景中的应用前景。这一进步提供了新的可能性 для静止分析和 manipulate 概率系统的kernel。Note: "泛化 Wasserstein 距离" is a simplified Chinese term that is commonly used to refer to the integrated transportation distance.

Neural Network Characterization and Entropy Regulated Data Balancing through Principal Component Analysis

paper_url: http://arxiv.org/abs/2312.01392
repo_url: None
paper_authors: David Yevick, Karolina Hutchison
for: 本研究探讨了神经网络行为和训练数据记录投影形成的分布之间的关系。
methods: 本研究使用了低维主成分分析法对训练数据进行分析，并通过对每个数据记录的相关主成分进行计算，来实现神经网络计算中的数据处理。
results: 研究发现，在一个基准计算中，旋转和未旋转的mnist数字之间的分布在低维主成分空间中可以很好地分割，并且这些分布可以通过对每个数据记录的相关主成分进行平均来实现数据均衡。

Abstract
This paper examines the relationship between the behavior of a neural network and the distribution formed from the projections of the data records into the space spanned by the low-order principal components of the training data. For example, in a benchmark calculation involving rotated and unrotated MNIST digits, classes (digits) that are mapped far from the origin in a low-dimensional principal component space and that overlap minimally with other digits converge rapidly and exhibit high degrees of accuracy in neural network calculations that employ the associated components of each data record as inputs. Further, if the space spanned by these low-order principal components is divided into bins and the input data records that are mapped into a given bin averaged, the resulting pattern can be distinguished by its geometric features which interpolate between those of adjacent bins in an analogous manner to variational autoencoders. Based on this observation, a simply realized data balancing procedure can be realized by evaluating the entropy associated with each histogram bin and subsequently repeating the original image data associated with the bin by a number of times that is determined from this entropy.

摘要
Note: The text has been translated into Simplified Chinese, which is the standard writing system used in mainland China. The translation is based on the Google Translate API, and may not be perfect in terms of grammar or nuance.

Regret Optimality of GP-UCB

paper_url: http://arxiv.org/abs/2312.01386
repo_url: None
paper_authors: Wenjia Wang, Xiaowei Zhang, Lu Zou
for: 本研究的目的是回答GP-UCB是否具有最佳 regret的问题，这个问题在bayesian优化领域是一个非常重要的开问。
methods: 本研究使用了GP-UCB方法，并通过对目标函数的几何性质进行假设，提出了新的Upper bound，以证明GP-UCB的 regret是最佳的。
results: 研究发现，当目标函数具有某些几何性质时，GP-UCB方法可以同时实现最佳的simple regret和cumulative regret，而且这些Upper bound与最佳的最小值幂相等，甚至高于。这些结果表明，GP-UCB方法在搜索最佳解时具有优化的性能。

Abstract
Gaussian Process Upper Confidence Bound (GP-UCB) is one of the most popular methods for optimizing black-box functions with noisy observations, due to its simple structure and superior performance. Its empirical successes lead to a natural, yet unresolved question: Is GP-UCB regret optimal? In this paper, we offer the first generally affirmative answer to this important open question in the Bayesian optimization literature. We establish new upper bounds on both the simple and cumulative regret of GP-UCB when the objective function to optimize admits certain smoothness property. These upper bounds match the known minimax lower bounds (up to logarithmic factors independent of the feasible region's dimensionality) for optimizing functions with the same smoothness. Intriguingly, our findings indicate that, with the same level of exploration, GP-UCB can simultaneously achieve optimality in both simple and cumulative regret. The crux of our analysis hinges on a refined uniform error bound for online estimation of functions in reproducing kernel Hilbert spaces. This error bound, which we derive from empirical process theory, is of independent interest, and its potential applications may reach beyond the scope of this study.

摘要

Relation between PLS and OLS regression in terms of the eigenvalue distribution of the regressor covariance matrix

paper_url: http://arxiv.org/abs/2312.01379
repo_url: None
paper_authors: David del Val, José R. Berrendero, Alberto Suárez
for: 本研究针对某些对数 regression 问题提出了一个新的方法，即使用几何 Statistics 的对数缩减技术（PLS）。
methods: 本研究使用 PLS 技术，包括找到最小二乘项的对数缩减项，并使用 Krylov 子空间的等价形式来分析这个问题。
results: 研究发现，随着对数缩减项的数量增加，对数缩减项的估计值将更接近于对数最小二乘项的估计值。此外，研究还发现，对数缩减项的分布会受到几何 Statistics 的缩减项的影响。

Abstract
Partial least squares (PLS) is a dimensionality reduction technique introduced in the field of chemometrics and successfully employed in many other areas. The PLS components are obtained by maximizing the covariance between linear combinations of the regressors and of the target variables. In this work, we focus on its application to scalar regression problems. PLS regression consists in finding the least squares predictor that is a linear combination of a subset of the PLS components. Alternatively, PLS regression can be formulated as a least squares problem restricted to a Krylov subspace. This equivalent formulation is employed to analyze the distance between ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}^{\scriptscriptstyle {(L)}$, the PLS estimator of the vector of coefficients of the linear regression model based on $L$ PLS components, and $\hat{\boldsymbol \beta}_{\mathrm{OLS}$, the one obtained by ordinary least squares (OLS), as a function of $L$. Specifically, ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}^{\scriptscriptstyle {(L)}$ is the vector of coefficients in the aforementioned Krylov subspace that is closest to $\hat{\boldsymbol \beta}_{\mathrm{OLS}$ in terms of the Mahalanobis distance with respect to the covariance matrix of the OLS estimate. We provide a bound on this distance that depends only on the distribution of the eigenvalues of the regressor covariance matrix. Numerical examples on synthetic and real-world data are used to illustrate how the distance between ${\hat{\boldsymbol\beta}\;}_{\mathrm{PLS}^{\scriptscriptstyle {(L)}$ and $\hat{\boldsymbol \beta}_{\mathrm{OLS}$ depends on the number of clusters in which the eigenvalues of the regressor covariance matrix are grouped.

摘要
优化方差分解（PLS）是一种维度减少技术，在化学ometrics领域引入并在其他领域成功应用。PLS组件由最大化投影变量和目标变量之间的协方差的最大化而得。在这个工作中，我们关注它在整数 regresión问题上的应用。PLS回归是找到最小二乘predictor，这是一个线性组合PLS组件的子集。或者，PLS回归可以被表示为一个具有限制的Krylov空间的最小二乘问题。这个等价形式被用来分析PLS估计Vector of coefficients of the linear regression model based on $L$ PLS components and $\hat{\boldsymbol \beta}_{\mathrm{OLS}$的距离，作为一函数of $L$。具体来说，${\hat{\boldsymbol\beta}\;}_{\text{PLS}^{(L)}$是Krylov空间中最近于$\hat{\boldsymbol \beta}_{\text{OLS}$的Vector of coefficients的线性组合，这个空间是由OLS估计的covariance matrix确定的。我们提供一个只取决于投影变量协方差矩阵的分布的下界，用于衡量这个距离。例如，在synthetic和实际数据上，我们通过numerical例子来示出PLS估计和OLS估计之间的距离，随着投影变量协方差矩阵中元素的分组数（cluster number）的变化。

Simulation-Based Inference of Surface Accumulation and Basal Melt Rates of an Antarctic Ice Shelf from Isochronal Layers

paper_url: http://arxiv.org/abs/2312.02997
repo_url: https://github.com/mackelab/preprocessing-ice-data
paper_authors: Guy Moss, Vjeran Višnjević, Olaf Eisen, Falk M. Oraschewski, Cornelius Schröder, Jakob H. Macke, Reinhard Drews
for:This paper aims to infer the surface accumulation and basal melt rates of ice shelves over decadal and centennial timescales, using radar observations and a new method called simulation-based inference (SBI).methods:The method uses a kinematic forward model of internal stratigraphy to infer the spatial dependence of surface accumulation and basal melt rates along flow line transects. The SBI method trains neural networks on simulations of the forward model to approximate the posterior distribution, allowing for quantification of uncertainties over the inferred parameters.results:The results show stable atmospheric and oceanographic conditions over the past several centuries in the catchment of Antarctica’s Ekstr"om Ice Shelf, based on the inferred surface accumulation and basal melt rates. The use of observed internal stratigraphy allows for the separation of the effects of surface accumulation and basal melt, enabling their interpretation in a historical context.

Abstract
The ice shelves buttressing the Antarctic ice sheet determine the rate of ice-discharge into the surrounding oceans. The geometry of ice shelves, and hence their buttressing strength, is determined by ice flow as well as by the local surface accumulation and basal melt rates, governed by atmospheric and oceanic conditions. Contemporary methods resolve one of these rates, but typically not both. Moreover, there is little information of how they changed in time. We present a new method to simultaneously infer the surface accumulation and basal melt rates averaged over decadal and centennial timescales. We infer the spatial dependence of these rates along flow line transects using internal stratigraphy observed by radars, using a kinematic forward model of internal stratigraphy. We solve the inverse problem using simulation-based inference (SBI). SBI performs Bayesian inference by training neural networks on simulations of the forward model to approximate the posterior distribution, allowing us to also quantify uncertainties over the inferred parameters. We demonstrate the validity of our method on a synthetic example, and apply it to Ekstr\"om Ice Shelf, Antarctica, for which newly acquired radar measurements are available. We obtain posterior distributions of surface accumulation and basal melt averaging over 42, 84, 146, and 188 years before 2022. Our results suggest stable atmospheric and oceanographic conditions over this period in this catchment of Antarctica. Use of observed internal stratigraphy can separate the effects of surface accumulation and basal melt, allowing them to be interpreted in a historical context of the last centuries and beyond.

摘要
ice shelves 支撑 Antarctic ice sheet 决定 ice 的排放速率 into 周围的 ocean。 ice shelves 的 geometry，以及其 supporting strength，是由 ice flow 以及 local surface accumulation 和 basal melt rates 决定，这些 rates 是由 atmospheric 和 oceanic conditions 控制。 contemporary methods 只能 resolve 一个这些 rates，而 typically not both。 In addition, there is little information about how they changed over time.我们 present a new method to simultaneously infer the surface accumulation 和 basal melt rates averaged over decadal 和 centennial timescales。 we infer the spatial dependence of these rates along flow line transects using internal stratigraphy observed by radars，using a kinematic forward model of internal stratigraphy。 we solve the inverse problem using simulation-based inference (SBI). SBI performs Bayesian inference by training neural networks on simulations of the forward model to approximate the posterior distribution，allowing us to also quantify uncertainties over the inferred parameters。我们在 synthetic example 中证明了方法的有效性，并将其应用到了 Ekstr\"om Ice Shelf, Antarctica，这里 newly acquired radar measurements 是可用的。 we obtain posterior distributions of surface accumulation 和 basal melt，averaging over 42, 84, 146, 和 188 years before 2022。 our results suggest stable atmospheric 和 oceanographic conditions over this period in this catchment of Antarctica。使用 observed internal stratigraphy 可以分开 surface accumulation 和 basal melt，这些 parameters 可以在 historical context 中被解释，包括 last centuries 以及更远的过去。

Graph Coordinates and Conventional Neural Networks – An Alternative for Graph Neural Networks

paper_url: http://arxiv.org/abs/2312.01342
repo_url: None
paper_authors: Zheyi Qin, Randy Paffenroth, Anura P. Jayasumana
For: The paper proposes two novel and efficient alternatives to traditional message passing graph neural networks (GNNs) for graph-based machine learning tasks.* Methods: The proposed methods, called Topology Coordinate Neural Network (TCNN) and Directional Virtual Coordinate Neural Network (DVCNN), leverage the graph’s topology directly and sidestep the computational challenges of message passing GNNs.* Results: The proposed methods achieve competitive or superior performance to message passing GNNs in terms of accuracy and ROC-AUC, with fewer trainable parameters. Specifically, TCNN requires fewer parameters than any neural network method currently listed in the Open Graph Benchmark Leaderboard for both OGBN-Proteins and OGBN-Products datasets, while achieving higher performance for a similar number of trainable parameters.

Abstract
Graph-based data present unique challenges and opportunities for machine learning. Graph Neural Networks (GNNs), and especially those algorithms that capture graph topology through message passing for neighborhood aggregation, have been a leading solution. However, these networks often require substantial computational resources and may not optimally leverage the information contained in the graph's topology, particularly for large-scale or complex graphs. We propose Topology Coordinate Neural Network (TCNN) and Directional Virtual Coordinate Neural Network (DVCNN) as novel and efficient alternatives to message passing GNNs, that directly leverage the graph's topology, sidestepping the computational challenges presented by competing algorithms. Our proposed methods can be viewed as a reprise of classic techniques for graph embedding for neural network feature engineering, but they are novel in that our embedding techniques leverage ideas in Graph Coordinates (GC) that are lacking in current practice. Experimental results, benchmarked against the Open Graph Benchmark Leaderboard, demonstrate that TCNN and DVCNN achieve competitive or superior performance to message passing GNNs. For similar levels of accuracy and ROC-AUC, TCNN and DVCNN need far fewer trainable parameters than contenders of the OGBN Leaderboard. The proposed TCNN architecture requires fewer parameters than any neural network method currently listed in the OGBN Leaderboard for both OGBN-Proteins and OGBN-Products datasets. Conversely, our methods achieve higher performance for a similar number of trainable parameters. By providing an efficient and effective alternative to message passing GNNs, our work expands the toolbox of techniques for graph-based machine learning.

摘要
GRAPH-BASED DATA 提供了各种挑战和机会，机器学习中的 GRAPH Neural Networks (GNNs) 和特别是通过消息传递来汇集邻居的 topological 信息，已成为解决方案。然而，这些网络经常需要巨量的计算资源，并且可能不优化图形的信息，尤其是大规模或复杂的图形。我们提议 Topology Coordinate Neural Network (TCNN) 和 Directional Virtual Coordinate Neural Network (DVCNN) 作为传递消息 GNNs 的新和高效的替代方案，直接利用图形的 topological 信息，强制消息传递中的计算挑战。我们的提议方法可以视为现有的图形嵌入技术的改进，但它们是新的，因为我们的嵌入技术利用 Graph Coordinates (GC) 中缺失的想法。我们的实验结果，在对 Open Graph Benchmark Leaderboard 进行比较，表明 TCNN 和 DVCNN 与消息传递 GNNs 相比，具有竞争力或更高的性能，同时需要更少的可训练参数。TCNN 架构需要更少的参数 than any neural network method currently listed in the OGBN Leaderboard для OGBN-Proteins 和 OGBN-Products 数据集。相反，我们的方法在同等级别的性能下，需要更少的参数。我们的工作扩展了图形基于机器学习的工具箱。

Robust Non-parametric Knowledge-based Diffusion Least Mean Squares over Adaptive Networks

paper_url: http://arxiv.org/abs/2312.01299
repo_url: None
paper_authors: Soheil Ashkezari-Toussi, Hadi sadoghi-Yazdi
for: 本研究提出一种基于非参数知识的扩散最小二乘算法，用于MAP估计不知道参数 вектор。
methods: 该算法使用预测分布和缓冲一些中间估计，计算每个节点的参数 vector 的先验分布和条件概率。使用 Pseudo Huber 损失函数定义似然函数。还定义了一个错误阈值函数，以降低计算开销和减弱噪声的影响。
results: 对于静止和非静止场景，results 表明提案的算法在不同噪声类型的情况下具有坚定性。

Abstract
The present study proposes incorporating non-parametric knowledge into the diffusion least-mean-squares algorithm in the framework of a maximum a posteriori (MAP) estimation. The proposed algorithm leads to a robust estimation of an unknown parameter vector in a group of cooperative estimators. Utilizing kernel density estimation and buffering some intermediate estimations, the prior distribution and conditional likelihood of the parameters vector in each node are calculated. Pseudo Huber loss function is used for designing the likelihood function. Also, an error thresholding function is defined to reduce the computational overhead as well as more relaxation against noise, which stops the update every time an error is less than a predefined threshold. The performance of the proposed algorithm is examined in the stationary and non-stationary scenarios in the presence of Gaussian and non-Gaussian noise. Results show the robustness of the proposed algorithm in the presence of different noise types.

摘要
当前研究提出了在扩散最小二乘算法中涵盖非参数知识的 incorporation 的方法，基于最大 posteriori（MAP）估计框架。该提议的算法可以实现一个 robust 的参数向量估计在多个合作估计器中。通过计算每个节点的前后分布和Conditional likelihood函数，以及使用 Pseudo Huber 损失函数和错误阈值函数，可以减少计算开销并提高对噪声的耐受性。在静止和非静止场景下，在 Gaussian 和非 Gaussian 噪声下，对提议算法的性能进行了测试，结果显示了该算法在不同噪声类型下的稳定性。Note: Please note that the translation is in Simplified Chinese, which is one of the two standard forms of Chinese writing. If you need the translation in Traditional Chinese, please let me know.

Anomaly Detection Under Uncertainty Using Distributionally Robust Optimization Approach

paper_url: http://arxiv.org/abs/2312.01296
repo_url: None
paper_authors: Amir Hossein Noormohammadia, Seyed Ali MirHassania, Farnaz Hooshmand Khaligh
for: 这篇论文是针对异常检测问题的，具体来说是使用分类方法，包括一类支持向量机（SVM）来找到数据点不符合主流模式的异常点。
methods: 本论文使用了一类支持向量机（SVM）方法，但是这种方法需要完全知道数据点的真实概率分布，而在实际问题中，这种信息通常是不可知的。因此，该论文提出了基于分布信息的分布随机风险限制模型，可以在不知道数据点的概率分布情况下进行异常检测。
results: 该论文的计算结果表明，提出的模型在不同的概率分布情况下具有良好的稳定性和高度的准确率，并且在与标准一类支持向量机（SVM）进行比较时，具有更高的评价指标。

Abstract
Anomaly detection is defined as the problem of finding data points that do not follow the patterns of the majority. Among the various proposed methods for solving this problem, classification-based methods, including one-class Support Vector Machines (SVM) are considered effective and state-of-the-art. The one-class SVM method aims to find a decision boundary to distinguish between normal data points and anomalies using only the normal data. On the other hand, most real-world problems involve some degree of uncertainty, where the true probability distribution of each data point is unknown, and estimating it is often difficult and costly. Assuming partial distribution information such as the first and second-order moments is known, a distributionally robust chance-constrained model is proposed in which the probability of misclassification is low. By utilizing a mapping function to a higher dimensional space, the proposed model will be capable of classifying origin-inseparable datasets. Also, by adopting the kernel idea, the need for explicitly knowing the mapping is eliminated, computations can be performed in the input space, and computational complexity is reduced. Computational results validate the robustness of the proposed model under different probability distributions and also the superiority of the proposed model compared to the standard one-class SVM in terms of various evaluation metrics.

摘要
“异常检测问题定义为找到数据点不符合多数据点的模式。 amongst various提出的解决方法，基于分类的方法，包括一类支持向量机（SVM）被视为有效和现代的。一类 SVM 方法的目标是在使用仅 normal 数据来定义异常点和正常点的决策边界。然而，实际问题通常带有不确定性，每个数据点的真实概率分布未知，估计它们是困难和成本的。假设有部分分布信息，如第一和第二 moments，则提出一种分布 robust chance-constrained 模型。通过使用映射函数，该模型可以处理origin-inseparable 数据集，并且通过采用内核想法，可以消除显式知道映射的需求，在输入空间进行计算，降低计算复杂度。计算结果证明提案模型在不同的概率分布下 Display 稳定性，并且与标准一类 SVM 相比，在不同评价指标上具有superiority。”

Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series

paper_url: http://arxiv.org/abs/2312.01294
repo_url: None
paper_authors: Ying Liu, Peng Cui, Wenbo Hu, Richang Hong
for: 这篇论文的目的是提出一种非生成式时间序列填写方法，可以精确地填写 missing value，同时 computational efficiency 高。
methods: 本文使用深度对称组合方法，具有内置不确定性估计的能力，并且可以降低大量的计算成本。
results: 本文在实验中证明了这种方法可以实现精确的时间序列填写，并且比Score-based diffusion方法更快速，且需要 fewer model parameters。

Abstract
Multivariate time series are everywhere. Nevertheless, real-world time series data often exhibit numerous missing values, which is the time series imputation task. Although previous deep learning methods have been shown to be effective for time series imputation, they are shown to produce overconfident imputations, which might be a potentially overlooked threat to the reliability of the intelligence system. Score-based diffusion method(i.e., CSDI) is effective for the time series imputation task but computationally expensive due to the nature of the generative diffusion model framework. In this paper, we propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty and meanwhile is computationally efficient. Specifically, we incorporate deep ensembles into quantile regression with a shared model backbone and a series of quantile discrimination functions.This framework combines the merits of accurate uncertainty estimation of deep ensembles and quantile regression and above all, the shared model backbone tremendously reduces most of the computation overhead of the multiple ensembles. We examine the performance of the proposed method on two real-world datasets: air quality and health-care datasets and conduct extensive experiments to show that our method excels at making deterministic and probabilistic predictions. Compared with the score-based diffusion method: CSDI, we can obtain comparable forecasting results and is better when more data is missing. Furthermore, as a non-generative model compared with CSDI, the proposed method consumes a much smaller computation overhead, yielding much faster training speed and fewer model parameters.

摘要
多变量时间序列在所有地方都存在，然而实际世界时间序列数据经常具有多个缺失值，这是时间序列投入任务。 although previous deep learning methods have been shown to be effective for time series imputation, they are shown to produce overconfident imputations, which might be a potentially overlooked threat to the reliability of the intelligence system. Score-based diffusion method (i.e., CSDI) is effective for the time series imputation task but computationally expensive due to the nature of the generative diffusion model framework. In this paper, we propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty and meanwhile is computationally efficient. Specifically, we incorporate deep ensembles into quantile regression with a shared model backbone and a series of quantile discrimination functions. This framework combines the merits of accurate uncertainty estimation of deep ensembles and quantile regression, and above all, the shared model backbone tremendously reduces most of the computation overhead of the multiple ensembles. We examine the performance of the proposed method on two real-world datasets: air quality and health-care datasets and conduct extensive experiments to show that our method excels at making deterministic and probabilistic predictions. Compared with the score-based diffusion method: CSDI, we can obtain comparable forecasting results and is better when more data is missing. Furthermore, as a non-generative model compared with CSDI, the proposed method consumes a much smaller computation overhead, yielding much faster training speed and fewer model parameters.

Task-Oriented Edge Networks: Decentralized Learning Over Wireless Fronthaul

paper_url: http://arxiv.org/abs/2312.01288
repo_url: None
paper_authors: Hoon Lee, Seung-Wook Kim
for: This paper is written for task-oriented edge networks where multiple edge internet-of-things nodes execute machine learning tasks with the help of powerful deep neural networks (DNNs) at a network cloud.
methods: The paper uses task-oriented encoder DNNs for compressing local observations at individual edge nodes (ENs), and develops a decentralized training algorithm for separate edge-cloud DNNs over downlink wireless fronthaul channels.
results: The paper gets versatile calculations that are independent of the number of ENs, and an efficient cloud inference model that integrates a number of shallow DNNs, inspired by the nomographic function.

Abstract
This paper studies task-oriented edge networks where multiple edge internet-of-things nodes execute machine learning tasks with the help of powerful deep neural networks (DNNs) at a network cloud. Separate edge nodes (ENs) result in a partially observable system where they can only get partitioned features of the global network states. These local observations need to be forwarded to the cloud via resource-constrained wireless fronthual links. Individual ENs compress their local observations into uplink fronthaul messages using task-oriented encoder DNNs. Then, the cloud carries out a remote inference task by leveraging received signals. Such a distributed topology requests a decentralized training and decentralized execution (DTDE) learning framework for designing edge-cloud cooperative inference rules and their decentralized training strategies. First, we develop fronthaul-cooperative DNN architecture along with proper uplink coordination protocols suitable for wireless fronthaul interconnection. Inspired by the nomographic function, an efficient cloud inference model becomes an integration of a number of shallow DNNs. This modulized architecture brings versatile calculations that are independent of the number of ENs. Next, we present a decentralized training algorithm of separate edge-cloud DNNs over downlink wireless fronthaul channels. An appropriate downlink coordination protocol is proposed, which backpropagates gradient vectors wirelessly from the cloud to the ENs.

摘要
To achieve this, the paper develops a fronthaul-cooperative DNN architecture and proper uplink coordination protocols suitable for wireless fronthaul interconnection. The cloud inference model is designed as an integration of multiple shallow DNNs, inspired by the nomographic function, which provides versatile calculations that are independent of the number of ENs.The paper also presents a decentralized training algorithm for separate edge-cloud DNNs over downlink wireless fronthaul channels. An appropriate downlink coordination protocol is proposed, which backpropagates gradient vectors wirelessly from the cloud to the ENs. This allows for efficient decentralized training of the DNNs in the edge-cloud system.

Continuous Convolutional Neural Networks for Disruption Prediction in Nuclear Fusion Plasmas

paper_url: http://arxiv.org/abs/2312.01286
repo_url: None
paper_authors: William F Arnold, Lucas Spangher, Christina Rea
for: grid decarbonization for climate change
methods: Machine Learning approaches, specifically Continuous Convolutional Neural Networks
results: significantly better performance in disruption prediction compared to previous discrete models (Area Under the Receiver Operating Characteristic Curve = 0.974 v.s. 0.799) with fewer parameters.

Abstract
Grid decarbonization for climate change requires dispatchable carbon-free energy like nuclear fusion. The tokamak concept offers a promising path for fusion, but one of the foremost challenges in implementation is the occurrence of energetic plasma disruptions. In this study, we delve into Machine Learning approaches to predict plasma state outcomes. Our contributions are twofold: (1) We present a novel application of Continuous Convolutional Neural Networks for disruption prediction and (2) We examine the advantages and disadvantages of continuous models over discrete models for disruption prediction by comparing our model with the previous, discrete state of the art, and show that continuous models offer significantly better performance (Area Under the Receiver Operating Characteristic Curve = 0.974 v.s. 0.799) with fewer parameters

摘要
格里德减 carbon化为气候变化所需的可控可 carbon-free能源，如核聚变。tokamak概念提供了核聚变可能的一条可行之路，但实施中的一个重要挑战是高能束聚变的发生。在这篇研究中，我们探讨使用机器学习方法预测束聚变结果。我们的贡献有两个方面：1. 我们提出了一种新的连续卷积神经网络 для束聚变预测，并对其与之前的离散模型进行了比较。2. 我们研究了连续模型在束聚变预测中的优劣点，并发现连续模型在预测性能方面表现明显更好（接受收 Operation Characteristic Curve 的面积 = 0.974 v.s. 0.799），但它们具有更少的参数。

Mendata: A Framework to Purify Manipulated Training Data

paper_url: http://arxiv.org/abs/2312.01281
repo_url: None
paper_authors: Zonghao Huang, Neil Gong, Michael K. Reiter
for: 防止训练数据被恶意修改，以防止模型学习出现隐藏的属性。
methods: 提出了一种名为Mendata的数据纯化框架，通过在一个小型参考集中寻找类似于参考数据的输入，并通过水星距离来衡量输入的分布相似性，从而消除恶意修改的影响。
results: 通过应用Mendata，可以击败当前数据毒素和数据追踪技术。

Abstract
Untrusted data used to train a model might have been manipulated to endow the learned model with hidden properties that the data contributor might later exploit. Data purification aims to remove such manipulations prior to training the model. We propose Mendata, a novel framework to purify manipulated training data. Starting from a small reference dataset in which a large majority of the inputs are clean, Mendata perturbs the training inputs so that they retain their utility but are distributed similarly (as measured by Wasserstein distance) to the reference data, thereby eliminating hidden properties from the learned model. A key challenge is how to find such perturbations, which we address by formulating a min-max optimization problem and developing a two-step method to iteratively solve it. We demonstrate the effectiveness of Mendata by applying it to defeat state-of-the-art data poisoning and data tracing techniques.

摘要
非信任数据可能已经被修改，以允许模型学习隐藏的属性，以便后续滥用。数据纯化目标是在训练模型之前移除这些修改。我们提出了一个名为Mendata的新框架，用于纯化训练数据。从一个小的参考集中，我们开始，其中大多数输入都是清晰的。Mendata会将训练输入扰动，以保留其用途性，但是与参考数据相似的分布（按照沃氏距离测量），从而将隐藏属性从学习的模型中除除。我们需要找到这些扰动的方法，我们解决这个挑战通过定义一个最小最大优化问题，并开发了一种两步方法来逐步解决它。我们通过应用Mendata来击败当前的数据毒 poisoning和数据跟踪技术。

A Review of Link Prediction Applications in Network Biology

paper_url: http://arxiv.org/abs/2312.01275
repo_url: None
paper_authors: Ahmad F. Al Musawi, Satyaki Roy, Preetam Ghosh
for: 这个评论概要是关于生物网络中不同类型的生物分子和生物系统之间的交互关系，以及使用链接预测方法来协助预测这些交互关系。
methods: 本文系统地批判了本地、中心性和嵌入式链接预测方法，并应用到生物网络中的静态和动态网络上。
results: 文章进行了现有生物网络数据集的实际应用和评估，并 Compares the similarity of prediction trends among models and the specific network attributes that contribute to effective link prediction。

Abstract
In the domain of network biology, the interactions among heterogeneous genomic and molecular entities are represented through networks. Link prediction (LP) methodologies are instrumental in inferring missing or prospective associations within these biological networks. In this review, we systematically dissect the attributes of local, centrality, and embedding-based LP approaches, applied to static and dynamic biological networks. We undertake an examination of the current applications of LP metrics for predicting links between diseases, genes, proteins, RNA, microbiomes, drugs, and neurons. We carry out comprehensive performance evaluations on established biological network datasets to show the practical applications of standard LP models. Moreover, we compare the similarity in prediction trends among the models and the specific network attributes that contribute to effective link prediction, before underscoring the role of LP in addressing the formidable challenges prevalent in biological systems, ranging from noise, bias, and data sparseness to interpretability. We conclude the review with an exploration of the essential characteristics expected from future LP models, poised to advance our comprehension of the intricate interactions governing biological systems.

摘要
在生物网络领域，不同类型的生物分子和分子实体之间的交互关系通过网络表示。链接预测（LP）方法ologies是推断缺失或可能的生物网络中连接的重要工具。本文系统地分析了本地、中心性和嵌入式LP方法的特点，应用于静止和动态生物网络。我们进行了现有生物网络数据集的实践应用LP指标预测疾病、基因、蛋白质、RNA、微生物、药物和神经元之间的链接。此外，我们对LP模型的预测趋势进行了全面的性能评估，并比较了不同网络特性对LP模型的影响。最后，我们强调了LP在生物系统中解决困难问题的重要性，包括噪音、偏见、数据稀缺和解释性。我们结束这篇文章，探讨未来LP模型的必要特征，以提高我们对生物系统中复杂交互的理解。Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

Multiscale Topology in Interactomic Network: From Transcriptome to Antiaddiction Drug Repurposing

paper_url: http://arxiv.org/abs/2312.01272
repo_url: None
paper_authors: Hongyan Du, Guo-Wei Wei, Tingjun Hou
for: This paper aims to identify potential drug repurposing candidates for opioid and cocaine addiction treatment by analyzing addiction-related transcriptomic data and protein-protein interaction networks.methods: The authors use differential gene expression analysis, persistent Laplacians, and machine learning models to identify key genes and predict binding affinities of drug compounds to target proteins.results: The authors identify three pivotal molecular targets, mTOR, mGluR5, and NMDAR, for drug repurposing and evaluate their drug-likeness using a machine learning model. They also demonstrate the versatility of their methods for applications across a range of diseases and transcriptomic datasets.

Abstract
The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study embarked on an innovative and rigorous strategy to unearth potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We initiated our approach by conducting differential gene expression analysis on addiction-related transcriptomic data to identify key genes. We propose a novel topological differentiation to identify key genes from a protein-protein interaction (PPI) network derived from DEGs. This method utilizes persistent Laplacians to accurately single out pivotal nodes within the network, conducting this analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis, and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5, and NMDAR, for drug repurposing from DrugBank. We crafted machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust predictive ability in gauging binding affinities of DrugBank compounds to selected targets. Furthermore, we elucidated the interactions of promising drugs with the targets and evaluated their drug-likeness. This study delineates a multi-faceted and comprehensive analytical framework, amalgamating bioinformatics, topological data analysis and machine learning, for drug repurposing in addiction treatment, setting the stage for subsequent experimental validation. The versatility of the methods we developed allows for applications across a range of diseases and transcriptomic datasets.

摘要
美国的药物成瘾危机加剧，强调了创新的治疗策略的需求。这项研究采用了一种创新的和严格的方法，从药物成瘾相关的转录组数据分析中挖掘可能的药物复用候选者，从转录组数据分析 bridges 到药物发现。我们开始我们的方法，通过对成瘾相关的转录组数据进行差异表达分析，以确定关键基因。我们提出了一种新的拓扑差异分析方法，使用持续拉普拉斯来准确地从PPI网络中提取关键节点，并在多级别进行分析，以确保高度可靠。通过严格的文献验证、路径分析和数据可用性检查，我们确定了三个关键分子标arget，即mTOR、mGluR5和NMDAR，可以从药物库中复用。我们制作了使用自然语言处理（NLP）基于的两种嵌入和传统的2D指纹，用于测量药物库中的绑定亲和力。此外，我们还描述了这些药物与目标之间的交互，并评估了它们的药理性。本研究描述了一种多方面和全面的分析框架，结合生物信息学、拓扑数据分析和机器学习，用于药物复用的投入研究，为后续实验验证设置了舞台。我们的方法的多样性允许其应用于多种疾病和转录组数据集。

Distributed Reinforcement Learning for Molecular Design: Antioxidant case

paper_url: http://arxiv.org/abs/2312.01267
repo_url: None
paper_authors: Huanyi Qin, Denis Akhiyarov, Sophie Loehle, Kenneth Chiu, Mauricio Araya-Polo
For: 这个研究旨在开发一个分布式深度强化学习算法，以便对抗氧化剂的适用。* Methods: 这个算法利用了分布式深度强化学习和化学物理属性预测器，以提高训练时间和缩小训练数据的影响。* Results: 这个研究发现了一个新的分布式算法，可以对抗氧化剂的适用，并且比前一代算法快得多，可以在 proprietary 和公共数据集上验证。

Abstract
Deep reinforcement learning has successfully been applied for molecular discovery as shown by the Molecule Deep Q-network (MolDQN) algorithm. This algorithm has challenges when applied to optimizing new molecules: training such a model is limited in terms of scalability to larger datasets and the trained model cannot be generalized to different molecules in the same dataset. In this paper, a distributed reinforcement learning algorithm for antioxidants, called DA-MolDQN is proposed to address these problems. State-of-the-art bond dissociation energy (BDE) and ionization potential (IP) predictors are integrated into DA-MolDQN, which are critical chemical properties while optimizing antioxidants. Training time is reduced by algorithmic improvements for molecular modifications. The algorithm is distributed, scalable for up to 512 molecules, and generalizes the model to a diverse set of molecules. The proposed models are trained with a proprietary antioxidant dataset. The results have been reproduced with both proprietary and public datasets. The proposed molecules have been validated with DFT simulations and a subset of them confirmed in public "unseen" datasets. In summary, DA-MolDQN is up to 100x faster than previous algorithms and can discover new optimized molecules from proprietary and public antioxidants.

摘要
深度强化学学成功应用于分子发现，如示例的分子深度Q学习算法（MolDQN）。该算法在优化新分子方面存在挑战，训练这样的模型的可扩展性受到限制，而且训练的模型无法泛化到同一个数据集中的其他分子。在这篇论文中，一种分布式强化学学算法 для抗氧化剂，称为DA-MolDQN，是提出来解决这些问题。该算法 integrate了当前顶峰的键解键能（BDE）和质子化能（IP）预测器，这些属性是优化抗氧化剂中的关键性质。通过算法改进，分子修改的训练时间得到了降低。该算法可扩展，可以处理 hasta 512个分子，并且可以泛化到多种分子。该算法使用了一个专有的抗氧化剂数据集进行训练，并且得到了公共数据集的重现结果。结果表明，DA-MolDQN比前一代算法快速多少，可以从专有和公共抗氧化剂中找到优化的新分子。

Rethinking PGD Attack: Is Sign Function Necessary?

paper_url: http://arxiv.org/abs/2312.01260
repo_url: https://github.com/junjieyang97/rgd
paper_authors: Junjie Yang, Tianlong Chen, Xuxi Chen, Zhangyang Wang, Yingbin Liang
for: This paper focuses on improving the performance of adversarial attacks on neural networks by proposing a new algorithm called Raw Gradient Descent (RGD).
methods: The RGD algorithm eliminates the use of sign functions in the update process, instead using a new hidden variable of non-clipped perturbation to move beyond the constraint.
results: The proposed RGD algorithm outperforms existing algorithms such as Projected Gradient Descent (PGD) and other competitors in various settings, without incurring any additional computational overhead.Here’s the Chinese version:
for: 本文提出了一种新的敏感攻击算法 Raw Gradient Descent (RGD)，以提高神经网络上的敏感攻击性能。
methods: RGD算法减少了使用 sign 函数的更新过程，而是使用一个新的隐藏变量 non-clipped 偏移来突破约束。
results: RGD算法在各种场景中比 PGD 和其他竞争者更高效，而且不增加计算开销。

Abstract
Neural networks have demonstrated success in various domains, yet their performance can be significantly degraded by even a small input perturbation. Consequently, the construction of such perturbations, known as adversarial attacks, has gained significant attention, many of which fall within "white-box" scenarios where we have full access to the neural network. Existing attack algorithms, such as the projected gradient descent (PGD), commonly take the sign function on the raw gradient before updating adversarial inputs, thereby neglecting gradient magnitude information. In this paper, we present a theoretical analysis of how such sign-based update algorithm influences step-wise attack performance, as well as its caveat. We also interpret why previous attempts of directly using raw gradients failed. Based on that, we further propose a new raw gradient descent (RGD) algorithm that eliminates the use of sign. Specifically, we convert the constrained optimization problem into an unconstrained one, by introducing a new hidden variable of non-clipped perturbation that can move beyond the constraint. The effectiveness of the proposed RGD algorithm has been demonstrated extensively in experiments, outperforming PGD and other competitors in various settings, without incurring any additional computational overhead. The codes is available in https://github.com/JunjieYang97/RGD.

摘要