cs.LG - 2023-09-06

Ensemble linear interpolators: The role of ensembling

  • paper_url: http://arxiv.org/abs/2309.03354
  • repo_url: None
  • paper_authors: Mingqi Wu, Qiang Sun
  • for: This paper studies how ensembling stabilizes and thus improves the generalization performance of an individual interpolator, particularly when dealing with noisy data.
  • methods: Ensembling is implemented through randomization-based methods such as bagging and the multiplier bootstrap.
  • results: The study finds that bagging effectively mitigates the interpolator's variance, improving generalization performance, and further reveals the statistical roles of sketching and bagging in both underparametrized and overparameterized regimes.
    Abstract Interpolators are unstable. For example, the mininum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
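A minimal numpy sketch of the kind of estimator discussed above, assuming i.i.d. Bernoulli multiplier-bootstrap weights (the "Bernoulli bootstrap" variant): each bag solves a reweighted minimum-$\ell_2$-norm least-squares problem via the pseudoinverse, and the bagged estimator averages the bags. Function names and hyperparameters are illustrative, not taken from the paper.

```python
import numpy as np

def min_norm_ls(X, y):
    """Minimum l2-norm least-squares solution (interpolates when p > n)."""
    return np.linalg.pinv(X) @ y

def bagged_min_norm_ls(X, y, n_bags=20, p_keep=0.7, seed=0):
    """Average of reweighted ('sketched') estimators obtained from
    multiplier-bootstrap (here: Bernoulli) weights on the rows of (X, y)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    estimates = []
    for _ in range(n_bags):
        w = rng.binomial(1, p_keep, size=n).astype(float)  # Bernoulli multipliers
        sw = np.sqrt(w)
        estimates.append(min_norm_ls(sw[:, None] * X, sw * y))
    return np.mean(estimates, axis=0)

# toy overparameterized example: n < p, noisy labels
rng = np.random.default_rng(1)
n, p = 50, 100
X = rng.normal(size=(n, p))
beta = rng.normal(size=p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.normal(size=n)
beta_single = min_norm_ls(X, y)
beta_bagged = bagged_min_norm_ls(X, y)
```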

Robotic Table Tennis: A Case Study into a High Speed Learning System

  • paper_url: http://arxiv.org/abs/2309.03315
  • repo_url: None
  • paper_authors: David B. D’Ambrosio, Jonathan Abelian, Saminda Abeyruwan, Michael Ahn, Alex Bewley, Justin Boyd, Krzysztof Choromanski, Omar Cortes, Erwin Coumans, Tianli Ding, Wenbo Gao, Laura Graesser, Atil Iscen, Navdeep Jaitly, Deepali Jain, Juhana Kangaspunta, Satoshi Kataoka, Gus Kouretas, Yuheng Kuang, Nevena Lazic, Corey Lynch, Reza Mahjourian, Sherry Q. Moore, Thinh Nguyen, Ken Oslund, Barney J Reed, Krista Reymann, Pannag R. Sanketi, Anish Shankar, Pierre Sermanet, Vikas Sindhwani, Avi Singh, Vincent Vanhoucke, Grace Vesom, Peng Xu
  • for: This work examines a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and can precisely return the ball to desired targets.
  • methods: The system combines a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that prevents damage in the real world and trains policies for zero-shot transfer, and automated real-world environment resets that enable autonomous training and evaluation on physical robots.
  • results: Through a complete description of the system's design decisions and a collection of studies, the authors clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, perception-system robustness, sensitivity to policy hyper-parameters, and the choice of action space. A video demonstrating the system components and experimental results is available at https://youtu.be/uFcnWjB42I0.
    Abstract We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.

Scalable Learning of Intrusion Responses through Recursive Decomposition

  • paper_url: http://arxiv.org/abs/2309.03292
  • repo_url: None
  • paper_authors: Kim Hammar, Rolf Stadler
  • for: This work aims to automate intrusion response for IT infrastructures in order to improve their security.
  • methods: The interaction between an attacker and a defender is formulated as a partially observed stochastic game and solved through reinforcement learning and self-play. To address the computational complexity of large-scale games, the game is recursively decomposed into subgames, and optimal stopping theory is used to compute best response strategies efficiently.
  • results: The learned strategies, evaluated in an emulation environment where real intrusions and response actions can be executed, approximate an equilibrium, and the proposed algorithm significantly outperforms a state-of-the-art algorithm on a realistic infrastructure configuration.
    Abstract We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work prove the feasibility of this approach for small infrastructures but do not scale to realistic scenarios due to the exponential growth in computational complexity with the infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames which can be solved in parallel. Applying optimal stopping theory we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.
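The paper's DFSP algorithm learns equilibria through fictitious self-play with stochastic approximation. The sketch below illustrates only the generic fictitious-play idea on a toy zero-sum attacker/defender matrix game, not the decomposed, partially observed game from the paper; the payoff matrix and all names are illustrative assumptions.

```python
import numpy as np

def fictitious_play(payoff, iters=5000):
    """Fictitious play on a zero-sum matrix game: each player best-responds
    to the empirical mixture of the opponent's past actions."""
    n_a, n_d = payoff.shape          # attacker rows, defender columns
    counts_a = np.ones(n_a)          # empirical action counts (attacker)
    counts_d = np.ones(n_d)          # empirical action counts (defender)
    for _ in range(iters):
        mix_d = counts_d / counts_d.sum()
        mix_a = counts_a / counts_a.sum()
        counts_a[np.argmax(payoff @ mix_d)] += 1   # attacker maximizes its gain
        counts_d[np.argmin(mix_a @ payoff)] += 1   # defender minimizes attacker gain
    return counts_a / counts_a.sum(), counts_d / counts_d.sum()

# toy attacker/defender payoff (attacker's gain); the equilibrium is mixed
payoff = np.array([[3.0, -1.0],
                   [-2.0, 2.0]])
attacker_strategy, defender_strategy = fictitious_play(payoff)
```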

R2D2: Deep neural network series for near real-time high-dynamic range imaging in radio astronomy

  • paper_url: http://arxiv.org/abs/2309.03291
  • repo_url: None
  • paper_authors: Aghabiglou A, Chu C S, Jackson A, Dabbech A, Wiaux Y
  • for: This paper introduces a novel AI approach, based on deep neural networks (DNNs) and data-consistency updates, for high-resolution high-dynamic range synthesis imaging by radio interferometry (RI) in astronomy.
  • methods: The approach relies on hybrid DNNs and data-consistency updates; the reconstruction is built as a series of residual images, each estimated as the output of a DNN taking the previous residual dirty image as input. It can be interpreted as a learned matching pursuit approach in which model components are iteratively identified from residual dirty images.
  • results: On highly sensitive S-band VLA observations of Cygnus A, the R2D2 model delivers high-precision imaging, matching the precision of AIRI and uSARA and significantly superior to CLEAN, while running at a fraction of the computational cost of AIRI and uSARA and faster than CLEAN.
    Abstract We present a novel AI approach for high-resolution high-dynamic range synthesis imaging by radio interferometry (RI) in astronomy. R2D2, standing for "{R}esidual-to-{R}esidual {D}NN series for high-{D}ynamic range imaging", is a model-based data-driven approach relying on hybrid deep neural networks (DNNs) and data-consistency updates. Its reconstruction is built as a series of residual images estimated as the outputs of DNNs, each taking the residual dirty image of the previous iteration as an input. The approach can be interpreted as a learned version of a matching pursuit approach, whereby model components are iteratively identified from residual dirty images, and of which CLEAN is a well-known example. We propose two variants of the R2D2 model, built upon two distinctive DNN architectures: a standard U-Net, and a novel unrolled architecture. We demonstrate their use for monochromatic intensity imaging on highly-sensitive observations of the radio galaxy Cygnus~A at S band, from the Very Large Array (VLA). R2D2 is validated against CLEAN and the recent RI algorithms AIRI and uSARA, which respectively inject a learned implicit regularization and an advanced handcrafted sparsity-based regularization into the RI data. With only few terms in its series, the R2D2 model is able to deliver high-precision imaging, significantly superior to CLEAN and matching the precision of AIRI and uSARA. In terms of computational efficiency, R2D2 runs at a fraction of the cost of AIRI and uSARA, and is also faster than CLEAN, opening the door to real-time precision imaging in RI.
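A schematic of the residual-to-residual iteration described above, assuming a measurement operator `A` and its adjoint are available and using an untrained damped-copy placeholder in place of the real R2D2 DNNs; all names and the toy setup are illustrative.

```python
import numpy as np

def r2d2_like_reconstruction(dirty_image, A, A_adjoint, visibilities, dnns):
    """Build the reconstruction as a sum of residual images, each predicted by a
    DNN from the current residual dirty image (a learned matching-pursuit loop)."""
    x = np.zeros_like(dirty_image)
    residual_dirty = dirty_image
    for dnn in dnns:                                     # few terms in the series
        x = x + dnn(residual_dirty, x)                   # DNN estimates a residual image
        residual_dirty = A_adjoint(visibilities - A(x))  # data-consistency update
    return x

# toy setup: identity "measurement operator" and a damped-copy placeholder "DNN"
true_image = np.random.rand(32, 32)
A = lambda img: img                  # stand-in for the RI measurement operator
A_adjoint = lambda vis: vis
visibilities = A(true_image)
dirty_image = A_adjoint(visibilities)
dnns = [lambda res, x: 0.5 * res for _ in range(6)]
recon = r2d2_like_reconstruction(dirty_image, A, A_adjoint, visibilities, dnns)
```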

Let Quantum Neural Networks Choose Their Own Frequencies

  • paper_url: http://arxiv.org/abs/2309.03279
  • repo_url: None
  • paper_authors: Ben Jaderberg, Antonio A. Gentile, Youssef Achari Berrada, Elvira Shishenina, Vincent E. Elfving
  • for: This paper studies parameterized quantum circuits as machine learning models and the partial Fourier series representation of the functions they express.
  • methods: The authors generalize conventional fixed-frequency models by adding trainable parameters to the data-encoding generators, so that the model can learn generators better suited to the task at hand.
  • results: Numerical experiments show that this approach learns generators with desirable properties, including non-regularly spaced frequencies in their spectra and flexible spectral richness. The authors also demonstrate practical effectiveness, achieving improved accuracy in solving the Navier-Stokes equations.
    Abstract Parameterized quantum circuits as machine learning models are typically well described by their representation as a partial Fourier series of the input features, with frequencies uniquely determined by the feature map's generator Hamiltonians. Ordinarily, these data-encoding generators are chosen in advance, fixing the space of functions that can be represented. In this work we consider a generalization of quantum models to include a set of trainable parameters in the generator, leading to a trainable frequency (TF) quantum model. We numerically demonstrate how TF models can learn generators with desirable properties for solving the task at hand, including non-regularly spaced frequencies in their spectra and flexible spectral richness. Finally, we showcase the real-world effectiveness of our approach, demonstrating an improved accuracy in solving the Navier-Stokes equations using a TF model with only a single parameter added to each encoding operation. Since TF models encompass conventional fixed frequency models, they may offer a sensible default choice for variational quantum machine learning.
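For context, the representation referred to above can be written explicitly: a data re-uploading quantum model realizes a partial Fourier series whose frequency spectrum is fixed by the eigenvalue gaps of the data-encoding generator Hamiltonians (a standard result in this literature, not a formula copied from the paper). With a trainable generator $\hat{H}(\theta)$ the spectrum itself becomes trainable:

$$
f_{\theta,\phi}(x) \;=\; \sum_{\omega \in \Omega(\theta)} c_{\omega}(\phi)\, e^{i\omega x},
\qquad
\Omega(\theta) \;=\; \big\{\, \lambda_j(\theta) - \lambda_k(\theta) \,\big\},
$$

where the $\lambda_j(\theta)$ are eigenvalues of $\hat{H}(\theta)$ and the coefficients $c_\omega(\phi)$ are set by the trainable circuit blocks; conventional fixed-frequency models are recovered when $\theta$ is held constant.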

Matcha-TTS: A fast TTS architecture with conditional flow matching

  • paper_url: http://arxiv.org/abs/2309.03199
  • repo_url: https://github.com/shivammehta25/Matcha-TTS
  • paper_authors: Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter
  • for: This work proposes a new encoder-decoder architecture for fast TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM).
  • methods: The model uses an ODE-based decoder capable of high output quality in fewer synthesis steps, and careful design choices keep each synthesis step fast to run. The model is probabilistic, non-autoregressive, and learns to speak from scratch without external alignments.
  • results: Compared to strong pre-trained baseline models, the Matcha-TTS system has the smallest memory footprint, rivals the speed of the fastest models on long utterances, and attains the highest mean opinion score in a listening test. Audio examples, code, and pre-trained models are available at https://shivammehta25.github.io/Matcha-TTS/.
    Abstract We introduce Matcha-TTS, a new encoder-decoder architecture for speedy TTS acoustic modelling, trained using optimal-transport conditional flow matching (OT-CFM). This yields an ODE-based decoder capable of high output quality in fewer synthesis steps than models trained using score matching. Careful design choices additionally ensure each synthesis step is fast to run. The method is probabilistic, non-autoregressive, and learns to speak from scratch without external alignments. Compared to strong pre-trained baseline models, the Matcha-TTS system has the smallest memory footprint, rivals the speed of the fastest models on long utterances, and attains the highest mean opinion score in a listening test. Please see https://shivammehta25.github.io/Matcha-TTS/ for audio examples, code, and pre-trained models.
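For reference, the OT-CFM objective mentioned above can be written in its standard flow-matching form (assuming $x_1$ is a data sample, $x_0$ Gaussian noise, and $\sigma_{\min}$ a small constant; the text and speaker conditioning used by Matcha-TTS is omitted here):

$$
x_t = \big(1-(1-\sigma_{\min})t\big)\,x_0 + t\,x_1,
\qquad
u_t(x_t \mid x_1) = x_1 - (1-\sigma_{\min})\,x_0,
$$
$$
\mathcal{L}_{\mathrm{OT\text{-}CFM}}(\theta) = \mathbb{E}_{t\sim\mathcal{U}[0,1],\;x_1\sim p_{\mathrm{data}},\;x_0\sim\mathcal{N}(0,I)}\,\big\lVert v_\theta(t, x_t) - u_t(x_t \mid x_1)\big\rVert^2 .
$$

At synthesis time the learned vector field $v_\theta$ is integrated as an ODE from noise to data, which is why relatively few synthesis steps suffice.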

Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation

  • paper_url: http://arxiv.org/abs/2309.03190
  • repo_url: https://github.com/zhxchd/blink_gnn
  • paper_authors: Xiaochen Zhu, Vincent Y. F. Tan, Xiaokui Xiao
  • for: To protect privacy when training graph neural networks (GNNs), enabling collaborative graph learning with an untrusted server without revealing the existence of any link in the graph structure.
  • methods: Link local differential privacy over decentralized nodes, spending the privacy budget separately on links and degrees so that the server can better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs.
  • results: Two privacy mechanisms are proposed that complement each other under different privacy budgets, along with a hybrid mechanism that performs better across budgets. Experiments show that the approach outperforms existing methods in accuracy under varying privacy budgets.
    Abstract Graph neural networks (GNNs) have gained an increasing amount of popularity due to their superior capability in learning node embeddings for various graph inference tasks, but training them can raise privacy concerns. To address this, we propose using link local differential privacy over decentralized nodes, enabling collaboration with an untrusted server to train GNNs without revealing the existence of any link. Our approach spends the privacy budget separately on links and degrees of the graph for the server to better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs. We bound the mean absolute error of the inferred link probabilities against the ground truth graph topology. We then propose two variants of our LDP mechanism complementing each other in different privacy settings, one of which estimates fewer links under lower privacy budgets to avoid false positive link estimates when the uncertainty is high, while the other utilizes more information and performs better given relatively higher privacy budgets. Furthermore, we propose a hybrid variant that combines both strategies and is able to perform better across different privacy budgets. Extensive experiments show that our approach outperforms existing methods in terms of accuracy under varying privacy budgets.
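A minimal sketch of the generic ingredient behind link LDP: each node perturbs its adjacency bits with randomized response under budget $\epsilon$, and the server debiases the noisy responses before further (in the paper, Bayesian, degree-informed) denoising. The code shows only the randomized-response and debiasing steps with illustrative names; it is not the paper's exact mechanism.

```python
import numpy as np

def randomized_response(adj_bits, eps, rng):
    """Each bit is kept with prob. e^eps/(1+e^eps) and flipped otherwise (eps-LDP)."""
    p_keep = np.exp(eps) / (1.0 + np.exp(eps))
    flip = rng.random(adj_bits.shape) > p_keep
    return np.where(flip, 1 - adj_bits, adj_bits)

def debias(noisy_bits, eps):
    """Unbiased estimate of the true link indicator from randomized responses."""
    p = np.exp(eps) / (1.0 + np.exp(eps))
    return (noisy_bits - (1.0 - p)) / (2.0 * p - 1.0)

rng = np.random.default_rng(0)
n, eps = 100, 1.0
adj = (rng.random((n, n)) < 0.05).astype(float)   # toy decentralized adjacency rows
noisy = randomized_response(adj, eps, rng)        # what each node sends to the server
link_estimates = debias(noisy, eps)               # server-side estimate to denoise further
```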

Impression-Informed Multi-Behavior Recommender System: A Hierarchical Graph Attention Approach

  • paper_url: http://arxiv.org/abs/2309.03169
  • repo_url: None
  • paper_authors: Dong Li, Divya Bhargavi, Vidya Sagar Ravipati
  • for: This paper aims to address the limitations of traditional recommender systems that rely solely on implicit feedback, such as item purchases, by incorporating multi-behavior interactions and hierarchical attention mechanisms to improve the accuracy of recommendations.
  • methods: The proposed Hierarchical Multi-behavior Graph Attention Network (HMGN) utilizes attention mechanisms to distinguish between different types of behaviors and hierarchical Bayesian personalized ranking for optimization. The model also incorporates a specialized multi-behavior sub-graph sampling technique and can seamlessly integrate knowledge metadata and time-series data.
  • results: The paper reports up to 64% performance boost in NDCG@100 metrics compared to conventional graph neural network methods, demonstrating the effectiveness of the proposed HMGN model in improving the accuracy of recommendations based on multi-behavior interactions.
    Abstract While recommender systems have significantly benefited from implicit feedback, they have often missed the nuances of multi-behavior interactions between users and items. Historically, these systems either amalgamated all behaviors, such as \textit{impression} (formerly \textit{view}), \textit{add-to-cart}, and \textit{buy}, under a singular 'interaction' label, or prioritized only the target behavior, often the \textit{buy} action, discarding valuable auxiliary signals. Although recent advancements tried addressing this simplification, they primarily gravitated towards optimizing the target behavior alone, battling with data scarcity. Additionally, they tended to bypass the nuanced hierarchy intrinsic to behaviors. To bridge these gaps, we introduce the \textbf{H}ierarchical \textbf{M}ulti-behavior \textbf{G}raph Attention \textbf{N}etwork (HMGN). This pioneering framework leverages attention mechanisms to discern information from both inter and intra-behaviors while employing a multi-task Hierarchical Bayesian Personalized Ranking (HBPR) for optimization. Recognizing the need for scalability, our approach integrates a specialized multi-behavior sub-graph sampling technique. Moreover, the adaptability of HMGN allows for the seamless inclusion of knowledge metadata and time-series data. Empirical results attest to our model's prowess, registering a notable performance boost of up to 64\% in NDCG@100 metrics over conventional graph neural network methods.
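As a reference point for the optimization objective mentioned above, the sketch below shows a plain Bayesian Personalized Ranking (BPR) loss on (user, positive item, negative item) triples; HMGN's hierarchical, multi-behavior variant and its attention-based encoder are more involved, and all names and shapes here are illustrative.

```python
import torch
import torch.nn.functional as F

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    """Standard BPR: push the score of the observed (positive) item above the
    score of an unobserved (negative) item for the same user."""
    pos_scores = (user_emb * pos_item_emb).sum(dim=-1)
    neg_scores = (user_emb * neg_item_emb).sum(dim=-1)
    return -F.logsigmoid(pos_scores - neg_scores).mean()

# toy embeddings for a batch of 8 triples in a 16-dimensional space
u = torch.randn(8, 16, requires_grad=True)
i_pos = torch.randn(8, 16, requires_grad=True)
i_neg = torch.randn(8, 16, requires_grad=True)
loss = bpr_loss(u, i_pos, i_neg)
loss.backward()
```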

Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.03157
  • repo_url: https://github.com/theilem/uavSim
  • paper_authors: Mirco Theile, Harald Bayerlein, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli
  • for: solving the power-constrained coverage path planning problem for battery-limited unmanned aerial vehicles (UAVs)
  • methods: using a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, action masking, and discount factor scheduling
  • results: outperforming a baseline heuristic, generalizing to different target zones and maps, with limited generalization to unseen maps.
    Abstract Coverage path planning (CPP) is a critical problem in robotics, where the goal is to find an efficient path that covers every point in an area of interest. This work addresses the power-constrained CPP problem with recharge for battery-limited unmanned aerial vehicles (UAVs). In this problem, a notable challenge emerges from integrating recharge journeys into the overall coverage strategy, highlighting the intricate task of making strategic, long-term decisions. We propose a novel proximal policy optimization (PPO)-based deep reinforcement learning (DRL) approach with map-based observations, utilizing action masking and discount factor scheduling to optimize coverage trajectories over the entire mission horizon. We further provide the agent with a position history to handle emergent state loops caused by the recharge capability. Our approach outperforms a baseline heuristic, generalizes to different target zones and maps, with limited generalization to unseen maps. We offer valuable insights into DRL algorithm design for long-horizon problems and provide a publicly available software framework for the CPP problem.
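One concrete ingredient from the method list above is action masking. A common way to implement it (sketched below with illustrative shapes, not the authors' code) is to set the logits of invalid actions to negative infinity before sampling, so the policy never selects them and they receive zero probability mass.

```python
import torch

def masked_action_distribution(logits, valid_mask):
    """Zero out the probability of invalid actions by masking their logits."""
    masked_logits = logits.masked_fill(~valid_mask, float('-inf'))
    return torch.distributions.Categorical(logits=masked_logits)

logits = torch.randn(4, 6)                         # batch of 4 states, 6 discrete actions
valid = torch.tensor([[1, 1, 0, 1, 0, 1]] * 4).bool()
dist = masked_action_distribution(logits, valid)
actions = dist.sample()                            # only valid actions are ever drawn
log_probs = dist.log_prob(actions)                 # used in the PPO objective
```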

Data-Driven Neural Polar Codes for Unknown Channels With and Without Memory

  • paper_url: http://arxiv.org/abs/2309.03148
  • repo_url: None
  • paper_authors: Ziv Aharoni, Bashar Huleihel, Henry D. Pfister, Haim H. Permuter
  • for: This paper proposes a data-driven methodology for designing polar codes for channels with and without memory.
  • methods: The method leverages the structure of the successive cancellation (SC) decoder, replacing its core elements with neural networks (NNs) to obtain a neural SC (NSC) decoder. An additional NN embeds the channel outputs into the input space of the SC decoder.
  • results: The method comes with theoretical guarantees, and its computational complexity does not grow with the channel memory size, unlike the successive cancellation trellis (SCT) decoder. Empirical results on memoryless channels and channels with memory compare favorably with the optimal polar decoders, and the approach also applies in settings where the SC and SCT decoders are not applicable.
    Abstract In this work, a novel data-driven methodology for designing polar codes for channels with and without memory is proposed. The methodology is suitable for the case where the channel is given as a "black-box" and the designer has access to the channel for generating observations of its inputs and outputs, but does not have access to the explicit channel model. The proposed method leverages the structure of the successive cancellation (SC) decoder to devise a neural SC (NSC) decoder. The NSC decoder uses neural networks (NNs) to replace the core elements of the original SC decoder, the check-node, the bit-node and the soft decision. Along with the NSC, we devise additional NN that embeds the channel outputs into the input space of the SC decoder. The proposed method is supported by theoretical guarantees that include the consistency of the NSC. Also, the NSC has computational complexity that does not grow with the channel memory size. This sets its main advantage over successive cancellation trellis (SCT) decoder for finite state channels (FSCs) that has complexity of $O(|\mathcal{S}|^3 N\log N)$, where $|\mathcal{S}|$ denotes the number of channel states. We demonstrate the performance of the proposed algorithms on memoryless channels and on channels with memory. The empirical results are compared with the optimal polar decoder, given by the SC and SCT decoders. We further show that our algorithms are applicable for the case where there SC and SCT decoders are not applicable.
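For reference, the core operations of the classical SC decoder that the NSC decoder replaces with NNs are, in the log-likelihood-ratio (LLR) domain, the check-node update $f$, the bit-node update $g$, and a decision on the resulting LLR (standard polar-coding formulas, not taken from the paper):

$$
f(\ell_1,\ell_2) = 2\tanh^{-1}\!\big(\tanh(\ell_1/2)\,\tanh(\ell_2/2)\big) \;\approx\; \operatorname{sign}(\ell_1)\,\operatorname{sign}(\ell_2)\,\min(|\ell_1|,|\ell_2|),
\qquad
g(\ell_1,\ell_2,\hat{u}) = \ell_2 + (1-2\hat{u})\,\ell_1 .
$$

In the NSC decoder these maps, together with an embedding of the raw channel outputs into the decoder's input space, are parameterized by neural networks.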

The Best Arm Evades: Near-optimal Multi-pass Streaming Lower Bounds for Pure Exploration in Multi-armed Bandits

  • paper_url: http://arxiv.org/abs/2309.03145
  • repo_url: None
  • paper_authors: Sepehr Assadi, Chen Wang
  • for: This paper establishes a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms with sublinear memory.
  • methods: Streaming algorithms that use the optimal sample complexity of $O(\frac{n}{\Delta^2})$, where $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms.
  • results: Any such algorithm requires $\Omega(\frac{\log(1/\Delta)}{\log\log(1/\Delta)})$ passes, matching (up to lower order terms) the $O(\log(\frac{1}{\Delta}))$-pass, $O(1)$-memory algorithm of Jin et al. [ICML'21] and answering an open question posed by Assadi and Wang [STOC'20].
    Abstract We give a near-optimal sample-pass trade-off for pure exploration in multi-armed bandits (MABs) via multi-pass streaming algorithms: any streaming algorithm with sublinear memory that uses the optimal sample complexity of $O(\frac{n}{\Delta^2})$ requires $\Omega(\frac{\log(1/\Delta)}{\log\log(1/\Delta)})$ passes. Here, $n$ is the number of arms and $\Delta$ is the reward gap between the best and the second-best arms. Our result matches the $O(\log(\frac{1}{\Delta}))$-pass algorithm of Jin et al. [ICML'21] (up to lower order terms) that only uses $O(1)$ memory and answers an open question posed by Assadi and Wang [STOC'20].

Using Multiple Vector Channels Improves E(n)-Equivariant Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03139
  • repo_url: None
  • paper_authors: Daniel Levy, Sékou-Oumar Kaba, Carmelo Gonzales, Santiago Miret, Siamak Ravanbakhsh
  • for: To extend E(n)-equivariant graph neural networks to use multiple equivariant vectors per node.
  • methods: The paper formulates this multi-channel extension and shows that it improves performance across benchmark tasks on different physical systems.
  • results: On N-body charged particle dynamics, molecular property prediction, and predicting the trajectories of solar system bodies, the multi-channel EGNN outperforms the standard single-channel EGNN, with minimal differences in runtime or number of parameters.
    Abstract We present a natural extension to E(n)-equivariant graph neural networks that uses multiple equivariant vectors per node. We formulate the extension and show that it improves performance across different physical systems benchmark tasks, with minimal differences in runtime or number of parameters. The proposed multichannel EGNN outperforms the standard singlechannel EGNN on N-body charged particle dynamics, molecular property predictions, and predicting the trajectories of solar system bodies. Given the additional benefits and minimal additional cost of multi-channel EGNN, we suggest that this extension may be of practical use to researchers working in machine learning for the physical sciences
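To make the extension concrete: a standard EGNN layer updates invariant features $h_i$ and one equivariant vector $x_i$ per node, and the multi-channel variant carries $C$ equivariant vectors $x_i^{(c)}$. The equations below follow the original EGNN formulation with a channel index added, as one plausible reading of the abstract rather than the paper's exact parameterization:

$$
m_{ij} = \phi_e\!\big(h_i, h_j, \{\lVert x_i^{(c)}-x_j^{(c)}\rVert^2\}_{c=1}^{C}, a_{ij}\big),
\qquad
x_i^{(c)\,\prime} = x_i^{(c)} + \sum_{j\neq i}\big(x_i^{(c)}-x_j^{(c)}\big)\,\phi_x^{(c)}(m_{ij}),
\qquad
h_i' = \phi_h\!\Big(h_i, \sum_{j\neq i} m_{ij}\Big),
$$

where $\phi_e$, $\phi_x^{(c)}$, and $\phi_h$ are learned MLPs and $a_{ij}$ are optional edge attributes; equivariance is preserved because each vector channel is only updated by relative differences scaled by invariant quantities.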

Graph Theory Applications in Advanced Geospatial Research

  • paper_url: http://arxiv.org/abs/2309.03249
  • repo_url: None
  • paper_authors: Surajit Ghosh, Archita Mallick, Anuva Chowdhury, Kounik De Sarkar
  • for: This report explores applications of graph theory in geospatial sciences, including network analysis, spatial connectivity, geographic information systems, and other spatial problem-solving scenarios.
  • methods: It surveys the key concepts and algorithms of graph theory used to model and analyse spatial relationships, such as graph metrics, shortest paths, and maximum flows.
  • results: The report describes practical application scenarios such as environmental monitoring, transportation, and infrastructure planning, together with the research, technologies, and methodologies that apply graph theory to these problems.
    Abstract Geospatial sciences include a wide range of applications, from environmental monitoring transportation to infrastructure planning, as well as location-based analysis and services. Graph theory algorithms in mathematics have emerged as indispensable tools in these domains due to their capability to model and analyse spatial relationships efficiently. This technical report explores the applications of graph theory algorithms in geospatial sciences, highlighting their role in network analysis, spatial connectivity, geographic information systems, and various other spatial problem-solving scenarios. It provides a comprehensive idea about the key concepts and algorithms of graph theory that assist the modelling processes. The report provides insights into the practical significance of graph theory in addressing real-world geospatial challenges and opportunities. It lists the extensive research, innovative technologies and methodologies implemented in this field.

ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.03081
  • repo_url: https://github.com/link-zju/orl-auditor
  • paper_authors: Linkang Du, Min Chen, Mingyang Sun, Shouling Ji, Peng Cheng, Jiming Chen, Zhikun Zhang
  • for: To provide a trajectory-level dataset auditing mechanism, based on cumulative rewards, suited to the offline deep reinforcement learning (offline DRL) scenario.
  • methods: The study uses cumulative rewards as a unique identifier of the dataset a DRL model was trained on and builds the auditing mechanism ORL-AUDITOR around this signal.
  • results: Experiments on multiple offline DRL models and tasks show that ORL-AUDITOR accurately determines whether a model was trained on a given dataset, achieving auditing accuracy above 95% and false positive rates below 2.88%.
    Abstract Data is a critical asset in AI, as high-quality datasets can significantly improve the performance of machine learning models. In safety-critical domains such as autonomous vehicles, offline deep reinforcement learning (offline DRL) is frequently used to train models on pre-collected datasets, as opposed to training these models by interacting with the real-world environment as the online DRL. To support the development of these models, many institutions make datasets publicly available with opensource licenses, but these datasets are at risk of potential misuse or infringement. Injecting watermarks to the dataset may protect the intellectual property of the data, but it cannot handle datasets that have already been published and is infeasible to be altered afterward. Other existing solutions, such as dataset inference and membership inference, do not work well in the offline DRL scenario due to the diverse model behavior characteristics and offline setting constraints. In this paper, we advocate a new paradigm by leveraging the fact that cumulative rewards can act as a unique identifier that distinguishes DRL models trained on a specific dataset. To this end, we propose ORL-AUDITOR, which is the first trajectory-level dataset auditing mechanism for offline RL scenarios. Our experiments on multiple offline DRL models and tasks reveal the efficacy of ORL-AUDITOR, with auditing accuracy over 95% and false positive rates less than 2.88%. We also provide valuable insights into the practical implementation of ORL-AUDITOR by studying various parameter settings. Furthermore, we demonstrate the auditing capability of ORL-AUDITOR on open-source datasets from Google and DeepMind, highlighting its effectiveness in auditing published datasets. ORL-AUDITOR is open-sourced at https://github.com/link-zju/ORL-Auditor.
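A heavily simplified sketch of the auditing idea described above: cumulative rewards of trajectories from a suspect model are compared against the distribution of cumulative rewards from models known to be trained on the dataset, and a threshold decides the audit. The statistic, threshold, and names below are illustrative assumptions; the paper's actual test is more elaborate.

```python
import numpy as np

def audit(suspect_returns, reference_returns, num_std=3.0):
    """Flag the suspect model as trained on the dataset if the mean cumulative
    reward of its trajectories falls within a few standard deviations of the
    cumulative rewards observed for models known to be trained on it."""
    mu, sigma = reference_returns.mean(), reference_returns.std() + 1e-8
    return abs(suspect_returns.mean() - mu) <= num_std * sigma

# toy cumulative rewards standing in for rollouts of reference and suspect models
rng = np.random.default_rng(0)
reference = rng.normal(100.0, 5.0, size=200)   # models trained on the dataset
suspect = rng.normal(98.0, 6.0, size=50)       # model under audit
print("dataset likely used:", audit(suspect, reference))
```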

Parameterizing pressure-temperature profiles of exoplanet atmospheres with neural networks

  • paper_url: http://arxiv.org/abs/2309.03075
  • repo_url: https://github.com/timothygebhard/ml4ptp
  • paper_authors: Timothy D. Gebhard, Daniel Angerhausen, Björn S. Konrad, Eleonora Alei, Sascha P. Quanz, Bernhard Schölkopf
  • For: The paper aims to improve the accuracy and efficiency of atmospheric retrieval (AR) for exoplanets by introducing a new, data-driven parameterization scheme for pressure-temperature (PT) profiles.
  • Methods: The authors use a latent variable model (based on a neural network) to learn a distribution over functions (PT profiles) and a decoder network to map pressure to temperature. They train and evaluate their method on two publicly available datasets of self-consistent PT profiles.
  • Results: The authors find that their method achieves better fit quality than existing baseline methods, despite using fewer parameters. In an AR based on existing literature, their model (using two parameters) produces a tighter, more accurate posterior for the PT profile than the five-parameter polynomial baseline, while also speeding up the retrieval by more than a factor of three.
    Abstract Atmospheric retrievals (AR) of exoplanets typically rely on a combination of a Bayesian inference technique and a forward simulator to estimate atmospheric properties from an observed spectrum. A key component in simulating spectra is the pressure-temperature (PT) profile, which describes the thermal structure of the atmosphere. Current AR pipelines commonly use ad hoc fitting functions here that limit the retrieved PT profiles to simple approximations, but still use a relatively large number of parameters. In this work, we introduce a conceptually new, data-driven parameterization scheme for physically consistent PT profiles that does not require explicit assumptions about the functional form of the PT profiles and uses fewer parameters than existing methods. Our approach consists of a latent variable model (based on a neural network) that learns a distribution over functions (PT profiles). Each profile is represented by a low-dimensional vector that can be used to condition a decoder network that maps $P$ to $T$. When training and evaluating our method on two publicly available datasets of self-consistent PT profiles, we find that our method achieves, on average, better fit quality than existing baseline methods, despite using fewer parameters. In an AR based on existing literature, our model (using two parameters) produces a tighter, more accurate posterior for the PT profile than the five-parameter polynomial baseline, while also speeding up the retrieval by more than a factor of three. By providing parametric access to physically consistent PT profiles, and by reducing the number of parameters required to describe a PT profile (thereby reducing computational cost or freeing resources for additional parameters of interest), our method can help improve AR and thus our understanding of exoplanet atmospheres and their habitability.
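A minimal PyTorch sketch of the parameterization described above: a low-dimensional latent vector conditions a small decoder that maps (log-)pressure to temperature, so an entire PT profile is represented by just a few numbers. Network sizes, the pressure grid, and all names are illustrative assumptions, not the paper's trained model.

```python
import torch
import torch.nn as nn

class PTDecoder(nn.Module):
    """Maps (latent code z, log-pressure) -> temperature; one z = one PT profile."""
    def __init__(self, latent_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z, log_p):
        # z: (batch, latent_dim); log_p: (batch, n_levels)
        z_rep = z.unsqueeze(1).expand(-1, log_p.shape[1], -1)
        inp = torch.cat([z_rep, log_p.unsqueeze(-1)], dim=-1)
        return self.net(inp).squeeze(-1)          # temperatures, shape (batch, n_levels)

decoder = PTDecoder(latent_dim=2)
z = torch.zeros(4, 2)                             # 4 profiles, 2 free parameters each
log_p = torch.linspace(-6, 2, 50).expand(4, -1)   # illustrative log-pressure grid
temps = decoder(z, log_p)
```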

Learning Active Subspaces for Effective and Scalable Uncertainty Quantification in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03061
  • repo_url: None
  • paper_authors: Sanket Jantre, Nathan M. Urban, Xiaoning Qian, Byung-Jun Yoon
  • for: To address the computational complexity of Bayesian deep learning so that it can deliver well-calibrated predictions with quantified uncertainty.
  • methods: The method constructs a low-dimensional active subspace of the neural network parameters by identifying the parameter directions that most influence the network's output, and performs Bayesian inference within this subspace via Monte Carlo sampling or variational inference.
  • results: Empirically, the approach provides reliable predictions with robust uncertainty estimates across various regression tasks.
    Abstract Bayesian inference for neural networks, or Bayesian deep learning, has the potential to provide well-calibrated predictions with quantified uncertainty and robustness. However, the main hurdle for Bayesian deep learning is its computational complexity due to the high dimensionality of the parameter space. In this work, we propose a novel scheme that addresses this limitation by constructing a low-dimensional subspace of the neural network parameters-referred to as an active subspace-by identifying the parameter directions that have the most significant influence on the output of the neural network. We demonstrate that the significantly reduced active subspace enables effective and scalable Bayesian inference via either Monte Carlo (MC) sampling methods, otherwise computationally intractable, or variational inference. Empirically, our approach provides reliable predictions with robust uncertainty estimates for various regression tasks.
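A toy numpy sketch of the active-subspace idea: estimate the directions in parameter space along which the output gradient varies most, keep the top few eigenvectors, and restrict inference to low-dimensional coefficients around a reference parameter vector. The quadratic toy "network", dimensions, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k, n_samples = 200, 5, 500                 # full parameter dim, active-subspace dim

# toy "network" whose output gradient varies mostly along a few hidden directions
hidden = rng.normal(size=(dim, k))
def grad_output(theta):
    return hidden @ (hidden.T @ theta) + 0.01 * rng.normal(size=dim)

theta_ref = rng.normal(size=dim)                # e.g. a pretrained/MAP parameter vector

# 1) estimate C = E[g g^T] from sampled gradients and keep its top-k eigenvectors
grads = np.stack([grad_output(theta_ref + 0.1 * rng.normal(size=dim))
                  for _ in range(n_samples)])
C = grads.T @ grads / n_samples
_, eigvecs = np.linalg.eigh(C)                  # eigenvalues in ascending order
P = eigvecs[:, -k:]                             # active-subspace basis, shape (dim, k)

# 2) Bayesian inference is now over z in R^k only, with theta = theta_ref + P @ z
z_samples = rng.normal(size=(100, k))           # stand-in for MC/VI posterior samples of z
theta_samples = theta_ref + z_samples @ P.T     # lifted back to the full parameter space
```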

CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

  • paper_url: http://arxiv.org/abs/2309.03060
  • repo_url: https://github.com/wilson-labs/cola
  • paper_authors: Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson
  • for: To address large-scale linear algebra problems in machine learning and science, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation.
  • methods: The paper proposes a simple but general framework named CoLA (Compositional Linear Algebra) that automatically constructs memory- and runtime-efficient numerical algorithms by combining a linear operator abstraction with compositional dispatch rules. CoLA provides memory-efficient automatic differentiation, low-precision computation, and GPU acceleration in both JAX and PyTorch, and accommodates new objects, operations, and rules in downstream packages via multiple dispatch.
  • results: The paper demonstrates CoLA's efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
    Abstract Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine learning, named CoLA (Compositional Linear Algebra). By combining a linear operator abstraction with compositional dispatch rules, CoLA automatically constructs memory and runtime efficient numerical algorithms. Moreover, CoLA provides memory efficient automatic differentiation, low precision computation, and GPU acceleration in both JAX and PyTorch, while also accommodating new objects, operations, and rules in downstream packages via multiple dispatch. CoLA can accelerate many algebraic operations, while making it easy to prototype matrix structures and algorithms, providing an appealing drop-in tool for virtually any computational effort that requires linear algebra. We showcase its efficacy across a broad range of applications, including partial differential equations, Gaussian processes, equivariant model construction, and unsupervised learning.
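A tiny sketch of the compositional-dispatch idea (not CoLA's actual API): operators carry their structure, and a `solve` routine dispatches on that structure, e.g. solving with a Kronecker product by only factorizing its much smaller factors. Class and function names are invented for illustration.

```python
import numpy as np

class Dense:
    """A plain dense linear operator."""
    def __init__(self, A):
        self.A = A

class Kronecker:
    """A structured operator representing left ⊗ right without materializing it."""
    def __init__(self, left, right):
        self.left, self.right = left, right

def solve(op, b):
    """Structure-aware solve that dispatches on the operator's type."""
    if isinstance(op, Dense):
        return np.linalg.solve(op.A, b)
    if isinstance(op, Kronecker):
        # With row-major vec, (A ⊗ B) vec(X) = vec(A X B^T),
        # so x = vec(A^{-1} reshape(b) B^{-T}) -- only the small factors are solved.
        A, B = op.left.A, op.right.A
        n, m = A.shape[0], B.shape[0]
        Y = np.linalg.solve(A, b.reshape(n, m))
        X = np.linalg.solve(B, Y.T).T
        return X.reshape(-1)
    raise NotImplementedError(f"no solve rule for {type(op)}")

rng = np.random.default_rng(0)
A = rng.random((4, 4)) + 4 * np.eye(4)
B = rng.random((3, 3)) + 3 * np.eye(3)
b = rng.random(12)
x_fast = solve(Kronecker(Dense(A), Dense(B)), b)
x_ref = np.linalg.solve(np.kron(A, B), b)        # dense reference, for checking only
assert np.allclose(x_fast, x_ref)
```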

Automated CVE Analysis for Threat Prioritization and Impact Prediction

  • paper_url: http://arxiv.org/abs/2309.03040
  • repo_url: None
  • paper_authors: Ehsan Aghaei, Ehab Al-Shaer, Waseem Shadid, Xi Niu
  • For: This paper aims to improve the efficiency and accuracy of CVE analysis and threat prioritization by introducing a novel predictive model and tool called CVEDrill.
  • Methods: CVEDrill uses machine learning algorithms to estimate the CVSS vector for precise threat mitigation and priority ranking, and it also automates the classification of CVEs into the appropriate CWE hierarchy classes.
  • Results: CVEDrill outperforms state-of-the-art tools like ChatGPT in terms of accuracy and timeliness, allowing organizations to implement cybersecurity countermeasure mitigation with unparalleled effectiveness.
    Abstract The Common Vulnerabilities and Exposures (CVE) are pivotal information for proactive cybersecurity measures, including service patching, security hardening, and more. However, CVEs typically offer low-level, product-oriented descriptions of publicly disclosed cybersecurity vulnerabilities, often lacking the essential attack semantic information required for comprehensive weakness characterization and threat impact estimation. This critical insight is essential for CVE prioritization and the identification of potential countermeasures, particularly when dealing with a large number of CVEs. Current industry practices involve manual evaluation of CVEs to assess their attack severities using the Common Vulnerability Scoring System (CVSS) and mapping them to Common Weakness Enumeration (CWE) for potential mitigation identification. Unfortunately, this manual analysis presents a major bottleneck in the vulnerability analysis process, leading to slowdowns in proactive cybersecurity efforts and the potential for inaccuracies due to human errors. In this research, we introduce our novel predictive model and tool (called CVEDrill) which revolutionizes CVE analysis and threat prioritization. CVEDrill accurately estimates the CVSS vector for precise threat mitigation and priority ranking and seamlessly automates the classification of CVEs into the appropriate CWE hierarchy classes. By harnessing CVEDrill, organizations can now implement cybersecurity countermeasure mitigation with unparalleled accuracy and timeliness, surpassing in this domain the capabilities of state-of-the-art tools like ChaptGPT.

Deep Learning for Polycystic Kidney Disease: Utilizing Neural Networks for Accurate and Early Detection through Gene Expression Analysis

  • paper_url: http://arxiv.org/abs/2309.03033
  • repo_url: None
  • paper_authors: Kapil Panda, Anirudh Mazumder
  • for: Early detection of polycystic kidney disease (PKD) to enable effective management of the condition.
  • methods: A deep learning approach that analyses patient gene expression to achieve accurate and robust disease detection.
  • results: The proposed neural network model can accurately predict which patients are likely to have PKD.
    Abstract With Polycystic Kidney Disease (PKD) potentially leading to fatal complications in patients due to the formation of cysts in the kidneys, early detection of PKD is crucial for effective management of the condition. However, the various patient-specific factors that play a role in the diagnosis make it an intricate puzzle for clinicians to solve. Therefore, in this study, we aim to utilize a deep learning-based approach for early disease detection. The devised neural network can achieve accurate and robust predictions for possible PKD in patients by analyzing patient gene expressions.

Amortised Inference in Bayesian Neural Networks

  • paper_url: http://arxiv.org/abs/2309.03018
  • repo_url: https://github.com/sheev13/bnn_amort_inf
  • paper_authors: Tommy Rochussen
  • for: To propose a more data-efficient approach to probabilistic meta-learning for making predictions when the amount of data is limited.
  • methods: Bayesian neural networks with per-datapoint amortisation of inference, yielding the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN).
  • results: On a one-dimensional regression problem and a significantly more complex image completion task, the method achieves the best predictive performance in its class when the amount of training data is limited.
    Abstract Meta-learning is a framework in which machine learning models train over a set of datasets in order to produce predictions on new datasets at test time. Probabilistic meta-learning has received an abundance of attention from the research community in recent years, but a problem shared by many existing probabilistic meta-models is that they require a very large number of datasets in order to produce high-quality predictions with well-calibrated uncertainty estimates. In many applications, however, such quantities of data are simply not available. In this dissertation we present a significantly more data-efficient approach to probabilistic meta-learning through per-datapoint amortisation of inference in Bayesian neural networks, introducing the Amortised Pseudo-Observation Variational Inference Bayesian Neural Network (APOVI-BNN). First, we show that the approximate posteriors obtained under our amortised scheme are of similar or better quality to those obtained through traditional variational inference, despite the fact that the amortised inference is performed in a single forward pass. We then discuss how the APOVI-BNN may be viewed as a new member of the neural process family, motivating the use of neural process training objectives for potentially better predictive performance on complex problems as a result. Finally, we assess the predictive performance of the APOVI-BNN against other probabilistic meta-models in both a one-dimensional regression problem and in a significantly more complex image completion setting. In both cases, when the amount of training data is limited, our model is the best in its class.

SymED: Adaptive and Online Symbolic Representation of Data on the Edge

  • paper_url: http://arxiv.org/abs/2309.03014
  • repo_url: None
  • paper_authors: Daniel Hofstätter, Shashikant Ilager, Ivan Lujic, Ivona Brandic
  • for: To process data generated by Internet of Things (IoT) devices close to its source and to address the challenges of transferring, storing, and processing this rapidly growing amount of data on resource-constrained edge devices.
  • methods: Symbolic representation (SR) converts the raw data into symbols, enabling data analytics (e.g., anomaly detection and trend prediction) directly on symbols and benefiting large classes of edge applications; SymED realizes this in an online, adaptive, and distributed way on the edge.
  • results: SymED (i) reduces the raw data with an average compression rate of 9.5%; (ii) keeps the reconstruction error low at 13.25 in the DTW space; and (iii) provides real-time adaptability for online streaming IoT data at typical latencies of 42 ms per symbol, reducing the overall network traffic.
    Abstract The edge computing paradigm helps handle the Internet of Things (IoT) generated data in proximity to its source. Challenges occur in transferring, storing, and processing this rapidly growing amount of data on resource-constrained edge devices. Symbolic Representation (SR) algorithms are promising solutions to reduce the data size by converting actual raw data into symbols. Also, they allow data analytics (e.g., anomaly detection and trend prediction) directly on symbols, benefiting large classes of edge applications. However, existing SR algorithms are centralized in design and work offline with batch data, which is infeasible for real-time cases. We propose SymED - Symbolic Edge Data representation method, i.e., an online, adaptive, and distributed approach for symbolic representation of data on edge. SymED is based on the Adaptive Brownian Bridge-based Aggregation (ABBA), where we assume low-powered IoT devices do initial data compression (senders) and the more robust edge devices do the symbolic conversion (receivers). We evaluate SymED by measuring compression performance, reconstruction accuracy through Dynamic Time Warping (DTW) distance, and computational latency. The results show that SymED is able to (i) reduce the raw data with an average compression rate of 9.5%; (ii) keep a low reconstruction error of 13.25 in the DTW space; (iii) simultaneously provide real-time adaptability for online streaming IoT data at typical latencies of 42ms per symbol, reducing the overall network traffic.

Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2309.03004
  • repo_url: None
  • paper_authors: Ze Peng, Lei Qi, Yinghuan Shi, Yang Gao
  • for: To explain the origin of activation sparsity and how it relates, via gradient sparsity, to adversarial robustness.
  • methods: The paper introduces the notion of gradient sparsity as the source of activation sparsity and uses random matrix theory (RMT) to analyse stochastic gradient noise and the emergence of spectral concentration.
  • results: The paper proposes two plug-and-play modules and one radical modification to improve the sparsity and flatness of models, and verifies the effectiveness of these modifications experimentally.
    Abstract A recent empirical observation of activation sparsity in MLP layers offers an opportunity to drastically reduce computation costs for free. Despite several works attributing it to training dynamics, the theoretical explanation of activation sparsity's emergence is restricted to shallow networks, small training steps well as modified training, even though the sparsity has been found in deep models trained by vanilla protocols for large steps. To fill the three gaps, we propose the notion of gradient sparsity as the source of activation sparsity and a theoretical explanation based on it that explains gradient sparsity and then activation sparsity as necessary steps to adversarial robustness w.r.t. hidden features and parameters, which is approximately the flatness of minima for well-learned models. The theory applies to standardly trained LayerNorm-ed pure MLPs, and further to Transformers or other architectures if noises are added to weights during training. To eliminate other sources of flatness when arguing sparsities' necessity, we discover the phenomenon of spectral concentration, i.e., the ratio between the largest and the smallest non-zero singular values of weight matrices is small. We utilize random matrix theory (RMT) as a powerful theoretical tool to analyze stochastic gradient noises and discuss the emergence of spectral concentration. With these insights, we propose two plug-and-play modules for both training from scratch and sparsity finetuning, as well as one radical modification that only applies to from-scratch training. Another under-testing module for both sparsity and flatness is also immediate from our theories. Validational experiments are conducted to verify our explanation. Experiments for productivity demonstrate modifications' improvement in sparsity, indicating further theoretical cost reduction in both training and inference.
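The two quantities the abstract reasons about are easy to measure directly. The sketch below computes activation sparsity (fraction of zero post-ReLU activations) and the spectral-concentration ratio (largest over smallest non-zero singular value of a weight matrix) for a random, untrained MLP layer, purely as an illustration of the definitions; sizes and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, n = 256, 1024, 512

W = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)   # one MLP layer's weights
X = rng.normal(size=(n, d_in))                          # a batch of inputs

act = np.maximum(X @ W.T, 0.0)                          # ReLU activations
activation_sparsity = np.mean(act == 0.0)               # fraction of zero activations

s = np.linalg.svd(W, compute_uv=False)
s_nonzero = s[s > 1e-12]
spectral_concentration = s_nonzero.max() / s_nonzero.min()

print(f"activation sparsity: {activation_sparsity:.2f}")
print(f"sigma_max / sigma_min: {spectral_concentration:.2f}")
```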

Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models

  • paper_url: http://arxiv.org/abs/2309.02976
  • repo_url: None
  • paper_authors: Pierre Schumacher, Thomas Geijtenbeek, Vittorio Caggiano, Vikash Kumar, Syn Schmitt, Georg Martius, Daniel F. B. Haeufle
  • for: The paper aims to develop a reinforcement learning (RL) method for natural bipedal walking without relying on extensive expert demonstrations.
  • methods: The paper uses RL to learn a controller that can generate human-like walking with bipedal biomechanical models in complex natural environments.
  • results: The paper achieves natural locomotion with RL without sacrificing robustness, paving the way for a novel approach to studying human walking in complex natural environments.
    Abstract Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning~(RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl

On the Impact of Feeding Cost Risk in Aquaculture Valuation and Decision Making

  • paper_url: http://arxiv.org/abs/2309.02970
  • repo_url: https://github.com/kevinkamm/aquaculturestochasticfeeding
  • paper_authors: Christian Oliver Ewald, Kevin Kamm
  • For: Studies the effect of stochastic feeding costs on animal-based commodities, with a particular focus on aquaculture.
  • Methods: Uses soybean futures to infer the stochastic behaviour of salmon feed, which is assumed to follow a Schwartz-2-factor model; compares harvesting decisions for salmon under decision rules that assume either deterministic or stochastic feeding costs, and uses deep neural networks to infer the decision boundary.
  • Results: In some cases, accounting for stochastic feeding costs leads to significant improvements, while in others deterministic feeding costs are a good enough proxy; in all cases the newly derived rules show superior performance at negligible additional computational cost, and the deep classifier improves on regression-based and curve-fitting methods and scales well to higher-dimensional problems.
    Abstract We study the effect of stochastic feeding costs on animal-based commodities with particular focus on aquaculture. More specifically, we use soybean futures to infer on the stochastic behaviour of salmon feed, which we assume to follow a Schwartz-2-factor model. We compare the decision of harvesting salmon using a decision rule assuming either deterministic or stochastic feeding costs, i.e. including feeding cost risk. We identify cases, where accounting for stochastic feeding costs leads to significant improvements as well as cases where deterministic feeding costs are a good enough proxy. Nevertheless, in all of these cases, the newly derived rules show superior performance, while the additional computational costs are negligible. From a methodological point of view, we demonstrate how to use Deep-Neural-Networks to infer on the decision boundary that determines harvesting or continuation, improving on more classical regression-based and curve-fitting methods. To achieve this we use a deep classifier, which not only improves on previous results but also scales well for higher dimensional problems, and in addition mitigates effects due to model uncertainty, which we identify in this article.
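
A rough, self-contained sketch of the deep-classifier idea (labelling states as "harvest" or "continue"), not the authors' implementation: the state variables, the toy labelling rule, and all parameter values below are invented purely for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Hypothetical state: (time to maturity, salmon price, feeding cost), all normalised.
X = rng.uniform(0.0, 1.0, size=(5000, 3))

# Toy stand-in for the harvest/continue label: harvest when the immediate payoff
# (price minus remaining feeding cost) exceeds a rough continuation value.
immediate = X[:, 1] - X[:, 2] * X[:, 0]
continuation = 0.9 * X[:, 1]
y = (immediate > continuation).astype(int)  # 1 = harvest, 0 = continue

# Deep classifier approximating the decision boundary between the two regions.
clf = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
clf.fit(X, y)

# Query the learned boundary at a new state.
state = np.array([[0.5, 0.8, 0.3]])
print("harvest" if clf.predict(state)[0] == 1 else "continue")
```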

CR-VAE: Contrastive Regularization on Variational Autoencoders for Preventing Posterior Collapse

  • paper_url: http://arxiv.org/abs/2309.02968
  • repo_url: None
  • paper_authors: Fotios Lygerakis, Elmar Rueckert
  • for: Addressing posterior collapse in the Variational Autoencoder (VAE), where the latent representations generated by the model become independent of the inputs.
  • methods: Proposes Contrastive Regularization for Variational Autoencoders (CR-VAE), which augments the VAE with a contrastive objective that maximizes the mutual information between the representations of similar inputs, so that the information flow between an input and its latent representation is preserved and posterior collapse is avoided.
  • results: Evaluated on a series of visual datasets, CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.
    Abstract The Variational Autoencoder (VAE) is known to suffer from the phenomenon of \textit{posterior collapse}, where the latent representations generated by the model become independent of the inputs. This leads to degenerated representations of the input, which is attributed to the limitations of the VAE's objective function. In this work, we propose a novel solution to this issue, the Contrastive Regularization for Variational Autoencoders (CR-VAE). The core of our approach is to augment the original VAE with a contrastive objective that maximizes the mutual information between the representations of similar visual inputs. This strategy ensures that the information flow between the input and its latent representation is maximized, effectively avoiding posterior collapse. We evaluate our method on a series of visual datasets and demonstrate, that CR-VAE outperforms state-of-the-art approaches in preventing posterior collapse.
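
Not the paper's exact objective, but a generic sketch of the idea: add an InfoNCE-style contrastive term, computed on latent codes of two views of the same inputs, to a standard VAE loss. The MSE reconstruction term, the temperature, and the weighting factor lambda_c are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Standard VAE objective: reconstruction error plus KL divergence."""
    recon = F.mse_loss(x_recon, x, reduction="sum") / x.size(0)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + kl

def contrastive_term(z1, z2, temperature=0.1):
    """InfoNCE-style term: latent codes of two views of the same input should be
    more similar to each other than to codes of other inputs in the batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))        # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Combined objective, with lambda_c an assumed trade-off weight:
# loss = vae_loss(x, x_recon, mu, logvar) + lambda_c * contrastive_term(z_view1, z_view2)
```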

EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System

  • paper_url: http://arxiv.org/abs/2309.03246
  • repo_url: https://github.com/simula-complex/evoclinical
  • paper_authors: Chengjie Lu, Qinghua Xu, Tao Yue, Shaukat Ali, Thomas Schwitalla, Jan F. Nygård
  • for: Keeping the cyber-cyber digital twin (CCDT) of GURI, the automated cancer registry system of the Cancer Registry of Norway, synchronized with evolving GURI versions, so that it remains a reliable basis for cancer research and cancer-related statistics.
  • methods: Proposes EvoCLINICAL, which treats the CCDT developed for the previous GURI version as a pretrained model and fine-tunes it on data labelled by querying the new GURI version; a genetic algorithm selects an optimal subset of cancer messages from a candidate dataset to query.
  • results: Evaluated on three evolution processes, EvoCLINICAL achieves precision, recall, and F1 scores all above 91%; replacing its active learning component with random selection shows that active learning consistently improves the overall performance.
    Abstract The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from different medical entities (e.g., medical labs, and hospitals) in Norway. Such messages are validated by an automated cancer registry system: GURI. Its correct operation is crucial since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of the operational state of GURI without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatment, technological advances, etc. Accordingly, CCDT should evolve as well to synchronize with GURI. A key challenge of achieving such synchronization is that evolving CCDT needs abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which considers the CCDT developed for the previous version of GURI as the pretrained model and fine-tunes it with the dataset labelled by querying a new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and query GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of transfer learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL increases its performances consistently.

A hybrid quantum-classical fusion neural network to improve protein-ligand binding affinity predictions for drug discovery

  • paper_url: http://arxiv.org/abs/2309.03919
  • repo_url: None
  • paper_authors: S. Banerjee, S. He Yuxun, S. Konakanchi, L. Ogunfowora, S. Roy, S. Selvaras, L. Domingo, M. Chehimi, M. Djukic, C. Johnson
  • for: Accurate prediction of the binding affinity between prospective drug molecules and target proteins in drug discovery, especially when those proteins directly influence disease progression.
  • methods: A hybrid quantum-classical machine learning (QML) model that integrates 3D and spatial graph convolutional neural networks (CNNs) within an optimized quantum architecture.
  • results: Simulation results show a 6% improvement in prediction accuracy over existing classical models, together with significantly more stable convergence.
    Abstract The field of drug discovery hinges on the accurate prediction of binding affinity between prospective drug molecules and target proteins, especially when such proteins directly influence disease progression. However, estimating binding affinity demands significant financial and computational resources. While state-of-the-art methodologies employ classical machine learning (ML) techniques, emerging hybrid quantum machine learning (QML) models have shown promise for enhanced performance, owing to their inherent parallelism and capacity to manage exponential increases in data dimensionality. Despite these advances, existing models encounter issues related to convergence stability and prediction accuracy. This paper introduces a novel hybrid quantum-classical deep learning model tailored for binding affinity prediction in drug discovery. Specifically, the proposed model synergistically integrates 3D and spatial graph convolutional neural networks within an optimized quantum architecture. Simulation results demonstrate a 6% improvement in prediction accuracy relative to existing classical models, as well as a significantly more stable convergence performance compared to previous classical approaches.

GroupEnc: encoder with group loss for global structure preservation

  • paper_url: http://arxiv.org/abs/2309.02917
  • repo_url: None
  • paper_authors: David Novak, Sofie Van Gassen, Yvan Saeys
  • for: Developing a deep learning model, based on the Variational Autoencoder (VAE) and the SQuadMDS algorithm, for dimensionality reduction of high-dimensional data that better supports downstream processing.
  • methods: Uses the notion of structure preservation at both local and global levels to build the GroupEnc encoder model, whose 'group loss' function produces embeddings with less global structure distortion than VAEs, while keeping the model parametric and the architecture flexible.
  • results: Validated on publicly available biological single-cell transcriptomic datasets, using RNX curves for evaluation, showing embeddings with less global structure distortion than those produced by VAEs.
    Abstract Recent advances in dimensionality reduction have achieved more accurate lower-dimensional embeddings of high-dimensional data. In addition to visualisation purposes, these embeddings can be used for downstream processing, including batch effect normalisation, clustering, community detection or trajectory inference. We use the notion of structure preservation at both local and global levels to create a deep learning model, based on a variational autoencoder (VAE) and the stochastic quartet loss from the SQuadMDS algorithm. Our encoder model, called GroupEnc, uses a 'group loss' function to create embeddings with less global structure distortion than VAEs do, while keeping the model parametric and the architecture flexible. We validate our approach using publicly available biological single-cell transcriptomic datasets, employing RNX curves for evaluation.

Ensemble DNN for Age-of-Information Minimization in UAV-assisted Networks

  • paper_url: http://arxiv.org/abs/2309.02913
  • repo_url: None
  • paper_authors: Mouhamed Naby Ndiaye, El Houcine Bergou, Hajar El Hammouti
  • for: Minimizing the expected Age-of-Information (AoI) across devices in UAV-assisted networks by optimizing the UAVs' stopping locations and the device selection probabilities.
  • methods: First derives a closed-form expression for the expected AoI in terms of the device selection probabilities, then formulates the problem as a non-convex minimization under quality-of-service constraints and solves it with an Ensemble Deep Neural Network (EDNN) approach in which the DNNs are trained in an unsupervised manner using the Lagrangian function of the problem.
  • results: Experiments show that the proposed EDNN method outperforms traditional DNNs in reducing the expected AoI, achieving a remarkable reduction of 29.5%.
    Abstract This paper addresses the problem of Age-of-Information (AoI) in UAV-assisted networks. Our objective is to minimize the expected AoI across devices by optimizing UAVs' stopping locations and device selection probabilities. To tackle this problem, we first derive a closed-form expression of the expected AoI that involves the probabilities of selection of devices. Then, we formulate the problem as a non-convex minimization subject to quality of service constraints. Since the problem is challenging to solve, we propose an Ensemble Deep Neural Network (EDNN) based approach which takes advantage of the dual formulation of the studied problem. Specifically, the Deep Neural Networks (DNNs) in the ensemble are trained in an unsupervised manner using the Lagrangian function of the studied problem. Our experiments show that the proposed EDNN method outperforms traditional DNNs in reducing the expected AoI, achieving a remarkable reduction of $29.5\%$.

A Multimodal Learning Framework for Comprehensive 3D Mineral Prospectivity Modeling with Jointly Learned Structure-Fluid Relationships

  • paper_url: http://arxiv.org/abs/2309.02911
  • repo_url: None
  • paper_authors: Yang Zheng, Hao Deng, Ruisheng Wang, Jingjie Wu
  • for: Developing a novel multimodal fusion model for three-dimensional mineral prospectivity mapping (3D MPM) that effectively integrates structural and fluid information through a deep network architecture.
  • methods: Combines Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs), and uses canonical correlation analysis (CCA) to align and fuse the multimodal features.
  • results: Rigorous evaluation on the Jiaojia gold deposit dataset shows superior performance in distinguishing ore-bearing instances and predicting mineral prospectivity compared with other models; ablation studies confirm the benefits of joint feature utilization and of incorporating CCA.
    Abstract This study presents a novel multimodal fusion model for three-dimensional mineral prospectivity mapping (3D MPM), effectively integrating structural and fluid information through a deep network architecture. Leveraging Convolutional Neural Networks (CNN) and Multilayer Perceptrons (MLP), the model employs canonical correlation analysis (CCA) to align and fuse multimodal features. Rigorous evaluation on the Jiaojia gold deposit dataset demonstrates the model's superior performance in distinguishing ore-bearing instances and predicting mineral prospectivity, outperforming other models in result analyses. Ablation studies further reveal the benefits of joint feature utilization and CCA incorporation. This research not only advances mineral prospectivity modeling but also highlights the pivotal role of data integration and feature alignment for enhanced exploration decision-making.
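
As a small illustration of the CCA-based alignment step mentioned above (not the paper's full CNN/MLP pipeline), the sketch below aligns two hypothetical feature modalities with scikit-learn's CCA and concatenates the aligned components; the feature dimensions, the random stand-in features, and the concatenation-based fusion are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two modalities: per-sample structural features
# and fluid-related features extracted by separate networks (dimensions assumed).
structural = rng.normal(size=(1000, 32))
fluid = rng.normal(size=(1000, 16))

# CCA finds paired projections that maximise the correlation between modalities.
cca = CCA(n_components=8)
struct_c, fluid_c = cca.fit_transform(structural, fluid)

# A simple fusion: concatenate the aligned components for a downstream classifier.
fused = np.concatenate([struct_c, fluid_c], axis=1)
print(fused.shape)  # (1000, 16)
```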

Testing properties of distributions in the streaming model

  • paper_url: http://arxiv.org/abs/2309.03245
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Sampriti Roy, Yadu Vasudev
  • for: Studies distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded.
  • methods: Samples appear in an online fashion, and the goal is to test properties of the distribution using an optimal number of samples subject to a constraint on how many samples can be stored at any given time.
  • results: Provides a trade-off between sample complexity and space complexity for identity testing under the conditional access oracle, shows how to learn a succinct representation of a monotone distribution efficiently under a near-optimal memory constraint, and extends the algorithm to a larger class of decomposable distributions.
    Abstract We study distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded. In both scenarios, the samples appear in an online fashion and the goal is to test the properties of distribution using an optimal number of samples subject to a memory constraint on how many samples can be stored at a given time. First, we provide a trade-off between the sample complexity and the space complexity for testing identity when the samples are drawn according to the conditional access oracle. We then show that we can learn a succinct representation of a monotone distribution efficiently with a memory constraint on the number of samples that are stored that is almost optimal. We also show that the algorithm for monotone distributions can be extended to a larger class of decomposable distributions.

Non-Clashing Teaching Maps for Balls in Graphs

  • paper_url: http://arxiv.org/abs/2309.02876
  • repo_url: None
  • paper_authors: Jérémie Chalopin, Victor Chepoi, Fionn Mc Inerney, Sébastien Ratel
  • for: This paper is written to study non-clashing teaching and its applications in machine learning.
  • methods: The paper uses techniques from teaching and learning, including the concept of non-clashing teaching maps and the decision problem of non-clashing teaching dimension.
  • results: The paper shows that the decision problem of the non-clashing teaching dimension for balls of a graph is NP-complete, and derives upper and lower bounds on the size of non-clashing teaching maps for various types of graphs.
    Abstract Recently, Kirkpatrick et al. [ALT 2019] and Fallat et al. [JMLR 2023] introduced non-clashing teaching and showed it to be the most efficient machine teaching model satisfying the benchmark for collusion-avoidance set by Goldman and Mathias. A teaching map $T$ for a concept class $\cal{C}$ assigns a (teaching) set $T(C)$ of examples to each concept $C \in \cal{C}$. A teaching map is non-clashing if no pair of concepts are consistent with the union of their teaching sets. The size of a non-clashing teaching map (NCTM) $T$ is the maximum size of a $T(C)$, $C \in \cal{C}$. The non-clashing teaching dimension NCTD$(\cal{C})$ of $\cal{C}$ is the minimum size of an NCTM for $\cal{C}$. NCTM$^+$ and NCTD$^+(\cal{C})$ are defined analogously, except the teacher may only use positive examples. We study NCTMs and NCTM$^+$s for the concept class $\mathcal{B}(G)$ consisting of all balls of a graph $G$. We show that the associated decision problem {\sc B-NCTD$^+$} for NCTD$^+$ is NP-complete in split, co-bipartite, and bipartite graphs. Surprisingly, we even prove that, unless the ETH fails, {\sc B-NCTD$^+$} does not admit an algorithm running in time $2^{2^{o(vc)}}\cdot n^{O(1)}$, nor a kernelization algorithm outputting a kernel with $2^{o(vc)}$ vertices, where vc is the vertex cover number of $G$. These are extremely rare results: it is only the second (fourth, resp.) problem in NP to admit a double-exponential lower bound parameterized by vc (treewidth, resp.), and only one of very few problems to admit an ETH-based conditional lower bound on the number of vertices in a kernel. We complement these lower bounds with matching upper bounds. For trees, interval graphs, cycles, and trees of cycles, we derive NCTM$^+$s or NCTMs for $\mathcal{B}(G)$ of size proportional to its VC-dimension. For Gromov-hyperbolic graphs, we design an approximate NCTM$^+$ for $\mathcal{B}(G)$ of size 2.

Learning Hybrid Dynamics Models With Simulator-Informed Latent States

  • paper_url: http://arxiv.org/abs/2309.02873
  • repo_url: None
  • paper_authors: Katharina Ensinger, Sebastian Ziesche, Sebastian Trimpe
  • for: Proposes a new hybrid modeling approach that combines learned dynamics models with first-principles simulators so that predictions remain physically meaningful.
  • methods: Informs the latent states of a learned model via a black-box simulator by leveraging observers, a well-known concept from control theory; the dynamics and an observer that infers the latent states via the simulator are learned jointly, so the simulator constantly corrects the latent states and compensates for modeling mismatch, while an RNN-based residuum covers the part of the latent state that the simulator cannot inform.
  • results: The method keeps predictions from accumulating errors over time, yielding more accurate and physically meaningful predictions while maintaining the flexibility of the learned model.
    Abstract Dynamics model learning deals with the task of inferring unknown dynamics from measurement data and predicting the future behavior of the system. A typical approach to address this problem is to train recurrent models. However, predictions with these models are often not physically meaningful. Further, they suffer from deteriorated behavior over time due to accumulating errors. Often, simulators building on first principles are available being physically meaningful by design. However, modeling simplifications typically cause inaccuracies in these models. Consequently, hybrid modeling is an emerging trend that aims to combine the best of both worlds. In this paper, we propose a new approach to hybrid modeling, where we inform the latent states of a learned model via a black-box simulator. This allows to control the predictions via the simulator preventing them from accumulating errors. This is especially challenging since, in contrast to previous approaches, access to the simulator's latent states is not available. We tackle the task by leveraging observers, a well-known concept from control theory, inferring unknown latent states from observations and dynamics over time. In our learning-based setting, we jointly learn the dynamics and an observer that infers the latent states via the simulator. Thus, the simulator constantly corrects the latent states, compensating for modeling mismatch caused by learning. To maintain flexibility, we train an RNN-based residuum for the latent states that cannot be informed by the simulator.
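
A cartoon of the observer idea under strong simplifying assumptions (linear latent dynamics, a known observation map, a fixed gain, and a toy "simulator"); the paper learns these components jointly, so this only shows how a simulator output can correct a drifting latent state.

```python
import numpy as np

# Toy stand-ins (all assumed): "learned" latent dynamics A, observation map C that
# maps the latent state to the quantity the black-box simulator also produces,
# and a fixed observer gain L that corrects the latent state with the simulator output.
A = np.array([[0.9, 0.1], [0.0, 0.95]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.5], [0.2]])

def simulator(t):
    """Black-box simulator output at step t (toy signal)."""
    return np.array([np.sin(0.1 * t)])

z = np.zeros(2)  # latent state
for t in range(100):
    y_sim = simulator(t)
    # Observer update: predict with the learned dynamics, then correct the latent
    # state using the mismatch between the simulator output and the model's own output.
    z = A @ z + (L @ (y_sim - C @ z)).ravel()

print(z)
```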

On Reducing Undesirable Behavior in Deep Reinforcement Learning Models

  • paper_url: http://arxiv.org/abs/2309.02869
  • repo_url: None
  • paper_authors: Ophir M. Carmel, Guy Katz
  • for: Drastically reducing the undesirable behavior of deep reinforcement learning (DRL) based software while maintaining its performance, and giving engineers a comprehensible characterization of that undesirable behavior.
  • methods: Proposes a framework that extracts decision tree classifiers from erroneous state-action pairs and integrates these trees into the DRL training loop, penalizing the system whenever it performs an error.
  • results: A proof-of-concept implementation evaluated on three significant case studies extends existing frameworks in a straightforward manner, incurs only a slight training-time overhead and only a very slight hit to performance (in some cases even improving it), while significantly reducing the frequency of undesirable behavior.
    Abstract Deep reinforcement learning (DRL) has proven extremely useful in a large variety of application domains. However, even successful DRL-based software can exhibit highly undesirable behavior. This is due to DRL training being based on maximizing a reward function, which typically captures general trends but cannot precisely capture, or rule out, certain behaviors of the system. In this paper, we propose a novel framework aimed at drastically reducing the undesirable behavior of DRL-based software, while maintaining its excellent performance. In addition, our framework can assist in providing engineers with a comprehensible characterization of such undesirable behavior. Under the hood, our approach is based on extracting decision tree classifiers from erroneous state-action pairs, and then integrating these trees into the DRL training loop, penalizing the system whenever it performs an error. We provide a proof-of-concept implementation of our approach, and use it to evaluate the technique on three significant case studies. We find that our approach can extend existing frameworks in a straightforward manner, and incurs only a slight overhead in training time. Further, it incurs only a very slight hit to performance, or even in some cases - improves it, while significantly reducing the frequency of undesirable behavior.
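
A minimal sketch of the mechanism described in the abstract, extracting a decision tree from erroneous state-action pairs and using it to penalize the agent; the states, actions, and error condition are made up, and the integration into an actual DRL training loop is omitted.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical logged data: states paired with a flag marking the (state, action)
# pairs that exhibited the undesirable behavior (dimensions and labels assumed).
states = rng.normal(size=(2000, 4))
actions = rng.integers(0, 3, size=2000)
undesirable = (states[:, 0] > 1.0) & (actions == 2)  # toy error condition

# Fit a shallow, human-readable tree that predicts whether a state-action pair is erroneous.
X = np.column_stack([states, actions])
tree = DecisionTreeClassifier(max_depth=3).fit(X, undesirable)

def shaped_reward(state, action, reward, penalty=1.0):
    """Subtract a penalty whenever the extracted tree flags the pair as erroneous."""
    flagged = tree.predict(np.append(state, action).reshape(1, -1))[0]
    return reward - penalty * float(flagged)

print(shaped_reward(np.array([1.5, 0.0, 0.0, 0.0]), 2, reward=1.0))  # reduced if flagged
```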

Enhancing Event Sequence Modeling with Contrastive Relational Inference

  • paper_url: http://arxiv.org/abs/2309.02868
  • repo_url: None
  • paper_authors: Yan Wang, Zhixuan Chu, Tao Zhou, Caigao Jiang, Hongyan Hao, Minjie Zhu, Xindong Cai, Qing Cui, Longfei Li, James Y Zhang, Siqiao Xue, Jun Zhou
  • for: Modeling continuous-time event sequences, and in particular capturing the interactions between events, which is critical for inference tasks such as forecasting on event sequence data.
  • methods: Leverages Neural Relational Inference (NRI) to learn a relation graph that infers event interactions while simultaneously learning the dynamics patterns from observational data; the resulting Contrastive Relational Inference-based Hawkes Process (CRIHP) reasons about event interactions under a variational inference framework and uses intensity-based learning to search for prototype paths that contrast relationship constraints.
  • results: Extensive experiments on three real-world datasets demonstrate the effectiveness of the model in capturing event interactions for event sequence modeling tasks.
    Abstract Neural temporal point processes(TPPs) have shown promise for modeling continuous-time event sequences. However, capturing the interactions between events is challenging yet critical for performing inference tasks like forecasting on event sequence data. Existing TPP models have focused on parameterizing the conditional distribution of future events but struggle to model event interactions. In this paper, we propose a novel approach that leverages Neural Relational Inference (NRI) to learn a relation graph that infers interactions while simultaneously learning the dynamics patterns from observational data. Our approach, the Contrastive Relational Inference-based Hawkes Process (CRIHP), reasons about event interactions under a variational inference framework. It utilizes intensity-based learning to search for prototype paths to contrast relationship constraints. Extensive experiments on three real-world datasets demonstrate the effectiveness of our model in capturing event interactions for event sequence modeling tasks.

A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques

  • paper_url: http://arxiv.org/abs/2309.02854
  • repo_url: https://github.com/ait-aecid/anomaly-detection-log-datasets
  • paper_authors: Max Landauer, Florian Skopik, Markus Wurzenberger
  • for: Analyzes six publicly available log data sets that are commonly used to evaluate sequence-based anomaly detection techniques, focusing on how anomalies manifest in them.
  • methods: Examines the manifestations of anomalies in these data sets and evaluates simple detection techniques against them, in comparison with deep-learning-based sequential anomaly detection approaches.
  • results: Finds that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
    Abstract Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques, however, the appropriateness of these data sets has not been closely investigated in the past. In this paper we therefore analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.
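
To make "simple techniques" concrete, here is one toy detector in that spirit (not taken from the paper): flag any log sequence containing an event type unseen in the normal training data; the event names and sequences are invented.

```python
# Training sequences assumed to contain only normal behavior.
train_sequences = [
    ["open", "read", "close"],
    ["open", "write", "close"],
]
known_events = {event for seq in train_sequences for event in seq}

def is_anomalous(sequence):
    """Flag a sequence if it contains any event type never seen during training."""
    return any(event not in known_events for event in sequence)

print(is_anomalous(["open", "read", "close"]))      # False
print(is_anomalous(["open", "segfault", "close"]))  # True: unseen event type
```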

Random postprocessing for combinatorial Bayesian optimization

  • paper_url: http://arxiv.org/abs/2309.02842
  • repo_url: None
  • paper_authors: Keisuke Morita, Yoshihiko Nishikawa, Masayuki Ohzeki
  • for: Solving discrete “black-box” optimization problems.
  • methods: Bayesian optimization with a postprocessing step that strictly prohibits duplicated samples in the dataset.
  • results: Significantly reduces the number of sequential steps needed to find the global optimum, especially when the acquisition function is of the maximum a posteriori estimation type.
    Abstract Model-based sequential approaches to discrete "black-box" optimization, including Bayesian optimization techniques, often access the same points multiple times for a given objective function in interest, resulting in many steps to find the global optimum. Here, we numerically study the effect of a postprocessing method on Bayesian optimization that strictly prohibits duplicated samples in the dataset. We find the postprocessing method significantly reduces the number of sequential steps to find the global optimum, especially when the acquisition function is of maximum a posterior estimation. Our results provide a simple but general strategy to solve the slow convergence of Bayesian optimization for high-dimensional problems.
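
A toy sketch of the postprocessing idea, never re-query a point that is already in the dataset, on a small discrete domain; the objective, the candidate pool, and the crude acquisition stand-in below are all assumptions and not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete "black-box" objective over binary strings (made up for illustration).
target = np.array([1, 0, 1, 1, 0])
def objective(x):
    return -np.sum((x - target) ** 2)

candidates = rng.integers(0, 2, size=(200, 5))
observed_X, observed_y, seen = [], [], set()

def acquisition(x):
    """Crude stand-in for a model-based acquisition score (nearest observed value
    minus a distance penalty); a real setup would use a proper surrogate model."""
    if not observed_X:
        return 0.0
    d = np.array([np.abs(x - xo).sum() for xo in observed_X])
    return observed_y[int(d.argmin())] - 0.1 * d.min()

for _ in range(20):
    order = np.argsort([-acquisition(c) for c in candidates])
    # Postprocessing: walk down the ranking and take the best candidate that has
    # not been sampled yet, so no query of the objective is wasted on a duplicate.
    x = next(candidates[i] for i in order if tuple(candidates[i]) not in seen)
    seen.add(tuple(x))
    observed_X.append(x)
    observed_y.append(objective(x))

print(max(observed_y))  # best value found so far
```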

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

  • paper_url: http://arxiv.org/abs/2309.02836
  • repo_url: https://github.com/sony/bigvsan_eval
  • paper_authors: Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji
  • for: Achieving high-fidelity audio synthesis with generative adversarial network (GAN) based vocoders.
  • methods: Applies the slicing adversarial network (SAN) training framework, which can find the optimal projection for discriminating real from fake data in the feature space, and proposes a scheme to modify the least-squares GAN losses that most GAN-based vocoders adopt so that they satisfy SAN's requirements.
  • results: Experiments show that SAN improves the performance of GAN-based vocoders, including BigVGAN, with small modifications.
    Abstract Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between real and fake data in the feature space. In the literature, it has been demonstrated that slicing adversarial network (SAN), an improved GAN training framework that can find the optimal projection, is effective in the image generation task. In this paper, we investigate the effectiveness of SAN in the vocoding task. For this purpose, we propose a scheme to modify least-squares GAN, which most GAN-based vocoders adopt, so that their loss functions satisfy the requirements of SAN. Through our experiments, we demonstrate that SAN can improve the performance of GAN-based vocoders, including BigVGAN, with small modifications. Our code is available at https://github.com/sony/bigvsan.

Introducing Thermodynamics-Informed Symbolic Regression – A Tool for Thermodynamic Equations of State Development

  • paper_url: http://arxiv.org/abs/2309.02805
  • repo_url: https://github.com/scoop-group/tisr
  • paper_authors: Viktor Martinek, Ophelia Frotscher, Markus Richter, Roland Herzog
  • for: Researchers and developers interested in creating accurate thermodynamic equations of state (EOS) for various industries and academic applications.
  • methods: Introduces a new symbolic regression (SR) tool called thermodynamics-informed symbolic regression (TiSR), which combines an SR base with extensions to work with scattered experimental data, different residual pre- and post-processing options, and additional features required for thermodynamic EOS development.
  • results: Reports on the current state of TiSR, showcases its progress, and discusses its future directions and potential applications in the field of thermodynamics.
    Abstract Thermodynamic equations of state (EOS) are essential for many industries as well as in academia. Even leaving aside the expensive and extensive measurement campaigns required for the data acquisition, the development of EOS is an intensely time-consuming process, which does often still heavily rely on expert knowledge and iterative fine-tuning. To improve upon and accelerate the EOS development process, we introduce thermodynamics-informed symbolic regression (TiSR), a symbolic regression (SR) tool aimed at thermodynamic EOS modeling. TiSR is already a capable SR tool, which was used in the research of https://doi.org/10.1007/s10765-023-03197-z. It aims to combine an SR base with the extensions required to work with often strongly scattered experimental data, different residual pre- and post-processing options, and additional features required to consider thermodynamic EOS development. Although TiSR is not ready for end users yet, this paper is intended to report on its current state, showcase the progress, and discuss (distant and not so distant) future directions. TiSR is available at https://github.com/scoop-group/TiSR and can be cited as https://doi.org/10.5281/zenodo.8317547.

Dynamic Encoding and Decoding of Information for Split Learning in Mobile-Edge Computing: Leveraging Information Bottleneck Theory

  • paper_url: http://arxiv.org/abs/2309.02787
  • repo_url: None
  • paper_authors: Omar Alhussein, Moshi Wei, Arashmid Akhavain
  • for: Training network functions (such as traffic forecasting) in mobile-edge computing via split learning, a privacy-preserving distributed learning paradigm in which an encoder resides in the user equipment (UE) and a decoder resides in the edge network.
  • methods: Based on the data processing inequality and information bottleneck (IB) theory, presents a framework and training mechanism that dynamically balances transmission resource consumption against the informativeness of the shared latent representations, which directly impacts predictive performance.
  • results: Proposes an encoder-decoder neural network architecture with multiple modes of complexity-relevance trade-offs, enabling tunable performance that can adapt to varying real-time network conditions and application requirements, potentially reducing operational expenditure and enhancing network agility; the mechanism is demonstrated on a millimeter-wave (mmWave) throughput prediction problem, and new insights and challenges for recurrent neural networks are discussed from the IB perspective.
    Abstract Split learning is a privacy-preserving distributed learning paradigm in which an ML model (e.g., a neural network) is split into two parts (i.e., an encoder and a decoder). The encoder shares so-called latent representation, rather than raw data, for model training. In mobile-edge computing, network functions (such as traffic forecasting) can be trained via split learning where an encoder resides in a user equipment (UE) and a decoder resides in the edge network. Based on the data processing inequality and the information bottleneck (IB) theory, we present a new framework and training mechanism to enable a dynamic balancing of the transmission resource consumption with the informativeness of the shared latent representations, which directly impacts the predictive performance. The proposed training mechanism offers an encoder-decoder neural network architecture featuring multiple modes of complexity-relevance tradeoffs, enabling tunable performance. The adaptability can accommodate varying real-time network conditions and application requirements, potentially reducing operational expenditure and enhancing network agility. As a proof of concept, we apply the training mechanism to a millimeter-wave (mmWave)-enabled throughput prediction problem. We also offer new insights and highlight some challenges related to recurrent neural networks from the perspective of the IB theory. Interestingly, we find a compression phenomenon across the temporal domain of the sequential model, in addition to the compression phase that occurs with the number of training epochs.
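
A minimal sketch of the split-learning setup described above: an encoder on the UE produces a latent representation, which is the only thing sent to the edge-side decoder; the layer sizes, latent dimension, and dummy loss are assumptions for illustration.

```python
import torch
import torch.nn as nn

# The model is cut into an encoder (UE side) and a decoder (edge side).
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 8))   # on the UE
decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))    # at the edge

x = torch.randn(16, 32)          # raw measurements stay on the UE
latent = encoder(x)              # compact representation sent over the network
prediction = decoder(latent)     # edge-side network function, e.g. a throughput forecast

# Training updates both parts; encoder gradients flow back through the latent.
loss = prediction.pow(2).mean()
loss.backward()
print(latent.shape, prediction.shape)
```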

CVE-driven Attack Technique Prediction with Semantic Information Extraction and a Domain-specific Language Model

  • paper_url: http://arxiv.org/abs/2309.02785
  • repo_url: None
  • paper_authors: Ehsan Aghaei, Ehab Al-Shaer
  • for: Bridging the gap between vulnerability information represented by Common Vulnerabilities and Exposures (CVEs) and the resulting cyberattack actions (tactics, techniques, and procedures, or TTPs) within the ATT&CK framework.
  • methods: Introduces the TTPpredictor tool, which analyzes CVE description text to infer plausible TTP attacks resulting from CVE exploitation; it first extracts threat actions from unstructured cyber threat reports using Semantic Role Labeling (SRL) and correlates them with MITRE's attack functionality classes to create labeled data.
  • results: Empirical assessment shows accuracy of approximately 98% and F1 scores of 95%-98% in classifying CVEs to ATT&CK techniques, outperforming state-of-the-art language model tools like ChatGPT.
    Abstract This paper addresses a critical challenge in cybersecurity: the gap between vulnerability information represented by Common Vulnerabilities and Exposures (CVEs) and the resulting cyberattack actions. CVEs provide insights into vulnerabilities, but often lack details on potential threat actions (tactics, techniques, and procedures, or TTPs) within the ATT&CK framework. This gap hinders accurate CVE categorization and proactive countermeasure initiation. The paper introduces the TTPpredictor tool, which uses innovative techniques to analyze CVE descriptions and infer plausible TTP attacks resulting from CVE exploitation. TTPpredictor overcomes challenges posed by limited labeled data and semantic disparities between CVE and TTP descriptions. It initially extracts threat actions from unstructured cyber threat reports using Semantic Role Labeling (SRL) techniques. These actions, along with their contextual attributes, are correlated with MITRE's attack functionality classes. This automated correlation facilitates the creation of labeled data, essential for categorizing novel threat actions into threat functionality classes and TTPs. The paper presents an empirical assessment, demonstrating TTPpredictor's effectiveness with accuracy rates of approximately 98% and F1-scores ranging from 95% to 98% in precise CVE classification to ATT&CK techniques. TTPpredictor outperforms state-of-the-art language model tools like ChatGPT. Overall, this paper offers a robust solution for linking CVEs to potential attack techniques, enhancing cybersecurity practitioners' ability to proactively identify and mitigate threats.

On the Effects of Heterogeneous Errors on Multi-fidelity Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2309.02771
  • repo_url: None
  • paper_authors: Zahra Zanjani Foumani, Amin Yousefpour, Mehdi Shishehbor, Ramin Bostanabad
  • for: This paper is written for researchers and practitioners who are interested in using multi-fidelity methods for Bayesian optimization in materials design.
  • methods: The paper proposes a new multi-fidelity emulation method that learns a noise model for each data source and enables the use of highly biased low-fidelity sources for Bayesian optimization.
  • results: The paper demonstrates the performance of the proposed method through analytical examples and engineering problems on materials design, showing that it can improve the efficiency and accuracy of Bayesian optimization compared to existing methods.
    Abstract Bayesian optimization (BO) is a sequential optimization strategy that is increasingly employed in a wide range of areas including materials design. In real world applications, acquiring high-fidelity (HF) data through physical experiments or HF simulations is the major cost component of BO. To alleviate this bottleneck, multi-fidelity (MF) methods are used to forgo the sole reliance on the expensive HF data and reduce the sampling costs by querying inexpensive low-fidelity (LF) sources whose data are correlated with HF samples. However, existing multi-fidelity BO (MFBO) methods operate under the following two assumptions that rarely hold in practical applications: (1) LF sources provide data that are well correlated with the HF data on a global scale, and (2) a single random process can model the noise in the fused data. These assumptions dramatically reduce the performance of MFBO when LF sources are only locally correlated with the HF source or when the noise variance varies across the data sources. In this paper, we dispense with these incorrect assumptions by proposing an MF emulation method that (1) learns a noise model for each data source, and (2) enables MFBO to leverage highly biased LF sources which are only locally correlated with the HF source. We illustrate the performance of our method through analytical examples and engineering problems on materials design.

Unifying over-smoothing and over-squashing in graph neural networks: A physics informed approach and beyond

  • paper_url: http://arxiv.org/abs/2309.02769
  • repo_url: None
  • paper_authors: Zhiqi Shao, Dai Shi, Andi Han, Yi Guo, Qibin Zhao, Junbin Gao
  • for: The paper aims to address critical computational challenges in graph neural networks (GNNs), such as over-smoothing and limited expressive power, by introducing a new method called Multi-Scaled Heat Kernel based GNN (MHKG) and its generalization G-MHKG.
  • methods: The proposed method reverses the time direction of the graph heat equation to enhance the sharpness of graph node features, and leverages high pass filtering functions to improve the performance of GNNs.
  • results: The proposed models (MHKG and G-MHKG) outperform several GNN baseline models across various graph datasets characterized by both homophily and heterophily; the trade-off between over-smoothing and over-squashing is also analyzed, and the method is shown to handle both issues under mild conditions.
    Abstract Graph Neural Networks (GNNs) have emerged as one of the leading approaches for machine learning on graph-structured data. Despite their great success, critical computational challenges such as over-smoothing, over-squashing, and limited expressive power continue to impact the performance of GNNs. In this study, inspired from the time-reversal principle commonly utilized in classical and quantum physics, we reverse the time direction of the graph heat equation. The resulted reversing process yields a class of high pass filtering functions that enhance the sharpness of graph node features. Leveraging this concept, we introduce the Multi-Scaled Heat Kernel based GNN (MHKG) by amalgamating diverse filtering functions' effects on node features. To explore more flexible filtering conditions, we further generalize MHKG into a model termed G-MHKG and thoroughly show the roles of each element in controlling over-smoothing, over-squashing and expressive power. Notably, we illustrate that all aforementioned issues can be characterized and analyzed via the properties of the filtering functions, and uncover a trade-off between over-smoothing and over-squashing: enhancing node feature sharpness will make model suffer more from over-squashing, and vice versa. Furthermore, we manipulate the time again to show how G-MHKG can handle both two issues under mild conditions. Our conclusive experiments highlight the effectiveness of proposed models. It surpasses several GNN baseline models in performance across graph datasets characterized by both homophily and heterophily.
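
A cartoon of the time-reversal idea on a four-node path graph: the forward heat kernel exp(-tL) smooths node features (low-pass), while reversing the time direction, exp(+tL), sharpens them (high-pass). The graph, the feature vector, and the time scale t are made up, and this is not the MHKG architecture itself.

```python
import numpy as np
from scipy.linalg import expm

# Small path graph with 4 nodes (adjacency and features assumed for illustration).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                            # combinatorial graph Laplacian
x = np.array([1.0, 0.0, 0.0, 0.0])   # a spiky node feature

t = 0.5
smoothed = expm(-t * L) @ x   # forward heat equation: low-pass, smooths features
sharpened = expm(+t * L) @ x  # reversed time direction: high-pass, sharpens features

print(np.round(smoothed, 3))
print(np.round(sharpened, 3))
```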

Towards Unsupervised Graph Completion Learning on Graphs with Features and Structure Missing

  • paper_url: http://arxiv.org/abs/2309.02762
  • repo_url: None
  • paper_authors: Sichao Fu, Qinmu Peng, Yang He, Baokun Du, Xinge You
  • for: Improving the task performance of graph neural network (GNN) variants on graphs where node features or structural relationships are partially missing.
  • methods: Proposes a more general, self-supervised graph completion learning framework, unsupervised GCL (UGCL), which improves existing GNN variants while addressing the label dependence and the bias in reconstructed node features and structural relationships that affect existing GCL methods.
  • results: Extensive experiments on eight datasets, three GNN variants, and five missing rates demonstrate that the proposed method effectively improves GNN task performance.
    Abstract In recent years, graph neural networks (GNN) have achieved significant developments in a variety of graph analytical tasks. Nevertheless, GNN's superior performance will suffer from serious damage when the collected node features or structure relationships are partially missing owning to numerous unpredictable factors. Recently emerged graph completion learning (GCL) has received increasing attention, which aims to reconstruct the missing node features or structure relationships under the guidance of a specifically supervised task. Although these proposed GCL methods have made great success, they still exist the following problems: the reliance on labels, the bias of the reconstructed node features and structure relationships. Besides, the generalization ability of the existing GCL still faces a huge challenge when both collected node features and structure relationships are partially missing at the same time. To solve the above issues, we propose a more general GCL framework with the aid of self-supervised learning for improving the task performance of the existing GNN variants on graphs with features and structure missing, termed unsupervised GCL (UGCL). Specifically, to avoid the mismatch between missing node features and structure during the message-passing process of GNN, we separate the feature reconstruction and structure reconstruction and design its personalized model in turn. Then, a dual contrastive loss on the structure level and feature level is introduced to maximize the mutual information of node representations from feature reconstructing and structure reconstructing paths for providing more supervision signals. Finally, the reconstructed node features and structure can be applied to the downstream node classification task. Extensive experiments on eight datasets, three GNN variants and five missing rates demonstrate the effectiveness of our proposed method.

Safe Neural Control for Non-Affine Control Systems with Differentiable Control Barrier Functions

  • paper_url: http://arxiv.org/abs/2309.04492
  • repo_url: None
  • paper_authors: Wei Xiao, Ross Allen, Daniela Rus
  • for: The paper addresses the problem of safety-critical control for non-affine control systems.
  • methods: The paper uses Control Barrier Functions (CBFs) to optimize quadratic costs subject to state and control constraints, and incorporates higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems.
  • results: The proposed framework is capable of learning complex and optimal control policies that are usually intractable online, and can address the conservativeness of CBFs such that the system state will not stay unnecessarily far away from safe set boundaries. The effectiveness of the proposed framework is illustrated on LiDAR-based autonomous driving and compared with existing methods.
    Abstract This paper addresses the problem of safety-critical control for non-affine control systems. It has been shown that optimizing quadratic costs subject to state and control constraints can be sub-optimally reduced to a sequence of quadratic programs (QPs) by using Control Barrier Functions (CBFs). Our recently proposed High Order CBFs (HOCBFs) can accommodate constraints of arbitrary relative degree. The main challenges in this approach are that it requires affine control dynamics and the solution of the CBF-based QP is sub-optimal since it is solved point-wise. To address these challenges, we incorporate higher-order CBFs into neural ordinary differential equation-based learning models as differentiable CBFs to guarantee safety for non-affine control systems. The differentiable CBFs are trainable in terms of their parameters, and thus, they can address the conservativeness of CBFs such that the system state will not stay unnecessarily far away from safe set boundaries. Moreover, the imitation learning model is capable of learning complex and optimal control policies that are usually intractable online. We illustrate the effectiveness of the proposed framework on LiDAR-based autonomous driving and compare it with existing methods.

Improved Outlier Robust Seeding for k-means

  • paper_url: http://arxiv.org/abs/2309.02710
  • repo_url: None
  • paper_authors: Amit Deshpande, Rameshwar Pratap
  • for: Proposes an outlier-robust seeding for $k$-means that is insensitive to adversarial noise and outliers.
  • methods: Uses a simple variant of the $D^{2}$ sampling distribution used by $k$-means++, which becomes robust to outliers when they constitute a constant fraction of the data.
  • results: The algorithm runs in $O(ndk)$ time, outputs $O(k)$ clusters, discards marginally more points than the optimal number of outliers, and comes with a provable $O(1)$ approximation guarantee; it can also be modified to output exactly $k$ clusters while keeping the running time linear in $n$ and $d$.
    Abstract The $k$-means is a popular clustering objective, although it is inherently non-robust and sensitive to outliers. Its popular seeding or initialization called $k$-means++ uses $D^{2}$ sampling and comes with a provable $O(\log k)$ approximation guarantee \cite{AV2007}. However, in the presence of adversarial noise or outliers, $D^{2}$ sampling is more likely to pick centers from distant outliers instead of inlier clusters, and therefore its approximation guarantees \textit{w.r.t.} $k$-means solution on inliers, does not hold. Assuming that the outliers constitute a constant fraction of the given data, we propose a simple variant in the $D^2$ sampling distribution, which makes it robust to the outliers. Our algorithm runs in $O(ndk)$ time, outputs $O(k)$ clusters, discards marginally more points than the optimal number of outliers, and comes with a provable $O(1)$ approximation guarantee. Our algorithm can also be modified to output exactly $k$ clusters instead of $O(k)$ clusters, while keeping its running time linear in $n$ and $d$. This is an improvement over previous results for robust $k$-means based on LP relaxation and rounding \cite{Charikar}, \cite{KrishnaswamyLS18} and \textit{robust $k$-means++} \cite{DeshpandeKP20}. Our empirical results show the advantage of our algorithm over $k$-means++~\cite{AV2007}, uniform random seeding, greedy sampling for $k$ means~\cite{tkmeanspp}, and robust $k$-means++~\cite{DeshpandeKP20}, on standard real-world and synthetic data sets used in previous work. Our proposal is easily amenable to scalable, faster, parallel implementations of $k$-means++ \cite{Bahmani,BachemL017} and is of independent interest for coreset constructions in the presence of outliers \cite{feldman2007ptas,langberg2010universal,feldman2011unified}.
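
A sketch of $D^{2}$ ($k$-means++) seeding with an optional clipping of the squared distances before sampling. The clipping is only an assumed stand-in for the paper's modified sampling distribution, included to show how damping the largest distances keeps far outliers from dominating the seeding.

```python
import numpy as np

rng = np.random.default_rng(0)

def d2_seeding(X, k, clip_quantile=None):
    """k-means++-style D^2 sampling. With clip_quantile set, squared distances are
    clipped at that quantile before sampling -- a simple (assumed, not the paper's
    exact) modification that limits the influence of distant outliers."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        if clip_quantile is not None:
            d2 = np.minimum(d2, np.quantile(d2, clip_quantile))
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)

# Inliers in two clusters plus a few distant outliers.
X = np.vstack([rng.normal(0, 1, (200, 2)),
               rng.normal(8, 1, (200, 2)),
               rng.normal(100, 1, (5, 2))])

print(d2_seeding(X, k=2))                      # plain D^2: may pick an outlier
print(d2_seeding(X, k=2, clip_quantile=0.9))   # clipped D^2: favors inlier clusters
```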

Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.02669
  • repo_url: None
  • paper_authors: Tianchi Cai, Jiyan Jiang, Wenpeng Zhang, Shiji Zhou, Xierui Song, Li Yu, Lihong Gu, Xiaodong Zeng, Jinjie Gu, Guannan Zhang
  • for: Studies the budget allocation problem in online marketing campaigns that use previously collected offline data.
  • methods: Proposes a game-theoretic offline value-based reinforcement learning method using mixed policies, which reduces the need to store infinitely many policies (as in previous methods) to only constantly many, achieving nearly optimal policy efficiency and making it practical for industrial use.
  • results: The method is guaranteed to converge to the optimal policy, which previous value-based reinforcement learning methods for marketing budget allocation cannot achieve; experiments on a large-scale marketing campaign with tens of millions of users and more than one billion in budget verify the theoretical results, show that the method outperforms various baselines, and the method has been deployed to serve all the traffic of this campaign.
    Abstract We study the budget allocation problem in online marketing campaigns that utilize previously collected offline data. We first discuss the long-term effect of optimizing marketing budget allocation decisions in the offline setting. To overcome the challenge, we propose a novel game-theoretic offline value-based reinforcement learning method using mixed policies. The proposed method reduces the need to store infinitely many policies in previous methods to only constantly many policies, which achieves nearly optimal policy efficiency, making it practical and favorable for industrial usage. We further show that this method is guaranteed to converge to the optimal policy, which cannot be achieved by previous value-based reinforcement learning methods for marketing budget allocation. Our experiments on a large-scale marketing campaign with tens-of-millions users and more than one billion budget verify the theoretical results and show that the proposed method outperforms various baseline methods. The proposed method has been successfully deployed to serve all the traffic of this marketing campaign.
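The central device in this abstract, a mixed policy that randomizes over a constant number of value-based policies, can be illustrated with a small Python sketch. The two hand-written base policies, the `target_rate` parameter, and the helper name are hypothetical stand-ins for the paper's learned value-based policies; this is a toy illustration of the mixing idea, not the proposed algorithm.

```python
import numpy as np

def mixed_policy(base_policies, weights, seed=None):
    """Randomize over a constant number of base policies (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()

    def act(state):
        i = rng.choice(len(base_policies), p=w)  # pick one base policy per decision
        return base_policies[i](state)

    return act

# Hypothetical example: mix an "allocate" and a "skip" policy so that, on
# average, a target fraction of users receives budget.
allocate = lambda user: 1.0   # give this user one unit of budget
skip = lambda user: 0.0       # give nothing
target_rate = 0.3
policy = mixed_policy([allocate, skip], [target_rate, 1 - target_rate], seed=0)

average_spend = np.mean([policy(u) for u in range(10_000)])  # close to target_rate
```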

Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat

  • paper_url: http://arxiv.org/abs/2309.03237
  • repo_url: None
  • paper_authors: Erdong Hu, Yuxin Tang, Anastasios Kyrillidis, Chris Jermaine
  • for: This paper evaluates algorithms for learning in a federated environment across a variety of image classification tasks.
  • methods: It compares several federated learning strategies, including vertically decomposing the neural network and using a pre-trained feature-extraction backbone (a hedged sketch of the backbone ingredient follows this entry).
  • results: Across a wide variety of settings, vertically decomposing the neural network gives the best results and outperforms more standard reconciliation-based methods.
    Abstract We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately examined before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate learner performance (we argue that classification accuracy is not enough), among others. Overall, across a wide variety of settings, we find that vertically decomposing a neural network seems to give the best results, and outperforms more standard reconciliation-based methods.
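As a hedged illustration of the pre-trained "backbone" ingredient discussed in this entry (not the vertical-decomposition approach that the paper finds strongest), the sketch below freezes a torchvision ResNet-18 and federates only the classifier head by simple weight averaging. The helper names, the choice of ResNet-18, and head-only averaging are assumptions for exposition rather than the paper's exact setup.

```python
import torch
import torch.nn as nn
import torchvision.models as models

def make_client_model(num_classes: int = 10) -> nn.Module:
    """Frozen pre-trained backbone plus a trainable classifier head (illustrative sketch)."""
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False                               # keep the backbone fixed
    model.fc = nn.Linear(model.fc.in_features, num_classes)   # new, trainable head
    return model

@torch.no_grad()
def average_heads(client_models, client_weights):
    """FedAvg-style averaging applied only to the classifier heads."""
    total = float(sum(client_weights))
    avg = {k: torch.zeros_like(v) for k, v in client_models[0].fc.state_dict().items()}
    for m, w in zip(client_models, client_weights):
        for k, v in m.fc.state_dict().items():
            avg[k] += (w / total) * v
    for m in client_models:
        m.fc.load_state_dict(avg)                              # broadcast the averaged head
```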

Contrastive Learning as Kernel Approximation

  • paper_url: http://arxiv.org/abs/2309.02651
  • repo_url: None
  • paper_authors: Konstantinos Christopher Tsiolis
  • for: This thesis surveys the current theoretical understanding of contrastive learning as a method for extracting features from unlabelled data.
  • methods: It studies contrastive loss functions that train low-dimensional feature representations by sampling pairs of similar and dissimilar inputs rather than labelling each input individually, and relates the minimizers of popular contrastive losses to positive semidefinite (PSD) kernels (see the sketch after this entry).
  • results: Contrastive features learned from large unlabelled datasets can be fed to supervised learners on much smaller labelled datasets to obtain high accuracy on downstream tasks.
    Abstract In standard supervised machine learning, it is necessary to provide a label for every input in the data. While raw data in many application domains is easily obtainable on the Internet, manual labelling of this data is prohibitively expensive. To circumvent this issue, contrastive learning methods produce low-dimensional vector representations (also called features) of high-dimensional inputs on large unlabelled datasets. This is done by training with a contrastive loss function, which enforces that similar inputs have high inner product and dissimilar inputs have low inner product in the feature space. Rather than annotating each input individually, it suffices to define a means of sampling pairs of similar and dissimilar inputs. Contrastive features can then be fed as inputs to supervised learning systems on much smaller labelled datasets to obtain high accuracy on end tasks of interest. The goal of this thesis is to provide an overview of the current theoretical understanding of contrastive learning, specifically as it pertains to the minimizers of contrastive loss functions and their relationship to prior methods for learning features from unlabelled data. We highlight popular contrastive loss functions whose minimizers implicitly approximate a positive semidefinite (PSD) kernel. The latter is a well-studied object in functional analysis and learning theory that formalizes a notion of similarity between elements of a space. PSD kernels provide an implicit definition of features through the theory of reproducing kernel Hilbert spaces.
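The kernel view in this abstract is easy to check numerically: for any feature map $\phi$, the Gram matrix with entries $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$ is positive semidefinite by construction. The short sketch below uses random unit-norm vectors as stand-ins for the outputs of a trained contrastive encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for encoder outputs phi(x_1), ..., phi(x_n); in practice these
# would come from a trained contrastive model.
features = rng.standard_normal((100, 16))
features /= np.linalg.norm(features, axis=1, keepdims=True)  # unit norm, as in cosine-similarity losses

# K_ij = <phi(x_i), phi(x_j)>: the Gram matrix of any feature map is PSD.
gram = features @ features.T
eigenvalues = np.linalg.eigvalsh(gram)
print(f"smallest eigenvalue: {eigenvalues.min():.2e}")  # >= 0 up to numerical error
```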