cs.LG - 2023-08-31

A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems

  • paper_url: http://arxiv.org/abs/2308.16904
  • repo_url: None
  • paper_authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma
  • for: Linear systems with noisy coefficient matrices and right-hand side vectors, and the need for efficient iterative solvers.
  • methods: Randomized Kaczmarz (RK) algorithm and its convergence analysis in the presence of both additive and multiplicative noise.
  • results: The paper provides a robust analysis of RK’s convergence for noisy linear systems, without requiring knowledge of the noiseless coefficient matrix, and demonstrates the effectiveness of the method through comprehensive numerical experiments.
    Abstract Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is limited and considers measurement noise in the right-hand side vector, $b$. Unfortunately, in practice, that is not always the case; the coefficient matrix $A$ can also be noisy. In this paper, we analyze the convergence of RK for noisy linear systems when the coefficient matrix, $A$, is corrupted with both additive and multiplicative noise, along with the noisy vector, $b$. In our analyses, the quantity $\tilde R=\| \tilde A^{\dagger} \|_2^2 \|\tilde A \|_F^2$ influences the convergence of RK, where $\tilde A$ represents a noisy version of $A$. We claim that our analysis is robust and realistically applicable, as we do not require information about the noiseless coefficient matrix, $A$, and considering different conditions on noise, we can control the convergence of RK. We substantiate our theoretical findings by performing comprehensive numerical experiments.
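
    A minimal sketch of the randomized Kaczmarz iteration run on a doubly-noisy system, with rows sampled proportionally to their squared norms. The names `A_tilde` and `b_tilde` stand in for the paper's $\tilde A$ and $\tilde b$; this illustrates the standard RK update, not the paper's analysis:

    ```python
    import numpy as np

    def randomized_kaczmarz(A, b, x0, n_iters=10_000, seed=0):
        """Standard RK: sample row i w.p. ||a_i||^2 / ||A||_F^2, then
        project the iterate onto the hyperplane <a_i, x> = b_i."""
        rng = np.random.default_rng(seed)
        row_norms2 = np.sum(A**2, axis=1)
        probs = row_norms2 / row_norms2.sum()
        x = x0.astype(float)
        for _ in range(n_iters):
            i = rng.choice(A.shape[0], p=probs)
            x += (b[i] - A[i] @ x) / row_norms2[i] * A[i]
        return x

    # Doubly-noisy setup: both A and b are observed with additive noise.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 20))
    x_true = rng.standard_normal(20)
    A_tilde = A + 1e-3 * rng.standard_normal(A.shape)       # noisy coefficient matrix
    b_tilde = A @ x_true + 1e-3 * rng.standard_normal(200)  # noisy right-hand side
    x_hat = randomized_kaczmarz(A_tilde, b_tilde, np.zeros(20))
    print(np.linalg.norm(x_hat - x_true))  # settles in a noise-floor neighborhood
    ```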

Learning to Taste: A Multimodal Wine Dataset

  • paper_url: http://arxiv.org/abs/2308.16900
  • repo_url: None
  • paper_authors: Thoranna Bender, Simon Møe Sørensen, Alireza Kashani, K. Eldjarn Hjorleifsson, Grethe Hyldig, Søren Hauberg, Serge Belongie, Frederik Warburg
  • for: Studying the relations between visual perception, language, and flavor.
  • methods: A large multimodal wine dataset (WineSensed) with 897k wine label images and 824k wine reviews curated from the Vivino platform, plus more than 5k pairwise fine-grained flavor distances collected in a wine-tasting experiment with 256 participants.
  • results: A low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels, improving coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligning with the intricate human perception of flavor.
    Abstract We present WineSensed, a large multimodal wine dataset for studying the relations between visual perception, language, and flavor. The dataset encompasses 897k images of wine labels and 824k reviews of wines curated from the Vivino platform. It has over 350k unique vintages, annotated with year, region, rating, alcohol percentage, price, and grape composition. We obtained fine-grained flavor annotations on a subset by conducting a wine-tasting experiment with 256 participants who were asked to rank wines based on their similarity in flavor, resulting in more than 5k pairwise flavor distances. We propose a low-dimensional concept embedding algorithm that combines human experience with automatic machine similarity kernels. We demonstrate that this shared concept embedding space improves upon separate embedding spaces for coarse flavor classification (alcohol percentage, country, grape, price, rating) and aligns with the intricate human perception of flavor.
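
    A minimal sketch of one way such a shared embedding could be built: blend a human-derived pairwise flavor-distance matrix with a machine similarity kernel and embed the result with multidimensional scaling. The blending weight `alpha` and both toy matrices are assumptions for illustration, not the paper's algorithm:

    ```python
    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(0)
    n = 50  # wines

    # Toy inputs: D_human from pairwise tasting comparisons,
    # D_machine from distances on image/text features.
    D_human = np.abs(rng.standard_normal((n, n)))
    D_human = (D_human + D_human.T) / 2
    np.fill_diagonal(D_human, 0.0)
    feats = rng.standard_normal((n, 16))
    D_machine = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)

    alpha = 0.5  # how much to trust the human annotations (assumed)
    D = alpha * D_human / D_human.max() + (1 - alpha) * D_machine / D_machine.max()

    emb = MDS(n_components=2, dissimilarity="precomputed").fit_transform(D)
    print(emb.shape)  # (50, 2) shared concept coordinates
    ```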

Federated Learning in UAV-Enhanced Networks: Joint Coverage and Convergence Time Optimization

  • paper_url: http://arxiv.org/abs/2308.16889
  • repo_url: None
  • paper_authors: Mariam Yahya, Setareh Maghsudi, Slawomir Stanczak
  • for: Enabling federated learning (FL) in UAV-enhanced wireless networks with scarce energy resources.
  • methods: A model and solution based on multi-objective multi-armed bandit theory that maximizes network coverage while minimizing FL delay, plus a second solution, particularly useful with large action sets and strict energy constraints, that uses a scalarized best-arm identification algorithm to find the arms maximizing the ratio of expected reward to expected energy cost.
  • results: Numerical results show the effectiveness of the approach, significantly improving the coverage of the wireless sensor network while minimizing the FL delay.
    Abstract Federated learning (FL) involves several devices that collaboratively train a shared model without transferring their local data. FL reduces the communication overhead, making it a promising learning method in UAV-enhanced wireless networks with scarce energy resources. Despite the potential, implementing FL in UAV-enhanced networks is challenging, as conventional UAV placement methods that maximize coverage increase the FL delay significantly. Moreover, the uncertainty and lack of a priori information about crucial variables, such as channel quality, exacerbate the problem. In this paper, we first analyze the statistical characteristics of a UAV-enhanced wireless sensor network (WSN) with energy harvesting. We then develop a model and solution based on the multi-objective multi-armed bandit theory to maximize the network coverage while minimizing the FL delay. Besides, we propose another solution that is particularly useful with large action sets and strict energy constraints at the UAVs. Our proposal uses a scalarized best-arm identification algorithm to find the optimal arms that maximize the ratio of the expected reward to the expected energy cost by sequentially eliminating one or more arms in each round. Then, we derive the upper bound on the error probability of our multi-objective and cost-aware algorithm. Numerical results show the effectiveness of our approach.
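
    A minimal sketch of cost-aware best-arm identification in the spirit described above: arms are pulled in rounds, each arm's empirical reward-to-energy ratio is tracked, and the worst arm is eliminated per round. This successive-elimination style loop illustrates the idea only; it is not the authors' algorithm or its error bound:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    K = 8
    true_reward = rng.uniform(0.2, 1.0, K)   # e.g., expected coverage gain per arm
    true_energy = rng.uniform(0.5, 1.5, K)   # e.g., expected energy cost per arm

    active = list(range(K))
    reward_sum = np.zeros(K)
    energy_sum = np.zeros(K)

    while len(active) > 1:
        for a in active:  # pull every surviving arm once per round
            reward_sum[a] += true_reward[a] + 0.1 * rng.standard_normal()
            energy_sum[a] += true_energy[a] + 0.1 * rng.standard_normal()
        ratios = {a: reward_sum[a] / max(energy_sum[a], 1e-9) for a in active}
        active.remove(min(ratios, key=ratios.get))  # eliminate the worst ratio

    print("selected arm:", active[0], "oracle:", np.argmax(true_reward / true_energy))
    ```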

Prediction of Diblock Copolymer Morphology via Machine Learning

  • paper_url: http://arxiv.org/abs/2308.16886
  • repo_url: None
  • paper_authors: Hyun Park, Boyuan Yu, Juhae Park, Ge Sun, Emad Tajkhorshid, Juan J. de Pablo, Ludwig Schneider
  • for: Accelerating the computation of block copolymer morphology evolution for large domains over long timescales, e.g., to access the late-stage morphologies needed to understand particle diffusion inside a single block.
  • methods: Exploiting the separation of characteristic times between coarse-grained particle evolution on the monomer scale and slow mesoscopic morphological evolution, and learning stochastically driven defect annihilation processes directly from particle-based simulations.
  • results: A validated UNet-based approach, inspected with explainable AI methods across three use cases, that enables the generation of large system sizes and long trajectories to investigate defect densities and their evolution under different types of confinement.
    Abstract A machine learning approach is presented to accelerate the computation of block polymer morphology evolution for large domains over long timescales. The strategy exploits the separation of characteristic times between coarse-grained particle evolution on the monomer scale and slow morphological evolution over mesoscopic scales. In contrast to empirical continuum models, the proposed approach learns stochastically driven defect annihilation processes directly from particle-based simulations. A UNet architecture that respects different boundary conditions is adopted, thereby allowing periodic and fixed substrate boundary conditions of arbitrary shape. Physical concepts are also introduced via the loss function and symmetries are incorporated via data augmentation. The model is validated using three different use cases. Explainable artificial intelligence methods are applied to visualize the morphology evolution over time. This approach enables the generation of large system sizes and long trajectories to investigate defect densities and their evolution under different types of confinement. As an application, we demonstrate the importance of accessing late-stage morphologies for understanding particle diffusion inside a single block. This work has implications for directed self-assembly and materials design in micro-electronics, battery materials, and membranes.
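
    The abstract mentions a UNet that respects periodic and fixed substrate boundary conditions. A minimal sketch of how periodic boundaries are commonly handled in convolutional layers, via circular padding in PyTorch (an illustration of the general technique, not the paper's architecture):

    ```python
    import torch
    import torch.nn as nn

    class PeriodicConvBlock(nn.Module):
        """Conv block whose padding wraps around, so features on opposite
        edges of the simulation box are treated as neighbors."""
        def __init__(self, c_in, c_out):
            super().__init__()
            self.conv = nn.Conv2d(c_in, c_out, kernel_size=3,
                                  padding=1, padding_mode="circular")
            self.act = nn.ReLU()

        def forward(self, x):
            return self.act(self.conv(x))

    x = torch.randn(1, 1, 64, 64)   # a density field on a periodic domain
    y = PeriodicConvBlock(1, 16)(x)
    print(y.shape)                  # torch.Size([1, 16, 64, 64])
    ```

    For a fixed substrate boundary, one would instead pad that axis with zeros or replicated values, which is the kind of boundary-condition choice the paper's architecture accounts for.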

Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs

  • paper_url: http://arxiv.org/abs/2308.16859
  • repo_url: None
  • paper_authors: Mishfad Shaikh Veedu, Deepjyoti Deka, Murti V. Salapaka
  • for: Studying the sample complexity of learning the underlying directed acyclic graph (DAG) of a linear dynamical system (LDS), where the nodal states are temporally correlated and driven by unobserved exogenous noise sources.
  • methods: A metric and algorithm based on the power spectral density (PSD) matrix of the observed time series, inspired by the static setting but adapted to the temporal correlations in the nodal states; the equal-noise-PSD assumption can be relaxed without violating the identifiability conditions for reconstruction.
  • results: The optimal sample complexity of learning the DAG is shown to be $n = \Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node; the upper bound follows from a concentration bound for the PSD estimation, and a matching min-max lower bound is derived via generalized Fano's inequality.
    Abstract In this article, the optimal sample complexity of learning the underlying interaction/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well-studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, such a study is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS as \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time but are mutually uncorrelated, and have the same {power spectral density (PSD)}. Inspired by the static settings, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed such that identifiability conditions for DDAG reconstruction are not violated. For the LDS with WSS (sub) Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for the PSD estimation is derived, under two different sampling strategies. A matching min-max lower bound using generalized Fano's inequality also is provided, thus showing the order optimality of the proposed algorithm.
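
    The reconstruction metric is built from the PSD matrix of the observed time series. A minimal sketch of estimating that matrix with a Bartlett-style averaged cross-periodogram (a standard estimator used here for illustration; the paper's sampling strategies and concentration analysis are not reproduced):

    ```python
    import numpy as np

    def psd_matrix(X, n_seg=8):
        """X: (p, n) multivariate time series. Returns S with S[:, :, f] the
        estimated PSD matrix at FFT bin f, averaged over n_seg segments."""
        p, n = X.shape
        L = n // n_seg
        S = np.zeros((p, p, L), dtype=complex)
        for s in range(n_seg):
            F = np.fft.fft(X[:, s * L:(s + 1) * L], axis=1)
            S += np.einsum("af,bf->abf", F, F.conj()) / L
        return S / n_seg

    rng = np.random.default_rng(0)
    p, n = 4, 4096
    X = rng.standard_normal((p, n))
    X[1] += 0.8 * np.roll(X[0], 1)   # node 1 driven by node 0 with a one-step lag
    S = psd_matrix(X)
    print(np.abs(S[0, 1]).mean() > np.abs(S[0, 3]).mean())  # True: stronger cross-PSD
    ```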

Majorization-Minimization for sparse SVMs

  • paper_url: http://arxiv.org/abs/2308.16858
  • repo_url: None
  • paper_authors: Alessandro Benfenati, Emilie Chouzenoux, Giorgia Franchini, Salla Latva-Aijo, Dominik Narnhofer, Jean-Christophe Pesquet, Sebastian J. Scott, Mahsa Yousefi
  • for: Training support vector machines (SVMs) through a smooth sparse-promoting-regularized squared hinge loss minimization, paving the way for quick training methods and improved performance.
  • methods: Majorization-minimization approaches that benefit from the Lipschitz differentiability of the loss function, combined with sparsity-preserving regularizers that promote the selection of the most significant features.
  • results: Numerical tests and comparisons on three different datasets demonstrate good performance in terms of qualitative metrics (accuracy, precision, recall, and F1 score) as well as computational cost.
    Abstract Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks, under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth sparse-promoting-regularized squared hinge loss minimization. This choice paves the way to the application of quick training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiability of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, so enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F1 score) as well as computational cost.
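
    A minimal sketch of the same objective class, an L1-regularized squared hinge loss, solved here with plain proximal gradient (ISTA) and soft-thresholding. The smooth loss has a Lipschitz gradient, which is exactly what makes majorization-minimization schemes applicable; treat this as an illustration of the setup, not the paper's MM algorithm:

    ```python
    import numpy as np

    def sq_hinge_grad(w, X, y):
        # loss = mean(max(0, 1 - y * Xw)^2); its gradient is Lipschitz.
        m = np.maximum(0.0, 1.0 - y * (X @ w))
        return X.T @ (-2.0 * y * m) / len(y)

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    w_true = np.zeros(50)
    w_true[:5] = 1.0                                  # only 5 informative features
    y = np.sign(X @ w_true + 0.1 * rng.standard_normal(200))

    w, step, lam = np.zeros(50), 0.1, 0.01
    for _ in range(500):
        w = soft_threshold(w - step * sq_hinge_grad(w, X, y), step * lam)

    print("selected features:", np.flatnonzero(np.abs(w) > 1e-6)[:10])
    ```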

Natural Quantum Monte Carlo Computation of Excited States

  • paper_url: http://arxiv.org/abs/2308.16848
  • repo_url: None
  • paper_authors: David Pfau, Simon Axelrod, Halvard Sutterud, Ingrid von Glehn, James S. Spencer
  • for: Estimating the lowest excited states of quantum systems, as a natural generalization of ground-state estimation.
  • methods: A variational Monte Carlo algorithm with no free parameters and no explicit orthogonalization of the different states, which transforms the problem of finding excited states into finding the ground state of an expanded system; expected values of arbitrary observables, including off-diagonal expectations between states such as the transition dipole moment, can be calculated.
  • results: Combined with the FermiNet and Psiformer Ansatze, the method accurately recovers vertical excitation energies and oscillator strengths on molecules as large as benzene, and is expected to be of great interest for variational quantum Monte Carlo in atomic, nuclear, and condensed matter physics.
    Abstract We present a variational Monte Carlo algorithm for estimating the lowest excited states of a quantum system which is a natural generalization of the estimation of ground states. The method has no free parameters and requires no explicit orthogonalization of the different states, instead transforming the problem of finding excited states of a given system into that of finding the ground state of an expanded system. Expected values of arbitrary observables can be calculated, including off-diagonal expectations between different states such as the transition dipole moment. Although the method is entirely general, it works particularly well in conjunction with recent work on using neural networks as variational Ansatze for many-electron systems, and we show that by combining this method with the FermiNet and Psiformer Ansatze we can accurately recover vertical excitation energies and oscillator strengths on molecules as large as benzene. Beyond the examples on molecules presented here, we expect this technique will be of great interest for applications of variational quantum Monte Carlo to atomic, nuclear and condensed matter physics.

FedDD: Toward Communication-efficient Federated Learning with Differential Parameter Dropout

  • paper_url: http://arxiv.org/abs/2308.16835
  • repo_url: None
  • paper_authors: Zhiying Feng, Xu Chen, Qiong Wu, Wen Wu, Xiaoxi Zhang, Qianyi Huang
  • for: Improving the communication efficiency and model convergence of federated learning (FL), where heterogeneous client network conditions cause long communication delays and straggler effects.
  • methods: A federated learning scheme with differential parameter dropout (FedDD) built on two key modules, dropout rate allocation and uploaded parameter selection, which tailor each client's parameter-upload ratio to its heterogeneous conditions and select the most important parameters for uploading.
  • results: Theoretical convergence analysis and extensive performance evaluations show that FedDD achieves outstanding communication efficiency and model convergence, with strong generalization to data of rare classes.
    Abstract Federated Learning (FL) requires frequent exchange of model parameters, which leads to long communication delay, especially when the network environments of clients vary greatly. Moreover, the parameter server needs to wait for the slowest client (i.e., straggler, which may have the largest model size, lowest computing capability or worst network condition) to upload parameters, which may significantly degrade the communication efficiency. Commonly-used client selection methods such as partial client selection would lead to the waste of computing resources and weaken the generalization of the global model. To tackle this problem, along a different line, in this paper, we advocate the approach of model parameter dropout instead of client selection, and accordingly propose a novel framework of Federated learning scheme with Differential parameter Dropout (FedDD). FedDD consists of two key modules: dropout rate allocation and uploaded parameter selection, which will optimize the model parameter uploading ratios tailored to different clients' heterogeneous conditions and also select the proper set of important model parameters for uploading subject to clients' dropout rate constraints. Specifically, the dropout rate allocation is formulated as a convex optimization problem, taking system heterogeneity, data heterogeneity, and model heterogeneity among clients into consideration. The uploaded parameter selection strategy prioritizes on eliciting important parameters for uploading to speedup convergence. Furthermore, we theoretically analyze the convergence of the proposed FedDD scheme. Extensive performance evaluations demonstrate that the proposed FedDD scheme can achieve outstanding performances in both communication efficiency and model convergence, and also possesses a strong generalization capability to data of rare classes.
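
    A minimal sketch of the uploaded-parameter-selection idea: given a per-client dropout rate, keep only the largest-magnitude update entries and transmit them sparsely. Magnitude is used as the importance proxy purely for illustration; the paper's allocation instead solves a convex program over system, data, and model heterogeneity:

    ```python
    import numpy as np

    def select_upload(delta, dropout_rate):
        """delta: flat model update; keep the top (1 - dropout_rate) fraction
        of entries by magnitude, dropping the rest."""
        k = max(1, int(round((1.0 - dropout_rate) * delta.size)))
        idx = np.argpartition(np.abs(delta), -k)[-k:]   # indices of k largest
        return idx, delta[idx]                          # sparse payload

    rng = np.random.default_rng(0)
    delta = rng.standard_normal(10_000)

    # Heterogeneous clients: a weak link gets a high dropout rate.
    for rate in (0.5, 0.9, 0.99):
        idx, vals = select_upload(delta, rate)
        print(f"dropout={rate}: upload {vals.size} of {delta.size} entries")
    ```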

Joint Semantic-Native Communication and Inference via Minimal Simplicial Structures

  • paper_url: http://arxiv.org/abs/2308.16789
  • repo_url: None
  • paper_authors: Qiyang Zhao, Hang Zou, Mehdi Bennis, Merouane Debbah, Ebtesam Almazrouei, Faouzi Bader
  • for: Studying semantic communication and inference, in which a student agent (a mobile device) queries a teacher agent (a cloud server) to generate higher-order data semantics living in a simplicial complex.
  • methods: The teacher maps its data into a k-order simplicial complex and learns its high-order correlations; for effective communication and inference, it seeks minimally sufficient and invariant semantic structures by judiciously removing simplices selected via the Hodge Laplacians without compromising query accuracy, while the student locally runs its own queries with a masked simplicial convolutional autoencoder (SCAE) leveraging both local and remote teacher knowledge.
  • results: Numerical results confirm improved inference query accuracy under different channel conditions and simplicial structures; on a coauthorship dataset, removing simplices ranked by Laplacian values reduces payload size by 85% without sacrificing accuracy, joint semantic communication and inference via the masked SCAE improves query accuracy by 25% over local student-based queries and 15% over remote teacher-based queries, and incorporating channel semantics effectively improves inference accuracy, notably at low SNR values.
    Abstract In this work, we study the problem of semantic communication and inference, in which a student agent (i.e. mobile device) queries a teacher agent (i.e. cloud sever) to generate higher-order data semantics living in a simplicial complex. Specifically, the teacher first maps its data into a k-order simplicial complex and learns its high-order correlations. For effective communication and inference, the teacher seeks minimally sufficient and invariant semantic structures prior to conveying information. These minimal simplicial structures are found via judiciously removing simplices selected by the Hodge Laplacians without compromising the inference query accuracy. Subsequently, the student locally runs its own set of queries based on a masked simplicial convolutional autoencoder (SCAE) leveraging both local and remote teacher's knowledge. Numerical results corroborate the effectiveness of the proposed approach in terms of improving inference query accuracy under different channel conditions and simplicial structures. Experiments on a coauthorship dataset show that removing simplices by ranking the Laplacian values yields a 85% reduction in payload size without sacrificing accuracy. Joint semantic communication and inference by masked SCAE improves query accuracy by 25% compared to local student based query and 15% compared to remote teacher based query. Finally, incorporating channel semantics is shown to effectively improve inference accuracy, notably at low SNR values.
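
    A minimal sketch of the Hodge 1-Laplacian that such simplex ranking builds on, assembled from the node-edge and edge-triangle boundary matrices of a toy complex (one filled triangle plus a pendant edge); the selection rule itself is the paper's contribution and is not reproduced here:

    ```python
    import numpy as np

    # Oriented boundary matrices for edges {[0,1],[0,2],[1,2],[1,3]}
    # with the triangle [0,1,2] filled in.
    edges = [(0, 1), (0, 2), (1, 2), (1, 3)]
    B1 = np.zeros((4, 4))                 # B1[node, edge]: -1 tail, +1 head
    for j, (u, v) in enumerate(edges):
        B1[u, j], B1[v, j] = -1.0, 1.0

    # B2[edge, triangle]: boundary([0,1,2]) = [1,2] - [0,2] + [0,1]
    B2 = np.zeros((4, 1))
    B2[edges.index((0, 1)), 0] = 1.0
    B2[edges.index((0, 2)), 0] = -1.0
    B2[edges.index((1, 2)), 0] = 1.0

    L1 = B1.T @ B1 + B2 @ B2.T            # Hodge 1-Laplacian (down + up parts)
    print(np.round(np.linalg.eigvalsh(L1), 3))  # spectrum used to rank simplices
    ```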

Constructing Indoor Region-based Radio Map without Location Labels

  • paper_url: http://arxiv.org/abs/2308.16759
  • repo_url: None
  • paper_authors: Zheng Xing, Junting Chen
  • for: Constructing a region-based radio map from received signal strength (RSS) measurements without location labels.
  • methods: A signal subspace model with a sequential prior and an integrated segmentation and clustering algorithm, with clusters matched to physical regions via a graph-based approach.
  • results: Reduces region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline and even outperforms some supervised localization schemes (KNN, SVM, DNN).
    Abstract Radio map construction requires a large amount of radio measurement data with location labels, which imposes a high deployment cost. This paper develops a region-based radio map from received signal strength (RSS) measurements without location labels. The construction is based on a set of blindly collected RSS measurement data from a device that visits each region in an indoor area exactly once, where the footprints and timestamps are not recorded. The main challenge is to cluster the RSS data and match clusters with the physical regions. Classical clustering algorithms fail to work as the RSS data naturally appears as non-clustered due to multipaths and noise. In this paper, a signal subspace model with a sequential prior is constructed for the RSS data, and an integrated segmentation and clustering algorithm is developed, which is shown to find the globally optimal solution in a special case. Furthermore, the clustered data is matched with the physical regions using a graph-based approach. Based on real measurements from an office space, the proposed scheme reduces the region localization error by roughly 50% compared to a weighted centroid localization (WCL) baseline, and it even outperforms some supervised localization schemes, including k-nearest neighbor (KNN), support vector machine (SVM), and deep neural network (DNN), which require labeled data for training.

Training Neural Networks Using Reproducing Kernel Space Interpolation and Model Reduction

  • paper_url: http://arxiv.org/abs/2308.16754
  • repo_url: None
  • paper_authors: Eric Arthur Werneburg
  • for: A theoretical study of training neural networks using interpolation techniques from reproducing kernel Hilbert space theory.
  • methods: The method is generalized to Krein spaces, showing that widely-used neural network architectures are subsets of reproducing kernel Krein spaces (RKKS); the concept of "associated Hilbert spaces" of RKKS is studied to improve the expressivity of various activation functions.
  • results: A computationally applicable, multidimensional generalization of the Adamjan-Arov-Krein (AAK) theorem yields a novel class of neural networks, called Prolongation Neural Networks (PNN), which outperform both the interpolatory methods and current state-of-the-art methods in noisy environments.
    Abstract We introduce and study the theory of training neural networks using interpolation techniques from reproducing kernel Hilbert space theory. We generalize the method to Krein spaces, and show that widely-used neural network architectures are subsets of reproducing kernel Krein spaces (RKKS). We study the concept of "associated Hilbert spaces" of RKKS and develop techniques to improve upon the expressivity of various activation functions. Next, using concepts from the theory of functions of several complex variables, we prove a computationally applicable, multidimensional generalization of the celebrated Adamjan-Arov-Krein (AAK) theorem. The theorem yields a novel class of neural networks, called Prolongation Neural Networks (PNN). We demonstrate that, by applying the multidimensional AAK theorem to gain a PNN, one can gain performance superior to both our interpolatory methods and current state-of-the-art methods in noisy environments. We provide useful illustrations of our methods in practice.

Moreau Envelope ADMM for Decentralized Weakly Convex Optimization

  • paper_url: http://arxiv.org/abs/2308.16752
  • repo_url: None
  • paper_authors: Reza Mirzaeifard, Naveen K. D. Venkategowda, Alexander Jung, Stefan Werner
  • for: A proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization of weakly convex and locally non-smooth functions.
  • methods: Convergence is analyzed through the Moreau envelope function, including bounds on the amount of change in the dual variable update step obtained by relating the gradient of the Moreau envelope to the proximal function.
  • results: The method converges to a stationary point under mild conditions, and numerical experiments indicate it is faster and more robust than widely-used approaches.
    Abstract This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization. Although the current versions of ADMM algorithm provide promising numerical results in producing solutions that are close to optimal for many convex and non-convex optimization problems, it remains unclear if they can converge to a stationary point for weakly convex and locally non-smooth functions. Through our analysis using the Moreau envelope function, we demonstrate that MADM can indeed converge to a stationary point under mild conditions. Our analysis also includes computing the bounds on the amount of change in the dual variable update step by relating the gradient of the Moreau envelope function to the proximal function. Furthermore, the results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
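
    A minimal sketch of the Moreau envelope the analysis is built on, $M_{\lambda f}(x) = \min_z f(z) + \tfrac{1}{2\lambda}\|x - z\|^2$, evaluated numerically for $f = |\cdot|$ and checked against its known closed form (the Huber function). This only illustrates the envelope itself, not the paper's ADMM analysis:

    ```python
    import numpy as np

    def moreau_envelope(f, x, lam, zs):
        """Grid approximation of min_z f(z) + (1/(2*lam)) * (x - z)**2."""
        return np.min(f(zs) + (x - zs) ** 2 / (2.0 * lam))

    lam = 0.5
    zs = np.linspace(-5, 5, 20_001)
    for x in (-2.0, 0.1, 3.0):
        env = moreau_envelope(np.abs, x, lam, zs)
        # Closed form for f = |.|: the Huber function.
        huber = x**2 / (2 * lam) if abs(x) <= lam else abs(x) - lam / 2
        print(f"x={x:+.1f}: envelope={env:.4f}, huber={huber:.4f}")
    ```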

Robust Representation Learning for Unreliable Partial Label Learning

  • paper_url: http://arxiv.org/abs/2308.16718
  • repo_url: None
  • paper_authors: Yu Shi, Dong-Dong Wu, Xin Geng, Min-Ling Zhang
  • for: Improving robustness to unreliable partial labels in weakly supervised learning, where the ground-truth label may not even be present in the candidate label set.
  • methods: An Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning, together with a dual strategy combining KNN-based candidate label set correction and consistency-regularization-based label disambiguation.
  • results: Extensive experiments show the method outperforms state-of-the-art PLL methods on various datasets with diverse degrees of unreliability and ambiguity, and a theoretical analysis from the perspective of the expectation maximization (EM) algorithm is provided.
    Abstract Partial Label Learning (PLL) is a type of weakly supervised learning where each training instance is assigned a set of candidate labels, but only one label is the ground-truth. However, this idealistic assumption may not always hold due to potential annotation inaccuracies, meaning the ground-truth may not be present in the candidate label set. This is known as Unreliable Partial Label Learning (UPLL) that introduces an additional complexity due to the inherent unreliability and ambiguity of partial labels, often resulting in a sub-optimal performance with existing methods. To address this challenge, we propose the Unreliability-Robust Representation Learning framework (URRL) that leverages unreliability-robust contrastive learning to help the model fortify against unreliable partial labels effectively. Concurrently, we propose a dual strategy that combines KNN-based candidate label set correction and consistency-regularization-based label disambiguation to refine label quality and enhance the ability of representation learning within the URRL framework. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art PLL methods on various datasets with diverse degrees of unreliability and ambiguity. Furthermore, we provide a theoretical analysis of our approach from the perspective of the expectation maximization (EM) algorithm. Upon acceptance, we pledge to make the code publicly accessible.
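
    A minimal sketch of KNN-based candidate label set correction: each instance's candidate set is augmented with labels that frequently appear among its feature-space neighbors, so a missing ground truth can be recovered. The neighborhood size and voting threshold are illustrative assumptions, not the paper's settings:

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def correct_candidates(X, candidates, k=5, vote_frac=0.6):
        """candidates: (n, C) binary matrix of candidate label sets. Add label
        c to instance i if >= vote_frac of i's k neighbors carry c."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
        votes = candidates[idx[:, 1:]].mean(axis=1)    # (n, C) neighbor label freq.
        return np.maximum(candidates, (votes >= vote_frac).astype(candidates.dtype))

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
    cand = np.zeros((100, 2), dtype=int)
    cand[:50, 0] = 1
    cand[50:, 1] = 1
    cand[0] = [0, 1]                        # unreliable: true label 0 missing
    print(correct_candidates(X, cand)[0])   # [1 1] -> label 0 restored
    ```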

Everything, Everywhere All in One Evaluation: Using Multiverse Analysis to Evaluate the Influence of Model Design Decisions on Algorithmic Fairness

  • paper_url: http://arxiv.org/abs/2308.16681
  • repo_url: https://github.com/reliable-ai/fairml-multiverse
  • paper_authors: Jan Simson, Florian Pfisterer, Christoph Kern
  • for: This paper aims to study the fairness of algorithmic decision-making (ADM) systems and provide a method for analyzing their fairness.
  • methods: The authors introduce the method of multiverse analysis for algorithmic fairness, which turns implicit design decisions into explicit ones and demonstrates their fairness implications.
  • results: An exemplary case study of predicting public health coverage for vulnerable populations illustrates how decisions during the design of a machine learning system can have surprising effects on its fairness, and how to detect these effects using multiverse analysis; the results show that the method can be used to better understand variability and robustness of algorithmic fairness.
    Abstract A vast number of systems across the world use algorithmic decision making (ADM) to (partially) automate decisions that have previously been made by humans. When designed well, these systems promise more objective decisions while saving large amounts of resources and freeing up human time. However, when ADM systems are not designed well, they can lead to unfair decisions which discriminate against societal groups. The downstream effects of ADMs critically depend on the decisions made during the systems' design and implementation, as biases in data can be mitigated or reinforced along the modeling pipeline. Many of these design decisions are made implicitly, without knowing exactly how they will influence the final system. It is therefore important to make explicit the decisions made during the design of ADM systems and understand how these decisions affect the fairness of the resulting system. To study this issue, we draw on insights from the field of psychology and introduce the method of multiverse analysis for algorithmic fairness. In our proposed method, we turn implicit design decisions into explicit ones and demonstrate their fairness implications. By combining decisions, we create a grid of all possible "universes" of decision combinations. For each of these universes, we compute metrics of fairness and performance. Using the resulting dataset, one can see how and which decisions impact fairness. We demonstrate how multiverse analyses can be used to better understand variability and robustness of algorithmic fairness using an exemplary case study of predicting public health coverage of vulnerable populations for potential interventions. Our results illustrate how decisions during the design of a machine learning system can have surprising effects on its fairness and how to detect these effects using multiverse analysis.
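
    A minimal sketch of the multiverse construction: enumerate the Cartesian product of explicit design decisions, fit a model per "universe", and record a fairness metric for each. The decision options, toy data, and demographic-parity gap are illustrative assumptions, not the study's actual pipeline (the real code is in the linked repo):

    ```python
    import itertools
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    n = 1000
    group = rng.integers(0, 2, n)                     # protected attribute
    X = rng.standard_normal((n, 3)) + 0.3 * group[:, None]
    y = (X[:, 0] + 0.5 * group + rng.standard_normal(n) > 0).astype(int)

    decisions = {                                     # explicit design choices
        "model": ["logreg", "tree"],
        "use_protected_attribute": [True, False],
    }

    for universe in itertools.product(*decisions.values()):
        model_name, use_attr = universe
        feats = np.column_stack([X, group]) if use_attr else X
        clf = (LogisticRegression(max_iter=1000) if model_name == "logreg"
               else DecisionTreeClassifier(max_depth=3))
        pred = clf.fit(feats, y).predict(feats)
        dp_gap = abs(pred[group == 0].mean() - pred[group == 1].mean())
        print(dict(zip(decisions, universe)), f"demographic parity gap = {dp_gap:.3f}")
    ```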

Branches of a Tree: Taking Derivatives of Programs with Discrete and Branching Randomness in High Energy Physics

  • paper_url: http://arxiv.org/abs/2308.16680
  • repo_url: None
  • paper_authors: Michael Kagan, Lukas Heinrich
  • for: Enabling the differentiation of programs with discrete and branching randomness in High Energy Physics, where such programs are common due to branching processes and clustering-based analysis.
  • methods: Several gradient estimation techniques, including the recent Stochastic AD method, compared in simplified detector design experiments.
  • results: Opens the way for gradient-based optimization in detector design optimization, simulator tuning, and data analysis and reconstruction optimization; to the best of the authors' knowledge, the first fully differentiable branching program.
    Abstract We propose to apply several gradient estimation techniques to enable the differentiation of programs with discrete randomness in High Energy Physics. Such programs are common in High Energy Physics due to the presence of branching processes and clustering-based analysis. Thus differentiating such programs can open the way for gradient based optimization in the context of detector design optimization, simulator tuning, or data analysis and reconstruction optimization. We discuss several possible gradient estimation strategies, including the recent Stochastic AD method, and compare them in simplified detector design experiments. In doing so we develop, to the best of our knowledge, the first fully differentiable branching program.
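
    One widely used gradient estimation technique for programs with discrete randomness is the score-function (REINFORCE) estimator, $\nabla_\theta \mathbb{E}_{b\sim p_\theta}[f(b,\theta)] = \mathbb{E}[f\,\nabla_\theta \log p_\theta(b) + \nabla_\theta f]$. A minimal sketch for a toy branching program (a Bernoulli branch whose probability depends on $\theta$); this illustrates the general technique, not the Stochastic AD method the paper focuses on:

    ```python
    import numpy as np

    def grad_estimate(theta, n=200_000, seed=0):
        """Gradient of E_b[f(b, theta)] where b ~ Bernoulli(sigmoid(theta))
        selects between two branches: f = theta**2 if b else 3*theta."""
        rng = np.random.default_rng(seed)
        p = 1.0 / (1.0 + np.exp(-theta))
        b = rng.random(n) < p
        f = np.where(b, theta**2, 3.0 * theta)
        score = f * (b - p)                    # f * d/dtheta log p_theta(b)
        path = np.where(b, 2.0 * theta, 3.0)   # pathwise df/dtheta inside each branch
        return (score + path).mean()

    theta = 0.7
    p = 1.0 / (1.0 + np.exp(-theta))
    # Exact: d/dtheta [p * theta^2 + (1 - p) * 3*theta], with dp/dtheta = p(1-p).
    exact = p * (1 - p) * (theta**2 - 3.0 * theta) + p * 2.0 * theta + (1 - p) * 3.0
    print(f"estimate={grad_estimate(theta):.4f}  exact={exact:.4f}")
    ```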

Dynamic nsNet2: Efficient Deep Noise Suppression with Early Exiting

  • paper_url: http://arxiv.org/abs/2308.16678
  • repo_url: None
  • paper_authors: Riccardo Miccini, Alaa Zniber, Clément Laroche, Tobias Piechowiak, Martin Schoeberl, Luca Pezzarossa, Ouassim Karrakchou, Jens Sparsø, Mounir Ghogho
  • for: The paper is written to improve the performance and resource utilization of deep noise suppression models on resource-constrained devices.
  • methods: The paper proposes an early-exiting model based on nsNet2, which provides multiple levels of accuracy and resource savings by halting computations at different stages. The original architecture is adapted by splitting the information flow to account for injected dynamism.
  • results: The paper shows the trade-offs between performance and computational complexity based on established metrics.
    Abstract Although deep learning has made strides in the field of deep noise suppression, leveraging deep architectures on resource-constrained devices still proved challenging. Therefore, we present an early-exiting model based on nsNet2 that provides several levels of accuracy and resource savings by halting computations at different stages. Moreover, we adapt the original architecture by splitting the information flow to take into account the injected dynamism. We show the trade-offs between performance and computational complexity based on established metrics.
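
    A minimal sketch of the early-exiting pattern: a stack of stages, each followed by an optional exit head, where inference halts once an exit's confidence proxy clears a threshold. This is a generic PyTorch illustration with an assumed sigmoid-based confidence heuristic, not the adapted nsNet2 architecture:

    ```python
    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        def __init__(self, dim=64, n_stages=3):
            super().__init__()
            self.stages = nn.ModuleList(
                [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_stages)]
            )
            self.exits = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_stages)])

        def forward(self, x, threshold=0.9):
            for i, (stage, exit_head) in enumerate(zip(self.stages, self.exits)):
                x = stage(x)
                out = exit_head(x)
                confidence = torch.sigmoid(out).max()   # crude confidence proxy
                if confidence > threshold or i == len(self.stages) - 1:
                    return out, i                       # output and exit index

    out, stage = EarlyExitNet()(torch.randn(1, 64))
    print(f"exited at stage {stage}, output shape {tuple(out.shape)}")
    ```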

Communication-Efficient Decentralized Federated Learning via One-Bit Compressive Sensing

  • paper_url: http://arxiv.org/abs/2308.16671
  • repo_url: None
  • paper_authors: Shenglong Zhou, Kaidi Xu, Geoffrey Ye Li
  • for: Training a shared model in decentralized federated learning (DFL), where there is no central server and distributed nodes suffer from limited communication and computational resources.
  • methods: A novel algorithm based on the inexact alternating direction method (iADM) that trains the shared model under a sparsity constraint, enabling one-bit compressive sensing (1BCS) so that only one-bit information is transmitted among neighbor nodes; communication occurs only at certain steps, only a subset of neighbors participates in training (making the algorithm robust against stragglers), and subproblems are solved inexactly using closed-form solutions.
  • results: Numerical experiments demonstrate the algorithm's effectiveness in both communication and computation.
    Abstract Decentralized federated learning (DFL) has gained popularity due to its practicality across various applications. Compared to the centralized version, training a shared model among a large number of nodes in DFL is more challenging, as there is no central server to coordinate the training process. Especially when distributed nodes suffer from limitations in communication or computational resources, DFL will experience extremely inefficient and unstable training. Motivated by these challenges, in this paper, we develop a novel algorithm based on the framework of the inexact alternating direction method (iADM). On one hand, our goal is to train a shared model with a sparsity constraint. This constraint enables us to leverage one-bit compressive sensing (1BCS), allowing transmission of one-bit information among neighbour nodes. On the other hand, communication between neighbour nodes occurs only at certain steps, reducing the number of communication rounds. Therefore, the algorithm exhibits notable communication efficiency. Additionally, as each node selects only a subset of neighbours to participate in the training, the algorithm is robust against stragglers. Additionally, complex items are computed only once for several consecutive steps and subproblems are solved inexactly using closed-form solutions, resulting in high computational efficiency. Finally, numerical experiments showcase the algorithm's effectiveness in both communication and computation.
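
    A minimal sketch of the one-bit idea behind 1BCS-style communication: a node transmits only the signs of random measurements of its sparse vector, and the receiver recovers the direction with a crude correlate-and-threshold decoder. This illustrates why one-bit messages can suffice under sparsity; it is not the paper's iADM algorithm:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, s = 1000, 20
    x = np.zeros(d)
    x[rng.choice(d, s, replace=False)] = rng.standard_normal(s)   # sparse model

    # Sender: transmit 1 bit per measurement, sign(Ax), instead of Ax itself.
    m = 400
    A = rng.standard_normal((m, d)) / np.sqrt(m)
    bits = np.sign(A @ x)                       # the only payload: m bits

    # Receiver: correlate bits with A, keep the top-s entries (simple decoder).
    proxy = A.T @ bits
    x_hat = np.zeros(d)
    top = np.argsort(np.abs(proxy))[-s:]
    x_hat[top] = proxy[top]

    cos = x @ x_hat / (np.linalg.norm(x) * np.linalg.norm(x_hat))
    print(f"direction recovered with cosine similarity {cos:.2f}")
    ```

    One-bit measurements discard magnitude, so only the direction of `x` is identifiable; in practice a scalar norm would be communicated separately.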

What can we learn from quantum convolutional neural networks?

  • paper_url: http://arxiv.org/abs/2308.16664
  • repo_url: None
  • paper_authors: Chukwudubem Umeano, Annie E. Paine, Vincent E. Elfving, Oleksandr Kyriienko
  • for: Understanding what can be learned from analyzing quantum convolutional neural networks (QCNNs).
  • methods: Analysis of QCNN models, viewing the processing of quantum data as embedding physical system parameters through a hidden feature map, with the findings demonstrated in simulation.
  • results: QCNNs recognize quantum phases efficiently because the ground-state embedding generates a very suitable basis set, with quantum criticality of spin models leading to basis functions with rapidly changing features; pooling layers pick the basis functions that can form a high-performing decision boundary; generalization depends strongly on the embedding type, with rotation-based feature maps and the Fourier basis requiring careful feature engineering; and QCNNs with properly chosen ground-state embeddings extend to fluid dynamics problems, expressing shock wave solutions with good generalization and proven trainability.
    Abstract We can learn from analyzing quantum convolutional neural networks (QCNNs) that: 1) working with quantum data can be perceived as embedding physical system parameters through a hidden feature map; 2) their high performance for quantum phase recognition can be attributed to generation of a very suitable basis set during the ground state embedding, where quantum criticality of spin models leads to basis functions with rapidly changing features; 3) pooling layers of QCNNs are responsible for picking those basis functions that can contribute to forming a high-performing decision boundary, and the learning process corresponds to adapting the measurement such that few-qubit operators are mapped to full-register observables; 4) generalization of QCNN models strongly depends on the embedding type, and that rotation-based feature maps with the Fourier basis require careful feature engineering; 5) accuracy and generalization of QCNNs with readout based on a limited number of shots favor the ground state embeddings and associated physics-informed models. We demonstrate these points in simulation, where our results shed light on classification for physical processes, relevant for applications in sensing. Finally, we show that QCNNs with properly chosen ground state embeddings can be used for fluid dynamics problems, expressing shock wave solutions with good generalization and proven trainability.

Autoencoder-based Online Data Quality Monitoring for the CMS Electromagnetic Calorimeter

  • paper_url: http://arxiv.org/abs/2308.16659
  • repo_url: None
  • paper_authors: Abhirami Harilal, Kyungmin Park, Michael Andrews, Manfred Paulini
  • for: Developing a deep-learning-based online Data Quality Monitoring (DQM) system for the CMS electromagnetic calorimeter (ECAL) that quickly identifies, localizes, and diagnoses detector issues to safeguard physics-quality data taking.
  • methods: An unsupervised, real-time autoencoder-based anomaly detection system able to detect ECAL anomalies unseen in past data, accounting for spatial variations in the ECAL response and the temporal evolution of anomalies.
  • results: The system efficiently detects anomalies while maintaining an estimated false discovery rate between $10^{-2}$ and $10^{-4}$, beating existing benchmarks by about two orders of magnitude; its performance is validated on anomalies found in 2018 and 2022 LHC collision data, and first deployment results from the ECAL barrel during LHC Run 3 show promising detection of obscure issues the existing DQM system could have missed.
    Abstract The online Data Quality Monitoring system (DQM) of the CMS electromagnetic calorimeter (ECAL) is a crucial operational tool that allows ECAL experts to quickly identify, localize, and diagnose a broad range of detector issues that would otherwise hinder physics-quality data taking. Although the existing ECAL DQM system has been continuously updated to respond to new problems, it remains one step behind newer and unforeseen issues. Using unsupervised deep learning, a real-time autoencoder-based anomaly detection system is developed that is able to detect ECAL anomalies unseen in past data. After accounting for spatial variations in the response of the ECAL and the temporal evolution of anomalies, the new system is able to efficiently detect anomalies while maintaining an estimated false discovery rate between $10^{-2}$ to $10^{-4}$, beating existing benchmarks by about two orders of magnitude. The real-world performance of the system is validated using anomalies found in 2018 and 2022 LHC collision data. Additionally, first results from deploying the autoencoder-based system in the CMS online DQM workflow for the ECAL barrel during Run 3 of the LHC are presented, showing its promising performance in detecting obscure issues that could have been missed in the existing DQM system.
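
    A minimal sketch of autoencoder-based anomaly detection: train on nominal data only, then flag inputs whose reconstruction error exceeds a quantile-derived threshold. Toy flat "occupancy maps" stand in for ECAL channel data; none of this is the CMS implementation:

    ```python
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    nominal = torch.rand(512, 64) * 0.1 + 0.5        # flat, healthy "occupancy maps"

    ae = nn.Sequential(nn.Linear(64, 8), nn.ReLU(), nn.Linear(8, 64))
    opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
    for _ in range(300):                             # train on nominal data only
        opt.zero_grad()
        loss = ((ae(nominal) - nominal) ** 2).mean()
        loss.backward()
        opt.step()

    with torch.no_grad():
        errs = ((ae(nominal) - nominal) ** 2).mean(dim=1)
        threshold = torch.quantile(errs, 0.999)      # sets the false alarm rate

        anomaly = nominal[0].clone()
        anomaly[10:20] = 0.0                         # a "dead region" of channels
        score = ((ae(anomaly) - anomaly) ** 2).mean()
        print(f"score={score:.4f}  threshold={threshold:.4f}  flag={score > threshold}")
    ```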

A Causal Discovery Approach To Learn How Urban Form Shapes Sustainable Mobility Across Continents

  • paper_url: http://arxiv.org/abs/2308.16599
  • repo_url: None
  • paper_authors: Felix Wagner, Florian Nachtigall, Lukas Franken, Nikola Milojevic-Dupont, Rafael H. M. Pereira, Nicolas Koch, Jakob Runge, Marta Gonzalez, Felix Creutzig
  • for: Providing accurate guidance for urban planning and low-carbon transport systems by uncovering the location-specific cause-and-effect mechanisms through which the built environment shapes travel.
  • methods: A causal discovery and explainable machine learning framework applied to high-resolution mobility data of six cities across three continents to detect urban form effects on intra-city travel.
  • results: Distance to city center, demographics, and density indirectly affect other urban form features; location-specific influences align across cities yet vary in magnitude, and the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development.
    Abstract Global sustainability requires low-carbon urban transport systems, shaped by adequate infrastructure, deployment of low-carbon transport modes and shifts in travel behavior. To adequately implement alterations in infrastructure, it's essential to grasp the location-specific cause-and-effect mechanisms that the constructed environment has on travel. Yet, current research falls short in representing causal relationships between the 6D urban form variables and travel, generalizing across different regions, and modeling urban form effects at high spatial resolution. Here, we address all three gaps by utilizing a causal discovery and an explainable machine learning framework to detect urban form effects on intra-city travel based on high-resolution mobility data of six cities across three continents. We show that both distance to city center, demographics and density indirectly affect other urban form features. By considering the causal relationships, we find that location-specific influences align across cities, yet vary in magnitude. In addition, the spread of the city and the coverage of jobs across the city are the strongest determinants of travel-related emissions, highlighting the benefits of compact development and associated benefits. Differences in urban form effects across the cities call for a more holistic definition of 6D measures. Our work is a starting point for location-specific analysis of urban form effects on mobility behavior using causal discovery approaches, which is highly relevant for city planners and municipalities across continents.

Development and validation of an interpretable machine learning-based calculator for predicting 5-year weight trajectories after bariatric surgery: a multinational retrospective cohort SOPHIA study

  • paper_url: http://arxiv.org/abs/2308.16585
  • repo_url: None
  • paper_authors: Patrick Saux, Pierre Bauvin, Violeta Raverdy, Julien Teigny, Hélène Verkindt, Tomy Soumphonphakdy, Maxence Debert, Anne Jacobs, Daan Jacobs, Valerie Monpellier, Phong Ching Lee, Chin Hong Lim, Johanna C Andersson-Assarsson, Lena Carlsson, Per-Arne Svensson, Florence Galtier, Guelareh Dezfoulian, Mihaela Moldovanu, Severine Andrieux, Julien Couster, Marie Lepage, Erminia Lembo, Ornella Verrastro, Maud Robert, Paulina Salminen, Geltrude Mingrone, Ralph Peterli, Ricardo V Cohen, Carlos Zerrweck, David Nocca, Carel W Le Roux, Robert Caiazzo, Philippe Preux, François Pattou
  • for: Individual preoperative prediction of 5-year weight loss trajectories after bariatric surgery.
  • methods: A machine learning model built by selecting variables with the least absolute shrinkage and selection operator and constructing interpretable regression trees with the classification and regression trees algorithm.
  • results: Across multinational external testing cohorts, the model achieved a median absolute deviation (MAD) of BMI of 2.8 kg/m${}^2$ and a root mean squared error (RMSE) of 4.7 kg/m${}^2$, with a mean difference between predicted and observed BMI of -0.3 kg/m${}^2$; it is incorporated into an easy-to-use and interpretable web-based prediction tool to help inform clinical decisions before surgery.
    Abstract Background: Weight loss trajectories after bariatric surgery vary widely between individuals, and predicting weight loss before the operation remains challenging. We aimed to develop a model using machine learning to provide individual preoperative prediction of 5-year weight loss trajectories after surgery. Methods: In this multinational retrospective observational study we enrolled adult participants (aged $\ge$18 years) from ten prospective cohorts (including ABOS [NCT01129297], BAREVAL [NCT02310178], the Swedish Obese Subjects study, and a large cohort from the Dutch Obesity Clinic [Nederlandse Obesitas Kliniek]) and two randomised trials (SleevePass [NCT00793143] and SM-BOSS [NCT00356213]) in Europe, the Americas, and Asia, with a 5-year follow-up after Roux-en-Y gastric bypass, sleeve gastrectomy, or gastric band. Patients with a previous history of bariatric surgery or large delays between scheduled and actual visits were excluded. The training cohort comprised patients from two centres in France (ABOS and BAREVAL). The primary outcome was BMI at 5 years. A model was developed using the least absolute shrinkage and selection operator to select variables and the classification and regression trees algorithm to build interpretable regression trees. The performances of the model were assessed through the median absolute deviation (MAD) and root mean squared error (RMSE) of BMI. Findings: 10 231 patients from 12 centres in ten countries were included in the analysis, corresponding to 30 602 patient-years. Among participants in all 12 cohorts, 7701 (75.3%) were female, 2530 (24.7%) were male. Among 434 baseline attributes available in the training cohort, seven variables were selected: height, weight, intervention type, age, diabetes status, diabetes duration, and smoking status. At 5 years, across external testing cohorts the overall mean MAD BMI was 2.8 kg/m${}^2$ (95% CI 2.6-3.0) and mean RMSE BMI was 4.7 kg/m${}^2$ (4.4-5.0), and the mean difference between predicted and observed BMI was -0.3 kg/m${}^2$ (SD 4.7). This model is incorporated in an easy-to-use and interpretable web-based prediction tool to help inform clinical decisions before surgery. Interpretation: We developed a machine learning-based model, which is internationally validated, for predicting individual 5-year weight loss trajectories after three common bariatric interventions.

MONDEO: Multistage Botnet Detection

  • paper_url: http://arxiv.org/abs/2308.16570
  • repo_url: https://github.com/TLDart/mondeo
  • paper_authors: Duarte Dias, Bruno Sousa, Nuno Antunes
  • for: The paper is written to detect DNS-based botnet malware in mobile devices using a lightweight and flexible mechanism called MONDEO.
  • methods: MONDEO uses four detection stages: Blacklisting/Whitelisting, Query rate analysis, DGA analysis, and Machine learning evaluation to identify botnet malware.
  • results: MONDEO was tested against several datasets and achieved high performance with RandomForest classifiers, making it a useful tool for detecting botnet malware in mobile devices.
    Abstract Mobile devices have become widespread and are now the most used piece of technology. Due to their characteristics, they have become major targets for botnet-related malware. FluBot is one example of botnet malware that infects mobile devices. In particular, FluBot is a DNS-based botnet that uses Domain Generation Algorithms (DGA) to establish communication with the Command and Control Server (C2). MONDEO is a multistage mechanism with a flexible design to detect DNS-based botnet malware. MONDEO is lightweight and can be deployed without requiring the deployment of software, agents, or configuration in mobile devices, allowing easy integration in core networks. MONDEO comprises four detection stages: Blacklisting/Whitelisting, Query rate analysis, DGA analysis, and Machine learning evaluation. It was created with the goal of processing streams of packets to identify attacks with high efficiency, in the distinct phases. MONDEO was tested against several datasets to measure its efficiency and performance, being able to achieve high performance with RandomForest classifiers. The implementation is available on GitHub.
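
    A minimal sketch of a multistage DNS filter in the spirit of MONDEO's first three stages: blacklist/whitelist lookup, a per-source query-rate check, and a character-entropy heuristic as a cheap DGA signal, with unresolved cases deferred to an ML stage. The thresholds and the entropy heuristic are illustrative assumptions, not MONDEO's actual rules (see the linked repo for the real pipeline):

    ```python
    import math
    from collections import Counter

    BLACKLIST = {"evil-c2.example"}
    WHITELIST = {"google.com", "github.com"}

    def entropy(s: str) -> float:
        counts = Counter(s)
        return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

    def classify(domain: str, queries_per_min: float) -> str:
        if domain in BLACKLIST:            # stage 1: blacklisting/whitelisting
            return "malicious"
        if domain in WHITELIST:
            return "benign"
        if queries_per_min > 100:          # stage 2: query rate analysis
            return "malicious"
        label = domain.split(".")[0]
        if entropy(label) > 3.8:           # stage 3: DGA-style randomness check
            return "malicious"
        return "send to ML stage"          # stage 4: machine learning evaluation

    for d, r in [("google.com", 2), ("xkqpz1v9wq3jtr8d.net", 5), ("evil-c2.example", 1)]:
        print(d, "->", classify(d, r))
    ```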

Forecasting Emergency Department Crowding with Advanced Machine Learning Models and Multivariable Input

  • paper_url: http://arxiv.org/abs/2308.16544
  • repo_url: None
  • paper_authors: Jalmari Tuominen, Eetu Pulkkinen, Jaakko Peltonen, Juho Kanniainen, Niku Oksala, Ari Palomäki, Antti Roine
  • for: Forecasting emergency department (ED) crowding, a significant threat to patient safety that has repeatedly been associated with increased mortality.
  • methods: Advanced machine learning models forecasting ED occupancy 24 hours ahead, using electronic health record data from a large combined ED with an extensive set of explanatory variables, including bed availability in catchment-area hospitals, traffic data from local observation stations, and weather variables.
  • results: N-BEATS and LightGBM outperform statistical benchmarks with 11% and 9% respective improvements, and DeepAR predicts next-day crowding with an AUC of 0.76 (95% CI 0.69-0.84).
    Abstract Emergency department (ED) crowding is a significant threat to patient safety and it has been repeatedly associated with increased mortality. Forecasting future service demand has the potential to improve patient outcomes. Despite active research on the subject, several gaps remain: 1) proposed forecasting models have become outdated due to the quick influx of advanced machine learning (ML) models, 2) the amount of multivariable input data has been limited and 3) discrete performance metrics have been rarely reported. In this study, we document the performance of a set of advanced ML models in forecasting ED occupancy 24 hours ahead. We use electronic health record data from a large, combined ED with an extensive set of explanatory variables, including the availability of beds in catchment area hospitals, traffic data from local observation stations, weather variables, etc. We show that N-BEATS and LightGBM outperform benchmarks with 11 % and 9 % respective improvements and that DeepAR predicts next day crowding with an AUC of 0.76 (95 % CI 0.69-0.84). To the best of our knowledge, this is the first study to document the superiority of LightGBM and N-BEATS over statistical benchmarks in the context of ED forecasting.
    摘要 急诊室(ED)拥堵是一种严重的患者安全威胁,并被反复证明与死亡率升高相关。预测未来服务需求有望改善患者结局。尽管该领域研究活跃,仍存在几个空白:1)由于先进机器学习(ML)模型的快速涌现,已提出的预测模型已经过时;2)多变量输入数据的规模有限;3)离散性能指标很少被报道。在本研究中,我们记录了一组先进 ML 模型在提前24小时预测急诊室占用方面的表现。我们使用了一家大型综合急诊室的电子健康记录数据,以及一套丰富的解释变量,包括辖区医院的床位可用性、当地观测站的交通数据、天气变量等。我们发现,N-BEATS 和 LightGBM 分别以11%和9%的优势超越基准,而 DeepAR 预测次日拥挤的 AUC 为 0.76(95% CI 0.69-0.84)。据我们所知,这是第一项证明 LightGBM 和 N-BEATS 在急诊室预测中优于统计基准的研究。
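
To make the multivariable, 24-hour-ahead setup concrete, here is a hedged LightGBM sketch on synthetic hourly data; the column names, lags, and hyperparameters are illustrative assumptions and do not reproduce the study's exact pipeline.

```python
# Illustrative 24 h-ahead ED occupancy forecast with LightGBM on synthetic data.
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
n = 24 * 365
df = pd.DataFrame({
    "occupancy": rng.poisson(40, n).astype(float),
    "beds_available": rng.integers(0, 30, n),   # catchment-area hospital beds
    "traffic_volume": rng.normal(500, 80, n),   # local observation stations
    "temperature": rng.normal(5, 10, n),        # weather variable
})
df["target"] = df["occupancy"].shift(-24)       # occupancy 24 hours ahead
for lag in (1, 24, 168):                        # hourly, daily, weekly lags
    df[f"occ_lag_{lag}"] = df["occupancy"].shift(lag)
df = df.dropna()

X, y = df.drop(columns=["target"]), df["target"]
split = int(0.8 * len(df))
model = lgb.LGBMRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X.iloc[:split], y.iloc[:split])
mae = np.mean(np.abs(model.predict(X.iloc[split:]) - y.iloc[split:]))
print(f"test MAE: {mae:.2f} patients")
```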

Scalable Incomplete Multi-View Clustering with Structure Alignment

  • paper_url: http://arxiv.org/abs/2308.16541
  • repo_url: https://github.com/wy1019/simvc-sa
  • paper_authors: Yi Wen, Siwei Wang, Ke Liang, Weixuan Liang, Xinhang Wan, Xinwang Liu, Suyuan Liu, Jiyuan Liu, En Zhu
  • for: This paper focuses on the problem of incomplete multi-view clustering (IMVC) and proposes a novel incomplete anchor graph learning framework called Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA) to tackle the issues of inter-view discrepancy and anchor misalignment.
  • methods: The proposed method constructs view-specific anchor graphs to capture complementary information from different views, and aligns the cross-view anchor correspondence using a novel structure alignment module. The anchor graph construction and alignment are jointly optimized in the unified framework to enhance clustering quality.
  • results: Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of the proposed method, with linear time and space complexity correlated with the number of samples. The code is publicly available at https://github.com/wy1019/SIMVC-SA.
    Abstract The success of existing multi-view clustering (MVC) relies on the assumption that all views are complete. However, samples are usually partially available due to data corruption or sensor malfunction, which raises the research of incomplete multi-view clustering (IMVC). Although several anchor-based IMVC methods have been proposed to process the large-scale incomplete data, they still suffer from the following drawbacks: i) Most existing approaches neglect the inter-view discrepancy and enforce cross-view representation to be consistent, which would corrupt the representation capability of the model; ii) Due to the samples disparity between different views, the learned anchor might be misaligned, which we referred as the Anchor-Unaligned Problem for Incomplete data (AUP-ID). Such the AUP-ID would cause inaccurate graph fusion and degrades clustering performance. To tackle these issues, we propose a novel incomplete anchor graph learning framework termed Scalable Incomplete Multi-View Clustering with Structure Alignment (SIMVC-SA). Specially, we construct the view-specific anchor graph to capture the complementary information from different views. In order to solve the AUP-ID, we propose a novel structure alignment module to refine the cross-view anchor correspondence. Meanwhile, the anchor graph construction and alignment are jointly optimized in our unified framework to enhance clustering quality. Through anchor graph construction instead of full graphs, the time and space complexity of the proposed SIMVC-SA is proven to be linearly correlated with the number of samples. Extensive experiments on seven incomplete benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Our code is publicly available at https://github.com/wy1019/SIMVC-SA.
    摘要 现有多视图聚类(MVC)的成功依赖于所有视图完整这一假设。然而,由于数据损坏或传感器故障,样本通常只有部分可用,这引出了不完整多视图聚类(IMVC)的研究。尽管已有若干基于锚点的 IMVC 方法被提出以处理大规模不完整数据,它们仍存在以下缺点:一、大多数现有方法忽视视图间差异,强制跨视图表示保持一致,这会损害模型的表示能力;二、由于不同视图中样本的差异,学习到的锚点可能发生错位,我们称之为不完整数据下的锚点未对齐问题(AUP-ID)。AUP-ID 会导致不准确的图融合并降低聚类性能。为解决这些问题,我们提出了一种新的不完整锚点图学习框架,称为带结构对齐的可扩展不完整多视图聚类(SIMVC-SA)。具体而言,我们构建视图特定的锚点图来捕捉不同视图中的互补信息。为解决 AUP-ID,我们提出了一个新颖的结构对齐模块来修正跨视图的锚点对应关系。同时,锚点图构建与对齐在统一框架中联合优化,以提高聚类质量。通过构建锚点图而非全图,所提 SIMVC-SA 的时间和空间复杂度被证明与样本数量线性相关。在七个不完整基准数据集上的大量实验证明了所提方法的有效性和效率。我们的代码公开在 https://github.com/wy1019/SIMVC-SA。
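
The anchor-alignment step can be illustrated with a tiny assignment-based sketch. This is our stand-in for the paper's structure alignment module (which is jointly optimized with anchor graph construction), and it assumes anchors from both views share a feature dimension.

```python
# Sketch of the cross-view anchor alignment idea: match anchors across two
# views by solving an assignment problem on anchor-anchor similarity.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_anchors(A1: np.ndarray, A2: np.ndarray) -> np.ndarray:
    """A1, A2: (m, d) anchor matrices from two views; returns A2 reordered
    so that row i of the result corresponds to row i of A1."""
    sim = A1 @ A2.T                         # unnormalized similarity matrix
    row, col = linear_sum_assignment(-sim)  # maximize total similarity
    return A2[col]

rng = np.random.default_rng(0)
A1 = rng.standard_normal((4, 8))
A2 = A1[[2, 0, 3, 1]] + 0.01 * rng.standard_normal((4, 8))  # shuffled view
print(np.allclose(align_anchors(A1, A2), A1, atol=0.1))     # True
```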

Echocardiographic View Classification with Integrated Out-of-Distribution Detection for Enhanced Automatic Echocardiographic Analysis

  • paper_url: http://arxiv.org/abs/2308.16483
  • repo_url: None
  • paper_authors: Jaeik Jeon, Seongmin Ha, Yeonyee E. Yoon, Jiyeon Kim, Hyunseok Jeong, Dawun Jeong, Yeonggul Jang, Youngtaek Hong, Hyuk-Jae Chang
  • for: 这项研究旨在提高自动超声心动图视图分类的精度和可靠性,以便在心脏疾病的诊断和评估过程中辅助医生。
  • methods: 该研究使用基于深度学习的方法,包括31类视图分类和集成的分布外(OOD)检测。
  • results: 研究结果显示,ECHO-VICODE 在视图分类上实现了高精度,并能利用相对马氏距离有效识别“近OOD”样本,从而减少超声心动图分析中的潜在错误。
    Abstract In the rapidly evolving field of automatic echocardiographic analysis and interpretation, automatic view classification is a critical yet challenging task, owing to the inherent complexity and variability of echocardiographic data. This study presents ECHOcardiography VIew Classification with Out-of-Distribution dEtection (ECHO-VICODE), a novel deep learning-based framework that effectively addresses this challenge by training to classify 31 classes, surpassing previous studies and demonstrating its capacity to handle a wide range of echocardiographic views. Furthermore, ECHO-VICODE incorporates an integrated out-of-distribution (OOD) detection function, leveraging the relative Mahalanobis distance to effectively identify 'near-OOD' instances commonly encountered in echocardiographic data. Through extensive experimentation, we demonstrated the outstanding performance of ECHO-VICODE in terms of view classification and OOD detection, significantly reducing the potential for errors in echocardiographic analyses. This pioneering study significantly advances the domain of automated echocardiography analysis and exhibits promising prospects for substantial applications in extensive clinical research and practice.
    摘要 在快速发展的自动超声心动图分析与解读领域,自动视图分类是一项关键但具有挑战性的任务,主要因为超声心动图数据内在的复杂性和变化性。本研究提出了带分布外检测的超声心动图视图分类框架(ECHO-VICODE),一种基于深度学习的新框架,通过训练分类31个类别来有效应对这一挑战,超越了先前的研究,并证明了其处理广泛超声心动图视图的能力。此外,ECHO-VICODE 还集成了分布外(OOD)检测功能,利用相对马氏距离有效识别超声心动图数据中常见的“近OOD”实例。通过大量实验,我们证明了 ECHO-VICODE 在视图分类和 OOD 检测方面的出色性能,显著降低了超声心动图分析中出错的可能性。这项开创性研究显著推进了自动超声心动图分析领域,并在广泛的临床研究与实践中展现出可观的应用前景。
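
The relative Mahalanobis distance used for 'near-OOD' detection has a simple closed form: the class-conditional Mahalanobis distance minus the distance under a class-agnostic background Gaussian. A hedged NumPy sketch follows; the tied-covariance recipe mirrors the published score (Ren et al., 2021), while shapes and variable names are our assumptions.

```python
# Sketch of relative Mahalanobis distance OOD scoring on feature embeddings.
import numpy as np

def fit_gaussians(feats: np.ndarray, labels: np.ndarray):
    classes = np.unique(labels)
    mus = {c: feats[labels == c].mean(axis=0) for c in classes}
    # Shared (tied) covariance across classes, as in the standard recipe.
    centered = np.vstack([feats[labels == c] - mus[c] for c in classes])
    sigma_inv = np.linalg.pinv(np.cov(centered, rowvar=False))
    mu0 = feats.mean(axis=0)                        # class-agnostic background
    sigma0_inv = np.linalg.pinv(np.cov(feats - mu0, rowvar=False))
    return mus, sigma_inv, mu0, sigma0_inv

def relative_mahalanobis(x, mus, sigma_inv, mu0, sigma0_inv) -> float:
    d_class = min(float((x - m) @ sigma_inv @ (x - m)) for m in mus.values())
    d_background = float((x - mu0) @ sigma0_inv @ (x - mu0))
    return d_class - d_background   # higher score => more likely 'near-OOD'
```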

A Policy Adaptation Method for Implicit Multitask Reinforcement Learning Problems

  • paper_url: http://arxiv.org/abs/2308.16471
  • repo_url: None
  • paper_authors: Satoshi Yamamori, Jun Morimoto
  • for: 该研究针对包含接触与碰撞的动态运动生成任务:策略参数的微小变化可能导致截然不同的回报。例如在足球中,轻微改变触球位置、击球力度或球的摩擦系数,相似的头球动作就会让球飞向完全不同的方向,但很难想象向不同方向头球需要完全不同的技能。
  • methods: 该研究提出一种多任务强化学习算法,使策略能够适应同一运动类别中目标或环境的隐式变化,即不同的奖励函数或物理环境参数。
  • results: 研究结果显示,所提方法可以适应目标位置或球的恢复系数的隐式变化,而标准的领域随机化方法无法处理不同的任务设定。
    Abstract In dynamic motion generation tasks, including contact and collisions, small changes in policy parameters can lead to extremely different returns. For example, in soccer, the ball can fly in completely different directions with a similar heading motion by slightly changing the hitting position or the force applied to the ball or when the friction of the ball varies. However, it is difficult to imagine that completely different skills are needed for heading a ball in different directions. In this study, we proposed a multitask reinforcement learning algorithm for adapting a policy to implicit changes in goals or environments in a single motion category with different reward functions or physical parameters of the environment. We evaluated the proposed method on the ball heading task using a monopod robot model. The results showed that the proposed method can adapt to implicit changes in the goal positions or the coefficients of restitution of the ball, whereas the standard domain randomization approach cannot cope with different task settings.
    摘要 在包含接触和碰撞的动态动作生成任务中,策略参数的微小变化可能导致截然不同的回报。例如,在足球中,通过些微改变触球位置或击球力度,或当球的摩擦系数变化时,相似的头球动作可以让球飞向完全不同的方向;然而,很难想象向不同方向头球需要完全不同的技能。在本研究中,我们提出了一种多任务强化学习算法,用于使策略适应同一动作类别中目标或环境的隐式变化,即不同的奖励函数或物理环境参数。我们在头球任务上使用单脚机器人模型评估了所提方法。结果表明,我们的方法可以适应目标位置或球的恢复系数的隐式变化,而标准的领域随机化方法无法处理不同的任务设定。

Domain-adaptive Message Passing Graph Neural Network

  • paper_url: http://arxiv.org/abs/2308.16470
  • repo_url: https://github.com/shenxiaocam/dm_gnn
  • paper_authors: Xiao Shen, Shirui Pan, Kup-Sze Choi, Xi Zhou
  • for: 本研究旨在解决跨网络节点分类(CNNC)问题,即利用标签丰富的源网络的知识,对标签稀缺的目标网络中的节点进行分类。
  • methods: 本研究提出了一种域自适应消息传递图神经网络(DM-GNN),将图神经网络(GNN)与条件对抗域适应相结合。DM-GNN 既能学习具有判别力的表示,又能使其在网络之间可迁移。
  • results: 与十一种最新方法的比较表明,DM-GNN 更加有效,能够更好地匹配跨网络的类条件分布。
    Abstract Cross-network node classification (CNNC), which aims to classify nodes in a label-deficient target network by transferring the knowledge from a source network with abundant labels, draws increasing attention recently. To address CNNC, we propose a domain-adaptive message passing graph neural network (DM-GNN), which integrates graph neural network (GNN) with conditional adversarial domain adaptation. DM-GNN is capable of learning informative representations for node classification that are also transferrable across networks. Firstly, a GNN encoder is constructed by dual feature extractors to separate ego-embedding learning from neighbor-embedding learning so as to jointly capture commonality and discrimination between connected nodes. Secondly, a label propagation node classifier is proposed to refine each node's label prediction by combining its own prediction and its neighbors' prediction. In addition, a label-aware propagation scheme is devised for the labeled source network to promote intra-class propagation while avoiding inter-class propagation, thus yielding label-discriminative source embeddings. Thirdly, conditional adversarial domain adaptation is performed to take the neighborhood-refined class-label information into account during adversarial domain adaptation, so that the class-conditional distributions across networks can be better matched. Comparisons with eleven state-of-the-art methods demonstrate the effectiveness of the proposed DM-GNN.
    摘要 跨网络节点分类(CNNC)旨在利用标签丰富的源网络的知识,对标签稀缺的目标网络中的节点进行分类,近年来受到越来越多的关注。为解决 CNNC,我们提出了域自适应消息传递图神经网络(DM-GNN),将图神经网络(GNN)与条件对抗域适应相结合。DM-GNN 能够学习对节点分类有用且可跨网络迁移的表示。首先,GNN 编码器采用双特征提取器,将自身嵌入学习与邻居嵌入学习分离,从而同时捕捉相连节点之间的共性与差异。其次,提出了标签传播节点分类器,通过结合节点自身的预测及其邻居的预测来细化每个节点的标签预测。此外,针对有标签的源网络设计了标签感知传播方案,促进类内传播并避免类间传播,从而得到具有标签判别性的源嵌入。第三,执行条件对抗域适应,在对抗域适应过程中考虑经邻域细化的类标签信息,使跨网络的类条件分布得到更好的匹配。与十一种最新方法的比较证明了所提 DM-GNN 的有效性。
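
The label-propagation refinement admits a compact sketch: each node's class scores are repeatedly mixed with the mean scores of its neighbors. The mixing weight and step count below are illustrative assumptions; DM-GNN's actual module (with label-aware propagation on the source network) is more involved.

```python
# Minimal sketch of label-propagation refinement of per-node class scores.
import numpy as np

def propagate_predictions(logits: np.ndarray, adj: np.ndarray,
                          alpha: float = 0.5, steps: int = 2) -> np.ndarray:
    """logits: (n_nodes, n_classes); adj: (n_nodes, n_nodes) binary adjacency."""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    p = logits
    for _ in range(steps):
        neighbor_mean = (adj @ p) / deg          # average of neighbors' scores
        p = alpha * logits + (1 - alpha) * neighbor_mean  # own + neighbors
    return p

adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
logits = np.array([[2.0, 0.1], [0.2, 1.5], [1.8, 0.3]])
print(propagate_predictions(logits, adj))
```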

Computing excited states of molecules using normalizing flows

  • paper_url: http://arxiv.org/abs/2308.16468
  • repo_url: None
  • paper_authors: Yahya Saleh, Álvaro Fernández Corral, Armin Iske, Jochen Küpper, Andrey Yachmenev
  • for: 用于计算量子系统的基态和激发态。
  • methods: 在基函数的线性生成空间中逼近波函数,这些基函数通过与 normalizing flows 复合得到增强和优化。
  • results: 在三原子 H$_2$S 分子的大量振动态,以及氢原子、氢分子离子和单活性电子近似下的碳原子等单电子体系的基态与若干激发电子态的计算中,获得了更高的能量预测精度并加速了基组收敛。
    Abstract We present a new nonlinear variational framework for simultaneously computing ground and excited states of quantum systems. Our approach is based on approximating wavefunctions in the linear span of basis functions that are augmented and optimized \emph{via} composition with normalizing flows. The accuracy and efficiency of our approach are demonstrated in the calculations of a large number of vibrational states of the triatomic H$_2$S molecule as well as ground and several excited electronic states of prototypical one-electron systems including the hydrogen atom, the molecular hydrogen ion, and a carbon atom in a single-active-electron approximation. The results demonstrate significant improvements in the accuracy of energy predictions and accelerated basis-set convergence even when using normalizing flows with a small number of parameters. The present approach can be also seen as the optimization of a set of intrinsic coordinates that best capture the underlying physics within the given basis set.
    摘要 我们提出了一种新的非线性变分框架,用于同时计算量子系统的基态和激发态。我们的方法基于在基函数的线性生成空间中逼近波函数,这些基函数通过与 normalizing flows 复合得到增强和优化。我们在三原子 H$_2$S 分子的大量振动态,以及氢原子、氢分子离子和单活性电子近似下的碳原子等典型单电子体系的基态与若干激发电子态的计算中,展示了该方法的精度和效率。结果表明,即使 normalizing flows 只含少量参数,也能显著提高能量预测的准确性并加速基组收敛。本方法也可以看作是在给定基组内优化一组最能捕捉底层物理的内禀坐标。
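
One way to write down the flow-augmented ansatz sketched above is the following (our notation, not necessarily the paper's): composing an orthonormal basis with an invertible map $f_\theta$ and weighting by the Jacobian keeps the transformed basis orthonormal, so ground and excited states follow from a single eigenproblem whose matrix elements depend on the flow parameters $\theta$.

```latex
% Hedged sketch of a flow-augmented linear ansatz (notation ours).
% The Jacobian factor preserves orthonormality of the composed basis,
% so the coefficients c_{ij} solve an ordinary eigenproblem while the
% flow parameters \theta are optimized variationally.
\psi_i(\mathbf{x}) \;\approx\; \sum_{j=1}^{N} c_{ij}\,
    \bigl|\det J_{f_\theta}(\mathbf{x})\bigr|^{1/2}\,
    \phi_j\!\bigl(f_\theta(\mathbf{x})\bigr),
\qquad
\sum_{k} H_{jk}(\theta)\, c_{ik} \;=\; E_i(\theta)\, c_{ij}.
```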

Least Squares Maximum and Weighted Generalization-Memorization Machines

  • paper_url: http://arxiv.org/abs/2308.16456
  • repo_url: None
  • paper_authors: Shuai Wang, Zhen Wang, Yuan-Hai Shao
  • for: 这篇论文提出了一种新的记忆机制,用于改进最小二乘支持向量机(LSSVM)模型。该机制可以在不导致过拟合的情况下准确划分训练集。
  • methods: 该论文提出了最大记忆影响模型(MIMM)和加权记忆影响模型(WIMM),以及若干不同的记忆影响函数。这些模型均可退化为 LSSVM 模型。
  • results: 实验结果表明,MIMM 和 WIMM 模型的泛化性能优于 LSSVM,并且相比其他记忆模型在时间成本上具有显著优势。
    Abstract In this paper, we propose a new way of remembering by introducing a memory influence mechanism for the least squares support vector machine (LSSVM). Without changing the equation constraints of the original LSSVM, this mechanism allows an accurate partitioning of the training set without overfitting. The maximum memory impact model (MIMM) and the weighted impact memory model (WIMM) are then proposed. It is demonstrated that these models can be degraded to the LSSVM. Furthermore, we propose several different memory impact functions for the MIMM and WIMM. The experimental results show that our MIMM and WIMM have better generalization performance compared to the LSSVM and a significant advantage in time cost compared to other memory models.
    摘要 在这篇论文中,我们提出了一种新的记忆机制,用于改进最小二乘支持向量机(LSSVM)的性能。无需更改原始 LSSVM 的方程约束,这种机制可以准确地划分训练集而不导致过拟合。随后,我们提出了最大记忆影响模型(MIMM)和加权记忆影响模型(WIMM),并证明这些模型可退化为 LSSVM。我们还为 MIMM 和 WIMM 提出了几种不同的记忆影响函数。实验结果表明,我们的 MIMM 和 WIMM 在泛化性能方面优于 LSSVM,并且相比其他记忆模型在时间成本上具有显著优势。

Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training

  • paper_url: http://arxiv.org/abs/2308.16453
  • repo_url: None
  • paper_authors: Xiang Li, Juncheng Guo, Qige Song, Jiang Xie, Yafei Sang, Shuyuan Zhao, Yongzheng Zhang
  • for: 本研究旨在提出一种新的加密流量分类(ETC)框架,以便在移动互联网环境中有效管理加密流量。
  • methods: 本文提出了一种新的预训练半监督 ETC 框架(PASS),其关键设计包括:1)对原始训练集重采样并进行对比预训练,在不直接使用各应用标签的情况下避免类别不平衡带来的标签偏差,同时获得能区分相互重叠的同质流量的鲁棒特征表示;2)基于伪标签迭代和动态损失加权算法的半监督优化策略,以利用海量无标注流量数据并减轻人工标注工作量。
  • results: 在四个存在显著类别不平衡和流量同质化的公共数据集上,PASS 的性能优于最新的 ETC 方法和通用采样方法。其对比预训练和伪标签迭代组件具有通用性,能够自适应地惠及采用不同特征提取器的 ETC 方法,可在不同网络环境下高效地进行加密流量分类。
    Abstract Mobile Internet has profoundly reshaped modern lifestyles in various aspects. Encrypted Traffic Classification (ETC) naturally plays a crucial role in managing mobile Internet, especially with the explosive growth of mobile apps using encrypted communication. Despite some existing learning-based ETC methods showing promising results, three-fold limitations still remain in real-world network environments, 1) label bias caused by traffic class imbalance, 2) traffic homogeneity caused by component sharing, and 3) training with reliance on sufficient labeled traffic. None of the existing ETC methods can address all these limitations. In this paper, we propose a novel Pre-trAining Semi-Supervised ETC framework, dubbed PASS. Our key insight is to resample the original train dataset and perform contrastive pre-training without using individual app labels directly to avoid label bias issues caused by class imbalance, while obtaining a robust feature representation to differentiate overlapping homogeneous traffic by pulling positive traffic pairs closer and pushing negative pairs away. Meanwhile, PASS designs a semi-supervised optimization strategy based on pseudo-label iteration and dynamic loss weighting algorithms in order to effectively utilize massive unlabeled traffic data and alleviate manual train dataset annotation workload. PASS outperforms state-of-the-art ETC methods and generic sampling approaches on four public datasets with significant class imbalance and traffic homogeneity, remarkably pushing the F1 of Cross-Platform215 with 1.31%, ISCX-17 with 9.12%. Furthermore, we validate the generality of the contrastive pre-training and pseudo-label iteration components of PASS, which can adaptively benefit ETC methods with diverse feature extractors.
    摘要 移动互联网深刻地重塑了现代生活方式的方方面面。加密流量分类(ETC)在移动互联网管理中扮演着关键角色,尤其是在使用加密通信的移动应用爆炸式增长的背景下。尽管一些现有的基于学习的 ETC 方法已展现出可观的结果,但在真实网络环境中仍存在三方面的局限:1)流量类别不平衡导致的标签偏差;2)组件共享导致的流量同质化;3)训练依赖于充足的有标注流量。现有的 ETC 方法都无法同时解决这些局限。本文提出了一种新的预训练半监督 ETC 框架,称为 PASS。我们的核心思路是对原始训练集重采样,并在不直接使用各应用标签的情况下进行对比预训练,从而避免类别不平衡造成的标签偏差;同时通过拉近正样本流量对、推远负样本对,获得能区分相互重叠的同质流量的鲁棒特征表示。此外,PASS 设计了基于伪标签迭代和动态损失加权算法的半监督优化策略,以有效利用海量无标注流量数据,并减轻人工标注训练集的工作量。在四个存在显著类别不平衡和流量同质化的公共数据集上,PASS 优于最新的 ETC 方法和通用采样方法,将 Cross-Platform215 的 F1 提升 1.31%,ISCX-17 提升 9.12%。此外,我们验证了 PASS 的对比预训练和伪标签迭代组件的通用性,它们能够自适应地惠及采用不同特征提取器的 ETC 方法。
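
A toy sketch of the two ingredients, contrastive pre-training over augmented traffic pairs and confidence-thresholded pseudo-labeling, is given below in PyTorch. The NT-Xent-style loss and the 0.95 threshold are common defaults we assume for illustration, not PASS's exact recipe.

```python
# Toy sketch of contrastive pre-training + pseudo-labeling for traffic flows.
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """z1, z2: (B, d) embeddings of two augmented views of the same flows."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2B, d)
    sim = z @ z.t() / tau                              # pairwise similarities
    sim.fill_diagonal_(float("-inf"))                  # exclude self-pairs
    b = z1.size(0)
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)               # pull positives together

def pseudo_label(model, unlabeled: torch.Tensor, threshold: float = 0.95):
    """Keep only unlabeled flows the classifier is confident about."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled), dim=1)
        conf, labels = probs.max(dim=1)
    keep = conf >= threshold
    return unlabeled[keep], labels[keep]
```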

AntM$^{2}$C: A Large Scale Dataset For Multi-Scenario Multi-Modal CTR Prediction

  • paper_url: http://arxiv.org/abs/2308.16437
  • repo_url: None
  • paper_authors: Zhaoxin Huan, Ke Ding, Ang Li, Xiaolu Zhang, Xu Min, Yong He, Liang Zhang, Jun Zhou, Linjian Mo, Jinjie Gu, Zhongyi Liu, Wenliang Zhong, Guannan Zhang
  • for: The paper proposes a new CTR dataset, AntM$^{2}$C, to address the limitations of existing CTR datasets.
  • methods: The paper uses a multi-scenario multi-modal approach, covering 200 million users and 6 million items, to provide a more comprehensive understanding of user preferences. The dataset includes 200 features, such as ID-based features, raw text features, and image features.
  • results: The paper provides comparisons with baseline methods on several typical CTR tasks based on the AntM$^{2}$C dataset, which is currently the largest-scale CTR dataset available.
    Abstract Click-through rate (CTR) prediction is a crucial issue in recommendation systems. There has been an emergence of various public CTR datasets. However, existing datasets primarily suffer from the following limitations. Firstly, users generally click different types of items from multiple scenarios, and modeling from multiple scenarios can provide a more comprehensive understanding of users. Existing datasets only include data for the same type of items from a single scenario. Secondly, multi-modal features are essential in multi-scenario prediction as they address the issue of inconsistent ID encoding between different scenarios. The existing datasets are based on ID features and lack multi-modal features. Third, a large-scale dataset can provide a more reliable evaluation of models, fully reflecting the performance differences between models. The scale of existing datasets is around 100 million, which is relatively small compared to the real-world CTR prediction. To address these limitations, we propose AntM$^{2}$C, a Multi-Scenario Multi-Modal CTR dataset based on industrial data from Alipay. Specifically, AntM$^{2}$C provides the following advantages: 1) It covers CTR data of 5 different types of items, providing insights into the preferences of users for different items, including advertisements, vouchers, mini-programs, contents, and videos. 2) Apart from ID-based features, AntM$^{2}$C also provides 2 multi-modal features, raw text and image features, which can effectively establish connections between items with different IDs. 3) AntM$^{2}$C provides 1 billion CTR data with 200 features, including 200 million users and 6 million items. It is currently the largest-scale CTR dataset available. Based on AntM$^{2}$C, we construct several typical CTR tasks and provide comparisons with baseline methods. The dataset homepage is available at https://www.atecup.cn/home.
    摘要 点击率(CTR)预测是推荐系统中的关键问题。近年来出现了多种公开的 CTR 数据集,但现有数据集主要存在以下局限:首先,用户通常会在多个场景中点击不同类型的物品,从多个场景建模可以更全面地理解用户,而现有数据集仅包含来自单一场景的同一类型物品的数据。其次,多模态特征在多场景预测中至关重要,因为它们可以解决不同场景之间 ID 编码不一致的问题,而现有数据集基于 ID 特征,缺乏多模态特征。第三,大规模数据集可以提供更可靠的模型评估,充分反映模型之间的性能差异,而现有数据集的规模约为 1 亿,相对真实世界的 CTR 预测而言偏小。为解决这些局限,我们提出了 AntM$^{2}$C,一个基于支付宝工业数据的多场景多模态 CTR 数据集。具体而言,AntM$^{2}$C 具有以下优势:1)覆盖广告、优惠券、小程序、内容和视频等 5 种不同类型物品的 CTR 数据,为研究用户对不同物品的偏好提供了视角;2)除基于 ID 的特征外,还提供原始文本和图像这 2 种多模态特征,可以有效建立不同 ID 物品之间的联系;3)提供 10 亿条带有 200 个特征的 CTR 数据,涉及 2 亿用户和 600 万个物品,是目前规模最大的 CTR 数据集。基于 AntM$^{2}$C,我们构建了若干典型的 CTR 任务,并与基线方法进行了比较。数据集主页见 https://www.atecup.cn/home。

On the Equivalence between Implicit and Explicit Neural Networks: A High-dimensional Viewpoint

  • paper_url: http://arxiv.org/abs/2308.16425
  • repo_url: None
  • paper_authors: Zenan Ling, Zhenyu Liao, Robert C. Qiu
  • for: 研究高维隐式神经网络的性质和特点。
  • methods: 给出相应的共轭核(conjugate kernels)与神经正切核(neural tangent kernels)的高维等价形式。
  • results: 在高维情形下,隐式网络与显式网络之间存在等价关系。
    Abstract Implicit neural networks have demonstrated remarkable success in various tasks. However, there is a lack of theoretical analysis of the connections and differences between implicit and explicit networks. In this paper, we study high-dimensional implicit neural networks and provide the high dimensional equivalents for the corresponding conjugate kernels and neural tangent kernels. Built upon this, we establish the equivalence between implicit and explicit networks in high dimensions.
    摘要 隐式神经网络(implicit neural networks)在各种任务中表现出色,但关于隐式网络与显式网络之间联系与差异的理论分析仍然缺乏。本文研究高维隐式神经网络,并给出相应的共轭核与神经正切核的高维等价形式。在此基础上,我们建立了高维情形下隐式网络与显式网络的等价性。
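
For readers unfamiliar with implicit networks, the object under study is a layer whose hidden state is defined as a fixed point rather than a finite stack of layers. A minimal sketch follows, with the weight scaling chosen so that naive iteration contracts; the sizes and scaling are illustrative assumptions.

```python
# Sketch of an implicit-network layer: the hidden state is the fixed point
# z* = phi(W z* + U x), found here by naive forward iteration.
import numpy as np

def implicit_layer(x, W, U, phi=np.tanh, iters=100, tol=1e-8):
    z = np.zeros(W.shape[0])
    for _ in range(iters):
        z_next = phi(W @ z + U @ x)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z

rng = np.random.default_rng(0)
W = 0.2 * rng.standard_normal((8, 8)) / np.sqrt(8)  # scaled to be contractive
U = rng.standard_normal((8, 4)) / np.sqrt(4)
print(implicit_layer(rng.standard_normal(4), W, U))
```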

DECODE: DilatEd COnvolutional neural network for Detecting Extreme-mass-ratio inspirals

  • paper_url: http://arxiv.org/abs/2308.16422
  • repo_url: None
  • paper_authors: Tianyu Zhao, Yue Zhou, Ruijun Shi, Zhoujian Cao, Zhixiang Ren
  • for: 极端质量比旋近(EMRI)信号具有复杂的波形、较长的持续时间和较低的信噪比(SNR),因此其探测比紧凑双星并合更加困难。
  • methods: 我们介绍了一种名为 DECODE 的端到端模型,通过频域序列建模来检测 EMRI 信号。DECODE 的核心是一个膨胀因果卷积神经网络,在考虑 TDI-1.5 探测器响应的合成数据上训练。
  • results: 我们在一年的多通道 TDI 数据上评估模型,在 1% 的假阳性率下取得 96.3% 的真阳性率,且每次推理耗时不足 0.01 秒。我们还通过对三个 EMRI 信号示例进行可视化,展示了 DECODE 的强大潜力。
    Abstract The detection of Extreme Mass Ratio Inspirals (EMRIs) is intricate due to their complex waveforms, extended duration, and low signal-to-noise ratio (SNR), making them more challenging to be identified compared to compact binary coalescences. While matched filtering-based techniques are known for their computational demands, existing deep learning-based methods primarily handle time-domain data and are often constrained by data duration and SNR. In addition, most existing work ignores time-delay interferometry (TDI) and applies the long-wavelength approximation in detector response calculations, thus limiting their ability to handle laser frequency noise. In this study, we introduce DECODE, an end-to-end model focusing on EMRI signal detection by sequence modeling in the frequency domain. Centered around a dilated causal convolutional neural network, trained on synthetic data considering TDI-1.5 detector response, DECODE can efficiently process a year's worth of multichannel TDI data with an SNR of around 50. We evaluate our model on 1-year data with accumulated SNR ranging from 50 to 120 and achieve a true positive rate of 96.3% at a false positive rate of 1%, keeping an inference time of less than 0.01 seconds. With the visualization of three showcased EMRI signals for interpretability and generalization, DECODE exhibits strong potential for future space-based gravitational wave data analyses.
    摘要 极端质量比旋近(EMRI)信号的探测非常棘手:其波形复杂、持续时间长、信噪比(SNR)低,因而比紧凑双星并合更难识别。基于匹配滤波的技术以计算量大著称,而现有的深度学习方法主要处理时域数据,且往往受数据时长和 SNR 的限制。此外,多数已有工作忽略了时间延迟干涉(TDI),并在探测器响应计算中采用长波长近似,从而限制了其处理激光频率噪声的能力。本研究提出 DECODE,一种专注于通过频域序列建模进行 EMRI 信号检测的端到端模型。DECODE 以膨胀因果卷积神经网络为核心,在考虑 TDI-1.5 探测器响应的合成数据上训练,能够高效处理一年的多通道 TDI 数据(SNR 约为 50)。我们在累积 SNR 为 50 到 120 的一年数据上评估模型,在 1% 假阳性率下取得 96.3% 的真阳性率,且推理时间低于 0.01 秒。通过对三个 EMRI 信号示例进行可视化以展示可解释性与泛化能力,DECODE 展现了在未来空间引力波数据分析中的强大潜力。
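
A minimal PyTorch sketch of the dilated causal convolution stack that DECODE centers on is shown below; channel counts, kernel size, and depth are placeholders rather than the paper's architecture.

```python
# Sketch of a dilated causal 1-D convolution stack (WaveNet-style trimming).
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    def __init__(self, c_in, c_out, k, dilation):
        super().__init__(c_in, c_out, k, padding=(k - 1) * dilation,
                         dilation=dilation)
    def forward(self, x):
        out = super().forward(x)
        return out[..., :x.size(-1)]    # trim the right side to stay causal

net = nn.Sequential(*[
    nn.Sequential(CausalConv1d(1 if d == 1 else 16, 16, k=3, dilation=d),
                  nn.ReLU())
    for d in (1, 2, 4, 8)               # receptive field grows exponentially
])
x = torch.randn(2, 1, 1024)             # (batch, channels, frequency bins)
print(net(x).shape)                     # torch.Size([2, 16, 1024])
```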

CktGNN: Circuit Graph Neural Network for Electronic Design Automation

  • paper_url: http://arxiv.org/abs/2308.16406
  • repo_url: https://github.com/zehao-dong/CktGNN
  • paper_authors: Zehao Dong, Weidong Cao, Muhan Zhang, Dacheng Tao, Yixin Chen, Xuan Zhang
  • for: 该论文旨在提出一种基于图神经网络的电子设计自动化方法,用于快速高效地设计模拟电路。
  • methods: 该论文提出了电路图神经网络(CktGNN),可同时自动生成电路拓扑并确定器件尺寸。CktGNN 采用两级(嵌套)GNN 框架,将电路表示为已知子图基中若干子图的组合,从而减少需要进行消息传递的子图数量,显著提高设计效率。
  • results: 实验表明,CktGNN 在 Open Circuit Benchmark(OCB)上表现出色,在基于表示的优化框架下,其性能优于其他近期 GNN 基线和人工设计。这些结果展示了基于学习的电子设计自动化方法的可能性。
    Abstract The electronic design automation of analog circuits has been a longstanding challenge in the integrated circuit field due to the huge design space and complex design trade-offs among circuit specifications. In the past decades, intensive research efforts have mostly been paid to automate the transistor sizing with a given circuit topology. By recognizing the graph nature of circuits, this paper presents a Circuit Graph Neural Network (CktGNN) that simultaneously automates the circuit topology generation and device sizing based on the encoder-dependent optimization subroutines. Particularly, CktGNN encodes circuit graphs using a two-level GNN framework (of nested GNN) where circuits are represented as combinations of subgraphs in a known subgraph basis. In this way, it significantly improves design efficiency by reducing the number of subgraphs to perform message passing. Nonetheless, another critical roadblock to advancing learning-assisted circuit design automation is a lack of public benchmarks to perform canonical assessment and reproducible research. To tackle the challenge, we introduce Open Circuit Benchmark (OCB), an open-sourced dataset that contains $10$K distinct operational amplifiers with carefully-extracted circuit specifications. OCB is also equipped with communicative circuit generation and evaluation capabilities such that it can help to generalize CktGNN to design various analog circuits by producing corresponding datasets. Experiments on OCB show the extraordinary advantages of CktGNN through representation-based optimization frameworks over other recent powerful GNN baselines and human experts' manual designs. Our work paves the way toward a learning-based open-sourced design automation for analog circuits. Our source code is available at \url{https://github.com/zehao-dong/CktGNN}.
    摘要 由于设计空间巨大且电路指标之间的设计权衡复杂,模拟电路的电子设计自动化一直是集成电路领域的长期挑战。过去几十年,大量研究工作主要集中在给定电路拓扑下的晶体管尺寸自动化。本文注意到电路的图结构本质,提出了电路图神经网络(CktGNN),基于依赖编码器的优化子程序,同时自动生成电路拓扑并确定器件尺寸。特别地,CktGNN 采用两级(嵌套)GNN 框架对电路图进行编码,将电路表示为已知子图基中子图的组合,从而减少需要进行消息传递的子图数量,显著提高设计效率。然而,推进学习辅助电路设计自动化的另一个关键障碍是缺乏公开基准来进行规范评估和可复现研究。为应对这一挑战,我们推出了开源数据集 Open Circuit Benchmark(OCB),其中包含 1 万个不同的运算放大器,并带有精心提取的电路规格。OCB 还配备了可交互的电路生成与评估功能,可以生成相应的数据集,帮助 CktGNN 推广到各类模拟电路的设计。在 OCB 上的实验表明,通过基于表示的优化框架,CktGNN 相对其他近期强大的 GNN 基线和人类专家的手工设计具有显著优势。我们的工作为模拟电路的基于学习的开源设计自动化铺平了道路。源代码见 https://github.com/zehao-dong/CktGNN。

Balancing between the Local and Global Structures (LGS) in Graph Embedding

  • paper_url: http://arxiv.org/abs/2308.16403
  • repo_url: None
  • paper_authors: Jacob Miller, Vahan Huroyan, Stephen Kobourov
  • for: 这篇论文提出一种在图嵌入中平衡局部与全局结构(LGS)的方法,通过一个可调参数实现。
  • methods: 该方法在多种图嵌入布局上评估,并使用既有的质量指标,如压力(stress)和邻域保持(neighborhood preservation)。
  • results: 该研究在合成数据和真实数据上进行了评估,结果表明 LGS 与现有方法相比具有竞争力;研究还引入了新的质量指标——簇间距离保持(cluster distance preservation)——来评估中间尺度结构的捕捉。所有代码、数据、实验和分析均可在线获取。
    Abstract We present a method for balancing between the Local and Global Structures (LGS) in graph embedding, via a tunable parameter. Some embedding methods aim to capture global structures, while others attempt to preserve local neighborhoods. Few methods attempt to do both, and it is not always possible to capture well both local and global information in two dimensions, which is where most graph drawing live. The choice of using a local or a global embedding for visualization depends not only on the task but also on the structure of the underlying data, which may not be known in advance. For a given graph, LGS aims to find a good balance between the local and global structure to preserve. We evaluate the performance of LGS with synthetic and real-world datasets and our results indicate that it is competitive with the state-of-the-art methods, using established quality metrics such as stress and neighborhood preservation. We introduce a novel quality metric, cluster distance preservation, to assess intermediate structure capture. All source-code, datasets, experiments and analysis are available online.
    摘要 我们提出了一种在图嵌入中平衡局部与全局结构(LGS)的方法,通过一个可调参数实现。一些嵌入方法旨在捕捉全局结构,另一些则试图保留局部邻域,同时兼顾两者的方法很少;而在二维空间(大多数图绘制所处的空间)中,往往难以同时很好地保留局部和全局信息。可视化时选择局部嵌入还是全局嵌入,不仅取决于任务,还取决于底层数据的结构,而后者可能事先未知。对于给定的图,LGS 旨在找到局部与全局结构之间的良好平衡并加以保留。我们在合成数据集和真实数据集上评估了 LGS 的性能,结果表明,依据压力(stress)和邻域保持等既有质量指标,它与最新方法相比具有竞争力。我们还引入了一个新的质量指标——簇间距离保持——用于评估中间尺度结构的捕捉。所有源代码、数据集、实验和分析均可在线获取。
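
The tunable local/global trade-off can be illustrated by a single scalar t blending a global stress term with a local neighborhood-attraction term. This toy objective is our illustration of the concept, not the authors' exact formulation.

```python
# Toy local/global embedding objective with a tunable balance parameter t.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def lgs_loss(X2d, D_graph, knn_mask, t=0.5):
    """X2d: (n, 2) layout; D_graph: (n, n) graph-theoretic distances;
    knn_mask: (n, n) boolean, True for k-nearest-neighbor pairs (off-diagonal)."""
    D_layout = squareform(pdist(X2d))
    # Global term: normalized stress between layout and graph distances.
    stress = np.sum((D_layout - D_graph) ** 2 / (D_graph ** 2 + 1e-9))
    # Local term: pull k-NN pairs close together in the layout.
    local = np.sum(D_layout[knn_mask] ** 2)
    return (1 - t) * stress + t * local   # t=0 fully global, t=1 fully local
```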

Improving Robustness and Accuracy of Ponzi Scheme Detection on Ethereum Using Time-Dependent Features

  • paper_url: http://arxiv.org/abs/2308.16391
  • repo_url: None
  • paper_authors: Phuong Duy Huynh, Son Hoang Dau, Xiaodong Li, Phuc Luong, Emanuele Viterbo
  • for: The paper aims to detect Ponzi schemes in the cryptocurrency market, specifically using transaction data to improve the robustness and accuracy of detection.
  • methods: The authors propose new detection models that rely only on transactions and introduce novel time-dependent features to capture Ponzi behavior characteristics.
  • results: The proposed models achieve considerably higher accuracy, precision, recall, and F1-score than existing transaction-based models, making them more effective in detecting Ponzi schemes in the cryptocurrency market. Here's the simplified Chinese text for the three information points:
  • for: 本研究旨在检测加密货币市场中的庞氏骗局,具体而言,利用交易数据来提高检测的鲁棒性和准确性。
  • methods: 作者提出了仅依赖交易数据的新检测模型,并引入新颖的时间依赖特征来刻画庞氏行为特征。
  • results: 所提模型在准确率、精确率、召回率和 F1 分数上都明显高于现有的基于交易的模型,从而能更有效地检测加密货币市场中的庞氏骗局。
    Abstract The rapid development of blockchain has led to more and more funding pouring into the cryptocurrency market, which also attracted cybercriminals' interest in recent years. The Ponzi scheme, an old-fashioned fraud, is now popular on the blockchain, causing considerable financial losses to many crypto-investors. A few Ponzi detection methods have been proposed in the literature, most of which detect a Ponzi scheme based on its smart contract source code or opcode. The contract-code-based approach, while achieving very high accuracy, is not robust: first, the source codes of a majority of contracts on Ethereum are not available, and second, a Ponzi developer can fool a contract-code-based detection model by obfuscating the opcode or inventing a new profit distribution logic that cannot be detected (since these models were trained on existing Ponzi logics only). A transaction-based approach could improve the robustness of detection because transactions, unlike smart contracts, are harder to be manipulated. However, the current transaction-based detection models achieve fairly low accuracy. We address this gap in the literature by developing new detection models that rely only on the transactions, hence guaranteeing the robustness, and moreover, achieve considerably higher Accuracy, Precision, Recall, and F1-score than existing transaction-based models. This is made possible thanks to the introduction of novel time-dependent features that capture Ponzi behaviours characteristics derived from our comprehensive data analyses on Ponzi and non-Ponzi data from the XBlock-ETH repository
    摘要 区块链的快速发展使越来越多的资金涌入加密货币市场,近年来也吸引了网络犯罪分子的注意。庞氏骗局这一老式骗术如今在区块链上盛行,给许多加密货币投资者造成了可观的经济损失。文献中已提出若干庞氏检测方法,其中大多数基于智能合约源代码或操作码(opcode)进行检测。基于合约代码的方法虽然准确率很高,但并不鲁棒:首先,以太坊上大多数合约的源代码不可获得;其次,庞氏开发者可以通过混淆操作码或发明无法被检测的新利润分配逻辑来欺骗基于合约代码的检测模型(因为这些模型只在已有的庞氏逻辑上训练)。基于交易的方法可以提高检测的鲁棒性,因为与智能合约不同,交易更难被操纵;然而,现有基于交易的检测模型准确率相当低。我们通过开发仅依赖交易的新检测模型来填补这一空白,从而保证鲁棒性,并且在准确率、精确率、召回率和 F1 分数上都明显高于现有的基于交易的模型。这得益于我们引入的新颖时间依赖特征,它们刻画了我们对 XBlock-ETH 数据库中庞氏与非庞氏数据进行全面分析所得出的庞氏行为特征。
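
Time-dependent features of the kind the paper advocates can be computed directly from a contract's transaction log. The features below are hedged stand-ins that capture the same intuition (early inflows, payouts drying up later); the paper's actual feature set differs.

```python
# Illustrative time-dependent features over a contract's transaction history.
import pandas as pd

def time_features(tx: pd.DataFrame) -> dict:
    """tx: columns ['timestamp' (datetime), 'value', 'direction' ('in'/'out')]."""
    tx = tx.sort_values("timestamp")
    life_days = (tx.timestamp.iloc[-1] - tx.timestamp.iloc[0]).days or 1
    midpoint = tx.timestamp.iloc[0] + (tx.timestamp.iloc[-1]
                                       - tx.timestamp.iloc[0]) / 2
    first_half = tx.timestamp < midpoint
    return {
        "tx_per_day": len(tx) / life_days,
        "pay_in_ratio": (tx.direction == "in").mean(),
        # Ponzi-like contracts tend to pay out early and stall later:
        "late_payout_share": (tx[~first_half].direction == "out").mean(),
        "value_dispersion": tx.value.std() / (tx.value.mean() + 1e-9),
    }
```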

Multi-Objective Decision Transformers for Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.16379
  • repo_url: None
  • paper_authors: Abdelghani Ghanem, Philippe Ciblat, Mounir Ghogho
  • for: 提升离线强化学习(offline RL)的效果,使其更充分地利用 transformer 模型的注意力机制。
  • methods: 将离线 RL 重新表述为多目标优化问题,把预测扩展到状态和回报,并在轨迹表示中引入动作空间区域(action space regions)。
  • results: 在 D4RL 基准运动任务上的实验表明,所提方案能够更有效地利用 transformer 模型的注意力机制,性能达到或超过当前最优方法。
    Abstract Offline Reinforcement Learning (RL) is structured to derive policies from static trajectory data without requiring real-time environment interactions. Recent studies have shown the feasibility of framing offline RL as a sequence modeling task, where the sole aim is to predict actions based on prior context using the transformer architecture. However, the limitation of this single task learning approach is its potential to undermine the transformer model's attention mechanism, which should ideally allocate varying attention weights across different tokens in the input context for optimal prediction. To address this, we reformulate offline RL as a multi-objective optimization problem, where the prediction is extended to states and returns. We also highlight a potential flaw in the trajectory representation used for sequence modeling, which could generate inaccuracies when modeling the state and return distributions. This is due to the non-smoothness of the action distribution within the trajectory dictated by the behavioral policy. To mitigate this issue, we introduce action space regions to the trajectory representation. Our experiments on D4RL benchmark locomotion tasks reveal that our propositions allow for more effective utilization of the attention mechanism in the transformer model, resulting in performance that either matches or outperforms current state-of-the art methods.
    摘要 离线强化学习(RL)旨在从静态轨迹数据中推导策略,而无需实时与环境交互。近期研究表明,可以将离线 RL 构建为一个序列建模任务,其唯一目标是利用 transformer 架构根据先前上下文预测动作。然而,这种单任务学习方式的局限在于,它可能削弱 transformer 模型的注意力机制,而理想情况下,注意力机制应当为输入上下文中的不同 token 分配不同的注意力权重以获得最优预测。为此,我们将离线 RL 重新表述为一个多目标优化问题,把预测扩展到状态和回报。我们还指出了序列建模所用轨迹表示中的一个潜在缺陷:由于行为策略决定的轨迹内动作分布不平滑,在建模状态和回报分布时可能产生偏差。为缓解这一问题,我们在轨迹表示中引入了动作空间区域。在 D4RL 基准运动任务上的实验表明,我们的方案能够更有效地利用 transformer 模型的注意力机制,性能达到或超过当前最优方法。
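
The action-space-region idea can be illustrated by mapping continuous actions to coarse region indices before tokenization, which smooths the otherwise non-smooth per-trajectory action distribution. The region count below is an assumed hyperparameter, and the binning scheme is our simplification of the paper's construction.

```python
# Sketch: discretize continuous actions into coarse region tokens.
import numpy as np

def to_regions(actions: np.ndarray, low: float, high: float,
               n_regions: int = 16) -> np.ndarray:
    """actions: (T, d) continuous; returns (T, d) integer region indices."""
    edges = np.linspace(low, high, n_regions + 1)[1:-1]  # interior bin edges
    return np.digitize(np.clip(actions, low, high), edges)

acts = np.random.uniform(-1, 1, size=(5, 3))
print(to_regions(acts, -1.0, 1.0))   # integer tokens in [0, 15]
```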

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

  • paper_url: http://arxiv.org/abs/2308.16369
  • repo_url: None
  • paper_authors: Amey Agrawal, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav S. Gulavani, Ramachandran Ramjee
  • for: 提高 LLM 推理性能
  • methods: 使用 chunked-prefills 和 decode-maximal batching 技术
  • results: 显著提升了 LLM 推理性能,包括解码吞吐量最高提升 10 倍、端到端吞吐量最高提升 1.33 倍,并显著减少了流水线气泡。
    Abstract Large Language Model (LLM) inference consists of two distinct phases - prefill phase which processes the input prompt and decode phase which generates output tokens autoregressively. While the prefill phase effectively saturates GPU compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times also lead to imbalance across micro-batches when using pipeline parallelism, resulting in further inefficiency due to bubbles. We present SARATHI to address these challenges. SARATHI employs chunked-prefills, which splits a prefill request into equal sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes. During inference, the prefill chunk saturates GPU compute, while the decode requests 'piggyback' and cost up to an order of magnitude less compared to a decode-only batch. Chunked-prefills allows constructing multiple decode-maximal batches from a single prefill request, maximizing coverage of decodes that can piggyback. Furthermore, the uniform compute design of these batches ameliorates the imbalance between micro-batches, significantly reducing pipeline bubbles. Our techniques yield significant improvements in inference performance across models and hardware. For the LLaMA-13B model on A6000 GPU, SARATHI improves decode throughput by up to 10x, and accelerates end-to-end throughput by up to 1.33x. For LLaMa-33B on A100 GPU, we achieve 1.25x higher end-to-end-throughput and up to 4.25x higher decode throughput. When used with pipeline parallelism on GPT-3, SARATHI reduces bubbles by 6.29x, resulting in an end-to-end throughput improvement of 1.91x.
    摘要 大型语言模型(LLM)推理包括两个不同阶段:处理输入提示的预填阶段和自回归地逐个生成输出 token 的解码阶段。即使批量较小,预填阶段也能使 GPU 计算资源达到饱和;而解码阶段每个请求一次只生成一个 token,计算资源利用率很低。预填和解码时间的差异还会在使用流水线并行时导致微批之间的不均衡,进而因气泡产生额外的低效。为解决这些挑战,我们提出了 SARATHI。SARATHI 采用 chunked-prefills,将一个预填请求切分为大小相等的块;并采用 decode-maximal batching,用单个预填块构造批次,剩余槽位全部填入解码请求。推理时,预填块使 GPU 计算资源充分饱和,而解码请求则“搭便车”,其成本比仅解码批次低至一个数量级。chunked-prefills 允许从单个预填请求构造多个 decode-maximal 批次,最大化可搭便车的解码请求覆盖率。此外,这些批次的均匀计算设计缓解了微批之间的不均衡,显著减少了流水线气泡。我们的技术在多种模型和硬件上带来了显著的推理性能提升。对于 A6000 GPU 上的 LLaMA-13B 模型,SARATHI 将解码吞吐量最高提升 10 倍,端到端吞吐量最高提升 1.33 倍。对于 A100 GPU 上的 LLaMA-33B,我们取得了 1.25 倍的端到端吞吐量提升和最高 4.25 倍的解码吞吐量提升。在 GPT-3 上结合流水线并行使用时,SARATHI 将气泡减少 6.29 倍,带来 1.91 倍的端到端吞吐量提升。
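
A toy scheduler sketch makes the chunked-prefill plus decode-maximal batching idea concrete: each batch carries exactly one prefill chunk and fills its remaining slots with pending decodes. Chunk size and slot count are illustrative, not SARATHI's tuned values.

```python
# Toy sketch of chunked-prefills + decode-maximal batch construction.
from collections import deque

def build_batches(prefill_tokens: int, chunk: int, decode_queue: deque,
                  batch_slots: int):
    batches = []
    n_chunks = -(-prefill_tokens // chunk)            # ceiling division
    for i in range(n_chunks):
        batch = [("prefill_chunk", i)]                # saturates GPU compute
        while len(batch) < batch_slots and decode_queue:
            batch.append(("decode", decode_queue.popleft()))  # piggyback
        batches.append(batch)
    return batches

print(build_batches(prefill_tokens=4096, chunk=1024,
                    decode_queue=deque(range(12)), batch_slots=5))
```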