cs.LG - 2023-10-14

Towards Semi-Structured Automatic ICD Coding via Tree-based Contrastive Learning

  • paper_url: http://arxiv.org/abs/2310.09672
  • repo_url: None
  • paper_authors: Chang Lu, Chandan K. Reddy, Ping Wang, Yue Ning
  • for: Improve the performance of existing ICD coding models by tackling the high variability of clinical notes, caused by clinicians' differing writing habits and patients' diverse pathological features, together with limited data availability.
  • methods: Automatically segment clinical notes into sections; introduce a contrastive pre-training strategy over sections using a soft multi-label similarity metric based on tree edit distance (sketched after the abstract); and design a masked section training strategy so that ICD coding models can better locate sections related to ICD codes.
  • results: Pre-training improves the performance of ICD coding models trained with limited data, and the masked section training strategy helps the models locate the sections relevant to ICD codes.
    Abstract Automatic coding of International Classification of Diseases (ICD) is a multi-label text categorization task that involves extracting disease or procedure codes from clinical notes. Despite the application of state-of-the-art natural language processing (NLP) techniques, there are still challenges including limited availability of data due to privacy constraints and the high variability of clinical notes caused by different writing habits of medical professionals and various pathological features of patients. In this work, we investigate the semi-structured nature of clinical notes and propose an automatic algorithm to segment them into sections. To address the variability issues in existing ICD coding models with limited data, we introduce a contrastive pre-training approach on sections using a soft multi-label similarity metric based on tree edit distance. Additionally, we design a masked section training strategy to enable ICD coding models to locate sections related to ICD codes. Extensive experimental results demonstrate that our proposed training strategies effectively enhance the performance of existing ICD coding methods.
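The soft multi-label similarity idea can be illustrated with a short PyTorch sketch (an assumption-laden illustration, not the paper's implementation): pairwise tree edit distances between the ICD label trees attached to two sections are turned into soft contrastive targets for the section embeddings. The encoder, temperature, and normalization choices below are hypothetical.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(embeddings, ted_matrix, temperature=0.1):
    # embeddings: (N, d) section representations from any encoder.
    # ted_matrix: (N, N) precomputed tree edit distances between the
    # ICD label trees of the N sections (hypothetical input).
    soft_targets = 1.0 - ted_matrix / (ted_matrix.max() + 1e-8)   # small distance -> high similarity
    soft_targets.fill_diagonal_(0.0)                              # ignore self-pairs
    soft_targets = soft_targets / soft_targets.sum(dim=1, keepdim=True).clamp(min=1e-8)

    z = F.normalize(embeddings, dim=1)                            # cosine-similarity logits
    logits = z @ z.t() / temperature
    logits = logits.masked_fill(torch.eye(len(z), dtype=torch.bool), -1e9)

    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

# Usage with random stand-ins for encoder outputs and distances:
emb = torch.randn(8, 128)
ted = torch.rand(8, 8) * 10
ted = (ted + ted.t()) / 2                                         # symmetrize
loss = soft_contrastive_loss(emb, ted)
```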

A Blockchain-empowered Multi-Aggregator Federated Learning Architecture in Edge Computing with Deep Reinforcement Learning Optimization

  • paper_url: http://arxiv.org/abs/2310.09665
  • repo_url: None
  • paper_authors: Xiao Li, Weili Wu
  • for: Propose a blockchain-empowered multi-aggregator federated learning architecture for edge computing (BMA-FL) that improves the security and efficiency of edge federated learning.
  • methods: A novel lightweight Byzantine consensus mechanism (PBCM) enables secure and fast model aggregation and synchronization in BMA-FL, and a multi-agent deep reinforcement learning algorithm helps aggregators decide the best training strategies (a simplified multi-aggregator averaging round is sketched after the abstract).
  • results: Experiments on real-world datasets show that BMA-FL reaches better models faster than the baselines, demonstrating the effectiveness of PBCM and the proposed deep reinforcement learning algorithm.
    Abstract Federated learning (FL) is emerging as a sought-after distributed machine learning architecture, offering the advantage of model training without direct exposure of raw data. With advancements in network infrastructure, FL has been seamlessly integrated into edge computing. However, the limited resources on edge devices introduce security vulnerabilities to FL in the context. While blockchain technology promises to bolster security, practical deployment on resource-constrained edge devices remains a challenge. Moreover, the exploration of FL with multiple aggregators in edge computing is still new in the literature. Addressing these gaps, we introduce the Blockchain-empowered Heterogeneous Multi-Aggregator Federated Learning Architecture (BMA-FL). We design a novel light-weight Byzantine consensus mechanism, namely PBCM, to enable secure and fast model aggregation and synchronization in BMA-FL. We also dive into the heterogeneity problem in BMA-FL that the aggregators are associated with varied number of connected trainers with Non-IID data distributions and diverse training speed. We proposed a multi-agent deep reinforcement learning algorithm to help aggregators decide the best training strategies. The experiments on real-word datasets demonstrate the efficiency of BMA-FL to achieve better models faster than baselines, showing the efficacy of PBCM and proposed deep reinforcement learning algorithm.
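A minimal sketch of one multi-aggregator round, assuming plain federated averaging at both levels; the paper replaces the aggregator-level agreement step with the PBCM blockchain consensus and adds DRL-chosen training strategies, neither of which is reproduced here.

```python
import numpy as np

def fedavg(weight_list, sample_counts):
    """Weighted average of model parameter vectors (one per trainer)."""
    w = np.asarray(sample_counts, dtype=float)
    w /= w.sum()
    return sum(wi * theta for wi, theta in zip(w, weight_list))

def multi_aggregator_round(trainers_per_aggregator):
    """One synchronization round with several aggregators.

    trainers_per_aggregator: list of lists of (params, n_samples) tuples,
    one inner list per aggregator.
    """
    local_models, local_sizes = [], []
    for trainers in trainers_per_aggregator:
        params = [p for p, _ in trainers]
        counts = [n for _, n in trainers]
        local_models.append(fedavg(params, counts))
        local_sizes.append(sum(counts))
    # Aggregators then agree on a single global model; here this is plain
    # weighted averaging -- the paper replaces this step with PBCM consensus.
    return fedavg(local_models, local_sizes)

# Usage: three aggregators, each with two trainers holding 10-dim models.
rng = np.random.default_rng(0)
groups = [[(rng.normal(size=10), 100), (rng.normal(size=10), 50)] for _ in range(3)]
global_model = multi_aggregator_round(groups)
```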

Topology-guided Hypergraph Transformer Network: Unveiling Structural Insights for Improved Representation

  • paper_url: http://arxiv.org/abs/2310.09657
  • repo_url: None
  • paper_authors: Khaled Mohammed Saifuddin, Mehmet Emin Aktas, Esra Akbas
  • for: Extend graph representation learning beyond ordinary graphs to hypergraphs, whose hyperedges capture high-order relations that standard GNNs miss.
  • methods: The proposed Topology-guided Hypergraph Transformer Network (THTN) first formulates a hypergraph from a graph while retaining its structural essence (one simple construction is sketched after the abstract), then applies a structural and spatial encoding module together with a structure-aware self-attention mechanism to enrich node representations with local and global topological information.
  • results: On node classification tasks, THTN consistently outperforms existing approaches by better capturing local and global topological expressions.
    Abstract Hypergraphs, with their capacity to depict high-order relationships, have emerged as a significant extension of traditional graphs. Although Graph Neural Networks (GNNs) have remarkable performance in graph representation learning, their extension to hypergraphs encounters challenges due to their intricate structures. Furthermore, current hypergraph transformers, a special variant of GNN, utilize semantic feature-based self-attention, ignoring topological attributes of nodes and hyperedges. To address these challenges, we propose a Topology-guided Hypergraph Transformer Network (THTN). In this model, we first formulate a hypergraph from a graph while retaining its structural essence to learn higher-order relations within the graph. Then, we design a simple yet effective structural and spatial encoding module to incorporate the topological and spatial information of the nodes into their representation. Further, we present a structure-aware self-attention mechanism that discovers the important nodes and hyperedges from both semantic and structural viewpoints. By leveraging these two modules, THTN crafts an improved node representation, capturing both local and global topological expressions. Extensive experiments conducted on node classification tasks demonstrate that the performance of the proposed model consistently exceeds that of the existing approaches.
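One simple way to "formulate a hypergraph from a graph" is to let every node's closed 1-hop neighborhood define a hyperedge; this is only an assumed construction for illustration, not necessarily the one used in THTN.

```python
import numpy as np

def graph_to_hypergraph_incidence(adj):
    """Build a node-hyperedge incidence matrix from a graph adjacency matrix.

    Each node induces one hyperedge containing itself and its neighbors,
    so higher-order neighborhood structure is retained.
    """
    n = adj.shape[0]
    incidence = np.zeros((n, n), dtype=int)     # rows: nodes, cols: hyperedges
    for v in range(n):
        members = np.flatnonzero(adj[v]).tolist() + [v]
        incidence[members, v] = 1
    return incidence

# Example: a triangle plus a pendant node.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
H = graph_to_hypergraph_incidence(adj)          # shape (4, 4)
```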

Mixed-Type Tabular Data Synthesis with Score-based Diffusion in Latent Space

  • paper_url: http://arxiv.org/abs/2310.09656
  • repo_url: https://github.com/amazon-science/tabsyn
  • paper_authors: Hengrui Zhang, Jiani Zhang, Balasubramaniam Srinivasan, Zhengyuan Shen, Xiao Qin, Christos Faloutsos, Huzefa Rangwala, George Karypis
  • for: Generate high-quality synthetic tabular data despite mixed data types and intricately varied column distributions.
  • methods: TABSYN runs a score-based diffusion model in a latent space crafted by a variational autoencoder (VAE), converting the different data types into a single unified space and explicitly capturing inter-column relations.
  • results: Extensive experiments on six datasets with five metrics show that TABSYN outperforms existing methods, reducing the error rates of column-wise distribution and pair-wise column correlation estimation by 86% and 67% relative to the most competitive baselines.
    Abstract Recent advances in tabular data generation have greatly enhanced synthetic data quality. However, extending diffusion models to tabular data is challenging due to the intricately varied distributions and a blend of data types of tabular data. This paper introduces TABSYN, a methodology that synthesizes tabular data by leveraging a diffusion model within a variational autoencoder (VAE) crafted latent space. The key advantages of the proposed TABSYN include (1) Generality: the ability to handle a broad spectrum of data types by converting them into a single unified space and explicitly capture inter-column relations; (2) Quality: optimizing the distribution of latent embeddings to enhance the subsequent training of diffusion models, which helps generate high-quality synthetic data, (3) Speed: much fewer number of reverse steps and faster synthesis speed than existing diffusion-based methods. Extensive experiments on six datasets with five metrics demonstrate that TABSYN outperforms existing methods. Specifically, it reduces the error rates by 86% and 67% for column-wise distribution and pair-wise column correlation estimations compared with the most competitive baselines.

DPZero: Dimension-Independent and Differentially Private Zeroth-Order Optimization

  • paper_url: http://arxiv.org/abs/2310.09639
  • repo_url: None
  • paper_authors: Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
  • for: Address the memory and privacy challenges of fine-tuning large language models (LLMs) on domain-specific data.
  • methods: Zeroth-order optimization methods, which rely solely on forward passes, substantially reduce memory consumption during training; however, naively combining them with standard differential privacy mechanisms incurs dimension-dependent complexity (a minimal private zeroth-order update is sketched after the abstract).
  • results: The proposed DPZero, a differentially private zeroth-order algorithm, has complexity that depends primarily on the problem's intrinsic dimension and only logarithmically on the ambient dimension, making it a practical option for real-world LLM deployments.
    Abstract The widespread practice of fine-tuning pretrained large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continue to grow, encompassing billions of parameters, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize and disclose sensitive training data, the privacy of fine-tuning data must be respected. To this end, we explore the potential of zeroth-order methods in differentially private optimization for fine-tuning LLMs. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differential privacy mechanism poses dimension-dependent complexity. To bridge the gap, we introduce DPZero, a novel differentially private zeroth-order algorithm with nearly dimension-independent rates. Our theoretical analysis reveals that its complexity hinges primarily on the problem's intrinsic dimension and exhibits only a logarithmic dependence on the ambient dimension. This renders DPZero a highly practical option for real-world LLMs deployments.
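The core idea, a two-point zeroth-order gradient estimate whose scalar directional derivative is clipped and privatized, can be sketched in a few lines of NumPy; the clipping rule, noise calibration, and step size below are placeholders rather than the paper's exact choices.

```python
import numpy as np

def dp_zeroth_order_step(theta, loss_fn, lr=0.1, mu=1e-3, clip=1.0, sigma=1.0,
                         rng=np.random.default_rng(0)):
    """One private zeroth-order update (a sketch of the idea, not DPZero itself).

    Two forward evaluations estimate a directional derivative along a random
    direction u; the scalar estimate is clipped and perturbed with Gaussian
    noise, so the privatization is one-dimensional rather than per-parameter.
    """
    u = rng.standard_normal(theta.shape)
    u /= np.linalg.norm(u)
    g = (loss_fn(theta + mu * u) - loss_fn(theta - mu * u)) / (2 * mu)
    g = np.clip(g, -clip, clip)                  # bound the sensitivity of the scalar
    g_priv = g + sigma * rng.standard_normal()   # privatize the scalar only
    return theta - lr * g_priv * u

# Usage on a toy quadratic:
theta = np.ones(5)
for _ in range(100):
    theta = dp_zeroth_order_step(theta, lambda w: np.sum(w ** 2))
```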

Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling

  • paper_url: http://arxiv.org/abs/2310.09636
  • repo_url: https://github.com/tiberiu44/TTS-Cube
  • paper_authors: Tiberiu Boros, Stefan Daniel Dumitrescu, Ionut Mironica, Radu Chivereanu
  • for: Describe an end-to-end text-to-speech synthesis system trained with generative adversarial training.
  • methods: The vocoder is trained adversarially for raw phoneme-to-audio conversion, with explicit phonetic, pitch, and duration modelling.
  • results: Experiments with several pre-trained models for contextualized and decontextualized word embeddings, plus a new method for highly expressive character voice matching based on discrete style tokens.
    Abstract We describe an end-to-end speech synthesis system that uses generative adversarial training. We train our Vocoder for raw phoneme-to-audio conversion, using explicit phonetic, pitch and duration modeling. We experiment with several pre-trained models for contextualized and decontextualized word embeddings and we introduce a new method for highly expressive character voice matching, based on discreet style tokens.

Landslide Topology Uncovers Failure Movements

  • paper_url: http://arxiv.org/abs/2310.09631
  • repo_url: None
  • paper_authors: Kamal Rana, Kushanav Bhuyan, Joaquin Vicente Ferrer, Fabrice Cotton, Ugur Ozturk, Filippo Catani, Nishant Malik
  • for: Improve the accuracy of landslide predictive models and impact assessments by identifying failure types based on their movements.
  • methods: Use 3D landslide topology to identify failure types, such as slides and flows, by analyzing topological proxies that reveal the mechanics of mass movements.
  • results: 80 to 94% accuracy in identifying failure types in historic and event-specific landslide databases from various geomorphological and climatic contexts.
    Abstract The death toll and monetary damages from landslides continue to rise despite advancements in predictive modeling. The predictive capability of these models is limited as landslide databases used in training and assessing the models often have crucial information missing, such as underlying failure types. Here, we present an approach for identifying failure types based on their movements, e.g., slides and flows by leveraging 3D landslide topology. We observe topological proxies reveal prevalent signatures of mass movement mechanics embedded in the landslide's morphology or shape, such as detecting coupled movement styles within complex landslides. We find identical failure types exhibit similar topological properties, and by using them as predictors, we can identify failure types in historic and event-specific landslide databases (including multi-temporal) from various geomorphological and climatic contexts such as Italy, the US Pacific Northwest region, Denmark, Turkey, and China with 80 to 94 % accuracy. To demonstrate the real-world application of the method, we implement it in two undocumented datasets from China and publicly release the datasets. These new insights can considerably improve the performance of landslide predictive models and impact assessments. Moreover, our work introduces a new paradigm for studying landslide shapes to understand underlying processes through the lens of landslide topology.

Federated Battery Diagnosis and Prognosis

  • paper_url: http://arxiv.org/abs/2310.09628
  • repo_url: None
  • paper_authors: Nur Banu Altinpulluk, Deniz Altinpulluk, Paritosh Ramanan, Noah Paulson, Feng Qiu, Susan Babinec, Murat Yildirim
  • for: Propose a federated battery diagnosis and prognosis model that addresses the data ownership, privacy, communication, and processing challenges of centralized approaches.
  • methods: Standard current-voltage-time-usage data are processed in a distributed, privacy-preserving manner; only model parameters are communicated, reducing communication load and keeping raw data confidential.
  • results: Privacy-preserving distributed battery diagnosis and remaining-lifetime prediction, offering a paradigm shift for battery health management.
    Abstract Battery diagnosis, prognosis and health management models play a critical role in the integration of battery systems in energy and mobility fields. However, large-scale deployment of these models is hindered by a myriad of challenges centered around data ownership, privacy, communication, and processing. State-of-the-art battery diagnosis and prognosis methods require centralized collection of data, which further aggravates these challenges. Here we propose a federated battery prognosis model, which distributes the processing of battery standard current-voltage-time-usage data in a privacy-preserving manner. Instead of exchanging raw standard current-voltage-time-usage data, our model communicates only the model parameters, thus reducing communication load and preserving data confidentiality. The proposed model offers a paradigm shift in battery health management through privacy-preserving distributed methods for battery data processing and remaining lifetime prediction.

Machine Learning for Urban Air Quality Analytics: A Survey

  • paper_url: http://arxiv.org/abs/2310.09620
  • repo_url: None
  • paper_authors: Jindong Han, Weijia Zhang, Hao Liu, Hui Xiong
  • for: Survey the application of machine learning (ML) to urban air quality analytics, giving professionals a comprehensive reference for matching techniques to their problems and for advancing cutting-edge research.
  • methods: A roadmap spanning data acquisition and pre-processing through the main analytical tasks, including pollution pattern mining, air quality inference, and forecasting.
  • results: A systematic categorization and summary of existing methodologies and applications, a list of publicly available air quality datasets to ease further research, and a discussion of promising future research directions.
    Abstract The increasing air pollution poses an urgent global concern with far-reaching consequences, such as premature mortality and reduced crop yield, which significantly impact various aspects of our daily lives. Accurate and timely analysis of air pollution is crucial for understanding its underlying mechanisms and implementing necessary precautions to mitigate potential socio-economic losses. Traditional analytical methodologies, such as atmospheric modeling, heavily rely on domain expertise and often make simplified assumptions that may not be applicable to complex air pollution problems. In contrast, Machine Learning (ML) models are able to capture the intrinsic physical and chemical rules by automatically learning from a large amount of historical observational data, showing great promise in various air quality analytical tasks. In this article, we present a comprehensive survey of ML-based air quality analytics, following a roadmap spanning from data acquisition to pre-processing, and encompassing various analytical tasks such as pollution pattern mining, air quality inference, and forecasting. Moreover, we offer a systematic categorization and summary of existing methodologies and applications, while also providing a list of publicly available air quality datasets to ease the research in this direction. Finally, we identify several promising future research directions. This survey can serve as a valuable resource for professionals seeking suitable solutions for their specific challenges and advancing their research at the cutting edge.

STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.09615
  • repo_url: https://github.com/weipu-zhang/storm
  • paper_authors: Weipu Zhang, Gang Wang, Jian Sun, Yetian Yuan, Gao Huang
  • for: Improve the performance and efficiency of model-based reinforcement learning in visual input environments.
  • methods: A parameterized world model is learned through self-supervised learning and the agent's policy is improved within the model's imagination; STORM combines the sequence modelling and generation strength of Transformers with the stochasticity of variational autoencoders.
  • results: A mean human performance of $126.7\%$ on the Atari $100$k benchmark, with markedly better training efficiency than previous methods.
    Abstract Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments. These approaches begin by constructing a parameterized simulation world model of the real environment through self-supervised learning. By leveraging the imagination of the world model, the agent's policy is enhanced without the constraints of sampling from the real environment. The performance of these algorithms heavily relies on the sequence modeling and generation capabilities of the world model. However, constructing a perfectly accurate model of a complex unknown environment is nearly impossible. Discrepancies between the model and reality may cause the agent to pursue virtual goals, resulting in subpar performance in the real environment. Introducing random noise into model-based reinforcement learning has been proven beneficial. In this work, we introduce Stochastic Transformer-based wORld Model (STORM), an efficient world model architecture that combines the strong sequence modeling and generation capabilities of Transformers with the stochastic nature of variational autoencoders. STORM achieves a mean human performance of $126.7\%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods that do not employ lookahead search techniques. Moreover, training an agent with $1.85$ hours of real-time interaction experience on a single NVIDIA GeForce RTX 3090 graphics card requires only $4.3$ hours, showcasing improved efficiency compared to previous methodologies.

Towards Intelligent Network Management: Leveraging AI for Network Service Detection

  • paper_url: http://arxiv.org/abs/2310.09609
  • repo_url: None
  • paper_authors: Khuong N. Nguyen, Abhishek Sehgal, Yuming Zhu, Junsu Choi, Guanbo Chen, Hao Chen, Boon Loong Ng, Charlie Zhang
  • for: Develop an advanced network traffic classification system for precise traffic analysis in modern wireless networks.
  • methods: A data-driven machine learning approach identifies network service types in real time by analyzing traffic patterns, grouping similar traffic into latency-based service categories and decomposing the traffic stream into smaller flows, each carrying a specific service; models are trained on labeled examples collected under various Wi-Fi conditions (a toy flow classifier is sketched after the abstract).
  • results: High accuracy in distinguishing network services across varying wireless conditions, highlighting the potential of integrating AI into wireless technologies.
    Abstract As the complexity and scale of modern computer networks continue to increase, there has emerged an urgent need for precise traffic analysis, which plays a pivotal role in cutting-edge wireless connectivity technologies. This study focuses on leveraging Machine Learning methodologies to create an advanced network traffic classification system. We introduce a novel data-driven approach that excels in identifying various network service types in real-time, by analyzing patterns within the network traffic. Our method organizes similar kinds of network traffic into distinct categories, referred to as network services, based on latency requirement. Furthermore, it decomposes the network traffic stream into multiple, smaller traffic flows, with each flow uniquely carrying a specific service. Our ML models are trained on a dataset comprised of labeled examples representing different network service types collected on various Wi-Fi network conditions. Upon evaluation, our system demonstrates a remarkable accuracy in distinguishing the network services. These results emphasize the substantial promise of integrating Artificial Intelligence in wireless technologies. Such an approach encourages more efficient energy consumption, enhances Quality of Service assurance, and optimizes the allocation of network resources, thus laying a solid groundwork for the development of advanced intelligent networks.
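A toy version of the pipeline, using scikit-learn on synthetic flow features; the real system's features, service taxonomy, and model are not specified here and will differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for labeled flow features (packet sizes, inter-arrival
# statistics, throughput, ...) and latency-based service classes.
rng = np.random.default_rng(0)
X = rng.random((1000, 8))
y = (X[:, 0] * 4).astype(int)        # e.g. 4 hypothetical service categories

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```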

Adaptive maximization of social welfare

  • paper_url: http://arxiv.org/abs/2310.09597
  • repo_url: None
  • paper_authors: Nicolo Cesa-Bianchi, Roberto Colomboni, Maximilian Kasy
  • for: Study the problem of repeatedly choosing policies to maximize social welfare, defined as a weighted sum of private utility and public revenue.
  • methods: Response functions are learned through experimentation, and regret bounds are derived for a variant of the Exp3 algorithm (a standard Exp3 sketch follows the abstract).
  • results: Cumulative regret grows at rate $T^{2/3}$, implying that (i) welfare maximization is harder than the multi-armed bandit problem (rate $T^{1/2}$ for finite policy sets) and (ii) the algorithm achieves the optimal rate; if social welfare is concave, a dyadic search algorithm attains a $T^{1/2}$ rate for continuous policy sets.
    Abstract We consider the problem of repeatedly choosing policies to maximize social welfare. Welfare is a weighted sum of private utility and public revenue. Earlier outcomes inform later policies. Utility is not observed, but indirectly inferred. Response functions are learned through experimentation. We derive a lower bound on regret, and a matching adversarial upper bound for a variant of the Exp3 algorithm. Cumulative regret grows at a rate of $T^{2/3}$. This implies that (i) welfare maximization is harder than the multi-armed bandit problem (with a rate of $T^{1/2}$ for finite policy sets), and (ii) our algorithm achieves the optimal rate. For the stochastic setting, if social welfare is concave, we can achieve a rate of $T^{1/2}$ (for continuous policy sets), using a dyadic search algorithm. We analyze an extension to nonlinear income taxation, and sketch an extension to commodity taxation. We compare our setting to monopoly pricing (which is easier), and price setting for bilateral trade (which is harder).
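For reference, a standard Exp3 sketch in NumPy; the paper analyzes a variant of Exp3 adapted to welfare maximization with indirectly inferred utility, which this vanilla version does not capture.

```python
import numpy as np

def exp3(reward_fn, n_arms, T, gamma=0.1, rng=np.random.default_rng(0)):
    """Standard Exp3 for adversarial bandits (illustrative, rewards in [0, 1])."""
    weights = np.ones(n_arms)
    for t in range(T):
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(arm, t)
        est = reward / probs[arm]                 # importance-weighted, unbiased estimate
        weights[arm] *= np.exp(gamma * est / n_arms)
    return weights / weights.sum()

# Usage: arm 2 always pays 1, others pay 0.
final_probs = exp3(lambda a, t: float(a == 2), n_arms=5, T=2000)
```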

Causality and Independence Enhancement for Biased Node Classification

  • paper_url: http://arxiv.org/abs/2310.09586
  • repo_url: https://github.com/chen-gx/cie
  • paper_authors: Guoxin Chen, Yongqing Wang, Fangda Guo, Qinglang Guo, Jiangli Shao, Huawei Shen, Xueqi Cheng
  • For: The paper aims to address the problem of out-of-distribution (OOD) generalization for node classification on graphs, specifically focusing on mixed biases and low-resource scenarios.* Methods: The proposed Causality and Independence Enhancement (CIE) framework estimates causal and spurious features at the node representation level, mitigates the influence of spurious correlations through backdoor adjustment, and introduces independence constraint to improve discriminability and stability of causal and spurious features.* Results: The proposed CIE approach significantly enhances the performance of graph neural networks (GNNs) and outperforms state-of-the-art debiased node classification methods in various scenarios, including specific types of data biases, mixed biases, and low-resource scenarios.
    Abstract Most existing methods that address out-of-distribution (OOD) generalization for node classification on graphs primarily focus on a specific type of data biases, such as label selection bias or structural bias. However, anticipating the type of bias in advance is extremely challenging, and designing models solely for one specific type may not necessarily improve overall generalization performance. Moreover, limited research has focused on the impact of mixed biases, which are more prevalent and demanding in real-world scenarios. To address these limitations, we propose a novel Causality and Independence Enhancement (CIE) framework, applicable to various graph neural networks (GNNs). Our approach estimates causal and spurious features at the node representation level and mitigates the influence of spurious correlations through the backdoor adjustment. Meanwhile, independence constraint is introduced to improve the discriminability and stability of causal and spurious features in complex biased environments. Essentially, CIE eliminates different types of data biases from a unified perspective, without the need to design separate methods for each bias as before. To evaluate the performance under specific types of data biases, mixed biases, and low-resource scenarios, we conducted comprehensive experiments on five publicly available datasets. Experimental results demonstrate that our approach CIE not only significantly enhances the performance of GNNs but outperforms state-of-the-art debiased node classification methods.

Two Sides of The Same Coin: Bridging Deep Equilibrium Models and Neural ODEs via Homotopy Continuation

  • paper_url: http://arxiv.org/abs/2310.09583
  • repo_url: None
  • paper_authors: Shutong Ding, Tianyu Cui, Jingya Wang, Ye Shi
  • for: Propose a new implicit model, HomoODE, that bridges Deep Equilibrium Models (DEQs) and Neural ODEs and addresses the equilibrium-point-finding problem in implicit models.
  • methods: Homotopy continuation is used to solve the equilibrium-point-finding problem implicitly via a modified Neural ODE (an illustrative continuation solver is sketched after the abstract), together with an acceleration method based on a shared learnable initial point.
  • results: On image classification tasks, HomoODE surpasses existing implicit models in both accuracy and memory consumption.
    Abstract Deep Equilibrium Models (DEQs) and Neural Ordinary Differential Equations (Neural ODEs) are two branches of implicit models that have achieved remarkable success owing to their superior performance and low memory consumption. While both are implicit models, DEQs and Neural ODEs are derived from different mathematical formulations. Inspired by homotopy continuation, we establish a connection between these two models and illustrate that they are actually two sides of the same coin. Homotopy continuation is a classical method of solving nonlinear equations based on a corresponding ODE. Given this connection, we proposed a new implicit model called HomoODE that inherits the property of high accuracy from DEQs and the property of stability from Neural ODEs. Unlike DEQs, which explicitly solve an equilibrium-point-finding problem via Newton's methods in the forward pass, HomoODE solves the equilibrium-point-finding problem implicitly using a modified Neural ODE via homotopy continuation. Further, we developed an acceleration method for HomoODE with a shared learnable initial point. It is worth noting that our model also provides a better understanding of why Augmented Neural ODEs work as long as the augmented part is regarded as the equilibrium point to find. Comprehensive experiments with several image classification tasks demonstrate that HomoODE surpasses existing implicit models in terms of both accuracy and memory consumption.
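Homotopy continuation for an equilibrium (fixed-point) problem can be illustrated on a scalar example: deform an easy problem into x = f(x) and track the solution with a few Newton corrections per step. This is a generic numerical sketch, not the HomoODE architecture.

```python
import numpy as np

def homotopy_fixed_point(f, df, x0=0.0, steps=20, newton_iters=3):
    """Find x with x = f(x) by homotopy continuation.

    H(x, t) = x - x0 - t * (f(x) - x0) interpolates between the trivial
    problem at t = 0 (solution x0) and the equilibrium problem at t = 1.
    """
    x = x0
    for t in np.linspace(0.0, 1.0, steps + 1)[1:]:
        for _ in range(newton_iters):
            h = x - x0 - t * (f(x) - x0)
            dh = 1.0 - t * df(x)
            x -= h / dh                      # Newton correction at this t
    return x

# Example: fixed point of cos(x) (approximately 0.739085).
x_star = homotopy_fixed_point(np.cos, lambda x: -np.sin(x), x0=0.0)
```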

Reduced Policy Optimization for Continuous Control with Hard Constraints

  • paper_url: http://arxiv.org/abs/2310.09574
  • repo_url: None
  • paper_authors: Shutong Ding, Jingya Wang, Yali Du, Ye Shi
  • for: Propose a constrained reinforcement learning (RL) algorithm that can effectively handle general hard constraints in continuous control.
  • methods: Reduced Policy Optimization (RPO) combines RL with the Generalized Reduced Gradient (GRG) method: actions are partitioned into basic and nonbasic actions, a policy network outputs the basic actions, the nonbasic actions are computed by solving the equality constraints (see the sketch after the abstract), and the policy network is updated by implicitly differentiating the nonbasic actions with respect to the basic ones; a reduced-gradient action projection and a modified Lagrangian relaxation ensure that inequality constraints are satisfied.
  • results: On three new benchmarks (two robotic manipulation tasks and a smart grid operation control task), RPO outperforms previous constrained RL algorithms in both cumulative reward and constraint violation.
    Abstract Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly in those situations with non-convex hard constraints. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to address general hard constraints. RPO partitions actions into basic actions and nonbasic actions following the GRG method and outputs the basic actions via a policy network. Subsequently, RPO calculates the nonbasic actions by solving equations based on equality constraints using the obtained basic actions. The policy network is then updated by implicitly differentiating nonbasic actions with respect to basic actions. Additionally, we introduce an action projection procedure based on the reduced gradient and apply a modified Lagrangian relaxation technique to ensure inequality constraints are satisfied. To the best of our knowledge, RPO is the first attempt that introduces GRG to RL as a way of efficiently handling both equality and inequality hard constraints. It is worth noting that there is currently a lack of RL environments with complex hard constraints, which motivates us to develop three new benchmarks: two robotics manipulation tasks and a smart grid operation control task. With these benchmarks, RPO achieves better performance than previous constrained RL algorithms in terms of both cumulative reward and constraint violation. We believe RPO, along with the new benchmarks, will open up new opportunities for applying RL to real-world problems with complex constraints.
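The basic/nonbasic split can be illustrated for linear equality constraints: the policy outputs only the basic coordinates and the rest are recovered by solving the constraint system. Everything else in RPO (nonlinear equalities, the projection step, implicit differentiation) is omitted from this sketch.

```python
import numpy as np

def complete_action(a_basic, A_basic, A_nonbasic, b):
    """Recover nonbasic actions from equality constraints (GRG-style sketch).

    Given A_basic @ a_basic + A_nonbasic @ a_nonbasic = b, the policy network
    outputs only a_basic; the remaining coordinates follow from a linear solve.
    """
    rhs = b - A_basic @ a_basic
    a_nonbasic = np.linalg.solve(A_nonbasic, rhs)
    return np.concatenate([a_basic, a_nonbasic])

# Toy example: two constraints, one basic and two nonbasic action coordinates.
A_basic = np.array([[1.0], [0.5]])
A_nonbasic = np.array([[2.0, 0.0], [1.0, 1.0]])
b = np.array([1.0, 0.3])
action = complete_action(np.array([0.2]), A_basic, A_nonbasic, b)
```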

Neural network scoring for efficient computing

  • paper_url: http://arxiv.org/abs/2310.09554
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Hugo Waltsburger, Erwan Libessart, Chengfang Ren, Anthony Kolar, Regis Guinvarc’h
  • for: Propose a new way to assess the efficiency of high-performance computing and deep learning workloads, together with a new open-source tool that implements it.
  • methods: A composite score characterizing the trade-off between accuracy and power consumption during neural network inference, measured alongside granular power consumption, RAM/CPU/GPU utilization, storage, and network input/output (I/O).
  • results: Reproducible power-efficiency measurements of state-of-the-art neural network architectures on miscellaneous hardware, enabling the comparison of algorithmic power efficiency and the fitting of hardware to a specific task.
    Abstract Much work has been dedicated to estimating and optimizing workloads in high-performance computing (HPC) and deep learning. However, researchers have typically relied on few metrics to assess the efficiency of those techniques. Most notably, the accuracy, the loss of the prediction, and the computational time with regard to GPUs or/and CPUs characteristics. It is rare to see figures for power consumption, partly due to the difficulty of obtaining accurate power readings. In this paper, we introduce a composite score that aims to characterize the trade-off between accuracy and power consumption measured during the inference of neural networks. For this purpose, we present a new open-source tool allowing researchers to consider more metrics: granular power consumption, but also RAM/CPU/GPU utilization, as well as storage, and network input/output (I/O). To our best knowledge, it is the first fit test for neural architectures on hardware architectures. This is made possible thanks to reproducible power efficiency measurements. We applied this procedure to state-of-the-art neural network architectures on miscellaneous hardware. One of the main applications and novelties is the measurement of algorithmic power efficiency. The objective is to allow researchers to grasp their algorithms' efficiencies better. This methodology was developed to explore trade-offs between energy usage and accuracy in neural networks. It is also useful when fitting hardware for a specific task or to compare two architectures more accurately, with architecture exploration in mind.

ARTree: A Deep Autoregressive Model for Phylogenetic Inference

  • paper_url: http://arxiv.org/abs/2310.09553
  • repo_url: https://github.com/tyuxie/artree
  • paper_authors: Tianyu Xie, Cheng Zhang
  • for: Develop efficient phylogenetic inference methods by designing flexible probabilistic models over tree topologies.
  • methods: ARTree, a deep autoregressive model over tree topologies based on graph neural networks (GNNs), decomposes a topology into a sequence of leaf node addition operations (sketched after the abstract) and models the involved conditional distributions with learnable topological features.
  • results: A rich family of distributions over the entire tree topology space with simple sampling and density estimation procedures, no hand-engineered heuristic features, and strong performance on challenging real-data benchmarks for tree topology density estimation and variational Bayesian phylogenetic inference.
    Abstract Designing flexible probabilistic models over tree topologies is important for developing efficient phylogenetic inference methods. To do that, previous works often leverage the similarity of tree topologies via hand-engineered heuristic features which would require pre-sampled tree topologies and may suffer from limited approximation capability. In this paper, we propose a deep autoregressive model for phylogenetic inference based on graph neural networks (GNNs), called ARTree. By decomposing a tree topology into a sequence of leaf node addition operations and modeling the involved conditional distributions based on learnable topological features via GNNs, ARTree can provide a rich family of distributions over the entire tree topology space that have simple sampling algorithms and density estimation procedures, without using heuristic features. We demonstrate the effectiveness and efficiency of our method on a benchmark of challenging real data tree topology density estimation and variational Bayesian phylogenetic inference problems.
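The leaf-addition decomposition that ARTree models autoregressively can be sketched as follows, with a uniform rather than learned choice of attachment edge.

```python
import random

def random_tree_by_leaf_addition(n_leaves, rng=random.Random(0)):
    """Grow an unrooted binary tree topology by sequential leaf addition.

    Leaves are numbered 0..n_leaves-1; internal nodes get negative labels.
    ARTree replaces the uniform edge choice with a learned conditional
    distribution based on topological node features.
    """
    # Start from the unique unrooted topology on 3 leaves (internal node -1).
    edges = [(-1, 0), (-1, 1), (-1, 2)]
    next_internal = -2
    for leaf in range(3, n_leaves):
        u, v = edges.pop(rng.randrange(len(edges)))   # edge to subdivide
        w = next_internal
        next_internal -= 1
        edges += [(u, w), (w, v), (w, leaf)]          # attach the new leaf
    return edges

print(random_tree_by_leaf_addition(6))                # 2 * 6 - 3 = 9 edges
```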

Hypernetwork-based Meta-Learning for Low-Rank Physics-Informed Neural Networks

  • paper_url: http://arxiv.org/abs/2310.09528
  • repo_url: None
  • paper_authors: Woojin Cho, Kookjin Lee, Donsub Rim, Noseong Park
  • for: This paper aims to explore the use of physics-informed neural networks (PINNs) as a solver for repetitive numerical simulations of partial differential equations (PDEs) in various engineering and applied science applications.
  • methods: The proposed method uses lightweight low-rank PINNs with only hundreds of model parameters and an associated hypernetwork-based meta-learning algorithm (a low-rank layer with a hypernetwork is sketched after the abstract) to efficiently approximate solutions of PDEs over varying ranges of PDE input parameters.
  • results: The proposed method is effective in overcoming the “failure modes” of PINNs and shows promising results in solving PDEs for many-query scenarios.
    Abstract In various engineering and applied science applications, repetitive numerical simulations of partial differential equations (PDEs) for varying input parameters are often required (e.g., aircraft shape optimization over many design parameters) and solvers are required to perform rapid execution. In this study, we suggest a path that potentially opens up a possibility for physics-informed neural networks (PINNs), emerging deep-learning-based solvers, to be considered as one such solver. Although PINNs have pioneered a proper integration of deep-learning and scientific computing, they require repetitive time-consuming training of neural networks, which is not suitable for many-query scenarios. To address this issue, we propose a lightweight low-rank PINNs containing only hundreds of model parameters and an associated hypernetwork-based meta-learning algorithm, which allows efficient approximation of solutions of PDEs for varying ranges of PDE input parameters. Moreover, we show that the proposed method is effective in overcoming a challenging issue, known as "failure modes" of PINNs.
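A hypothetical sketch of a low-rank layer modulated by a hypernetwork conditioned on a PDE parameter; the rank, layer shapes, and meta-learning loop in the paper may differ.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    """Linear layer with weight U @ diag(s) @ V^T, where only the small
    modulation vector s is produced per PDE parameter by a hypernetwork."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.1)
        self.V = nn.Parameter(torch.randn(d_in, rank) * 0.1)

    def forward(self, x, s):                    # s: (rank,) from the hypernetwork
        return x @ (self.V * s) @ self.U.t()

hyper = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 8))
layer = LowRankLinear(d_in=2, d_out=64, rank=8)

pde_param = torch.tensor([[0.5]])               # e.g. a diffusion coefficient
s = hyper(pde_param).squeeze(0)                 # rank-sized modulation vector
u = layer(torch.rand(16, 2), s)                 # (16, 64) hidden features
```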

  • paper_url: http://arxiv.org/abs/2310.09516
  • repo_url: None
  • paper_authors: Yuxin Wang, Xiannian Hu, Quan Gan, Xuanjing Huang, Xipeng Qiu, David Wipf
  • for: Improve link prediction with graph neural networks (GNNs) while balancing accuracy and inference cost.
  • methods: A novel GNN architecture whose forward pass explicitly depends on both positive and negative edges, yielding more flexible yet still cheap node-wise embeddings; the embeddings are recast as minimizers of a forward-pass-specific energy function that favors separation of positive and negative samples (a standard node-wise decoder baseline is sketched after the abstract).
  • results: The architecture retains the inference speed of node-wise models while achieving accuracy competitive with edge-wise alternatives.
    Abstract Graph neural networks (GNNs) for link prediction can loosely be divided into two broad categories. First, \emph{node-wise} architectures pre-compute individual embeddings for each node that are later combined by a simple decoder to make predictions. While extremely efficient at inference time (since node embeddings are only computed once and repeatedly reused), model expressiveness is limited such that isomorphic nodes contributing to candidate edges may not be distinguishable, compromising accuracy. In contrast, \emph{edge-wise} methods rely on the formation of edge-specific subgraph embeddings to enrich the representation of pair-wise relationships, disambiguating isomorphic nodes to improve accuracy, but with the cost of increased model complexity. To better navigate this trade-off, we propose a novel GNN architecture whereby the \emph{forward pass} explicitly depends on \emph{both} positive (as is typical) and negative (unique to our approach) edges to inform more flexible, yet still cheap node-wise embeddings. This is achieved by recasting the embeddings themselves as minimizers of a forward-pass-specific energy function (distinct from the actual training loss) that favors separation of positive and negative samples. As demonstrated by extensive empirical evaluations, the resulting architecture retains the inference speed of node-wise models, while producing competitive accuracy with edge-wise alternatives.
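For context, the standard node-wise baseline the paper starts from: a dot-product decoder over positive and negative edges. The paper's contribution, making the forward pass itself depend on negative edges through an energy-minimizing embedding, is not reproduced here.

```python
import torch
import torch.nn.functional as F

def link_prediction_loss(node_emb, pos_edges, neg_edges):
    """Node-wise link prediction with a simple dot-product decoder.

    node_emb: (N, d) embeddings from any GNN encoder;
    pos_edges / neg_edges: (2, E) index tensors.
    """
    def scores(edges):
        src, dst = edges
        return (node_emb[src] * node_emb[dst]).sum(dim=-1)

    pos_logits, neg_logits = scores(pos_edges), scores(neg_edges)
    labels = torch.cat([torch.ones_like(pos_logits), torch.zeros_like(neg_logits)])
    return F.binary_cross_entropy_with_logits(torch.cat([pos_logits, neg_logits]), labels)

# Usage with random embeddings and edges:
emb = torch.randn(100, 16)
pos = torch.randint(0, 100, (2, 50))
neg = torch.randint(0, 100, (2, 50))
loss = link_prediction_loss(emb, pos, neg)
```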

Online Parameter Identification of Generalized Non-cooperative Game

  • paper_url: http://arxiv.org/abs/2310.09511
  • repo_url: None
  • paper_authors: Jianguo Chen, Jinlong Lei, Hongsheng Qi, Yiguang Hong
  • for: Study parameter identification in a generalized non-cooperative game in which each player's cost function is influenced by an observable signal and some unknown parameters; noisy equilibria are observed at certain signals, and the goal is to identify the unknown parameters from the observed data.
  • methods: The identification problem is cast as online optimization, and a novel online parameter identification algorithm is proposed, based on a regularized loss that balances conservativeness (new estimates stay close to current ones) and correctiveness (captured by the Karush-Kuhn-Tucker conditions); when the players' cost functions are linear in the unknown parameters and the learning rate satisfies $\mu_k \propto 1/\sqrt{k}$, the regret bound is $O(\sqrt{K})$.
  • results: Numerical simulations on a Nash-Cournot problem show that the online identification algorithm performs comparably to the offline setting.
    Abstract This work studies the parameter identification problem of a generalized non-cooperative game, where each player's cost function is influenced by an observable signal and some unknown parameters. We consider the scenario where equilibrium of the game at some observable signals can be observed with noises, whereas our goal is to identify the unknown parameters with the observed data. Assuming that the observable signals and the corresponding noise-corrupted equilibriums are acquired sequentially, we construct this parameter identification problem as online optimization and introduce a novel online parameter identification algorithm. To be specific, we construct a regularized loss function that balances conservativeness and correctiveness, where the conservativeness term ensures that the new estimates do not deviate significantly from the current estimates, while the correctiveness term is captured by the Karush-Kuhn-Tucker conditions. We then prove that when the players' cost functions are linear with respect to the unknown parameters and the learning rate of the online parameter identification algorithm satisfies \mu_k \propto 1/\sqrt{k}, along with other assumptions, the regret bound of the proposed algorithm is O(\sqrt{K}). Finally, we conduct numerical simulations on a Nash-Cournot problem to demonstrate that the performance of the online identification algorithm is comparable to that of the offline setting.

Advancing Test-Time Adaptation for Acoustic Foundation Models in Open-World Shifts

  • paper_url: http://arxiv.org/abs/2310.09505
  • repo_url: None
  • paper_authors: Hongfu Liu, Hengguan Huang, Ye Wang
  • for: Bring test-time adaptation (TTA), so far developed mainly for visual recognition, to pre-trained acoustic foundation models facing open-world distribution shifts in test-time speech.
  • methods: A heuristic-free, learning-based adaptation enriched by confidence enhancement, combined with consistency regularization that exploits the short-term consistency of speech signals (a sketch of the objective follows the abstract).
  • results: Experiments on synthetic and real-world datasets confirm that the method outperforms existing baselines.
    Abstract Test-Time Adaptation (TTA) is a critical paradigm for tackling distribution shifts during inference, especially in visual recognition tasks. However, while acoustic models face similar challenges due to distribution shifts in test-time speech, TTA techniques specifically designed for acoustic modeling in the context of open-world data shifts remain scarce. This gap is further exacerbated when considering the unique characteristics of acoustic foundation models: 1) they are primarily built on transformer architectures with layer normalization and 2) they deal with test-time speech data of varying lengths in a non-stationary manner. These aspects make the direct application of vision-focused TTA methods, which are mostly reliant on batch normalization and assume independent samples, infeasible. In this paper, we delve into TTA for pre-trained acoustic models facing open-world data shifts. We find that noisy, high-entropy speech frames, often non-silent, carry key semantic content. Traditional TTA methods might inadvertently filter out this information using potentially flawed heuristics. In response, we introduce a heuristic-free, learning-based adaptation enriched by confidence enhancement. Noting that speech signals' short-term consistency, we also apply consistency regularization during test-time optimization. Our experiments on synthetic and real-world datasets affirm our method's superiority over existing baselines.
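A sketch of a test-time objective in the spirit described above, combining confidence-weighted entropy minimization with a consistency term between two views of the same frames; the weighting scheme and augmentation are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def tta_losses(logits, logits_aug, conf_threshold=0.0):
    """Confidence-weighted entropy minimization + consistency regularization."""
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)    # per frame
    confidence = probs.max(dim=-1).values
    weight = (confidence > conf_threshold).float() * confidence     # trust confident frames more
    entropy_loss = (weight * entropy).sum() / weight.sum().clamp_min(1e-8)

    # Short-term consistency: predictions on a perturbed view should agree.
    consistency = F.kl_div(logits_aug.log_softmax(dim=-1), probs, reduction="batchmean")
    return entropy_loss + consistency

# Usage: loss = tta_losses(model(x), model(augment(x))); loss.backward(); ...
```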

ARM: Refining Multivariate Forecasting with Adaptive Temporal-Contextual Learning

  • paper_url: http://arxiv.org/abs/2310.09488
  • repo_url: None
  • paper_authors: Jiecheng Lu, Xu Han, Shihao Yang
  • for: 这篇论文是为了解决长期时间序列预测(LTSF)中的复杂时间contextual relationships问题,以提高LTSF的准确性和效率。
  • methods: 本论文提出了ARM方法,它是一种多元时间contextual adaptive learning方法,具有Adaptive Univariate Effect Learning(AUEL)、Random Dropping(RD)训练策略和Multi-kernel Local Smoothing(MKLS)等特点,能够更好地捕捉个别时间序列的特征和正确地学习时间序列之间的相互关联。
  • results: 本论文透过多个 benchmark 评估,表明ARM方法可以实现与vanilla Transformer相比的高度改进,无需增加计算成本。此外,ARM方法还可以应用于其他LTSF架构中,进一步提高LTSF的准确性和效率。
    Abstract Long-term time series forecasting (LTSF) is important for various domains but is confronted by challenges in handling the complex temporal-contextual relationships. As multivariate input models underperforming some recent univariate counterparts, we posit that the issue lies in the inefficiency of existing multivariate LTSF Transformers to model series-wise relationships: the characteristic differences between series are often captured incorrectly. To address this, we introduce ARM: a multivariate temporal-contextual adaptive learning method, which is an enhanced architecture specifically designed for multivariate LTSF modelling. ARM employs Adaptive Univariate Effect Learning (AUEL), Random Dropping (RD) training strategy, and Multi-kernel Local Smoothing (MKLS), to better handle individual series temporal patterns and correctly learn inter-series dependencies. ARM demonstrates superior performance on multiple benchmarks without significantly increasing computational costs compared to vanilla Transformer, thereby advancing the state-of-the-art in LTSF. ARM is also generally applicable to other LTSF architecture beyond vanilla Transformer.

Applying Bayesian Ridge Regression AI Modeling in Virus Severity Prediction

  • paper_url: http://arxiv.org/abs/2310.09485
  • repo_url: None
  • paper_authors: Jai Pal, Bryan Hong
  • for: Review the strengths and weaknesses of Bayesian Ridge Regression as an AI model for virus analysis in healthcare.
  • methods: Bayesian Ridge Regression is applied to large amounts of data to support more accurate and faster diagnoses (a minimal fitting example follows the abstract).
  • results: The model shows promising accuracy with room for improvement, primarily in data organization; the severity index serves as a valuable tool for gaining a broad overview of patient care needs, matching healthcare professionals' preference for broader categorizations.
    Abstract Artificial intelligence (AI) is a powerful tool for reshaping healthcare systems. In healthcare, AI is invaluable for its capacity to manage vast amounts of data, which can lead to more accurate and speedy diagnoses, ultimately easing the workload on healthcare professionals. As a result, AI has proven itself to be a power tool across various industries, simplifying complex tasks and pattern recognition that would otherwise be overwhelming for humans or traditional computer algorithms. In this paper, we review the strengths and weaknesses of Bayesian Ridge Regression, an AI model that can be used to bring cutting edge virus analysis to healthcare professionals around the world. The model's accuracy assessment revealed promising results, with room for improvement primarily related to data organization. In addition, the severity index serves as a valuable tool to gain a broad overview of patient care needs, aligning with healthcare professionals' preference for broader categorizations.
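A minimal fitting example with scikit-learn's BayesianRidge on synthetic data, standing in for the paper's (unspecified here) clinical features and severity target.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split

# Toy stand-in: feature matrix describing a case, continuous severity target.
rng = np.random.default_rng(0)
X = rng.random((500, 6))
y = X @ np.array([1.5, -2.0, 0.7, 0.0, 3.0, -1.0]) + 0.1 * rng.standard_normal(500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = BayesianRidge().fit(X_tr, y_tr)
pred_mean, pred_std = model.predict(X_te, return_std=True)   # predictive uncertainty
print("R^2 on held-out data:", model.score(X_te, y_te))
```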

Can CNNs Accurately Classify Human Emotions? A Deep-Learning Facial Expression Recognition Study

  • paper_url: http://arxiv.org/abs/2310.09473
  • repo_url: None
  • paper_authors: Ashley Jisue Hong, David DiStefano, Sejal Dua
  • for: Evaluate whether a CNN model can recognize and classify human facial expressions (positive, neutral, negative).
  • methods: The CNN is programmed in Python, trained on preprocessed data from the Chicago Face Database, and intentionally kept simple to further probe its capability (a comparably small CNN is sketched after the abstract).
  • results: The model reaches 75% accuracy over 10,000 images, well above the 33.3% chance level, supporting the feasibility of AI that accurately analyzes human emotions and the prospect of viable Emotional AIs.
    Abstract Emotional Artificial Intelligences are currently one of the most anticipated developments of AI. If successful, these AIs will be classified as one of the most complex, intelligent nonhuman entities as they will possess sentience, the primary factor that distinguishes living humans and mechanical machines. For AIs to be classified as "emotional," they should be able to empathize with others and classify their emotions because without such abilities they cannot normally interact with humans. This study investigates the CNN model's ability to recognize and classify human facial expressions (positive, neutral, negative). The CNN model made for this study is programmed in Python and trained with preprocessed data from the Chicago Face Database. The model is intentionally designed with less complexity to further investigate its ability. We hypothesized that the model will perform better than chance (33.3%) in classifying each emotion class of input data. The model accuracy was tested with novel images. Accuracy was summarized in a percentage report, comparative plot, and confusion matrix. Results of this study supported the hypothesis as the model had 75% accuracy over 10,000 images (data), highlighting the possibility of AIs that accurately analyze human emotions and the prospect of viable Emotional AIs.
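A comparably small CNN in PyTorch for three-way expression classification; the input size, channel counts, and training details are assumptions, since the study's exact architecture is not given in the abstract.

```python
import torch
import torch.nn as nn

# Deliberately small CNN: logits for positive / neutral / negative.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 64), nn.ReLU(),   # 64x64 input -> 16x16 after two pools
    nn.Linear(64, 3),
)

images = torch.rand(8, 1, 64, 64)             # batch of grayscale face crops
labels = torch.randint(0, 3, (8,))
loss = nn.CrossEntropyLoss()(model(images), labels)
```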

Randomized Benchmarking of Local Zeroth-Order Optimizers for Variational Quantum Systems

  • paper_url: http://arxiv.org/abs/2310.09468
  • repo_url: https://github.com/ltecot/rand_bench_opt_quantum
  • paper_authors: Lucas Tecot, Cho-Jui Hsieh
  • for: Compare the performance of classical optimizers across a series of partially-randomized tasks, sampling the space of quantum optimization problems more broadly than task-specific benchmarks.
  • methods: The study focuses on local zeroth-order optimizers because of their generally favorable performance and query efficiency on quantum systems.
  • results: Performance varies considerably across optimizers and tasks, and the experiments yield insights that can motivate future work on improving these optimizers for quantum systems.
    Abstract In the field of quantum information, classical optimizers play an important role. From experimentalists optimizing their physical devices to theorists exploring variational quantum algorithms, many aspects of quantum information require the use of a classical optimizer. For this reason, there are many papers that benchmark the effectiveness of different optimizers for specific quantum optimization tasks and choices of parameterized algorithms. However, for researchers exploring new algorithms or physical devices, the insights from these studies don't necessarily translate. To address this concern, we compare the performance of classical optimizers across a series of partially-randomized tasks to more broadly sample the space of quantum optimization problems. We focus on local zeroth-order optimizers due to their generally favorable performance and query-efficiency on quantum systems. We discuss insights from these experiments that can help motivate future works to improve these optimizers for use on quantum systems.