cs.AI - 2023-11-17

Flexible Model Interpretability through Natural Language Model Editing

  • paper_url: http://arxiv.org/abs/2311.10905
  • repo_url: None
  • paper_authors: Karel D’Oosterlinck, Thomas Demeester, Chris Develder, Christopher Potts
  • for: To improve model interpretability and the ability to edit model behavior.
  • methods: Edits model behavior with respect to human concepts of interest as a probe into internal representations.
  • results: Editing model behavior can make internal representations more interpretable by pointing to relevant representations and enabling their systematic manipulation.
    Abstract Model interpretability and model editing are crucial goals in the age of large language models. Interestingly, there exists a link between these two goals: if a method is able to systematically edit model behavior with regard to a human concept of interest, this editor method can help make internal representations more interpretable by pointing towards relevant representations and systematically manipulating them.

On Functional Activations in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2311.10898
  • repo_url: None
  • paper_authors: Andrew S. Nencka, L. Tugan Muftuler, Peter LaViolette, Kevin M. Koch
  • for: probing the workings of deep neural networks, specifically the Facebook Galactica-125M language model
  • methods: functional neuroimaging techniques were applied to the model, using block-designed task-based prompt sequences to probe its functional structure
  • results: distinct, overlapping networks were identified for each task, with the most overlap between medical imaging and pathology networks, and the identified functional networks were found to be repeatable across repeated performance of related tasks and accurate in identifying presented tasks
    Abstract Background: Deep neural networks have proven to be powerful computational tools for modeling, prediction, and generation. However, the workings of these models have generally been opaque. Recent work has shown that the performance of some models is modulated by overlapping functional networks of connections within the models. Here the techniques of functional neuroimaging are applied to an exemplary large language model to probe its functional structure. Methods: A series of block-designed task-based prompt sequences were generated to probe the Facebook Galactica-125M model. Tasks included prompts relating to political science, medical imaging, paleontology, archeology, pathology, and random strings presented in an off/on/off pattern with prompts about other random topics. For the generation of each output token, all layer output values were saved to create an effective time series. General linear models were fit to the data to identify layer output values which were active with the tasks. Results: Distinct, overlapping networks were identified with each task. Most overlap was observed between medical imaging and pathology networks. These networks were repeatable across repeated performance of related tasks, and correspondence of identified functional networks and activation in tasks not used to define the functional networks was shown to accurately identify the presented task. Conclusion: The techniques of functional neuroimaging can be applied to deep neural networks as a means to probe their workings. Identified functional networks hold the potential for use in model alignment, modulation of model output, and identifying weights to target in fine-tuning.
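
The block-design GLM analysis described here maps directly onto standard regression machinery. A minimal NumPy sketch, using a toy activation time series and a hypothetical off/on/off task regressor (variable names and sizes are ours, not the paper's):

```python
import numpy as np

# Hypothetical setup: one value per generated token for each of n_units layer
# outputs, recorded while task prompts are presented in an off/on/off block design.
n_tokens, n_units = 300, 64
rng = np.random.default_rng(0)
activations = rng.normal(size=(n_tokens, n_units))  # stand-in for saved layer outputs

# Boxcar task regressor: 1 while task prompts are shown, 0 otherwise.
task = np.zeros(n_tokens)
task[100:200] = 1.0
activations[:, :8] += 0.5 * task[:, None]  # inject a "task network" for illustration

# General linear model per unit: activation = beta0 + beta1 * task + noise.
X = np.column_stack([np.ones(n_tokens), task])
betas, *_ = np.linalg.lstsq(X, activations, rcond=None)

# t-like score for the task regressor; threshold to define the functional network.
residuals = activations - X @ betas
sxx = (task - task.mean()) @ (task - task.mean())
t_scores = betas[1] / (residuals.std(axis=0) / np.sqrt(sxx))
print("units in task network:", np.where(t_scores > 3.0)[0])
```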

The Hidden Linear Structure in Score-Based Models and its Application

  • paper_url: http://arxiv.org/abs/2311.10892
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Binxu Wang, John J. Vastola
  • for: To determine whether a universal structure exists in the score fields learned by score-based models, with a view to better model design and data pre-processing.
  • methods: A normative analysis of the score function, supported by empirical validation on pre-trained image diffusion models and by theoretical analysis.
  • results: At high noise scales, the learned score of a well-trained diffusion model is well approximated by the linear score of a Gaussian; this allows the initial diffusion trajectory to be predicted analytically and accelerates image sampling without sacrificing sample quality.
    Abstract Score-based models have achieved remarkable results in the generative modeling of many domains. By learning the gradient of the smoothed data distribution, they can iteratively generate samples from complex distributions, e.g., natural images. However, is there any universal structure in the gradient field that will eventually be learned by any neural network? Here, we aim to find such structures through a normative analysis of the score function. First, we derived the closed-form solution to the score-based model with a Gaussian score. We claimed that for well-trained diffusion models, the learned score at a high noise scale is well approximated by the linear score of a Gaussian. We demonstrated this through empirical validation of pre-trained image diffusion models and theoretical analysis of the score function. This finding enabled us to precisely predict the initial diffusion trajectory using the analytical solution and to accelerate image sampling by 15-30% by skipping the initial phase without sacrificing image quality. Our finding of the linear structure in the score-based model has implications for better model design and data pre-processing.
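
The "linear score of Gaussian" claim has a simple closed form: for x ~ N(mu, Sigma), the score is grad_x log p(x) = -Sigma^{-1}(x - mu), i.e., linear in x. A sketch of using that analytical score for the initial high-noise reverse-diffusion steps, with illustrative names and step sizes of our own:

```python
import numpy as np

def gaussian_score(x, mu, cov_inv):
    """Score of a Gaussian N(mu, Sigma): grad_x log p(x) = -Sigma^{-1} (x - mu)."""
    return -(x - mu) @ cov_inv.T

# Fit mu/Sigma once from training data (random stand-in "images" here).
data = np.random.default_rng(0).normal(size=(1000, 16))
mu = data.mean(axis=0)
cov = np.cov(data, rowvar=False)

def analytic_score_at_noise(x_t, sigma_t):
    # At noise scale sigma_t the smoothed distribution is N(mu, Sigma + sigma_t^2 I),
    # so the score stays linear in x_t.
    cov_t_inv = np.linalg.inv(cov + sigma_t**2 * np.eye(cov.shape[0]))
    return gaussian_score(x_t, mu, cov_t_inv)

# Skip-ahead idea behind the paper's 15-30% speedup: integrate the early
# high-noise steps with the analytical linear score, then hand off to the network.
x = np.random.default_rng(1).normal(size=(4, 16)) * 10.0   # start at high noise
for sigma_t in [10.0, 8.0, 6.0]:                           # early phase, Euler steps
    x = x + (sigma_t**2 * 0.05) * analytic_score_at_noise(x, sigma_t)
# ...continue from here with the trained diffusion model's learned score.
```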

Verified Compositional Neuro-Symbolic Control for Stochastic Systems with Temporal Logic Tasks

  • paper_url: http://arxiv.org/abs/2311.10863
  • repo_url: None
  • paper_authors: Jun Wang, Kaiyuan Tan, Zihe Sun, Yiannis Kantaros
  • for: To learn neural network (NN) controllers for autonomous agents with unknown and stochastic dynamics, tasked with complex missions captured by Linear Temporal Logic (LTL).
  • methods: A new approach that integrates automata theory and data-driven reachability analysis tools for NN-controlled stochastic systems; the resulting neuro-symbolic controller lets the agent generate safe behaviors for unseen complex temporal logic tasks in a zero-shot fashion by leveraging its base skills.
  • results: Correctness of the proposed method is shown, along with conditions under which it is complete; the method is demonstrated through extensive numerical simulations and hardware experiments on robot navigation tasks.
    Abstract Several methods have been proposed recently to learn neural network (NN) controllers for autonomous agents, with unknown and stochastic dynamics, tasked with complex missions captured by Linear Temporal Logic (LTL). Due to the sample-inefficiency of the majority of these works, compositional learning methods have been proposed decomposing the LTL specification into smaller sub-tasks. Then, separate controllers are learned and composed to satisfy the original task. A key challenge within these approaches is that they often lack safety guarantees or the provided guarantees are impractical. This paper aims to address this challenge. Particularly, we consider autonomous systems with unknown and stochastic dynamics and LTL-encoded tasks. We assume that the system is equipped with a finite set of base skills modeled by trained NN feedback controllers. Our goal is to check if there exists a temporal composition of the trained NN controllers - and if so, to compute it - that will yield a composite system behavior that satisfies the assigned LTL task with probability one. We propose a new approach that relies on a novel integration of automata theory and data-driven reachability analysis tools for NN-controlled stochastic systems. The resulting neuro-symbolic controller allows the agent to generate safe behaviors for unseen complex temporal logic tasks in a zero-shot fashion by leveraging its base skills. We show correctness of the proposed method and we provide conditions under which it is complete. To the best of our knowledge, this is the first work that designs verified temporal compositions of NN controllers for unknown and stochastic systems. Finally, we provide extensive numerical simulations and hardware experiments on robot navigation tasks to demonstrate the proposed method.
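
At its core, checking whether a temporal composition of base skills exists reduces to a search over the product of the task automaton and skill outcomes. A toy sketch under strong simplifying assumptions (deterministic skill outcomes and a hand-coded DFA; the paper works with stochastic dynamics and data-driven reachable sets, not this abstraction):

```python
from collections import deque

# Toy DFA for an LTL-like task "first reach A, then reach B" over region labels.
dfa_transitions = {("q0", "A"): "q1", ("q1", "B"): "q_acc"}
accepting = {"q_acc"}

# Base skills: each trained NN controller is abstracted by the region label it
# can drive the system to from a given region (a stand-in for reachability analysis).
skills = {  # (current_region, skill_name) -> resulting_region
    ("home", "goto_A"): "A",
    ("A", "goto_B"): "B",
    ("home", "goto_B"): "B",
}

def compose(start_region, start_q="q0"):
    """BFS over the (region, DFA state) product for an accepting skill sequence."""
    queue = deque([(start_region, start_q, [])])
    seen = {(start_region, start_q)}
    while queue:
        region, q, plan = queue.popleft()
        if q in accepting:
            return plan
        for (src, skill), dst in skills.items():
            if src != region:
                continue
            q_next = dfa_transitions.get((q, dst), q)  # advance DFA on the new label
            if (dst, q_next) not in seen:
                seen.add((dst, q_next))
                queue.append((dst, q_next, plan + [skill]))
    return None  # no composition satisfies the task

print(compose("home"))  # -> ['goto_A', 'goto_B']
```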

Formal concept analysis for evaluating intrinsic dimension of a natural language

  • paper_url: http://arxiv.org/abs/2311.10862
  • repo_url: None
  • paper_authors: Sergei O. Kuznetsov, Vasilii A. Gromov, Nikita S. Borodin, Andrei M. Divavin
  • for: To report computational experiments determining the intrinsic dimension of the Bengali and Russian languages.
  • methods: Formal concept analysis algorithms, applied separately to sets of words and sets of bigrams in each language.
  • results: The intrinsic dimensions of both languages are significantly lower than the dimensions used in popular artificial neural network models for natural language processing.
    Abstract Some results of a computational experiment for determining the intrinsic dimension of linguistic varieties for the Bengali and Russian languages are presented. At the same time, both sets of words and sets of bigrams in these languages were considered separately. The method used to solve this problem was based on formal concept analysis algorithms. It was found that the intrinsic dimensions of these languages are significantly less than the dimensions used in popular neural network models in natural language processing.

Exploring the Consistency, Quality and Challenges in Manual and Automated Coding of Free-text Diagnoses from Hospital Outpatient Letters

  • paper_url: http://arxiv.org/abs/2311.10856
  • repo_url: None
  • paper_authors: Warren Del-Pinto, George Demetriou, Meghna Jani, Rikesh Patel, Leanne Gray, Alex Bulcock, Niels Peek, Andrew S. Kanter, William G Dixon, Goran Nenadic
  • for: To evaluate the quality and consistency of manual and automated clinical coding of diagnoses from hospital outpatient letters.
  • methods: Using 100 randomly selected letters, two human clinicians coded diagnosis lists to SNOMED CT, and automated coding was performed with IMO's Concept Tagger; a gold standard was constructed by a panel of clinicians from a subset of the annotated diagnoses.
  • results: Humans slightly out-performed automated coding, and both performed notably better when the free-text description contained only a single diagnosis; the panel of clinicians considered automated coding acceptable in approximately 90% of cases.
    Abstract Coding of unstructured clinical free-text to produce interoperable structured data is essential to improve direct care, support clinical communication and to enable clinical research. However, manual clinical coding is difficult and time consuming, which motivates the development and use of natural language processing for automated coding. This work evaluates the quality and consistency of both manual and automated clinical coding of diagnoses from hospital outpatient letters. Using 100 randomly selected letters, two human clinicians performed coding of diagnosis lists to SNOMED CT. Automated coding was also performed using IMO's Concept Tagger. A gold standard was constructed by a panel of clinicians from a subset of the annotated diagnoses. This was used to evaluate the quality and consistency of both manual and automated coding via (1) a distance-based metric, treating SNOMED CT as a graph, and (2) a qualitative metric agreed upon by the panel of clinicians. Correlation between the two metrics was also evaluated. Comparing human and computer-generated codes to the gold standard, the results indicate that humans slightly out-performed automated coding, while both performed notably better when there was only a single diagnosis contained in the free-text description. Automated coding was considered acceptable by the panel of clinicians in approximately 90% of cases.
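
The distance-based metric, treating SNOMED CT as a graph and scoring how far a predicted code lands from the gold-standard code, can be sketched with shortest-path distance over the is-a hierarchy. A toy illustration with networkx and made-up concept names (the real ontology has hundreds of thousands of nodes):

```python
import networkx as nx

# Tiny stand-in for the SNOMED CT is-a hierarchy (edges link parent and child).
snomed = nx.Graph()
snomed.add_edges_from([
    ("disorder", "heart_disease"),
    ("heart_disease", "myocardial_infarction"),
    ("heart_disease", "heart_failure"),
    ("disorder", "lung_disease"),
])

def code_distance(predicted, gold):
    """Hops between two concepts in the hierarchy; 0 means an exact match."""
    try:
        return nx.shortest_path_length(snomed, predicted, gold)
    except nx.NetworkXNoPath:
        return float("inf")

# A clinician codes "heart_failure"; the automated coder picks the parent concept.
print(code_distance("heart_disease", "heart_failure"))          # 1: near miss
print(code_distance("lung_disease", "myocardial_infarction"))   # 3: far apart
```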

Artificial Intelligence in Fetal Resting-State Functional MRI Brain Segmentation: A Comparative Analysis of 3D UNet, VNet, and HighRes-Net Models

  • paper_url: http://arxiv.org/abs/2311.10844
  • repo_url: None
  • paper_authors: Farzan Vahedifard, Xuchu Liu, Mehmet Kocak, H. Asher Ai, Mark Supanich, Christopher Sica, Kranthi K Marathu, Seth Adler, Maysam Orouskhani, Sharon Byrd
  • for: To improve the accuracy of brain segmentation in fetal resting-state functional MRI (fMRI).
  • methods: Artificial intelligence (AI) models for automated brain segmentation, trained on an open-source fetal fMRI dataset; three architectures (3D UNet, VNet, HighRes-Net) were tuned with Optuna under 5-fold cross-validation.
  • results: The VNet model showed promise for this application, but further tuning and investigation are needed to fully explore the potential and limitations of each model.
    Abstract Introduction: Fetal resting-state functional magnetic resonance imaging (rs-fMRI) is a rapidly evolving field that provides valuable insight into brain development before birth. Accurate segmentation of the fetal brain from the surrounding tissue in nonstationary 3D brain volumes poses a significant challenge in this domain. Currently available tools achieve an accuracy of 0.15. Aim: This study introduced a novel application of artificial intelligence (AI) for automated brain segmentation in fetal brain functional MRI (fMRI). Open datasets were employed to train AI models, assess their performance, and analyze their capabilities and limitations in addressing the specific challenges associated with fetal brain fMRI segmentation. Method: We utilized an open-source fetal functional MRI (fMRI) dataset consisting of 160 cases (reference: fetal-fMRI - OpenNeuro). An AI model for fMRI segmentation was developed using a 5-fold cross-validation methodology. Three AI models were employed: 3D UNet, VNet, and HighRes-Net. Optuna, an automated hyperparameter-tuning tool, was used to optimize these models. Results and Discussion: The Dice scores of the three AI models (VNet, UNet, and HighRes-Net) were compared, including a comparison between manually tuned and automatically tuned models using Optuna. Our findings shed light on the performance of different AI models for fetal resting-state fMRI brain segmentation. Although the VNet model showed promise in this application, further investigation is required to fully explore the potential and limitations of each model, including the HighRes-Net model. This study serves as a foundation for further extensive research into the applications of AI in fetal brain fMRI segmentation.
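
The automated tuning step follows Optuna's standard study/objective loop. A minimal, self-contained stand-in where the "Dice score" is a synthetic function of the hyperparameters; in the paper the objective would train one of the 3D UNet/VNet/HighRes-Net models under 5-fold cross-validation, and the parameter ranges here are illustrative:

```python
import optuna

def objective(trial):
    # Hyperparameters typical for segmentation training; ranges are our guesses.
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    channels = trial.suggest_categorical("base_channels", [8, 16, 32])

    # Stand-in for "train the model, return mean validation Dice over 5 folds".
    dice = 0.9 - abs(lr - 1e-3) * 50 - (dropout - 0.1) ** 2 + channels * 1e-4
    return dice

study = optuna.create_study(direction="maximize")   # maximize the Dice score
study.optimize(objective, n_trials=30)
print("best Dice:", study.best_value, "with", study.best_params)
```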

Integration and Implementation Strategies for AI Algorithm Deployment with Smart Routing Rules and Workflow Management

  • paper_url: http://arxiv.org/abs/2311.10840
  • repo_url: None
  • paper_authors: Barbaros Selnur Erdal, Vikash Gupta, Mutlu Demirer, Kim H. Fair, Richard D. White, Jeff Blair, Barbara Deichert, Laurie Lafleur, Ming Melvin Qin, David Bericat, Brad Genereaux
  • for: To review the challenges hindering widespread adoption of artificial intelligence (AI) solutions in healthcare, focusing on computer vision applications for medical imaging, and how interoperability and enterprise-grade scalability can address them.
  • methods: Analyzes the complexity of healthcare workflows, the intricacies of managing large and secure medical imaging data, and the absence of standardized frameworks for AI development as the principal barriers to adoption.
  • results: Interoperability standards such as DICOM, HL7, and IHE underpin common imaging workflows, with DICOM gateways (where Laurel Bridge leads transformational efforts) playing a key role; Project MONAI, established in 2019, aims to redefine the development of medical AI applications, and its MONAI Deploy App SDK simplifies packaging and deployment, enabling repeatable, scalable, and standardized deployment patterns.
    Abstract This paper reviews the challenges hindering the widespread adoption of artificial intelligence (AI) solutions in the healthcare industry, focusing on computer vision applications for medical imaging, and how interoperability and enterprise-grade scalability can be used to address these challenges. The complex nature of healthcare workflows, intricacies in managing large and secure medical imaging data, and the absence of standardized frameworks for AI development pose significant barriers and require a new paradigm to address them. The role of interoperability is examined in this paper as a crucial factor in connecting disparate applications within healthcare workflows. Standards such as DICOM, Health Level 7 (HL7), and Integrating the Healthcare Enterprise (IHE) are highlighted as foundational for common imaging workflows. A specific focus is placed on the role of DICOM gateways, with Laurel Bridge leading transformational efforts in this area. To drive enterprise scalability, new tools are needed. Project MONAI, established in 2019, is introduced as an initiative aiming to redefine the development of medical AI applications. The MONAI Deploy App SDK, a component of Project MONAI, is identified as a key tool in simplifying the packaging and deployment process, enabling repeatable, scalable, and standardized deployment patterns for AI applications. The abstract underscores the potential impact of successful AI adoption in healthcare, offering physicians both life-saving and time-saving insights and driving efficiencies in radiology department workflows. The collaborative efforts between academia and industry, exemplified by collaborations with organizations like NVIDIA and Laurel Bridge, are emphasized as essential for advancing the adoption of healthcare AI solutions.
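
A "smart routing rule" of the kind discussed here is, at its simplest, a predicate over DICOM header fields that decides which AI pipeline a study is sent to. A hedged sketch using pydicom; the rule fields and pipeline names are invented for illustration, and real deployments would sit behind a DICOM gateway such as those Laurel Bridge provides:

```python
import pydicom

# Routing table: (Modality, substring of StudyDescription) -> AI pipeline name.
ROUTING_RULES = [
    ("CT", "CHEST", "lung-nodule-detector"),
    ("MR", "BRAIN", "brain-segmentation"),
]

def route_study(dicom_path):
    """Read a DICOM file's headers and pick the first matching AI pipeline."""
    ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)  # headers only
    modality = getattr(ds, "Modality", "")
    description = getattr(ds, "StudyDescription", "").upper()
    for rule_modality, keyword, pipeline in ROUTING_RULES:
        if modality == rule_modality and keyword in description:
            return pipeline
    return None  # no rule matched; fall through to the default workflow

# Usage: pipeline = route_study("/data/incoming/study0001.dcm")
```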

Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations

  • paper_url: http://arxiv.org/abs/2311.10832
  • repo_url: None
  • paper_authors: Elaheh Jafarigol, Theodore Trafalis, Talayeh Razzaghi, Mona Zamankhani
  • for: A systematic literature review intended to give researchers and practitioners a comprehensive overview of federated learning from the machine learning point of view.
  • methods: Reviews the literature of the last few years following the PRISMA guidelines, covering supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning within the federated learning framework, along with an overview of federated learning applications.
  • results: Surveys the components of federated learning and its applications, and closes with a discussion of open problems and future research directions.
    Abstract In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach enables us to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning in the last few years based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have presented an extensive review of supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used in the framework of federated learning, in addition to an overview of federated learning applications. This paper reviews the literature on the components of federated learning and its applications in the last few years. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of some open problems and future research directions in federated learning is also provided.
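
For readers new to the area, the core of most systems surveyed here is federated averaging (FedAvg): clients train locally on private data and only model weights leave the device. A minimal NumPy sketch of the communication rounds for a linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])

# Each client holds private data that never leaves the device.
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    clients.append((X, y))

def local_update(w, X, y, lr=0.1, epochs=5):
    for _ in range(epochs):                       # plain gradient descent locally
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w_global = np.zeros(2)
for round_ in range(10):                           # one FedAvg round per iteration
    local_ws = [local_update(w_global.copy(), X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients])
    # Server aggregates: data-size-weighted average of the client weights.
    w_global = np.average(local_ws, axis=0, weights=sizes)

print(w_global)  # approaches true_w without any client sharing raw data
```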

A Language Agent for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2311.10813
  • repo_url: https://github.com/usc-gvl/agent-driver
  • paper_authors: Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang
  • for: To integrate human-like intelligence into autonomous driving systems by leveraging Large Language Models (LLMs) as a cognitive agent.
  • methods: The proposed approach, Agent-Driver, includes a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge, and a reasoning engine for chain-of-thought reasoning, task planning, motion planning, and self-reflection.
  • results: Significantly outperforms state-of-the-art driving methods on the large-scale nuScenes benchmark, with superior interpretability and few-shot learning ability.
    Abstract Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver significantly outperforms the state-of-the-art driving methods by a large margin. Our approach also demonstrates superior interpretability and few-shot learning ability to these methods. Project page: https://github.com/USC-GVL/Agent-Driver/blob/main/index.html
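
The "versatile tool library accessible via function calls" pattern is easy to sketch independently of any specific LLM: the model emits a tool name and arguments, and a dispatcher executes the call and feeds the result back. A minimal, LLM-agnostic sketch; the tools and the stubbed planner below are ours, not Agent-Driver's actual library:

```python
import json

# Tool library: each driving-related capability is a plain function the LLM can call.
def detect_objects(scene_id: str) -> list:
    return [{"type": "pedestrian", "distance_m": 12.0}]   # stub perception output

def plan_motion(goal: str, obstacles: list) -> str:
    return f"slow to 10 km/h, yield, then proceed to {goal}"

TOOLS = {"detect_objects": detect_objects, "plan_motion": plan_motion}

def llm(prompt: str) -> str:
    """Stand-in for the LLM: returns a JSON tool call (real systems query a model)."""
    if "pedestrian" not in prompt:
        return json.dumps({"tool": "detect_objects", "args": {"scene_id": "s1"}})
    return json.dumps({"tool": "plan_motion",
                       "args": {"goal": "intersection", "obstacles": []}})

def agent_step(task: str, max_calls: int = 4) -> str:
    context = task
    for _ in range(max_calls):                  # chain-of-thought as a tool loop
        call = json.loads(llm(context))
        result = TOOLS[call["tool"]](**call["args"])
        context += f"\n{call['tool']} -> {result}"
        if call["tool"] == "plan_motion":       # a planning result ends the loop
            return result
    return "no plan found"

print(agent_step("drive through the intersection safely"))
```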

Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning

  • paper_url: http://arxiv.org/abs/2311.10709
  • repo_url: None
  • paper_authors: Rohit Girdhar, Mannat Singh, Andrew Brown, Quentin Duval, Samaneh Azadi, Sai Saketh Rambhatla, Akbar Shah, Xi Yin, Devi Parikh, Ishan Misra
  • for: To present Emu Video, a text-to-video generation model that factorizes generation into two steps: first generating an image conditioned on the text, then generating a video conditioned on the text and the generated image.
  • methods: Adjusted noise schedules for diffusion and multi-stage training, enabling direct generation of high-quality, high-resolution videos without a deep cascade of models.
  • results: In human evaluations, the generated videos are strongly preferred in quality over all prior work (81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, 96% vs. Meta's Make-A-Video) and outperform commercial solutions such as RunwayML's Gen2 and Pika Labs; the factorized approach also lends itself to animating images from a user's text prompt, with generations preferred 96% over prior work.
    Abstract We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.
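
The paper credits "adjusted noise schedules" for enabling direct high-resolution generation. One widely used adjustment with the same motivation is rescaling the betas for zero terminal SNR, so the final diffusion step is pure noise; we show that technique as a representative example, not necessarily Emu Video's exact schedule:

```python
import numpy as np

def rescale_zero_terminal_snr(betas):
    """Rescale a beta schedule so sqrt(alpha_bar) hits exactly 0 at the last step."""
    alphas = 1.0 - betas
    alphas_bar = np.cumprod(alphas)
    ab_sqrt = np.sqrt(alphas_bar)

    a0, aT = ab_sqrt[0], ab_sqrt[-1]
    ab_sqrt = (ab_sqrt - aT) * a0 / (a0 - aT)   # shift the tail to 0, keep the head

    alphas_bar = ab_sqrt ** 2
    alphas = np.concatenate([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas

betas = np.linspace(1e-4, 0.02, 1000)           # a standard linear schedule
new_betas = rescale_zero_terminal_snr(betas)
print(np.sqrt(np.cumprod(1 - new_betas))[-1])   # ~0: terminal SNR is now zero
```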

Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders

  • paper_url: http://arxiv.org/abs/2311.10699
  • repo_url: https://github.com/artefactory-uk/autoencoder-paper
  • paper_authors: Marcel Marais, Mate Hartstein, George Cevora
  • for: To propose a novel weight initialisation method that improves the training of artificial neural networks.
  • methods: The Straddled Matrix Initialiser, motivated by the assumption that major, global-scale relationships in data are linear, with only smaller effects requiring complex non-linearities; combined with the ReLU activation function, it initialises a neural network as a de facto linear model.
  • results: In autoencoder experiments on three datasets against seven state-of-the-art initialisation techniques, the Straddled Matrix Initialiser clearly outperformed all other methods in every experiment.
    Abstract Good weight initialisation is an important step in successful training of Artificial Neural Networks. Over time a number of improvements have been proposed to this process. In this paper we introduce a novel weight initialisation technique called the Straddled Matrix Initialiser. This initialisation technique is motivated by our assumption that major, global-scale relationships in data are linear with only smaller effects requiring complex non-linearities. Combination of Straddled Matrix and ReLU activation function initialises a Neural Network as a de facto linear model, which we postulate should be a better starting point for optimisation given our assumptions. We test this by training autoencoders on three datasets using Straddled Matrix and seven other state-of-the-art weight initialisation techniques. In all our experiments the Straddled Matrix Initialiser clearly outperforms all other methods.
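
The exact Straddled Matrix construction is in the paper's repo; the underlying idea can be approximated by initialising each layer close to an identity-like linear map so that the ReLU network starts out as a de facto linear model. A hedged sketch of that idea (our rendering, not the authors' exact matrix):

```python
import numpy as np

def near_identity_init(n_out, n_in):
    """Tile the identity across a (n_out, n_in) weight matrix so that, before
    training, the layer approximately passes its input through linearly."""
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        W[i, i % n_in] = 1.0
    return W

# A tiny autoencoder 8 -> 4 -> 8 initialised as an (approximately) linear model.
W_enc, W_dec = near_identity_init(4, 8), near_identity_init(8, 4)

def forward(x):
    h = np.maximum(0, x @ W_enc.T)        # ReLU only clips negative pre-activations,
    return h @ W_dec.T                    # so the starting point is close to linear

x = np.abs(np.random.default_rng(0).normal(size=(2, 8)))
print(np.round(forward(x), 2))            # early outputs track the inputs' structure
```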

A novel post-hoc explanation comparison metric and applications

  • paper_url: http://arxiv.org/abs/2311.10811
  • repo_url: None
  • paper_authors: Shreyan Mitra, Leilani Gilpin
  • for: To quantify the consistency between explainable AI systems.
  • methods: Introduces the Shreyan Distance, a novel metric based on the weighted difference between ranked feature importance lists, and uses it to compare two explainers, SHAP and LIME, on both regression and classification learning tasks.
  • results: The average Shreyan Distance varies significantly between the two task types, indicating that consistency between explainers depends not only on the explainers' inherent properties but also on the type of learning task; the paper also contributes the XAISuite library, which integrates the Shreyan Distance algorithm into machine learning pipelines.
    Abstract Explanatory systems make the behavior of machine learning models more transparent, but are often inconsistent. To quantify the differences between explanatory systems, this paper presents the Shreyan Distance, a novel metric based on the weighted difference between ranked feature importance lists produced by such systems. This paper uses the Shreyan Distance to compare two explanatory systems, SHAP and LIME, for both regression and classification learning tasks. Because we find that the average Shreyan Distance varies significantly between these two tasks, we conclude that consistency between explainers not only depends on inherent properties of the explainers themselves, but also the type of learning task. This paper further contributes the XAISuite library, which integrates the Shreyan distance algorithm into machine learning pipelines.
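
The paper defines the Shreyan Distance as a weighted difference between ranked feature-importance lists; since the exact weighting lives in the paper, the sketch below uses a plausible stand-in (rank displacement discounted by position) purely to illustrate the shape of such a metric:

```python
def shreyan_distance_sketch(ranking_a, ranking_b):
    """Illustrative rank-displacement metric between two explainers' rankings.
    ranking_a / ranking_b: feature names ordered from most to least important."""
    pos_b = {feat: i for i, feat in enumerate(ranking_b)}
    total = 0.0
    for i, feat in enumerate(ranking_a):
        weight = 1.0 / (i + 1)                 # assumed: top ranks matter more
        total += weight * abs(i - pos_b.get(feat, len(ranking_b)))
    return total

shap_ranking = ["age", "income", "tenure", "region"]
lime_ranking = ["income", "age", "tenure", "region"]
print(shreyan_distance_sketch(shap_ranking, lime_ranking))        # small: consistent
print(shreyan_distance_sketch(shap_ranking, lime_ranking[::-1]))  # larger distance
```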

PEFT-MedAware: Large Language Model for Medical Awareness

  • paper_url: http://arxiv.org/abs/2311.10697
  • repo_url: None
  • paper_authors: Keivalya Pandya
  • for: To improve accuracy and efficiency on medical question-answering tasks, particularly in resource-constrained environments.
  • methods: Parameter-efficient fine-tuning (PEFT) of the Falcon-1b large language model on the specialized MedQuAD dataset of 16,407 medical QA pairs.
  • results: The fine-tuned model outperforms other LLMs on domain-specific medical question answering while updating only 0.44% of its trainable parameters, making it suitable for deployment with limited computational resources.
    Abstract Chat models are capable of answering a wide range of questions; however, the accuracy of their responses is highly uncertain. In this research, we propose a specialized PEFT-MedAware model where we utilize parameter-efficient fine-tuning (PEFT) to enhance the Falcon-1b large language model on specialized MedQuAD data consisting of 16,407 medical QA pairs, leveraging only 0.44% of its trainable parameters to enhance computational efficiency. The paper adopts data preprocessing and PEFT to optimize model performance, complemented by a BitsAndBytesConfig for efficient transformer training. The resulting model was capable of outperforming other LLMs in medical question-answering tasks in specific domains with greater accuracy utilizing limited computational resources making it suitable for deployment in resource-constrained environments. We propose further improvements through expanded datasets, larger models, and feedback mechanisms for sustained medical relevancy. Our work highlights the efficiency gains and specialized capabilities of PEFT in medical AI, outpacing standard models in precision without extensive resource demands. The proposed model and data are released for research purposes only.
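
The PEFT-plus-quantization recipe described here follows a now-standard Hugging Face pattern. A hedged sketch: the checkpoint name is our assumed Falcon-1b variant, the LoRA hyperparameters are illustrative, and we have not verified that they reproduce the paper's 0.44% trainable-parameter figure:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantized base model for memory-efficient training on modest hardware.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-rw-1b", quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-rw-1b")

# LoRA adapters: only a small fraction of parameters becomes trainable.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"],   # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()   # reports trainable vs. total parameter counts

# From here, fine-tune on MedQuAD QA pairs with the usual transformers Trainer.
```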

Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records

  • paper_url: http://arxiv.org/abs/2311.10810
  • repo_url: None
  • paper_authors: Yao-Shun Chuang, Xiaoqian Jiang, Chun-Teh Lee, Ryan Brandon, Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji
  • for: To explore the usability of prompt generation for named entity recognition (NER) tasks and how different prompt settings affect performance.
  • methods: GPT-J models generate prompts that are tested directly against the gold standard and used to produce seed data, which is then fed to a RoBERTa model via the spaCy package.
  • results: Prompts with a lower ratio of negative examples and a higher number of examples achieved the best direct-test result (F1 of 0.72); after training the RoBERTa model, performance was consistent at 0.92-0.97 F1 across all settings, indicating that seed quality matters more than quantity. The approach offers an efficient and accurate way to mine clinical notes for periodontal diagnoses.
    Abstract This study explored the usability of prompt generation on named entity recognition (NER) tasks and the performance in different settings of the prompt. The prompt generation by GPT-J models was utilized to directly test the gold standard as well as to generate the seed and further fed to the RoBERTa model with the spaCy package. In the direct test, a lower ratio of negative examples with higher numbers of examples in prompt achieved the best results with a F1 score of 0.72. The performance revealed consistency, 0.92-0.97 in the F1 score, in all settings after training with the RoBERTa model. The study highlighted the importance of seed quality rather than quantity in feeding NER models. This research reports on an efficient and accurate way to mine clinical notes for periodontal diagnoses, allowing researchers to easily and quickly build a NER model with the prompt generation approach.
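
The key variable in the direct test, the ratio of negative to positive examples packed into the prompt, is easy to make concrete. A small sketch of a few-shot prompt builder with a configurable negative ratio; the example sentences and the tagging format are invented for illustration:

```python
import random

positives = [  # sentences that do contain a periodontal diagnosis
    ("Generalized moderate chronic periodontitis noted.", "chronic periodontitis"),
    ("Patient presents with localized aggressive periodontitis.",
     "aggressive periodontitis"),
]
negatives = [  # sentences with no diagnosis entity
    "Patient reports brushing twice daily.",
    "Radiographs reviewed with the patient.",
]

def build_prompt(query, n_examples=4, neg_ratio=0.25, seed=0):
    """Few-shot NER prompt; neg_ratio controls the share of negative examples."""
    rng = random.Random(seed)
    n_neg = int(n_examples * neg_ratio)
    lines = ["Extract the periodontal diagnosis from the sentence."]
    for sent, label in rng.sample(positives, min(n_examples - n_neg, len(positives))):
        lines.append(f"Sentence: {sent}\nDiagnosis: {label}")
    for sent in rng.sample(negatives, min(n_neg, len(negatives))):
        lines.append(f"Sentence: {sent}\nDiagnosis: none")
    lines.append(f"Sentence: {query}\nDiagnosis:")
    return "\n\n".join(lines)

print(build_prompt("Severe chronic periodontitis of the lower molars."))
```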

Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression

  • paper_url: http://arxiv.org/abs/2311.10809
  • repo_url: None
  • paper_authors: Yao-Shun Chuang, Chun-Teh Lee, Ryan Brandon, Trung Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji, Xiaoqian Jiang
  • for: To mine clinical notes for periodontitis diagnoses using text processing and natural language processing (NLP) models.
  • methods: Two complexity levels of regular expression (RE) methods, simple and complex, were used to extract text and generate training data; a named entity recognition (NER) model was built with the spaCy package and RoBERTa transformer models and evaluated against manually labeled gold standards.
  • results: As RE complexity increased, the F1 score rose from 0.3-0.4 to around 0.9; the NER models showed excellent predictions, with the simple RE method scoring 0.84-0.92 and the advanced and combined RE methods scoring 0.95-0.99 on the evaluation metrics, demonstrating the benefit of combining NER methods and NLP models to turn free text into structured data and fill in missing diagnoses.
    Abstract This study aimed to utilize text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model on different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The SpaCy package and RoBERTa transformer models were used to build the NER model and evaluate its performance with the manual-labeled gold standards. The comparison of the RE methods with the gold standard showed that as the complexity increased in the RE algorithms, the F1 score increased from 0.3-0.4 to around 0.9. The NER models demonstrated excellent predictions, with the simple RE method showing 0.84-0.92 in the evaluation metrics, and the advanced and combined RE method demonstrating 0.95-0.99 in the evaluation. This study provided an example of the benefit of combining NER methods and NLP models in extracting target information from free-text to structured data and fulfilling the need for missing diagnoses from unstructured notes.
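
The two RE complexity levels can be illustrated with a bare keyword pattern versus a pattern that also captures severity and extent modifiers. A hedged sketch: the patterns are ours, while the paper's actual expressions are tuned to its note corpus:

```python
import re

note = ("Assessment: generalized severe chronic periodontitis; "
        "localized mild gingivitis also noted.")

# Simple RE: flag any mention of the diagnosis keyword.
simple = re.compile(r"periodontitis", re.IGNORECASE)

# Complex RE: also capture extent and severity qualifiers around the keyword.
complex_ = re.compile(
    r"(?P<extent>generalized|localized)?\s*"
    r"(?P<severity>mild|moderate|severe)?\s*"
    r"(?P<type>chronic|aggressive)?\s*periodontitis",
    re.IGNORECASE,
)

print(simple.findall(note))             # ['periodontitis']
m = complex_.search(note)
print(m.group("extent"), m.group("severity"), m.group("type"))
# -> generalized severe chronic
```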

Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections

  • paper_url: http://arxiv.org/abs/2311.10678
  • repo_url: https://github.com/Stanford-ILIAD/droc
  • paper_authors: Lihan Zha, Yuchen Cui, Li-Heng Lin, Minae Kwon, Montserrat Gonzalez Arenas, Andy Zeng, Fei Xia, Dorsa Sadigh
  • for: To help robots improve in novel settings by learning generalizable knowledge from online human corrective feedback.
  • methods: A large language model (LLM)-based system that responds to arbitrary forms of language feedback, distills generalizable knowledge from corrections, and retrieves relevant past experiences based on textual and visual similarity.
  • results: Compared with techniques that directly generate robot code via LLMs, DROC needs only half the total number of corrections in the first round and requires little to no corrections after two iterations; it also generalizes to new task and object instances. Videos, prompts, and code are provided on the project page.
    Abstract Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but also they would need to be able to respond to feedback that can be arbitrary corrections about high-level human preferences to low-level adjustments to skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity for improving performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections in a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs by using only half of the total number of corrections needed in the first round and requires little to no corrections after two iterations. We show further results, videos, prompts and code on https://sites.google.com/stanford.edu/droc .
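
The retrieval half of DROC, fetching relevant past corrections by textual similarity, can be sketched with any sentence-embedding model. A minimal version using sentence-transformers; the model name and stored corrections are illustrative, not DROC's actual knowledge base:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

# Knowledge base distilled from past corrections (contents invented for the sketch).
corrections = [
    "When grasping mugs, approach from the side to avoid the handle.",
    "Slow down within 10 cm of fragile objects.",
    "Prefer the left bin for recyclable items.",
]
kb_embeddings = model.encode(corrections, convert_to_tensor=True)

def retrieve(task_description, top_k=1):
    """Return the stored corrections most similar to the new task's description."""
    query = model.encode(task_description, convert_to_tensor=True)
    hits = util.semantic_search(query, kb_embeddings, top_k=top_k)[0]
    return [corrections[hit["corpus_id"]] for hit in hits]

print(retrieve("pick up the glass cup carefully"))
# -> likely the 'fragile objects' correction, reused zero-shot on a new object
```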

Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference

  • paper_url: http://arxiv.org/abs/2311.10671
  • repo_url: None
  • paper_authors: Marvin Schmitt, Stefan T. Radev, Paul-Christian Bürkner
  • for: To integrate heterogeneous data from different sources in neural simulation-based inference and thereby infer the parameters of complex mathematical models with increased accuracy.
  • methods: Multimodal Neural Posterior Estimation (MultiNPE), inspired by advances in attention-based deep fusion learning; early, late, and hybrid multimodal fusion approaches are formulated and evaluated in three challenging numerical experiments.
  • results: MultiNPE outperforms naive baselines on a benchmark model and achieves superior inference on representative scientific models from neuroscience and cardiology; a systematic study of partially missing data shows that late and hybrid fusion techniques are the methods of choice for practical applications.
    Abstract We present multimodal neural posterior estimation (MultiNPE), a method to integrate heterogeneous data from different sources in simulation-based inference with neural networks. Inspired by advances in attention-based deep fusion learning, it empowers researchers to analyze data from different domains and infer the parameters of complex mathematical models with increased accuracy. We formulate different multimodal fusion approaches for MultiNPE (early, late, and hybrid) and evaluate their performance in three challenging numerical experiments. MultiNPE not only outperforms naïve baselines on a benchmark model, but also achieves superior inference on representative scientific models from neuroscience and cardiology. In addition, we systematically investigate the impact of partially missing data on the different fusion strategies. Across our different experiments, late and hybrid fusion techniques emerge as the methods of choice for practical applications of multimodal simulation-based inference.
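
The early/late distinction in MultiNPE comes down to where the modalities meet. A schematic PyTorch sketch contrasting the two; layer sizes and the downstream posterior network are placeholders:

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate raw modalities first, then encode jointly."""
    def __init__(self, dim_a, dim_b, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_a + dim_b, hidden), nn.ReLU())

    def forward(self, x_a, x_b):
        return self.net(torch.cat([x_a, x_b], dim=-1))

class LateFusion(nn.Module):
    """Encode each modality separately, then combine the learned summaries."""
    def __init__(self, dim_a, dim_b, hidden=32):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, hidden)

    def forward(self, x_a, x_b):
        return self.head(torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1))

x_a, x_b = torch.randn(8, 10), torch.randn(8, 6)  # two simulated data sources
print(EarlyFusion(10, 6)(x_a, x_b).shape)  # both yield a fused summary that a
print(LateFusion(10, 6)(x_a, x_b).shape)   # neural posterior estimator conditions on
```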

Multi-delay arterial spin-labeled perfusion estimation with biophysics simulation and deep learning

  • paper_url: http://arxiv.org/abs/2311.10640
  • repo_url: https://github.com/hcmue/2311COMP106401
  • paper_authors: Renjiu Hu, Qihao Zhang, Pascal Spincemaille, Thanh D. Nguyen, Yi Wang
  • for: To estimate perfusion Q from arterial spin labeling (ASL) MRI images.
  • methods: A deep-learning 3D U-Net (QTMnet) trained and tested on simulated 4D tracer concentration data based on artificial vasculature generated by the constrained constructive optimization (CCO) method, and further tested on a synthetic brain ASL image built from a vasculature network extracted from MR angiography.
  • results: QTMnet accurately reconstructed perfusion Q from concentration data; the relative error on the synthetic brain ASL image was 7.04%, lower than the errors of the single-delay ASL model (25.15%) and the multi-delay ASL model (12.62%).
    Abstract Purpose: To develop biophysics-based method for estimating perfusion Q from arterial spin labeling (ASL) images using deep learning. Methods: A 3D U-Net (QTMnet) was trained to estimate perfusion from 4D tracer propagation images. The network was trained and tested on simulated 4D tracer concentration data based on artificial vasculature structure generated by constrained constructive optimization (CCO) method. The trained network was further tested in a synthetic brain ASL image based on vasculature network extracted from magnetic resonance (MR) angiography. The estimations from both trained network and a conventional kinetic model were compared in ASL images acquired from eight healthy volunteers. Results: QTMnet accurately reconstructed perfusion Q from concentration data. Relative error of the synthetic brain ASL image was 7.04% for perfusion Q, lower than the error using single-delay ASL model: 25.15% for Q, and multi-delay ASL model: 12.62% for perfusion Q. Conclusion: QTMnet provides accurate estimation on perfusion parameters and is a promising approach as a clinical ASL MRI image processing pipeline.

Concept-free Causal Disentanglement with Variational Graph Auto-Encoder

  • paper_url: http://arxiv.org/abs/2311.10638
  • repo_url: None
  • paper_authors: Jingyun Feng, Lin Zhang, Lili Yang
  • for: To achieve concept-free causal disentanglement, learning concept structures directly from data.
  • methods: An unsupervised approach built on a theoretically provable tight upper bound approximating the optimal factor, yielding SCM-like causal structure modeling; a novel causal disentanglement layer is incorporated into a Variational Graph Auto-Encoder (CCVGAE), and the framework is further used to enhance meta-learning (CC-Meta-Graph).
  • results: CCVGAE and CC-Meta-Graph achieve up to 29% and 11% absolute AUC improvements over baselines, respectively.
    Abstract In disentangled representation learning, the goal is to achieve a compact representation that consists of all interpretable generative factors in the observational data. Learning disentangled representations for graphs becomes increasingly important as graph data rapidly grows. Existing approaches often rely on Variational Auto-Encoder (VAE) or its causal structure learning-based refinement, which suffer from sub-optimality in VAEs due to the independence factor assumption and unavailability of concept labels, respectively. In this paper, we propose an unsupervised solution, dubbed concept-free causal disentanglement, built on a theoretically provable tight upper bound approximating the optimal factor. This results in an SCM-like causal structure modeling that directly learns concept structures from data. Based on this idea, we propose Concept-free Causal VGAE (CCVGAE) by incorporating a novel causal disentanglement layer into Variational Graph Auto-Encoder. Furthermore, we prove concept consistency under our concept-free causal disentanglement framework, hence employing it to enhance the meta-learning framework, called concept-free causal Meta-Graph (CC-Meta-Graph). We conduct extensive experiments to demonstrate the superiority of the proposed models: CCVGAE and CC-Meta-Graph, reaching up to $29\%$ and $11\%$ absolute improvements over baselines in terms of AUC, respectively.

A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

  • paper_url: http://arxiv.org/abs/2311.10614
  • repo_url: None
  • paper_authors: Ruohong Zhang, Luyu Gao, Chen Zheng, Zhen Fan, Guokun Lai, Zheng Zhang, Fangzhou Ai, Yiming Yang, Hongxia Yang
  • for: To enhance the domain-specific language generation capabilities of large language models (LLMs).
  • methods: A knowledge miner, LLMiner, autonomously extracts question-answer pairs from domain documents through chain-of-thought reasoning; the mined QA pairs are then blended with a conversational dataset to fine-tune the LLM as a chatbot.
  • results: The model markedly outperforms generally aligned LLMs and surpasses domain-adapted models fine-tuned directly on the domain corpus, requiring only 600 seed instances and minimal human intervention.
    Abstract Large Language Models (LLMs), despite their great power in language generation, often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains. This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries. Our two-step approach starts from training a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents through a chain-of-thought reasoning process. Subsequently, we blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, thereby enriching its domain-specific expertise and conversational capabilities. We also developed a new evaluation benchmark which comprises four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvement over generally aligned LLM and surpasses domain-adapted models directly fine-tuned on domain corpus. In particular, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, thereby providing a pathway towards self-improvement of LLMs through model-synthesized training data.

Active Inference on the Edge: A Design Study

  • paper_url: http://arxiv.org/abs/2311.10607
  • repo_url: None
  • paper_authors: Boris Sedlak, Victor Casamayor Pujol, Praveen Kumar Donta, Schahram Dustdar
  • for: To ensure Quality of Service (QoS) in distributed computing systems where edge devices carry out data processing and machine learning training.
  • methods: Combines machine learning (ML) with Active Inference (ACI), a concept from neuroscience describing how the brain constantly predicts and evaluates sensory information to decrease long-term surprise, implemented as a single action-perception cycle for distributed agents in a smart manufacturing use case.
  • results: The ACI agent was able to quickly and traceably solve an optimization problem while fulfilling QoS requirements.
    Abstract Machine Learning (ML) is a common tool to interpret and predict the behavior of distributed computing systems, e.g., to optimize the task distribution between devices. As more and more data is created by Internet of Things (IoT) devices, data processing and ML training are carried out by edge devices in close proximity. To ensure Quality of Service (QoS) throughout these operations, systems are supervised and dynamically adapted with the help of ML. However, as long as ML models are not retrained, they fail to capture gradual shifts in the variable distribution, leading to an inaccurate view of the system state. Moreover, as the prediction accuracy decreases, the reporting device should actively resolve uncertainties to improve the model's precision. Such a level of self-determination could be provided by Active Inference (ACI) -- a concept from neuroscience that describes how the brain constantly predicts and evaluates sensory information to decrease long-term surprise. We encompassed these concepts in a single action-perception cycle, which we implemented for distributed agents in a smart manufacturing use case. As a result, we showed how our ACI agent was able to quickly and traceably solve an optimization problem while fulfilling QoS requirements.

Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines

  • paper_url: http://arxiv.org/abs/2311.10599
  • repo_url: None
  • paper_authors: Rose Guingrich, Michael S. A. Graziano
  • for: This paper aims to investigate the impact of human-AI interaction on human-human interaction, specifically focusing on the use of chatbots as social companions.
  • methods: The study compares the social health benefits of using chatbots with not using them, and examines how people perceive the consciousness and humanlikeness of chatbots.
  • results: The study finds that companion bot users report beneficial social health effects, while nonusers view them as harmful. Additionally, perceiving chatbots as more conscious and humanlike is associated with more positive opinions and better social health benefits.
    Abstract As artificial intelligence (AI) becomes more widespread, one question that arises is how human-AI interaction might impact human-human interaction. Chatbots, for example, are increasingly used as social companions, but little is known about how their use impacts human relationships. A common hypothesis is that these companion bots are detrimental to social health by harming or replacing human interaction. To understand how companion bots impact social health, we studied people who used companion bots and people who did not. Contrary to expectations, companion bot users indicated that these relationships were beneficial to their social health, whereas nonusers viewed them as harmful. Another common assumption is that people perceive conscious, humanlike AI as disturbing and threatening. Among both users and nonusers, however, we found the opposite: perceiving companion bots as more conscious and humanlike correlated with more positive opinions and better social health benefits. Humanlike bots may aid social health by supplying reliable and safe interactions, without necessarily harming human relationships.

Designing Reconfigurable Intelligent Systems with Markov Blankets

  • paper_url: http://arxiv.org/abs/2311.10597
  • repo_url: None
  • paper_authors: Boris Sedlak, Victor Casamayor Pujol, Praveen Kumar Donta, Schahram Dustdar
  • for: To evaluate business requirements (Service Level Objectives, SLOs) across the many devices of Compute Continuum systems without central data collection.
  • methods: A causality filter based on Markov blankets (MB) limits the number of variables each device must track; SLOs are evaluated decentralized on a per-device basis, and optimal device configurations for fulfilling SLOs are inferred.
  • results: In an evaluation on video stream transformations, the approach produced device configurations that ensure Quality of Service (QoS), with devices perceiving their environment and acting accordingly -- a form of decentralized intelligence.
    Abstract Compute Continuum (CC) systems comprise a vast number of devices distributed over computational tiers. Evaluating business requirements, i.e., Service Level Objectives (SLOs), requires collecting data from all those devices; if SLOs are violated, devices must be reconfigured to ensure correct operation. If done centrally, this dramatically increases the number of devices and variables that must be considered, while creating an enormous communication overhead. To address this, we (1) introduce a causality filter based on Markov blankets (MB) that limits the number of variables that each device must track, (2) evaluate SLOs decentralized on a device basis, and (3) infer optimal device configuration for fulfilling SLOs. We evaluated our methodology by analyzing video stream transformations and providing device configurations that ensure the Quality of Service (QoS). The devices thus perceived their environment and acted accordingly -- a form of decentralized intelligence.
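
The Markov blanket of a variable in a causal DAG, i.e., its parents, children, and the children's other parents, is exactly the set a device needs to track to evaluate an SLO locally. A small sketch computing it with networkx over a toy telemetry graph (the variables are invented for illustration):

```python
import networkx as nx

# Toy causal DAG over device telemetry (edges point cause -> effect).
dag = nx.DiGraph([
    ("cpu_load", "fps"), ("resolution", "fps"),
    ("fps", "qos_violation"), ("network_delay", "qos_violation"),
    ("ambient_temp", "cpu_load"),
])

def markov_blanket(g, node):
    """Parents, children, and co-parents of `node` in the DAG `g`."""
    parents = set(g.predecessors(node))
    children = set(g.successors(node))
    co_parents = {p for c in children for p in g.predecessors(c)} - {node}
    return parents | children | co_parents

# A device monitoring the "fps" SLO only needs to track these variables:
print(markov_blanket(dag, "fps"))
# -> {'cpu_load', 'resolution', 'qos_violation', 'network_delay'}
```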

Hashing it Out: Predicting Unhealthy Conversations on Twitter

  • paper_url: http://arxiv.org/abs/2311.10596
  • repo_url: https://github.com/stevendleung/hashing-it-out
  • paper_authors: Steven Leung, Filippos Papapolyzos
  • for: To predict conversational derailment on Twitter, laying the foundation for a practical tool that encourages better interactions on social media.
  • methods: An attention-based BERT model, pre-trained on a large Twitter corpus and fine-tuned on the task, with synthetic oversampling techniques used to mitigate overfitting on a relatively small, novel dataset.
  • results: The model shows clear performance advantages over the baseline LSTM model despite the limited size of the fine-tuning dataset.
    Abstract Personal attacks in the context of social media conversations often lead to fast-paced derailment, leading to even more harmful exchanges being made. State-of-the-art systems for the detection of such conversational derailment often make use of deep learning approaches for prediction purposes. In this paper, we show that an Attention-based BERT architecture, pre-trained on a large Twitter corpus and fine-tuned on our task, is efficient and effective in making such predictions. This model shows clear advantages in performance to the existing LSTM model we use as a baseline. Additionally, we show that this impressive performance can be attained through fine-tuning on a relatively small, novel dataset, particularly after mitigating overfitting issues through synthetic oversampling techniques. By introducing the first transformer based model for forecasting conversational events on Twitter, this work lays the foundation for a practical tool to encourage better interactions on one of the most ubiquitous social media platforms.
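
As a rough illustration of the recipe above (fine-tune a pretrained BERT classifier after oversampling the minority, derailing class), here is a minimal sketch. The paper uses a Twitter-pretrained BERT and synthetic oversampling; plain duplication and bert-base-uncased stand in here as simplifications, and the toy texts are invented.

```python
import random
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

texts = ["you make a fair point", "thanks for the source", "nobody asked you, idiot"]
labels = [0, 0, 1]  # 1 = conversation likely to derail

# Naive oversampling: duplicate minority-class examples until balanced.
minority = [i for i, y in enumerate(labels) if y == 1]
while labels.count(1) < labels.count(0):
    i = random.choice(minority)
    texts.append(texts[i]); labels.append(1)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
opt = AdamW(model.parameters(), lr=2e-5)

batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=torch.tensor(labels))
out.loss.backward(); opt.step()  # one illustrative training step
```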

FOCAL: A Cost-Aware Video Dataset for Active Learning

  • paper_url: http://arxiv.org/abs/2311.10591
  • repo_url: https://github.com/olivesgatech/focal_dataset
  • paper_authors: Kiran Kokilepersaud, Yash-Yee Logan, Ryan Benkert, Chen Zhou, Mohit Prabhushankar, Ghassan AlRegib, Enrique Corona, Kunjan Singh, Mostafa Parchami
  • for: Studying the impact of annotation cost in video active learning.
  • methods: The FOCAL (Ford-OLIVES Collaboration on Active Learning) dataset with real annotation-cost labels, plus conformal active learning algorithms that exploit the sequential structure of video to balance annotation cost and performance while reducing floating point operations (FLOPS) overhead.
  • results: A better trade-off between annotation cost and performance, at least a 77.67% reduction in FLOPS overhead, and the best conformal method is 113 hours cheaper than the best traditional active learning method.
    Abstract In this paper, we introduce the FOCAL (Ford-OLIVES Collaboration on Active Learning) dataset which enables the study of the impact of annotation-cost within a video active learning setting. Annotation-cost refers to the time it takes an annotator to label and quality-assure a given video sequence. A practical motivation for active learning research is to minimize annotation-cost by selectively labeling informative samples that will maximize performance within a given budget constraint. However, previous work in video active learning lacks real-time annotation labels for accurately assessing cost minimization and instead operates under the assumption that annotation-cost scales linearly with the amount of data to annotate. This assumption does not take into account a variety of real-world confounding factors that contribute to a nonlinear cost such as the effect of an assistive labeling tool and the variety of interactions within a scene such as occluded objects, weather, and motion of objects. FOCAL addresses this discrepancy by providing real annotation-cost labels for 126 video sequences across 69 unique city scenes with a variety of weather, lighting, and seasonal conditions. We also introduce a set of conformal active learning algorithms that take advantage of the sequential structure of video data in order to achieve a better trade-off between annotation-cost and performance while also reducing floating point operations (FLOPS) overhead by at least 77.67%. We show how these approaches better reflect how annotations on videos are done in practice through a sequence selection framework. We further demonstrate the advantage of these approaches by introducing two performance-cost metrics and show that the best conformal active learning method is cheaper than the best traditional active learning method by 113 hours.
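
The cost-aware selection problem the dataset targets can be sketched with a simple greedy baseline: rank candidate sequences by informativeness per annotation-hour and fill a budget. This is a generic illustration, not one of the paper's conformal algorithms; the scores and costs below are hypothetical.

```python
def select_under_budget(candidates, budget_hours):
    """Greedy selection by informativeness per annotation-hour."""
    ranked = sorted(candidates, key=lambda c: c["score"] / c["cost"], reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c["cost"] <= budget_hours:
            chosen.append(c["id"]); spent += c["cost"]
    return chosen, spent

pool = [
    {"id": "seq_rainy_night", "score": 0.9, "cost": 3.5},  # cost in hours
    {"id": "seq_clear_day",   "score": 0.4, "cost": 0.8},
    {"id": "seq_occlusions",  "score": 0.8, "cost": 2.0},
]
print(select_under_budget(pool, budget_hours=3.0))  # chosen ids and hours spent
```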

EduGym: An Environment Suite for Reinforcement Learning Education

  • paper_url: http://arxiv.org/abs/2311.10590
  • repo_url: https://github.com/rlg-leiden/edugym
  • paper_authors: Thomas M. Moerland, Matthias Müller-Brockhausen, Zhao Yang, Andrius Bernatavicius, Koen Ponse, Tom Kouwenhoven, Andreas Sauter, Michiel van der Meer, Bram Renting, Aske Plaat
  • for: Providing educational reinforcement learning environments and interactive notebooks that help students translate between equations and code.
  • methods: Each EduGym environment is designed to illustrate one specific reinforcement learning challenge (e.g., exploration, partial observability, stochasticity), while its interactive notebook explains the challenge and possible solution approaches, connecting equations and code in a single document.
  • results: An evaluation among RL students and researchers shows 86% of them consider EduGym a useful tool for reinforcement learning education. All notebooks are available from https://sites.google.com/view/edu-gym/home, and the full software package can be installed from https://github.com/RLG-Leiden/edugym.
    Abstract Due to the empirical success of reinforcement learning, an increasing number of students study the subject. However, from our practical teaching experience, we see students entering the field (bachelor, master and early PhD) often struggle. On the one hand, textbooks and (online) lectures provide the fundamentals, but students find it hard to translate between equations and code. On the other hand, public codebases do provide practical examples, but the implemented algorithms tend to be complex, and the underlying test environments contain multiple reinforcement learning challenges at once. Although this is realistic from a research perspective, it often hinders educational conceptual understanding. To solve this issue we introduce EduGym, a set of educational reinforcement learning environments and associated interactive notebooks tailored for education. Each EduGym environment is specifically designed to illustrate a certain aspect/challenge of reinforcement learning (e.g., exploration, partial observability, stochasticity, etc.), while the associated interactive notebook explains the challenge and its possible solution approaches, connecting equations and code in a single document. An evaluation among RL students and researchers shows 86% of them think EduGym is a useful tool for reinforcement learning education. All notebooks are available from https://sites.google.com/view/edu-gym/home, while the full software package can be installed from https://github.com/RLG-Leiden/edugym.
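
For readers new to the area, the interaction loop such environment suites are built around looks like the following; a minimal sketch assuming the standard Gym-style step/reset interface, with gymnasium's CartPole standing in so the snippet runs without installing EduGym.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()  # replace with a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        obs, info = env.reset()
print("return collected by a random policy:", total_reward)
```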

SENetV2: Aggregated dense layer for channelwise and global representations

  • paper_url: http://arxiv.org/abs/2311.10807
  • repo_url: None
  • paper_authors: Mahendran Narayanan
  • for: Proposing a new image classification model designed to surpass existing architectures.
  • methods: A squeeze-and-excitation residual module augmented with an aggregated multilayer perceptron (a multi-branch dense layer) that captures channel-wise patterns together with global representations of the image data.
  • results: Experiments on benchmark datasets show a remarkable increase in classification accuracy over established architectures, with a negligible increase in parameters compared to SENet.
    Abstract Convolutional Neural Networks (CNNs) have revolutionized image classification by extracting spatial features and enabling state-of-the-art accuracy in vision-based tasks. The squeeze-and-excitation network module gathers channel-wise representations of the input, while multilayer perceptrons (MLPs) learn global representations from the data and are used in most image classification models to process extracted image features. In this paper, we introduce a novel aggregated multilayer perceptron, a multi-branch dense layer, within the squeeze-and-excitation residual module, designed to surpass the performance of existing architectures. Our approach combines the squeeze-and-excitation module with dense layers. This fusion enhances the network's ability to capture channel-wise patterns and global knowledge, leading to a better feature representation. The proposed model has a negligible increase in parameters when compared to SENet. We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures. Experimental results demonstrate a remarkable increase in the classification accuracy of the proposed model.
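
A minimal PyTorch sketch of the idea: a squeeze-and-excitation block whose excitation MLP is replaced by aggregated parallel dense branches. The branch count and reduction ratio are illustrative guesses, not the paper's configuration.

```python
import torch
import torch.nn as nn

class AggregatedSE(nn.Module):
    def __init__(self, channels, reduction=16, branches=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global channel context
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(channels, channels // reduction), nn.ReLU())
            for _ in range(branches)
        ])
        self.fc = nn.Linear(branches * (channels // reduction), channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        s = self.pool(x).view(b, c)
        z = torch.cat([br(s) for br in self.branches], dim=1)  # aggregate branches
        w = torch.sigmoid(self.fc(z)).view(b, c, 1, 1)         # channel weights
        return x * w                                           # excite / rescale

x = torch.randn(2, 64, 32, 32)
print(AggregatedSE(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```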

Testing Language Model Agents Safely in the Wild

  • paper_url: http://arxiv.org/abs/2311.10538
  • repo_url: None
  • paper_authors: Silen Naihin, David Atkinson, Marc Green, Merwane Hamadi, Craig Swift, Douglas Schonholtz, Adam Tauman Kalai, David Bau
  • for: Proposing a framework for safely testing autonomous language model agents on the open internet.
  • methods: A context-sensitive monitor audits agent actions, enforces a stringent safety boundary to stop unsafe tests, and ranks and logs suspect behavior for human review.
  • results: Evaluated against an adversarial simulated agent, the monitor identifies and stops unsafe situations; applying it to real-world AutoGPT tests reveals several limitations and challenges that safe in-the-wild testing will face as agents grow more capable.
    Abstract A prerequisite for safe autonomy-in-the-wild is safe testing-in-the-wild. Yet real-world autonomous tests face several unique safety challenges, both due to the possibility of causing harm during a test, as well as the risk of encountering new unsafe agent behavior through interactions with real-world and potentially malicious actors. We propose a framework for conducting safe autonomous agent tests on the open internet: agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans. We design a basic safety monitor that is flexible enough to monitor existing LLM agents, and, using an adversarial simulated agent, we measure its ability to identify and stop unsafe situations. Then we apply the safety monitor on a battery of real-world tests of AutoGPT, and we identify several limitations and challenges that will face the creation of safe in-the-wild tests as autonomous agents grow more capable.
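
The monitor concept can be sketched as a pre-execution audit of each proposed action against the test's allowed scope, with unsafe matches halting the run and being logged for human review. The rules and actions below are hypothetical placeholders, not the paper's actual monitor.

```python
import re

UNSAFE_PATTERNS = [r"rm\s+-rf", r"curl .*\|\s*sh", r"ssh ", r"\.env"]

def audit(action, context):
    """Return (allowed, reason); context carries the test's whitelist."""
    if not any(action.startswith(p) for p in context["allowed_prefixes"]):
        return False, "outside test scope"
    for pat in UNSAFE_PATTERNS:
        if re.search(pat, action):
            return False, f"matched unsafe pattern {pat!r}"
    return True, "ok"

ctx = {"allowed_prefixes": ["search ", "read ", "write notes/"]}
log = []
for proposed in ["search arxiv safety", "read notes/plan.txt", "curl x.sh | sh"]:
    ok, why = audit(proposed, ctx)
    log.append((proposed, ok, why))   # ranked/inspected by humans later
    if not ok:
        print("halting test:", proposed, "--", why)
        break
```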

SEA++: Multi-Graph-based High-Order Sensor Alignment for Multivariate Time-Series Unsupervised Domain Adaptation

  • paper_url: http://arxiv.org/abs/2311.10806
  • repo_url: None
  • paper_authors: Yucheng Wang, Yuecong Xu, Jianfei Yang, Min Wu, Xiaoli Li, Lihua Xie, Zhenghua Chen
  • for: Unsupervised domain adaptation (UDA) for multivariate time-series (MTS) data.
  • methods: SEnsor Alignment (SEA), which reduces domain discrepancy at both the local and global sensor levels via endo-feature and exo-feature alignment; SEA++ strengthens the endo-feature alignment with multi-graph-based high-order alignment of sensor features and their correlations.
  • results: Extensive experiments on public MTS datasets show that SEA and SEA++ achieve state-of-the-art performance on MTS-UDA.
    Abstract Unsupervised Domain Adaptation (UDA) methods have been successful in reducing label dependency by minimizing the domain discrepancy between a labeled source domain and an unlabeled target domain. However, these methods face challenges when dealing with Multivariate Time-Series (MTS) data. MTS data typically consist of multiple sensors, each with its own unique distribution. This characteristic makes it hard to adapt existing UDA methods, which mainly focus on aligning global features while overlooking the distribution discrepancies at the sensor level, to reduce domain discrepancies for MTS data. To address this issue, a practical domain adaptation scenario is formulated as Multivariate Time-Series Unsupervised Domain Adaptation (MTS-UDA). In this paper, we propose SEnsor Alignment (SEA) for MTS-UDA, aiming to reduce domain discrepancy at both the local and global sensor levels. At the local sensor level, we design endo-feature alignment, which aligns sensor features and their correlations across domains. To reduce domain discrepancy at the global sensor level, we design exo-feature alignment that enforces restrictions on global sensor features. We further extend SEA to SEA++ by enhancing the endo-feature alignment. Particularly, we incorporate multi-graph-based high-order alignment for both sensor features and their correlations. Extensive empirical results have demonstrated the state-of-the-art performance of our SEA and SEA++ on public MTS datasets for MTS-UDA.
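
As a generic stand-in for alignment at the local sensor level, the sketch below computes an MMD-style discrepancy per sensor between source and target features and sums it into an alignment loss. It illustrates the idea of per-sensor alignment only, not the paper's exact endo-/exo-feature losses.

```python
import torch

def mmd(x, y):
    """Linear-kernel MMD between two feature batches."""
    return ((x.mean(0) - y.mean(0)) ** 2).sum()

def sensor_alignment_loss(src, tgt):
    """src, tgt: (batch, n_sensors, feat_dim) per-sensor features."""
    return sum(mmd(src[:, s], tgt[:, s]) for s in range(src.shape[1]))

src = torch.randn(32, 9, 64)        # e.g., 9 sensors, 64-d features each
tgt = torch.randn(32, 9, 64) + 0.5  # shifted target domain
print(sensor_alignment_loss(src, tgt))
```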

Towards a Standardized Reinforcement Learning Framework for AAM Contingency Management

  • paper_url: http://arxiv.org/abs/2311.10805
  • repo_url: None
  • paper_authors: Luis E. Alvarez, Marc W. Brittain, Kara Breeden
  • for: Safety management for next-generation air transportation (Advanced Air Mobility), specifically contingency management and automated decision-making via machine learning.
  • methods: The contingency management problem is formalized as a Markov Decision Process (MDP) and integrated into the AAM-Gym simulation framework, enabling rapid prototyping and evaluation of reinforcement learning algorithms.
  • results: Baseline statistical information for the environment and example performance metrics are reported, serving as a community benchmark for future algorithm development.
    Abstract Advanced Air Mobility (AAM) is the next generation of air transportation that includes new entrants such as electric vertical takeoff and landing (eVTOL) aircraft, increasingly autonomous flight operations, and small UAS package delivery. With these new vehicles and operational concepts comes a desire to increase densities far beyond what occurs today in and around urban areas, to utilize new battery technology, and to move toward more autonomously-piloted aircraft. To achieve these goals, it becomes essential to introduce new safety management system capabilities that can rapidly assess risk as it evolves across a span of complex hazards and, if necessary, mitigate risk by executing appropriate contingencies via supervised or automated decision-making during flights. Recently, reinforcement learning has shown promise for real-time decision making across a wide variety of applications including contingency management. In this work, we formulate the contingency management problem as a Markov Decision Process (MDP) and integrate the contingency management MDP into the AAM-Gym simulation framework. This enables rapid prototyping of reinforcement learning algorithms and evaluation of existing systems, thus providing a community benchmark for future algorithm development. We report baseline statistical information for the environment and provide example performance metrics.
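
To make the MDP framing concrete, here is a toy contingency-management MDP solved by value iteration. States, transition probabilities, and rewards are invented for illustration; the paper's actual environment lives inside AAM-Gym.

```python
states = ["nominal", "battery_low", "rotor_fault", "landed"]
actions = ["continue", "reroute", "emergency_land"]

# P[(s, a)] -> list of (next_state, prob); R[(s, a)] -> reward
P = {
    ("nominal", "continue"): [("nominal", 0.95), ("battery_low", 0.05)],
    ("nominal", "reroute"): [("nominal", 1.0)],
    ("nominal", "emergency_land"): [("landed", 1.0)],
    ("battery_low", "continue"): [("rotor_fault", 0.3), ("battery_low", 0.7)],
    ("battery_low", "reroute"): [("nominal", 0.6), ("battery_low", 0.4)],
    ("battery_low", "emergency_land"): [("landed", 1.0)],
    ("rotor_fault", "continue"): [("rotor_fault", 1.0)],
    ("rotor_fault", "reroute"): [("rotor_fault", 1.0)],
    ("rotor_fault", "emergency_land"): [("landed", 1.0)],
}
R = {sa: 1.0 if sa[1] == "continue" else -1.0 for sa in P}
R[("rotor_fault", "continue")] = -10.0  # flying on a fault is heavily penalized

V = {s: 0.0 for s in states}  # "landed" is absorbing with value 0
for _ in range(100):          # value iteration, gamma = 0.9
    for s in states:
        if s == "landed":
            continue
        V[s] = max(R[(s, a)] + 0.9 * sum(p * V[n] for n, p in P[(s, a)])
                   for a in actions)

policy = {s: max(actions, key=lambda a: R[(s, a)] + 0.9 *
                 sum(p * V[n] for n, p in P[(s, a)]))
          for s in states if s != "landed"}
print(policy)  # e.g., emergency_land once a rotor fault occurs
```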

Enhancing Object Coherence in Layout-to-Image Synthesis

  • paper_url: http://arxiv.org/abs/2311.10522
  • repo_url: None
  • paper_authors: Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin
  • for: A new diffusion model for layout-to-image synthesis that improves object coherence, both semantic and physical.
  • methods: A Global Semantic Fusion (GSF) module that fuses supervision from the layout restriction and the semantic coherence requirement, and a Self-similarity Coherence Attention (SCA) module that integrates local contextual physical coherence into each pixel's generation.
  • results: The model controls object coherence better than prior methods, and experiments show it generates higher-quality, more controllable images.
    Abstract Layout-to-image synthesis is an emerging technique in conditional image generation. It aims to generate complex scenes, where users require fine control over the layout of the objects in a scene. However, it remains challenging to control the object coherence, including semantic coherence (e.g., the cat looks at the flowers or not) and physical coherence (e.g., the hand and the racket should not be misaligned). In this paper, we propose a novel diffusion model with effective global semantic fusion (GSF) and self-similarity feature enhancement modules to guide the object coherence for this task. For semantic coherence, we argue that the image caption contains rich information for defining the semantic relationship within the objects in the images. Instead of simply employing cross-attention between captions and generated images, which addresses the highly relevant layout restriction and semantic coherence separately and thus leads to unsatisfying results shown in our experiments, we develop GSF to fuse the supervision from the layout restriction and semantic coherence requirement and exploit it to guide the image synthesis process. Moreover, to improve the physical coherence, we develop a Self-similarity Coherence Attention (SCA) module to explicitly integrate local contextual physical coherence into each pixel's generation process. Specifically, we adopt a self-similarity map to encode the coherence restrictions and employ it to extract coherent features from text embedding. Through visualization of our self-similarity map, we explore the essence of SCA, revealing that its effectiveness is not only in capturing reliable physical coherence patterns but also in enhancing complex texture generation. Extensive experiments demonstrate the superiority of our proposed method in both image generation quality and controllability.
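
A minimal sketch of the self-similarity map such a module builds on: each spatial position's feature is compared by cosine similarity with every other position. Shapes are illustrative; the SCA module itself does considerably more than this.

```python
import torch
import torch.nn.functional as F

def self_similarity(feat):
    """feat: (B, C, H, W) -> (B, H*W, H*W) cosine self-similarity map."""
    b, c, h, w = feat.shape
    v = feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
    v = F.normalize(v, dim=-1)
    return v @ v.transpose(1, 2)         # (B, H*W, H*W)

feat = torch.randn(1, 128, 16, 16)
sim = self_similarity(feat)
print(sim.shape, sim[0].diagonal().mean())  # diagonal is ~1 by construction
```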

CNL2ASP: converting controlled natural language sentences into ASP

  • paper_url: http://arxiv.org/abs/2311.10505
  • repo_url: None
  • paper_authors: Simone Caruso, Carmine Dodaro, Marco Maratea, Marco Mochi, Francesco Riccio
  • for: Translating English sentences expressed in a controlled natural language (CNL) into Answer Set Programming (ASP) programs.
  • methods: A new tool, CNL2ASP, that defines the admissible CNL sentence types and their translation into ASP rules, demonstrated on both synthetic and real-world combinatorial problems.
  • results: On real-world problems, automatically generated encodings achieve satisfactory performance compared with encodings handwritten by ASP practitioners.
    Abstract Answer Set Programming (ASP) is a popular declarative programming language for solving hard combinatorial problems. Although ASP has gained widespread acceptance in academic and industrial contexts, there are certain user groups who may find it more advantageous to employ a higher-level language that closely resembles natural language when specifying ASP programs. In this paper, we propose a novel tool, called CNL2ASP, for translating English sentences expressed in a controlled natural language (CNL) form into ASP. In particular, we first provide a definition of the type of sentences allowed by our CNL and their translation as ASP rules, and then exemplify the usage of the CNL for the specification of both synthetic and real-world combinatorial problems. Finally, we report the results of an experimental analysis conducted on the real-world problems to compare the performance of automatically generated encodings with the ones written by ASP practitioners, showing that our tool can obtain satisfactory performance on these benchmarks. Under consideration in Theory and Practice of Logic Programming (TPLP).
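
To illustrate the CNL-to-ASP idea, here are two hypothetical sentence/rule pairs in the style of graph coloring, wrapped in a small Python snippet. The rules use standard clingo syntax but are invented examples, not taken from the tool's documentation.

```python
# Hypothetical CNL sentences mapped to the ASP rules a tool like CNL2ASP
# would produce; the exact controlled grammar of the tool may differ.
cnl_to_asp = {
    "Every node is assigned exactly one color among red, green, and blue.":
        "1 { assigned(X,red); assigned(X,green); assigned(X,blue) } 1 :- node(X).",
    "It is prohibited that two connected nodes are assigned the same color.":
        ":- edge(X,Y), assigned(X,C), assigned(Y,C).",
}
for sentence, rule in cnl_to_asp.items():
    print(f"{sentence}\n  => {rule}\n")
```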

A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness

  • paper_url: http://arxiv.org/abs/2311.10804
  • repo_url: None
  • paper_authors: Mathias Vogel
  • for: Addressing the challenge of enhancing expressiveness control in Text-to-Speech (TTS) models by augmenting a frozen pretrained model with a diffusion model conditioned on joint semantic audio/text embeddings.
  • methods: Evaluation of different image-to-image methods for altering latent speech features within a VAE-based TTS model.
  • results: The findings offer valuable insights into the complexities involved and open new avenues for future research in this direction.
    Abstract This report explores the challenge of enhancing expressiveness control in Text-to-Speech (TTS) models by augmenting a frozen pretrained model with a Diffusion Model that is conditioned on joint semantic audio/text embeddings. The paper identifies the challenges encountered when working with a VAE-based TTS model and evaluates different image-to-image methods for altering latent speech features. Our results offer valuable insights into the complexities of adding expressiveness control to TTS systems and open avenues for future research in this direction.

From Principle to Practice: Vertical Data Minimization for Machine Learning

  • paper_url: http://arxiv.org/abs/2311.10500
  • repo_url: https://github.com/eth-sri/datamin
  • paper_authors: Robin Staab, Nikola Jovanović, Mislav Balunović, Martin Vechev
  • for: Operationalizing the data minimization (DM) principle for machine learning, so that models do not rely on unnecessarily detailed private client data.
  • methods: A vertical data minimization (vDM) workflow based on data generalization, designed so that no full-resolution client data is collected during training or deployment, reducing the attack surface in case of a breach.
  • results: A range of baseline vDM algorithms plus Privacy-aware Tree (PAT), an especially effective algorithm that outperforms all baselines across several settings; the code is planned for release as a publicly available library to advance the standardization of DM in practice.
    Abstract Aiming to train and deploy predictive models, organizations collect large amounts of detailed client data, risking the exposure of private information in the event of a breach. To mitigate this, policymakers increasingly demand compliance with the data minimization (DM) principle, restricting data collection to only that data which is relevant and necessary for the task. Despite regulatory pressure, the problem of deploying machine learning models that obey DM has so far received little attention. In this work, we address this challenge in a comprehensive manner. We propose a novel vertical DM (vDM) workflow based on data generalization, which by design ensures that no full-resolution client data is collected during training and deployment of models, benefiting client privacy by reducing the attack surface in case of a breach. We formalize and study the corresponding problem of finding generalizations that both maximize data utility and minimize empirical privacy risk, which we quantify by introducing a diverse set of policy-aligned adversarial scenarios. Finally, we propose a range of baseline vDM algorithms, as well as Privacy-aware Tree (PAT), an especially effective vDM algorithm that outperforms all baselines across several settings. We plan to release our code as a publicly available library, helping advance the standardization of DM for machine learning. Overall, we believe our work can help lay the foundation for further exploration and adoption of DM principles in real-world applications.
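
Data generalization, the core of the vDM workflow, can be sketched in a few lines: each column is coarsened before collection so that full-resolution client data is never stored. The generalization levels below are illustrative, not the policy a trained PAT model would choose.

```python
def generalize(record):
    age, zip_code, income = record
    return (
        f"{(age // 10) * 10}-{(age // 10) * 10 + 9}",  # age -> decade bucket
        zip_code[:3] + "**",                            # zip -> 3-digit prefix
        "high" if income > 80_000 else "low",           # income -> coarse bin
    )

clients = [(34, "94110", 95_000), (58, "10003", 42_000)]
print([generalize(r) for r in clients])
# [('30-39', '941**', 'high'), ('50-59', '100**', 'low')]
```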

Regions are Who Walk Them: a Large Pre-trained Spatiotemporal Model Based on Human Mobility for Ubiquitous Urban Sensing

  • paper_url: http://arxiv.org/abs/2311.10471
  • repo_url: None
  • paper_authors: Ruixing Zhang, Liangzhe Han, Leilei Sun, Yunqi Liu, Jibin Wang, Weifeng Lv
  • for: A large spatiotemporal model based on trajectory data (RAW) that makes user profiling and region analysis more efficient.
  • methods: A GPT-like architecture tailored for trajectory data with a parameter count of up to 1B, plus a spatiotemporal fine-tuning module that interprets a trajectory as a collection of users to derive arbitrary region embeddings.
  • results: Relying solely on human mobility data, the model profiles users and regions with clear relevance and shows promising predictive capability on trajectory generation tasks.
    Abstract User profiling and region analysis are two tasks of significant commercial value. However, in practical applications, modeling different features typically involves four main steps: data preparation, data processing, model establishment, evaluation, and optimization. This process is time-consuming and labor-intensive. Repeating this workflow for each feature results in abundant development time for tasks and a reduced overall volume of task development. Indeed, human mobility data contains a wealth of information. Several successful cases suggest that conducting in-depth analysis of population movement data could potentially yield meaningful profiles about users and areas. Nonetheless, most related works have not thoroughly utilized the semantic information within human mobility data and trained on a fixed number of the regions. To tap into the rich information within population movement, based on the perspective that Regions Are Who walk them, we propose a large spatiotemporal model based on trajectories (RAW). It possesses the following characteristics: 1) Tailored for trajectory data, introducing a GPT-like structure with a parameter count of up to 1B; 2) Introducing a spatiotemporal fine-tuning module, interpreting trajectories as collection of users to derive arbitrary region embedding. This framework allows rapid task development based on the large spatiotemporal model. We conducted extensive experiments to validate the effectiveness of our proposed large spatiotemporal model. It's evident that our proposed method, relying solely on human mobility data without additional features, exhibits a certain level of relevance in user profiling and region analysis. Moreover, our model showcases promising predictive capabilities in trajectory generation tasks based on the current state, offering the potential for further innovative work utilizing this large spatiotemporal model.
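
The "regions are who walk them" premise can be sketched as deriving a region's embedding from the users whose trajectories visit it. Mean pooling below is a trivial stand-in for the paper's spatiotemporal fine-tuning module, with invented data.

```python
import numpy as np

rng = np.random.default_rng(0)
user_emb = {u: rng.normal(size=16) for u in ["u1", "u2", "u3"]}   # from a trajectory model
visits = {"region_a": ["u1", "u2"], "region_b": ["u2", "u3"]}      # who walks where

region_emb = {r: np.mean([user_emb[u] for u in users], axis=0)
              for r, users in visits.items()}
print({r: e.shape for r, e in region_emb.items()})
```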

Using Cooperative Game Theory to Prune Neural Networks

  • paper_url: http://arxiv.org/abs/2311.10468
  • repo_url: None
  • paper_authors: Mauricio Diaz-Ortiz Jr, Benjamin Kempinski, Daphne Cornelisse, Yoram Bachrach, Tal Kachman
  • for: Reducing the computational requirements of deep neural networks (DNNs) while preserving predictive accuracy.
  • methods: A game-theory-assisted method that eliminates neurons based on an estimate of their joint impact on prediction quality, using a power index akin to the Shapley value or Banzhaf index, tailored with a Dropout-like procedure.
  • results: On both feedforward and convolutional networks, Game Theory Assisted Pruning (GTAP) outperforms existing approaches in the trade-off between parameter count and model accuracy.
    Abstract We show how solution concepts from cooperative game theory can be used to tackle the problem of pruning neural networks. The ever-growing size of deep neural networks (DNNs) increases their performance, but also their computational requirements. We introduce a method called Game Theory Assisted Pruning (GTAP), which reduces the neural network's size while preserving its predictive accuracy. GTAP is based on eliminating neurons in the network based on an estimation of their joint impact on the prediction quality through game theoretic solutions. Specifically, we use a power index akin to the Shapley value or Banzhaf index, tailored using a procedure similar to Dropout (commonly used to tackle overfitting problems in machine learning). Empirical evaluation of both feedforward networks and convolutional neural networks shows that this method outperforms existing approaches in the achieved tradeoff between the number of parameters and model accuracy.
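
A minimal sketch of power-index-guided pruning: estimate each neuron's Banzhaf-style value by sampling random coalitions (binary masks) and averaging its marginal contribution to a quality score, then keep the highest-valued neurons. The additive toy quality function stands in for real validation accuracy.

```python
import random

NEURONS = list(range(8))
WEIGHTS = [0.9, 0.1, 0.7, 0.05, 0.6, 0.02, 0.8, 0.01]  # hypothetical importances

def quality(active):
    """Stand-in for model accuracy when only `active` neurons are kept."""
    return sum(WEIGHTS[i] for i in active)

def banzhaf_estimate(i, samples=2000):
    total = 0.0
    for _ in range(samples):
        coalition = {j for j in NEURONS if j != i and random.random() < 0.5}
        total += quality(coalition | {i}) - quality(coalition)  # marginal gain
    return total / samples

values = {i: banzhaf_estimate(i) for i in NEURONS}
keep = sorted(values, key=values.get, reverse=True)[:4]  # prune the rest
print("neurons kept:", sorted(keep))
```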

Accurate and Fast Fischer-Tropsch Reaction Microkinetics using PINNs

  • paper_url: http://arxiv.org/abs/2311.10456
  • repo_url: None
  • paper_authors: Harshil Patel, Aniruddha Panda, Tymofii Nikolaienko, Stanislav Jaso, Alejandro Lopez, Kaushic Kalyanaraman
  • for: Modeling the chemical transformations occurring in Fischer-Tropsch synthesis (FTS).
  • methods: Physics-informed neural networks (PINNs) for modeling FTS microkinetics.
  • results: A computationally efficient and accurate method for solving existing microkinetics models under realistic process conditions: the fraction of vacant catalytic sites is computed with a median relative error of 0.03%, and the model runs up to 1E+06 times faster than conventional equation solvers on GPUs.
    Abstract Microkinetics allows detailed modelling of chemical transformations occurring in many industrially relevant reactions. Traditional way of solving the microkinetics model for Fischer-Tropsch synthesis (FTS) becomes inefficient when it comes to more advanced real-time applications. In this work, we address these challenges by using physics-informed neural networks(PINNs) for modelling FTS microkinetics. We propose a computationally efficient and accurate method, enabling the ultra-fast solution of the existing microkinetics models in realistic process conditions. The proposed PINN model computes the fraction of vacant catalytic sites, a key quantity in FTS microkinetics, with median relative error (MRE) of 0.03%, and the FTS product formation rates with MRE of 0.1%. Compared to conventional equation solvers, the model achieves up to 1E+06 times speed-up when running on GPUs, thus being fast enough for multi-scale and multi-physics reactor modelling and enabling its applications in real-time process control and optimization.
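
A minimal PINN sketch for a toy first-order kinetics ODE, dC/dt = -k C, standing in for the far richer FTS microkinetics the paper solves: the network maps t to C(t) and is trained on the physics residual plus the initial condition, with no labeled solution data.

```python
import torch
import torch.nn as nn

k, C0 = 1.5, 1.0  # toy rate constant and initial concentration
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(2000):
    t = torch.rand(128, 1, requires_grad=True)  # collocation points in [0, 1]
    C = net(t)
    dC_dt = torch.autograd.grad(C.sum(), t, create_graph=True)[0]
    residual = (dC_dt + k * C).pow(2).mean()           # physics loss
    ic = (net(torch.zeros(1, 1)) - C0).pow(2).mean()   # initial condition loss
    loss = residual + ic
    opt.zero_grad(); loss.backward(); opt.step()

t_test = torch.tensor([[0.5]])
print(net(t_test).item(), C0 * torch.exp(torch.tensor(-k * 0.5)).item())  # vs exact
```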

Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools

  • paper_url: http://arxiv.org/abs/2311.10801
  • repo_url: None
  • paper_authors: Wentao Zhang
  • for: Improving portfolio management (PM) by training reinforcement learning agents that can handle customizable stock pools (CSPs).
  • methods: EarnMore, an RL framework with maskable stock representation: embeddings of stocks outside the target pool are masked out, meaningful stock representations are learned via self-supervised masking and reconstruction, and a re-weighting mechanism concentrates the portfolio on favorable stocks, with one-shot training in a global stock pool.
  • results: Extensive experiments on 8 subset pools of the US stock market show EarnMore outperforming 14 state-of-the-art baselines on 6 popular financial metrics, with over 40% improvement in profit.
    Abstract Portfolio management (PM) is a fundamental financial trading task, which explores the optimal periodical reallocation of capitals into different stocks to pursue long-term profits. Reinforcement learning (RL) has recently shown its potential to train profitable agents for PM through interacting with financial markets. However, existing work mostly focuses on fixed stock pools, which is inconsistent with investors' practical demand. Specifically, the target stock pool of different investors varies dramatically due to their discrepancy on market states and individual investors may temporally adjust stocks they desire to trade (e.g., adding one popular stocks), which lead to customizable stock pools (CSPs). Existing RL methods require to retrain RL agents even with a tiny change of the stock pool, which leads to high computational cost and unstable performance. To tackle this challenge, we propose EarnMore, a rEinforcement leARNing framework with Maskable stOck REpresentation to handle PM with CSPs through one-shot training in a global stock pool (GSP). Specifically, we first introduce a mechanism to mask out the representation of the stocks outside the target pool. Second, we learn meaningful stock representations through a self-supervised masking and reconstruction process. Third, a re-weighting mechanism is designed to make the portfolio concentrate on favorable stocks and neglect the stocks outside the target pool. Through extensive experiments on 8 subset stock pools of the US stock market, we demonstrate that EarnMore significantly outperforms 14 state-of-the-art baselines in terms of 6 popular financial metrics with over 40% improvement on profit.
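
The maskable-representation idea can be sketched as replacing the embeddings of stocks outside the customizable pool with a learnable mask token before the policy network sees them, so one agent trained on a global pool can serve many pools. Dimensions are illustrative.

```python
import torch
import torch.nn as nn

n_stocks, dim = 10, 8
stock_emb = nn.Embedding(n_stocks, dim)
mask_token = nn.Parameter(torch.zeros(dim))  # learnable placeholder embedding

def masked_representation(pool):
    """pool: set of stock ids the investor actually trades."""
    emb = stock_emb(torch.arange(n_stocks))
    in_pool = torch.tensor([i in pool for i in range(n_stocks)])
    return torch.where(in_pool.unsqueeze(1), emb,
                       mask_token.expand(n_stocks, dim))

rep = masked_representation({0, 3, 7})
print(rep.shape)  # torch.Size([10, 8]); the other 7 rows share the mask token
```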

A Bridge between Dynamical Systems and Machine Learning: Engineered Ordinary Differential Equations as Classification Algorithm (EODECA)

  • paper_url: http://arxiv.org/abs/2311.10387
  • repo_url: None
  • paper_authors: Raffaele Marino, Lorenzo Giambagli, Lorenzo Chicchi, Lorenzo Buffoni, Duccio Fanelli
  • for: Bridging machine learning and dynamical systems to improve the interpretability of machine learning models.
  • methods: Engineered Ordinary Differential Equations as Classification Algorithms (EODECAs), neural networks underpinned by continuous ordinary differential equations that capitalize on the well-established toolkit of dynamical systems.
  • results: EODECAs deliver high classification performance together with intrinsic interpretability; their natural invertibility gives them an edge in understanding and transparency over conventional deep learning models.
    Abstract In a world increasingly reliant on machine learning, the interpretability of these models remains a substantial challenge, with many equating their functionality to an enigmatic black box. This study seeks to bridge machine learning and dynamical systems. Recognizing the deep parallels between dense neural networks and dynamical systems, particularly in the light of non-linearities and successive transformations, this manuscript introduces the Engineered Ordinary Differential Equations as Classification Algorithms (EODECAs). Uniquely designed as neural networks underpinned by continuous ordinary differential equations, EODECAs aim to capitalize on the well-established toolkit of dynamical systems. Unlike traditional deep learning models, which often suffer from opacity, EODECAs promise both high classification performance and intrinsic interpretability. They are naturally invertible, granting them an edge in understanding and transparency over their counterparts. By bridging these domains, we hope to usher in a new era of machine learning models where genuine comprehension of data processes complements predictive prowess.
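
A minimal sketch of classification by an ODE: the input sets the initial condition, a learned vector field is integrated (forward Euler here), and the final state is read out as class scores. This only illustrates the ODE-as-classifier idea; the paper engineers the dynamics specifically for stability and invertibility.

```python
import torch
import torch.nn as nn

class ODEClassifier(nn.Module):
    def __init__(self, dim, n_classes, steps=20, dt=0.05):
        super().__init__()
        self.field = nn.Sequential(nn.Linear(dim, dim), nn.Tanh())  # learned f(h)
        self.readout = nn.Linear(dim, n_classes)
        self.steps, self.dt = steps, dt

    def forward(self, x):
        h = x
        for _ in range(self.steps):        # h(t+dt) = h(t) + dt * f(h(t))
            h = h + self.dt * self.field(h)
        return self.readout(h)

model = ODEClassifier(dim=4, n_classes=3)
print(model(torch.randn(5, 4)).shape)  # torch.Size([5, 3])
```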

Quantum Data Encoding: A Comparative Analysis of Classical-to-Quantum Mapping Techniques and Their Impact on Machine Learning Accuracy

  • paper_url: http://arxiv.org/abs/2311.10375
  • repo_url: None
  • paper_authors: Minati Rath, Hema Date
  • for: Integrating quantum data embedding techniques into classical machine learning (ML) algorithms to assess performance gains and computational implications across a spectrum of models.
  • methods: Several classical-to-quantum mappings, including basis encoding, angle encoding, and amplitude encoding, evaluated in extensive experiments with classical ML algorithms such as Logistic Regression, K-Nearest Neighbors, Support Vector Machines, and ensemble methods like Random Forest, LightGBM, AdaBoost, and CatBoost.
  • results: Quantum data embedding improves classification accuracy and F1 scores, most notably in models that benefit from enhanced feature representation; running time rises moderately for low-complexity models and discernibly for more computationally intensive ones, while ensemble methods strike a favorable balance between performance gains and computational overhead. Future work may optimize the quantum encoding process for computational efficiency and explore the scalability of quantum embedding techniques in real-world applications.
    Abstract This research explores the integration of quantum data embedding techniques into classical machine learning (ML) algorithms, aiming to assess the performance enhancements and computational implications across a spectrum of models. We explore various classical-to-quantum mapping methods, ranging from basis encoding, angle encoding to amplitude encoding for encoding classical data, we conducted an extensive empirical study encompassing popular ML algorithms, including Logistic Regression, K-Nearest Neighbors, Support Vector Machines and ensemble methods like Random Forest, LightGBM, AdaBoost, and CatBoost. Our findings reveal that quantum data embedding contributes to improved classification accuracy and F1 scores, particularly notable in models that inherently benefit from enhanced feature representation. We observed nuanced effects on running time, with low-complexity models exhibiting moderate increases and more computationally intensive models experiencing discernible changes. Notably, ensemble methods demonstrated a favorable balance between performance gains and computational overhead. This study underscores the potential of quantum data embedding in enhancing classical ML models and emphasizes the importance of weighing performance improvements against computational costs. Future research directions may involve refining quantum encoding processes to optimize computational efficiency and exploring scalability for real-world applications. Our work contributes to the growing body of knowledge at the intersection of quantum computing and classical machine learning, offering insights for researchers and practitioners seeking to harness the advantages of quantum-inspired techniques in practical scenarios.
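
Two of the mappings compared above can be written down directly as state vectors. Here is a minimal numpy sketch of angle encoding (one qubit per feature, with the feature as a rotation angle) and amplitude encoding (features as a normalized state vector).

```python
import numpy as np

def angle_encode(x):
    """Each feature x_i becomes a qubit cos(x_i/2)|0> + sin(x_i/2)|1>."""
    state = np.array([1.0])
    for xi in x:
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)])
        state = np.kron(state, qubit)  # tensor product across qubits
    return state

def amplitude_encode(x):
    """Features become amplitudes of a 2^n-dimensional normalized state."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

print(angle_encode([0.3, 1.2]))          # 4 amplitudes for 2 qubits
print(amplitude_encode([1, 2, 3, 4]))    # 4 amplitudes for 2 qubits
```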

Dates Fruit Disease Recognition using Machine Learning

  • paper_url: http://arxiv.org/abs/2311.10365
  • repo_url: None
  • paper_authors: Ghassen Ben Brahim, Jaafar Alghazo, Ghazanfar Latif, Khalid Alnujaidi
  • for: An integrated solution for the automatic detection of date fruit diseases, improving the efficiency and quality of date fruit production.
  • methods: Building on advances in computer vision, machine learning, and drone technology, a hybrid feature extraction approach combining L*a*b color features, statistical features, and Discrete Wavelet Transform (DWT) texture features for early detection and classification of date fruit disease.
  • results: Combining the L*a*b, statistical, and DWT features yields the highest average accuracy on a dataset of 871 images spanning healthy, initial-stage, malnourished, and parasite-infected dates.
    Abstract Many countries such as Saudi Arabia, Morocco and Tunisia are among the top exporters and consumers of palm date fruits. Date fruit production plays a major role in the economies of the date fruit exporting countries. Date fruits are susceptible to disease just like any fruit and early detection and intervention can end up saving the produce. However, with the vast farming lands, it is nearly impossible for farmers to observe date trees on a frequent basis for early disease detection. In addition, even with human observation the process is prone to human error and increases the date fruit cost. With the recent advances in computer vision, machine learning, drone technology, and other technologies; an integrated solution can be proposed for the automatic detection of date fruit disease. In this paper, a hybrid features based method with the standard classifiers is proposed based on the extraction of L*a*b color features, statistical features, and Discrete Wavelet Transform (DWT) texture features for the early detection and classification of date fruit disease. A dataset was developed for this work consisting of 871 images divided into the following classes; Healthy date, Initial stage of disease, Malnourished date, and Parasite infected. The extracted features were input to common classifiers such as the Random Forest (RF), Multilayer Perceptron (MLP), Na\"ive Bayes (NB), and Fuzzy Decision Trees (FDT). The highest average accuracy was achieved when combining the L*a*b, Statistical, and DWT Features.
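
A minimal sketch of the hybrid feature extraction: L*a*b color statistics concatenated with DWT texture energies, ready to feed a standard classifier. It assumes opencv-python and PyWavelets are installed; the exact statistical and texture features the paper uses may differ, and the image path is a placeholder.

```python
import cv2
import numpy as np
import pywt

def hybrid_features(bgr_image):
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    color_stats = np.concatenate([lab.mean(axis=(0, 1)), lab.std(axis=(0, 1))])

    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY).astype(float)
    _, (h, v, d) = pywt.dwt2(gray, "haar")            # DWT detail sub-bands
    texture = np.array([np.mean(np.abs(b)) for b in (h, v, d)])
    return np.concatenate([color_stats, texture])     # 6 color + 3 texture values

img = cv2.imread("date_fruit.jpg")  # placeholder path
if img is not None:
    print(hybrid_features(img))
```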

Quantum-Assisted Simulation: A Framework for Designing Machine Learning Models in the Quantum Computing Domain

  • paper_url: http://arxiv.org/abs/2311.10363
  • repo_url: None
  • paper_authors: Minati Rath, Hema Date
  • for: Using quantum computing (QC) to accelerate the training of machine learning (ML) models and improve data-processing speed and accuracy.
  • methods: Quantum machine learning (QML) algorithms that map classical ML algorithms into the quantum mechanical domain, together with a simplified, accessible procedure for setting up simulations of QML algorithms.
  • results: Simulations on a dataset using both machine learning and quantum machine learning approaches, with their respective performances compared via a quantum simulator.
    Abstract Machine learning (ML) models are trained using historical data to classify new, unseen data. However, traditional computing resources often struggle to handle the immense amount of data, commonly known as Big Data, within a reasonable timeframe. Quantum computing (QC) provides a novel approach to information processing. Quantum algorithms have the potential to process classical data exponentially faster than classical computing. By mapping quantum machine learning (QML) algorithms into the quantum mechanical domain, we can potentially achieve exponential improvements in data processing speed, reduced resource requirements, and enhanced accuracy and efficiency. In this article, we delve into both the QC and ML fields, exploring the interplay of ideas between them, as well as the current capabilities and limitations of hardware. We investigate the history of quantum computing, examine existing QML algorithms, and aim to present a simplified procedure for setting up simulations of QML algorithms, making it accessible and understandable for readers. Furthermore, we conducted simulations on a dataset using both machine learning and quantum machine learning approaches. We then proceeded to compare their respective performances by utilizing a quantum simulator.

INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis

  • paper_url: http://arxiv.org/abs/2311.10798
  • repo_url: None
  • paper_authors: Shih-Cheng Huang, Zepeng Huo, Ethan Steinberg, Chia-Chun Chiang, Matthew P. Lungren, Curtis P. Langlotz, Serena Yeung, Nigam H. Shah, Jason A. Fries
  • for: Providing a benchmark dataset for evaluating multimodal models in medical applications.
  • methods: A multimodal medical dataset combining CT images, radiology report impression sections, and structured electronic health record (EHR) data (demographics, diagnoses, procedures, vitals, and medications).
  • results: A benchmark built on de-identified longitudinal records from 19,402 patients at risk for pulmonary embolism, used to evaluate image-only, EHR-only, and multimodal fusion models.
    Abstract Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e. demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE related tasks. We evaluate image-only, EHR-only, and multimodal fusion models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset integrating 3D medical imaging and EHR for reproducible methods evaluation and research.

TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes

  • paper_url: http://arxiv.org/abs/2311.10797
  • repo_url: None
  • paper_authors: Bibek Upadhayay, Vahid Behzadan
  • for: A cost-effective solution for training and tuning LLMs in multilingual settings, especially for low-resource languages.
  • methods: A new method, TaCo (Translation-Assisted Cross-Linguality), which uses translation in a chain-of-thought process to instruction-tune LLMs on new languages via curriculum learning, together with the Multilingual Instruction-Tuning Dataset (MITS).
  • results: Further instruction-tuning the Guanaco-33B model with TaCo on three low-resource languages and one high-resource language yields a GPT-4 score of 82% for a low-resource language on the Vicuna Benchmark and doubles performance relative to instruction tuning alone.
    Abstract LLMs such as ChatGPT and PaLM can be utilized to train on a new language and revitalize low-resource languages. However, it is evidently very costly to pretrain or fine-tune LLMs to adopt new languages. Another challenge is the limitation of benchmark datasets and the metrics used to measure the performance of models in multilingual settings. This paper proposes cost-effective solutions to both of the aforementioned challenges. We introduce the Multilingual Instruction-Tuning Dataset (MITS), which is comprised of the translation of Alpaca-52K, Dolly-15K, and Vicuna Benchmark in 132 languages. Also, we propose a new method called \emph{TaCo: Translation-Assisted Cross-Linguality}, which makes use of translation in a chain-of-thought process to instruction-tune LLMs on new languages through a curriculum learning process. As a proof of concept, we experimented with the instruction-tuned Guanaco-33B model and performed further instruction tuning using the TaCo method in three low-resource languages and one high-resource language. Our results show that the TaCo method achieves a score of 82% from GPT-4 for a low-resource language in the Vicuna Benchmark dataset, and doubles performance relative to instruction tuning alone. These results suggest that TaCo is a promising method for creating multilingual LLMs, even for low-resource languages. We have released our datasets and the model adapters, and encourage the research community to make use of these resources towards advancing work on multilingual LLMs.
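
A minimal sketch of what a translation-assisted chain-of-thought training example could look like: the model is prompted to translate the instruction into English (the pivot), reason and answer there, then translate back. The template wording is invented for illustration, not TaCo's actual prompt format.

```python
def taco_prompt(instruction_target, target_lang):
    """Hypothetical translation-assisted chain-of-thought template."""
    return (
        f"Instruction ({target_lang}): {instruction_target}\n"
        f"Step 1 - Translate the instruction into English.\n"
        f"Step 2 - Think through and answer the instruction in English.\n"
        f"Step 3 - Translate the English answer back into {target_lang}.\n"
        "Response:"
    )

print(taco_prompt("<instruction written in Nepali>", "Nepali"))
```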

Federated Knowledge Graph Completion via Latent Embedding Sharing and Tensor Factorization

  • paper_url: http://arxiv.org/abs/2311.10341
  • repo_url: None
  • paper_authors: Maolin Wang, Dun Zeng, Zenglin Xu, Ruocheng Guo, Xiangyu Zhao
  • for: A new federated tensor factorization method that addresses privacy concerns in distributed knowledge graph (KG) completion.
  • methods: Federated Latent Embedding Sharing Tensor factorization (FLEST), which decomposes the embedding matrix and shares latent dictionary embeddings to lower privacy risks.
  • results: Empirical results demonstrate FLEST's effectiveness and efficiency, offering a balanced solution between completion performance and privacy.
    Abstract Knowledge graphs (KGs), which consist of triples, are inherently incomplete and always require completion procedures to predict missing triples. In real-world scenarios, KGs are distributed across clients, complicating completion tasks due to privacy restrictions. Many frameworks have been proposed to address the issue of federated knowledge graph completion. However, the existing frameworks, including FedE, FedR, and FKGE, have certain limitations: FedE poses a risk of information leakage, FedR's optimization efficacy diminishes when there is minimal overlap among relations, and FKGE suffers from computational costs and mode collapse issues. To address these issues, we propose a novel method, Federated Latent Embedding Sharing Tensor factorization (FLEST), a new approach that uses federated tensor factorization for KG completion. FLEST decomposes the embedding matrix and enables sharing of latent dictionary embeddings to lower privacy risks. Empirical results demonstrate FLEST's effectiveness and efficiency, offering a balanced solution between performance and privacy. FLEST expands the application of federated tensor factorization in KG completion tasks.
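
The tensor-factorization building block behind such approaches can be sketched with a DistMult-style scorer: a triple (h, r, t) scores as the trilinear product of its embeddings. Federated sharing of latent dictionaries is out of scope for this snippet; sizes are illustrative.

```python
import torch
import torch.nn as nn

n_entities, n_relations, dim = 100, 10, 32
E = nn.Embedding(n_entities, dim)   # entity embeddings
R = nn.Embedding(n_relations, dim)  # relation embeddings

def score(h, r, t):
    """DistMult: sum_k E[h]_k * R[r]_k * E[t]_k; higher = more plausible."""
    return (E(h) * R(r) * E(t)).sum(-1)

h = torch.tensor([0, 5]); r = torch.tensor([1, 1]); t = torch.tensor([7, 9])
print(score(h, r, t))  # one plausibility score per candidate triple
```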

Emotion-Aware Music Recommendation System: Enhancing User Experience Through Real-Time Emotional Context

  • paper_url: http://arxiv.org/abs/2311.10796
  • repo_url: None
  • paper_authors: Tina Babu, Rekha R Nair, Geetha A
  • for: Improving conventional music recommendation systems by accounting for the vital role of emotions in shaping users' music choices.
  • methods: An AI-based recommendation model that detects the user's real-time emotional state and generates personalized song recommendations aligned with it.
  • results: The model enhances the user experience by offering music that resonates with the current mood, elicits the desired emotions, and creates a more immersive and meaningful listening experience.
    Abstract This study addresses the deficiency in conventional music recommendation systems by focusing on the vital role of emotions in shaping users music choices. These systems often disregard the emotional context, relying predominantly on past listening behavior and failing to consider the dynamic and evolving nature of users emotional preferences. This gap leads to several limitations. Users may receive recommendations that do not match their current mood, which diminishes the quality of their music experience. Furthermore, without accounting for emotions, the systems might overlook undiscovered or lesser-known songs that have a profound emotional impact on users. To combat these limitations, this research introduces an AI model that incorporates emotional context into the song recommendation process. By accurately detecting users real-time emotions, the model can generate personalized song recommendations that align with the users emotional state. This approach aims to enhance the user experience by offering music that resonates with their current mood, elicits the desired emotions, and creates a more immersive and meaningful listening experience. By considering emotional context in the song recommendation process, the proposed model offers an opportunity for a more personalized and emotionally resonant musical journey.

High-fidelity Person-centric Subject-to-Image Synthesis

  • paper_url: http://arxiv.org/abs/2311.10329
  • repo_url: None
  • paper_authors: Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin
  • for: Improving the quality of person-centric image generation by addressing the training imbalance and quality compromise of current subject-driven methods.
  • methods: Face-diffuser, a collaborative generation pipeline with two specialized pre-trained diffusion models, a Text-driven Diffusion Model (TDM) for scenes and a Subject-augmented Diffusion Model (SDM) for persons, combined through a novel Saliency-adaptive Noise Fusion (SNF) mechanism in a three-stage sampling process.
  • results: Extensive experiments confirm the impressive effectiveness and robustness of Face-diffuser for person-centric image generation.
    Abstract Current subject-driven image generation methods encounter significant challenges in person-centric image generation. The reason is that they learn the semantic scene and person generation by fine-tuning a common pre-trained diffusion, which involves an irreconcilable training imbalance. Precisely, to generate realistic persons, they need to sufficiently tune the pre-trained model, which inevitably causes the model to forget the rich semantic scene prior and makes scene generation over-fit to the training data. Moreover, even with sufficient fine-tuning, these methods can still not generate high-fidelity persons since joint learning of the scene and person generation also lead to quality compromise. In this paper, we propose Face-diffuser, an effective collaborative generation pipeline to eliminate the above training imbalance and quality compromise. Specifically, we first develop two specialized pre-trained diffusion models, i.e., Text-driven Diffusion Model (TDM) and Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively. The sampling process is divided into three sequential stages, i.e., semantic scene construction, subject-scene fusion, and subject enhancement. The first and last stages are performed by TDM and SDM respectively. The subject-scene fusion stage, that is the collaboration achieved through a novel and highly effective mechanism, Saliency-adaptive Noise Fusion (SNF). Specifically, it is based on our key observation that there exists a robust link between classifier-free guidance responses and the saliency of generated images. In each time step, SNF leverages the unique strengths of each model and allows for the spatial blending of predicted noises from both models automatically in a saliency-aware manner. Extensive experiments confirm the impressive effectiveness and robustness of the Face-diffuser.
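
A minimal sketch of saliency-adaptive fusion: at each denoising step, the two models' noise predictions are blended per pixel by a saliency mask. Deriving the mask as a normalized guidance magnitude, as below, is a simplification of the mechanism described in the abstract.

```python
import torch

def snf_blend(eps_scene, eps_person, guidance_scene, guidance_person):
    """All tensors: (B, C, H, W). Returns the fused noise prediction."""
    s_scene = guidance_scene.abs().mean(1, keepdim=True)   # per-pixel saliency
    s_person = guidance_person.abs().mean(1, keepdim=True)
    m = s_person / (s_scene + s_person + 1e-8)             # person weight in [0, 1]
    return m * eps_person + (1 - m) * eps_scene

shape = (1, 4, 64, 64)  # latent-space noise maps
fused = snf_blend(*[torch.randn(shape) for _ in range(4)])
print(fused.shape)
```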

Clustering Techniques for Stable Linear Dynamical Systems with applications to Hard Disk Drives

  • paper_url: http://arxiv.org/abs/2311.10322
  • repo_url: None
  • paper_authors: Nikhil Potu Surya Prakash, Joohwan Seo, Jongeun Choi, Roberto Horowitz
  • for: Designing robust controllers for multiple plant transfer functions or a family of plant transfer functions.
  • methods: Clustering stable linear dynamical systems so that an optimal controller can be designed within each cluster: a k-medoids algorithm for hard clustering of stable Linear Time Invariant (LTI) systems, and Gaussian Mixture Model (GMM) clustering for a special class of LTI systems common in hard disk drive plants.
  • results: The clustering yields controllers that are optimal within each cluster, avoiding the sub-optimality of one common controller designed for a widely varying family of plants.
    Abstract In Robust Control and Data Driven Robust Control design methodologies, multiple plant transfer functions or a family of transfer functions are considered and a common controller is designed such that all the plants that fall into this family are stabilized. Though the plants are stabilized, the controller might be sub-optimal for each of the plants when the variations in the plants are large. This paper presents a way of clustering stable linear dynamical systems for the design of robust controllers within each of the clusters such that the controllers are optimal for each of the clusters. First a k-medoids algorithm for hard clustering will be presented for stable Linear Time Invariant (LTI) systems and then a Gaussian Mixture Models (GMM) clustering for a special class of LTI systems, common for Hard Disk Drive plants, will be presented.
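As a sketch of the hard-clustering step, the following applies plain k-medoids to a set of stable SISO LTI systems, using a frequency-gridded surrogate for an H-infinity-type distance between systems. Both the distance surrogate and the PAM-style updates are our assumptions; the paper may use a different system metric.

```python
import numpy as np
from scipy import signal

def system_distance(sys_a, sys_b, n_freqs=256):
    """Approximate H-infinity distance between two stable SISO LTI systems
    by sampling the magnitude of the frequency-response difference."""
    w = np.logspace(-2, 3, n_freqs)
    _, h_a = signal.freqresp(sys_a, w)
    _, h_b = signal.freqresp(sys_b, w)
    return np.max(np.abs(h_a - h_b))

def k_medoids(systems, k, n_iter=50, seed=0):
    """Plain k-medoids (PAM-style alternation) on a precomputed pairwise
    distance matrix between LTI systems."""
    n = len(systems)
    d = np.array([[system_distance(a, b) for b in systems] for a in systems])
    rng = np.random.default_rng(seed)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(d[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if members.size:
                # Pick the member minimizing total distance to its cluster.
                new_medoids[j] = members[np.argmin(d[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(d[:, medoids], axis=1)
    return medoids, labels
```

Here `systems` could be, for example, a list of `scipy.signal.TransferFunction` objects representing the plant family; an optimal controller would then be designed per cluster.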

Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification

  • paper_url: http://arxiv.org/abs/2311.10319
  • repo_url: None
  • paper_authors: Pranav Singh, Raviteja Chukkapalli, Shravan Chaudhari, Luoyao Chen, Mei Chen, Jinqian Pan, Craig Smuda, Jacopo Cirrone
  • for: Progress in clinical treatment and research is limited by supervised learning techniques, which require large amounts of annotated data and many hours of clinical specialists' time.
  • methods: The paper proposes using self-supervised and semi-supervised learning; these techniques perform label-free auxiliary tasks, making it easier to scale up machine supervision than with fully supervised techniques.
  • results: The proposed S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline is benchmarked on three medical imaging datasets for classification and segmentation. Self-supervision with 10% of the annotations outperformed 100% annotation for classification on most datasets, and the semi-supervised approach outperformed the fully supervised one for segmentation while using 50% fewer labels on all three datasets.
    Abstract Advancements in clinical treatment and research are limited by supervised learning techniques that rely on large amounts of annotated data, an expensive task requiring many hours of clinical specialists' time. In this paper, we propose using self-supervised and semi-supervised learning. These techniques perform an auxiliary task that is label-free, so scaling up machine supervision is easier compared with fully-supervised techniques. This paper proposes S4MI (Self-Supervision and Semi-Supervision for Medical Imaging), our pipeline to leverage advances in self- and semi-supervised learning. We benchmark these methods on three medical imaging datasets to analyze their efficacy for classification and segmentation. Self-supervised learning with only 10% of the annotations performed better than 100% annotation for the classification of most datasets. The semi-supervised approach yielded favorable outcomes for segmentation, outperforming the fully-supervised approach while using 50% fewer labels in all three datasets.
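As one concrete example of how machine supervision can replace manual labels, the sketch below implements confidence-thresholded pseudo-labeling, a common semi-supervised recipe; the threshold, the two-view augmentation scheme, and the loss weighting are our assumptions, not necessarily the components of S4MI.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_batch, unlabeled_batch,
                         optimizer, threshold=0.95, lam=1.0):
    """One training step of confidence-thresholded pseudo-labeling
    (a generic semi-supervised recipe; the paper's scheme may differ)."""
    x_l, y_l = labeled_batch
    x_u_weak, x_u_strong = unlabeled_batch  # two augmented views

    sup_loss = F.cross_entropy(model(x_l), y_l)

    with torch.no_grad():
        probs = F.softmax(model(x_u_weak), dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf >= threshold  # only trust confident predictions

    unsup_loss = (F.cross_entropy(model(x_u_strong), pseudo, reduction="none")
                  * keep.float()).mean()

    loss = sup_loss + lam * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```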

Supervised structure learning

  • paper_url: http://arxiv.org/abs/2311.10300
  • repo_url: https://github.com/alteryx/compose
  • paper_authors: Karl J. Friston, Lancelot Da Costa, Alexander Tschantz, Alex Kiefer, Tommaso Salvatori, Victorita Neacsu, Magnus Koudahl, Conor Heins, Noor Sajid, Dimitrije Markovic, Thomas Parr, Tim Verbelen, Christopher L Buckley
  • for: This paper concerns structure learning, i.e., the discovery of discrete generative models, emphasizing Bayesian model selection and the assimilation of training data or content, with special attention to the order in which data are ingested.
  • methods: The paper uses Bayesian model selection with priors over models based on expected free energy, where the constraints inherit from priors over (preferred) outcomes. In this setting, expected free energy reduces to a constrained mutual information.
  • results: The scheme is first illustrated with image classification on the MNIST dataset, and then tested on the more challenging problem of discovering models with dynamics, using a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi problem, where generative models autodidactically recover (i.e., disentangle) the factorial structure of latent states and their characteristic paths or dynamics.
    Abstract This paper concerns structure learning, or discovery, of discrete generative models. It focuses on Bayesian model selection and the assimilation of training data or content, with a special emphasis on the order in which data are ingested. A key move in the ensuing schemes is to place priors on the selection of models, based upon expected free energy. In this setting, expected free energy reduces to a constrained mutual information, where the constraints inherit from priors over outcomes (i.e., preferred outcomes). The resulting scheme is first used to perform image classification on the MNIST dataset to illustrate the basic idea, and then tested on the more challenging problem of discovering models with dynamics, using a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi (cf., blocks world) problem. In these examples, generative models are constructed autodidactically to recover (i.e., disentangle) the factorial structure of latent states, and their characteristic paths or dynamics.
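For readers unfamiliar with the quantity used here as a prior over models, the following is the standard active-inference decomposition of expected free energy, written under our own notational assumptions (the paper's exact constrained form may differ):

```latex
G \;=\; \mathbb{E}_{Q(o,s)}\!\big[\ln Q(s) - \ln P(o,s)\big]
  \;=\; \underbrace{-\,\mathbb{E}_{Q(o)}\,D_{\mathrm{KL}}\!\big[Q(s \mid o)\,\big\|\,Q(s)\big]}_{\text{negative expected information gain}}
  \;\underbrace{-\;\mathbb{E}_{Q(o)}\big[\ln P(o)\big]}_{\text{negative expected log preference}}
```

Minimizing G thus maximizes the mutual information between latent states and outcomes, subject to the constraint that outcomes conform to the prior preferences P(o); this is the constrained mutual information the abstract refers to.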

Traffic Sign Interpretation in Real Road Scene

  • paper_url: http://arxiv.org/abs/2311.10793
  • repo_url: None
  • paper_authors: Chuang Yang, Kai Zhuang, Mulin Chen, Haozhao Ma, Xu Han, Tao Han, Changxing Guo, Han Han, Bingxuan Zhao, Qi Wang
  • for: To provide accurate instruction support to autonomous or assistant driving by addressing the limitations of existing traffic sign detection and recognition methods.
  • methods: The paper proposes the traffic sign interpretation (TSI) task, which interprets the global semantic logic among traffic signs and renders it in natural language to provide accurate driving guidance, and designs a multi-task learning architecture for TSI that detects and recognizes various traffic signs and interprets them into natural language.
  • results: Experiments on the proposed TSI-CN dataset demonstrate that the TSI task is achievable and that the TSI architecture can successfully interpret traffic signs from scenes, even when there is complex semantic logic among the signs.
    Abstract Most existing traffic sign-related works are dedicated to detecting and recognizing parts of traffic signs individually, which fails to analyze the global semantic logic among signs and may convey inaccurate traffic instructions. To address the above issues, we propose a traffic sign interpretation (TSI) task, which aims to interpret globally, semantically interrelated traffic signs (e.g., driving-instruction-related texts, symbols, and guide panels) into a natural language for providing accurate instruction support to autonomous or assistant driving. Meanwhile, we design a multi-task learning architecture for TSI, which is responsible for detecting and recognizing various traffic signs and interpreting them into a natural language like a human. Furthermore, the absence of a publicly available TSI dataset prompts us to build a traffic sign interpretation dataset, namely TSI-CN. The dataset consists of real road scene images captured from highways and urban ways in China from a driver's perspective. It contains rich location labels for texts, symbols, and guide panels, along with the corresponding natural language description labels. Experiments on TSI-CN demonstrate that the TSI task is achievable and that the TSI architecture can interpret traffic signs from scenes successfully, even if there is a complex semantic logic among signs. The TSI-CN dataset and the source code of the TSI architecture will be publicly available after the revision process.
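As a rough illustration of what a multi-task TSI-style architecture could look like, the sketch below shares one visual backbone between a detection head, a recognition head, and a language decoder for the interpretation. Every module choice here is our own placeholder; the paper's code is not yet public.

```python
import torch
import torch.nn as nn

class TSIModel(nn.Module):
    """Hypothetical multi-task skeleton: a shared backbone feeds a sign
    detection head, a text/symbol recognition head, and a transformer
    decoder that emits the natural-language interpretation."""

    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.backbone = nn.Sequential(  # stand-in for a real CNN/ViT backbone
            nn.Conv2d(3, d_model, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((16, 16)),
        )
        self.detect_head = nn.Conv2d(d_model, 5, kernel_size=1)         # 4 box coords + objectness
        self.recog_head = nn.Conv2d(d_model, vocab_size, kernel_size=1)  # per-cell sign/text class
        decoder_layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, caption_tokens):
        feats = self.backbone(images)                  # (B, C, 16, 16)
        boxes = self.detect_head(feats)                # detection branch
        signs = self.recog_head(feats)                 # recognition branch
        memory = feats.flatten(2).transpose(1, 2)      # (B, 256, C) visual tokens
        tgt = self.embed(caption_tokens)
        interp = self.lm_head(self.decoder(tgt, memory))  # interpretation branch
        return boxes, signs, interp
```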

Attention Mechanism for Lithium-Ion Battery Lifespan Prediction: Temporal and Cyclic Attention

  • paper_url: http://arxiv.org/abs/2311.10792
  • repo_url: None
  • paper_authors: Jaewook Lee, Seongmin Heo, Jay H. Lee
  • for: Predicting the lifespan of lithium-ion batteries (LIBs) in order to optimize usage and prevent accidents.
  • methods: Employs attention mechanisms (AM) to construct data-driven models that predict LIB lifespan from easily measurable inputs such as voltage, current, temperature, and capacity data.
  • results: (1) Temporal attention (TA) and cyclic attention (CA) improve prediction accuracy and highlight the key features in the input data. (2) The calculated TA scores highlight the rest phase as a key characteristic distinguishing LIB data among different batches. (3) CA scores reveal variations in the importance of cycles across batches and the potential to reduce the number of cycles in the input data, from 100 to 50 with single-head attention and to 30 with multi-head attention.
    Abstract Accurately predicting the lifespan of lithium-ion batteries (LIBs) is pivotal for optimizing usage and preventing accidents. Previous studies in constructing prediction models often relied on inputs that are challenging to measure in real-time operations and failed to comprehensively capture intra-cycle and inter-cycle data patterns, which are essential features for accurate predictions. In this study, we employ attention mechanisms (AM) to develop data-driven models for predicting LIB lifespan using easily measurable inputs such as voltage, current, temperature, and capacity data. The developed model integrates recurrent neural network (RNN) and convolutional neural network (CNN) components, featuring two types of attention mechanisms: temporal attention (TA) and cyclic attention (CA). The inclusion of TA aims to identify important time steps within each cycle by scoring the hidden states of the RNN, whereas CA strives to capture key features of inter-cycle correlations through self-attention (SA). This enhances model accuracy and elucidates critical features in the input data. To validate our method, we apply it to publicly available cycling data consisting of three batches of cycling modes. The calculated TA scores highlight the rest phase as a key characteristic distinguishing LIB data among different batches. Additionally, CA scores reveal variations in the importance of cycles across batches. By leveraging CA scores, we explore the potential to reduce the number of cycles in the input data. The single-head and multi-head attentions enable us to decrease the input dimension from 100 to 50 and 30 cycles, respectively.
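A minimal sketch of the temporal-attention idea, scoring RNN hidden states within a cycle, is shown below; the GRU choice, layer sizes, and single-layer scoring function are our assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TemporalAttentionRNN(nn.Module):
    """Temporal attention over GRU hidden states: each time step within a
    cycle is scored, and the cycle embedding is the attention-weighted sum."""

    def __init__(self, n_features=4, hidden=64):  # voltage, current, temp, capacity
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)  # one scalar score per time step

    def forward(self, x):                               # x: (B, T, n_features)
        h, _ = self.rnn(x)                              # (B, T, hidden)
        alpha = torch.softmax(self.score(h), dim=1)     # temporal attention weights
        cycle_emb = (alpha * h).sum(dim=1)              # (B, hidden) per-cycle embedding
        return cycle_emb, alpha.squeeze(-1)             # weights expose important steps
```

In a full pipeline, the per-cycle embeddings produced this way would then pass through self-attention across cycles (the cyclic attention) before a regression head predicts the remaining lifespan.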

Physics-Enhanced Multi-fidelity Learning for Optical Surface Imprint

  • paper_url: http://arxiv.org/abs/2311.10278
  • repo_url: None
  • paper_authors: Yongchao Chen
  • for: This study aims to use multi-fidelity neural networks (MFNN) to solve the inverse problem of mapping the optical image of a residual indentation imprint to the material's true tensile force curve.
  • methods: The MFNN model is first actively trained on pure simulation data, and the sim-to-real gap is then bridged via transfer learning. Notably, the neural network is used to dig out unknown physics while known physics is implanted into the transfer-learning framework, greatly improving model stability and reducing the data requirement.
  • results: The MFNN model solves the calibration problem efficiently, reducing the data requirement and improving model stability; the approach exemplifies applying machine learning to experimental research under the constraints of data limitation and fidelity variance.
    Abstract Human fingerprints serve as a unique and powerful characteristic for each person, from which police can recognize one's identity. Similar to humans, many natural bodies and intrinsic mechanical qualities can also be uniquely identified from surface characteristics. To measure the elasto-plastic properties of a material, a formally sharp indenter is pushed into the measured body under constant force and retracted, leaving a unique residual imprint of minute size, from several micrometers down to nanometers. However, one great challenge is how to map the optical image of this residual imprint into the desired mechanical properties, i.e., the tensile force curve. In this paper, we propose a novel method using multi-fidelity neural networks (MFNN) to solve this inverse problem. We first actively train the NN model on pure simulation data, and then bridge the sim-to-real gap via transfer learning. The most innovative part is that we use the NN to dig out the unknown physics and also implant the known physics into the transfer-learning framework, thus greatly improving model stability and decreasing the data requirement. This work serves as a good example of applying machine learning to real experimental research, especially under the constraints of data limitation and fidelity variance.
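A minimal multi-fidelity sketch under our own assumptions is shown below: a network trained on simulation data is composed with a small correction network that learns the sim-to-real residual from scarce experiments, with the simulation branch frozen during transfer. The composite structure and layer sizes are placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiFidelityNet(nn.Module):
    """Low-fidelity network pretrained on simulation data, plus a
    correction network that learns the sim-to-real residual."""

    def __init__(self, n_in, n_out, hidden=128):
        super().__init__()
        self.lofi = nn.Sequential(nn.Linear(n_in, hidden), nn.Tanh(),
                                  nn.Linear(hidden, n_out))
        # The correction sees both the input and the low-fidelity prediction.
        self.corr = nn.Sequential(nn.Linear(n_in + n_out, hidden), nn.Tanh(),
                                  nn.Linear(hidden, n_out))

    def forward(self, x):
        y_lo = self.lofi(x)
        return y_lo + self.corr(torch.cat([x, y_lo], dim=-1))

    def freeze_simulation_branch(self):
        """Transfer-learning step: keep the simulation-trained branch fixed
        and fine-tune only the correction on real measurements."""
        for p in self.lofi.parameters():
            p.requires_grad = False
```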

Interpretable pap smear cell representation for cervical cancer screening

  • paper_url: http://arxiv.org/abs/2311.10269
  • repo_url: None
  • paper_authors: Yu Ando, Nora Jee-Young Park, Gun Oh Chong, Seokhwan Ko, Donghyeon Lee, Junghwan Cho, Hyungsoo Han
  • for: This study aims to develop a variational autoencoder-based, one-class-classification approach that learns interpretable deep cervical cell image representations for automating pap smear screening.
  • methods: A variational autoencoder learns representations of pap smear cell images, and one-class classification is used to compute a cell abnormality score without training on abnormal samples.
  • results: An abnormality score can be calculated without abnormal training samples. The best model discriminating squamous cell carcinoma (SCC) from normal cells achieves 0.908 +- 0.003 AUC, and the one discriminating high-grade epithelial lesions (HSIL) from normal cells achieves 0.920 +- 0.002 AUC. Compared with other clustering methods, the approach enhances the V-measure and yields higher homogeneity scores, more effectively isolating different abnormality regions and aiding interpretation of the results.
    Abstract Screening is critical for prevention and early detection of cervical cancer, but it is time-consuming and laborious. Supervised deep convolutional neural networks have been developed to automate pap smear screening, and the results are promising. However, interest in using only normal samples to train deep neural networks has increased owing to the class imbalance problems and high labeling costs that are both prevalent in healthcare. In this study, we introduce a method to learn explainable deep cervical cell representations for pap smear cytology images based on one-class classification using variational autoencoders. Findings demonstrate that a score for cell abnormality can be calculated without training models on abnormal samples, and that abnormality can be localized, using a novel metric based on the absolute difference in cross entropy under agglomerative clustering, to interpret our results. The best model that discriminates squamous cell carcinoma (SCC) from normals gives 0.908 +- 0.003 area under the operating characteristic curve (AUC), and the one that discriminates high-grade epithelial lesions (HSIL) gives 0.920 +- 0.002 AUC. Compared to other clustering methods, our method enhances the V-measure and yields higher homogeneity scores, which more effectively isolate different abnormality regions, aiding in the interpretation of our results. Evaluation using an in-house dataset and an additional open dataset shows that our model can discriminate abnormality without the need for additional training of deep models.
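The following sketch shows how one-class abnormality scoring with a VAE trained only on normal cells typically works; the `encode`/`decode` interface and the negative-ELBO score are standard choices we assume, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def vae_abnormality_score(vae, x, n_samples=8):
    """One-class anomaly scoring with a VAE trained on normal cells only:
    the (negative) evidence lower bound serves as the abnormality score,
    so poorly reconstructed / low-likelihood cells score high."""
    mu, logvar = vae.encode(x)  # assumed interface: encode -> (mu, logvar)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1)

    recon = 0.0
    for _ in range(n_samples):  # Monte Carlo estimate of the reconstruction term
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        x_hat = vae.decode(z)
        recon = recon + F.mse_loss(x_hat, x, reduction="none").flatten(1).sum(dim=1)
    recon = recon / n_samples

    return recon + kl  # higher = more abnormal
```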

FedTruth: Byzantine-Robust and Backdoor-Resilient Federated Learning Framework

  • paper_url: http://arxiv.org/abs/2311.10248
  • repo_url: None
  • paper_authors: Sheldon C. Ebron Jr., Kan Yang
  • for: This work proposes a robust defense mechanism that protects machine learning models in Federated Learning (FL) from attacks by malicious clients.
  • methods: The proposed defense, FedTruth, assumes no specific data distribution and requires no benign root dataset. It estimates the global model update with dynamic aggregation weights, considering the contributions from all benign clients.
  • results: Empirical studies show that FedTruth effectively mitigates the impact of poisoned updates from both Byzantine and backdoor attacks.
    Abstract Federated Learning (FL) enables collaborative machine learning model training across multiple parties without sharing raw data. However, FL's distributed nature allows malicious clients to impact model training through Byzantine or backdoor attacks, using erroneous model updates. Existing defenses measure the deviation of each update from a 'ground-truth model update.' They often rely on a benign root dataset on the server or use trimmed mean or median for clipping, both methods having limitations. We introduce FedTruth, a robust defense against model poisoning in FL. FedTruth doesn't assume specific data distributions nor requires a benign root dataset. It estimates a global model update with dynamic aggregation weights, considering contributions from all benign clients. Empirical studies demonstrate FedTruth's efficacy in mitigating the impacts of poisoned updates from both Byzantine and backdoor attacks.
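The sketch below conveys the flavor of dynamic-weight aggregation: alternately estimate a "ground-truth" update and down-weight clients far from it. The inverse-distance weighting rule is our stand-in; FedTruth derives its own weight-estimation objective.

```python
import numpy as np

def fedtruth_style_aggregate(updates, n_iter=10, eps=1e-8):
    """Dynamic-weight aggregation sketch: iterate between estimating a
    'truth' update and reweighting clients by closeness to it, so
    poisoned outliers contribute little to the global update."""
    updates = np.stack(updates)            # (n_clients, dim) flattened updates
    truth = updates.mean(axis=0)           # start from the plain average
    for _ in range(n_iter):
        dists = np.linalg.norm(updates - truth, axis=1)
        w = 1.0 / (dists + eps)            # closer clients get larger weights
        w = w / w.sum()
        truth = (w[:, None] * updates).sum(axis=0)
    return truth, w
```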

Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning

  • paper_url: http://arxiv.org/abs/2311.10246
  • repo_url: None
  • paper_authors: Amartya Banerjee, Christopher J. Hazard, Jacob Beel, Cade Mack, Jack Xia, Michael Resnick, Will Goddin
  • for: This work proposes a robust and interpretable, information-theoretic framework around the k-nearest neighbors (k-NN) algorithm for tasks such as classification, regression, and anomaly detection using a single model.
  • methods: Instead of a traditional distance measure, the algorithm uses a novel formulation of surprisal (the amount of information required to explain the difference between observed and expected results), improving interpretability and robustness.
  • results: The framework performs at par with or above the state of the art on classification, regression, and anomaly detection tasks using a single model, while providing novel concepts for characterizing data and predictions that enhance interpretability.
    Abstract Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to simplicity and familiarity, one of the most well-known algorithms under this paradigm is the $k$-nearest neighbors ($k$-NN) algorithm. Driven by the usage of machine learning in safety-critical applications, in this work, we shed new light on the traditional nearest neighbors algorithm from the perspective of information theory and propose a robust and interpretable framework for tasks such as classification, regression, and anomaly detection using a single model. Instead of using a traditional distance measure which needs to be scaled and contextualized, we use a novel formulation of \textit{surprisal} (amount of information required to explain the difference between the observed and expected result). Finally, we demonstrate this architecture's capability to perform at-par or above the state-of-the-art on classification, regression, and anomaly detection tasks using a single model with enhanced interpretability by providing novel concepts for characterizing data and predictions.
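To make the surprisal idea concrete, here is a toy k-NN regressor that replaces raw distance with the negative log-likelihood (surprisal, in nats) of a query's deviation from each training point; the Laplace noise model is our assumption, and the paper develops its own surprisal formulation.

```python
import numpy as np

def surprisal_knn_predict(X_train, y_train, x, k=5):
    """Surprisal-driven k-NN regression sketch: neighbors are ranked by
    how much information is needed to explain the query's deviation from
    them, rather than by a raw, unscaled distance."""
    # Per-feature scale of typical deviations, estimated from training data.
    scale = np.mean(np.abs(X_train - X_train.mean(axis=0)), axis=0) + 1e-8

    # Surprisal of observing query x given each training point as the
    # "expected" value: -log p(x | x_i) under Laplace(x_i, scale), summed
    # over features.
    dev = np.abs(X_train - x) / scale
    surprisal = np.sum(dev + np.log(2.0 * scale), axis=1)

    nearest = np.argsort(surprisal)[:k]          # least-surprising neighbors
    weights = np.exp(-surprisal[nearest])        # convert back to likelihoods
    return np.average(y_train[nearest], weights=weights + 1e-12)
```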

Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors

  • paper_url: http://arxiv.org/abs/2311.10788
  • repo_url: None
  • paper_authors: Peter Grönquist, Yufan Ren, Qingyi He, Alessio Verardo, Sabine Süsstrunk
  • for: This paper presents a new DeepFake detection approach that detects DeepFakes in video quickly and compute-efficiently.
  • methods: The method uses the motion vectors (MVs) and information masks (IMs) from the H.264 video codec to detect temporal inconsistencies in DeepFakes.
  • results: Experiments show that the approach is effective for DeepFake detection and has minimal computational cost compared with traditional per-frame RGB-only methods.
    Abstract Video DeepFakes are fake media created with Deep Learning (DL) that manipulate a person's expression or identity. Most current DeepFake detection methods analyze each frame independently, ignoring inconsistencies and unnatural movements between frames. Some newer methods employ optical flow models to capture this temporal aspect, but they are computationally expensive. In contrast, we propose using the related but often ignored Motion Vectors (MVs) and Information Masks (IMs) from the H.264 video codec, to detect temporal inconsistencies in DeepFakes. Our experiments show that this approach is effective and has minimal computational costs, compared with per-frame RGB-only methods. This could lead to new, real-time temporally-aware DeepFake detection methods for video calls and streaming.
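Assuming the per-block motion vectors have already been exported from the decoder (H.264 decoders such as ffmpeg can expose them as frame side data), a detector could derive simple temporal-consistency features like the toy ones below; the feature set is our illustration, not the paper's classifier input.

```python
import numpy as np

def mv_inconsistency_features(mv_frames):
    """Toy temporal-consistency features from codec motion vectors.
    `mv_frames` is assumed to be a (T, H, W, 2) array of per-block H.264
    motion vectors already extracted from the bitstream."""
    mv = np.asarray(mv_frames, dtype=np.float64)
    # Frame-to-frame change of the motion field: natural motion tends to be
    # smooth, while per-frame face swaps leave jittery, inconsistent vectors.
    dmv = np.diff(mv, axis=0)
    jitter = np.linalg.norm(dmv, axis=-1)          # (T-1, H, W)
    return np.array([
        jitter.mean(),                             # average temporal jitter
        jitter.std(),                              # burstiness of the jitter
        np.abs(mv).mean(),                         # overall motion magnitude
    ])
```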

Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers

  • paper_url: http://arxiv.org/abs/2311.10242
  • repo_url: None
  • paper_authors: Staphord Bengesi, Hoda El-Sayed, Md Kamruzzaman Sarker, Yao Houkpati, John Irungu, Timothy Oladunni
  • for: This paper explores the recent advancements in Generative Artificial Intelligence, particularly the development and applications of cutting-edge tools like Bard, Stable Diffusion, DALL-E, Make-A-Video, Runway ML, and Jukebox.
  • methods: The paper discusses various state-of-the-art models used in these tools, including Stable Diffusion, transformer models like GPT-3 (recent GPT-4), variational autoencoders, and generative adversarial networks.
  • results: The paper highlights the remarkable capabilities of these tools in accomplishing tasks such as text generation, music composition, image creation, video production, code generation, and scientific work, and also discusses the challenges posed by these advancements.
    Abstract The launch of ChatGPT has garnered global attention, marking a significant milestone in the field of Generative Artificial Intelligence. While Generative AI has been in effect for the past decade, the introduction of ChatGPT has ignited a new wave of research and innovation in the AI domain. This surge in interest has led to the development and release of numerous cutting-edge tools, such as Bard, Stable Diffusion, DALL-E, Make-A-Video, Runway ML, and Jukebox, among others. These tools exhibit remarkable capabilities, encompassing tasks ranging from text generation and music composition, image creation, video production, code generation, and even scientific work. They are built upon various state-of-the-art models, including Stable Diffusion, transformer models like GPT-3 (recent GPT-4), variational autoencoders, and generative adversarial networks. This advancement in Generative AI presents a wealth of exciting opportunities and, simultaneously, unprecedented challenges. Throughout this paper, we have explored these state-of-the-art models, the diverse array of tasks they can accomplish, the challenges they pose, and the promising future of Generative Artificial Intelligence.