cs.AI - 2023-08-30

A General-Purpose Self-Supervised Model for Computational Pathology

  • paper_url: http://arxiv.org/abs/2308.15474
  • repo_url: None
  • paper_authors: Richard J. Chen, Tong Ding, Ming Y. Lu, Drew F. K. Williamson, Guillaume Jaume, Bowen Chen, Andrew Zhang, Daniel Shao, Andrew H. Song, Muhammad Shaban, Mane Williams, Anurag Vaidya, Sharifa Sahai, Lukas Oldenburg, Luca L. Weishaupt, Judy J. Wang, Walt Williams, Long Phi Le, Georg Gerber, Faisal Mahmood
  • for: This work introduces a general-purpose self-supervised model for computational pathology, targeting image classification and diagnostic tasks across diverse tissue types.
  • methods: The model is pretrained with self-supervised learning on over 100 million tissue patches drawn from more than 100,000 diagnostic H&E-stained whole-slide images spanning 20 major tissue types.
  • results: The model performs strongly on 33 clinical tasks, including classification, diagnosis, and disease subtyping, and generalizes and transfers across tissue types and levels of diagnostic difficulty (a few-shot class-prototype sketch follows the abstract below).
    Abstract Tissue phenotyping is a fundamental computational pathology (CPath) task in learning objective characterizations of histopathologic biomarkers in anatomic pathology. However, whole-slide imaging (WSI) poses a complex computer vision problem in which the large-scale image resolutions of WSIs and the enormous diversity of morphological phenotypes preclude large-scale data annotation. Current efforts have proposed using pretrained image encoders with either transfer learning from natural image datasets or self-supervised pretraining on publicly-available histopathology datasets, but have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin-stained WSIs across 20 major tissue types, and evaluated on 33 representative clinical tasks in CPath of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree code classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient AI models that can generalize and transfer to a gamut of diagnostically-challenging tasks and clinical workflows in anatomic pathology.
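
The "slide classification using few-shot class prototypes" capability reported above boils down to a nearest-prototype classifier over embeddings. A minimal sketch follows, under stated assumptions: the random embeddings stand in for UNI features, and the 1024-dimensional size and class labels are hypothetical. This is not the paper's code.

```python
# Minimal sketch of few-shot class-prototype slide classification.
# Assumptions: random embeddings stand in for UNI patch/slide features;
# the 1024-d size and class names ("LUAD", "LUSC") are hypothetical.
import numpy as np

def build_prototypes(support):
    """Average each class's few labeled support embeddings into one prototype."""
    return {label: embs.mean(axis=0) for label, embs in support.items()}

def classify(query, prototypes):
    """Assign the query embedding to the nearest prototype by cosine similarity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return max(prototypes, key=lambda label: cos(query, prototypes[label]))

rng = np.random.default_rng(0)
support = {"LUAD": rng.normal(size=(4, 1024)),   # 4 labeled slides per class
           "LUSC": rng.normal(size=(4, 1024))}
prototypes = build_prototypes(support)
print(classify(rng.normal(size=1024), prototypes))
```

The appeal of this scheme is data efficiency: once the encoder is strong, only a handful of labeled slides per class are needed.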

Multimodal Contrastive Learning and Tabular Attention for Automated Alzheimer’s Disease Prediction

  • paper_url: http://arxiv.org/abs/2308.15469
  • repo_url: None
  • paper_authors: Weichen Huang
  • for: This study develops a multimodal contrastive learning framework that leverages neuroimaging data such as MRI scans and PET alongside the valuable tabular data (AD biomarkers and clinical assessments) in Alzheimer's disease datasets.
  • methods: The framework introduces a novel tabular attention module that amplifies and ranks salient features in tables, combined with multimodal contrastive learning to fuse image and tabular data (a sketch of such a module follows the abstract below).
  • results: Experiments detect Alzheimer's disease from over 882 MR image slices from the ADNI database, reaching over 83.8% accuracy, an improvement of almost 10% over the previous state of the art.
    Abstract Alongside neuroimaging such as MRI scans and PET, Alzheimer's disease (AD) datasets contain valuable tabular data including AD biomarkers and clinical assessments. Existing computer vision approaches struggle to utilize this additional information. To address these needs, we propose a generalizable framework for multimodal contrastive learning of image data and tabular data, a novel tabular attention module for amplifying and ranking salient features in tables, and the application of these techniques onto Alzheimer's disease prediction. Experimental evaluations demonstrate the strength of our framework by detecting Alzheimer's disease from over 882 MR image slices from the ADNI database. We take advantage of the high interpretability of tabular data and our novel tabular attention approach, and through attribution of the attention scores for each row of the table, we note and rank the most predominant features. Results show that the model is capable of an accuracy of over 83.8%, almost a 10% increase from the previous state of the art.
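
The tabular attention idea lends itself to a compact module: score each tabular feature, reweight the input by the scores, and expose the scores for ranking. The sketch below is a hypothetical stand-in rather than the paper's architecture; the layer sizes and the softmax-over-features design are assumptions.

```python
# Hypothetical sketch of a tabular attention module: per-feature attention
# scores amplify salient features and can be ranked for interpretability.
import torch
import torch.nn as nn

class TabularAttention(nn.Module):
    def __init__(self, num_features, hidden=64):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_features),  # one logit per feature
        )

    def forward(self, x):
        # x: (batch, num_features) of clinical/biomarker values
        scores = torch.softmax(self.scorer(x), dim=-1)  # attention over features
        attended = x * scores                           # amplify salient features
        return attended, scores                         # scores can be ranked

tab = torch.randn(8, 20)               # toy data: 8 patients, 20 tabular features
attended, scores = TabularAttention(num_features=20)(tab)
top = scores.mean(dim=0).topk(5).indices   # rank features by mean attention
print(top.tolist())
```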

A Comparative Study of Loss Functions: Traffic Predictions in Regular and Congestion Scenarios

  • paper_url: http://arxiv.org/abs/2308.15464
  • repo_url: https://github.com/xieyangxinyu/a-comparative-study-of-loss-functions-traffic-predictions-in-regular-and-congestion-scenarios
  • paper_authors: Yangxinyu Xie, Tanwi Mallick
  • for: This paper aims to improve the accuracy of deep learning models for traffic forecasting, particularly in congestion scenarios.
  • methods: The paper evaluates several loss functions, including MAE-Focal Loss and Gumbel Loss, to address the limitations of traditional loss functions (hedged sketches of both losses follow the abstract below).
  • results: Extensive experiments show that MAE-Focal Loss and Gumbel Loss are the most effective for forecasting traffic speed, capturing congestion events without compromising forecasts of regular traffic.
    Abstract Spatiotemporal graph neural networks have achieved state-of-the-art performance in traffic forecasting. However, they often struggle to forecast congestion accurately due to the limitations of traditional loss functions. While accurate forecasting of regular traffic conditions is crucial, a reliable AI system must also accurately forecast congestion scenarios to maintain safe and efficient transportation. In this paper, we explore various loss functions inspired by heavy tail analysis and imbalanced classification problems to address this issue. We evaluate the efficacy of these loss functions in forecasting traffic speed, with an emphasis on congestion scenarios. Through extensive experiments on real-world traffic datasets, we discovered that when optimizing for Mean Absolute Error (MAE), the MAE-Focal Loss function stands out as the most effective. When optimizing Mean Squared Error (MSE), Gumbel Loss proves to be the superior choice. These choices effectively forecast traffic congestion events without compromising the accuracy of regular traffic speed forecasts. This research enhances deep learning models' capabilities in forecasting sudden speed changes due to congestion and underscores the need for more research in this direction. By elevating the accuracy of congestion forecasting, we advocate for AI systems that are reliable, secure, and resilient in practical traffic management scenarios.
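
Neither loss is spelled out in this digest, so the sketches below are plausible stand-ins rather than the paper's definitions: a focal-style reweighting of absolute error that emphasizes large, congestion-like residuals, and a Gumbel negative log-likelihood whose heavy tail penalizes rare extreme errors more than MSE does. `gamma` and `beta` are assumed hyperparameters.

```python
# Hedged sketches of the two loss families named above (not the paper's exact forms).
import torch

def mae_focal_loss(pred, target, gamma=1.0):
    err = (pred - target).abs()
    # Focal-style weight: large errors (e.g., a missed congestion event)
    # contribute more to the loss than small everyday errors.
    weight = (err / (err.max().detach() + 1e-8)) ** gamma
    return (weight * err).mean()

def gumbel_loss(pred, target, beta=1.0):
    # Negative log-likelihood of residuals under a Gumbel(0, beta) distribution,
    # a heavy-tailed choice motivated by extreme value theory.
    z = (pred - target) / beta
    return (z + torch.exp(-z)).mean()

pred, target = torch.randn(32), torch.randn(32)   # toy speed predictions/targets
print(mae_focal_loss(pred, target).item(), gumbel_loss(pred, target).item())
```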

ParaGuide: Guided Diffusion Paraphrasers for Plug-and-Play Textual Style Transfer

  • paper_url: http://arxiv.org/abs/2308.15459
  • repo_url: https://github.com/zacharyhorvitz/ParaGuide
  • paper_authors: Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, Kathleen McKeown
  • for: The goal is to transform the stylistic properties of text into a new target style while preserving semantic information.
  • methods: The work introduces a novel diffusion-based framework, ParaGuide, that flexibly adapts to arbitrary target styles at inference time. It combines paraphrase-conditioned diffusion models with gradient-based guidance from both off-the-shelf classifiers and strong existing style embedders (a sketch of the guidance step follows the abstract below).
  • results: In human and automatic evaluations on the Enron Email Corpus, the method outperforms strong baselines on formality, sentiment, and even authorship style transfer while preserving semantic information.
    Abstract Textual style transfer is the task of transforming stylistic properties of text while preserving meaning. Target "styles" can be defined in numerous ways, ranging from single attributes (e.g., formality) to authorship (e.g., Shakespeare). Previous unsupervised style-transfer approaches generally rely on significant amounts of labeled data for only a fixed set of styles or require large language models. In contrast, we introduce a novel diffusion-based framework for general-purpose style transfer that can be flexibly adapted to arbitrary target styles at inference time. Our parameter-efficient approach, ParaGuide, leverages paraphrase-conditioned diffusion models alongside gradient-based guidance from both off-the-shelf classifiers and strong existing style embedders to transform the style of text while preserving semantic information. We validate the method on the Enron Email Corpus, with both human and automatic evaluations, and find that it outperforms strong baselines on formality, sentiment, and even authorship style transfer.
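
The gradient-based guidance idea can be sketched abstractly: at each denoising step, nudge the intermediate representation along the gradient of an external style score. Everything below is a hypothetical stand-in over continuous embeddings; `denoise_step` and `style_scorer` are placeholders for the paraphrase-conditioned diffusion model and the off-the-shelf classifiers/style embedders.

```python
# Hedged sketch of classifier/embedder guidance in a diffusion loop.
import torch

def guided_step(x_t, t, denoise_step, style_scorer, guidance_scale=1.0):
    x_t = x_t.detach().requires_grad_(True)
    score = style_scorer(x_t)                 # higher = closer to target style
    grad = torch.autograd.grad(score.sum(), x_t)[0]
    x_prev = denoise_step(x_t, t)             # ordinary diffusion update
    return x_prev + guidance_scale * grad     # nudge toward the target style

# Toy usage with stand-in modules over continuous embeddings.
denoise = lambda x, t: 0.9 * x                      # placeholder denoiser
scorer = lambda x: -(x - 1.0).pow(2).mean(dim=-1)   # "style" score peaks at 1.0
x = torch.randn(2, 16)
for t in reversed(range(10)):
    x = guided_step(x, t, denoise, scorer, guidance_scale=0.1)
print(x.mean().item())  # drifts toward the scorer's preferred region
```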

From SMOTE to Mixup for Deep Imbalanced Classification

  • paper_url: http://arxiv.org/abs/2308.15457
  • repo_url: https://github.com/ntucllab/imbalanced-dl
  • paper_authors: Wei-Chao Cheng, Tan-Ha Mai, Hsuan-Tien Lin
  • for: This work examines how deep learning handles imbalanced data, in particular whether the SMOTE data augmentation technique benefits deep learning.
  • methods: The study analyzes SMOTE and extends it with soft labels, connecting the resulting soft SMOTE with the Mixup augmentation technique (a soft-label Mixup sketch follows the abstract below).
  • results: Combining SMOTE-style interpolation with Mixup improves the generalization of deep models, and the proposed margin-aware Mixup achieves state-of-the-art performance on deep imbalanced classification, including extremely imbalanced data.
    Abstract Given imbalanced data, it is hard to train a good classifier using deep learning because of the poor generalization of minority classes. Traditionally, the well-known synthetic minority oversampling technique (SMOTE) for data augmentation, a data mining approach for imbalanced learning, has been used to improve this generalization. However, it is unclear whether SMOTE also benefits deep learning. In this work, we study why the original SMOTE is insufficient for deep learning, and enhance SMOTE using soft labels. Connecting the resulting soft SMOTE with Mixup, a modern data augmentation technique, leads to a unified framework that puts traditional and modern data augmentation techniques under the same umbrella. A careful study within this framework shows that Mixup improves generalization by implicitly achieving uneven margins between majority and minority classes. We then propose a novel margin-aware Mixup technique that more explicitly achieves uneven margins. Extensive experimental results demonstrate that our proposed technique yields state-of-the-art performance on deep imbalanced classification while achieving superior performance on extremely imbalanced data. The code is open-sourced in our developed package https://github.com/ntucllab/imbalanced-DL to foster future research in this direction.
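
The bridge from SMOTE to Mixup is interpolation with soft labels: mixing both inputs and one-hot targets yields fractional labels. A minimal sketch, assuming standard Mixup (the paper's margin-aware variant is not reproduced here):

```python
# Minimal Mixup-with-soft-labels sketch; the margin-aware variant is omitted.
import torch
import torch.nn.functional as F

def mixup_batch(x, y, num_classes, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_onehot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]  # soft labels
    return x_mix, y_mix

x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
x_mix, y_mix = mixup_batch(x, y, num_classes=10)
logits = torch.randn(16, 10, requires_grad=True)   # stand-in model output
# Cross-entropy against soft targets:
loss = -(y_mix * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
print(loss.item())
```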

When Do Program-of-Thoughts Work for Reasoning?

  • paper_url: http://arxiv.org/abs/2308.15452
  • repo_url: https://github.com/zjunlp/easyinstruct
  • paper_authors: Zhen Bi, Ningyu Zhang, Yinuo Jiang, Shumin Deng, Guozhou Zheng, Huajun Chen
  • for: This work investigates how to improve the reasoning capabilities of Large Language Models (LLMs) in embodied artificial intelligence, and the specific influence of code data written in programming languages.
  • methods: The authors propose a complexity-impacted reasoning score (CIRS), combining structural and logical attributes, to measure how code data affects reasoning ability. CIRS encodes structural information with the abstract syntax tree and computes logical complexity by considering difficulty and cyclomatic complexity (a toy AST-based score follows the abstract below).
  • results: Code data of different complexities yields different gains in LLM reasoning; an optimal level of complexity is critical for program-aided prompting to improve reasoning. The authors also design an auto-synthesizing and stratifying algorithm, applied to instruction generation for mathematical reasoning and to code-data filtering for code generation; extensive results demonstrate the effectiveness of the approach.
    Abstract The reasoning capabilities of Large Language Models (LLMs) play a pivotal role in the realm of embodied artificial intelligence. Although there are effective methods like program-of-thought prompting for LLMs which uses programming language to tackle complex reasoning tasks, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose complexity-impacted reasoning score (CIRS), which combines structural and logical attributes, to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity by considering the difficulty and the cyclomatic complexity. Through an empirical analysis, we find not all code data of complexity can be learned or understood by LLMs. Optimal level of complexity is critical to the improvement of reasoning abilities by program-aided prompting. Then we design an auto-synthesizing and stratifying algorithm, and apply it to instruction generation for mathematical reasoning and code data filtering for code generation tasks. Extensive results demonstrates the effectiveness of our proposed approach. Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
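
A toy version of the AST-based ingredients CIRS combines can be written with Python's `ast` module: count branching nodes as a crude cyclomatic proxy and use tree depth as a structural proxy. The combination rule below is an assumption; the paper's actual difficulty weighting is not given in this digest.

```python
# Toy AST-based complexity score in the spirit of CIRS (not the paper's formula).
import ast

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp, ast.IfExp)

def complexity_score(code):
    tree = ast.parse(code)
    # Cyclomatic-style count: one decision point per branching node.
    branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

    def depth(node, d=1):
        children = list(ast.iter_child_nodes(node))
        return d if not children else max(depth(c, d + 1) for c in children)

    # Assumed combination: branch count (plus 1) scaled by structural depth.
    return (branches + 1) * depth(tree)

snippet = """
def fizzbuzz(n):
    for i in range(1, n + 1):
        if i % 15 == 0:
            print("fizzbuzz")
        elif i % 3 == 0:
            print("fizz")
"""
print(complexity_score(snippet))
```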

Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction

  • paper_url: http://arxiv.org/abs/2308.15427
  • repo_url: None
  • paper_authors: Wenjie Gao, Jiawei Fu, Haodong Jing, Nanning Zheng
  • for: To improve the performance of HD map construction methods in autonomous driving systems, making them more robust to the environment surrounding the vehicle.
  • methods: Supplement onboard sensor information with satellite maps, leveraging their broad coverage. A hierarchical fusion module integrates the satellite-map information through feature-level fusion (a cross-attention sketch follows the abstract below) and BEV-level fusion with an alignment module.
  • results: On the augmented nuScenes dataset, the module integrates seamlessly into three existing HD map construction methods and improves their performance on both HD map semantic segmentation and instance detection tasks.
    Abstract High-Definition (HD) maps play a crucial role in autonomous driving systems. Recent methods have attempted to construct HD maps in real-time based on information obtained from vehicle onboard sensors. However, the performance of these methods is significantly susceptible to the environment surrounding the vehicle due to the inherent limitation of onboard sensors, such as weak capacity for long-range detection. In this study, we demonstrate that supplementing onboard sensors with satellite maps can enhance the performance of HD map construction methods, leveraging the broad coverage capability of satellite maps. For the purpose of further research, we release the satellite map tiles as a complementary dataset to the nuScenes dataset. Meanwhile, we propose a hierarchical fusion module that enables better fusion of satellite map information with existing methods. Specifically, we design an attention mask based on segmentation and distance, applying the cross-attention mechanism to fuse onboard Bird's Eye View (BEV) features and satellite features in feature-level fusion. An alignment module is introduced before concatenation in BEV-level fusion to mitigate the impact of misalignment between the two features. The experimental results on the augmented nuScenes dataset showcase the seamless integration of our module into three existing HD map construction methods. It notably enhances their performance in both HD map semantic segmentation and instance detection tasks.
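
The feature-level fusion step can be sketched as cross-attention with BEV tokens as queries and satellite tokens as keys/values. The sketch below omits the paper's segmentation/distance attention mask and the pre-concatenation alignment module; dimensions and token counts are toy assumptions.

```python
# Hypothetical sketch of feature-level BEV/satellite fusion via cross-attention.
import torch
import torch.nn as nn

class BEVSatelliteCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, bev, sat):
        # bev: (B, N_bev, dim) onboard BEV tokens (queries)
        # sat: (B, N_sat, dim) satellite-map tokens (keys/values)
        fused, _ = self.attn(query=bev, key=sat, value=sat)
        return self.norm(bev + fused)   # residual fusion of satellite context

bev = torch.randn(2, 1024, 256)   # toy flattened BEV grid
sat = torch.randn(2, 256, 256)    # toy satellite feature tokens
print(BEVSatelliteCrossAttention()(bev, sat).shape)  # torch.Size([2, 1024, 256])
```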