eess.IV - 2023-08-14

Automated Ensemble-Based Segmentation of Adult Brain Tumors: A Novel Approach Using the BraTS AFRICA Challenge Data

  • paper_url: http://arxiv.org/abs/2308.07214
  • repo_url: None
  • paper_authors: Chiranjeewee Prasad Koirala, Sovesh Mohapatra, Advait Gosai, Gottfried Schlaug
  • for: This paper explores the use of deep learning on multi-modality MRI data to improve brain tumor segmentation precision, particularly for the Sub-Saharan Africa patient population.
  • methods: The paper proposes an ensemble method comprising eleven unique variants based on three core architectures, UNet3D, ONet3D, and SphereNet3D, together with modified loss functions (see the ensembling sketch below).
  • results: The study finds that the ensemble approach improves evaluation metrics over single models, achieving Dice scores of 0.82, 0.82, and 0.87 for the enhancing tumor, tumor core, and whole tumor labels, respectively.
    Abstract Brain tumors, particularly glioblastoma, continue to challenge medical diagnostics and treatments globally. This paper explores the application of deep learning to multi-modality magnetic resonance imaging (MRI) data for enhanced brain tumor segmentation precision in the Sub-Saharan Africa patient population. We introduce an ensemble method that comprises eleven unique variations based on three core architectures: UNet3D, ONet3D, SphereNet3D and modified loss functions. The study emphasizes the need for both age- and population-based segmentation models, to fully account for the complexities in the brain. Our findings reveal that the ensemble approach, combining different architectures, outperforms single models, leading to improved evaluation metrics. Specifically, the results exhibit Dice scores of 0.82, 0.82, and 0.87 for enhancing tumor, tumor core, and whole tumor labels respectively. These results underline the potential of tailored deep learning techniques in precisely segmenting brain tumors and lay groundwork for future work to fine-tune models and assess performance across different brain regions.
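No code is released for this paper, but the core ensembling step can be illustrated with a short sketch. Everything here is an assumption for illustration: the eleven trained variants are stood in for by a generic `models` list, and averaging softmax probabilities is one plausible fusion rule; the authors' actual strategy may differ.

```python
import torch

def ensemble_segment(models, volume):
    """Average softmax probabilities from several 3D segmentation models
    (e.g., UNet3D/ONet3D/SphereNet3D variants) and take the argmax as
    the final label map."""
    probs = None
    with torch.no_grad():
        for model in models:
            model.eval()
            logits = model(volume)              # (1, C, D, H, W)
            p = torch.softmax(logits, dim=1)
            probs = p if probs is None else probs + p
    return (probs / len(models)).argmax(dim=1)  # (1, D, H, W) label map
```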

SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation

  • paper_url: http://arxiv.org/abs/2308.07156
  • repo_url: None
  • paper_authors: An Wang, Mobarakol Islam, Mengya Xu, Yang Zhang, Hongliang Ren
  • for: This paper studies the robustness and zero-shot generalizability of the Segment Anything Model (SAM), a foundation model for semantic segmentation, in the field of robotic surgery.
  • methods: The paper evaluates SAM under different prompting schemes, including bounding-box and point-based prompts (see the prompting sketch below), as well as unprompted settings and corrupted inputs.
  • results: The study finds that SAM exhibits remarkable zero-shot generalization with bounding-box prompts but performs poorly with point-based prompts and in unprompted settings, particularly in complex surgical scenes. SAM is also sensitive to data corruption and struggles to maintain high performance across varying conditions.
    Abstract The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation and demonstrates remarkable generalization capabilities across a wide range of downstream scenarios. In this empirical study, we examine SAM's robustness and zero-shot generalizability in the field of robotic surgery. We comprehensively explore different scenarios, including prompted and unprompted situations, bounding box and points-based prompt approaches, as well as the ability to generalize under corruptions and perturbations at five severity levels. Additionally, we compare the performance of SAM with state-of-the-art supervised models. We conduct all the experiments with two well-known robotic instrument segmentation datasets from MICCAI EndoVis 2017 and 2018 challenges. Our extensive evaluation results reveal that although SAM shows remarkable zero-shot generalization ability with bounding box prompts, it struggles to segment the whole instrument with point-based prompts and unprompted settings. Furthermore, our qualitative figures demonstrate that the model either failed to predict certain parts of the instrument mask (e.g., jaws, wrist) or predicted parts of the instrument as wrong classes in the scenario of overlapping instruments within the same bounding box or with the point-based prompt. In fact, SAM struggles to identify instruments in complex surgical scenarios characterized by the presence of blood, reflection, blur, and shade. Additionally, SAM is insufficiently robust to maintain high performance when subjected to various forms of data corruption. We also attempt to fine-tune SAM using Low-rank Adaptation (LoRA) and propose SurgicalSAM, which shows the capability in class-wise mask prediction without prompt. Therefore, we can argue that, without further domain-specific fine-tuning, SAM is not ready for downstream surgical tasks.
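For readers who want to reproduce the zero-shot, bounding-box-prompted setting, the query looks roughly like this with the public `segment-anything` package; the checkpoint filename, frame, and box coordinates below are placeholders:

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Placeholder inputs: a surgical video frame and an instrument bounding box.
frame_rgb = np.zeros((480, 640, 3), dtype=np.uint8)   # replace with a real frame
box = np.array([100, 120, 400, 360])                  # (x0, y0, x1, y1), assumed

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # local path
predictor = SamPredictor(sam)
predictor.set_image(frame_rgb)                        # embed the frame once
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
mask = masks[0]                                       # (H, W) boolean instrument mask
```

With `multimask_output=False` the predictor returns a single mask per prompt, matching the one-instrument-per-box evaluation described in the abstract.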

FocusFlow: Boosting Key-Points Optical Flow Estimation for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2308.07104
  • repo_url: https://github.com/zhonghuayi/focusflow_official
  • paper_authors: Zhonghua Yi, Hao Shi, Kailun Yang, Qi Jiang, Yaozu Ye, Ze Wang, Kaiwei Wang
  • for: The paper focuses on improving the performance of optical flow estimation in key-point-critical scenarios for autonomous driving applications.
  • methods: The proposed method, called FocusFlow, uses a points-based modeling approach that explicitly learns key-point-related priors. It also introduces a new loss function called Conditional Point Control Loss (CPCL) and a Condition Control Encoder (CCE) to improve the performance of the model (see the loss sketch below).
  • results: The proposed FocusFlow framework shows outstanding performance with up to +44.5% precision improvement on various key points such as ORB, SIFT, and even learning-based SiLK, and exceptional scalability for most existing data-driven optical flow methods. It also yields competitive or superior performances rivaling the original models on the whole frame.
    Abstract Key-point-based scene understanding is fundamental for autonomous driving applications. At the same time, optical flow plays an important role in many vision tasks. However, due to the implicit bias of equal attention on all points, classic data-driven optical flow estimation methods yield less satisfactory performance on key points, limiting their implementations in key-point-critical safety-relevant scenarios. To address these issues, we introduce a points-based modeling method that requires the model to learn key-point-related priors explicitly. Based on the modeling method, we present FocusFlow, a framework consisting of 1) a mix loss function combined with a classic photometric loss function and our proposed Conditional Point Control Loss (CPCL) function for diverse point-wise supervision; 2) a conditioned controlling model which substitutes the conventional feature encoder by our proposed Condition Control Encoder (CCE). CCE incorporates a Frame Feature Encoder (FFE) that extracts features from frames, a Condition Feature Encoder (CFE) that learns to control the feature extraction behavior of FFE from input masks containing information of key points, and fusion modules that transfer the controlling information between FFE and CFE. Our FocusFlow framework shows outstanding performance with up to +44.5% precision improvement on various key points such as ORB, SIFT, and even learning-based SiLK, along with exceptional scalability for most existing data-driven optical flow methods like PWC-Net, RAFT, and FlowFormer. Notably, FocusFlow yields competitive or superior performances rivaling the original models on the whole frame. The source code will be available at https://github.com/ZhonghuaYi/FocusFlow_official.
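The exact CPCL formulation is given in the paper; the sketch below only illustrates the general shape of mixing frame-wide supervision with a key-point-conditioned term. The `alpha` weight and the float {0,1} `kp_mask` convention are assumptions:

```python
import torch

def mix_loss(flow_pred, flow_gt, kp_mask, alpha=0.5):
    """Hypothetical mix of frame-wide and key-point-conditioned supervision.
    flow_pred, flow_gt: (B, 2, H, W); kp_mask: (B, H, W) float mask with 1s
    at key-point locations. Not the paper's exact CPCL formulation."""
    epe = torch.norm(flow_pred - flow_gt, dim=1)          # per-pixel end-point error
    frame_loss = epe.mean()                               # classic frame-wide term
    point_loss = (epe * kp_mask).sum() / kp_mask.sum().clamp(min=1)
    return alpha * frame_loss + (1 - alpha) * point_loss
```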

When Deep Learning Meets Multi-Task Learning in SAR ATR: Simultaneous Target Recognition and Segmentation

  • paper_url: http://arxiv.org/abs/2308.07093
  • repo_url: None
  • paper_authors: Chenwei Wang, Jifang Pei, Zhiyong Wang, Yulin Huang, Junjie Wu, Haiguang Yang, Jianyu Yang
  • for: This paper proposes a multi-task learning approach to synthetic aperture radar (SAR) automatic target recognition (ATR) that extracts multiple target attributes simultaneously.
  • methods: The paper presents a multi-task deep learning framework with two main structures: an encoder and a decoder (see the sketch below). The encoder extracts image features at different scales, while the decoder is a task-specific structure that uses the extracted features adaptively and optimally to meet the differing feature demands of recognition and segmentation.
  • results: Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that the proposed method achieves superior recognition and segmentation performance.
    Abstract With the recent advances of deep learning, automatic target recognition (ATR) of synthetic aperture radar (SAR) has achieved superior performance. By not being limited to the target category, the SAR ATR system could benefit from the simultaneous extraction of multifarious target attributes. In this paper, we propose a new multi-task learning approach for SAR ATR, which could obtain the accurate category and precise shape of the targets simultaneously. By introducing deep learning theory into multi-task learning, we first propose a novel multi-task deep learning framework with two main structures: encoder and decoder. The encoder is constructed to extract sufficient image features in different scales for the decoder, while the decoder is a tasks-specific structure which employs these extracted features adaptively and optimally to meet the different feature demands of the recognition and segmentation. Therefore, the proposed framework has the ability to achieve superior recognition and segmentation performance. Based on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset, experimental results show the superiority of the proposed framework in terms of recognition and segmentation.
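A minimal sketch of the shared-encoder, two-head design described above; the paper's encoder is multi-scale and more elaborate, and the layer sizes here are placeholders:

```python
import torch
import torch.nn as nn

class MultiTaskSARNet(nn.Module):
    """Toy shared-encoder network for simultaneous SAR target
    recognition (classification head) and segmentation (mask head)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(            # stands in for the multi-scale encoder
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.cls_head = nn.Sequential(           # recognition branch
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes),
        )
        self.seg_head = nn.Sequential(           # segmentation branch
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, x):
        feats = self.encoder(x)                  # shared features feed both heads
        return self.cls_head(feats), self.seg_head(feats)
```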

Deepbet: Fast brain extraction of T1-weighted MRI using Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2308.07003
  • repo_url: None
  • paper_authors: Lukas Fisch, Stefan Zumdick, Carlotta Barkhau, Daniel Emden, Jan Ernsting, Ramona Leenings, Kelvin Sarink, Nils R. Winter, Benjamin Risse, Udo Dannlowski, Tim Hahn
  • for: This paper presents a fast, high-precision brain extraction method for use in a wide range of neuroimaging preprocessing pipelines.
  • methods: The method, called deepbet, uses modern deep learning techniques, including the LinkNet architecture, in a two-stage prediction process that improves segmentation performance (see the coarse-to-fine sketch below).
  • results: The method sets a novel state of the art in cross-validation, with a median Dice score (DSC) of 99.0% on unseen datasets, outperforming current state-of-the-art models (DSC = 97.8% and DSC = 97.9%). It is also more robust to outliers, achieving a Dice score above 96.9% for all samples, where current methods drop as low as 76.5%. Finally, the model accelerates brain extraction by a factor of roughly 10 over current methods, processing one image in about 2 seconds on low-level hardware.
    Abstract Brain extraction in magnetic resonance imaging (MRI) data is an important segmentation step in many neuroimaging preprocessing pipelines. Image segmentation is one of the research fields in which deep learning had the biggest impact in recent years enabling high precision segmentation with minimal compute. Consequently, traditional brain extraction methods are now being replaced by deep learning-based methods. Here, we used a unique dataset comprising 568 T1-weighted (T1w) MR images from 191 different studies in combination with cutting edge deep learning methods to build a fast, high-precision brain extraction tool called deepbet. deepbet uses LinkNet, a modern UNet architecture, in a two stage prediction process. This increases its segmentation performance, setting a novel state-of-the-art performance during cross-validation with a median Dice score (DSC) of 99.0% on unseen datasets, outperforming current state of the art models (DSC = 97.8% and DSC = 97.9%). While current methods are more sensitive to outliers, resulting in Dice scores as low as 76.5%, deepbet manages to achieve a Dice score of > 96.9% for all samples. Finally, our model accelerates brain extraction by a factor of ~10 compared to current methods, enabling the processing of one image in ~2 seconds on low level hardware.
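deepbet's exact two-stage pipeline is not spelled out here, but coarse-to-fine prediction can be sketched as follows, assuming a fully convolutional `model` that returns one logit channel for volumes of varying size; the downsampling factor and 0.5 thresholds are assumptions:

```python
import torch
import torch.nn.functional as F

def two_stage_extract(model, volume):
    """Hypothetical coarse-to-fine brain extraction for a (1, 1, D, H, W)
    volume: locate the brain on a downsampled copy, then refine the mask
    on the full-resolution crop. deepbet's actual pipeline may differ."""
    small = F.interpolate(volume, scale_factor=0.5, mode="trilinear")
    with torch.no_grad():
        coarse = torch.sigmoid(model(small))[0, 0] > 0.5
        idx = coarse.nonzero()                        # voxel coords of coarse mask
        z0, y0, x0 = (idx.min(0).values * 2).tolist() # box mapped to full resolution
        z1, y1, x1 = ((idx.max(0).values + 1) * 2).tolist()
        crop = volume[:, :, z0:z1, y0:y1, x0:x1]
        fine = torch.sigmoid(model(crop)) > 0.5       # refined pass on the crop
    mask = torch.zeros_like(volume, dtype=torch.bool)
    mask[:, :, z0:z1, y0:y1, x0:x1] = fine
    return mask
```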

How inter-rater variability relates to aleatoric and epistemic uncertainty: a case study with deep learning-based paraspinal muscle segmentation

  • paper_url: http://arxiv.org/abs/2308.06964
  • repo_url: None
  • paper_authors: Parinaz Roshanzamir, Hassan Rivaz, Joshua Ahn, Hamza Mirza, Neda Naghdi, Meagan Anstruther, Michele C. Battié, Maryse Fortin, Yiming Xiao
  • for: This paper aims to explore the relationship between inter-rater variability and uncertainty in deep learning models for medical image segmentation, and to compare the performance of different DL models and label fusion strategies.
  • methods: The authors use test-time augmentation (TTA), test-time dropout (TTD), and deep ensemble to measure aleatoric and epistemic uncertainties (a TTD sketch follows below), and compare the performance of UNet and TransUNet with two label fusion strategies.
  • results: The study reveals the interplay between inter-rater variability and uncertainties, and shows that the choice of label fusion strategies and DL models can affect the performance and uncertainty of the resulting algorithms.
    Abstract Recent developments in deep learning (DL) techniques have led to great performance improvement in medical image segmentation tasks, especially with the latest Transformer model and its variants. While labels from fusing multi-rater manual segmentations are often employed as ideal ground truths in DL model training, inter-rater variability due to factors such as training bias, image noise, and extreme anatomical variability can still affect the performance and uncertainty of the resulting algorithms. Knowledge regarding how inter-rater variability affects the reliability of the resulting DL algorithms, a key element in clinical deployment, can help inform better training data construction and DL models, but has not been explored extensively. In this paper, we measure aleatoric and epistemic uncertainties using test-time augmentation (TTA), test-time dropout (TTD), and deep ensemble to explore their relationship with inter-rater variability. Furthermore, we compare UNet and TransUNet to study the impacts of Transformers on model uncertainty with two label fusion strategies. We conduct a case study using multi-class paraspinal muscle segmentation from T2w MRIs. Our study reveals the interplay between inter-rater variability and uncertainties, affected by choices of label fusion strategies and DL models.
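Of the three uncertainty estimators, test-time dropout (TTD) is the most self-contained to illustrate. Below is a standard Monte Carlo dropout sketch; the sample count and the variance-based uncertainty measure are common choices, not necessarily the paper's exact settings:

```python
import torch

def mc_dropout_uncertainty(model, image, n_samples=20):
    """Test-time dropout: keep dropout layers stochastic at inference,
    run several forward passes, and use the per-voxel variance of the
    softmax outputs as an epistemic uncertainty map."""
    model.eval()
    for m in model.modules():                 # re-enable dropout layers only
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        samples = torch.stack([
            torch.softmax(model(image), dim=1) for _ in range(n_samples)
        ])                                    # (n_samples, B, C, H, W)
    mean_prob = samples.mean(0)               # averaged prediction
    uncertainty = samples.var(0).sum(1)       # variance summed over classes
    return mean_prob, uncertainty
```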

Robustness Stress Testing in Medical Image Classification

  • paper_url: http://arxiv.org/abs/2308.06889
  • repo_url: https://github.com/mobarakol/robustness_stress_testing
  • paper_authors: Mobarakol Islam, Zeju Li, Ben Glocker
  • for: This paper aims to assess the robustness and equity of disease detection models using progressive stress testing.
  • methods: The authors use five different bidirectional and unidirectional image perturbations with six different severity levels to test the models' robustness (see the stress-testing sketch below).
  • results: The authors find that some models may yield more robust and equitable performance than others, and that pretraining characteristics play an important role in downstream robustness.
    Abstract Deep neural networks have shown impressive performance for image-based disease detection. Performance is commonly evaluated through clinical validation on independent test sets to demonstrate clinically acceptable accuracy. Reporting good performance metrics on test sets, however, is not always a sufficient indication of the generalizability and robustness of an algorithm. In particular, when the test data is drawn from the same distribution as the training data, the iid test set performance can be an unreliable estimate of the accuracy on new data. In this paper, we employ stress testing to assess model robustness and subgroup performance disparities in disease detection models. We design progressive stress testing using five different bidirectional and unidirectional image perturbations with six different severity levels. As a use case, we apply stress tests to measure the robustness of disease detection models for chest X-ray and skin lesion images, and demonstrate the importance of studying class and domain-specific model behaviour. Our experiments indicate that some models may yield more robust and equitable performance than others. We also find that pretraining characteristics play an important role in downstream robustness. We conclude that progressive stress testing is a viable and important tool and should become standard practice in the clinical validation of image-based disease detection models.
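A minimal sketch of progressive stress testing with a single perturbation type (Gaussian noise); the paper applies five bidirectional and unidirectional perturbations, and the per-level noise scaling here is an assumption:

```python
import numpy as np

def stress_test(model_predict, images, labels, severities=range(1, 7)):
    """Apply one perturbation at increasing severity levels and record
    accuracy at each level, exposing how quickly performance degrades."""
    results = {}
    rng = np.random.default_rng(0)
    for s in severities:
        sigma = 0.02 * s                      # severity-dependent noise level (assumed)
        noisy = np.clip(images + rng.normal(0.0, sigma, images.shape), 0.0, 1.0)
        preds = model_predict(noisy)          # user-supplied inference callable
        results[s] = float((preds == labels).mean())
    return results
```

Plotting the resulting accuracy-versus-severity curve per subgroup is what reveals the robustness and equity disparities the paper reports.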