results: Our method was evaluated in ablation experiments on the ADNI database using two data modalities, achieving AD diagnostic accuracies of 89.71% and 91.18% and outperforming several existing methods.
Abstract
Structural MRI (sMRI) and PET imaging play an important role in the diagnosis of Alzheimer's disease (AD), showing the morphological changes and glucose metabolism changes in the brain, respectively. In some patients with cognitive impairment, the manifestations in brain images are relatively inconspicuous; for example, accurate diagnosis through sMRI alone remains difficult in clinical practice. With the emergence of deep learning, convolutional neural networks (CNNs) have become a valuable method in AD-aided diagnosis, but some CNN methods cannot effectively learn the features of brain images, so the diagnosis of AD still presents some challenges. In this work, we propose an end-to-end 3D CNN framework for AD diagnosis based on ResNet, which integrates multi-layer features obtained under the effect of an attention mechanism to better capture subtle differences in brain images. The attention maps show that our model can focus on key brain regions related to the disease diagnosis. Our method was verified in ablation experiments with two modality images on 792 subjects from the ADNI database, where AD diagnostic accuracies of 89.71% and 91.18% were achieved based on sMRI and PET respectively, and it also outperformed some state-of-the-art methods.
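For readers who want a concrete picture of attention-based multi-layer feature fusion, here is a minimal PyTorch sketch assuming generic 3D backbone features; the layer shapes, pooling scheme, and classifier head are illustrative stand-ins, not the paper's actual architecture.

```python
# Hypothetical sketch: attention-gated fusion of multi-stage 3D CNN features
# for a binary AD-vs-CN classifier. Channel counts and sizes are illustrative.
import torch
import torch.nn as nn

class AttnPool3d(nn.Module):
    """Spatial attention over a 3D feature map followed by weighted pooling."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv3d(channels, 1, kernel_size=1)   # voxel-wise attention logits

    def forward(self, x):                                     # x: (B, C, D, H, W)
        w = torch.softmax(self.score(x).flatten(2), dim=-1)   # (B, 1, D*H*W)
        v = (x.flatten(2) * w).sum(dim=-1)                    # attention-weighted pooling -> (B, C)
        return v, w

class MultiStageAttnClassifier(nn.Module):
    def __init__(self, stage_channels=(64, 128, 256), num_classes=2):
        super().__init__()
        self.pools = nn.ModuleList(AttnPool3d(c) for c in stage_channels)
        self.fc = nn.Linear(sum(stage_channels), num_classes)

    def forward(self, stage_feats):                           # list of (B, C_i, D_i, H_i, W_i)
        pooled = [p(f)[0] for p, f in zip(self.pools, stage_feats)]
        return self.fc(torch.cat(pooled, dim=1))

# toy usage with random "ResNet stage" features from a 3D backbone
feats = [torch.randn(2, 64, 16, 16, 16), torch.randn(2, 128, 8, 8, 8), torch.randn(2, 256, 4, 4, 4)]
logits = MultiStageAttnClassifier()(feats)                    # (2, 2)
```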
SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network
results: Extensive experiments show that the proposed method performs well in both azimuth controllability and the accuracy of SAR target image generation.
Abstract
Sufficient synthetic aperture radar (SAR) target images are very important for the development of research. However, available SAR target images are often limited in practice, which hinders the progress of SAR applications. In this paper, we propose an azimuth-controllable generative adversarial network to generate precise SAR target images with an intermediate azimuth between two given SAR images' azimuths. This network mainly contains three parts: a generator, a discriminator, and a predictor. Through the proposed network structure, the generator can extract and fuse the optimal target features from two input SAR target images to generate a SAR target image. A similarity discriminator and an azimuth predictor are then designed. The similarity discriminator differentiates the generated SAR target images from real SAR images to ensure the accuracy of the generated images, while the azimuth predictor measures the difference in azimuth between the generated and the desired images to ensure the azimuth controllability of the generated images. Therefore, the proposed network can generate precise SAR images whose azimuths can be controlled well by the inputs of the deep network; it can generate target images at different azimuths to alleviate the small-sample problem to some degree and benefit research on SAR images. Extensive experimental results show the superiority of the proposed method in azimuth controllability and accuracy of SAR target image generation.
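A minimal sketch of the three-part training signal described above (adversarial realism plus an azimuth-consistency term), assuming tiny stand-in networks for the generator, similarity discriminator, and azimuth predictor; none of the shapes or loss weights come from the paper.

```python
# Illustrative generator update: fool the similarity discriminator while
# matching the desired intermediate azimuth predicted from the generated chip.
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 1, 3, padding=1))        # fuses two SAR chips
D = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(), nn.Flatten(), nn.LazyLinear(1))  # real/fake score
P = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(), nn.Flatten(), nn.LazyLinear(1))  # azimuth (degrees)

x1, x2 = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)   # SAR chips at two known azimuths
az_target = torch.full((4, 1), 47.5)                          # desired intermediate azimuth

fake = G(torch.cat([x1, x2], dim=1))
adv_loss = F.binary_cross_entropy_with_logits(D(fake), torch.ones(4, 1))  # fool the similarity discriminator
az_loss = F.l1_loss(P(fake), az_target)                                   # azimuth controllability term
g_loss = adv_loss + 0.1 * az_loss                                         # weighting is a free hyper-parameter
g_loss.backward()
```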
Surface Masked AutoEncoder: Self-Supervision for Cortical Imaging Data
paper_authors: Simon Dahan, Mariana da Silva, Daniel Rueckert, Emma C Robinson
for: This paper aims to improve the performance of vision transformer models in cortical surface learning tasks, specifically in the context of cortical imaging where datasets are limited in size.
methods: The proposed method uses Masked AutoEncoder (MAE) self-supervision to pre-train vision transformer models on large datasets, such as the UK Biobank (UKB), and then fine-tunes the models on smaller cortical phenotype regression datasets.
results: The pre-trained models achieve a 26% improvement in performance and an 80% faster convergence compared to models trained from scratch, demonstrating the effectiveness of the proposed method in learning strong representations for cortical surface learning tasks.
Abstract
Self-supervision has been widely explored as a means of addressing the lack of inductive biases in vision transformer architectures, which limits generalisation when networks are trained on small datasets. This is crucial in the context of cortical imaging, where phenotypes are complex and heterogeneous, but the available datasets are limited in size. This paper builds upon recent advancements in translating vision transformers to surface meshes and investigates the potential of Masked AutoEncoder (MAE) self-supervision for cortical surface learning. By reconstructing surface data from a masked version of the input, the proposed method effectively models cortical structure to learn strong representations that translate to improved performance in downstream tasks. We evaluate our approach on cortical phenotype regression using the developing Human Connectome Project (dHCP) and demonstrate that pre-training leads to a 26\% improvement in performance, with an 80\% faster convergence, compared to models trained from scratch. Furthermore, we establish that pre-training vision transformer models on large datasets, such as the UK Biobank (UKB), enables the acquisition of robust representations for finetuning in low-data scenarios. Our code and pre-trained models are publicly available at \url{https://github.com/metrics-lab/surface-vision-transformers}.
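To make the MAE-style pre-training concrete, below is a minimal sketch of one masked-reconstruction step on a sequence of surface patch tokens; the token count, embedding size, masking ratio, and the toy linear decoder are assumptions, not the released model's configuration.

```python
# Minimal MAE-style pre-training step on surface "patch" tokens
# (per-patch cortical features flattened to vectors). Positional
# embeddings and the real patching scheme are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F

B, N, D, mask_ratio = 2, 320, 96, 0.75           # batch, patches per mesh, feature dim (placeholders)
enc_layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
decoder = nn.Linear(D, D)                        # toy decoder: predict the masked patch features
mask_token = nn.Parameter(torch.zeros(1, 1, D))

tokens = torch.randn(B, N, D)                    # embedded surface patches
keep = int(N * (1 - mask_ratio))
perm = torch.rand(B, N).argsort(dim=1)           # random shuffle per sample
vis_idx, msk_idx = perm[:, :keep], perm[:, keep:]

visible = torch.gather(tokens, 1, vis_idx.unsqueeze(-1).expand(-1, -1, D))
latent = encoder(visible)                        # encode visible patches only

# place encoded tokens back into a full sequence of mask tokens, then decode
idx = vis_idx.unsqueeze(-1).expand(-1, -1, D)
full = mask_token.expand(B, N, D).scatter(1, idx, latent)
pred = decoder(full)

target = torch.gather(tokens, 1, msk_idx.unsqueeze(-1).expand(-1, -1, D))
recon = torch.gather(pred, 1, msk_idx.unsqueeze(-1).expand(-1, -1, D))
loss = F.mse_loss(recon, target)                 # reconstruction loss on masked patches only
loss.backward()
```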
Global in Local: A Convolutional Transformer for SAR ATR FSL
results: Experiments conducted on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset achieve pioneering performance without requiring additional SAR target images for training.
Abstract
Convolutional neural networks (CNNs) have dominated synthetic aperture radar (SAR) automatic target recognition (ATR) for years. However, with limited SAR images, the width and depth of CNN-based models are constrained, and widening the receptive field to capture global features in images is hindered, which ultimately leads to low recognition performance. To address these challenges, we propose a Convolutional Transformer (ConvT) for SAR ATR few-shot learning (FSL). The proposed method focuses on constructing a hierarchical feature representation and capturing global dependencies of local features in each layer, named global in local. A novel hybrid loss is proposed to interpret the few SAR images in the form of recognition labels and contrastive image pairs, construct abundant anchor-positive and anchor-negative image pairs in one batch, and provide sufficient loss for the optimization of the ConvT to overcome the few-sample effect. An auto augmentation is proposed to enhance and enrich the diversity and amount of the few training samples, to explore the hidden features in a few SAR images and avoid over-fitting in SAR ATR FSL. Experiments conducted on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset have shown the effectiveness of our proposed ConvT for SAR ATR FSL. Different from existing SAR ATR FSL methods that employ additional training datasets, our method achieves pioneering performance without other SAR target images in training.
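As an illustration of the hybrid-loss idea (a recognition-label term combined with anchor-positive/anchor-negative contrastive pairs formed within one batch), here is a hedged sketch; the temperature, weighting, and the specific supervised-contrastive formulation are generic choices, not necessarily the paper's.

```python
# Sketch: cross-entropy on class logits plus a supervised contrastive term
# built from all same-class (positive) pairs inside one batch.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, embeddings, labels, temperature=0.1, weight=0.5):
    ce = F.cross_entropy(logits, labels)                       # recognition-label term

    z = F.normalize(embeddings, dim=1)                         # (B, D) unit embeddings
    sim = z @ z.t() / temperature                              # pairwise similarities
    B = labels.size(0)
    eye = torch.eye(B, dtype=torch.bool, device=labels.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye  # anchor-positive mask

    logp = sim - torch.logsumexp(sim.masked_fill(eye, float('-inf')), dim=1, keepdim=True)
    contrast = -(logp * pos).sum(1) / pos.sum(1).clamp(min=1)  # mean log-prob of positives per anchor
    return ce + weight * contrast.mean()

logits = torch.randn(8, 10, requires_grad=True)
emb = torch.randn(8, 64, requires_grad=True)
labels = torch.randint(0, 10, (8,))
hybrid_loss(logits, emb, labels).backward()
```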
Transforming Breast Cancer Diagnosis: Towards Real-Time Ultrasound to Mammogram Conversion for Cost-Effective Diagnosis
paper_authors: Sahar Almahfouz Nasser, Ashutosh Sharma, Anmol Saraf, Amruta Mahendra Parulekar, Purvi Haria, Amit Sethi
for: This research aims to provide surgeons with mammogram-like image quality in real-time from noisy US images.
methods: The research utilizes the Stride software to numerically solve the forward model and generate ultrasound images from mammography images. Additionally, generative adversarial networks (GANs) are used to tackle the inverse problem of generating mammogram-quality images from ultrasound images.
results: The resultant images have considerably more discernible details than the original US images.
Abstract
Ultrasound (US) imaging is better suited for intraoperative settings because it is real-time and more portable than other imaging techniques, such as mammography. However, US images are characterized by lower spatial resolution and noise-like artifacts. This research aims to address these limitations by providing surgeons with mammogram-like image quality in real-time from noisy US images. Unlike previous approaches for improving US image quality that aim to reduce artifacts by treating them as speckle noise, we recognize their value as an informative wave interference pattern (WIP). To achieve this, we utilize the Stride software to numerically solve the forward model, generating ultrasound images from mammogram images by solving wave equations. Additionally, we leverage the power of domain adaptation to enhance the realism of the simulated ultrasound images. Then, we utilize generative adversarial networks (GANs) to tackle the inverse problem of generating mammogram-quality images from ultrasound images. The resultant images have considerably more discernible details than the original US images.
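A rough sketch of how the inverse problem could be trained as a conditional GAN on pairs produced by the forward wave simulation; the tiny networks, conditioning scheme, and pix2pix-style weighting are placeholders for illustration, not the paper's models.

```python
# Minimal conditional-GAN step for the inverse mapping (US -> mammogram-like),
# assuming paired data from the forward wave simulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.Conv2d(32, 1, 3, padding=1))
D = nn.Sequential(nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.ReLU(), nn.Flatten(), nn.LazyLinear(1))

us = torch.rand(4, 1, 128, 128)        # simulated US image (from the forward model)
mammo = torch.rand(4, 1, 128, 128)     # corresponding mammogram patch

fake = G(us)
pair = torch.cat([us, fake], dim=1)    # condition the discriminator on the US input
adv = F.binary_cross_entropy_with_logits(D(pair), torch.ones(4, 1))
recon = F.l1_loss(fake, mammo)         # keep the translation close to the paired target
(adv + 100.0 * recon).backward()       # pix2pix-style weighting, purely illustrative
```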
A Generalized Physical-knowledge-guided Dynamic Model for Underwater Image Enhancement
methods: The method consists of three parts: an Atmosphere-based Dynamic Structure (ADS), a Transmission-guided Dynamic Structure (TDS), and a Prior-based Multi-scale Structure (PMS). In particular, to cover complex underwater scenes, the study varies the global atmosphere light and the transmission through the formation model to simulate various underwater image types (e.g., underwater image colors ranging from yellow to blue). ADS and TDS then use dynamic convolutions to adaptively extract prior information from underwater images and generate parameters for PMS.
results: The method adapts to various types of underwater images and improves their contrast and color accuracy. Specifically, it can automatically select appropriate parameters for different water types, improving the enhancement effect.
Abstract
Underwater images often suffer from color distortion and low contrast, resulting in various image types, due to the scattering and absorption of light by water, and it is difficult to obtain high-quality paired training samples for a generalized model. To tackle these challenges, we design a Generalized Underwater image enhancement method via a Physical-knowledge-guided Dynamic Model (GUPDM for short), consisting of three parts: an Atmosphere-based Dynamic Structure (ADS), a Transmission-guided Dynamic Structure (TDS), and a Prior-based Multi-scale Structure (PMS). In particular, to cover complex underwater scenes, this study changes the global atmosphere light and the transmission through the formation model to simulate various underwater image types (e.g., underwater image colors ranging from yellow to blue). We then design ADS and TDS, which use dynamic convolutions to adaptively extract prior information from underwater images and generate parameters for PMS. These two modules enable the network to select appropriate parameters for various water types adaptively. Besides, the multi-scale feature extraction module in PMS uses convolution blocks with different kernel sizes, obtains weights for each feature map via a channel attention block, and fuses them to boost the receptive field of the network. The source code will be available at \href{https://github.com/shiningZZ/GUPDM}{https://github.com/shiningZZ/GUPDM}.
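The dynamic-convolution idea (input-conditioned filtering so the network can adapt to different water types) can be sketched as follows; this is a generic mixture-of-kernels implementation with illustrative sizes, not GUPDM's actual ADS/TDS blocks.

```python
# Rough sketch of a dynamic convolution: a small gating branch predicts
# per-sample mixture weights over K candidate kernels, so the effective
# filter adapts to the input image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3, num_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_kernels, out_ch, in_ch, k, k) * 0.02)
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_ch, num_kernels))
        self.pad = k // 2

    def forward(self, x):                                         # x: (B, C, H, W)
        B = x.size(0)
        alpha = torch.softmax(self.gate(x), dim=1)                # (B, K) per-sample kernel weights
        w = torch.einsum('bk,koihw->boihw', alpha, self.weight)   # per-sample fused kernels
        x = x.reshape(1, -1, *x.shape[2:])                        # fold batch into channels
        w = w.reshape(-1, *w.shape[2:])                           # (B*out, in, k, k)
        out = F.conv2d(x, w, padding=self.pad, groups=B)          # grouped conv = per-sample conv
        return out.reshape(B, -1, *out.shape[2:])

y = DynamicConv2d(3, 16)(torch.rand(2, 3, 64, 64))                # -> (2, 16, 64, 64)
```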
Enhancing Low-light Light Field Images with A Deep Compensation Unfolding Network
for: The paper targets restoring light field (LF) images captured under low-light conditions.
methods: The paper proposes a novel and interpretable end-to-end learning framework, the deep compensation unfolding network (DCUNet), which mimics the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework combines a multi-stage architecture with a content-associated deep compensation module to suppress noise and illumination-map estimation errors, and a pseudo-explicit feature interaction module to comprehensively exploit redundant information in LF images.
results: Experiments on both simulated and real datasets demonstrate the superiority of DCUNet over state-of-the-art methods, both qualitatively and quantitatively, while preserving the essential geometric structure of the enhanced LF images much better.
Abstract
This paper presents a novel and interpretable end-to-end learning framework, called the deep compensation unfolding network (DCUNet), for restoring light field (LF) images captured under low-light conditions. DCUNet is designed with a multi-stage architecture that mimics the optimization process of solving an inverse imaging problem in a data-driven fashion. The framework uses the intermediate enhanced result to estimate the illumination map, which is then employed in the unfolding process to produce a new enhanced result. Additionally, DCUNet includes a content-associated deep compensation module at each optimization stage to suppress noise and illumination map estimation errors. To properly mine and leverage the unique characteristics of LF images, this paper proposes a pseudo-explicit feature interaction module that comprehensively exploits redundant information in LF images. The experimental results on both simulated and real datasets demonstrate the superiority of our DCUNet over state-of-the-art methods, both qualitatively and quantitatively. Moreover, DCUNet preserves the essential geometric structure of enhanced LF images much better. The code will be publicly available at https://github.com/lyuxianqiang/LFLL-DCU.
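To illustrate the unfolding idea (estimate an illumination map from the current result, re-light the input, then apply a learned compensation step), here is a schematic multi-stage loop; all module shapes and the Retinex-style update are assumptions, not DCUNet's actual blocks.

```python
# Schematic unfolding loop: each stage estimates illumination from the current
# result, corrects the dark input, and a small CNN compensates residual errors.
import torch
import torch.nn as nn

class Stage(nn.Module):
    def __init__(self, ch=3):
        super().__init__()
        self.illum = nn.Sequential(nn.Conv2d(ch, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, ch, 3, padding=1), nn.Sigmoid())
        self.compensate = nn.Sequential(nn.Conv2d(2 * ch, 16, 3, padding=1), nn.ReLU(),
                                        nn.Conv2d(16, ch, 3, padding=1))

    def forward(self, low, current):
        L = self.illum(current).clamp(min=1e-3)          # illumination map in (0, 1]
        relit = low / L                                  # Retinex-style correction of the input
        return relit + self.compensate(torch.cat([relit, current], dim=1))  # residual compensation

low = torch.rand(1, 3, 64, 64) * 0.2                     # dark light-field sub-aperture view
stages = nn.ModuleList(Stage() for _ in range(3))
x = low
for s in stages:                                         # unfolded multi-stage refinement
    x = s(low, x)
```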
TriDo-Former: A Triple-Domain Transformer for Direct PET Reconstruction from Low-Dose Sinograms
paper_authors: Jiaqi Cui, Pinxian Zeng, Xinyi Zeng, Peng Wang, Xi Wu, Jiliu Zhou, Yan Wang, Dinggang Shen
for: To improve the quality of low-dose positron emission tomography (PET) images and thereby reduce radiation exposure.
methods: A transformer-based model, TriDo-Former, is proposed to reconstruct standard-dose PET (SPET) images directly from low-dose PET (LPET) sinograms. The model consists of two cascaded networks: a sinogram enhancement transformer (SE-Former) that denoises the LPET sinograms, and a spatial-spectral reconstruction transformer (SSR-Former) that reconstructs SPET images from the denoised sinograms.
results: Compared with existing methods, TriDo-Former better preserves image details and edges and better captures global structures. Validated on a clinical dataset, it performs better both qualitatively and quantitatively.
Abstract
To obtain high-quality positron emission tomography (PET) images while minimizing radiation exposure, various methods have been proposed for reconstructing standard-dose PET (SPET) images from low-dose PET (LPET) sinograms directly. However, current methods often neglect boundaries during sinogram-to-image reconstruction, resulting in high-frequency distortion in the frequency domain and diminished or fuzzy edges in the reconstructed images. Furthermore, the convolutional architectures, which are commonly used, lack the ability to model long-range non-local interactions, potentially leading to inaccurate representations of global structures. To alleviate these problems, we propose a transformer-based model that unites triple domains of sinogram, image, and frequency for direct PET reconstruction, namely TriDo-Former. Specifically, the TriDo-Former consists of two cascaded networks, i.e., a sinogram enhancement transformer (SE-Former) for denoising the input LPET sinograms and a spatial-spectral reconstruction transformer (SSR-Former) for reconstructing SPET images from the denoised sinograms. Different from the vanilla transformer that splits an image into 2D patches, based specifically on the PET imaging mechanism, our SE-Former divides the sinogram into 1D projection view angles to maintain its inner structure while denoising, preventing the noise in the sinogram from propagating into the image domain. Moreover, to mitigate high-frequency distortion and improve reconstruction details, we integrate global frequency parsers (GFPs) into SSR-Former. The GFP serves as a learnable frequency filter that globally adjusts the frequency components in the frequency domain, forcing the network to restore high-frequency details resembling real SPET images. Validations on a clinical dataset demonstrate that our TriDo-Former outperforms the state-of-the-art methods qualitatively and quantitatively.
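The global frequency parser is described as a learnable filter acting on frequency components; a generic version of that idea (in the spirit of global-filter layers) can be sketched with torch.fft as below, with illustrative shapes and without the surrounding SSR-Former.

```python
# Sketch of a learnable global frequency filter: transform features to the
# frequency domain, re-weight components with a learnable complex mask, and
# transform back. Shapes are illustrative only.
import torch
import torch.nn as nn

class GlobalFreqFilter(nn.Module):
    def __init__(self, channels, h, w):
        super().__init__()
        # one complex-valued weight per channel and rFFT frequency bin
        self.mask = nn.Parameter(torch.ones(channels, h, w // 2 + 1, 2))

    def forward(self, x):                                  # x: (B, C, H, W)
        X = torch.fft.rfft2(x, norm='ortho')               # complex spectrum (B, C, H, W//2+1)
        X = X * torch.view_as_complex(self.mask)           # global, learnable re-weighting
        return torch.fft.irfft2(X, s=x.shape[-2:], norm='ortho')

feat = torch.randn(2, 32, 64, 64)
out = GlobalFreqFilter(32, 64, 64)(feat)                   # same shape, frequency-adjusted
```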
Towards General and Fast Video Derain via Knowledge Distillation
results: Our developed general method achieves the best results in terms of both running speed and deraining effect.
Abstract
As a common natural weather condition, rain can obscure video frames and thus degrade the performance of the visual system, so video deraining receives a lot of attention. In natural environments, rain has a wide variety of streak types, which increases the difficulty of the rain removal task. In this paper, we propose a Rain Review-based General video derain Network via knowledge distillation (named RRGNet) that handles different rain streak types with one pre-training weight. Specifically, we design a frame-grouping-based encoder-decoder network that makes full use of the temporal information of the video. Further, we use the old task model to guide the current model in learning new rain streak types while avoiding forgetting. To consolidate the network's ability to derain, we design a rain review module to play back data from old tasks for the current model. The experimental results show that our developed general method achieves the best results in terms of running speed and deraining effect.
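A hedged sketch of the review/distillation signal: a frozen old-task model supervises the current model on replayed frames while the current model fits the new rain type; the stand-in networks and loss weighting are illustrative only.

```python
# Knowledge distillation with replayed ("reviewed") data: the frozen teacher
# guides the student on old rain types while the student learns the new type.
import torch
import torch.nn as nn
import torch.nn.functional as F

def derain_net():
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

old_model, new_model = derain_net(), derain_net()
old_model.load_state_dict(new_model.state_dict())           # pretend "old task" weights
for p in old_model.parameters():                             # teacher stays frozen
    p.requires_grad_(False)

new_rainy, new_clean = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)   # new rain-streak type
review_rainy = torch.rand(2, 3, 64, 64)                      # replayed frames from old tasks

task_loss = F.l1_loss(new_model(new_rainy), new_clean)       # learn the new streak type
distill_loss = F.l1_loss(new_model(review_rainy), old_model(review_rainy))  # avoid forgetting old ones
(task_loss + 0.5 * distill_loss).backward()
```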
Geometric Learning-Based Transformer Network for Estimation of Segmentation Errors
results: The method was evaluated on a high-resolution micro-CT dataset, achieving a mean absolute error of about 0.042 and an accuracy of 79.53%, outperforming other graph neural networks (GNNs). In addition, vertex-normal prediction is proposed as a pretext task to improve the network's overall performance.
Abstract
Many segmentation networks have been proposed for 3D volumetric segmentation of tumors and organs at risk. Hospitals and clinical institutions seek to accelerate and minimize the efforts of specialists in image segmentation. Still, in case of errors generated by these networks, clinicians would have to manually edit the generated segmentation maps. Given a 3D volume and its putative segmentation map, we propose an approach to identify and measure erroneous regions in the segmentation map. Our method can estimate error at any point or node in a 3D mesh generated from a possibly erroneous volumetric segmentation map, serving as a quality assurance tool. We propose a graph neural network-based transformer built on the Nodeformer architecture to measure and classify the segmentation errors at any point. We have evaluated our network on a high-resolution micro-CT dataset of the human inner-ear bony labyrinth structure by simulating erroneous 3D segmentation maps. Our network incorporates a convolutional encoder to compute node-centric features from the input micro-CT data, the Nodeformer to learn the latent graph embeddings, and a Multi-Layer Perceptron (MLP) to compute and classify the node-wise errors. Our network achieves a mean absolute error of ~0.042 in estimating the node-wise errors and an accuracy of 79.53% in classifying them, outperforming other Graph Neural Networks (GNNs). We also put forth vertex-normal prediction as a custom pretext task for pre-training the CNN encoder to improve the network's overall performance. Qualitative analysis shows the efficiency of our network in correctly classifying errors and reducing misclassifications.
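As a simplified illustration of the node-wise heads (error classification/estimation plus a vertex-normal pretext target), here is a sketch that abstracts the graph encoder away as precomputed node embeddings; dimensions, class counts, and loss weighting are assumptions, not the paper's configuration.

```python
# Per-node heads only: an MLP classifies/regresses node-wise segmentation error,
# and a normal-prediction head serves as a pretext objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NodeHeads(nn.Module):
    def __init__(self, dim=64, n_error_classes=2):
        super().__init__()
        self.error_cls = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, n_error_classes))
        self.error_reg = nn.Linear(dim, 1)        # continuous error magnitude per vertex
        self.normal_head = nn.Linear(dim, 3)      # pretext: predict the vertex normal

    def forward(self, h):                          # h: (num_vertices, dim) node embeddings
        return (self.error_cls(h),
                self.error_reg(h).squeeze(-1),
                F.normalize(self.normal_head(h), dim=-1))

h = torch.randn(1000, 64)                          # embeddings from any mesh/graph encoder
cls_logits, err, normals = NodeHeads()(h)

labels = torch.randint(0, 2, (1000,))              # per-vertex "erroneous or not"
gt_err = torch.rand(1000)                          # per-vertex error magnitude
gt_normals = F.normalize(torch.randn(1000, 3), dim=-1)

loss = (F.cross_entropy(cls_logits, labels)        # classification of node-wise errors
        + F.l1_loss(err, gt_err)                   # error estimation (reported as MAE)
        + F.mse_loss(normals, gt_normals))         # vertex-normal pretext term
loss.backward()
```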