eess.IV - 2023-09-09

Latent Degradation Representation Constraint for Single Image Deraining

paper_url: http://arxiv.org/abs/2309.04780
repo_url: None
paper_authors: Yuhong He, Long Peng, Lu Wang, Jun Cheng
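for: Improve the accuracy of single image deraining and address the over- and under-enhancement problems of existing methods.
methods: Proposes the Latent Degradation Representation Constraint Network (LDRCNet), composed of a Direction-Aware Encoder (DAEncoder), a UNet deraining network, and a Multi-Scale Interaction Block (MSIBlock). The DAEncoder uses deformable convolutions that exploit the direction consistency of rain streaks to adapt to diverse rain patterns, and a constraint loss is introduced during training to explicitly learn the rain-streak degradation representation.
results: Experiments on synthetic and real datasets show that the proposed method achieves new state-of-the-art performance.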

Abstract
Since rain streaks show a variety of shapes and directions, learning the degradation representation is extremely challenging for single image deraining. Existing methods mainly focus on designing complicated modules to implicitly learn the latent degradation representation from coupled rainy images. In this way, it is hard to decouple the content-independent degradation representation due to the lack of an explicit constraint, resulting in over- or under-enhancement problems. To tackle this issue, we propose a novel Latent Degradation Representation Constraint Network (LDRCNet) that consists of a Direction-Aware Encoder (DAEncoder), a UNet Deraining Network, and a Multi-Scale Interaction Block (MSIBlock). Specifically, the DAEncoder is proposed to adaptively extract the latent degradation representation by using deformable convolutions to exploit the direction consistency of rain streaks. Next, a constraint loss is introduced to explicitly constrain the degradation representation learning during training. Last, we propose an MSIBlock to fuse the learned degradation representation with the decoder features of the deraining network for adaptive information interaction, which enables the deraining network to remove various complicated rain patterns and reconstruct image details. Experimental results on synthetic and real datasets demonstrate that our method achieves new state-of-the-art performance.
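
The DAEncoder builds on deformable convolutions, which sample the input at positions shifted by learned offsets so that a kernel can follow oblique rain streaks instead of a fixed square grid. The following is a minimal pure-Python sketch of the standard deformable-convolution sampling rule, y(p0) = Σ_k w_k · x(p0 + p_k + Δp_k) with bilinear interpolation, for one channel and one output location. The function names, the toy image, and the hand-set offsets are illustrative assumptions: in a real network the offsets are predicted by an extra convolution, and this is not the authors' DAEncoder.

```python
from math import floor


def bilinear(img, y, x):
    """Sample the 2D list `img` at fractional (y, x); pixels outside the image count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = floor(y), floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Bilinear weight: product of (1 - distance) along each axis.
                total += (1 - abs(y - yy)) * (1 - abs(x - xx)) * img[yy][xx]
    return total


# Regular 3x3 sampling grid p_k, relative to the output position p0 (row-major order).
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def deform_conv_at(img, p0, weights, offsets):
    """y(p0) = sum_k w_k * x(p0 + p_k + delta_p_k) for one channel and one location."""
    y0, x0 = p0
    out = 0.0
    for (dy, dx), w_k, (oy, ox) in zip(GRID, weights, offsets):
        out += w_k * bilinear(img, y0 + dy + oy, x0 + dx + ox)
    return out


streak = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]  # diagonal "rain streak"
row_kernel = [0, 0, 0, 1, 1, 1, 0, 0, 0]  # weights only on the middle row of the 3x3 kernel
print(deform_conv_at(streak, (2, 2), row_kernel, [(0, 0)] * 9))  # 1.0
bent = [(0, 0)] * 9
bent[3], bent[5] = (-1, 0), (1, 0)  # move the outer taps onto the diagonal
print(deform_conv_at(streak, (2, 2), row_kernel, bent))  # 3.0
```

With zero offsets the row kernel centred at (2, 2) touches only one streak pixel; shifting its left tap by (-1, 0) and its right tap by (+1, 0) bends the sampling grid along the diagonal, so all three taps land on the streak.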

SSHNN: Semi-Supervised Hybrid NAS Network for Echocardiographic Image Segmentation

paper_url: http://arxiv.org/abs/2309.04672
repo_url: None
paper_authors: Renqi Chen, Jingjing Luo, Fan Nian, Yuhui Cen, Yiheng Peng, Zekuan Yu
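for: Improve the accuracy of medical image segmentation, particularly for noisy echocardiographic images.
methods: Uses Neural Architecture Search (NAS) to design the network, introducing convolution-based layer-wise feature aggregation and Transformers to improve segmentation accuracy.
results: Experiments on the CAMUS echocardiography dataset show that SSHNN outperforms state-of-the-art approaches and achieves accurate segmentation.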

Abstract
Accurate medical image segmentation, especially for echocardiographic images with unmissable noise, requires elaborate network design. Compared with manual design, Neural Architecture Search (NAS) achieves better segmentation results owing to its larger search space and automatic optimization, but most existing methods are weak in layer-wise feature aggregation and adopt a "strong encoder, weak decoder" structure, which is insufficient to handle global relationships and local details. To resolve these issues, we propose a novel semi-supervised hybrid NAS network for accurate medical image segmentation, termed SSHNN. In SSHNN, we creatively use convolution operations in layer-wise feature fusion instead of normalized scalars to avoid losing details, making NAS a stronger encoder. Moreover, Transformers are introduced to compensate for global context, and a U-shaped decoder is designed to efficiently connect global context with local features. Specifically, we implement the semi-supervised Mean-Teacher algorithm to overcome the limited volume of labeled medical image datasets. Extensive experiments on the CAMUS echocardiography dataset demonstrate that SSHNN outperforms state-of-the-art approaches and achieves accurate segmentation. Code will be made publicly available.
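
SSHNN relies on the Mean-Teacher algorithm to exploit unlabeled images. In Mean Teacher, the teacher's weights are an exponential moving average (EMA) of the student's weights, and the student is additionally penalized for disagreeing with the teacher on unlabeled data. The pure-Python sketch below shows those two pieces on toy scalars. The dict-of-floats parameter format, alpha=0.99, and the consistency weight are placeholder assumptions rather than values from the paper; a real setup would apply the EMA to every tensor of the segmentation network after each optimizer step.

```python
def ema_update(teacher, student, alpha=0.99):
    """Mean-Teacher step: theta_teacher <- alpha * theta_teacher + (1 - alpha) * theta_student.

    The teacher is never trained by gradients; it only tracks the student.
    """
    return {name: alpha * teacher[name] + (1 - alpha) * student[name] for name in teacher}


def consistency_loss(student_probs, teacher_probs):
    """Mean squared error between student and teacher per-pixel foreground probabilities."""
    pairs = list(zip(student_probs, teacher_probs))
    return sum((s - t) ** 2 for s, t in pairs) / len(pairs)


def total_loss(supervised_loss, student_unlabeled, teacher_unlabeled, weight=1.0):
    """Supervised loss on labeled images plus weighted consistency loss on unlabeled ones."""
    return supervised_loss + weight * consistency_loss(student_unlabeled, teacher_unlabeled)


# Toy run: the student's single weight sits at 1.0 and the teacher slowly tracks it.
teacher = {"w": 0.0}
for _ in range(3):
    teacher = ema_update(teacher, {"w": 1.0}, alpha=0.9)
print(round(teacher["w"], 3))  # 0.271
```

The slow EMA is the design point: the teacher averages the student over many steps, so its predictions on unlabeled images are smoother targets than the student's own noisy outputs.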

ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation

paper_url: http://arxiv.org/abs/2309.05674
repo_url: None
paper_authors: Xian Lin, Zengqiang Yan, Xianbo Deng, Chuansheng Zheng, Li Yu
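for: Improve the performance of transformers in medical image segmentation, in particular by addressing attention collapse.
methods: Builds CNN-style transformers (ConvFormer) from pooling, CNN-style self-attention (CSA), and a convolutional feed-forward network (CFFN) to promote better attention convergence and feature refinement.
results: Experiments on multiple datasets show that ConvFormer can serve as a plug-and-play module in transformer-based frameworks and consistently improves their performance.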

Abstract
Transformers have been extensively studied in medical image segmentation to build pairwise long-range dependence. Yet, relatively limited well-annotated medical image data makes transformers struggle to extract diverse global features, resulting in attention collapse where attention maps become similar or even identical. Comparatively, convolutional neural networks (CNNs) have better convergence properties on small-scale training data but suffer from limited receptive fields. Existing works are dedicated to exploring the combinations of CNN and transformers while ignoring attention collapse, leaving the potential of transformers under-explored. In this paper, we propose to build CNN-style Transformers (ConvFormer) to promote better attention convergence and thus better segmentation performance. Specifically, ConvFormer consists of pooling, CNN-style self-attention (CSA), and convolutional feed-forward network (CFFN) corresponding to tokenization, self-attention, and feed-forward network in vanilla vision transformers. In contrast to positional embedding and tokenization, ConvFormer adopts 2D convolution and max-pooling for both position information preservation and feature size reduction. In this way, CSA takes 2D feature maps as inputs and establishes long-range dependency by constructing self-attention matrices as convolution kernels with adaptive sizes. Following CSA, 2D convolution is utilized for feature refinement through CFFN. Experimental results on multiple datasets demonstrate the effectiveness of ConvFormer working as a plug-and-play module for consistent performance improvement of transformer-based frameworks. Code is available at https://github.com/xianlin7/ConvFormer.
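
ConvFormer is motivated by attention collapse, where the attention maps of different heads or layers become nearly identical. One simple way to quantify this, used here purely as an illustration and not necessarily the measure used in the paper, is the mean pairwise cosine similarity between flattened attention maps: values near 1 indicate collapse. The sketch below computes scaled dot-product attention maps in pure Python and scores them; the token values are made up.

```python
from itertools import combinations
from math import exp, sqrt


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def attention_map(q, k):
    """Row-wise softmax(Q K^T / sqrt(d)); q and k are lists of d-dimensional tokens."""
    d = len(q[0])
    return [softmax([sum(a * b for a, b in zip(qt, kt)) / sqrt(d) for kt in k]) for qt in q]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def collapse_score(maps):
    """Mean pairwise cosine similarity of flattened attention maps (1.0 = all identical)."""
    flat = [[v for row in m for v in row] for m in maps]
    pairs = list(combinations(flat, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
head_a = attention_map(tokens, tokens)
head_b = attention_map([[t[1], t[0]] for t in tokens], tokens)  # a head with swapped query features
print(collapse_score([head_a, head_a]))  # ~1.0: collapsed heads
print(collapse_score([head_a, head_b]))  # below 1.0: heads attend differently
```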

Video and Synthetic MRI Pre-training of 3D Vision Architectures for Neuroimage Analysis

paper_url: http://arxiv.org/abs/2309.04651
repo_url: None
paper_authors: Nikhil J. Dhinagar, Amit Singh, Saket Ozarkar, Ketaki Buwa, Sophia I. Thomopoulos, Conor Owens-Walton, Emily Laltoo, Yao-Liang Chen, Philip Cook, Corey McMillan, Chih-Chien Tsai, J-J Wang, Yih-Ru Wu, Paul M. Thompson
for: 3D medical imaging tasks, particularly Alzheimer's disease (AD) and Parkinson's disease (PD) classification and "brain age" prediction.
methods: Pre-training deep learning models on a large corpus of data (natural images, medical images, synthetically generated MRI scans, or video data), then adapting the pre-trained models to downstream neuroimaging tasks of varying difficulty.
results: Pre-training improved performance across all tasks, with a boost of 7.4% for AD classification and 4.6% for PD classification for ViTs, and a boost of 19.1% for PD classification and a 1.26-year reduction in brain age prediction error for CNNs; pre-training on large-scale video or synthetic MRI data boosted the performance of ViTs; CNNs were robust in limited-data settings, and in-domain pre-training enhanced their performance; pre-training improved generalization to out-of-distribution datasets and sites.

Abstract
Transfer learning represents a recent paradigm shift in the way we build artificial intelligence (AI) systems. In contrast to training task-specific models, transfer learning involves pre-training deep learning models on a large corpus of data and minimally fine-tuning them for adaptation to specific tasks. Even so, for 3D medical imaging tasks, we do not know whether it is best to pre-train models on natural images, medical images, or even synthetically generated MRI scans or video data. To evaluate these alternatives, here we benchmarked vision transformers (ViTs) and convolutional neural networks (CNNs), initialized with varied upstream pre-training approaches. These methods were then adapted to three unique downstream neuroimaging tasks with a range of difficulty: Alzheimer's disease (AD) classification, Parkinson's disease (PD) classification, and "brain age" prediction. Experimental tests led to the following key observations: 1. Pre-training improved performance across all tasks, including a boost of 7.4% for AD classification and 4.6% for PD classification for the ViT, and a boost of 19.1% for PD classification and a 1.26-year reduction in brain age prediction error for CNNs; 2. Pre-training on large-scale video or synthetic MRI data boosted the performance of ViTs; 3. CNNs were robust in limited-data settings, and in-domain pre-training enhanced their performance; 4. Pre-training improved generalization to out-of-distribution datasets and sites. Overall, we benchmarked different vision architectures, revealing the value of pre-training them with emerging datasets for model initialization. The resulting pre-trained models can be adapted to a range of downstream neuroimaging tasks, even when training data for the target task is limited.
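
The benchmark pre-trains 3D architectures on video as well as MRI. Mechanically, this is possible because a video clip (frames × height × width) and a brain MRI volume (depth × height × width) are both 3D grids, so a 3D ViT can cut either one into non-overlapping cuboid patches ("tubelets") and treat each patch as a token. The sketch below shows only that patching step in pure Python; the learned linear projection and position embeddings that follow are omitted, and the volume and patch sizes are hypothetical examples, not the configurations used in the paper.

```python
def num_3d_patches(shape, patch):
    """Number of non-overlapping 3D patches (tokens); every dimension must divide evenly."""
    count = 1
    for size, p in zip(shape, patch):
        if size % p:
            raise ValueError(f"dimension {size} is not divisible by patch size {p}")
        count *= size // p
    return count


def patchify(volume, patch):
    """Split a nested-list volume [D][H][W] into flattened cuboid patches (one list per token)."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    pd, ph, pw = patch
    num_3d_patches((d, h, w), patch)  # validates divisibility before slicing
    tokens = []
    for z in range(0, d, pd):
        for y in range(0, h, ph):
            for x in range(0, w, pw):
                tokens.append([volume[z + i][y + j][x + k]
                               for i in range(pd) for j in range(ph) for k in range(pw)])
    return tokens


print(num_3d_patches((96, 96, 96), (16, 16, 16)))   # 216 tokens, e.g. a cubic MRI crop
print(num_3d_patches((16, 224, 224), (2, 16, 16)))  # 1568 tokens, e.g. a 16-frame video clip
```

The same tokenizer serves both data sources; only the interpretation of the first axis (time versus depth) changes, which is what lets video-pretrained weights initialize a model for MRI.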
