eess.IV - 2023-07-12

On the Importance of Denoising when Learning to Compress Images

  • paper_url: http://arxiv.org/abs/2307.06233
  • repo_url: https://github.com/trougnouf/compression
  • paper_authors: Benoit Brummer, Christophe De Vleeschouwer
  • for: This work aims to improve image compression and denoising jointly.
  • methods: A single codec is trained on noisy-clean image pairs drawn from the Natural Image Noise Dataset, mixing images with a wide range of noise levels so that compression and denoising are learned together (a minimal training-step sketch follows the abstract below).
  • results: The resulting model achieves better rate-distortion performance than a compression-only model, and even than a denoising-then-compression pair, with almost an order of magnitude fewer GMac operations.
    Abstract Image noise is ubiquitous in photography. However, image noise is not compressible nor desirable, thus attempting to convey the noise in compressed image bitstreams yields sub-par results in both rate and distortion. We propose to explicitly learn the image denoising task when training a codec. Therefore, we leverage the Natural Image Noise Dataset, which offers a wide variety of scenes captured with various ISO numbers, leading to different noise levels, including insignificant ones. Given this training set, we supervise the codec with noisy-clean image pairs, and show that a single model trained based on a mixture of images with variable noise levels appears to yield best-in-class results with both noisy and clean images, achieving better rate-distortion than a compression-only model or even than a pair of denoising-then-compression models with almost one order of magnitude fewer GMac operations.
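To make the training idea concrete, here is a minimal sketch (not the authors' code) of a rate-distortion training step in which the codec receives the noisy image while the distortion term is computed against the clean reference, so denoising is learned jointly with compression. The CompressAI model, the lambda value, and the loss scaling are assumptions used purely for illustration.

```python
# Hypothetical joint compression + denoising step: encode the noisy image,
# but measure distortion against the clean target (CompressAI used as a stand-in codec).
import torch
import torch.nn.functional as F
from compressai.zoo import mbt2018_mean

model = mbt2018_mean(quality=3, pretrained=False).train()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
lmbda = 0.01  # assumed rate-distortion trade-off

def train_step(noisy, clean):
    out = model(noisy)                                   # {"x_hat": ..., "likelihoods": {...}}
    num_pixels = noisy.shape[0] * noisy.shape[2] * noisy.shape[3]
    bpp = sum((-torch.log2(l)).sum() for l in out["likelihoods"].values()) / num_pixels
    distortion = F.mse_loss(out["x_hat"], clean)         # clean target -> the codec learns to denoise
    loss = lmbda * 255 ** 2 * distortion + bpp           # CompressAI-style loss scaling for [0,1] images
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

loss = train_step(torch.rand(2, 3, 256, 256), torch.rand(2, 3, 256, 256))
```

Feeding pairs spanning a range of noise levels, including near-clean ones, is what lets a single model handle both noisy and clean inputs.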

CellGAN: Conditional Cervical Cell Synthesis for Augmenting Cytopathological Image Classification

  • paper_url: http://arxiv.org/abs/2307.06182
  • repo_url: https://github.com/zhenrongshen/cellgan
  • paper_authors: Zhenrong Shen, Maosong Cao, Sheng Wang, Lichi Zhang, Qian Wang
  • for: Helping pathologists detect cervical abnormalities more accurately and efficiently in cancer screening.
  • methods: CellGAN synthesizes cytopathological images of various cervical cell types to augment patch-level cell classification (a conditioning sketch follows the abstract below).
  • results: Experiments show that CellGAN produces visually plausible TCT cytopathological images and substantially improves patch-level cell classification performance.
    Abstract Automatic examination of thin-prep cytologic test (TCT) slides can assist pathologists in finding cervical abnormality for accurate and efficient cancer screening. Current solutions mostly need to localize suspicious cells and classify abnormality based on local patches, concerning the fact that whole slide images of TCT are extremely large. It thus requires many annotations of normal and abnormal cervical cells, to supervise the training of the patch-level classifier for promising performance. In this paper, we propose CellGAN to synthesize cytopathological images of various cervical cell types for augmenting patch-level cell classification. Built upon a lightweight backbone, CellGAN is equipped with a non-linear class mapping network to effectively incorporate cell type information into image generation. We also propose the Skip-layer Global Context module to model the complex spatial relationship of the cells, and attain high fidelity of the synthesized images through adversarial learning. Our experiments demonstrate that CellGAN can produce visually plausible TCT cytopathological images for different cell types. We also validate the effectiveness of using CellGAN to greatly augment patch-level cell classification performance.
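As a rough illustration of class-conditional synthesis with a non-linear class mapping network, the sketch below embeds the cell-type label with a small MLP and concatenates it with the latent noise before a toy generator. Layer sizes, depths, and the 32x32 output are assumptions; the paper's Skip-layer Global Context module and adversarial training are omitted.

```python
import torch
import torch.nn as nn

class ClassMappingNet(nn.Module):
    """Non-linear mapping from a cell-type label to a conditioning vector (assumed design)."""
    def __init__(self, num_classes=5, embed_dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_classes, embed_dim)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.LeakyReLU(0.2),
            nn.Linear(embed_dim, embed_dim),
        )
    def forward(self, labels):                  # labels: (N,) int64
        return self.mlp(self.embed(labels))

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=64, embed_dim=128, img_channels=3):
        super().__init__()
        self.mapping = ClassMappingNet(embed_dim=embed_dim)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + embed_dim, 256, 4, 1, 0), nn.ReLU(),  # 1x1 -> 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),                # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),                 # 16x16
            nn.ConvTranspose2d(64, img_channels, 4, 2, 1), nn.Tanh(),        # 32x32
        )
    def forward(self, z, labels):
        cond = self.mapping(labels)                        # (N, embed_dim)
        x = torch.cat([z, cond], dim=1)[..., None, None]   # (N, z_dim + embed_dim, 1, 1)
        return self.net(x)

fake = ConditionalGenerator()(torch.randn(8, 64), torch.randint(0, 5, (8,)))  # (8, 3, 32, 32)
```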

Learning Kernel-Modulated Neural Representation for Efficient Light Field Compression

  • paper_url: http://arxiv.org/abs/2307.06143
  • repo_url: None
  • paper_authors: Jinglei Shi, Yihong Xu, Christine Guillemot
  • for: Compressing light field data, a type of image data that captures 3D scene information.
  • methods: The paper proposes a compact neural network representation for light field compression, inspired by the visual characteristics of the Sub-Aperture Images (SAIs) of a light field. The network combines two types of kernels: descriptive kernels that store scene description information, and modulatory kernels that control the rendering of different SAIs from the queried perspectives (see the kernel-modulation sketch after the abstract below).
  • results: The proposed method outperforms other state-of-the-art methods by a significant margin on the light field compression task. Moreover, modulators learned from one light field can be transferred to new light fields for rendering dense views, suggesting a potential solution for view synthesis.
    Abstract Light field is a type of image data that captures the 3D scene information by recording light rays emitted from a scene at various orientations. It offers a more immersive perception than classic 2D images but at the cost of huge data volume. In this paper, we draw inspiration from the visual characteristics of Sub-Aperture Images (SAIs) of light field and design a compact neural network representation for the light field compression task. The network backbone takes randomly initialized noise as input and is supervised on the SAIs of the target light field. It is composed of two types of complementary kernels: descriptive kernels (descriptors) that store scene description information learned during training, and modulatory kernels (modulators) that control the rendering of different SAIs from the queried perspectives. To further enhance compactness of the network meanwhile retain high quality of the decoded light field, we accordingly introduce modulator allocation and kernel tensor decomposition mechanisms, followed by non-uniform quantization and lossless entropy coding techniques, to finally form an efficient compression pipeline. Extensive experiments demonstrate that our method outperforms other state-of-the-art (SOTA) methods by a significant margin in the light field compression task. Moreover, after aligning descriptors, the modulators learned from one light field can be transferred to new light fields for rendering dense views, indicating a potential solution for view synthesis task.
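The descriptor/modulator split can be pictured as a convolution whose kernels are shared across views and rescaled by a small per-view vector: rendering a different SAI means swapping the modulator while keeping the descriptors fixed. The channel-wise scaling, layer count, and 49-view grid below are assumptions, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv(nn.Module):
    """Shared descriptive kernels, rescaled by a per-view modulator (assumed: channel-wise scaling)."""
    def __init__(self, in_ch, out_ch, num_views, k=3):
        super().__init__()
        self.descriptor = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.02)  # scene description
        self.modulators = nn.Parameter(torch.ones(num_views, out_ch))            # one vector per SAI
    def forward(self, x, view_idx):
        w = self.descriptor * self.modulators[view_idx].view(-1, 1, 1, 1)  # modulate output channels
        return F.conv2d(x, w, padding=1)

# Rendering two different sub-aperture images from the same fixed noise input.
layers = nn.ModuleList([ModulatedConv(16, 16, num_views=49) for _ in range(3)])
to_rgb = nn.Conv2d(16, 3, 1)
noise = torch.randn(1, 16, 64, 64)          # fixed, randomly initialized network input

def render(view_idx):
    h = noise
    for layer in layers:
        h = torch.relu(layer(h, view_idx))
    return to_rgb(h)

sai_0, sai_24 = render(0), render(24)       # two SAIs share descriptors, differ only by modulators
```

In this picture, the transfer experiment in the abstract corresponds to keeping the learned modulators while re-fitting (aligning) the descriptors on a new light field.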

Spatially-Adaptive Learning-Based Image Compression with Hierarchical Multi-Scale Latent Spaces

  • paper_url: http://arxiv.org/abs/2307.06102
  • repo_url: None
  • paper_authors: Fabian Brand, Alexander Kopte, Kristian Fischer, André Kaup
  • for: Improving the efficiency of image and video compression systems.
  • methods: A state-of-the-art compression network is extended with a second hierarchical latent-space level for multi-scale processing, and RDONet's rate variability is extended with a gain unit (a toy two-scale sketch follows the abstract below).
  • results: 7% rate savings over an equivalent traditional autoencoder, with only a marginal complexity increase and potentially even lower decoding time.
    Abstract Adaptive block partitioning is responsible for large gains in current image and video compression systems. This method is able to compress large stationary image areas with only a few symbols, while maintaining a high level of quality in more detailed areas. Current state-of-the-art neural-network-based image compression systems however use only one scale to transmit the latent space. In previous publications, we proposed RDONet, a scheme to transmit the latent space in multiple spatial resolutions. Following this principle, we extend a state-of-the-art compression network by a second hierarchical latent-space level to enable multi-scale processing. We extend the existing rate variability capabilities of RDONet by a gain unit. With that we are able to outperform an equivalent traditional autoencoder by 7% rate savings. Furthermore, we show that even though we add an additional latent space, the complexity only increases marginally and the decoding time can potentially even be decreased.
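A toy two-level latent space with a gain unit might look like the sketch below: a coarse latent is produced by further downsampling the fine one, both are quantized after multiplication with a learned per-channel gain vector (one per rate point), and the decoder fuses the two scales. All channel counts and the nearest-neighbour upsampling are assumptions; RDONet's actual block partitioning and masking logic is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoScaleLatentCodec(nn.Module):
    """Toy two-level latent space (assumed shapes), with a gain unit for rate variability."""
    def __init__(self, ch=64, num_rates=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, ch, 5, 2, 2), nn.ReLU(), nn.Conv2d(ch, ch, 5, 2, 2))
        self.to_coarse = nn.Conv2d(ch, ch, 5, 2, 2)          # extra downsampling -> coarse latent
        self.gain = nn.Parameter(torch.ones(num_rates, ch))  # per-channel gain per rate point
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(2 * ch, ch, 5, 2, 2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 5, 2, 2, output_padding=1),
        )
    def forward(self, x, rate_idx=0):
        y_fine = self.enc(x)                                   # (N, ch, H/4, W/4)
        y_coarse = self.to_coarse(y_fine)                      # (N, ch, H/8, W/8)
        g = self.gain[rate_idx].view(1, -1, 1, 1)
        y_fine_q = torch.round(y_fine * g) / g                 # gain before quantization, inverse after
        y_coarse_q = torch.round(y_coarse * g) / g             # (straight-through gradient omitted)
        y_up = F.interpolate(y_coarse_q, size=y_fine_q.shape[-2:], mode="nearest")
        return self.dec(torch.cat([y_fine_q, y_up], dim=1))

x_hat = TwoScaleLatentCodec()(torch.randn(1, 3, 256, 256), rate_idx=2)  # (1, 3, 256, 256)
```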

ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression

  • paper_url: http://arxiv.org/abs/2307.06342
  • repo_url: None
  • paper_authors: Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
  • for: Proposing an efficient ConvNeXt-based transform coding framework that reduces the coding rate while preserving reconstruction fidelity.
  • methods: A ConvNeXt-based transform is paired with a compute-efficient channel-wise auto-regressive prior (ChARM) that captures both global and local context from the hyper and quantized latent representations (sketched after the abstract below).
  • results: On four widely used datasets, ConvNeXt-ChARM achieves average BD-rate (PSNR) reductions of 5.24% and 1.22% over the VVC reference encoder (VTM-18.0) and the state-of-the-art learned codec SwinT-ChARM, respectively. Model scaling studies and objective/subjective analyses further demonstrate its computational efficiency and the performance gap between ConvNeXt and Swin Transformer.
    Abstract Over the last few years, neural image compression has gained wide attention from research and industry, yielding promising end-to-end deep neural codecs outperforming their conventional counterparts in rate-distortion performance. Despite significant advancement, current methods, including attention-based transform coding, still need to be improved in reducing the coding rate while preserving the reconstruction fidelity, especially in non-homogeneous textured image areas. Those models also require more parameters and a higher decoding time. To tackle the above challenges, we propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive prior to capturing both global and local contexts from the hyper and quantized latent representations. The proposed architecture can be optimized end-to-end to fully exploit the context information and extract compact latent representation while reconstructing higher-quality images. Experimental results on four widely-used datasets showed that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions estimated on average to 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively. Moreover, we provide model scaling studies to verify the computational efficiency of our approach and conduct several objective and subjective analyses to bring to the fore the performance gap between the next generation ConvNet, namely ConvNeXt, and Swin Transformer.
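The channel-wise autoregressive ("ChARM") prior can be sketched as follows: the latent is split into channel slices, and the Gaussian entropy parameters of each slice are predicted from the hyperprior features together with all previously decoded slices. Slice count, channel widths, and the residual rounding below are illustrative assumptions, not the paper's exact entropy model.

```python
import torch
import torch.nn as nn

class ChannelWiseARPrior(nn.Module):
    """Sketch of a channel-wise auto-regressive prior: each latent slice's Gaussian
    parameters depend on the hyperprior plus all previously decoded slices."""
    def __init__(self, latent_ch=192, hyper_ch=64, num_slices=4):
        super().__init__()
        self.num_slices = num_slices
        self.slice_ch = latent_ch // num_slices
        self.param_nets = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * self.slice_ch, 128, 3, 1, 1), nn.ReLU(),
                nn.Conv2d(128, 2 * self.slice_ch, 3, 1, 1),   # mean and (log-)scale per channel
            ) for i in range(num_slices)
        ])
    def forward(self, y, hyper_feat):
        slices = y.chunk(self.num_slices, dim=1)
        decoded, means, scales = [], [], []
        for i, net in enumerate(self.param_nets):
            ctx = torch.cat([hyper_feat] + decoded, dim=1)      # hyperprior + previous slices
            mu, log_sigma = net(ctx).chunk(2, dim=1)
            means.append(mu); scales.append(torch.exp(log_sigma))
            decoded.append(torch.round(slices[i] - mu) + mu)    # quantize the residual around the mean
        return torch.cat(decoded, 1), torch.cat(means, 1), torch.cat(scales, 1)

y_hat, mu, sigma = ChannelWiseARPrior()(torch.randn(1, 192, 16, 16), torch.randn(1, 64, 16, 16))
```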

AICT: An Adaptive Image Compression Transformer

  • paper_url: http://arxiv.org/abs/2307.06091
  • repo_url: None
  • paper_authors: Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
  • for: Improving on SwinT-ChARM, following an efficiency investigation of Transformer-based transform coding.
  • methods: A more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, combined with a learnable scaling module and a ConvNeXt-based pre/post-processor to extract more compact latent representations (the scaling idea is sketched after the abstract below).
  • results: The proposed Adaptive Image Compression Transformer (AICT) significantly improves the trade-off between coding efficiency and decoder complexity over the VVC reference encoder (VTM-18.0) and SwinT-ChARM.
    Abstract Motivated by the efficiency investigation of the Transformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, first with a more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Current methods that still rely on ConvNet-based entropy coding are limited in modeling long-range dependencies due to their local connectivity and an increasing number of architectural biases and priors. On the contrary, the proposed ICT can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation while reconstructing higher-quality images. Extensive experimental results on benchmark datasets showed that the proposed adaptive image compression transformer (AICT) framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM.
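The learnable scaling idea can be pictured as a light wrapper around any codec: a tiny network predicts a per-image resize factor, the image is downscaled before coding and upscaled back afterwards. The predictor architecture, the [0.5, 1] factor range, and bicubic resampling are assumptions; the ConvNeXt pre/post-processor and the Transformer prior are not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableScaling(nn.Module):
    """Sketch: predict a per-image resize factor, downscale before coding, upscale after."""
    def __init__(self, min_scale=0.5):
        super().__init__()
        self.min_scale = min_scale
        self.predictor = nn.Sequential(
            nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1), nn.Sigmoid(),
        )
    def forward(self, x, codec):
        s = self.min_scale + (1 - self.min_scale) * self.predictor(x).mean()  # scalar in [min_scale, 1]
        h, w = x.shape[-2:]
        # float(s) breaks the gradient through the factor; kept simple for the sketch
        x_small = F.interpolate(x, scale_factor=float(s), mode="bicubic", align_corners=False)
        x_hat_small = codec(x_small)                         # any image codec: x -> x_hat
        return F.interpolate(x_hat_small, size=(h, w), mode="bicubic", align_corners=False)

out = LearnableScaling()(torch.randn(1, 3, 256, 256), codec=lambda t: t)  # identity codec for illustration
```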

Flexible and Fully Quantized Ultra-Lightweight TinyissimoYOLO for Ultra-Low-Power Edge Systems

  • paper_url: http://arxiv.org/abs/2307.05999
  • repo_url: None
  • paper_authors: Julian Moosmann, Hanna Mueller, Nicky Zimmerman, Georg Rutishauser, Luca Benini, Michele Magno
  • for: Deploying and exploring variants of TinyissimoYOLO, an ultra-lightweight, fully quantized object detection network, on edge systems with a power envelope of a few milliwatts.
  • methods: The detection performance of TinyissimoYOLO variants is characterized experimentally, measuring the impact of input resolution, number of object classes, and hidden-layer adjustments (a weight-quantization sketch follows the abstract below).
  • results: Latency and energy efficiency are compared across state-of-the-art ultra-low-power edge platforms, including GAP9 from Greenwaves, STM32H7 from ST Microelectronics, STM32L4 from ST, Apollo4b from Ambiq, and the MAX78000 from Analog Devices. GAP9's hardware accelerator achieves the lowest inference latency and energy at 2.12 ms and 150 uJ, roughly 2x faster and 20% more efficient than the next best platform, the MAX78000.
    Abstract This paper deploys and explores variants of TinyissimoYOLO, a highly flexible and fully quantized ultra-lightweight object detection network designed for edge systems with a power envelope of a few milliwatts. With experimental measurements, we present a comprehensive characterization of the network's detection performance, exploring the impact of various parameters, including input resolution, number of object classes, and hidden layer adjustments. We deploy variants of TinyissimoYOLO on state-of-the-art ultra-low-power extreme edge platforms, presenting an in-depth comparison on latency, energy efficiency, and their ability to efficiently parallelize the workload. In particular, the paper presents a comparison between a novel parallel RISC-V processor (GAP9 from Greenwaves) with and without use of its on-chip hardware accelerator, an ARM Cortex-M7 core (STM32H7 from ST Microelectronics), two ARM Cortex-M4 cores (STM32L4 from STM and Apollo4b from Ambiq), and a multi-core platform with a CNN hardware accelerator (Analog Devices MAX78000). Experimental results show that the GAP9's hardware accelerator achieves the lowest inference latency and energy at 2.12ms and 150uJ respectively, which is around 2x faster and 20% more efficient than the next best platform, the MAX78000. The hardware accelerator of GAP9 can even run an increased resolution version of TinyissimoYOLO with 112x112 pixels and 10 detection classes within 3.2ms, consuming 245uJ. To showcase the competitiveness of a versatile general-purpose system we also deployed and profiled a multi-core implementation on GAP9 at different operating points, achieving 11.3ms with the lowest-latency and 490uJ with the most energy-efficient configuration. With this paper, we demonstrate the suitability and flexibility of TinyissimoYOLO on state-of-the-art detection datasets for real-time ultra-low-power edge inference.
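To illustrate what "fully quantized" means at the weight level, here is a minimal, toolchain-agnostic sketch of symmetric per-tensor int8 quantization; the actual deployment flows for GAP9, the STM32 parts, Apollo4b, and the MAX78000 use their own tools and are not reproduced here.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ~= scale * w_q with w_q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    w_q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return w_q, scale

def dequantize(w_q, scale):
    return w_q.astype(np.float32) * scale

w = np.random.randn(16, 3, 3, 3).astype(np.float32)     # a toy conv weight tensor
w_q, scale = quantize_int8(w)
err = np.abs(w - dequantize(w_q, scale)).max()
print(f"int8 storage: {w_q.nbytes} bytes (vs {w.nbytes} float32), max abs error {err:.4f}")
```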

FreeSeed: Frequency-band-aware and Self-guided Network for Sparse-view CT Reconstruction

  • paper_url: http://arxiv.org/abs/2307.05890
  • repo_url: https://github.com/masaaki-75/freeseed
  • paper_authors: Chenglong Ma, Zilong Li, Junping Zhang, Yi Zhang, Hongming Shan
  • for: Sparse-view CT speeds up scanning and reduces radiation exposure, but the reconstructed images suffer from severe streak artifacts that compromise subsequent screening and diagnosis; this work aims to remove those artifacts.
  • methods: A deep-learning-based image post-processing method and its dual-domain counterpart: a frequency-band-aware artifact modeling network (FreeNet) paired with a self-guided artifact refinement network (SeedNet), together forming FreeSeed (a band-attention sketch follows the abstract below).
  • results: The method effectively removes streak artifacts and recovers missing detail, outperforming state-of-the-art sparse-view CT reconstruction methods.
    Abstract Sparse-view computed tomography (CT) is a promising solution for expediting the scanning process and mitigating radiation exposure to patients, the reconstructed images, however, contain severe streak artifacts, compromising subsequent screening and diagnosis. Recently, deep learning-based image post-processing methods along with their dual-domain counterparts have shown promising results. However, existing methods usually produce over-smoothed images with loss of details due to (1) the difficulty in accurately modeling the artifact patterns in the image domain, and (2) the equal treatment of each pixel in the loss function. To address these issues, we concentrate on the image post-processing and propose a simple yet effective FREquency-band-awarE and SElf-guidED network, termed FreeSeed, which can effectively remove artifact and recover missing detail from the contaminated sparse-view CT images. Specifically, we first propose a frequency-band-aware artifact modeling network (FreeNet), which learns artifact-related frequency-band attention in Fourier domain for better modeling the globally distributed streak artifact on the sparse-view CT images. We then introduce a self-guided artifact refinement network (SeedNet), which leverages the predicted artifact to assist FreeNet in continuing to refine the severely corrupted details. Extensive experiments demonstrate the superior performance of FreeSeed and its dual-domain counterpart over the state-of-the-art sparse-view CT reconstruction methods. Source code is made available at https://github.com/Masaaki-75/freeseed.
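The frequency-band-aware modeling in FreeNet can be loosely illustrated by a module that learns one attention weight per radial frequency band in the Fourier domain, a natural fit for globally distributed streak artifacts. The band count, radial banding, and sigmoid weighting below are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class FrequencyBandAttention(nn.Module):
    """Sketch: learn one attention weight per radial frequency band in the Fourier domain."""
    def __init__(self, num_bands=8):
        super().__init__()
        self.num_bands = num_bands
        self.band_weights = nn.Parameter(torch.ones(num_bands))
    def forward(self, x):                                  # x: (N, C, H, W)
        X = torch.fft.fft2(x)
        h, w = x.shape[-2:]
        fy = torch.fft.fftfreq(h, device=x.device).abs()   # normalized frequencies in [0, 0.5]
        fx = torch.fft.fftfreq(w, device=x.device).abs()
        radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)       # (H, W)
        band = (radius / radius.max() * (self.num_bands - 1)).long()   # band index per frequency
        weights = torch.sigmoid(self.band_weights)[band]               # (H, W), in (0, 1)
        return torch.fft.ifft2(X * weights).real

y = FrequencyBandAttention()(torch.randn(2, 16, 64, 64))   # same shape, band-reweighted features
```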

Denoising Simulated Low-Field MRI (70mT) using Denoising Autoencoders (DAE) and Cycle-Consistent Generative Adversarial Networks (Cycle-GAN)

  • paper_url: http://arxiv.org/abs/2307.06338
  • repo_url: None
  • paper_authors: Fernando Vega, Abdoljalil Addeh, M. Ethan MacDonald
  • for: Improving the quality of low-field magnetic resonance imaging (MRI).
  • methods: A denoising autoencoder (DAE) and a Cycle-GAN are trained on low-field MRI simulated via resampling and additive Rician noise, in both paired and unpaired settings (the noise simulation is sketched after the abstract below).
  • results: In simulation, the Cycle-GAN recovers high-field-like, high-resolution, high-SNR images from low-field MRI and, unlike the DAE, does not require paired images.
    Abstract In this work, a denoising Cycle-GAN (Cycle Consistent Generative Adversarial Network) is implemented to yield high-field, high resolution, high signal-to-noise ratio (SNR) Magnetic Resonance Imaging (MRI) images from simulated low-field, low resolution, low SNR MRI images. Resampling and additive Rician noise were used to simulate low-field MRI. Images were utilized to train a Denoising Autoencoder (DAE) and a Cycle-GAN, with paired and unpaired cases. Both networks were evaluated using SSIM and PSNR image quality metrics. This work demonstrates the use of a generative deep learning model that can outperform classical DAEs to improve low-field MRI images and does not require image pairs.
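The low-field simulation described in the abstract (resampling plus additive Rician noise) can be sketched as below: the image is blurred by down/up-sampling, Gaussian noise is added to a real and an imaginary channel, and the magnitude is taken, which yields Rician-distributed noise. The downsampling factor and noise level are assumed values.

```python
import numpy as np
from scipy.ndimage import zoom

def simulate_low_field(img, downsample=0.5, sigma=0.05):
    """Resample to lower resolution and add Rician noise (sigma relative to max intensity, assumed)."""
    low_res = zoom(zoom(img, downsample, order=3), 1.0 / downsample, order=3)  # lose high frequencies
    s = sigma * img.max()
    real = low_res + np.random.normal(0, s, low_res.shape)
    imag = np.random.normal(0, s, low_res.shape)
    return np.sqrt(real ** 2 + imag ** 2)    # magnitude image -> Rician-distributed noise

clean = np.random.rand(128, 128).astype(np.float32)   # stand-in for a high-field MRI slice
noisy = simulate_low_field(clean)
```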

Improving Segmentation and Detection of Lesions in CT Scans Using Intensity Distribution Supervision

  • paper_url: http://arxiv.org/abs/2307.05804
  • repo_url: https://github.com/rsummers11/CADLab
  • paper_authors: Seung Yeon Shin, Thomas C. Shen, Ronald M. Summers
  • for: Improving the training of lesion segmentation and detection networks for CT scans.
  • methods: An intensity-based lesion probability (ILP) function is built from an intensity histogram of the target lesion, and the resulting ILP map is provided as additional supervision during network training at no extra labeling cost (a sketch follows the abstract below).
  • results: Per-case Dice scores improve for small bowel carcinoid tumor (41.3% -> 47.8%), kidney tumor (74.2% -> 76.0%), and lung nodule (26.4% -> 32.7%) segmentation, and kidney tumor detection improves from 64.6% to 75.5% average precision.
    Abstract We propose a method to incorporate the intensity information of a target lesion on CT scans in training segmentation and detection networks. We first build an intensity-based lesion probability (ILP) function from an intensity histogram of the target lesion. It is used to compute the probability of being the lesion for each voxel based on its intensity. Finally, the computed ILP map of each input CT scan is provided as additional supervision for network training, which aims to inform the network about possible lesion locations in terms of intensity values at no additional labeling cost. The method was applied to improve the segmentation of three different lesion types, namely, small bowel carcinoid tumor, kidney tumor, and lung nodule. The effectiveness of the proposed method on a detection task was also investigated. We observed improvements of 41.3% -> 47.8%, 74.2% -> 76.0%, and 26.4% -> 32.7% in segmenting small bowel carcinoid tumor, kidney tumor, and lung nodule, respectively, in terms of per case Dice scores. An improvement of 64.6% -> 75.5% was achieved in detecting kidney tumors in terms of average precision. The results of different usages of the ILP map and the effect of varied amount of training data are also presented.
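A minimal sketch of the intensity-based lesion probability (ILP) idea: build a histogram of intensities inside annotated lesions, normalize it into a per-intensity probability-like value, and look it up for every voxel of an input CT to form an extra supervision map. The HU range, bin count, and max-normalization are assumptions.

```python
import numpy as np

def build_ilp(lesion_intensities, bins=100, value_range=(-200, 400)):
    """Estimate a lesion-probability-like value per intensity from a histogram (assumed HU range)."""
    hist, edges = np.histogram(lesion_intensities, bins=bins, range=value_range, density=True)
    hist = hist / hist.max()                       # scale to [0, 1]
    return hist, edges

def ilp_map(ct_volume, hist, edges):
    """Look up the ILP value for every voxel of a CT volume."""
    idx = np.clip(np.digitize(ct_volume, edges) - 1, 0, len(hist) - 1)
    return hist[idx]

lesion_voxels = np.random.normal(60, 15, 5000)                 # intensities sampled inside annotated lesions
hist, edges = build_ilp(lesion_voxels)
volume = np.random.normal(40, 80, (32, 128, 128))              # stand-in CT volume (HU)
supervision = ilp_map(volume, hist, edges)                     # extra per-voxel supervision channel
```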
    摘要 We applied this method to improve the segmentation of three lesion types: small bowel carcinoid tumor, kidney tumor, and lung nodule. Our results show improvements of 41.3% to 47.8%, 74.2% to 76.0%, and 26.4% to 32.7% in segmenting these lesions, respectively, in terms of per case Dice scores. We also achieved an improvement of 64.6% to 75.5% in detecting kidney tumors in terms of average precision.We also explored the effect of using the ILP map in different ways and the impact of varying amounts of training data. Our results show that the ILP map can be used effectively to improve the accuracy of lesion segmentation and detection, and that more training data can lead to better performance.

A Hierarchical Transformer Encoder to Improve Entire Neoplasm Segmentation on Whole Slide Image of Hepatocellular Carcinoma

  • paper_url: http://arxiv.org/abs/2307.05800
  • repo_url: None
  • paper_authors: Zhuxian Guo, Qitong Wang, Henning Müller, Themis Palpanas, Nicolas Loménie, Camille Kurtz
  • for: Proposing a deep learning architecture for entire neoplasm segmentation on Whole Slide Images (WSI) of Hepatocellular Carcinoma (HCC).
  • methods: A hierarchical Transformer encoder, HiTrans, learns global dependencies within expanded 4096x4096 WSI patches (a hierarchical-encoding sketch follows the abstract below).
  • results: The proposed method achieves better segmentation performance by taking regional and global dependency information into account.
    Abstract In digital histopathology, entire neoplasm segmentation on Whole Slide Image (WSI) of Hepatocellular Carcinoma (HCC) plays an important role, especially as a preprocessing filter to automatically exclude healthy tissue, in histological molecular correlations mining and other downstream histopathological tasks. The segmentation task remains challenging due to HCC's inherent high-heterogeneity and the lack of dependency learning in large field of view. In this article, we propose a novel deep learning architecture with a hierarchical Transformer encoder, HiTrans, to learn the global dependencies within expanded 4096$\times$4096 WSI patches. HiTrans is designed to encode and decode the patches with larger reception fields and the learned global dependencies, compared to the state-of-the-art Fully Convolutional Neural networks (FCNN). Empirical evaluations verified that HiTrans leads to better segmentation performance by taking into account regional and global dependency information.
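The hierarchical idea, local encoding of tiles followed by a Transformer that models global dependencies across the whole 4096x4096 patch, can be sketched as below. The tile size, the small CNN tokenizer, and the coarse per-tile output head are assumptions made to keep the example self-contained, not HiTrans itself.

```python
import torch
import torch.nn as nn

class HierarchicalWSIEncoder(nn.Module):
    """Sketch: local CNN per tile, then a Transformer over the grid of tile tokens."""
    def __init__(self, tile=256, dim=256, num_classes=2):
        super().__init__()
        self.tile = tile
        self.local = nn.Sequential(                     # cheap local feature extractor per tile
            nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)         # coarse, per-tile segmentation logits
    def forward(self, x):                               # x: (N, 3, H, W) with H, W multiples of tile
        n, c, h, w = x.shape
        g = h // self.tile
        tiles = x.unfold(2, self.tile, self.tile).unfold(3, self.tile, self.tile)  # (N, 3, g, g, t, t)
        tiles = tiles.permute(0, 2, 3, 1, 4, 5).reshape(n * g * g, c, self.tile, self.tile)
        tokens = self.local(tiles).reshape(n, g * g, -1)        # one token per tile
        tokens = self.global_enc(tokens)                        # global dependencies across tiles
        return self.head(tokens).reshape(n, g, g, -1)           # (N, g, g, num_classes)

# small demo input; the paper's expanded patches are 4096x4096 (a 16x16 tile grid)
logits = HierarchicalWSIEncoder()(torch.randn(1, 3, 1024, 1024))   # (1, 4, 4, 2)
```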

3D Medical Image Segmentation based on multi-scale MPU-Net

  • paper_url: http://arxiv.org/abs/2307.05799
  • repo_url: https://github.com/Stefan-Yu404/MP-UNet
  • paper_authors: Zeqiu. Yu, Shuo. Han, Ziheng. Song
  • for: Proposing a Transformer-assisted model for fast, high-precision tumor segmentation, to lower misdiagnosis rates and reduce the burden on clinicians.
  • methods: MPU-Net combines image serialization with a Position Attention Module to capture deeper contextual dependencies, equips each decoder layer with a multi-scale module and a cross-attention mechanism, and uses a hybrid loss function to better exploit high-resolution information (a cross-attention sketch follows the abstract below).
  • results: On the LiTS 2017 dataset, MPU-Net clearly outperforms the benchmark U-Net; the best segmentation results reach 92.17% Dice, 99.08% accuracy, 91.91% precision, 99.52% specificity, 85.91% IoU, and 91.74% MCC.
    Abstract The high cure rate of cancer is inextricably linked to physicians' accuracy in diagnosis and treatment, therefore a model that can accomplish high-precision tumor segmentation has become a necessity in many applications of the medical industry. It can effectively lower the rate of misdiagnosis while considerably lessening the burden on clinicians. However, fully automated target organ segmentation is problematic due to the irregular stereo structure of 3D volume organs. As a basic model for this class of real applications, U-Net excels. It can learn certain global and local features, but still lacks the capacity to grasp spatial long-range relationships and contextual information at multiple scales. This paper proposes a tumor segmentation model MPU-Net for patient volume CT images, which is inspired by Transformer with a global attention mechanism. By combining image serialization with the Position Attention Module, the model attempts to comprehend deeper contextual dependencies and accomplish precise positioning. Each layer of the decoder is also equipped with a multi-scale module and a cross-attention mechanism. The capability of feature extraction and integration at different levels has been enhanced, and the hybrid loss function developed in this study can better exploit high-resolution characteristic information. Moreover, the suggested architecture is tested and evaluated on the Liver Tumor Segmentation Challenge 2017 (LiTS 2017) dataset. Compared with the benchmark model U-Net, MPU-Net shows excellent segmentation results. The dice, accuracy, precision, specificity, IOU, and MCC metrics for the best model segmentation results are 92.17%, 99.08%, 91.91%, 99.52%, 85.91%, and 91.74%, respectively. Outstanding indicators in various aspects illustrate the exceptional performance of this framework in automatic medical image segmentation.
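One of the ingredients, cross-attention between decoder features and the corresponding encoder features, can be sketched with a standard multi-head attention layer as below; the feature sizes and the residual/LayerNorm arrangement are assumptions, and the multi-scale modules and hybrid loss are not shown.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Sketch: decoder features attend to encoder skip features at the same scale."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
    def forward(self, dec_feat, enc_feat):          # both (N, C, D, H, W) at matching scales
        q = dec_feat.flatten(2).transpose(1, 2)     # (N, voxels, C)
        kv = enc_feat.flatten(2).transpose(1, 2)
        fused, _ = self.attn(q, kv, kv)             # cross-attention: decoder queries encoder context
        fused = self.norm(fused + q)                # residual + LayerNorm
        return fused.transpose(1, 2).reshape(dec_feat.shape)

dec = torch.randn(1, 64, 8, 16, 16)                # toy 3D feature maps
enc = torch.randn(1, 64, 8, 16, 16)
out = CrossAttentionFusion()(dec, enc)             # (1, 64, 8, 16, 16)
```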

SepHRNet: Generating High-Resolution Crop Maps from Remote Sensing imagery using HRNet with Separable Convolution

  • paper_url: http://arxiv.org/abs/2307.05700
  • repo_url: None
  • paper_authors: Priyanka Goyal, Sohan Patnaik, Adway Mitra, Manjira Sinha
  • for: Improving the accuracy of crop mapping from remote sensing imagery, in support of food security, resource management, and sustainable agricultural practices.
  • methods: An HRNet backbone extracts high-resolution features; spatially separable convolutions in the shallow layers capture intricate crop patterns at lower computational cost, multi-head self-attention captures long-term temporal dependencies, a CNN decoder generates the crop map, and AdaBoost is applied on top to further improve accuracy (a separable-convolution sketch follows the abstract below).
  • results: The pipeline achieves 97.5% classification accuracy and 55.2% IoU on the ZueriCrop dataset, outperforming state-of-the-art models such as U-Net++, ResNet50, VGG19, InceptionV3, DenseNet, and EfficientNet.
    Abstract The accurate mapping of crop production is crucial for ensuring food security, effective resource management, and sustainable agricultural practices. One way to achieve this is by analyzing high-resolution satellite imagery. Deep Learning has been successful in analyzing images, including remote sensing imagery. However, capturing intricate crop patterns is challenging due to their complexity and variability. In this paper, we propose a novel Deep learning approach that integrates HRNet with Separable Convolutional layers to capture spatial patterns and Self-attention to capture temporal patterns of the data. The HRNet model acts as a backbone and extracts high-resolution features from crop images. Spatially separable convolution in the shallow layers of the HRNet model captures intricate crop patterns more effectively while reducing the computational cost. The multi-head attention mechanism captures long-term temporal dependencies from the encoded vector representation of the images. Finally, a CNN decoder generates a crop map from the aggregated representation. Adaboost is used on top of this to further improve accuracy. The proposed algorithm achieves a high classification accuracy of 97.5\% and IoU of 55.2\% in generating crop maps. We evaluate the performance of our pipeline on the Zuericrop dataset and demonstrate that our results outperform state-of-the-art models such as U-Net++, ResNet50, VGG19, InceptionV3, DenseNet, and EfficientNet. This research showcases the potential of Deep Learning for Earth Observation Systems.
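A spatially separable convolution of the kind used in the shallow layers factors a k x k kernel into a k x 1 followed by a 1 x k convolution, roughly trading k^2 for 2k multiplications per channel pair; the sketch below compares parameter counts against a dense 3x3 convolution. Channel sizes are assumed, and the HRNet backbone, temporal self-attention, and AdaBoost stage are not reproduced.

```python
import torch
import torch.nn as nn

class SpatiallySeparableConv(nn.Module):
    """Sketch: factor a k x k convolution into k x 1 followed by 1 x k."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.vertical = nn.Conv2d(in_ch, out_ch, (k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, k // 2))
    def forward(self, x):
        return self.horizontal(self.vertical(x))

block = SpatiallySeparableConv(64, 64)
dense = nn.Conv2d(64, 64, 3, padding=1)
print(sum(p.numel() for p in block.parameters()), "vs", sum(p.numel() for p in dense.parameters()))
x = torch.randn(1, 64, 128, 128)
assert block(x).shape == dense(x).shape          # (1, 64, 128, 128), same output resolution
```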