eess.IV - 2023-08-06

FourLLIE: Boosting Low-Light Image Enhancement by Fourier Frequency Information

  • paper_url: http://arxiv.org/abs/2308.03033
  • repo_url: https://github.com/wangchx67/fourllie
  • paper_authors: Chenxi Wang, Hongjun Wu, Zhi Jin
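  • for: Improving the lightness and detail of low-light images.
  • methods: Combines Fourier frequency information with spatial information in a two-stage Fourier-based LLIE network (FourLLIE). The first stage raises lightness by estimating an amplitude transform map in the Fourier space; the second stage introduces an SNR map as a prior for integrating global Fourier frequency and local spatial information to recover image details.
  • results: FourLLIE outperforms existing SOTA LLIE methods on four representative datasets while maintaining good model efficiency.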
    Abstract Recently, Fourier frequency information has attracted much attention in Low-Light Image Enhancement (LLIE). Some researchers noticed that, in the Fourier space, the lightness degradation mainly exists in the amplitude component and the rest exists in the phase component. By incorporating both the Fourier frequency and the spatial information, these researchers proposed remarkable solutions for LLIE. In this work, we further explore the positive correlation between the magnitude of amplitude and the magnitude of lightness, which can be effectively leveraged to improve the lightness of low-light images in the Fourier space. Moreover, we find that the Fourier transform can extract the global information of the image, and does not introduce massive neural network parameters like Multi-Layer Perceptrons (MLPs) or Transformer. To this end, a two-stage Fourier-based LLIE network (FourLLIE) is proposed. In the first stage, we improve the lightness of low-light images by estimating the amplitude transform map in the Fourier space. In the second stage, we introduce the Signal-to-Noise-Ratio (SNR) map to provide the prior for integrating the global Fourier frequency and the local spatial information, which recovers image details in the spatial space. With this ingenious design, FourLLIE outperforms the existing state-of-the-art (SOTA) LLIE methods on four representative datasets while maintaining good model efficiency.
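
The amplitude/lightness relationship described in the abstract can be illustrated with a minimal NumPy sketch. It is not the FourLLIE network: the helper names are hypothetical, and a single constant `gain` stands in for the learned amplitude transform map, which is an assumption made only for illustration.

```python
import numpy as np

def split_amplitude_phase(img):
    """Return the Fourier amplitude and phase of a 2D image."""
    spec = np.fft.fft2(img)
    return np.abs(spec), np.angle(spec)

def merge_amplitude_phase(amp, phase):
    """Rebuild a real image from amplitude and phase."""
    spec = amp * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spec))

def scale_amplitude(img, gain):
    """Scale the amplitude by a constant gain and keep the phase unchanged.
    (Assumption: a constant gain replaces the learned transform map.)"""
    amp, phase = split_amplitude_phase(img)
    return merge_amplitude_phase(gain * amp, phase)

rng = np.random.default_rng(0)
normal = rng.uniform(0.2, 1.0, size=(32, 32))  # toy normal-light image
low = 0.1 * normal                             # toy low-light version
brightened = scale_amplitude(low, 10.0)        # larger amplitude, same phase
```

Because the Fourier transform is linear, multiplying the amplitude by a constant multiplies every pixel by the same constant, so `brightened` matches `normal` in this toy case. Real low-light degradation is not a uniform scaling, which is why a constant gain is only a stand-in here.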

Recurrent Spike-based Image Restoration under General Illumination

  • paper_url: http://arxiv.org/abs/2308.03018
  • repo_url: https://github.com/bit-vision/rsir
  • paper_authors: Lin Zhu, Yunlong Zheng, Mengyue Geng, Lizhi Wang, Hua Huang
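  • for: Restoring clear images from spike camera data under general illumination. A spike camera is a bio-inspired vision sensor that records light intensity as a spike array at high temporal resolution (20,000 Hz). Existing spike-based approaches typically assume sufficient light, which often does not hold in real-world scenes such as rainy days or dusk.
  • methods: Proposes the Recurrent Spike-based Image Restoration (RSIR) network, described as the first work towards restoring clear images from spike arrays under general illumination. A physics-based spike noise model is built from the camera's sampling process. The network designed on top of it consists of an adaptive spike transformation module, a recurrent temporal feature fusion module, and a frequency-based spike denoising module, and it processes the spike array recursively so that temporal information is well utilized.
  • results: Extensive experiments on real-world datasets with different illuminations demonstrate the effectiveness of the method. Code and dataset are released at https://github.com/BIT-Vision/RSIR.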
    Abstract Spike camera is a new type of bio-inspired vision sensor that records light intensity in the form of a spike array with high temporal resolution (20,000 Hz). This new paradigm of vision sensor offers significant advantages for many vision tasks such as high speed image reconstruction. However, existing spike-based approaches typically assume that the scenes are with sufficient light intensity, which is usually unavailable in many real-world scenarios such as rainy days or dusk scenes. To unlock more spike-based application scenarios, we propose a Recurrent Spike-based Image Restoration (RSIR) network, which is the first work towards restoring clear images from spike arrays under general illumination. Specifically, to accurately describe the noise distribution under different illuminations, we build a physical-based spike noise model according to the sampling process of the spike camera. Based on the noise model, we design our RSIR network which consists of an adaptive spike transformation module, a recurrent temporal feature fusion module, and a frequency-based spike denoising module. Our RSIR can process the spike array in a recursive manner to ensure that the spike temporal information is well utilized. In the training process, we generate the simulated spike data based on our noise model to train our network. Extensive experiments on real-world datasets with different illuminations demonstrate the effectiveness of the proposed network. The code and dataset are released at https://github.com/BIT-Vision/RSIR.
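
To make the spike-array representation concrete, here is a simplified, noise-free integrate-and-fire sketch. It is not the paper's noise model or the RSIR network; the function names, the subtract-on-fire reset, and the count-based reconstruction are assumptions chosen for illustration.

```python
import numpy as np

def simulate_spikes(intensity, n_steps, threshold=1.0):
    """Each pixel accumulates light; it fires a spike when the accumulator
    reaches the threshold. (Assumption: the threshold is subtracted on fire.)"""
    acc = np.zeros_like(intensity, dtype=float)
    spikes = np.zeros((n_steps,) + intensity.shape, dtype=np.uint8)
    for t in range(n_steps):
        acc += intensity            # light arriving during one time step
        fired = acc >= threshold
        spikes[t] = fired
        acc[fired] -= threshold
    return spikes

def reconstruct_by_count(spikes, threshold=1.0):
    """Estimate intensity from the average firing rate over the window."""
    return spikes.sum(axis=0) * threshold / spikes.shape[0]

intensity = np.array([[0.05, 0.5], [0.25, 0.9]])  # per-step light, toy values
n_steps = 400
spikes = simulate_spikes(intensity, n_steps)
recon = reconstruct_by_count(spikes)
```

In this toy model the darkest pixel fires roughly once every 20 steps, so a short time window contains very few spikes for it. That sparsity hints at why low illumination is harder for spike-based reconstruction.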

High-Resolution Vision Transformers for Pixel-Level Identification of Structural Components and Damage

  • paper_url: http://arxiv.org/abs/2308.03006
  • repo_url: None
  • paper_authors: Kareem Eltouny, Seyedomid Sajedi, Xiao Liang
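  • for: Efficiently parsing high-resolution visual inspection images of civil structures, such as bridge inspection images, at the pixel level.
  • methods: A semantic segmentation network based on vision transformers and Laplacian pyramids scaling networks. The model learns to resize high-resolution images and masks so that both local fine details and global semantics are retained without sacrificing computational efficiency.
  • results: Evaluated through comprehensive experiments on a dataset of bridge inspection report images, using multiple metrics for pixel-wise materials detection.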
    Abstract Visual inspection is predominantly used to evaluate the state of civil structures, but recent developments in unmanned aerial vehicles (UAVs) and artificial intelligence have increased the speed, safety, and reliability of the inspection process. In this study, we develop a semantic segmentation network based on vision transformers and Laplacian pyramids scaling networks for efficiently parsing high-resolution visual inspection images. The massive amounts of collected high-resolution images during inspections can slow down the investigation efforts. And while there have been extensive studies dedicated to the use of deep learning models for damage segmentation, processing high-resolution visual data can pose major computational difficulties. Traditionally, images are either uniformly downsampled or partitioned to cope with computational demands. However, the input is at risk of losing local fine details, such as thin cracks, or global contextual information. Inspired by super-resolution architectures, our vision transformer model learns to resize high-resolution images and masks to retain both the valuable local features and the global semantics without sacrificing computational efficiency. The proposed framework has been evaluated through comprehensive experiments on a dataset of bridge inspection report images using multiple metrics for pixel-wise materials detection.
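
The abstract refers to Laplacian pyramids. The sketch below shows the classical, non-learned version of that idea, assuming 2x average pooling for downsampling and nearest-neighbour repetition for upsampling; the paper's scaling networks learn the resizing instead, so this is only a reference point, and all function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve the resolution with 2x2 average pooling (even sizes assumed)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by repeating each pixel."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def build_laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse_residual]."""
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        pyramid.append(current - upsample(smaller))  # detail lost by resizing
        current = smaller
    pyramid.append(current)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: upsample the coarse image and add details back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = upsample(current) + detail
    return current

rng = np.random.default_rng(0)
image = rng.uniform(size=(64, 64))
pyramid = build_laplacian_pyramid(image, levels=3)
restored = reconstruct(pyramid)
```

The detail levels hold exactly what a plain downsample would discard (for example thin cracks), while the small coarse level carries global context, so the full-resolution image can be recovered from the pyramid.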

Weakly supervised segmentation of intracranial aneurysms using a 3D focal modulation UNet

  • paper_url: http://arxiv.org/abs/2308.03001
  • repo_url: None
  • paper_authors: Amirhossein Rasoulian, Soorena Salari, Yiming Xiao
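  • for: Improving the identification and quantification of unruptured intracranial aneurysms (UIAs) through automatic 3D segmentation, in support of risk assessment and treatment decisions.
  • methods: Weakly supervised learning with coarse UIA labels from time-of-flight MRA, using a novel 3D focal modulation UNet (FocalSegNet) followed by conditional random field (CRF) post-processing.
  • results: Achieves a Dice score of 0.68 and a 95% Hausdorff distance of 0.95 mm, outperforming the state-of-the-art 3D UNet and Swin-UNETR and showing the benefit of focal modulation for the task.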
    Abstract Accurate identification and quantification of unruptured intracranial aneurysms (UIAs) are essential for the risk assessment and treatment decisions of this cerebrovascular disorder. Current assessment based on 2D manual measures of aneurysms on 3D magnetic resonance angiography (MRA) is sub-optimal and time-consuming. Automatic 3D measures can significantly benefit the clinical workflow and treatment outcomes. However, one major issue in medical image segmentation is the need for large well-annotated data, which can be expensive to obtain. Techniques that mitigate the requirement, such as weakly supervised learning with coarse labels are highly desirable. In this paper, we leverage coarse labels of UIAs from time-of-flight MRAs to obtain refined UIAs segmentation using a novel 3D focal modulation UNet, called FocalSegNet and conditional random field (CRF) postprocessing, with a Dice score of 0.68 and 95% Hausdorff distance of 0.95 mm. We evaluated the performance of the proposed algorithms against the state-of-the-art 3D UNet and Swin-UNETR, and demonstrated the superiority of the proposed FocalSegNet and the benefit of focal modulation for the task.
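
The two reported metrics can be sketched in NumPy as follows. This is a brute-force illustration, not the authors' evaluation code: HD95 conventions vary between toolkits (surface voxels versus all foreground voxels, how the two directions are combined), and the version below uses all foreground voxels and takes the larger of the two directed 95th percentiles as an assumption.

```python
import numpy as np

def dice_score(pred, target):
    """Dice = 2 |A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else 2.0 * intersection / denom

def hd95(pred, target, spacing=1.0):
    """95th-percentile Hausdorff distance between foreground voxels.
    Both masks must be non-empty; spacing converts voxels to mm."""
    p = np.argwhere(pred) * spacing
    t = np.argwhere(target) * spacing
    dists = np.linalg.norm(p[:, None, :] - t[None, :, :], axis=-1)
    pred_to_target = dists.min(axis=1)
    target_to_pred = dists.min(axis=0)
    return max(np.percentile(pred_to_target, 95),
               np.percentile(target_to_pred, 95))

# Toy 3D example: a 4x4x4 cube and the same cube shifted by one voxel.
target = np.zeros((8, 8, 8), dtype=bool)
target[2:6, 2:6, 2:6] = True
pred = np.zeros((8, 8, 8), dtype=bool)
pred[3:7, 2:6, 2:6] = True

dice = dice_score(pred, target)   # 48 shared voxels out of 64 + 64
distance = hd95(pred, target)     # one-voxel shift
```

The pairwise distance matrix grows with the product of the two voxel counts, so this form only suits small masks; production evaluations typically rely on distance transforms.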

DermoSegDiff: A Boundary-aware Segmentation Diffusion Model for Skin Lesion Delineation

  • paper_url: http://arxiv.org/abs/2308.02959
  • repo_url: https://github.com/mindflow-institue/dermosegdiff
  • paper_authors: Afshin Bozorgpour, Yousef Sadegheih, Amirhossein Kazerouni, Reza Azad, Dorit Merhof
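  • for: Skin lesion segmentation for the early detection and accurate diagnosis of dermatological conditions.
  • methods: A diffusion-based (DDPM) segmentation framework that incorporates boundary information during learning, with a novel loss function that prioritizes boundaries and a novel U-Net-based denoising network that integrates noise and semantic information.
  • results: Experiments on multiple skin segmentation datasets show DermoSegDiff outperforming existing CNN, transformer, and diffusion-based approaches, with good effectiveness and generalization across scenarios.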
    Abstract Skin lesion segmentation plays a critical role in the early detection and accurate diagnosis of dermatological conditions. Denoising Diffusion Probabilistic Models (DDPMs) have recently gained attention for their exceptional image-generation capabilities. Building on these advancements, we propose DermoSegDiff, a novel framework for skin lesion segmentation that incorporates boundary information during the learning process. Our approach introduces a novel loss function that prioritizes the boundaries during training, gradually reducing the significance of other regions. We also introduce a novel U-Net-based denoising network that proficiently integrates noise and semantic information inside the network. Experimental results on multiple skin segmentation datasets demonstrate the superiority of DermoSegDiff over existing CNN, transformer, and diffusion-based approaches, showcasing its effectiveness and generalization in various scenarios. The implementation is publicly accessible on GitHub: https://github.com/mindflow-institue/dermosegdiff
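
The idea of a loss that prioritizes boundaries while down-weighting other regions can be sketched generically. The code below is not the DermoSegDiff loss: the 4-neighbour boundary definition, the weighted squared error, and the `other_weight` parameter (which a training schedule would lower over time) are all assumptions for illustration.

```python
import numpy as np

def boundary_mask(mask):
    """Foreground pixels with at least one 4-neighbour outside the mask."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, mode="constant", constant_values=False)
    interior = (padded[1:-1, 1:-1]
                & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior

def boundary_weighted_mse(pred, target, mask, other_weight):
    """Weighted mean squared error: boundary pixels get weight 1,
    all other pixels get `other_weight` (hypothetical parameter).
    Assumes the mask has a boundary or other_weight > 0."""
    weights = np.where(boundary_mask(mask), 1.0, other_weight)
    return float((weights * (pred - target) ** 2).sum() / weights.sum())

# Toy lesion mask: a 4x4 square, whose boundary is a 12-pixel ring.
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
target = mask.astype(float)
pred = target + boundary_mask(mask)   # unit error on the boundary only

loss_uniform = boundary_weighted_mse(pred, target, mask, other_weight=1.0)
loss_focused = boundary_weighted_mse(pred, target, mask, other_weight=0.1)
```

With uniform weights the boundary error is diluted across all 64 pixels; lowering `other_weight` makes the same boundary mistake dominate the loss.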

MomentaMorph: Unsupervised Spatial-Temporal Registration with Momenta, Shooting, and Correction

  • paper_url: http://arxiv.org/abs/2308.02949
  • repo_url: None
  • paper_authors: Zhangxing Bian, Shuwen Wei, Yihao Liu, Junyu Chen, Jiachen Zhuo, Fangxu Xing, Jonghye Woo, Aaron Carass, Jerry L. Prince
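  • for: Lagrangian motion estimation from tagged MRI (tMRI) in the presence of repetitive patterns and large motion, using a new "momenta, shooting, and correction" framework.
  • methods: Grounded in Lie algebra and Lie group principles, the framework accumulates momenta in the tangent vector space and applies exponential mapping in the diffeomorphic space to rapidly approach the true optimum while circumventing local optima. A subsequent correction step ensures convergence to the true optimum.
  • results: On a 2D synthetic dataset and a real 3D tMRI dataset, the method estimates accurate, dense, and diffeomorphic 2D/3D motion fields despite large motion and repetitive patterns.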
    Abstract Tagged magnetic resonance imaging (tMRI) has been employed for decades to measure the motion of tissue undergoing deformation. However, registration-based motion estimation from tMRI is difficult due to the periodic patterns in these images, particularly when the motion is large. With a larger motion, the registration approach gets trapped in a local optimum, leading to motion estimation errors. We introduce a novel "momenta, shooting, and correction" framework for Lagrangian motion estimation in the presence of repetitive patterns and large motion. This framework, grounded in Lie algebra and Lie group principles, accumulates momenta in the tangent vector space and employs exponential mapping in the diffeomorphic space for rapid approximation towards true optima, circumventing local optima. A subsequent correction step ensures convergence to true optima. The results on a 2D synthetic dataset and a real 3D tMRI dataset demonstrate our method's efficiency in estimating accurate, dense, and diffeomorphic 2D/3D motion fields amidst large motion and repetitive patterns.
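
The "shooting" step, mapping a tangent-space velocity field to a diffeomorphism with the exponential map, is commonly approximated by scaling and squaring. The 1D sketch below illustrates that standard technique only; it is not the MomentaMorph implementation, it omits momenta accumulation and correction, and the velocity field and parameter names are assumptions.

```python
import numpy as np

def exp_map_1d(velocity, grid, n_squarings=8):
    """Approximate exp(v) for a stationary 1D velocity field.
    Start from a tiny step x + v / 2^N, then compose the map with
    itself N times (each composition doubles the integration time)."""
    phi = grid + velocity / (2 ** n_squarings)
    for _ in range(n_squarings):
        phi = np.interp(phi, grid, phi)   # phi <- phi(phi(x))
    return phi

grid = np.linspace(0.0, 1.0, 201)
a = 0.3                                   # toy motion magnitude (assumption)
velocity = a * np.sin(np.pi * grid)       # vanishes at both boundaries
phi = exp_map_1d(velocity, grid)

# Closed-form time-1 flow of dx/dt = a * sin(pi * x), for comparison.
exact = (2.0 / np.pi) * np.arctan(
    np.tan(np.pi * grid / 2.0) * np.exp(a * np.pi))
```

The resulting map moves the midpoint by roughly a quarter of the domain yet remains strictly increasing, which is the 1D analogue of a diffeomorphic (non-folding) motion field.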