eess.IV - 2023-07-15

HQG-Net: Unpaired Medical Image Enhancement with High-Quality Guidance

  • paper_url: http://arxiv.org/abs/2307.07829
  • repo_url: None
  • paper_authors: Chunming He, Kai Li, Guoxia Xu, Jiangpeng Yan, Longxiang Tang, Yulun Zhang, Xiu Li, Yaowei Wang
  • for: This work aims to transform low-quality (LQ) medical images into high-quality (HQ) ones without relying on paired images for training.
  • methods: We propose a novel UMIE approach that directly encodes features extracted from HQ images into the LQ enhancement process, so that HQ cues explicitly guide the enhancement.
  • results: Experiments on three medical image datasets show that the proposed method outperforms existing approaches in both enhancement quality and downstream task performance.
    Abstract Unpaired Medical Image Enhancement (UMIE) aims to transform a low-quality (LQ) medical image into a high-quality (HQ) one without relying on paired images for training. While most existing approaches are based on Pix2Pix/CycleGAN and are effective to some extent, they fail to explicitly use HQ information to guide the enhancement process, which can lead to undesired artifacts and structural distortions. In this paper, we propose a novel UMIE approach that avoids the above limitation of existing methods by directly encoding HQ cues into the LQ enhancement process in a variational fashion and thus model the UMIE task under the joint distribution between the LQ and HQ domains. Specifically, we extract features from an HQ image and explicitly insert the features, which are expected to encode HQ cues, into the enhancement network to guide the LQ enhancement with the variational normalization module. We train the enhancement network adversarially with a discriminator to ensure the generated HQ image falls into the HQ domain. We further propose a content-aware loss to guide the enhancement process with wavelet-based pixel-level and multi-encoder-based feature-level constraints. Additionally, as a key motivation for performing image enhancement is to make the enhanced images serve better for downstream tasks, we propose a bi-level learning scheme to optimize the UMIE task and downstream tasks cooperatively, helping generate HQ images both visually appealing and favorable for downstream tasks. Experiments on three medical datasets, including two newly collected datasets, verify that the proposed method outperforms existing techniques in terms of both enhancement quality and downstream task performance. We will make the code and the newly collected datasets publicly available for community study.
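The abstract describes a variational normalization module that injects HQ cues into the LQ enhancement network but does not spell out its exact form here. The following is a minimal sketch, assuming a feature-modulation style of normalization (AdaIN/SPADE-like) in PyTorch; the class and layer names are hypothetical.

```python
import torch
import torch.nn as nn

class HQGuidedNormalization(nn.Module):
    """Sketch: LQ features are instance-normalized, then re-scaled and
    shifted by gamma/beta maps predicted from HQ guidance features."""
    def __init__(self, lq_channels, hq_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(lq_channels, affine=False)
        # hypothetical heads mapping HQ cues to modulation parameters
        self.to_gamma = nn.Conv2d(hq_channels, lq_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hq_channels, lq_channels, kernel_size=3, padding=1)

    def forward(self, lq_feat, hq_feat):
        # resize the HQ guidance to the LQ feature resolution
        hq_feat = nn.functional.interpolate(
            hq_feat, size=lq_feat.shape[-2:], mode="bilinear", align_corners=False)
        gamma = self.to_gamma(hq_feat)
        beta = self.to_beta(hq_feat)
        return self.norm(lq_feat) * (1 + gamma) + beta
```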

MUVF-YOLOX: A Multi-modal Ultrasound Video Fusion Network for Renal Tumor Diagnosis

  • paper_url: http://arxiv.org/abs/2307.07807
  • repo_url: https://github.com/jeunyuli/muaf
  • paper_authors: Junyu Li, Han Huang, Dong Ni, Wufeng Xue, Dongmei Zhu, Jun Cheng
  • for: The goal of this paper is to detect and classify renal tumors in order to improve patient survival rates.
  • methods: A multi-modal ultrasound video fusion network that fuses B-mode and CEUS-mode ultrasound videos to improve the accuracy of renal tumor diagnosis.
  • results: Experiments on a multicenter dataset show that the proposed framework outperforms single-modal models and competing methods, and the OTA module achieves higher classification accuracy than frame-level predictions. Code: https://github.com/JeunyuLi/MUAF.
    Abstract Early diagnosis of renal cancer can greatly improve the survival rate of patients. Contrast-enhanced ultrasound (CEUS) is a cost-effective and non-invasive imaging technique and has become more and more frequently used for renal tumor diagnosis. However, the classification of benign and malignant renal tumors can still be very challenging due to the highly heterogeneous appearance of cancer and imaging artifacts. Our aim is to detect and classify renal tumors by integrating B-mode and CEUS-mode ultrasound videos. To this end, we propose a novel multi-modal ultrasound video fusion network that can effectively perform multi-modal feature fusion and video classification for renal tumor diagnosis. The attention-based multi-modal fusion module uses cross-attention and self-attention to extract modality-invariant features and modality-specific features in parallel. In addition, we design an object-level temporal aggregation (OTA) module that can automatically filter low-quality features and efficiently integrate temporal information from multiple frames to improve the accuracy of tumor diagnosis. Experimental results on a multicenter dataset show that the proposed framework outperforms the single-modal models and the competing methods. Furthermore, our OTA module achieves higher classification accuracy than the frame-level predictions. Our code is available at \url{https://github.com/JeunyuLi/MUAF}.
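The abstract states that the fusion module runs cross-attention and self-attention in parallel to extract modality-invariant and modality-specific features. Below is a hedged sketch of that idea in PyTorch; the module structure, token interface, and projection are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Sketch: self-attention per modality keeps modality-specific features,
    cross-attention between modalities extracts modality-invariant ones;
    both branches run in parallel and are merged."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.self_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_c = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_b = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_c = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, b_tokens, ceus_tokens):
        # b_tokens, ceus_tokens: (batch, seq, dim) token sequences per modality
        b_spec, _ = self.self_b(b_tokens, b_tokens, b_tokens)
        c_spec, _ = self.self_c(ceus_tokens, ceus_tokens, ceus_tokens)
        b_inv, _ = self.cross_b(b_tokens, ceus_tokens, ceus_tokens)
        c_inv, _ = self.cross_c(ceus_tokens, b_tokens, b_tokens)
        b_fused = self.proj(torch.cat([b_spec, b_inv], dim=-1))
        c_fused = self.proj(torch.cat([c_spec, c_inv], dim=-1))
        return torch.cat([b_fused, c_fused], dim=1)  # concatenate along tokens
```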

Theoretical Analysis of Binary Masks in Snapshot Compressive Imaging Systems

  • paper_url: http://arxiv.org/abs/2307.07796
  • repo_url: None
  • paper_authors: Mengyu Zhao, Shirin Jalali
  • for: This paper studies the impact of binary masks in snapshot compressive imaging (SCI) systems.
  • methods: Theoretical analysis is used to characterize how binary masks affect SCI system performance, covering both iid masks and masks generated by a stationary first-order Markov process.
  • results: The analysis shows that the optimal probability of non-zero elements in an iid binary mask is smaller than 0.5, providing valuable guidance for designing and optimizing binary masks.
    Abstract Snapshot compressive imaging (SCI) systems have gained significant attention in recent years. While previous theoretical studies have primarily focused on the performance analysis of Gaussian masks, practical SCI systems often employ binary-valued masks. Furthermore, recent research has demonstrated that optimized binary masks can significantly enhance system performance. In this paper, we present a comprehensive theoretical characterization of binary masks and their impact on SCI system performance. Initially, we investigate the scenario where the masks are binary and independently identically distributed (iid), revealing a noteworthy finding that aligns with prior numerical results. Specifically, we show that the optimal probability of non-zero elements in the masks is smaller than 0.5. This result provides valuable insights into the design and optimization of binary masks for SCI systems, facilitating further advancements in the field. Additionally, we extend our analysis to characterize the performance of SCI systems where the mask entries are not independent but are generated based on a stationary first-order Markov process. Overall, our theoretical framework offers a comprehensive understanding of the performance implications associated with binary masks in SCI systems.
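For context, a minimal sketch of the standard SCI measurement model with iid Bernoulli binary masks follows: each frame is modulated by its own binary mask and the modulated frames are summed into a single snapshot. The function name and the choice p=0.4 are illustrative only (the paper's result is that the optimal non-zero probability for iid binary masks lies below 0.5).

```python
import numpy as np

def sci_snapshot(video, p=0.4, seed=0):
    """Sketch of the SCI forward model with iid Bernoulli binary masks.
    video: (T, H, W) array of frames. Returns the 2D snapshot and the masks."""
    rng = np.random.default_rng(seed)
    T, H, W = video.shape
    masks = (rng.random((T, H, W)) < p).astype(video.dtype)  # binary, non-zero w.p. p
    snapshot = (masks * video).sum(axis=0)                    # per-frame modulation, then sum
    return snapshot, masks

# usage: y, M = sci_snapshot(np.random.rand(8, 64, 64).astype(np.float32))
```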

Tightly-Coupled LiDAR-Visual SLAM Based on Geometric Features for Mobile Agents

  • paper_url: http://arxiv.org/abs/2307.07763
  • repo_url: None
  • paper_authors: Ke Cao, Ruiping Liu, Ze Wang, Kunyu Peng, Jiaming Zhang, Junwei Zheng, Zhifeng Teng, Kailun Yang, Rainer Stiefelhagen
  • for: Providing the basis for autonomous navigation and task execution, enabling mobile robots to operate in complex and unknown environments.
  • methods: A tightly-coupled LiDAR-visual SLAM based on geometric features, comprising two sub-systems (LiDAR and monocular visual SLAM) and a fusion framework. The fusion framework associates the depth and semantics of the multi-modal geometric features to complement the visual line landmarks and adds direction optimization in Bundle Adjustment.
  • results: Evaluated on the public dataset M2DGR, our system achieves more accurate and more robust pose estimation than current state-of-the-art multi-modal methods.
    Abstract The mobile robot relies on SLAM (Simultaneous Localization and Mapping) to provide autonomous navigation and task execution in complex and unknown environments. However, it is hard to develop a dedicated algorithm for mobile robots due to dynamic and challenging situations, such as poor lighting conditions and motion blur. To tackle this issue, we propose a tightly-coupled LiDAR-visual SLAM based on geometric features, which includes two sub-systems (LiDAR and monocular visual SLAM) and a fusion framework. The fusion framework associates the depth and semantics of the multi-modal geometric features to complement the visual line landmarks and to add direction optimization in Bundle Adjustment (BA). This further constrains visual odometry. On the other hand, the entire line segment detected by the visual subsystem overcomes the limitation of the LiDAR subsystem, which can only perform the local calculation for geometric features. It adjusts the direction of linear feature points and filters out outliers, leading to a higher accurate odometry system. Finally, we employ a module to detect the subsystem's operation, providing the LiDAR subsystem's output as a complementary trajectory to our system while visual subsystem tracking fails. The evaluation results on the public dataset M2DGR, gathered from ground robots across various indoor and outdoor scenarios, show that our system achieves more accurate and robust pose estimation compared to current state-of-the-art multi-modal methods.
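The abstract mentions a module that monitors the subsystems and falls back to the LiDAR trajectory when visual tracking fails. A minimal sketch of that fallback logic follows; the failure criterion (a threshold on tracked feature count), function name, and interface are assumptions for illustration, not the authors' API.

```python
def fuse_trajectories(visual_poses, lidar_poses, tracked_feature_counts, min_tracked=30):
    """Sketch: substitute the LiDAR subsystem's pose as a complementary
    trajectory whenever the visual subsystem is judged to have lost tracking
    (here, when too few geometric features are tracked in that frame)."""
    fused = []
    for v_pose, l_pose, n_tracked in zip(visual_poses, lidar_poses, tracked_feature_counts):
        fused.append(v_pose if n_tracked >= min_tracked else l_pose)
    return fused
```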

Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments

  • paper_url: http://arxiv.org/abs/2307.07757
  • repo_url: https://github.com/ruipingl/opensu
  • paper_authors: Ruiping Liu, Jiaming Zhang, Kunyu Peng, Junwei Zheng, Ke Cao, Yufan Chen, Kailun Yang, Rainer Stiefelhagen
  • for: Assisting people with visual impairments (PVI) with precise scene understanding and independent mobility.
  • methods: Builds on Grounded Situation Recognition (GSR) and extends it into an Open Scene Understanding (OpenSU) system that generates pixel-wise dense segmentation masks and strengthens feature extraction and the interaction within the encoder-decoder structure.
  • results: Achieves state-of-the-art performance on the SWiG dataset and, in field tests, improves the independent mobility of people with visual impairments.
    Abstract Grounded Situation Recognition (GSR) is capable of recognizing and interpreting visual scenes in a contextually intuitive way, yielding salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on the application of GSR in assisting people with visual impairments (PVI). However, precise localization information of detected objects is often required to navigate their surroundings confidently and make informed decisions. For the first time, we propose an Open Scene Understanding (OpenSU) system that aims to generate pixel-wise dense segmentation masks of involved entities instead of bounding boxes. Specifically, we build our OpenSU system on top of GSR by additionally adopting an efficient Segment Anything Model (SAM). Furthermore, to enhance the feature extraction and interaction between the encoder-decoder structure, we construct our OpenSU system using a solid pure transformer backbone to improve the performance of GSR. In order to accelerate the convergence, we replace all the activation functions within the GSR decoders with GELU, thereby reducing the training duration. In quantitative analysis, our model achieves state-of-the-art performance on the SWiG dataset. Moreover, through field testing on dedicated assistive technology datasets and application demonstrations, the proposed OpenSU system can be used to enhance scene understanding and facilitate the independent mobility of people with visual impairments. Our code will be available at https://github.com/RuipingL/OpenSU.
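One concrete detail in the abstract is replacing all activation functions in the GSR decoders with GELU to speed up convergence. A small sketch of such a swap in PyTorch is shown below; which activation classes get replaced is an assumption for illustration.

```python
import torch.nn as nn

def replace_activations_with_gelu(module):
    """Sketch: recursively swap common activations in a decoder for GELU."""
    for name, child in module.named_children():
        if isinstance(child, (nn.ReLU, nn.LeakyReLU, nn.SiLU)):
            setattr(module, name, nn.GELU())
        else:
            replace_activations_with_gelu(child)
    return module
```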

ExposureDiffusion: Learning to Expose for Low-light Image Enhancement

  • paper_url: http://arxiv.org/abs/2307.07710
  • repo_url: https://github.com/wyf0912/ExposureDiffusion
  • paper_authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen
  • for: Enhancing the brightness and detail of low-light images while handling different amplification ratios and noise models.
  • methods: Integrates a diffusion model with a physics-based exposure model, so that restoration can start directly from a noisy image rather than from pure noise.
  • results: Achieves better performance and faster inference than vanilla diffusion models, and works with different backbone networks and real paired datasets.
    Abstract Previous raw image-based low-light image enhancement methods predominantly relied on feed-forward neural networks to learn deterministic mappings from low-light to normally-exposed images. However, they failed to capture critical distribution information, leading to visually undesirable results. This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure model. Different from a vanilla diffusion model that has to perform Gaussian denoising, with the injected physics-based exposure model, our restoration process can directly start from a noisy image instead of pure noise. As such, our method obtains significantly improved performance and reduced inference time compared with vanilla diffusion models. To make full use of the advantages of different intermediate steps, we further propose an adaptive residual layer that effectively screens out the side-effect in the iterative refinement when the intermediate results have been already well-exposed. The proposed framework can work with both real-paired datasets, SOTA noise models, and different backbone networks. Note that, the proposed framework is compatible with real-paired datasets, real/synthetic noise models, and different backbone networks. We evaluate the proposed method on various public benchmarks, achieving promising results with consistent improvements using different exposure models and backbones. Besides, the proposed method achieves better generalization capacity for unseen amplifying ratios and better performance than a larger feedforward neural model when few parameters are adopted.
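The abstract describes an adaptive residual layer that screens out side-effects of iterative refinement once intermediate results are already well exposed. The sketch below illustrates one plausible form (a learned per-pixel gate on the refinement); the exact gating used in the paper is not specified here, so treat this as an assumption.

```python
import torch
import torch.nn as nn

class AdaptiveResidual(nn.Module):
    """Sketch: a learned gate decides how much of the current refinement to
    apply, so well-exposed intermediate results pass through largely unchanged."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, intermediate, refinement):
        g = self.gate(torch.cat([intermediate, refinement], dim=1))
        return intermediate + g * (refinement - intermediate)
```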

DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration

  • paper_url: http://arxiv.org/abs/2307.07688
  • repo_url: https://github.com/YuanshuoCheng/DRM-IR
  • paper_authors: Yuanshuo Cheng, Mingwen Shao, Yecong Wan, Chao Wang, Wangmeng Zuo
  • for: Addresses image restoration under multiple types of degradation, aiming at an all-in-one image restoration method.
  • methods: Proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR) consisting of task-adaptive degradation modeling and model-based image restoring; the two subtasks are formalized as a pair of entangled reference-based MAP inferences and optimized synchronously in an unfolding-based manner.
  • results: Extensive experiments on multiple benchmark datasets show that DRM-IR achieves state-of-the-art performance in all-in-one image restoration.
    Abstract Existing All-In-One image restoration (IR) methods usually lack flexible modeling on various types of degradation, thus impeding the restoration performance. To achieve All-In-One IR with higher task dexterity, this work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR), which consists of task-adaptive degradation modeling and model-based image restoring. Specifically, these two subtasks are formalized as a pair of entangled reference-based maximum a posteriori (MAP) inferences, which are optimized synchronously in an unfolding-based manner. With the two cascaded subtasks, DRM-IR first dynamically models the task-specific degradation based on a reference image pair and further restores the image with the collected degradation statistics. Besides, to bridge the semantic gap between the reference and target degraded images, we further devise a Degradation Prior Transmitter (DPT) that restrains the instance-specific feature differences. DRM-IR explicitly provides superior flexibility for All-in-One IR while being interpretable. Extensive experiments on multiple benchmark datasets show that our DRM-IR achieves state-of-the-art in All-In-One IR.
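The unfolding-based optimization described in the abstract alternates between estimating task-specific degradation from a reference LQ/HQ pair and restoring the image with those statistics. A hedged sketch of that loop follows; the network interfaces and stage count are assumptions for illustration, not the authors' API.

```python
def drm_ir_unfolding(lq, ref_lq, ref_hq, degradation_net, restore_net, stages=3):
    """Sketch of the cascaded subtasks: each stage (i) performs task-adaptive
    degradation modeling from the reference pair and the current estimate,
    then (ii) restores the image conditioned on the collected statistics."""
    x = lq
    for _ in range(stages):
        degradation = degradation_net(ref_lq, ref_hq, x)  # task-adaptive degradation modeling
        x = restore_net(lq, degradation)                  # model-based image restoring
    return x
```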