results: This survey reviews recent advances in deep learning-based image registration, including novel network architectures, registration-specific loss functions, and methods for estimating registration uncertainty, together with practical applications of these techniques in medical imaging and an outlook on future directions.

Abstract
Over the past decade, deep learning technologies have greatly advanced the field of medical image registration. The initial developments, such as ResNet-based and U-Net-based networks, laid the groundwork for deep learning-driven image registration. Subsequent progress has been made in various aspects of deep learning-based registration, including similarity measures, deformation regularizations, and uncertainty estimation. These advancements have not only enriched the field of deformable image registration but have also facilitated its application in a wide range of tasks, including atlas construction, multi-atlas segmentation, motion estimation, and 2D-3D registration. In this paper, we present a comprehensive overview of the most recent advancements in deep learning-based image registration. We begin with a concise introduction to the core concepts of deep learning-based image registration. Then, we delve into innovative network architectures, loss functions specific to registration, and methods for estimating registration uncertainty. Additionally, this paper explores appropriate evaluation metrics for assessing the performance of deep learning models in registration tasks. Finally, we highlight the practical applications of these novel techniques in medical imaging and discuss the future prospects of deep learning-based image registration.
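The objective common to the deep registration methods surveyed here is a similarity term plus a weighted deformation regularizer. The sketch below is a 1-D toy of that objective, assuming nearest-neighbour warping and a first-difference smoothness penalty; real methods use 3-D images, differentiable interpolation, and a network that predicts the displacement field.

```python
def warp(moving, disp):
    """Nearest-neighbour warp of a 1-D image by a per-voxel displacement."""
    n = len(moving)
    out = []
    for i, d in enumerate(disp):
        j = min(max(int(round(i + d)), 0), n - 1)  # clamp to image bounds
        out.append(moving[j])
    return out

def mse(a, b):
    """Similarity term: mean squared intensity difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def smoothness(disp):
    """Regularizer: squared first differences of the displacement field."""
    return sum((disp[i + 1] - disp[i]) ** 2 for i in range(len(disp) - 1))

def registration_loss(fixed, moving, disp, lam=0.1):
    """similarity(fixed, warp(moving)) + lam * smoothness(disp)."""
    return mse(fixed, warp(moving, disp)) + lam * smoothness(disp)

# Toy pair: the same bump, shifted by one voxel.
fixed  = [0, 0, 1, 2, 1, 0, 0]
moving = [0, 1, 2, 1, 0, 0, 0]
```

A constant displacement of -1 aligns the two toy images exactly, driving both the similarity and smoothness terms to zero; training a network amounts to minimizing this quantity over predicted fields.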
Integrated Digital Reconstruction of Welded Components: Supporting Improved Fatigue Life Prediction
paper_authors: Anders Faarbæk Mikkelstrup, Morten Kristiansen
For: The paper aims to improve the fatigue performance of welded joints in offshore jacket foundations by enhancing the quality of post-weld treatment through digital reconstruction of the weld.

Methods: The paper proposes an industrial manipulator combined with a line scanner to integrate digital reconstruction as part of the automated HFMI treatment setup, using standard image processing, simple filtering techniques, and non-linear optimization to align and merge overlapping scans.

Results: The proposed framework enables generic digital reconstruction of welded parts, aiding in component design, overall quality assurance, and documentation of the HFMI treatment, and improves fatigue life prediction and possible crack location prediction.

Abstract
In the design of offshore jacket foundations, fatigue life is crucial. Post-weld treatment has been proposed to enhance the fatigue performance of welded joints, where particularly high-frequency mechanical impact (HFMI) treatment has been shown to improve fatigue performance significantly. Automated HFMI treatment has improved quality assurance and can lead to cost-effective design when combined with accurate fatigue life prediction. However, the finite element method (FEM), commonly used for predicting fatigue life in complex or multi-axial joints, relies on a basic CAD depiction of the weld, failing to consider the actual weld geometry and defects. Including the actual weld geometry in the FE model improves fatigue life prediction and possible crack location prediction but requires a digital reconstruction of the weld. Current digital reconstruction methods are time-consuming or require specialised scanning equipment and potential component relocation. The proposed framework instead uses an industrial manipulator combined with a line scanner to integrate digital reconstruction as part of the automated HFMI treatment setup. This approach applies standard image processing, simple filtering techniques, and non-linear optimisation for aligning and merging overlapping scans. A screened Poisson surface reconstruction finalises the 3D model to create a meshed surface. The outcome is a generic, cost-effective, flexible, and rapid method that enables generic digital reconstruction of welded parts, aiding in component design, overall quality assurance, and documentation of the HFMI treatment.
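The scan-alignment step can be illustrated with a minimal closed-form rigid fit in 2-D, assuming known point correspondences; this is only a sketch of the underlying least-squares idea, not the paper's pipeline, which uses non-linear optimisation over full overlapping line scans.

```python
import math

def align_2d(src, dst):
    """Least-squares rigid transform (rotation + translation) mapping
    paired 2-D points `src` onto `dst` (a tiny Procrustes solve)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n; csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n; cdy = sum(p[1] for p in dst) / n
    # Cross/dot sums of centred pairs give the optimal rotation angle.
    s_cross = s_dot = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy
        bx, by = dx - cdx, dy - cdy
        s_cross += ax * by - ay * bx
        s_dot   += ax * bx + ay * by
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the source centroid onto the target centroid.
    tx = cdx - (c * csx - s * csy)
    ty = cdy - (s * csx + c * csy)
    return theta, (tx, ty)

def apply_rigid(theta, t, p):
    """Apply rotation `theta` then translation `t` to point `p`."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

With exact correspondences and no noise this recovers the transform exactly; the non-linear optimisation in the framework generalises the same residual to noisy, partially overlapping scan data.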
OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation of Road Scenes
results: Achieves state-of-the-art performance on the UrbanLF-Real and -Syn datasets and sets a new record of 84.93% mIoU on the UrbanLF-Real Extended dataset, a gain of +4.53% over the previous record.

Abstract
Light field cameras can provide rich angular and spatial information to enhance image semantic segmentation for scene understanding in the field of autonomous driving. However, the extensive angular information of light field cameras contains a large amount of redundant data, which is overwhelming for the limited hardware resources of intelligent vehicles. Besides, inappropriate compression leads to information corruption and data loss. To excavate representative information, we propose an Omni-Aperture Fusion model (OAFuser), which leverages dense context from the central view and discovers the angular information from sub-aperture images to generate a semantically consistent result. To avoid feature loss during network propagation and simultaneously streamline the redundant information from the light field camera, we present a simple yet very effective Sub-Aperture Fusion Module (SAFM) to embed sub-aperture images into angular features without any additional memory cost. Furthermore, to address the mismatched spatial information across viewpoints, we present the Center Angular Rectification Module (CARM), which re-sorts features and prevents feature occlusion caused by asymmetric information. Our proposed OAFuser achieves state-of-the-art performance on the UrbanLF-Real and -Syn datasets and sets a new record of 84.93% in mIoU on the UrbanLF-Real Extended dataset, with a gain of +4.53%. The source code of OAFuser will be made publicly available at https://github.com/FeiBryantkit/OAFuser.
Defocus Blur Synthesis and Deblurring via Interpolation and Extrapolation in Latent Space
paper_authors: Ioana Mazilu, Shunxin Wang, Sven Dummer, Raymond Veldhuis, Christoph Brune, Nicola Strisciuglio
For: To improve the quality of microscopic images, making them more suitable for further processing and analysis of diseases.

Methods: Trains autoencoders with implicit and explicit regularization techniques to enforce linearity among the latent-space representations of different blur levels, so that images taken at different focal planes can be linearly interpolated/extrapolated in the latent space.

Results: The method effectively synthesizes images with varying degrees of blur, increases data variety as a data augmentation technique, and improves the quality of microscopic images for further processing and analysis.

Abstract
Though modern microscopes have an autofocusing system to ensure optimal focus, out-of-focus images can still occur when cells within the medium are not all in the same focal plane, affecting the image quality for medical diagnosis and analysis of diseases. We propose a method that can deblur images as well as synthesize defocus blur. We train autoencoders with implicit and explicit regularization techniques to enforce linearity relations among the representations of different blur levels in the latent space. This allows for the exploration of different blur levels of an object by linearly interpolating/extrapolating the latent representations of images taken at different focal planes. Compared to existing works, we use a simple architecture to synthesize images with flexible blur levels, leveraging the linear latent space. Our regularized autoencoders can effectively mimic blur and deblur, increasing data variety as a data augmentation technique and improving the quality of microscopic images, which would be beneficial for further processing and analysis.
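Once the regularizers make blur levels lie on a line in latent space, the traversal described above reduces to moving along the line through two latent codes. A minimal sketch (the encoder/decoder, and the training that makes this line meaningful, are omitted):

```python
def blend_latent(z_a, z_b, alpha):
    """Move along the line through two latent codes.

    alpha in [0, 1] interpolates between the codes (intermediate blur);
    alpha outside [0, 1] extrapolates past either code, e.g. alpha < 0
    pushes beyond z_a toward a sharper-than-observed image.
    """
    return [a + alpha * (b - a) for a, b in zip(z_a, z_b)]
```

In the paper's setting `z_a`/`z_b` would be encodings of the same object at two focal planes, and the blended code is decoded back to an image with the chosen blur level; that decoded output is what serves as augmentation or deblurring.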
ERCPMP: An Endoscopic Image and Video Dataset for Colorectal Polyps Morphology and Pathology
results: By providing curated endoscopic images and videos with morphological and pathological annotations, the dataset supports the development of accurate medical algorithms for recognizing the morphology and pathology of colorectal polyps.

Abstract
In recent years, artificial intelligence (AI) and its leading subfields, machine learning (ML) and deep learning (DL), have been spreading very fast into various areas such as medicine. Today, the most important challenge in developing accurate algorithms for medical prediction, detection, diagnosis, treatment, and prognosis is data. ERCPMP is an Endoscopic Image and Video Dataset for Recognition of Colorectal Polyps Morphology and Pathology. This dataset contains demographic, morphological and pathological data, endoscopic images, and videos of 191 patients with colorectal polyps. Morphological data is included based on the latest international gastroenterology classification references, such as the Paris, Pit and JNET classifications. Pathological data includes the diagnosis of the polyps, including Tubular, Villous, Tubulovillous, Hyperplastic, Serrated, Inflammatory and Adenocarcinoma with Dysplasia Grade & Differentiation. The current version of this dataset is published and available on Elsevier Mendeley Dataverse, and since it is under development, the latest version is accessible via: https://databiox.com.
RAWIW: RAW Image Watermarking Robust to ISP Pipeline
paper_authors: Kang Fu, Xiaohong Liu, Jun Jia, Zicheng Zhang, Yicong Peng, Jia Wang, Guangtao Zhai

for: This work provides a deep learning-based RAW image invisible watermarking framework (RAWIW) to protect the copyright of RAW images.

methods: We embed copyright information directly into RAW images and integrate a neural network that simulates the ISP pipeline, achieving cross-domain copyright protection between RAW and RGB images.

results: Experiments show that RAWIW maintains copyright protection across different ISP pipelines and transmission distortions while preserving visual quality and balancing robustness with concealment.

Abstract
Invisible image watermarking is essential for image copyright protection. Compared to RGB images, RAW format images use a higher dynamic range to capture the radiometric characteristics of the camera sensor, providing greater flexibility in post-processing and retouching. Similar to the master recording in the music industry, RAW images are considered the original format for distribution and image production, thus requiring copyright protection. Existing watermarking methods typically target RGB images, leaving a gap for RAW images. To address this issue, we propose the first deep learning-based RAW Image Watermarking (RAWIW) framework for copyright protection. Unlike RGB image watermarking, our method achieves cross-domain copyright protection. We directly embed copyright information into RAW images, which can be later extracted from the corresponding RGB images generated by different post-processing methods. To achieve end-to-end training of the framework, we integrate a neural network that simulates the ISP pipeline to handle the RAW-to-RGB conversion process. To further validate the generalization of our framework to traditional ISP pipelines and its robustness to transmission distortion, we adopt a distortion network. This network simulates various types of noises introduced during the traditional ISP pipeline and transmission. Furthermore, we employ a three-stage training strategy to strike a balance between robustness and concealment of watermarking. Our extensive experiments demonstrate that RAWIW successfully achieves cross-domain copyright protection for RAW images while maintaining their visual quality and robustness to ISP pipeline distortions.
MLIC++: Linear Complexity Multi-Reference Entropy Modeling for Learned Image Compression
methods: The method captures global correlations with linear complexity via a decomposition of the softmax operation, and on this basis proposes MLIC$^{++}$, a learned image compression model with linear-complexity multi-reference entropy modeling.

results: Compared to VTM-17.0, MLIC$^{++}$ reduces BD-rate by 12.44% on the Kodak dataset when measured in PSNR, while being more efficient.

Abstract
Recently, the multi-reference entropy model has been proposed, which captures channel-wise, local spatial, and global spatial correlations. Previous works adopt attention for global correlation capturing; however, its quadratic complexity limits the potential of high-resolution image coding. In this paper, we propose capturing global correlations with linear complexity via the decomposition of the softmax operation. Based on it, we propose MLIC$^{++}$, a learned image compression with linear complexity for multi-reference entropy modeling. Our MLIC$^{++}$ is more efficient, and it reduces BD-rate by 12.44% on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Code will be available at https://github.com/JiangWeibeta/MLIC.
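The linear-complexity idea can be sketched as kernelized linear attention: replace the softmax similarity with a dot product of positive feature maps, so the key-value summary is accumulated once and each query no longer touches all N positions. The `elu(x)+1` feature map below is a common choice from the linear-attention literature and stands in for the paper's specific softmax decomposition, which may differ.

```python
import math

def elu1(x):
    """elu(x) + 1: a positive feature map standing in for the softmax."""
    return x + 1.0 if x > 0 else math.exp(x)

def linear_attention(Q, K, V):
    """Attention in O(N): accumulate S = sum_j phi(k_j) v_j^T and
    z = sum_j phi(k_j) once; each query then costs O(d * d_v)."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(K, V):
        pk = [elu1(x) for x in k]
        for a in range(d):
            z[a] += pk[a]
            for b in range(dv):
                S[a][b] += pk[a] * v[b]
    out = []
    for q in Q:
        pq = [elu1(x) for x in q]
        denom = sum(pq[a] * z[a] for a in range(d))
        out.append([sum(pq[a] * S[a][b] for a in range(d)) / denom
                    for b in range(dv)])
    return out
```

Like softmax attention, each output is a positively weighted, normalized mixture of the values, but the N-by-N attention map is never materialized, which is what makes high-resolution entropy modeling tractable.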
Fast Dust Sand Image Enhancement Based on Color Correction and New Membership Function
results: Tested and evaluated on many real sand dust images, the proposed solution outperforms current studies in effectively removing the red and yellow cast and provides high-quality enhanced dust images.

Abstract
Images captured in dusty environments suffer from poor visibility and quality. Enhancement of such images, for example sand dust images, plays a critical role in various atmospheric optics applications. In this work, we propose a new model based on color correction and a new membership function to enhance sand dust images. The proposed model consists of three phases: correction of color shift, removal of haze, and enhancement of contrast and brightness. The color shift is corrected using a new membership function to adjust the values of U and V in the YUV color space. The Adaptive Dark Channel Prior (A-DCP) is used for haze removal. Contrast stretching and brightness improvement are based on Contrast Limited Adaptive Histogram Equalization (CLAHE). The proposed model is tested and evaluated on many real sand dust images. The experimental results show that the proposed solution outperforms current studies in effectively removing the red and yellow cast and provides high-quality enhanced dust images.
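As a rough illustration of the color-correction phase, the sketch below pulls a chroma channel (U or V) toward the neutral grey level 128, gated by a generic triangular membership function. The paper's specific membership function is not reproduced here; both the triangular shape and the half-width are illustrative assumptions.

```python
def membership(v, mean, half_width=64.0):
    """Triangular membership: pixels near the channel mean receive full
    correction, outliers progressively less (a generic stand-in, not
    the paper's specific function)."""
    return max(0.0, 1.0 - abs(v - mean) / half_width)

def correct_chroma(channel, grey=128.0):
    """Shift a chroma channel's cast toward neutral grey, weighting each
    pixel by its membership value, and clamp to the 8-bit range."""
    mean = sum(channel) / len(channel)
    out = []
    for v in channel:
        shift = membership(v, mean) * (grey - mean)
        out.append(min(255.0, max(0.0, v + shift)))
    return out
```

For a sand dust image the U/V means sit well above 128 (the red-yellow cast); after correction the channel mean lands much closer to neutral while extreme pixels are moved more cautiously.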
Generative AI for Medical Imaging: extending the MONAI Framework
paper_authors: Walter H. L. Pinaya, Mark S. Graham, Eric Kerfoot, Petru-Daniel Tudosiu, Jessica Dafflon, Virginia Fernandez, Pedro Sanchez, Julia Wolleb, Pedro F. da Costa, Ashay Patel, Hyungjin Chung, Can Zhao, Wei Peng, Zelong Liu, Xueyan Mei, Oeslle Lucena, Jong Chul Ye, Sotirios A. Tsaftaris, Prerna Dogra, Andrew Feng, Marc Modat, Parashkev Nachev, Sebastien Ourselin, M. Jorge Cardoso
for: This paper is written for researchers and developers who want to easily train, evaluate, and deploy generative models and related applications in medical imaging.
methods: The paper uses a variety of generative models, including diffusion models, autoregressive transformers, and GANs, and implements them in a generalizable fashion for 2D and 3D medical images with different modalities and anatomical areas.
results: The paper provides pre-trained models for the community and demonstrates the reproducibility of state-of-the-art studies using a standardized approach, as well as the extension of current applications to future features through a modular and extensible approach.

Abstract
Recent advances in generative AI have brought incredible breakthroughs in several areas, including medical imaging. These generative models have tremendous potential not only to help safely share medical data via synthetic datasets but also to perform an array of diverse applications, such as anomaly detection, image-to-image translation, denoising, and MRI reconstruction. However, due to the complexity of these models, their implementation and reproducibility can be difficult. This complexity can hinder progress, act as a use barrier, and dissuade the comparison of new methods with existing works. In this study, we present MONAI Generative Models, a freely available open-source platform that allows researchers and developers to easily train, evaluate, and deploy generative models and related applications. Our platform reproduces state-of-art studies in a standardised way involving different architectures (such as diffusion models, autoregressive transformers, and GANs), and provides pre-trained models for the community. We have implemented these models in a generalisable fashion, illustrating that their results can be extended to 2D or 3D scenarios, including medical images with different modalities (like CT, MRI, and X-Ray data) and from different anatomical areas. Finally, we adopt a modular and extensible approach, ensuring long-term maintainability and the extension of current applications for future features.
Sparsity aware coding for single photon sensitive vision using Selective Sensing
paper_authors: Yizhou Lu, Trevor Seets, Ehsan Ahmadi, Felipe Gutierrez-Barragan, Andreas Velten
for: To improve the performance of imaging techniques.

methods: Uses training data to learn priors (selective sensing) and optimizes coding strategies for downstream classification tasks.

results: Experiments and simulations show that selective sensing improves coding performance and overall classification accuracy, especially in scenarios dominated by Poisson noise.
Optical coding has been widely adopted to improve imaging techniques. Traditional coding strategies developed under additive Gaussian noise fail to perform optimally in the presence of Poisson noise. It has been observed in previous studies that coding performance varies significantly between these two noise models. In this work, we introduce a novel approach called selective sensing, which leverages training data to learn priors and optimizes the coding strategies for downstream classification tasks. By adapting to the specific characteristics of photon-counting sensors, the proposed method aims to improve coding performance under Poisson noise and enhance overall classification accuracy. Experimental and simulated results demonstrate the effectiveness of selective sensing in comparison to traditional coding strategies, highlighting its potential for practical applications in photon counting scenarios where Poisson noise is prevalent.
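The property that separates the two noise models, and that motivates noise-model-aware coding, is that Poisson noise is signal-dependent: its variance equals its mean, whereas additive Gaussian noise has a fixed variance regardless of signal level. This can be checked with a small stdlib sampler (illustrative only; this is not the paper's selective-sensing code):

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method: count uniform
    draws until their running product falls below exp(-lam). Adequate
    for the small photon counts where Poisson noise dominates."""
    L = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1
```

Sampling many photon counts at a mean rate of 4 yields a sample variance near 4 as well; doubling the signal would double the noise variance, which is exactly the behaviour that Gaussian-derived coding strategies fail to account for.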