results: The method achieves a high level of screening performance in both low-dose and noncontrast CT screening.
Abstract
Lung cancer is a leading cause of death worldwide, and early screening is critical for improving survival outcomes. In clinical practice, the contextual structure of nodules and the accumulated experience of radiologists are the two core elements underlying the accurate identification of benign and malignant nodules. Contextual information provides a comprehensive picture of a nodule, such as its location, shape, and peripheral vessels, and experienced radiologists can search for clues in previous cases as a reference to enrich the basis of their decision-making. In this paper, we propose a radiologist-inspired method that simulates the diagnostic process of radiologists and is composed of context parsing and prototype recalling modules. The context parsing module first segments the contextual structure of a nodule and then aggregates the contextual information for a more comprehensive understanding. The prototype recalling module uses prototype-based learning to condense previously learned cases into prototypes for comparative analysis, which are updated online in a momentum-based manner during training. Building on the two modules, our method leverages both the intrinsic characteristics of a nodule and the external knowledge accumulated from other nodules to reach a sound diagnosis. To meet the needs of both low-dose and noncontrast screening, we collected a large-scale dataset of 12,852 and 4,029 nodules from low-dose and noncontrast CTs respectively, each with pathology- or follow-up-confirmed labels. Experiments on several datasets demonstrate that our method achieves advanced screening performance in both low-dose and noncontrast scenarios.
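To make the prototype recalling module concrete, the momentum-style online update can be sketched as below. This is a minimal illustration of a generic momentum prototype update under assumed shapes (class count, embedding dimension, momentum coefficient), not the authors' implementation.

```python
import torch

def update_prototypes(prototypes, features, labels, momentum=0.9):
    """Momentum-style online prototype update (illustrative sketch).

    prototypes: (num_classes, dim) tensor of running class prototypes
    features:   (batch, dim) tensor of nodule embeddings from the encoder
    labels:     (batch,) tensor of class indices (e.g., 0=benign, 1=malignant)
    """
    with torch.no_grad():
        for c in labels.unique():
            class_mean = features[labels == c].mean(dim=0)
            # Blend the old prototype with the current batch mean.
            prototypes[c] = momentum * prototypes[c] + (1 - momentum) * class_mean
    return prototypes

# Example: two classes (benign/malignant), 128-d embeddings.
prototypes = torch.zeros(2, 128)
features = torch.randn(8, 128)
labels = torch.randint(0, 2, (8,))
prototypes = update_prototypes(prototypes, features, labels)
```

At inference, a query embedding would be compared against the stored prototypes (e.g., by cosine similarity) to recall the most relevant previously learned cases.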
A novel integrated method of detection-grasping for specific object based on the box coordinate matching
results: Experiments show that the improved models excel at both object detection and grasp estimation, and grasping tasks for several specific objects confirm the feasibility and effectiveness of the approach.
Abstract
To better care for the elderly and disabled, it is essential for service robots to have an effective method for fusing object detection and grasp estimation. However, little research has addressed the combination of the two tasks. To overcome this technical difficulty, a novel integrated detection-grasping method for specific objects based on box coordinate matching is proposed in this paper. Firstly, the SOLOv2 instance segmentation model is improved by adding a channel attention module (CAM) and a spatial attention module (SAM). Then, atrous spatial pyramid pooling (ASPP) and CAM are added to the generative residual convolutional neural network (GR-CNN) model to optimize grasp estimation. Furthermore, a detection-grasping integrated algorithm based on box coordinate matching (DG-BCM) is proposed to obtain the fused model of object detection and grasp estimation. For verification, experiments on object detection and grasp estimation are conducted separately to verify the superiority of the improved models. Additionally, grasping tasks for several specific objects are implemented on a simulation platform, demonstrating the feasibility and effectiveness of the DG-BCM algorithm proposed in this paper.
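The box coordinate matching at the heart of DG-BCM can be illustrated with a short sketch: a grasp candidate is assigned to a detected object when its center coordinates fall inside the object's bounding box. The containment criterion and data layout here are assumptions for illustration, not the paper's exact algorithm.

```python
def match_grasps_to_box(box, grasp_centers):
    """Return indices of grasp candidates whose center lies inside the box.

    box:           (x_min, y_min, x_max, y_max) from the detector
    grasp_centers: list of (x, y) grasp-point centers from the grasp network
    """
    x_min, y_min, x_max, y_max = box
    return [i for i, (x, y) in enumerate(grasp_centers)
            if x_min <= x <= x_max and y_min <= y <= y_max]

# Example: one detected box and three grasp candidates.
box = (100, 50, 220, 180)
grasps = [(150, 90), (300, 120), (210, 175)]
print(match_grasps_to_box(box, grasps))  # -> [0, 2]
```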
for: This paper aims to present a novel, geometry-based, end-to-end compression scheme for point clouds, which is essential for various applications such as virtual reality and 3D modeling.
methods: The proposed method combines information on the geometrical features of the point cloud and the user's position to achieve remarkable results for aggressive compression schemes demanding very small bit rates. The pipeline separates visible from non-visible points, computes four saliency maps, and decodes by using delta coordinates and solving a sparse linear system.
results: The proposed method achieves significantly better results than the geometry-based point cloud compression (G-PCC) algorithm by the Moving Picture Experts Group (MPEG) for small bit rates, as demonstrated by evaluation studies and comparisons with various point clouds.
Abstract
The increasing demand for accurate representations of 3D scenes, combined with immersive technologies, has made point clouds extensively popular. However, high-quality point clouds require a large amount of data, making compression methods imperative. In this paper, we present a novel, geometry-based, end-to-end compression scheme that combines information on the geometrical features of the point cloud and the user's position, achieving remarkable results for aggressive compression schemes demanding very small bit rates. After separating visible and non-visible points, four saliency maps are calculated, utilizing the point cloud's geometry and distance from the user, the visibility information, and the user's focus point. A combination of these maps yields a final saliency map indicating the overall significance of each point, so that different regions are quantized with a different number of bits during the encoding process. The decoder reconstructs the point cloud using delta coordinates and solving a sparse linear system. Evaluation studies and comparisons with the geometry-based point cloud compression (G-PCC) algorithm by the Moving Picture Experts Group (MPEG), carried out on a variety of point clouds, demonstrate that the proposed method achieves significantly better results for small bit rates.
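The decoding step, recovering point positions from delta (Laplacian) coordinates by solving a sparse linear system, can be sketched as follows. This is a generic Laplacian reconstruction under an assumed neighbor structure and anchor constraints, not the paper's exact solver.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def reconstruct_from_deltas(deltas, neighbors, anchors, anchor_pos, n):
    """Recover n point positions (one coordinate axis) from delta coordinates.

    deltas:     delta coordinate of each point: p_i - mean(neighbors of i)
    neighbors:  list of neighbor-index lists, one per point
    anchors:    indices of points with known positions
    anchor_pos: their known coordinate values
    """
    L = sparse.lil_matrix((n + len(anchors), n))
    b = np.zeros(n + len(anchors))
    for i, nbrs in enumerate(neighbors):
        L[i, i] = 1.0
        for j in nbrs:
            L[i, j] = -1.0 / len(nbrs)
        b[i] = deltas[i]
    # Anchor rows pin down the global position (soft constraints).
    for k, (a, pos) in enumerate(zip(anchors, anchor_pos)):
        L[n + k, a] = 1.0
        b[n + k] = pos
    return lsqr(L.tocsr(), b)[0]

# Example: 4 points on a line, anchored at both ends.
neighbors = [[1], [0, 2], [1, 3], [2]]
true = np.array([0.0, 1.0, 2.0, 3.0])
deltas = [true[i] - np.mean(true[nb]) for i, nb in enumerate(neighbors)]
print(reconstruct_from_deltas(deltas, neighbors, [0, 3], [0.0, 3.0], 4))
```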
Prediction of sunflower leaf area at vegetative stage by image analysis and application to the estimation of water stress response parameters in post-registration varieties
results: A linear model with a posteriori smoothing estimates total plant leaf area with a relative squared error of 11% and an efficiency of 93%. Leaf expansion and transpiration responses (LER and TR) computed from these estimates were compared with those from manual measurements, showing that the automatically estimated parameters can be used for simulation. The image-based LER values are lower than those from manual measurements on Heliaphen but closer to manual measurements on greenhouse-grown plants, potentially suggesting that the automatic method overestimates stress sensitivity.
Abstract
The automatic measurement of developmental and physiological responses of sunflowers to water stress represents an applied challenge for better knowledge of the varieties available to growers, but also a fundamental one for identifying the biological, genetic and molecular bases of plant response to the environment. On INRAE Toulouse's Heliaphen high-throughput phenotyping platform, we set up two experiments, each with 8 varieties (2*96 plants), and acquired daily images of plants subjected or not to water stress using a light barrier. At the same time, we manually measured the leaf areas of these plants every other day for the duration of the stress, which lasted around ten days. The images were analyzed to extract morphological characteristics of the segmented plants, and different models were evaluated to estimate total plant leaf area from these data. A linear model with a posteriori smoothing estimated total leaf area with a relative squared error of 11% and an efficiency of 93%. Leaf areas estimated conventionally or with the developed model were used to calculate the leaf expansion and transpiration responses (LER and TR) used in the SUNFLO crop model for the 8 sunflower varieties studied. Correlation coefficients of 0.61 and 0.81 for LER and TR respectively validate the use of image-based leaf area estimation. However, the estimated values for LER are lower than with the manual method on Heliaphen, but closer overall to the manual method on greenhouse-grown plants, potentially suggesting an overestimation of stress sensitivity. It can be concluded that the LER and TR parameter estimates can be used for simulations. The low cost of this method (compared with manual measurements), the possibility of parallelizing and repeating measurements on the Heliaphen platform, and the benefit of the platform's data management are major improvements for valorizing the SUNFLO model and characterizing the drought sensitivity of cultivated varieties.
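A minimal sketch of the estimation pipeline described above: a linear model maps image-derived morphological features to total leaf area, and the resulting daily series is smoothed a posteriori. The feature set, training data, and smoothing window are assumptions for illustration, not the study's actual model.

```python
import numpy as np

# Assumed image-derived features per plant/day, e.g. projected area, height, width.
X_train = np.random.rand(40, 3)
y_train = X_train @ [2.0, 0.5, 1.0] + 0.1 * np.random.randn(40)  # manual leaf areas

# Fit a linear model (with intercept) by least squares.
coef, *_ = np.linalg.lstsq(np.c_[X_train, np.ones(40)], y_train, rcond=None)

def estimate_leaf_area(features):
    return np.c_[features, np.ones(len(features))] @ coef

def smooth(series, window=3):
    """A posteriori smoothing of the daily leaf-area series (moving average)."""
    kernel = np.ones(window) / window
    return np.convolve(series, kernel, mode="same")

daily_estimates = estimate_leaf_area(np.random.rand(10, 3))
smoothed = smooth(daily_estimates)
```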
Depth from Defocus Technique: A Simple Calibration-Free Approach for Dispersion Size Measurement
results: The study shows that the method can measure particle size and position with high accuracy and can track particle motion in multiphase flows.
Abstract
Dispersed particle size measurement is crucial in a variety of applications, be it the sizing of spray droplets, the tracking of particulate matter in multiphase flows, or the detection of target markers in machine vision systems. Beyond sizing, such systems are characterised by extracting quantitative information such as the spatial position and associated velocity of the dispersed-phase particles. In the present study, we propose an imaging-based volumetric measurement approach for estimating the size and position of spherically dispersed particles. The approach builds on the 'Depth from Defocus' (DFD) technique using a single camera. The simple optical configuration, consisting of a shadowgraph setup and a straightforward calibration procedure, makes this method readily deployable and accessible for broader applications.
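The principle behind DFD can be illustrated with the textbook thin-lens blur-circle relation: the diameter of the defocus blur encodes an object's distance from the focal plane, so measured blur can be inverted for depth. The sketch below shows the standard geometric-optics formula, not the paper's specific calibration-free procedure.

```python
def blur_circle_diameter(z, focus_dist, focal_len, aperture_diam):
    """Geometric-optics blur circle diameter (same units for all inputs).

    z:             object distance from the lens
    focus_dist:    distance at which the lens is focused
    focal_len:     focal length of the lens
    aperture_diam: aperture (entrance pupil) diameter
    """
    # Standard thin-lens defocus relation: blur grows with |z - focus_dist|.
    return (aperture_diam * focal_len * abs(z - focus_dist)
            / (z * (focus_dist - focal_len)))

# Example: 50 mm lens, f/2 (25 mm aperture), focused at 500 mm.
for z in (450.0, 500.0, 550.0):
    print(z, round(blur_circle_diameter(z, 500.0, 50.0, 25.0), 3))
```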
Efficient Unified Demosaicing for Bayer and Non-Bayer Patterned Image Sensors
results: Experimental results show that the method achieves state-of-the-art demosaicing performance on both synthetic and real RAW data, with high robustness and accuracy across different CFAs and lighting conditions.
Abstract
As the physical size of recent CMOS image sensors (CIS) gets smaller, the latest mobile cameras are adopting unique non-Bayer color filter array (CFA) patterns (e.g., Quad, Nona, QxQ), which consist of homogeneous color units with adjacent pixels. These non-Bayer sensors are superior to the conventional Bayer CFA thanks to their changeable pixel-bin sizes for different light conditions, but they may introduce visual artifacts during demosaicing due to their inherent pixel pattern structures and sensor hardware characteristics. Previous demosaicing methods have primarily focused on the Bayer CFA, necessitating distinct reconstruction methods for non-Bayer patterned CIS with various CFA modes under different lighting conditions. In this work, we propose an efficient unified demosaicing method that can be applied to both conventional Bayer RAW and various non-Bayer CFAs' RAW data in different operation modes. Our Knowledge Learning-based demosaicing model for Adaptive Patterns, namely KLAP, utilizes CFA-adaptive filters for only 1% of the key filters in the network for each CFA, yet still manages to effectively demosaic all the CFAs, yielding performance comparable to large-scale models. Furthermore, by employing meta-learning during inference (KLAP-M), our model is able to eliminate unknown sensor-generic artifacts in real RAW data, effectively bridging the gap between synthetic images and real sensor RAW. Our KLAP and KLAP-M methods achieve state-of-the-art demosaicing performance on both synthetic and real RAW data of Bayer and non-Bayer CFAs.
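The idea of CFA-adaptive key filters, a small per-CFA subset of filters alongside a large shared bank, can be sketched as below. This is a hedged interpretation of the description above; the split ratio and layer shapes are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CFAAdaptiveConv(nn.Module):
    """Conv layer with shared filters plus a small per-CFA filter bank (sketch)."""

    def __init__(self, in_ch, out_ch, num_cfas, adaptive_ratio=0.01):
        super().__init__()
        n_adaptive = max(1, int(out_ch * adaptive_ratio))  # ~1% of the filters
        self.shared = nn.Conv2d(in_ch, out_ch - n_adaptive, 3, padding=1)
        # One small filter bank per CFA mode (e.g., Bayer, Quad, Nona, QxQ).
        self.adaptive = nn.ModuleList(
            nn.Conv2d(in_ch, n_adaptive, 3, padding=1) for _ in range(num_cfas)
        )

    def forward(self, x, cfa_id):
        return torch.cat([self.shared(x), self.adaptive[cfa_id](x)], dim=1)

layer = CFAAdaptiveConv(in_ch=16, out_ch=64, num_cfas=4)
y = layer(torch.randn(1, 16, 32, 32), cfa_id=2)  # -> (1, 64, 32, 32)
```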
Physics-Driven Turbulence Image Restoration with Stochastic Refinement
paper_authors: Ajay Jaiswal, Xingguang Zhang, Stanley H. Chan, Zhangyang Wang
for: PiRN is proposed to restore images degraded by atmospheric turbulence in long-range optical imaging systems.
methods: PiRN integrates the physics-based turbulence simulator directly into the training process, helping the network adapt to real-world turbulence conditions. In addition, PiRN introduces Stochastic Refinement (SR) to boost perceptual quality.
results: PiRN and PiRN-SR improve generalization to unknown turbulence conditions and deliver state-of-the-art restoration in both pixel-wise accuracy and perceptual quality.
Abstract
Image distortion by atmospheric turbulence is a stochastic degradation and a critical problem in long-range optical imaging systems. A large body of research has been conducted over the past decades, including model-based and emerging deep-learning solutions aided by synthetic data. Although fast and physics-grounded simulation tools have recently been introduced to help deep-learning models adapt to real-world turbulence conditions, the training of such models relies only on synthetic data and ground-truth pairs. This paper proposes the Physics-integrated Restoration Network (PiRN), which brings the physics-based simulator directly into the training process to help the network disentangle the stochasticity from the degradation and the underlying image. Furthermore, to overcome the "average effect" introduced by deterministic models and the domain gap between synthetic and real-world degradation, we further introduce PiRN with Stochastic Refinement (PiRN-SR) to boost its perceptual quality. Overall, our PiRN and PiRN-SR improve generalization to real-world unknown turbulence conditions and provide state-of-the-art restoration in both pixel-wise accuracy and perceptual quality. Our codes are available at \url{https://github.com/VITA-Group/PiRN}.
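The physics-in-the-loop idea, rendering a fresh stochastic degradation with the simulator inside every training step so the network sees the randomness directly rather than a fixed pre-rendered dataset, can be outlined as below. The simulator, network, and loss here are placeholders, not the released training code.

```python
import torch

def training_step(net, simulator, clean_batch, optimizer):
    """One PiRN-style training step (illustrative outline)."""
    # Synthesize a fresh stochastic degradation inside the loop, rather than
    # training on a fixed set of pre-rendered degraded/clean pairs.
    with torch.no_grad():
        degraded = simulator(clean_batch)
    restored = net(degraded)
    loss = torch.nn.functional.l1_loss(restored, clean_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage; the lambda is a stand-in for a physics-grounded turbulence simulator.
net = torch.nn.Conv2d(1, 1, 3, padding=1)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
sim = lambda x: x + 0.1 * torch.randn_like(x)   # placeholder degradation
print(training_step(net, sim, torch.rand(2, 1, 32, 32), opt))
```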
paper_authors: Subhashis Suara, Aayush Jha, Pratik Sinha, Arif Ahmed Sekh
for: This paper is written for researchers and practitioners in medical imaging and artificial intelligence, particularly those interested in Explainable Deep Learning and its applications in medical imaging.
methods: The paper discusses various explainability techniques, including Grad-CAM, and their limitations in medical imaging applications.
results: The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging.
Abstract
Explainable Deep Learning has gained significant attention in the field of artificial intelligence (AI), particularly in domains such as medical imaging, where accurate and interpretable machine learning models are crucial for effective diagnosis and treatment planning. Grad-CAM is a baseline method that highlights the most critical regions of an image in a deep learning model's decision-making process, increasing interpretability and trust in the results. It is applied in many computer vision (CV) tasks such as classification and explanation. This study explores the principles of Explainable Deep Learning and its relevance to medical imaging, discusses various explainability techniques and their limitations, and examines medical imaging applications of Grad-CAM. The findings highlight the potential of Explainable Deep Learning and Grad-CAM in improving the accuracy and interpretability of deep learning models in medical imaging. The code is available at (will be available).
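For reference, the Grad-CAM computation itself is compact: global-average-pool the gradients of the class score with respect to a convolutional feature map to obtain channel weights, form the weighted sum of the channels, and apply a ReLU. A minimal PyTorch sketch follows; the model and choice of target layer are placeholders.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, target_layer, image, class_idx):
    """Minimal Grad-CAM: heatmap of evidence for class_idx in `image`."""
    feats, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: feats.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    # Channel weights = global average of gradients; weighted sum + ReLU.
    weights = grads[0].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats[0]).sum(dim=1, keepdim=True))
    return cam / (cam.max() + 1e-8)

# Usage sketch: `target_layer` is typically the last conv layer of the CNN, e.g.
# cam = grad_cam(model, model.layer4, image_tensor, class_idx=1)
```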
results: Provides a clear picture of young gamers' perceptions of and expectations for the Metaverse, which can inform the planning of more detailed subjective experiments and the development of MMSP technologies.
Abstract
When developing technologies for the Metaverse, it is important to understand the needs and requirements of end users. Relatively little is known about the specific perspectives on the use of the Metaverse by the youngest audience: children ten and under. This paper explores the Metaverse from the perspective of a young gamer. It examines their understanding of the Metaverse in relation to the physical world and other technologies they may be familiar with, looks at some of their expectations of the Metaverse, and then relates these to the specific multimedia signal processing (MMSP) research challenges. The perspectives presented in the paper may be useful for planning more detailed subjective experiments involving young gamers, as well as informing the research on MMSP technologies targeted at these users.
Adversarial Latent Autoencoder with Self-Attention for Structural Image Synthesis
paper_authors: Jiajie Fan, Laure Vuaille, Hao Wang, Thomas Bäck
for: SA-ALAE is proposed to facilitate industrial engineering processes by generating feasible design images of complex engineering parts.
methods: SA-ALAE uses a novel Self-Attention Adversarial Latent Autoencoder architecture, which generates feasible design images by leveraging the structural patterns and long-range dependencies in industrial design images.
results: SA-ALAE is shown to be effective in generating engineering blueprints in a real automotive design task, allowing users to explore novel variants of an existing design and to control the generation process by operating in latent space.
Abstract
Generative Engineering Design approaches driven by Deep Generative Models (DGMs) have been proposed to facilitate industrial engineering processes. In such processes, designs often come in the form of images, such as blueprints, engineering drawings, and CAD models, depending on the level of detail. DGMs have been successfully employed for the synthesis of natural images, e.g., displaying animals, human faces and landscapes. However, industrial design images are fundamentally different from natural scenes in that they contain rich structural patterns and long-range dependencies, which are challenging for convolution-based DGMs to generate. Moreover, the DGM-driven generation process is typically triggered by random noisy inputs, which yields unpredictable samples and thus cannot support efficient industrial design exploration. We tackle these challenges by proposing a novel model, the Self-Attention Adversarial Latent Autoencoder (SA-ALAE), which allows generating feasible design images of complex engineering parts. With SA-ALAE, users can not only explore novel variants of an existing design, but also control the generation process by operating in latent space. The potential of SA-ALAE is shown by generating engineering blueprints in a real automotive design task.
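The self-attention component, which lets distant regions of a structural drawing influence one another, is commonly implemented as SAGAN-style attention over feature maps; a hedged sketch of that pattern is shown below (channel sizes assumed; this is not necessarily the paper's exact module).

```python
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over spatial feature maps (sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # starts as identity mapping

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (b, hw, c//8) queries
        k = self.k(x).flatten(2)                   # (b, c//8, hw) keys
        attn = torch.softmax(q @ k, dim=-1)        # (b, hw, hw) attention map
        v = self.v(x).flatten(2)                   # (b, c, hw) values
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                # residual connection

y = SelfAttention2d(64)(torch.randn(1, 64, 16, 16))  # -> (1, 64, 16, 16)
```

The learnable `gamma` starts at zero so the block initially passes features through unchanged, letting attention strength grow during training.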
Make-A-Volume: Leveraging Latent Diffusion Models for Cross-Modality 3D Brain MRI Synthesis
results: The results show that the Make-A-Volume framework achieves efficient cross-modality medical image synthesis with volumetric consistency while avoiding mode collapse and unstable training.
Abstract
Cross-modality medical image synthesis is a critical topic with the potential to facilitate numerous applications in the medical imaging field. Despite recent successes in deep-learning-based generative models, most current medical image synthesis methods rely on generative adversarial networks and suffer from notorious mode collapse and unstable training. Moreover, 2D backbone-driven approaches easily result in volumetric inconsistency, while 3D backbones are challenging and impractical due to the tremendous memory cost and training difficulty. In this paper, we introduce a new paradigm for volumetric medical data synthesis by leveraging 2D backbones and present a diffusion-based framework, Make-A-Volume, for cross-modality 3D medical image synthesis. To learn the cross-modality slice-wise mapping, we employ a latent diffusion model and learn a low-dimensional latent space, resulting in high computational efficiency. To enable 3D image synthesis and mitigate volumetric inconsistency, we further insert a series of volumetric layers into the 2D slice-mapping model and fine-tune them with paired 3D data. This paradigm extends the 2D image diffusion model to a volumetric version with only a slight increase in parameters and computation, offering a principled solution for generic cross-modality 3D medical image synthesis. We showcase the effectiveness of our Make-A-Volume framework on an in-house SWI-MRA brain MRI dataset and a public T1-T2 brain MRI dataset. Experimental results demonstrate that our framework achieves superior synthesis results with volumetric consistency.
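The volumetric layers inserted into the 2D slice-mapping model can be sketched as a pseudo-3D pattern: a 2D convolution applied per slice followed by a 1D convolution along the slice axis to enforce inter-slice consistency. Shapes and kernel sizes below are assumptions for illustration, not the paper's exact layers.

```python
import torch
import torch.nn as nn

class PseudoVolumetricBlock(nn.Module):
    """2D in-slice conv + 1D through-slice conv (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.through_slice = nn.Conv1d(channels, channels, 3, padding=1)

    def forward(self, x):
        # x: (batch, channels, depth, height, width) volume features
        b, c, d, h, w = x.shape
        # Apply the 2D conv independently to each slice.
        y = self.spatial(x.permute(0, 2, 1, 3, 4).reshape(b * d, c, h, w))
        # Mix information along the slice axis at each spatial location.
        y = y.view(b, d, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, d)
        y = self.through_slice(y)
        return y.view(b, h, w, c, d).permute(0, 3, 4, 1, 2)

out = PseudoVolumetricBlock(8)(torch.randn(1, 8, 6, 16, 16))  # -> (1, 8, 6, 16, 16)
```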