results: Testing showed that this method can synthesize segmentations of the main cerebral vessels from only T2 MRI, reaching the level of state-of-the-art segmentation networks, including transformer U-Net and nnU-net, while using comparatively few parameters. The main qualitative difference was the sharper resolution of the synthetic vessel segmentations, especially in the posterior circulation.
Abstract
Magnetic resonance angiography (MRA) is an imaging modality for visualising blood vessels. It is useful for several diagnostic applications and for assessing the risk of adverse events such as haemorrhagic stroke (resulting from the rupture of aneurysms in blood vessels). However, MRAs are not acquired routinely; hence, an approach to synthesise blood vessel segmentations from more routinely acquired MR contrasts, such as T1 and T2, would be useful. We present an encoder-decoder model for synthesising segmentations of the main cerebral arteries in the circle of Willis (CoW) from only T2 MRI. We propose a two-phase multi-objective learning approach, which captures both global and local features. It uses learned local attention maps generated by dilating the segmentation labels, which forces the network to extract only the information from the T2 MRI relevant to synthesising the CoW. Our synthetic vessel segmentations generated from only T2 MRI achieved a mean Dice score of $0.79 \pm 0.03$ in testing, compared to state-of-the-art segmentation networks such as transformer U-Net ($0.71 \pm 0.04$) and nnU-net ($0.68 \pm 0.05$), while using only a fraction of the parameters. The main qualitative difference between our synthetic vessel segmentations and the comparative models was in the sharper resolution of the CoW vessel segments, especially in the posterior circulation.
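A minimal sketch of the label-dilation idea, assuming binary 3D vessel labels; make_local_attention_map and masked_l1 are hypothetical helper names, not the authors' code:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def make_local_attention_map(label_volume: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Dilate a binary vessel label into a local attention mask covering
    the vessel and its immediate neighbourhood (dilation width assumed)."""
    return binary_dilation(label_volume > 0, iterations=iterations).astype(np.float32)

def masked_l1(pred: np.ndarray, target: np.ndarray, attention: np.ndarray) -> float:
    """Restrict a voxel-wise L1 loss to the dilated vessel neighbourhood,
    so only T2 regions relevant to the CoW drive the objective."""
    return float(np.sum(np.abs(pred - target) * attention) / (attention.sum() + 1e-8))
```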
Achromatic imaging systems with flat lenses enabled by deep learning
results: With this approach, the authors achieve high-quality imaging over the entire visible spectrum, and the reconstructions also reach new state-of-the-art levels on quantitative PSNR and SSIM scores.
Abstract
Motivated by their great potential to reduce the size, cost and weight, flat lenses, a category that includes diffractive lenses and metalenses, are rapidly emerging as key components with the potential to replace the traditional refractive optical elements in modern optical systems. Yet, the inherently strong chromatic aberration of these flat lenses significantly impairs their performance in systems based on polychromatic illumination or passive ambient light illumination, stalling their widespread implementation. Here, we provide a promising solution and demonstrate high-quality imaging based on flat lenses over the entire visible spectrum. Our approach is based on creating a novel dataset of color outdoor images taken with our flat lens and using this dataset to train a deep-learning model for chromatic aberration correction. Based on this approach we show unprecedented imaging results not only in terms of qualitative measures but also in the quantitative terms of the PSNR and SSIM scores of the reconstructed images. The results pave the way for the implementation of flat lenses in advanced polychromatic imaging systems.
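For reference, the quantitative metrics cited above (PSNR and SSIM) can be computed with scikit-image; this is a generic metric sketch, not the paper's evaluation pipeline, and the [0, 1] float range is an assumption:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(reference: np.ndarray, reconstructed: np.ndarray):
    """PSNR and SSIM between a ground-truth image and a reconstruction.
    Both arrays are assumed to be H x W x 3 floats in [0, 1]."""
    psnr = peak_signal_noise_ratio(reference, reconstructed, data_range=1.0)
    ssim = structural_similarity(reference, reconstructed, channel_axis=-1, data_range=1.0)
    return psnr, ssim
```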
A Study of Age and Sex Bias in Multiple Instance Learning based Classification of Acute Myeloid Leukemia Subtypes
results: We find a statistically significant effect of sex and age bias on model performance for AML subtype classification. Specifically, female patients are more strongly affected by sex imbalance in the training data, and certain age groups, such as patients 72 to 86 years of age with the RUNX1::RUNX1T1 genetic subtype, are significantly affected by an age bias present in the training data. Ensuring inclusivity in the training data is essential for reliable and equitable outcomes, ultimately benefiting diverse patient populations.
Abstract
Accurate classification of Acute Myeloid Leukemia (AML) subtypes is crucial for clinical decision-making and patient care. In this study, we investigate the potential presence of age and sex bias in AML subtype classification using Multiple Instance Learning (MIL) architectures. To that end, we train multiple MIL models using different levels of sex imbalance in the training set and excluding certain age groups. To assess the sex bias, we evaluate the performance of the models on male and female test sets. For age bias, models are tested against underrepresented age groups in the training data. We find a significant effect of sex and age bias on the performance of the model for AML subtype classification. Specifically, we observe that females are more likely to be affected by sex imbalance in the dataset, and certain age groups, such as patients 72 to 86 years of age with the RUNX1::RUNX1T1 genetic subtype, are significantly affected by an age bias present in the training data. Ensuring inclusivity in the training data is thus essential for generating reliable and equitable outcomes in AML genetic subtype classification, ultimately benefiting diverse patient populations.
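A hedged sketch of how a sex-imbalanced training split for such a bias study might be built; the patient-table schema (a 'sex' column with 'F'/'M' values) and the sampling scheme are illustrative assumptions, not the authors' protocol:

```python
import pandas as pd

def make_sex_imbalanced_split(patients: pd.DataFrame, female_fraction: float,
                              n_total: int, seed: int = 0) -> pd.DataFrame:
    """Subsample a training set with a controlled sex ratio, e.g.
    female_fraction=0.25 for a 25%/75% female/male split."""
    n_female = int(round(female_fraction * n_total))
    females = patients[patients["sex"] == "F"].sample(n_female, random_state=seed)
    males = patients[patients["sex"] == "M"].sample(n_total - n_female, random_state=seed)
    # Shuffle so batches are not ordered by sex.
    return pd.concat([females, males]).sample(frac=1.0, random_state=seed)
```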
SCP: Spherical-Coordinate-based Learned Point Cloud Compression
results: Experimental results show that SCP surpasses previous state-of-the-art methods, improving both compression rate and reconstruction quality, with a gain of up to 29.14% in point-to-point PSNR BD-Rate.
Abstract
In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular shapes and azimuthal angle invariance features within the point clouds. However, these two features have been largely overlooked by previous methodologies. In this paper, we introduce a model-agnostic method called Spherical-Coordinate-based learned Point cloud compression (SCP), designed to leverage the aforementioned features fully. Additionally, we propose a multi-level Octree for SCP to mitigate the reconstruction error for distant areas within the Spherical-coordinate-based Octree. SCP exhibits excellent universality, making it applicable to various learned point cloud compression techniques. Experimental results demonstrate that SCP surpasses previous state-of-the-art methods by up to 29.14% in point-to-point PSNR BD-Rate.
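The Cartesian-to-spherical change of coordinates that SCP builds on can be sketched as below; the angle conventions are one common choice, and the paper's octree construction and entropy model are not shown:

```python
import numpy as np

def cartesian_to_spherical(xyz: np.ndarray) -> np.ndarray:
    """Map an N x 3 array of Cartesian LiDAR points to
    (radius, azimuth, elevation), aligning the representation with the
    circular, azimuth-invariant structure of spinning-LiDAR scans."""
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)
    elevation = np.arcsin(np.clip(z / np.maximum(r, 1e-12), -1.0, 1.0))
    return np.stack([r, azimuth, elevation], axis=1)
```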
FFEINR: Flow Feature-Enhanced Implicit Neural Representation for Spatio-temporal Super-Resolution
results: Compared with the trilinear interpolation method, FFEINR achieves significantly better results.
Abstract
Large-scale numerical simulations are capable of generating data up to terabytes or even petabytes. As a promising method of data reduction, super-resolution (SR) has been widely studied in the scientific visualization community. However, most existing methods are based on deep convolutional neural networks (CNNs) or generative adversarial networks (GANs), and the scale factor needs to be determined before constructing the network. As a result, a single training session only supports a fixed factor and has poor generalization ability. To address these problems, this paper proposes a Flow Feature-Enhanced Implicit Neural Representation (FFEINR) for spatio-temporal super-resolution of flow field data. It takes full advantage of implicit neural representation in terms of model structure and sampling resolution. The neural representation is based on a fully connected network with periodic activation functions, which enables us to obtain lightweight models. The learned continuous representation can decode the low-resolution flow field input data to arbitrary spatial and temporal resolutions, allowing for flexible upsampling. The training process of FFEINR is facilitated by introducing feature enhancements for the input layer, which complement the contextual information of the flow field. To demonstrate the effectiveness of the proposed method, a series of experiments are conducted on different datasets with different hyperparameter settings. The results show that FFEINR achieves significantly better results than the trilinear interpolation method.
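A minimal sketch of a fully connected network with periodic (sine) activations, the kind of lightweight implicit representation described above; the layer widths, frequency w0, and 3-component output are assumptions, and the paper's input-layer feature enhancement is omitted:

```python
import torch
import torch.nn as nn

class Sine(nn.Module):
    """Periodic activation for SIREN-style implicit networks."""
    def __init__(self, w0: float = 30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.w0 * x)

# (x, y, z, t) -> flow quantity; a 3-component velocity output is assumed.
inr = nn.Sequential(
    nn.Linear(4, 256), Sine(),
    nn.Linear(256, 256), Sine(),
    nn.Linear(256, 3),
)
coords = torch.rand(1024, 4)  # arbitrary continuous sample positions
values = inr(coords)          # decode at any spatial/temporal resolution
```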
MOFA: A Model Simplification Roadmap for Image Restoration on Mobile Devices
results: Extensive experiments on several image restoration datasets show that our approach decreases runtime by up to 13% and reduces the number of parameters by up to 23%, while increasing PSNR and SSIM. The source code is available at \href{https://github.com/xiangyu8/MOFA}{https://github.com/xiangyu8/MOFA}.
Abstract
Image restoration aims to restore high-quality images from degraded counterparts and has seen significant advancements through deep learning techniques. The technique has been widely applied to mobile devices for tasks such as mobile photography. Given the resource limitations on mobile devices, such as memory constraints and runtime requirements, the efficiency of models during deployment becomes paramount. Nevertheless, most previous works have primarily concentrated on analyzing the efficiency of single modules and improving them individually. This paper examines the efficiency across different layers. We propose a roadmap that can be applied to further accelerate image restoration models prior to deployment while simultaneously increasing PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index). The roadmap first increases the model capacity by adding more parameters to partial convolutions on FLOPs non-sensitive layers. Then, it applies partial depthwise convolution coupled with decoupling upsampling/downsampling layers to accelerate the model speed. Extensive experiments demonstrate that our approach decreases runtime by up to 13% and reduces the number of parameters by up to 23%, while increasing PSNR and SSIM on several image restoration datasets. Source Code of our method is available at \href{https://github.com/xiangyu8/MOFA}{https://github.com/xiangyu8/MOFA}.
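One way to read "partial depthwise convolution" is to run a depthwise kernel over only a fraction of the channels and pass the rest through untouched, which is cheaper than a full depthwise layer; the module below is a sketch of that idea under assumed hyperparameters, not the paper's exact block:

```python
import torch
import torch.nn as nn

class PartialDepthwiseConv(nn.Module):
    """Depthwise-convolve the first `part` fraction of channels;
    the remaining channels bypass the convolution entirely."""
    def __init__(self, channels: int, part: float = 0.25):
        super().__init__()
        self.n_conv = max(1, int(channels * part))
        self.dwconv = nn.Conv2d(self.n_conv, self.n_conv, kernel_size=3,
                                padding=1, groups=self.n_conv, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = torch.split(x, [self.n_conv, x.shape[1] - self.n_conv], dim=1)
        return torch.cat([self.dwconv(a), b], dim=1)
```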
InverseSR: 3D Brain MRI Super-Resolution Using a Latent Diffusion Model
results: We validate the method on over 100 brain T1w MRIs from the IXI dataset and demonstrate that the priors learned by the LDM can be used for MRI reconstruction.
Abstract
High-resolution (HR) MRI scans obtained from research-grade medical centers provide precise information about imaged tissues. However, routine clinical MRI scans are typically in low-resolution (LR) and vary greatly in contrast and spatial resolution due to the adjustments of the scanning parameters to the local needs of the medical center. End-to-end deep learning methods for MRI super-resolution (SR) have been proposed, but they require re-training each time there is a shift in the input distribution. To address this issue, we propose a novel approach that leverages a state-of-the-art 3D brain generative model, the latent diffusion model (LDM) trained on UK Biobank, to increase the resolution of clinical MRI scans. The LDM acts as a generative prior, which has the ability to capture the prior distribution of 3D T1-weighted brain MRI. Based on the architecture of the brain LDM, we find that different methods are suitable for different settings of MRI SR, and thus propose two novel strategies: 1) for SR with more sparsity, we invert through both the decoder of the LDM and a deterministic Denoising Diffusion Implicit Model (DDIM), an approach we call InverseSR(LDM); 2) for SR with less sparsity, we invert only through the LDM decoder, an approach we call InverseSR(Decoder). These two approaches search different latent spaces in the LDM model to find the optimal latent code to map the given LR MRI into HR. The training process of the generative model is independent of the MRI under-sampling process, ensuring the generalization of our method to many MRI SR problems with different input measurements. We validate our method on over 100 brain T1w MRIs from the IXI dataset. Our results demonstrate that the powerful priors given by the LDM can be used for MRI reconstruction.
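A hedged sketch of the InverseSR(Decoder) strategy: optimise a latent code through a frozen decoder so that a known degradation of the decoded volume matches the LR observation. The latent shape, optimiser settings, and `degrade` operator are placeholders, not the paper's components:

```python
import torch

def invert_through_decoder(decoder, degrade, lr_scan, steps: int = 500, lr: float = 1e-2):
    """Search the decoder's latent space for a code whose degraded
    output matches the given LR scan; `decoder` maps latents to HR
    volumes and `degrade` is a known differentiable down-sampler."""
    z = torch.zeros(1, 1024, requires_grad=True)  # latent shape assumed
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(degrade(decoder(z)), lr_scan)
        loss.backward()
        opt.step()
    return decoder(z).detach()
```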
HNAS-reg: hierarchical neural architecture search for deformable medical image registration
methods: The paper uses a hierarchical NAS framework (HNAS-Reg), consisting of both convolutional operation search and network topology search, to identify the optimal network architecture. A partial channel strategy is employed to reduce the computational overhead and memory footprint without sacrificing optimization quality.
results: Experiments show that the proposed method can build a deep learning model with improved medical image registration accuracy and reduced model size, outperforming prior state-of-the-art medical image registration methods.
Abstract
Convolutional neural networks (CNNs) have been widely used to build deep learning models for medical image registration, but manually designed network architectures are not necessarily optimal. This paper presents a hierarchical NAS framework (HNAS-Reg), consisting of both convolutional operation search and network topology search, to identify the optimal network architecture for deformable medical image registration. To mitigate the computational overhead and memory constraints, a partial channel strategy is utilized without losing optimization quality. Experiments on three datasets, consisting of 636 T1-weighted magnetic resonance images (MRIs), have demonstrated that the proposed method can build a deep learning model with improved image registration accuracy and reduced model size, compared with state-of-the-art image registration approaches, including one representative traditional approach and two unsupervised learning-based approaches.
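The partial channel strategy is in the spirit of PC-DARTS: route only 1/k of the channels through the searched candidate operations to cut search-time memory. A sketch under assumed settings follows; it is not the paper's implementation:

```python
import torch
import torch.nn as nn

class PartialChannelMixedOp(nn.Module):
    """Mixed operation for differentiable architecture search in which
    only the first C/k channels pass through the weighted candidate ops;
    each candidate must preserve its input channel count."""
    def __init__(self, channels: int, candidates: list, k: int = 4):
        super().__init__()
        self.k = k
        self.ops = nn.ModuleList(candidates)
        self.alpha = nn.Parameter(torch.zeros(len(candidates)))  # architecture weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = x.shape[1] // self.k
        xa, xb = x[:, :c], x[:, c:]
        w = torch.softmax(self.alpha, dim=0)
        mixed = sum(wi * op(xa) for wi, op in zip(w, self.ops))
        return torch.cat([mixed, xb], dim=1)
```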
Reframing the Brain Age Prediction Problem to a More Interpretable and Quantitative Approach
results: The results indicate that voxel-wise prediction models are more interpretable than global prediction models, since they provide spatial information about the brain aging process, and they benefit from being quantitative.
Abstract
Deep learning models have achieved state-of-the-art results in estimating brain age, an important brain health biomarker, from magnetic resonance (MR) images. However, most of these models only provide a global age prediction and rely on techniques such as saliency maps to interpret their results. These saliency maps highlight regions in the input image that were significant for the model's predictions, but they are hard to interpret, and saliency map values are not directly comparable across different samples. In this work, we reframe the age prediction problem from MR images as an image-to-image regression problem in which we estimate the brain age for each brain voxel in MR images. We compare voxel-wise age prediction models against global age prediction models and their corresponding saliency maps. The results indicate that voxel-wise age prediction models are more interpretable, since they provide spatial information about the brain aging process, and they benefit from being quantitative.
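A minimal sketch of the voxel-wise reframing: predict an age map and supervise it only inside the brain mask. Broadcasting the chronological age to every brain voxel is one plausible training target, assumed here rather than taken from the paper:

```python
import torch

def voxelwise_age_loss(pred_age_map: torch.Tensor, chronological_age: torch.Tensor,
                       brain_mask: torch.Tensor) -> torch.Tensor:
    """Masked L1 loss for an image-to-image age regressor.
    Shapes: pred_age_map and brain_mask are (B, 1, D, H, W);
    chronological_age is (B,)."""
    brain_mask = brain_mask.float()
    target = brain_mask * chronological_age.view(-1, 1, 1, 1, 1)
    err = (pred_age_map - target).abs() * brain_mask
    return err.sum() / brain_mask.sum().clamp_min(1.0)
```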
SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation
results: On the MoNuSeg-2018 dataset, SPPNet outperforms existing U-shape architectures and converges faster during training. Compared with the segment anything model, SPPNet runs roughly 20 times faster at inference with about 1/70 of the parameters and computational cost, and needs only a single point prompt in both training and inference. These results indicate that SPPNet is an efficient and reliable method for nuclei image segmentation.
Abstract
Image segmentation plays an essential role in nuclei image analysis. Recently, the segment anything model has made a significant breakthrough in such tasks. However, the current model has two major issues for cell segmentation: (1) the image encoder of the segment anything model involves a large number of parameters, and retraining or even fine-tuning the model still requires expensive computational resources; (2) in point prompt mode, points are sampled from the center of the ground truth, and more than one set of points is expected to achieve reliable performance, which is not efficient for practical applications. In this paper, a single-point prompt network, called SPPNet, is proposed for nuclei image segmentation. We replace the original image encoder with a lightweight vision transformer. Also, an effective convolutional block is added in parallel to extract the low-level semantic information from the image and compensate for the performance degradation due to the small image encoder. We propose a new point-sampling method based on the Gaussian kernel. The proposed model is evaluated on the MoNuSeg-2018 dataset. The results demonstrate that SPPNet outperforms existing U-shape architectures and shows faster convergence in training. Compared to the segment anything model, SPPNet shows roughly 20 times faster inference, with 1/70 of the parameters and computational cost. Particularly, only one set of points is required in both the training and inference phases, which is more reasonable for clinical applications. The code for our work and more technical details can be found at https://github.com/xiangyu8/SPPNet.
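A sketch of Gaussian-kernel point sampling for a single prompt: jitter around the nucleus centroid and keep the first in-mask point. The sigma value and fallback rule are assumptions, not the authors' exact procedure:

```python
import numpy as np

def gaussian_point_prompt(mask: np.ndarray, sigma: float = 2.0, rng=None):
    """Sample one (row, col) prompt inside a binary nucleus mask,
    drawn from a Gaussian centred on the mask centroid."""
    rng = rng or np.random.default_rng()
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()
    for _ in range(100):
        y = int(round(rng.normal(cy, sigma)))
        x = int(round(rng.normal(cx, sigma)))
        if 0 <= y < mask.shape[0] and 0 <= x < mask.shape[1] and mask[y, x]:
            return y, x
    # Fallback: the foreground pixel closest to the centroid.
    i = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
    return int(ys[i]), int(xs[i])
```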