results: The study demonstrates that accurate material predictions can be obtained on noisy input images by incorporating an image segmentation step. Furthermore, shielded objects can be identified after first determining the properties of the shielding.
Abstract
Dual energy cargo inspection systems are sensitive to both the area density and the atomic number of an imaged container due to the Z dependence of photon attenuation. The ability to identify cargo contents by their atomic number enables improved detection capabilities of illicit materials. Existing methods typically classify materials into a few material classes using an empirical calibration step. However, such a coarse label discretization limits atomic number selectivity and can yield inaccurate results if a material is near the midpoint of two bins. This work introduces a high resolution atomic number prediction method by minimizing the chi-squared error between measured transparency values and a semiempirical transparency model. Our previous work showed that by incorporating a calibration step, the semiempirical transparency model can capture second order effects such as scattering. This method is benchmarked using two simulated radiographic phantoms, demonstrating the ability to obtain accurate material predictions on noisy input images by incorporating an image segmentation step. Furthermore, we show that this approach can be adapted to identify shielded objects after first determining the properties of the shielding, taking advantage of the closed-form nature of the transparency model.
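The chi-squared fit against a semiempirical transparency model can be sketched with a toy attenuation law. Everything below is an illustrative stand-in, not the paper's calibrated model: the energies, coefficients, and the `Z**3/E` photoelectric-like term are assumptions chosen only to make the grid search concrete.

```python
import numpy as np

# Toy semiempirical transparency model T(Z, x) = exp(-mu(Z, E) * x).
# mu mixes a Compton-like constant and a photoelectric-like Z^3/E term;
# both coefficients are illustrative, not calibrated values.
ENERGIES = [4.0, 6.0]  # dual-energy beam endpoints in MeV (assumed)

def transparency(Z, x, E):
    mu = 0.02 + 1e-6 * Z**3 / E  # toy attenuation coefficient
    return np.exp(-mu * x)

def predict_Z(T_meas, x, sigma=1e-3):
    """Grid search over Z minimizing the chi-squared error between
    measured and modeled transparencies."""
    Z_grid = np.arange(1, 101)
    chi2 = [
        sum((transparency(Z, x, E) - T) ** 2 / sigma**2
            for E, T in zip(ENERGIES, T_meas))
        for Z in Z_grid
    ]
    return int(Z_grid[int(np.argmin(chi2))])

# simulate a noiseless measurement of iron (Z = 26) at area density x
x_true, Z_true = 40.0, 26
T_meas = [transparency(Z_true, x_true, E) for E in ENERGIES]
print(predict_Z(T_meas, x_true))  # → 26
```

Because the model is closed-form, the same grid search extends to shielded objects: first fit the shielding's (Z, x), then divide its transparency out of the measurement.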
On the Effectiveness of Spectral Discriminators for Perceptual Quality Improvement
results: Our method better preserves the spectra of SR images, leading to an improved perception-distortion (PD) tradeoff. Moreover, our ensembled discriminator predicts perceptual quality more accurately, achieving better results on the no-reference image quality assessment task.
Abstract
Several recent studies advocate the use of spectral discriminators, which evaluate the Fourier spectra of images for generative modeling. However, the effectiveness of the spectral discriminators is not well interpreted yet. We tackle this issue by examining the spectral discriminators in the context of perceptual image super-resolution (i.e., GAN-based SR), as SR image quality is susceptible to spectral changes. Our analyses reveal that the spectral discriminator indeed performs better than the ordinary (a.k.a. spatial) discriminator in identifying the differences in the high-frequency range; however, the spatial discriminator holds an advantage in the low-frequency range. Thus, we suggest that the spectral and spatial discriminators shall be used simultaneously. Moreover, we improve the spectral discriminators by first calculating the patch-wise Fourier spectrum and then aggregating the spectra by Transformer. We verify the effectiveness of the proposed method twofold. On the one hand, thanks to the additional spectral discriminator, our obtained SR images have their spectra better aligned to those of the real images, which leads to a better PD tradeoff. On the other hand, our ensembled discriminator predicts the perceptual quality more accurately, as evidenced in the no-reference image quality assessment task.
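The patch-wise spectrum computation that feeds the aggregator can be sketched as follows. The patch size and function names are our own, and the paper's learned Transformer aggregation is replaced by mean-pooling purely as a stand-in:

```python
import numpy as np

def patchwise_spectra(img, patch=8):
    """Log-amplitude Fourier spectra of non-overlapping patches.

    Returns an array of shape (n_patches, patch*patch): one 'spectral
    token' per patch, ready to be aggregated (by a Transformer in the
    paper; mean-pooling below is only a placeholder).
    """
    H, W = img.shape
    tokens = []
    for i in range(0, H - patch + 1, patch):
        for j in range(0, W - patch + 1, patch):
            spec = np.fft.fft2(img[i:i + patch, j:j + patch])
            tokens.append(np.log1p(np.abs(np.fft.fftshift(spec))).ravel())
    return np.stack(tokens)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))
tokens = patchwise_spectra(img)
print(tokens.shape)  # → (16, 64): a 4x4 grid of 8x8 patches
pooled = tokens.mean(axis=0)  # placeholder for Transformer aggregation
```

Working patch-wise keeps local spatial context that a single global FFT would discard, which is what makes Transformer aggregation over the tokens meaningful.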
A Cascade Transformer-based Model for 3D Dose Distribution Prediction in Head and Neck Cancer Radiotherapy
results: On an in-house head and neck cancer dataset, the segmentation subnet achieved Dice and HD95 scores of 0.79 and 2.71, respectively, outperforming existing baselines. On the OpenKBP dataset, the model achieved dose and DVH scores of 2.77 and 1.79, with further gains when linked with the auxiliary segmentation task.
Abstract
Radiation therapy is the primary method used to treat cancer in the clinic. Its goal is to deliver a precise dose to the planning target volume (PTV) while protecting the surrounding organs at risk (OARs). However, the traditional workflow used by dosimetrists to plan the treatment is time-consuming and subjective, requiring iterative adjustments based on their experience. Deep learning methods can be used to predict dose distribution maps to address these limitations. The study proposes a cascade model for organs at risk segmentation and dose distribution prediction. An encoder-decoder network has been developed for the segmentation task, in which the encoder consists of transformer blocks, and the decoder uses multi-scale convolutional blocks. Another cascade encoder-decoder network has been proposed for dose distribution prediction using a pyramid architecture. The proposed model has been evaluated using an in-house head and neck cancer dataset of 96 patients and OpenKBP, a public head and neck cancer dataset of 340 patients. The segmentation subnet achieved 0.79 and 2.71 for Dice and HD95 scores, respectively. This subnet outperformed the existing baselines. The dose distribution prediction subnet outperformed the winner of the OpenKBP2020 competition with 2.77 and 1.79 for dose and DVH scores, respectively. The predicted dose maps showed good coincidence with ground truth, with a superiority after linking with the auxiliary segmentation task. The proposed model outperformed state-of-the-art methods, especially in regions with low prescribed doses.
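The DVH score reported above compares dose-volume histograms. A minimal sketch of a cumulative DVH for one structure follows; the synthetic dose map, mask, and binning are illustrative and this is not OpenKBP's exact scoring formula:

```python
import numpy as np

def dvh(dose, mask, n_levels=50):
    """Cumulative dose-volume histogram: the fraction of the structure's
    volume receiving at least each dose level."""
    d = dose[mask]
    levels = np.linspace(0.0, d.max(), n_levels)
    volume = np.array([(d >= lv).mean() for lv in levels])
    return levels, volume

rng = np.random.default_rng(1)
dose = rng.uniform(0.0, 70.0, size=(16, 16, 16))  # synthetic dose map in Gy
ptv = np.zeros(dose.shape, dtype=bool)
ptv[4:12, 4:12, 4:12] = True                      # toy PTV mask
levels, volume = dvh(dose, ptv)
print(volume[0])  # → 1.0: the whole structure receives at least 0 Gy
```

A DVH-based score penalizes clinically relevant errors (coverage of the PTV, sparing of OARs) rather than raw voxel-wise dose differences.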
ELiOT : End-to-end Lidar Odometry using Transformer Framework
results: The method shows encouraging results on urban data, with translational and rotational errors of 7.59% and 2.67%, respectively, on the KITTI odometry dataset.
Abstract
In recent years, deep-learning-based point cloud registration methods have shown significant promise. Furthermore, learning-based 3D detectors have demonstrated their effectiveness in encoding semantic information from LiDAR data. In this paper, we introduce ELiOT, an end-to-end LiDAR odometry framework built on a transformer architecture. Our proposed Self-attention flow embedding network implicitly represents the motion of sequential LiDAR scenes, bypassing the need for 3D-2D projections traditionally used in such tasks. The network pipeline, composed of a 3D transformer encoder-decoder, has shown effectiveness in predicting poses on urban datasets. In terms of translational and rotational errors, our proposed method yields encouraging results, with 7.59% and 2.67% respectively on the KITTI odometry dataset. This is achieved with an end-to-end approach that foregoes the need for conventional geometric concepts.
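The translational and rotational errors quoted above reduce to a relative-pose residual between predicted and ground-truth transforms. A minimal single-pair sketch follows; the KITTI benchmark additionally averages such errors over subsequences of fixed path lengths, which is omitted here:

```python
import numpy as np

def pose_errors(T_pred, T_gt):
    """Relative pose error between two 4x4 homogeneous transforms.

    Returns (translational error in metres, rotational error in radians),
    computed from the residual transform inv(T_gt) @ T_pred.
    """
    dT = np.linalg.inv(T_gt) @ T_pred
    t_err = np.linalg.norm(dT[:3, 3])
    # rotation angle recovered from the trace of the residual rotation
    cos_a = np.clip((np.trace(dT[:3, :3]) - 1.0) / 2.0, -1.0, 1.0)
    return t_err, np.arccos(cos_a)

T_pred = np.eye(4)
T_pred[0, 3] = 0.1  # a 10 cm translation error, perfect rotation
t_err, r_err = pose_errors(T_pred, np.eye(4))
print(t_err, r_err)  # ≈ 0.1, 0.0
```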
Topology-Preserving Automatic Labeling of Coronary Arteries via Anatomy-aware Connection Classifier
results: We contribute high-quality coronary artery labeling annotations to the public orCaScore dataset and run experiments on both the orCaScore dataset and an in-house dataset. The results show that our TopoLab achieves state-of-the-art performance.
Abstract
Automatic labeling of coronary arteries is an essential task in the practical diagnosis process of cardiovascular diseases. For experienced radiologists, the anatomically predetermined connections are important for labeling the artery segments accurately, while this prior knowledge is barely explored in previous studies. In this paper, we present a new framework called TopoLab which incorporates the anatomical connections into the network design explicitly. Specifically, the strategies of intra-segment feature aggregation and inter-segment feature interaction are introduced for hierarchical segment feature extraction. Moreover, we propose the anatomy-aware connection classifier to enable classification for each connected segment pair, which effectively exploits the prior topology among the arteries with different categories. To validate the effectiveness of our method, we contribute high-quality annotations of artery labeling to the public orCaScore dataset. The experimental results on both the orCaScore dataset and an in-house dataset show that our TopoLab has achieved state-of-the-art performance.
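The idea of constraining a connection classifier with an anatomical prior can be sketched as follows. The label set, the allowed-pair set, and the linear scorer are all hypothetical; TopoLab's classifier is a learned network, not this masking rule:

```python
import numpy as np

LABELS = ["LM", "LAD", "LCX", "RCA"]       # hypothetical artery labels
ALLOWED = {("LM", "LAD"), ("LM", "LCX")}   # hypothetical anatomical prior

def classify_connection(feat_a, feat_b, W):
    """Score every (parent, child) label pair for one connected segment
    pair, masking anatomically impossible combinations with -inf."""
    logits = W @ np.concatenate([feat_a, feat_b])  # one logit per label pair
    scores = {}
    k = 0
    for a in LABELS:
        for b in LABELS:
            scores[(a, b)] = logits[k] if (a, b) in ALLOWED else -np.inf
            k += 1
    return max(scores, key=scores.get)

rng = np.random.default_rng(4)
W = rng.standard_normal((len(LABELS) ** 2, 8))  # toy linear scorer
pred = classify_connection(rng.standard_normal(4), rng.standard_normal(4), W)
print(pred in ALLOWED)  # → True: the prior rules out impossible pairs
```

Classifying segment *pairs* rather than isolated segments is what lets the predetermined connections between artery categories enter the decision at all.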
PartDiff: Image Super-resolution with Partial Diffusion Models
results: Experiments show that, compared to plain diffusion-based super-resolution methods, the Partial Diffusion Model (PartDiff) significantly reduces the number of denoising steps without sacrificing generation quality.
Abstract
Denoising diffusion probabilistic models (DDPMs) have achieved impressive performance on various image generation tasks, including image super-resolution. By learning to reverse the process of gradually diffusing the data distribution into Gaussian noise, DDPMs generate new data by iteratively denoising from random noise. Despite their impressive performance, diffusion-based generative models suffer from high computational costs due to the large number of denoising steps. In this paper, we first observed that the intermediate latent states gradually converge and become indistinguishable when diffusing a pair of low- and high-resolution images. This observation inspired us to propose the Partial Diffusion Model (PartDiff), which diffuses the image to an intermediate latent state instead of pure random noise, where the intermediate latent state is approximated by the latent of diffusing the low-resolution image. During generation, Partial Diffusion Models start denoising from the intermediate distribution and perform only a part of the denoising steps. Additionally, to mitigate the error caused by the approximation, we introduce "latent alignment", which aligns the latent between low- and high-resolution images during training. Experiments on both magnetic resonance imaging (MRI) and natural images show that, compared to plain diffusion-based super-resolution methods, Partial Diffusion Models significantly reduce the number of denoising steps without sacrificing the quality of generation.
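The observation motivating PartDiff, that forward-diffused latents of a low/high-resolution pair converge as the step index grows, can be reproduced with a toy 1D signal. The linear beta schedule and the data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
betas = np.linspace(1e-4, 0.02, T)  # standard linear schedule (assumed)
abar = np.cumprod(1.0 - betas)      # cumulative product: alpha-bar_t

def q_sample(x0, t, noise):
    """Forward diffusion q(x_t | x_0)."""
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * noise

hr = rng.standard_normal(4096)             # stand-in for a HR image
lr = hr + 0.3 * rng.standard_normal(4096)  # stand-in for its upsampled LR version
noise = rng.standard_normal(4096)

def latent_gap(t):
    return np.abs(q_sample(hr, t, noise) - q_sample(lr, t, noise)).mean()

# the gap shrinks as t grows, so generation can start denoising from the
# LR latent at an intermediate step instead of from pure random noise
print(latent_gap(10) > latent_gap(500) > latent_gap(999))  # → True
```

Starting from an intermediate step skips the portion of the reverse chain where the LR and HR latents are already indistinguishable, which is exactly where the saved denoising steps come from.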
Conditional Temporal Attention Networks for Neonatal Cortical Surface Reconstruction
results: CoTAN reduces mesh self-intersection errors and requires only 0.21 seconds to deform an initial template mesh to the cortical white matter and pial surfaces. Compared to state-of-the-art baselines, CoTAN achieves a geometric error of only 0.12mm with 0.07% self-intersecting faces.
Abstract
Cortical surface reconstruction plays a fundamental role in modeling the rapid brain development during the perinatal period. In this work, we propose Conditional Temporal Attention Network (CoTAN), a fast end-to-end framework for diffeomorphic neonatal cortical surface reconstruction. CoTAN predicts multi-resolution stationary velocity fields (SVF) from neonatal brain magnetic resonance images (MRI). Instead of integrating multiple SVFs, CoTAN introduces attention mechanisms to learn a conditional time-varying velocity field (CTVF) by computing the weighted sum of all SVFs at each integration step. The importance of each SVF, which is estimated by learned attention maps, is conditioned on the age of the neonates and varies with the time step of integration. The proposed CTVF defines a diffeomorphic surface deformation, which reduces mesh self-intersection errors effectively. It only requires 0.21 seconds to deform an initial template mesh to cortical white matter and pial surfaces for each brain hemisphere. CoTAN is validated on the Developing Human Connectome Project (dHCP) dataset with 877 3D brain MR images acquired from preterm and term born neonates. Compared to state-of-the-art baselines, CoTAN achieves superior performance with only 0.12mm geometric error and 0.07% self-intersecting faces. The visualization of our attention maps illustrates that CoTAN indeed learns coarse-to-fine surface deformations automatically without intermediate supervision.
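The CTVF construction, a per-step attention-weighted sum of stationary velocity fields integrated forward, can be sketched with toy fields and fixed weights. In CoTAN the weights come from learned, age-conditioned attention maps; here they are hand-picked:

```python
import numpy as np

def integrate_ctvf(x0, svfs, attn, h=0.1):
    """Integrate a conditional time-varying velocity field: at each step
    the velocity is an attention-weighted sum of stationary velocity
    fields (SVFs)."""
    x = np.array(x0, dtype=float)
    for weights in attn:  # one weight row per integration step
        v = sum(w * f(x) for w, f in zip(weights, svfs))
        x = x + h * v     # forward Euler step
    return x

svfs = [lambda x: -x, lambda x: np.ones_like(x)]  # two toy SVFs
# early steps favor the first (coarse) field, later steps the second (fine)
attn = np.stack([np.linspace(1, 0, 10), np.linspace(0, 1, 10)], axis=1)
x = integrate_ctvf(np.zeros(3), svfs, attn)
print(x.shape)  # → (3,)
```

Because the mixture weights vary with the integration step, a single integration pass realizes a time-varying field while only ever predicting stationary ones, which is what makes the coarse-to-fine deformation emerge without intermediate supervision.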
Digital Modeling on Large Kernel Metamaterial Neural Network
results: Experimental results show that the proposed LMNN improves classification accuracy while reducing computational latency.
Abstract
Deep neural networks (DNNs) utilized recently are physically deployed with computational units (e.g., CPUs and GPUs). Such a design might lead to a heavy computational burden, significant latency, and intensive power consumption, which are critical limitations in applications such as the Internet of Things (IoT), edge computing, and the usage of drones. Recent advances in optical computational units (e.g., metamaterial) have shed light on energy-free and light-speed neural networks. However, the digital design of the metamaterial neural network (MNN) is fundamentally limited by its physical limitations, such as precision, noise, and bandwidth during fabrication. Moreover, the unique advantages of MNNs (e.g., light-speed computation) are not fully explored via standard 3x3 convolution kernels. In this paper, we propose a novel large kernel metamaterial neural network (LMNN) that maximizes the digital capacity of the state-of-the-art (SOTA) MNN with model re-parametrization and network compression, while also considering the optical limitation explicitly. The new digital learning scheme can maximize the learning capacity of MNN while modeling the physical restrictions of meta-optic. With the proposed LMNN, the computation cost of the convolutional front-end can be offloaded into fabricated optical hardware. The experimental results on two publicly available datasets demonstrate that the optimized hybrid design improved classification accuracy while reducing computational latency. The development of the proposed LMNN is a promising step towards the ultimate goal of energy-free and light-speed AI.
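One flavor of model re-parametrization can be illustrated with the standard trick of folding a parallel small-kernel branch into a single large kernel at inference time, so that the optics only need to realize one large kernel. This is a generic structural re-parametrization sketch, not the LMNN's exact scheme:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D correlation, for demonstration only."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(3)
k7 = rng.standard_normal((7, 7))
k3 = rng.standard_normal((3, 3))
# fold the parallel 3x3 branch into the 7x7 kernel: center-pad and sum
k_fused = k7 + np.pad(k3, 2)

x = rng.standard_normal((12, 12))
two_branch = conv2d_same(x, k7) + conv2d_same(x, k3)
one_branch = conv2d_same(x, k_fused)
print(np.allclose(two_branch, one_branch))  # → True
```

The fusion is exact because convolution is linear in the kernel, so a multi-branch network trained digitally can be collapsed into single large kernels for fabrication.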
Deep Learning Hyperspectral Pansharpening on large scale PRISMA dataset
paper_authors: Simone Zini, Mirko Paolo Barbato, Flavio Piccoli, Paolo Napoletano
for: This paper assesses several deep learning strategies for hyperspectral pansharpening.
methods: The authors adapt several existing state-of-the-art deep learning methods to hyperspectral data acquired by the ASI PRISMA satellite.
results: The study finds that data-driven neural network methods outperform machine-learning-free approaches under both the RR and FR protocols and adapt better to the hyperspectral pansharpening task.
Abstract
In this work, we assess several deep learning strategies for hyperspectral pansharpening. First, we present a new dataset with a greater extent than any other in the state of the art. This dataset, collected using the ASI PRISMA satellite, covers about 262200 km2, and its heterogeneity is granted by randomly sampling the Earth's soil. Second, we adapted several state of the art approaches based on deep learning to fit PRISMA hyperspectral data and then assessed, quantitatively and qualitatively, the performance in this new scenario. The investigation has included two settings: Reduced Resolution (RR) to evaluate the techniques in a supervised environment and Full Resolution (FR) for a real-world evaluation. The main purpose is the evaluation of the reconstruction fidelity of the considered methods. In both scenarios, for the sake of completeness, we also included machine-learning-free approaches. From this extensive analysis has emerged that data-driven neural network methods outperform machine-learning-free approaches and adapt better to the task of hyperspectral pansharpening, both in RR and FR protocols.
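The Reduced Resolution protocol (in the spirit of Wald's protocol) degrades both inputs by the sensor resolution ratio so that the original hyperspectral cube can serve as the ground-truth reference. A minimal sketch follows; plain block averaging stands in for MTF-matched filtering, and the shapes and ratio of 6 are illustrative:

```python
import numpy as np

def reduced_resolution_pair(hs, pan, ratio=6):
    """Build an RR training pair: degrade the hyperspectral cube and the
    panchromatic band by the sensor ratio, keeping the original cube as
    the supervised reference."""
    def downsample(img, r):
        H, W = img.shape[:2]
        H, W = H - H % r, W - W % r
        img = img[:H, :W]
        # block-average r x r windows (a crude stand-in for MTF filtering)
        return img.reshape(H // r, r, W // r, r, *img.shape[2:]).mean(axis=(1, 3))
    return downsample(hs, ratio), downsample(pan, ratio), hs

hs = np.random.rand(60, 60, 66)  # toy hyperspectral cube (bands last)
pan = np.random.rand(360, 360)   # co-registered panchromatic band (toy)
hs_lr, pan_lr, ref = reduced_resolution_pair(hs, pan)
print(hs_lr.shape, pan_lr.shape)  # → (10, 10, 66) (60, 60)
```

The FR protocol, by contrast, sharpens at the native resolution and must fall back on no-reference quality indices, which is why the paper evaluates both settings.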