paper_authors: Kunyu Wang, Xuanran He, Wenxuan Wang, Xiaosen Wang
for: Protecting deep learning models from attacks and enhancing their security.
methods: An input transformation based attack method, block shuffle and rotation (BSR).
results: BSR achieves significantly better transferability under both single-model and ensemble-model settings, and can be combined with existing input transformation methods for even higher transferability.

Abstract
Adversarial examples mislead deep neural networks with imperceptible perturbations and have brought significant threats to deep learning. An important aspect is their transferability, which refers to their ability to deceive other models, thus enabling attacks in the black-box setting. Though various methods have been proposed to boost transferability, the performance still falls short compared with white-box attacks. In this work, we observe that existing input transformation based attacks, one of the mainstream transfer-based attacks, result in different attention heatmaps on various models, which might limit the transferability. We also find that breaking the intrinsic relation of the image can disrupt the attention heatmap of the original image. Based on this finding, we propose a novel input transformation based attack called block shuffle and rotation (BSR). Specifically, BSR splits the input image into several blocks, then randomly shuffles and rotates these blocks to construct a set of new images for gradient calculation. Empirical evaluations on the ImageNet dataset demonstrate that BSR could achieve significantly better transferability than the existing input transformation based methods under single-model and ensemble-model settings. Combining BSR with the current input transformation method can further improve the transferability, which significantly outperforms the state-of-the-art methods.
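The BSR transform itself is easy to prototype. Below is a minimal NumPy sketch, assuming a square input, an evenly divisible block grid, and rotations by multiples of 90 degrees; the paper's exact rotation and padding scheme may differ. In the attack, the adversarial gradient is averaged over several such transformed copies of the input.

```python
import numpy as np

def block_shuffle_rotate(image, n_blocks=2, rng=None):
    """Split an HxWxC image into an n_blocks x n_blocks grid, shuffle the
    blocks, and rotate each by a random multiple of 90 degrees.
    Assumes a square image evenly divisible by n_blocks."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    bh, bw = h // n_blocks, w // n_blocks
    blocks = [image[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
              for i in range(n_blocks) for j in range(n_blocks)]
    order = rng.permutation(len(blocks))
    blocks = [np.rot90(blocks[idx], k=rng.integers(4)) for idx in order]
    rows = [np.concatenate(blocks[r*n_blocks:(r+1)*n_blocks], axis=1)
            for r in range(n_blocks)]
    return np.concatenate(rows, axis=0)

# Attack usage (schematic): average the loss gradient over several
# transformed copies of x before taking the sign step.
```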
Domain Reduction Strategy for Non Line of Sight Imaging
results: Across a variety of NLOS scenarios, including non-planar relay walls, sparse scanning patterns, confocal and non-confocal setups, and surface geometry reconstruction, experiments show that the proposed method is superior and efficient in general NLOS settings.

Abstract
This paper presents a novel optimization-based method for non-line-of-sight (NLOS) imaging that aims to reconstruct hidden scenes under various setups. Our method is built upon the observation that photons returning from each point in hidden volumes can be independently computed if the interactions between hidden surfaces are trivially ignored. We model the generalized light propagation function to accurately represent the transients as a linear combination of these functions. Moreover, our proposed method includes a domain reduction procedure to exclude empty areas of the hidden volumes from the set of propagation functions, thereby improving computational efficiency of the optimization. We demonstrate the effectiveness of the method in various NLOS scenarios, including non-planar relay wall, sparse scanning patterns, confocal and non-confocal, and surface geometry reconstruction. Experiments conducted on both synthetic and real-world data clearly support the superiority and the efficiency of the proposed method in general NLOS scenarios.
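The linear model lends itself to a compact sketch. The following is a hypothetical NumPy illustration of the domain-reduction idea, posing the transients as y ≈ Ax with one propagation-function column per hidden voxel and periodically discarding voxels with negligible energy; the function name, ridge solver, and pruning schedule are our assumptions, not the paper's actual optimizer.

```python
import numpy as np

def reconstruct_with_domain_reduction(A, y, n_rounds=3, keep_frac=0.5, lam=1e-2):
    """Solve y ~ A @ x (columns of A are per-voxel light propagation
    functions, y the measured transients) by ridge least squares,
    shrinking the active voxel set each round."""
    active = np.arange(A.shape[1])
    x = np.zeros(A.shape[1])
    for _ in range(n_rounds):
        Aa = A[:, active]
        xa = np.linalg.solve(Aa.T @ Aa + lam * np.eye(len(active)), Aa.T @ y)
        x[:] = 0.0
        x[active] = xa
        # Domain reduction: keep only the strongest voxels for the next round,
        # excluding empty areas of the hidden volume from the optimization.
        thresh = np.quantile(np.abs(xa), 1.0 - keep_frac)
        active = active[np.abs(xa) >= thresh]
    return x
```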
Crucial Feature Capture and Discrimination for Limited Training Data SAR ATR
paper_authors: Chenwei Wang, Siyi Luo, Jifang Pei, Yulin Huang, Yin Zhang, Jianyu Yang
for: This work aims to improve SAR ATR performance under limited training samples.
methods: A SAR ATR framework with two branches and two modules: a global assisted branch and a local enhanced branch, plus a feature capture module and a feature discrimination module. In each training pass, the global assisted branch first completes an initial recognition based on the whole image; the feature capture module then automatically searches for and locks onto the crucial image regions, which we call the "golden key" of the image; finally, the local enhanced branch further processes the captured local features.
results: Model soundness experiments and experimental results demonstrate the effectiveness of our method; comparisons on MSTAR and OPENSAR show that it achieves superior recognition performance.

Abstract
Although deep learning-based methods have achieved excellent performance on SAR ATR, the difficulty of acquiring and labeling large numbers of SAR images weakens these otherwise strong methods. This may be because most of them take the whole target image as input, yet research finds that, under limited training data, deep learning models fail to capture the discriminative regions of the whole image and instead focus on useless or even harmful regions, so the results are unsatisfactory. In this paper, we design a SAR ATR framework for limited training samples, which mainly consists of two branches and two modules: a global assisted branch and a local enhanced branch, a feature capture module and a feature discrimination module. In every training pass, the global assisted branch first completes an initial recognition based on the whole image. Based on the initial recognition results, the feature capture module automatically searches for and locks onto the image regions crucial for correct recognition, which we name the golden key of the image. Then the local enhanced branch extracts local features from the captured crucial image regions. Finally, the overall features and local features are fed into the classifier and dynamically weighted using learnable voting parameters to collaboratively complete the final recognition under limited training samples. Model soundness experiments demonstrate the effectiveness of our method through improved feature distributions and recognition probabilities. Experimental results and comparisons on MSTAR and OPENSAR show that our method achieves superior recognition performance.
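The final fusion step can be sketched compactly. Below is a minimal PyTorch illustration of combining global and local logits with learnable voting parameters; the class name and head structure are hypothetical, and the paper's branches are full networks rather than single linear layers.

```python
import torch
import torch.nn as nn

class VotingClassifier(nn.Module):
    """Fuse global and local features with learnable voting weights."""
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.global_head = nn.Linear(feat_dim, n_classes)
        self.local_head = nn.Linear(feat_dim, n_classes)
        self.vote = nn.Parameter(torch.zeros(2))  # learnable voting parameters

    def forward(self, global_feat, local_feat):
        w = torch.softmax(self.vote, dim=0)       # dynamic weighting of branches
        return w[0] * self.global_head(global_feat) + w[1] * self.local_head(local_feat)
```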
An Entropy-Awareness Meta-Learning Method for SAR Open-Set ATR
paper_authors: Chenwei Wang, Siyi Luo, Jifang Pei, Xiaoyu Liu, Yulin Huang, Yin Zhang, Jianyu Yang
for: This paper targets the open set recognition (OSR) problem in synthetic aperture radar automatic target recognition (SAR ATR), i.e., distinguishing unseen target classes.
methods: An entropy-awareness meta-learning method that, through meta-learning tasks, learns to construct a feature space for the dynamically assigned known classes, so as to simultaneously classify the known classes and reject unknown ones.
results: Experiments on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset show that the method performs well, simultaneously classifying the dynamically assigned known classes and rejecting unknown classes.

Abstract
Existing synthetic aperture radar automatic target recognition (SAR ATR) methods are effective at classifying seen target classes. However, it is more meaningful and challenging to distinguish unseen target classes, i.e., the open set recognition (OSR) problem, which is urgent for practical SAR ATR. The key to OSR is to effectively establish the exclusiveness of the feature distribution of the known classes. In this letter, we propose an entropy-awareness meta-learning method that improves this exclusiveness, making our method effective not only at classifying the seen classes but also at handling unseen classes it encounters. Through meta-learning tasks, the proposed method learns to construct a feature space for the dynamically assigned known classes; the tasks require this feature space to reject all classes not belonging to the known ones. At the same time, the proposed entropy-awareness loss helps the model strengthen the feature space with effective and robust discrimination between the known and unknown classes. Our method can therefore construct a dynamic feature space that discriminates between known and unknown classes, simultaneously classifying the dynamically assigned known classes and rejecting the unknown ones. Experiments on the moving and stationary target acquisition and recognition (MSTAR) dataset show the effectiveness of our method for SAR OSR.
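The loss can be read as pulling known-class predictions toward low entropy while pushing predictions on pseudo-unknown samples toward high entropy within each meta-task. The PyTorch sketch below encodes that reading; the exact weighting and margin are our assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def entropy(logits):
    p = F.softmax(logits, dim=1)
    return -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1)

def entropy_awareness_loss(known_logits, known_labels, unknown_logits, margin=1.0):
    """Sharpen predictions on the episode's known classes while keeping
    predictions on pseudo-unknown samples near-uniform (high entropy)."""
    ce = F.cross_entropy(known_logits, known_labels)
    h_known = entropy(known_logits).mean()      # drive toward low entropy
    h_unknown = entropy(unknown_logits).mean()  # drive toward high entropy
    return ce + h_known + F.relu(margin - h_unknown)
```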
SAR Ship Target Recognition via Selective Feature Discrimination and Multifeature Center Classifier
for: This work aims to improve the accuracy of SAR ship target recognition, especially when training resources are limited.
methods: A SAR ship target recognition method based on selective feature discrimination and a multifeature center classifier. Selective feature discrimination automatically finds the similar partial features between the most similar interclass image pairs and the dissimilar partial features within the most dissimilar inner-class image pairs, and provides a loss that gives these partial features more interclass separability. The multifeature center classifier assigns multiple learnable feature centers to each ship class to divide the large inner-class variance.
results: Experimental results on the OpenSARShip and FUSAR-Ship datasets show that our method achieves high recognition accuracy as the amount of training data decreases.

Abstract
Maritime surveillance is not only necessary for every country, as in maritime safeguarding and fishing control, but also plays an essential role internationally, as in rescue support and illegal immigration control. Most existing automatic target recognition (ATR) methods directly send the extracted whole features of SAR ships into one classifier, and most classifiers assign only one feature center to each class. However, SAR ship images exhibit large inner-class variance and small interclass difference, so the whole features contain useless partial features, and a single feature center per class fails under the large inner-class variance. We propose a SAR ship target recognition method via selective feature discrimination and a multifeature center classifier. The selective feature discrimination automatically finds the similar partial features in the most similar interclass image pairs and the dissimilar partial features in the most dissimilar inner-class image pairs. It then provides a loss that gives these partial features more interclass separability. Motivated by divide and conquer, the multifeature center classifier assigns multiple learnable feature centers to each ship class. In this way, the multifeature centers divide the large inner-class variance into several smaller variances, which are conquered by combining all feature centers of one ship class. Finally, the probability distribution over all feature centers is considered comprehensively to achieve accurate recognition of SAR ship images. Ablation experiments and results on the OpenSARShip and FUSAR-Ship datasets show that our method achieves superior recognition performance as the number of training SAR ship samples decreases.
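The classifier is straightforward to sketch. Below is a minimal PyTorch version that keeps K learnable centers per class and scores a sample by its distance to each class's nearest center; the scoring rule and initialization are our assumptions.

```python
import torch
import torch.nn as nn

class MultiFeatureCenterClassifier(nn.Module):
    """K learnable feature centers per class; each class's score is the
    negative distance to its nearest center, so several centers together
    cover one class's large inner-class variance."""
    def __init__(self, n_classes, n_centers, feat_dim):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_classes, n_centers, feat_dim))

    def forward(self, feats):                                 # feats: (B, D)
        d = torch.cdist(feats, self.centers.flatten(0, 1))    # (B, C*K)
        d = d.view(feats.size(0), *self.centers.shape[:2])    # (B, C, K)
        return -d.min(dim=2).values   # nearer center => higher class logit
```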
SAR Ship Target Recognition Via Multi-Scale Feature Attention and Adaptive-Weighted Classifier
results: Through experiments and comparisons on the OpenSARShip dataset, the method is validated to achieve state-of-the-art performance for SAR ship recognition.

Abstract
Maritime surveillance is indispensable in civilian fields, including national maritime safeguarding and channel monitoring, in which synthetic aperture radar (SAR) ship target recognition is a crucial research area. The core obstacle to accurate SAR ship target recognition is the large inner-class variance and inter-class overlap of SAR ship features, which limits recognition performance. Most existing methods plainly extract multi-scale features from the network and utilize each feature scale equally in the classification stage. However, the shallow multi-scale features are not discriminative enough, and the feature scales are not equally effective for recognition, which limits performance. We therefore propose a SAR ship recognition method via multi-scale feature attention and an adaptive-weighted classifier that enhances the features at each scale and adaptively chooses the effective feature scales for accurate recognition. We first construct an in-network feature pyramid to extract multi-scale features from SAR ship images. The multi-scale feature attention then extracts and enhances the principal components of the multi-scale features with more inner-class compactness and inter-class separability. Finally, the adaptive-weighted classifier chooses the effective feature scales in the feature pyramid to achieve the final precise recognition. Through experiments and comparisons on the OpenSARShip dataset, the proposed method is validated to achieve state-of-the-art performance for SAR ship recognition.
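The adaptive weighting can be illustrated with a small PyTorch sketch: each scale gets its own head, and input-dependent gates decide how much each scale's logits contribute. The gating structure here is a simplification of the paper's attention and classifier design.

```python
import torch
import torch.nn as nn

class AdaptiveWeightedClassifier(nn.Module):
    """Per-scale heads whose logits are combined with input-dependent
    weights, so the effective feature scale is chosen adaptively."""
    def __init__(self, scale_dims, n_classes):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d, n_classes) for d in scale_dims)
        self.gates = nn.ModuleList(nn.Linear(d, 1) for d in scale_dims)

    def forward(self, scale_feats):   # list of (B, D_s) tensors, one per scale
        logits = torch.stack([h(f) for h, f in zip(self.heads, scale_feats)], dim=1)
        gate = torch.cat([g(f) for g, f in zip(self.gates, scale_feats)], dim=1)
        w = torch.softmax(gate, dim=1).unsqueeze(-1)   # (B, n_scales, 1)
        return (w * logits).sum(dim=1)                 # weighted sum of logits
```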
SAR ATR Method with Limited Training Data via an Embedded Feature Augmenter and Dynamic Hierarchical-Feature Refiner
results: Experimental results show that the proposed method improves ATR performance with limited SAR training data, achieving outstanding performance on the MSTAR, OpenSARShip, and FUSAR-Ship benchmark datasets.

Abstract
Without sufficient data, the quantity of information available for supervised training is constrained, as obtaining sufficient synthetic aperture radar (SAR) training data in practice is frequently challenging. Therefore, current SAR automatic target recognition (ATR) algorithms perform poorly with limited training data availability, resulting in a critical need to increase SAR ATR performance. In this study, a new method to improve SAR ATR when training data are limited is proposed. First, an embedded feature augmenter is designed to enhance the extracted virtual features located far away from the class center. Based on the relative distribution of the features, the algorithm pulls the corresponding virtual features with different strengths toward the corresponding class center. The designed augmenter increases the amount of information available for supervised training and improves the separability of the extracted features. Second, a dynamic hierarchical-feature refiner is proposed to capture the discriminative local features of the samples. Through dynamically generated kernels, the proposed refiner integrates the discriminative local features of different dimensions into the global features, further enhancing the inner-class compactness and inter-class separability of the extracted features. The proposed method not only increases the amount of information available for supervised training but also extracts the discriminative features from the samples, resulting in superior ATR performance in problems with limited SAR training data. Experimental results on the moving and stationary target acquisition and recognition (MSTAR), OpenSARShip, and FUSAR-Ship benchmark datasets demonstrate the robustness and outstanding ATR performance of the proposed method in response to limited SAR training data.
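The augmenter's pull-toward-center behavior can be sketched in a few lines. The PyTorch snippet below pulls each feature toward its class center with a strength that grows with its distance from that center; the specific strength schedule is our assumption.

```python
import torch

def embedded_feature_augment(feats, labels, centers, alpha=0.5):
    """Generate 'virtual' training features by pulling each feature toward
    its class center; features farther from the center are pulled harder.
    feats: (B, D), labels: (B,), centers: (C, D)."""
    c = centers[labels]                             # (B, D) matching centers
    dist = (feats - c).norm(dim=1, keepdim=True)
    strength = alpha * dist / (dist.max() + 1e-8)   # relative-distance schedule
    return feats + strength * (c - feats)
```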
Blind Face Restoration for Under-Display Camera via Dictionary Guided Transformer
paper_authors: Jingfan Tan, Xiaoxu Chen, Tao Wang, Kaihao Zhang, Wenhan Luo, Xiaocun Cao
for: The paper aims to restore face images taken by under-display cameras (UDCs), which suffer significant quality degradation due to the characteristics of the display.
methods: A two-stage network, UDC-DMNet, synthesizes UDC images by modeling the processes of UDC imaging; a novel dictionary-guided transformer network, DGFormer, incorporates a facial component dictionary and the characteristics of UDC images to address blind face restoration in UDC scenarios.
results: DGFormer and UDC-DMNet achieve state-of-the-art performance in restoring face images taken by UDCs.

Abstract
By hiding the front-facing camera below the display panel, an Under-Display Camera (UDC) provides users with a full-screen experience. However, due to the characteristics of the display, images taken by a UDC suffer significant quality degradation. Methods have been proposed to tackle UDC image restoration, and advances have been achieved. However, there are still no specialized methods and datasets for restoring UDC face images, which may be the most common problem in the UDC scene. To this end, considering the color filtering, brightness attenuation, and diffraction in the UDC imaging process, we propose a two-stage UDC Degradation Model Network named UDC-DMNet that synthesizes UDC images by modeling the processes of UDC imaging. We then use UDC-DMNet and high-quality face images from FFHQ and CelebA-Test to create the UDC face training datasets FFHQ-P/T and testing datasets CelebA-Test-P/T for UDC face restoration. We further propose a novel dictionary-guided transformer network named DGFormer. Introducing the facial component dictionary and the characteristics of UDC images into the restoration makes DGFormer capable of blind face restoration in UDC scenarios. Experiments show that our DGFormer and UDC-DMNet achieve state-of-the-art performance.
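The three degradation factors can be mocked up directly. The NumPy sketch below applies per-channel color filtering, global brightness attenuation, and diffraction as convolution with a display PSF; it is illustrative only, since UDC-DMNet learns these stages in trainable sub-networks rather than fixing them, and the gain values here are arbitrary.

```python
import numpy as np
from scipy.signal import fftconvolve

def udc_degrade(img, psf, brightness=0.6, color_gain=(0.9, 1.0, 0.8)):
    """Toy UDC degradation: color filtering, brightness attenuation, and
    diffraction blur. img: HxWx3 float in [0, 1]; psf: 2-D kernel summing to 1."""
    out = img * np.asarray(color_gain) * brightness   # color filter + attenuation
    for ch in range(3):                               # diffraction as PSF blur
        out[..., ch] = fftconvolve(out[..., ch], psf, mode="same")
    return np.clip(out, 0.0, 1.0)
```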
WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning
results: Experiments on various challenging benchmarks demonstrate the remarkable superiority of the proposed method, which outperforms existing state-of-the-art methods by a large margin.

Abstract
Watermarking serves as a widely adopted approach to safeguard media copyright. In parallel, the research focus has extended to watermark removal techniques, offering an adversarial means to enhance watermark robustness and foster advancements in the watermarking field. Existing watermark removal methods mainly rely on UNet with task-specific decoder branches--one for watermark localization and the other for background image restoration. However, watermark localization and background restoration are not isolated tasks; precise watermark localization inherently implies regions necessitating restoration, and the background restoration process contributes to more accurate watermark localization. To holistically integrate information from both branches, we introduce an implicit joint learning paradigm. This empowers the network to autonomously navigate the flow of information between implicit branches through a gate mechanism. Furthermore, we employ cross-channel attention to facilitate local detail restoration and holistic structural comprehension, while harnessing nested structures to integrate multi-scale information. Extensive experiments are conducted on various challenging benchmarks to validate the effectiveness of our proposed method. The results demonstrate our approach's remarkable superiority, surpassing existing state-of-the-art methods by a large margin.
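The gate mechanism can be illustrated minimally: a learned sigmoid gate decides how much localization information flows into the restoration features at each spatial position. The PyTorch sketch below shows the idea; the real WMFormer++ blocks are nested transformers with cross-channel attention, so this is only a schematic.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Gate that routes information between the implicit watermark
    localization branch and the background restoration branch."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, loc_feat, res_feat):     # both (B, C, H, W)
        g = self.gate(torch.cat([loc_feat, res_feat], dim=1))
        # Restoration features attend to regions the gate marks as watermark.
        return res_feat + g * loc_feat
```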
EDDense-Net: Fully Dense Encoder Decoder Network for Joint Segmentation of Optic Cup and Disc
results: Evaluated on two publicly available datasets and compared with existing state-of-the-art methods, the proposed approach is superior in both accuracy and efficiency, and can serve as a second-opinion system to assist medical ophthalmologists in the diagnosis and analysis of glaucoma.

Abstract
Glaucoma is an eye disease that damages the optic nerve and can lead to visual loss and permanent blindness, so early glaucoma detection is critical. The cup-to-disc ratio (CDR), estimated during examination of the optic disc (OD), is used for the diagnosis of glaucoma. In this paper, we present the EDDense-Net segmentation network for the joint segmentation of the optic cup (OC) and the OD. The encoder and decoder of this network are made up of dense blocks, each with a grouped convolutional layer, allowing the network to acquire and convey spatial information from the image while reducing the network's complexity. To reduce spatial information loss, the optimal number of filters was used in all convolutional layers. In the decoder, Dice pixel classification is employed to alleviate the class imbalance problem in the semantic segmentation. The proposed network was evaluated on two publicly available datasets, where it outperformed existing state-of-the-art methods in terms of accuracy and efficiency. For the diagnosis and analysis of glaucoma, this method can be used as a second-opinion system to assist medical ophthalmologists.
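The Dice-based pixel classification used in the decoder corresponds to the standard soft Dice loss, sketched below in PyTorch; formulation details such as the smoothing constant are our choices.

```python
import torch

def dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for cup/disc segmentation; less sensitive to class
    imbalance than plain cross-entropy. probs, target: (B, C, H, W),
    probs in [0, 1], target one-hot."""
    dims = (0, 2, 3)
    inter = (probs * target).sum(dims)
    union = probs.sum(dims) + target.sum(dims)
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()
```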
Contrastive Diffusion Model with Auxiliary Guidance for Coarse-to-Fine PET Reconstruction
paper_authors: Zeyu Han, Yuhan Wang, Luping Zhou, Peng Wang, Binyu Yan, Jiliu Zhou, Yan Wang, Dinggang Shen
for: This paper aims to reconstruct high-quality standard-dose PET images from low-dose PET scans, reducing radiation exposure to the human body.
methods: A coarse-to-fine framework consisting of a coarse prediction module (CPM) and an iterative refinement module (IRM): the CPM generates a coarse PET image via a deterministic process, while the IRM samples the residual iteratively. Two additional strategies are proposed and integrated into the reconstruction process to enhance the correspondence between the low-dose PET (LPET) image and the reconstructed PET (RPET) image.
results: Extensive experiments on two human brain PET datasets demonstrate that the proposed method outperforms state-of-the-art PET reconstruction methods in terms of clinical reliability.

Abstract
To obtain high-quality positron emission tomography (PET) scans while reducing radiation exposure to the human body, various approaches have been proposed to reconstruct standard-dose PET (SPET) images from low-dose PET (LPET) images. One widely adopted technique is the generative adversarial networks (GANs), yet recently, diffusion probabilistic models (DPMs) have emerged as a compelling alternative due to their improved sample quality and higher log-likelihood scores compared to GANs. Despite this, DPMs suffer from two major drawbacks in real clinical settings, i.e., the computationally expensive sampling process and the insufficient preservation of correspondence between the conditioning LPET image and the reconstructed PET (RPET) image. To address the above limitations, this paper presents a coarse-to-fine PET reconstruction framework that consists of a coarse prediction module (CPM) and an iterative refinement module (IRM). The CPM generates a coarse PET image via a deterministic process, and the IRM samples the residual iteratively. By delegating most of the computational overhead to the CPM, the overall sampling speed of our method can be significantly improved. Furthermore, two additional strategies, i.e., an auxiliary guidance strategy and a contrastive diffusion strategy, are proposed and integrated into the reconstruction process, which can enhance the correspondence between the LPET image and the RPET image, further improving clinical reliability. Extensive experiments on two human brain PET datasets demonstrate that our method outperforms the state-of-the-art PET reconstruction methods. The source code is available at \url{https://github.com/Show-han/PET-Reconstruction}.
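At inference, the division of labor looks roughly like the sketch below: one deterministic CPM pass, then a short IRM loop over the residual. This is schematic PyTorch; the real IRM follows a diffusion sampling schedule with noise terms, and the module interfaces are our assumptions.

```python
import torch

@torch.no_grad()
def coarse_to_fine_reconstruct(cpm, irm, lpet, n_steps=25):
    """Coarse-to-fine sampling: the CPM predicts a coarse PET image in one
    cheap deterministic pass; the IRM then iteratively refines the residual,
    conditioned on the LPET image and the coarse estimate."""
    coarse = cpm(lpet)                      # most compute delegated here
    residual = torch.zeros_like(coarse)
    for t in reversed(range(n_steps)):      # iterative residual refinement
        residual = irm(residual, coarse, lpet, t)
    return coarse + residual                # reconstructed PET (RPET)
```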
Federated Pseudo Modality Generation for Incomplete Multi-Modal MRI Reconstruction
paper_authors: Yunlu Yan, Chun-Mei Feng, Yuexiang Li, Rick Siow Mong Goh, Lei Zhu
for: addresses the missing modality challenge in federated multi-modal MRI reconstruction
methods: utilizes a pseudo modality generation mechanism to recover the missing modality, and introduces a clustering scheme to reduce communication costs
results: can effectively complete the missing modality within an acceptable communication cost, with performance similar to the ideal scenario

Abstract
While multi-modal learning has been widely used for MRI reconstruction, it relies on paired multi-modal data which is difficult to acquire in real clinical scenarios. Especially in the federated setting, the common situation is that several medical institutions only have single-modal data, termed the modality missing issue. Therefore, it is infeasible to deploy a standard federated learning framework in such conditions. In this paper, we propose a novel communication-efficient federated learning framework, namely Fed-PMG, to address the missing modality challenge in federated multi-modal MRI reconstruction. Specifically, we utilize a pseudo modality generation mechanism to recover the missing modality for each single-modal client by sharing the distribution information of the amplitude spectrum in frequency space. However, the step of sharing the original amplitude spectrum leads to heavy communication costs. To reduce the communication cost, we introduce a clustering scheme to project the set of amplitude spectrum into finite cluster centroids, and share them among the clients. With such an elaborate design, our approach can effectively complete the missing modality within an acceptable communication cost. Extensive experiments demonstrate that our proposed method can attain similar performance with the ideal scenario, i.e., all clients have the full set of modalities. The source code will be released.
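The frequency-space exchange can be sketched with NumPy and SciPy: each client clusters its amplitude spectra into a few centroids, shares only those, and synthesizes a pseudo modality by recombining a shared amplitude with its own phase. Whether Fed-PMG keeps the client's own phase exactly this way is our assumption; the clustering and FFT steps follow the abstract.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def amplitude_centroids(images, k=16):
    """Client side: cluster amplitude spectra into k centroids, which are
    shared instead of raw spectra to cut communication cost.
    Assumes at least k single-channel images of equal shape."""
    amps = np.stack([np.abs(np.fft.fft2(img)) for img in images])
    flat = amps.reshape(len(images), -1)
    centroids, _ = kmeans2(flat, k, minit="points")
    return centroids.reshape(k, *images[0].shape)

def pseudo_modality(own_image, shared_amp):
    """Combine a shared amplitude centroid (the missing modality's
    distribution information) with the client's own phase."""
    phase = np.angle(np.fft.fft2(own_image))
    return np.real(np.fft.ifft2(shared_amp * np.exp(1j * phase)))
```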
Polymerized Feature-based Domain Adaptation for Cervical Cancer Dose Map Prediction
results: Experimental results show that the proposed method outperforms existing methods and better predicts dose maps for cervical cancer.

Abstract
Recently, deep learning (DL) has significantly automated and accelerated clinical radiation therapy (RT) planning by predicting accurate dose maps. However, most DL-based dose map prediction methods are data-driven and not applicable to cervical cancer, for which only a small amount of data is available. To address this problem, this paper proposes to transfer the rich knowledge learned from another cancer, rectum cancer, which shares the same scanning area and has more clinically available data, to improve dose map prediction for cervical cancer through domain adaptation. To close the inherent domain gap between the source (rectum cancer) and target (cervical cancer) domains, we develop an effective Transformer-based polymerized feature module (PFM), which generates an optimal polymerized feature distribution to smoothly align the two input distributions. Experimental results on two in-house clinical datasets demonstrate the superiority of the proposed method over state-of-the-art methods.
Sensitivity analysis of AI-based algorithms for autonomous driving on optical wavefront aberrations induced by the windshield
results: The paper studies the domain shift problem by evaluating the sensitivity of two perception models to different windshield configurations. The results show that windshields introduce a performance gap, and the existing optical merit functions may not be sufficient for posing requirements.

Abstract
Autonomous driving perception techniques are typically based on supervised machine learning models trained on real-world street data. A typical training process involves capturing images with a single car model and windshield configuration. However, deploying these trained models on different car types can lead to a domain shift, which can hurt the neural networks' performance and violate working ADAS requirements. To address this issue, this paper investigates the domain shift problem further by evaluating the sensitivity of two perception models to different windshield configurations. This is done by evaluating the dependencies between neural network benchmark metrics and optical merit functions, applying a Fourier-optics-based threat model. Our results show that windshields introduce a performance gap and that the existing optical metrics used for posing requirements might not be sufficient.
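The Fourier-optics threat model reduces, in its simplest form, to turning a windshield-induced wavefront error into a point spread function, PSF = |FFT(pupil * exp(i 2 pi W))|^2, and blurring images with it. The NumPy sketch below is an illustrative toy under that assumption, not the paper's exact pipeline.

```python
import numpy as np

def psf_from_wavefront(wavefront_waves, pupil_radius_px=64, grid=256):
    """Fourier-optics toy model: PSF from a windshield-induced wavefront
    aberration. wavefront_waves: (grid, grid) array, aberration in waves
    over the pupil; returns a normalized PSF."""
    y, x = np.mgrid[-grid//2:grid//2, -grid//2:grid//2]
    pupil = (x**2 + y**2 <= pupil_radius_px**2).astype(float)
    field = pupil * np.exp(2j * np.pi * wavefront_waves)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(field))))**2
    return psf / psf.sum()

# Threat-model usage (schematic): convolve clean frames with this PSF and
# measure how benchmark metrics track the optical merit functions.
```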