eess.IV - 2023-07-03

Cross-modality Attention Adapter: A Glioma Segmentation Fine-tuning Method for SAM Using Multimodal Brain MR Images

  • paper_url: http://arxiv.org/abs/2307.01124
  • repo_url: None
  • paper_authors: Xiaoyu Shi, Shurong Chai, Yinhao Li, Jingliang Cheng, Jie Bai, Guohua Zhao, Yen-Wei Chen
  • for: Glioma segmentation as an important basis for diagnosis and genotype prediction.
  • methods: Fine-tunes a foundation model (SAM) using multimodal fusion and a cross-modality attention adapter to improve glioma segmentation accuracy.
  • results: On a private glioma dataset, the proposed method reaches a Dice of 88.38% and a Hausdorff distance of 10.64, a 4% Dice improvement over state-of-the-art methods, supporting better glioma treatment.
    Abstract According to the 2021 World Health Organization (WHO) Classification scheme for gliomas, glioma segmentation is a very important basis for diagnosis and genotype prediction. In general, 3D multimodal brain MRI is an effective diagnostic tool. In the past decade, there has been an increase in the use of machine learning, particularly deep learning, for medical image processing. Thanks to the development of foundation models, models pre-trained with large-scale datasets have achieved better results on a variety of tasks. However, for medical images with small dataset sizes, deep learning methods struggle to achieve better results on real-world image datasets. In this paper, we propose a cross-modality attention adapter based on multimodal fusion to fine-tune the foundation model to accomplish the task of glioma segmentation in multimodal MRI brain images with better results. The effectiveness of the proposed method is validated via our private glioma data set from the First Affiliated Hospital of Zhengzhou University (FHZU) in Zhengzhou, China. Our proposed method is superior to current state-of-the-art methods with a Dice of 88.38% and Hausdorff distance of 10.64, thereby exhibiting a 4% increase in Dice to segment the glioma region for glioma treatment.
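    A minimal PyTorch sketch of what a cross-modality attention adapter of this kind might look like is shown below; the class name, dimensions, and its placement inside SAM's image encoder are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical cross-modality attention adapter (a sketch, not the paper's code).
import torch
import torch.nn as nn

class CrossModalityAttentionAdapter(nn.Module):
    """Lets tokens from one MR sequence attend to tokens pooled from the other sequences."""
    def __init__(self, dim: int = 256, bottleneck: int = 64, num_heads: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)   # bottleneck keeps the adapter lightweight
        self.attn = nn.MultiheadAttention(bottleneck, num_heads=num_heads, batch_first=True)
        self.up = nn.Linear(bottleneck, dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, primary_tokens: torch.Tensor, other_tokens: torch.Tensor) -> torch.Tensor:
        # primary_tokens: (B, N, dim) from one modality; other_tokens: (B, M, dim) from the rest
        q = self.down(primary_tokens)
        kv = self.down(other_tokens)
        fused, _ = self.attn(q, kv, kv)                      # cross-modality attention
        return self.norm(primary_tokens + self.up(fused))    # residual adapter update
```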

HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection

  • paper_url: http://arxiv.org/abs/2307.00954
  • repo_url: None
  • paper_authors: Kang Yi, Jing Xu, Xiao Jin, Fu Guo, Yan-Feng Wu
  • for: Improving the accuracy and efficiency of RGB-D salient object detection (SOD) by better handling the discrepant characteristics of RGB images and depth maps.
  • methods: Proposes a high-order discrepant interaction network (HODINet) with transformer-based and CNN-based backbones, fusing cross-modality features at different stages via high-order spatial and channel attention modules.
  • results: Extensive experiments on seven widely used datasets show competitive performance against 24 state-of-the-art methods under four evaluation metrics.
    Abstract RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information. Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features. However, these features contribute differently to the final saliency results, which raises two issues: 1) how to model discrepant characteristics of RGB images and depth maps; 2) how to fuse these cross-modality features in different stages. In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD. Concretely, we first employ transformer-based and CNN-based architectures as backbones to encode RGB and depth features, respectively. Then, the high-order representations are delicately extracted and embedded into spatial and channel attentions for cross-modality feature fusion in different stages. Specifically, we design a high-order spatial fusion (HOSF) module and a high-order channel fusion (HOCF) module to fuse features of the first two and the last two stages, respectively. Besides, a cascaded pyramid reconstruction network is adopted to progressively decode the fused features in a top-down pathway. Extensive experiments are conducted on seven widely used datasets to demonstrate the effectiveness of the proposed approach. We achieve competitive performance against 24 state-of-the-art methods under four evaluation metrics.
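    To make the "high-order" fusion idea concrete, here is a hedged sketch in which second-order channel statistics drive a channel-attention fusion step; the actual HOCF design in the paper differs, and all layer names and sizes here are assumptions.

```python
# Hypothetical high-order channel fusion step (second-order statistics -> channel attention).
import torch
import torch.nn as nn

class HighOrderChannelFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // 4), nn.ReLU(),
                                 nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (B, C, H, W)
        b, c, h, w = rgb_feat.shape
        x = (rgb_feat + depth_feat).flatten(2)               # (B, C, HW)
        gram = torch.bmm(x, x.transpose(1, 2)) / (h * w)     # (B, C, C) second-order statistics
        weights = self.mlp(gram.mean(dim=2)).view(b, c, 1, 1)  # channel attention from the high-order cue
        return rgb_feat * weights + depth_feat * (1 - weights)  # discrepancy-aware fusion
```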

An open-source deep learning algorithm for efficient and fully-automatic analysis of the choroid in optical coherence tomography

  • paper_url: http://arxiv.org/abs/2307.00904
  • repo_url: None
  • paper_authors: Jamie Burke, Justin Engelmann, Charlene Hamid, Megan Reid-Schachter, Tom Pearson, Dan Pugh, Neeraj Dhaun, Stuart King, Tom MacGillivray, Miguel O. Bernabeu, Amos Storkey, Ian J. C. MacCormick
  • for: Developing a fully-automatic, open-source algorithm (DeepGPET) for choroid region segmentation in optical coherence tomography (OCT) data.
  • methods: Fine-tunes a UNet with a MobileNetV3 backbone pre-trained on ImageNet, using 715 OCT B-scans (82 subjects, 115 eyes) from 3 clinical studies.
  • results: DeepGPET agrees closely with the clinically validated semi-automatic method GPET (AUC=0.9994, Dice=0.9664; Pearson correlations of 0.8908 for choroidal thickness and 0.9082 for choroidal area) while cutting the mean per-image processing time from 34.49s (±15.09) to 1.25s (±0.10). A clinical ophthalmologist judged both methods to perform similarly, and no manual intervention is required.
    Abstract Purpose: To develop an open-source, fully-automatic deep learning algorithm, DeepGPET, for choroid region segmentation in optical coherence tomography (OCT) data. Methods: We used a dataset of 715 OCT B-scans (82 subjects, 115 eyes) from 3 clinical studies related to systemic disease. Ground truth segmentations were generated using a clinically validated, semi-automatic choroid segmentation method, Gaussian Process Edge Tracing (GPET). We finetuned a UNet with MobileNetV3 backbone pre-trained on ImageNet. Standard segmentation agreement metrics, as well as derived measures of choroidal thickness and area, were used to evaluate DeepGPET, alongside qualitative evaluation from a clinical ophthalmologist. Results: DeepGPET achieves excellent agreement with GPET on data from 3 clinical studies (AUC=0.9994, Dice=0.9664; Pearson correlation of 0.8908 for choroidal thickness and 0.9082 for choroidal area), while reducing the mean processing time per image on a standard laptop CPU from 34.49s ($\pm$15.09) using GPET to 1.25s ($\pm$0.10) using DeepGPET. Both methods performed similarly according to a clinical ophthalmologist, who qualitatively judged a subset of segmentations by GPET and DeepGPET, based on smoothness and accuracy of segmentations. Conclusions: DeepGPET, a fully-automatic, open-source algorithm for choroidal segmentation, will enable researchers to efficiently extract choroidal measurements, even for large datasets. As no manual interventions are required, DeepGPET is less subjective than semi-automatic methods and could be deployed in clinical practice without necessitating a trained operator. DeepGPET addresses the lack of open-source, fully-automatic and clinically relevant choroid segmentation algorithms, and its subsequent public release will facilitate future choroidal research both in ophthalmology and wider systemic health.
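    The fine-tuning setup described above could be reproduced roughly as in the following sketch, assuming the segmentation_models_pytorch package; the encoder identifier, loss, and training-loop details are assumptions, not the released DeepGPET code.

```python
# Minimal fine-tuning sketch under the stated assumptions (not the DeepGPET release).
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(
    encoder_name="timm-mobilenetv3_large_100",  # MobileNetV3 backbone (assumed identifier)
    encoder_weights="imagenet",                 # ImageNet pre-training, as described in the abstract
    in_channels=1,                              # single-channel OCT B-scans
    classes=1,                                  # binary choroid mask
)
loss_fn = smp.losses.DiceLoss(mode="binary")
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, masks: torch.Tensor) -> float:
    """images: (B, 1, H, W) OCT B-scans; masks: (B, 1, H, W) GPET-derived ground truth."""
    optimizer.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```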

Synthesis of Contrast-Enhanced Breast MRI Using Multi-b-Value DWI-based Hierarchical Fusion Network with Attention Mechanism

  • paper_url: http://arxiv.org/abs/2307.00895
  • repo_url: None
  • paper_authors: Tianyu Zhang, Luyi Han, Anna D’Angelo, Xin Wang, Yuan Gao, Chunyao Lu, Jonas Teuwen, Regina Beets-Tan, Tao Tan, Ritse Mann
  • for: Developing a multi-sequence fusion network that synthesizes contrast-enhanced MRI (CE-MRI) from T1-weighted MRI and diffusion-weighted imaging (DWI), to potentially reduce or avoid the use of gadolinium-based contrast agents (GBCA).
  • methods: Fuses T1-weighted MRI with DWIs acquired at different b-values, using a multi-sequence attention module to obtain refined feature maps and a weighted difference module to leverage hierarchical representations fused at different scales.
  • results: The multi-b-value DWI-based fusion model can potentially synthesize CE-MRI, theoretically reducing or avoiding the use of GBCA and minimizing the burden on patients.
    Abstract Magnetic resonance imaging (MRI) is the most sensitive technique for breast cancer detection among current clinical imaging modalities. Contrast-enhanced MRI (CE-MRI) provides superior differentiation between tumors and invaded healthy tissue, and has become an indispensable technique in the detection and evaluation of cancer. However, the use of gadolinium-based contrast agents (GBCA) to obtain CE-MRI may be associated with nephrogenic systemic fibrosis and may lead to bioaccumulation in the brain, posing a potential risk to human health. Moreover, and likely more important, the use of gadolinium-based contrast agents requires the cannulation of a vein, and the injection of the contrast media which is cumbersome and places a burden on the patient. To reduce the use of contrast agents, diffusion-weighted imaging (DWI) is emerging as a key imaging technique, although currently usually complementing breast CE-MRI. In this study, we develop a multi-sequence fusion network to synthesize CE-MRI based on T1-weighted MRI and DWIs. DWIs with different b-values are fused to efficiently utilize the difference features of DWIs. Rather than proposing a pure data-driven approach, we invent a multi-sequence attention module to obtain refined feature maps, and leverage hierarchical representation information fused at different scales while utilizing the contributions from different sequences from a model-driven approach by introducing the weighted difference module. The results show that the multi-b-value DWI-based fusion model can potentially be used to synthesize CE-MRI, thus theoretically reducing or avoiding the use of GBCA, thereby minimizing the burden to patients. Our code is available at \url{https://github.com/Netherlands-Cancer-Institute/CE-MRI}.
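    A hedged sketch of a weighted difference module over multi-b-value DWI features follows; the weighting scheme and tensor shapes are assumptions rather than the released implementation (the authors' code is at https://github.com/Netherlands-Cancer-Institute/CE-MRI).

```python
# Hypothetical weighted difference module for multi-b-value DWI feature maps.
import torch
import torch.nn as nn

class WeightedDifferenceModule(nn.Module):
    """Combines feature maps of DWIs at increasing b-values via learned difference weights."""
    def __init__(self, num_bvalues: int, channels: int):
        super().__init__()
        # one learnable scalar weight per adjacent b-value pair
        self.weights = nn.Parameter(torch.ones(num_bvalues - 1))
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, dwi_feats):
        # dwi_feats: list of (B, C, H, W) tensors, ordered by b-value
        diffs = [dwi_feats[i + 1] - dwi_feats[i] for i in range(len(dwi_feats) - 1)]
        w = torch.softmax(self.weights, dim=0)
        fused = sum(w[i] * d for i, d in enumerate(diffs))
        return self.proj(fused) + dwi_feats[-1]  # difference cue added to the highest-b-value features
```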

An Explainable Deep Framework: Towards Task-Specific Fusion for Multi-to-One MRI Synthesis

  • paper_url: http://arxiv.org/abs/2307.00885
  • repo_url: https://github.com/fiy2w/mri_seq2seq
  • paper_authors: Luyi Han, Tianyu Zhang, Yunzhi Huang, Haoran Dou, Xin Wang, Yuan Gao, Chunyao Lu, Tan Tao, Ritse Mann
  • for: Proposing an explainable, task-specific synthesis network that, when some MRI sequences are missing, synthesizes them from the available sequences.
  • methods: Combines deep learning-based synthesis with explainable task-specific modules: a trainable task-specific weighted average module that visualizes each input sequence's contribution during fusion, and a task-specific attention module that highlights the regions the network refines during synthesis.
  • results: On the BraTS2021 dataset of 1251 subjects, the method outperforms previous approaches on arbitrary missing-sequence synthesis.
    Abstract Multi-sequence MRI is valuable in clinical settings for reliable diagnosis and treatment prognosis, but some sequences may be unusable or missing for various reasons. To address this issue, MRI synthesis is a potential solution. Recent deep learning-based methods have achieved good performance in combining multiple available sequences for missing sequence synthesis. Despite their success, these methods lack the ability to quantify the contributions of different input sequences and estimate the quality of generated images, making it hard to be practical. Hence, we propose an explainable task-specific synthesis network, which adapts weights automatically for specific sequence generation tasks and provides interpretability and reliability from two sides: (1) visualize the contribution of each input sequence in the fusion stage by a trainable task-specific weighted average module; (2) highlight the area the network tried to refine during synthesizing by a task-specific attention module. We conduct experiments on the BraTS2021 dataset of 1251 subjects, and results on arbitrary sequence synthesis indicate that the proposed method achieves better performance than the state-of-the-art methods. Our code is available at \url{https://github.com/fiy2W/mri_seq2seq}.
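    The sketch below illustrates the idea of a trainable task-specific weighted average over input sequences; names and shapes are assumptions (the authors' code is at https://github.com/fiy2W/mri_seq2seq).

```python
# Hypothetical task-specific weighted average fusion (a sketch, not the repository code).
import torch
import torch.nn as nn

class TaskSpecificWeightedAverage(nn.Module):
    """One learnable weight per (input sequence, target task); softmax makes contributions interpretable."""
    def __init__(self, num_sequences: int, num_tasks: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_tasks, num_sequences))

    def forward(self, features: torch.Tensor, task_id: int):
        # features: (B, num_sequences, C, H, W) encoder outputs of the available sequences
        w = torch.softmax(self.logits[task_id], dim=0)       # (num_sequences,)
        fused = torch.einsum("s,bschw->bchw", w, features)   # weighted average across sequences
        return fused, w                                       # w can be reported as each sequence's contribution
```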

End-To-End Prediction of Knee Osteoarthritis Progression With Multi-Modal Transformers

  • paper_url: http://arxiv.org/abs/2307.00873
  • repo_url: None
  • paper_authors: Egor Panfilov, Simo Saarakkala, Miika T. Nieminen, Aleksei Tiulpin
  • for: The paper aims to develop a unified framework for multi-modal fusion of knee imaging data to predict the progression of knee osteoarthritis (KOA) and provide new tools for the design of more efficient clinical trials.
  • methods: The authors use a Transformer approach to fuse structural knee MRI, multi-modal imaging data, and clinical data to predict KOA progression. They analyze the performance of their framework across different progression horizons and investigate the effectiveness of different modalities and subject subgroups.
  • results: The authors report that structural knee MRI identifies radiographic KOA progressors on par with multi-modal fusion, with an area under the ROC curve (ROC AUC) of 0.70-0.76 and Average Precision (AP) of 0.15-0.54 over 2-8 year horizons. Progression within 1 year is better predicted by multi-modal fusion of X-ray, structural, and compositional MR images (ROC AUC of 0.76(0.04), AP of 0.13(0.04)) or by clinical data. They also find that prediction from imaging data is most accurate for post-traumatic subjects.
    Abstract Knee Osteoarthritis (KOA) is a highly prevalent chronic musculoskeletal condition with no currently available treatment. The manifestation of KOA is heterogeneous and prediction of its progression is challenging. Current literature suggests that the use of multi-modal data and advanced modeling methods, such as the ones based on Deep Learning, has promise in tackling this challenge. To date, however, the evidence on the efficacy of this approach is limited. In this study, we leveraged recent advances in Deep Learning and, using a Transformer approach, developed a unified framework for the multi-modal fusion of knee imaging data. Subsequently, we analyzed its performance across a range of scenarios by investigating multiple progression horizons -- from short-term to long-term. We report our findings using a large cohort (n=2421-3967) derived from the Osteoarthritis Initiative dataset. We show that structural knee MRI allows identifying radiographic KOA progressors on par with multi-modal fusion approaches, achieving an area under the ROC curve (ROC AUC) of 0.70-0.76 and Average Precision (AP) of 0.15-0.54 in 2-8 year horizons. Progression within 1 year was better predicted with a multi-modal method using X-ray, structural, and compositional MR images -- ROC AUC of 0.76(0.04), AP of 0.13(0.04) -- or via clinical data. Our follow-up analysis generally shows that prediction from the imaging data is more accurate for post-traumatic subjects, and we further investigate which subject subgroups may benefit the most. The present study provides novel insights into multi-modal imaging of KOA and brings a unified data-driven framework for studying its progression in an end-to-end manner, providing new tools for the design of more efficient clinical trials. The source code of our framework and the pre-trained models are made publicly available.
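    A hedged sketch of Transformer-based multi-modal fusion for progression prediction is given below; the token layout, dimensions, and single-logit head are assumptions, not the authors' released framework.

```python
# Hypothetical multi-modal Transformer fusion for KOA progression prediction.
import torch
import torch.nn as nn

class MultiModalProgressionTransformer(nn.Module):
    def __init__(self, dim: int = 256, num_modalities: int = 3, num_layers: int = 4):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.modality_embed = nn.Parameter(torch.zeros(1, num_modalities, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, 1)  # logit for radiographic progression

    def forward(self, modality_feats: torch.Tensor) -> torch.Tensor:
        # modality_feats: (B, num_modalities, dim), e.g. pooled X-ray / structural MRI / compositional MRI features
        b = modality_feats.size(0)
        tokens = torch.cat([self.cls.expand(b, -1, -1),
                            modality_feats + self.modality_embed], dim=1)
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])  # CLS token summarizes the fused modalities
```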

Anisotropic Fanning Aware Low-Rank Tensor Approximation Based Tractography

  • paper_url: http://arxiv.org/abs/2307.00833
  • repo_url: None
  • paper_authors: Johannes Grün, Jonah Sieg, Thomas Schultz
  • for: Improving the completeness and accuracy of tractography, particularly the handling of fiber crossing and fanning.
  • methods: Integrates an anisotropic fanning model based on the Bingham distribution into a recently proposed tractography method that performs low-rank higher-order tensor approximation with an Unscented Kalman Filter.
  • results: On 12 Human Connectome Project subjects, the extended model significantly increases the completeness of reconstructed tracts while reducing excess, and is more accurate than a simpler isotropic fanning model based on Watson distributions.
    Abstract Low-rank higher-order tensor approximation has been used successfully to extract discrete directions for tractography from continuous fiber orientation density functions (fODFs). However, while it accounts for fiber crossings, it has so far ignored fanning, which has led to incomplete reconstructions. In this work, we integrate an anisotropic model of fanning based on the Bingham distribution into a recently proposed tractography method that performs low-rank approximation with an Unscented Kalman Filter. Our technical contributions include an initialization scheme for the new parameters, which is based on the Hessian of the low-rank approximation, pre-integration of the required convolution integrals to reduce the computational effort, and representation of the required 3D rotations with quaternions. Results on 12 subjects from the Human Connectome Project confirm that, in almost all considered tracts, our extended model significantly increases completeness of the reconstruction, while reducing excess, at acceptable additional computational cost. Its results are also more accurate than those from a simpler, isotropic fanning model that is based on Watson distributions.
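    One of the listed technical contributions is representing the required 3D rotations with quaternions; the snippet below shows the standard unit-quaternion-to-rotation-matrix conversion as a reference point, not the authors' code.

```python
# Standard quaternion -> rotation matrix conversion (reference sketch).
import numpy as np

def quat_to_rotation_matrix(q: np.ndarray) -> np.ndarray:
    """q = (w, x, y, z); returns the 3x3 rotation matrix of the (normalized) quaternion."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

# Example: a 90-degree rotation about the z-axis.
q90z = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
print(quat_to_rotation_matrix(q90z).round(3))
```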

ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.00781
  • repo_url: None
  • paper_authors: Axi Niu, Pham Xuan Trung, Kang Zhang, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang
  • for: Speeding up inference for diffusion-based single image super-resolution (SR).
  • methods: Adapts the standard diffusion model to perform SR through a deterministic iterative denoising process, conditioned on the output of a pre-trained SR model.
  • results: Experiments on standard benchmark datasets (Set5, Set14, Urban100, BSD100, Manga109) show that the method surpasses previous attempts and generates more visually realistic high-resolution counterparts of low-resolution images.
    Abstract Diffusion models have gained significant popularity in the field of image-to-image translation. Previous efforts applying diffusion models to image super-resolution (SR) have demonstrated that iteratively refining pure Gaussian noise using a U-Net architecture trained on denoising at various noise levels can yield satisfactory high-resolution images from low-resolution inputs. However, this iterative refinement process comes with the drawback of low inference speed, which strongly limits its applications. To speed up inference and further enhance the performance, our research revisits diffusion models in image super-resolution and proposes a straightforward yet significant diffusion model-based super-resolution method called ACDMSR (accelerated conditional diffusion model for image super-resolution). Specifically, our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process. Our study also highlights the effectiveness of using a pre-trained SR model to provide the conditional image of the given low-resolution (LR) image to achieve superior high-resolution results. We demonstrate that our method surpasses previous attempts in qualitative and quantitative results through extensive experiments conducted on benchmark datasets such as Set5, Set14, Urban100, BSD100, and Manga109. Moreover, our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.
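    As a rough illustration of deterministic conditional sampling, here is a DDIM-style (eta = 0) loop conditioned on a pre-trained SR output; the noise schedule, conditioning-by-concatenation, and the `denoiser` interface are assumptions, not ACDMSR itself.

```python
# Hypothetical deterministic conditional sampling loop for SR (a sketch under stated assumptions).
import torch

@torch.no_grad()
def deterministic_sr_sampling(denoiser, condition: torch.Tensor, steps: int = 50, T: int = 1000):
    """condition: (B, C, H, W) output of a pre-trained SR model used as the conditional image."""
    betas = torch.linspace(1e-4, 2e-2, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)
    timesteps = torch.linspace(T - 1, 0, steps).long()

    x = torch.randn_like(condition)                             # start from pure Gaussian noise
    for i, t in enumerate(timesteps):
        eps = denoiser(torch.cat([x, condition], dim=1), t)     # predict noise given the condition
        ab_t = alpha_bar[t]
        x0 = (x - torch.sqrt(1 - ab_t) * eps) / torch.sqrt(ab_t)
        if i + 1 < len(timesteps):
            ab_prev = alpha_bar[timesteps[i + 1]]
            x = torch.sqrt(ab_prev) * x0 + torch.sqrt(1 - ab_prev) * eps  # eta = 0: deterministic step
        else:
            x = x0
    return x
```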

Efficient Visual Fault Detection for Freight Train Braking System via Heterogeneous Self Distillation in the Wild

  • paper_url: http://arxiv.org/abs/2307.00701
  • repo_url: https://github.com/MVME-HBUT/HSD-FTI-FDet
  • paper_authors: Yang Zhang, Huilin Pan, Yang Zhou, Mingying Li, Guodong Sun
  • for: Visual fault detection for freight train braking systems under restricted hardware, to ensure safe railway operation.
  • methods: Proposes a heterogeneous self-distillation framework in which a teacher model transfers privileged output-feature knowledge to a lightweight student model, balancing detection accuracy and speed under low resource requirements.
  • results: On four fault datasets, the method runs at over 37 frames per second while maintaining the highest accuracy compared with traditional distillation approaches, with lower memory usage and the smallest model size among state-of-the-art methods.
    Abstract Efficient visual fault detection of freight trains is a critical part of ensuring the safe operation of railways under the restricted hardware environment. Although deep learning-based approaches have excelled in object detection, the efficiency of freight train fault detection is still insufficient to apply in real-world engineering. This paper proposes a heterogeneous self-distillation framework to ensure detection accuracy and speed while satisfying low resource requirements. The privileged information in the output feature knowledge can be transferred from the teacher to the student model through distillation to boost performance. We first adopt a lightweight backbone to extract features and generate a new heterogeneous knowledge neck. Such neck models positional information and long-range dependencies among channels through parallel encoding to optimize feature extraction capabilities. Then, we utilize the general distribution to obtain more credible and accurate bounding box estimates. Finally, we employ a novel loss function that makes the network easily concentrate on values near the label to improve learning efficiency. Experiments on four fault datasets reveal that our framework can achieve over 37 frames per second and maintain the highest accuracy in comparison with traditional distillation approaches. Moreover, compared to state-of-the-art methods, our framework demonstrates more competitive performance with lower memory usage and the smallest model size.
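    The core knowledge-transfer step could look roughly like the feature-mimicking loss sketched below; the projection layer and loss weighting are assumptions, not HSD-FTI-FDet itself (the authors' code is at https://github.com/MVME-HBUT/HSD-FTI-FDet).

```python
# Hypothetical feature distillation loss (teacher features guiding a lightweight student).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillationLoss(nn.Module):
    """Aligns student feature maps with (detached) teacher feature maps via a 1x1 projection."""
    def __init__(self, student_channels: int, teacher_channels: int, weight: float = 1.0):
        super().__init__()
        self.align = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        self.weight = weight

    def forward(self, student_feat, teacher_feat, detection_loss):
        # student_feat: (B, Cs, H, W); teacher_feat: (B, Ct, H, W) from the frozen teacher
        distill = F.mse_loss(self.align(student_feat), teacher_feat.detach())
        return detection_loss + self.weight * distill
```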