eess.IV - 2023-07-01

CephGPT-4: An Interactive Multimodal Cephalometric Measurement and Diagnostic System with Visual Large Language Model

  • paper_url: http://arxiv.org/abs/2307.07518
  • repo_url: None
  • paper_authors: Lei Ma, Jincong Han, Zhaoxin Wang, Dian Zhang
  • for: This study explores diagnostic language models built on multimodal cephalometric medical data.
  • methods: A multimodal orthodontic dataset of cephalometric images and doctor-patient dialogue is constructed; cephalometric landmarks are detected automatically with U-Net and diagnostic reports are generated. The cephalometric data and the generated reports are then fine-tuned separately on Minigpt-4 and VisualGLM (a minimal landmark-extraction sketch follows the abstract).
  • results: CephGPT-4 exhibits excellent performance and shows strong potential for innovative applications in orthodontics.
    Abstract Large-scale multimodal language models (LMMs) have achieved remarkable success in general domains. However, the exploration of diagnostic language models based on multimodal cephalometric medical data remains limited. In this paper, we propose a novel multimodal cephalometric analysis and diagnostic dialogue model. Firstly, a multimodal orthodontic medical dataset is constructed, comprising cephalometric images and doctor-patient dialogue data, with automatic analysis of cephalometric landmarks using U-net and generation of diagnostic reports. Then, the cephalometric dataset and generated diagnostic reports are separately fine-tuned on Minigpt-4 and VisualGLM. Results demonstrate that the CephGPT-4 model exhibits excellent performance and has the potential to revolutionize orthodontic measurement and diagnostic applications. These innovations hold revolutionary application potential in the field of orthodontics.
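The landmark-detection step lends itself to a small illustration. The sketch below assumes a heatmap-regression formulation: a stand-in encoder-decoder (not the authors' U-Net) outputs one heatmap per cephalometric landmark, and coordinates are read off by argmax. The landmark count and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Stand-in encoder-decoder; a real U-Net would add deeper stages and skip connections."""
    def __init__(self, n_landmarks: int = 19):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(32, n_landmarks, 3, padding=1),
        )

    def forward(self, x):
        return self.decode(self.encode(x))  # (B, n_landmarks, H, W) heatmaps

def heatmaps_to_landmarks(heatmaps: torch.Tensor) -> torch.Tensor:
    """Read one (x, y) pixel coordinate per landmark from its heatmap via argmax."""
    b, k, h, w = heatmaps.shape
    flat = heatmaps.view(b, k, -1).argmax(dim=-1)
    ys = torch.div(flat, w, rounding_mode="floor")
    xs = flat % w
    return torch.stack([xs, ys], dim=-1).float()  # (B, K, 2)

cephalogram = torch.randn(1, 1, 256, 256)               # grayscale lateral cephalogram
landmarks = heatmaps_to_landmarks(TinyUNet()(cephalogram))
print(landmarks.shape)                                   # torch.Size([1, 19, 2])
```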

SDRCNN: A single-scale dense residual connected convolutional neural network for pansharpening

  • paper_url: http://arxiv.org/abs/2307.00327
  • repo_url: None
  • paper_authors: Yuan Fang, Yuanzhi Cai, Lei Fan
  • for: This study develops an efficient and accurate pansharpening method that fuses high-resolution panchromatic and low-resolution multispectral satellite images to improve spatial resolution while preserving spectral information.
  • methods: A single-branch, single-scale lightweight convolutional neural network, SDRCNN, performs the pansharpening. It uses a novel dense residual connected structure and convolution block to achieve a better trade-off between accuracy and efficiency (a toy dense-residual block follows the abstract).
  • results: Based on visual inspection of the pansharpened images and the associated absolute residual maps, SDRCNN shows the least spatial-detail blurring and spectral distortion among eight traditional and five lightweight deep-learning methods, and its processing time is also the shortest. Ablation experiments confirm the effectiveness of each SDRCNN component.
    Abstract Pansharpening is a process of fusing a high spatial resolution panchromatic image and a low spatial resolution multispectral image to create a high-resolution multispectral image. A novel single-branch, single-scale lightweight convolutional neural network, named SDRCNN, is developed in this study. By using a novel dense residual connected structure and convolution block, SDRCNN achieved a better trade-off between accuracy and efficiency. The performance of SDRCNN was tested using four datasets from the WorldView-3, WorldView-2 and QuickBird satellites. The compared methods include eight traditional methods (i.e., GS, GSA, PRACS, BDSD, SFIM, GLP-CBD, CDIF and LRTCFPan) and five lightweight deep learning methods (i.e., PNN, PanNet, BayesianNet, DMDNet and FusionNet). Based on a visual inspection of the pansharpened images created and the associated absolute residual maps, SDRCNN exhibited least spatial detail blurring and spectral distortion, amongst all the methods considered. The values of the quantitative evaluation metrics were closest to their ideal values when SDRCNN was used. The processing time of SDRCNN was also the shortest among all methods tested. Finally, the effectiveness of each component in the SDRCNN was demonstrated in ablation experiments. All of these confirmed the superiority of SDRCNN.
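To make the dense residual idea concrete, here is a hedged toy block in the same spirit: each convolution sees the concatenation of all earlier feature maps, and a residual connection adds the block input back. Channel widths, layer counts, and the PAN/MS fusion shown are illustrative assumptions, not SDRCNN's exact design.

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """Each conv sees the concatenation of all earlier feature maps;
    a residual connection adds the block input back to the fused output."""
    def __init__(self, channels: int = 32, growth: int = 16, n_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        c = channels
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(nn.Conv2d(c, growth, 3, padding=1), nn.ReLU()))
            c += growth
        self.fuse = nn.Conv2d(c, channels, 1)  # project back to the input width

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))

# Toy single-branch, single-scale pansharpening pass: upsample the multispectral
# image to the panchromatic resolution, concatenate, refine, and add back.
pan = torch.randn(1, 1, 256, 256)                     # panchromatic band
ms = torch.randn(1, 4, 64, 64)                        # 4-band multispectral image
ms_up = nn.functional.interpolate(ms, scale_factor=4, mode="bicubic", align_corners=False)
stem = nn.Conv2d(5, 32, 3, padding=1)
head = nn.Conv2d(32, 4, 3, padding=1)
sharpened = ms_up + head(DenseResidualBlock()(stem(torch.cat([pan, ms_up], dim=1))))
print(sharpened.shape)                                # torch.Size([1, 4, 256, 256])
```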

Spatio-Temporal Classification of Lung Ventilation Patterns using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

  • paper_url: http://arxiv.org/abs/2307.00307
  • repo_url: None
  • paper_authors: Shuzhe Chen, Li Li, Zhichao Lin, Ke Zhang, Ying Gong, Lu Wang, Xu Wu, Maokun Li, Yuanlin Song, Fan Yang, Shenheng Xu
  • for: This study aims to classify lung ventilation patterns from series of electrical impedance tomography (EIT) images.
  • methods: A variational autoencoder with a MultiRes block compresses each 3D EIT image into a one-dimensional vector; the vectors are concatenated into a temporal feature map, which a simple convolutional neural network classifies (a toy pipeline sketch follows the abstract).
  • results: Image series allow accurate classification of ventilation modes, with high accuracy and sensitivity (0.95 and 1.00 for the normal ventilation mode, f1-score 0.94); the pipeline also correctly predicts the ventilation mode of 8 of 9 newly recruited subjects.
    Abstract The Pulmonary Function Test (PFT) is a widely utilized and rigorous classification test for lung function evaluation, serving as a comprehensive tool for lung diagnosis. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventilation beyond traditional PFT. However, relying solely on conventional isolated interpretations of PFT results and EIT images overlooks the continuous dynamic aspects of lung ventilation. This study aims to classify lung ventilation patterns by extracting spatial and temporal features from the 3D EIT image series. The study uses a Variational Autoencoder network with a MultiRes block to compress the spatial distribution in a 3D image into a one-dimensional vector. These vectors are then concatenated to create a feature map for the exhibition of temporal features. A simple convolutional neural network is used for classification. Data collected from 137 subjects were finally used for training. The model is validated by ten-fold and leave-one-out cross-validation first. The accuracy and sensitivity of normal ventilation mode are 0.95 and 1.00, and the f1-score is 0.94. Furthermore, we check the reliability and feasibility of the proposed pipeline by testing it on nine newly recruited subjects. Our results show that the pipeline correctly predicts the ventilation mode of 8 out of 9 subjects. The study demonstrates the potential of using image series for lung ventilation mode classification, providing a feasible method for patient prescreening and presenting an alternative form of PFT.
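A minimal sketch of the described pipeline, under assumed sizes: a plain 3D convolutional encoder stands in for the paper's VAE with MultiRes blocks, each EIT frame is compressed to a one-dimensional latent vector, the vectors are stacked over time into a feature map, and a small CNN classifies it. The number of ventilation classes and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

frame_encoder = nn.Sequential(              # (B, 1, D, H, W) -> (B, latent)
    nn.Conv3d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv3d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
    nn.Linear(16, 32),
)

classifier = nn.Sequential(                 # (B, 1, T, latent) -> ventilation class logits
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 3),                        # 3 ventilation modes (assumed)
)

frames = torch.randn(40, 1, 16, 32, 32)          # 40 EIT frames from one subject (assumed size)
latents = frame_encoder(frames)                  # (40, 32): one vector per frame
feature_map = latents.unsqueeze(0).unsqueeze(0)  # (1, 1, T=40, 32) temporal feature map
logits = classifier(feature_map)
print(logits.shape)                              # torch.Size([1, 3])
```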

AE-RED: A Hyperspectral Unmixing Framework Powered by Deep Autoencoder and Regularization by Denoising

  • paper_url: http://arxiv.org/abs/2307.00269
  • repo_url: None
  • paper_authors: Min Zhao, Jie Chen, Nicolas Dobigeon
  • for: This paper proposes a novel framework for spectral unmixing that integrates autoencoder networks with regularization by denoising (RED) to enhance the unmixing performance.
  • methods: The proposed framework uses a deep autoencoder network to implicitly regularize the estimates and model the mixture mechanism, and leverages a denoiser to bring in explicit prior information (a rough alternating-update sketch follows the abstract).
  • results: Experimental results on both synthetic and real data sets show the superiority of the proposed framework compared with state-of-the-art unmixing approaches.
    Abstract Spectral unmixing has been extensively studied with a variety of methods and used in many applications. Recently, data-driven techniques with deep learning methods have attracted great attention in spectral unmixing for their superior ability to automatically learn structure information. In particular, autoencoder based architectures are elaborately designed to solve blind unmixing and model complex nonlinear mixtures. Nevertheless, these methods perform the unmixing task as black boxes and lack interpretability. On the other hand, conventional unmixing methods carefully design the regularizer to add explicit information, in which algorithms such as plug-and-play (PnP) strategies utilize off-the-shelf denoisers to plug powerful priors. In this paper, we propose a generic unmixing framework to integrate the autoencoder network with regularization by denoising (RED), named AE-RED. More specifically, we decompose the unmixing optimization problem into two subproblems. The first one is solved using deep autoencoders to implicitly regularize the estimates and model the mixture mechanism. The second one leverages the denoiser to bring in the explicit information. In this way, both the characteristics of the deep autoencoder based unmixing methods and priors provided by denoisers are merged into our well-designed framework to enhance the unmixing performance. Experimental results on both synthetic and real data sets show the superiority of our proposed framework compared with state-of-the-art unmixing approaches.
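The two-subproblem split can be illustrated with a very rough NumPy sketch: one update moves the abundance estimates toward the data-fit term (standing in for the autoencoder subproblem), the other pulls them toward a denoised copy of themselves in the spirit of regularization by denoising. The Gaussian-blur denoiser, step sizes, and simplex projection are placeholder choices, not the paper's AE-RED algorithm.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
H, W, P, R = 32, 32, 50, 3                 # image size, bands, endmembers (all assumed)
E = np.abs(rng.normal(size=(P, R)))        # toy endmember signatures
Y = np.abs(rng.normal(size=(H, W, P)))     # toy observed hyperspectral cube

A = np.full((H, W, R), 1.0 / R)            # abundance maps, initialized uniform
mu, steps = 0.1, 20

for _ in range(steps):
    # Subproblem 1 (stand-in for the autoencoder update): gradient step on
    # the data-fit term ||Y - A E^T||^2 with respect to A.
    residual = Y - A @ E.T                                  # (H, W, P)
    A = A + 0.01 * (residual @ E)
    # Subproblem 2 (RED-style step): pull A toward a denoised copy of itself.
    A_denoised = np.stack(
        [gaussian_filter(A[..., r], sigma=1.0) for r in range(R)], axis=-1)
    A = (A + mu * A_denoised) / (1.0 + mu)
    # Project onto the abundance constraints (non-negative, sum-to-one).
    A = np.clip(A, 0.0, None)
    A = A / np.maximum(A.sum(axis=-1, keepdims=True), 1e-8)

print(A.shape)                              # (32, 32, 3) abundance maps
```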

Deep Angiogram: Trivializing Retinal Vessel Segmentation

  • paper_url: http://arxiv.org/abs/2307.00245
  • repo_url: None
  • paper_authors: Dewei Hu, Xing Yao, Jiacheng Wang, Yuankai K. Tao, Ipek Oguz
  • for: This work aims to build a deep learning model that robustly recognizes retinal vessels on unseen domains.
  • methods: A contrastive variational autoencoder filters out irrelevant features and synthesizes a latent image containing only the retinal vessels, called a deep angiogram; segmentation is then readily accomplished by thresholding the deep angiogram (a short thresholding sketch follows the abstract).
  • results: The model generates stable angiograms on different target domains, provides excellent vessel visualization, and offers a non-invasive, safe alternative to fluorescein angiography.
    Abstract Among the research efforts to segment the retinal vasculature from fundus images, deep learning models consistently achieve superior performance. However, this data-driven approach is very sensitive to domain shifts. For fundus images, such data distribution changes can easily be caused by variations in illumination conditions as well as the presence of disease-related features such as hemorrhages and drusen. Since the source domain may not include all possible types of pathological cases, a model that can robustly recognize vessels on unseen domains is desirable but remains elusive, despite many proposed segmentation networks of ever-increasing complexity. In this work, we propose a contrastive variational auto-encoder that can filter out irrelevant features and synthesize a latent image, named deep angiogram, representing only the retinal vessels. Then segmentation can be readily accomplished by thresholding the deep angiogram. The generalizability of the synthetic network is improved by the contrastive loss that makes the model less sensitive to variations of image contrast and noisy features. Compared to baseline deep segmentation networks, our model achieves higher segmentation performance via simple thresholding. Our experiments show that the model can generate stable angiograms on different target domains, providing excellent visualization of vessels and a non-invasive, safe alternative to fluorescein angiography.
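The final segmentation step is deliberately trivial; the fragment below illustrates it on random data, using Otsu's method as one possible global threshold. The generator that would produce a real deep angiogram is not reproduced here.

```python
import numpy as np
from skimage.filters import threshold_otsu

deep_angiogram = np.random.rand(512, 512)   # stand-in for the synthesized vessel image
t = threshold_otsu(deep_angiogram)          # one possible global threshold choice
vessel_mask = deep_angiogram > t            # binary vessel segmentation
print(vessel_mask.mean())                   # fraction of pixels labeled as vessel
```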

AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence

  • paper_url: http://arxiv.org/abs/2307.00211
  • repo_url: https://github.com/wangjiarui153/aigciqa2023
  • paper_authors: Jiarui Wang, Huiyu Duan, Jing Liu, Shi Chen, Xiongkuo Min, Guangtao Zhai
  • for: To better understand human visual preferences for AI-generated images (AIGIs).
  • methods: Over 2,000 images are generated from 100 prompts using six state-of-the-art text-to-image models, and a well-organized subjective experiment assesses human visual preferences for each image from three perspectives: quality, authenticity, and correspondence.
  • results: Based on the resulting large-scale IQA database, AIGCIQA2023, a benchmark experiment evaluates the performance of several state-of-the-art IQA metrics (a correlation-benchmark sketch follows the abstract).
    Abstract In this paper, in order to get a better understanding of the human visual preferences for AIGIs, a large-scale IQA database for AIGC is established, which is named as AIGCIQA2023. We first generate over 2000 images based on 6 state-of-the-art text-to-image generation models using 100 prompts. Based on these images, a well-organized subjective experiment is conducted to assess the human visual preferences for each image from three perspectives including quality, authenticity and correspondence. Finally, based on this large-scale database, we conduct a benchmark experiment to evaluate the performance of several state-of-the-art IQA metrics on our constructed database.
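The benchmark step amounts to correlating objective metric outputs with the collected subjective scores. The sketch below shows a standard SROCC/PLCC/KROCC computation on synthetic placeholder scores; it does not use the actual AIGCIQA2023 data, and the specific correlation measures are assumed rather than taken from the paper.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr, kendalltau

rng = np.random.default_rng(0)
mos = rng.uniform(1, 5, size=200)                      # subjective mean opinion scores (synthetic)
metric = 0.8 * mos + rng.normal(scale=0.5, size=200)   # a hypothetical IQA metric's predictions

srocc, _ = spearmanr(metric, mos)    # monotonic agreement
plcc, _ = pearsonr(metric, mos)      # linear agreement
krocc, _ = kendalltau(metric, mos)   # rank agreement
print(f"SROCC={srocc:.3f} PLCC={plcc:.3f} KROCC={krocc:.3f}")
```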

Unsupervised Coordinate-Based Video Denoising

  • paper_url: http://arxiv.org/abs/2307.00179
  • repo_url: None
  • paper_authors: Mary Damilola Aiyetigbo, Dineshchandar Ravichandran, Reda Chalhoub, Peter Kalivas, Nianyi Li
  • for: This paper proposes a novel unsupervised deep learning approach for video denoising that mitigates data-scarcity issues and is robust to different noise patterns, broadening its applicability.
  • methods: The method comprises three modules: a feature generator that produces feature maps, a Denoise-Net that generates denoised but slightly blurry reference frames, and a Refine-Net that re-introduces high-frequency details. The coordinate-based network greatly simplifies the network structure while preserving high-frequency details in the denoised video frames (a coordinate-network sketch follows the abstract).
  • results: Extensive experiments on simulated and real-captured calcium imaging video sequences show that the method effectively denoises real-world data without prior knowledge of the noise model and without data augmentation during training.
    Abstract In this paper, we introduce a novel unsupervised video denoising deep learning approach that can help to mitigate data scarcity issues and shows robustness against different noise patterns, enhancing its broad applicability. Our method comprises three modules: a Feature generator creating feature maps, a Denoise-Net generating denoised but slightly blurry reference frames, and a Refine-Net re-introducing high-frequency details. By leveraging the coordinate-based network, we can greatly simplify the network structure while preserving high-frequency details in the denoised video frames. Extensive experiments on both simulated and real-captured data demonstrate that our method can effectively denoise real-world calcium imaging video sequences without prior knowledge of noise models and data augmentation during training.
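The coordinate-based idea can be illustrated with a generic implicit video representation: an MLP maps normalized (x, y, t) coordinates to intensities, so a frame is obtained by querying a grid of coordinates. This is only the underlying representation, not the authors' three-module denoising pipeline; all sizes and the two-hidden-layer architecture are assumptions.

```python
import torch
import torch.nn as nn

coord_net = nn.Sequential(
    nn.Linear(3, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),             # predicted intensity at (x, y, t)
)

# Query a 64x64 frame at time t = 0.5 (coordinates normalized to [0, 1]).
ys, xs = torch.meshgrid(torch.linspace(0, 1, 64), torch.linspace(0, 1, 64), indexing="ij")
coords = torch.stack([xs.flatten(), ys.flatten(), torch.full((64 * 64,), 0.5)], dim=-1)
frame = coord_net(coords).view(64, 64)
print(frame.shape)                 # torch.Size([64, 64])
```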

Multiscale Progressive Text Prompt Network for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.00174
  • repo_url: https://github.com/codehxj/MPTPN-for--Medical-Image-Segmentation
  • paper_authors: Xianjun Han, Qianqian Chen, Zhaoyang Xie, Xuejun Li, Hongyu Yang
  • for: Training models for medical image segmentation in order to obtain reliable morphological statistics.
  • methods: Progressive text prompts guide the segmentation process in two stages: contrastive learning on natural images first pretrains a powerful prior prompt encoder (PPE) that turns text prompts into multimodality features; medical images and text prior prompts are then fed into the PPE for the downstream segmentation task (a toy prompt-fusion sketch follows the abstract).
  • results: A multiscale feature fusion block combines the PPE features into multiscale multimodality features that bridge the semantic gap and improve prediction accuracy, and an UpAttention block refines the predictions by merging image and text features. The model reduces data-annotation costs while performing well on both medical and natural images.
    Abstract The accurate segmentation of medical images is a crucial step in obtaining reliable morphological statistics. However, training a deep neural network for this task requires a large amount of labeled data to ensure high-accuracy results. To address this issue, we propose using progressive text prompts as prior knowledge to guide the segmentation process. Our model consists of two stages. In the first stage, we perform contrastive learning on natural images to pretrain a powerful prior prompt encoder (PPE). This PPE leverages text prior prompts to generate multimodality features. In the second stage, medical image and text prior prompts are sent into the PPE inherited from the first stage to achieve the downstream medical image segmentation task. A multiscale feature fusion block (MSFF) combines the features from the PPE to produce multiscale multimodality features. These two progressive features not only bridge the semantic gap but also improve prediction accuracy. Finally, an UpAttention block refines the predicted results by merging the image and text features. This design provides a simple and accurate way to leverage multiscale progressive text prior prompts for medical image segmentation. Compared with using only images, our model achieves high-quality results with low data annotation costs. Moreover, our model not only has excellent reliability and validity on medical images but also performs well on natural images. The experimental results on different image datasets demonstrate that our model is effective and robust for image segmentation.
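As a toy illustration of prompt-guided segmentation, the sketch below broadcasts a text-prompt embedding over the spatial grid, concatenates it with image features, and applies a small segmentation head. The PPE, MSFF, and UpAttention blocks are not reproduced; every tensor size here is an assumption.

```python
import torch
import torch.nn as nn

img_feats = torch.randn(1, 64, 32, 32)   # image features from some backbone (assumed)
text_emb = torch.randn(1, 32)            # embedding of a prior text prompt (assumed)

text_map = text_emb[:, :, None, None].expand(-1, -1, 32, 32)  # broadcast over space
fused = torch.cat([img_feats, text_map], dim=1)               # (1, 96, 32, 32)

seg_head = nn.Sequential(
    nn.Conv2d(96, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1),                                      # per-pixel logit
)
mask_logits = seg_head(fused)
print(mask_logits.shape)                 # torch.Size([1, 1, 32, 32])
```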