eess.IV - 2023-07-06

Self-supervised learning via inter-modal reconstruction and feature projection networks for label-efficient 3D-to-2D segmentation

  • paper_url: http://arxiv.org/abs/2307.03008
  • repo_url: https://github.com/j-morano/multimodal-ssl-fpn
  • paper_authors: José Morano, Guilherme Aresta, Dmitrii Lachinov, Julia Mai, Ursula Schmidt-Erfurth, Hrvoje Bogunović
  • for: This work proposes a label-efficient 3D-to-2D image segmentation method aimed at reducing the workload of medical specialists.
  • methods: A novel convolutional neural network (CNN) combined with a self-supervised learning (SSL) strategy; the CNN couples a 3D encoder with a 2D decoder through new 3D-to-2D blocks (a minimal sketch of such a block follows this entry), and the SSL task reconstructs image pairs of modalities with different dimensionality.
  • results: Experiments show that, in scenarios with limited labeled data, the proposed method improves the state of the art by up to 8% in Dice score; the SSL method further improves this performance and is beneficial regardless of the network architecture.
    Abstract Deep learning has become a valuable tool for the automation of certain medical image segmentation tasks, significantly relieving the workload of medical specialists. Some of these tasks require segmentation to be performed on a subset of the input dimensions, the most common case being 3D-to-2D. However, the performance of existing methods is strongly conditioned by the amount of labeled data available, as there is currently no data efficient method, e.g. transfer learning, that has been validated on these tasks. In this work, we propose a novel convolutional neural network (CNN) and self-supervised learning (SSL) method for label-efficient 3D-to-2D segmentation. The CNN is composed of a 3D encoder and a 2D decoder connected by novel 3D-to-2D blocks. The SSL method consists of reconstructing image pairs of modalities with different dimensionality. The approach has been validated in two tasks with clinical relevance: the en-face segmentation of geographic atrophy and reticular pseudodrusen in optical coherence tomography. Results on different datasets demonstrate that the proposed CNN significantly improves the state of the art in scenarios with limited labeled data by up to 8% in Dice score. Moreover, the proposed SSL method allows further improvement of this performance by up to 23%, and we show that the SSL is beneficial regardless of the network architecture.
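
The following is a minimal, hedged sketch of the kind of 3D-to-2D projection block the abstract describes: 3D encoder features collapsed along the depth axis into a 2D map for the decoder. The exact block design is in the authors' repository; the layer choices, the max-over-depth reduction, and all shapes here are illustrative assumptions only.

```python
# Hedged sketch of a generic 3D-to-2D block (not the authors' exact design).
import torch
import torch.nn as nn

class ThreeDToTwoDBlock(nn.Module):
    """Maps a 3D feature map (B, C, D, H, W) to a 2D one (B, C_out, H, W)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv3d(x)          # (B, C_out, D, H, W)
        return x.max(dim=2).values  # collapse the depth axis -> (B, C_out, H, W)

if __name__ == "__main__":
    feats3d = torch.randn(2, 32, 16, 64, 64)          # e.g. OCT volume features
    print(ThreeDToTwoDBlock(32, 64)(feats3d).shape)   # torch.Size([2, 64, 64, 64])
```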

Fourier-Net+: Leveraging Band-Limited Representation for Efficient 3D Medical Image Registration

  • paper_url: http://arxiv.org/abs/2307.02997
  • repo_url: https://github.com/xi-jia/fourier-net
  • paper_authors: Xi Jia, Alexander Thorley, Alberto Gomez, Wenqi Lu, Dipak Kotecha, Jinming Duan
  • for: This work aims to make displacement-field prediction in unsupervised image registration more efficient by replacing the conventional U-Net-style expansive path with Fourier-Net.
  • methods: Fourier-Net substitutes a parameter-free, model-driven decoder for the costly U-Net-style expansive path and learns a low-dimensional representation of the displacement field in the band-limited Fourier domain; Fourier-Net+ additionally takes a band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the contracting path (a minimal decoding sketch follows this entry).
  • results: Fourier-Net and Fourier-Net+ achieve registration performance comparable to existing methods while offering faster inference, a lower memory footprint, and fewer multiply-add operations; Fourier-Net+ also enables efficient training of large-scale 3D registration.
    Abstract U-Net style networks are commonly utilized in unsupervised image registration to predict dense displacement fields, which for high-resolution volumetric image data is a resource-intensive and time-consuming task. To tackle this challenge, we first propose Fourier-Net, which replaces the costly U-Net style expansive path with a parameter-free model-driven decoder. Instead of directly predicting a full-resolution displacement field, our Fourier-Net learns a low-dimensional representation of the displacement field in the band-limited Fourier domain which our model-driven decoder converts to a full-resolution displacement field in the spatial domain. Expanding upon Fourier-Net, we then introduce Fourier-Net+, which additionally takes the band-limited spatial representation of the images as input and further reduces the number of convolutional layers in the U-Net style network's contracting path. Finally, to enhance the registration performance, we propose a cascaded version of Fourier-Net+. We evaluate our proposed methods on three datasets, on which our proposed Fourier-Net and its variants achieve comparable results with current state-of-the-art methods, while exhibiting faster inference speeds, lower memory footprint, and fewer multiply-add operations. With such small computational cost, our Fourier-Net+ enables the efficient training of large-scale 3D registration on low-VRAM GPUs. Our code is publicly available at \url{https://github.com/xi-jia/Fourier-Net}.
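
As a rough illustration of the parameter-free, model-driven decoder described in the abstract, the sketch below zero-pads a band-limited (low-resolution, centered) Fourier representation of one displacement channel and inverse-FFTs it back to full resolution. The shapes and the amplitude scaling are assumptions for illustration, not the paper's exact implementation.

```python
# Hedged sketch: band-limited k-space coefficients -> full-resolution field.
import torch

def band_limited_to_full(coeff_lowres: torch.Tensor, full_shape) -> torch.Tensor:
    """coeff_lowres: centered complex spectrum (d, h, w); returns a (D, H, W) real field."""
    D, H, W = full_shape
    d, h, w = coeff_lowres.shape
    padded = torch.zeros(full_shape, dtype=torch.complex64)
    # place the band-limited coefficients at the center of the full spectrum
    padded[(D - d) // 2:(D + d) // 2,
           (H - h) // 2:(H + h) // 2,
           (W - w) // 2:(W + w) // 2] = coeff_lowres
    field = torch.fft.ifftn(torch.fft.ifftshift(padded)).real
    return field * (D * H * W) / (d * h * w)   # keep amplitudes comparable

if __name__ == "__main__":
    lowres = torch.randn(12, 14, 12, dtype=torch.complex64)   # one displacement channel
    print(band_limited_to_full(lowres, (96, 112, 96)).shape)  # torch.Size([96, 112, 96])
```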

Noise-to-Norm Reconstruction for Industrial Anomaly Detection and Localization

  • paper_url: http://arxiv.org/abs/2307.02836
  • repo_url: None
  • paper_authors: Shiqi Deng, Zhiyu Sun, Ruiyan Zhuang, Jun Gong
  • for: This work proposes a reconstruction-error-based anomaly detection method that remains effective on datasets with large variations in object location.
  • methods: A noise-to-norm reconstruction network built on M-net, with multiscale fusion and residual attention modules, enables end-to-end anomaly detection and localization (a minimal scoring sketch follows this entry).
  • results: Experiments show that the method reconstructs anomalous regions into normal patterns and detects and localizes anomalies accurately, achieving results competitive with the latest methods on the MPDD and VisA datasets.
    Abstract Anomaly detection has a wide range of applications and is especially important in industrial quality inspection. Currently, many top-performing anomaly-detection models rely on feature-embedding methods. However, these methods do not perform well on datasets with large variations in object locations. Reconstruction-based methods use reconstruction errors to detect anomalies without considering positional differences between samples. In this study, a reconstruction-based method using the noise-to-norm paradigm is proposed, which avoids the invariant reconstruction of anomalous regions. Our reconstruction network is based on M-net and incorporates multiscale fusion and residual attention modules to enable end-to-end anomaly detection and localization. Experiments demonstrate that the method is effective in reconstructing anomalous regions into normal patterns and achieving accurate anomaly detection and localization. On the MPDD and VisA datasets, our proposed method achieved more competitive results than the latest methods, and it set a new state-of-the-art standard on the MPDD dataset.
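
A hedged sketch of reconstruction-error anomaly scoring in the noise-to-norm spirit the abstract describes: the input is perturbed with noise, reconstructed by a trained network (M-net with multiscale fusion and residual attention in the paper; a placeholder below), and the anomaly map is read off the reconstruction error. The noise level and max-pooled image score are illustrative assumptions.

```python
# Hedged sketch of reconstruction-based anomaly scoring (not the authors' code).
import torch
import torch.nn as nn

@torch.no_grad()
def anomaly_map(model: nn.Module, image: torch.Tensor, noise_std: float = 0.2):
    """image: (B, C, H, W) in [0, 1]. Returns a per-pixel map and a per-image score."""
    noisy = (image + noise_std * torch.randn_like(image)).clamp(0, 1)
    recon = model(noisy)                              # anomalous regions -> normal patterns
    pixel_map = (image - recon).abs().mean(dim=1)     # (B, H, W) reconstruction error
    image_score = pixel_map.amax(dim=(1, 2))          # (B,) max-pooled anomaly score
    return pixel_map, image_score

if __name__ == "__main__":
    dummy_model = nn.Identity()                       # stand-in for the trained network
    maps, scores = anomaly_map(dummy_model, torch.rand(2, 3, 256, 256))
    print(maps.shape, scores.shape)                   # torch.Size([2, 256, 256]) torch.Size([2])
```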

Bundle-specific Tractogram Distribution Estimation Using Higher-order Streamline Differential Equation

  • paper_url: http://arxiv.org/abs/2307.02825
  • repo_url: None
  • paper_authors: Yuanjing Feng, Lei Xie, Jingqiang Wang, Jianzhong He, Fei Gao
  • for: This paper aims to improve tractography methods for reconstructing complex global fiber bundles in the brain, which are prone to producing erroneous tracks and missing true positive connections.
  • methods: The proposed method uses a bundle-specific tractogram distribution function based on a higher-order streamline differential equation, which reconstructs streamline bundles in a “cluster to cluster” manner, and introduces anatomic priors to guide the tractography process and improve the accuracy of fiber bundle reconstruction (for contrast, a sketch of the conventional “single to single” integration follows this entry).
  • results: The proposed method demonstrates better results in reconstructing long-range, twisting, and large fanning tracts compared to traditional peaks-based tractography methods. It also reduces error deviation and accumulation at the local level, providing a more accurate representation of the complex global fiber bundles in the brain.
    Abstract Tractography traces the peak directions extracted from the fiber orientation distribution (FOD), which suffer from ambiguous spatial correspondences between diffusion directions and fiber geometry and are prone to producing erroneous tracks while missing true positive connections. Peaks-based tractography methods reconstruct streamlines 'locally' in a 'single to single' manner, thus lacking global information about the trend of the whole fiber bundle. In this work, we propose a novel tractography method based on a bundle-specific tractogram distribution function by using a higher-order streamline differential equation, which reconstructs the streamline bundles in a 'cluster to cluster' manner. A unified framework for any higher-order streamline differential equation is presented to describe the fiber bundles with disjoint streamlines defined based on the diffusion tensor vector field. At the global level, the tractography process is simplified as the estimation of bundle-specific tractogram distribution (BTD) coefficients by minimizing the energy optimization model, and is used to characterize the relations between BTD and diffusion tensor vector under the prior guidance by introducing the tractogram bundle information to provide anatomic priors. Experiments are performed on simulated Hough, Sine, Circle data, ISMRM 2015 Tractography Challenge data, FiberCup data, and in vivo data from the Human Connectome Project (HCP) for qualitative and quantitative evaluation. The results demonstrate that our approach can reconstruct the complex global fiber bundles directly. BTD reduces the error deviation and accumulation at the local level and shows better results in reconstructing long-range, twisting, and large fanning tracts.
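
For contrast with the paper's “cluster to cluster” BTD formulation, here is a hedged sketch of the conventional “single to single” tracking it argues against: a single streamline integrated locally through a peak-direction field with RK4. The field layout, step size, and stopping rule are assumptions.

```python
# Hedged sketch of conventional peaks-based ("single to single") streamline tracking.
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def track_streamline(peak_field, seed, step=0.5, n_steps=200):
    """peak_field: (X, Y, Z, 3) unit peak directions on a voxel grid."""
    grids = [np.arange(s) for s in peak_field.shape[:3]]
    interp = RegularGridInterpolator(grids, peak_field, bounds_error=False, fill_value=0.0)

    def direction(p):
        d = interp(p)[0]
        n = np.linalg.norm(d)
        return d / n if n > 1e-6 else np.zeros(3)

    pts = [np.asarray(seed, float)]
    for _ in range(n_steps):
        p = pts[-1]
        k1 = direction(p)
        k2 = direction(p + 0.5 * step * k1)
        k3 = direction(p + 0.5 * step * k2)
        k4 = direction(p + step * k3)
        d = (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        if np.linalg.norm(d) < 1e-6:          # left the field / no peak: stop
            break
        pts.append(p + step * d)
    return np.array(pts)

if __name__ == "__main__":
    field = np.zeros((20, 20, 20, 3)); field[..., 0] = 1.0   # toy field aligned with x
    print(track_streamline(field, seed=(2.0, 10.0, 10.0)).shape)
```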

Single Image LDR to HDR Conversion using Conditional Diffusion

  • paper_url: http://arxiv.org/abs/2307.02814
  • repo_url: None
  • paper_authors: Dwip Dalal, Gautam Vashishtha, Prajwal Singh, Shanmuganathan Raman
  • for: This paper addresses the limitation that Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, which leaves images under- or overexposed.
  • methods: A deep learning-based approach recovers intricate details from shadows and highlights while reconstructing a High Dynamic Range (HDR) image. The problem is formulated as an image-to-image (I2I) translation task and solved with a conditional Denoising Diffusion Probabilistic Model (DDPM) framework using classifier-free guidance (a minimal guidance sketch follows this entry); a deep CNN-based autoencoder improves the quality of the latent representation of the input LDR image used for conditioning, and a new Exposure Loss directs gradients in the direction opposite to the saturation, further improving result quality.
  • results: Comprehensive quantitative and qualitative experiments demonstrate the effectiveness of the proposed method, indicating that a simple conditional diffusion-based approach can replace complex camera-pipeline-based architectures.
    Abstract Digital imaging aims to replicate realistic scenes, but Low Dynamic Range (LDR) cameras cannot represent the wide dynamic range of real scenes, resulting in under-/overexposed images. This paper presents a deep learning-based approach for recovering intricate details from shadows and highlights while reconstructing High Dynamic Range (HDR) images. We formulate the problem as an image-to-image (I2I) translation task and propose a conditional Denoising Diffusion Probabilistic Model (DDPM) based framework using classifier-free guidance. We incorporate a deep CNN-based autoencoder in our proposed framework to enhance the quality of the latent representation of the input LDR image used for conditioning. Moreover, we introduce a new loss function for LDR-HDR translation tasks, termed Exposure Loss. This loss helps direct gradients in the opposite direction of the saturation, further improving the results' quality. By conducting comprehensive quantitative and qualitative experiments, we have effectively demonstrated the proficiency of our proposed method. The results indicate that a simple conditional diffusion-based method can replace the complex camera pipeline-based architectures.
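
The abstract mentions classifier-free guidance for its conditional DDPM; the sketch below shows the standard guidance step, querying a denoiser with and without the LDR conditioning and blending the two noise predictions. The guidance scale, the zero-tensor "null" condition, and the toy denoiser are illustrative assumptions.

```python
# Hedged sketch of a classifier-free guidance step (standard formulation).
import torch
import torch.nn as nn

@torch.no_grad()
def guided_noise_prediction(eps_model: nn.Module, x_t: torch.Tensor, t: torch.Tensor,
                            cond: torch.Tensor, guidance_scale: float = 3.0) -> torch.Tensor:
    """eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    eps_cond = eps_model(x_t, t, cond)                      # conditioned on the LDR latent
    eps_uncond = eps_model(x_t, t, torch.zeros_like(cond))  # "null" condition
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

if __name__ == "__main__":
    class ToyEps(nn.Module):                 # toy denoiser, only to show the call pattern
        def forward(self, x, t, c):
            return 0.1 * x + 0.0 * c.mean()
    x = torch.randn(1, 3, 64, 64)
    cond = torch.randn(1, 3, 64, 64)
    print(guided_noise_prediction(ToyEps(), x, torch.tensor([10]), cond).shape)
```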

Advancing Zero-Shot Digital Human Quality Assessment through Text-Prompted Evaluation

  • paper_url: http://arxiv.org/abs/2307.02808
  • repo_url: https://github.com/zzc-1998/sjtu-h3d
  • paper_authors: Zicheng Zhang, Wei Sun, Yingjie Zhou, Haoning Wu, Chunyi Li, Xiongkuo Min, Xiaohong Liu, Guangtao Zhai, Weisi Lin
  • for: This work provides a full-body digital human quality assessment (DHQA) database, SJTU-H3D, to support research on digital human quality assessment.
  • methods: The database comprises 40 high-quality reference digital humans and 1,120 distorted counterparts generated with seven types of distortions. The authors further propose a zero-shot DHQA approach that leverages semantic and distortion features extracted from projections, together with geometry features derived from the mesh structure of digital humans (a dihedral-angle sketch follows this entry).
  • results: The zero-shot approach achieves solid quality-assessment performance while mitigating database bias, and the resulting Digital Human Quality Index (DHQI) can serve as a strong baseline that facilitates progress in the field.
    Abstract Digital humans have witnessed extensive applications in various domains, necessitating related quality assessment studies. However, there is a lack of comprehensive digital human quality assessment (DHQA) databases. To address this gap, we propose SJTU-H3D, a subjective quality assessment database specifically designed for full-body digital humans. It comprises 40 high-quality reference digital humans and 1,120 labeled distorted counterparts generated with seven types of distortions. The SJTU-H3D database can serve as a benchmark for DHQA research, allowing evaluation and refinement of processing algorithms. Further, we propose a zero-shot DHQA approach that focuses on no-reference (NR) scenarios to ensure generalization capabilities while mitigating database bias. Our method leverages semantic and distortion features extracted from projections, as well as geometry features derived from the mesh structure of digital humans. Specifically, we employ the Contrastive Language-Image Pre-training (CLIP) model to measure semantic affinity and incorporate the Naturalness Image Quality Evaluator (NIQE) model to capture low-level distortion information. Additionally, we utilize dihedral angles as geometry descriptors to extract mesh features. By aggregating these measures, we introduce the Digital Human Quality Index (DHQI), which demonstrates significant improvements in zero-shot performance. The DHQI can also serve as a robust baseline for DHQA tasks, facilitating advancements in the field. The database and the code are available at https://github.com/zzc-1998/SJTU-H3D.
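
A hedged sketch of the "dihedral angles as geometry descriptors" idea from the abstract: for each interior edge of a triangle mesh, the angle between the normals of the two adjacent faces. The aggregation with CLIP and NIQE features into the DHQI is not shown; the histogram at the end is an illustrative choice.

```python
# Hedged sketch: per-edge dihedral angles of a triangle mesh as geometry features.
import numpy as np
from collections import defaultdict

def dihedral_angles(vertices: np.ndarray, faces: np.ndarray) -> np.ndarray:
    """vertices: (V, 3) float, faces: (F, 3) int. Returns one angle (radians) per interior edge."""
    v = vertices[faces]                                  # (F, 3, 3)
    normals = np.cross(v[:, 1] - v[:, 0], v[:, 2] - v[:, 0])
    normals /= np.linalg.norm(normals, axis=1, keepdims=True) + 1e-12

    edge_to_faces = defaultdict(list)
    for f_idx, (a, b, c) in enumerate(faces):
        for e in ((a, b), (b, c), (c, a)):
            edge_to_faces[tuple(sorted(e))].append(f_idx)

    angles = []
    for faces_on_edge in edge_to_faces.values():
        if len(faces_on_edge) == 2:                      # interior edge
            n1, n2 = normals[faces_on_edge[0]], normals[faces_on_edge[1]]
            angles.append(np.arccos(np.clip(np.dot(n1, n2), -1.0, 1.0)))
    return np.asarray(angles)

if __name__ == "__main__":
    # A unit tetrahedron as a toy mesh standing in for a digital human.
    verts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
    tris = np.array([[0, 1, 2], [0, 1, 3], [0, 2, 3], [1, 2, 3]])
    ang = dihedral_angles(verts, tris)
    print(np.histogram(ang, bins=4, range=(0, np.pi))[0])  # a simple geometry feature vector
```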

Retinex-based Image Denoising / Contrast Enhancement using Gradient Graph Laplacian Regularizer

  • paper_url: http://arxiv.org/abs/2307.02625
  • repo_url: None
  • paper_authors: Yeganeh Gharedaghi, Gene Cheung, Xianming Liu
  • for: Improving the quality of images captured in poorly lit conditions.
  • methods: Retinex decomposition combined with graph Laplacian regularizers (GLR and GGLR) for denoising and contrast enhancement (a minimal GLR sketch follows this entry).
  • results: Experiments show that the algorithm achieves competitive visual image quality while noticeably reducing computational complexity.
    Abstract Images captured in poorly lit conditions are often corrupted by acquisition noise. Leveraging recent advances in graph-based regularization, we propose a fast Retinex-based restoration scheme that denoises and contrast-enhances an image. Specifically, by Retinex theory we first assume that each image pixel is a multiplication of its reflectance and illumination components. We next assume that the reflectance and illumination components are piecewise constant (PWC) and continuous piecewise planar (PWP) signals, which can be recovered via graph Laplacian regularizer (GLR) and gradient graph Laplacian regularizer (GGLR) respectively. We formulate quadratic objectives regularized by GLR and GGLR, which are minimized alternately until convergence by solving linear systems -- with improved condition numbers via proposed preconditioners -- via conjugate gradient (CG) efficiently. Experimental results show that our algorithm achieves competitive visual image quality while reducing computation complexity noticeably.
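
A hedged sketch of the graph Laplacian regularizer (GLR) building block the abstract relies on: the quadratic objective ||y - x||^2 + mu * x^T L x is minimized by solving (I + mu*L) x = y with conjugate gradient. A 1-D path graph stands in for the image graph and no preconditioner is used; both are simplifying assumptions relative to the paper.

```python
# Hedged sketch of GLR-regularized denoising solved with conjugate gradient.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

def glr_denoise(y: np.ndarray, mu: float = 5.0) -> np.ndarray:
    n = y.size
    # Unweighted path-graph Laplacian L = D - W (neighbors i, i+1 connected).
    main = np.full(n, 2.0); main[0] = main[-1] = 1.0
    L = sp.diags([main, -np.ones(n - 1), -np.ones(n - 1)], [0, 1, -1], format="csr")
    A = sp.identity(n, format="csr") + mu * L
    x, info = cg(A, y, atol=1e-8)
    assert info == 0, "CG did not converge"
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.concatenate([np.zeros(50), np.ones(50)])       # piecewise-constant signal
    noisy = clean + 0.3 * rng.standard_normal(100)
    denoised = glr_denoise(noisy)
    print(f"MSE {np.mean((noisy - clean) ** 2):.3f} -> {np.mean((denoised - clean) ** 2):.3f}")
```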

AxonCallosumEM Dataset: Axon Semantic Segmentation of Whole Corpus Callosum cross section from EM Images

  • paper_url: http://arxiv.org/abs/2307.02464
  • repo_url: None
  • paper_authors: Ao Cheng, Guoqiang Zhao, Lirong Wang, Ruobing Zhang
  • for: This work aims to enable accurate reconstruction of the complex morphology of axons and myelin sheaths in the animal nervous system and to provide a large-scale EM dataset that supports and benchmarks holistic corpus callosum reconstruction.
  • methods: Electron microscopy (EM) imaging is used, and a fine-tuning methodology called EM-SAM adapts the Segment Anything Model (SAM) to EM image segmentation tasks (a Dice-evaluation sketch follows this entry).
  • results: The work releases the large-scale AxonCallosumEM dataset, which is used for training, testing, and validation; on this dataset, EM-SAM outperforms other existing methods in accuracy and consistency.
    Abstract The electron microscope (EM) remains the predominant technique for elucidating intricate details of the animal nervous system at the nanometer scale. However, accurately reconstructing the complex morphology of axons and myelin sheaths poses a significant challenge. Furthermore, the absence of publicly available, large-scale EM datasets encompassing complete cross sections of the corpus callosum, with dense ground truth segmentation for axons and myelin sheaths, hinders the advancement and evaluation of holistic corpus callosum reconstructions. To surmount these obstacles, we introduce the AxonCallosumEM dataset, comprising a 1.83 × 5.76 mm EM image captured from the corpus callosum of the Rett Syndrome (RTT) mouse model, which entails extensive axon bundles. We meticulously proofread over 600,000 patches at a resolution of 1024 × 1024, thus providing a comprehensive ground truth for myelinated axons and myelin sheaths. Additionally, we extensively annotated three distinct regions within the dataset for the purposes of training, testing, and validation. Utilizing this dataset, we develop a fine-tuning methodology, called EM-SAM, that adapts the Segment Anything Model (SAM) to EM image segmentation tasks and outperforms other state-of-the-art methods. Furthermore, we present the evaluation results of EM-SAM as a baseline.
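
A hedged sketch of the Dice overlap commonly used to evaluate segmentations such as the myelinated-axon and myelin-sheath masks in this dataset; the toy masks are illustrative and this is not the authors' evaluation code.

```python
# Hedged sketch: Dice score between a predicted and a ground-truth mask.
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """pred, target: boolean or {0, 1} masks of identical shape."""
    pred = pred.astype(bool); target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

if __name__ == "__main__":
    gt = np.zeros((1024, 1024), dtype=np.uint8); gt[100:400, 100:400] = 1
    pr = np.zeros_like(gt);                      pr[150:400, 100:400] = 1
    print(f"Dice = {dice_score(pr, gt):.3f}")   # ~0.909 for this toy overlap
```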

Expert-Agnostic Ultrasound Image Quality Assessment using Deep Variational Clustering

  • paper_url: http://arxiv.org/abs/2307.02462
  • repo_url: None
  • paper_authors: Deepak Raina, Dimitrios Ntentia, SH Chandrashekhara, Richard Voyles, Subir Kumar Saha
  • for: Automated assessment of ultrasound image quality, reducing operator dependence and subjective variation.
  • methods: An annotation-free approach, US2QNet, built on a variational autoencoder with three modules, pre-processing, clustering, and post-processing, to jointly enhance, extract, cluster, and visualize the quality feature representation of ultrasound images (a minimal clustering sketch follows this entry).
  • results: On urinary bladder ultrasound images, the framework achieves 78% accuracy and outperforms state-of-the-art clustering methods.
    Abstract Ultrasound imaging is a commonly used modality for several diagnostic and therapeutic procedures. However, the diagnosis by ultrasound relies heavily on the quality of images assessed manually by sonographers, which diminishes the objectivity of the diagnosis and makes it operator-dependent. The supervised learning-based methods for automated quality assessment require manually annotated datasets, which are highly labour-intensive to acquire. These ultrasound images are low in quality and suffer from noisy annotations caused by inter-observer perceptual variations, which hampers learning efficiency. We propose an UnSupervised UltraSound image Quality assessment Network, US2QNet, that eliminates the burden and uncertainty of manual annotations. US2QNet uses the variational autoencoder embedded with the three modules, pre-processing, clustering and post-processing, to jointly enhance, extract, cluster and visualize the quality feature representation of ultrasound images. The pre-processing module uses filtering of images to point the network's attention towards salient quality features, rather than getting distracted by noise. Post-processing is proposed for visualizing the clusters of feature representations in 2D space. We validated the proposed framework for quality assessment of the urinary bladder ultrasound images. The proposed framework achieved 78% accuracy and superior performance to state-of-the-art clustering methods.
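
A hedged sketch of the clustering and post-processing stages the abstract outlines: latent features from a trained variational encoder are grouped with k-means and projected to 2D for visualization. The encoder output is faked with random features here, and the cluster count and PCA projection are illustrative choices.

```python
# Hedged sketch: cluster VAE latents and project them to 2-D for visualization.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def cluster_quality_features(latents: np.ndarray, n_clusters: int = 2):
    """latents: (N, latent_dim) features from the VAE encoder."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(latents)
    coords2d = PCA(n_components=2).fit_transform(latents)   # for the 2-D scatter plot
    return labels, coords2d

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_latents = np.vstack([rng.normal(0, 1, (50, 16)),    # stand-in "good quality" cluster
                              rng.normal(4, 1, (50, 16))])   # stand-in "poor quality" cluster
    labels, coords = cluster_quality_features(fake_latents)
    print(labels.shape, coords.shape)                        # (100,) (100, 2)
```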

LLCaps: Learning to Illuminate Low-Light Capsule Endoscopy with Curved Wavelet Attention and Reverse Diffusion

  • paper_url: http://arxiv.org/abs/2307.02452
  • repo_url: https://github.com/longbai1006/llcaps
  • paper_authors: Long Bai, Tong Chen, Yanan Wu, An Wang, Mobarakol Islam, Hongliang Ren
  • for: This work aims to improve the efficiency and accuracy of pathology diagnosis with wireless capsule endoscopy (WCE), an indispensable tool for gastrointestinal (GI) examination.
  • methods: A low-light image enhancement framework, LLCaps, combines a multi-scale convolutional neural network with a curved wavelet attention (CWA) block for high-frequency and local feature learning, and applies a reverse diffusion procedure to further optimize the shallow output and produce realistic images (a multi-scale fusion sketch follows this entry).
  • results: The method outperforms ten state-of-the-art low-light image enhancement methods quantitatively and qualitatively, and its benefit on GI disease segmentation highlights its clinical potential.
    Abstract Wireless capsule endoscopy (WCE) is a painless and non-invasive diagnostic tool for gastrointestinal (GI) diseases. However, due to GI anatomical constraints and hardware manufacturing limitations, WCE vision signals may suffer from insufficient illumination, leading to a complicated screening and examination procedure. Deep learning-based low-light image enhancement (LLIE) in the medical field gradually attracts researchers. Given the exuberant development of the denoising diffusion probabilistic model (DDPM) in computer vision, we introduce a WCE LLIE framework based on the multi-scale convolutional neural network (CNN) and reverse diffusion process. The multi-scale design allows models to preserve high-resolution representation and context information from low-resolution, while the curved wavelet attention (CWA) block is proposed for high-frequency and local feature learning. Furthermore, we combine the reverse diffusion procedure to further optimize the shallow output and generate the most realistic image. The proposed method is compared with ten state-of-the-art (SOTA) LLIE methods and significantly outperforms quantitatively and qualitatively. The superior performance on GI disease segmentation further demonstrates the clinical potential of our proposed model. Our code is publicly accessible.
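
A hedged sketch of the multi-scale idea in the abstract, fusing full-resolution detail with context from a downsampled branch. This is not the LLCaps CWA block; the channel sizes and fusion-by-concatenation are assumptions.

```python
# Hedged sketch of a generic multi-scale fusion block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.fine = nn.Conv2d(ch, ch, 3, padding=1)       # full-resolution branch
        self.coarse = nn.Conv2d(ch, ch, 3, padding=1)     # half-resolution (context) branch
        self.fuse = nn.Conv2d(2 * ch, ch, 1)              # 1x1 fusion after concatenation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fine = self.fine(x)
        coarse = self.coarse(F.avg_pool2d(x, 2))
        coarse = F.interpolate(coarse, size=x.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([fine, coarse], dim=1))

if __name__ == "__main__":
    print(MultiScaleFusion(16)(torch.randn(1, 16, 128, 128)).shape)  # [1, 16, 128, 128]
```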

Base Layer Efficiency in Scalable Human-Machine Coding

  • paper_url: http://arxiv.org/abs/2307.02430
  • repo_url: None
  • paper_authors: Yalda Foroutan, Alon Harell, Anderson de Andrade, Ivan V. Bajić
  • for: This paper aims to improve the coding efficiency of the base layer in a state-of-the-art scalable human-machine image codec.
  • methods: The authors analyze the base-layer coding efficiency of an existing scalable human-machine image codec and show that it can be improved (a BD-Rate sketch follows this entry).
  • results: The analysis shows that gains of 20-40% in BD-Rate over the current best results on object detection and instance segmentation are possible.
    Abstract A basic premise in scalable human-machine coding is that the base layer is intended for automated machine analysis and is therefore more compressible than the same content would be for human viewing. Use cases for such coding include video surveillance and traffic monitoring, where the majority of the content will never be seen by humans. Therefore, base layer efficiency is of paramount importance because the system would most frequently operate at the base-layer rate. In this paper, we analyze the coding efficiency of the base layer in a state-of-the-art scalable human-machine image codec, and show that it can be improved. In particular, we demonstrate that gains of 20-40% in BD-Rate compared to the currently best results on object detection and instance segmentation are possible.
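
Results are reported in BD-Rate; below is a hedged sketch of the standard Bjontegaard delta-rate computation (cubic fits of log-rate versus quality, integrated over the overlapping quality range). The toy rate/accuracy points are made up for illustration.

```python
# Hedged sketch of the Bjontegaard delta rate (BD-Rate) metric.
import numpy as np

def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test) -> float:
    """Average % bitrate difference of 'test' vs. 'anchor' at equal quality (negative = savings)."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    p_a = np.polyfit(quality_anchor, lr_a, 3)
    p_t = np.polyfit(quality_test, lr_t, 3)
    lo = max(min(quality_anchor), min(quality_test))
    hi = min(max(quality_anchor), max(quality_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

if __name__ == "__main__":
    # Toy rate (bpp) / task-accuracy points for an anchor and an improved base layer.
    anchor_rate, anchor_map = [0.10, 0.20, 0.40, 0.80], [30.0, 34.0, 37.0, 39.0]
    test_rate,   test_map   = [0.07, 0.14, 0.30, 0.60], [30.0, 34.0, 37.0, 39.0]
    print(f"BD-Rate: {bd_rate(anchor_rate, anchor_map, test_rate, test_map):.1f}%")
```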