eess.IV - 2023-07-24

Conditional Residual Coding: A Remedy for Bottleneck Problems in Conditional Inter Frame Coding

  • paper_url: http://arxiv.org/abs/2307.12864
  • repo_url: None
  • paper_authors: Fabian Brand, Jürgen Seiler, André Kaup
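  • for: This paper proposes conditional residual coding, a neural-network-based video coding concept intended to improve coding efficiency.
  • methods: The paper compares conditional coding with residual coding and derives conditional residual coding from information-theoretic properties of the conditional coder, in order to address information bottlenecks in the prediction path of conditional coders.
  • results: A theoretical analysis and a practical example show that conditional residual coding reduces the influence of bottlenecks while maintaining the theoretical performance of conditional coding. It can be viewed as "the best from both worlds" between residual and conditional coding.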
    Abstract Conditional coding is a new video coding paradigm enabled by neural-network-based compression. It can be shown that conditional coding is in theory better than the traditional residual coding, which is widely used in video compression standards like HEVC or VVC. However, on closer inspection, it becomes clear that conditional coders can suffer from information bottlenecks in the prediction path, i.e., that due to the data processing inequality not all information from the prediction signal can be passed to the reconstructed signal, thereby impairing the coder performance. In this paper we propose the conditional residual coding concept, which we derive from information theoretical properties of the conditional coder. This coder significantly reduces the influence of bottlenecks, while maintaining the theoretical performance of the conditional coder. We provide a theoretical analysis of the coding paradigm and demonstrate the performance of the conditional residual coder in a practical example. We show that conditional residual coders alleviate the disadvantages of conditional coders while being able to maintain their advantages over residual coders. In the spectrum of residual and conditional coding, we can therefore consider them as "the best from both worlds".
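The statement that conditional coding is in theory at least as good as residual coding can be checked on a toy example. Since H(x | p) = H(x - p | p) <= H(x - p) for a frame x and prediction p, the conditional entropy never exceeds the entropy of the residual. The sketch below is only an illustration under assumed values: the joint distribution and the helper names are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcome -> probability."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def conditional_entropy(joint):
    """H(X | P) = H(X, P) - H(P) for a dict mapping (x, p) -> probability."""
    marginal_p = defaultdict(float)
    for (_, p), q in joint.items():
        marginal_p[p] += q
    return entropy(joint) - entropy(marginal_p)

def residual_entropy(joint):
    """H(X - P): entropy of the residual when the prediction is subtracted."""
    residual = defaultdict(float)
    for (x, p), q in joint.items():
        residual[x - p] += q
    return entropy(residual)

# Hypothetical toy source: the residual statistics depend on the prediction,
# which is exactly the case where conditioning helps.
toy_joint = {
    (0, 0): 0.25, (1, 0): 0.25,      # prediction 0  -> residual 0 or 1
    (10, 10): 0.25, (12, 10): 0.25,  # prediction 10 -> residual 0 or 2
}

h_cond = conditional_entropy(toy_joint)  # 1.0 bit
h_res = residual_entropy(toy_joint)      # 1.5 bits
```

In this toy case the residual coder needs 1.5 bits per symbol while an ideal conditional coder needs 1.0 bit. The bottleneck issue discussed in the abstract concerns practical networks, which this idealized calculation does not model.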

Spatiotemporal Modeling Encounters 3D Medical Image Analysis: Slice-Shift UNet with Multi-View Fusion

  • paper_url: http://arxiv.org/abs/2307.12853
  • repo_url: None
  • paper_authors: C. I. Ugwu, S. Casarin, O. Lanz
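  • for: This paper proposes a model based on 2D convolutional neural networks for 3D medical image segmentation, aiming at more efficient image analysis in computational healthcare.
  • methods: The paper introduces Slice SHift UNet (SSH-UNet), which learns multi-view features collaboratively by performing 2D convolutions along the three orthogonal planes of a volume with shared weights, and reincorporates the third dimension by shifting a portion of the feature maps along the slice axis.
  • results: Experiments on the Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets show that SSH-UNet is on par with state-of-the-art architectures while being more efficient.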
    Abstract As a fundamental part of computational healthcare, Computer Tomography (CT) and Magnetic Resonance Imaging (MRI) provide volumetric data, making the development of algorithms for 3D image analysis a necessity. Despite being computationally cheap, 2D Convolutional Neural Networks can only extract spatial information. In contrast, 3D CNNs can extract three-dimensional features, but they have higher computational costs and latency, which is a limitation for clinical practice that requires fast and efficient models. Inspired by the field of video action recognition we propose a new 2D-based model dubbed Slice SHift UNet (SSH-UNet) which encodes three-dimensional features at 2D CNN's complexity. More precisely multi-view features are collaboratively learned by performing 2D convolutions along the three orthogonal planes of a volume and imposing a weights-sharing mechanism. The third dimension, which is neglected by the 2D convolution, is reincorporated by shifting a portion of the feature maps along the slices' axis. The effectiveness of our approach is validated in Multi-Modality Abdominal Multi-Organ Segmentation (AMOS) and Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) datasets, showing that SSH-UNet is more efficient while on par in performance with state-of-the-art architectures.
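The slice-shift idea described in the abstract can be sketched with plain array indexing. This is a minimal illustration that assumes a temporal-shift-style scheme: a fixed fraction of channels is moved one slice forward, an equal fraction one slice backward, and the border is zero-padded. The function name, the `fold_div` parameter and the padding choice are assumptions, not details taken from the paper.

```python
import numpy as np

def slice_shift(x, fold_div=8):
    """Shift part of the channels along the slice axis.

    x has shape (D, C, H, W), where D indexes slices. The first C // fold_div
    channels are shifted one slice forward, the next C // fold_div channels one
    slice backward, and the remaining channels are left untouched. Slices that
    have no neighbour to copy from are filled with zeros.
    """
    fold = x.shape[1] // fold_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                   # information from slice d-1
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]   # information from slice d+1
    out[:, 2 * fold:] = x[:, 2 * fold:]              # unshifted channels
    return out

# Tiny example: 3 slices, 8 channels, 1x1 feature maps.
features = np.arange(3 * 8, dtype=float).reshape(3, 8, 1, 1)
shifted = slice_shift(features, fold_div=4)
```

After the shift, a 2D convolution applied to slice d sees features from slices d-1 and d+1 in the shifted channels, which is how a 2D network can pick up inter-slice context without 3D kernels.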

Multi-View Vertebra Localization and Identification from CT Images

  • paper_url: http://arxiv.org/abs/2307.12845
  • repo_url: https://github.com/shanghaitech-impact/multi-view-vertebra-localization-and-identification-from-ct-images
  • paper_authors: Han Wu, Jiadong Zhang, Yu Fang, Zhentao Liu, Nizhuan Wang, Zhiming Cui, Dinggang Shen
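  • for: This work proposes a multi-view method for vertebra localization and identification in CT images, addressing the high computational cost and limited global information of existing 3D patch-based methods.
  • methods: The method converts the 3D problem into 2D localization and identification tasks on different views, pre-trains the backbone with a multi-view contrastive learning strategy to capture anatomical structure, and adds a Sequence Loss to maintain the sequential structure along the vertebrae.
  • results: The evaluation shows that, with only two 2D networks, the method accurately localizes and identifies vertebrae in CT images and consistently outperforms state-of-the-art methods.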
    Abstract Accurately localizing and identifying vertebrae from CT images is crucial for various clinical applications. However, most existing efforts are performed on 3D with cropping patch operation, suffering from the large computation costs and limited global information. In this paper, we propose a multi-view vertebra localization and identification from CT images, converting the 3D problem into a 2D localization and identification task on different views. Without the limitation of the 3D cropped patch, our method can learn the multi-view global information naturally. Moreover, to better capture the anatomical structure information from different view perspectives, a multi-view contrastive learning strategy is developed to pre-train the backbone. Additionally, we further propose a Sequence Loss to maintain the sequential structure embedded along the vertebrae. Evaluation results demonstrate that, with only two 2D networks, our method can localize and identify vertebrae in CT images accurately, and outperforms the state-of-the-art methods consistently. Our code is available at https://github.com/ShanghaiTech-IMPACT/Multi-View-Vertebra-Localization-and-Identification-from-CT-Images.

Deep Homography Prediction for Endoscopic Camera Motion Imitation Learning

  • paper_url: http://arxiv.org/abs/2307.12792
  • repo_url: None
  • paper_authors: Martin Huber, Sebastien Ourselin, Christos Bergeles, Tom Vercauteren
  • for: This work investigates automating laparoscopic camera motion through imitation learning from retrospective videos of laparoscopic interventions.
  • methods: The method learns to augment a surgeon's behavior in image space through object-motion-invariant image registration via homographies, without geometric assumptions, depth information, or a handcrafted objective.
  • results: On the Cholec80 and HeiChole datasets the method improves on two baselines, including a 47% improvement over camera motion continuation, and it predicts camera motion correctly on the public motion classification labels of the AutoLaparo dataset.
    Abstract In this work, we investigate laparoscopic camera motion automation through imitation learning from retrospective videos of laparoscopic interventions. A novel method is introduced that learns to augment a surgeon's behavior in image space through object motion invariant image registration via homographies. Contrary to existing approaches, no geometric assumptions are made and no depth information is necessary, enabling immediate translation to a robotic setup. Deviating from the dominant approach in the literature, which consists of following a surgical tool, we do not handcraft the objective and no priors are imposed on the surgical scene, allowing the method to discover unbiased policies. In this new research field, significant improvements are demonstrated over two baselines on the Cholec80 and HeiChole datasets, showcasing an improvement of 47% over camera motion continuation. The method is further shown to indeed predict camera motion correctly on the public motion classification labels of the AutoLaparo dataset. All code is made accessible on GitHub.
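Because the method expresses camera motion as a homography in image space, it may help to see how a 3x3 homography moves image points. The sketch below is generic projective geometry, not the authors' network or code; the function name and the example matrix are hypothetical.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography given as nested lists.

    Each point (x, y) is lifted to homogeneous coordinates (x, y, 1),
    multiplied by H, and divided by the resulting third coordinate.
    """
    mapped = []
    for x, y in points:
        u = H[0][0] * x + H[0][1] * y + H[0][2]
        v = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        mapped.append((u / w, v / w))
    return mapped

# Hypothetical camera motion: a pure image-space translation by (5, -3) pixels.
H_translate = [
    [1.0, 0.0, 5.0],
    [0.0, 1.0, -3.0],
    [0.0, 0.0, 1.0],
]
moved = apply_homography(H_translate, [(0.0, 0.0), (2.0, 4.0)])
```

A homography is defined only up to scale, so multiplying every entry of the matrix by the same non-zero factor leaves the mapped points unchanged.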

Synthetic white balancing for intra-operative hyperspectral imaging

  • paper_url: http://arxiv.org/abs/2307.12791
  • repo_url: None
  • paper_authors: Anisha Bahl, Conor C. Horgan, Mirek Janatka, Oscar J. MacCormac, Philip Noonan, Yijing Xie, Jianrong Qiu, Nicola Cavalcanti, Philipp Fürnstahl, Michael Ebner, Mads S. Bergholt, Jonathan Shapey, Tom Vercauteren
  • for: The paper demonstrates the need for in situ white references in hyperspectral imaging for surgical applications, and proposes a novel, sterile, synthetic reference construction algorithm to address this need.
  • methods: The paper forms a composite image from a video of a standard sterile ruler to create the synthetic reference, and models the reference as the product of independent spatial and spectral components, with a scalar factor accounting for gain, exposure, and light intensity.
  • results: The synthetic references achieve median pixel-by-pixel errors lower than 6.5% and produce similar reconstructions and errors to an ideal reference. The algorithm integrated well into surgical workflow, with median pixel-by-pixel errors of 4.77%, while maintaining good spectral and color reconstruction.
    Abstract Hyperspectral imaging shows promise for surgical applications to non-invasively provide spatially-resolved, spectral information. For calibration purposes, a white reference image of a highly-reflective Lambertian surface should be obtained under the same imaging conditions. Standard white references are not sterilizable, and so are unsuitable for surgical environments. We demonstrate the necessity for in situ white references and address this by proposing a novel, sterile, synthetic reference construction algorithm. The use of references obtained at different distances and lighting conditions to the subject were examined. Spectral and color reconstructions were compared with standard measurements qualitatively and quantitatively, using $\Delta E$ and normalised RMSE respectively. The algorithm forms a composite image from a video of a standard sterile ruler, whose imperfect reflectivity is compensated for. The reference is modelled as the product of independent spatial and spectral components, and a scalar factor accounting for gain, exposure, and light intensity. Evaluation of synthetic references against ideal but non-sterile references is performed using the same metrics alongside pixel-by-pixel errors. Finally, intraoperative integration is assessed through cadaveric experiments. Improper white balancing leads to increases in all quantitative and qualitative errors. Synthetic references achieve median pixel-by-pixel errors lower than 6.5% and produce similar reconstructions and errors to an ideal reference. The algorithm integrated well into surgical workflow, achieving median pixel-by-pixel errors of 4.77%, while maintaining good spectral and color reconstruction.
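The reference model described in the abstract (a spatial component times a spectral component times a scalar) and its use for white balancing can be sketched as follows. This is a simplified illustration under stated assumptions: pixels are flattened into a list, dark-frame subtraction is ignored, and all names and numbers are hypothetical rather than taken from the paper.

```python
def synthetic_reference(spatial, spectral, scale):
    """Build W[p][b] = scale * spatial[p] * spectral[b].

    spatial holds one illumination weight per pixel, spectral holds one value
    per wavelength band, and scale stands in for gain, exposure and light
    intensity.
    """
    return [[scale * s * e for e in spectral] for s in spatial]

def white_balance(raw, reference, eps=1e-12):
    """Divide the raw cube by the white reference to estimate reflectance."""
    return [
        [r / max(w, eps) for r, w in zip(raw_px, ref_px)]
        for raw_px, ref_px in zip(raw, reference)
    ]

# Hypothetical scene with 2 pixels and 3 bands.
spatial = [1.0, 0.5]
spectral = [0.8, 1.0, 0.6]
reference = synthetic_reference(spatial, spectral, scale=2.0)

true_reflectance = [[0.5, 0.5, 0.5], [0.2, 0.4, 0.1]]
raw = [
    [rho * w for rho, w in zip(rho_px, ref_px)]
    for rho_px, ref_px in zip(true_reflectance, reference)
]
estimated = white_balance(raw, reference)
```

When the reference matches the true illumination, the division recovers the reflectance. A reference taken at a different distance or under different lighting would leave a residual spatial or spectral bias, which is the kind of error the abstract attributes to improper white balancing.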

ICF-SRSR: Invertible scale-Conditional Function for Self-Supervised Real-world Single Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.12751
  • repo_url: None
  • paper_authors: Reyhaneh Neshatavar, Mohsen Yavartanoo, Sanghyun Son, Kyoung Mu Lee
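  • for: Improve real-world single image super-resolution (SISR) without using any paired or unpaired training data.
  • methods: The paper proposes an Invertible scale-Conditional Function (ICF) that scales an input image and then restores the original input under different scale conditions, and builds a self-supervised SISR framework (ICF-SRSR) on top of it.
  • results: Experiments show that ICF-SRSR handles SISR well in real-world scenarios and performs comparably to state-of-the-art supervised/unsupervised methods on public benchmark datasets.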
    Abstract Single image super-resolution (SISR) is a challenging ill-posed problem that aims to up-sample a given low-resolution (LR) image to a high-resolution (HR) counterpart. Due to the difficulty in obtaining real LR-HR training pairs, recent approaches are trained on simulated LR images degraded by simplified down-sampling operators, e.g., bicubic. Such an approach can be problematic in practice because of the large gap between the synthesized and real-world LR images. To alleviate the issue, we propose a novel Invertible scale-Conditional Function (ICF), which can scale an input image and then restore the original input with different scale conditions. By leveraging the proposed ICF, we construct a novel self-supervised SISR framework (ICF-SRSR) to handle the real-world SR task without using any paired/unpaired training data. Furthermore, our ICF-SRSR can generate realistic and feasible LR-HR pairs, which can make existing supervised SISR networks more robust. Extensive experiments demonstrate the effectiveness of the proposed method in handling SISR in a fully self-supervised manner. Our ICF-SRSR demonstrates superior performance compared to the existing methods trained on synthetic paired images in real-world scenarios and exhibits comparable performance compared to state-of-the-art supervised/unsupervised methods on public benchmark datasets.

Dense Transformer based Enhanced Coding Network for Unsupervised Metal Artifact Reduction

  • paper_url: http://arxiv.org/abs/2307.12717
  • repo_url: None
  • paper_authors: Wangduo Xie, Matthew B. Blaschko
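  • for: Reduce metal artifacts that corrupt CT images, in an unsupervised setting, to support clinical diagnosis.
  • methods: The paper proposes a Dense Transformer based Enhanced Coding Network (DTEC-Net). A Hierarchical Disentangling Encoder, supported by a high-order dense process and a transformer, produces densely encoded sequences with long-range correspondence, and a second-order disentanglement method improves the decoding of the dense sequence.
  • results: Extensive experiments and model discussions on a benchmark dataset show that DTEC-Net outperforms previous state-of-the-art methods, greatly reducing metal artifacts while restoring richer texture details.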
    Abstract CT images corrupted by metal artifacts have serious negative effects on clinical diagnosis. Considering the difficulty of collecting paired data with ground truth in clinical settings, unsupervised methods for metal artifact reduction are of high interest. However, it is difficult for previous unsupervised methods to retain structural information from CT images while handling the non-local characteristics of metal artifacts. To address these challenges, we proposed a novel Dense Transformer based Enhanced Coding Network (DTEC-Net) for unsupervised metal artifact reduction. Specifically, we introduce a Hierarchical Disentangling Encoder, supported by the high-order dense process, and transformer to obtain densely encoded sequences with long-range correspondence. Then, we present a second-order disentanglement method to improve the dense sequence's decoding process. Extensive experiments and model discussions illustrate DTEC-Net's effectiveness, which outperforms the previous state-of-the-art methods on a benchmark dataset, and greatly reduces metal artifacts while restoring richer texture details.

Low-complexity Overfitted Neural Image Codec

  • paper_url: http://arxiv.org/abs/2307.12706
  • repo_url: https://github.com/Orange-OpenSource/Cool-Chic
  • paper_authors: Thomas Leguay, Théo Ladune, Pierrick Philippe, Gordon Clare, Félix Henry
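  • for: This paper proposes a reduced-complexity neural image codec that overfits the decoder parameters to each input image.
  • methods: The codec is overfitted per image, and additional lightweight modules and an improved training process improve compression while keeping decoder complexity low.
  • results: The method rivals autoencoder performance and surpasses HEVC under various coding conditions while requiring only 2300 multiplications per decoded pixel, and it achieves a 14% rate reduction relative to previous overfitted codecs at similar complexity.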
    Abstract We propose a neural image codec at reduced complexity which overfits the decoder parameters to each input image. While autoencoders perform up to a million multiplications per decoded pixel, the proposed approach only requires 2300 multiplications per pixel. Albeit low-complexity, the method rivals autoencoder performance and surpasses HEVC performance under various coding conditions. Additional lightweight modules and an improved training process provide a 14% rate reduction with respect to previous overfitted codecs, while offering a similar complexity. This work is made open-source at https://orange-opensource.github.io/Cool-Chic/

Bayesian Based Unrolling for Reconstruction and Super-resolution of Single-Photon Lidar Systems

  • paper_url: http://arxiv.org/abs/2307.12700
  • repo_url: None
  • paper_authors: Abderrahim Halimi, Jakeoung Koo, Stephen McLaughlin
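  • for: This paper addresses reconstruction and super-resolution of 3D single-photon Lidar data using deep learning.
  • methods: The algorithm is obtained by unrolling a Bayesian model, combining the advantages of statistical and learning-based frameworks to provide estimates in high-noise environments with improved network interpretability.
  • results: Compared with existing learning-based solutions, the architecture requires fewer trainable parameters, is more robust to noise and to mismodelling of the system impulse response function, and provides richer information about the estimates, including uncertainty measures. Results on synthetic and real data are competitive with state-of-the-art algorithms in inference quality and computational complexity.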
    Abstract Deploying 3D single-photon Lidar imaging in real world applications faces several challenges due to imaging in high noise environments and with sensors having limited resolution. This paper presents a deep learning algorithm based on unrolling a Bayesian model for the reconstruction and super-resolution of 3D single-photon Lidar. The resulting algorithm benefits from the advantages of both statistical and learning based frameworks, providing best estimates with improved network interpretability. Compared to existing learning-based solutions, the proposed architecture requires a reduced number of trainable parameters, is more robust to noise and mismodelling of the system impulse response function, and provides richer information about the estimates including uncertainty measures. Results on synthetic and real data show competitive results regarding the quality of the inference and computational complexity when compared to state-of-the-art algorithms. This short paper is based on contributions published in [1] and [2].

Automatic lobe segmentation using attentive cross entropy and end-to-end fissure generation

  • paper_url: http://arxiv.org/abs/2307.12634
  • repo_url: https://github.com/htytewx/softcam
  • paper_authors: Qi Su, Na Wang, Jiawen Xie, Yinan Chen, Xiaofan Zhang
  • for: An automatic lung lobe segmentation algorithm for the diagnosis and treatment of lung diseases.
  • methods: A task-specific loss function that makes the model pay attention to the area around the pulmonary fissure, an end-to-end pulmonary fissure generation method, and a registration-based loss function to alleviate convergence difficulty.
  • results: The method achieves Dice scores of 97.83% and 94.75% on the private STLB dataset and the public LUNA16 dataset, respectively.
    Abstract Automatic lung lobe segmentation is of great significance for the diagnosis and treatment of lung diseases, but it is challenging due to the incompleteness of pulmonary fissures in lung CT images and the large variability of pathological features. Therefore, we propose a new automatic lung lobe segmentation framework, in which we urge the model to pay attention to the area around the pulmonary fissure during the training process, which is realized by a task-specific loss function. In addition, we introduce an end-to-end pulmonary fissure generation method in the auxiliary pulmonary fissure segmentation task, without any additional network branch. Finally, we propose a registration-based loss function to alleviate the convergence difficulty of the Dice loss supervised pulmonary fissure segmentation task. We achieve 97.83% and 94.75% dice scores on our private dataset STLB and public LUNA16 dataset respectively.
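The Dice score used to report these results measures the overlap between a predicted mask and a reference mask as 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened binary masks. Returning 1.0 when both masks are empty is a common convention assumed here, not something stated in the abstract, and the example masks are made up.

```python
def dice_score(pred, target):
    """Dice coefficient between two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
score = dice_score(pred_mask, true_mask)  # 2 * 2 / (3 + 3)
```

For multi-class lobe segmentation the score is typically computed per lobe and averaged; how the paper aggregates across lobes is not stated in the abstract.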

Sparse annotation strategies for segmentation of short axis cardiac MRI

  • paper_url: http://arxiv.org/abs/2307.12619
  • repo_url: None
  • paper_authors: Josh Stein, Maxime Di Folco, Julia Schnabel
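  • for: This study investigates which annotations matter most when segmenting short-axis cardiac MRI with few labels, in order to reduce annotation cost while maintaining segmentation performance.
  • methods: The nnU-Net model is trained on two public datasets with sparse volumes (fewer annotated cases) and sparse annotations (fewer annotated slices per case).
  • results: Training on 48 annotated volumes gives a Dice score above 0.85 and results comparable to the full datasets. Annotating slices from the middle of volumes is most beneficial, and the apical region is least beneficial. Annotating as many slices as possible is a better strategy than annotating more volumes.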
    Abstract Short axis cardiac MRI segmentation is a well-researched topic, with excellent results achieved by state-of-the-art models in a supervised setting. However, annotating MRI volumes is time-consuming and expensive. Many different approaches (e.g. transfer learning, data augmentation, few-shot learning, etc.) have emerged in an effort to use fewer annotated data and still achieve similar performance as a fully supervised model. Nevertheless, to the best of our knowledge, none of these works focus on which slices of MRI volumes are most important to annotate for yielding the best segmentation results. In this paper, we investigate the effects of training with sparse volumes, i.e. reducing the number of cases annotated, and sparse annotations, i.e. reducing the number of slices annotated per case. We evaluate the segmentation performance using the state-of-the-art nnU-Net model on two public datasets to identify which slices are the most important to annotate. We have shown that training on a significantly reduced dataset (48 annotated volumes) can give a Dice score greater than 0.85 and results comparable to using the full dataset (160 and 240 volumes for each dataset respectively). In general, training on more slice annotations provides more valuable information compared to training on more volumes. Further, annotating slices from the middle of volumes yields the most beneficial results in terms of segmentation performance, and the apical region the worst. When evaluating the trade-off between annotating volumes against slices, annotating as many slices as possible instead of annotating more volumes is a better strategy.
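The finding that mid-volume slices are the most valuable to annotate suggests a simple selection rule when the annotation budget is limited. The helper below is a hypothetical illustration that picks a contiguous block of slices centred in the volume. The paper's actual sampling protocol is not given in the abstract, so contiguity and centring are assumptions.

```python
def middle_slice_indices(num_slices, budget):
    """Return `budget` contiguous slice indices centred in the volume."""
    if not 0 < budget <= num_slices:
        raise ValueError("budget must be between 1 and num_slices")
    start = (num_slices - budget) // 2
    return list(range(start, start + budget))

# A short-axis stack with 10 slices and a budget of 4 annotated slices.
selected = middle_slice_indices(10, 4)  # [3, 4, 5, 6]
```

With this rule the basal and apical ends of the stack are annotated last, in line with the reported ordering of slice importance.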

Attribute Regularized Soft Introspective VAE: Towards Cardiac Attribute Regularization Through MRI Domains

  • paper_url: http://arxiv.org/abs/2307.12618
  • repo_url: None
  • paper_authors: Maxime Di Folco, Cosmin Bercea, Julia A. Schnabel
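  • for: This study aims to improve the controllability of deep generative models by selectively modifying data attributes, in particular across MRI domains.
  • methods: The proposed Attributed Soft Introspective VAE (Attri-SIVAE) adds an attribute-regularized loss to the Soft-Intro VAE framework, which incorporates an adversarial loss into VAE training.
  • results: On cardiac MRI data from different domains, Attri-SIVAE matches the state-of-the-art Attributed regularized VAE in reconstruction and regularization, and unlike that method it keeps the same regularization level when tested on a different dataset.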
    Abstract Deep generative models have emerged as influential instruments for data generation and manipulation. Enhancing the controllability of these models by selectively modifying data attributes has been a recent focus. Variational Autoencoders (VAEs) have shown promise in capturing hidden attributes but often produce blurry reconstructions. Controlling these attributes through different imaging domains is difficult in medical imaging. Recently, Soft Introspective VAE leverage the benefits of both VAEs and Generative Adversarial Networks (GANs), which have demonstrated impressive image synthesis capabilities, by incorporating an adversarial loss into VAE training. In this work, we propose the Attributed Soft Introspective VAE (Attri-SIVAE) by incorporating an attribute regularized loss, into the Soft-Intro VAE framework. We evaluate experimentally the proposed method on cardiac MRI data from different domains, such as various scanner vendors and acquisition centers. The proposed method achieves similar performance in terms of reconstruction and regularization compared to the state-of-the-art Attributed regularized VAE but additionally also succeeds in keeping the same regularization level when tested on a different dataset, unlike the compared method.

AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection

  • paper_url: http://arxiv.org/abs/2308.03766
  • repo_url: None
  • paper_authors: Anish Mall, Sanchit Kabra, Ankur Lhila, Pawan Ajmera
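  • for: This paper presents an automated framework for early detection of diseases in maize crops.
  • methods: The framework uses multispectral imagery obtained from drones and combines convolutional neural networks as feature extractors with segmentation techniques to identify both maize plants and their associated diseases.
  • results: Experiments show that the framework effectively detects a range of maize diseases, including powdery mildew, anthracnose, and leaf blight.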
    Abstract This research paper presents AMaizeD: An End to End Pipeline for Automatic Maize Disease Detection, an automated framework for early detection of diseases in maize crops using multispectral imagery obtained from drones. A custom hand-collected dataset focusing specifically on maize crops was meticulously gathered by expert researchers and agronomists. The dataset encompasses a diverse range of maize varieties, cultivation practices, and environmental conditions, capturing various stages of maize growth and disease progression. By leveraging multispectral imagery, the framework benefits from improved spectral resolution and increased sensitivity to subtle changes in plant health. The proposed framework employs a combination of convolutional neural networks (CNNs) as feature extractors and segmentation techniques to identify both the maize plants and their associated diseases. Experimental results demonstrate the effectiveness of the framework in detecting a range of maize diseases, including powdery mildew, anthracnose, and leaf blight. The framework achieves state-of-the-art performance on the custom hand-collected dataset and contributes to the field of automated disease detection in agriculture, offering a practical solution for early identification of diseases in maize crops using advanced machine learning techniques and deep learning architectures.

Development Of Automated Cardiac Arrhythmia Detection Methods Using Single Channel ECG Signal

  • paper_url: http://arxiv.org/abs/2308.02405
  • repo_url: None
  • paper_authors: Arpita Paul, Avik Kumar Das, Manas Rakshit, Ankita Ray Chowdhury, Susmita Saha, Hrishin Roy, Sajal Sarkar, Dongiri Prasanth, Eravelli Saicharan
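  • for: Automatic detection and classification of arrhythmia can help reduce deaths from cardiac disease. This work proposes a multi-class arrhythmia detection algorithm based on a single-channel electrocardiogram (ECG) signal.
  • methods: Heart rate variability (HRV), morphological features, and wavelet coefficient features are extracted and fed to machine-learning-based random forest classifiers.
  • results: With HRV and time-domain morphological features, the method obtains 85.11% accuracy, 85.11% sensitivity, 85.07% precision, and an 85.00% F1 score. With HRV and wavelet coefficient features, performance rises to 90.91% accuracy, 90.91% sensitivity, 90.96% precision, and a 90.87% F1 score. The results indicate that the scheme effectively detects broad categories of arrhythmia from single-channel ECG records.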
    Abstract Arrhythmia, an abnormal cardiac rhythm, is one of the most common types of cardiac disease. Automatic detection and classification of arrhythmia can be significant in reducing deaths due to cardiac diseases. This work proposes a multi-class arrhythmia detection algorithm using single channel electrocardiogram (ECG) signal. In this work, heart rate variability (HRV) along with morphological features and wavelet coefficient features are utilized for detection of 9 classes of arrhythmia. Statistical, entropy and energy-based features are extracted and applied to machine learning based random forest classifiers. Data used in both works is taken from 4 broad databases (CPSC and CPSC extra, PTB-XL, G12EC and Chapman-Shaoxing and Ningbo Database) made available by Physionet. With HRV and time domain morphological features, an average accuracy of 85.11%, sensitivity of 85.11%, precision of 85.07% and F1 score of 85.00% is obtained whereas with HRV and wavelet coefficient features, the performance obtained is 90.91% accuracy, 90.91% sensitivity, 90.96% precision and 90.87% F1 score. The detailed analysis of simulation results affirms that the presented scheme effectively detects broad categories of arrhythmia from single-channel ECG records. In the last part of the work, the proposed classification schemes are implemented on hardware using Raspberry Pi for real time ECG signal classification.
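Heart rate variability features of the kind mentioned in the abstract are usually derived from the sequence of RR intervals between successive beats. The sketch below computes three standard time-domain measures (mean heart rate, SDNN and RMSSD). The abstract does not list the exact features the authors use, so this selection, the function name and the sample intervals are assumptions for illustration only.

```python
import math

def hrv_features(rr_ms):
    """Time-domain HRV features from RR intervals given in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two RR intervals")
    n = len(rr_ms)
    mean_rr = sum(rr_ms) / n
    # SDNN: sample standard deviation of the RR intervals.
    sdnn = math.sqrt(sum((r - mean_rr) ** 2 for r in rr_ms) / (n - 1))
    # RMSSD: root mean square of successive RR differences.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "mean_rr_ms": mean_rr,
        "mean_hr_bpm": 60000.0 / mean_rr,
        "sdnn_ms": sdnn,
        "rmssd_ms": rmssd,
    }

# Hypothetical RR intervals from a short, fairly regular recording.
features = hrv_features([800.0, 810.0, 790.0, 800.0])
```

A feature vector like this, concatenated with morphological or wavelet-based features, could then be passed to a random forest classifier, which is the classifier family named in the abstract.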

4D Feet: Registering Walking Foot Shapes Using Attention Enhanced Dynamic-Synchronized Graph Convolutional LSTM Network

  • paper_url: http://arxiv.org/abs/2307.12377
  • repo_url: None
  • paper_authors: Farzam Tajdari, Toon Huysmans, Xinhe Yao, Jun Xu, Yu Song
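  • for: This paper aims to help researchers better understand the spatiotemporal features of dynamic deformable human body parts by reconstructing 4D scans captured with multiple asynchronous cameras.
  • methods: The paper proposes a generic framework that 1) finds and aligns dynamic features in the 3D scans captured by each camera using the nonrigid iterative closest-farthest points algorithm; 2) synchronizes scans from asynchronous cameras to the timeline of a specific camera with a novel ADGC-LSTM-based network; and 3) registers a high-quality template to the synchronized scans at each timestamp with a non-rigid registration method.
  • results: Using a newly developed 4D foot scanner, the authors validate the framework and create an open-access dataset of 4D shapes (15 fps) of the right and left feet of 58 participants (116 feet in total, including 5147 3D frames), covering significant phases of the gait cycle. The results show that the framework is effective, especially in synchronizing asynchronous 4D scans with the proposed ADGC-LSTM network.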
    Abstract 4D scans of dynamic deformable human body parts help researchers have a better understanding of spatiotemporal features. However, reconstructing 4D scans based on multiple asynchronous cameras encounters two main challenges: 1) finding the dynamic correspondences among different frames captured by each camera at the timestamps of the camera in terms of dynamic feature recognition, and 2) reconstructing 3D shapes from the combined point clouds captured by different cameras at asynchronous timestamps in terms of multi-view fusion. In this paper, we introduce a generic framework that is able to 1) find and align dynamic features in the 3D scans captured by each camera using the nonrigid iterative closest-farthest points algorithm; 2) synchronize scans captured by asynchronous cameras through a novel ADGC-LSTM-based network, which is capable of aligning 3D scans captured by different cameras to the timeline of a specific camera; and 3) register a high-quality template to synchronized scans at each timestamp to form a high-quality 3D mesh model using a non-rigid registration method. With a newly developed 4D foot scanner, we validate the framework and create the first open-access data-set, namely the 4D feet. It includes 4D shapes (15 fps) of the right and left feet of 58 participants (116 feet in total, including 5147 3D frames), covering significant phases of the gait cycle. The results demonstrate the effectiveness of the proposed framework, especially in synchronizing asynchronous 4D scans using the proposed ADGC-LSTM network.