eess.IV - 2023-07-16

TransNuSeg: A Lightweight Multi-Task Transformer for Nuclei Segmentation

paper_url: http://arxiv.org/abs/2307.08051
repo_url: https://github.com/zhenqi-he/transnuseg
paper_authors: Zhenqi He, Mathias Unberath, Jing Ke, Yiqing Shen
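for: This paper proposes a Transformer-based nuclei segmentation method to address the high parameter counts and long training times of existing automatic nuclei segmentation methods.
methods: The paper uses TransNuSeg, a pure Transformer framework with a tri-decoder structure that simultaneously segments nuclei instances, nuclei edges, and clustered edges. The authors also propose a novel self-distillation loss that enforces consistency between the predictions of the different branches.
results: Experiments on two datasets show that TransNuSeg outperforms state-of-the-art counterparts such as CA2.5-Net by 2-3% Dice with 30% fewer parameters. This indicates that Transformers are well suited to nuclei segmentation and can serve as an efficient solution in real clinical practice.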

Abstract
Nuclei appear small in size, yet, in real clinical practice, the global spatial information and correlation of the color or brightness contrast between nuclei and background, have been considered a crucial component for accurate nuclei segmentation. However, the field of automatic nuclei segmentation is dominated by Convolutional Neural Networks (CNNs), meanwhile, the potential of the recently prevalent Transformers has not been fully explored, which is powerful in capturing local-global correlations. To this end, we make the first attempt at a pure Transformer framework for nuclei segmentation, called TransNuSeg. Different from prior work, we decouple the challenging nuclei segmentation task into an intrinsic multi-task learning task, where a tri-decoder structure is employed for nuclei instance, nuclei edge, and clustered edge segmentation respectively. To eliminate the divergent predictions from different branches in previous work, a novel self distillation loss is introduced to explicitly impose consistency regulation between branches. Moreover, to formulate the high correlation between branches and also reduce the number of parameters, an efficient attention sharing scheme is proposed by partially sharing the self-attention heads amongst the tri-decoders. Finally, a token MLP bottleneck replaces the over-parameterized Transformer bottleneck for a further reduction in model complexity. Experiments on two datasets of different modalities, including MoNuSeg have shown that our methods can outperform state-of-the-art counterparts such as CA2.5-Net by 2-3% Dice with 30% fewer parameters. In conclusion, TransNuSeg confirms the strength of Transformer in the context of nuclei segmentation, which thus can serve as an efficient solution for real clinical practice. Code is available at https://github.com/zhenqi-he/transnuseg.

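Summary
Nuclei appear small, yet in real clinical practice global spatial information and the color or brightness contrast between nuclei and background are considered crucial for accurate nuclei segmentation. However, automatic nuclei segmentation is dominated by Convolutional Neural Networks (CNNs), while the potential of Transformers, which are powerful at capturing local-global correlations, has not been fully explored. To this end, we propose the first pure Transformer framework for nuclei segmentation, called TransNuSeg. Unlike prior work, we decompose the challenging nuclei segmentation task into an intrinsic multi-task learning problem, in which a tri-decoder structure performs nuclei instance, nuclei edge, and clustered edge segmentation. To eliminate the inconsistent predictions across branches seen in previous work, we introduce a novel self-distillation loss that explicitly enforces consistency between branches. We also propose an efficient attention sharing scheme that reduces the number of parameters by partially sharing self-attention heads among the tri-decoders. Finally, a token MLP bottleneck replaces the over-parameterized Transformer bottleneck to further reduce model complexity. Experiments on two datasets of different modalities, including MoNuSeg, show that our method outperforms state-of-the-art counterparts such as CA2.5-Net by 2-3% Dice with 30% fewer parameters. In conclusion, TransNuSeg confirms the strength of Transformers for nuclei segmentation and can serve as an efficient solution for real clinical practice. Code is available at https://github.com/zhenqi-he/transnuseg.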
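
The abstract describes a loss that imposes consistency between the decoder branches but does not give its formula. As a rough illustration of what "consistency between a mask branch and an edge branch" can mean, the sketch below derives an edge map from a thresholded mask prediction and measures how far the edge branch is from it. The function names, the 4-neighbour erosion, and the mean-absolute-difference penalty are all assumptions made for this example, not the TransNuSeg loss.

```python
import numpy as np

def erode4(mask: np.ndarray) -> np.ndarray:
    """Binary erosion with a 4-neighbour (cross-shaped) structuring element."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    # A pixel survives only if it and its up/down/left/right neighbours are set.
    return (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
            & m[1:-1, :-2] & m[1:-1, 2:])

def edge_from_mask(mask: np.ndarray) -> np.ndarray:
    """Inner boundary: foreground pixels that erosion removes."""
    mask = mask.astype(bool)
    return mask & ~erode4(mask)

def consistency_penalty(mask_prob, edge_prob, threshold=0.5):
    """Mean absolute gap between the edge branch and the edge implied by the mask branch.

    Hypothetical penalty for illustration only.
    """
    implied_edge = edge_from_mask(mask_prob >= threshold).astype(float)
    return float(np.mean(np.abs(edge_prob - implied_edge)))

# Toy example: one 3x3 "nucleus" inside a 5x5 image.
mask_prob = np.zeros((5, 5))
mask_prob[1:4, 1:4] = 0.9
agreeing_edge = edge_from_mask(mask_prob >= 0.5).astype(float)
disagreeing_edge = np.zeros((5, 5))

print(consistency_penalty(mask_prob, agreeing_edge))     # branches agree
print(consistency_penalty(mask_prob, disagreeing_edge))  # edge branch missed the boundary
```

In the toy example the penalty is zero when the edge branch matches the boundary implied by the mask and grows with the number of disagreeing pixels.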

A Novel SLCA-UNet Architecture for Automatic MRI Brain Tumor Segmentation

paper_url: http://arxiv.org/abs/2307.08048
repo_url: None
paper_authors: Tejashwini P S, Thriveni J, Venugopal K R
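for: Predicting and detecting brain tumors in order to reduce mortality caused by brain tumors.
methods: Deep learning, specifically the UNet architecture, is used to build an automated biomedical image analysis tool.
results: A modified UNet architecture is proposed that effectively captures both coarse and fine feature information in brain tumor images. On the Brain Tumor Segmentation (BraTS) 2020 dataset it achieves average scores of 0.845 Dice, 0.845 Sensitivity, 0.999 Specificity, and 8.1 Hausdorff95.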

Abstract
Brain tumor is deliberated as one of the severe health complications which lead to decrease in life expectancy of the individuals and is also considered as a prominent cause of mortality worldwide. Therefore, timely detection and prediction of brain tumors can be helpful to prevent death rates due to brain tumors. Biomedical image analysis is a widely known solution to diagnose brain tumor. Although MRI is the current standard method for imaging tumors, its clinical usefulness is constrained by the requirement of manual segmentation which is time-consuming. Deep learning-based approaches have emerged as a promising solution to develop automated biomedical image exploration tools and the UNet architecture is commonly used for segmentation. However, the traditional UNet has limitations in terms of complexity, training, accuracy, and contextual information processing. As a result, the modified UNet architecture, which incorporates residual dense blocks, layered attention, and channel attention modules, in addition to stacked convolution, can effectively capture both coarse and fine feature information. The proposed SLCA UNet approach achieves good performance on the freely accessible Brain Tumor Segmentation (BraTS) dataset, with an average performance of 0.845, 0.845, 0.999, and 8.1 in terms of Dice, Sensitivity, Specificity, and Hausdorff95 for BraTS 2020 dataset, respectively.

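Summary
Brain tumors are a severe health problem that can reduce life expectancy and are considered a major cause of mortality worldwide. Timely detection and prediction of brain tumors is therefore important. Biomedical image analysis is a widely used solution, but MRI, the current standard imaging method, is limited by the need for manual segmentation, which is time-consuming. Deep learning-based approaches have emerged as a promising way to develop automated biomedical image analysis tools. However, the traditional UNet architecture has limitations in complexity, training, accuracy, and contextual information processing. We therefore propose a modified UNet architecture that incorporates residual dense blocks, layered attention, and channel attention modules in addition to stacked convolution, and that effectively captures both coarse and fine feature information. The proposed SLCA UNet achieves good performance on the publicly available Brain Tumor Segmentation (BraTS) dataset, with average scores on BraTS 2020 of 0.845, 0.845, 0.999, and 8.1 for Dice, Sensitivity, Specificity, and Hausdorff95, respectively.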
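
Three of the four reported metrics (Dice, Sensitivity, Specificity) follow directly from the confusion counts of a binary mask. The sketch below shows the standard definitions on a toy example. The function names are chosen for this illustration, the handling of empty masks is an assumption, and Hausdorff95 is left out because it needs surface distances rather than pixel counts.

```python
import numpy as np

def confusion_counts(pred, target):
    """Return (tp, fp, fn, tn) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    tn = int(np.sum(~pred & ~target))
    return tp, fp, fn, tn

def dice(pred, target):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn, _ = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def sensitivity(pred, target):
    """Sensitivity (recall) = TP / (TP + FN)."""
    tp, _, fn, _ = confusion_counts(pred, target)
    return tp / (tp + fn) if (tp + fn) else 1.0

def specificity(pred, target):
    """Specificity = TN / (TN + FP)."""
    _, fp, _, tn = confusion_counts(pred, target)
    return tn / (tn + fp) if (tn + fp) else 1.0

# Toy example: 3 tumor pixels, 5 background pixels.
target = [1, 1, 1, 0, 0, 0, 0, 0]
pred   = [1, 1, 0, 1, 0, 0, 0, 0]   # one missed pixel, one false alarm

print(dice(pred, target), sensitivity(pred, target), specificity(pred, target))
```

Because specificity is computed over background pixels, it tends to sit close to 1 whenever the background is much larger than the tumor region, which is worth keeping in mind when reading a value such as 0.999 next to a Dice of 0.845.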

SHAMSUL: Simultaneous Heatmap-Analysis to investigate Medical Significance Utilizing Local interpretability methods

paper_url: http://arxiv.org/abs/2307.08003
repo_url: https://github.com/anondo1969/shamsul
paper_authors: Mahbub Ul Alam, Jaakko Hollmén, Jón Rúnar Baldvinsson, Rahim Rahmani
for: This paper aims to improve the interpretability of deep neural networks in the medical and healthcare domain by applying and comparing four well-established interpretability methods (LIME, SHAP, Grad-CAM, and LRP) on a chest radiography dataset.
methods: The paper uses transfer learning with a multi-label-multi-class chest radiography dataset to interpret predictions pertaining to specific pathology classes. The authors evaluate the four interpretability methods through quantitative and qualitative investigations and compare the results against human expert annotation.
results: Grad-CAM demonstrates the most favorable performance in quantitative evaluation, while the LIME heatmap segmentation visualization exhibits the highest level of medical significance. The research highlights the strengths and limitations of the four interpretability methods and suggests that a multimodal-based approach could offer additional insights for enhancing interpretability in the medical domain.

Abstract
The interpretability of deep neural networks has become a subject of great interest within the medical and healthcare domain. This attention stems from concerns regarding transparency, legal and ethical considerations, and the medical significance of predictions generated by these deep neural networks in clinical decision support systems. To address this matter, our study delves into the application of four well-established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Layer-wise Relevance Propagation (LRP). Leveraging the approach of transfer learning with a multi-label-multi-class chest radiography dataset, we aim to interpret predictions pertaining to specific pathology classes. Our analysis encompasses both single-label and multi-label predictions, providing a comprehensive and unbiased assessment through quantitative and qualitative investigations, which are compared against human expert annotation. Notably, Grad-CAM demonstrates the most favorable performance in quantitative evaluation, while the LIME heatmap segmentation visualization exhibits the highest level of medical significance. Our research highlights the strengths and limitations of these interpretability methods and suggests that a multimodal-based approach, incorporating diverse sources of information beyond chest radiography images, could offer additional insights for enhancing interpretability in the medical domain.

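Summary
The interpretability of deep neural networks has attracted broad attention in the medical domain, mainly because of transparency, legal and ethical considerations, and the medical significance of neural network predictions in clinical decision support systems. To address this, our study examines four established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Layer-wise Relevance Propagation (LRP). Applying these methods to a multi-label-multi-class chest radiography dataset, we aim to interpret predictions for specific pathology classes. Our analysis covers both single-label and multi-label predictions and provides a comprehensive, unbiased assessment through quantitative and qualitative investigations compared against human expert annotation. Grad-CAM performs best in quantitative evaluation, while the LIME heatmap segmentation visualization shows the highest level of medical significance. Our study describes the strengths and limitations of these interpretability methods and suggests that combining multiple sources of information may offer additional insight in the medical domain.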
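
Of the four methods compared, Grad-CAM has the most compact definition: each feature map of a convolutional layer is weighted by the spatial average of the class-score gradient with respect to that map, the weighted maps are summed, and negative values are clipped. The sketch below applies that formula to arrays that are assumed to have already been extracted from a network. The function name, the array layout, and the final rescaling to [0, 1] (a common display convention, not part of the formula) are choices made for this illustration and are not taken from the SHAMSUL code.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap from one layer's feature maps.

    activations: (K, H, W) feature maps A_k
    gradients:   (K, H, W) d(class score) / d(A_k)
    """
    weights = gradients.mean(axis=(1, 2))             # alpha_k: global-average-pooled gradients
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k, shape (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU keeps positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam            # rescale for display

# Toy example with two 2x2 feature maps.
activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                        [[0.0, 2.0], [0.0, 0.0]]])
gradients = np.array([np.ones((2, 2)),     # map 0 supports the class
                      -np.ones((2, 2))])   # map 1 argues against it

heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In the toy example only the location activated by the supporting map stays lit, since the contribution of the opposing map is negative and removed by the ReLU.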

MoTIF: Learning Motion Trajectories with Local Implicit Neural Functions for Continuous Space-Time Video Super-Resolution

paper_url: http://arxiv.org/abs/2307.07988
repo_url: https://github.com/sichun233746/motif
paper_authors: Yi-Hsin Chen, Si-Cun Chen, Yi-Hsin Chen, Yen-Yu Lin, Wen-Hsiao Peng
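for: The paper proposes a continuous space-time video super-resolution (C-STVSR) technique that up-scales a video both spatially and temporally by arbitrary scaling factors.
methods: The technique uses a space-time local implicit neural function that learns forward motion between input video frames. It learns individual motion trajectories via forward motion rather than a mixture of motion trajectories via backward motion. To ease motion interpolation, sparsely sampled forward motion extracted from the input video is encoded as the contextual input.
results: The technique achieves state-of-the-art performance on C-STVSR, and the MoTIF source code is publicly available.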

Abstract
This work addresses continuous space-time video super-resolution (C-STVSR) that aims to up-scale an input video both spatially and temporally by any scaling factors. One key challenge of C-STVSR is to propagate information temporally among the input video frames. To this end, we introduce a space-time local implicit neural function. It has the striking feature of learning forward motion for a continuum of pixels. We motivate the use of forward motion from the perspective of learning individual motion trajectories, as opposed to learning a mixture of motion trajectories with backward motion. To ease motion interpolation, we encode sparsely sampled forward motion extracted from the input video as the contextual input. Along with a reliability-aware splatting and decoding scheme, our framework, termed MoTIF, achieves the state-of-the-art performance on C-STVSR. The source code of MoTIF is available at https://github.com/sichun233746/MoTIF.

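Summary
This work addresses continuous space-time video super-resolution (C-STVSR), which aims to up-scale an input video both spatially and temporally by any scaling factor. A key challenge is propagating information temporally among the input video frames. To this end, we introduce a space-time local implicit neural function. Its distinguishing feature is that it learns forward motion for a continuum of pixels. We motivate forward motion from the perspective of learning individual motion trajectories, as opposed to learning a mixture of motion trajectories with backward motion. To ease motion interpolation, we encode sparsely sampled forward motion extracted from the input video as the contextual input. Combined with a reliability-aware splatting and decoding scheme, our framework, termed MoTIF, achieves state-of-the-art performance on C-STVSR. The source code of MoTIF is available at https://github.com/sichun233746/MoTIF.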
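
To make "forward motion" and "reliability-aware splatting" concrete, the sketch below pushes each source pixel along its own forward motion vector to an intermediate time t and blends colliding pixels by a reliability weight. It is a generic nearest-neighbour splatting illustration written for this digest, with assumed names and array layouts. MoTIF itself uses a learned implicit function and its own splatting and decoding scheme, which this code does not reproduce.

```python
import numpy as np

def forward_splat(features, flow, reliability, t):
    """Nearest-neighbour forward splatting (illustrative only).

    features:    (H, W) values at time 0
    flow:        (H, W, 2) forward motion (dy, dx) from time 0 to time 1
    reliability: (H, W) non-negative weights
    t:           target time in [0, 1]
    Returns the splatted map and a boolean mask of pixels that received data.
    """
    h, w = features.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Each source pixel follows its own trajectory, scaled linearly by t.
    ty = np.rint(ys + t * flow[..., 0]).astype(int)
    tx = np.rint(xs + t * flow[..., 1]).astype(int)
    inside = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)

    num = np.zeros((h, w))
    den = np.zeros((h, w))
    # np.add.at accumulates correctly when several sources hit the same target.
    np.add.at(num, (ty[inside], tx[inside]), (reliability * features)[inside])
    np.add.at(den, (ty[inside], tx[inside]), reliability[inside])

    out = np.zeros((h, w))
    np.divide(num, den, out=out, where=den > 0)
    return out, den > 0

# Toy example: a 1x3 row where pixel 0 moves one step right and collides with pixel 1.
features = np.array([[10.0, 20.0, 30.0]])
flow = np.zeros((1, 3, 2))
flow[0, 0, 1] = 1.0
reliability = np.array([[3.0, 1.0, 1.0]])

out, covered = forward_splat(features, flow, reliability, t=1.0)
print(out, covered)
```

The toy example shows the two effects a splatting scheme has to handle: collisions, resolved here by the reliability-weighted average, and holes, where no source pixel lands and the coverage mask is False.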

Panoramic Voltage-Sensitive Optical Mapping of Contracting Hearts using Cooperative Multi-View Motion Tracking with 12 to 24 Cameras

paper_url: http://arxiv.org/abs/2307.07943
repo_url: None
paper_authors: Shrey Chowdhary, Jan Lebert, Shai Dickman, Jan Christoph
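for: This study images the heart's action potential waves at high spatial and temporal resolution across the deforming heart surface.
methods: A multi-camera optical mapping system with up to 24 high-speed, low-cost cameras images action potential waves across the entire, strongly deforming ventricular surface.
results: The entire ventricular surface can be imaged using 12 cameras with a combined resolution of 0.5-1.0 megapixels. Computer vision techniques, including three-dimensional cooperative multi-view motion tracking, yield three-dimensional dynamic reconstructions with high-resolution voltage-sensitive measurements. With this setup, the researchers measured action potential waves in rabbit hearts during sinus rhythm, paced rhythm, and ventricular fibrillation. The setup defines a new state of the art and can be used to study the heart's electromechanical dynamics during health and disease.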

Abstract
Action potential waves triggering the heart's contractions can be imaged at high spatial and temporal resolutions across the heart surface using voltage-sensitive optical mapping. However, for over three decades, optical mapping has been performed with contraction-inhibited hearts. While it was recently demonstrated that action potential waves can be imaged on parts of the three-dimensional deforming ventricular surface using multi-camera optical mapping, panoramic measurements of action potential waves across the entire beating heart surface remained elusive. Here, we introduce a high-resolution multi-camera optical mapping system consisting of up to 24 high-speed, low-cost cameras with which it is possible to image action potential waves at high resolutions on the entire, strongly deforming ventricular surface of the heart. We imaged isolated hearts inside a custom-designed soccerball-shaped imaging chamber, which facilitates imaging and even illumination with excitation light from all sides in a panoramic fashion. We found that it is possible to image the entire ventricular surface using 12 cameras with 0.5-1.0 megapixels combined resolution. The 12 calibrated cameras generate 1.5 gigabytes of video data per second at imaging speeds of 500 fps, which we process using various computer vision techniques, including three-dimensional cooperative multi-view motion tracking, to generate three-dimensional dynamic reconstructions of the deforming heart surface with corresponding high-resolution voltage-sensitive optical measurements. With our setup, we measured action potential waves at unprecedented resolutions on the contracting three-dimensional surface of rabbit hearts during sinus rhythm, paced rhythm as well as ventricular fibrillation. Our imaging setup defines a new state-of-the-art in the field and can be used to study the heart's electromechanical dynamics during health and disease.

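Summary
Action potential waves that trigger the heart's contractions can be imaged at high spatial and temporal resolution across the heart surface using voltage-sensitive optical mapping. For more than three decades, however, optical mapping has been performed on contraction-inhibited hearts. Action potential waves have recently been imaged on the three-dimensional deforming ventricular surface, but only on parts of it. Here, we introduce a high-resolution multi-camera optical mapping system consisting of up to 24 high-speed, low-cost cameras that can image action potential waves across the entire, strongly deforming ventricular surface. We image isolated hearts inside a custom-designed soccerball-shaped imaging chamber, which allows imaging and illumination from all sides. We find that the entire ventricular surface can be imaged using 12 cameras with a combined resolution of 0.5-1.0 megapixels. The 12 calibrated cameras generate 1.5 gigabytes of video data per second, which we process with various computer vision techniques, including three-dimensional cooperative multi-view motion tracking, to produce three-dimensional dynamic reconstructions of the deforming heart surface with corresponding high-resolution voltage-sensitive optical measurements. With our setup, we measured action potential waves at unprecedented resolution during various heart rhythms. Our imaging setup defines a new state of the art in cardiac research.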
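
The throughput figures in the abstract (12 cameras, 500 fps, 1.5 gigabytes per second) imply a per-frame data budget that is easy to work out. The short calculation below does so under two assumptions that the abstract does not state: that a gigabyte means 10^9 bytes, and that all 12 cameras record synchronously at 500 fps. The bytes-per-pixel value is a free, hypothetical parameter, since the sensor bit depth is not given.

```python
BYTES_PER_SECOND = 1.5e9    # total video data rate quoted in the abstract (assumed 10^9 bytes per GB)
FRAMES_PER_SECOND = 500     # imaging speed quoted in the abstract
NUM_CAMERAS = 12            # cameras used to cover the ventricular surface

# Data produced by all cameras together for one time step.
bytes_per_time_step = BYTES_PER_SECOND / FRAMES_PER_SECOND

# Share of that budget for a single camera frame.
bytes_per_camera_frame = bytes_per_time_step / NUM_CAMERAS

def pixels_per_camera_frame(bytes_per_pixel: float) -> float:
    """Pixels per camera frame for a hypothetical storage size per pixel."""
    return bytes_per_camera_frame / bytes_per_pixel

print(bytes_per_time_step)          # bytes across all 12 cameras per time step
print(bytes_per_camera_frame)       # bytes per single camera frame
print(pixels_per_camera_frame(2))   # pixels per frame if each pixel took 2 bytes (hypothetical)
```

Under these assumptions each time step carries 3 megabytes across the rig, or 250 kilobytes per camera frame.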