eess.IV - 2023-08-09

ACE-HetEM for ab initio Heterogenous Cryo-EM 3D Reconstruction

  • paper_url: http://arxiv.org/abs/2308.04956
  • repo_url: None
  • paper_authors: Weijie Chen, Lin Yao, Zeqing Xia, Yuhang Wang
  • for: This paper aims to improve the accuracy of 3D structure reconstruction from cryo-EM images with unknown poses and low signal-to-noise ratio.
  • methods: The proposed method, called ACE-HetEM, uses an unsupervised deep learning architecture based on amortized inference to disentangle conformation classifications and pose estimations.
  • results: ACE-HetEM has comparable accuracy in pose estimation and produces better reconstruction resolution than non-amortized methods on simulated datasets, and is also applicable to real experimental datasets.
    Abstract Due to the extremely low signal-to-noise ratio (SNR) and unknown poses (projection angles and image translation) in cryo-EM experiments, reconstructing 3D structures from 2D images is very challenging. On top of these challenges, heterogeneous cryo-EM reconstruction also has an additional requirement: conformation classification. An emerging solution to this problem is called amortized inference, implemented using the autoencoder architecture or its variants. Instead of searching for the correct image-to-pose/conformation mapping for every image in the dataset as in non-amortized methods, amortized inference only needs to train an encoder that maps images to appropriate latent spaces representing poses or conformations. Unfortunately, standard amortized-inference-based methods with entangled latent spaces have difficulty learning the distribution of conformations and poses from cryo-EM images. In this paper, we propose an unsupervised deep learning architecture called "ACE-HetEM" based on amortized inference. To explicitly enforce the disentanglement of conformation classifications and pose estimations, we designed two alternating training tasks in our method: image-to-image task and pose-to-pose task. Results on simulated datasets show that ACE-HetEM has comparable accuracy in pose estimation and produces even better reconstruction resolution than non-amortized methods. Furthermore, we show that ACE-HetEM is also applicable to real experimental datasets.
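The amortized-inference setup lends itself to a compact illustration: a single encoder is trained once and then maps any image to its pose and conformation latents in one forward pass. Below is a minimal PyTorch sketch assuming a small CNN backbone with two heads; the layer sizes and the five-parameter pose output (three rotation parameters plus two translations) are illustrative stand-ins, not ACE-HetEM's actual architecture or its alternating image-to-image / pose-to-pose training tasks.

```python
import torch
import torch.nn as nn

class AmortizedEncoder(nn.Module):
    def __init__(self, n_conformations: int = 4):
        super().__init__()
        # Shared feature extractor over noisy 2D projection images.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Pose head: 3 rotation parameters + 2 in-plane translations (illustrative).
        self.pose_head = nn.Linear(64, 5)
        # Conformation head: logits over discrete conformational states.
        self.conf_head = nn.Linear(64, n_conformations)

    def forward(self, img):
        h = self.backbone(img)
        return self.pose_head(h), self.conf_head(h)

enc = AmortizedEncoder()
pose, conf_logits = enc(torch.randn(8, 1, 64, 64))  # batch of 2D projections
print(pose.shape, conf_logits.shape)  # torch.Size([8, 5]) torch.Size([8, 4])
```

Once trained, the encoder amortizes inference: no per-image pose search is needed at test time, which is exactly the contrast with non-amortized methods drawn in the abstract.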

HSD-PAM: High Speed Super Resolution Deep Penetration Photoacoustic Microscopy Imaging Boosted by Dual Branch Fusion Network

  • paper_url: http://arxiv.org/abs/2308.04922
  • repo_url: None
  • paper_authors: Zhengyuan Zhang, Haoran Jin, Zesheng Zheng, Wenwen Zhang, Wenhao Lu, Feng Qin, Arunima Sharma, Manojit Pramanik, Yuanjin Zheng
  • for: This paper proposes a hardware and software co-design approach to break the trade-off among three critical parameters of photoacoustic microscopy (PAM) systems: imaging speed, lateral resolution, and penetration depth.
  • methods: The method is a data-driven algorithm built around a novel dual branch fusion network comprising a high-resolution branch and a high-speed branch.
  • results: Validated by extensive simulation and in vivo experiments, the algorithm upsamples and enhances low-resolution, low-sampling-rate AR-PAM images, realizing a high-speed, super-resolution, deep-penetration PAM system (HSD-PAM) while preserving the depth capability of the AR-PAM modality.
    Abstract Photoacoustic microscopy (PAM) is a novel implementation of photoacoustic imaging (PAI) for visualizing 3D bio-structures, realized by raster scanning of the tissue. However, the three critical imaging parameters involved, imaging speed, lateral resolution, and penetration depth, mutually affect one another: improving one parameter degrades the other two, which constrains the overall performance of the PAM system. Here, we propose to break these limitations through hardware and software co-design. Starting from low-lateral-resolution, low-sampling-rate AR-PAM imaging, which possesses deep penetration capability, we aim to enhance the lateral resolution and upsample the images, so that high speed, super resolution, and deep penetration can be achieved in one PAM system (HSD-PAM). A data-driven algorithm is a promising approach to this problem, so a dedicated novel dual branch fusion network is proposed, comprising a high resolution branch and a high speed branch. Thanks to the availability of a switchable AR-OR-PAM imaging system, corresponding low-resolution, undersampled AR-PAM and high-resolution, fully sampled OR-PAM image pairs are used to train the network. Extensive simulation and in vivo experiments were conducted to validate the trained model; the enhancement results show that the proposed algorithm achieves the best perceptual and quantitative image quality. As a result, imaging speed is increased 16 times and lateral resolution is improved 5 times, while the deep penetration merit of the AR-PAM modality is preserved.
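A minimal PyTorch sketch of the dual branch idea: two parallel convolutional streams, one targeting lateral resolution recovery and one targeting the undersampled (high-speed) input, merged by a fusion head. Branch widths, depths, and the concatenation-based fusion are assumptions for illustration; the paper does not specify its network at this granularity.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    def __init__(self):
        super().__init__()
        # High resolution branch: recovers fine lateral detail.
        self.res_branch = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        # High speed branch: compensates for the low scan/sampling rate.
        self.speed_branch = nn.Sequential(
            nn.Conv2d(1, 32, 5, padding=2), nn.ReLU(),
        )
        # Fusion head merges both feature streams into the enhanced image.
        self.fuse = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, x):
        # x: undersampled AR-PAM image, assumed pre-interpolated to the target grid.
        return self.fuse(torch.cat([self.res_branch(x), self.speed_branch(x)], dim=1))

net = DualBranchFusion()
out = net(torch.randn(1, 1, 128, 128))  # low-res AR-PAM frame -> enhanced frame
print(out.shape)  # torch.Size([1, 1, 128, 128])
```

Training such a model on AR-PAM / OR-PAM image pairs, as the abstract describes, casts the enhancement as supervised image-to-image regression.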

StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability

  • paper_url: http://arxiv.org/abs/2308.04904
  • repo_url: https://github.com/qmme/stablevqa
  • paper_authors: Tengchuan Kou, Xiaohong Liu, Wei Sun, Jun Jia, Xiongkuo Min, Guangtao Zhai, Ning Liu
  • for: Evaluating the stability of User Generated Content (UGC) videos and proposing a novel Video Quality Assessment for Stability (VQA-S) model.
  • methods: A novel VQA-S model named StableVQA, which consists of three feature extractors to acquire optical flow, semantic, and blur features, and a regression layer to predict the final stability score.
  • results: StableVQA achieves a higher correlation with subjective opinions than existing VQA-S models and generic VQA models; the paper also provides a new database, StableDB, containing 1,952 diversely-shaky UGC videos with subjective scores for video stability.
    Abstract Video shakiness is an unpleasant distortion of User Generated Content (UGC) videos, which is usually caused by the unstable hold of cameras. In recent years, many video stabilization algorithms have been proposed, yet no specific and accurate metric enables comprehensively evaluating the stability of videos. Indeed, most existing quality assessment models evaluate video quality as a whole without specifically taking the subjective experience of video stability into consideration. Therefore, these models cannot measure the video stability explicitly and precisely when severe shakes are present. In addition, there is no large-scale video database in public that includes various degrees of shaky videos with the corresponding subjective scores available, which hinders the development of Video Quality Assessment for Stability (VQA-S). To this end, we build a new database named StableDB that contains 1,952 diversely-shaky UGC videos, where each video has a Mean Opinion Score (MOS) on the degree of video stability rated by 34 subjects. Moreover, we elaborately design a novel VQA-S model named StableVQA, which consists of three feature extractors to acquire the optical flow, semantic, and blur features respectively, and a regression layer to predict the final stability score. Extensive experiments demonstrate that the StableVQA achieves a higher correlation with subjective opinions than the existing VQA-S models and generic VQA models. The database and codes are available at https://github.com/QMME/StableVQA.
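The three-stream design reduces to a simple pattern: extract one feature vector per modality, concatenate, and regress a score. A minimal PyTorch sketch, with each extractor collapsed to a tiny CNN (the real StableVQA uses dedicated optical-flow, semantic, and blur extractors, abstracted away here):

```python
import torch
import torch.nn as nn

def tiny_extractor(in_ch: int) -> nn.Module:
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class StableVQASketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.flow_net = tiny_extractor(2)      # optical flow (u, v) fields
        self.semantic_net = tiny_extractor(3)  # RGB frames
        self.blur_net = tiny_extractor(1)      # blur / sharpness maps
        self.regressor = nn.Linear(48, 1)      # fused features -> stability score

    def forward(self, flow, rgb, blur):
        feats = torch.cat(
            [self.flow_net(flow), self.semantic_net(rgb), self.blur_net(blur)], dim=1
        )
        return self.regressor(feats).squeeze(-1)

model = StableVQASketch()
score = model(torch.randn(4, 2, 64, 64), torch.randn(4, 3, 64, 64), torch.randn(4, 1, 64, 64))
print(score.shape)  # torch.Size([4]) -- one predicted MOS-like stability score per clip
```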

An automated pipeline for quantitative T2* fetal body MRI and segmentation at low field

  • paper_url: http://arxiv.org/abs/2308.04903
  • repo_url: None
  • paper_authors: Kelly Payette, Alena Uus, Jordina Aviles Verdera, Carla Avena Zampieri, Megan Hall, Lisa Story, Maria Deprez, Mary A. Rutherford, Joseph V. Hajnal, Sebastien Ourselin, Raphael Tomi-Tricot, Jana Hutter
  • for: To develop a semi-automatic pipeline for low-field fetal MRI that enables fast and detailed quantitative T2* relaxometry analysis.
  • methods: The pipeline acquires multi-echo dynamic sequences, reconstructs them with deformable slice-to-volume reconstruction, and uses a neural network to automatically segment the fetal body 3D volumes.
  • results: The pipeline was applied successfully across gestational ages of 17-40 weeks, producing accurate T2* analysis while remaining robust to motion artefacts.
    Abstract Fetal Magnetic Resonance Imaging at low field strengths is emerging as an exciting direction in perinatal health. Clinical low field (0.55T) scanners are beneficial for fetal imaging due to their reduced susceptibility-induced artefacts, increased T2* values, and wider bore (widening access for the increasingly obese pregnant population). However, the lack of standard automated image processing tools such as segmentation and reconstruction hampers wider clinical use. In this study, we introduce a semi-automatic pipeline using quantitative MRI for the fetal body at low field strength resulting in fast and detailed quantitative T2* relaxometry analysis of all major fetal body organs. Multi-echo dynamic sequences of the fetal body were acquired and reconstructed into a single high-resolution volume using deformable slice-to-volume reconstruction, generating both structural and quantitative T2* 3D volumes. A neural network trained using a semi-supervised approach was created to automatically segment these fetal body 3D volumes into ten different organs (resulting in dice values > 0.74 for 8 out of 10 organs). The T2* values revealed a strong relationship with GA in the lungs, liver, and kidney parenchyma (R^2>0.5). This pipeline was used successfully for a wide range of GAs (17-40 weeks), and is robust to motion artefacts. Low field fetal MRI can be used to perform advanced MRI analysis, and is a viable option for clinical scanning.
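The quantitative core of the pipeline, voxel-wise T2* relaxometry, follows the mono-exponential decay S(TE) = S0 * exp(-TE / T2*), which linearizes to log S = log S0 - TE / T2*. A short NumPy sketch of that fit, with illustrative echo times and synthetic decay curves (the paper's deformable slice-to-volume reconstruction and organ segmentation are not reproduced here):

```python
import numpy as np

def fit_t2star(signals: np.ndarray, echo_times_ms: np.ndarray) -> np.ndarray:
    """signals: (n_echoes, n_voxels) magnitudes; returns T2* in ms per voxel."""
    log_s = np.log(np.clip(signals, 1e-6, None))
    # Linear model: log S = log S0 - TE / T2*  ->  slope = -1 / T2*
    slope = np.polyfit(echo_times_ms, log_s, deg=1)[0]
    return -1.0 / np.clip(slope, None, -1e-6)  # guard against non-negative slopes

tes = np.array([5.0, 10.0, 20.0, 40.0])          # illustrative echo times in ms
true_t2s = np.array([30.0, 60.0, 120.0])         # illustrative tissue T2* values
sig = np.exp(-tes[:, None] / true_t2s[None, :])  # noiseless mono-exponential decay
print(fit_t2star(sig, tes))                      # ~[ 30.  60. 120.]
```

Per-organ T2* statistics then follow by averaging fitted values within each segmented organ mask, which is how the pipeline relates T2* to gestational age.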

Transmission and Color-guided Network for Underwater Image Enhancement

  • paper_url: http://arxiv.org/abs/2308.04892
  • repo_url: None
  • paper_authors: Pan Mu, Jing Fang, Haotian Qian, Cong Bai
  • for: To improve the visual quality of underwater images by countering the absorption and scattering that light undergoes as it propagates through water.
  • methods: The proposed underwater image enhancement method, ATDCnet, combines an Adaptive Transmission-directed Module (ATM) that encodes physics knowledge, a Dynamic Color-guided Module (DCM) that corrects color deviation, and an Encoder-Decoder-based Compensation (EDC) structure with attention and a multi-stage feature fusion mechanism for simultaneous color restoration and contrast enhancement.
  • results: Extensive experiments show that ATDCnet achieves state-of-the-art performance on multiple benchmark datasets.
    Abstract In recent years, with the continuous development of the marine industry, underwater image enhancement has attracted plenty of attention. Unfortunately, the propagation of light in water will be absorbed by water bodies and scattered by suspended particles, resulting in color deviation and low contrast. To solve these two problems, we propose an Adaptive Transmission and Dynamic Color guided network (named ATDCnet) for underwater image enhancement. In particular, to exploit the knowledge of physics, we design an Adaptive Transmission-directed Module (ATM) to better guide the network. To deal with the color deviation problem, we design a Dynamic Color-guided Module (DCM) to post-process the enhanced image color. Further, we design an Encoder-Decoder-based Compensation (EDC) structure with attention and a multi-stage feature fusion mechanism to perform color restoration and contrast enhancement simultaneously. Extensive experiments demonstrate the state-of-the-art performance of the ATDCnet on multiple benchmark datasets.
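A minimal PyTorch sketch of transmission-guided enhancement: a small branch predicts a per-pixel transmission map that then gates the main encoder-decoder features. The gating rule and layer sizes are illustrative assumptions standing in for ATDCnet's ATM, DCM, and EDC modules, whose exact designs are not given at this level of detail.

```python
import torch
import torch.nn as nn

class TransmissionGuidedNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Predicts a per-pixel transmission map t in (0, 1) from the raw image.
        self.transmission = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x):
        t = self.transmission(x)
        feats = self.encoder(x)
        # Physics-inspired guidance: weight features more where transmission is low,
        # i.e. where absorption/scattering degraded the signal most.
        guided = feats * (2.0 - t)
        return torch.sigmoid(self.decoder(guided)), t

net = TransmissionGuidedNet()
enhanced, tmap = net(torch.rand(1, 3, 128, 128))  # underwater RGB image in [0, 1]
print(enhanced.shape, tmap.shape)
```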

Deep Generative Networks for Heterogeneous Augmentation of Cranial Defects

  • paper_url: http://arxiv.org/abs/2308.04883
  • repo_url: None
  • paper_authors: Kamil Kwarciak, Marek Wodzinski
  • for: To further automate the design of personalized cranial implants with deep learning by addressing the high heterogeneity of possible cranial defects.
  • methods: Three volumetric deep generative models are used to augment the dataset with synthetic skulls: a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), a WGAN-GP hybrid with Variational Autoencoder pretraining (VAE/WGAN-GP), and an Introspective Variational Autoencoder (IntroVAE). These generate tens of thousands of defective skulls that balance defect heterogeneity against realistic skull shape.
  • results: The synthetically generated skulls substantially improve defect segmentation compared to training on the original unaugmented data alone, and may improve the automatic design of personalized cranial implants for real medical cases.
    Abstract The design of personalized cranial implants is a challenging and tremendous task that has become a hot topic in terms of process automation with the use of deep learning techniques. The main challenge is associated with the high diversity of possible cranial defects. The lack of appropriate data sources negatively influences the data-driven nature of deep learning algorithms. Hence, one of the possible solutions to overcome this problem is to rely on synthetic data. In this work, we propose three volumetric variations of deep generative models to augment the dataset by generating synthetic skulls, i.e., Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), WGAN-GP hybrid with Variational Autoencoder pretraining (VAE/WGAN-GP) and Introspective Variational Autoencoder (IntroVAE). We show that it is possible to generate tens of thousands of defective skulls with compatible defects that achieve a trade-off between defect heterogeneity and the realistic shape of the skull. We evaluate obtained synthetic data quantitatively by defect segmentation with the use of V-Net and qualitatively by their latent space exploration. We show that the synthetically generated skulls highly improve the segmentation process compared to using only the original unaugmented data. The generated skulls may improve the automatic design of personalized cranial implants for real medical cases.
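The first of the three generators uses the standard WGAN-GP objective, whose distinguishing term is the gradient penalty enforcing a soft Lipschitz constraint on the critic. A compact PyTorch sketch of that penalty, shown with a toy 2D critic rather than the paper's volumetric (3D) networks:

```python
import torch
import torch.nn as nn

def gradient_penalty(critic, real, fake, device="cpu"):
    # Interpolate between real and generated samples.
    eps = torch.rand(real.size(0), 1, 1, 1, device=device)
    mixed = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(mixed)
    grads = torch.autograd.grad(
        outputs=scores, inputs=mixed,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,
    )[0]
    # Penalize deviation of the gradient norm from 1 (the Lipschitz constraint).
    return ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

critic = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.Flatten(), nn.LazyLinear(1))
real = torch.rand(4, 1, 32, 32)   # toy stand-ins for binary skull volumes
fake = torch.rand(4, 1, 32, 32)
print(gradient_penalty(critic, real, fake))  # scalar added to the critic loss
```

The penalty is typically weighted (commonly by a factor of 10) and added to the Wasserstein critic loss at every training step.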

HyperCoil-Recon: A Hypernetwork-based Adaptive Coil Configuration Task Switching Network for MRI Reconstruction

  • paper_url: http://arxiv.org/abs/2308.04821
  • repo_url: https://github.com/sriprabhar/hypercoil-recon
  • paper_authors: Sriprabha Ramanarayanan, Mohammad Al Fahim, Rahul G. S., Amrit Kumar Jethi, Keerthi Ram, Mohanasankar Sivaprakasam
  • for: To speed up and improve multi-coil MRI reconstruction while removing the need to train or fine-tune deep learning models for every coil configuration, a barrier to clinical deployment.
  • methods: A hypernetwork-based coil configuration task-switching network treats each coil configuration as a task in a multi-task learning perspective; hypernetworks infer task-specific weights and embed them into the reconstruction network.
  • results: The method adapts on the fly to unseen configurations (up to 32 coils when trained on 7-11 randomly varying coils, and 120 deviated unseen configurations when trained on 18 configurations in a single model), matches the performance of configuration-specific models, and outperforms configuration-invariant models by margins of around 1 dB / 0.03 and 0.3 dB / 0.02 in PSNR / SSIM for knee and brain data.
    Abstract Parallel imaging, a fast MRI technique, involves dynamic adjustments based on the configuration i.e. number, positioning, and sensitivity of the coils with respect to the anatomy under study. Conventional deep learning-based image reconstruction models have to be trained or fine-tuned for each configuration, posing a barrier to clinical translation, given the lack of computational resources and machine learning expertise for clinicians to train models at deployment. Joint training on diverse datasets learns a single weight set that might underfit to deviated configurations. We propose, HyperCoil-Recon, a hypernetwork-based coil configuration task-switching network for multi-coil MRI reconstruction that encodes varying configurations of the numbers of coils in a multi-tasking perspective, posing each configuration as a task. The hypernetworks infer and embed task-specific weights into the reconstruction network, 1) effectively utilizing the contextual knowledge of common and varying image features among the various fields-of-view of the coils, and 2) enabling generality to unseen configurations at test time. Experiments reveal that our approach 1) adapts on the fly to various unseen configurations up to 32 coils when trained on lower numbers (i.e. 7 to 11) of randomly varying coils, and to 120 deviated unseen configurations when trained on 18 configurations in a single model, 2) matches the performance of coil configuration-specific models, and 3) outperforms configuration-invariant models with improvement margins of around 1 dB / 0.03 and 0.3 dB / 0.02 in PSNR / SSIM for knee and brain data. Our code is available at https://github.com/sriprabhar/HyperCoil-Recon
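The core mechanism is a hypernetwork that maps a coil-configuration descriptor to the weights of the reconstruction network, so one model serves every configuration. A minimal PyTorch sketch generating the kernel of a single conv layer from a normalized coil count; the descriptor encoding and single generated layer are illustrative simplifications of HyperCoil-Recon's task-switching design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperConv(nn.Module):
    def __init__(self, in_ch=1, out_ch=8, k=3):
        super().__init__()
        self.shape = (out_ch, in_ch, k, k)
        n_weights = out_ch * in_ch * k * k
        # Hypernetwork: coil-configuration descriptor -> conv kernel weights.
        self.hyper = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, n_weights))

    def forward(self, x, num_coils: int):
        # Task descriptor: normalized coil count (could be much richer in practice).
        task = torch.tensor([[num_coils / 32.0]], dtype=x.dtype)
        w = self.hyper(task).view(self.shape)
        return F.conv2d(x, w, padding=1)

layer = HyperConv()
img = torch.randn(1, 1, 64, 64)   # coil-combined undersampled image
for n in (8, 12, 32):             # the same hypernetwork adapts to each coil count
    print(n, layer(img, n).shape)
```

Because the reconstruction weights are a function of the descriptor, generalizing to an unseen configuration at test time amounts to evaluating the hypernetwork at a new input rather than retraining.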

An Integrated Visual Analytics System for Studying Clinical Carotid Artery Plaques

  • paper_url: http://arxiv.org/abs/2308.06285
  • repo_url: None
  • paper_authors: Chaoqing Xu, Zhentao Zheng, Yiting Fu, Baofeng Chang, Legao Chen, Minghui Wu, Mingli Song, Jinsong Jiang
  • for: To develop an intelligent carotid artery plaque visual analysis system that helps vascular surgeons comprehensively analyze the clinical physiological and imaging indicators of carotid artery disease.
  • methods: The system has two main functions: it displays the correlation between carotid artery plaque and various factors through a series of information visualization methods, integrating analysis of patients' physiological indicator data; and it uses machine learning to enhance guided analysis of the inherent correlations among plaque components, displaying the spatial distribution of plaque on medical images.
  • results: Two case studies using real data obtained from a hospital indicate that the system can effectively provide clinical diagnosis and treatment guidance for vascular surgeons.
    Abstract Carotid artery plaques can cause arterial vascular diseases such as stroke and myocardial infarction, posing a severe threat to human life. However, the current clinical examination mainly relies on a direct assessment by physicians of patients' clinical indicators and medical images, lacking an integrated visualization tool for analyzing the influencing factors and composition of carotid artery plaques. We have designed an intelligent carotid artery plaque visual analysis system for vascular surgery experts to comprehensively analyze the clinical physiological and imaging indicators of carotid artery diseases. The system mainly includes two functions: First, it displays the correlation between carotid artery plaque and various factors through a series of information visualization methods and integrates the analysis of patient physiological indicator data. Second, it enhances the interface guidance analysis of the inherent correlation between the components of carotid artery plaque through machine learning and displays the spatial distribution of the plaque on medical images. Additionally, we conducted two case studies on carotid artery plaques using real data obtained from a hospital, and the results indicate that our designed carotid analysis system can effectively provide clinical diagnosis and treatment guidance for vascular surgeons.
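The first system function, relating physiological indicators to plaque, boils down to correlation analysis rendered through visualization views. A small self-contained sketch with hypothetical indicator names and synthetic data; the actual system works on real hospital records inside an interactive interface, not a script like this.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "ldl_cholesterol": rng.normal(3.0, 0.8, n),   # hypothetical clinical indicators
    "systolic_bp": rng.normal(130, 15, n),
    "age": rng.integers(40, 85, n).astype(float),
})
# Synthetic plaque burden loosely driven by the indicators, plus noise.
df["plaque_burden"] = (0.4 * df.ldl_cholesterol + 0.01 * df.systolic_bp
                       + 0.02 * df.age + rng.normal(0, 0.5, n))
# Correlation of each factor with plaque burden -- the kind of relationship the
# system would render with its information-visualization views.
print(df.corr()["plaque_burden"].round(2))
```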

Long-Distance Gesture Recognition using Dynamic Neural Networks

  • paper_url: http://arxiv.org/abs/2308.04643
  • repo_url: None
  • paper_authors: Shubhang Bhatnagar, Sharath Gopal, Narendra Ahuja, Liu Ren
  • for: Recognizing gestures from longer distances, specifically for applications such as gesture-based interactions with a floor cleaning robot or a drone.
  • methods: A dynamic neural network selects features from gesture-containing spatial regions of the input sensor data for further processing, which helps the network focus on features important for gesture recognition while discarding background features early on.
  • results: The method outperforms previous state-of-the-art methods in recognition accuracy and compute efficiency on the LD-ConGR long-distance dataset.
    Abstract Gestures form an important medium of communication between humans and machines. An overwhelming majority of existing gesture recognition methods are tailored to a scenario where humans and machines are located very close to each other. This short-distance assumption does not hold true for several types of interactions, for example gesture-based interactions with a floor cleaning robot or with a drone. Methods made for short-distance recognition are unable to perform well on long-distance recognition due to gestures occupying only a small portion of the input data. Their performance is especially worse in resource constrained settings where they are not able to effectively focus their limited compute on the gesturing subject. We propose a novel, accurate and efficient method for the recognition of gestures from longer distances. It uses a dynamic neural network to select features from gesture-containing spatial regions of the input sensor data for further processing. This helps the network focus on features important for gesture recognition while discarding background features early on, thus making it more compute efficient compared to other techniques. We demonstrate the performance of our method on the LD-ConGR long-distance dataset where it outperforms previous state-of-the-art methods on recognition accuracy and compute efficiency.
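The dynamic-compute idea can be illustrated compactly: a cheap saliency head locates the (small, distant) gesturing region, and only that crop goes through the heavier classifier. In the sketch below, arg-max saliency cropping is an illustrative stand-in for the paper's learned dynamic feature selection.

```python
import torch
import torch.nn as nn

class DynamicGestureNet(nn.Module):
    def __init__(self, n_gestures=10, crop=32):
        super().__init__()
        self.crop = crop
        self.saliency = nn.Conv2d(3, 1, 7, padding=3)   # cheap region scorer
        self.classifier = nn.Sequential(                # heavy branch, crop only
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_gestures),
        )

    def forward(self, x):
        b, _, h, w = x.shape
        heat = self.saliency(x).flatten(1)
        idx = heat.argmax(dim=1)                        # most gesture-like location
        cy, cx = idx // w, idx % w
        crops = []
        for i in range(b):
            y0 = int(cy[i].clamp(self.crop // 2, h - self.crop // 2)) - self.crop // 2
            x0 = int(cx[i].clamp(self.crop // 2, w - self.crop // 2)) - self.crop // 2
            crops.append(x[i : i + 1, :, y0 : y0 + self.crop, x0 : x0 + self.crop])
        return self.classifier(torch.cat(crops))

net = DynamicGestureNet()
print(net(torch.randn(2, 3, 128, 128)).shape)  # torch.Size([2, 10])
```

The compute saving comes from running the expensive classifier on a 32x32 crop rather than the full 128x128 frame, mirroring the paper's early discarding of background features.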

1st Place Solution for CVPR2023 BURST Long Tail and Open World Challenges

  • paper_url: http://arxiv.org/abs/2308.04598
  • repo_url: None
  • paper_authors: Kaer Huang
  • for: Video instance segmentation (VIS) in long-tailed and open-world scenarios, where traditional VIS methods are limited to a small number of common classes but real-world applications require detecting and tracking rare and never-before-seen objects.
  • methods: The proposed method, LeTracker, trains a detector with segmentation and CEM on the LVISv0.5 + COCO dataset, and an instance appearance similarity head on the TAO dataset.
  • results: LeTracker achieves 14.9 HOTAall on the BURST test set, ranking 1st on the long-tail benchmark, and 61.4 OWTAall on the open-world challenge, also ranking 1st.
    Abstract Currently, Video Instance Segmentation (VIS) aims at segmenting and categorizing objects in videos from a closed set of training categories containing only a few dozen classes, and thus lacks the ability to handle the diverse objects of real-world videos. With the release of the TAO and BURST datasets, we have the opportunity to research VIS in long-tailed and open-world scenarios. Traditional VIS methods are evaluated on benchmarks limited to a small number of common classes, but practical applications require trackers that go beyond these common classes, detecting and tracking rare and even never-before-seen objects. Inspired by the latest MOT paper on the long-tail task (Tracking Every Thing in the Wild, Siyuan Li et al.), for the BURST long-tail challenge we train our model on a combination of LVISv0.5 and the COCO dataset using repeat factor sampling. First, we train the detector with segmentation and CEM on the LVISv0.5 + COCO dataset. Then, we train the instance appearance similarity head on the TAO dataset. With this, our method (LeTracker) achieves 14.9 HOTAall on the BURST test set, ranking 1st on the benchmark. For the open-world challenge, we train on the annotations of only 64 classes (the intersection of the BURST train subset and the COCO dataset, without the LVIS dataset), test on the BURST test set, and obtain 61.4 OWTAall, ranking 1st on the benchmark. Our code will be released to facilitate future research.
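Repeat factor sampling, the long-tail ingredient used here (introduced with LVIS by Gupta et al.), assigns each category a repeat factor r(c) = max(1, sqrt(t / f(c))) from its image frequency f(c), and each image inherits the maximum over its categories, so images containing rare classes are seen more often. A short sketch with made-up frequencies; t = 0.001 is the common LVIS default.

```python
import math

def category_repeat_factor(freq: float, t: float = 1e-3) -> float:
    # r(c) = max(1, sqrt(t / f(c))), with f(c) the fraction of images containing c.
    return max(1.0, math.sqrt(t / freq))

def image_repeat_factor(cats_in_image, cat_freqs, t: float = 1e-3) -> float:
    # r(I) = max over categories present in the image.
    return max(category_repeat_factor(cat_freqs[c], t) for c in cats_in_image)

cat_freqs = {"person": 0.5, "dog": 0.02, "unicycle": 0.0004}  # hypothetical fractions
print(image_repeat_factor({"person"}, cat_freqs))             # 1.0 (common class)
print(image_repeat_factor({"person", "unicycle"}, cat_freqs)) # ~1.58 (rare class wins)
```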

Semi-Supervised Semantic Segmentation of Cell Nuclei via Diffusion-based Large-Scale Pre-Training and Collaborative Learning

  • paper_url: http://arxiv.org/abs/2308.04578
  • repo_url: None
  • paper_authors: Zhuchen Shao, Sourya Sengupta, Hua Li, Mark A. Anastasio
  • for: To propose a novel unsupervised pre-training-based semi-supervised framework for automated semantic segmentation of cell nuclei in microscopic images, in support of disease diagnosis and tissue microenvironment analysis.
  • methods: The framework comprises three main components: (1) a diffusion model pretrained on a large-scale unlabeled dataset; (2) a transformer-based decoder that aggregates semantic features, with the pretrained diffusion model acting as the feature extractor so the small amount of labeled data is fully utilized; and (3) a collaborative learning framework between the diffusion-based segmentation model and a supervised segmentation model.
  • results: Experiments on four publicly available datasets show significant improvements over competitive semi-supervised segmentation methods and supervised baselines.
    Abstract Automated semantic segmentation of cell nuclei in microscopic images is crucial for disease diagnosis and tissue microenvironment analysis. Nonetheless, this task presents challenges due to the complexity and heterogeneity of cells. While supervised deep learning methods are promising, they necessitate large annotated datasets that are time-consuming and error-prone to acquire. Semi-supervised approaches could provide feasible alternatives to this issue. However, the limited annotated data may lead to subpar performance of semi-supervised methods, regardless of the abundance of unlabeled data. In this paper, we introduce a novel unsupervised pre-training-based semi-supervised framework for cell-nuclei segmentation. Our framework is comprised of three main components. Firstly, we pretrain a diffusion model on a large-scale unlabeled dataset. The diffusion model's explicit modeling capability facilitates the learning of semantic feature representation from the unlabeled data. Secondly, we achieve semantic feature aggregation using a transformer-based decoder, where the pretrained diffusion model acts as the feature extractor, enabling us to fully utilize the small amount of labeled data. Finally, we implement a collaborative learning framework between the diffusion-based segmentation model and a supervised segmentation model to further enhance segmentation performance. Experiments were conducted on four publicly available datasets to demonstrate significant improvements compared to competitive semi-supervised segmentation methods and supervised baselines. A series of out-of-distribution tests further confirmed the generality of our framework. Furthermore, thorough ablation experiments and visual analysis confirmed the superiority of our proposed method.
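The second component, using the pretrained diffusion model as a frozen feature extractor for a segmentation head, can be sketched as below. The tiny denoiser, single feature tap, and 1x1-conv head are illustrative assumptions; the paper pretrains a diffusion model at scale and aggregates its features with a transformer-based decoder.

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(32, 3, 3, padding=1)

    def forward(self, x):
        feats = self.enc(x)   # features exposed for downstream segmentation
        return self.out(feats), feats

denoiser = TinyDenoiser()     # assume weights come from diffusion pretraining
seg_head = nn.Conv2d(32, 2, 1)  # nuclei vs. background logits

img = torch.rand(1, 3, 64, 64)
with torch.no_grad():         # frozen pretrained extractor
    _, feats = denoiser(img)
print(seg_head(feats).shape)  # torch.Size([1, 2, 64, 64])
```

In the full framework, only the decoder and the collaborating supervised model consume the scarce labels, which is what lets the abundant unlabeled data carry most of the representation learning.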

Towards Automatic Scoring of Spinal X-ray for Ankylosing Spondylitis

  • paper_url: http://arxiv.org/abs/2308.05123
  • repo_url: None
  • paper_authors: Yuanhan Mo, Yao Chen, Aimee Readie, Gregory Ligozio, Thibaud Coroller, Bartłomiej W. Papież
  • for: To develop an auto-grading pipeline that automatically predicts the modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) from spinal X-ray imaging.
  • methods: The pipeline uses vertebral units (VUs) generated by the previously developed VU extraction pipeline (VertXNet) as input and predicts mSASSS scores from those VUs.
  • results: The pipeline predicts per-VU mSASSS scores even when the data is limited in quantity and imbalanced, achieving balanced accuracies of 0.56 and 0.51 over the 4 mSASSS scores (i.e., scores of 0, 1, 2, 3) on two test datasets.
    Abstract Manually grading structural changes with the modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) on spinal X-ray imaging is costly and time-consuming due to bone shape complexity and image quality variations. In this study, we address this challenge by prototyping a 2-step auto-grading pipeline, called VertXGradeNet, to automatically predict mSASSS scores for the cervical and lumbar vertebral units (VUs) in X-ray spinal imaging. The VertXGradeNet utilizes VUs generated by our previously developed VU extraction pipeline (VertXNet) as input and predicts mSASSS based on those VUs. VertXGradeNet was evaluated on an in-house dataset of lateral cervical and lumbar X-ray images for axial spondylarthritis patients. Our results show that VertXGradeNet can predict the mSASSS score for each VU when the data is limited in quantity and imbalanced. Overall, it can achieve a balanced accuracy of 0.56 and 0.51 for 4 different mSASSS scores (i.e., a score of 0, 1, 2, 3) on two test datasets. The accuracy of the presented method shows the potential to streamline the spinal radiograph readings and therefore reduce the cost of future clinical trials.
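The two-step structure, extract vertebral units first and grade each one second, can be sketched as follows. Both the fixed-band extract_vus() stub and the small 4-way classifier are hypothetical placeholders for VertXNet and VertXGradeNet respectively.

```python
import torch
import torch.nn as nn

def extract_vus(xray: torch.Tensor, n_vus: int = 6) -> torch.Tensor:
    # Placeholder for VertXNet: split the spine image into fixed horizontal bands.
    h = xray.shape[-2] // n_vus
    return torch.stack([xray[..., i * h : (i + 1) * h, :] for i in range(n_vus)])

grader = nn.Sequential(                      # per-VU mSASSS score classifier
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 4),
)

xray = torch.rand(1, 120, 64)                # lateral cervical/lumbar radiograph
vus = extract_vus(xray)                      # (6, 1, 20, 64) vertebral units
scores = grader(vus).argmax(dim=1)           # one mSASSS grade (0-3) per VU
print(scores)
```

Decoupling extraction from grading is what lets each stage be trained and evaluated independently, which matters given the limited, imbalanced labels the paper reports.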