eess.IV - 2023-10-27

FPM-INR: Fourier ptychographic microscopy image stack reconstruction using implicit neural representations

  • paper_url: http://arxiv.org/abs/2310.18529
  • repo_url: https://github.com/hwzhou2020/fpm_inr
  • paper_authors: Haowen Zhou, Brandon Y. Feng, Haiyun Guo, Siyu Lin, Mingshu Liang, Christopher A. Metzler, Changhuei Yang
  • for: This paper aims to enable fast, gigapixel-scale remote digital pathology by accelerating the reconstruction of large biological image stacks.
  • methods: It integrates physics-based optical models with implicit neural representations (INR) to represent and reconstruct FPM image stacks (a minimal sketch follows the abstract below).
  • results: It outperforms traditional FPM algorithms with up to a 25-fold speedup and an 80-fold reduction in memory usage.
    Abstract Image stacks provide invaluable 3D information in various biological and pathological imaging applications. Fourier ptychographic microscopy (FPM) enables reconstructing high-resolution, wide field-of-view image stacks without z-stack scanning, thus significantly accelerating image acquisition. However, existing FPM methods take tens of minutes to reconstruct and gigabytes of memory to store a high-resolution volumetric scene, impeding fast gigapixel-scale remote digital pathology. While deep learning approaches have been explored to address this challenge, existing methods poorly generalize to novel datasets and can produce unreliable hallucinations. This work presents FPM-INR, a compact and efficient framework that integrates physics-based optical models with implicit neural representations (INR) to represent and reconstruct FPM image stacks. FPM-INR is agnostic to system design or sample types and does not require external training data. In our demonstrated experiments, FPM-INR substantially outperforms traditional FPM algorithms with up to a 25-fold increase in speed and an 80-fold reduction in memory usage for continuous image stack representations.
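
The memory and speed gains come from representing the continuous image stack as the weights of a coordinate network rather than as a voxel grid. Below is a minimal sketch of that idea, assuming a plain ReLU coordinate MLP with a complex-valued output; the class name, layer sizes, and the z-plane query are illustrative, not the paper's architecture.

```python
# Minimal sketch of an implicit neural representation (INR) for a volumetric
# image stack: a coordinate MLP maps (x, y, z) to a complex field value, so
# the stack is stored as network weights instead of a gigabyte-scale grid.
import torch
import torch.nn as nn

class StackINR(nn.Module):
    def __init__(self, hidden: int = 256, layers: int = 4):
        super().__init__()
        blocks, in_dim = [], 3  # input: normalized (x, y, z) coordinates
        for _ in range(layers):
            blocks += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        self.body = nn.Sequential(*blocks)
        self.head = nn.Linear(hidden, 2)  # real and imaginary parts

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        out = self.head(self.body(xyz))
        return torch.complex(out[..., 0], out[..., 1])

# Query the continuous representation at an arbitrary z-plane:
model = StackINR()
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 64),
                        torch.linspace(-1, 1, 64), indexing="ij")
coords = torch.stack([xs, ys, torch.full_like(xs, 0.3)], dim=-1)
field = model(coords.reshape(-1, 3)).reshape(64, 64)  # complex field at z=0.3
```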

TabAttention: Learning Attention Conditionally on Tabular Data

  • paper_url: http://arxiv.org/abs/2310.18129
  • repo_url: https://github.com/sanoscience/tab-attention
  • paper_authors: Michal K. Grzeszczyk, Szymon Płotka, Beata Rebizant, Katarzyna Kosińska-Kaczyńska, Michał Lipa, Robert Brawura-Biskupski-Samaha, Przemysław Korzeniowski, Tomasz Trzciński, Arkadiusz Sitek
  • for: This paper introduces TabAttention, a novel module that enhances Convolutional Neural Networks (CNNs) for prediction tasks that combine imaging and tabular data.
  • methods: The authors extend the Convolutional Block Attention Module to 3D with a Temporal Attention Module that uses multi-head self-attention to learn attention maps, and condition all attention modules on tabular data embeddings (a minimal sketch follows the abstract below).
  • results: In the reported experiments, TabAttention outperforms clinicians and existing tabular- and/or imaging-based methods on fetal birth weight (FBW) prediction, suggesting potential for clinical workflows that combine imaging and tabular data.
    Abstract Medical data analysis often combines both imaging and tabular data processing using machine learning algorithms. While previous studies have investigated the impact of attention mechanisms on deep learning models, few have explored integrating attention modules and tabular data. In this paper, we introduce TabAttention, a novel module that enhances the performance of Convolutional Neural Networks (CNNs) with an attention mechanism that is trained conditionally on tabular data. Specifically, we extend the Convolutional Block Attention Module to 3D by adding a Temporal Attention Module that uses multi-head self-attention to learn attention maps. Furthermore, we enhance all attention modules by integrating tabular data embeddings. Our approach is demonstrated on the fetal birth weight (FBW) estimation task, using 92 fetal abdominal ultrasound video scans and fetal biometry measurements. Our results indicate that TabAttention outperforms clinicians and existing methods that rely on tabular and/or imaging data for FBW prediction. This novel approach has the potential to improve computer-aided diagnosis in various clinical workflows where imaging and tabular data are combined. We provide source code for integrating TabAttention in CNNs at https://github.com/SanoScience/Tab-Attention.
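
To illustrate what "attention trained conditionally on tabular data" can look like, here is a minimal CBAM-style channel-attention sketch in which the excitation MLP also sees an embedding of the tabular record; all names and shapes are illustrative stand-ins, not the TabAttention implementation.

```python
# Channel-attention gate whose excitation MLP is conditioned on an embedding
# of the tabular features, applied to 3D (video) feature maps.
import torch
import torch.nn as nn

class TabConditionedChannelAttention(nn.Module):
    def __init__(self, channels: int, tab_dim: int, tab_embed: int = 32):
        super().__init__()
        self.tab = nn.Sequential(nn.Linear(tab_dim, tab_embed), nn.ReLU())
        self.mlp = nn.Sequential(
            nn.Linear(channels + tab_embed, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels),
        )

    def forward(self, x: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) video features; tab: (B, tab_dim) tabular record
        pooled = x.mean(dim=(2, 3, 4))            # global average pool -> (B, C)
        gate = torch.sigmoid(self.mlp(torch.cat([pooled, self.tab(tab)], dim=1)))
        return x * gate[:, :, None, None, None]   # reweight channels

feats = torch.randn(2, 64, 8, 16, 16)   # batch of 3D feature maps
record = torch.randn(2, 5)              # e.g. 5 biometry measurements
out = TabConditionedChannelAttention(64, 5)(feats, record)
```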

Hyper-Skin: A Hyperspectral Dataset for Reconstructing Facial Skin-Spectra from RGB Images

  • paper_url: http://arxiv.org/abs/2310.17911
  • repo_url: https://github.com/hyperspectral-skin/hyper-skin-2023
  • paper_authors: Pai Chet Ng, Zhixiang Chi, Yannick Verdie, Juwei Lu, Konstantinos N. Plataniotis
  • for: This paper introduces a dataset designed to facilitate research on facial skin-spectra reconstruction and hyperspectral skin analysis.
  • methods: Diverse facial images were collected with a pushbroom hyperspectral camera, and facial skin spectra are reconstructed from RGB images (a sketch of the RGB synthesis step follows the abstract below).
  • results: State-of-the-art models were benchmarked on 31 bands of resampled hyperspectral data, demonstrating accurate skin-spectra reconstruction.
    Abstract We introduce Hyper-Skin, a hyperspectral dataset covering a wide range of wavelengths from the visible (VIS) spectrum (400nm - 700nm) to the near-infrared (NIR) spectrum (700nm - 1000nm), uniquely designed to facilitate research on facial skin-spectra reconstruction. By reconstructing skin spectra from RGB images, our dataset enables the study of hyperspectral skin analysis, such as melanin and hemoglobin concentrations, directly on the consumer device. Overcoming limitations of existing datasets, Hyper-Skin consists of diverse facial skin data collected with a pushbroom hyperspectral camera. With 330 hyperspectral cubes from 51 subjects, the dataset covers the facial skin from different angles and facial poses. Each hyperspectral cube has dimensions of 1024×1024×448, resulting in millions of spectra vectors per image. The dataset, carefully curated in adherence to ethical guidelines, includes paired hyperspectral images and synthetic RGB images generated using real camera responses. We demonstrate the efficacy of our dataset by showcasing skin spectra reconstruction using state-of-the-art models on 31 bands of hyperspectral data resampled in the VIS and NIR spectrum. This Hyper-Skin dataset would be a valuable resource to the NeurIPS community, encouraging the development of novel algorithms for skin spectral reconstruction while fostering interdisciplinary collaboration in hyperspectral skin analysis related to cosmetology and skin's well-being. Instructions to request the data and the related benchmarking codes are publicly available at: https://github.com/hyperspectral-skin/Hyper-Skin-2023.
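
The paired synthetic RGB images are described as generated using real camera responses; the sketch below shows the standard form of such a rendering, where each RGB channel is a response-weighted sum over the spectral bands. The Gaussian response curves and cube contents are placeholders, not the dataset's measured responses.

```python
# Render a synthetic RGB image from a hyperspectral cube via per-channel
# camera spectral response curves.
import numpy as np

H, W, B = 256, 256, 448  # dataset cubes are 1024 x 1024 x 448; smaller here
wavelengths = np.linspace(400, 1000, B)            # nm, VIS through NIR
cube = np.random.rand(H, W, B).astype(np.float32)  # stand-in hyperspectral cube

def gaussian_response(center_nm: float, width_nm: float) -> np.ndarray:
    return np.exp(-0.5 * ((wavelengths - center_nm) / width_nm) ** 2)

# Placeholder R/G/B response curves (real cameras have measured curves).
responses = np.stack([gaussian_response(c, 30) for c in (600, 540, 460)])
responses /= responses.sum(axis=1, keepdims=True)  # normalize each channel

rgb = np.einsum("hwb,cb->hwc", cube, responses)    # (H, W, 3) RGB image
```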

CPIA Dataset: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training

  • paper_url: http://arxiv.org/abs/2310.17902
  • repo_url: https://github.com/zhanglab2021/cpia_dataset
  • paper_authors: Nan Ying, Yanli Lei, Tianyi Zhang, Shangqing Lyu, Chunhui Li, Sicheng Chen, Zeyu Liu, Yu Zhao, Guanglei Zhang
  • for: This paper presents a large-scale self-supervised learning (SSL) pre-training dataset to improve pathological image analysis in computer-aided diagnosis.
  • methods: SSL pre-training requires no sample-level labels, sidestepping the high cost of clinical annotation; a four-scale WSI standardization based on microns per pixel (MPP) is proposed (a sketch follows the abstract below).
  • results: The comprehensive pathological image analysis (CPIA) dataset contains 21,427,877 standardized images covering over 48 organs/tissues and about 100 kinds of diseases, released together with state-of-the-art baselines for SSL pre-training and downstream evaluation.
    Abstract Pathological image analysis is a crucial field in computer-aided diagnosis, where deep learning is widely applied. Transfer learning using pre-trained models initialized on natural images has effectively improved the downstream pathological performance. However, the lack of sophisticated domain-specific pathological initialization hinders their potential. Self-supervised learning (SSL) enables pre-training without sample-level labels, which has great potential to overcome the challenge of expensive annotations. Thus, studies focusing on pathological SSL pre-training call for a comprehensive and standardized dataset, similar to the ImageNet in computer vision. This paper presents the comprehensive pathological image analysis (CPIA) dataset, a large-scale SSL pre-training dataset combining 103 open-source datasets with extensive standardization. The CPIA dataset contains 21,427,877 standardized images, covering over 48 organs/tissues and about 100 kinds of diseases, which includes two main data types: whole slide images (WSIs) and characteristic regions of interest (ROIs). A four-scale WSI standardization process is proposed based on the uniform resolution in microns per pixel (MPP), while the ROIs are divided into three scales artificially. This multi-scale dataset is built with the diagnosis habits under the supervision of experienced senior pathologists. The CPIA dataset facilitates a comprehensive pathological understanding and enables pattern discovery explorations. Additionally, to launch the CPIA dataset, several state-of-the-art (SOTA) baselines of SSL pre-training and downstream evaluation are specially conducted. The CPIA dataset along with baselines is available at https://github.com/zhanglab2021/CPIA_Dataset.
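
To make the MPP-based WSI standardization concrete: standardizing by microns per pixel means resampling each patch so that one pixel covers a fixed physical distance. Below is a minimal sketch; the target scales are illustrative, not the paper's actual four.

```python
# Resample a patch from its native microns-per-pixel (MPP) to a target MPP,
# so patches from differently scanned slides share a physical resolution.
from PIL import Image

def standardize_mpp(patch: Image.Image, native_mpp: float,
                    target_mpp: float) -> Image.Image:
    """Resample `patch` from its native MPP to `target_mpp`."""
    scale = native_mpp / target_mpp          # >1 upsamples, <1 downsamples
    new_size = (round(patch.width * scale), round(patch.height * scale))
    return patch.resize(new_size, Image.LANCZOS)

# e.g. bring a 0.25 MPP (40x) patch to hypothetical multi-scale targets
patch = Image.new("RGB", (1024, 1024))
pyramid = [standardize_mpp(patch, 0.25, t) for t in (0.25, 0.5, 1.0, 2.0)]
```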

Towards optimal multimode fiber imaging by leveraging input polarization and conditional generative adversarial networks

  • paper_url: http://arxiv.org/abs/2310.17889
  • repo_url: None
  • paper_authors: Jawaria Maqbool, Syed Talal Hassan, M. Imran Cheema
  • for: Achieving practical imaging through multimode fibers.
  • methods: Deep learning with a conditional generative adversarial network (CGAN) model that takes the input light's polarization state into account (a minimal sketch follows the abstract below).
  • results: Experiments show that the polarization state of light injected at the fiber input strongly affects the fidelity of images reconstructed from speckle patterns; selecting the optimal input polarization yields average structural similarity above 0.9.
    Abstract Deep learning techniques provide a plausible route towards achieving practical imaging through multimode fibers. However, the results produced by these methods are often influenced by physical factors like temperature, fiber length, external perturbations, and polarization state of the input light. The impact of other factors, except input light polarization, has been discussed in the literature for imaging applications. The input polarization has been considered by researchers while looking at the characterization and control of polarization in multimode fibers. Here, we show experimentally that the state of polarization of light injected at the multimode fiber input affects the fidelity of reconstructed images from speckle patterns. Certain polarization states produce high-quality images at fiber output, while some yield degraded results. We have designed a conditional generative adversarial network (CGAN) for image regeneration at various degrees of input light polarization. We demonstrate that in the case of multimode fibers that are held fixed, optimal imaging can be achieved by leveraging our CGAN model with the input light polarization state, where the fidelity of images is maximum. Our work exhibits high average structural similarity index values exceeding 0.9, surpassing the previously reported value of 0.8772. We also show that the model can be generalized to image adequately for all input light polarization states when the fiber has bends or twists. We anticipate our work will be a stepping stone toward developing high-resolution and less invasive multimode fiber endoscopes.
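
As a rough illustration of a generator conditioned on the input polarization state, the sketch below tiles a normalized polarization value into an extra input channel alongside the speckle pattern; the architecture and names are stand-ins, not the paper's CGAN.

```python
# Conditional generator for speckle-to-image regeneration: the condition
# (input-light polarization) is broadcast into a second input channel.
import torch
import torch.nn as nn

class SpeckleGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(              # speckle + condition channel in
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # reconstructed image
        )

    def forward(self, speckle: torch.Tensor, pol: torch.Tensor) -> torch.Tensor:
        # speckle: (B, 1, H, W); pol: (B,) normalized polarization state
        cond = pol[:, None, None, None].expand(-1, 1, *speckle.shape[2:])
        return self.net(torch.cat([speckle, cond], dim=1))

speckle = torch.rand(4, 1, 128, 128)
pol = torch.tensor([0.0, 0.25, 0.5, 0.75])   # e.g. angle / 180 degrees
recon = SpeckleGenerator()(speckle, pol)     # (4, 1, 128, 128)
```

In a full CGAN this generator would be trained adversarially against a discriminator that also receives the polarization condition, alongside a pixel-wise reconstruction loss.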