eess.IV - 2023-08-03

Maximum-likelihood estimation in ptychography in the presence of Poisson-Gaussian noise statistics

  • paper_url: http://arxiv.org/abs/2308.02436
  • repo_url: None
  • paper_authors: Jacob Seifert, Yifeng Shao, Rens van Dam, Dorian Bouchet, Tristan van Leeuwen, Allard P. Mosk
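  • for: Improving image quality under low signal-to-noise ratio (SNR) conditions.
  • methods: Maximum-likelihood estimation is used to account for camera readout noise within gradient-based ptychography optimization.
  • results: Experimental and numerical data show that the method outperforms the conventional approach under challenging noise conditions, through a straightforward methodological adjustment.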
    Abstract Optical measurements often exhibit mixed Poisson-Gaussian noise statistics, which hampers image quality, particularly under low signal-to-noise ratio (SNR) conditions. Computational imaging falls short in such situations when solely Poissonian noise statistics are assumed. In response to this challenge, we define a loss function that explicitly incorporates this mixed noise nature. By using maximum-likelihood estimation, we devise a practical method to account for camera readout noise in gradient-based ptychography optimization. Our results, based on both experimental and numerical data, demonstrate that this approach outperforms the conventional one, enabling enhanced image reconstruction quality under challenging noise conditions through a straightforward methodological adjustment.
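    Summary Optical measurements often exhibit mixed Poisson-Gaussian noise statistics, which degrades image quality, particularly at low signal-to-noise ratio (SNR). Computational imaging fails in such cases when only Poissonian noise statistics are assumed. To address this challenge, we define a loss function that directly expresses the mixed nature of the noise. Using maximum-likelihood estimation, we develop a practical method for accounting for camera readout noise in gradient-based ptychography optimization. Our results, based on experimental and numerical data, show that the method outperforms the conventional approach under challenging noise conditions and improves image reconstruction quality.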
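The abstract does not spell out the loss function, so the following is only a sketch of the textbook mixed Poisson-Gaussian likelihood, not the authors' implementation. It assumes each measured pixel is a Poisson photon count with mean `lam` plus zero-mean Gaussian readout noise with standard deviation `sigma`. The function names and the truncation parameter `k_max` are hypothetical.

```python
import math

def poisson_gaussian_logpdf(y, lam, sigma, k_max=200):
    """log p(y) for y = k + n, with k ~ Poisson(lam) and n ~ N(0, sigma^2).

    Assumption: the infinite sum over photon counts k is truncated at
    k_max, so this is only accurate when lam is well below k_max.
    """
    log_terms = []
    for k in range(k_max + 1):
        log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
        log_gauss = (-0.5 * ((y - k) / sigma) ** 2
                     - 0.5 * math.log(2.0 * math.pi * sigma ** 2))
        log_terms.append(log_poisson + log_gauss)
    # Log-sum-exp keeps the mixture sum numerically stable.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def negative_log_likelihood(measured, predicted, sigma):
    """Loss over all pixels: measured values vs. predicted intensities (> 0)."""
    return -sum(poisson_gaussian_logpdf(y, lam, sigma)
                for y, lam in zip(measured, predicted))
```

In a gradient-based reconstruction, `predicted` would come from the forward model and this loss would replace a purely Poissonian one. A practical implementation would likely use an automatic-differentiation framework and possibly a cheaper approximation of the sum.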

Focus on Content not Noise: Improving Image Generation for Nuclei Segmentation by Suppressing Steganography in CycleGAN

  • paper_url: http://arxiv.org/abs/2308.01769
  • repo_url: None
  • paper_authors: Jonas Utz, Tobias Weise, Maja Schlereth, Fabian Wagner, Mareike Thies, Mingxuan Gu, Stefan Uderhardt, Katharina Breininger
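  • for: The paper aims to improve the quality of the synthetic microscopy images that CycleGAN generates, so that they serve nuclei segmentation better.
  • methods: The paper applies DCT-based low-pass filtering to images produced by the CycleGAN generator, removing the hidden high-frequency shortcut information (steganography) and improving coherence between generated images and cycled masks.
  • results: Experiments show that removing this hidden information increases coherence between generated images and cycled masks and improves downstream nuclei segmentation, with a gain of 5.4 percentage points in F1-score over a vanilla CycleGAN.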
    Abstract Annotating nuclei in microscopy images for the training of neural networks is a laborious task that requires expert knowledge and suffers from inter- and intra-rater variability, especially in fluorescence microscopy. Generative networks such as CycleGAN can invert the process and generate synthetic microscopy images for a given mask, thereby building a synthetic dataset. However, past works report content inconsistencies between the mask and generated image, partially due to CycleGAN minimizing its loss by hiding shortcut information for the image reconstruction in high frequencies rather than encoding the desired image content and learning the target task. In this work, we propose to remove the hidden shortcut information, called steganography, from generated images by employing a low pass filtering based on the DCT. We show that this increases coherence between generated images and cycled masks and evaluate synthetic datasets on a downstream nuclei segmentation task. Here we achieve an improvement of 5.4 percentage points in the F1-score compared to a vanilla CycleGAN. Integrating advanced regularization techniques into the CycleGAN architecture may help mitigate steganography-related issues and produce more accurate synthetic datasets for nuclei segmentation.
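    Summary Annotating nuclei in microscopy images is labor-intensive, requires expert knowledge, and suffers from inter- and intra-rater variability, especially in fluorescence microscopy. Generative networks such as CycleGAN can invert the process and generate synthetic microscopy images for a given mask, thereby building a synthetic dataset. However, past work reports content inconsistencies between the mask and the generated image, partly because CycleGAN minimizes its image reconstruction loss by hiding shortcut information in high frequencies rather than encoding the desired image content and learning the target task. In this work, we propose removing this hidden shortcut information from generated images with DCT-based low-pass filtering. We find that this increases coherence between generated images and cycled masks, and we evaluate the synthetic datasets on a nuclei segmentation task. Here we achieve an improvement of 5.4 percentage points in F1-score over a vanilla CycleGAN. Integrating advanced regularization techniques into the CycleGAN architecture may reduce steganography-related problems and yield more accurate synthetic datasets for nuclei segmentation.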
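The DCT-based low-pass filtering can be illustrated with a small pure-Python sketch. The abstract does not state the cutoff or the mask shape, so the square cutoff `keep` and all function names here are assumptions, and the naive transform is only suitable for tiny images.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n)))
    return out

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def apply_2d(image, transform):
    """Apply a 1D transform along rows, then along columns."""
    rows = [transform(r) for r in image]
    cols = [transform(c) for c in transpose(rows)]
    return transpose(cols)

def dct_lowpass(image, keep):
    """Zero every DCT coefficient whose row or column frequency index is >= keep."""
    coeffs = apply_2d(image, dct_1d)
    for u, row in enumerate(coeffs):
        for v in range(len(row)):
            if u >= keep or v >= keep:
                row[v] = 0.0
    return apply_2d(coeffs, idct_1d)
```

Applied to a generated image before it is cycled back to a mask, such a filter discards the high-frequency band in which shortcut information could be hidden. A real pipeline would use a fast DCT from a numerical library.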

NuInsSeg: A Fully Annotated Dataset for Nuclei Instance Segmentation in H&E-Stained Histological Images

  • paper_url: http://arxiv.org/abs/2308.01760
  • repo_url: https://github.com/masih4/nuinsseg
  • paper_authors: Amirreza Mahbod, Christine Polak, Katharina Feldmann, Rumsha Khan, Katharina Gelles, Georg Dorffner, Ramona Woitek, Sepideh Hatamikia, Isabella Ellinger
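  • for: The work provides a large, fully manually annotated nuclei instance segmentation dataset (NuInsSeg) to support research on automatic nuclei segmentation.
  • methods: Supervised deep learning (DL) segmentation models need fully annotated data; the authors release a large manually annotated dataset (NuInsSeg) for training and testing such models.
  • results: NuInsSeg contains 665 image patches with more than 30,000 manually segmented nuclei, plus ambiguous area masks for the entire dataset.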
    Abstract In computational pathology, automatic nuclei instance segmentation plays an essential role in whole slide image analysis. While many computerized approaches have been proposed for this task, supervised deep learning (DL) methods have shown superior segmentation performances compared to classical machine learning and image processing techniques. However, these models need fully annotated datasets for training which is challenging to acquire, especially in the medical domain. In this work, we release one of the biggest fully manually annotated datasets of nuclei in Hematoxylin and Eosin (H&E)-stained histological images, called NuInsSeg. This dataset contains 665 image patches with more than 30,000 manually segmented nuclei from 31 human and mouse organs. Moreover, for the first time, we provide additional ambiguous area masks for the entire dataset. These vague areas represent the parts of the images where precise and deterministic manual annotations are impossible, even for human experts. The dataset and detailed step-by-step instructions to generate related segmentation masks are publicly available at https://www.kaggle.com/datasets/ipateam/nuinsseg and https://github.com/masih4/NuInsSeg, respectively.
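    Summary In computational pathology, automatic nuclei instance segmentation plays a key role in whole slide image analysis. Although many computerized methods have been proposed for this task, supervised deep learning (DL) methods have shown superior segmentation performance. However, these models need fully annotated datasets for training, which are hard to obtain, especially in the medical domain. In this work, we release one of the largest fully manually annotated datasets of nuclei in H&E-stained histological images, called NuInsSeg. It contains 665 image patches and more than 30,000 manually segmented nuclei from 31 human and mouse organs. We also provide, for the first time, ambiguous area masks for the entire dataset. These areas mark the parts of the images where precise and deterministic manual annotation is impossible, even for human experts. The dataset and detailed step-by-step instructions for generating the related segmentation masks are publicly available at https://www.kaggle.com/datasets/ipateam/nuinsseg and https://github.com/masih4/NuInsSeg, respectively.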

Reference-Free Isotropic 3D EM Reconstruction using Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.01594
  • repo_url: None
  • paper_authors: Kyungryun Lee, Won-Ki Jeong
  • for: Overcoming the anisotropic axial resolution of electron microscopy (EM) images, which stems from the imaging modality and complicates analysis and downstream tasks, without requiring reference data or prior knowledge about the degradation process.
  • methods: 2D diffusion models are used to reconstruct 3D volumes consistently; the approach is well suited to highly downsampled data.
  • results: Extensive experiments show that leveraging the generative prior is more robust than, and superior to, supervised learning methods, and that self-supervised reconstruction without any training data is feasible.
    Abstract Electron microscopy (EM) images exhibit anisotropic axial resolution due to the characteristics inherent to the imaging modality, presenting challenges in analysis and downstream tasks. In this paper, we propose a diffusion-model-based framework that overcomes the limitations of requiring reference data or prior knowledge about the degradation process. Our approach utilizes 2D diffusion models to consistently reconstruct 3D volumes and is well-suited for highly downsampled data. Extensive experiments conducted on two public datasets demonstrate the robustness and superiority of leveraging the generative prior compared to supervised learning methods. Additionally, we demonstrate our method's feasibility for self-supervised reconstruction, which can restore a single anisotropic volume without any training data.

DMDC: Dynamic-mask-based dual camera design for snapshot Hyperspectral Imaging

  • paper_url: http://arxiv.org/abs/2308.01541
  • repo_url: https://github.com/caizeyu1992/dmdc
  • paper_authors: Zeyu Cai, Chengqian Jin, Feipeng Da
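  • for: Improving the performance of deep learning methods in coded aperture snapshot spectral imaging (CASSI).
  • methods: A dynamic-mask-based dual camera system combines an RGB camera and a CASSI system running in parallel. The system first learns the spatial feature distribution of the scene from RGB images, then instructs the SLM to encode the scene, and finally sends both RGB and CASSI images to the network for reconstruction. The proposed DMDC-net consists of a small-scale CNN-based dynamic mask network that adjusts the mask and a multimodal reconstruction network.
  • results: Extensive experiments on multiple datasets show an improvement of more than 9 dB in PSNR over the SOTA.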
    Abstract Deep learning methods are developing rapidly in coded aperture snapshot spectral imaging (CASSI). The number of parameters and FLOPs of existing state-of-the-art methods (SOTA) continues to increase, but the reconstruction accuracy improves slowly. Current methods still face two problems: 1) The performance of the spatial light modulator (SLM) is not fully developed due to the limitation of fixed Mask coding. 2) The single input limits the network performance. In this paper we present a dynamic-mask-based dual camera system, which consists of an RGB camera and a CASSI system running in parallel. First, the system learns the spatial feature distribution of the scene based on the RGB images, then instructs the SLM to encode each scene, and finally sends both RGB and CASSI images to the network for reconstruction. We further designed the DMDC-net, which consists of two separate networks, a small-scale CNN-based dynamic mask network for dynamic adjustment of the mask and a multimodal reconstruction network for reconstruction using RGB and CASSI measurements. Extensive experiments on multiple datasets show that our method achieves more than 9 dB improvement in PSNR over the SOTA. (https://github.com/caizeyu1992/DMDC)
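    Summary Deep learning methods are developing rapidly in coded aperture snapshot spectral imaging (CASSI). The parameter counts and FLOPs of existing state-of-the-art (SOTA) methods keep growing, but reconstruction accuracy improves only slowly. Current methods still face two problems: 1) the spatial light modulator (SLM) is not used to its full potential because of the limitation of fixed mask coding; 2) the single input limits network performance. In this paper we present a dynamic-mask-based dual camera system, consisting of an RGB camera and a CASSI system running in parallel. The system first learns the spatial feature distribution of the scene from RGB images, then encodes the scene, and sends both RGB and CASSI images to the network for reconstruction. We also design DMDC-net, which comprises two separate networks: a small-scale CNN-based dynamic mask network that adjusts the mask dynamically, and a multimodal reconstruction network that reconstructs from RGB and CASSI measurements. Extensive experiments on multiple datasets show that our method improves PSNR by more than 9 dB over the SOTA. (https://github.com/caizeyu1992/DMDC)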
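To put the reported PSNR gain in context, here is a minimal sketch of the standard PSNR definition and of what a gain in dB means for mean squared error. The function names and the default peak value of 1.0 are assumptions, and the paper's own evaluation code may differ in detail (for example, per-band averaging).

```python
import math

def psnr(reference, reconstruction, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, a 9 dB gain corresponds to a mean squared error that is roughly eight times smaller.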

Numerical Uncertainty of Convolutional Neural Networks Inference for Structural Brain MRI Analysis

  • paper_url: http://arxiv.org/abs/2308.01939
  • repo_url: None
  • paper_authors: Inés Gonzalez Pepe, Vinuyan Sivakolunthu, Hae Lang Park, Yohan Chatelain, Tristan Glatard
  • for: This paper investigates the numerical uncertainty of Convolutional Neural Network (CNN) inference for structural brain MRI analysis.
  • methods: The paper applies Random Rounding (a stochastic arithmetic technique) to CNN models employed in non-linear registration (SynthMorph) and whole-brain segmentation (FastSurfer), and compares the resulting numerical uncertainty to the one measured in a reference image-processing pipeline (FreeSurfer recon-all).
  • results: Results obtained on 32 representative subjects show that CNN predictions are substantially more accurate numerically than traditional image-processing results (non-linear registration: 19 vs 13 significant bits on average; whole-brain segmentation: 0.99 vs 0.92 Sørensen-Dice score on average), which suggests a better reproducibility of CNN results across execution environments.
    Abstract This paper investigates the numerical uncertainty of Convolutional Neural Networks (CNNs) inference for structural brain MRI analysis. It applies Random Rounding (a stochastic arithmetic technique) to CNN models employed in non-linear registration (SynthMorph) and whole-brain segmentation (FastSurfer), and compares the resulting numerical uncertainty to the one measured in a reference image-processing pipeline (FreeSurfer recon-all). Results obtained on 32 representative subjects show that CNN predictions are substantially more accurate numerically than traditional image-processing results (non-linear registration: 19 vs 13 significant bits on average; whole-brain segmentation: 0.99 vs 0.92 Sørensen-Dice score on average), which suggests a better reproducibility of CNN results across execution environments.
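The two quantities reported above can be sketched as follows. The significant-bits estimator `-log2(std / |mean|)` over repeated randomly rounded runs is a common choice in stochastic arithmetic, but it is an assumption here because the abstract does not state the exact formula. The function names are hypothetical.

```python
import math
import statistics

def significant_bits(samples):
    """Estimate significant bits of a value from repeated perturbed runs.

    Assumption: uses -log2(std / |mean|), a common stochastic-arithmetic
    estimator; the paper's exact estimator may differ.
    """
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    if std == 0.0:
        return math.inf  # all runs agree exactly
    if mean == 0.0:
        return 0.0
    return -math.log2(std / abs(mean))

def dice(labels_a, labels_b):
    """Sørensen-Dice score between two sets of voxel indices."""
    a, b = set(labels_a), set(labels_b)
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))
```

For registration outputs, the samples would be the same scalar computed across repeated randomly rounded runs. For segmentation, the Dice score would compare label maps from different runs.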

TDMD: A Database for Dynamic Color Mesh Subjective and Objective Quality Explorations

  • paper_url: http://arxiv.org/abs/2308.01499
  • repo_url: None
  • paper_authors: Qi Yang, Joel Jung, Timon Deschamps, Xiaozhong Xu, Shan Liu
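  • for: The paper aims to facilitate the development of objective quality metrics for dynamic colored meshes (DCM) and to study how typical distortions affect their perception.
  • methods: The authors build the Tencent - dynamic colored mesh database (TDMD), containing eight reference DCM objects with six typical distortions. Using processed video sequences (PVS) derived from the DCMs, they run a large-scale subjective experiment that yields 303 distorted DCM samples with mean opinion scores.
  • results: Different types of distortion affect human perception of DCMs differently. The paper also evaluates three types of state-of-the-art objective metrics (image-based, point-based, and video-based) and offers suggestions for selecting metrics in practical applications.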
    Abstract Dynamic colored meshes (DCM) are widely used in various applications; however, these meshes may undergo different processes, such as compression or transmission, which can distort them and degrade their quality. To facilitate the development of objective metrics for DCMs and study the influence of typical distortions on their perception, we create the Tencent - dynamic colored mesh database (TDMD) containing eight reference DCM objects with six typical distortions. Using processed video sequences (PVS) derived from the DCM, we have conducted a large-scale subjective experiment that resulted in 303 distorted DCM samples with mean opinion scores, making the TDMD the largest available DCM database to our knowledge. This database enabled us to study the impact of different types of distortion on human perception and offer recommendations for DCM compression and related tasks. Additionally, we have evaluated three types of state-of-the-art objective metrics on the TDMD, including image-based, point-based, and video-based metrics. Our experimental results highlight the strengths and weaknesses of each metric, and we provide suggestions about the selection of metrics in practical DCM applications. The TDMD will be made publicly available at the following location: https://multimedia.tencent.com/resources/tdmd.
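    Summary Dynamic colored meshes (DCM) are widely used in various applications, but they may undergo processes such as compression or transmission that degrade their quality. To facilitate objective evaluation of DCMs and study how typical distortions affect human perception, we create the Tencent - dynamic colored mesh database (TDMD), containing eight reference DCM objects and six typical distortions. Using processed video sequences (PVS) derived from the DCMs, we conducted a large-scale subjective experiment that yielded 303 distorted DCM samples, each with a mean opinion score, making the TDMD the largest DCM database we know of. The database lets us study the impact of different distortion types on human perception and offer recommendations for DCM compression and related tasks. We also evaluated three types of state-of-the-art objective metrics on the TDMD: image-based, point-based, and video-based. Our experimental results show the strengths and weaknesses of each metric, and we provide suggestions for selecting metrics in practical applications. The TDMD will be made publicly available at https://multimedia.tencent.com/resources/tdmd.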

Estimation of motion blur kernel parameters using regression convolutional neural networks

  • paper_url: http://arxiv.org/abs/2308.01381
  • repo_url: None
  • paper_authors: Luis G. Varela, Laura E. Boucheron, Steven Sandoval, David Voelz, Abu Bucker Siddik
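  • for: The study aims to estimate the blur parameters of images degraded by linear motion blur.
  • methods: Neural-network regression is used to predict the parameters of linear motion blur kernels.
  • results: The paper analyzes the relationship between the length and the angle of linear motion blur, as a foundation for regression prediction on uniform motion blur images.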
    Abstract Many deblurring and blur kernel estimation methods use MAP or classification deep learning techniques to sharpen an image and predict the blur kernel. We propose a regression approach using neural networks to predict the parameters of linear motion blur kernels. These kernels can be parameterized by their blur length and blur orientation. This paper analyzes the relationship between the length and angle of linear motion blur. This analysis helps establish a foundation for using regression prediction on uniform motion blur images.
    Summary Many deblurring and blur kernel estimation methods use MAP or classification deep learning techniques to sharpen images and predict the blur kernel. We propose a regression approach that uses neural networks to predict the parameters of linear motion blur kernels. These kernels can be parameterized by blur length and blur orientation. This paper analyzes the relationship between length and angle in linear motion blur, to establish a foundation for using regression prediction on uniform motion blur images.
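The two regression targets, blur length and blur orientation, fully determine a linear motion blur kernel. The sketch below builds such a kernel with a crude nearest-neighbour rasterisation. The function name, the restriction to odd integer lengths, and the rasterisation scheme are assumptions made for illustration, not the paper's kernel generator.

```python
import math

def motion_blur_kernel(length, angle_deg):
    """Linear motion blur kernel for an odd integer length (pixels) and angle (degrees).

    The kernel is a centred line segment, rasterised by rounding each
    sample to the nearest pixel and normalised to sum to 1.
    """
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd integer in this sketch")
    center = length // 2
    theta = math.radians(angle_deg)
    kernel = [[0.0] * length for _ in range(length)]
    for i in range(length):
        t = i - center  # signed distance from the kernel centre
        col = center + int(round(t * math.cos(theta)))
        row = center - int(round(t * math.sin(theta)))  # image rows grow downward
        kernel[row][col] += 1.0
    total = sum(sum(r) for r in kernel)
    return [[v / total for v in r] for r in kernel]
```

A regression network in this setting would take a blurred image and output the pair (length, angle). Convolving sharp images with kernels like this one is one way to synthesize labeled training data.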

ELIXR: Towards a general purpose X-ray artificial intelligence system through alignment of large language models and radiology vision encoders

  • paper_url: http://arxiv.org/abs/2308.01317
  • repo_url: None
  • paper_authors: Shawn Xu, Lin Yang, Christopher Kelly, Marcin Sieniek, Timo Kohlberger, Martin Ma, Wei-Hung Weng, Attila Kiraly, Sahar Kazemzadeh, Zakkai Melamed, Jungyeon Park, Patricia Strachan, Yun Liu, Chuck Lau, Preeti Singh, Christina Chen, Mozziyar Etemadi, Sreenivasa Raju Kalidindi, Yossi Matias, Katherine Chou, Greg S. Corrado, Shravya Shetty, Daniel Tse, Shruthi Prabhakara, Daniel Golden, Rory Pilgrim, Krish Eswaran, Andrew Sellergren
  • for: The paper is written for the task of developing a lightweight adapter architecture for chest X-ray (CXR) classification and vision-language tasks using a fixed language model (PaLM 2) and a language-aligned image encoder.
  • methods: The paper uses a combination of a language-aligned image encoder and a fixed large language model (PaLM 2) to perform zero-shot CXR classification, data-efficient CXR classification, and semantic search tasks. The authors train the adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset.
  • results: The paper achieves state-of-the-art performance on zero-shot CXR classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings for 1% and 10% training data), and semantic search (0.76 normalized discounted cumulative gain across nineteen queries, including perfect retrieval on twelve of them). The authors also demonstrate the promise of ELIXR on CXR vision-language tasks, achieving overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively.
    Abstract Our approach, which we call Embeddings for Language/Image-aligned X-Rays, or ELIXR, leverages a language-aligned image encoder combined or grafted onto a fixed LLM, PaLM 2, to perform a broad range of tasks. We train this lightweight adapter architecture using images paired with corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings), data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) training data), and semantic search (0.76 normalized discounted cumulative gain (NDCG) across nineteen queries, including perfect retrieval on twelve of them). Compared to existing data-efficient methods including supervised contrastive learning (SupCon), ELIXR required two orders of magnitude less data to reach similar performance. ELIXR also showed promise on CXR vision-language tasks, demonstrating overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance tasks, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.
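    Summary Our approach, which we call Embeddings for Language/Image-aligned X-Rays (ELIXR), combines a language-aligned image encoder with a fixed LLM (PaLM 2) to perform a broad range of tasks. We train this lightweight adapter architecture on images paired with the corresponding free-text radiology reports from the MIMIC-CXR dataset. ELIXR achieved state-of-the-art performance on zero-shot chest X-ray (CXR) classification (mean AUC of 0.850 across 13 findings) and on data-efficient CXR classification (mean AUCs of 0.893 and 0.898 across five findings (atelectasis, cardiomegaly, consolidation, pleural effusion, and pulmonary edema) for 1% (~2,200 images) and 10% (~22,000 images) of the training data). ELIXR also performed well on CXR vision-language tasks, with overall accuracies of 58.7% and 62.5% on visual question answering and report quality assurance, respectively. These results suggest that ELIXR is a robust and versatile approach to CXR AI.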
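The semantic search result is reported as normalized discounted cumulative gain (NDCG). A minimal sketch of the metric follows, assuming graded relevance with linear gain and a log2 rank discount. The abstract does not say which NDCG variant or relevance scale was used, so treat those choices as assumptions.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of relevances listed in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A query whose results are already sorted by relevance scores exactly 1.0, which is what "perfect retrieval" means under this metric.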

A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models

  • paper_url: http://arxiv.org/abs/2308.01328
  • repo_url: None
  • paper_authors: Bilel Guetarni, Feryal Windal, Halim Benhabiles, Marianne Petit, Romain Dubois, Emmanuelle Leteurtre, Dominique Collard
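  • for: The study proposes a deep learning model based on Whole Slide Images (WSI) for classifying subtypes of Diffuse Large B-Cell Lymphoma (DLBCL).
  • methods: A vision transformer-based framework distinguishes DLBCL subtypes from high-resolution WSIs. A multi-modal architecture trains a classifier from various WSI modalities, and a knowledge distillation mechanism then uses that model to drive the learning of a mono-modal classifier efficiently.
  • results: On a dataset of 157 patients, the mono-modal classifier outperforms six recent state-of-the-art methods. An estimated power-law curve indicates that the model needs a reasonable number of additional training patients to potentially reach the same diagnostic accuracy as IHC technologies.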
    Abstract Determining lymphoma subtypes is a crucial step toward better-targeted patient treatment that can potentially increase survival chances. In this context, the existing gold standard diagnosis method, which is based on gene expression technology, is highly expensive and time-consuming, making it difficult to access. Although alternative diagnosis methods based on IHC (immunohistochemistry) technologies exist (recommended by the WHO), they still suffer from similar limitations and are less accurate. WSI (Whole Slide Image) analysis by deep learning models showed promising new directions for cancer diagnosis that would be cheaper and faster than existing alternative methods. In this work, we propose a vision transformer-based framework for distinguishing DLBCL (Diffuse Large B-Cell Lymphoma) cancer subtypes from high-resolution WSIs. To this end, we propose a multi-modal architecture to train a classifier model from various WSI modalities. We then exploit this model through a knowledge distillation mechanism for efficiently driving the learning of a mono-modal classifier. Our experimental study conducted on a dataset of 157 patients shows the promising performance of our mono-modal classification model, outperforming six recent state-of-the-art methods dedicated to cancer classification. Moreover, the power-law curve, estimated on our experimental data, shows that our classification model requires a reasonable number of additional patients for its training to potentially reach identical diagnosis accuracy as IHC technologies.
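    Summary Determining the lymphoma subtype is a key step toward better-targeted treatment and improved survival chances. The existing gold-standard diagnostic method, based on gene expression technology, is very expensive and time-consuming, which limits its accessibility. Alternative diagnostic methods based on IHC (immunohistochemistry) exist, but they suffer from similar limitations and are less accurate. WSI (Whole Slide Image) analysis with deep learning models points to diagnosis that could be cheaper and faster. In this work, we propose a vision transformer-based framework for distinguishing Diffuse Large B-Cell Lymphoma (DLBCL) subtypes from high-resolution WSIs. To this end, we propose a multi-modal architecture for training a classifier model. We then use a knowledge distillation mechanism to drive the learning of a mono-modal classifier efficiently. Our experimental study on a dataset of 157 patients shows that the mono-modal classification model performs well, outperforming six recent methods dedicated to cancer classification. In addition, a power-law curve estimated on our experimental data shows that the model needs a reasonable number of additional training patients to potentially reach the same diagnostic accuracy as IHC technologies.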
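The abstract does not describe the distillation mechanism in detail. As a hedged illustration only, the sketch below shows the widely used soft-target formulation, in which a mono-modal student is trained to match the temperature-softened class distribution of a multi-modal teacher. The function names and the temperature value are assumptions, and the paper's actual loss may differ.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl
```

In training, a term like this is typically added to the ordinary classification loss on the ground-truth subtype labels, so that the student learns both from the labels and from the teacher.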

Incorporating Season and Solar Specificity into Renderings made by a NeRF Architecture using Satellite Images

  • paper_url: http://arxiv.org/abs/2308.01262
  • repo_url: https://github.com/enterprisecv-6/season-nerf
  • paper_authors: Michael Gableman, Avinash Kak
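  • for: The study renders scenes from novel viewpoints in a NeRF-based framework trained on satellite images, taking the solar angle into account and making the renderings season-specific.
  • methods: Building on Shadow NeRF and Sat-NeRF, the work introduces time of the year as an additional input variable to teach the network to render seasonal features, and adds loss terms that discourage the network from using seasonal features to account for shadows.
  • results: The framework accurately renders novel views, generates height maps, predicts shadows, and specifies seasonal features independently of shadows.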
    Abstract As a result of Shadow NeRF and Sat-NeRF, it is possible to take the solar angle into account in a NeRF-based framework for rendering a scene from a novel viewpoint using satellite images for training. Our work extends those contributions and shows how one can make the renderings season-specific. Our main challenge was creating a Neural Radiance Field (NeRF) that could render seasonal features independently of viewing angle and solar angle while still being able to render shadows. We teach our network to render seasonal features by introducing one more input variable -- time of the year. However, the small training datasets typical of satellite imagery can introduce ambiguities in cases where shadows are present in the same location for every image of a particular season. We add additional terms to the loss function to discourage the network from using seasonal features for accounting for shadows. We show the performance of our network on eight Areas of Interest containing images captured by the Maxar WorldView-3 satellite. This evaluation includes tests measuring the ability of our framework to accurately render novel views, generate height maps, predict shadows, and specify seasonal features independently from shadows. Our ablation studies justify the choices made for network design parameters.
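    Summary Thanks to Shadow NeRF and Sat-NeRF, a NeRF-based framework trained on satellite images can take the solar angle into account when rendering a scene from a novel viewpoint. Our work extends those contributions and shows how to make the renderings season-specific. Our main challenge was creating a Neural Radiance Field (NeRF) that renders seasonal features independently of viewing angle and solar angle while still rendering shadows. We teach our network to render seasonal features by introducing one more input variable: time of the year. However, the small training datasets typical of satellite imagery can introduce ambiguity when shadows appear in the same location in every image of a given season. We add terms to the loss function to discourage the network from using seasonal features to account for shadows. We evaluate our framework on eight Areas of Interest, with tests covering novel view rendering, height map generation, shadow prediction, and rendering seasonal features independently of shadows. Our ablation studies justify the choices made for the network design parameters.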
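The abstract says only that time of the year is added as an input variable, not how it is encoded. One plausible encoding, shown here purely as an assumption, maps the day of the year onto the unit circle so that late December and early January end up close together. The function name and the 365-day period are hypothetical.

```python
import math

def encode_time_of_year(day_of_year, days_in_year=365.0):
    """Map a day of the year to a point (sin, cos) on the unit circle."""
    angle = 2.0 * math.pi * (day_of_year / days_in_year)
    return (math.sin(angle), math.cos(angle))
```

A cyclical feature like this avoids the artificial jump that a raw day index would have at the year boundary. Whether the authors use this or a different representation is not stated in the abstract.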