results: We evaluate our method on skin lesion classification with the HAM10000 dataset and on predicting future lesional activity in multiple sclerosis patients. The results show that our method effectively controls the calibration error of the worst-performing subgroup while preserving prediction performance and outperforming recent baselines.
Abstract
Trustworthy deployment of deep learning medical imaging models into real-world clinical practice requires that they be calibrated. However, models that are well calibrated overall can still be poorly calibrated for a sub-population, potentially resulting in a clinician unwittingly making poor decisions for this group based on the recommendations of the model. Although methods have been shown to successfully mitigate biases across subgroups in terms of model accuracy, this work focuses on the open problem of mitigating calibration biases in the context of medical image analysis. Our method does not require subgroup attributes during training, permitting the flexibility to mitigate biases for different choices of sensitive attributes without re-training. To this end, we propose a novel two-stage method, Cluster-Focal, which first identifies poorly calibrated samples, clusters them into groups, and then applies a group-wise focal loss to mitigate calibration bias. We evaluate our method on skin lesion classification with the public HAM10000 dataset, and on predicting future lesional activity for multiple sclerosis (MS) patients. In addition to traditional sensitive attributes (e.g., age, sex) that define demographic subgroups, we also consider biases among groups with different image-derived attributes, such as lesion load, which are relevant in medical image analysis. Our results demonstrate that our method effectively controls calibration error in the worst-performing subgroups while preserving prediction performance and outperforming recent baselines.
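To make the two-stage idea concrete, the following minimal sketch shows what the second stage's group-wise focal loss could look like, with each cluster from the first (identification) stage receiving its own focusing parameter. The function name, the per-group `gamma` values, and the tensor shapes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def groupwise_focal_loss(logits, targets, group_ids, gamma_per_group):
    """Focal loss with a per-cluster focusing parameter (sketch).

    logits:          (N, C) raw model outputs
    targets:         (N,) integer class labels
    group_ids:       (N,) cluster index assigned in the first stage
    gamma_per_group: (G,) focusing parameter for each of the G clusters
    """
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets, reduction="none")            # per-sample CE
    pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).exp()  # prob. of true class
    gamma = gamma_per_group[group_ids]                               # per-sample gamma
    return ((1.0 - pt) ** gamma * ce).mean()
```

Under this reading, clusters identified as poorly calibrated would receive larger `gamma`, concentrating the loss on samples whose confidence is misaligned with correctness.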
Once-Training-All-Fine: No-Reference Point Cloud Quality Assessment via Domain-relevance Degradation Description
results: Experimental results show that the proposed D$^3$-PCQA method exhibits robust performance and strong generalization ability on several publicly available datasets.
Abstract
Full-reference (FR) point cloud quality assessment (PCQA) has achieved impressive progress in recent years. However, as reference point clouds are not available in many cases, no-reference (NR) metrics have become a research hotspot. Existing NR methods suffer from poor generalization performance. To address this shortcoming, we propose a novel NR-PCQA method, Point Cloud Quality Assessment via Domain-relevance Degradation Description (D$^3$-PCQA). First, we demonstrate our model's interpretability by deriving the function of each module using a kernelized ridge regression model. Specifically, quality assessment can be characterized as a leap from the scattered perception domain (reflecting subjective perception) to the ordered quality domain (reflecting mean opinion score). Second, to reduce the significant domain discrepancy, we establish an intermediate domain, the description domain, based on insights from subjective experiments, by considering the domain relevance among samples located in the perception domain and learning a structured latent space. Anchor features derived from the learned latent space serve as cross-domain auxiliary information to promote domain transformation. Furthermore, the newly established description domain decomposes the NR-PCQA problem into two relevant stages: a classification stage that assigns degradation descriptions to point clouds, and a regression stage that determines the confidence degrees of these descriptions, providing a semantic explanation for the predicted quality scores. Experimental results demonstrate that D$^3$-PCQA exhibits robust performance and outstanding generalization ability on several publicly available datasets. The code in this work will be publicly available at https://smt.sjtu.edu.cn.
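As a hedged illustration of how the two stages could yield an interpretable score, the sketch below combines the classification stage's degradation descriptions with the regression stage's confidence degrees into a single quality prediction. The description labels, anchor scores, and weighting rule are assumptions for illustration; the abstract does not specify this exact mapping.

```python
import numpy as np

# Hypothetical degradation descriptions and an assumed quality anchor for each.
DESCRIPTIONS = ["imperceptible", "slightly annoying", "annoying", "very annoying"]
LEVEL_SCORES = np.array([5.0, 4.0, 2.5, 1.0])   # assumed MOS-like anchors

def quality_from_descriptions(confidences):
    """Confidence-weighted quality score for one point cloud (sketch).

    confidences: (4,) non-negative confidence degrees from the regression stage.
    """
    w = confidences / confidences.sum()          # normalize to a distribution
    return float(w @ LEVEL_SCORES)

# Example: mostly "annoying" with some uncertainty -> score near 2.5
print(quality_from_descriptions(np.array([0.05, 0.15, 0.60, 0.20])))
```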
Spatio-Temporal Perception-Distortion Trade-off in Learned Video SR
results: Experimental results show that the proposed measures and framework better assess perceptual quality in video super-resolution, and support the hypothesis that video quality assessment should account for the naturalness of motion.
Abstract
Perception-distortion trade-off is well-understood for single-image super-resolution. However, its extension to video super-resolution (VSR) is not straightforward, since popular perceptual measures only evaluate the naturalness of spatial textures and do not take the naturalness of flow (temporal coherence) into account. To this end, we propose a new measure of spatio-temporal perceptual video quality that emphasizes the naturalness of optical flow via the perceptual straightness hypothesis (PSH), enabling a meaningful spatio-temporal perception-distortion trade-off. We also propose a new architecture for perceptual VSR (PSVR) that explicitly enforces naturalness of flow to achieve a realistic spatio-temporal perception-distortion trade-off according to the proposed measures. Experimental results with PSVR support the hypothesis that a meaningful perception-distortion trade-off for video should account for the naturalness of motion in addition to the naturalness of texture.
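The perceptual straightness hypothesis holds that natural videos trace nearly straight trajectories in a perceptual representation space, so the curvature of a frame-feature trajectory can serve as a naturalness-of-flow measure. The sketch below computes the mean turning angle between successive trajectory steps; the choice of feature extractor and this exact formulation are assumptions, not the paper's definition.

```python
import numpy as np

def trajectory_curvature(features):
    """Mean turning angle (radians) along a frame-feature trajectory (sketch).

    features: (T, D) array, one feature vector per frame from some
    perceptual front-end. Values near 0 indicate a straight trajectory
    (natural, under the PSH); larger values indicate curved motion.
    """
    steps = np.diff(features, axis=0)                                # (T-1, D)
    steps = steps / (np.linalg.norm(steps, axis=1, keepdims=True) + 1e-8)
    cos = np.clip(np.sum(steps[:-1] * steps[1:], axis=1), -1.0, 1.0)
    return float(np.mean(np.arccos(cos)))
```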
Convolutional Transformer for Autonomous Recognition and Grading of Tomatoes Under Various Lighting, Occlusion, and Ripeness Conditions
results: After training and testing, the proposed method recognizes and grades tomatoes well across lighting conditions and viewing angles, surpassing baseline and prior methods by 58.14%, 65.42%, and 66.39% in mean average precision.
Abstract
Harvesting fully ripe tomatoes with mobile robots presents significant challenges in real-world scenarios. These challenges arise from factors such as occlusion caused by leaves and branches, as well as the color similarity between tomatoes and the surrounding foliage during the fruit development stage. The natural environment further compounds these issues with varying light conditions, viewing angles, occlusion factors, and different maturity levels. To overcome these obstacles, this research introduces a novel framework that leverages a convolutional transformer architecture to autonomously recognize and grade tomatoes, irrespective of their occlusion level, lighting conditions, and ripeness. The proposed model is trained and tested using carefully annotated images curated specifically for this purpose. The dataset is prepared under various lighting conditions and viewing perspectives, and employs different mobile camera sensors, distinguishing it from existing datasets such as Laboro Tomato and Rob2Pheno Annotated Tomato. The effectiveness of the proposed framework in handling cluttered and occluded tomato instances was evaluated using these two public datasets as benchmarks. The evaluation results across the three datasets demonstrate the exceptional performance of our proposed framework, surpassing the state-of-the-art by 58.14%, 65.42%, and 66.39% in mean average precision on KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato, respectively. The results underscore the superiority of the proposed model in accurately detecting and delineating tomatoes compared to baseline methods and previous approaches. Specifically, the model achieves an F1-score of 80.14%, a Dice coefficient of 73.26%, and a mean IoU of 66.41% on the KUTomaData image dataset.
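For reference, the reported Dice coefficient and mean IoU follow their standard definitions for binary masks; a minimal sketch (the metric definitions are standard, the function itself is ours):

```python
import numpy as np

def dice_and_iou(pred, gt, eps=1e-8):
    """Dice coefficient and IoU for two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (union + eps)
    return float(dice), float(iou)
```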
H-DenseFormer: An Efficient Hybrid Densely Connected Transformer for Multimodal Tumor Segmentation
results: Extensive experiments on two public multimodal datasets show that the proposed method outperforms existing state-of-the-art methods while having lower computational complexity.
Abstract
Recently, deep learning methods have been widely used for tumor segmentation of multimodal medical images with promising results. However, most existing methods are limited by insufficient representational ability, specific modality number and high computational complexity. In this paper, we propose a hybrid densely connected network for tumor segmentation, named H-DenseFormer, which combines the representational power of the Convolutional Neural Network (CNN) and the Transformer structures. Specifically, H-DenseFormer integrates a Transformer-based Multi-path Parallel Embedding (MPE) module that can take an arbitrary number of modalities as input to extract the fusion features from different modalities. Then, the multimodal fusion features are delivered to different levels of the encoder to enhance multimodal learning representation. Besides, we design a lightweight Densely Connected Transformer (DCT) block to replace the standard Transformer block, thus significantly reducing computational complexity. We conduct extensive experiments on two public multimodal datasets, HECKTOR21 and PI-CAI22. The experimental results show that our proposed method outperforms the existing state-of-the-art methods while having lower computational complexity. The source code is available at https://github.com/shijun18/H-DenseFormer.
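A minimal sketch of what an MPE-style module could look like: one lightweight embedding path per modality, fused so that the number of input modalities is arbitrary. The channel sizes, normalization, and mean-fusion rule are assumptions for illustration, not the released H-DenseFormer code.

```python
import torch
import torch.nn as nn

class MultiPathParallelEmbedding(nn.Module):
    """Sketch of a multi-path parallel embedding: one path per modality,
    fused by averaging so any number of modalities can be handled."""

    def __init__(self, num_modalities, embed_dim=64):
        super().__init__()
        self.paths = nn.ModuleList([
            nn.Sequential(
                nn.Conv3d(1, embed_dim, kernel_size=3, padding=1),
                nn.InstanceNorm3d(embed_dim),
                nn.GELU(),
            )
            for _ in range(num_modalities)
        ])

    def forward(self, x):
        # x: (B, M, D, H, W) with one channel per modality
        feats = [path(x[:, i:i + 1]) for i, path in enumerate(self.paths)]
        return torch.stack(feats, dim=0).mean(dim=0)  # fused (B, C, D, H, W)
```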
Zero-DeepSub: Zero-Shot Deep Subspace Reconstruction for Rapid Multiparametric Quantitative MRI Using 3D-QALAS
paper_authors: Yohan Jun, Yamin Arefeen, Jaejin Cho, Shohei Fujita, Xiaoqing Wang, P. Ellen Grant, Borjan Gagoski, Camilo Jaimes, Michael S. Gee, Berkin Bilgic
for: develop and evaluate methods for reconstructing 3D-quantification using an interleaved Look-Locker acquisition sequence with T2 preparation pulse (3D-QALAS) time-series images
methods: using a low-rank subspace method and zero-shot deep-learning subspace method (Zero-DeepSub) for rapid and high fidelity T1 and T2 mapping and time-resolved imaging
results: good linearity and reduced biases compared to conventional QALAS, better g-factor maps and reduced voxel blurring, noise, and artifacts compared to conventional QALAS, and robust performance at up to 9-fold acceleration with Zero-DeepSub, enabling whole-brain T1, T2, and PD mapping at 1 mm isotropic resolution within 2 min of scan time.
Abstract
Purpose: To develop and evaluate methods for 1) reconstructing 3D-quantification using an interleaved Look-Locker acquisition sequence with T2 preparation pulse (3D-QALAS) time-series images using a low-rank subspace method, which enables accurate and rapid T1 and T2 mapping, and 2) improving the fidelity of subspace QALAS by combining scan-specific deep-learning-based reconstruction and subspace modeling. Methods: A low-rank subspace method for 3D-QALAS (i.e., subspace QALAS) and zero-shot deep-learning subspace method (i.e., Zero-DeepSub) were proposed for rapid and high fidelity T1 and T2 mapping and time-resolved imaging using 3D-QALAS. Using an ISMRM/NIST system phantom, the accuracy of the T1 and T2 maps estimated using the proposed methods was evaluated by comparing them with reference techniques. The reconstruction performance of the proposed subspace QALAS using Zero-DeepSub was evaluated in vivo and compared with conventional QALAS at high reduction factors of up to 9-fold. Results: Phantom experiments showed that subspace QALAS had good linearity with respect to the reference methods while reducing biases compared to conventional QALAS, especially for T2 maps. Moreover, in vivo results demonstrated that subspace QALAS had better g-factor maps and could reduce voxel blurring, noise, and artifacts compared to conventional QALAS and showed robust performance at up to 9-fold acceleration with Zero-DeepSub, which enabled whole-brain T1, T2, and PD mapping at 1 mm isotropic resolution within 2 min of scan time. Conclusion: The proposed subspace QALAS along with Zero-DeepSub enabled high fidelity and rapid whole-brain multiparametric quantification and time-resolved imaging.
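To illustrate the low-rank subspace idea, the sketch below builds a temporal basis from an SVD of Bloch-simulated QALAS signal evolutions and projects measured time series onto it. This image-domain least-squares version is a simplification for illustration; the actual reconstruction solves for subspace coefficients from undersampled k-space, and the dictionary details are assumptions.

```python
import numpy as np

def temporal_subspace(dictionary, rank):
    """Temporal basis from an SVD of simulated signal evolutions (sketch).

    dictionary: (T, S) array of simulated QALAS signals for S candidate
                (T1, T2) pairs; T is the number of acquisition time points.
    Returns an orthonormal basis Phi of shape (T, rank).
    """
    u, _, _ = np.linalg.svd(dictionary, full_matrices=False)
    return u[:, :rank]

def project_to_subspace(timeseries, phi):
    """Least-squares subspace coefficients for measured time series.

    timeseries: (T, V) time series for V voxels. phi @ coeffs then
    approximates the data with only `rank` images' worth of unknowns.
    """
    coeffs, *_ = np.linalg.lstsq(phi, timeseries, rcond=None)
    return coeffs  # (rank, V)
```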
A CNN regression model to estimate buildings height maps using Sentinel-1 SAR and Sentinel-2 MSI time series
results: Preliminary results (3.73 m RMSE, 0.95 IoU, 0.61 R2) show that MBHR-Net estimates building heights accurately, demonstrating the deep learning model's potential for practical applications such as urban planning and environmental impact analysis.
Abstract
Accurate estimation of building heights is essential for urban planning, infrastructure management, and environmental analysis. In this study, we propose a supervised Multimodal Building Height Regression Network (MBHR-Net) for estimating building heights at 10m spatial resolution using Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series. S1 provides Synthetic Aperture Radar (SAR) data that offers valuable information on building structures, while S2 provides multispectral data that is sensitive to different land cover types, vegetation phenology, and building shadows. Our MBHR-Net aims to extract meaningful features from the S1 and S2 images to learn complex spatio-temporal relationships between image patterns and building heights. The model is trained and tested in 10 cities in the Netherlands. Root Mean Squared Error (RMSE), Intersection over Union (IOU), and R-squared (R2) score metrics are used to evaluate the performance of the model. The preliminary results (3.73m RMSE, 0.95 IoU, 0.61 R2) demonstrate the effectiveness of our deep learning model in accurately estimating building heights, showcasing its potential for urban planning, environmental impact analysis, and other related applications.
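The reported RMSE and R2 follow their standard definitions for continuous height maps (IoU additionally requires binarizing heights into built-up masks); a minimal sketch of the two regression metrics:

```python
import numpy as np

def regression_metrics(pred, gt):
    """RMSE and R2 for a predicted vs. reference building-height map."""
    pred, gt = pred.ravel(), gt.ravel()
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    ss_res = np.sum((gt - pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((gt - gt.mean()) ** 2)     # total sum of squares
    r2 = float(1.0 - ss_res / ss_tot)
    return rmse, r2
```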
Investigating Data Memorization in 3D Latent Diffusion Models for Medical Image Synthesis
paper_authors: Salman Ul Hassan Dar, Arman Ghanaat, Jannik Kahmann, Isabelle Ayx, Theano Papavassiliu, Stefan O. Schoenberg, Sandy Engelhardt
for: The goal of this paper is to assess the capacity of 3D latent diffusion models to memorize training data when generating synthetic medical images.
methods: The paper uses self-supervised models based on contrastive learning to detect potential memorization.
results: The results indicate that these latent diffusion models do memorize training data, and strategies are needed to mitigate this memorization.
Abstract
Generative latent diffusion models have been established as state-of-the-art in data generation. One promising application is generation of realistic synthetic medical imaging data for open data sharing without compromising patient privacy. Despite the promise, the capacity of such models to memorize sensitive patient training data and synthesize samples showing high resemblance to training data samples is relatively unexplored. Here, we assess the memorization capacity of 3D latent diffusion models on photon-counting coronary computed tomography angiography and knee magnetic resonance imaging datasets. To detect potential memorization of training samples, we utilize self-supervised models based on contrastive learning. Our results suggest that such latent diffusion models indeed memorize training data, and there is a dire need for devising strategies to mitigate memorization.
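A hedged sketch of the detection step: embed training and synthetic volumes with a contrastive encoder, then flag synthetic samples whose nearest training neighbor is suspiciously similar. The cosine-similarity criterion and the threshold value are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def flag_memorized(train_emb, synth_emb, threshold=0.95):
    """Flag synthetic samples that nearly duplicate a training sample (sketch).

    train_emb: (N, D) contrastive embeddings of training images
    synth_emb: (M, D) contrastive embeddings of synthetic images
    threshold: cosine-similarity cutoff (an assumed value)
    Returns a boolean flag and the nearest-neighbor similarity per sample.
    """
    t = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    s = synth_emb / np.linalg.norm(synth_emb, axis=1, keepdims=True)
    sims = s @ t.T                     # (M, N) cosine similarities
    nearest = sims.max(axis=1)         # best training match per synthetic sample
    return nearest >= threshold, nearest
```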