eess.IV - 2023-07-31

Hybrid quantum transfer learning for crack image classification on NISQ hardware

  • paper_url: http://arxiv.org/abs/2307.16723
  • repo_url: None
  • paper_authors: Alexander Geng, Ali Moghiseh, Claudia Redenbach, Katja Schladitz
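  • for: This study applies quantum machine learning to detecting cracks in gray value images.
  • methods: The study uses hybrid quantum transfer learning and compares performance and training time across PennyLane's standard qubits, IBM's qasm_simulator, and real backends.
  • results: The study reports that quantum transfer learning can detect cracks in gray value images and can be run on different quantum backends, including real hardware.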
    Abstract Quantum computers possess the potential to process data using a remarkably reduced number of qubits compared to conventional bits, as per theoretical foundations. However, recent experiments have indicated that the practical feasibility of retrieving an image from its quantum encoded version is currently limited to very small image sizes. Despite this constraint, variational quantum machine learning algorithms can still be employed in the current noisy intermediate scale quantum (NISQ) era. An example is a hybrid quantum machine learning approach for edge detection. In our study, we present an application of quantum transfer learning for detecting cracks in gray value images. We compare the performance and training time of PennyLane's standard qubits with IBM's qasm_simulator and real backends, offering insights into their execution efficiency.
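    Summary In theory, quantum computers can process data using far fewer qubits than conventional bits. However, recent experiments indicate that retrieving an image from its quantum-encoded version is currently feasible only for very small images. Despite this limitation, variational quantum machine learning algorithms can still be used in the current noisy intermediate scale quantum (NISQ) era. In this study, we apply quantum transfer learning to detect cracks in gray value images. We compare the performance and training time of PennyLane's standard qubits with IBM's qasm_simulator and real backends.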
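To make the idea of a variational quantum layer concrete, the following is a minimal sketch of a single-qubit "quantum head" simulated with plain Python state vectors. It is not the authors' circuit and does not use PennyLane or IBM backends; the function names (`ry`, `quantum_head`, `classify`), the single-feature angle encoding, and the decision threshold are all assumptions chosen for illustration. In a transfer learning setup, the input feature would come from a frozen, pre-trained classical network.

```python
import math

def ry(theta):
    """Return the 2x2 matrix of a single-qubit RY rotation by angle theta."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply(gate, state):
    """Apply a 2x2 gate to a two-amplitude state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]

def quantum_head(feature, weight):
    """Hypothetical one-qubit variational layer.

    Encodes one classical feature as a rotation angle, applies one
    trainable rotation, and returns the expectation value of Pauli-Z.
    """
    state = [1.0, 0.0]                  # start in |0>
    state = apply(ry(feature), state)   # angle encoding of the feature
    state = apply(ry(weight), state)    # trainable rotation
    return state[0] ** 2 - state[1] ** 2  # <Z> for real amplitudes

def classify(feature, weight):
    """Assumed decision rule: negative <Z> means 'crack'."""
    return "crack" if quantum_head(feature, weight) < 0 else "no crack"
```

For this particular circuit the two rotations compose, so the output equals `cos(feature + weight)`; training would adjust `weight` to place the decision boundary.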

Conditioning Generative Latent Optimization to solve Imaging Inverse Problems

  • paper_url: http://arxiv.org/abs/2307.16670
  • repo_url: None
  • paper_authors: Thomas Braure, Kévin Ginsburger
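  • for: The paper targets imaging inverse problems (IIPs) in medical imaging, particularly CT reconstruction from sparse X-ray projections.
  • methods: The paper proposes cGLO, a fully unsupervised conditional approach to Generative Latent Optimization: a decoder network is initialized on an unsupervised dataset and then used for reconstruction with a loss that directly compares simulated measurements with experimental measurements.
  • results: On sparse-view CT, cGLO gives better reconstruction quality than state-of-the-art score-based methods in most data regimes, with a growing advantage for smaller training datasets and fewer projection angles, and it requires no backward operator.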
    Abstract Computed Tomography (CT) is a prominent example of Imaging Inverse Problem (IIP), highlighting the unrivalled performances of data-driven methods in degraded measurements setups like sparse X-ray projections. Although a significant proportion of deep learning approaches benefit from large supervised datasets to directly map experimental measurements to medical scans, they cannot generalize to unknown acquisition setups. In contrast, fully unsupervised techniques, most notably using score-based generative models, have recently demonstrated similar or better performances compared to supervised approaches to solve IIPs while being flexible at test time regarding the imaging setup. However, their use cases are limited by two factors: (a) they need considerable amounts of training data to have good generalization properties and (b) they require a backward operator, like Filtered-Back-Projection in the case of CT, to condition the learned prior distribution of medical scans to experimental measurements. To overcome these issues, we propose an unsupervised conditional approach to the Generative Latent Optimization framework (cGLO), in which the parameters of a decoder network are initialized on an unsupervised dataset. The decoder is then used for reconstruction purposes, by performing Generative Latent Optimization with a loss function directly comparing simulated measurements from proposed reconstructions to experimental measurements. The resulting approach, tested on sparse-view CT using multiple training dataset sizes, demonstrates better reconstruction quality compared to state-of-the-art score-based strategies in most data regimes and shows an increasing performance advantage for smaller training datasets and reduced projection angles. Furthermore, cGLO does not require any backward operator and could expand use cases even to non-linear IIPs.
    Summary Computed Tomography (CT) is a prominent example of an Imaging Inverse Problem (IIP), highlighting the unrivalled performance of data-driven methods in degraded measurement setups. Although many deep learning approaches benefit from large supervised datasets that map experimental measurements directly to medical scans, they cannot generalize to unknown acquisition setups. In contrast, fully unsupervised techniques, most notably score-based generative models, have recently shown similar or better performance than supervised approaches while remaining flexible about the imaging setup at test time. However, their use cases are limited by two factors: (a) they need considerable amounts of training data to generalize well, and (b) they require a backward operator, such as Filtered-Back-Projection in the case of CT, to condition the learned prior distribution of medical scans on experimental measurements. To overcome these issues, we propose an unsupervised conditional approach to the Generative Latent Optimization framework (cGLO), in which the parameters of a decoder network are initialized on an unsupervised dataset. The decoder is then used for reconstruction by performing Generative Latent Optimization with a loss function that directly compares simulated measurements from proposed reconstructions with experimental measurements. The resulting approach, tested on sparse-view CT with multiple training dataset sizes, demonstrates better reconstruction quality than state-of-the-art score-based strategies in most data regimes and shows an increasing performance advantage for smaller training datasets and reduced projection angles. Furthermore, cGLO does not require any backward operator and could expand use cases even to non-linear IIPs.
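The reconstruction step described above (optimizing against a loss that compares simulated and experimental measurements) can be sketched on a toy linear problem. This is only an illustration under strong assumptions: the decoder is a fixed matrix `D` rather than a neural network, the forward operator is a small matrix `A`, only the latent code `z` is optimized, and plain gradient descent is used. The names `latent_optimization`, `A`, `D`, and `y` are hypothetical and not taken from the paper.

```python
def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    """Matrix-matrix product."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def latent_optimization(A, D, y, steps=500, lr=0.05):
    """Minimize ||A D z - y||^2 over the latent code z by gradient descent.

    Only the forward operator A is applied to candidate reconstructions;
    no filtered-back-projection style inverse is used anywhere.
    Returns the reconstruction D z.
    """
    AD = matmul(A, D)          # simulated measurement of a decoded latent
    ADt = transpose(AD)
    z = [0.0] * len(D[0])      # initial latent code
    for _ in range(steps):
        residual = [p - t for p, t in zip(matvec(AD, z), y)]
        grad = [2.0 * g for g in matvec(ADt, residual)]
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return matvec(D, z)
```

In this toy setting a three-pixel "scan" is recovered from two measurements because the decoder restricts reconstructions to a two-dimensional latent space, which is a loose analogue of how a learned prior compensates for sparse views.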

Towards General Low-Light Raw Noise Synthesis and Modeling

  • paper_url: http://arxiv.org/abs/2307.16508
  • repo_url: https://github.com/fengzhang427/LRD
  • paper_authors: Feng Zhang, Bin Xu, Zhiqiang Li, Xinran Liu, Qingbo Lu, Changxin Gao, Nong Sang
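  • for: This study addresses the problem of modeling and synthesizing low-light raw noise for computational photography and image processing applications.
  • methods: The authors synthesize signal-dependent noise with a physics-based model and signal-independent noise with a learned generative model, and introduce a multi-scale Fourier transformer discriminator (FTD) to distinguish the noise distribution.
  • results: The method can simultaneously learn the noise characteristics of different ISO levels and generalize to various sensors. Denoising experiments show that it performs favorably against state-of-the-art methods on different sensors.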
    Abstract Modeling and synthesizing low-light raw noise is a fundamental problem for computational photography and image processing applications. Although most recent works have adopted physics-based models to synthesize noise, the signal-independent noise in low-light conditions is far more complicated and varies dramatically across camera sensors, which is beyond the description of these models. To address this issue, we introduce a new perspective to synthesize the signal-independent noise by a generative model. Specifically, we synthesize the signal-dependent and signal-independent noise in a physics- and learning-based manner, respectively. In this way, our method can be considered as a general model, that is, it can simultaneously learn different noise characteristics for different ISO levels and generalize to various sensors. Subsequently, we present an effective multi-scale discriminator termed Fourier transformer discriminator (FTD) to distinguish the noise distribution accurately. Additionally, we collect a new low-light raw denoising (LRD) dataset for training and benchmarking. Qualitative validation shows that the noise generated by our proposed noise model can be highly similar to the real noise in terms of distribution. Furthermore, extensive denoising experiments demonstrate that our method performs favorably against state-of-the-art methods on different sensors.
    Summary Modeling and synthesizing low-light raw noise is a fundamental problem in computational photography and image processing. Most recent works synthesize noise with physics-based models, but the signal-independent noise in low-light conditions is far more complicated, varies dramatically across camera sensors, and is beyond the description of these models. To address this, the authors synthesize the signal-independent noise with a generative model while keeping a physics-based model for the signal-dependent noise. The resulting general model can learn different noise characteristics for different ISO levels and generalize to various sensors. The authors also present a multi-scale Fourier transformer discriminator (FTD) to distinguish the noise distribution accurately and collect a new low-light raw denoising (LRD) dataset for training and benchmarking. Qualitative validation shows that the generated noise can be highly similar to real noise in terms of distribution, and extensive denoising experiments show that the method performs favorably against state-of-the-art methods on different sensors.
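The split between signal-dependent and signal-independent noise can be illustrated with a small sampler. This is a sketch under assumptions, not the paper's model: the signal-dependent part is drawn as Poisson shot noise scaled by a hypothetical `gain`, and the signal-independent part, which the paper learns with a generative model, is replaced here by a simple Gaussian stand-in with standard deviation `read_sigma`. All function and parameter names are illustrative.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson sample with Knuth's multiplication method.

    Suitable for the small expected photon counts of low-light pixels.
    """
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def synthesize_noisy_pixel(clean, gain, read_sigma, rng):
    """Return a noisy raw value for a clean pixel value.

    clean / gain is the expected photon count (signal-dependent part).
    The Gaussian term is a placeholder for the learned
    signal-independent noise generator.
    """
    photons = sample_poisson(clean / gain, rng)
    signal_independent = rng.gauss(0.0, read_sigma)
    return gain * photons + signal_independent
```

Under these assumptions the noisy value has mean `clean` and variance `gain * clean + read_sigma ** 2`, so the first term grows with the signal while the second does not.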

A hybrid approach for improving U-Net variants in medical image segmentation

  • paper_url: http://arxiv.org/abs/2307.16462
  • repo_url: None
  • paper_authors: Aitik Gupta, Dr. Joydip Dhar
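  • for: The paper targets medical image segmentation and aims to keep segmentation accurate while making networks more efficient.
  • methods: The paper reviews deep learning approaches such as MultiResUNet, Attention U-Net, and other U-Net variants, and uses depthwise separable convolutions to reduce network parameter requirements while maintaining performance.
  • results: The work combines an attention system with residual connections and evaluates on medical image segmentation tasks such as skin lesion segmentation, aiming to maintain performance with fewer parameters.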
    Abstract Medical image segmentation is vital to the area of medical imaging because it enables professionals to more accurately examine and understand the information offered by different imaging modalities. The technique of splitting a medical image into various segments or regions of interest is known as medical image segmentation. The segmented images that are produced can be used for many different things, including diagnosis, surgery planning, and therapy evaluation. In initial phase of research, major focus has been given to review existing deep-learning approaches, including researches like MultiResUNet, Attention U-Net, classical U-Net, and other variants. The attention feature vectors or maps dynamically add important weights to critical information, and most of these variants use these to increase accuracy, but the network parameter requirements are somewhat more stringent. They face certain problems such as overfitting, as their number of trainable parameters is very high, and so is their inference time. Therefore, the aim of this research is to reduce the network parameter requirements using depthwise separable convolutions, while maintaining performance over some medical image segmentation tasks such as skin lesion segmentation using attention system and residual connections.
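    Summary Medical image segmentation is a key technique in medical imaging because it lets professionals examine and understand the information provided by different imaging modalities more accurately. It splits a medical image into segments or regions of interest, which can then be used for diagnosis, surgery planning, and therapy evaluation. The initial phase of the research reviews existing deep learning approaches, including MultiResUNet, Attention U-Net, the classical U-Net, and other variants. Most of these variants use attention feature vectors or maps, which dynamically add weight to critical information, to increase accuracy. However, their parameter requirements are high, which leads to overfitting and long inference times. The aim of this research is therefore to reduce the network parameter requirements with depthwise separable convolutions while maintaining performance on medical image segmentation tasks such as skin lesion segmentation, using an attention system and residual connections.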
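The parameter saving from depthwise separable convolutions follows from simple counting. The sketch below compares weight counts for one layer, ignoring biases; the helper names and the example layer size are assumptions for illustration and are not taken from the paper.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one kernel x kernel filter
    per (input channel, output channel) pair."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution:
    one kernel x kernel filter per input channel (depthwise),
    followed by a 1x1 convolution mixing channels (pointwise)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example layer: 3x3 kernel, 64 input channels, 128 output channels.
standard = standard_conv_params(3, 64, 128)          # 73728
separable = depthwise_separable_params(3, 64, 128)   # 8768
```

For this example layer the separable version needs roughly 8.4 times fewer weights, which is the kind of reduction the summary refers to.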

High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation

  • paper_url: http://arxiv.org/abs/2307.16426
  • repo_url: https://github.com/jqtangust/epce-hdr
  • paper_authors: Jiaqi Tang, Xiaogang Xu, Sixing Hu, Ying-Cong Chen
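  • for: Limited camera capacity restricts the dynamic range of digital images. The paper aims to recover the dynamic range so that images better represent real-world scenes.
  • methods: The authors propose a single network that explicitly estimates the tone-mapping function and its corresponding HDR image, and build a new dataset from synthetic and real images to validate the method.
  • results: Experiments show that the method generalizes well under different tone-mapping functions and achieves state-of-the-art performance.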
    Abstract Due to limited camera capacities, digital images usually have a narrower dynamic illumination range than real-world scene radiance. To resolve this problem, High Dynamic Range (HDR) reconstruction is proposed to recover the dynamic range to better represent real-world scenes. However, due to different physical imaging parameters, the tone-mapping functions between images and real radiance are highly diverse, which makes HDR reconstruction extremely challenging. Existing solutions can not explicitly clarify a corresponding relationship between the tone-mapping function and the generated HDR image, but this relationship is vital when guiding the reconstruction of HDR images. To address this problem, we propose a method to explicitly estimate the tone mapping function and its corresponding HDR image in one network. Firstly, based on the characteristics of the tone mapping function, we construct a model by a polynomial to describe the trend of the tone curve. To fit this curve, we use a learnable network to estimate the coefficients of the polynomial. This curve will be automatically adjusted according to the tone space of the Low Dynamic Range (LDR) image, and reconstruct the real HDR image. Besides, since all current datasets do not provide the corresponding relationship between the tone mapping function and the LDR image, we construct a new dataset with both synthetic and real images. Extensive experiments show that our method generalizes well under different tone-mapping functions and achieves SOTA performance.
    Summary First, based on the characteristics of the tone mapping function, we construct a model using a polynomial to describe the trend of the tone curve. To fit this curve, we use a learnable network to estimate the coefficients of the polynomial. This curve will be automatically adjusted according to the tone space of the Low Dynamic Range (LDR) image and reconstruct the real HDR image. Furthermore, since all current datasets do not provide the corresponding relationship between the tone mapping function and the LDR image, we construct a new dataset with both synthetic and real images. Extensive experiments show that our method generalizes well under different tone-mapping functions and achieves state-of-the-art performance.
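An explicit polynomial tone curve can be evaluated and applied pixel-wise as in the sketch below. The coefficient ordering (lowest degree first), the direction of the mapping (normalized LDR value in, estimated radiance out), and the example coefficients are assumptions for illustration. In the paper a learnable network estimates the coefficients; here they are supplied by hand.

```python
def poly_curve(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... with Horner's method.

    coeffs are ordered from the constant term upward.
    """
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def apply_curve(coeffs, ldr_pixels):
    """Map each normalized LDR pixel value through the tone curve."""
    return [poly_curve(coeffs, p) for p in ldr_pixels]

# Hypothetical curve 0.5*x + 0.5*x^2, which maps 0 to 0 and 1 to 1.
example = apply_curve([0.0, 0.5, 0.5], [0.0, 0.5, 1.0])
```

Keeping the curve as a short coefficient vector is what makes the relationship between tone mapping and reconstructed image explicit and inspectable.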

DRAW: Defending Camera-shooted RAW against Image Manipulation

  • paper_url: http://arxiv.org/abs/2307.16418
  • repo_url: None
  • paper_authors: Xiaoxiao Hu, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang
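  • for: Defending images against manipulation by protecting their sources, i.e., camera-shot RAW data.
  • methods: A lightweight Multi-frequency Partial Fusion Network (MPF-Net) embeds invisible watermarks into the RAW data as a protective signal.
  • results: On several well-known RAW datasets, the protection survives rendering to RGB and post-processing operations, and manipulated regions can be localized accurately.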
    Abstract RAW files are the initial measurement of scene radiance widely used in most cameras, and the ubiquitously-used RGB images are converted from RAW data through Image Signal Processing (ISP) pipelines. Nowadays, digital images are risky of being nefariously manipulated. Inspired by the fact that innate immunity is the first line of body defense, we propose DRAW, a novel scheme of defending images against manipulation by protecting their sources, i.e., camera-shooted RAWs. Specifically, we design a lightweight Multi-frequency Partial Fusion Network (MPF-Net) friendly to devices with limited computing resources by frequency learning and partial feature fusion. It introduces invisible watermarks as protective signal into the RAW data. The protection capability can not only be transferred into the rendered RGB images regardless of the applied ISP pipeline, but also is resilient to post-processing operations such as blurring or compression. Once the image is manipulated, we can accurately identify the forged areas with a localization network. Extensive experiments on several famous RAW datasets, e.g., RAISE, FiveK and SIDD, indicate the effectiveness of our method. We hope that this technique can be used in future cameras as an option for image protection, which could effectively restrict image manipulation at the source.
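    Summary RAW files are the initial measurement of scene radiance and are widely used in most cameras. Digital images today are at risk of malicious manipulation. Inspired by the fact that innate immunity is the body's first line of defense, we propose a new scheme that defends images against manipulation by protecting their sources, i.e., camera-shot RAW files. We design a lightweight Multi-frequency Partial Fusion Network (MPF-Net) suited to devices with limited computing resources. Through frequency learning and partial feature fusion, the network introduces invisible watermarks (a protective signal) into the RAW data. The protection carries over to the rendered RGB images and is also resilient to post-processing operations such as compression or blurring. If the image is manipulated, a localization network can accurately identify the forged areas. Extensive experiments on several well-known RAW datasets, such as RAISE, FiveK, and SIDD, indicate the effectiveness of the method. We hope that this technique can be offered in future cameras as an image protection option that restricts image manipulation at the source.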

Multi-modal Graph Neural Network for Early Diagnosis of Alzheimer’s Disease from sMRI and PET Scans

  • paper_url: http://arxiv.org/abs/2307.16366
  • repo_url: None
  • paper_authors: Yanteng Zhang, Xiaohai He, Yi Hao Chan, Qizhi Teng, Jagath C. Rajapakse
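  • for: This study proposes a multi-modal data fusion method based on graph neural networks (GNNs) for early diagnosis of Alzheimer's disease (AD).
  • methods: Brain networks are built from sMRI or PET images and used in a population graph framework that combines phenotypic information with imaging features. Each modality has its own GNN branch, and late fusion combines the branch decisions into the final prediction.
  • results: Experiments show that the proposed multi-modal approach improves AD diagnosis performance, and the authors analyze the impact of phenotypic information on diagnostic performance. The study also provides a technical reference supporting the need for multivariate multi-modal diagnosis methods.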
    Abstract In recent years, deep learning models have been applied to neuroimaging data for early diagnosis of Alzheimer's disease (AD). Structural magnetic resonance imaging (sMRI) and positron emission tomography (PET) images provide structural and functional information about the brain, respectively. Combining these features leads to improved performance than using a single modality alone in building predictive models for AD diagnosis. However, current multi-modal approaches in deep learning, based on sMRI and PET, are mostly limited to convolutional neural networks, which do not facilitate integration of both image and phenotypic information of subjects. We propose to use graph neural networks (GNN) that are designed to deal with problems in non-Euclidean domains. In this study, we demonstrate how brain networks can be created from sMRI or PET images and be used in a population graph framework that can combine phenotypic information with imaging features of these brain networks. Then, we present a multi-modal GNN framework where each modality has its own branch of GNN and a technique is proposed to combine the multi-modal data at both the level of node vectors and adjacency matrices. Finally, we perform late fusion to combine the preliminary decisions made in each branch and produce a final prediction. As multi-modality data becomes available, multi-source and multi-modal is the trend of AD diagnosis. We conducted explorative experiments based on multi-modal imaging data combined with non-imaging phenotypic information for AD diagnosis and analyzed the impact of phenotypic information on diagnostic performance. Results from experiments demonstrated that our proposed multi-modal approach improves performance for AD diagnosis, and this study also provides technical reference and support the need for multivariate multi-modal diagnosis methods.
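    Summary In recent years, deep learning models have been widely applied to neuroimaging data for early diagnosis of Alzheimer's disease (AD). Structural magnetic resonance imaging (sMRI) and positron emission tomography (PET) images provide structural and functional information about the brain, respectively. Combining these features yields better predictive models than using a single modality alone. However, current multi-modal deep learning approaches based on sMRI and PET mostly rely on convolutional neural networks, which do not facilitate integrating image and phenotypic information. We propose to use graph neural networks (GNNs), which are designed for problems in non-Euclidean domains. In this study, we show how brain networks can be created from sMRI or PET images and used in a population graph framework that combines phenotypic information with imaging features. We then present a multi-modal GNN framework in which each modality has its own GNN branch, together with a technique for combining the multi-modal data at the level of both node vectors and adjacency matrices. Finally, we perform late fusion to combine the preliminary decisions of each branch into a final prediction. As multi-modality data becomes available, multi-source and multi-modal approaches are the trend in AD diagnosis. We conducted exploratory experiments on multi-modal imaging data combined with non-imaging phenotypic information and analyzed the impact of phenotypic information on diagnostic performance. The results show that the proposed multi-modal approach improves AD diagnosis performance, and the study provides a technical reference supporting the need for multivariate multi-modal diagnosis methods.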
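Late fusion of per-branch decisions can be sketched as a weighted average of class probabilities. The abstract does not state the exact fusion rule, so the averaging scheme, the function name `late_fusion`, and the example probabilities below are assumptions made for illustration.

```python
def late_fusion(branch_probs, weights=None):
    """Combine class probabilities from several branches.

    branch_probs: one probability list per branch (e.g. sMRI, PET),
                  all over the same classes.
    weights:      optional per-branch weights; defaults to equal weights.
    Returns the fused probability for each class.
    """
    n = len(branch_probs)
    if weights is None:
        weights = [1.0 / n] * n
    num_classes = len(branch_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, branch_probs))
        for c in range(num_classes)
    ]

# Hypothetical outputs of an sMRI branch and a PET branch
# over the classes [control, AD].
fused = late_fusion([[0.6, 0.4], [0.2, 0.8]])
prediction = max(range(len(fused)), key=lambda c: fused[c])
```

Equal weights treat both modalities as equally reliable; unequal weights would be one way to favor the stronger branch.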

Cardiac MRI Orientation Recognition and Standardization using Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2308.00615
  • repo_url: https://github.com/rxzhen/mscmr-orient
  • paper_authors: Ruoxuan Zhen
  • for: The paper addresses the challenge of imaging orientation in cardiac MRI and presents a deep learning-based method for categorizing and standardizing the orientation.
  • methods: The paper employs deep neural networks to categorize and standardize the orientation of cardiac MRI images, and proposes a transfer learning strategy to adapt the model to diverse modalities.
  • results: The validation accuracies achieved were 100.0%, 100.0%, and 99.4% on CMR images from various modalities, including bSSFP, T2, and LGE, confirming the robustness and effectiveness of the proposed method.
    Abstract Orientation recognition and standardization play a crucial role in the effectiveness of medical image processing tasks. Deep learning-based methods have proven highly advantageous in orientation recognition and prediction tasks. In this paper, we address the challenge of imaging orientation in cardiac MRI and present a method that employs deep neural networks to categorize and standardize the orientation. To cater to multiple sequences and modalities of MRI, we propose a transfer learning strategy, enabling adaptation of our model from a single modality to diverse modalities. We conducted comprehensive experiments on CMR images from various modalities, including bSSFP, T2, and LGE. The validation accuracies achieved were 100.0%, 100.0%, and 99.4%, confirming the robustness and effectiveness of our model. Our source code and network models are available at https://github.com/rxzhen/MSCMR-orient
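    Summary Orientation recognition and standardization play a crucial role in medical image processing tasks. Deep learning-based methods have proven highly advantageous in orientation recognition and prediction. This paper addresses the challenge of imaging orientation in cardiac MRI and presents a method that uses deep neural networks to categorize and standardize the orientation. To handle multiple MRI sequences and modalities, we propose a transfer learning strategy that adapts the model from a single modality to diverse modalities. We conducted comprehensive experiments on CMR images from several modalities, including bSSFP, T2, and LGE, and obtained validation accuracies of 100.0%, 100.0%, and 99.4%, confirming the robustness and effectiveness of the model. The source code and network models are available at https://github.com/rxzhen/MSCMR-orient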
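Once an orientation class has been predicted, standardization amounts to applying the inverse geometric transform. The sketch below assumes that each orientation is described by three flags (transpose, flip rows, flip columns), giving eight possible orientations of a 2D slice. This parameterization and the function names are assumptions for illustration; the abstract does not specify how the classes are defined, and the classifier itself is not shown.

```python
def transform(img, transpose, flip_rows, flip_cols):
    """Apply an orientation change: transpose, then row flip, then column flip."""
    out = [list(row) for row in img]
    if transpose:
        out = [list(col) for col in zip(*out)]
    if flip_rows:
        out = out[::-1]
    if flip_cols:
        out = [row[::-1] for row in out]
    return out

def standardize(img, transpose, flip_rows, flip_cols):
    """Undo an orientation described by the same three flags.

    The steps of transform are reversed: column flip, row flip,
    and finally the transpose.
    """
    out = [list(row) for row in img]
    if flip_cols:
        out = [row[::-1] for row in out]
    if flip_rows:
        out = out[::-1]
    if transpose:
        out = [list(col) for col in zip(*out)]
    return out
```

In a full pipeline the three flags would be derived from the class label predicted by the network, so a misclassified slice would be "standardized" into the wrong orientation.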

An objective validation of polyp and instrument segmentation methods in colonoscopy through Medico 2020 polyp segmentation and MedAI 2021 transparency challenges

  • paper_url: http://arxiv.org/abs/2307.16262
  • repo_url: https://github.com/georgebatch/kvasir-seg
  • paper_authors: Debesh Jha, Vanshali Sharma, Debapriya Banik, Debayan Bhattacharya, Kaushiki Roy, Steven A. Hicks, Nikhil Kumar Tomar, Vajira Thambawita, Adrian Krenzer, Ge-Peng Ji, Sahadev Poudel, George Batchkala, Saruar Alam, Awadelrahman M. A. Ahmed, Quoc-Huy Trinh, Zeshan Khan, Tien-Phat Nguyen, Shruti Shrestha, Sabari Nathan, Jeonghwan Gwak, Ritika K. Jha, Zheyuan Zhang, Alexander Schlaefer, Debotosh Bhattacharjee, M. K. Bhuyan, Pradip K. Das, Sravanthi Parsa, Sharib Ali, Michael A. Riegler, Pål Halvorsen, Ulas Bagci, Thomas De Lange
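  • for: This study examines automatic analysis of colonoscopy images, motivated by the importance of early detection of precancerous polyps.
  • methods: The study summarizes and analyzes the deep learning methods submitted to the Medico 2020 and MedAI 2021 challenges, which aim to assist endoscopists in detecting and classifying polyps and abnormalities during live examinations.
  • results: The study highlights the strengths of the best-performing polyp and instrument segmentation methods and stresses that transparency and interpretability, alongside accuracy, matter for clinical use.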
    Abstract Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during the live examination can be challenging due to various factors such as variation of skills and experience among the endoscopists, lack of attentiveness, and fatigue leading to a high polyp miss-rate. Deep learning has emerged as a promising solution to this challenge as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time. In addition to the algorithm's accuracy, transparency and interpretability are crucial to explaining the whys and hows of the algorithm's prediction. Further, most algorithms are developed in private data, closed source, or proprietary software, and methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we have organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic. For the transparency task, a multi-disciplinary team, including expert gastroenterologists, accessed each submission and evaluated the team based on open-source practices, failure case analysis, ablation studies, usability and understandability of evaluations to gain a deeper understanding of the models' credibility for clinical deployment. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.
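    Summary Automatic analysis of colonoscopy images is an active field of research, motivated by the importance of early detection of precancerous polyps. However, detecting polyps during a live examination can be difficult: variation in skill and experience among endoscopists, lack of attentiveness, and fatigue lead to a high polyp miss rate. Deep learning has emerged as a promising solution because it can help endoscopists detect and classify overlooked polyps and abnormalities in real time. Beyond accuracy, transparency and interpretability are crucial for explaining an algorithm's predictions. Yet most algorithms are developed on private data or in closed-source or proprietary software, and the methods lack reproducibility. To promote efficient and transparent methods, we organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. We summarize and analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into the clinic. For the transparency task, a multi-disciplinary team including expert gastroenterologists assessed each submission on open-source practices, failure case analysis, ablation studies, and the usability and understandability of evaluations, to gain a deeper understanding of the models' credibility for clinical deployment. Through this comprehensive analysis, we highlight the advances in polyp and surgical instrument segmentation and encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.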
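Segmentation quality in challenges of this kind is commonly reported with overlap measures such as the Dice coefficient and intersection over union (IoU). The abstract does not list the metrics used, so the sketch below is an assumption about typical practice rather than the challenges' official evaluation code. The function name and the convention of returning 1.0 for two empty masks are illustrative choices.

```python
def dice_and_iou(pred, truth):
    """Compute the Dice coefficient and IoU of two binary masks.

    pred, truth: 2D lists of 0/1 values with identical shape.
    Returns (dice, iou). Two empty masks count as a perfect match.
    """
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    truth_area = sum(1 for b in t if b)
    union = pred_area + truth_area - intersection
    if union == 0:
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou

# Predicted polyp mask covers two pixels, ground truth covers one of them.
dice, iou = dice_and_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Dice weights the overlap more generously than IoU for the same masks, which is why both are often reported side by side.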