paper_authors: Hakan Emre Gedik, Abhinau K. Venkataramanan, Alan C. Bovik
for: The fields of image restoration and image quality assessment, with a focus on using deep learning techniques to improve the accuracy of both tasks.
methods: Proposes a novel attention-based convolutional neural network (CNN) that simultaneously performs image restoration and quality assessment, using "quality attention" maps to focus on the most important regions of the image when making predictions.
results: Achieves state-of-the-art deblocking accuracy and a high correlation between predicted quality and human opinion scores, demonstrating effectiveness at both image restoration and quality assessment.
Abstract
Deep learning techniques have revolutionized the fields of image restoration and image quality assessment in recent years. While image restoration methods typically utilize synthetically distorted data for training, deep quality assessment models often require expensive labeled subjective data. However, recent studies have shown that activations of deep neural networks trained for visual modeling tasks can also be used for perceptual quality assessment of images. Following this intuition, we propose a novel attention-based convolutional neural network capable of simultaneously performing both image restoration and quality assessment. We achieve this by training a JPEG deblocking network augmented with "quality attention" maps, and we demonstrate state-of-the-art deblocking accuracy together with a high correlation between predicted quality and human opinion scores.
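The abstract does not specify the network's internals; purely as an illustration, here is a minimal PyTorch sketch of how a learned "quality attention" map could gate restoration features and also feed a pooled quality score. All module names and shapes are assumptions, not the paper's design.

```python
# Minimal PyTorch sketch (assumed, not the paper's architecture): a restoration
# block whose features are gated by a learned per-pixel "quality attention" map.
import torch
import torch.nn as nn

class QualityAttentionBlock(nn.Module):
    """Gates restoration features with a 1-channel quality map in [0, 1]."""

    def __init__(self, channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Sigmoid keeps the map in [0, 1]; low values ~ heavily distorted regions.
        self.attention = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor):
        f = self.features(x)
        q = self.attention(f)      # quality-attention map, shape (N, 1, H, W)
        return f * q, q            # gated features + map for quality prediction

# A scalar quality score per image could be pooled from the map, e.g.:
#   score = q.mean(dim=(2, 3))
```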
Observer study-based evaluation of TGAN architecture used to generate oncological PET images
paper_authors: Roberto Fedrigo, Fereshteh Yousefirizi, Ziping Liu, Abhinav K. Jha, Robert V. Bergen, Jean-Francois Rajotte, Raymond T. Ng, Ingrid Bloise, Sara Harsini, Dan J. Kadrmas, Carlos Uribe, Arman Rahmim
results: Six out of eight trained observers could not identify the real images with statistical significance, indicating that the synthetic dataset is reasonably realistic.
Abstract
The application of computer-vision algorithms in medical imaging has increased rapidly in recent years. However, algorithm training is challenging due to limited sample sizes, a lack of labeled samples, and privacy concerns regarding data sharing. To address these issues, we previously developed (Bergen et al. 2022) a synthetic PET dataset for Head and Neck (H&N) cancer using the temporal generative adversarial network (TGAN) architecture and evaluated its performance for segmenting lesions and identifying radiomics features in synthesized images. In this work, a two-alternative forced-choice (2AFC) observer study was performed to quantitatively evaluate the ability of human observers to distinguish between real and synthesized oncological PET images. In the study, eight trained readers, including two board-certified nuclear medicine physicians, read 170 real/synthetic image pairs presented as 2D transaxial slices using a dedicated web app. For each image pair, the observer was asked to identify the real image and input their confidence level on a 5-point Likert scale. P-values were computed using the binomial test and the Wilcoxon signed-rank test. A heat map was used to compare the response accuracy distribution for the signed-rank test. Response accuracy across observers ranged from 36.2% [27.9-44.4] to 63.1% [54.8-71.3]. Six out of eight observers did not identify the real image with statistical significance, indicating that the synthetic dataset was reasonably representative of oncological PET images. Overall, this study adds validity to the realism of our simulated H&N cancer dataset, which may be used in the future to train AI algorithms while favoring patient confidentiality and privacy protection.
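The per-observer significance analysis described here (a binomial test of 2AFC accuracy against chance) can be sketched with SciPy as below; the counts are illustrative placeholders, not the study's data.

```python
# Sketch of the per-observer significance test described in the abstract: a
# two-sided binomial test of 2AFC accuracy against chance (p = 0.5).
from scipy.stats import binomtest

n_pairs = 170      # real/synthetic image pairs shown to one observer
n_correct = 92     # pairs where the real image was identified (made-up count)

result = binomtest(n_correct, n_pairs, p=0.5)  # chance performance = 0.5
print(f"accuracy = {n_correct / n_pairs:.1%}, p = {result.pvalue:.3f}")
# p > 0.05 means the observer did not beat chance significantly, as was
# reported for six of the eight readers.
```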
Machine-to-Machine Transfer Function in Deep Learning-Based Quantitative Ultrasound
paper_authors: Ufuk Soylu, Michael L. Oelze
for: This study aims to mitigate data mismatch problems in deep learning (DL) based quantitative ultrasound (QUS), particularly mismatches introduced at the data acquisition level.
methods: Uses a Transfer Function approach to mitigate data mismatches at the machine level, and introduces a Machine-to-Machine (M2M) Transfer Function that enables transferring models between different scanners.
results: After passing the data through the transfer function, model accuracy improved from roughly 50% to over 90%, and the AUC rose from 0.40 to 0.99. The choice of calibration phantom was found to have an important effect, and a robust implementation inspired by Wiener filtering enabled transferring data from one machine to another.
Abstract
A Transfer Function approach was recently demonstrated to mitigate data mismatches at the acquisition level for a single ultrasound scanner in deep learning (DL) based quantitative ultrasound (QUS). As a natural progression, we further investigate the transfer function approach and introduce a Machine-to-Machine (M2M) Transfer Function, which possesses the ability to mitigate data mismatches at the machine level, i.e., mismatches between two scanners over the same frequency band. This ability opens the door to unprecedented opportunities for reducing DL model development costs, enabling the combination of data from multiple sources or scanners, and facilitating the transfer of DL models between machines with ease. We tested the proposed method using a SonixOne machine and a Verasonics machine. In the experiments, we used an L9-4 array and conducted two types of acquisitions to obtain calibration data, stable and free-hand, using two different calibration phantoms. Without the proposed calibration method, the mean classification accuracy when applying a model trained on data from one system to data acquired from the other system was approximately 50%, and the mean AUC was about 0.40. With the proposed method, mean accuracy increased to approximately 90%, and the AUC rose to 0.99. We additionally observed that shifts in the statistics used for z-score normalization had a significant impact on performance. Furthermore, the choice of calibration phantom played an important role in the proposed method. Finally, a robust implementation inspired by Wiener filtering provided an effective method for transferring the domain from one machine to another, and it can succeed using just a single calibration view, without the need for multiple independent calibration frames.
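As a rough illustration of the Wiener-filter-inspired calibration idea, the following NumPy sketch estimates a per-frequency transfer function from a single calibration view of the same phantom on both scanners and applies it to source-machine RF data. The function names, spectral formulation, and regularization are assumptions for illustration, not the paper's exact method.

```python
# Illustrative NumPy sketch of a Wiener-inspired M2M transfer function:
# estimate a per-frequency filter from one calibration view acquired on both
# scanners, then map source-machine RF data into the target domain.
import numpy as np

def m2m_transfer_function(cal_src: np.ndarray, cal_tgt: np.ndarray,
                          nsr: float = 1e-3) -> np.ndarray:
    """cal_src, cal_tgt: (scan lines, time samples) RF from the same phantom."""
    S_src = np.fft.rfft(cal_src, axis=-1)
    S_tgt = np.fft.rfft(cal_tgt, axis=-1)
    # Regularized spectral division (Wiener-style) avoids blow-up where the
    # source spectrum is near zero; nsr plays the noise-to-signal role.
    eps = nsr * np.mean(np.abs(S_src) ** 2)
    H = (S_tgt * np.conj(S_src)) / (np.abs(S_src) ** 2 + eps)
    return H.mean(axis=0)          # average the estimate over scan lines

def apply_transfer(rf_src: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Map source-machine RF data into the target machine's domain."""
    spectra = np.fft.rfft(rf_src, axis=-1) * H
    return np.fft.irfft(spectra, n=rf_src.shape[-1], axis=-1)
```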
Modular Customizable ROS-Based Framework for Rapid Development of Social Robots
for: developing socially competent robots with tight integration of robotics, computer vision, speech processing, and web technologies
methods: using an open-source framework called Socially-interactive Robot Software platform (SROS) with a modular layered architecture, bridging Robot Operating System (ROS) with web and Android interface layers, and implementing specialized perceptual and interactive skills as ROS services
results: successfully validated core technologies including computer vision, speech processing, and GPT2 autocomplete speech; demonstrated modularity through integration of an additional ROS package; and enabled synchronized cross-domain interaction and multimodal behaviors on an example platform
Abstract
Developing socially competent robots requires tight integration of robotics, computer vision, speech processing, and web technologies. We present the Socially-interactive Robot Software platform (SROS), an open-source framework addressing this need through a modular layered architecture. SROS bridges the Robot Operating System (ROS) layer for mobility with web and Android interface layers using standard messaging and APIs. Specialized perceptual and interactive skills are implemented as ROS services for reusable deployment on any robot. This facilitates rapid prototyping of collaborative behaviors that synchronize perception with physical actuation. We experimentally validated core SROS technologies, including computer vision, speech processing, and GPT2 autocomplete speech, implemented as plug-and-play ROS services. Modularity is demonstrated through the successful integration of an additional ROS package, without changes to the hardware or software platform. The enabled capabilities confirm SROS's effectiveness in developing socially interactive robots through synchronized cross-domain interaction. Through demonstrations of synchronized multimodal behaviors on an example platform, we illustrate how the SROS architectural approach addresses shortcomings of previous work, lowering barriers for researchers to advance the state of the art in adaptive, collaborative, customizable human-robot systems through novel applications that integrate perceptual and social abilities.
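To make the "skills as ROS services" design concrete, here is a minimal hypothetical rospy node exposing a perceptual skill via the standard std_srvs/Trigger service type; the node and service names and the stub response are illustrative assumptions, not SROS's actual API.

```python
#!/usr/bin/env python
# Hypothetical rospy node exposing a perceptual skill as a plug-and-play ROS
# service, in the spirit of SROS's design (names are assumptions, not SROS's API).
import rospy
from std_srvs.srv import Trigger, TriggerResponse

def handle_detect_face(req):
    # A real skill would invoke a vision model here; this returns a stub result.
    return TriggerResponse(success=True, message="face detected at (120, 80)")

if __name__ == "__main__":
    rospy.init_node("face_detection_skill")
    rospy.Service("detect_face", Trigger, handle_detect_face)
    rospy.loginfo("face_detection_skill service ready")
    rospy.spin()   # keep the service available to any robot on the ROS graph
```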
Model-based reconstructions for quantitative imaging in photoacoustic tomography
results: The chapter gives an overview of reconstruction techniques in photoacoustic tomography, covering both the recovery of quantitative values and image reconstruction.
Abstract
The reconstruction task in photoacoustic tomography can vary considerably depending on the measured targets, the geometry, and especially the quantity we want to recover. Specifically, since the signal is generated by the coupling of light and sound through the photoacoustic effect, we have the possibility to recover acoustic as well as optical tissue parameters. This is referred to as quantitative imaging, i.e., the correct recovery of physical parameters and not just a qualitative image. In this chapter, we aim to give an overview of established reconstruction techniques in photoacoustic tomography. We start with the modelling of the optical and acoustic phenomena, which is necessary for a reliable recovery of quantitative values. Furthermore, we give an overview of approaches to the tomographic reconstruction problem with an emphasis on the recovery of quantitative values, ranging from direct and fast analytic approaches to computationally involved optimisation-based techniques and recent data-driven approaches.
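As a toy example of the optimisation-based techniques mentioned, the following sketch solves a Tikhonov-regularized least-squares reconstruction by plain gradient descent; the dense matrix A merely stands in for a discretized photoacoustic forward operator and is an assumption for illustration.

```python
# Toy sketch of an optimisation-based reconstruction: Tikhonov-regularized
# least squares solved by gradient descent. The dense matrix A is only a
# stand-in for a discretized photoacoustic forward operator.
import numpy as np

def reconstruct(A: np.ndarray, y: np.ndarray, lam: float = 1e-2,
                step: float = 1e-3, n_iter: int = 500) -> np.ndarray:
    """Minimize ||A x - y||^2 + lam * ||x||^2 over x."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y) + 2.0 * lam * x
        x -= step * grad   # step must stay below 1/L, with L the gradient's
                           # Lipschitz constant, to guarantee convergence
    return x
```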