eess.IV - 2023-07-18

Regression-free Blind Image Quality Assessment

  • paper_url: http://arxiv.org/abs/2307.09279
  • repo_url: https://github.com/XiaoqiWang/regression-free-iqa
  • paper_authors: Xiaoqi Wang, Jian Xiong, Hao Gao, Weisi Lin
  • for: Improving the accuracy of blind image quality assessment models by avoiding the biased parameter estimates that biased training samples induce.
  • methods: An evaluation method based on retrieving similar images, comprising a semantic-based classification (SC) module and a distortion-based classification (DC) module (see the retrieval sketch after the abstract).
  • results: Experiments on four benchmark databases show that the method remarkably outperforms state-of-the-art regression-based models.
    Abstract Regression-based blind image quality assessment (IQA) models are susceptible to biased training samples, leading to a biased estimation of model parameters. To mitigate this issue, we propose a regression-free framework for image quality evaluation, which is founded upon retrieving similar instances by incorporating semantic and distortion features. The motivation behind this approach is rooted in the observation that the human visual system (HVS) has analogous visual responses to semantically similar image contents degraded by the same distortion. The proposed framework comprises two classification-based modules: semantic-based classification (SC) module and distortion-based classification (DC) module. Given a test image and an IQA database, the SC module retrieves multiple pristine images based on semantic similarity. The DC module then retrieves instances based on distortion similarity from the distorted images that correspond to each retrieved pristine image. Finally, the predicted quality score is derived by aggregating the subjective quality scores of multiple retrieved instances. Experimental results on four benchmark databases validate that the proposed model can remarkably outperform the state-of-the-art regression-based models.
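A minimal sketch of the retrieve-and-aggregate idea, assuming precomputed semantic and distortion feature extractors; the function names, the cosine similarity, the plain-mean aggregation, and the k values are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def cosine_topk(query, gallery, k):
    """Indices of the k gallery rows most similar to the query vector."""
    sims = gallery @ query / (np.linalg.norm(gallery, axis=1)
                              * np.linalg.norm(query) + 1e-8)
    return np.argsort(-sims)[:k]

def predict_quality(test_sem, test_dist, pristine_sem,
                    distorted_dist, distorted_mos, k_sem=5, k_dist=3):
    scores = []
    # SC module: retrieve pristine images with similar semantic content.
    for i in cosine_topk(test_sem, pristine_sem, k_sem):
        # DC module: among the distorted versions of pristine image i,
        # retrieve the instances whose distortion is most similar.
        for j in cosine_topk(test_dist, distorted_dist[i], k_dist):
            scores.append(distorted_mos[i][j])
    # Final prediction: aggregate the subjective scores (MOS) of all
    # retrieved instances.
    return float(np.mean(scores))
```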

Soft-IntroVAE for Continuous Latent space Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.09008
  • repo_url: None
  • paper_authors: Zhi-Song Liu, Zijia Wang, Zhen Jia
  • for: Proposing a Variational-AutoEncoder-based continuous image super-resolution method that offers practical and flexible image scaling for various displays.
  • methods: Uses a local implicit image representation to map coordinates and 2D features into the latent space, and introduces a novel latent-space adversarial training scheme for photo-realistic image reconstruction (see the positional-encoding sketch after the abstract).
  • results: Quantitative and qualitative comparisons demonstrate the effectiveness of the proposed SVAE-SR, and further show its generalization to denoising and real-image super-resolution.
    Abstract Continuous image super-resolution (SR) recently receives a lot of attention from researchers, for its practical and flexible image scaling for various displays. Local implicit image representation is one of the methods that can map the coordinates and 2D features for latent space interpolation. Inspired by Variational AutoEncoder, we propose a Soft-introVAE for continuous latent space image super-resolution (SVAE-SR). A novel latent space adversarial training is achieved for photo-realistic image restoration. To further improve the quality, a positional encoding scheme is used to extend the original pixel coordinates by aggregating frequency information over the pixel areas. We show the effectiveness of the proposed SVAE-SR through quantitative and qualitative comparisons, and further, illustrate its generalization in denoising and real-image super-resolution.
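The abstract's positional encoding extends pixel coordinates by aggregating frequency information. A minimal sketch of a Fourier-style encoding in that spirit; the number of bands and the log-spaced frequencies are assumptions, not the paper's exact scheme:

```python
import numpy as np

def positional_encoding(coords, num_bands=8):
    """coords: (N, 2) pixel coordinates normalized to [-1, 1].
    Returns (N, 2 + 4 * num_bands) encoded coordinates."""
    freqs = (2.0 ** np.arange(num_bands)) * np.pi     # log-spaced frequencies
    angles = coords[:, :, None] * freqs               # (N, 2, num_bands)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return np.concatenate([coords, enc.reshape(len(coords), -1)], axis=1)

# Usage: encode a 4x4 coordinate grid for the implicit decoder.
ys, xs = np.meshgrid(np.linspace(-1, 1, 4), np.linspace(-1, 1, 4), indexing="ij")
grid = np.stack([ys.ravel(), xs.ravel()], axis=1)     # (16, 2)
print(positional_encoding(grid).shape)                # (16, 34)
```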

Frequency-mixed Single-source Domain Generalization for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.09005
  • repo_url: https://github.com/liamheng/non-iid_medical_image_segmentation
  • paper_authors: Heng Li, Haojin Li, Wei Zhao, Huazhu Fu, Xiuyun Su, Yan Hu, Jiang Liu
  • for: Improving the generalizability of medical image segmentation models, especially when annotated data is scarce.
  • methods: Proposes the Frequency-mixed Single-source Domain Generalization method (FreeSDG), which leverages a mixed frequency spectrum to augment the single source domain and builds self-supervision into the augmentation to learn context-aware representations (see the frequency-mixing sketch after the abstract).
  • results: Experiments on five datasets across three modalities show that FreeSDG outperforms state-of-the-art methods and significantly improves the segmentation model's generalizability, especially when annotated data is scarce.
    Abstract The annotation scarcity of medical image segmentation poses challenges in collecting sufficient training data for deep learning models. Specifically, models trained on limited data may not generalize well to other unseen data domains, resulting in a domain shift issue. Consequently, domain generalization (DG) is developed to boost the performance of segmentation models on unseen domains. However, the DG setup requires multiple source domains, which impedes the efficient deployment of segmentation algorithms in clinical scenarios. To address this challenge and improve the segmentation model's generalizability, we propose a novel approach called the Frequency-mixed Single-source Domain Generalization method (FreeSDG). By analyzing the frequency's effect on domain discrepancy, FreeSDG leverages a mixed frequency spectrum to augment the single-source domain. Additionally, self-supervision is constructed in the domain augmentation to learn robust context-aware representations for the segmentation task. Experimental results on five datasets of three modalities demonstrate the effectiveness of the proposed algorithm. FreeSDG outperforms state-of-the-art methods and significantly improves the segmentation model's generalizability. Therefore, FreeSDG provides a promising solution for enhancing the generalization of medical image segmentation models, especially when annotated data is scarce. The code is available at https://github.com/liamheng/Non-IID_Medical_Image_Segmentation.
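A minimal sketch of frequency-domain augmentation by mixing amplitude spectra, in the spirit of FreeSDG's mixed frequency spectrum; the low-frequency band size beta and the mixing weight lam are assumptions, and the paper's exact mixing rule may differ:

```python
import numpy as np

def frequency_mix(img_a, img_b, beta=0.1, lam=0.5):
    """Mix the low-frequency amplitude of img_b into img_a (2D grayscale)."""
    fa = np.fft.fftshift(np.fft.fft2(img_a))
    fb = np.fft.fftshift(np.fft.fft2(img_b))
    amp_a, pha_a, amp_b = np.abs(fa), np.angle(fa), np.abs(fb)
    h, w = img_a.shape
    bh, bw = int(h * beta), int(w * beta)
    ch, cw = h // 2, w // 2
    # Blend amplitudes inside the centered low-frequency band only.
    amp_a[ch-bh:ch+bh, cw-bw:cw+bw] = (
        lam * amp_b[ch-bh:ch+bh, cw-bw:cw+bw]
        + (1 - lam) * amp_a[ch-bh:ch+bh, cw-bw:cw+bw]
    )
    mixed = amp_a * np.exp(1j * pha_a)                # keep img_a's phase
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))
```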

Learned Scalable Video Coding For Humans and Machines

  • paper_url: http://arxiv.org/abs/2307.08978
  • repo_url: None
  • paper_authors: Hadi Hadizadeh, Ivan V. Bajić
  • for: Enabling video coding that serves automatic video analytics by machines first, with human viewing needed only occasionally.
  • methods: Uses deep neural networks (DNNs) to build an end-to-end learnable video codec, and employs conditional coding to achieve better compression gains (see the conditional-coding sketch after the abstract).
  • results: Experiments on four standard video datasets show that the base layer outperforms state-of-the-art learned and conventional codecs on the machine vision task, while the enhancement layer maintains comparable performance on input reconstruction for human viewing.
    Abstract Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce the first end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer. We will provide the implementation of the proposed system at www.github.com upon completion of the review process.
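A minimal sketch of two-layer conditional coding, where the enhancement branch is encoded conditioned on the base-layer latent so the enhancement bitstream only spends bits on what human viewing needs beyond the machine task; the module sizes and the concatenation-based conditioning are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class ConditionalScalableCodec(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.base_enc = nn.Conv2d(3, c, 5, stride=2, padding=2)      # base latent
        self.task_head = nn.Conv2d(c, 10, 1)                         # machine-vision logits
        self.enh_enc = nn.Conv2d(3 + c, c, 5, stride=2, padding=2)   # conditioned on base
        self.dec = nn.ConvTranspose2d(2 * c, 3, 5, stride=2,
                                      padding=2, output_padding=1)

    def forward(self, x):
        y_base = self.base_enc(x)                        # base-layer latent
        task_out = self.task_head(y_base)                # machine task from base layer only
        cond = nn.functional.interpolate(y_base, size=x.shape[-2:])
        y_enh = self.enh_enc(torch.cat([x, cond], 1))    # enhancement latent, conditioned
        x_hat = self.dec(torch.cat([y_base, y_enh], 1))  # reconstruction for human viewing
        return task_out, x_hat
```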

Deep Physics-Guided Unrolling Generalization for Compressed Sensing

  • paper_url: http://arxiv.org/abs/2307.08950
  • repo_url: https://github.com/guaishou74851/prl
  • paper_authors: Bin Chen, Jiechong Song, Jingfen Xie, Jian Zhang
  • for: Proposing a high-accuracy, interpretable image reconstruction method that combines the merits of model-driven and data-driven approaches for inverse imaging tasks, focusing on image compressed sensing (CS).
  • methods: Proposes a Physics-guided unRolled recovery Learning (PRL) framework that generalizes the traditional iterative recovery model from the image domain to a high-dimensional feature domain, realized with a compact multiscale unrolling architecture; two implementations, PRL-PGD and PRL-RND, are provided (see the unrolling sketch after the abstract).
  • results: Experiments show that PRL networks lead state-of-the-art methods significantly in both performance and efficiency, with large potential for further improvement and for application to other inverse imaging problems or optimization models.
    Abstract By absorbing the merits of both the model- and data-driven methods, deep physics-engaged learning scheme achieves high-accuracy and interpretable image reconstruction. It has attracted growing attention and become the mainstream for inverse imaging tasks. Focusing on the image compressed sensing (CS) problem, we find the intrinsic defect of this emerging paradigm, widely implemented by deep algorithm-unrolled networks, in which more plain iterations involving real physics will bring enormous computation cost and long inference time, hindering their practical application. A novel deep $\textbf{P}$hysics-guided un$\textbf{R}$olled recovery $\textbf{L}$earning ($\textbf{PRL}$) framework is proposed by generalizing the traditional iterative recovery model from image domain (ID) to the high-dimensional feature domain (FD). A compact multiscale unrolling architecture is then developed to enhance the network capacity and keep real-time inference speeds. Taking two different perspectives of optimization and range-nullspace decomposition, instead of building an algorithm-specific unrolled network, we provide two implementations: $\textbf{PRL-PGD}$ and $\textbf{PRL-RND}$. Experiments exhibit the significant performance and efficiency leading of PRL networks over other state-of-the-art methods with a large potential for further improvement and real application to other inverse imaging problems or optimization models.
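A minimal sketch of algorithm unrolling with a projected gradient descent (PGD) step per stage, the classical image-domain scheme that PRL-PGD generalizes to the feature domain; the stage count, learned step sizes, and the small CNN proximal operator are illustrative assumptions:

```python
import torch
import torch.nn as nn

class UnrolledPGD(nn.Module):
    def __init__(self, A, stages=9, c=32):
        super().__init__()
        self.register_buffer("A", A)                   # (m, n) sensing matrix
        self.rho = nn.Parameter(torch.ones(stages))    # learned per-stage step sizes
        self.prox = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, c, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(c, 1, 3, padding=1))
            for _ in range(stages))

    def forward(self, y, h, w):
        x = self.A.t() @ y                             # initialization, shape (n, B)
        for k, prox in enumerate(self.prox):
            grad = self.A.t() @ (self.A @ x - y)       # gradient of ||Ax - y||^2 / 2
            z = x - self.rho[k] * grad                 # physics-guided gradient step
            img = z.t().reshape(-1, 1, h, w)           # back to image form
            x = (img + prox(img)).reshape(-1, h * w).t()  # learned residual proximal
        return x

# Usage: for n = h*w pixels and m measurements, y = A @ x_true, shape (m, B).
```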

Image Processing Methods Applied to Motion Tracking of Nanomechanical Buckling on SEM Recordings

  • paper_url: http://arxiv.org/abs/2307.08786
  • repo_url: None
  • paper_authors: Ege Erdem, Berke Demiralp, Hadi S Pisheh, Peyman Firoozy, Ahmet Hakan Karakurt, M. Selim Hanay
  • for: Analyzing scanning electron microscope (SEM) recordings of dynamic nano-electromechanical systems (NEMS), which are difficult to analyze because of the noise caused by the low frame rate, insufficient resolution, and blurriness induced by the applied electric potentials.
  • methods: An image processing algorithm, informed by the physics of the underlying system, that tracks the motion of buckling NEMS structures at high noise levels; it consists of an image filter, two data filters, and a nonlinear regression model that exploits the expected form of the physical solution (see the regression sketch after the abstract).
  • results: The algorithm tracks the dynamical motion of the NEMS and captures the dependence of the deflection amplitude on the compressive force on the beam; with it, the transition from inter-well to intra-well motion is clearly resolved for buckling NEMS imaged under SEM.
    Abstract The scanning electron microscope (SEM) recordings of dynamic nano-electromechanical systems (NEMS) are difficult to analyze due to the noise caused by low frame rate, insufficient resolution and blurriness induced by applied electric potentials. Here, we develop an image processing algorithm enhanced by the physics of the underlying system to track the motion of buckling NEMS structures in the presence of high noise levels. The algorithm is composed of an image filter, two data filters, and a nonlinear regression model, which utilizes the expected form of the physical solution. The method was applied to the recordings of a NEMS beam about 150 nm wide, undergoing intra-and inter-well post-buckling states with a transition rate of approximately 0.5 Hz. The algorithm can track the dynamical motion of the NEMS and capture the dependency of deflection amplitude on the compressive force on the beam. With the help of the proposed algorithm, the transition from inter-well to intra-well motion is clearly resolved for buckling NEMS imaged under SEM.
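A minimal sketch of the nonlinear-regression step: fitting noisy tracked deflections to an expected physical form. The sinusoidal model about a buckled offset and every number below are illustrative assumptions; the paper's actual post-buckling solution form may differ:

```python
import numpy as np
from scipy.optimize import curve_fit

def deflection_model(t, amplitude, freq, phase, offset):
    """Assumed form of the tracked beam-midpoint deflection over time."""
    return offset + amplitude * np.sin(2 * np.pi * freq * t + phase)

# Simulated noisy tracked positions, standing in for filtered SEM measurements;
# the ~0.5 Hz rate echoes the transition rate reported in the abstract.
t = np.linspace(0, 10, 300)
data = deflection_model(t, 80.0, 0.5, 0.3, 20.0) + np.random.normal(0, 15, t.size)

popt, pcov = curve_fit(deflection_model, t, data, p0=[50.0, 0.4, 0.0, 0.0])
amp_err = np.sqrt(np.diag(pcov))[0]
print(f"amplitude = {popt[0]:.1f} ± {amp_err:.1f} nm, rate = {popt[1]:.2f} Hz")
```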

Implementation of a perception system for autonomous vehicles using a detection-segmentation network in SoC FPGA

  • paper_url: http://arxiv.org/abs/2307.08682
  • repo_url: https://github.com/vision-agh/mt_kria
  • paper_authors: Maciej Baczmanski, Mateusz Wasala, Tomasz Kryjak
  • for: Developing an efficient, real-time, energy-efficient perception system for autonomous vehicles that recognizes obstacles and other environmental elements under varying road conditions.
  • methods: Uses the MultiTaskV3 detection-segmentation network as the basis of the perception system; the network is trained, quantized, and implemented on the AMD Xilinx Kria KV260 Vision AI embedded platform, which parallelizes and accelerates the computations while keeping power consumption low (see the multi-task sketch after the abstract).
  • results: The system achieves high accuracy in both object detection (mAP above 97%) and image segmentation (mIoU above 90%), together with real-time operation and low power consumption (about 5 W on average).
    Abstract Perception and control systems for autonomous vehicles are an active area of scientific and industrial research. These solutions should be characterised by high efficiency in recognising obstacles and other environmental elements in different road conditions, real-time capability, and energy efficiency. Achieving such functionality requires an appropriate algorithm and a suitable computing platform. In this paper, we have used the MultiTaskV3 detection-segmentation network as the basis for a perception system that can perform both functionalities within a single architecture. It was appropriately trained, quantised, and implemented on the AMD Xilinx Kria KV260 Vision AI embedded platform. By using this device, it was possible to parallelise and accelerate the computations. Furthermore, the whole system consumes relatively little power compared to a CPU-based implementation (an average of 5 watts, compared to the minimum of 55 watts for weaker CPUs, and the small size (119mm x 140mm x 36mm) of the platform allows it to be used in devices where the amount of space available is limited. It also achieves an accuracy higher than 97% of the mAP (mean average precision) for object detection and above 90% of the mIoU (mean intersection over union) for image segmentation. The article also details the design of the Mecanum wheel vehicle, which was used to test the proposed solution in a mock-up city.
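A minimal sketch of a detection-segmentation network with a shared backbone, illustrating how a single architecture can serve both tasks the way MultiTaskV3 does; the backbone, head shapes, and class counts are illustrative assumptions, not the actual MultiTaskV3 topology:

```python
import torch
import torch.nn as nn

class DetectSegNet(nn.Module):
    def __init__(self, num_classes=4, num_anchors=3):
        super().__init__()
        self.backbone = nn.Sequential(                 # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        # Detection head: per-cell anchors (4 box coords + objectness + classes).
        self.det_head = nn.Conv2d(64, num_anchors * (5 + num_classes), 1)
        # Segmentation head: per-pixel class logits, upsampled to input size.
        self.seg_head = nn.Sequential(
            nn.Conv2d(64, num_classes, 1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False))

    def forward(self, x):
        feat = self.backbone(x)
        return self.det_head(feat), self.seg_head(feat)

# Post-training quantization for the Kria KV260 (e.g., via the Vitis AI
# toolchain) would follow; that flow is hardware-specific and not sketched here.
```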