eess.IV - 2023-10-19

Video Quality Assessment and Coding Complexity of the Versatile Video Coding Standard

  • paper_url: http://arxiv.org/abs/2310.13093
  • repo_url: None
  • paper_authors: Thomas Amestoy, Naty Sidaty, Wassim Hamidouche, Pierrick Philippe, Daniel Menard
  • for: This paper compares the coding performance and complexity of the latest video coding standard, Versatile Video Coding (VVC), against its predecessor, High Efficiency Video Coding (HEVC).
  • methods: The study uses a diverse set of test sequences covering High Definition (HD) and Ultra High Definition (UHD) resolutions, encoded over a wide range of bit-rates with the HEVC (HM) and VVC (VTM) reference software encoders.
  • results: VVC achieves bit-rate savings of 31% to 40% over HEVC, depending on the video content, spatial resolution, and selected quality metric. These coding-efficiency gains come at the cost of higher computational complexity: on average, VVC decoding is 1.5 times more complex than HEVC decoding, and VVC encoding is at least eight times more complex than the HEVC reference encoder (a sketch of how such bit-rate savings are typically summarized follows this entry).
    Abstract In recent years, the proliferation of multimedia applications and formats, such as IPTV, Virtual Reality (VR, 360-degree), and point cloud videos, has presented new challenges to the video compression research community. Simultaneously, there has been a growing demand from users for higher resolutions and improved visual quality. To further enhance coding efficiency, a new video coding standard, Versatile Video Coding (VVC), was introduced in July 2020. This paper conducts a comprehensive analysis of coding performance and complexity for the latest VVC standard in comparison to its predecessor, High Efficiency Video Coding (HEVC). The study employs a diverse set of test sequences, covering both High Definition (HD) and Ultra High Definition (UHD) resolutions, and spans a wide range of bit-rates. These sequences are encoded using the reference software encoders of HEVC (HM) and VVC (VTM). The results consistently demonstrate that VVC outperforms HEVC, achieving bit-rate savings of up to 40% on the subjective quality scale, particularly at realistic bit-rates and quality levels. Objective quality metrics, including PSNR, SSIM, and VMAF, support these findings, revealing bit-rate savings ranging from 31% to 40%, depending on the video content, spatial resolution, and the selected quality metric. However, these improvements in coding efficiency come at the cost of significantly increased computational complexity. On average, our results indicate that the VVC decoding process is 1.5 times more complex, while the encoding process becomes at least eight times more complex than that of the HEVC reference encoder. Our simultaneous profiling of the two standards sheds light on the primary evolutionary differences between them and highlights the specific stages responsible for the observed increase in complexity.
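The bit-rate savings reported above are typically summarized with a Bjøntegaard-delta (BD) rate, which averages the rate difference between two codecs over a common quality range. The snippet below is a minimal sketch of that computation; the (bitrate, PSNR) points for HM and VTM are hypothetical placeholders, not measurements from the paper.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard-delta rate: average bit-rate difference (%) of the test
    codec relative to the anchor over the overlapping quality range."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    # Integrate each fit over the common PSNR interval and average.
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1) * 100  # negative => test codec saves bit-rate

# Hypothetical rate-distortion points (kbps, dB) for one sequence.
hm_rate,  hm_psnr  = [1800, 3200, 5800, 10500], [34.1, 36.4, 38.6, 40.8]
vtm_rate, vtm_psnr = [1100, 2000, 3700,  6900], [34.3, 36.6, 38.8, 41.0]
print(f"BD-rate (VTM vs. HM): {bd_rate(hm_rate, hm_psnr, vtm_rate, vtm_psnr):+.1f}%")
```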

Product of Gaussian Mixture Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.12653
  • repo_url: https://github.com/vlogroup/pogmdm
  • paper_authors: Martin Zach, Erich Kobler, Antonin Chambolle, Thomas Pock
  • for: This paper addresses the problem of estimating the density f_X of a random variable X.
  • methods: The work smooths X successively so that the smoothed variable Y satisfies the diffusion partial differential equation (∂_t − Δ_1) f_Y(·, t) = 0 with initial condition f_Y(·, 0) = f_X. Concretely, it proposes a product-of-experts-type model with Gaussian mixture experts and studies configurations that admit an analytic expression for f_Y(·, t).
  • results: Numerical results show that the models are competitive for image denoising while being tractable, interpretable, and having only a small number of learnable parameters. The models can also be used for reliable noise estimation, enabling blind denoising of images corrupted by heteroscedastic noise (a toy illustration of the analytic smoothing property follows this entry).
    Abstract In this work we tackle the problem of estimating the density $ f_X $ of a random variable $ X $ by successive smoothing, such that the smoothed random variable $ Y $ fulfills the diffusion partial differential equation $ (\partial_t - \Delta_1)f_Y(\,\cdot\,, t) = 0 $ with initial condition $ f_Y(\,\cdot\,, 0) = f_X $. We propose a product-of-experts-type model utilizing Gaussian mixture experts and study configurations that admit an analytic expression for $ f_Y (\,\cdot\,, t) $. In particular, with a focus on image processing, we derive conditions for models acting on filter-, wavelet-, and shearlet responses. Our construction naturally allows the model to be trained simultaneously over the entire diffusion horizon using empirical Bayes. We show numerical results for image denoising where our models are competitive while being tractable, interpretable, and having only a small number of learnable parameters. As a byproduct, our models can be used for reliable noise estimation, allowing blind denoising of images corrupted by heteroscedastic noise.
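The analytic expression for f_Y(·, t) mentioned above rests on a simple fact: under the standard heat-equation convention ∂_t f = Δ f, a Gaussian mixture stays a Gaussian mixture with each component variance increased by 2t. The following one-dimensional sketch illustrates this with toy mixture parameters; it is not the paper's learned model, which applies the idea to filter, wavelet, and shearlet responses.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """Evaluate a 1-D Gaussian mixture density at the points x."""
    x = np.asarray(x)[:, None]
    return np.sum(weights * np.exp(-(x - means) ** 2 / (2 * variances))
                  / np.sqrt(2 * np.pi * variances), axis=1)

def diffused_gmm(x, weights, means, variances, t):
    """Closed-form heat-equation solution at time t when the initial condition
    is a Gaussian mixture: each component variance grows by 2t."""
    return gmm_density(x, weights, means, variances + 2.0 * t)

# Toy mixture (hypothetical parameters, not the paper's experts).
w, mu, s2 = np.array([0.3, 0.7]), np.array([-1.0, 2.0]), np.array([0.2, 0.5])

x = np.linspace(-8.0, 10.0, 4001)
f0 = gmm_density(x, w, mu, s2)          # f_Y(., 0) = f_X
ft = diffused_gmm(x, w, mu, s2, t=0.5)  # analytic f_Y(., t)

# Sanity check: both remain normalized probability densities.
dx = x[1] - x[0]
print(f0.sum() * dx, ft.sum() * dx)     # both approximately 1.0
```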

Iterative PnP and its application in 3D-2D vascular image registration for robot navigation

  • paper_url: http://arxiv.org/abs/2310.12551
  • repo_url: None
  • paper_authors: Jingwei Song, Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari
  • for: This paper presents a new real-time robot-centered 3D-2D vascular image alignment algorithm that is robust to outliers and can align nonrigid shapes.
  • methods: The work bridges high-accuracy 3D-2D registration techniques with the computational-efficiency requirements of vascular intervention robots. Centerline-based vascular 3D-2D registration is formulated as an iterative Perspective-n-Point (PnP) problem solved with a Levenberg-Marquardt solver on the Lie manifold; a Reproducing Kernel Hilbert Space (RKHS) formulation addresses the "big-to-small" problem, and it is solved efficiently with iteratively reweighted least squares (see the IRLS sketch after this entry).
  • results: Experiments show that the proposed algorithm performs registration at over 50 Hz (rigid) and 20 Hz (nonrigid) with accuracy comparable to other works, indicating that Iterative PnP is suitable for future vascular intervention robot applications.
    Abstract This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and computational efficiency requirements in intervention robot applications. We categorize centerline-based vascular 3D-2D image registration problems as an iterative Perspective-n-Point (PnP) problem and propose to use the Levenberg-Marquardt solver on the Lie manifold. Then, the recently developed Reproducing Kernel Hilbert Space (RKHS) algorithm is introduced to overcome the ``big-to-small'' problem in typical robotic scenarios. Finally, an iterative reweighted least squares is applied to solve RKHS-based formulation efficiently. Experiments indicate that the proposed algorithm processes registration over 50 Hz (rigid) and 20 Hz (nonrigid) and obtains competing registration accuracy similar to other works. Results indicate that our Iterative PnP is suitable for future vascular intervention robot applications.
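As a standalone illustration of the iteratively reweighted least squares (IRLS) step, the sketch below robustly fits a linear model under Huber weighting in the presence of outliers. The model, weight function, and data are toy stand-ins for the paper's RKHS-based registration residuals, not its actual formulation.

```python
import numpy as np

def huber_weights(residuals, delta=1.0):
    """Huber IRLS weights: full weight near zero, down-weighted in the tails."""
    a = np.abs(residuals)
    w = np.ones_like(a)
    w[a > delta] = delta / a[a > delta]
    return w

def irls(A, b, delta=1.0, iters=20):
    """Minimize a robust (Huber) cost on the residual A @ x - b by
    iteratively reweighted least squares."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]      # ordinary least-squares start
    for _ in range(iters):
        w = huber_weights(A @ x - b, delta)
        AW = A * w[:, None]                        # diag(w) @ A
        x = np.linalg.solve(A.T @ AW, AW.T @ b)    # weighted normal equations
    return x

# Toy data with gross outliers (stand-in for reprojection residuals).
rng = np.random.default_rng(0)
A = np.column_stack([np.ones(100), rng.uniform(-2.0, 2.0, 100)])
x_true = np.array([0.5, -1.2])
b = A @ x_true + 0.05 * rng.standard_normal(100)
b[:10] += 5.0                                      # corrupt 10% of the measurements
print("IRLS estimate:", irls(A, b, delta=0.1))     # close to [0.5, -1.2]
```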

Multi-granularity Backprojection Transformer for Remote Sensing Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2310.12507
  • repo_url: None
  • paper_authors: Jinglei Hao, Wukai Li, Binglu Wang, Shunzhou Wang, Yuting Lu, Ning Li, Yongqiang Zhao
  • for: This work aims to improve remote sensing image super-resolution (RSISR) performance while keeping computational cost low.
  • methods: The paper proposes a Multi-granularity Backprojection Transformer (MBT) that integrates the backprojection learning strategy into a Transformer framework. MBT consists of Scale-aware Backprojection-based Transformer Layers (SPTLs) for scale-aware low-resolution feature learning and Context-aware Backprojection-based Transformer Blocks (CPTBs) for hierarchical feature learning; a backprojection-based reconstruction module (PRM) further enhances the hierarchical features for image reconstruction (a sketch of the classic back-projection idea follows this entry).
  • results: Experiments show that MBT efficiently learns low-resolution features without excessive modules for high-resolution processing, lowering computational cost, and achieves state-of-the-art results on the UCMerced and AID datasets compared with other leading methods.
    Abstract Backprojection networks have achieved promising super-resolution performance for natural images but have not been well explored in the remote sensing image super-resolution (RSISR) field due to their high computation costs. In this paper, we propose a Multi-granularity Backprojection Transformer termed MBT for RSISR. MBT incorporates the backprojection learning strategy into a Transformer framework. It consists of Scale-aware Backprojection-based Transformer Layers (SPTLs) for scale-aware low-resolution feature learning and Context-aware Backprojection-based Transformer Blocks (CPTBs) for hierarchical feature learning. A backprojection-based reconstruction module (PRM) is also introduced to enhance the hierarchical features for image reconstruction. MBT stands out by efficiently learning low-resolution features without excessive modules for high-resolution processing, resulting in lower computational cost. Experiment results on the UCMerced and AID datasets demonstrate that MBT obtains state-of-the-art results compared to other leading methods.
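For context on the back-projection principle that the learned projection units in MBT build on, here is a minimal sketch of classic iterative back-projection super-resolution (Irani–Peleg style): the HR estimate is repeatedly corrected by the back-projected residual between the observed LR image and its simulated LR counterpart. The blur model, step size, and input are hypothetical, and this is not the MBT network.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def iterative_backprojection(lr, scale=2, iters=10, step=1.0):
    """Classic iterative back-projection SR: refine an HR estimate so that its
    simulated LR observation matches the input LR image."""
    hr = zoom(lr, scale, order=3)                        # initial HR guess (bicubic)
    for _ in range(iters):
        simulated_lr = zoom(gaussian_filter(hr, 1.0), 1.0 / scale, order=3)
        error = lr - simulated_lr                        # residual in LR space
        hr = hr + step * zoom(error, scale, order=3)     # back-project the residual
    return hr

# Toy usage on a random single-channel tile (stand-in for a remote sensing image).
lr_image = np.random.rand(64, 64)
sr_image = iterative_backprojection(lr_image, scale=2)
print(sr_image.shape)  # (128, 128)
```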