paper_authors: Yue Li, Junru Li, Chaoyi Lin, Kai Zhang, Li Zhang, Franck Galpin, Thierry Dumas, Hongtao Wang, Muhammed Coban, Jacob Ström, Du Liu, Kenneth Andersson
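for: This paper covers research on, and applications of, neural network-based video coding (NNVC).
methods: The paper discusses two main NN-based video coding technologies: neural network-based intra prediction and neural network-based in-loop filtering.
results: Compared with VTM-11.0_nnvc, the proposed NN-based coding tools in NNVC-4.0 achieve average BD-rate reductions of {11.94%, 21.86%, 22.59%} for {Y, Cb, Cr} under the random-access configuration, with gains of similar size under the low-delay and all-intra configurations.
Abstract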
The past decade has witnessed the huge success of deep learning in well-known artificial intelligence applications such as face recognition, autonomous driving, and large language models like ChatGPT. Recently, the application of deep learning has been extended to a much wider range, with neural network-based video coding being one of them. Neural network-based video coding can be performed at two different levels: embedding neural network-based (NN-based) coding tools into a classical video compression framework or building the entire compression framework upon neural networks. This paper elaborates on some of the recent exploration efforts of JVET (Joint Video Experts Team of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29) in the name of neural network-based video coding (NNVC), which fall into the former category. Specifically, this paper discusses two major NN-based video coding technologies, i.e. neural network-based intra prediction and neural network-based in-loop filtering, which have been investigated for several meeting cycles in JVET and finally adopted into the reference software of NNVC. Extensive experiments on top of NNVC have been conducted to evaluate the effectiveness of the proposed techniques. Compared with VTM-11.0_nnvc, the proposed NN-based coding tools in NNVC-4.0 achieve {11.94%, 21.86%, 22.59%}, {9.18%, 19.76%, 20.92%}, and {10.63%, 21.56%, 23.02%} BD-rate reductions on average for {Y, Cb, Cr} under random-access, low-delay, and all-intra configurations respectively.
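Summary
Over the past decade, deep learning has achieved great success in well-known artificial intelligence applications such as face recognition, autonomous driving, and large language models like ChatGPT. More recently its use has spread to a much wider range of fields, including neural network-based video coding. Such coding can be done at two levels: embedding neural network-based coding tools into a classical video compression framework, or building the entire compression framework on neural networks. This paper presents some of the recent exploration work of JVET (Joint Video Experts Team of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC29) on neural network-based video coding (NNVC), which belongs to the first category. Specifically, it discusses two major technologies that JVET investigated over several meeting cycles: neural network-based intra prediction and neural network-based in-loop filtering. Both were adopted into the NNVC reference software and evaluated in extensive experiments. Compared with VTM-11.0_nnvc, the NN-based coding tools in NNVC-4.0 achieve average BD-rate reductions of {11.94%, 21.86%, 22.59%}, {9.18%, 19.76%, 20.92%}, and {10.63%, 21.56%, 23.02%} for {Y, Cb, Cr} under the random-access, low-delay, and all-intra configurations, respectively.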
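The BD-rate figures quoted for this paper can be read as an average bitrate change at equal quality. The following is a minimal sketch of the classic Bjøntegaard delta-rate calculation (a cubic fit of log-rate against PSNR, integrated over the shared PSNR range). It is not the JVET reference tool, the function name `bd_rate` is hypothetical, and the rate/PSNR points are invented purely for illustration.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec vs. the anchor at equal PSNR.

    Negative values mean the test codec needs fewer bits (a BD-rate reduction).
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_a = np.polyfit(pa, log_ra, 3)
    poly_t = np.polyfit(pt, log_rt, 3)

    # Only the PSNR interval covered by both curves is compared.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Definite integrals of both fitted curves over [lo, hi].
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0


# Invented example: the test codec spends 10% fewer bits at every PSNR point,
# so the result is approximately -10 (a 10% BD-rate reduction).
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]  # kbps, illustrative only
anchor_psnr = [32.0, 35.0, 38.0, 41.0]           # dB, illustrative only
test_rates = [0.9 * r for r in anchor_rates]
print(bd_rate(anchor_rates, anchor_psnr, test_rates, anchor_psnr))
```

Under this convention, a reported "11.94% BD-rate reduction" for Y corresponds to a return value of about -11.94 when the anchor is VTM-11.0_nnvc and the test codec is NNVC-4.0 with the NN-based tools enabled.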
Diffusion-based Adversarial Purification for Robust Deep MRI Reconstruction
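results: Compared with leading defense methods such as adversarial training and randomized smoothing, the proposed method more effectively improves the robustness of MRI reconstruction.
Abstract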
Deep learning (DL) methods have been extensively employed in magnetic resonance imaging (MRI) reconstruction, demonstrating remarkable performance improvements compared to traditional non-DL methods. However, recent studies have uncovered the susceptibility of these models to carefully engineered adversarial perturbations. In this paper, we tackle this issue by leveraging diffusion models. Specifically, we introduce a defense strategy that enhances the robustness of DL-based MRI reconstruction methods through the utilization of pre-trained diffusion models as adversarial purifiers. Unlike conventional state-of-the-art adversarial defense methods (e.g., adversarial training), our proposed approach eliminates the need to solve a minimax optimization problem to train the image reconstruction model from scratch, and only requires fine-tuning on purified adversarial examples. Our experimental findings underscore the effectiveness of our proposed technique when benchmarked against leading defense methodologies for MRI reconstruction such as adversarial training and randomized smoothing.
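Summary
Deep learning (DL) methods are widely used in magnetic resonance imaging (MRI) reconstruction and show remarkable performance gains over traditional non-DL methods. However, recent studies have found that these models are highly susceptible to carefully engineered adversarial perturbations. This paper addresses the problem with diffusion models. It introduces a defense strategy that uses pre-trained diffusion models as adversarial purifiers to make DL-based MRI reconstruction more robust. Unlike conventional state-of-the-art adversarial defenses such as adversarial training, the proposed approach does not need to solve a minimax optimization problem to train the reconstruction model from scratch; it only requires fine-tuning on purified adversarial examples. Experiments show that the technique is effective when benchmarked against leading defenses for MRI reconstruction such as adversarial training and randomized smoothing.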
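For context, the minimax problem that adversarial training has to solve is commonly written as below. The notation is an assumption and is not taken from the paper: $f_{\theta}$ is the reconstruction network, $y$ the measured (undersampled) data, $x$ the reference image, $\delta$ a perturbation bounded in norm by $\epsilon$, and $\ell$ a reconstruction loss.

```latex
\min_{\theta} \; \mathbb{E}_{(x,\,y)} \Big[ \max_{\|\delta\|_{p} \le \epsilon} \; \ell\big( f_{\theta}(y + \delta),\; x \big) \Big]
```

The inner maximization searches for the worst-case perturbation at every training step, which is the cost a purification-based defense avoids by cleaning the input before it reaches the reconstruction model.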
From Capture to Display: A Survey on Volumetric Video
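results: Through a review and analysis of the existing literature, the survey explores the application scenarios of volumetric video services and outlines research challenges, opportunities, and possible future research directions.
Abstract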
Volumetric video, which offers immersive viewing experiences, is gaining increasing prominence. With its six degrees of freedom, it provides viewers with greater immersion and interactivity compared to traditional videos. Despite their potential, volumetric video services pose significant challenges. This survey conducts a comprehensive review of the existing literature on volumetric video. We first provide a general framework of volumetric video services, followed by a discussion on prerequisites for volumetric video, encompassing representations, open datasets, and quality assessment metrics. Then we delve into the current methodologies for each stage of the volumetric video service pipeline, detailing capturing, compression, transmission, rendering, and display techniques. Lastly, we explore various applications enabled by this pioneering technology and present an array of research challenges and opportunities in the domain of volumetric video services. This survey aspires to provide a holistic understanding of this burgeoning field and shed light on potential future research trajectories, aiming to bring the vision of volumetric video to fruition.
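Summary
Volumetric video, which offers immersive viewing experiences, is becoming increasingly prominent. Its six degrees of freedom give viewers greater immersion and interactivity than traditional video. Volumetric video services nevertheless face significant challenges. This survey gives a comprehensive review of the existing literature on volumetric video. It first presents a general framework for volumetric video services, then discusses the prerequisites for volumetric video, including representations, open datasets, and quality assessment metrics. It next details the methods used at each stage of the service pipeline: capture, compression, transmission, rendering, and display. Finally, it explores the applications that volumetric video enables and sets out research challenges and opportunities in the field. The survey aims to give a holistic understanding of the field and to point to potential future research directions, so that the vision of volumetric video can be realized.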
A survey on real-time 3D scene reconstruction with SLAM methods in embedded systems
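results: The article covers real-time performance, memory management, and power optimization in practical deployments, as well as the quality and performance of 3D scene reconstruction at different granularities.
Abstract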
3D reconstruction in simultaneous localization and mapping (SLAM) is an important topic for transport systems such as drones, service robots and mobile AR/VR devices. Compared to a point cloud representation, 3D reconstruction based on meshes and voxels is particularly useful for high-level functions, like obstacle avoidance or interaction with the physical environment. This article reviews the implementation of a visual-based 3D scene reconstruction pipeline on resource-constrained hardware platforms. Real-time performance, memory management and low power consumption are critical for embedded systems. A conventional SLAM pipeline from sensors to 3D reconstruction is described, including the potential use of deep learning. The implementation of advanced functions with limited resources is detailed. Recent systems propose the embedded implementation of 3D reconstruction methods with different granularities. The trade-off between required accuracy and resource consumption for real-time localization and reconstruction is one of the open research questions identified and discussed in this paper.
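Summary
3D reconstruction in SLAM is an important topic for transport systems such as drones, service robots, and mobile AR/VR devices. Compared with a point cloud representation, 3D reconstruction based on meshes and voxels is particularly useful for high-level functions such as obstacle avoidance or interaction with the physical environment. This article reviews the implementation of a visual-based 3D scene reconstruction pipeline on resource-constrained hardware platforms. Real-time performance, memory management, and low power consumption are key requirements for embedded systems. The article describes a conventional SLAM pipeline from sensors to 3D reconstruction, including the potential use of deep learning, and details how advanced functions can be implemented with limited resources. It also discusses embedded implementations of 3D reconstruction methods at different granularities. The trade-off between required accuracy and resource consumption for real-time localization and reconstruction is one of the open research questions it identifies.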