eess.IV - 2023-10-23

Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark

  • paper_url: http://arxiv.org/abs/2310.15163
  • repo_url: None
  • paper_authors: Ahmed Telili, Wassim Hamidouche, Hadi Amirpour, Sid Ahmed Fezza, Luce Morin, Christian Timmerer
  • for: This paper surveys methods for predicting content-optimized bitrate ladders in order to improve the streaming experience of OTT video services.
  • methods: The paper reviews both conventional and learning-based approaches to bitrate ladder prediction.
  • results: A benchmark study on a proposed large-scale dataset evaluates several learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings.
    Abstract HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for over-the-top (OTT) video streaming services, due to its ability to deliver a seamless streaming experience. A key component of HAS is the bitrate ladder, which provides the encoding parameters (e.g., bitrate-resolution pairs) to encode the source video. The representations in the bitrate ladder allow the client's player to dynamically adjust the quality of the video stream based on network conditions by selecting the most appropriate representation from the bitrate ladder. The most straightforward and lowest-complexity approach uses a fixed bitrate ladder for all videos, consisting of pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely, the most reliable technique relies on intensively encoding all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly exhaustive search encoding. This paper provides a comprehensive review of various methods, including both conventional and learning-based approaches. Furthermore, we conduct a benchmark study focusing exclusively on various learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded with software and hardware encoders conforming to three state-of-the-art standards, AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis provides baseline methods and insights, which will be valuable for future research in the field of bitrate ladder prediction. The source code of the proposed benchmark and the dataset will be made publicly available upon acceptance of the paper.
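    To make the convex-hull approach concrete, here is a minimal sketch under hypothetical measurements: it builds a per-title ladder from the Pareto front of exhaustively measured (resolution, bitrate, quality) points. In practice each point would come from an actual encode (e.g., with an AVC/HEVC/VVC encoder) and quality from a metric such as VMAF; the numbers below are illustrative only.

```python
# Hypothetical (resolution, bitrate_kbps, quality) measurements;
# in a real pipeline each tuple comes from an exhaustive encode.
measurements = [
    ("3840x2160", 16000, 96.0), ("3840x2160", 8000, 90.0),
    ("1920x1080", 8000, 92.0),  ("1920x1080", 4000, 86.0),
    ("1280x720",  4000, 84.0),  ("1280x720",  2000, 78.0),
]

def pareto_front(points):
    """Keep points that are not dominated (lower bitrate, higher quality)."""
    front = []
    for res, rate, q in sorted(points, key=lambda p: (p[1], -p[2])):
        if not front or q > front[-1][2]:
            front.append((res, rate, q))
    return front

def build_ladder(points, targets_kbps):
    """For each target bitrate, pick the front point with the highest
    quality achievable at or below that rate."""
    front = pareto_front(points)
    ladder = []
    for t in targets_kbps:
        feasible = [p for p in front if p[1] <= t]
        if feasible:
            ladder.append(max(feasible, key=lambda p: p[2]))
    return ladder

print(build_ladder(measurements, [2000, 4000, 8000, 16000]))
# -> one bitrate-resolution pair per rung, switching resolution where a
#    higher resolution overtakes a lower one in quality
```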

DeepOrientation: convolutional neural network for fringe pattern orientation map estimation

  • paper_url: http://arxiv.org/abs/2310.15209
  • repo_url: https://github.com/mariasi1/deeporientationnetmodel
  • paper_authors: Maria Cywinska, Mikolaj Rogalski, Filip Brzeski, Krzysztof Patorski, Maciej Trusiak
  • for: This work aims to provide an accurate, robust, and automatic method for estimating the local fringe orientation map in full-field optical metrology.
  • methods: A convolutional neural network, DeepOrientation, is trained to estimate the local fringe orientation map directly from the fringe pattern.
  • results: Numerical simulations and experiments show that DeepOrientation estimates local fringe orientation maps accurately and is more robust and automatic than the classical combined plane fitting/gradient method.
    Abstract Fringe pattern based measurement techniques are the state-of-the-art in full-field optical metrology. They are crucial both at the macroscale, e.g., fringe projection profilometry, and at the microscale, e.g., label-free quantitative phase microscopy. Accurate estimation of the local fringe orientation map can significantly facilitate the measurement process in various ways, e.g., fringe filtering (denoising), fringe pattern boundary padding, fringe skeletoning (contouring/following/tracking), local fringe spatial frequency (fringe period) estimation, and fringe pattern phase demodulation. Considering all of that, the accurate, robust, and preferably automatic estimation of the local fringe orientation map is of high importance. In this paper we propose a novel numerical solution for local fringe orientation map estimation, based on a convolutional neural network and deep learning, called DeepOrientation. Numerical simulations and experimental results corroborate the effectiveness of the proposed DeepOrientation, comparing it with a representative of the classical approach to orientation estimation, the combined plane fitting/gradient method. The example proving the effectiveness of DeepOrientation in fringe pattern analysis presented in this paper is the application of DeepOrientation to guiding the phase demodulation process in the Hilbert spiral transform. In particular, quantitative phase imaging of living HeLa cells verifies the method as an important asset in label-free microscopy.
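    As a point of reference, the classical baseline named in the abstract (the combined plane fitting/gradient method) can be approximated by a structure-tensor computation on the image gradients; the sketch below shows only that baseline, not the DeepOrientation network itself. The window size and the synthetic test pattern are assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def gradient_orientation_map(fringe, window=5):
    """Structure-tensor estimate of the local fringe orientation map,
    standing in for the classical plane fitting/gradient baseline."""
    gy, gx = np.gradient(fringe.astype(float))
    k = np.ones((window, window)) / window**2        # local averaging kernel
    jxx = convolve2d(gx * gx, k, mode="same")
    jyy = convolve2d(gy * gy, k, mode="same")
    jxy = convolve2d(gx * gy, k, mode="same")
    theta = 0.5 * np.arctan2(2 * jxy, jxx - jyy)     # dominant local direction
    return np.mod(theta, np.pi)                      # fold to [0, pi)

# Hypothetical usage on a synthetic straight-fringe pattern:
y, x = np.mgrid[0:128, 0:128]
theta = gradient_orientation_map(np.cos(0.3 * x + 0.1 * y))
```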

The AIMI Initiative: AI-Generated Annotations for Imaging Data Commons Collections

  • paper_url: http://arxiv.org/abs/2310.14897
  • repo_url: None
  • paper_authors: Gowtham Krishnan Murugesan, Diana McCrumb, Mariam Aboian, Tej Verma, Rahul Soni, Fatima Memon, Jeff Van Oss
  • for: This paper aims to contribute to the research and development of advanced imaging tools and algorithms by providing AI-generated annotations for 11 medical imaging collections from the Image Data Commons (IDC).
  • methods: The authors adopted and further developed both publicly available and novel AI algorithms, using open-sourced data coupled with expert annotations, to create the AI-generated annotations. A radiologist reviewed and corrected a portion of the AI annotations to assess the models’ performance.
  • results: The study provides AI-generated annotations for 11 medical imaging collections from the IDC, covering CT, MRI, and PET modalities and focusing on the chest, breast, kidneys, prostate, and liver, demonstrating how expansive, publicly accessible datasets coupled with AI can increase the accessibility and reliability of cancer imaging research and development.
    Abstract The Image Data Commons (IDC) contains publicly available cancer radiology datasets that could be pertinent to the research and development of advanced imaging tools and algorithms. However, the full extent of its research capabilities is limited by the fact that these datasets have few, if any, annotations associated with them. Through this study with the AI in Medical Imaging (AIMI) initiative, a significant contribution, in the form of AI-generated annotations, was made to provide 11 distinct medical imaging collections from the IDC with annotations. These collections included computed tomography (CT), magnetic resonance imaging (MRI), and positron emission tomography (PET) imaging modalities. The main focus of these annotations was on the chest, breast, kidneys, prostate, and liver. Both publicly available and novel AI algorithms were adopted and further developed using open-sourced data coupled with expert annotations to create the AI-generated annotations. A portion of the AI annotations were reviewed and corrected by a radiologist to assess the AI models' performance. Both the AI's and the radiologist's annotations conformed to DICOM standards for seamless integration into the IDC collections as third-party analyses. This study further cements the well-documented notion that expansive, publicly accessible datasets in the field of cancer imaging, coupled with AI, will aid in increased accessibility as well as reliability for further research and development.
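    The abstract does not specify how the radiologist-corrected annotations were scored against the AI output; a common overlap measure for such a check is the Dice coefficient, sketched below on hypothetical binary masks standing in for DICOM SEG pixel data.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between an AI mask and a radiologist-corrected mask."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom

# Hypothetical binary masks (real ones would come from DICOM SEG objects):
ai_mask = np.zeros((64, 64), bool);  ai_mask[10:30, 10:30] = True
rad_mask = np.zeros((64, 64), bool); rad_mask[12:32, 10:30] = True
print(f"Dice = {dice(ai_mask, rad_mask):.3f}")
```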

Joint Non-Linear MRI Inversion with Diffusion Priors

  • paper_url: http://arxiv.org/abs/2310.14842
  • repo_url: None
  • paper_authors: Moritz Erlacher, Martin Zach
  • for: To accelerate MRI examinations while preserving image quality.
  • methods: A data-driven reconstruction based on a diffusion model that jointly estimates the coil sensitivity maps with the image, removing the need for off-line sensitivity estimation.
  • results: Consistent qualitative and quantitative performance across different sub-sampling patterns, with accurately estimated coil sensitivities.
    Abstract Magnetic resonance imaging (MRI) is a potent diagnostic tool, but suffers from long examination times. To accelerate the process, modern MRI machines typically utilize multiple coils that acquire sub-sampled data in parallel. Data-driven reconstruction approaches, in particular diffusion models, recently achieved remarkable success in reconstructing these data, but typically rely on estimating the coil sensitivities in an off-line step. This suffers from potential movement and misalignment artifacts and limits the application to Cartesian sampling trajectories. To obviate the need for off-line sensitivity estimation, we propose to jointly estimate the sensitivity maps with the image. In particular, we utilize a diffusion model -- trained on magnitude images only -- to generate high-fidelity images while imposing spatial smoothness of the sensitivity maps in the reverse diffusion. The proposed approach demonstrates consistent qualitative and quantitative performance across different sub-sampling patterns. In addition, experiments indicate a good fit of the estimated coil sensitivities.
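    A rough sketch of the joint-estimation structure, under stated assumptions: it shows only a generic alternating data-consistency step on the image and the sensitivity maps, with a discrete-Laplacian smoothness term on the maps; the paper's diffusion prior (reverse diffusion with a magnitude-trained model) is not reproduced. All names, shapes, and step sizes are illustrative.

```python
import numpy as np

def forward(x, sens, mask):
    """Per-coil measurement: undersampled FFT of the coil-weighted image."""
    return mask * np.fft.fft2(sens * x, axes=(-2, -1))

def joint_step(x, sens, y, mask, lr=0.5, smooth=0.1):
    """One alternating gradient step on image x and sensitivity maps sens."""
    resid = forward(x, sens, mask) - y                     # data misfit
    back = np.fft.ifft2(mask * resid, axes=(-2, -1))       # adjoint of forward
    x = x - lr * np.sum(np.conj(sens) * back, axis=0)      # image update
    lap = sum(np.roll(sens, d, ax)                         # discrete Laplacian
              for d in (1, -1) for ax in (1, 2)) - 4 * sens
    sens = sens - lr * (np.conj(x) * back - smooth * lap)  # smooth maps update
    return x, sens

# Hypothetical sizes: 4 coils, 64x64 image, random sampling mask.
rng = np.random.default_rng(0)
x_true = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
sens0 = np.ones((4, 64, 64), complex)
mask = (rng.random((64, 64)) < 0.3).astype(float)
y = forward(x_true, sens0, mask)
x_est, sens_est = joint_step(np.zeros_like(x_true), sens0, y, mask)
```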

First realization of macroscopic Fourier ptychography for hundred-meter distance sub-diffraction imaging

  • paper_url: http://arxiv.org/abs/2310.14515
  • repo_url: None
  • paper_authors: Qi Zhang, Yuran Lu, Yinghui Guo, Yingjie Shang, Mingbo Pu, Yulong Fan, Rui Zhou, Xiaoyin Li, Fei Zhang, Mingfeng Xu, Xiangang Luo
  • for: To push macroscopic Fourier ptychography beyond the ~10 m limit on imaging distance for remote sub-diffraction-limited imaging.
  • methods: A modified far-field condition for rough reflective objects, divergent-beam illumination to overcome the small-FoV limitation, and a joint optimization of the pupil function and the target image that yields aberration-free reconstructions while estimating the pupil function.
  • results: Experimental FP systems operating at 12 m, 90 m, and 170 m with a maximum synthetic aperture of 200 mm, improving the maximum imaging distance and synthetic aperture by more than an order of magnitude over prior work and opening a new stage of development for macroscopic FP.
    Abstract Fourier ptychography (FP) imaging, drawing on the idea of synthetic aperture, has been demonstrated as a potential approach for remote sub-diffraction-limited imaging. Nevertheless, the farthest imaging distance is still limited to around 10 m, even though there has been significant improvement in macroscopic FP. The most severe issue in increasing the imaging distance is the FoV limitation caused by the far-field diffraction condition. Here, we propose to modify the Fourier far-field condition for rough reflective objects, aiming to overcome the small-FoV limitation by using a divergent beam to illuminate objects. A joint optimization of the pupil function and the target image is utilized to attain an aberration-free image while estimating the pupil function simultaneously. Benefiting from the optimized reconstruction algorithm, which effectively expands the camera's effective aperture, we experimentally implement several FP systems suited to imaging distances of 12 m, 90 m, and 170 m with a maximum synthetic aperture of 200 mm. The maximum imaging distance and synthetic aperture are thus improved by more than one order of magnitude over state-of-the-art works, with a fourfold improvement in resolution. Our findings demonstrate significant potential for advancing the field of macroscopic FP, propelling it into a new stage of development.
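    As a quick plausibility check on the reported numbers, a Rayleigh-type estimate 1.22*lambda*z/D relates the synthetic aperture D to the achievable resolution at distance z. The wavelength below is an assumption (the abstract does not state one); the distances and the 200 mm aperture are taken from the abstract.

```python
# Back-of-the-envelope diffraction-limit arithmetic for the reported systems.
lam = 532e-9        # illumination wavelength [m] -- an assumption
d_syn = 0.200       # maximum synthetic aperture [m], from the abstract
for z in (12.0, 90.0, 170.0):           # reported imaging distances [m]
    res = 1.22 * lam * z / d_syn        # Rayleigh-type resolution estimate
    print(f"z = {z:5.1f} m -> ~{res * 1e3:.2f} mm resolution")
```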