eess.IV - 2023-08-08

Blur aware metric depth estimation with multi-focus plenoptic cameras

paper_url: http://arxiv.org/abs/2308.04252
repo_url: https://github.com/comsee-research/blade
paper_authors: Mathieu Labussière, Céline Teulière, Omar Ait-Aider
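for: This paper proposes a metric depth estimation algorithm that uses only raw images from a multi-focus plenoptic camera, with the aim of improving disparity estimation on defocused stereo images.
methods: The method exploits the blur information in the raw images and combines correspondence cues with defocus cues. It derives an inverse projection model that includes defocus blur and gives depth up to a scale factor, then calibrates that scale to obtain accurate metric depth.
results: Experiments show that introducing blur information improves the accuracy of depth estimation. The method is tested on complex real-world 3D scenes and compared against ground truth from a 3D lidar scanner.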

Abstract
While a traditional camera only captures one point of view of a scene, a plenoptic or light-field camera, is able to capture spatial and angular information in a single snapshot, enabling depth estimation from a single acquisition. In this paper, we present a new metric depth estimation algorithm using only raw images from a multi-focus plenoptic camera. The proposed approach is especially suited for the multi-focus configuration where several micro-lenses with different focal lengths are used. The main goal of our blur aware depth estimation (BLADE) approach is to improve disparity estimation for defocus stereo images by integrating both correspondence and defocus cues. We thus leverage blur information where it was previously considered a drawback. We explicitly derive an inverse projection model including the defocus blur providing depth estimates up to a scale factor. A method to calibrate the inverse model is then proposed. We thus take into account depth scaling to achieve precise and accurate metric depth estimates. Our results show that introducing defocus cues improves the depth estimation. We demonstrate the effectiveness of our framework and depth scaling calibration on relative depth estimation setups and on real-world 3D complex scenes with ground truth acquired with a 3D lidar scanner.

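Summary
A traditional camera captures only one point of view of a scene, whereas a plenoptic (light-field) camera captures both spatial and angular information in a single snapshot, which makes depth estimation from a single acquisition possible. In this paper we present a new depth estimation algorithm that uses raw images from a multi-focus plenoptic camera. Our BLADE method aims to improve disparity estimation on defocused stereo images by making use of blur information rather than treating it as a drawback. We explicitly derive an inverse projection model that includes defocus blur in order to provide depth estimates. We also propose a calibration method that accounts for depth scaling. Our results show that introducing blur information improves depth estimation. We run experiments on relative depth estimation setups and on complex real-world 3D scenes, and compare against depth data acquired with a 3D laser scanner.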
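
The abstract states that the inverse projection model gives depth only up to a scale factor and that a calibration step recovers metric depth. As a minimal sketch of that idea, and not the authors' calibration procedure, the snippet below fits a single scale factor by least squares from a few reference distances. The helper name `fit_depth_scale` and the sample numbers are hypothetical, and the assumption that metric depth is a pure rescaling of the estimate is a simplification.

```python
import numpy as np


def fit_depth_scale(relative_depths, reference_depths):
    """Return the scale s that minimizes sum((s * z_rel - z_ref) ** 2).

    Hypothetical helper: assumes metric depth is a pure rescaling of the
    estimated depth, which simplifies whatever a real calibration does.
    """
    z_rel = np.asarray(relative_depths, dtype=float)
    z_ref = np.asarray(reference_depths, dtype=float)
    denom = np.dot(z_rel, z_rel)
    if denom == 0.0:
        raise ValueError("relative depths must not all be zero")
    return float(np.dot(z_rel, z_ref) / denom)


# Made-up example: up-to-scale depths and reference distances in meters.
z_rel = [1.0, 2.0, 3.0]
z_ref = [0.52, 0.98, 1.51]
scale = fit_depth_scale(z_rel, z_ref)
metric_depths = scale * np.asarray(z_rel)
```

A single scalar is the simplest possible choice; a real setup may need a richer mapping from estimated to metric depth.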

Under-Display Camera Image Restoration with Scattering Effect

paper_url: http://arxiv.org/abs/2308.04163
repo_url: https://github.com/namecantbenull/srudc
paper_authors: Binbin Song, Xiangyu Chen, Shuning Xu, Jiantao Zhou
for: Restoring images captured by under-display cameras (UDC), which give users a full-screen viewing experience with no obstruction from notches or punch holes.
methods: A physical scattering model describes the scattering effect of the display, and a two-branch network is designed to suppress that effect.
results: In experiments on both real-world and synthesized data, the proposed method outperforms state-of-the-art UDC restoration techniques.

Abstract
The under-display camera (UDC) provides consumers with a full-screen visual experience without any obstruction due to notches or punched holes. However, the semi-transparent nature of the display inevitably introduces the severe degradation into UDC images. In this work, we address the UDC image restoration problem with the specific consideration of the scattering effect caused by the display. We explicitly model the scattering effect by treating the display as a piece of homogeneous scattering medium. With the physical model of the scattering effect, we improve the image formation pipeline for the image synthesis to construct a realistic UDC dataset with ground truths. To suppress the scattering effect for the eventual UDC image recovery, a two-branch restoration network is designed. More specifically, the scattering branch leverages global modeling capabilities of the channel-wise self-attention to estimate parameters of the scattering effect from degraded images. While the image branch exploits the local representation advantage of CNN to recover clear scenes, implicitly guided by the scattering branch. Extensive experiments are conducted on both real-world and synthesized data, demonstrating the superiority of the proposed method over the state-of-the-art UDC restoration techniques. The source code and dataset are available at \url{https://github.com/NamecantbeNULL/SRUDC}.

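Summary
The under-display camera (UDC) offers an unobstructed full-screen visual experience, but the semi-transparent display severely degrades UDC images. We address the UDC image restoration problem with particular attention to the scattering effect caused by the display. We model the scattering effect explicitly, treating the display as a homogeneous scattering medium. With this physical model we improve the image formation pipeline and build a realistic UDC dataset with ground truths. To suppress the scattering effect, we design a two-branch network: the scattering branch uses the global modeling ability of channel-wise self-attention to estimate the parameters of the scattering effect, while the image branch uses the local representation strength of CNNs to recover clear scenes, guided by the scattering branch. Extensive experiments on real-world and synthesized data demonstrate the advantage of our method over existing UDC restoration techniques. The source code and dataset are available at https://github.com/NamecantbeNULL/SRUDC.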

Towards Top-Down Stereoscopic Image Quality Assessment via Stereo Attention

paper_url: http://arxiv.org/abs/2308.04156
repo_url: https://github.com/fanning-zhang/satnet
paper_authors: Huilin Zhang, Sumei Li, Yongli Chang
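for: This paper addresses stereoscopic image quality assessment (SIQA), which is used to evaluate the visual quality of 3D content.
methods: It proposes a new network based on stereo attention that uses a top-down perspective to guide the assessment. Guidance flows from high-level binocular signals down to low-level monocular signals, and the two are calibrated progressively through the processing pipeline.
results: Experiments indicate that the method better simulates properties of the human visual system and surpasses the current state of the art. Code is available at https://github.com/Fanning-Zhang/SATNet.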

Abstract
Stereoscopic image quality assessment (SIQA) plays a crucial role in evaluating and improving the visual experience of 3D content. Existing binocular properties and attention-based methods for SIQA have achieved promising performance. However, these bottom-up approaches are inadequate in exploiting the inherent characteristics of the human visual system (HVS). This paper presents a novel network for SIQA via stereo attention, employing a top-down perspective to guide the quality assessment process. Our proposed method realizes the guidance from high-level binocular signals down to low-level monocular signals, while the binocular and monocular information can be calibrated progressively throughout the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement the top-down philosophy in stereo perception. This block utilizes the fusion-generated attention map as a high-level binocular modulator, influencing the representation of two low-level monocular features. Additionally, we introduce an Energy Coefficient (EC) to account for recent findings indicating that binocular responses in the primate primary visual cortex are less than the sum of monocular responses. The adaptive EC can tune the magnitude of binocular response flexibly, thus enhancing the formation of robust binocular features within our framework. To extract the most discriminative quality information from the summation and subtraction of the two branches of monocular features, we utilize a dual-pooling strategy that applies min-pooling and max-pooling operations to the respective branches. Experimental results highlight the superiority of our top-down method in simulating the property of visual perception and advancing the state-of-the-art in the SIQA field. The code of this work is available at https://github.com/Fanning-Zhang/SATNet.

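Summary
Stereoscopic image quality assessment (SIQA) plays a key role in evaluating and improving the visual experience of 3D content. Existing methods based on binocular properties and attention have achieved promising performance. However, these bottom-up approaches do not fully exploit the inherent characteristics of the human visual system (HVS). This paper proposes a new network for SIQA via stereo attention, which guides the assessment process from a top-down perspective. The proposed method passes guidance from high-level binocular signals down to low-level monocular signals, with progressive calibration along the processing pipeline. We design a generalized Stereo AttenTion (SAT) block to implement the top-down philosophy in stereo perception. The block uses the fusion-generated attention map as a high-level binocular modulator that influences the representation of the low-level monocular features. In addition, we introduce an Energy Coefficient (EC) to account for recent findings that binocular responses in the primate primary visual cortex are smaller than the sum of monocular responses. The adaptive EC tunes the magnitude of the binocular response, which improves the robustness of the binocular features formed in our framework. To extract the most discriminative quality information from the summation and subtraction of the two monocular feature branches, we use a dual-pooling strategy that applies min-pooling and max-pooling to the respective branches. Experimental results show that our top-down method better simulates visual perception and advances the state of the art in SIQA. The code is available at https://github.com/Fanning-Zhang/SATNet.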

Physics-driven universal twin-image removal network for digital in-line holographic microscopy

paper_url: http://arxiv.org/abs/2308.04471
repo_url: None
paper_authors: Mikołaj Rogalski, Piotr Arcab, Luiza Stanaszek, Vicente Micó, Chao Zuo, Maciej Trusiak
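for: This work aims to improve the reconstruction quality of computational quantitative phase imaging in digital in-line holographic microscopy (DIHM), to better support studies of cell motility, migration, and bio-microfluidics.
methods: It uses UTIRnet, a deep learning solution that suppresses twin-image noise quickly and robustly and can be deployed across different DIHM systems.
results: Experiments show that UTIRnet suppresses twin-image noise while keeping reconstructions consistent with the input holograms, which improves the reliability of quantitative phase imaging. One experimental verification is migration sensing in live neural glial cell cultures.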

Abstract
Digital in-line holographic microscopy (DIHM) enables efficient and cost-effective computational quantitative phase imaging with a large field of view, making it valuable for studying cell motility, migration, and bio-microfluidics. However, the quality of DIHM reconstructions is compromised by twin-image noise, posing a significant challenge. Conventional methods for mitigating this noise involve complex hardware setups or time-consuming algorithms with often limited effectiveness. In this work, we propose UTIRnet, a deep learning solution for fast, robust, and universally applicable twin-image suppression, trained exclusively on numerically generated datasets. The availability of open-source UTIRnet codes facilitates its implementation in various DIHM systems without the need for extensive experimental training data. Notably, our network ensures the consistency of reconstruction results with input holograms, imparting a physics-based foundation and enhancing reliability compared to conventional deep learning approaches. Experimental verification was conducted among others on live neural glial cell culture migration sensing, which is crucial for neurodegenerative disease research.

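Summary
Digital in-line holographic microscopy (DIHM) provides efficient, cost-effective computational quantitative phase imaging over a large field of view, which makes it a valuable tool for studying cell motility, migration, and bio-microfluidics. However, the quality of DIHM reconstructions is limited by twin-image noise, which is a significant challenge. Conventional methods for mitigating this noise rely on complex hardware setups or time-consuming algorithms, often with limited effectiveness. We propose UTIRnet, a deep learning solution for fast, robust, and universally applicable twin-image suppression, trained on numerically generated datasets. The open-source UTIRnet code allows it to be used in different DIHM systems without extensive experimental training data. In addition, our network guarantees the consistency of reconstruction results with input holograms, which gives the approach a physics-based foundation and improves reliability compared with conventional deep learning methods. Experimental verification includes migration sensing in live neural glial cell cultures.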
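
To make the phrase "consistency of reconstruction results with input holograms" concrete, here is a minimal NumPy sketch of free-space propagation with the angular spectrum method, plus a hypothetical consistency measure that re-propagates a reconstruction to the sensor plane and compares its intensity with the recorded hologram. This is not the UTIRnet code. The function names, the mean-squared-error measure, and the optical parameters are assumptions chosen for illustration.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, pixel_size, distance):
    """Propagate a complex field by `distance` using the angular spectrum method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0  # evanescent components are dropped
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def hologram_consistency_error(reconstruction, hologram, wavelength,
                               pixel_size, distance):
    """Hypothetical measure: MSE between the re-simulated and recorded hologram."""
    predicted = np.abs(
        angular_spectrum_propagate(reconstruction, wavelength, pixel_size, distance)
    ) ** 2
    return float(np.mean((predicted - hologram) ** 2))


# Assumed parameters: 0.5 um wavelength, 2 um pixels, 1 mm sample-sensor distance.
wavelength, pixel_size, distance = 0.5e-6, 2e-6, 1e-3
rng = np.random.default_rng(0)
sample = np.exp(1j * 0.5 * rng.random((32, 32)))  # weak random phase object
hologram = np.abs(
    angular_spectrum_propagate(sample, wavelength, pixel_size, distance)
) ** 2
error_true = hologram_consistency_error(
    sample, hologram, wavelength, pixel_size, distance
)
```

Because the sensor records only intensity, a plain back-propagation of the hologram amplitude is also consistent with the hologram while still containing the twin image, so a check like this one constrains a reconstruction but does not remove the twin image by itself.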

Single-shot experimental-numerical twin-image removal in lensless digital holographic microscopy

paper_url: http://arxiv.org/abs/2308.04131
repo_url: None
paper_authors: Piotr Arcab, Mikolaj Rogalski, Maciej Trusiak
for: LDHM imaging offers a large field-of-view and is crucial for high-throughput particle tracking and biomedical examination of cells and tissues, but is limited by the twin-image effect.
methods: The proposed technique uses two-source off-axis hologram recording and a novel phase retrieval numerical algorithm to remove twin-image errors, providing a low-cost, out-of-laboratory imaging solution with enhanced precision.
results: The proposed technique enables twin-image-free reconstruction of LDHM images, which improves the accuracy of technical and biomedical imaging applications. The results demonstrate the effectiveness of the proposed technique using phase test targets and cheek cells biosamples.

Abstract
Lensless digital holographic microscopy (LDHM) offers very large field-of-view label-free imaging crucial, e.g., in high-throughput particle tracking and biomedical examination of cells and tissues. Compact layouts promote point-of-case and out-of-laboratory applications. The LDHM, based on the Gabor in-line holographic principle, is inherently spoiled by the twin-image effect, which complicates the quantitative analysis of reconstructed phase and amplitude maps. Popular family of solutions consists of numerical methods, which tend to minimize twin-image upon iterative process based on data redundancy. Additional hologram recordings are needed, and final results heavily depend on the algorithmic parameters, however. In this contribution we present a novel single-shot experimental-numerical twin-image removal technique for LDHM. It leverages two-source off-axis hologram recording deploying simple fiber splitter. Additionally, we introduce a novel phase retrieval numerical algorithm specifically tailored to the acquired holograms, that provides twin-image-free reconstruction without compromising the resolution. We quantitatively and qualitatively verify proposed method employing phase test target and cheek cells biosample. The results demonstrate that the proposed technique enables low-cost, out-of-laboratory LDHM imaging with enhanced precision, achieved through the elimination of twin-image errors. This advancement opens new avenues for more accurate technical and biomedical imaging applications using LDHM, particularly in scenarios where cost-effective and portable imaging solutions are desired.

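Summary
Lensless digital holographic microscopy (LDHM) offers label-free imaging with a very large field of view, which is important in, for example, high-throughput particle tracking and biomedical examination of cells and tissues. Compact layouts promote point-of-care and out-of-laboratory applications. LDHM, which is based on the Gabor in-line holographic principle, suffers from the twin-image effect, which complicates quantitative analysis of the reconstructed phase and amplitude maps. A popular family of solutions consists of numerical methods that reduce the twin image through an iterative process based on data redundancy. However, these methods need additional hologram recordings, and the final results depend on the algorithmic parameters. In this paper we present a new single-shot experimental-numerical twin-image removal technique for LDHM. It uses two-source off-axis hologram recording with a simple fiber splitter. We also introduce a phase retrieval numerical algorithm tailored to the acquired holograms, which gives twin-image-free reconstruction without compromising resolution. We verify the method using a phase test target and a cheek cell biosample. The results show that the technique enables low-cost, out-of-laboratory LDHM imaging with improved precision through the elimination of twin-image errors. This opens the way to more accurate and portable LDHM imaging applications, particularly where cost-effective solutions for use outside the laboratory are needed.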

Non-Intrusive Electric Load Monitoring Approach Based on Current Feature Visualization for Smart Energy Management

paper_url: http://arxiv.org/abs/2308.11627
repo_url: None
paper_authors: Yiwen Xu, Dengfeng Liu, Liangtao Huang, Zhiquan Lin, Tiesong Zhao, Sam Kwong
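for: This study aims to provide economical yet efficient energy management for smart cities, in particular monitoring and analyzing electric loads across a large-scale network.
methods: It uses popular computer vision techniques from AI to design a non-intrusive load monitoring method: one-dimensional current signals are mapped onto two-dimensional color feature images, and a U-shaped deep neural network then recognizes the loads.
results: Experiments on public and private datasets show that the method outperforms competing methods, which supports efficient energy management over large-scale Internet of Things (IoT) systems.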

Abstract
The state-of-the-art smart city has been calling for an economic but efficient energy management over large-scale network, especially for the electric power system. It is a critical issue to monitor, analyze and control electric loads of all users in system. In this paper, we employ the popular computer vision techniques of AI to design a non-invasive load monitoring method for smart electric energy management. First of all, we utilize both signal transforms (including wavelet transform and discrete Fourier transform) and Gramian Angular Field (GAF) methods to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose to recognize all electric loads from color feature images using a U-shape deep neural network with multi-scale feature extraction and attention mechanism. Third, we design our method as a cloud-based, non-invasive monitoring of all users, thereby saving energy cost during electric power system control. Experimental results on both public and our private datasets have demonstrated our method achieves superior performances than its peers, and thus supports efficient energy management over large-scale Internet of Things (IoT).

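Summary
Modern smart cities call for economical and efficient energy management, especially for the electric power system. Monitoring, analyzing, and controlling the electric loads of all users is a key issue. In this paper we adopt popular computer vision techniques to design a non-intrusive load monitoring method. First, we use signal transforms (including the wavelet transform and the discrete Fourier transform) and the Gramian Angular Field (GAF) method to map one-dimensional current signals onto two-dimensional color feature images. Second, we propose recognizing all electric loads with a U-shaped deep neural network that includes multi-scale feature extraction and an attention mechanism. Finally, we design the method as cloud-based, non-intrusive monitoring, in order to achieve efficient energy management over the Internet of Things (IoT). Experimental results show that our method performs better than other methods on both public and private datasets, and thus supports efficient energy management over large-scale IoT.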
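
The Gramian Angular Field step mentioned above can be sketched in a few lines of NumPy. The code follows the standard GAF definition (rescale the series to [-1, 1], take the arccosine, then form pairwise angular sums or differences). It is an illustration only, not the authors' pipeline: the function name, the 50 Hz test signal, and the choice of 64 samples are assumptions, and the mapping to color images and the wavelet and Fourier branches are not shown.

```python
import numpy as np


def gramian_angular_field(signal, method="summation"):
    """Map a 1D series to a 2D Gramian Angular Field image."""
    x = np.asarray(signal, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        scaled = np.zeros_like(x)  # constant signal: no angular variation
    else:
        scaled = 2.0 * (x - lo) / (hi - lo) - 1.0  # rescale to [-1, 1]
    scaled = np.clip(scaled, -1.0, 1.0)  # guard arccos against rounding
    phi = np.arccos(scaled)
    if method == "summation":  # GASF: cos(phi_i + phi_j)
        return np.cos(phi[:, None] + phi[None, :])
    if method == "difference":  # GADF: sin(phi_i - phi_j)
        return np.sin(phi[:, None] - phi[None, :])
    raise ValueError("method must be 'summation' or 'difference'")


# Hypothetical input: two cycles of a 50 Hz current waveform, 64 samples.
t = np.linspace(0.0, 0.04, 64)
current = np.sin(2.0 * np.pi * 50.0 * t)
gasf = gramian_angular_field(current)
gadf = gramian_angular_field(current, method="difference")
```

Each output is a 64 x 64 array with values in [-1, 1], so it can be rendered as one channel of an image and passed to an image network.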

Weakly Semi-Supervised Detection in Lung Ultrasound Videos

paper_url: http://arxiv.org/abs/2308.04463
repo_url: None
paper_authors: Jiahong Ouyang, Li Chen, Gary Y. Li, Naveen Balaraju, Shubham Patil, Courosh Mehanian, Sourabh Kulhare, Rachel Millin, Kenton W. Gregory, Cynthia R. Gregory, Meihua Zhu, David O. Kessler, Laurie Malia, Almaz Dessie, Joni Rabiner, Di Coneybeare, Bo Shopsin, Andrew Hersh, Cristian Madar, Jeffrey Shupp, Laura S. Johnson, Jacob Avila, Kristin Dwyer, Peter Weimersheimer, Balasundar Raju, Jochen Kruecker, Alvin Chen
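for: Improving the accuracy and robustness of object detection in medical videos using weakly supervised learning.
methods: Individual detection predictions are aggregated into video-level predictions, and a video-level loss provides additional supervision. The paper also extends teacher-student training with weak supervision, including ways to improve pseudo-label quality and adaptive schemes for knowledge transfer between the teacher and student networks.
results: Detection of lung consolidations (seen in conditions such as COVID-19 pneumonia) in medical ultrasound videos becomes more accurate and robust than with baseline semi-supervised models, with more efficient use of data and annotations.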

Abstract
Frame-by-frame annotation of bounding boxes by clinical experts is often required to train fully supervised object detection models on medical video data. We propose a method for improving object detection in medical videos through weak supervision from video-level labels. More concretely, we aggregate individual detection predictions into video-level predictions and extend a teacher-student training strategy to provide additional supervision via a video-level loss. We also introduce improvements to the underlying teacher-student framework, including methods to improve the quality of pseudo-labels based on weak supervision and adaptive schemes to optimize knowledge transfer between the student and teacher networks. We apply this approach to the clinically important task of detecting lung consolidations (seen in respiratory infections such as COVID-19 pneumonia) in medical ultrasound videos. Experiments reveal that our framework improves detection accuracy and robustness compared to baseline semi-supervised models, and improves efficiency in data and annotation usage.

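Summary
Frame-by-frame annotation of bounding boxes by clinical experts is often required to train fully supervised object detection models on medical video data. We propose a method that improves object detection in medical videos through weak supervision, raising detection accuracy and robustness. Specifically, we aggregate individual detection predictions into video-level predictions and extend a teacher-student training strategy with a video-level loss to provide additional supervision. We also introduce improvements to the teacher-student framework, including better pseudo-label quality based on weak supervision and adaptive adjustment of knowledge transfer between the teacher and student networks. We apply the method to detecting lung consolidations (seen in pneumonia caused by COVID-19 infection) in medical ultrasound videos. Experiments show that our framework improves detection accuracy and robustness, and makes more efficient use of data and annotations.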
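
The abstract does not say how detection predictions are aggregated into a video-level prediction. As one plausible sketch, under the assumption that the video score is the maximum detection confidence across all frames, the snippet below computes that score and a binary cross-entropy loss against a video-level label. The function names and sample confidences are hypothetical, and max-pooling is only one of several possible aggregation rules.

```python
import math


def video_score(frame_confidences):
    """Aggregate per-frame detection confidences into one video-level score.

    `frame_confidences` holds one sequence per frame, each containing the
    confidences (in [0, 1]) of the boxes detected in that frame.
    Assumption: the video score is the maximum confidence over all boxes.
    """
    per_frame = [max(c) if len(c) else 0.0 for c in frame_confidences]
    return max(per_frame) if per_frame else 0.0


def video_level_bce(score, label, eps=1e-7):
    """Binary cross-entropy between a video score and a 0/1 video label."""
    p = min(max(score, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Made-up example: three frames, the second with no detections.
confidences = [[0.1, 0.3], [], [0.8]]
score = video_score(confidences)
loss_if_positive = video_level_bce(score, 1)
loss_if_negative = video_level_bce(score, 0)
```

With a rule like this, a video labeled negative penalizes every confident box, while a video labeled positive only requires one confident box somewhere in the clip.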

DefCor-Net: Physics-Aware Ultrasound Deformation Correction

paper_url: http://arxiv.org/abs/2308.03865
repo_url: https://github.com/karolinezhy/defcornet
paper_authors: Zhongliang Jiang, Yue Zhou, Dongliang Cao, Nassir Navab
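for: This paper aims to improve the accuracy of deformation correction in ultrasound (US) image acquisition, in support of accurate and consistent diagnosis, particularly computer-assisted diagnosis.
methods: It proposes DefCor-Net, a new anatomy-aware deformation correction method built on a coarse-to-fine, multi-scale deep neural network. The network estimates pixel-wise stiffness online with a U-shaped feature extractor, and the deformation field is computed by polynomial regression that integrates the measured force applied by the US probe.
results: Experiments show that DefCor-Net substantially improves the accuracy of deformation correction, with the Dice coefficient rising from $14.3\pm20.9$ to $82.6\pm12.1$ when the force is $6N$.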

Abstract
The recovery of morphologically accurate anatomical images from deformed ones is challenging in ultrasound (US) image acquisition, but crucial to accurate and consistent diagnosis, particularly in the emerging field of computer-assisted diagnosis. This article presents a novel anatomy-aware deformation correction approach based on a coarse-to-fine, multi-scale deep neural network (DefCor-Net). To achieve pixel-wise performance, DefCor-Net incorporates biomedical knowledge by estimating pixel-wise stiffness online using a U-shaped feature extractor. The deformation field is then computed using polynomial regression by integrating the measured force applied by the US probe. Based on real-time estimation of pixel-by-pixel tissue properties, the learning-based approach enables the potential for anatomy-aware deformation correction. To demonstrate the effectiveness of the proposed DefCor-Net, images recorded at multiple locations on forearms and upper arms of six volunteers are used to train and validate DefCor-Net. The results demonstrate that DefCor-Net can significantly improve the accuracy of deformation correction to recover the original geometry (Dice Coefficient: from $14.3\pm20.9$ to $82.6\pm12.1$ when the force is $6N$).

Summary
In ultrasound (US) image acquisition, recovering morphologically accurate anatomical images from deformed ones is challenging, but it is essential for accurate and consistent diagnosis, particularly in computer-assisted diagnosis. This article proposes a new anatomy-aware deformation correction method based on a multi-scale deep neural network (DefCor-Net). DefCor-Net estimates pixel-wise stiffness online with a U-shaped feature extractor, which gives pixel-level performance, and computes the deformation field from these real-time estimates of tissue properties. To demonstrate the effectiveness of DefCor-Net, images recorded at multiple locations on the forearms and upper arms of six volunteers are used for training and validation. The results show that DefCor-Net significantly improves the accuracy of deformation correction, from $14.3\pm20.9$ to $82.6\pm12.1$ (when the force is $6N$).
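
The result above is reported as a Dice coefficient. For reference, a minimal sketch of the standard Dice computation on two binary masks follows. The function name and the toy masks are illustrative, and the returned value lies in [0, 1], whereas the figures quoted in the abstract appear to be on a 0 to 100 scale.

```python
import numpy as np


def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A and B| / (|A| + |B|) between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty masks match perfectly
    return float(2.0 * np.logical_and(a, b).sum() / total)


# Toy masks: 2 pixels in the first, 1 in the second, 1 shared.
corrected = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
dice = dice_coefficient(corrected, reference)  # 2 * 1 / (2 + 1)
```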