eess.IV - 2023-07-19

Flexible Physical Unclonable Functions based on non-deterministically distributed Dye-Doped Fibers and Droplets

  • paper_url: http://arxiv.org/abs/2308.11000
  • repo_url: None
  • paper_authors: Mauro Daniel Luigi Bruno, Giuseppe Emanuele Lio, Antonio Ferraro, Sara Nocentini, Giuseppe Papuzzo, Agostino Forestiero, Giovanni Desiderio, Maria Penelope De Santo, Diederik Sybolt Wiersma, Roberto Caputo, Giovanni Golemme, Francesco Riboli, Riccardo Cristoforo Barberi
  • for: To develop a new anti-counterfeiting technology that protects everyday merchandise against counterfeits.
  • methods: Electrospinning and electrospraying of dye-doped polymeric materials are used to fabricate flexible free-standing films that embed different Physical Unclonable Function (PUF) keys.
  • results: The technique yields anti-counterfeiting labels with three encryption levels: i) a map of fluorescent polymer droplets non-deterministically distributed on a dense yarn of polymer nanofibers; ii) a characteristic fluorescence spectrum for each label; iii) a challenge-response pair (CRP) identification protocol based on the strong PUF. The simple, inexpensive fabrication process and multilevel authentication make these colored polymeric tags a practical security solution.
    Abstract The development of new anti-counterfeiting solutions is a constant challenge and involves several research fields. Much interest is devoted to systems that are impossible to clone, based on the Physical Unclonable Function (PUF) paradigm. In this work, new strategies based on electrospinning and electrospraying of dye-doped polymeric materials are presented for the manufacturing of flexible free-standing films that embed different PUF keys. Films can be used to fabricate anticounterfeiting labels having three encryption levels: i) a map of fluorescent polymer droplets, with non-deterministic positions on a dense yarn of polymer nanofibers; ii) a characteristic fluorescence spectrum for each label; iii) a challenge-response pair (CRP) identification protocol based on the strong nature of the physical unclonable function. The intrinsic uniqueness introduced by the deposition techniques encodes enough complexity into the optical anti-counterfeiting tag to generate thousands of cryptographic keys. The simple and cheap fabrication process as well as the multilevel authentication makes such colored polymeric unclonable tags a practical solution for the secure protection of merchandise in our daily life.
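
The third encryption level is a challenge-response pair (CRP) protocol over a strong PUF. A minimal sketch of the generic enrollment/verification flow follows; `read_tag_response` is a hypothetical stand-in for the optical readout (a challenge would select an illumination condition and the response would be the measured fluorescence map), with a keyed hash merely mimicking device uniqueness.

```python
import hashlib
import random
import secrets

def read_tag_response(tag_id: str, challenge: bytes) -> bytes:
    """Stand-in for the optical readout: in the real system a challenge would
    select an illumination/readout condition and the response would be the
    measured fluorescence map. A keyed hash merely mimics device uniqueness."""
    return hashlib.sha256(tag_id.encode() + challenge).digest()

def enroll(tag_id: str, n_pairs: int = 1000) -> dict:
    """Record challenge-response pairs for a genuine tag at manufacturing time."""
    db = {}
    for _ in range(n_pairs):
        c = secrets.token_bytes(16)
        db[c] = read_tag_response(tag_id, c)
    return db

def authenticate(tag_id: str, crp_db: dict, n_checks: int = 5) -> bool:
    """Replay a random subset of enrolled challenges; each pair is consumed
    after use so an eavesdropper cannot replay observed responses."""
    challenges = random.sample(list(crp_db), n_checks)
    return all(read_tag_response(tag_id, c) == crp_db.pop(c) for c in challenges)

db = enroll("tag-001")
assert authenticate("tag-001", db)        # genuine tag passes
assert not authenticate("clone-001", db)  # a clone fails the protocol
```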

Blind Image Quality Assessment Using Multi-Stream Architecture with Spatial and Channel Attention

  • paper_url: http://arxiv.org/abs/2307.09857
  • repo_url: None
  • paper_authors: Hassan Khalid, Nisar Ahmed
  • for: To propose a multi-stream spatial and channel attention-based blind image quality assessment algorithm that improves prediction accuracy and correlation with human perceptual assessment.
  • methods: The algorithm combines hybrid features generated by two different backbone networks, then applies spatial and channel attention to assign high weights to the region of interest.
  • results: Effectiveness is validated on four legacy image quality assessment datasets, and experiments on authentic and synthetic distortion databases demonstrate the method's accuracy and generalization across distortion types.
    Abstract BIQA (Blind Image Quality Assessment) is an important field of study that evaluates images automatically. Although significant progress has been made, blind image quality assessment remains a difficult task since images vary in content and distortions. Most algorithms generate quality without emphasizing the important region of interest. In order to solve this, a multi-stream spatial and channel attention-based algorithm is being proposed. This algorithm generates more accurate predictions with a high correlation to human perceptual assessment by combining hybrid features from two different backbones, followed by spatial and channel attention to provide high weights to the region of interest. Four legacy image quality assessment datasets are used to validate the effectiveness of our proposed approach. Authentic and synthetic distortion image databases are used to demonstrate the effectiveness of the proposed method, and we show that it has excellent generalization properties with a particular focus on the perceptual foreground information.
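
A minimal PyTorch sketch of the attention mechanism the summary describes: channel attention reweights feature maps globally, and spatial attention emphasizes the region of interest in fused two-backbone features. The layer sizes and fusion-by-concatenation are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style: global pooling -> bottleneck MLP -> weights."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):
        w = self.mlp(x).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Highlights salient regions: channel-pooled maps -> conv -> spatial mask."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True),
                            x.max(1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(pooled))

# Fuse hybrid features from two backbones, attend, then regress a quality score.
feats_a = torch.randn(4, 256, 28, 28)   # stream 1 features (hypothetical shapes)
feats_b = torch.randn(4, 256, 28, 28)   # stream 2 features
fused = torch.cat([feats_a, feats_b], dim=1)             # (4, 512, 28, 28)
fused = SpatialAttention()(ChannelAttention(512)(fused))
score = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(512, 1))(fused)           # predicted quality
```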

Cryo-forum: A framework for orientation recovery with uncertainty measure with the application in cryo-EM image analysis

  • paper_url: http://arxiv.org/abs/2307.09847
  • repo_url: https://github.com/phonchi/cryo-forum
  • paper_authors: Szu-Chi Chung
  • for: To propose a new method for determining the orientation parameters of 2D cryo-EM projection images, a prerequisite for reconstructing 3D structures.
  • methods: A deep learning approach represents the orientation of each 2D image as a 10-dimensional feature vector and derives the predicted orientation as a unit quaternion under a geometric constraint (via a Quadratically-Constrained Quadratic Program), supplemented by an uncertainty metric.
  • results: Numerical analysis shows the method effectively recovers orientations from 2D cryo-EM images end-to-end, and the uncertainty estimate enables direct clean-up of the dataset at the 3D level. The methods are packaged in a user-friendly software suite named cryo-forum.
    Abstract In single-particle cryo-electron microscopy (cryo-EM), the efficient determination of orientation parameters for 2D projection images poses a significant challenge yet is crucial for reconstructing 3D structures. This task is complicated by the high noise levels present in the cryo-EM datasets, which often include outliers, necessitating several time-consuming 2D clean-up processes. Recently, solutions based on deep learning have emerged, offering a more streamlined approach to the traditionally laborious task of orientation estimation. These solutions often employ amortized inference, eliminating the need to estimate parameters individually for each image. However, these methods frequently overlook the presence of outliers and may not adequately concentrate on the components used within the network. This paper introduces a novel approach that uses a 10-dimensional feature vector to represent the orientation and applies a Quadratically-Constrained Quadratic Program to derive the predicted orientation as a unit quaternion, supplemented by an uncertainty metric. Furthermore, we propose a unique loss function that considers the pairwise distances between orientations, thereby enhancing the accuracy of our method. Finally, we also comprehensively evaluate the design choices involved in constructing the encoder network, a topic that has not received sufficient attention in the literature. Our numerical analysis demonstrates that our methodology effectively recovers orientations from 2D cryo-EM images in an end-to-end manner. Importantly, the inclusion of uncertainty quantification allows for direct clean-up of the dataset at the 3D level. Lastly, we package our proposed methods into a user-friendly software suite named cryo-forum, designed for easy accessibility by the developers.
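
One plausible reading of the 10-dimensional representation is that it parameterizes a symmetric 4x4 matrix whose constrained quadratic minimization over unit quaternions is the stated QCQP. The sketch below implements that reading and should be treated as an assumption, not the verified construction.

```python
import numpy as np

def quaternion_from_features(f10: np.ndarray):
    """Interpret a 10-D feature vector as the upper triangle of a symmetric
    4x4 matrix A and solve the QCQP  min_q q^T A q  s.t. ||q|| = 1.
    The minimizer is the unit eigenvector of the smallest eigenvalue; the
    eigenvalue gap can serve as a crude confidence/uncertainty proxy."""
    A = np.zeros((4, 4))
    iu = np.triu_indices(4)              # 10 upper-triangular entries
    A[iu] = f10
    A = A + A.T - np.diag(np.diag(A))    # symmetrize
    evals, evecs = np.linalg.eigh(A)     # ascending eigenvalues
    q = evecs[:, 0]                      # unit quaternion (up to sign)
    uncertainty = evals[1] - evals[0]    # small gap -> ambiguous orientation
    return q / np.linalg.norm(q), uncertainty

q, u = quaternion_from_features(np.random.randn(10))
print(q, np.linalg.norm(q), u)           # unit norm by construction
```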

Compressive Image Scanning Microscope

  • paper_url: http://arxiv.org/abs/2307.09841
  • repo_url: None
  • paper_authors: Ajay Gunalan, Marco Castello, Simonluca Piazza, Shunlei Li, Alberto Diaspro, Leonardo S. Mattos, Paolo Bianchini
  • for: To improve image quality and data-acquisition speed in laser scanning microscopy (LSM), specifically image scanning microscopy (ISM).
  • methods: A single-photon avalanche diode (SPAD) array detector is combined with a fixed sampling strategy; the parallel images from the SPAD array improve reconstruction quality while eliminating the need to compute sampling matrices and shortening acquisition.
  • results: Experiments show the approach produces high-quality images while reducing data-acquisition time and potential photobleaching.
    Abstract We present a novel approach to implement compressive sensing in laser scanning microscopes (LSM), specifically in image scanning microscopy (ISM), using a single-photon avalanche diode (SPAD) array detector. Our method addresses two significant limitations in applying compressive sensing to LSM: the time to compute the sampling matrix and the quality of reconstructed images. We employ a fixed sampling strategy, skipping alternate rows and columns during data acquisition, which reduces the number of points scanned by a factor of four and eliminates the need to compute different sampling matrices. By exploiting the parallel images generated by the SPAD array, we improve the quality of the reconstructed compressive-ISM images compared to standard compressive confocal LSM images. Our results demonstrate the effectiveness of our approach in producing higher-quality images with reduced data acquisition time and potential benefits in reducing photobleaching.
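
The fixed sampling strategy is easy to make concrete: skipping alternate rows and columns visits one quarter of the grid, so no per-image sampling matrix is needed. The sketch below builds such a mask and, as an illustrative stand-in for the paper's SPAD-based reconstruction, fills in the skipped pixels by plain interpolation.

```python
import numpy as np
from scipy.interpolate import griddata

def fixed_sampling_mask(h: int, w: int) -> np.ndarray:
    """Fixed mask: skip every other row and column, so only 1/4 of the grid is
    scanned and no per-image sampling matrix has to be computed."""
    mask = np.zeros((h, w), dtype=bool)
    mask[::2, ::2] = True
    return mask

mask = fixed_sampling_mask(256, 256)
print(mask.mean())  # 0.25 -> scanned points reduced by a factor of four

# Illustrative fill-in of the skipped pixels by interpolation; the paper
# instead exploits the SPAD array's parallel images for the reconstruction.
img = np.random.rand(256, 256)               # stand-in for a specimen image
ys, xs = np.nonzero(mask)
grid_y, grid_x = np.mgrid[0:256, 0:256]
recon = griddata((ys, xs), img[mask], (grid_y, grid_x), method="nearest")
```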

Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling

  • paper_url: http://arxiv.org/abs/2307.09804
  • repo_url: None
  • paper_authors: Julia Grabinski, Janis Keuper, Margret Keuper
  • for: To improve the native robustness of convolutional neural networks against common corruptions and adversarial attacks.
  • methods: The paper analyzes FLC pooling and proposes ASAP pooling for downsampling; ASAP addresses both aliasing and spectral-leakage artifacts in the frequency domain.
  • results: Networks using ASAP pooling exhibit higher native robustness against common corruptions and adversarial attacks while maintaining clean accuracy similar to or better than the baseline.
    Abstract Convolutional neural networks encode images through a sequence of convolutions, normalizations and non-linearities as well as downsampling operations into potentially strong semantic embeddings. Yet, previous work showed that even slight mistakes during sampling, leading to aliasing, can be directly attributed to the networks' lack in robustness. To address such issues and facilitate simpler and faster adversarial training, [12] recently proposed FLC pooling, a method for provably alias-free downsampling - in theory. In this work, we conduct a further analysis through the lens of signal processing and find that such current pooling methods, which address aliasing in the frequency domain, are still prone to spectral leakage artifacts. Hence, we propose aliasing and spectral artifact-free pooling, short ASAP. While only introducing a few modifications to FLC pooling, networks using ASAP as downsampling method exhibit higher native robustness against common corruptions, a property that FLC pooling was missing. ASAP also increases native robustness against adversarial attacks on high and low resolution data while maintaining similar clean accuracy or even outperforming the baseline.
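
FLC pooling, which ASAP modifies, downsamples by keeping only the lowest spatial frequencies, and is alias-free by construction. A minimal sketch of that frequency-domain pooling, with ASAP's additional spectral-leakage suppression deliberately omitted:

```python
import torch

def flc_pool(x: torch.Tensor) -> torch.Tensor:
    """FLC-style 2x downsampling: crop the central (lowest-frequency) block of
    the 2D spectrum, which avoids aliasing by construction. ASAP additionally
    suppresses spectral leakage; that refinement is omitted in this sketch."""
    B, C, H, W = x.shape
    X = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    h0, w0 = H // 4, W // 4                       # central H/2 x W/2 block
    X_low = X[..., h0:h0 + H // 2, w0:w0 + W // 2]
    y = torch.fft.ifft2(torch.fft.ifftshift(X_low, dim=(-2, -1)))
    return y.real / 4                             # rescale for the smaller grid

x = torch.randn(1, 3, 32, 32)
print(flc_pool(x).shape)  # torch.Size([1, 3, 16, 16])
```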

DiffDP: Radiotherapy Dose Prediction via a Diffusion Model

  • paper_url: http://arxiv.org/abs/2307.09794
  • repo_url: https://github.com/scufzh/DiffDP
  • paper_authors: Zhenghao Feng, Lu Wen, Peng Wang, Binyu Yan, Xi Wu, Jiliu Zhou, Yan Wang
  • for: To improve the accuracy of dose-distribution prediction in radiotherapy planning and alleviate the over-smoothing of existing methods, so that predicted distributions better capture the dose within patient anatomy.
  • methods: A diffusion-based dose prediction (DiffDP) model with a forward and a reverse process. The forward process gradually transforms dose-distribution maps into Gaussian noise and trains a noise predictor to estimate the noise added at each timestep; the reverse process removes noise from Gaussian noise over multiple steps with the trained predictor and outputs the predicted dose-distribution map. A structure encoder extracts anatomical information so the predictor is aware of dose constraints in the planning target volume and organs at risk.
  • results: Experiments on an in-house dataset of 130 rectum cancer patients show that DiffDP predicts patient dose distributions more accurately and better captures the dose distribution within patient anatomy.
    Abstract Currently, deep learning (DL) has achieved the automatic prediction of dose distribution in radiotherapy planning, enhancing its efficiency and quality. However, existing methods suffer from the over-smoothing problem for their commonly used L_1 or L_2 loss with posterior average calculations. To alleviate this limitation, we innovatively introduce a diffusion-based dose prediction (DiffDP) model for predicting the radiotherapy dose distribution of cancer patients. Specifically, the DiffDP model contains a forward process and a reverse process. In the forward process, DiffDP gradually transforms dose distribution maps into Gaussian noise by adding small noise and trains a noise predictor to predict the noise added in each timestep. In the reverse process, it removes the noise from the original Gaussian noise in multiple steps with the well-trained noise predictor and finally outputs the predicted dose distribution map. To ensure the accuracy of the prediction, we further design a structure encoder to extract anatomical information from patient anatomy images and enable the noise predictor to be aware of the dose constraints within several essential organs, i.e., the planning target volume and organs at risk. Extensive experiments on an in-house dataset with 130 rectum cancer patients demonstrate the superiority of the proposed method.
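
The forward/reverse structure described above is the standard denoising-diffusion recipe. A minimal sketch of the forward noising step and the noise-prediction training objective, with the structure-encoder conditioning reduced to a placeholder argument:

```python
import torch

# DDPM-style forward process: gradually turn a dose map x0 into Gaussian noise
# and train a predictor eps_theta(x_t, t, anatomy) to recover the added noise.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def forward_noise(x0: torch.Tensor, t: torch.Tensor):
    """q(x_t | x_0): x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = torch.randn_like(x0)
    a = alphas_bar[t].view(-1, 1, 1, 1)
    return a.sqrt() * x0 + (1 - a).sqrt() * eps, eps

def training_step(noise_predictor, x0, anatomy):
    """One noise-prediction step; `anatomy` stands in for the structure encoder
    output that conditions the predictor on PTV/OAR constraints."""
    t = torch.randint(0, T, (x0.shape[0],))
    x_t, eps = forward_noise(x0, t)
    eps_hat = noise_predictor(x_t, t, anatomy)
    return torch.nn.functional.mse_loss(eps_hat, eps)

dummy_net = lambda x_t, t, anat: torch.zeros_like(x_t)  # placeholder predictor
x0 = torch.rand(2, 1, 64, 64)                           # normalized dose maps
print(training_step(dummy_net, x0, anatomy=None))
```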

CPCM: Contextual Point Cloud Modeling for Weakly-supervised Point Cloud Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.10316
  • repo_url: https://github.com/lizhaoliu-Lec/CPCM
  • paper_authors: Lizhao Liu, Zhuangwei Zhuang, Shangxin Huang, Xunlong Xiao, Tianhang Xiang, Cen Chen, Jingdong Wang, Mingkui Tan
  • for: To improve the accuracy of weakly-supervised point cloud semantic segmentation and reduce the cost of dense annotations.
  • methods: A region-wise masking (RegionMask) strategy combined with a contextual masked training (CMT) method uses masked modeling to extract contextual information from sparsely annotated point clouds.
  • results: CPCM outperforms the state-of-the-art on the ScanNet V2 and S3DIS benchmarks.
    Abstract We study the task of weakly-supervised point cloud semantic segmentation with sparse annotations (e.g., less than 0.1% points are labeled), aiming to reduce the expensive cost of dense annotations. Unfortunately, with extremely sparse annotated points, it is very difficult to extract both contextual and object information for scene understanding such as semantic segmentation. Motivated by masked modeling (e.g., MAE) in image and video representation learning, we seek to endow the power of masked modeling to learn contextual information from sparsely-annotated points. However, directly applying MAE to 3D point clouds with sparse annotations may fail to work. First, it is nontrivial to effectively mask out the informative visual context from 3D point clouds. Second, how to fully exploit the sparse annotations for context modeling remains an open question. In this paper, we propose a simple yet effective Contextual Point Cloud Modeling (CPCM) method that consists of two parts: a region-wise masking (RegionMask) strategy and a contextual masked training (CMT) method. Specifically, RegionMask masks the point cloud continuously in geometric space to construct a meaningful masked prediction task for subsequent context learning. CMT disentangles the learning of supervised segmentation and unsupervised masked context prediction for effectively learning the very limited labeled points and mass unlabeled points, respectively. Extensive experiments on the widely-tested ScanNet V2 and S3DIS benchmarks demonstrate the superiority of CPCM over the state-of-the-art.
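
RegionMask masks the cloud continuously in geometric space rather than dropping points independently. A hypothetical re-implementation of that idea (random seeds, nearest-neighbor regions; the region size and mask ratio are assumed parameters, not the authors' values):

```python
import numpy as np

def region_mask(points: np.ndarray, mask_ratio: float = 0.75,
                region_size: int = 256) -> np.ndarray:
    """Grow contiguous geometric regions around random seed points until
    roughly `mask_ratio` of the cloud is masked, so the masked-prediction
    task requires real contextual reasoning rather than local interpolation."""
    n = len(points)
    masked = np.zeros(n, dtype=bool)
    target = int(mask_ratio * n)
    while masked.sum() < target:
        seed = points[np.random.randint(n)]
        d = np.linalg.norm(points - seed, axis=1)
        masked[np.argsort(d)[:region_size]] = True  # mask the seed's neighborhood
    return masked

pts = np.random.rand(10000, 3)
m = region_mask(pts)
print(m.mean())  # ~0.75 of points masked in contiguous blobs
```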

NTIRE 2023 Quality Assessment of Video Enhancement Challenge

  • paper_url: http://arxiv.org/abs/2307.09729
  • repo_url: None
  • paper_authors: Xiaohong Liu, Xiongkuo Min, Wei Sun, Yulun Zhang, Kai Zhang, Radu Timofte, Guangtao Zhai, Yixuan Gao, Yuqin Cao, Tengchuan Kou, Yunlong Dong, Ziheng Jia, Yilin Li, Wei Wu, Shuming Hu, Sibin Deng, Pengxiang Xiao, Ying Chen, Kai Li, Kai Zhao, Kun Yuan, Ming Sun, Heng Cong, Hao Wang, Lingzhi Fu, Yusheng Zhang, Rongyu Zhang, Hang Shi, Qihang Xu, Longan Xiao, Zhiliang Ma, Mirko Agarla, Luigi Celona, Claudio Rota, Raimondo Schettini, Zhiwei Huang, Yanan Li, Xiaotao Wang, Lei Lei, Hongye Liu, Wei Hong, Ironhead Chuang, Allen Lin, Drake Guan, Iris Chen, Kae Lou, Willy Huang, Yachun Tasi, Yvonne Kao, Haotian Fan, Fangyuan Kong, Shiqi Zhou, Hao Liu, Yu Lai, Shanshan Chen, Wenqi Wang, Haoning Wu, Chaofeng Chen, Chunzheng Zhu, Zekun Guo, Shiling Zhao, Haibing Yin, Hongkui Wang, Hanene Brachemi Meftah, Sid Ahmed Fezza, Wassim Hamidouche, Olivier Déforges, Tengfei Shi, Azadeh Mansouri, Hossein Motamednia, Amir Hossein Bakhtiari, Ahmad Mahmoudi Aznaveh
  • for: To report the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which addresses a major open problem in video processing: video quality assessment (VQA) for enhanced videos.
  • methods: The challenge uses the VDPVE dataset of 1211 enhanced videos: 600 with color, brightness, and contrast enhancement, 310 with deblurring, and 301 with deshaking. A total of 167 participants registered.
  • results: 61 teams submitted prediction results during the development phase (3168 submissions in total), and 37 teams made 176 submissions during the final testing phase. Finally, 19 teams submitted their models and fact sheets detailing the methods used. Several methods outperformed the baselines, and the winning methods demonstrated superior prediction performance.
    Abstract This paper reports on the NTIRE 2023 Quality Assessment of Video Enhancement Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2023. This challenge is to address a major challenge in the field of video processing, namely, video quality assessment (VQA) for enhanced videos. The challenge uses the VQA Dataset for Perceptual Video Enhancement (VDPVE), which has a total of 1211 enhanced videos, including 600 videos with color, brightness, and contrast enhancements, 310 videos with deblurring, and 301 deshaked videos. The challenge has a total of 167 registered participants. 61 participating teams submitted their prediction results during the development phase, with a total of 3168 submissions. A total of 176 submissions were submitted by 37 participating teams during the final testing phase. Finally, 19 participating teams submitted their models and fact sheets, and detailed the methods they used. Some methods have achieved better results than baseline methods, and the winning methods have demonstrated superior prediction performance.

Uncertainty-Driven Multi-Scale Feature Fusion Network for Real-time Image Deraining

  • paper_url: http://arxiv.org/abs/2307.09728
  • repo_url: None
  • paper_authors: Ming Tong, Xuefeng Yan, Yongzhen Wang
  • for: To improve the accuracy and reliability of vision-based measurement systems operating in rainy weather.
  • methods: An Uncertainty-Driven Multi-Scale Feature Fusion Network (UMFFNet) learns the probability mapping distribution between paired images to estimate uncertainty. An uncertainty feature fusion block (UFFB) uses this information to dynamically enhance acquired features and focus on blurry regions obscured by rain streaks, reducing prediction errors; features from multiple scales are fused to guide efficient collaborative rain removal.
  • results: UMFFNet achieves significant performance gains with few parameters, surpassing state-of-the-art image deraining methods.
    Abstract Visual-based measurement systems are frequently affected by rainy weather due to the degradation caused by rain streaks in captured images, and existing imaging devices struggle to address this issue in real-time. While most efforts leverage deep networks for image deraining and have made progress, their large parameter sizes hinder deployment on resource-constrained devices. Additionally, these data-driven models often produce deterministic results, without considering their inherent epistemic uncertainty, which can lead to undesired reconstruction errors. Well-calibrated uncertainty can help alleviate prediction errors and assist measurement devices in mitigating risks and improving usability. Therefore, we propose an Uncertainty-Driven Multi-Scale Feature Fusion Network (UMFFNet) that learns the probability mapping distribution between paired images to estimate uncertainty. Specifically, we introduce an uncertainty feature fusion block (UFFB) that utilizes uncertainty information to dynamically enhance acquired features and focus on blurry regions obscured by rain streaks, reducing prediction errors. In addition, to further boost the performance of UMFFNet, we fused feature information from multiple scales to guide the network for efficient collaborative rain removal. Extensive experiments demonstrate that UMFFNet achieves significant performance improvements with few parameters, surpassing state-of-the-art image deraining methods.
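
A sketch of what an uncertainty feature fusion block could look like under common heteroscedastic-uncertainty practice: the block predicts a per-pixel log-variance, uses it to steer attention toward uncertain rain-occluded regions, and a Gaussian negative log-likelihood keeps the estimate calibrated. Layer shapes are assumptions, not the authors' block.

```python
import torch
import torch.nn as nn

class UncertaintyFeatureFusion(nn.Module):
    """UFFB-style sketch: estimate per-pixel log-variance from the features,
    then up-weight uncertain (blurry, rain-occluded) regions so the network
    concentrates effort there."""
    def __init__(self, channels: int):
        super().__init__()
        self.to_logvar = nn.Conv2d(channels, 1, 3, padding=1)
        self.refine = nn.Conv2d(channels, channels, 3, padding=1)
    def forward(self, feats: torch.Tensor):
        logvar = self.to_logvar(feats)          # (B, 1, H, W)
        attn = torch.sigmoid(logvar)            # high variance -> high weight
        return feats + self.refine(feats) * attn, logvar

def heteroscedastic_loss(pred, target, logvar):
    """Gaussian NLL: uncertain pixels are down-weighted in the L2 term but
    penalized through the log-variance, keeping the uncertainty calibrated."""
    return (torch.exp(-logvar) * (pred - target) ** 2 + logvar).mean()

block = UncertaintyFeatureFusion(32)
feats, logvar = block(torch.randn(2, 32, 64, 64))
print(feats.shape, logvar.shape)
```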

Flexible single multimode fiber imaging using white LED

  • paper_url: http://arxiv.org/abs/2307.09714
  • repo_url: None
  • paper_authors: Minyu Fan, Kun Liu, Jie Zhu, Yu Cao, Sha Wang
  • for: This research aims to improve the imaging capabilities of multimode fibers (MMFs) using white LEDs and cascaded convolutional neural networks (CNNs) to mitigate the effects of mode coupling and modal dispersion.
  • methods: The proposed method uses a MMF as the imaging medium, a white LED as the light source, and a cascaded CNN to reconstruct the images. The channel stitching technology is used to concatenate the output speckle patterns in three different color channels of the CCD camera.
  • results: The experimental results show that the proposed method achieves high-quality image reconstruction with an average Pearson correlation coefficient (PCC) of 0.83 on the Fashion-MNIST dataset. The method also demonstrates good robustness, maintaining an average PCC of 0.83 even after completely changing the shape of the MMF.
    Abstract Multimode fiber (MMF) has been proven to have good potential in imaging and optical communication because of its advantages of small diameter and large mode numbers. However, due to mode coupling and modal dispersion, it is very sensitive to environmental changes. Minor changes in the fiber shape can lead to difficulties in information reconstruction. Here, a white LED and a cascaded Unet are used to achieve MMF imaging and eliminate the effect of fiber perturbations. The output speckle patterns in three different color channels of the CCD camera, produced by transferring images through the MMF, are concatenated and input into the cascaded Unet using channel stitching technology to improve the reconstruction effects. The average Pearson correlation coefficient (PCC) of the reconstructed images from the Fashion-MNIST dataset is 0.83. In order to check the flexibility of such a system, perturbation tests on the image reconstruction capability by changing the fiber shapes are conducted. The experimental results show that the MMF imaging system has good robustness properties, i.e., the average PCC remains 0.83 even after completely changing the shape of the MMF. This research potentially provides a flexible approach for the practical application of MMF imaging.
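
Channel stitching here simply means keeping the CCD's three color channels of the LED speckle as separate network input channels rather than collapsing them to grayscale; the quoted quality metric is the Pearson correlation coefficient (PCC). A small sketch of both (shapes hypothetical):

```python
import numpy as np

def stitch_channels(speckle_rgb: np.ndarray) -> np.ndarray:
    """Keep the CCD's three color channels of the LED-lit speckle as separate
    network input channels (HxWx3 uint8 -> 3xHxW float in [0, 1])."""
    return np.transpose(speckle_rgb.astype(np.float32) / 255.0, (2, 0, 1))

def pcc(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation coefficient, the reconstruction metric quoted above."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

frame = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)  # stand-in speckle
print(stitch_channels(frame).shape)   # (3, 224, 224) -> input to the cascaded Unet
print(pcc(frame[..., 0], frame[..., 1]))
```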

Transformer-based Dual-domain Network for Few-view Dedicated Cardiac SPECT Image Reconstructions

  • paper_url: http://arxiv.org/abs/2307.09624
  • repo_url: None
  • paper_authors: Huidong Xie, Bo Zhou, Xiongchao Chen, Xueqi Guo, Stephanie Thorn, Yi-Hwa Liu, Ge Wang, Albert Sinusas, Chi Liu
  • for: To improve image quality in myocardial perfusion imaging with GE 530/570c dedicated cardiac SPECT scanners used in the diagnosis of cardiovascular disease.
  • methods: A 3D transformer-based dual-domain network (TIP-Net) first reconstructs 3D cardiac SPECT images directly from projection data, then refines the result with an image-domain reconstruction network.
  • results: On human studies, the method produced images with higher cardiac defect contrast than previous baseline methods, validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect sizes quantified by FDA 510(k)-cleared clinical software.
    Abstract Cardiovascular disease (CVD) is the leading cause of death worldwide, and myocardial perfusion imaging using SPECT has been widely used in the diagnosis of CVDs. The GE 530/570c dedicated cardiac SPECT scanners adopt a stationary geometry to simultaneously acquire 19 projections to increase sensitivity and achieve dynamic imaging. However, the limited amount of angular sampling negatively affects image quality. Deep learning methods can be implemented to produce higher-quality images from stationary data. This is essentially a few-view imaging problem. In this work, we propose a novel 3D transformer-based dual-domain network, called TIP-Net, for high-quality 3D cardiac SPECT image reconstructions. Our method aims to first reconstruct 3D cardiac SPECT images directly from projection data without the iterative reconstruction process by proposing a customized projection-to-image domain transformer. Then, given its reconstruction output and the original few-view reconstruction, we further refine the reconstruction using an image-domain reconstruction network. Validated by cardiac catheterization images, diagnostic interpretations from nuclear cardiologists, and defect size quantified by an FDA 510(k)-cleared clinical software, our method produced images with higher cardiac defect contrast on human studies compared with previous baseline methods, potentially enabling high-quality defect visualization using stationary few-view dedicated cardiac SPECT scanners.
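
A high-level sketch of the two-stage dual-domain idea: a projection-to-image module maps the 19 few-view projections directly to a volume, and an image-domain network then refines that output together with the conventional few-view reconstruction. The modules and shapes below are placeholders, not the TIP-Net architecture.

```python
import torch
import torch.nn as nn

class DualDomainRecon(nn.Module):
    """Stage 1: map projections directly to a 3D volume (here a plain linear
    layer stands in for the projection-to-image transformer). Stage 2: refine
    that volume jointly with the conventional few-view reconstruction."""
    def __init__(self, n_views: int = 19, vol: int = 16):
        super().__init__()
        self.proj2img = nn.Sequential(
            nn.Flatten(), nn.Linear(n_views * vol * vol, vol ** 3))
        self.refine = nn.Conv3d(2, 1, 3, padding=1)  # image-domain refinement
        self.vol = vol
    def forward(self, projections, fewview_recon):
        v = self.proj2img(projections).view(-1, 1, self.vol, self.vol, self.vol)
        return self.refine(torch.cat([v, fewview_recon], dim=1))

net = DualDomainRecon()
proj = torch.randn(2, 19, 16, 16)        # the 19 stationary projections
prior = torch.randn(2, 1, 16, 16, 16)    # conventional few-view reconstruction
print(net(proj, prior).shape)            # torch.Size([2, 1, 16, 16, 16])
```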

A comparative analysis of SRGAN models

  • paper_url: http://arxiv.org/abs/2307.09456
  • repo_url: None
  • paper_authors: Fatemeh Rezapoor Nikroo, Ajinkya Deshmukh, Anantha Sharma, Adrian Tam, Kaarthik Kumar, Cleo Norris, Aditya Dangi
  • for: To evaluate several state-of-the-art SRGAN models (ESRGAN, Real-ESRGAN, and EDSR) on a benchmark dataset of real-world images degraded through a pipeline.
  • methods: The candidate models use state-of-the-art SRGAN network architectures.
  • results: The EDSR-BASE model from huggingface outperforms the remaining candidates on both quantitative metrics and subjective visual quality assessment, with the least compute overhead. Specifically, EDSR generates images with higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values and returns high-quality OCR results with the Tesseract OCR engine. These findings suggest EDSR is a robust and effective single-image super-resolution method, particularly well-suited to applications where high visual fidelity and optimized compute are critical.
    Abstract In this study, we evaluate the performance of multiple state-of-the-art SRGAN (Super Resolution Generative Adversarial Network) models, ESRGAN, Real-ESRGAN and EDSR, on a benchmark dataset of real-world images which undergo degradation using a pipeline. Our results show that some models seem to significantly increase the resolution of the input images while preserving their visual quality, this is assessed using Tesseract OCR engine. We observe that the EDSR-BASE model from huggingface outperforms the remaining candidate models in terms of both quantitative metrics and subjective visual quality assessments with least compute overhead. Specifically, EDSR generates images with higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values and are seen to return high quality OCR results with Tesseract OCR engine. These findings suggest that EDSR is a robust and effective approach for single-image super-resolution and may be particularly well-suited for applications where high-quality visual fidelity is critical and optimized compute.
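
The two quantitative metrics in the comparison are standard and reproducible with scikit-image. A minimal evaluation sketch, with random arrays standing in for a reference/upscaled image pair:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_sr(reference: np.ndarray, upscaled: np.ndarray) -> dict:
    """Compute PSNR and SSIM, the two quantitative metrics used in the
    comparison. Both inputs are HxWx3 uint8 images of equal size."""
    return {
        "psnr": peak_signal_noise_ratio(reference, upscaled),
        "ssim": structural_similarity(reference, upscaled, channel_axis=-1),
    }

ref = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)  # stand-in pair
out = np.clip(ref + np.random.randint(-5, 6, ref.shape), 0, 255).astype(np.uint8)
print(evaluate_sr(ref, out))
```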

Measuring Student Behavioral Engagement using Histogram of Actions

  • paper_url: http://arxiv.org/abs/2307.09420
  • repo_url: None
  • paper_authors: Ahmed Abdelkawy, Islam Alkabbany, Asem Ali, Aly Farag
  • for: To develop a new technique for measuring student behavioral engagement by recognizing student actions and predicting the engagement level.
  • methods: Human skeletons model student postures and upper-body movements, and a 3D-CNN learns the dynamics of the upper body. Actions recognized within each 2-minute video segment are aggregated into a histogram of actions encoding the actions and their frequencies, which is fed to an SVM classifier to decide whether the student is engaged or disengaged.
  • results: Experiments show that student actions are recognized with 83.63% top-1 accuracy and that the framework captures the average engagement of the class.
    Abstract In this paper, we propose a novel technique for measuring behavioral engagement through students' action recognition. The proposed approach recognizes student actions, then predicts the student's behavioral engagement level. For student action recognition, we use human skeletons to model student postures and upper-body movements. To learn the dynamics of the student's upper body, a 3D-CNN model is used. The trained 3D-CNN model is used to recognize actions within every 2-minute video segment, and these actions are used to build a histogram of actions which encodes the student's actions and their frequencies. This histogram is utilized as input to an SVM classifier to classify whether the student is engaged or disengaged. To evaluate the proposed framework, we build a dataset consisting of 1414 2-minute video segments annotated with 13 actions and 112 video segments annotated with two engagement levels. Experimental results indicate that student actions can be recognized with a top-1 accuracy of 83.63%, and the proposed framework can capture the average engagement of the class.
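
The histogram-of-actions encoding plus SVM stage is straightforward to sketch: per-segment action labels from the 3D-CNN become a normalized 13-bin frequency vector, which feeds a binary SVM. The data below are synthetic stand-ins for the annotated segments, not the paper's dataset.

```python
import numpy as np
from sklearn.svm import SVC

N_ACTIONS = 13  # action classes recognized by the 3D-CNN per 2-minute segment

def histogram_of_actions(action_ids) -> np.ndarray:
    """Encode a segment as the normalized frequency of each recognized action."""
    h = np.bincount(np.asarray(action_ids), minlength=N_ACTIONS).astype(float)
    return h / max(h.sum(), 1.0)

# Synthetic stand-ins for the 112 segments annotated with two engagement levels.
rng = np.random.default_rng(0)
X = np.stack([histogram_of_actions(rng.integers(0, N_ACTIONS, 30))
              for _ in range(112)])
y = rng.integers(0, 2, 112)   # engaged / disengaged (labels synthetic here)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:5]))     # engagement prediction per video segment
```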