eess.IV - 2023-09-14

Live Iterative Ptychography with projection-based algorithms

  • paper_url: http://arxiv.org/abs/2309.08639
  • repo_url: https://github.com/sp-uhh/livepty
  • paper_authors: Simon Welker, Tal Peer, Henry N. Chapman, Timo Gerkmann
  • for: This work demonstrates that the ptychographic phase problem can be solved live during scanning, without waiting for the full dataset to be collected.
  • methods: The method is a generally applicable modification of projection-based algorithms such as Error Reduction (ER) and Difference Map (DM) that adds immediate visual feedback, object reconstruction with a fixed amount of computational resources, and adaptive scanning.
  • results: The study shows that live variants of projection-based methods can reach higher-quality reconstructions than their classic counterparts at a comparable effective computational load, while providing real-time visual feedback during the experiment.
    Abstract In this work, we demonstrate that the ptychographic phase problem can be solved in a live fashion during scanning, while data is still being collected. We propose a generally applicable modification of the widespread projection-based algorithms such as Error Reduction (ER) and Difference Map (DM). This novel variant of ptychographic phase retrieval enables immediate visual feedback during experiments, reconstruction of arbitrary-sized objects with a fixed amount of computational resources, and adaptive scanning. By building upon the Real-Time Iterative Spectrogram Inversion (RTISI) family of algorithms from the audio processing literature, we show that live variants of projection-based methods such as DM can be derived naturally and may even achieve higher-quality reconstructions than their classic non-live counterparts with comparable effective computational load.
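    A minimal sketch of the classic (non-live) projection-based building blocks that the paper modifies may help place the contribution: one Error Reduction sweep alternates a Fourier-modulus projection per diffraction pattern with an overlap (object) update, assuming a known probe. The live scheduling and the Difference Map variant described above are not reproduced here, and all names below are illustrative rather than taken from the authors' repository.

```python
import numpy as np

def modulus_projection(exit_wave, measured_intensity):
    # Keep the current Fourier phase but impose the measured far-field modulus.
    F = np.fft.fft2(exit_wave)
    F = np.sqrt(measured_intensity) * np.exp(1j * np.angle(F))
    return np.fft.ifft2(F)

def overlap_projection(obj, probe, exit_waves, positions):
    # Least-squares object update from the corrected exit waves at all scan positions.
    numer = np.zeros_like(obj)
    denom = np.full(obj.shape, 1e-8)
    h, w = probe.shape
    for psi, (y, x) in zip(exit_waves, positions):
        numer[y:y + h, x:x + w] += np.conj(probe) * psi
        denom[y:y + h, x:x + w] += np.abs(probe) ** 2
    return numer / denom

def error_reduction_sweep(obj, probe, intensities, positions):
    # One ER iteration over a batch of already-collected diffraction patterns;
    # the paper's live variant instead updates as patterns arrive during the scan.
    h, w = probe.shape
    exit_waves = []
    for I, (y, x) in zip(intensities, positions):
        psi = probe * obj[y:y + h, x:x + w]
        exit_waves.append(modulus_projection(psi, I))
    return overlap_projection(obj, probe, exit_waves, positions)
```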

MPAI-EEV: Standardization Efforts of Artificial Intelligence based End-to-End Video Coding

  • paper_url: http://arxiv.org/abs/2309.07589
  • repo_url: https://github.com/yefeng00/EEV-0.4
  • paper_authors: Chuanmin Jia, Feng Ye, Fanke Dong, Kai Lin, Leonardo Chiariglione, Siwei Ma, Huifang Sun, Wen Gao
  • for: This work aims to advance the standardization of artificial intelligence (AI) technology, specifically for the processing, coding, and transmission of video with neural networks.
  • methods: The work uses neural-network technology to realize end-to-end optimized neural video coding, unconstrained by the traditional hybrid coding architecture.
  • results: The work shows that the EEV model outperforms the H.266/VVC standard on a perceptual evaluation metric and achieves better compression of high-fidelity video data.
    Abstract The rapid advancement of artificial intelligence (AI) technology has led to the prioritization of standardizing the processing, coding, and transmission of video using neural networks. To address this priority area, the Moving Picture, Audio, and Data Coding by Artificial Intelligence (MPAI) group is developing a suite of standards called MPAI-EEV for "end-to-end optimized neural video coding." The aim of this AI-based video standard project is to compress the number of bits required to represent high-fidelity video data by utilizing data-trained neural coding technologies. This approach is not constrained by how data coding has traditionally been applied in the context of a hybrid framework. This paper presents an overview of recent and ongoing standardization efforts in this area and highlights the key technologies and design philosophy of EEV. It also provides a comparison and report on some primary efforts such as the coding efficiency of the reference model. Additionally, it discusses emerging activities, such as learned Unmanned-Aerial-Vehicle (UAV) video coding, which are currently planned, under development, or in the exploration phase. With a focus on UAV video signals, this paper addresses the current status of these preliminary efforts. It also indicates development timelines, summarizes the main technical details, and provides pointers to further points of reference. The exploration experiment shows that the EEV model performs better than the state-of-the-art video coding standard H.266/VVC in terms of a perceptual evaluation metric.
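    For context on what "end-to-end optimized" means in this setting, the sketch below shows the generic rate-distortion objective that learned codecs are trained against. It is not the EEV reference model; the function and parameter names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(x, x_hat, latent_likelihoods, lam=0.01):
    """Generic end-to-end coding objective L = R + lambda * D:
    R estimates the bitrate (bits per pixel) from the entropy model's
    likelihoods, D is the reconstruction distortion (MSE here)."""
    batch, _, height, width = x.shape
    num_pixels = batch * height * width
    rate_bpp = -sum(torch.log2(l).sum() for l in latent_likelihoods) / num_pixels
    distortion = F.mse_loss(x_hat, x)
    return rate_bpp + lam * distortion
```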

Oscillating-gradient spin-echo diffusion-weighted imaging (OGSE-DWI) with a limited number of oscillations: II. Asymptotics

  • paper_url: http://arxiv.org/abs/2309.07484
  • repo_url: None
  • paper_authors: Jeff Kershaw, Takayuki Obata
  • for: The purpose of this study is to investigate the oscillating-gradient spin-echo diffusion-weighted imaging (OGSE-DWI) technique in the frequency domain, as a tool for studying the microstructure of complex hydrated matter.
  • methods: The study uses the OGSE-DWI technique and obtains information about the molecular diffusion spectrum indirectly through the two quantities $U_{kk}$ and $U_{k0}$.
  • results: The study finds that in the low- and high-frequency limits the OGSE-DWI signal exhibits universal asymptotic behaviour that reflects the global organisation of the sample.
    Abstract Oscillating-gradient spin-echo diffusion-weighted magnetic resonance imaging (OGSE-DWI) has been promoted as a promising technique for studying the microstructure of complex hydrated matter in the frequency domain. The target of the OGSE-DWI technique is the spectral density of molecular diffusion, $u_{2}(\omega)$, which is predicted to obey a set of asymptotic universality relations that are linked to the global organisation of the sample. So, in principle, the complex microstructure of a medium can be classified by measuring the spectral density in its low- and high-frequency limits. However, due to practical limitations on the spectral resolution and range of the technique, it is not possible to directly sample the spectral density with OGSE-DWI. Rather, information about the spectral density can be obtained only indirectly through the quantities $U_{kk}$ & $U_{k0}$, which are filtered representations of $u_{2}(\omega)$. The purpose of this study is to investigate how the universal behaviour of $u_{2}(\omega)$ emerges in the asymptotic behaviour of the OGSE-DWI signal.
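    For orientation, $u_{2}(\omega)$ enters the measured attenuation through a filtered integral over the gradient-encoding spectrum. The expression below is a standard frequency-domain (Stepišnik-type) relation written for illustration; prefactor conventions vary between papers, and it is not the exact form derived in this work.

```latex
% Illustrative relation: the attenuation samples u_2(omega) through the encoding
% spectrum |\tilde q(\omega)|^2 set by the oscillating gradient waveform g(t).
\[
  -\ln\frac{S}{S_{0}}
    = \int_{-\infty}^{\infty} \frac{d\omega}{2\pi}\,
      \bigl|\tilde q(\omega)\bigr|^{2}\, u_{2}(\omega),
  \qquad
  \tilde q(\omega) = \int_{0}^{T} q(t)\, e^{i\omega t}\, dt,
  \qquad
  q(t) = \gamma \int_{0}^{t} g(t')\, dt'.
\]
```

    Because only a limited number of oscillations fit into the echo time $T$, the filter $|\tilde q(\omega)|^{2}$ has finite spectral resolution, which is why the paper accesses $u_{2}(\omega)$ only through the filtered quantities $U_{kk}$ and $U_{k0}$.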

CvFormer: Cross-view transFormers with Pre-training for fMRI Analysis of Human Brain

  • paper_url: http://arxiv.org/abs/2309.07940
  • repo_url: None
  • paper_authors: Xiangzhu Meng, Qiang Liu, Shu Wu, Liang Wang
  • for: The paper addresses the neglect of complementary information between region-of-interest (RoI) nodes and their connectivities in human brain functional magnetic resonance imaging (fMRI) data by proposing a novel cross-view analysis method, Cross-view transFormers (CvFormer).
  • methods: CvFormer uses RoI and connectivity encoder modules to generate two views of the human brain, processes the resulting RoI and sub-connectivity tokens with basic transformer modules, and integrates the information from both views in cross-view modules. In addition, CvFormer uses a global token per branch as the query for exchanging information in the cross-view modules, which requires only linear rather than quadratic computational and memory complexity.
  • results: Experimental results on the two public ABIDE and ADNI datasets show clear improvements from the proposed CvFormer, demonstrating its effectiveness and superiority.
    Abstract In recent years, functional magnetic resonance imaging (fMRI) has been widely utilized to diagnose neurological disease by exploiting the region of interest (RoI) nodes as well as their connectivities in the human brain. However, most existing works rely on either RoIs or connectivities alone, neglecting the potential for complementary information between them. To address this issue, we study how to discover the rich cross-view information in fMRI data of the human brain. This paper presents a novel method for cross-view analysis of fMRI data of the human brain, called Cross-view transFormers (CvFormer). CvFormer employs RoI and connectivity encoder modules to generate two separate views of the human brain, represented as RoI and sub-connectivity tokens. Basic transformer modules then process the RoI and sub-connectivity tokens, and cross-view modules integrate the complementary information across the two views. Furthermore, CvFormer uses a global token for each branch as a query to exchange information with the other branch in the cross-view modules, which requires only linear rather than quadratic computational and memory complexity. To enhance the robustness of the proposed CvFormer, we propose a two-stage strategy to train its parameters: the RoI and connectivity views are first used as self-supervised information to pre-train the CvFormer with contrastive learning, and are then fused to fine-tune the CvFormer using label information. Experimental results on the two public ABIDE and ADNI datasets show clear improvements from the proposed CvFormer, validating its effectiveness and superiority.
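    To make the linear-complexity claim concrete, here is a minimal PyTorch sketch of cross-view exchange through per-branch global tokens, written under assumptions about the architecture; module names, token layout, and shapes are hypothetical and do not reproduce the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossViewExchange(nn.Module):
    """Each branch (RoI view, connectivity view) carries one global token;
    that token alone attends to the other view's token sequence, so the cost
    grows linearly with sequence length instead of quadratically."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn_roi = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.attn_conn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, roi_tokens, conn_tokens):
        # Token 0 of each branch is treated as that branch's global token.
        roi_global = roi_tokens[:, :1]    # (B, 1, D)
        conn_global = conn_tokens[:, :1]  # (B, 1, D)

        # Each global token queries the other view's tokens (length-1 query).
        roi_global_new, _ = self.attn_roi(roi_global, conn_tokens, conn_tokens)
        conn_global_new, _ = self.attn_conn(conn_global, roi_tokens, roi_tokens)

        # Write the updated global tokens back into their branches.
        roi_tokens = torch.cat([roi_global_new, roi_tokens[:, 1:]], dim=1)
        conn_tokens = torch.cat([conn_global_new, conn_tokens[:, 1:]], dim=1)
        return roi_tokens, conn_tokens
```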

VCD: A Video Conferencing Dataset for Video Compression

  • paper_url: http://arxiv.org/abs/2309.07376
  • repo_url: None
  • paper_authors: Babak Naderi, Ross Cutler, Nabakumar Singh Khongbantabam, Yasaman Hosseinkashi
  • for: The paper is written for evaluating video codecs for real-time communication in video conferencing scenarios.
  • methods: The paper presents a new dataset called the Video Conferencing Dataset (VCD) that includes a wide variety of camera qualities and spatial and temporal information.
  • results: The paper reports the compression efficiency of several popular video codecs (H.264, H.265, H.266, and AV1) in low-delay settings on VCD and compares them with non-video conferencing datasets. The results show that the source quality and scenarios have a significant effect on the compression efficiency of all the codecs.
    Abstract Commonly used datasets for evaluating video codecs are all very high quality and not representative of video typically used in video conferencing scenarios. We present the Video Conferencing Dataset (VCD) for evaluating video codecs for real-time communication, the first such dataset focused on video conferencing. VCD includes a wide variety of camera qualities and spatial and temporal information. It includes both desktop and mobile scenarios and two types of video background processing. We report the compression efficiency of H.264, H.265, H.266, and AV1 in low-delay settings on VCD and compare it with the non-video conferencing datasets UVG, MCL-JCV, and HEVC. The results show the source quality and the scenarios have a significant effect on the compression efficiency of all the codecs. VCD enables the evaluation and tuning of codecs for this important scenario. The VCD is publicly available as an open-source dataset at https://github.com/microsoft/VCD.
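    As a pointer to how such low-delay measurements are commonly run, the sketch below encodes a clip with libx264 in a zero-latency configuration and reads back PSNR with ffmpeg's psnr filter. The file names, bitrate, and single-codec choice are placeholder assumptions and do not reproduce the paper's evaluation pipeline, which also covers H.265, H.266, and AV1.

```python
import subprocess

REFERENCE = "clip.y4m"   # hypothetical raw reference clip
BITRATE = "1000k"        # placeholder target bitrate

def encode_low_delay(reference, out_path="encoded.mp4", bitrate=BITRATE):
    # Low-delay H.264 encode: zerolatency tuning disables B-frame buffering.
    subprocess.run([
        "ffmpeg", "-y", "-i", reference,
        "-c:v", "libx264", "-tune", "zerolatency", "-b:v", bitrate,
        out_path,
    ], check=True)
    return out_path

def measure_psnr(distorted, reference):
    # ffmpeg's psnr filter prints the average PSNR to stderr.
    result = subprocess.run([
        "ffmpeg", "-i", distorted, "-i", reference,
        "-lavfi", "psnr", "-f", "null", "-",
    ], capture_output=True, text=True)
    return result.stderr

if __name__ == "__main__":
    encoded = encode_low_delay(REFERENCE)
    print(measure_psnr(encoded, REFERENCE))
```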