eess.IV - 2023-07-17

Reconstructed Convolution Module Based Look-Up Tables for Efficient Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.08544
  • repo_url: https://github.com/liuguandu/rc-lut
  • paper_authors: Guandu Liu, Yukang Ding, Mading Li, Ming Sun, Xing Wen, Bin Wang
  • for: Improving performance on the single image super-resolution (SR) task.
  • methods: A novel Reconstructed Convolution (RC) module that decouples channel-wise and spatial computation, reducing LUT size while maintaining an $n\times n$ receptive field.
  • results: Compared with the state-of-the-art LUT-based SR method, the proposed RCLUT enlarges the receptive field by 9 times, achieves superior performance on five popular benchmark datasets, and can also serve as a plug-in to improve other LUT-based SR methods.
    Abstract Look-up table (LUT)-based methods have shown great efficacy in the single image super-resolution (SR) task. However, previous methods ignore the essential reason for the restricted receptive field (RF) size in LUTs, which is caused by the interaction of space and channel features in vanilla convolution. They can only increase the RF at the cost of linearly increasing LUT size. To enlarge the RF while containing LUT size, we propose a novel Reconstructed Convolution (RC) module, which decouples channel-wise and spatial calculation. It can be formulated as $n^2$ 1D LUTs to maintain an $n\times n$ receptive field, which is obviously smaller than the $n\times n$D LUT formulated before. The LUT generated by our RC module requires less than 1/10000 of the storage of the SR-LUT baseline. The proposed Reconstructed Convolution module based LUT method, termed RCLUT, enlarges the RF size 9 times over the state-of-the-art LUT-based SR method and achieves superior performance on five popular benchmark datasets. Moreover, the efficient and robust RC module can be used as a plugin to improve other LUT-based SR methods. The code is available at https://github.com/liuguandu/RC-LUT.
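
As a rough illustration of why the decoupling matters for storage, the arithmetic below compares caching a vanilla $n\times n$ convolution against $n^2$ independent 1D mappings. The sampling level V and the one-output-per-entry assumption are illustrative, not the paper's exact accounting.

```python
# Back-of-the-envelope LUT storage arithmetic (illustrative only; the paper's
# reported "<1/10000 of SR-LUT" figure uses its own accounting, e.g. SR-LUT's
# 4D table and multiple output values per entry).
V = 17  # sampled levels per input dimension, e.g. 2**4 + 1 uniform samples
n = 3   # kernel size, i.e. an n x n receptive field

# A vanilla convolution entangles all n*n inputs, so caching it exactly needs
# one (n*n)-dimensional LUT with V**(n*n) entries.
vanilla_entries = V ** (n * n)

# The RC module decouples the computation into n*n independent 1D mappings,
# so n*n separate 1D LUTs with V entries each cover the same receptive field.
rc_entries = (n * n) * V

print(f"vanilla {n * n}-D LUT : {vanilla_entries:,} entries")
print(f"RC ({n * n} x 1D LUT) : {rc_entries:,} entries")
print(f"storage ratio         : {rc_entries / vanilla_entries:.2e}")
```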

Study of Vision Transformers for Covid-19 Detection from Chest X-rays

  • paper_url: http://arxiv.org/abs/2307.09402
  • repo_url: None
  • paper_authors: Sandeep Angara, Sharath Thirunagaru
  • for: Detecting COVID-19 from chest X-rays with vision transformers, to improve detection efficiency and accuracy.
  • methods: Several modern vision transformer models, including the Vision Transformer (ViT), Swin-transformer, Max vision transformer (MViT), and Pyramid Vision transformer (PVT), fine-tuned by transfer learning from IMAGENET weights to reach high detection accuracy.
  • results: The vision transformer models achieve state-of-the-art performance on COVID-19 detection, with accuracies of 98.75% to 99.5%, surpassing traditional methods and convolutional neural networks (CNNs) and highlighting the potential of vision transformers as a powerful tool for COVID-19 detection.
    Abstract The COVID-19 pandemic has led to a global health crisis, highlighting the need for rapid and accurate virus detection. This research paper examines transfer learning with vision transformers, which are known for their excellent performance in image recognition tasks, for COVID-19 detection. We leverage the capability of Vision Transformers to capture global context and learn complex patterns from chest X-ray images. In this work, we explored recent state-of-the-art transformer models for detecting COVID-19 from CXR images: the vision transformer (ViT), Swin-transformer, Max vision transformer (MViT), and Pyramid Vision transformer (PVT). Through the utilization of transfer learning with IMAGENET weights, the models achieved an impressive accuracy range of 98.75% to 99.5%. Our experiments demonstrate that Vision Transformers achieve state-of-the-art performance in COVID-19 detection, outperforming traditional methods and even Convolutional Neural Networks (CNNs). The results highlight the potential of Vision Transformers as a powerful tool for COVID-19 detection, with implications for improving the efficiency and accuracy of screening and diagnosis in clinical settings.
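
A minimal sketch of the transfer-learning recipe the abstract describes, assuming torchvision's ImageNet-pretrained ViT-B/16 as a stand-in; the paper's exact model variants, hyperparameters, and preprocessing are not specified here.

```python
# Fine-tuning an ImageNet-pretrained ViT for binary chest X-ray classification
# (a sketch; the paper's actual training setup is not reproduced).
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)  # IMAGENET weights
model.heads.head = nn.Linear(model.heads.head.in_features, 2)  # COVID vs. normal

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One supervised step on a batch of CXRs (3-channel, 224x224)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```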

EGE-UNet: an Efficient Group Enhanced UNet for skin lesion segmentation

  • paper_url: http://arxiv.org/abs/2307.08473
  • repo_url: https://github.com/jcruan519/ege-unet
  • paper_authors: Jiacheng Ruan, Mingye Xie, Jingsheng Gao, Ting Liu, Yuzhuo Fu
  • for: A more efficient medical image segmentation method suitable for mobile health applications.
  • methods: The Efficient Group Enhanced UNet (EGE-UNet), which combines a Group multi-axis Hadamard Product Attention (GHPA) module and a Group Aggregation Bridge (GAB) module to improve segmentation accuracy while reducing computational load.
  • results: EGE-UNet outperforms existing methods in segmentation performance while reducing parameters and computation by 494x and 160x, respectively; it is also the first model with a parameter count of only 50KB.
    Abstract Transformer and its variants have been widely used for medical image segmentation. However, the large number of parameters and the computational load of these models make them unsuitable for mobile health applications. To address this issue, we propose a more efficient approach, the Efficient Group Enhanced UNet (EGE-UNet). We incorporate a Group multi-axis Hadamard Product Attention module (GHPA) and a Group Aggregation Bridge module (GAB) in a lightweight manner. The GHPA groups input features and performs the Hadamard Product Attention mechanism (HPA) on different axes to extract pathological information from diverse perspectives. The GAB effectively fuses multi-scale information by grouping low-level features, high-level features, and a mask generated by the decoder at each stage. Comprehensive experiments on the ISIC2017 and ISIC2018 datasets demonstrate that EGE-UNet outperforms existing state-of-the-art methods. In short, compared to TransFuse, our model achieves superior segmentation performance while reducing parameter and computation costs by 494x and 160x, respectively. Moreover, to our best knowledge, this is the first model with a parameter count limited to just 50KB. Our code is available at https://github.com/JCruan519/EGE-UNet.
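
A hedged sketch of the grouped Hadamard-product idea behind GHPA: channels are split into groups, and each group is modulated element-wise (Hadamard product) by a learnable tensor along a different axis. The released EGE-UNet module (see the repo above) additionally uses depth-wise convolutions, interpolated parameter tensors, and normalization; this only illustrates the axis-wise grouping.

```python
# Axis-wise grouped Hadamard modulation (illustrative sketch, not the
# official GHPA implementation from the EGE-UNet repository).
import torch
import torch.nn as nn

class GroupedHadamardAttention(nn.Module):
    def __init__(self, channels: int, size: int):
        super().__init__()
        g = channels // 4  # four groups, one per modulation axis
        self.g = g
        # Learnable modulation tensors, broadcast over the remaining axes.
        self.p_hw = nn.Parameter(torch.ones(1, g, size, size))  # full spatial
        self.p_ch = nn.Parameter(torch.ones(1, g, 1, 1))        # channel axis
        self.p_h = nn.Parameter(torch.ones(1, g, size, 1))      # height axis
        self.p_w = nn.Parameter(torch.ones(1, g, 1, size))      # width axis

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2, x3, x4 = torch.split(x, self.g, dim=1)
        return torch.cat([x1 * self.p_hw, x2 * self.p_ch,
                          x3 * self.p_h, x4 * self.p_w], dim=1)

feats = torch.randn(2, 32, 64, 64)
print(GroupedHadamardAttention(32, 64)(feats).shape)  # torch.Size([2, 32, 64, 64])
```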

Domain Adaptation using Silver Standard Masks for Lateral Ventricle Segmentation in FLAIR MRI

  • paper_url: http://arxiv.org/abs/2307.08456
  • repo_url: None
  • paper_authors: Owen Crystal, Pejman J. Maralani, Sandra Black, Alan R. Moody, April Khademi
  • for: A transfer learning-based lateral ventricular volume (LVV) segmentation method for fluid-attenuated inversion recovery (FLAIR) MRI.
  • methods: A domain adaptation technique that optimizes performance on the target domains, using silver standard (SS) masks generated by a novel conventional image processing segmentation algorithm.
  • results: The SS+GS model (pre-trained on target-domain SS masks and fine-tuned on source-domain GS masks) achieved the best and most consistent performance (mean DSC = 0.89, CoV = 0.05), with significantly (p < 0.05) higher DSC than the source GS model on three target domains; this suggests that noisy labels generated on the target domain help the model adapt to dataset-specific characteristics and provide a robust parameter initialization.
    Abstract Lateral ventricular volume (LVV) is an important biomarker for clinical investigation. We present the first transfer learning-based LVV segmentation method for fluid-attenuated inversion recovery (FLAIR) MRI. To mitigate covariate shifts between source and target domains, this work proposes a domain adaptation method that optimizes performance on three target datasets. Silver standard (SS) masks were generated from the target domain using a novel conventional image processing ventricular segmentation algorithm and used to supplement the gold standard (GS) data from the source domain, Canadian Atherosclerosis Imaging Network (CAIN). Four models were tested on held-out test sets from four datasets: 1) SS+GS: trained on target SS masks and fine-tuned on source GS masks, 2) GS+SS: trained on source GS masks and fine-tuned on target SS masks, 3) trained on source GS masks (GS CAIN Only), and 4) trained on target SS masks (SS Only). The SS+GS model had the best and most consistent performance (mean DSC = 0.89, CoV = 0.05) and showed significantly (p < 0.05) higher DSC compared to the GS-only model on three target domains. Results suggest pre-training with noisy labels from the target domain allows the model to adapt to the dataset-specific characteristics and provides robust parameter initialization, while fine-tuning with GS masks allows the model to learn detailed features. This method has wide application to other medical imaging problems where labeled data is scarce, and can be used as a per-dataset calibration method to accelerate wide-scale adoption.
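
The SS+GS recipe reads as a two-stage protocol; the sketch below just spells out the data ordering, with `train` and the segmentation model standing in for the paper's unspecified architecture and optimizer settings.

```python
# Two-stage SS+GS protocol (hedged sketch of the abstract's description).
def ss_gs_protocol(model, target_ss_data, source_gs_data, train):
    # Stage 1: pre-train on noisy silver-standard masks generated on the
    # *target* domain by the conventional ventricle-segmentation algorithm;
    # this adapts the model to target-domain statistics and gives a robust
    # parameter initialization.
    train(model, target_ss_data)
    # Stage 2: fine-tune on gold-standard masks from the *source* domain
    # (CAIN) so the model learns detailed, expert-quality boundaries.
    train(model, source_gs_data)
    return model
```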

Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation

  • paper_url: http://arxiv.org/abs/2307.08388
  • repo_url: https://github.com/yaoleiqi/dscnet
  • paper_authors: Yaolei Qi, Yuting He, Xiaoming Qi, Yuan Zhang, Guanyu Yang
  • for: Ensuring accurate and efficient segmentation of tubular structures, such as blood vessels and roads, across various fields.
  • methods: A dynamic snake convolution that accurately captures the features of tubular structures, combined with a multi-view feature fusion strategy that complements attention to features from multiple perspectives during fusion.
  • results: Experiments on 2D and 3D datasets show that DSCNet provides better accuracy and continuity on the tubular structure segmentation task than several competing methods.
    Abstract Accurate segmentation of topological tubular structures, such as blood vessels and roads, is crucial in various fields, ensuring accuracy and efficiency in downstream tasks. However, many factors complicate the task, including thin local structures and variable global morphologies. In this work, we note the specificity of tubular structures and use this knowledge to guide our DSCNet to simultaneously enhance perception in three stages: feature extraction, feature fusion, and loss constraint. First, we propose a dynamic snake convolution to accurately capture the features of tubular structures by adaptively focusing on slender and tortuous local structures. Subsequently, we propose a multi-view feature fusion strategy to complement the attention to features from multiple perspectives during feature fusion, ensuring the retention of important information from different global morphologies. Finally, a continuity constraint loss function, based on persistent homology, is proposed to constrain the topological continuity of the segmentation better. Experiments on 2D and 3D datasets show that our DSCNet provides better accuracy and continuity on the tubular structure segmentation task compared with several methods. Our codes will be publicly available.
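
A simplified, hedged illustration of the "snake" sampling idea: each kernel tap is offset from its predecessor by a small bounded increment, so the effective footprint can bend along a tortuous structure. The official DSCNet code (linked above) is the reference implementation; this sketch shows only offset accumulation plus grid_sample-based feature sampling for a horizontal 1 x k kernel.

```python
# Snake-style sampling along one axis (illustrative, not the DSCNet release).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnakeSample1D(nn.Module):
    def __init__(self, channels: int, k: int = 9):
        super().__init__()
        self.k = k
        self.offset = nn.Conv2d(channels, k, 3, padding=1)  # one dy per tap

    def forward(self, x):
        b, c, h, w = x.shape
        dy = torch.tanh(self.offset(x))   # (b, k, h, w), bounded increments
        dy = torch.cumsum(dy, dim=1)      # chain taps: continuity of the snake
        ys = torch.linspace(-1, 1, h, device=x.device)
        xs = torch.linspace(-1, 1, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")  # base grid in [-1, 1]
        taps = []
        for i in range(self.k):
            # Horizontal tap i sits (i - k//2) pixels to the right, bent by dy.
            shift_x = gx + 2.0 * (i - self.k // 2) / max(w - 1, 1)
            shift_y = gy + 2.0 * dy[:, i] / max(h - 1, 1)
            grid = torch.stack([shift_x.expand(b, h, w), shift_y], dim=-1)
            taps.append(F.grid_sample(x, grid, align_corners=True))
        return torch.stack(taps, 0).mean(0)  # aggregate the sampled taps

print(SnakeSample1D(8)(torch.randn(1, 8, 32, 32)).shape)  # (1, 8, 32, 32)
```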

Component-wise Power Estimation of Electrical Devices Using Thermal Imaging

  • paper_url: http://arxiv.org/abs/2307.08354
  • repo_url: None
  • paper_authors: Christian Herglotz, Simon Grosche, Akarsh Bharadwaj, André Kaup
  • for: Estimating the power consumption of distinct active components on an electronic carrier board.
  • methods: Thermal imaging, without requiring a special high-emissivity coating; the required segmentation of the thermal image can be obtained by manual labeling, object detection methods, or by exploiting layout information.
  • results: Evaluations show that with low-resolution consumer infrared cameras and dissipated powers larger than 300mW, mean estimation errors of 10% can be achieved.
    Abstract This paper presents a novel method to estimate the power consumption of distinct active components on an electronic carrier board by using thermal imaging. The components and the board can be made of heterogeneous material such as plastic, coated microchips, and metal bonds or wires, where a special coating for high emissivity is not required. The thermal images are recorded when the components on the board are dissipating power. In order to enable reliable estimates, a segmentation of the thermal image must be available that can be obtained by manual labeling, object detection methods, or exploiting layout information. Evaluations show that with low-resolution consumer infrared cameras and dissipated powers larger than 300mW, mean estimation errors of 10% can be achieved.
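
One plausible way to frame the estimation step, as a hedged sketch: treat each segmented component region's mean temperature rise as approximately linear in the per-component powers, calibrate a mixing matrix once, and solve a least-squares system. The paper's actual estimation model may differ; this shows only the linear-inverse framing.

```python
# Component-wise power from a thermal image (hedged linear-model sketch).
import numpy as np

def estimate_powers(temp_rise, masks, A):
    """
    temp_rise: (H, W) temperature rise over ambient, in kelvin
    masks:     list of K boolean (H, W) component masks from segmentation
    A:         (K, K) calibration matrix; A[i, j] = mean temperature rise in
               region i per watt dissipated in component j (calibration runs)
    returns:   (K,) estimated per-component power in watts
    """
    t = np.array([temp_rise[m].mean() for m in masks])  # (K,) region means
    p, *_ = np.linalg.lstsq(A, t, rcond=None)           # solve A @ p ~= t
    return np.clip(p, 0.0, None)                        # powers are nonnegative
```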

Neural Modulation Fields for Conditional Cone Beam Neural Tomography

  • paper_url: http://arxiv.org/abs/2307.08351
  • repo_url: https://github.com/samuelepapa/cond-cbnt
  • paper_authors: Samuele Papa, David M. Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-Jakob Sonke, Efstratios Gavves
  • for: Improving the accuracy and efficiency of deep learning methods for cone beam geometry computed tomography (CBCT) reconstruction, so they deliver better results on this more complex reconstruction problem.
  • methods: A neural field (NF) approach that approximates the reconstructed density with a continuous-in-space coordinate-based neural network; the proposed Conditional Cone Beam Neural Tomography (CondCBNT) conditions a single network on per-scan local modulations so it adapts to variation across scan data.
  • results: A single CondCBNT model shows improved performance on both noise-free and noisy data, for both high and low numbers of available projections.
    Abstract Conventional Computed Tomography (CT) methods require large numbers of noise-free projections for accurate density reconstructions, limiting their applicability to the more complex class of Cone Beam Geometry CT (CBCT) reconstruction. Recently, deep learning methods have been proposed to overcome these limitations, with methods based on neural fields (NF) showing strong performance by approximating the reconstructed density through a continuous-in-space coordinate-based neural network. Our focus is on improving such methods; however, unlike previous work, which requires training an NF from scratch for each new set of projections, we instead propose to leverage anatomical consistencies over different scans by training a single conditional NF on a dataset of projections. We propose a novel conditioning method where local modulations are modeled per patient as a field over the input domain through a Neural Modulation Field (NMF). The resulting Conditional Cone Beam Neural Tomography (CondCBNT) shows improved performance for both high and low numbers of available projections on noise-free and noisy data.
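
A hedged sketch of conditioning a coordinate-based density network on a per-patient Neural Modulation Field: a small field maps spatial coordinates to scale/shift codes that modulate the shared backbone's hidden features (FiLM-style). Layer sizes and the exact modulation mechanism here are illustrative; the released CondCBNT code (repo above) is authoritative.

```python
# Shared density field modulated by a per-patient field (illustrative sketch).
import torch
import torch.nn as nn

class NeuralModulationField(nn.Module):
    """Per-patient field: 3D coordinate -> modulation vector."""
    def __init__(self, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * dim))

    def forward(self, xyz):
        return self.net(xyz)

class ConditionalDensityField(nn.Module):
    """Shared backbone whose hidden features are locally modulated."""
    def __init__(self, dim=32):
        super().__init__()
        self.inp = nn.Linear(3, dim)
        self.hidden = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, 1)  # reconstructed density at xyz

    def forward(self, xyz, modulation):
        scale, shift = modulation.chunk(2, dim=-1)
        h = torch.relu(self.inp(xyz))
        h = torch.relu(self.hidden(h) * scale + shift)  # local modulation
        return self.out(h)

nmf = NeuralModulationField()
field = ConditionalDensityField()
pts = torch.rand(1024, 3)
density = field(pts, nmf(pts))  # (1024, 1)
```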

Efficient coding of 360° videos exploiting inactive regions in projection formats

  • paper_url: http://arxiv.org/abs/2307.08344
  • repo_url: None
  • paper_authors: Christian Herglotz, Mohammadreza Jamali, Stéphane Coulombe, Carlos Vazquez, Ahmad Vakili
  • for: More efficient coding of 360° videos by ignoring pixel values in regions that are never viewed.
  • methods: Exploiting inactive regions, whose pixel content is irrelevant to the reconstruction of the equirectangular format or the viewport.
  • results: Bitrate savings of up to 10% can be achieved.
    Abstract This paper presents an efficient method for encoding common projection formats in 360° video coding, in which we exploit inactive regions. These regions are ignored in the reconstruction of the equirectangular format or the viewport in virtual reality applications. As the content of these pixels is irrelevant, we neglect the corresponding pixel values in rate-distortion optimization, residual transformation, and in-loop filtering, and achieve bitrate savings of up to 10%.
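
The core idea admits a compact sketch: pixels in inactive regions never reach the rendered viewport, so they can simply be excluded from the distortion term that drives encoder decisions. The snippet below is a hedged illustration of masked distortion inside a Lagrangian rate-distortion cost, not the paper's full encoder integration.

```python
# Rate-distortion cost that ignores inactive pixels (illustrative sketch).
import numpy as np

def masked_sse(orig, recon, active_mask):
    """Sum of squared errors counted over active pixels only."""
    diff = (orig.astype(np.int64) - recon.astype(np.int64)) ** 2
    return int(diff[active_mask].sum())

def rd_cost(orig, recon, bits, active_mask, lmbda):
    """Lagrangian cost J = D + lambda * R with inactive pixels excluded."""
    return masked_sse(orig, recon, active_mask) + lmbda * bits
```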

Power Modeling for Virtual Reality Video Playback Applications

  • paper_url: http://arxiv.org/abs/2307.08338
  • repo_url: None
  • paper_authors: Christian Herglotz, Stéphane Coulombe, Ahmad Vakili, André Kaup
  • for: Evaluating and modeling the power consumption of modern virtual reality playback and streaming applications on smartphones.
  • methods: Detailed power measurements, from which a model is constructed that estimates the true power consumption with a mean error of less than 3.5%; the model can be used to save power at critical battery levels by changing the streaming video parameters.
  • results: The results show that decreasing the input video resolution significantly reduces power consumption.
    Abstract This paper proposes a method to evaluate and model the power consumption of modern virtual reality playback and streaming applications on smartphones. Due to the high computational complexity of the virtual reality processing toolchain, the corresponding power consumption is very high, which reduces operating times of battery-powered devices. To tackle this problem, we analyze the power consumption in detail by performing power measurements. Furthermore, we construct a model to estimate the true power consumption with a mean error of less than 3.5%. The model can be used to save power at critical battery levels by changing the streaming video parameters. Particularly, the results show that the power consumption is significantly reduced by decreasing the input video resolution.
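
A hedged sketch of the kind of playback power model the abstract describes: total power as an affine function of stream parameters, with coefficients fitted by least squares to measured power. The feature set, model form, and all numbers below are placeholders, not the paper's.

```python
# Fitting a simple affine power model to measurements (illustrative sketch).
import numpy as np

# Hypothetical measurements: (pixels per frame, fps) -> measured power (W).
X = np.array([[1280 * 720, 30], [1920 * 1080, 30], [3840 * 2160, 30],
              [1920 * 1080, 60], [3840 * 2160, 60]], dtype=float)
P = np.array([2.1, 2.6, 3.9, 3.0, 4.8])  # placeholder numbers

A = np.hstack([np.ones((len(X), 1)), X])      # design matrix [1, pixels, fps]
coef, *_ = np.linalg.lstsq(A, P, rcond=None)  # fit p0 + a*pixels + b*fps

def predicted_power(pixels, fps):
    return coef[0] + coef[1] * pixels + coef[2] * fps
```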

Power-Efficient Video Streaming on Mobile Devices Using Optimal Spatial Scaling

  • paper_url: http://arxiv.org/abs/2307.08337
  • repo_url: None
  • paper_authors: Christian Herglotz, André Kaup, Stéphane Coulombe, Ahmad Vakili
  • for: Power-efficient wireless video streaming on portable devices, improving playback efficiency and energy use on mobile devices.
  • methods: A power model from the literature and subjective quality evaluation with a perceptual metric are used to derive optimal combinations of the spatial scaling factor and the rate-control parameter for encoding.
  • results: Adjusting the resolution of the input video optimizes the quality-power trade-off: for HD sequences, up to 10% of power can be saved at negligible quality losses and up to 15% at tolerable distortions; tests on Wi-Fi, a mobile network, and two different smartphones show general validity.
    Abstract This paper derives optimal spatial scaling and rate control parameters for power-efficient wireless video streaming on portable devices. A video streaming application is studied, which receives a high-resolution and high-quality video stream from a remote server and displays the content to the end-user. We show that the resolution of the input video can be adjusted such that the quality-power trade-off is optimized. Making use of a power model from the literature and subjective quality evaluation using a perceptual metric, we derive optimal combinations of the scaling factor and the rate-control parameter for encoding. For HD sequences, up to 10% of power can be saved at negligible quality losses and up to 15% of power can be saved at tolerable distortions. To show general validity, the method was tested for Wi-Fi and a mobile network as well as for two different smartphones.
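
The parameter-selection step can be sketched as a constrained search: over a grid of spatial scaling factors and rate-control parameters, pick the combination that minimizes modeled power while keeping perceptual quality above a floor. `power` and `quality` below stand in for the paper's fitted power model and perceptual metric.

```python
# Constrained grid search over (scale, rate-control) pairs (hedged sketch).
def best_streaming_params(scales, qps, power, quality, min_quality):
    """Return the (scale, qp) pair with lowest modeled power that still
    satisfies the perceptual-quality constraint, or None if infeasible."""
    feasible = [(s, q) for s in scales for q in qps
                if quality(s, q) >= min_quality]
    return min(feasible, key=lambda sq: power(*sq)) if feasible else None
```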

Combiner and HyperCombiner Networks: Rules to Combine Multimodality MR Images for Prostate Cancer Localisation

  • paper_url: http://arxiv.org/abs/2307.08279
  • repo_url: None
  • paper_authors: Wen Yan, Bernard Chiu, Ziyi Shen, Qianye Yang, Tom Syer, Zhe Min, Shonit Punwani, Mark Emberton, David Atkinson, Dean C. Barratt, Yipeng Hu
  • for: Demonstrating the feasibility of using low-dimensional parametric models to model the decision rules radiologists apply when reading multiparametric prostate MR scans, and improving the efficiency of automated radiologist labelling.
  • methods: The proposed Combiner networks model PI-RADS decision rules with a linear mixture model or a nonlinear stacking model, and a single image segmentation network is trained that can be conditioned on these rule hyperparameters during inference.
  • results: Experiments on data from 850 patients show that the proposed Combiner networks outperform other commonly-adopted end-to-end networks and provide added advantages in obtaining and interpreting the modality-combining rules; three clinical applications for prostate cancer segmentation are presented: modality availability assessment, importance quantification, and rule discovery.
    Abstract One of the distinct characteristics in radiologists' reading of multiparametric prostate MR scans, using reporting systems such as PI-RADS v2.1, is to score individual types of MR modalities, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced, and then combine these image-modality-specific scores using standardised decision rules to predict the likelihood of clinically significant cancer. This work aims to demonstrate that it is feasible for low-dimensional parametric models to model such decision rules in the proposed Combiner networks, without compromising the accuracy of predicting radiologic labels: First, it is shown that either a linear mixture model or a nonlinear stacking model is sufficient to model PI-RADS decision rules for localising prostate cancer. Second, parameters of these (generalised) linear models are proposed as hyperparameters, to weight multiple networks that independently represent individual image modalities in the Combiner network training, as opposed to an end-to-end modality ensemble. A HyperCombiner network is developed to train a single image segmentation network that can be conditioned on these hyperparameters during inference, for much improved efficiency. Experimental results based on data from 850 patients, for the application of automating radiologist labelling of multi-parametric MR, compare the proposed combiner networks with other commonly-adopted end-to-end networks. Using the added advantages of obtaining and interpreting the modality combining rules, in terms of the linear weights or odds-ratios on individual image modalities, three clinical applications are presented for prostate cancer segmentation, including modality availability assessment, importance quantification and rule discovery.
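
A hedged sketch of the linear mixture rule: per-modality probability maps (T2w, DWI, DCE) are combined with low-dimensional weights that remain interpretable as per-modality importance. The actual Combiner/HyperCombiner networks condition a single segmentation network on such weights as hyperparameters; this shows only the mixing rule itself.

```python
# Linear mixture of per-modality predictions (illustrative sketch).
import torch

def linear_combiner(p_t2w, p_dwi, p_dce, w):
    """w: (3,) nonnegative weights summing to 1; inputs: probability maps."""
    return w[0] * p_t2w + w[1] * p_dwi + w[2] * p_dce

maps = [torch.rand(1, 1, 128, 128) for _ in range(3)]   # stand-in maps
w = torch.softmax(torch.tensor([0.5, 1.0, 0.2]), dim=0)  # learned in practice
combined = linear_combiner(*maps, w)  # (1, 1, 128, 128)
```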

Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network

  • paper_url: http://arxiv.org/abs/2307.08268
  • repo_url: None
  • paper_authors: Ke Yan, Xiaoli Yin, Yingda Xia, Fakai Wang, Shu Wang, Yuan Gao, Jiawen Yao, Chunli Li, Xiaoyu Bai, Jingren Zhou, Ling Zhang, Le Lu, Yu Shi
  • for: Liver tumor segmentation and classification in CT.
  • methods: A mask transformer that jointly segments and classifies each lesion, together with an image-wise classifier that integrates global information.
  • results: On the non-contrast CT screening task, PLAN achieves 95% patient-level sensitivity and 96% specificity; on contrast-enhanced CT, lesion-level detection precision, recall, and classification accuracy reach 92%, 89%, and 86%, outperforming widely used CNNs and transformers for lesion segmentation; a reader study on 250 cases shows PLAN is on par with a senior human radiologist, indicating the clinical significance of the results.
    Abstract Liver tumor segmentation and classification are important tasks in computer aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and classify each lesion with improved anchor queries and a foreground-enhanced sampling loss. It also has an image-wise classifier to effectively aggregate global information and predict patient-level diagnosis. A large-scale multi-phase dataset is collected containing 939 tumor patients and 810 normal subjects. 4010 tumor instances of eight types are extensively annotated. On the non-contrast tumor screening task, PLAN achieves 95% and 96% in patient-level sensitivity and specificity. On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, outperforming widely used CNN and transformers for lesion segmentation. We also conduct a reader study on a holdout set of 250 cases. PLAN is on par with a senior human radiologist, showing the clinical significance of our results.

Extreme Image Compression using Fine-tuned VQGAN Models

  • paper_url: http://arxiv.org/abs/2307.08265
  • repo_url: None
  • paper_authors: Qi Mao, Tinghan Yang, Yinuo Zhang, Shuyin Pan, Meng Wang, Shiqi Wang, Siwei Ma
  • for: Improving the perceptual quality of compressed images, especially at low bitrates.
  • methods: Introducing generative models based on vector quantization (VQGAN), which represent images as vector-quantized index codes.
  • results: At extremely low bitrates (<0.1 bpp), the method improves the perceptual quality of compressed images and outperforms existing codecs.
    Abstract Recent advances in generative compression methods have demonstrated remarkable progress in enhancing the perceptual quality of compressed data, especially in scenarios with low bitrates. Nevertheless, their efficacy and applicability in achieving extreme compression ratios ($<0.1$ bpp) still remain constrained. In this work, we propose a simple yet effective coding framework by introducing vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model yields strong expressive capacity, facilitating efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as VQ-indices by finding the nearest codeword, which can be encoded using lossless compression methods into bitstreams. We then propose clustering a pre-trained large-scale codebook into smaller codebooks using the K-means algorithm. This enables images to be represented as diverse ranges of VQ-indices maps, resulting in variable bitrates and different levels of reconstruction quality. Extensive qualitative and quantitative experiments on various datasets demonstrate that the proposed framework outperforms the state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception under extremely low bitrates.
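
A hedged sketch of the codebook-clustering step described above: compress a large pre-trained VQGAN codebook with K-means, then remap each image's VQ-indices to the clustered codebook, trading reconstruction fidelity for shorter index codes. Entropy coding of the index map, and the fine-tuning of VQGAN itself, are omitted; all sizes are illustrative.

```python
# K-means clustering of a VQ codebook and index remapping (hedged sketch).
import numpy as np
from sklearn.cluster import KMeans

def cluster_codebook(codebook: np.ndarray, k: int):
    """codebook: (N, d) pre-trained codewords -> (k, d) clustered codebook."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(codebook)
    return km.cluster_centers_, km  # new codebook + index remapper

def remap_indices(indices: np.ndarray, codebook: np.ndarray, km: KMeans):
    """Map original VQ-indices to indices in the clustered codebook."""
    return km.predict(codebook[indices.ravel()]).reshape(indices.shape)

big = np.random.randn(16384, 256)            # stand-in for a VQGAN codebook
small, km = cluster_codebook(big, k=256)     # 14 bits/index -> 8 bits/index
idx = np.random.randint(0, 16384, (32, 32))  # a latent index map
print(remap_indices(idx, big, km).max() < 256)  # True
```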

Adaptively Placed Multi-Grid Scene Representation Networks for Large-Scale Data Visualization

  • paper_url: http://arxiv.org/abs/2308.02494
  • repo_url: https://github.com/skywolf829/apmgsrn
  • paper_authors: Skylar Wolfgang Wurster, Tianyu Xiong, Han-Wei Shen, Hanqi Guo, Tom Peterka
  • for: Improving scene representation networks (SRNs) for compression and visualization of scientific data.
  • methods: An adaptively placed multi-grid SRN (APMGSRN) together with a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems.
  • results: APMGSRN improves the reconstruction accuracy of SRNs without the expensive octree refining, pruning, and traversal of previous adaptive models; an open-source neural volume rendering application is also released that allows plug-and-play rendering with any PyTorch-based SRN.
    Abstract Scene representation networks (SRNs) have been recently proposed for compression and visualization of scientific data. However, state-of-the-art SRNs do not adapt the allocation of available network parameters to the complex features found in scientific data, leading to a loss in reconstruction quality. We address this shortcoming with an adaptively placed multi-grid SRN (APMGSRN) and propose a domain decomposition training and inference technique for accelerated parallel training on multi-GPU systems. We also release an open-source neural volume rendering application that allows plug-and-play rendering with any PyTorch-based SRN. Our proposed APMGSRN architecture uses multiple spatially adaptive feature grids that learn where to be placed within the domain to dynamically allocate more neural network resources where error is high in the volume, improving state-of-the-art reconstruction accuracy of SRNs for scientific data without requiring expensive octree refining, pruning, and traversal like previous adaptive models. In our domain decomposition approach for representing large-scale data, we train a set of APMGSRNs in parallel on separate bricks of the volume to reduce training time while avoiding overhead necessary for an out-of-core solution for volumes too large to fit in GPU memory. After training, the lightweight SRNs are used for real-time neural volume rendering in our open-source renderer, where arbitrary view angles and transfer functions can be explored. A copy of this paper, all code, all models used in our experiments, and all supplemental materials and videos are available at https://github.com/skywolf829/APMGSRN.
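
The domain-decomposition idea reduces to routing: partition the volume into bricks, train one lightweight SRN per brick (one per GPU), and send each query point to its brick's network at render time. A hedged sketch follows, with `models` standing in for the per-brick APMGSRNs; the adaptive feature grids themselves are not reproduced.

```python
# Brick routing for a domain-decomposed set of SRNs (illustrative sketch).
import torch

def brick_of(xyz, bricks_per_axis):
    """Map points in [0, 1]^3 to a flat brick id."""
    ijk = (xyz.clamp(0, 1 - 1e-6) * bricks_per_axis).long()
    n = bricks_per_axis
    return ijk[:, 0] * n * n + ijk[:, 1] * n + ijk[:, 2]

def query(xyz, models, bricks_per_axis):
    """Evaluate each per-brick SRN on its own points and reassemble."""
    out = torch.empty(xyz.shape[0], 1)
    bid = brick_of(xyz, bricks_per_axis)
    for b, model in enumerate(models):
        sel = bid == b
        if sel.any():
            out[sel] = model(xyz[sel])
    return out
```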

GastroVision: A Multi-class Endoscopy Image Dataset for Computer Aided Gastrointestinal Disease Detection

  • paper_url: http://arxiv.org/abs/2307.08140
  • repo_url: https://github.com/debeshjha/gastrovision
  • paper_authors: Debesh Jha, Vanshali Sharma, Neethi Dasu, Nikhil Kumar Tomar, Steven Hicks, M. K. Bhuyan, Pradip K. Das, Michael A. Riegler, Pål Halvorsen, Ulas Bagci, Thomas de Lange
  • for: Addressing the challenges of integrating real-time artificial intelligence (AI) systems into clinical practice, including scalability and acceptance.
  • methods: GastroVision, a multi-center open-access gastrointestinal (GI) endoscopy dataset covering different anatomical landmarks, pathological abnormalities, polyp removal cases, and normal findings (27 classes in total) from the GI tract; it comprises 8,000 images from Bærum Hospital in Norway and Karolinska University Hospital in Sweden, annotated and verified by experienced GI endoscopists.
  • results: Extensive benchmarking with popular deep learning baseline models validates the significance of the dataset, which can facilitate the development of AI-based algorithms for GI disease detection and classification; the dataset is available at https://osf.io/84e7f/.
    Abstract Integrating real-time artificial intelligence (AI) systems in clinical practices faces challenges such as scalability and acceptance. These challenges include data availability, biased outcomes, data quality, lack of transparency, and underperformance on unseen datasets from different distributions. The scarcity of large-scale, precisely labeled, and diverse datasets is the major challenge for clinical integration. This scarcity is also due to the legal restrictions and extensive manual efforts required for accurate annotations from clinicians. To address these challenges, we present GastroVision, a multi-center open-access gastrointestinal (GI) endoscopy dataset that includes different anatomical landmarks, pathological abnormalities, polyp removal cases and normal findings (a total of 27 classes) from the GI tract. The dataset comprises 8,000 images acquired from Bærum Hospital in Norway and Karolinska University Hospital in Sweden and was annotated and verified by experienced GI endoscopists. Furthermore, we validate the significance of our dataset with extensive benchmarking based on the popular deep learning based baseline models. We believe our dataset can facilitate the development of AI-based algorithms for GI disease detection and classification. Our dataset is available at https://osf.io/84e7f/.
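
For reference, a minimal loading sketch assuming the common one-folder-per-class layout; the actual GastroVision download structure should be verified against the OSF link above, and the local path below is hypothetical.

```python
# Loading a 27-class endoscopy image dataset (hedged sketch, hypothetical path).
import torchvision.transforms as T
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

tfm = T.Compose([T.Resize((224, 224)), T.ToTensor()])
ds = ImageFolder("GastroVision/", transform=tfm)  # hypothetical local path
loader = DataLoader(ds, batch_size=32, shuffle=True, num_workers=4)
print(len(ds.classes))  # expected: 27
```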

Neural Orientation Distribution Fields for Estimation and Uncertainty Quantification in Diffusion MRI

  • paper_url: http://arxiv.org/abs/2307.08138
  • repo_url: None
  • paper_authors: William Consagra, Lipeng Ning, Yogesh Rathi
  • for: A novel deep learning method for accurate estimation of the orientation distribution function (ODF) from diffusion MRI (dMRI) signals.
  • methods: A neural field (NF) parameterizes a random series representation of the latent ODF field, modeling the spatial correlation structure in the data to improve accuracy and efficiency in sparse and noisy regimes.
  • results: On both synthetic and real in-vivo diffusion data, the method outperforms existing approaches in accuracy and uncertainty quantification.
    Abstract Inferring brain connectivity and structure in-vivo requires accurate estimation of the orientation distribution function (ODF), which encodes key local tissue properties. However, estimating the ODF from diffusion MRI (dMRI) signals is a challenging inverse problem due to obstacles such as significant noise, high-dimensional parameter spaces, and sparse angular measurements. In this paper, we address these challenges by proposing a novel deep-learning based methodology for continuous estimation and uncertainty quantification of the spatially varying ODF field. We use a neural field (NF) to parameterize a random series representation of the latent ODFs, implicitly modeling the often ignored but valuable spatial correlation structures in the data, and thereby improving efficiency in sparse and noisy regimes. An analytic approximation to the posterior predictive distribution is derived which can be used to quantify the uncertainty in the ODF estimate at any spatial location, avoiding the need for expensive resampling-based approaches that are typically employed for this purpose. We present empirical evaluations on both synthetic and real in-vivo diffusion data, demonstrating the advantages of our method over existing approaches.
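
In generic notation (a sketch, not the paper's exact construction): the modeling idea can be pictured as writing the ODF at spatial location $v$ as a truncated basis expansion $f(v,\theta) \approx \sum_{k=1}^{K} c_k(v)\,\phi_k(\theta)$ over fixed spherical basis functions $\phi_k$ (e.g., real spherical harmonics), with a neural field mapping $v \mapsto (c_1(v),\dots,c_K(v))$. Because nearby locations share network weights, spatial correlation among ODFs is inherited from the smoothness of the field, and treating the coefficients as random yields the posterior predictive distribution used for uncertainty quantification.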

Untrained neural network embedded Fourier phase retrieval from few measurements

  • paper_url: http://arxiv.org/abs/2307.08717
  • repo_url: https://github.com/liyuan-2000/trad
  • paper_authors: Liyuan Ma, Hongxia Wang, Ningyi Leng, Ziyang Yuan
  • for: Solving the Fourier phase retrieval (FPR) problem from few measurements, to reduce time and hardware costs.
  • methods: An untrained neural network (NN) embedded algorithm built on the alternating direction method of multipliers (ADMM) framework.
  • results: Experimental results show that the algorithm outperforms existing untrained NN-based algorithms with fewer computational resources, and even performs competitively against trained NN-based algorithms.
    Abstract Fourier phase retrieval (FPR) is a challenging task widely used in various applications. It involves recovering an unknown signal from its Fourier phaseless measurements. FPR with few measurements is important for reducing time and hardware costs, but it suffers from serious ill-posedness. Recently, untrained neural networks have offered new approaches by introducing learned priors to alleviate the ill-posedness without requiring any external data. However, they may not be ideal for reconstructing fine details in images and can be computationally expensive. This paper proposes an untrained neural network (NN) embedded algorithm based on the alternating direction method of multipliers (ADMM) framework to solve FPR with few measurements. Specifically, we use a generative network to represent the image to be recovered, which confines the image to the space defined by the network structure. To improve the ability to represent high-frequency information, total variation (TV) regularization is imposed to facilitate the recovery of local structures in the image. Furthermore, to reduce the computational cost mainly caused by the parameter updates of the untrained NN, we develop an accelerated algorithm that adaptively trades off between explicit and implicit regularization. Experimental results indicate that the proposed algorithm outperforms existing untrained NN-based algorithms with fewer computational resources and even performs competitively against trained NN-based algorithms.
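
A hedged sketch of ADMM-embedded phase retrieval with an untrained generator as the prior (deep-image-prior flavor): the splitting variable is projected onto the measured Fourier magnitudes, the generator is refit to the current estimate, and a scaled dual variable ties the two together. TV regularization and the paper's acceleration scheme are omitted; see the linked repo for the actual method.

```python
# ADMM-style Fourier phase retrieval with an untrained generator (sketch).
import torch

def admm_fpr(G, theta, b_mag, steps=200, inner=20, lr=1e-3):
    """G: untrained generator returning a real image; theta: its parameters
    (iterable of tensors); b_mag: measured Fourier magnitudes |F x|."""
    opt = torch.optim.Adam(theta, lr=lr)
    x = G().detach()
    u = torch.fft.fft2(x)               # splitting variable u ~ F x
    lam = torch.zeros_like(u)           # scaled dual variable
    for _ in range(steps):
        # u-update: enforce measured magnitudes, keep current phases.
        v = torch.fft.fft2(x) + lam
        u = b_mag * torch.exp(1j * torch.angle(v))
        # x-update: refit the generator so that F G(theta) ~ u - lam.
        target = torch.fft.ifft2(u - lam).real.detach()
        for _ in range(inner):
            opt.zero_grad()
            loss = ((G() - target) ** 2).mean()
            loss.backward()
            opt.step()
        x = G().detach()
        # Dual update closes the ADMM loop.
        lam = lam + torch.fft.fft2(x) - u
    return x
```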