for: The paper focuses on developing a novel method for reconstructing high-quality CT images from sparse-view measurements.
methods: The proposed method uses Implicit Neural Representations (INRs) to establish a coordinate-based mapping between sinograms and CT images. A self-supervised method called Anti-Aliasing Projection Representation Field (APRF) is proposed to improve the quality of the reconstructed images.
results: The proposed method is more efficient than previous approaches and outperforms state-of-the-art methods in image quality, yielding more accurate details with fewer artifacts and less noise. The code will be publicly available soon.
Abstract
Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging that aims to acquire high-quality CT images based on sparsely-sampled measurements. Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images. However, these methods have not considered the correlation between adjacent projection views, resulting in aliasing artifacts on SV sinograms. To address this issue, we propose a self-supervised SVCT reconstruction method -- Anti-Aliasing Projection Representation Field (APRF), which can build the continuous representation between adjacent projection views via spatial constraints. Specifically, APRF only needs SV sinograms for training; it first employs a line-segment sampling module to estimate the distribution of projection views in a local region, and then synthesizes the corresponding sinogram values using a center-based line integral module. After training APRF on a single SV sinogram itself, it can synthesize the corresponding dense-view (DV) sinogram with consistent continuity. High-quality CT images can be obtained by applying re-projection techniques on the predicted DV sinograms. Extensive experiments on CT images demonstrate that APRF outperforms state-of-the-art methods, yielding more accurate details and fewer artifacts. Our code will be publicly available soon.
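To make the coordinate-based mapping concrete, below is a minimal sketch of how an INR can be fit to a sparse-view sinogram and then queried at dense view angles. It reduces the paper's line-segment sampling and center-based line integral modules to plain point sampling, so it illustrates the INR idea rather than APRF itself; the names `SinogramINR` and `fit_inr` are illustrative.

```python
import torch
import torch.nn as nn

class SinogramINR(nn.Module):
    """Illustrative INR: maps a (view angle, detector position) coordinate,
    both normalized to [-1, 1], to a sinogram value. Fourier features let
    the MLP represent high-frequency structure."""
    def __init__(self, n_freqs: int = 8, hidden: int = 256):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs) * torch.pi)
        in_dim = 2 * 2 * n_freqs  # 2 coords x (sin, cos) x n_freqs
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coords):  # coords: (N, 2)
        f = coords[..., None] * self.freqs            # (N, 2, n_freqs)
        enc = torch.cat([f.sin(), f.cos()], dim=-1)   # (N, 2, 2*n_freqs)
        return self.mlp(enc.flatten(1)).squeeze(-1)

def fit_inr(sv_sinogram, sv_angles, steps=2000, lr=1e-3):
    """Self-supervised fit on the sparse-view sinogram alone; `sv_angles`
    holds the normalized view angles of the sparse measurements."""
    n_views, n_dets = sv_sinogram.shape
    dets = torch.linspace(-1.0, 1.0, n_dets)
    grid = torch.stack(torch.meshgrid(sv_angles, dets, indexing="ij"), dim=-1)
    coords, targets = grid.reshape(-1, 2), sv_sinogram.reshape(-1)
    model = SinogramINR()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(coords), targets).backward()
        opt.step()
    return model
```

After fitting, evaluating the model on a dense angle grid yields a DV sinogram, to which a standard re-projection method such as filtered back-projection can be applied.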
Does pre-training on brain-related tasks result in better deep-learning-based brain age biomarkers?
paper_authors: Bruno Machado Pacheco, Victor Hugo Rocha de Oliveira, Augusto Braga Fernandes Antunes, Saulo Domingos de Souza Pedro, Danilo Silva
for: Predicting the brain age of healthy individuals as an indicator of overall brain health and successful aging, and as a disease biomarker.
methods: Deep learning models are trained to predict brain age, with a pre-training step on brain-related tasks rather than on natural image classification.
results: Models pre-trained on brain-related tasks achieve state-of-the-art results on the ADNI dataset, and the resulting brain age biomarkers are validated on images of patients with mild cognitive impairment and Alzheimer's disease.
Abstract
Brain age prediction using neuroimaging data has shown great potential as an indicator of overall brain health and successful aging, as well as a disease biomarker. Deep learning models have been established as reliable and efficient brain age estimators, being trained to predict the chronological age of healthy subjects. In this paper, we investigate the impact of a pre-training step on deep learning models for brain age prediction. More precisely, instead of the common approach of pre-training on natural imaging classification, we propose pre-training the models on brain-related tasks, which led to state-of-the-art results in our experiments on ADNI data. Furthermore, we validate the resulting brain age biomarker on images of patients with mild cognitive impairment and Alzheimer's disease. Interestingly, our results indicate that better-performing deep learning models in terms of brain age prediction on healthy patients do not result in more reliable biomarkers.
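As a sketch of the transfer setup described above, the snippet below loads an encoder from a brain-related pre-training task and fine-tunes it for age regression. It is a 2D stand-in (brain age models typically operate on 3D volumes), and the checkpoint path and function names are hypothetical.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

def build_brain_age_model(pretrained_ckpt=None):
    """Illustrative transfer setup: start from an encoder trained on a
    brain-related task, then attach an age-regression head."""
    backbone = resnet18(weights=None)
    if pretrained_ckpt is not None:
        # Weights from a brain-related pre-training task (hypothetical file).
        state = torch.load(pretrained_ckpt, map_location="cpu")
        backbone.load_state_dict(state, strict=False)  # head may differ
    backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # age regression head
    return backbone

def fine_tune(model, loader, epochs=10, lr=1e-4):
    """Fine-tune on (image, chronological age) pairs of healthy subjects."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.L1Loss()  # MAE is the usual brain-age metric
    for _ in range(epochs):
        for images, ages in loader:
            opt.zero_grad()
            pred = model(images).squeeze(-1)
            loss_fn(pred, ages.float()).backward()
            opt.step()
    return model
```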
Encoder Complexity Control in SVT-AV1 by Speed-Adaptive Preset Switching
results: Encoding meets a user-defined time constraint across the complete preset range without introducing any additional latency.
Abstract
Current developments in video encoding technology lead to continuously improving compression performance, but at the expense of increasingly high computational demands. Given the growth of online video traffic in recent years and the concomitant need for video encoding, encoder complexity control mechanisms are required to restrict the processing time to a sufficient extent in order to find a reasonable trade-off between performance and complexity. We present a complexity control mechanism in SVT-AV1 that uses speed-adaptive preset switching to comply with the remaining time budget. This method enables encoding with a user-defined time constraint within the complete preset range with an average precision of 8.9% without introducing any additional latencies.
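A minimal sketch of the underlying control idea follows: after each encoded chunk, compare the elapsed time against the proportional share of the time budget and switch to a faster or slower preset accordingly. The chunking granularity, the 5% hysteresis band, and `encode_chunk` are assumptions, not SVT-AV1's actual implementation.

```python
import time

def encode_with_time_budget(chunks, budget_s, encode_chunk,
                            preset=8, p_min=0, p_max=13):
    """Illustrative complexity control loop: pick a faster preset when
    behind the remaining time budget, a slower (better-compressing) one
    when ahead. In SVT-AV1 a higher preset number means a faster encode.
    `encode_chunk(chunk, preset)` is a stand-in for the encoder call."""
    start = time.monotonic()
    for i, chunk in enumerate(chunks):
        encode_chunk(chunk, preset)
        elapsed = time.monotonic() - start
        done = (i + 1) / len(chunks)
        target = budget_s * done  # ideal elapsed time at this progress
        if elapsed > 1.05 * target:      # behind schedule -> speed up
            preset = min(p_max, preset + 1)
        elif elapsed < 0.95 * target:    # ahead of schedule -> slow down
            preset = max(p_min, preset - 1)
    return preset
```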
HistoColAi: An Open-Source Web Platform for Collaborative Digital Histology Image Annotation with AI-Driven Predictive Integration
results: The paper includes a use case on the diagnosis of spindle cell skin neoplasms with multiple annotators, and a usability study validating the feasibility of the developed tool.
Abstract
Digital pathology has become a standard in the pathology workflow due to its many benefits. These include the level of detail of the whole slide images generated and the potential for immediate sharing of cases between hospitals. Recent advances in deep learning-based methods for image analysis make them a potential aid in digital pathology. However, a major limitation in developing computer-aided diagnostic systems for pathology is the lack of an intuitive and open web application for data annotation. This paper proposes a web service that efficiently provides a tool to visualize and annotate digitized histological images. In addition, to show and validate the tool, we include a use case centered on the diagnosis of spindle cell skin neoplasms for multiple annotators. A usability study of the tool is also presented, showing the feasibility of the developed tool.
Super-resolution imaging through a multimode fiber: the physical upsampling of speckle-driven
methods: The paper uses a deep-learning-based super-resolution imaging method that physically realizes the upsampling of low-resolution images to enhance the perceptual capabilities of the models.
results: Experiments show that the method effectively improves the accuracy and quality of endoscopic imaging and compensates for the information missing in purely data-driven approaches.
Abstract
Following recent advancements in multimode fiber (MMF), miniaturization of imaging endoscopes has proven crucial for minimally invasive surgery in vivo. Recent progress enabled by super-resolution imaging methods with a data-driven deep learning (DL) framework has balanced the relationship between the core size and resolution. However, most DL approaches pay little attention to the physical properties of the speckle, which are crucial for reconciling the relationship between the magnification of super-resolution imaging and the reconstruction quality. In this paper, we find that the interferometric process of speckle formation is an essential basis for creating DL models for super-resolution imaging. It physically realizes the upsampling of low-resolution (LR) images and enhances the perceptual capabilities of the models. The finding experimentally validates the role played by speckle-driven physical upsampling, effectively complementing the information missing from purely data-driven approaches. Experimentally, we overcome the poor reconstruction quality at high magnification by feeding the model speckle patterns of the same size as the high-resolution (HR) image. The guidance of our research for endoscopic imaging may accelerate the further development of minimally invasive surgery.
Offline and Online Optical Flow Enhancement for Deep Video Compression
results: Experimental results show that the method achieves an average bitrate saving of 12.8% on the tested videos without increasing the model or computational complexity of the decoder.
Abstract
Video compression relies heavily on exploiting the temporal redundancy between video frames, which is usually achieved by estimating and using the motion information. The motion information is represented as optical flows in most of the existing deep video compression networks. Indeed, these networks often adopt pre-trained optical flow estimation networks for motion estimation. The optical flows, however, may be less suitable for video compression due to the following two factors. First, the optical flow estimation networks were trained to perform inter-frame prediction as accurately as possible, but the optical flows themselves may cost too many bits to encode. Second, the optical flow estimation networks were trained on synthetic data, and may not generalize well enough to real-world videos. We address the twofold limitations by enhancing the optical flows in two stages: offline and online. In the offline stage, we fine-tune a trained optical flow estimation network with the motion information provided by a traditional (non-deep) video compression scheme, e.g. H.266/VVC, as we believe the motion information of H.266/VVC achieves a better rate-distortion trade-off. In the online stage, we further optimize the latent features of the optical flows with a gradient descent-based algorithm for the video to be compressed, so as to enhance the adaptivity of the optical flows. We conduct experiments on a state-of-the-art deep video compression scheme, DCVC. Experimental results demonstrate that the proposed offline and online enhancement together achieves on average 12.8% bitrate saving on the tested videos, without increasing the model or computational complexity of the decoder side.
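The online stage can be pictured with the sketch below: the flow latents are treated as free variables and optimized by gradient descent against a rate-distortion objective for the specific frames being compressed. `decode_flow`, `warp`, and `rate_estimate` stand in for the codec's differentiable modules and are assumptions, as is the loss weighting.

```python
import torch

def refine_flow_latents(latent, decode_flow, warp, rate_estimate,
                        ref_frame, cur_frame, lam=0.01, steps=50, lr=1e-3):
    """Illustrative online refinement: optimize the optical-flow latents
    for the video being compressed, trading prediction distortion against
    the estimated bit cost of the latents."""
    latent = latent.clone().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        flow = decode_flow(latent)
        pred = warp(ref_frame, flow)                      # motion compensation
        distortion = torch.mean((pred - cur_frame) ** 2)  # MSE distortion
        rate = rate_estimate(latent)                      # estimated bits
        loss = distortion + lam * rate                    # R-D objective
        loss.backward()
        opt.step()
    return latent.detach()
```

Because only the latents of the video at hand are optimized, the decoder is unchanged, which matches the claim that no decoder-side complexity is added.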
SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation
results: Quantitative experiments on various datasets show that SAR-NeRF achieves good multi-view representation and generalization capabilities, and that SAR-NeRF-augmented data can improve SAR target classification performance under a few-shot learning setup.
Abstract
SAR images are highly sensitive to observation configurations, and they exhibit significant variations across different viewing angles, making it challenging to represent and learn their anisotropic features. As a result, deep learning methods often generalize poorly across different view angles. Inspired by the concept of neural radiance fields (NeRF), this study combines SAR imaging mechanisms with neural networks to propose a novel NeRF model for SAR image generation. Following the mapping and projection principles, a set of SAR images is modeled implicitly as a function of attenuation coefficients and scattering intensities in the 3D imaging space through a differentiable rendering equation. SAR-NeRF is then constructed to learn the distribution of attenuation coefficients and scattering intensities of voxels, where the vectorized form of the 3D voxel SAR rendering equation and the sampling relationship between the 3D space voxels and the 2D view ray grids are analytically derived. Through quantitative experiments on various datasets, we thoroughly assess the multi-view representation and generalization capabilities of SAR-NeRF. Additionally, it is found that a SAR-NeRF-augmented dataset can significantly improve SAR target classification performance under a few-shot learning setup, where a 10-type classification accuracy of 91.6% can be achieved by using only 12 images per class.
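As a rough illustration of the rendering idea, the sketch below accumulates per-voxel scattering along a ray, attenuated by the cumulative attenuation of the voxels in front of it, in the style of NeRF volume-rendering quadrature. The exact vectorized rendering equation derived in the paper is not reproduced here; this functional form is an assumption.

```python
import torch

def render_sar_ray(attenuation, scattering, step):
    """Illustrative ray accumulation in the spirit of a NeRF-style SAR
    rendering equation: each voxel contributes its scattering intensity,
    weighted by the transmittance through the voxels in front of it.
    `attenuation` and `scattering` are per-voxel samples along one ray."""
    tau = torch.cumsum(attenuation * step, dim=-1)
    trans = torch.exp(-(tau - attenuation * step))        # transmittance T_i
    weights = trans * (1.0 - torch.exp(-attenuation * step))
    return (weights * scattering).sum(dim=-1)

# Example: 64 samples along a ray.
atten = torch.rand(64) * 0.5
scat = torch.rand(64)
intensity = render_sar_ray(atten, scat, step=0.1)
```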
Towards Anytime Optical Flow Estimation with Event Cameras
paper_authors: Yaozu Ye, Hao Shi, Kailun Yang, Ze Wang, Xiaoting Yin, Yaonan Wang, Kaiwei Wang
for: The paper focuses on developing a high-frame-rate, low-latency event representation for optical flow estimation using event cameras.
methods: The proposed method, EVA-Flow, uses a unified voxel grid to represent events and a stacked Spatiotemporal Motion Refinement (SMR) module to predict temporally-dense optical flow. It also uses a Rectified Flow Warp Loss (RFWL) for unsupervised evaluation of the intermediate optical flow.
results: The proposed method achieves competitive performance, with super-low latency (5 ms), the fastest inference (9.2 ms), time-dense motion estimation (200 Hz), and strong generalization.
Abstract
Event cameras are capable of responding to log-brightness changes in microseconds. Their characteristic of producing responses only in changing regions makes them particularly suitable for optical flow estimation. In contrast to the super-low-latency response of event cameras, existing datasets collected via event cameras only provide optical flow ground truth at limited frame rates (e.g., 10 Hz), greatly restricting the potential of event-driven optical flow. To address this challenge, we put forward a high-frame-rate, low-latency event representation, Unified Voxel Grid, sequentially fed into the network bin by bin. We then propose EVA-Flow, an EVent-based Anytime Flow estimation network that produces high-frame-rate event optical flow with only low-frame-rate optical flow ground truth for supervision. The key component of our EVA-Flow is the stacked Spatiotemporal Motion Refinement (SMR) module, which predicts temporally-dense optical flow and enhances the accuracy via spatial-temporal motion refinement. The time-dense feature warping utilized in the SMR module provides implicit supervision for the intermediate optical flow. Additionally, we introduce the Rectified Flow Warp Loss (RFWL) for the unsupervised evaluation of intermediate optical flow in the absence of ground truth. This is, to the best of our knowledge, the first work focusing on anytime optical flow estimation via event cameras. A comprehensive variety of experiments on MVSEC, DSEC, and our EVA-FlowSet demonstrates that EVA-Flow achieves competitive performance, super-low latency (5 ms), the fastest inference (9.2 ms), time-dense motion estimation (200 Hz), and strong generalization. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow.
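For intuition, here is a minimal sketch of turning an event stream into a voxel-grid tensor by splatting each event's polarity bilinearly onto its two nearest temporal bins; feeding such grids into the network "bin by bin" matches the description above. The paper's Unified Voxel Grid may differ in detail, and the layout here is an assumption.

```python
import torch

def events_to_voxel_grid(events, n_bins, height, width):
    """Illustrative event voxel grid: `events` holds (t, x, y, polarity)
    rows. Each event's polarity is splatted bilinearly onto the two
    nearest temporal bins of a (n_bins, H, W) tensor."""
    t, x, y, p = events[:, 0], events[:, 1].long(), events[:, 2].long(), events[:, 3]
    grid = torch.zeros(n_bins, height, width)
    # Normalize timestamps to [0, n_bins - 1].
    tn = (t - t.min()) / (t.max() - t.min() + 1e-9) * (n_bins - 1)
    t0 = tn.floor().long()
    w1 = tn - t0.float()  # weight for the upper temporal bin
    for b, w in ((t0, 1.0 - w1), ((t0 + 1).clamp(max=n_bins - 1), w1)):
        grid.index_put_((b, y, x), p * w, accumulate=True)
    return grid
```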
Count-Free Single-Photon 3D Imaging with Race Logic
for: The paper develops an online approach for distance estimation in single-photon cameras (SPCs) without explicitly storing photon counts.
methods: The paper uses race logic to process photon streams in the time-delay domain and constructs count-free equi-depth histograms using a binner element to represent the distribution of photons.
results: The paper shows that the proposed method provides an order-of-magnitude reduction in bandwidth and power consumption while maintaining distance reconstruction accuracy similar to conventional processing methods.
Abstract
Single-photon cameras (SPCs) have emerged as a promising technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires a large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth and power hungry. Here we present an online approach for distance estimation without explicitly storing photon counts. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms. Equi-depth histograms are a succinct representation for "peaky" distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, on another quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can provide an order of magnitude reduction in bandwidth and power consumption while maintaining similar distance reconstruction accuracy as conventional processing methods.
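A software analogue of the binner element is sketched below: a streaming quantile estimator that keeps O(1) state and nudges its estimate up or down per sample, converging on the median (or another quantile). The actual binner operates on race-logic signals in the time-delay domain, so this only illustrates the converging-estimate principle; cascading several such estimators on sub-streams would yield the equi-depth bin boundaries.

```python
import random

def frugal_quantile(samples, q=0.5, step=1.0, estimate=0.0):
    """O(1)-state streaming quantile estimate ('frugal' scheme): move the
    estimate up with probability q when a sample lands above it, and down
    with probability 1 - q when a sample lands below it. No counts or
    timestamps are stored."""
    for s in samples:
        if s > estimate and random.random() < q:
            estimate += step
        elif s < estimate and random.random() < 1.0 - q:
            estimate -= step
    return estimate

# Example: photon timestamps clustered around a surface return near t = 830.
stream = [random.gauss(830, 15) for _ in range(20000)]
print(frugal_quantile(stream, q=0.5))  # converges near the median, ~830
```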
Kinematically-Decoupled Impedance Control for Fast Object Visual Servoing and Grasping on Quadruped Manipulators
results: Experiments show that the proposed approach achieves robust searching, approaching, and grasping on a dynamically moving quadruped manipulator, and is resilient to external disturbances.
Abstract
We propose a control pipeline for SAG (Searching, Approaching, and Grasping) of objects, based on a decoupled arm kinematic chain and impedance control, which integrates image-based visual servoing (IBVS). The kinematic decoupling allows for fast end-effector motions and recovery that leads to robust visual servoing. The whole approach and pipeline can be generalized for any mobile platform (wheeled or tracked vehicles), but is most suitable for dynamically moving quadruped manipulators thanks to their reactivity against disturbances. The compliance of the impedance controller makes the robot safer for interactions with humans and the environment. We demonstrate the performance and robustness of the proposed approach with various experiments on our 140 kg HyQReal quadruped robot equipped with a 7-DoF manipulator arm. The experiments consider dynamic locomotion, tracking under external disturbances, and fast motions of the target object.
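To make the compliance aspect concrete, here is a minimal sketch of a Cartesian impedance law of the kind such controllers use: the commanded wrench behaves like a spring-damper between the end-effector and the visual-servoing reference. The gains and the 3-DoF translational simplification are assumptions; the paper's controller and its kinematic decoupling are not reproduced.

```python
import numpy as np

def impedance_wrench(x, xd, v, vd, K, D):
    """Illustrative Cartesian impedance law: the commanded wrench pulls
    the end-effector toward the (visual-servoing) reference pose like a
    spring-damper, so contacts are absorbed compliantly.
        F = K (x_d - x) + D (v_d - v)
    Joint torques then follow via tau = J^T F (Jacobian transpose)."""
    return K @ (xd - x) + D @ (vd - v)

# Example: 3-DoF translational impedance with diagonal gains.
K = np.diag([300.0, 300.0, 300.0])   # stiffness [N/m]
D = np.diag([30.0, 30.0, 30.0])      # damping   [N s/m]
F = impedance_wrench(np.zeros(3), np.array([0.1, 0.0, 0.0]),
                     np.zeros(3), np.zeros(3), K, D)
print(F)  # -> [30., 0., 0.]
```

Lower stiffness makes interactions with humans and the environment safer at the cost of tracking precision, which is the trade-off the compliance claim refers to.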
Rapid Deforestation and Burned Area Detection using Deep Multimodal Learning on Satellite Imagery
for: The paper proposes a method based on multimodal satellite imagery and remote sensing technology for estimating deforestation and detecting wildfires in the Amazon region.
methods: The research uses convolutional neural networks (CNNs) and comprehensive data processing techniques to solve these problems. The dataset includes curated images and diverse channel bands from Sentinel, Landsat, VIIRS, and MODIS satellites.
results: The method achieves high-precision deforestation estimation and burned area detection on unseen images from the region.
Abstract
Deforestation estimation and fire detection in the Amazon forest poses a significant challenge due to the vast size of the area and the limited accessibility. However, these are crucial problems that lead to severe environmental consequences, including climate change, global warming, and biodiversity loss. To effectively address this problem, multimodal satellite imagery and remote sensing offer a promising solution for estimating deforestation and detecting wildfire in the Amazonia region. This research paper introduces a new curated dataset and a deep learning-based approach to solve these problems using convolutional neural networks (CNNs) and comprehensive data processing techniques. Our dataset includes curated images and diverse channel bands from Sentinel, Landsat, VIIRS, and MODIS satellites. We design the dataset considering different spatial and temporal resolution requirements. Our method successfully achieves high-precision deforestation estimation and burned area detection on unseen images from the region. Our code, models and dataset are open source: https://github.com/h2oai/cvpr-multiearth-deforestation-segmentation
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization
results: Our model outperforms other systems on the evaluation data of the BioNLP shared task and achieves first place on the RadSum23 leaderboard for the hidden test set.
Abstract
In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively learn the required knowledge and skills from limited resources in the domain. Demonstrating superior performance on the benchmark datasets provided by the BioNLP shared task, our model benefits from its training across multiple tasks and domains. With subtle techniques including ensemble and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
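The sequence-to-sequence unification can be pictured as below: every domain task is cast as a (source prompt, target text) pair for one encoder-decoder model. The template strings and task names here are illustrative assumptions, not CheXOFA's actual prompts.

```python
def build_seq2seq_source(task: str, text: str = "", image_token: str = "<img>"):
    """Illustrative task unification: each domain-specific task is mapped
    to a textual instruction consumed by the same encoder-decoder model.
    The prompt wording below is hypothetical."""
    prompts = {
        "summarize": f"{image_token} summarize the findings: {text}",
        "report_generation": f"{image_token} describe the chest x-ray.",
        "vqa": f"{image_token} answer the question: {text}",
    }
    return prompts[task]

src = build_seq2seq_source("summarize", "Findings: heart size is normal ...")
# The target sequence would be the reference impression/summary text.
```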
results: During training, the Landsat-8 model achieved 93.45% training and validation pixel accuracy, and the Sentinel-2 model achieved 83.87% pixel accuracy. On the test set, the model achieved 84.70% pixel accuracy with an F1-score of 0.79 and an IoU of 0.69.
Abstract
In this paper, we present a deforestation estimation method based on an attention-guided UNet architecture using Electro-Optical (EO) and Synthetic Aperture Radar (SAR) satellite imagery. For optical images, Landsat-8 data and, for SAR imagery, Sentinel-1 data have been used to train and validate the proposed model. Due to the unavailability of temporally and spatially collocated data, an individual model has been trained for each sensor. During training, the Landsat-8 model achieved a training and validation pixel accuracy of 93.45% and the Sentinel-2 model achieved 83.87% pixel accuracy. During the test set evaluation, the model achieved a pixel accuracy of 84.70% with an F1-score of 0.79 and an IoU of 0.69.
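For reference, below is a minimal sketch of an attention gate of the kind commonly used in attention-guided UNets: the decoder's gating signal down-weights skip-connection features from irrelevant regions before they are concatenated. The paper's exact module may differ from this common formulation.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Illustrative attention gate for a UNet skip connection: a gating
    signal from the decoder produces a per-pixel weight in [0, 1] that
    suppresses encoder features from irrelevant regions."""
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Sequential(nn.Conv2d(inter_ch, 1, kernel_size=1),
                                 nn.Sigmoid())

    def forward(self, skip, gate):
        # `gate` is assumed upsampled to the skip resolution beforehand.
        a = torch.relu(self.w_skip(skip) + self.w_gate(gate))
        return skip * self.psi(a)  # attention-weighted skip features

# Example shapes: encoder skip (B, 64, 128, 128), decoder gating signal.
g = AttentionGate(skip_ch=64, gate_ch=128, inter_ch=32)
out = g(torch.randn(1, 64, 128, 128), torch.randn(1, 128, 128, 128))
```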