eess.IV - 2023-07-26

Artifact Restoration in Histology Images with Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2307.14262
  • repo_url: https://github.com/zhenqi-he/artifusion
  • paper_authors: Zhenqi He, Junjun He, Jin Ye, Yiqing Shen
  • for: restoration of artifact-corrupted histological whole slide images (WSIs), to reduce the examination difficulty for both pathologists and Computer-Aided Diagnosis (CAD) systems.
  • methods: an innovative denoising diffusion probabilistic model, ArtiFusion, which formulates artifact region restoration as a gradual denoising process and uses a novel Swin-Transformer denoising architecture with a time token scheme to capture local-global correlations (a hedged sketch of the masked-denoising idea follows this entry).
  • results: effective restoration of artifact-corrupted histological WSIs, preserving tissue structures and stain style in artifact-free regions, demonstrated through extensive evaluations.
    Abstract Histological whole slide images (WSIs) can often be compromised by artifacts, such as tissue folding and bubbles, which increase the examination difficulty for both pathologists and Computer-Aided Diagnosis (CAD) systems. Existing approaches to restoring artifact images are confined to Generative Adversarial Networks (GANs), where the restoration process is formulated as an image-to-image transfer. Those methods are prone to suffer from mode collapse and unexpected mistransfer in the stain style, leading to unsatisfactory and unrealistic restored images. Innovatively, we make the first attempt at a denoising diffusion probabilistic model for histological artifact restoration, namely ArtiFusion. Specifically, ArtiFusion formulates the artifact region restoration as a gradual denoising process, and its training relies solely on artifact-free images to simplify the training complexity. Furthermore, to capture local-global correlations in the regional artifact restoration, a novel Swin-Transformer denoising architecture is designed, along with a time token scheme. Our extensive evaluations demonstrate the effectiveness of ArtiFusion as a pre-processing method for histology analysis, which can successfully preserve the tissue structures and stain style in artifact-free regions during the restoration. Code is available at https://github.com/zhenqi-he/ArtiFusion.
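Below is a minimal sketch of the masked reverse-diffusion idea the entry describes, with RePaint-style known-region conditioning. The `denoiser` callable, the linear beta schedule, and the tensor shapes are illustrative assumptions, not ArtiFusion's actual configuration.

```python
# Minimal sketch of diffusion-based artifact-region restoration via masked
# reverse diffusion. `denoiser` is a hypothetical stand-in for ArtiFusion's
# Swin-Transformer denoising network; the linear beta schedule is illustrative.
import torch

def restore_artifact_region(image, mask, denoiser, T=1000):
    """image: (1, 3, H, W) corrupted input scaled to [-1, 1];
    mask: (1, 1, H, W) with 1 inside the artifact region, 0 elsewhere."""
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn_like(image)                       # start from pure noise
    for t in reversed(range(T)):
        # Noise the artifact-free pixels to the current step so both regions
        # sit at the same noise level, then composite them with the mask.
        known = alpha_bar[t].sqrt() * image + (1 - alpha_bar[t]).sqrt() * torch.randn_like(image)
        x = mask * x + (1 - mask) * known

        eps = denoiser(x, torch.tensor([t]))          # predict the injected noise
        mean = (x - betas[t] * eps / (1 - alpha_bar[t]).sqrt()) / alphas[t].sqrt()
        x = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0.0)
    return mask * x + (1 - mask) * image              # clean regions stay untouched
```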

Visual Saliency Detection in Advanced Driver Assistance Systems

  • paper_url: http://arxiv.org/abs/2308.03770
  • repo_url: None
  • paper_authors: Francesco Rundo, Michael Sebastian Rundo, Concetto Spampinato
  • for: developing an intelligent driving system that detects driver drowsiness and classifies scenes by their saliency.
  • methods: a specialized 3D deep network for semantic segmentation, run in real time on an embedded platform with an STA1295 core and a hardware accelerator; a biosensor embedded in the car steering wheel monitors driver drowsiness, and a 1D temporal deep convolutional network classifies the driver's PPG signal (a hedged sketch of such a classifier follows this entry).
  • results: experimental results show that the system effectively detects driver drowsiness and scene saliency, and can accurately assess the driver's level of attentiveness.
    Abstract Visual Saliency refers to the innate human mechanism of focusing on and extracting important features from the observed environment. Recently, there has been a notable surge of interest in the field of automotive research regarding the estimation of visual saliency. While operating a vehicle, drivers naturally direct their attention towards specific objects, employing brain-driven saliency mechanisms that prioritize certain elements over others. In this investigation, we present an intelligent system that combines a drowsiness detection system for drivers with a scene comprehension pipeline based on saliency. To achieve this, we have implemented a specialized 3D deep network for semantic segmentation, which has been pretrained and tailored for processing the frames captured by an automotive-grade external camera. The proposed pipeline was hosted on an embedded platform utilizing the STA1295 core, featuring ARM A7 dual-cores, and embeds a hardware accelerator. Additionally, we employ an innovative biosensor embedded on the car steering wheel to monitor the driver's drowsiness, gathering the PhotoPlethysmoGraphy (PPG) signal of the driver. A dedicated 1D temporal deep convolutional network has been devised to classify the collected PPG time-series, enabling us to assess the driver's level of attentiveness. Ultimately, we compare the determined attention level of the driver with the corresponding saliency-based scene classification to evaluate the overall safety level. The efficacy of the proposed pipeline has been validated through extensive experimental results.
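A hedged sketch of what a 1D temporal convolutional classifier over PPG windows can look like; the layer widths, 512-sample window, and two-class head are invented for illustration and are not the authors' exact network.

```python
# Illustrative 1D temporal CNN for classifying PPG windows into attentiveness
# levels; all layer sizes are assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class PPGClassifier(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                   # global temporal pooling
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                              # x: (batch, 1, window)
        return self.head(self.features(x).squeeze(-1))

# Usage on a dummy batch of 512-sample PPG windows:
model = PPGClassifier()
logits = model(torch.randn(8, 1, 512))                 # -> (8, 2)
```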

Non-Linear Self Augmentation Deep Pipeline for Cancer Treatment outcome Prediction

  • paper_url: http://arxiv.org/abs/2307.14398
  • repo_url: None
  • paper_authors: Francesco Rundo, Concetto Spampinato, Michael Rundo
  • for: treating tumors with immunotherapy to prolong patient survival and reduce the toxicity of conventional chemotherapy.
  • methods: a novel strategy that couples a non-linear cellular architecture with a deep downstream classifier to select and enhance 2D features extracted from tumor CT images, improving the accuracy of treatment outcome prediction (a loose sketch of the two-stage idea follows this entry).
  • results: experiments show the strategy achieves an overall accuracy of approximately 93%, indicating strong effectiveness.
    Abstract Immunotherapy emerges as a promising approach for treating cancer. Encouraging findings have validated the efficacy of immunotherapy medications in addressing tumors, resulting in prolonged survival rates and notable reductions in toxicity compared to conventional chemotherapy methods. However, the pool of eligible patients for immunotherapy remains relatively small, indicating a lack of comprehensive understanding regarding the physiological mechanisms responsible for favorable treatment response in certain individuals while others experience limited benefits. To tackle this issue, the authors present an innovative strategy that harnesses a non-linear cellular architecture in conjunction with a deep downstream classifier. This approach aims to carefully select and enhance 2D features extracted from chest-abdomen CT images, thereby improving the prediction of treatment outcomes. The proposed pipeline has been meticulously designed to seamlessly integrate with an advanced embedded Point of Care system. In this context, the authors present a compelling case study focused on Metastatic Urothelial Carcinoma (mUC), a particularly aggressive form of cancer. Performance evaluation of the proposed approach underscores its effectiveness, with an impressive overall accuracy of approximately 93%.
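A loose sketch of the two-stage idea: a non-linear module that re-weights 2D CT-derived features, followed by a deep downstream classifier. The 256-dimensional feature vector, the sigmoid gating, and the layer sizes are assumptions made only to illustrate the structure, not the authors' architecture.

```python
# Hypothetical non-linear feature enhancement + downstream classifier for
# treatment outcome prediction; shapes and gating form are illustrative.
import torch
import torch.nn as nn

class NonLinearEnhancer(nn.Module):
    """Learns a per-feature non-linear gating of the input feature vector."""
    def __init__(self, n_features=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(n_features, n_features), nn.Sigmoid())

    def forward(self, feats):                 # feats: (batch, n_features)
        return feats * self.gate(feats)       # emphasize informative features

classifier = nn.Sequential(
    NonLinearEnhancer(256),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 2),                         # responder vs. non-responder
)

probs = classifier(torch.randn(4, 256)).softmax(dim=-1)
```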

Tackling Scattering and Reflective Flare in Mobile Camera Systems: A Raw Image Dataset for Enhanced Flare Removal

  • paper_url: http://arxiv.org/abs/2307.14180
  • repo_url: None
  • paper_authors: Fengbo Lan, Chang Wen Chen
  • for: improving image quality in mobile camera systems, in particular mitigating scattering and reflective flare.
  • methods: a raw image dataset captured with diverse mobile devices and camera settings, which can be segmented into paired patches to cover a broad range of imaging conditions (a sketch of the patch extraction follows this entry).
  • results: experiments show that networks trained on synthesized data struggle with the complex lighting in this real image dataset, that processing through a phone's internal ISP degrades image quality, and that raw image data offers significant advantages for scattering and reflective flare removal.
    Abstract The increasing prevalence of mobile devices has led to significant advancements in mobile camera systems and improved image quality. Nonetheless, mobile photography still grapples with challenging issues such as scattering and reflective flare. The absence of a comprehensive real image dataset tailored for mobile phones hinders the development of effective flare mitigation techniques. To address this issue, we present a novel raw image dataset specifically designed for mobile camera systems, focusing on flare removal. Capitalizing on the distinct properties of raw images, this dataset serves as a solid foundation for developing advanced flare removal algorithms. It encompasses a wide variety of real-world scenarios captured with diverse mobile devices and camera settings. The dataset comprises over 2,000 high-quality full-resolution raw image pairs for scattering flare and 1,100 for reflective flare, which can be further segmented into up to 30,000 and 2,200 paired patches, respectively, ensuring broad adaptability across various imaging conditions. Experimental results demonstrate that networks trained with synthesized data struggle to cope with complex lighting settings present in this real image dataset. We also show that processing data through a mobile phone's internal ISP compromises image quality while using raw image data presents significant advantages for addressing the flare removal problem. Our dataset is expected to enable an array of new research in flare removal and contribute to substantial improvements in mobile image quality, benefiting mobile photographers and end-users alike.
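A small sketch of how full-resolution raw pairs might be sliced into aligned training patches, as the dataset description suggests. The 512-pixel patch size, stride, and synthetic Bayer arrays are assumptions for illustration.

```python
# Slice full-resolution raw flare/flare-free pairs into aligned patches.
import numpy as np

def extract_paired_patches(flare_raw, clean_raw, patch=512, stride=512):
    """flare_raw, clean_raw: (H, W) Bayer-mosaic arrays of the same scene."""
    assert flare_raw.shape == clean_raw.shape
    h, w = flare_raw.shape
    pairs = []
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            # Keep offsets even so the Bayer pattern phase is preserved.
            if y % 2 == 0 and x % 2 == 0:
                pairs.append((flare_raw[y:y+patch, x:x+patch],
                              clean_raw[y:y+patch, x:x+patch]))
    return pairs

flare = np.random.randint(0, 1023, (3000, 4000), dtype=np.uint16)
clean = np.random.randint(0, 1023, (3000, 4000), dtype=np.uint16)
patches = extract_paired_patches(flare, clean)   # list of aligned (512, 512) pairs
```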

Memory-Efficient Graph Convolutional Networks for Object Classification and Detection with Event Cameras

  • paper_url: http://arxiv.org/abs/2307.14124
  • repo_url: None
  • paper_authors: Kamil Jeziorek, Andrea Pinna, Tomasz Kryjak
  • for: This paper focuses on developing an efficient graph convolutional network (GCN) for processing event data in its original sparse form, with the goal of achieving high accuracy while minimizing computational and memory costs.
  • methods: The authors compare different graph convolution operations and evaluate their performance in terms of execution time, number of trainable model parameters, data format requirements, and training outcomes. They also implement an object detection architecture and evaluate its performance on the N-Caltech101 dataset (a hedged sketch of event-graph construction follows this entry).
  • results: The authors achieve a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher compared to the operation used in state-of-the-art approaches. They also achieve an object detection accuracy of 53.7% mAP@0.5 and an execution rate of 82 graphs per second.
    Abstract Recent advances in event camera research emphasize processing data in its original sparse form, which allows the use of its unique features such as high temporal resolution, high dynamic range, low latency, and resistance to image blur. One promising approach for analyzing event data is through graph convolutional networks (GCNs). However, current research in this domain primarily focuses on optimizing computational costs, neglecting the associated memory costs. In this paper, we consider both factors together in order to achieve satisfying results and relatively low model complexity. For this purpose, we performed a comparative analysis of different graph convolution operations, considering factors such as execution time, the number of trainable model parameters, data format requirements, and training outcomes. Our results show a 450-fold reduction in the number of parameters for the feature extraction module and a 4.5-fold reduction in the size of the data representation while maintaining a classification accuracy of 52.3%, which is 6.3% higher compared to the operation used in state-of-the-art approaches. To further evaluate performance, we implemented the object detection architecture and evaluated its performance on the N-Caltech101 dataset. The results showed an accuracy of 53.7% mAP@0.5 and reached an execution rate of 82 graphs per second.
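A hedged sketch of the general recipe behind event-based GCNs: treat events as graph nodes, connect spatiotemporally close events, and apply a lightweight graph convolution. The radius, time scaling, and feature sizes are illustrative and not taken from the paper.

```python
# Turn a sparse event stream into a graph and run one simple graph convolution.
import torch

def build_event_graph(events, radius=3.0, time_scale=1000.0):
    """events: (N, 4) tensor of (x, y, t, polarity); t in seconds."""
    # Scale time (seconds -> milliseconds) so it is commensurate with pixels.
    pos = torch.stack([events[:, 0], events[:, 1], events[:, 2] * time_scale], dim=1)
    dist = torch.cdist(pos, pos)                        # pairwise distances
    adj = (dist < radius).float()                       # includes self-loops
    return adj / adj.sum(dim=1, keepdim=True)           # row-normalize

class SimpleGraphConv(torch.nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                          # x: (N, in_dim)
        return torch.relu(self.lin(adj @ x))            # aggregate neighbors, then transform

events = torch.rand(200, 4) * torch.tensor([240., 180., 0.05, 1.])
adj = build_event_graph(events)
feats = SimpleGraphConv(4, 16)(events, adj)             # (200, 16) node features
```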

Periocular biometrics: databases, algorithms and directions

  • paper_url: http://arxiv.org/abs/2307.14111
  • repo_url: None
  • paper_authors: Fernando Alonso-Fernandez, Josef Bigun
  • for: reviewing the state of the art and future directions in periocular biometrics.
  • methods: the survey covers feature extraction from the periocular region, gender classification, ethnicity classification, and the impact of gender transformation or plastic surgery on recognition performance (a sketch of a classic texture-descriptor baseline follows this entry).
  • results: it identifies the most relevant open issues and future research trends, including the use of periocular features to improve recognition accuracy and the effect of gender transformation or plastic surgery on recognition performance.
    Abstract Periocular biometrics has been established as an independent modality due to concerns on the performance of iris or face systems in uncontrolled conditions. Periocular refers to the facial region in the eye vicinity, including eyelids, lashes and eyebrows. It is available over a wide range of acquisition distances, representing a trade-off between the whole face (which can be occluded at close distances) and the iris texture (which does not have enough resolution at long distances). Since the periocular region appears in face or iris images, it can be used also in conjunction with these modalities. Features extracted from the periocular region have also been used successfully for gender classification and ethnicity classification, and to study the impact of gender transformation or plastic surgery on the recognition performance. This paper presents a review of the state of the art in periocular biometric research, providing an insight into the most relevant issues and giving a thorough coverage of the existing literature. Future research trends are also briefly discussed.
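As one concrete example of the feature-extraction approaches such surveys cover, here is a sketch of a classic periocular baseline: grid-wise LBP histograms over the eye-vicinity region. The grid size and 8-neighbor LBP variant are illustrative choices, not the paper's own pipeline.

```python
# Grid-wise LBP histogram descriptor over a periocular image crop.
import numpy as np

def lbp_image(gray):
    """8-neighbor LBP codes for an (H, W) grayscale image (borders cropped)."""
    c = gray[1:-1, 1:-1]
    shifts = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    code = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(shifts):
        nb = gray[1+dy:gray.shape[0]-1+dy, 1+dx:gray.shape[1]-1+dx]
        code |= ((nb >= c).astype(np.uint8) << bit)
    return code

def periocular_descriptor(gray, grid=(4, 4)):
    codes = lbp_image(gray)
    gh, gw = codes.shape[0] // grid[0], codes.shape[1] // grid[1]
    hists = [np.histogram(codes[i*gh:(i+1)*gh, j*gw:(j+1)*gw], bins=256, range=(0, 256))[0]
             for i in range(grid[0]) for j in range(grid[1])]
    return np.concatenate(hists).astype(np.float32)     # compare with cosine or chi-square

desc = periocular_descriptor(np.random.randint(0, 256, (64, 96), dtype=np.uint8))
```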

Video Decoding Energy Estimation Using Processor Events

  • paper_url: http://arxiv.org/abs/2307.14000
  • repo_url: None
  • paper_authors: Christian Herglotz, André Kaup
  • for: estimating the processing energy of software video decoders.
  • methods: processor events such as instruction counts or cache misses are used to accurately estimate the processing energy of software video decoders (a sketch of such an event-based energy model follows this entry).
  • results: the proposed method estimates the decoding energy of recent video coding standards, including HEVC and VP9, with a mean estimation error smaller than 6%.
    Abstract In this paper, we show that processor events like instruction counts or cache misses can be used to accurately estimate the processing energy of software video decoders. Therefore, we perform energy measurements on an ARM-based evaluation platform and count processor level events using a dedicated profiling software. Measurements are performed for various codecs and decoder implementations to prove the general viability of our observations. Using the estimation method proposed in this paper, the true decoding energy for various recent video coding standards including HEVC and VP9 can be estimated with a mean estimation error that is smaller than 6%.
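A sketch of the event-based energy-model idea: decoding energy as a weighted sum of processor-event counts, with per-event costs fitted by least squares on measured decodes. The event set, synthetic counts, and cost values are invented for illustration.

```python
# Fit a linear decoding-energy model from processor-event counts.
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-sequence event counts: instructions, cache misses, branch misses.
X = rng.uniform([1e9, 1e6, 1e5], [5e9, 8e6, 9e5], size=(20, 3))
true_w = np.array([1.2e-9, 4.0e-7, 2.5e-7])            # joules per event (invented)
E = X @ true_w * rng.normal(1.0, 0.02, size=20)        # "measured" decoding energy

w, *_ = np.linalg.lstsq(X, E, rcond=None)              # fitted per-event costs
mean_err = np.mean(np.abs(X @ w - E) / E)              # relative estimation error
print(f"mean estimation error: {mean_err:.1%}")
```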

Hybrid Representation-Enhanced Sampling for Bayesian Active Learning in Musculoskeletal Segmentation of Lower Extremities

  • paper_url: http://arxiv.org/abs/2307.13986
  • repo_url: None
  • paper_authors: Ganping Li, Yoshito Otake, Mazen Soufi, Masashi Taniguchi, Masahide Yagi, Noriaki Ichihashi, Keisuke Uemura, Masaki Takao, Nobuhiko Sugano, Yoshinobu Sato
  • for: reducing the time and cost of manual annotation in medical image segmentation tasks.
  • methods: a Bayesian active learning framework based on Bayesian U-net, with a hybrid representation-enhanced sampling strategy that selects uncertain samples of high density and diversity for manual revision (a sketch of the hybrid selection follows this entry).
  • results: experiments on MRI and CT images from two lower extremity (LE) datasets, comparing different acquisition rules and methods, show that the proposed method is superior or non-inferior to the alternatives on both datasets, and an ablation study shows that combining density and diversity criteria outperforms either criterion alone in musculoskeletal segmentation.
    Abstract Purpose: Obtaining manual annotations to train deep learning (DL) models for auto-segmentation is often time-consuming. Uncertainty-based Bayesian active learning (BAL) is a widely-adopted method to reduce annotation efforts. Based on BAL, this study introduces a hybrid representation-enhanced sampling strategy that integrates density and diversity criteria to save manual annotation costs by efficiently selecting the most informative samples. Methods: The experiments are performed on two lower extremity (LE) datasets of MRI and CT images by a BAL framework based on Bayesian U-net. Our method selects uncertain samples with high density and diversity for manual revision, optimizing for maximal similarity to unlabeled instances and minimal similarity to existing training data. We assess the accuracy and efficiency using Dice and a proposed metric called reduced annotation cost (RAC), respectively. We further evaluate the impact of various acquisition rules on BAL performance and design an ablation study for effectiveness estimation. Results: The proposed method showed superiority or non-inferiority to other methods on both datasets across two acquisition rules, and quantitative results reveal the pros and cons of the acquisition rules. Our ablation study in volume-wise acquisition shows that the combination of density and diversity criteria outperforms solely using either of them in musculoskeletal segmentation. Conclusion: Our sampling method is proven efficient in reducing annotation costs in image segmentation tasks. The combination of the proposed method and our BAL framework provides a semi-automatic way for efficient annotation of medical image datasets.
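A sketch of hybrid density-plus-diversity selection on top of uncertainty scores: from the most uncertain unlabeled samples, prefer those representative of the unlabeled pool and distant from the labeled set. The scoring mix and embedding dimensions are assumptions, not the paper's exact formulation.

```python
# Hybrid density + diversity acquisition for active learning.
import numpy as np

def hybrid_select(unlabeled, labeled, uncertainty, k=10, pool=50):
    """unlabeled: (N, d) embeddings; labeled: (M, d); uncertainty: (N,)."""
    cand = np.argsort(-uncertainty)[:pool]               # most uncertain first
    u = unlabeled[cand]
    # Density: mean similarity to the whole unlabeled pool (representativeness).
    density = (u @ unlabeled.T).mean(axis=1)
    # Diversity: distance to the nearest already-labeled sample.
    d2 = ((u[:, None, :] - labeled[None, :, :]) ** 2).sum(-1)
    diversity = np.sqrt(d2.min(axis=1))
    score = density / density.max() + diversity / diversity.max()
    return cand[np.argsort(-score)[:k]]                  # indices to annotate

rng = np.random.default_rng(1)
picks = hybrid_select(rng.normal(size=(500, 32)), rng.normal(size=(40, 32)),
                      rng.uniform(size=500))
```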

Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models

  • paper_url: http://arxiv.org/abs/2307.13981
  • repo_url: None
  • paper_authors: Wei Sun, Wen Wen, Xiongkuo Min, Long Lan, Guangtao Zhai, Kede Ma
  • for: understanding the role of blind video quality assessment (BVQA) in monitoring and improving end-users' viewing experience in real-world video-enabled applications.
  • methods: minimalistic BVQA models built only from basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor (a skeleton of this pipeline follows this entry).
  • results: a computational analysis of eight VQA datasets shows that nearly all of them suffer, to varying degrees, from the easy-dataset problem, with some even admitting blind image quality assessment (BIQA) solutions; comparisons of model variants and ablations of the basic building blocks support these conclusions, casting doubt on current BVQA progress while suggesting good practices for constructing next-generation VQA datasets and models.
    Abstract Blind video quality assessment (BVQA) plays an indispensable role in monitoring and improving the end-users' viewing experience in various real-world video-enabled media applications. As an experimental field, the improvements of BVQA models have been measured primarily on a few human-rated VQA datasets. Thus, it is crucial to gain a better understanding of existing VQA datasets in order to properly evaluate the current progress in BVQA. Towards this goal, we conduct a first-of-its-kind computational analysis of VQA datasets via designing minimalistic BVQA models. By minimalistic, we restrict our family of BVQA models to build only upon basic blocks: a video preprocessor (for aggressive spatiotemporal downsampling), a spatial quality analyzer, an optional temporal quality analyzer, and a quality regressor, all with the simplest possible instantiations. By comparing the quality prediction performance of different model variants on eight VQA datasets with realistic distortions, we find that nearly all datasets suffer from the easy dataset problem of varying severity, some of which even admit blind image quality assessment (BIQA) solutions. We additionally justify our claims by contrasting our model generalizability on these VQA datasets, and by ablating a dizzying set of BVQA design choices related to the basic building blocks. Our results cast doubt on the current progress in BVQA, and meanwhile shed light on good practices of constructing next-generation VQA datasets and models.
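A skeleton of the minimalistic pipeline the entry describes: spatiotemporal downsampling, a per-frame spatial feature, temporal pooling, and a linear quality regressor. The frame-statistics feature and the regressor weights here are placeholder assumptions, far simpler than even the paper's simplest instantiation.

```python
# Minimal BVQA skeleton: preprocess -> spatial features -> pool -> regress.
import numpy as np

def preprocess(video, s_stride=4, t_stride=8):
    """video: (T, H, W, 3) uint8 -> sparse frames at reduced resolution."""
    return video[::t_stride, ::s_stride, ::s_stride].astype(np.float32) / 255.0

def spatial_features(frames):
    """Simplest possible per-frame descriptor: global mean and std."""
    return np.stack([frames.mean(axis=(1, 2, 3)), frames.std(axis=(1, 2, 3))], axis=1)

def predict_quality(video, w, b):
    feats = spatial_features(preprocess(video)).mean(axis=0)   # temporal pooling
    return feats @ w + b                                        # linear regressor

video = np.random.randint(0, 256, (64, 360, 640, 3), dtype=np.uint8)
score = predict_quality(video, w=np.array([2.0, -1.0]), b=3.0)
```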

A real-time material breakage detection for offshore wind turbines based on improved neural network algorithm

  • paper_url: http://arxiv.org/abs/2307.13765
  • repo_url: None
  • paper_authors: Yantong Liu
  • for: improving the reliability of offshore wind turbines for sustainable energy generation.
  • methods: an improved YOLOv8 object detection model augmented with a Convolutional Block Attention Module (CBAM) for better feature recognition and an optimized loss function, rigorously tested on 5,432 wind farm images and a publicly available dataset (a sketch of the CBAM idea follows this entry).
  • results: the study finds a substantial improvement in defect detection stability, an important step forward for sustainable energy practice.
    Abstract The integrity of offshore wind turbines, pivotal for sustainable energy generation, is often compromised by surface material defects. Despite the availability of various detection techniques, limitations persist regarding cost-effectiveness, efficiency, and applicability. Addressing these shortcomings, this study introduces a novel approach leveraging an advanced version of the YOLOv8 object detection model, supplemented with a Convolutional Block Attention Module (CBAM) for improved feature recognition. The optimized loss function further refines the learning process. Employing a dataset of 5,432 images from the Saemangeum offshore wind farm and a publicly available dataset, our method underwent rigorous testing. The findings reveal a substantial enhancement in defect detection stability, marking a significant stride towards efficient turbine maintenance. This study's contributions illuminate the path for future research, potentially revolutionizing sustainable energy practices.
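A sketch of the CBAM idea (channel attention followed by spatial attention) that the entry says is bolted onto YOLOv8's feature maps; this is the standard formulation from the CBAM literature, not necessarily the authors' exact module.

```python
# Standard CBAM block: channel attention, then spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared channel-attention MLP
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # channel attention from pooled stats
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))      # spatial attention

feats = CBAM(64)(torch.randn(2, 64, 40, 40))           # drop-in on a feature map
```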