eess.IV - 2023-10-07

Hardware-Algorithm Co-design Enabling Processing-in-Pixel-in-Memory (P2M) for Neuromorphic Vision Sensors

  • paper_url: http://arxiv.org/abs/2310.16844
  • repo_url: None
  • paper_authors: Md Abdullah-Al Kaiser, Akhilesh R. Jaiswal
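  • for: This paper addresses the limited compute capability of edge devices, particularly for computer vision applications, with the goal of saving energy and transmission bandwidth.
  • methods: The paper considers approaches that move computation closer to the sensor (near-sensor, in-sensor, and in-pixel processing) to reduce transmission bandwidth. It focuses on in-pixel processing, which embeds asynchronous multiply-accumulate (MAC) operations within the pixel array to improve energy efficiency.
  • results: The paper analyzes the hardware-algorithm co-design trade-off: the limited charge retention time of the in-pixel capacitor constrains the integration time, which in turn affects accuracy, bandwidth, energy, and training efficiency. It also presents techniques to improve the leakage performance of in-pixel analog MAC operations.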
    Abstract The high volume of data transmission between the edge sensor and the cloud processor leads to energy and throughput bottlenecks for resource-constrained edge devices focused on computer vision. Hence, researchers are investigating different approaches (e.g., near-sensor processing, in-sensor processing, in-pixel processing) by executing computations closer to the sensor to reduce the transmission bandwidth. Specifically, in-pixel processing for neuromorphic vision sensors (e.g., dynamic vision sensors (DVS)) involves incorporating asynchronous multiply-accumulate (MAC) operations within the pixel array, resulting in improved energy efficiency. In a CMOS implementation, a low-overhead, energy-efficient analog MAC accumulates charges on a passive capacitor; however, the capacitor's limited charge retention time affects the algorithmic integration time choices, impacting the algorithmic accuracy, bandwidth, energy, and training efficiency. Consequently, this results in a design trade-off on the hardware aspect, creating a need for a low-leakage compute unit while maintaining the area and energy benefits. In this work, we present a holistic analysis of the hardware-algorithm co-design trade-off based on the limited integration time posed by the hardware and techniques to improve the leakage performance of the in-pixel analog MAC operations.
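    The trade-off between charge retention and integration time can be illustrated with a toy model. The sketch below is not the authors' circuit or method: it assumes a simple first-order exponential leak with a hypothetical time constant `tau`, and all function names and parameter values are made up for illustration. It compares an ideal accumulation of event weights with the value left on a leaky capacitor at readout.

```python
import math


def ideal_mac(event_times, weights, t_read):
    """Ideal accumulation: sum the weights of all events up to t_read."""
    return sum(w for t, w in zip(event_times, weights) if t <= t_read)


def leaky_mac(event_times, weights, t_read, tau):
    """Accumulation on a leaky capacitor.

    Assumption (not from the paper): each contribution decays as
    exp(-(t_read - t) / tau) between its arrival time t and readout.
    """
    return sum(
        w * math.exp(-(t_read - t) / tau)
        for t, w in zip(event_times, weights)
        if t <= t_read
    )


def relative_error(event_times, weights, t_read, tau):
    """Relative deviation of the leaky result from the ideal result."""
    ideal = ideal_mac(event_times, weights, t_read)
    if ideal == 0:
        return 0.0
    leaky = leaky_mac(event_times, weights, t_read, tau)
    return abs(ideal - leaky) / abs(ideal)


def demo():
    """Relative error for three integration windows (hypothetical values)."""
    tau = 10e-3          # assumed retention time constant, in seconds
    event_period = 1e-4  # one unit-weight event every 0.1 ms
    results = {}
    for t_int in (1e-3, 10e-3, 100e-3):
        n = int(round(t_int / event_period))
        times = [i * event_period for i in range(n)]
        weights = [1.0] * n
        results[t_int] = relative_error(times, weights, t_int, tau)
    return results


if __name__ == "__main__":
    for t_int, err in demo().items():
        print(f"integration time {t_int:.3f} s -> relative error {err:.3f}")
```

    Under this assumed model, the error grows as the integration window becomes long relative to `tau`, which is the reason a low-leakage compute unit permits longer integration times.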