eess.IV - 2023-10-07

Hardware-Algorithm Co-design Enabling Processing-in-Pixel-in-Memory (P2M) for Neuromorphic Vision Sensors

paper_url: http://arxiv.org/abs/2310.16844
repo_url: None
paper_authors: Md Abdullah-Al Kaiser, Akhilesh R. Jaiswal
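for: This paper addresses the limited computational capability of edge devices, particularly in computer vision applications, with the aim of saving energy and transmission bandwidth.
methods: The paper considers several approaches, including near-sensor processing, in-sensor processing, and in-pixel processing, which move computation closer to the sensor and so reduce transmission bandwidth. In-pixel processing in particular combines different operations inside the pixel to improve energy efficiency.
results: The paper reports that these approaches can improve the energy efficiency and bandwidth use of edge devices and can reduce training time and energy consumption. It also presents hardware design and analysis techniques to improve the leakage performance of in-pixel processing.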

Abstract
The high volume of data transmission between the edge sensor and the cloud processor leads to energy and throughput bottlenecks for resource-constrained edge devices focused on computer vision. Hence, researchers are investigating different approaches (e.g., near-sensor processing, in-sensor processing, in-pixel processing) that execute computations closer to the sensor to reduce the transmission bandwidth. Specifically, in-pixel processing for neuromorphic vision sensors (e.g., dynamic vision sensors (DVS)) involves incorporating asynchronous multiply-accumulate (MAC) operations within the pixel array, resulting in improved energy efficiency. In a CMOS implementation, a low-overhead, energy-efficient analog MAC accumulates charge on a passive capacitor; however, the capacitor's limited charge retention time constrains the choice of algorithmic integration time, which in turn affects algorithmic accuracy, bandwidth, energy, and training efficiency. Consequently, this results in a design trade-off on the hardware side, creating a need for a low-leakage compute unit that maintains the area and energy benefits. In this work, we present a holistic analysis of the hardware-algorithm co-design trade-off based on the limited integration time imposed by the hardware, together with techniques to improve the leakage performance of the in-pixel analog MAC operations.
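
The link between charge retention and integration time can be illustrated with a small Python sketch. It is not the paper's circuit model: it assumes, purely for illustration, that the capacitor leaks exponentially with a single time constant `tau`, and the names `leaky_mac`, `ideal_mac`, and `retained_fraction` are hypothetical. Each event adds a weighted charge packet at its timestamp, and the packet decays until the accumulated value is read out at the end of the integration window.

```python
import math

def ideal_mac(events, weights, t_read):
    """Leak-free reference: sum the weights of all events up to t_read."""
    return sum(weights[idx] for t, idx in events if t <= t_read)

def leaky_mac(events, weights, tau, t_read):
    """Weighted event accumulation on a leaky capacitor, read at t_read.

    events  : list of (timestamp_s, pixel_index) tuples
    weights : per-pixel weights in arbitrary charge units
    tau     : ASSUMED exponential leakage time constant in seconds
    """
    total = 0.0
    for t, idx in events:
        if t <= t_read:
            # Charge deposited at time t decays for (t_read - t) seconds.
            total += weights[idx] * math.exp(-(t_read - t) / tau)
    return total

def retained_fraction(events, weights, tau, t_read):
    """Fraction of the ideal accumulated value that survives leakage."""
    ideal = ideal_mac(events, weights, t_read)
    return leaky_mac(events, weights, tau, t_read) / ideal if ideal else 1.0

if __name__ == "__main__":
    weights = [1.0]
    tau = 10e-3  # hypothetical 10 ms retention time constant
    for window in (1e-3, 10e-3, 100e-3):
        # 100 evenly spaced events from one pixel across the window
        events = [(window * k / 100, 0) for k in range(100)]
        frac = retained_fraction(events, weights, tau, window)
        print(f"window={window * 1e3:6.1f} ms  retained={frac:.3f}")
```

Under this assumed model, a window much shorter than `tau` keeps almost all of the accumulated charge, while a window much longer than `tau` loses most of the early contributions, which is the kind of constraint on integration time the abstract describes.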

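Summary
Because the volume of data transmitted between the edge sensor and the cloud is high, resource-constrained edge devices face energy and throughput bottlenecks. Researchers are therefore investigating different approaches (such as near-sensor, in-sensor, and in-pixel processing) that move computation closer to the sensor and reduce transmission bandwidth. In particular, in-pixel processing for neuromorphic vision sensors (such as dynamic vision sensors, DVS) embeds asynchronous multiply-accumulate (MAC) operations in the pixel array, which improves energy efficiency. In a CMOS implementation, a low-overhead, energy-efficient analog MAC stores charge on a passive capacitor, but the capacitor's limited retention time constrains the choice of algorithmic integration time, which in turn affects algorithmic accuracy, bandwidth, energy, and training efficiency. This creates a design trade-off on the hardware side: a low-leakage compute unit is needed that preserves the area and energy benefits. This work presents an analysis of the hardware-algorithm co-design trade-off imposed by the hardware's limited integration time, along with techniques to improve leakage performance.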