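results: The paper's results show that these methods can improve the energy efficiency and transmission bandwidth of edge devices while reducing training time and energy consumption. The paper also proposes hardware design and data analysis techniques to improve the leakage performance of in-pixel processing.
Abstract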
The high volume of data transmission between the edge sensor and the cloud processor leads to energy and throughput bottlenecks for resource-constrained edge devices focused on computer vision. Hence, researchers are investigating different approaches (e.g., near-sensor processing, in-sensor processing, in-pixel processing) that execute computations closer to the sensor to reduce the transmission bandwidth. Specifically, in-pixel processing for neuromorphic vision sensors (e.g., dynamic vision sensors (DVS)) involves incorporating asynchronous multiply-accumulate (MAC) operations within the pixel array, resulting in improved energy efficiency. In a CMOS implementation, a low-overhead, energy-efficient analog MAC accumulates charge on a passive capacitor; however, the capacitor's limited charge retention time constrains the choice of algorithmic integration time, which in turn affects algorithmic accuracy, bandwidth, energy, and training efficiency. Consequently, this results in a design trade-off on the hardware side, creating a need for a low-leakage compute unit that maintains the area and energy benefits. In this work, we present a holistic analysis of the hardware-algorithm co-design trade-off arising from the limited integration time imposed by the hardware, along with techniques to improve the leakage performance of the in-pixel analog MAC operations.
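To make the retention-time trade-off concrete, the following is a minimal sketch of how leakage could erode an accumulated MAC result as the integration window grows. It assumes a first-order (exponential) leakage model with time constant `tau`; that model, the function names, and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math


def ideal_mac(events, weights):
    """Ideal multiply-accumulate: no charge is lost between events."""
    return sum(w * x for x, w in zip(events, weights))


def leaky_mac(events, weights, dt, tau):
    """Accumulate weight * event on a capacitor that leaks over time.

    Assumption (not from the paper): the stored value decays
    exponentially with time constant tau between time steps of length dt.
    """
    decay = math.exp(-dt / tau)
    v = 0.0
    for x, w in zip(events, weights):
        # Stored charge decays for one step, then the new product is added.
        v = v * decay + w * x
    return v


def relative_error(n_steps, dt, tau):
    """Fraction of the ideal MAC result lost to leakage over n_steps.

    Uses a hypothetical input of one unit-weight event per time step.
    """
    events = [1] * n_steps
    weights = [1.0] * n_steps
    ideal = ideal_mac(events, weights)
    return (ideal - leaky_mac(events, weights, dt, tau)) / ideal


if __name__ == "__main__":
    # Longer integration windows lose a larger share of the accumulated value.
    for n in (10, 100, 1000):
        print(n, relative_error(n, dt=1e-3, tau=1.0))
```

Under this assumed model, the error grows with the ratio of integration time to `tau`, which is one way to picture why a lower-leakage compute unit would widen the range of usable integration times.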