The mapping of machine learning algorithms on XIMA
This chapter discusses machine learning applications on the distributed in-memory computing architecture (XIMA), including single-layer feedforward neural network (SLFN)-based learning and binary convolutional neural network (BCNN)-based inference on both the passive array and the one-selector-one-ReRAM (1S1R) array. Matrix-vector multiplication, the dominant operation in machine learning workloads, is implemented on the ReRAM crossbar in three steps, with a detailed mapping strategy for each step, including the mapping of X'A onto ReRAM.

The proposed 3D CMOS-ReRAM accelerator greatly improves energy efficiency and neural network processing speed. Matrix-vector multiplication is implemented intrinsically by the ReRAM crossbar: compared to a CMOS implementation, a multibit tensor-core weight can be represented by a single ReRAM device, and addition is realized by current merging. Moreover, because all tensor cores holding the neural network weights are stored in ReRAM devices, better area efficiency is achieved, and the nonvolatility of ReRAM makes the power consumption much lower than that of a CMOS implementation. The proposed accelerator achieves an energy efficiency of 1,055.95 GOPS/W, equivalent to 7.661 TOPS/W for an uncompressed neural network. The proposed TNN accelerator also achieves 2.39× better energy efficiency than a 3D CMOS-ASIC implementation (441.36 GOPS/W) and 227.24× better energy efficiency than an Nvidia Tesla K40 GPU, which achieves 1,092 GFLOPS while consuming 235 W.
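The crossbar principle described above can be sketched numerically: weights are programmed as cell conductances, inputs are applied as word-line voltages, and each bit-line current is the Kirchhoff sum of its column's cell currents (the "current merging" that realizes addition). This is a minimal behavioral sketch, not the chapter's actual mapping; the conductance step `G_LSB` and the 3-bit weight range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed multibit weights, each stored in a single ReRAM cell (3-bit levels 0..7)
W = rng.integers(0, 8, size=(4, 3)).astype(float)
x = rng.random(4)  # input vector, applied as word-line voltages

G_LSB = 1e-6          # assumed conductance per weight level (siemens)
G = W * G_LSB         # program each weight as one cell conductance

V = x                 # word-line voltages
I = G.T @ V           # bit-line currents: Kirchhoff summation = current merging

y = I / G_LSB         # read-out scaling recovers the digital result
print(np.allclose(y, W.T @ x))  # the crossbar computes W^T x in one step
```

In a physical array the analog currents would then be digitized by column ADCs; the single multiply-accumulate per cell is what gives the crossbar its advantage over a CMOS multiplier-adder datapath.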