The mapping of machine learning algorithms on XIMA
This chapter discusses machine learning applications on the distributed in-memory computing architecture (XIMA), including single-layer feedforward neural network (SLFN)-based learning and binary convolutional neural network (BCNN)-based inference on both the passive array and the one-selector-one-ReRAM (1S1R) array. Matrix-vector multiplication, the dominant operation in machine learning workloads, is implemented on the ReRAM crossbar in three steps, with a detailed mapping strategy for each step, including the mapping of X'A onto ReRAM.

The proposed 3D CMOS-ReRAM accelerator greatly improves energy efficiency and neural network processing speed. Matrix-vector multiplication is implemented intrinsically by the ReRAM crossbar: compared to a CMOS implementation, a multibit tensor-core weight can be represented by a single ReRAM device, and addition is realized by current merging. Moreover, because all tensor cores holding the neural network weights are stored in ReRAM devices, better area efficiency is achieved, and the nonvolatility of ReRAM makes the power consumption much lower than that of a CMOS implementation. The proposed accelerator achieves an energy efficiency of 1,055.95 GOPS/W, equivalent to 7.661 TOPS/W for an uncompressed neural network. The proposed TNN accelerator also achieves 2.39× better energy efficiency than a 3D CMOS-ASIC implementation (441.36 GOPS/W) and 227.24× better energy efficiency than an Nvidia Tesla K40 GPU, which achieves 1,092 GFLOPS while consuming 235 W.
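The crossbar principle described above can be sketched numerically: weights are programmed as cell conductances, inputs are applied as word-line voltages, and each bit-line current is the Kirchhoff sum of its column's cell currents (the "current merging" that realizes addition). This is a minimal behavioral sketch, not the chapter's actual mapping; the conductance step `G_LSB` and the 3-bit weight range are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed multibit weights, each stored in a single ReRAM cell (3-bit levels 0..7)
W = rng.integers(0, 8, size=(4, 3)).astype(float)
x = rng.random(4)  # input vector, applied as word-line voltages

G_LSB = 1e-6          # assumed conductance per weight level (siemens)
G = W * G_LSB         # program each weight as one cell conductance

V = x                 # word-line voltages
I = G.T @ V           # bit-line currents: Kirchhoff summation = current merging

y = I / G_LSB         # read-out scaling recovers the digital result
print(np.allclose(y, W.T @ x))  # the crossbar computes W^T x in one step
```

In a physical array the analog currents would then be digitized by column ADCs; the single multiply-accumulate per cell is what gives the crossbar its advantage over a CMOS multiplier-adder datapath.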