
The mapping of machine learning algorithms on XIMA


This chapter presents machine learning applications on the distributed in-memory computing architecture (XIMA), including single-layer feedforward neural network (SLFN)-based learning and binary convolutional neural network (BCNN)-based inference on both a passive array and a one-selector-one-ReRAM (1S1R) array. Matrix–vector multiplication, the dominant operation in machine learning workloads, is implemented on the ReRAM crossbar in several steps, with a detailed mapping strategy for each step, including the mapping of X'A to ReRAM. The proposed 3D CMOS-ReRAM accelerator greatly improves energy efficiency and neural network processing speed. Matrix–vector multiplication is intrinsically implemented by the ReRAM crossbar: compared to CMOS, a multibit tensor-core weight can be represented by a single ReRAM device, and addition is realized by current merging. Moreover, since all tensor cores holding the neural network weights are stored in ReRAM devices, better area efficiency is achieved, and the power consumption of the proposed CMOS-ReRAM design is much smaller than that of a CMOS implementation owing to the nonvolatile property of ReRAM. In terms of energy efficiency, the proposed accelerator achieves 1,055.95 GOPS/W, equivalent to 7.661 TOPS/W for an uncompressed neural network. The proposed TNN accelerator also achieves 2.39× better energy efficiency compared to a 3D CMOS-ASIC implementation (441.36 GOPS/W), and 227.24× better energy efficiency compared to an Nvidia Tesla K40, which achieves 1,092 GFLOPS while consuming 235 W.
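The crossbar operation described above can be sketched numerically: each weight is stored as a conductance, input voltages drive the rows, and the column currents merge into the inner products by Kirchhoff's current law. A minimal, idealized Python sketch (function and variable names are illustrative, not from the chapter):

```python
# Idealized ReRAM crossbar model: weight W[i][j] is stored as a
# conductance G[i][j]; applying input voltages V to the rows makes
# each column current I_j = sum_i V_i * G_ij, so the analog array
# performs matrix-vector multiplication and current merging "for free".

def crossbar_mvm(conductances, voltages):
    """Column currents of an ideal crossbar: I_j = sum_i V_i * G_ij."""
    rows = len(conductances)
    cols = len(conductances[0])
    currents = [0.0] * cols
    for i in range(rows):
        for j in range(cols):
            currents[j] += voltages[i] * conductances[i][j]
    return currents

# A 3x2 weight matrix mapped to conductances, driven by a 3-element input:
G = [[1.0, 0.5],
     [0.0, 1.0],
     [2.0, 0.5]]
V = [1.0, 2.0, 3.0]
print(crossbar_mvm(G, V))  # → [7.0, 4.0]
```

This ignores device non-idealities (wire resistance, selector leakage, conductance quantization) that the hardware mapping must account for; it only illustrates why the multiplication and addition come from the array physics rather than from CMOS arithmetic units.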

Chapter Contents:

  • 6.1 Machine learning algorithms on XIMA
  • 6.1.1 SLFN-based learning and inference acceleration
  • 6.1.1.1 Step 1. Parallel digitizing
  • 6.1.1.2 Step 2. XOR
  • 6.1.1.3 Step 3. Encoding
  • 6.1.1.4 Step 4. Adding and shifting for inner-product result
  • 6.1.2 BCNN-based inference acceleration on passive array
  • 6.1.2.1 Mapping bitwise convolution
  • 6.1.2.2 Mapping bitwise batch normalization
  • 6.1.2.3 Mapping bitwise pooling and binarization
  • 6.1.2.4 Summary of mapping bitwise CNN
  • 6.1.3 BCNN-based inference acceleration on 1S1R array
  • 6.1.3.1 Mapping unsigned bitwise convolution
  • 6.1.3.2 Mapping batch normalization, pooling and binarization
  • 6.1.4 L2-norm gradient-based learning and inference acceleration
  • 6.1.4.1 Mapping matrix–vector multiplication on ReRAM-crossbar network
  • 6.1.4.2 Mapping L2-norm calculation on coupled ReRAM oscillator network
  • 6.1.4.3 Mapping flow of multilayer neural network on ReRAM network
  • 6.1.5 Experimental evaluation of machine learning algorithms on XIMA architecture
  • 6.1.5.1 SLFN-based learning and inference acceleration
  • 6.1.5.2 L2-norm gradient-based learning and inference acceleration
  • 6.1.5.3 BCNN-based inference acceleration on passive array
  • 6.1.5.4 BCNN-based inference acceleration on 1S1R array
  • 6.2 Machine learning algorithms on 3D XIMA
  • 6.2.1 On-chip design for SLFN
  • 6.2.1.1 Data quantization
  • 6.2.1.2 ReRAM layer implementation for digitized matrix–vector multiplication
  • 6.2.1.3 CMOS layer implementation for decoding and incremental least-squares
  • 6.2.2 On-chip design for TNNs
  • 6.2.2.1 Mapping on multilayer architecture
  • 6.2.2.2 Mapping TNN on single-layer architecture
  • 6.2.3 Experimental evaluation of machine learning algorithms on 3D CMOS-ReRAM
  • 6.2.3.1 On-chip design for SLFN-based face recognition
  • 6.2.3.2 Results of TNN-based on-chip design with 3D multilayer architecture
  • 6.2.3.3 TNN-based distributed on-chip design on 3D single-layer architecture
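For the bitwise convolution mappings listed above (Sections 6.1.2.1 and 6.1.3.1), a common BCNN identity is useful context: with activations and weights constrained to {−1, +1} and encoded as bits (1 → +1, 0 → −1), an n-element inner product reduces to an XNOR followed by a bit count. A hedged sketch of that reduction (names are illustrative, not the chapter's implementation):

```python
# Binary inner product for a BCNN: if a and w are n-bit encodings of
# ±1 vectors (bit 1 -> +1, bit 0 -> -1), then with m = popcount(XNOR(a, w))
# the ±1 dot product equals 2*m - n, since each matching bit pair
# contributes +1 and each mismatching pair contributes -1.

def binary_dot(a_bits, w_bits, n):
    """±1 inner product of two n-bit binary-encoded vectors."""
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ w_bits) & mask).count("1")  # XNOR + popcount
    return 2 * matches - n

# a = 0b1011 encodes [+1, +1, -1, +1] (LSB first),
# w = 0b1101 encodes [+1, -1, +1, +1]; their ±1 dot product is 0.
print(binary_dot(0b1011, 0b1101, 4))  # → 0
```

On the ReRAM arrays discussed here, the XNOR and the population count are what get mapped onto crossbar columns and peripheral logic, rather than computed bit-serially as in this software sketch.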

Inspec keywords: convolutional neural nets; feedforward neural nets; matrix multiplication; resistive RAM; learning (artificial intelligence); CMOS memory circuits; power consumption; tensors

Other keywords: distributed in-memory computing architecture; SLFN; uncompressed neural network; nonvolatile property; 3D CMOS-ASIC; energy efficiency; TNN accelerator; single-layer feedforward neural network based learning; Nvidia Tesla K40; matrix vector multiplication; binary convolutional neural network; power consumption; mapping strategy; BCNN; GFLOPS; machine learning; 3D CMOS-ReRAM accelerator; multibit tensor-core weight; XIMA

Subjects: Digital storage; Algebra; Memory circuits

