State-of-the-art convolutional neural networks (CNNs) are widely used in deep neural network models. As models become more accurate, both the number of computations and the number of data accesses increase significantly. The proposed design combines a row-stationary dataflow with a network-on-chip and a fast convolution algorithm in the processing elements to reduce computation and data accesses simultaneously. An experimental evaluation using the convolutional layers of VGG-16 with a batch size of three shows that the proposed design is more energy-efficient than the state-of-the-art work. Relative to prior work, the proposed design improves the total GOPs of the algorithm by a factor of 1.497 and reduces on-chip and off-chip memory accesses by factors of 1.07 and 1.46, respectively.
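
The abstract does not say which fast convolution algorithm the processing elements use. As an illustration only, the sketch below assumes the well-known Winograd minimal filtering form F(2,3), which produces two outputs of a 3-tap filter with four multiplications instead of the six needed by the direct method. The function names direct_f23 and winograd_f23 are hypothetical and are not taken from the paper.

```python
def direct_f23(d, g):
    """Direct 3-tap filtering: two outputs, six multiplications."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return [y0, y1]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs with four multiplications.

    d holds four consecutive input samples and g holds three filter taps.
    The filter-side terms (g0 + g1 + g2) / 2 and (g0 - g1 + g2) / 2 depend
    only on the weights, so in practice they can be computed once per filter.
    """
    # Filter transform (reusable across all input tiles).
    u1 = g[0]
    u2 = (g[0] + g[1] + g[2]) / 2
    u3 = (g[0] - g[1] + g[2]) / 2
    u4 = g[2]

    # Four element-wise multiplications on the transformed input.
    m1 = (d[0] - d[2]) * u1
    m2 = (d[1] + d[2]) * u2
    m3 = (d[2] - d[1]) * u3
    m4 = (d[1] - d[3]) * u4

    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [0.5, -1.0, 2.0]
    print(direct_f23(d, g))
    print(winograd_f23(d, g))
```

Both functions should agree up to floating-point rounding. The saving in multiplications comes at the cost of extra additions and a transformed filter, a trade that tends to suit hardware where multipliers are the scarcer resource.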

Implementation of energy-efficient fast convolution algorithm for deep convolutional neural networks based on FPGA
