access icon free Improved matrix multiplier design for high-speed digital signal processing applications

A transistor level implementation of an improved matrix multiplier for high-speed digital signal processing applications based on matrix element transformation and multiplication is reported in this study. The improvement in speed was achieved by rearranging the matrix element into a two-dimensional array of processing elements interconnected as a mesh. The edges of each row and column were interconnected in torus structure, facilitating simultaneous implementation of several multiplications. The functionality of the circuitry was verified and the performance parameters for example, propagation delay and dynamic switching power consumptions were calculated using spice spectre using 90 nm CMOS technology. The proposed methodology ensures substantial reduction in propagation delay compared with the conventional algorithm, systolic array and pseudo number theoretic transformation (PNTT)-based implementation, which are the most commonly used techniques, for matrix multiplication. The propagation delay of the implemented 4 × 4 matrix multiplier was only ∼2 µs, whereas the power consumption of the implemented 4 × 4 matrix multiplier was ∼3.12 mW only. Improvement in speed compared with earlier reported matrix multipliers, for example, conventional algorithm, systolic array and PNTT-based implementation was found to be ∼67, ∼56 and ∼65%, respectively.

Inspec keywords: CMOS integrated circuits; matrix multiplication; delays; digital signal processing chips; systolic arrays; power consumption

Other keywords: systolic array; CMOS technology; propagation delay; transistor level implementation; torus structure; spice spectre; high-speed digital signal processing application; PNTT-based implementation; pseudo number theoretic transformation-based implementation; matrix multiplier design; substantial reduction; size 90 nm; matrix element transformation; two-dimensional array; dynamic switching power consumption

Subjects: Digital signal processing chips; Algebra; Digital signal processing chips; CMOS integrated circuits; Algebra

References

    1. 1)
      • 4. Jovanovic, Z., Milutinovic, V.: ‘FPGA accelerator for floating-point matrix multiplication’, IET Comput. Digit. Tech., 2012, 6, (4), pp. 249256 (doi: 10.1049/iet-cdt.2011.0132).
    2. 2)
      • 8. Moldovan, D.I., Fortes, J.A.B.: ‘Partitioning and mapping algorithms into fixed size systolic arrays’, IEEE Trans. Comput., 1986, 35, (1), pp. 112 (doi: 10.1109/TC.1986.1676652).
    3. 3)
      • 3. Li, K., Pan, Y., Zheng, S.Q.: ‘Fast and processor efficient parallel matrix multiplication algorithms on a linear array with a reconfigurable pipelined bus system’, IEEE Trans. Parallel Distrib. Syst., 1998, 9, (8), pp. 705720 (doi: 10.1109/71.706044).
    4. 4)
      • 25. Saha, P., Banerjee, A., Dandapat, A., Bhattacharyya, P.: ‘ASIC implementation of high speed processor for calculating discrete fourier transformation using circular convolution technique’, WSEAS Trans. Circuits and Syst., 2011, 10, (8), pp. 278288.
    5. 5)
      • 23. Bravo, I., Jimenez, P., Mazo, M., Lazaro, J.L., De las Heras, J.J., Gardel, A.: ‘Different proposal to matrix multiplication based on FPGAs’. Proc. IEEE Int. Symp. Industrial Electronics (ISIE), June 2007, pp. 17091714.
    6. 6)
      • 16. Snopce, H., Elmazi, L.: ‘The importance of using the linear transformation matrix in determining the number of processing elements in 2-dimensional systolic array for the algorithm of matrix-matrix multiplication’. Proc. IEEE Int. Conf. Information Technology Interfaces, June 2008, pp. 885892.
    7. 7)
      • 7. Ramakrishnan, I.V., Varman, P.J.: ‘Modular matrix multiplication on a linear array’, IEEE Trans. Comput., 1984, C-33, (11), pp. 952958 (doi: 10.1109/TC.1984.1676369).
    8. 8)
      • 1. Hong, S., Park, K., Mun, J.: ‘Design and implementation of a high-speed matrix multiplier based on word-width decomposition’, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2006, 4, (4), pp. 380392 (doi: 10.1109/TVLSI.2006.874302).
    9. 9)
      • 5. Idris, M., Noor, N.M., Tamil, E.M., Razak, Z., Arof, H.: ‘Parallel matrix multiplication design for monocular SLAM’. Proc. IEEE Fourth Asia Int. Conf. Mathematical/Analytical Modeling and Computer Simulation (AMS), May 2010, pp. 492497.
    10. 10)
      • 26. Harris, D., Sutherland, I.: ‘Logical effort of carry propagate adders’. Proc. IEEE Thirty-Seventh Asilomar Conf. Signals, Systems and Computing, November 2003, no. 1, pp. 873878.
    11. 11)
      • 18. Bhuvaneswari, T., Prasad, V.C., Singh, A.K.: ‘Multiple BDD based matrix multiplication’. Proc. IEEE Int. Conf. Semiconductor Electronics, June 2010, pp. 114118.
    12. 12)
      • 17. Jinsoo Oh, J., Kim, J., Moon, B.-R.: ‘On the inequivalence of bilinear algorithms for 3 × 3 matrix multiplication’, Inf. Process. Lett., 2013, 113, (17), pp. 640645 (doi: 10.1016/j.ipl.2013.05.011).
    13. 13)
      • 14. Cohn, H., Umans, C.: ‘A group-theoretic approach to fast matrix multiplication’. Proc. IEEE Int. Symp. Foundation of Computer Science, October 2003, pp. 438449.
    14. 14)
      • 20. Winograd, S.: ‘A new algorithm for inner product’, IEEE Trans. Comput., 1968, C-17, (7), pp. 693694 (doi: 10.1109/TC.1968.227420).
    15. 15)
      • 12. Milovanovic, E.I., Randjelovic, B.M., Milovanovic, I.Z., Stojcev, M.K.: ‘Matrix multiplication on linear bidirectional systolic arrays’, Appl. Math. Inform. Mech., 2010, 2, (1), pp. 1120.
    16. 16)
      • 19. Kung, S.Y.: ‘VLSI array processors’ (Prentice-Hall, 1988).
    17. 17)
      • 2. Gao, S., Al-Khalili, D., Chabini, N.: ‘Two level decomposition based matrix multiplication for FPGAs’. Proc. IEEE Int. Conf. Electronics, Circuits & Systems, December 2009, pp. 427430.
    18. 18)
      • 15. Cohn, H., Kleinberg, R., Szegedy, B., Umans, C.: ‘Group-theoretic algorithms for matrix multiplication’. Proc. IEEE Int. Symp. Foundation of Computer Science, October 2005, pp. 379388.
    19. 19)
      • 22. Drevet, C.E., Md. Islam, N., Schost, E.: ‘Optimization techniques for small matrix multiplication’, Theor. Comput. Sci., 2011, 412, (22), pp. 22192236 (doi: 10.1016/j.tcs.2010.12.012).
    20. 20)
      • 6. Varman, P.J., Ramakrishnan, I.V.: ‘Synthesis of an optimal family of matrix multiplication algorithms on linear arrays’, IEEE Trans. Comput., 1986, C-35, (11), pp. 989996 (doi: 10.1109/TC.1986.1676700).
    21. 21)
      • 21. Strassen, V.: ‘Gaussian elimination is not optimal’, Numer. Math., 1969, 13, pp. 354356 (doi: 10.1007/BF02165411).
    22. 22)
      • 24. Joo, A., Ekart, A., Neirotti, J.P.: ‘Genetic algorithms for discovery of matrix multiplication methods’, IEEE Trans. Evol. Comput., 2012, 16, (5), pp. 749751 (doi: 10.1109/TEVC.2011.2159270).
    23. 23)
      • 10. Chan, S.W., Wey, C.L.: ‘The design of concurrent error diagnosable systolic arrays for band matrix multiplications’, IEEE Trans. Comput. Aided Des., 1988, 7, (1), pp. 2137 (doi: 10.1109/43.3127).
    24. 24)
      • 11. Sonawane, D.N., Sutaone, M.S., Malek, I.: ‘Systolic architecture for integer point matrix multiplication using FPGA’. Proc. IEEE Int. Conf. Industrial Electronics and Application, May 2009, pp. 38223825.
    25. 25)
      • 9. Belkacemi, S., Benkrid, K., Crookes, D., Benkrid, A.: ‘Design and implementation of a high performance matrix multiplier core for Xilinx Virtex FPGAs’. Proc. IEEE Int. Workshop on Computing Architecture for Machine Perception’, May 2003, pp. 156159.
    26. 26)
      • 13. Yagle, A.E.: ‘Fast algorithms for matrix multiplication using pseudo-number-theoretic transforms’, IEEE Trans. Signal Process., 1995, 43, (1), pp. 7176 (doi: 10.1109/78.365287).
    27. 27)
      • 27. Asati, A., Shekhar, C.: ‘A high-speed, hierarchical 16 × 16 array of array multiplier design’. Proc. Multimedia, Signal Processing, and Communication Technologies, March 2009, pp. 161164.
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cds.2013.0117
Loading

Related content

content/journals/10.1049/iet-cds.2013.0117
pub_keyword,iet_inspecKeyword,pub_concept
6
6
Loading