Register array-based sum of absolute difference processor with parallel memory system for fast motion estimation

Video coding based on fast-search block matching algorithms (BMAs) provides reasonably good video quality at a small computational cost. In a fast BMA, reading pixel data takes considerably more clock cycles than the matching operation itself, because the candidate macroblocks (CMBs) lie at irregular locations. To reduce the number of clock cycles, this study uses a parallel memory system, which accelerates the reading of CMBs and speeds up motion vector (MV) computation. A novel register array is introduced to organise the CMBs, which expedites the computation-hungry search process. Owing to the shape of the register array, less space is needed to store the CMBs, and the architecture supports a wide range of search patterns. The proposed sum of absolute difference processor with parallel memory system computes the MV of one macroblock in 28 clock cycles in the average case. Compared with a single memory system, it saves 68% and 80% of the clock cycles in CMB access during the initial search and the intermediate search process, respectively. The hardware architecture is tested on a Xilinx Virtex5 field programmable gate array. The proposed fixed 8×8 macroblock size architecture processes 354 high definition (HD) (1080p) frames per second (fps), and the configurable architecture processes 201 HD fps, which is more than adequate for real-time encoding.
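For readers unfamiliar with the matching operation mentioned above, the following is a minimal software sketch of sum of absolute difference (SAD) matching over a handful of candidate macroblocks. It is not the proposed hardware architecture and models neither the register array nor the parallel memory system. The function names (`sad`, `extract_block`, `best_motion_vector`) and the list of search offsets are assumptions chosen for illustration only.

```python
def sad(current_block, candidate_block):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(cur - cand)
        for cur_row, cand_row in zip(current_block, candidate_block)
        for cur, cand in zip(cur_row, cand_row)
    )


def extract_block(frame, top, left, size):
    """Copy a size x size block whose top-left pixel is at (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_motion_vector(current_block, reference_frame, top, left, offsets):
    """Return (sad_cost, (dy, dx)) for the best candidate among `offsets`.

    `offsets` is a hypothetical search pattern: a list of (dy, dx)
    displacements relative to the block position (top, left).
    Candidates that fall outside the reference frame are skipped.
    Returns None if no candidate lies inside the frame.
    """
    size = len(current_block)
    height, width = len(reference_frame), len(reference_frame[0])
    best = None
    for dy, dx in offsets:
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + size > height or x + size > width:
            continue
        candidate = extract_block(reference_frame, y, x, size)
        cost = sad(current_block, candidate)
        if best is None or cost < best[0]:
            best = (cost, (dy, dx))
    return best
```

Each candidate in this sketch requires a fresh block read from the reference frame at an arbitrary position. That is the access pattern the abstract identifies as the dominant cost in a fast BMA.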

Inspec keywords: motion estimation; video coding; shift registers; parallel memories; image matching; field programmable gate arrays

Other keywords: single memory system; register array-based sum; BMA; MV; intermediate search process; absolute difference processor; minute computation cost; computation hungry search process; block matching algorithm-based video coding; matching operation; memory size 1 MByte; hardware-efficient double diamond search architecture; pixel data; motion vector computation process; CMB; Xilinx Virtex5 field programmable gate array family; motion estimation; candidate macroblocks; parallel memory system; initial search; clock cycles; erratic location

Subjects: Image and video coding; Image recognition; Digital storage; Memory circuits; Logic circuits; Video signal processing; Computer vision and image processing techniques; Logic and switching circuits

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2016.0178