Optimisation of HEVC motion estimation exploiting SAD and SSD GPU-based implementation

The High-Efficiency Video Coding (HEVC) standard doubles the compression ratio of its predecessor, H.264/AVC, at the same video quality. However, this gain comes at the cost of higher encoder computational complexity, which makes HEVC complexity a crucial subject. The most time-consuming and computationally intensive part of HEVC is motion estimation, which is based principally on the sum of absolute differences (SAD) or the sum of square differences (SSD) algorithms. For these reasons, the authors propose an implementation of these algorithms on a low-cost NVIDIA graphics processing unit (GPU) with the Fermi architecture, developed in the Compute Unified Device Architecture (CUDA) language. The proposed algorithm is based on a parallel-difference and a parallel-reduction process. The experimental results show a significant speed-up in execution time for most 64 × 64 pixel blocks: compared to the CPU, the proposed parallel algorithm reduces execution time by up to 56.17% for SAD and 30.4% for SSD. This improvement indicates that parallelising the algorithm with the newly proposed reduction process for the Fermi GPU generation leads to better results. These findings are based on a statistical study that determines the percentage utilisation of each prediction unit (PU) size in HEVC. This study shows that the larger PUs are the most utilised in temporal levels 3 and 4, reaching 84.56% for class E. The improvement is accompanied by an average peak signal-to-noise ratio loss of 0.095 dB and a 0.64% decrease in bit rate.
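The two stages named in the abstract, a parallel difference followed by a parallel reduction, can be sketched in a few lines. The sketch below is illustrative only and is not the authors' CUDA kernel: it runs sequentially in plain Python, and the function names (`tree_reduce`, `sad`, `ssd`) and the sample blocks are assumptions introduced here. On a GPU, each per-pixel difference would typically be computed by its own thread, and each pass of the reduction loop would be one synchronised step across threads.

```python
def tree_reduce(values):
    """Sum a sequence by repeatedly folding the upper half onto the lower half.

    Each pass of the while loop corresponds to one step of a parallel
    reduction: the additions inside a pass are independent of each other,
    so a GPU could perform them simultaneously.
    """
    buf = list(values)
    n = len(buf)
    while n > 1:
        half = (n + 1) // 2          # size of the lower half (handles odd n)
        for i in range(n - half):    # independent additions within one pass
            buf[i] += buf[i + half]
        n = half
    return buf[0] if buf else 0


def sad(cur, ref):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    # Difference stage: one independent operation per pixel.
    diffs = [abs(c - r)
             for c_row, r_row in zip(cur, ref)
             for c, r in zip(c_row, r_row)]
    # Reduction stage: combine the per-pixel results into one cost.
    return tree_reduce(diffs)


def ssd(cur, ref):
    """Sum of square differences between two equally sized 2-D blocks."""
    diffs = [(c - r) ** 2
             for c_row, r_row in zip(cur, ref)
             for c, r in zip(c_row, r_row)]
    return tree_reduce(diffs)


# Hypothetical 2 x 2 sample blocks, chosen only to keep the arithmetic visible.
current_block = [[1, 2], [3, 4]]
reference_block = [[2, 2], [1, 7]]

print(sad(current_block, reference_block))  # 1 + 0 + 2 + 3 = 6
print(ssd(current_block, reference_block))  # 1 + 0 + 4 + 9 = 14
```

For a 64 × 64 block the difference stage produces 4096 values, and the reduction needs 12 passes instead of 4095 sequential additions, which is where a parallel implementation can gain time.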

Inspec keywords: graphics processing units; motion estimation; video coding

Other keywords: sum of square differences; SAD; SSD GPU-based implementation; graphics processing unit; NVIDIA GPU; HEVC complexity; compute unified device architecture language; HEVC standard; Fermi architecture; signal-to-noise ratio loss; HEVC motion estimation

Subjects: Microprocessor chips; Microprocessors and microcomputers; Image and video coding; Video signal processing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-ipr.2017.0474