Empirical investigation: performance and power-consumption based dual-level model for exascale computing systems

Exascale computing systems (ECS) are anticipated to perform at Exaflop speed (10^18 operations per second) within a power budget of under 20 MW. This ultrascale performance demands a thousand-fold speedup over current Petascale systems. For future high-performance computing (HPC), power consumption is one of the principal obstacles to achieving Exaflops, since the traditional route of increasing clock speed is no longer viable. The standard way to attain such significant performance is massive parallelism. At this early stage, it is hard to decide which parallel programming approach can deliver the massive parallelism needed to attain Exaflops. This article begins with a short description and implementation of algorithms for various hybrid parallel programming models (PPMs) on homogeneous and heterogeneous cluster systems. The authors then evaluated the performance and power consumption of these hybrid models by implementing two HPC benchmarking applications: square matrix multiplication and a Jacobi iterative solver for the two-dimensional Laplace equation. The results demonstrated that the heterogeneous hybrid (MPI + X) outperformed the homogeneous parallel programming (MPI + OpenMP) model. This empirical investigation of hybrid PPMs is a first step towards helping research and development communities select a promising model for emerging ECS.
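
To make the dual-level idea concrete, the following C sketch illustrates the homogeneous (MPI + OpenMP) pattern applied to the square matrix multiplication benchmark: MPI distributes row blocks of A across nodes (level 1), while OpenMP threads share the local block within each node (level 2). This is a minimal illustration under stated assumptions, not the authors' implementation; the matrix order N, the row-block distribution, the initial values, and the assumption that N divides evenly by the process count are all illustrative.

/* Minimal sketch of the dual-level (MPI + OpenMP) pattern for square
 * matrix multiplication C = A x B.  Sizes, distribution and values
 * are illustrative assumptions, not the article's code. */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N 1024  /* assumed square matrix order, divisible by the process count */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                       /* row block per MPI process */
    double *A = NULL, *C = NULL;
    double *B      = malloc((size_t)N * N * sizeof(double));
    double *Ablock = malloc((size_t)rows * N * sizeof(double));
    double *Cblock = malloc((size_t)rows * N * sizeof(double));

    if (rank == 0) {                           /* root owns the full matrices */
        A = malloc((size_t)N * N * sizeof(double));
        C = malloc((size_t)N * N * sizeof(double));
        for (long i = 0; i < (long)N * N; i++) { A[i] = 1.0; B[i] = 2.0; }
    }

    /* Level 1 (MPI): distribute row blocks of A, replicate B across nodes. */
    MPI_Scatter(A, rows * N, MPI_DOUBLE, Ablock, rows * N, MPI_DOUBLE,
                0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Level 2 (OpenMP): threads share the local block within a node. */
    #pragma omp parallel for
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += Ablock[i * N + k] * B[k * N + j];
            Cblock[i * N + j] = sum;
        }

    /* Collect the partial results back on the root process. */
    MPI_Gather(Cblock, rows * N, MPI_DOUBLE, C, rows * N, MPI_DOUBLE,
               0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}

In the heterogeneous (MPI + X) variants evaluated in the article, the OpenMP loop at level 2 would be replaced by an intra-node accelerator offload, for example a CUDA kernel launch on each node's GPU; the same two-level structure applies to the Jacobi solver benchmark.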

Inspec keywords: multiprocessing systems; power aware computing; matrix multiplication; application program interfaces; parallel programming; iterative methods; message passing

Other keywords: hybrid models; heterogeneous cluster systems; power-consumption; high-performance computing; dual-level model; promising model; thousand-fold enhancement; exascale computing systems; promising parallel programming approach; ExaFlops; power 20.0 MW; ultrascale performance; Exaflop speed; power consumption; massive parallelism; homogeneous parallel programming model; HPC; hybrid parallel programming models; homogeneous cluster systems

Subjects: Other circuits for digital computers; Multiprocessing systems; Interpolation and function approximation (numerical analysis); Parallel programming
