Amdahl's law in the context of heterogeneous many-core systems – a survey

IET Computers & Digital Techniques

For over 50 years, Amdahl's law has been the hallmark model for reasoning about performance bounds on homogeneous parallel computing resources. As heterogeneous many-core parallel resources continue to permeate the modern server and embedded domains, there has been growing interest in proposing realistic extensions and assumptions that keep pace with newer use cases. This study provides a comprehensive review of the scope of, and insights from, the extensive body of work related to Amdahl's law to date, focusing on computation speedup. The authors show that a significant portion of these studies analyses the scalability of the model while considering both workload and system heterogeneity in real-world applications. The focus has been to improve the definition and semantic power of the two key parameters of the original model: the parallel fraction (f) and the computation capability improvement index (n). More recently, researchers have proposed normal-form and multi-fraction extensions that can account for a wider range of heterogeneity, validated on many-core systems running realistic workloads. Speedup models from Amdahl's law onwards have found a wide range of uses, such as optimising system execution, and these uses become even more important in the heterogeneous many-core era.
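In the textbook form of Amdahl's law, the two parameters named above combine as S(f, n) = 1 / ((1 - f) + f/n): the serial share (1 - f) runs at baseline speed while the parallel share f is accelerated by a factor n. The Python sketch below evaluates that form, together with a generic multi-fraction generalisation in which each share f_i of the baseline execution time runs with its own speedup s_i, giving S = 1 / sum(f_i / s_i). The function names and this multi-fraction form are illustrative assumptions chosen for exposition; they are not taken from the normal-form or multi-fraction models reviewed in this survey.

```python
from math import isclose
from typing import Sequence


def amdahl_speedup(f: float, n: float) -> float:
    """Textbook Amdahl speedup.

    f: parallel fraction of the baseline execution time, in [0, 1].
    n: computation capability improvement applied to the parallel part
       (for example, n identical cores).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("parallel fraction f must lie in [0, 1]")
    if n <= 0:
        raise ValueError("capability index n must be positive")
    # Serial part is unchanged; parallel part is shortened by a factor n.
    return 1.0 / ((1.0 - f) + f / n)


def multi_fraction_speedup(fractions: Sequence[float],
                           speedups: Sequence[float]) -> float:
    """Illustrative multi-fraction generalisation (an assumption, not the
    survey's model): fractions[i] of the baseline execution time runs with
    speedup speedups[i]. The overall speedup is the inverse of the summed
    scaled times.
    """
    if len(fractions) != len(speedups):
        raise ValueError("fractions and speedups must have equal length")
    if any(x < 0 for x in fractions) or not isclose(sum(fractions), 1.0):
        raise ValueError("fractions must be non-negative and sum to 1")
    if any(s <= 0 for s in speedups):
        raise ValueError("every speedup must be positive")
    return 1.0 / sum(x / s for x, s in zip(fractions, speedups))


if __name__ == "__main__":
    # Classical case: 95% parallel work on 8 equally capable cores.
    print(f"{amdahl_speedup(0.95, 8):.3f}")  # prints 5.926
    # Hypothetical heterogeneous split: 10% serial, 60% on a 4x resource,
    # 30% on a 16x accelerator.
    print(f"{multi_fraction_speedup([0.1, 0.6, 0.3], [1, 4, 16]):.3f}")  # prints 3.721
```

Setting fractions = [1 - f, f] and speedups = [1, n] recovers the classical form, and letting n grow without bound shows the familiar ceiling of 1/(1 - f), which is 20 for f = 0.95.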

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2018.5220