Exploiting memory allocations in clusterised many-core architectures

Power efficiency has become the most important requirement for future embedded systems. Modern designs, such as those shipped in mobile devices, show that clusterisation is the way to improve energy efficiency. However, such architectures are still limited by the memory subsystem, in particular by memory latency. This work investigates an alternative approach that exploits on-chip data locality to a large extent through distributed shared memory systems, which permit efficient reuse of on-chip mapped data in clusterised many-core architectures. First, this work reviews the current literature on memory allocations and explores the limitations of cluster-based many-core architectures. Then, several memory allocations are introduced and benchmarked in terms of scalability, performance and energy against the conventional centralised shared memory solution, in order to reveal which memory allocation is the most appropriate for future mobile architectures. The results show that distributed shared memory allocations bring performance gains and opportunities to reduce energy consumption.

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2018.5136