
Automatic management of Software Programmable Memories in Many-core Architectures

Software Programmable Memories, or SPMs, are raw on-chip memories that are managed explicitly by software rather than implicitly by the processor hardware. For example, while caches fetch data from memory automatically and maintain coherence with other caches, data movement between an SPM and main memory or other SPMs must be performed explicitly through software instructions. SPMs make the design of on-chip memories simpler, more scalable, and more power efficient, but they also place an additional burden on programmers of SPM-based processors. Traditionally, SPMs have been utilised in embedded systems, especially multimedia and gaming systems, but recently SPM-based systems have attracted growing research interest as a means to solve the memory scaling challenges of many-core architectures. This study presents an overview of the state-of-the-art in SPM management techniques in many-core processors, summarises some recent research on SPM-based systems, and outlines future research directions in this field.
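
To make the contrast with caches concrete, the following is a minimal simulation sketch, not code from the study. Main memory and the SPM are modelled as plain Python lists, and the names `SPM_WORDS`, `dma_get`, and `sum_with_spm` are hypothetical, chosen only for illustration. It shows the pattern the abstract describes: software decides which data is moved into the SPM and when, and computation touches only data that is already resident there.

```python
# Illustrative simulation only: main memory and the SPM are plain Python lists,
# and dma_get stands in for an explicit, software-issued transfer.

SPM_WORDS = 64  # assumed SPM capacity in words (hypothetical value)


def dma_get(spm, main_memory, start, count):
    """Explicitly copy `count` words from main memory into the SPM."""
    if count > len(spm):
        raise ValueError("transfer does not fit in the SPM")
    spm[:count] = main_memory[start:start + count]


def sum_with_spm(main_memory):
    """Sum an array by staging it through the SPM one tile at a time.

    Returns the sum and the number of explicit transfers that were issued.
    """
    spm = [0] * SPM_WORDS
    total = 0
    transfers = 0
    for start in range(0, len(main_memory), SPM_WORDS):
        count = min(SPM_WORDS, len(main_memory) - start)
        # Software, not hardware, decides what is moved and when.
        dma_get(spm, main_memory, start, count)
        transfers += 1
        # Compute only on data that is resident in the SPM.
        total += sum(spm[:count])
    return total, transfers
```

With a cache, the loop would simply read `main_memory` and the hardware would fetch lines on demand. Here every transfer is visible in the program, which is the programming burden that automatic SPM management techniques aim to remove.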


