© The Institution of Engineering and Technology
Software programmable memories (SPMs) are raw on-chip memories that are managed explicitly by software rather than implicitly by the processor hardware. For example, whereas caches fetch data from memory automatically and maintain coherence with other caches, SPM-based systems move data between main memory and SPMs, and between one SPM and another, only through explicit software instructions. SPMs make the design of on-chip memories simpler, more scalable, and more power-efficient, but they also place an additional burden on the programmers of SPM-based processors. Traditionally, SPMs have been utilised in embedded systems, especially multimedia and gaming systems; more recently, research on SPM-based systems has attracted increased interest as a means of addressing the memory scaling challenges of many-core architectures. This study presents an overview of the state of the art in SPM management techniques for many-core processors, summarises some recent research on SPM-based systems, and outlines future research directions in this field.
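The contrast with caches can be made concrete with a small sketch. The Python below is only a toy model, not code from the study or for any real SPM-based processor: the function name `scale_with_spm`, the `spm_words` parameter, and the use of a plain list to stand in for the scratchpad are all assumptions made for illustration. It shows the pattern the abstract describes, in which software itself copies each tile of data into the SPM, computes on it there, and copies it back.

```python
def scale_with_spm(main_memory, factor, spm_words=4):
    """Scale every element of main_memory in place, one SPM-sized tile at a time.

    Returns the number of explicit copies that software had to issue.
    Hypothetical model: a Python list plays the role of the scratchpad.
    """
    spm = [0] * spm_words   # small software-managed on-chip buffer (modelled)
    transfers = 0           # explicit copies issued by software, not by hardware

    for base in range(0, len(main_memory), spm_words):
        n = min(spm_words, len(main_memory) - base)

        # Explicit "get": software copies the tile from main memory into the SPM.
        spm[:n] = main_memory[base:base + n]
        transfers += 1

        # Compute only on data that is resident in the SPM.
        for i in range(n):
            spm[i] *= factor

        # Explicit "put": software writes the updated tile back to main memory.
        main_memory[base:base + n] = spm[:n]
        transfers += 1

    return transfers
```

With a cache, neither copy would appear in the program; here both are the programmer's (or compiler's) responsibility, which is the additional burden the abstract refers to.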
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2016.0024