Impact of spintronic memory on multicore cache hierarchy design

Spintronic memory [spin-transfer torque-magnetic random access memory (STT-MRAM)] is an attractive alternative to CMOS memory because it offers higher density and virtually no leakage current. However, spintronic memory still requires higher write energy, which presents a challenge to memory hierarchy design when energy consumption is a concern. This study motivates the use of STT-MRAM for the first-level caches of a multicore processor to reduce energy consumption without significantly degrading performance. Implementing a large first-level cache in STT-MRAM saves leakage power, while a small level-0 cache recovers the performance lost to the long write latencies of STT-MRAM. Together, these two measures reduce the energy-delay product by 65% on average compared with a CMOS baseline. The proposed STT hierarchy also scales better than the CMOS design, with a few benchmarks scaling significantly better. The PARSEC and Splash2 benchmark suites are analysed on a modern multicore platform, comparing the performance, energy consumption and scalability of the spintronic cache system with those of a CMOS design.
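The headline result is expressed as an energy-delay product (EDP), the product of the energy consumed and the execution time, so a design only wins if its energy savings are not cancelled out by a slowdown. The Python sketch below shows how a relative EDP reduction is computed from per-run energy and delay. The `RunResult` class, the `edp_reduction` function and the numbers are hypothetical illustrations, not the authors' tooling or measured data; they simply show that a design using half the energy and 70% of the run time reaches a 65% EDP reduction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Energy (joules) and execution time (seconds) of one benchmark run.

    Hypothetical container used only for illustration.
    """
    energy_j: float
    delay_s: float

    @property
    def edp(self) -> float:
        # Energy-delay product: a design that saves energy by running
        # much slower gains little or nothing under this metric.
        return self.energy_j * self.delay_s


def edp_reduction(baseline: RunResult, candidate: RunResult) -> float:
    """Fractional EDP reduction of `candidate` relative to `baseline`.

    0.65 means the candidate's EDP is 65% lower than the baseline's.
    """
    return 1.0 - candidate.edp / baseline.edp


# Hypothetical, normalized numbers (NOT results from the paper): the
# candidate hierarchy uses half the baseline energy and finishes in 70%
# of the baseline time.
cmos = RunResult(energy_j=1.0, delay_s=1.0)
stt_with_l0 = RunResult(energy_j=0.5, delay_s=0.7)

print(f"EDP reduction: {edp_reduction(cmos, stt_with_l0):.0%}")  # EDP reduction: 65%
```

Because EDP multiplies the two terms, the same 65% could also come from other combinations, for example lower energy with a modest slowdown; the abstract does not state how the per-benchmark reductions were averaged.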

Inspec keywords: MRAM devices; multiprocessing systems; cache storage

Other keywords: spin-transfer torque-magnetic random access memory; STT-MRAM; PARSEC; multicore cache hierarchy design; Splash2; energy consumption; spintronic memory; multicore processor

Subjects: Semiconductor storage; Magneto-acoustic, magnetoresistive, magnetostrictive and magnetostatic wave devices; Storage on stationary magnetic media; Multiprocessing systems; Memory circuits

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2015.0190