
SkipCache: application aware cache management for chip multi-processors

IET Computers & Digital Techniques


With multiple cores on a single chip, systems commonly have multi-level caches. Multiple cache levels reduce the pressure on memory bandwidth by letting applications keep their frequently accessed data in the caches. The cache levels nearer to the core filter the locality in an application's accesses, which can result in high miss rates at the farther levels. This study revolves around one question: are all levels of cache needed by all applications during all phases of their execution? It observes the effect of 2-level and 3-level cache hierarchies on the performance of different applications. On the basis of these observations, it proposes an application-aware cache management policy called ‘SkipCache’, which allows an application to choose a 2-level or 3-level cache hierarchy at run-time. SkipCache dynamically tracks applications at the shared last-level cache (LLC) to identify those that gain no advantage from using the LLC. Such applications can skip the LLC completely so that other co-scheduled, cache-friendly applications can use it efficiently. Evaluation of SkipCache on a 4-core chip multi-processor with multi-programmed workloads shows significant performance improvement. SkipCache is orthogonal to other cache management techniques and can be combined with them to improve system performance.
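The abstract does not say how SkipCache tracks applications at the LLC or how it decides that one should skip it. As a rough illustration of the general idea only, the sketch below counts per-application LLC hits over a fixed epoch and marks an application as skipping when its hit rate falls below a threshold. The names `SkipTracker`, `AppStats`, `epoch_length` and `min_hit_rate`, and the hit-rate criterion itself, are assumptions made for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class AppStats:
    accesses: int = 0       # LLC accesses seen in the current epoch
    hits: int = 0           # LLC hits seen in the current epoch
    skip_llc: bool = False  # decision taken at the end of the last epoch


@dataclass
class SkipTracker:
    """Hypothetical per-application tracker at a shared LLC (not the paper's design)."""
    epoch_length: int = 1000    # assumed: accesses per decision epoch
    min_hit_rate: float = 0.05  # assumed: below this, the LLC is treated as useless
    apps: dict = field(default_factory=dict)

    def record(self, app_id: int, hit: bool) -> None:
        """Record one observed LLC access and re-evaluate at the epoch boundary."""
        stats = self.apps.setdefault(app_id, AppStats())
        stats.accesses += 1
        stats.hits += int(hit)
        if stats.accesses >= self.epoch_length:
            stats.skip_llc = (stats.hits / stats.accesses) < self.min_hit_rate
            stats.accesses = 0
            stats.hits = 0

    def should_skip(self, app_id: int) -> bool:
        """True if the application is currently marked to bypass the LLC."""
        stats = self.apps.get(app_id)
        return stats is not None and stats.skip_llc


def demo():
    tracker = SkipTracker(epoch_length=100, min_hit_rate=0.05)
    for i in range(100):
        tracker.record(0, hit=False)         # streaming-like app: never hits
        tracker.record(1, hit=(i % 2 == 0))  # cache-friendly app: half its accesses hit
    return tracker.should_skip(0), tracker.should_skip(1)
```

In this sketch an application that is skipping the LLC can only return to it if `record` keeps being called for it, for example from a small sample of its accesses; whether and how SkipCache does this is not stated in the abstract.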

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2014.0150