© The Institution of Engineering and Technology
With the advent of multiple cores on a single chip, it is common for systems to have multi-level cache hierarchies. Multiple cache levels reduce the pressure on memory bandwidth by letting applications keep their frequently accessed data on chip. However, the cache levels nearest the core filter out much of the locality in an application's access stream, which can result in high miss rates at the farther levels. This study revolves around one question: do all applications need all levels of cache during all phases of their execution? It examines the effect of 2-level and 3-level cache hierarchies on the performance of different applications. Based on these observations, it proposes an application-aware cache management policy called ‘SkipCache’, which allows an application to choose between a 2-level and a 3-level cache hierarchy at run time. SkipCache dynamically monitors applications at the shared last-level cache (LLC) to identify those that gain no benefit from it. Such applications can skip the LLC entirely, so that co-scheduled, cache-friendly applications can use it more effectively. An evaluation of SkipCache on a 4-core chip multiprocessor with multi-programmed workloads shows significant performance improvement. SkipCache is orthogonal to other cache management techniques and can be combined with other optimisation techniques to further improve system performance.
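The abstract does not state SkipCache's exact detection rule, so the following is only a minimal sketch of how a per-application "skip the LLC or not" decision could be made at run time. It assumes a simple latency criterion: if an application's LLC hit rate is h, an L2 miss costs t_LLC + (1 - h) * t_mem with the LLC and t_mem without it, so skipping does not hurt that application when t_LLC >= h * t_mem. It also assumes set sampling (borrowed from set-dueling insertion policies) so that hit rates remain measurable after an application starts bypassing. The names `CoreMonitor`, `should_skip_llc` and `SkipController`, the latencies, the epoch structure and the sampling ratio are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles (assumptions, not values from the paper).
LLC_LATENCY = 30
MEM_LATENCY = 200


@dataclass
class CoreMonitor:
    """Per-core LLC counters, updated only on the sampled LLC sets."""
    accesses: int = 0
    hits: int = 0

    def record(self, hit: bool) -> None:
        self.accesses += 1
        if hit:
            self.hits += 1

    def hit_rate(self) -> float:
        return self.hits / self.accesses if self.accesses else 0.0

    def reset(self) -> None:
        self.accesses = 0
        self.hits = 0


def should_skip_llc(hit_rate: float,
                    llc_latency: int = LLC_LATENCY,
                    mem_latency: int = MEM_LATENCY) -> bool:
    """Return True when an LLC lookup costs at least as much as it saves.

    With the LLC:    llc_latency + (1 - h) * mem_latency per L2 miss.
    Without the LLC: mem_latency per L2 miss.
    Skipping is no worse when llc_latency >= h * mem_latency.
    """
    return llc_latency >= hit_rate * mem_latency


class SkipController:
    """Epoch-based per-core choice between a 3-level and a 2-level hierarchy."""

    def __init__(self, num_cores: int, sample_every: int = 32) -> None:
        self.monitors = [CoreMonitor() for _ in range(num_cores)]
        self.skip = [False] * num_cores  # False: use the LLC (3 levels)
        self.sample_every = sample_every

    def is_sampled_set(self, set_index: int) -> bool:
        # Sampled sets always serve every core, so hit rates stay observable.
        return set_index % self.sample_every == 0

    def uses_llc(self, core: int, set_index: int) -> bool:
        return self.is_sampled_set(set_index) or not self.skip[core]

    def observe(self, core: int, set_index: int, hit: bool) -> None:
        # Only accesses to sampled sets feed the per-core monitor.
        if self.is_sampled_set(set_index):
            self.monitors[core].record(hit)

    def end_epoch(self) -> None:
        for core, mon in enumerate(self.monitors):
            if mon.accesses:  # keep the previous decision if nothing was seen
                self.skip[core] = should_skip_llc(mon.hit_rate())
            mon.reset()


if __name__ == "__main__":
    ctrl = SkipController(num_cores=2)
    for i in range(100):
        ctrl.observe(0, 0, hit=(i % 2 == 0))    # core 0: 50% hit rate
        ctrl.observe(1, 32, hit=(i % 20 == 0))  # core 1: 5% hit rate
    ctrl.end_epoch()
    print(ctrl.skip)  # [False, True]
```

This criterion only weighs the skipping application's own latency. The benefit the abstract emphasises, freeing LLC capacity for co-scheduled cache-friendly applications, is not modelled here, nor are the inclusion and coherence implications of bypassing an inclusive LLC.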
http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2014.0150