
Tree-based scheme for reducing shared cache miss rate leveraging regional, statistical and temporal similarities

Cache misses can have a major impact on the overall performance of many-core systems. A miss may incur extra traffic and delay because of coherence messages. This overhead has been reduced in coarse-grain coherence protocols, in which only shared misses require a coherence message. Conventional off-chip methods manage the shared miss rate by relying on reuse histories. However, the memory overhead that comes with reuse histories makes them impractical for on-chip multiprocessor systems. In this study, a new scheme is proposed to reduce the shared cache miss rate in multiprocessor systems-on-chip; it uses novel techniques for prefetching into L2 caches from off-chip memory or from other, remote L2 caches located on-chip. The proposed scheme extends the previously proposed Virtual Tree Coherence (VTC) method to limit block-forwarding messages to the true sharers within each region. Instead of relying on exact reuse histories, shared regions are searched for regional, temporal and statistical similarities, and these similarities are used to determine which sharers should receive the forwarded blocks. The proposed method has been evaluated with SPLASH-2 workloads. Simulation results indicate that, compared with VTC, the proposed method has reduced the shared miss count by up to 75% and interconnect traffic by up to 47%.
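To make the idea of choosing forwarding targets from region-level similarities more concrete, the Python sketch below shows one possible way such a predictor could work. It is a hypothetical illustration, not the authors' design: the region size (`REGION_SIZE`), the count threshold `min_count`, the recency `window`, and every class and function name are assumptions. It combines the three cues named in the abstract. Blocks in the same region are assumed to have the same consumers (regional similarity). A core must have accessed the region at least `min_count` times (statistical similarity). Its most recent access must fall within `window` accesses of the current time (temporal similarity). A region touched by more than one core is treated as shared, mirroring coarse-grain tracking, where only misses to shared regions need a coherence message.

```python
from dataclasses import dataclass, field

REGION_SIZE = 1024  # bytes per region; hypothetical value for illustration


def region_of(addr: int) -> int:
    """Map a byte address to its region index."""
    return addr // REGION_SIZE


@dataclass
class RegionEntry:
    # Per-core access counts in this region (statistical cue).
    counts: dict = field(default_factory=dict)
    # Per-core time of most recent access in this region (temporal cue).
    last_seen: dict = field(default_factory=dict)


class RegionSharerPredictor:
    """Hypothetical predictor: forward a block only to cores that touched
    the same region often enough and recently enough (not the paper's exact scheme)."""

    def __init__(self, min_count: int = 2, window: int = 100):
        self.min_count = min_count  # statistical threshold (assumed)
        self.window = window        # temporal window, in global accesses (assumed)
        self.regions: dict[int, RegionEntry] = {}
        self.clock = 0              # global access counter used as time

    def record_access(self, core: int, addr: int) -> bool:
        """Record an access and return True if the region is shared,
        i.e. more than one core has touched it."""
        self.clock += 1
        entry = self.regions.setdefault(region_of(addr), RegionEntry())
        entry.counts[core] = entry.counts.get(core, 0) + 1
        entry.last_seen[core] = self.clock
        return len(entry.counts) > 1

    def predict_sharers(self, requester: int, addr: int) -> set[int]:
        """Return cores (other than the requester) predicted to want this block."""
        entry = self.regions.get(region_of(addr))
        if entry is None:
            return set()
        return {
            core
            for core, n in entry.counts.items()
            if core != requester
            and n >= self.min_count
            and self.clock - entry.last_seen[core] <= self.window
        }


if __name__ == "__main__":
    p = RegionSharerPredictor(min_count=2, window=10)
    # All five addresses fall in region 0 with a 1 KB region size.
    for core, addr in [(0, 0x100), (1, 0x140), (1, 0x180), (2, 0x1C0), (0, 0x200)]:
        p.record_access(core, addr)
    # Core 2 accessed the region only once, so it is filtered out.
    print(sorted(p.predict_sharers(0, 0x104)))  # → [1]
```

Using a per-core counter plus a last-access timestamp per region keeps the state proportional to the number of active regions rather than to a full per-block reuse history. This is the kind of memory saving the abstract argues is necessary on-chip, although a hardware design would use bounded, fixed-width structures instead of Python dictionaries.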

References

    1. Hennessy, J.L., Patterson, D.A.: ‘Computer architecture – a quantitative approach’ (Morgan Kaufmann, 2002, 3rd edn.).
    2. Cantin, J.F., Lipasti, M.H., Smith, J.E.: ‘Improving multiprocessor performance with coarse-grain coherence tracking’. Proc. 32nd Annual Int. Symp. on Computer Architecture (ISCA), June 2005, pp. 246–257.
    3. Cheng, L., Carter, J.B.: ‘Extending CC-NUMA systems to support write update optimizations’. Proc. ACM/IEEE Conf. on Supercomputing, Texas, USA, November 2008.
    4. Somogyi, S., Wenisch, T.F., Ailamaki, A., Falsafi, B., Moshovos, A.: ‘Spatial memory streaming’. Proc. 33rd Annual Int. Symp. on Computer Architecture (ISCA), Boston, MA, USA, June 2006, pp. 252–263.
    5. Jerger, N., Lipasti, M., Peh, L.: ‘Virtual tree coherence: leveraging regions and in-network multicast trees for scalable cache coherence’. Proc. Int. Symp. on Microarchitecture, November 2008, pp. 252–263.
    6. Ganusov, I., Burtscher, M.: ‘A hardware prefetching technique for chip multiprocessors’. Proc. 11th IEEE Symp. on High-Performance Computer Architecture (HPCA), San Francisco, CA, USA, February 2005, pp. 350–360.
    7. Chang, J., Sohi, G.S.: ‘Cooperative caching for chip multiprocessors’. Proc. 33rd Int. Symp. on Computer Architecture (ISCA), Boston, MA, USA, May 2006.
    8. Jiang, G., Fen, D., Tong, L., Xiang, L., Wang, C., Chen, T.: ‘L1 collective cache: managing shared data for chip multiprocessors’. Proc. Eighth Int. Symp. on Advanced Parallel Processing Technologies, Switzerland, August 2009, pp. 123–133.
    9. Ros, A., Acacio, M.E., García, J.M.: ‘A direct coherence protocol for many-core chip multiprocessors’, IEEE Trans. Parallel Distrib. Syst., 2010, 21, (12), pp. 1779–1792 (doi: 10.1109/TPDS.2010.43).
    10. Zebchuk, J., Srinivasan, V., Qureshi, M.K., Moshovos, A.: ‘A tagless coherence directory’. Proc. 42nd Annual IEEE/ACM Int. Symp. on Microarchitecture (MICRO-42), New York, USA, 2009, pp. 423–434.
    11. Cheng, L., Carter, J.B., Dai, D.: ‘An adaptive cache coherence protocol optimized for producer-consumer sharing’. Proc. 13th IEEE Symp. on High-Performance Computer Architecture (HPCA), Arizona, USA, February 2007, pp. 328–339.
    12. Leventhal, S., Franklin, M.: ‘Perceptron based consumer prediction in shared-memory multiprocessors’. Proc. 24th Int. Conf. on Computer Design (ICCD), San Jose, CA, USA, October 2006.
    13. Wenisch, T.F., Somogyi, S., Hardavellas, N., Kim, J., Ailamaki, A., Falsafi, B.: ‘Temporal streaming of shared memory’. Proc. 32nd Int. Symp. on Computer Architecture (ISCA), Madison, WI, USA, June 2005.
    14. Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: ‘The SPLASH-2 programs: characterization and methodological considerations’. Proc. 22nd Int. Symp. on Computer Architecture (ISCA), Italy, June 1995, pp. 24–36.
    15. Renau, J., Fraguela, B., Tuck, J., et al.: ‘SESC simulator’, available at: http://www.sesc.sourceforge.net, accessed April 2011.
    16. Pagiamtzis, K., Sheikholeslami, A.: ‘Content-addressable memory (CAM) circuits and architectures: a tutorial and survey’, IEEE J. Solid-State Circuits, 2006, 41, (3), pp. 712–727.
    17. Hoskote, Y., Cho, M.H., Lis, M., et al.: ‘A 5-GHz mesh interconnect for a teraflops processor’, IEEE Micro, 2007, 27, (5), pp. 51–61 (doi: 10.1109/MM.2007.4378783).
    18. Wentzlaff, D., Griffin, P., Hoffmann, H., et al.: ‘On-chip interconnection architecture of the tile processor’, IEEE Micro, 2007, 27, (5), pp. 15–31 (doi: 10.1109/MM.2007.4378780).
    19. Muralimanohar, N., Balasubramonian, R., Jouppi, N.P.: ‘CACTI 6.0: a tool to model large caches’. Technical Report HPL-2009-85, HP Laboratories, 2009.
    20. Balakrishnan, A., Naeemi, A.: ‘Interconnect network analysis of many-core chips’, IEEE Trans. Electron Devices, 2011, 58, (9), pp. 2831–2837 (doi: 10.1109/TED.2011.2158104).
    21. SCC External Architecture Specification (EAS), Intel Corporation, November 2010, Revision 1.1.
DOI: 10.1049/iet-cdt.2011.0066