Two-phase prediction of L1 data cache misses

Hardware prefetching schemes that divide misses into streams are generally preferred to other hardware-based schemes, but because they do not know when the next miss of a stream will occur, they cannot prefetch a block at the appropriate time. Some of these schemes use a substantial amount of hardware storage to hold the predicted miss blocks of all streams; others follow the program flow and prefetch all target addresses, including blocks that already reside in the L1 data cache. The approach presented here first predicts which stream the next miss belongs to and then prefetches only the next miss address of that stream. It offers a general prefetching framework, the two-phase prediction (TPP) algorithm, that lets each stream have its own address predictor. Comparing the TPP algorithm with the latest variant of stream buffers and with a Markov predictor on the SPEC CPU 2000 benchmarks shows that, on average, (1) the TPP approach achieves an 18% speedup, compared with 1% for the Markov predictor and 0.05% for stream buffers, and (2) 78% of TPP prefetches were useful, whereas only 18% and 24% of the stream-buffer and Markov prefetches, respectively, were useful.
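The two phases described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the per-stream stride predictor, and the "same stream misses next" heuristic for phase 1 are all assumptions made for the sake of a runnable example.

```python
class StridePredictor:
    """Phase 2 (illustrative): a per-stream address predictor that guesses
    the stream's next miss address from its last address and stride."""
    def __init__(self):
        self.last_addr = None
        self.stride = 0

    def observe(self, addr):
        # Update the stride from consecutive miss addresses of this stream.
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr

    def predict(self):
        if self.last_addr is None:
            return None
        return self.last_addr + self.stride


class TwoPhasePrefetcher:
    """Phase 1 predicts WHICH stream will miss next (here, a naive
    most-recent-stream heuristic stands in for the paper's predictor);
    phase 2 asks only that stream's predictor for one address, so a
    single block is prefetched rather than blocks from every stream."""
    def __init__(self):
        self.streams = {}        # stream id -> StridePredictor
        self.next_stream = None  # phase-1 prediction

    def on_miss(self, stream_id, addr):
        pred = self.streams.setdefault(stream_id, StridePredictor())
        pred.observe(addr)
        self.next_stream = stream_id  # assume the same stream misses next

    def prefetch_address(self):
        # Only the predicted stream contributes a prefetch target.
        if self.next_stream is None:
            return None
        return self.streams[self.next_stream].predict()


pf = TwoPhasePrefetcher()
for addr in (0x1000, 0x1040, 0x1080):  # one stream with stride 0x40
    pf.on_miss(0, addr)
print(hex(pf.prefetch_address()))      # prints 0x10c0
```

The point of the split is storage economy: instead of buffering predicted blocks from every active stream, only the stream chosen in phase 1 is consulted, so one prefetch is issued per predicted miss.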


References

1) Jouppi, N.P.: `Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers', ISCA-17, May 1990, pp. 364–373
2) Charney, M.J., Puzak, T.R.: `Prefetching and memory system behavior of the SPEC95 benchmark suite', IBM J. Res. Dev., pp. 265–286
3) Joseph, D., Grunwald, D.: `Prefetching using Markov predictors', IEEE Trans. Comput., 2, pp. 121–133
4) Kandiraju, G.B., Sivasubramaniam, A.: `Going the distance for TLB prefetching: an application driven study', ISCA-29, May 2002, pp. 195–206
5)
6)
7) Farkas, K.L., Chow, P., Jouppi, N.P., Vranesic, Z.: `Memory-system design considerations for dynamically-scheduled processors', ISCA-24, June 1997, pp. 133–143
8) Ortega, D., Ayguade, E., Valero, M.: `Dynamic memory instruction bypassing', ICS-17, June 2003, pp. 316–325
9) Roth, A., Moshovos, A., Sohi, G.S.: `Dependence based prefetching for linked data structures', ASPLOS-8, October 1998, pp. 115–126
10) Sherwood, T., Sair, S., Calder, B.: `Predictor-directed stream buffers', MICRO-33, 2000, pp. 42–53
11) Palacharla, S., Kessler, R.E.: `Evaluating stream buffers as a secondary cache replacement', ISCA-21, April 1994, pp. 24–33
12) Sharma, S., Beu, J.G., Conte, T.M.: `Spectral prefetcher: an effective mechanism for L2 cache prefetching', ACM Trans. Archit. Code Optim. (TACO), 4, pp. 423–450
13) Baer, J.L., Chen, T.F.: `An effective on-chip preloading scheme to reduce data access penalty', ICS-1991, November 1991, pp. 176–186
14) Baer, J.L., Chen, T.F.: `Effective hardware-based data prefetching for high-performance processors', IEEE Trans. Comput., 5, pp. 609–623
15) Eickemeyer, R.J., Vassiliadis, S.: `A load instruction unit for pipelined processors', IBM J. Res. Dev., pp. 547–564
16) Fu, J.W.C., Patel, J.H.: `Data prefetching in multiprocessor vector cache memories', ISCA-18, pp. 54–63
17) Gonzalez, J., Gonzalez, A.: `Speculative execution via address prediction and data prefetching', ICS-11, July 1997, pp. 196–203
18) Annavaram, M., Patel, J.M., Davidson, E.S.: `Data prefetching by dependence graph precomputation', ISCA-28, June–July 2001, pp. 52–61
19) Karlsson, M., Dahlgren, F., Stenstrom, P.: `A prefetching technique for irregular accesses to linked data structures', HPCA-6, January 2000, pp. 206–217
20) Ortega, D., Ayguade, E., Baer, J.L., Valero, M.: `Cost-effective compiler directed memory prefetching and bypassing', PACT-11, September 2002, pp. 189–198
21) Wang, Z., Burger, D., McKinley, K.S., Reinhardt, S.K., Weems, C.C.: `Guided region prefetching: a cooperative hardware/software approach', ISCA-30, 2003, pp. 388–398
22) Dundas, J., Mudge, T.: `Improving data cache performance by pre-executing instructions under a cache miss', ICS-11, July 1997, pp. 68–75
23) Moshovos, A., Pnevmatikatos, D.N., Baniasadi, A.: `Slice processors: an implementation of operation-based prediction', ICS-15, June 2001, pp. 321–334
24) Austin, T., Larson, E., Ernst, D.: `SimpleScalar: an infrastructure for computer system modeling', IEEE Comp., 2, pp. 59–67
25) Sazeides, Y., Smith, J.E.: `The predictability of data values', MICRO-30, 1997, pp. 248–258
