Two-phase prediction of L1 data cache misses

IEE Proceedings - Computers and Digital Techniques

Hardware prefetching schemes that divide misses into streams are generally preferred over other hardware-based schemes. However, because they do not know when the next miss of a stream will occur, they cannot prefetch a block at the appropriate time. Some of these schemes use a substantial amount of hardware storage to hold the predicted miss blocks of all streams; others follow the program flow and prefetch all target addresses, including blocks that already reside in the L1 data cache. The approach presented here first predicts the stream of the next miss and then prefetches only the next miss address of that stream. It offers a general prefetching framework, the two-phase prediction (TPP) algorithm, which lets each stream have its own address predictor. Comparing the TPP algorithm with the latest variant of stream buffers and with a Markov predictor on the SPEC CPU 2000 benchmarks shows that, on average, (1) the TPP approach achieves an 18% speedup, compared with 1% for the Markov predictor and 0.05% for stream buffers; and (2) 78% of TPP prefetches are useful, whereas only 18% and 24% of stream-buffer and Markov prefetches, respectively, are useful.
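The two-phase structure described above can be sketched in simulator-style pseudocode. This is a minimal illustration, not the paper's implementation: it assumes a simple stride predictor per stream and a frequency-based stream-transition table for phase 1; all class and method names here are hypothetical.

```python
class StridePredictor:
    """Per-stream address predictor (a simple stride predictor, for illustration)."""
    def __init__(self):
        self.last_addr = None
        self.stride = 0

    def train(self, addr):
        # Learn the stride between consecutive misses of this stream.
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr

    def predict(self):
        # Predict this stream's next miss address.
        if self.last_addr is None:
            return None
        return self.last_addr + self.stride


class TwoPhasePrefetcher:
    """Sketch of the TPP idea: predict the next-miss stream, then ask only
    that stream's own predictor for a single prefetch address."""
    def __init__(self):
        self.streams = {}       # stream id -> its private address predictor
        self.transitions = {}   # previous stream -> {next stream: count}
        self.last_stream = None

    def on_miss(self, stream_id, miss_addr):
        # Phase-1 training: record which stream followed the previous miss.
        if self.last_stream is not None:
            counts = self.transitions.setdefault(self.last_stream, {})
            counts[stream_id] = counts.get(stream_id, 0) + 1
        self.last_stream = stream_id
        # Phase-2 training: update this stream's private predictor.
        self.streams.setdefault(stream_id, StridePredictor()).train(miss_addr)

    def prefetch(self):
        # Phase 1: pick the most likely stream of the next miss.
        counts = self.transitions.get(self.last_stream)
        if not counts:
            return None
        next_stream = max(counts, key=counts.get)
        # Phase 2: prefetch only that stream's predicted next miss address.
        return self.streams[next_stream].predict()
```

Because only one address (the predicted next miss of the predicted stream) is prefetched at a time, the storage cost of keeping predicted blocks from every stream is avoided, which is the contrast the abstract draws with stream buffers and Markov-style predictors.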
