Two-phase prediction of L1 data cache misses
Hardware prefetching schemes that divide misses into streams are generally preferred over other hardware-based schemes. However, because they do not know when the next miss of a stream will occur, they cannot prefetch a block at the appropriate time. Some of these schemes use a substantial amount of hardware storage to hold the predicted miss blocks of all streams; others follow the program flow and prefetch all target addresses, including blocks that already reside in the L1 data cache. The approach presented here first predicts which stream will miss next and then prefetches only the next miss address of that stream. It offers a general prefetching framework, the two-phase prediction (TPP) algorithm, that lets each stream have its own address predictor. Comparing TPP with the latest variant of stream buffers and with the Markov predictor on the SPEC CPU 2000 benchmarks shows that, on average, (1) TPP achieves an 18% speedup, versus 1% for the Markov predictor and 0.05% for stream buffers, and (2) 78% of TPP prefetches are useful, whereas only 18% and 24% of prefetches are useful in stream buffers and the Markov predictor, respectively.
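To make the two-phase idea concrete, the following is a minimal software sketch, not the paper's actual hardware design: it assumes phase 1 is a simple Markov-style table over stream IDs and phase 2 is a per-stream stride predictor. All class and method names (`StridePredictor`, `TwoPhasePrefetcher`, `on_miss`, `prefetch_candidate`) are illustrative inventions; the paper's real tables and predictors may differ.

```python
# Hypothetical sketch of two-phase prediction (TPP).
# Phase 1: predict which stream misses next.
# Phase 2: ask only that stream's own predictor for the next miss address.

class StridePredictor:
    """Per-stream address predictor (phase 2): assumes a simple
    constant-stride miss pattern within each stream."""

    def __init__(self):
        self.last_addr = None
        self.stride = 0

    def update(self, addr):
        # Learn the stride from consecutive misses of this stream.
        if self.last_addr is not None:
            self.stride = addr - self.last_addr
        self.last_addr = addr

    def predict(self):
        # Next miss address of this stream, if any history exists.
        return None if self.last_addr is None else self.last_addr + self.stride


class TwoPhasePrefetcher:
    """Phase 1 uses a one-entry-deep Markov table over stream IDs to
    guess the next missing stream; phase 2 then prefetches only that
    stream's predicted next miss address."""

    def __init__(self):
        self.stream_preds = {}   # stream id -> its own address predictor
        self.next_stream = {}    # Markov table: stream -> likely next stream
        self.last_stream = None

    def on_miss(self, stream_id, addr):
        # Train phase 1: record the observed stream-to-stream transition.
        if self.last_stream is not None:
            self.next_stream[self.last_stream] = stream_id
        self.last_stream = stream_id
        # Train phase 2: update only that stream's address predictor.
        self.stream_preds.setdefault(stream_id, StridePredictor()).update(addr)

    def prefetch_candidate(self):
        # Phase 1: which stream is expected to miss next?
        s = self.next_stream.get(self.last_stream)
        if s is None:
            return None
        # Phase 2: prefetch only that stream's next predicted miss address.
        pred = self.stream_preds.get(s)
        return None if pred is None else pred.predict()
```

For example, if misses alternate between a stream "A" with stride 64 (addresses 0, 64, 128) and a stream "B" with stride 8 (addresses 1000, 1008), then after a miss in "A" the prefetcher predicts "B" as the next missing stream and issues a single prefetch for 1016, rather than prefetching from every stream.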