© The Institution of Engineering and Technology
The trace cache is a technique that provides accurate, high-bandwidth instruction fetch. However, when a desired instruction trace is not found in the cache, conventional instruction fetch and decode must be used to satisfy the trace request. Such auxiliary fetch hardware can be expensive in terms of energy, area and complexity. This paper explores an approach that combines a trace cache and conventional instruction fetch hardware using a decoupled design. The design enables the processor to switch dynamically between trace ID-based and PC-based prediction methods and helps to hide the latency of the instruction memory path. The decoupled design, with accelerated slow-path instruction delivery and no instruction cache, provides a benefit comparable to that of a front-end with an 8 kB instruction cache (within 2% of the instructions per cycle achieved with the cache). The design also shows high tolerance for both trace table misses and increased memory latency when the trace table is scaled down and the L2 access latency is scaled up.
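The switching behaviour described in the abstract can be pictured with a toy model. The following is only a minimal sketch under stated assumptions, not the authors' design: the class name `DecoupledFrontEnd`, the dictionary used as a trace table, the one-cycle trace hit and the three-cycle slow path are all hypothetical choices made for illustration.

```python
from collections import deque


class DecoupledFrontEnd:
    """Toy model: a predictor queues fetch requests ahead of the fetch stage."""

    def __init__(self, trace_table, slow_path_latency=3):
        # Hypothetical trace table: trace ID -> list of instructions.
        self.trace_table = trace_table
        # Assumed cost, in cycles, of one slow-path (memory) request.
        self.slow_path_latency = slow_path_latency
        # The queue is what decouples prediction from fetch.
        self.queue = deque()

    def predict(self, trace_id, start_pc):
        # Use trace-ID prediction when the trace is present; otherwise
        # fall back to a PC-based request served by the slow path.
        if trace_id in self.trace_table:
            self.queue.append(("trace", trace_id))
        else:
            self.queue.append(("pc", start_pc))

    def fetch(self):
        """Serve the oldest request and return (mode, instructions, cycles)."""
        mode, key = self.queue.popleft()
        if mode == "trace":
            return mode, self.trace_table[key], 1
        # Stand-in for conventional fetch and decode from instruction memory.
        return mode, [f"inst@{key:#x}"], self.slow_path_latency


fe = DecoupledFrontEnd({7: ["add", "ld", "br"]})
fe.predict(7, 0x400)   # trace 7 is in the table: trace-ID mode
fe.predict(9, 0x40C)   # trace 9 is missing: PC-based slow path
print(fe.fetch())
print(fe.fetch())
```

Because requests are queued ahead of the fetch stage, a slow-path request can be issued while earlier trace hits are still being consumed, which is the sense in which a decoupled front-end can hide instruction memory latency in this sketch.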
http://iet.metastore.ingenta.com/content/journals/10.1049/ip-cdt_20050161