
Optimising power efficiency in trace cache fetch unit

IET Computers & Digital Techniques

As the issue width and the number of functional units in superscalar processors continue to increase, the fetch unit must deliver a large fetch bandwidth to fully utilise the datapath resources. This trend worsens the power problem in the fetch unit, because the traditional instruction fetch mechanism is not optimised for power consumption. This paper examines the extra power consumed in traditional instruction caches because of dynamic control flow. By capturing the dynamic paths and characteristics of code during execution, trace caches provide a potential framework for power optimisation in the fetch unit. Our study shows that conventional trace caches (CTC) may increase power consumption in the fetch unit because they access the trace cache and the instruction cache simultaneously, whereas sequential trace caches (STC) consume less power at the cost of a significant performance loss. To address this problem, we perform a detailed study of trace distribution and access locality. Based on this study, we first propose a new model, the selective trace cache (SLTC), which uses both compiler and hardware support to selectively control trace cache lookup and update. Experimental evaluation shows that, on average, SLTC achieves up to 42.2% power reduction over CTC and up to an additional 21.8% reduction over STC, while incurring a performance loss of no more than 1.8% compared with CTC. We further propose a dynamic direction prediction based trace cache (DPTC), which eliminates the compilation and instruction set architecture (ISA) modifications that SLTC requires. Driven by a fetch direction predictor, DPTC achieves competitive power efficiency: on average, it reduces fetch-unit power consumption by up to 40.5% and 17.6% compared with CTC and STC, respectively, at a performance loss of less than 2.4% relative to CTC.
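The trade-off described above (CTC probes both structures on every fetch, STC serialises them, and a predictor can steer each fetch to a single structure) can be illustrated with a toy model. This is a hedged sketch, not the authors' SLTC or DPTC: the unit per-probe energies `E_TC` and `E_IC`, the one-cycle penalty for a serialised instruction cache access, the per-address 2-bit saturating counter used as a stand-in "direction predictor", and the function names `ctc`, `stc` and `predicted` are all hypothetical assumptions. The model also ignores fetch bandwidth, trace construction and SLTC's compiler hints.

```python
from dataclasses import dataclass

# Hypothetical per-probe energies in arbitrary units (not values from the paper).
E_TC = 1.0  # energy of one trace cache probe
E_IC = 1.0  # energy of one instruction cache probe


@dataclass
class Stats:
    energy: float = 0.0
    cycles: int = 0
    lost_hits: int = 0  # TC hits skipped because the predictor chose the IC


def ctc(trace):
    """Conventional trace cache: probe TC and IC in parallel on every fetch."""
    s = Stats()
    for _addr, _tc_hit in trace:
        s.energy += E_TC + E_IC
        s.cycles += 1
    return s


def stc(trace):
    """Sequential trace cache: probe the TC first, the IC only after a TC miss."""
    s = Stats()
    for _addr, tc_hit in trace:
        s.energy += E_TC
        s.cycles += 1
        if not tc_hit:
            s.energy += E_IC
            s.cycles += 1  # assumed one-cycle penalty for the serialised IC probe
    return s


def predicted(trace):
    """Toy predictor-steered fetch (hypothetical): a 2-bit saturating counter per
    fetch address guesses whether the TC will hit, and only that structure is probed."""
    counters = {}
    s = Stats()
    for addr, tc_hit in trace:
        ctr = counters.get(addr, 2)  # initial state: weakly predict "TC hit"
        if ctr >= 2:  # predict TC hit: probe the TC only
            s.energy += E_TC
            s.cycles += 1
            if not tc_hit:  # misprediction: fall back to the IC next cycle
                s.energy += E_IC
                s.cycles += 1
        else:  # predict TC miss: probe the IC only
            s.energy += E_IC
            s.cycles += 1
            if tc_hit:
                s.lost_hits += 1  # IC supplies the instructions, but the trace goes unused
        # Train the counter on the actual TC outcome.
        counters[addr] = min(ctr + 1, 3) if tc_hit else max(ctr - 1, 0)
    return s


# Each entry is (fetch address, whether the TC would hit); 0x40 always hits, 0x80 never does.
trace = [(0x40, True), (0x80, False)] * 8
for name, policy in (("CTC", ctc), ("STC", stc), ("predicted", predicted)):
    s = policy(trace)
    print(f"{name}: energy={s.energy}, cycles={s.cycles}, lost_hits={s.lost_hits}")
```

With this sample trace the toy model gives energy/cycles of 32/16 for CTC, 24/24 for STC and 17/17 for the predictor-steered policy. This mirrors only qualitatively, not quantitatively, the abstract's point that steering lookups can recover STC-like power savings without paying STC's serialisation penalty on every trace cache miss.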

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt_20060170