LSQ: a power efficient and scalable implementation
- Author(s): F. Castro (1); D. Chaver (1); L. Pinuel (1); M. Prieto (1); M.C. Huang (2); F. Tirado (1)
- DOI: 10.1049/ip-cdt:20050218
Affiliations:
1: Dept. Arquitectura de Computadores y Automática, Facultad CC. Físicas, Avda. Complutense s/n, Ciudad Universitaria, Madrid, Spain
2: Dept. of Electrical & Computer Engineering, University of Rochester, Rochester, USA
- Source: Volume 153, Issue 6, November 2006, p. 389–398
- Print ISSN 1350-2387, Online ISSN 1359-7027
The load–store queue (LSQ) of modern superscalar processors is a critical, non-scalable component responsible for enforcing the ordering of memory operations. As architectures become more aggressive, the number of in-flight memory instructions grows, and the LSQ must provide correspondingly higher capacity. The authors propose an efficient LSQ state-filtering mechanism based on Bloom filtering which, in conjunction with a dynamic or profiling-based predictor, provides a significant energy reduction (up to 55% in the LSQ and 4% in the whole processor) while incurring only a small performance loss.
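To make the filtering idea concrete, here is a minimal Python sketch of the general technique: a counting Bloom filter summarizing the addresses of in-flight stores, so that a load whose address is definitely absent can skip the associative (CAM) search of the store queue. This is an illustration under stated assumptions, not the authors' design: the filter size, the bit-slice hash functions, the 8-byte word granularity, and all class and method names (`CountingBloomFilter`, `FilteredStoreQueue`, `store_address_ready`, `load_lookup`, `commit_oldest_store`) are hypothetical, and the predictor mentioned in the abstract is not modelled.

```python
from typing import Optional


class CountingBloomFilter:
    """Counting Bloom filter over word addresses (hypothetical sizing)."""

    def __init__(self, num_counters: int = 64, num_hashes: int = 2) -> None:
        # Power-of-two size so each hash is a plain bit slice of the address.
        assert num_counters & (num_counters - 1) == 0, "size must be a power of two"
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.bits = num_counters.bit_length() - 1  # log2(num_counters)
        self.counters = [0] * num_counters

    def _indices(self, addr: int) -> list[int]:
        word = addr >> 3  # assumption: 8-byte words, so drop the byte offset
        mask = self.num_counters - 1
        # Hash i takes the i-th group of log2(num_counters) bits of the word address.
        return [(word >> (i * self.bits)) & mask for i in range(self.num_hashes)]

    def insert(self, addr: int) -> None:
        for i in self._indices(addr):
            self.counters[i] += 1

    def remove(self, addr: int) -> None:
        for i in self._indices(addr):
            self.counters[i] -= 1

    def might_contain(self, addr: int) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.counters[i] > 0 for i in self._indices(addr))


class FilteredStoreQueue:
    """Hypothetical store queue whose associative search is guarded by the filter."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}  # sequence number -> store address
        self.filter = CountingBloomFilter()
        self.cam_searches = 0  # associative searches actually performed

    def store_address_ready(self, seq: int, addr: int) -> None:
        # Track a store once its address has been computed.
        self.entries[seq] = addr
        self.filter.insert(addr)

    def commit_oldest_store(self) -> None:
        seq = min(self.entries)  # program order follows sequence numbers
        addr = self.entries.pop(seq)
        self.filter.remove(addr)

    def load_lookup(self, seq: int, addr: int) -> Optional[int]:
        """Return the youngest older store to the same word, or None."""
        if not self.filter.might_contain(addr):
            return None  # no false negatives, so the CAM search is skipped
        self.cam_searches += 1
        word = addr >> 3
        older = [s for s, a in self.entries.items() if s < seq and a >> 3 == word]
        return max(older, default=None)
```

A counting filter is used here rather than a plain bit-vector Bloom filter because stores leave the queue when they commit, so their addresses must be removable. A false positive only costs an unnecessary CAM search, never a wrong answer. Stores whose addresses are not yet known are not tracked in this sketch; a real design would have to handle them.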
Inspec keywords: cache storage; content-addressable storage; power aware computing
Subjects: File organisation; Associative storage