LSQ: a power efficient and scalable implementation

IEE Proceedings - Computers and Digital Techniques

The load–store queue (LSQ) of a modern superscalar processor is a critical, non-scalable component responsible for maintaining the order of memory operations. As new architectures become more aggressive, the number of in-flight memory instructions increases, and the LSQ must meet higher capacity requirements. An efficient LSQ state-filtering mechanism based on Bloom filtering is proposed which, in conjunction with a dynamic or profiling-based predictor, provides a significant energy reduction (up to 55% in the LSQ and 4% in the whole processor) while incurring only a small performance loss.
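
The abstract does not describe the filter's organisation, so the following is only a minimal software sketch of the general idea of Bloom-filter-based search filtering, not the authors' design. It assumes a counting Bloom filter over the addresses of in-flight stores: a load whose address misses in the filter cannot match any in-flight store, so the costly associative search can be skipped. The class names, hash multipliers, filter size and the `searches_done`/`searches_skipped` counters are all hypothetical, and the predictor mentioned in the abstract is not modelled.

```python
class CountingBloomFilter:
    """Counting Bloom filter over memory addresses (illustrative only)."""

    # Hypothetical odd multipliers used to derive independent hash indexes.
    _MULTIPLIERS = (0x9E3779B1, 0x85EBCA6B, 0xC2B2AE35)

    def __init__(self, num_counters=64, num_hashes=2):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, addr):
        word = addr >> 3  # assumption: track addresses at 8-byte granularity
        n = len(self.counters)
        return [((word * m) >> 16) % n
                for m in self._MULTIPLIERS[:self.num_hashes]]

    def insert(self, addr):
        for i in self._indexes(addr):
            self.counters[i] += 1

    def remove(self, addr):
        for i in self._indexes(addr):
            self.counters[i] -= 1

    def may_contain(self, addr):
        # False means "definitely absent"; True means "possibly present".
        return all(self.counters[i] > 0 for i in self._indexes(addr))


class FilteredStoreQueue:
    """Toy store queue whose associative search is guarded by the filter."""

    def __init__(self):
        self.filter = CountingBloomFilter()
        self.stores = []           # in-flight store addresses, oldest first
        self.searches_done = 0     # associative searches actually performed
        self.searches_skipped = 0  # searches avoided thanks to the filter

    def issue_store(self, addr):
        self.stores.append(addr)
        self.filter.insert(addr)

    def commit_oldest_store(self):
        addr = self.stores.pop(0)
        self.filter.remove(addr)

    def load_conflicts(self, addr):
        # A filter miss proves no in-flight store matches, so skip the search.
        if not self.filter.may_contain(addr):
            self.searches_skipped += 1
            return False
        self.searches_done += 1
        return addr in self.stores


queue = FilteredStoreQueue()
queue.load_conflicts(0x1000)   # filter is empty: search skipped
queue.issue_store(0x1000)
queue.load_conflicts(0x1000)   # filter hit: full search, match found
```

In this sketch the filter can only produce false positives (an unnecessary search), never false negatives, so skipping a search on a filter miss does not affect correctness. The energy argument in the abstract rests on such skipped searches being much cheaper than a full associative lookup.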

http://iet.metastore.ingenta.com/content/journals/10.1049/ip-cdt_20050218