The authors present a low-power translation lookaside buffer (TLB) for embedded processors. The proposed TLB is organized as multiple banks, each paired with its own block buffer and comparator. Two bits held in a tag buffer determine whether a lookup accesses the block buffer or the main bank. Because the tag buffer acts as a filter, fewer entries are accessed in parallel, and dynamic power drops accordingly. The performance overhead is small compared with other hierarchical TLB structures: the two-cycle overhead of the proposed TLB amounts to only ∼1%, compared with 5% for a filter (micro-)TLB and 14% for a banked TLB with block buffering. The average hit ratios of the block buffers and the main banks are 94% and 6%, respectively. Dynamic power is reduced by ∼93% relative to a fully associative TLB, by 87% relative to a filter TLB, and by 60% relative to a banked TLB with block buffering. The proposed TLB therefore achieves significant power savings at the cost of only a small performance degradation.
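The following minimal Python sketch illustrates the general idea of filtering TLB lookups through a per-bank block buffer. It is not the authors' design. The bank count, bank size, block size, FIFO replacement, and the use of low-order virtual page number (VPN) bits to choose a bank are all assumptions, and all names (`BlockBufferedTLB`, `fill`, `translate`, `compares`) are hypothetical. The sketch does not model the two tag-buffer bits that the authors use to choose between the block buffer and the main bank. Instead, each bank's tag buffer is reduced to a record of which block is currently buffered; that block is probed first, and the whole bank is searched only on a buffer miss. The `compares` counter (number of entries compared) is a very rough stand-in for dynamic energy.

```python
from typing import Dict, List, Optional, Tuple

PAGE_SHIFT = 12  # assumption: 4 KiB pages
NUM_BANKS = 4    # assumption: low-order VPN bits select the bank
BANK_SIZE = 8    # assumption: entries per bank (32 entries in total)
BLOCK_SIZE = 2   # assumption: entries per block, i.e. block buffer capacity


class Bank:
    """One bank: a small fully associative array plus a one-block buffer."""

    def __init__(self) -> None:
        self.slots: List[Optional[Tuple[int, int]]] = [None] * BANK_SIZE  # (vpn, pfn)
        # Simplified tag buffer: index of the block held in the block buffer.
        # The buffer is modelled as a pointer into the bank, not as a copy.
        self.buffered: Optional[int] = None
        self.victim = 0  # FIFO replacement pointer (assumption)


class BlockBufferedTLB:
    def __init__(self) -> None:
        self.banks = [Bank() for _ in range(NUM_BANKS)]
        self.compares = 0  # entries compared: rough proxy for dynamic energy
        self.hits: Dict[str, int] = {"buffer": 0, "bank": 0, "miss": 0}

    def fill(self, vpn: int, pfn: int) -> None:
        """Install a translation (the page-table walk itself is not modelled)."""
        bank = self.banks[vpn % NUM_BANKS]
        bank.slots[bank.victim] = (vpn, pfn)
        bank.victim = (bank.victim + 1) % BANK_SIZE

    def _search(self, bank: Bank, indices: range, vpn: int) -> Optional[int]:
        """Compare vpn against the given slots; return the matching slot index."""
        for i in indices:
            self.compares += 1
            entry = bank.slots[i]
            if entry is not None and entry[0] == vpn:
                return i
        return None

    def translate(self, vaddr: int) -> Optional[int]:
        """Return the physical address for vaddr, or None on a TLB miss."""
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
        bank = self.banks[vpn % NUM_BANKS]
        # Cheap path: compare only the BLOCK_SIZE entries of the buffered block.
        if bank.buffered is not None:
            start = bank.buffered * BLOCK_SIZE
            i = self._search(bank, range(start, start + BLOCK_SIZE), vpn)
            if i is not None:
                self.hits["buffer"] += 1
                return (bank.slots[i][1] << PAGE_SHIFT) | offset
        # Expensive path: compare every entry of the selected bank.
        i = self._search(bank, range(BANK_SIZE), vpn)
        if i is None:
            self.hits["miss"] += 1
            return None
        bank.buffered = i // BLOCK_SIZE  # load this block into the block buffer
        self.hits["bank"] += 1
        return (bank.slots[i][1] << PAGE_SHIFT) | offset


if __name__ == "__main__":
    tlb = BlockBufferedTLB()
    tlb.fill(0x5, 0x80)
    for addr in (0x5123, 0x5456, 0x5789, 0x9000):
        print(hex(addr), tlb.translate(addr))
    print(tlb.hits, "entries compared:", tlb.compares)
```

With these assumed parameters, a lookup that hits in the block buffer compares at most 2 entries, whereas a 32-entry fully associative TLB compares all 32 entries in parallel on every access. A high block-buffer hit ratio is what makes the cheap path the common case.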
http://iet.metastore.ingenta.com/content/journals/10.1049/ip-cdt_20045025