Reliability-aware simultaneous multithreaded architecture using online architectural vulnerability factor estimation

Miniaturisation in modern microprocessors increases their susceptibility to soft errors, which degrades reliability. Recently, simultaneous multithreaded (SMT) architectures have been used to improve fault tolerance. Although such schemes provide full coverage, their redundant checking causes significant performance and energy overheads. Fortunately, some soft errors are masked at the architectural level; the architectural vulnerability factor (AVF) of a structure represents the portion of soft errors that lead to a failure in the output of a program. In this study, the authors present an infrastructure for online monitoring of the AVF of sensitive structures in an SMT processor. Based on the estimated AVF, they introduce a partial thread redundancy (PTR) protection scheme that is applied to intervals whose AVF is greater than a predefined threshold. The estimated AVF is also used to adapt between reliability improvement and performance enhancement, especially when the processor executes more than one workload. SPEC CPU2006 benchmarks are used for AVF estimation of some important hardware resources, such as the issue queue, reorder buffer, load/store queue and register file. Experimental results show that the mean absolute error of the AVF estimation method varies from 0.04 to 0.07, and that combining online AVF estimation with PTR protection leads to reliability-aware execution with lower performance overhead.
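
The two quantitative ideas in the abstract, enabling protection only for intervals whose estimated AVF exceeds a threshold and scoring the estimator by mean absolute error, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the threshold of 0.3 and all AVF values below are hypothetical, chosen only to show the arithmetic.

```python
from typing import List


def select_protected_intervals(avf_estimates: List[float], threshold: float) -> List[bool]:
    """Flag each interval whose estimated AVF is greater than the threshold.

    A True entry stands for "run this interval with redundancy enabled";
    how redundancy is actually applied is outside this sketch.
    """
    return [avf > threshold for avf in avf_estimates]


def mean_absolute_error(estimated: List[float], reference: List[float]) -> float:
    """Average of |estimated - reference| over all intervals."""
    if not estimated or len(estimated) != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)


# Hypothetical per-interval AVF values (not taken from the paper).
estimated_avf = [0.10, 0.35, 0.50, 0.20]
reference_avf = [0.15, 0.30, 0.45, 0.25]

protect = select_protected_intervals(estimated_avf, threshold=0.3)
mae = mean_absolute_error(estimated_avf, reference_avf)
print(protect)
print(round(mae, 3))
```

With these made-up numbers, only the second and third intervals exceed the threshold, so only they would incur the cost of redundant execution; the remaining intervals run unprotected and keep their full performance.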

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2013.0162
Errata
An Erratum has been published for this content: