Soft-error reliable architecture for future microprocessors

IET Computers & Digital Techniques


A transient error is a device failure caused by a transient hardware fault, which results from strikes by high-energy particles such as neutrons and alpha particles. In this study, the authors propose two fault-tolerant architecture schemes. The first scheme, REMO, is a hardware-based solution that combines the best features of space and time redundancy. REMO provides very high fault coverage with minimal overheads in performance, power and area. The second scheme, REMORA, combines the best features of hardware and software approaches to fault tolerance. The persistent problem of unprotected code in software approaches is eliminated in this proposal. Simulation results on the SPEC2006 benchmark suite indicate that REMO incurs an area increase of about 6%, a power overhead of 9% despite redundant execution, and a negligible performance penalty during a fault-free run. In REMORA, the performance degradation increases to 12%. The code size inflation is also close to 12%, owing to the additional signature instructions inserted into the application program. The authors have explored the possibility of eliminating this penalty by embedding the signatures in control flow instructions. The power and area overheads of REMORA are on par with those of REMO.
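
The abstract attributes REMORA's code size inflation to signature instructions inserted into the program, but it does not describe how those signatures are checked. As a rough illustration of the general idea behind signature-based control-flow checking, and not of the REMORA scheme itself, the sketch below assumes that every basic block reports a unique signature when it starts and that a checker knows which signature pairs correspond to legal control-flow edges. The block names, signature values, toy control-flow graph and function names are all hypothetical.

```python
# Toy control-flow graph; block names and signature values are hypothetical.
SIGNATURE = {"entry": 0x1A, "loop": 0x2B, "body": 0x3C, "exit": 0x4D}

# Edges the compiler would consider legal for this toy program.
LEGAL_EDGES = {
    ("entry", "loop"),
    ("loop", "body"),
    ("body", "loop"),
    ("loop", "exit"),
}

# The checker only sees signatures, so store the legal edges as signature pairs.
LEGAL_SIGNATURE_PAIRS = {(SIGNATURE[a], SIGNATURE[b]) for a, b in LEGAL_EDGES}


def signatures_of(block_trace):
    """Map an executed sequence of block names to the signatures they report."""
    return [SIGNATURE[block] for block in block_trace]


def first_illegal_transition(signature_stream):
    """Return the index of the first signature reached by an illegal edge, or None."""
    for i in range(1, len(signature_stream)):
        pair = (signature_stream[i - 1], signature_stream[i])
        if pair not in LEGAL_SIGNATURE_PAIRS:
            return i
    return None


# A fault-free run follows only legal edges.
good_run = signatures_of(["entry", "loop", "body", "loop", "exit"])
# A control-flow error (for example, a corrupted branch target) skips the loop header.
bad_run = signatures_of(["entry", "body"])

print(first_illegal_transition(good_run))
print(first_illegal_transition(bad_run))
```

In this sketch every block pays for one extra signature report, which is the kind of per-block cost that inflates code size when signatures are separate instructions. Carrying the signature inside an existing control flow instruction, as the abstract suggests, would in principle avoid that extra instruction.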


