Resilience and fault tolerance

Ultrascale Computing Systems

Ultrascale computing is a new computing paradigm that arises naturally from the need for computing systems able to handle massive data in potentially very large-scale distributed systems, enabling new forms of applications that serve a very large number of users with unprecedented timeliness. Finding sustainable solutions for ultrascale computing systems (UCS) is very challenging because of their scale, the wide range of possible applications involved, and big data management. One of the challenges for sustainable UCS is resilience. Traditionally, resilience has been an important aspect of critical infrastructure protection.

Chapter Contents:

  • 3.1 Security and reliability in ultrascale system
      • 3.1.1 Faults, fault tolerance, and robustness
          • Dependable computing and fault-tolerance techniques
          • Robustness
      • 3.1.2 Fault tolerance in UCS
          • Computing hardware resilience
          • Network resilience
          • Software and message passing interface resilience
      • 3.1.3 Blockchains and DLT
  • 3.2 Regulation compliance in ultrascale system
      • 3.2.1 NESUS Watchdog and regulatory compliance
          • Assessing significance of breach
          • Assessing probability of breach
      • 3.2.2 Illustration: penalization by NESUS Watchdog
  • 3.3 Conclusion

Inspec keywords: fault tolerant computing; Big Data; distributed databases; security of data

Other keywords: critical infrastructure protection; resilience; UCS; large-scale distributed systems; ultrascale computing; Big Data management; computing systems; high-performance computing; fault tolerance; HPC

Subjects: Systems analysis and programming; Performance evaluation and testing; Data security; Systems software; Distributed databases
