
Reliable computation with unreliable computers

As computing systems continue their relentless growth towards, and beyond, million-core architectures, two considerations that used to be unimportant become increasingly dominant: power consumption (be it FLOPS/W or W/mm²) and reliability. This study is concerned with the latter. In a system of a million cores, it is unrealistic to expect 100% functionality on power-up; equally, operational availability degrades with time. Monitoring and maintaining the health of such a system using traditional techniques is costly, and most such techniques rely on some sort of central overseer or monitor to make a final judgement about system availability, which introduces a single point of failure. Large systems of the future will consist of hardware and software that work together to cope with isolated points of failure, allowing the gross behaviour of the system to degrade gracefully and in a meaningful way in the face of faults. This study describes one such system: SpiNNaker (spiking neural network architecture) is a million-core machine with layered fault tolerance built in at many levels. The authors show how the system may be used to solve the canonical distributed heat diffusion equation, and how the quality of the solution is modulated by the effects of partial system failure.
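To make the idea of a solution that degrades gracefully concrete, here is a minimal sketch, not the authors' implementation. It assumes a synchronous Jacobi relaxation of the steady-state heat equation on a small square grid, with one grid cell standing in for one core. The names `relax`, `max_error` and `dead` are hypothetical, and so is the fault model, in which a failed cell simply drops out and its neighbours average over whichever neighbours remain. The abstract gives no detail of how the real machine maps the problem onto cores or handles failures, so this only illustrates the general principle.

```python
def relax(n, dead, steps, hot=1.0, cold=0.0):
    """Jacobi relaxation on an n x n grid (illustrative fault model).

    The left column is held at `hot` and the right column at `cold`.
    Cells listed in `dead` model failed cores: they are never updated,
    and live cells average only over their live neighbours.
    """
    grid = [[0.0] * n for _ in range(n)]
    for r in range(n):
        grid[r][0] = hot
        grid[r][n - 1] = cold

    for _ in range(steps):
        new = [row[:] for row in grid]
        for r in range(n):
            for c in range(1, n - 1):  # boundary columns stay fixed
                if (r, c) in dead:
                    continue
                vals = [
                    grid[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < n and 0 <= cc < n and (rr, cc) not in dead
                ]
                if vals:  # an isolated cell keeps its previous value
                    new[r][c] = sum(vals) / len(vals)
        grid = new
    return grid


def max_error(reference, degraded, dead):
    """Largest absolute difference over live cells only."""
    n = len(reference)
    return max(
        abs(reference[r][c] - degraded[r][c])
        for r in range(n)
        for c in range(n)
        if (r, c) not in dead
    )


if __name__ == "__main__":
    failed = {(3, 3), (4, 3)}
    healthy = relax(8, set(), 2000)
    faulty = relax(8, failed, 2000)
    print("max deviation over live cells:", max_error(healthy, faulty, failed))
```

In this sketch no central monitor decides whether the grid is usable: each live cell works with whatever neighbours it still has, so the computation keeps running after a failure and the deviation from the fault-free answer can be measured directly.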

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-cdt.2014.0110