Extending the application of formal methods to analyse human error and system failure during accident investigations
Recent disasters at Bhopal, Chernobyl, Habsheim and Kegworth illustrate that software is rarely the sole cause of major accidents. Operator intervention, hardware failures, adverse weather conditions and even malicious acts can all combine to create the conditions for failure. In the aftermath of such accidents, it becomes clear how difficult it is for software engineers, systems developers, forensic scientists and interface designers to predict all of the ways in which systems can fail. It is therefore important that we learn as much as possible from those failures that do occur. Unfortunately, it is often difficult to gain a coherent overview of the mass of detail contained in a typical accident report. This makes it hard for readers to identify the ‘catastrophic’ events that produced the necessary conditions for disaster. This paper argues that formal specification techniques can be used to resolve these problems. In particular, the Temporal Logic of Actions (TLA) is used to build a unified account of the human errors and system failures that contributed to the Three Mile Island accident. The notation provides high-level abstractions that strip away the mass of irrelevant detail that often obscures the significant events of a disaster. Formal proof techniques can then be applied to the resulting model as a means of identifying the causal relationships that must be broken in order to prevent future failures.
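As a minimal illustration of the kind of abstraction TLA affords, one well-documented Three Mile Island event was a pilot-operated relief valve sticking open while the control-room indicator reported it closed. This might be sketched as a TLA-style action over primed (next-state) variables; the variable and action names below are illustrative assumptions, not drawn from the paper's actual model:

```latex
% Hypothetical TLA-style action (names are illustrative only):
% the relief valve remains open in the next state, yet the
% control-room indicator comes to report it as closed.
\[
\mathit{StuckValve} \;\triangleq\;
    \mathit{valve} = \text{``open''}
    \;\wedge\; \mathit{valve}' = \text{``open''}
    \;\wedge\; \mathit{indicator}' = \text{``closed''}
\]
```

Given such a model, an attempted proof of an invariant like \(\Box(\mathit{indicator} = \mathit{valve})\) fails precisely at this action, exposing the mismatch between plant state and operator display as one causal link that must be broken to prevent recurrence.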