Complexity and Reliability are Strange Bedfellows

Complexity, Reliability and Safety in a Future Complex Weapon System - Will it be as Reliable? 

Today’s ICBM command and control system is simple compared with what current technology makes possible. Fault tree analysis attempts to identify the outcome of every possible failure in a command and control system, then uses that analysis to enumerate the failure paths that end in an accidental launch or an inability to launch a missile. From these paths you can calculate the probability of such a system failure. Multiplying the individual failure probabilities along a path gives a probability for that fault path, and summing the probabilities of all the fault paths gives an overall probability of failure. Unfortunately, this leads to what most scholars concede is an unrealistically low probability of failure, due in large part to the difficulty of determining the interactive complexity of the system and the ‘coupling’, that is, the impact one failure may have on the probability of another. The failures are not truly independent events, so both the path probabilities and their sum come out unrealistically low.
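To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. Everything in it is an illustrative assumption rather than data from any real system: the function names, the probabilities, and especially the coupling model, which simply posits a single shared cause that triggers every failure on a path at once.

```python
from math import prod

def path_probability(failure_probs):
    """Probability that every failure on one fault path occurs,
    assuming the failures are independent."""
    return prod(failure_probs)

def system_probability(paths):
    """Sum of the path probabilities (a rare-event approximation)."""
    return sum(path_probability(p) for p in paths)

def coupled_path_probability(failure_probs, common_cause):
    """Same path, but a hypothetical shared cause with probability
    `common_cause` triggers all the failures on the path together."""
    return common_cause + (1.0 - common_cause) * prod(failure_probs)

# Hypothetical fault paths; each number is one failure's probability.
paths = [
    [1e-3, 1e-3, 1e-3],  # three failures must all occur
    [1e-4, 1e-2],        # two failures must both occur
]

independent = system_probability(paths)
coupled = sum(coupled_path_probability(p, 1e-5) for p in paths)

print(f"independent estimate: {independent:.3e}")
print(f"with a shared cause:  {coupled:.3e}")
```

With these made-up numbers, a shared cause as rare as one in a hundred thousand raises the estimate by roughly a factor of twenty, because the shared cause, not the product of small numbers, dominates each path.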

If the same fault tree approach is applied to a more complex system, such as a future ICBM, the identifiable population of failure modes is very likely to be much greater, if for no other reason than that the system has more parts and more failure paths. Whether the sum of the probabilities over the increased number of fault paths can be kept acceptably low is undetermined, and I believe ‘undetermined’ is a generous statement. Experience with other complex systems gives me little confidence that we can achieve an overall probability of failure lower than that of today’s ICBM. The increased complexity means the number of fault path permutations increases greatly.
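A rough sense of how quickly the analysis grows can be had from a simple count. The sketch below assumes, purely for illustration, that any combination of up to `max_simultaneous` failing parts is a candidate fault path the analyst must at least consider. Real fault trees rule out most combinations, so this is an upper bound on the bookkeeping, not a model of any actual system.

```python
from math import comb

def candidate_fault_combinations(n_parts, max_simultaneous):
    """Count the combinations of 1..max_simultaneous failing parts
    drawn from n_parts (a hypothetical upper bound on fault paths)."""
    return sum(comb(n_parts, k) for k in range(1, max_simultaneous + 1))

# Hypothetical part counts: doubling the parts far more than
# doubles the combinations to examine.
for n in (10, 20, 40):
    print(n, candidate_fault_combinations(n, 3))
```

Under this assumption, going from 10 parts to 20 raises the count from 175 to 1,350, so even if each path is individually improbable, the sum runs over many more terms.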

So, unless the designers can do something to reduce the complexity, the overall probability of failure would be greater. The designers, some of the best minds in the country, are very aware of this. The trick will be to design the system so that the failures that might result in accidental launch or inability to launch form a smaller population, for example by using a design concept such as an automated fault-correcting system. But then the complexity increases even more. This is why I use the whimsical example of two Campbell’s Soup cans and a string for an NC2 network. The logic reads ‘the simpler the system, the safer and more reliable it is’. Maybe by isolating the critical paths used to launch and making them as independent of the remainder of the system as possible, the number of fault paths could actually be reduced. Of course, these points can be debated for days, but I think the burden of proof lies with the more complex system to show that it is at least comparable to, and hopefully safer and more reliable than, the less complex system. Unfortunately, if it isn’t, we will have increased the risks by deploying it.