Musings on project management: Does software fail?

Thursday, March 26, 2015

Does software fail?

Does software fail, or does it just have faults, or neither?
Silly questions? Not really. I've heard them for years.

Here 's the argument for "software doesn't fail": Software always works the way it is designed to work, even if designed incorrectly. It doesn't wear out, break (unless you count corrupted files), or otherwise not perform exactly as designed. To wit: it never fails

Here's the argument for "it never fails, but has faults": Never fails is as above; faults refer to functionality or performance incorrectly specified such that the software is not "fit for use". Thus in the quality sense of "fit for use" it has faults.

I don't see an argument for "neither", but perhaps there is one.

However, Peter Ladkin is not buying any of this. In his blog at "the Abnormal Distribution", he has an essay, part of which is here:

What’s odder about the views of my correspondent is that, while believing “software cannot fail“, he claims software can have faults. To those of us used to the standard engineering conception of a fault as the cause of a failure, this seems completely uninterpretable: if software can’t fail, then ipso facto it can’t have faults.

Furthermore, if you think software can be faulty, but that it can’t fail, then when you want to talk about software reliability, that is, the ability of software to execute conformant to its intended purpose, you somehow have to connect “fault” with that notion of reliability. And that can’t be done. Here’s an example to show it.

Consider deterministic software S with the specification that, on input i, where i is a natural number between 1 and 20 inclusive, it outputs i. And on any other input whatsoever, it outputs X. What software S actually does is, on input i, where i is a natural number between 1 and 19 inclusive, it outputs i. When input 20, it outputs 3. And on any other input whatsoever, it outputs X. So S is reliable – it does what is wanted – on all inputs except 20. And, executing on input 20, pardon me for saying so, it fails.

That failure has a cause, and that cause or causes lie somehow in the logic of the software, which is why IEC 61508 calls software failures “systematic”. And that cause or causes is invariant with S: if you are executing S, they are present, and just the same as they are during any other execution of S.

But the reliability of S, namely how often, or how many times in so many demands, S fails, depends obviously on how many times, how often, you give it “20″ as input. If you always give is “20″, S’s reliability is 0%. If you never give it “20″, S’s reliability is 100%. And you can, by feeding it “20″ proportionately, make that any percentage you like between 0% and 100%. The reliability of S is obviously dependent on the distribution of inputs. And it is equally obviously not functionally dependent on the fault(s) = the internal causes of the failure behavior, because that/those remain constant.