Musings on project management: The battery project

Friday, March 29, 2013

The battery project

Imagine that you are the project manager for the now-infamous lithium batteries on Boeing's Dreamliner 787 aircraft -- or the program manager for the whole airplane. The preliminary report from the safety board has just been published. Could you live with this assessment by two informed followers of this incident?

Some of those details [of the preliminary report] raised questions about how Boeing could have misjudged the risks.

Christopher Drew and Jad Mouawad

Misjudged the risk? Judgment of risk is at best an estimate of uncertainty; there are always misjudgments because there are no facts, only estimates and forecasts. All risk events are in the future; there are no facts about the future. The facts are in the past. All judgments are made in the context of uncertainty.

Failure modes
A better question is about failure modes: which failure modes did Boeing model and analyze, and did they appreciate how multiple failures interact and accumulate? It's reasonable and customary to evaluate the safety of something as complex as the battery -- a system that mixes chemistry, electronics, and flight safety -- with the Failure Mode, Effects, and Criticality Analysis (FMECA) method.

[Note: the FMECA link is to the DoD's ACQuipedia site, launched in 2012, which is the defense acquisition equivalent of Wikipedia. In the DoD's blurb, we learn: "ACQuipedia serves as an online encyclopedia of common defense acquisition topics. Each topic is identified as an article; each article contains a definition, a brief narrative that provides context, and includes links to the most pertinent policy, guidance, tools, practices, and training, that further augment understanding and expand depth."]

NASA pioneered FMECA in the Apollo program when it was evident that probabilistic risk analysis (PRA) was not going to get them there. (NASA originally called it FMEA.)

Why FMECA? In complex systems with a lot of parts, it's not unusual that the expected value of failure is so high as to be meaningless. Thus, the alternative is to model the failures and trace their effects through networks and trees that represent failure effects and interactions. Each failure mode has its own PRA, but in a sense, we have to substitute weighted judgments for expected value. They are not the same. (If you don't understand this point, catch up by reading "Thinking, Fast and Slow" by Daniel Kahneman.)
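To make the point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the part counts, probabilities, severity scale, and failure-mode names are invented and have nothing to do with Boeing's actual analysis. The first function shows why a roll-up number stops being useful: with 1,000 independent parts that each fail one time in a thousand, the chance that something fails is roughly 63%, which tells you nothing about what to fix. The second part ranks individual failure modes by a simple weighted score (likelihood times a judged severity), which is one common simplification of the criticality step, not the full FMECA procedure.

```python
from dataclasses import dataclass


def prob_any_failure(p_part: float, n_parts: int) -> float:
    """Chance that at least one of n independent parts fails.

    Assumes every part has the same failure probability p_part and that
    failures are independent (a simplification for illustration).
    """
    return 1.0 - (1.0 - p_part) ** n_parts


@dataclass
class FailureMode:
    name: str           # hypothetical label, not from any real analysis
    probability: float  # judged likelihood of this mode occurring
    severity: int       # judged severity, 1 (minor) to 4 (catastrophic)

    @property
    def criticality(self) -> float:
        # A weighted judgment: likelihood scaled by judged severity.
        return self.probability * self.severity


def rank_by_criticality(modes: list[FailureMode]) -> list[FailureMode]:
    """Order failure modes from most to least critical."""
    return sorted(modes, key=lambda m: m.criticality, reverse=True)


# Invented example data: the most likely mode is not the most critical one.
example_modes = [
    FailureMode("nuisance sensor fault", probability=0.02, severity=1),
    FailureMode("cell short circuit", probability=0.01, severity=4),
    FailureMode("connector corrosion", probability=0.001, severity=3),
]

if __name__ == "__main__":
    print(f"P(any failure), 1000 parts: {prob_any_failure(0.001, 1000):.3f}")
    for mode in rank_by_criticality(example_modes):
        print(f"{mode.name}: criticality {mode.criticality:.4f}")
```

Note that the ranking puts the rarer but more severe mode first. That is the sense in which a weighted judgment differs from a plain expected value of failure.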

Preliminary report
Drew and Mouawad continue with a review of the preliminary report. We learn that

In an age of sophisticated computer modeling, Boeing engineers relied on the same test used for tiny cellphone batteries to gather data about the safety of the heftier lithium-ion battery on its new 787 jets: they drove a nail into it to see what happened

To be fair:

In the course of the original testing, the batteries were also subjected to other kinds of destructive tests, including provoking an external short circuit, overcharging the batteries for 25 hours, subjecting them to high temperatures of 185 degrees Fahrenheit for an extended period, and discharging them completely

Fortunately, for all of us who fly: