Saturday, August 9, 2014

The GIGO thing... revisited



Garbage in; garbage out... GIGO to the rest of us
 
Re GIGO, this issue is raised frequently when I talk about Monte Carlo simulation, and the GIGO thing is not without merit. These arguments are made:
  • You have no knowledge of what distribution applies to the uncertainties in the project (True)
  • You are really guessing about the limits of the three point estimates which drive the simulation (partly true)
  • Ergo: poor data information in; poor data information out (Not exactly! the devil is in the details)
Here are a few points to consider (file under: Lies, damn lies, and statistics):

First: There's no material consequence to the choice of distribution for the random numbers (uncertain estimates) that go into the simulation. As a matter of fact, for the purposes of PM, the choices can be different among tasks in the same simulation.
  • Some analysts choose the distribution for tasks on the critical path differently than tasks for paths not critical.
  • Of course, one of the strengths of the simulation is that most scheduling simulation tools identify the 'next most probable critical path' so that the PM can see which path might become critical.
Re why the choice is immaterial:  it can be demonstrated -- by simulation and by calculus -- that for a whole class of distributions, in the limit, their infinite sum takes on a Normal distribution.

  • X(sum) = X1 + X2 + X3 +.... +XN, where N is a very large number, X(Sum) is Normal, no matter what the distribution of X is
As in "all roads lead to Rome", so it is in statistics: all distributions eventually lead to Normal. (those that are practical for this purpose: single mode and defined over their entire range [no singularities]),

To be Normal, or normal-like, means that the probability (more correctly, the probability density function, pdf) has an exponential form. See: Normal distribution in wikipedia

We've seen this movie before in this blog. For example, few uniform distributions (not Normal in any respect), when summed up, the sum took on a very Normal appearance. And, more than an appearance, the underlying functional mathematics also became exponential in the limit.

Recall what you are simulating: you are simulating the sum of a lot of budgets from work packages, or you are simulating the sum of a lot of task durations. Therefore, the sim result is summation, and the summation is an uncertain number because every element in the sum is, itself, uncertain.

All uncertain numbers have distributions. However, the distribution of the sum need not be the same as the distribution of the underlying numbers in the sum. In fact, it almost never is. (Exception: the sum of Normal is itself Normal) Thus, it really does not matter what distribution is assumed; most sim tools just default the Triangular and press on.

And, the sim also tends to discount the GIGO (garbage in/garbage out) problem. A few bad estimates are likewise immaterial at the project level. They are highly discounted by their low probability. They fatten the tails a bit, but project management is a one-sigma profession. We largely don't care about the tails beyond, certainly not six sigma!

Second: most folks when asked to give a 3-point simply take the 1-pointer they intended to use and put a small range around it. They usually resist giving up anything so the most optimistic is usually just a small bit more optimistic than the 1-pointer; ....  and then they put something out there for the most pessimistic, usually not giving it a lot of thought.

When challenged, they usually move the most-likely 1-pointer a bit to the pessimistic side, still not wanting to give up anything (prospect theory at work here). And, they are usually reluctant to be very pessimistic since that calls into question the 1-pointer (anchor bias at work here). Consequently, you get two big biases working toward a more optimistic outcome than should be expected

Third: with a little coaching, most of the bias can be overcome. There is no real hazard to a few WAG because unlike an average of values, the small probability of the tails tends to highly discount the contribution of the WAGs. What's more important than reeling a few wild WAGs is getting the 1-pointer better positioned. This not only helps the MC Sim but also helps any non-statistical estimate as well.

Bottom line: the garbage, if at the tails, doesn't count for much; the Most likely, if a wrong estimate, hurts every methodology, whether statistical or not.




Read in the library at Square Peg Consulting about these books I've written
Buy them at any online book retailer!
http://www.sqpegconsulting.com
Read my contribution to the Flashblog