Monday, May 24, 2010

All things bell shaped

Chapter 14 of the GAO Cost Estimating Manual, "Cost Risk and Uncertainty," is a good read, easily understood, and very practical in its examples.  Here's one illustration that I particularly like.  A glance shows that the repeated random throw of two dice generates a probability density function [PDF] that is roughly bell-shaped, and that tends toward a true Normal distribution as more dice are summed.
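To make the dice illustration concrete, here is a minimal Python sketch (mine, not from the GAO manual) that enumerates every outcome of a throw and tabulates the exact probability of each total. The function name `dice_sum_pmf` and the text-bar display are my own choices for illustration. Strictly speaking, a dice total is discrete, so what the sketch computes is a probability mass function rather than a density.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def dice_sum_pmf(n_dice=2, sides=6):
    """Exact probability of each total when n_dice fair dice are thrown."""
    faces = range(1, sides + 1)
    # Count how many of the equally likely outcomes produce each total.
    counts = Counter(sum(roll) for roll in product(faces, repeat=n_dice))
    outcomes = sides ** n_dice
    return {total: Fraction(c, outcomes) for total, c in sorted(counts.items())}

pmf = dice_sum_pmf()
for total, p in pmf.items():
    # Crude text histogram: bar length is proportional to probability.
    print(f"{total:2d}  {float(p):.3f}  {'#' * round(float(p) * 72)}")
```

With two dice the bars form a peaked triangle centered on 7. Calling `dice_sum_pmf(3)` or `dice_sum_pmf(4)` rounds the shoulders and lengthens the tails, which is the drift toward the bell shape.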


Statisticians call this phenomenon the Central Limit Theorem: summed over a large population, independent random occurrences tend to wash out the asymmetry and uniformity of the individual events.  A more 'natural' distribution ensues.  The name for it is the Normal distribution, more commonly: the bell curve.

Here's what it looks like to a project manager.  Notice that regardless of the distribution of cost adopted by work package managers for each individual work package, in the bigger picture at the summation of the WBS there will tend to be a bell-shaped variation in the WBS budget estimate.  In part, use of these ideas addresses the project manager's need to understand the parameters of variation in the project budget as evidenced by the estimates of the WBS.  This diagram is (again) from Chapter 14 of GAO's manual:
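As a rough sketch of the same idea in code, the Python below sums a handful of work packages whose costs follow different distributions and checks how bell-like the total is. Everything here is an assumption for illustration: the eight work packages, their dollar ranges (in $K), the mix of triangular and uniform shapes, and above all the independence of one work package from another. None of it comes from the GAO manual.

```python
import random
import statistics

# Hypothetical work packages, costs in $K: (low, most likely, high).
# A most-likely value of None means "uniform between low and high".
WORK_PACKAGES = [
    (80, 100, 160),
    (40, None, 90),
    (60, 70, 120),
    (50, None, 100),
    (90, 120, 130),
    (70, 80, 140),
    (30, None, 70),
    (100, 110, 170),
]

def draw_cost(rng, low, mode, high):
    """One random cost for a single work package."""
    if mode is None:
        return rng.uniform(low, high)
    # Note the argument order: random.triangular(low, high, mode).
    return rng.triangular(low, high, mode)

def simulate_totals(packages, trials=20_000, seed=1):
    """Sum independent draws across all work packages, once per trial."""
    rng = random.Random(seed)
    return [sum(draw_cost(rng, *wp) for wp in packages) for _ in range(trials)]

totals = simulate_totals(WORK_PACKAGES)
mean = statistics.fmean(totals)
sd = statistics.stdev(totals)
# For a Normal distribution, about 68% of outcomes lie within one sd of the mean.
within_1sd = sum(abs(t - mean) <= sd for t in totals) / len(totals)
print(f"mean={mean:.1f}  sd={sd:.1f}  within 1 sd={within_1sd:.3f}")
```

The fraction within one standard deviation is a quick bell-shape check: a single uniform distribution gives about 58%, while a Normal gives about 68%. If the work packages were correlated, the independence assumption in `simulate_totals` would no longer hold and the total could look quite different.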


If the risk analyst generates these data from a simulation, like a Monte Carlo simulation, then the numeric statistics like variance and standard deviation are usually reported, along with the cumulative probability more commonly called the "S" curve.  In the diagram, on the right side, we see the cumulative curve plotted and labeled on the vertical axis as the confidence level.  With a little inspection, you will realize that the cumulative curve is just the summation of the probabilities of the bell curve that is adjacent on the left.
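The relationship between the two curves can be sketched in a few lines of Python. This is an illustration under my own assumptions, not the GAO procedure: the total-cost samples are made up (each is the sum of eight independent uniform draws, in $K), and the bin width of 10 is arbitrary. The point is only that the S curve is the running sum of the bell curve's bin probabilities.

```python
import random
from itertools import accumulate

rng = random.Random(7)
# Hypothetical simulation output: 10,000 total-cost samples in $K.
totals = [sum(rng.uniform(50, 100) for _ in range(8)) for _ in range(10_000)]

# Histogram the samples into bins of equal width.
bin_width = 10
lo = int(min(totals) // bin_width) * bin_width
n_bins = int((max(totals) - lo) // bin_width) + 1
counts = [0] * n_bins
for t in totals:
    counts[int((t - lo) // bin_width)] += 1

pdf = [c / len(totals) for c in counts]  # bell curve: probability per bin
cdf = list(accumulate(pdf))              # S curve: running sum of the bell curve

for i, (p, c) in enumerate(zip(pdf, cdf)):
    upper = lo + (i + 1) * bin_width
    print(f"<= {upper:4d}  p={p:.3f}  confidence={c:.3f}")

# Smallest bin edge with at least 80% of simulated outcomes at or below it.
p80 = next(lo + (i + 1) * bin_width for i, c in enumerate(cdf) if c >= 0.80)
print("80% confidence budget (to the nearest bin edge):", p80)
```

Reading a budget off the S curve at a chosen confidence level, as `p80` does here, is the usual reason for plotting the cumulative curve at all.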

The GAO manual, and especially Chapter 14, has a lot more information that is well explained.  Give it a read.


6 comments:

  1. The bell curve is an idealization of reality, similar to uniform motion in mechanics, incompressible fluid in hydrodynamics, or the absolute black body in the theory of radiation.
    This distribution has excellent applications in the analysis of industrial processes, in biology, and in botany. But applying the normal distribution to problems such as life duration runs into serious difficulties.

    The fact is that the usual life span can be represented as the sum of sequential time slices. Hence the temptation to apply the central limit theorem to this sum, which necessarily leads to a normal distribution of life duration.

    But on the other hand, we know many examples of the asymmetries of life duration. This especially concerns the duration of complex projects.

    That is, on the one hand, we know many examples of the asymmetries of life duration distribution, but on the other hand, according to the central limit theorem any life duration must obey the normal distribution.

    This apparent paradox can be solved as follows. The central limit theorem is valid when the sequential time slices are independent, have the same distribution, and have finite variance. In project management these conditions are often not met, so in the general case the central limit theorem in its classical form is not applicable to the analysis of the duration of projects.

    In this case we have another version of the central limit theorem, in which the sum has a stable distribution with an asymmetrical shape (http://en.wikipedia.org/wiki/Stable_distribution).
    Details of the corresponding analysis can be found here:
    http://www.pmforum.org/library/papers/2009/PDFs/mar/Human-Effort-Dynamics-and-Schedule-Risk-Analysis.pdf
    http://www.pmforum.org/library/papers/2010/PDFs/march/FP-BARSEGHYAN.pdf
    http://www.pmforum.org/library/papers/2010/PDFs/april/FP-BARSEGHYAN.pdf
    Pavel Barseghyan

  2. Pavel: your white papers on PMForum are very insightful. I think Figure 1 in your April 2010 paper is particularly instructive.

    Re the use of the Central Limit Theorem in projects and whether projects really meet the criteria for validity: To a reasonable approximation that is valid as a day-to-day heuristic, the CLT provides some useful insight.

    The primary insight is that the choice of distribution at the work package level, which is nothing more than an educated guess, is not too important to the forecast of variations in the summary outcome. So, don't sweat that detail.

    The next most important Project Management heuristic is that in the summary, any one work package's mal-performance is not likely to dominate the results (as it might in a simple average), unless rules that constrain work package planning are not in place.

    And the last is that the CLT teaches that about as much is going to go right as go wrong. So, maintain situational awareness for opportunities to offset threats.

    However, if a project is working on the edge of feasibility, then there will be rare events, largely unpredictable, that have no (practical) statistical distribution (because they are so rare). These invalidate even the approximations that go into the CLT.

  3. Pavel,
    The "normal" distribution (Gaussian) is an indication of randomness with independent events. It is NOT an idealization of reality but a "test" for independent randomness.

    None of the items you mention involves independent random samples. Black body radiation is non-symmetrical (the Planck curve), and NCFF is a highly coupled system where the past influences the future. The test for a Gaussian is that past events do not influence future events, and that is not the case in fluid flow.

    In the project world, one would have to show there is no coupling between events - durations, costs, technical performance - before the Normal distribution can be applied.

  4. John,

    I'm meeting with some of the authors of the GAO manual at EVM World. The chart you've shown is correct ONLY if all the WBS elements are independent statistical processes!

    Wow, I have noticed this.

  5. John,
    Between the linear region and the region of rare events there is a huge non-linear region of complex but feasible projects. We need non-linear transformations and, consequently, asymmetric distributions for just this region. Competition always forces people to work in the non-linear region.

    Glen,
    The normal distribution is an idealization in the sense that it assumes independence of successive actions in project management. In real life the durations of successive actions are dependent events, and human productivities are highly controllable. Details can be found here:
    http://www.pmforum.org/library/papers/2010/PDFs/feb/FP-Barseghyan-Parkinson'sLaw.pdf
    Pavel

  6. Pavel,

    The normal distribution is a "test" of independence of the sample space. Many (most) continuous random variables are "normally distributed" if and only if they are independent and there is a "large" population of them.

    Neither is the case for project random variables - there is a small population and they are not independent.

    All of these "distribution" issues, static models as mentioned in your paper, are dispensed with when Monte Carlo Simulation is introduced. Individual probability distributions are assigned to durations, cost, productivity, etc., and then the model produces the CDF for the examined variable.

    In this way the "distribution" wars are bypassed, and all we need to know is the individual distribution of each examined variable. In the absence of this, the Triangle distribution produces results that have .85 correlation and above with the known distribution, provided there are no "really odd" shapes.

    The example from the GAO report, where various distributions are summed, only produces a Normal Distribution if the samples are large (many tens of thousands) and each sample is independent. Again, neither is the case in practice. The example must be considered notional, and possibly erroneously notional.

    The modeling of dynamic systems, at least in our domain of space and defense, has moved to Monte Carlo Markov Chains with Bayesian inference. NASA leads this effort with
    http://www.hq.nasa.gov/office/codeq/doctree/SP2009569.pdf

    I'd conjecture that human productivities are NOT highly controllable, in fact just the opposite. I'd like to see a sufficient sample of productivities in engineering or software development projects that shows (with statistical confidence) that productivity is in fact controllable, along with the units of measure of that control. Then I'd absolutely like to speak with the managers who produce such results, so we can apply them to our spaceflight avionics software development projects. "'Cause it ain't working for us," and we've been writing avionics software for decades and still can't predictably forecast cost and schedule with more than an 80% confidence.

    What would be the variance and confidence on that variance of "highly controllable?"
