Thursday, October 1, 2020

All is not a Bell Curve



It's a bell, unless it's not
When approaching anything statistical, nearly all of us picture the bell-shaped distribution right away. And we know the average outcome is the value at the peak of the curve.

Why is it so useful that it's the default go-to? Because many, if not most, natural phenomena with a bit of randomness tend to have a "central tendency", or preferred value. In the absence of outside influence, random outcomes tend to cluster around the center, giving rise to symmetry about the central value. Defaulting to the bell shape really isn't lazy thinking; in fact, it's a useful default when there is a paucity of data.

Some caution required: some useful stuff in projects is not bell-shaped. Yes, the bell shape does serve as a most useful surrogate for the probable patterns of complex systems, but no: the bell-shaped distribution is not the be-all and end-all.

But if no central tendency?
Lots of important stuff that projects use every day has no central tendency and no bell-curve distribution. Perhaps the most common and useful is the Pareto distribution. Point of fact: the Pareto concept is just too important to be ignored, even by the "bell-thinkers".

The Pareto distribution, which gives rise to the 80/20 rule, and its close cousin, the Exponential distribution, are the mathematical underpinning for understanding many project events for which there's no average with symmetrical boundaries; in other words, no central tendency.
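To make the 80/20 connection concrete, here is a minimal sketch. It relies on a standard property of the Pareto distribution: with shape parameter alpha greater than 1, the largest fraction p of items accounts for a share p^(1 - 1/alpha) of the total. The function name `top_share` and the constant `ALPHA_80_20` are my own labels for illustration, not anything from a library.

```python
import math

def top_share(alpha: float, top_fraction: float) -> float:
    """Share of the total held by the largest `top_fraction` of items
    under a Pareto distribution with shape `alpha` (requires alpha > 1)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1, so shares are undefined")
    return top_fraction ** (1 - 1 / alpha)

# The shape that reproduces the 80/20 split exactly: log(5) / log(4), about 1.16.
ALPHA_80_20 = math.log(5) / math.log(4)

print(round(top_share(ALPHA_80_20, 0.20), 3))  # share held by the top 20%: 0.8
print(round(top_share(ALPHA_80_20, 0.01), 3))  # share held by the top 1%
```

Note that at this shape the variance is infinite (a Pareto needs alpha greater than 2 for a finite variance), which is one way to see why "average plus or minus a standard deviation" is a poor summary here.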

Jurgen Appelo, an agile business consultant, cites as an example of the "not-a-bell-phenomenon" the nature of customer requirements. His assertion:
The assumption people make is that, when considering change requests or feature requests from customers, they can identify the “average” size of such requests, and calculate “standard” deviations to either side. It is an assumption (and mistake)...  Customer demand is, by nature, an non-linear thing. If you assume that customer demand has an average, based on a limited sample of earlier events, you will inevitably be surprised that some future requests are outside of your expected range.

Average is often not really an average
In an earlier posting, I went at this a different way, linking to a paper on the seven dangers in averages. Perhaps that's worth a re-read.

So far, so good.  BUT.....

What's next to happen?
A lot of stuff that is important to the project manager is not a repetitive event that clusters around an average. The question becomes: what's the most likely "next event"? Three distributions that address the "what's next" question are these:

  • The Pareto histogram [commonly used for evaluating low frequency-high impact events in the context of many other small impact events],
  • The Exponential Distribution [commonly used for evaluating system device failure probabilities], and
  • The Poisson Distribution [commonly used for evaluating arrival rates, like the arrival rate of new requirements]
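As a rough sketch of how the last two answer the "what's next" question, the snippet below computes Poisson arrival probabilities and the Exponential probability that the next event shows up within a given time. The rate of 3 new requirements per sprint is a made-up number for illustration, and the function names are my own.

```python
import math

def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k arrivals in one interval, given the average rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def exponential_cdf(t: float, rate: float) -> float:
    """Probability that the next arrival occurs within time t (in intervals)."""
    return 1.0 - math.exp(-rate * t)

RATE = 3.0  # hypothetical: 3 new requirements per sprint, on average

# Chance of seeing 0, 1, 2, ... new requirements in a single sprint.
for k in range(6):
    print(k, round(poisson_pmf(k, RATE), 3))

# Chance that the next new requirement arrives within the first half of a sprint.
print(round(exponential_cdf(0.5, RATE), 3))
```

The two views are consistent: the Poisson probability of zero arrivals in a sprint equals the Exponential probability that the wait for the next arrival exceeds one sprint.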


Even so, many "next events" do cluster
But project managers are concerned with the collective effects of dozens, or hundreds of dozens, of work packages, and a longer time frame, even if practicing in an Agile environment. Regardless of the single-event distribution of the next thing down the road, the collective performance will tend towards a symmetrically distributed central value.
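A small simulation can illustrate this. The sketch below adds up independent Exponential events and measures how lopsided the totals are using sample skewness (0 means symmetric). For a sum of n independent, identical Exponential events the theoretical skewness is 2 divided by the square root of n, so it shrinks as n grows. The trial count, seed, and function names are arbitrary choices of mine, and the independence assumption is doing real work here.

```python
import random
import statistics

def sample_skewness(values):
    """Third standardized moment: 0 for a symmetric sample, positive for a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def sum_of_exponentials(n_events, rng, rate=1.0):
    """Total of n_events independent Exponential draws (e.g. work-package durations)."""
    return sum(rng.expovariate(rate) for _ in range(n_events))

def skewness_of_sums(n_events, trials=20000, seed=1):
    """Simulate many totals of n_events Exponential draws and return their skewness."""
    rng = random.Random(seed)
    totals = [sum_of_exponentials(n_events, rng) for _ in range(trials)]
    return sample_skewness(totals)

# Compare simulated skewness with the theoretical 2 / sqrt(n).
for n in (1, 2, 10, 50):
    print(n, round(skewness_of_sums(n), 2), "theory:", round(2 / n ** 0.5, 2))
```

Even a handful of events pulls the total noticeably towards symmetry, though some right-hand skew remains until the count gets large.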

For example, I've copied a picture from a statistics text I have to show how fast the central tendency begins.  Here is just the sum of two events with Exponential distributions [see bottom left above for the single event]:

Good enough
For project managers, central tendency is a 'good enough' working model that simplifies visualizing the project context.

The Normal curve is a common surrogate for the collective performance. Though a statistician will tell you it's rare that any practical project will have the conditions present for a truly Normal distribution, again: it's good enough to assume a bell-shaped symmetric curve and press on.





2 comments:

  1. The Bell Curve can ONLY happen when the underlying process is Independent and Identically Distributed (IID), which is only the case in a textbook example

  2. Glen: quite right re the textbook. So, in the real world where independence is rare, lack of independence usually flattens the curve a bit, may introduce some lack of symmetry, and flares the tails. Nonetheless, the approximations to central clustering may be "good enough" for PM purposes.
