*Concept 1: Centrality*Most phenomena of interest to projects, particularly naturally occurring phenomena, tend to cluster around a central value, given enough samples or examples. Obviously, this leaves out the so-called long-tail 'black swans', but project managers can go a long way if they understand that central clustering is the norm ... in effect, the default.

The measures are average and expected value. The former is applicable when the data is known; the latter when the data is probabilistic and the numerical value is not known until an event occurs. In calculating the average, each value is equally weighted; in calculating the expected value, each value is weighted by its probability.
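The distinction can be sketched in a few lines of Python. The cost figures and probabilities below are hypothetical, just to show the two weighting schemes side by side:

```python
# Average: each known value is equally weighted (by 1/N).
costs = [100, 120, 110, 130]          # hypothetical known task costs
average = sum(costs) / len(costs)

# Expected value: each possible outcome is weighted by its probability.
# Hypothetical risk event with three possible cost impacts:
outcomes = [(0, 0.5), (50, 0.3), (200, 0.2)]   # (cost, probability)
expected = sum(value * p for value, p in outcomes)

print(average)   # 115.0
print(expected)  # 55.0
```

Note that the expected value is the number to plan around even though no single occurrence of the event ever costs exactly 55.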

*Concept 2: Variation*Yes, things cluster, but that simply means that around the central value there is a range within which things are near the center, but not exactly on it.

It's more likely things are close to the center than not: that's an effect of centrality on variation.

The measures of variation are variance and standard deviation. Variance is a figure-of-merit related to the distance, or error, between a point in the range and the central value. Standard deviation is a more direct measurement of distance, having the same dimensions as the points in the range. Engineers refer to the standard deviation as the root-mean-square, or RMS value.
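A quick sketch with Python's standard library, using hypothetical task durations, shows how the three quantities relate: variance is the mean squared distance from the center, and the standard deviation is its square root, i.e. the root-mean-square of the deviations:

```python
import statistics

durations = [8, 10, 12, 9, 11]  # hypothetical task durations, in days

mean = statistics.mean(durations)       # the central value: 10 days
var = statistics.pvariance(durations)   # mean squared distance from the mean (days^2)
sd = statistics.pstdev(durations)       # square root of variance, back in days

# sd is the RMS of the deviations (x - mean), as engineers would put it:
rms = (sum((x - mean) ** 2 for x in durations) / len(durations)) ** 0.5

print(var, sd)  # 2.0 1.4142...
```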

*Concept 3: Position*Sometimes it's enough to know just the position of a data value in the range. Names associated with position are quartile and percentile, and the so-called 'Z' position.

Z is just a value in the 'standard range' divided by the standard deviation. [A 'standard range' has a '0' average, so Z measures distance from the average in units of standard deviation.] For project management purposes, the 'Z-position' extends +/- 3 units from the average or expected value for most situations.
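As a worked example (the cost figures are hypothetical), shifting to the standard range means subtracting the average first, then dividing by the standard deviation:

```python
# Hypothetical: expected project cost $500k, standard deviation $40k
mean, sd = 500, 40
actual = 580

z = (actual - mean) / sd   # distance from the average, in standard-deviation units
print(z)  # 2.0 -- well inside the +/- 3 band described above
```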

Dividing a range into 4 quartiles requires defining 3 boundary points: Q1, Q2, and Q3. Quartile is all about count, not value per se. Just rank all the values in the range in ascending order. Divide the count into quarters. Q1, Q2, and Q3 are the count values that divide the range.

If a value is in the first quartile, that means that 75% of all the values in the range are greater than the Q1 value, and by extension 75% of all the values are greater than any value in the first quartile.
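The count-based recipe above can be sketched directly. The data values are made up, and the index arithmetic below is just one of several common conventions for placing the quartile boundaries (Python's `statistics.quantiles` uses interpolation instead):

```python
# Rank all the values in ascending order, then divide the count into quarters.
values = sorted([7, 15, 36, 39, 40, 41, 42, 43, 47, 49, 52, 60])  # 12 values
n = len(values)

q1 = values[n // 4 - 1]       # 25% of the count at or below this value
q2 = values[n // 2 - 1]       # the median: half the count at or below
q3 = values[3 * n // 4 - 1]   # 75% of the count at or below

print(q1, q2, q3)  # 36 41 47
```

So here, 75% of the values exceed Q1 = 36, and every value in the first quartile (7, 15, 36) sits at or below it.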


Remember that centrality ONLY works if all the individual elements are independent probability distributions and there are a large number of them.

Rarely are risks independent. Certainly cost and schedule aren't. And I'd have a hard time finding a piece of equipment - say a spacecraft - where the risks are independent.

The number of samples needed to "tend to the mean" is still unknown to me. I'm going to be forced to dig into the stats books to find that answer, but it's not small in the sense of dozens. I'm remembering thousands.

Actually, centrality--meaning a tendency towards a central value [not absolute symmetry]--works in an environment of interdependencies and correlations; however, the resulting distribution will not be Normal.

You are correct that for the results to be a Normal distribution, all contributors must be independent.
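The independent case is easy to see in a small simulation. Below, many independent, flat (uniform) task-cost contributions are summed; even though each input has no central tendency at all, the totals cluster around a center, with roughly 68% of them inside one standard deviation. The figures here are synthetic, purely for illustration:

```python
import random

random.seed(42)

# 5000 trials, each summing 20 independent uniform(0,1) contributions.
trials = [sum(random.random() for _ in range(20)) for _ in range(5000)]

mean = sum(trials) / len(trials)   # clusters near 20 * 0.5 = 10
sd = 1.29                          # theoretical sd of the sum: sqrt(20/12)
within_1sd = sum(1 for t in trials if abs(t - mean) <= sd) / len(trials)

print(round(mean, 2))        # near 10
print(round(within_1sd, 2))  # near 0.68, the Normal one-sigma fraction
```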

But PMs are not statisticians, so 'good enough' goes a long way toward providing guidance.

As regards sampling, and the number of samples, it depends on whether you are estimating proportions or descriptive statistics, like average. And it depends on the sample design, and it depends on the underlying skewness of the population, which is not known. It also depends on the confidence interval you require as a matter of policy.

If you are sampling for descriptive statistics, there are quite valid conclusions to be drawn with as few as 30 sample values.

If you are sampling for proportions, then a few thousand sample values will be needed for 95% confidence.
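The standard sample-size formula for proportions shows where "a few thousand" comes from. Taking a +/- 3 point margin of error as an illustrative policy choice, and the worst-case proportion p = 0.5:

```python
import math

# Required sample size for estimating a proportion:
#   n = (z / E)^2 * p * (1 - p)
z = 1.96        # z-value for 95% confidence
margin = 0.03   # desired margin of error: +/- 3 percentage points
p = 0.5         # worst case: maximizes p * (1 - p)

n = math.ceil((z / margin) ** 2 * p * (1 - p))
print(n)  # 1068
```

Tightening the margin to +/- 1 point pushes the requirement toward 10,000, which is why proportion sampling is so much more expensive than estimating an average.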

See my white paper at slideshare.net/jgoodpas/project-examples-for-sampling-and-the-law-of-large-numbers