## Tuesday, May 3, 2011

### Confidence interval in 7 deadly sins

Mike Cohn, the guru at Mountain Goat Software, recently gave a webinar presentation to a bunch of PMI folks entitled "Agile and the Seven Deadly Sins of Project Management" [just click on the link for a free copy of the charts from Mountain Goat]

Overall, an informative presentation

In an explanation of how agile fights information opaqueness, Mike presented a slide with a bar chart of team velocities and announced a 'confidence interval' as the main takeaway.

Gasp!.... I was shocked! shocked! to hear statistics in an agile discussion; sounds so much like management--project management at that.  But, it's easy to tell that Mike is pragmatic, and confidence intervals are nothing if not practical.

Fair enough .... but actually no explanation was given as to what a confidence interval entails.  I'll correct that omission here.

First, an interval of what?  Would you believe the possible value of a random variable?  And which would that be?  Answer: the sample average of velocity, call it V-bar.  And V-bar, being a random variable, has a distribution that prescribes how likely any particular value of V-bar is to fall into the interval of interest, ie, the confidence interval.

Second, we don't actually know the distribution of V-bar and we don't know the distribution of the population V (velocities), so we can't know what the next V is going to be, or its likelihood.  But, we know (from V-bar) an estimate of the population (V) mean.  Thus, we can use V-bar as an estimating parameter of velocity, even though V-bar does not predict the next velocity value. (Example: average team throughput = V-bar x input units, like story points or ideal days)
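As a small sketch of V-bar as a planning parameter — the velocity values here are hypothetical stand-ins chosen for illustration, not the actual data behind Mike's chart:

```python
import statistics

# Hypothetical sprint velocities (story points per sprint) -- stand-ins
# for the team history in the chart, not the actual data.
velocities = [37, 37, 36, 35, 34, 33, 32, 28, 25]

# The sample average V-bar is our estimator of the population mean.
v_bar = statistics.mean(velocities)

# V-bar doesn't predict the NEXT sprint's velocity, but it supports a
# throughput estimate: expected story points over a planning horizon.
sprints = 4
expected_throughput = v_bar * sprints

print(v_bar)                # 33
print(expected_throughput)  # 132
```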

Third, since we don't know, and it's not economic to find out what the distribution of V-bar really is, it's customary to model it with a distribution that has been tried and proven for this purpose--the T distribution.

The T-distribution is somewhat like a bell-shaped distribution, except that T has fatter tails for small values of the degrees-of-freedom parameter N-1, where N is the count of the values in the sample.
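To see the fat tails concretely, here's a sketch comparing the Student's t density (the standard formula, written out with math.gamma) against the standard normal density at a point out in the tail:

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

# Three standard units out, the t with N-1 = 8 degrees of freedom puts
# roughly three times as much density in the tail as the normal does:
print(round(normal_pdf(3.0), 4))  # 0.0044
print(round(t_pdf(3.0, 8), 4))    # 0.013
```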

So what are the chances for V-bar, and how do you figure that out from the data given in Mike's chart?

I've reproduced my version of Mike's chart below; there are 9 velocity metrics ranging from about 25 to 37:

To calculate the confidence interval, some iteration is required.  It's typical to first pick a level of confidence, say 95%, and then, by use of a formula and a set of 't' tables from the T distribution, calculate the corresponding interval.  If the results are not satisfying, a new pick of parameters may be required.

Here are the steps:
• Calculate the sample average V-bar, in this case 33, and the sample standard deviation, 4.1.  Formulas in Excel will give you these figures from the 9 velocity points in the chart above.
• Look up the 't' value in a T-distribution table for N-1.  N in this case is 9.
• Pick out the 't' value for 95% confidence [in t tables, it is customary to look up a parameter labeled alpha; for 95% confidence, alpha = 0.05], in this case: 2.306 [there are formulas in Excel for this also]
You'll get something like this:

• Calculate the interval around the center point of V-bar:
+/- t * sample standard deviation / sqrt(N)
+/- 2.306 * 4.1 / sqrt(9)
+/- 3.2
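The steps above can be sketched end to end.  The nine velocity values here are hypothetical numbers chosen to match the chart's summary statistics (mean 33, standard deviation about 4.1), not the actual data:

```python
import math
import statistics

# Hypothetical velocities matching the chart's summary stats, not the
# actual data behind Mike's chart.
velocities = [37, 37, 36, 35, 34, 33, 32, 28, 25]

n = len(velocities)                   # N = 9
v_bar = statistics.mean(velocities)   # sample average: 33
s = statistics.stdev(velocities)      # sample standard deviation: ~4.1

# Two-tailed 't' value for 95% confidence with N-1 = 8 degrees of
# freedom, read from a t table (or Excel's TINV(0.05, 8)).
t_95 = 2.306

margin = t_95 * s / math.sqrt(n)
print(v_bar, round(s, 1), round(margin, 1))  # 33 4.1 3.2
```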

With just a little inspection of the formula above and the t-tables, you'll discover that the interval gets wider as alpha is picked to be smaller [higher confidence].  In the limit, to have 100% confidence in the interval, the interval would have to be very wide to cover every possible case conceivable.
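A quick sketch of that widening, using the sample statistics from the example (s = 4.1, N = 9) and two-tailed t-table values for 8 degrees of freedom:

```python
import math

s, n = 4.1, 9   # sample standard deviation and sample size from the post

# Two-tailed 't' values at N-1 = 8 degrees of freedom, from a t table.
t_values = {"90%": 1.860, "95%": 2.306, "99%": 3.355}

# Margin of error at each confidence level: higher confidence
# (smaller alpha) means a wider interval around V-bar.
margins = {conf: round(t * s / math.sqrt(n), 1) for conf, t in t_values.items()}
print(margins)  # {'90%': 2.5, '95%': 3.2, '99%': 4.6}
```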

Values in the velocity chart outside the interval are outside the quality limits of 95% confidence.

For reference, here's the model of V-bar, specifically the T distribution with N-1 = 8:

Need more?  Check out these two references:

And, check this out at the Khan Academy.

1. My take on the 7 sins

2. Hi John,

regarding the computation of the confidence interval around V-bar:
- The T-value is 2.306, not 2.36 (8 degrees of freedom, two-tailed, p = 0.975)
- sqrt(9), not sqrt(8), because the margin of error depends on the sample size, not on the degrees of freedom

More important than the calculation is checking the necessary conditions:
- Are the 'velocities' metric variables?
If not, you can't compute the average, only the median, and you are in trouble...
- Are the 'velocities' normally distributed?
If not, you can't use the T-distribution regardless whether "it's customary to model it with a distribution that has been tried and proven for this purpose(...)". With sample size 9 the central limit theorem is no remedy either...

Thank you for writing such an interesting blog.

Best Regards,

Winfried

3. Winfried: thanks for the two catches: yes, the N-1 should have been N; it is correct in the linked slideshare paper, but it got typo'd in this posting.

And, yes, the 2-tail T value is 2.306. I used a 2 tail function in the spreadsheet, but when I set up the DF in the excel formula, I put in a cell value for N-1. Then, sometime later, when I went to use the table, I forgot the formula already handled N-1 and I mentally calculated 8 rather than 9. That created an error of accounting for the -1 twice.

I have edited in the corrections.

Re the V values: they are continuous data, so calculations can be made.

Re the Normal assumption for the T distribution: strictly speaking, you're correct, but the way I posed it is often 'good enough' for project management purposes, especially if the population data is not too skewed, as this is not.

A lack of Normal distribution in the population from which the sample is taken will generally widen the interval.

And, a lack of Normality violates the independence of mean and variance that is a feature of the Normal distribution. In turn, this violates one assumption about T, namely that the component Z and chi-sq variables will be independent. This is likely violated in this case, as in most cases encountered in PM. The V values in the sample will be correlated to some extent by a common environment, shared team members, common practices, etc.

Nevertheless, I still contend that over a practical range of need, the T is 'good enough' as a model for an unknown distribution, since at least the chi-sq component of T does incorporate the effects of the sample variance.