Saturday, March 12, 2011

The Kepler project sample

Project Kepler, a NASA mission to survey a region of our galaxy and locate new planets that could possibly sustain life, had an initial data release from its orbiting telescope last month, about which it was written:
.... statistical tests of a sample suggest that 80 to 95 percent of the objects on it are real, as opposed to blips in the data, [although] .... new results represent only four months’ worth of data on a three-and-a-half-year project ....

Well now: what about samples? Is 'sampling' something project managers need to know about, or at least have a bit of understanding of?

I know the answer to this one: Yes, sampling is a technical practice that can translate into cost and schedule savings, and potentially reduce other risks in the project.

Here's why I think sampling belongs in many projects:
  • Often not economical to obtain and evaluate every data point in a population.
  • Usually not practical to reach every member of the population.
  • Often not possible to know every member of the population.
  • May take too much time to observe, measure, or interview every member of the population.
  • May result in too much data to handle even if every member of the population were readily available, given the expense and time that large-volume data handling requires.

Of course it's always good to check with the experts, and in the modern era (that is, since 1953) William G. Cochran has been at the pinnacle of experts.  His book, Sampling Techniques, first published in 1953 and now in its 3rd edition, is more or less the defining word on the subject.  To the reasons above, Cochran would add:
  • Greater accuracy because, he says, a few people with the highest degree of training can concentrate on getting it right with only a limited amount of data to look at, and
  • Greater scope because without sampling some problems would just be too hard to tackle.  Sampling brings them within reach.

On the other hand:
Sampling introduces risk into the project:
  • Risk that the information derived from the data sample may not accurately portray the population–there may be inadvertent exclusions, clusters, strata, or other population attributes not understood and accounted for.
  • Risk that some required information in the population may not be sampled at all; thus the sample information may be deficient and may misrepresent the true condition of the population.
  • Risk that, in other situations, the data in the sample may be outliers that misrepresent the true condition of the population, and that the sample may not be discarded when it should be.
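To make the first of these risks concrete, here is a small Python sketch. Everything in it is hypothetical: the two-strata population, the sizes, and the name `exclusion_demo` are assumptions chosen for illustration, not data from Kepler or any real project. It compares a simple random sample with a sample that inadvertently excludes one stratum.

```python
import random
from statistics import mean


def exclusion_demo(seed=42, sample_size=400):
    """Compare a simple random sample with one that misses a whole stratum.

    The population is invented: 9,000 'small' items and 1,000 'large' items.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Hypothetical population made of two strata with very different values.
    small = [rng.gauss(10, 2) for _ in range(9000)]
    large = [rng.gauss(50, 5) for _ in range(1000)]
    population = small + large

    true_mean = mean(population)

    # Simple random sample drawn from the whole population.
    random_sample_mean = mean(rng.sample(population, sample_size))

    # Sample that inadvertently reaches only the 'small' stratum.
    excluded_sample_mean = mean(rng.sample(small, sample_size))

    return true_mean, random_sample_mean, excluded_sample_mean


true_mean, random_mean, excluded_mean = exclusion_demo()
print(f"population mean:          {true_mean:.2f}")
print(f"random sample estimate:   {random_mean:.2f}")
print(f"excluded-stratum estimate: {excluded_mean:.2f}")
```

In this made-up case the sample that misses the 'large' stratum lands well below the population mean, no matter how carefully the 400 items it did reach were measured.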
There are two risk assessments to be made.
  1. “Margin of error”, referring to the estimated error in the measurement, observation, or calculation of statistics related to the sample data
  2. “Confidence level” (often loosely called the “confidence interval”), referring to the probability that true population statistics are within the range of the estimated statistics as calculated from the sample data
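As a rough illustration of how the two assessments fit together, the sketch below computes the margin of error for a sampled proportion at a chosen confidence percentage, using the usual normal approximation. The numbers (360 "real" objects out of 400 examined) and the function name `margin_of_error` are my own assumptions for the example; they are not Kepler figures.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion.

    p_hat: proportion observed in the sample (0..1)
    n: number of items in the sample
    confidence: two-sided confidence percentage expressed as a fraction
    """
    # z is the standard normal quantile that leaves (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample: 360 of 400 examined objects judged to be real.
n = 400
p_hat = 360 / n
moe = margin_of_error(p_hat, n, confidence=0.95)
print(f"estimate: {p_hat:.0%}, margin of error: {moe:.1%}")
print(f"range: {p_hat - moe:.1%} to {p_hat + moe:.1%}")
```

The approximation is only reasonable for fairly large samples and proportions that are not too close to 0 or 1; a statistician would choose the method to fit the data.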
Tensions in the project
Sampling introduces a tension between the budget managers and the risk managers. Tension is another word for risk.
  • Budget managers want to limit the cost of gathering more data than is needed–in other words, avoid oversampling–and thereby limit cost risk
  • Risk managers want to limit the impact of not having enough data--risking an error of interpretation--and thereby limit functional, feature, or performance risk.
Project policies
The risk plan customarily invokes a project management policy regarding the degree of risk that is acceptable in samples:  [Nation: this is not the land of Six Sigma!]
  • “Margin of error” is customarily accepted between 3 and 5%
  • “Confidence level” (often loosely called the “confidence interval”) is customarily a fixed percentage between 80 and 99%, most commonly 95% or 99%
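To see what such a policy costs, here is a minimal sketch of the textbook sample-size formula for estimating a proportion, n = z² p(1 − p) / e². It assumes a large population, simple random sampling, and the worst-case proportion p = 0.5; the function name `required_sample_size` is mine, not a standard one.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(margin, confidence, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    margin: acceptable margin of error as a fraction (0.05 means 5%)
    confidence: two-sided confidence percentage as a fraction (0.95 means 95%)
    p: assumed population proportion; 0.5 is the most conservative choice
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Print the sample size for each combination of policy values.
for confidence in (0.80, 0.95, 0.99):
    for margin in (0.05, 0.03):
        n = required_sample_size(margin, confidence)
        print(f"confidence {confidence:.0%}, margin {margin:.0%}: n = {n}")
```

Tightening the margin from 5% to 3% at 95% confidence raises the requirement from 385 to 1,068 items under these assumptions, which is the budget-versus-risk tension in numbers.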
My opinion: project managers should make the investment to understand what they're getting into and what the policy implications for the project really are.

Sampling protocols:
Don't try this at home.  But, given access to someone trained in statistical protocols, the risk manager designs the project's sampling protocol to support the project manager's policy objectives.

Photo: Jet Propulsion Laboratory of NASA