Friday, July 21, 2017

Small data

I've written before that the PMO is the world of 1-sigma; 6-sigma need not apply. Why so? Not enough data, to wit: small data.

Small data drives most projects; after all, we're not in a production environment. Small data is why we approximate, but approximation is not all bad. You can derive a lot of results from approximation.

Sometimes small data is really small. Sometimes, we only have one observation; only one data point. Other times, perhaps a handful at best.

How do we make decisions, form estimates, and work effectively with small data? (Aren't we told all the magic is in Big Data?)

Consider this estimating or reasoning scenario:
First, an observation: "Well, look at that! Would you believe that? How likely is that?"
Second, reasoning backward: "How could that have happened? What would have been the circumstances, initial conditions, and influences?"
Third, a hypothesis shaped by experience: "Well, if 'this or that' (aka, hypothesis) were the situation, then I can see how the observed outcome might have occurred."
Fourth, wonderment about the hypothesis: "I wonder how likely 'this or that' is?"
Fifth, hypothesis married to observation: The certainty of the next outcome is influenced by both likelihoods: how likely is the hypothesis to be true, and how likely is the hypothesis -- if it is true -- to produce the outcome?

If you've ever gone through such a thought process, then you've followed Bayes Rule, and you reason like a Bayesian!
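
For those who like to see the arithmetic, here is a minimal sketch of that thought process for a single hypothesis and a single observation. The function name and all the numbers are made up for illustration; they are assumptions, not data from any project.

```python
def posterior(prior, p_obs_if_true, p_obs_if_false):
    """Bayes Rule for one hypothesis H and one observation.

    prior          -- how likely we thought H was before the observation
    p_obs_if_true  -- how likely the observation is if H is true
    p_obs_if_false -- how likely the observation is if H is false
    """
    # Total probability of seeing the observation, whether or not H is true.
    evidence = p_obs_if_true * prior + p_obs_if_false * (1.0 - prior)
    return p_obs_if_true * prior / evidence


# Hypothetical numbers: H looked 30% likely beforehand; the observation
# turns up 80% of the time when H is true and 20% of the time when it is not.
updated = posterior(prior=0.30, p_obs_if_true=0.80, p_obs_if_false=0.20)
print(round(updated, 3))
```

With these assumed inputs, one observation moves the hypothesis from 30% to roughly 63%: both likelihoods from the fifth step show up in the calculation.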

And, that's a good thing. Bayes Rule is for the small data crowd. It's how we reason with all the uncertainty of only having a few data points. The key is this: to have sufficient prior knowledge, experience, and judgment to form a likely hypothesis that could conceivably match our observations.

In Bayes-speak, this is called having an "informed prior". With an informed prior, we can synthesize the conditional likelihoods of hypothesis and outcome. And, with each outcome, we can improve upon, or modify, the hypothesis, tuning it, as it were, for the specifics of our project.
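
The "tuning with each outcome" idea can be sketched as a loop: yesterday's result becomes today's prior. This is only an illustration under assumed numbers; the starting belief of 0.70, the two likelihoods, and the list of outcomes are all hypothetical.

```python
def tune(belief, outcomes, p_hit_if_true, p_hit_if_false):
    """Update belief in a hypothesis after each observed outcome.

    outcomes is a list of booleans: True if the outcome the hypothesis
    predicts was observed, False if it was not.
    Returns the belief after every outcome, in order.
    """
    history = []
    for hit in outcomes:
        # Likelihood of what we actually saw, under each possibility.
        like_true = p_hit_if_true if hit else 1.0 - p_hit_if_true
        like_false = p_hit_if_false if hit else 1.0 - p_hit_if_false
        evidence = like_true * belief + like_false * (1.0 - belief)
        belief = like_true * belief / evidence  # posterior becomes next prior
        history.append(belief)
    return history


# Hypothetical informed prior of 0.70, followed by a hit and then a miss.
history = tune(0.70, [True, False], p_hit_if_true=0.90, p_hit_if_false=0.40)
print([round(b, 3) for b in history])
```

In this made-up run the hit raises the belief and the miss pulls it back down, which is the point: a handful of data points is enough to keep adjusting the hypothesis.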

But, of course, we may be in uncharted territory. What about when we have no experience to work from? We could still imagine hypotheses -- probably more than one -- but now we are working with "uninformed priors". With no knowledge to favor a hypothesis over its alternative, we can rate it no better than 50-50 to start, and let the observations move it from there.
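
As a rough sketch of the uninformed case: give every imagined hypothesis an equal share to start, then let one observation reweight them. The hypothesis labels and likelihood values below are hypothetical, chosen only to show the mechanics.

```python
def uniform_prior(hypotheses):
    """No knowledge: every hypothesis gets the same starting weight."""
    return {h: 1.0 / len(hypotheses) for h in hypotheses}


def reweight(prior, likelihoods):
    """Apply Bayes Rule across competing hypotheses for one observation.

    likelihoods[h] is how likely the observation is if hypothesis h is true.
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: weight / total for h, weight in unnormalized.items()}


# Two hypothetical hypotheses start at 50-50.
prior = uniform_prior(["A", "B"])
# Assumed likelihoods: the observation is three times as likely under A.
after = reweight(prior, {"A": 0.9, "B": 0.3})
print(prior, after)
```

With two hypotheses the starting point is 50-50; with more than two, each equal share is smaller still. Under these assumed likelihoods, a single observation is enough to shift the split to 75-25.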

Bottom line: Bayes Rule rules!
