Wednesday, December 27, 2017

Big data -- a quotable snip

For years and years, PMs have been "little data" people. We plot data points, compute averages, look at 1-sigma, 2-sigma (and sometimes, 6-sigma) limits, ponder the Central Limit Theorem, and wonder about the law of large numbers.

Fair enough.

What then is "big data" if not simply more numbers, more data points?
Andrew Gelman -- eminent authority in the statistical analysis field -- has this pithy answer:
"'Big Data' is more than a slogan; it is our modern world in which we learn by combining information from diverse sources of varying quality."
The biggie, of course, is the key phrase "diverse sources of varying quality". That certainly fits the project world -- we are not, after all, operations or manufacturing or distribution where processes are well defined and the data is all about process quality.

So what do we do when the data quality varies?
  • Get more, to see if there is a discernible and useful pattern
  • Use Bayesian techniques to refine a hypothesis based on observations and feedback
  • Throw out the obviously bad stuff, though try not to throw it out just because it's an inconvenient counter-point
Of these, Bayesian techniques are the most powerful.
Not familiar with Bayes? Search this blog site; you'll find a lot of material here.
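
For a feel of what "refine a hypothesis based on observations" can look like in practice, here is a minimal sketch of one common Bayesian technique, a Beta-Binomial update, in Python. Everything specific in it is an assumption for illustration only: the `BetaBelief` class, the Beta(2, 2) starting belief, and the made-up list of on-time/late task outcomes are not from any real project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical helper: a Beta(alpha, beta) belief about a success rate."""
    alpha: float  # pseudo-count of successes
    beta: float   # pseudo-count of failures

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: add observed counts to the prior pseudo-counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        # Expected success rate under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Assumed prior: weak belief that tasks finish on time about half the time.
prior = BetaBelief(alpha=2.0, beta=2.0)

# Illustrative (invented) observations: True = task finished on time.
observed = [True, True, False, True, False, True, True, True]
successes = sum(observed)
failures = len(observed) - successes

posterior = prior.update(successes, failures)

print(f"prior mean:     {prior.mean:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

The prior here plays the role of the starting hypothesis, and each batch of observations nudges it. Because the update just adds counts, feeding in more data later (the first bullet above) is the same call again on the posterior.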

