Friday, December 11, 2015

Data splits 3-ways

Andrew Gelman has an interesting observation: rather than splitting data at the median into "above" and "below," putting the data into three buckets is often more effective.

Why? For one thing, the real information may not be how close you are to the median, but whether you are in the tails. A three-way split makes that identification easier (less thinking).

And continuous data can simply be put in buckets: all those messy data points from continuous variables can be reduced to +1, 0, or -1. How nice!
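To make the bucketing concrete, here is a minimal sketch in Python. The function name trichotomize and the choice of equal thirds as cut points are my assumptions for illustration; the idea above does not depend on where exactly the two cuts fall.

```python
def trichotomize(values):
    """Code each value as -1 (lower bucket), 0 (middle), or +1 (upper bucket).

    Assumption: the cut points sit at the one-third and two-thirds
    positions of the sorted sample, taken without interpolation.
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[n // 3]          # first value of the middle bucket
    high_cut = ordered[(2 * n) // 3]   # first value of the upper bucket
    codes = []
    for v in values:
        if v < low_cut:
            codes.append(-1)
        elif v >= high_cut:
            codes.append(1)
        else:
            codes.append(0)
    return codes


scores = [5, 1, 9, 3, 7, 2, 8, 4, 6]
print(trichotomize(scores))  # [0, -1, 1, -1, 1, -1, 1, 0, 0]
```

With nine scores, the three lowest get -1, the three highest get +1, and the middle three get 0.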

Of course, project managers have known the idea for a long time: either your data points fall within +/- one sigma of the mean, or they are in the upper or lower tail.
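The one-sigma version of the same idea can be sketched the same way. This is only an illustration: the function name sigma_bucket, the parameter k, and the use of the sample standard deviation are my assumptions, not anything prescribed by Andrew.

```python
import statistics


def sigma_bucket(values, k=1.0):
    """Code each value as -1, 0, or +1 relative to a band of k sigma around the mean.

    Assumption: sigma is the sample standard deviation of the data itself.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # needs at least two data points
    codes = []
    for v in values:
        if v < mu - k * sigma:
            codes.append(-1)   # lower tail
        elif v > mu + k * sigma:
            codes.append(1)    # upper tail
        else:
            codes.append(0)    # within k sigma of the mean
    return codes


durations = [10, 12, 11, 13, 9, 20, 2]
print(sigma_bucket(durations))  # [0, 0, 0, 0, 0, 1, -1]
```

Here the mean is 11 and sigma is about 5.4, so only 20 and 2 land in the tails.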

So, Andrew opines: For pure efficiency it’s still best to keep the continuous information, of course, but if you have the goal of clarity in exposition and you are willing to dichotomize, I say: trichotomize instead.

And, there's this bit of wisdom: "[Andrew] suspects that part of the motivation for dichotomizing is that people like to think deterministically.

For example, instead of giving people a continuous [performance] score and recognizing that different people have different [outcomes] at different times, just set a hard threshold: then you can characterize people deterministically and not have to think so hard about uncertainty and variation.

As Howard Wainer might have said at some point or another: people will work really hard to avoid having to think."

