Tuesday, April 5, 2016

Weight of evidence

If you are into risk assessments, you may find yourself evaluating the data.

Evaluating the data usually means testing it against a hypothesis. The process usually cited in project-management chapters on risk is this: 
A hypothesis is formed. Then the evidence against the hypothesis (the observable data) is evaluated. If the evidence against it is scant, the hypothesis is assumed valid; otherwise, it is rejected.

Is this guessing? Is the hypothesis true merely because no one seems to object? After all, how much evidence is there against it? Enough? Have you taken the time to really look?

Most of us would agree: the evidence-against-the-hypothesis test does not fit the circumstances of many project situations.

There are many cases where you've come to the fork in the road; what to do? Famed baseball player Yogi Berra once posited: "When you come to the fork in the road, take it!"

In the PM context, Yogi is telling us that with no data to evaluate, either road is open. In Bayes* terms: a priori, it's 50:50 that one road holds an advantage over the other.

Weight of Evidence
Enter: weight-of-evidence**, a methodology for when there is some data, but there is still a fork in the road. In this case, we evaluate each "road" (in project terms, models or suppositions) and look at the ratio of probabilities. 
  • Specifically, the probability of model-1 being the right way to go, given the data, versus the probability of model-2, given the data.
Each such ratio, given the observable data, conditions, or circumstances, is denoted by the letter K: K equals the ratio of the two probabilities.
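As a quick sketch in Python (the function name and the example probabilities are mine, purely for illustration):

```python
# A sketch of "K" as a ratio of model probabilities, given the same data.
# The function name and the example numbers are illustrative assumptions.

def weight_of_evidence(p_model1_given_data: float, p_model2_given_data: float) -> float:
    """Return K = P(model 1 | data) / P(model 2 | data)."""
    return p_model1_given_data / p_model2_given_data

# Example: the data makes model 1 three times as probable as model 2.
k = weight_of_evidence(0.75, 0.25)
print(k)  # 3.0
```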

Philosophers and mathematicians have more or less agreed on these strength ratios ("K") and the implied strength of evidence favoring one model over the other:
  • 1 – 3: Not really worth a mention
  • 3 – 20: Positive
  • 20 – 150: Strong
  • > 150: Very strong
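A minimal Python sketch of that scale; the outer bands come from the agreed ratios above, the two middle labels ("Positive" and "Strong") follow the commonly cited scale, and the function name is my own:

```python
# A sketch mapping a strength ratio K to a verbal label. The middle
# labels ("Positive", "Strong") are the commonly cited ones; the
# function name is an assumption for this illustration.

def strength_label(k: float) -> str:
    # Evidence pulling the other way (K < 1) has the same strength as 1/K.
    if k < 1:
        k = 1 / k
    if k < 3:
        return "Not really worth a mention"
    if k < 20:
        return "Positive"
    if k < 150:
        return "Strong"
    return "Very strong"

print(strength_label(10))   # Positive
print(strength_label(0.5))  # Not really worth a mention
```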

Why form a ratio?
It's somewhat like a tug-of-war with a rope:
  • Each team (numerator team and denominator team) pulls for their side.
  • The analogy is that the strength of the pull is the strength, or weight, of evidence. Obviously, the weight favors the team that pulls the hardest. Equal weight on each side is the case of the rope not moving.
  • K is representative of the strength of the pull; K can be greater than 1 (numerator team wins), less than 1 (denominator team wins), or equal to 1 which is the equal weight case.
More data
The importance and elegance of the methodology is felt when there are several data sets—perhaps from different sources, conditions, or times—and thus there are many unique calculations of "K". 

You might find you have a set of K’s: K1 from one pair of teams, but K2 from another, and so on. What to do with the disparate K’s?

Sum the evidence
The K’s represent the comparative weight of evidence in each case. Intuitively, we know we should sum up the "evidence" somehow. But, since "K" is a ratio, we really can't sum the K’s without handling (carefully) direction.
  • That is: how would you sum the odds of 2:1 (K = 2) and 1:2 (K = 1/2, the same weight but the opposite conclusion)? We know that the weights are equal but pulling in opposite directions. Less obvious: suppose the opposing odds were 2:1 and 1:5?
Add it up
Fortunately, this was all sorted out some 70 years ago by mathematician Alan Turing. His insight: 
  • What really needs to happen is that the ratios be multiplied, such that 2:1 × 1:2 = 1. 
  • To wit: evidence in opposite directions cancels out and unity results.
But, hello! I thought we were going to sum the evidence. What's with the multiplying thing?
Ah hah! One easy way to multiply (70 years ago, to be sure) was to sum the logarithms of K.
It's just like the decibel idea in power: adding 3 dB to a power level is the same as multiplying the power by 2.
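The decibel trick can be sketched in a few lines of Python:

```python
import math

# A sketch of the log trick: multiplying K's is the same as summing
# their (base-10) logarithms, just as adding 3 dB doubles a power level.
ks = [2, 1/2]  # odds of 2:1 and 1:2, pulling in opposite directions
log_sum = sum(math.log10(k) for k in ks)
combined_k = 10 ** log_sum
print(round(combined_k, 6))  # 1.0 -- the opposing evidence cancels out
```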
Is it guessing?
You may think of it this way: when you have to guess, and all you have is some data, you can always wow the audience by intoning: "weight of evidence"!

Geeks beyond this point
Does everyone remember that the log (base 10) of 2 is about 0.3? If you do, then our example of summing odds of 2:1 and 1:2 becomes: Log(2) + Log(1/2) = 0.3 – 0.3 = 0.
Of course the anti-Log of 0 is 1, so we are back at the same result we had by intuitive reasoning.
On first examination, this might seem an unusual complication, but it's a take-off on the slide rule and the older decibel (dB) measurement of power. Both multiply by summing: a 3 dB increase in power (or volume) means the power has doubled.
An example
What if we had four sets of evidence: odds of {10:1; 2:1; 1:2; and 1:20}. What’s the weight of evidence?
Using logarithms: log(10) = 1, log(2) = 0.3, log(1/2) = -0.3, and log(1/20) = -(log(10) + log(2)) = -1.3. So:
1 + 0.3 – 0.3 – 1.3 = -0.3, or odds of about 1:2
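The same arithmetic in Python, using exact logarithms rather than the rounded 0.3:

```python
import math

# The four evidence sets above, summed as base-10 logarithms
# rather than the rounded value 0.3 for log(2).
odds = [10, 2, 1/2, 1/20]  # 10:1, 2:1, 1:2, 1:20
total = sum(math.log10(k) for k in odds)
combined = 10 ** total
print(round(total, 3))     # -0.301
print(round(combined, 3))  # 0.5, i.e. odds of about 1:2
```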
Not all that obvious
*Bayes refers to a protocol for evaluating probabilities, given some a-priori conditions, with the idea of discovering, a-posteriori, an underlying probable "truth". Such discovery depends on an opportunity to gather more data, perhaps with other conditions attached.
**A practical application of the weight-of-evidence method is cryptanalysis. It was first developed into a practical methodology in WWII by famed mathematician Alan Turing, working at storied Bletchley Park. Something of it is explained in the book "Alan Turing: The Enigma".
