Café de la Régence: The (biggest) Problem with Statistical Significance

Monday, 27 January 2014

Statistical significance is the standard test of whether a study or experiment in science or social science provides enough evidence to support a conclusion. The basic idea is that in testing some hypothesis H1, we start by assuming a null hypothesis H0. H0 is essentially the negation of H1: for example, if the hypothesis to be tested is that eating deep-fried Mars bars increases one's risk of heart disease, then H0 is that it doesn't.

Having made this assumption, we of course wish to disprove it. To this end we conduct an experiment (or get hold of data forming a natural experiment) which tests a prediction we can make based upon H0. In the deep-fried Mars bars example, this might be that there should be no correlation between consuming deep-fried Mars bars and suffering from heart disease. If a result at least as extreme as the one we observe would have a less than 5% (or, if we're being more stringent, 1%) chance of occurring were the null hypothesis true, then we are allowed to reject H0 and conclude in favour of H1.

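To see the problem with this, consider the xkcd comic "Frequentists vs. Bayesians", in which a neutrino detector reports whether the sun has exploded, but first rolls two dice and lies if both come up six. The detector answers that yes, the sun has exploded.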

Call the evidence - in this case, the output of the neutrino detector - E. Our hypothesis to test, H1, is that the sun has exploded. H0 is that the sun has not exploded. The probability of an event or of the truth of a proposition is denoted by P(event or proposition).

The statistical significance test "works" by finding P(E, conditional upon H0 being true). However, this represents a fundamental confusion, because that isn't what we ought to be interested in: we really want to know P(H1, conditional upon E being true) - that is, the probability of the hypothesis itself, rather than that of the evidence.
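To make the test concrete, here is a minimal Python sketch of the significance calculation for the comic. It assumes the detector's only source of error is a pair of fair dice, with a false "yes" when both come up six; the names `p_value` and `ALPHA` are my own, chosen for illustration.

```python
from fractions import Fraction

# Assumption: the detector lies only when two fair dice both show six.
P_DOUBLE_SIX = Fraction(1, 6) * Fraction(1, 6)

# Conventional significance threshold (5%).
ALPHA = Fraction(5, 100)

# P(E given H0): the chance the detector says "yes" although the sun is fine.
p_value = P_DOUBLE_SIX

# The significance test rejects H0 whenever this probability falls below ALPHA.
reject_null = p_value < ALPHA

print(f"P(E given H0) = {p_value} = {float(p_value):.4f}")
print(f"Reject H0 at the 5% level? {reject_null}")
```

Note that nothing in this calculation asks how plausible an exploding sun was to begin with.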

How do we find this, though? The answer is through Bayes' Theorem, a simple mathematical equation discovered by the Reverend Thomas Bayes (1701-1761). It is as follows:

$$
P(H_1 \text{ given } E) = \frac{P'(H_1) \times P(E \text{ given } H_1)}{P'(H_1) \times P(E \text{ given } H_1) + P'(H_0) \times P(E \text{ given } H_0)}
$$

where P'(H1) and P'(H0) are our previous estimates of the probabilities of H1 and H0, that is, the probabilities we attached to them before the experiment, generally referred to as our priors. (Where do our priors come from? That's a deep and involved question, upon which I do not feel qualified to pass even the slightest comment.)
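For readers who prefer code to algebra, the theorem for two competing hypotheses can be sketched as a small Python function. The function name `posterior` and the example numbers are hypothetical, picked only to show the shape of the calculation, and the sketch assumes H0 is exactly the negation of H1 so that the two priors sum to one.

```python
def posterior(prior_h1, p_e_given_h1, p_e_given_h0):
    """Return P(H1 given E) when H0 is the negation of H1."""
    prior_h0 = 1 - prior_h1  # the two priors must sum to one
    numerator = prior_h1 * p_e_given_h1
    denominator = numerator + prior_h0 * p_e_given_h0
    return numerator / denominator


# Hypothetical numbers for illustration: a 10% prior for H1, evidence that is
# certain under H1 and has a 5% chance of turning up under H0.
print(posterior(0.10, 1.0, 0.05))
```

With these made-up numbers the evidence lifts H1 from 10% to roughly 69%, well short of certainty.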

In the sun example, then, suppose our prior for "the sun has exploded" is one in a million. Then

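$$
\begin{aligned}
P(\text{sun has exploded given } E) &= \frac{P'(H_1) \times P(E \text{ given } H_1)}{P'(H_1) \times P(E \text{ given } H_1) + P'(H_0) \times P(E \text{ given } H_0)} \\
&= \frac{\frac{1}{1{,}000{,}000} \times 1}{\frac{1}{1{,}000{,}000} \times 1 + \frac{999{,}999}{1{,}000{,}000} \times \frac{1}{36}} \\
&= \frac{36}{1{,}000{,}035}
\end{aligned}
$$

Which works out at about 36 in a million, or roughly 1 in 28,000: somewhat more than one in a million, but still very unlikely. In real life, our prior for the sun having exploded would be well under one in a million.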
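The arithmetic for the sun example can be checked exactly with Python's `fractions` module. This is only a sketch: it keeps the simplification used in the calculation that P(E given H1) is one, the helper name `posterior` is my own, and the two smaller priors are hypothetical values added to show how the answer moves as the prior shrinks.

```python
from fractions import Fraction


def posterior(prior_h1, p_e_given_h1, p_e_given_h0):
    """Return P(H1 given E) when H0 is the negation of H1."""
    numerator = prior_h1 * p_e_given_h1
    return numerator / (numerator + (1 - prior_h1) * p_e_given_h0)


p_e_given_h1 = Fraction(1)      # simplification: a "yes" is certain if the sun exploded
p_e_given_h0 = Fraction(1, 36)  # the detector gives a false "yes" one time in thirty-six

# One in a million is the prior from the text; the smaller priors are assumptions.
for prior in (Fraction(1, 10**6), Fraction(1, 10**9), Fraction(1, 10**12)):
    post = posterior(prior, p_e_given_h1, p_e_given_h0)
    print(f"prior {float(prior):.0e} -> posterior {float(post):.2e}")
```

In each case the posterior comes out at roughly 36 times the prior, so a detector reading of "yes" makes an exploded sun more credible than before, but nowhere near probable.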

The upshot is that statistical significance can give credence to ridiculously unlikely views. But the mistake involved is subtle - my guess is that most people give a less than 1% chance to the idea that another person can read their minds, and yet if this happened...