Truth, Lies & Statistics
When it comes to storytelling, we have a problem.
It’s not our fault, though: as human beings we are hard-wired from birth to look for patterns and explain why they happen. This problem doesn’t go away when we grow up; it becomes worse the more intelligent we think we are. We convince ourselves that now we are older, wiser, smarter, our conclusions are closer to the mark than when we were younger (the faster the wind blows the faster the windmill blades turn, not the other way around).
Even really smart people see a pattern and insist on putting an explanation to it, even when they don’t have enough information to reach such a conclusion. They can’t help it.
This is the thing about being human. We seek explanation for the events that happen around us. If something defies logic, we try to find a reason why it might make sense. If something doesn’t add up, we make it up.
Ever heard the Latin expression Post Hoc, Ergo Propter Hoc, meaning ‘After this, therefore because of this’? It lies behind the saying ‘Correlation Does Not Imply Causation’, and is known as the Post Hoc Fallacy. It’s a very familiar trap that we all fall into from time to time: when things are observed to happen in sequence, we infer that the thing that happened first must have caused the thing that happened next.
The Post Hoc Fallacy is what causes a football manager to only wear purple socks on match days. He once wore them at a match and his team won. Obviously, it was the socks that did it. Now he fears that if he doesn’t wear them to a match the team might lose. Damn those stinky purple socks (he also daren’t wash them for fear of the magic pixie dust washing out).
Post Hoc is also what makes rain men indispensable to the tribe: the tribe believes that its rain man can make it rain. Spotting the clouds brewing in the distance, the rain man dances until it pours down. It doesn’t usually take more than three or four days of dancing until the inevitable happens. “Rain man dance, water fall from sky”. It’s just a good job for the rain man that the Indians couldn’t speak Latin, otherwise he’d have been in real trouble…
For a humorous view of the Post Hoc Fallacy, let’s take a look at Pastafarianism. It’s all the rage these days. Not heard of it? It’s one of the newest and fastest growing religions on the block. Pastafarian Sparrowism, to give it its full title, is a ‘vibrant religion that seeks to bring the Flying Spaghetti Monster’s fleeting affection to all of us, through the life of His Prophet, Captain Jack Sparrow’. Seriously, they’re not joking. Well, actually, they are. They promote a light-hearted view of religion and oppose the teaching of intelligent design and creationism in public schools. They also maintain that pirates are the original Pastafarians.
In an effort to illustrate that correlation does not imply causation, the founder, Bobby Henderson, presented the argument that global warming is a direct effect of the shrinking number of pirates since the 1800s, and accompanied it with this graph:
Pirates Caused Global Warming. Honest...
Wow, look at that straight line, I hear you all say – there’s clearly a correlation between the decline in the numbers of pirates and the rise in global temperatures, so there just must be a causal connection here, mustn’t there? Yup, you’ve all just fallen for the Post Hoc Fallacy (I just knew you would).
Just because there is a straight line on the graph it doesn’t necessarily follow that one thing caused the other, particularly when you’ve grabbed two seemingly unconnected variables at random and stuck them together to see whether there might be some sort of tenuous correlation between them. In the case of pirates and global warming, take a closer look at the labels on the x-axis. Notice something strange? Apart from the fact that the proportions of neighbouring data points are all out of whack, there is also the issue that a couple of them have been humorously disordered to deliberately deceive.
I don’t know about you, but I’m a believer! As soon as I’ve finished writing this book I’m giving up stats for a life as a pirate on the open seas. I’ll stop global warming if it’s the last thing I do.
It probably will be…
If you look online there are all sorts of humorous graphs that illustrate the Post Hoc Fallacy. Over the past 20 years or so, the anti-vaccine movement has grown hugely, particularly in the US, and all sorts of spurious correlations have been ‘discovered’ that ‘prove’ a causal link between vaccination programmes and autism. At the same time, to debunk the most crackpot of the theories, other – equally ridiculous – correlations have popped up too.
One published example showed the correlation between sales of organic food in the US and diagnoses of autism:
Organic Food Causes Autism. Oh My...
There is a very close correlation between the pair of plot lines, accompanied by a very large r-value (close to 1) and a very small p-value (close to 0). The suggestion is that – if we trust that correlation does imply causation – organic food correlates more closely with autism than anything any other current theory proposes, so it must be the cause. Except that correlation does not necessarily imply causation, and organic food does not cause autism. That would be ridiculous. And that is the whole point of these graphs. All you need to do is find any pair of variables that increase over the same time period, plot them on a graph with the same x-axis and different y-axes, adjust the y-axis scales until the plot lines coalesce, and – BOOM – correlation! If, by some magic of coincidence and fate, there is a statistical correlation, then publish the p-value that goes along with it as additional proof. This shows that the correlation exists, but it does not prove that one thing causes the other. It might, but then again it might not…
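If you fancy trying the recipe yourself, here is a minimal sketch in Python. The numbers are invented purely for illustration (they are not the data behind the graph), and the names `pearson_r`, `organic_sales` and `diagnoses` are my own. All it assumes is two series that both happen to climb over the same ten years.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented numbers, NOT real data: two series that both rise year on year.
organic_sales = [5.0, 6.1, 7.4, 8.6, 10.2, 11.9, 13.8, 15.6, 17.9, 20.1]
diagnoses = [110, 128, 150, 171, 199, 226, 259, 288, 325, 362]

r = pearson_r(organic_sales, diagnoses)

# Stretching the y-axis (here, multiplying by 1000) does not change r at all.
r_rescaled = pearson_r(organic_sales, [d * 1000 for d in diagnoses])

print(f"r = {r:.3f}, after rescaling r = {r_rescaled:.3f}")
```

The r-value comes out very close to 1 even though nothing in the code connects the two lists. Rescaling one series leaves r untouched, which is why fiddling with the y-axes is pure cosmetics: it makes the lines overlap on the page without adding a shred of evidence.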
I also quite enjoyed the correlation that proved that Mexican lemons are a major cause of deaths on US roads. Wait, what? I must have missed the news that day – Mexican lemons are killing Americans? You bet!
Take a look at a plot of the number of fresh lemons imported into the USA from Mexico versus the total fatality rate on US highways between 1996 and 2000:
Mexican Lemons Kill Americans!
My, my, just look at the R² value – it really must be true. Although the graph seems to be telling us that the more Mexican lemons there are in the US the fewer road deaths there are, the inescapable conclusion is that MEXICAN LEMONS KILL AMERICANS! What should we do about it? Should we import more Mexican lemons (the correlation tells us that this is what we should do)? Or should we ban Mexican lemons altogether? After all, if there are no Mexican lemons on the streets then they can’t kill any more Americans.
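One reason the headline and the graph can point in opposite directions is that, for a straight-line fit, R² is just the correlation coefficient squared, so the sign disappears. Here is a small sketch, again with invented numbers rather than the real import and fatality figures, and with hypothetical names (`lemon_imports`, `fatality_rate`):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented numbers, NOT real data: imports go up while the fatality rate goes down.
lemon_imports = [100, 200, 300, 400, 500]
fatality_rate = [16.0, 15.8, 15.5, 15.3, 15.0]

r = pearson_r(lemon_imports, fatality_rate)
r_squared = r ** 2  # for a simple straight-line fit, R² equals r squared

print(f"r = {r:.3f}, R squared = {r_squared:.3f}")
```

Here r is negative (more lemons, fewer deaths) while R² is large and positive. An impressive R² on its own tells you neither the direction of the relationship nor, of course, whether lemons have anything to do with it.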
What utter tosh! I don’t care if there is a correlation, there is nothing to suggest that lemons cause accidents. If there were, don’t you think that lemons would be causing accidents on Mexican roads before the trucks crossed into the US? What about Sicilian lemons? Do they cause road deaths in Italy and across Europe?
Oh, the power of correlations. As long as your audience doesn’t understand that correlation does not necessarily imply causation you can make them believe pretty much anything.