Statistics is Dead – Long Live Data Science…
I keep hearing Data Scientists say that ‘Statistics is Dead’, and they even have big debates about it attended by the great and the good of Data Science.
Interestingly, there seem to be very few actual statisticians at these debates.
So why do Data Scientists think that stats is dead?
Where does the notion that there is no longer any need for statistical analysis come from?
And are they right?
Is statistics dead or is it just pining for the fjords?
I guess we should really start at the beginning by asking the question ‘What Is Statistics?’ I’ve already written a separate blog post on this.
Briefly, what makes statistics unique and a distinct branch of mathematics is that statistics is the study of the uncertainty of data.
So let’s look at this logically. If Data Scientists are correct (well, at least some of them) and statistics is dead, then either (1) we don’t need to quantify the uncertainty or (2) we have better tools than statistics to measure it.
Quantifying the Uncertainty in Data
Why would we no longer have any need to measure and control the uncertainty in our data?
Have we discovered some new way of observing, collecting, collating and analysing our data that is so amazing that we no longer have uncertainty?
I don’t believe so and, as far as I can tell, with the explosion of data that we’re experiencing – the amount of data that currently exists doubles every 18 months – the level of uncertainty in the data is on the increase.
So we must have better tools than statistics to quantify the uncertainty, then?
Well, no. It may be true that most statistical measures were developed decades ago when ‘Big Data’ just didn’t exist, and that the ‘old’ statistical tests often creak at the hinges when faced with enormous volumes of data, but there simply isn’t a better way of measuring uncertainty than with statistics – at least not yet, anyway.
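To make that concrete, here is a minimal sketch in Python of one standard way to put a number on uncertainty: a percentile bootstrap confidence interval for a mean. The two datasets and the function name `bootstrap_mean_ci` are made up for illustration and don't come from any real analysis. Both samples have the same average, but the interval around that average is very different.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Return (mean, lower, upper) for the mean of `sample`.

    Uses a simple percentile bootstrap: resample the data with
    replacement many times and look at the spread of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return statistics.fmean(sample), lower, upper


# Hypothetical data: 100 tightly clustered readings versus 5 scattered ones.
# Both have a mean of 10.
tight_sample = [10.1, 9.9, 10.0, 10.2, 9.8] * 20
noisy_sample = [2.0, 18.0, 10.0, 25.0, -5.0]

for name, data in [("tight", tight_sample), ("noisy", noisy_sample)]:
    mean, lower, upper = bootstrap_mean_ci(data)
    print(f"{name}: mean={mean:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

The point is not this particular method. Two datasets can give the same ‘answer’ while deserving very different levels of trust, and it takes a statistical tool to tell them apart.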
So why is it that many Data Scientists are insistent that there is no place for statistics in the 21st Century?
Well, I guess if it’s not statistics that’s the problem, there must be something wrong with Data Science.
So let’s have a heated debate...
What is Data Science?
Nobody seems to be able to come up with a firm definition of what Data Science is.
Some believe that Data Science is just a sexed-up term for statistics, whilst others suggest that it is an alternative name for ‘Business Intelligence’. Some claim that Data Science is all about the creation of data products to be able to analyse the incredible amounts of data that we’re faced with.
I don’t disagree with any of these, but suggest that maybe all these definitions are a small part of a much bigger beast.
To get a better understanding of Data Science it might be easier to look at what Data Scientists do rather than what they are.
Data Science is all about extracting knowledge from data (I think just about everyone agrees with this very vague description), and it incorporates many diverse skills, such as mathematics, statistics, artificial intelligence, computer programming, visualisation, image analysis, and much more.
It is the last bit, the ‘much more’, that I think defines a Data Scientist more than the previous bits. In my view, if you want to be an expert Data Scientist in Business, Medicine or Engineering then the biggest skill you’ll need will be in Business, Medicine or Engineering. Ally that with a combination of some or all of the other skills and you’ll be well on your way to being in great demand by the top dogs in your field.
In other words, if you want to call yourself a Data Scientist you really do need to be an expert in your field as well as having some of the other listed skills.
Are Computer Programmers Data Scientists?
On the other hand – as seems to be happening in Universities here in the UK and over the pond in the good old US of A – there are Data Science courses full of computer programmers that are learning how to handle data, use Hadoop and R, program in Python and plug their data into Artificial Neural Networks.
It seems that we’re creating a generation of Computer Programmers that, with the addition of a few extra tools on their CV, claim to be expert Data Scientists.
I think we’re in dangerous territory here.
It’s easy to learn how to use a few tools, but much, much harder to use those tools intelligently to extract valuable, actionable information in a specialised field.
If you have little/no medical knowledge, how do you know which data outcomes are valuable?
If you’re not an expert in business, then how do you know which insights should be acted upon to make sound business decisions, and which should be ignored?
Plug-And-Play Data Analysis
This, to me, is the crux of the problem. Many of the current crop of Data Scientists – talented computer programmers though they may be – see Data Science as an exercise in plug-and-play.
Plug your dataset into tool A and you get some descriptions of your data. Plug it into tool B and you get a visualisation.
Want predictions? Great – just use tool C.
Statistics, though, seems to be lagging behind in the Data Science revolution. There aren’t nearly as many automated statistical tools as there are visualisation tools or predictive tools, so the Data Scientists have to actually do the statistics themselves.
And statistics is hard.
So they ask if it’s really, really necessary.
I mean, we’ve already got the answer, so why do we need to waste our time with stats?
So statistics gets relegated to such an extent that Data Scientists declare it dead.
Talk about the lunatics running the asylum…
If you're interested in learning more about the content in this blog post, we've sought out the best blogs, books, video courses and other stuff from around the internet for you. Some may be free while others may not.
Disclosure: some of these resources may be affiliate links, and we may earn an affiliate commission for purchases you make when using these links.
You can find further details in our T&Cs.
Statistics - The Last Dark Art?
Statistics isn't some mystical black art. You don't need runes, capes, daggers or to sacrifice a virgin at the full moon. Well, not unless you really want to…
Learn the statistics basics with this witty and informative blog post.
Data Scientist: The Sexiest Job of the 21st Century
In the October 2012 edition of the Harvard Business Review, DJ Patil and Thomas Davenport introduced the new term ‘Data Scientist’ to the world. And it stuck. Learn what it means to be a Data Scientist.
Truth, Lies & Statistics
Pirates, cats, Mexican lemons and North Carolina lawyers. Cheese consumption, margarine and drowning by falling out of fishing boats. This book has got it all.
A roller coaster of a book in 8 witty chapters, this might just be the most entertaining statistics book you’ll read this year.
21 Inspirational Books for All Aspiring Data Scientists
It's always difficult knowing where to start, but especially so when it comes to Data Science. No need to fret, though - we've selected our top 21 books that all aspiring data scientists should read.
These will get you going in no time...
Videos & Video Courses
The Best FREE Data Science Courses at Udemy
Udemy is a great place to learn how to be a Data Scientist. It's got courses on everything you need to learn - statistics, R and Python, Machine Learning, Data Visualisation and all sorts of other stuff.
These courses are all FREE, so they are pretty basic. Nevertheless, they're a great place to get started.
Learn to be a Data Science Ninja - The Easy Way
If you want to be an expert data scientist you're going to need a strong grounding in statistics, computer programming, machine learning, data visualisation and more, and Udemy is a great place to learn these skills.
These courses go into real depth and they're great value - you'll find courses that would cost thousands of dollars elsewhere for only a little over 10 £/$/Euro.