50 Shades of Grey – The Psychology of a Data Scientist
There are 2 types of Data Scientist.
Those with grey hairs and those without.
If you're one of those with greys (like me) then you didn't do a specialised Data Science course (they didn't exist before I got the greys) and you probably fell into Data Science by accident.
Interestingly, today's highly sought-after Data Scientists were yesterday's unloved academic 'jacks-of-all-trades', with the unfortunate epithet of 'master of none'.
No more - Data Science is finally getting recognition as a specialist subject in its own right, and Data Scientists are finally being seen as valuable commodities.
Anyway, here's my story - see if it rings any bells with you...
Unless you’ve recently graduated from one of the new Data Science courses that have been popping up online and in universities around the world, becoming a Data Scientist was most likely slightly accidental, and more about the journey than the destination.
Here’s my journey. See if you recognise any of it in your own:
I started out as a physicist and had a strong mathematical background, but I had a passion for medicine. After completing my bachelor’s degree I took a master’s degree in medical physics. This is where I gained an appreciation for the importance of image analysis and the role that data plays in medicine. I created a virtual model of a human torso by segmenting images from the Visible Human Project. Each slice had dimensions of 2048 x 1216 pixels in 24-bit colour, which works out at approximately 7.5 megabytes per slice. Not too large, but when you put all the slices together, the full dataset is around 40 gigabytes. This may not be in Big Data territory, but it’s pretty big for a desktop PC and you get quite familiar with handling large amounts of data.
Incidentally, there is no shortage of blog posts talking about the necessary skills of Data Scientists, but very rarely does anyone mention image analysis. I predict that image analysis and video analysis will shortly become very useful skills for a Data Scientist to have, not just in medical data analysis, but in many other areas of data analysis too.
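For anyone who wants to check the sums, here is a rough back-of-envelope sketch in Python. It assumes the slices are stored uncompressed and that a megabyte means a million bytes; the function names are made up for illustration, and the number of slices is only what the 40 gigabyte figure implies, not something I am quoting from the dataset.

```python
# Back-of-envelope check of the image sizes quoted in the post.
# Assumptions: uncompressed pixels, decimal units (1 MB = 1e6 bytes).
WIDTH_PX = 2048
HEIGHT_PX = 1216
BITS_PER_PIXEL = 24  # 24-bit colour = 3 bytes per pixel


def slice_size_bytes(width=WIDTH_PX, height=HEIGHT_PX, bits_per_pixel=BITS_PER_PIXEL):
    """Uncompressed size of a single image slice, in bytes."""
    return width * height * bits_per_pixel // 8


def slices_for_total(total_bytes, per_slice_bytes):
    """Roughly how many slices of a given size add up to total_bytes."""
    return round(total_bytes / per_slice_bytes)


if __name__ == "__main__":
    per_slice = slice_size_bytes()
    print(f"one slice: {per_slice / 1e6:.2f} MB")
    print(f"slices implied by 40 GB: {slices_for_total(40e9, per_slice)}")
```

A single slice comes out a shade under 7.5 megabytes, so a dataset of around 40 gigabytes implies several thousand slices.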
Anyway, I digress.
After my master’s degree in medical physics I then did another master’s degree in bioinformatics. During this time, the results of the Human Genome Project were published and I was honoured to be able to do some analysis of the resultant data. The Human Genome Project produced huge amounts of data, so my newly-discovered data handling skills came in very handy. Here I learned about artificial intelligence and created a number of predictive models for a variety of purposes.
At the end of my master’s research I did a PhD in artificial intelligence where I created a predictive system that prevented a terrorist attack on a public water supply. Well, actually, that part isn’t strictly true. I wrote an article that was published in New Scientist about how an artificial neural network system could be created that would prevent a terrorist attack on public water supplies…
Now here’s where my journey comes full circle. At the conclusion of my PhD I left bioinformatics and returned to medicine, where I was offered the role of medical statistician to one of the world’s best breast cancer research departments. I wasn’t appointed because I was a statistician, but rather because I wasn’t a statistician. Although I had a working background in stats, they were more interested in using my skills as a bridge between disciplines. I wasn’t a specialist in microbiology, pathology, cancer, surgery or stats, but I had sufficient working knowledge of each to be able to communicate and translate effectively between all of them.
It was a really interesting time, but I realised that I didn’t actually like stats. What I did like was programming stats. Most of my time as a medical statistician involved creating programs to automate data analysis, stats and predictive systems that helped researchers reach the story of their data in a fraction of the time that it would take to analyse the data manually.
And that sort of brings me to where I am today. A few years ago I left my job to form a start-up company, Chi-Squared Innovations, that creates automated data analysis programs, but that’s a story for another day.
OK, so that is my story, but there wasn’t really a destination. I didn’t actually plan all of that out, it just sort of happened. I think the journey is an important one, because it tells you a lot about what Data Scientists are all about, and the skills they use every day.
I started out as a scientist, and have worked in many different scientific fields, but I’m not a specialist in any of them. I learned a lot about computer programming, data handling and image analysis, but I don’t specialise in any of these either. I guess my strongest areas (at the moment) are in artificial intelligence and statistics, but I don’t claim to be an expert. Right now I’m working on improving my skills in business development, data visualisations, shell scripting, Python and GUIs, but – yes, you guessed it – I’m not an expert.
What is a Data Scientist? - Cartoon courtesy of Philip Riggs (Twitter: @ProductiveEgg)
For me, this journey typifies the life of the Data Scientist. Most of us aren’t experts in more than one or two disciplines (or any, in my case), and to the traditional academic we are ‘jacks of all trades’. Our skills are neither those of the expert nor those of the novice, but somewhere in between. Neither black nor white, but varying shades of grey.
What we need to recognise though, is that Data Science – as broad a subject as it may be – is a specialist subject of its own. To me, Data Science is the glue that binds together distinct areas of specialisation. It is the ultimate multi-discipline.
Here’s an unfunny joke I used to tell when I was still an academic:
Q: What do you get if you put together the best physicist, mathematician, biologist, surgeon, programmer, statistician and AI guy in the world into one room?
A: An unholy mess, the potential for 1000 arguments and the waste of $50 million.
Of course, this is exactly the type of multi-disciplinary dream team that universities, government bodies and companies regularly set up and call a ‘think tank’, so why does it so often fall apart?
The answer is that there is no glue. Each specialist is trained to see the problem from their own perspective and has little knowledge or understanding of other points of view.
This is why Data Scientists are becoming so important. They are the glue that pulls together disparate disciplines.
Oh yes, and to those who say that all you need to do to be a Data Scientist is take an online course in Hadoop, MapReduce, R, Python and D3, I say this: it’s about the skills, not the tools. To learn the skills of a Data Scientist takes years, if not decades. If you don’t have any grey hairs yet, then you’re not a Data Scientist (but don’t give up – you’ll get there eventually)!
So to all Data Scientists the world over: stop using the Grecian 2000 and celebrate the grey.
All 50 shades of them…
So what was your journey to becoming a Data Scientist? I’d love to hear your story. Just lie back on the couch and tell me all about it…