
Top 12 Tips - What To Do With Data


This is the first entry in our Discover Data Blog Series, so it seems appropriate to ask the questions 'what is data?', 'why is it so important?' and 'what can we do with it?'. We give you the Top 12 Tips on what you can do with your data...


So what is data?

Information is all around us. It is in everything we see, hear, smell, touch and taste. It can be found in the largest event, like the formation of a new galaxy, and in the smallest, such as the spin-state of an electron (you can tell I'm a physicist, can't you?)

Simply put, data is a collection of facts and information that we have gathered and translated into a form that is convenient to process

It can be numbers, words, measurements, observations or even just descriptions of things



So why do we collect data?

Information on its own can be interesting, but it is not really very useful. We need to collect data so we can find out 'what the world is like'


We might observe things like:

  • It's rained every day this week
  • My daughter is taller than most of her classmates
  • I seem to be diagnosing more cases of lung cancer than usual just lately


The questions to ask might be:

  • Are the current rainfall patterns unusual for this time of year?
  • Is my daughter tall for her age or is her height within accepted limits?
  • Are my observations correct, and if so, why are there more cases of lung cancer than usual?


In each of these cases we need to gather the information, observe it, measure it, count it and categorise it so that we can begin to understand the 'story' behind the information


What do we do with the data?

We typically collect data to answer one of two questions:

  • What is the world like?
  • What is the world going to be like?


The infographic below might help to explain the difference between these questions:


Infographic: We analyse data to find features, patterns and trends that enable us to describe what the world is like and predict what it will be like in the future


We might want to analyse the data to find features that describe to us what the world was like at the time the data was collected

There is a whole branch of statistics dedicated to finding these features, and typically we use descriptive methods to measure things such as:

  • averages (mean, median, mode)
  • variation and uncertainty (standard deviation, confidence intervals)
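
As a rough illustration of these descriptive measures, here is a minimal Python sketch using the standard library's statistics module. The heights are made-up numbers and describe is just a name I've chosen for this example, so treat it as a sketch rather than a recipe:

```python
import statistics

# Hypothetical heights (cm) of ten children, invented for illustration
heights_cm = [140.1, 143.6, 137.3, 139.0, 141.5, 138.2, 143.6, 136.8, 140.1, 143.6]


def describe(values):
    """Return a few common descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = describe(heights_cm)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```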


What these can't do is tell you the future

For this we need to create models that can spot patterns and trends that allow us to predict what the world will be like in the future

There are many different ways of producing predictions and forecasts from data, but they can be broadly grouped into two families of techniques:

  • regression (linear, multivariate, logistic, etc.)
  • machine learning (ANNs, SVMs, etc.)
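
To make the regression idea concrete, here is a minimal sketch of the simplest case: fitting a straight line by ordinary least squares and using it to predict a value outside the data. The ages and heights are invented for illustration, and fit_line is a hypothetical helper written for this example:

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: one child's height (cm) measured at ages 6 to 10
ages = [6, 7, 8, 9, 10]
heights_cm = [115.2, 120.9, 127.1, 132.8, 139.0]

slope, intercept = fit_line(ages, heights_cm)

# Use the fitted line to predict the height at age 11
predicted = slope * 11 + intercept
print(f"growth per year: {slope:.2f} cm, predicted height at 11: {predicted:.2f} cm")
```

Extrapolating a straight line like this assumes the trend carries on unchanged, which is exactly the kind of assumption worth checking against real data.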


I'll talk about these in greater detail in future posts



Data Accuracy

Ultimately, data is information that can tell us how the world works, and this is important if we want to be able to predict the future with any degree of accuracy

If we want accurate predictions, then we need accurate data, so it is of the utmost importance that we take care when we observe and measure

As a statistical consultant I have lost count of the number of times that I have had to tell researchers that their data is not fit for purpose, and that if they want their questions answered correctly and accurately they need to start again

For a 3-year PhD student with just a month to go before submitting their thesis, this is not what they want to hear - and not what I want to tell them!


So how do we know when our data is not up to scratch?


There is a whole branch of statistics dedicated to answering this question (which I'm not going to go into here), but one of the questions we can ask is:

  • Is our data biased?




One way to detect bias in data is to check the remainder (the digits to the right of the decimal point) of continuous measurements


Say that we are measuring the heights of 10-year-old children to 1 decimal place

We'll have measurements such as

  • 140.1cm
  • 143.6cm
  • 137.3cm
  • ...

Now leave off everything to the left of the decimal point and we have

  • .1
  • .6
  • .3
  • ...

Count up all the .0s, .1s, .2s, etc., and plot the counts against the remainder

We expect to see approximately the same number of children for each remainder (the .0s, .1s, .2s and so on), so the plot should be square-ish, with bars of roughly equal height (below left):


Example of how to detect bias in the data


If we see the graph on the right, we'll know that something has gone wrong with our measuring procedures

Most likely the person/people doing the measuring have rounded to the nearest .0 or .5 for some (but not all) of the measurements, and this has inevitably introduced bias into the data
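
The check described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the measurements are simulated, the function names are my own, and the chi-square comparison is one common way to put a number on 'square-ish' rather than a step from the procedure above. The 16.92 threshold is the standard 5% critical value for 9 degrees of freedom:

```python
from collections import Counter

# Standard 5% critical value of the chi-square distribution, 9 degrees of freedom
CHI2_CRITICAL_5PCT_DF9 = 16.92


def final_digit_counts(measurements):
    """Count how often each first-decimal digit (0 to 9) occurs.

    Assumes positive measurements recorded to 1 decimal place.
    """
    # Work in whole tenths and round, so 140.1 reliably gives the digit 1
    digits = [round(m * 10) % 10 for m in measurements]
    counts = Counter(digits)
    return [counts.get(d, 0) for d in range(10)]


def chi_square_uniform(counts):
    """Chi-square statistic comparing counts with equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


# Simulated 'honest' sample: 50 heights with every final digit used 5 times
even_sample = [130 + i + d / 10 for d in range(10) for i in range(5)]

# Simulated 'heaped' sample: every other measurement rounded to the nearest .0 or .5
heaped_sample = [
    round(m * 2) / 2 if index % 2 == 0 else m
    for index, m in enumerate(even_sample)
]

for label, sample in [("even", even_sample), ("heaped", heaped_sample)]:
    counts = final_digit_counts(sample)
    statistic = chi_square_uniform(counts)
    suspicious = statistic > CHI2_CRITICAL_5PCT_DF9
    print(label, counts, round(statistic, 1), "suspicious" if suspicious else "ok")
```

With small samples the counts will wobble by chance alone, so the plot is still worth looking at alongside any single number.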


Is this a problem?

Well, it might be, but it all depends on what questions you are asking and how accurate you need the answers to be

Only you can answer that question, and it would be a really good idea to discuss this with your local friendly statistician before you begin collecting your data rather than just hours before an important deadline!

I wish I had a pound for every time I'd told that to someone...



Lessons Learnt...

So what have we learnt from this? As promised, here are our Top 12 Tips about what to do with data:


Tip #1: We can use data to describe what the world is like

Tip #2: We can use data to predict what the world will be like

Tips #3-12: The accuracy of our data is 10 times more important than what we plan to do with it!


So the next time you collect data, remember GIGO:

Garbage In, Garbage Out...


And maybe, just maybe, your statistician won't ruin your day by telling you to scrap your data and start again




Learn More


If you're interested in learning more about the content in this blog post, we've sought out the best blogs, books, video courses and other stuff from around the internet for you. Some may be free while others may not


Disclosure: some of these resources may be affiliate links, and we may earn an affiliate commission for purchases you make when using these links

You can find further details in our TCs


Blog Posts

11 Essential Tips for Effective Data Collection
Collecting data is serious business. Get it right and your analysis can go relatively smoothly, but get it wrong and you're in for a world of pain. Here are 11 essential and completely painless tips that'll get you off to a great start.

5 Productivity Tips for Efficient Data Cleansing
Now that you've collected your data, your next job is to clean it. This is a job that can be painful, thankless and can take you weeks, so it pays to be organised. Here are 5 tips that will help you get through it quickly and easily.



