Data Types 101 - A Guide to Quantitative Data, Qualitative Data and How to Distinguish Between Them
Ever looked at your data and wondered how and where to get started? If you don't know the difference between quantitative data and qualitative data then you're in the right place. Here is our guide to data types and how to deal with them...
Ultimately, there are just 2 data types. You may have heard phrases such as 'ordinal data', 'nominal data', 'discrete data' and so on and wondered just what they are and what they've got to do with your research and your data
Surely they are all just fancy words made up by mathematicians and statisticians to make them sound important, aren't they?
Well actually, they are pretty important, because if you know what types of data you have, then you know what maths and stats operations you're allowed to use on your data
Get that wrong and you're skating on pretty thin ice - sooner or later you're going to make your boss rather unhappy, and nobody wants that, do they?
So take a deep breath and let's go
I promise this will all be quite painless...
The Difference Between Quantitative Data and Qualitative Data
When it all boils down to it, all data is either measured with some kind of measuring instrument (ruler, jug, weighing scales, stop-watch, thermometer and so on) or is an observed feature of interest that is placed into categories, such as gender (male, female), health (healthy, sick) or opinion (agree, neutral, disagree)
So to put it in simple terms:
- Quantitative data is measured
- Qualitative data is categorised
The following infographic might help you to visualise what we're discussing, and we'll refer to it throughout...
Infographic: Quantitative Data is measured, Qualitative Data is categorised
No doubt you've noticed that quantitative data and qualitative data can each be sub-divided further, giving 4 classes of data type: Ratio Data and Interval Data (quantitative), and Ordinal Data and Nominal Data (qualitative)
You can figure out the difference by asking 3 questions:
- Ordered - Can some sort of progress be detected between adjacent data points or categories? Can the data be ordered meaningfully?
- Equidistant - Is the distance between adjacent data points or categories consistent?
- Meaningful Zero - Does the scale of measurement include a unique, non-arbitrary zero value?
If we can answer these 3 questions for any piece of data then we can correctly determine its class
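The 3 questions work like a simple decision procedure. Here is a minimal Python sketch of it, assuming each answer is recorded as a yes/no (True/False) value. The function name `classify_level` is hypothetical, and it assumes the answers are consistent (unordered data cannot be equidistant, for example):

```python
def classify_level(ordered: bool, equidistant: bool, meaningful_zero: bool) -> str:
    """Return the class of data type implied by the answers to the 3 questions.

    Hypothetical helper for illustration; not part of any library.
    """
    if not ordered:
        return "nominal"   # categories distinguished only by name
    if not equidistant:
        return "ordinal"   # ordered, but gaps between categories are unequal
    if not meaningful_zero:
        return "interval"  # equal gaps, but the zero point is arbitrary
    return "ratio"         # equal gaps and a true, non-arbitrary zero


print(classify_level(False, False, False))  # favourite colour -> nominal
print(classify_level(True, False, False))   # opinion scale -> ordinal
print(classify_level(True, True, False))    # temperature in °C -> interval
print(classify_level(True, True, True))     # distance in metres -> ratio
```

The questions are asked in order because each class keeps all the properties of the one before it and adds one more.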
We'll go through each of the data type classes in reverse order
Nominal data is the class of data type for data that has the following properties:
Nominal Data is observed, not measured, is unordered, non-equidistant and has no meaningful zero
We can differentiate between categories based only on their names, hence the title 'nominal' (from the Latin nomen, meaning 'name')
Examples of nominal data include:
- Gender (male, female)
- Nationality (British, American, Spanish,...)
- Genre/Style (Rock, Hip-Hop, Jazz, Classical,...)
- Favourite colour (red, green, blue,...)
- Favourite animal (aardvark, koala, sloth,...)
- Favourite spelling of 'favourite' (favourite, favorite)
The only mathematical or logical operations we can perform on nominal data are to say that an observation is (or is not) the same as another (equality or inequality) and to find the most common category, the mode (do you remember this from High School classes? If not, don't worry, it will be covered in a future blog post along with other measures of central tendency)
Other ways of finding the middle of the data, such as the median or mean, make no sense: the median needs categories that can be ranked, the mean needs numbers that can be added, and nominal data has neither
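Those two operations, equality and the mode, are easy to see in code. A minimal Python sketch using made-up survey answers (the data is purely illustrative):

```python
from collections import Counter

# Hypothetical survey answers: favourite animal (nominal data).
answers = ["koala", "sloth", "koala", "aardvark", "koala", "sloth"]

# Equality/inequality is meaningful: are two observations the same category?
print(answers[0] == answers[2])  # True: both "koala"
print(answers[0] != answers[1])  # True: "koala" vs "sloth"

# The mode is the most common category.
counts = Counter(answers)
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)           # koala 3

# No median or mean: sorting the names alphabetically would impose
# an order that the data itself does not have.
```

If two categories tie for most common, `Counter.most_common` simply returns one of them, so with real data it is worth checking for ties (Python's `statistics.multimode` returns all of them).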
Ordinal data have the following properties:
Ordinal Data is observed, not measured, is ordered but non-equidistant and has no meaningful zero
Their categories can be ordered (1st, 2nd, 3rd, etc. - hence the name 'ordinal'), but there is no consistency in the relative distances between adjacent categories
Examples of ordinal data include:
- Health (healthy, sick)
- Opinion (agree, mostly agree, neutral, mostly disagree, disagree)
- Tumour Grade (1, 2, 3)
- Tumour Stage (I, IIA, IIB, IIIA, IIIB, etc.)
- Time of day (morning, noon, night)
Mathematically, we can make simple comparisons between the categories, such as more (or less) healthy/severe, agree more or less, etc., and since there is an order to the data we can rank them and compute the median (or mode, but not the mean) to find the central value
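Because the order is part of the meaning of ordinal data, it has to be stated explicitly before we can rank anything. A minimal Python sketch, assuming the 5-point opinion scale from the examples above and some made-up responses:

```python
import statistics

# The order of the categories is given explicitly: it is part of the data.
SCALE = ["disagree", "mostly disagree", "neutral", "mostly agree", "agree"]
RANK = {category: position for position, category in enumerate(SCALE)}

# Hypothetical responses (ordinal data).
responses = ["agree", "neutral", "mostly agree", "disagree",
             "mostly agree", "agree", "neutral"]

# "Agrees more than" comparisons work through the ranks.
print(RANK["mostly agree"] > RANK["neutral"])  # True

# Median: rank the responses and take the middle one. median_low always
# returns an observed rank, never the average of two ranks, which would
# assume equal spacing between categories.
middle_rank = statistics.median_low([RANK[r] for r in responses])
print(SCALE[middle_rank])                      # mostly agree
```

The ranks (0 to 4) are only positions; the code never adds or averages them, so it makes no assumption about the distances between categories.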
It is interesting to note that in practice some ordinal data are treated as interval data (Tumour Grade is a classic example in healthcare) because statistical tests that assume equal intervals are generally more powerful than those designed for ordinal data. This is only acceptable as long as your data collection methods ensure that the equidistant rule isn't bent too much
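To see why bending the equidistant rule matters, here is a minimal Python sketch with made-up tumour grades for two hypothetical groups. Both codings below respect the order 1 < 2 < 3; they differ only in how far apart they assume the grades are:

```python
import statistics

# Hypothetical tumour grades for two groups of patients (ordinal data).
group_a = [1, 1, 2, 3, 3]
group_b = [2, 2, 2, 2, 2]

# Two codings that both preserve the order of the grades.
coding_1 = {1: 1, 2: 2, 3: 3}   # assumes equal gaps between grades
coding_2 = {1: 1, 2: 2, 3: 10}  # assumes grade 3 is "much further" from grade 2

def summarise(coding):
    """Return (mean A, mean B, median A, median B) under the given coding."""
    a = [coding[g] for g in group_a]
    b = [coding[g] for g in group_b]
    return (statistics.mean(a), statistics.mean(b),
            statistics.median(a), statistics.median(b))

print(summarise(coding_1))  # means equal, medians equal
print(summarise(coding_2))  # means now differ, medians unchanged
```

The medians depend only on the order, so they agree under both codings; the means depend on the assumed spacing, and the two groups go from "the same on average" to "very different".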
Interval data have the following properties:
Interval Data is measured and ordered with equidistant items, but has no meaningful zero
Interval data is ordered, can be continuous (able to take any value within a range) or discrete (limited to separate, countable values, such as whole years), and the degree of difference between items is meaningful (their intervals are equal), but their ratio is not
Examples of interval data include:
- Temperature (°C or °F, but not Kelvin)
- Dates (1066, 1492, 1776, etc.)
- Time of day on a 12-hour clock (6am, 6pm)
Although interval data can appear very similar to ratio data, the difference is in their defined zero-points. If the zero-point of the scale has been chosen arbitrarily (such as the freezing point of water for °C, or the start of the AD calendar era for dates) then the data cannot be on the ratio scale and must be interval
Mathematically we may compare the degrees of the data (equality/inequality, more/less) and we may add/subtract the values, such as '20°C is 10 degrees hotter than 10°C' or '6pm is 3 hours later than 3pm'. However, we cannot multiply or divide the numbers because of the arbitrary zero, so we can't say '20°C is twice as hot as 10°C' or '6pm is twice as late as 3pm'
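The arbitrary zero is easy to demonstrate by converting units. A minimal Python sketch using the standard Celsius-to-Fahrenheit and Celsius-to-kelvin conversions (the function names are just for illustration): differences survive the change of scale, but the "twice as hot" ratio does not:

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

def c_to_k(celsius: float) -> float:
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15

warm, cool = 20.0, 10.0

# Differences are meaningful: 10 °C hotter is always 18 °F hotter.
print(warm - cool)                  # 10.0
print(c_to_f(warm) - c_to_f(cool))  # 18.0

# Ratios are not: the "2" depends entirely on where zero was put.
print(warm / cool)                  # 2.0
print(c_to_f(warm) / c_to_f(cool))  # 1.36  (68 °F / 50 °F)
print(c_to_k(warm) / c_to_k(cool))  # about 1.035
```

The kelvin ratio is the physically meaningful one, because kelvin starts at absolute zero.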
The central value of interval data is typically the mean (but could be the median or mode), and we can also express the spread or variability of the data using measures such as the range, standard deviation, variance and/or confidence intervals (again, if you can't quite remember what these are they'll be covered in a future blog post)
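Python's standard `statistics` module covers all of these summaries. A minimal sketch with made-up daily temperatures in °C:

```python
import statistics

# Hypothetical daily maximum temperatures in °C (interval data).
temps_c = [12.0, 14.0, 15.0, 16.0, 18.0]

print(statistics.mean(temps_c))      # 15.0  central value
print(statistics.median(temps_c))    # 15.0
print(max(temps_c) - min(temps_c))   # 6.0   range
print(statistics.variance(temps_c))  # 5.0   sample variance
print(statistics.stdev(temps_c))     # about 2.236, sample standard deviation
```

`variance` and `stdev` are the sample versions (dividing by n - 1); `pvariance` and `pstdev` are the population versions.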
Ratio data have the following properties:
Ratio Data is measured and ordered with equidistant items and a meaningful zero
As with interval data, ratio data can be continuous or discrete, and differs from interval data in that there is a non-arbitrary zero-point to the data. Examples include:
- Age (from 0 years to 100+)
- Temperature (in Kelvin, but not °C or °F)
- Distance (measured with a ruler or other such measuring device)
- Time interval (measured with a stop-watch or similar)
For each of these examples there is a real, meaningful zero-point: the age of a person (a 12 year old is twice the age of a 6 year old), absolute zero (200K is twice the absolute temperature of 100K, and for an ideal gas the average kinetic energy of its molecules is also twice as large), distance measured from a pre-determined point (the distance from Barcelona to Berlin is roughly half the distance from Barcelona to Moscow) or time (it takes me twice as long to run the 100m as Usain Bolt, but only half as long as my Grandad)
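A quick way to check for a true zero is to change the unit: on a ratio scale every unit shares the same zero, so ratios don't change. A minimal Python sketch, using round illustrative numbers rather than real measurements:

```python
KM_PER_MILE = 1.609344  # the international mile, exact by definition

def km_to_miles(km: float) -> float:
    """Convert kilometres to miles."""
    return km / KM_PER_MILE

# Illustrative round-number distances (not exact measurements).
short_trip_km, long_trip_km = 1500.0, 3000.0
print(long_trip_km / short_trip_km)                              # 2.0
print(km_to_miles(long_trip_km) / km_to_miles(short_trip_km))    # 2.0

# Ages: twice as old in years is also twice as old in months.
older_years, younger_years = 12, 6
print(older_years / younger_years)                   # 2.0
print((older_years * 12) / (younger_years * 12))     # 2.0
```

Converting a unit on a ratio scale only multiplies by a constant, so any ratio is unaffected; an interval scale also shifts the zero, which is what breaks ratios there.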
Ratio data are the best to deal with mathematically (note that I didn't say easiest...) because all possibilities are on the table. We can find the central point of the data by using any of the mode, median or mean (arithmetic, geometric or harmonic) and use all of the most powerful statistical methods to analyse the data. As long as we choose the right method, we can be far more confident that we are not being misled by the data and that our interpretations have merit
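All three kinds of mean are available in Python's `statistics` module (`geometric_mean` needs Python 3.8 or later). A minimal sketch with made-up running times in seconds; note that the geometric and harmonic means are only defined here for positive values:

```python
import statistics

# Hypothetical 100m times in seconds (ratio data: positive, with a true zero).
times_s = [10.0, 20.0, 40.0]

print(statistics.median(times_s))          # 20.0
print(statistics.mean(times_s))            # arithmetic mean, about 23.33
print(statistics.geometric_mean(times_s))  # about 20.0
print(statistics.harmonic_mean(times_s))   # about 17.14
```

Which mean to use depends on the question: the geometric mean suits data that varies by multiplication (each time here is double the last), and the harmonic mean suits averaging rates such as speeds.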
So what have we learnt from this?
- Qualitative data is observed and categorised
  - Ordinal data has ordered (but not equidistant) categories
  - Nominal data has named (but not ordered) categories
- Quantitative data is measured
  - Ratio data has equidistant intervals and a unique and meaningful zero
  - Interval data has equidistant intervals, but an arbitrary zero
In my previous blog post, 'What Is Data?', we established that the accuracy of our data is 10 times more important than what we plan to do with it. Well, determining the correct data type should be an important part of your quality control procedures. If you treat ordinal data as interval or ratio data you might think you're getting some powerful answers from your powerful stats, but they could be misleading or even just plain wrong
It's OK to say '9 out of 10 cats prefer...', but you need to be very careful about making claims such as 'twice as satisfied...'
That's usually guaranteed to make statisticians start jumping up and down...