Data comes in several forms, and there are things that you can, and can’t, do with different types of data. Even when the data are numbers, some operations give misleading or incorrect results. It is thus important to know first what the question is. Then think about the analysis you are going to use, and collect data that lends itself to that analysis. The table below shows different scales of data and some basic operations and analyses that can be meaningfully performed with them. Why is this important? A few examples follow.
| Scale of Dependent Variable (Response) | Example Data Type | Features | Can I: calc mean, s.d., s.err. | Can I: calc ratio | Can I: ANOVA, MANOVA | Can I: t test | Can I: simple linear regression (lm) or glm |
|---|---|---|---|---|---|---|---|
| Categorical: Binary | yes, no | Two-state nominal or ordinal | No | No | No | No | No (use logistic regression) |
| Categorical: Nominal | mammal, arthropod, orc | Classes without a specific order | No | No | No | No | No |
| Categorical: Ordinal | bad, good, very good, excellent | Classes with some order | No (please stop doing this) | No | No | No | No (use ordinal regression) |
| Continuous: Interval | °C | Equal intervals; zero isn’t zero | Yes | No | Yes | Yes | Yes |
| Continuous: Ratio | K; cm | Equal intervals; zero is zero | Yes | Yes | Yes | Yes | Yes |
Ordinal Data
Ordinal data are data that are classified in such a way as to have a logical order to them. These could consist of descriptors such as small, medium, large. These have an obvious order, but because the categories do not have equal intervals (i.e., one unit on the size scale each), you cannot perform arithmetic on them. This is obvious in this example, but what happens when survey responses (e.g., Strongly Agree, …, Disagree, Strongly Disagree) are encoded with numbered classes, say 1 to 5? Would you then feel justified in taking the average value of the responses? Taking means of 1s, 2s, 3s, 4s, and 5s seems logical, but what is Disagree and a Half? A permissible operation is to take the median, which is the coded value of the middle data point when the responses are ranked in order.
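To make the difference concrete, here is a minimal Python sketch comparing the mean and the median of coded survey responses. The response values are invented for illustration, and the labels for codes 2 and 3 are assumptions, since only the ends of the scale are named above.

```python
from statistics import mean, median

# Hypothetical coding of a five-point scale. The labels for codes 2 and 3
# are assumed for illustration; only the other three are named in the text.
LABELS = {
    1: "Strongly Agree",
    2: "Agree",      # assumed label
    3: "Neutral",    # assumed label
    4: "Disagree",
    5: "Strongly Disagree",
}

# Made-up responses from seven people.
responses = [1, 2, 4, 4, 5, 5, 5]

avg = mean(responses)    # arithmetic on the codes: lands between categories
mid = median(responses)  # middle response when ranked: an actual category

print(f"mean = {avg:.2f} -> label: {LABELS.get(avg, 'no such category')}")
print(f"median = {mid} -> label: {LABELS[mid]}")
```

The mean falls between 3 and 4 and corresponds to no category at all, while the median is one of the categories the respondents actually chose. With an even number of responses, `statistics.median` averages the two middle values, so `statistics.median_low` or `statistics.median_high` may be a better fit for ordinal codes.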
Interval Data
Interval data has a logical order similar to ordinal data, but in addition the values or classes are separated by equal intervals. For example, an increase in temperature of one Celsius degree represents the same change in heat value regardless of where we are on the Celsius scale. You can thus add and subtract such values and be confident in the results. However, there are still some things that you cannot meaningfully do with interval data.
Keeping with the temperature theme, suppose it is -10°C outside. Reporters in the media seem to have a predilection for comparing temperatures using multiplication and division. If another place is experiencing a brisk temperature of -40°C, you are likely to be assaulted with the news that such-and-such a place is, “Four times colder!” Wow, you are supposed to think, that is really cold! But what if you prefer the Fahrenheit scale? It would be 14°F where you are, and -40°F at the other place reported upon. Four times 14 does not equal -40, but the actual temperatures have not changed. So why does this not work? Let’s introduce our next scale.
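Before moving on, the arithmetic in this example can be checked with a short Python sketch. The function and variable names are illustrative only; the conversion is the standard Celsius-to-Fahrenheit formula.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

here_c, there_c = -10.0, -40.0
here_f, there_f = c_to_f(here_c), c_to_f(there_c)

# Ratios depend on the unit, so they carry no physical meaning.
ratio_c = there_c / here_c
ratio_f = there_f / here_f

# Differences survive the change of unit, up to the fixed factor 9/5.
diff_c = here_c - there_c
diff_f = here_f - there_f

print(f"Celsius:    {here_c} vs {there_c}, ratio {ratio_c:.2f}, difference {diff_c}")
print(f"Fahrenheit: {here_f} vs {there_f}, ratio {ratio_f:.2f}, difference {diff_f}")
```

The same two temperatures give a ratio of 4 in Celsius and a negative ratio in Fahrenheit, whereas the difference converts cleanly from 30 Celsius degrees to 54 Fahrenheit degrees. Subtraction is meaningful on an interval scale; division is not.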
Ratio Data
Ratio data consists of data that has an interval scale, but also has the important feature that zero means zero. There is nothing left to count at the zero value. Multiplication and division lead to reliable results because the position of a value on a ratio scale represents the actual amount of stuff present, be it the number of objects (zero means no objects), the length of a board (zero means no board), or the temperature on an absolute scale (zero means the atoms are not moving and there is no heat).
We now go back to our temperature theme. The Kelvin scale is a ratio scale. It has a value of zero at the temperature at which there is simply no heat. If you want to say that such-and-such a place is X times warmer than where you are, convert to Kelvin and report away with impunity. You cannot, however, correctly state that one place is X times colder than where you are. Coldness would have to be measured downward from a fixed absolute reference above every temperature being compared, and no such reference exists; the only absolute reference is the zero at the bottom of the scale.
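As a rough check, the sketch below converts the two temperatures from the earlier example to Kelvin, starting once from Celsius and once from Fahrenheit, and compares the resulting ratios. The helper names are illustrative; the conversions are the standard ones.

```python
import math

def c_to_k(celsius):
    """Convert degrees Celsius to kelvins."""
    return celsius + 273.15

def f_to_k(fahrenheit):
    """Convert degrees Fahrenheit to kelvins."""
    return (fahrenheit - 32) * 5 / 9 + 273.15

# The same two places, reported in Celsius and in Fahrenheit.
ratio_from_c = c_to_k(-10.0) / c_to_k(-40.0)
ratio_from_f = f_to_k(14.0) / f_to_k(-40.0)

print(f"warmer/colder ratio via Celsius input:    {ratio_from_c:.4f}")
print(f"warmer/colder ratio via Fahrenheit input: {ratio_from_f:.4f}")
print("ratios agree:", math.isclose(ratio_from_c, ratio_from_f))
```

Whichever scale the report started from, the Kelvin ratio comes out the same, at roughly 1.13. In this sense the -10°C location is about 1.13 times as warm as the -40°C one, which is a good deal less dramatic than "four times colder."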
The next time I hear someone tell me that it is, “Twice as cold as such-and-such,” I am simply going to go outside and enjoy the cool weather. Now, if only I could get someone to explain what they mean by Disagree and a Half.