National EMSC Data Analysis Resource Center
Another important measure of center is the median.
The median is the middle observation in a set of data. Let’s calculate the median for a sample dataset on childhood weight.
13 36 98 77 42 50
110 22 49 81 26 38
In the dataset above, what is the middle observation? Well, before we can figure this out, we have to properly order the observations in a logical manner so they make sense. We will order them from smallest to largest, as shown below:
13 22 26 36 38 42
49 50 77 81 98 110
Now that our data are properly ordered, we can find the middle observation.
Odd Number of Observations
In a dataset that has an odd number of observations, this is very easy; it is simply the number smack in the middle (the one with an equal number of observations above and below).
Even Number of Observations
However, in our case we have 12 observations, which is an even number. This means that we need to take the two observations in the center and average them. In this case, the two observations in the middle are 42 and 49. To average them, sum the two numbers (42 + 49 = 91) and divide by the count, which in this case is 2. That gives 45.5, so our median is 45.5.
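The ordering and averaging steps above can be sketched in a few lines of Python. This is only an illustration: the function name `median` and the variable names are chosen here and are not part of the original lesson.

```python
def median(values):
    """Return the middle observation of a list of numbers.

    With an even count, the two middle observations are averaged.
    """
    ordered = sorted(values)      # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd count: one middle observation
        return ordered[mid]
    # even count: average the two observations in the center
    return (ordered[mid - 1] + ordered[mid]) / 2


# Sample childhood weight dataset from the text
weights = [13, 36, 98, 77, 42, 50, 110, 22, 49, 81, 26, 38]
weight_median = median(weights)
print(weight_median)  # 45.5
```

With 12 observations, the two in the center after sorting are 42 and 49, so the function returns their average.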
So what does the median mean? Well, like the mean, it provides a helpful measure of the center of our dataset. We now know that the median weight of the children in our group is 45.5. But it is also helpful to compare the median with the mean. Here, 45.5 is less than the mean, which was 53.5. Often, the mean and median will be the same or very close in a dataset, but sometimes they are different, such as in our case. When the mean and the median are about the same, the data are likely spread roughly symmetrically around the center, as they are in a "normally distributed" dataset. When the mean and the median are clearly different, it suggests that the data are "skewed" in some way.
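As a quick check of the comparison above, the mean and median of the weight data can be computed side by side. The sketch below uses Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Sample childhood weight dataset from the text
weights = [13, 36, 98, 77, 42, 50, 110, 22, 49, 81, 26, 38]

weight_mean = statistics.mean(weights)      # sum of all values / count
weight_median = statistics.median(weights)  # middle of the ordered values
gap = weight_mean - weight_median

print(weight_mean)    # 53.5
print(weight_median)  # 45.5
print(gap)            # 8.0
```

A positive gap like this one means the higher values are pulling the mean above the median.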
What do I mean by skewed? Well, unlike the mean, which is a mathematical calculation using every observation in the dataset, the median depends only on the order of the observations; it uses the middle observation and is not affected by how large or small the extreme values are. Which one is right? They both are. Neither one is necessarily better than the other. So why use a median? Well, there are certain kinds of data where you will be concerned about skewing. Skewing is when the mean is pulled higher or lower than the median because of very high or very low values.
Let’s say, for instance, you wanted to know the typical income of all the people you know. First, you would collect the data. You would probably get a wide range of answers, with most of them being between $20,000 and $150,000 a year or so. However, we can imagine that you may know some people who make millions and millions of dollars a year. If you include even just one or two of those people in your dataset, the entire dataset would be skewed.
Your dataset might look like this:
$20,000 $25,000 $35,000 $37,000 $42,000 $45,000 $58,000 $69,000
$80,000 $110,000 $140,000 $250,000,000
Notice how 11 out of the 12 observations fall within what most would call a “normal” income range, but the last person makes a lot more money.
If you were to take the median of the above data, you would get $51,500. But if you were to calculate the mean (the average), it would be a whopping $20,888,000 or so! Talk about skewed data. Would you really want to go around telling people that the average income of people you know is more than 20 million dollars a year? People would probably think you’re crazy, but you’d actually be telling the truth. That is the real mean; however, the median would more accurately represent the income of "most" of the people in this example.
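The income figures above can be reproduced with a short Python sketch using the standard `statistics` module. The variable names are illustrative, and the last two lines (dropping the highest earner) are an added comparison, not a step from the original example.

```python
import statistics

# Income dataset from the text, in dollars per year
incomes = [
    20_000, 25_000, 35_000, 37_000, 42_000, 45_000, 58_000, 69_000,
    80_000, 110_000, 140_000, 250_000_000,
]

income_median = statistics.median(incomes)  # average of 45,000 and 58,000
income_mean = statistics.mean(incomes)      # pulled up by the one huge value

print(f"median: ${income_median:,.0f}")  # median: $51,500
print(f"mean:   ${income_mean:,.0f}")    # mean:   $20,888,417

# Added comparison: leave out the highest earner and recompute.
without_top = incomes[:-1]
median_without_top = statistics.median(without_top)
mean_without_top = statistics.mean(without_top)
print(f"median without top earner: ${median_without_top:,.0f}")
print(f"mean without top earner:   ${mean_without_top:,.0f}")
```

Removing one observation moves the median only from $51,500 to $45,000, while the mean falls from over 20 million dollars to roughly $60,000, which shows how much more sensitive the mean is to a single extreme value.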
So, because what we really want to know is how much money "most" people make, sometimes we have to control for those situations where a few observations can seriously distort our mean. In this case, we would probably decide it best to report the median income rather than the mean.
Hospital length of stay is another example of data that are often skewed. Most people stay only a few days when they are admitted to the hospital, but a few people have hospital stays of 365 days or longer, and they significantly skew the data. In this case, you also would probably want to set the mean aside and report the median. In general, however, most people expect you to report the mean unless you have a good reason for not doing so.