Statistical Help

# Standard Deviation

Study on Childhood Weight

| Statistic | Value |
| --- | --- |
| Count | 12 |
| Sum | 642 |
| Mean | 53.5 |
| Median | 45.5 |
| Mode | No Mode |
| Min | 13 |
| Max | 110 |
| Range | 97 |
| Standard Deviation | 31.0 |

The standard deviation is useful when comparing one dataset to one or more other datasets.

You’ve probably heard of the term standard deviation before. This is one way of measuring the dispersion of a given set of data. What do I mean by dispersion? Well, if we use a sample child weight dataset (shown below), the data ranges from 13 to 110, with a mean of 53.5. It is a pretty wide spread of data.

13        22        26        38        36        42

49        50        77        81        98        110

Well, what if the data had the exact same mean, but instead ranged from 45 to 62? You would determine that this would be a much tighter range, right? There would not be as much spread or dispersion from the mean. It is important to have a good understanding of the dispersion of your data so it can be properly compared to other data. The standard deviation is one tool for assessing data dispersion.

## Calculating Standard Deviation

There is a little more math involved in calculating the standard deviation, but it is not advanced. The standard deviation is simply the square root of the average squared deviation of the data from the mean. Before you allow this definition to scare you off, let’s calculate the standard deviation for the sample dataset of child weights together:
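For readers who prefer symbols, the definition above can be written compactly. The notation here is an assumption and does not appear in the original text: $x_1, \dots, x_n$ are the data values, $n$ is the count, $\bar{x}$ is the mean, $s$ is the standard deviation of a sample, and $\sigma$ is the standard deviation of a whole population.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$

The only difference between the two versions is the divisor: $n-1$ when the data are a sample, $n$ when they are the entire population.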

13        22        26        38        36        42

49        50        77        81        98        110

• Step 1: First, we calculate the mean (or average) of the data. Fortunately, we’ve already done this. The mean is 53.5.
• Step 2: Now, subtract the mean from every item in the set. Often, a table is helpful in performing these calculations. The calculations have been performed below:

| Child Weight | Subtract the Mean | Difference |
| --- | --- | --- |
| 13 | 53.5 | -40.5 |
| 22 | 53.5 | -31.5 |
| 26 | 53.5 | -27.5 |
| 38 | 53.5 | -15.5 |
| 36 | 53.5 | -17.5 |
| 42 | 53.5 | -11.5 |
| 49 | 53.5 | -4.5 |
| 50 | 53.5 | -3.5 |
| 77 | 53.5 | 23.5 |
| 81 | 53.5 | 27.5 |
| 98 | 53.5 | 44.5 |
| 110 | 53.5 | 56.5 |

Notice that if you were to sum all the numbers in the "Difference" column, you would get a sum of zero. This makes sense, because the mean is the balance point of the data: the total distance of the values below the mean (152 here) always equals the total distance of the values above it, so the negative and positive differences cancel. Adding these numbers will always result in zero, regardless of how condensed or dispersed the data values are about the mean.
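Steps 1 and 2 can be reproduced with a few lines of Python. This is only a sketch of the arithmetic described above; the variable names (`weights`, `mean`, `differences`) are chosen here for illustration and are not part of the original article.

```python
# Child weight sample from the article.
weights = [13, 22, 26, 38, 36, 42, 49, 50, 77, 81, 98, 110]

# Step 1: the mean is the sum divided by the count.
mean = sum(weights) / len(weights)

# Step 2: subtract the mean from every value.
differences = [w - mean for w in weights]

print(mean)              # 53.5
print(differences[:3])   # [-40.5, -31.5, -27.5]
print(sum(differences))  # 0.0
```

With other data, floating-point rounding may leave a tiny nonzero remainder instead of exactly 0.0; the cancellation itself still holds mathematically.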

• Step 3: Square the difference between each number and the mean (the third column in the table above).

To prevent the sum of the Difference column from coming out to zero, we square each number. Squaring means multiplying a number by itself (-40.5 times -40.5 equals 1640.25), and because every square is positive, the values can no longer cancel each other out. The new squared differences are as follows:

Difference Squared:
1640.25
992.25
756.25
240.25
306.25
132.25
20.25
12.25
552.25
756.25
1980.25
3192.25

• Step 4: Sum the squared differences. We get a sum of 10,581.
• Step 5: Divide this sum by the number of items, n (the count). If the data are a sample rather than the whole population, divide by n-1 instead.

Since our dataset was a sample of child weights, we will divide by n-1 (12-1=11). The answer is 961.9. This number is called the variance.
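Steps 3 through 5 can be checked the same way. The sketch below squares the differences, sums them, and divides by n-1; the population version (dividing by n) is included only for contrast, and all variable names are illustrative assumptions.

```python
# Child weight sample from the article.
weights = [13, 22, 26, 38, 36, 42, 49, 50, 77, 81, 98, 110]
n = len(weights)
mean = sum(weights) / n

# Step 3: square each difference from the mean.
squared_differences = [(w - mean) ** 2 for w in weights]

# Step 4: sum the squared differences.
sum_of_squares = sum(squared_differences)

# Step 5: divide by n-1 for a sample (by n for a whole population).
sample_variance = sum_of_squares / (n - 1)
population_variance = sum_of_squares / n

print(squared_differences[0])     # 1640.25
print(sum_of_squares)             # 10581.0
print(round(sample_variance, 1))  # 961.9
print(population_variance)        # 881.75
```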

• Step 6: Take the square root of the variance to find the standard deviation.

Taking a square root converts the variance from squared units to the original units of measurement. Our standard deviation is 31.0.
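Putting the six steps together, the whole calculation fits in a short script. As a sanity check, the sketch below also compares the hand-built result with `statistics.stdev` from Python's standard library, which computes the sample standard deviation (the n-1 version). The variable names are illustrative.

```python
import math
import statistics

# Child weight sample from the article.
weights = [13, 22, 26, 38, 36, 42, 49, 50, 77, 81, 98, 110]
n = len(weights)

# Steps 1-5: mean, squared differences, and sample variance.
mean = sum(weights) / n
sample_variance = sum((w - mean) ** 2 for w in weights) / (n - 1)

# Step 6: the square root of the variance is the standard deviation.
standard_deviation = math.sqrt(sample_variance)

print(round(standard_deviation, 1))  # 31.0

# Cross-check against the standard library's sample standard deviation.
print(math.isclose(standard_deviation, statistics.stdev(weights)))  # True
```

If the data were a whole population rather than a sample, `statistics.pstdev` would be the matching library call.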

Yeah! We did it! Phew...

## How Can We Use the Standard Deviation?

So how does this relate to the real world? Many people wonder which is better, a large standard deviation or a small one, but neither is better in general; it depends on the question you are asking. Remember, the smaller the standard deviation, the more closely the data cluster about the mean. This information is useful in comparison to other datasets.

In the sample child weight dataset above, the data ranges from 13 to 110, with a mean of 53.5. The standard deviation is 31.0. Another sample dataset might have the same mean, 53.5, but with a data range from 45 to 62 and a standard deviation of 3.5. The two datasets have the same mean, 53.5, but very different standard deviations. Comparing the two standard deviations shows that the data in the first dataset is much more spread out than the data in the second dataset.
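A comparison like this can be sketched in Python. The article does not list the values of the second dataset, so the `tight` list below is hypothetical: it was made up here only to have the same mean (53.5) and a range from 45 to 62, and its standard deviation is therefore not the 3.5 quoted above.

```python
import statistics

# Child weight sample from the article.
wide = [13, 22, 26, 38, 36, 42, 49, 50, 77, 81, 98, 110]

# Hypothetical values (not from the article): same mean, range 45 to 62.
tight = [45, 50, 51, 52, 52, 53, 54, 54, 55, 56, 58, 62]

for name, data in (("wide", wide), ("tight", tight)):
    mean = statistics.mean(data)
    spread = statistics.stdev(data)  # sample standard deviation (n-1)
    print(name, mean, round(spread, 1))
# wide 53.5 31.0
# tight 53.5 4.2
```

The means are identical, so the mean alone cannot distinguish the two datasets; the standard deviation is what shows that the first is far more spread out.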

rev. 29-Aug-2016