Variance
Variance is a number that tells you how spread out the values in a data set are from the mean. A high variance means the data is widely spread; a low variance means the values are close together.
Variance is calculated by finding the mean of the data, subtracting the mean from each data value and squaring the result, then averaging those squared differences. For a population of N values, this gives the population variance σ². For a sample, the sum of squared differences is divided by n − 1 rather than n, giving the sample variance s². This adjustment corrects for the tendency of a sample to underestimate the true spread of the population.
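The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function names `population_variance` and `sample_variance` and the data values are hypothetical, chosen only for this example.

```python
def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Same sum of squared deviations, but divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical values for illustration
print(population_variance(data))  # 2.0
print(sample_variance(data))      # 2.5
```

The two functions differ only in the divisor, which is why the sample variance is always slightly larger than the population variance computed from the same numbers.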
Key Formula
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$
Where:
- $\sigma^2$ = the population variance
- $N$ = the number of values in the population
- $x_i$ = each individual data value
- $\mu$ = the population mean
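The sample variance described earlier follows the same pattern with a different divisor. As a sketch, using the conventional symbols $\bar{x}$ for the sample mean and $n$ for the sample size (notation assumed here, not defined elsewhere in this article), it can be written as:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$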
Worked Example
Problem: Find the variance of the data set: 2, 4, 4, 6, 9.
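Step 1: Find the mean of the data set.
$$\mu = \frac{2 + 4 + 4 + 6 + 9}{5} = \frac{25}{5} = 5$$
Step 2: Subtract the mean from each value and square the result.
$$(2-5)^2 = 9,\quad (4-5)^2 = 1,\quad (4-5)^2 = 1,\quad (6-5)^2 = 1,\quad (9-5)^2 = 16$$
Step 3: Add up all the squared differences.
$$9 + 1 + 1 + 1 + 16 = 28$$
Step 4: Divide by the number of values to get the population variance.
$$\sigma^2 = \frac{28}{5} = 5.6$$
Answer: The population variance of the data set is 5.6.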
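The four steps can be checked with a short Python script. This is only a sketch for verifying the arithmetic; the variable names are illustrative, and the final line cross-checks the result against `statistics.pvariance` from the Python standard library.

```python
import statistics

data = [2, 4, 4, 6, 9]

mean = sum(data) / len(data)                     # Step 1: mean
squared_diffs = [(x - mean) ** 2 for x in data]  # Step 2: squared differences
total = sum(squared_diffs)                       # Step 3: sum of squares
variance = total / len(data)                     # Step 4: divide by N

print(mean)           # 5.0
print(squared_diffs)  # [9.0, 1.0, 1.0, 1.0, 16.0]
print(total)          # 28.0
print(variance)       # 5.6

# Cross-check against the standard library's population variance.
assert abs(variance - statistics.pvariance(data)) < 1e-12
```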
Why It Matters
Variance is a foundation of statistical analysis. It appears directly in the formulas for standard deviation, confidence intervals, and hypothesis tests, all of which are central to AP Statistics. In fields like finance, variance is used to measure investment risk: a portfolio with high variance has returns that fluctuate dramatically, which matters when managing real money.
Common Mistakes
Mistake: Forgetting to square the differences before averaging them.
Correction: Without squaring, positive and negative deviations cancel each other out and the result is always zero. Squaring ensures every difference contributes positively to the total.
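A quick way to see the cancellation is to compute the raw deviations for the worked-example data. The snippet below is a small illustrative sketch; the variable names are not from the original text.

```python
data = [2, 4, 4, 6, 9]
mean = sum(data) / len(data)

# Raw deviations: negatives and positives cancel exactly.
deviations = [x - mean for x in data]
unsquared_total = sum(deviations)

# Squared deviations: every term is non-negative, so nothing cancels.
squared_total = sum(d ** 2 for d in deviations)

print(deviations)       # [-3.0, -1.0, -1.0, 1.0, 4.0]
print(unsquared_total)  # 0.0
print(squared_total)    # 28.0
```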
Mistake: Dividing by n instead of n − 1 when working with a sample.
Correction: For a sample, divide by n − 1 to get an unbiased estimate of the population variance. Dividing by n produces an estimate that is, on average, smaller than the true population variance. Only divide by N when you have data for the entire population.
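The bias can be demonstrated exactly on a tiny example. The sketch below treats the worked-example data as the whole population and, as an assumption made for this illustration, enumerates every possible sample of size 2 drawn with replacement. It then averages both estimators over all of those samples, using `Fraction` to avoid rounding error. The helper name `sum_sq_dev` is hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 4, 6, 9]
n = 2  # sample size (assumed for this illustration)


def sum_sq_dev(values):
    """Sum of squared deviations from the mean, as an exact fraction."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values)


# True population variance (divide by N).
pop_var = sum_sq_dev(population) / len(population)

# Every equally likely ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(pop_var)            # 28/5
print(avg_div_n)          # 14/5
print(avg_div_n_minus_1)  # 28/5
```

Averaged over all samples, dividing by n − 1 recovers the population variance of 5.6, while dividing by n gives only 2.8. Individual samples can still land above or below the true value; the correction removes the bias only on average.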
