Regression to the Mean — Definition, Formula & Examples
Regression to the mean is the statistical tendency for extreme observations to be followed by measurements that fall closer to the overall average. If you score unusually high or low on a test, your next score will likely be closer to your typical performance.
Regression to the mean is the phenomenon whereby, given two jointly distributed random variables with imperfect correlation, a value of one variable that is far from its mean will, on average, correspond to a value of the other variable that is closer to its own mean. The effect is stronger when the correlation between the variables is weaker.
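The formal definition can be written compactly using the standard least-squares prediction for one variable from another. This is a sketch that assumes a linear relationship between the two variables; the symbols are notation introduced here for illustration: $r$ is the correlation, $\mu_X, \mu_Y$ the means, $\sigma_X, \sigma_Y$ the standard deviations, and $z_X = (x - \mu_X)/\sigma_X$ the z-score of the observed value.

```latex
\hat{z}_Y = r \, z_X
\qquad\Longleftrightarrow\qquad
\hat{y} = \mu_Y + r \, \frac{\sigma_Y}{\sigma_X} \, (x - \mu_X)
```

Whenever $|r| < 1$, the predicted z-score satisfies $|\hat{z}_Y| < |z_X|$, so the prediction sits closer to its mean than the observation does. At $r = 1$ there is no regression at all, and at $r = 0$ the best prediction is simply $\mu_Y$.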
How It Works
Extreme values arise partly from genuine ability or signal and partly from random variation (luck, measurement error, etc.). Because the random component is unlikely to be equally extreme in the same direction on a second measurement, the observed value tends to shift back toward the mean. This is not a causal force pulling values toward the center — it is a purely statistical consequence of imperfect correlation between successive measurements. You can observe it whenever you compare two related but not perfectly correlated sets of data, such as test scores on two exams or a parent's height and their child's height.
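The signal-plus-noise explanation above can be checked with a small simulation. The sketch below is illustrative only: the ability scale (mean 75, SD 6), the noise SD of 8, and the function name `simulate_two_measurements` are all assumptions chosen for the example, not values from any real data set.

```python
import random
import statistics


def simulate_two_measurements(n=10_000, noise_sd=8.0, seed=0):
    """Return two score lists that share one stable ability per person."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        ability = rng.gauss(75, 6)                        # stable signal (assumed scale)
        first.append(ability + rng.gauss(0, noise_sd))    # signal + luck
        second.append(ability + rng.gauss(0, noise_sd))   # same signal, fresh luck
    return first, second


first, second = simulate_two_measurements()

# Select the top 5% on the first measurement, then look at their second one.
cutoff = sorted(first)[int(0.95 * len(first))]
top = [(a, b) for a, b in zip(first, second) if a >= cutoff]

overall_mean = statistics.mean(first)
top_first_mean = statistics.mean(a for a, _ in top)
top_second_mean = statistics.mean(b for _, b in top)

print(f"overall mean:                {overall_mean:.1f}")
print(f"top group, 1st measurement:  {top_first_mean:.1f}")
print(f"top group, 2nd measurement:  {top_second_mean:.1f}")
```

Nothing in the code pushes the second score toward the center. The top group's second average still lands between the overall mean and its first average, because the luck that helped place those people in the top group is redrawn independently.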
Worked Example
Problem: In a class, Exam 1 and Exam 2 each have a mean of 75 and a standard deviation of 10, and the correlation between scores on the two exams is $r = 0.6$. A student scored 95 on Exam 1. What score does regression to the mean predict for that student on Exam 2?
Find how far above the mean the student scored: The Exam 1 z-score is $z_X = (95 - 75)/10 = 2$, so the student scored 2 standard deviations above the mean.
Apply the regression effect: With $r = 0.6$, the predicted z-score on Exam 2 is $r$ times the z-score on Exam 1: $\hat{z}_Y = 0.6 \times 2 = 1.2$.
Convert back to the original scale: Multiply the predicted z-score by the standard deviation and add the mean: $\hat{y} = 75 + 1.2 \times 10 = 87$.
Answer: The predicted Exam 2 score is 87 — still above average, but 8 points closer to the mean than the Exam 1 score of 95. This shrinkage toward the mean is regression to the mean in action.
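The three steps of the worked example translate directly into a few lines of code. This is a minimal sketch: the function name `predict_second_score` and its parameter names are hypothetical, and the correlation is taken as 0.6, the value implied by the stated answer of 87.

```python
def predict_second_score(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict the second measurement from the first via z-scores."""
    z_x = (x - mean_x) / sd_x        # step 1: standardize the observed score
    z_y_hat = r * z_x                # step 2: shrink the z-score by the correlation
    return mean_y + z_y_hat * sd_y   # step 3: convert back to the original scale


predicted = predict_second_score(95, mean_x=75, sd_x=10, mean_y=75, sd_y=10, r=0.6)
print(f"{predicted:.1f}")  # 87.0
```

Trying other values of `r` shows the pattern: `r=1.0` returns the original 95, and `r=0.0` returns the class mean of 75.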
Why It Matters
Ignoring regression to the mean leads to flawed conclusions in medicine, education, and sports. For example, a school might credit a new teaching program for improved scores when the improvement was simply students regressing from an unusually bad performance. Recognizing this effect is essential in AP Statistics and any field that evaluates interventions based on before-and-after data.
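The before-and-after trap can be illustrated with a simulated cohort in which no intervention happens at all. The sketch below makes the same kind of assumptions as any toy model: the score scale, the noise level, and the 10% enrollment rule are invented for the example.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical cohort: each score = stable ability + independent exam-day noise.
abilities = [rng.gauss(75, 6) for _ in range(5_000)]
before = [a + rng.gauss(0, 8) for a in abilities]
after = [a + rng.gauss(0, 8) for a in abilities]  # nobody receives any program

# "Enroll" the lowest-scoring 10% on the first exam, as a remedial program might.
threshold = sorted(before)[len(before) // 10]
enrolled = [i for i, score in enumerate(before) if score < threshold]

gain_enrolled = statistics.mean(after[i] - before[i] for i in enrolled)
gain_everyone = statistics.mean(a - b for a, b in zip(after, before))

print(f"average gain, enrolled group: {gain_enrolled:+.1f}")
print(f"average gain, whole cohort:   {gain_everyone:+.1f}")
```

The enrolled group shows a sizable average gain even though nothing was done for them, while the cohort as a whole barely moves. A comparison group selected by the same rule but left untreated is one common way to separate a real program effect from this kind of regression.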
Common Mistakes
Mistake: Believing regression to the mean is a causal force that 'pulls' values toward the average.
Correction: It is a statistical artifact of imperfect correlation, not a physical mechanism. Extreme values regress because the random component of variation is unlikely to repeat in the same direction.
