Residual
The vertical distance between a data point and the graph of a regression equation. The residual is positive if the data point is above the graph. The residual is negative if the data point is below the graph. The residual is 0 only when the graph passes through the data point.

Key Formula
$e_i = y_i - \hat{y}_i$
Where:
- $e_i$ = The residual for the $i$th data point
- $y_i$ = The actual (observed) $y$-value of the $i$th data point
- $\hat{y}_i$ = The predicted $y$-value from the regression equation for the $i$th data point
Worked Example
Problem: A regression line for study hours vs. test scores is ŷ = 50 + 5x. A student who studied 6 hours scored 83. Find the residual.
Step 1: Identify the actual y-value. The student's actual test score is 83.
$y = 83$
Step 2: Calculate the predicted y-value by substituting x=6 into the regression equation.
$\hat{y} = 50 + 5(6) = 80$
Step 3: Compute the residual by subtracting the predicted value from the actual value.
$e = y - \hat{y} = 83 - 80 = 3$
Step 4: Interpret the result. The residual is positive, so the data point lies 3 units above the regression line. The student scored 3 points higher than the model predicted.
Answer: The residual is 3. The student scored 3 points above the predicted value.
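The four steps above can be sketched in a few lines of Python. This is a minimal illustration only: the function names `predict` and `residual` are made up for this example, and the intercept 50 and slope 5 are taken from the regression line given in the problem.

```python
def predict(x, intercept=50.0, slope=5.0):
    """Predicted test score from the example line y_hat = 50 + 5x."""
    return intercept + slope * x


def residual(x, y_actual):
    """Residual = actual value minus predicted value."""
    return y_actual - predict(x)


# Student who studied 6 hours and scored 83
e = residual(6, 83)
print(e)  # 3.0 (positive, so the point lies above the line)
```

Note the order of subtraction inside `residual`: actual first, predicted second.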
Another Example
This example differs by computing multiple residuals at once, illustrating all three sign cases (positive, negative, zero) and introducing the idea that residuals from a least-squares line sum to zero when all data points are included.
Problem: Using the same regression line ŷ = 50 + 5x, three students studied 4, 8, and 10 hours and scored 68, 95, and 100 respectively. Calculate each residual, then find the sum of the three residuals.
Step 1: Find the predicted score for each student.
$\hat{y}_1 = 50 + 5(4) = 70, \quad \hat{y}_2 = 50 + 5(8) = 90, \quad \hat{y}_3 = 50 + 5(10) = 100$
Step 2: Calculate each residual.
$e_1 = 68 - 70 = -2, \quad e_2 = 95 - 90 = 5, \quad e_3 = 100 - 100 = 0$
Step 3: Interpret the signs. The first student scored below the prediction (negative residual), the second scored above (positive), and the third landed exactly on the line (zero residual).
Step 4: Sum the residuals. For a least-squares regression line fitted to all the data, the residuals sum to exactly zero. Here we only have three of the data points, so the sum may not be zero.
$e_1 + e_2 + e_3 = -2 + 5 + 0 = 3$
Answer: The residuals are −2, 5, and 0, and their sum is 3. This example shows that residuals can be negative, positive, or exactly zero.
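The same calculation can be done for all three students at once. The sketch below uses plain Python lists; the variable names and the `sign_label` helper are illustrative choices, not part of the original problem.

```python
hours = [4, 8, 10]
scores = [68, 95, 100]

# Predicted scores from the example line y_hat = 50 + 5x
predicted = [50 + 5 * x for x in hours]

# Residual = actual - predicted, one per student
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]


def sign_label(e):
    """Describe where the data point sits relative to the line."""
    if e > 0:
        return "above the line"
    if e < 0:
        return "below the line"
    return "on the line"


for x, e in zip(hours, residuals):
    print(f"{x} hours: residual {e} ({sign_label(e)})")

print("sum of residuals:", sum(residuals))  # 3
```

The sum is 3 rather than 0 because these three students are only a subset of the data the line was fitted to.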
Frequently Asked Questions
What does a positive or negative residual mean?
A positive residual means the actual data point is above the regression line—the model underestimated the value. A negative residual means the actual point is below the line—the model overestimated. A residual of zero means the prediction was exactly correct for that point.
Why do the residuals of a least-squares regression line sum to zero?
The least-squares method minimizes the sum of squared residuals. A mathematical consequence of this optimization, for a line fitted with an intercept, is that the residuals sum to exactly zero. Equivalently, the regression line passes through the point $(\bar{x}, \bar{y})$, so overestimates and underestimates balance.
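One way to see this numerically is to fit a least-squares line to a small data set and add up the residuals. The sketch below uses the standard closed-form slope and intercept formulas in plain Python. The data set is hypothetical (invented for this illustration), and `fit_line` is an illustrative helper name.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: hours studied and test scores
xs = [2, 4, 6, 8, 10]
ys = [61, 68, 83, 95, 100]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Both checks hold up to floating-point rounding
print(abs(sum(residuals)) < 1e-9)            # residuals sum to zero
print(abs((b0 + b1 * x_bar) - y_bar) < 1e-9)  # line passes through (x_bar, y_bar)
```

The comparisons use a small tolerance because floating-point arithmetic may leave a tiny nonzero remainder.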
How do you use a residual plot to check a regression model?
Plot the residuals on the vertical axis against the x-values (or predicted values) on the horizontal axis. If the residuals scatter randomly with no pattern, a linear model is appropriate. If you see a curved pattern, the relationship may not be linear and a different model might fit better.
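As a rough illustration of the curved pattern, the sketch below fits a straight line to data that actually follow $y = x^2$ and lists the residuals in order of $x$. The data are hypothetical, `fit_line` is an illustrative helper, and the residuals are printed as text rather than drawn, so no plotting library is assumed.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical curved data: y = x^2
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")
# Residuals run +2, -1, -2, -1, +2: positive at both ends and
# negative in the middle, a U shape instead of random scatter.
```

A systematic run of signs like this is the kind of pattern that suggests a linear model is missing curvature in the data.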
Residual vs. Predicted Value (ŷ)
| | Residual | Predicted Value (ŷ) |
|---|---|---|
| Definition | The difference between the observed and predicted y-values | The y-value estimated by the regression equation for a given x |
| Formula | $e_i = y_i - \hat{y}_i$ | $\hat{y}_i = b_0 + b_1 x_i$ |
| Sign | Can be positive, negative, or zero | Depends on the regression equation; can be any real number |
| Purpose | Measures prediction error for each data point | Provides the model's best estimate for a given input |
Why It Matters
Residuals appear throughout statistics courses whenever you study regression. They are the foundation for assessing how well a model fits data—residual plots reveal whether a linear model is appropriate, and the sum of squared residuals is the quantity minimized when fitting a least-squares line. In AP Statistics and college-level courses, you will be asked to calculate residuals, construct residual plots, and use them to judge model quality.
Common Mistakes
Mistake: Subtracting in the wrong order, computing $\hat{y} - y$ instead of $y - \hat{y}$.
Correction: Always subtract the predicted value from the actual value: $e = y - \hat{y}$. Reversing the order flips the sign of every residual, which changes the interpretation of whether the model over- or underestimates.
Mistake: Confusing residuals with the distance formula or horizontal distance.
Correction: A residual is strictly a vertical distance—the difference in y-values only. It does not involve x-differences or the point-to-line perpendicular distance.
Related Terms
- Regression Equation — The equation whose predictions residuals measure error from
- Least-Squares Regression Line — The line that minimizes the sum of squared residuals
- Scatterplot — Graph where data points and residuals are visualized
- Vertical — Residuals measure vertical distance from the line
- Graph of an Equation or Inequality — The regression curve from which residuals are measured
- Positive Number — Residual is positive when data point is above the line
- Negative Number — Residual is negative when data point is below the line
