Mathwords

Residual

The signed vertical distance between a data point and the graph of a regression equation. The residual is positive if the data point is above the graph and negative if the data point is below the graph. The residual is 0 exactly when the graph passes through the data point.

Figure: Scatterplot with a regression line, showing an actual data point (x, yᵢ) and the predicted point (x, yₚ) on the line; Residual = yᵢ − yₚ (the vertical distance between them).

See also

Scatterplot, least squares regression line

Key Formula

$$e_i = y_i - \hat{y}_i$$
Where:
  • $e_i$ = The residual for the $i$th data point
  • $y_i$ = The actual (observed) $y$-value of the $i$th data point
  • $\hat{y}_i$ = The predicted $y$-value from the regression equation for the $i$th data point

Worked Example

Problem: A regression line for study hours vs. test scores is ŷ = 50 + 5x. A student who studied 6 hours scored 83. Find the residual.
Step 1: Identify the actual $y$-value. The student's actual test score is 83.
$$y = 83$$
Step 2: Calculate the predicted $y$-value by substituting $x = 6$ into the regression equation.
$$\hat{y} = 50 + 5(6) = 80$$
Step 3: Compute the residual by subtracting the predicted value from the actual value.
$$e = y - \hat{y} = 83 - 80 = 3$$
Step 4: Interpret the result. The residual is positive, so the data point lies 3 units above the regression line. The student scored 3 points higher than the model predicted.
Answer: The residual is 3. The student scored 3 points above the predicted value.
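The steps above can be sketched in a few lines of Python. The function names `predict` and `residual` are illustrative only (not from any library); the intercept 50 and slope 5 come from the example's line ŷ = 50 + 5x.

```python
def predict(x: float, intercept: float, slope: float) -> float:
    """Predicted y-value from the line y-hat = intercept + slope * x."""
    return intercept + slope * x


def residual(x: float, y: float, intercept: float, slope: float) -> float:
    """Residual e = actual y minus predicted y-hat."""
    return y - predict(x, intercept, slope)


# Worked example: y-hat = 50 + 5x; the student studied 6 hours and scored 83.
y_hat = predict(6, 50, 5)
e = residual(6, 83, 50, 5)

print(f"predicted = {y_hat}, residual = {e}")
# A positive residual means the point lies above the line.
print("above the line" if e > 0 else "below the line" if e < 0 else "on the line")
```

The sign test at the end mirrors Step 4: the residual is positive, so the point sits above the line.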

Another Example

This example differs by computing multiple residuals at once, illustrating all three sign cases (positive, negative, zero) and introducing the idea that residuals from a least-squares line sum to zero when all data points are included.

Problem: Using the same regression line ŷ = 50 + 5x, three students studied 4, 8, and 10 hours and scored 68, 95, and 100, respectively. Calculate each residual and find their sum.
Step 1: Find the predicted score for each student.
$$\hat{y}_1 = 50 + 5(4) = 70, \quad \hat{y}_2 = 50 + 5(8) = 90, \quad \hat{y}_3 = 50 + 5(10) = 100$$
Step 2: Calculate each residual.
$$e_1 = 68 - 70 = -2, \quad e_2 = 95 - 90 = 5, \quad e_3 = 100 - 100 = 0$$
Step 3: Interpret the signs. The first student scored below the prediction (negative residual), the second scored above (positive), and the third landed exactly on the line (zero residual).
Step 4: Sum the residuals. For a least-squares regression line, the residuals over all the data the line was fitted to sum to exactly zero. Here we have only three data points, so their sum need not be zero.
$$e_1 + e_2 + e_3 = -2 + 5 + 0 = 3$$
Answer: The residuals are −2, 5, and 0, and their sum is 3. This example shows that residuals can be negative, positive, or exactly zero.
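The same computation for several points at once can be sketched as a list comprehension. The names `INTERCEPT`, `SLOPE`, and `describe` are illustrative choices, not part of the original example.

```python
# Regression line from the example: y-hat = 50 + 5x.
INTERCEPT, SLOPE = 50, 5

hours = [4, 8, 10]
scores = [68, 95, 100]

# Residual e = actual - predicted, one per student.
residuals = [y - (INTERCEPT + SLOPE * x) for x, y in zip(hours, scores)]


def describe(e):
    """Translate the sign of a residual into words."""
    if e > 0:
        return "above the line (model underestimated)"
    if e < 0:
        return "below the line (model overestimated)"
    return "on the line"


for x, e in zip(hours, residuals):
    print(f"{x} hours: residual {e:+d}, {describe(e)}")

# Only three points from the data, so this sum is not guaranteed to be zero.
print("sum of residuals:", sum(residuals))
```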

Frequently Asked Questions

What does a positive or negative residual mean?
A positive residual means the actual data point is above the regression line—the model underestimated the value. A negative residual means the actual point is below the line—the model overestimated. A residual of zero means the prediction was exactly correct for that point.
Why do the residuals of a least-squares regression line sum to zero?
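The least-squares method chooses the intercept $b_0$ and slope $b_1$ that minimize the sum of squared residuals $\sum (y_i - b_0 - b_1 x_i)^2$. Setting the derivative with respect to $b_0$ equal to zero gives $-2\sum (y_i - b_0 - b_1 x_i) = 0$, so the residuals sum to exactly zero (for any least-squares line that includes an intercept). Dividing $\sum (y_i - b_0 - b_1 x_i) = 0$ by $n$ gives $\bar{y} = b_0 + b_1\bar{x}$, which means the regression line passes through the point $(\bar{x}, \bar{y})$, balancing overestimates and underestimates.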
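As a numerical check of this property, here is a minimal sketch that fits a least-squares line with the standard closed-form formulas (slope $b_1 = S_{xy}/S_{xx}$, intercept $b_0 = \bar{y} - b_1\bar{x}$) and then sums the residuals. The function name `fit_least_squares` is hypothetical, and pooling the four (hours, score) pairs from the examples into one data set is an assumption made for illustration; the fitted line is therefore not ŷ = 50 + 5x.

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_xy: sum of cross-deviations; S_xx: sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# The four (hours, score) pairs from the examples, pooled into one data set.
xs = [6, 4, 8, 10]
ys = [83, 68, 95, 100]

b0, b1 = fit_least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(f"fitted line: y-hat = {b0:.1f} + {b1:.1f}x")
# Zero up to floating-point rounding.
print("sum of residuals:", round(sum(residuals), 10))
```

For these four points the fit works out to ŷ = 48.7 + 5.4x, with residuals of about 1.9, −2.3, 3.1, and −2.7, which cancel to zero.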
How do you use a residual plot to check a regression model?
Plot the residuals on the vertical axis against the $x$-values (or predicted values) on the horizontal axis. If the residuals scatter randomly around zero with no pattern, a linear model is likely appropriate. If you see a curved pattern, the relationship may not be linear, and a different model might fit better.
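To see what a "curved pattern" looks like, the sketch below fits a straight line to made-up data following $y = x^2$ and prints a crude text residual plot. The data, the helper name `fit_least_squares`, and the plot layout are all illustrative assumptions. The plot is rotated: $x$ runs down the page, the residual runs across, and `|` marks zero.

```python
def fit_least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical curved data: y = x^2, which no straight line can capture.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

b0, b1 = fit_least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Text plot: 2 characters per unit of residual; assumes |e| <= 5 for this width.
for x, e in zip(xs, residuals):
    row = [" "] * 21
    row[10] = "|"                   # zero line
    row[10 + round(2 * e)] = "*"    # residual position
    print(f"x={x}  {''.join(row)}  e={e:+.1f}")
```

The residuals come out positive at both ends and negative in the middle, a U shape that signals the linear model is missing curvature.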

Residual vs. Predicted Value (ŷ)

| | Residual | Predicted Value (ŷ) |
|---|---|---|
| Definition | The difference between the observed and predicted y-values | The y-value estimated by the regression equation for a given x |
| Formula | $e_i = y_i - \hat{y}_i$ | $\hat{y}_i = b_0 + b_1 x_i$ |
| Sign | Can be positive, negative, or zero | Depends on the regression equation; can be any real number |
| Purpose | Measures prediction error for each data point | Provides the model's best estimate for a given input |

Why It Matters

Residuals appear throughout statistics courses whenever you study regression. They are the foundation for assessing how well a model fits data—residual plots reveal whether a linear model is appropriate, and the sum of squared residuals is the quantity minimized when fitting a least-squares line. In AP Statistics and college-level courses, you will be asked to calculate residuals, construct residual plots, and use them to judge model quality.
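To make "the sum of squared residuals is the quantity minimized" concrete, this sketch compares the sum of squared residuals (SSE) of a least-squares line with that of the example line ŷ = 50 + 5x on the same points. The helper name `sse` is illustrative, and pooling the four (hours, score) pairs from the examples into one data set is an assumption.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# The four (hours, score) pairs from the examples, pooled into one data set.
xs = [6, 4, 8, 10]
ys = [83, 68, 95, 100]

# Least-squares slope and intercept via the closed-form formulas.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

print("SSE, least-squares line:", round(sse(xs, ys, b0, b1), 6))
print("SSE, y-hat = 50 + 5x:   ", sse(xs, ys, 50, 5))
```

On these points the least-squares line has an SSE of 25.8, versus 38 for ŷ = 50 + 5x; by construction, no other straight line can achieve a smaller SSE on this data.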

Common Mistakes

Mistake: Subtracting in the wrong order, computing $\hat{y} - y$ instead of $y - \hat{y}$.
Correction: Always subtract the predicted value from the actual value: $e = y - \hat{y}$. Reversing the order flips the sign of every residual, which changes the interpretation of whether the model over- or underestimates.
Mistake: Confusing residuals with the distance formula or horizontal distance.
Correction: A residual is strictly a vertical distance: the difference in $y$-values only. It does not involve $x$-differences or the point-to-line perpendicular distance.
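The gap between the two notions shows up with the worked example's point (6, 83) and line ŷ = 50 + 5x. This sketch computes the residual alongside the perpendicular distance from the point to the line, using the standard point-to-line formula $|b_1 x_0 - y_0 + b_0| / \sqrt{b_1^2 + 1}$; the variable names are illustrative.

```python
import math

# Worked-example line y-hat = 50 + 5x and the student's point (6, 83).
b0, b1 = 50, 5
x0, y0 = 6, 83

# Residual: vertical gap only, comparing y-values at the same x.
residual = y0 - (b0 + b1 * x0)

# Perpendicular (shortest) distance from the point to the line b1*x - y + b0 = 0.
perpendicular = abs(b1 * x0 - y0 + b0) / math.sqrt(b1 ** 2 + 1)

print("residual:", residual)
print("perpendicular distance:", round(perpendicular, 3))
```

The residual is 3, while the perpendicular distance is only $3/\sqrt{26} \approx 0.588$; regression uses the vertical one because it measures error in the predicted $y$-value.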

Related Terms