Residual Plot
A residual plot is a scatterplot that displays the residuals (the differences between observed and predicted values) on the vertical axis against the predicted values (or the explanatory variable) on the horizontal axis. It helps you judge whether a linear regression model is appropriate for your data.
A residual plot graphs the residuals on the $y$-axis against either the fitted values or the explanatory variable on the $x$-axis. If the regression model is a good fit, the residuals should appear randomly scattered around the horizontal line $y = 0$, with no obvious pattern, curvature, or systematic change in spread. Patterns in a residual plot signal that the chosen model does not adequately capture the relationship in the data.
Key Formula
$$e_i = y_i - \hat{y}_i$$
Where:
- $e_i$ = the residual for the $i$th observation
- $y_i$ = the observed value of the response variable
- $\hat{y}_i$ = the predicted value from the regression model
Worked Example
Problem: A simple linear regression predicts test scores from hours of study. The data for five students gives observed scores $y$ and predicted scores $\hat{y}$. Compute the residuals and describe what to look for in the residual plot.
| Student | Hours ($x$) | Observed ($y$) | Predicted ($\hat{y}$) |
|---------|------------|----------------|----------------------|
| 1 | 1 | 52 | 50 |
| 2 | 2 | 58 | 60 |
| 3 | 3 | 72 | 70 |
| 4 | 4 | 78 | 80 |
| 5 | 5 | 92 | 90 |
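Step 1: Calculate each residual as observed minus predicted, $e_i = y_i - \hat{y}_i$: $52 - 50 = 2$, $58 - 60 = -2$, $72 - 70 = 2$, $78 - 80 = -2$, $92 - 90 = 2$.
Step 2: Plot each predicted value on the horizontal axis and the corresponding residual on the vertical axis. The five points are $(50, 2)$, $(60, -2)$, $(70, 2)$, $(80, -2)$, $(90, 2)$.
Step 3: Examine the plot for patterns. Here the residuals alternate between $+2$ and $-2$, staying close to zero with no curvature or fan shape.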
Step 4: Because the residuals are small and show no systematic pattern, the linear model appears to be a reasonable fit for these data.
Answer: The residuals are $2, -2, 2, -2, 2$. The residual plot shows points scattered closely around the line $y = 0$ with no clear pattern, suggesting the linear model fits well.
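The arithmetic in this example can be checked with a short Python sketch. The variable names (`observed`, `predicted`, `plot_points`) are illustrative and not part of the original problem; the values are copied directly from the table.

```python
# Values copied from the worked-example table.
observed = [52, 58, 72, 78, 92]
predicted = [50, 60, 70, 80, 90]

# Residual = observed - predicted, computed pairwise.
residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]

# Each point on the residual plot is (predicted value, residual).
plot_points = list(zip(predicted, residuals))

print(residuals)    # [2, -2, 2, -2, 2]
print(plot_points)  # [(50, 2), (60, -2), (70, 2), (80, -2), (90, 2)]
```

Pairing each residual with its predicted value gives exactly the coordinates used in Step 2.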
Visualization
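One rough way to picture the residual plot from the worked example without a plotting library is a text grid. This is only a minimal sketch: `text_residual_plot` is a hypothetical helper written for this illustration, and it assumes integer residuals so that each residual lands on exactly one row.

```python
def text_residual_plot(fitted, residuals):
    """Build a text residual plot: one column per point, one row per residual level.

    Hypothetical helper for illustration only; assumes integer residuals.
    """
    top = max(max(residuals), 0)
    bottom = min(min(residuals), 0)
    rows = []
    for level in range(top, bottom - 1, -1):   # highest residual level first
        fill = "-" if level == 0 else " "      # dashes mark the zero line
        cells = "".join(
            format("*" if r == level else fill, f"{fill}>4") for r in residuals
        )
        rows.append(f"{level:>3} |{cells}".rstrip())
    # Fitted values label the horizontal axis, one per column.
    rows.append(" " * 5 + "".join(f"{v:>4}" for v in fitted))
    return "\n".join(rows)


# (predicted value, residual) pairs from the worked example.
fitted = [50, 60, 70, 80, 90]
residuals = [2, -2, 2, -2, 2]

plot = text_residual_plot(fitted, residuals)
print(plot)
```

The stars sit at $+2$ and $-2$ on either side of the dashed zero line, with no curve or widening spread from left to right. For real data, a scatterplot from a graphing calculator or plotting library would serve the same purpose.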
Why It Matters
In AP Statistics, you cannot tell whether a linear model is appropriate just by looking at the original scatterplot or the correlation coefficient. A residual plot is the standard diagnostic tool: a curved pattern tells you a nonlinear model might work better, while a fan or funnel shape warns that the variability in your response is not constant. Checking residual plots is a required step whenever you perform regression analysis on the AP exam.
Common Mistakes
Mistake: Concluding that the linear model is fine because $r$ is high, even though the residual plot shows a clear curved pattern.
Correction: A high correlation coefficient does not guarantee linearity. If the residual plot shows curvature, the linear model is not appropriate regardless of $r$.
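A small sketch with made-up data shows how this can happen. The data below ($y = x^2$ for $x = 1, \dots, 10$) are hypothetical and chosen only because they are obviously curved; the line is fitted by ordinary least squares using the standard formulas.

```python
from math import sqrt

# Made-up, deliberately curved data (not from the text): y = x^2.
x = list(range(1, 11))
y = [v ** 2 for v in x]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)            # correlation coefficient
slope = sxy / sxx                    # least-squares slope
intercept = mean_y - slope * mean_x  # least-squares intercept

# Residual = observed - predicted for each point.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

print(round(r, 3))  # 0.975
print(signs)        # ++------++
```

Even with $r \approx 0.975$, the residuals are positive at both ends and negative in the middle. That U-shape is the curvature a residual plot would reveal and the correlation coefficient hides.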
Mistake: Expecting the residual plot to show a linear trend when the model is a good fit.
Correction: A good residual plot looks like a random cloud of points centered on zero. Any visible pattern — linear, curved, or fan-shaped — indicates a problem with the model.
