Coefficient of Determination (R²)
The coefficient of determination, written as R², is a number between 0 and 1 that tells you what proportion of the variation in your dependent variable is explained by your regression model. An R² of 0.85, for instance, means 85% of the variability in the response variable can be accounted for by the independent variable(s).
The coefficient of determination, denoted $R^2$, quantifies the fraction of the total variability in the response variable that is captured by a regression model. It is computed as 1 minus the ratio of the residual sum of squares ($SS_{\text{res}}$) to the total sum of squares ($SS_{\text{tot}}$). Values range from 0 (the model explains none of the variability) to 1 (the model explains all of the variability). In simple linear regression, $R^2$ equals the square of the Pearson correlation coefficient $r$.
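Key Formula
$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}$$
Where:
- $R^2$ = the coefficient of determination
- $y_i$ = each observed value of the dependent variable
- $\hat{y}_i$ = the predicted value from the regression model
- $\bar{y}$ = the mean of all observed y values
- $SS_{\text{res}}$ = the residual (error) sum of squares
- $SS_{\text{tot}}$ = the total sum of squares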
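The definition, one minus the ratio of the residual sum of squares to the total sum of squares, can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `r_squared` and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def r_squared(y_obs, y_pred):
    """Return 1 - SS_res / SS_tot for paired observed and predicted values."""
    if len(y_obs) != len(y_pred):
        raise ValueError("y_obs and y_pred must have the same length")
    y_mean = sum(y_obs) / len(y_obs)
    # Residual sum of squares: squared gaps between observed and predicted values.
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_obs, y_pred))
    # Total sum of squares: squared gaps between observed values and their mean.
    ss_tot = sum((y - y_mean) ** 2 for y in y_obs)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical values, used only to illustrate the calculation.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
r2 = r_squared(observed, predicted)
print(round(r2, 4))
```

Here the total sum of squares is 20 and the residual sum of squares is 1, so the result is 1 - 1/20 = 0.95.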
Worked Example
Problem: Five students' hours of study (x) and test scores (y) are recorded. After fitting a regression line, you find that the correlation coefficient is r = 0.90. Find and interpret R².
Step 1: Square the correlation coefficient to get R².
$$R^2 = r^2 = (0.90)^2 = 0.81$$
Step 2: Convert to a percentage for interpretation.
$$0.81 \times 100\% = 81\%$$
Step 3: Write the interpretation in context. About 81% of the variation in test scores can be explained by the linear relationship with hours of study.
Answer: R² = 0.81. Approximately 81% of the variability in test scores is explained by hours of study.
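The problem gives only r = 0.90, not the five data pairs, so the shortcut R² = r² cannot be checked against the students' actual numbers. The sketch below instead uses made-up `hours` and `scores` values (an assumption for illustration only) to show that, for a least-squares line with one predictor, squaring r and computing one minus the ratio of the residual to the total sum of squares give the same result.

```python
from math import sqrt

# Worked example from the text: square the given correlation coefficient.
r_example = 0.90
r2_example = round(r_example ** 2, 2)  # 0.81

# Hypothetical data; NOT the values behind r = 0.90 in the problem.
hours = [1, 2, 3, 4, 5]
scores = [55, 70, 62, 85, 80]

n = len(hours)
x_mean = sum(hours) / n
y_mean = sum(scores) / n

# Sums of squares and cross-products about the means.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(hours, scores))
s_xx = sum((x - x_mean) ** 2 for x in hours)
s_yy = sum((y - y_mean) ** 2 for y in scores)  # this is the total sum of squares

# Pearson correlation coefficient.
r = s_xy / sqrt(s_xx * s_yy)

# Least-squares regression line and its predictions.
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean
predicted = [intercept + slope * x for x in hours]

# R^2 from the sums-of-squares definition.
ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(scores, predicted))
r2_from_sums = 1 - ss_res / s_yy

print("r squared:      ", round(r ** 2, 4))
print("1 - SSres/SStot:", round(r2_from_sums, 4))
```

The two printed values agree up to floating-point rounding. The agreement relies on the line being the least-squares fit with a single predictor; it is not a general property of arbitrary predictions.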
Why It Matters
R² is one of the first things researchers and analysts check when evaluating a model. In AP Statistics, you are expected to calculate R² and — more importantly — interpret it in the context of the data. Beyond the classroom, R² helps scientists decide whether a model is useful for prediction, such as determining how well advertising spending predicts sales revenue or how well temperature explains ice cream demand.
Common Mistakes
Mistake: Interpreting R² as proving causation between x and y.
Correction: R² measures the strength of a statistical association, not a cause-and-effect relationship. A high R² does not mean x causes y; other variables or coincidence may be involved.
Mistake: Saying "R² = 0.81 means 81% of the data points fall on the regression line."
Correction: R² describes the proportion of variance explained, not the percentage of points on the line. The correct phrasing is: 81% of the variability in the response variable is explained by the model.
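The second mistake can be checked numerically. The following sketch uses hypothetical data (not taken from any example above) in which no point lies exactly on the fitted line, yet R² is above 0.99. The tolerance `1e-9` used to decide whether a point is "on the line" is also an assumption.

```python
# Hypothetical data that follow a nearly linear trend with small deviations.
x_vals = [1, 2, 3, 4, 5]
y_vals = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x_vals)
x_mean = sum(x_vals) / n
y_mean = sum(y_vals) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_vals, y_vals))
s_xx = sum((x - x_mean) ** 2 for x in x_vals)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean
fitted = [intercept + slope * x for x in x_vals]

# Proportion of variability explained by the line.
ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_vals, fitted))
ss_tot = sum((y - y_mean) ** 2 for y in y_vals)
r2 = 1 - ss_res / ss_tot

# Count the points whose residual is zero (within a small tolerance).
points_on_line = sum(1 for y, y_hat in zip(y_vals, fitted) if abs(y - y_hat) < 1e-9)

print("R squared:", round(r2, 4))
print("Points exactly on the line:", points_on_line, "of", n)
```

Every point has a nonzero residual, so zero of the five points are on the line, while the line still accounts for nearly all of the variability in y.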
