Explanatory Variable — Definition, Formula & Examples
An explanatory variable is the variable you believe may influence or help predict changes in another variable, called the response variable. In a study or regression model, it is the factor whose effect on the outcome you are investigating.
In statistical analysis, the explanatory variable (sometimes denoted $x$) is the predictor placed on the horizontal axis of a scatterplot or used as input in a regression equation. Unlike the term "independent variable," which implies direct causation through experimental manipulation, "explanatory variable" is deliberately neutral: it indicates a predictive or associative relationship that may or may not be causal, depending on the study design.
Key Formula
$$\hat{y} = a + bx$$
Where:
- $x$ = Explanatory variable (predictor)
- $\hat{y}$ = Predicted value of the response variable
- $b$ = Slope: the predicted change in $y$ for each one-unit increase in $x$
- $a$ = y-intercept: the predicted value of $y$ when $x = 0$
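To make the roles of the slope and intercept concrete, here is a minimal Python sketch of a line with an intercept and a slope. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration and not taken from any data set.

```python
def predict(x, intercept, slope):
    """Return the predicted response: intercept + slope * x."""
    return intercept + slope * x

# Hypothetical coefficients, used only to illustrate the two definitions.
intercept, slope = 10.0, 2.5

# At x = 0 the prediction equals the intercept.
at_zero = predict(0, intercept, slope)

# Raising x by one unit changes the prediction by exactly the slope.
step = predict(4, intercept, slope) - predict(3, intercept, slope)

print(at_zero, step)  # prints: 10.0 2.5
```

The same difference of 2.5 would appear between any two x values one unit apart, which is what "predicted change per one-unit increase" means for a straight line.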
How It Works
When you design a study or build a regression model, you identify which variable you think does the explaining (explanatory) and which variable responds (response). In an experiment, you actively set the values of the explanatory variable and observe the response. In an observational study, you simply record both variables as they naturally occur. The distinction matters because only a well-designed experiment with random assignment lets you claim the explanatory variable *causes* changes in the response. In observational studies, the explanatory variable helps predict or is associated with the response, but lurking variables may be the true cause.
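The difference between the two designs can be illustrated with a small simulation. The sketch below is an assumption-laden toy model, not a description of any real study: a lurking variable `z` drives the response, and the explanatory variable `x` has no causal effect at all. In the "observational" setting `x` is partly driven by `z`; in the "experiment" setting `x` is generated independently of `z`, standing in for random assignment. All names and numeric settings are hypothetical.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate(randomized, n=2000, seed=0):
    """Correlation between x and y when x has no causal effect on y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)          # lurking variable
        if randomized:
            x = rng.gauss(0, 1)      # x is set independently of z
        else:
            x = z + rng.gauss(0, 1)  # observed x is partly driven by z
        y = 2 * z + rng.gauss(0, 1)  # y depends on z only, never on x
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)

r_observational = simulate(randomized=False)
r_experiment = simulate(randomized=True)
print(round(r_observational, 2), round(r_experiment, 2))
```

Under these assumptions the observational correlation should come out clearly positive even though `x` does nothing to `y`, while the correlation under independent assignment should sit near zero.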
Worked Example
Problem: A researcher records the number of hours 5 students studied and their exam scores. Hours studied: 1, 2, 3, 4, 5. Exam scores: 55, 62, 68, 75, 80. Identify the explanatory variable, sketch the relationship, and write the least-squares regression equation.
Identify variables: Hours studied is the explanatory variable ($x$) because the researcher believes study time helps predict exam performance. Exam score is the response variable ($y$).
Plot the data: Place hours studied on the horizontal axis and exam score on the vertical axis. The points trend upward, suggesting a positive linear association.
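Compute the regression equation: Using the least-squares formulas with $\bar{x} = 3$, $\bar{y} = 68$, the slope $b$ and intercept $a$ are:
$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{63}{10} = 6.3, \qquad a = \bar{y} - b\bar{x} = 68 - 6.3(3) = 49.1$$
Write the equation: The least-squares regression line is:
$$\hat{y} = 49.1 + 6.3x$$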
Answer: Hours studied is the explanatory variable. The regression equation $\hat{y} = 49.1 + 6.3x$ predicts that each additional hour of study is associated with a 6.3-point increase in predicted exam score.
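The arithmetic in this example can be checked with a short script. This is a minimal sketch using only the Python standard library and the five data points from the problem; the helper name `least_squares` is an illustrative choice, not part of any library.

```python
hours = [1, 2, 3, 4, 5]         # explanatory variable
scores = [55, 62, 68, 75, 80]   # response variable

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares(hours, scores)
print(f"y-hat = {intercept:.1f} + {slope:.1f}x")  # prints: y-hat = 49.1 + 6.3x
```

The intermediate sums are 63 for the cross-deviations and 10 for the squared deviations of hours, which gives the slope of 6.3.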
Why It Matters
Correctly identifying the explanatory variable is a core skill tested on the AP Statistics exam, especially in free-response questions about study design and inference for regression. Beyond the exam, researchers in public health, economics, and social science must distinguish explanatory from response variables to avoid overstating results — confusing association with causation can lead to flawed policy decisions.
Common Mistakes
Mistake: Assuming the explanatory variable causes changes in the response
Correction: An explanatory variable predicts or is associated with the response. Causation requires a randomized experiment with proper controls. In observational studies, lurking or confounding variables may be responsible for the observed association.
Mistake: Placing the explanatory variable on the vertical (y) axis
Correction: By convention, the explanatory variable goes on the horizontal (x) axis and the response variable goes on the vertical (y) axis. Reversing them changes the regression equation and can confuse interpretation of the slope.
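To see why reversing the roles changes the equation, the sketch below fits the worked-example data both ways. It assumes ordinary least squares and uses a hypothetical helper named `fit_line`; the point is only that the reversed slope is not the reciprocal of the original one.

```python
def fit_line(xs, ys):
    """Least-squares (intercept, slope) for predicting ys from xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

hours = [1, 2, 3, 4, 5]
scores = [55, 62, 68, 75, 80]

a_yx, b_yx = fit_line(hours, scores)  # scores predicted from hours
a_xy, b_xy = fit_line(scores, hours)  # hours predicted from scores

# If swapping axes merely inverted the line, b_xy would equal 1 / b_yx.
print(round(b_yx, 4), round(b_xy, 4), round(1 / b_yx, 4))
```

The two fits minimize different sets of residuals (vertical distances in score versus vertical distances in hours), so they describe different lines unless the points fall exactly on a straight line.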
