Mathwords

Regression Equation

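A function of a particular form (linear, quadratic, exponential, etc.) that fits a set of paired data as closely as possible, usually in the least-squares sense (minimizing the sum of the squared differences between the observed and predicted values).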

See also

Linear regression, least-squares regression equation, least-squares regression line, scatterplot

Key Formula

$$\hat{y} = b_0 + b_1 x$$
Where:
  • ŷ = Predicted value of the dependent variable
  • b₀ = y-intercept (the predicted value of y when x = 0)
  • b₁ = Slope (the change in ŷ for each one-unit increase in x)
  • x = Value of the independent (explanatory) variable

Worked Example

Problem: A teacher collects data on hours studied (x) and test scores (y) for 5 students: (1, 52), (2, 58), (3, 65), (4, 70), (5, 80). Find the linear regression equation.
Step 1: Compute the means of x and y. Sum the x-values and y-values, then divide each by n = 5.
$$\bar{x} = \frac{1+2+3+4+5}{5} = 3, \quad \bar{y} = \frac{52+58+65+70+80}{5} = 65$$
Step 2: Compute the slope using the least-squares formula. Calculate the sum of products of deviations and the sum of squared deviations of x.
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{(-2)(-13)+(-1)(-7)+(0)(0)+(1)(5)+(2)(15)}{4+1+0+1+4} = \frac{26+7+0+5+30}{10} = \frac{68}{10} = 6.8$$
Step 3: Compute the y-intercept using the means and the slope.
$$b_0 = \bar{y} - b_1 \bar{x} = 65 - 6.8(3) = 65 - 20.4 = 44.6$$
Step 4: Write the regression equation.
$$\hat{y} = 44.6 + 6.8x$$
Step 5: Use the equation to predict a score. For example, a student who studies 6 hours:
$$\hat{y} = 44.6 + 6.8(6) = 44.6 + 40.8 = 85.4$$
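Answer: The regression equation is ŷ = 44.6 + 6.8x. A student who studies 6 hours is predicted to score about 85.4. Because 6 hours lies outside the observed range of 1 to 5 hours, this prediction is an extrapolation and should be treated with caution.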
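Steps 1 through 5 can be reproduced with a short Python sketch that uses only the standard library. The helper names `fit_line` and `predict` are illustrative choices, not a library API; the arithmetic follows the deviation formulas used above.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1, using the deviation formulas from Steps 1-3."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x):
    """Evaluate the regression equation y-hat = b0 + b1 * x."""
    return b0 + b1 * x


hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]
b0, b1 = fit_line(hours, scores)
print(f"y-hat = {b0:.1f} + {b1:.1f}x")   # y-hat = 44.6 + 6.8x
print(round(predict(b0, b1, 6), 1))       # 85.4
```

Python 3.10 and later also provide `statistics.linear_regression(x, y)`, which returns the same slope and intercept for simple linear regression.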

Another Example

This example uses exponential regression instead of linear, showing how the regression equation concept extends beyond straight lines. It also demonstrates the logarithmic transformation technique.

Problem: A biologist measures the population of bacteria (in thousands) at various times: (0, 2), (1, 6), (2, 18), (3, 55), (4, 160). The growth appears exponential. Find an exponential regression equation of the form ŷ = a · bˣ.
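Step 1: Transform the data by taking the natural logarithm of each y-value. Taking ln of both sides of ŷ = a · bˣ gives ln ŷ = ln a + x · ln b, which is linear in x. So let Y = ln(y) and fit a line Y = b₀ + b₁x; afterward, a = e^(b₀) and b = e^(b₁).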
$$Y_i: \quad \ln 2 \approx 0.693,\; \ln 6 \approx 1.792,\; \ln 18 \approx 2.890,\; \ln 55 \approx 4.007,\; \ln 160 \approx 5.075$$
Step 2: Find the means of x and Y.
$$\bar{x} = 2, \quad \bar{Y} = \frac{0.693+1.792+2.890+4.007+5.075}{5} \approx 2.891$$
Step 3: Compute the slope of the linear regression on the transformed data.
$$b_1 = \frac{\sum (x_i - \bar{x})(Y_i - \bar{Y})}{\sum (x_i - \bar{x})^2} = \frac{(-2)(-2.198)+(-1)(-1.099)+(0)(-0.001)+(1)(1.116)+(2)(2.184)}{4+1+0+1+4} \approx \frac{4.396+1.099+0+1.116+4.368}{10} = \frac{10.979}{10} \approx 1.098$$
Step 4: Compute the intercept of the transformed regression, then convert back to exponential form.
$$b_0 = \bar{Y} - b_1\bar{x} = 2.891 - 1.098(2) = 0.695 \quad \Rightarrow \quad a = e^{0.695} \approx 2.00, \quad b = e^{1.098} \approx 3.00$$
Step 5: Write the exponential regression equation.
$$\hat{y} = 2.00 \cdot 3.00^{x}$$
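Answer: Rounding the coefficients, the exponential regression equation is ŷ = 2 · 3ˣ (population in thousands). For example, at x = 5 the predicted population is 2 · 3⁵ = 486 thousand; since x = 5 lies beyond the observed times 0 to 4, this is also an extrapolation.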
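The log-transform method can be sketched the same way. This minimal Python version assumes every y-value is positive (ln is undefined otherwise); `fit_exponential` is an illustrative name. Like Steps 1 through 4, it minimizes squared error in ln(y), not squared error in y itself.

```python
import math


def fit_exponential(xs, ys):
    """Fit y-hat = a * b**x by a least-squares line on (x, ln y). Requires all y > 0."""
    log_ys = [math.log(y) for y in ys]            # Step 1: Y = ln(y)
    n = len(xs)
    x_bar = sum(xs) / n                            # Step 2: means of x and Y
    log_y_bar = sum(log_ys) / n
    s_xy = sum((x - x_bar) * (ly - log_y_bar) for x, ly in zip(xs, log_ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx                               # Step 3: slope on transformed data
    b0 = log_y_bar - b1 * x_bar                    # Step 4: intercept on transformed data
    return math.exp(b0), math.exp(b1)              # convert back: a = e^b0, b = e^b1


times = [0, 1, 2, 3, 4]
population = [2, 6, 18, 55, 160]   # thousands
a, b = fit_exponential(times, population)
print(f"y-hat = {a:.2f} * {b:.2f}^x")   # y-hat = 2.00 * 3.00^x
```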

Frequently Asked Questions

What is the difference between a regression equation and a correlation coefficient?
A regression equation gives you a formula for predicting one variable from another, while the correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. The correlation coefficient is a single number between −1 and 1; the regression equation is a function you can use to make predictions. A strong correlation (r close to ±1) means the data points lie close to the regression line, so predictions within the data range tend to be more reliable, although r alone cannot confirm that a straight line is the right model.
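To see the two quantities side by side, here is a minimal Python sketch that computes r for the study-hours data from the first worked example and checks the standard identity b₁ = r · (s_y / s_x), which links the slope to the correlation coefficient. The helper name `pearson_r` is illustrative; `statistics.stdev` is the standard-library sample standard deviation.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

r = pearson_r(hours, scores)
# The least-squares slope equals r scaled by the ratio of sample standard deviations.
slope = r * statistics.stdev(scores) / statistics.stdev(hours)

print(round(r, 3))      # 0.994
print(round(slope, 3))  # 6.8
```

So r is one number describing the fit, while the slope it helps produce belongs to the predictive equation.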
How do you know which type of regression equation to use?
Start by plotting your data on a scatterplot. If the points follow a straight-line pattern, use linear regression. If they curve in a parabolic shape, try quadratic regression. If the data show rapid growth or decay by a roughly constant factor, exponential regression is likely a better fit. Many calculators and software tools report an R² value for each model. A higher R² indicates a closer fit, but compare models on the same y-scale and look at the residuals too, because a more flexible model (such as a higher-degree polynomial) can raise R² without describing the pattern any better.
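As an illustration of comparing models on the same scale, this Python sketch fits both a line and an exponential (via the log transform) to the bacteria data from the second example, then computes R² = 1 − SSE/SST on the original y-values for each. An R² computed on the transformed data would describe the fit to ln(y), not to y, so it is recomputed here. The helper names are illustrative.

```python
import math


def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def r_squared(ys, preds):
    """R^2 = 1 - SSE / SST, measured on the original y-scale."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


times = [0, 1, 2, 3, 4]
population = [2, 6, 18, 55, 160]

# Linear model: y-hat = b0 + b1 * x
b0, b1 = least_squares(times, population)
linear_preds = [b0 + b1 * x for x in times]

# Exponential model: line on (x, ln y), then y-hat = e^c0 * (e^c1)^x
c0, c1 = least_squares(times, [math.log(y) for y in population])
exp_preds = [math.exp(c0) * math.exp(c1) ** x for x in times]

r2_lin = r_squared(population, linear_preds)
r2_exp = r_squared(population, exp_preds)
print(round(r2_lin, 4))  # 0.7669
print(round(r2_exp, 4))  # 0.9997
```

Here the exponential model fits far better, which agrees with the curved scatterplot; the R² values support that visual judgment rather than replace it.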
Can you use a regression equation to predict values outside the data range?
You can, but it is risky. Predicting beyond the range of your original data is called extrapolation, and the relationship that holds within the data may not continue outside it. Interpolation, predicting within the data range, is generally more reliable. When you do extrapolate, say so and treat the prediction with caution.
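One simple safeguard is to have the prediction step report whether x falls inside the observed range. This Python sketch hard-codes the coefficients from the first worked example (ŷ = 44.6 + 6.8x); `predict_checked` is a hypothetical helper, not a standard function.

```python
def predict_checked(b0, b1, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for y-hat = b0 + b1 * x.

    is_extrapolation is True when x lies outside the observed range [x_min, x_max].
    """
    y_hat = b0 + b1 * x
    return y_hat, not (x_min <= x <= x_max)


hours = [1, 2, 3, 4, 5]
for h in (3.5, 6):
    y_hat, extrapolated = predict_checked(44.6, 6.8, h, min(hours), max(hours))
    label = "extrapolation" if extrapolated else "interpolation"
    print(f"x = {h}: y-hat = {y_hat:.1f} ({label})")
# x = 3.5: y-hat = 68.4 (interpolation)
# x = 6: y-hat = 85.4 (extrapolation)
```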

Regression Equation vs. Correlation Coefficient

| | Regression Equation | Correlation Coefficient |
|---|---|---|
| What it is | A function (e.g., ŷ = b₀ + b₁x) that predicts one variable from another | A number r between −1 and 1 measuring the strength of linear association |
| Output | A predicted y-value for any given x | A single numeric value indicating direction and strength |
| Purpose | Prediction and modeling | Describing the strength of a relationship |
| Can be non-linear? | Yes: quadratic, exponential, etc. | r measures linear relationships only (R² generalizes to other models) |

Why It Matters

Regression equations appear in nearly every statistics and AP math course, and they are central tools in science, economics, and data analysis. Whenever you need to predict an outcome—test scores from study hours, crop yield from rainfall, revenue from advertising spend—you build a regression equation. Understanding how to compute and interpret one is essential for reading research, making data-driven decisions, and succeeding on standardized tests.

Common Mistakes

Mistake: Confusing the roles of x and y and computing the wrong slope.
Correction: The regression of y on x is not the same as x on y. Always identify which variable is explanatory (x) and which is the response (y) before computing. Switching them produces a completely different equation.
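A quick way to see the difference is to fit both directions on the study-hours data from the first worked example. In this Python sketch (the helper `ls_slope` is illustrative), solving the x-on-y line for y gives a slope of about 6.882 rather than 6.8; the two lines coincide only when the points fall exactly on a line (|r| = 1).

```python
def ls_slope(xs, ys):
    """Least-squares slope for predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

b1_y_on_x = ls_slope(hours, scores)   # points per hour
b1_x_on_y = ls_slope(scores, hours)   # hours per point

print(round(b1_y_on_x, 3))      # 6.8
# Rearranging the x-on-y line to express y in terms of x gives slope 1 / b1_x_on_y
print(round(1 / b1_x_on_y, 3))  # 6.882
```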
Mistake: Using a linear regression equation when the data is clearly non-linear.
Correction: Always examine a scatterplot of your data first. If the points show a curve, a linear model will give poor predictions. Choose the regression form (quadratic, exponential, etc.) that matches the shape of the data, and check the R² value to confirm a good fit.
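Besides the scatterplot, the signs of the residuals (observed y minus predicted ŷ) from a linear fit can reveal curvature. This Python sketch fits a line to the bacteria data from the exponential example; the resulting + − − − + pattern (positive at both ends, negative in the middle) is a typical sign that a straight line is the wrong shape for the data.

```python
# Linear fit to the bacteria data from the exponential example
times = [0, 1, 2, 3, 4]
population = [2, 6, 18, 55, 160]

n = len(times)
x_bar = sum(times) / n
y_bar = sum(population) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(times, population))
s_xx = sum((x - x_bar) ** 2 for x in times)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y-hat
residuals = [y - (b0 + b1 * x) for x, y in zip(times, population)]
print([round(e, 1) for e in residuals])            # [26.8, -5.7, -30.2, -29.7, 38.8]
print(["+" if e > 0 else "-" for e in residuals])  # ['+', '-', '-', '-', '+']
```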

Related Terms