Regression Equation
A function of a particular form (linear, quadratic, exponential, etc.) that fits a set of paired data as closely as possible.
See also
Linear regression, least-squares regression equation, least-squares regression line, scatterplot
Key Formula
$$\hat{y} = b_0 + b_1 x$$
Where:
- ŷ = Predicted value of the dependent variable
- b₀ = y-intercept (the predicted value of y when x = 0)
- b₁ = Slope (the change in ŷ for each one-unit increase in x)
- x = Value of the independent (explanatory) variable
Worked Example
Problem: A teacher collects data on hours studied (x) and test scores (y) for 5 students: (1, 52), (2, 58), (3, 65), (4, 70), (5, 80). Find the linear regression equation.
Step 1: Compute the means of x and y. Sum the x-values and y-values, then divide each by n = 5.
$$\bar{x} = \frac{1+2+3+4+5}{5} = 3, \qquad \bar{y} = \frac{52+58+65+70+80}{5} = 65$$
Step 2: Compute the slope using the least-squares formula. Calculate the sum of products of deviations and the sum of squared deviations of x.
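$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{(-2)(-13) + (-1)(-7) + (0)(0) + (1)(5) + (2)(15)}{4 + 1 + 0 + 1 + 4} = \frac{26 + 7 + 0 + 5 + 30}{10} = \frac{68}{10} = 6.8$$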
Step 3: Compute the y-intercept using the means and the slope.
$$b_0 = \bar{y} - b_1 \bar{x} = 65 - 6.8(3) = 65 - 20.4 = 44.6$$
Step 4: Write the regression equation.
$$\hat{y} = 44.6 + 6.8x$$
Step 5: Use the equation to predict a score. For example, a student who studies 6 hours:
$$\hat{y} = 44.6 + 6.8(6) = 44.6 + 40.8 = 85.4$$
Answer: The regression equation is ŷ = 44.6 + 6.8x. A student who studies 6 hours is predicted to score about 85.4.
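The steps above can be reproduced in a few lines of Python. This is a minimal sketch, not a library routine: the function name `fit_line` and the variable names are illustrative choices, and the code applies the same deviation formulas used in Steps 1 to 3.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

b0, b1 = fit_line(hours, scores)
predicted_6 = b0 + b1 * 6

print(f"y-hat = {b0:.1f} + {b1:.1f}x")          # y-hat = 44.6 + 6.8x
print(f"prediction at x = 6: {predicted_6:.1f}")  # prediction at x = 6: 85.4
```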
Another Example
This example uses exponential regression instead of linear, showing how the regression equation concept extends beyond straight lines. It also demonstrates the logarithmic transformation technique.
Problem: A biologist measures the population of bacteria (in thousands) at various times: (0, 2), (1, 6), (2, 18), (3, 55), (4, 160). The growth appears exponential. Find an exponential regression equation of the form ŷ = a · bˣ.
Step 1: Transform the data by taking the natural logarithm of each y-value to linearize the relationship. Let Y = ln(y).
$$Y_i:\quad \ln 2 \approx 0.693,\quad \ln 6 \approx 1.792,\quad \ln 18 \approx 2.890,\quad \ln 55 \approx 4.007,\quad \ln 160 \approx 5.075$$
Step 2: Find the means of x and Y.
$$\bar{x} = 2, \qquad \bar{Y} = \frac{0.693 + 1.792 + 2.890 + 4.007 + 5.075}{5} \approx 2.891$$
Step 3: Compute the slope of the linear regression on the transformed data.
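$$b_1 = \frac{\sum (x_i - \bar{x})(Y_i - \bar{Y})}{\sum (x_i - \bar{x})^2} = \frac{(-2)(-2.198) + (-1)(-1.099) + (0)(-0.001) + (1)(1.116) + (2)(2.184)}{4 + 1 + 0 + 1 + 4} \approx \frac{4.396 + 1.099 + 0 + 1.116 + 4.368}{10} = \frac{10.979}{10} \approx 1.098$$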
Step 4: Compute the intercept of the transformed regression, then convert back to exponential form.
$$b_0 = \bar{Y} - b_1 \bar{x} = 2.891 - 1.098(2) = 0.695 \;\Rightarrow\; a = e^{0.695} \approx 2.00, \quad b = e^{1.098} \approx 3.00$$
Step 5: Write the exponential regression equation.
$$\hat{y} = 2.00 \cdot 3.00^{x}$$
Answer: The exponential regression equation is ŷ = 2 · 3ˣ (population in thousands). For example, at x = 5, the predicted population is 2 · 3⁵ = 486 thousand.
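The logarithmic transformation can also be sketched in Python. The helper name `fit_exponential` is an illustrative assumption, and the code fits a least-squares line to (x, ln y) and then exponentiates the intercept and slope, as in Steps 1 to 4. Because it does not round the logarithms to three decimals, its results may differ from the hand calculation in the later digits.

```python
import math


def fit_exponential(xs, ys):
    """Return (a, b) for y-hat = a * b**x via a log-linear least-squares fit."""
    logs = [math.log(y) for y in ys]  # Y = ln(y); requires every y > 0
    n = len(xs)
    x_bar = sum(xs) / n
    log_bar = sum(logs) / n
    sxy = sum((x - x_bar) * (Y - log_bar) for x, Y in zip(xs, logs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = log_bar - slope * x_bar
    # Convert the fitted line ln(y) = intercept + slope*x back to a * b**x
    return math.exp(intercept), math.exp(slope)


times = [0, 1, 2, 3, 4]
population = [2, 6, 18, 55, 160]  # thousands of bacteria

a, b = fit_exponential(times, population)
print(f"y-hat = {a:.2f} * {b:.2f}^x")
```

The fitted values agree with the hand calculation to two decimal places, giving a close to 2 and b close to 3.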
Frequently Asked Questions
What is the difference between a regression equation and a correlation coefficient?
A regression equation gives you a formula to predict one variable from another, while a correlation coefficient (r) measures the strength and direction of the linear relationship between two variables. The correlation coefficient is a single number between −1 and 1; the regression equation is an actual function you can use to make predictions. For a linear model, a strong correlation (r close to ±1) means the data points lie close to the regression line, so predictions made within the data range tend to be accurate.
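The two quantities are linked: for a least-squares line, the slope equals r multiplied by the ratio of the standard deviations of y and x. The sketch below checks this on the study-hours data from the worked example; the variable names are illustrative.

```python
import math

hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
syy = sum((y - y_bar) ** 2 for y in scores)

# Correlation coefficient: a single number describing linear association
r = sxy / math.sqrt(sxx * syy)

# Regression slope, computed directly and via r * (s_y / s_x).
# The (n - 1) factors in the two standard deviations cancel in the ratio.
b1 = sxy / sxx
b1_from_r = r * math.sqrt(syy / sxx)

print(f"r = {r:.3f}")
print(f"slope = {b1:.1f}, slope from r = {b1_from_r:.1f}")
```

Here r is about 0.994, and both routes give the slope 6.8 found in the worked example.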
How do you know which type of regression equation to use?
Start by plotting your data on a scatterplot. If the points follow a straight-line pattern, use linear regression. If they curve upward or downward in a parabolic shape, try quadratic regression. If the data shows rapid growth or decay, exponential regression is likely the best fit. Many calculators and software tools also report an R² value for each model. A higher R² indicates a closer fit, so it is a useful way to compare candidate models, but confirm the choice against the scatterplot rather than relying on R² alone.
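As a rough illustration of comparing models, the sketch below fits both a linear and an exponential model to the bacteria data from the second example and computes R² for each. It assumes R² is defined as 1 − SSE/SST on the original y-scale; calculators often report R² for the log-transformed fit instead, so their values may differ. The helper names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def r_squared(ys, preds):
    """R^2 = 1 - SSE/SST, measured on the original y-scale (an assumption)."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst


times = [0, 1, 2, 3, 4]
population = [2, 6, 18, 55, 160]

# Linear model: y-hat = b0 + b1*x
b0, b1 = fit_line(times, population)
linear_preds = [b0 + b1 * x for x in times]

# Exponential model: fit a line to ln(y), then convert to a * b**x
c0, c1 = fit_line(times, [math.log(y) for y in population])
a, b = math.exp(c0), math.exp(c1)
exp_preds = [a * b ** x for x in times]

r2_linear = r_squared(population, linear_preds)
r2_exponential = r_squared(population, exp_preds)

print(f"linear R^2      = {r2_linear:.3f}")
print(f"exponential R^2 = {r2_exponential:.3f}")
```

For this data set the exponential model has the clearly higher R², which matches the curved shape a scatterplot would show.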
Can you use a regression equation to predict values outside the data range?
You can, but it is risky. Predicting beyond the range of your original data is called extrapolation, and the relationship that holds within the data may not continue outside it. Interpolation (predicting within the data range) is generally more reliable. When you do extrapolate, say so explicitly and treat the prediction with caution.
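One simple way to keep track of this is to flag any prediction whose x-value falls outside the observed range. The sketch below uses the study-hours equation ŷ = 44.6 + 6.8x, whose data covered x = 1 to 5; the function name `predict_with_range_check` is hypothetical.

```python
def predict_with_range_check(x, b0, b1, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = b0 + b1*x."""
    y_hat = b0 + b1 * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# Coefficients and data range from the study-hours worked example
B0, B1 = 44.6, 6.8
X_MIN, X_MAX = 1, 5

for x in (3.5, 6):
    y_hat, flagged = predict_with_range_check(x, B0, B1, X_MIN, X_MAX)
    label = "extrapolation" if flagged else "interpolation"
    print(f"x = {x}: y-hat = {y_hat:.1f} ({label})")
```

The prediction at 3.5 hours is an interpolation, while the prediction at 6 hours from the worked example is an extrapolation and should be reported as such.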
Regression Equation vs. Correlation Coefficient
| | Regression Equation | Correlation Coefficient |
|---|---|---|
| What it is | A function (e.g., ŷ = b₀ + b₁x) that predicts one variable from another | A number r between −1 and 1 measuring linear association strength |
| Output | A predicted y-value for any given x | A single numeric value indicating direction and strength |
| Purpose | Prediction and modeling | Describing the strength of a relationship |
| Can be non-linear? | Yes — quadratic, exponential, etc. | r measures linear relationships only (R² generalizes to other models) |
Why It Matters
Regression equations appear in nearly every statistics and AP math course, and they are central tools in science, economics, and data analysis. Whenever you need to predict an outcome—test scores from study hours, crop yield from rainfall, revenue from advertising spend—you build a regression equation. Understanding how to compute and interpret one is essential for reading research, making data-driven decisions, and succeeding on standardized tests.
Common Mistakes
Mistake: Confusing the roles of x and y and computing the wrong slope.
Correction: The regression of y on x is not the same as x on y. Always identify which variable is explanatory (x) and which is the response (y) before computing. Switching them produces a completely different equation.
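To see how different the two equations are, the sketch below fits both directions on the study-hours data and compares the x-on-y slope with what you would get by simply solving the y-on-x equation for x. The helper name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 80]

# Regression of y on x: predict score from hours
b0_yx, b1_yx = fit_line(hours, scores)

# Regression of x on y: predict hours from score (roles swapped)
b0_xy, b1_xy = fit_line(scores, hours)

# Slope obtained by algebraically inverting the y-on-x line
inverted_slope = 1 / b1_yx

print(f"y on x: y-hat = {b0_yx:.1f} + {b1_yx:.1f}x")
print(f"x on y slope: {b1_xy:.4f}")
print(f"inverted y-on-x slope: {inverted_slope:.4f}")
```

The x-on-y slope is not the reciprocal of the y-on-x slope, so the two fits are different lines; they would coincide only if r were exactly ±1.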
Mistake: Using a linear regression equation when the data is clearly non-linear.
Correction: Always examine a scatterplot of your data first. If the points show a curve, a linear model will give poor predictions. Choose the regression form (quadratic, exponential, etc.) that matches the shape of the data, and check the R² value to confirm a good fit.
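A numeric counterpart to inspecting the scatterplot is to look at the signs of the residuals (observed minus predicted). The sketch below fits a straight line to the exponential-looking bacteria data from the second example; the names are illustrative, and reading a run of same-sign residuals as a sign of curvature is a common heuristic rather than a formal test.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


times = [0, 1, 2, 3, 4]
population = [2, 6, 18, 55, 160]

b0, b1 = fit_line(times, population)

# Residual = observed y minus the value predicted by the line
residuals = [y - (b0 + b1 * x) for x, y in zip(times, population)]
signs = ["+" if e > 0 else "-" for e in residuals]

print("residual signs:", " ".join(signs))  # residual signs: + - - - +
```

The line underestimates at both ends and overestimates in the middle, the systematic pattern you would expect when a straight line is forced through curved data.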
Related Terms
- Linear Regression — Most common type of regression equation
- Least-Squares Regression Equation — Method used to find the best-fit line
- Least-Squares Regression Line — The specific line minimizing squared residuals
- Scatterplot — Graph used to visualize paired data
- Paired Data — The (x, y) data points a regression fits
- Exponential Function — Form used in exponential regression
- Quadratic — Form used in quadratic regression
- Function — A regression equation is a specific type of function
