Mathwords

Linear Regression

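The process of finding a linear fit: a straight line that best describes the relationship between two variables in a set of data points.
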
See also

Least-squares regression line

Key Formula

$$\hat{y} = a + bx \quad \text{where} \quad b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \quad \text{and} \quad a = \bar{y} - b\bar{x}$$
Where:
  • $\hat{y}$ = Predicted value of the dependent variable
  • $x$ = Independent (explanatory) variable
  • $b$ = Slope of the regression line
  • $a$ = y-intercept of the regression line
  • $n$ = Number of data points
  • $\bar{x}, \bar{y}$ = Means of the x- and y-values
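
The formula translates directly into a few lines of code. The following is a minimal Python sketch, not a library routine: the function name `fit_line` and its argument names are chosen here for illustration only. It computes the four sums, then the slope b and intercept a exactly as in the formula above.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x.

    fit_line is a hypothetical helper written for this illustration.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # The four sums that appear in the key formula.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # The denominator is zero when every x-value is the same.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom
    a = sum_y / n - b * (sum_x / n)  # a = y-bar - b * x-bar
    return a, b


# These three points lie exactly on y = 1 + 2x, so the fit recovers a = 1, b = 2.
a, b = fit_line([0, 1, 2], [1, 3, 5])
print(a, b)
```

The check on the denominator matters in practice: if all x-values coincide, the data describe a vertical line and no slope exists.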

Worked Example

Problem: Find the linear regression equation for the data points: (1, 2), (2, 4), (3, 5), (4, 4), (5, 5).
Step 1: Compute the necessary sums with n = 5.
$$\sum x = 15,\quad \sum y = 20,\quad \sum xy = 2 + 8 + 15 + 16 + 25 = 66,\quad \sum x^2 = 55$$
Step 2: Calculate the slope b using the formula.
$$b = \frac{5(66) - (15)(20)}{5(55) - (15)^2} = \frac{330 - 300}{275 - 225} = \frac{30}{50} = 0.6$$
Step 3: Find the means and then the intercept a.
$$\bar{x} = 3,\quad \bar{y} = 4,\quad a = 4 - 0.6(3) = 2.2$$
Answer: The least-squares regression line is ŷ = 2.2 + 0.6x.
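
Hand arithmetic on sums is easy to get wrong, so it can help to recompute the example by machine. The sketch below is a rough check in Python, with variable names chosen here for illustration. It recomputes the sums from the five data points and finds the slope two ways: with the sums formula, and with the algebraically equivalent deviation form b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². It also sums the residuals, which should come out to zero (up to rounding) for any least-squares line with an intercept.

```python
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
n = len(points)
xs = [x for x, _ in points]
ys = [y for _, y in points]

# Step 1: the sums used in the formula.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x in xs)

# Step 2: slope from the sums formula.
b_sums = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Same slope from deviations about the means (equivalent form).
x_bar, y_bar = sum_x / n, sum_y / n
b_dev = (sum((x - x_bar) * (y - y_bar) for x, y in points)
         / sum((x - x_bar) ** 2 for x in xs))

# Step 3: intercept, then the sum of residuals y - y-hat.
a = y_bar - b_sums * x_bar
residual_sum = sum(y - (a + b_sums * x) for x, y in points)

print("sums:", sum_x, sum_y, sum_xy, sum_x2)
print("slope (sums form):", b_sums)
print("slope (deviation form):", b_dev)
print("intercept:", a)
print("sum of residuals:", residual_sum)
```

If the two slope values disagree, or the residuals do not sum to roughly zero, one of the sums has been miscalculated.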

Why It Matters

Linear regression is one of the most widely used tools in statistics and data science. It lets you model trends in data, such as predicting sales from advertising spend or estimating a student's test score from study hours. It is also the foundation for more advanced techniques such as multiple regression and polynomial regression.

Common Mistakes

Mistake: Assuming that a good-looking regression line means x causes y.
Correction: Linear regression measures association, not causation. A strong linear fit does not by itself prove that changes in x cause changes in y; other variables or coincidence could explain the relationship.

Related Terms