Linear Regression
Key Formula
ŷ = a + bx,  where  b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)  and  a = ȳ − b·x̄
Where:
- ŷ = Predicted value of the dependent variable
- x = Independent (explanatory) variable
- b = Slope of the regression line
- a = y-intercept of the regression line
- n = Number of data points
- x̄, ȳ = Means of the x- and y-values
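The formula can be sketched directly in plain Python. This is a minimal illustration of the slope and intercept computation; the function name `fit_line` is our own, not a standard library API:

```python
def fit_line(xs, ys):
    """Least-squares fit y ≈ a + b·x using the summation formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = ȳ − b·x̄
    a = sum_y / n - b * (sum_x / n)
    return a, b
```

For serious work you would reach for a vetted routine such as `statistics.linear_regression` (Python 3.10+) or NumPy's `polyfit`, but the hand-rolled version above makes the formula's moving parts explicit.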
Worked Example
Problem: Find the linear regression equation for the data points: (1, 2), (2, 4), (3, 5), (4, 4), (5, 5).
Step 1: Compute the necessary sums with n = 5.
∑x = 15, ∑y = 20, ∑xy = 66, ∑x² = 55
Step 2: Calculate the slope b using the formula.
b = (5(66) − (15)(20)) / (5(55) − (15)²) = (330 − 300) / (275 − 225) = 30/50 = 0.6
Step 3: Find the means and then the intercept a.
x̄ = 15/5 = 3, ȳ = 20/5 = 4, a = 4 − 0.6(3) = 2.2
Answer: The least-squares regression line is ŷ = 2.2 + 0.6x.
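The arithmetic in Steps 1–3 can be reproduced in a few lines of Python as a sanity check (this snippet is our addition, not part of the original worked solution):

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Step 1: the four sums
Sx, Sy = sum(xs), sum(ys)                  # 15, 20
Sxy = sum(x * y for x, y in zip(xs, ys))   # 1·2 + 2·4 + 3·5 + 4·4 + 5·5 = 66
Sx2 = sum(x * x for x in xs)               # 1 + 4 + 9 + 16 + 25 = 55

# Step 2: slope b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
b = (n * Sxy - Sx * Sy) / (n * Sx2 - Sx ** 2)  # 30 / 50 = 0.6

# Step 3: intercept a = ȳ − b·x̄
a = Sy / n - b * (Sx / n)                      # 4 − 0.6·3 = 2.2
```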
Why It Matters
Linear regression is one of the most widely used tools in statistics and data science. It lets you model trends in data—such as predicting sales from advertising spend or estimating a student's test score from study hours. It also serves as the foundation for more advanced regression techniques like multiple regression and polynomial regression.
Common Mistakes
Mistake: Assuming that a good-looking regression line means x causes y.
Correction: Linear regression measures association, not causation. A strong linear fit does not by itself prove that changes in x cause changes in y; other variables or coincidence could explain the relationship.
Related Terms
- Linear Fit — The line produced by a linear regression
- Least-Squares Regression Line — The specific line that minimizes squared errors
- Correlation Coefficient — Measures the strength of the linear relationship
- Residual — Difference between observed and predicted values
- Scatter Plot — Graph used to visualize data before regression
