Line of Best Fit

A line of best fit is a straight line drawn through the points on a scatterplot that comes as close as possible to all the data points. It shows the general trend in the data and can be used to make predictions.

A line of best fit, also called a trend line, is a linear function that minimizes the overall distance between itself and the data points in a scatterplot. When calculated using the least-squares method, it minimizes the sum of the squared vertical distances (residuals) from each point to the line. The result is an equation of the form $y = mx + b$ that best represents the linear relationship between the two variables.
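To make the least-squares criterion concrete, here is a minimal Python sketch that computes the sum of squared residuals for a candidate line $y = mx + b$. It reuses the data and the fitted line from the worked example below. The two nearby lines it compares against were picked arbitrarily for illustration; they show the criterion at work but are not a proof that the fitted line is optimal.

```python
# Data from the worked example: (hours studied, test score)
points = [(1, 55), (2, 60), (3, 70), (4, 75), (5, 85)]

def sum_squared_residuals(m, b, data):
    """Sum of squared vertical distances from each point to y = m*x + b."""
    total = 0.0
    for x, y in data:
        residual = y - (m * x + b)  # observed value minus predicted value
        total += residual ** 2
    return total

# Least-squares line from the worked example
print(sum_squared_residuals(7.5, 46.5, points))  # 7.5

# Two arbitrary nearby lines (illustration only) give larger sums
print(sum_squared_residuals(8.0, 46.5, points))  # 21.25
print(sum_squared_residuals(7.5, 47.0, points))  # 8.75
```

Squaring the residuals keeps points above and below the line from cancelling each other out and penalizes large misses more heavily than small ones.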

Key Formula

$$\hat{y} = mx + b$$
Where:
  • $\hat{y}$ = the predicted value of the response variable
  • $m$ = the slope of the line (rate of change)
  • $x$ = the value of the explanatory variable
  • $b$ = the y-intercept (predicted value when $x = 0$)

Worked Example

Problem: A student records the number of hours studied and the test score for 5 classmates: (1, 55), (2, 60), (3, 70), (4, 75), (5, 85). Find the line of best fit and predict the score for someone who studies 6 hours.
Step 1: Find the mean of the x-values and the mean of the y-values.
$$\bar{x} = \frac{1+2+3+4+5}{5} = 3, \quad \bar{y} = \frac{55+60+70+75+85}{5} = 69$$
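Step 1 can be checked with a short Python sketch; the variable names are illustrative.

```python
# (hours studied, test score) pairs from the problem
points = [(1, 55), (2, 60), (3, 70), (4, 75), (5, 85)]

xs = [x for x, _ in points]
ys = [y for _, y in points]

x_bar = sum(xs) / len(xs)  # mean of the explanatory variable (hours)
y_bar = sum(ys) / len(ys)  # mean of the response variable (score)

print(x_bar, y_bar)  # 3.0 69.0
```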
Step 2: Calculate the slope $m$. Multiply each x-deviation from the mean by the matching y-deviation, add up these products, and divide by the sum of the squared x-deviations.
$$m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{(-2)(-14)+(-1)(-9)+(0)(1)+(1)(6)+(2)(16)}{4+1+0+1+4} = \frac{28+9+0+6+32}{10} = \frac{75}{10} = 7.5$$
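The slope formula in Step 2 translates directly into Python. This sketch recomputes the means so it runs on its own; `sxy` and `sxx` are illustrative names for the numerator and denominator.

```python
points = [(1, 55), (2, 60), (3, 70), (4, 75), (5, 85)]
n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Numerator: sum of products of paired deviations from the means
sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
# Denominator: sum of squared x-deviations
sxx = sum((x - x_bar) ** 2 for x, _ in points)

m = sxy / sxx
print(sxy, sxx, m)  # 75.0 10.0 7.5
```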
Step 3: Find the y-intercept by substituting the means and the slope into the equation.
$$b = \bar{y} - m\bar{x} = 69 - 7.5(3) = 69 - 22.5 = 46.5$$
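Step 3 in Python, starting from the means and slope found above. Because $b$ is defined as $\bar{y} - m\bar{x}$, the least-squares line always passes through the point of means $(\bar{x}, \bar{y})$, which the last line checks.

```python
x_bar, y_bar = 3.0, 69.0  # means from Step 1
m = 7.5                   # slope from Step 2

b = y_bar - m * x_bar
print(b)  # 46.5

# The fitted line passes through the point of means (3, 69)
assert m * x_bar + b == y_bar
```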
Step 4: Write the equation of the line of best fit.
$$\hat{y} = 7.5x + 46.5$$
Step 5: Predict the test score for 6 hours of studying by substituting x = 6.
$$\hat{y} = 7.5(6) + 46.5 = 45 + 46.5 = 91.5$$
Answer: The line of best fit is $\hat{y} = 7.5x + 46.5$. A student who studies 6 hours is predicted to score about 91.5.

Why It Matters

Lines of best fit are used constantly in science, economics, and medicine to identify trends and make predictions from data. For example, a researcher might use one to predict how a patient's blood pressure changes with age, or an economist might model how spending changes with income. Understanding how to find and interpret this line is a foundation for more advanced statistical modeling.

Common Mistakes

Mistake: Drawing the line so it passes through as many points as possible instead of balancing the points above and below.
Correction: The line of best fit minimizes the overall distance to all points. It doesn't need to pass through any specific point — it should follow the general trend so that the points are spread roughly evenly on both sides.
Mistake: Using the line to predict far outside the range of the original data (extrapolation) and treating the result as reliable.
Correction: Predictions are most trustworthy within the range of your data. Extrapolating well beyond the data assumes the trend continues unchanged, which may not be true.
