Mathwords

Linear Fit — Definition, Formula & Examples

Linear Fit
Regression Line

Any line used to model the pattern in a set of paired data. Note: The least-squares regression line is the most commonly used linear fit.

 

[Figure: Scatterplot with x and y axes showing a downward-sloping line (linear fit) through a scattered set of data points.]

See also

Scatterplot

Key Formula

$$\hat{y} = a + bx$$
where
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x}$$
Where:
  • $\hat{y}$ = The predicted value of the response variable for a given x
  • $x$ = The explanatory (independent) variable
  • $b$ = The slope of the line: the change in ŷ for each one-unit increase in x
  • $a$ = The y-intercept: the predicted value of y when x = 0
  • $n$ = The number of data points
  • $\bar{x}$ = The mean of all x-values
  • $\bar{y}$ = The mean of all y-values
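The summation formulas translate directly into code. The following is a minimal Python sketch, assuming the data arrive as two equal-length lists of numbers; the function name `fit_line` is hypothetical and not part of any library.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x,
    computed with the summation formulas for slope and intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # All x-values are equal, so no unique slope exists.
        raise ValueError("x-values must not all be equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * sum_x / n                   # a = y-bar - b * x-bar
    return a, b
```

For example, `fit_line([1, 2, 3, 4, 5], [2, 5, 7, 9, 12])` returns a slope of 2.4 and an intercept of about −0.2; because the sketch uses ordinary floating-point division, the intercept may display with a tiny rounding error.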

Worked Example

Problem: Find the linear fit (least-squares regression line) for the data: (1, 2), (2, 5), (3, 7), (4, 9), (5, 12).
Step 1: Compute the necessary sums. With n = 5 data points:
$$\sum x = 1+2+3+4+5 = 15, \quad \sum y = 2+5+7+9+12 = 35$$
$$\sum x^2 = 1+4+9+16+25 = 55$$
$$\sum xy = (1)(2)+(2)(5)+(3)(7)+(4)(9)+(5)(12) = 2+10+21+36+60 = 129$$
Step 2: Calculate the slope b using the formula:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5(129) - 15(35)}{5(55) - 15^2} = \frac{645 - 525}{275 - 225} = \frac{120}{50} = 2.4$$
Step 3: Find the means of x and y:
$$\bar{x} = \frac{15}{5} = 3, \quad \bar{y} = \frac{35}{5} = 7$$
Step 4: Calculate the y-intercept a:
$$a = \bar{y} - b\bar{x} = 7 - 2.4(3) = 7 - 7.2 = -0.2$$
Step 5: Write the equation of the linear fit:
$$\hat{y} = -0.2 + 2.4x$$
Answer: The linear fit is ŷ = −0.2 + 2.4x. This means that for each one-unit increase in x, the predicted y-value increases by 2.4.
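The five steps of this example can be reproduced in Python as a check on the arithmetic. This sketch uses the standard-library `fractions.Fraction` type so that the slope (12/5) and intercept (−1/5) come out exactly, without floating-point rounding; it is a verification aid for this data set, not a general fitting routine.

```python
from fractions import Fraction

# Data from the worked example
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 7, 9, 12]
n = len(xs)

# Step 1: the four sums
sum_x, sum_y = sum(xs), sum(ys)               # 15, 35
sum_x2 = sum(x * x for x in xs)               # 55
sum_xy = sum(x * y for x, y in zip(xs, ys))   # 129

# Step 2: slope, kept as an exact fraction
b = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)

# Steps 3 and 4: means, then intercept a = y-bar - b * x-bar
x_bar = Fraction(sum_x, n)
y_bar = Fraction(sum_y, n)
a = y_bar - b * x_bar

# Step 5: the fitted line
print(f"y-hat = {float(a)} + {float(b)}x")   # y-hat = -0.2 + 2.4x
```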

Another Example

This example shows how to use a linear fit for prediction and how to compute residuals, rather than deriving the line itself.

Problem: Using the linear fit ŷ = −0.2 + 2.4x from the previous example, predict y when x = 7, and compute the residual for the data point (3, 7).
Step 1: Predict y when x = 7 by substituting into the equation:
$$\hat{y} = -0.2 + 2.4(7) = -0.2 + 16.8 = 16.6$$
Step 2: For the residual at (3, 7), first find the predicted value at x = 3:
$$\hat{y} = -0.2 + 2.4(3) = -0.2 + 7.2 = 7.0$$
Step 3: The residual is the actual y-value minus the predicted value:
$$\text{residual} = y - \hat{y} = 7 - 7.0 = 0$$
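Answer: The predicted value at x = 7 is 16.6. The residual at the point (3, 7) is 0, meaning the regression line passes exactly through that data point. This is expected: (3, 7) is the point (x̄, ȳ), and because a = ȳ − bx̄, the least-squares line always passes through (x̄, ȳ).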
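The prediction and residual calculations can be written as two small helper functions. The names `predict` and `residual` are illustrative only; the coefficients are the exact fractions −1/5 and 12/5 from the first example, so the results carry no rounding noise.

```python
from fractions import Fraction

A = Fraction(-1, 5)   # intercept a = -0.2
B = Fraction(12, 5)   # slope b = 2.4

def predict(x):
    """Predicted response y-hat = a + b*x."""
    return A + B * x

def residual(x, y):
    """Residual = observed y minus predicted y-hat."""
    return y - predict(x)

print(float(predict(7)))      # 16.6
print(float(residual(3, 7)))  # 0.0
```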

Frequently Asked Questions

What is the difference between a linear fit and a line of best fit?
A linear fit is any straight line used to model paired data, while 'line of best fit' usually refers specifically to the least-squares regression line. The least-squares line is the unique line that minimizes the sum of squared residuals. Other linear fits—such as a line drawn by eye—may model the general trend but are not optimal in this mathematical sense.
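To make "optimal" concrete, one can compare the sum of squared residuals for the least-squares line with that of another line through the same data. The alternative line ŷ = 2.5x used below is a hypothetical by-eye fit chosen only for illustration; the data are those of the worked example.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 7, 9, 12]

def sse(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

least_squares = sse(Fraction(-1, 5), Fraction(12, 5))  # y-hat = -0.2 + 2.4x
by_eye = sse(Fraction(0), Fraction(5, 2))              # hypothetical y-hat = 2.5x

print(float(least_squares), float(by_eye))  # 0.4 1.75
```

Any other choice of a and b would also give a sum of at least 0.4 for this data set, since the least-squares line is the minimizer.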
When should you not use a linear fit?
You should avoid using a linear fit when the scatterplot of your data shows a curved pattern rather than a straight-line trend. Applying a linear model to nonlinear data leads to poor predictions and misleading conclusions. Always examine a scatterplot or a residual plot before deciding that a linear model is appropriate.
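One way to see the problem is to fit a line to data that follow an exact curve and inspect the residuals. The data below (y = x² for x = 1 to 5) are made up for illustration. The residuals come out positive at both ends and negative in the middle, the systematic U-shaped pattern a residual plot would reveal.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]  # hypothetical curved data: 1, 4, 9, 16, 25
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Least-squares slope and intercept, computed exactly
b = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)                          # -7 6
print([int(r) for r in residuals])   # [2, -1, -2, -1, 2]
```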
How do you know if a linear fit is good?
Check the coefficient of determination, r², which tells you what fraction of the variation in y is explained by the linear relationship with x. An r² close to 1 indicates a strong fit. Also inspect the residual plot: if the residuals are randomly scattered around zero with no pattern, the linear model is a reasonable choice.
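For a least-squares line with an intercept, r² can be computed as 1 − SS_res/SS_tot, where SS_res is the sum of squared residuals and SS_tot is the sum of squared deviations of y from ȳ. A minimal sketch using the worked-example data and line:

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 7, 9, 12]
a, b = Fraction(-1, 5), Fraction(12, 5)  # least-squares line from the worked example

y_bar = Fraction(sum(ys), len(ys))
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
ss_tot = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y

r_squared = 1 - ss_res / ss_tot
print(r_squared, round(float(r_squared), 4))  # 144/145 0.9931
```

An r² of about 0.993 means roughly 99.3% of the variation in y is explained by the line, consistent with the small residuals found in the second example.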

Linear Fit (any linear model) vs. Least-Squares Regression Line

| | Linear Fit (any linear model) | Least-Squares Regression Line |
|---|---|---|
| Definition | Any straight line used to model paired data | The unique line that minimizes the sum of squared residuals |
| Uniqueness | Many possible lines could be drawn | Exactly one line for a given data set |
| How it's found | Can be drawn by eye or calculated by various methods | Calculated using the slope and intercept formulas involving sums |
| Accuracy | Varies; may not be optimal | Optimal in the least-squares sense |

Why It Matters

Linear fit appears throughout statistics courses, science labs, and real-world data analysis whenever you need to describe a trend or make predictions from paired data. You will encounter it in algebra, AP Statistics, and any science class that requires graphing experimental results. Understanding how to compute and interpret a linear fit is essential for evaluating whether a straight-line model genuinely captures the relationship in your data.

Common Mistakes

Mistake: Using a linear fit without checking whether the data actually follow a linear pattern.
Correction: Always plot the data first. If the scatterplot or residual plot shows a curve, a linear fit is inappropriate and a different model (quadratic, exponential, etc.) should be considered.
Mistake: Confusing the slope formula's numerator and denominator or swapping x and y.
Correction: The slope formula has n∑xy − (∑x)(∑y) on top and n∑x² − (∑x)² on the bottom. Mixing up x and y gives the regression of x on y, which is a different line. Double-check which variable is explanatory (x) and which is the response (y).
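Swapping x and y changes the line itself, not just the labels. The sketch below (the helper `slope` is hypothetical) computes the least-squares slope of y on x and of x on y for the worked-example data. Solving the x-on-y line for y gives a slope of 1/b′ = 29/12 ≈ 2.417, close to but different from 2.4.

```python
from fractions import Fraction

xs = [1, 2, 3, 4, 5]
ys = [2, 5, 7, 9, 12]

def slope(u, v):
    """Least-squares slope for predicting v from u."""
    n = len(u)
    num = n * sum(p * q for p, q in zip(u, v)) - sum(u) * sum(v)
    den = n * sum(p * p for p in u) - sum(u) ** 2
    return Fraction(num, den)

b_y_on_x = slope(xs, ys)   # 12/5 = 2.4
b_x_on_y = slope(ys, xs)   # 12/29

# Rewrite the x-on-y line as y = ... so both slopes are on the same axes.
print(b_y_on_x, 1 / b_x_on_y)  # 12/5 29/12
```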

Related Terms

  • Least-Squares Regression Line: The most common and optimal linear fit
  • Scatterplot: Graph used to visualize paired data before fitting
  • Paired Data: The (x, y) data that a linear fit models
  • Line: The geometric object a linear fit represents
  • Model: A linear fit is one type of mathematical model
  • Set: Data used for fitting form a set of points
  • Slope: Measures steepness and direction of the linear fit
  • Correlation: Measures strength of linear relationship in data