Linear Fit — Definition, Formula & Examples

Linear Fit
Regression Line

Any line used to model the pattern in a set of paired data. Note: The least-squares regression line is the most commonly used linear fit.

See also

Scatterplot

Key Formula

\hat{y} = a + bx

where, for the least-squares regression line,

b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}

a = \bar{y} - b\bar{x}

Where:

$\hat{y}$ = The predicted value of the response variable for a given x
$x$ = The explanatory (independent) variable
$b$ = The slope of the line — the change in ŷ for each one-unit increase in x
$a$ = The y-intercept — the predicted value of y when x = 0
$n$ = The number of data points
$\bar{x}$ = The mean of all x-values
$\bar{y}$ = The mean of all y-values
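
The slope and intercept formulas above can be turned into code fairly directly. The following is a minimal Python sketch, not a reference implementation; the function name `linear_fit` and the sample points are illustrative assumptions rather than part of the definition.

```python
def linear_fit(points):
    """Return (a, b) for the least-squares line y_hat = a + b*x.

    `points` is a sequence of (x, y) pairs. The name and interface
    are illustrative choices, not a standard API.
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    # Denominator of the slope formula: n*sum(x^2) - (sum(x))^2
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are equal; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom   # slope
    a = sum_y / n - b * (sum_x / n)            # intercept: y_bar - b * x_bar
    return a, b


# Made-up points that lie exactly on y = 1 + 2x
a, b = linear_fit([(0, 1), (1, 3), (2, 5)])
print(a, b)  # 1.0 2.0
```

The guard on the denominator reflects the fact that the formula breaks down when every x-value is the same, since a vertical cluster of points has no defined slope.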

Worked Example

Problem: Find the linear fit (least-squares regression line) for the data: (1, 2), (2, 5), (3, 7), (4, 9), (5, 12).

Step 1: Compute the necessary sums. With n = 5 data points:

\sum x = 1+2+3+4+5 = 15, \quad \sum y = 2+5+7+9+12 = 35

\sum x^2 = 1+4+9+16+25 = 55

\sum xy = (1)(2)+(2)(5)+(3)(7)+(4)(9)+(5)(12) = 2+10+21+36+60 = 129

Step 2: Calculate the slope b using the formula:

b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{5(129) - 15(35)}{5(55) - 15^2} = \frac{645 - 525}{275 - 225} = \frac{120}{50} = 2.4

Step 3: Find the means of x and y:

\bar{x} = \frac{15}{5} = 3, \quad \bar{y} = \frac{35}{5} = 7

Step 4: Calculate the y-intercept a:

a = \bar{y} - b\bar{x} = 7 - 2.4(3) = 7 - 7.2 = -0.2

Step 5: Write the equation of the linear fit:

\hat{y} = -0.2 + 2.4x

Answer: The linear fit is ŷ = −0.2 + 2.4x. This means that for each one-unit increase in x, the predicted y-value increases by 2.4.
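
As a check on the arithmetic in Steps 1 through 5, the same computation can be sketched in Python. This version uses the standard-library `fractions.Fraction` type so the results are exact rather than rounded; the variable names are illustrative.

```python
from fractions import Fraction

data = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]
n = len(data)

# Step 1: the four sums
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_x2 = sum(x * x for x, _ in data)
sum_xy = sum(x * y for x, y in data)

# Step 2: slope, kept as an exact fraction
b = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)

# Steps 3 and 4: intercept from the means
a = Fraction(sum_y, n) - b * Fraction(sum_x, n)

print(sum_x, sum_y, sum_x2, sum_xy)  # 15 35 55 129
print(b, a)                          # 12/5 -1/5
```

The fractions 12/5 and −1/5 are the exact forms of 2.4 and −0.2.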

Another Example

This example shows how to use a linear fit for prediction and how to compute residuals, rather than deriving the line itself.

Problem: Using the linear fit ŷ = −0.2 + 2.4x from the previous example, predict y when x = 7, and compute the residual for the data point (3, 7).

Step 1: Predict y when x = 7 by substituting into the equation:

\hat{y} = -0.2 + 2.4(7) = -0.2 + 16.8 = 16.6

Step 2: For the residual at (3, 7), first find the predicted value at x = 3:

\hat{y} = -0.2 + 2.4(3) = -0.2 + 7.2 = 7.0

Step 3: The residual is the actual y-value minus the predicted value:

\text{residual} = y - \hat{y} = 7 - 7.0 = 0

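Answer: The predicted value at x = 7 is 16.6. The residual at the point (3, 7) is 0, meaning the regression line passes exactly through that data point. This is expected here: (3, 7) happens to be the point of means (x̄, ȳ), and every least-squares regression line passes through that point.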
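
The prediction and residual steps can be sketched in Python as follows. The helper names `predict` and `residual` are assumptions made for illustration, and values are compared with a small tolerance because floating-point arithmetic is not exact.

```python
def predict(a, b, x):
    """Predicted response y_hat = a + b*x."""
    return a + b * x


def residual(a, b, x, y):
    """Observed y minus predicted y_hat."""
    return y - predict(a, b, x)


# Coefficients of the linear fit y_hat = -0.2 + 2.4x
a, b = -0.2, 2.4
data = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]

y_hat_at_7 = predict(a, b, 7)      # about 16.6
res_at_3 = residual(a, b, 3, 7)    # about 0, up to rounding error

# Residuals for every data point
residuals = [residual(a, b, x, y) for x, y in data]

print(round(y_hat_at_7, 6))
print(abs(res_at_3) < 1e-9)        # True
```

For a least-squares line with an intercept, the residuals over the whole data set sum to zero (up to rounding), which is a quick sanity check on a fitted line.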

Frequently Asked Questions

What is the difference between a linear fit and a line of best fit?

A linear fit is any straight line used to model paired data, while 'line of best fit' usually refers specifically to the least-squares regression line. The least-squares line is the unique line that minimizes the sum of squared residuals. Other linear fits—such as a line drawn by eye—may model the general trend but are not optimal in this mathematical sense.
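
To make "optimal in this mathematical sense" concrete, the sketch below compares the sum of squared residuals for the least-squares line from the worked example with that of a second line. The second line, ŷ = 2.3x, is a hypothetical "drawn by eye" fit chosen only for illustration.

```python
data = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]


def sum_squared_residuals(a, b, points):
    """Sum of (y - (a + b*x))**2 over all points."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


# Least-squares line from the worked example: y_hat = -0.2 + 2.4x
sse_least_squares = sum_squared_residuals(-0.2, 2.4, data)

# Hypothetical line drawn by eye (illustrative only): y_hat = 2.3x
sse_by_eye = sum_squared_residuals(0.0, 2.3, data)

print(round(sse_least_squares, 4))  # 0.4
print(round(sse_by_eye, 4))         # 0.55
```

Both lines follow the trend, but no choice of slope and intercept gives a smaller total than the least-squares line does for this data set.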

When should you not use a linear fit?

You should avoid using a linear fit when the scatterplot of your data shows a curved pattern rather than a straight-line trend. Applying a linear model to nonlinear data leads to poor predictions and misleading conclusions. Always examine a scatterplot or a residual plot before deciding that a linear model is appropriate.

How do you know if a linear fit is good?

Check the coefficient of determination, r², which tells you what fraction of the variation in y is explained by the linear relationship with x. An r² close to 1 indicates a strong fit. Also inspect the residual plot: if the residuals are randomly scattered around zero with no pattern, the linear model is a reasonable choice.
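
One way to compute r² for a least-squares line is as 1 minus the ratio of the residual sum of squares to the total sum of squares about the mean of y. The sketch below applies this to the worked-example data and its fitted line ŷ = −0.2 + 2.4x; the variable names are illustrative.

```python
data = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]
a, b = -0.2, 2.4  # least-squares coefficients from the worked example

y_mean = sum(y for _, y in data) / len(data)

# Variation left unexplained by the line
ss_res = sum((y - (a + b * x)) ** 2 for x, y in data)

# Total variation of y about its mean
ss_tot = sum((y - y_mean) ** 2 for _, y in data)

r_squared = 1 - ss_res / ss_tot
print(round(r_squared, 4))  # 0.9931
```

Here about 99.3% of the variation in y is accounted for by the linear relationship with x, which is consistent with the small residuals seen earlier.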

Linear Fit (any linear model) vs. Least-Squares Regression Line

	Linear Fit (any linear model)	Least-Squares Regression Line
Definition	Any straight line used to model paired data	The unique line that minimizes the sum of squared residuals
Uniqueness	Many possible lines could be drawn	Exactly one line for a given data set
How it's found	Can be drawn by eye or calculated by various methods	Calculated using the slope and intercept formulas involving sums
Accuracy	Varies — may not be optimal	Optimal in the least-squares sense

Why It Matters

Linear fits appear throughout statistics courses, science labs, and real-world data analysis whenever you need to describe a trend or make predictions from paired data. You will encounter them in algebra, AP Statistics, and any science class that requires graphing experimental results. Knowing how to compute and interpret a linear fit lets you judge whether a straight-line model genuinely captures the relationship in your data.

Common Mistakes

Mistake: Using a linear fit without checking whether the data actually follow a linear pattern.

Correction: Always plot the data first. If the scatterplot or residual plot shows a curve, a linear fit is inappropriate and a different model (quadratic, exponential, etc.) should be considered.
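
A small sketch of what a curved pattern looks like in the residuals: the data below are made up for illustration and follow y = x² exactly, and the helper `linear_fit` is a hypothetical name for the slope and intercept formulas.

```python
def linear_fit(points):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Made-up curved data: y = x^2 for x = 1..5
curved = [(x, x * x) for x in range(1, 6)]

a, b = linear_fit(curved)
residuals = [y - (a + b * x) for x, y in curved]

print(a, b)       # -7.0 6.0
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals are positive at both ends and negative in the middle. That U-shaped pattern, rather than a random scatter around zero, is the signal that a straight line is the wrong model for these data.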

Mistake: Confusing the slope formula's numerator and denominator or swapping x and y.

Correction: The slope formula has n∑xy − (∑x)(∑y) on top and n∑x² − (∑x)² on the bottom. Mixing up x and y gives the regression of x on y, which is a different line. Double-check which variable is explanatory (x) and which is the response (y).
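
The effect of swapping x and y can be seen on the worked-example data. The sketch below applies the same formulas twice, once to (x, y) pairs and once to (y, x) pairs, using exact fractions; the function name `slope_and_intercept` is an illustrative assumption.

```python
from fractions import Fraction


def slope_and_intercept(points):
    """Least-squares fit of the second coordinate on the first."""
    n = len(points)
    sum_u = sum(u for u, _ in points)
    sum_v = sum(v for _, v in points)
    sum_uv = sum(u * v for u, v in points)
    sum_u2 = sum(u * u for u, _ in points)
    slope = Fraction(n * sum_uv - sum_u * sum_v, n * sum_u2 - sum_u ** 2)
    intercept = Fraction(sum_v, n) - slope * Fraction(sum_u, n)
    return intercept, slope


data = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]

# Regression of y on x (the intended line)
a_yx, b_yx = slope_and_intercept(data)

# Regression of x on y (what you get if the variables are swapped)
swapped = [(y, x) for x, y in data]
a_xy, b_xy = slope_and_intercept(swapped)

print(b_yx, a_yx)  # 12/5 -1/5
print(b_xy, a_xy)  # 12/29 3/29
print(1 / b_xy)    # 29/12
```

Solving the swapped line for y gives a slope of 29/12 (about 2.417), not 12/5 (2.4), so the two regressions are different lines even though both pass through the point of means.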

Related Terms

Least-Squares Regression Line — The most common and optimal linear fit
Scatterplot — Graph used to visualize paired data before fitting
Paired Data — The (x, y) data that a linear fit models
Line — The geometric object a linear fit represents
Model — A linear fit is one type of mathematical model
Set — Data used for fitting form a set of points
Slope — Measures steepness and direction of the linear fit
Correlation — Measures strength of linear relationship in data