Linear Fit — Definition, Formula & Examples
Linear Fit
Regression Line
Any line used to model the pattern in a set of paired data. Note: The least-squares regression line is the most commonly used linear fit.

Key Formula
$$\hat{y} = a + bx$$
where
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
$$a = \bar{y} - b\bar{x}$$
Where:
- ŷ = The predicted value of the response variable for a given x
- x = The explanatory (independent) variable
- b = The slope of the line — the change in ŷ for each one-unit increase in x
- a = The y-intercept — the predicted value of y when x = 0
- n = The number of data points
- x̄ = The mean of all x-values
- ȳ = The mean of all y-values
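The slope and intercept formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `least_squares_fit` and the sample points are hypothetical, chosen only so that the expected line is easy to see.

```python
def least_squares_fit(points):
    """Return (a, b) for the line y_hat = a + b*x, using the sum formulas."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    # The denominator is zero when every x-value is the same (a vertical pattern).
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal, so the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * (sum_x / n)                 # intercept: y_bar - b * x_bar
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x, so the fit should recover a = 1, b = 2.
a, b = least_squares_fit([(0, 1), (1, 3), (2, 5), (3, 7)])
print(a, b)  # 1.0 2.0
```

The explicit check on the denominator is a design choice for the sketch: when all x-values coincide, no slope can be computed from the formula.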
Worked Example
Problem: Find the linear fit (least-squares regression line) for the data: (1, 2), (2, 5), (3, 7), (4, 9), (5, 12).
Step 1: Compute the necessary sums. With n = 5 data points:
$$\sum x = 1+2+3+4+5 = 15, \quad \sum y = 2+5+7+9+12 = 35$$
$$\sum x^2 = 1+4+9+16+25 = 55$$
$$\sum xy = (1)(2)+(2)(5)+(3)(7)+(4)(9)+(5)(12) = 2+10+21+36+60 = 129$$
Step 2: Calculate the slope b using the formula:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5(129) - 15(35)}{5(55) - 15^2} = \frac{645 - 525}{275 - 225} = \frac{120}{50} = 2.4$$
Step 3: Find the means of x and y:
$$\bar{x} = \frac{15}{5} = 3, \quad \bar{y} = \frac{35}{5} = 7$$
Step 4: Calculate the y-intercept a:
$$a = \bar{y} - b\bar{x} = 7 - 2.4(3) = 7 - 7.2 = -0.2$$
Step 5: Write the equation of the linear fit:
$$\hat{y} = -0.2 + 2.4x$$
Answer: The linear fit is ŷ = −0.2 + 2.4x. The slope means that for each one-unit increase in x, the predicted y-value increases by 2.4.
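One way to cross-check the hand calculation is to compute the slope from deviations about the means, b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)². This form does not appear in the formula section above; it is algebraically equivalent to the sum formula and is used here only as an independent check. A minimal Python sketch, with variable names chosen for illustration:

```python
points = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]  # data from the worked example

n = len(points)
x_bar = sum(x for x, _ in points) / n
y_bar = sum(y for _, y in points) / n

# Deviation form of the slope (an equivalent rewrite of the sum formula).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
s_xx = sum((x - x_bar) ** 2 for x, _ in points)

b = s_xy / s_xx
a = y_bar - b * x_bar
print(f"slope = {b:.1f}, intercept = {a:.1f}")  # slope = 2.4, intercept = -0.2
```

Both routes give the same line, which is a quick way to catch arithmetic slips in the sums.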
Another Example
This example shows how to use a linear fit for prediction and how to compute residuals, rather than deriving the line itself.
Problem: Using the linear fit ŷ = −0.2 + 2.4x from the previous example, predict y when x = 7, and compute the residual for the data point (3, 7).
Step 1: Predict y when x = 7 by substituting into the equation:
$$\hat{y} = -0.2 + 2.4(7) = -0.2 + 16.8 = 16.6$$
Step 2: For the residual at (3, 7), first find the predicted value at x = 3:
$$\hat{y} = -0.2 + 2.4(3) = -0.2 + 7.2 = 7.0$$
Step 3: The residual is the actual y-value minus the predicted value:
$$\text{residual} = y - \hat{y} = 7 - 7.0 = 0$$
Answer: The predicted value at x = 7 is 16.6. The residual at the point (3, 7) is 0, meaning the regression line passes exactly through that data point.
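The prediction and residual steps can be sketched in Python as follows. The helper names `predict` and `residual` are hypothetical, and the coefficients are taken from the worked example. Floating-point arithmetic may leave tiny rounding errors, so values are formatted or compared with a tolerance rather than tested for exact equality.

```python
A, B = -0.2, 2.4  # intercept and slope from the worked example


def predict(x):
    """Predicted value y_hat = A + B*x."""
    return A + B * x


def residual(x, y):
    """Actual value minus predicted value."""
    return y - predict(x)


points = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]

print(f"{predict(7):.1f}")  # 16.6

# Residuals for every data point; for a least-squares line with an
# intercept they sum to zero, up to floating-point rounding.
residuals = [residual(x, y) for x, y in points]
print(abs(sum(residuals)) < 1e-9)  # True
```

Note that x = 7 lies outside the range of the data (1 to 5), so the prediction 16.6 is an extrapolation and assumes the linear trend continues.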
Frequently Asked Questions
What is the difference between a linear fit and a line of best fit?
A linear fit is any straight line used to model paired data, while 'line of best fit' usually refers specifically to the least-squares regression line. The least-squares line is the unique line that minimizes the sum of squared residuals. Other linear fits—such as a line drawn by eye—may model the general trend but are not optimal in this mathematical sense.
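To make "minimizes the sum of squared residuals" concrete, the sketch below compares the least-squares line from the worked example with a hypothetical line drawn by eye, ŷ = −0.5 + 2.5x. The by-eye line is an assumption made up for this comparison; any other line would serve equally well.

```python
points = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]  # data from the worked example


def sum_squared_residuals(a, b, points):
    """Sum of (y - y_hat)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in points)


least_squares = sum_squared_residuals(-0.2, 2.4, points)
by_eye = sum_squared_residuals(-0.5, 2.5, points)  # hypothetical hand-drawn line

print(f"{least_squares:.2f} {by_eye:.2f}")  # 0.40 0.50
```

The by-eye line passes exactly through three of the five points, yet its sum of squared residuals is still larger than that of the least-squares line.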
When should you not use a linear fit?
You should avoid using a linear fit when the scatterplot of your data shows a curved pattern rather than a straight-line trend. Applying a linear model to nonlinear data leads to poor predictions and misleading conclusions. Always examine a scatterplot or a residual plot before deciding that a linear model is appropriate.
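A residual check of this kind can be sketched in Python. The data below are hypothetical points on the curve y = x², chosen so that the pattern is obvious; `fit_line` is an illustrative helper that applies the slope and intercept formulas from the Key Formula section.

```python
def fit_line(points):
    """Least-squares intercept a and slope b via the sum formulas."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)
    return a, b


# Hypothetical curved data: every point lies on y = x^2.
curved = [(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]
a, b = fit_line(curved)

residuals = [y - (a + b * x) for x, y in curved]
print(a, b)       # -7.0 6.0
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals run positive, negative, negative, negative, positive. That U-shaped pattern is the signature of a curve that a straight line cannot capture.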
How do you know if a linear fit is good?
Check the coefficient of determination, r², which tells you what fraction of the variation in y is explained by the linear relationship with x. An r² close to 1 indicates a strong fit. Also inspect the residual plot: if the residuals are randomly scattered around zero with no pattern, the linear model is a reasonable choice.
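As a rough illustration, r² for the worked example can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares about ȳ. This identity is assumed here for a least-squares line with an intercept; the variable names are illustrative.

```python
points = [(1, 2), (2, 5), (3, 7), (4, 9), (5, 12)]  # data from the worked example
a, b = -0.2, 2.4                                    # fitted intercept and slope

y_bar = sum(y for _, y in points) / len(points)

ss_res = sum((y - (a + b * x)) ** 2 for x, y in points)  # unexplained variation
ss_tot = sum((y - y_bar) ** 2 for _, y in points)        # total variation in y

r_squared = 1 - ss_res / ss_tot
print(f"{r_squared:.4f}")  # 0.9931
```

Here about 99.3% of the variation in y is accounted for by the line, which agrees with how closely the points follow a straight-line trend.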
Linear Fit (any linear model) vs. Least-Squares Regression Line
| | Linear Fit (any linear model) | Least-Squares Regression Line |
|---|---|---|
| Definition | Any straight line used to model paired data | The unique line that minimizes the sum of squared residuals |
| Uniqueness | Many possible lines could be drawn | Exactly one line for a given data set |
| How it's found | Can be drawn by eye or calculated by various methods | Calculated using the slope and intercept formulas involving sums |
| Accuracy | Varies — may not be optimal | Optimal in the least-squares sense |
Why It Matters
Linear fits appear throughout statistics courses, science labs, and real-world data analysis whenever you need to describe a trend or make predictions from paired data. You will encounter them in algebra, AP Statistics, and any science class that requires graphing experimental results. Understanding how to compute and interpret a linear fit is essential for evaluating whether a straight-line model genuinely captures the relationship in your data.
Common Mistakes
Mistake: Using a linear fit without checking whether the data actually follow a linear pattern.
Correction: Always plot the data first. If the scatterplot or residual plot shows a curve, a linear fit is inappropriate and a different model (quadratic, exponential, etc.) should be considered.
Mistake: Confusing the slope formula's numerator and denominator or swapping x and y.
Correction: The slope formula has n∑xy − (∑x)(∑y) on top and n∑x² − (∑x)² on the bottom. Mixing up x and y gives the regression of x on y, which is a different line. Double-check which variable is explanatory (x) and which is the response (y).
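The effect of swapping x and y can be seen with the data from the worked example. In this sketch, `slope` is an illustrative helper that applies the slope formula to whichever variable is passed first as the explanatory one.

```python
def slope(explanatory, response):
    """Least-squares slope of response on explanatory, via the sum formula."""
    n = len(explanatory)
    sum_e = sum(explanatory)
    sum_r = sum(response)
    sum_er = sum(e * r for e, r in zip(explanatory, response))
    sum_e2 = sum(e * e for e in explanatory)
    return (n * sum_er - sum_e * sum_r) / (n * sum_e2 - sum_e ** 2)


xs = [1, 2, 3, 4, 5]
ys = [2, 5, 7, 9, 12]

b_y_on_x = slope(xs, ys)  # the intended regression of y on x
b_x_on_y = slope(ys, xs)  # what you get if x and y are swapped

print(f"{b_y_on_x:.4f} {b_x_on_y:.4f} {1 / b_x_on_y:.4f}")  # 2.4000 0.4138 2.4167
```

Even after inverting the swapped slope to express it as a change in y per unit of x, the result (about 2.4167) differs from 2.4, so the two regressions describe different lines.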
Related Terms
- Least-Squares Regression Line — The most commonly used linear fit; optimal in the least-squares sense
- Scatterplot — Graph used to visualize paired data before fitting
- Paired Data — The (x, y) data that a linear fit models
- Line — The geometric object a linear fit represents
- Model — A linear fit is one type of mathematical model
- Set — Data used for fitting form a set of points
- Slope — Measures steepness and direction of the linear fit
- Correlation — Measures strength of linear relationship in data
