Mathwords

Least Squares Fitting — Polynomial — Definition, Formula & Examples

Least squares fitting — polynomial is a method for finding the polynomial curve (quadratic, cubic, etc.) that best fits a set of data points by minimizing the sum of the squared differences between observed and predicted values.

Given $n$ data points $(x_i, y_i)$ and a polynomial of degree $d$, $p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_d x^d$, the least squares polynomial fit determines the coefficients $a_0, a_1, \ldots, a_d$ that minimize $S = \sum_{i=1}^{n} (y_i - p(x_i))^2$. The optimal coefficients satisfy the normal equations $(X^T X)\mathbf{a} = X^T \mathbf{y}$, where $X$ is the Vandermonde matrix with entries $X_{ij} = x_i^{\,j}$.

Key Formula

$$S = \sum_{i=1}^{n}\bigl(y_i - (a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_d x_i^d)\bigr)^2$$
Where:
  • $S$ = Sum of squared residuals to be minimized
  • $n$ = Number of data points
  • $(x_i, y_i)$ = The $i$-th observed data point
  • $a_0, a_1, \ldots, a_d$ = Polynomial coefficients to be determined
  • $d$ = Degree of the fitting polynomial

How It Works

You construct a design matrix $X$ whose columns are powers of your $x$-values: a column of ones, a column of $x_i$, a column of $x_i^2$, and so on up to degree $d$. Then you solve the normal equations $(X^T X)\mathbf{a} = X^T \mathbf{y}$ for the coefficient vector $\mathbf{a}$. For a quadratic fit ($d = 2$), this yields three equations in three unknowns. The resulting polynomial minimizes the total squared error across all data points, extending the idea of a least squares regression line to curved relationships.
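The procedure can be sketched in plain Python. The function name `polyfit_normal` is a hypothetical label for this sketch (in practice a library routine such as NumPy's `polyfit` would be preferred, since solving the normal equations directly can be numerically ill-conditioned for high degrees):

```python
def polyfit_normal(xs, ys, d):
    """Least squares polynomial fit of degree d via the normal equations.

    Builds the Vandermonde matrix X with X[i][j] = xs[i]**j, forms
    A = X^T X and b = X^T y, then solves A a = b by Gaussian
    elimination with partial pivoting.
    """
    n = len(xs)
    # Design matrix: one row per data point, columns 1, x, x^2, ..., x^d
    X = [[x ** j for j in range(d + 1)] for x in xs]
    # Normal equations: A = X^T X ((d+1) x (d+1)), b = X^T y
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(d + 1)]
         for j in range(d + 1)]
    b = [sum(X[i][j] * ys[i] for i in range(n)) for j in range(d + 1)]
    m = d + 1
    # Forward elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution
    a = [0.0] * m
    for r in range(m - 1, -1, -1):
        a[r] = (b[r] - sum(A[r][c] * a[c] for c in range(r + 1, m))) / A[r][r]
    return a
```

For example, `polyfit_normal([0, 1, 2], [1, 2, 9], 2)` recovers the quadratic coefficients from the worked example below, $a_0 = 1$, $a_1 = -2$, $a_2 = 3$.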

Worked Example

Problem: Fit a quadratic polynomial $p(x) = a_0 + a_1 x + a_2 x^2$ to the data points $(0, 1)$, $(1, 2)$, $(2, 9)$.
Build the design matrix: Each row corresponds to a data point, with columns for $x^0$, $x^1$, and $x^2$.
$$X = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{pmatrix}, \quad \mathbf{y} = \begin{pmatrix} 1 \\ 2 \\ 9 \end{pmatrix}$$
Solve the normal equations: Since we have 3 data points and 3 unknowns, the system $X\mathbf{a} = \mathbf{y}$ has a unique solution. Solve directly.
$$\begin{cases} a_0 = 1 \\ a_0 + a_1 + a_2 = 2 \\ a_0 + 2a_1 + 4a_2 = 9 \end{cases}$$
Find the coefficients: From the first equation, $a_0 = 1$. Substituting into the second gives $a_1 + a_2 = 1$. The third gives $2a_1 + 4a_2 = 8$, so $a_1 + 2a_2 = 4$. Subtracting yields $a_2 = 3$ and $a_1 = -2$.
$$a_0 = 1, \quad a_1 = -2, \quad a_2 = 3$$
Answer: The best-fit quadratic is $p(x) = 1 - 2x + 3x^2$. Since the number of data points equals the number of coefficients, this polynomial passes exactly through all three points with $S = 0$.
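As a quick check, the fitted quadratic can be evaluated at each data point to confirm the residual sum of squares is zero:

```python
def p(x):
    # Best-fit quadratic from the worked example: p(x) = 1 - 2x + 3x^2
    return 1 - 2 * x + 3 * x ** 2

points = [(0, 1), (1, 2), (2, 9)]
# Sum of squared residuals; zero because p interpolates all three points
S = sum((y - p(x)) ** 2 for x, y in points)
```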

Why It Matters

Many real-world datasets — projectile trajectories, cost curves, sensor calibration data — follow nonlinear trends that a straight line cannot capture. Polynomial least squares fitting is a standard tool in engineering, physics, and data science courses for modeling curvature without jumping to more complex nonlinear regression methods.

Common Mistakes

Mistake: Choosing too high a polynomial degree, which causes the curve to overfit and oscillate wildly between data points.
Correction: Use the lowest degree that captures the data's trend. Compare residual plots or use adjusted $R^2$ to judge whether increasing the degree genuinely improves the fit.