Least Squares Fitting — Polynomial — Definition, Formula & Examples
Polynomial least squares fitting is a method for finding the polynomial curve (quadratic, cubic, etc.) that best fits a set of data points by minimizing the sum of the squared differences between observed and predicted values.
Given $n$ data points $(x_i, y_i)$ and a polynomial of degree $m$, $p(x) = a_0 + a_1 x + \cdots + a_m x^m$, the least squares polynomial fit determines the coefficients $a_0, a_1, \dots, a_m$ that minimize $S = \sum_{i=1}^{n} \left( y_i - p(x_i) \right)^2$. The optimal coefficients satisfy the normal equations $A^T A \mathbf{a} = A^T \mathbf{y}$, where $A$ is the Vandermonde matrix with entries $A_{ij} = x_i^{j}$ for $j = 0, 1, \dots, m$.
Key Formula
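$$S = \sum_{i=1}^{n} \left( y_i - \left( a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_m x_i^m \right) \right)^2$$
Where:
- $S$ = Sum of squared residuals to be minimized
- $n$ = Number of data points
- $(x_i, y_i)$ = The $i$-th observed data point
- $a_0, a_1, \dots, a_m$ = Polynomial coefficients to be determined
- $m$ = Degree of the fitting polynomial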
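As a short sketch of where the normal equations come from, write the sum of squared residuals as $S$, the data as $(x_i, y_i)$ for $i = 1, \dots, n$, and the coefficients as $a_0, \dots, a_m$ (these symbol names are assumed here). Setting each partial derivative of $S$ to zero gives one linear equation per coefficient:

$$
\frac{\partial S}{\partial a_k} = -2 \sum_{i=1}^{n} x_i^{k} \left( y_i - \sum_{j=0}^{m} a_j x_i^{j} \right) = 0, \qquad k = 0, 1, \dots, m
$$

Rearranging yields

$$
\sum_{j=0}^{m} \left( \sum_{i=1}^{n} x_i^{j+k} \right) a_j = \sum_{i=1}^{n} x_i^{k} y_i, \qquad k = 0, 1, \dots, m
$$

which is the system $A^T A \mathbf{a} = A^T \mathbf{y}$ written entry by entry, with $A_{ij} = x_i^{j}$.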
How It Works
You construct a design matrix $A$ whose columns are powers of your $x$-values: a column of ones, a column of $x_i$, a column of $x_i^2$, and so on up to degree $m$. Then you solve the normal equations $A^T A \mathbf{a} = A^T \mathbf{y}$ for the coefficient vector $\mathbf{a}$. For a quadratic fit ($m = 2$), this yields three equations in three unknowns. The resulting polynomial minimizes the total squared error across all data points, extending the idea of a least squares regression line to curved relationships.
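The procedure can be sketched in plain Python as follows. This is a minimal illustration, not a production routine: the function names (`polyfit_normal_equations`, `solve_linear_system`, `sum_squared_residuals`) and the sample data are made up for this example, and the linear solve is a basic Gaussian elimination with partial pivoting.

```python
def polyfit_normal_equations(xs, ys, degree):
    """Return coefficients [a_0, ..., a_degree] of the least squares polynomial."""
    n_coef = degree + 1
    # Design (Vandermonde) matrix: row i is [1, x_i, x_i**2, ..., x_i**degree].
    A = [[x ** j for j in range(n_coef)] for x in xs]
    # Normal equations: (A^T A) a = A^T y.
    ata = [[sum(row[j] * row[k] for row in A) for k in range(n_coef)]
           for j in range(n_coef)]
    aty = [sum(row[j] * y for row, y in zip(A, ys)) for j in range(n_coef)]
    return solve_linear_system(ata, aty)


def solve_linear_system(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [M | b], copied so the inputs are not modified.
    aug = [list(map(float, M[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more distinct x-values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def sum_squared_residuals(xs, ys, coeffs):
    """Sum over all points of (y_i - p(x_i))**2 for the given coefficients."""
    return sum((y - sum(a * x ** j for j, a in enumerate(coeffs))) ** 2
               for x, y in zip(xs, ys))


# Made-up data lying near y = 1 + 2x + 0.5x^2 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 3.4, 7.0, 11.6, 16.9]
coeffs = polyfit_normal_equations(xs, ys, degree=2)
print(coeffs)                                  # [a_0, a_1, a_2]
print(sum_squared_residuals(xs, ys, coeffs))   # minimized squared error
```

Forming $A^T A$ explicitly is fine for low degrees, but it can become numerically ill-conditioned as the degree grows. Library routines such as NumPy's `numpy.polyfit` solve the least squares problem without forming the normal equations.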
Worked Example
Problem: Fit a quadratic polynomial to the data points , , .
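Build the design matrix: Each row corresponds to a data point, with columns for $1$, $x$, and $x^2$.
Solve the normal equations: Since we have 3 data points with distinct $x$-values and 3 unknowns, the design matrix is square and invertible, so the system has a unique solution. The normal equations then reduce to the three original equations, one per data point, which can be solved directly.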
Find the coefficients: From the first equation, . Substituting into the second gives . The third gives , so . Subtracting yields and .
Answer: The best-fit quadratic is . Since the number of data points equals the number of coefficients, this polynomial passes exactly through all three points, so the sum of squared residuals is zero.
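The exact-fit behavior can be checked with a small script. The three points below are hypothetical values chosen for illustration, not the data from the problem above. The sketch solves the square system with exact rational arithmetic, so the zero residual is exact rather than approximate.

```python
from fractions import Fraction

# Hypothetical points chosen for illustration; not the data from the problem above.
points = [(0, 1), (1, 3), (2, 7)]

# Square system A a = y with rows [1, x, x^2 | y]; exact rational arithmetic.
rows = [[Fraction(1), Fraction(x), Fraction(x) ** 2, Fraction(y)] for x, y in points]

# Gauss-Jordan elimination on the 3x4 augmented matrix.
for col in range(3):
    pivot = next(r for r in range(col, 3) if rows[r][col] != 0)
    rows[col], rows[pivot] = rows[pivot], rows[col]
    rows[col] = [v / rows[col][col] for v in rows[col]]
    for r in range(3):
        if r != col:
            factor = rows[r][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]

a0, a1, a2 = (rows[i][3] for i in range(3))
residual = sum((y - (a0 + a1 * x + a2 * x * x)) ** 2 for x, y in points)
print(a0, a1, a2, residual)  # 1 1 1 0
```

For these illustrative points the quadratic is $1 + x + x^2$, and the residual is zero because three points with distinct $x$-values determine a quadratic uniquely.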
Why It Matters
Many real-world datasets — projectile trajectories, cost curves, sensor calibration data — follow nonlinear trends that a straight line cannot capture. Polynomial least squares fitting is a standard tool in engineering, physics, and data science courses for modeling curvature without jumping to more complex nonlinear regression methods.
Common Mistakes
Mistake: Choosing too high a polynomial degree, which causes the curve to overfit and oscillate wildly between data points.
Correction: Use the lowest degree that captures the data's trend. Compare residual plots or use adjusted $R^2$ to judge whether increasing the degree genuinely improves the fit.
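One way to make that comparison concrete is to compute adjusted $R^2$ for several candidate degrees. The sketch below assumes the usual definition, $1 - (1 - R^2)\,(n - 1)/(n - m - 1)$ for $n$ points and degree $m$. The helper names `fit_poly` and `adjusted_r2` and the data are made up for illustration.

```python
def fit_poly(xs, ys, degree):
    """Least squares coefficients [a_0..a_degree] via the normal equations."""
    k = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y] built from power sums.
    aug = [[sum(x ** (i + j) for x in xs) for j in range(k)]
           + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [row[k] for row in aug]


def adjusted_r2(xs, ys, degree):
    """Adjusted R^2 of the degree-`degree` least squares fit (needs n > degree + 1)."""
    n = len(xs)
    coeffs = fit_poly(xs, ys, degree)
    preds = [sum(a * x ** j for j, a in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    # Penalize extra coefficients: a higher degree must earn its keep.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - degree - 1)


# Made-up, roughly quadratic data (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.2, 2.9, 7.1, 12.8, 21.3, 30.9, 43.2, 57.0]
for degree in (1, 2, 3, 4):
    print(degree, round(adjusted_r2(xs, ys, degree), 5))
```

If adjusted $R^2$ stops improving, or drops, when the degree goes up, the extra terms are most likely fitting noise rather than trend.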
