Normal Equation — Definition, Formula & Examples
The Normal Equation is a formula that directly computes the best-fit coefficients in a least squares problem without iteration. It finds the vector that minimizes the squared error between observed values and the predictions of a linear model.
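Given an overdetermined system $A\mathbf{x} = \mathbf{b}$ where $A$ is an $m \times n$ matrix with $m > n$, the Normal Equation yields the least squares solution $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$, provided $A^T A$ is invertible.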
Key Formula
$$\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$$
Where:
- $A$ = The m × n design matrix (rows = data points, columns = features)
- $\mathbf{b}$ = The m × 1 vector of observed values
- $\hat{\mathbf{x}}$ = The n × 1 vector of best-fit coefficients
- $A^T$ = The transpose of A
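As a minimal sketch, the formula translates almost directly into NumPy. The function name `normal_equation` and the sample data are illustrative assumptions, not part of the original text. The sketch solves the square system with `np.linalg.solve` instead of forming the inverse explicitly, which gives the same result when AᵀA is invertible and is generally the more numerically stable choice.

```python
import numpy as np

def normal_equation(A, b):
    """Return the least squares coefficients by solving (A^T A) x = A^T b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    # Solve the square n x n system rather than computing the inverse explicitly.
    return np.linalg.solve(A.T @ A, A.T @ b)

# Illustrative data (an assumption): four points lying exactly on y = 1 + 2x.
# The first column of A is the intercept term, the second holds the x-values.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 3.0, 5.0, 7.0])

x_hat = normal_equation(A, b)
print(x_hat)  # intercept and slope, up to floating-point rounding
```

Because the sample points lie exactly on a line, the recovered coefficients should match that line's intercept and slope up to rounding.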
How It Works
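You start with a system $A\mathbf{x} = \mathbf{b}$ that has more equations than unknowns, so in general no exact solution exists. Multiply both sides of $A\mathbf{x} = \mathbf{b}$ by $A^T$ on the left to get the square system $A^T A \hat{\mathbf{x}} = A^T \mathbf{b}$. If the columns of $A$ are linearly independent, $A^T A$ is invertible and you solve for $\hat{\mathbf{x}}$ directly. The resulting $\hat{\mathbf{x}}$ minimizes $\|A\mathbf{x} - \mathbf{b}\|^2$, the sum of squared residuals.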
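The minimization claim can be checked with a short derivation. This is the standard calculus argument, sketched here with $f$ as an assumed name for the squared-error objective: expand the squared norm, take the gradient with respect to $\mathbf{x}$, and set it to zero.

```latex
\begin{aligned}
f(\mathbf{x}) &= \|A\mathbf{x} - \mathbf{b}\|^2
             = (A\mathbf{x} - \mathbf{b})^T (A\mathbf{x} - \mathbf{b}) \\
             &= \mathbf{x}^T A^T A \mathbf{x} - 2\,\mathbf{b}^T A \mathbf{x} + \mathbf{b}^T \mathbf{b} \\
\nabla f(\mathbf{x}) &= 2 A^T A \mathbf{x} - 2 A^T \mathbf{b} = \mathbf{0}
\iff A^T A \mathbf{x} = A^T \mathbf{b}
\end{aligned}
```

The Hessian of $f$ is $2 A^T A$, which is positive semidefinite, so $f$ is convex and any solution of this square system is a global minimizer. When the columns of $A$ are linearly independent, $A^T A$ is positive definite and the minimizer is unique.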
Worked Example
Problem: Find the least squares line y = c₀ + c₁x for the data points (1, 2), (2, 3), (3, 6).
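Set up A and b: Each row of A has a 1 (for the intercept) and the x-value. The vector b contains the y-values.
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 2 \\ 3 \\ 6 \end{bmatrix}$$
Compute AᵀA and Aᵀb: Multiply Aᵀ by A and Aᵀ by b.
$$A^T A = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \quad A^T \mathbf{b} = \begin{bmatrix} 11 \\ 26 \end{bmatrix}$$
Solve the Normal Equation: Invert AᵀA and multiply by Aᵀb. The determinant of AᵀA is 3(14) − 6(6) = 6.
$$\hat{\mathbf{x}} = \frac{1}{6} \begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix} \begin{bmatrix} 11 \\ 26 \end{bmatrix} = \begin{bmatrix} -1/3 \\ 2 \end{bmatrix}$$
Answer: The least squares line is $y = -\tfrac{1}{3} + 2x$.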
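The arithmetic in this example can be checked exactly with Python's standard `fractions` module. The sketch below is one way to do it, not the only one. It builds the entries of AᵀA and Aᵀb from sums over the three data points and applies the 2 × 2 inverse formula; the variable names (`sx`, `sxx`, `sy`, `sxy`, `c0`, `c1`) are illustrative.

```python
from fractions import Fraction

# Data points from the worked example.
points = [(1, 2), (2, 3), (3, 6)]
xs = [Fraction(x) for x, _ in points]
ys = [Fraction(y) for _, y in points]
m = len(points)

# With A = [1, x] per row: A^T A = [[m, sum x], [sum x, sum x^2]]
# and A^T b = [sum y, sum x*y].
sx = sum(xs)
sxx = sum(x * x for x in xs)
sy = sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))

# Apply the 2 x 2 inverse (adjugate divided by the determinant) to A^T b.
det = m * sxx - sx * sx
c0 = (sxx * sy - sx * sxy) / det  # intercept
c1 = (m * sxy - sx * sy) / det    # slope

print(det, c0, c1)  # prints: 6 -1/3 2
```

Exact rational arithmetic avoids floating-point rounding, so the intercept comes out as the fraction −1/3 and the slope as 2, giving the line y = −1/3 + 2x.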
Why It Matters
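The Normal Equation is the foundation of ordinary least squares regression, which appears in statistics, econometrics, and machine learning. In courses like linear algebra and data science, it connects matrix operations to curve fitting. When the number of features is small to moderate, it provides a closed-form solution that needs no step-size tuning and is often faster than iterative methods like gradient descent. With very many features, forming and solving the n × n system becomes expensive and iterative methods can be the better choice.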
Common Mistakes
Mistake: Attempting to use the Normal Equation when AᵀA is singular (not invertible).
Correction: AᵀA is invertible exactly when the columns of A are linearly independent. If they are not, use the pseudoinverse or add regularization (ridge regression).
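Both remedies can be sketched in NumPy. The design matrix below is an illustrative assumption: its third column is twice its second, so AᵀA is singular. The regularization strength `lam` is a hypothetical value chosen only for demonstration.

```python
import numpy as np

# Illustrative design matrix (an assumption): column 3 = 2 * column 2,
# so the columns are linearly dependent and A^T A is not invertible.
A = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 4.0],
              [1.0, 3.0, 6.0]])
b = np.array([2.0, 3.0, 6.0])

# Rank below the number of columns signals that A^T A is singular.
rank = np.linalg.matrix_rank(A)
print("rank:", rank, "columns:", A.shape[1])

# Option 1: the Moore-Penrose pseudoinverse returns the minimum-norm
# least squares solution even when A^T A has no inverse.
x_pinv = np.linalg.pinv(A) @ b

# Option 2: ridge regression solves (A^T A + lam * I) x = A^T b,
# which is invertible for any lam > 0.
lam = 1e-3  # hypothetical regularization strength
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)

print("pinv fit: ", A @ x_pinv)
print("ridge fit:", A @ x_ridge)
```

The coefficient vectors are not unique in this situation, so it is more meaningful to compare the fitted values `A @ x` than the coefficients themselves. Ridge regression introduces a small bias that shrinks as `lam` approaches zero.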
