Normal Equation — Definition, Formula & Examples

The Normal Equation is a formula that directly computes the best-fit coefficients in a least squares problem without iteration. It finds the vector $\mathbf{x}$ that minimizes the squared error between observed values and the predictions of a linear model.

Given an overdetermined system $A\mathbf{x} = \mathbf{b}$ where $A$ is an $m \times n$ matrix with $m > n$, the Normal Equation $A^T A\,\mathbf{x} = A^T \mathbf{b}$ yields the least squares solution $\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}$, provided $A^T A$ is invertible.

Key Formula

\hat{\mathbf{x}} = (A^T A)^{-1} A^T \mathbf{b}

Where:

$A$ = The m × n design matrix (rows = data points, columns = features)
$\mathbf{b}$ = The m × 1 vector of observed values
$\hat{\mathbf{x}}$ = The n × 1 vector of best-fit coefficients
$A^T$ = The transpose of A
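
The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a production routine: the function name `solve_normal_equation`, the list-of-rows representation of $A$, and the singularity tolerance `1e-12` are all assumptions made for this example. It solves the system $A^T A\,\mathbf{x} = A^T \mathbf{b}$ by Gaussian elimination rather than forming $(A^T A)^{-1}$ explicitly, which gives the same result when $A^T A$ is invertible.

```python
def solve_normal_equation(A, b):
    """Return x_hat solving (A^T A) x = A^T b for a list-of-rows matrix A."""
    m, n = len(A), len(A[0])
    # Build the augmented n x (n + 1) system [A^T A | A^T b].
    M = [
        [sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
        + [sum(A[k][i] * b[k] for k in range(m))]
        for i in range(n)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        # Assumed tolerance: a near-zero pivot is treated as a singular A^T A.
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular (columns of A are dependent)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


# Example: three equations, two unknowns.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
x_hat = solve_normal_equation(A, b)
print(x_hat)  # approximately [1.3333, 2.3333]
```

In practice a numerical library's least squares routine would usually be preferred; the sketch only mirrors the formula step by step.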

How It Works

You start with a system that has more equations than unknowns, so in general no exact solution exists. Multiply both sides of $A\mathbf{x} = \mathbf{b}$ by $A^T$ on the left to get the square system $A^T A\,\mathbf{x} = A^T \mathbf{b}$. If the columns of $A$ are linearly independent, $A^T A$ is invertible and you solve for $\mathbf{x}$ directly. The resulting $\hat{\mathbf{x}}$ minimizes $\|A\mathbf{x} - \mathbf{b}\|^2$, the sum of squared residuals.
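
A short derivation, sketched here with standard matrix calculus, shows why the solution of the Normal Equation is the minimizer. The symbol $f$ is introduced only for this derivation. Expand the objective and set its gradient with respect to $\mathbf{x}$ to zero:

\begin{aligned} f(\mathbf{x}) &= \|A\mathbf{x} - \mathbf{b}\|^2 = \mathbf{x}^T A^T A\,\mathbf{x} - 2\,\mathbf{b}^T A\,\mathbf{x} + \mathbf{b}^T \mathbf{b} \\ \nabla f(\mathbf{x}) &= 2A^T A\,\mathbf{x} - 2A^T \mathbf{b} = \mathbf{0} \;\Longrightarrow\; A^T A\,\mathbf{x} = A^T \mathbf{b} \end{aligned}

When the columns of $A$ are linearly independent, $A^T A$ is positive definite, so this stationary point is the unique minimum.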

Worked Example

Problem: Find the least squares line y = c₀ + c₁x for the data points (1, 2), (2, 3), (3, 6).

Set up A and b: Each row of A has a 1 (for the intercept) and the x-value. The vector b contains the y-values.

A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 2 \\ 3 \\ 6 \end{bmatrix}

Compute AᵀA and Aᵀb: Multiply Aᵀ by A and Aᵀ by b.

A^T A = \begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}, \quad A^T \mathbf{b} = \begin{bmatrix} 11 \\ 26 \end{bmatrix}

Solve the Normal Equation: Invert AᵀA and multiply by Aᵀb. The determinant of AᵀA is 3(14) − 6(6) = 6.

\hat{\mathbf{x}} = \frac{1}{6}\begin{bmatrix} 14 & -6 \\ -6 & 3 \end{bmatrix}\begin{bmatrix} 11 \\ 26 \end{bmatrix} = \frac{1}{6}\begin{bmatrix} -2 \\ 12 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{3} \\ 2 \end{bmatrix} \approx \begin{bmatrix} -0.33 \\ 2 \end{bmatrix}

Answer: The least squares line is

y = -\tfrac{1}{3} + 2x
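
The arithmetic can be checked with exact fractions. The sketch below is one way to do it using only Python's standard library; the variable names (`points`, `ata`, `atb`, `residuals`) are chosen for this illustration. It rebuilds AᵀA and Aᵀb from the three data points, applies the 2 × 2 inverse formula, and confirms that the residuals are orthogonal to both columns of A, which is what the Normal Equation requires.

```python
from fractions import Fraction

# Data points from the worked example: (1, 2), (2, 3), (3, 6).
points = [(1, 2), (2, 3), (3, 6)]

# Design matrix A (intercept column, x column) and observation vector b.
A = [[Fraction(1), Fraction(x)] for x, _ in points]
b = [Fraction(y) for _, y in points]

# Entries of the 2x2 matrix A^T A and the 2-vector A^T b.
ata = [[sum(row[i] * row[j] for row in A) for j in range(2)] for i in range(2)]
atb = [sum(row[i] * y for row, y in zip(A, b)) for i in range(2)]

# Solve the 2x2 system with the explicit inverse (adjugate / determinant).
det = ata[0][0] * ata[1][1] - ata[0][1] * ata[1][0]
c0 = (ata[1][1] * atb[0] - ata[0][1] * atb[1]) / det
c1 = (ata[0][0] * atb[1] - ata[1][0] * atb[0]) / det

# Residuals b - A x_hat; they should be orthogonal to every column of A.
residuals = [y - (c0 * row[0] + c1 * row[1]) for row, y in zip(A, b)]
orthogonality = [sum(r * row[i] for r, row in zip(residuals, A)) for i in range(2)]

print(c0, c1)  # -1/3 2
```

Both entries of `orthogonality` come out as exactly zero, matching the line y = −1/3 + 2x.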

Why It Matters

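The Normal Equation is the foundation of ordinary least squares regression, which appears in statistics, econometrics, and machine learning. In courses like linear algebra and data science, it connects matrix operations to curve fitting. When the number of features n is small to moderate, it provides a closed-form solution that needs no learning rate or stopping criterion and is often faster than iterative methods like gradient descent. Forming AᵀA takes on the order of mn² operations and solving the n × n system on the order of n³, so for very large n iterative methods tend to become the better choice.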

Common Mistakes

Mistake: Attempting to use the Normal Equation when AᵀA is singular (not invertible).

Correction: AᵀA is invertible only when the columns of A are linearly independent. If they are not, use the pseudoinverse or add regularization (ridge regression).
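
The regularization option can be sketched as follows. Ridge regression replaces AᵀA with AᵀA + λI, which is invertible for any λ > 0, at the price of solving a slightly different problem (it minimizes the squared error plus λ times the squared length of the coefficient vector). The helper name `ridge_fit_2col`, the parameter `lam`, and the restriction to two columns are assumptions made to keep the example short.

```python
def ridge_fit_2col(A, b, lam):
    """Solve (A^T A + lam * I) x = A^T b for a matrix A with two columns."""
    # Entries of the regularized 2x2 matrix A^T A + lam * I.
    s00 = sum(r[0] * r[0] for r in A) + lam
    s01 = sum(r[0] * r[1] for r in A)
    s11 = sum(r[1] * r[1] for r in A) + lam
    # Entries of A^T b.
    t0 = sum(r[0] * y for r, y in zip(A, b))
    t1 = sum(r[1] * y for r, y in zip(A, b))
    det = s00 * s11 - s01 * s01
    if det == 0:
        raise ValueError("matrix is singular; try lam > 0")
    # Explicit 2x2 inverse applied to A^T b.
    return [(s11 * t0 - s01 * t1) / det, (s00 * t1 - s01 * t0) / det]


# The second column is twice the first, so A^T A is singular.
A = [[1, 2], [2, 4], [3, 6]]
b = [1, 2, 3]

try:
    ridge_fit_2col(A, b, 0)
    plain_failed = False
except ValueError:
    plain_failed = True

x_ridge = ridge_fit_2col(A, b, 1)
print(plain_failed, x_ridge)
```

With `lam = 0` the call fails because the determinant is exactly zero, while `lam = 1` returns the coefficients 14/71 and 28/71 (as floats). The value of λ is a modeling choice; larger values shrink the coefficients more strongly.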