Estimator Bias — Definition, Formula & Examples

Estimator bias is the difference between an estimator's expected value and the true value of the parameter it estimates. An estimator with zero bias is called unbiased, meaning it hits the correct value on average across many samples.

The bias of an estimator $\hat{\theta}$ for a parameter $\theta$ is defined as $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. An estimator is unbiased if and only if $E[\hat{\theta}] = \theta$, so that its bias equals zero.

Key Formula

$$\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$

Where:

$\hat{\theta}$ = The estimator (a statistic computed from sample data)
$E[\hat{\theta}]$ = The expected value of the estimator across all possible samples
$\theta$ = The true population parameter being estimated
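Because $E[\hat{\theta}]$ is a long-run average over all possible samples, the formula can be approximated by simulation: draw many samples, apply the estimator to each, and subtract $\theta$ from the average estimate. The sketch below does this with the standard library only. The helper names `monte_carlo_bias` and `draw_normal_sample` are hypothetical, and the normal population with $\mu = 10$, $\sigma = 2$ and samples of size 5 is an illustrative assumption.

```python
import random
import statistics
from typing import Callable, Sequence


def monte_carlo_bias(
    estimator: Callable[[Sequence[float]], float],
    draw_sample: Callable[[random.Random], Sequence[float]],
    theta: float,
    reps: int = 20_000,
    seed: int = 0,
) -> float:
    """Approximate Bias(theta_hat) = E[theta_hat] - theta by averaging
    the estimator over many simulated samples (hypothetical helper)."""
    rng = random.Random(seed)
    estimates = [estimator(draw_sample(rng)) for _ in range(reps)]
    # The average of the simulated estimates stands in for E[theta_hat].
    return statistics.fmean(estimates) - theta


def draw_normal_sample(rng: random.Random) -> list[float]:
    """Assumed population: normal with mu = 10, sigma = 2; sample size 5."""
    return [rng.gauss(10.0, 2.0) for _ in range(5)]


bias_of_mean = monte_carlo_bias(statistics.fmean, draw_normal_sample, theta=10.0)
print(f"Estimated bias of the sample mean: {bias_of_mean:.3f}")  # close to 0
```

The result is only an approximation of the exact bias; it gets tighter as `reps` grows, which is why algebraic proofs are preferred when they are available.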

How It Works

To assess whether an estimator is biased, you compare its expected value (the long-run average over all possible samples) to the true parameter. If the expected value exceeds the true parameter, the bias is positive; if it falls below, the bias is negative. In practice, you often prove unbiasedness algebraically rather than through simulation. For example, the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$ because $E[\bar{X}] = \mu$. In contrast, dividing by $n$ instead of $n-1$ when computing the sample variance produces a biased estimator that systematically underestimates the population variance.
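A quick simulation makes the contrast between the sample mean and the two variance formulas visible. This is a sketch under assumed, illustrative settings: a normal population with $\mu = 10$ and $\sigma^2 = 20$, samples of size $n = 5$, and 50,000 repetitions. The sample mean and the $n-1$ version should show a bias near 0, while the divide-by-$n$ version should land near $-\sigma^2/n = -4$.

```python
import random

rng = random.Random(42)
mu, sigma2, n, reps = 10.0, 20.0, 5, 50_000  # assumed illustrative settings
sigma = sigma2 ** 0.5

mean_total = 0.0
var_n_total = 0.0          # running sum of variance estimates that divide by n
var_n_minus_1_total = 0.0  # running sum of variance estimates that divide by n - 1

for _ in range(reps):
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)  # sum of squared deviations
    mean_total += x_bar
    var_n_total += ss / n
    var_n_minus_1_total += ss / (n - 1)

# Average estimate minus the true parameter approximates the bias.
bias_mean = mean_total / reps - mu
bias_var_n = var_n_total / reps - sigma2
bias_var_n_minus_1 = var_n_minus_1_total / reps - sigma2

print(f"sample mean:      bias ~ {bias_mean:+.3f}")
print(f"divide by n:      bias ~ {bias_var_n:+.3f}")
print(f"divide by n - 1:  bias ~ {bias_var_n_minus_1:+.3f}")
```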

Worked Example

Problem: A population has variance $\sigma^2 = 20$. You draw samples of size $n = 5$ and compute

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2,$$

which divides by $n$ rather than $n-1$. Find the bias of this estimator.

Step 1: It is a known result that the expected value of this estimator is:

$$E\!\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{n-1}{n}\,\sigma^2$$
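For readers who want the reasoning behind this "known result", here is a short derivation. It assumes the $X_i$ are independent draws from a population with mean $\mu$ and finite variance $\sigma^2$; the key facts are $E[X_i^2] = \sigma^2 + \mu^2$ and $E[\bar{X}^2] = \text{Var}(\bar{X}) + \mu^2 = \sigma^2/n + \mu^2$.

```latex
\begin{aligned}
\sum_{i=1}^{n}(X_i - \bar{X})^2 &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \\
E\!\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]
  &= n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\,\sigma^2 \\
E\!\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] &= \frac{n-1}{n}\,\sigma^2
\end{aligned}
```

The missing $\sigma^2$ comes from measuring deviations around $\bar{X}$, which is itself fitted to the sample, rather than around the true mean $\mu$.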

Step 2: Substitute $n = 5$ and $\sigma^2 = 20$:

$$E[\hat{\sigma}^2] = \frac{4}{5}(20) = 16$$

Step 3: Apply the bias formula:

$$\text{Bias}(\hat{\sigma}^2) = 16 - 20 = -4$$
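The three steps reduce to exact arithmetic, which can be checked with Python's `fractions` module. The function name `divide_by_n_variance_bias` is a hypothetical helper. Since $\frac{n-1}{n}\sigma^2 - \sigma^2 = -\sigma^2/n$, the loop also shows the bias shrinking toward 0 as $n$ grows.

```python
from fractions import Fraction


def divide_by_n_variance_bias(sigma2: Fraction, n: int) -> Fraction:
    """Bias of (1/n) * sum((X_i - X_bar)^2) as an estimator of sigma^2."""
    expected = Fraction(n - 1, n) * sigma2  # Step 1: E[sigma_hat^2] = (n-1)/n * sigma^2
    return expected - sigma2                # Step 3: E[sigma_hat^2] - sigma^2


print(divide_by_n_variance_bias(Fraction(20), 5))  # -4

# The bias equals -sigma^2 / n, so it shrinks as the sample size grows.
for n in (2, 5, 10, 50):
    print(n, divide_by_n_variance_bias(Fraction(20), n))
```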

Answer: The bias is $-4$, meaning this estimator systematically underestimates the population variance by 4 on average. This is exactly why the corrected sample variance $s^2$ divides by $n - 1$ instead.
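To see why dividing by $n - 1$ removes the bias, note that $s^2$ is just $\hat{\sigma}^2$ rescaled by $\frac{n}{n-1}$, and that factor cancels the $\frac{n-1}{n}$ from Step 1. With the numbers from this example, $E[s^2] = \frac{5}{4}(16) = 20 = \sigma^2$.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{n}{n-1}\,\hat{\sigma}^2
\quad\Longrightarrow\quad
E[s^2] = \frac{n}{n-1}\cdot\frac{n-1}{n}\,\sigma^2 = \sigma^2
```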

Why It Matters

Estimator bias shows up directly on the AP Statistics exam when comparing estimators or explaining why $s^2$ uses $n-1$. In data science and econometrics, choosing between biased and unbiased estimators (or accepting some bias for lower variance, as in ridge regression) is a core modeling decision.
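One standard way to weigh bias against variance is mean squared error, which decomposes as $\text{MSE}(\hat{\theta}) = \text{Bias}(\hat{\theta})^2 + \text{Var}(\hat{\theta})$. The sketch below compares the two variance estimators from the worked example. It assumes a normal population, because the closed-form MSEs used here, $(2n-1)\sigma^4/n^2$ for the divide-by-$n$ estimator and $2\sigma^4/(n-1)$ for $s^2$, hold under normality. The helper names `exact_mse` and `simulated_mse` are hypothetical, and this illustrates the general trade-off rather than ridge regression itself.

```python
import random


def exact_mse(sigma2: float, n: int) -> tuple[float, float]:
    """Closed-form MSEs, valid for a NORMAL population (assumption).
    Returns (MSE of divide-by-n estimator, MSE of s^2)."""
    sigma4 = sigma2 ** 2
    mse_div_n = (2 * n - 1) * sigma4 / n ** 2  # bias^2 + variance
    mse_s2 = 2 * sigma4 / (n - 1)              # unbiased, so MSE = variance
    return mse_div_n, mse_s2


def simulated_mse(sigma2: float, n: int, reps: int = 50_000, seed: int = 7) -> tuple[float, float]:
    """Monte Carlo check of the same two MSEs using normal samples."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    sq_err_div_n = sq_err_s2 = 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        ss = sum((x - x_bar) ** 2 for x in xs)
        sq_err_div_n += (ss / n - sigma2) ** 2
        sq_err_s2 += (ss / (n - 1) - sigma2) ** 2
    return sq_err_div_n / reps, sq_err_s2 / reps


print(exact_mse(20.0, 5))      # (144.0, 200.0)
print(simulated_mse(20.0, 5))  # roughly the same pair
```

Under these assumptions the biased divide-by-$n$ estimator has the lower MSE, which is the same kind of trade-off that motivates deliberately biased methods.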

Common Mistakes

Mistake: Confusing bias with variability. Students assume a biased estimator gives a wrong answer in every single sample.

Correction: Bias describes the long-run average error, not the error in any individual sample. A biased estimator can still land on or near the true value in a given sample; it just won't center on the true value across repeated sampling.
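A small simulation shows a biased estimator overshooting in individual samples while still underestimating on average. This sketch reuses the worked example's divide-by-$n$ estimator with $\sigma^2 = 20$ and $n = 5$, and assumes a normal population; under that assumption roughly 29% of samples give an estimate above 20, even though the average estimate is near 16.

```python
import random

rng = random.Random(3)
sigma2, n, reps = 20.0, 5, 20_000  # assumed settings matching the worked example

estimates = []
for _ in range(reps):
    xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    x_bar = sum(xs) / n
    estimates.append(sum((x - x_bar) ** 2 for x in xs) / n)  # divides by n

# Individual estimates scatter on both sides of sigma^2 ...
share_above = sum(e > sigma2 for e in estimates) / reps
# ... but their average sits below it, which is what the bias measures.
average = sum(estimates) / reps

print(f"share of samples that overshoot sigma^2: {share_above:.2f}")
print(f"average estimate: {average:.2f} (true sigma^2 = {sigma2})")
```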