Mathwords

Estimator Bias — Definition, Formula & Examples

Estimator bias is the difference between an estimator's expected value and the true value of the parameter it estimates. An estimator with zero bias is called unbiased, meaning it hits the correct value on average across many samples.

The bias of an estimator $\hat{\theta}$ for a parameter $\theta$ is defined as $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. An estimator is unbiased if and only if $E[\hat{\theta}] = \theta$, that is, its bias equals zero.

Key Formula

$$\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$$
Where:
  • $\hat{\theta}$ = The estimator (a statistic computed from sample data)
  • $E[\hat{\theta}]$ = The expected value of the estimator across all possible samples
  • $\theta$ = The true population parameter being estimated
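You can check the formula numerically by approximating $E[\hat{\theta}]$ with an average over many simulated samples. The Python sketch below is a minimal illustration that assumes a Normal population with mean `mu` and standard deviation `sigma`. The helper names (`sample_mean`, `biased_variance`, `estimate_bias`) are hypothetical. A simulation only approximates the expected value, so the printed biases land close to their theoretical values, not exactly on them.

```python
import random


def sample_mean(xs):
    """X-bar: the ordinary sample mean."""
    return sum(xs) / len(xs)


def biased_variance(xs):
    """sigma-hat^2 that divides by n (not n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def estimate_bias(estimator, true_value, n, mu, sigma, trials=20_000, seed=0):
    """Approximate Bias = E[theta_hat] - theta by averaging the estimator over
    many simulated samples of size n from an assumed Normal(mu, sigma) population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        total += estimator(sample)
    expected_value = total / trials  # Monte Carlo stand-in for E[theta_hat]
    return expected_value - true_value


if __name__ == "__main__":
    mu, sigma, n = 10.0, 2.0, 5
    print(estimate_bias(sample_mean, mu, n, mu, sigma))            # close to 0
    print(estimate_bias(biased_variance, sigma**2, n, mu, sigma))  # close to -0.8
```

With $\mu = 10$, $\sigma = 2$, and $n = 5$, the estimated bias of the sample mean should be near 0. The divide-by-$n$ variance should come out near $\frac{n-1}{n}\sigma^2 - \sigma^2 = -\sigma^2/n = -0.8$.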

How It Works

To assess whether an estimator is biased, compare its expected value (the long-run average over all possible samples) with the true parameter. If the expected value is too high, the bias is positive; if it is too low, the bias is negative. In practice, unbiasedness is usually proved algebraically rather than checked by simulation. For example, the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$. By linearity of expectation, $E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu$. In contrast, dividing by $n$ instead of $n-1$ when computing the sample variance produces a biased estimator that systematically underestimates the population variance.
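The algebra behind both claims is short. The derivation below is a sketch that assumes $X_1, \dots, X_n$ are independent and identically distributed with mean $\mu$ and variance $\sigma^2$. It uses two facts: $E[X_i^2] = \sigma^2 + \mu^2$, and $E[\bar{X}^2] = \text{Var}(\bar{X}) + \mu^2 = \sigma^2/n + \mu^2$. The value $\text{Var}(\bar{X}) = \sigma^2/n$ relies on independence.

```latex
% Assumes X_1, ..., X_n are i.i.d. with mean \mu and variance \sigma^2.
\begin{aligned}
E[\bar{X}] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}(n\mu) = \mu \\[4pt]
\sum_{i=1}^{n}(X_i - \bar{X})^2 &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \\[4pt]
E\!\left[\sum_{i=1}^{n}(X_i - \bar{X})^2\right]
  &= n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\,\sigma^2 \\[4pt]
E\!\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] &= \frac{n-1}{n}\,\sigma^2,
  \qquad \text{Bias} = \frac{n-1}{n}\,\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}
\end{aligned}
```

Dividing the sum of squares by $n-1$ instead gives expected value exactly $\sigma^2$. This is why $s^2$ is unbiased.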

Worked Example

Problem: A population has variance $\sigma^2 = 20$. You draw samples of size $n = 5$ and compute $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$, which divides by $n$ rather than $n-1$. Find the bias of this estimator.
Step 1: It is a standard result (for independent observations) that the expected value of this estimator is:
$$E\!\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right] = \frac{n-1}{n}\,\sigma^2$$
Step 2: Substitute $n = 5$ and $\sigma^2 = 20$:
$$E[\hat{\sigma}^2] = \frac{4}{5}(20) = 16$$
Step 3: Apply the bias formula:
$$\text{Bias}(\hat{\sigma}^2) = 16 - 20 = -4$$
Answer: The bias is $-4$, meaning this estimator systematically underestimates the population variance by 4 on average. This is exactly why the corrected sample variance $s^2$ divides by $n - 1$ instead.
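You can also verify the worked example exactly by enumeration. The sketch below assumes a hypothetical population of four equally likely values, $\{-6, -2, 2, 6\}$, chosen only because its variance is exactly 20. Drawing $n = 5$ values with replacement gives $4^5 = 1024$ equally likely samples. Averaging $\hat{\sigma}^2$ over all of them with exact fractions gives $E[\hat{\sigma}^2]$ directly. The function names are illustrative. The code assumes integer-valued data because `Fraction(sum(xs), n)` needs integer (rational) arguments.

```python
from fractions import Fraction
from itertools import product


def biased_variance(xs):
    """sigma-hat^2 = (1/n) * sum (x_i - x_bar)^2, computed exactly (integer data assumed)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n


def exact_bias(population, n):
    """Exact E[sigma-hat^2] and its bias when n values are drawn independently
    and uniformly (i.e., with replacement) from a finite population."""
    N = len(population)
    mu = Fraction(sum(population), N)
    sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance
    samples = list(product(population, repeat=n))  # all N**n equally likely samples
    expected = sum(biased_variance(s) for s in samples) / len(samples)
    return expected, expected - sigma2


# Hypothetical population chosen so that sigma^2 = 20 (its mean is 0).
population = [-6, -2, 2, 6]
expected, bias = exact_bias(population, n=5)
print(expected, bias)  # 16 -4
```

Enumeration is exact, but the number of samples grows as $N^n$, so it suits only tiny illustrations. For realistic sizes, rely on the algebraic result or on simulation.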

Why It Matters

Estimator bias shows up directly on the AP Statistics exam when comparing estimators or explaining why $s^2$ uses $n-1$. In data science and econometrics, choosing between biased and unbiased estimators (or accepting some bias for lower variance, as in ridge regression) is a core modeling decision.

Common Mistakes

Mistake: Confusing bias with variability. Students assume a biased estimator always gives wrong answers for any single sample.
Correction: Bias describes the long-run average error, not individual sample error. A biased estimator can still land on the true value in a given sample — it just won't center on it across repeated sampling.
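A tiny exhaustive example makes the distinction concrete. In this sketch the population is a hypothetical pair of equally likely values, $\{-1, 1\}$, with $\sigma^2 = 1$, and samples have size $n = 2$. The divide-by-$n$ estimator returns the true variance exactly in two of the four possible samples and returns 0 in the other two. Its long-run average is therefore $\frac{1}{2}$, and its bias is $-\frac{1}{2}$.

```python
from fractions import Fraction
from itertools import product


def biased_variance(xs):
    """Divide-by-n variance estimator, computed exactly (integer data assumed)."""
    n = len(xs)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n


# Hypothetical population {-1, 1}: mean 0, variance sigma^2 = 1.
population = [-1, 1]
true_variance = Fraction(1)

estimates = {}
for sample in product(population, repeat=2):  # all 4 equally likely samples of size 2
    estimates[sample] = biased_variance(sample)
    print(sample, estimates[sample])  # individual estimates: 0 or exactly 1

long_run_average = sum(estimates.values()) / len(estimates)
print("average:", long_run_average)  # 1/2, so the bias is 1/2 - 1 = -1/2
```

The samples `(-1, 1)` and `(1, -1)` hit $\sigma^2$ exactly, yet the estimator is still biased, because bias is a property of the average over all samples.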