Why do we subtract 1 instead of using the full sample size?

When you calculate the sample mean first, the last data value is no longer free — it is determined by the other values and the mean. This constraint removes one degree of freedom. Dividing by $n - 1$ instead of $n$ corrects for this, producing an unbiased estimate of the population variance.

Degrees of Freedom — Definition, Formula & Examples

Degrees of freedom is the number of independent values in a data set that are free to vary when calculating a statistic. It determines which version of a probability distribution (such as the t-distribution or chi-square distribution) you use for inference.

In a statistical estimation procedure, the degrees of freedom (often abbreviated df) equal the number of independent observations minus the number of parameters estimated from those observations. For a single-sample mean with sample size $n$, the degrees of freedom are $n - 1$ because the sample mean imposes one linear constraint on the data values.
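
The linear constraint can be written out explicitly. Deviations from the sample mean $\bar{x}$ always sum to zero, so once the first $n - 1$ deviations are known, the last one is fixed:

$$
\sum_{i=1}^{n} (x_i - \bar{x}) = 0
\quad\Longrightarrow\quad
x_n - \bar{x} = -\sum_{i=1}^{n-1} (x_i - \bar{x})
$$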

Key Formula

$$df = n - 1$$

Where:

$df$ = Degrees of freedom
$n$ = Sample size (number of independent observations)

How It Works

When you compute a sample statistic, you "use up" some of the information in your data to estimate parameters. Each parameter you estimate removes one degree of freedom. For example, when you calculate a sample standard deviation, you first compute the sample mean, which constrains the data so that only $n - 1$ values can vary independently. This is why you divide by $n - 1$ instead of $n$ in the sample variance formula. In practice, you plug the degrees of freedom into the appropriate distribution (t, chi-square, or F) to find critical values or p-values. For the t-distribution, a smaller df means heavier tails, reflecting greater uncertainty from fewer independent pieces of information.
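
The idea can be checked numerically. The following is a minimal sketch using only Python's standard library; the data set `scores` and the tiny `population` are hypothetical values chosen for illustration. It first confirms that the last deviation is determined by the others, then enumerates every ordered sample of size 2 (drawn with replacement) to compare the average of the two variance formulas against the true population variance.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical data set, used only for illustration.
scores = [4, 8, 6, 2]
x_bar = mean(scores)
deviations = [x - x_bar for x in scores]

# Deviations from the sample mean sum to zero, so the last deviation
# is fixed once the first n - 1 deviations are known.
last_from_others = -sum(deviations[:-1])

# Exhaustive check of the bias: every ordered sample of size 2,
# drawn with replacement from a tiny hypothetical population.
population = [2, 4, 6, 8]
samples = list(product(population, repeat=2))

# statistics.variance divides by n - 1; statistics.pvariance divides by n.
avg_div_n_minus_1 = mean(variance(s) for s in samples)
avg_div_n = mean(pvariance(s) for s in samples)

print("sum of deviations:", sum(deviations))
print("last deviation recovered:", last_from_others == deviations[-1])
print("population variance:", pvariance(population))
print("average with n - 1:", avg_div_n_minus_1)
print("average with n:", avg_div_n)
```

For this population the true variance is 5. Averaged over all 16 samples, the $n - 1$ formula gives 5, while dividing by $n$ gives 2.5, which underestimates the variance by the factor $(n - 1)/n$.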

Worked Example

Problem: You survey 25 students and record their test scores. You want to construct a 95% confidence interval for the population mean using a t-distribution. What are the degrees of freedom, and how do they affect your calculation?

Step 1: Identify the sample size.

$$n = 25$$

Step 2: Subtract 1 because you estimate one parameter (the mean) from the data.

$$df = 25 - 1 = 24$$

Step 3: Look up the t-critical value for 24 degrees of freedom at the 95% confidence level. From a t-table, you find:

$$t^* = 2.064$$

Step 4: Use this critical value in the confidence interval formula. Notice that with only 24 df, the critical value (2.064) is larger than the z-critical value (1.960), producing a wider interval that accounts for the extra uncertainty in estimating the population standard deviation.

$$\bar{x} \pm 2.064 \cdot \frac{s}{\sqrt{25}}$$

Answer: The degrees of freedom are 24. You use $t^* = 2.064$ instead of $z^* = 1.960$, which gives a slightly wider confidence interval to account for the uncertainty from a small sample.
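
To see the effect on an actual interval, here is a short sketch that plugs both critical values into the formula. The problem gives no summary statistics, so the sample mean `x_bar = 78.0` and standard deviation `s = 10.0` are hypothetical, and `t_interval` is an illustrative helper name. The critical values are the ones quoted in the example rather than computed.

```python
import math


def t_interval(x_bar, s, n, critical_value):
    """Return (lower, upper) for x_bar +/- critical_value * s / sqrt(n)."""
    margin = critical_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


n = 25
df = n - 1                # one parameter (the mean) is estimated

t_star = 2.064            # t critical value for df = 24, from the example
z_star = 1.960            # z critical value, from the example

# Hypothetical summary statistics; not part of the original problem.
x_bar, s = 78.0, 10.0

t_low, t_high = t_interval(x_bar, s, n, t_star)
z_low, z_high = t_interval(x_bar, s, n, z_star)

print("df:", df)
print("t interval:", (round(t_low, 3), round(t_high, 3)))
print("z interval:", (round(z_low, 3), round(z_high, 3)))
```

With these assumed values the t-based margin of error is about 4.13, against 3.92 for the z-based margin, so the t interval is the wider of the two.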

Another Example

Problem: A chi-square goodness-of-fit test compares observed frequencies across 6 categories to expected frequencies. What are the degrees of freedom?

Step 1: Count the number of categories.

$$k = 6$$

Step 2: For a goodness-of-fit test, subtract 1 because the category frequencies must sum to the total sample size, which imposes one constraint.

$$df = k - 1 = 6 - 1 = 5$$

Answer: The chi-square test has 5 degrees of freedom. You compare your test statistic to a chi-square distribution with $df = 5$ to find the p-value.
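
The bookkeeping for this test can be sketched in a few lines. The problem does not list any frequencies, so the `observed` and `expected` counts below are hypothetical (60 rolls of a six-sided die under a fair-die hypothesis), and `chi_square_gof` is an illustrative helper, not a library function. It computes the usual statistic $\sum (O - E)^2 / E$ and the degrees of freedom, and stops short of the p-value.

```python
def chi_square_gof(observed, expected):
    """Return (statistic, df) for a chi-square goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # The counts must sum to the sample size, which costs one degree of freedom.
    df = len(observed) - 1
    return statistic, df


# Hypothetical counts for 6 categories (60 observations in total).
observed = [8, 12, 9, 11, 14, 6]
expected = [10, 10, 10, 10, 10, 10]

statistic, df = chi_square_gof(observed, expected)
print("chi-square statistic:", round(statistic, 3))
print("df:", df)
```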

Why It Matters

Degrees of freedom appear throughout AP Statistics whenever you perform a t-test, construct a confidence interval for a mean, or run a chi-square test. Using the wrong df leads to incorrect critical values and p-values, which can cause you to draw the wrong conclusion about a hypothesis. In fields like clinical research and quality engineering, correctly specifying degrees of freedom is essential for valid inference from sample data.

Common Mistakes

Mistake: Using $n$ instead of $n - 1$ for a one-sample t-test.

Correction: You estimate the population mean from the sample, which costs one degree of freedom. Always use $df = n - 1$ for a single-sample t-procedure.

Mistake: Applying the single-sample formula $df = n - 1$ to every test.

Correction: Different procedures have different df formulas. A chi-square goodness-of-fit test uses $df = k - 1$. A two-sample t-test uses a more complex formula (or the smaller of $n_1 - 1$ and $n_2 - 1$ as a conservative approach). Always check which formula matches your test.
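
The text does not name the more complex formula. A common choice for a two-sample t-test with unequal variances is the Welch-Satterthwaite approximation, which the sketch below assumes. The function names and the sample standard deviations and sizes are hypothetical, chosen only to compare the approximation with the conservative rule.

```python
def welch_df(s1, n1, s2, n2):
    """Welch-Satterthwaite approximate df for a two-sample t-test."""
    v1 = s1 ** 2 / n1   # squared standard error contributed by sample 1
    v2 = s2 ** 2 / n2   # squared standard error contributed by sample 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


def conservative_df(n1, n2):
    """Conservative rule: the smaller of n1 - 1 and n2 - 1."""
    return min(n1 - 1, n2 - 1)


# Hypothetical sample standard deviations and sizes.
s1, n1 = 4.0, 12
s2, n2 = 7.0, 20

print("Welch-Satterthwaite df:", round(welch_df(s1, n1, s2, n2), 2))
print("Conservative df:", conservative_df(n1, n2))
```

The approximation always falls between the conservative value and $n_1 + n_2 - 2$, so the conservative rule gives a larger critical value and a wider interval.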