Bonferroni Correction — Definition, Formula & Examples

The Bonferroni correction is a way to adjust your significance level when you perform multiple hypothesis tests at the same time, so you don't accidentally conclude something is significant when it isn't. You divide your original significance level (like 0.05) by the number of tests you're running.

When conducting $m$ simultaneous hypothesis tests, the Bonferroni correction controls the family-wise error rate (FWER) by rejecting each individual null hypothesis $H_i$ only if its p-value $p_i \leq \frac{\alpha}{m}$, where $\alpha$ is the desired overall significance level. This follows from Boole's inequality, which guarantees that the probability of at least one Type I error across all tests does not exceed $\alpha$.
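
The Boole's inequality argument can be written out in one line. This is a sketch under the assumption that each p-value is valid, meaning $P(p_i \leq t) \leq t$ whenever $H_i$ is true; the symbols $I_0$ (the set of true null hypotheses) and $m_0 = |I_0| \leq m$ are introduced here for illustration only.

$$\text{FWER} = P\Big(\bigcup_{i \in I_0} \big\{p_i \leq \tfrac{\alpha}{m}\big\}\Big) \leq \sum_{i \in I_0} P\big(p_i \leq \tfrac{\alpha}{m}\big) \leq m_0 \cdot \frac{\alpha}{m} \leq \alpha$$

The first inequality is Boole's inequality itself, so the bound does not require the tests to be independent.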

Key Formula

$$\alpha_{\text{adjusted}} = \frac{\alpha}{m}$$

Where:

$\alpha$ = Original family-wise significance level (e.g., 0.05)
$m$ = Number of hypothesis tests being conducted simultaneously
$\alpha_{\text{adjusted}}$ = Adjusted significance level used for each individual test

How It Works

Without any correction, running many tests inflates the chance that at least one produces a false positive. For instance, with 20 independent tests at $\alpha = 0.05$, the probability of at least one false positive when every null hypothesis is true is $1 - (1 - 0.05)^{20} \approx 0.64$. The Bonferroni correction addresses this by making each individual test harder to pass. You simply divide your chosen significance level by the total number of comparisons, then compare each p-value against this stricter threshold. The trade-off is reduced statistical power: real effects may fail to reach the adjusted threshold, especially when the number of tests is large.
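
The inflation described above can be checked numerically. The following is a minimal sketch, assuming the tests are independent and every null hypothesis is true; the function name `familywise_error_rate` is illustrative and does not come from any library.

```python
def familywise_error_rate(per_test_alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent
    tests when every null hypothesis is true."""
    return 1.0 - (1.0 - per_test_alpha) ** m


alpha = 0.05
m = 20

# Each of the 20 tests is run at the unadjusted level 0.05.
uncorrected = familywise_error_rate(alpha, m)

# Each of the 20 tests is run at the Bonferroni level 0.05 / 20.
corrected = familywise_error_rate(alpha / m, m)

print(f"uncorrected FWER: {uncorrected:.3f}")
print(f"Bonferroni FWER:  {corrected:.3f}")
```

Under these assumptions the uncorrected rate comes out near 0.64, while the corrected rate stays just under the target of 0.05.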

Worked Example

Problem: A researcher tests whether a new drug affects 4 different biomarkers, running a separate hypothesis test for each. She wants an overall significance level of 0.05. The four p-values obtained are 0.003, 0.015, 0.042, and 0.610. Which results are significant after the Bonferroni correction?

Compute the adjusted threshold: Divide the overall significance level by the number of tests.

$$\alpha_{\text{adjusted}} = \frac{0.05}{4} = 0.0125$$

Compare each p-value to the threshold: A result is significant only if its p-value is at most 0.0125. Check each: 0.003 ≤ 0.0125 (significant), 0.015 > 0.0125 (not significant), 0.042 > 0.0125 (not significant), 0.610 > 0.0125 (not significant).

$$p_1 = 0.003 \leq 0.0125 \quad \checkmark$$

Answer: Only the first biomarker (p = 0.003) is statistically significant after the Bonferroni correction. The second and third tests (p = 0.015 and p = 0.042), which would have been significant at the uncorrected 0.05 level, no longer pass the stricter threshold.
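
The same calculation can be sketched in a few lines of Python. The helper `bonferroni` below is a hypothetical name written for this example, not a library function; it simply divides the overall level by the number of p-values and compares each one to the result.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a reject decision for each p-value."""
    m = len(p_values)
    threshold = alpha / m
    # Reject H_i only when p_i is at most alpha / m.
    decisions = [p <= threshold for p in p_values]
    return threshold, decisions


# The four biomarker p-values from the worked example.
p_values = [0.003, 0.015, 0.042, 0.610]
threshold, decisions = bonferroni(p_values, alpha=0.05)

print(f"adjusted threshold: {threshold:.4f}")
for p, significant in zip(p_values, decisions):
    label = "significant" if significant else "not significant"
    print(f"p = {p:.3f} -> {label}")
```

Only the first p-value is marked significant, matching the answer above.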

Why It Matters

Genomics studies often test thousands of genes at once; without a correction like Bonferroni, a 0.05 threshold applied to, say, 20,000 genes would be expected to flag about 1,000 of them by chance alone even if none had a real effect. Any field that runs multiple comparisons, from psychology experiments with several outcome measures to A/B testing on websites, needs a principled way to control false positives. Understanding this correction is also a stepping stone to more powerful methods like the Holm-Bonferroni procedure or the Benjamini-Hochberg method for controlling the false discovery rate.

Common Mistakes

Mistake: Applying the Bonferroni correction to every test you've ever run in a study, rather than to the specific family of related comparisons.

Correction: The correction applies within a defined family of tests addressing the same research question. Group your comparisons logically and correct within each family.
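
Correcting within families might look like the sketch below. The family names and p-values are made up purely for illustration, and how to group tests into families remains a judgment about the research questions, not something the code decides.

```python
# Hypothetical p-values grouped by research question; the family names
# and numbers are invented for this example.
families = {
    "primary_outcomes": [0.004, 0.030],
    "secondary_outcomes": [0.010, 0.020, 0.200, 0.450],
}

alpha = 0.05

results = {}
for name, p_values in families.items():
    # Divide by the size of this family only, not by the total test count.
    threshold = alpha / len(p_values)
    results[name] = [p <= threshold for p in p_values]
    print(f"{name}: threshold = {threshold:.4f}, significant = {results[name]}")
```

Here the secondary family uses 0.05 / 4 = 0.0125, so p = 0.010 passes; pooling all six tests would instead give 0.05 / 6 ≈ 0.0083 and that result would be lost. Note that this controls the error rate at $\alpha$ within each family, not across the whole study.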