Hypergeometric Distribution — Definition, Formula & Examples
The hypergeometric distribution gives the probability of drawing exactly $k$ successes from a finite population when you sample without replacement. It applies whenever you pull items from a group that contains two types (success/failure) and do not put them back.
A discrete random variable $X$ follows a hypergeometric distribution with parameters $N$ (population size), $K$ (number of success states in the population), and $n$ (number of draws) if its probability mass function is $P(X = k) = \binom{K}{k}\binom{N-K}{n-k} \big/ \binom{N}{n}$ for $\max(0,\, n + K - N) \le k \le \min(n, K)$.
Key Formula
$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$
Where:
- $N$ = Total population size
- $K$ = Number of success states in the population
- $n$ = Number of draws (sample size)
- $k$ = Number of observed successes in the sample
How It Works
You use the hypergeometric distribution when three conditions hold: the population is finite, each member is classified as success or failure, and sampling is done without replacement. To find $P(X = k)$, count the ways to choose $k$ successes from $K$, multiply by the ways to choose $n - k$ failures from $N - K$, then divide by the total ways to choose $n$ items from $N$. The expected value is $E[X] = n\frac{K}{N}$, and the variance is $\operatorname{Var}(X) = n\,\frac{K}{N}\left(1 - \frac{K}{N}\right)\frac{N-n}{N-1}$. As $N$ grows large relative to $n$, the hypergeometric distribution approaches the binomial distribution.
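The counting argument and the two moment formulas can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names `hypergeom_pmf`, `hypergeom_mean`, and `hypergeom_variance` are chosen here for illustration and are not taken from any particular package.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement
    from a population of N items, K of which are successes."""
    # Outside the support the probability is zero.
    if k < max(0, n + K - N) or k > min(n, K):
        return 0.0
    # (ways to pick k successes) * (ways to pick n - k failures) / (all samples of size n)
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def hypergeom_mean(N, K, n):
    """Expected number of successes, n * K / N."""
    return n * K / N

def hypergeom_variance(N, K, n):
    """Variance, n * p * (1 - p) * (N - n) / (N - 1) with p = K / N."""
    p = K / N
    return n * p * (1 - p) * (N - n) / (N - 1)
```

Summing `k * hypergeom_pmf(k, N, K, n)` over the support should reproduce `hypergeom_mean(N, K, n)`, which is a quick sanity check on both the formula and the code.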
Worked Example
Problem: A deck contains 20 cards: 6 red and 14 black. You draw 5 cards without replacement. What is the probability of getting exactly 2 red cards?
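Identify parameters: Population $N = 20$, successes $K = 6$, draws $n = 5$, desired successes $k = 2$.
Count favorable outcomes: Choose 2 red from 6 and 3 black from 14: $\binom{6}{2}\binom{14}{3} = 15 \times 364 = 5460$.
Count total outcomes: Choose any 5 from 20: $\binom{20}{5} = 15504$.
Compute probability: Divide favorable by total: $P(X = 2) = \frac{5460}{15504} \approx 0.3522$.
Answer: The probability of drawing exactly 2 red cards is approximately $0.3522$, or about 35.2%.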
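The arithmetic in this example can be checked directly with Python's standard-library `math.comb`. The variable names below are illustrative only.

```python
from math import comb

# Deck of 20 cards: 6 red, 14 black; draw 5 without replacement.
favorable = comb(6, 2) * comb(14, 3)   # 2 red and 3 black: 15 * 364 = 5460
total = comb(20, 5)                    # any 5 of the 20 cards: 15504
probability = favorable / total

print(f"{favorable} / {total} = {probability:.4f}")  # prints 5460 / 15504 = 0.3522
```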
Why It Matters
Quality control relies on this distribution: when an inspector pulls a sample from a finite lot of products, the hypergeometric model gives the exact probability of finding a certain number of defectives. It also underpins Fisher's exact test, a standard tool in biostatistics for analyzing contingency tables with small sample sizes.
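As a rough sketch of the quality-control use case, the probability that an inspector finds at least one defective item is one minus the hypergeometric probability of finding none. The function name and the lot figures in the usage lines are hypothetical, chosen only to illustrate the calculation.

```python
from math import comb

def prob_at_least_one_defective(lot_size, defectives, sample_size):
    """P(X >= 1) when sampling without replacement from a finite lot.

    Computed as 1 - P(X = 0), where P(X = 0) is the chance that every
    sampled item comes from the non-defective part of the lot.
    """
    p_none = comb(lot_size - defectives, sample_size) / comb(lot_size, sample_size)
    return 1.0 - p_none

# Hypothetical lot: 50 items, 4 of them defective, inspector samples 5.
p_detect = prob_at_least_one_defective(lot_size=50, defectives=4, sample_size=5)
print(f"Chance of catching at least one defective: {p_detect:.3f}")
```

Increasing `sample_size` raises the detection probability, which is the trade-off an acceptance-sampling plan has to balance against inspection cost.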
Common Mistakes
Mistake: Using the binomial distribution instead when sampling without replacement from a small population.
Correction: The binomial assumes independence between draws (replacement). When the sample is a non-negligible fraction of the population (often cited as more than 5-10%), the hypergeometric distribution is the correct model because each draw changes the composition of the remaining pool.
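To see how much the choice of model matters, the sketch below compares the two distributions on the 20-card deck (where the sample is 25% of the population) and on a much larger population with the same 30% share of successes. The helper names and the larger population size are assumptions made for this illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exact probability of k successes when sampling without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """Probability of k successes in n independent draws (with replacement)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, k, p = 5, 2, 0.3

small_exact = hypergeom_pmf(k, N=20, K=6, n=n)        # about 0.3522
large_exact = hypergeom_pmf(k, N=20000, K=6000, n=n)  # sample is 0.025% of the population
approx = binom_pmf(k, n, p)                           # about 0.3087

print(f"small population error: {abs(small_exact - approx):.4f}")
print(f"large population error: {abs(large_exact - approx):.6f}")
```

With the small deck the binomial figure is off by more than four percentage points, while for the large population the two models agree closely.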
