Chi-Square Test

A chi-square test is a statistical test that assesses whether the differences between observed counts and expected counts are larger than chance alone would plausibly produce. It works with categorical data (counts of observations in each category) and helps you decide whether a particular model or assumption about the data is reasonable.

The chi-square test evaluates the goodness of fit between observed frequencies and the frequencies expected under a specified probability distribution. The test statistic χ² quantifies how much the observed counts deviate from the expected counts across all categories. When χ² is large, the observed data diverge substantially from what was expected, providing evidence against the null hypothesis. Under the null hypothesis, the test statistic approximately follows a chi-square distribution whose degrees of freedom equal the number of categories minus one (for a goodness-of-fit test).

Key Formula

$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Where:
  • χ² = the chi-square test statistic
  • O_i = the observed frequency for category i
  • E_i = the expected frequency for category i
  • Σ = sum across all categories
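
The formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `chi_square_statistic` and the sample counts in the usage lines are illustrative assumptions, not part of the original article.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O_i - E_i)^2 / E_i over all categories.

    `observed` and `expected` are sequences of counts (not proportions),
    one entry per category, in the same order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected count must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, made up purely to show the call.
made_up_observed = [18, 22, 20]
made_up_expected = [20, 20, 20]
print(chi_square_statistic(made_up_observed, made_up_expected))
```

Each term divides the squared deviation by the expected count, so a deviation of the same size contributes more when the expected count is small.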

Worked Example

Problem: A candy company claims that its bags contain equal proportions of 4 colors: red, blue, green, and yellow. You open a bag with 200 candies and count 60 red, 45 blue, 55 green, and 40 yellow. At a significance level of 0.05, does the distribution match the company's claim?
Step 1: Find expected frequencies: If all 4 colors are equally likely, each color should make up 1/4 of 200 candies.
$$E_i = \frac{200}{4} = 50 \text{ for each color}$$
Step 2: Compute each term of the chi-square statistic: For each color, calculate (O − E)² / E.
$$\begin{aligned}
&\frac{(60-50)^2}{50} + \frac{(45-50)^2}{50} + \frac{(55-50)^2}{50} + \frac{(40-50)^2}{50} \\[6pt]
&= \frac{100}{50} + \frac{25}{50} + \frac{25}{50} + \frac{100}{50} = 2 + 0.5 + 0.5 + 2
\end{aligned}$$
Step 3: Sum to get the test statistic: Add the four terms together.
$$\chi^2 = 5.0$$
Step 4: Determine degrees of freedom and compare: Degrees of freedom = number of categories − 1 = 4 − 1 = 3. The critical value for χ² with 3 degrees of freedom at α = 0.05 is 7.815. Since 5.0 < 7.815, we fail to reject the null hypothesis.
$$\chi^2 = 5.0 < 7.815 = \chi^2_{\text{critical}}$$
Answer: There is not enough evidence at the 0.05 significance level to reject the company's claim that the four colors are equally distributed.
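
The four steps of the candy example can be reproduced with a short script. This is a sketch in plain Python: the critical value 7.815 is taken from the example rather than computed, and the helper `chi2_upper_tail_df3` is an addition that is not part of the worked example. It uses the closed-form upper-tail probability that holds only for 3 degrees of freedom, as an optional cross-check on the decision.

```python
import math

# Observed counts from the problem: red, blue, green, yellow.
observed = [60, 45, 55, 40]

# Step 1: equal proportions means each color expects total / 4.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Steps 2 and 3: compute each term and sum them.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(terms)

# Step 4: degrees of freedom and comparison with the critical value.
degrees_of_freedom = len(observed) - 1
CRITICAL_VALUE = 7.815  # quoted in the example for df = 3, alpha = 0.05
reject_null = statistic > CRITICAL_VALUE


def chi2_upper_tail_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Assumption: this closed form is valid only for exactly 3 degrees of
    freedom; other values of df need a general routine.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_upper_tail_df3(statistic)

print("terms:", terms)
print("chi-square statistic:", statistic)
print("degrees of freedom:", degrees_of_freedom)
print("reject null at 0.05:", reject_null)
print("approximate p-value:", round(p_value, 4))
```

Comparing the statistic with the critical value and comparing the p-value with 0.05 lead to the same decision; the p-value only adds a sense of how far the result is from the cutoff.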

Why It Matters

The chi-square test appears throughout science, business, and social research whenever data falls into categories rather than measurements. Biologists use it to check whether genetic ratios match predicted Mendelian patterns. Market researchers use it to determine whether customer preferences differ across demographic groups. In AP Statistics, mastering this test gives you a tool for analyzing any situation where you need to compare what you observed with what a model predicts.

Common Mistakes

Mistake: Using percentages or proportions instead of actual counts in the formula
Correction: The chi-square formula requires raw frequencies (counts), not proportions or percentages. Always convert to counts before calculating.
Mistake: Forgetting that expected counts must be sufficiently large
Correction: A common rule of thumb is that every expected frequency should be at least 5. If any expected count falls below that, the chi-square approximation becomes unreliable and the test results may be invalid.
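
Both mistakes are easy to demonstrate numerically. The sketch below, in plain Python, reuses the candy counts from the worked example; the helper names and the 12-candy sample are hypothetical and serve only as illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def small_expected_categories(expected, minimum=5):
    """Return the indexes of categories whose expected count is below `minimum`."""
    return [i for i, e in enumerate(expected) if e < minimum]


# Candy counts from the worked example.
counts = [60, 45, 55, 40]
n = sum(counts)
expected_counts = [n / 4] * 4

# Mistake 1: plugging proportions into the formula instead of counts.
observed_proportions = [c / n for c in counts]
expected_proportions = [0.25] * 4
wrong_statistic = chi_square_statistic(observed_proportions, expected_proportions)
right_statistic = chi_square_statistic(counts, expected_counts)

# The proportion-based value is too small by a factor of the sample size n.
print("with proportions:", wrong_statistic)
print("with counts:", right_statistic)

# Mistake 2: ignoring small expected counts.
# Hypothetical sample of only 12 candies, so each color expects 3.
tiny_expected = [12 / 4] * 4
print("categories below 5:", small_expected_categories(tiny_expected))
```

Because the proportion-based statistic shrinks by a factor of the sample size, it would almost never exceed the critical value, which is why the conversion to counts has to happen first.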

Related Terms