Chi-Squared Distribution — Definition, Formula & Examples
The chi-squared distribution is a continuous probability distribution that arises when you sum the squares of independent standard normal random variables. It is defined by a single parameter called degrees of freedom, and it appears throughout hypothesis testing and confidence interval estimation.
If $Z_1, Z_2, \ldots, Z_k$ are independent standard normal random variables, then $Q = Z_1^2 + Z_2^2 + \cdots + Z_k^2$ follows a chi-squared distribution with $k$ degrees of freedom, written $Q \sim \chi^2(k)$. The distribution is supported on $[0, \infty)$, is right-skewed, and approaches a normal distribution as $k \to \infty$.
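The definition can be checked by simulation. The sketch below uses only the Python standard library, and the function name `simulate_chi_squared` is illustrative rather than part of any library. It draws sums of $k$ squared standard normals. Because a $\chi^2(k)$ variable has mean $k$ and variance $2k$, the sample mean and variance should land near those values.

```python
import random
import statistics

def simulate_chi_squared(k: int, n_samples: int, seed: int = 0) -> list[float]:
    """Draw n_samples values of Z_1^2 + ... + Z_k^2 with independent Z_i ~ N(0, 1)."""
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    samples = []
    for _ in range(n_samples):
        # Sum the squares of k independent standard normal draws.
        samples.append(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)))
    return samples

k = 5
draws = simulate_chi_squared(k, 100_000)
print(statistics.fmean(draws))      # should be close to k = 5
print(statistics.pvariance(draws))  # should be close to 2k = 10
print(min(draws) >= 0.0)            # sums of squares are never negative
```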
Key Formula
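$$f(x; k) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, x^{k/2 - 1} e^{-x/2}, \qquad x > 0$$
Where:
- $x$ = Value of the chi-squared random variable (must be positive)
- $k$ = Degrees of freedom (positive integer)
- $\Gamma(\cdot)$ = The gamma function, which generalizes the factorial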
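The density $f(x; k) = \dfrac{x^{k/2 - 1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$ for $x > 0$ can be evaluated directly with `math.gamma` from the Python standard library. In this minimal sketch, `chi2_pdf` is a hypothetical helper name, and returning 0 for $x \le 0$ is a convention chosen here, since the formula itself is stated only for $x > 0$.

```python
import math

def chi2_pdf(x: float, k: int) -> float:
    """Density of the chi-squared distribution with k degrees of freedom."""
    if k <= 0:
        raise ValueError("degrees of freedom must be a positive integer")
    if x <= 0:
        return 0.0  # formula applies only to x > 0; this sketch returns 0 otherwise
    # f(x; k) = x^(k/2 - 1) * e^(-x/2) / (2^(k/2) * Gamma(k/2))
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

# For k = 2 the formula reduces to (1/2) e^(-x/2), so f(2; 2) = e^(-1) / 2.
print(round(chi2_pdf(2.0, 2), 4))  # 0.1839
```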
How It Works
The chi-squared distribution describes sums of squared standardized deviations. In a chi-squared goodness-of-fit test, you compute the test statistic $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, which measures how far the observed frequencies $O_i$ deviate from the expected frequencies $E_i$. You then compare that statistic to a critical value from the chi-squared table at your chosen significance level and degrees of freedom. If the test statistic exceeds the critical value, you reject the null hypothesis. The distribution is also used to construct confidence intervals for a population variance.
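A minimal Python sketch of this procedure, assuming the expected counts are already known and the critical value has been looked up in a table. The names `chi_square_statistic` and `reject_null` are hypothetical helpers, and the coin counts in the usage lines are made-up illustration data. The value 3.841 is the standard 5% critical value for 1 degree of freedom.

```python
def chi_square_statistic(observed: list[float], expected: list[float]) -> float:
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def reject_null(statistic: float, critical_value: float) -> bool:
    """Reject the null hypothesis when the statistic exceeds the critical value."""
    return statistic > critical_value

# Hypothetical data: 100 coin flips giving 60 heads and 40 tails; a fair coin expects 50/50.
stat = chi_square_statistic([60, 40], [50, 50])
print(stat)                      # 4.0
print(reject_null(stat, 3.841))  # True: 4.0 exceeds the df = 1 critical value at alpha = 0.05
```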
Worked Example
Problem: A die is rolled 60 times. The observed frequencies for faces 1–6 are: 8, 12, 10, 14, 7, 9. Test at the 0.05 significance level whether the die is fair.
Step 1: Under the null hypothesis (fair die), each face has an expected frequency of 60/6 = 10.
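Step 2: Compute the chi-squared test statistic by summing the squared deviations divided by the expected frequencies, where $O_i$ and $E_i$ are the observed and expected counts for face $i$: $\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(8-10)^2 + (12-10)^2 + (10-10)^2 + (14-10)^2 + (7-10)^2 + (9-10)^2}{10} = \frac{4 + 4 + 0 + 16 + 9 + 1}{10} = \frac{34}{10} = 3.4$.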
Step 3: The degrees of freedom are k = 6 − 1 = 5. The critical value from the chi-squared table at α = 0.05 with 5 df is 11.07. Since 3.4 < 11.07, we fail to reject the null hypothesis.
Answer: The test statistic is 3.4, which is less than the critical value of 11.07. There is not enough evidence at the 5% level to conclude the die is unfair.
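Instead of reading 11.07 from a table, the critical value and a p-value can be computed from the distribution itself. The sketch below is one standard-library-only approach, not the only one: it evaluates the CDF with the power series for the regularized lower incomplete gamma function, using $P(X \le x) = P(k/2,\, x/2)$ for $X \sim \chi^2(k)$, and inverts it by bisection. The function names are hypothetical; in practice a statistics library such as SciPy's `scipy.stats.chi2` would normally be used.

```python
import math

def chi2_cdf(x: float, k: int) -> float:
    """P(X <= x) for X ~ chi-squared(k), via the series for P(k/2, x/2)."""
    if x <= 0:
        return 0.0
    s, t = k / 2, x / 2
    # lower incomplete gamma: gamma(s, t) = t^s e^(-t) * sum_{n>=0} t^n / (s (s+1) ... (s+n))
    term = 1.0 / s
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= t / (s + n)
        total += term
    # divide by Gamma(s) to get the regularized value; logs avoid overflow
    return math.exp(s * math.log(t) - t - math.lgamma(s)) * total

def chi2_critical_value(alpha: float, k: int) -> float:
    """Value c with P(X > c) = alpha, found by bisection on the upper tail."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, k) > alpha:  # widen the bracket until the tail drops below alpha
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if 1.0 - chi2_cdf(mid, k) > alpha:
            lo = mid  # tail still too heavy: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

observed = [8, 12, 10, 14, 7, 9]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face under H0
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
crit = chi2_critical_value(0.05, df)
p_value = 1.0 - chi2_cdf(stat, df)
print(round(stat, 2), round(crit, 2))  # 3.4 11.07
print(stat > crit)                     # False: fail to reject H0
print(p_value)                         # well above 0.05
```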
Why It Matters
The chi-squared distribution is central to goodness-of-fit tests, tests of independence in contingency tables, and inference about population variances. Any field that relies on categorical data analysis — genetics, market research, quality control — uses chi-squared tests routinely.
Common Mistakes
Mistake: Confusing the chi-squared test statistic with the chi-squared distribution's degrees of freedom.
Correction: The test statistic is a computed value from your data; degrees of freedom come from the structure of the problem (e.g., number of categories minus 1). They serve different roles: df determines which chi-squared distribution to use, and the test statistic is compared against that distribution.
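To see the two roles side by side, the sketch below computes the statistic for two different sets of die counts over the same six categories. The helper names are hypothetical, and the second dataset (a perfectly uniform result) is made up for illustration. The statistic changes with the data, while the degrees of freedom stay at $6 - 1 = 5$ because they depend only on the number of categories.

```python
def pearson_statistic(observed: list[float], expected: list[float]) -> float:
    """Computed from the data: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit_df(num_categories: int) -> int:
    """Fixed by the problem's structure: number of categories minus 1."""
    return num_categories - 1

expected = [10] * 6
for observed in ([8, 12, 10, 14, 7, 9], [10, 10, 10, 10, 10, 10]):
    # Different data give different statistics, but the same df for six categories.
    print(pearson_statistic(observed, expected), goodness_of_fit_df(len(observed)))
```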
