Covariance — Definition, Formula & Examples
Covariance is a measure of how two variables move together. A positive covariance means they tend to increase together, while a negative covariance means one tends to decrease when the other increases.
For two random variables $X$ and $Y$, the covariance is defined as the expected value of the product of their deviations from their respective means: $\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$. For a sample of $n$ paired observations, the sample covariance uses $n - 1$ in the denominator as a degrees-of-freedom correction.
Key Formula
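$$s_{xy} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
Where:
- $x_i, y_i$ = The $i$-th paired observations of variables $X$ and $Y$
- $\bar{x}, \bar{y}$ = The sample means of $X$ and $Y$
- $n$ = The number of paired observations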
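The sample formula translates almost directly into code. The following is a minimal sketch in plain Python; the function name `sample_covariance` and the input checks are illustrative choices, not part of the definition.

```python
def sample_covariance(xs, ys):
    """Sample covariance of paired observations, using n - 1 in the denominator."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired observations are required")

    # Sample means of each variable.
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of the products of paired deviations from the means.
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Degrees-of-freedom correction: divide by n - 1 rather than n.
    return total / (n - 1)


print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
```

Dividing by `n` instead of `n - 1` would treat the data as a complete population rather than a sample.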
How It Works
To compute covariance, you find how far each data point deviates from its variable's mean, multiply the paired deviations together, and average the results (dividing by $n - 1$ for a sample). If large values of $X$ tend to appear alongside large values of $Y$, most products will be positive, yielding a positive covariance. If large values of $X$ pair with small values of $Y$, most products will be negative, yielding a negative covariance. The magnitude of covariance depends on the units and scales of $X$ and $Y$, which is why correlation (covariance divided by the product of the standard deviations) is often preferred for comparison.
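To see how the sign of the covariance comes from the signs of the individual products, the sketch below lists the deviation products for two made-up datasets, one where the variables rise together and one where they move in opposite directions. The data and the helper name `deviation_products` are assumptions chosen for illustration.

```python
def deviation_products(xs, ys):
    """Return the product of paired deviations from the means, one per observation."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys_up = [10, 20, 30, 40, 50]    # large x paired with large y
ys_down = [50, 40, 30, 20, 10]  # large x paired with small y

up = deviation_products(xs, ys_up)
down = deviation_products(xs, ys_down)

print(up)    # every product is zero or positive
print(down)  # every product is zero or negative

# Sample covariance: sum of products divided by n - 1.
print(sum(up) / (len(xs) - 1))
print(sum(down) / (len(xs) - 1))
```

Real data is rarely this clean: products of both signs usually appear, and the covariance reflects which sign dominates the sum.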
Worked Example
Problem: Five students' hours studied ($X$) and exam scores ($Y$) are: (2, 60), (4, 70), (6, 80), (8, 85), (10, 95). Find the sample covariance.
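Find the means: Compute the mean of each variable.
$$\bar{x} = \frac{2 + 4 + 6 + 8 + 10}{5} = 6, \qquad \bar{y} = \frac{60 + 70 + 80 + 85 + 95}{5} = 78$$
Compute each product of deviations: For each pair, multiply $(x_i - \bar{x})$ by $(y_i - \bar{y})$.
$$(-4)(-18) = 72, \quad (-2)(-8) = 16, \quad (0)(2) = 0, \quad (2)(7) = 14, \quad (4)(17) = 68$$
Sum and divide by n − 1: Add the products and divide by 4.
$$\frac{72 + 16 + 0 + 14 + 68}{4} = \frac{170}{4} = 42.5$$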
Answer: The sample covariance is $42.5$, indicating a positive association: more hours studied tends to go with higher exam scores.
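The arithmetic in this example can be checked step by step. The sketch below reproduces the three steps in plain Python; the variable names are illustrative.

```python
hours = [2, 4, 6, 8, 10]
scores = [60, 70, 80, 85, 95]
n = len(hours)

# Step 1: the mean of each variable.
mean_hours = sum(hours) / n
mean_scores = sum(scores) / n

# Step 2: the product of deviations for each student.
products = [(h - mean_hours) * (s - mean_scores) for h, s in zip(hours, scores)]

# Step 3: sum the products and divide by n - 1.
covariance = sum(products) / (n - 1)

print(mean_hours, mean_scores)  # 6.0 78.0
print(products)                 # [72.0, 16.0, 0.0, 14.0, 68.0]
print(covariance)               # 42.5
```

On Python 3.10 or later, the standard library's `statistics.covariance(hours, scores)` should return the same value, since it also computes the sample covariance.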
Why It Matters
Covariance is the building block of the Pearson correlation coefficient and appears throughout regression analysis, portfolio theory in finance, and multivariate probability. In AP Statistics and college-level courses, understanding covariance is essential for interpreting how variables relate before moving to linear models.
Common Mistakes
Mistake: Interpreting a large covariance as a strong relationship.
Correction: Covariance is not standardized, so its magnitude depends on the units of the variables. Divide by the product of the standard deviations to get the correlation coefficient, which ranges from $-1$ to $1$ and allows meaningful comparison of strength.
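A quick way to see this is to change the units of one variable. The sketch below reuses the study-hours data from the worked example and re-expresses hours as minutes; the helper names `sample_covariance` and `correlation` are illustrative, not a reference implementation.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance with n - 1 in the denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    # A variable's covariance with itself is its sample variance.
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)


hours = [2, 4, 6, 8, 10]
scores = [60, 70, 80, 85, 95]
minutes = [60 * h for h in hours]  # same study time, different unit

cov_hours = sample_covariance(hours, scores)
cov_minutes = sample_covariance(minutes, scores)
r_hours = correlation(hours, scores)
r_minutes = correlation(minutes, scores)

print(cov_hours, cov_minutes)  # covariance grows by a factor of 60
print(r_hours, r_minutes)      # correlation is unchanged up to rounding
```

The relationship between study time and score is identical in both cases, yet the covariance is 60 times larger in minutes. Only the correlation stays the same.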
