Correlation vs. Causation
Correlation means two variables tend to change together: when one goes up, the other tends to go up (or down). Causation means a change in one variable directly produces a change in the other. The critical distinction: correlation does not imply causation. Ice cream sales and drowning deaths are correlated (both increase in summer), but ice cream does not cause drowning; a third variable (warm weather) drives both.
| | Correlation | Causation |
|---|---|---|
| Definition | Two variables move together in a pattern | One variable directly produces a change in the other |
| Direction | Can be positive, negative, or zero | Always one-directional: cause → effect |
| Measured by | Correlation coefficient (−1 to +1) | Controlled experiments, not just observation |
| Third variables? | May be driven by confounding variables | Must rule out confounders to establish |
| Example | Cities with more firefighters have more fires (both driven by city size) | Smoking causes lung cancer (established via decades of converging epidemiological and laboratory evidence) |
| Proves? | Association, not explanation | A mechanism linking cause to effect |
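The correlation coefficient in the table is usually the Pearson coefficient r. A minimal sketch of how it can be computed by hand is shown below; the function name `pearson_r` and the sample data are assumptions chosen for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance, up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same constant factor).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]
r_positive = pearson_r(hours, [10, 20, 30, 40, 50])  # points on a rising line
r_negative = pearson_r(hours, [50, 40, 30, 20, 10])  # points on a falling line
r_none = pearson_r(hours, [3, 1, 4, 1, 3])           # no linear trend

print(r_positive, r_negative, r_none)
```

The three calls give values at +1, −1, and 0 (up to floating-point rounding), matching the range in the table. Note that r only captures linear association: a value near zero does not rule out a strong non-linear relationship.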
When to Use Each
Use Correlation when...
- Describing relationships in observational data
- Exploring whether variables are related before investigating why
- Building prediction models (regression) where mechanism isn't the goal
- Reporting statistical associations in research papers
Use Causation when...
- Making policy decisions (banning a substance, recommending a treatment)
- Understanding WHY something happens, not just that it happens
- Drawing conclusions from randomized controlled experiments
- Establishing scientific mechanisms
Examples
Spurious correlation
Per capita cheese consumption correlates with the number of people who die tangled in bedsheets (r ≈ 0.95). This is pure coincidence: no mechanism connects them. This is why correlation ≠ causation.
Confounding variable
Students who eat breakfast get better grades. Does breakfast cause better grades? Not necessarily: families that provide breakfast may also provide more academic support, better sleep routines, and other advantages.
Established causation
Randomized clinical trials show that a specific drug reduces blood pressure. Because the experiment controls for confounders (placebo group, random assignment), we can conclude the drug causes the reduction.
Common Confusion Points
The most common error in statistics and media reporting is treating a correlation as proof of causation. Headlines like 'Study finds coffee drinkers live longer' imply causation, but the study may only show correlation.
Reverse causation is another pitfall: A and B are correlated, and it is tempting to conclude that A causes B when in fact B causes A. For example, successful people read more books, but does reading cause success, or do already-successful people have more leisure time to read?
Frequently Asked Questions
Does correlation ever imply causation?
Correlation alone never proves causation. However, strong correlation combined with a plausible mechanism, dose-response relationship, temporal ordering (cause precedes effect), and consistency across studies can build a strong case for causation.
How do you prove causation?
The gold standard is a randomized controlled trial (RCT): randomly assign subjects to treatment and control groups, then compare outcomes between the groups. Random assignment ensures that confounding variables, measured or not, are balanced between groups on average, so a systematic difference in outcomes can be attributed to the treatment.
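Why random assignment matters can be illustrated with a simulation. The sketch below is a toy model under stated assumptions: a hypothetical drug with an assumed true effect of −10 mmHg, made-up baseline blood pressures, and an observational scenario in which people with higher baseline pressure are more likely to take the drug. None of the numbers come from a real study.

```python
import random
import statistics

TRUE_EFFECT = -10.0  # assumed change in blood pressure (mmHg) caused by the drug


def outcome(baseline, treated, rng):
    """Blood pressure = baseline + drug effect (if treated) + measurement noise."""
    return baseline + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0, 5)


def mean_difference(records):
    """Mean outcome of treated subjects minus mean outcome of untreated subjects."""
    treated = [bp for took_drug, bp in records if took_drug]
    control = [bp for took_drug, bp in records if not took_drug]
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(42)
baselines = [rng.gauss(140, 15) for _ in range(10_000)]

# Observational data: people with high baseline pressure are more likely to
# take the drug, so baseline pressure confounds the comparison.
observational = []
for b in baselines:
    took_drug = rng.random() < (0.8 if b > 140 else 0.2)
    observational.append((took_drug, outcome(b, took_drug, rng)))

# Randomized experiment: a coin flip decides treatment, independent of baseline.
randomized = []
for b in baselines:
    took_drug = rng.random() < 0.5
    randomized.append((took_drug, outcome(b, took_drug, rng)))

observational_estimate = mean_difference(observational)
randomized_estimate = mean_difference(randomized)

print(f"true effect:            {TRUE_EFFECT:+.1f}")
print(f"observational estimate: {observational_estimate:+.1f}")
print(f"randomized estimate:    {randomized_estimate:+.1f}")
```

In this toy model the observational comparison gets even the sign wrong: treated people started out with higher blood pressure, so the drug appears to raise it. The randomized comparison lands close to the assumed true effect because the coin flip breaks the link between baseline pressure and treatment.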
What is a confounding variable?
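A confounding variable (confounder) is a third variable that influences both the supposed cause and the supposed effect, creating a spurious or distorted correlation between them. For example, age can confound the relationship between exercise and heart health: older people tend to exercise less and also face a higher risk of heart disease regardless of how much they exercise.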
