Correlation vs. Causation
Correlation vs. causation is the distinction between two variables that happen to move together (correlation) and one variable actually causing the other to change (causation). Just because two things are related does not mean one is responsible for the other.
Correlation describes a statistical association between two variables — as one changes, the other tends to change in a predictable way. Causation, by contrast, means that a change in one variable directly produces a change in another. Establishing causation requires more than observing a correlation; it typically demands a controlled experiment that isolates the effect of the explanatory variable while accounting for confounding variables. In observational studies, correlation alone can never prove causation because lurking variables may explain the observed relationship.
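To see why random assignment matters, here is a minimal simulation sketch. It is illustrative only: the variable names, effect sizes, and sample size are assumptions made up for the demonstration, not data from any study. A hidden confounder drives the outcome, and the treatment has no true effect at all. When units effectively select themselves into treatment, the treated group still looks different; when treatment is assigned by coin flip, the gap should shrink to roughly zero.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def outcome_gap(treated_flags, outcomes):
    """Mean outcome of treated units minus mean outcome of untreated units."""
    treated = [y for flag, y in zip(treated_flags, outcomes) if flag]
    untreated = [y for flag, y in zip(treated_flags, outcomes) if not flag]
    return mean(treated) - mean(untreated)


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical confounder (assumption: standard normal).
confounder = [rng.gauss(0, 1) for _ in range(n)]

# The outcome depends only on the confounder plus noise.
# The treatment has zero true effect by construction.
outcome = [2.0 * z + rng.gauss(0, 1) for z in confounder]

# Observational setting: units with a high confounder value are more
# likely to end up treated, so treatment is tangled up with the confounder.
observed_treatment = [z + rng.gauss(0, 1) > 0 for z in confounder]

# Experimental setting: treatment is assigned by a fair coin flip,
# which makes it independent of the confounder.
random_treatment = [rng.random() < 0.5 for _ in range(n)]

observational_gap = outcome_gap(observed_treatment, outcome)
experimental_gap = outcome_gap(random_treatment, outcome)

print(f"observational gap: {observational_gap:.2f}")
print(f"experimental gap:  {experimental_gap:.2f}")
```

Under these assumptions the observational gap should be clearly positive even though the treatment does nothing, while the experimental gap should sit near zero. Randomization does not remove the confounder; it only breaks the link between the confounder and who gets treated.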
Example
Problem: A researcher finds that cities with more ice cream sales also have higher rates of drowning. The correlation coefficient is r = 0.85. Can we conclude that ice cream sales cause drowning?
Step 1: Identify the association. There is a strong positive correlation between ice cream sales and drowning rates (r = 0.85).
Step 2: Ask whether the data come from a controlled experiment. They do not — this is an observational study. No variable was deliberately manipulated.
Step 3: Look for a lurking (confounding) variable. Both ice cream sales and drowning rates increase during hot summer months. Temperature is a plausible lurking variable that drives both.
Step 4: State the conclusion. Because this is an observational study with a clear confounding variable, we cannot conclude that ice cream sales cause drowning. The correlation is real, but the causal claim is not supported.
Answer: No. The strong correlation is likely explained by a lurking variable (temperature). Correlation does not imply causation.
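The reasoning in the steps above can be sketched with simulated data. Everything here is an assumption chosen for illustration (the temperature range, the slopes, the noise levels, and the helper names pearson_r and residuals); it is not the researcher's data. Temperature drives both quantities, and neither one influences the other. The sketch computes the raw correlation and then the correlation that remains after the linear effect of temperature is removed from each variable (a partial correlation).

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 2000

# Lurking variable: daily temperature in degrees Celsius (assumed range).
temperature = [rng.uniform(0, 35) for _ in range(n)]

# Both quantities depend on temperature plus independent noise.
# Ice cream sales never appear in the drowning formula, or vice versa.
ice_cream = [10 * t + rng.gauss(0, 40) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 0.8) for t in temperature]

raw_r = pearson_r(ice_cream, drownings)

# Partial correlation: correlate what remains of each variable
# once the linear effect of temperature has been taken out.
partial_r = pearson_r(
    residuals(ice_cream, temperature),
    residuals(drownings, temperature),
)

print(f"raw correlation:             {raw_r:.2f}")
print(f"controlling for temperature: {partial_r:.2f}")
```

With these settings the raw correlation should come out strongly positive, while the correlation after controlling for temperature should be close to zero. That pattern is what a lurking variable looks like in data. Note that controlling for one suspected confounder does not rule out others, so this kind of check supports Step 3 but is not a substitute for a controlled experiment.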
Why It Matters
Confusing correlation with causation is one of the most common errors in interpreting data, and it shows up everywhere — in news headlines, medical studies, and policy debates. A headline might claim that eating breakfast improves test scores, but without a controlled experiment, the link could be driven by other factors like household income. Understanding this distinction helps you critically evaluate statistical claims rather than accepting them at face value.
Common Mistakes
Mistake: Assuming a strong correlation coefficient automatically means one variable causes the other.
Correction: A value of r close to 1 or -1 only tells you the variables are strongly associated. Causation requires evidence from a well-designed experiment or, at minimum, a careful argument ruling out confounding variables.
Mistake: Thinking that 'correlation does not imply causation' means correlated variables are never causally related.
Correction: Sometimes a causal relationship does exist: smoking really does cause lung cancer. The point is that correlation alone is not sufficient proof. You need additional evidence, such as experimental data or consistent findings from well-designed studies that rule out confounding variables, to establish the causal link.
