Correlation
The degree to which two variables are associated. For example, height and weight have a moderately strong positive correlation.
See also
Correlation coefficient, positively associated data, negatively associated data
Key Formula
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
Where:
- r = Pearson correlation coefficient, a value between −1 and 1
- x, y = The two variables being compared
- n = The number of data pairs
- ∑xy = The sum of the products of each paired value
- ∑x², ∑y² = The sums of the squared values of each variable
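The formula translates almost term for term into code. The following is a minimal Python sketch, not a reference implementation; the function name pearson_r and the sample data are illustrative assumptions that do not come from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient via the sums-based formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of paired products
    sum_x2 = sum(x * x for x in xs)               # sum of squared x values
    sum_y2 = sum(y * y for y in ys)               # sum of squared y values
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Illustrative data (an assumption, not from the text): perfectly linear pairs.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # prints 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # prints -1.0
```

The guard on a zero denominator reflects that r is undefined when one variable never changes, since the formula would then divide by zero.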
Worked Example
Problem: Five students report hours studied per week and their test scores. Determine the direction and strength of the correlation.
Hours (x): 1, 2, 3, 4, 5
Scores (y): 50, 55, 65, 70, 80
Step 1: Compute the needed sums. Here n = 5.
$$\sum x = 15, \quad \sum y = 320, \quad \sum xy = 1035$$
Step 2: Compute the sums of squares.
$$\sum x^2 = 55, \quad \sum y^2 = 21050$$
Step 3: Substitute into the formula for r.
$$r = \frac{5(1035) - (15)(320)}{\sqrt{[5(55) - 15^2][5(21050) - 320^2]}}$$
Step 4: Simplify the numerator.
$$5(1035) - (15)(320) = 5175 - 4800 = 375$$
Step 5: Simplify the denominator.
$$\sqrt{(275 - 225)(105250 - 102400)} = \sqrt{(50)(2850)} = \sqrt{142500} \approx 377.5$$
Step 6: Divide to find r.
$$r = \frac{375}{377.5} \approx 0.993$$
Answer: r ≈ 0.99, indicating a very strong positive correlation. As study hours increase, test scores tend to increase as well.
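Hand arithmetic with this many sums is easy to get wrong, so it can help to recompute each intermediate quantity by machine. The sketch below uses the hours and scores from the problem statement; the variable names are illustrative, and the script simply prints each quantity so it can be compared with the corresponding step.

```python
from math import sqrt

hours = [1, 2, 3, 4, 5]          # x, from the problem statement
scores = [50, 55, 65, 70, 80]    # y, from the problem statement

n = len(hours)
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

# Numerator and denominator of the formula for r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("sum x, sum y, sum xy:", sum_x, sum_y, sum_xy)
print("sum x^2, sum y^2:", sum_x2, sum_y2)
print("numerator:", numerator)
print("denominator:", round(denominator, 1))
print("r:", round(r, 3))
```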
Frequently Asked Questions
Does correlation mean causation?
No. Correlation tells you that two variables move together, but it does not prove that one causes the other. A third hidden variable (called a confounding variable) could be driving both. For example, ice cream sales and drowning rates are positively correlated, but both are caused by hot weather — ice cream does not cause drowning.
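A small simulation can make the confounding idea concrete. The sketch below is a toy model under stated assumptions: the temperature range, the coefficients, and the noise levels are all invented for illustration, and by construction neither ice cream sales nor drownings affects the other. Only temperature feeds into both.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the sums-based formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical model: temperature drives both quantities independently.
temps = [random.uniform(10, 35) for _ in range(2000)]
ice_cream = [20 * t + random.gauss(0, 50) for t in temps]
drownings = [0.5 * t + random.gauss(0, 2) for t in temps]

r_all = pearson_r(ice_cream, drownings)

# Hold the confounder roughly fixed: keep only days between 24 and 26 degrees.
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 24 <= t <= 26]
r_band = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"all days:          r = {r_all:.2f}")
print(f"24-26 degree days: r = {r_band:.2f} ({len(band)} days)")
```

Across all days the two series are strongly correlated, but once temperature is held nearly constant the correlation largely disappears, which is the pattern a confounding variable would be expected to produce.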
What do the values of the correlation coefficient mean?
The coefficient r ranges from −1 to 1. A value of 1 means a perfect positive linear relationship, −1 means a perfect negative linear relationship, and 0 means no linear relationship. As a rule of thumb, |r| above 0.7 is considered strong, between 0.4 and 0.7 is moderate, and below 0.4 is weak, although the exact cutoffs vary by field.
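The rule of thumb above can be written as a small helper. This is only a sketch: the function name describe_r is hypothetical, the cutoffs 0.4 and 0.7 are the ones quoted in the answer, and placing a value of exactly 0.4 or 0.7 in the "moderate" band is an assumption made here.

```python
def describe_r(r):
    """Label r using the rule-of-thumb cutoffs 0.4 and 0.7."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size > 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

for value in (0.95, -0.55, 0.2, 0):
    print(value, "->", describe_r(value))
# 0.95 -> strong positive
# -0.55 -> moderate negative
# 0.2 -> weak positive
# 0 -> no linear relationship
```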
Correlation vs. Causation
Correlation describes a statistical association between two variables — they tend to change together. Causation means one variable directly produces a change in the other. You can observe correlation from data alone, but establishing causation typically requires a controlled experiment. Two variables can be strongly correlated without either one causing the other.
Why It Matters
Correlation is one of the most widely used tools in data analysis. Scientists use it to identify relationships between variables, such as linking exercise frequency to heart health. In everyday life, understanding correlation helps you evaluate claims in news headlines and avoid jumping to conclusions about cause and effect.
Common Mistakes
Mistake: Assuming that a strong correlation proves one variable causes the other.
Correction: Correlation only measures association. A lurking or confounding variable may explain the relationship. Always ask whether a controlled experiment supports the causal claim.
Mistake: Thinking r = 0 means the variables are completely unrelated.
Correction: An r value of 0 means there is no *linear* relationship. The variables could still have a strong nonlinear relationship, such as a curved or U-shaped pattern. Always look at a scatter plot in addition to computing r.
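A short example illustrates the second correction. In the sketch below the data are invented for illustration: y is completely determined by x through y = x², yet the symmetric U shape makes the linear correlation come out to zero.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the sums-based formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]   # U-shaped: y depends entirely on x

r = pearson_r(xs, ys)
print(r)   # prints 0.0
```

The positive and negative halves of the curve cancel in the numerator of the formula, which is why a scatter plot reveals a relationship that r alone misses.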
Related Terms
- Correlation Coefficient — The numerical value that quantifies correlation
- Variable — The quantities being compared in correlation
- Positively Associated Data — Data where both variables increase together
- Negatively Associated Data — Data where one variable decreases as the other increases
- Scatter Plot — Graph used to visually display correlation
- Line of Best Fit — Line that models the trend in correlated data
- Regression — Method for predicting one variable from another
