
Correlation

The degree to which two variables are associated. For example, height and weight have a moderately strong positive correlation.

 

 

See also

Correlation coefficient, positively associated data, negatively associated data

Key Formula

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$$
Where:
  • $r$ = Pearson correlation coefficient, a value between −1 and 1
  • $x, y$ = The two variables being compared
  • $n$ = The number of data pairs
  • $\sum xy$ = The sum of the products of each paired value
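
As a rough illustration, the formula can be transcribed almost term by term into Python. The function name `pearson_r` and the three-point sample data are hypothetical, chosen only for this sketch. It assumes two equal-length lists of numbers in which neither variable is constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from raw sums (illustrative sketch)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of paired products
    sum_x2 = sum(x * x for x in xs)               # sum of squared x values
    sum_y2 = sum(y * y for y in ys)               # sum of squared y values

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        # Happens when every x (or every y) is the same value.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Made-up data lying exactly on the line y = 2x.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
```

Points that fall exactly on a rising line give r = 1, which matches the upper end of the range stated above.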

Worked Example

Problem: Five students report hours studied per week and their test scores. Determine the direction and strength of the correlation.
Hours (x): 1, 2, 3, 4, 5
Scores (y): 50, 55, 65, 70, 80
Step 1: Compute the needed sums. Here n = 5.
$$\sum x = 15, \quad \sum y = 320, \quad \sum xy = 50 + 110 + 195 + 280 + 400 = 1035$$
Step 2: Compute the sums of squares.
$$\sum x^2 = 55, \quad \sum y^2 = 2500 + 3025 + 4225 + 4900 + 6400 = 21050$$
Step 3: Substitute into the formula for r.
$$r = \frac{5(1035) - (15)(320)}{\sqrt{[5(55) - 15^2][5(21050) - 320^2]}}$$
Step 4: Simplify the numerator.
$$5(1035) - (15)(320) = 5175 - 4800 = 375$$
Step 5: Simplify the denominator.
$$\sqrt{(275 - 225)(105250 - 102400)} = \sqrt{(50)(2850)} = \sqrt{142500} \approx 377.5$$
Step 6: Divide to find r.
$$r = \frac{375}{377.5} \approx 0.993$$
Answer: r ≈ 0.99, indicating a very strong positive correlation. As study hours increase, test scores tend to increase as well.
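
One way to check a hand calculation like this is to recompute r from the same five data pairs using the mean-centered form, r = S_xy / sqrt(S_xx · S_yy). That form is algebraically equivalent to the raw-sums formula but is not the one shown on this page. The variable names below are illustrative only.

```python
import math

hours = [1, 2, 3, 4, 5]
scores = [50, 55, 65, 70, 80]

n = len(hours)
mean_x = sum(hours) / n    # 3.0
mean_y = sum(scores) / n   # 64.0

# Sums of products and squares of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
s_xx = sum((x - mean_x) ** 2 for x in hours)
s_yy = sum((y - mean_y) ** 2 for y in scores)

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.993
```

Because the deviations are small numbers (S_xy = 75, S_xx = 10, S_yy = 570), this form is also easier to verify by hand than the raw sums.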

Frequently Asked Questions

Does correlation mean causation?
No. Correlation tells you that two variables tend to move together, but it does not prove that one causes the other. A third hidden variable (called a confounding variable) could be driving both. For example, ice cream sales and drowning rates are positively correlated, but both tend to rise in hot weather, when more people buy ice cream and more people swim. Ice cream does not cause drowning.
What do the values of the correlation coefficient mean?
The coefficient r ranges from −1 to 1. A value of 1 means a perfect positive linear relationship, −1 means a perfect negative linear relationship, and 0 means no linear relationship. As a rough rule of thumb, |r| above 0.7 is considered strong, between 0.4 and 0.7 is moderate, and below 0.4 is weak. These cutoffs are conventions rather than fixed rules, and they vary by field.
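
The rule of thumb above can be written as a small helper. This is only a sketch: the function name `describe_r` is hypothetical, and treating exactly 0.4 and exactly 0.7 as "moderate" is an assumption, since the cutoffs are stated only loosely.

```python
def describe_r(r):
    """Label a correlation coefficient using the 0.4 / 0.7 rule of thumb."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear relationship"

    size = abs(r)
    if size > 0.7:
        strength = "strong"
    elif size >= 0.4:          # assumption: boundary values count as moderate
        strength = "moderate"
    else:
        strength = "weak"

    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_r(0.85))   # strong positive
print(describe_r(-0.5))   # moderate negative
print(describe_r(0.1))    # weak positive
```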

Correlation vs. Causation

Correlation describes a statistical association between two variables — they tend to change together. Causation means one variable directly produces a change in the other. You can observe correlation from data alone, but establishing causation typically requires a controlled experiment. Two variables can be strongly correlated without either one causing the other.
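
A small simulation can make the distinction concrete. The sketch below uses entirely synthetic numbers, not real measurements: a hidden variable (temperature) drives two outcomes that never influence each other. The coefficients, noise levels, and helper names are all assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation via deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, zs):
    """What remains of ys after subtracting a least-squares line fitted on zs."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for z, y in zip(zs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: both outcomes depend on temperature, not on each other.
temperature = [rng.uniform(10, 35) for _ in range(1000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson_r(ice_cream, drownings)
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(drownings, temperature))

print("raw correlation:", round(raw_r, 2))
print("after removing temperature:", round(adjusted_r, 2))
```

In this setup the raw correlation is strongly positive even though neither outcome affects the other, and it falls to roughly zero once the linear effect of temperature is removed from both. With real data the confounder is usually unmeasured, which is why a controlled experiment is the more reliable way to establish causation.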

Why It Matters

Correlation is one of the most widely used tools in data analysis. Scientists use it to identify relationships between variables, such as linking exercise frequency to heart health. In everyday life, understanding correlation helps you evaluate claims in news headlines and avoid jumping to conclusions about cause and effect.

Common Mistakes

Mistake: Assuming that a strong correlation proves one variable causes the other.
Correction: Correlation only measures association. A lurking or confounding variable may explain the relationship. Always ask whether a controlled experiment supports the causal claim.
Mistake: Thinking r = 0 means the variables are completely unrelated.
Correction: An r value of 0 means there is no *linear* relationship. The variables could still have a strong nonlinear relationship, such as a curved or U-shaped pattern. Always look at a scatter plot in addition to computing r.
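
A tiny made-up data set shows the second mistake in action. Here y is completely determined by x through y = x², yet the coefficient comes out to exactly 0. The data and variable names are illustrative only.

```python
import math

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # 4, 1, 0, 1, 4: a perfect U-shaped pattern

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(r)  # 0.0
```

The positive and negative halves of the U cancel in the sum of products, so r reports no linear trend even though the relationship is exact.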

Related Terms