Mathwords

Correlation Coefficient — Definition, Formula & Examples

Correlation Coefficient

A number that measures the strength and direction of the linear correlation between two variables. Correlation coefficients are written using the variable r, where r is between –1 and 1, inclusive. The closer r is to 1 or –1, the less scattered the points are and the stronger the relationship. Only data whose scatterplot is a perfectly straight line can have r = –1 or r = 1. When r < 0 the data have a negative association, and when r > 0 the data have a positive association.

 

 

See also

Least squares regression line

Key Formula

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Where:
  • $r$ = Pearson correlation coefficient
  • $n$ = Number of data points
  • $x$ = Values of the first variable
  • $y$ = Values of the second variable
  • $\sum xy$ = Sum of the products of each paired x and y value
  • $\sum x$ = Sum of all x values
  • $\sum y$ = Sum of all y values
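
The formula above translates almost term for term into code. The following is a minimal Python sketch, not a reference implementation. The function name `pearson_r` and the decision to raise an error when either variable has no variation are assumptions made for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient using the sums form of the formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # Product of the two bracketed factors under the square root.
    denom_sq = (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    if denom_sq <= 0:
        # Assumption for this sketch: treat "no variation" as an error.
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / math.sqrt(denom_sq)
```

With very large data values, this sums form can lose precision in floating-point arithmetic. A version based on deviations from the means is often preferred in that case.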

Worked Example

Problem: Five students studied for a test. Their hours studied (x) and test scores (y) were: (1, 50), (2, 60), (3, 65), (4, 75), (5, 90). Find the correlation coefficient r.
Step 1: Find the sum of x values. There are n = 5 data points.
$$\sum x = 1+2+3+4+5 = 15$$
Step 2: Find the sum of y values.
$$\sum y = 50+60+65+75+90 = 340$$
Step 3: Find the sum of the products xy.
$$\sum xy = (1)(50)+(2)(60)+(3)(65)+(4)(75)+(5)(90) = 50+120+195+300+450 = 1115$$
Step 4: Find the sum of x² and y².
$$\sum x^2 = 1+4+9+16+25 = 55 \qquad \sum y^2 = 2500+3600+4225+5625+8100 = 24050$$
Step 5: Substitute into the formula. Compute the numerator first.
$$n\sum xy - \left(\sum x\right)\left(\sum y\right) = 5(1115) - (15)(340) = 5575 - 5100 = 475$$
Step 6: Compute each factor in the denominator.
$$n\sum x^2 - \left(\sum x\right)^2 = 5(55) - 225 = 50 \qquad n\sum y^2 - \left(\sum y\right)^2 = 5(24050) - 115600 = 4650$$
Step 7: Take the square root of their product to finish the denominator.
$$\sqrt{50 \times 4650} = \sqrt{232500} \approx 482.18$$
Step 8: Divide to get r.
$$r = \frac{475}{482.18} \approx 0.985$$
Answer: The correlation coefficient is r ≈ 0.985, indicating a very strong positive linear relationship between hours studied and test score.
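
The arithmetic in Steps 1 through 8 can be checked with a short script. This is a sketch for verification only; the variable names (`hours`, `scores`, `x_factor`, `y_factor`) are chosen here for illustration and do not come from the original example.

```python
import math

hours = [1, 2, 3, 4, 5]          # x values
scores = [50, 60, 65, 75, 90]    # y values
n = len(hours)

sum_x = sum(hours)                                   # Step 1
sum_y = sum(scores)                                  # Step 2
sum_xy = sum(x * y for x, y in zip(hours, scores))   # Step 3
sum_x2 = sum(x * x for x in hours)                   # Step 4
sum_y2 = sum(y * y for y in scores)

numerator = n * sum_xy - sum_x * sum_y               # Step 5
x_factor = n * sum_x2 - sum_x ** 2                   # Step 6
y_factor = n * sum_y2 - sum_y ** 2
denominator = math.sqrt(x_factor * y_factor)         # Step 7
r = numerator / denominator                          # Step 8

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)   # 15 340 1115 55 24050
print(numerator, x_factor, y_factor)          # 475 50 4650
print(round(denominator, 2), round(r, 3))     # 482.18 0.985
```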

Frequently Asked Questions

What does a correlation coefficient of 0 mean?
An r value of 0 means there is no linear relationship between the two variables. The data points show no tendency to follow a straight-line pattern. However, there could still be a non-linear relationship (such as a curve), so r = 0 does not mean the variables are completely unrelated.
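
A small illustration of this point, as a sketch: the data below are invented for the example and follow y = x² exactly, so y is completely determined by x, yet the sums formula gives r = 0. The helper name `pearson_r` is an assumption used only here.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the sums formula (illustrative helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Made-up data lying exactly on the parabola y = x^2.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

The positive and negative products of x and y cancel, so the numerator is zero even though the points follow a clear curve.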
Does correlation mean causation?
No. A strong correlation coefficient tells you two variables move together in a predictable linear pattern, but it does not prove that one variable causes the other to change. A lurking third variable or pure coincidence could explain the association. Establishing causation requires a controlled experiment or additional evidence.

Correlation coefficient (r) vs. Coefficient of determination (r²)

The correlation coefficient r tells you the strength and direction of a linear relationship and ranges from −1 to 1. The coefficient of determination r² is simply r squared, so it ranges from 0 to 1. It tells you what fraction of the variation in y is explained by the linear relationship with x. For example, if r = 0.985, then r² ≈ 0.970, meaning about 97% of the variation in test scores is explained by hours studied. Note that r² loses the direction information—it is always non-negative.
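
The "fraction of variation explained" reading can be checked numerically on the study-hours data. The sketch below assumes the standard least-squares slope and intercept formulas, which are not stated in this entry, and compares r² with 1 minus the ratio of residual variation to total variation. All variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]
ys = [50, 60, 65, 75, 90]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

numerator = n * sum_xy - sum_x * sum_y
x_factor = n * sum_x2 - sum_x ** 2
y_factor = n * sum_y2 - sum_y ** 2

# r squared, taken directly from the correlation formula.
r_squared = numerator ** 2 / (x_factor * y_factor)

# Least-squares line y = intercept + slope * x (standard formulas, assumed).
slope = numerator / x_factor
intercept = sum_y / n - slope * sum_x / n

# Total variation in y, and the variation left over after fitting the line.
mean_y = sum_y / n
ss_total = sum((y - mean_y) ** 2 for y in ys)
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_residual / ss_total

print(round(r_squared, 4), round(explained, 4))  # 0.9704 0.9704
```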

Why It Matters

The correlation coefficient is one of the most widely used statistics in science, business, and everyday data analysis. It gives you a quick, standardized way to judge whether two quantities—like advertising spending and revenue, or temperature and ice cream sales—are linearly related. Knowing r also helps you decide whether fitting a least-squares regression line to your data is meaningful or misleading.

Common Mistakes

Mistake: Assuming a high correlation means one variable causes the other.
Correction: Correlation measures association, not causation. Always consider lurking variables and the design of the study before drawing causal conclusions.
Mistake: Using the correlation coefficient to describe non-linear relationships.
Correction: The Pearson correlation coefficient only measures linear association. Data that follow a curved pattern can have r close to 0 even though the variables are strongly related. Always check the scatterplot first.

Related Terms