Correlation Coefficient — Definition, Formula & Examples
Correlation Coefficient
A number that measures the strength and direction of the linear correlation between two variables.
Correlation coefficients are written using the variable r, where r is between −1 and 1, inclusive.
The closer r is to 1 or −1, the less scattered the points are and the stronger the relationship. Only data whose scatterplot is a perfectly straight line can have r = −1 or r = 1. When r < 0 the data have a negative association, and when r > 0 the data have a positive association.
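r = (n∑xy − ∑x∑y) / √([n∑x² − (∑x)²] × [n∑y² − (∑y)²])
n = Number of paired data points
∑xy = Sum of the products of each paired x and y value
∑x = Sum of all x values
∑y = Sum of all y values
∑x² = Sum of the squared x values
∑y² = Sum of the squared y values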
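As a minimal sketch, these sums can be combined in Python using the standard computational formula for Pearson's r, which also needs n, ∑x², and ∑y². The function name `pearson_r` and the sample points are illustrative assumptions, not part of the original page. The two test data sets lie on perfectly straight lines, so they should give r = 1 and r = −1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation via the sum-based (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / denominator

# Points on a rising straight line and on a falling straight line
rising = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
falling = pearson_r([1, 2, 3, 4], [9, 7, 5, 3])
print(rising, falling)  # 1.0 -1.0
```

The explicit check for a zero denominator reflects that r is undefined when every x value (or every y value) is the same.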
Worked Example
Problem: Five students studied for a test. Their hours studied (x) and test scores (y) were: (1, 50), (2, 60), (3, 65), (4, 75), (5, 90). Find the correlation coefficient r.
Step 1: Find the needed sums. There are n = 5 data points, with ∑x = 15, ∑y = 340, ∑xy = 1115, ∑x² = 55, and ∑y² = 24050.
Step 7: Take the square root of the product of the two denominator terms, n∑x² − (∑x)² = 50 and n∑y² − (∑y)² = 4650, to finish the denominator.
√(50 × 4650) = √232500 ≈ 482.18
Step 8: Divide the numerator, n∑xy − ∑x∑y = 475, by the denominator to get r.
r = 475 / 482.18 ≈ 0.985
Answer: The correlation coefficient is r ≈ 0.985, indicating a very strong positive linear relationship between hours studied and test score.
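The arithmetic in this example can be re-checked with a short Python sketch. It assumes the standard sum-based formula for Pearson's r, and the variable names are illustrative rather than taken from the original page.

```python
from math import sqrt

# Data from the worked example: (hours studied, test score)
points = [(1, 50), (2, 60), (3, 65), (4, 75), (5, 90)]
n = len(points)

sum_x = sum(x for x, _ in points)        # 15
sum_y = sum(y for _, y in points)        # 340
sum_xy = sum(x * y for x, y in points)   # 1115
sum_x2 = sum(x * x for x, _ in points)   # 55
sum_y2 = sum(y * y for _, y in points)   # 24050

numerator = n * sum_xy - sum_x * sum_y   # 475
x_term = n * sum_x2 - sum_x ** 2         # 50
y_term = n * sum_y2 - sum_y ** 2         # 4650
denominator = sqrt(x_term * y_term)      # sqrt(232500), about 482.18
r = numerator / denominator

print(round(denominator, 2), round(r, 3))  # 482.18 0.985
```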
Frequently Asked Questions
What does a correlation coefficient of 0 mean?
An r value of 0 means there is no linear relationship between the two variables. The data points show no tendency to follow a straight-line pattern. However, there could still be a non-linear relationship (such as a curve), so r = 0 does not mean the variables are completely unrelated.
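To see how r can be 0 even when the variables are clearly related, consider points on the parabola y = x² with x values symmetric about 0. This is a small sketch with a made-up data set; the helper name `pearson_r` is an assumption for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]   # y is completely determined by x
r_curve = pearson_r(xs, ys)
print(r_curve)  # 0.0
```

Here y depends entirely on x, yet the positive and negative products cancel, so the linear measure reports no association.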
Does correlation mean causation?
No. A strong correlation coefficient tells you two variables move together in a predictable linear pattern, but it does not prove that one variable causes the other to change. A lurking third variable or pure coincidence could explain the association. Establishing causation requires a controlled experiment or additional evidence.
Correlation coefficient (r) vs. Coefficient of determination (r²)
The correlation coefficient r tells you the strength and direction of a linear relationship and ranges from −1 to 1. The coefficient of determination r² is simply r squared, so it ranges from 0 to 1. It tells you what fraction of the variation in y is explained by the linear relationship with x. For example, if r = 0.985, then r² ≈ 0.970, meaning about 97% of the variation in test scores is explained by hours studied. Note that r² loses the direction information—it is always non-negative.
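One way to check the "fraction of variation explained" reading is to fit a least-squares line to the worked-example data and compare 1 − SSE/SST with r². The sketch below makes two assumptions not stated in the text above: it uses the equivalent deviation-from-the-mean form of r, and the standard least-squares formulas for slope and intercept. All variable names are illustrative.

```python
from math import sqrt

xs = [1, 2, 3, 4, 5]          # hours studied
ys = [50, 60, 65, 75, 90]     # test scores
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squared deviations and cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)   # total variation in y (SST)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)

# Least-squares line: y = intercept + slope * x
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left unexplained by the line (SSE)
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - sse / syy

print(round(r ** 2, 3), round(explained_fraction, 3))  # 0.97 0.97
```

The two printed values agree, which is the sense in which r² measures the share of variation in y accounted for by the line.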
Why It Matters
The correlation coefficient is one of the most widely used statistics in science, business, and everyday data analysis. It gives you a quick, standardized way to judge whether two quantities—like advertising spending and revenue, or temperature and ice cream sales—are linearly related. Knowing r also helps you decide whether fitting a least-squares regression line to your data is meaningful or misleading.
Common Mistakes
Mistake: Assuming a high correlation means one variable causes the other.
Correction: Correlation measures association, not causation. Always consider lurking variables and the design of the study before drawing causal conclusions.
Mistake: Using the correlation coefficient to describe non-linear relationships.
Correction: The Pearson correlation coefficient only measures linear association. Data that follow a curved pattern can have r close to 0 even though the variables are strongly related. Always check the scatterplot first.