Negatively Associated Data — Definition & Examples

Q: What is the difference between negatively associated data and positively associated data?

With negatively associated data, one variable tends to decrease as the other increases, producing a downward trend on a scatterplot and a negative correlation coefficient (r < 0).

Negatively Associated Data

A relationship in paired data in which the values of one variable tend to increase as the values of the other decrease, and vice versa. In a scatterplot, negatively associated data tend to follow a pattern from the upper left to the lower right. Negatively associated data have a negative correlation coefficient.

See also

Positively associated data

Key Formula

r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}

Where:

$r$ = Correlation coefficient; a value between −1 and 1. For negatively associated data, r < 0.
$n$ = Number of data pairs
$x$ = Values of the first (independent) variable
$y$ = Values of the second (dependent) variable
$\sum xy$ = Sum of the products of each paired x and y value
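
The formula translates almost term by term into code. The following Python function is a minimal sketch, not a reference implementation: the name `pearson_r` and the sample data are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r computed from the raw sums in the Key Formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # Happens when every x (or every y) is identical; r is undefined then.
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical data: y falls as x rises, so r should be negative.
r = pearson_r([1, 2, 3, 4], [8, 7, 5, 2])
print(r < 0)  # True
```

The guard on a zero denominator is a design choice of this sketch: when one variable never changes, the formula divides by zero and no correlation can be reported.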

Worked Example

Problem: A teacher records the number of hours students spent watching TV per week and their test scores. The data are: (2, 90), (4, 80), (6, 70), (8, 60), (10, 50). Show that these data are negatively associated by computing the correlation coefficient r.

Step 1: Count the data pairs and find the sum of the x values. Here n = 5.

\sum x = 2+4+6+8+10 = 30

Step 2: Find the sum of y values and the sum of the products xy.

\sum y = 90+80+70+60+50 = 350 \qquad \sum xy = 180+320+420+480+500 = 1900

Step 3: Compute the sums of squares.

\sum x^2 = 4+16+36+64+100 = 220 \qquad \sum y^2 = 8100+6400+4900+3600+2500 = 25500

Step 4: Substitute into the correlation coefficient formula.

r = \frac{5(1900) - (30)(350)}{\sqrt{[5(220) - 30^2][5(25500) - 350^2]}} = \frac{9500 - 10500}{\sqrt{[1100 - 900][127500 - 122500]}}

Step 5: Simplify the numerator and denominator to find r.

r = \frac{-1000}{\sqrt{(200)(5000)}} = \frac{-1000}{\sqrt{1000000}} = \frac{-1000}{1000} = -1

Answer: r = −1, which confirms a perfect negative association. As TV hours increase, test scores decrease in a perfectly linear pattern.
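
As a check on the arithmetic, the same five steps can be reproduced in a few lines of Python. This is an illustrative sketch; the variable names are not part of the original problem. It also confirms why r is exactly −1: the slope between every pair of consecutive points is the same, so all five points lie on one straight line.

```python
from math import sqrt

hours = [2, 4, 6, 8, 10]
scores = [90, 80, 70, 60, 50]
n = len(hours)

# Steps 1-3: the five sums used by the formula.
sum_x = sum(hours)                                  # 30
sum_y = sum(scores)                                 # 350
sum_xy = sum(x * y for x, y in zip(hours, scores))  # 1900
sum_x2 = sum(x * x for x in hours)                  # 220
sum_y2 = sum(y * y for y in scores)                 # 25500

# Steps 4-5: substitute and simplify.
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(r)  # -1.0

# Slope between each pair of consecutive points; identical slopes mean
# the points are collinear, which is what a perfect association requires.
pairs = list(zip(hours, scores))
slopes = [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]
print(slopes)  # [-5.0, -5.0, -5.0, -5.0]
```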

Another Example

Unlike the first example, this data set does not follow a perfectly linear pattern. It demonstrates that real-world negatively associated data often have r between −1 and 0 rather than exactly −1.

Problem: A store tracks the price of a product (in dollars) and the number of units sold over four months: (5, 40), (10, 35), (15, 20), (20, 25). Determine whether the data are negatively associated.

Step 1: Record the sums with n = 4.

\sum x = 50, \quad \sum y = 120, \quad \sum xy = 200+350+300+500 = 1350

Step 2: Compute sums of squares.

\sum x^2 = 25+100+225+400 = 750 \qquad \sum y^2 = 1600+1225+400+625 = 3850

Step 3: Substitute into the formula for r.

r = \frac{4(1350) - (50)(120)}{\sqrt{[4(750) - 2500][4(3850) - 14400]}} = \frac{5400 - 6000}{\sqrt{(500)(1000)}}

Step 4: Simplify to find r.

r = \frac{-600}{\sqrt{500000}} \approx \frac{-600}{707.1} \approx -0.849

Answer: r ≈ −0.849, indicating a strong (but not perfect) negative association between price and units sold.
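
The result can be cross-checked with the mean-centered form of the correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), which is algebraically equivalent to the Key Formula. The Python sketch below is an addition for verification only; its variable names are illustrative.

```python
from math import sqrt

prices = [5, 10, 15, 20]
units = [40, 35, 20, 25]

mean_x = sum(prices) / len(prices)  # 12.5
mean_y = sum(units) / len(units)    # 30.0

# Deviations of each value from its mean.
dx = [x - mean_x for x in prices]
dy = [y - mean_y for y in units]

# Sum of cross-products and sums of squared deviations.
s_xy = sum(a * b for a, b in zip(dx, dy))  # -150.0
s_xx = sum(a * a for a in dx)              # 125.0
s_yy = sum(b * b for b in dy)              # 250.0

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # -0.849
```

Both forms give the same value. The mean-centered version makes the sign easy to read: most products (x − x̄)(y − ȳ) are negative because above-average prices pair with below-average sales.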

Frequently Asked Questions

What is the difference between negatively associated data and positively associated data?

With negatively associated data, one variable tends to decrease as the other increases, producing a downward trend on a scatterplot and a negative correlation coefficient (r < 0). With positively associated data, both variables tend to increase together, producing an upward trend and a positive correlation coefficient (r > 0).

Does negative association mean one variable causes the other to decrease?

No. Negative association describes a pattern or trend, not causation. Two variables can move in opposite directions because of a third hidden variable or pure coincidence. You need a controlled experiment or additional evidence to establish that one variable actually causes the other to change.

What does it mean when r is close to 0 but still negative?

A value of r near 0 (such as −0.1) indicates a very weak negative association. The data points are widely scattered, and the downward trend is barely detectable. In practice, such weak associations may not be meaningful.

Negatively Associated Data vs. Positively Associated Data

	Negatively Associated Data	Positively Associated Data
Direction of trend	One variable increases while the other decreases	Both variables increase together
Scatterplot pattern	Downward slope, upper left to lower right	Upward slope, lower left to upper right
Correlation coefficient	r < 0 (between −1 and 0)	r > 0 (between 0 and 1)
Real-world example	More exercise → lower resting heart rate	More study hours → higher test scores

Why It Matters

Recognizing negative association is essential in statistics courses when you analyze scatterplots and compute correlation. It appears in science classes (e.g., altitude vs. air pressure), economics (price vs. demand), and health studies (exercise vs. body fat percentage). Understanding that a negative r value quantifies this inverse relationship helps you interpret data and make predictions using linear regression.
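
To illustrate the link to linear regression, the sketch below fits a least-squares line to the price and units-sold data from the second example. The slope and intercept formulas are the standard least-squares ones and do not appear in this article; the function name `predict` and the query price of 12 dollars are hypothetical.

```python
prices = [5, 10, 15, 20]
units = [40, 35, 20, 25]
n = len(prices)

sum_x = sum(prices)
sum_y = sum(units)
sum_xy = sum(x * y for x, y in zip(prices, units))
sum_x2 = sum(x * x for x in prices)

# Standard least-squares slope and intercept (assumed, not from the article).
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = sum_y / n - slope * (sum_x / n)

def predict(price):
    """Predicted units sold at a given price, using the fitted line."""
    return intercept + slope * price

print(slope < 0)               # True: negative association gives a negative slope
print(round(predict(12), 1))   # 30.6
```

The slope shares its numerator with r, which is why the line of best fit slopes downward whenever r is negative.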

Common Mistakes

Mistake: Assuming negative association means no relationship between the variables.

Correction: Negative association is a definite relationship: the variables tend to move in opposite directions. 'No relationship' corresponds to r ≈ 0, where there is no clear linear trend at all.

Mistake: Confusing negative association with causation.

Correction: A negative correlation coefficient tells you two variables tend to move in opposite directions, but it does not prove that changes in one variable cause changes in the other. Always consider lurking variables and study design before inferring causation.

Related Terms

Positively Associated Data — Opposite trend where both variables increase together
Correlation Coefficient — Numerical measure of direction and strength
Scatterplot — Graph used to visually display association
Paired Data — Data format required to identify association
Variable — Quantity that changes across observations
Line of Best Fit — Has negative slope for negatively associated data
Linear Regression — Method for modeling the inverse relationship