
Scatterplot

A graph of paired data in which the data values are plotted as (x, y) points.

 

[Figure: scatterplot with x and y axes showing approximately 12 data points scattered in a loosely upward-trending pattern.]
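To make the definition concrete, here is a minimal sketch that renders paired data as a text grid, one `*` per (x, y) point. The function name `ascii_scatter`, the grid dimensions, and the sample points are illustrative assumptions rather than anything from this entry; in practice a plotting library would normally be used instead.

```python
def ascii_scatter(points, width=21, height=11):
    """Render (x, y) pairs as a text grid with one '*' per point.

    Hypothetical helper for illustration only. Points that land in the
    same grid cell are drawn as a single '*'.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    # Fall back to a span of 1 so identical values do not divide by zero.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # Row 0 is printed first, so flip vertically to put large y on top.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)


# Made-up sample points with a loosely upward trend.
sample = [(1, 2), (2, 3), (3, 3), (4, 5), (5, 6)]
print(ascii_scatter(sample))
```

Each point is placed independently and nothing is drawn between points, which is the defining feature of a scatterplot.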

 

 

See also

Least-squares regression equation, least-squares regression line, regression, linear fit

Key Formula

$$r = \frac{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\displaystyle\sum_{i=1}^{n}(x_i - \bar{x})^2 \;\cdot\; \displaystyle\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
Where:
  • $r$ = Correlation coefficient, measuring the strength and direction of the linear relationship (ranges from −1 to 1)
  • $n$ = Number of data pairs
  • $x_i$ = The x-value of the i-th data point
  • $y_i$ = The y-value of the i-th data point
  • $\bar{x}$ = Mean of all x-values
  • $\bar{y}$ = Mean of all y-values
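The formula translates almost term for term into code. The sketch below is one possible implementation using only the Python standard library; the function name `pearson_r` and the intermediate names `s_xy`, `s_xx`, and `s_yy` are labels chosen here for the three sums, not notation from this entry.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (illustrative sketch)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator: sum of products of deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)

    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    return s_xy / sqrt(s_xx * s_yy)
```

Perfectly linear data such as `pearson_r([1, 2, 3], [2, 4, 6])` should come out at 1 (up to floating-point rounding), and reversing the y-values should give −1.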

Worked Example

Problem: A teacher records the number of hours five students studied and their test scores. Plot the data as a scatterplot and describe the association. Student A: (1, 50), Student B: (2, 60), Student C: (3, 65), Student D: (4, 80), Student E: (5, 90)
Step 1: Identify the variables. Hours studied is the independent variable (x-axis) and test score is the dependent variable (y-axis).
$$x = \text{hours studied}, \quad y = \text{test score}$$
Step 2: Set up the axes. The x-axis runs from 0 to 6. The y-axis runs from 40 to 100.
Step 3: Plot each ordered pair. Place a dot at (1, 50), (2, 60), (3, 65), (4, 80), and (5, 90) on the coordinate plane.
Step 4: Describe the association. As hours increase, scores also increase. The points roughly follow an upward line, indicating a strong positive linear association.
Step 5: Compute the correlation coefficient to quantify the relationship. The means are:
$$\bar{x} = \frac{1+2+3+4+5}{5} = 3, \quad \bar{y} = \frac{50+60+65+80+90}{5} = 69$$
Answer: The scatterplot shows a strong positive linear association between hours studied and test score. Calculating further yields r ≈ 0.99, confirming a nearly perfect positive linear relationship.
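The arithmetic behind "calculating further" in Step 5 can be spelled out as follows. This is a sketch of the remaining steps, with the three sums named `s_xy`, `s_xx`, and `s_yy` for convenience (those names are introduced here and are not part of the original example).

```python
from math import sqrt

hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 80, 90]

n = len(hours)
x_bar = sum(hours) / n    # 3.0
y_bar = sum(scores) / n   # 69.0

# Deviations of each value from its mean.
dx = [x - x_bar for x in hours]    # -2, -1, 0, 1, 2
dy = [y - y_bar for y in scores]   # -19, -9, -4, 11, 21

s_xy = sum(a * b for a, b in zip(dx, dy))  # 38 + 9 + 0 + 11 + 42 = 100
s_xx = sum(a * a for a in dx)              # 4 + 1 + 0 + 1 + 4 = 10
s_yy = sum(b * b for b in dy)              # 361 + 81 + 16 + 121 + 441 = 1020

r = s_xy / sqrt(s_xx * s_yy)               # 100 / sqrt(10200)
print(f"r = {r:.2f}")                      # r = 0.99
```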

Another Example

This example differs from the first by showing data with no apparent association, illustrating that not every scatterplot reveals a trend. It reinforces the idea that correlation near zero means the variables lack a linear relationship.

Problem: A researcher records the age (in years) and daily screen time (in hours) for six people: (10, 4), (20, 5), (30, 3), (40, 2), (50, 6), (60, 3). Plot a scatterplot and describe what you see.
Step 1: Assign age to the x-axis and screen time to the y-axis. The x-axis runs from 0 to 70, and the y-axis from 0 to 7.
Step 2: Plot the six points: (10, 4), (20, 5), (30, 3), (40, 2), (50, 6), (60, 3).
Step 3: Examine the pattern. The points do not follow any clear upward or downward trend. They appear scattered without a consistent direction.
Step 4: Compute the means and correlation coefficient. The means are:
$$\bar{x} = \frac{10+20+30+40+50+60}{6} = 35, \quad \bar{y} = \frac{4+5+3+2+6+3}{6} \approx 3.83$$
Step 5: Substitute into the formula. The numerator (the sum of the products of deviations) is −15, and the denominator is √(1750 × 10.83) ≈ 137.7, so the correlation coefficient is approximately −0.11, a weak value close to zero.
$$r \approx \frac{-15}{137.7} \approx -0.11$$
Answer: The scatterplot shows no clear linear association between age and daily screen time. The correlation coefficient r ≈ −0.11 indicates at most a very weak linear relationship.
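As a check on the arithmetic in Step 5, the sums for these six pairs can be computed directly. In this sketch the names `s_xy`, `s_xx`, and `s_yy` are introduced here for the three sums in the formula. They come out to −15, 1750, and about 10.83, which gives r ≈ −0.11: negative in sign, but weak enough that no linear trend is visible in the plot.

```python
from math import sqrt

ages = [10, 20, 30, 40, 50, 60]
screen_hours = [4, 5, 3, 2, 6, 3]

n = len(ages)
x_bar = sum(ages) / n           # 35.0
y_bar = sum(screen_hours) / n   # 23 / 6, about 3.83

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, screen_hours))  # about -15
s_xx = sum((x - x_bar) ** 2 for x in ages)                                 # 1750.0
s_yy = sum((y - y_bar) ** 2 for y in screen_hours)                         # about 10.83

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.2f}")  # r = -0.11
```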

Frequently Asked Questions

What is the difference between a scatterplot and a line graph?
A scatterplot displays individual data points without connecting them, showing the general relationship between two variables. A line graph connects consecutive data points with straight segments, typically used when the x-variable represents a sequence like time. Scatterplots are better for exploring whether a relationship exists, while line graphs emphasize change over an ordered variable.
How do you tell if a scatterplot shows a positive, negative, or no correlation?
If the points trend upward from left to right, the scatterplot shows a positive correlation—as x increases, y tends to increase. If the points trend downward, there is a negative correlation. If the points show no discernible upward or downward pattern, there is little to no correlation. The correlation coefficient r quantifies this: values near +1 indicate strong positive, near −1 indicate strong negative, and near 0 indicate no linear relationship.
When do you use a scatterplot instead of a bar chart?
Use a scatterplot when both variables are quantitative (numerical) and you want to explore the relationship between them. Use a bar chart when one variable is categorical (like names of countries or types of fruit). Scatterplots are designed for continuous numerical data, while bar charts compare counts or values across distinct categories.
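The rule of thumb for reading r can be sketched as a small helper. The function name `describe_correlation` and the cutoffs 0.3 and 0.7 are assumptions made for this illustration; there is no single standard threshold, and different textbooks draw the lines differently.

```python
def describe_correlation(r, weak_cutoff=0.3, strong_cutoff=0.7):
    """Turn a correlation coefficient into a short verbal description.

    The cutoffs are hypothetical rule-of-thumb values, not a standard.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < weak_cutoff:
        return "little to no linear correlation"
    strength = "strong" if size >= strong_cutoff else "moderate"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.99))   # strong positive correlation
print(describe_correlation(-0.5))   # moderate negative correlation
print(describe_correlation(0.05))   # little to no linear correlation
```

A value of r near 0 only rules out a linear pattern; the scatterplot itself should still be checked for curved relationships.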

Scatterplot vs. Line Graph

| Feature | Scatterplot | Line Graph |
|---|---|---|
| Definition | Plots individual (x, y) data points without connecting them | Plots data points and connects them with line segments |
| Data type | Two quantitative variables, not necessarily ordered | Typically one ordered variable (like time) on the x-axis |
| Purpose | Explore relationships, identify patterns, detect outliers | Show trends and changes over a continuous or sequential variable |
| Connecting points | Points are left unconnected (or a trend line is added separately) | Points are connected in order from left to right |
| Best for | Determining if two variables are correlated | Tracking how a single measurement changes over time |

Why It Matters

Scatterplots are one of the first tools you encounter in statistics courses and standardized tests (SAT, ACT, AP Statistics) for analyzing bivariate data. In science classes, you use scatterplots to identify experimental relationships—like whether increasing temperature affects reaction rate. Beyond school, scatterplots are fundamental in data science and research for detecting trends, spotting outliers, and deciding whether to fit a regression model.

Common Mistakes

Mistake: Connecting the dots in a scatterplot as if it were a line graph.
Correction: Leave the points unconnected. A scatterplot shows the overall pattern of the data, not a point-to-point path. If you want to show a trend, add a separate best-fit (regression) line instead.
Mistake: Placing the independent variable on the y-axis and the dependent variable on the x-axis.
Correction: The independent variable (the one you control or that serves as the predictor) belongs on the x-axis, and the dependent variable (the outcome you measure) belongs on the y-axis. Reversing them does not change r, but it does change the least-squares regression line and can mislead readers about which variable is being predicted.
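Both corrections can be illustrated with the hours-and-scores data from the worked example. The sketch below fits a least-squares line (the separate trend line to draw instead of connecting the dots) and then refits with the variables swapped to show that the two fits do not describe the same line. The helper name `least_squares_line` is an assumption for this example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar


hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 80, 90]

# Predict score from hours: the usual orientation.
slope, intercept = least_squares_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")   # score = 39.0 + 10.0 * hours

# Swap the roles: predict hours from score, then express that line
# as points of score per hour so the two slopes can be compared.
swapped_slope, _ = least_squares_line(scores, hours)
print(f"implied slope after swapping = {1 / swapped_slope:.1f}")  # 10.2
```

The two slopes differ (10.0 versus 10.2 points per hour) because least squares minimizes vertical distances, so the choice of which variable goes on the y-axis affects the fitted line.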

Related Terms