Scatterplot
A graph of paired data in which the data values are plotted as (x, y) points.

See also
Least-squares regression equation, least-squares regression line, regression, linear fit
Key Formula
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \cdot \sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
Where:
- r = Correlation coefficient, measuring the strength and direction of the linear relationship (ranges from −1 to 1)
- n = Number of data pairs
- $x_i$ = The x-value of the i-th data point
- $y_i$ = The y-value of the i-th data point
- $\bar{x}$ = Mean of all x-values
- $\bar{y}$ = Mean of all y-values
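As a rough illustration, the formula can be translated term by term into Python. This is a minimal sketch rather than a reference implementation: the function name `pearson_r` and the sample lists are assumptions made for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for paired data (hypothetical helper name)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the x-values
    y_bar = sum(ys) / n  # mean of the y-values
    # Numerator: sum of the products of paired deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")
    return s_xy / sqrt(s_xx * s_yy)


# Made-up data lying exactly on a line, used only to exercise the function
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0
```

Spelling out the three sums keeps the code aligned with the terms of the formula; in practice a statistics library would normally do this work.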
Worked Example
Problem: A teacher records the number of hours five students studied and their test scores. Plot the data as a scatterplot and describe the association.
Student A: (1, 50), Student B: (2, 60), Student C: (3, 65), Student D: (4, 80), Student E: (5, 90)
Step 1: Identify the variables. Hours studied is the independent variable (x-axis) and test score is the dependent variable (y-axis).
x = hours studied, y = test score
Step 2: Set up the axes. The x-axis runs from 0 to 6. The y-axis runs from 40 to 100.
Step 3: Plot each ordered pair. Place a dot at (1, 50), (2, 60), (3, 65), (4, 80), and (5, 90) on the coordinate plane.
Step 4: Describe the association. As hours increase, scores also increase. The points roughly follow an upward line, indicating a strong positive linear association.
Step 5: Compute the correlation coefficient to quantify the relationship. The means are:
$$\bar{x} = \frac{1+2+3+4+5}{5} = 3, \qquad \bar{y} = \frac{50+60+65+80+90}{5} = 69$$
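The remaining arithmetic for Step 5 can be sketched as follows. These intermediate sums are worked out here from the five data pairs and the means of 3 and 69; they are not part of the original solution.

```latex
\begin{aligned}
\sum_{i=1}^{5}(x_i-\bar{x})(y_i-\bar{y}) &= (-2)(-19)+(-1)(-9)+(0)(-4)+(1)(11)+(2)(21) = 100 \\
\sum_{i=1}^{5}(x_i-\bar{x})^2 &= 4+1+0+1+4 = 10 \\
\sum_{i=1}^{5}(y_i-\bar{y})^2 &= 361+81+16+121+441 = 1020 \\
r &= \frac{100}{\sqrt{10 \cdot 1020}} = \frac{100}{\sqrt{10200}} \approx 0.99
\end{aligned}
```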
Answer: The scatterplot shows a strong positive linear association between hours studied and test score. Calculating further yields r ≈ 0.99, confirming a nearly perfect positive linear relationship.
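Steps 2 and 3 can also be tried out in code. The sketch below draws a rough character-grid version of the scatterplot using only the Python standard library. The function name `ascii_scatter` and the grid size are assumptions chosen for this illustration, and a plotting library would normally be used for a real graph.

```python
def ascii_scatter(points, x_range, y_range, width, height):
    """Return text rows (top row first) with a '*' at each (x, y) point.

    Hypothetical helper: assumes every point lies inside the given ranges.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[" "] * (width + 1) for _ in range(height + 1)]
    for x, y in points:
        # Scale each coordinate to a column/row index on the grid
        col = round((x - x_min) / (x_max - x_min) * width)
        row = round((y - y_min) / (y_max - y_min) * height)
        grid[height - row][col] = "*"  # flip rows so larger y is drawn higher
    lines = ["|" + "".join(cells) for cells in grid]  # y-axis on the left
    lines.append("+" + "-" * (width + 1))             # x-axis along the bottom
    return lines


# Data and axis ranges from the worked example
study_data = [(1, 50), (2, 60), (3, 65), (4, 80), (5, 90)]
rows = ascii_scatter(study_data, x_range=(0, 6), y_range=(40, 100), width=24, height=12)
print("\n".join(rows))
```

The stars climb from the lower left to the upper right, which is the upward pattern described in Step 4.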
Another Example
Unlike the first example, this one uses data with no apparent association, illustrating that not every scatterplot reveals a trend. A correlation coefficient near zero indicates little or no linear relationship between the variables, although a nonlinear relationship may still exist.
Problem: A researcher records the age (in years) and daily screen time (in hours) for six people: (10, 4), (20, 5), (30, 3), (40, 2), (50, 6), (60, 3). Plot a scatterplot and describe what you see.
Step 1: Assign age to the x-axis and screen time to the y-axis. The x-axis runs from 0 to 70, and the y-axis from 0 to 7.
Step 2: Plot the six points: (10, 4), (20, 5), (30, 3), (40, 2), (50, 6), (60, 3).
Step 3: Examine the pattern. The points do not follow any clear upward or downward trend. They appear scattered without a consistent direction.
Step 4: Compute the means and correlation coefficient. The means are:
$$\bar{x} = \frac{10+20+30+40+50+60}{6} = 35, \qquad \bar{y} = \frac{4+5+3+2+6+3}{6} \approx 3.83$$
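Step 5: Apply the formula. The sums are $\sum(x_i-\bar{x})(y_i-\bar{y}) = -15$, $\sum(x_i-\bar{x})^2 = 1750$, and $\sum(y_i-\bar{y})^2 \approx 10.83$, so the correlation coefficient is slightly negative and close to zero:
$$r = \frac{-15}{\sqrt{1750 \cdot 10.83}} \approx -0.11$$
Answer: The scatterplot shows no clear linear association between age and daily screen time. The correlation coefficient r ≈ −0.11 indicates at most a very weak negative linear relationship.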
Frequently Asked Questions
What is the difference between a scatterplot and a line graph?
A scatterplot displays individual data points without connecting them, showing the general relationship between two variables. A line graph connects consecutive data points with straight segments, typically used when the x-variable represents a sequence like time. Scatterplots are better for exploring whether a relationship exists, while line graphs emphasize change over an ordered variable.
How do you tell if a scatterplot shows a positive, negative, or no correlation?
If the points trend upward from left to right, the scatterplot shows a positive correlation: as x increases, y tends to increase. If the points trend downward, the correlation is negative: as x increases, y tends to decrease. If the points show no discernible upward or downward pattern, there is little to no linear correlation. The correlation coefficient r quantifies this: values near +1 indicate a strong positive linear relationship, values near −1 a strong negative one, and values near 0 little or no linear relationship.
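The rule of thumb in this answer can be sketched as a small Python function. The name `describe_direction` and the cutoff of 0.3 are assumptions made for this illustration. There is no single agreed threshold for a "weak" correlation.

```python
def describe_direction(r, weak_cutoff=0.3):
    """Label the direction suggested by a correlation coefficient r.

    weak_cutoff is a hypothetical threshold for this sketch, not a standard value.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if abs(r) < weak_cutoff:
        return "little or no linear correlation"
    return "positive correlation" if r > 0 else "negative correlation"


print(describe_direction(0.99))  # positive correlation
print(describe_direction(-0.8))  # negative correlation
print(describe_direction(0.05))  # little or no linear correlation
```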
When do you use a scatterplot instead of a bar chart?
Use a scatterplot when both variables are quantitative (numerical) and you want to explore the relationship between them. Use a bar chart when one variable is categorical (like names of countries or types of fruit). Scatterplots are designed for continuous numerical data, while bar charts compare counts or values across distinct categories.
Scatterplot vs. Line Graph
| | Scatterplot | Line Graph |
|---|---|---|
| Definition | Plots individual (x, y) data points without connecting them | Plots data points and connects them with line segments |
| Data type | Two quantitative variables, not necessarily ordered | Typically one ordered variable (like time) on the x-axis |
| Purpose | Explore relationships, identify patterns, detect outliers | Show trends and changes over a continuous or sequential variable |
| Connecting points | Points are left unconnected (or a trend line is added separately) | Points are connected in order from left to right |
| Best for | Determining if two variables are correlated | Tracking how a single measurement changes over time |
Why It Matters
Scatterplots are one of the first tools you encounter in statistics courses and standardized tests (SAT, ACT, AP Statistics) for analyzing bivariate data. In science classes, you use scatterplots to identify experimental relationships—like whether increasing temperature affects reaction rate. Beyond school, scatterplots are fundamental in data science and research for detecting trends, spotting outliers, and deciding whether to fit a regression model.
Common Mistakes
Mistake: Connecting the dots in a scatterplot as if it were a line graph.
Correction: Leave the points unconnected. A scatterplot shows the overall pattern of the data, not a point-to-point path. If you want to show a trend, add a separate best-fit (regression) line instead.
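For readers who want to see where a best-fit line comes from, here is a minimal Python sketch using the standard least-squares slope and intercept formulas, which this entry does not state. The function name `least_squares_line` is an assumption for this example; the data are the study-hours pairs from the worked example.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("slope is undefined when all x-values are equal")
    slope = s_xy / s_xx                # change in y per unit change in x
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return slope, intercept


hours = [1, 2, 3, 4, 5]
scores = [50, 60, 65, 80, 90]
slope, intercept = least_squares_line(hours, scores)
print(f"score = {slope:.1f} * hours + {intercept:.1f}")  # score = 10.0 * hours + 39.0
```

The line is drawn over the unconnected points; it summarizes the trend without joining one point to the next.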
Mistake: Placing the independent variable on the y-axis and the dependent variable on the x-axis.
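Correction: The independent variable (the one you control or that serves as the predictor) belongs on the x-axis, and the dependent variable (the outcome you measure) belongs on the y-axis. Reversing them suggests the wrong direction of prediction, and a regression line fitted to the swapped axes predicts x from y, which is generally a different line. The correlation coefficient r itself does not change when the axes are swapped.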
Related Terms
- Paired Data — The type of data plotted in a scatterplot
- Least-Squares Regression Line — Best-fit line drawn through scatterplot points
- Least-Squares Regression Equation — Equation of the line fitted to scatterplot data
- Regression — Method for modeling relationships shown in scatterplots
- Linear Fit — Describes how well a line matches scatterplot data
- Graph of an Equation or Inequality — Broader concept of graphing on a coordinate plane
- Point — Each data pair is represented as a point
