Outlier
A data point that is distinctly separate from the rest of the data. One definition of outlier is any data point more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile.
Note: The IQR definition given here is widely used but is not the last word in determining whether a given number is an outlier.
Example:
For the data 2, 5, 6, 9, 12, we have the following five-number summary:
minimum = 2
first quartile = 3.5
median = 6
third quartile = 10.5
maximum = 12
IQR = 10.5 – 3.5 = 7, so 1.5·IQR = 10.5.
To determine whether there are outliers, we look for values that lie more than 1.5·IQR = 10.5 below the first quartile or above the third quartile.
first quartile – 1.5·IQR = 3.5 – 10.5 = –7
third quartile + 1.5·IQR = 10.5 + 10.5 = 21
Since none of the data are outside the interval from –7 to 21, there are no outliers.
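The calculation above can be sketched in Python. This is a minimal illustration, assuming quartiles are taken as the medians of the lower and upper halves of the sorted data (with the middle value left out of both halves when the count is odd), which is the convention that reproduces the five-number summary above. The function names `quartiles_by_halves` and `iqr_fences` are hypothetical, not from any library.

```python
from statistics import median

def quartiles_by_halves(data):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumption: for an odd number of values, the middle value is
    excluded from both halves. Needs at least two data points.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

def iqr_fences(data, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    q1, _, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [2, 5, 6, 9, 12]
q1, med, q3 = quartiles_by_halves(data)
low, high = iqr_fences(data)
outliers = [x for x in data if x < low or x > high]

print(q1, med, q3)   # 3.5 6 10.5
print(low, high)     # -7.0 21.0
print(outliers)      # []
```

The multiplier `k` is left as a parameter so the same sketch can be reused with a different cutoff.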
Worked Example
Problem: Determine whether any outliers exist in the data set: 3, 7, 8, 10, 11, 12, 14, 40.
Step 1: Find the first quartile (Q₁) and third quartile (Q₃). With 8 data points, Q₁ is the median of the lower half (3, 7, 8, 10) and Q₃ is the median of the upper half (11, 12, 14, 40).
$$Q_1 = \frac{7 + 8}{2} = 7.5 \qquad Q_3 = \frac{12 + 14}{2} = 13$$
Step 2: Calculate the interquartile range (IQR).
$$\text{IQR} = Q_3 - Q_1 = 13 - 7.5 = 5.5$$
Step 3: Find the lower and upper fences by subtracting and adding 1.5 × IQR from the quartiles.
$$\text{Lower fence} = 7.5 - 1.5(5.5) = 7.5 - 8.25 = -0.75$$
$$\text{Upper fence} = 13 + 1.5(5.5) = 13 + 8.25 = 21.25$$
Step 4: Check each data value. Any value below −0.75 or above 21.25 is an outlier. The value 40 exceeds 21.25, so it is an outlier. All other values fall within the fences.
$$40 > 21.25 \implies 40 \text{ is an outlier}$$
Answer: The data set has one outlier: 40.
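Software often interpolates quartiles instead of taking the medians of the two halves, so computed fences may differ slightly from the ones found by hand. As a hedged illustration, the sketch below applies Python's standard-library `statistics.quantiles` with its two built-in methods to the same data; the helper name `fences_from_quantiles` is hypothetical.

```python
from statistics import quantiles

data = [3, 7, 8, 10, 11, 12, 14, 40]

def fences_from_quantiles(values, method, k=1.5):
    """Return (Q1, Q3, lower fence, upper fence) using statistics.quantiles."""
    q1, _, q3 = quantiles(values, n=4, method=method)
    iqr = q3 - q1
    return q1, q3, q1 - k * iqr, q3 + k * iqr

for method in ("exclusive", "inclusive"):
    q1, q3, low, high = fences_from_quantiles(data, method)
    flagged = [x for x in data if x < low or x > high]
    print(method, "Q1 =", q1, "Q3 =", q3, "fences =", (low, high), "flagged =", flagged)
```

For this data set both methods give quartiles that differ from 7.5 and 13, yet 40 is still the only value flagged. With borderline values the choice of quartile convention can change the verdict, so it is worth stating which one was used.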
Another Example
Problem: Test scores for a class are: 55, 70, 72, 75, 78, 80, 82, 85, 88, 90. Are there any outliers?
Step 1: Find Q₁ and Q₃. The lower half is 55, 70, 72, 75, 78 and the upper half is 80, 82, 85, 88, 90. The median of each half is the middle value.
$$Q_1 = 72 \qquad Q_3 = 85$$
Step 2: Calculate IQR.
$$\text{IQR} = 85 - 72 = 13$$
Step 3: Compute the fences.
$$\text{Lower fence} = 72 - 1.5(13) = 72 - 19.5 = 52.5$$
$$\text{Upper fence} = 85 + 1.5(13) = 85 + 19.5 = 104.5$$
Step 4: Every value from 55 to 90 falls within [52.5, 104.5]. The score of 55 is low but still above the lower fence.
$$55 > 52.5 \implies \text{not an outlier}$$
Answer: There are no outliers in this data set. Even the lowest score (55) is within the fences.
Frequently Asked Questions
How do you find outliers in a data set?
Calculate Q₁, Q₃, and the IQR (Q₃ − Q₁). Then find the lower fence (Q₁ − 1.5 × IQR) and upper fence (Q₃ + 1.5 × IQR). Any data point below the lower fence or above the upper fence is classified as an outlier.
Should you remove outliers from your data?
Not automatically. An outlier may be a data-entry error, in which case removing or correcting it makes sense. However, it could also be a genuine extreme value that carries important information. Always investigate why a value is unusual before deciding whether to keep or remove it.
Outlier (1.5 × IQR rule) vs. Extreme outlier (3 × IQR rule)
The standard 1.5 × IQR rule flags values that are moderately far from the bulk of the data. Some textbooks also define an extreme outlier as any value more than 3 × IQR below Q₁ or above Q₃. Extreme outliers represent even rarer, more distant data points. For example, if Q₃ = 13 and IQR = 5.5, the standard upper fence is 13 + 1.5(5.5) = 21.25, while the extreme upper fence is 13 + 3(5.5) = 29.5.
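The two cutoffs can be combined into a single labeling step. The sketch below is one possible way to do this, using the quartiles from the worked example (Q₁ = 7.5, Q₃ = 13); the function name `classify` and the test values 14, 25, and 40 are chosen only for illustration.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5 x IQR and 3 x IQR fences."""
    iqr = q3 - q1
    # Check the wider (extreme) fences first, then the standard ones.
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme outlier"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outlier"
    return "not an outlier"

# Quartiles from the worked example: Q1 = 7.5, Q3 = 13, so IQR = 5.5.
for value in (14, 25, 40):
    print(value, classify(value, q1=7.5, q3=13))
# 14 not an outlier
# 25 outlier
# 40 extreme outlier
```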
Why It Matters
Outliers can dramatically shift the mean and inflate the standard deviation, giving a misleading picture of a data set. Identifying them helps you decide whether the mean or the median is the better measure of center. In real-world contexts such as medical studies, quality control, or financial analysis, spotting outliers can reveal errors, fraud, or genuinely unusual events that deserve closer attention.
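A quick way to see this effect is to compute the summary statistics with and without the outlier. The sketch below reuses the data set from the worked example and Python's standard-library `statistics` module; dropping the value 40 here is only for comparison, not a recommendation to delete it.

```python
from statistics import mean, median, stdev

with_outlier = [3, 7, 8, 10, 11, 12, 14, 40]
without_outlier = [x for x in with_outlier if x != 40]

for label, values in (("with 40", with_outlier), ("without 40", without_outlier)):
    print(label,
          "| mean:", round(mean(values), 2),
          "| median:", median(values),
          "| sample std dev:", round(stdev(values), 2))
```

The mean falls from about 13.1 to about 9.3 and the sample standard deviation from roughly 11.4 to roughly 3.6, while the median only moves from 10.5 to 10.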
Common Mistakes
Mistake: Assuming any value that "looks" far from the rest is automatically an outlier.
Correction: Use a specific criterion like the 1.5 × IQR rule to test whether a value truly qualifies as an outlier. Visual intuition alone is unreliable, especially with skewed data.
Mistake: Deleting outliers without investigating them first.
Correction: An outlier might be a legitimate data point that reveals something important. Check for recording errors or measurement issues before removing any value from the data set.