Categorical Data

Categorical data is data that can be sorted into groups or categories based on labels, names, or qualities rather than numbers. Examples include eye color (blue, brown, green), type of pet (dog, cat, fish), or favorite school subject (math, science, history). Each data value belongs to one and only one category.

Categorical data (also called qualitative data) consists of observations that can be classified into a finite set of distinct categories or groups. Unlike numerical data, categorical data cannot be meaningfully added, subtracted, or averaged. Categorical variables may be nominal (no natural ordering, such as blood type: A, B, AB, O) or ordinal (categories have a natural order, such as education level: high school, bachelor's, master's, doctorate). Analysis of categorical data typically involves frequency counts, proportions, and graphical displays such as bar graphs, circle graphs, and two-way tables.
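
The nominal/ordinal distinction can be illustrated with a minimal Python sketch. The category names come from the examples above; the individual responses and the helper name `sort_ordinal` are hypothetical, invented only for illustration. Nominal values can be counted but not ranked, while ordinal values can also be sorted by their natural order.

```python
from collections import Counter

# Nominal: blood types have no natural order, so counting is the main operation.
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]  # hypothetical responses
blood_counts = Counter(blood_types)

# Ordinal: education levels from the text, listed from lowest to highest.
EDUCATION_ORDER = ["high school", "bachelor's", "master's", "doctorate"]


def sort_ordinal(values, order):
    """Sort category labels by their position in `order` (hypothetical helper)."""
    rank = {category: position for position, category in enumerate(order)}
    return sorted(values, key=lambda value: rank[value])


responses = ["master's", "high school", "doctorate", "bachelor's", "high school"]  # hypothetical
sorted_responses = sort_ordinal(responses, EDUCATION_ORDER)

print(blood_counts.most_common(1))  # most frequent blood type and its count
print(sorted_responses)             # responses in their natural order
```

Sorting the blood types alphabetically would be possible in code, but the resulting order would carry no statistical meaning; that is what makes the variable nominal.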

Worked Example

Problem: A teacher surveys 30 students about their favorite season. The results are: Spring — 8, Summer — 12, Fall — 7, Winter — 3. Organize this categorical data and find the relative frequency of each category.
Step 1: Identify the variable and its categories. The variable is 'favorite season' and the categories are Spring, Summer, Fall, and Winter. Since these are labels with no inherent numerical value, this is categorical data.
Step 2: Build a frequency table listing each category and its count.
$$\begin{array}{lc} \textbf{Season} & \textbf{Frequency} \\ \hline \text{Spring} & 8 \\ \text{Summer} & 12 \\ \text{Fall} & 7 \\ \text{Winter} & 3 \\ \hline \textbf{Total} & 30 \end{array}$$
Step 3: Calculate the relative frequency of each category by dividing the count by the total number of responses.
$$\text{Spring: } \frac{8}{30} \approx 0.267, \quad \text{Summer: } \frac{12}{30} = 0.400, \quad \text{Fall: } \frac{7}{30} \approx 0.233, \quad \text{Winter: } \frac{3}{30} = 0.100$$
Answer: Summer is the most popular season (40%), followed by Spring (26.7%), Fall (23.3%), and Winter (10%). These relative frequencies sum to 1 (100%), confirming all responses are accounted for.
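
The same steps can be sketched in Python as a quick check of the arithmetic. The counts are taken from the problem; the variable names are illustrative, and this is only one simple way to organize the calculation.

```python
# Step 2: frequency table, using the counts given in the problem.
counts = {"Spring": 8, "Summer": 12, "Fall": 7, "Winter": 3}

# Step 3: relative frequency = count / total number of responses.
total = sum(counts.values())
relative = {season: count / total for season, count in counts.items()}

# The category with the largest count is the mode.
most_popular = max(counts, key=counts.get)

# Print a small table: season, frequency, relative frequency (3 decimals).
for season, count in counts.items():
    print(f"{season:<8}{count:>4}{relative[season]:>8.3f}")
print(f"{'Total':<8}{total:>4}{sum(relative.values()):>8.3f}")
```

Working from the exact fractions rather than the rounded decimals avoids any rounding drift when checking that the relative frequencies sum to 1.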

Why It Matters

Categorical data appears everywhere — from survey responses and medical diagnoses to product reviews and election results. In AP Statistics, you will use categorical data to construct and interpret two-way tables, perform chi-square tests of independence, and compare proportions across groups. Understanding whether data is categorical or numerical is one of the first decisions in any statistical analysis, because it determines which graphs, summary statistics, and inference methods are appropriate.

Common Mistakes

Mistake: Calculating a mean or standard deviation for categorical data.
Correction: Means and standard deviations only make sense for numerical data. For categorical data, use frequencies, proportions, and the mode (most common category) instead.
Mistake: Confusing categorical data with numerical data that uses number codes.
Correction: If a survey codes 'Yes' as 1 and 'No' as 2, the data is still categorical. The numbers are just labels — it would be meaningless to say the average response is 1.4.
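
A short Python sketch makes the second mistake concrete. The ten coded responses below are hypothetical, chosen so that the (meaningless) mean of the codes comes out to 1.4 as in the example above; the `LABELS` mapping is likewise an assumption for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses, coded 1 = Yes and 2 = No.
LABELS = {1: "Yes", 2: "No"}
codes = [1, 1, 2, 1, 2, 1, 1, 2, 2, 1]

# Python will happily average the codes, but the result describes nothing:
# recoding No as 0 or 7 would change it without changing the data.
meaningless_mean = mean(codes)

# Appropriate summaries: decode the labels, then count and take proportions.
decoded = [LABELS[code] for code in codes]
counts = Counter(decoded)
proportions = {label: count / len(decoded) for label, count in counts.items()}
mode_label = counts.most_common(1)[0][0]

print(f"Mean of codes (not meaningful): {meaningless_mean}")
print(f"Counts: {dict(counts)}")
print(f"Proportions: {proportions}")
print(f"Mode: {mode_label}")
```

The counts, proportions, and mode stay the same no matter which numbers are used as codes, which is why they are the right summaries for categorical data.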

Related Terms

  • Two-Way Table: Displays frequencies for two categorical variables simultaneously
  • Frequency Table: Organizes categorical data by counting occurrences in each category
  • Bar Graph: Common graph for displaying categorical data frequencies
  • Circle Graph: Shows categorical data as proportional slices of a whole