Subset Sum Problem — Definition, Formula & Examples
The Subset Sum Problem asks: given a set of integers and a target value, does any subset of those integers add up exactly to the target? It is one of the most well-known problems in computer science for illustrating NP-completeness.
Given a finite set and a target integer , the Subset Sum Problem is the decision problem of determining whether there exists a subset such that . This problem is NP-complete, meaning no known polynomial-time algorithm solves all instances, yet any proposed solution can be verified in polynomial time.
How It Works
A brute-force approach checks every possible subset of the given set, testing whether its elements sum to the target. Since a set of elements has subsets, this runs in exponential time. A dynamic programming approach improves practical performance: build a table where entry records whether a subset of the first elements can sum to . This runs in time, which is pseudo-polynomial because it depends on the numeric value of , not just the input size. The problem's NP-completeness means that no algorithm is known to solve all instances in time polynomial in the length of the input.
Worked Example
Problem: Given the set and target , determine whether any subset sums to 11.
Step 1: Try subsets systematically. Check :
Step 2: Since , a valid subset exists. You could also verify that is another solution:
Answer: Yes, a subset summing to 11 exists — for example, .
Why It Matters
The Subset Sum Problem is a gateway to understanding NP-completeness in courses on algorithms and complexity theory. It appears directly in cryptography (some public-key schemes rely on its hardness) and in resource allocation tasks where you must decide if a budget can be met exactly. Mastering it prepares you for harder combinatorial problems like the knapsack problem and integer linear programming.
Common Mistakes
Mistake: Assuming the dynamic programming solution runs in polynomial time and therefore the problem is not truly hard.
Correction: The DP solution runs in time, which is pseudo-polynomial — it is polynomial in 's value, not in the number of bits needed to represent . For large target values, this is still exponential in input size.
