Mathwords

Likelihood — Definition, Formula & Examples

Likelihood is a measure of how well a particular parameter value or hypothesis explains observed data. Unlike probability, which describes how likely data are before you see them, likelihood evaluates how plausible different parameter values are after the data have been observed.

Given observed data x and a statistical model with parameter θ, the likelihood function is defined as L(θ | x) = P(x | θ). It is numerically equal to the probability of the data given θ, but is treated as a function of θ with x fixed, rather than a function of x with θ fixed.

Key Formula

L(θ | x) = P(x | θ)
Where:
  • L(θ | x) = Likelihood of parameter θ given observed data x
  • θ = Parameter or hypothesis being evaluated
  • x = Observed data
  • P(x | θ) = Probability of observing data x assuming parameter θ is true

How It Works

You start with observed data and a model that has one or more unknown parameters. For each candidate parameter value, you compute the probability that the model would have produced the data you actually saw — that value is the likelihood. Higher likelihood means the parameter value is more consistent with your observations. In maximum likelihood estimation (MLE), you find the parameter value that maximizes this function. Likelihood also plays a central role in Bayes' theorem, where it weights the prior belief about a hypothesis to produce a posterior probability.
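The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not library code: the grid step and the coin-flip data (7 heads in 10 flips, matching the worked example below) are chosen for demonstration.

```python
from math import comb

def binomial_likelihood(theta, n, k):
    """Likelihood of theta given k successes observed in n trials."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Observed data: 7 heads in 10 flips
n, k = 10, 7

# Evaluate the likelihood at each candidate theta on a grid,
# then keep the value where it is largest (a crude MLE by grid search)
grid = [i / 100 for i in range(101)]
mle = max(grid, key=lambda theta: binomial_likelihood(theta, n, k))
# The grid search lands on theta = 0.70, which matches the
# closed-form binomial MLE k / n = 7 / 10
```

A finer grid (or calculus: setting the derivative of the log-likelihood to zero) gives the same answer, since the binomial likelihood for k successes in n trials peaks exactly at θ = k/n.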

Worked Example

Problem: A coin is flipped 10 times and lands heads 7 times. Compare the likelihood of θ = 0.5 (fair coin) versus θ = 0.7 (biased coin).
Set up the likelihood using the binomial model: With n = 10 flips and k = 7 heads, the likelihood for a given θ is:
L(θ | 7) = C(10, 7) · θ^7 · (1 − θ)^3, where the binomial coefficient C(10, 7) = 120.
Compute likelihood for θ = 0.5: Substitute θ = 0.5 into the formula.
L(0.5 | 7) = 120 · (0.5)^7 · (0.5)^3 = 120 · 1/1024 ≈ 0.1172
Compute likelihood for θ = 0.7: Substitute θ = 0.7 into the formula.
L(0.7 | 7) = 120 · (0.7)^7 · (0.3)^3 ≈ 120 · 0.0824 · 0.027 ≈ 0.2668
Answer: The likelihood for θ = 0.7 (≈ 0.267) is more than twice the likelihood for θ = 0.5 (≈ 0.117), so the data support the biased-coin hypothesis more strongly.
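The arithmetic in this example can be checked directly. A quick sketch (function name is illustrative):

```python
from math import comb

def binomial_likelihood(theta, n, k):
    """Likelihood of theta given k heads observed in n flips."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

l_fair = binomial_likelihood(0.5, 10, 7)    # ≈ 0.1172
l_biased = binomial_likelihood(0.7, 10, 7)  # ≈ 0.2668
ratio = l_biased / l_fair                   # ≈ 2.28, favoring the biased coin
```

The likelihood ratio of about 2.28 is the precise sense in which the data "support the biased-coin hypothesis more strongly."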

Why It Matters

Likelihood is the foundation of maximum likelihood estimation, one of the most widely used methods for fitting statistical models in fields from biology to machine learning. It also serves as the key updating factor in Bayes' theorem, connecting prior beliefs to posterior conclusions in any Bayesian analysis.

Common Mistakes

Mistake: Treating likelihood values as probabilities that must sum (or integrate) to 1.
Correction: Likelihood is not a probability distribution over θ. It does not need to sum or integrate to 1, which is why likelihoods for different parameter values can only be compared in ratio, not interpreted as absolute probabilities.
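This can be demonstrated numerically: integrating the coin-flip likelihood from the worked example over all θ in [0, 1] does not give 1. A small sketch using a midpoint-rule sum (the step count is arbitrary):

```python
from math import comb

def binomial_likelihood(theta, n=10, k=7):
    """Likelihood of theta given 7 heads in 10 flips (the worked example)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# Midpoint-rule approximation of the integral of L(theta | 7) over [0, 1]
steps = 100_000
area = sum(binomial_likelihood((i + 0.5) / steps) for i in range(steps)) / steps
# area ≈ 0.0909 (exactly 1/11 for this model), not 1 — so the likelihood
# function is not a probability density over theta
```

Normalizing the likelihood by this area (together with a prior) is exactly what Bayes' theorem does to turn it into a posterior distribution over θ.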