Mathwords

Maximum Likelihood — Definition, Formula & Examples

Maximum likelihood is a method for estimating the parameters of a statistical model by finding the parameter values that make the observed data most probable. You choose the estimate that maximizes the likelihood function—essentially asking, 'Which parameter value would have been most likely to produce the data I actually saw?'

Given a sample of independent observations $x_1, x_2, \ldots, x_n$ drawn from a distribution with probability (density) function $f(x \mid \theta)$, the maximum likelihood estimator (MLE) of $\theta$ is the value $\hat{\theta}$ that maximizes the likelihood function $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$, or equivalently, the log-likelihood $\ell(\theta) = \sum_{i=1}^{n} \ln f(x_i \mid \theta)$.

Key Formula

$\hat{\theta} = \arg\max_{\theta}\; L(\theta) = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i \mid \theta)$
Where:
  • $\hat{\theta}$ = Maximum likelihood estimate of the parameter
  • $\theta$ = Unknown parameter to be estimated
  • $f(x_i \mid \theta)$ = Probability (density) of the $i$-th observation given $\theta$
  • $n$ = Number of observations in the sample

How It Works

First, write down the likelihood function by multiplying together the probability of each observed data point as a function of the unknown parameter. Because products are hard to differentiate, take the natural log to convert the product into a sum—this gives the log-likelihood. Then take the derivative of the log-likelihood with respect to the parameter, set it equal to zero, and solve. The solution is your MLE. Always check the second derivative to confirm you found a maximum, not a minimum.
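The same steps can be carried out numerically when no closed form exists. As a sketch, the snippet below estimates the mean of a normal distribution (with known spread) by maximizing the log-likelihood over a crude grid; the data values and grid resolution are illustrative assumptions, not part of any standard recipe. For a normal mean, the MLE is known to be the sample mean, so the numerical answer can be checked against it.

```python
import math

# Hypothetical sample (illustrative values only)
data = [4.2, 5.1, 3.8, 4.9, 5.5]

def log_likelihood(mu, sigma=1.0):
    # Sum of ln f(x_i | mu) for a Normal(mu, sigma) density
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu) ** 2 / (2 * sigma**2) for x in data)

# Crude grid search over candidate values of mu -- a sketch,
# not a production optimizer
candidates = [i / 1000 for i in range(3000, 7000)]
mle = max(candidates, key=log_likelihood)

sample_mean = sum(data) / len(data)
print(round(mle, 3), round(sample_mean, 3))  # the two agree
```

In practice a numerical optimizer (e.g. Newton's method) replaces the grid search, but the logic is identical: evaluate the log-likelihood and pick the parameter value that makes it largest.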

Worked Example

Problem: You flip a coin 20 times and observe 13 heads. Assuming each flip is independent with probability p of heads, find the MLE of p.
Write the likelihood: The number of heads follows a binomial distribution. The likelihood function is:
$L(p) = \binom{20}{13} p^{13}(1-p)^{7}$
Take the log-likelihood: Drop the constant binomial coefficient since it does not depend on p:
$\ell(p) = 13\ln p + 7\ln(1-p) + C$
Differentiate and solve: Set the derivative equal to zero and solve for p:
$\frac{d\ell}{dp} = \frac{13}{p} - \frac{7}{1-p} = 0 \implies \hat{p} = \frac{13}{20} = 0.65$
Answer: The maximum likelihood estimate is $\hat{p} = 0.65$, which matches the intuitive sample proportion.
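The calculus above can be double-checked numerically. This short sketch evaluates the log-likelihood for the coin example (13 heads in 20 flips) on a grid of candidate values of p and picks the maximizer, which should land on 13/20:

```python
import math

heads, flips = 13, 20  # observed data from the coin example

def log_likelihood(p):
    # Constant binomial coefficient omitted: it does not depend on p,
    # so it does not affect where the maximum occurs
    return heads * math.log(p) + (flips - heads) * math.log(1 - p)

# Grid search over p in (0, 1), excluding the endpoints where ln blows up
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)

print(p_hat)  # 0.65, matching the closed-form answer 13/20
```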

Why It Matters

MLE is the default estimation technique behind logistic regression, survival analysis, and many machine learning models. When you fit a model in R or Python, the software is often running MLE behind the scenes. Understanding it gives you insight into why parameter estimates behave the way they do and how confidence intervals are constructed.

Common Mistakes

Mistake: Maximizing the likelihood function directly instead of the log-likelihood, leading to algebraic errors with large products.
Correction: Always take the natural log first. The log transforms products into sums, making differentiation straightforward. The maximum occurs at the same parameter value because ln is a strictly increasing function.
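Beyond the algebra, there is a numerical reason to prefer the log: for large samples the raw likelihood is a product of many numbers below 1 and underflows to zero in floating point, leaving nothing to maximize. A small illustration with hypothetical data (5000 flips, 3250 heads, evaluated at p = 0.65):

```python
import math

heads, tails, p = 3250, 1750, 0.65  # hypothetical large-sample data

# Raw likelihood: a product of thousands of factors below 1
likelihood = p**heads * (1 - p)**tails      # underflows to exactly 0.0

# Log-likelihood: the same information as a manageable sum
log_likelihood = heads * math.log(p) + tails * math.log(1 - p)

print(likelihood)       # 0.0 -- all information lost to underflow
print(log_likelihood)   # a finite number, roughly -3237
```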