
Cost Function

A cost function is a formula that takes in one or more variables and outputs a single number representing how far off a result is from what you want. The goal in optimization is usually to minimize this value.

A cost function J(θ) is a mapping from a set of parameters θ to the real numbers, where the output quantifies the error or expense associated with a particular choice of parameters. In optimization and machine learning, the objective is typically to find the parameter values θ* that minimize J(θ). Cost functions are also called loss functions or objective functions, depending on the context.

Key Formula

J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - f(x_i, \theta)\right)^2
Where:
  • J(θ) = the cost function (total error)
  • N = the number of data points
  • y_i = the actual (observed) value for data point i
  • f(x_i, θ) = the predicted value for data point i, given parameters θ
  • θ = the parameter(s) being optimized
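The formula can be expressed directly in code. This is a minimal sketch assuming plain list inputs and a model passed in as a function; the names `mse_cost` and `line` are illustrative choices, not a standard API:

```python
def mse_cost(theta, xs, ys, model):
    """Mean squared error J(θ): the average of the squared
    differences between observed values and model predictions."""
    n = len(xs)
    return sum((y - model(x, theta)) ** 2 for x, y in zip(xs, ys)) / n

def line(x, theta):
    """A line through the origin: f(x, θ) = θ·x."""
    return theta * x
```

For instance, `mse_cost(2.0, [1, 2, 3], [2.5, 4.0, 7.0], line)` evaluates the cost of slope θ = 2 on the data used in the worked example below.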

Worked Example

Problem: You are fitting a line f(x) = 2x to three data points: (1, 2.5), (2, 4.0), and (3, 7.0). Calculate the mean squared error cost.
Step 1: Compute the predicted value for each data point using f(x) = 2x.
f(1) = 2, f(2) = 4, f(3) = 6
Step 2: Find the error (difference) between each actual value y_i and the predicted value f(x_i).
2.5 − 2 = 0.5, 4.0 − 4 = 0, 7.0 − 6 = 1.0
Step 3: Square each error.
0.5² = 0.25, 0² = 0, 1.0² = 1.0
Step 4: Sum the squared errors and divide by the number of data points N = 3.
J = (1/3)(0.25 + 0 + 1.0) = 1.25/3 ≈ 0.417
Answer: The mean squared error cost is approximately 0.417. A different slope or intercept might produce a lower cost, which is the whole point of optimization: adjusting parameters to make J as small as possible.
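The four steps above can be checked numerically; this short sketch mirrors them (the variable names are illustrative):

```python
xs = [1, 2, 3]
ys = [2.5, 4.0, 7.0]

# Step 1: predictions from f(x) = 2x
preds = [2 * x for x in xs]                            # [2, 4, 6]

# Steps 2-3: errors, then squared errors
sq_errors = [(y - p) ** 2 for y, p in zip(ys, preds)]  # [0.25, 0.0, 1.0]

# Step 4: mean of the squared errors
J = sum(sq_errors) / len(xs)
print(round(J, 3))  # 0.417
```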

Why It Matters

Cost functions are central to machine learning and data science. When a computer "learns" from data — for example, training a model to recognize images or predict prices — it repeatedly evaluates a cost function and adjusts its parameters to reduce the error. Without a well-defined cost function, there would be no way to measure whether one solution is better than another.
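The adjust-and-re-evaluate loop described above is often implemented with gradient descent, one common method among several (the learning rate and iteration count below are arbitrary illustrative choices). For the line f(x) = θx with MSE cost, the derivative dJ/dθ = (−2/N) Σ x_i(y_i − θx_i) points uphill, so stepping against it shrinks J:

```python
xs = [1, 2, 3]
ys = [2.5, 4.0, 7.0]
n = len(xs)

theta = 0.0  # initial guess for the slope
lr = 0.05    # learning rate (step size), chosen for illustration

for _ in range(200):
    # Gradient of J(θ) = (1/N) Σ (y_i - θ·x_i)² with respect to θ
    grad = (-2 / n) * sum(x * (y - theta * x) for x, y in zip(xs, ys))
    theta -= lr * grad  # step downhill on the cost surface

print(round(theta, 3))  # converges to the slope that minimizes J
```

On this data the loop settles near θ ≈ 2.25, a noticeably lower cost than the θ = 2 used in the worked example.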

Common Mistakes

Mistake: Forgetting to square the errors, so positive and negative differences cancel out.
Correction: In mean squared error, each difference is squared before summing. This ensures that an error of −2 counts just as much as an error of +2, and the total cost is always non-negative.
Mistake: Confusing the cost function with the model itself.
Correction: The model f(x, θ) makes predictions; the cost function J(θ) measures how bad those predictions are. They are two separate things: you adjust the model's parameters to reduce the cost.
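The first mistake above is easy to demonstrate with two made-up errors that happen to cancel (the numbers are chosen purely for illustration):

```python
errors = [-0.5, 0.5]  # one over-prediction, one under-prediction (hypothetical)

mean_error = sum(errors) / len(errors)                     # signed errors cancel
mean_sq_error = sum(e ** 2 for e in errors) / len(errors)  # squaring prevents cancellation

print(mean_error, mean_sq_error)  # 0.0 0.25
```

The raw mean suggests a perfect fit even though both predictions are off by 0.5; the squared version correctly reports a nonzero cost.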

Related Terms

  • Minimize: the typical goal applied to a cost function