
Cost Function

A cost function is a formula that takes in one or more variables and outputs a single number representing how far off a result is from what you want. The goal in optimization is usually to minimize this value.

A cost function J(θ) is a mapping from a set of parameters θ to the real numbers, where the output quantifies the error or expense associated with a particular choice of parameters. In optimization and machine learning, the objective is typically to find the parameter values θ* that minimize J(θ). Cost functions are also called loss functions or objective functions, depending on the context.

Key Formula

J(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - f(x_i, \theta)\right)^2
Where:
  • J(θ) = the cost function (total error)
  • N = the number of data points
  • y_i = the actual (observed) value for data point i
  • f(x_i, θ) = the predicted value for data point i, given parameters θ
  • θ = the parameter(s) being optimized
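The formula can be expressed directly in code. This is a minimal sketch assuming plain list inputs and a model passed in as a function; the names `mse_cost` and `line` are illustrative choices, not a standard API:

```python
def mse_cost(theta, xs, ys, model):
    """Mean squared error J(θ): the average of the squared
    differences between observed values and model predictions."""
    n = len(xs)
    return sum((y - model(x, theta)) ** 2 for x, y in zip(xs, ys)) / n

def line(x, theta):
    """A line through the origin: f(x, θ) = θ·x."""
    return theta * x
```

For instance, `mse_cost(2.0, [1, 2, 3], [2.5, 4.0, 7.0], line)` evaluates the cost of slope θ = 2 on the data used in the worked example below.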

Worked Example

Problem: You are fitting a line f(x) = 2x to three data points: (1, 2.5), (2, 4.0), and (3, 7.0). Calculate the mean squared error cost.
Step 1: Compute the predicted value for each data point using f(x) = 2x.
f(1) = 2, f(2) = 4, f(3) = 6
Step 2: Find the error (difference) between each actual value y_i and the predicted value f(x_i).
2.5 − 2 = 0.5, 4.0 − 4 = 0, 7.0 − 6 = 1.0
Step 3: Square each error.
0.5² = 0.25, 0² = 0, 1.0² = 1.0
Step 4: Sum the squared errors and divide by the number of data points N = 3.
J = (1/3)(0.25 + 0 + 1.0) = 1.25/3 ≈ 0.417
Answer: The mean squared error cost is approximately 0.417. A different slope or intercept might produce a lower cost, which is the whole point of optimization: adjusting parameters to make J as small as possible.
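The four steps above can be checked numerically; this short sketch mirrors them (the variable names are illustrative):

```python
xs = [1, 2, 3]
ys = [2.5, 4.0, 7.0]

# Step 1: predictions from f(x) = 2x
preds = [2 * x for x in xs]                            # [2, 4, 6]

# Steps 2-3: errors, then squared errors
sq_errors = [(y - p) ** 2 for y, p in zip(ys, preds)]  # [0.25, 0.0, 1.0]

# Step 4: mean of the squared errors
J = sum(sq_errors) / len(xs)
print(round(J, 3))  # 0.417
```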

Why It Matters

Cost functions are central to machine learning and data science. When a computer "learns" from data — for example, training a model to recognize images or predict prices — it repeatedly evaluates a cost function and adjusts its parameters to reduce the error. Without a well-defined cost function, there would be no way to measure whether one solution is better than another.
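The adjust-and-re-evaluate loop described above is often implemented with gradient descent, one common method among several (the learning rate and iteration count below are arbitrary illustrative choices). For the line f(x) = θx with MSE cost, the derivative dJ/dθ = (−2/N) Σ x_i(y_i − θx_i) points uphill, so stepping against it shrinks J:

```python
xs = [1, 2, 3]
ys = [2.5, 4.0, 7.0]
n = len(xs)

theta = 0.0  # initial guess for the slope
lr = 0.05    # learning rate (step size), chosen for illustration

for _ in range(200):
    # Gradient of J(θ) = (1/N) Σ (y_i - θ·x_i)² with respect to θ
    grad = (-2 / n) * sum(x * (y - theta * x) for x, y in zip(xs, ys))
    theta -= lr * grad  # step downhill on the cost surface

print(round(theta, 3))  # converges to the slope that minimizes J
```

On this data the loop settles near θ ≈ 2.25, a noticeably lower cost than the θ = 2 used in the worked example.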

Common Mistakes

Mistake: Forgetting to square the errors, so positive and negative differences cancel out.
Correction: In mean squared error, each difference is squared before summing. This ensures that an error of −2 counts just as much as an error of +2, and the total cost is always non-negative.
Mistake: Confusing the cost function with the model itself.
Correction: The model f(x, θ) makes predictions; the cost function J(θ) measures how bad those predictions are. They are two separate things: you adjust the model's parameters to reduce the cost.
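The first mistake above is easy to demonstrate with two made-up errors that happen to cancel (the numbers are chosen purely for illustration):

```python
errors = [-0.5, 0.5]  # one over-prediction, one under-prediction (hypothetical)

mean_error = sum(errors) / len(errors)                     # signed errors cancel
mean_sq_error = sum(e ** 2 for e in errors) / len(errors)  # squaring prevents cancellation

print(mean_error, mean_sq_error)  # 0.0 0.25
```

The raw mean suggests a perfect fit even though both predictions are off by 0.5; the squared version correctly reports a nonzero cost.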

Related Terms

  • Minimize: the typical goal applied to a cost function