Machine Learning 4: Loss Function

What Is a Loss Function?

A loss function in machine learning is a mathematical function that measures the difference between the predicted output of a model ($\hat{y}$) and the actual output (or true label, $y$). The purpose of the loss function is to quantify how well or poorly a model is performing by comparing the predicted values against the actual values.

Types of Loss Functions

There are different types of loss functions, depending on the type of machine learning problem. Mean Squared Error (MSE) is a good one to introduce first: it measures the average squared difference between the predicted and actual values, and it is commonly used in regression problems.

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$
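
As a quick illustration, here is a minimal sketch of MSE in plain Python (the function name `mse` and the sample values are illustrative, not from any particular library):

```python
# Mean squared error: average of the squared prediction errors.
def mse(y_true, y_pred):
    n = len(y_true)
    return sum((yh - y) ** 2 for y, yh in zip(y_true, y_pred)) / n

# Example: predictions [1.5, 2.0, 3.5] against true labels [1, 2, 3].
print(mse([1, 2, 3], [1.5, 2.0, 3.5]))  # (0.25 + 0 + 0.25) / 3 ≈ 0.167
```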

Example

Let us return to the earlier house price prediction case and work through it in detail. The model and cost function are given in the table below. To keep things clear and easy to follow, $b$ has been set to 0, so the function $f_{w,b}(x) = wx + b$ reduces to $f_{w}(x) = wx$.

| Name | Content |
| --- | --- |
| Model | $f_{w}(x) = wx$ |
| Parameters | $w$ |
| Cost Function | $J(w) = \frac{1}{2n} \sum_{i=1}^{n} (f_{w}(x_i) - y_i)^2$ |
| Goal | $\text{Minimize } J(w)$ |

Assume there are three known data points: $(1, 1)$, $(2, 2)$, and $(3, 3)$ when expressed as $(x, y)$. The input value is on the x-axis and the output value is on the y-axis. Now let $w$ equal 0, 0.5, and 1 in turn and carry out the calculation (although $w$ can be any value).

The calculation in detail:

Since $J(w) = \frac{1}{2n} \sum_{i=1}^{n} (wx_i - y_i)^2$ and $n = 3$:

$$J(1) = \frac{1}{6} \sum_{i=1}^{3} (x_i - y_i)^2 = \frac{(1 - 1)^2 + (2 - 2)^2 + (3 - 3)^2}{6} = 0$$

$$J(0.5) = \frac{1}{6} \sum_{i=1}^{3} (0.5\,x_i - y_i)^2 = \frac{(0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2}{6} \approx 0.58$$

$$J(0) = \frac{1}{6} \sum_{i=1}^{3} (0 - y_i)^2 = \frac{1^2 + 2^2 + 3^2}{6} \approx 2.33$$
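
This hand calculation can be reproduced in a few lines of Python. A minimal sketch, assuming the $\frac{1}{2n}$ cost function used above (the names `data` and `cost` are illustrative):

```python
# Cost J(w) = (1 / (2n)) * sum((w * x_i - y_i)^2) over the data points.
data = [(1, 1), (2, 2), (3, 3)]

def cost(w):
    n = len(data)
    return sum((w * x - y) ** 2 for x, y in data) / (2 * n)

for w in (1, 0.5, 0):
    print(f"J({w}) = {cost(w):.2f}")
# Output: J(1) = 0.00, J(0.5) = 0.58, J(0) = 2.33
```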

The visualization of the calculation results:

[Figure: plot of the cost $J(w)$ against $w$ for the three data points, with the minimum $J(w) = 0$ at $w = 1$]

What other ways are there to calculate $w$, instead of trying values one by one? (Speculation)

Definition of the Loss Function

Assuming $b = 0$, the loss function $J(w)$ is defined as:

$$J(w) = \frac{1}{2n} \sum_{i=1}^{n} \left( wx_i - y_i \right)^2$$

where $n = 3$, because we have three data points.

Substituting Data Points

Substitute the data points $(1, 1)$, $(2, 2)$, and $(3, 3)$ into the loss function:

$$J(w) = \frac{1}{6} \left[ (w \cdot 1 - 1)^2 + (2w - 2)^2 + (3w - 3)^2 \right]$$

This can be further simplified to:

$$J(w) = \frac{1}{6} \left[ (w - 1)^2 + (2w - 2)^2 + (3w - 3)^2 \right]$$
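
Since $(2w - 2)^2 = 4(w - 1)^2$ and $(3w - 3)^2 = 9(w - 1)^2$, every term is a multiple of $(w - 1)^2$, and the cost collapses to

$$J(w) = \frac{1 + 4 + 9}{6} (w - 1)^2 = \frac{7}{3} (w - 1)^2,$$

which already suggests that the minimum lies at $w = 1$. Taking the derivative confirms this.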

Taking the Derivative with Respect to $w$

Differentiate with respect to $w$ and set the derivative to zero:

$$\frac{\partial J(w)}{\partial w} = \frac{1}{3} \left[ (w - 1) \cdot 1 + (2w - 2) \cdot 2 + (3w - 3) \cdot 3 \right] = 0$$

Expand and simplify:

$$\frac{\partial J(w)}{\partial w} = \frac{1}{3} \left[ (w - 1) + 4(w - 1) + 9(w - 1) \right] = \frac{1}{3} \times 14 \times (w - 1) = 0$$

Solving gives $w = 1$.
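
As a sanity check, the same differentiate-and-solve step can be done symbolically. A minimal sketch using SymPy (the use of SymPy is an assumption here, not part of the original derivation):

```python
import sympy as sp

w = sp.symbols('w')
data = [(1, 1), (2, 2), (3, 3)]

# J(w) = (1 / (2n)) * sum((w * x_i - y_i)^2) with n = 3.
J = sp.Rational(1, 6) * sum((w * x - y) ** 2 for x, y in data)

# Differentiate, set to zero, and solve for w.
print(sp.solve(sp.diff(J, w), w))  # [1]
```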

Conclusion

The optimal value of $w$ is $1$, so the ideal regression model (function) is: $f(x) = 1 \cdot x = x$
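
More generally, setting the derivative of $J(w)$ to zero for arbitrary data gives a closed-form least-squares solution for a model through the origin, $w = \frac{\sum_i x_i y_i}{\sum_i x_i^2}$, which answers the earlier speculation about avoiding one-by-one trials. A minimal sketch, assuming the same three data points:

```python
# Closed-form least squares for f(x) = w * x: w = sum(x * y) / sum(x^2).
data = [(1, 1), (2, 2), (3, 3)]

w = sum(x * y for x, y in data) / sum(x * x for x, _ in data)
print(w)  # 1.0
```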