Ben Chuanlong Du's Blog

It is never too late to learn.

Loss Functions for Machine Learning Models

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

Tips and Traps

  1. A loss function is always non-negative. If you get a negative loss while training a model, something must be wrong with the code. For example, you may have chosen an inappropriate loss function.

Loss Functions

0-1 Loss

Also sometimes called the binary loss function. It is 1 for a misclassified sample and 0 for a correctly classified one, so its average over a dataset is the misclassification rate.
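A minimal sketch of the 0-1 loss in plain Python (the labels below are illustrative):

```python
def zero_one_loss(y_true, y_pred):
    """0-1 loss: count 1 for each misclassified sample, 0 otherwise."""
    return sum(int(t != p) for t, p in zip(y_true, y_pred))


y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
# The mean 0-1 loss is the misclassification rate.
rate = zero_one_loss(y_true, y_pred) / len(y_true)
```

Note that the 0-1 loss is not differentiable, which is why smooth surrogates (such as cross entropy) are used for gradient-based training.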



Negative Log Likelihood
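For classification, the negative log likelihood (NLL) averages \(-\log p\) over the predicted probabilities of the true classes. A minimal pure-Python sketch (the probabilities and labels are illustrative):

```python
import math


def nll(probs, labels):
    """Average negative log likelihood of the true labels.

    probs: list of per-sample probability vectors (each sums to 1).
    labels: list of true class indices.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
loss = nll(probs, labels)
```

Since each probability is at most 1, every \(-\log p\) term is non-negative, which is consistent with the tip above that the loss should never be negative.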

Comparisons of Loss Functions

Cross Entropy vs Negative Log Likelihood

Please refer to Entropy for detailed discussions.
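As a quick illustration of the relationship, the cross entropy loss computed from raw logits equals the negative log likelihood of the softmax probabilities. A sketch (the logit values are illustrative):

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def cross_entropy(logits, label):
    """Cross entropy from raw logits, via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[label]


logits = [2.0, 1.0, 0.1]
label = 0
# cross_entropy(logits, label) == -log(softmax(logits)[label])
```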

MSE (L2 Loss) vs L1 Loss

MSE is an L2 loss function. Both the L1 and L2 loss functions are special cases of the \(L_p\) (\(p>0\)) loss functions, which are typically used for regression problems. Compared to the L1 loss, the L2 loss puts larger weights on larger (absolute) errors. In many real applications, we often observe the following 2 phenomena.

  1. Real/training data is not uniformly distributed across all scenarios. There are often a lot more samples which generate small response values.
  2. During training, large response values often have larger errors.

If samples generating larger response values (errors) are not important (or can be treated as outliers), then an L1 loss (or even an \(L_p\) loss with \(0<p<1\)) is a better choice than an L2 loss. If samples generating larger response values (errors) cannot be treated as outliers, or are even more important, then an L2 loss (or even an \(L_p\) loss with \(p>1\)) is better than an L1 loss.
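The outlier sensitivity can be seen in a small sketch: with one large error among several small ones (the values below are made up for illustration), the L2 loss is dominated by the outlier far more than the L1 loss.

```python
def l1_loss(errors):
    """Mean absolute error over a list of residuals."""
    return sum(abs(e) for e in errors) / len(errors)


def l2_loss(errors):
    """Mean squared error over a list of residuals."""
    return sum(e * e for e in errors) / len(errors)


errors = [0.1, -0.2, 0.1, 5.0]  # one outlier among small residuals
# l1_loss(errors) -> 1.35, while l2_loss(errors) -> 6.265:
# squaring lets the single outlier dominate the L2 loss.
```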

Loss Functions in PyTorch
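A short sketch of some common PyTorch loss modules (the tensors are illustrative). Note that `nn.CrossEntropyLoss` expects raw logits, while `nn.NLLLoss` expects log-probabilities, so the two agree when `NLLLoss` is fed `log_softmax` of the logits.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels = torch.tensor([0, 1])

# CrossEntropyLoss takes raw logits directly ...
ce = nn.CrossEntropyLoss()(logits, labels)
# ... while NLLLoss takes log-probabilities.
nl = nn.NLLLoss()(torch.log_softmax(logits, dim=1), labels)

# Regression losses: MSELoss (L2) vs L1Loss.
pred = torch.tensor([1.0, 2.0])
target = torch.tensor([1.5, 2.0])
mse = nn.MSELoss()(pred, target)  # mean of squared errors
l1 = nn.L1Loss()(pred, target)    # mean of absolute errors
```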




PyTorch Loss Functions: The Ultimate Guide