To understand the cross-entropy loss function, we first need to grasp three concepts from information theory: surprisal, expected value, and entropy. Cross-entropy is built from entropy and adapted for use as a loss function in deep neural networks.
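For reference, the standard definitions (writing $p$ for the true distribution over outcomes $x$, and $q$ for the model's predicted distribution) are:

$$I(x) = -\log p(x) \qquad \text{(surprisal)}$$

$$H(p) = \mathbb{E}_{x \sim p}[I(x)] = -\sum_x p(x) \log p(x) \qquad \text{(entropy)}$$

$$H(p, q) = -\sum_x p(x) \log q(x) \qquad \text{(cross-entropy)}$$

Entropy is the expected surprisal under the true distribution; cross-entropy replaces $\log p(x)$ with $\log q(x)$, measuring how surprised we are, on average, when we model $p$ with $q$.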
For classification networks with sigmoid or softmax outputs, the cross-entropy loss function is generally preferred over the mean squared error (MSE) loss function. When the prediction is far from the target, cross-entropy grows much faster than MSE, so the loss surface has a steeper slope there and gradient descent takes larger corrective steps. With MSE, the gradient also carries a factor of the activation's derivative, which shrinks toward zero when the output saturates; cross-entropy avoids this, making minimization of the loss more effective.
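To make the "steeper slope" claim concrete, here is a minimal NumPy sketch for a single binary example with a sigmoid output (the logit values are made up purely for illustration). It compares the two losses and their gradients with respect to the logit $z$: the cross-entropy gradient stays large when the prediction is confidently wrong, while the MSE gradient is damped by the sigmoid's derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = 1.0  # target: the positive class

# Logits ranging from "confidently wrong" to "confidently right".
for z in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    p = sigmoid(z)

    # Losses for this single example.
    ce = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary cross-entropy
    mse = (p - y) ** 2                               # squared error

    # Gradients with respect to the logit z:
    #   d(CE)/dz  = p - y                  (sigmoid derivative cancels out)
    #   d(MSE)/dz = 2 * (p - y) * p*(1-p)  (sigmoid derivative remains)
    grad_ce = p - y
    grad_mse = 2 * (p - y) * p * (1 - p)

    print(f"z={z:+.1f}  p={p:.3f}  CE={ce:.3f}  MSE={mse:.3f}  "
          f"dCE/dz={grad_ce:+.3f}  dMSE/dz={grad_mse:+.3f}")
```

At $z = -4$ (a confidently wrong prediction), the cross-entropy gradient is about $-0.98$, while the MSE gradient is only about $-0.035$: the saturated sigmoid all but kills the MSE learning signal, which is exactly why cross-entropy trains these networks more effectively.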