To understand the cross-entropy loss function, we first need to grasp three concepts from information theory: surprisal, expected value, and entropy. Cross-entropy is built from entropy and adapted for use as a loss function in deep neural networks.
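For reference, the standard definitions (writing $p$ for the true distribution over outcomes $x$, and $q$ for the model's predicted distribution) are:

$$I(x) = -\log p(x) \qquad \text{(surprisal)}$$

$$H(p) = \mathbb{E}_{x \sim p}[I(x)] = -\sum_x p(x) \log p(x) \qquad \text{(entropy)}$$

$$H(p, q) = -\sum_x p(x) \log q(x) \qquad \text{(cross-entropy)}$$

Entropy is the expected surprisal under the true distribution; cross-entropy replaces $\log p(x)$ with $\log q(x)$, measuring how surprised we are, on average, when we model $p$ with $q$.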
For classification networks with sigmoid or softmax outputs, the cross-entropy loss function is generally preferred over the mean squared error (MSE) loss function. When the prediction is far from the target, cross-entropy grows much faster than MSE, so the loss surface has a steeper slope there and gradient descent takes larger corrective steps. With MSE, the gradient also carries a factor of the activation's derivative, which shrinks toward zero when the output saturates; cross-entropy avoids this, making minimization of the loss more effective.
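To make the "steeper slope" claim concrete, here is a minimal NumPy sketch for a single binary example with a sigmoid output (the logit values are made up purely for illustration). It compares the two losses and their gradients with respect to the logit $z$: the cross-entropy gradient stays large when the prediction is confidently wrong, while the MSE gradient is damped by the sigmoid's derivative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = 1.0  # target: the positive class

# Logits ranging from "confidently wrong" to "confidently right".
for z in [-4.0, -2.0, 0.0, 2.0, 4.0]:
    p = sigmoid(z)

    # Losses for this single example.
    ce = -(y * np.log(p) + (1 - y) * np.log(1 - p))  # binary cross-entropy
    mse = (p - y) ** 2                               # squared error

    # Gradients with respect to the logit z:
    #   d(CE)/dz  = p - y                  (sigmoid derivative cancels out)
    #   d(MSE)/dz = 2 * (p - y) * p*(1-p)  (sigmoid derivative remains)
    grad_ce = p - y
    grad_mse = 2 * (p - y) * p * (1 - p)

    print(f"z={z:+.1f}  p={p:.3f}  CE={ce:.3f}  MSE={mse:.3f}  "
          f"dCE/dz={grad_ce:+.3f}  dMSE/dz={grad_mse:+.3f}")
```

At $z = -4$ (a confidently wrong prediction), the cross-entropy gradient is about $-0.98$, while the MSE gradient is only about $-0.035$: the saturated sigmoid all but kills the MSE learning signal, which is exactly why cross-entropy trains these networks more effectively.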