Cross-entropy is widely used in modern ML to compute the loss for classification tasks. This post is a brief overview of the math behind it and of a related concept called Kullback-Leibler (KL) divergence.

Information content of a single random event

We'll start with a single event (E) that has probability p. The information content (or "degree of surprise") of this event occurring is defined as:

\[I(E) = \log_2 \left (\frac{1}{p} \right )\]

The base 2 is used so that we can count the information in units of bits. Thinking about this definition intuitively: imagine an event with probability p=1; using the formula, the information we gain by observing this event is 0, which makes sense. At the other extreme, as the probability p approaches 0, the information we gain is huge. An equivalent way to write the formula is:

\[I(E) = -\log_2 p\]

Some numeric examples: suppose we flip a fair coin and it comes out heads. The probability of this event is 1/2, therefore:

\[I(E_{heads})=-\log_2 \frac{1}{2} = 1\]

Now suppose we roll a fair die and it lands on 4. The probability of this event is 1/6, therefore:

\[I(E_4)=-\log_2 \frac{1}{6} = 2.58\]

In other words, the degree of surprise for rolling a 4 is higher than the degree of surprise for flipping heads, which makes sense given the probabilities involved.

Besides behaving correctly for boundary values, the logarithm makes sense for calculating the degree of surprise for another important reason: the way it behaves for a combination of events. Consider this: we flip a fair coin and roll a fair die; the coin comes out heads, and the die lands on 4. What is the probability of this combined event? Because the two events are independent, it's the product of the probabilities of the individual events, so 1/12, and then:

\[I(E_{heads}\cap E_{4})=-\log_2 \frac{1}{12} = 3.58\]

Note that the information content of the combined event is precisely the sum of the information contents of the individual events. This is to be expected, since the logarithm turns a product of probabilities into a sum:

\[-\log_2 (p_1 \cdot p_2) = -\log_2 p_1 - \log_2 p_2\]
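To make these numbers concrete, here is a minimal sketch in Python using only the standard library; the function name information_content is chosen for illustration and is not from the original post:

```python
import math

def information_content(p: float) -> float:
    """Information content (in bits) of an event with probability p."""
    return -math.log2(p)

# Fair coin comes out heads: p = 1/2 -> 1 bit of information.
print(information_content(1 / 2))                       # 1.0

# Fair die lands on 4: p = 1/6 -> about 2.58 bits.
print(information_content(1 / 6))                       # 2.584...

# Independent events: probabilities multiply, so information adds.
print(information_content((1 / 2) * (1 / 6)))           # 3.584...
print(information_content(1 / 2) + information_content(1 / 6))  # 3.584...
```

The last two lines print the same value, illustrating the additivity property discussed above.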