Shannon entropy

Creator: Seonglae Cho
Created: 2023 Jun 1 6:09
Edited: 2025 Apr 29 2:56

Information Entropy

A characteristic value that captures the shape of a probability distribution and its amount of information. It measures information content: the number of bits actually required to store the data, how random the distribution is, and how broad it is (the uniform distribution has maximum entropy).
The information content of a message is a function of how predictable it is. The information content (number of bits) needed to encode event $i$ is $\log_2(1/p_i) = -\log_2 p_i$. So in Next Token Prediction, the probability itself carries the information content.
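As a concrete illustration (the token probabilities below are assumed, not from the source), a short Python sketch of how many bits each predicted token would cost under its model probability:

```python
import math

# Hypothetical next-token probabilities from a language model (illustrative only)
token_probs = {"the": 0.5, "cat": 0.05, "entropy": 0.001}

for token, p in token_probs.items():
    bits = -math.log2(p)  # information content: log2(1/p) = -log2(p)
    print(f"{token!r}: p={p} -> {bits:.2f} bits")
```

A likely token ("the", p = 0.5) costs 1 bit, while a rare token (p = 0.001) costs about 10 bits.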
The entropy of a message is the expected number of bits needed to encode it: $H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i$ (Shannon entropy)
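A minimal Python sketch of this formula, assuming two toy distributions over four outcomes (the numbers are illustrative, not from the source):

```python
import math

def shannon_entropy(probs):
    # H(p) = -sum_i p_i * log2(p_i); zero-probability outcomes contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # uniform distribution: maximum entropy (2 bits)
skewed  = [0.9, 0.05, 0.03, 0.02]   # predictable distribution: lower entropy

print(shannon_entropy(uniform))  # 2.0
print(shannon_entropy(skewed))   # ~0.62
```

The uniform case reaches the maximum of $\log_2 4 = 2$ bits, matching the note above that the uniform distribution has maximum entropy.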

Average Information Content

The entropy of a probability distribution can be interpreted as a measure of uncertainty, or lack of predictability
$H(p(x)) = -E_{x\sim p(x)}[\log p(x)]$
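Because entropy is an expectation, it can also be estimated by sampling from p(x). A sketch under an assumed toy distribution (not from the source):

```python
import math
import random

# Assumed toy distribution p(x) over four outcomes (illustrative only)
p = [0.5, 0.25, 0.125, 0.125]

# Exact entropy in nats: H(p) = -sum_i p_i * log(p_i)
exact = -sum(pi * math.log(pi) for pi in p)

# Monte Carlo estimate of E_{x~p}[-log p(x)]
random.seed(0)
samples = random.choices(range(len(p)), weights=p, k=100_000)
estimate = sum(-math.log(p[i]) for i in samples) / len(samples)

print(f"exact: {exact:.4f} nats, estimate: {estimate:.4f} nats")
```

The sample average of $-\log p(x)$ converges to $H(p(x))$ as the number of samples grows.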

Information Content of Individual Events

Information content must be additive: for independent events, the total information content equals the sum of the individual events' information contents.
$I(X) = -\log_b P(X)$
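A quick check of this additivity for two independent events (probabilities assumed for illustration): since $P(A \cap B) = P(A)P(B)$, it follows that $-\log_b(P(A)P(B)) = -\log_b P(A) - \log_b P(B)$.

```python
import math

def information(p, base=2):
    # Self-information I(X) = -log_b(P(X))
    return -math.log(p, base)

# Two independent events with assumed probabilities
p_a, p_b = 0.5, 0.25
p_joint = p_a * p_b  # independence: P(A and B) = P(A) * P(B)

print(information(p_a) + information(p_b))  # 1.0 + 2.0 = 3.0 bits
print(information(p_joint))                 # -log2(0.125) = 3.0 bits
```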
