The information content of a message is a function of how predictable it is: the less probable a message, the more information it carries. The information content (number of bits) needed to encode an outcome i with probability p(i) is -log2 p(i). So a next-token prediction probability directly determines the information content of that token.
The Shannon entropy of a source is the expected number of bits needed to encode a message drawn from it: H = -sum_i p(i) log2 p(i).
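The two definitions above can be sketched in a few lines of code; this is a minimal illustration, with a fair and a biased coin as assumed example distributions:

```python
import math

def information_content(p):
    # Bits needed to encode an outcome with probability p: -log2 p
    return -math.log2(p)

def entropy(probs):
    # Expected information content over a distribution (Shannon entropy)
    return sum(p * information_content(p) for p in probs if p > 0)

# Fair coin: each outcome carries exactly 1 bit, so the entropy is 1 bit.
print(information_content(0.5))  # 1.0
print(entropy([0.5, 0.5]))       # 1.0

# Biased coin: the rare outcome carries more information (~3.32 bits),
# but the entropy -- the average -- drops below 1 bit (~0.47).
print(information_content(0.1))
print(entropy([0.9, 0.1]))
```

Note how the rare outcome is individually more informative, yet the biased source as a whole is more predictable and therefore has lower entropy.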