Information Entropy
A characteristic value that captures the shape of a probability distribution and its amount of information. It measures information content: the number of bits actually required to store the data, how random the distribution is, and how broad it is (a uniform distribution has maximum entropy).
The information content of a message is a function of how predictable it is. The information content (number of bits) needed to encode an outcome i with probability p(i) is I(i) = −log₂ p(i). So a next-token prediction probability itself carries information content.
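As a minimal sketch of this formula (the probabilities below are made-up examples, not from any real model), the information content of an event in bits:

```python
import math

def information_content(p: float) -> float:
    """Bits needed to encode an event with probability p: I(p) = -log2(p)."""
    return -math.log2(p)

# A likely token carries little information; a rare token carries a lot.
print(information_content(0.5))    # 1.0 bit
print(information_content(0.125))  # 3.0 bits
```

A token the model predicts with probability 0.5 contributes exactly one bit; halving the probability adds one bit each time.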
Average Information Content
The entropy of a probability distribution can be interpreted as a measure of uncertainty, or lack of predictability: it is the expected information content over all outcomes, H = −Σᵢ p(i) log₂ p(i).
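A short sketch of this average (the two example distributions are assumptions for illustration), showing that the uniform distribution maximizes entropy:

```python
import math

def entropy(dist):
    """Shannon entropy H = -sum p*log2(p): the average information content."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximally unpredictable
skewed  = [0.7, 0.1, 0.1, 0.1]      # more predictable, less entropy

print(entropy(uniform))  # 2.0 bits (the maximum for 4 outcomes)
print(entropy(skewed))   # below 2.0 bits
```

The `if p > 0` guard uses the standard convention 0·log 0 = 0, so zero-probability outcomes contribute nothing.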
Information Content of Individual Events
Information content must be additive: the total information of independent events equals the sum of their individual information contents. This is what forces the logarithm in the definition, since independent probabilities multiply while their information must add.
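A quick numerical check of additivity (the two event probabilities are arbitrary assumptions): because the logarithm turns the product of independent probabilities into a sum, the joint information equals the sum of the individual ones.

```python
import math

def information_content(p: float) -> float:
    """I(p) = -log2(p), in bits."""
    return -math.log2(p)

p_a, p_b = 0.5, 0.25       # two independent events (example values)
p_joint = p_a * p_b        # independence: probabilities multiply

# -log2(p_a * p_b) == -log2(p_a) + -log2(p_b)
assert math.isclose(information_content(p_joint),
                    information_content(p_a) + information_content(p_b))
```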
Information Entropy (정보 엔트로피) - Angelo's Math Notes (공돌이의 수학정리노트)
https://angeloyeo.github.io/2020/10/26/information_entropy.html

Seonglae Cho