Zipfian distribution

Created
Created
2025 Jan 22 11:0
Editor
Creator
Creator
Seonglae ChoSeonglae Cho
Edited
Edited
2025 Jan 22 11:59

Power law that

Dataset exhibiting scale-invariance and self-similarity driven by power-law relationship
a few words occur very often, and many words hardly ever occur
Word frequency is inversely proportional to rank. For example, the most frequent word appears approximately twice as often as the second most frequent word
where is means frequency and indicates the rank in a set of variables controlled by parameter
notion image
Practice follows theory quite well, but not entirely.

What will happen to this plot if we remove stop words from our vocabulary?

 
 
 
 

Recommendations