A topic model uncovers hidden (latent) topical patterns or semantic structure in a collection of documents. Quantitatively, a topic is a group (or cluster) of words (terms, n-grams) that are somehow related. A topic model is often defined as a probabilistic structure (e.g. a word cluster) expressing a certain set of assumptions about how the documents in our collection were generated.
Topic modeling methods
Pointwise mutual information
Words that occur in similar contexts (co-occur) tend to have similar meanings.
For two words x and y, pointwise mutual information (PMI) is the log of a ratio of probabilities:
PMI(x, y) = log( P(x, y) / (P(x) P(y)) )
- Numerator: How often we have seen these words together
- Denominator: How often we expect the words to co-occur, assuming they are independent
- PMI: how much more often two words co-occur than expected by chance
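
The numerator and denominator above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes that "co-occur" means "appear in the same document", that probabilities are estimated as plain document frequencies with no smoothing, and that the log is base 2. The function name `pmi_scores` and the toy corpus are made up for the example.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(docs):
    """Return PMI (log base 2) for every word pair that co-occurs in a document.

    Assumption: each document is a list of tokens, and two words co-occur
    when they appear in the same document at least once.
    """
    n_docs = len(docs)
    word_counts = Counter()  # number of documents containing each word
    pair_counts = Counter()  # number of documents containing both words

    for doc in docs:
        vocab = sorted(set(doc))  # count each word once per document
        word_counts.update(vocab)
        pair_counts.update(combinations(vocab, 2))

    scores = {}
    for (w1, w2), joint in pair_counts.items():
        # Numerator: how often we have seen the two words together.
        p_joint = joint / n_docs
        # Denominator: how often we would expect them together if independent.
        p_expected = (word_counts[w1] / n_docs) * (word_counts[w2] / n_docs)
        scores[(w1, w2)] = math.log2(p_joint / p_expected)
    return scores


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["cat", "dog"],
    ["cat", "dog"],
    ["cat", "fish"],
    ["bird", "fish"],
]
scores = pmi_scores(docs)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Pairs are stored with the two words in alphabetical order, so look up `scores[("cat", "dog")]` rather than `("dog", "cat")`. A positive score means the pair co-occurs more often than independence would predict, and a negative score means less often. Pairs that never co-occur are simply absent from the result, since their PMI would be the log of zero.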