LDA

Created: 2023 Jun 11 8:58
Creator: Seonglae Cho
Edited: 2025 Mar 12 12:41

Latent Dirichlet Allocation

Motivation

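LDA is an algorithm that improves upon LSA (more precisely, its probabilistic variant pLSA) by placing a Dirichlet prior over each document's topic mixture, and it is well suited to topic modeling.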
Let's assume we have a number of topics, each defined as a distribution over words. A document is generated through the following process: first, we choose a distribution over the topics. Then, for each word position, we draw a topic assignment from that distribution and choose a word from the corresponding topic.

Method

  1. For each of the $K$ topics, draw a multinomial distribution over words $\beta_k$ from a Dirichlet distribution with parameter $\eta$, which controls the mean shape and sparsity of $\beta_k$.
  2. For each of the $D$ documents, draw a multinomial distribution over topics $\theta_d$ from a Dirichlet distribution with parameter $\alpha$, which controls the mean shape and sparsity of $\theta_d$.
  3. For each word position $n$ in document $d$:
    1. Select a latent topic $z_{d,n}$ from the multinomial distribution $\theta_d$.
    2. Choose the observed word $w_{d,n}$ from the multinomial distribution $\beta_{z_{d,n}}$.
$\theta_d$ and $\alpha$ have $K$ components, while $\beta_k$ and $\eta$ have $V$ components, where $V$ is the size of the vocabulary across all documents.
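A minimal NumPy sketch of the generative process above, useful for producing toy data. It assumes symmetric Dirichlet priors with hypothetical defaults `alpha=0.5` and `eta=0.1`, and calls the per-topic word distributions `beta` and the per-document topic proportions `theta`; these names and values are illustrative, not from the original note.

```python
import numpy as np

def generate_corpus(n_docs, doc_len, n_topics, vocab_size, alpha=0.5, eta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process (symmetric priors assumed).

    alpha: Dirichlet parameter for per-document topic proportions (theta_d)
    eta:   Dirichlet parameter for per-topic word distributions (beta_k)
    """
    rng = np.random.default_rng(seed)
    # Step 1: for each topic k, draw a word distribution beta_k ~ Dir(eta)
    beta = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)  # shape (K, V)
    docs, topics, thetas = [], [], []
    for _ in range(n_docs):
        # Step 2: for each document d, draw topic proportions theta_d ~ Dir(alpha)
        theta = rng.dirichlet(np.full(n_topics, alpha))             # shape (K,)
        # Step 3a: for each word position, select a latent topic z ~ Categorical(theta_d)
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # Step 3b: choose the observed word w ~ Categorical(beta_z)
        w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(w)
        topics.append(z)
        thetas.append(theta)
    return docs, topics, beta, np.array(thetas)
```

Sampling a corpus this way and then running an inference method on it is a common sanity check, since the true `theta` and `beta` are known.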

Modeling

The exact posterior over the latent variables is intractable to compute, so we approximate it.
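The reason can be written out with Bayes' rule. Using the labels $\theta_d$ (topic proportions of document $d$), $\beta_k$ (word distribution of topic $k$), $z$ (topic assignments), $\mathbf{w}$ (observed words), and Dirichlet parameters $\alpha$ and $\eta$ (notation assumed here), the posterior needs the marginal likelihood of the corpus as its normalizer:

```latex
p(\theta, \beta, z \mid \mathbf{w}, \alpha, \eta)
  = \frac{p(\theta, \beta, z, \mathbf{w} \mid \alpha, \eta)}{p(\mathbf{w} \mid \alpha, \eta)},
\qquad
p(\mathbf{w} \mid \alpha, \eta)
  = \int\!\!\int \prod_{k=1}^{K} p(\beta_k \mid \eta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} \sum_{k=1}^{K} \theta_{d,k}\, \beta_{k, w_{d,n}}
    \; d\theta \, d\beta
```

Summing out each $z_{d,n}$ produces the inner sum, and because $\theta$ and $\beta$ are multiplied together inside it, the integral has no closed form; expanding the product yields a number of terms exponential in the number of words.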

Approximation using Gibbs sampling

  1. Initialize the variables randomly or uniformly (for LDA, the variables are the topic assignments of all words)
  2. In each step, replace the value of one variable with a value drawn from its distribution conditioned on the current values of all the remaining variables
  3. Repeat until the chain converges
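The three generic steps can be seen on a toy target that has nothing to do with LDA: a standard bivariate normal with correlation `rho`, chosen here only because both full conditionals are known in closed form ($x \mid y \sim \mathcal{N}(\rho y, 1-\rho^2)$ and symmetrically). The fixed `burn_in` is an assumed stand-in for a real convergence check.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, burn_in=500, seed=0):
    """Gibbs sampling from a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    x, y = rng.normal(), rng.normal()   # step 1: random initialization
    sd = np.sqrt(1.0 - rho ** 2)        # std. dev. of each full conditional
    samples = []
    for t in range(n_iter):
        # step 2: resample one variable at a time from its full conditional
        x = rng.normal(rho * y, sd)     # x | y ~ N(rho * y, 1 - rho^2)
        y = rng.normal(rho * x, sd)     # y | x ~ N(rho * x, 1 - rho^2)
        # step 3: keep only draws made after the assumed burn-in period
        if t >= burn_in:
            samples.append((x, y))
    return np.array(samples)
```

Each update uses the most recent value of the other variable, which is what distinguishes Gibbs sampling from drawing both coordinates independently.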
Estimate the probability of assigning word $w_i$ to each topic, conditioned on the topic assignments ($z_{-i}$) of all other words (the subscript $-i$ indicates the exclusion of $w_i$).
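In the collapsed variant of Gibbs sampling popularized by Griffiths and Steyvers, where $\theta$ and $\beta$ are integrated out, this conditional has a simple count-based form. Assuming symmetric Dirichlet priors $\alpha$ (topic proportions) and $\eta$ (topic-word distributions), with $d$ the document containing word $i$ and $V$ the vocabulary size:

```latex
p(z_i = k \mid z_{-i}, \mathbf{w})
  \;\propto\;
  \left(n_{d,k}^{-i} + \alpha\right)
  \frac{n_{k,w_i}^{-i} + \eta}{n_{k}^{-i} + V\eta}
```

Here $n_{d,k}^{-i}$ counts words in document $d$ assigned to topic $k$, $n_{k,w_i}^{-i}$ counts how often word type $w_i$ is assigned to topic $k$ across the corpus, and $n_k^{-i}$ is the total number of words assigned to topic $k$, all excluding word $i$. The document-length normalizer is the same for every $k$, so it is dropped. The first factor favors topics already common in the document; the second favors topics that already use this word.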
From the above conditional distribution, sample a topic and set it as the new topic assignment of $w_i$.
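A compact, unoptimized sketch of a collapsed Gibbs sampler for LDA, assuming symmetric priors, documents given as arrays of integer word ids, and a fixed number of sweeps `n_iter` in place of a convergence test. The function names `collapsed_gibbs_lda` and `point_estimates` are hypothetical, not part of any library.

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, eta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric Dirichlet priors.

    docs: list of 1-D integer arrays with word ids in [0, vocab_size).
    Returns topic assignments z plus doc-topic (n_dk) and topic-word (n_kw) counts.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # words in doc d assigned to topic k
    n_kw = np.zeros((n_topics, vocab_size))  # times word w is assigned to topic k
    n_k = np.zeros(n_topics)                 # total words assigned to topic k
    # Random initialization of every word's topic assignment
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove word i from the counts (the "-i" in z_{-i})
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Unnormalized full conditional p(z_i = k | z_{-i}, w)
                p = (n_dk[d] + alpha) * (n_kw[:, w] + eta) / (n_k + vocab_size * eta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new topic and add word i back into the counts
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw

def point_estimates(n_dk, n_kw, alpha, eta):
    """Estimate theta (doc-topic) and beta (topic-word) from a single final sample's counts."""
    theta = (n_dk + alpha) / (n_dk + alpha).sum(axis=1, keepdims=True)
    beta = (n_kw + eta) / (n_kw + eta).sum(axis=1, keepdims=True)
    return theta, beta
```

Real implementations keep the same count bookkeeping but use much faster inner loops; this version favors readability.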

Comparison


Online example


Recommendations