LDA

Created: 2023 Jun 11 8:58
Creator: Seonglae Cho
Edited: 2025 Mar 12 12:41

Latent Dirichlet Allocation

Motivation

LDA is an algorithm that improves upon LSA and is well suited to topic modeling.
Assume we have a number of topics, each defined as a distribution over words. A document is generated through the following process: first, we choose a distribution over the topics; then, for each word position, we select a topic assignment and choose a word from the corresponding topic.

Method

  1. For each of the $K$ topics, draw a multinomial distribution $\beta_k$ from a Dirichlet distribution with parameter $\eta$, which controls the mean shape and sparsity of $\beta_k$.
  2. For each of the $D$ documents, draw a multinomial distribution $\theta_d$ from a Dirichlet distribution with parameter $\alpha$, which controls the mean shape and sparsity of $\theta_d$.
  3. For each word position $n$ in a document $d$:
    1. Select a latent topic $z_{d,n}$ from the multinomial distribution $\theta_d$.
    2. Choose the observation $w_{d,n}$ from the multinomial distribution $\beta_{z_{d,n}}$.
$\theta_d$ has $K$ parameters and $\beta_k$ has $V$ parameters, where $V$ is the size of the vocabulary across all documents.
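The generative process above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `generate_corpus`, the fixed document length, the symmetric hyperparameter values, and the variable names (`beta` for the per-topic word distributions, `theta` for the per-document topic distributions) are all assumptions made for the example.

```python
import numpy as np


def generate_corpus(n_topics=3, n_docs=5, vocab_size=20, doc_len=50,
                    alpha=0.5, eta=0.1, seed=0):
    """Sample a toy corpus from the LDA generative process.

    alpha and eta are symmetric Dirichlet parameters (assumed values).
    """
    rng = np.random.default_rng(seed)

    # Step 1: for each topic, draw a word distribution from a Dirichlet.
    beta = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)   # shape (K, V)

    # Step 2: for each document, draw a topic distribution from a Dirichlet.
    theta = rng.dirichlet(np.full(n_topics, alpha), size=n_docs)    # shape (D, K)

    docs, assignments = [], []
    for d in range(n_docs):
        # Step 3.1: select a latent topic for every word position.
        z = rng.choice(n_topics, size=doc_len, p=theta[d])
        # Step 3.2: choose each word from the distribution of its topic.
        w = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(w)
        assignments.append(z)

    return beta, theta, docs, assignments


beta, theta, docs, assignments = generate_corpus()
print(beta.shape, theta.shape, len(docs), len(docs[0]))
```

Smaller values of `alpha` or `eta` concentrate each sampled distribution on fewer topics or words, which is the sparsity effect mentioned in the list above.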

Modeling

The exact posterior is intractable to compute, so we approximate it.
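To see where the difficulty comes from, the posterior can be written with Bayes' rule. The notation is an assumption borrowed from the usual presentation of LDA: $\theta$ for document-topic distributions, $\beta$ for topic-word distributions, $z$ for topic assignments, $w$ for observed words, and $\alpha$, $\eta$ for the Dirichlet parameters.

```latex
p(\theta, z, \beta \mid w, \alpha, \eta)
  = \frac{p(\theta, z, \beta, w \mid \alpha, \eta)}{p(w \mid \alpha, \eta)},
\qquad
p(w \mid \alpha, \eta)
  = \int\!\!\int \sum_{z} p(\theta, z, \beta, w \mid \alpha, \eta)\, d\theta\, d\beta
```

The numerator is easy to evaluate, but the normalizer sums over every possible topic assignment of every word while integrating over $\theta$ and $\beta$, which has no tractable closed form.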

Approximation using Gibbs sampling

  1. Initialize probabilities randomly or uniformly.
  2. In each step, replace the value of one of the variables with a value drawn from the distribution of that variable conditioned on the values of the remaining variables.
  3. Repeat until convergence.
Estimate the probability of assigning $w_i$ to each topic, conditioned on the topic assignments ($z_{-i}$) of all other words (the subscript $-i$ indicating the exclusion of $w_i$).
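In the commonly used collapsed Gibbs sampler (an assumption here, since the note does not say which variant it has in mind), this conditional has a closed form in terms of counts. The symbols are introduced for illustration: $n_{-i,k}^{(w_i)}$ is the number of times word $w_i$ is assigned to topic $k$, $n_{-i,k}^{(\cdot)}$ is the total number of words assigned to topic $k$, and $n_{-i,k}^{(d_i)}$ is the number of words in document $d_i$ assigned to topic $k$, all excluding position $i$; $\alpha$ and $\eta$ are symmetric Dirichlet parameters and $V$ is the vocabulary size.

```latex
P(z_i = k \mid z_{-i}, w) \;\propto\;
  \frac{n_{-i,k}^{(w_i)} + \eta}{n_{-i,k}^{(\cdot)} + V\eta}
  \,\bigl(n_{-i,k}^{(d_i)} + \alpha\bigr)
```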
From the above conditional distribution, sample a topic and set it as the new topic assignment of $w_i$.
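The sampling loop can be sketched as follows. This is a minimal collapsed Gibbs sampler written for illustration, under several assumptions not stated in the note: symmetric hyperparameters `alpha` and `eta`, documents given as lists of integer word ids, random initialization, and a fixed number of sweeps (`n_iters`) standing in for a real convergence check. The function name `gibbs_lda` and the count-array names are hypothetical.

```python
import numpy as np


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, eta=0.01,
              n_iters=50, seed=0):
    """Collapsed Gibbs sampling sketch for LDA on lists of word ids."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))    # words in doc d assigned to topic k
    n_kw = np.zeros((n_topics, vocab_size))   # times word w is assigned to topic k
    n_k = np.zeros(n_topics)                  # total words assigned to topic k

    # Initialize topic assignments randomly and build the count tables.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Exclude the current word from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kw[k, w] -= 1
                n_k[k] -= 1

                # Conditional over topics given all other assignments.
                p = (n_kw[:, w] + eta) / (n_k + vocab_size * eta) * (n_dk[d] + alpha)
                k = rng.choice(n_topics, p=p / p.sum())

                # Record the newly sampled topic and restore the counts.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kw[k, w] += 1
                n_k[k] += 1

    return z, n_dk, n_kw


toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3], [0, 1, 3, 4]]
z, n_dk, n_kw = gibbs_lda(toy_docs, n_topics=2, vocab_size=6)
print(n_dk)
```

After sampling, normalizing the rows of `n_dk + alpha` and `n_kw + eta` gives point estimates of the document-topic and topic-word distributions.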


Online example

mimno.infosci.cornell.edu
www.jmlr.org
[ML] PCA (Principal Component Analysis), SVD, LDA by Fisher
