LSA

Created
2024 Feb 24 2:12
Editor
Creator
Seonglae Cho
Edited
2025 Mar 12 12:19
Refs
LDA

Latent Semantic Analysis

Apply a simple latent-factor technique such as SVD to the term-document matrix $X$, which records how often each term occurs in each document: $X \approx U \Sigma V^\top$, keeping only the largest singular values.
  • $U$: each topic’s distribution over terms
  • $\Sigma$: diagonal matrix, can be seen as a topic importance / weight
  • $V^\top$: each document’s distribution over topics
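
The factorization above can be sketched with NumPy. This is a minimal illustration, not a reference implementation: the toy term-document matrix `X`, the term list, and the choice `k = 2` are all assumptions made up for the example.

```python
import numpy as np

# Hypothetical toy corpus: rows are terms, columns are documents.
terms = ["cat", "dog", "pet", "stock", "market"]
X = np.array([
    [2, 1, 0, 0],   # cat
    [1, 2, 0, 0],   # dog
    [1, 1, 0, 1],   # pet
    [0, 0, 2, 1],   # stock
    [0, 0, 1, 2],   # market
], dtype=float)

# Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # number of latent topics to keep (an arbitrary choice here)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Best rank-k approximation of X in the Frobenius norm.
X_k = U_k @ np.diag(s_k) @ Vt_k

# Document coordinates in topic space, one column per document.
doc_topics = np.diag(s_k) @ Vt_k
```

Each column of `U_k` weights the terms for one topic, `s_k` ranks the topics by importance, and each column of `doc_topics` places one document in the `k`-dimensional topic space. Entries can be negative, so these are weights rather than probabilities.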

Cons

  • SVD has a significant computational cost
  • No intuition about the origin of the topics

pLSA (Probabilistic LSA)

The topic distribution that characterizes a document in our collection determines which words are likely to appear in it.
In a document $d$, each word $w$ is generated from a single topic $z$ chosen from the $K$ assumed topics, and given that topic, the word is independent of all the other words in the document.
We derive the model by rewriting the Joint Distribution for $d$ and $w$ (a single word in the document) using the Conditional probability distribution based on Bayes Theorem
→ Joint distribution for $d$ and $\mathbf{w}$ (all words in the document)
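
The derivation can be written out as follows. This is a sketch in the standard pLSA notation, which is assumed here: $d$ is a document, $w$ a word, $z$ a latent topic, and $N_d$ the number of words in $d$.

```latex
% Single word: marginalize over the latent topic z,
% using the assumption that w is independent of d given z.
P(d, w) = P(d)\, P(w \mid d) = P(d) \sum_{z} P(z \mid d)\, P(w \mid z)

% Bayes' theorem, P(d)\, P(z \mid d) = P(z)\, P(d \mid z),
% gives the equivalent symmetric form.
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)

% All words: words are conditionally independent, so the per-word terms multiply.
P(d, \mathbf{w}) = P(d) \prod_{n=1}^{N_d} \sum_{z} P(z \mid d)\, P(w_n \mid z)
```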

Solution

The parameters are estimated with the EM Algorithm
(this fits a probabilistic model, unlike the deterministic SVD of the traditional LSA approach)
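
A minimal sketch of EM for pLSA in NumPy, under assumptions: the function name `plsa_em`, the toy `counts` matrix (documents in rows, terms in columns), and the small constant added to avoid division by zero are all illustrative choices, not taken from a library.

```python
import numpy as np

def plsa_em(counts, K, n_iter=50, seed=0):
    """Fit pLSA on a document-term count matrix (D x V) with K topics."""
    rng = np.random.default_rng(seed)
    D, V = counts.shape
    # Random initialization; each row is normalized to sum to 1.
    p_z_d = rng.random((D, K))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)   # P(z|d)
    p_w_z = rng.random((K, V))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)   # P(w|z)

    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z); shape (D, K, V).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)

        # M-step: re-estimate both tables from expected counts n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * resp     # (D, K, V)
        p_w_z = expected.sum(axis=0)             # (K, V)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)             # (D, K)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Hypothetical toy data: 4 documents, 5 vocabulary terms.
counts = np.array([
    [2, 1, 1, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 1, 1, 2],
], dtype=float)
p_z_d, p_w_z = plsa_em(counts, K=2)
```

The result depends on the random initialization, so different seeds can converge to different local optima. The dense `(D, K, V)` array keeps the code short but would not scale to a real corpus.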

Cons

  • The number of parameters grows linearly with the number of documents, because a topic distribution is learned for every document
  • Handling a new document requires running EM again
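
The linear growth can be made concrete with a parameter count. This is a sketch under the usual parameterization, with symbols assumed here: $D$ documents, $V$ vocabulary terms, and $K$ topics.

```latex
% Free parameters of P(w | z): K distributions over V terms.
% Free parameters of P(z | d): D distributions over K topics.
\#\text{params} = \underbrace{K (V - 1)}_{P(w \mid z)} + \underbrace{D (K - 1)}_{P(z \mid d)}
```

The first term is fixed once the vocabulary and topic count are chosen, while the second adds $K - 1$ parameters for every document in the collection.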
 
 
 
 
