Gaussian Mixture Model (Mixture of Gaussians, MoG)

The Gaussian mixture distribution is a linear superposition of Gaussians. Mixture models can be used to build complex distributions and to cluster data.
$$p(x) = \sum_{k=1}^{K} \pi_k N(x \mid \mu_k, \Sigma_k)$$

We assume that each data point is generated from one particular Gaussian component, and introduce latent variables z to encode which one.

Latent Variable

We introduce a K-dimensional binary random variable z in which exactly one element $z_k$ equals 1 and the others are all 0, so z follows a multinomial (categorical) distribution:

$$p(z_k = 1) = \pi_k, \qquad p(z) = \prod_{k=1}^{K} \pi_k^{z_k}, \qquad \sum_{k=1}^{K} \pi_k = 1$$
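Drawing such a one-hot z amounts to a single categorical draw with probabilities $\pi$. A minimal NumPy sketch (the mixing coefficients `pi` below are illustrative values, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])   # mixing coefficients, sum to 1 (illustrative)

# Draw a one-hot K-dimensional z: exactly one z_k equals 1.
k = rng.choice(len(pi), p=pi)    # index of the active component
z = np.zeros(len(pi), dtype=int)
z[k] = 1
```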
The conditional distribution of x given z is

$$p(x \mid z_k = 1) = N(x \mid \mu_k, \Sigma_k), \qquad p(x \mid z) = \prod_{k=1}^{K} N(x \mid \mu_k, \Sigma_k)^{z_k}$$

Therefore we can derive the first equation easily using
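Together, $p(z)$ and $p(x \mid z)$ define ancestral sampling from the mixture: first draw a component index with probability $\pi_k$, then draw x from that component's Gaussian. A 1-D sketch (all parameter values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
pi = np.array([0.5, 0.3, 0.2])     # mixing coefficients (illustrative)
mu = np.array([-2.0, 0.0, 3.0])    # component means (illustrative)
sigma = np.array([0.5, 1.0, 0.8])  # component std devs (illustrative)

def sample_gmm(n):
    # z ~ Categorical(pi): choose a component for each sample
    ks = rng.choice(len(pi), size=n, p=pi)
    # x | z_k = 1 ~ N(mu_k, sigma_k^2)
    return rng.normal(mu[ks], sigma[ks])

x = sample_gmm(10_000)
```

By the law of total expectation the sample mean should be close to $\sum_k \pi_k \mu_k = -0.4$ here.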
Marginalization

$$p(x) = \sum_{z} p(x \mid z)\, p(z) = \sum_{z} \prod_{k=1}^{K} N(x \mid \mu_k, \Sigma_k)^{z_k} \prod_{k=1}^{K} \pi_k^{z_k} = \sum_{k=1}^{K} \pi_k N(x \mid \mu_k, \Sigma_k)$$

Responsibility

$\gamma(z_{nk})$ is the responsibility of component $k$ for $x_n$:

$$\gamma(z_k) = p(z_k = 1 \mid x) = \frac{p(x \mid z_k = 1)\, p(z_k = 1)}{p(x)} = \frac{\pi_k N(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x \mid \mu_j, \Sigma_j)}$$

Likelihood

$$p(X \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} p(x_n \mid \pi, \mu, \Sigma) = \prod_{n=1}^{N} \sum_{k=1}^{K} p(x_n \mid z = k)\, p(z = k) = \prod_{n=1}^{N} \sum_{k=1}^{K} \pi_k N(x_n \mid \mu_k, \Sigma_k)$$

Log Likelihood

$$\log p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \log \left( \sum_{k=1}^{K} \pi_k N(x_n \mid \mu_k, \Sigma_k) \right)$$
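The responsibilities and the log likelihood share the same per-point weighted densities $\pi_k N(x_n \mid \mu_k, \Sigma_k)$, so both can be computed in one pass. A 1-D NumPy sketch (the function name and all parameter values are illustrative, not from the text):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    # 1-D normal density N(x | mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def responsibilities_and_ll(x, pi, mu, sigma):
    # weighted[n, k] = pi_k * N(x_n | mu_k, sigma_k^2), via broadcasting
    weighted = pi * gaussian_pdf(x[:, None], mu[None, :], sigma[None, :])
    px = weighted.sum(axis=1)          # p(x_n): denominator of gamma
    gamma = weighted / px[:, None]     # gamma(z_nk), rows sum to 1
    ll = np.log(px).sum()              # sum_n log p(x_n)
    return gamma, ll

# Illustrative two-component mixture
pi = np.array([0.5, 0.5])
mu = np.array([-2.0, 2.0])
sigma = np.array([1.0, 1.0])
x = np.array([-2.0, 0.0, 2.0])
gamma, ll = responsibilities_and_ll(x, pi, mu, sigma)
```

A point midway between two symmetric components gets responsibility 0.5 from each, while a point at a component mean is assigned almost entirely to that component.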