A Gaussian mixture model (GMM) is parameterized by the mean vector and covariance matrix of each Gaussian component, which we estimate from data
K-means clustering emerges as a limiting case when all components share the same covariance and the variance shrinks toward 0
Mixture models can be used both to build complex distributions and to cluster data
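As a concrete illustration of "building a complex distribution", the sketch below evaluates and samples a toy 1-D mixture of two Gaussians (the weights, means, and variances are made-up illustration values, not from the notes):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    # Density of a univariate Gaussian N(mu, var)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Hypothetical mixture parameters: p(x) = sum_k pi_k * N(x | mu_k, var_k)
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
variances = np.array([0.5, 1.0])

def mixture_pdf(x):
    # Weighted sum of component densities
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))

# Sampling: first pick a component by its weight, then draw from that Gaussian
rng = np.random.default_rng(0)
ks = rng.choice(2, size=1000, p=weights)
samples = rng.normal(means[ks], np.sqrt(variances[ks]))
```

The resulting density is bimodal, something no single Gaussian can represent.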
Solution
Setting the derivative of the GMM's log likelihood to zero does not yield a closed-form solution, because the parameters and the responsibilities depend on each other. We therefore need an iterative algorithm such as the EM algorithm.
- E step: fix the distributions' parameters and compute the responsibilities (the probability that each point belongs to each cluster) using the current parameter values
- M step: fix the responsibilities and re-estimate the parameters using those responsibilities
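In standard GMM notation, the two steps work with the following quantities (here γ(z_nk) is the responsibility of component k for point x_n):

```latex
% Log likelihood being maximized
\ln p(X \mid \pi, \mu, \Sigma)
  = \sum_{n=1}^{N} \ln \left( \sum_{k=1}^{K} \pi_k \,
      \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right)

% E step: responsibilities under the current parameters
\gamma(z_{nk})
  = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
         {\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}

% M step: re-estimates using the current responsibilities
N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad
\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \qquad
\pi_k^{\text{new}} = \frac{N_k}{N}
```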
Detail
1. Initialize the distributions' parameters and evaluate the initial value of the log likelihood
2. E step: for each data point, compute the responsibilities using the current parameter values
3. M step: re-estimate the parameters using the current responsibilities
4. Evaluate the log likelihood and check for convergence of either the parameters or the log likelihood; if the convergence criterion is not satisfied, return to step 2
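The full loop above can be sketched in NumPy for the 1-D case. This is a minimal illustration, not a production implementation (no regularization of variances, simple random initialization from the data):

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_gmm_1d(x, k, n_iter=200, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: initialize parameters (equal weights, random means, pooled variance)
    pi = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, np.var(x))
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 2 (E): responsibilities under the current parameters, shape (n, k)
        dens = pi * gaussian_pdf(x[:, None], mu, var)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # Step 3 (M): re-estimate parameters from the responsibilities
        nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
        # Step 4: evaluate the log likelihood and check convergence
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, var

# Usage on synthetic data drawn from two known Gaussians
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(4, 1, 700)])
pi, mu, var = em_gmm_1d(x, k=2)
```

On this data the recovered means should land near -3 and 4, and the mixing weights near 0.3 and 0.7 (up to component relabeling).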
Pros
- Soft assignment: each point receives a probability of belonging to every cluster, rather than a hard label
- Convergence is guaranteed, though it may reach only a locally optimal solution