K-means

Creator
Created
Created
2019 Nov 5 3:14
Editor
Edited
Edited
2024 Oct 17 10:12

Optimize minimize sum of similarity measure (
Euclidean Distance
)

  1. Initialize cluster centroids randomly
  1. Assign examples in dataset to clusters by
  1. Update centroid for each cluster using
  1. Repeat steps 2 and 3 until convergence

Complexity

K is hyper-parameter and K center, N number of data point
  1. Starting from randomly chosen K centroids
  1. Given μ, find optimal assignment variable z - calculate distance based on dimension
  1. Given z, find the optimal centroids (mean) μ
  1. Until Convergence (proved always)

Calculating Loss after convergence for evaluation or
Elbow method

K-means Notion
 
 

Limitation

It is sensitive to initial centroids, data outlier
K-means Substitutes
notion image
 
 
 
 
 

Recommendations