Data Clustering

Creator
Seonglae Cho
Created
2023 May 11 1:56
Edited
2025 Mar 25 11:4

Visualization, Auxiliary information

Clusters should be as different from each other as possible, while the values inside each cluster should be as similar as possible (compare Contrastive Learning).
In general, a good clustering algorithm produces clusters with small within-cluster variance and large between-cluster variance.
  • Density-based Clustering
  • Distribution-based Clustering
  • Centroid-based Clustering
  • Hierarchical Clustering
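
The within-cluster versus between-cluster criterion above can be made concrete with a small calculation. The following is a minimal sketch, assuming squared Euclidean distance and using sums of squares as a stand-in for variance; the function names (`centroid`, `sq_dist`, `within_between`) and the toy data are hypothetical, chosen only for illustration.

```python
from statistics import fmean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(fmean(coord) for coord in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between(clusters):
    """Return (within, between) sums of squares for a list of clusters."""
    all_points = [p for cluster in clusters for p in cluster]
    grand = centroid(all_points)
    within = 0.0
    between = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        # Spread of the members around their own cluster centroid.
        within += sum(sq_dist(p, mu) for p in cluster)
        # Spread of the cluster centroid around the overall centroid,
        # weighted by cluster size.
        between += len(cluster) * sq_dist(mu, grand)
    return within, between


# Same four points, grouped two different ways (toy data).
tight = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
mixed = [[(0.0, 0.0), (10.0, 1.0)], [(10.0, 0.0), (0.0, 1.0)]]

tight_within, tight_between = within_between(tight)
mixed_within, mixed_between = within_between(mixed)
```

Both groupings have the same total sum of squares, so lowering the within-cluster part necessarily raises the between-cluster part. The `tight` grouping puts almost all of the spread between clusters, while the `mixed` grouping puts all of it inside the clusters.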

High level method

  • Data partition

Iteration method

  1. Start with an initial partition of the data
  2. Iterate, reassigning cluster membership so that some criterion improves
Data Clustering Methods
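
The two iteration steps above can be sketched with a k-means-style loop, which is one common instance of this scheme rather than the only one. This sketch assumes the criterion is the squared distance to the cluster centroid; `iterate_partition`, the toy points, and the starting labels are hypothetical.

```python
from statistics import fmean


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(fmean(coord) for coord in zip(*points))


def iterate_partition(points, labels, max_iter=100):
    """Reassign cluster labels until no point changes cluster."""
    labels = list(labels)
    for _ in range(max_iter):
        # Group the points by their current label.
        groups = {}
        for point, label in zip(points, labels):
            groups.setdefault(label, []).append(point)
        # Centroids of the non-empty clusters; a cluster that lost
        # all of its points simply disappears in this sketch.
        centers = {label: centroid(group) for label, group in groups.items()}
        # Reassign every point to the cluster with the nearest centroid.
        new_labels = [
            min(centers, key=lambda label: sq_dist(point, centers[label]))
            for point in points
        ]
        if new_labels == labels:
            break  # the partition is stable
        labels = new_labels
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
initial = [0, 1, 0, 1, 0, 1]  # step 1: an arbitrary starting partition
final = iterate_partition(points, initial)  # step 2: iterate
```

The result depends on the starting partition, so in practice such a loop is usually run from several different initial partitions.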
Clustering metrics can be used to select the optimal number of clusters. In practice, quantitative performance measures are of limited use, and qualitative measures are used instead. Clusters are best validated using downstream tasks.
Data Clustering Metrics
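
As an illustration of using a metric to choose the number of clusters, the sketch below scores two candidate partitions with the mean silhouette coefficient and keeps the better one. Silhouette is only one possible metric and is used here as an assumption; the `silhouette` function, the toy points, and the candidate labelings are hypothetical.

```python
from math import dist
from statistics import fmean


def silhouette(points, labels):
    """Mean silhouette coefficient; expects at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself contributes 0).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            fmean([dist(point, q) for q in other])
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return fmean(scores)


# Three well-separated pairs of points (toy data).
points = [(0.0, 0.0), (0.0, 1.0),
          (10.0, 0.0), (10.0, 1.0),
          (20.0, 0.0), (20.0, 1.0)]

# Candidate partitions for different numbers of clusters.
candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
scores = {k: silhouette(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

A score like this only ranks the candidates it is given. It does not replace checking the clusters qualitatively or against a downstream task.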