Visualization, Auxiliary information
Clusters should be as different from one another as possible, while values inside a cluster should be as similar as possible (compare Contrastive Learning).
In general, a good clustering algorithm produces clusters that have small within-cluster variance and large between-cluster variance.
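As a rough illustration of the within-cluster versus between-cluster idea, the sketch below computes both quantities as sums of squared distances for a toy 2-D dataset. The function names (`centroid`, `sq_dist`, `scatter`) and the data are hypothetical and chosen only for this example.

```python
from statistics import fmean

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def scatter(points, labels):
    """Return (within, between) sums of squares for a labelled dataset."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    overall = centroid(points)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        c = centroid(members)
        # Spread of the members around their own cluster centroid.
        within += sum(sq_dist(p, c) for p in members)
        # Spread of the cluster centroid around the overall centroid.
        between += len(members) * sq_dist(c, overall)
    return within, between

# Toy data (an assumption for illustration): two well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = scatter(points, [0, 0, 1, 1])  # groups nearby points together
bad = scatter(points, [0, 1, 0, 1])   # mixes the two pairs
```

With these definitions the two quantities always add up to the same total for a given dataset, so a partition that lowers the within-cluster term necessarily raises the between-cluster term. Here `good` has a much smaller within-cluster value than `bad`.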
- Density-based Clustering
- Distribution-based Clustering
- Centroid-based Clustering
- Hierarchical Clustering
High level method
- Data partition
- Unsupervised Feature Extraction
Iteration method
- Start with an initial partition of the data
- Iterate, reassigning cluster membership so that some criterion (for example, within-cluster variance) improves, until no assignment changes
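The iteration method can be sketched as follows. This is a minimal example that assumes the criterion is squared distance to the cluster centroid (the k-means style of centroid-based clustering); other criteria are possible. The function name `iterate_partition`, the `max_iter` parameter, and the toy data are hypothetical.

```python
from statistics import fmean

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def iterate_partition(points, labels, max_iter=100):
    """Refine an initial partition by repeated nearest-centroid reassignment."""
    labels = list(labels)
    for _ in range(max_iter):
        # Centroid of each non-empty cluster under the current partition.
        groups = {}
        for p, lab in zip(points, labels):
            groups.setdefault(lab, []).append(p)
        centroids = {
            lab: tuple(fmean(axis) for axis in zip(*members))
            for lab, members in groups.items()
        }
        # Reassign every point to the cluster with the nearest centroid.
        new_labels = [
            min(centroids, key=lambda lab: sq_dist(p, centroids[lab]))
            for p in points
        ]
        if new_labels == labels:  # no membership changed: stop iterating
            break
        labels = new_labels
    return labels

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
initial = [0, 1, 1, 1]  # deliberately poor initial partition
final = iterate_partition(points, initial)
```

Note that a cluster that loses all of its members simply disappears in this sketch, and the result can depend on the initial partition.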
Data Clustering Methods
Clustering metrics can be used to select the optimal number of clusters. In practice, quantitative performance measures have limited use, and qualitative measures are used instead. Clusters are best validated using downstream tasks.
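As one hedged example of using a quantitative measure to pick the number of clusters, the sketch below scores several candidate partitions with the silhouette coefficient and keeps the best one. The choice of silhouette is an assumption made for illustration; the notes do not name a specific metric. The candidate labelings are written by hand here and stand in for the output of whatever clustering algorithm is actually used.

```python
from math import dist
from statistics import fmean

def silhouette(points, labels):
    """Mean silhouette coefficient; requires at least two clusters."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the sum includes dist(p, p) == 0, hence the len(own) - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            fmean(dist(p, q) for q in members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return fmean(scores)

# Toy data (an assumption for illustration): three visually obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]

# Hypothetical candidate partitions, keyed by number of clusters.
candidates = {
    2: [0, 0, 1, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    4: [0, 0, 1, 1, 2, 3],
}
best_k = max(candidates, key=lambda k: silhouette(points, candidates[k]))
```

On this toy data the score peaks at three clusters. On real data such a score is only a guide, which is consistent with the point above that downstream tasks are the better validation.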
Data Clustering Metrics