Data Clustering

Visualization, Auxiliary information

Clusters should be as different from one another as possible, while the values inside each cluster should be as similar as possible.

In general, a good clustering algorithm produces clusters that have small within-cluster variance and large between-cluster variance.
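To make the within-cluster versus between-cluster idea concrete, here is a minimal sketch that measures both as sums of squared distances. The function name `cluster_scatter` and the toy points are illustrative assumptions, not part of the original notes.

```python
from statistics import fmean


def cluster_scatter(points, labels):
    """Return (within, between) sums of squared distances for a labelling.

    Hypothetical helper for illustration: `within` adds up squared distances
    from each point to its own cluster centroid, `between` adds up squared
    distances from each centroid to the overall mean, weighted by cluster size.
    """
    dims = range(len(points[0]))
    overall = [fmean(p[d] for p in points) for d in dims]

    # Group the points by their cluster label.
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)

    within = between = 0.0
    for members in groups.values():
        centroid = [fmean(p[d] for p in members) for d in dims]
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in dims)
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in dims)
    return within, between


# Toy data (assumed): two tight groups that are far apart along the first axis.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = cluster_scatter(points, [0, 0, 1, 1])  # groups nearby points together
bad = cluster_scatter(points, [0, 1, 0, 1])   # mixes the two groups
```

On this toy data the first labelling should give a small within value and a large between value, and the second labelling the reverse. The two values always add up to the same total scatter, so lowering one raises the other.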

Density-based Clustering

Distribution-based Clustering

Centroid-based Clustering

Hierarchical Clustering

High-level method

Data partition

Unsupervised
Feature Extraction

Iteration method

Start with an initial partition of the data

Iterate, reassigning cluster membership so that some criterion is improved
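One common instance of this iteration method is a k-means style loop, sketched below. The notes do not name a specific algorithm or criterion, so both are assumptions here: the criterion is the sum of squared distances to the cluster centroids, and the initial partition is given implicitly through starting centroids. The function name `kmeans` and its parameters are hypothetical.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd-style iteration: reassign points, then recompute centroids.

    Illustrative sketch only. Stops when no point changes cluster
    or after `max_iter` rounds.
    """
    dims = range(len(points[0]))
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    labels = None
    for _ in range(max_iter):
        # Reassignment step: each point joins its nearest centroid.
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p[d] - centroids[k][d]) ** 2 for d in dims),
            )
            for p in points
        ]
        if new_labels == labels:
            break  # membership is stable, so the criterion can no longer improve
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = tuple(
                    sum(p[d] for p in members) / len(members) for d in dims
                )
    return labels, centroids


# Toy data and starting centroids (both assumed for the example).
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
final_labels, final_centroids = kmeans(data, [(0.0, 0.0), (1.0, 0.0)])
```

The result depends on the starting centroids, which is why such methods are often run several times from different initial partitions.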

Data Clustering Methods

Nonparametric Clustering

Parametric Clustering

Downstream Clustering

Performance measures can be used to select the optimal number of clusters. In practice, quantitative performance measures are of limited use and qualitative measures are used instead. Clusters are best validated using downstream tasks.

V-measure

Homogeneity

Completeness

Mutual information

Data Clustering Metrics

Dissimilarity metrics

Silhouette Coefficient

Rand Index
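Two of the metrics listed above can be sketched in a few lines of standard-library Python. The silhouette coefficient needs only the data and one labelling, while the Rand index compares two labellings of the same items. The function names, the Euclidean distance, and the convention of scoring singleton clusters as 0 are assumptions made for this sketch.

```python
from itertools import combinations
from math import dist


def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two clusters."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, other) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other, []).append(dist(p, q))
        own = by_cluster.pop(lab, None)
        if own is None:  # singleton cluster: score taken as 0 (assumed convention)
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # mean distance inside own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labellings agree (same vs. different)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy data (assumed): two tight groups that are far apart.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
sil_good = silhouette(pts, [0, 0, 1, 1])
sil_bad = silhouette(pts, [0, 1, 0, 1])
ri_same = rand_index([0, 0, 1, 1], [1, 1, 0, 0])  # same partition, renamed labels
ri_diff = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The silhouette should be close to 1 for the labelling that matches the two groups and negative for the one that mixes them. The Rand index ignores label names, so a relabelled copy of the same partition scores 1.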
