t-Distributed Stochastic Neighbor Embedding (t-SNE)
One of the most popular tools for visualizing high-dimensional data
- because for visualization, the goal is preserving similarities (which points are neighbors of which) rather than preserving all of the information in the data
- t-SNE has a cost function that is not convex, i.e., different initializations can give different results.
- The locations of the points in the map are determined by minimizing the KL divergence between two distributions over pairwise similarities, P and Q (see the formulas below)
- P measures the similarity between high-dimensional points x using a Gaussian distribution
- Q measures the similarity between map points y using a Student t-distribution with one degree of freedom
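For reference, the standard definitions from the original t-SNE paper (van der Maaten & Hinton, 2008) can be written as:

```latex
% Similarity in the high-dimensional space (Gaussian kernel), symmetrized:
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Similarity in the map (Student t-distribution, one degree of freedom):
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost: KL divergence between the two distributions, minimized over the y_i
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The bandwidth \(\sigma_i\) is chosen per point so that the conditional distribution \(p_{\cdot|i}\) has a fixed perplexity, which is the tuning parameter discussed below.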
Important Insight
t-SNE plots can sometimes be mysterious or misleading. The t-SNE algorithm adapts its notion of “distance” to regional density variations in the data set. As a result, it naturally expands dense clusters and contracts sparse ones, evening out cluster sizes. The bad news is that distances between well-separated clusters in a t-SNE plot may mean nothing. t-SNE also tends to expand denser regions of data: since the middles of the clusters have less empty space around them than the ends, the algorithm magnifies them.
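A minimal sketch illustrating this behavior, assuming scikit-learn's TSNE (the two synthetic clusters and their scales are arbitrary choices for illustration, not from the notes):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Two well-separated Gaussian clusters with very different spreads:
# a tight (dense) cluster and a diffuse (sparse) one.
dense = rng.normal(loc=0.0, scale=0.1, size=(100, 10))
sparse = rng.normal(loc=10.0, scale=2.0, size=(100, 10))
X = np.vstack([dense, sparse])

Y = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# The embedded clusters tend to come out at similar sizes, even though
# the original spreads differ by a factor of 20: t-SNE expands the dense
# cluster and contracts the sparse one.
for name, sl in [("dense", slice(0, 100)), ("sparse", slice(100, 200))]:
    spread = Y[sl].std(axis=0).mean()
    print(f"{name} cluster embedded spread: {spread:.2f}")
```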
A second feature of t-SNE is a tuneable parameter, “perplexity,” which says (loosely) how to balance attention between local and global aspects of your data. The parameter is, in a sense, a guess about the number of close neighbors each point has. Getting the most from t-SNE may mean analyzing multiple plots with different perplexities. For the algorithm to operate properly, the perplexity should be smaller than the number of points (values between 5 and 50 are typically recommended).
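A minimal sketch of such a perplexity sweep, assuming scikit-learn's TSNE and matplotlib (the random data is a placeholder; substitute your own):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))  # placeholder data

# Try several perplexities; each value must stay below the number of points.
perplexities = [5, 15, 30, 50]
fig, axes = plt.subplots(1, len(perplexities), figsize=(16, 4))
for ax, perp in zip(axes, perplexities):
    Y = TSNE(n_components=2, perplexity=perp, random_state=0).fit_transform(X)
    ax.scatter(Y[:, 0], Y[:, 1], s=5)
    ax.set_title(f"perplexity = {perp}")
plt.show()
```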
If you see a t-SNE plot with strange “pinched” shapes, chances are the process was stopped too early. Out of sight of the user, the algorithm makes all sorts of adjustments that tidy up its visualizations, and these need enough iterations to settle.
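A minimal sketch of one way to check for this, assuming scikit-learn's TSNE (the data is again a random placeholder, and the iteration parameter is named n_iter rather than max_iter in scikit-learn versions before 1.5):

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))  # placeholder data

# Compare runs with increasing iteration budgets. If a longer run still
# lowers the final KL divergence noticeably, the shorter run was stopped
# too early.
for iters in (250, 1000, 4000):
    tsne = TSNE(n_components=2, perplexity=30, max_iter=iters, random_state=0)
    Y = tsne.fit_transform(X)
    print(f"{iters:5d} iterations -> final KL divergence: {tsne.kl_divergence_:.3f}")
```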