t-SNE

Creator

Creator

Seonglae Cho

Created

Created

2023 May 4 5:8

Editor

Editor

Seonglae Cho

Edited

Edited

2025 Mar 4 17:0

Refs

Refs

Data Visualization

t-Stochastic Neighbor Embedding

Most popular tool for visualization

because for visualization, the goal is similarity preserving rather than information preserving

t-SNE has a cost function that is not convex, i.e. with different initializations we can get different results.

The locations of the points in the map are determined by minimizing the KL divergence of the distance distributions

P calculates the similarity x based on Gaussian distribution

Q calculates the similarity between y based on t-distribution

Important Insight

t-SNE plots can sometimes be mysterious or misleading. The t-SNE algorithm adapts its notion of “distance” to regional density variations in the data set. As a result, it naturally expands dense clusters, and contracts sparse ones, evening out cluster sizes. The bad news is that distances between well-separated clusters in a t-SNE plot may mean nothing. t-SNE tends to expand denser regions of data. Since the middles of the clusters have less empty space around them than the ends, the algorithm magnifies them.

A second feature of t-SNE is a tuneable parameter, “perplexity,” which says (loosely) how to balance attention between local and global aspects of your data. The parameter is, in a sense, a guess about the number of close neighbors each point has. Getting the most from t-SNE may mean analyzing multiple plots with different perplexities. for the algorithm to operate properly, the perplexity really should be smaller than the number of points. (5~50 is recommended)

If you see a t-SNE plot with strange “pinched” shapes, chances are the process was stopped too early. Out of sight from the user, the algorithm makes all sorts of adjustments that tidy up its visualizations.

notion image

How to Use t-SNE Effectively

Although extremely useful for visualizing high-dimensional data, t-SNE plots can sometimes be mysterious or misleading.

https://distill.pub/2016/misread-tsne/

How to Use t-SNE Effectively

Geoffrey Hinton

Visualizing Data using t-SNE

We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images ofobjects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.

https://jmlr.org/papers/v9/vandermaaten08a.html

Backlinks

Recommendations

////////