t-distributed Stochastic Neighbour Embedding
t-SNE is a non-linear dimensionality reduction, ie, it allows us to separate data which cannot be separated by any straight line.
t-SNE vs PCA
- t-SNE is iterative, so unlike PCA it cannot be applied to another dataset.
- t-SNE is used to understand high-dimensional data, and project it into low-dimensional space (like 2D or 3D).
Detailed Explanation
- Problem: To understand high-dimensional datasets, less useful to perform dimensionality reduction for ML training.
- Details:
- t-SNE cannot be reapplied similar to PCA, since t-SNE is iterative and non deterministic.
- STEP 1: Creating similarities, ie, probability distribution.
- Create probability distribution that represents similarities between neighbours.
- Here similarity of datapoint to datapoint is the conditional probability , that would pick as its neighbour.
- This similarity is proportional to probability density under Gaussian centered at .
- We can distinguish b/w similar and non-similar points, but absolute values of probability are much smaller than in first example (compare Y-axis values).
- We can fix that by normalisation
- This scales all values to have a sum equal to 1. We set , not 1.
- STEP 2: Dealing with different distances
- Where N is number of dimensions.
- STEP 3: Final formula
- We haven’t used any variance, for gaussian distribution.
- This is the final formula
- STEP 4: Create low-dimensional space
- Gaussian distribution has “short tail”, so it creates crowding problem.
- To solve it we use Student t-distribution with a single degree of freedom.
- Visual comparison between Gaussian and Student t-distribution
The Gradient Descent
To optimise this distribution t-SNE is using Kullback-Leibler divergence between the conditional probabilities and .
The gradient descent step can be treated as repulsion and attraction between points.
What are we updating in t-SNE gradient descent optimisation step???
We are updating the y_i position, the datapoint around which we are estimating the distribution.
t-SNE and CNN feature maps
t-SNE can be used when dealing with CNN feature maps, it helps us understand which input data seems similar from the n-layer features extracted for an image.