by bearzoo 3806 days ago
I once read that a high perplexity can produce embeddings that are very tightly bound to a unit circle... not sure if that is what is going on here.

The diagram shown is only a visualization. The actual word vectors have many dimensions. To reduce them to 2 dimensions, they use a method that tries to keep similar vectors as close to each other as possible while keeping dissimilar ones apart. This creates the shape seen in the scatter plot. Just looking at the scatter plot by itself doesn't tell you anything about the underlying data.
I did not claim anything about the underlying data. The 2-dimensional embeddings were forced into the unit circle because the 'perplexity' hyperparameter for t-SNE was set too high.

From the guy who helped make t-SNE:

When I run t-SNE, I get a strange ‘ball’ with uniformly distributed points?

This usually indicates you set your perplexity way too high. All points now want to be equidistant. The result you got is the closest you can get to equidistant points as is possible in two dimensions. If lowering the perplexity doesn’t help, you might have run into the problem described in the next question. Similar effects may also occur when you use highly non-metric similarities as input.
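
For what it's worth, the "all points now want to be equidistant" part can be seen from how the input affinities are built. Here is a minimal sketch in plain Python, assuming the usual Gaussian-kernel definition in which each point's bandwidth is tuned by binary search until its neighbor distribution reaches the target perplexity. The function names and the random toy data are made up for illustration; this is not the reference t-SNE code.

```python
import math
import random

def conditional_probs(sq_dists, sigma):
    # Gaussian affinities of one point to all others, normalized to sum to 1.
    # Subtracting the smallest distance keeps the largest weight at exp(0) = 1.
    nearest = min(sq_dists)
    weights = [math.exp(-(d - nearest) / (2.0 * sigma * sigma)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probs):
    # Perplexity is 2 raised to the Shannon entropy (in bits).
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def probs_for_perplexity(sq_dists, target, iters=100):
    # Bisect on sigma (in log space); a wider kernel gives a higher perplexity.
    lo, hi = 1e-6, 1e6
    for _ in range(iters):
        sigma = math.sqrt(lo * hi)
        if perplexity(conditional_probs(sq_dists, sigma)) > target:
            hi = sigma
        else:
            lo = sigma
    return conditional_probs(sq_dists, math.sqrt(lo * hi))

# Toy data (an assumption for the demo): 50 random points in 10 dimensions.
random.seed(0)
points = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(50)]

# Squared distances from point 0 to the other 49 points.
sq_dists = [sum((a - b) ** 2 for a, b in zip(points[0], q)) for q in points[1:]]

p_low = probs_for_perplexity(sq_dists, target=5.0)    # a few clear neighbors
p_high = probs_for_perplexity(sq_dists, target=48.0)  # close to the maximum of 49

print(f"perplexity 5:  largest p = {max(p_low):.3f}")
print(f"perplexity 48: largest p = {max(p_high):.3f} (uniform would be {1 / 49:.3f})")
```

When the perplexity approaches the number of other points, the neighbor distribution is almost flat, so every point is treated as a roughly equally close neighbor and the 2D layout has nothing left to separate.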

> Just looking at the scatter plot by itself doesn't tell you anything about the underlying data.

Well, if that were the case, it would be perfectly pointless to make such a visualization... The goal of dimensionality reduction is to provide a useful summary of the data; it is a valid question to ask to what degree it succeeds at that.