by godelski 478 days ago
These are terrible graphs. Sorry, but how do I read them? There are labels for the clusters (?) provided in the text, but the legend is just called "cluster", so what does it represent? Dates? Sometimes there are more clusters than labels. A good graph is worth thousands of words, but a bad graph is worth a thousand miscommunications.

They literally just represent clusters in a learned embedding vector space that's not necessarily well understood, but is believed to map words or phrases with similar meanings to vectors that point in similar directions in a high-dimensional space. The axes themselves don't have any understandable meanings.

https://en.wikipedia.org/wiki/T-distributed_stochastic_neigh...
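
To make the "similar directions" point concrete, here is a rough sketch using cosine similarity. The three vectors are made up for illustration (real embeddings have hundreds of dimensions and come from a trained model), and `rotate_xy` is just a hypothetical helper showing that re-mixing the axes leaves the similarities unchanged, which is one way to see why the individual axes carry no meaning of their own.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; values near 1.0 mean similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rotate_xy(v, theta):
    """Rotate a 3-d vector about its third axis, mixing the first two coordinates."""
    x, y, z = v
    c, s = math.cos(theta), math.sin(theta)
    return [c * x - s * y, s * x + c * y, z]

# Hypothetical 3-d "embeddings", invented for this example only.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.2, 0.9],
}

# Related words should point in more similar directions than unrelated ones.
sim_related = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
sim_unrelated = cosine_similarity(toy_embeddings["cat"], toy_embeddings["invoice"])

# Apply the same rotation to every vector: coordinates change, similarities do not.
rotated = {word: rotate_xy(vec, 0.7) for word, vec in toy_embeddings.items()}
sim_related_rotated = cosine_similarity(rotated["cat"], rotated["kitten"])

print(sim_related > sim_unrelated)
print(math.isclose(sim_related, sim_related_rotated))
```

This only illustrates the geometry. It says nothing about how the clusters in the article were computed or labeled.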

I appreciate you trying to help, but I think you are misunderstanding my complaint. The issue here is that even with embeddings, things get labels. Clusters have labels (organized by color) and individual data points have labels. Neither of these is well defined, so it is not clear what is being said.

(I'm actually well aware of t-SNE. FYI, it is not a great tool to use, and people often conflate it with PCA or other dimensionality reduction methods. Probably fine here because the point is grouping.)