Hacker News
by graphviz 297 days ago
What do people learn from visualizations like this?

What is the most important problem anyone has solved this way?

Speaking as somewhat of a co-defendant.

4 comments

Not everything has to be directly informative or solve a problem. Sometimes data visualization can look pretty for pretty's sake.

Dimensionality reduction/clustering like this may be less useful for identifying trends in token embeddings, but for other types of embeddings it's extremely useful.
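
To make the clustering half of that concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. I don't know which method the linked visualization actually uses, so treat this as an illustration only. The `kmeans` and `squared_distance` helpers and the toy points are made up for the example; real embeddings would have hundreds of dimensions and would usually go through a library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    # Deterministic init for this sketch: the first k points are the
    # starting centroids (real code would use a smarter initialization).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical 2-D "embeddings": two well-separated groups of three points.
toy_points = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
labels = kmeans(toy_points, k=2)
print(labels)
```

On the toy data the first three points should share one label and the last three the other. With real embeddings the interesting part is then reading what ended up in each group.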

Agreed. The fact that it has any structure at all is fascinating (and super pretty). It could hint at interesting internal structures. I would love to see a version for Qwen-3 and Mistral too!

I wonder if being trained on significant amounts of synthetic data gave it any unique characteristics.

It lets you inspect what actually constitutes a given cluster. For example, the outer clusters seem to be variations of individual words and their direct translations rather than synonyms (the ones I saw, at least).

> What do people learn from visualizations like this?

Where it gets cool is applying the embedding model to a dataset of your own and then building a similar visualization: you can look at the clusters and draw conclusions about how close the items in your own dataset are to each other.
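
A rough sketch of what "closeness" can mean in practice, assuming you already have one vector per item from whatever embedding model you use. Cosine similarity is one common choice, not necessarily what this visualization is based on. The item names, the vectors, and the `nearest_neighbors` helper are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(embeddings, query, k=2):
    """Return the names of the k items most similar to `query`, best first."""
    scores = [
        (cosine_similarity(embeddings[query], vec), name)
        for name, vec in embeddings.items()
        if name != query
    ]
    scores.sort(reverse=True)
    return [name for _, name in scores[:k]]


# Hypothetical toy vectors standing in for real model output.
toy_embeddings = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "kitten": [0.0, 0.9, 0.3],
    "puppy": [0.1, 0.8, 0.4],
}

print(nearest_neighbors(toy_embeddings, "invoice", k=1))
print(nearest_neighbors(toy_embeddings, "kitten", k=1))
```

Checking a few neighbor lists like this against what the 2-D plot suggests is a cheap sanity check, since projections can place items near each other that are not actually close in the original space.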

Embedding visualizations have helped identify bias in word embeddings (Word2Vec), debug entity resolution systems, and optimize document retrieval by revealing semantic clusters that inform better indexing strategies.

Interesting, glad to know it's been useful for some specific contributions. (Not questioning that interesting-looking, appealing displays as overviews for general awareness are also worthwhile.)