
by refulgentis 899 days ago

You're actually kinda hitting the nail on the head. _Generally_, the word2vec king - man + woman = queen thing was cute, but not very real.
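The analogy trick is just vector arithmetic followed by a nearest-neighbour lookup. Here is a minimal sketch with hand-picked, hypothetical 3-dimensional vectors. Real word2vec vectors have hundreds of dimensions, and the analogy holds much less cleanly there. Note that the input words are excluded from the candidates. That is a common convention, and it flatters the results.

```python
import numpy as np

# Hypothetical toy "embeddings", hand-picked so the analogy works exactly.
# Real word2vec vectors are far higher-dimensional and much messier.
vocab = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.2, 0.8]),
    "apple": np.array([0.5, 0.5, 0.5]),
}


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def analogy(a: str, b: str, c: str, vocab: dict) -> str:
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = vocab[a] - vocab[b] + vocab[c]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))


print(analogy("king", "man", "woman", vocab))
```

With these toy vectors, `king - man + woman` lands exactly on `queen`. Dropping the exclusion filter on real vectors often returns one of the input words instead.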

People rarely have to get down to the bare metal on the embedding models, and the models aren't what people think they are from their memory of word2vec. For example, the model actually emits one vector _per token_, and the final sentence vector is typically the mean of those (mean pooling). And cosine similarity is essentially the only metric anyone trains these models for.
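Here is a minimal sketch of the mean pooling many Sentence Transformers models use to turn per-token vectors into one sentence vector, followed by the cosine comparison. The token embeddings are random, hypothetical stand-ins for encoder output, not real model activations. The dimension is set to 384, the MiniLM size mentioned in the comment.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the per-token vectors into one sentence vector, skipping padding.

    token_embeddings: (seq_len, dim), one row per token from the encoder.
    attention_mask:   (seq_len,), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=0)
    count = max(float(mask.sum()), 1e-9)  # avoid dividing by zero on all-padding input
    return summed / count


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
dim = 384  # MiniLM-sized output

# Hypothetical encoder outputs: random stand-ins, one row per token.
tokens_a = rng.normal(size=(5, dim))
mask_a = np.array([1, 1, 1, 0, 0])  # last two positions are padding
tokens_b = rng.normal(size=(4, dim))
mask_b = np.array([1, 1, 1, 1])

vec_a = mean_pool(tokens_a, mask_a)
vec_b = mean_pool(tokens_b, mask_b)
print(vec_a.shape, cosine_similarity(vec_a, vec_b))
```

The mask matters for batched input. Without it, padding positions would be averaged into the sentence vector.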

In summary, there's ~no reason to think a visualization trying to show multiple vectors will ever be meaningful. Even just starting from "they have way, way, way more dimensions than we can represent visually" is enough to rule it out.
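One way to put a number on the dimensionality point is to measure how much variance a 2-D PCA projection keeps. This sketch uses random, isotropic, hypothetical 384-dimensional data as a stand-in for embeddings. Real embeddings are not isotropic, so the exact fraction will differ, but the projection still throws away all but two of the axes.

```python
import numpy as np


def variance_kept_in_2d(X: np.ndarray) -> float:
    """Fraction of total variance captured by the top-2 principal components."""
    Xc = X - X.mean(axis=0)  # PCA works on centered data
    # Squared singular values of the centered data are proportional to the
    # variance along each principal axis.
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s ** 2
    return float(var[:2].sum() / var.sum())


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 384))  # hypothetical stand-in for 1,000 embeddings
print(f"{variance_kept_in_2d(X):.1%} of the variance survives a 2-D plot")
```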

MiniLM v2, the default embedding model behind a lot of vector DB setups, outputs 384-dimensional vectors.

n.b., dear reader, if you've heard of that: you should be using v3! V3 is for asymmetric search, aka query => result docs. V2 is for symmetric search, aka chunk of text => similarly worded chunks of text. It's very, very funny how few people read the docs, in this case the Sentence Transformers site.
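The v2/v3 choice changes how the embeddings are trained. The retrieval step for query => result docs looks the same either way: embed the query, then rank documents by cosine similarity. Sentence Transformers ships `util.semantic_search` for this. The code below is a plain NumPy sketch of the idea, not the library's implementation. The embeddings are random, hypothetical stand-ins, with the query built to sit near document 7.

```python
import numpy as np


def top_k_by_cosine(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 3):
    """Return (indices, scores) of the k documents most cosine-similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    docs = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = docs @ q  # dot product of unit vectors = cosine similarity
    order = np.argsort(-scores)[:k]  # highest scores first
    return order, scores[order]


rng = np.random.default_rng(1)
docs = rng.normal(size=(10, 384))              # hypothetical document embeddings
query = docs[7] + 0.1 * rng.normal(size=384)   # hypothetical query close to doc 7
idx, sc = top_k_by_cosine(query, docs)
print(idx, sc)
```

Normalizing once up front means ranking is a single matrix-vector product. That is also why many vector stores expect unit-length embeddings.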