by wrsh07
900 days ago
I wish there were more context, and maybe the ability to do math on the vectors. E.g. what is the real distance between the two vectors? That should be easy to compute. Similarly: what do I get from summing two vectors, and what are some nearby vectors? Maybe just generally: what are some nearby vectors? Without any additional context it's just a point cloud with a couple of randomly labeled elements.
People rarely have to get down to the bare metal of embedding models, and the models aren't what people think they are from their memory of word2vec. E.g. there's actually one vector emitted _per token_, and the final vector is the mean of those. And cosine similarity is the only metric anyone is training for.
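To make that concrete, here's a rough sketch in plain Python, not any particular library's API. The helper names `mean_pool` and `cosine_similarity` and the 3-dim toy vectors are made up for illustration; real models emit hundreds of dims per token.

```python
import math

def mean_pool(token_vectors):
    """Average per-token vectors into a single fixed-size vector."""
    n = len(token_vectors)
    dims = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / n for d in range(dims)]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy per-token vectors (hypothetical values, not real model output).
text_a_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two tokens
text_b_tokens = [[2.0, 2.0, 0.0]]                   # one token

a = mean_pool(text_a_tokens)  # [0.5, 0.5, 0.0]
b = mean_pool(text_b_tokens)  # [2.0, 2.0, 0.0]

print("cosine similarity:", cosine_similarity(a, b))  # very close to 1.0
print("euclidean distance:", math.dist(a, b))         # well above 0
```

The two pooled vectors point the same way, so cosine similarity calls them essentially identical, while the Euclidean distance between them is large. That's why the "real distance" between two embeddings isn't necessarily telling you what the model was trained to express.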
In summary, there's ~no reason to think a visualization trying to show multiple vectors will ever be meaningful. Even just starting from "they have way, way more dimensions than we can represent visually" is enough to rule it out.
MiniLM v2, the foundation of most vector DBs, is 384 dims.
n.b. dear reader, if you've heard of that: you should be using v3! v3 is for asymmetric search, aka query => result docs. v2 is for symmetric search, aka chunk of text => similarly worded chunks of text. It's very, very funny how few people read the docs, in this case the sentence-transformers site.