by keeeba
174 days ago
I don’t have experiments to prove this, but in my experience it’s highly variable between embedding models. Larger, more capable embedding models are better at separating the different uses of a given word in the embedding space; smaller models are not.

When Claude is using our embed endpoint to embed arbitrary text as a search vector, it should work pretty well across domains. One can also use compositions of centroids (averages) of vectors in our database as search vectors.
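A rough sketch of the centroid idea, for anyone who wants to try it. This assumes you have already fetched the stored embeddings as plain lists of floats and that cosine similarity is the ranking metric; the names `centroid`, `combine`, `search` and the toy `db` below are hypothetical and not part of any real endpoint or database API.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combine(parts: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Weighted sum of vectors, e.g. one centroid plus/minus another."""
    dim = len(parts[0])
    return [sum(w * p[i] for p, w in zip(parts, weights)) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query: Vector, db: Dict[str, Vector], k: int = 3) -> List[str]:
    """Ids of the k stored vectors most cosine-similar to the query."""
    ranked = sorted(db, key=lambda key: cosine(query, db[key]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy "database" of 3-d embeddings (real ones are much larger).
    db = {
        "doc_a": [1.0, 0.1, 0.0],
        "doc_b": [0.9, 0.2, 0.1],
        "doc_c": [0.0, 1.0, 0.2],
        "doc_d": [0.1, 0.9, 0.0],
    }
    topic_1 = centroid([db["doc_a"], db["doc_b"]])
    topic_2 = centroid([db["doc_c"], db["doc_d"]])
    print(search(topic_1, db, k=2))
    # A composed query: mostly topic_1, pushed away from topic_2.
    print(search(combine([topic_1, topic_2], [1.0, -0.5]), db, k=2))
```

Averaging keeps the query inside the region the stored vectors occupy, and weighted sums of centroids let you nudge a query toward or away from a group of documents; whether that helps in practice will depend on the embedding model, per the point above.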