|
|
|
|
|
by dinobones
363 days ago
|
|
So this is basically an “embedding of embeddings”, an approximation of multiple embeddings compressed into one, to reduce dimensionality/increase performance. All this tells me is that: the “multiple embeddings” are probably mostly overlapping and the marginal value of each additional one is probably low, if you can represent them with a single embedding. I don’t otherwise see how you can keep comparable performance without breaking information theory. |
|
This is the point of the paper. Specifically, that single embedding vectors are sparse enough that you can compact more data from additional vectors together to improve retrieval performance.