|
|
|
|
|
by ntonozzi
618 days ago
|
|
One important factor this article neglects to mention is that modern text embedding models are trained to maximize distance of dissimilar texts under a specific metric. This means that the embedding vector is not just latent weights plucked from the last layer of a model, but instead specifically trained to be used with a particular distance function, which is the cosine distance for all the models I'm familiar with. You can learn more about how modern embedding models are trained from papers like Towards General Text Embeddings with Multi-stage Contrastive Learning (https://arxiv.org/abs/2308.03281). |
|
The next level of understanding is asking how “similar” and “dissimilar” are chosen. As an example, should texts about the same topic be considered similar? Or maybe texts from the same user (regardless of what topic they’re talking about)?