by jerpint 893 days ago
Yes, but they are not explicitly trained to make semantically similar texts end up close together, only to do next-token prediction. Embedding models instead use a contrastive loss, which minimizes the distance between pairs of semantically similar content and maximizes the distance to all other embeddings.
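
To make the contrastive part concrete, here is a rough NumPy sketch of one common formulation: an InfoNCE-style loss with in-batch negatives. This is an illustration under assumptions, not what any particular embedding model does. The name `info_nce_loss`, the `temperature` value, and the toy data are all made up for the example.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """InfoNCE-style contrastive loss with in-batch negatives (illustrative).

    anchors, positives: arrays of shape (batch, dim). Row i of each array
    forms a semantically similar pair; every other row acts as a negative.
    """
    # L2-normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # logits[i, j] = scaled similarity between anchor i and positive j.
    logits = (a @ p.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # The "correct" match for anchor i is positive i, i.e. the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Toy data (hypothetical): 8 random embeddings of dimension 16.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))

# Pairs that really are near-duplicates of each other.
aligned = info_nce_loss(x, x + 0.01 * rng.normal(size=x.shape))

# Pairs deliberately mismatched by reversing the row order.
shuffled = info_nce_loss(x, x[::-1].copy())
```

The loss is low when each anchor is closest to its own pair and high when the pairs are mismatched, which is the pressure that pulls similar texts together and pushes everything else apart. A next-token objective has no term like this.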