by montebicyclelo 973 days ago
LLM vectors do have decent linear properties already. But for document embedding purposes, they are often further trained for retrieval via cosine similarity, which enhances this. E.g. see Table 1 in [1]: average retrieval performance using BERT goes up from 54 to 76 after fine-tuning for embeddings.
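
For anyone unfamiliar with what "retrieval via cosine similarity" means in practice, here is a minimal sketch. The vectors are made-up toy values, not real BERT outputs, and the `cosine_similarity` and `rank_documents` helpers are hypothetical names for illustration rather than anything taken from [1].

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs, most similar document first."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
query = [1.0, 0.0, 1.0]
docs = [
    [0.9, 0.1, 1.1],     # points in nearly the same direction as the query
    [0.0, 1.0, 0.0],     # orthogonal to the query
    [-1.0, 0.0, -1.0],   # points in the opposite direction
]

ranking = rank_documents(query, docs)
for index, score in ranking:
    print(f"doc {index}: {score:.3f}")
```

Only the direction of each vector matters here, not its length, so fine-tuning for this objective amounts to pushing related texts toward similar directions in embedding space.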

[1] https://arxiv.org/pdf/1908.10084.pdf