
“For RAG” is ambiguous.

First, there is a leaderboard for embeddings. [1]

Even then, it depends on how you use them. Some embeddings pack the highest signal into the first dimensions, so you can truncate the vector; most cannot. You might want that truncated version for a fast, dirty index. The same goes for using multiple models with different vector sizes on the same content.
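A rough sketch of the "fast dirty index" idea: rank everything with truncated vectors, then re-rank a shortlist with the full ones. It assumes the embedding model was trained so that a prefix of the vector is still meaningful; with most models, truncating like this just throws away signal. The function names and parameters (`truncate`, `two_stage_search`, `coarse_dims`, `shortlist`) are made up for illustration and do not come from any library.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def truncate(vec, dims):
    """Keep the first `dims` components and re-normalize.

    Assumption: the model front-loads signal, so the prefix is usable alone.
    """
    return normalize(vec[:dims])


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query, docs, coarse_dims, shortlist, top_k):
    """Return indices of the best-matching docs.

    Stage 1 ranks every doc using truncated vectors (cheap, approximate).
    Stage 2 re-ranks only the shortlist using the full vectors.
    """
    q_small = truncate(query, coarse_dims)
    coarse = sorted(
        range(len(docs)),
        key=lambda i: dot(q_small, truncate(docs[i], coarse_dims)),
        reverse=True,
    )[:shortlist]

    q_full = normalize(query)
    return sorted(
        coarse,
        key=lambda i: dot(q_full, normalize(docs[i])),
        reverse=True,
    )[:top_k]
```

In a real setup the truncated document vectors would be computed once and stored in their own index rather than recomputed per query; the brute-force sort here only keeps the example short.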

Do you preprocess your text? That step uses a model too, likely the same one you would use to process the query.

There is also a model for answering questions from the retrieved context. Sometimes that is a different model. [2]