by andy99
966 days ago
What is the use case for an 8k token embedding? My (somewhat limited) experience with long context models is they aren't great for RAG. I get the impression they are optimized for something else, like writing 8k+ tokens rather than synthesizing responses. Isn't the normal way of using embedding to find relevant text snippets for a RAG prompt? Where is it better to have coarser retrieval?
Calculating embeddings on larger documents than smaller-window embedding models can handle.
> My (somewhat limited) experience with long context models is they aren't great for RAG.
The only reason they wouldn't be great for RAG would be if they weren't good at using information in their context window. That's possible (ISTR that some models have a strong recency bias within the window, for instance), but I don't think it's a general problem of long context models.
> Isn't the normal way of using embedding to find relevant text snippets for a RAG prompt?
I would say the usual use is for search and semantic similarity comparisons generally. RAG is itself an application of search, but it's not the only one.
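
To make "search and semantic similarity" concrete, here's a rough sketch of embedding-based search with no RAG step at all. Everything in it is made up for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model (a real one would return a dense vector and could take a whole long document as input), and `cosine` and `search` are hypothetical helper names, not any library's API.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    # Assumption: stand-in for a real embedding model. It just counts
    # lowercase word tokens, so it only captures word overlap, not meaning.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: list[str], k: int = 1) -> list[tuple[float, str]]:
    # Embed the query, score every document against it, return the top k.
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


docs = [
    "Long context embedding models can encode an entire document as one vector.",
    "Recency bias means a model favors text near the end of its window.",
    "Sourdough bread needs a long, slow fermentation.",
]
best = search("embedding a whole document", docs)
print(best[0][1])
```

The ranked list is the whole result here. Feeding the top hits into a prompt would turn it into RAG, but dedup, clustering, or plain document search would stop at this point.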