

by machinelearning 792 days ago

No, you've only discussed the retrieval part of RAG, not the generation part.

The current workflow is to use the embedding to retrieve documents, then dump the text corresponding to that embedding into the LLM's context for generation.
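A minimal sketch of that workflow, assuming a toy bag-of-words `embed` function in place of a real embedding model (`embed`, `retrieve`, and `build_prompt` are hypothetical names for illustration). It shows that the document vectors are only used to rank documents; what reaches the LLM is the retrieved *text*, which the LLM then processes from scratch.

```python
import math
from collections import Counter

# Hypothetical stand-in for a separate embedding model: a bag-of-words vector.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], index: list[Counter], k: int = 2) -> list[str]:
    # Retrieval: the embeddings are used only to rank documents by similarity.
    q = embed(query)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(q, index[i]), reverse=True)
    return [docs[i] for i in ranked[:k]]

def build_prompt(query: str, passages: list[str]) -> str:
    # Generation input: plain text only. The embedding vectors are discarded here;
    # the LLM tokenizes this string and computes its own keys/values during prefill.
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The KV cache stores attention keys and values for tokens already processed.",
    "Embeddings map text to vectors for similarity search.",
    "Bananas are rich in potassium.",
]
index = [embed(d) for d in docs]  # document embeddings, precomputed once offline
query = "what does the KV cache store"
prompt = build_prompt(query, retrieve(query, docs, index, k=1))
print(prompt)
```

Only the embedding index is precomputed here; nothing the LLM computes internally is cached per document.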

Often, the embedding comes from a different model than the LLM, so it is not compatible with the generation step.

So yeah, RAG does not precompute the KV cache for each document.

1 comment

Prosammer 791 days ago

I see what you're saying now, thanks for clarifying.
