by moralestapia
486 days ago
In my experience, storing RAG chunks with a little bit of context helps a lot at retrieval time; you can then skip the whole "rerank" step and halve your cost and latency. As embedding/generative models keep getting better, I expect the need for a rerank step to be optimized away.
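
A minimal sketch of what storing chunks "with a little bit of context" could look like, assuming the context is simply the document title and section name prepended to each chunk before it is embedded. The names `Chunk`, `add_context`, `embed`, and `retrieve` are hypothetical, and `embed` is a toy bag-of-words stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str          # original chunk text, passed on to the generator
    indexed_text: str  # context prefix + text; this is what gets embedded


def add_context(doc_title: str, section: str, text: str) -> Chunk:
    """Prefix a chunk with a short note on where it came from (assumed format)."""
    return Chunk(text=text, indexed_text=f"{doc_title} / {section}: {text}")


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[Chunk], k: int = 1) -> list[Chunk]:
    """Single-pass retrieval over the contextualized text, with no rerank step."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c.indexed_text)), reverse=True)
    return ranked[:k]


# Two chunks whose raw text is identical; only the stored context differs.
chunks = [
    add_context("Acme Q2 2023 report", "Revenue", "It grew 3% over the previous quarter."),
    add_context("Globex Q2 2023 report", "Headcount", "It grew 3% over the previous quarter."),
]
top = retrieve("How did Acme revenue change?", chunks)
```

Both chunks here have the same raw text, so a single retrieval pass over the bare chunks could not tell them apart. With the prefix stored alongside each chunk, the Acme revenue chunk ranks first for the query without a second ranking pass.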