Hacker News
by moralestapia 486 days ago
In my experience, storing RAG chunks with a little bit of context helps a lot at retrieval time. You can then skip the whole "rerank" step and halve your cost and latency.

As embedding and generative models keep getting better, the need for a rerank step will be optimized away.
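A minimal sketch of what "storing chunks with a little bit of context" could look like, assuming the context is just the document title and section heading prepended to each chunk before it is embedded. The names `Chunk`, `chunk_document`, and `with_context` are hypothetical, and the comment above does not say which kind of context was actually used.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str  # assumed context: title of the source document
    section: str    # assumed context: heading the chunk came from
    text: str       # the chunk body itself


def chunk_document(doc_title: str, sections: dict[str, str], max_words: int = 100) -> list[Chunk]:
    """Split each section into fixed-size word windows, remembering where each came from."""
    chunks = []
    for section, body in sections.items():
        words = body.split()
        for start in range(0, len(words), max_words):
            piece = " ".join(words[start:start + max_words])
            chunks.append(Chunk(doc_title, section, piece))
    return chunks


def with_context(chunk: Chunk) -> str:
    """Build the string to embed and index: a short context header plus the chunk body."""
    return f"Document: {chunk.doc_title}\nSection: {chunk.section}\n\n{chunk.text}"


chunks = chunk_document("Guide", {"Intro": "a b c d e"}, max_words=2)
texts_to_embed = [with_context(c) for c in chunks]
```

The strings in `texts_to_embed` are what would go to the embedding model in place of the bare chunk text. Whether that is enough to drop the rerank step is the claim under discussion here, not something this sketch demonstrates.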

1 comment

Huh? Rerank is always a boost on top of retrieval. Regardless of the chunking method or model you use, reranking with a good model will always result in higher MRR. And improvements in embedding models will never solve the problem of merging lexical and vector search results. Rank fusion and score fusion are both flawed, since the ranks and scores from the two retrievers are hardly comparable, and boosting only works sometimes. Rerankers, on the other hand, generally do a pretty good job at this. Performance is indeed the biggest issue here: rerankers are slow as hell and simply not feasible for some use cases.
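For reference, a small sketch of the two things mentioned above: rank fusion of a lexical and a vector result list, and MRR as the metric. Reciprocal rank fusion (RRF) is used here as one common rank-fusion method; it is an assumption that this is the kind of fusion meant. The function names are hypothetical, and k=60 is just the conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of doc ids using only ranks, never raw scores."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def mean_reciprocal_rank(results: list[list[str]], relevant: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant hit per query (0 if none is found)."""
    total = 0.0
    for ranking, rel in zip(results, relevant):
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(results)


lexical = ["a", "b", "c"]  # e.g. BM25 order for one query
vector = ["c", "a", "d"]   # e.g. embedding-similarity order for the same query
fused = reciprocal_rank_fusion([lexical, vector])

mrr = mean_reciprocal_rank([["a", "b"], ["x", "y", "z"]], [{"b"}, {"z"}])
```

RRF sidesteps the incomparable-scores problem by throwing the scores away, which also means it cannot tell a strong match from a weak one at the same rank. A reranker would instead rescore each (query, document) pair directly, at the latency cost described above.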