|
|
|
|
|
by janalsncm
256 days ago
|
|
Rerankers are orders of magnitude faster and cheaper than LLMs. Typical latency out of the box on a decent sized cross encoder (~4B) will be under 50ms on cheap gpus like an A10G. You won’t be able to run a fancy LLM on that hardware and without tuning you’re looking at hundreds of ms minimum. More importantly, it’s a lot easier to fine tune a reranker on behavior data than an LLM that makes dozens of irrelevant queries. |
|