Hacker News
by lmeyerov 65 days ago
We find it true in our Louie.ai evals (AI for investigations): about a 10-20% lift, which is meaningful. It's measured here: botsbench.com.

Unfortunately, it's undesirable in practice because people were already token-constrained even before this. One option is retrying only on failure, but even that is a bit tricky...
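
For illustration only, retrying only on failure could look roughly like the sketch below. This is not Louie.ai's code: `run_with_retry`, `run_once`, and `is_failure` are hypothetical names, and it assumes you have some way to tell that an attempt failed, which may well be part of what makes this tricky.

```python
from typing import Callable, Tuple


def run_with_retry(
    run_once: Callable[[], str],
    is_failure: Callable[[str], bool],
    max_attempts: int = 2,
) -> Tuple[str, int]:
    """Re-run only when the previous attempt is judged a failure.

    run_once and is_failure are hypothetical callables supplied by the caller.
    Returns the last result and the number of attempts actually used.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    result = run_once()
    attempts = 1
    # Extra tokens are spent only when the failure check fires.
    while is_failure(result) and attempts < max_attempts:
        result = run_once()
        attempts += 1
    return result, attempts


# Usage with a stand-in for a model call: first attempt fails, second succeeds.
answers = iter(["ERROR: timeout", "42"])
result, attempts = run_with_retry(
    lambda: next(answers),
    lambda r: r.startswith("ERROR"),
)
```

A successful first attempt costs one call, so the extra spend is capped at `max_attempts` calls and only paid on failures.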