by kachapopopow 123 days ago
Well yeah, that's what I mean: how much better is it than cat + grep + manual line counting? Agents tend to perform worse with niche tools.
2 comments

It was really helpful to make and run a benchmark - it led to some important changes and improvements, so thanks again for your question kp!

The result is a ~17% reduction in raw cost. Calculated per correct answer, it's a ~25% reduction.
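
For anyone wondering how the two percentages relate, here's a rough sketch of the arithmetic. It assumes both runs answered the same set of questions and that "per correct answer" means total cost divided by the number of correct answers; the function name is just illustrative, and the inputs are the rounded figures above, so the output is approximate.

```python
def implied_correct_ratio(raw_cost_reduction, per_correct_reduction):
    """Ratio of correct answers (with tool / baseline) implied by the two reductions.

    Assumes cost_per_correct = total_cost / correct_answers for both runs.
    """
    cost_ratio = 1.0 - raw_cost_reduction            # total cost relative to baseline
    per_correct_ratio = 1.0 - per_correct_reduction  # cost per correct answer relative to baseline
    # per_correct_ratio = cost_ratio / correct_ratio, so solve for correct_ratio
    return cost_ratio / per_correct_ratio


ratio = implied_correct_ratio(0.17, 0.25)
print(f"{ratio:.3f}")  # 1.107
```

Under those assumptions, the gap between 17% and 25% corresponds to roughly 10-11% more correct answers, not just cheaper runs.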

Just posted the update -> https://news.ycombinator.com/item?id=47016959

Thank you for this question - I'm building out a benchmark now. Initial results are very promising, will update you once it's done!