Hacker News new | ask | show | jobs
by sho 57 days ago
Taking the article's 5% accuracy improvement at face value: if true, then it's more than worth the token inflation IMO. That's because of tool call chains, where errors compound and accumulate, and small improvements in accuracy get greatly magnified.

Again, the article's numbers are likely a rather crude approximation, but taking 85% accuracy (claude 4.6) vs 90% (4.7) as inputs:

  4.6 1 iteration 85%
  4.7 1 iteration 90%
  4.6 5 iterations 44.37%
  4.7 5 iterations 59.85%
  4.6 10 iterations 19.69%
  4.7 10 iterations 34.87%
Compounded, small improvements really move the needle downstream. 1.4x doesn't seem worth it for 5% better, but 10 calls in, that's more than a 40% improvement.
1 comments

You're assuming errors cannot be retried/recovered. They can.
You're assuming errors are a clear failure that can be identified and retried, rather than a silent drift from user intent that simply feeds bad but well-formed results into the next step. They're not. Well, of course sometimes they are, but the much more insidious failure mode is doing the wrong thing in the right way.