Hacker News new | ask | show | jobs
by h14h 66 days ago
For some real data, Artificial Analysis reported that 4.6 (max) and 4.7 (max) used 160M tokens and 100M tokens to complete their benchmark suite, respectively:

https://artificialanalysis.ai/?intelligence-efficiency=intel...

Looking at their cost breakdown, while input cost rose by $800, output cost dropped by $1400. Granted whether output offsets input will be very use-case dependent, and I imagine the delta is a lot closer at lower effort levels.

1 comments

This is the right way of thinking end-to-end.

Tokenizer changes are one piece to understand for sure, but as you say, you need to evaluate $/task not $/token or #tokens/task alone.