Hacker News
by jsnell 181 days ago
There are plenty. But it's not the comparison you want to be making. There is too much variability in the number of tokens used for a single response, especially once reasoning models became a thing. And it gets even worse when you put the models into a variable-length output loop.

You really need to look at the cost per task. artificialanalysis.ai has a good composite score, measures the cost of running all the benchmarks, and has a 2D intelligence vs. cost graph.
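
To make the per-token vs. per-task difference concrete, here is a rough sketch. All token counts and prices are made up for illustration; none of them come from a real model or from artificialanalysis.ai.

```python
def cost_per_task(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one task, given token counts and per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# Hypothetical model A: cheap per token, but reasons at length before answering.
cheap_reasoning = cost_per_task(2_000, 30_000, in_price_per_m=0.50, out_price_per_m=2.00)

# Hypothetical model B: expensive per token, but answers directly.
pricey_direct = cost_per_task(2_000, 1_500, in_price_per_m=3.00, out_price_per_m=15.00)

print(f"A: ${cheap_reasoning:.4f} per task, B: ${pricey_direct:.4f} per task")
```

With these invented numbers, the model that is several times cheaper per token ends up roughly twice as expensive per task, which is why the per-token price alone can mislead.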


thanks
For reference, the above depends entirely on what you're using them for. For many tasks, the number of tokens used is consistent to within 10-20%.