Hacker News
by nl 32 days ago
Yes, it's a big thing that people are slowly becoming more aware of.

Nvidia models are even worse than Qwen! https://sql-benchmark.nicklothian.com/#token-efficiency-and-... (mouse over the cells for token counts and click for traces)

Gemma 4 is good for this, as AA notes:

> Gemma 4 31B is notably token efficient, using 39M output tokens to run the Intelligence Index vs 98M for Qwen3.5 27B (Reasoning). This is ~2.5x fewer output tokens for a model scoring 3 points lower. For context, the other models at the 42-point intelligence level also use significantly more tokens: MiniMax-M2.5 (56M), DeepSeek V3.2 (Reasoning, 61M), and GLM-4.7 (Reasoning, 167M)

https://artificialanalysis.ai/articles/gemma-4-everything-yo...
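To make the ratios in that quote concrete, here is a quick sketch that recomputes them from the quoted token counts only. The dictionary and function names are my own, purely for illustration, and the numbers are just the ones AA reports above, not anything I measured.

```python
# Output tokens (in millions) to run the Intelligence Index,
# copied from the Artificial Analysis quote above.
output_tokens_m = {
    "Gemma 4 31B": 39,
    "Qwen3.5 27B (Reasoning)": 98,
    "MiniMax-M2.5": 56,
    "DeepSeek V3.2 (Reasoning)": 61,
    "GLM-4.7 (Reasoning)": 167,
}


def ratios_vs(baseline: str, tokens: dict[str, float]) -> dict[str, float]:
    """Return each model's token usage as a multiple of the baseline's."""
    base = tokens[baseline]
    return {name: count / base for name, count in tokens.items()}


# Hypothetical helper usage: compare everything against Gemma 4 31B.
ratios = ratios_vs("Gemma 4 31B", output_tokens_m)
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {ratio:.2f}x")
```

The Qwen3.5 27B figure works out to roughly 2.5x, which matches the "~2.5x fewer output tokens" in the quote.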

1 comment

Right, and all of my own evals back this up for Gemma 4...

...except it's notably worse at coding in an agent context, even with a harness set up to do exactly what Google says it should do (with respect to sending summarised thinking back and so on).

So despite being far more token-efficient, it's just worse for what I need it for compared to DSv4 Flash or Qwen 3.6 27B.

Such a shame, too.