Hacker News new | ask | show | jobs
by jmward01 59 days ago
Haiku not getting an update is becoming telling. I suspect we are reaching a point where the low end models are cannibalizing high end and that isn't going to stop. How will these companies make money in a few years when even the smallest models are amazing?
4 comments

Isn't it pretty common for the smaller models to release a little while after the bigger ones, for all the big model providers?
The last update for Haiku was in October, or in startup land, 10 years ago.
It seems to be a rule that older models are more expensive than newer ones. The low end models have higher $CPT and worse output. I wonder if the move is to just have one model and quantize if you hit compute constraints
> It seems to be a rule that older models are more expensive than newer ones.

It isn't. Gemini has gotten more expensive with each release. Anthropic has stayed pretty similar over time, no? When is the last time OpenAI dropped API prices? OpenAI started very high because they were the first, so there was a ton of low hanging fruit and there was much room to drop.

I'm talking about gross margins, not revenue.

It's well known that GPT-4 is much more expensive to operate than the GPT-5 family.

Of course they won't drop the prices; it's pure profit if they make models more efficient.

Google is putting a lot of research into small models. Most of my AI budget is now going to small models because I am doing lots of tiny tasks that the small models do great with. I would think a decent chunk of Goog's API revenue probably comes from their small models.
The Gemma models are at this point. A 31B model that can fit on a consumer card is as good as Sonnet 4.5. I haven't put it through as much on the coding front or tool calling as I have the Claude or GPT models, but for text processing it is on par with the frontier models.
absolutely not on par you're smoking
You make a compelling argument, but thankfully I have data to back up my anecdotal experience

This comparison shows them neck and neck https://benchlm.ai/compare/claude-sonnet-4-5-vs-gemma-4-31b

As Does this one https://llm-stats.com/models/compare/claude-sonnet-4-6-vs-ge...

And the pelican benchmark even shows them pretty close https://simonwillison.net/2026/Apr/2/gemma-4/ https://simonwillison.net/2025/Sep/29/claude-sonnet-4-5/

Also this isn't a fringe statement, you can see most people who have done an evaluation agree with me

I think one area I find hard to get around is context length. Everything self hosted is so limited on length that it is marginal to use. Additionally I think that the tools (like claude code) are clearly in the training mix for Anthropic's models so they seem to get a boost over other models pushed into that environment. That being said, open source and local inference is -really- good and only going to get better. There is no doubt that the current frontier biz model is not sustainable.
if you look at the details of the numbers of the benchmarks that you shared, Sonnet 4.5 crushes gemma 4. Somehow the first link doesn't run Sonnet on the multi modal benchmark, that's why the top score looks close, it beats Gemma at every benchmark they actually ran. The arena in the second shows that it actually destroys Gemma 4 as well, not close
The second one is Sonnet 4.6 not 4.5. If you change it to 4.5 Gemma 4 actually beats 4.5
Just to be clear, did you notice the parent said 4.5?
They are also on par in a lot of classification tasks. I did have to actually use gemma4 and fine tune it a bit but that is part of the value add.
I did, what's your point?