HN Mirror

by _fat_santa 102 days ago

A question I've been asking a lot lately (really since the release of GPT-5.3) is "do I really need the more powerful model?"

I think a big issue with the industry right now is that it's constantly chasing higher-performing models, and that comes at the cost of everything else. What I would love to see in the next few years is all these frontier AI labs go from trying to create the most powerful model at any cost to actually making the whole thing sustainable and focusing on efficiency.

The GPT-3 era was a taste of what the future could hold, but those models were toys compared to what we have today. We saw real gains during the GPT-4 / Claude 3 era, when models could start being used as tools but required quite a bit of oversight. Now, in the GPT-5 / Claude 4 era, I don't really think we need to go much further; it's time to start focusing on efficiency and sustainability.

What I would love the industry to start focusing on in the next few years is not the high end but the low end: making 0.5B to 1B parameter models better at specific tasks. I'm currently experimenting with fine-tuning 0.5B models for very specific tasks, and long term I think that's the future of AI.

7 comments

namnnumbr 102 days ago

Yes! I'd be totally happy with today's Sonnet 4.6 if I could run it locally.

If you can forgive the obviously AI-generated writing, [CPUs Aren't Dead](https://seqpu.com/CPUsArentDead) makes an interesting point on AI progress: Google's latest, smallest Gemma model (Gemma 4 E2B), which can run on a cell phone, outperforms GPT-3.5-turbo. Granted, this factoid is based on `MT-Bench` performance, a benchmark from 2023 which I assume to be both fully saturated and leaked into the training data for modern LLMs. However, cross-referencing [Artificial Analysis' Intelligence Index](https://artificialanalysis.ai/models?models=gemma-4-e2b-non-...) suggests that the latest 2B open-weights models are indeed capable of matching or beating 175B models from 3-4 years ago. Perhaps more impressive, [Gemma 4 E4B matches or beats GPT-4o](https://artificialanalysis.ai/models?models=gemma-4-e4b%2Cge...) on many benchmarks.

If this trend continues, perhaps we'll be able to run models with the capabilities of today's best ones reasonably well on our laptops!

renticulous 102 days ago

Does everyone need a graphing calculator? Does everyone need a scientific calculator? Does everyone need a normal calculator? Does everyone need GeoGebra or Desmos?

minimaxir 102 days ago

Many people were hoping that Sonnet 4.6 was "Opus 4.5 quality but with Sonnet speed/cost" but unfortunately that didn't pan out.

malfist 102 days ago

You can already see people here saying the same stuff about Opus 4.7; I saw a comment claiming that Opus 4.7 on low thinking was better than 4.6 on high.

I'm not seeing that in my testing, but these opinions are all vibe based anyway.

samuelknight 102 days ago

The cost of intelligence is non-linear, with slightly dumber models costing much less. For a growing surface of problems you do not need frontier intelligence. You should use frontier intelligence for situations where you would otherwise require human intervention throughout the workflow, which is much more expensive than any model.
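To make that trade-off concrete, here is a minimal sketch of expected-cost model routing. Everything in it is a hypothetical placeholder: the model names, per-task prices, success rates, and human-intervention costs are invented for illustration and are not measurements of any real model. The only point is the shape of the decision: pick the option that minimizes inference cost plus (failure rate × cost of a human fixing the failure).

```python
from dataclasses import dataclass


@dataclass
class ModelOption:
    name: str
    cost_per_task: float  # hypothetical inference cost per task, in dollars
    success_rate: float   # hypothetical chance the task succeeds with no human stepping in


def expected_cost(option: ModelOption, human_cost: float) -> float:
    """Inference cost plus the expected cost of a human fixing a failure."""
    return option.cost_per_task + (1.0 - option.success_rate) * human_cost


def cheapest(options: list[ModelOption], human_cost: float) -> ModelOption:
    """Return the option with the lowest expected total cost per task."""
    return min(options, key=lambda o: expected_cost(o, human_cost))


# Hypothetical numbers: the small model is 50x cheaper but fails 10x as often.
small = ModelOption("small-model", cost_per_task=0.001, success_rate=0.90)
frontier = ModelOption("frontier-model", cost_per_task=0.05, success_rate=0.99)

for human_cost in (0.10, 20.0):
    choice = cheapest([small, frontier], human_cost)
    print(f"human intervention ${human_cost:.2f}: use {choice.name}")
# prints:
# human intervention $0.10: use small-model
# human intervention $20.00: use frontier-model
```

With these made-up numbers the break-even point is a human-intervention cost of about $0.54 per failure: below that, the cheaper, "slightly dumber" model wins; above it, the frontier model pays for itself by avoiding human involvement.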

Bridged7756 102 days ago

Efficiency doesn't make as much money. It's in the big LLM providers' best interest to keep inference computationally expensive.

I personally think the whole "the newest model is crazy! You've gotta use X (insert most expensive model)" thing is just FOMO, with marketing-prone people parroting whatever they've seen in the news or online.

nprateem 102 days ago

So you're happy with an untrustworthy lazy moron prone to stupid mistakes and guesswork?

Surely you can see the first lab that solves this gains a massive advantage?

fkealy 102 days ago

I agree, and yet here I am using it... However, I think the industry IS going in multiple directions at once, with smaller models, bigger models, etc. I need to try out Google's latest models, but alas, what can one person do in the face of so many new models...