HN Mirror

by JAG_Ecalona 69 days ago

The sweet spot thing is the real insight here and nobody seems to be talking about it.

Frontier models get hyped for their maximum task horizon, but that's also where they're 10-30x more expensive per hour than in their optimal range. You're paying a massive premium for the hardest tasks and still failing half the time.

Honestly the practical takeaway is pretty boring: just break your work into smaller chunks. Not because the models can't handle longer tasks, but because the economics at shorter task lengths are just way better. The labs are racing to push the horizon out; the smart move for anyone actually paying the bills is to stay near the sweet spot and orchestrate from there.

3 comments

drzaiusx11 69 days ago

Model specialization is in all likelihood going to be the way forward, both for cost and quality of output: smaller, cheaper models specialized in their task domains. Many of the current model vendors are already attempting to do this under the hood.

Generalist models have problems similar to those of generalist humans: the proverbial "jack of all trades, master of none."

That said, I've made my career as a generalist :)

margalabargala 69 days ago

Anyone trying to decide which of 30 different specialized models best fits their task has already failed.

Maybe the future of the backend is specialized models but the future of what faces the user is what appears to be a generalist model. Maybe it does things itself, maybe it just knows how to route to the specialist models, but the UX of a generalist model will win.

drzaiusx11 68 days ago

Users shouldn't be picking models directly at all unless they really want to. Almost no one does, though certainly some will.

I meant something more like automatic selection and negotiation of which model gets which task, based on filtering criteria and so on, all happening under the hood as you say.
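To make the "under the hood" selection concrete, here is a minimal sketch of one way such routing could work. Everything in it is hypothetical: the `Model` type, the `route` function, the domain tags, and the cost figures are invented for illustration and do not describe any vendor's actual system. It filters models by task domain, picks the cheapest match, and falls back to a generalist when nothing matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical model descriptor; fields are illustrative assumptions."""
    name: str
    domains: frozenset      # task domains this model is specialized for
    cost_per_task: float    # made-up relative cost, not real pricing


def route(task_domain: str, models: list, fallback: Model) -> Model:
    """Pick the cheapest model that claims the task's domain, else the fallback."""
    candidates = [m for m in models if task_domain in m.domains]
    if not candidates:
        return fallback
    return min(candidates, key=lambda m: m.cost_per_task)


# Illustrative registry; names and costs are assumptions.
specialists = [
    Model("code-small", frozenset({"code"}), 1.0),
    Model("code-large", frozenset({"code", "math"}), 5.0),
]
generalist = Model("generalist", frozenset(), 10.0)

for domain in ("code", "math", "poetry"):
    print(domain, "->", route(domain, specialists, generalist).name)
```

The user only ever names a task; which model handles it is decided by the filter, which is the generalist-looking UX described upthread.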

zozbot234 69 days ago

Small chunks of work start to become viable for local agentic use too. The O(N^2) dependence of attention cost on context length really makes the "maximum tasks" a complete non-starter locally.
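A rough back-of-the-envelope sketch of why chunking helps under a quadratic cost model. The function names and token counts below are illustrative assumptions, and the model is deliberately crude: it counts only the N^2 term and ignores linear costs, caching, and the overhead of re-supplying shared context to each chunk.

```python
def attention_cost(n_tokens: int) -> int:
    """Relative cost of one context of n_tokens under a pure N^2 model."""
    return n_tokens * n_tokens


def chunked_cost(n_tokens: int, n_chunks: int) -> int:
    """Relative cost when the same tokens are split into equal, independent contexts."""
    chunk = n_tokens // n_chunks
    return n_chunks * attention_cost(chunk)


full = attention_cost(100_000)      # one long context
split = chunked_cost(100_000, 10)   # ten contexts of 10,000 tokens each
print(full // split)  # 10
```

Under this simplified model, splitting a task into k equal independent chunks cuts the quadratic term by a factor of k, which is the effect that makes smaller tasks more tractable on local hardware.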

nusl 69 days ago

You write awfully similarly to the way LLMs do. I can't tell if it's just your writing style or not.

yorwba 68 days ago

It's an LLM account, as you can tell from the hallucinated comment in the thread on Japanese penguins: https://news.ycombinator.com/item?id=47816256
