by CuriouslyC 6 days ago
Models like Cursor's Composer 2.5 show that you can get real work done without the crazy costs just by focusing on a domain. AGI is silly in part because models are spiky: in addition to generality making the model more expensive for all queries, you can't easily tell a priori what the model will be good at. The smaller focused model is cheaper to run, and if you ask a biology/chemistry model a coding question (or vice versa), it's user error rather than ignorance of the underlying training data distribution.

"Focusing on a domain" has a hard ceiling.

A model's capability is a function of model size, and you can only push a small overspecialized "idiot savant" model so far before its crippling size starts to bite you.

You can make a model like Composer 2.5. But Mythos 5 will beat it on capability, both at coding and at everything else. And the world is always hungry for more capabilities.

If you're running high on agentic AI and low on human oversight, paying x2 for going from 5% faults to 2% faults is a good deal.
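A rough way to sanity-check that trade-off, as a sketch only: assume each task costs the model's price plus the fault rate times whatever a fault costs you downstream. The `fault_cost` figure and the normalized prices (1 for the cheap model, 2 for the expensive one) are assumptions for illustration, not numbers from any real deployment.

```python
def expected_cost(price: float, fault_rate: float, fault_cost: float) -> float:
    """Expected cost per task: model price plus the expected cost of a fault."""
    return price + fault_rate * fault_cost


def break_even_fault_cost(price_small: float, rate_small: float,
                          price_big: float, rate_big: float) -> float:
    """Fault cost above which the pricier, more reliable model is cheaper overall.

    Solves price_small + rate_small * F == price_big + rate_big * F for F.
    """
    return (price_big - price_small) / (rate_small - rate_big)


# Prices normalized so the cheaper model costs 1 unit per task (assumption).
threshold = break_even_fault_cost(price_small=1.0, rate_small=0.05,
                                  price_big=2.0, rate_big=0.02)

# Hypothetical case: an undetected fault costs 100x one cheap-model call.
cheap_total = expected_cost(1.0, 0.05, fault_cost=100.0)
big_total = expected_cost(2.0, 0.02, fault_cost=100.0)
```

Under these assumptions the 2x model wins once a single fault costs more than about 33 cheap-model calls, which is easy to hit when nobody is reviewing the agent's output.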

I'm not a very smart person, so take what I say with a grain of salt.

I think the path forward will have agents that use models individually specialized for their tasks (some might use a bigger model, some might use smaller models), plus orchestrators that are good at knowing when to use which agent type.

I've played around with this in my own tiny coding agents, for TTRPG NPCs, and even a small experiment where LLMs controlled a MUD client as an NPC that played the game with you (only 5 rooms in the experiment).

Basically, break the tasks down into chunks so you don't have to use generalist models for everything, and can choose the right model for the job.

I'm also running all of this locally, where a generalist foundation model doesn't work and heavily quantized models don't perform well for all tasks, so if you have an unlimited token budget, my solution is probably overkill.
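The "right model for the job" idea can be sketched as a tiny router. This is a minimal illustration, not the setup described above: the model names, domains, and relative costs are all hypothetical, and a real orchestrator would need some way to classify the task before routing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str           # hypothetical model identifier
    domains: frozenset  # task domains the model is specialized for
    cost: float         # relative cost per call, made-up units


# Hypothetical registry: two small specialists and one big generalist.
REGISTRY = [
    ModelSpec("small-coder", frozenset({"code"}), 1.0),
    ModelSpec("small-npc", frozenset({"dialogue"}), 1.0),
    ModelSpec("big-generalist", frozenset({"code", "dialogue", "planning"}), 10.0),
]


def route(domain: str, hard: bool = False) -> ModelSpec:
    """Pick the cheapest model covering the domain.

    Tasks flagged as hard go to the most expensive covering model instead.
    """
    candidates = [m for m in REGISTRY if domain in m.domains]
    if not candidates:
        raise ValueError(f"no model registered for domain {domain!r}")
    pick = max if hard else min
    return pick(candidates, key=lambda m: m.cost)
```

For example, `route("code")` picks the small coding model, while `route("code", hard=True)` or `route("planning")` falls through to the generalist. Using cost as a stand-in for capability is a simplification; the hard part in practice is deciding what counts as "hard".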

The "orchestrator" pattern, "only use a big model to do big thinking, use smaller models to do grunt work", is probably what the field will converge on eventually. Perhaps in the form of "dynamic sparsity" - i.e. a family of closely related models allowing inference to transition from 1B class to 100T class on a dime, complete with something like a joint KV cache.

But it's a hard pattern to pull off, so I'm not sure how soon we'll see it in action.

Mythos is 20x more expensive, though.
Fable 5 is listed at merely x2 of Opus 4.8 on OpenRouter. $10/$50 per 1M I/O, vs $5/$25.

Now, Fable 5 is currently borderline unusable because of asinine filters. But I assume they'll fix this shit eventually.

I'm talking about the price compared to Composer 2.5.