by ACCount37 7 days ago
"Orchestrator" pattern, "only use a big model to do big thinking, use smaller models to do grunt work" is probably what the field would converge to, eventually. Perhaps in form of "dynamic sparsity" - i.e. a family of closely related models allowing inference to transition from 1B class to 100T class on a dime, complete with something like joint KV cache.

But it's a hard pattern to pull off, so I'm not sure how soon we'll see it in action.
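
To make the basic orchestrator idea concrete, here's a rough sketch of the control flow only. Everything in it is made up for illustration: `Orchestrator`, `stub_big` and `stub_small` are hypothetical names, and the "models" are plain functions standing in for real inference calls. It doesn't attempt the dynamic sparsity or joint KV cache part, which is where the actual difficulty lies.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumption: a "model" is just a function from prompt text to output text.
Model = Callable[[str], str]


@dataclass
class Orchestrator:
    big: Model    # expensive model: planning and final synthesis only
    small: Model  # cheap model: one call per subtask

    def run(self, task: str) -> str:
        # Big thinking: ask the big model for a plan, one subtask per line.
        plan = self.big("PLAN: " + task)
        subtasks = [s.strip() for s in plan.splitlines() if s.strip()]
        # Grunt work: hand each subtask to the small model.
        results = [self.small(s) for s in subtasks]
        # Big thinking again: combine the partial results.
        return self.big("COMBINE: " + " | ".join(results))


# Hypothetical stubs so the sketch runs without any real model.
calls: Dict[str, int] = {"big": 0, "small": 0}


def stub_big(prompt: str) -> str:
    calls["big"] += 1
    if prompt.startswith("PLAN: "):
        return "parse input\ntransform rows\nwrite output"
    # For a COMBINE prompt, just echo the joined results back.
    return prompt[len("COMBINE: "):]


def stub_small(prompt: str) -> str:
    calls["small"] += 1
    return "done(" + prompt + ")"


if __name__ == "__main__":
    out = Orchestrator(big=stub_big, small=stub_small).run("convert a CSV file")
    print(out)
    print(calls)
```

With these stubs the big model is called twice (plan, combine) and the small model once per subtask, which is the cost split the pattern is after. A real version would also need to decide when a subtask is too hard for the small model and escalate it.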