by ACCount37
7 days ago
The "orchestrator" pattern ("only use a big model to do big thinking, use smaller models to do grunt work") is probably what the field will eventually converge on. Perhaps in the form of "dynamic sparsity", i.e. a family of closely related models that lets inference move from the 1B class to the 100T class on a dime, complete with something like a joint KV cache. But it's a hard pattern to pull off, so I'm not sure how soon we'll see it in action.
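For anyone who hasn't seen the basic orchestrator setup, here's a minimal sketch of the routing half of it: the big model only writes the plan, and the small model executes each step. Everything here is hypothetical. `Orchestrator`, `fake_big` and `fake_small` are made-up stand-ins for real model endpoints, and the "one subtask per line" plan format is just an assumption to keep it short.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: prompt in, text out.
Model = Callable[[str], str]


@dataclass
class Orchestrator:
    big: Model    # expensive model, only used for planning ("big thinking")
    small: Model  # cheap model, used for every subtask ("grunt work")

    def run(self, task: str) -> list[str]:
        # 1. One call to the big model: decompose the task, one subtask per line.
        plan = self.big(f"PLAN: {task}")
        subtasks = [s.strip() for s in plan.splitlines() if s.strip()]
        # 2. Many calls to the small model: execute each subtask independently.
        return [self.small(s) for s in subtasks]


def fake_big(prompt: str) -> str:
    # Stand-in planner: splits the task on " and " into separate lines.
    task = prompt.removeprefix("PLAN: ")
    return "\n".join(task.split(" and "))


def fake_small(prompt: str) -> str:
    # Stand-in worker: just marks the subtask as done.
    return f"done: {prompt}"


orch = Orchestrator(big=fake_big, small=fake_small)
print(orch.run("parse logs and summarize errors"))
# → ['done: parse logs', 'done: summarize errors']
```

Note this only covers the easy part (discrete routing between two separate models). The "dynamic sparsity" version with a shared KV cache across model sizes is exactly what a sketch like this can't capture, and that's where it gets hard.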