by losvedir 4 days ago
Do you have more info about this? I can't tell if you're being misled by the unfortunate "Mixture of Experts" terminology (MoE models don't work the way you're describing) or alluding to something different.

Or maybe I'm wrong, but my understanding is that MoE is just an architecture for keeping the number of activated weights per token small. Tokens get routed to experts basically one at a time, and the "experts" themselves don't have a semantic domain, so the word "expert" was maybe a poor choice.
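To make the "routed token-by-token" point concrete, here is a minimal top-1 routing sketch in plain Python. It assumes toy sizes and random router weights (all names and numbers are hypothetical, not from any real model). The router is just a learned linear map over each token's hidden vector; nothing in it encodes a topic or domain.

```python
import random
from typing import List, Sequence

# Hypothetical toy sizes and random router weights, for illustration only.
NUM_EXPERTS = 4
D_MODEL = 3
_rng = random.Random(0)
ROUTER = [[_rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)] for _ in range(NUM_EXPERTS)]


def route_token(token_vec: Sequence[float]) -> int:
    """Pick the expert whose router row gives this token's vector the highest score."""
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in ROUTER]
    return max(range(NUM_EXPERTS), key=lambda i: scores[i])


def route_sequence(tokens: Sequence[Sequence[float]]) -> List[int]:
    """Route each token on its own; the choice depends only on that token's vector."""
    return [route_token(t) for t in tokens]
```

For example, `route_sequence([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])` always sends the first and third tokens to the same expert, whatever the "subject" of the surrounding text. In a real model the vectors are contextual hidden states at each MoE layer, so routing is per token per layer rather than per prompt.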


No, this is an agent-level thing, not a feature of the model (-ish; I'm unsure about Fable).

You talk to a smart, heavy model to build a plan composed of smaller steps. Then you have the heavy model spin up smaller, cheaper LLMs to actually implement the tasks.

The heavy model is basically read-only in that mode. It can read files, execute tests, etc., but it can’t write code. It just tracks what needs to be done, offloads the work to dumber LLMs, validates that each task is done, and moves on to the next step.
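A minimal sketch of that loop, assuming each model role can be treated as a plain callable. `Orchestrator`, `plan`, `delegate` and `validate` are hypothetical names, not any real harness's API; a real tool would back them with LLM calls, file reads and test runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class Orchestrator:
    """Hypothetical planner/worker loop: the planner never writes code itself."""
    plan: Callable[[str], List[str]]       # heavy model: goal -> list of step descriptions
    delegate: Callable[[str], str]         # cheap model: step -> proposed change
    validate: Callable[[str, str], bool]   # heavy model, read-only: check the change (e.g. run tests)
    max_attempts: int = 3
    log: List[str] = field(default_factory=list)

    def run(self, goal: str) -> List[Step]:
        steps = [Step(d) for d in self.plan(goal)]
        for step in steps:
            for attempt in range(1, self.max_attempts + 1):
                result = self.delegate(step.description)
                if self.validate(step.description, result):
                    step.done = True
                    self.log.append(f"{step.description}: ok after {attempt} attempt(s)")
                    break
            else:
                # No attempt passed validation; record it and move on to the next step.
                self.log.append(f"{step.description}: gave up")
        return steps
```

The design choice worth noting is that only `delegate` produces changes; the planner's job is limited to splitting the goal and judging results, which is what lets the expensive model spend few tokens.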

It sort of pushes humans up the stack. Instead of having a human sitting there prompting the LLM to start the next task, you have another LLM do that loop.

It’s been on my list to try out.

The AWS Kiro (https://kiro.dev) spec-driven coding harness operates this way in Auto mode, which is billed at the base token rate.

Manually specifying Sonnet or Opus applies a multiplier to the base token rate; specifying Qwen applies a fraction of it. Left to its own devices, it presumably uses the heavier models to create the plan and orchestrate the work, and delegates the bite-sized tasks to smaller models.
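A back-of-the-envelope sketch of why that split pays off. The multipliers (2.0 for a heavy model, 0.25 for a light one) and the 10k/90k token split are made-up assumptions for illustration, not Kiro's actual pricing or any measured workload.

```python
# Hypothetical multipliers relative to a base token rate of 1.0.
# Illustrative only; these are NOT Kiro's published pricing.
MULTIPLIER = {"heavy": 2.0, "light": 0.25}


def cost(tokens_by_tier):
    """Sum token counts weighted by each tier's multiplier (in base-rate token units)."""
    return sum(MULTIPLIER[tier] * n for tier, n in tokens_by_tier.items())


# Hypothetical workload: planning/validation is small, implementation is large.
all_heavy = cost({"heavy": 10_000 + 90_000})           # everything on the heavy model
delegated = cost({"heavy": 10_000, "light": 90_000})   # heavy plans, light implements
```

Under these assumed numbers `all_heavy` is 200000.0 and `delegated` is 42500.0 base-rate units; the real ratio depends entirely on the actual multipliers and on how token usage splits between planning and implementation.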

https://en.wikipedia.org/wiki/Mixture_of_experts#Sparsely-ga...

"The sparsely-gated MoE layer,[21] published by researchers from Google Brain, uses feedforward networks as experts, and linear-softmax gating. Similar to the previously proposed hard MoE, they achieve sparsity by a weighted sum of only the top-k experts, instead of the weighted sum of all of them."

"Top-k experts": in the case of some DeepSeek models, k=1.
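The quoted gating rule can be sketched in a few lines: a linear gate scores every expert, only the top-k scores survive, and a softmax over those k scores becomes the mixing weights, so the remaining experts are never evaluated. This is a toy single-token sketch with illustrative names; it omits the tunable noise the sparsely-gated MoE layer adds before the top-k step, and assumes each expert returns a vector the same size as its input.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def top_k_gate(logits: Sequence[float], k: int) -> Dict[int, float]:
    """Softmax over only the k largest logits; every other expert implicitly gets weight 0."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_layer(x: Vector, gate_weights: Sequence[Vector],
              experts: Sequence[Callable[[Vector], Vector]], k: int) -> Vector:
    """Weighted sum of the outputs of only the top-k experts for this token."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]  # linear gate
    out = [0.0] * len(x)
    for i, g in top_k_gate(logits, k).items():
        y = experts[i](x)  # only the selected experts are ever evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out
```

With k=1 the single surviving weight is exactly 1.0 (for example, `top_k_gate([1.0, 2.0, -3.0], 1)` gives `{1: 1.0}`), so the layer's output is just the chosen expert's output for that token.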

See OpenRouter’s recent announcement of a model fusion setup, which they now support via API:

https://openrouter.ai/blog/announcements/fusion-beats-fronti...