Hacker News
by simgt 89 days ago
If you're adding a model to do the "routing", you're basically adding learned backward connections, and you end up with an RNN.
1 comment

Mixture of Experts models already have routing models.

I'm just suggesting eliminating (or weakening) the distinction between layers and experts and having just the one block, then iterating that one until its 'good enough' score plus (iteration_count * spontaneity) is greater than some threshold.
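
One possible reading of that stopping rule, as a minimal Python sketch rather than a real architecture. Everything named here is an assumption made up for illustration: `make_block` (a single shared residual block with random weights), `stability_score` (a stand-in 'good enough' score based on how little the state changed), and the default values for `spontaneity` and `threshold`. A real version would presumably learn both the block and the score.

```python
import math
import random


def make_block(dim, seed=0):
    """Build one shared block, x -> x + tanh(W x), with fixed random weights (hypothetical)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    w = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def block(x):
        return [
            xi + math.tanh(sum(wij * xj for wij, xj in zip(row, x)))
            for xi, row in zip(x, w)
        ]

    return block


def stability_score(prev, new):
    """Stand-in 'good enough' score in (0, 1]: higher when the state barely moved."""
    change = math.sqrt(sum((a - b) ** 2 for a, b in zip(prev, new)))
    return 1.0 / (1.0 + change)


def iterate_until_good_enough(x, block, score_fn,
                              spontaneity=0.1, threshold=1.0, max_iters=100):
    """Reapply the same block until score + iteration_count * spontaneity > threshold."""
    for iteration_count in range(1, max_iters + 1):
        new_x = block(x)
        score = score_fn(x, new_x)
        x = new_x
        if score + iteration_count * spontaneity > threshold:
            return x, iteration_count
    # Safety cap: only reached if the rule never fires (e.g. spontaneity == 0).
    return x, max_iters


if __name__ == "__main__":
    block = make_block(dim=4)
    state, steps = iterate_until_good_enough([0.5, -0.2, 0.1, 0.9], block, stability_score)
    print(f"stopped after {steps} iterations")
```

With a non-negative score and a positive `spontaneity`, the `iteration_count * spontaneity` term alone eventually crosses the threshold, so the loop always halts; the score only decides how early. Note that the same block is applied at every step, which is exactly the weight sharing over iterations that makes this look like an RNN.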