by zozbot234
7 days ago
Normally, experts are picked for every layer, not just every token. But there are plausible ways of getting around that bottleneck while streaming if you can batch many inferences together. Still, the Apple approach of swapping the experts only rarely is interesting, though it likely degrades the model a lot.
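To make the batching point concrete, here is a rough sketch (not taken from any real inference engine) that counts how many expert-weight loads are needed when experts are streamed in per token versus once per layer for a whole batch. The router is a made-up random stand-in, and `route`, `loads_unbatched`, and `loads_batched` are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict


def route(num_tokens, num_layers, num_experts, top_k, seed=0):
    """Hypothetical router: picks top_k distinct experts per (layer, token).

    A real MoE router is a learned gate; random choice is an assumption
    made only so the counting below has something to work on.
    """
    rng = random.Random(seed)
    return [
        [rng.sample(range(num_experts), top_k) for _ in range(num_tokens)]
        for _ in range(num_layers)
    ]


def loads_unbatched(routing):
    """Every token streams in its own experts at every layer."""
    return sum(len(experts) for layer in routing for experts in layer)


def loads_batched(routing):
    """Per layer, each distinct expert is loaded once and reused
    for all tokens in the batch that were routed to it."""
    total = 0
    for layer in routing:
        tokens_by_expert = defaultdict(list)
        for token, experts in enumerate(layer):
            for expert in experts:
                tokens_by_expert[expert].append(token)
        total += len(tokens_by_expert)
    return total


if __name__ == "__main__":
    routing = route(num_tokens=64, num_layers=4, num_experts=8, top_k=2)
    print("unbatched loads:", loads_unbatched(routing))
    print("batched loads:  ", loads_batched(routing))
```

The batched count can never exceed layers × experts, no matter how many tokens are in flight, which is why batching many inferences amortizes the streaming cost. It does nothing for a single latency-bound request, though.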
|
Got all those tokens, isn’t that the point of auto research and friends??
(Only sort of joking).