|
|
|
|
|
by idonotknowwhy
437 days ago
|
|
> It's not even per token. The routing happens once per layer, with the same token bouncing between layers. They don't really "bounce around" though do they (during inference)? That implies the token could bounce back from eg. layer 4 -> layer 3 -> back to layer 4. |
|