by boroboro4
283 days ago
> even with 0.0 temperature due to MOE models routing at a batch level, and you're very unlikely to get a deterministic batch.

I don't think this is correct: MoE routing happens on a per-token basis. It can become non-deterministic and batch-dependent if you try to balance the expert load within a batch, but that's a performance optimization (like everything else in the blog post), not the way the models are trained to work.
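To make the distinction concrete, here is a rough toy sketch in plain Python. Everything in it is hypothetical: `top1_route`, `capacity_route`, the `capacity` parameter and the logits are made up for illustration, and the capacity limit is just one possible form of batch-level load balancing, not how any particular model or serving stack does it.

```python
def top1_route(token_logits):
    """Plain per-token routing: each token picks its highest-scoring expert.

    The choice for a token depends only on that token's own router logits.
    """
    return [max(range(len(logits)), key=logits.__getitem__)
            for logits in token_logits]


def capacity_route(token_logits, capacity):
    """Hypothetical load-balanced routing with a per-expert capacity.

    Tokens are processed in batch order. A token whose preferred expert is
    full falls back to its next-best expert with room, or None if all are full.
    """
    load = {}
    assignments = []
    for logits in token_logits:
        # Experts sorted from best to worst score for this token.
        order = sorted(range(len(logits)), key=lambda e: -logits[e])
        chosen = None
        for expert in order:
            if load.get(expert, 0) < capacity:
                chosen = expert
                load[expert] = load.get(expert, 0) + 1
                break
        assignments.append(chosen)
    return assignments


token = [2.0, 1.0]                   # prefers expert 0
others = [[3.0, 0.0], [2.5, 0.1]]    # also prefer expert 0

alone_plain = top1_route([token])[0]
batched_plain = top1_route(others + [token])[-1]
alone_capped = capacity_route([token], capacity=2)[0]
batched_capped = capacity_route(others + [token], capacity=2)[-1]
```

With plain per-token routing the token lands on the same expert whether it is alone or batched. With the capacity limit, the same token gets pushed to a different expert once the other tokens in the batch have filled its preferred one, which is the batch dependence being discussed.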