| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by boroboro4 283 days ago
	> even with 0.0 temperature due to MOE models routing at a batch level, and you're very unlikely to get a deterministic batch. I don’t think this is correct - MoE routing happens at per token basis. It can be non deterministic and batch related if you try to balance out your experts load in a batch but that’s performance optimization (just like all of the blogpost) and not the way models are trained to work.

1 comments

Ah interesting, good point. So I guess expert-choice routing leaks across the batch. Now I'm not sure.