Hacker News new | ask | show | jobs
by jampekka 43 days ago
The experts in MoEs aren't specialized in any meaningful task sense. From level of what we would think as tasks MoEs are selected essentially arbitrarily per token and per block.
1 comments

It’s unsupervised, yes, but “unspecialized in any meaningful task sense” is incorrect, that’s the whole point. It’s just not in the sense of “this is a legal expert, this is a software developer”.
Optimal expert separation depends on the goal and can be pretty arbitrary, for example DeepSeek v4 separates them more or less by domain if I remember correctly.