by miohtama 202 days ago
All modern models are MoE already, no?

That's not the case. Some are dense and some are hybrid.

MoE is not the holy grail, as there are drawbacks, e.g. less consistent outputs and expert under- or over-use (some experts get most of the tokens while others sit nearly idle).
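
To make the under/over-use point concrete, here is a toy sketch, not any real model's router. It assumes top-1 routing where each token's gate score is a per-expert bias plus Gaussian noise; the names `route_top1` and `load_imbalance` and the bias values are made up for illustration.

```python
import random

def route_top1(num_tokens, bias, seed=0):
    """Toy top-1 router: each token goes to the expert with the highest score.

    Score = per-expert bias + Gaussian noise. This is a stand-in for learned
    gate logits (an assumption for illustration, not a real gating network).
    """
    rng = random.Random(seed)
    counts = [0] * len(bias)
    for _ in range(num_tokens):
        scores = [b + rng.gauss(0.0, 1.0) for b in bias]
        counts[scores.index(max(scores))] += 1
    return counts

def load_imbalance(counts):
    """Busiest expert's load divided by the mean load; 1.0 is perfectly balanced."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean

# Eight experts with no preference vs. one expert the gate slightly favours.
balanced = route_top1(10_000, [0.0] * 8)
skewed = route_top1(10_000, [2.0] + [0.0] * 7)

print("balanced:", balanced, round(load_imbalance(balanced), 2))
print("skewed:  ", skewed, round(load_imbalance(skewed), 2))
```

Even a modest bias toward one expert sends most tokens its way in this toy setup, which is why real MoE training usually adds some kind of load-balancing term to the loss.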