by hexomancer 810 days ago
Mixtral is also a MoE model, hence the name: mixtral.
1 comments

Despite both being MoEs, the architectures are different. DBRX has double the number of experts in the pool (16 vs 8 for Mixtral), and doubles the active experts (4 vs 2).
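
To put numbers on that difference, here is a rough sketch that counts how many distinct sets of experts a router could pick per token, using only the pool and active counts quoted above. The function name `expert_combinations` is made up for illustration, and this assumes plain top-k selection where the order of the chosen experts doesn't matter; it says nothing about how either model's router actually works.

```python
from math import comb

def expert_combinations(pool_size: int, active: int) -> int:
    """Number of distinct expert subsets when `active` experts are
    chosen out of `pool_size` (order ignored). Hypothetical helper."""
    return comb(pool_size, active)

# Figures taken from the comment above: 16 choose 4 vs 8 choose 2.
dbrx = expert_combinations(16, 4)
mixtral = expert_combinations(8, 2)

print(f"DBRX:    {dbrx} possible expert subsets")
print(f"Mixtral: {mixtral} possible expert subsets")
print(f"Ratio:   {dbrx // mixtral}x")
```

So doubling both numbers gives 1820 possible subsets instead of 28, i.e. 65x as many, even though the fraction of experts active per token stays the same (4/16 = 2/8 = 25%).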