Hacker News
by sangnoir 812 days ago
Despite both being MoEs, the architectures are different. DBRX has double the number of experts in the pool (16 vs. 8 for Mixtral) and doubles the active experts per token (4 vs. 2).
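
To put a number on how different that is, here is a rough sketch that counts how many distinct expert subsets a router could pick for a single token, using only the expert counts quoted above. The `configs` dict and variable names are just illustrative, and it assumes simple top-k routing where the order of the chosen experts doesn't matter.

```python
from math import comb

# (experts in the pool, experts active per token), as quoted in the comment above
configs = {"DBRX": (16, 4), "Mixtral": (8, 2)}

# Distinct unordered expert subsets a top-k router could select for one token
combos = {name: comb(total, active) for name, (total, active) in configs.items()}

for name, n in combos.items():
    print(f"{name}: {n} possible expert combinations")

ratio = combos["DBRX"] / combos["Mixtral"]
print(f"DBRX has {ratio:.0f}x as many combinations")
```

So doubling both numbers multiplies the routing choices far more than 2x, since 16-choose-4 grows much faster than 8-choose-2.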