Hacker News
by
hexomancer
810 days ago
Mixtral is also a MoE model, hence the name: *mix*tral.
sangnoir
810 days ago
Despite both being MoEs, the architectures are different. DBRX has double the number of experts in the pool (16 vs 8 for Mixtral) and double the number of active experts (4 vs 2).
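To put those numbers in perspective, here is a quick back-of-the-envelope sketch (not taken from either model's code) that counts how many distinct sets of experts a router could pick for a single token. It assumes plain top-k selection where the order of the chosen experts doesn't matter; the function name `expert_combinations` is just made up for illustration.

```python
from math import comb

def expert_combinations(num_experts: int, active: int) -> int:
    """Number of distinct expert subsets of size `active` out of `num_experts`.

    Assumption: the router picks an unordered set of experts per token.
    """
    return comb(num_experts, active)

dbrx = expert_combinations(16, 4)     # 16 experts, 4 active
mixtral = expert_combinations(8, 2)   # 8 experts, 2 active

print(dbrx, mixtral, dbrx // mixtral)  # prints: 1820 28 65
```

So doubling both the pool and the active count gives 65x as many possible expert combinations per token, not just 2x.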