Hacker News
by phamilton, 3 days ago
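MTP (multi-token prediction) on a MoE model is hit or miss. If you're bottlenecked on memory, MTP can increase the number of active experts per forward pass (like any batch processing would), because each extra token being processed may route to different experts. Loading those extra expert weights can eat away at the gains from MTP.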
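To put a rough number on the "more active experts" effect, here is a back-of-the-envelope sketch. It assumes each token picks its `top_k` experts uniformly at random and independently of the other tokens, which real routers do not do (nearby tokens tend to share experts), so treat it as an upper-ish estimate rather than a measurement. The function name and the 128-expert, top-8 configuration are made up for illustration.

```python
def expected_active_experts(num_experts: int, top_k: int, num_tokens: int) -> float:
    """Expected number of distinct experts touched in one layer when
    num_tokens tokens are processed together.

    Assumption (hypothetical, for illustration): each token selects top_k
    distinct experts uniformly at random, independently of other tokens.
    """
    # Probability that a given expert is not selected by any of the tokens.
    p_unused = (1.0 - top_k / num_experts) ** num_tokens
    return num_experts * (1.0 - p_unused)


if __name__ == "__main__":
    # Hypothetical MoE layer: 128 experts, 8 active per token.
    for n in (1, 2, 4, 8):
        active = expected_active_experts(128, 8, n)
        print(f"{n} token(s) per step: ~{active:.1f} experts loaded")
```

With one token per step you load exactly `top_k` experts' weights. With several tokens verified together the count grows close to linearly at first, so if weight loading is the bottleneck, the per-step cost rises along with the number of tokens you hope to accept.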