by SlavikCA 4 days ago
And with MTP (or other speculation techniques) you can ~double that.

MTP on a MoE is hit or miss. If you're bottlenecked on memory bandwidth, MTP can increase the number of active experts per step (like any batch processing would), and the extra expert weights you have to read can eat into the gains.
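
To put rough numbers on that, here is a back-of-the-envelope sketch, not a benchmark. It assumes routing is uniform and independent per token, that step time is proportional to the expert weights read, and that each draft token is accepted independently with a fixed rate. The function names and the example figures (64 experts, top-8, 80% acceptance) are made up for illustration. It also ignores shared weights (attention, dense layers, KV cache), and real routing tends to be correlated across adjacent tokens, so treat it as a pessimistic bound.

```python
def expected_active_experts(num_experts: int, top_k: int, tokens: int) -> float:
    """Expected number of distinct experts touched when `tokens` tokens are
    processed together, assuming uniform, independent top-k routing."""
    # Chance that one given expert is missed by every token in the batch.
    miss = (1.0 - top_k / num_experts) ** tokens
    return num_experts * (1.0 - miss)


def mtp_speedup(num_experts: int, top_k: int, draft_tokens: int, accept_rate: float) -> float:
    """Rough memory-bound speedup of MTP over plain decoding (toy model)."""
    # Expected tokens emitted per verification step: one guaranteed token,
    # plus draft i only if drafts 1..i were all accepted.
    tokens_per_step = sum(accept_rate ** i for i in range(draft_tokens + 1))
    # Relative cost: expert weights read when verifying 1 + draft_tokens
    # tokens at once, versus the top_k experts read for a single token.
    cost = expected_active_experts(num_experts, top_k, draft_tokens + 1) / top_k
    return tokens_per_step / cost


if __name__ == "__main__":
    # Hypothetical sparse MoE: 64 experts, 8 active per token, 1 draft token.
    print(f"MoE:   {mtp_speedup(64, 8, 1, 0.8):.2f}x")
    # Dense-like case (every expert always active): batching reads nothing extra.
    print(f"Dense: {mtp_speedup(8, 8, 1, 0.8):.2f}x")
```

Under these assumptions the sparse case lands slightly below 1x while the dense-like case keeps the full gain, which is the "hit or miss" part: the more the verification batch spreads across new experts, the less of the speculation win survives.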