Hacker News
SlavikCA, 4 days ago
And with MTP (multi-token prediction) or other speculative decoding techniques, you can roughly double that.
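A rough way to see where "about double" can come from, as a minimal sketch rather than a benchmark: assume each drafted token is accepted independently with a fixed probability. The function name and the example numbers below are hypothetical, picked only for illustration.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumption: each drafted token is accepted independently with
    probability accept_rate, and the pass always yields one more token
    from the main model. This is the geometric sum
    1 + a + a^2 + ... + a^draft_len.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        # Every draft is accepted, so the closed form below would divide by zero.
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


if __name__ == "__main__":
    # Hypothetical numbers: 2 drafted tokens, 70% acceptance each.
    print(expected_tokens_per_step(0.7, 2))
```

With those made-up numbers you get a bit over 2 tokens per pass, which only turns into a 2x speedup if the verification pass costs about the same as a normal single-token pass.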
phamilton, 3 days ago
MTP on an MoE is hit or miss. If you're bottlenecked on memory, MTP can increase the number of active experts per forward pass (as any batching would), which can eat into its gains.
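To put a rough number on the expert effect, here is a back-of-the-envelope sketch. It assumes the router picks experts uniformly and independently for each token, which real routers do not, so treat it as an estimate only. The function name and the 128-expert, top-8 configuration are hypothetical.

```python
def expected_active_experts(num_experts: int, top_k: int, tokens_in_pass: int) -> float:
    """Expected number of distinct experts touched in one MoE layer.

    Assumption: each token selects top_k of num_experts uniformly at
    random, independently of the other tokens in the same pass.
    """
    if not 0 < top_k <= num_experts:
        raise ValueError("need 0 < top_k <= num_experts")
    if tokens_in_pass < 0:
        raise ValueError("tokens_in_pass must be non-negative")
    # Probability that one given expert is NOT chosen by one given token.
    p_miss = 1.0 - top_k / num_experts
    # An expert is active if at least one token in the pass chose it.
    return num_experts * (1.0 - p_miss ** tokens_in_pass)


if __name__ == "__main__":
    # Hypothetical config: 128 experts, top-8 routing.
    for n in (1, 2, 3, 4):
        print(n, expected_active_experts(128, 8, n))
```

Under these assumptions, verifying 3 tokens in one pass touches about 22.5 experts per layer instead of 8, so nearly 3x the expert weights have to be read from memory. If memory is the bottleneck, that extra traffic offsets much of what the accepted draft tokens save.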