by
rishabhjain1198
923 days ago
In a MoE model with experts_per_token = 2 and each expert having 7B params, after picking the experts it should run as fast as the slowest 7B expert, not a comparable 14B model.
1 comments
nullc
923 days ago
Only assuming it's able to hide the faster one in free parallelism.
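
To make the two positions concrete, here is a toy latency model, not a benchmark. It assumes the expert stage for one token costs the maximum of the selected experts' latencies when they run fully in parallel, and their sum when they run one after another. The function name `moe_token_latency` and the millisecond figures are hypothetical, and router cost, memory bandwidth, and the output-combine step are ignored.

```python
def moe_token_latency(expert_latencies_ms, parallel):
    """Toy model of expert-stage latency for a single token.

    parallel=True  : experts overlap completely, so the slowest one dominates.
    parallel=False : experts run back to back, so their costs add up.
    """
    if parallel:
        return max(expert_latencies_ms)
    return sum(expert_latencies_ms)


# Hypothetical latencies for the 2 experts picked for this token (ms).
experts = [10.0, 12.0]

parallel_ms = moe_token_latency(experts, parallel=True)
serial_ms = moe_token_latency(experts, parallel=False)

print(f"parallel: {parallel_ms} ms, serial: {serial_ms} ms")
```

Under these assumptions the "as fast as the slowest expert" claim holds only in the parallel case; with no spare parallelism the cost looks more like running both experts in sequence.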
moffkalast
923 days ago
My CPU trying its best to run inference: parallelwhat?