Hacker News
by
gautam5669
783 days ago
This was the first thought that came to my mind too.
Given that it's sparse, will this just be a replacement for MoE?
samus
781 days ago
MoE is mostly used to enable load balancing, since it makes it possible to put experts on different GPUs. This isn't so easy to do with a monolithic but sparse layer.
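To make the placement point concrete, here is a rough toy sketch in plain Python. Everything in it is made up for illustration (the `route_top1` and `tokens_per_device` helpers, the `expert_to_device` map, the router scores); it is not how any particular framework does it. It just shows that once each expert is a separate unit, you can pin it to a device and count how many tokens land on each one.

```python
from collections import Counter

def route_top1(scores):
    """Pick the highest-scoring expert for each token (top-1 gating)."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores]

def tokens_per_device(assignments, expert_to_device):
    """Count how many tokens each device processes, given where experts live."""
    load = Counter()
    for expert in assignments:
        load[expert_to_device[expert]] += 1
    return dict(load)

# Hypothetical placement: 4 experts, two per GPU.
expert_to_device = {0: "gpu0", 1: "gpu0", 2: "gpu1", 3: "gpu1"}

# Made-up router scores: one row per token, one column per expert.
scores = [
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.2, 0.6, 0.1],
    [0.0, 0.7, 0.2, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]

assignments = route_top1(scores)
load = tokens_per_device(assignments, expert_to_device)
print(assignments, load)
```

With a single monolithic sparse layer there is no ready-made unit like an expert to hand to a device, which is why the same kind of placement is harder there.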