by gautam5669 783 days ago
This was the first thought that came to my mind too.

Given that it's sparse, will this just be a replacement for MoE?

1 comment

MoE is mostly used to enable load balancing, since it makes it possible to put experts on different GPUs. This isn't so easy to do with a monolithic but sparse layer.
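
To make the placement point concrete, here is a rough sketch, assuming top-1 routing and a round-robin expert-to-device map. The function names (`place_experts`, `route_top1`, `dispatch`) and the router scores are made up for illustration and are not taken from any particular library or paper; device IDs are plain integers, so this runs without any GPU.

```python
from collections import defaultdict

def place_experts(num_experts, num_devices):
    """Hypothetical round-robin placement: expert e lives on device e % num_devices."""
    return {e: e % num_devices for e in range(num_experts)}

def route_top1(router_scores):
    """Pick the highest-scoring expert for each token (top-1 routing)."""
    return [max(range(len(scores)), key=scores.__getitem__) for scores in router_scores]

def dispatch(router_scores, placement):
    """Group (token index, expert) pairs by the device hosting the chosen expert."""
    per_device = defaultdict(list)
    for token_idx, expert in enumerate(route_top1(router_scores)):
        per_device[placement[expert]].append((token_idx, expert))
    return dict(per_device)

# Toy router scores: 4 tokens, 4 experts (illustrative numbers only).
scores = [
    [0.1, 0.7, 0.1, 0.1],
    [0.6, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
    [0.2, 0.1, 0.6, 0.1],
]
placement = place_experts(num_experts=4, num_devices=2)
batches = dispatch(scores, placement)
```

Because each expert is a separate block of weights, a token only has to be sent to the one device that holds its expert. A monolithic sparse layer has no such natural unit to hand to a device, which is the difficulty described above.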