by rfoo 472 days ago
For decode, MoE is nice at either bs=1 (decoding for a single user) or bs=<very large> (do EP, i.e. expert parallelism, to efficiently serve a large number of users).

Anything in between suffers.
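
A rough way to see why the middle hurts, as a back-of-the-envelope sketch rather than a benchmark: assume each token picks its top-k experts uniformly and independently (real routers are not uniform), and count how many distinct experts one decode step touches versus how many tokens each touched expert gets. The expert count (256) and top-k (8) are made-up illustrative numbers, and the function names are hypothetical.

```python
def expected_active_experts(num_experts: int, top_k: int, batch_size: int) -> float:
    """Expected number of distinct experts hit by one decode step.

    Assumes uniform, independent routing: a given expert is missed by a
    single token with probability (1 - top_k / num_experts).
    """
    p_missed_by_all = (1.0 - top_k / num_experts) ** batch_size
    return num_experts * (1.0 - p_missed_by_all)


def tokens_per_active_expert(num_experts: int, top_k: int, batch_size: int) -> float:
    """Average tokens processed by each expert whose weights must be read."""
    active = expected_active_experts(num_experts, top_k, batch_size)
    return batch_size * top_k / active


if __name__ == "__main__":
    E, K = 256, 8  # hypothetical MoE layer: 256 experts, top-8 routing
    for bs in (1, 8, 32, 128, 4096):
        active = expected_active_experts(E, K, bs)
        per_expert = tokens_per_active_expert(E, K, bs)
        print(f"bs={bs:5d}  active experts ~{active:6.1f}  tokens/expert ~{per_expert:6.1f}")
```

Under these assumptions, bs=1 only reads 8 experts' weights per step. By bs=32 a step already reads well over half of all experts, but each one still sees only a token or two, so you pay for most of the weight traffic without getting much batching out of it. Only at very large bs does each expert get enough tokens to amortize the read, which is where EP comes in.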