As someone who has spent quite a lot of time on inference, I would add a small note:
Deployment looks very different for MoE models than for dense models, so I would say it is more nuanced than "inference memory reqs remain the same". Memory requirements can be very different for MoE-style models.
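
To make that concrete, here is a rough back-of-the-envelope sketch in Python. All the numbers are made up for illustration (a hypothetical 8-expert model that routes each token to 2 experts), and it assumes every expert stays resident in accelerator memory, with no offloading and no KV cache or activations counted:

```python
def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Hypothetical MoE configuration; these values are illustrative assumptions.
shared_params = 2e9       # attention, embeddings, etc., used by every token
params_per_expert = 1e9   # parameters in one expert, summed over all layers
num_experts = 8           # experts stored in the model
experts_per_token = 2     # experts the router picks for each token

# Every expert has to be loaded, because any token may be routed to any expert.
total_params = shared_params + num_experts * params_per_expert

# Only this many parameters take part in the compute for a single token.
active_params = shared_params + experts_per_token * params_per_expert

moe_resident_gb = weight_memory_gb(total_params)    # what you have to fit
dense_equiv_gb = weight_memory_gb(active_params)    # dense model, same per-token compute

print(f"MoE weights resident:   {moe_resident_gb:.1f} GB")
print(f"Dense model, same FLOPs: {dense_equiv_gb:.1f} GB")
```

Under those assumptions the MoE model needs memory for all of its parameters even though each token only touches a fraction of them, which is one reason the memory picture can diverge from a dense model with similar per-token compute.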