| HN Mirror

I agree! The models will definitely keep getting bigger, and MoEs are a part of that trend, sorry if that wasn’t clear.

A pod of gen2-H100s might have 256 GPUs with 40 TB of total memory, and could easily run a 10T param model. So I think we are far from diminishing returns on the hardware side :) The model quality also continues to get better at scale.

Re. reading material, I would take a look at DeepSpeed’s blog posts (not affiliated btw). That team is super super good at hardware+software optimization for ML. See their post on MoE models here: https://www.microsoft.com/en-us/research/blog/deepspeed-adva...