There's a huge difference between diffusion models, which were built to run on commodity hardware, and large autoregressive models like GPT. You can't even run GPT-3 in the cloud without some specialized interconnect.
How do you know this? Not doubting you, just curious. I've always wondered about the requirements and size of GPT-3, because EleutherAI's GPT-NeoX-20B takes about 40GB of VRAM to run, and I think it is the closest analogue to GPT-3.
No, you can’t build a GPU cluster to run GPT without a special, very fast interconnect like InfiniBand. Stable Diffusion, by contrast, can run on a single GPU such as an RTX 3090.
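A rough back-of-the-envelope sketch of why the size gap matters, assuming weights stored in fp16 (2 bytes per parameter) and counting only the weights themselves (no activations, KV cache, or framework overhead). The 175B figure for GPT-3 is its published parameter count; the ~1B figure for Stable Diffusion is an approximate total across its components, assumed here for illustration. The helper names are hypothetical.

```python
import math

BYTES_PER_PARAM_FP16 = 2  # assumption: weights held in fp16/bf16
GB = 1e9                  # decimal gigabytes, as VRAM is usually quoted
RTX_3090_VRAM_GB = 24


def weight_memory_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Memory needed just to hold the weights (ignores activations and KV cache)."""
    return n_params * bytes_per_param / GB


def gpus_needed(n_params: float, vram_gb: float) -> int:
    """Minimum number of GPUs whose combined VRAM can hold the weights."""
    return math.ceil(weight_memory_gb(n_params) / vram_gb)


models = {
    "Stable Diffusion (approx.)": 1e9,
    "GPT-NeoX-20B": 20e9,
    "GPT-3 (175B)": 175e9,
}

for name, n in models.items():
    print(f"{name}: {weight_memory_gb(n):.0f} GB of weights, "
          f"at least {gpus_needed(n, RTX_3090_VRAM_GB)} x RTX 3090")
```

The 20B case reproduces the ~40GB figure mentioned above. Under these assumptions GPT-3's weights alone come to about 350 GB, so they would have to be split across at least 15 cards of 3090 size, and every forward pass then has to move activations between those cards; that is where the speed of the interconnect starts to matter.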