Is it possible to break the model apart? Or does the entire thing need to be architected from the get-go such that an individual GPU can own a portion end to end?
It's possible to break the model apart. For the larger models, it's not just that an 8 GB card isn't enough: even a single 80 GB card isn't enough. But splitting needs a high-speed interconnect (Nvidia pods provide hundreds of Gbps, and use all of it), because the cards have to exchange those parameters quite often. So you end up limited by the interconnect speed just as much as by your compute.
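To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the helper names are made up, the 70-billion-parameter model is hypothetical, weights are taken as 2 bytes each (fp16), and the 200 Gbps link is just one point in the "hundreds of Gbps" range. It counts weights only and ignores activations, optimizer state, and protocol overhead, so real requirements will be higher.

```python
import math

def weights_gb(n_params, bytes_per_param=2):
    """Memory for the weights alone, in GB (assumes fp16 = 2 bytes each)."""
    return n_params * bytes_per_param / 1e9

def cards_needed(n_params, card_gb, bytes_per_param=2):
    """Minimum number of cards just to hold the weights (nothing else)."""
    return math.ceil(weights_gb(n_params, bytes_per_param) / card_gb)

def transfer_seconds(n_bytes, link_gbps):
    """Ideal time to move n_bytes over a link rated in gigabits per second."""
    return n_bytes * 8 / (link_gbps * 1e9)

if __name__ == "__main__":
    n_params = 70_000_000_000  # hypothetical 70B-parameter model
    print("weights (GB):", weights_gb(n_params))
    print("80 GB cards needed:", cards_needed(n_params, 80))
    print("8 GB cards needed:", cards_needed(n_params, 8))
    # Moving 1 GB between cards over an assumed 200 Gbps link.
    print("1 GB transfer (s):", transfer_seconds(1e9, 200))
```

Even under these idealized assumptions, each gigabyte exchanged costs tens of milliseconds on such a link, and that cost is paid every time the cards have to sync, which is why the interconnect matters as much as the compute.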