It's difficult to speculate as to the exact failure from blurry pictures but the solder on that choke (inductor) looks terrible.
Something went wrong in manufacturing. The solder should have wicked to cover the entire pad, not just a small square, and there should be no (brown) discoloration.
Ok, how are people powering these things? 2.4kW is well beyond a standard circuit in the US. Are people having 240V/30A circuits installed? Are they hijacking the dryer plugs? EV charger plugs? Hot tub circuits?
240V-20A circuits will handle 3.8kW continuous. It’s probably a 240V-20A circuit, as that is what the power supplies typically want. It’s also easy to convert an outlet to 240V if the breaker is dedicated to that outlet: you swap the breaker and the outlet, not the wires.
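For anyone who wants to check the arithmetic, here is a rough sketch. It assumes the usual rule of thumb that a continuous load should stay at or below 80% of the breaker rating; the 0.8 factor, the function name, and the list of circuits are my own illustration, not anything from the build itself.

```python
# Rough continuous-capacity check for a few common US circuits.
# Assumption: continuous loads are limited to 80% of the breaker rating.

CONTINUOUS_DERATE = 0.8  # assumed rule-of-thumb factor


def continuous_watts(volts: float, breaker_amps: float,
                     derate: float = CONTINUOUS_DERATE) -> float:
    """Continuous power (W) a circuit can carry under the assumed derating."""
    return volts * breaker_amps * derate


# (label, volts, breaker amps): illustrative examples only
CIRCUITS = [
    ("120V/15A", 120, 15),
    ("120V/20A", 120, 20),
    ("240V/20A", 240, 20),
    ("240V/30A", 240, 30),
]

LOAD_WATTS = 2400  # the 2.4kW figure mentioned in the thread

if __name__ == "__main__":
    for label, volts, amps in CIRCUITS:
        capacity = continuous_watts(volts, amps)
        verdict = "ok" if LOAD_WATTS <= capacity else "too small"
        print(f"{label}: {capacity:.0f} W continuous -> {verdict}")
```

Under that assumption a 240V/20A circuit comes out at roughly 3.8kW continuous, which matches the figure above, while neither of the 120V circuits covers 2.4kW on its own.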
Chaining two PSUs on separate circuits is also an option. If they're using the MaxQ versions though, the total GPU power draw is only ~1200W. The bigger question to me is how are they cooling it? Sticking an AC in that room just doubles the power draw issues.
It's basically on two different circuits/breakers. The Asus WRX90E supports two PSUs as well. You may need to synchronize both PSUs, and several adapters for this are available on Amazon. I'm planning to upgrade it to 240V soon.
I wonder whether those cards ran the model that wrote the nonsense about the forces involved.
Hint: when you have a piece of metal stuck with thermal goop to a lot of components, the force doesn’t “concentrate” on one of them. You need to detach it from each one with however much force is needed to detach it from that component.
Cool post. FYI you might be better off getting one big fan for your "radiator" instead of lots of little fans. Big fans don't need to spin as fast as small fans to push the same amount of air. So they run a lot quieter.
Sure, you may call 140mm fans little, but they do need enough static pressure for the radiators. This setup is already several times quieter than the stock setup.
Is that little computer training LLMs from scratch all by itself? That must take years to get any kind of progress, given the scale of training other providers do. Where do you get the training data from?
If you want ready, well engineered, water-cooled multi-GPU research workstations, my colleagues at https://comino.com build and sell them. Or you can purchase fitted waterblocks from them for many GPUs, and build your own.