Hacker News
by angoragoats 702 days ago
Agree 100% that energy costs are important. The example system in my other post would consume somewhere around 300W at idle, 24/7, which is 219 kWh per month, and that's assuming you aren't using the machine at all.
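For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic. The 730 hours/month figure is just 24 * 365 / 12, and the electricity price is a made-up placeholder (an assumption, not a figure from this thread), so swap in your own rate.

```python
# Rough idle-energy estimate for an always-on machine.
# HOURS_PER_MONTH is an average month (24 * 365 / 12 = 730 hours).
HOURS_PER_MONTH = 24 * 365 / 12


def monthly_kwh(watts: float) -> float:
    """Energy used in an average month at a constant power draw."""
    return watts / 1000 * HOURS_PER_MONTH


def monthly_cost(watts: float, price_per_kwh: float) -> float:
    """Monthly cost at a flat per-kWh price (price is a hypothetical input)."""
    return monthly_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    idle_watts = 300            # idle draw quoted in the comment above
    assumed_price = 0.15        # hypothetical $/kWh, purely illustrative
    print(f"{monthly_kwh(idle_watts):.0f} kWh/month")  # 219 kWh/month
    print(f"${monthly_cost(idle_watts, assumed_price):.2f}/month at the assumed rate")
```

Actual usage under load would sit on top of this, so treat it as a floor rather than an estimate of total consumption.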

I don't have any actual figures to back this up, but my gut tells me that the fact that enterprise GPUs are an order of magnitude (at least) more expensive than, say, a 3090 means that their payback period has got to be pretty long. I also wonder whether setting the max power on a 3090 to a lower-than-default value (as I suggest in my other post) has a significant effect on the average W/token.


Agreed, but there are other costs associated with supporting 10-16 GPUs that may not come up with, say, 6 GPUs: having to go from single socket (or Threadripper) to dual socket, PCIe bifurcation, PLX risers, etc.

Not necessarily saying that Quadros are cheaper, just that there's more to the calculation when trying to run 405B-size models at home.

The system I outlined in my other post [0] has ten GPUs and does not require dual socket CPUs as far as I'm aware. It could likely scale easily to 14 GPUs as well (assuming you have sufficient power), with an x8/x8 bifurcation adapter installed in each PCIe slot. This is pushing the limits of the PCIe subsystem I'm sure, but you could also likely scale up to 28 GPUs, again assuming sufficient power, by simply bifurcating at x4/x4/x4/x4 vs x8/x8.
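To make the slot math explicit, here is a quick sketch. The count of seven x16 slots is an assumption inferred from the 14 and 28 figures above (14 / 2 and 28 / 4), not something taken from a board spec sheet, and it ignores power, cooling, and whether the board's firmware actually supports each bifurcation mode.

```python
# Upper bound on GPU count when every slot is bifurcated the same way.
# slots and lanes_per_slot are assumptions for illustration.

def max_gpus(slots: int, lanes_per_slot: int, lanes_per_gpu: int) -> int:
    """GPUs that fit if each slot is split into equal lanes_per_gpu links."""
    if lanes_per_gpu <= 0 or lanes_per_gpu > lanes_per_slot:
        raise ValueError("lanes_per_gpu must be between 1 and lanes_per_slot")
    return slots * (lanes_per_slot // lanes_per_gpu)


if __name__ == "__main__":
    slots = 7            # assumed number of x16 slots
    lanes_per_slot = 16
    for lanes in (16, 8, 4):
        count = max_gpus(slots, lanes_per_slot, lanes)
        print(f"x{lanes} per GPU: up to {count} GPUs")
```

With those assumptions, x8/x8 gives 14 and x4/x4/x4/x4 gives 28, matching the numbers above.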

I think it should work as-is with the components listed, but if you disagree please let me know!

[0] https://news.ycombinator.com/item?id=41047689