| HN Mirror



	by Patrick-STH 1217 days ago
There are a few big ones:

- The CUDA license does not allow you to use GeForce in the data center. In the US it has become less popular, but our Inspur AIStation piece covered a cluster located in China with GeForce cards. So it still happens, but less so.
- Memory capacity is another big challenge. Newer data center cards have 80GB, which dwarfs the 24GB on a 4090. We just got the RTX 6000 Ada in, so that is an option for more memory.
- For higher-end training, one of the big challenges is interconnect, so having NVLink plus InfiniBand or 100GbE+ NICs is important. The HGX A100 platform is designed for that with its NVSwitch and PCIe switch topology.

With all of that said, you are 100% right that many startups have used consumer cards for years. For example, Andrej Karpathy talked about how our DeepLearning11 build (8x 1080 Ti's) had a ~3 month payback period versus AWS: https://twitter.com/karpathy/status/924340245478256640


p1esk 1217 days ago

> Andrej Karpathy talked about how our DeepLearning11 build (8x 1080 Ti's) had a ~3 month payback period versus AWS

That was in 2017. Currently you can rent an 8xA100 server for $8.80/hr: https://lambdalabs.com/service/gpu-cloud

At this price the payback stretches to about 3 years (taking into account average energy costs in US, and assuming 24/7 operation for the whole 3 years).
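For anyone who wants to redo this estimate with their own numbers, here is a minimal sketch of the arithmetic. Only the $8.80/hr rate comes from the comment above. The purchase price, power draw, and electricity price are hypothetical placeholders, and the function names are made up for illustration.

```python
HOURS_PER_DAY = 24


def rental_cost(rate_per_hour: float, days: float) -> float:
    """Total cost of renting around the clock for `days` days."""
    return rate_per_hour * HOURS_PER_DAY * days


def payback_days(purchase_price: float,
                 rental_rate_per_hour: float,
                 power_kw: float,
                 electricity_per_kwh: float) -> float:
    """Days of 24/7 use until buying becomes cheaper than renting.

    Only electricity is counted as the running cost of owned hardware;
    hosting, cooling, staff time, and resale value are ignored.
    """
    own_cost_per_hour = power_kw * electricity_per_kwh
    saving_per_hour = rental_rate_per_hour - own_cost_per_hour
    if saving_per_hour <= 0:
        # Owning never pays back if power alone costs more than renting.
        return float("inf")
    return purchase_price / saving_per_hour / HOURS_PER_DAY


# Rate quoted in the thread: $8.80/hr for an 8xA100 server.
three_years = rental_cost(8.8, days=3 * 365)
print(f"3 years of 24/7 rental: ${three_years:,.0f}")

# HYPOTHETICAL inputs, not from the thread: replace with real quotes.
days = payback_days(
    purchase_price=150_000,    # assumed server price in USD
    rental_rate_per_hour=8.8,  # from the thread
    power_kw=6.0,              # assumed average draw
    electricity_per_kwh=0.15,  # assumed electricity price in USD
)
print(f"Payback under these assumptions: {days / 365:.1f} years")
```

Three years of round-the-clock rental at $8.80/hr comes to roughly $231k, so the payback period depends almost entirely on what the server costs to buy and how heavily it is used.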


TOMDM 1217 days ago

A bit of extra context: that's $8.80/hr for the 40GB A100s.

The 80GB A100s will run you $12.00/hr for 8 of them.


fancyfredbot 1217 days ago

That's $24k over 3 months, which would buy and power 8 4090 GPUs by my reckoning. Of course, those 8 4090s wouldn't have enough memory to run ChatGPT, so maybe AWS is good value after all.


TOMDM 1217 days ago

This is Lambda Labs' pricing, which is significantly cheaper than AWS.
