by Patrick-STH
1217 days ago
There are a few big ones:
- NVIDIA's license terms do not allow GeForce cards to be used in the data center. That practice has become less common in the US, but if you look at our Inspur AIStation piece, that was a cluster located in China with GeForce cards. So it still happens, just less often.
- Memory capacity is another big challenge. Newer data center GPUs have 80GB, which dwarfs the 24GB on an RTX 4090. We just got the RTX 6000 Ada in, and with 48GB it is an option for more memory.
- For higher-end training, one of the big challenges is interconnect, so having NVLink plus 100GbE+ or InfiniBand NICs is important. The HGX A100 platform is designed for that with its NVSwitch and PCIe switch topology.

With all of that said, you are 100% right that many startups have used consumer cards for years. For example, Andrej Karpathy talked about how our DeepLearning11 build (8x GTX 1080 Ti) had a ~3 month payback period versus AWS: https://twitter.com/karpathy/status/924340245478256640
That was in 2017. Currently you can rent an 8x A100 server for $8.80/hr: https://lambdalabs.com/service/gpu-cloud
At this price, the payback stretches to about 3 years (taking into account average energy costs in the US and assuming 24/7 operation for the whole 3 years).
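The payback estimate boils down to dividing the purchase price by the hourly savings of owning instead of renting (rent avoided minus the electricity you now pay for). Here is a minimal Python sketch of that arithmetic. The server price, power draw, and electricity rate in the usage line are hypothetical placeholders, not figures from this thread, and the model ignores resale value, hosting, maintenance, and financing.

```python
HOURS_PER_YEAR = 24 * 365  # assumes 24/7 operation, as in the comment above


def payback_years(purchase_price, rent_per_hour, power_kw, energy_price_per_kwh):
    """Years of continuous use until buying a server beats renting one.

    Each hour of ownership avoids rent_per_hour of rental cost but adds
    power_kw * energy_price_per_kwh of electricity cost.
    """
    hourly_savings = rent_per_hour - power_kw * energy_price_per_kwh
    if hourly_savings <= 0:
        raise ValueError("power cost alone exceeds the rental price; buying never pays back")
    return purchase_price / (hourly_savings * HOURS_PER_YEAR)


# Hypothetical inputs: $200k server, $8.80/hr rental, 6.5 kW draw, $0.16/kWh.
print(f"{payback_years(200_000, 8.80, 6.5, 0.16):.1f} years")  # prints "2.9 years"
```

Swapping in a real quote for the server and your local power rate is what moves the answer; the rental price only enters through the hourly savings term.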