by IHLayman
1438 days ago
ML workloads definitely cost a lot of money. Even on a preemptible VM, A100 GPUs cost $0.88/hr/GPU. That's roughly $630 a month for a single GPU, and only the 40GB model. Want a dedicated 8-GPU machine in the cloud to do training with? That'll run you around 16 grand a month. Do that for 2 years and you may as well have bought the device. Want to do 16/24/40-GPU training? Good luck getting dedicated cloud machines with networking fast enough between them for MPI to work correctly, and be prepared to give up your wallet. Also, that's just compute. What about data? Sure, the cloud accepts your data cheaply, but it also charges you for egress of that data. Yes, you should have your data in more than one location, but if you depend on cloud alone then you need it in different AZs, which costs even more money to keep in sync and available for training runs. I think for simple workloads and renting compute for a startup, cloud definitely makes sense. But the moment you try to do some serious compute for ML workloads, good luck, and hope you have deep pockets.
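
For anyone who wants to redo the arithmetic above with their own numbers, here is a rough Python sketch. The $0.88/hr rate and the $16k/month figure come from the comment. The 30-day, always-on month and the helper names `monthly_cost` and `break_even_months` are my own assumptions, and no hardware purchase price is filled in because the comment doesn't give one.

```python
import math

# Assumption: a 30-day month with the GPU running around the clock.
HOURS_PER_MONTH = 24 * 30


def monthly_cost(hourly_rate_per_gpu, gpus=1, hours=HOURS_PER_MONTH):
    """Rental cost in dollars for one month of continuous use."""
    return hourly_rate_per_gpu * gpus * hours


def break_even_months(purchase_price, monthly_rent):
    """Whole months of rent needed to match or exceed a purchase price.

    purchase_price is whatever your own hardware quote says; it is not
    a figure taken from the comment.
    """
    return math.ceil(purchase_price / monthly_rent)


# One preemptible A100 (40GB) at the quoted $0.88/hr/GPU.
single_gpu = monthly_cost(0.88)
print(f"1 preemptible A100: ${single_gpu:,.2f}/month")

# Dedicated 8-GPU machine at the quoted ~$16k/month, rented for 2 years.
two_year_rent = 16_000 * 24
print(f"Dedicated 8-GPU machine over 2 years: ${two_year_rent:,}")
```

Under those assumptions a single GPU comes out in the low $630s per month and two years of the dedicated machine totals $384,000. Plug a real quote into `break_even_months` to see when buying wins; note that this ignores power, hosting, and egress, so it is only a first-order comparison.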