by thesandlord
3072 days ago
According to the article, they are using NC24 VMs, which have 4 K80s attached, so yes, I would assume they are using GPUs. Check out https://github.com/google/kubeflow if you are interested in doing the same. (Disclaimer: I work for GCP doing K8s stuff. I know GKE clusters support GPUs and Kubeflow; I'm not 100% sure whether AKS does too or whether you need to set up your own cluster like OpenAI did.)
If I want to train a TF model distributed over many machines in GCP, it seems like I could use Cloud ML Engine or deploy Kubeflow to a K8s cluster running in GKE and train it there.
What should I consider when choosing between these two options? Is there another option I should consider?