by minimaxir
2927 days ago
With the new discounts on preemptible GPUs (https://cloudplatform.googleblog.com/2018/06/Introducing-imp...), the economics of quickly spinning up a fleet of GPUs with Kubernetes for a short, parallelizable ML task become very interesting. (assuming that Google allows enough GPU quota for a fleet of GPUs for nonenterprise users anyways)

What I want is to use Kubernetes + an instant GPU fleet for deep learning hyperparameter grid searching, i.e. spin up a lot of preemptible GPUs and, for each parameter config, train the model on a single GPU in parallel, so the search speeds up roughly linearly with the number of GPUs.

Kubeflow (https://github.com/kubeflow/kubeflow) is close to this functionality, but not quite there yet in user-friendliness: you have to package everything in a huge Docker container and launch jobs from the CLI. Ideally, I'd spawn containers and start training directly from the JupyterHub notebook on the master node.
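A rough sketch of the "one config per GPU" idea without Kubeflow: generate one Kubernetes `batch/v1` Job per grid point, each requesting a single `nvidia.com/gpu` and pinned to preemptible nodes via the GKE node label `cloud.google.com/gke-preemptible`. The grid, the container image `gcr.io/my-project/trainer:latest`, and the `--batch_size`/`--learning_rate` flags are all hypothetical placeholders, and this assumes the cluster already has GPU drivers and the device plugin installed.

```python
import itertools
import json

# Hypothetical search space; names and values are illustrative only.
GRID = {
    "learning_rate": [1e-3, 1e-4],
    "batch_size": [32, 64],
}

# Hypothetical training image; its entrypoint is assumed to accept --key=value flags.
IMAGE = "gcr.io/my-project/trainer:latest"


def job_manifest(index, params):
    """Build a Job that trains one hyperparameter config on one GPU."""
    args = [f"--{k}={v}" for k, v in sorted(params.items())]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"hparam-trial-{index}"},
        "spec": {
            # Allow a few retries, e.g. if a pod fails after its node is preempted.
            "backoffLimit": 4,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # GKE labels preemptible nodes with this key.
                    "nodeSelector": {"cloud.google.com/gke-preemptible": "true"},
                    "containers": [{
                        "name": "trainer",
                        "image": IMAGE,
                        "args": args,
                        # One GPU per trial, so trials run side by side.
                        "resources": {"limits": {"nvidia.com/gpu": 1}},
                    }],
                }
            },
        },
    }


def grid_jobs(grid):
    """Yield one Job manifest per point in the Cartesian product of the grid."""
    keys = sorted(grid)
    for i, values in enumerate(itertools.product(*(grid[k] for k in keys))):
        yield job_manifest(i, dict(zip(keys, values)))


if __name__ == "__main__":
    # Wrap the Jobs in a v1 List so the JSON can be piped to `kubectl apply -f -`.
    jobs = list(grid_jobs(GRID))
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": jobs}, indent=2))
```

Since the script only emits JSON, it could just as well be called from a notebook cell (e.g. piping the output to `kubectl apply -f -`), which is closer to the JupyterHub-driven workflow described above.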
> (assuming that Google allows enough GPU quota for a fleet of GPUs for nonenterprise users anyways)
This is actually why we have separate preemptible quota [1], which we grant more freely. You can't stock out our full-price customers, so we're happy to let you spin up tons of V100s (and as of this morning TPUs!).
[1] https://cloud.google.com/compute/quotas#quotas_for_preemptib...