by eggie5
3071 days ago
1. Well, deploying a k8s cluster is a huge engineering challenge.
2. Doing distributed training in general is also an engineering challenge.
3. Combining the two, distributed training on k8s using GPUs, is an even larger engineering challenge.

ML Engine is an order of magnitude easier. You just have to do step 2 and set up your model to use multiple GPUs.