by Longwelwind
1168 days ago
I don't know what tools you are using, but this can be achieved with Airflow on k8s, for example:

* Add a GPU resource requirement on one of your steps
* Add an auto-scaler that adds GPU nodes to your cluster based on the GPU resource demand.

After having written the above, I realize that it might sound like that famous HN comment about how you can /easily/ re-create Dropbox yourself, which might actually prove your point that there is a need for ML-specific tools for the training part.
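To make the first bullet concrete, here is a rough sketch of what "a GPU resource requirement" looks like at the Kubernetes level. It assumes the cluster runs the NVIDIA device plugin, which exposes GPUs under the `nvidia.com/gpu` resource name. The helper `gpu_pod_manifest`, the pod name, and the image are hypothetical, and how the spec reaches Airflow (a pod operator, an executor pod override, etc.) depends on your setup.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest whose container asks for `gpus` GPUs.

    Assumption: the NVIDIA device plugin is installed, so GPUs are
    schedulable under the "nvidia.com/gpu" extended resource name.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "train",
                    "image": image,  # hypothetical training image
                    # GPUs are requested through limits; while no GPU node
                    # is free, the pod stays Pending.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest("train-step", "example.com/trainer:latest")
    print(json.dumps(manifest, indent=2))
```

The second bullet then follows from the first: a pod like this stays Pending until a GPU node exists, and a node auto-scaler can react to that by adding one.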
Airflow is also absolutely not built for that purpose. It's ~10yr old Hadoop-era technology.