by acetabulum 1262 days ago
If you use Horovod Elastic, I think you can avoid this problem even when training across a cluster of Spot instances, since it lets a job keep running as workers are added or removed.

https://horovod.readthedocs.io/en/stable/elastic_include.htm...
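
To make the idea concrete, here is a rough toy sketch of the commit-and-rollback pattern that elastic training relies on, as I understand it. This is plain Python, not Horovod's API: `ElasticState`, `WorkerLost`, `run_elastic`, and `make_train` are all hypothetical names made up for illustration. In real Horovod the equivalent pieces are, I believe, the `hvd.elastic.run` decorator and `state.commit()`.

```python
import copy


class WorkerLost(Exception):
    """Hypothetical signal that a worker (e.g. a Spot instance) went away."""


class ElasticState:
    """Toy stand-in for an elastic training state (not Horovod's class)."""

    def __init__(self, **values):
        self.values = dict(values)
        self._saved = copy.deepcopy(self.values)

    def commit(self):
        # Snapshot the current values as the last known-good state.
        self._saved = copy.deepcopy(self.values)

    def restore(self):
        # Roll back to the last committed snapshot.
        self.values = copy.deepcopy(self._saved)


def run_elastic(train_fn, state, max_restarts=10):
    """Re-run train_fn from the last commit whenever a worker is lost."""
    for _ in range(max_restarts + 1):
        try:
            return train_fn(state)
        except WorkerLost:
            state.restore()
    raise RuntimeError("too many restarts")


def make_train(fail_at_steps, total_steps=10, commit_every=2):
    """Build a fake training loop that loses a worker once at each given step."""
    pending = set(fail_at_steps)

    def train(state):
        while state.values["step"] < total_steps:
            step = state.values["step"]
            if step in pending:
                pending.discard(step)
                raise WorkerLost(f"lost a worker at step {step}")
            state.values["weight"] += 1.0  # pretend gradient update
            state.values["step"] = step + 1
            if state.values["step"] % commit_every == 0:
                state.commit()
        return state.values

    return train


state = ElasticState(step=0, weight=0.0)
result = run_elastic(make_train(fail_at_steps={3, 7}), state)
print(result)
```

The trade-off is in `commit_every`: committing more often means less work is redone after an interruption, but each commit costs a copy of the state.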