Hacker News
by parsimo2010 2162 days ago
I don't know how deep you've dug, but the very first thing you should do is use spot instances instead of on-demand instances (unless you absolutely can never wait to train a model). Spot instances are cheaper than on-demand instances, with the downside that the price can fluctuate, so you need to build in a precaution for shutting down if the price gets too high. When the price goes up, you either stop training until it comes back down or suck it up and pay the higher price.

Luckily, it's pretty simple to handle interruptions for neural-network-like models that train over several iterations. Just save the model state periodically so you can shut the instance down whenever the price gets too high and start training again when the price is lower.
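
To make the save-and-resume idea concrete, here's a rough sketch in plain Python. Everything in it is made up for illustration: the function names, the `stop_after` knob that fakes a shutdown, and the "model", which is just a running sum standing in for real weights. With a real framework you'd save the model and optimizer state using that framework's own save/load functions instead of pickle.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    The rename is atomic, so a shutdown mid-write can't leave a half-written file.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or None if there is no checkpoint yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)


def train(path, total_steps, save_every, stop_after=None):
    """Toy training loop that resumes from the last checkpoint if one exists.

    stop_after is a hypothetical knob that simulates the instance going away
    after that many steps in this run.
    """
    state = load_checkpoint(path) or {"step": 0, "weights": 0.0}
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run >= stop_after:
            return state  # simulated shutdown: progress since last save is lost
        state["weights"] += 0.5  # placeholder for one real training step
        state["step"] += 1
        steps_this_run += 1
        if state["step"] % save_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # always save the finished model
    return state
```

The tradeoff is in `save_every`: saving more often costs more I/O but loses less work when the instance goes away. In the sketch, anything done after the last save is simply redone on the next run.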