Hacker News
by wolfgangK 614 days ago
For training, doesn't checkpoint saving make high reliability a moot point? Why pay for 99.99999% uptime when you can restart your training from the last/best saved model?