Hacker News
by bob1029 293 days ago
I think this is the best path if your problem can support it.

I use a 5950X to run genetic programming and neuroevolution experiments, and roughly once every 100 hours the machine hits some state or load it doesn't like and restarts. My approach is to checkpoint as often as possible. When I restart the program the next morning, it deserializes the last snapshot from disk. Worst case, I lose 5 minutes of work.

This also helps with Windows updates, power outages, and EM/cosmic radiation.
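A minimal sketch of this checkpoint/restart pattern in Python, assuming the whole run state fits in a picklable dict. The file name, the 300-second interval, and evolve_one_generation are hypothetical placeholders, not the code used for the experiments above. The snapshot is written to a temp file and then renamed over the old one, so a reboot mid-write should leave the previous snapshot intact instead of a half-written file.

```python
import os
import pickle
import random
import tempfile
import time

CHECKPOINT_PATH = "checkpoint.pkl"  # hypothetical file name
CHECKPOINT_INTERVAL_S = 300         # ~5 minutes: bounds the work lost to a crash


def save_checkpoint(state, path=CHECKPOINT_PATH):
    """Write state to a temp file, fsync it, then rename it over the old snapshot."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f, protocol=pickle.HIGHEST_PROTOCOL)
            f.flush()
            os.fsync(f.fileno())
        # os.replace overwrites the old snapshot in one call (atomic on POSIX).
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path=CHECKPOINT_PATH):
    """Return the last saved state, or None if no snapshot exists yet."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return None


def evolve_one_generation(state):
    """Hypothetical stand-in for one generation of a real evolutionary run."""
    state["population"] = [x + random.gauss(0, 1) for x in state["population"]]
    state["generation"] += 1


def run(max_generations, path=CHECKPOINT_PATH, interval_s=CHECKPOINT_INTERVAL_S):
    """Resume from the last snapshot if present, checkpointing every interval_s seconds."""
    state = load_checkpoint(path)
    if state is None:
        state = {"generation": 0, "population": [0.0] * 10, "rng": random.getstate()}
    random.setstate(state["rng"])  # resume the RNG stream along with the population
    last_save = time.monotonic()
    while state["generation"] < max_generations:
        evolve_one_generation(state)
        if time.monotonic() - last_save >= interval_s:
            state["rng"] = random.getstate()
            save_checkpoint(state, path)
            last_save = time.monotonic()
    state["rng"] = random.getstate()
    save_checkpoint(state, path)  # final snapshot when the run finishes
    return state
```

Calling run(...) again after a crash picks up from the last generation on disk rather than generation 0; saving the RNG state alongside the population keeps the resumed run on the same random stream.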