Hacker News
by throwawayForMe2 2524 days ago
Standard procedure was to have a pre-batch and post-batch backup step. If there were any major problems, you restored everything to the pre-batch state.

That's basically what we did. The problem here was that processing was split into eight or so steps across ~25 streams of work between the backup steps, and it was far enough along that the output files had already been generated, revving their generation numbers. Those files fed something like thirty follow-on jobs, so all the file generations had to be backed out before processing could run again.
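
For anyone who hasn't dealt with generation-numbered files, here is a rough sketch of what "backing out the generations" amounts to. It is not the actual system, just a toy model in Python: `GenerationFile`, `snapshot` and `back_out` are names I made up for illustration, and I'm assuming generations are plain increasing integers.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationFile:
    """Hypothetical model of a file that keeps numbered generations, newest last."""
    name: str
    generations: list = field(default_factory=list)

    def write_new_generation(self):
        # Each run of the producing job "revs" the generation number by one.
        nxt = self.generations[-1] + 1 if self.generations else 1
        self.generations.append(nxt)
        return nxt

    def current(self):
        return self.generations[-1] if self.generations else None


def snapshot(files):
    """Record each file's current generation at the pre-batch backup point."""
    return {f.name: f.current() for f in files}


def back_out(files, pre_batch):
    """Drop every generation created after the snapshot; return what was removed."""
    removed = {}
    for f in files:
        keep = pre_batch.get(f.name)  # None means no generation existed pre-batch
        dropped = [g for g in f.generations if keep is None or g > keep]
        f.generations = [g for g in f.generations if keep is not None and g <= keep]
        if dropped:
            removed[f.name] = dropped
    return removed


# Usage: two files, a partial batch run that revs both, then a back-out.
files = [GenerationFile("ledger", [1, 2]), GenerationFile("invoices", [7])]
pre = snapshot(files)
files[0].write_new_generation()
files[0].write_new_generation()
files[1].write_new_generation()
removed = back_out(files, pre)
```

The point is that restoring the data alone isn't enough: every file whose generation moved has to be wound back to the snapshot too, or the follow-on jobs pick up the wrong generation.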

It all seemed horribly complicated and brittle to me at the time. Not sorry I don't work with it any more.