Hacker News
by perfmode 3103 days ago
Why do you need a 99.99% job completion rate? Why not just design for failure and inevitable retries? It almost seems like you give platform users a false sense of security by making it very reliable but not perfect.
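A minimal sketch of what "design for failure and retries" could look like, assuming the job is idempotent (safe to run twice) and that attempts fail independently. Real outages are often correlated, so treat the math as an optimistic bound. The names `run_with_retries` and `effective_success_rate` are hypothetical and not part of any platform's API:

```python
import random
import time
from typing import Callable


def run_with_retries(job: Callable[[], None], max_attempts: int = 3,
                     base_delay: float = 0.5) -> int:
    """Run `job`, retrying on any exception with exponential backoff.

    Returns the number of attempts used and re-raises the last exception
    if every attempt fails. Assumes `job` is idempotent.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            job()
            return attempt
        except Exception:
            if attempt >= max_attempts:
                raise
            # Backoff doubles each time (base, 2*base, 4*base, ...),
            # scaled by a random factor so retries don't all align.
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))
            attempt += 1


def effective_success_rate(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts
```

Under the independence assumption, a 90% per-attempt rate with two attempts already gives `effective_success_rate(0.9, 2)` of about 0.99, which is why retries look attractive. The catch raised in the replies is whether a retry still lands when the customer expected the job to run.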
3 comments

My guess: because of financial systems.

A lot of traditional financial instruments 1) are not resilient to failure and 2) run at fixed times in batches. I’m confident it’s not their own systems that impose that rigidity.

I’ll hazard a guess that this is because the workload is a set of scheduled tasks.

Their customers expect the cron jobs to run when and how they intended.

With that constraint, restarts look a lot less acceptable.
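A rough sketch of why restarts hurt scheduled work, assuming a job only counts as on time if it starts within some tolerance after its scheduled time. The function name, the fixed run time before failure, and the backoff schedule are all hypothetical illustrations, not anything the platform in question is known to use:

```python
from datetime import datetime, timedelta
from typing import List


def retry_starts_within_window(scheduled_at: datetime,
                               run_time: timedelta,
                               backoffs: List[timedelta],
                               tolerance: timedelta) -> List[datetime]:
    """Start times of attempts that still begin inside the on-time window.

    Attempt 1 starts at `scheduled_at`. Each failed attempt runs for
    `run_time` before failing, then waits the next backoff before the
    retry starts. Attempts starting after `scheduled_at + tolerance`
    are dropped, along with any later ones.
    """
    deadline = scheduled_at + tolerance
    starts = [scheduled_at]
    start = scheduled_at
    for backoff in backoffs:
        start = start + run_time + backoff
        if start > deadline:
            break
        starts.append(start)
    return starts
```

For example, a job scheduled at 09:00 that takes 2 minutes to fail, with backoffs of 1, 2 and 4 minutes and a 5-minute tolerance, gets only one on-time retry (at 09:03); the next would start at 09:07. Each restart eats into the window, so per-attempt reliability matters more than it would for work with no deadline.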

How are those two things different?