by sonium 1404 days ago
I'm not sure why this would be more reliable. It would probably fit, but at the cost of additional complexity.

When you split the work up into smaller jobs, you have to design them to work in the face of retries and parallel execution. It's a bit of complexity, but the end result is a scalable, self-healing system that can handle live code updates. Those features are what make the full workflow inherently reliable.
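To make "work in the face of retries and parallel execution" concrete, here is a minimal sketch, not a real queue. Everything in it is hypothetical: `ChunkStore` is an in-memory stand-in for a durable job table, and `run_chunk` / `run_all` are illustrative names. The idea is that each small job is claimed atomically, so a duplicate delivery is a no-op, and a failed attempt puts the chunk back so that a later retry picks it up.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ChunkStore:
    """Hypothetical in-memory stand-in for a durable job table."""

    def __init__(self, chunk_ids):
        self._lock = threading.Lock()
        self._state = {cid: "pending" for cid in chunk_ids}
        self.results = {}

    def claim(self, chunk_id):
        # Atomic check-and-set: only one worker moves pending -> running.
        with self._lock:
            if self._state[chunk_id] != "pending":
                return False
            self._state[chunk_id] = "running"
            return True

    def complete(self, chunk_id, result):
        with self._lock:
            self.results[chunk_id] = result
            self._state[chunk_id] = "done"

    def release(self, chunk_id):
        # Failed attempt: put the chunk back so a retry can claim it.
        with self._lock:
            if self._state[chunk_id] == "running":
                self._state[chunk_id] = "pending"

    def pending(self):
        with self._lock:
            return [c for c, s in self._state.items() if s == "pending"]


def run_chunk(store, chunk_id, work):
    if not store.claim(chunk_id):
        return  # already running or done elsewhere: duplicate is a no-op
    try:
        store.complete(chunk_id, work(chunk_id))
    except Exception:
        store.release(chunk_id)


def run_all(store, work, workers=4, max_rounds=5):
    for _ in range(max_rounds):
        todo = store.pending()
        if not todo:
            break
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Submit every chunk twice to mimic duplicate deliveries.
            for cid in todo + todo:
                pool.submit(run_chunk, store, cid, work)
    return store.results
```

In a real system the claim would more likely be a conditional update in a database or a visibility timeout on a queue, but the shape is the same: small units, atomic claim, safe to run twice.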

If you have one big >1h job, you have to add locks, make sure deploys don't interrupt the job, handle retries of the whole job, maintain both serverless and non-serverless infrastructure, and then inevitably rewrite the whole thing when it takes too long to be viable. All in all, that's a lot of work and complexity too, and it's wasted on making a bad design work.

60+ minute jobs are already complex.