by Spivak 1572 days ago

Look, I obviously don't know the specifics, but I think you swung too far to the other side, and they were also just stitching together existing components. A single box running cron is quick and easy, but I would be wary of hinging anything non-dev-facing on that.

* You can't fearlessly patch the cron box without knowing when the jobs run, and I don't want any special cases. Also, how would I even know when you have jobs scheduled when I'm looking at a fleet -- read your out-of-date docs? Ew, no. So a messaging system (guess the devs were already familiar with Kafka) is necessary to process the jobs across multiple nodes.

* Individual nodes are unreliable and you don't have any durable persistent storage. Most people don't like storing data in Kafka even though it's possible so they went with a database.

* Cron doesn't have any mechanism to give you a history of jobs that isn't built into your script or parsing logs. Ditto with failure notifications. You also can't reprocess failed jobs except manually. Guess you just wait another 4 hours?

* You now also can't duplicate the server because they're both going to try and do the export every 4 hours and step on one another. Woop, you made a system where an assumed safe operation "adding something" breaks stuff.

This kind of thing is a nightmare if you already have queues and a database, because why would you stand up another thing? But if you had none of that to begin with then yeah... makes total sense.

Like, this is the reality of ops and running something "production ready": it's a whole lot of big ole complex HA platform so that you can run your 5 lines of code and not have to worry about any of the hard problems like availability, retrying, resource contention, timing, data loss, locking.

1 comment

redleggedfrog 1572 days ago

You're proving my point. None of those bullets are even considerations - they're problems in search of money. The initial solution was implemented by someone who didn't bother to notice everything else on this system was using scheduled tasks (or services and Quartz) to do their work. They didn't bother to notice that there was already Serilog set up to do reporting to another system that the customer was already using to monitor processes. They didn't bother to look for the local storage that was available. Instead they purchased a Kafka instance at the company's cost and threw tech at it. Oh, and then quit and got a different job, before writing any docs.

My solution integrated with current tech, didn't have any of the problems outlined in those bullet points, and has required 0 maintenance or updates for over 4 years. I don't even want to think about how many Kafka and couch releases there have been since then.