by redleggedfrog
1571 days ago
The article didn't mention what I have observed as the worst cause of bloat - what I call the "New Toys" problem. For example, you need a process to export data every 4 hours, with some visibility of success and failures. I could have written a cron job/scheduled task in 4 hours and been done. What I found instead was Kafka with Node.js and CouchDB. Yes, for that one export. Not only that, they were paying monthly for the Kafka. Soooooo, it got replaced. I've seen this a lot more in the last 10 years. I call them "stitcher programmers." They are near useless at providing solutions unless they can stitch together some byzantine Frankenstein's monster from existing tech, usually with extreme overkill. The front end is the worst, with React and thousands of dependencies for simple forms. Right-sizing a solution is not in their vocabulary.
* You can't fearlessly patch the cron box without knowing when the jobs run, and I don't want any special cases. Also, how would I even know when you have jobs scheduled when I'm looking at a fleet? Read your out-of-date docs? Ew, no. So a messaging system (I guess the devs were already familiar with Kafka) is necessary to process the jobs across multiple nodes.
* Individual nodes are unreliable and you don't have any durable persistent storage. Most people don't like storing data in Kafka, even though it's possible, so they went with a database.
* Cron doesn't have any mechanism to give you a history of jobs unless you build it into your script or parse logs. Ditto with failure notifications. You also can't reprocess failed jobs except manually. Guess you just wait another 4 hours?
* You now also can't duplicate the server, because both copies are going to try to do the export every 4 hours and step on one another. Whoops, you made a system where an assumed-safe operation, "adding something," breaks stuff.
This kind of thing is a nightmare if you already have queues and a database, because why would you stand up another thing? But if you had none of that to begin with, then yeah... makes total sense.
Like, this is the reality of ops and running something "production ready": it's a lot of big ole complex HA platform so that you can run your 5 lines of code and not have to worry about any of the hard problems like availability, retrying, resource contention, timing, data loss, and locking.
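
To make a few of those hard problems concrete, here is a rough sketch of what a bare cron script would have to grow to cover run history, retries, and two servers stepping on each other. It is not what either system in this thread actually did. Everything in it is an assumption for illustration: the `job_runs` table, the `claim` and `run_once` helpers, and the use of SQLite as a stand-in for whatever shared database the nodes can all reach.

```python
import sqlite3
import time

INTERVAL = 4 * 60 * 60  # the 4-hour export window, in seconds

# Hypothetical schema: one row per job per window doubles as lock and history.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job_runs (
    job      TEXT    NOT NULL,
    slot     INTEGER NOT NULL,   -- start of the window, epoch seconds
    node     TEXT    NOT NULL,   -- which server claimed the run
    status   TEXT    NOT NULL,   -- 'running', 'ok', or 'failed'
    attempts INTEGER NOT NULL DEFAULT 0,
    error    TEXT,
    PRIMARY KEY (job, slot)      -- at most one claim per job per window
);
"""


def current_slot(now, interval=INTERVAL):
    """Round a timestamp down to the start of its window."""
    return int(now) - int(now) % interval


def claim(conn, job, slot, node):
    """Return True if this node won the right to run the job for this slot."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO job_runs (job, slot, node, status) "
                "VALUES (?, ?, ?, 'running')",
                (job, slot, node),
            )
        return True
    except sqlite3.IntegrityError:
        # Another node already inserted the row for this window.
        return False


def run_once(conn, job, node, export, now=None, max_attempts=3):
    """Run export() at most once per window across all nodes, with retries."""
    slot = current_slot(time.time() if now is None else now)
    if not claim(conn, job, slot, node):
        return "skipped"
    status = "failed"
    for attempt in range(1, max_attempts + 1):
        try:
            export()
            status, error = "ok", None
        except Exception as exc:
            status, error = "failed", repr(exc)
        with conn:
            # Record every attempt so the table is a queryable history.
            conn.execute(
                "UPDATE job_runs SET status = ?, attempts = ?, error = ? "
                "WHERE job = ? AND slot = ?",
                (status, attempt, error, job, slot),
            )
        if status == "ok":
            break
    return status
```

Each server's cron entry would call `run_once` with its own node name; whichever inserts the row first does the export and the rest skip. This still leaves gaps: a node that dies mid-run leaves a `running` row that nothing cleans up, and failure notifications are not handled at all.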