Hacker News new | ask | show | jobs
by mgkimsal 2132 days ago
"outside your control" is key here. you're assuming a rollback would work. in many cases, some external system changes without your knowledge, and you're only seeing those changes on production.

I've got a client that has data feeds from multiple vendors. some are pulls, some are... "hey, we'll FTP this file to you". the file format has changed - unannounced - at least 3 times in the past... 15 months. Then something breaks on production, but you don't know what. You need to get on that machine and take a look.

"Redeploy the older working version" doesn't do anything except re-introduce more problems in these instances.

4 comments

This is a good point. There are probably lots of people on HN working in cloud environments where your dependencies are actually organizationally within your control. If one of your dependencies makes a change that breaks you, you can escalate the problem and compel them to roll back the change. This is the luxury of building the entire world. My service depends on nothing that can't be escalated to my own VP, so "roll back to the old version [of whatever changed]" is a very satisfying answer, but it's not an option when your dependencies aren't obligated to keep you running.
I pity anyone whose system needs less than X downtime per month, but who depends the constant availability of an external system that is down for more than X per month :)
> Then something breaks on production, but you don't know what. You need to get on that machine and take a look.

If that's a problem you find yourself having at all, much less regularly. You have a serious observability problem.

Isn’t the larger issue that your production environment can be brought down by bad user input?
There's breakage outside "brought down". A system that's running but doesn't produce outputs because the input data changed can be "broken" and violating its SLA too. And not really something you can design around outside "we won't promise anything", but then you loose to competitors that do take it on themselves to react quickly enough with hotfixes.
How fast can you grow if you’re constantly putting out fires? It sounds like you are in a B2B world. Businesses always want more. Where are the sales people/customer service managers that can set realistic expectations on what the client requires vs. what they are expected to do? “logging into production and manually correcting stuff” can only go so far and doesn’t scale.