by killtimeatwork 1887 days ago
> Production issues are a part of life.

Only if you accept them. The alternative is to do very few, rigorously tested releases per year. This way you don't have production issues. That's how industries like banking make sure bank transfers and card payments work and people's money is not randomly lost... It's a shame many other industries just accept their product failing for users as something normal/inevitable.

I can't say my experience echoes your comment. I'm a former employee of a financial services (billing) company built around a mainframe code base started in the 70s. We probably qualify as the sort of business you had in mind with your comment.

We did four releases a year, across the entire organization (so mainframe and more modern platforms), on Saturday nights/early Sunday mornings. There was plenty of testing, but there were still plenty of errors found only on release day and fixed in a rush in the wee hours or daylight hours of Sunday morning.

The only thing that seemed to correlate with release quality was the overall risk of the release, i.e. the complexity and number of new features written during that quarter.

> We did four releases a year, across the entire organization (so mainframe and more modern platforms), on Saturday nights/early Sunday mornings. There was plenty of testing, but there were still plenty of errors found only on release day and fixed in a rush in the wee hours or daylight hours of Sunday morning.

This way, you had bugs in prod for less than a day once every quarter, as opposed to having buggy prod all the time, as is common in organizations doing Continuous Deployment.

That's adorable. You know that no matter how much testing you do, something WILL slip through the cracks? Always.

Of course. Even the Space Shuttles blew up, twice. I'm guessing even pacemakers and software in nuclear power plants have bugs. The point is, these things are exceedingly rare or have very limited scope (they occur only in the most obscure corner cases and do limited damage), while in web companies that have adopted Continuous Deployment, serious bugs are common and, I think, seen as part of life.

I work in healthcare, where we have heavily tested quarterly releases. Well, we had a release today and some stuff was pretty horribly broken, despite being so heavily tested. We didn't adequately load test one piece of the new release under production-like conditions. Oops. Thankfully the fix was simple and a hotfix only took a couple of hours in total. Yet another lesson learned.

That's pretty bad, but nonetheless you detected and fixed it very quickly. Compare that to lingering bugs in the Twitter iOS client (it's just broken on the iPhone 5s; I guess they simply don't test on that device anymore), or the random bugs in Windows 10 that appear after they push an update to their users via CD.

Then you get the worst of both worlds. You are in an industry where a few very well-tested releases are needed to meet SLAs and customer expectations, but enough of the company is looking at entirely different industries and wanting to follow their pipeline instead.