| HN Mirror

by LindenRyuujin 4400 days ago

It is similar. Let's take that as an example:

You make a change and start running tests to make sure nothing is broken. It's the end of Friday and you don't finish the testing. You close everything down, and on Monday you forget you still had a few tests left to complete. You pass the change into your build process and it makes it into a release. You didn't follow the process… not because you maliciously decided to skip testing; it was simply a human mistake (slightly different from your comment, but let's be charitable and assume most people aren't purposefully breaking things).

Does tracing the problem and firing you stop it happening again? Does it even stop you from making the same kind of error again in ten years' time, when memory has dimmed? In the end you've lost your job, and your company has lost someone who probably did good work for many years (and would no doubt continue to do so in the future) over a single, easily made mistake (however big the fallout from that mistake). You're also much more likely to try to bury your mistake if you know it will cost you your job.

On the other hand, what about adding a step to your build process where a second person runs the testing while they review your code? No one gets sacked, everyone learns something, and the chances of this happening to anyone else are massively reduced. Is the person who made an understandable mistake any more to blame than the manager who oversees the process as a whole? They allowed flaws in the process that meant your mistake could get into a release. It’s rarely as simple as one person making a mistake.

I'm simplifying, of course. If your build process is anything like ours, there are multiple levels at which this would be caught... which is kind of the point. In our development process there are four places this error would be caught before it made it to an official release. If our release goes wrong, the worst that happens is we annoy the people using the software, yet we've built up a (reasonably) robust process in exactly the way described in the article.

It’s as the article says: When a space shuttle crashes or an oil tanker leaks, our instinct is to look for a single, “root” cause. This often leads us to the operator: the person who triggered the disaster by pulling the wrong lever or entering the wrong line of code. But the operator is at the end of a long chain of decisions, some of them taken that day, some taken long in the past, all contributing to the accident; like achievements, accidents are a team effort.