Hacker News new | ask | show | jobs
by jffry 1834 days ago
That's my point though. Even though they may understand the immediate flaw in their code that caused the issue, there's not much use (for them or their customers) in just talking in detail about that specific flaw.

I'd go so far as to argue that the specifics of the flaw are immaterial right now. At this stage, the important thing is that they have identified a specific code change that was the proximate cause of the issue, and have a mitigation in place. This is contrasted with more mysterious and hard-to-track-down failures. ("We are working to understand why our systems are down and will post another update in 30 minutes")

What will take time, and the thing which will be interesting, is failure tree analysis. (You might hear the phrase "failure chain" or "root cause" but IMO it's quite rare for things to be so linear). That can help identify opportunities to improve processes at many different levels of the product lifecycle.

Humans are fallible, and there's no way we can write bug-free software, so the solution has to be more robust than "hope that every member of our organization never makes a mistake again"

1 comments

Yes, I was saying I would have avoided words like "permanent fix", because it sets unrealistic expectations.