by JohnMakin 1058 days ago
Too often I have seen overly complicated systems where one small change causes a problem that goes unnoticed for far too long, and then something breaks in a mystifying and spectacular way. The usual reaction is a fear of making any change to the system at all, however benign, even one that can't possibly break anything, because of fear of the "unknown."

IME, having robust alerting and monitoring tools, good rollback plans and procedures/automation should eliminate this fear entirely. If I were afraid to touch anything for fear of breaking it, I'd likely never get anything done.


> having robust alerting and monitoring tools, good rollback plans and procedures/automation should eliminate this fear entirely.

Sure, but that all sounds like stuff that happens after you deploy/release… you really need to catch things sooner than that. Don’t make the user into the one who has to find the breakage, please. No matter how fast you roll back. Test your software thoroughly!

Nothing I said implies the user has to find it. With rolling deployments and blue/green strategies, bad changes don't even have the potential to go live.