by nomilk 82 days ago
Why don't companies with chronic outages mimic their stack from top to bottom (i.e. starting with a new domain), then before making a change, make the change on the duplicate stack and blast it with mock requests?

Might catch 90% of problems before they make it into the real stack?

E.g. every step of GitHub's migration to Azure could be mimicked on the duplicate stack before it's implemented on the primary stack. Is this just considered too much work? (I doubt cost would be the issue, because even if it costs millions, it would pay for itself in reduced reputational damage from outages).

EDIT: downvotes - why? - I think this is a good idea (I'd do it for my sites if outages were an issue).

3 comments

Testing? Who needs it when you have Copilot!
Downvotes are probably because that is what companies without chronic outages do.

If you'd ever worked on a codebase as terrible as I imagine GH's internals are and looked at the git history, you'd find two things:

1) fixing it would require rolling back hundreds to thousands of engineer-years of idiocy that make things like testing or refactoring untenable

2) many prior engineers got part of the way through such improvements before leaving or being kicked out. Their efforts mostly just made it worse, because now you never know what sort of terribleness to expect when you open an unfamiliar file.

> EDIT: downvotes - why? - I think this is a good idea (I'd do it for my sites if outages were an issue).

Because that's a monumental amount of work, and extraordinarily difficult to retrofit into a system that wasn't initially designed that way. Not to mention the unstated requirement of mirroring traffic to actually exercise that system (given the tendency of bugs to not show up until something actually uses the system).

> that's a monumental amount of work

Agree, but look at the alternative: GitHub is constantly being savaged by users who (quite reasonably) expect uptime. Ignoring impacts on morale and reputation, damage to their bottom line alone might be tens (hundreds?) of millions per year.

> mirroring traffic

Yeah, I agree that's difficult, but it need not be exact to still be useful.
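
To make "not exact" concrete, here is a rough sketch of sampled mirroring: every request is served by the primary stack, and only a fraction is also replayed against the duplicate stack, with any differing or failing responses recorded. Everything here is hypothetical and for illustration only: `SampledMirror`, `primary`, `shadow`, and `sample_rate` are made-up names, and the two "stacks" are plain Python functions rather than real services.

```python
import random
from typing import Callable, List, Optional, Tuple

# A "stack" is modelled as a function from request to response (an assumption
# made to keep the sketch self-contained; real stacks are network services).
Handler = Callable[[str], str]


class SampledMirror:
    """Serve from the primary stack; replay a sample of requests on a shadow stack."""

    def __init__(self, primary: Handler, shadow: Handler,
                 sample_rate: float, rng: Optional[random.Random] = None) -> None:
        self.primary = primary
        self.shadow = shadow
        self.sample_rate = sample_rate  # fraction of requests to mirror, 0.0 to 1.0
        self.rng = rng or random.Random()
        self.mirrored = 0
        # Each entry is (request, primary response, shadow response or error text).
        self.mismatches: List[Tuple[str, str, str]] = []

    def handle(self, request: str) -> str:
        # The user always gets the primary stack's answer.
        response = self.primary(request)
        if self.rng.random() < self.sample_rate:
            self.mirrored += 1
            try:
                shadow_response = self.shadow(request)
            except Exception as exc:
                # A crash on the shadow stack is recorded, never surfaced to the user.
                shadow_response = f"error: {exc!r}"
            if shadow_response != response:
                self.mismatches.append((request, response, shadow_response))
        return response


if __name__ == "__main__":
    def primary(request: str) -> str:
        return request.upper()

    def shadow(request: str) -> str:
        # Hypothetical regression that only exists on the duplicate stack.
        if request == "push":
            raise RuntimeError("regression")
        return request.upper()

    mirror = SampledMirror(primary, shadow, sample_rate=1.0)
    for req in ["clone", "push", "fetch"]:
        mirror.handle(req)
    print(mirror.mirrored, mirror.mismatches)
```

In a real deployment the shadow call would run off the request path so it cannot slow down users, and requests that write data would need care so the duplicate stack does not cause side effects. The sketch skips both.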