Hacker News
by ownagefool 2399 days ago
Replying to myself, because their status page now says that a firewall change took down the database.

This points to there being:

- A lack of process and testing for key networking changes. Aren't they doing CI/CD, automated testing and peer review for this?

- A SPOF (single point of failure) in the database. Why couldn't clients fall back to a secondary in read-only mode?
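To illustrate the read-only fallback from the second point, here is a minimal sketch. It assumes a generic `connect(host)` function that raises `ConnectionError` when a host is unreachable (real drivers raise their own exception types), and the host names `db-primary` and `db-replica` are hypothetical. It also only helps if the replica is reachable by a network path the faulty change didn't touch.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

C = TypeVar("C")


@dataclass
class DbHandle(Generic[C]):
    conn: C
    read_only: bool  # True when we fell back to the replica


def connect_with_fallback(connect: Callable[[str], C], primary: str, secondary: str) -> DbHandle[C]:
    """Try the primary first; if it is unreachable, use the replica in read-only mode."""
    try:
        return DbHandle(connect(primary), read_only=False)
    except ConnectionError:
        # Primary unreachable (e.g. blocked by a firewall change): keep serving reads.
        return DbHandle(connect(secondary), read_only=True)


def require_writable(handle: DbHandle) -> None:
    """Call before any write; refuses writes while degraded to the replica."""
    if handle.read_only:
        raise PermissionError("database is in read-only mode; writes are disabled")


# Hypothetical usage with a fake connector whose primary is down.
def fake_connect(host: str) -> str:
    if host == "db-primary":
        raise ConnectionError("primary unreachable")
    return "conn:" + host


handle = connect_with_fallback(fake_connect, "db-primary", "db-replica")
print(handle.read_only, handle.conn)
```

The design choice here is to degrade rather than fail outright: reads keep working from the secondary, and writes fail loudly instead of silently going to a replica that can't accept them.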

Quite often, things break for stupid reasons. The main difference is that when a normal company does something stupid, they can hide it, lie about it, or make it sound more complex.

The fact that GitLab publishes their fuck-ups is supposed to force them to do a better job: actually look at root causes and apply proper fixes that we can all judge. I wouldn't hold any particular fuck-up against them.


Network devices are generally hostile to advanced automation, and if the primary and secondary were the same class of machine, the change would have applied to both.
I believe they're hosting in the cloud, which means it's probably not a physical network device and can be automated. Obviously, public IP addresses will be specific to each environment, but that's what PR review should double-check.
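As a rough idea of the kind of automated check a firewall PR could run, here is a minimal sketch. The rule format (ordered rules, first match wins, default deny), the environment names, the IP addresses and port 5432 are all hypothetical; real firewalls and cloud security groups have their own evaluation semantics. The per-environment values live in one table, so a reviewer only has to double-check that table rather than every rule.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "deny"
    source: str  # source CIDR, e.g. "10.1.0.0/24"
    port: int


def is_allowed(rules: list[Rule], src_ip: str, port: int) -> bool:
    """Evaluate rules in order; the first matching rule wins; no match means deny."""
    addr = ipaddress.ip_address(src_ip)
    for rule in rules:
        if rule.port == port and addr in ipaddress.ip_network(rule.source):
            return rule.action == "allow"
    return False


# Hypothetical per-environment values: an app server address and the DB port it needs.
ENVIRONMENTS = {
    "staging": {"app_ip": "10.1.0.10", "db_port": 5432},
    "production": {"app_ip": "10.2.0.10", "db_port": 5432},
}


def check_db_reachable(rules_by_env: dict[str, list[Rule]]) -> list[str]:
    """Return one message per environment whose app server can no longer reach the DB."""
    failures = []
    for env, cfg in ENVIRONMENTS.items():
        if not is_allowed(rules_by_env.get(env, []), cfg["app_ip"], cfg["db_port"]):
            failures.append(f"{env}: {cfg['app_ip']} cannot reach port {cfg['db_port']}")
    return failures


# A proposed change that accidentally shadows the production allow rule.
proposed = {
    "staging": [Rule("allow", "10.1.0.0/24", 5432)],
    "production": [
        Rule("deny", "10.2.0.0/16", 5432),  # the bad change
        Rule("allow", "10.2.0.0/24", 5432),
    ],
}
print(check_db_reachable(proposed))  # ['production: 10.2.0.10 cannot reach port 5432']
```

In a CI job, a non-empty failure list would fail the build (e.g. `sys.exit(1)`), so the change gets blocked before it is applied rather than discovered on the status page.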