by cogman10
208 days ago
People outside of tech (and some inside) can be really bad at understanding how something like this could slip through the cracks. Reading Cloudflare's description of the problem, this is something I could easily see my own company missing. A file got too big, which tanked performance enough to bring everything down. That's a VERY hard thing to test for, especially since this appears to have been a configuration file and a regular update. The reason it's so hard to test for is that all the tests would show there's no problem. This wasn't a code update, it was a config update. Without really extensive performance tests (which, when done well, take a long time!) there really wasn't a way to know that a change that appeared safe wasn't. I personally give Cloudflare a huge pass for this. I don't think this happened due to any sloppiness on their part. Now, if you want to see a sloppy outage, look at the CrowdStrike outage from a few years back that bricked basically everything. That is what sheer incompetence looks like.