A trivial example would be a bug that replaces the configuration for all customers with the last uploaded one. Then when the next customer uploads a new (valid!) config, you have a problem.
Obviously it wasn’t that trivial, but the point is: it wasn’t the customer’s configuration change that was the problem, but some code that managed the config change.
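A minimal sketch of that trivial bug, assuming a hypothetical in-memory store keyed by customer ID (the class names, customer IDs and the `ttl` key are invented for illustration, not anything Fastly uses). Note that the first upload looks fine; the damage only appears when the next, perfectly valid, config arrives:

```python
class BuggyConfigStore:
    """Hypothetical store: one config per customer, but upload() is wrong."""

    def __init__(self):
        self.configs = {}  # customer id -> config dict

    def upload(self, customer, config):
        self.configs[customer] = config
        # BUG: every existing customer's entry is overwritten with the
        # most recent upload, not just the uploader's.
        for name in self.configs:
            self.configs[name] = config


class FixedConfigStore:
    """Same interface, but an upload only touches the uploader's entry."""

    def __init__(self):
        self.configs = {}

    def upload(self, customer, config):
        self.configs[customer] = config


buggy = BuggyConfigStore()
buggy.upload("customer-a", {"ttl": 60})
buggy.upload("customer-b", {"ttl": 5})  # a perfectly valid config
print(buggy.configs["customer-a"])  # {'ttl': 5}: customer A was clobbered

fixed = FixedConfigStore()
fixed.upload("customer-a", {"ttl": 60})
fixed.upload("customer-b", {"ttl": 5})
print(fixed.configs["customer-a"])  # {'ttl': 60}
```

Nothing is wrong with customer B's config here; validating it harder would not have helped.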
It's more common than we imagine, and it's how many of the historical network incidents started. The important part, as usual, is to make sure the remediations of such incidents focus on how to limit the blast radius of small changes, and how to accomplish that without imposing artificial gatekeeping and bureaucracy on the change process.
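One common way to limit blast radius without a manual approval step is a staged rollout that widens in waves and halts automatically on the first unhealthy wave. A minimal sketch, assuming hypothetical node names (`pop-N`), hypothetical `apply`/`healthy` callbacks and an arbitrary 1%/10%/50%/100% schedule; this is a generic technique, not a description of Fastly's process:

```python
from typing import Callable, Sequence


def staged_rollout(
    nodes: Sequence[str],
    apply: Callable[[str], None],
    healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.00),
) -> int:
    """Roll a change out in growing waves, halting at the first unhealthy wave.

    Returns how many nodes received the change (the blast radius).
    """
    done = 0
    for fraction in stages:
        target = max(1, int(len(nodes) * fraction))
        for node in nodes[done:target]:
            apply(node)
        done = max(done, target)
        # Check every node touched so far before widening the rollout.
        if not all(healthy(node) for node in nodes[:done]):
            break
    return done


# Usage: a change that breaks every node it touches is caught at the 1% wave.
nodes = [f"pop-{i}" for i in range(1000)]
broken = set()
radius = staged_rollout(nodes, apply=broken.add, healthy=lambda n: n not in broken)
print(radius)  # 10
```

The gate is automated, so a good change still reaches 100% without anyone signing off on it.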
This attitude is why we have only 2½ search engines on the entire Internet. Only Google, Bing, and Yandex run crawlers. Everybody else is just a reseller for them.
Web crawlers are a feature, not a bug. If your site shouldn't be crawled, it doesn't belong on the Internet.
If you cannot generate revenue from your internet content, you probably can’t make a living producing content for the internet.
The consequence, IMHO, is that the internet wouldn’t have this amount of content and usefulness.
Newspapers? No. You can’t make a living from online news if anyone can copy a reporter’s work, post it on their own site and dilute your traffic.
Online selling? Doesn’t look like a viable business model, as anyone can copy the photos you paid a photographer for, the descriptions you paid someone to write and the reviews your customers wrote. True reviews are priceless, you know? Even more so now that an AI can detect computer-generated reviews.
Obviously, an open and totally money-free internet is nice, but it wouldn’t be the internet people make a living from.
Test “it”? The change in question wasn’t made by Fastly but by a customer of theirs making a config change. It’s possible that this customer did validate their change somehow.
Fastly obviously didn’t test their code (with the bug) enough, but testing of course can never prove the absence of bugs. Testing for a global deployment like a massive CDN happens to a large extent in prod because you don’t have another globe. You can test on a smaller scale but eventually you run into a problem that only shows itself at full scale.
> We experienced a global outage due to an undiscovered software bug that surfaced on June 8 when it was triggered by a valid customer configuration change.
Their change was the bad one, and that was May 12. Since it seemed OK on May 13, 14, …, there wasn’t much indicating that the change would blow up weeks later. For example, if they rolled it out gradually, they would have reached 100% rollout with all lights green.
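A toy simulation of how a latent bug can pass a gradual rollout, assuming a hypothetical `EdgeNode` class and an invented `rare_option` config key that only the new code mishandles. Every wave reloads the configs customers already use, none of which set that key, so health checks stay green until a later, valid config sets it:

```python
class EdgeNode:
    """Hypothetical edge server; 'rare_option' is an invented config key."""

    def __init__(self):
        self.version = "old"
        self.crashed = False

    def deploy(self, version):
        self.version = version

    def load_config(self, config):
        # Latent bug: only the new version, and only this option, trips it.
        if self.version == "new" and config.get("rare_option"):
            self.crashed = True

    def healthy(self):
        return not self.crashed


nodes = [EdgeNode() for _ in range(100)]
existing_configs = [{"ttl": 60}, {"ttl": 300, "gzip": True}]

# Gradual rollout: each wave reloads the configs customers already use,
# and health checks stay green because none of them set rare_option.
for upto in (1, 10, 50, 100):
    for node in nodes[:upto]:
        node.deploy("new")
        for cfg in existing_configs:
            node.load_config(cfg)
    if not all(n.healthy() for n in nodes):
        raise RuntimeError(f"rollout halted at {upto} nodes")

print(sum(n.healthy() for n in nodes))  # 100: fully rolled out, all green

# Weeks later a customer pushes a perfectly valid config to every node.
for node in nodes:
    node.load_config({"ttl": 30, "rare_option": True})

print(sum(n.healthy() for n in nodes))  # 0: global outage
```

The staged rollout did its job; it just had nothing to catch, because the triggering input didn't exist yet.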
The customer change was a valid configuration. That was yesterday.