| As someone who was slightly affected by this outage, I personally also find this post-mortem to be lacking. 75% of the post-mortem talks about the power outage at PDX-04 and blames Flexential. Okay, fair - judging from the text, what was happening there was a bit of a disaster. But by the end of November 2 (UTC), power was fully restored. According to the post-mortem, it still took ~30 hours for Cloudflare to fully recover service. This was longer than the power outage itself, and the text just states that too many services were dependent on each other. I wish they had gone into more detail on why the operation as a whole took that long. Are there any takeaways from the recovery process, too? Or was it really just syncing data from the edges back to the "brain" that took this long? Another aspect I find missing here is any discussion of the lack of communication - especially to Enterprise customers.
Cloudflare support was basically radio silent during this outage except for the status page. Realistically, they couldn't do much anyway. But at least some attempt at communication would have been appreciated - especially for Enterprise customers, and even more so given that the post-mortem blames Flexential for a lack of communication. While I like Cloudflare since it's a great product, I think there are still a few more lessons CF should take away from this incident. That being said, glad you managed to recover, and thanks for the post-mortem. |