So, before October, they were lousy at tracking downtime issues for 2 years (no downtime from 2016 to 2018), but in November, Microsoft came and gave them the technology to correctly track downtime, and they had their first downtime logged in November.
Sometimes it is. There are some incredibly brute-force yet simple and elegant patterns that power some of the largest-scale systems you can think of.
It is relatively easy to scale a collection of simple things to the extreme and have them exhibit complex behavior together. It is a lot harder to scale something complex to the extreme. But too often the latter is the default: the system is designed wrong from the ground up and ends up stuck in scaling hell.
https://www.githubstatus.com/uptime?page=31