by DishyDev
1349 days ago
As someone whose job involves maintaining uptime of a critical system that depends on Cosmos DB, this sort of thing is scary. When there have been other reliability issues with Cosmos in the past, our customer base has not been understanding, and it feels very much out of my control. I'm finding that a lot of the reliability guarantees of Azure PaaS services are overblown or come with big caveats once you start to work with them in a serious way. For example, I've had some bad reliability issues with Azure Functions not firing, or their premium function runtimes becoming unresponsive. And it seems like that's just the start of the outstanding issues with them: https://github.com/Azure/azure-functions-host/issues I think people need to look more carefully at these PaaS guarantees and at what the 99.999% reliability Microsoft is claiming actually means.
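For a rough sense of what a figure like 99.999% allows, here is a minimal sketch of the arithmetic. It is an illustration only, not anything taken from Microsoft's SLA text: the function name `downtime_budget_seconds` and the 365-day measurement window are assumptions, and how a provider actually measures availability is something to check in the SLA itself.

```python
def downtime_budget_seconds(availability_percent: float, period_days: float = 365.0) -> float:
    """Seconds of unavailability permitted over the period at the given availability.

    Assumption: availability is a simple time-based fraction over a fixed
    window (365 days by default). Real SLAs may define it differently.
    """
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60 * 60


if __name__ == "__main__":
    # Compare a few common "nines" targets over a 365-day year.
    for pct in (99.9, 99.99, 99.999):
        secs = downtime_budget_seconds(pct)
        print(f"{pct}% -> {secs / 60:.2f} minutes of downtime per year")
```

Under these assumptions, 99.999% works out to a budget of roughly 5 minutes and 15 seconds per year, so a single outage of an hour or more would exceed it many times over.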
|
That's a couple of months after the Ubuntu/systemd incident. Azure's "blessed" Linux image is Ubuntu, and it has unattended-upgrades enabled, including on managed infrastructure like AKS (where you can't turn it off without dirty hacks). A bad Ubuntu update caused hosts to lose the DNS configuration they get from DHCP, leaving a massive number of machines in partially broken states.
https://thenewstack.io/ubuntu-linux-and-azure-dns-problem-gi...