Hacker News
by paulyy_y 1943 days ago
Do y'all just, like, ignore SLOs? Cloud SQL has a 99.95% SLO, so going down for several minutes every month is within that. No smoke and mirrors here. There are ways to mitigate it, but it's not Google messing around with expectations. HA doesn't mean 100% uptime.
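
To put "multiple minutes every month" in numbers, here is a minimal sketch of the downtime budget an availability target implies. It assumes a 30-day month as the measurement period; the real SLA defines its own period and rules, and the function name is just for illustration. At 99.95%, the budget works out to 43,200 × 0.0005 = 21.6 minutes per month.

```python
# Rough downtime budget implied by an availability target.
# Assumption: a 30-day month; real SLAs define their own measurement period.

MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of downtime permitted while still meeting availability_pct."""
    return period_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/month")
```

Each extra "nine" cuts the budget by a factor of ten, which is why a single unplanned reboot of a few minutes can eat a large share of a 99.95% month.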

It’s worth noting that the dreaded maintenance everyone is talking about here is not part of the SLA:

“Downtime as part of Scheduled Maintenance will not be counted towards any Downtime Period.” - https://cloud.google.com/sql/sla
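
A simplified model shows why that exclusion matters. This is a sketch under stated assumptions: `Outage` and `monthly_uptime_pct` are hypothetical names, and the code does not reproduce the SLA's actual definitions of "Downtime Period" or its measurement rules. It only shows that outages flagged as scheduled maintenance drop out of the calculation entirely.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical outage record: length in minutes, and whether it was scheduled maintenance."""
    minutes: float
    scheduled_maintenance: bool


def monthly_uptime_pct(outages: list[Outage], period_minutes: int = 30 * 24 * 60) -> float:
    """Simplified uptime percentage where scheduled-maintenance outages are not counted.

    Assumption: a 30-day month and plain minute counting; the real SLA's
    rules for what counts as a Downtime Period are not modeled here.
    """
    counted = sum(o.minutes for o in outages if not o.scheduled_maintenance)
    return 100 * (1 - counted / period_minutes)


if __name__ == "__main__":
    # 45 minutes of maintenance plus 10 minutes of unplanned downtime.
    month = [Outage(45, True), Outage(10, False)]
    print(f"{monthly_uptime_pct(month):.4f}%")
```

In this model, a month with 55 minutes offline still clears 99.95%, because only the 10 unplanned minutes count; the same 45 minutes counted as ordinary downtime would not.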

I’ve never seen an SLO document for CloudSQL, though, so maybe the internal SLOs are slightly higher?

Does that include or exclude "maintenance windows"?

AWS tells you when your maintenance window for an RDS instance will be and you can delay/reschedule as needed.

I wouldn't consider a maintenance window "downtime". That 99.95% applies to all the time outside the window.

Yet AWS manages to give near-100% uptime with RDS at a similar price point. The CloudSQL downtime in the cases I've seen was caused by Google rebooting both master instances at the same time. This is amateurish and totally unnecessary: the whole point of having multiple masters is to be able to do maintenance reboots on a staggered schedule. This should be a trivial problem for Google to solve and would result in much better reliability for their users, yet it's been years with no change.
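
The difference between simultaneous and staggered reboots can be sketched with a toy model. The names below are hypothetical, and the model makes a big simplifying assumption: failover between nodes is instant and free, whereas real HA failover usually causes a short blip of its own. Under that assumption, the service is only unavailable during minutes when every node is rebooting at once.

```python
def unavailable_minutes(reboot_starts: list[int], reboot_minutes: int, horizon: int) -> int:
    """Count minutes in [0, horizon) during which every node is rebooting.

    Node i is down for reboot_minutes starting at minute reboot_starts[i].
    Assumption: failover is instant, so any single healthy node keeps service up.
    """
    down = 0
    for t in range(horizon):
        if all(start <= t < start + reboot_minutes for start in reboot_starts):
            down += 1
    return down


def staggered_starts(node_count: int, reboot_minutes: int, gap: int = 1) -> list[int]:
    """Rolling schedule: each node starts rebooting only after the previous one is back, plus a gap."""
    return [i * (reboot_minutes + gap) for i in range(node_count)]


if __name__ == "__main__":
    reboot = 5  # hypothetical reboot duration in minutes
    together = unavailable_minutes([0, 0], reboot, horizon=20)
    rolling = unavailable_minutes(staggered_starts(2, reboot), reboot, horizon=20)
    print(f"simultaneous: {together} min down, staggered: {rolling} min down")
```

With two nodes and a 5-minute reboot, rebooting both together costs the full 5 minutes, while the rolling schedule keeps at least one node serving the whole time. That is the "staggered schedule" argument in miniature.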