Hacker News
by ciguy 2983 days ago
Unfortunately, the last time I used Cloud SQL for MySQL, it was incredibly unstable. They would take down our master AND standby at the same time for maintenance. When we filed a ticket, they just said it was a known bug with no plans to fix it.

A major client of mine migrated to AWS because of this and other issues.

I've been thinking about moving us to Google's Cloud Platform. What I found regarding maintenance here: https://cloud.google.com/compute/docs/regions-zones/#mainten... states that they do live migrations without any downtime. Can anyone elaborate? Is this only for Compute Engine? In that case, if one can run Postgres on a Compute Engine instance, why not do that instead? Surely, if one can set up a highly available Postgres cluster, Google can do updates without affecting uptime?

To be fair, we wouldn't use GCP for anything but virtual servers and storage replication... I have no desire to tie us to Google's infrastructure any more than necessary.

Were your master and standby in the same availability zone? Can't you set different maintenance windows? WTF?

https://cloud.google.com/sql/faq#maintenancerestart

According to the link above, it looks like you can stagger your upgrade windows.

"Live migration" refers to how Compute Engine transparently migrates a VM to another physical host [1]. Disk and memory are copied over, and they have some ridiculous technology that keeps network connections alive and re-attaches them to the new VM once it has been switched over, so that it causes, in principle, zero disruption. This is much more magical than other providers, such as AWS and DigitalOcean, where such a migration results in a reboot.

You can run PostgreSQL on a VM just fine. You just have to manage it yourself. Cloud SQL comes with some upsides (zero management, spectacular HA failover capabilities) and some downsides (lack of extensions, lives on a separate network, no control over maintenance window); you have to decide what you're willing to live with.

You can set the upgrade window, but you can't predict when an upgrade will actually happen. What you can control is the order: for example, set your staging instance to "early" and your production instance to "late". Then staging should, hopefully, be upgraded first, and you'll know ahead of the production upgrade if any issues arose.
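A rough sketch of that early/late split, in case it helps. This only builds the settings body you would send when patching an instance; it doesn't call any API. The helper name is made up, and the field names (`maintenanceWindow`, `updateTrack`) and track values (`canary` for "early", `stable` for "late") are my assumptions about the Cloud SQL Admin API, so check them against the current reference before relying on them.

```python
# Hypothetical helper, not part of any Google library.
# Assumption: "early"/"late" in the console correspond to the
# updateTrack values "canary"/"stable" in the Admin API.
TRACKS = {"early": "canary", "late": "stable"}


def maintenance_settings(order, day, hour):
    """Build a settings body for one instance.

    order: "early" or "late"
    day:   1-7 (assumed to mean Monday-Sunday)
    hour:  0-23 (assumed to be UTC)
    """
    if order not in TRACKS:
        raise ValueError("order must be 'early' or 'late'")
    if not 1 <= day <= 7:
        raise ValueError("day must be between 1 and 7")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    return {
        "settings": {
            "maintenanceWindow": {
                "day": day,
                "hour": hour,
                "updateTrack": TRACKS[order],
            }
        }
    }


# Same window for both; only the order differs.
staging = maintenance_settings("early", day=2, hour=3)
production = maintenance_settings("late", day=2, hour=3)
```

Keeping the day and hour identical and varying only the track makes it obvious that the ordering is the one thing you are controlling.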

[1] https://cloud.google.com/compute/docs/instances/live-migrati...

> they have some ridiculous technology that maintains network connections and re-routes them when everything switches to the new VM

indeed, this is the primary reason i wish to switch. i have no problem maintaining our own stuff, we do that anyway. :) thanks for the details.

If you (or the parent) are interested in some details about that ridiculous technology, there was a paper in NSDI this year: https://www.usenix.org/system/files/conference/nsdi18/nsdi18...

(disclaimer: I'm one of the many authors on the paper, although for building parts of the underlying tech, not writing the prose)

GCP has the best compute, storage, and networking of all the clouds. They are cheaper, faster, more scalable and more reliable than the others. Their managed services leave a lot to be desired (beta status, non-standard interfaces, and other limits) but if you're just looking to run VMs then that is the perfect fit for their cloud.

We have now consolidated everything on GKE, which lets us use VMs but still have the Kubernetes control plane looking after things for us. That has been great so far.

Maintenance windows are set for the cluster, not single instances. We were distributed across 3 AZs and Google had no suggestions for mitigating the ~5 minutes of downtime we were seeing every week or two.

The whole experience was so amateur and unprofessional it really soured me on GCE. They do have some cool tech but it seems like their cloud division needs to mature a bit.