We were using their managed MySQL instance. Without any communication, they pushed a silent update which broke our year-long security system that updated MySQL _priv tables in line with specific permissions and roles. There is no workaround.
When I reported the bug, it was labeled "Working as intended," again with no communication from Google as to why this breaking change was made, or even confirmation that it was something that Google changed at all. We spent days trying to figure out why we could reproduce it in some databases but not others, including diving into the MySQL source code.
I don't have experience with their Postgres product, but pushing breaking changes to a database with no notice or communication (either before or after doing it) simply isn't acceptable.
> Direct modification of grant tables using statements such as INSERT, UPDATE, or DELETE is discouraged and done at your own risk. The server is free to ignore rows that become malformed as a result of such modifications.
Discouraged, but not broken--except in GCP after a silent update. GCP now blocks changes from being made to those tables entirely, breaking with MySQL's behavior, as well as GCP's own earlier behavior.
We have battle-tested tooling that makes this work brilliantly in all other environments. Updating the _priv tables directly is the only way to modify privileges within a transaction in MySQL.
Sounds like you took a calculated risk and it backfired. Google Cloud SQL even has it in their documentation that they automatically do these upgrades, and you were relying on unsupported behavior of the database. Especially when it’s security related, I can easily imagine something breaking in a minor release.
You’re using unsupported functionality that may break at any time, and you are using a managed database service that automatically updates. I don’t think you would have had a better time with any of the other cloud providers.
It's not unsupported. It's literally documented by the MySQL project, existed for more than 5 years, was supported by Google, and continues to be supported by MySQL.
Google broke it without warning. That's a breaking change to your database, in production, 5 years after v5.7's release when you're fully locked in without so much as even a version bump.
Microsoft didn't do this. AWS didn't do this. Google did this.
Let this be a lesson to everyone that Google can and will break your critical production systems even years after they're operating perfectly, and they'll provide no warning, no explanation, and no fix.
If it's specifically a change Google did, and upstream MySQL still supports this behavior, I'm inclined to agree with you that this is a surprise.
Having said that, it is worth emphasizing that you relied on unsupported / discouraged behavior that "is at your own risk". This isn't just any random feature Google disabled, it's security-related behavior that is discouraged to rely upon by MySQL themselves. This is a big caveat, and is something that I would never recommend anyone to do, especially not when you're planning on outsourcing the management + maintenance of your database.
Yeah regardless of these other points about the risk you took, the lack of response or initial communication is not good at all. Seems to be a pattern with Google. Sorry to hear you sunk so much time in chasing that change down.
Yes. When I'm applying large changes to permissions across many users and many tables/columns, which happens after every migration, then I absolutely need transaction guarantees. I cannot risk having a half-applied change leaving the database and its users in an unknown state.
Could you write your permission changes/rollback logic to be idempotent? In this case you could always push the changes to completion, and reason about whether each intermediate state is safe.
An alternative to this is to snapshot the database, and test the permissions change on the snapshot. Not as good as a transaction, but often good enough to proceed with the prod operation.
>GCP now blocks changes from being made to those tables entirely, breaking with MySQL's behavior, as well as GCP's own earlier behavior.
There seems to be a common pattern across Google's services of trying to force best practices on users at the expense of breakages. I suspect though that users are more likely to think "screw you Google, stop breaking my stuff" than "thank you Google for forcing me to follow best practices".
Meanwhile, AWS Aurora is still on Postgres 11. Postgres 12 is more than a year old.
I've been using Aurora because it's managed, I don't have to worry about backups, it's faster and cheaper and easier to scale, etc.
But Postgres 12 has some really important features and performance improvements, and we really need them for my app. I've got to wonder why I'm paying for a managed service when they can't even manage to do a major version upgrade in a year.
That's the price of a heavily patched postgres fork. It's really expensive to maintain them. Far from the first time such forks fell behind quickly (e.g. redshift, greenplum).
Please Google, implement binary replication from an external primary. There's no way we can move to managed SQL because it'd take weeks to import the SQL dump.
Whether you should use replication for upgrade is a bit more of a loaded question. But dump/restore should definitely NOT be the mechanism for upgrades. Postgres has support for in place upgrades with pg_upgrade for some time now. This is the mechanism used on RDS, and the same mechanism we'll use on Crunchy Bridge, and what Heroku Postgres uses as well. I'm not sure that Azure supports in place upgrade yet, and GCP does not per their docs.
A dump and restore is simply not viable for a database of any size. 100GB datababase which is not at all in the "large" territory would be down for at least an hour if not longer.
Pg_upgrade is generally the right shape of thing for this problem. One could debate whether replication is a better approach for even reducing that down time. (Pg_upgrade is on the order of minutes, it is not a size of data operation but rather a size of catalog operation). But that dump/restore is acceptable and the best option isn't really the case these days.
I was reading a bit between the lines, original was about migrating in, another was saying about a proper synchronization service or dump. In the case of migrating from something outside of Cloud SQL into Cloud SQL you could basically do this today. A dump wouldn't be recommended in my opinion (having done a lot of migrations across cloud providers including several multi-TB databases).
In that case as long as the source has some form of decoding (test decoding plugin or pgoutput) it should work.
That a dump is always the best process anything isn't really true these days (I know someone will show up with a case of why dump is 100% for what they need in moving data around). But, the combination of logical decoding, and pg_upgrade cover most cases.
Admit I was jumping a bit with upgrades, but that is where dump/restore does most often come up, not with migrations.
If you're going to pay for someone to manage Postgres for you, why not pay an official supporter/contributor to the project? I don't see Google investing in features nor community. Are you simply reaching for whatever seems more convenient?
If you're on GCP already, of course you'd opt to use their managed Postgres service, otherwhise you'd have to worry about egress traffic, transport encryption, IAM, etc by yourself (or pay someone else do it for you, of course, making it more difficult to calculate your infra costs, having a 2nd support contract, etc) without much benefit.
How do you know that Google isn't supporting Postgres in any way, e.g. by supplying upstream patches, etc? The same goes for AWS, Azure, Heroku.
>How do you know that Google isn't supporting Postgres in any way, e.g. by supplying upstream patches, etc? The same goes for AWS, Azure, Heroku.
There's a list of contributors and what organization they're from here <https://www.postgresql.org/community/contributors/>. You won't see "Azure" listed but that's because it's considered part of Microsoft. Of the ones you listed, Azure is the only one that is considered a major contributor though you'll see there are companies that specialize in managed Postgres specifically.
Without logical replication you're effectively locked in the same way people talk about Oracle lock-in. If you have any sizable amount of data you'd have to do a dump/restore to get it out which would be a change of data size operation. For 100 GB of data you're looking at 1-3 hrs depending, for 1 TB you'd be looking at a day probably.
Logical replication allows you to create replication slots and send it to other places. It does allow for more interesting things as well. One really common use case (not even migrating off) is change data capture out of Postgres into Kafka leveraging Debezium. In older versions of Postgres you could use wal2json for this, now more more recent versions with logical replication supported the pgoutput plugin is great here.
Upgrades are a thing, but in place upgrades can be fully captured without logical replication. The short is though they are limiting a lot of what Postgres can do by not supporting it.
It's unclear to me why they don't support this. RDS has had it since years, Azure Postgres added support for Debezium recently (we helped a bit with that), Heroku has it, but Postgres on GCP is missing support for logical replication and thus Debezium. I'm subscribed to the feature request (https://issuetracker.google.com/issues/70756171), it's upvoted by users weekly, if not more often.
mostly because it' makes a lot of stuff easier. like database upgrades (no downtime), especially when google gives zero fucks in implementing pg_upgrade support.
also everything that @craigkerstiens said. (shamlessly point to his text)
It's required for logical replication which is an important feature for downtimeless major version upgrades and more interesting replication topologies, change data capture, etc.
I love Cloud SQL for PostgreSQL. But upgrading still is a giant pain. I sincerely wish there was a way to schedule an automatic upgrade to newer major versions.
I'm really glad to see this. GCP was really slow to add point-in-time-recovery support to Postgres, it went live just a couple months ago, so I'm really impressed to see Postgres 13 support just a month and a half after it was released.
You can. This is for Google's fully managed SQL service, "fully managed" meaning that you don't have to directly manage Docker instances or VMs yourself.
But you do have to specify CPU and memory and pay for those for having the database running - even if you're not calling it.
What I'd like is to pay for what I use when calls are being made. Providing an SQL interface as a service, rather than actually running a whole personally copy of a database.
Yup, except that's got a cold start time of 25 seconds. So no use for a low usage system. (One, for instance, that gets used a couple of times a day, but really needs to be available when it is.)
Near zero. If my data is only a few kB then an already running process should be able to read it from disk and read/write the data almost instantly.
They should be working on a version of MySQL that can do this without spawning a whole new running instance on a whole new server.
It’s the “ask me why” that got you the downvotes. If you had of just stated your reasons without waiting for engagement you probably would have been upvoted.
Your first comment was not in line with HN guidelines. Your second comment is much better because it gives reasons.
I would have downvoted your first (and I do not work for GCP, mind you); I am now inclined not to just because you posted the reasons in your second comment.
Only this week did I receive the final Google Play Music notification that the service is permanently disabled.
Just before that Google failed to properly implement the payment settings to continue my Youtube Premium subscription at my previous price in the bundled GPM monthly packaging that I received. Thus, I lost access to both GPM and YT premium in single day.
We were using their managed MySQL instance. Without any communication, they pushed a silent update which broke our year-long security system that updated MySQL _priv tables in line with specific permissions and roles. There is no workaround.
When I reported the bug, it was labeled "Working as intended," again with no communication from Google as to why this breaking change was made, or even confirmation that it was something that Google changed at all. We spent days trying to figure out why we could reproduce it in some databases but not others, including diving into the MySQL source code.
I don't have experience with their Postgres product, but pushing breaking changes to a database with no notice or communication (either before or after doing it) simply isn't acceptable.