
by ahachete 1682 days ago

I think this comment answers most of my questions: https://news.ycombinator.com/item?id=29248306 Can you confirm (PS) this is how it works? From what I understand here, there are "shadow servers", replicating from the production traffic.

If so, this is cool. I still see some caveats:

* One already mentioned, the scope of migrations is limited to those where both old and new DDL are compatible with the currently running application. If this is the case, I believe it should be clearly advertised as such.

* Because the migration is asynchronous, I lose control of when to deploy changes to the application. Even a hook to trigger the deployment would go a long way.

* Not knowing exactly when the cut-over is going to happen is also potentially a problem. I understand the cut-over may involve performance degradation (e.g. higher latency) or even connection loss (could you also confirm, PS, how it is performed?) for some period of time, possibly a short one. Still, I may need to plan a small maintenance window, and if this is async, I cannot plan the window appropriately.

None of this takes away from the merits of the solution.


shlomi-noach 1682 days ago

Thank you! Please first see my comments to parent, as they describe how online schema changes work within the same server; with PlanetScale branching, we do give you a development branch with which you can play as much as you want, without affecting production. Online schema change kicks in when you deploy your changes to production.

> the scope of migrations is limited to those where both old and new DDL are compatible with the currently running application.

You are absolutely correct, and that is the paradigm. Say, for example, you want to add a column, so you first run the migration that adds the column, and only afterwards can you deploy an application change that actually utilizes that column. Likewise if you want to DROP a column, you first deploy an app change that ceases to reference the column, and only then can you actually drop it.
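
The ordering rule above can be sketched as a small check. This is only an illustration: the column sets and the helper names (`compatible`, `safe_online_migration`) are made up for the example and are not part of any PlanetScale tooling.

```python
def compatible(app_columns, schema_columns):
    """True if every column the app references exists in the schema."""
    return set(app_columns) <= set(schema_columns)


def safe_online_migration(app_columns, old_schema, new_schema):
    """The running app must work against both the old and the new schema,
    because it keeps running while the migration is applied."""
    return compatible(app_columns, old_schema) and compatible(app_columns, new_schema)


# Hypothetical table with and without an "email" column.
without_email = {"id", "name"}
with_email = {"id", "name", "email"}

app_v1 = {"id", "name"}           # does not reference email
app_v2 = {"id", "name", "email"}  # references email

# ADD COLUMN: migrate first while app_v1 runs, deploy app_v2 afterwards.
add_ok = safe_online_migration(app_v1, without_email, with_email)
add_bad = safe_online_migration(app_v2, without_email, with_email)

# DROP COLUMN: deploy app_v1 first, then drop the column.
drop_ok = safe_online_migration(app_v1, with_email, without_email)
drop_bad = safe_online_migration(app_v2, with_email, without_email)
```

Here `add_ok` and `drop_ok` are true while `add_bad` and `drop_bad` are false, which is the "migrate first, then use" and "stop using first, then drop" ordering.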

This paradigm worked very well for the companies I worked with, and makes for both loose and tight coupling between code and database. It's loose when you have your test databases where you can deploy schema changes at will. It's loose in the sense you can take small steps at a time, each isolated from the other (e.g. ADD COLUMN does not require you to make any app changes yet). Then, it's tight where you couple your code changes with the schema in your git repo. It's tight in that the app never gets too far from the database (normally one change away at any given time, per development branch).

> Because the migration is asynchronous, I lose control of when to deploy changes to the application

Great point and absolutely on our radar.

> Not knowing exactly when the cut-over is going to happen is also potentially a problem.

Again great point and on our radar. To be honest I previously moved away from caring about the exact cut-over time. We designed gh-ost to do just that: stall cut-over until the engineer/developer is happy to sit at their desk. Over time, we found it was unnecessary. But absolutely there are use cases for both approaches.
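
For readers unfamiliar with the "stall cut-over" idea: gh-ost exposes it through a flag file (`--postpone-cut-over-flag-file`), and the cut-over waits for as long as that file exists. The sketch below is not gh-ost's code, just a minimal illustration of a flag-file gate; the function name, the temporary file, and the fake sleep used in the demo are all assumptions made for the example.

```python
import os
import tempfile
import time


def wait_until_flag_removed(flag_path, poll_seconds=1.0, sleep=time.sleep):
    """Block while flag_path exists; return how many times we had to wait."""
    polls = 0
    while os.path.exists(flag_path):
        polls += 1
        sleep(poll_seconds)
    return polls


# Demo: create a flag file, then "remove it at the desk" on the third poll.
fd, flag_path = tempfile.mkstemp(suffix=".postpone")
os.close(fd)

sleep_calls = []


def fake_sleep(seconds):
    # Stands in for an engineer deleting the flag file after a while.
    sleep_calls.append(seconds)
    if len(sleep_calls) == 3:
        os.remove(flag_path)


polls = wait_until_flag_removed(flag_path, poll_seconds=0.5, sleep=fake_sleep)
# At this point the flag is gone and a real tool would proceed to cut-over.
```

The design choice is that the operator controls timing with a plain file rather than a network call, so the gate keeps working even if the tool has no other control channel.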


ahachete 1681 days ago

> Thank you! Please first see my comments to parent

Thank you indeed for the time taken to answer all my comments. Now together with all the information here, I understand how it works, and what the trade-offs are.

If my input is of any use, I'd strongly recommend taking all the information here and writing it up in a structured way as part of the documentation. I didn't see any information there as valuable as this. For me, and possibly many others, knowing this is required in order to make an informed decision about whether to use this or not; and if so, how, and what the trade-offs are (e.g. atomizing the changes such that db changes and code changes are independent, which I agree is in general a good thing, but is something to be clearly aware of).

> Again great point and on our radar. To be honest I previously moved away from caring about the exact cut-over time. We designed gh-ost to do just that: stall cut-over until the engineer/developer is happy to sit at their desk. Over time, we found it was unnecessary. But absolutely there are use cases for both approaches.

For me it's important because the cut-over takes some locks. Sure, for a small amount of time. But these locks may create some problems, so that's why I want to be aware. Most of the time the conflicts are with other DDL changes, which are a non-issue here since you already prevent those. But there could be others related to normal db operation. For example, and this may not apply here but does apply with Postgres, such a lock may queue other locks behind it (including read-only queries). And if the cut-over lock is itself blocked by another lock (say an explicit table lock), then everything queues on that table and leads to a lock storm, which in turn may cause effective downtime. That's why when we plan migrations or operations similar to this cut-over (for example in Postgres a repack operation, which is essentially rewriting a shadow table, in this case just for the purpose of reducing bloat), we really need to take this into account.
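
To make the queueing effect concrete, here is a toy model. It assumes only two lock modes and strict first-come-first-served queueing, which is far simpler than the real Postgres or MySQL lock managers; the names and the scenario are invented for illustration.

```python
SHARED, EXCLUSIVE = "shared", "exclusive"


def conflicts(a, b):
    """Two locks conflict if at least one of them is exclusive."""
    return a == EXCLUSIVE or b == EXCLUSIVE


def blocked_requests(held, requests):
    """Return the names of requests that have to wait.

    held:     lock modes already granted on the table
    requests: (name, mode) pairs in arrival order
    A request waits if it conflicts with a granted lock or with any
    request already waiting ahead of it (FIFO fairness).
    """
    held = list(held)
    waiting = []
    blocked = []
    for name, mode in requests:
        if any(conflicts(mode, m) for m in held + waiting):
            waiting.append(mode)
            blocked.append(name)
        else:
            held.append(mode)
    return blocked


# A long-running transaction already holds a shared lock on the table.
long_running = [SHARED]

# Without a cut-over, plain reads are granted immediately.
no_cutover = blocked_requests(long_running, [("select_1", SHARED), ("select_2", SHARED)])

# With a cut-over waiting for its exclusive lock, the reads queue behind it.
with_cutover = blocked_requests(
    long_running,
    [("cut_over", EXCLUSIVE), ("select_1", SHARED), ("select_2", SHARED)],
)
```

In this model `no_cutover` is empty, while `with_cutover` contains the cut-over and both reads: the reads would not conflict with the lock that is actually held, yet they still wait because they arrived behind the pending exclusive request.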


shlomi-noach 1681 days ago

I really appreciate your feedback. I'll pass the documentation advice along; it's good to have your user perspective.

I hear you on cut-over, and - it's indeed on our radar! I hope to bring good news.
