by shlomi-noach 1682 days ago

Hi, engineer at PlanetScale and maintainer for Vitess here. Appreciate your thoughtful comment, a couple answers:

> is something that sounds like I could do ... from Git platforms themselves

Git is very bad at analyzing SQL diffs. It can show you the textual diff between two CREATE TABLE statements, but it will not know what it takes to get you from _here_ to _there_. It has many parsing issues, like capturing irrelevant columns due to trailing commas. Or, for example, if you change a column's data type as per your suggestion, Git cannot differentiate that from a complete drop and recreation of the column. It just doesn't have insight into how SQL works/parses. And most importantly, it is unable to provide the actual operational diff you're seeking: the ALTER TABLE statement to take you from state A to state B. At this point I just want to give a shout out to skeema [0] and its underlying library tengo [1], which tackled this issue in a git-like manner for MySQL dialects/flavors.
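
To illustrate the gap between a textual diff and an operational one, here is a minimal sketch. It is not how skeema or tengo work internally. It assumes each table has already been parsed into a mapping of column name to definition; the `diff_columns` helper and that dict representation are hypothetical simplifications.

```python
def diff_columns(table, old, new):
    """Return an ALTER TABLE statement that turns column set `old` into `new`.

    `old` and `new` map column name -> column definition. Toy model only:
    real tools also handle indexes, column order, renames, defaults, etc.
    """
    clauses = []
    for name, definition in new.items():
        if name not in old:
            clauses.append(f"ADD COLUMN {name} {definition}")
        elif old[name] != definition:
            # A type change is a MODIFY, not a drop followed by an add.
            clauses.append(f"MODIFY COLUMN {name} {definition}")
    for name in old:
        if name not in new:
            clauses.append(f"DROP COLUMN {name}")
    if not clauses:
        return None  # schemas are identical, nothing to do
    return f"ALTER TABLE {table} " + ", ".join(clauses)


old = {"id": "INT", "name": "VARCHAR(64)"}
new = {"id": "BIGINT", "name": "VARCHAR(64)", "email": "VARCHAR(255)"}
statement = diff_columns("users", old, new)
print(statement)
```

A line-based diff of the two CREATE TABLE statements would only show one line removed and two added; the structured comparison is what tells a type change apart from a drop and re-add.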

> Many DDL changes will take different amount of locks on rows or tables, which may cause some queuing and even lock storms in the presence of incoming traffic.

This is indeed one of our main premises. Online schema changes in PlanetScale (and based on Vitess) will:

- Run concurrently to your production traffic

- Will automatically throttle when your production traffic gets too high, and in particular taking care not to affect replication lag

- Will run completely lockless throughout the migration, up to the cut-over point, where locking is required

- At cut-over point, will only cut-over when it predicts smooth operation (i.e. when satisfied that its own backlog for cutting over is short enough that lock time is minimal)

- To top it all, PS also manages the lifecycle of the Online DDL, such as service discovery, scheduling, error handling, throttling (mentioned), cleanup and garbage collection and more.

I just want to clarify the above is based on proven technologies, widely used in the MySQL community, which I was fortunate enough to be involved in over the past years. These run at scale for the largest deployments in the world today. We keep evolving Online DDL with more to come.

Please also see these docs on the Vitess website: [2]

[0]: https://www.skeema.io/

[1]: https://github.com/skeema/tengo

[2]: https://vitess.io/docs/user-guides/schema-changes/#the-schem...

(Comment edited for formatting and grammar)

evanelias 1681 days ago

Do note, earlier this month I archived the repo linked in [1] as a last resort, directly in response to the repeated behavior of you and your colleagues.

I don't even know where to begin with explaining this to outsiders, but here's a sample: I receive confused support emails from companies and users (always non-paying ones, at that) about Skeefree literally every single week; at no point in our lengthy email discussion last year did you ever disclose that your employer would be basing the entirety of its marketing campaign for its commercial offering around schema management in direct competition with my own bootstrapped products; then more recently in your fork of [1] your coworker is adding functionality that once again directly competes with the functionality in my commercial products.

In brief, repeatedly using my own open source work to compete with me, which in turn is preventing me from ever generating revenue from this work, which is needed for the work to continue.

It seemingly never ends. At this point I'm literally on the verge of throwing all my work in the trash and never touching a database again. I've clearly wasted the last several years of my life and all I get in return is a shout-out on a day-old HN thread, cool cool.

ahachete 1681 days ago

Thank you for your comments, I appreciate it. I'm still not sold, however. I would like to understand the underlying principles, "how this works". I don't need implementation details (happy if they are shared, though) but more on the main principles of operation. Please see my further comments below:

> Git is very bad at analyzing SQL diffs.

Agreed, no objection there. So PS has a nice built-in SQL diff. Neat! But what does this really bring? I mean, it's not that there aren't SQL diff tools, or tools to manage DDL migrations. Besides this, why not layer it on top of Git? Many orgs and integration tools already have similar workflows (e.g. approval workflows, issue management tools, CI, etc.) and if, instead of coming up with a new system, it were a layer on top of the existing ones, it would probably be used with less friction. Just my perspective on this, of course.

> Run concurrently to your production traffic

Can you elaborate? How? Do they run on other servers? Or are they waiting in a queue to be applied? If they run on different servers, what do they run there, since AFAIK the migration is only DDL and there's no data?

> Will automatically throttle when your production traffic gets too high, and in particular taking care not to affect replication lag

Same as above: who will throttle, the migration? But what is the migration? Let's use my example: a column type change requires a table rewrite. So the table rewrite will throttle, i.e. slow down? But where is this table rewrite running, on the main server (apparently not) or on a shadow server (apparently not either, since migrations have no data)? Actually you mention "when your production traffic gets too high". What is "high", can you quantify? We run customers that do dozens to thousands of transactions per second. Is this high enough? Will their migrations ever run, or will they wait for very long periods of time, maybe forever?

> Will run completely lockless throughout the migration

How is this possible? Where is the migration running, then? A shadow table, shadow server... none?

> At cut-over point

What's cut-over? Are groups of servers switched? This is what it sounds like to me, and that would explain how it could be lock-less and not affect production traffic. However, it does not explain how data is synchronized from the production database to the migration branch, nor how it keeps being updated with the real production traffic. This is essentially the crux of my failure to understand how this system works.

In general, I apologize if these are too many questions. But in essence, I feel this all sounds really good, but unless I have a deeper understanding of how the principles work, and they are sound to me, I won't be able to recommend this for production usage, as I know from experience the many caveats migrations have. If they are all solved, hats off, but I would appreciate it if this were more clearly explained from a technical perspective.

Thank you!

ahachete 1681 days ago

I think this comment answers most of my questions: https://news.ycombinator.com/item?id=29248306 Can you confirm (PS) this is how it works? From what I understand here, there are "shadow servers", replicating from the production traffic.

If so, this is cool. I still see some caveats:

* One already mentioned, the scope of migrations is limited to those where both old and new DDL are compatible with the currently running application. If this is the case, I believe it should be clearly advertised as such.

* With the migration being asynchronous, I lose control of when to deploy changes to the application. Even a hook to trigger this would go a long way.

* Not knowing exactly when the cut-over process is going to happen is also potentially a problem. I understand the cut-over may involve performance degradation (e.g. higher latency) or even connection loss (can PS also confirm how it is performed?) during some period of time, possibly small. But still, I may need to plan a small maintenance window. And if this is async, I cannot plan the window appropriately.

None of this takes anything away from the merits of the solution.

shlomi-noach 1681 days ago

Thank you! Please first see my comments to parent, as they describe how online schema changes work within the same server. With PlanetScale branching, we do give you a development branch with which you can play as much as you want, without affecting production. Online schema change kicks in when you deploy your changes to production.

> the scope of migrations is limited to those where both old and new DDL are compatible with the currently running application.

You are absolutely correct, and that is the paradigm. Say, for example, you want to add a column, so you first run the migration that adds the column, and only afterwards can you deploy an application change that actually utilizes that column. Likewise if you want to DROP a column, you first deploy an app change that ceases to reference the column, and only then can you actually drop it.
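
The ordering rule above can be written down as a tiny sketch. The `rollout_order` helper and its step labels are hypothetical, purely for illustration; they are not part of any PlanetScale or Vitess API.

```python
def rollout_order(change):
    """Order the schema step and the app step for a single column change."""
    kind = change.split()[0].upper()
    if kind == "ADD":
        # The column must exist before any code reads or writes it.
        return [f"migrate: {change}", "deploy: app starts using the column"]
    if kind == "DROP":
        # Code must stop referencing the column before it disappears.
        return ["deploy: app stops using the column", f"migrate: {change}"]
    raise ValueError(f"unsupported change: {change}")


for step in rollout_order("ADD COLUMN email VARCHAR(255)"):
    print(step)
for step in rollout_order("DROP COLUMN legacy_flag"):
    print(step)
```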

This paradigm worked very well for the companies I worked with, and makes for both loose and tight coupling between code and database. It's loose when you have your test databases where you can deploy schema changes at will. It's loose in the sense you can take small steps at a time, each isolated from the other (e.g. ADD COLUMN does not require you to make any app changes _yet_). Then, it's tight where you couple your code changes with the schema in your git repo. It's tight in that the app never gets too far from the database (normally one change away at any given time, per development branch).

> With the migration being asynchronous, I lose control of when to deploy changes to the application

Great point and absolutely on our radar.

> Not knowing exactly when the cut-over process is going to happen is also potentially a problem.

Again great point and on our radar. To be honest I previously moved away from caring about the exact cut-over time. We designed gh-ost to do just that: stall cut-over until the engineer/developer is happy to sit at their desk. Over time, we found it was unnecessary. But absolutely there are use cases for both approaches.

ahachete 1681 days ago

> Thank you! Please first see my comments to parent

Thank you indeed for the time taken to answer all my comments. Now together with all the information here, I understand how it works, and what the trade-offs are.

If my input counts for anything, I'd strongly recommend taking all the information here and writing it up in a structured way as part of the documentation. I didn't see any information there as valuable as this. For me, and possibly many others, knowing this information is required in order to make informed decisions about whether to use this or not; and if so, how, and what the trade-offs are (e.g. atomizing the changes such that db changes and code changes are independent, which I agree is in general a good thing, but is something to be clearly aware of).

> Again great point and on our radar. To be honest I previously moved away from caring about the exact cut-over time. We designed gh-ost to do just that: stall cut-over until the engineer/developer is happy to sit at their desk. Over time, we found it was unnecessary. But absolutely there are use cases for both approaches.

For me it's important because cut-over takes some locks. Sure, for a small amount of time. But these locks may create some problems, so that's why I want to be aware. Most of the time the problem is other DDL changes, which are a non-issue here since you already prevent that. But there could be others related to normal db operation. For example, and this may not apply here but does apply with Postgres, such a lock may queue other locks behind it (including read-only queries). And if the cut-over lock is itself blocked by another lock (say an explicit table lock), then everything queues on that table and leads to a lock storm, which in turn may cause effective downtime. That's why when we plan migrations or operations similar to this cut-over (for example a repack operation in Postgres, which essentially rewrites a shadow table, in this case just for the purpose of reducing bloat), we really need to take this into account.
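
The queuing effect can be shown with a toy model. This is a deliberately simplified sketch with only two lock modes and strict FIFO ordering; real lock managers, Postgres included, have many more modes and rules. The `simulate` function and the request names are made up for illustration.

```python
SHARE, EXCLUSIVE = "share", "exclusive"


def conflicts(a, b):
    # In this toy model only share/share is compatible.
    return EXCLUSIVE in (a, b)


def simulate(held, arrivals):
    """Return the names of requests that end up waiting.

    `held` lists the lock modes currently granted on the table;
    `arrivals` is a list of (name, mode) requests in arrival order.
    A request waits if it conflicts with a granted lock or with any
    request already waiting ahead of it (FIFO fairness).
    """
    granted = list(held)
    waiting = []
    for name, mode in arrivals:
        blocked_by_holder = any(conflicts(mode, h) for h in granted)
        blocked_by_waiter = any(conflicts(mode, m) for _, m in waiting)
        if blocked_by_holder or blocked_by_waiter:
            waiting.append((name, mode))
        else:
            granted.append(mode)
    return [name for name, _ in waiting]


# A long-running reader holds a share lock; the cut-over wants exclusive.
arrivals = [("cut-over", EXCLUSIVE), ("select-1", SHARE), ("select-2", SHARE)]
with_cutover = simulate([SHARE], arrivals)
without_cutover = simulate([SHARE], arrivals[1:])
print(with_cutover)     # the selects queue behind the waiting cut-over
print(without_cutover)  # without the cut-over, nobody waits
```

The reads are compatible with the lock that is actually held, yet they still wait, because they sit behind the exclusive request in the queue.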

shlomi-noach 1681 days ago

I really appreciate your feedback. I'll pass on the documentation advice, it's good to have your user perspective.

I hear you on cut-over, and - it's indeed on our radar! I hope to bring good news.

shlomi-noach 1681 days ago

> why not layer it on top of Git? ... Many orgs and integration tools already have similar workflows

Indeed. See my writeup on skeefree, which we developed at GitHub:

https://github.blog/2020-02-14-automating-mysql-schema-migra...

or watch my FOSDEM presentation:

https://www.youtube.com/watch?v=xyMKhL75Vyg

Yes, there is a lot of tooling, but one of my frustrations is that it's, well, tooling. In Vitess and in PSDB, the database itself gives you the developer flow you expect. You can write a solution that integrates with GitHub, but then someone else uses BitBucket, or Phabricator, or whatever other framework they have.

I've been in the MySQL space for some 20 years now, 12 of which I've been active in open source. What frustrates me more than anything is how different companies have to come up with similar solutions to the same problems - but cannot afford to "just use" some existing tooling or framework, because it was built for a specific kind of infrastructure, or assumes this and that setup. Cloud or no cloud? Kubernetes or bare metal? DNS or proxies? Central service discovery or distributed configuration? And so on and on...

So much tooling is written to compensate for functionality missing in the database. We all wish the database would _just do it_. As an engineer, here is my opportunity to write stuff into the database system, or into a framework that presents itself to the app as the database. To then be used however users choose to, because they have functionality they don't need to worry about, and can have less boilerplate code to fit it into their infrastructure.

shlomi-noach 1681 days ago

Again, thank you for the questions.

I am estimating that your database space isn't MySQL, which is just fine of course. The reason I'm guessing is that in the MySQL space, online schema change tools have been around for over a decade and are the go-to solution for schema changes. A small minority of the industry, based on my understanding as a member of the community, uses other techniques such as rolling migrations on replicas etc., but the vast majority uses one of the common schema change tools:

- pt-online-schema-change

- facebook's OSC

- gh-ost

I authored the original schema change tool, oak-online-alter-table https://shlomi-noach.github.io/openarkkit/oak-online-alter-t..., which is no longer supported, but thankfully I did invest some time in documenting how it works. Similarly, I co-designed and was the main author of gh-ost, https://github.com/github/gh-ost, as part of the database infrastructure team at GitHub. We developed gh-ost because the existing schema change tools could not cope with our particular workloads. Read this engineering blog post: https://github.blog/2016-08-01-gh-ost-github-s-online-migrat... to get a better sense of what gh-ost is and how it works. In particular, I suggest reading these:

- https://github.com/github/gh-ost/blob/master/doc/cheatsheet....

- https://github.com/github/gh-ost/blob/master/doc/cut-over.md

- https://github.com/github/gh-ost/blob/master/doc/subsecond-l...

- https://github.com/github/gh-ost/blob/master/doc/throttle.md

- https://github.com/github/gh-ost/blob/master/doc/why-trigger...

At PlanetScale I also integrated VReplication into the Online DDL flow. This comment is far too short to explain how VReplication works, but thankfully we again have some docs:

- https://vitess.io/docs/user-guides/schema-changes/ddl-strate... (and really, see the entire page; there's a comparison between the different tools)

- https://vitess.io/docs/design-docs/vreplication/

- or see this self tracking issue: https://github.com/vitessio/vitess/issues/8056#issue-8771509...

Not to leave you with only a bunch of reading material, I'll answer some questions here:

> Can you elaborate? How? Do they run on other servers? Or are they waiting in a queue to be applied? If they run on different servers, what do they run there, since AFAIK the migration is only DDL and there's no data?

The way all schema change tools mentioned above work is by creating a shadow aka ghost table on the same primary server where your original table is located. By carefully both copying data from original table as well as tracking ongoing changes to the table (whether by utilizing triggers or by tailing the binary logs), and using different techniques to mitigate conflicts between the two, the tools populate the shadow table with up-to-date data from your original table.
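
As a rough illustration of the trigger-based variant of this idea, here is a self-contained sketch using Python's built-in `sqlite3` module. SQLite stands in for MySQL only so the example runs anywhere; the table names, the `copy_chunk` helper, and the chunk size are assumptions, and none of the tools above are implemented this way line for line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 11)])

conn.executescript("""
    -- The shadow table already has the new schema (an extra column).
    CREATE TABLE _users_shadow (id INTEGER PRIMARY KEY, name TEXT, email TEXT);

    -- Triggers mirror ongoing writes on the original into the shadow table.
    CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
        INSERT OR REPLACE INTO _users_shadow (id, name) VALUES (NEW.id, NEW.name);
    END;
    CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
        INSERT OR REPLACE INTO _users_shadow (id, name) VALUES (NEW.id, NEW.name);
    END;
    CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
        DELETE FROM _users_shadow WHERE id = OLD.id;
    END;
""")


def copy_chunk(conn, last_id, chunk_size):
    """Copy up to chunk_size rows with id > last_id; return the last id copied."""
    rows = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, chunk_size),
    ).fetchall()
    if not rows:
        return None
    # OR IGNORE: a row already written by a trigger is newer, so keep it.
    conn.executemany(
        "INSERT OR IGNORE INTO _users_shadow (id, name) VALUES (?, ?)", rows)
    return rows[-1][0]


last_id = copy_chunk(conn, 0, 4)                                   # rows 1..4
conn.execute("UPDATE users SET name = 'renamed' WHERE id = 2")     # already copied
conn.execute("UPDATE users SET name = 'early' WHERE id = 9")       # not copied yet
conn.execute("DELETE FROM users WHERE id = 7")
conn.execute("INSERT INTO users (id, name) VALUES (11, 'late')")
while last_id is not None:
    last_id = copy_chunk(conn, last_id, 4)

original = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
shadow = conn.execute("SELECT id, name FROM _users_shadow ORDER BY id").fetchall()
print(original == shadow)
```

The conflict-mitigation rule in this sketch is simply "the trigger wins over the bulk copy". It ignores primary key updates and true concurrency, both of which real tools have to deal with.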

This can take a long time, and requires an extra amount of space to accommodate the shadow table (both time and space are also required by "natural" ALTER TABLE implementations in DBs I'm aware of).

With non-trigger solutions, such as gh-ost and VReplication, the tooling has almost complete control over the pace. Given load on the primary server or given increasing replication lag, it can choose to throttle or completely halt execution, and resume later on when load has subsided. We have used this technique specifically at GitHub to run the largest migrations on our busiest tables at any time of the week, including at peak traffic, and this has been shown to have little to no impact on production. Again, these techniques are universally used today by almost all large scale MySQL players, including Facebook, Shopify, Slack, etc.

> who will throttle, the migration? But what is the migration? Let's use my example: a column type change requires a table rewrite. So the table rewrite will throttle, i.e. slow down? But where is this table rewrite running, on the main server (apparently not) or on a shadow server (apparently not either, since migrations have no data)? Actually you mention "when your production traffic gets too high". What is "high", can you quantify?

The tool (or Vitess if you will, or PlanetScale in our discussion) will throttle based on continuously collected metrics. The single most important metric is replication lag, and we found that it predicts load more than any other metric, by far. We throttle at 1sec replication lag. A secondary metric is the number of concurrently executing threads on the primary; this is more important for pt-online-schema-change, but for gh-ost and VReplication, given their nature of single-thread writes, we found that the metric is not very important to throttle on. It is also trickier, since the threshold to throttle at depends on your time of day, particular expected workload, etc.
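
A lag-based throttle of this kind can be sketched in a few lines. This is an illustrative loop, not the gh-ost or Vitess throttler: the `run_throttled` function, the injected `read_lag` callable, and the 0.1 second back-off are assumptions; only the 1 second threshold comes from the text above.

```python
import time

MAX_LAG_SECONDS = 1.0  # the threshold quoted above


def run_throttled(chunks, copy_chunk, read_lag,
                  sleep=time.sleep, max_lag=MAX_LAG_SECONDS):
    """Copy chunks one at a time, pausing while replication lag is too high."""
    pauses = 0
    for chunk in chunks:
        while read_lag() > max_lag:
            pauses += 1
            sleep(0.1)  # back off, then re-check the metric
        copy_chunk(chunk)
    return pauses


lags = iter([0.2, 1.5, 2.0, 0.4, 0.1])  # made-up lag readings, in seconds
copied = []
pauses = run_throttled(
    chunks=[1, 2, 3],
    copy_chunk=copied.append,
    read_lag=lambda: next(lags),
    sleep=lambda seconds: None,  # skip real sleeping in this demo
)
print(copied, pauses)
```

Because the copy is a single sequential writer, pausing it is just a matter of not starting the next chunk.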

> We run customers that do dozens to thousands of transactions per second. Is this high enough?

The tooling is known to work well with these transaction rates. VReplication and gh-ost will add one more transaction at a time (well, two really, but the 2nd one is book-keeping and so low volume that we can neglect it); the transactions are intentionally kept small so as to not overload the transaction log or the MVCC mechanism; the rule of thumb is to only copy 100 rows at a time, so expect possibly millions of such small sequential transactions on a billion-row table.
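
The arithmetic behind "millions of small transactions" is easy to check. The sketch below assumes a dense integer primary key, which real tables rarely have; `chunk_count` and `chunk_ranges` are hypothetical helpers, not functions from any of the tools mentioned.

```python
def chunk_count(total_rows, rows_per_chunk=100):
    """Number of copy transactions needed, rounding up."""
    return (total_rows + rows_per_chunk - 1) // rows_per_chunk


def chunk_ranges(min_id, max_id, rows_per_chunk=100):
    """Yield inclusive (start, end) id ranges covering min_id..max_id."""
    start = min_id
    while start <= max_id:
        end = min(start + rows_per_chunk - 1, max_id)
        yield start, end
        start = end + 1


billion_row_chunks = chunk_count(1_000_000_000)
small_table = list(chunk_ranges(1, 250))
print(billion_row_chunks)  # 10000000
print(small_table)         # [(1, 100), (101, 200), (201, 250)]
```

At 100 rows per transaction, a billion-row table takes ten million sequential copy transactions.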

> Will their migrations ever run, or will they wait for very long periods of time, maybe forever?

Sometimes, if the load is very high, migrations will throttle more. At other times, they will push as fast as they can while still keeping to a low replication lag threshold. In my experience a gh-ost or VReplication migration is normally good to run even at the busiest times. If a database system is such that it _always_ has substantial replication lag, such that a migration cannot complete in a timely manner, then I'd say the database system is beyond its own capacity anyway, and should be optimized/sharded/whatever.

> How is this possible? Where is the migration running, then? A shadow table, shadow server... none?

So I already mentioned the ghost table. And then, SELECTs are non blocking on the original table.

> What's cut-over?

Cut-over is what we call the final step of the migration: flipping the tables. Specifically, moving away your original table, and renaming the ghost table in its place. This requires a metadata lock, and is the single most critical part of the schema migration, for any tooling involved. This is where something has to give. Tooling such as gh-ost and pt-online-schema-change acquire a metadata lock such that queries are blocked momentarily, until cut-over is complete. With very high load the app will feel it. With extremely high load the database may not be able to (or may not be configured to) accommodate so many blocked queries, and the app will see rejections. With low-volume load, apps may not even notice.
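
The table flip itself can be illustrated with SQLite standing in for MySQL. This sketch is not gh-ost's actual cut-over algorithm (the cut-over doc linked above covers that); it only shows the "move the original away, rename the ghost into place" step, done inside one transaction so no reader sees a moment without a `users` table. In MySQL, a single `RENAME TABLE users TO _users_old, _users_shadow TO users` statement performs both renames atomically. The table names and the `cut_over` helper are assumptions.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction boundaries below are exactly the ones written out by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE _users_shadow (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a')")
conn.execute("INSERT INTO _users_shadow VALUES (1, 'a', NULL)")


def cut_over(conn):
    """Swap the shadow table into place; both renames commit together."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.execute("ALTER TABLE users RENAME TO _users_old")
        conn.execute("ALTER TABLE _users_shadow RENAME TO users")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


cut_over(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(columns)
print(tables)
```

The lock is held only for the two renames, which is why keeping the pre-cut-over backlog short matters so much for how long queries are blocked.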

I hope this helps. Obviously this comment cannot accommodate much more, but hopefully the documentation links I provided are of help.

ahachete 1681 days ago

> I am estimating that your database space isn't MySQL, which is just fine of course.

You are absolutely right :) My background is strongly in Postgres; you can find more information in my profile if you want to.

So yes, I apologize if some of my questions do not apply or are too obvious for cases that are MySQL-based. But for the most part, I believe the principles of operation are the same.

> [other comments]

As mentioned, thank you very much for the detailed information. This completes the picture that I was looking for. I will definitely go through some of the links provided in more detail.

This principle of operation is not too different from something I proposed to a Postgres project some time ago (https://github.com/cybertec-postgresql/pg_squeeze/issues/18). This tool indeed is conceptually pretty similar. It's a shame that supporting schema changes is not part of their focus at this point. It wouldn't do throttling either, but it shouldn't be a difficult feature to add, I guess.

For other users here that may be interested in the Postgres world, there are two tools that perform a similar operation (creating a shadow table and filling it in the background), but are both focused on rewriting the table to avoid bloat, rather than on doing a schema migration:

* pg_repack (https://reorg.github.io/pg_repack/): the most used one, relies on triggers

* pg_squeeze: already mentioned, uses logical replication

shlomi-noach 1681 days ago

Heh, and in the MySQL space, we use a "trivial" online schema migration (that has no actual schema changes) to avoid table bloat :)
