Hacker News new | ask | show | jobs
by throwdbaaway 1939 days ago
Exactly. I think the prior HN discussions failed to call out how painful it is to do any sort of schema migration against a big InnoDB table [1][2].

Well known MySQL uses such as Facebook TAO and this Uber Schemaless are typically abstractions built on top of MySQL, which means the schemas are pretty much static, and they don't feel the schema migration pain.

For a typical RoR startup that relies on a RDBMS, please, stay away from MySQL.

[1] Yes, I know about the INSTANT ADD COLUMN patch from Tencent Games that landed in MySQL 8.0, and which has had major bug fixes in at least 8.0.14 and 8.0.20.

[2] A side effect is that MySQL now has a thriving ecosystem of schema migration tools (pt-osc, lhm, gh-ost), while Postgres has none, and there are situations where there is indeed no choice but to rewrite the table, e.g. changing a column type from int to bigint.

2 comments

To echo and add to rwultsch's sibling comment:

* Facebook had extremely frequent schema changes, and powerful declarative schema management automation to support this

* The TAO (or more correctly "UDB") use-case supported using many separate tables, not one giant generic key/value table as people seem to assume

* The non-UDB MySQL use-cases at Facebook, in combination, are still larger than the vast, vast majority of all other companies' databases. These non-UDB databases use a wide range of MySQL's functionality. The frequent claims that "Facebook used MySQL just as a dumb K/V store" are absolutely incorrect and have never been correct.

Thanks for the insight! So how frequent does schema change happen to UDB?
Sorry, I don't recall the exact frequency, I left FB in 2015.
Then I guess my point about UDB doesn't require schema change will still stand then? Looking at https://github.com/facebookarchive/linkbench, the table structure looks pretty much "final". And judging by https://dom.as/2015/07/30/on-order-by-optimization/, the access pattern is pretty much fixed as well.
UDB had hundreds of tables per shard, and although there are a few common patterns, they did not all have an exactly identical structure. You have two former FB database engineers in this thread (myself + rwultsch) telling you your statement is incorrect. Nothing in Domas's post discusses lack of schema changes or tables being identical.

Linkbench is unmaintained and does not attempt to mirror the entirety of UDB, just its access patterns: point lookups by PK, and range scans over a secondary index. A fixed access pattern is not the same thing as having no schema changes.

Even putting column changes aside, the entirety of UDB was migrated from InnoDB to MyRocks in 2017, which is essentially a schema change across every single UDB table in every single UDB shard.

And besides, as I mentioned already, the non-UDB MySQL use-cases at Facebook are larger than the vast majority of companies' databases -- larger than the next-largest US social network, even. The non-UDB tiers had dozens of schema changes every single day.

As rwultsch correctly mentioned, Facebook's extreme agility with schema changes is directly what inspired me to create https://www.skeema.io, an open source project offering declarative schema change management. It's used by GitHub, Twilio, and a number of other well-known companies.

Please stop making incorrect statements based on things you have no direct experience with.

From my point of view, neither of you have sufficiently answer the question -- does UDB go through schema change that would require rewriting the table via pt-osc? If so, at what frequency?

Until then, sorry, we will keep advocating people to stay away from MySQL (and thus indirectly, skeema), because "long wait and potential incident from schema migration" is just not something that should come up during a sprint planning.

> Even putting column changes aside, the entirety of UDB was migrated from InnoDB to MyRocks in 2017, which is essentially a schema change across every single UDB table in every single UDB shard.

I read about that, and it is definitely an impressive feat, but still, that doesn't answer the question, at all? That's just relying on MySQL native replication that works across different engines.

FB had plenty of schema changes. I know, I wrote software to push them out. The important concepts for pushing schema at scale became part of Skeema.