UDB had hundreds of tables per shard, and although there are a few common patterns, they did not all have an exactly identical structure. You have two former FB database engineers in this thread (myself + rwultsch) telling you your statement is incorrect. Nothing in Domas's post discusses lack of schema changes or tables being identical.
Linkbench is unmaintained and does not attempt to mirror the entirety of UDB, just its access patterns: point lookups by PK, and range scans over a secondary index. A fixed access pattern is not the same thing as having no schema changes.
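To make those two access patterns concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module purely so that it runs anywhere. The table, column, and index names (`link`, `id1`, `link_type`, `id2`, `time`, `link_by_time`) are hypothetical illustrations, not Linkbench's or UDB's actual schema.

```python
import sqlite3

# In-memory database; SQLite stands in for MySQL only to keep this runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (
        id1       INTEGER NOT NULL,
        link_type INTEGER NOT NULL,
        id2       INTEGER NOT NULL,
        time      INTEGER NOT NULL,
        data      BLOB,
        PRIMARY KEY (id1, link_type, id2)
    );
    -- Hypothetical secondary index used for the range scan below.
    CREATE INDEX link_by_time ON link (id1, link_type, time);
""")

rows = [(1, 7, 100 + i, 1000 + i, b"x") for i in range(5)]
rows.append((2, 7, 200, 5000, b"y"))
conn.executemany("INSERT INTO link VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()


def point_lookup(id1, link_type, id2):
    """Access pattern 1: fetch exactly one row by its full primary key."""
    return conn.execute(
        "SELECT id1, link_type, id2, time FROM link "
        "WHERE id1 = ? AND link_type = ? AND id2 = ?",
        (id1, link_type, id2),
    ).fetchone()


def range_scan(id1, link_type, limit):
    """Access pattern 2: newest links of one type for one object."""
    return conn.execute(
        "SELECT id2, time FROM link "
        "WHERE id1 = ? AND link_type = ? "
        "ORDER BY time DESC LIMIT ?",
        (id1, link_type, limit),
    ).fetchall()
```

Note that nothing in either query shape constrains what other columns or indexes the table has, which is why a fixed access pattern says nothing about whether the schema changes.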
Even putting column changes aside, the entirety of UDB was migrated from InnoDB to MyRocks in 2017, which is essentially a schema change across every single UDB table in every single UDB shard.
And besides, as I mentioned already, the non-UDB MySQL use-cases at Facebook are larger than the vast majority of companies' databases -- larger than the next-largest US social network, even. The non-UDB tiers had dozens of schema changes every single day.
As rwultsch correctly mentioned, Facebook's extreme agility with schema changes is directly what inspired me to create https://www.skeema.io, an open source project offering declarative schema change management. It's used by GitHub, Twilio, and a number of other well-known companies.
Please stop making incorrect statements based on things you have no direct experience with.
From my point of view, neither of you has sufficiently answered the question: does UDB go through schema changes that would require rewriting the table via pt-osc? If so, at what frequency?
Until then, sorry, we will keep advising people to stay away from MySQL (and thus indirectly, skeema), because "long wait and potential incident from schema migration" is just not something that should come up during sprint planning.
> Even putting column changes aside, the entirety of UDB was migrated from InnoDB to MyRocks in 2017, which is essentially a schema change across every single UDB table in every single UDB shard.
I read about that, and it is definitely an impressive feat, but still, that doesn't answer the question, at all? That's just relying on MySQL native replication that works across different engines.
> that would require rewriting the table via pt-osc
Facebook does not use pt-osc; they use fb-osc, originally written in PHP and released in 2010 [1] and later ported to Python in 2017 [2]. The concepts are similar to pt-osc re: core use of triggers, but the fine print has some important differences about how the new table structure is specified, when changes are applied, and how changes are made on replicas.
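For anyone unfamiliar with the general trigger-based idea, here is a rough sketch in Python. It is not fb-osc's or pt-osc's actual implementation, and it deliberately ignores the fine print mentioned above (how the new structure is specified, when changes are applied, and what happens on replicas). It uses the standard-library sqlite3 module only so that it runs anywhere; the `users` table, the added `email` column, the trigger names, and the `on_chunk` hook that simulates concurrent writes are all hypothetical.

```python
import sqlite3


def online_schema_change(conn, chunk_size=2, on_chunk=None):
    """Add an `email` column to `users` by building a shadow table."""
    # 1. Create the shadow table with the desired new structure.
    conn.execute(
        "CREATE TABLE users_new "
        "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
    )
    # 2. Triggers mirror every write on the old table into the shadow table.
    conn.executescript("""
        CREATE TRIGGER users_osc_ins AFTER INSERT ON users BEGIN
            INSERT OR REPLACE INTO users_new (id, name)
            VALUES (NEW.id, NEW.name);
        END;
        CREATE TRIGGER users_osc_upd AFTER UPDATE ON users BEGIN
            DELETE FROM users_new WHERE id = OLD.id;
            INSERT OR REPLACE INTO users_new (id, name)
            VALUES (NEW.id, NEW.name);
        END;
        CREATE TRIGGER users_osc_del AFTER DELETE ON users BEGIN
            DELETE FROM users_new WHERE id = OLD.id;
        END;
    """)
    # 3. Copy existing rows in small primary-key-ordered chunks.
    last_id = 0
    while True:
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        )]
        if not ids:
            break
        # OR IGNORE: a row already written by a trigger is at least as fresh.
        conn.execute(
            "INSERT OR IGNORE INTO users_new (id, name) "
            "SELECT id, name FROM users WHERE id > ? AND id <= ?",
            (last_id, ids[-1]),
        )
        conn.commit()
        last_id = ids[-1]
        if on_chunk is not None:
            on_chunk(conn)  # hypothetical hook: simulate concurrent traffic
    # 4. Drop the triggers and swap the tables in one transaction.
    conn.executescript("""
        BEGIN;
        DROP TRIGGER users_osc_ins;
        DROP TRIGGER users_osc_upd;
        DROP TRIGGER users_osc_del;
        ALTER TABLE users RENAME TO users_old;
        ALTER TABLE users_new RENAME TO users;
        DROP TABLE users_old;
        COMMIT;
    """)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(i, f"u{i}") for i in range(1, 6)],
)
conn.commit()

calls = []


def concurrent_writes(c):
    """Writes that arrive mid-copy; applied once, after the first chunk."""
    if calls:
        return
    calls.append(1)
    c.execute("UPDATE users SET name = 'renamed' WHERE id = 1")
    c.execute("DELETE FROM users WHERE id = 2")
    c.execute("INSERT INTO users (id, name) VALUES (10, 'late')")


online_schema_change(conn, chunk_size=2, on_chunk=concurrent_writes)
final_rows = conn.execute(
    "SELECT id, name, email FROM users ORDER BY id"
).fetchall()
```

The point of the sketch is only that writes arriving during the copy are not lost: the triggers carry them into the shadow table, and the chunked copy never overwrites a row a trigger has already placed there.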
Anyway, fb-osc was used on UDB, the answer is emphatically yes.
btw I am using the past tense here because I haven't kept up with FB mysql stuff the past few years. I don't even know if UDB is still on mysql at all; it's irrelevant to the discussion because the key point here is that schema changes emphatically did occur on UDB for many years, and your original statement regarding TAO and schema changes was demonstrably false, full stop.
> If so, at what frequency?
As I already said, I do not recall! Why would I remember the exact frequency of a completely and seamlessly automated process at a company I left 6 years ago?
UDB didn't require fb-osc changes nearly as often as non-UDB, if that's what you're asking, by nature of it serializing most (not all!) object fields down to a single column. But there were definitely still cases where actual schema changes were necessary on UDB tables; as I'll say yet again, the schema was not completely uniform across all UDB tables.
What's with this obsession with the frequency, anyway? Why does this matter? Your original statement was "the schemas are pretty much static, and they don't feel the schema migration pain", and this statement was wrong. Stop moving the goalposts.
> Until then, sorry, we will keep advising people to stay away from MySQL (and thus indirectly, skeema), because "long wait and potential incident from schema migration" is just not something that should come up during sprint planning.
Until when? Who is "we"? You aren't making sense. First your argument was that Facebook supposedly doesn't make schema changes at all, and now you're seemingly pivoting to bashing MySQL for needing external OSC tools, even though your original comment directly acknowledged that PG has cases where lack of these tools is a major problem?
> but still, that doesn't answer the question, at all? That's just relying on MySQL native replication that works across different engines.
What's relying on native replication? Changing a table's storage engine inherently requires rewriting the entire table.
> Anyway, fb-osc was used on UDB, the answer is emphatically yes.
Thank you for answering. I stand corrected then. I also wrongly assumed that it was pt-osc, because that's what gets mentioned on your website - "This feature works most easily for pt-online-schema-change"
> UDB didn't require fb-osc changes nearly as often as non-UDB, if that's what you're asking, by nature of it serializing most (not all!) object fields down to a single column.
And thanks here as well for being willing to at least slightly concede your position.
> What's with this obsession with the frequency, anyway? Why does this matter? Your original statement was "the schemas are pretty much static, and they don't feel the schema migration pain", and this statement was wrong. Stop moving the goalposts.
It is a technical discussion, not a competition. Goalposts do get moved. It matters because after reading all the blog posts and bug reports, I have very high respect for Domas, Yoshinori, Mark, and Harrison. And I, for one, could not imagine that they would design a critical piece of Facebook infrastructure that would require frequent babysitting.
I believe you would now like to object to the word "babysitting" by claiming that the process is completely and seamlessly automated. The thing is, when a table rewrite is going on due to a schema migration, there's always a risk that the additional write operations would trigger a production incident due to replication lag, which is typically the first bottleneck being hit. The migration to MyRocks has likely made this more seamless by providing more headroom. The experimental write-set replication in MySQL 8.0 might also have improved this, although I don't think Facebook is using 8.0.
> Until when? Who is "we"? You aren't making sense. First your argument was that Facebook supposedly doesn't make schema changes at all, and now you're seemingly pivoting to bashing MySQL for needing external OSC tools, even though your original comment directly acknowledged that PG has cases where lack of these tools is a major problem?
Er, we, as in everyone else except you in this HN thread? You do realize that you are the only one defending MySQL here right? Anyway, PG has exactly one scenario where it needs a migration tool, i.e. changing the column type. This can be easily avoided as long as people are aware of the limitation, e.g. just create the table with bigint as the primary key. So, a migration tool for PG would have been nice, but I don't exactly need one. Make sense?
> What's relying on native replication? Changing a table's storage engine inherently requires rewriting the entire table.
Did you even read about how the migration was done? Firstly, some MyRocks replicas were provisioned, then they started to serve some traffic, and then eventually they got promoted to master, with a ton of bug fixes and performance tuning in between to ensure that there was no regression from InnoDB. I can't take you seriously if you think that the DB engineers would carry out such a risky move of rewriting all tables by changing the storage engine with ALTER TABLE.
> I also wrongly assumed that it was pt-osc, because that's what gets mentioned on your website
I said Skeema was inspired by Facebook's approach to schema change agility. It was not implemented by Facebook. It is not a Facebook project, it does not use any Facebook tech. Facebook does not use Skeema.
> there's always a risk that the additional write operations would trigger a production incident due to replication lag
fb-osc bypasses replication entirely. Read the links I provided previously. The 2010 post was written by Mark.
As I said already, fb-osc was used dozens of times per day across Facebook's mysql fleet. Its design was influenced by some of the very people you're name-dropping. It ran seamlessly as part of a self-service declarative schema change automation system.
I was a member of the team that was directly on-call for all MySQL incidents at Facebook. I am discussing my direct personal experience here. There were certainly some particular repeat causes of oncall misery, and plenty of oncall shifts that were 12+ hours of hell. Yet I can't recall a single major incident that was caused by online schema change during my time at Facebook.
> The experimental write-set replication in MySQL 8.0
Nothing "experimental" about that feature. As a consultant I've directly used it to speed up parallel replication at major companies that you've very likely heard of.
> You do realize that you are the only one defending MySQL here right?
That statement is demonstrably false. There are several other commenters defending mysql in this overall thread.
Anyway, I'm in good company: the corporations using MySQL make up several trillion dollars of combined market cap. If you have any S&P 500 index funds, you are heavily invested in MySQL's successful use, whether you like it or not.
> have very high respect for Domas, Yoshinori, Mark, and Harrison
Yes, these four are superstars, among others. I don't understand how you say you have very high respect for them, yet you're fine with crapping all over the database technology they all spent a large chunk of their lives working on. All four previously worked for MySQL AB, Sun, and/or Oracle.
> I can't take you seriously if you think that the DB engineers would carry out such a risky move of rewriting all tables by changing the storage engine with ALTER TABLE.
Where did I say anything about doing this migration using ALTER TABLE? You keep responding to things I did not say or even imply!
I said the MyRocks migration is an example of schema change across all of UDB, in response to your claims that UDB was somehow static and did not need any schema changes.
Storage engine is part of table schema, both logically and physically. Changing storage engine is a schema change, regardless of how you accomplish it: ALTER TABLE, or trigger-based OSC tool, or RBR-based OSC tool, or old-fashioned replica swaps, or dump-and-reload as done in this case. You gloss over this by saying "some MyRocks replicas were provisioned" -- this is the schema change step, via dump-and-reload!
> It is a technical discussion, not a competition
Is it? Your approach to "technical discussion" apparently involves arguing against people's direct lived experiences; arguing about technology that you have no hands-on experience with; and arguing against strawmen points that were never made in the first place.
You keep name-dropping my former coworkers who you claim to respect, yet you post with a throwaway pseudonym.
I do not believe you are engaging in a good-faith technical discussion, so I will not be responding further.