Hacker News new | ask | show | jobs
by evanelias 3550 days ago
MySQL's replication story has been mature for substantially longer than Postgres's. ~10 years ago, most of today's biggest social networks and other high-volume OLTP sites were starting to scale. Replication is essential and there was really no contest at the time. (EDIT to add specifics: pg added built-in replication in 9.0, released only 6 years ago.)

Today the majority of the largest sites on the internet use MySQL either as their primary database layer or at least as an essential storage system: Facebook, Google, Twitter, LinkedIn, DropBox, Alibaba, YouTube, WordPress.com, Wikipedia, GitHub, Yahoo, Yelp, Pinterest, Etsy, Tumblr, Flickr, Uber, Box, Booking.com... and meanwhile I keep hearing about more moving from pg to mysql partially or entirely (Instagram, Lyft, etc)

The result is that the MySQL ecosystem is huge. There are multiple distributions/branches, many third-party tools, expert consulting companies, knowledgeable people to hire. If you're scaling a high-volume OLTP site/app/product MySQL is a very reasonable choice.

Postgres is a great database, and is a better choice than MySQL for many applications. But some people like to think it's strictly better than MySQL in absolutely every way and every conceivable situation -- this is demonstrably not the case.

4 comments

MySQL's replication is not mature. There are too many edge cases / pitfalls where it doesn't replicate to slaves. It requires you to monitor for these situations and build systems to work around it. I don't know why anyone thinks this is OK.

Postgres is late to the replication game but they have worked hard to make sure that replication doesn't have surprises.

The character set is another nightmare that surprises novice MySQL admins... and then the fact that it allows schema operations to destroy data by default...

Can you cite examples in recent versions? In my experience, the defaults in recent versions (mixed or row-based replication, strict sql mode) are quite sane and prevent most historic problems.

I don't have enough pg experience to make a valid comparison, but I have worked extensively on the largest mysql environment in the world (among many others), and can say that properly-configured mysql replication does not encounter the problems you are describing.

Yes, there are a lot of pitfalls for novice admins to avoid. That's true of all databases to some degree. MySQL may well have more pitfalls than others, but it also has a much wider net of experts who can help you avoid these pitfalls.

Anyway, by "mature" I was also referring to the sheer number of solutions offered, most of which are now battle-tested by many large users. Out-of-the-box in the latest version you get 2 types of logical replication (and a third type mixing the two), 3 options for level of synchronicity (async, semi-sync, and now sync via the new group replication), bidirectional replication, and multi-source replication. And there are additional third-party solutions for other synchronous replication setups (e.g. Galera), and even physical replication has been implemented internally by both Amazon and Alibaba.

Looks like floating point values still aren't handled:

https://dev.mysql.com/doc/refman/5.7/en/replication-features...

This is scary

https://dev.mysql.com/doc/refman/5.7/en/replication-features...

These look like something rare, but scary to those who don't know about them

https://dev.mysql.com/doc/refman/5.7/en/replication-features...

I used to hit issues with replication and IF EXISTS and it was a nightmare: http://bugs.mysql.com/bug.php?id=77684 ... glad to see they (allegedly) fixed it

edit: I don't have anywhere near the experience you do with MySQL, but the odds of the average web developer / sysadmin configuring it properly is unlikely. The advantage of Postgres is that it protects your data first and there are much fewer foot-shooting options.

First link: Floating point values replicate fine. The manual says you may have problems if your master and replica are running different computer architectures, or were built using different compilers (and you're compiling MySQL yourself from source). Both situations are exceedingly rare.

Second link: Despite the admittedly scary wording, these things only impact two scenarios: either you're using MyISAM, and/or you're using MySQL 5.7's new LOGICAL_CLOCK multi-threaded replication but with a vital safety option disabled. Multi-threaded replication is not enabled by default anyway, and only users with extremely high write rates will ever need it.

Third link: these are all edge-cases relating to statement-based replication, which is no longer the default binlog format as of MySQL 5.7. Even with statement-based replication, these are rare; in 13 years of working with MySQL I've personally never encountered any of these specific issues.

Overall: I sort of get your point, yes there's a lot of knobs. But none of the things you've mentioned affect 99.9% of people running MySQL in the first place. MySQL replication works quite well out of the box, but provides a lot of options for advanced users to handle special cases. If you play with those options without knowing what you're doing, yes, you'll shoot yourself in the foot. That's true of most complex computing infrastructure.

The defaults in MySQL used to suck. They don't anymore, and haven't for a while.

I've been a Postgres fan since the mid '90s, but I'm gonna have to agree here. There have been tons of Postgres replication systems out there, for a very long time, but in my investigations I never could find one that I liked more than using DRBD and heartbeat. So that's what I did, and it worked well enough.

The MySQL replication felt much more integrated and there was one obvious choice and it worked fine for the uses I've had.

Did you have to monitor it and sometimes re-replicate it? Yes. Would I have put a replication system in place without monitoring it? No. And these weren't toy systems, one of the environments I used to manage you've probably used if you are in the US (several clients advertised on various superbowls, for example of scale).

I always wanted Postgres replication that was easy and well integrated like MySQL.

Today I think the replication story in Postgres is much better, but I haven't tried it recently.

So is MySQL's though. I LOVE MySQL Galera so far, though I've only used it for low volume stuff. Multi-master, just works, has been super reliable for over a year, but for a very low volume application. The only hitch has been when all nodes go down, you have to manually find the most up to date and start it in bootstrap mode. I wish they could just come up and realize all the nodes are in the cluster and negotiate who is newest automatically. But, it is a small workflow to manually handle it.

In any case, either of these replication systems are better than MS-SQL. Not for technical reasons that I know of, it is simply cost concerns.

I think we have different definitions of maturity.

Mature to me is cleanly implemented and robust in execution.

Mature to you seems to be "released the longest", sometimes these are connected but I don't think this is true for this case.

Mature to me means having a wide feature set, and a history of successfully working at varying scale and workloads.

A lot of people like to bash MySQL, but without citing specific examples, or by referring to problems that haven't been true in a decade or more.

I've listed a large number of huge corporations that have had great success with MySQL, and have decided to stay on MySQL despite having the resources to migrate away if they saw a compelling reason. Some of the companies on that list are relatively new, yet still chose to start with MySQL, or migrate to MySQL after scaling.

If MySQL is as flawed as people in this thread insist, why are so many companies continuing to use it and even migrating to it? In a few cases I can provide a first-hand answer, and maturity and ecosystem are two of the biggest factors.

I believe this is quite easily explained and at the risk of repeating other information in the thread.

1) Why are large companies still using mysql: "We can deal with the pain", Facebook turning mysql into a dumb k/v store is not equivalent to using it's feature set. Nor is it a resounding recommendation.

2) Why are new companies using mysql: Everyone knows mysql, people don't know the problems they will have until it's either too late or MySQL performs decent enough for now.. Many people haven't even heard of PostgreSQL or think of it as foreign and strange and don't wish to reeducate.

People bash on MySQL because like PHP it's architecturally bad and things are being solved at a glacial pace, it's defaults are still absurd, it's handling of character types is absolutely insane unless you know precisely what you're doing and it's absolutely full of inconsistent and often counter-intuitive behaviour.

examples: error in constraint commits data anyway[0], just clobbering your database. UTF8 isn't actually UTF8 it's only three bytes wide, so you need utf8mb4[1].

[0] https://www.youtube.com/watch?v=emgJtr9tIME [1] http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-sets....

Facebook doesn't use MySQL as a dumb k/v store. This is simply incorrect.

Although for primary product data (status updates, friends, likes, etc) the access pattern is restricted via a data access layer, it's supporting a distributed graph model (nodes and edges, distributed across many shards). Far more features than a "dumb k/v store".

Separately, Facebook uses MySQL for countless other critical OLTP use-cases, and (for better or worse) even a few OLAP use-cases. It's the primary store of Facebook, across the entire company. It's the storage layer for ad serving, payments, async task persistence, internal tooling, many many other things. Most of these use-cases make full use of SQL and the relational model.

I was the lead dev on FB's internal DBaaS, which alone supports several thousand different workloads. Despite making up only a few percent of fb's mysql fleet, it alone is still larger than the vast majority of Postgres deployments on earth.

Regarding people not having heard of Postgres: these are big companies with smart, well-informed engineers. And I assure you that Uber and Lyft haven't moved from pg to mysql because they hadn't heard of pg :)

One of your examples is a 9-minute video, from a non-MySQL expert, about mysql 5.5 which came out 6 years ago.

Your other example is a perfectly valid "gotcha", but I don't know of any system that is completely free of gotchas. IMO, needing to know to type utf8mb4 instead of utf8, isn't a good reason to avoid a database altogether.

To reiterate, I am not arguing MySQL is perfect (far from it), or even "better" than pg. I was merely answering the GP's question of why MySQL became more widely-used. I don't wish to engage in yet another mysql-vs-pg endless debate, since my personal opinion is both systems are good choices in different situations.

> Regarding people not having heard of Postgres: these are big companies with smart, well-informed engineers. And I assure you that Uber and Lyft haven't moved from pg to mysql because they hadn't heard of pg :)

Neither was it the other way round, when uber switch from mysql to pg. Both the directional changes appear to have much more to do with changes in engineering leadership, than anything else.

Uber moved from pg to mysql, not the other way around, and they very clearly explained the technical reason they did it https://eng.uber.com/mysql-migration/
One specific thing I like to bash about MySQL is that it doesn't support CTEs (aka: with-statements) which are part of the 1999 SQL standard (nearly 20 years old now). I miss them frequently when working with MySQL.
Postgres has had CTE for a while, but not that long. MySQL 8.0 (the next version) plans to them http://mysqlserverteam.com/mysql-8-0-labs-recursive-common-t...
Big companies like what you listed is not good examples. Since we don't know if they use bare bone MySQL or not. They absolutely have the expertise to modify MySQL and use their own branch. If big names use XXX product, it doesn't mean that product is awesome since that have the money and expertise to modify it.
> we don't know if they use bare bone MySQL or not

No, we know this exactly, because companies freely discuss it. (I've worked for 2 of the companies I listed, and over the years have attended conference talks by pretty much all of the other ones I mentioned.)

The vast majority of these companies use an unmodified copy of either stock (oracle) MySQL, Percona Server, or MariaDB. Of these 3 major options, Percona Server is the most common among large-scale companies.

The companies that modify MySQL (Facebook, Twitter, LinkedIn, Google, Alibaba) readily open-source their changes. They're not really forks, it's more like a patch-set that they port to new versions. They make nice improvements that are often incorporated into Percona Server and some eventually are reimplemented by Oracle to be put back upstream.

The one big exception is Amazon, which doesn't open-source its changes. But aside from them, it's an open source ecosystem. Having companies with "expertise to modify MySQL and use their own branch" is a good thing and this is exactly why it's a good example of MySQL's popularity!