| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by frankietaylr 1939 days ago
	Has Postgres architecture changed since Postgres 9.2 in terms of the inefficiencies mentioned in the article?

3 comments

glogla 1939 days ago

Sure, but it does not really matter.

Uber has not switched from Postgres used as RDBMS to MySQL used as RDBMS, they switched from Postgres used as RDBMS to MySQL used as key-value storage layer of homegrown sharded non-relational database.

This has pretty much no bearing on anyone using Postgres or MySQL in reasonable way.

link

throwdbaaway 1939 days ago

Exactly. I think the prior HN discussions failed to call out how painful it is to do any sort of schema migration against a big InnoDB table [1][2].

Well known MySQL uses such as Facebook TAO and this Uber Schemaless are typically abstractions built on top of MySQL, which means the schemas are pretty much static, and they don't feel the schema migration pain.

For a typical RoR startup that relies on a RDBMS, please, stay away from MySQL.

[1] Yes, I know about the INSTANT ADD COLUMN patch from Tencent Games that landed in MySQL 8.0, and which has had major bug fixes in at least 8.0.14 and 8.0.20.

[2] A side effect is that MySQL now has a thriving ecosystem of schema migration tools (pt-osc, lhm, gh-ost), while Postgres has none, and there are situations where there is indeed no choice but to rewrite the table, e.g. changing a column type from int to bigint.

link

evanelias 1939 days ago

To echo and add to rwultsch's sibling comment:

* Facebook had extremely frequent schema changes, and powerful declarative schema management automation to support this

* The TAO (or more correctly "UDB") use-case supported using many separate tables, not one giant generic key/value table as people seem to assume

* The non-UDB MySQL use-cases at Facebook, in combination, are still larger than the vast, vast majority of all other companies' databases. These non-UDB databases use a wide range of MySQL's functionality. The frequent claims that "Facebook used MySQL just as a dumb K/V store" are absolutely incorrect and have never been correct.

link

throwdbaaway 1938 days ago

Thanks for the insight! So how frequent does schema change happen to UDB?

link

evanelias 1938 days ago

Sorry, I don't recall the exact frequency, I left FB in 2015.

link

throwdbaaway 1938 days ago

Then I guess my point about UDB doesn't require schema change will still stand then? Looking at https://github.com/facebookarchive/linkbench, the table structure looks pretty much "final". And judging by https://dom.as/2015/07/30/on-order-by-optimization/, the access pattern is pretty much fixed as well.

link

rwultsch 1939 days ago

FB had plenty of schema changes. I know, I wrote software to push them out. The important concepts for pushing schema at scale became part of Skeema.

link

alberth 1939 days ago

What about Postgres native key-store called HStore?

https://www.postgresql.org/docs/8.3/hstore.html

I love Postgres just as much as anyone but Uber use case still seemed to be a better fit for MySQL. I was hopeful this would kickstart a renewed focus on features / architecture within thr Postgres community and I’m not certain anything resulted from this. Hope to be wrong obviously.

link

pizza234 1939 days ago

The main point, clustered vs. nonclustered indexing, is architectural, and not inherently inefficient; it depends on the use case.

"Highly advanced" databases give both options, but AFAIK, MySQL/PGSQL will likely not offer this, at least for a very long time, since it requires radical changes.

link

evanelias 1939 days ago

On the one hand, MySQL has offered this for two decades, by virtue of pluggable storage engines being core to its design. Some storage engines use clustered indexes and some do not. The user can decide which one matches their use-case; very large companies can design their own custom special-purpose storage engines; etc.

On the other hand, mixing storage engines in a single db instance has operational downsides (especially re: crash-safe replication). And InnoDB is by far the dominant storage engine, and is probably unlikely to offer nonclustered indexing, so from that perspective I agree with your point.

link

EdwardDiego 1938 days ago

I still remember the "good" ol days when the default MySQL engine was MyISAM. Even after InnoDB became the default, a lot of people were still configuring it for MyISAM for their 30 user webapp because they'd heard it was faster, and besides, you can ensure data integrity in code, right?

I made a bit of money freelancing on "my database for my LAMP stack app is corrupt!" issues by a) demonstrating that InnoDB wouldn't slow down their webapp in any measurable form and then b) trying to save and normalise as much data as possible.

link

Tostino 1939 days ago

It'll be interesting to see how things shake out when some of the other implementations using postgres's pluggable storage API start maturing. I wonder if it'll have some of the same operational downsides that mixing storage in MySQL has.

link

evanelias 1938 days ago

Good question. I assume it depends on how Postgres handles multi-engine transactions, and how it stores replication state metadata. A good discussion of the issue in MySQL/MariaDB is here: https://kristiannielsen.livejournal.com/19223.html

Apparently MariaDB 10.3+ has this solution implemented, which is cool, never knew that before. I don't think there's anything equivalent in MySQL.

link

user5994461 1939 days ago

The replication was redone around that time (not sure which version exactly). It's still working on the same principles though, sending queries and redoing them on each replica.

Before in short, the WAL was sent every minute and always 10MB even if there were no changes. Now it's more adaptive, actually doing nothing when they are no changes, and picking up quicker when changes begin.

I am surprised they don't mention this point because the replication was really unusable in PostgreSQL.

There are still spikes (write amplification) and other drawback from this design, but at least it doesn't shit itself under no activity.

link

Quekid5 1939 days ago

I don't understand this. Why would it be sending queries and redoing on a replica and sending the WAL? Just sending the WAL would seem to be sufficient, or alternatively: sending queries would be redundant if you just send the WAL and apply directly at the secondaries.

link

amenonsen 1939 days ago

It doesn't make sense to you because everything in the comment you're replying to is wrong.

Neither log shipping (copying WAL files one by one) nor streaming replication (sending a stream of WAL) works by sending queries. WAL segments are 16MB by default, and the default archive_timeout is 0, not 1 minute (and the archive timeout is not applicable to streaming replication anyway). There is also nothing "adaptive" about the replication—when there is no traffic, there will be ~no changes, and when there are changes, they will be sent to the replica.

I don't understand what the comment is suggesting used to happen in periods of no activity that made replication unusable, but it is also probably incorrect, and has nothing to do with the write amplification problem.

link

Quekid5 1939 days ago

Thank you for confirming my suspicions :).

link

user5994461 1938 days ago

I was calling the WAL the "queries" to simply, never mind that, it doesn't matter whether it contains the queries or not.

What's important is that the WAL was generated on a periodic basis and of a constant size. Say 16MB every minute. It's pretty much a plain file, that could be stored on S3/FTP.

This had a lot of drawbacks:

- Replicas were measurably late behind the current state, simply because of the built-in delay in "replication".

- It was incredibly inefficient on bandwidth and storage. Consider the time it takes to transfer large files (especially for off-site replicas) and storage costs. That further contributed to poor performance and delay.

- There could be many WAL files generated at once when there were changes happening. They would take FOREVER to be processed. It was commonplace for replicas to fall 5-10 minutes under what I consider to be minimum activity.

Long story short, the replication was reworked in a later version of PostgreSQL (3 or 4 years ago), the part about fixed size and fixed delay is not true anymore.

link

chousuke 1939 days ago

In streaming (physical) replication, PostgreSQL sends the WAL only, and it's applied on the replica; no "sending of queries" is involved, or even possible; with physical replication, the secondary has to keep itself identical with the primary, otherwise replication will fail. This is why you can't use physical replication across major versions, since they can't be bit-for-bit identical.

In more recent versions there's "logical replication", which sort of "sends the queries", in that the secondary node has its own database state that does not have to be exactly identical with the primary, allowing for replication across major versions.

In my opinion though, unless you really need logical replication for some reason, stick with streaming replication. It's much easier to understand and there are fewer failure modes.

link

amenonsen 1939 days ago

> In more recent versions there's "logical replication", which sort of "sends the queries"

What it sends is not the queries, but a logical description of the changes to each row that were made by running the query. So an UPDATE that changes N rows would generate N changes to be applied to the corresponding rows (usually identified by primary key) on the logical replica, not a single update that had to be "re-executed".

link