by troupo 487 days ago
> They slow to a crawl when you have huge tables

Define "huge". Define "massive".

For a modern RDBMS, "huge" starts at volumes that can't really fit on one machine (for some definition of "one machine"). I doubt Mongo would be very happy at that scale either.

On top of that, an analysis of the query plan usually shows trivially fixable bottlenecks.
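To make that concrete, here is a minimal sketch of a query plan check, using SQLite from Python's standard library as a stand-in for a production RDBMS. The table, column, and index names are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the human-readable plan steps for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # Each row is (id, parent, notused, detail); the detail column is the text.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revisions (id INTEGER PRIMARY KEY, page_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO revisions (page_id, body) VALUES (?, ?)",
    [(i % 1000, "text") for i in range(10_000)],
)

query = "SELECT id FROM revisions WHERE page_id = 42"

# Without an index on page_id the planner has to scan the whole table.
before = plan_for(conn, query)

# The "trivial fix": add the missing index and look at the plan again.
conn.execute("CREATE INDEX idx_revisions_page ON revisions (page_id)")
after = plan_for(conn, query)

print("before:", before)
print("after: ", after)
```

The first plan should report a full scan of the table and the second a search that uses the new index. Postgres and MariaDB expose the same kind of information through their own EXPLAIN statements.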

On top of that, it also depends on how you store your versioned data (Wikipedia stores gzipped diffs, and runs on PHP and MariaDB).
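For illustration only, here is a rough sketch of the general idea of storing a revision as a compressed delta against the previous one. This is not MediaWiki's actual storage code; the function names and the JSON delta format are assumptions invented for the example.

```python
import gzip
import json
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` (a list of lines) as copy ranges from `old` plus inserted
    lines, then gzip the result. Hypothetical format, for illustration only."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines are stored as a range reference, not as text.
            ops.append(["copy", i1, i2])
        elif tag in ("replace", "insert"):
            ops.append(["insert", new[j1:j2]])
        # "delete" emits nothing: the old lines are simply never copied.
    return gzip.compress(json.dumps(ops).encode("utf-8"))


def apply_delta(old, blob):
    """Rebuild the new revision from the old revision and a gzipped delta."""
    new = []
    for op in json.loads(gzip.decompress(blob).decode("utf-8")):
        if op[0] == "copy":
            new.extend(old[op[1]:op[2]])
        else:
            new.extend(op[1])
    return new


rev1 = ["Intro paragraph.", "Old second paragraph.", "Closing paragraph."]
rev2 = ["Intro paragraph.", "New second paragraph.", "Closing paragraph.", "Added footnote."]

blob = make_delta(rev1, rev2)
restored = apply_delta(rev1, blob)
print(restored == rev2)
```

With this layout the database only holds the full text of a base revision plus a small blob per edit, which is one reason the raw "size of all revisions" figure can be misleading about what the RDBMS actually has to manage.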

Again, none of the claims you presented have any solid evidence in the real world.


Wikipedia is tiny data. You don't really start to see cost scaling issues until your active data is a few hundred times larger and changes often enough that autovacuuming can't keep up.

I'm getting paid to move a database that size this morning.

English-language Wikipedia revision history dump, April 2019: 18 880 938 139 465 bytes (19 TB) uncompressed, 937 GB bz2-compressed, 157 GB 7z-compressed. [1]

I assume since then it's grown at least ten-fold. It's already an amount of data that would cripple most NoSQL solutions on the market.

I honestly feel like I'm talking to functional programming zealots. There's this fictional product that is oh so much better than whatever tool you're talking about. No one has seen it, and no one has proven that it exists or that it works better than the current, perfectly adequate and performant tool. But trust us, under some ridiculous, vaguely specified constraints it definitely works amazingly well.

This time it's "RDBMS is bad at soft deletions and versions because 19 TB of revisions on one of the world's most popular websites is tiny".

[1] https://meta.wikimedia.org/wiki/Data_dumps/Dumps_sizes_and_g...

Wikipedia's active English data is only 24 GB compressed. https://dumps.wikimedia.org/enwiki/20250201/

They store revisions in compressed, mostly read-only storage for archival. https://wikitech.wikimedia.org/wiki/MariaDB#External_storage

They have the layout and backup plans of their servers available.

They've got an efficient layout, they use caching, and the workload is by nature very read-intensive.

https://wikitech.wikimedia.org/wiki/MariaDB#/media/File:Wiki...

Archival read-only servers don't have to worry about any of the maintenance mentioned. Use ChatGPT or something to play your devil's advocate, because what you're calling magical and non-existent is quite common.