by Thews 488 days ago
While data can be used in a relational way, it doesn't mean that's the best for performance or storage. Important systems usually require compliance (auditing) and need things like soft deletion and versioning. Relational databases come to a crawl with that need.

Sure, you can implement things to make it better, but those are added layers that balloon the complexity. Most robust systems end up requiring more than one type of database. It is nice to work on projects with a limited scope where an RDBMS is good enough.


> and need things like soft deletion and versioning. Relational databases come to a crawl with that need.

Lol. No relational database slows to a crawl on `is_deleted=true` or versioning.
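A minimal sketch of what soft deletion plus versioning typically looks like in a relational schema, assuming a hypothetical `docs` table with an `is_deleted` flag and a hypothetical `doc_versions` history table. SQLite is used only because it ships with Python; the partial index keeps soft-deleted rows out of the index that live lookups use.

```python
import sqlite3

# Hypothetical schema: live rows in `docs`, prior bodies in `doc_versions`.
SCHEMA = """
CREATE TABLE docs (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    body       TEXT NOT NULL,
    is_deleted INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE doc_versions (
    doc_id  INTEGER NOT NULL REFERENCES docs(id),
    version INTEGER NOT NULL,
    body    TEXT NOT NULL,
    PRIMARY KEY (doc_id, version)
);
-- Partial index: only live rows are indexed, so soft-deleted rows
-- do not grow the index that normal lookups use.
CREATE INDEX docs_live_title ON docs(title) WHERE is_deleted = 0;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def create_doc(conn: sqlite3.Connection, title: str, body: str) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO docs (title, body) VALUES (?, ?)", (title, body)
        )
    return cur.lastrowid


def update_doc(conn: sqlite3.Connection, doc_id: int, new_body: str) -> None:
    """Versioning: archive the current body as the next version, then overwrite it."""
    with conn:
        (old_body,) = conn.execute(
            "SELECT body FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        (next_version,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) + 1 FROM doc_versions WHERE doc_id = ?",
            (doc_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO doc_versions (doc_id, version, body) VALUES (?, ?, ?)",
            (doc_id, next_version, old_body),
        )
        conn.execute("UPDATE docs SET body = ? WHERE id = ?", (new_body, doc_id))


def soft_delete(conn: sqlite3.Connection, doc_id: int) -> None:
    """Soft deletion: flag the row instead of removing it."""
    with conn:
        conn.execute("UPDATE docs SET is_deleted = 1 WHERE id = ?", (doc_id,))


def live_titles(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute("SELECT title FROM docs WHERE is_deleted = 0 ORDER BY id")
    return [title for (title,) in rows]


def history(conn: sqlite3.Connection, doc_id: int) -> list[str]:
    rows = conn.execute(
        "SELECT body FROM doc_versions WHERE doc_id = ? ORDER BY version", (doc_id,)
    )
    return [body for (body,) in rows]
```

Reads of live data just filter on `is_deleted = 0`, and history is an ordered query on `doc_versions`; neither needs anything beyond ordinary indexes.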

In general, so far not a single claim made by NoSQL databases has been shown to be true, except for KV databases, which have their own uses.

They slow to a crawl when you have huge tables with lots of versioned data and massive indexes on which maintenance can't be performed in a reasonable amount of time, even on the fastest vertically scaled hardware. You run into issues partitioning the data and spreading it across processors, and spreading it across servers takes solutions that require engineering teams.

There are a large number of solutions for different kinds of data for a reason.

I have built "huge tables with lots of versioned data and massive indexes". This is false. I had no issues partitioning the data and spreading it across shards. On Postgres.

> ... takes solutions that require engineering teams.

All it took was an understanding of the data. And just one guy (me), not an "engineering team". Mongo knows only one way of sharding data. That one way may work for some use-cases, but for the vast majority of use-cases it's a Bad Idea. Postgres lets me do things in many different ways, and that's without extensions.
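PostgreSQL's declarative partitioning supports range, list, and hash partitioning. As a rough illustration of the two most common key choices, here is a minimal sketch of application-side shard routing; `hash_shard`, `range_shard`, and the yearly boundaries are hypothetical, and this is not how Postgres routes rows internally.

```python
import bisect
import zlib
from datetime import date


def hash_shard(key: str, n_shards: int) -> int:
    """Hash routing: spreads keys evenly; the same key always lands on the same shard."""
    # zlib.crc32 is stable across processes, unlike the built-in hash() for str,
    # which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % n_shards


def range_shard(day: date, boundaries: list[date]) -> int:
    """Range routing: shard i holds rows with boundaries[i-1] <= day < boundaries[i].

    `boundaries` must be sorted; rows before boundaries[0] go to shard 0 and
    rows on or after boundaries[-1] go to the last shard, len(boundaries).
    """
    return bisect.bisect_right(boundaries, day)


# Hypothetical example: yearly partitions for versioned rows, keyed by revision date.
YEARLY = [date(2023, 1, 1), date(2024, 1, 1), date(2025, 1, 1)]
```

Range keys on time suit versioned data, since whole old partitions can be archived or detached at once; hash keys spread write load evenly. Which one fits depends on the data and how it is accessed.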

If you don't understand your data, and you buy into the marketing bullshit of a proprietary "solution", and you're too gullible to see through their lies, well, you're doomed to fail.

This fear-mongering that you're trying to pull in favour of the pretending-to-be-a-DB that is Mongo is not going to work anymore. It's not the early 2010s.

Where did I ever say anything about Mongo?

I have worked with tables on this scale. It definitely is not a walk in the park with traditional setups. https://www.timescale.com/blog/scaling-postgresql-to-petabyt...

Now, data chunked into objects and distributed so it can be accessed by lots of servers: that's no sweat.

I'd love to see how you handle database maintenance when your active data is over 100TB.

I'd love to see a NoSQL database handle this more easily than an RDBMS.

You mean like Scylla?

> They slow to a crawl when you have huge tables

Define "huge". Define "massive".

For a modern RDBMS, that starts at volumes that can't really fit on one machine (for some definition of "one machine"). I doubt Mongo would be very happy at that scale either.

On top of that, an analysis of the query plan usually shows trivially fixable bottlenecks.

On top of that, it also depends on how you store your versioned data (Wikipedia stores gzipped diffs, and runs on PHP and MariaDB).
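A rough sketch of the "compressed diffs" idea, not Wikipedia's actual storage format: keep the first revision in full, store each later revision as a gzip-compressed line delta against its predecessor, and rebuild any revision by replaying deltas. `make_delta`, `apply_delta`, and `RevisionStore` are hypothetical names.

```python
import difflib
import gzip
import json


def make_delta(old: list[str], new: list[str]) -> bytes:
    """Encode `new` as copy/insert operations against `old`, gzip-compressed."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(["copy", i1, i2])        # reuse old[i1:i2]
        elif j2 > j1:                           # "replace" or "insert"
            ops.append(["insert", new[j1:j2]])  # literal new lines
        # "delete" needs no op: those old lines are simply never copied
    return gzip.compress(json.dumps(ops).encode("utf-8"))


def apply_delta(old: list[str], delta: bytes) -> list[str]:
    """Rebuild the newer revision from `old` plus a delta from make_delta."""
    out: list[str] = []
    for op in json.loads(gzip.decompress(delta)):
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class RevisionStore:
    """Keeps revision 0 in full and every later revision as a compressed delta."""

    def __init__(self, first_text: str) -> None:
        self.base = gzip.compress(first_text.encode("utf-8"))
        self.deltas: list[bytes] = []
        self._latest = first_text.splitlines(keepends=True)

    def add(self, text: str) -> int:
        """Store a new revision and return its revision number."""
        lines = text.splitlines(keepends=True)
        self.deltas.append(make_delta(self._latest, lines))
        self._latest = lines
        return len(self.deltas)

    def get(self, rev: int) -> str:
        """Rebuild revision `rev` by replaying deltas from the base."""
        lines = gzip.decompress(self.base).decode("utf-8").splitlines(keepends=True)
        for delta in self.deltas[:rev]:
            lines = apply_delta(lines, delta)
        return "".join(lines)
```

Reading revision n costs n delta applications, so a real store would likely also write periodic full snapshots; that refinement is left out here.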

Again, none of the claims you presented have any solid evidence in the real world.

Wikipedia is tiny data. You don't start to really see cost scaling issues until you have active data a few hundred times larger and your data changes enough that autovacuuming can't keep up.

I'm getting paid to move a database that size this morning.

English language Wikipedia revision history dump: April 2019: 18 880 938 139 465 bytes (19 TB) uncompressed. 937GB bz2 compressed. 157GB 7z compressed.
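For scale, the compression ratios implied by those dump figures, assuming decimal units (1 GB = 10^9 bytes):

```python
# Figures from the April 2019 English Wikipedia revision-history dump quoted above.
# Assumes decimal units: 1 GB = 10**9 bytes.
uncompressed_bytes = 18_880_938_139_465
bz2_bytes = 937 * 10**9
sevenz_bytes = 157 * 10**9

bz2_ratio = uncompressed_bytes / bz2_bytes        # about 20x
sevenz_ratio = uncompressed_bytes / sevenz_bytes  # about 120x

print(f"bz2: {bz2_ratio:.1f}x, 7z: {sevenz_ratio:.1f}x")
```

Revision history is highly redundant from one revision to the next, which is why long-window compressors shrink it so dramatically.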

I assume since then it's grown at least ten-fold. It's already an amount of data that would cripple most NoSQL solutions on the market.

I honestly feel like I'm talking to functional programming zealots. There's this fictional product that is oh so much better than whatever tool you're talking about. No one has seen it, and no one has proven it exists or works better than the current, perfectly adequate and performant tool. But trust us, for some ridiculous, vaguely specified constraints it definitely works amazingly well.

This time "RDBMS is bad at soft deletions and versions because 19TBs of revisions on one of the world's most popular websites is tiny"

[1] https://meta.wikimedia.org/wiki/Data_dumps/Dumps_sizes_and_g...

Wikipedia's active English data is only 24 GB compressed. https://dumps.wikimedia.org/enwiki/20250201/

They store revisions in compressed storage, mostly read-only, for archival. https://wikitech.wikimedia.org/wiki/MariaDB#External_storage

They have the layout and backup plans of their servers available.

They've got an efficient layout, they use caching, and the workload is by nature very read-intensive.

https://wikitech.wikimedia.org/wiki/MariaDB#/media/File:Wiki...

Archival read-only servers don't have to worry about any of the maintenance mentioned. Use ChatGPT or something to play devil's advocate for you, because what you're saying is magical and nonexistent is quite common.