Hacker News new | ask | show | jobs
by 015a 1761 days ago
"Sharding" "Backups" and "Failovers" are NOT "practical" aspects of any database. They're theoretical. Most databases are not big enough to need sharding. Most backups go unused. Most failover happens automatically, totally managed by your hosting provider.

You know what is practical? Schema design. Query language. That's what made MongoDB super popular; no schemas to worry about. A query is just '{ firstName: "John" }'.

I cannot emphasize this enough: I cannot summon even a milliliter of desire to care about whether Mongo's way of doing these things is actually "better" or "worse". But it is what made it popular.

4 comments

Backups are most definitely a necessary part of any persistent datastore that is accumulating value for your company. Sharding is something that becomes necessary at scale - having it in your toolbox is a really good idea - but it doesn't come up for most folks. Failovers are the least valuable of what you've mentioned IMO since downtime management is something that even very mature companies have a lot of flexibility with.

SQL is a pretty easy language to learn - you can get pretty much everything working right with some SELECT, FROM, WHERE, GROUP BY and subqueries alone - the more arcane dialect components of SQL (like HAVING - gosh I hate how easy to misunderstand HAVING is) are things you can grow into. But basic SQL - it works and has worked fine for decades - I strongly dislike tooling and languages that go on crusades to make a better SQL.

If the database itself can’t handle failover properly, there’s nothing your hosting provider can do except rewrite it (which is exactly what Amazon ended up doing with MongoDB). Also if you’ve spent any significant time being responsible for operating a database and came away with “backups aren’t practical” you’re insanely lucky (or I and everyone I know is insanely unlucky).
I work for a company that supports ClickHouse. Our focus is analytic systems, which tend to run large compared to OLTP systems.

* Sharding is part of schema design for any analytic app whose data exceeds the capacity of a single host. This is very common for use cases like web analytics, observability, network management, intrusion detection, to name just a few. Automatic resharding is one of the top asks in the ClickHouse community. (We're working on it.)

* How do I backup ClickHouse is one of our top 3 support questions in order of frequency. I just taught a ClickHouse class yesterday--part of a regular series--and it was the first question out of the gate from the audience. It has come up at some point in almost every customer engagement I can think of.

In my experience, your comment is only correct for relatively small applications that are not critical to the business.

> failover happens automatically, totally managed by your hosting provider.

I see you've never used Mongo Atlas.