by jacques_chester 2881 days ago
> monolithic large relational databases are hard to scale

DB2 on z/OS was able to handle billions of queries per day.

In 1999.

Some greybeards took great delight in telling me this sometime around 2010 when I was visiting a development lab.

> When you have one large database with tons of interdependencies, it makes migrating data, and making schema changes much harder.

Another way to say this is that when you have a tool ferociously and consistently protecting the integrity of all your data against a very wide range of mistakes, you have to sometimes do boring things like fix your mistakes before proceeding.
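To make the "protecting the integrity of your data" point concrete, here is a minimal sketch using SQLite through Python's standard library. The `customers` and `orders` tables are made up for illustration, and SQLite is only a stand-in for whatever RDBMS you actually run.

```python
import sqlite3


def try_orphan_insert():
    """Try to insert an order for a customer that does not exist.

    Returns the database's error message, or None if the insert was accepted.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless it is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id))"
    )
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'alice')")
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")  # valid row
    try:
        # Hypothetical mistake: customer 999 was never created.
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (11, 999)")
    except sqlite3.IntegrityError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(try_orphan_insert())
```

The database refuses the orphaned row at write time. That is the "boring thing": you fix the mistake now, instead of discovering it in a report months later.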

> In theory better application design would have separate upstream data services fetch the resources they are responsible for.

A join in the application is still a join. Except it is slower, harder to write, more likely to be wrong and mathematically guaranteed to run into transaction anomalies.
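A rough sketch of what I mean, again with SQLite via Python's standard library. The `users` and `posts` tables and the function names are invented for the example; in a real system the two reads in the application version would typically go to separate services.

```python
import sqlite3


def setup():
    """Build a tiny in-memory database with two related tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            title TEXT NOT NULL
        );
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
        INSERT INTO posts VALUES (1, 1, 'first'), (2, 2, 'second'), (3, 1, 'third');
        """
    )
    return conn


def join_in_database(conn):
    # One statement: the database evaluates both tables in a single query.
    return conn.execute(
        "SELECT u.name, p.title"
        " FROM posts p JOIN users u ON u.id = p.user_id"
        " ORDER BY p.id"
    ).fetchall()


def join_in_application(conn):
    # Two separate reads, stitched together by hand. Nothing ties the two
    # reads to the same point in time unless you add that yourself.
    users = dict(conn.execute("SELECT id, name FROM users").fetchall())
    posts = conn.execute("SELECT user_id, title FROM posts ORDER BY id").fetchall()
    # Posts whose user was not found are silently dropped here; an inner
    # join behaves the same way, but now that decision lives in app code.
    return [(users[user_id], title) for user_id, title in posts if user_id in users]


if __name__ == "__main__":
    conn = setup()
    print(join_in_database(conn))
    print(join_in_application(conn))
```

On a quiet single-process database both give the same rows. The difference shows up when the two reads hit different services and a write lands between them: the hand-rolled version can pair rows that never coexisted.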

I think non-relational datastores have their place. Really. There are certain kinds of traffic patterns in which it makes sense to accept the tradeoffs.

But they are few. We ought to demand substantial, demonstrable business value, far outweighing the risks, before being prepared to surrender the kinds of guarantees that an RDBMS is able to provide.

1 comments

Not everything requires pessimistic transactional guarantees or atomicity. The problem domain you are solving for will influence the importance of those guarantees. If I'm solving for something where data consistency is not the utmost priority (tons of applications meet this criterion, including the one you are using now, HN), I don't have to worry about this.

But when you have transactional guarantees you also lose partition/failure tolerance. So it ends up being a choice of consistency over availability.

> Not everything requires pessimistic transactional guarantees or atomicity.

It is easier to give them up after the fact than to regain them after the fact.

> If I'm solving for something where data consistency is not the utmost priority (tons of applications meet this criterion, including the one you are using now, HN), I don't have to worry about this.

Sure. But wait for the pain. Prove the business need to relax the guarantees and the business acceptance of the risks.

> So it ends up being a choice of consistency over availability.

Total partitions are relatively rare and so disruptive that even if the magical datastore keeps chugging, everything else is mostly boned, so it doesn't matter. Meanwhile people tend to discover that actually, consistency mattered all along, but it's impossible to fix in retrospect.

Then there's the whole thing of bold claims being made in theory and not delivered in reality. RDBMSes, with the exception of MySQL, which is close to being singlehandedly responsible for the emergence of NoSQL in the first place, tend to actually deliver on what they promise. The record for the alternatives is mixed; the fine print varies wildly and tends to leave out important details like "etcd split brains if you sneeze too loudly" or "MongoDB is super fast, unless you want your data back".