by pkolaczk 4569 days ago
He forgot one of the most important reasons to use (some) NoSQL databases: high availability. Relational database systems are very poor at providing it. Most often the availability options are limited to tolerating node failures. RDBMSes have several SPOFs and must rely on failover, which is not dependable, is hard to test, and in many cases needs manual intervention. Forget about tolerating network partitions.
3 comments

The CAP theorem tells us that you can't have availability without sacrificing consistency or partition tolerance, which means that no NoSQL database can do that either.

It is not true that relational databases must have a single point of failure (SPoF) or must use failover: MySQL Cluster is a sharded multi-master distributed database without a SPoF.

On the other hand Redis, for example, is a master-slave failover NoSQL datastore.

The CAP theorem says you cannot have all three at the same time. But it is perfectly fine to sacrifice consistency for availability while a partition lasts and restore consistency once the partition is fixed. Still better than nothing if revenue counts. Financial institutions do that all the time.
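
To make that concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions, not how any particular bank or database does it: each replica keeps a log of operations keyed by a unique ID, keeps accepting writes while partitioned, and reconciles by taking the union of the logs once the partition heals. The names `Replica`, `apply` and `merge` are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica that stores an account as a log of signed amounts."""
    name: str
    ops: dict = field(default_factory=dict)  # op_id -> signed amount

    def apply(self, op_id: str, amount: int) -> None:
        # Accept the write locally, even if other replicas are unreachable.
        self.ops[op_id] = amount

    def balance(self) -> int:
        return sum(self.ops.values())

    def merge(self, other: "Replica") -> None:
        # Union of the two logs. Unique op IDs make this idempotent and
        # independent of the order in which replicas are merged.
        self.ops.update(other.ops)


a, b = Replica("a"), Replica("b")
a.apply("a-1", 100)   # deposit seen by both sides before the partition
b.merge(a)

# Partition: both sides stay available and accept writes independently.
a.apply("a-2", -30)
b.apply("b-1", -50)
balances_during_partition = (a.balance(), b.balance())  # (70, 50): inconsistent

# Partition heals: exchange logs in both directions.
a.merge(b)
b.merge(a)
balances_after_heal = (a.balance(), b.balance())  # (20, 20): consistent again
```

Note what this does not give you: during the partition neither side can enforce a rule like "the balance never goes below zero", so the application has to deal with any overdraft after the fact.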
None of what you wrote is true.
And you provided no arguments at all. Sure, there are things like multi-master replication in the RDBMS world, but I have yet to see a scalable system that uses it and at the same time is fully ACID compliant (which rules out async replication). We tried multi-master HA replication once, but the write throughput was terrible.
There is no possible argument to make. There are a million - probably more - highly available RDBMS systems all over the world handling real money and real goods and services, 24/7. You can argue, if you wish, that tables are not a good way to store particular data structures, fine. But the claim that RDBMSs are unreliable is just ludicrous and has been for 30 years.
First, there is a difference between reliability and availability. IMHO availability is a subset of reliability.

I'm not saying they are unreliable per se, but making them really highly available is much, much harder than with some NoSQL stores designed for HA, and the solutions are much more complex, usually beyond the point where you can prove their correctness. It will cost you lots of effort, money and hardware. And to make it network-partition-tolerant, you'll have to give up ACID anyway, so one of the main advantages of RDBMSes over NoSQL stores goes away. IMHO not worth the trouble. In fact, most of the RDBMS systems operating in banks and insurance institutions I've seen were not even fully ACID. They were AD + eventually C and very relaxed I. You really don't need full ACID to handle money, it is just a convenient model for programmers.

I have seen systems based on RDBMSes, even the most expensive ones claiming 7 nines availability on paper, actually fail, sometimes in such weird ways that fixing the mess took far too long. I know some companies migrated to NoSQL stores for exactly this reason: an expensive RDBMS cluster failing after a hardware accident while a NoSQL cluster kept operating fine in the same datacenter, despite networking problems. I've seen that happen way too often to big names, including a few commercial banks and telecoms in Poland, to believe the marketing of HA RDBMS stores. Sure, all of them recovered (sometimes after minutes and in one case after a week) and none lost any data, therefore I'm not saying RDBMSes are unreliable ;)
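
For a sense of scale, here is a quick back-of-the-envelope calculation in Python of what "7 nines" allows per year versus what a one-week outage amounts to. It assumes a 365-day year and the usual definition of N nines as availability 1 - 10^-N; the function names are just for illustration.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year


def allowed_downtime_seconds(nines: int) -> float:
    """Downtime budget per year for an availability of `nines` nines."""
    return SECONDS_PER_YEAR * 10 ** (-nines)


def achieved_nines(downtime_seconds: float) -> float:
    """Number of nines actually achieved given the downtime within one year."""
    return -math.log10(downtime_seconds / SECONDS_PER_YEAR)


seven_nines_budget = allowed_downtime_seconds(7)     # roughly 3 seconds per year
one_week_outage = achieved_nines(7 * 24 * 3600)      # fewer than two nines
```

So a single week-long outage puts that year below two nines, no matter what the datasheet says.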

This is exactly the same story as with scalability. Can RDBMSes be scaled? Yes, they can. But it is expensive, hard, and requires very careful application design. It does not work automagically by "I'll simply normalize and throw my queries at it".

Rarely matters for a startup.
For the startup I once worked for, it mattered much more than we had thought at the beginning. The investors were smart enough to notice we had some considerable periods of downtime. Additionally, once we got our first million users (not really that many and nowhere near the scale of Google or FB) we ran into performance problems which couldn't be easily solved just by indexing, optimizing queries or adding more hardware, and "buying" a beefy Oracle superserver was not an option as we didn't have enough revenue yet. So we had to dump joins, relax transactions, denormalize a lot, and we ended up with a half-baked, bug-ridden NoSQL store on top of PostgreSQL that couldn't even do horizontal partitioning well. I wish we had had a proper solution like Cassandra right from the start. It would have saved us lots of pain.
First million users. Come on. 99% of your audience is never going to have that problem.

Especially if they spend their early days fucking with a Cassandra cluster instead of talking to customers.

And it should be noted, you made it anyway.

When you make it by the skin of your teeth, that means you probably timed it right.

Preempting a problem far ahead of time in a startup means time and effort were wasted, especially if it was done before the problem was established to exist.

It is unreal to me that people still can't figure out how to apply Maslow's hierarchy to startups.

A million registered users is nothing for an MMORPG. You can get it pretty quickly even in a national-level game, without going global. A completely different story is keeping those users active and making money from them. The problem here is that you need to sustain a pretty massive load, yet only a few percent of that load brings you revenue. And things like being out of service for even 10 minutes during peak hours (and peak load can be 100x higher than average load if you do special events in the game) can put you out of business or at least seriously worry potential investors.

The sad thing is they actually didn't make it. I don't think the revenue ever crossed the cost of f*ing with all the scalability and availability problems. AFAIK currently they use Membase.

A million registered users is $15,000,000 a month if you're charging $15 a month, which was standard for a long time.

Are you fucking serious? A product that fails with a million users has a monetization problem not a technological one.

Not in the freemium model. In the freemium model it is easy to get many registered users because you're giving the basic version of the game away for free and charging for extras. And if 10% of users pay for any extras at all, you may consider yourself very lucky. The typical share of paying users is 2-5%. Another problem is that those 1 million users don't play all the time (we had only 5% of them logged in at any one time); however, it is possible to get most of them active for a short time by organizing special events / competitions for them. So you need capacity to handle that 1 million users for a very short period of time, which means you need to scale, but you are not going to make money from them constantly.
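
Here is the arithmetic from this exchange as a small Python sketch, using only the figures quoted above (1 million registered users, $15 a month, 2-5% paying, 5% logged in at a time). Treating an event peak as all registered users online at once is my simplification of "most of them", so read it as an upper bound. No revenue per paying user was given, so none is computed.

```python
REGISTERED_USERS = 1_000_000

# Subscription model from the parent comment: every registered user pays $15/month.
subscription_revenue_per_month = REGISTERED_USERS * 15

# Freemium model: only 2-5% of registered users pay anything at all.
paying_users_low = REGISTERED_USERS * 2 // 100
paying_users_high = REGISTERED_USERS * 5 // 100

# Load: about 5% of users are logged in at a typical moment.
typical_concurrent = REGISTERED_USERS * 5 // 100

# Assumption: a special event brings (almost) everyone online at once.
event_concurrent = REGISTERED_USERS

# How much more capacity the event needs compared to a typical moment.
capacity_ratio = event_concurrent // typical_concurrent
```

With these numbers the capacity you have to provision for an event is 20 times the typical load, while the number of paying users stays at 20,000 to 50,000.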
Additionally, when you do a startup and it succeeds, it is already too late to redesign your app completely to make it scale. Changing the RDBMS in the middle of the game and migrating data is risky and would probably cost you more than using a properly scalable database right from the beginning.

Additionally, once you know you need to scale, your competition will have seen your product. Knowing the idea works, if they start from scratch but use a better database for the job, they can easily put you out of business, because instead of competing and adding new features, you'll be busy fighting scalability problems.