While the call-me-maybe series is definitely informative... it's worth noting that they've called out flaws in every distributed system they've tested against. What it comes down to is, are those flaws fatal in practice. The truth is, it depends.... If you lose a comment on a social media site, no big deal. If you lose part of a transaction for a multi million dollar stock trade, very big deal.
No software system is perfect, but there are definitely practical balances to be made. Especially when you are beyond what a single database/server can offer in terms of write throughput. The fact is, when your traffic needs exceed what a single database can keep up with in terms of writes, you have to give up some level of reliability.
In general, strongly consistent distributed datastores like zookeeper tend to be strongly consistent (cf Consul and Etcd too)... But Postgres was not tested as a distributed database, sharded or replicated, and without any form of failover.
The difference is: kill a zookeeper node and you will not notice, kill Postgres and your app is dead.
Postgres is a good DB, but since it's not distributed, it's not very useful to compare it to distributed databases. Yes it's consistent, but it's only as reliable as the single node where it is installed.
This is a common misconception about CAP theorem. Significant number of people don't realize that distributed system also includes clients, it's not just communication between servers.
I suspect he did not go over replication, because Postgres technically still fail over support is DIY, although he should. There are two replication methods though which I would like to see:
- asynchronous - this one is fast, but it most likely would have similar issues the other database have
- synchronous - the master makes sure data is replicated before returning to the user this should in theory always consistent
You would typically have two nodes in same location replicating synchronously and use asynchronous replication to different data centers. On a failure, you simply fail over to another synchronously replicating server.
Regarding consul/etcd actually those technologies did not do well in his tests, but authors appear to be motivated to fix issues.
I agree with your point about the lcient, but there's always a client... What's missing in the postgresql test is high availability, redundancy and partition tolerance... Similarly any inprocess db would beat the competition :)
That's why I said it's unfair to say "postgresql did well".
Call me maybe is supposed to test all the difficult problems of CAP, which have not been tested at all with Postgresql.
That's a real problem. There's little need for a nosql database (except redis maybe, because it's so fast) if you're not trying to overcome partitions and ensure HA...
MongoDB doesn't even scale well horizontally[1]. I normally would put a link to paper where they benchmarked Cassandra & HBase with MongoDB 2, but looks like they did their tests again with MongoDB 3.0 and included Couchbase as well.
I've mostly used MongoDB in mostly-read, and in a replica set... that said, if I needed to support pure scale, I'd be more inclined to reach for Cassandra. If I only needed mid-range replication, I'm more inclined to look at RethinkDB or ElasticSearch at this point. In fact the project I'm working on now is using ElasticSearch.
All of that said, you have to take a research paper funded by a database company (Datastax is backing Cassandra) with a grain of salt. Not to mention, that most people reach for MongoDB because it has some flexibility, and is a natural fit for many programming models. Beyond this, setting up a replica set with MongoDB was far easier than with any other database I've had to do the same with... Though I'd say getting setup with RethinkDB is nicer, but there's no automated failover option yet.
Postgresql doesn't even try to scale horizontally.
Regarding Mongodb, all I'll say is that I've switched from mysql to mongodb 2 years ago, and I've never looked back. YMMV.
I'm also a user of ElasticSearch and Redis, and looking to add Couchbase to the lot. One size doesn't fit all. mysql and postgres certainly don't fit all either.
No software system is perfect, but there are definitely practical balances to be made. Especially when you are beyond what a single database/server can offer in terms of write throughput. The fact is, when your traffic needs exceed what a single database can keep up with in terms of writes, you have to give up some level of reliability.