by james33 4077 days ago
When did you last use Mongo? We've been using it in production for 3+ years, and while there were certainly some issues early on, we've had nothing but success with it (especially over the last few major versions).

You could have inconsistencies in your data[1] and not even realize it.

[1] https://aphyr.com/posts/284-call-me-maybe-mongodb

While the Call Me Maybe series is definitely informative, it's worth noting that it has called out flaws in every distributed system it has tested. What it comes down to is whether those flaws are fatal in practice. The truth is, it depends. If you lose a comment on a social media site, no big deal. If you lose part of a transaction for a multi-million-dollar stock trade, very big deal.

No software system is perfect, but there are definitely practical trade-offs to be made, especially once you are beyond what a single database server can offer in terms of write throughput. When your write traffic exceeds what a single database can keep up with, you have to give up some level of reliability.

He wasn't able to find issues with Zookeeper and Postgres.

Granted, you can only prove that a system is vulnerable, not that it is safe, but if there is a vulnerability in those two, it is much harder to trigger.

In general, distributed datastores designed for strong consistency, like ZooKeeper, tend to deliver it (cf. Consul and etcd too). But Postgres was not tested as a distributed database: it was not sharded or replicated, and it had no form of failover. The difference is: kill a ZooKeeper node and you will not notice; kill Postgres and your app is dead.

Postgres is a good DB, but since it's not distributed, it's not very useful to compare it to distributed databases. Yes it's consistent, but it's only as reliable as the single node where it is installed.

This is a common misconception about the CAP theorem. A significant number of people don't realize that a distributed system also includes its clients; it's not just the communication between servers.
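To make the client point concrete, here is a toy sketch (not a real Postgres client; `Server`, `send`, and `drop_ack` are made-up names for illustration). The single server commits every write, but whenever the acknowledgement is lost on the way back, the client cannot tell whether the write happened:

```python
class Server:
    """A single-node store that commits every write it receives."""

    def __init__(self):
        self.committed = []

    def commit(self, value):
        self.committed.append(value)
        return "ok"


def send(server, value, drop_ack):
    """Return what the client observes: 'ok' or 'unknown'."""
    reply = server.commit(value)  # the server always commits in this toy model
    if drop_ack:
        return "unknown"  # the reply was lost in the network (assumed failure)
    return reply


server = Server()
# Hypothetical failure pattern: the ack for every odd-numbered write is dropped.
outcomes = {v: send(server, v, drop_ack=(v % 2 == 1)) for v in range(4)}
unknown = [v for v, result in outcomes.items() if result == "unknown"]
```

Every write ends up in `server.committed`, yet the client has no answer for the ones in `unknown`, so even one server plus one client already has a distributed-systems problem.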

I suspect he did not cover replication because failover support in Postgres is technically still DIY, although he should have. There are two replication methods I would like to see tested:

- asynchronous - this one is fast, but it would most likely have issues similar to the other databases
- synchronous - the master makes sure data is replicated before returning to the user; this should, in theory, always be consistent

You would typically have two nodes in the same location replicating synchronously and use asynchronous replication to other data centers. On a failure, you simply fail over to the other synchronously replicating server.
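A rough way to see the difference between the two modes is a toy simulation (this is not how Postgres implements replication; `Primary`, `Replica`, and `lost_on_failover` are hypothetical names). The primary crashes before shipping its backlog, the replica is promoted, and we check which acknowledged writes are missing:

```python
class Replica:
    def __init__(self):
        self.log = []


class Primary:
    def __init__(self, replica, synchronous):
        self.log = []
        self.replica = replica
        self.synchronous = synchronous
        self.pending = []  # writes acknowledged but not yet shipped (async only)

    def write(self, value):
        self.log.append(value)
        if self.synchronous:
            # Synchronous: the replica has the write before the client is acked.
            self.replica.log.append(value)
        else:
            # Asynchronous: ack immediately, ship to the replica later.
            self.pending.append(value)
        return "ack"


def lost_on_failover(synchronous):
    """Return acknowledged writes that are missing after promoting the replica."""
    replica = Replica()
    primary = Primary(replica, synchronous)
    acked = [v for v in range(5) if primary.write(v) == "ack"]
    # Assumed failure: the primary dies here, before any pending writes ship.
    return [v for v in acked if v not in replica.log]


sync_lost = lost_on_failover(synchronous=True)
async_lost = lost_on_failover(synchronous=False)
```

In this model `sync_lost` is empty while `async_lost` contains every write that was acknowledged but never shipped, which is the trade-off between the two bullets above: async is faster to ack but can lose acked data on failover.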

Regarding Consul and etcd: those technologies actually did not do well in his tests, but the authors appear to be motivated to fix the issues.

I agree with your point about the client, but there's always a client. What's missing in the PostgreSQL test is high availability, redundancy, and partition tolerance. By that standard, any in-process DB would beat the competition :)

That's why I said it's unfair to say "PostgreSQL did well".

Call Me Maybe is supposed to test all the difficult problems of CAP, and none of them were tested with PostgreSQL.

95% of the people I hear about using NoSQL databases are using them on a single node.
That's a real problem. There's little need for a NoSQL database (except Redis maybe, because it's so fast) if you're not trying to survive partitions and ensure HA.
MongoDB doesn't even scale well horizontally[1]. I would normally link to the paper where they benchmarked Cassandra and HBase against MongoDB 2, but it looks like they ran their tests again with MongoDB 3.0 and included Couchbase as well.

[1] http://www.datastax.com/wp-content/themes/datastax-2014-08/f...