|
|
|
|
|
by dougws
4313 days ago
|
|
Your objections to MongoDB's model seem reasonable, but I don't see any evidence in either this comment or the linked blog post that DocumentDB is better (especially in the absence of benchmarks). What is this "battle-tested distributed systems framework"? Several of your complaints about MongoDB have to do with the interaction between persistence to disk and replication to the network; as the Multi-Paxos algorithm does not specify when data should be written to disk (much less what the format should be), what reason is there to believe that DocumentDB does this any better? I'm totally willing to believe that DocumentDB beats the pants of MongoDB on just about every axis (in fact, that seems pretty likely) but it's going to take some actual numbers and a better description of the internals. |
|
All I can really say is that the replication model provides a significant performance boost over MongoDB in the multiple replica (i.e., production) scenario.
We were using MongoDB at Microsoft for a while (I left MS almost a year ago). I was developing a real-time metrics system with it. It was very unstable at our target load (500k increments per minute, high percentage of tomorrow's documents preallocated the day before). We only managed maybe 10% of that with MongoDB, IIRC. Sometimes it would choke and not come back until I restarted the cluster (~30 machines total, I believe. 3 replicas * 10 shards).
We were so sure that MongoDB should be able to handle this scenario, since they talk about it in their documentation. After talking with the MongoDB devs, we came to the conclusion that even though we were issuing increment operations on preallocated documents, MongoDB was:
a) using a global lock on the "local" db used for replication, and
b) "replicating via disk" instead of via the network. In other words, replication requires writing to the journal journal before other members of the replica set have a chance to apply the change and ack back. This results in a loss of concurrency.
The lack of async query support in the C# driver didn't help either.
Eventually we used a replicated, write-back cache which sits atop the framework DocDB uses. Not a fair comparison, but the goal was achieved easily with 1/3rd the hardware. We just backed it onto Azure Table Storage. Our queries were all range queries, which table storage supports.
I can't talk about the framework, unfortunately.