| HN Mirror

I agree with you - we need numbers before making that kind of conclusion and I haven't run any benchmarks on the public version of DocDB. I'd like to see someone measure MongoDB on Azure vs DocDB on Azure - even then it might not be a fair measurement of db vs. db, since we don't know what machines DocDB is hosted on.

All I can really say is that the replication model provides a significant performance boost over MongoDB in the multiple replica (i.e., production) scenario.

We were using MongoDB at Microsoft for a while (I left MS almost a year ago). I was developing a real-time metrics system with it. It was very unstable at our target load (500k increments per minute, high percentage of tomorrow's documents preallocated the day before). We only managed maybe 10% of that with MongoDB, IIRC. Sometimes it would choke and not come back until I restarted the cluster (~30 machines total, I believe. 3 replicas * 10 shards).

We were so sure that MongoDB should be able to handle this scenario, since they talk about it in their documentation. After talking with the MongoDB devs, we came to the conclusion that even though we were issuing increment operations on preallocated documents, MongoDB was:

a) using a global lock on the "local" db used for replication, and

b) "replicating via disk" instead of via the network. In other words, replication requires writing to the journal journal before other members of the replica set have a chance to apply the change and ack back. This results in a loss of concurrency.

The lack of async query support in the C# driver didn't help either.

Eventually we used a replicated, write-back cache which sits atop the framework DocDB uses. Not a fair comparison, but the goal was achieved easily with 1/3rd the hardware. We just backed it onto Azure Table Storage. Our queries were all range queries, which table storage supports.

I can't talk about the framework, unfortunately.