| Look, I'm not the best person to do this... but... good points?
1 - Default writes are unsafe by default
MongoDB supports a number of "write concerns":
* fire-and-forget, or "unsafe"
* safe mode (only written to memory, but the data is checked for "correctness", like unique constraint violations)
* journal commit
* data-file commit
* replicate to N nodes
The last 4 can be mixed and matched. Most (all?) drivers allow this to be specified on a per-write basis. It's an incredible amount of flexibility. I don't know of any other store that lets you do that. When a user registers, we do a journal commit ({j:true}), 'cuz you don't want to mess that up. When a user submits a score, we do a fire-and-forget, because if we lose a few scores during the 100ms window between journal commits, it isn't the end of the world (for us; if it is for you, always use j:true).
The complaint is about the driver's default behavior (which I think you can configure globally in most drivers)? Issue a pull request. Is the default storage engine in MySQL still MyISAM?
2 and 6 - Lost Data
This is the most damning point. But what can I say? "No?" My word versus his? I haven't seen those issues in production. I hang out in their Google Groups and I don't recall seeing anyone bring that up, though I do tend to avoid anything complicated/serious and let the 10gen guys handle that. Maybe they did something wrong? Maybe they were running a development release? Maybe they did hit a really nasty MongoDB bug.
3 - Global Lock
MongoDB works best if your working set fits in memory. That should simply be an operational goal. Beyond that, three points. First, the global lock will yield, I believe (someone more informed can verify this). Second, the story gets better with every version and it's clearly high on 10gen's list. Most importantly though, it's a constraint of the system. All systems have constraints. You need to test it out for your use-case.
For a lot of people, the global lock isn't an issue, and MongoDB's performance tends to be higher than a lot of other systems. Yes, it's a fact, but with respect to "don't use MongoDB", it's FUD. It's an implementation detail that you should be aware of, but it's the impact of that implementation detail, if any, that we should be talking about.
3 and 4 - Sharding
Sharding is easy, rebalancing shards is hard. Sharding is something else that got better in 1.8 and 2.0, improvements the author thinks we ought to simply dismiss. I don't have enough experience with MongoDB shard management to comment more. I think the foursquare outage is somewhat relevant though (again, keeping in mind that things have improved a lot since then).
7 - "Things were shipped that should have never been shipped"
This is a good verifiable point? I remember using MySQL Cluster when it first shipped. That was a disaster. I also remember using MySQL from a .NET project and opening a good 3-4 separate bugs about concurrency issues where you could easily deadlock a thread trying to pull a connection from the connection pool. I once had to use ClearCase. Talk about something that shouldn't have shipped. This is essentially an attack on 10gen that ISN'T verifiable. Again, it's his anonymous word versus no one's. Just talking about it is giving it unjust attention.
8 - Replication
It's unclear if this is replica sets or the older master-slave replication. Either way, again, I don't think this is verifiable. In fact, I can say that, relatively speaking, I see very few replica set questions in the groups. It works for me, but I have a very small data set, and my data pieces themselves are small. Obviously some people are managing just fine (I'm not going to go through their who's who; I think we all know some of the big MongoDB installations).
9 - The "real" problem
We've all seen some pretty horrible things. I was using MySQL in 5.0 and there were some amazing bugs.
There's a bug, which I think still exists, where SQL Server can return the incorrect inserted id (no, not using @@identity, using scope_identity()) on a multi-core system. MS spent years trying to fix it.
I guess I can say what 10gen never could... If you were using MongoDB prior to 1.8 on a single server, it's your own fault if you lost data. To me, replication as a means to provide durability never seemed crazy. It just means that you have to understand what's going on.
Look, I don't doubt that this guy really ran into problems. I just think they have a large data set with a heavy workload, they thought MongoDB was a silver bullet, and rather than being accountable for not doing proper testing, they want to try and burn 10gen. They didn't act responsibly, and now they aren't being accountable. |
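For anyone who hasn't seen per-write write concerns, here is a rough sketch of the idea from point 1 of the comment above. It is plain Python and does not talk to MongoDB. The `WriteConcern` class, its `to_options` method, and the three policy constants are hypothetical names made up for illustration. The option keys (`safe`, `j`, `fsync`, `w`) are, as far as I recall, the names drivers of that era used, so treat them as an assumption and check your driver's docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WriteConcern:
    """Hypothetical stand-in for the per-write options a driver accepts."""
    safe: bool = False        # wait for the server to acknowledge the write
    j: bool = False           # wait for the journal commit
    fsync: bool = False       # wait for the data-file commit
    w: Optional[int] = None   # wait until the write has reached this many nodes

    def to_options(self) -> dict:
        # Fire-and-forget: nothing was requested, so nothing is waited on.
        if not (self.safe or self.j or self.fsync or self.w):
            return {}
        # Any stronger level implies an acknowledged ("safe") write.
        opts = {"safe": True}
        if self.j:
            opts["j"] = True
        if self.fsync:
            opts["fsync"] = True
        if self.w:
            opts["w"] = self.w
        return opts


# Hypothetical policy mirroring the examples in the comment.
REGISTER_USER = WriteConcern(j=True)   # don't want to mess this up
SUBMIT_SCORE = WriteConcern()          # fire-and-forget, losing a few is OK
PARANOID = WriteConcern(j=True, w=2)   # levels mixed and matched
```

The point of the design is that the durability decision lives next to each write rather than in one global setting, so a registration and a score submission can pay different costs.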
Well, except for that thing where the replication decided that the empty set was the most recent and blew everything else away. And those cases where keys went away.
Losing data, particularly when the server goes down, is fine. Even not writing data isn't terrible, though his points about not knowing whether it has been written in case of failure are really good ones. But corrupting data and then replicating that corrupted data is really, really bad. Often unfixably bad.
They didn't act responsibly, and now they aren't being accountable.
For the complaints about the default write stuff, sure. For everything else... Dunno. He brought up a lot of real, actual issues which were not documented MongoDB behavior. Yes, there's also a fair bit of complaining about the documented bits, and sure, boo-hoo, whatever. But the idea that 10gen is shipping stuff with serious data integrity bugs, and doing so knowingly, doesn't seem out of line here.
And while MySQL also has some bad stuff, sure, it has nothing like as many data integrity bugs as MongoDB.
And I say all of this as a serious fan of MongoDB.