| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jdagostino 5334 days ago
	what did you end up going with?

1 comments

electic 5334 days ago

Our company is a big data company. So our amazing engineers are responsible for storing hundreds of millions of pieces of data per week AND also crunching and analyzing that data. So basically we need a system where we can have incredible write and read performance but also a system that is elastic in nature. Most importantly, it has to be available.

Before I go into more details, MongoDB is great for most people who don't have a high transaction volume. It is easy to setup and easy to use. So if you are in this camp, MongoDB is probably a good fit for you.

We did about two months worth of extensive tests in our lab. Basically two things didn't bode well for us. One, the locking killed reading...we just had a hard time keeping the flow of writes and the flow of data to our statistics cluster alive. Yea, you could use replication but that too didn't work too well performance wise. Two, the sharding didn't seem that robust. As the cluster got bigger and bigger, we started noticing the overhead of keeping it up was getting to be too great. Rather than write in detail, I think this article covers some of the scaling issues we experienced:

http://blog.schmichael.com/2011/11/05/failing-with-mongodb/

We finally used a hybrid system. We went with Membase, now CouchBase, to handle immediate storage and we are now implementing Hadoop for our long term storage needs.

P.S. Our entire stack is a KV in nature.

link

Goldcap 5334 days ago

Just reading about your transactional volume, it seems like at it's face MongoDB wouldn't be a good fit for this project. 30k per second is not anywhere MongoDB pretends to live, I think by their own admission. And Sharding in MongoDB, while being called a core feature, was bolted on after core development, probably intended to give Mongo some credibility with those who want it to be more scalable. IMHO if you need that kind of scalability, you're already straying from the Mongo Niche, 2.0.0 notwithstanding.

So agreeing with a point earlier, if you don't like a write lock implementation, and have concerns about scaling, and have a huge transactional volume, just really not something that fits well with MongoDB.

I've been using Mongo now (currently using 1.8) for three (is it almost three now?) years, 2 million hits/day, with a replicated set, and while I've needed maintenance, reindexing, and (gasp) restarts on occasion, never had any of the problems identified by the author of this post.

Bottom line, sounds to me like someone was in over someone's head from an architectural standpoint, made a bad choice of MongoDB, and then blamed 10gen for his own lack of foresight. So while I empathize with the struggle, I fault him for not knowing his options in advance, TESTING first, then betting the farm on a fairly new opensource codebase.

LOTS of other database solutions that would scale better. Analyzing lots and lots of transactional stateless data with MongoDB map-reduce? Well, just kinda like killing yourself by trying to sprint up from the bottom of the Grand Canyon. "You really tried to do that?"

link

christkv 5334 days ago

there's an initial hadoop plugin for mongo that might be a better fit for doing map-reduce over large datasets https://github.com/mongodb/mongo-hadoop

link

bdarfler 5334 days ago

We easily support 10s of millions of writes and reads against Mongo per hour on a very small (single digit) number of shards in the cloud (i.e. crappy disk I/O). While that is around an order of magnitude less than 30k a second I would be surprised if we couldn't scale mostly linearly by adding shards.

P.S. If your stack is KV then you should use a KV store.

link

electic 5334 days ago

"I would be surprised if we couldn't scale mostly linearly by adding shards"

MongoDB aside, why should you assume? You should test the heck out of any DB solution before using it to base your product on.

link