Realtime Analytics with MongoDB


	Realtime Analytics with MongoDB (slideshare.net)
	27 points by jrosoff 5748 days ago

4 comments

rrival 5747 days ago

I liked this one as well:

http://www.slideshare.net/jrosoff/scalable-event-analytics-w...


nessence 5747 days ago

I'm working on a system which is similar but higher volume.

Have you done any benchmarks to test thousands of updates per second?

The same question applies on the front-end: what is the impact of generating 10 reports per second for 2 hours? Do the writers get behind?

You won't have scaling issues until the front-end hits some threshold of x queries per y updates, with x servers.

Good presentation on another application of mongo.


jrosoff 5747 days ago

We have hit thousands of updates per second on our current system during some high-load periods and did not see any problems. Our steady state is hundreds per second, but it bursts to thousands for extended durations about once per week, if not more often.

10 reports per second is actually not that much load and has almost no impact on writers. We have an alerting system that runs while data is input to the system. It effectively loads a report for each metric reported in the input and decides whether or not to send an alert. That system generates queries for about 50 reports per second on an ongoing basis and does not impact the writers. Our read volume in steady state is about 2x our write volume.
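To make the alerting read pattern concrete, here is a minimal sketch of the "load a report per incoming metric, then decide" loop. Everything in it is an assumption made for illustration: the sample shape (`url`, `metric`), the `load_report` callable, and the fixed per-metric thresholds are hypothetical and are not taken from the actual system.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def metrics_to_alert(
    input_batch: Iterable[dict],
    load_report: Callable[[str, str], float],
    thresholds: Dict[str, float],
) -> List[Tuple[str, str, float]]:
    """Return (url, metric, value) for every reported metric over its threshold.

    `load_report` is a hypothetical stand-in for the report query; it is
    called once per metric in the batch, which is why the alerting path
    adds read load in proportion to the input rate.
    """
    alerts = []
    for sample in input_batch:
        url, metric = sample["url"], sample["metric"]
        threshold = thresholds.get(metric)
        if threshold is None:
            continue  # no alert rule configured for this metric
        current = load_report(url, metric)  # one report read per input metric
        if current > threshold:
            alerts.append((url, metric, current))
    return alerts
```

In a real deployment `load_report` would presumably wrap a database query; here it can be any function, which keeps the sketch testable without a server.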

We have not seen any queueing problems on writes and the lock ratio in mongodb is typically in the 0.01 - 0.005 range.

We have found that we can break this by running lots of map-reduce jobs simultaneously while processing high write volume but that's a whole other ball of wax.

Our data access patterns very easily accommodate sharding. Both reads and writes are pretty evenly distributed across the set of URLs we track. By activating sharding using URL as shard key, we feel we can handle scaling several orders of magnitude beyond where we are now without anything more than additional hardware (or virtual machines).

I'd love to hear how your system scaling goes. Feel free to hit me up via email if you want to discuss (jrosoff AT yottaa.com)
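For readers who have not set up sharding before, the "URL as shard key" idea above roughly corresponds to two MongoDB admin commands, `enableSharding` and `shardCollection`. The sketch below only builds the command documents; the database name `analytics`, the collection name `metrics`, and the field name `url` are assumptions for illustration, not the names used in the system described here.

```python
def sharding_commands(database: str, collection: str, shard_key: str = "url"):
    """Build the admin command documents for sharding a collection by one field.

    The command name must be the first key in each document; Python dicts
    preserve insertion order, so the layout below is safe.
    """
    enable = {"enableSharding": database}
    shard = {
        "shardCollection": f"{database}.{collection}",
        "key": {shard_key: 1},  # ascending (range-based) shard key
    }
    return [enable, shard]


# Hypothetical names, for illustration only.
commands = sharding_commands("analytics", "metrics")
```

With a driver such as PyMongo, each document would typically be sent to a `mongos` router with something like `client.admin.command(cmd)`. Whether a plain URL key distributes well depends on the access pattern; the comment above reports that reads and writes are spread fairly evenly across URLs, which is the case where this kind of key tends to work.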


SanjayUttam 5747 days ago

If you like this, you may want to check out HummingBird...(Node + Mongo)

http://webpulp.tv/post/757442457/hummingbird-michael-nutt


jrosoff 5747 days ago

Hummingbird is awesome both as a tool and as a case study. We learned a lot from Hummingbird that we incorporated into the design of our system.


rgrieselhuber 5747 days ago

Slides 11 and 12 were interesting because they are close to the same solution, but 11 looks more complicated with Voldemort et al. in the mix. HBase with Hadoop seems like another good alternative not mentioned.


jrosoff 5747 days ago

Yeah, slide 11 depicted what we thought would be a great solution before we started investigating MongoDB. MongoDB effectively replaced all those other systems for us and was _significantly_ easier to set up and develop against.

A few people have mentioned HBase as an alternative. We did not consider HBase at the time we were making our architecture choices; however, if we were starting today, we'd probably look at it too. My first impression of HBase is that it lacks the level of documentation and community support behind MongoDB. I am definitely going to dig in some more to see how it would compare. That being said, we're totally happy with our choice of MongoDB and would recommend it to anybody considering HBase.
