by gtuhl 5482 days ago
I am not a big fan of Hadoop. It is a headache to configure, and it is optimized for installs with node counts that only a few companies could make use of. I really wish there were more options, as I believe Hadoop is overkill for most of the people using it.

For quick and dirty MapReduce on a smaller node count, I've started to really like Disco (discoproject.org). You just pull down the backend with your package manager, push your files into ddfs, write a Python script, and run it.
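To give a feel for what the "write a Python script" step involves, here is a rough local sketch of a word count in the map/reduce shape. It deliberately does not import Disco or touch ddfs: the names `map_words`, `reduce_counts`, and `run_local` are made up for illustration, and a real Disco job would hand its own map and reduce functions to the Disco job runner instead of calling them in-process like this.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word on one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run_local(lines):
    """Hypothetical stand-in for the cluster: run map, then reduce, in-process."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog"]
    for word, count in sorted(run_local(sample).items()):
        print(word, count)
```

The point of the sketch is only that the script boils down to two small functions; distribution, input splitting, and storage are what the backend is there to handle.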

2 comments

I just finished building a proof of concept with Disco, doing some simple analytics calculations on one machine. I did run into a couple of bugs (http://groups.google.com/group/disco-dev/browse_thread/threa...), and the documentation is awful, but overall I'm pretty impressed. Now if I could just get discodex to run...
Interesting, I haven't looked at Disco for quite a while. How does Disco compare to Hadoop Streaming these days? (I'm highly biased, so I reach for BigCouch most of the time now.)
The latest release supports workers written in any language. Disco comes with worker libraries for Python and OCaml.

http://discoproject.org/doc/howto/worker.html