Hacker News
by mitultiwari 5406 days ago
Apache Mahout is interesting but I still haven't found a strong need to use it. Most of the time I can sample the data that I process in Hadoop, and use R for training machine learning algorithms.
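To make the sampling approach concrete, here is a minimal sketch of one way to draw a fixed-size uniform sample from output too large to load at once (reservoir sampling). The post does not say how the sample is actually drawn, so the technique, the function name `reservoir_sample`, and its parameters are assumptions for illustration only.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper: the iterable is read once, so it can be a stream
    whose length is not known in advance.
    """
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Keep item i with probability k / (i + 1) by picking a slot
            # in [0, i] and replacing only if the slot is in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample
```

The resulting sample could then be written to a file and loaded into R for training. Feeding the function from something like `hadoop fs -cat` piped to standard input is one option, though that wiring is also an assumption.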

Does anybody know of any large-scale data mining use of Apache Mahout?

2 comments

I suppose the key benefit is shorter development cycles when you don't have a very powerful machine setup at your disposal. Distributed ML in the cloud can be an interesting alternative to buying and maintaining in-house compute servers.

Oftentimes you want to run an experiment by training a classifier, testing it on a development data set, tweaking it and starting over. In the meantime, you just sit and wait. Reducing that cycle from a couple of hours, or even days, to something shorter is probably the most appealing aspect of a project like Mahout.

We run an ad-serving company which, among other things, does optimization based on ad campaign results. Manual training of algorithms is just not going to work there, so we need to distribute the training across a cluster of machines.

We're not using Mahout, but I keep an eye on it since it might be an improvement over our current solution.