by ogrisel 4921 days ago
You should use Mahout directly: its recsys part is quite complete, high level and application oriented, whereas scikit-learn does not provide high-level recsys concepts.

The best documentation I found is the Mahout in Action book (http://manning.com/owen/), read alongside the source code.

Also, you probably don't need to run this on a Hadoop cluster unless your data is too big to fit on a single machine.
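
To get a rough idea of whether your data fits on one machine, a back-of-the-envelope estimate is usually enough. The sketch below is plain Java and does not use Mahout at all; the class name `FitsInMemory`, the preference count, and the per-preference byte cost are hypothetical placeholders that you would replace with numbers measured on your own data.

```java
public class FitsInMemory {

    // Rough in-memory footprint: number of (user, item, rating) preferences
    // times an assumed per-preference cost in bytes.
    static long estimateBytes(long numPreferences, long bytesPerPreference) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently
        return Math.multiplyExact(numPreferences, bytesPerPreference);
    }

    // True if the estimated footprint fits in the given memory budget.
    static boolean fitsOnOneMachine(long numPreferences, long bytesPerPreference, long availableBytes) {
        return estimateBytes(numPreferences, bytesPerPreference) <= availableBytes;
    }

    public static void main(String[] args) {
        long numPreferences = 100_000_000L; // hypothetical dataset size
        long bytesPerPreference = 30L;      // assumption, not a measured Mahout figure
        long heap = Runtime.getRuntime().maxMemory(); // max heap the JVM will try to use

        System.out.println("Estimated bytes: " + estimateBytes(numPreferences, bytesPerPreference));
        System.out.println("Fits in this JVM heap: "
                + fitsOnOneMachine(numPreferences, bytesPerPreference, heap));
    }
}
```

If the estimate comes out well below the heap you can give the JVM (`-Xmx`), a single machine is probably enough and Hadoop can wait.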