by Radim 5329 days ago
The decimal points are missing from the confusion matrix numbers (e.g., 190440 should read 19044.0), in case anyone else was wondering why the numbers don't add up.

If anything, the article convinced me not to use Mahout. The author decided to use the simplest algorithm, Naive Bayes, and got miserable results (from the article: "This is possibly due to a bug in Mahout that the community is still investigating."). He then changed the problem formulation in order to get better results, and concluded by saying the outcome is still likely a bug, but he's happy with it anyway?

This would probably be fine if we were talking about a small, nimble project that you could go into and hack/fix yourself. But we're talking about a massive Java codebase. The thought of customizing it makes me shudder.

EDIT: forgot to mention I agree with the parent comment completely, except I would add "... and choosing the right evaluation process" to the initial sentence.