by b_ttercup 3248 days ago
Is Naive Bayes really ever the most practical choice? Yes, it is a simple, fast algorithm, but in my experience it's usually a nontrivial step below other simple models and doesn't seem to show any major advantages. The results shown here seem good, but bag-of-words models usually do better than you might think on supervised NLP. So what's the motivation?
2 comments

The scikit-learn flowchart recommends it for text data with fewer than 100k samples when linear SVC doesn't work: http://scikit-learn.org/stable/tutorial/machine_learning_map...

AFAIK it's by far the fastest machine learning method and one of the few that can be learned "online", i.e. it can update the model each time it gets a datapoint and then throw that datapoint away without saving it for future training. These are nice properties if you are working at a very large scale or in an environment with very limited resources.
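
To make the "online" part concrete, here is a minimal sketch in plain Python, assuming a multinomial (word count) model with Laplace smoothing. The class name `OnlineNaiveBayes`, the `alpha` parameter and the toy messages are made up for illustration; this is not how any particular library implements it.

```python
import math
from collections import Counter, defaultdict


class OnlineNaiveBayes:
    """Multinomial naive Bayes that learns from one example at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing strength (assumed)
        self.class_counts = Counter()            # examples seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()                       # all words seen so far

    def update(self, words, label):
        # Only the counts change; the example itself is never stored.
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, words):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / total)
            for w in words:
                score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = OnlineNaiveBayes()
stream = [
    ("win cash prize now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("claim your free prize".split(), "spam"),
    ("lunch on monday".split(), "ham"),
]
for words, label in stream:
    model.update(words, label)  # each example is seen once, then dropped

print(model.predict("free cash prize".split()))
```

The memory footprint is just the count tables, which is why the raw datapoints can be discarded after each update.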

And if your data happens to actually meet the naive Bayes assumption (that all the features are conditionally independent given the class), then it's literally mathematically optimal and you can't do any better than it. It seems to work fairly well even when that isn't the case, though.
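
To spell the assumption out (the notation is mine, not from the article): with class $y$ and features $x_1, \dots, x_n$, conditional independence given the class means the posterior factorizes, and the prediction is the class that maximizes it.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

If that factorization really holds and the probabilities are known exactly, $\hat{y}$ is the Bayes-optimal prediction. In practice the probabilities are estimated from counts, so the guarantee is only approximate.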

Logistic regression can easily be made online too, keep in mind! sklearn has an implementation of online gradient descent, and vowpal wabbit is also excellent at those problems.
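
For comparison, a rough sketch of what online logistic regression looks like: one gradient step on the log loss per example, with nothing stored afterwards. This is hand-rolled plain Python, not the sklearn or vowpal wabbit implementation; the class name, the learning rate `lr` and the toy data are assumptions for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogisticRegression:
    """Logistic regression trained by one SGD step per example."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr  # fixed learning rate (an assumed, untuned value)

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return sigmoid(z)

    def update(self, x, y):
        # Gradient of the log loss for a single (x, y) pair, y in {0, 1}.
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


clf = OnlineLogisticRegression(n_features=2)
# Toy stream: feature 0 indicates class 1, feature 1 indicates class 0.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for x, y in stream:
    clf.update(x, y)  # the example is not kept after this call

print(clf.predict_proba([1.0, 0.0]) > 0.5)
```

In sklearn, I believe the equivalent entry point is `SGDClassifier.partial_fit`, which takes mini-batches instead of single examples.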

Naive Bayes can be parallelized in ways that SGD can't, but that's a whole other conversation.
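
A small sketch of the naive Bayes side of that claim, assuming a count-based model: the sufficient statistics are just counts, so each shard can be counted independently and the partial results added together in any order. The function names and toy data are hypothetical, and the shards are processed sequentially here; in a real setup each `count_shard` call could run on a separate worker.

```python
from collections import Counter


def count_shard(shard):
    """Count (label, word) pairs for one shard of the data."""
    counts = Counter()
    for words, label in shard:
        for w in words:
            counts[(label, w)] += 1
    return counts


def merge(partials):
    """Combine per-shard counts; addition makes the order irrelevant."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


data = [
    ("win cash prize now".split(), "spam"),
    ("meeting agenda for monday".split(), "ham"),
    ("claim your free prize".split(), "spam"),
    ("lunch on monday".split(), "ham"),
]
shards = [data[:2], data[2:]]

merged = merge(count_shard(s) for s in shards)

# The merged counts match what a single pass over all the data produces.
print(merged == count_shard(data))
```

SGD updates, by contrast, each depend on the weights left by the previous update, so they do not merge by simple addition like this.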

Gradient descent can be made online. But it's very slow and suffers from catastrophic forgetting. Typical gradient descent needs to iterate over the dataset many times, while naive Bayes only needs one pass.

I thought it was the typical approach for identifying email spam. Has that changed?