by Dn_Ab
4268 days ago
As others have mentioned, when the conditional independence assumptions are met, Gaussian Naive Bayes and MaxEnt will asymptotically learn the same classifier, except that NB gets there with O(log dimensions) examples instead of O(dimensions). So in those cases, and whenever you have few examples, you'll want to use NB. Even when the independence assumptions are not met, NB will often produce a good classifier, as long as you don't care about the accuracy of the probabilities. NB is also online, resistant to the curse of dimensionality, and for categorical data will learn polynomial decision boundaries. So what you gain in the trade-off is ease of implementation: you can write one in a dozen lines or less, throw something close to your raw data at it, and often get good enough results without any up-front training, kernels or regularization. You are right, though, that MaxEnt (some sparsity-capturing algorithms are as powerful as SVMs) will in general outperform Naive Bayes, and that the averaged perceptron probably has, on average, the best performance/ease of implementation ratio. But for many implementors and problems where clarity is paramount, those distinctions will not be worth the cost. Also, you have a wonderful website.
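To make the "dozen lines" and "online" points concrete, here is a rough sketch of a multinomial Naive Bayes over token lists. It is only an illustration under my own assumptions: the class name `NaiveBayes`, the methods `update` and `predict`, and the choice of add-one smoothing are mine, not anything from the article, and it runs a little past a dozen lines for readability.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()             # examples seen per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocab = set()                        # every token seen so far

    def update(self, tokens, label):
        # Online learning: each example only increments counts,
        # so there is no separate training pass.
        self.class_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        total = sum(self.class_counts.values())

        def score(label):
            counts = self.token_counts[label]
            # Add-one smoothing: unseen tokens get a small nonzero probability.
            denom = sum(counts.values()) + len(self.vocab)
            log_prior = math.log(self.class_counts[label] / total)
            return log_prior + sum(
                math.log((counts[t] + 1) / denom) for t in tokens
            )

        # Pick the class with the highest log posterior (up to a constant).
        return max(self.class_counts, key=score)
```

You would call `update(tokens, label)` once per labelled example as it arrives and `predict(tokens)` at any point in between. The scores are only useful for ranking classes; as noted above, the implied probabilities should not be trusted.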
Okay, it's not true that NB is strictly dominated by MaxEnt. But look at the two example problems the author gave, where Naive Bayes was said to be a good choice. The parameters there definitely, definitely won't be conditionally independent, and you'll probably have enough data. Naive Bayes is a bad choice here, as it is in most other situations.
So I think the caveats you've raised are all true... but I still wouldn't be raising them in a class I was teaching. I think it's easy to have a less useful discussion that is made up of individually more true statements. A common problem in technical discussion is too much attention to every qualification and every caveat.
People assume that there's some sort of proportionality between the importance of an idea or topic and the airtime you give it. And I think they do this implicitly, in a way that's really hard to consciously overrule. So I think it's really important to editorialise. A lot of technical discussion would be better off making statements that are untrue at the edges, accepting the imprecision, and moving on.
That's why I'm dismayed to see that someone's started writing a detailed tutorial on Naive Bayes, especially one set in two problem domains it's a bad choice for. Even if it's composed entirely of true statements, I think its net effect is to miseducate people.