Hacker News new | ask | show | jobs
by moultano 3251 days ago
For a simple example, imagine a dataset where the naive assumption is true if you split it into 100 classes, but false if you split it into one vs everything else. All of the conditional probabilities for the "everything else" class will be underestimated, biasing the weights towards the one.

This problem happens because the class you are interested in is more compact than its inverse.

It's also exacerbated by feature selection, as the negative features have smaller weights and thus lower information gain than the positive features.