Hacker News
by p1esk 2611 days ago
Another reason to balance the test set when the training set is unbalanced is to check whether the lack of training data for certain classes is actually a problem. You would use cross-validation, but do different splits for each class, and then look at accuracy per class rather than only the overall number. It might well turn out that certain classes are just "easy", and you don't need to find more training samples for them to get the overall accuracy up.
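A rough sketch of what I mean, in plain Python. Everything here is made up for illustration: the three classes, their means and counts, the nearest-centroid "model", and the helper names (`make_data`, `stratified_folds`, `per_class_recall`). It splits each class into folds separately, so every fold gets its share of the rare classes, and then reports accuracy per class instead of one overall number.

```python
import random
from collections import defaultdict


def make_data(seed=0):
    """Synthetic, unbalanced 1-D data. The class spec is an assumption for the demo."""
    rng = random.Random(seed)
    # label -> (mean, sample count): "b" is rare but far from the others,
    # "c" is rare and overlaps with the majority class "a".
    spec = {"a": (0.0, 200), "b": (10.0, 10), "c": (1.5, 10)}
    return [(rng.gauss(mean, 1.0), label)
            for label, (mean, n) in spec.items()
            for _ in range(n)]


def stratified_folds(data, k, seed=0):
    """Split each class into k parts separately, then merge part i into fold i."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in data:
        by_class[y].append((x, y))
    folds = [[] for _ in range(k)]
    for items in by_class.values():
        rng.shuffle(items)
        for i, item in enumerate(items):
            folds[i % k].append(item)
    return folds


def fit_centroids(train):
    """Toy stand-in for a real model: one mean per class."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def per_class_recall(data, k=5):
    """Cross-validate and return the fraction of each class predicted correctly."""
    folds = stratified_folds(data, k)
    hits, totals = defaultdict(int), defaultdict(int)
    for i in range(k):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        centroids = fit_centroids(train)
        for x, y in test:
            totals[y] += 1
            hits[y] += int(predict(centroids, x) == y)
    return {y: hits[y] / totals[y] for y in totals}


if __name__ == "__main__":
    for label, recall in sorted(per_class_recall(make_data()).items()):
        print(label, round(recall, 2))
```

If a rare class already scores well per class (here "b" should, since it sits far from everything else), collecting more samples for it probably won't move the overall accuracy much. A rare class that scores poorly is where extra data is more likely to pay off.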