|
|
|
|
|
by djsbshek
2052 days ago
|
|
My main concerns with the imbalance are undersampling of the negative class data distribution relative to the positive class, and overestimating performance on the test splits. I can buy that you may want to train on a balanced dataset, but the testing condition should reflect the true case distribution as closely as possible. I agree that you would not want to use only the class priors for prediction. However, I do not think it is clear that you would want to throw that information out. Also not sure that I agree with the statement that neural network has “no memory” of the prior class distribution. That is a strong claim to make about something as opaque as a neural net model. |
|
And no, it's not a strong claim to make. Of course the network learns the distribution of your training set. That's why you want it balanced. But during successive applications of inference the weights do not change, it has no state. So it cannot, for example, store that it just predicted 90% negative and now it would be time again for some positive prediction.