
Yes, they could have used all the negative samples for testing (and even for training, if they had handled it better). But once your test set is large enough, whatever that means, it's not that relevant anymore. They are "undersampling" anyway by not recording data from every human who is negative right now.

And no, it's not a strong claim to make. Of course the network learns the distribution of your training set. That's why you want it balanced. But across successive inference calls the weights do not change; the network has no state. So it cannot, for example, remember that it just predicted 90% negative and decide that it's time for a positive prediction again.
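
To make the "no state" point concrete, here is a minimal sketch in Python. The weights, the `predict` function and the random samples are all made up for illustration; they stand in for any trained network whose weights are frozen at inference time, assuming nothing is carried over between calls (no recurrent state, no online updates).

```python
import math
import random

# Hypothetical frozen weights standing in for a trained network.
WEIGHTS = [0.8, -1.5, 0.3]
BIAS = -0.2


def predict(x):
    """Return P(positive) for one sample.

    The result depends only on x and the frozen weights,
    never on any earlier call.
    """
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)
samples = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(1000)]

# Score every sample in the original order.
in_order = {i: predict(x) for i, x in enumerate(samples)}

# Score the same samples in a shuffled order, so each one is
# preceded by a different history of earlier predictions.
order = list(range(len(samples)))
rng.shuffle(order)
shuffled = {i: predict(samples[i]) for i in order}

# Each sample gets the same score regardless of what came before it.
order_independent = all(in_order[i] == shuffled[i] for i in in_order)
print(order_independent)
```

If the model kept a running tally of its past outputs, the two passes would disagree for at least some samples. With frozen weights there is nowhere for such a tally to live.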