by YeGoblynQueenne
1436 days ago
>> This leads to one of the key questions of deep learning, currently: Why do
neural networks prefer solutions that generalize to unseen data, rather than
settling on solutions which simply memorize the training data without actually
learning anything?

That's the researchers who prefer these solutions, not the networks. And that's
how the networks find them: because the experimenters have access to the test
data and they keep tuning their networks' parameters until they perfectly fit
not only the training, but also the _test_ data.

In that sense the testing data is not "unseen". The neural net doesn't "see" it
during training but the researchers do and they can try to improve the network's
performance on it, because they control everything about how the network is
trained, when it stops training, etc.

It's nothing to do with loss functions and the answers are not in the maths.
It's good, old researcher bias and it has to be controlled by other means,
namely, rigorous design _and description_ of experiments.
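
To make the mechanism concrete, here is a toy sketch of selecting on the test set. It is only an illustration, and everything in it is an assumption of mine: the labels are coin flips, so there is nothing to learn, and the "tuning" is just drawing random linear classifiers and keeping whichever one scores best on a small test set. The names (`make_data`, `select_on_test_set`) and all the sizes are made up for the example.

```python
import random


def make_data(n, dim, rng):
    """Gaussian features with coin-flip labels: there is no real signal to learn."""
    xs = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    ys = [rng.randint(0, 1) for _ in range(n)]
    return xs, ys


def predict(w, x):
    """Linear classifier: predict 1 if the dot product w.x is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0


def accuracy(w, xs, ys):
    """Fraction of examples the classifier w labels correctly."""
    return sum(predict(w, x) == y for x, y in zip(xs, ys)) / len(ys)


def select_on_test_set(n_trials=500, n_test=50, n_fresh=5000, dim=20, seed=0):
    """Keep whichever random classifier scores best on the test set.

    Returns (accuracy on the test set used for selection,
             accuracy of the same classifier on fresh, never-consulted data).
    """
    rng = random.Random(seed)
    test_x, test_y = make_data(n_test, dim, rng)
    fresh_x, fresh_y = make_data(n_fresh, dim, rng)

    best_w, best_acc = None, -1.0
    for _ in range(n_trials):
        # Each trial stands in for one more round of tuning by the experimenter.
        w = [rng.gauss(0, 1) for _ in range(dim)]
        acc = accuracy(w, test_x, test_y)
        if acc > best_acc:
            best_w, best_acc = w, acc

    return best_acc, accuracy(best_w, fresh_x, fresh_y)


if __name__ == "__main__":
    test_acc, fresh_acc = select_on_test_set()
    print(f"accuracy on the test set used for selection: {test_acc:.2f}")
    print(f"accuracy on fresh data:                      {fresh_acc:.2f}")
```

The selected classifier should look clearly better than chance on the test set it was picked on, and sit at about chance on the fresh data, even though no classifier ever "saw" a test label during training. The selection loop did.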
https://blockgeni.com/how-to-hill-climb-the-test-set-for-mac...
One benchmark I know where the test set is completely hidden is François Chollet's ARC dataset, and that's done precisely to preclude overfitting to the test set.