That's the point. 100 parameters is sufficient to overfit, and it's a number that's independent of the input size. Do you have a reference for your statement?
Reference for what exactly? That input dimensionality is important when determining net size? That seems quite self-explanatory; try training an image classifier with only 100 parameters.
Maybe I misunderstood the question, but regardless, even without early stopping, an NN would have more predictive power than the hash mapping. Both would be completely overfit to the training set, yet the NN would most likely still make some okay guesses on OOD data.