Hacker News
by montenegrohugo 1727 days ago
The point is that your simple mapping, with zero error on the training dataset, also has zero predictive power on both the test dataset and in real life. It has learned nothing; it sits at the extreme end of overfitting.

Input dimensionality is absolutely important when determining net size.
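To put a rough number on that, here is a minimal sketch that counts the weights in a single fully connected layer reading raw pixels. The function name `dense_param_count` and the two image sizes are my own illustrative choices, not anything from the thread. It also assumes a dense first layer; convolutional layers share weights, so their parameter count does not grow with image size in the same way.

```python
def dense_param_count(input_dim, units):
    """Weights plus biases for one fully connected layer (hypothetical helper)."""
    return input_dim * units + units

# Illustrative input sizes (assumptions, not taken from the thread).
small_input = 28 * 28          # 28x28 grayscale image, flattened
large_input = 224 * 224 * 3    # 224x224 RGB image, flattened

# Even a single output unit connected to every pixel exceeds 100 parameters.
small_params = dense_param_count(small_input, 1)
large_params = dense_param_count(large_input, 1)

print(small_params)  # 785
print(large_params)  # 150529
```

Under that assumption, a 100-parameter budget is spent before one unit has seen the whole image, which is the sense in which input dimensionality constrains net size here.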

2 comments

Seems like you two are talking past each other. They were responding to the erroneous claim that "input dimensionality" is equivalent to data. What the first poster referred to as "internal data points" may be better described as a presumption of differentiability: a small perturbation of the pixels should result in only a "small" change in the labels. But it was ridiculous to claim that the total number of pixels is somehow a meaningful measure of sample size. The pixels are not independent, as dramatized by the hash map example given above.
That's the point. 100 parameters is sufficient to overfit, and it's a number that's independent of the input size. Do you have a reference for your statement?
Reference for what exactly? That input dimensionality is important when determining net size? That seems quite self-explanatory; try training an image classifier with only 100 parameters.

Maybe I misunderstood the question, but regardless, even without early stopping, a NN would have more predictive power than the hash mapping. Both would be completely overfit on the training data set, yet the NN would most likely still make some okay guesses on out-of-distribution (OOD) data.
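A toy sketch of the contrast, with loud caveats: the data is synthetic 2-D points with a made-up labeling rule, the "hash mapping" is a plain dict that falls back to guessing 0 on unseen keys, and a 1-nearest-neighbor rule stands in for "a model that assumes nearby inputs share labels". It is not a neural network and its test points are merely unseen rather than truly out-of-distribution, so it only illustrates the general idea that two models can both fit the training set perfectly and still differ on new points.

```python
import random

def true_label(point):
    # Hypothetical ground truth: which side of the line x + y = 1 the point is on.
    x, y = point
    return 1 if x + y > 1.0 else 0

def make_points(n, rng):
    return [(rng.random(), rng.random()) for _ in range(n)]

def accuracy(predict, points):
    hits = sum(predict(p) == true_label(p) for p in points)
    return hits / len(points)

rng = random.Random(0)
train = make_points(300, rng)
test = make_points(300, rng)

# "Hash mapping": memorize every training point exactly.
table = {p: true_label(p) for p in train}

def lookup_predict(point):
    # Unseen points miss the table, so fall back to a fixed guess.
    return table.get(point, 0)

def nearest_predict(point):
    # Copy the label of the closest training point (squared Euclidean distance).
    nearest = min(
        train,
        key=lambda q: (q[0] - point[0]) ** 2 + (q[1] - point[1]) ** 2,
    )
    return table[nearest]

lookup_train = accuracy(lookup_predict, train)
lookup_test = accuracy(lookup_predict, test)
nearest_train = accuracy(nearest_predict, train)
nearest_test = accuracy(nearest_predict, test)

print("lookup  train/test:", lookup_train, lookup_test)
print("nearest train/test:", nearest_train, nearest_test)
```

Both predictors reproduce every training label, but only the one that exploits closeness between inputs should do much better than a constant guess on the held-out points.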