by sendtown_expwy
1727 days ago
You are incorrect about input dimensionality mattering. Say you have 100 high-res images with yes/no labels. If you hash each image and store its label in a hashmap, you can call this a "learned" function with 100 parameters that achieves zero training error on the dataset. That parameter count is independent of the input dimension. Why would this change when the hashmap is replaced by a smooth neural network mapping?

GPT is trained to predict its input (estimating p(x)) rather than to predict a label given an input (p(y|x)). So in GPT's case the input itself, dimension by dimension, serves as the "label", as another responder has mentioned. ImageNet classification is different (setting aside recent semi-supervised and unsupervised approaches to image recognition). As the article says, the ability to generalize in the typical ImageNet setting is a byproduct of SGD with early stopping, which in practice limits the number of functions a deep neural network can express. An analysis that considers only parameter count misses this.
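A minimal Python sketch of the hashmap argument, assuming random byte strings stand in for the 100 high-res images and SHA-256 as the hash; the helper names (`image_key`, `fit`, `predict`, `make_dataset`, `run`) are hypothetical. It shows the number of stored labels staying at 100, with zero training error, whether each input is 3 KB or 192 KB.

```python
import hashlib
import random


def image_key(image: bytes) -> str:
    """Hash raw image bytes to a fixed-size key, whatever the input size."""
    return hashlib.sha256(image).hexdigest()


def fit(images, labels):
    """'Train' by memorizing one stored label (one parameter) per training image."""
    return {image_key(img): lab for img, lab in zip(images, labels)}


def predict(table, image):
    # Returns None for images never seen in training: pure memorization, no generalization.
    return table.get(image_key(image))


def make_dataset(n, num_bytes, seed=0):
    """Hypothetical stand-in data: n random 'images' with random yes/no labels."""
    rng = random.Random(seed)
    images = [rng.getrandbits(8 * num_bytes).to_bytes(num_bytes, "little") for _ in range(n)]
    labels = [rng.choice([True, False]) for _ in range(n)]
    return images, labels


def run(side, n=100):
    """Fit on n images of side x side RGB (1 byte per channel, an assumption)."""
    num_bytes = side * side * 3
    images, labels = make_dataset(n, num_bytes)
    table = fit(images, labels)
    errors = sum(predict(table, img) != lab for img, lab in zip(images, labels))
    return num_bytes, len(table), errors


for side in (32, 256):
    num_bytes, params, errors = run(side)
    print(f"input bytes={num_bytes:>7}  parameters={params}  training errors={errors}")
```

The lookup table says nothing about unseen images, which is the point: zero training error with a parameter count set by the number of examples, not by the input size.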
Input dimensionality is absolutely important when determining net size.