by yldedly
1730 days ago
> In practice, these huge models are, in layman's terms, fucking awesome and work really well, e.g. they generalize and work in production. No one understands why.

To add nuance to this: these models are awesome at interpolation, but not so much at extrapolation. In other words, they generalize very well to an IID test set, but don't generalize under (even slight) distribution shift. The main reason is that these models tend to solve classification and regression problems quite differently from how humans do.

Broadly speaking, a large, flexible NN will find a "shortcut", i.e. a simple relation between some part of the input and the output, which may not be informative in the way we want, such as a watermark in the corner of an image, or statistical regularities in textures that disappear under slightly different lighting conditions. See e.g. https://thegradient.pub/shortcuts-neural-networks-love-to-ch...

I think it's fair to say that these models are great when you have an enormous dataset that covers the entire domain, but sub-Google-scale problems are usually still solved by underparameterized models (even at Google).
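To make the shortcut point concrete, here is a toy sketch (my own construction, not taken from the linked article). It assumes a two-feature problem: a noisy "signal" feature that really does depend on the label, and a hypothetical "watermark" feature that matches the label exactly in the training distribution but is random after the shift. A plain logistic regression stands in for the large NN, and the names `make_data`, `train_logreg` and `accuracy` are made up for illustration.

```python
import numpy as np

def make_data(n, watermark_matches_label, rng):
    """Two features: a noisy true signal and a 'watermark' shortcut."""
    y = rng.integers(0, 2, size=n)
    # True signal: mean -1 or +1 depending on the label, with heavy noise.
    signal = (2 * y - 1) + rng.normal(0.0, 2.0, size=n)
    if watermark_matches_label:
        # Training-like distribution: the watermark equals the label.
        watermark = y.astype(float)
    else:
        # Shifted distribution: the watermark is independent of the label.
        watermark = rng.integers(0, 2, size=n).astype(float)
    return np.column_stack([signal, watermark]), y

def train_logreg(X, y, lr=0.5, steps=2000):
    """Logistic regression fitted with full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def accuracy(w, b, X, y):
    pred = (X @ w + b > 0).astype(int)
    return float((pred == y).mean())

rng = np.random.default_rng(0)
X_train, y_train = make_data(5000, True, rng)
X_iid, y_iid = make_data(5000, True, rng)       # same distribution as training
X_shift, y_shift = make_data(5000, False, rng)  # watermark no longer informative

w, b = train_logreg(X_train, y_train)
iid_acc = accuracy(w, b, X_iid, y_iid)
shifted_acc = accuracy(w, b, X_shift, y_shift)

print("weights [signal, watermark]:", w)
print("IID test accuracy:    ", iid_acc)
print("shifted test accuracy:", shifted_acc)
```

In this setup the fitted model leans mostly on the watermark, so it looks excellent on the IID test set and falls toward chance once the watermark stops tracking the label, even though the signal feature is just as informative as before. It's only a caricature of what happens in image models, but the mechanism is the same.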
Maybe your point stands, and it’s just that some domains need less data, just saying.