|
A quick summary/translation for those of us who don't speak ML. We keep hearing about these giant models like GPT-3 with 175 billion parameters. Parameters are the things that change when we train a model; you can think of them as degrees of freedom.
If you have a lot of parameters, theory led us to believe that the model would just "overfit" the training data, i.e. memorize it. That's bad, because when new data comes in in production, we'd expect the model to fail to "generalize" to it, i.e. to make accurate predictions on data it hasn't seen before, because it has just memorized the training data instead of uncovering the "guiding principles" of the data, so to speak. In practice, these huge models are, in layman's terms, fucking awesome and work really well, i.e. they generalize and work in production. No one understands why. This paper is a survey or overview of what "too many parameters" means, and of all the research into why these models work even though they shouldn't. |
You can solve classification with a hash function: Hash the image, and then just memorize which label goes with which hash. You can try to dodge this obviously dodgy solution by adding augmentation to the dataset. Then you instead learn to find a representation invariant under the set of augmentations, and learn the hash of that representation. It turns out these augmentation-invariant representations are actually pretty good, so we can solve the classification problem in what looks like a general way.
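To make the hash trick concrete, here is a minimal sketch, not anyone's real training code. `HashMemorizer`, `raw_key`, `invariant_key`, and the four-byte toy "images" are all made up for illustration, and the hand-written brightness bucket only stands in for a representation that a real model would have to learn from augmented data.

```python
import hashlib


def raw_key(image_bytes):
    # Exact-match key: flipping a single bit gives a completely different hash.
    return hashlib.sha256(image_bytes).hexdigest()


def invariant_key(image_bytes):
    # Hypothetical stand-in for a learned augmentation-invariant representation:
    # bucket the mean pixel value into "dark" (0) or "bright" (1).
    return (sum(image_bytes) // len(image_bytes)) // 128


class HashMemorizer:
    """Lookup-table 'classifier': memorizes which label goes with which key."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.table = {}

    def fit(self, images, labels):
        for image, label in zip(images, labels):
            self.table[self.key_fn(image)] = label

    def predict(self, image):
        # Returns None for any key that was never seen during fit().
        return self.table.get(self.key_fn(image))


def accuracy(model, images, labels):
    correct = sum(model.predict(i) == l for i, l in zip(images, labels))
    return correct / len(images)


# Toy data: the test images are the training images with one pixel nudged,
# which plays the role of a tiny augmentation.
train_images = [bytes([0, 0, 0, 0]), bytes([255, 255, 255, 255])]
train_labels = ["dark", "bright"]
test_images = [bytes([0, 0, 0, 1]), bytes([255, 255, 255, 254])]
test_labels = ["dark", "bright"]

memorizer = HashMemorizer(raw_key)
memorizer.fit(train_images, train_labels)
train_acc = accuracy(memorizer, train_images, train_labels)
test_acc = accuracy(memorizer, test_images, test_labels)

invariant = HashMemorizer(invariant_key)
invariant.fit(train_images, train_labels)
invariant_test_acc = accuracy(invariant, test_images, test_labels)
```

The raw-hash table is perfect on the training set and useless on the nudged test images, while the same lookup table keyed on the invariant representation handles them fine. All of the generalization lives in the representation, not in the table.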
However, there are many other classes of problems where the hash shortcut isn't available, because the information density of the outputs is too high to memorize in the same way. Specifically, this covers generative models and the sorts of predictive/infill problems used for self-supervision. In these spaces, the problems are more like: "Given this pile of augmented input, generate half a megabyte of coherent output." These kinds of problems simply don't overfit: train a speech separation model on a big dataset, and the train+eval quality metrics will just asymptote their way up and to the right until you run out of training budget.
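A rough back-of-the-envelope version of the information density point, as a sketch only. The 1000-class label set is an assumption picked for illustration, and counting raw bytes overstates the true information content of a coherent output, so treat the ratio as an order-of-magnitude picture rather than a measurement.

```python
import math


def bits_per_label(num_classes):
    # Bits needed to name one class out of num_classes equally likely options.
    return math.log2(num_classes)


def bits_per_output(num_bytes):
    # Raw size of a generated output in bits (an upper bound, not its entropy).
    return num_bytes * 8


label_bits = bits_per_label(1000)          # assumed 1000-way classification
output_bits = bits_per_output(512 * 1024)  # "half a megabyte" of output
ratio = output_bits / label_bits

print(f"label: {label_bits:.1f} bits, output: {output_bits} bits, ratio: {ratio:,.0f}x")
```

A class label is around ten bits per example, so a lookup table is cheap. The generated output is millions of raw bits per example, which is the gap the comment above is pointing at.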