|
My big beef with a lot of the 'leading edge' ML research is that it tends to be waaaaay too focused on classification problems, and ImageNet in particular. And, last I checked, you /do/ still fight with overfitting in classification models, by cleverly choosing learning rate schedules and using early stopping schemes, 'double descent' be damned. You can solve classification with a hash function: Hash the image, and then just memorize which label goes with which hash. You can try to dodge this obviously dodgy solution by adding augmentation to the dataset. Then you instead learn to find a representation invariant under the set of augmentations, and learn the hash of that representation. It turns out these augmentation-invariant representations are actually pretty good, so we can solve the classification problem in what looks like a general way. However, there are many other classes of problems where the hash problem doesn't exist, because the information density of the outputs is too high to memorize in the same way. Specifically, generative models, and the sorts of predictive/infill problems used for self-supervision. In these spaces, the problems are more like: "Given this pile of augmented input, generate half a megabyte of coherent output." These kinds of problems simply don't overfit: Train a speech separation model on a big dataset, and the train+eval quality metrics will just asymptote their way up and to the right until you run out of training budget. |