So the article mentions "regularization" as the secret ingredient to get to a generalized solution, but they don't explain it. Does someone know what that is? Or is it an industrial secret of OpenAI?
Regularization as a concept is taught in introductory ML classes. A simple example is L2 regularization: you add to your loss function the sum of squares of the parameters (times some constant k). The parameter values then have to trade off modeling the training data well against keeping this penalty small, which (hopefully!) reduces overfitting.
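To make that concrete, here is a minimal sketch in plain Python, assuming a toy linear model with mean squared error as the data loss. The function names (`l2_penalty`, `regularized_loss`, `fit_1d`) and the data are made up for illustration; this is not how any particular production model is trained.

```python
def l2_penalty(weights, k):
    """k times the sum of squared parameters."""
    return k * sum(w * w for w in weights)


def regularized_loss(weights, xs, ys, k):
    """Mean squared error of a bias-free linear model, plus the L2 penalty."""
    data_loss = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x))
        data_loss += (pred - y) ** 2
    data_loss /= len(xs)
    return data_loss + l2_penalty(weights, k)


def fit_1d(xs, ys, k):
    """Weight minimizing mean((w*x - y)^2) + k*w^2 for a single feature.

    Setting the derivative to zero gives
    w = sum(x*y) / (sum(x*x) + n*k).
    """
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * k)


# Toy data generated by y = 2x (an assumption for the demo).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_1d(xs, ys, k=0.0)  # no penalty: fits the data exactly
w_reg = fit_1d(xs, ys, k=1.0)    # penalty pulls the weight toward zero
print(w_plain, w_reg)
```

With k = 0 the fitted weight matches the data exactly; with k > 0 it gets shrunk toward zero, and that shrinkage is the "competition" described above. On clean toy data like this the penalty only hurts the fit; the payoff shows up on noisy data with many parameters.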
The specific regularization techniques that any one model is trained with may not be publicly revealed, but OAI hardly deserves credit for the concept.