
This is about new research, which mostly lives in papers and in articles (like this one) about the papers. It won't show up in introductory books for a while, so if you're unwilling to read or even look at papers, you won't be able to understand the details of new research.

Scaling a model is just what it sounds like: more data fed into a bigger network with more parameters. The gist of what this article says about scaling is that there's no sign of diminishing returns yet, either in what the network can do or in how well it generalises, as the number of parameters increases. The "more parameters = better performance" trend continues all the way up to the enormous size of the full GPT-3 model, with no indication that even bigger models won't perform better still.

Here is the GPT-3 paper: https://arxiv.org/pdf/2005.14165.pdf

If you really want to understand, skim this, and focus especially on the graphs, as they show the scaling. The x axis is usually model size, and the y axis is mostly accuracy or "loss" (~error).
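
If it helps to make "the trend continues" concrete, here is a rough sketch of how you could check it yourself from a handful of (model size, loss) points read off a graph like those. It assumes the loss roughly follows a power law, loss = a * size^(-b), which is a straight line on log-log axes. The numbers below are synthetic and made up for illustration, not taken from the paper, and `fit_power_law` is just a name I picked.

```python
import math

def fit_power_law(sizes, losses):
    """Least-squares fit of loss = a * size**(-b), done as a straight line in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept for y = slope * x + intercept.
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

# Synthetic points (assumption: NOT data from the GPT-3 paper).
sizes = [1e8, 1e9, 1e10, 1e11]              # parameter counts
losses = [5.0 * s ** -0.05 for s in sizes]  # generated from a known power law

a, b = fit_power_law(sizes, losses)
# Extrapolate the fitted line one order of magnitude past the largest model.
predicted = a * 1e12 ** -b
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e12 params={predicted:.3f}")
```

If the real points stay close to the fitted line all the way out to the largest model, that is what "no sign of diminishing returns yet" looks like; if they start bending away from it at the high end, the trend is flattening.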