by claytoneast
2147 days ago
from Gwern, @ https://www.gwern.net/newsletter/2020/05:
"This year, GPT-3 is scary because it’s a magnificently obsolete architecture from early 2018, which is small & shallow compared to what’s possible, with a simple uniform architecture trained in the dumbest way possible (unidirectional prediction of next text token) on a single impoverished modality (random Internet HTML text dumps) on tiny data (fits on a laptop), sampled in a dumb way, and yet, the first version already manifests crazy runtime meta-learning—and the scaling curves still are not bending!" It's probably not a state-of-the-art breakthrough at this point. Who knows what OpenAI has done in the intervening two years?