by Herring
151 days ago
Note I didn't say Karpathy's nanoGPT, I said use the speedrun. Transformers are universal function approximators. When well-tuned, they often start to approximate other innovations. Not always, thank god, but often enough that you have to be careful.