by simonw
1232 days ago
This article is an absolutely fantastic introduction to GPT models - I think the clearest I've seen anywhere, at least for the first section that talks about generating text and sampling. Then it got to the training section, which starts "We train a GPT like any other neural network, using gradient descent with respect to some loss function". It's still good from that point on, but it's not as valuable as a beginner's introduction.
It doesn't go into the math but I don't think that's a bad thing for beginners.
If you want the math, 3Blue1Brown has a great series of videos [3] on the topic.
[1] https://www.youtube.com/watch?v=hBBOjCiFcuo&t=1932s
[2] https://github.com/fastai/fastbook/blob/master/04_mnist_basi...
[3] https://www.youtube.com/watch?v=aircAruvnKk
* I've been messing around with this stuff since 2016 and have done a few different courses like the original Andrew Ng course and more.