by aquafox
211 days ago
Having gone through the explanations in the Transformer Explainer [1], I now have a good intuition for GPT-2. Is there a resource that gives similar intuition for what has changed since then to improve things like approaching a problem more conceptually, being better at coding, or suggesting next steps when asked? I have a feeling this is the result of more than just increasing the number of transformer blocks, heads, and the embedding dimension.

[1] https://poloclub.github.io/transformer-explainer/

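The exceptions are improvements in context length and inference efficiency, as well as modality support. Those are architectural. But behavioral changes are almost always down to scale, pretraining data, SFT (supervised fine-tuning), RLHF (reinforcement learning from human feedback), and RLVR (reinforcement learning with verifiable rewards).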
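
To make the "scale" item a bit more concrete, here is a rough sketch of how a GPT-2 style parameter count follows from the number of blocks and the embedding dimension. The function name `gpt2_param_count` is made up for illustration, and the layout it assumes (learned position embeddings, a 4x MLP, biases everywhere, output head tied to the token embedding) is the standard GPT-2 one, not necessarily what newer models use. Note that the head count does not appear at all: splitting attention into more heads reshapes the same weight matrices without adding parameters.

```python
def gpt2_param_count(n_layer, d_model, vocab_size=50257, n_ctx=1024):
    """Approximate parameter count for a GPT-2 style decoder.

    Assumes the GPT-2 layout: tied input/output embeddings, learned
    position embeddings, MLP hidden size of 4 * d_model, and biases
    on every linear layer and layer norm.
    """
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * d_model + n_ctx * d_model

    # Attention: fused Q/K/V projection, then the output projection.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)

    # MLP: expand to 4 * d_model, then project back down.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)

    # Two layer norms per block, each with a scale and a bias vector.
    layer_norms = 2 * 2 * d_model

    # Final layer norm after the last block.
    final_ln = 2 * d_model

    return embeddings + n_layer * (attn + mlp + layer_norms) + final_ln


if __name__ == "__main__":
    # GPT-2 small: 12 blocks, embedding dimension 768.
    print(gpt2_param_count(n_layer=12, d_model=768))  # 124439808
    # GPT-2 medium: 24 blocks, embedding dimension 1024.
    print(gpt2_param_count(n_layer=24, d_model=1024))
```

The per-block cost grows with the square of the embedding dimension, which is why widening a model gets expensive much faster than deepening it. None of this says anything about the behavioral side, which is the point of the list above.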