Hacker News
by quonn 1291 days ago
Don't know. Karpathy has a very compact implementation of GPT [0] using standard technology (it could be even more compact, but it reimplements the attention layer, for example, for teaching purposes). While he presumably has no access to exactly how the real model was trained, if there were more to it, I think he would be the kind of person to point it out.

[0] https://github.com/karpathy/minGPT/tree/master/mingpt