by desideratum
70 days ago
This is a great question. You definitely aren't training this to use it; you're training it to understand how things work. It's an educational project: if you're interested in experimenting with things like distributed training techniques in JAX, or preference optimisation, this gives you a minimal and hackable library to build on.
|
Karpathy's notes on improving nanochat [1] are one of my favorite blog-like things to read. It's really neat to see how much influence each feature has, and how the scaling laws evolve as you improve the architecture.
There's also modded-nanogpt, which turns the same kind of experimentation into a training speedrun (and maybe loses some rigor on the way) [2].
[1] https://github.com/karpathy/nanochat/blob/master/dev/LOG.md
[2] https://github.com/kellerjordan/modded-nanogpt