| HN Mirror

Yeah, I did! I trained a few small ones, mostly the “nano” and “tiny” templates (a few million parameters each), on datasets like Shakespeare and Alpaca. The goal was to confirm that the training loop, tokenizer, and evaluation all worked end to end.

I didn’t go for massive models; the focus was on making the whole setup process quick and reliable. You can actually train the nano one on a CPU in a few minutes just to see it working.
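For a rough sense of what “a few million parameters” means, here is a back-of-the-envelope count for a GPT-2-style decoder. It assumes learned position embeddings, biases everywhere, and an LM head tied to the token embedding. The config values (4 layers, 256-dim embeddings, a 65-character vocab, a 128-token context) are hypothetical and chosen only for illustration. They are not the actual template settings.

```python
from dataclasses import dataclass


@dataclass
class GPTConfig:
    # Hypothetical "nano"-sized config: illustrative values, not the real template.
    n_layer: int = 4
    n_embd: int = 256
    vocab_size: int = 65   # e.g. a character-level Shakespeare vocabulary (assumed)
    block_size: int = 128  # context length (assumed)


def count_params(cfg: GPTConfig) -> int:
    """Count parameters of a GPT-2-style decoder with biases and a tied LM head."""
    d = cfg.n_embd
    attn = (d * 3 * d + 3 * d) + (d * d + d)     # fused QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # expand to 4*d, then project back to d
    layernorms = 2 * (2 * d)                     # two LayerNorms per block (weight + bias)
    per_block = attn + mlp + layernorms          # works out to 12*d^2 + 13*d
    embeddings = cfg.vocab_size * d + cfg.block_size * d  # token + position embeddings
    final_ln = 2 * d                             # final LayerNorm before the tied LM head
    return cfg.n_layer * per_block + embeddings + final_ln


print(f"{count_params(GPTConfig()):,}")  # 3,208,960
```

Almost all of the count comes from the `12 * d^2` term per block. That is why small widths and a handful of layers keep a model small enough to train on a CPU.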