by theaniketgiri 242 days ago
Yeah, I did! I trained a few small ones, mostly the “nano” and “tiny” templates (a few million params), on datasets like Shakespeare and Alpaca. The goal was to make sure the training loop, tokenizer, and evaluation all worked end to end.

I didn’t go for massive models; it was more about making the whole setup process quick and reliable. You can actually train the nano one on a CPU in a few minutes just to see it working.
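
To give a rough idea of what that kind of end-to-end check looks like, here's a minimal sketch in plain Python. It is not the project's actual code: the character-level tokenizer, the bigram "model", and names like `build_tokenizer` and `smoke_test` are all made up for illustration. It just confirms the tokenizer round-trips and that the loss drops after a few hundred SGD steps on CPU.

```python
import math
import random


def build_tokenizer(text):
    """Hypothetical character-level tokenizer: returns (encode, decode, vocab_size)."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}

    def encode(s):
        return [stoi[c] for c in s]

    def decode(ids):
        return "".join(itos[i] for i in ids)

    return encode, decode, len(chars)


def softmax(row):
    """Numerically stable softmax over a list of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mean_loss(logits, pairs):
    """Evaluation: average cross-entropy over (previous char, next char) pairs."""
    return -sum(math.log(softmax(logits[a])[b]) for a, b in pairs) / len(pairs)


def smoke_test(text, steps=200, lr=0.5, seed=0):
    """Train a toy bigram model and return (loss before, loss after)."""
    rng = random.Random(seed)
    encode, decode, vocab = build_tokenizer(text)
    ids = encode(text)
    assert decode(ids) == text  # tokenizer must round-trip

    pairs = list(zip(ids, ids[1:]))
    # Toy stand-in for a real model: one row of logits per previous character.
    logits = [[0.0] * vocab for _ in range(vocab)]

    before = mean_loss(logits, pairs)
    for _ in range(steps):
        a, b = rng.choice(pairs)  # one SGD step on a single sampled pair
        probs = softmax(logits[a])
        for j in range(vocab):
            grad = probs[j] - (1.0 if j == b else 0.0)
            logits[a][j] -= lr * grad
    after = mean_loss(logits, pairs)
    return before, after


if __name__ == "__main__":
    before, after = smoke_test("to be or not to be, that is the question")
    print(f"loss before: {before:.3f}, after: {after:.3f}")
```

A real template would swap the bigram table for an actual transformer, but the shape of the check is the same: tokenize, train briefly, and confirm the evaluation loss moves in the right direction.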