Hacker News new | ask | show | jobs
by potamic 242 days ago
How did you test this? Did you train something?
1 comments

Yeah, I did! I trained a few small ones — mostly the “nano” and “tiny” templates (a few million params) on datasets like Shakespeare and Alpaca. The goal was to make sure the training loop, tokenizer, and evaluation all worked smoothly.

Didn’t go for massive models — more about making the whole setup process quick and reliable. You can actually train the nano one on CPU in a few minutes just to see it working.