Hacker News
by pstorm 820 days ago
I'm surprised this isn't getting more love. I love the concept of finetuned, hyper-specific, tiny LLMs. Of course, the data is the most important part.

Thanks for the kind words! I started with the 780M-parameter flan-t5-large model and kept trying smaller and smaller base models; I was shocked at how good the output was at 77M. As you go smaller, though, it's much easier to accidentally overfit or collapse the model into producing gibberish. I had to be very careful with hyperparameters and with sanitizing/filtering the dataset.
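For readers wondering what "sanitizing / filtering the dataset" might involve, here is a minimal sketch of generic cleanup for (input, target) fine-tuning pairs. It is not the commenter's actual pipeline. The function names, the 512-character cap, and the 0.5 repetition threshold are all hypothetical choices. It normalizes whitespace, drops empty, overlong, and duplicate pairs, and includes a crude distinct-token ratio as one cheap way to flag repetitive, collapsed-looking outputs.

```python
from typing import Iterable


def filter_pairs(pairs: Iterable[tuple[str, str]], max_chars: int = 512) -> list[tuple[str, str]]:
    """Hypothetical cleanup: drop empty, overlong, and duplicate (input, target) pairs."""
    seen: set[tuple[str, str]] = set()
    kept: list[tuple[str, str]] = []
    for src, tgt in pairs:
        # Collapse runs of whitespace and strip the ends.
        src, tgt = " ".join(src.split()), " ".join(tgt.split())
        if not src or not tgt:
            continue  # empty input or target
        if len(src) > max_chars or len(tgt) > max_chars:
            continue  # assumed length cap (hypothetical value)
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # case-insensitive exact duplicate
        seen.add(key)
        kept.append((src, tgt))
    return kept


def distinct_token_ratio(text: str) -> float:
    """Unique whitespace tokens / total tokens; low values suggest repetitive output."""
    tokens = text.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def looks_collapsed(text: str, threshold: float = 0.5) -> bool:
    """Flag output as possibly collapsed (hypothetical threshold)."""
    return distinct_token_ratio(text) < threshold
```

A check like `looks_collapsed` is only a heuristic. It is a quick way to spot-check generations from a small model during training, not a substitute for reading samples or tracking validation loss.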