Hacker News
by fzimmermann89 289 days ago
Also, for autocomplete I think a small LLM trained from scratch should already work well. Have you tried the TinyStories (also only 3 GB) or nanoGPT speedrun setups, without any fancy loss terms etc., as a baseline?