by maccam912 536 days ago
Is there any rule of thumb for small language models vs large language models? I've seen Phi-4 called a small language model, but with 14 billion parameters it's larger than some large language models.
3 comments

7B to 9B is usually what we call small. The rule of thumb is a model that you can run on a single GPU.
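To make the single-GPU rule of thumb concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, 2 bytes per parameter corresponds to 16-bit weights, the 1.2 overhead factor is a guess standing in for activations and KV cache, and 24 GiB is just an example card size.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bytes_per_param / 2**30


def fits_on_gpu(
    n_params: float,
    gpu_gib: float,
    bytes_per_param: float = 2.0,
    overhead: float = 1.2,  # assumed headroom for activations and KV cache
) -> bool:
    """Rough check: do the weights plus assumed overhead fit in GPU memory?"""
    return weight_memory_gib(n_params, bytes_per_param) * overhead <= gpu_gib


if __name__ == "__main__":
    gpu_gib = 24  # hypothetical single-GPU memory budget
    for name, n_params in [("7B", 7e9), ("9B", 9e9), ("14B", 14e9)]:
        for label, bpp in [("16-bit", 2.0), ("4-bit", 0.5)]:
            gib = weight_memory_gib(n_params, bpp)
            ok = fits_on_gpu(n_params, gpu_gib, bpp)
            print(f"{name} {label}: ~{gib:.1f} GiB weights, fits in {gpu_gib} GiB: {ok}")
```

Under these assumptions a 7B model at 16-bit fits on a 24 GiB card, while a 14B model only fits once it is quantized, so where "small" ends depends on the precision and the GPU you have in mind.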
It’s not a useful distinction. The first LLMs had fewer than 1 billion parameters anyway.
I would claim that even 500 million parameters could be considered large.