by baalimago
42 days ago
Yeah, it's just a semantic pet peeve. Let me ask you this: what is a "Language Model", if this is a "Large Language Model"? Conversely, if a 1.5B model is "Large", then what are the recent 1T-param models? "Superlarge"? In my own very humble opinion, a model becomes "Large" when it no longer fits on non-specialized hardware. So currently, a model which requires more than 32GB of VRAM is large (as that's roughly where the high-end gaming GPUs cut off). And btw, there is no way you can train a language model on a CPU, even with DDR5, unless you're willing to wait a whole week for a single training cycle. Give it a go! I know I did, it's an order of magnitude away from being feasible.
I'm not sure. Microsoft calls Phi-4 a small language model, so the distinction is considered meaningful by some people working in the space. My own view is that, in 2026, the term "LLM" implies something about the capabilities of the model. Maybe there's no hard definition of the term, but whatever the definition is, the model in the article wouldn't meet it.