Hacker News new | ask | show | jobs
by rhdunn 123 days ago
Mistral have small variants (3B, 8B, 14B, etc.), as do others like IBM Granite and Qwen. Then there are finetunes based on these models, depending on your workflow/requirements.
1 comments

True, but anything remotely useful is 300B and above
That is a very broad and silly position to take, especially in this thread.

I use Devstral 2 and Gemini 3 daily.

Devstral 2 is 123B parameters. Thats less than 300B, but its still much larger than the 3-14B models GP was talking about.