Hacker News
by asabla 260 days ago
I'm always confused by those statements as well, because just like you, I feel that the 20B version is really good at following instructions.

Some of the Qwen models are too, but they seem to need a bit more handholding.

This is of course just anecdotal on my end, and I've been slacking on keeping up with evals while testing at home.