by c0brac0bra 406 days ago
What tasks have you found the 0.6B model useful for? The hallucination that's apparent during its thinking process raised a big red flag for me.

Conversely, the 4B model actually seemed to work really well and gave results comparable to Gemini 2.0 Flash (at least in my simple tests).

2 comments

You can use 0.6B as the draft model for speculative decoding with the larger models. It'll speed up 32B, but it slows down 30B-A3B dramatically.
It's okay for extracting simple things like addresses, or for formatting text with some input data, like a more advanced form of mail merge.

I haven't evaluated these tasks, so YMMV. I'm exploring other possibilities as well. I suspect it might be decent at autocomplete, and it's small enough that one could consider fine-tuning it on a codebase.
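
For anyone unfamiliar with the speculative decoding idea mentioned above, here is a rough toy sketch of the greedy variant. Everything in it is made up for illustration: `toy_target` and `toy_draft` are plain functions standing in for a large model and a small draft model, and `speculative_decode` is a hypothetical helper, not any inference library's API. Real implementations verify all draft tokens in a single batched forward pass of the large model; the sketch only counts those passes.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Greedy speculative decoding. Returns (tokens, target verification passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target checks every proposed position. A real system does
        #    this in one batched forward pass, so we count it as one pass.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token, discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token matched; the same pass yields one bonus token.
            if len(out) + len(accepted) < goal:
                accepted.append(target(out + accepted))
        out.extend(accepted)
    return out, passes


def toy_target(prefix: List[int]) -> int:
    # Toy "large model": counts upward modulo 5.
    return (prefix[-1] + 1) % 5


def toy_draft(prefix: List[int]) -> int:
    # Toy "small model": agrees with the target except after token 3.
    return 0 if prefix[-1] == 3 else (prefix[-1] + 1) % 5


if __name__ == "__main__":
    tokens, passes = speculative_decode(toy_target, toy_draft, [0], n_new=12)
    print(tokens, passes)
```

The output always matches plain greedy decoding with the target; the draft only changes how many target passes are needed. That is also a plausible reason for the 30B-A3B slowdown, though I haven't measured it: going by the name, that model has only about 3B active parameters per token, so the target is already cheap and the draft overhead plus rejected tokens can outweigh the saved passes.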