Hacker News
by drnick1 1 hour ago
Consider this. One of the smallest Qwen models (4B parameters) powers my home automation voice assistant, and runs on CPU alone at >20 tok/s. It is enough for that use case, and could be made even better/faster with a modest GPU. It isn't as smart as some cloud-connected thingamajig, but I would never allow a literal Google or Amazon bug in my home. Huge SOTA models aren't relevant everywhere. Most people use LLMs for rather trivial tasks such as finding typos or drafting text.
1 comment

Curious, what exactly does it do for you? I've had bad luck getting these small models to do anything useful, tbh.