| HN Mirror

> IMHO the end game is on-device speech recognition and anything streaming audio somewhere else for processing is delaying this result.

Why? There's practically no latency to a central hub on the local network. A Raspberry Pi is probably over-specced for this, but I do very much see value in buying 5 $20 speaker/mic streaming stations and a $200 hub instead of buying 5 $100 Raspberry Pis.

If anything, I would expect the streaming to a hub solution to respond faster than the locally processed variant. My wifi latency is ~2ms, and my PC will run Whisper way more than 2ms faster than a Raspberry Pi. Add in running a model and my PC runs circles around a Raspberry Pi.

> I ran Dragon Dictate on a 200MHz PC in the 1990s. Now that wasn't top quality, but it should have been good enough for voice assistants. I expect at least that quality on-device with an R-Pi today if not better.

You should get that via Whisper. I haven't used Dragon Dictate, but Whisper works very well. I've never trained it on my voice, and it rarely struggles outside of things that aren't "real words" (and even then it does a passable job of trying to spell it phonetically).

No idea about resources. It's such an unholy pain in the ass to get working in the same process as LLMs on Windows that I'm usually just ecstatic it will run. One of these days I'll figure out how to do all the passthroughs to WSL so I can yet again pretend I'm not using Windows while I use Windows (lol).