| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by baobun 766 days ago

Check out Rhasspy.

You won't get anything practically useful running LLMs on the 4B but you also don't strictly need LLM-based models.

In the Rhasspy community, a common pattern is to do (cheap and lightweight) wake-word detection locally on mic-attached satellites (here 4B should be sufficient) and then stream the actual recording (more computational resources for better results) over the local network to a central hub.

3 comments

phkahler 765 days ago

>> and then stream the actual recording (more computational resources for better results) over the local network to a central hub.

This frustrates me. I ran Dragon Dictate on a 200MHz PC in the 1990s. Now that wasn't top quality, but it should have been good enough for voice assistants. I expect at least that quality on-device with an R-Pi today if not better.

IMHO the end game is on-device speech recognition and anything streaming audio somewhere else for processing is delaying this result.

link

everforward 765 days ago

> IMHO the end game is on-device speech recognition and anything streaming audio somewhere else for processing is delaying this result.

Why? There's practically no latency to a central hub on the local network. A Raspberry Pi is probably over-specced for this, but I do very much see value in buying 5 $20 speaker/mic streaming stations and a $200 hub instead of buying 5 $100 Raspberry Pis.

If anything, I would expect the streaming to a hub solution to respond faster than the locally processed variant. My wifi latency is ~2ms, and my PC will run Whisper way more than 2ms faster than a Raspberry Pi. Add in running a model and my PC runs circles around a Raspberry Pi.

> I ran Dragon Dictate on a 200MHz PC in the 1990s. Now that wasn't top quality, but it should have been good enough for voice assistants. I expect at least that quality on-device with an R-Pi today if not better.

You should get that via Whisper. I haven't used Dragon Dictate, but Whisper works very well. I've never trained it on my voice, and it rarely struggles outside of things that aren't "real words" (and even then it does a passable job of trying to spell it phonetically).

No idea about resources. It's such an unholy pain in the ass to get working in the same process as LLMs on Windows that I'm usually just ecstatic it will run. One of these days I'll figure out how to do all the passthroughs to WSL so I can yet again pretend I'm not using Windows while I use Windows (lol).

link

genewitch 765 days ago

in the same vein there was a product called "Copernic Summarizer" that was functionally equivalent to asking chatgpt to summarize an article - 20 years ago.

link

BizarroLand 765 days ago

I have a decent home server, is there a system like this where the heavy lifting can be run on my home server but the front end piped to a ras-pi with speaker/camera, etc. ?

If not, any good pointers to start adapting this platform to do that?

link

shaan7 765 days ago

I've been a happy Rhasspy user and now even more excited for the future because I'm hoping the conversational style (similar to what OpenAI demo'd yesterday) will eventually come to the offline world as well. It is okay if I gotta buy a GPU to make that happen, or maybe we get luckier and it wont? Ok, maybe I'm getting too optimistic now

link

nirav72 765 days ago

Someone posted this on the HA sub yesterday - https://www.youtube.com/watch?v=4W3_vHIoVgg

He's using GPT 4o. Although not the stuff that was demo'ed yesterday, since it hasn't been widely rolled out yet. But I think its entirely possible in the near future.

link

shaan7 764 days ago

Awesome. Side note: now my living room lights are white because I listened to this on speakers lol. I have the same wakeword xD

link