Hacker News new | ask | show | jobs
by pawelduda 762 days ago
All I need is a voice assistant that: - RPi 4 can handle, - I can integrate with HomeAssistant, - is offline only, and doesn't send my data anywhere.

This project seems to be ticking most, if not all of the boxes, compared to anything else I've seen. Good job!

While at it, can someone drop a recommendation for a Rpi-compatible mic for Alexa-like usecase?

4 comments

Check out Rhasspy.

You won't get anything practically useful running LLMs on the 4B but you also don't strictly need LLM-based models.

In the Rhasspy community, a common pattern is to do (cheap and lightweight) wake-word detection locally on mic-attached satellites (here 4B should be sufficient) and then stream the actual recording (more computational resources for better results) over the local network to a central hub.

>> and then stream the actual recording (more computational resources for better results) over the local network to a central hub.

This frustrates me. I ran Dragon Dictate on a 200MHz PC in the 1990s. Now that wasn't top quality, but it should have been good enough for voice assistants. I expect at least that quality on-device with an R-Pi today if not better.

IMHO the end game is on-device speech recognition and anything streaming audio somewhere else for processing is delaying this result.

> IMHO the end game is on-device speech recognition and anything streaming audio somewhere else for processing is delaying this result.

Why? There's practically no latency to a central hub on the local network. A Raspberry Pi is probably over-specced for this, but I do very much see value in buying 5 $20 speaker/mic streaming stations and a $200 hub instead of buying 5 $100 Raspberry Pis.

If anything, I would expect the streaming to a hub solution to respond faster than the locally processed variant. My wifi latency is ~2ms, and my PC will run Whisper way more than 2ms faster than a Raspberry Pi. Add in running a model and my PC runs circles around a Raspberry Pi.

> I ran Dragon Dictate on a 200MHz PC in the 1990s. Now that wasn't top quality, but it should have been good enough for voice assistants. I expect at least that quality on-device with an R-Pi today if not better.

You should get that via Whisper. I haven't used Dragon Dictate, but Whisper works very well. I've never trained it on my voice, and it rarely struggles outside of things that aren't "real words" (and even then it does a passable job of trying to spell it phonetically).

No idea about resources. It's such an unholy pain in the ass to get working in the same process as LLMs on Windows that I'm usually just ecstatic it will run. One of these days I'll figure out how to do all the passthroughs to WSL so I can yet again pretend I'm not using Windows while I use Windows (lol).

in the same vein there was a product called "Copernic Summarizer" that was functionally equivalent to asking chatgpt to summarize an article - 20 years ago.
I have a decent home server, is there a system like this where the heavy lifting can be run on my home server but the front end piped to a ras-pi with speaker/camera, etc. ?

If not, any good pointers to start adapting this platform to do that?

I've been a happy Rhasspy user and now even more excited for the future because I'm hoping the conversational style (similar to what OpenAI demo'd yesterday) will eventually come to the offline world as well. It is okay if I gotta buy a GPU to make that happen, or maybe we get luckier and it wont? Ok, maybe I'm getting too optimistic now
Someone posted this on the HA sub yesterday - https://www.youtube.com/watch?v=4W3_vHIoVgg

He's using GPT 4o. Although not the stuff that was demo'ed yesterday, since it hasn't been widely rolled out yet. But I think its entirely possible in the near future.

Awesome. Side note: now my living room lights are white because I listened to this on speakers lol. I have the same wakeword xD
I have got good results with Playstation 3 and Playstation 4 cameras (which also have a mic). They are available for $15-20 on ebay.
Have you checked out the Voice Assistant functionalities integrated into HA? https://www.home-assistant.io/voice_control/

NabuCasa employed the Rhasspy main dev to work on these functionalities and they are progressing with every update.