by Cheetah26 1128 days ago
This looks like something I've been wanting to see for a while.

I currently have a Google Home and I'm getting increasingly fed up with it. Besides the privacy concerns, it seems to be getting worse at being an assistant. I'll try to turn my light on by saying "light 100" (to set the light to 100 percent), and it works about 80% of the time; the rest of the time it starts playing a song with a similar name.

It'd be great if this allows limiting / customizing which words and actions you want.

4 comments

Personally, I plugged a Jabra conference speaker into a Raspberry Pi, and if it hears something interesting, it sends the audio to my local GPU computer for decoding (with whisper) + answer-getting, and the response is sent back to the Raspberry Pi as audio (generated with a model from coqui-ai/TTS, but using more plain PyTorch). It works really nicely for getting very local weather, calendar, ...
Neat!

If you don't mind my asking, what do you mean by "if it hears something interesting"? Is that based on a wake word, or is it always listening/processing?

Both:

A long while ago, I wrote a little tutorial[0] on quantizing a speech commands network to run on the Raspberry Pi. I used that to control lights directly and also for wake word detection.

More recently, I found that I can just use more classic VAD (voice activity detection), because my uses typically don't suffer if I turn the microphone on/off. My main goal is to not have to get out my mobile phone for information. That reduces the processing when I turn on the radio...

Not as high-end as your solution, but nice enough for my purposes.

[0]. https://devblog.pytorchlightning.ai/applying-quantization-to...
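
For anyone wondering what "classic VAD" can look like, here is a minimal sketch of one common approach: a per-frame energy threshold. This is not the setup described above, just an illustration. The function names, the 160-sample frame length, and the threshold of 500 are all assumptions chosen for 16-bit audio at 16 kHz; a real deployment would tune them or use a dedicated VAD library.

```python
import math

def frame_rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def speech_flags(samples, frame_len=160, threshold=500.0):
    """Return one boolean per full frame: True if the frame is loud enough.

    frame_len and threshold are illustrative values, not tuned ones.
    Trailing samples that do not fill a whole frame are ignored.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_rms(frame) >= threshold)
    return flags

# Synthetic input: two frames of silence followed by two frames of a 440 Hz tone.
silence = [0] * 320
tone = [int(3000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(320)]
flags = speech_flags(silence + tone)
```

Only frames flagged True would need to be forwarded for transcription, which is where the savings in processing come from.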

Totally get it!

There are at least two ways to deal with this frustrating issue with Willow:

- With local command recognition via ESP SR, command recognition runs completely on the device and the accepted command syntax is defined. It essentially does "fuzzy" matching to handle your light command ("light 100"), but there's no way it's going to send some random match to play music.

- When using the inference server -or- local recognition, we send the speech-to-text output to the Home Assistant conversation/intents[0] API, and you can define valid actions/matches there.

[0] - https://developers.home-assistant.io/docs/intent_index/
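
To illustrate the general idea of fuzzy matching against a closed command set (this is not how ESP SR or Home Assistant implement it), here is a small sketch using Python's standard difflib. The COMMANDS table, the match_command helper, and the 0.6 cutoff are all hypothetical; the point is only that a transcript either lands on one of the phrases you defined or is rejected.

```python
import difflib

# Hypothetical closed vocabulary: allowed phrase -> (entity, brightness percent).
COMMANDS = {
    "light off": ("light", 0),
    "light 25": ("light", 25),
    "light 50": ("light", 50),
    "light 100": ("light", 100),
}

def match_command(transcript, cutoff=0.6):
    """Return the action for the closest allowed phrase, or None if nothing is close.

    cutoff is an assumed similarity threshold (0..1); raise it to be stricter.
    """
    # Normalize case and whitespace before comparing.
    text = " ".join(transcript.lower().split())
    best = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[best[0]] if best else None

exact = match_command("Light 100")
misheard = match_command("lite 100")
unrelated = match_command("play some jazz music")
```

Here a slightly misheard "lite 100" still resolves to the light command, while an unrelated phrase returns None instead of being routed to something like music playback.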

This drives me nuts and happens all the time as well. To be honest, I unplugged my Google Home device a while back and haven't missed it. It mostly ended up being a clock for me. I'd try to change my lights to a color that it must not have been capable of, and then I'd have to sit there for minutes listening to it list stores in the area that might sell lights of that color or something. It wouldn't stop. This is just one of many frustrating experiences I've had with that thing.
THIS. It's hilarious and infuriating that our digital assistants struggle to understand variants of "set lights at X% intensity".

However, if I spend the time to configure a "scene" with the right presets, Google has no issue figuring it out.

If only it could notice regular patterns about light settings and offer suggestions that I could approve/deny.