| HN Mirror

by Cheetah26 1128 days ago

This looks like something I've been wanting to see for a while.

I currently have a Google Home and I'm getting increasingly fed up with it. Besides the privacy concerns, it seems to be getting worse at being an assistant. I'll try to turn my light on by saying "light 100" (for light to 100 percent), and it works about 80% of the time, but the rest of the time it starts playing a song with a similar name.

It'd be great if this allows limiting / customizing which words and actions it accepts.

4 comments

t-vi 1128 days ago

Personally, I plugged a Jabra conference speaker into a Raspberry Pi, and if it hears something interesting, it sends the audio to my local GPU computer for decoding (with whisper) + answer-getting, and the response is sent back to the Raspberry Pi as audio (with a model from coqui-ai/TTS but using more plain PyTorch). Works really nicely for having very local weather, calendar, ...
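To make the flow above concrete, here is a minimal sketch of the three stages (decode, answer, speak) as plain Python. It is not the commenter's code: `handle_utterance` and the three callables are hypothetical names, and in a real setup they would wrap whisper, whatever produces the answer, and a TTS model.

```python
from typing import Callable


def handle_utterance(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # assumption: wraps a speech-to-text model
    answer: Callable[[str], str],         # assumption: weather, calendar, ... lookup
    synthesize: Callable[[str], bytes],   # assumption: wraps a text-to-speech model
) -> bytes:
    """Run one utterance through decode -> answer -> speech.

    Returns the audio to play back, or b"" if nothing was understood.
    """
    text = transcribe(audio).strip()
    if not text:
        # Nothing intelligible was decoded, so stay silent.
        return b""
    reply = answer(text)
    return synthesize(reply)


# Stand-in stages so the sketch runs without any models installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_answer(text: str) -> str:
    return "Sunny, 21 degrees" if "weather" in text else "Sorry, no idea"


def fake_synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")


if __name__ == "__main__":
    print(handle_utterance(b"what is the weather", fake_transcribe, fake_answer, fake_synthesize))
```

Keeping the stages as injected functions means the Raspberry Pi side only has to ship audio out and play audio back, while the heavy models can live on the GPU machine.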

kkielhofner 1128 days ago

Neat!

If you don't mind my asking, what do you mean by "if it hears something interesting"? Is that based on a wake word, or is it always listening and processing?

t-vi 1128 days ago

Both:

A long while ago, I wrote a little tutorial[0] on quantizing a speech commands network for the Raspberry Pi. I used that to control lights directly and also for wake word detection.

More recently, I found that I can just use more classic VAD because my uses typically don't suffer if I turn on/off the microphone. My main goal is to not get out the mobile phone for information. That reduces the processing when I turn on the radio...

Not as high-end as your solution, but nice enough for my purposes.

[0]. https://devblog.pytorchlightning.ai/applying-quantization-to...
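For readers unfamiliar with "classic VAD" (voice activity detection), a very rough energy-based sketch looks like the following. This is an illustration only, not the setup described above: the frame length, threshold, and hangover values are made-up assumptions, and practical detectors are usually more robust than a plain RMS threshold.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples, frame_len=160, threshold=500.0, hangover=2):
    """Return one boolean per full frame: True while speech is considered active.

    A frame is active if its RMS reaches `threshold`, and the detector stays
    active for `hangover` further frames so short pauses do not cut a word off.
    All default values here are illustrative assumptions.
    """
    flags = []
    remaining = 0  # frames left in the current active stretch
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if frame_rms(frame) >= threshold:
            remaining = hangover + 1
        flags.append(remaining > 0)
        if remaining:
            remaining -= 1
    return flags


if __name__ == "__main__":
    # Two silent frames, one loud frame, four silent frames.
    audio = [0] * 320 + [1000] * 160 + [0] * 640
    print(detect_speech(audio))
```

Only the frames flagged as active would need to be sent on for decoding, which is what keeps the processing down when something like a radio is not actually being addressed to the assistant.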

kkielhofner 1128 days ago

Totally get it!

There are at least two ways to deal with this frustrating issue with Willow:

- With local command recognition via ESP SR, command recognition runs completely on the device and the accepted command syntax is defined ahead of time. It essentially does "fuzzy" matching to address your light command ("light 100"), but there's no way it's going to send some random match to play music.

- When using the inference server -or- local recognition we send the speech to text output to the Home Assistant conversation/intents[0] API and you can define valid actions/matches there.

[0] - https://developers.home-assistant.io/docs/intent_index/
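The idea behind the first option (a closed command list plus fuzzy matching) can be sketched in a few lines of Python. To be clear, this is not how ESP SR or Willow implement it: the command table, the `match_command` helper, and the 0.8 cutoff are assumptions chosen for illustration, using the standard library's `difflib`.

```python
import difflib


def build_commands():
    """Closed set of accepted phrases mapped to (device, brightness) actions."""
    commands = {"light on": ("light", 100), "light off": ("light", 0)}
    for level in range(0, 101, 10):
        commands[f"light {level}"] = ("light", level)
    return commands


COMMANDS = build_commands()


def match_command(transcript, cutoff=0.8):
    """Return the action for the closest accepted phrase, or None.

    Anything that is not close enough to a known phrase is rejected
    instead of being handed to some catch-all such as music playback.
    """
    text = " ".join(transcript.lower().split())
    best = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[best[0]] if best else None


if __name__ == "__main__":
    for heard in ["light 100", "lights 100", "play light my fire"]:
        print(heard, "->", match_command(heard))
```

Because the set of valid phrases is closed, a near miss either snaps to a known command or is dropped, which is the behavior described above.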

chankstein38 1128 days ago

This drives me nuts and happens all the time as well. To be honest, I unplugged my Google Home device a while back and haven't missed it. It mostly ended up being a clock for me. I'd try to change my lights to a color it must not have been capable of, and then I'd have to sit there for minutes listening to it list stores in the area that might sell lights in that color or something. It wouldn't stop. This is just one of many frustrating experiences I had with that thing.

schainks 1128 days ago

THIS. It's hilarious and infuriating that our digital assistants struggle to understand variants of "set lights at X% intensity".

However, if I spend the time to configure a "scene" with the right presets, Google has no issue figuring it out.

If only it could notice regular patterns about light settings and offer suggestions that I could approve/deny.
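As a rough sketch of what "noticing regular patterns" could mean, one could count how often the same brightness is set at the same hour and propose a scene once it repeats often enough. Everything here is hypothetical: the history format, the `suggest_scenes` name, and the `min_count` threshold are assumptions, not a feature of any existing assistant.

```python
from collections import Counter


def suggest_scenes(history, min_count=3):
    """Propose scenes from repeated (hour, brightness) settings.

    `history` is a list of (hour_of_day, brightness_percent) tuples.
    A suggestion is only offered, never applied; the user approves or denies it.
    """
    counts = Counter(history)
    return [
        {"hour": hour, "brightness": brightness, "seen": seen}
        for (hour, brightness), seen in counts.most_common()
        if seen >= min_count
    ]


if __name__ == "__main__":
    log = [(7, 30), (7, 30), (22, 10), (7, 30)]
    for suggestion in suggest_scenes(log):
        print("Suggest scene:", suggestion)
```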
