by t-vi 1128 days ago
Personally, I plugged a Jabra conference speaker into a Raspberry Pi, and if it hears something interesting, it sends the audio to my local GPU computer for transcription (with Whisper) and answer-getting. The response is then sent back to the Raspberry Pi as audio (with a model from coqui-ai/TTS, but using more plain PyTorch). Works really nicely for very local weather, calendar, ...
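For anyone wanting to try something similar, the Pi-to-GPU hop could look roughly like this. This is a minimal sketch, not t-vi's code. It assumes 16 kHz mono 16-bit PCM from the microphone, a plain HTTP POST of a WAV file, and a server that replies with synthesized WAV audio. The URL, function names, and reply format are all hypothetical.

```python
import io
import struct
import urllib.request
import wave

# Assumption: 16 kHz mono 16-bit PCM (Whisper works on 16 kHz audio internally).
SAMPLE_RATE = 16_000


def pcm_to_wav_bytes(samples: list[int], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap signed 16-bit mono samples in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    # wave does not close file objects it did not open, so buf is still readable.
    return buf.getvalue()


def wav_bytes_to_pcm(data: bytes) -> tuple[list[int], int]:
    """Inverse of pcm_to_wav_bytes: return (samples, sample_rate)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        n = wav.getnframes()
        frames = wav.readframes(n)
        return list(struct.unpack(f"<{n}h", frames)), wav.getframerate()


def send_utterance(url: str, samples: list[int]) -> bytes:
    """Hypothetical client call: POST one utterance, return the reply bytes.

    The server at `url` is assumed to run speech-to-text, look up the answer,
    synthesize speech, and answer with WAV audio for the Pi to play back.
    """
    req = urllib.request.Request(
        url,
        data=pcm_to_wav_bytes(samples),
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()
```

Sending WAV rather than raw PCM keeps the sample rate and sample format self-describing, so the GPU side doesn't have to guess how to interpret the bytes.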

Neat!

If you don't mind my asking, what do you mean by "if it hears something interesting"? Is that based on a wake word, or does it always listen and process?

Both:

A long while ago, I wrote a little tutorial[0] on quantizing a speech commands network for the Raspberry Pi. I used that to control lights directly and also for wake word detection.
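The core idea behind quantizing a network for a small device can be illustrated with per-tensor symmetric int8 quantization. This is a plain-Python sketch of the arithmetic only, not the code from the linked tutorial; a real model would go through PyTorch's `torch.ao.quantization` tooling instead. The function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The scale maps the largest |weight| to 127; guard against all-zero tensors.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


def quantized_dot(qa: list[int], sa: float, qb: list[int], sb: float) -> float:
    """Dot product computed on the integer codes, rescaled once at the end."""
    acc = sum(a * b for a, b in zip(qa, qb))  # integer accumulation
    return acc * sa * sb
```

In real kernels the int8 products accumulate in int32 and only the final sum is rescaled; that integer arithmetic, plus the 4x smaller weights compared with float32, is what makes quantized models cheaper to run on a small CPU.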

More recently, I found that I can just use a more classic voice activity detection (VAD) approach, because my use cases typically don't suffer if I turn the microphone on or off. My main goal is not having to pull out my phone for information. That reduces the processing when I turn on the radio...
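A "classic" VAD can be as simple as thresholding per-frame energy, with a short hangover so that brief pauses don't split one utterance into several. The comment doesn't say which VAD is used (an off-the-shelf detector such as WebRTC's may well be what's meant), so this energy-threshold version is a swapped-in illustration. It assumes 16 kHz 16-bit mono audio, and the threshold and hangover values are made-up starting points.

```python
import math

# Hypothetical tuning values, not from the comment above:
FRAME_LEN = 480          # 30 ms frames at 16 kHz
THRESHOLD_DBFS = -40.0   # frames louder than this count as speech
HANGOVER_FRAMES = 10     # tolerate ~300 ms of quiet before closing a segment


def frame_dbfs(frame: list[int]) -> float:
    """RMS level of a 16-bit frame relative to full scale (0 dBFS)."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


def speech_segments(samples, frame_len=FRAME_LEN,
                    threshold=THRESHOLD_DBFS, hangover=HANGOVER_FRAMES):
    """Yield (start, end) sample indices of regions judged to contain speech."""
    start = None  # sample index where the current segment began
    end = 0       # end of the last loud frame in the current segment
    quiet = 0     # consecutive quiet frames since the last loud one
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        loud = frame_dbfs(samples[i:i + frame_len]) > threshold
        if loud:
            if start is None:
                start = i
            quiet = 0
            end = i + frame_len
        elif start is not None:
            quiet += 1
            if quiet > hangover:  # pause lasted too long: close the segment
                yield start, end
                start = None
    if start is not None:  # audio ended mid-segment
        yield start, end
```

Only the segments this yields would need to be shipped to the GPU machine, which is how a cheap detector on the Pi keeps the expensive transcription step idle most of the time.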

Not as high-end as your solution, but nice enough for my purposes.

[0]. https://devblog.pytorchlightning.ai/applying-quantization-to...