|
|
|
|
|
by Psychokiller
1516 days ago
|
|
This is entirely true and I have a few solution I could deploy with some work, the problem is a caching like this consumes power, as you literally listen all the time, as for a wakeword, to cache the audio data in memory and use it ONLY if a wakeword is detected. Now big companies do it in the cloud, we could do it locally, as an option. The path I chose to mitigate that unatural feeling is to use a human answer, bit like at home, you in the kitchen, wife or kids further away, not communicating. At some point you'd call your wife "Alice?" and you'd wait for her to reply for a "yes?" before talking as you are unaware if she's focused on you at the moment or playing with the kids whatever |
|
I mean, would it really consume that much extra power to just have a second sink that's just a N-second circular buffer, so you got the samples after the wakeword ready for speech recognition when the wakeword is detected?