

by philipp2310 1516 days ago

For the moment, no: nothing is cached until the hotword is recognized. We did think about it, but it would mean storing the past few seconds of sound input. That wouldn't be a problem for the main device, but satellites aren't powerful enough to run ASR themselves (Raspi Zero), so the sound is streamed to the main device after the hotword is detected. That process wouldn't fit well with storing the data.
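To make "storing past sound input for a few seconds" concrete, here is a minimal sketch of a pre-roll ring buffer. It is not what the project does today (nothing is cached right now); the class name `PreRollBuffer`, the 20 ms frame size and the method names are all assumptions made up for illustration.

```python
from collections import deque


class PreRollBuffer:
    """Hypothetical ring buffer holding the most recent audio frames.

    Frame length and pre-roll duration are illustrative assumptions,
    not values taken from the project.
    """

    def __init__(self, pre_roll_ms: int = 2000, frame_ms: int = 20):
        # deque(maxlen=...) silently drops the oldest frame once full.
        self._frames = deque(maxlen=max(1, pre_roll_ms // frame_ms))

    def push(self, frame: bytes) -> None:
        """Store one audio frame captured before the hotword fired."""
        self._frames.append(frame)

    def drain(self) -> bytes:
        """Return the buffered audio, oldest frame first, and reset."""
        data = b"".join(self._frames)
        self._frames.clear()
        return data
```

On a satellite, `drain()` would have to be sent to the main device ahead of the live stream once the hotword fires, which is the part that doesn't line up nicely with the current streaming setup.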

Another thing to keep in mind: we use intermediate results for the ASR. That means the input is already being parsed while you are speaking. Only a few ms after you go silent, the parsing is finalized and NLU/TTS start right away.
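Roughly, the idea of intermediate results can be sketched like this. It is a toy stand-in, not the real ASR engine: `IncrementalDecoder`, its `feed` method and the silence threshold are hypothetical names and values, and "frames" here are just already-recognized words or `None` for silence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IncrementalDecoder:
    """Toy stand-in for a streaming ASR decoder (hypothetical)."""

    # Assumed threshold: this many silent frames in a row end the utterance.
    silence_frames_to_finalize: int = 10
    words: List[str] = field(default_factory=list)
    final: Optional[str] = None
    _silent_run: int = 0

    def feed(self, word: Optional[str]) -> str:
        """Feed one frame: a recognized word, or None for a silent frame.

        Returns the transcript so far (the intermediate result).
        """
        if self.final is not None:
            return self.final
        if word is None:
            self._silent_run += 1
            # Finalize only after speech was heard and silence lasted long enough.
            if self.words and self._silent_run >= self.silence_frames_to_finalize:
                self.final = " ".join(self.words)
        else:
            self._silent_run = 0
            self.words.append(word)
        return " ".join(self.words)
```

Because the transcript is built up while the user is still talking, the only extra wait after they stop is the short silence window before `final` is set, and NLU can start on it immediately.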

Of course I'm a bit biased, but I'd say it is more like: "Hey Alice" "Yes?.." "What's gonna be the weather at 11 tomorrow?" short pause (0.2 seconds?) <answer>