| Voice assistants have a wake word. Low-power circuitry runs locally, listening for a specific series of syllables while keeping a buffer of the last second or two of audio. Once that wake-word circuit detects those syllables, it activates the rest of the device and starts streaming the buffer and the live audio into whatever transcription system the device uses (local or cloud). In many cases, this happens locally. If you have an iPhone, put it in airplane mode and say "Siri, what time is it?" It will respond: all processing is local, and nothing is recorded in the cloud for that request. Some other requests need additional processing: "Hey Siri, where am I?" -> "To do that, you will need to turn off airplane mode." If you have an Amazon device, enable the "Start of request sound" ( https://www.amazon.com/b?ie=UTF8&node=21341310011 ); you can then hear when the wake word has been triggered. None of these devices are constantly recording or streaming to the cloud (aside: consider the network and compute requirements if every iPhone or Android phone were constantly streaming sound to Apple or Google to be recorded). https://www.nxp.com/design/design-center/software/embedded-s... https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3... https://www.syntiant.com/news/syntiant-low-power-wake-word-s... https://www.researchgate.net/publication/224163648_Fully_int... (from 2010, "Fully integrated 500uW speech detection wake-up circuit") |
I guess if a device is transcribing audio data from a buffer, it’s not the same as a recording. Still, I remember Apple was using humans to review recordings:
https://www.latimes.com/business/technology/story/2019-10-29...
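The buffer-versus-recording distinction in the quoted comment can be made concrete with a small sketch. Before the trigger, audio lives only in a fixed-size ring buffer that is continuously overwritten; only when the wake word fires is that short pre-roll, plus a bounded stretch of live audio, passed on. This is an illustration under assumptions, not how any particular vendor implements it: the 16 kHz sample rate, 20 ms frames, 1.5 s pre-roll, and the `detect_wake_word` callable (a hypothetical stand-in for the low-power wake-word model) are all made up for the example.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List

SAMPLE_RATE = 16_000     # assumed: 16 kHz mono audio
FRAME_SAMPLES = 320      # assumed: 20 ms frames at 16 kHz
PREROLL_SECONDS = 1.5    # assumed: "a second or two" of pre-roll

Frame = List[int]        # one frame of 16-bit PCM samples


def stream_after_wake(
    frames: Iterable[Frame],
    detect_wake_word: Callable[[Frame], bool],  # hypothetical detector
    frames_after_wake: int,
) -> Iterator[Frame]:
    """Yield audio only after the wake word fires.

    Until then, frames sit in a bounded deque; once it is full, each new
    frame silently evicts the oldest, so nothing older than the pre-roll
    window is ever retained or passed on.
    """
    max_frames = int(PREROLL_SECONDS * SAMPLE_RATE / FRAME_SAMPLES)  # 75 frames
    ring: Deque[Frame] = deque(maxlen=max_frames)
    remaining = 0  # live frames still to forward for the current request

    for frame in frames:
        if remaining > 0:
            # Wake word already fired: forward live audio for the request.
            yield frame
            remaining -= 1
            continue
        ring.append(frame)
        if detect_wake_word(frame):
            # Hand off the buffered pre-roll (which includes the wake word)...
            yield from ring
            ring.clear()
            # ...then forward a bounded amount of live audio and go back to listening.
            remaining = frames_after_wake


if __name__ == "__main__":
    # Fake audio: frame i carries the single "sample" i; the toy detector fires on frame 100.
    fake_frames = [[i] for i in range(200)]
    sent = list(stream_after_wake(fake_frames, lambda f: f[0] == 100, frames_after_wake=10))
    print(len(sent), sent[0], sent[-1])
```

In this toy run only the 75 frames leading up to the trigger and the 10 frames after it are ever emitted; frames 0 to 25 were overwritten in the ring buffer before the wake word fired. A real device would bound the live stream by end-of-speech detection rather than a fixed frame count.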