by TeMPOraL 1884 days ago
You can easily process the audio on the fly and reduce it to a probabilistic estimate of whether a tag from a predefined topic set was present in the conversation. It doesn't need to be 100% accurate. You don't need to store the audio - just stream it through the recognizer. The output of such a recognizer will be something on the order of 8-32 bytes per detection (an int for the tag, a float for the probability, an int64 for the timestamp), possibly less if one's clever - and it only needs to be stored until the next opportunity to send it out.
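To put rough numbers on that, here's a minimal sketch of what one such record could look like on the wire. The field widths, the names (`FULL`, `COMPACT`, `pack_full`, `pack_compact`, `unpack_compact`) and the quantization scheme are all my assumptions for illustration, not a description of anything a real app is known to do.

```python
import struct

# Hypothetical layouts, little-endian with no padding.
# FULL: uint32 tag, float32 probability, uint64 timestamp in milliseconds.
FULL = struct.Struct("<IfQ")
# COMPACT: uint8 tag, probability quantized to uint8, uint32 timestamp in seconds.
COMPACT = struct.Struct("<BBI")


def pack_full(tag: int, prob: float, ts_ms: int) -> bytes:
    """Pack one detection using the straightforward int/float/int64 layout."""
    return FULL.pack(tag, prob, ts_ms)


def pack_compact(tag: int, prob: float, ts_s: int) -> bytes:
    """Pack one detection in the 'clever' layout (assumes at most 256 tags)."""
    quantized = round(max(0.0, min(1.0, prob)) * 255)  # clamp, then map to 0..255
    return COMPACT.pack(tag, quantized, ts_s & 0xFFFFFFFF)


def unpack_compact(blob: bytes) -> tuple[int, float, int]:
    """Recover (tag, approximate probability, timestamp) from the compact layout."""
    tag, quantized, ts_s = COMPACT.unpack(blob)
    return tag, quantized / 255, ts_s
```

With these layouts, `FULL.size` is 16 bytes and `COMPACT.size` is 6 bytes, so even a few hundred detections buffered between uploads would take up only a few kilobytes.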

Also: people seem to be looking at modern speech recognizers on their phones and wrongly concluding that speech recognition in general is very compute-intensive. It isn't, if you're willing to make some sacrifices on accuracy and generality, and to do it locally instead of shipping voice data off to a cloud somewhere. A proper benchmark here isn't Siri or Google Assistant - it's Microsoft Speech API, as shipped with Windows more than 12 years ago.