by Rust 5129 days ago
One way might be to run the audio stream through a speech-to-text engine and parse the resulting transcript.
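
As a rough sketch of the "parse the resulting transcript" step, assuming the speech-to-text engine hands back timed segments as (start time in seconds, text) pairs. That format, the stopword list, and the name `index_transcript` are all hypothetical and not tied to any particular engine:

```python
import re
from collections import defaultdict

# Assumed, illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "on", "was"}


def index_transcript(segments):
    """Map each non-stopword to the sorted start times where it was spoken.

    `segments` is an iterable of (start_seconds, text) pairs, a hypothetical
    stand-in for whatever the speech-to-text engine actually returns.
    """
    index = defaultdict(set)
    for start, text in segments:
        # Lowercase and keep only alphabetic tokens (digits are dropped).
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS:
                index[word].add(start)
    return {word: sorted(times) for word, times in index.items()}


# Made-up sample transcript, for illustration only.
segments = [
    (0.0, "Welcome to the tour of the Eiffel Tower"),
    (12.5, "The tower was completed in 1889"),
]
index = index_transcript(segments)
print(index["tower"])  # [0.0, 12.5]
```

Looking up a word then gives the points in the stream where it was spoken, which is about the simplest thing you could build on top of a transcript.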

A video recognition system could also be used to identify faces, landmarks and common objects.