
As far as I understand, it uses general-purpose STT (which tries to transcribe everything, unlike, say, Picovoice, which limits recognition to only a few commands) plus intent recognition. The intent step probably isn't the weak point: once the transcript reads "stop", it's hard to map that to anything other than its matching intent (even a bag-of-words classifier can do that). And since the STT is still the same, it probably won't change a thing.
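To illustrate the bag-of-words point, here's a rough sketch of such an intent classifier. The intent names and training phrases are made up for the example and aren't taken from any real assistant; the scoring is a plain word-overlap count, not what any particular product does.

```python
from collections import Counter

# Hypothetical intents and example phrases, invented for illustration only.
TRAINING = {
    "stop": ["stop", "stop it", "please stop", "cancel that"],
    "play_music": ["play music", "play a song", "put on some music"],
    "set_timer": ["set a timer", "start a timer for five minutes"],
}


def bag_of_words(text):
    """Lowercase, split on whitespace, and count words (order is ignored)."""
    return Counter(text.lower().split())


def train(examples):
    """Build one word-count vector per intent from its example phrases."""
    model = {}
    for intent, phrases in examples.items():
        counts = Counter()
        for phrase in phrases:
            counts.update(bag_of_words(phrase))
        model[intent] = counts
    return model


def classify(model, transcript):
    """Return the intent whose vocabulary best overlaps the transcript, or None."""
    words = bag_of_words(transcript)

    def score(intent):
        counts = model[intent]
        total = sum(counts.values())
        # Weight each transcript word by its relative frequency in the intent.
        return sum(n * counts[w] / total for w, n in words.items())

    best = max(model, key=score)
    return best if score(best) > 0 else None


model = train(TRAINING)
```

With this, `classify(model, "stop")` returns `"stop"`, but `classify(model, "shop")` returns `None`: if the STT mishears the word, the classifier never gets a chance, which is why changing only the intent side wouldn't help.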