by omarhegazy
4422 days ago
Really interesting program, although, as Andrew has said, it seems kinda limited. That's not your fault at all; any early project will seem limited in scope. Given some popularity and extra effort, something like this could be the Siri of the command line. Which gets me thinking: are things like Siri and Google Now really just like this, a core set of preset commands surrounded by regex magic that recognizes them? Interesting. It raises the question: is it possible, using current knowledge in machine learning and NLP, to create an English-like interface for #{some_device_or_program_here} that learns and develops its English commands from the user? Sort of like how Bayesian spam filters (http://www.paulgraham.com/spam.html) don't start from a hardcoded set of spam-related words to classify against, but instead take an initial corpus and then keep learning from the user after that.
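To make the "preset commands surrounded by regex magic" idea concrete, here is a minimal sketch of what such a matcher might look like. The command table, patterns and the `parse` function are hypothetical and only illustrate the approach; this is not how Siri or Google Now are actually implemented.

```python
import re

# Hypothetical preset command table: (pattern, command name).
# Illustrative only; not how Siri or Google Now are actually implemented.
COMMANDS = [
    (re.compile(r"^(?:play|start)\s+(?:the\s+)?(?:song\s+)?(?P<title>.+)$", re.I), "play"),
    (re.compile(r"^(?:next|skip)(?:\s+(?:song|track))?$", re.I), "next"),
    (re.compile(r"^remind me to\s+(?P<task>.+?)\s+at\s+"
                r"(?P<time>\d{1,2}(?::\d{2})?\s*(?:am|pm)?)$", re.I), "remind"),
]


def parse(utterance):
    """Return (command, named_fields) for the first matching pattern, else None."""
    text = utterance.strip().rstrip(".!?")
    for pattern, command in COMMANDS:
        match = pattern.match(text)
        if match:
            return command, match.groupdict()
    return None


print(parse("remind me to buy milk at 5pm"))
# ('remind', {'task': 'buy milk', 'time': '5pm'})
print(parse("could you skip this track?"))
# None
```

The second call shows the limitation: any phrasing nobody wrote a pattern for falls through, which is exactly why a system that learns from the user is appealing.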
Anyway, it does seem that most proof-of-concept voice-controlled (as opposed to text-controlled) systems also use a prefix: "siri", "glass" (Microsoft had one too, I can't remember which; they also have "xbox"). The idea is that if the mic is always on, you don't want your drones to blow something up just because you jokingly told a friend "kill it with fire" in a voice call. Context is hard to get right for such systems; I expect the Kinect and similar systems can do better (if the user looks at the computer, listen; if the user is already "in conversation with" the computer, listen; otherwise ignore, unless the user asks for the computer by name).
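The listening heuristics described above could be sketched as a simple gate in front of the command parser. This is a minimal sketch under stated assumptions: the wake word "computer", the `Context` fields and the idea that some tracker reports whether the user is looking at the device are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical wake word(s); real products use names like "siri" or "xbox".
WAKE_WORDS = {"computer"}


@dataclass
class Context:
    user_looking_at_device: bool  # e.g. reported by a body/gaze tracker (assumed)
    in_conversation: bool         # user has been addressing the device recently


def should_listen(utterance: str, ctx: Context) -> bool:
    """Treat speech as a command only when context or the wake word says so."""
    if ctx.user_looking_at_device or ctx.in_conversation:
        return True
    words = utterance.lower().split()
    # Otherwise require the device's name as a prefix, e.g. "computer, next song".
    return bool(words) and words[0].strip(",.!?") in WAKE_WORDS
```

With nobody looking at the device, "kill it with fire" is ignored, while "Computer, next song" gets through.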
As for your question, I think it should be relatively easy to train, say, a music-player app to understand stuff like "next song", "accept call", or "repeat" in any language, using simple statistical methods. Not sure how far you could take it, though: dictation software, for example, still makes (AFAIK) enough errors that it's not really a viable option if the user can already type reasonably well (or hire an actual stenographer).
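As a rough illustration of the "simple statistical methods" idea, and of learning from the user the way a Bayesian spam filter does, here is a minimal sketch of a multinomial naive Bayes classifier over words. It assumes text input (speech-to-text already done), and the class name, command labels and training phrases are hypothetical. It is a related technique, not the exact probability-combining formula from Graham's spam essay.

```python
import math
from collections import Counter, defaultdict


class CommandClassifier:
    """Multinomial naive Bayes over words, trained on the user's own examples."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # command -> word frequencies
        self.example_counts = Counter()          # command -> number of examples
        self.vocab = set()                       # every word seen in training

    def learn(self, utterance, command):
        """Add one labeled example, e.g. after the user corrects a misfire."""
        words = utterance.lower().split()
        self.word_counts[command].update(words)
        self.example_counts[command] += 1
        self.vocab.update(words)

    def classify(self, utterance):
        """Return the command with the highest log-posterior, or None if untrained."""
        words = utterance.lower().split()
        total = sum(self.example_counts.values())
        best, best_score = None, -math.inf
        for command, n in self.example_counts.items():
            counts = self.word_counts[command]
            # Laplace (add-one) smoothing so unseen words don't zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = command, score
        return best


clf = CommandClassifier()
for text, command in [
    ("next song", "next"),
    ("skip this track", "next"),
    ("siguiente canción", "next"),
    ("play that again", "repeat"),
    ("repeat this song", "repeat"),
    ("answer the call", "accept_call"),
    ("pick up", "accept_call"),
]:
    clf.learn(text, command)

print(clf.classify("skip song"))          # next
print(clf.classify("pick up the phone"))  # accept_call
```

Because the model only counts words, the same code works for phrases in any language the user trains it on (the Spanish example above). Whether that scales beyond a handful of short commands is the open question raised in the comment.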