by omarhegazy 4422 days ago
Really interesting program, although, as Andrew has said, seems kinda limited. That's not your fault at all; any early project will seem limited in scope. Given some popularity and extra effort, something like this could be the Siri of the command line.

Which gets me thinking -- is stuff like Siri and Google Now really just like this? Core set of pre-set commands surrounded by regex magic to recognize said pre-set commands? Interesting.
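For what it's worth, here is a minimal sketch of what "pre-set commands plus regex" could look like. The patterns and shell templates are invented for illustration; this is a guess at the general shape, not how betty, Siri, or Google Now actually work.

```python
import re

# Hypothetical patterns and shell templates, made up for this example.
# Each entry pairs a regex for an English phrase with a command template.
PATTERNS = [
    (re.compile(r"^(?:what(?:'s| is) )?my (?:user ?name|login)$", re.I), "whoami"),
    (re.compile(r"^how many words (?:are )?in (?P<file>\S+)$", re.I), "wc -w {file}"),
    (re.compile(r"^list (?:the )?files in (?P<dir>\S+)$", re.I), "ls {dir}"),
]


def translate(phrase):
    """Return every shell command whose pattern matches the phrase."""
    # Drop surrounding whitespace and trailing sentence punctuation.
    phrase = phrase.strip().rstrip("?.!")
    results = []
    for pattern, template in PATTERNS:
        match = pattern.match(phrase)
        if match:
            # Named groups in the regex fill the slots in the template.
            results.append(template.format(**match.groupdict()))
    return results


if __name__ == "__main__":
    print(translate("How many words are in notes.txt?"))
    print(translate("make me a sandwich"))
```

Anything that matches no pattern comes back as an empty list, which is exactly the "limited" feeling: the program only knows what someone wrote a regex for.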

Which raises the question: is it possible, using current knowledge in machine learning and NLP, to create an English-like interface for #{some_device_or_program_here} that learns and self-develops the English commands from the user? Sort of like how Bayesian spam filters (http://www.paulgraham.com/spam.html) don't have a hardcoded set of Spam-Related Words to classify against, but instead take an initial corpus and then learn and self-develop from the user after that.
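To make the spam-filter analogy concrete, here is a rough sketch of a naive Bayes classifier that maps phrases to commands and keeps learning from whatever the user confirms or corrects. The class name, the command strings, and the whitespace tokenizer are all assumptions for illustration; a real system would need much more than this.

```python
import math
from collections import Counter, defaultdict


class CommandLearner:
    """Multinomial naive Bayes over words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # command -> word -> count
        self.command_counts = Counter()          # command -> examples seen
        self.vocabulary = set()

    def learn(self, phrase, command):
        """Record that the user meant `command` when they typed `phrase`."""
        words = phrase.lower().split()
        self.word_counts[command].update(words)
        self.command_counts[command] += 1
        self.vocabulary.update(words)

    def scores(self, phrase):
        """Return a log-probability score for each known command."""
        words = phrase.lower().split()
        total = sum(self.command_counts.values())
        result = {}
        for command, seen in self.command_counts.items():
            counts = self.word_counts[command]
            denom = sum(counts.values()) + len(self.vocabulary)
            logp = math.log(seen / total)  # prior for this command
            for word in words:
                # Add-one smoothing so unseen words do not zero the score.
                logp += math.log((counts[word] + 1) / denom)
            result[command] = logp
        return result

    def guess(self, phrase):
        """Return the best-scoring command, or None if nothing was learned."""
        s = self.scores(phrase)
        return max(s, key=s.get) if s else None


if __name__ == "__main__":
    learner = CommandLearner()
    learner.learn("next song", "player next")
    learner.learn("skip this track", "player next")
    learner.learn("play it again", "player repeat")
    learner.learn("repeat this song", "player repeat")
    print(learner.guess("skip song"))
    # A user correction is just one more training example.
    learner.learn("encore", "player repeat")
    print(learner.guess("encore"))
```

The initial corpus here is four phrases; every later call to `learn` plays the role of the user marking a message as spam or not spam.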

2 comments

I think having a prefix (be that "betty", "b" and/or some "mode" for your shell) that signals you want to use natural language (with all its ambiguity) is a good idea. The zsh way of suggesting "did you mean" rather than simply erroring with "command not found/invalid syntax" drives me nuts -- but a lot of people seem to like it. Having a prefix allows you more freedom at "learning" -- ambiguity isn't so terrible if the user expects it, and the simple idea of just listing alternatives that betty uses seems like a great interface. It's not as sexy as an "I'm feeling lucky" style, where a (super-)high score wins outright and middling scores tie and ask the user to pick -- but I think it may win on the principle of least surprise.
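A rough sketch of the "I'm feeling lucky" policy, for comparison with just listing alternatives. The thresholds and the assumption that scores are confidences between 0 and 1 are made up for the example, not taken from betty.

```python
def decide(candidates, run_threshold=0.9, ask_threshold=0.5):
    """Pick an action for a list of (command, score) pairs.

    Scores are assumed to be confidences in [0, 1]; both thresholds are
    illustrative defaults. Returns ("run", [cmd]), ("ask", [cmds...]),
    or ("unknown", []).
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    if not ranked:
        return ("unknown", [])

    best_cmd, best_score = ranked[0]
    runner_up_is_weak = len(ranked) == 1 or ranked[1][1] < run_threshold
    if best_score >= run_threshold and runner_up_is_weak:
        # One very high score and no close rival: just run it.
        return ("run", [best_cmd])

    # Middling scores, or several high ones: let the user pick.
    plausible = [cmd for cmd, score in ranked if score >= ask_threshold]
    if plausible:
        return ("ask", plausible)
    return ("unknown", [])


if __name__ == "__main__":
    print(decide([("ls", 0.95), ("ls -a", 0.4)]))
    print(decide([("ls", 0.7), ("ls -a", 0.6), ("rm", 0.1)]))
```

Setting `run_threshold` above 1.0 turns this into the always-list-alternatives behaviour, which is the least surprising option.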

Anyway, it does seem that most proof-of-concept voice-controlled (as opposed to text-controlled) systems use a prefix too: "siri", "glass" (Microsoft had one as well, can't remember which; they also have "xbox"). The idea is that if the mic is always on, you don't want your drones to blow something up just because you jokingly told a friend "kill it with fire" in a voice call. Context is hard to get right for such systems. I expect the Kinect and similar systems can do better (if the user looks at the computer, listen. If the user is already "in conversation with" the computer, listen. Otherwise ignore, unless the user asks for the computer by name).

As for your question, I think it should be relatively easy to train, say, a music-player app to understand stuff like "next song", "accept call", "repeat" -- in any language, using simple statistical methods. Not sure how far you could take it though. For example, dictation software still makes (AFAIK) enough errors that it's not really a viable option if the user can already type reasonably well (or hire an actual stenographer).

Stuff like "next song", "accept call", etc. can be done with extensive knowledge in machine learning and some clever work.

The really tough bits will be stuff like,

"Siri, check if PBS Idea Channel has uploaded any new videos, please."

How will Siri know you mean the YouTube app? How will Siri know what "check if X has uploaded any new videos" means? How will Siri know you mean "PBS Idea Channel" and not the channel called "PBS Idea"?

Cortana was released with an API for external apps, and that only allows simple pattern matching, so that's similar. Obviously the built-in stuff is more complex. I've no idea about today, but the original Siri was mostly just chaining relationships together by keyword matching in an ontology.

You could probably train a system like this with a word alignment approach if you generated a training corpus. But ideally you'd want to be able to show the system a new manpage and have it map arguments correctly.

Also, a false positive in a spam filter is bad, but running `rm -rf` because of the vagaries of the English language is worse.
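One cheap mitigation would be to never run a guessed command that looks destructive without asking first. A minimal sketch, assuming a hand-picked (and certainly incomplete) list of dangerous program names:

```python
import shlex

# Illustrative and far from exhaustive; chosen for this example only.
DESTRUCTIVE = {"rm", "dd", "mkfs", "shred"}


def needs_confirmation(command):
    """Return True if a guessed shell command should be confirmed first."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and the like: if it cannot be parsed, be cautious.
        return True
    # Deliberately conservative: any token naming a destructive program
    # triggers a prompt, even when it is only an argument.
    return any(token in DESTRUCTIVE for token in tokens)


if __name__ == "__main__":
    for cmd in ["ls -la", "sudo rm -rf /tmp/build", "wc -w notes.txt"]:
        print(cmd, "->", "confirm" if needs_confirmation(cmd) else "run")
```

This errs on the side of asking too often, which seems like the right trade-off when the input is English.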