Hacker News
by SlySherZ 3082 days ago
I've been thinking about this for a while; there are some cool things you could do with command-based interfaces.

My take on it atm is to have a flexible (should work if you mistype a letter) command-based interface which is both voice and text based, where you can perform commands like:

play songs from coldplay

set alarm to friday at 3pm

make list with words a, b and c. Give me a random item from the list
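A minimal sketch of the "should work if you mistype a letter" part, using Python's standard `difflib`. The verb list, the `match_verb` and `parse` names, and the 0.7 cutoff are assumptions made up for illustration; it only corrects the first word of a command.

```python
import difflib

# Hypothetical command verbs, taken from the example commands above.
KNOWN_VERBS = ["play", "set", "make", "give"]

def match_verb(word, cutoff=0.7):
    """Return the closest known verb, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), KNOWN_VERBS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

def parse(command):
    """Split a command into (verb, rest), tolerating a small typo in the verb."""
    first, _, rest = command.strip().partition(" ")
    return match_verb(first), rest
```

With this, `parse("plya songs from coldplay")` resolves the verb to `play`, while a word that resembles no known verb comes back as `None` so the interface can say it did not understand.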

There are some tricky parts though:

- Should it be context aware? Notice how, in the second part of the last command, I mentioned "the list". I think it should, and maybe it should even ask "which list?" in case there is more than one.

- How do you define commands in a way that makes it easy to add and compose commands, and reduces or eliminates the ambiguity for the parser?

- Is the kind of parsing you do in voice recognition similar enough to be compatible with text parsing?
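One possible way to make commands easy to add, sketched under loud assumptions: each command is a regular expression plus a handler, registered with a decorator. The `command` decorator, the `COMMANDS` registry, and the handler names are hypothetical, not an existing tool. Ambiguity is only detected (more than one pattern matching), not resolved.

```python
import re

COMMANDS = []  # hypothetical registry of (compiled pattern, handler) pairs

def command(pattern):
    """Register a handler for inputs that fully match the given regex."""
    def register(handler):
        COMMANDS.append((re.compile(pattern, re.IGNORECASE), handler))
        return handler
    return register

@command(r"play songs from (?P<artist>.+)")
def play_artist(artist):
    return ("play", artist)

@command(r"set alarm to (?P<day>\w+) at (?P<time>\w+)")
def set_alarm(day, time):
    return ("alarm", day, time)

def dispatch(text):
    """Call the single matching handler, or report unknown/ambiguous input."""
    hits = []
    for pattern, handler in COMMANDS:
        match = pattern.fullmatch(text.strip())
        if match:
            hits.append((handler, match))
    if not hits:
        return ("unknown", text)
    if len(hits) > 1:
        return ("ambiguous", [handler.__name__ for handler, _ in hits])
    handler, match = hits[0]
    return handler(**match.groupdict())
```

Adding a command is then one decorated function. Rigid regexes are probably too strict for real voice input, so this is a starting point at best.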

If someone knows of a tool similar to this, please let me know.

EDIT: Fixed typo.

1 comment

$ Play all Coldplay songs that were the most popular of their album

$ Play all Coldplay songs about love

$ Play the list of Coldplay songs for my 8 a.m. alarm

Now, how would the UI go about specifying which previously defined list you want to refer to? Sometimes you'll want to pick the most recent one, sometimes the one most closely matching the definition, sometimes the one that best matches the "alarm clock" format. Sometimes you'll want to offer the user a choice among all objects similar enough to the description (and if the object was built iteratively, do you offer intermediary objects as possible choices?). And sometimes you'll want to ask a short question to restrict the possible choices, if there seems to be a clear criterion that improves understanding enough to be worth the time it takes to ask (this can depend on the user, environment, or situation, to choose between a fast answer and a precise one).
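To make two of these strategies concrete, here is a rough sketch that scores candidates by word overlap with the request plus a small recency bonus, and falls back to asking when there is no clear winner. Everything in it (`NamedList`, `score`, `resolve`, the 0.5 bonus, the `margin` threshold) is an assumption chosen for illustration, not a recommended weighting.

```python
from dataclasses import dataclass

@dataclass
class NamedList:
    name: str
    description: str
    created_at: int  # hypothetical timestamp; larger means more recent

def score(candidate, query_words, newest):
    """Word overlap with the description, plus a small bonus for the newest list."""
    overlap = len(query_words & set(candidate.description.lower().split()))
    return overlap + (0.5 if candidate.created_at == newest else 0.0)

def resolve(query, candidates, margin=1.0):
    """Return ('use', item) on a clear winner, else ('ask', top two items)."""
    if not candidates:
        return ("none", [])
    words = set(query.lower().split())
    newest = max(c.created_at for c in candidates)
    ranked = sorted(candidates, key=lambda c: score(c, words, newest), reverse=True)
    if len(ranked) == 1:
        return ("use", ranked[0])
    top, runner_up = (score(c, words, newest) for c in ranked[:2])
    if top - runner_up >= margin:
        return ("use", ranked[0])
    return ("ask", ranked[:2])
```

A vague request such as "the list" ends in `"ask"` when two lists exist, while a more specific description can win outright. The fixed margin stands in for the "is it worth the time to ask" judgment, which a real system would presumably have to adapt per user and situation.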

To me the difficulty is that humans build the choice of strategy on the fly depending on context, usually without an understanding of all possible strategies, instead just magically guessing a strategy that seems to fit well enough. This means being able to learn strategies from previous experience and building an evolving understanding of contexts.

Now this is probably not necessary to build a functioning interactor, but this is a reasonable description of normal human interaction, and the capacity for systems to adapt to contexts without much more outside help than humans do is going to be a good way to rate them.

Notice that your first and last examples are somewhat simple to explain conceptually, so they should also be simple to build by composing the right building blocks:

"Play" gives away the fact you're looking for a song or list of songs, songs which you'll hopefully have in an internal knowledge base.

"all Coldplay songs": just filter by Coldplay. If your library is big enough, you could figure out that it refers to the artist.

"that": we need to filter again by the condition which follows.

Speeding up... "were the most popular of their album": take each song and its corresponding album, sort the album's songs by the property "popularity", and check that the song is the first one. You'll need to know that "popular" refers to the property "popularity".

The third one is pretty similar. The second one could be easy too, but the piece "about love" is complicated on its own.
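The breakdown of the first example can be sketched as two composable filters. The `Song` fields, the function names, and the library entries are placeholders invented for the example (the titles and popularity numbers are not real data), and "most popular" is taken to mean the highest `popularity` value per album among the songs passed in.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    album: str
    popularity: int

def by_artist(songs, artist):
    """'all Coldplay songs': keep songs whose artist matches, ignoring case."""
    return [s for s in songs if s.artist.lower() == artist.lower()]

def most_popular_of_album(songs):
    """'most popular of their album': keep the top song of each album."""
    best = {}
    for s in songs:
        if s.album not in best or s.popularity > best[s.album].popularity:
            best[s.album] = s
    return [s for s in songs if best[s.album] is s]

# Placeholder library; values are made up.
library = [
    Song("track 1", "Coldplay", "album x", 5),
    Song("track 2", "Coldplay", "album x", 9),
    Song("track 3", "Coldplay", "album y", 3),
    Song("track 4", "Someone Else", "album z", 8),
]

to_play = most_popular_of_album(by_artist(library, "Coldplay"))
```

Here `to_play` ends up holding "track 2" and "track 3". Each phrase of the command maps to one function, and "that" is just function composition.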

IMO this means three things: firstly, words don't map directly to commands / capabilities, which means that having a composable way to define capabilities is hard. You'll likely need to define many ways to do each thing, but you could add them one by one, over time. Secondly, the tool should be able to tell that it doesn't know what you're talking about (what is this "about love" thing about?!?). Lastly, it should be interactive, so that it can ask/tell you when it doesn't know ("which list do you mean, A or B?").
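A tiny sketch of the first two points, assuming the command has already been split into fragments somehow: a phrase table maps several wordings to one capability, and anything missing from the table is reported back instead of guessed. The `PHRASES` table, the capability names, and the reply wording are all hypothetical.

```python
# Hypothetical phrase table: several phrasings map to one capability.
PHRASES = {
    "most popular": "sort_by_popularity",
    "popular": "sort_by_popularity",
    "from": "filter_by_artist",
    "by": "filter_by_artist",
}

def understand(fragments):
    """Split fragments into (known capabilities, fragments it cannot handle)."""
    known, unknown = [], []
    for fragment in fragments:
        capability = PHRASES.get(fragment.lower())
        if capability:
            known.append(capability)
        else:
            unknown.append(fragment)
    return known, unknown

def reply(fragments):
    """Answer with the understood capabilities, or admit what was not understood."""
    known, unknown = understand(fragments)
    if unknown:
        return "I don't know what you mean by: " + ", ".join(repr(f) for f in unknown)
    return "OK: " + ", ".join(known)
```

New phrasings can be added to the table one at a time, and `reply(["popular", "about love"])` complains only about `'about love'`.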

Your comment about context is on point. We can't expect the tool to understand context it doesn't know about, which is why we cannot expect human-level understanding from this. But it doesn't have to be human level, it just needs to be good enough to be useful, and we do have a lot of room to improve.

We spend years learning this stuff, we could slowly teach a few tricks to the computer ;)