by rtpg 695 days ago
Is this not actually kind of powerful? Having linguists write up a bunch of rules seems a lot more predictable than "rolling a bunch of dice and hoping that some LLM spits out a coherent set of steps".

It feels very fractal, but on the other hand, if Alexa has only a specific gamut of responses, it's not exactly a limitless state space, right?

Very curious about what those rules look like though.

3 comments

The problem is it's completely undiscoverable. You can tell Alexa "play some music" because you're pretty sure one of these linguists added a rule for that. But can you tell it "play me a song that lasts longer than 5 minutes"? Doubtful. The only way to know is to try it.

The problem is the space of possible commands is waaaaay bigger than the space of commands you can manually handle, which means if you just randomly try stuff 95% of the time it won't work. Users learn that very quickly and end up sticking to the few commands they know work.

The one exception is "search" queries - "how tall is Everest" and so on, but that only really works well on Google's platform because they've done all the work for that already.

Contrast that with LLMs, which at least basically understand everything you're asking of them. If you give them a simple API to carry out actions, they can do really complex commands like "send a WhatsApp to my wife telling her when I'll get home if I start cycling in 10 minutes". That's impossible without LLMs but pretty trivial with them.

Obviously the downside is they are prone to bullshitting and might do completely the wrong thing.

It’s worse than that. These systems can be adapted by looking at failed user commands, but people don’t really sit around trying out fun things and watch it fall on its face for longer than the first day or so. After that, the novelty wears off, so you’ve trained your users to accept the device’s limitations. Then, even when you do improve the functionality, your users won’t know! They won’t try it, and those commands will never get traction in the system or get more testing beyond the initial launch criteria. It’s a death spiral. The same thing happens with the tone of voice people use.

> people don’t really sit around trying out fun things and watch it fall on its face for longer than the first day or so.

Isn't this what thumbs-up/down RL is for? To improve the quality of the results.

That’s the intention but very few users enjoy being unpaid QA for trillion dollar corps.

I am confused as to why it's more undiscoverable than, say, some LLM.

> The problem is the space of possible commands is waaaaay bigger than the space of commands you can manually handle, which means if you just randomly try stuff 95% of the time it won't work. Users learn that very quickly and end up sticking to the few commands they know work.

This is not strictly true. Context-free grammars can be written to handle (finite) sentences of arbitrary length! If you have a rule like "play me <song>", then <song> can be "a song that lasts longer than X" or "a song by <artist>", and <artist> can in turn be "<some name>" or "some German singer" or whatever. You can just keep on going.
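
To make the nesting concrete, here is a rough sketch of that kind of grammar as a tiny backtracking matcher. Everything in it is made up for illustration (the rule names `COMMAND`, `SONG`, `ARTIST`, the `NUMBER` and `NAME` placeholders, and the `accepts` helper); it is not how Alexa's rules were actually written.

```python
# Hypothetical toy grammar: each rule maps to a list of alternative productions.
# Uppercase symbols are rules or placeholders, everything else is a literal word.
GRAMMAR = {
    "COMMAND": [["play", "me", "SONG"]],
    "SONG": [
        ["a", "song", "that", "lasts", "longer", "than", "NUMBER", "minutes"],
        ["a", "song", "by", "ARTIST"],
    ],
    "ARTIST": [["some", "german", "singer"], ["NAME"]],
}


def match(symbol, tokens, pos):
    """Yield every position where `symbol` can end if it starts at `pos`."""
    if symbol == "NUMBER":
        # Placeholder: a single all-digit token.
        if pos < len(tokens) and tokens[pos].isdigit():
            yield pos + 1
    elif symbol == "NAME":
        # Placeholder: one or more arbitrary tokens.
        for end in range(pos + 1, len(tokens) + 1):
            yield end
    elif symbol in GRAMMAR:
        # Nested rule: try each alternative production in turn.
        for production in GRAMMAR[symbol]:
            yield from match_sequence(production, tokens, pos)
    elif pos < len(tokens) and tokens[pos] == symbol:
        # Literal word.
        yield pos + 1


def match_sequence(symbols, tokens, pos):
    """Match a sequence of symbols one after another, with backtracking."""
    if not symbols:
        yield pos
        return
    for mid in match(symbols[0], tokens, pos):
        yield from match_sequence(symbols[1:], tokens, mid)


def accepts(sentence):
    """True if the whole sentence is derivable from COMMAND."""
    tokens = sentence.lower().split()
    return any(end == len(tokens) for end in match("COMMAND", tokens, 0))
```

With this, `accepts("play me a song that lasts longer than 5 minutes")` and `accepts("play me a song by some German singer")` both succeed, while `accepts("play some jazz")` does not, because no rule covers it. Adding coverage means adding another production, which is exactly the manual work the parent comment is worried about.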

> The one exception is "search" queries - "how tall is Everest" and so on, but that only really works well on Google's platform because they've done all the work for that already.

Had a small Google Assistant thingy for years, and that search stuff works great, until it doesn't, and completely misses the mark. This immediately kills trust and reduces it to a gadget that I will only use for non-critical stuff, always expecting it to break anyway.

> But can you tell it "play me a song that lasts longer than 5 minutes"?

I don't think even pre-LLM technology allows you to do this.

I can't even do something as basic as go to Spotify's search page and filter by "only genres I like". Neither a smart version of that filter nor a manual version is possible.

Honestly the FSTs themselves were actually really cool; it's very much GOFAI. It automatically creates lots of permutations, e.g. `play taylor swift`, `please play taylor swift`, `play taylor swift now`, etc. And once the FST is built it always works deterministically. It's compiled to a graph and an incoming command is pushed through the state machine; if you get to an end state, it "matched the FST" and some specific behaviour would be triggered.

The rules were really just strings and we had efficient matching against them. I didn't work on that; I would assume some sort of LHS.
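
For anyone who wants a feel for the "compile to a graph, push the command through" part, here is a minimal sketch. It is a plain finite-state acceptor built as a trie, not a real FST toolkit, and the template format, the `PlayArtist` intent name, and the helper functions are all hypothetical.

```python
from itertools import product


def expand(template):
    """Expand a template into every permutation of its alternatives.

    A template is a list of slots; each slot is a list of alternative
    phrases, where "" means the slot is optional.
    """
    for choice in product(*template):
        yield [tok for phrase in choice for tok in phrase.split()]


def compile_graph(templates):
    """Compile all permutations into a trie-shaped state graph."""
    transitions = [{}]  # transitions[state] = {token: next_state}; 0 is the start
    accepting = {}      # end state -> intent name
    for intent, template in templates.items():
        for tokens in expand(template):
            state = 0
            for tok in tokens:
                if tok not in transitions[state]:
                    transitions.append({})
                    transitions[state][tok] = len(transitions) - 1
                state = transitions[state][tok]
            accepting[state] = intent
    return transitions, accepting


def run(graph, command):
    """Push a command through the graph; return the intent or None."""
    transitions, accepting = graph
    state = 0
    for tok in command.lower().split():
        state = transitions[state].get(tok)
        if state is None:
            return None  # no edge for this token: the command does not match
    return accepting.get(state)  # None if we stopped in a non-end state


# Hypothetical rule: optional "please", then "play taylor swift", optional "now".
TEMPLATES = {
    "PlayArtist": [["", "please"], ["play"], ["taylor swift"], ["", "now"]],
}
GRAPH = compile_graph(TEMPLATES)
```

Here `run(GRAPH, "please play taylor swift now")` returns `"PlayArtist"`, and so do the other three permutations, while `run(GRAPH, "play taylor")` returns `None` because it stops before an end state. The matching is fully deterministic, which is the appeal; the cost is that anything not spelled out in a template simply fails.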

What do all these acronyms mean?

Wouldn't that just be some kind of NLP https://en.wikipedia.org/wiki/Natural_language_processing?

It may just be a long list of if/else and/or switch statements, or something isomorphic to that.