by abritinthebay 3300 days ago
I feel this fundamentally misunderstands how each of their AIs works.

You're describing an interface problem. Google's AI is extremely good at recognizing speech in comparison to Siri but that's a wholly different thing to deep understanding.

Siri is a more advanced AI in many ways due to its understanding of intents. Google's is much more command-based (under the hood). However, Apple's speech recognition is letting it down here. Intents are a much more sophisticated way to interact with an AI than Google's command-based structure, but they are correspondingly more complex to integrate and you realistically need to infer more from the user (which Apple needs to get better at).

I don't disagree that Apple needs to step up their game with how people interact with Siri, but it's a perceptual issue with the interface, not with the underlying AI.

(source - have discussed these exact issues with one of the founders of Nuance).

5 comments

This runs contrary to the experience of most people.

See http://www.businessinsider.com/siri-vs-google-assistant-cort...

Siri is dead last. Surprisingly, Cortana did quite well (I didn't realize Microsoft was catching up in this space).

Those responses require a lot more than just voice recognition - they require an understanding of context and "intent".

That study isn't very useful.

> A recent study […] asked the major voice assistants 5,000 general knowledge questions

General knowledge questions are just one of the many things you can use voice assistants for. It's well known that Siri is the worst at general knowledge questions, but that doesn't mean Siri is the worst digital assistant.

This morning I told my Google Home "Can you turn your volume all the way up?" and it did it. It maxed the volume, not just turned it up a little bit. To me, this indicates that the Google Assistant is actually pretty good at understanding intent and not just repeating exact commands. Or maybe _I'm_ just misunderstanding your definition of 'commands' and 'intents'.

I mean... that's a command for sure.

This is where a good UI can obscure the underlying thing though. Google is trying to find intent via the text of speech and that's one way to go about it for sure. It's totally valid. It's more limited, but it'll work great if there is a command that it can match it to.

But there has to be an underlying single command that is "turn volume up".

With an intent-based system you can chain complex intents together, and the result is the behavior of those intents' interaction.

It's... ugh, I'm explaining this poorly.

Think of intents as building blocks to create something from whereas commands are just specific things. I wish I could explain this better.
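
One way to picture the distinction, as a purely illustrative Python sketch. Nothing here reflects how Siri or Google Assistant is actually implemented; every name (`COMMANDS`, `run_command`, `Intent`, `compose`) is hypothetical and only meant to show "one fixed command per phrase" versus "small blocks you chain together".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]

# Command style (hypothetical): an utterance must match one predefined command.
COMMANDS: Dict[str, Callable[[State], State]] = {
    "turn volume up": lambda s: {**s, "volume": min(s["volume"] + 1, 10)},
    "max volume": lambda s: {**s, "volume": 10},
}

def run_command(utterance: str, state: State) -> State:
    """Look up a single matching command; fail if none exists."""
    handler = COMMANDS.get(utterance)
    if handler is None:
        raise KeyError(f"no command matches {utterance!r}")
    return handler(state)

# Intent style (hypothetical): small building blocks that can be chained.
@dataclass(frozen=True)
class Intent:
    name: str
    apply: Callable[[State], State]

def compose(intents: List[Intent], state: State) -> State:
    """Apply each intent in order; the result is their combined behavior."""
    for intent in intents:
        state = intent.apply(state)
    return state

set_max = Intent("set_volume_max", lambda s: {**s, "volume": 10})
dim = Intent("dim_lights", lambda s: {**s, "brightness": 2})

state = {"volume": 3, "brightness": 8}
after_command = run_command("max volume", state)
after_intents = compose([set_max, dim], state)
```

In the command table, anything without an exact entry fails outright. In the intent version, new behavior comes from combining existing blocks rather than adding another entry.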

I don't really understand your distinction between intents and commands. I've created apps that leverage a variety of bot frameworks, and most of them seem to fall under your "command" criticisms, but are labeled as intents. I think I understand the heart of your argument, which is akin to saying Google handles intents / actions as if it were filling inputs on a web form that ultimately goes to an API for response generation.

Having said that, I don't know how you can say that Siri's backend is much better from an intent perspective without being able to leverage it properly because of the shortcomings related to the UI. From what I've seen, it doesn't even handle context well. Now it sounds like Siri will be used to do proactive things, which is certainly new and different from Google Assistant. Yet I suspect that logic is just being branded as Siri because it is a push to label Siri as your intelligent assistant as opposed to the weird robot thing you can use to check the weather.

So what would be an advantage of Apple's system (assuming they improve speech recognition) and how would that contrast with Google's approach?

The underlying system's power to create connections between things is greater.

Basically if Apple actually manages to get the voice recognition part down (so it parses intent/structure rather than just pure words) it'll be very powerful.

Right now Apple has a frontend that is more suitable for a command-based system and Google has a frontend that is better for an intent-based system.

It's easier to improve the frontend than create a better backend though. So that's the competitive trade-off that Google has made.

I actually think it was smart of Google, and quite un-Apple-like of Apple to do it the way they did. Usually they are very focused on the interface as well.

Then again, at the time, Siri was quite good at that. It's just that Google focused on it (clearly).

Oh I see, thanks!

[citation needed]