Ask HN: After ChatGPT, why are voice assistants still so horrible?
3 points by throwawayllaq 999 days ago
I just spent 15 frustrating minutes talking to Google Assistant asking simple questions that can easily be found on, mild shock, Google, and all I got was "sorry, I didn't understand".

Same with Siri or even Alexa.

Google Assistant was launched in 2016. ChatGPT is much younger and is already running circles around them with the plugins.

Which evil force is making them so bad?

3 comments

It's because speech to text, text to speech, and LLMs all have latency. They are working on it. There were rumors that Apple is spending a million dollars a day on its secret AI. Probably some form of it will go into Siri.

Edit: you can try the GPT-4-enhanced Bing; it works pretty well with voice.

I expected TTS/STT to be a solved problem by now, after decades of work on it.

I can understand LLMs having greater latency but all flagship smartphones have inference accelerators these days.

A good response is just an API call away, which I think Google Assistant already makes (it never gives me an instantaneous answer).

My experience with Google Assistant is a disaster. I ask in English, and it answers in Turkish that it couldn't understand. I repeat in Turkish, it switches back to English, and we can't get any further.
My exact issue today too. To my surprise, I was talking in Portuguese and it was answering in Portuguese even though my phone is set to English. Then I opened Spotify and it started talking in English and misunderstanding everything.
> Which evil force is making them so bad?

First-party liability, probably. I doubt Apple or Google are interested in shipping an "assistant" that can't reliably distinguish fiction from fact.

To be fair, I'm not saying it should hold all the knowledge in the world. I just expect it to give me short answers to pretty simple questions.

For example: "Hey Google, what are the biggest cities in Australia?" could be answered with "Hey Dave, the biggest cities in Australia are Sydney, Melbourne, and Brisbane" OR "I don't know this information and can't search the web for it"... but it just answers "I did not understand" over and over for anything.

It seems to only understand (sometimes): "play X on spotify", "directions to X", "restaurants" ... anything in slightly more complex natural language is a no go.
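
To illustrate the behavior described above, here is a rough sketch of what a purely template-based command matcher would look like. This is only a guess at the kind of design that would produce these symptoms, not a claim about how Google Assistant actually works; the template list and the names `TEMPLATES`, `match_intent`, and `respond` are all hypothetical.

```python
import re

# Hypothetical command templates, invented for illustration only.
# They mirror the three phrases mentioned in the comment above.
TEMPLATES = [
    ("play_music", re.compile(r"^play (?P<item>.+) on spotify$")),
    ("directions", re.compile(r"^directions to (?P<place>.+)$")),
    ("find_restaurants", re.compile(r"^restaurants$")),
]


def match_intent(utterance):
    """Return (intent, slots) if the utterance fits a template, else None."""
    # Normalize case and drop trailing punctuation before matching.
    text = utterance.lower().strip().rstrip("?.!")
    for intent, pattern in TEMPLATES:
        m = pattern.match(text)
        if m:
            return intent, m.groupdict()
    return None


def respond(utterance):
    """Answer known commands; fall back to a fixed failure message."""
    result = match_intent(utterance)
    if result is None:
        # Anything outside the fixed templates ends up here.
        return "I did not understand"
    intent, slots = result
    return f"OK: {intent} {slots}"


if __name__ == "__main__":
    print(respond("Play Daft Punk on Spotify"))
    print(respond("What are the biggest cities in Australia?"))
```

With a matcher like this, any phrasing that falls outside the fixed templates lands in the same fallback branch, which would look exactly like the "I did not understand" loop from the outside.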