Hacker News
by bottlepalm 1066 days ago
How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet, and how has OpenAI not released their own voice assistant?

Also, like RSS, if there were some standard URL that websites exposed for AI interaction, using this TypeChat to expose the interfaces, we'd be well on our way here.
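To make the idea concrete, here is a rough sketch of what such an exposed interface could look like. TypeChat's approach, as I understand it, is to describe the allowed responses as TypeScript types and check the model's JSON against them; the sketch below does not call TypeChat itself and uses a hand-written check instead. The `SiteAction` type, the `parseSiteAction` helper, and the `/.well-known/ai-actions.json` path are all made up for illustration, not an existing standard or API.

```typescript
// Hypothetical: the actions a site might publish at a well-known URL such as
// /.well-known/ai-actions.json (this path is an assumption, not a standard).
type SiteAction =
  | { type: "search"; query: string }
  | { type: "addToCart"; productId: string; quantity: number };

// Narrow an untrusted value (e.g. JSON produced by an LLM) to a SiteAction.
// Returns null instead of throwing so the caller can re-prompt the model.
function parseSiteAction(value: unknown): SiteAction | null {
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  const kind = v["type"];

  if (kind === "search") {
    const query = v["query"];
    if (typeof query === "string") return { type: "search", query };
    return null;
  }

  if (kind === "addToCart") {
    const productId = v["productId"];
    const quantity = v["quantity"];
    if (
      typeof productId === "string" &&
      typeof quantity === "number" &&
      Number.isInteger(quantity) &&
      quantity > 0
    ) {
      return { type: "addToCart", productId, quantity };
    }
    return null;
  }

  // Anything not in the published schema is rejected.
  return null;
}

// Usage: a well-formed reply is accepted, a malformed one is rejected.
const accepted = parseSiteAction(JSON.parse('{"type":"search","query":"usb-c hub"}'));
const rejected = parseSiteAction(JSON.parse('{"type":"addToCart","productId":"p1","quantity":-2}'));
console.log(accepted, rejected);
```

The point of the closed union is that the assistant can only ever trigger actions the site has explicitly published, and anything else is rejected before it runs.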

7 comments

OpenAI is pretty likely working on their own (see Karpathy's "Building a kind of JARVIS @ OpenAI"), and Microsoft of course is doing an integration or reinterpretation of Cortana with OpenAI's LLMs (since they are incapable of building their own models nowadays, it seems: "Why do we have Microsoft Research at all?" -S.N.), but there's a lot less value in a voice-driven LLM than there is in actually being able to perform actions. Take Alexa for example: you need a system that can handle smart home control in a predictable, debuggable way, otherwise people would get annoyed. I definitely think you can do this, but the current system as built (and others like Siri and, to a lesser extent, Cortana) has a bunch of hooks and APIs being used by years and years of rules and software built atop less powerful models. They need to both maintain the current quality and improve on it while swapping out major parts of their system in order to make this work, which takes time.

Not to mention that none of these assistants actually make any money, they all lose money really, and are only worthwhile to big companies with other ways to make cash or drive other parts of their business (phones, shopping, whatever), so there's less incentive for a startup to do it.

I worked on both Cortana and Alexa in the past and thought a lot about trying to build a new version of them from the ground up with the LLM advancements. While the tech was all straightforward, and I even had some new ideas for use cases that are enabled now, I could not figure out a business model that would work (hence, I'm working on something completely different now).

It's July, they just needed to put a voice interface on ChatGPT, it'd easily help them sell more pro licenses as well. I'm not a conspiracy person, but this just seems so obvious it feels like there's something else going on here.
The official ChatGPT app has had voice-recognition for a while now. Still not closing the obvious loop with text-to-speech, but probably they have bigger fish to fry. It might be that the projected extra subscription revenue would not make such a big difference in the rate at which they burn through capital.
No big company wants their appliance to accidentally talk a customer's child into suicide or a spouse into a divorce. Bad for image.
It's not like ChatGPT can't do that already..
> How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet

When I first learned what ChatGPT was my thought was "oh so like what Siri is supposed to be."

Talking to Alexa is laughable now, after having interacted with ChatGPT and Bing. It's so frustrating to see capable hardware being let down by crappy software for years upon years.
Microsoft is doing that to replace Cortana in Windows 11.
I'm really looking forward to something that I can use to control Home Assistant. I'm just really nervous about using any cloud-based API for this, so I would like to get something running on a server in my own house. But I would also want the voice recognition and response times to be extremely fast so I don't feel like I'm ever waiting for anything. I've seen a few DIY attempts at a personal assistant but there's always a significant delay that would become very annoying if I used it regularly.
Seriously, it feels like there’s some collusion going on behind the scenes. This is the most obvious use case for the technology, but none of the big vendors have explored it.
It takes a while to develop a product, and the world only woke up to LLMs mere months ago.
I think it's because it turns out that taming a generative language model is really difficult. It's what we need to support more than some hardcoded simple questions, but companies like Google who are known for search want to keep their image of "use us to find what you're looking for". In the current state, their models (especially Bard in my experience) simply return bullshit and want to sound confident. They need to get beyond that stage.

But I feel you. My Google Assistant doesn't even seem to look for answers to questions anymore. All I get, even for simple queries, is a "sorry, I don't understand".

Willow and the Willow Inference Server have the option to use Vicuna with speech input and TTS.