| HN Mirror

	by bottlepalm 1066 days ago
	How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet, and how has OpenAI not released their own voice assistant? Also, like RSS, if there were some standard URL that websites exposed for AI interaction, using TypeChat to expose the interfaces, we'd be well on our way here.

7 comments

dbish 1066 days ago

OpenAI is pretty likely working on their own (see Karpathy's "Building a kind of JARVIS @ OpenAI"), and Microsoft of course is doing an integration or reinterpretation of Cortana with OpenAI's LLMs (since they are incapable of building their own models nowadays, it seems - "Why do we have Microsoft Research at all?" -S.N.), but there's a lot less value in a voice-driven LLM than there is in actually being able to perform actions. Take Alexa, for example: you need a system that can handle smart home control in a predictable, debuggable way, otherwise people would get annoyed. I definitely think you can do this, but the current systems as built (Alexa, and others like Siri and, to a lesser extent, Cortana) all have a bunch of hooks and APIs used by years and years of rules and software built atop less powerful models. They need to both maintain the current quality and improve on it while swapping out major parts of their system in order to make this work, which takes time.

Not to mention that none of these assistants actually make any money, they all lose money really, and are only worthwhile to big companies with other ways to make cash or drive other parts of their business (phones, shopping, whatever), so there's less incentive for a startup to do it.

I worked on both Cortana and Alexa in the past and thought a lot about trying to build a new version of them from the ground up with the LLM advancements. While the tech was all straightforward, and I even had some new ideas for use cases that are enabled now, I could not figure out a business model that would work (hence, I'm working on something completely different now).

bottlepalm 1066 days ago

It's July, they just needed to put a voice interface on ChatGPT, it'd easily help them sell more pro licenses as well. I'm not a conspiracy person, but this just seems so obvious it feels like there's something else going on here.

pegasus 1066 days ago

The official ChatGPT app has had voice recognition for a while now. They still haven't closed the obvious loop with text-to-speech, but they probably have bigger fish to fry. It might be that the projected extra subscription revenue would not make such a big difference in the rate at which they burn through capital.

throwaway290 1066 days ago

No big company wants their appliance to accidentally talk a customer's child into suicide or a spouse into a divorce. Bad for image.

bottlepalm 1065 days ago

It's not like ChatGPT can't do that already..

nonethewiser 1066 days ago

> How does no voice assistant (Apple, Google, Amazon, Microsoft) integrate LLMs into their service yet

When I first learned what ChatGPT was, my thought was "oh, so like what Siri is supposed to be."

perryizgr8 1066 days ago

Talking to Alexa is laughable now, after having interacted with ChatGPT and Bing. It's so frustrating to see capable hardware being let down by crappy software for years upon years.

zitterbewegung 1066 days ago

Microsoft is doing that to replace Cortana in Windows 11.

nathan_f77 1066 days ago

I'm really looking forward to something that I can use to control Home Assistant. I'm just really nervous about using any cloud-based API for this, so I would like to get something running on a server in my own house. But I would also want the voice recognition and response times to be extremely fast so I don't feel like I'm ever waiting for anything. I've seen a few DIY attempts at a personal assistant but there's always a significant delay that would become very annoying if I used it regularly.

9dev 1066 days ago

Seriously, it feels like there’s some collusion going on behind the scenes. This is the most obvious use case for the technology, but none of the big vendors have explored it.

jomohke 1066 days ago

It takes a while to develop a product, and the world only woke up to LLMs mere months ago.

mavamaarten 1066 days ago

I think it's because it turns out that taming a generative language model is really difficult. It's what we need to support more than some hardcoded simple questions, but companies like Google who are known for search want to keep their image of "use us to find what you're looking for". In the current state, their models (especially Bard in my experience) simply return bullshit and want to sound confident. They need to get beyond that stage.

But I feel you. My Google Assistant doesn't even seem to look for answers to questions anymore. All I get, even for simple queries, is a "sorry, I don't understand".

COGlory 1066 days ago

Willow and the Willow Inference Server have the option to use Vicuna with speech input and TTS.