by jchw
498 days ago
OpenAI's approach is certainly more technically interesting, and probably the way to go in the longer term, if the juice is worth the squeeze with this sort of technology. (After being relatively unmoved by LLMs for many other tasks, I found the voice assistant concept a lot more interesting, personally, even though I still don't have any routine uses for it.) That said, it doesn't really matter exactly how it works internally: Gemini Live accomplishes what it sets out to do, in that it feels very natural and works fairly well. I think it's clear there will be benefits to the multimodal approach of running voice directly in and out of a model for this sort of application, but if stringing together other existing technology can get you 80% of the way, there's not really a rush to get there. I don't find this too surprising, since as far as speech recognition and voice synthesis go, the state of the art today is very good, and computer voice interactions have mostly been held back by other things.