by KianHooshmand 1176 days ago
Our belief is that at some point OpenAI will add a speech-to-speech model. This will improve the library's functionality: once a single entity controls the whole stack, the product will naturally get better in both latency and quality.

Our library is open source so that we can all build a development/utility layer on top of whatever foundational models are created. Plugins, of course, also extend what the agents can do. And right, we will be building enterprise-focused products in the future!

1 comments

OpenAI will absolutely add voice, and my guess is that their voice support will rival anything on the market because they will train the voice model alongside the text and image models. This is likely months away, if not weeks.

Obviously just my $0.02:

I'd start building for the enterprise right now. Visualize a future where there are several multimodal AGIs that work with voice, images, and text. Be the enterprise voice layer for all of them. Build your moat there.

I don't think there will be any demand for a self-hosted voice model paired with a SaaS LLM, though. That only works if they are going to train an LLM from scratch (or take the legal risk of using LLaMA).
We totally agree – thank you for the feedback! :)