ElevenLabs is the only one offering speech to speech generation where the intonation, prosody, and timing are kept intact. This lets one expressive voice actor slip into many other voices.
ElevenLabs and OpenAI mean completely different things by “speech to speech”.
ElevenLabs’ takes speech audio as input and maps it to new speech audio that sounds like a different speaker said it, but with the exact same intonation.
OpenAI’s is an end-to-end multimodal conversational model that listens to a user speaking and responds in audio.
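To make the difference concrete, here is a toy sketch of the two input/output contracts. It is not either company's API: `Utterance`, `convert_voice`, `converse`, and the `prosody` tuple are hypothetical names made up for illustration, and real systems work on audio waveforms rather than strings.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Utterance:
    """Toy stand-in for an audio clip (hypothetical, not a real API type)."""
    speaker: str
    words: str
    prosody: Tuple[float, ...]  # assumed per-word pitch/timing contour


def convert_voice(source: Utterance, target_speaker: str) -> Utterance:
    """Voice conversion: same words and prosody, only the speaker changes."""
    return replace(source, speaker=target_speaker)


def converse(user_turn: Utterance, respond: Callable[[str], str]) -> Utterance:
    """Conversational model: listens to a turn and answers with new content."""
    reply_words = respond(user_turn.words)
    # The reply has its own delivery; a flat contour is used as a placeholder.
    reply_prosody = tuple(1.0 for _ in reply_words.split())
    return Utterance(speaker="assistant", words=reply_words, prosody=reply_prosody)


actor = Utterance(speaker="actor", words="We ride at dawn", prosody=(0.8, 1.4, 0.6, 1.9))

# Same performance, different voice.
dubbed = convert_voice(actor, "villain")

# A different utterance entirely, produced in response to the input.
reply = converse(actor, lambda text: "Understood")
```

In the first case the output is the same performance in another voice; in the second the output is a response, so nothing about the input's words or delivery is expected to carry over.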