by oidar 463 days ago
Any plans to offer speech-to-speech models that keep prosody, intonation, and timing intact? ElevenLabs is getting expensive for this.

We'll keep expanding these GPT-4o-based models with more controls. Is the main feature you're missing custom voices?
No, not custom voices - but voices that can be influenced by a recording. As in, a male voice actor records a part, and the model transforms it to a female part - keeping all the prosody, intonation and timing in the original recording. This would allow one voice actor to do many roles.