| HN Mirror


	by oidar 463 days ago
	Any plans to offer speech-to-speech models that keep prosody, intonation, and timing intact? ElevenLabs is getting expensive for this.

1 comments

jeffharris 463 days ago

We'll keep expanding these GPT-4o-based models with more controls. Is the main missing feature custom voices?

oidar 462 days ago

No, not custom voices, but voices that can be influenced by a recording. For example, a male voice actor records a part, and the model transforms it into a female part, keeping all the prosody, intonation, and timing of the original recording. This would allow one voice actor to play many roles.
