by ianbicking
385 days ago
I've been using OpenAI's new models a lot lately (https://www.openai.fm/)... separating instructions from the spoken word is an interesting choice, and I'm assuming it also has a lot to do with OpenAI/GPT using "instructions" across their products; maybe they are just more comfortable and familiar with generating the data and doing the training for that style. Separate instructions are a bit awkward, but they do allow mixing general instructions with specific instructions. For example, I can concatenate output-specific instructions like "voice lowers to a whisper after 'but actually', and a touch of fear" with a general instruction like "a deep voice with a hint of an English accent" and it mostly figures it out.

The result with OpenAI feels much less predictable and of lower production quality than Eleven Labs. But the range of prosody is much larger, almost overengaged. The range of _voices_ is much smaller with OpenAI... you can instruct the voices to sound different, but it feels a little like the same person doing different voices.

But in the end OpenAI's biggest feature is that it's 10x cheaper and completely pay-as-you-go. (Why are all these TTS services doing subscriptions on top of limits and credits? Blech!)
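To make the instruction mixing concrete, here is a minimal sketch of the concatenation in Python. The helper names (`combine_instructions`, `build_speech_request`) are hypothetical. The field names `model`, `voice`, `input`, and `instructions`, along with the default model and voice values, are assumptions about what OpenAI's speech endpoint accepts. The code only builds the request and does not call the API.

```python
def combine_instructions(general: str, specific: str) -> str:
    """Join a general voice description with output-specific direction."""
    # Drop empty pieces and trailing periods so the parts join cleanly.
    parts = [p.strip().rstrip(".") for p in (general, specific) if p.strip()]
    if not parts:
        return ""
    return ". ".join(parts) + "."


def build_speech_request(
    text: str,
    general: str,
    specific: str,
    model: str = "gpt-4o-mini-tts",  # assumed model name
    voice: str = "onyx",             # assumed voice name
) -> dict:
    """Build a request payload; the field names are assumptions, not a spec."""
    return {
        "model": model,
        "voice": voice,
        "input": text,  # the words to be spoken
        "instructions": combine_instructions(general, specific),
    }


request = build_speech_request(
    text="I thought it was fine, but actually it was not.",
    general="A deep voice with a hint of an English accent",
    specific="Voice lowers to a whisper after 'but actually', and a touch of fear",
)
print(request["instructions"])
```

Putting the general description first and the per-output direction second is just one ordering; nothing above says the model weighs them differently.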
Terrible pricing model, in my opinion.