by ripperdoc
1149 days ago
Am I hallucinating, or did several of the examples have background audio artifacts, as if it's been trained on speech with noisy backgrounds? I'm guessing audio from movies paired with subtitles. Random background audio can make it quite hard to use in production.
The other side of that problem is an opportunity. That's why the same model can also generate music, background noise and sound effects, simply because the prompt specifies those things explicitly. The input is truly semantic, so the output is rich and reflects that context. If your input text sounds like it came from a speech, then there's a high chance your output audio will sound like a megaphone in a public space with crowd reactions and maybe even applause.