by nine_k
1261 days ago
From the link:

> Tortoise is a bit tongue in cheek: this model is insanely slow. It leverages both an autoregressive decoder and a diffusion decoder; both known for their low sampling rates. On a NVidia Tesla K80, expect to generate a medium sized sentence every 2 minutes.

I suspect that for a real(-ish) time TTS system, something else is needed. OTOH, if you want to record some voice acting for a game or other multimedia product, it may still be more cost-effective than recording a bunch of live humans.

(K80 = NVidia Tesla K80, a GPU; $800-900 for a 24GB version right now.)