| HN Mirror

by nine_k 1261 days ago

From the link:

> Tortoise is a bit tongue in cheek: this model is insanely slow. It leverages both an autoregressive decoder and a diffusion decoder; both known for their low sampling rates. On a NVidia Tesla K80, expect to generate a medium sized sentence every 2 minutes.

I suspect that for a real(-ish) time TTS system, something else is needed. OTOH if you want to record some voice acting for a game or other multimedia product, it still may be more cost-effective than recording a bunch of live humans.

(K80 = NVidia Tesla K80, a GPU; $800-900 for a 24GB version right now.)

2 comments

generalizations 1260 days ago

I see 24GB Tesla K80s on eBay for $90... what am I missing?

shaklee3 1261 days ago

A K80 is extremely old by now, so I'd expect this to be maybe an order of magnitude faster on current hardware.

nine_k 1261 days ago

Would it still require a 3080 to run adequately, that is, with 1-2 seconds of delay? I've no idea what consumer-grade hardware works well for ML loads.

shaklee3 1261 days ago

I haven't tried it, but the K80 is about 6 years old / 5 generations back. There have been massive leaps since then.

epolanski 1261 days ago

6 years old is nowadays more like 3 generations, and it's definitely not an order of magnitude (10x) of difference.

shaklee3 1260 days ago

Kepler, Maxwell, Turing, Volta, Ampere, Lovelace, Hopper. It's 6 generations old when you include the microarchitectures. It would be about a 10x improvement.
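
For a rough sense of what that estimate means for the 1-2 second target asked about upthread, here is a back-of-the-envelope sketch. It uses only the figures quoted in this thread (2 minutes per sentence on a K80, a guessed 10x speedup); the function names are made up for illustration, and none of this is a benchmark.

```python
# Figures quoted in the thread; the speedup is a commenter's guess, not a measurement.
K80_SECONDS_PER_SENTENCE = 120.0    # "a medium sized sentence every 2 minutes"
ESTIMATED_SPEEDUP = 10.0            # "about a 10x improvement"
TARGET_DELAYS_SECONDS = (1.0, 2.0)  # "1-2 seconds of delay"


def projected_latency(baseline_s: float, speedup: float) -> float:
    """Seconds per sentence if the baseline time shrinks by `speedup`."""
    return baseline_s / speedup


def required_speedup(baseline_s: float, target_s: float) -> float:
    """Speedup over the baseline needed to reach `target_s` seconds per sentence."""
    return baseline_s / target_s


latency = projected_latency(K80_SECONDS_PER_SENTENCE, ESTIMATED_SPEEDUP)
print(f"Projected latency at {ESTIMATED_SPEEDUP:.0f}x: {latency:.0f} s per sentence")

for target in TARGET_DELAYS_SECONDS:
    needed = required_speedup(K80_SECONDS_PER_SENTENCE, target)
    print(f"Speedup needed for a {target:.0f} s delay: {needed:.0f}x")
```

Under these assumptions a 10x gain still leaves roughly 12 seconds per sentence, so a 1-2 second delay would need something like 60-120x over the K80 figure.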

epolanski 1260 days ago

Oh, if it's Kepler, absolutely. I thought 6 years old, thus Ampere.