Hacker News
by ShamelessC 1260 days ago
Heh, you might want to use an equivalent gaming GPU for the price comparison. Surely a thousand dollars spent on an RTX 4000 series card (Ada Lovelace) would outperform a P5000?

I agree, though. IIRC, Tortoise TTS did a lot of similar work, done by a single person on their own multi-GPU setup. Really impressive effort. Did they get a citation? They deserve one.

Edit: Reading other comments, there seems to be a misconception that the model takes 3 seconds to run. That isn't the case: it requires "just" 3 seconds of example audio to successfully clone a voice (for some definition of success).


The RTX 4000 only has 8 GB of memory, which means reducing the batch size (much slower) and/or how much text you can give it at once (meaning you have to break the text into chunks at places other than sentence breaks).

The RTX 5000, maybe, but I'm not sure how much of a value improvement there is.
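To make the chunking trade-off concrete, here is a rough sketch of greedy length-bounded text chunking: it packs whole sentences into chunks while they fit, and only when a single sentence is too long does it fall back to splitting at word breaks (or hard-splitting an oversized word). `max_chars` is a hypothetical stand-in for whatever input length fits in a given card's memory, and the naive regex sentence splitter is an assumption; this is not Tortoise's (or any other TTS project's) actual splitter.

```python
import re


def _split_words(sentence: str, max_chars: int) -> list[str]:
    """Fallback for an over-long sentence: split at word breaks instead."""
    pieces: list[str] = []
    current = ""
    for word in sentence.split():
        # Hard-split any single word that is longer than the limit.
        while len(word) > max_chars:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Greedily pack text into chunks of at most max_chars characters.

    Chunks end at sentence breaks whenever possible; a sentence longer than
    max_chars forces a break mid-sentence (the case described above).
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if len(sentence) <= max_chars:
            pieces = [sentence]
        else:
            pieces = _split_words(sentence, max_chars)
        for piece in pieces:
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


text = "Short one. Another short one. This sentence is definitely much too long to fit."
print(chunk_text(text, 30))
```

With a 30-character limit the first two sentences share a chunk, but the third sentence has to be cut between "definitely" and "much". A smaller memory budget means a smaller limit and more of these mid-sentence cuts, which is what hurts prosody in practice.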

What is this, ChatGPT? RTX 4000 is a series of cards, some of which have 24 GB of VRAM. There is no such thing as an RTX 5000 series yet.
The commenter you're responding to is talking about the GeForce RTX 40x0 products based on the Ada Lovelace architecture; the Quadro line hasn't even been released on this architecture yet. You are talking about the specific Quadro RTX 4000 product, which uses a TU104 (Turing architecture, two generations behind, with 2304 CUDA cores and 8 GB of memory). The commenter you're responding to is referring to something like a GeForce RTX 4090, which sports an AD102 (Ada Lovelace architecture, with 16384 CUDA cores and 24 GB of memory).

You were merely an unfortunate casualty of Nvidia's product marketing scheme (and a commenter's slightly imprecise reference to it) here.

I'm pretty sure we all lost, heh. Thanks for clarifying. Indeed, there were slight errors in my description, and the other commenter was reasonable in assuming those other cards were under discussion.