HN Mirror



	by tmalsburg2 1822 days ago
	About their TTS system: "These models provide speech synthesis with ~0.12 real-time factor on a GPU and ~1.02 on a CPU." The quality of the samples is really impressive but, wow, isn't this computationally too expensive for many applications?

2 comments

nyanpasu64 1822 days ago

>If, for example, it takes 8 hours of computation time to process a recording of duration 2 hours, the real time factor is 4. When the real time factor is 1, the processing is done in real time. It is a hardware-dependent value.

I think real-time factors smaller than 1 are faster than real-time (not slower) and use less than 100% of a resource's computational power to keep up.
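To make the definition concrete, here is a minimal sketch of the calculation in the quote above. The function name `real_time_factor` is made up for illustration; only the 8-hour and 2-hour figures come from the quoted text.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (hypothetical helper)."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


# The quoted example: 8 hours of computation for a 2-hour recording.
rtf_example = real_time_factor(8 * 3600, 2 * 3600)

# An RTF below 1 means the work finishes before the audio would finish playing.
is_faster_than_real_time = rtf_example < 1.0
```

With these inputs `rtf_example` is 4, so that example is four times slower than real time, and anything below 1 keeps up with playback.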


tmalsburg2 1821 days ago

Not sure what you're quoting because I didn't write that, but

> I think real-time factors smaller than 1 are faster than real-time (not slower) and use less than 100% of a resource's computational power to keep up.

Sure, but who has the necessary GPUs installed? And on CPUs it will apparently take longer to generate speech than the duration of that speech. Unusable for many UIs and it will also drain the batteries of any portable device.


jpetso 1821 days ago

You're not wrong, but with so many chips incorporating some sort of dedicated "AI" or "tensor" functionality, perhaps the issue will resolve itself for most portable devices in a few years. Plus there's always the option of optimizing a little more and/or abusing other available hardware such as DSP chips to get the real time factor down. Anything over 1 isn't great, but it's not a bad start.


mazoza 1821 days ago

It means it is almost 10x faster than real time.

So it's the opposite.


tmalsburg2 1821 days ago

It’s 8.3 times faster than real time if you have a beefy GPU, which most devices don’t have. On a desktop CPU it’s roughly real time, and on smartphones it’s worse.
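The 8.3 figure is just the reciprocal of the quoted GPU real-time factor. A rough sketch of that arithmetic follows; the helper names and the 10-second utterance are assumptions for illustration, and only the 0.12 and 1.02 values come from the quoted TTS claim.

```python
def speedup(rtf: float) -> float:
    """Speed relative to real time, the reciprocal of the RTF (hypothetical helper)."""
    return 1.0 / rtf


def synthesis_seconds(rtf: float, audio_seconds: float) -> float:
    """Estimated time to synthesize audio of the given duration (hypothetical helper)."""
    return rtf * audio_seconds


GPU_RTF = 0.12  # quoted GPU figure
CPU_RTF = 1.02  # quoted CPU figure

gpu_speedup = speedup(GPU_RTF)  # roughly 8.3x faster than real time
cpu_speedup = speedup(CPU_RTF)  # just under 1x, slightly slower than real time

# Assumed 10-second utterance: about 1.2 s on the GPU, about 10.2 s on the CPU.
gpu_wait = synthesis_seconds(GPU_RTF, 10.0)
cpu_wait = synthesis_seconds(CPU_RTF, 10.0)
```

This assumes the real-time factor stays constant across utterance lengths, which the quoted claim doesn't say either way.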
