Hacker News
by pronlover723 1259 days ago
What are the odds of this kind of thing being open source so I can use it at home? So far, most of the "good" text-to-speech systems are commercial services:

https://aws.amazon.com/polly/

https://cloud.google.com/text-to-speech

https://azure.microsoft.com/en-us/products/cognitive-service...

And now this one is also a service.

I tried using tortoise-tts on my M1. Generating a 7-minute speech took 3 days and, while better than the 15-year-old text-to-speech built into the OS, it wasn't close to the quality of the services above. Maybe I don't know how to use it, but of course it's not as simple as text-to-speech. Ideally, you need the system to understand the text so it can act out parts.

Of course, see my username. I want to generate personal adult content, so I'd prefer not to upload it to a service.

4 comments

Any time I see AI model news on HN nowadays, my first question is whether I can run it locally, and if not, what the alternatives are that I can run locally.
> what are the alternatives that I can run locally

...you will be disappointed by the answers to that question for the foreseeable future.

I'm the opposite of disappointed. The number of public pretrained models that have been popping up recently is crazy.
Same model with random tweaks applied.

Just because there is a new toy doesn’t mean capitalism gave up.

There is much more than stable-diffusion out there :)

Of course capitalism doesn't give up, I wouldn't even want it to.

The speed of progress on this front is increasing. These days even "cheap" Rockchip MCUs are packing 5 TOPS AI accelerators, and both AMD and Intel are working on much more powerful ones for their CPUs. Heck, I recently wrote a mobile (Android) app that runs pretty powerful AI for intensive image processing locally on mobile phones, thinking improved privacy would be more in demand than sending everything "to the cloud". I was mildly surprised to discover (after writing the app) that most people don't care. Still, I wouldn't be surprised if in 10 years the majority of the AI people use runs on end-user devices.
Yeah, most people don't care, but it might also be the case that many people who care use iOS, since that's the platform where all photo machine learning provided by the system happens on device.
That’s because you’re running tortoise on a CPU. It does about a sentence a minute on my 3090 GPU. It’s also quite good if you pick “high quality” and train it with 10-second clips at the sample rate and bitrate it asks for.
Effectively superseded by https://github.com/coqui-ai/TTS
What kind of personal adult content do you generate? We are curious for details