Hacker News
by striking 3367 days ago
1. If you can pay for around 24.6 hours of VA speech data, you can get enough data to run this process with the same quality that Google presented (that's from the "Experiments" section). Not cheap (definitely not free, especially considering the amount of quality control you have to apply), but not expensive either.

2. You can rent a 96GB GDDR5 GPU instance from Google's cloud pretty cheaply (https://cloud.google.com/compute/docs/gpus/). I don't think you need anything more powerful than that (but feel free to prove me wrong).

I think your last paragraph is totally misguided/uninformed. You can download models for cheap/free (for non-commercial/edu use) from UPenn (https://www.ldc.upenn.edu/language-resources/data/obtaining). People don't give away models for free with 0 strings attached because they're a pain to make.

And if you want something you can run on a home computer for cheap/free, you can try DeepSpeech: https://github.com/mozilla/DeepSpeech. All you need is an Nvidia GPU.

2 comments

How about feeding it several audiobooks read by a single narrator, coupled with the books in text? Cost would be < $100. There could be legal problems if you tried to sell the resulting voice, but as proof of concept, wouldn't this work?
1. What is your estimate, then? Hundreds of dollars? Thousands? Tens of thousands? Surely it falls into one of the latter two categories, since paying a professional speaker to sit and work for 24.6 hours already comes to about $1,000 if you assume a wage of $40 per hour.

2. A 96GB GDDR5 instance on GCE costs $4,166.40 per month. That is within an affordable range, but definitely not CHEAP. I don't know whether it is powerful enough, but Google used 96 GPUs for their GNMT work. So I'm not confident that a 4-GPU machine is all you need, and it will surely cost much more if you go beyond that.
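
For what it's worth, here is the arithmetic behind the two figures above as a rough sketch. The hours, wage, and monthly price are the numbers quoted in this thread, not verified pricing. The 744-hour month (31 days, billed around the clock) is my own assumption, and the function names are just for illustration.

```python
# Figures quoted in the thread; none are checked against current pricing.
RECORDING_HOURS = 24.6    # hours of speech data cited in the parent comment
HOURLY_WAGE_USD = 40.0    # speaker wage assumed in the reply
GPU_MONTHLY_USD = 4166.4  # quoted monthly price of the 96GB instance
HOURS_PER_MONTH = 744     # assumption: 31-day month, billed around the clock


def recording_cost(hours: float, wage: float) -> float:
    """Wage cost for the finished audio only; retakes and QC are excluded."""
    return hours * wage


def implied_hourly_rate(monthly: float, hours_in_month: int) -> float:
    """Hourly price implied by a flat monthly figure."""
    return monthly / hours_in_month


if __name__ == "__main__":
    print(f"recording: ${recording_cost(RECORDING_HOURS, HOURLY_WAGE_USD):,.2f}")
    print(f"GPU: ${implied_hourly_rate(GPU_MONTHLY_USD, HOURS_PER_MONTH):.2f}/hour")
```

Under those assumptions the wages come to roughly $984 and the instance to about $5.60 per hour. The wage figure is a floor, since a speaker needs more studio time than the finished audio runs, and it ignores quality control entirely.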