by 57844743385 1722 days ago
I did a lot of work researching all the available text-to-speech systems a couple of years ago.

The cloud-based systems from Google, Microsoft, Amazon and IBM are much better than anything else, and among them the neural-network-based systems, which appear to be a sort of separate product category, are far and away the best of all. The neural voices are approaching natural voice intonation and have an almost believable ability to read text.

The ones that sounded most natural were IBM Watson and Google's neural voices.

Amazon Polly appeared to be the furthest behind of all the cloud systems: a really average-sounding product.

Of the local TTS systems, the one built into macOS sounds the best, but they were all very average at best. All the Linux ones frankly sounded like garbage relative to the state of the art.

Things might have advanced with the cloud systems over the past couple of years but I didn’t get the impression the cloud companies were putting much effort into research and development.

2 comments

I searched for a TTS service recently and found wellsaidlabs. It’s a SaaS product but the quality is astonishing. It’s also fast to render the audio: rendering takes approximately 2 times the length of the audio file. Here is an article from MIT Technology Review magazine about it: https://www.technologyreview.com/2021/07/09/1028140/ai-voice...
I had reason to sample the IBM performance recently. It is impressive. Do you know if NN-based systems have been trained on, say, audiobooks for which the text is also available?