Hacker News
by lefthansolo 4963 days ago
This is a nice one, but I'm still confounded by the lack of progress since Bell Labs put an online text-to-speech converter up many years ago. In particular, the notion that the interpretation of each sentence is idempotent (that the same sentence should be rendered identically every time it occurs) is just wrong. Want to see what I mean? A human would not speak like the following; there would be differences in intonation, "emotion" (sounding bored, angry, excited, etc., varying with how many times "dogs" has already been said), speed, and delay. In addition, you have to breathe at some point, and even the best audiobooks have some level of breath noise.

http://tts-api.com/tts.mp3?q=dogs.%20dogs.%20dogs.%20dogs.%2....

1 comment

It takes a breath for blank lines or new paragraphs. http://tts-api.com/tts.mp3?q=High%20Quality%0AWe%20believe%2...!
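
For anyone who wants to try the paragraph-break behaviour themselves, here is a rough sketch of how such a URL could be put together. It assumes only what the links above show: the `tts.mp3` endpoint with the text in a `q` parameter, and a newline encoded as `%0A`. The `tts_url` helper is a made-up name for illustration, and I have not checked whether the service accepts other parameters or is still online.

```python
from urllib.parse import quote

# Endpoint as it appears in the links in this thread.
BASE_URL = "http://tts-api.com/tts.mp3"


def tts_url(paragraphs):
    """Build a request URL with one newline between paragraphs.

    tts_url is a hypothetical helper, not part of any published client.
    """
    text = "\n".join(paragraphs)
    # safe="" percent-encodes everything that is not an unreserved
    # character, so a space becomes %20 and a newline becomes %0A.
    return BASE_URL + "?q=" + quote(text, safe="")


print(tts_url(["High Quality", "We believe"]))
```

Joining with "\n" is what produces the `%0A` seen in the link above; joining with a space instead would give one run-on paragraph to compare against.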