by joegreen 3810 days ago
I've just tried playing "Your account has a balance of 100 dollars and 95 cents" on https://www.ivona.com/us/ and it sounds quite natural to me (but I'm not a native English speaker, so I may not be sensitive to it).

There are two types of text-to-speech systems:

- The "bank" one referenced above: it is built from short recordings of a real person saying words or phrases, cut up and then concatenated. Some messages sound exactly like a real person (because the system just plays a single recording), but when numbers are inserted, the above characterization is quite accurate. No effort is made to fit the inflection to the sentence or to make it sound natural.

- The ivona.com and OS X `say` kind. These generate audio in real time. They may draw on a few recorded samples, but the speech is mostly synthesized on the fly, with each word shaped by the text around it. This is where the research is right now, but the main problem is the CPU required to generate the audio. Your car, or Madden 2015, or the bank might not want to spend that much CPU time to make their audio sound like that.
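
To make the first type concrete, here is a rough sketch of how a "bank" style system might pick which recordings to splice together. The clip names and the `balance_prompt` helper are made up for illustration, not taken from any real product. The output is just a playlist of pre-recorded clips, which is why the inflection never adapts to the sentence.

```python
# Hypothetical clip inventory: every name stands for one pre-recorded audio file.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_clips(n):
    """Return the clip names needed to speak an integer in 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    clips = []
    if n >= 100:
        clips += [ONES[n // 100], "hundred"]
        n %= 100
        if n == 0:
            return clips
    if n < 20:
        clips.append(ONES[n])
    else:
        clips.append(TENS[n // 10])
        if n % 10:
            clips.append(ONES[n % 10])
    return clips


def balance_prompt(dollars, cents):
    """Build the playlist for the balance message (assumed clip names)."""
    return (["your_account_has_a_balance_of"]
            + number_clips(dollars) + ["dollars", "and"]
            + number_clips(cents) + ["cents"])


print(balance_prompt(100, 95))
```

Each clip was recorded once, in isolation, so "ninety" sounds the same whether it starts a sentence or ends one. The second type avoids this by computing the audio from the whole sentence, at a higher CPU cost.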

Just spent more time than I would like to admit making those voices say all manner of inappropriate things and giggling to myself.
Did you try the chipmunk voice?