by 0942v8653 3810 days ago
There are two types of text-to-speech systems:

- The "bank" one referenced above: this is built from short recordings of a real person saying words or phrases, which are cut up and then concatenated. Some messages sound exactly like a real person (because all the system does is play a single recording), but when numbers are inserted, the characterization above is quite accurate. There is no effort to make the inflection fit the sentence or to make it sound natural.

- The kind used by ivona.com and OS X `say`. These generate audio in real time. They may use a few recorded samples, but the output is mostly created on the fly based on the surrounding text. This is where the research is right now, but the main problem is the CPU time needed to generate it. Your car, Madden 2015, or the bank might not want to spend that much CPU time making their audio sound like that.
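To make the "bank" approach concrete, here is a minimal sketch. It assumes a tiny made-up vocabulary, and each "recording" is a fake list of samples (one per letter) rather than real audio. `RECORDINGS`, `number_to_words`, and `synthesize` are hypothetical names for illustration, not any real product's API. It shows why the result sounds choppy: each clip is spliced in exactly as recorded, and nothing adjusts pitch or timing to fit the sentence.

```python
# Hypothetical recorded "bank": one clip per word. A real system would load
# PCM audio from a voice actor; here each clip is fake sample data (the
# character codes of the word) so the sketch runs on its own.
VOCAB = (
    "your balance is dollars "
    "zero one two three four five six seven eight nine ten "
    "eleven twelve thirteen fourteen fifteen sixteen seventeen eighteen nineteen "
    "twenty thirty forty fifty sixty seventy eighty ninety hundred"
).split()
RECORDINGS = {word: [ord(ch) for ch in word] for word in VOCAB}

SILENCE = [0, 0]  # short fixed pause spliced between clips

ONES = VOCAB[4:24]           # "zero" .. "nineteen"
TENS = ["", ""] + VOCAB[24:32]  # index 2 -> "twenty" .. index 9 -> "ninety"


def number_to_words(n: int) -> list[str]:
    """Spell out 0..999 as the sequence of words whose clips get played."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0-999")
    words: list[str] = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if rest >= 20:
        tens, ones = divmod(rest, 10)
        words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    elif rest or not words:
        words.append(ONES[rest])  # 1..19, or "zero" when n == 0
    return words


def synthesize(template: str, amount: int) -> list[int]:
    """Fill {amount} into the template and concatenate the matching clips.

    Every clip is played as recorded: there is no prosody model, so the
    inflection never adapts to where the word lands in the sentence.
    """
    tokens: list[str] = []
    for token in template.lower().split():
        if token == "{amount}":
            tokens += number_to_words(amount)
        else:
            tokens.append(token)
    audio: list[int] = []
    for i, token in enumerate(tokens):
        if i:
            audio += SILENCE
        audio += RECORDINGS[token]  # KeyError if that word was never recorded
    return audio


if __name__ == "__main__":
    print(number_to_words(145))
    print(len(synthesize("Your balance is {amount} dollars", 145)))
```

The second kind of system replaces the fixed lookup table with a model that computes the waveform (and its pitch contour) from the whole sentence. That is where the extra CPU cost comes from.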