by cypher543 4150 days ago
According to the documentation[1], it's a concatenative synthesizer using decision trees for prosody modeling and PSOLA for output.

[1]: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl...


Thanks! I am working in this area and have some ideas for deep learning methods that move away from concatenative synthesis. It will be nice to compare them with what they are using.
We did some work on applying NNs to prosody prediction; see Fernandez, Raul, et al. "Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks." Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), 2014.
This paper (from ICASSP 2013) may be of interest to you: https://static.googleusercontent.com/media/research.google.c...