


	by kastnerkyle 4193 days ago
	What techniques are being used for text to speech? Is it something deep-learning related or more standard HMM synthesis? Any paper references?


cypher543 4193 days ago

According to the documentation[1], it's a concatenative synthesizer using decision trees for prosody modeling and PSOLA for output.

[1]: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl...


kastnerkyle 4193 days ago

Thanks! I am working in this area and have some ideas for deep-learning methods that move away from concatenative synthesis. It will be nice to compare them with what they are using.


picheny 4192 days ago

We did some work on applying NNs to prosody prediction; see Fernandez, Raul, et al. "Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks." Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH). 2014.


woodson 4192 days ago

This paper (from ICASSP 2013) may be of interest to you: https://static.googleusercontent.com/media/research.google.c...
