by mrmaximus 3393 days ago
Based on what little is posted there, I thought they were taking the original recording, then training the model on that recording against the text of the recording, in effect reproducing the recording. I would think the next step is to sample enough audio and text to be able to produce entirely new outputs. In theory, it should even be able to learn when, where, and how to use inflection.