Hacker News
by jmvalin 837 days ago
Well, there are different ways to make things up. We decided against using a pure generative model to avoid making up phonemes or words. Instead, we predict the expected acoustic features (using a regression loss), which means the model is able to continue a vowel. If unsure, it'll just pick the "middle point", which won't be something recognizable as a new word. That's in line with how traditional PLCs work. It just sounds better. The only generative part is the vocoder that reconstructs the waveform, but it's constrained to match the predicted spectrum, so it can't hallucinate either.
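The "middle point" behaviour follows from a general property of regression losses: the prediction that minimizes mean squared error is the mean of the plausible targets, whereas a generative model samples and commits to one of them. The toy sketch below illustrates only that property. It is not the actual PLC model, and the single scalar "feature" and its values (two equally likely continuations at 500 and 900) are made up for illustration.

```python
import random

# Hypothetical toy data: equally likely "next frame" values for one
# acoustic feature, all observed after the same context.
candidates = [500.0, 500.0, 900.0, 900.0]


def mse(prediction, targets):
    """Mean squared error of one constant prediction against all targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# Regression with an MSE loss: the best constant prediction is the mean,
# which lies between the two continuations rather than on either one.
regression_pred = sum(candidates) / len(candidates)

# Brute-force check over a grid that the mean really minimizes the MSE.
best = min(range(0, 1201, 10), key=lambda p: mse(p, candidates))

# A generative model instead samples one plausible continuation outright.
rng = random.Random(0)
generative_pred = rng.choice(candidates)

print(regression_pred)                      # 700.0
print(best)                                 # 700
print(mse(regression_pred, candidates))     # 40000.0
print(mse(500.0, candidates))               # 80000.0
```

The trade-off is that the averaged prediction never commits to a specific new sound, which is the safety property described above, at the cost of being a blend that no single training example actually contained.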

Any demos of this to listen to? It sounds like it could be really good.
There is a demo in the link shared by OP.
That's really cool. Congratulations on the release!