| HN Mirror



	by jmvalin 837 days ago
	Well, there are different ways to make things up. We decided against using a pure generative model to avoid making up phonemes or words. Instead, we predict the expected acoustic features (using a regression loss), which means the model is able to continue a vowel. If unsure, it'll just pick the "middle point", which won't be recognizable as a new word. That's in line with how traditional PLCs work; the result just sounds better. The only generative part is the vocoder that reconstructs the waveform, but it's constrained to match the predicted spectrum, so it can't hallucinate either.
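	A toy sketch of why a regression loss lands on the "middle point" instead of inventing a phoneme. This is not the actual model. It assumes each lost frame is reduced to a single scalar "acoustic feature", uses hypothetical training targets where the same context was followed half the time by phoneme A (feature 1.0) and half the time by phoneme B (feature -1.0), and fits one constant prediction by gradient descent on mean squared error. Under squared-error loss the best constant is the mean of the plausible continuations, so the regressor settles between A and B, while a generative sampler commits to one of them.

```python
"""Toy illustration: regression (MSE) vs. sampling when the next frame is ambiguous.

Assumption: each frame is reduced to one scalar feature, and the targets below
are made up for illustration. Real PLC models predict vectors of spectral
features conditioned on context; this only shows the effect of the loss.
"""
import random

# Hypothetical data: after the same context, the lost frame was phoneme A
# (feature 1.0) half the time and phoneme B (feature -1.0) the other half.
targets = [1.0] * 50 + [-1.0] * 50


def mse(pred, targets):
    """Mean squared error of a constant prediction against all targets."""
    return sum((pred - t) ** 2 for t in targets) / len(targets)


def train_regressor(targets, lr=0.1, steps=200):
    """Fit a single constant prediction by gradient descent on MSE."""
    pred = 0.7  # arbitrary starting guess
    n = len(targets)
    for _ in range(steps):
        # Derivative of mean((pred - t)^2) with respect to pred is 2 * mean(pred - t).
        grad = 2.0 * sum(pred - t for t in targets) / n
        pred -= lr * grad
    return pred


def sample_generative(targets, rng):
    """A generative model commits to one plausible outcome (A or B)."""
    return rng.choice(targets)


if __name__ == "__main__":
    rng = random.Random(0)
    pred = train_regressor(targets)
    # Converges to (approximately) the mean of the targets, i.e. the middle point.
    print(f"regression prediction: {pred:.3f}")
    # Committing to either phoneme costs more under MSE than the middle point.
    print(f"MSE at middle: {mse(0.0, targets):.1f}, MSE at phoneme A: {mse(1.0, targets):.1f}")
    print("generative samples:", [sample_generative(targets, rng) for _ in range(5)])
```

In a real model the prediction is per-dimension over a spectral feature vector and depends on context, so the "middle" is a blend of plausible spectra rather than a single number; the toy only shows why minimizing a regression loss discourages committing to a new phoneme.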

2 comments

stevage 837 days ago

Any demos of this to listen to? It sounds potentially really good.

GaggiX 837 days ago

There is a demo in the link shared by OP.

CharlesW 837 days ago

That's really cool. Congratulations on the release!