| HN Mirror

	by dentalperson 938 days ago
	Yes, they all have significant 'ghosting' artifacts, where the harmonics sound a bit fuzzy if you listen closely. AFAIK all of the recent neural speech engines have this, from SoundStream to EnCodec, especially in low-latency causal setups. WaveNet was a bit better in that regard, but it has fallen out of style due to its complexity and the lack of a bottleneck. It seems like something diffusion post-processing would be able to clean up.
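
	For anyone unsure what "causal" means here, a minimal NumPy sketch, not taken from SoundStream, EnCodec, or any other real codec: a causal layer may only look at past and current samples, while a centered one also sees a few future samples. The function names and the averaging kernel in the usage note are made up for illustration.

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal filter: y[t] = sum_i kernel[i] * x[t - i], zeros before the start."""
    k = len(kernel)
    # Left-pad with k - 1 zeros so no output sample ever reads a future input.
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.convolve(padded, kernel, mode="valid")

def centered_conv1d(x, kernel):
    """Non-causal filter: output t also reads about len(kernel) // 2 future samples."""
    return np.convolve(x, kernel, mode="same")
```

	With a 5-tap averaging kernel such as `np.ones(5) / 5`, changing the input from sample 50 onward leaves the first 50 causal outputs untouched, but changes the centered outputs just before sample 50. That missing lookahead is the constraint a low-latency streaming setup has to live with.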