by dentalperson 938 days ago
Yes, they all have significant 'ghosting' artifacts, where the harmonics sound a bit fuzzy if you listen closely. AFAIK all of the recent neural speech codecs have this, from SoundStream to EnCodec, especially in low-latency causal setups. WaveNet was a bit better in that regard, but it has fallen out of style due to its complexity and the lack of a bottleneck. It seems like something diffusion post-processing would be able to clean up.