by dentalperson
938 days ago
Yes, they all have significant 'ghosting' artifacts where the harmonics are a bit fuzzy if you listen closely. AFAIK all of the recent neural speech engines have this, from SoundStream to EnCodec, especially in low-latency causal setups. WaveNet was a bit better in that regard but has fallen out of style due to complexity and the lack of a bottleneck. It seems like something diffusion post-processing would be able to clean up.