by narrationbox
1111 days ago
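Since it does the signal processing in the Fourier domain, does this suffer from audio artefacts, e.g. hissing in the output? Torch's inverse STFT (https://pytorch.org/docs/stable/generated/torch.istft.html#t...) is itself a deterministic overlap-add inverse, but it needs the complex spectrogram. If the model only gives you magnitudes, the phase has to be estimated, typically with Griffin-Lim (e.g. torchaudio's GriffinLim transform), which is iterative and by default starts from a random phase, so if you don't run it for enough iterations you may sometimes get noise in the output. An alternative would be to use a vocoder network (or just target a neural speech codec like SoundStream).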
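To make the point about phase concrete, here is a minimal NumPy sketch, not the code of the project under discussion. It compares an exact inverse STFT (complex spectrogram available) with Griffin-Lim reconstruction from magnitudes only. The helper names `stft`, `istft`, `griffin_lim` and `spectral_error`, the window and hop sizes, and the test tone are all assumptions chosen for illustration; NumPy stands in for Torch so the example stays small.

```python
import numpy as np

N_FFT, HOP = 256, 64                      # illustrative sizes, not from the thread
WIN = np.hanning(N_FFT + 1)[:-1]          # periodic Hann window


def stft(x):
    """Centered STFT: reflect-pad, frame, window, real FFT per frame."""
    x = np.pad(x, N_FFT // 2, mode="reflect")
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length):
    """Least-squares inverse: windowed overlap-add divided by the summed squared window."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WIN
    out = np.zeros(N_FFT + HOP * (len(frames) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    out /= np.maximum(norm, 1e-8)
    return out[N_FFT // 2:][:length]      # drop the centering pad


def griffin_lim(mag, length, n_iter, seed=0):
    """Estimate phase for a magnitude spectrogram, starting from random phase."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        est = stft(istft(mag * phase, length))
        phase = est / np.maximum(np.abs(est), 1e-8)   # keep phase, discard magnitude
    return istft(mag * phase, length)


def spectral_error(y, mag):
    """Relative distance between the magnitudes of y and the target magnitudes."""
    return np.linalg.norm(np.abs(stft(y)) - mag) / np.linalg.norm(mag)


t = np.arange(4096) / 8000.0
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
spec = stft(x)
mag = np.abs(spec)

exact_err = np.max(np.abs(istft(spec, len(x)) - x))             # complex input: exact inverse
err_random = spectral_error(griffin_lim(mag, len(x), 0), mag)   # random phase, no iterations
err_iterated = spectral_error(griffin_lim(mag, len(x), 50), mag)
```

With the complex spectrogram the round trip is exact up to floating-point error, so any hiss would have to come from the predicted spectrogram rather than from the inverse transform. With magnitudes only, the quality depends on how well the phase is estimated, which is where a vocoder or codec target can help.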