by antognini 1289 days ago
Yes, this is exactly right, and it's why Griffin-Lim-generated audio often has a sort of warbly quality. If you use a large FFT you can mitigate the pitch issues, because the frequency resolution of your spectrogram is higher and the phase isn't so critical to getting the right pitch. But the trade-off of a bigger FFT is that each analysis window covers more time, so the pitches now have to be stationary for longer.
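
To put rough numbers on that trade-off, here is a small sketch. The helper name `stft_resolution` and the 22050 Hz sample rate are my own assumptions for illustration, not something from the demo being discussed. It just computes the bin spacing and the window length for a few FFT sizes:

```python
def stft_resolution(sample_rate, n_fft):
    """Return (bin spacing in Hz, window duration in ms) for an n_fft-point FFT.

    Hypothetical helper for illustration only.
    """
    bin_hz = sample_rate / n_fft              # spacing between adjacent frequency bins
    window_ms = 1000.0 * n_fft / sample_rate  # time span covered by one analysis window
    return bin_hz, window_ms


# Assumed sample rate of 22050 Hz; any rate shows the same inverse relationship.
for n_fft in (512, 2048, 8192):
    bin_hz, window_ms = stft_resolution(22050, n_fft)
    print(f"n_fft={n_fft:5d}  bin spacing={bin_hz:7.2f} Hz  window={window_ms:7.1f} ms")
```

Every time the FFT size goes up by 4x, the bins get 4x finer but the window gets 4x longer, so anything that changes pitch within that window gets averaged together.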

The other place where phase is critical is in impulse sounds like drum beats. A short impulse is essentially just energy spread over a broad range of frequencies, but with the phases arranged so that all the frequencies cancel each other out everywhere except during one short interval where they all add constructively. Without the right phases, these kinds of sounds get smeared out in time and sound sort of flat and muffled. The typing example on their demo page is actually a good example of this.
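
If you want to see the smearing for yourself, here is a minimal NumPy sketch. It is not Griffin-Lim; as a stand-in for "wrong phases" it simply replaces the phases of a single-sample click with random ones while keeping the magnitudes. The names `scramble_phase` and `peak_energy_fraction` are made up for this example:

```python
import numpy as np


def scramble_phase(signal, seed=0):
    """Keep each frequency's magnitude but replace its phase with a random one."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(signal)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=spectrum.shape)
    # The DC bin (and the Nyquist bin for even lengths) must stay real
    # for the inverse transform to correspond to a real-valued signal.
    phases[0] = 0.0
    if len(signal) % 2 == 0:
        phases[-1] = 0.0
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=len(signal))


def peak_energy_fraction(x):
    """Fraction of the total energy carried by the single loudest sample."""
    return np.max(x ** 2) / np.sum(x ** 2)


n = 1024
click = np.zeros(n)
click[n // 2] = 1.0          # an idealized impulse: one nonzero sample

smeared = scramble_phase(click)

print("click   peak energy fraction:", peak_energy_fraction(click))
print("smeared peak energy fraction:", peak_energy_fraction(smeared))
```

Both signals have the same magnitude spectrum and the same total energy, but in the click all of that energy lands in one sample, while in the phase-scrambled version it is spread across the whole window like a short burst of noise.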