Hacker News new | ask | show | jobs
by cviilgan 1879 days ago
Did I understand this correctly, what you are doing is essentially:

X[n] = F[x[k]][n/2] if (n even) else F[x'[k]][(n+1)/2]

With F[x[k]] the DFT of the time-domain signal x[k], x'[k] = x[k]·exp(2·pi·i·k·alpha) and this alpha some constant which yields a frequency-domain shift by 25Hz.

If so: How does this method compare to zero-padding the time-domain signal (i.e. sinc-interpolating the frequency domain)? It is an interesting concept, but alas it's not immediately clear to me how to analyze this...

1 comments

This sounds about right. I assume your (n+1)/2 is really n+1/2. The idea, like you've said, is to get Y[k+1/2] values where Y = FFT[X].

Whether this is mathematically sound is another question. I presume that it is, for two reasons. First, FFT essentially convolves X with a bunch of sinusoids with frequencies from a fixed set: 0 Hz, 50 Hz, 100 Hz and so on. There's nothing wrong with manually convolving X with a 57.3 Hz sinusoid, it's just FFT isn't designed for this (it's designed for rapid computation). The other reason is that combining such shifted FFTs we get what looks almost exactly like a CWT (i.e. wavelet transform).

As for sinc-interpolation, I think it's mathematically equivalent. Say we shift the input X with Z[k] = exp(ik/N...) and get XZ. Then we transform it to FFT[XZ] = FFT[X] conv FFT[Z], so it's convolving FFT[X] with FFT[Z] where FFT[Z] is probably that sinc kernel. I certainly know from experiments that FFT of exp(2·pi·i·k·alpha) where alpha doesn't precisely align with the 1024 grid produces a fuzzy function with a max around alpha and a bell-shaped curved around it, the width of the curve depends on how precisely alpha fits into one of the 1024 grid points.

Instead of combining 2 FFTs of 1024 bins (one shifted + one non-shifted), could you not just calculate 1 FFT of 2048 bins? Isn't it the same result?
Larger FFT window has undesired side effects because the estimated frequencies are averaged over the entire window. Moreover, the FFT output always spans from 0 Hz to 24 kHz (with 48 kHz sample rate), so to zoom into the 0..6 kHz region we'll need a window with 8192 bins or about 150 ms. With such window it would be impossible to capture rapid volume oscillations.