| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by hosay123 2671 days ago
	I thought I understood FFT, but totally not getting the relationship to this problem. Someone ELI5 please? :)

4 comments

aabeshou 2671 days ago

ELI5: You can turn this problem into finding the best "convolution index", and fourier transforms make computing convolutions cheaper.

ELIUndergrad: (note that \* means multiplication, there doesnt seem to be a way to escape an asterisk)

Lets start by seeing how this is a convolution. We have the videoSpeech sequence, and the subtitle sequence - each is a vector, indexed by time, of 0's and 1's indicating whether there is speech in that time. We can imagine padding the sequences out on either side with 0's, and consider the alignment task as shifting the subtitle sequence left and right in time until we get the best alignment with the 1's in the videoSpeech sequence. We can express the goodness of alignment as the number of matching 1's, aka the sum over all times t of videoSpeech(t) \* subtitle(t). This is the definition of a convolution: the convolution of two sequences gives a new sequence where the value at index i is this sum above where one of the sequences is shifted by i. Mathematically, conv(videoSpeech, subtitle)(i) = sum( videoSpeech(t)\* subtitle(t-i)). So we can rephrase this problem as, find the index i which maximizes the value of the convolution sequence.

The discrete fourier transform is a function that takes a sequence and gives another sequence. It's relevant here because it "turns convolution into multiplication": fourier(videoSpeech)(i) \* fourier(subtitle)(i) = fourier(conv(videoSpeech, subtitle))(i).

So finally to solve the problem, we get the pointwise product sequence S = fourier(videoSpeech) \* fourier(subtitle), do the inverse fourier transform on it invFourier(S), and maximize invFourier(S)(i) over i.

akie 2671 days ago

> ELI5: You can turn this problem into finding the best "convolution index", and fourier transforms make computing convolutions cheaper.

I tried this sentence on my 5-year-old and got a blank stare. He then proceeded to tell me a story about how Darth Vader is so scary and cool and that he's actually Luke's father. YMMV.

ComodoHacker 2671 days ago

I guess GP actually meant ELWOT - Explain Like We're On Twitter.

aabeshou 2671 days ago

lol, that makes sense. heres another try:

ELI5 v2: you can look at the sequences a different way, like tilting your head to look at them from a different angle. and from that angle, we can more easily try every possible way of putting them together, so that we can choose the best way.

_fbpt 2671 days ago

I've beem working on a similar algorithm.

I'm writing a program which takes multiple channels of near-periodic audio wav files, and outputs a phase-stable oscilloscope video. This is done by FFT-correlating (a buffer of recent oscilloscope plots) with (the audio signal) (with quite a bit of added complexity for better results). Incidentally I'm also using Python/Numpy/ffmpeg.

It's most useful for "complex" chiptune like FM/SNES/tracker/MIDI music, which are easy to split into monophonic single-note channels. https://github.com/corrscope/corrscope Should I submit this separately (Show HN)?

yesenadam 2671 days ago

Why not use good ol 'x' for multiplication? :-)

manwithaplan 2671 days ago

I keep AltGr+8 mapped to × for just such an emergency.

frosted-flakes 2671 days ago

The Compose key works great for this (I use WinCompose on Windows). Most of the time I can guess the key combination. I think × is Compose, x, x.

computerfriend 2671 days ago

'x' is a letter, not the times symbol.

janaagaard 2671 days ago

But if you have to choose between '\*' and 'x', then 'x' might well be the best choice, no? :-)

rjeli 2671 days ago

If they had used ‘x’ instead, I would be scouring the comment for where the variable had been introduced.

yesenadam 2671 days ago

Ok, use 'x' and write "(note that 'x' means multiplication)" instead of "(note that \* means multiplication, there doesnt seem to be a way to escape an asterisk)". More legible and shorter and 'symbol is already used for the purpose' and..

pbhjpbhj 2671 days ago

It is _a_ multiplication symbol. Why not use period ".".

yesenadam 2671 days ago

Because there are a lot of those in the vicinity and it's very small?

Iburinoc 2671 days ago

The problem being solved essentially is: you have two binary strings, and you want to offset one of them so that they match up the best. For each offset, you're taking a dot product of one sequence with the offset version of the other. This is the same as computing the convolution of the two sequences together (https://en.wikipedia.org/wiki/Convolution). Computing this naively would be O(n^2) (doing linear work for each possible offset).

One property of the Fourier transform is that convolution in the time domain corresponds to element-wise multiplication in the frequency domain (https://en.wikipedia.org/wiki/Convolution_theorem), so you can compute the convolution efficiently by taking the FFT of both series, doing element-wise multiplication, and then taking the inverse FFT of the result.

beagle3 2671 days ago

It has nothing to do directly with FFT, which is why it is confusing:

What he is looking for is maximal correlation between two binary series (think "Pearson's r or r-square correlation coefficient"). Now, correlation is just like convolution except one series is flipped around on the time axis. Which means, if you have an efficient way to compute convolutions (and you do, through FFT), you have an efficient way to compute correlations.

(I don't think 5 year olds heard of Pearson's correlation coefficient or convolutions, but ... that's the best I can do).

In many ways, correlation and covariance are more fundamental than convolution - they are closely and directly related to the inner product.

And the only reason to use the FFT here is convenience (i.e., it is fast and simple to compute), but the correlation/convolution property applies to many "transform domains" - Laplace, Z, Cosine, Sine, and a few others (all of which are closely related among themselves, but only a few easy to compute numerically).

farazzz 2671 days ago

From my memory, convolution in the time domain is equivalent to multiplication in the frequency domain