| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by aabeshou 2671 days ago

ELI5: You can turn this problem into finding the best "convolution index", and fourier transforms make computing convolutions cheaper.

ELIUndergrad: (note that \* means multiplication, there doesnt seem to be a way to escape an asterisk)

Lets start by seeing how this is a convolution. We have the videoSpeech sequence, and the subtitle sequence - each is a vector, indexed by time, of 0's and 1's indicating whether there is speech in that time. We can imagine padding the sequences out on either side with 0's, and consider the alignment task as shifting the subtitle sequence left and right in time until we get the best alignment with the 1's in the videoSpeech sequence. We can express the goodness of alignment as the number of matching 1's, aka the sum over all times t of videoSpeech(t) \* subtitle(t). This is the definition of a convolution: the convolution of two sequences gives a new sequence where the value at index i is this sum above where one of the sequences is shifted by i. Mathematically, conv(videoSpeech, subtitle)(i) = sum( videoSpeech(t)\* subtitle(t-i)). So we can rephrase this problem as, find the index i which maximizes the value of the convolution sequence.

The discrete fourier transform is a function that takes a sequence and gives another sequence. It's relevant here because it "turns convolution into multiplication": fourier(videoSpeech)(i) \* fourier(subtitle)(i) = fourier(conv(videoSpeech, subtitle))(i).

So finally to solve the problem, we get the pointwise product sequence S = fourier(videoSpeech) \* fourier(subtitle), do the inverse fourier transform on it invFourier(S), and maximize invFourier(S)(i) over i.

3 comments

akie 2671 days ago

> ELI5: You can turn this problem into finding the best "convolution index", and fourier transforms make computing convolutions cheaper.

I tried this sentence on my 5-year-old and got a blank stare. He then proceeded to tell me a story about how Darth Vader is so scary and cool and that he's actually Luke's father. YMMV.

link

ComodoHacker 2671 days ago

I guess GP actually meant ELWOT - Explain Like We're On Twitter.

link

aabeshou 2671 days ago

lol, that makes sense. heres another try:

ELI5 v2: you can look at the sequences a different way, like tilting your head to look at them from a different angle. and from that angle, we can more easily try every possible way of putting them together, so that we can choose the best way.

link

_fbpt 2671 days ago

I've beem working on a similar algorithm.

I'm writing a program which takes multiple channels of near-periodic audio wav files, and outputs a phase-stable oscilloscope video. This is done by FFT-correlating (a buffer of recent oscilloscope plots) with (the audio signal) (with quite a bit of added complexity for better results). Incidentally I'm also using Python/Numpy/ffmpeg.

It's most useful for "complex" chiptune like FM/SNES/tracker/MIDI music, which are easy to split into monophonic single-note channels. https://github.com/corrscope/corrscope Should I submit this separately (Show HN)?

link

yesenadam 2671 days ago

Why not use good ol 'x' for multiplication? :-)

link

manwithaplan 2671 days ago

I keep AltGr+8 mapped to × for just such an emergency.

link

frosted-flakes 2671 days ago

The Compose key works great for this (I use WinCompose on Windows). Most of the time I can guess the key combination. I think × is Compose, x, x.

link

computerfriend 2671 days ago

'x' is a letter, not the times symbol.

link

janaagaard 2671 days ago

But if you have to choose between '\*' and 'x', then 'x' might well be the best choice, no? :-)

link

rjeli 2671 days ago

If they had used ‘x’ instead, I would be scouring the comment for where the variable had been introduced.

link

yesenadam 2671 days ago

Ok, use 'x' and write "(note that 'x' means multiplication)" instead of "(note that \* means multiplication, there doesnt seem to be a way to escape an asterisk)". More legible and shorter and 'symbol is already used for the purpose' and..

link

pbhjpbhj 2671 days ago

It is _a_ multiplication symbol. Why not use period ".".

link

yesenadam 2671 days ago

Because there are a lot of those in the vicinity and it's very small?

link