| HN Mirror


by giantrobot 1754 days ago

They're extracting video frames at some interval (default of 1 second) as 144x144px stills and then tiling them into a square collage. A perceptual hash is then computed over that collage.
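To make the pipeline concrete, here is a rough sketch of the collage-then-hash idea. It is not the project's actual code: frames are assumed to be already decoded into small grayscale 2D lists, and a plain average hash stands in for whatever perceptual hash the tool really uses. The names `make_collage`, `average_hash` and `hamming` are made up for illustration.

```python
import math

def make_collage(frames):
    """Tile equally sized grayscale frames (2D lists) into a near-square grid.

    Unused grid cells stay black (0) when the frame count is not a perfect square.
    """
    side = math.ceil(math.sqrt(len(frames)))
    h, w = len(frames[0]), len(frames[0][0])
    collage = [[0] * (side * w) for _ in range(side * h)]
    for i, frame in enumerate(frames):
        top, left = (i // side) * h, (i % side) * w
        for y in range(h):
            collage[top + y][left:left + w] = frame[y]
    return collage

def average_hash(image, size=8):
    """Shrink to size x size by block averaging, then set one bit per cell above the mean.

    Assumes the image is at least `size` pixels in each dimension.
    """
    h, w = len(image), len(image[0])
    cells = []
    for gy in range(size):
        for gx in range(size):
            block = [image[y][x]
                     for y in range(gy * h // size, (gy + 1) * h // size)
                     for x in range(gx * w // size, (gx + 1) * w // size)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    return sum(1 << i for i, c in enumerate(cells) if c > mean)

def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")
```

A re-encode that only nudges pixel values leaves the hash nearly unchanged. Shifting the frames by one slot, which is what a short intro does, moves every tile in the collage and can flip most of the bits.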

The major problem here is that two videos with exactly the same content but slightly different timing (say, one with a couple of seconds of intro) will rarely if ever produce a positive match.

The only case I see where this particular scheme is helpful is where you've got videos with the same contents but different encodings. The length will be the same but the quality of the two encodings (and the file names) might be different. This would help you find them in a sea of files.

A simple improvement would be to check only the frame from the middle of each video first. If the frames at the same timestamp match, you've got a non-zero probability that the videos match. Then you can check more frames radiating out from the center point. Negative matches will fail fast and save you work. It also matches when the lengths differ because of trims or splices at the beginning and end of the videos.
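Roughly what I have in mind, as a sketch only. It assumes you already have one perceptual hash per sampled frame (as integers) for each video, anchors the comparison at each video's own midpoint, and uses an arbitrary distance threshold. `middle_out_match` is a hypothetical name.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def middle_out_match(hashes_a, hashes_b, max_distance=5):
    """Compare per-frame hashes from the midpoints outward.

    Returns (matched, frames_compared). Stops at the first mismatch,
    so unrelated videos are usually rejected after a single comparison.
    """
    if not hashes_a or not hashes_b:
        return False, 0
    mid_a, mid_b = len(hashes_a) // 2, len(hashes_b) // 2
    # How far we can step in both directions without leaving either video.
    reach = min(mid_a, mid_b,
                len(hashes_a) - 1 - mid_a,
                len(hashes_b) - 1 - mid_b)
    offsets = [0]
    for step in range(1, reach + 1):
        offsets += [step, -step]
    compared = 0
    for off in offsets:
        compared += 1
        if hamming(hashes_a[mid_a + off], hashes_b[mid_b + off]) > max_distance:
            return False, compared
    return True, compared
```

Note the midpoints only line up when the trims at both ends are about equal. A lopsided trim shifts the midpoint, which is where scanning for an alignment comes in.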

A second improvement would be to pick a frame from video A and scan through video B (or a segment of each) to find a high-probability match. Then check other segments of the videos for matches in the same way.
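A sketch of that scan, again assuming per-frame integer hashes sampled at the same interval in both videos. `best_alignment` finds where a probe frame from A lands in B, and `overlap_score` checks the remaining frames at the implied offset. Both names and the thresholds are made up for illustration.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def best_alignment(hashes_a, hashes_b, probe_index, max_distance=5):
    """Scan B for the frame closest to A's probe frame.

    Returns the offset such that hashes_a[i] lines up with hashes_b[i + offset],
    or None if nothing in B is close enough.
    """
    if not hashes_b:
        return None
    probe = hashes_a[probe_index]
    best = min(range(len(hashes_b)), key=lambda j: hamming(probe, hashes_b[j]))
    if hamming(probe, hashes_b[best]) > max_distance:
        return None
    return best - probe_index

def overlap_score(hashes_a, hashes_b, offset, max_distance=5):
    """Fraction of overlapping frames that match at the given offset."""
    pairs = [(hashes_a[i], hashes_b[i + offset])
             for i in range(len(hashes_a))
             if 0 <= i + offset < len(hashes_b)]
    if not pairs:
        return 0.0
    hits = sum(hamming(x, y) <= max_distance for x, y in pairs)
    return hits / len(pairs)
```

With a two-frame intro prepended to B, the probe should come back with an offset of 2 and the overlap score should be close to 1.0. Repeating the probe from a few different positions in A would also catch splices in the middle.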

Trying to turn a video into a single static representation and comparing it is not the best.

1 comments

jsdwarf 1754 days ago

Wouldn't it make more sense to convert the video to greyscale, detect significant changes of brightness between frames, and store them as vector coordinates (% of the playtime, brightness delta)?
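Something like this, as a minimal sketch. It assumes the per-frame mean greyscale brightness (0-255) has already been computed, and the threshold of 20 is an arbitrary pick. `brightness_signature` is a hypothetical name.

```python
def brightness_signature(mean_brightness, threshold=20.0):
    """Turn per-frame mean brightness values into (percent of playtime, delta) pairs.

    Only frame-to-frame changes of at least `threshold` are kept.
    """
    n = len(mean_brightness)
    signature = []
    for i in range(1, n):
        delta = mean_brightness[i] - mean_brightness[i - 1]
        if abs(delta) >= threshold:
            signature.append((round(100.0 * i / n, 1), delta))
    return signature
```

For example, `brightness_signature([50, 50, 52, 120, 118, 119, 40, 41, 40, 42])` gives `[(30.0, 68), (60.0, -79)]`: one jump up at 30% of the playtime and one drop at 60%.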


giantrobot 1754 days ago

That could work. But I think limiting your search to brightness patterns is going to produce a lot of false positives. The brightness search might make a good first pass to narrow the corpus down to a subset for a more in-depth search.
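As a sketch of that two-stage idea: compare the cheap (% of playtime, brightness delta) signatures first and only hand the survivors to the expensive frame-level comparison. The tolerances and the names `signatures_similar` and `shortlist` are assumptions for illustration, not anything from the project.

```python
def signatures_similar(sig_a, sig_b, time_tol=1.0, delta_tol=15.0, min_ratio=0.8):
    """Loose comparison of two lists of (percent of playtime, brightness delta) pairs."""
    if not sig_a or not sig_b:
        return False
    used = set()
    matched = 0
    for pct_a, delta_a in sig_a:
        for j, (pct_b, delta_b) in enumerate(sig_b):
            close = abs(pct_a - pct_b) <= time_tol and abs(delta_a - delta_b) <= delta_tol
            if j not in used and close:
                used.add(j)  # each event in B may be matched only once
                matched += 1
                break
    return matched / max(len(sig_a), len(sig_b)) >= min_ratio

def shortlist(query_sig, corpus):
    """Return names from corpus (name -> signature) worth a frame-level comparison."""
    return [name for name, sig in corpus.items()
            if signatures_similar(query_sig, sig)]
```

Anything on the shortlist still needs the in-depth check, since two unrelated videos can share a similar pattern of cuts and fades.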
