| HN Mirror



	by varelaz 1754 days ago
	I used a similar approach for video hashing. Instead of sampling at an interval, I took the key frames with ffmpeg, so you don't depend on the codec. I also didn't rescale, but took a hash of every frame. For YouTube I found that it still sometimes produces different hashes. edit: to get only keyframes, use select=eq(pict_type,I)
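
A rough sketch of the keyframe approach described above, in Python. The comment does not say which hash was used, so SHA-256 over the raw pixels is only a stand-in here (a perceptual hash could be swapped in). The function names, the grayscale raw output, and passing the frame size in by hand are all assumptions made for illustration, not the commenter's actual setup.

```python
import hashlib
import subprocess


def keyframe_command(path):
    """Build an ffmpeg argument list that outputs only I-frames as raw grayscale."""
    return [
        "ffmpeg", "-v", "error",
        "-i", path,
        # The comma is escaped so the filtergraph parser does not read it
        # as a separator between two filters.
        "-vf", "select=eq(pict_type\\,I)",
        # Do not duplicate frames to fill the gaps left by dropped ones.
        # Newer ffmpeg builds spell this option "-fps_mode vfr".
        "-vsync", "vfr",
        "-pix_fmt", "gray",
        "-f", "rawvideo", "-",
    ]


def hash_frames(raw, width, height):
    """Split a raw gray byte stream into frames and hash each one unscaled."""
    frame_size = width * height  # one byte per pixel in gray
    usable = len(raw) - len(raw) % frame_size  # ignore a trailing partial frame
    return [
        hashlib.sha256(raw[i:i + frame_size]).hexdigest()  # placeholder hash
        for i in range(0, usable, frame_size)
    ]


def hash_keyframes(path, width, height):
    """Run ffmpeg on `path` and return one hash per key frame."""
    result = subprocess.run(keyframe_command(path), check=True, capture_output=True)
    return hash_frames(result.stdout, width, height)
```

With a cryptographic hash like this, any re-encode changes every digest, which may be part of why different hashes still showed up for YouTube.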

2 comments

Farmadupe 1753 days ago

I get that decoding only keyframes will be much faster, but how can codec independence be maintained when different codecs will insert keyframes at very different points?

Could such an algorithm ever find a duplicate between say a GIF (every frame is a keyframe) vs any modern codec with very few keyframes?

(or is this optimization specifically for videos known to be encoded with the exact same codec, and specifically with a static keyframe interval?)


varelaz 1752 days ago

Codecs are algorithms for generating B and P frames. I frames are just JPEGs. Yes, codecs can split a video differently, but given the same split, the same frame will be encoded the same way. In most cases the key frame frequency is just a number. Some formats, like HLS, cannot work with a variable key frame frequency at all. Why it matters: different versions of the same codec can replay the same video differently for B and P frames (no guarantee), but not for I frames. So I frames are the most stable.


mzs 1754 days ago

faster: -skip_frame nokey
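
For illustration, a sketch of where that option would go, in Python. `-skip_frame nokey` is a decoder option, so it is placed before `-i`; the function name and the raw grayscale output are assumptions for the example. One caveat: this selects frames the container or decoder marks as key frames, which usually, but not always, matches the set picked by `pict_type` being I.

```python
import subprocess


def fast_keyframe_command(path):
    """Build an ffmpeg argument list that decodes only key frames."""
    return [
        "ffmpeg", "-v", "error",
        # Input option: tell the decoder to skip every non-key frame,
        # so B and P frames are never decoded at all.
        "-skip_frame", "nokey",
        "-i", path,
        # Keep the original frames without duplicating any to fill gaps.
        # Newer ffmpeg builds spell this option "-fps_mode vfr".
        "-vsync", "vfr",
        "-pix_fmt", "gray",
        "-f", "rawvideo", "-",
    ]


def read_keyframes_raw(path):
    """Return the raw grayscale bytes of all key frames in `path`."""
    result = subprocess.run(fast_keyframe_command(path), check=True, capture_output=True)
    return result.stdout
```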
