Wouldn't it make more sense to convert the video to greyscale, detect significant changes in brightness between frames, and store them as vector coordinates (% of playtime, brightness delta)?
That could work, but I think limiting your search to brightness patterns is going to produce a lot of false positives. The brightness search might make a good first pass, though, to narrow the corpus down to a subset for a more in-depth search.
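To make the idea concrete, here is a rough sketch of what such a brightness signature and a coarse first-pass comparison might look like. It assumes the frames have already been decoded and converted to greyscale (rows of 0-255 pixel values) by whatever video library you use. The names `brightness_signature` and `roughly_matches`, and all the thresholds, are made up for illustration and would need tuning on real footage.

```python
from typing import List, Sequence, Tuple

# Assumption: a frame is already greyscale, given as rows of 0-255 pixel values.
Frame = Sequence[Sequence[int]]
# One signature entry: (percent of playtime, brightness delta).
Signature = List[Tuple[float, float]]


def mean_brightness(frame: Frame) -> float:
    """Average pixel value of one greyscale frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def brightness_signature(frames: Sequence[Frame], min_delta: float = 20.0) -> Signature:
    """Record every frame-to-frame brightness jump of at least min_delta."""
    levels = [mean_brightness(f) for f in frames]
    signature: Signature = []
    for i in range(1, len(levels)):
        delta = levels[i] - levels[i - 1]
        if abs(delta) >= min_delta:
            # Position as a percentage of playtime, so clips with
            # different frame counts can still be compared.
            percent = 100.0 * i / len(levels)
            signature.append((percent, delta))
    return signature


def roughly_matches(a: Signature, b: Signature, time_tol: float = 1.0,
                    delta_tol: float = 10.0, min_fraction: float = 0.8) -> bool:
    """Coarse first-pass test: do most jumps in a have a close counterpart in b?"""
    if not a or not b:
        return False
    matched = sum(
        1 for (pa, da) in a
        if any(abs(pa - pb) <= time_tol and abs(da - db) <= delta_tol
               for (pb, db) in b)
    )
    return matched / len(a) >= min_fraction


# Tiny synthetic clip: two dark frames followed by two bright frames.
dark = [[10, 10], [10, 10]]
bright = [[200, 200], [200, 200]]
clip = [dark, dark, bright, bright]
signature = brightness_signature(clip)
```

Anything that passes `roughly_matches` would only be a candidate. Because very different videos can share the same cut rhythm, the candidates would still need the more in-depth comparison to weed out false positives.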