by foobarrio 1774 days ago
In my admittedly limited experience with image hashing, you typically extract some basic feature and normalize the image before hashing (e.g. put the darkest corner in the upper left, or look for verticals/horizontals and align to them). You also take multiple hashes of each image to handle various crops and black-and-white vs. color versions. This increases robustness a bit, but overall, yeah, you can always transform the image in a way that produces a different enough hash. One thing that would be hard to catch is someone applying something like a swirl, with the consumers of that content then using a plugin or something to "deswirl" the image.
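
To make the "multiple hashes per image" idea concrete, here is a rough sketch using a simple average hash. This is not what any particular production system does; `average_hash`, `hamming`, `variant_hashes` and the crop margin are made-up names and values for illustration, and the image is assumed to be a plain grayscale grid at least 8x8 pixels (after cropping).

```python
def average_hash(pixels, size=8):
    """Hash a grayscale image given as a list of rows of 0-255 ints.

    Downsamples to size x size by block averaging, then emits one bit
    per cell: 1 if the cell is brighter than the mean of all cells.
    Assumes the image is at least size x size pixels.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def variant_hashes(pixels, margin=2):
    """Hash the full image plus a center crop (hypothetical variant set)."""
    cropped = [row[margin:len(row) - margin]
               for row in pixels[margin:len(pixels) - margin]]
    return {average_hash(pixels), average_hash(cropped)}
```

Matching is then "any variant hash within a few bits of a known hash" rather than exact equality. A uniform brightness shift leaves this hash unchanged, but a mirror flip or a swirl moves it far away, which is the weakness described above.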

There's also something like the Scale-Invariant Feature Transform (SIFT), which is robust to scaling, rotation, and translation, and partially robust to other affine distortions such as skew.

I believe one thing that's done is that whenever any CP is found, the hashes of all images in the "collection" are added to the DB, whether or not they actually contain abuse. So if there are any common transforms of existing images, their hashes get added to the DB as well. The idea is that a high percentage of hits, even on the benign hashes, indicates the presence of the same "collection".
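
A minimal sketch of that "percentage of hits" idea, assuming hashes are plain integers compared by Hamming distance. The function name `collection_hit_ratio` and the `max_distance` default are my own placeholders, not anything I know a real system to use.

```python
def collection_hit_ratio(candidate_hashes, collection_hashes, max_distance=4):
    """Fraction of candidate hashes within max_distance bits of any known hash.

    candidate_hashes: hashes of the images found together on some device/site.
    collection_hashes: hashes previously recorded for a known "collection",
    including the benign images that were part of it.
    """
    if not candidate_hashes:
        return 0.0
    hits = sum(
        1
        for h in candidate_hashes
        if any(bin(h ^ k).count("1") <= max_distance for k in collection_hashes)
    )
    return hits / len(candidate_hashes)
```

A high ratio across many images would suggest the same collection is present even when no single match is conclusive. The threshold for "high" is a policy choice I'm not going to guess at.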

1 comment

Huh, or you can just use encryption if you'll be using some software-based transformation anyway.