Hacker News
by yeet_yeet_yeet 1313 days ago
>not realistic

The odds are zero.

1/2^256 = 0.

In cryptography these odds are treated as zero until you generate close to 2^128 images.

Unfortunately there's no word in natural English to describe how unlikely this is. The most precise is "zero".
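
For scale, here is a rough sketch of the arithmetic behind the 2^128 figure, using the standard birthday approximation. It assumes the hash outputs are uniformly distributed over all 256-bit values, which is an assumption about cryptographic hashes and not something that necessarily holds for other kinds of image hashes. The function name `collision_probability` is made up for illustration.

```python
import math

def collision_probability(n: int, bits: int = 256) -> float:
    """Birthday approximation of P(at least one collision) among
    n values drawn uniformly at random from 2**bits possibilities."""
    # Expected number of colliding pairs: n(n-1)/2 pairs, each matching
    # with probability 1 / 2**bits (uniformity is assumed here).
    expected_pairs = n * (n - 1) / (2 * 2**bits)
    # 1 - exp(-x), computed via expm1 so tiny probabilities do not round to 0.
    return -math.expm1(-expected_pairs)

# A trillion images, 2^64 images, and 2^128 images.
for n in (10**12, 2**64, 2**128):
    print(n, collision_probability(n))
```

Under that assumption the probability stays vanishingly small for any realistic number of images and only becomes appreciable around 2^128.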

3 comments

Are you assuming that digital images are evenly distributed over the set of all possible 256 bit vectors?

Because I don't think that's a reasonable assumption.

Even if image recognition were perfectly solved with no known edge cases (ha!), when an entire topic is a semantic stop-sign for most people, you can't expect a mysterious opaque box like a guilty-enough-to-investigate detection mechanism to get rapid updates and corrections when new failure modes are discovered.

You should spend some time with an internet search engine and the term "perceptual hashing". What you're talking about is another type of hashing, which can be useful for classifying image files, but not images. The former has a very concrete definition that is specified down to the bit; the latter is a fuzzy space because it's trying to yield similar (not necessarily identical) hashes for images that humans consider similar. Much different space, much different problem, much different collision situation. Cryptographic hashing is not the only kind of hashing.
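
To make the difference concrete, here is a minimal sketch that compares SHA-256 with a toy "average hash" on tiny made-up grayscale images. The average hash is a simple textbook perceptual hash chosen for illustration; it is not NeuralHash or any algorithm used by a real detection system, and the pixel data and helper names are hypothetical.

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: one bit per pixel, set when the pixel
    is brighter than the image's mean brightness."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def sha256_hex(pixels):
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()

def hamming(a, b):
    """Number of positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 4x4 grayscale image (values 0-255).
original = [[10, 20, 200, 210],
            [15, 25, 205, 215],
            [12, 22, 202, 212],
            [18, 28, 208, 218]]
# The same image, slightly brightened.
brightened = [[min(p + 3, 255) for p in row] for row in original]
# Different pixel values, but the same dark-left / bright-right layout.
unrelated = [[0, 0, 255, 255]] * 4

# The file-level hash changes completely after a tiny edit.
print(sha256_hex(original) == sha256_hex(brightened))
# The perceptual hash does not change at all.
print(hamming(average_hash(original), average_hash(brightened)))
# Two different images can also share a perceptual hash.
print(average_hash(original) == average_hash(unrelated))
```

The last comparison is the point about collisions: a perceptual hash maps many distinct inputs to the same value by design, so the 2^256 reasoning for cryptographic hashes does not carry over.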

Oh wow, https://www.apple.com/child-safety/pdf/CSAM_Detection_Techni... So they essentially just use CNN output to automatically determine whether to report people to the authorities? For some reason I assumed they were just comparing files against ones they knew to be CSAM.

Yeah, that's bad. What about deepdream/CNN reversing? Couldn't a rogue Apple engineer just create an innocuous-looking false positive, say a cat picture, share it on Reddit, and everybody who downloads it is flagged to police for CSAM?

No. The Apple system uses two hashes, one public and neural and one hidden. The intent of both is to match specific known images, not unknown new ones, and the result of passing both hashes is a manual review, not automatic reporting. I've never seen a published attack that would actually be a problem; they all misread how the system works.

(Also, it's not reported to the police but to NCMEC, which is not a government agency. This is for Fourth Amendment privacy reasons.)

The CSAM flagging generally isn’t reported directly to police, precisely to prevent the situation you describe. Google would get the report and, once some threshold is reached, a person reviews the report(s) and decides whether the police are notified.

How can you be so sure? As I understand it, the hash is of features in the image and not the image itself. Are the CSAM feature detection heuristics public?