So this means it is checking if you are sharing known CP images? That does seem much less invasive and less problematic, since there is likely no good reason to be sharing these images.
It’s not just checking files you are sharing; it’s also scanning files stored on your device. The worry, or slippery-slope argument, is that this is one step away from scanning your device for other types of content, like memes critical of the government or just general wrongthink.
There are actually some edge cases even when matching against image blacklists. Google has experience hitting them, because it has used this type of image simhash for years (at least for shared cloud files).
The definition of child porn varies around the world, and these systems use the US definition, which is not entirely what you might expect. For example, US courts have decided that cartoons can be child porn even though no actual children are in the picture. Most of the world does not agree with this, meaning an image can be CP in one place but not in another. Is Apple going to enforce the US definition, or the one that applies where the user actually lives?
In the USA, photos an under-age person takes of themselves can also be considered CP.
What counts as a "child" for sexual purposes also varies around the world. Some countries have a lower age of consent than others. In some parts of the world, the age of consent and the age at which a child stops being a child for CP purposes are different, meaning that a teenager can legally have sex, but if they take a photo of themselves doing it, they are trafficking in CP.
Finally, what is actually on these image blacklists? Hardly anyone knows, because of the third-rail nature of CP. Tech firms are often given only image hashes, not even the images themselves, by third-party 'charities' of various kinds, and tech workers are, for obvious reasons, not normally given access to the actual pixels. Additionally, appeals from users are invariably ignored because people say "legal issues, it's complicated" and everyone clams up. If false positives occur, there is no way to resolve them, and the people who see your appeal, if there even is one, won't be willing to look at the image to find out what it was.
It should be obvious how much potential for abuse this hands to the people who actually manage these CP databases. Literally any image can be made verboten immediately, without any recourse, and basically nobody will ever find out, including the people who shut down the affected users' accounts.
Yes, that's exactly it. It uses a database, compiled by NGOs and specialized firms, consisting of hashes of files known to be child porn. These lists are maintained by humans.
"Fuzzy" means the hash tolerates compression and similar changes. With an ordinary (exact) hash, if even one pixel out of 20,000 is different, the hash is different too. A fuzzy hash still recognizes the altered file as the same image, so running an algorithm to shift the colors and the like won't defeat it.
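To make the exact-versus-fuzzy distinction concrete, here is a minimal Python sketch. It uses a toy "average hash" as a stand-in for a fuzzy hash; that choice is an assumption for illustration only, not the algorithm used by Apple, Google, or the database maintainers (production systems such as Microsoft's PhotoDNA or Apple's NeuralHash are considerably more sophisticated). The image is a hypothetical 8x8 grayscale thumbnail stored as a flat list of pixel values.

```python
import hashlib


def average_hash(pixels):
    """Toy fuzzy hash (illustrative stand-in, not a production algorithm).

    One bit per pixel: set if the pixel is brighter than the image's mean.
    `pixels` is a flat list of grayscale values (0-255) for an image that
    has already been downscaled to a small fixed size such as 8x8.
    """
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def sha256_hex(pixels):
    """Exact (cryptographic) hash of the raw pixel bytes: any change flips it."""
    return hashlib.sha256(bytes(pixels)).hexdigest()


# Hypothetical 8x8 grayscale thumbnail: a simple gradient (values 0..252).
original = [x * 32 + y * 4 for y in range(8) for x in range(8)]

# Change a single pixel by one brightness level.
tweaked = list(original)
tweaked[0] += 1

# Brighten every pixel slightly, a crude "alter the color" edit.
brightened = [p + 3 for p in original]

print(average_hash(original) == average_hash(tweaked))     # True
print(sha256_hex(original) == sha256_hex(tweaked))         # False
print(average_hash(original) == average_hash(brightened))  # True
```

The uniform brightness shift leaves the toy hash unchanged because every pixel and the mean move together. Heavier edits such as cropping or rotation would break a hash this simple, which is why real systems rely on more elaborate features.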
That's also true for the no-fly list and the Terrorist Screening Database,[1] yet those are full of false positives. And unlike those lists, CSAM databases cannot be independently verified. Doing so would require possessing the original images, which is illegal.
The no-fly list and the terrorist screening database aren't used in a court of law. The Confrontation Clause of the Sixth Amendment guarantees you access to all the evidence presented against you. You also don't need the original images to defend yourself, though apparently CP can be presented to a (traumatized) jury [0].
So if you're charged on the basis of a fuzzy hash match, you'd subpoena Apple for the photo in your backup that matched, present it to the court (it's admissible whether or not it's actually CP), and win the case.
> So this means it is checking if you are sharing known CP images?
No. NeuralMatch was “trained” using 200,000 CP images. “Neural” is likely a reference to the perceptual matching that it uses. It is not a bit-for-bit match.
Perceptual matching is a technique used for categorizing images based on characteristics and content.
The algorithm will scan your library for new images and compare them to what it understands as CP.
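One common way perceptual matching performs that comparison is a nearest-match test: each library image's perceptual hash is compared with a list of known hashes, and anything within a small Hamming distance (number of differing bits) counts as a match. The Python sketch below is illustrative only; the 64-bit hashes, the list contents, and the `max_distance` threshold are all hypothetical, and it is not NeuralMatch itself, whose internals this thread describes only loosely. It also shows why the threshold matters for the false-positive concerns raised above.

```python
from typing import Iterable


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def matches_blocklist(candidate: int, known_hashes: Iterable[int],
                      max_distance: int = 5) -> bool:
    """True if `candidate` is within `max_distance` bits of any known hash.

    `max_distance` is a hypothetical tuning parameter: raising it catches
    more altered copies, but also flags more unrelated images.
    """
    return any(hamming(candidate, h) <= max_distance for h in known_hashes)


# Hypothetical 64-bit perceptual hashes standing in for a third-party list.
known = [0x8F3C_0000_FFFF_1234, 0x0123_4567_89AB_CDEF]

near_copy = known[0] ^ 0b101        # differs from known[0] in 2 bits
unrelated = 0xFFFF_FFFF_0000_0000   # 44+ bits away from every known hash

print(matches_blocklist(near_copy, known))                    # True
print(matches_blocklist(unrelated, known))                    # False
print(matches_blocklist(unrelated, known, max_distance=48))   # True: too loose
```

With a loose threshold, an image unrelated to anything on the list is reported as a match, which is exactly the kind of false positive nobody outside the list maintainers can check.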