by bitwise-evan 1759 days ago
> an adversary could trick Apple’s algorithm into erroneously matching an existing image

This is a very real, feasible attack. Apple ships its CSAM detection model on device, so any attacker can get a copy of it. The attacker then crafts an image that triggers the CSAM detector but looks like a panda [1]. Now the attacker sends tons of these triggering photos to the unsuspecting victim, who then gets questioned by the FBI.

1: https://medium.com/@ml.at.berkeley/tricking-neural-networks-...
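
To make the "looks like a panda" part concrete, here is a minimal sketch of the general idea behind adversarial examples. It is a toy linear scorer in pure Python, not Apple's model and not NeuralHash, and every name and number in it (`score`, `perturb`, the weights, `eps`) is made up for illustration. It uses a gradient-sign-style step: nudge each pixel a tiny amount in the direction that raises the score.

```python
# Toy illustration only: a hypothetical linear "detector", not NeuralHash.

def score(weights, bias, pixels):
    """Linear score; the toy detector fires when the score is above 0."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias

def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def perturb(weights, pixels, eps):
    """Move every pixel by at most eps in the direction that raises the score.

    For a linear score the gradient with respect to the pixels is just the
    weight vector, so the gradient-sign step is eps * sign(weight).
    """
    return [p + eps * sign(w) for w, p in zip(weights, pixels)]

# 100 "pixels", all mid-gray; made-up weights that alternate between +1 and -1.
weights = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
bias = -1.0
pixels = [0.5] * 100

adversarial = perturb(weights, pixels, eps=0.02)

print(score(weights, bias, pixels) > 0)       # False: the original does not fire
print(score(weights, bias, adversarial) > 0)  # True: the perturbed copy does
print(max(abs(a - p) for a, p in zip(adversarial, pixels)))  # largest per-pixel change
```

The point of the toy: with many inputs, a per-pixel change far too small to notice can still add up to a large change in the score. Whether that carries over to any particular real model is a separate question.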

2 comments

> Now the attacker sends tons of triggering photos to the unsuspecting victim, who now gets questioned by the FBI.

That's glossing over the middle part, where a human from Apple (before it even gets to law enforcement) actually looks at the images, goes "oh, these are actually pandas", and realizes they were erroneously detected.

So the attacker creates an image, then the user has to download it. Then the FBI digs in and sees it was a crafted false positive, then begins to investigate who sent it and why. Then the user takes civil action against the person who sent it for harassment.
More precisely, 30 carefully crafted false positives, all of which need to be imported into your iPhone's photo library to sit alongside pictures of your dog and your mum. And then they have to get past human review. Not impossible, but so far beyond plausible that it can be dismissed as ridiculous.

And if this trick ever works, it could only work once before Apple gets the chance to plug the holes in its NeuralHash algorithm and fix any deficiencies in its manual review process.
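
The two gates described above (30 matches in the library, then human review) can be sketched roughly like this. The threshold of 30 is the figure from this comment; everything else is an assumption for illustration, including the names `Photo` and `should_escalate` and the rule that review only passes if a reviewer confirms at least one matched image. It is not Apple's actual pipeline.

```python
from dataclasses import dataclass

MATCH_THRESHOLD = 30  # figure quoted in the comment above


@dataclass
class Photo:
    hash_matches: bool       # the on-device hash matched a database entry
    reviewer_confirms: bool  # a human reviewer agrees the match is genuine


def should_escalate(library, threshold=MATCH_THRESHOLD):
    """Hypothetical two-gate check: match count first, human review second."""
    matches = [photo for photo in library if photo.hash_matches]
    # Gate 1: below the threshold nothing is reviewed at all.
    if len(matches) < threshold:
        return False
    # Gate 2 (assumed rule): escalate only if review confirms a real match.
    return any(photo.reviewer_confirms for photo in matches)


# A crafted "panda" matches the hash but would not be confirmed by a reviewer.
crafted_panda = Photo(hash_matches=True, reviewer_confirms=False)
holiday_snap = Photo(hash_matches=False, reviewer_confirms=False)

print(should_escalate([crafted_panda] * 29 + [holiday_snap] * 500))  # False
print(should_escalate([crafted_panda] * 30 + [holiday_snap] * 500))  # False
```

Under these assumptions, 29 crafted images never reach review, and 30 reach review but get thrown out as pandas.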