That was just a MD5 collision - an image that has same MD5 hash as some other image (in this case some CP). This is uncommon yet possible thing - see this example[0].
Yeah, vaguely talking about MD5 as "broken" is common and misleading. There are very particular known attacks.
Obviously nobody should be using MD5, but it can be useful to understand there are circumstances where it's basically reliable unless you have an extremely sophisticated attacker.
Yes, hash collisions definitely occur. There is no such thing as collision-free hashes, and MD5 is definitely broken.
Even though the author says they were 3 million MD5 hashes the second time, the first one he calls them SHA1 and MD5 hashes (even though SHA1 is considered weak too).
I wonder what kind of hashes Apple is planning to use. Will it be whatever is made available to them or will they only accept (what is now considered) secure standards?
Which may contain the hashes of their photos, because they've been taken down in the past, which means they probably have been added to certain blacklists that may have been integrated into the blackbox of NCMEC's database.
NCMEC's CSAM database already includes images that are not necessarily illegal. If _your particular_ photos have been flagged in the past, they may well be part of the database.
> NCMEC's CSAM database already includes images that are not necessarily illegal.
How could this be the case? If it's been determined to be CSAM then it is, by definition, illegal.
If it were true that the database is likely to contain legal material, how would we possibly know about it, given that the contents of the database are secret?
> How could this be the case? If it's been determined to be CSAM then it is, by definition, illegal.
Certain images are CSAM by _context_. They do not necessarily require those within the image to be abused, but rather that the image at one time or another was traded alongside other CSAM.
> If it were true that the database is likely to contain legal material, how would we possibly know about it, given that the contents of the database are secret?
Tools like Spotlight [0] make use of the database, so certain well-known images are known to flag. Such as Nirvana's controversial cover for Nevermind.
> Certain images are CSAM by _context_. They do not necessarily require those within the image to be abused, but rather that the image at one time or another was traded alongside other CSAM.
At the risk of sounding like a broken record, how can we know this is actually true? Every description of the NCMEC database's contents that I've seen is incredibly vague, and as of 2019 it seems like there were fewer than[1] 4 million total hashes available. I would think that if it genuinely did include innocent photos of people's kids, the number would be much higher.
> ...certain well-known images are known to flag. Such as Nirvana's controversial cover for Nevermind.
I've heard this multiples times now, but I've never been able to find any evidence of it actually happening. The only instance I could find was one where Facebook removed[2] that Nirvana cover once for containing nudity.
If you're sending other people photos of your children that are explicit enough to prompt someone bring them to the attention of child safety groups like NCMEC, and they look at it and agree it's worth their time to investigate, the first you hear of it isn't likely to be after it eventually comes full circle through Apple's CSAM processes.
Remember, this isn't a porn detector strapped to a child detector.
Hypothetically that's possible, although all three steps you listed are exceedingly non-trivial. The notion that an attacker could pull off two of those steps let alone three is borderline fanciful. In addition, their target must also qualifies with the necessary prerequisites:
• has an iPhone;
• has children;
• took photos of their children which could be mistaken for CSAM by a sloppy reviewer;
• is of sufficiently high importance to justify the effort.
And after that insane effort, all you've done is inconvenience your target for a little while until child safety people investigate your family situation and discover that the photos which got flagged were not actually CSAM.
Immediately after the investigation process discovers the hash fraud, Apple will immediately start delving into exactly how their hash algorithm failed in this instance, improving it to mitigate this exploit. So this target better be worth it!
If this was a plausible exploit, surely it would have already happened to people with Android phones since Google has been doing pretty much the exact same scanning of customer images for over five years. (The only difference with what Apple is now doing is where the hashing is performed—but this makes no functional difference to the viability of your hypothetical exploit.)
[1]: https://www.hackerfactor.com/blog/index.php?/archives/929-On...