Hacker News
by tpush 1772 days ago
This is not the kind of algorithm that Apple is using. That one only scans for already known CSAM in NCMEC's database.
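Scanning only for already-known material is, at its core, a set-membership check on hashes. The sketch below shows that idea using an exact MD5 digest, the hash type discussed in the blog post linked further down. It is a general illustration, not Apple's implementation, and `known_hashes` is a hypothetical in-memory stand-in for a real hash list.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of raw file bytes."""
    return hashlib.md5(data).hexdigest()


def is_known(data: bytes, known_hashes: set) -> bool:
    """Flag a file only if its digest is already in the database.

    Nothing about the image content is analysed: a file whose digest
    is not in the set can never match (barring a hash collision).
    """
    return md5_hex(data) in known_hashes


# Hypothetical database built from previously identified files.
known_hashes = {md5_hex(b"previously identified file")}

print(is_known(b"previously identified file", known_hashes))  # True
print(is_known(b"some new holiday photo", known_hashes))      # False
```

One consequence of exact matching is that re-encoding or cropping a known image produces a different digest, so the copy no longer matches.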

Quite funnily and disturbingly, one of the databases of "known CSAM" hashes also apparently includes a picture of a clothed man holding a monkey[1]

[1]: https://www.hackerfactor.com/blog/index.php?/archives/929-On...

That was just an MD5 collision: an image that has the same MD5 hash as some other image (in this case some CP). This is uncommon but possible; see this example[0].

[0] https://natmchugh.blogspot.com/2014/11/three-way-md5-collisi...

I think a flawed process where the monkey image ended up in the database is more likely than a random unintentional hash collision.
Not really. MD5 is thoroughly and completely broken, and has been for years. You can modify an image to be an MD5 collision for another image.
No, you cannot. A collision requires the attacker to create both images.

What you are describing is a second preimage attack: creating a second input with the same hash as a target.

There is no currently known tractable way to create second preimages for MD5.
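The gap between the two attacks shows up even on a toy hash. This sketch truncates MD5 to 16 bits (the hypothetical `toy_hash`, not something NCMEC or Apple uses) so both searches finish instantly. A collision, where the attacker chooses both inputs, takes roughly 2^8 tries by the birthday bound, whereas a second preimage for a fixed target takes roughly 2^16. For full 128-bit MD5 the generic costs would be about 2^64 versus 2^128. MD5's known weaknesses make collisions practical, but as noted above, they do not make second preimages practical.

```python
import hashlib

BITS = 16  # toy truncation; real MD5 digests have 128 bits


def toy_hash(data: bytes) -> int:
    """First BITS bits of MD5: a deliberately weak stand-in hash."""
    digest = hashlib.md5(data).digest()
    return int.from_bytes(digest, "big") >> (128 - BITS)


def find_collision():
    """Collision: the attacker is free to choose BOTH inputs.

    By the pigeonhole principle, at most 2**BITS + 1 messages are needed.
    """
    seen = {}
    for i in range(2**BITS + 1):
        msg = f"msg-{i}".encode()
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg


def find_second_preimage(target: bytes, limit: int = 2**22):
    """Second preimage: one input is fixed; search for another with its hash."""
    goal = toy_hash(target)
    for i in range(limit):
        msg = f"try-{i}".encode()
        if msg != target and toy_hash(msg) == goal:
            return msg, i + 1
    return None  # not found within the search budget


a, b, tries_collision = find_collision()
found = find_second_preimage(b"target image")
print("collision after", tries_collision, "tries")
if found is not None:
    print("second preimage after", found[1], "tries")
```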

Yeah, vaguely talking about MD5 as "broken" is common and misleading. There are very particular known attacks.

Obviously nobody should be using MD5, but it can be useful to understand there are circumstances where it's basically reliable unless you have an extremely sophisticated attacker.

That would be an intentional collision. An unintentional collision remains unlikely for a cryptographic hash.
Not just unlikely but astronomically unlikely.
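To put numbers on "astronomically unlikely", the sketch below assumes MD5 behaves like a random 128-bit function on non-adversarial inputs and takes a database of about 3 million hashes, the figure the hackerfactor author gives. It computes two birthday-bound probabilities. The functions are hypothetical back-of-the-envelope helpers, not a model of any real scanning system.

```python
import math


def p_any_collision(n: int, bits: int = 128) -> float:
    """Birthday bound: probability that some pair among n random digests collides."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2**bits)  # 1 - exp(-pairs / 2^bits), precise for tiny values


def p_hits_database(n: int, bits: int = 128) -> float:
    """Probability that one unrelated image's digest equals any of n database digests."""
    # 1 - (1 - 2^-bits)^n, computed via log1p/expm1 to avoid rounding to zero
    return -math.expm1(n * math.log1p(-1 / 2**bits))


n = 3_000_000
print(f"{p_any_collision(n):.1e}")  # about 1.3e-26
print(f"{p_hits_database(n):.1e}")  # about 8.8e-33
```

For comparison, a 50/50-ish chance of an accidental collision would need on the order of 2^64 digests.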
Yes, hash collisions definitely occur. There is no such thing as collision-free hashes, and MD5 is definitely broken.

The author says they were 3 million MD5 hashes the second time, but the first time he calls them SHA1 and MD5 hashes (and SHA1 is considered weak too).

I wonder what kind of hashes Apple is planning to use. Will it be whatever is made available to them or will they only accept (what is now considered) secure standards?

Which may contain the hashes of their photos: if those photos have been taken down in the past, they have probably been added to certain blacklists, which may in turn have been integrated into the black box of NCMEC's database.
Photographs of your naked child in the bath are not illegal, are not CSAM, and are not going to be in the NCMEC's database.
NCMEC's CSAM database already includes images that are not necessarily illegal. If _your particular_ photos have been flagged in the past, they may well be part of the database.
> NCMEC's CSAM database already includes images that are not necessarily illegal.

How could this be the case? If it's been determined to be CSAM then it is, by definition, illegal.

If it were true that the database is likely to contain legal material, how would we possibly know about it, given that the contents of the database are secret?

> How could this be the case? If it's been determined to be CSAM then it is, by definition, illegal.

Certain images are CSAM by _context_. They do not necessarily require those within the image to be abused, but rather that the image at one time or another was traded alongside other CSAM.

> If it were true that the database is likely to contain legal material, how would we possibly know about it, given that the contents of the database are secret?

Tools like Spotlight [0] make use of the database, so certain well-known images are known to flag, such as Nirvana's controversial cover for Nevermind.

[0] https://www.wired.com/story/how-facial-recognition-fighting-...

> Certain images are CSAM by _context_. They do not necessarily require those within the image to be abused, but rather that the image at one time or another was traded alongside other CSAM.

At the risk of sounding like a broken record, how can we know this is actually true? Every description of the NCMEC database's contents that I've seen is incredibly vague, and as of 2019 it seems there were fewer than 4 million[1] total hashes available. I would think that if it genuinely did include innocent photos of people's kids, the number would be much higher.

> ...certain well-known images are known to flag. Such as Nirvana's controversial cover for Nevermind.

I've heard this multiple times now, but I've never been able to find any evidence of it actually happening. The only instance I could find was one where Facebook removed[2] that Nirvana cover once for containing nudity.

1. https://inews.co.uk/news/technology/uk-us-collaborate-crack-...

2. https://www.theguardian.com/music/2011/jul/28/facebook-nirva...

If you're sending other people photos of your children that are explicit enough to prompt someone to bring them to the attention of child safety groups like NCMEC, and they look at them and agree it's worth their time to investigate, the first you hear of it isn't likely to be after it eventually comes full circle through Apple's CSAM processes.

Remember, this isn't a porn detector strapped to a child detector.

Step 1: Get copies of pictures of target's kid in bath from phone/SNS

Step 2: Manipulate pictures so that hash collides with CSAM

Step 3: Get pictures back on target's phone so they get scanned.

I don't have the skills or understanding of how the hashes are created, but would this be possible?
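For step 2, the kind of hash matters. With an exact cryptographic digest like the MD5 hashes discussed above, there is no way to "nudge" a photo toward a target. Flipping a single input bit yields an unrelated digest, so the attacker is back to the second-preimage problem. The sketch below demonstrates this avalanche behaviour. `photo` is a hypothetical stand-in byte string. Perceptual hashes, which are designed to tolerate small edits, behave very differently and are not covered here.

```python
import hashlib


def md5_bits_changed(original: bytes, byte_index: int = 0, bit: int = 0) -> int:
    """Flip one bit of the input; count how many of MD5's 128 output bits change."""
    tweaked = bytearray(original)
    tweaked[byte_index] ^= 1 << bit
    a = int.from_bytes(hashlib.md5(original).digest(), "big")
    b = int.from_bytes(hashlib.md5(bytes(tweaked)).digest(), "big")
    return bin(a ^ b).count("1")  # Hamming distance between the two digests


photo = b"hypothetical image bytes " * 100  # stand-in for a real file (2500 bytes)
for i in range(5):
    print(md5_bits_changed(photo, byte_index=i * 400))
```

Expect each count to land somewhere around 64 of 128 bits. In other words, every output bit flips with probability about one half, so a small edit gives no partial progress toward a chosen target hash.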

Hypothetically that's possible, although all three steps you listed are exceedingly non-trivial. The notion that an attacker could pull off two of those steps, let alone three, is borderline fanciful. In addition, their target must meet the necessary prerequisites:

• has an iPhone;

• has children;

• took photos of their children which could be mistaken for CSAM by a sloppy reviewer;

• is of sufficiently high importance to justify the effort.

And after that insane effort, all you've done is inconvenience your target for a little while until child safety people investigate your family situation and discover that the photos which got flagged were not actually CSAM.

As soon as the investigation process discovers the hash fraud, Apple will start delving into exactly how their hash algorithm failed in this instance and improve it to mitigate this exploit. So this target better be worth it!

If this was a plausible exploit, surely it would have already happened to people with Android phones, since Google has been doing pretty much the exact same scanning of customer images for over five years. (The only difference with what Apple is now doing is where the hashing is performed, but this makes no functional difference to the viability of your hypothetical exploit.)