by matwood 1015 days ago
You are correct, the original method would only have scanned items destined for iCloud and only transmitted some hash of matching hashes. And yes, similar slippery-slope arguments exist with any provider that stores images unencrypted. They are all scanned today, and we have no idea what they are matched against.

I speculated (and now we know) when this new scanning was announced that it was in preparation for full E2EE. Apple came up with a privacy-preserving method of trying to keep CSAM off their servers while also offering E2EE.

The broader community's arguments swayed Apple away from their new detection method, but did not stop them from moving forward with E2EE. At the end of the day, they put the responsibility back on governments to pass laws around encryption, which is where it belongs, though we may not like the outcome.


There are also ways to detect matches even with e2ee, IIRC, and I suspect they found doing that easier than dealing with the previous approach.

At the time I also thought it was obvious it was in preparation for e2ee (despite loud people on HN who disagreed).

I do wonder whether they intended it to be on by default, though. Maybe not, since it's probably better for most users to have a recovery option.

> There are also ways to detect matches even with e2ee

By definition, encryption (with unique user keys) means you can neither infer nor check what the content of the message is. Not without client cooperation, which is what this feature would have been.
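A toy sketch of that point, assuming each user holds their own random key and every upload gets a fresh random nonce. The SHA-256 counter-mode keystream is a stdlib stand-in for a real cipher such as AES-GCM (it is unauthenticated and not a vetted design), and the function names are hypothetical. Two users uploading the same photo produce unrelated ciphertexts, so the server has nothing to compare against.

```python
from __future__ import annotations

import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks. Not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_for_user(user_key: bytes, data: bytes) -> tuple[bytes, bytes]:
    """Encrypt with the user's own key and a fresh random nonce; returns (nonce, ciphertext)."""
    nonce = os.urandom(16)
    ks = keystream(user_key, nonce, len(data))
    return nonce, bytes(a ^ b for a, b in zip(data, ks))


def decrypt_for_user(user_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    # XOR with the same keystream undoes the encryption.
    ks = keystream(user_key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks))


photo = b"the exact same photo bytes"
alice_key, bob_key = os.urandom(32), os.urandom(32)  # per-user keys the server never sees
alice_nonce, alice_ct = encrypt_for_user(alice_key, photo)
bob_nonce, bob_ct = encrypt_for_user(bob_key, photo)

# The server only sees nonces and ciphertexts; the two uploads look unrelated.
print(alice_ct == bob_ct)  # False (with overwhelming probability)
print(decrypt_for_user(alice_key, alice_nonce, alice_ct) == photo)  # True
```

Matching would require the client to do the comparison before encrypting, which is exactly the "client cooperation" the on-device design relied on.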

This is what I was recalling; it gives you a clever way to do it by using the file itself as the key:

> “Convergent encryption solves this problem in a very clever way:

“The way to make sure that every unique user with the same file ends up with an encrypted version of that file that is also identical is to ensure they use the same key. However, you can’t share keys between users, because that defeats the entire point; you need a common reference point between users that is unknown to anyone but those users.

“The answer is to use the file itself: the system creates a hash of the file’s content, and that hash (a long string of characters derived from a known algorithm) is the key that is used to encrypt said file.

“If every iCloud user uses this technique — and given that Apple implements the system, they do — then every iCloud user with the same file will produce the same encrypted file, given that they are using the same key (which is derived from the file itself); that means that Apple only needs to store one version of that file even as it makes said file available to everyone who “uploaded” it (in truth, because iCloud integration goes down to the device, the file is probably never actually uploaded at all — Apple just includes a reference to the file that already exists on its servers, thus saving a huge amount of money on both storage costs and bandwidth).

“There is one huge flaw in convergent encryption, however, called “confirmation of file”: if you know the original file you by definition can identify the encrypted version of that file (because the key is derived from the file itself). When it comes to CSAM, though, this flaw is a feature: because Apple uses convergent encryption for its end-to-end encryption it can by definition do server-side scanning of files and exploit the “confirmation of file” flaw to confirm if CSAM exists, and, by extension, who “uploaded” it. Apple’s extremely low rates of CSAM reporting suggest that the company is not currently pursuing this approach, but it is the most obvious way to scan for CSAM given it has abandoned its on-device plan.”

https://stratechery.com/2022/apple-icloud-encryption-csam-sc...
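A minimal sketch of the convergent encryption the quote describes, assuming SHA-256 of the content as the key. The article does not say which cipher Apple uses, so a toy SHA-256 keystream stands in for one here, and `DedupStore` is a hypothetical server. Because encryption is deterministic, two users with the same file upload byte-identical ciphertext and the server keeps a single copy.

```python
from __future__ import annotations

import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks, standing in for a real cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def convergent_encrypt(data: bytes) -> tuple[bytes, bytes]:
    """Return (key, ciphertext) where the key is the SHA-256 hash of the content."""
    key = hashlib.sha256(data).digest()
    # No random nonce: output must be deterministic so identical files match.
    # A fixed keystream is tolerable only because a given key always encrypts the same plaintext.
    ciphertext = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    return key, ciphertext


def convergent_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(ciphertext, keystream(key, len(ciphertext))))


class DedupStore:
    """Hypothetical server: keeps one copy of each distinct ciphertext."""

    def __init__(self) -> None:
        self.blobs = {}

    def upload(self, ciphertext: bytes) -> str:
        blob_id = hashlib.sha256(ciphertext).hexdigest()
        # A second upload of an identical file adds nothing new.
        self.blobs.setdefault(blob_id, ciphertext)
        return blob_id


store = DedupStore()
meme = b"a popular meme everyone saves"
alice_key, alice_ct = convergent_encrypt(meme)
bob_key, bob_ct = convergent_encrypt(meme)

print(store.upload(alice_ct) == store.upload(bob_ct), len(store.blobs))  # True 1
print(convergent_decrypt(bob_key, alice_ct) == meme)  # True
```

The same determinism is what makes "confirmation of file" possible: anyone who already has `meme` can recompute its ciphertext.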

That makes me happy, because 12 years ago here on HN I posted a comment [1] outlining how a Dropbox-like service could be implemented that stored user files encrypted, with the service not having the keys, yet allow for full deduplication when different users were storing the same file, while still supporting the normal Dropbox sharing features.

The file encryption part was based on using a hash of the file as the key.
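A rough sketch of how such a Dropbox-like service could fit together, under stated assumptions: the server stores ciphertext blobs keyed by digest and never sees keys, each client keeps a private manifest of file name to (blob ID, file key), and sharing means handing that pair to another user. `Server`, `Client`, and the toy keystream are hypothetical; a real client would also encrypt its manifest under a user-held key before syncing it, which is omitted here.

```python
from __future__ import annotations

import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_stream(key: bytes, data: bytes) -> bytes:
    # Same call encrypts and decrypts, since XOR is its own inverse.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


class Server:
    """Hypothetical service: stores ciphertext blobs by digest and never holds keys."""

    def __init__(self) -> None:
        self.blobs = {}

    def put(self, ciphertext: bytes) -> str:
        blob_id = hashlib.sha256(ciphertext).hexdigest()
        self.blobs.setdefault(blob_id, ciphertext)
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self.blobs[blob_id]


class Client:
    """Hypothetical client: a private manifest maps file names to (blob_id, file_key)."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.manifest = {}

    def save(self, name: str, data: bytes) -> None:
        key = hashlib.sha256(data).digest()  # convergent key: hash of the content
        self.manifest[name] = (self.server.put(xor_stream(key, data)), key)

    def read(self, name: str) -> bytes:
        blob_id, key = self.manifest[name]
        return xor_stream(key, self.server.get(blob_id))

    def share(self, name: str, other: Client, as_name: str) -> None:
        # Assumed to travel over a separate encrypted channel; nothing is re-uploaded.
        other.manifest[as_name] = self.manifest[name]


server = Server()
alice, bob, carol = Client(server), Client(server), Client(server)
alice.save("report.pdf", b"quarterly numbers")
bob.save("copy.pdf", b"quarterly numbers")  # deduplicated against Alice's blob
alice.share("report.pdf", carol, "from_alice.pdf")
print(len(server.blobs))  # 1
print(carol.read("from_alice.pdf"))  # b'quarterly numbers'
```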

It's always nice to later find out that one's quick amateur idea turns out to be an independent rediscovery of something legit. Now that I've learned it is called "convergent encryption", Googling tells me it goes back to 1995 and a Stac patent.

[1] https://news.ycombinator.com/item?id=2461713

This still suffers the same problems as the original proposal. Specifically, Apple could still be pressured or forced by governments to check for non-CSAM images. And using cryptographic hashing means they can’t detect altered files, while using perspective hashing leaves them open to false positives.
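A toy illustration of that trade-off on made-up 4x4 grayscale "images". The perceptual hash here is a simple average hash (one bit per pixel, set when brighter than the mean), chosen only because it is easy to follow; it is not NeuralHash or any hash Apple actually used. A one-pixel tweak breaks the exact SHA-256 match, while the perceptual hash matches both the tweak and a visibly different image.

```python
import hashlib


def sha256_of(pixels: list) -> str:
    """Cryptographic hash of the raw pixel bytes."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()


def average_hash(pixels: list) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [v for row in pixels for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


# 4x4 grayscale images with values 0-255.
original = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
]
tweaked = [row[:] for row in original]
tweaked[0][0] = 12  # a tiny edit to one pixel
unrelated = [  # a different, low-contrast image with the same coarse layout
    [90, 90, 110, 110],
    [90, 90, 110, 110],
    [110, 110, 90, 90],
    [110, 110, 90, 90],
]

print(sha256_of(original) == sha256_of(tweaked))  # False: exact hash misses the edit
print(hamming(average_hash(original), average_hash(tweaked)))  # 0: perceptual hash still matches
print(hamming(average_hash(original), average_hash(unrelated)))  # 0: but so does a different image
```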
That’s not what’s commonly understood to be a modern cipher. It would be trivial for a government to make a list of undesired messages/images and find everyone that has forwarded it.

https://en.wikipedia.org/wiki/Chosen-plaintext_attack
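A sketch of that attack against convergent encryption, assuming the scanner can see which blob IDs each account references. Whoever holds a watchlist of plaintexts recomputes their ciphertext digests and intersects; no user key is needed, and a photo nobody else has never matches. The toy keystream, `convergent_blob_id`, `scan`, and the sample data are all hypothetical.

```python
from __future__ import annotations

import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter) blocks. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def convergent_blob_id(data: bytes) -> str:
    """Digest of the convergent ciphertext; anyone holding the plaintext can compute it."""
    key = hashlib.sha256(data).digest()
    ciphertext = bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))
    return hashlib.sha256(ciphertext).hexdigest()


def scan(uploads: dict[str, set[str]], watchlist: list[bytes]) -> set[str]:
    """Return users whose stored blobs match a watchlisted file. No user keys needed."""
    targets = {convergent_blob_id(f) for f in watchlist}
    return {user for user, blob_ids in uploads.items() if blob_ids & targets}


# Hypothetical server-side view: which blob IDs each account references.
uploads = {
    "alice": {convergent_blob_id(b"alice's private holiday photo")},
    "bob": {convergent_blob_id(b"widely forwarded leaflet")},
}
print(scan(uploads, [b"widely forwarded leaflet"]))  # {'bob'}
```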

Yeah that’s literally the point in the CSAM case.

For regular people taking photos the government won’t have their plaintext.

For popular media that many people upload, storing a single shared copy saves a lot of bandwidth.

> that hash is the key that is used to encrypt said file.

So every file has a unique key? Then thousands or tens of thousands of keys would need to be in the keychain, mapped to file names.

And if one person's keys are leaked, they can be used to prove that other people had the same file.

No, this doesn't sound well thought out.

Could one defeat this by changing a single byte in their file?

Bit, yes. Though on a moderately large file it would be easy to brute-force all one-bit modifications, and then the effort grows exponentially (basically) in the number of bits flipped, so you’ll want to do more than a few.

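A sketch of that brute force, assuming the scanner already holds the original file and can see the digest of the altered upload (SHA-256 of the file stands in for the convergent blob ID, since any deterministic function of the file behaves the same way). Trying every single-bit flip costs one hash per bit; with k flipped bits the candidate count is C(n, k), roughly n^k / k! for small k, so each extra bit multiplies the work by about n / k.

```python
from __future__ import annotations

import hashlib
import math


def blob_id(data: bytes) -> str:
    # Stand-in for the convergent blob ID: any deterministic function of the file works here.
    return hashlib.sha256(data).hexdigest()


def find_one_bit_variant(known: bytes, target: str) -> int | None:
    """Flip each bit of a known file in turn; return the bit index whose flip matches target."""
    buf = bytearray(known)
    for i in range(len(buf) * 8):
        buf[i // 8] ^= 1 << (i % 8)  # flip bit i
        if blob_id(bytes(buf)) == target:
            return i
        buf[i // 8] ^= 1 << (i % 8)  # flip it back
    return None


known = b"a known image file"
altered = bytearray(known)
altered[3] ^= 1 << 2  # the uploader flips bit 2 of byte 3, i.e. bit index 26
print(find_one_bit_variant(known, blob_id(bytes(altered))))  # 26

# Candidates with exactly k flipped bits in a 1 MB file.
n_bits = 8 * 1_000_000
for k in range(1, 4):
    print(k, math.comb(n_bits, k))
```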
> At the time I also thought it was obvious it was in preparation for e2ee

I thought the same.

> despite loud people on HN who disagreed

Yeah, loud people be like that, but this is really a failure of Apple’s communication. They could have led with: “Hey, we want to provide E2E-encrypted storage; the price is that we need to scan what you upload for CSAM.”