by xucheng 1765 days ago
> Neural hash generated here might be a few bits off from one generated on an iOS device. This is expected since different iOS devices generate slightly different hashes anyway. The reason is that neural networks are based on floating-point calculations. The accuracy is highly dependent on the hardware. For smaller networks it won't make any difference. But NeuralHash has 200+ layers, resulting in significant cumulative errors.

This is a little unexpected. I'm not sure whether this has any implications for CSAM detection as a whole. Wouldn't this require Apple to add multiple versions of the NeuralHash of the same image (one for each platform/hardware) to the database to counter this issue? If that is the case, doesn't this in turn weaken the detection threshold, as the same image may match multiple times on different devices?


This may explain why they (weirdly) only announced it for iOS and iPadOS; as far as I can tell, they didn't announce it for macOS.

My first thought was that they didn't want to make the model too easily accessible by putting it on macOS, in order to avoid adversarial attacks.

But knowing this now, Intel Macs are an issue, as they will have to run the network on a wide variety of GPUs (at the very least multiple AMD archs and Intel's iGPU), so maybe that also factored into their decision? (Not, as I previously wrote, because they differ from ARM in floating-point implementation; thanks my123 for the correction.) They would have had to deploy multiple models and (I believe, unless they could make the models converge exactly?) multiple distinct databases server-side to check against.

To people knowledgeable on the topic, would having two versions of the model increase the attack surface?

Edit: Also, I didn't realise that because of how perceptual hashes work, they would need to have their own threshold for matching, independent of the "30 pictures matched to launch a human review". Apple's communication push implied exact matches. I'm not sure they used the right tool here (putting aside the fact for now that this is running client side).

It wasn’t part of the original announcement afaik but is coming to macOS Monterey: https://www.apple.com/child-safety/

Edit: cwizou correctly points out not all of the features (per Apple) will be on Monterey but the code exists.

Is it? I checked your link and they clearly separate which features come to which OS. Here's how I read it:

- Communication safety in Messages

> "This feature is coming in an update later this year to accounts set up as families in iCloud for iOS 15, iPadOS 15, and macOS Monterey."

- CSAM detection

> "To help address this, new technology in iOS and iPadOS"

- Expanding guidance in Siri and Search

> "These updates to Siri and Search are coming later this year in an update to iOS 15, iPadOS 15, watchOS 8, and macOS Monterey."

So while the two other features are coming, the CSAM detection is singled out as not coming to macOS.

But! At the same time (and I saw this after the editing window closed), the GitHub repo clearly states that you can get the models from macOS builds 11.4 onwards:

> If you have a recent version of macOS (11.4+) or jailbroken iOS (14.7+) installed, simply grab these files from /System/Library/Frameworks/Vision.framework/Resources/ (on macOS) or /System/Library/Frameworks/Vision.framework/ (on iOS).

So my best guess is that they trialed it on macOS as they did on iOS (and put the model there, contrary to what I had assumed) but chose not to enable it yet, perhaps because of the rounding error issue, or something else.

Edit: This repo by KhaosT refers to 11.3 for the API availability, but it's the same ballpark. Apple is already shipping it as part of their Vision framework, under an obfuscated class name, and the code sample runs the model directly on macOS: https://github.com/KhaosT/nhcalc/blob/5f5260295ba584019cbad6...

Ah, good catch and write-up. I believe you’re right and it's likely a matter of time for Mac. Hard to tell if this means it’s shipping with macOS but just not enabled yet.
The model runs on the GPU or the Neural Engine, CPU arch isn't really a factor.
My bad, I edited the previous post, thanks for this. Assuming this runs on Intel's iGPU, they would still need the ability to run on AMD's GPU for the iMac Pro and Mac Pro, so that's at least two extra separate cases.
It's not a user facing feature, and x86 macs are the past already - I doubt they'll bother porting it.
My primary expectation is that this tech will be used for DMCA 2.0, and "for the kids" is the best way to launch it.
This basically invalidates any claims Apple made about accuracy, and brings up an interesting point about the hashing mechanism: it seems two visually similar images will also have similar hashes. This is interesting because humans quickly learn such patterns: for example, many here will know what dQw4w9WgXcQ is without thinking about it at all.
> it seems two visually similar images will also have similar hashes

This is by design. The whole idea of a perceptual hash is that the more similar the two hashes are, the more similar the two images are, so I don't think it invalidates any claims.

Perceptual hashes are different to a cryptographic hash, where any change in the message would completely change the hash.
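
To make the contrast concrete, here is a minimal sketch. The `average_hash` function is a toy perceptual hash invented for illustration (one bit per pixel, set when the pixel is brighter than the mean); it is not NeuralHash, and the pixel values are made up.

```python
import hashlib

def average_hash(pixels):
    """Toy perceptual hash: bit is 1 when a pixel is brighter than the mean."""
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def bit_difference(a, b):
    """Count positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 8-pixel grayscale "images"; the second has two pixels nudged.
original = [10, 200, 30, 220, 15, 210, 25, 230]
tweaked = [12, 198, 30, 220, 15, 210, 25, 230]

# Perceptual hash: the small tweak leaves every bit unchanged.
perceptual_diff = bit_difference(average_hash(original), average_hash(tweaked))

# Cryptographic hash: the same tweak yields an unrelated-looking digest.
sha_a = hashlib.sha256(bytes(original)).digest()
sha_b = hashlib.sha256(bytes(tweaked)).digest()
crypto_diff = sum(bin(x ^ y).count("1") for x, y in zip(sha_a, sha_b))

print(perceptual_diff, "of", len(original), "perceptual bits differ")
print(crypto_diff, "of 256 SHA-256 bits differ")
```

With a cryptographic hash you would expect roughly half of the output bits to flip for any change in the input, which is exactly the property a perceptual hash is designed not to have.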

> The whole idea of a perceptual hash is that the more similar the two hashes are, the more similar the two images are

If that is the case, then the word "hash" is terribly mis-applied here.

Hash is applied correctly here. A hash function is "any function that can be used to map data of arbitrary size to fixed-size values." The properties of being a(n) (essentially) unique fingerprint, or of small changes in input causing large changes in output, are properties of cryptographic hashes. Perceptual hashes do not have those properties.
Good explanation, thanks. I only knew about cryptographic hashes, or those that are used for hash tables where you absolutely do not want to have collisions. Anyhow, I'm not really comfortable with this usage of the word "hash". It is the complete opposite of the meaning I'm used to.
Maybe the term fingerprint is better
It greatly increases the collision space if you only have to get near a bad number.
> The whole idea of a perceptual hash is that the more similar the two hashes are, the more similar the two images are

This is already proven to be inaccurate. There are adversarial hashes and collisions possible in the system. You don’t have to be very skeptically-minded to think that this is intentional. Links to examples of this already posted in this thread.

You are banking on an ideal scenario for this technology, not the reality.

EDIT: Proof on the front page on HN right now https://github.com/AsuharietYgvar/AppleNeuralHash2ONNX/issue...

I think you may have misread my comment: I did not mean that the similarity of hashes invalidates any claims.
> Wouldn't this require Apple to add multiple versions of NeuralHash of the same image (one for each platform/hardware) into the database to counter this issue?

Not if their processor architectures are all the same, or close enough that they can write (and have written) an emulation layer to get bit-identical behaviour.
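
For anyone wondering why bit-identical behaviour isn't automatic: floating-point addition is not associative, so hardware that sums in a different order can get a slightly different result. A minimal sketch in plain Python follows; the final thresholding step is my own assumption about how a value might end up as a hash bit, not something taken from NeuralHash.

```python
# The same three numbers, summed in two different orders.
values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right == right_to_left)  # False: the two sums differ in the last place

# Hypothetical final step: turn an activation into one hash bit by thresholding.
# A value sitting right at the threshold flips depending on the summation order.
THRESHOLD = 0.6  # illustrative value only
bit_on_device_a = 1 if left_to_right > THRESHOLD else 0
bit_on_device_b = 1 if right_to_left > THRESHOLD else 0

print(bit_on_device_a, bit_on_device_b)  # the two "devices" disagree on this bit
```

Across 200+ layers there are far more places for this kind of reordering to creep in, which fits the "few bits off" behaviour described in the repo's README.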

Floating point arithmetic in an algorithm that can land you in jail? Why not!
This algorithm cannot land you in jail. Nobody would be jailed based on this algorithm.

The algorithm alerts a human, who actually looks and makes the call.

I think it would just require generating the table of hashes once on each type of hardware in use (whether CPU or GPU), then doing the lookup only in the table that matches the hardware that generated it.
To re-do the hashes, you would need to run it on the original offending photo database, which -- as an unofficial party doing so -- could land you in trouble, wouldn't it?

And what if you re-do the hashes on a Mac with auto-backup to iCloud -- next thing you know the entire offending database has been sync'd into your iCloud account :-/

They are probably using Hamming distance (https://en.wikipedia.org/wiki/Hamming_distance) to allow some leeway, which again adds to the potential for more false positives.
Yes, this and other distance metrics are what is used to do reverse image and image similarity lookups with perceptual hashes.
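
A minimal sketch of what Hamming-distance matching looks like, treating hashes as integers. The 16-bit values and the `max_distance` threshold are made up for illustration, and whether Apple's system tolerates any bit differences at all is not established in this thread.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")

def is_match(candidate: int, known: int, max_distance: int) -> bool:
    """Treat hashes as matching when they differ in at most max_distance bits."""
    return hamming_distance(candidate, known) <= max_distance

# Hypothetical 16-bit hashes.
known = 0b1011_0110_1100_0011
one_bit_off = 0b1011_0110_1100_0111  # same image hashed on different hardware, say
unrelated = 0b0100_1001_0011_1100    # every bit inverted

print(hamming_distance(known, one_bit_off))      # 1
print(hamming_distance(known, unrelated))        # 16
print(is_match(one_bit_off, known, max_distance=2))  # True
print(is_match(unrelated, known, max_distance=2))    # False
```

The trade-off raised above is visible here: raising `max_distance` tolerates more hardware drift, but every extra bit of leeway also enlarges the set of unrelated hashes that count as a match.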
I don't understand the concept of "slightly different hash". Aren't hashes supposed to be either equal or completely different?
You're thinking of cryptographic hashes. There are many kinds of hash (geographic, perceptual, semantic, etc.), many of which are designed so that similar inputs produce only slightly different hashes.
There is a class of hashes known as locality-sensitive hashes, which are designed to preserve some metric of "closeness".

https://en.wikipedia.org/wiki/Locality-sensitive_hashing
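
As a rough illustration, here is a sketch of one well-known locality-sensitive family, random hyperplane hashing (it preserves angular closeness between vectors). This is not how NeuralHash works internally as far as this thread establishes; the function names, dimensions, and seed are all made up for the example.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def lsh_bits(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    return [
        1 if sum(w * x for w, x in zip(plane, vector)) > 0 else 0
        for plane in hyperplanes
    ]

planes = make_hyperplanes(num_bits=16, dim=4)

v = [1.0, 2.0, 3.0, 4.0]
scaled = [2.0 * x for x in v]   # same direction, different length
opposite = [-x for x in v]      # exactly the opposite direction

h_v = lsh_bits(v, planes)
h_scaled = lsh_bits(scaled, planes)
h_opposite = lsh_bits(opposite, planes)

print(h_v == h_scaled)                          # same direction, same hash
print(sum(a != b for a, b in zip(h_v, h_opposite)))  # opposite direction flips the bits
```

Vectors pointing in nearly the same direction fall on the same side of most hyperplanes, so their hashes differ in only a few bits, and the number of differing bits grows with the angle between them.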