| HN Mirror


by a785236 2473 days ago

I wish the authors wouldn't oversell the privacy claim:

> Github: "The DeepPrivacy GAN never sees any privacy sensitive information, ensuring a fully anonymized image."

> Abstract: "We ensure total anonymization of all faces in an image by generating images exclusively on privacy-safe information."

> Paper: "We propose a novel generator architecture to anonymize faces, which ensures 100% removal of privacy-sensitive information in the original face."

Changing a face anonymizes an image the same way that removing a name anonymizes a dataset -- poorly. This is cool, but it's not anonymization.


lucb1e 2473 days ago

"This is cool, but it's not anonymization." Isn't it?

For clarity, it might be good to establish what I mean by three terms: "identifiable" is either the original, an encrypted version with the key still available, or a hash or Bloom filter (or similar) of low-entropy data such as email addresses or phone numbers; "pseudonymous" is the data replaced with a unique but disconnected value (e.g. a UUID, or the data encrypted with a random key that is then destroyed); and "anonymous" is either no data, or data that has no relation to the original.
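To make the first two categories concrete, here is a minimal Python sketch. It is only an illustration: the helper names (`sha256_hex`, `recover_phone`, `pseudonymize`) and the 7-digit phone format are assumptions, not anything from the paper or the DeepPrivacy code. It shows why a hash of a low-entropy value stays "identifiable" (the whole input space can be enumerated) while a random UUID is "pseudonymous" (stable per person, but carrying no information about the original value).

```python
import hashlib
import uuid
from typing import Dict, Iterable, List, Optional


def sha256_hex(value: str) -> str:
    """Return the hex SHA-256 digest of a string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def recover_phone(target_hash: str, prefix: str, digits: int) -> Optional[str]:
    """Brute-force a hashed phone number by trying every possible suffix.

    With only 10**digits candidates, the hash offers no real protection.
    """
    for n in range(10 ** digits):
        candidate = f"{prefix}{n:0{digits}d}"
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None


def pseudonymize(values: Iterable[str]) -> List[str]:
    """Replace each distinct value with a random UUID.

    The mapping is local to this call and discarded on return, so the
    output cannot be reversed, but equal inputs still map to equal outputs.
    """
    mapping: Dict[str, str] = {}
    result = []
    for value in values:
        if value not in mapping:
            mapping[value] = str(uuid.uuid4())
        result.append(mapping[value])
    return result
```

Note that the pseudonymized output still lets you see that two records belong to the same person, which is exactly what separates it from "anonymous" in the sense above.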

As far as I can tell, this algorithm replaces the data with a random value that has no relation to the original. I understand that if we have a list of HN comment metadata and you remove the usernames ("anonymize"), you can still find me by correlating the time of posting with DNS request logs at the ISP. In the case of pictures, I guess the place is usually identifiable and the time is known, so you can potentially piece together who was there at that time, corroborated by the presence of a certain backpack or shirt.
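The timing correlation described above can be sketched roughly as follows. This is a toy illustration under stated assumptions: the function name `link_by_time`, the `(subscriber, timestamp)` log format, and the 5-second window are all hypothetical, and real ISP logs would look nothing like this simple.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Set, Tuple


def link_by_time(
    post_times: Iterable[datetime],
    dns_log: List[Tuple[str, datetime]],
    window_seconds: int = 5,
) -> Set[str]:
    """Return subscribers who made a DNS lookup near *every* post time.

    Each additional post shrinks the candidate set, which is why removing
    the username alone does not anonymize the metadata.
    """
    window = timedelta(seconds=window_seconds)
    candidates = None
    for post_time in post_times:
        # Subscribers with a lookup within the window around this post.
        near = {sub for sub, ts in dns_log if abs(ts - post_time) <= window}
        candidates = near if candidates is None else candidates & near
    return candidates if candidates is not None else set()
```

With a single post the candidate set may contain many subscribers; intersecting over a handful of posts is usually what narrows it down to one.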

Is that what you mean, or is there something else that makes you say it is either still identifiable or pseudonymized rather than anonymized?


a785236 2473 days ago

No, it isn't.

> ... this algorithm replaces the data with a random value that has no relation to the original.

Based on that sentence, I assume that when you write "the data" you mean "the part of a picture corresponding to a person's face." But removing the face from a picture doesn't necessarily make it particularly difficult to identify the subject if the subject is very familiar to you. It doesn't matter if you've never seen that specific picture, or if you have no additional context like place and time.

Just look at the examples on the GitHub page for proof! The picture of Obama and Trump is clearly recognizable, and at least one of the other Obama photos is easy to recognize. The soccer players are identifiable from their jerseys (Messi is #10 on Barcelona). Jennifer Lawrence was also easy for me to spot.


lucb1e 2473 days ago

> if the subject is very familiar to you.

Fair enough, if you know what someone wears, their exact skin color, build, and perhaps even the place they are in, then sure, blacking out the face (or changing it for that matter) won't help. I guess I agree that this is more common than the authors make it sound (it's indeed not 100% guaranteed absolutely anonymous always ever, as they put it). But I do have to say, this is about as good as blacking out the face completely and a lot less obnoxious.

> The picture of Obama and Trump is clearly recognizable

You sure? If I show this picture (https://snipboard.io/VjwEc1.jpg) to someone in isolation, I'm not sure that they will say it's Obama. Not sure there is a politically correct way of saying this, but there aren't that many people who are known to billions, have that skin tone, and wear a suit, so of course if you ask them "the face was changed, who is this?" they can make a lucky guess of Obama because that's the only guessable possibility.
