by wildmusings 3238 days ago
The details are very important here. Would the proposed ban really affect researchers proving that anonymization schemes don't work, or would it just apply to attempts to reidentify real people in real user data?

It seems reasonable that a company be prohibited from actively trying to ascertain the identity of users who have tried to remain anonymous. The ease of doing it is rather irrelevant. I'm kind of tired of this tech culture meme, that something should be allowed because it is easy. How easy it is to do something is really irrelevant to how legal it should be. As an extreme example, killing a man is rather easy.

EDIT:

Here is the bit from the source document that the blog author is responding to:

>Create a new offence of intentionally or recklessly re-identifying individuals from anonymised or pseudonymised data. Offenders who knowingly handle or process such data will also be guilty of an offence. The maximum penalty would be an unlimited fine.

"intentionally or recklessly re-identifying individuals" seems to limit this to real user data, not researchers evaluating anonymization schemes. As with any law, it is important to see what the eventual proposed legislation looks like, but I don't think there's anything to worry about here for legitimate security research.

4 comments

> Would the proposed ban really affect researchers proving that anonymization schemes don't work, or would it just apply to attempts to reidentify real people in real user data?

There's not a clear line between the two. If a company publishes a list of "anonymized" email addresses, should I be arrested for putting one of the strings into Google to see if it's just an MD5 hash?
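To make that concrete, here is a minimal sketch of how little it takes, assuming the "anonymized" strings are plain unsalted MD5 hashes of lowercased email addresses. The addresses and the helper names `md5_hex` and `reidentify` are made up for illustration; nothing here comes from a real dataset.

```python
import hashlib


def md5_hex(email: str) -> str:
    """Return the hex MD5 digest of a normalized (trimmed, lowercased) email."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def reidentify(published_hashes, candidate_emails):
    """Map each published hash to a candidate email that produces it, if any."""
    # Hash every candidate once, then look the published hashes up in the table.
    lookup = {md5_hex(email): email for email in candidate_emails}
    return {h: lookup[h] for h in published_hashes if h in lookup}


# Hypothetical "anonymized" list, as a company might publish it.
published = [md5_hex("alice@example.com"), md5_hex("carol@example.org")]

# Hypothetical list of addresses the reader already knows or can guess.
candidates = ["alice@example.com", "bob@example.com"]

matches = reidentify(published, candidates)
print(matches)
```

Anyone holding a list of plausible addresses can run the same lookup, which is why it is hard to say where "checking whether the scheme works" ends and "re-identifying" begins.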

> The ease of doing it is rather irrelevant. I'm kind of tired of this tech culture meme, that something should be allowed because it is easy.

The full argument is of the form "X is easy to do and hard to detect, so it would require police state tactics to have any hope of enforcing a law against it". The war on drugs is the classic example for this. Murder isn't; killing someone may be relatively easy, but it's usually obvious when it happens and it's hard to avoid leaving evidence of your involvement.

>The full argument is of the form "X is easy to do and hard to detect, so it would require police state tactics to have any hope of enforcing a law against it".

Plenty of crimes go unsolved in most cases. Littering, for example.

When you do catch an internet marketing company deanonymizing data, you can throw the book at them though. Strong penalties can serve as sufficient discouragement to others even if they are unlikely to get caught.

As far as I understand the GDPR, email hashes wouldn't be "anonymous" data at all; they'd be considered pseudonymous (and therefore still PII).

I mean the problem is that this makes good-willed sites like haveibeenpwned.com illegal in the UK (with criminal sanctions) as they attempt to re-identify data that comes from a breach.

But on the other hand, I don't see why processing PII that comes from a data breach with the intent of de-anonymising it should be legal.

Maybe protections should be in place for security researchers, but how do you distinguish between them and malicious actors?

> I'm kind of tired of this tech culture meme, that something should be allowed because it is easy.

It's not just that it's easy; it's that it can be done merely by thinking in a particular way about information that's public or that was freely given to the person doing it. I'm not sure the fact that the thinking is done mainly with the aid of an algorithm changes the fundamental concept.

It probably shouldn't be illegal to think about things or to process data you've obtained legitimately whether it's easy or hard.

And what of the examples in the article about reidentifying Netflix users from public data, or reidentifying people from Australian census data? These two incidents could no longer, legally, be publicly written about. We'd be left discussing only theoretical applications (i.e. "this is why MD5 is weak" vs "this is how you can deanonymize this real-world complete example"), which simply never has the same impact.

Do we have to start posting these on Pastebin instead of Medium now? Can 3rd parties report them during a security audit?

Even if this has all the good intentions of preventing scummy marketers from scraping data, the execution, if history is any indicator, will likely result in a law that can be used to throw people in jail for reversing an MD5 hash.

Anything that attempts to ascribe intention to code is going to run into a lot of corner cases; see the long history of "copying" programs vs copyright law.

"Knowingly" is similarly vague: are you knowingly running every line of code executing on your machine right now? How would you be sure?

> "Knowingly" is similarly vague: are you knowingly running every line of code executing on your machine right now? How would you be sure?

That's exactly the point. If you perform the act unknowingly, you're innocent of the offence.