Hacker News
by marten-de-vries 3035 days ago
It's interesting to see how the GDPR seems to clash with some popular data models. For example, git.

Rewriting history of a shared branch is disastrous, but it's currently the only way to redact, say, an e-mail address someone committed with a couple of years ago. I'm curious how the various code hosting sites plan to handle that. Perhaps we'll see an extension of the data model that links commits to committer UUIDs, with the actual information being linked to that, making removal easier.
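To make the UUID idea concrete, here is a minimal sketch in Python. It is not how git or any hosting site works today; `Commit`, `IdentityRegistry` and all their methods are hypothetical names invented for illustration. The point is only that the commit stores an opaque ID, so erasing the person's details never touches the commit itself.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Hypothetical commit record: stores an opaque ID, not a name or e-mail."""
    message: str
    committer_id: str


class IdentityRegistry:
    """Hypothetical side table mapping committer UUIDs to personal data."""

    def __init__(self):
        self._identities = {}

    def register(self, name, email):
        # Hand out a random UUID that reveals nothing about the person.
        committer_id = str(uuid.uuid4())
        self._identities[committer_id] = (name, email)
        return committer_id

    def resolve(self, committer_id):
        # Returns (name, email), or None once the identity has been erased.
        return self._identities.get(committer_id)

    def erase(self, committer_id):
        # Erasure only edits the side table; commits stay byte-for-byte the same.
        self._identities.pop(committer_id, None)


registry = IdentityRegistry()
alice_id = registry.register("Alice", "alice@example.com")
commit = Commit(message="Fix typo", committer_id=alice_id)

registry.erase(alice_id)
print(registry.resolve(commit.committer_id))  # None
```

Whether a leftover UUID still counts as personal data is a legal question this sketch does not answer.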

2 comments

Apparently Git is ok by GDPR as data subjects do not have the right to erasure if the information is meant for archiving purposes in the public interest [1].

[1] http://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:320... (Article 17)

> Paragraphs 1 and 2 shall not apply to the extent that processing is necessary:

> [...]

> (d) for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing;

(emphasis mine)

I'd not say redacting a git repository does 'seriously impair' processing for archiving purposes. All the data (with the exception of the redacted e-mail) is still there, after all.

Still, the hashes will have changed, making the repo less useful for current users. But that has nothing to do with archival.
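A rough illustration of why the hashes change, as a simplified Python sketch. Real git commits also contain a tree hash, timestamps and a separate committer line, and `commit_hash` and `build_history` are made-up helper names. The `commit <size>\0` header and the use of SHA-1 do mirror how git hashes objects.

```python
import hashlib


def commit_hash(parent, author, message):
    # Simplified commit body; real git commits carry more fields.
    body = f"parent {parent}\nauthor {author}\n\n{message}\n".encode()
    # Git prefixes each object with its type and size before hashing.
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def build_history(commits):
    # Each commit embeds its parent's hash, so the hashes form a chain.
    parent = ""
    hashes = []
    for author, message in commits:
        parent = commit_hash(parent, author, message)
        hashes.append(parent)
    return hashes


original = build_history([
    ("Alice <alice@example.com>", "Initial commit"),
    ("Bob <bob@example.com>", "Add feature"),
    ("Alice <alice@example.com>", "Fix bug"),
])

# Redact only Bob's e-mail in the second commit.
redacted = build_history([
    ("Alice <alice@example.com>", "Initial commit"),
    ("Bob <redacted>", "Add feature"),
    ("Alice <alice@example.com>", "Fix bug"),
])

for old, new in zip(original, redacted):
    print(old[:10], new[:10], "same" if old == new else "changed")
```

The first commit keeps its hash, but the redacted commit and every commit after it get new ones, which is what breaks existing clones.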

What if the purpose of the archiving is to not only record what was changed but also who changed it?
From the GDPR, recital 45:

> [...] where processing is necessary for the performance of a task carried out in the public interest [...] the processing should have a basis in Union or Member State law.

I don't think that purpose of archiving has a basis in law.

That said, I do remember my law professor calling the 'right to be forgotten' one of the weaker parts of the GDPR, and I'm not an expert, so it's possible I'm missing something.

Thank you for emphasizing that last part. I think you might be right.
And it's also protected by the right of freedom of speech: the entity operating the git server has the right to inform the public of who committed which changes. The GDPR explicitly recognizes "exercising the right of freedom of expression and information", although I'm not sure how European courts would interpret this provision. But for an American entity without a physical presence or assets in Europe, any EU judgment would be quickly quashed by American courts.
And Facebook has the right to 'inform' another company about all data it collected from its users (only in exchange for a nice sum of money!)

Except in the EU, freedom of speech and privacy are both considered human rights, which need to be weighed against each other. Freedom of speech will win when someone uses the GDPR to try to censor e.g. an online news article with some personal facts. But it won't for my Facebook tongue-in-cheek example, and I doubt it will for the redacted committer example either.

How would the judgment be quashed by American courts? No American court has jurisdiction over European courts. For an entity without presence or business in Europe, enforcing a European court decision might be a problem, but that’s a different matter. I’m sure the EU will find a way if the sum is sufficiently high.
Sorry, I was sloppy in saying that the judgment itself would be quashed. What I meant is that any attempt to enforce the judgment would be quashed. Since (by assumption) the defendant doesn't have any assets in Europe to pay the fine, enforcing the judgment would require going after the defendant's assets located in the US. American courts will typically enforce foreign judgments from 'friendly' jurisdictions, but if the judgment is incompatible with American law, American courts will quash any attempt to enforce the judgment in the US.

Of course, Facebook and other large American corporations can be expected to comply with GDPR, since the cost of compliance is much less than the opportunity cost of being excluded from the EU market.

Sure, but these are two entirely different matters. Having an open, unenforced judgment against you might lead to complications, for example if the defendant happens to travel to the EU. It’s unlikely that a minor fine will be enforced by snatching the defendant at the airport, but it could at least legally be an option.

Or the defendant may later open up a German subsidiary or plan on selling to a company with a German subsidiary. Things would get complicated in those cases.

So it’s important to be somewhat precise here - no enforcement doesn’t equal a quashed judgment.

I suspect preserving git history is allowed for the purpose of determining copyright compliance.
There's an exception 'for the establishment, exercise or defence of legal claims', but there are situations where that would not apply. E.g. commits fixing a single spelling mistake are probably not copyrightable.

Also, I doubt you can just keep a copy of all data you ever process, just because it might some day be useful as legal evidence.

Why would you say that? If you can get sued for a piece of code written 30 years ago, then it seems legitimate to me to store legal evidence for at least 30 years. As far as I know there is no time limit to being sued over something.
That makes sense for repository users keeping a private copy.

But I was thinking more about companies like GitHub. If they can hide behind that clause for every single repo they host, the GDPR as a whole becomes useless. Pretty much everything could serve as evidence one day. As far as I know, judges don't like 'hacks' like that.

Additionally, code hosting platforms argue they are service providers and should not be liable for copyright infringement as long as they apply notice and takedown.