Hacker News new | ask | show | jobs
by davidgerard 1615 days ago
Nothing about GDPR is hard ... unless your business model is to abuse your customers' personal data. Then it might be hard.

I routinely see the loudest complainers about the onerous nature of GDPR compliance suddenly get vague or stop posting when you ask for details of precisely what bit is so hard for them in particular. Note lack of those details in this present discussion, for example.

So far, it seems a safe assumption that the excuse makers are abusing personal data, and they know they're abusing personal data.

Perhaps one day a clear exception will show up.

I wrote up a thing here a few years ago with my actual on the ground experience of getting us compliant: https://reddragdiva.dreamwidth.org/606812.html

tl;dr anything that might vaguely constitute personal data, down to Apache logs, must either be in a writable database for redactability, or deleted.

Since then, our legal team - who are not your legal team! - has advised:

* 30 days for operational purposes is fine actually.

* Go feral on anything over 30 days. You need a named person responsible for GDPR redactions.

* If you want to do analytics on those Apache logs, do them quickly and into a form that doesn't contain personal data.

I'm in the UK, which is no longer in the EU, but the GDPR laws still hold here.

3 comments

I have a small business that operates a website. To be absolutely clear, we don't rent or sell personal data in any form and our business model has nothing to do with tracking or profiling anyone.

We retain certain access records that can potentially be used to identify individuals indefinitely. These records have demonstrably helped us to defend against attacks on our infrastructure and to prevent attempted fraud on multiple occasions, sometimes years after the records used were first collected. We include these general purposes for processing but do not disclose exactly how we use these records for these purposes in our privacy policy.

So, are we compliant because there is a demonstrable legitimate interest in keeping these records? Is holding that personal data indefinitely, knowing that it mostly won't be needed, disproportionate and a GDPR violation? I'd love the people who think the GDPR is simple to show me verifiable, authoritative answers to these types of questions, because so far we haven't found any lawyer who can, nor found any information from any relevant regulator that we could point to as a clear indication either way.

1. You'll have to disclose this in your privacy policy

2. You can store identifying data of website accesses etc for at most 30 days without worry

3. Beyond that, you can only store data that's absolutely necessary, e.g. metadata associated with actual purchases and transactions, but not every access.

4. Usually, you'll have to delete that 2 years afterwards, in some exceptional situations up to 30 years are possible

What I'd do: 1) disclose, 2) delete logs after 29 days, 3) copy all logs associated with a customers transaction into a separate storage location, shared by customer, transaction and date, so you can delete it 2 years later.

My response to all of your points is the same: can you cite the authority for those claims please?

For example, no-one processing card payments is going to disclose in any privacy policy exactly how they combine all their signals to determine fraud risk and whether to allow an attempted transaction in real time.

If you're really in business in the EU or associated companies, and not just LARPing in a comment section, you'll have been under these laws for several years now, and should already have contacted your own lawyer on this question.
I've commented about my business interests and the GDPR several times in my more than a decade on HN. You're welcome to scan my comment history if you think I'm LARPing but I have no interest in continuing a discussion with anyone who isn't doing so in good faith.

I addressed your point about taking expert advice in my original comment above: neither lawyers nor regulators have been able to give us a clear answer so far.

I'd believe that you're describing your system's current requirements at a high level. But without exact technical details of how retaining personal information helps you prevent fraud many years later, I don't believe that it is the only way possible.

For example, if the personal information you're talking about is IP addresses, it seems like you could cook those down to non-identifying information pretty quickly - eg zap the last octet. Furthermore, I'd think you would want to cook it down promptly so you can store the current use of the IP block rather than what it might be used for in a few years. (Sidenote: I personally get hassled based on my IP address block way too much, so keep in mind you're harming legitimate customers if this is what you're doing).

Another example - if you're keeping personal details on people who have committed fraud (or not) and referencing that years later, then I'd say that falls squarely in the purpose of the GDPR and you should not be doing that long term.

Or you're doing something else. But without describing exactly what you're doing, you don't make a very compelling case.

Another example - if you're keeping personal details on people who have committed fraud (or not) and referencing that years later, then I'd say that falls squarely in the purpose of the GDPR and you should not be doing that long term.

You're saying that we shouldn't be keeping detailed records of previous attempts to criminally defraud us that are demonstrably useful for identifying and preventing further attempts to criminally defraud us over a long period of time by the same groups of people?

I'm sorry to be blunt but that is not a serious proposition. If anyone thinks the GDPR says otherwise, chapter and verse please.

I agree with you that this is a good example of a GDPR challenge. I think that building a profile of user patterns to protect against fraud & abuse is a perfectly legitimate business interest, even under the GDPR. But I disagree that this is a problem unique to the GDPR -- any means of profiling like this runs the risk of discriminating against protected groups or individuals, and there's been plenty of discussion here about e.g. the use of proxy identifiers for race of background employed by universities to filter applicants.

As with all legislation, there is no clear yes or no answer. If a GDPR watchdog were to evaluate your use of this information, they would primarily care whether you (as a company) are aware of the risks involved in the profiling, whether you have spent the effort to weigh the pros and cons of your approach, and whether you have taken steps to sufficiently anonymize such data without making it useless for your purpose. If you have, I don't think you have to worry about retroactive fines even if the watchdog concludes you're violating the GDPR in some way.

Personally, I'd go even further and say that you don't have to honour data deletion requests from users that have tried to defraud you -- it's unlikely they will do so because they would be required to identify themselves to you, after which you can turn them in to the police, but you can legitimately argue that you need to keep their identity on-file to protect your business. I'm sure the GDPR disagrees with me here, but I'd like to see a watchdog test that case in court.

I'm sure the GDPR disagrees with me here, but I'd like to see a watchdog test that case in court.

I doubt it. The right to erasure has never been absolute even under GDPR. Typical examples are that you can't compel a bank to delete all records of a loan it gave you, nor compel the police to delete a criminal record of your past behaviour, as long as the data is lawfully and properly handled.

Is that what you are specifically doing and describing above, or are you just choosing one implication of my general pondering?

If this is actually what you're doing, let's discuss the specific details of the information flow you're using to make these decisions, rather than talking in terms of strong sweeping generalizations. If you're just picking out a worst case implication of my general statement, that doesn't seem very productive. An example of what I was specifically thinking:

Customer buys something from Vendor. Customer never receives package. Vendor refuses to issue refund because tracking marks delivered. Customer files CC chargeback (I know this is less common in the EU but work with me here).

From the perspective of the Vendor this Customer has defrauded them, or at the very least is an increased risk. From the perspective of the Customer, they've been unjustly judged for circumstances beyond their control.

Can the Vendor retain that judgement on the Customer forever? Can they share it with other Vendors to create an industry blacklist of "problematic" customers? These questions seem squarely within the aim of the GDPR.

In the example I was thinking of the situation was much more clear-cut than that. Again I'm not going to get into real specifics because this is legal stuff and it's just a discussion forum, but consider this broadly similar example.

You provide a service that anyone can sign up for. It costs money.

As a matter of good customer service your usual practice is to allow significant grace periods when money owed is overdue before you actually cut a customer off.

Someone signs up for a real account using the name "Mallory One" and then exploits the "generosity" of your system to avoid paying part of what they owe you. Eventually you cut them off.

Someone then signs up for a real account using the name "Mallory Two" and does the same thing again. Again you eventually cut them off but miss part of the payment you were due.

After this has happened several times over an extended period, it comes to your attention that the only people signing up using names of the form "Mallory (number)" are ripping you off and the person or persons responsible have already cost you thousands in unpaid bills.

You add a rule to your security system that says when anyone creates an account with the name "Mallory (number)" you will immediately block it.

How long are you allowed to remember the pattern "Mallory (name)" in your security system if it can potentially be tied to a specific individual and is therefore personal data but you reasonably believe that person to be responsible for all of that fraud and you reasonably expect that they will continue to defraud you if you don't prevent it?

Is this is a practical way of preventing fraud? Can the person not switch their next account from "Mallory Three" to "Eve Smith", thereby evading your rule?

I understand you've simplified the example here for the sake of discussion, but I think the details inform the situation. Like if you really just want to discriminate on any account named "Mallory _____" then that doesn't seem like personal data to me (even though you've created the rule from "personal data"), but also it doesn't seem particularly effective so there must be more to the story.

For an analogous example, you don't need to keep a permanent record of fraudulent transactions with specific IP addresses of 10.0.37.{23,45,67} to remember that 10.0.37/24 is suspicious.

(Also what about everybody else who legitimately has the first name Mallory ?)

Your case is interesting because it contains a few unusual qualities that businesses generally don't offer, but smaller "nicer" businesses will give more leeway. You could straightforwardly stop giving a large freebie to new users or require a payment method or identification on signup, but it would be nice to figure out where the line is instead of just giving in to such less friendly practices.

righty-ho. You're in the EU, UK or another country under the GDPR? Have you spoken to an actual lawyer about this? Since, as you say, it's your business.

And this has been a regulation you've been required by law to follow for quite a few years now. Have you just not been worrying about it?

You're asking questions that, as other commenters have noted, are plausibly a valid case, but are quite specific to the precise details of what you're doing and how you do it.

Yes, we took real legal advice in good time. We also had some time with a specialist in GDPR compliance and eventually spoke with the regulator in our country. While I'm obviously not going to discuss specifics here, nothing was hidden from any of those experts. And we are still not 100% clear on what is theoretically allowed here.

This is my point. Literally no-one actually knows whether these kinds of edge cases are permitted under the regulations until you're already at the point that someone in a regulator's office has initiated a formal action to find out and potentially penalise you if they're not.

Are there not any other ways to secure your system? Seems a bit off to me that some personal info is all that is needed to try fraud or 'attacks'.
You would be amazed how dumb (and yet still dangerous and disruptive) some people can be. Here is an example without getting too specific.

We once had a series of attacks by the same group. They would sign up for real accounts on our site and then take certain actions that violated our terms of service. Everyone here would agree those violations were serious enough to justify immediate termination and potentially reporting to relevant authorities.

Every time they signed up there were certain patterns in the details they gave that allowed us to recognise them. Those are the kinds of data we intend to keep indefinitely so that our security system can intercept any further attempts (which still happen sometimes) and block them.

IP addresses are the problem. Those are pretty important in trying to find bad actors. They are widely stored for a long time to to be able to identify various forms of abuse. But GDPR considers IP addresses to be personal data, as it is potentially possible that one identifies a unique individual.

For example, most classic forum software stores the IP address of a post submitter indefinitely for anti-abuse reasons. It seems like nobody running such forum software could ever be GDPR compliant. This is despite them never selling this data, trying to mine it for any nefarious purposes or anything like that.

> unless your business model is to abuse your customers' personal data. Then it might be hard.

It's not only your business model, but also the business model of all third-party services you are using on your site.

Also, part of the reason why it's not that hard is that the GDPR is pretty much one of a kind. Imagine the US and maybe some countries in Asia having similar but different implementations of privacy laws, and you having to work with them simultaneously. Or even different laws in each US state (CCPA?). Imagine every country requiring you to store user data only the user's country of origin, thus managing a separate database for each country.

as I said:

> Note lack of those details in this present discussion, for example.

your comments so far have been apocalyptic GDPR fan-fiction, but are notably short on the actual details of what you're doing and how you do it.

>Also, part of the reason why it's not that hard is that the GDPR is pretty much one of a kind. Imagine the US and maybe some countries in Asia having similar but different implementations of privacy laws, and you having to work with them simultaneously.

That's why treaties like Convention 108+[0] exist, to provide a common framework for implementing data protection laws.

[0] https://search.coe.int/cm/Pages/result_details.aspx?ObjectId...

How about a site like Wikipedia?

There is relatively little non-public information about about users kept. The email address, date and time of a few first time actions (like creating the account, verifying email address, going to an edit page, etc), some account settings like language. They do keep some data short term, like track of the ip address a given user signs in with. I'm not certain how long this data is kept for but apparently up to 90 days. This is one of the tools used to check for certain types of abuse by logged in users, like sockpupetry.

The majority of information about a user that the site stores is publically displayed information clearly voluntarily submitted, with implied consent for use, like what pages the user edited and when (public info), information they choose to add to their user page etc.

But never the less, Wikipedia is potentially a pile of GDPR violations, despite pretty clearly not doing the sort of stuff the GDPR is trying to restrict.

Potential violations include:

#1. When an anonymous user edits the site, the edit is publicly attributed to an IP address, which is kept forever. IP Addresses are considered personal data under the GDPR. It is not feasible to only keep these address for 30 days, as all edits need to be attributed to something. It is not at all clear that keeping the IP address indefinitely for this falls on the correct side of the legitimate interest line here. So this could well be a GDPR violation.

#2. What about users requesting deletion? While the project can delete the user-pages, and even rename the account to something non-specific (like renaming away from being the User's real name), it is likely to not be terribly difficult for someone to identify the renamed user, especially if they ever left a signed message on a talk page. Retroactively modifying such past edits, and editing other people's posts that referenced your old username would be too disruptive. But it is not 100% clear that what Wikipedia reasonably can do is enough under the GDPR.

Also, technically speaking as a rule Wikipedia never actually deletes revisions from the database unless technical reasons require it. Deleted articles are no longer visible but are still stored in the DB. Even copyright violations are normally only rev-deleted (can be restored by admins), or Suppressed (can be restored by oversight users). This sort of not-actual deletion might not actually be enough under the GDPR.

#3a. Let's say a user submits a data access request. Wikipedia could provide them with their own email address, profile settings, non-public temporally logged information about the user, like the IP addresses used to log in. They could provide a copy of the user pages, and even the complete history of them, as well as all the edits the user has made, possible even edits that are not currently public. (Like articles that have been deleted, edits that were suppressed, etc).But is that all really enough?

What if other users on some talk page end up talking about this user, without specifying the username (so Wikimedia Foundation cannot easily find the reference), but the prose is sufficiently specific to clearly identify this natural person? The posts could potentially even reference other interesting data about the person, like their religion. While Wikimedia foundation may not have the sort of AI needed to parse the conversation and extract the personal data and associate it with the user in question, by the strict letter of the GDPR it still counts as personal data, and there is no infeasibility exception to disclosing it, so if the user later find this conversation, and then wants the relevant data protection agency to go after Wikipedia, the agency technically could justify issuing a fine here. Is is likely to actually happen? Of course not! But it could if for some reason the relevant people at the agency has a personal grudge against Wikipedia.

#3b. Once again a data access request: What if the user is actually a also public figure. Surely they would also need to be given a copy of their article, and possibly the complete history of the article. But there could well be other articles that reference this person and it is not necessarily feasible to automatically find all of them, especially if any don't explicitly link to the subject's main article. Once again, strictly speaking not providing any personal data contained in those other articles would be a violation of the letter of the GDPR, despite not violating the spirit.

------------------------------

These are only a handful of edge cases I can come up with. In all of these scenarios Wikipedia is being very reasonable, and is not trying to collect any more personal data than needed to run their site, and is being fairly reasonable in trying to balance user's rights to with practical considerations. But they still have multiple places where it could be argued they violate the GDPR nevertheless. They are not an evil company trying to collect personal data and mine it for profit or sell it. But the extremely vagueness about details contained in the GDPR makes it so it is hard really have any idea for sure if they are on the correct side of it or not.

This is true despite the fact that no data protection agency is likely to every try to take action against the Wikimedia Foundation for such violations, simply because in practice their actions are good enough, and trying to attack something like Wikipedia will likely piss off the population that want the agency instead going after Facebook, or companies who have massive data leaks they try to hide.

One might argue that Article 85 might be interpreted to protect Wikipedia under freedom of expression and information. Or perhaps one might say that the data qualifies for processing under the Article 6 1(e) because identifying users modifying a public resource is a necessary part of the task of developing Wikipedia itself, which is a task in the public interest (questionable, but not impossible to try to argue). But let's say it was not actually Wikipedia in question, but some other forum of user provided content with similar limitations, that might not qualify for extended protections for freedom of expression and information, or as a task in the public interest?

Some of these same sort of concerns technically apply to any sort of online public discussion forum, even ones that are very much not trying to collect personal information, beyond the bare minimum they need for accounts and anti-abuse. Even this very forum we are on right now can potentially suffer from the "other people talking about you in an identifiable way", but admins cannot find the conversation to provide it to you for an access request problem.

I think you're over-complicating some of these.

On point 1, I think legitimate interests covers this fairly well, but it would be arguable for sure.

On point 2, the right to erasure is not absolute so the fact that data are not purged from the database is not relevant. Legitimate interests also come in to play here.

On point 3a, the GDPR only mandates that data subjects are given access to personal data, so the WMF need not collate the information to send to them. Surfacing rev-deleted data might be more tricky, I suppose, Wikipedia has policies against posting personal data of other users and such edits will be oversighted where it's brought to admin attention (see WP:OUTING).

On point 3b, again the legal requirement is to give access to the data. Rectification and erasure is also straightforward (edit the page, ask for other edits to be revdeled/oversighted if the violate WP:BLP). Like you say, Article 85 offers wide protection here, too.