Hacker News
by franky47 2062 days ago
> If you track a user, you have to get his consent before doing it

This would mean any server-side analytics (looking at access logs, which include IP address and user-agent) cannot be used for analytics or tracking, since there is no way for a user to give/deny consent to a page that already has logged information on them.

4 comments

You obtain consent and then you log only if consent was provided. You can essentially use two logs, one for technical purposes (under legitimate interests you should be fine logging as long as those logs are only used for technical/debugging/abuse prevention purposes and the data is not kept for longer than necessary) and one for marketing/analytics purposes. You only log to the second one if consent has been given, and you only ever do your analytics on that second log and not the first one.
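
A minimal sketch of that two-log split, assuming the request is already available as a dict and that consent is known as a boolean. The function name `split_log_entry` and the field names are hypothetical, chosen only for illustration:

```python
from datetime import datetime, timezone


def split_log_entry(request, has_consent):
    """Build the entries for the two logs described above.

    Returns (technical_entry, analytics_entry). The analytics entry is
    None when the visitor has not consented, so nothing about them ever
    reaches the analytics log.
    """
    now = datetime.now(timezone.utc).isoformat()

    # Technical log: kept for debugging and abuse prevention only,
    # and meant to be rotated out after a short retention period.
    technical = {
        "time": now,
        "ip": request["ip"],
        "path": request["path"],
        "status": request["status"],
    }

    # Analytics log: written only when consent was given.
    analytics = None
    if has_consent:
        analytics = {
            "time": now,
            "path": request["path"],
            "referer": request.get("referer"),
            "user_agent": request.get("user_agent"),
        }

    return technical, analytics
```

The point of returning two separate entries is that the analytics pipeline can be pointed at the second log only, so it never touches the technical one.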
It's also probably a legitimate interest to retain data for marketing and analytics purposes, so long as that retention meets the same sort of guidelines. Marketing is explicitly highlighted as one of the applicable uses for legitimate interest.
Do you have any specific document or decision in mind?
Recital 47 (https://gdpr-info.eu/recitals/no-47/) explicitly states:

"The processing of personal data for direct marketing purposes may be regarded as carried out for a legitimate interest."

It's also mentioned in Article 21 describing the right to object to processing using legitimate/public interest:

"Where personal data are processed for direct marketing purposes, the data subject shall have the right to object at any time… etc."

The ICO has some useful guidance on when it is an appropriate basis: https://ico.org.uk/for-organisations/guide-to-data-protectio...

One could argue that analytics is not a direct marketing purpose. My understanding is that because analytics can be considered a usual, expected business process, it may rely on legitimate interests as long as it fulfills the requirements (informing users about the processing, the right to opt out, ...). The problem is that the analytics may be advanced analytics. Is retrieving AdWords parameters from a gclid allowed or expected? Is injecting historical behaviour or a marketing segment allowed or expected?
I would like to see more software offer the option of logging just the user's country rather than the IP, and perhaps as generic a user agent as possible (just: is this Chrome, Firefox, Edge, or whatever, and nothing else).

For example, for Nginx, something like:

log_format logfmt '$remote_country - [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_generic_user_agent" "$gzip_ratio"';

That would assume access to a GeoIP database, but it would be helpful.
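
As a rough illustration of the "generic user agent" half of this, here is a sketch that collapses a User-Agent string to a browser family before it is logged. The function name `generic_user_agent` and the token list are assumptions for illustration; real user-agent strings are messy, so this will misclassify some clients and makes no attempt to detect bots:

```python
def generic_user_agent(user_agent: str) -> str:
    """Reduce a User-Agent header to a coarse browser family name."""
    ua = user_agent.lower()

    # Order matters: Edge and Opera also advertise "Chrome", and
    # Chrome also advertises "Safari", so check the most specific first.
    families = (
        ("edg/", "Edge"),      # Chromium-based Edge
        ("edge/", "Edge"),     # legacy Edge
        ("opr/", "Opera"),
        ("firefox", "Firefox"),
        ("chrome", "Chrome"),
        ("safari", "Safari"),
    )
    for token, name in families:
        if token in ua:
            return name
    return "Other"
```

Logging only the returned family name drops the version, OS, and device details that make a full user-agent string useful for fingerprinting.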

$remote_country is an interesting idea: you classify visitors into per-country "buckets", although the buckets would not be of equal size. If you have a single regular visitor from a tiny country, $remote_country could uniquely identify them.
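
One possible way around the tiny-bucket problem, sketched under assumptions: keep an allowlist of countries that have historically produced at least k entries, and log everything else as "other". The helper names and the threshold k are hypothetical, and counting log entries is only a rough stand-in for counting distinct visitors:

```python
from collections import Counter


def build_country_allowlist(history, k):
    """Return the set of country codes seen at least k times in history."""
    counts = Counter(history)
    return {country for country, n in counts.items() if n >= k}


def bucket_country(country, allowlist):
    """Log the country only if its bucket is large enough, else 'other'."""
    return country if country in allowlist else "other"
```

For example, with a history of 50 "DE" entries, 20 "FR" entries and 2 "LI" entries and k=10, a visitor from "LI" would be logged as "other".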

A similar idea would be to have built-in $remote_addr_hash8, $remote_addr_hash16 variables which hash IPv4 and IPv6 addresses down to 8-bit or 16-bit numbers.
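
Nginx has no such built-in variables as far as this thread goes, so the following is only a sketch of what a truncated address hash could compute. The function name `remote_addr_hash`, the choice of SHA-256, and the optional salt are all assumptions:

```python
import hashlib
import ipaddress


def remote_addr_hash(addr: str, bits: int, salt: bytes = b"") -> int:
    """Hash an IPv4 or IPv6 address down to a `bits`-bit bucket number."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")

    # Normalise the textual address to its packed binary form so that
    # different spellings of the same address hash identically.
    packed = ipaddress.ip_address(addr).packed

    digest = hashlib.sha256(salt + packed).digest()

    # Keep only the top `bits` bits of the first 8 digest bytes.
    return int.from_bytes(digest[:8], "big") >> (64 - bits)
```

With 8 bits every visitor lands in one of 256 buckets, and with 16 bits in one of 65536, so many addresses share each value. On a low-traffic site a bucket can still contain only one visitor, the same caveat as with per-country buckets.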

There are hacky ways you can do some forms of anonymization already:

https://www.supertechcrew.com/anonymizing-logs-nginx-apache/

FWIW, Cloudflare can inject a CF-IPCountry header that does that. User-agent parsing is unfortunately more complex, with lots of false readings (not counting bots & crawlers).
The reality is that GDPR is not strongly enforced at the moment. This is not uncommon for Europe and may be a cultural difference from other places.

Those who have the intent to comply and are at least complying in spirit are not at any legal risk. Attitude matters.

And the spirit is obvious: get consent if you enable a third party to uniquely identify a real user, i.e. if it's private data or if you enable correlation across websites.

It's correlating and sharing you need consent for. Don't worry about a server log.

It is not about what you make possible. It's about what you do. Technically any sysadmin can access some information they should not. It's unavoidable.

But that's a long way from commercially exploiting databases of people without their consent.

Honestly they should just ban the sale of personal information. Most internet marketing vendors are not actually in the business of selling personal data.

Now the good ones suffer because of the bad ones. And the bad ones either pretend they have consent or find a way to get it.

I think that overall the GDPR law was good for privacy but a disaster for usability.

It was good for privacy, not because of how strictly it's enforced and not because sites are showing cookie consent banners, but because it made the public more aware of centralization/privacy issues on the internet and made companies a bit more careful with data processing. This law also resulted in many "privacy-friendly" alternatives for various services, which in the end led to a healthier market and improved data decentralization.

If you're tracking an amorphous profile, how do you match the right person to the right data? Do you have to match the data to a unique person?