by joefkelley 2479 days ago
I'm an engineer who has worked on ad systems like this and I'm really struggling to make sense of this article - what hope does a layman have?

Here's my understanding: Google runs real-time bidding ad auctions by sending anonymized profiles to marketers, who bid on those impressions. The anonymous ID used in each auction was the same for every bidder, which is in violation of GDPR. If Google were to send a different ID to each bidder, would it be OK? Is this correct?

Why would it matter that the bidders are able to match up the IDs with each other? Aren't they all receiving the same profile anyway? Wouldn't privacy advocates consider sending the profiles at all to be an issue?
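To make the distinction in the question concrete, here is a minimal sketch contrasting one identifier shared by every bidder with one derived separately for each bidder. It assumes a keyed hash (HMAC-SHA256) over a hypothetical internal user ID; `SECRET_KEY`, `shared_id` and `per_bidder_id` are illustrative names, not Google's actual scheme.

```python
import hashlib
import hmac

# Hypothetical server-side secret held only by the ad exchange operator.
SECRET_KEY = b"example-secret-key"


def shared_id(user_id: str) -> str:
    """One pseudonymous ID for the user, identical for every bidder."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]


def per_bidder_id(user_id: str, bidder: str) -> str:
    """A different pseudonymous ID for each (user, bidder) pair."""
    msg = f"{bidder}:{user_id}".encode()
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]


if __name__ == "__main__":
    for bidder in ("bidder_a", "bidder_b"):
        # The first column is the same for both bidders; the second is not.
        print(bidder, shared_id("user-123"), per_bidder_id("user-123", bidder))
```

With the shared variant, two bidders comparing notes see the same value and can join their records directly; with the per-bidder variant they would need some other signal to link them.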


This is a problem because companies can use this ID to correlate private user data, without anyone's knowledge or consent.

There are companies that specialise in sharing user information. Some of them work by only sharing data with companies that first share data with them (an exchange).

If you got this Google ID, and you had a few other pieces of information about the user, you could share that data with an exchange, indicating that the Google ID is a unique identifier. Then, the exchange would check if it has a matching profile, add the information you provided to that profile, and then return all of the information they have for that profile to you.

So, let's say you're an online retailer, and you have Google IDs for your customers. You probably have some useful and sensitive customer information, like names, emails, addresses, and purchase histories. In order to better target your ads, you could participate in one of these exchanges, so that you can use the information you receive to suggest products that are as relevant as possible to each customer.

To participate, you send all this sensitive information, along with a Google ID, and receive similar information from other retailers, online services, video games, banks, credit card providers, insurers, mortgage brokers, service providers, and more! And now you know what sort of vehicles your customers drive, how much they make, whether they're married, how many kids they have, which websites they browse, etc. So useful! And not only do you get all these juicy private details, but you've also shared your customers' sensitive purchase history with anyone else who is connected to the exchange.
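The exchange mechanism described above can be sketched as a table keyed by the shared ID: each contribution is merged into the pooled profile and the merged result is handed back. This is a toy model under stated assumptions (the `Exchange` class, the attribute names and the "last write wins" merge are all hypothetical), not how any particular exchange works.

```python
from collections import defaultdict


class Exchange:
    """Toy data exchange that pools attributes under a shared identifier."""

    def __init__(self) -> None:
        self.profiles: dict[str, dict[str, str]] = defaultdict(dict)

    def contribute(self, shared_id: str, attributes: dict[str, str]) -> dict[str, str]:
        # Merge the contributor's attributes into the pooled profile
        # (a later value for the same key overwrites an earlier one)...
        profile = self.profiles[shared_id]
        profile.update(attributes)
        # ...then return a copy of everything now known under that ID.
        return dict(profile)


if __name__ == "__main__":
    exchange = Exchange()
    # A retailer contributes what it knows about the customer.
    exchange.contribute("gid-42", {"email": "a@example.com", "last_purchase": "stroller"})
    # An insurer contributes too, and gets the retailer's data back.
    print(exchange.contribute("gid-42", {"vehicle": "minivan"}))
    # {'email': 'a@example.com', 'last_purchase': 'stroller', 'vehicle': 'minivan'}
```

Nothing here needs the contributors to trust each other; the shared ID alone is enough to line their records up.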

Considering that google_gid is valid for you for only 14 days, it seems very unlikely that anyone could build a profile around it.

I have no doubt that if you had a record of my browsing habits for 2-3 days you could readily identify who I am the next time you have my browsing habits for that period of time.

I wouldn't be surprised at all if 2-3 hours of active browsing was enough for this.

Your device fingerprint alone is generally enough to tie your new Google ID to any previous ones.
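A minimal sketch of how a fingerprint could tie a rotated ID to earlier ones, assuming exact matching on a handful of hypothetical browser attributes; real fingerprinting is fuzzier and uses many more signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    # Hypothetical attributes a script could read from the browser.
    user_agent: str
    screen: str
    timezone: str
    fonts: frozenset[str]


def link_ids(observations: list[tuple[str, Fingerprint]]) -> list[set[str]]:
    """Group pseudonymous IDs that were observed with an identical fingerprint.

    Returns one set of IDs per fingerprint that was seen with more than one ID.
    """
    groups: dict[Fingerprint, set[str]] = {}
    for gid, fp in observations:
        groups.setdefault(fp, set()).add(gid)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    mine = Fingerprint("Mozilla/5.0", "2560x1440", "Europe/Berlin", frozenset({"Fira Sans"}))
    other = Fingerprint("Mozilla/5.0", "1920x1080", "Europe/Paris", frozenset())
    # The expired and the fresh ID share a fingerprint, so they end up grouped.
    print(link_ids([("gid-old", mine), ("gid-new", mine), ("gid-x", other)]))
```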
Which is also a typical example of privacy violations in the name of alleged security.

Some newer Linux kernels (post-2016) use random TCP timestamp offsets to prevent clock-skew profiling.

That is a security feature, not the shit big tech is offering here.

But of course the mechanisms in question are suddenly implemented for fraud protection instead of user security. Yeah, bullshit.

It seems likely that the ad network could detect the change in ID if the expiration happens in the middle of a browsing session. And as for user habits, people are probably online at the same time every day, or have habits that cycle weekly.

Also, considering we largely do the same things every week and every day, I suspect a single day would give you at least 50% of a user's identifying data, and a week at least 80%. That leaves a whole week of pretty accurate tracking.
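The mid-session rollover mentioned above could be exploited roughly as follows: if an old ID stops and a new one starts from the same network address within a short gap, treat them as the same browser. This is a sketch; the five-minute `MAX_GAP` threshold and the use of the IP address as the continuity signal are assumptions, and shared IPs (NAT, offices) would produce false links.

```python
from datetime import datetime, timedelta

# Assumed threshold: an ID swap within this gap is treated as one session.
MAX_GAP = timedelta(minutes=5)


def link_rotations(events: list[tuple[datetime, str, str]]) -> set[tuple[str, str]]:
    """events: (timestamp, ip, gid) triples. Returns (old_gid, new_gid) pairs."""
    last_seen: dict[str, tuple[datetime, str]] = {}  # ip -> most recent (timestamp, gid)
    links: set[tuple[str, str]] = set()
    for ts, ip, gid in sorted(events):
        prev = last_seen.get(ip)
        if prev is not None:
            prev_ts, prev_gid = prev
            # A different ID appearing right after the old one from the same IP.
            if gid != prev_gid and ts - prev_ts <= MAX_GAP:
                links.add((prev_gid, gid))
        last_seen[ip] = (ts, gid)
    return links


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 20, 0)
    events = [
        (t0, "203.0.113.7", "gid-A"),
        (t0 + timedelta(minutes=2), "203.0.113.7", "gid-B"),  # rollover mid-session
        (t0 + timedelta(hours=3), "203.0.113.7", "gid-C"),  # too late to link this way
    ]
    print(link_rotations(events))  # {('gid-A', 'gid-B')}
```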

I think you've made a pretty wild claim that 14 days isn't enough time to build a useful profile. Regardless, even if the usefulness of the data over two weeks is questionable, it's still illegal to share the data in this way. You wouldn't be too happy if someone broke into your house and "only" stole a single fork.

Considering how much time many people spend online, and how efficient these profiling systems have become, I wouldn't be surprised if 14 days was plenty of time.

The time of validity and how hard it might be to build a profile are not factors in whether or not this is legal under GDPR. Here's the actual text from GDPR on pseudonyms and synthetic keys of this type [1]:

> The principles of data protection should apply to any information concerning an identified or identifiable natural person. Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person

So PII that has been pseudonymized (mapped to a gid in this case) is protected exactly as if it had not been pseudonymized, provided the pseudonymized data could be mapped back to a natural person by the use of additional data. The pseudonym (gid) is itself also considered PII under GDPR. [1] https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CEL...
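To illustrate why a pseudonym still counts as personal data, here is a minimal sketch in which an email address is swapped for a random token while the lookup table is kept. The class and method names are hypothetical; the point is that the retained table is exactly the kind of "additional information" the quoted text refers to.

```python
import secrets
from typing import Optional


class Pseudonymizer:
    """Replaces an identifier with a random token but keeps the lookup table."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}  # email -> token
        self._reverse: dict[str, str] = {}  # token -> email

    def pseudonym(self, email: str) -> str:
        # Issue a random token once per email and reuse it afterwards.
        if email not in self._forward:
            token = secrets.token_hex(8)
            self._forward[email] = token
            self._reverse[token] = email
        return self._forward[email]

    def reidentify(self, token: str) -> Optional[str]:
        # With this table, the token points straight back to a natural person.
        return self._reverse.get(token)


if __name__ == "__main__":
    p = Pseudonymizer()
    gid = p.pseudonym("alice@example.com")
    print(gid, "->", p.reidentify(gid))
```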

> The pseudonym (gid) is itself considered PII under GDPR.

I know of multiple systems that use a UID but throw away a user’s information, including the UID mapping, when the user leaves. This allows historic metrics to be retained without ever identifying a user who isn’t still using the system.

AFAICT, GUIDs are a grey area.
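The pattern described above (keep per-UID metrics, delete the account-to-UID mapping when the user leaves) might look like this minimal sketch; `UsageMetrics`, `record` and `offboard` are hypothetical names, and a real system would presumably also have to purge the mapping from backups and logs.

```python
import secrets
from collections import Counter


class UsageMetrics:
    """Historic counts keyed by an opaque UID, unlinkable once the user leaves."""

    def __init__(self) -> None:
        self.uid_for_account: dict[str, str] = {}  # account -> opaque UID
        self.events: Counter = Counter()  # UID -> number of recorded events

    def record(self, account: str) -> None:
        uid = self.uid_for_account.get(account)
        if uid is None:
            uid = secrets.token_hex(8)
            self.uid_for_account[account] = uid
        self.events[uid] += 1

    def offboard(self, account: str) -> None:
        # Drop the account-to-UID mapping; the counts stay, but nothing
        # in this object links them back to the person any more.
        self.uid_for_account.pop(account, None)


if __name__ == "__main__":
    m = UsageMetrics()
    m.record("alice")
    m.record("alice")
    m.offboard("alice")
    print(sum(m.events.values()), m.uid_for_account)
```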

I don't mind that at all, so long as that replacement is never shared with other entities.

Thank you: that explanation is the first that makes sense to me.

I get the impression that this structure would require an exchange: retailers would not trust each other otherwise.

Wouldn’t commercial pamphlets, interviews with salespeople, etc., from the exchange be obvious proof of illegal behaviour there? Google’s implementation is imperfect but, for the loophole to work, it would need coordination between several competitors and a third party whose business model is explicitly and almost exclusively about circumventing GDPR.

If I may risk a comparison: Google is like a chemical company selling fertilizer, and the exchange is selling bombs made from raw material bought by other people.

Am I missing the point? Shouldn’t this article be about those exchanges and their clients, not Google?

> To participate, you send all this sensitive information, along with a Google ID

Isn't that also a GDPR violation?

> Why would it matter that the bidders are able to match up the IDs with each other, aren't they all receiving the same profile anyway?

I would guess that yes, they're all receiving – _from Google_ – the "same profile" but they also are collecting additional info that they can then share with each other and, because they can match profiles exactly, they can access each other's info about specific people.

> Wouldn't privacy advocates consider the sending of the profiles at all an issue?

I'd imagine that the profile Google has and shares is by itself fairly anodyne, but I could be (very) wrong about that. The problem seems to be more (if not entirely) that different advertisers can share info using a common profile ID.

I'd imagine that even a single advertiser would be able to perform a similar 'attack' by, e.g. running multiple different campaigns, but I may be misunderstanding exactly what info is being shared. It's possible advertisers are able to match the Google profiles to specific unique identities and thus are sharing much more than just the info they're collecting directly from their ads.

If it's the different advertisers who are going to share info, then why aren't they responsible for their own adherence to GDPR, rather than Google?

I'd imagine they are responsible too, just not alone, and that Google is a much more attractive target for GDPR enforcement, both because they're larger, have more money, and are more visible, and also because they're directly facilitating the "different advertisers" sharing that info.

If Google ceases to provide them the means of readily sharing info, then all of those entities will no longer be violating the GDPR, in this scenario anyway.

As I understand it, Google is responsible for not sharing information that would allow them to violate GDPR. Without explicit user opt-in, that is.

The answer is: RTB is illegal and we're just waiting for the courts to decide on it.

Are they maybe only receiving a partial profile, with info relevant to that ad buy? And by compiling that data with the unique identifier, they can match it with other partial data from other ad buys?
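The accumulation described in the question can be sketched as a union of partial segment lists keyed by the identifier. The bid-request shape here (`user_id` and `segments` keys) is a simplified, hypothetical format, not the actual OpenRTB schema.

```python
from collections import defaultdict


def compile_profiles(bid_requests: list[dict]) -> dict[str, set[str]]:
    """Merge the partial interest segments seen across many bid requests, keyed by user ID."""
    profiles: dict[str, set[str]] = defaultdict(set)
    for request in bid_requests:
        profiles[request["user_id"]].update(request["segments"])
    return dict(profiles)


if __name__ == "__main__":
    requests = [
        {"user_id": "gid-42", "segments": ["parenting"]},
        {"user_id": "gid-42", "segments": ["auto-intender"]},
        {"user_id": "gid-42", "segments": ["mortgage"]},
    ]
    # Each request alone reveals one interest; together they sketch a household.
    print(sorted(compile_profiles(requests)["gid-42"]))  # ['auto-intender', 'mortgage', 'parenting']
```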