by otabdeveloper3 2567 days ago
There's nothing to whistleblow.

Advertising is applied sociology. As such, advertisers want to aggregate large data sets into large segments that are easy to manipulate statistically. (Where the central limit theorem starts working.)

There is no demand for personal data or de-anonymization because that stuff doesn't sell.

The personal data collection is done by Google, Facebook et al. not for advertising purposes. They're collecting it because they view it as a resource and a currency in the future de-anonymized world. (Think China's "social capital" except on a larger scale.)

Source: I've worked in the ad industry for over 15 years.

3 comments

> There is no demand for personal data or de-anonymization because that stuff doesn't sell.

Say what?? I’ve also worked in the ad industry and deanonymized personal data is shared and sold routinely. You speak of statistics and large segments but every advertiser I’ve interacted with is either doing individual-level targeting or striving towards it.

To wit, a few weeks ago there was a discussion here about a method for measuring how fast a browser/machine could compute a SHA-512 hash, and about how this was being used to fingerprint users, even those who had cookies, images, and JavaScript disabled.
Hence why technical solutions have been and always will be the wrong approach. If you are worried about privacy, then work to make tracking illegal. That's what this article is doing.
Was it just a proof of concept demonstration, or was there evidence that this method is being used in the wild by advertisers?
They stated that they were using it in production for that purpose.
Gotcha, thanks. That seems... kinda wild to me. That method has to be super imprecise, and it wastes the resources of everybody involved.
I would like to see more. How can you get their computer to compute the hash? Is it somewhere in the HTTPS interaction?
I don't recall the details, but it had something to do with a would-be security feature in the browser that computes the hash of something before following a link.
> every advertiser I’ve interacted with is either doing individual-level targeting or striving towards it.

Only if they're clueless.

For example: Nike really wants a dataset of "people who buy expensive sneakers for fashion purposes".

This dataset is probably hundreds of millions of anonymous people, and not personal data. If there was a way to get this dataset directly, Nike would do that in a heartbeat.

Unfortunately, as of 2019 the only way to get something like this is by, e.g., cross-referencing credit card purchase info with Twitter browsing logs, which leaks a shitload of sensitive private data.

For ad purposes personal data collection is a bug, not a feature.

There are many sites that have quite accurate PII for large fractions of the American population. Think job boards for instance. One approach that I have seen used successfully is simply buying whatever data such companies are willing to sell.

I’m not necessarily talking about demographics, but rather clickstream data, and anything categorical that you can get your hands on. You join that to your CRM and build a model to predict buyers. A really good predictive dataset for marketing purposes is simply a list of time stamps and names of visited domains. With the right feature engineering, that becomes an excellent proxy for demographic data, current buying appetite, and a whole lot more.

At the end of the day, you don’t even necessarily need to know what the data means as long as it’s predictive. And there are plenty of brokers out there who will let you test their data for free with an agreement to pay if you end up using it at scale. All of this revolves around using PII for matching.
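To make the "join it to your CRM" step concrete, here is a minimal sketch, assuming a normalized email address is the PII used for matching. The function names, the sample rows, and the choice of features (visit counts per domain, late-night visits) are all made up for illustration; a real pipeline would feed rows like these into whatever model you use to predict buyers.

```python
from collections import Counter, defaultdict
from datetime import datetime
import hashlib


def match_key(email: str) -> str:
    """Normalize an email address and hash it so it can serve as a join key."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def clickstream_features(events):
    """Turn (email, ISO timestamp, domain) rows into per-person feature counts."""
    features = defaultdict(Counter)
    for email, timestamp, domain in events:
        key = match_key(email)
        hour = datetime.fromisoformat(timestamp).hour
        features[key]["visits"] += 1
        features[key][f"domain:{domain}"] += 1
        # Hypothetical feature: visits between 22:00 and 06:00.
        features[key]["night_visits"] += 1 if hour >= 22 or hour < 6 else 0
    return features


def join_to_crm(features, crm_rows):
    """Attach the CRM 'bought' label to every person found in both sources."""
    joined = []
    for email, bought in crm_rows:
        key = match_key(email)
        if key in features:
            joined.append((dict(features[key]), bought))
    return joined


# Made-up sample data: a clickstream feed and a CRM export.
events = [
    ("Ann@example.com", "2019-03-01T23:15:00", "sneakerblog.example"),
    ("ann@example.com", "2019-03-02T10:05:00", "sneakerblog.example"),
    ("bob@example.com", "2019-03-02T12:30:00", "news.example"),
]
crm = [("ann@example.com", True), ("carol@example.com", False)]

training_rows = join_to_crm(clickstream_features(events), crm)
print(training_rows)
```

Only people present in both sources survive the join, which is why the match key matters so much: without a shared identifier there is nothing to train on.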

I’m sure what you’re saying is true for some marketers, but there are billions being made on PII-keyed data.

Are you reading what I'm writing? No?

Let me repeat again. PII is a crutch used for matching, because current matching/segmenting technologies are crude.

Advertisers don't want PII. What they want is target audiences with predictive power, which means data sets where the central limit theorem holds sway. (I.e., thousands and millions of people lumped together.)

If advertisers could get at these segments directly without PII, they'd do it in a second.
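A rough way to see the central limit theorem point: the uncertainty on a segment's measured conversion rate shrinks with the square root of the segment size. This is only a sketch, assuming independent yes/no conversions; the 2% rate and the function name are made up for illustration.

```python
import math


def conversion_rate_standard_error(rate: float, segment_size: int) -> float:
    """Standard error of an observed conversion rate, assuming independent
    Bernoulli outcomes: sqrt(p * (1 - p) / n)."""
    return math.sqrt(rate * (1.0 - rate) / segment_size)


# Hypothetical 2% conversion rate, measured on segments of different sizes.
for size in (1, 1_000, 1_000_000):
    se = conversion_rate_standard_error(0.02, size)
    print(f"segment of {size:>9,} people: 2% +/- {se:.4%}")
```

For a single person the error bar is several times larger than the rate itself, so the estimate is basically noise. Across a million people it is a small fraction of the rate, which is where the statistics become useful.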

I think in some circles the meaning of the term "whistle-blowing" has drifted enough that people use it interchangeably with "reveal."

Given your 15 years of experience, what resources do you recommend to HN so that we can learn more? Can you give us a "life of an advertising bit," e.g.: a person visits a website on their phone, that information is accompanied by x data on their phone, goes to the initial ad server, this information is compiled against data from sources a, b, c, etc ...

My question to someone from the ad industry would be whether there are known "intersections" between these anonymised data sets ad-tech is using to sell as much as possible, and companies who buy these data sets to connect them to real identities.

Especially because the web is full of Privacy notices people agree to, and I guess in some of those people actually agree to have their anonymous browsing data connected to their real identities.

"Known"? Probably not.

Like I said, knowing real identities is the last thing on the list of ad tech priorities.

If shadowy entities are collecting "real identities" then it's not for ad purposes.