by proofofstake 3198 days ago
The end result for the customer is that the ad industry will switch to cross-device tracking for everyone.

Until now, you had the option to opt out by setting your browser to block cookies. You were relatively safe.

Now the default will become a net of machine learning algorithms which can track you cross-device without requiring cookies. It is not possible to safeguard against that, unless you completely randomize your online browsing behavior.


Interesting point: If the despicable ad industry "ups their game", it might get harder to evade.

What techniques exist in that space? Anything beyond browser fingerprints ( https://panopticlick.eff.org ) and supercookies?

Any reasonably realistic suggestions for evading tracking in that scenario?

There is probabilistic vs. deterministic cross-device tracking.

Deterministic assigns a unique device identifier to each device and then uses more data to connect device IDs to an individual.
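A minimal sketch of the deterministic approach, assuming the extra data that connects devices is a login email seen on each one. The device IDs, the email addresses, and the `hash_identifier` and `link_devices` helpers are all made up for illustration; real systems may key on other identifiers.

```python
import hashlib
from collections import defaultdict


def hash_identifier(email: str) -> str:
    """Normalize and hash a login identifier.

    Assumption: SHA-256 of the trimmed, lowercased email is used as the key.
    """
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def link_devices(events):
    """Group device IDs by the hashed identifier observed on them."""
    devices_by_person = defaultdict(set)
    for device_id, email in events:
        devices_by_person[hash_identifier(email)].add(device_id)
    return dict(devices_by_person)


# Made-up observations: (device ID, login email seen on that device).
events = [
    ("phone-123", "Alice@example.com"),
    ("laptop-456", "alice@example.com "),
    ("tablet-789", "bob@example.com"),
]

linked = link_devices(events)
for person_key, devices in linked.items():
    print(person_key[:12], sorted(devices))
```

The match is exact: two devices are linked only when the same normalized identifier shows up on both, which is why this variant depends on you logging in somewhere.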

Probabilistic cross-device tracking uses machine learning algorithms to match up devices and identities. For this they can use basically any data that you happen to give them, including behavioral data (you check a website both at home and on your mobile phone in transit, you use the mouse to select text while reading an article, you accidentally gave an application access to your location data and it was resold, etc.). Probabilistic cross-device tracking can work with or without cookies. Of course, ad companies employing these techniques for their customers claim very optimistic accuracy, but the accuracy is at least good enough to provide them with useful tracking data on individuals. This accuracy will go up if you push ad companies into a corner and confront them with 10% (or whatever market share Apple browsers have) of surfers who can't be cookied (as opposed to a small fringe of users who block cookies and have done so for years).
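To make the idea concrete, here is a toy sketch of probabilistic matching. It is not how any particular ad company does it: a plain Jaccard similarity over behavioral features stands in for the trained model, and the device names, features, and 0.5 threshold are all assumptions.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of features two devices have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_matches(profiles, threshold=0.5):
    """Return device pairs whose behavioral overlap meets the threshold."""
    matches = []
    for (dev_a, feats_a), (dev_b, feats_b) in combinations(sorted(profiles.items()), 2):
        score = jaccard(feats_a, feats_b)
        if score >= threshold:
            matches.append((dev_a, dev_b, score))
    return matches


# Made-up behavioral features per device: sites visited and network seen.
profiles = {
    "laptop": {"news.example", "forum.example", "shop.example", "ip:home"},
    "phone": {"news.example", "forum.example", "shop.example", "ip:mobile"},
    "work-pc": {"intranet.example", "ip:office"},
}

matches = candidate_matches(profiles)
for dev_a, dev_b, score in matches:
    print(f"{dev_a} <-> {dev_b}: {score:.2f}")
```

No cookie or login is involved here: the laptop and the phone get paired purely because they visit the same sites, which is why blocking cookies alone does not help.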

When cookies got banned/required permission in Europe, European websites just started bugging everyone to accept cookies before they could read what they came for. While everyone already had the option to only allow cookies from trusted domains, now everybody gets pestered with giant pop-ups. Companies also switched to server-side analytics/tracking, or started requiring log-in to track you.

If cookies were accepted, one could just join a cookie swap program to mess with the advertisers. Probabilistic cross-device tracking is very hard to avoid, since disabling JavaScript and using a generic browser like Tor Browser is itself an informative fingerprint. And you can't realistically change your browsing habits, which exposes you to gender and age identification (they need this to tell apart individuals in a household sharing a single IP).

Probabilistic cross-device tracking uses machine learning algorithms to match up devices and identities

Is there any application of machine learning that isn't evil? Genuine question. It seems to be exclusively used to exploit people.

I think you're running up against the fact that ML is a tool that can solve classification problems, which just like search in years past, has the potential to be very political.

We've mostly gotten used to a world with search -- in the past it was a big political controversy that powerful groups could scan huge volumes of data for advertising, tracking, or criminal activity. There were jokes in chatrooms about using the word bomb because the FBI would pick up on it. Or that your search queries and web history would be scanned for keywords and used for ads.

Now there's this new tool that allows people/things to be grouped and classified very precisely by imprecise rules. Combine the hype about the different problems this new tool will help people solve and the inexperience that people have with the ethical ramifications of the things they build, with the crudeness of the initial implementations and it's easy to get the sense that ML will be a net bad for society -- just like search.

Sure there is! ML helps with medical diagnosis, cures, and treatment, ML helps automate services that are unavailable to third world countries, ML helps catch criminals, money laundering, and fraudsters, ML saves wasteful energy consumption, ML improves customer support, ML helps optimize resources to focus on those in immediate need, ML supports science like high-energy physics, ML creates a new generation of digital artists.

All good uses of ML, but maybe not so sexy.

I bet the majority of ML is used neutrally: to add business value to a company. Depending on your view of capitalism of course.

ML helps with medical diagnosis, cures, and treatment

Sure that's what IBM says with Watson but that's just exploiting people too.

An example of what I was aiming at is to use ML methods like deep nets to detect early-onset diabetic retinopathy. Diabetic retinopathy is a leading cause of blindness and at least 90% of new cases could be reduced with early detection and proper treatment. [1]

Especially in third world countries where there is no eye doctor in sight, these cheap automated methods can be deployed on a mobile phone and achieve near-human expert level accuracy. [2]

Then organizations like Watsi can use data science and predictive modeling to reduce fraud and get both detection and treatments to those most in need. [3]

[1] https://en.wikipedia.org/wiki/Diabetic_retinopathy

[2] http://blog.kaggle.com/2015/09/09/diabetic-retinopathy-winne...

[3] https://dssg.uchicago.edu/

About IBM Watson, the entire thing is unfortunate, I completely agree. Their marketing department oversold IBM Watson for cancer treatment. But I know that a lot of great research scientists worked at IBM on Watson. What they were doing was legit advancing machine learning too. That's the thing about marketing: if IBM were to deploy 10,000 phones with a neural net to improve early detection of disease in a third world country, I probably wouldn't even hear about it, and I work with ML. But for IBM Watson, everybody and their grandmother goes: That's the AI that won at Jeopardy. It's a thin line between being majorly successful in marketing and damaging your reputation and goodwill (or in this case: that of the entire ML industry).

Learn a few new languages and browse on each device while thinking in a different one. It could be enough to fool some algorithms.