by blasterford 5224 days ago
Even stranger to see no mention that cookies aren't needed at all to track users successfully.

Check your browser at http://panopticlick.eff.org/

Basically, using the user agent, plugins, time zone, language, screen size, etc., you can fingerprint a user pretty reliably without using cookies.

Cookies are just an easy way to track users client side, but if there is an 'assault' on cookies, then people will just start relying more on server side tracking of users instead.

Disabling 3rd party cookies etc really achieves nothing.

Also, there are numerous methods you can use to store "cookies" in the browser these days (the localStorage API, HTTP cookies, Flash, the cache, etc.).

If you really don't want to be tracked for some reason, disable JavaScript, clear out your user agent, and use Tor.

8 comments

I would love to see a plugin that mimics the statistically normal setup. (all 3 of the browsers I currently have open read in as unique)
Or something that randomizes your browser fingerprint values each time you restart.
I try to mimic TorBrowser's fingerprint as far as possible.
Incognito mode does that fairly successfully. Try the Panopticlick link in incognito, you'll see what I mean.
"only one in 1,024,434 browsers have the same fingerprint as yours."

fairly successful? eh.

I think that you are that one, as well. Do the test in a normal window and then in an incognito window. Each "Browser Characteristic" had the same result in both.
Wow, there's a lot of unique information there that I didn't expect, such as installed system fonts. Hashing this information could generate a fairly reliable primary key.
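
A minimal sketch of that hashing idea, assuming the attributes have already been collected into a dict. The attribute names and values here are hypothetical and are not what Panopticlick actually records. It also shows how brittle a plain hash is: one changed field gives an unrelated key.

```python
import hashlib
import json


def fingerprint(attrs: dict) -> str:
    """Return a stable hex digest for a dict of browser attributes."""
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
browser = {
    "user_agent": "ExampleBrowser/1.0",
    "timezone_offset_minutes": -480,
    "language": "en-US",
    "screen": [1440, 900, 24],
    "plugins": ["ExamplePlugin 1.2", "OtherPlugin 3.4"],
}

key = fingerprint(browser)
# A minor browser update changes one field and therefore the whole key.
changed = fingerprint({**browser, "user_agent": "ExampleBrowser/1.1"})
print(key != changed)
```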
Maybe, but there would have to also be passes through the data afterward to link identities. Hashing is dangerous since browsers are living, dynamic beasts. When someone updates their browser, their useragent changes, and you'll want to keep their new identity as an extension of their old one. Not to mention that people use multiple browsers. So there's going to be a vital step of "linking the new identity to old ones" which can happen on a different thread more dedicated -- but you'll need to keep data. You'll probably truncate ultralarge fields and then GZIP them or so, rather than just hashing them.

One interesting thought: how much space would you need to pull this off? Chromium generates 12 KB of data which can gzip to 3 KB, Firefox generates 5 KB of data which can gzip to a little over 1 KB. Truncate-then-gzip could be used to keep perhaps 0 - 4 KB per person. Assume that your average user uses ~2 KB. That's still rather a lot, when compared with what you can do with counters -- 8 bytes or so to store. If you wanted to keep your database under 2 GB, you could only handle a million people, not hundreds of millions. So it would really be a big distributed project to link identities as they evolve over time. I imagine that's one huge factor in using tracking cookies; it's lazy for scaling.
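
The capacity arithmetic is easy to check. A rough sketch, assuming binary units and the ~2 KB per user average from above; `pack` and `users_that_fit` are hypothetical helper names. At that size a 2 GB budget holds about a million users and a 2 TB budget about a billion.

```python
import gzip

KB, GB, TB = 1024, 1024**3, 1024**4


def pack(field: str, max_chars: int = 4096) -> bytes:
    """Truncate an oversized field, then gzip it (truncate-then-gzip)."""
    return gzip.compress(field[:max_chars].encode("utf-8"))


def users_that_fit(budget_bytes: int, bytes_per_user: int) -> int:
    """How many users fit in a storage budget at a given average size."""
    return budget_bytes // bytes_per_user


per_user = 2 * KB  # assumed average after truncate-then-gzip
print(users_that_fit(2 * GB, per_user))  # 1048576
print(users_that_fit(2 * TB, per_user))  # 1073741824
```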

It reminds me of Latanya Sweeney's work from 2000, which used 1990 census data to show that 87% of the US population can be uniquely identified by just their gender, ZIP code, and full date of birth.

An interesting project might be to create a database having a table with the useragent hash as the primary key, and associate each identity in the user table to a number of these useragent hashes.
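
One way that table layout might look, sketched with Python's built-in sqlite3 module. The table and column names are made up for illustration, and a real system would want more than a bare hash-to-user mapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE fingerprints (
    hash    TEXT PRIMARY KEY,                       -- fingerprint hash
    user_id INTEGER NOT NULL REFERENCES users(id)   -- identity it belongs to
);
""")


def link(conn, user_id, fp_hash):
    """Associate a fingerprint hash with a user, creating the user if needed."""
    conn.execute("INSERT OR IGNORE INTO users (id) VALUES (?)", (user_id,))
    conn.execute(
        "INSERT OR REPLACE INTO fingerprints (hash, user_id) VALUES (?, ?)",
        (fp_hash, user_id),
    )


def lookup(conn, fp_hash):
    """Return the user id for a fingerprint hash, or None if unseen."""
    row = conn.execute(
        "SELECT user_id FROM fingerprints WHERE hash = ?", (fp_hash,)
    ).fetchone()
    return row[0] if row else None


# One user seen under two different fingerprints.
link(conn, 1, "abcd")
link(conn, 1, "efgh")
print(lookup(conn, "efgh"))
```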

You could do much better than gzip with a custom compression scheme like the Mailinator guy talked about recently.

http://news.ycombinator.com/item?id=3617074

Not mentioning names, but I used to be involved in a project that used this method to track users. From my observations, it was very useful as a filter for further analysis. It's not perfect, but really quite good.
Correct me if I'm wrong on this:

Those may be unique for your browser right now, but if you were to update your fonts or your plugins, that would generate a completely new user and all their information about you would be lost. Same as if you delete your cookies, but I bet it happens more frequently.

Even if you had this 'lossy' tracking method for users, you can still deduce that the 'new' user generated by a browser is a potential match of the old user agent info with a certain probability.
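
A toy version of that "potential match" idea: score how much of the old attribute set survives in the new one. This is an unweighted similarity in the range 0 to 1, not a calibrated probability, and the field names and values are invented for the example.

```python
def similarity(old: dict, new: dict) -> float:
    """Average per-field agreement between two attribute dicts (0.0 to 1.0)."""
    keys = set(old) | set(new)
    if not keys:
        return 0.0
    score = 0.0
    for k in keys:
        a, b = old.get(k), new.get(k)
        if isinstance(a, (list, set, tuple)) and isinstance(b, (list, set, tuple)):
            # List-like fields (fonts, plugins): Jaccard overlap of the sets.
            sa, sb = set(a), set(b)
            score += len(sa & sb) / len(sa | sb) if (sa | sb) else 1.0
        elif a == b:
            # Scalar fields: exact match or nothing.
            score += 1.0
    return score / len(keys)


# Hypothetical before/after snapshots: browser updated, one font added.
old = {"user_agent": "X/1.0", "timezone": -480, "fonts": ["A", "B", "C"]}
new = {"user_agent": "X/1.1", "timezone": -480, "fonts": ["A", "B", "C", "D"]}
print(similarity(old, new))
```

A tracker would then treat anything above some threshold as "probably the same user"; where to put that threshold is a separate tuning problem.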

More than likely, only the browser version will change. For larger updates, would-be-broken plugins would disappear or see a newer version. It would be a ton of effort to track users this way, but I think it's within the realm of reason for those with enough incentive (NSA, maybe advertising companies)

It doesn't need to be easy to implement to become easy to use. If someone implements this in a way that can be packaged up and included on sites with a single line of code, it becomes trivial to apply to any site that wants it.
That's true, it would. In that case you'd need alternative methods to carry the identity across hash changes. Companies participating in this tracking could use their legitimate cookies, or even just login events, to pair up one hash with another.

Likely this could be done in a way that doesn't violate any terms of service or data disclosure promises. After all, pushing out "browser fingerprint 'abcd' and 'efgh' are the same person" isn't disclosing information that most people would realize they're trusting someone with.
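
Merging "these two fingerprints are the same person" assertions is a textbook union-find problem. A small sketch, with a hypothetical class name and the example hashes from above:

```python
class IdentityLinker:
    """Union-find over fingerprint hashes: linked hashes share one root."""

    def __init__(self):
        self._parent = {}

    def find(self, fp):
        """Return the representative hash for fp's group."""
        self._parent.setdefault(fp, fp)
        root = fp
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path at the root.
        while self._parent[fp] != root:
            self._parent[fp], fp = root, self._parent[fp]
        return root

    def link(self, a, b):
        """Record that hashes a and b belong to the same person."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self._parent[rb] = ra

    def same_person(self, a, b):
        return self.find(a) == self.find(b)


linker = IdentityLinker()
linker.link("abcd", "efgh")  # e.g. both seen with the same login
linker.link("efgh", "ijkl")
print(linker.same_person("abcd", "ijkl"))
```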

Even so, for someone like Google with their scale, even identifying users for a short amount of time would work better than not identifying them at all.
What really surprised me is that running it in incognito mode (in Chrome) made absolutely no difference to my results.

I know that incognito mode is geared towards not leaving a trace on the user's computer rather than being anonymous to the server, but I guess I assumed that with the plugins disabled they wouldn't be visible to the server.

As long as your browser leaks the list of available plugins and fonts, you're not incognito.

An alternative I'd like to see is a standard somewhat fixed small set of plugins and fonts.

I have yet to find a network that uses fingerprinting in that way. The use of ETags in the wild has been seen (KISSMetrics), but nobody has (as yet) had to resort to fingerprinting from the client side since effective blocking of traditional methods isn't pervasive.
I wonder if there is a way to block presentation of the list of plugins to the server.

AFAIK it's not very useful information and certainly removes one of the bigger unique factors.

Panopticlick asked me to enable Java, which I only do for trusted sites. How much identifying information can be gathered with just Javascript to determine how unique my browser is?
I believe Java is only used for the font list, which falls back to Flash if you say no.

User Agent provides quite a lot of identifying information; OS/OS version, browser version. Panopticlick breaks this down for you, one in every 186,062 data points they have has my User Agent. This provides 17.5 bits of identifying entropy (log2 of 186062).

They mention they have a total database size of 2,046,684, which requires 20.96 bits of identifying information. So to answer your question about how much identifying information you can get from Javascript, a lot.
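
Those bit counts are just base-2 logarithms of the "one in N" figures. A quick check with the numbers quoted above (the function name `bits` is made up):

```python
import math


def bits(one_in_n: float) -> float:
    """Identifying information, in bits, of a value shared by 1 in N browsers."""
    return math.log2(one_in_n)


ua_bits = bits(186_062)      # User Agent alone: about 17.5 bits
db_bits = bits(2_046_684)    # enough to single out one entry: about 20.96 bits
print(f"{ua_bits:.2f} of {db_bits:.2f} bits")
```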

You can then also get Flash Version, Time Zones, Browser Plugins, IP Address, Screen Info.

Pardon me for doubting the EFF's claim. Just for grins, I took my stock laptop (as handed out by my company, so I _know_ they're identical), fired up a completely unmodified Safari (not my normal browser, hence nothing installed), and it still claims I'm unique.

At which point the word BS comes to mind.

But just for grins, I repeated the test with a Chromebook fresh out of the box, and of course it's flagged "uniquely identifiable".

I'm not saying the underlying claim - browser characteristics can be used to track you - is bogus. I am saying that I think that site is intentionally exaggerating for effect. Or, more realistically, that while they can extract 20+ bits of info from those strings, the values in that 20+ bit domain are far from uniformly distributed.

Why is it implausible that either of those systems has a unique fingerprint among all those that have run the Panopticlick tool?
Because that means that nobody with a stock chromebook and nobody with a stock laptop from my employer (of which there are many, let's put it that way) has ever visited the panopticlick site.

But just because, I tried two more chromebooks (same model), both in guest mode, both stock configuration - and they're both flagged as "unique" too.

Maybe I'm just a victim of a really long update cycle of their database.

(Addendum: I went back with my original laptop, all cookies cleared, and it's indeed not considered unique any more. So maybe I really just saw some lag in updating their DB)

(Addendum 2: Just to clarify, I never doubted that you can be uniquely identified. But the "unique" part was wrong for my sample. )

Even if you have the same system fonts and plugins installed, the order in which they are reported may be stable on one system but differ on another (due to filesystem inode layout). The EFF's Panopticlick FAQ [1] suggests that Flash and Java plugins should alphabetize the font lists reported from their APIs to reduce variation.

[1] https://panopticlick.eff.org/faq.php
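
The alphabetizing fix amounts to canonicalizing the list before it is reported. A sketch of the idea in Python, not the actual Flash or Java API; the function name is hypothetical:

```python
def canonical_fonts(fonts):
    """Deduplicate and sort a font list so enumeration order carries no signal."""
    # Sort case-insensitively, breaking ties on the raw name so the
    # result is fully deterministic.
    return sorted(set(fonts), key=lambda f: (f.casefold(), f))


# Two machines with the same fonts, enumerated in different orders.
machine_a = ["Verdana", "Arial", "Courier"]
machine_b = ["Courier", "Verdana", "Arial"]
print(canonical_fonts(machine_a) == canonical_fonts(machine_b))  # True
```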

Wait, posting factual info gets you downmodded? Go HN, I guess.
He's probably being downvoted (not by me) because he's misunderstood the tool: his signature is unique among all browsers that have visited that page, not every browser in existence.