by Gare 1975 days ago
They load the image URL and observe the loading time. If it's fetched quickly, they know it came from the cache. The server (controlled by the advertisers) can intentionally delay its responses to those image requests, which makes detection reliable.
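Here is a rough sketch of the decision the tracking script would make after timing the load. It is in Python for brevity; in a page it would be JavaScript. The 500 ms delay and the half-delay cut-off are assumptions chosen for illustration, not values any real tracker is known to use:

```python
# Hypothetical decision logic for timing-based cache detection.
# SERVER_DELAY_MS and the half-delay threshold are illustrative assumptions.
SERVER_DELAY_MS = 500.0  # delay the tracker's server adds to every response


def looks_cached(load_ms: float, server_delay_ms: float = SERVER_DELAY_MS) -> bool:
    """Return True if a load finished too fast to have waited on the server.

    A real fetch takes at least server_delay_ms, while a cache hit skips the
    server entirely. Splitting at half the delay leaves margin for jitter.
    """
    return load_ms < server_delay_ms / 2


# In a page, load_ms would be measured with performance.now() just before
# setting img.src and again in the image's onload handler.
for ms in (3.2, 48.0, 512.7, 901.4):
    print(f"{ms:7.1f} ms -> {'cache hit' if looks_cached(ms) else 'network fetch'}")
```

Because the server controls the delay, it can make the gap between the two cases as wide as it likes. That is what makes the threshold reliable.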

I don't see how that helps you persist a tracking ID.

If you generate a random URL, you'll always get a cache miss.

If you use a static URL, you'll know if you have a new session or not, but that doesn't tell you what the tracking ID was.

The only thing I can imagine is the server serving several images, /byte1.png, /byte2.png, etc., each X by 1 pixels, encoding a random value in the dimensions, assuming those are available to JavaScript.
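A minimal sketch of that dimension encoding follows. It assumes one image per ID byte and that the client can read each image's width, for example via img.naturalWidth. The function names are hypothetical:

```python
import secrets


def id_to_widths(tracking_id: bytes) -> list[int]:
    """Width in pixels to serve for /byte1.png, /byte2.png, ... (each N x 1).

    A PNG cannot be zero pixels wide, so each byte value b is served as width b + 1.
    """
    return [b + 1 for b in tracking_id]


def widths_to_id(widths: list[int]) -> bytes:
    """Recover the ID from the widths the client reads back."""
    return bytes(w - 1 for w in widths)


tracking_id = secrets.token_bytes(4)  # random 32-bit ID minted by the server
widths = id_to_widths(tracking_id)    # one width per /byteN.png
assert widths_to_id(widths) == tracking_id
```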

But if you encode the tracking ID in the image somehow, you don't care much whether it was cached or not; it's inherently persistent. It'd mainly be useful if you're trying to reconstruct a supercookie.

> If you use a static URL, you'll know if you have a new session or not, but that doesn't tell you what the tracking ID was.

As Mozilla have said:

> "In the case of Firefox’s image cache, a tracker can create a supercookie by “encoding” an identifier for the user in a cached image on one website, and then “retrieving” that identifier on a different website by embedding the same image."

The identifier is encoded into the image itself on a fresh fetch of the static URL. JS can then extract it, because it can read the pixel data and each pixel's RGBA channel values.
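On the client, this would mean drawing the image onto a canvas and reading getImageData(...).data. That array holds one byte per channel, in RGBA order, row by row.

Below is a minimal Python sketch of just the decoding step. It assumes the server stored the ID in the R, G and B channels of fully opaque pixels, because canvases may store colours with premultiplied alpha, so semi-transparent pixels may not round-trip exactly. It also assumes the browser doesn't colour-correct the image. decode_id_from_rgba is a hypothetical name:

```python
def decode_id_from_rgba(data: bytes, id_len: int) -> bytes:
    """Recover an ID stored in the R, G and B channels of consecutive pixels.

    `data` is laid out like ImageData.data from getImageData(): one byte per
    channel, RGBA order, pixels left to right and top to bottom. Alpha is
    skipped because this sketch assumes every pixel is opaque.
    """
    rgb = bytearray()
    for i in range(0, len(data), 4):
        rgb += data[i:i + 3]  # keep R, G, B; drop A
    if len(rgb) < id_len:
        raise ValueError("image too small to hold an ID of that length")
    return bytes(rgb[:id_len])


# Two opaque pixels carrying the 6-byte ID 01 02 03 04 05 06.
pixels = bytes([1, 2, 3, 255, 4, 5, 6, 255])
print(decode_id_from_rgba(pixels, 6).hex())  # 010203040506
```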

When a cache hit is detected, you know you have an identifier that correlates to user history.

Assuming JS can retrieve pixel data, you could have the server generate unique images and use the RGB values as a unique ID. The unique image would be cached, and you don't need to worry about whether it's in the cache or not.

If you have to hit the server on that static URL, you write a request handler that will always give you back a new image with a new ID encoded in the pixels. Think of it like dynamic page generation on the server side, but for an image instead. Every time you hit the same URL you get a different image.
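Such a handler might look roughly like the sketch below, which uses only the Python standard library to emit an N x 1 RGBA PNG. The six-byte ID, the three-bytes-per-pixel layout, the handler name and the one-year Cache-Control lifetime are all assumptions for illustration:

```python
import secrets
import struct
import zlib
from http.server import BaseHTTPRequestHandler


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def make_id_png(tracking_id: bytes) -> bytes:
    """Build an N x 1 8-bit RGBA PNG whose RGB channels spell out tracking_id."""
    padded = tracking_id + b"\x00" * (-len(tracking_id) % 3)  # whole pixels only
    width = len(padded) // 3
    if width == 0:
        raise ValueError("tracking_id must not be empty")
    scanline = bytearray(b"\x00")  # filter type 0 (None) for the single row
    for i in range(0, len(padded), 3):
        scanline += padded[i:i + 3] + b"\xff"  # opaque alpha
    ihdr = struct.pack(">IIBBBBB", width, 1, 8, 6, 0, 0, 0)  # colour type 6 = RGBA
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(bytes(scanline)))
            + png_chunk(b"IEND", b""))


class TrackingImageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: every request that reaches it gets a new ID."""

    def do_GET(self):
        body = make_id_png(secrets.token_bytes(6))
        self.send_response(200)
        self.send_header("Content-Type", "image/png")
        self.send_header("Content-Length", str(len(body)))
        # Ask the browser to keep this exact image for a year, so later page
        # loads reuse the cached copy (and its ID) instead of asking again.
        self.send_header("Cache-Control", "public, max-age=31536000")
        self.end_headers()
        self.wfile.write(body)


# To try it locally (blocks until interrupted):
#   from http.server import HTTPServer
#   HTTPServer(("127.0.0.1", 8000), TrackingImageHandler).serve_forever()
```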

On the client you can decode that ID and use it throughout your code, in network requests, etc., to track user activity.

If the image is already cached you just decode the ID and use it as described above. All the browser cares about is associating a URL with a resource: it doesn't know or care that the resource in question changes every time it's asked for.
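A toy model of that point follows, with a hypothetical server and a cache keyed only by URL. It ignores expiry, revalidation and any per-site cache partitioning a real browser might do:

```python
import itertools


class ToyTrackerServer:
    """Stand-in for the tracker: each request returns an image with a new ID."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)

    def fetch(self, url: str) -> str:
        return f"image carrying ID {next(self._next_id)}"


class ToyBrowserCache:
    """A cache keyed only by URL; it never looks inside the response body."""

    def __init__(self, server: ToyTrackerServer) -> None:
        self._server = server
        self._store: dict[str, str] = {}

    def get(self, url: str) -> str:
        if url not in self._store:  # miss: go to the network
            self._store[url] = self._server.fetch(url)
        return self._store[url]  # hit: reuse whatever was stored


server = ToyTrackerServer()
url = "https://tracker.example/id.png"  # hypothetical static URL

browser_a = ToyBrowserCache(server)
print(browser_a.get(url))  # image carrying ID 1 (first site embedding it)
print(browser_a.get(url))  # image carrying ID 1 (another site, cache hit)

browser_b = ToyBrowserCache(server)
print(browser_b.get(url))  # image carrying ID 2 (a different user)
```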

Also, the client code literally doesn't need to care whether the ID is from an image in cache or an image returned from the server.

The server can simply tie all activity for a given ID together on the back end.
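On the back end, that can be as simple as grouping logged requests by the decoded ID. The log format and values below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical request log: (tracking ID decoded on the client, site, path).
requests = [
    ("a1b2c3", "news.example", "/article/42"),
    ("ffee01", "shop.example", "/cart"),
    ("a1b2c3", "shop.example", "/product/7"),
    ("a1b2c3", "forum.example", "/thread/9"),
]

# Collect every (site, path) seen for each ID, in arrival order.
activity: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
for tracking_id, site, path in requests:
    activity[tracking_id].append((site, path))

for tracking_id, visits in activity.items():
    print(tracking_id, visits)
```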

This is one way of doing it: there are probably others. I'm certainly no expert.

With some forms of caching it's much simpler: the browser revalidates by sending If-None-Match (carrying the cached ETag) or If-Modified-Since. If the cached resource is still valid, the server is supposed to return 304 Not Modified, which saves re-sending the body.
But from JavaScript I don’t think you can see that. You just get the end result of the image being served to you. You have to infer it from timing.
https://megous.com/dl/tmp/705dc9a2477d1f95.png
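For reference, here is a simplified sketch of the server side of the 304 revalidation described above. It follows the HTTP rules that If-None-Match takes precedence over If-Modified-Since and uses weak ETag comparison. The function name is hypothetical, and the sketch doesn't handle every edge case, such as commas inside quoted ETags:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(headers: dict[str, str], etag: str,
                       last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, else 200.

    `headers` is assumed to use canonical header names. If-None-Match is
    checked first and, when present, If-Modified-Since is ignored.
    """
    inm = headers.get("If-None-Match")
    if inm is not None:
        if inm.strip() == "*":
            return 304
        # Weak comparison: W/"x" and "x" count as the same validator.
        sent = {tag.strip().removeprefix("W/") for tag in inm.split(",")}
        return 304 if etag.removeprefix("W/") in sent else 200

    ims = headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return 200  # unparseable date: just send the full response
        # Treat naive datetimes as UTC so aware and naive values compare.
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if last_modified.tzinfo is None:
            last_modified = last_modified.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200
```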

For cross-origin, you'd add CORS.