by gpm 1833 days ago
The website is really a third party here. The browser is choosing to track users' browser history and report a summary statistic on it to anyone who asks, and there's nothing the website can do about that.

Chrome has promised to listen if websites say they don't want to be included in the browser history that statistic is calculated on, but it's all client side. There is nothing the website can actually do except request not to be included.
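
For reference, the request in question is an HTTP response header. During the FLoC origin trial the documented opt-out was `Permissions-Policy: interest-cohort=()`. Here is a minimal sketch of a server adding it; the helper name `with_floc_opt_out` and the plain-dict representation of headers are assumptions made for illustration, not part of any framework.

```python
def with_floc_opt_out(headers):
    """Return a copy of the response headers with the FLoC opt-out directive added."""
    directive = "interest-cohort=()"
    updated = dict(headers)
    existing = updated.get("Permissions-Policy")
    if existing is None:
        updated["Permissions-Policy"] = directive
    elif "interest-cohort" not in existing:
        # Permissions-Policy directives are comma-separated, so append ours.
        updated["Permissions-Policy"] = f"{existing}, {directive}"
    return updated


plain = with_floc_opt_out({"Content-Type": "text/html"})
combined = with_floc_opt_out({"Permissions-Policy": "geolocation=()"})
```

Note that this only states the site's preference. Whether the page is actually left out of the calculation is still decided by the browser.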


> the browser is choosing to track users' browser history and report a summary statistic on it to anyone who asks

It doesn't work that way at all.

Really? Because that is how Google's documentation says it works: https://web.dev/floc/#how-does-floc-work
Nowhere in this document does it claim that a summary of your browser history is being sent to websites. It explains the actual process of how cohort IDs are generated and used.
A cohort id is literally a summary statistic...

I think the problem here is just one of language. A summary statistic is a number calculated from a set of data that gives you some idea of the contents of the data, but condenses it in a way that you can't reproduce the original data from. Common examples for numeric data sets are things like mean, mode, median, and standard deviation. Common examples for data sets consisting of a finite list of strings (such as browser history) would be things like average length, character frequency, count, etc. The cohort ID generated is unambiguously such a summary statistic.
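
To make the definition concrete, here is a small sketch that computes a few summary statistics over a list of strings. The `summarize` function and the example URLs are made up for illustration and have nothing to do with how FLoC actually computes a cohort ID.

```python
from statistics import mean, median


def summarize(history):
    """Condense a list of URL strings into a few summary statistics."""
    lengths = [len(url) for url in history]
    return {
        "count": len(history),
        "mean_length": mean(lengths),
        "median_length": median(lengths),
    }


# Two different (hypothetical) histories.
history_a = ["example.com/a", "example.org/b"]
history_b = ["example.net/x", "example.edu/y"]

summary_a = summarize(history_a)
summary_b = summarize(history_b)
```

The two histories differ but produce the same summary, which is the "can't reproduce the original data" property: the statistic says something about the data without determining it.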

I think language could be an issue here, but the problem as I see it is that cohort ID doesn't contain even a summary of the data. It's really just a number.

The website or ad network is able to read those numbers and build profiles on them, but it's still divorced from the user and their specific data.

I think a better comparison is that of a hash. It sums up the data, but is just a unique identifier for it. Of course with a cohort ID it's non-unique (by design).

Because the browser is only sending a number, it retains the ability to change, randomize, or obscure that number. That's an important privacy consideration of the system.

For what it's worth, I do think more work is needed. One of Mozilla's suggestions which I liked was to automatically send a missing ID on occasion, just to keep things a little hazy and reduce fingerprinting viability.

Fingerprinting is inherently less necessary as a result of FLoC, and you need to balance it so that it doesn't become necessary again, but it's a way to protect users who fully opt out without themselves becoming fingerprintable.
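
A rough sketch of the "occasionally send a missing ID" idea, assuming a simple fixed probability. The function name `cohort_to_report` and the `omit_probability` parameter are hypothetical and not taken from Mozilla's proposal or from Chrome.

```python
import random


def cohort_to_report(real_cohort_id, omit_probability=0.1, rng=random):
    """Return the real cohort ID, or None with probability omit_probability."""
    if rng.random() < omit_probability:
        return None  # looks the same as a user who has opted out entirely
    return real_cohort_id


# The edge cases are deterministic: never omit, and always omit.
always_sent = cohort_to_report(1234, omit_probability=0.0)
never_sent = cohort_to_report(1234, omit_probability=1.0)
```

Because opted-in users sometimes report nothing, a missing ID on its own no longer singles out the users who opted out.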

Based on https://web.dev/floc/#floc-server it looks exactly like an ML class label, rather than a hash.

Almost certainly your browser history is summarized into a vector, and then the closest class number is chosen and sent.

You might not know which vector the number represents, but it does represent a vector for the centroid, and has relationships with other cohorts.
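
As a sketch of the model I'm describing (my reading of it, not Chrome's actual algorithm), assigning a cohort would look like a nearest-centroid lookup. The centroids, the three-dimensional "interest" space, and the cohort numbers below are all invented for illustration.

```python
import math

# Hypothetical centroids: cohort ID -> point in a made-up 3-topic interest space.
CENTROIDS = {
    101: (0.9, 0.1, 0.0),
    102: (0.1, 0.8, 0.1),
    103: (0.0, 0.2, 0.8),
}


def assign_cohort(history_vector, centroids=CENTROIDS):
    """Return the cohort ID whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda cid: math.dist(history_vector, centroids[cid]))


user_vector = (0.7, 0.2, 0.1)  # assumed output of some history-to-vector step
cohort_id = assign_cohort(user_vector)
```

Only `cohort_id` leaves the browser in this model, but anyone who learns or estimates the centroids can map the number back to a region of the interest space, and can tell which cohorts sit near each other.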

I'd say it's guaranteed that that interface is leaky.

That's my understanding of how it works too. Could you explain?
Rather than the browser sending a summary of your history, it calculates a cohort ID. That ID is sent to websites, and the website then has the job of associating IDs with interests.

So instead of building a profile on specific users, the website (or ad network) builds profiles on cohort IDs. Users can change IDs, or mask theirs altogether if they wish.
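
A minimal sketch of what profiling by cohort ID could look like on the website or ad network side. The `CohortProfiles` class and the topic labels are hypothetical; real ad systems would be far more involved.

```python
from collections import Counter, defaultdict


class CohortProfiles:
    """Accumulate observed interests per cohort ID rather than per user."""

    def __init__(self):
        self._interests = defaultdict(Counter)

    def observe(self, cohort_id, topic):
        """Record that a visitor reporting cohort_id engaged with topic."""
        if cohort_id is None:
            return  # masked or missing ID: nothing to attribute
        self._interests[cohort_id][topic] += 1

    def top_interest(self, cohort_id):
        """Return the most frequently observed topic for a cohort, or None."""
        counts = self._interests.get(cohort_id)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


profiles = CohortProfiles()
profiles.observe(1234, "cycling")
profiles.observe(1234, "cycling")
profiles.observe(1234, "cooking")
profiles.observe(None, "cycling")  # a visitor who masked their ID
best = profiles.top_interest(1234)
```

Nothing here is keyed on an individual, so a user who changes or masks their ID simply stops contributing to, and being targeted by, the old cohort's profile.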

So we'll have to trust that Google's browser will respect all websites' headers that request not to be included in the cohort tracking. Just like Google respected Safari's privacy settings. https://www.eff.org/deeplinks/2012/02/time-make-amends-googl...
Chromium is open-source. It's trivial to see if it's respecting the header or not.

DNT was DOA. You can blame Microsoft for that one.