by gpm 1832 days ago
Really? Because that is how Google's documentation says it works: https://web.dev/floc/#how-does-floc-work

Nowhere in this document does it claim that a summary of your browser history is being sent to websites. It explains the actual process of how cohort IDs are generated and used.
A cohort ID is literally a summary statistic...

I think the problem here is just one of language. A summary statistic is a number calculated from a set of data that gives you some idea of the contents of the data, but condenses it in a way that you can't reproduce the original data. Common examples for numeric data sets are the mean, mode, median, and standard deviation. Common examples for data sets consisting of a finite list of strings (such as browser history) would be average length, character frequency, count, etc. The cohort ID generated is unambiguously such a summary statistic.
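
To make the string examples concrete, here is a minimal Python sketch that computes them for a made-up history list. The function name `summarize` and the sample URLs are hypothetical; this only illustrates what a summary statistic over strings looks like and says nothing about how FLoC itself computes a cohort.

```python
from collections import Counter


def summarize(history):
    """Return a few summary statistics for a list of strings."""
    count = len(history)
    # Average string length; 0.0 for an empty list to avoid dividing by zero.
    average_length = sum(len(s) for s in history) / count if count else 0.0
    # Character frequency across every string in the list.
    char_freq = Counter("".join(history))
    most_common_char = char_freq.most_common(1)[0][0] if char_freq else None
    return {
        "count": count,
        "average_length": average_length,
        "most_common_char": most_common_char,
    }


# Hypothetical browsing history, invented for illustration.
history = ["example.com", "news.example.org", "example.com/about"]
stats = summarize(history)
print(stats)
```

None of the three numbers lets you rebuild the original list, but each one still tells you something about it.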

I think language could be an issue here, but the problem as I see it is that cohort ID doesn't contain even a summary of the data. It's really just a number.

The website or ad network is able to read those numbers and build profiles on them, but it's still divorced from the user and their specific data.

I think a better comparison is that of a hash. It sums up the data, but is just a unique identifier for it. Of course with a cohort ID it's non-unique (by design).

Because the browser is only sending a number, it retains the ability to change, randomize, or obscure that number. That's an important privacy consideration of the system.

For what it's worth, I do think more work is needed. One of Mozilla's suggestions which I liked was to automatically send a missing ID on occasion, just to keep things a little hazy and reduce fingerprinting viability.

Fingerprinting is inherently less necessary as a result of FLoC, and you need to balance it so that it doesn't become necessary again, but it's a way to protect users who fully opt out without themselves becoming fingerprintable.
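
As a rough sketch of the occasional-missing-ID idea, assuming the browser simply withholds the cohort with some fixed probability (the function name, the `withhold_probability` parameter, and the 10% default are all made up here, not taken from any proposal):

```python
import random


def cohort_to_send(cohort_id, withhold_probability=0.1, rng=random):
    """Return the cohort ID, or None to report it as missing.

    Hypothetical policy: withhold with a fixed probability so that a
    missing ID does not reliably mark a user who has opted out.
    """
    if rng.random() < withhold_probability:
        return None
    return cohort_id


# With a seeded generator the behaviour is repeatable for testing.
rng = random.Random(0)
sent = [cohort_to_send(1234, rng=rng) for _ in range(1000)]
missing = sent.count(None)
print(f"{missing} of {len(sent)} requests carried no cohort ID")
```

The point is that opted-out users blend in with ordinary users whose ID happened to be withheld. Tuning the probability is the balancing act mentioned above.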

Based on https://web.dev/floc/#floc-server it looks exactly like an ML class, rather than a hash.

Almost certainly your browser history is summarized into a vector, and then the closest class number is chosen and sent.

You might not know which vector the number represents, but it does represent a vector for the centroid, and has relationships with other cohorts.
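
Here is a toy Python sketch of the model I'm describing, assuming nearest-centroid assignment in a two-dimensional space. The centroids, cohort numbers, and user vector are invented for illustration, and this is my guess at the shape of the scheme, not FLoC's actual algorithm.

```python
import math

# Hypothetical centroids: cohort ID -> point in a 2-D "interest" space.
CENTROIDS = {
    101: (1.0, 0.0),
    202: (0.0, 1.0),
    303: (0.7, 0.7),
}


def nearest_cohort(vector, centroids=CENTROIDS):
    """Return the cohort ID whose centroid is closest to the vector."""
    return min(centroids, key=lambda cid: math.dist(vector, centroids[cid]))


def cohort_distance(a, b, centroids=CENTROIDS):
    """Distance between two cohorts' centroids."""
    return math.dist(centroids[a], centroids[b])


# Hypothetical history vector; only the cohort number would be sent.
user_vector = (0.9, 0.2)
cohort = nearest_cohort(user_vector)
print(cohort)
print(cohort_distance(101, 303) < cohort_distance(101, 202))
```

Even if a site only ever sees the number, anyone who learns or estimates the centroids can tell that cohort 101 sits closer to 303 than to 202, which is the kind of relationship I mean.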

I'd say it's guaranteed that that interface is leaky.