Hacker News
by Medicineguy 1976 days ago
The problem is not the option to place cookies per se. The issue is its misuse which aims to de-anonymize users (in order to place ads). I don't see how saving the user data somewhere else (in a browser add-on or in the browser natively) is helping here.

EDIT: The official description [https://github.com/WICG/floc] does a better job of explaining the point. They try to cluster users' interests into "cohorts" and share those with the ad service. This could maybe help to increase transparency and authority over your data as it's saved locally. But I don't see a way to limit access to the users' cohorts (they even say that themselves, see link above). Everybody could access my interests - not just Google and other ad services. And of course, if you have 1000 categories and some meta information (region based on IP address etc.), you will be able to track down individual users with pretty good accuracy.

8 comments

From the GitHub page:

> Browsers would need a way to form clusters that are both useful and private

> The browser uses machine learning algorithms to develop a cohort based on the sites that an individual visits.
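
The quoted text doesn't say which algorithm forms the cohort. As a rough illustration only, here is a minimal sketch of one locality-sensitive approach (SimHash over visited domains), where similar browsing histories tend to produce similar bit patterns. The function names, the 16-bit width, and the use of SHA-256 are assumptions for the example, not details from the proposal.

```python
import hashlib

def simhash(domains, bits=16):
    """Return a `bits`-bit SimHash of a collection of visited domains."""
    counts = [0] * bits
    for domain in set(domains):
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            # Each domain votes +1 or -1 on every output bit.
            counts[i] += 1 if (value >> i) & 1 else -1
    # An output bit is set when the majority of domains voted for it.
    return sum(1 << i for i in range(bits) if counts[i] > 0)

def hamming(a, b):
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")

# Hypothetical browsing histories that share most domains.
alice = ["news.example", "bikes.example", "recipes.example"]
bob = ["news.example", "bikes.example", "travel.example"]
print(hamming(simhash(alice), simhash(bob)))
```

Only the resulting hash (or a cohort ID derived from it) would leave the browser in a scheme like this, not the list of domains.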

To me it sounds like just another layer of indirection with Google right in the center of it. Even if this method works well enough from an advertising perspective, I expect there will soon be adversarial models to deanonymize users.

Rather than giving the advertiser a list of my interests, it'd be nice if the advertiser gave me a list of keywords for the ads it might show next and my browser requests the ad for me. A default browser could then be configured to learn with a thumbs up / thumbs down / never show me again type of Bayesian training. Or a non-mainstream browser could request random ads.
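
The thumbs up / thumbs down training could be sketched as a per-keyword Beta-Bernoulli model kept entirely in the browser. This is only an illustration of the idea: the class, its method names, the uniform prior, and the `weight` parameter are all hypothetical.

```python
from collections import defaultdict

class KeywordPreferences:
    """Local, per-keyword estimate of whether the user wants to see an ad."""

    def __init__(self, prior_up=1.0, prior_down=1.0):
        # Beta(prior_up, prior_down) prior for every keyword (assumed uniform).
        self.up = defaultdict(lambda: prior_up)
        self.down = defaultdict(lambda: prior_down)
        self.blocked = set()

    def feedback(self, keyword, liked, weight=1.0):
        """Record a thumbs up (liked=True) or thumbs down (liked=False)."""
        if liked:
            self.up[keyword] += weight
        else:
            self.down[keyword] += weight

    def never_show(self, keyword):
        """'Never show me again' removes the keyword from consideration."""
        self.blocked.add(keyword)

    def score(self, keyword):
        """Posterior mean probability that the user likes this keyword."""
        if keyword in self.blocked:
            return 0.0
        u, d = self.up[keyword], self.down[keyword]
        return u / (u + d)

    def choose(self, offered):
        """Pick which of the advertiser's offered keywords to request."""
        candidates = [k for k in offered if k not in self.blocked]
        return max(candidates, key=self.score) if candidates else None

prefs = KeywordPreferences()
prefs.feedback("bikes", liked=True)
prefs.feedback("casinos", liked=False)
prefs.never_show("loans")
print(prefs.choose(["bikes", "casinos", "loans"]))  # prints "bikes"
```

The advertiser only learns which keyword was requested, while the vote counts stay on the client.
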
But most people would never up/down the ad, which means the ad would be targeted more randomly, which means it wouldn't be as effective, which means the website/content owner wouldn't get as much money for displaying it.

I don't think that solution works in the current environment, unfortunately.

If I clicked the ad, thumbs up, if not thumbs down. With appropriate weights this can work.

But ad networks will never implement this, because the priority order there is:

1) ad network
2) advertiser
3) publisher
4) user

This would bump the user from 4th place to 1st place.

As of today, some ads pay per click (CPC), but most ad spots pay per 1000 views (CPM). Ads can influence behavior after they are viewed, regardless of whether the user decides to interact with the ad. I'm sure Google has put tons of effort into trying to tie ad views to purchases, both online and offline (I bet Gmail and Google Pay are leveraged for this).
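
To compare the two pricing models on the same footing, CPC revenue can be converted into an "effective CPM" using the click-through rate. A small sketch of that arithmetic follows. The function names and all the numbers are made up for illustration.

```python
def revenue_cpm(impressions, cpm):
    """Revenue when the ad pays `cpm` per 1000 views."""
    return impressions / 1000 * cpm

def revenue_cpc(impressions, ctr, cpc):
    """Revenue when the ad pays `cpc` per click, with click-through rate `ctr`."""
    return impressions * ctr * cpc

def effective_cpm(ctr, cpc):
    """Revenue per 1000 views that a CPC ad actually yields."""
    return ctr * cpc * 1000

# Hypothetical campaign: 10,000 views, $2 CPM vs. $0.50 CPC at a 0.1% CTR.
print(revenue_cpm(10_000, 2.0))
print(revenue_cpc(10_000, 0.001, 0.50))
print(effective_cpm(0.001, 0.50))
```

With a low click-through rate the CPC ad earns the publisher much less per view, even though views that never get clicked can still influence behavior.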

I am not claiming this is good or bad, but clicks are not a good enough signal of efficacy for the vast majority of ads shown on the internet.

A good ad network's JS should be able to tell how long the ad was in the viewport, or for a video interstitial how long it played before the viewer skipped the end. That sort of signal could be useful, especially given the way many sites use horizontal ad banners like horizontal rules in their pages.
They would never implement it because there is no valid signal here. The vast majority of users would just thumbs down every ad they see because they believe that will result in less ads.

Go check out the messaging around Ad Choices and how poorly it ended up working.

Ad Choices also intentionally obscured access to it by making the icon small and making it look like ad branding rather than a button.
> The vast majority of users would just thumbs down every ad they see because they believe that will result in less ads.

This... just doesn't apply to the scheme described:

>> If I clicked the ad, thumbs up, if not thumbs down. With appropriate weights this can work.

Ad networks are in business based on ad performance, which is driven by the user. Even though there's quite a lot of terrible UX with online ads (for lots of reasons), the user does matter more than you think.
If the advertiser is able to try a large number of keywords, they might still be able to infer the client's interest list based on what it requests.
That's for sure, and more fringe interests would still be more informative than more mainstream interests, too. It wouldn't be as direct and therefore provides a bit of a barrier.
There is another issue of potential/actual misuse that few are discussing.

That is, the (mis)use of experiments on browser users. "Field trials." This is enabled by the use of "updates". When users agree to updates they agree to let a corporation silently install and run new code on their computer at will, at any time.

This permits the company to create a situation where person A's browser is not quite the same program as person B's; there will be differences. Thus the corporation can run an "experiment". Both person A and person B might believe "I am using XYZ browser". The two users believe it is the same program. However, there are differences. The differences can be added and removed through "automatic updates".

How do users maintain privacy in that situation? The company behind XYZ browser can easily isolate groups of users with similar/different traits by conducting such experiments and observing user behaviour. "Cohorts". While the company may argue it is only testing software, there is an argument that it can also be testing users.

The words in quotation marks above can be defined and redefined any way you like. What is important is what the corporation is actually doing, not the label/name/terminology they assign to it.

I think what you’re saying is “despite this potential change, A/B testing by user attributes will still exist”. Is that right?
Plus, if each cohort is a “group of [merely] thousands of people [among the worldwide internet population]”, the advertiser could probably narrow your identity pretty well using passive fingerprinting of cohort(s) + IP address + Chrome version + OS + OS version, and maybe HTTP headers for language, locale and time zone, though those are probably strongly correlated with the client IP address.
Combining those would definitely be a problem. https://www.chromium.org/Home/chromium-privacy/privacy-sandb... describes removing/limiting those fingerprinting vectors, including IP.

(Disclosure: I work for Google, speaking only for myself.)

> Browsers would need a way to form clusters that are both useful and private

> The browser uses machine learning algorithms to develop a cohort based on the sites that an individual visits.

How would FLoC audience targeting work in non-Chrome browsers? DV360 users deliver ads on all browsers, no?

FLoC is a proposal for a web standard, which other browsers could implement.

Today, in browsers where third party cookies were removed without replacement, companies like Google that aren't willing to fingerprint have pretty limited user targeting capabilities.

Does that mean advertisers using DV360 will have the option to target using known identifiers or FLoC? Chrome's market share in the US is about 50%, so FLoC would cover only about 50% of the total US market. Advertisers want all the scale. https://www.statista.com/statistics/276738/worldwide-and-us-...
I think users using the search engine, email, maps etc in other browsers is hardly a "limited" amount of data for ad targeting.
Sorry, you're right, advertising on Google's own properties is mostly unaffected by browsers removing support for third-party cookies. I was thinking about AdManager and AdSense; ads shown on publisher sites.
According to the specs, the requests are made without user agent headers, leaving only IP address. Targeting ads based on IP address isn't particularly valuable to ad networks if they can't correlate it with anything other than the sandboxed cohort data.
If you give me a demographic group (age, sex, income, etc) of a thousand people, and give me the IP address I can uniquely identify the individual within that group using outside data sources like Experian.
> and give me the IP address

The Chrome proposal is that it won't: https://github.com/bslassey/ip-blindness

What insane ramblings is this? Every site will be forced to use an approved CDN? Adding forced MitM to every connection is the opposite of what we should be trying to implement.
If you want to prevent fingerprinting, you need to look at where the identifying bits are coming from. (ex: https://coveryourtracks.eff.org/) The IP address provides enough bits to uniquely identify many users, and when combined with just a few more bits, to identify almost anyone.
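
The "bits" framing can be made concrete with a little arithmetic: singling out one person in a population of N takes log2(N) bits, and every independent attribute that narrows the candidates adds its own bits. The sketch below assumes attributes are independent, which real signals often are not, and every number in it is illustrative rather than measured.

```python
import math

def bits_to_identify(population):
    """Bits needed to single out one member of a population."""
    return math.log2(population)

def bits_revealed(population, group_size):
    """Bits learned from knowing someone belongs to a group of `group_size`."""
    return math.log2(population / group_size)

def remaining_candidates(population, total_bits):
    """Expected number of people still matching once `total_bits` are known."""
    return max(population / 2 ** total_bits, 1.0)

population = 4_000_000_000                      # assumed number of web users
cohort_bits = bits_revealed(population, 5_000)  # a cohort of a few thousand
ip_bits = 10                                    # hypothetical: IP narrows to 1 in 1024

print(round(bits_to_identify(population), 1))
print(round(cohort_bits, 1))
print(remaining_candidates(population, cohort_bits + ip_bits))
```

Under these assumptions a cohort ID plus a coarse IP signal already leaves only a handful of candidates, so removing either source of bits matters.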

Tor is one solution here, which you could potentially also describe as "adding forced MitM to every connection". The proposals in https://github.com/bslassey/ip-blindness/blob/master/near_pa... and https://github.com/bslassey/ip-blindness/blob/master/willful... have different tradeoffs than Tor, with the "Tor is painfully slow" problem being a big one.

If you have better ideas, though, I would be very interested in reading them!

> if you have 1000 categories and some meta information (region based on IP address etc.), you will be able to track down individual users with pretty good accuracy.

Looking at the corresponding TURTLEDOVE proposal, it's sending only a handful of the known categories to any given ad network at any given time. FLoC also claims that:

> The collection of cohorts will be analyzed to ensure that cohorts are of sufficient size
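
The quoted claim doesn't say how the size analysis would be done. One simple reading is a k-anonymity style check that flags any cohort with fewer than k members. The sketch below assumes a hypothetical mapping from user to cohort ID and a made-up threshold.

```python
from collections import Counter

def undersized_cohorts(assignments, k):
    """Return the cohort IDs that contain fewer than k users.

    `assignments` maps a user identifier to a cohort ID (hypothetical data
    structure; the proposal does not describe one).
    """
    sizes = Counter(assignments.values())
    return sorted(cohort for cohort, n in sizes.items() if n < k)

# Toy data: cohort 7 has three members, cohort 9 has one.
assignments = {"u1": 7, "u2": 7, "u3": 7, "u4": 9}
print(undersized_cohorts(assignments, k=3))  # prints [9]
```

Cohorts flagged this way would need to be merged or withheld, and a size check alone says nothing about what the cohort reveals once it is combined with other signals.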

Browser fingerprinting is already pretty good if you can run arbitrary JS on a site. Add access to a FLoC cohort, even a cohort with 10k people, and you're basically at a place that's worse than third-party cookies were, because at least third-party cookies could be blocked. Ad networks are already using fingerprinting and this will be seen as a blessing to them.
If browsers would stop some edge case extensions such as rendering to canvas and reading the data back, it would be much more difficult. Browser JS envs just expose way way too much entropy from the user's system.
You'd have to get rid of a ton of modern features and somehow backfill / update all browsers to a set of constants:

- audio waveform generation
- access to gpu/webgl info
- have to somehow dramatically change or remove ICE/webrtc
- standardize 'feature flags' e.g. somehow backfill old browser so they all show support for new JS objects
- access to only a small set of fonts
- somehow make rendering completely the same across browsers or remove measurement/rendering to like 5px increments or something. e.g. bounding rect of (747744.888some two character specific font or some svgcss transform etc)
- testing for a ton of css extensions
- supported mime types
- a bunch of SVG things (i dont think this has been explored much i have a hunch there are some good targets)
- a bunch of latency hacks and more...
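
The "5px increments" idea from the list above amounts to quantizing whatever measurements a page can read back, so that small per-machine rendering differences collapse to the same value. A minimal sketch, with hypothetical function names and a plain dict standing in for a bounding rect:

```python
import math

def quantize(value, step=5.0):
    """Round a measurement down to the nearest multiple of `step`."""
    return math.floor(value / step) * step

def quantize_rect(rect, step=5.0):
    """Apply the same coarsening to every field of a bounding rect."""
    return {key: quantize(v, step) for key, v in rect.items()}

# Two machines render the same text at slightly different widths.
machine_a = {"width": 747.888, "height": 18.4}
machine_b = {"width": 746.125, "height": 18.9}
print(quantize_rect(machine_a) == quantize_rect(machine_b))  # prints True
```

Values that straddle a step boundary would still differ, so this reduces the entropy a measurement leaks rather than eliminating it.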

Things like string measurement are indeed tricky. Audio generation or reading back raster data simply shouldn’t be possible by default. I’d be happy to enable that on a per site basis like pop ups.
It’s a classic battle of intent and misdirection toward the tools. The problem isn’t the tools; it’s the intent.
When cookies first appeared, my first response to someone pushing them was: you want to save data for your purposes? Save it on your own damned machine, I don't want it on mine. Of course they're 'abused', that was the whole intent.