by Ajedi32 1807 days ago
> Moving stuff around (from User-Agent to Sec-CH-UA-*) doesn't really solve much. That is, having to request this information before getting it doesn't help if sites routinely request all of it.

I think this is sort of ignoring the whole point of the proposal. By making sites request this information rather than simply always sending it like the User-Agent header currently does, browsers gain the ability to deny excessively intrusive requests when they occur.

That is to say, "sites routinely request all of it" is precisely the problem this proposal is intended to solve.

There are some good points in this post about things which can be improved with specific Sec-CH-UA headers, but the overall position seems to be based on a failed understanding of the purpose of client hints.
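As a rough illustration of the mechanism under discussion: a server lists the hints it wants in an `Accept-CH` response header, and the browser decides which of them to send on later requests. The sketch below is not how any real browser implements this. The `ALLOWED_HINTS` policy and the function names are hypothetical, chosen only to show where a browser could refuse.

```python
def parse_accept_ch(header_value: str) -> list:
    """Split an Accept-CH header value into individual hint names."""
    return [h.strip() for h in header_value.split(",") if h.strip()]


# Hypothetical browser policy: the only hints this browser will reveal.
ALLOWED_HINTS = {"sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform"}


def grant_hints(requested, allowed=ALLOWED_HINTS):
    """Return (granted, denied) hint names; matching is case-insensitive."""
    granted = [h for h in requested if h.lower() in allowed]
    denied = [h for h in requested if h.lower() not in allowed]
    return granted, denied


# A site that asks for more than the policy permits.
requested = parse_accept_ch(
    "Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Full-Version"
)
granted, denied = grant_hints(requested)
```

With the always-sent User-Agent string there is no equivalent of the `denied` list, because nothing is ever requested.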

6 comments

> browsers gain the ability to deny excessively intrusive requests when they occur

But Set-Cookie kind of proves what happens to that kind of feature. If sites first get used to being able to request it and get it, then the browsers that deny anything will simply be ignored. And then those browsers will start providing everything, because they don't want to be left out in the cold.

That's what happened to User-Agent, that's what happened to Set-Cookie, and I can't see why it won't happen to Sec-CH-UA-*. Which the post hints at several times. Set-Cookie was supposed to have the browser ask the user to confirm whether they wanted to set a cookie. Not many clients doing that today.

To be honest, I feel the proposal is a bit naïve if it thinks that websites and all browsers will suddenly be on their best behaviour.

> Set-Cookie was supposed to have the browser ask the user to confirm whether they wanted to set a cookie. Not many clients doing that today.

No worries, that's why we have laws to make the website do in the content what the browser no longer wants to do in the viewer. ;D

Having the browser explicitly prompt for cookies is neither necessary nor sufficient to do what strong, consistently-enforced privacy laws can do, because the browser can't tell a tracking cookie (which needs a prompt) apart from a settings cookie (which does not).
And the law also only requires you to ask the user if they want to be spied on.

It's not tightly bound to cookies in any way.

And vastly misunderstood.

There was a predecessor which was somehow tied to cookies but even then you didn't need to ask for setting purely functional cookies.

But somehow everyone ended up interpreting it as such.

Maybe because most sites don't have many purely functional cookies or fingerprinting, as they always track you for other purposes, too.

I’m convinced that a lot of the really annoying cookie prompts are the result of two things:

* paranoia, from small websites that are understandably worried about massive fines that could actually put their one-man-show into the poor house

* retaliation, from large websites that intentionally want to turn public sentiment against privacy laws

We were naive if we ever thought the end result would be otherwise.
But browsers could disable third party cookies, and autodelete first party cookies on page/tab close by default.

There would be a "keep cookies for this site" button somewhere near the address bar, and at each login, the browser would also ask you if you want to save your password and/or save cookies for that domain.

99% of websites don't require persistent storage, and of those that do, 99% are sites you're logged into, which already prompt the user, asking if they want to save the password.
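The policy proposed here could be sketched roughly as follows. This is a toy model, not any browser's real cookie store: the `CookieJar` class, its method names, and the example site names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CookieJar:
    # Sites for which the user pressed "keep cookies for this site".
    keep_sites: set = field(default_factory=set)
    # site -> {cookie name: value}
    cookies: dict = field(default_factory=dict)

    def set_cookie(self, page_site, cookie_site, name, value):
        """Store first-party cookies only; return True if stored."""
        if cookie_site != page_site:
            return False  # third-party cookie: rejected by default
        self.cookies.setdefault(cookie_site, {})[name] = value
        return True

    def close_tab(self, site):
        """Delete a site's cookies on tab close unless the user kept them."""
        if site not in self.keep_sites:
            self.cookies.pop(site, None)


jar = CookieJar(keep_sites={"reddit.com"})
stored_login = jar.set_cookie("reddit.com", "reddit.com", "session", "abc")
stored_tracker = jar.set_cookie("news.example", "tracker.example", "id", "1")
jar.set_cookie("news.example", "news.example", "theme", "dark")
jar.close_tab("reddit.com")
jar.close_tab("news.example")
```

After both tabs close, only the cookies for the kept site remain.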

That's private browsing currently. Why not use a private window?
Because I might want cookies on this page, gmail and reddit, and nowhere else. This would mean me starting a private window, googling something, finding a link on reddit, opening it, either logging in again, or copying the link to a non-private window, commenting, closing that window, and back to search results.
I often do that, but now I have to click on cookie confirmation banners all the time. It is very annoying. Each one might take just seconds, but it adds up; eventually I will have spent hours clicking on these banners.

Sometimes these banners do not even work because of my NoScript.

Because software is supposed to make our lives easier, not to insist we keep making the same choices again and again, and undo everything as soon as we make a mistake.
That would be an extension or fork of Set-Cookie.
Of course a web server could report which cookies are for tracking, and which are for authentication or configuration, instead of doing it within the content.

But so what? The browser has no way to tell if it’s lying.

Yes, this looks like DNT all over again. Just another header that quickly becomes meaningless, wasting terabytes of bandwidth all over the world for no good reason.
DNT does nothing technically, but it has political power and that's where privacy happens to a great degree. When 70% of users say 'do not track me', it is hard to claim that they don't care about privacy.
Unless a big vendor (cough Microsoft cough) decides to enable it by default, then it becomes meaningless.
It was meaningless from the beginning: DNT was always nothing but an Evil Bit. You’re getting mad at Microsoft for pointing out that the emperor had no clothes.
It was an Evil Bit because it didn't have the force of law behind it. Now we have cookie laws.
There were people promising to implement it. That's a lot better than nothing.
Yes, but it's not hard to ignore DNT on Microsoft user agents, which are a small part of the population.
which were a large part of the population at the time.
Yes, I wish they would engage with how this fits into the rest of the Privacy Sandbox proposal (https://www.chromium.org/Home/chromium-privacy/privacy-sandb...). My understanding is it's:

1. Move entropy from "you get it by default" to "you have to ask for it".

2. Add new APIs that allow you to do things that previously exposed a lot of entropy in a more private way.

3. Add a budget for the total amount of entropy a site is allowed to get for a user, preventing identifying users across sites through fingerprinting.

Client hints are part of step #1. Not especially useful on their own, but when later combined with #3 sites now have a strong incentive to reduce what they ask for to just what they need.

(Disclosure: I work on ads at Google, speaking only for myself)
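To make the interaction between steps #1 and #3 concrete, here is a minimal sketch of a per-site entropy budget. The bit costs in `HINT_COST_BITS` are invented for illustration, and the `EntropyBudget` class is hypothetical; the actual proposal does not specify numbers like these.

```python
# Illustrative per-hint costs in bits. These numbers are made up.
HINT_COST_BITS = {
    "Sec-CH-UA-Mobile": 1.0,
    "Sec-CH-UA-Platform": 2.0,
    "Sec-CH-UA-Full-Version": 4.0,
    "Sec-CH-UA-Model": 6.0,
}


class EntropyBudget:
    """Tracks how many identifying bits one site has been given."""

    def __init__(self, limit_bits):
        self.limit_bits = limit_bits
        self.spent_bits = 0.0
        self.revealed = set()

    def request(self, hint):
        """Return True if the hint may be revealed within the budget."""
        if hint in self.revealed:
            return True  # already revealed, so asking again costs nothing
        cost = HINT_COST_BITS.get(hint)
        if cost is None or self.spent_bits + cost > self.limit_bits:
            return False  # unknown hint, or it would exceed the budget
        self.spent_bits += cost
        self.revealed.add(hint)
        return True


budget = EntropyBudget(limit_bits=8.0)
results = [
    budget.request(h)
    for h in [
        "Sec-CH-UA-Mobile",
        "Sec-CH-UA-Model",
        "Sec-CH-UA-Full-Version",
        "Sec-CH-UA-Platform",
    ]
]
```

Under a scheme like this, a site that spends its budget on the device model has nothing left for the full version, which is the incentive to ask only for what it needs.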

I think pretty much all browsers and a lot of web platforms made it clear in their response to FLoC that everyone except Google (and Twitter, I guess?) considers Privacy Sandbox to be harmful as a whole.
Objections to FLoC are basically about what should be included in #2. I don't understand why people would be opposed to #1 or #3 though?
It's a fundamental disagreement on the very idea:

Google's position is that it's okay for a website to know X amount of data about a user, you know, as long as it doesn't, in total, cross the creepy line.

Everyone else's position is that if the data isn't required to operate, you don't need it. If we accept that the User Agent, as it is going to be frozen, is going to be served anyways to avoid breaking the legacy web, very little of this proposal adds value, and much of it adds harm. It isn't practical to move to not serving the User Agent, so any replacement for the data in it is pointless at its very best. The frozen UA provides enough to determine if someone is mobile, the only real need for UA strings. And when most browsers are looking at reducing the tools for websites to fingerprint, Google is introducing new ones.

So Firefox's position on Privacy Sandbox as a whole is pretty logical: If it's optional enough to be requested, why offer it at all? The entire premise of Privacy Sandbox is that it wants sites to have access to some amount of information about the user, and the position of every non-Google-browser is that they want to give sites as close to no data at all as possible.

This is the core of the problem with a single company being legally permitted to operate a web browser and an ad company. Every single browser developer that doesn't own an Ads and Analytics suite is opposed to Privacy Sandbox.

> Google's position is ... Everyone else's position is...

I don't think this categorization is accurate. For example, Apple built https://webkit.org/blog/8943/privacy-preserving-ad-click-att...

> if the data isn't required to operate, you don't need it

This is simple, but it's also wrong. Some counterexamples:

* Learning from implicit feedback: dictation software can operate without learning what corrections people make, or a search engine can operate without learning what links people click on, but the overall quality will be lower. Each individual piece of information isn't required, but the feedback loop allows building a substantially better product.

* Risk-based authentication: you have various ways to identify a user, some of which are more hassle for them than others. A login cookie is lowest friction, asking for a password adds more friction, email / SMS / OTP verification add even more. You don't want to ask all users to go through the highest-friction approach on every pageview, but you also don't want to let a fraudster who gets access to someone's cookiejar/leaked password/old device/etc impersonate the user. If you have a small amount of information about the current user's browsing environment, in a way that's hard for a fraudster to imitate, you can offer much lower friction for a given level of security.

* Incremental rollouts: when you make changes to software that operates in complex environments it can be very difficult to ensure that it operates correctly through testing alone. Incremental rollouts, with telemetry to verify that there are no regressions or that relevant bugs have been fixed, produce better software. You're writing as if your position is Firefox's but even they collect telemetry by default: https://support.mozilla.org/en-US/kb/telemetry-clientid

> the position of every non-Google-browser is that they want to give sites as close to no data at all as possible ... Every single browser developer that doesn't own an Ads and Analytics suite is opposed to Privacy Sandbox.

I cited Apple's conversion tracking API above, but another example of this general approach is Microsoft's https://github.com/WICG/privacy-preserving-ads/blob/main/Par... I don't know where you're getting that they're trying for "close to no data at all", as opposed to improving privacy and preventing cross-site tracking?

(Still speaking only for myself)

> Learning from implicit feedback: dictation software can operate without learning what corrections people make, or a search engine can operate without learning what links people click on, but the overall quality will be lower. Each individual piece of information isn't required, but the feedback loop allows building a substantially better product.

That sounds cool. How do I opt into it?

I would highlight that both Microsoft and Apple (to a lesser extent, mind you) also operate their own ad platforms. Don't get me wrong, I'd be happy to see a blanket ban on web browsers and ad companies being related, and have it apply to all three. I'm an equal-opportunity antitrust breakup advocate. ;)

Regarding risk-based authentication, I see a lot of value in it, but I think the cost may be too high, and the less robust methods it often uses are a poor metric anyway. I gave an example elsewhere that someone might be using a wired PC and a wireless phone on two different carriers with vastly different user agents at the same time, for instance.

I think there's some merit in some very rough Geo-IP based RBA, but I'm not sure how many other strategies for that I find effective. The fact that Outlook and Gmail seem equally happy to let an account that has never been signed into from outside the United States get logged into from Nigeria seems like low-hanging fruit in the risk-based authentication space. ;)

IMHO #3 is fundamentally flawed as I just can't imagine browsers improving to a point where you couldn't cross reference such "fixed" entropy budgets to clearly identify the user.

The only IMHO reasonable technical solution is to reduce entropy as much as possible, even below any arbitrary set entropy limit.

Though in the end I think the right way is an outright (law-based) ban on microtargeting and on collecting anything but strongly, transparently, and decentrally anonymized metrics.

Also, I don't see Google fully following through; e.g., one area where Chrome is massively worse than Firefox wrt. entropy is the canvas (at least last time I checked). It's an area where there are known reliable ways to strongly hinder fingerprinting of the canvas. But I don't see Google using them, as it would conflict with Flutter Web rendering animations in the canvas (which inherently has problems and is technically sub-par compared to how the browser could render web animations (and does in the case of Firefox)).

There are really only two ways this can go:

A. Browsers successfully reduce available entropy to where users cannot reliably be tracked across sites.

B. Browsers fail at this, and widely available JavaScript libraries allow cross-site tracking. If it's possible to extract enough bits, they will be extracted.

The thing is, if you can't get all the way to (A) then in removing bits you're just removing useful functionality and adding work for browser developers and web developers. Fighting fingerprinting is only worth it if you have a serious chance of getting to (A).

If you think (A) is off the table then I agree a regulatory solution is the best option. Even then, #1, as exemplified by UACH, is still helpful because it makes tracking more visible. If every piece of information you collect requires active work, instead of just receiving lots of bits by default, then it's much easier for external organizations to identify excessive collection.

(Still speaking only for myself)

Why not both (A) and a regulatory solution? I see no reason to avoid the regulatory route.
Well, if the browsers can just deny those requests, then they can just drop the information entirely. (And they are dropping them from the UA.)

Of the two non-harmful pieces, one is of interest to all sites, and the other one has the implementation broken on Chrome, so sites will have to use an alternative mechanism anyway. If there's any value in the idea, Google can propose them with a set of information that brings value, instead of just fingerprinting people.

I think the idea is that there are some legitimate uses for UA information that they don't want to eliminate entirely, otherwise yeah they could just deprecate the User-Agent header and be done with it.
Yes, I got that from your post. It's just that for Google, proposing it again with harmless content is very easy, but for anybody else to filter the bad content once the Google proposal gets accepted is almost impossible. (Although, if I was working on Firefox, I would just copy the most common data from Chrome, adjusting for those 2 fields that matter. That would create problems, but it's the less problematic choice.)

So, no, it should be rejected. Entirely and severely. It doesn't mean that contextual headers are a bad practice, it's just that this one proposal is bad.

I think most of the legitimate uses could be solved in a simple statement: Let sites know whether the device is mobile or desktop, and then expect websites to send all of the logic to handle the rest client-side, so the server does not need to know.

I'd love to see browser metrics being absolutely devastated as an analytic source: It just is used today as an excuse to only support Chrome.

Risk-based authentication can use a change in user agent as an increased risk factor.
It could, but as someone who has spoofed user-agents in the past (primarily to get Chrome-only websites to cooperate) I would prefer if it wouldn't. If the baddies can snoop my https traffic or directly copy the auth cookies from my machine then also copying my user-agent isn't that big of a step for them. One might argue that detecting changes in user agents could be part of some kind of defense in depth strategy, but as a user I imagine I'm already so boned in that scenario that I doubt it would save me. So overall such a mechanism would bring me more inconvenience than security.
That's the whole point of RBA, though. That two requests have the same user agent doesn't tell me much, but if you have two different user agents from two different IPs that may sound really risky (use case dependent, of course).
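A toy version of the risk-based authentication idea discussed above, treating a changed user agent and a changed IP as independent risk signals. The equal weights, the `LoginContext` type, and the mapping from score to challenge are assumptions made for illustration, not a recommended policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginContext:
    user_agent: str
    ip: str


def risk_score(previous, current):
    """Count how many signals differ from the last known-good session."""
    score = 0
    if current.user_agent != previous.user_agent:
        score += 1  # changed user agent
    if current.ip != previous.ip:
        score += 1  # changed network location
    return score


def required_step(score):
    """Map a risk score to the lowest-friction check that still applies."""
    return {0: "cookie", 1: "password"}.get(score, "otp")


last_seen = LoginContext("Firefox/89.0", "203.0.113.7")
same = LoginContext("Firefox/89.0", "203.0.113.7")
spoofed_ua = LoginContext("Chrome/91.0", "203.0.113.7")
new_device = LoginContext("Chrome/91.0", "198.51.100.20")
```

In this sketch a user who only spoofs their user agent is asked for a password again, which is the inconvenience the earlier comment objects to, while a change in both signals triggers the highest-friction step.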
> By making sites request this information rather than simply always sending it like the User-Agent header currently does, browsers gain the ability to deny excessively intrusive requests when they occur.

Browsers can just not send a UA header

I tried this. It breaks a surprisingly large number of sites (or perhaps not-so-surprisingly), and good luck trying to beat Google's captcha without a User-Agent header.
Good luck trying to beat ReCaptcha if you're doing anything that puts you outside of the normal web browser behavior as imagined by Google's Algorithm.

If User Agent Client Hints become the new normal, I'm sure anyone excessively denying requests will be flagged in the same way.

Having to request it is a terrible idea to begin with. If I want to use different templates for mobile vs desktop, I need to know, on the backend, whether the device is a mobile device, and I need it on the very first request. Having to request these headers explicitly is an unnecessary complication that would slow down the first load.

However it is nice that there's now a separate header that gives a yes or no answer on whether it's a mobile device.
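For the backend template choice described above, a minimal sketch might read the `Sec-CH-UA-Mobile` header, whose value is a structured-field boolean (`?1` for mobile, `?0` otherwise). The template file names and the fallback to the desktop template when the header is missing are assumptions.

```python
from typing import Optional


def is_mobile(headers) -> Optional[bool]:
    """Return True/False from Sec-CH-UA-Mobile, or None if absent or malformed."""
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    value = normalized.get("sec-ch-ua-mobile")
    if value == "?1":
        return True
    if value == "?0":
        return False
    return None


def pick_template(headers, default="desktop.html"):
    """Choose a template name; fall back to the default when the hint is missing."""
    mobile = is_mobile(headers)
    if mobile is None:
        return default
    return "mobile.html" if mobile else "desktop.html"


phone = pick_template({"Sec-CH-UA-Mobile": "?1"})
laptop = pick_template({"sec-ch-ua-mobile": "?0"})
unknown = pick_template({"User-Agent": "curl/7.77.0"})
```

The fallback matters for the first-request concern raised above: a client that does not send the header at all still has to be served something.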

Why would you need different templates for mobile/desktop? CSS is quite capable of responding to any screen orientation.
Yes it is. Except you can't use the same markup for both because the input devices, and thus interaction paradigms, are so radically different. Mice are precise and capable of hovering over things, so it makes sense to pack everything densely and add various tooltips and popup menus. Touchscreens are imprecise and don't have anything resembling hovering, so UI elements must be large, with enough padding around them, and with menus appearing on click.
Between CSS Flexbox and CSS Grid there shouldn't be any reason today that you can't handle 100% of those differences with the same markup and media stylesheets. (There's also obviously JS if you really must contort the HTML DOM to get what you want.)
You're not wrong. However, there are times when CSS isn't enough. For example:

- The Mobile vs Desktop design differences are too great.

- The site was originally created without considering mobile, and retrofitting mobile support is unfeasible.

Can you expand on the design differences?
"By making sites request this information rather than simply sending it like the User-Agent header currently does..."

This is also true with respect to SNI which leaks the domain name in clear text on the wire. The popular browsers send it even when it is not required.

The forward proxy configuration I wrote distinguishes the sites (CDNs) that actually need SNI and the proxy only sends it when required. The majority of websites submitted to HN do not need it. I also require TLSv1.3 and strip out unnecessary headers. It all works flawlessly with very few exceptions.

We could argue that sending as much unnecessary information as popular browsers do, when it is technically not necessary for the user, is user-hostile. It is one-sided. "Tech" companies and others interested in online advertising have been using this data to their advantage for decades.

How would this work?

SNI is sent by the client in the initial part of the TLS handshake. If you don't send it, the server sends the wrong/bad cert. The client could retry the handshake using SNI to get the correct cert but:

- This adds an extra RTT, on the critical path of getting the base HTML, hurting performance.

- A MITM could send back an invalid cert, causing the browser to retry with SNI, leaking it anyway (since we aren't talking about TLS 1.3 and an encrypted SNI).

I suppose the client could maintain a list of sites that don't need SNI, like the HSTS preload list, but that seems like a ton of overhead to avoid sending unneeded SNI, especially when most DNS is unencrypted and would leak the hostname just like SNI anyways.

"I suppose the client could maintain a list of sites that don't need SNI."

That list would be much larger than the list of sites that do require SNI.

Generally, I can determine whether SNI is required by IP address, i.e., whether it belongs to a CDN that requires SNI. Popular CDNs like AWS publish lists of their public IPs. I use TLSv1.3 plus ESNI with Cloudflare but they are currently the only CDN that supports it. Experimental but works great, IME.

The proxy maintains the list not the browser. The proxy is designed for this and can easily hold lists of 10s of 1000s of domains in memory. That's more domains than I visit in one day, week, month or year.

It is not a question of whether this is possible. "How would this work". I have already implemented it. It works. It is not difficult to set up.

Why this works for me and would be unlikely to work for others:

I am not a heavy user of popular browsers, I "live on the command line". Installing a custom root certificate with appropriate SANs to suppress browser warnings is a nuisance that would likely dissuade others since they are heavy users of those programs. However, I generally do not use those browsers to retrieve content from the web.

Ahhh. I see, you are default-no-SNI, and whitelist those that do.

If your threat model is such that you absolutely positively cannot leak the signal of what domain names you want to make HTTPS connections to, then I suppose this is an approach that can be used. But if you believe that is your threat model, I imagine you have bigger issues to protect against. As you say, it's unlikely to work for others.

No "threat model" here, just a dissatisfaction with so-called "modern" browsers and TLS extensions that disproportionally benefit hosting companies over users (privacy in this case). Plus I genuinely prefer commandline TCP clients and text-only browser to read HTML for most web use. I like the speed, reliability and more uniform presentation I get across all web sites. I like text. Big browsers that do everything under the sun written by people working for "tech" companies funded by advertising are not interesting to me. In fact, I find them annoying.

Some folks write "browser extensions" to control graphical browsers to their liking. I generally do not use graphical javascript-enabled browsers; I prefer to use a different program, a proxy, to control the browser. It works with both graphical browsers and text-only ones.

I don't think you can ever determine that a site doesn't need SNI using HTTP alone. All you can have is that it doesn't or you don't know.
I do not use "HTTP alone", I use DNS, more specifically IP address. I generate lists. The lists are largely based on the hosting provider and created automatically, but I also edit them manually when necessary, which is the exception not the rule. Most sites requiring SNI that are submitted to HN all use the same CDNs: AWS and Cloudflare. The SNI list is dominated by sites hosted on AWS. The ESNI list is all sites hosted on Cloudflare.

When I first started developing this workaround I thought I would be manually editing the SNI list constantly for "all those random sites that use SNI". This has not been the case. For the sites submitted to HN, use of SNI is mostly a CDN phenomena.

The important point here is that I do not send SNI by default. The default is privacy-by-design: no SNI. If I encounter a site that fails because it needs SNI, I add it to the list. The failure is caught by the proxy (the proxy verifies certificates, I do not rely on the browser), the SSL error is visible in the logs, and the error page the browser receives is a custom one I created myself that tells me where in the configuration the failure occurred. I can test whether a site requires SNI very quickly.
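The default-no-SNI decision described above could look roughly like this. It is not the commenter's actual proxy configuration, which is not shown in the thread. The network ranges are documentation placeholders standing in for a CDN's published IP list, and the host list stands in for manually added exceptions; the sketch only makes the decision and does not perform a TLS handshake.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), standing in for
# the published address ranges of a CDN that requires SNI.
SNI_REQUIRED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical hosts added by hand after a certificate failure.
SNI_REQUIRED_HOSTS = {"needs-sni.example"}


def should_send_sni(hostname, ip):
    """Default to no SNI; send it only for listed hosts or CDN ranges."""
    if hostname.lower() in SNI_REQUIRED_HOSTS:
        return True
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SNI_REQUIRED_NETWORKS)


on_cdn = should_send_sni("cdn-hosted.example", "192.0.2.10")
standalone = should_send_sni("plain.example", "203.0.113.5")
manual = should_send_sni("needs-sni.example", "203.0.113.5")
```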

Popular browsers cannot do this, we know that. If they could, I would not be coming up with workarounds. They routinely send more data than is needed, including SNI. That is the point of the original comment.

s/phenomena/phenomenon/