
"Why would an advertiser do this?"

Not necessarily an advertiser, but any entity or person that can "monetise" the data collected. The collector might use the data itself, it might license, sell or transfer the data, it might provide services that rely on the data, who knows. Some users may not want to voluntarily share this data when they derive no benefit from doing so. The only use of SNI that benefits the user is letting the server choose the applicable TLS certificate. We do not have to guess every other way the data might be used before we can honor the user's wish that it not be sent in plaintext where it is not needed for choosing the certificate.

AFAIK, sniffing SNI is already used for the purpose of censorship by some countries. This has been published. It would be ignorant to think that this is the only purpose for which such data might be used, or that any purpose would always be non-commercial and unconnected, directly or indirectly, to web advertising. As the use of DoH increases, sniffing SNI would seem an easy substitute for sniffing DNS.

1. A real-time list of every domain visited by a user.

"You don't use the lock on your door to thwart the person who you invited in and opened the door for."

For some users, advertisers are not an "invited person". What is more, companies like Google have attempted to force the use of TLS for every site, even ones where, in the user's or site operator's opinion, TLS is not needed.

"It is difficult to tell if a site needs it or not at the stage where you send it."

But this is not an argument for sending SNI by default, even where it is not needed.

A TLS proxy can be configured to distinguish sites that need it from sites that do not. This is what I do. The default configuration is to not send SNI. This makes sense because the majority of sites I visit do not require it.
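A rough sketch of that default-off behaviour, in Python rather than in whatever proxy you happen to use. This is not my actual configuration: the `NEEDS_SNI` set, the hostnames in it, and the `sni_name` and `connect` helpers are all made up for illustration. It relies only on the standard `ssl` module, where passing `server_hostname=None` to `wrap_socket` leaves the SNI extension out of the ClientHello.

```python
import socket
import ssl

# Hypothetical allowlist: hosts known to fail or serve the wrong
# certificate unless SNI is sent. Everything else gets no SNI.
NEEDS_SNI = {"cdn-hosted.example"}


def sni_name(host, needs_sni=NEEDS_SNI):
    """Return the normalized name to send as SNI, or None to omit it."""
    key = host.lower().rstrip(".")
    return key if key in needs_sni else None


def connect(host, ip, port=443, needs_sni=NEEDS_SNI):
    """Open a TLS connection to ip:port, sending SNI only if required."""
    name = sni_name(host, needs_sni)
    ctx = ssl.create_default_context()
    if name is None:
        # Python's ssl module requires server_hostname whenever
        # check_hostname is enabled, so it is switched off here.
        # The certificate chain is still verified, but the hostname
        # is not; a real setup would check it some other way.
        ctx.check_hostname = False
    raw = socket.create_connection((ip, port), timeout=10)
    return ctx.wrap_socket(raw, server_hostname=name)
```

The point of the design is that sending SNI is the exception that has to be listed, not the default that has to be turned off.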

As such, from where I sit, the solution chosen by modern web browsers is to prioritise websites that use CDNs that depend on SNI. The side effects for users of indiscriminately sending SNI, i.e., sharing every domain the user visits in plaintext on the wire, are not as important as reducing costs for those websites using TLS and CDNs. Arguably, SNI is for the benefit of websites and CDNs at the expense of users. (Hopefully ECH will obviate this tradeoff.)

"Otherwise (e.g. if using DoH) just create a db of popular sites you care about."

According to this answer, 1:1 mapping is not an equally easy alternative to SNI. Sniffing SNI is "trivial" and works for any https site, whereas 1:1 mapping through a database is "non-trivial" and only works for "a selection of popular sites [one] cares about". SNI makes the task of monitoring a user's web use easy. If SNI is not available to sniff, then the task becomes more difficult. This is the point.

Sniffing SNI is easy. The theoretical 1:1 mapping alternative proposed by HN commenters is more difficult. This is the point. What is easy and reliable for all www sites versus what is more difficult and unreliable for all www sites. The point is not what is possible^2 and what is impossible. That is the red herring diversionary argument tactic that HN commenters defending gratuitous SNI like to use.

2. It is possible to avoid DNS altogether and to only send SNI when it is required. I have been doing this for years. Gather bulk DNS data and load the data into a forward TLS proxy that stores domain:IP address mappings in memory and does lookups in real time as requests are received.
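
The in-memory lookup part might look something like the following. It is a minimal sketch, not the proxy I use: the one-pair-per-line "domain IP" input format and the `load_mappings` and `resolve` names are assumptions made for the example, and the addresses shown are documentation placeholders.

```python
import ipaddress


def load_mappings(lines):
    """Build a domain -> IP table from lines of the assumed form 'domain ip'."""
    table = {}
    for line in lines:
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that are not a single domain/IP pair
        domain, ip = parts
        try:
            ipaddress.ip_address(ip)  # accepts IPv4 and IPv6 literals
        except ValueError:
            continue  # skip malformed addresses
        table[domain.lower().rstrip(".")] = ip
    return table


def resolve(table, host):
    """Look up a request's host in memory; None means no stored mapping."""
    return table.get(host.lower().rstrip("."))


table = load_mappings([
    "example.com 192.0.2.10",
    "# gathered in bulk ahead of time",
    "Example.ORG. 2001:db8::1",
])
```

A request for a host that `resolve` cannot find never triggers a DNS query; the proxy simply has no address for it until the bulk data is refreshed.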