| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by saurik 2451 days ago

(I am technically involved in a project called Orchid that would be considered a direct competitor to this idea, were this idea a product; but I would like to think my cynicism isn't related to that ;P.)

So, a more complete--and somewhat more balanced--description of this is in the actual paper, for which this is just a blog post summary; I would think the paper is way more valuable than this blog post, and maybe should even be the Hacker News post target.

https://arxiv.org/abs/1910.00159

First off, the DHT here is unlikely to scale well to large whitelists; yet, for small whitelists, you will (of course) end up knowing the target domain to high probability--which, even for large whitelists, is going to be possible given just the target IP address almost all of the time anyway: even with a CDN, the set of websites you get overlapped with tends to not be extremely large; and, even when it is, it is almost always with a bunch of niche websites that are unlikely to be on your whitelist--so, the premise that this is all hiding from the exit node who you are connecting to is extremely weak.

Oh: and when it does even sort of work with the CDN (due to having the shared endpoint), the user can usually then use domain fronting to trick the SNI, which would bypass this proof and let you connect to any other website behind that IP address; so, really, the way they are doing whitelists is just wrong: the IP address you are connecting to and the totality of what is behind it is way more important than the SNI. Essentially, while you can do this (prove, in zero knowledge, the SNI of an HTTPS connection), it doesn't seem like it really helps a real-world problem (as the situations where the technique works correlate with situations where you failed to hide anything).

Meanwhile, this paper admits to taking 10-30 seconds per HTTPS connection (not per VPN tunnel!) as the DHT lookups and zero knowledge proofs are both slow operations. Somehow, before that completes, it sounds like you just get to use a different node to send "unauthorized traffic"? Why can't I just sit in that regime forever? I am hoping I just don't understand this part, but they say it multiple times as if it isn't such a big deal, and have a bunch of space dedicated to trying to make it sound like the unauthorized traffic would be a small portion of the total traffic (which doesn't exactly sound comforting).

And finally, domain whitelists don't work in the first place: I can post horrible things that get you in trouble to the comments section of a news site (their best example of a kind of website you might whitelist) quite easily; and, for their example of Facebook, it is actively dangerous: Facebook is an entire Internet unto itself that proactively scans for evil things, and so if you whitelist that you are essentially admitting "I would be willing to let you do anything". I could see a URL-based whitelist potentially having value, but not a domain-based one. We shouldn't be making users feel safer with systems that don't even slightly help :(.

(It is maybe also worth reminding that before the advent of encrypted SNI, this data could easily have been used to filter and whitelist traffic... and yet people working on projects like Tor still don't use it for filtering, as it just isn't enough, as you still don't know what the user is doing. It frankly just feels likely to me that the two goals that people want to simultaneously achieve here--"I don't know exactly what you are doing" and "I do know, to some reasonably high certainty, that you aren't doing something that would harm me"--are simply philosophically incompatible without some form of reputation/trust... which then makes achieving a third goal that people want--"I don't know who you are"--much harder.)

Regardless, back to the paper itself, I would argue that this is a single maybe-novel idea--that you can do a zero knowledge proof over the SNI packet of a TLS 1.3 connection with encrypted SNI--that is, as is common in academic papers, trying to be described in the context of a full-scale solution by surrounding it with the minimally-viable wrapper required to turn it into a product for an under-specified use case and then trying to type quickly past the serious downsides (such as the latency), all without being extremely critical of whether the idea itself is useful.