by hyyypr 2149 days ago
> Conversely, sending every URL that a user visits to some external service, where it could be logged and data-mined by nefarious third parties (i.e Google)

As in, just like DNS.


This suggests an interesting question to put on an exam in whatever class teaches about Bloom filters.

----------------

Q: It is proposed to make the local DNS resolver handle IPv4 by using 64 Bloom filters, BO1, BZ1, BO2, BZ2, ... BO32, BZ32.

Filter BOn answers the question "is bit n of the IP address a 1?", and BZn answers the question "is bit n of the IP address a 0?".

Your resolver checks all these Bloom filters. If BOn returns "no", then it knows bit n of the address is a 0. If BZn returns "no", then it knows bit n of the address is a 1. Only if BOn and BZn both return "maybe" for some n must the resolver actually do a DNS query over the internet.

Explain why this proposed resolver would not be useful.
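
For anyone who wants to experiment with the question above, here is a minimal Python sketch of the proposed scheme. The `BloomFilter` and `BitwiseResolver` classes, the filter size, and the hash construction are all assumptions made for illustration; the question specifies none of them. Bits are indexed from 0 here rather than from 1, and returning `None` when both filters say "no" is also an assumption, since the question does not cover that case.

```python
import hashlib
import ipaddress


class BloomFilter:
    """Tiny Bloom filter; sizes are illustrative, not tuned."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not added"; True means "maybe".
        return all(self.bits[pos] for pos in self._positions(item))


class BitwiseResolver:
    """Hypothetical resolver with one BO/BZ filter pair per address bit."""

    def __init__(self, address_bits=32):
        self.address_bits = address_bits
        self.bo = [BloomFilter() for _ in range(address_bits)]  # "bit is 1"
        self.bz = [BloomFilter() for _ in range(address_bits)]  # "bit is 0"

    def add(self, name, address):
        value = int(ipaddress.IPv4Address(address))
        for n in range(self.address_bits):
            # n = 0 is the most significant bit of the address.
            bit = (value >> (self.address_bits - 1 - n)) & 1
            (self.bo if bit else self.bz)[n].add(name)

    def resolve(self, name):
        value = 0
        for n in range(self.address_bits):
            maybe_one = self.bo[n].might_contain(name)
            maybe_zero = self.bz[n].might_contain(name)
            if maybe_one and maybe_zero:
                return None  # ambiguous bit: a real DNS query is needed
            if not maybe_one and not maybe_zero:
                return None  # assumption: name was never inserted
            value = (value << 1) | int(maybe_one)
        return str(ipaddress.IPv4Address(value))


resolver = BitwiseResolver()
resolver.add("example.test", "192.0.2.10")
print(resolver.resolve("example.test"))
```

Inserting many more names, or shrinking `num_bits`, is a quick way to see how often the resolver has to fall back to a real query.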

These would have to be pretty large and updated regularly. Why not use a k-anonymity scheme? A decent number of password managers use that to see if the password a user generated is unique.
Google was also the one that invented the technique to prevent sending the URLs serverside. So the random potshot feels unwarranted.
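
To make the k-anonymity idea mentioned above concrete, here is a rough Python sketch of a hash-prefix range query. Everything in it is an assumption for illustration: the `RangeServer` class, the `is_listed` helper, SHA-1 as the hash, and the 5-character prefix are hypothetical choices, not a description of any particular service. The client reveals only a short hash prefix, receives every suffix in that bucket, and does the final comparison locally.

```python
import hashlib

PREFIX_LEN = 5  # hex characters revealed to the server (illustrative choice)


def sha1_hex(item: str) -> str:
    return hashlib.sha1(item.encode("utf-8")).hexdigest().upper()


class RangeServer:
    """Hypothetical server that only ever sees hash prefixes."""

    def __init__(self, known_items):
        self.buckets = {}
        for item in known_items:
            digest = sha1_hex(item)
            prefix, suffix = digest[:PREFIX_LEN], digest[PREFIX_LEN:]
            self.buckets.setdefault(prefix, set()).add(suffix)

    def query(self, prefix: str) -> set:
        # Returns every suffix sharing the prefix; the server cannot tell
        # which one (if any) the client is actually interested in.
        return self.buckets.get(prefix, set())


def is_listed(item: str, server: RangeServer) -> bool:
    digest = sha1_hex(item)
    prefix, suffix = digest[:PREFIX_LEN], digest[PREFIX_LEN:]
    # Only the prefix leaves the client; the match is decided locally.
    return suffix in server.query(prefix)


server = RangeServer(["hunter2", "password123"])
print(is_listed("hunter2", server))
print(is_listed("correct horse battery staple", server))
```

A shorter prefix puts more entries in each bucket, which hides the query better at the cost of a larger response.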
It's a shame that the OP went to huge effort to make a mathematically perfect proof, and then wrote such a deceptive article about it.

It's an ironic demonstration that we shouldn't trust prose. The author's implied thesis is that papers are worthless and only code matters, which applies to the author's paper too!

Apologies if the article came off as deceptive, my intentions were not to try and mislead anyone. While the result may not have practical implications, I don't think that the "debunked" part of the story is incorrect.

To reiterate a point I made in an earlier response: In the paper, we present a large number of other papers in the literature (some as recent as 2019) that still incorrectly refer to Bloom's expression as an exact bound, so I do think that this is important and somewhat justifies the debunked narrative.

You should remove the unwarranted "nefarious" slam. It's simply incorrect; the actual reason the browser does not send the URL to a central service is that users' expectations of privacy do not allow their browsing history to be logged like that, even if the purpose is malware protection. They have not given proper informed consent, and fortunately don't need to in order to detect malware.

From "Google Chrome Privacy Whitepaper":

Chrome checks the URL of each site you visit or file you download against this local list. If you navigate to a URL that appears on the list, Chrome sends a partial URL fingerprint (the first 32 bits of a SHA-256 hash of the URL) to Google for verification that the URL is indeed dangerous. Chrome also sends a partial URL fingerprint when a site requests a potentially dangerous permission, so that Google can protect you if the site is malicious. Google cannot determine the actual URL from this information.

https://www.google.com/chrome/privacy/whitepaper.html#malwar...

(I work for Google but not on anything like this.)
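
As a rough illustration of the partial fingerprint described in the whitepaper quote above, here is a small Python sketch. The function names and the in-memory `local_prefixes` set are hypothetical, and any URL canonicalization a real browser performs before hashing is omitted; this only shows taking the first 32 bits (4 bytes) of a SHA-256 hash and consulting a local list before anything is sent.

```python
import hashlib
from typing import Optional, Set


def partial_fingerprint(url: str, num_bytes: int = 4) -> bytes:
    # First 32 bits (4 bytes) of the SHA-256 hash of the URL string.
    return hashlib.sha256(url.encode("utf-8")).digest()[:num_bytes]


def fingerprint_to_send(url: str, local_prefixes: Set[bytes]) -> Optional[bytes]:
    """Return the prefix to send for verification, or None if the URL
    does not match the local list and nothing needs to be sent."""
    fp = partial_fingerprint(url)
    return fp if fp in local_prefixes else None


# Hypothetical local list containing one flagged URL.
local_prefixes = {partial_fingerprint("http://example.test/bad")}
print(fingerprint_to_send("http://example.test/bad", local_prefixes))
```

Because many different URLs share the same 4-byte prefix, the prefix alone does not identify which URL was visited.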

Good point, my inclusion of Google was mostly in jest, but in hindsight, it comes off as misleading, as real browsers don't actually function in the simplified way presented in the article. I will remove the reference to Google.
DNS sends only the domain name, and you can choose who your DNS resolvers will be.