by mayanksinghal 4739 days ago
> only sends in a small part of the URL's hash for matching ...

Isn't it incredibly easy to bypass that check by using a randomly generated url segment?

[Edit: Formatting + isn't]

3 comments

No, it's more like this. You download a list of truncated hashes. You generate a set of permutations of your URL (strip the query params, strip components of the path/domain), hash those, and check them against your local list. If you get any matches, you request an expanded list from Google, giving them the truncated hashes that matched. This gives you a cacheable list of full hashes; you check your matched hashes against those full hashes, and if any match, then it's a match.

tl;dr No, it's not that easy.
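
A rough sketch of the lookup described above, in case it helps. This is not the real Safe Browsing client: the 4-byte prefix length, the simplified URL permutation rules, and the names `url_permutations`, `check`, and `fake_server` are all assumptions made for illustration, and the server round trip is replaced by a local stand-in.

```python
import hashlib
from collections.abc import Callable
from urllib.parse import urlsplit

PREFIX_LEN = 4  # bytes kept in the local list (assumed truncation length)


def sha256(text: str) -> bytes:
    return hashlib.sha256(text.encode("utf-8")).digest()


def url_permutations(url: str) -> list[str]:
    """Simplified host/path variants; real canonicalization is more involved."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")
    # Full host, then parent domains, stopping at two labels.
    hosts = [".".join(labels[i:]) for i in range(max(len(labels) - 1, 1))]
    path = parts.path or "/"
    segments = [s for s in path.split("/") if s]
    # Root, then each intermediate directory, then the full path.
    paths = ["/"]
    for i in range(1, len(segments)):
        paths.append("/" + "/".join(segments[:i]) + "/")
    if path not in paths:
        paths.append(path)
    if parts.query:
        paths.append(path + "?" + parts.query)
    return [h + p for h in hosts for p in paths]


def check(
    url: str,
    local_prefixes: set[bytes],
    fetch_full_hashes: Callable[[set[bytes]], set[bytes]],
) -> bool:
    hashes = {sha256(p) for p in url_permutations(url)}
    matched = {h for h in hashes if h[:PREFIX_LEN] in local_prefixes}
    if not matched:
        return False  # nothing leaves the machine
    # Only the truncated hashes that matched are sent to the server.
    full = fetch_full_hashes({h[:PREFIX_LEN] for h in matched})
    return any(h in full for h in matched)


# Hypothetical list containing one blocked host, plus a stand-in server.
blocked = sha256("evil.example/")
local_prefixes = {blocked[:PREFIX_LEN]}


def fake_server(prefixes: set[bytes]) -> set[bytes]:
    return {blocked} if blocked[:PREFIX_LEN] in prefixes else set()


print(check("http://a.evil.example/x9f3k/page?id=1", local_prefixes, fake_server))
print(check("http://good.example/", local_prefixes, fake_server))
```

The first URL has a random path segment, but the host-only permutation `evil.example/` is still hashed and checked, so a listed host is caught whatever follows it. The second URL matches no local prefix, so the server is never contacted.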

I have used the Safe Browsing API for one of my projects and, if I remember correctly, you are supposed to send the hash of the root domain along with the hash of the URL. Assuming it works similarly for browsers, once the root domain is blacklisted, randomly generated URLs won't be able to get through.
Right, so next time we download something private from a different party, we'll ask them to change their file structure to suit our privacy needs. :)
I am not questioning that sending entire URLs is undesirable; I am asking if the hash solution works (at all).