by dpkonofa
2438 days ago
Except that these hashes likely won't reveal domain names. For example, the hash for reddit.com/r/gifs would be different from the hash for reddit.com/r/funny, so the prefixes would be different for both of them. Unless the requested hashes are saved for every single user, it would be way too computationally expensive for them to get anything useful out of that. Not to mention that any given prefix would match thousands of URLs. Narrowing down which domains those URLs are rooted on would be incredibly hard.
I don't see which part of this you think is that hard. Do you think a map from User -> set[requested URL hashes] is hard to build? Or that building the URL hash -> set[possible domains] map is hard?
Maybe I'm missing a piece of this.
Building something simple to start guessing which domains were visited seems pretty easy. If a user has 10 URL hashes and the same domain shows up in each hash's set of possible domains, they're probably requesting pages on that domain. If you're lucky and all the pages from a domain fall into a single hash, all it takes is two or three hashes from known outbound links showing up to confirm this.
It's not foolproof but hardly infeasible? Or maybe I don't fully understand the algorithm.
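To make the two maps concrete, here is a rough Python sketch of what I have in mind. Everything in it is an assumption for illustration: the 4-byte SHA-256 prefix, the helper names (`url_prefix`, `build_prefix_index`, `guess_domains`), and the tiny URL list standing in for a crawled corpus. It skips URL canonicalization entirely, so treat it as a toy, not as how any real service works.

```python
import hashlib
from collections import Counter, defaultdict

PREFIX_BYTES = 4  # assumption: the client only sends a short hash prefix


def url_prefix(url: str) -> bytes:
    """Return the truncated SHA-256 digest of a URL (hypothetical scheme)."""
    return hashlib.sha256(url.encode("utf-8")).digest()[:PREFIX_BYTES]


def domain_of(url: str) -> str:
    """Naive domain extraction: everything before the first slash."""
    return url.split("/", 1)[0]


def build_prefix_index(known_urls):
    """Map: URL hash prefix -> set of domains that could have produced it."""
    index = defaultdict(set)
    for url in known_urls:
        index[url_prefix(url)].add(domain_of(url))
    return index


def guess_domains(requested_prefixes, index):
    """Rank domains by how many of a user's prefixes they could explain."""
    votes = Counter()
    for prefix in requested_prefixes:
        for domain in index.get(prefix, ()):
            votes[domain] += 1
    return votes.most_common()


# Stand-in for a crawled corpus of known URLs (made-up examples).
known_urls = [
    "reddit.com/r/gifs",
    "reddit.com/r/funny",
    "reddit.com/r/pics",
    "example.org/a",
    "example.org/b",
    "example.net/home",
]
index = build_prefix_index(known_urls)

# Map: user -> set of requested prefixes; here a single user's entry.
user_prefixes = {
    url_prefix(u)
    for u in ("reddit.com/r/gifs", "reddit.com/r/funny", "reddit.com/r/pics")
}
ranking = guess_domains(user_prefixes, index)
print(ranking[0][0])
```

With a real corpus each prefix would map to many candidate domains, so the ranking only becomes meaningful once the same domain keeps turning up across several of a user's prefixes, which is the point I was making above.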