Hacker News
by vinceguidry 4510 days ago
If their goal is to see where its customers are going, hashing the data with MD5 is a very strange way to go about it. Sure, you can break the encryption, but unless their favorite activity is running expensive compute farms, they're not going to bother.
5 comments

I assume if that's the objective they just maintain a list of domains they are looking for and match their hashes to the ones they fetch from the users.

I guess it's a way for them to pretend they're not actually invading the user's privacy, just looking for certain websites. That's pretty weak though.
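
For what it's worth, the matching described above (keep a list of domains, hash them, compare against what clients send) could look roughly like this. This is only a sketch under assumptions: the watchlist entries, the function names, and the idea that clients submit plain MD5 hex digests of lowercased domain names are all made up for illustration, not taken from the actual code.

```python
import hashlib

def md5_hex(domain: str) -> str:
    """MD5 hex digest of a normalized (trimmed, lowercased) domain name."""
    return hashlib.md5(domain.strip().lower().encode("utf-8")).hexdigest()

# Hypothetical watchlist; placeholder names, not real targets.
WATCHLIST = ["cheats.example", "hacks.example"]

# Hash the watchlist once so each client report is a cheap set intersection.
WATCHLIST_HASHES = {md5_hex(d) for d in WATCHLIST}

def flagged_hashes(client_hashes):
    """Return the client-submitted hashes that match a watchlist entry."""
    return sorted(set(client_hashes) & WATCHLIST_HASHES)
```

Anything not on the list stays an opaque hash to whoever runs the comparison, which is presumably the "we're only looking for certain websites" argument.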

Why can't the MD5 be to protect their own list? They have a bunch of URLs they want to block. They don't want to share the list. They MD5 each entry on the list to prevent trivial discovery of these URLs.
That's a good point, actually. If the check is done locally, I'd be curious to know which domains they're looking for. If someone could get the list of hashes, I'm sure it wouldn't take long before someone manages to brute-force them with a rainbow table.
Assuming it's a list of cheating-related websites, you wouldn't even need a rainbow table. You just post the list of hashes to a cheating forum, have forum users compare their DNS entries, and post the hits.
Well, look at it this way, if you were a programmer for Valve looking to solve that problem, wouldn't you think MD5 is decent enough? A stronger algorithm + salt would be slower for no real benefit.
I meant it was a "weak" excuse; you're right that a stronger algorithm wouldn't change much.

In fact, after reading the code and the rest of the thread I'm starting to believe it might be for obfuscation rather than protecting the user's "privacy".

MD5 != encryption.

It's trivial to MD5 a list of common domains, or any other ones of interest, and compare that to the user's list.

Unfortunately the space of all possible web addresses is much (MUCH!) smaller than the space of possible MD5 outputs (modulo the deep web). This makes the hashes much easier to reverse: you just iterate over the ~billions/trillions of incriminating pages you want to search for and collect the hits, rather than brute-forcing MD5 itself.

This absolutely represents a privacy invasion.
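
To make the "iterate over candidates and collect hits" point concrete, here is a minimal sketch of a dictionary lookup. It assumes the observed values are unsalted MD5 hex digests of the exact candidate strings; the function names and the `.example` domains are hypothetical.

```python
import hashlib

def build_lookup(candidates):
    """Map MD5 hex digest -> original string for every candidate."""
    return {hashlib.md5(c.encode("utf-8")).hexdigest(): c for c in candidates}

def reverse_hashes(observed, candidates):
    """Recover the plaintext for each observed hash found in the candidate set."""
    lookup = build_lookup(candidates)
    # Hashes with no matching candidate are simply left unresolved.
    return {h: lookup[h] for h in observed if h in lookup}
```

The cost is one hash per candidate, so it scales with the size of the list you care about, not with the size of the MD5 output space.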

Except their favorite activity is to run expensive computer farms.
Or they have a hash dictionary of domains they're interested in, whether or not you've visited them. That is, maybe they don't care about your midget porn habit, but do care if you visit a competitor's website.