Hacker News
by andrewflnr 2758 days ago
I'm pretty sure that's actually impossible. If someone registers a domain and cert that's essentially a homoglyph attack against a common website, you're basically stuck with heuristics to detect it. You need a global database of targetable domains that supports similarity checking with arbitrary Unicode. You need some kind of fuzzy hash of the website to see whether the website your user is looking at is actually an imitation or just happens to legitimately have a similar name. It will be messy at best.

And yet such a feature would more or less solve the problem for a large majority of users. What you describe as terrible seems like a pretty good feature to me. If a URL is visually indistinguishable from amazon.com yet differs at the byte level, it's probably not legit.
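A rough sketch of that "looks the same, differs at the byte level" check, in Python. It is not how any browser actually does it. The CONFUSABLES table and the KNOWN_DOMAINS allowlist are tiny, hypothetical stand-ins (a real version would presumably lean on the full Unicode confusables data and a much larger curated list), and skeleton and looks_like_spoof are names made up for illustration.

```python
import unicodedata

# Hypothetical, deliberately tiny confusables table (assumption, not a real
# data set): maps a few Cyrillic letters to the Latin letters they resemble.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0455": "s",  # Cyrillic small dze
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
}

# Hypothetical allowlist of "targetable" domains.
KNOWN_DOMAINS = {"amazon.com", "paypal.com"}


def skeleton(domain: str) -> str:
    """Fold a domain to a canonical look-alike form."""
    normalized = unicodedata.normalize("NFKC", domain).lower()
    return "".join(CONFUSABLES.get(ch, ch) for ch in normalized)


def looks_like_spoof(domain: str):
    """Return the known domain being imitated, or None if nothing matches."""
    if domain.lower() in KNOWN_DOMAINS:
        return None  # byte-identical to a known domain, so not an imitation
    folded = skeleton(domain)
    return folded if folded in KNOWN_DOMAINS else None


if __name__ == "__main__":
    print(looks_like_spoof("\u0430mazon.com"))  # first letter is Cyrillic
    print(looks_like_spoof("amazon.com"))
```

This only catches exact look-alikes after folding. Near misses (an extra letter, "rn" for "m") would need a fuzzier comparison on top.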

If I were implementing it, I would render the domain text and then check how much its pixels differ from those of the nearest "known" domain. We used to do something similar with render tests, where a bit of noise had to be tolerated.
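To make the render-and-diff idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: GLYPHS is a made-up 3x5 bitmap font (a real version would rasterize with the font the browser actually displays), the 5% threshold is arbitrary, and render, pixel_difference and nearest_known are hypothetical names.

```python
# Hypothetical 3x5 bitmap font: each glyph is five rows of three pixels.
# This toy font is an assumption; it stands in for real font rasterization.
GLYPHS = {
    "a": ("010", "101", "111", "101", "101"),
    "\u0430": ("010", "101", "111", "101", "101"),  # Cyrillic a, drawn identically
    "c": ("111", "100", "100", "100", "111"),
    "e": ("111", "100", "111", "100", "111"),
    "o": ("111", "101", "101", "101", "111"),
    ".": ("000", "000", "000", "000", "010"),
}
BLANK = ("111",) * 5  # fallback block for characters the toy font lacks


def render(text):
    """Render text to five rows of '0'/'1' pixels, one blank column between glyphs."""
    return ["0".join(GLYPHS.get(ch, BLANK)[row] for ch in text) for row in range(5)]


def pixel_difference(a, b):
    """Fraction of pixels that differ; 1.0 when the rendered widths cannot match."""
    if len(a) != len(b):
        return 1.0
    rows_a, rows_b = render(a), render(b)
    total = sum(len(row) for row in rows_a)
    if total == 0:
        return 0.0
    differing = sum(
        pa != pb
        for row_a, row_b in zip(rows_a, rows_b)
        for pa, pb in zip(row_a, row_b)
    )
    return differing / total


def nearest_known(domain, known, threshold=0.05):
    """Return (known_domain, difference) for the closest look-alike, else None."""
    candidates = [(pixel_difference(domain, k), k) for k in known if k != domain]
    if not candidates:
        return None
    diff, best = min(candidates)
    return (best, diff) if diff <= threshold else None


if __name__ == "__main__":
    print(nearest_known("\u0430ce.co", ["ace.co"]))  # Cyrillic first letter
```

The threshold plays the role of the noise tolerance from the render tests: set it too loose and legitimately similar names get flagged, too tight and slightly different glyphs slip through.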

Don't let perfect be the enemy of good.

Whatever happened to the Web of Trust thing? We could have a curated one so that an extension can indicate:

- whether the domain is substantially similar to a trusted one
- recent data breaches
- whether the site has been known to sell data

Those could be indicated by different, intuitive colors:

- red - high likelihood of phishing/malware
- yellow - recent data breach; user intervention required, but the service itself isn't fraudulent
- green - reasonably safe
- green padlock - trusted
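The signals and colors above could map onto something like the following Python sketch. The field names, the precedence order, and the LOCAL_DB contents are all assumptions. In particular, the list above doesn't say which color a site known to sell data should get, so this sketch treats it as yellow. Lookups go against a local dictionary, standing in for a downloaded database, so nothing is sent upstream.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    RED = "red"                      # high likelihood of phishing/malware
    YELLOW = "yellow"                # recent breach; user should intervene
    GREEN = "green"                  # reasonably safe
    GREEN_PADLOCK = "green padlock"  # trusted


@dataclass(frozen=True)
class SiteRecord:
    # Hypothetical fields, one per signal from the list above.
    similar_to_trusted: bool = False
    recent_breach: bool = False
    sells_data: bool = False
    trusted: bool = False


def rate(record: SiteRecord) -> Rating:
    """Pick a color; the precedence order here is an assumption."""
    if record.similar_to_trusted and not record.trusted:
        return Rating.RED
    if record.recent_breach or record.sells_data:
        return Rating.YELLOW  # assumption: data selling is shown as yellow
    if record.trusted:
        return Rating.GREEN_PADLOCK
    return Rating.GREEN


# Hypothetical local database; example.* names are placeholders.
LOCAL_DB = {
    "example.com": SiteRecord(trusted=True),
    "examp1e.com": SiteRecord(similar_to_trusted=True),
    "breached.example": SiteRecord(trusted=True, recent_breach=True),
    "plain.example": SiteRecord(),
}


def rate_domain(domain: str, db=LOCAL_DB):
    """Look up a domain locally; unrated domains return None."""
    record = db.get(domain)
    return rate(record) if record is not None else None


if __name__ == "__main__":
    for name in LOCAL_DB:
        print(name, rate_domain(name).value)
```

Returning None for unrated sites, rather than defaulting to green, avoids implying that a site nobody has looked at is safe.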

It would be awesome to get all major browser vendors on board to ship it by default, and make sure that data is never sent upstream (download a database).

I loved MyWot! I was one of the earlier users, from around 2007 until 2009 or so. It helped teach intuition about sketchy, dangerous, and bloated web pages. The community was small and plenty of sites were unrated, though a surprising number still had ratings (and I was fairly active myself).

To answer your question, privacy addons started selling our data. I remember Adblock Plus added "Acceptable Ads" around 2012. MyWot redesigned in 2013. Times were changing. Sure enough, in 2016 they were found selling sensitive user data. That wasn't a surprise, since it's the reason I had left years earlier.

These days, I'd rather reduce my browser dependency. I hope the community finds a way to filter the 1% of useful data on the internet into like a .txt file, or something that doesn't make me solve puzzles to grep.