by dolzenko 3351 days ago
Isn't it possible to detect a mixture of Cyrillic and English characters, and so on?

This particular example was chosen so that the entire name uses the Cyrillic Unicode range:

https://en.wikipedia.org/wiki/Cyrillic_(Unicode_block)

From that, you can see that domains composed exclusively of characters which are homoglyphic, or nearly so, with their ASCII equivalents would be susceptible to this:

AaBCcEeFHhIiJjKMOoPpSsTXxY

That's not all the letters, but there are plenty of English domains which can be written to look the same entirely in Cyrillic.
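
To illustrate why a mixed-script check alone would not catch this, here is a rough Python sketch. The helper names (scripts_in, is_mixed_script) are made up for illustration, and the script of a character is approximated by the first word of its Unicode name, which is a heuristic and not the real Unicode Script property.

```python
import unicodedata

def scripts_in(label):
    """Return the set of script names used by the letters in label.

    Heuristic: the first word of the Unicode character name
    (e.g. 'LATIN', 'CYRILLIC') stands in for the script.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

def is_mixed_script(label):
    """True if the label draws its letters from more than one script."""
    return len(scripts_in(label)) > 1

latin = "epic"                      # plain ASCII
cyrillic = "\u0435\u0440\u0456\u0441"  # look-alike written entirely in Cyrillic
mixed = "\u0435pic"                 # Cyrillic first letter, Latin rest

for label in (latin, cyrillic, mixed):
    print(ascii(label), sorted(scripts_in(label)), is_mixed_script(label))
```

Only the third label mixes scripts. The all-Cyrillic one is single-script, so a check of this kind would let it through.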

When I generate all the English words that are in a UK dictionary that use the letters "acehijopsx" exclusively, and then test which are reachable domains, I come up with only about 500. Other than epic.com, the most significant ones are chase.com, sap.com, soap.com and sex.com, I think.
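
The word-filtering step might look roughly like the Python sketch below. The specific Cyrillic code points chosen for each letter are my assumption (the comment only lists the Latin letters), the sample word list is a stand-in for a real UK dictionary, and the check for which domains are reachable is not reproduced.

```python
# Lowercase Latin letters from the comment, each mapped to a look-alike
# Cyrillic code point. The choice of code points is an assumption.
HOMOGLYPHS = {
    "a": "\u0430",
    "c": "\u0441",
    "e": "\u0435",
    "h": "\u04bb",
    "i": "\u0456",
    "j": "\u0458",
    "o": "\u043e",
    "p": "\u0440",
    "s": "\u0455",
    "x": "\u0445",
}

def is_spoofable(word):
    """True if every letter of the word has a Cyrillic look-alike."""
    return bool(word) and all(ch in HOMOGLYPHS for ch in word.lower())

def to_cyrillic(word):
    """Rewrite a spoofable word using only Cyrillic code points."""
    return "".join(HOMOGLYPHS[ch] for ch in word.lower())

def spoofable_words(words):
    """Keep only the words that use the letters 'acehijopsx' exclusively."""
    return [w for w in words if is_spoofable(w)]

# Stand-in for a dictionary file; a real run would read one word per line.
sample = ["chase", "sap", "soap", "sex", "epic", "google", "bank"]
for word in spoofable_words(sample):
    print(word, ascii(to_cyrillic(word)))
```

Words containing any other letter, such as "google" or "bank", drop out because at least one of their letters has no entry in the mapping.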