Hacker News new | ask | show | jobs
by cgranier 4223 days ago
Exactly. I need to turn them into something meaningful.
1 comments

Well, there are two separate problems.

First is phonetic similarity. This is mostly just to allow users to be able to understand each other and to help automatically catch alternate latinizations so you find out "Hey, he already registered under a latinized-spelling name".

The second is glyph similarity. This is the security concern where you have two glyphs that are graphically similar but phonetically completely different, but can easily be mistaken for each other. These glyphs are used to trick and confuse users. The first kind of check won't catch these, but they're the reason we don't have unicode in domain names.

Probably a correct system would have a very liberal interpretation of glyph similarity and would treat strings as matched when they contain similar glyphs.