by raffy 1398 days ago
Unfortunately, this only covers single-character confusables. An enormous number of combinations are missing (ѐ [450] vs è [E8]). The confusable matching isn't even symmetric (A is confusable with B, but B isn't confusable with A).
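To make the ѐ [450] vs è [E8] case concrete, here is a rough Python sketch. It assumes a toy one-entry mapping (`TOY_CONFUSABLES`, a made-up stand-in for a real confusables table) and a hypothetical helper `toy_skeleton`; it is not how any particular tool works. The two precomposed characters differ as code points, but they compare equal once each is decomposed and its base letter is mapped on its own.

```python
import unicodedata

# Toy stand-in for a confusables table (assumption: one entry only,
# Cyrillic small letter ie U+0435 -> Latin small letter e).
TOY_CONFUSABLES = {"\u0435": "e"}


def toy_skeleton(text: str) -> str:
    """Decompose to NFD, then map each code point through the toy table."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in decomposed)


cyrillic_ie_grave = "\u0450"  # ѐ, decomposes to U+0435 U+0300
latin_e_grave = "\u00e8"      # è, decomposes to U+0065 U+0300

# Different code points, so a plain comparison fails.
print(cyrillic_ie_grave == latin_e_grave)
# After decomposition the single-character mapping is enough.
print(toy_skeleton(cyrillic_ie_grave) == toy_skeleton(latin_e_grave))
```

A table that only lists single precomposed characters would need a separate entry for every base-plus-accent combination, which is presumably where the missing cases come from.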

I've been developing a normalization library for Ethereum Name Service: https://adraffy.github.io/ens-normalize.js/test/resolver.htm...

3 comments

There are the NFKC and NFKD normalizations, which can resolve most of the confusing things so that, e.g., Ⅵ will become VI, Å (U+212B, the angstrom sign) will become Å (U+00C5), etc., although they don’t resolve B and В (Cyrillic) to the same character.
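A quick way to check those three cases is Python's standard `unicodedata` module. The helper name `nfkc` and the variable names are just for illustration:

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode compatibility composition (NFKC)."""
    return unicodedata.normalize("NFKC", text)


roman_six = "\u2165"      # Ⅵ ROMAN NUMERAL SIX
angstrom_sign = "\u212b"  # Å ANGSTROM SIGN
latin_b = "B"             # U+0042
cyrillic_ve = "\u0412"    # В CYRILLIC CAPITAL LETTER VE

print(nfkc(roman_six))                          # two ASCII letters: VI
print(nfkc(angstrom_sign) == "\u00c5")          # folded into U+00C5
print(nfkc(latin_b) == nfkc(cyrillic_ve))       # still different characters
```

Normalization only folds characters that Unicode defines as equivalent. Cross-script lookalikes such as Latin B and Cyrillic В are distinct letters, so they need a separate confusables mapping.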
I wonder at what point you need to give up and just OCR a rendering
Honestly, for a brand new name service I would just stick to ASCII to sidestep the whole issue.
Because fuck those people who want to use non-English names, amirite?

Seriously, it's 2022, we have better solutions than ASCII by now. And for what it's worth, even in ASCII some chars like l, I and 1 are quite confusable.

DNS support is nowhere near Unicode. At best we get local-language support for non-ASCII domains, but everyone else will see them as Punycode. Of course, phishing is one good reason why it's this way.

> Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.

https://en.wikipedia.org/wiki/Internationalized_domain_name

https://en.wikipedia.org/wiki/Punycode
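For a concrete picture of what gets stored, here is a small sketch using Python's built-in `punycode` and `idna` codecs. The domain `bücher.example` is only an illustrative name, and the `idna` codec implements the older IDNA 2003 rules, so treat it as a demonstration rather than a reference for current registries.

```python
# Non-ASCII label, written with an escape so the source stays ASCII.
label = "b\u00fccher"  # bücher

# Raw Punycode of a single label (no "xn--" prefix yet).
raw_punycode = label.encode("punycode")

# Full IDNA encoding of a domain: each non-ASCII label gets the "xn--" prefix.
ascii_domain = (label + ".example").encode("idna")

# Decoding the ASCII form recovers the Unicode name for display.
round_trip = ascii_domain.decode("idna")

print(raw_punycode)
print(ascii_domain)
print(round_trip)
```

The ASCII form is what goes over the wire; whether a user sees the Unicode name or the `xn--` form is up to the client's display policy.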

DNS dates from 1983 and we're stuck with its limitations, but the GP is building a new system.