HN Mirror



	by raffy 1398 days ago
	Unfortunately, this only covers single-character confusables. An enormous number of permutations are missing (e.g. ѐ U+0450 vs è U+00E8). The confusable matching isn't even symmetric (A confuses B, but B doesn't confuse A). I've been developing a normalization library for Ethereum Name Service: https://adraffy.github.io/ens-normalize.js/test/resolver.htm...
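Here is a minimal Python sketch of the skeleton-based matching described in Unicode UTS #39. It decomposes with NFD, maps each character to a prototype, then decomposes again. The `PROTOTYPE` table is a hypothetical, hand-picked subset for illustration only. It is not data from the real confusables.txt. Two strings count as confusable exactly when their skeletons are equal, so the relation is symmetric by construction. The NFD step also lets a one-character mapping (Cyrillic е → Latin e) catch the ѐ vs è case.

```python
import unicodedata

# Hypothetical, hand-picked subset of a confusables table (NOT the real
# Unicode confusables.txt data): each source character maps to a prototype.
PROTOTYPE = {
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE looks like Latin e
    "\u0430": "a",  # CYRILLIC SMALL LETTER A looks like Latin a
    "\u0412": "B",  # CYRILLIC CAPITAL LETTER VE looks like Latin B
    "1": "l",       # digit one vs lowercase L
    "I": "l",       # capital i vs lowercase L
}

def skeleton(s: str) -> str:
    """UTS #39-style skeleton: NFD, map each character, then NFD again."""
    decomposed = unicodedata.normalize("NFD", s)
    mapped = "".join(PROTOTYPE.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)

def confusable(a: str, b: str) -> bool:
    # Equality of skeletons is an equivalence relation, so this check is
    # reflexive, symmetric and transitive regardless of argument order.
    return skeleton(a) == skeleton(b)

print(confusable("\u0450", "\u00e8"))  # ѐ vs è -> True
print(confusable("I1l", "lll"))        # -> True
```

Here U+0450 decomposes to U+0435 + U+0300 and U+00E8 decomposes to "e" + U+0300, so both reduce to the same skeleton.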


dhosek 1398 days ago

There are the NFKC and NFKD normalizations, which resolve many of the confusable forms: e.g., Ⅵ (U+2165) becomes VI, and Å (U+212B ANGSTROM SIGN) becomes Å (U+00C5) under NFKC. However, they don't map Latin B and Cyrillic В (U+0412) to the same character.
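These normalizations are available in Python's standard `unicodedata` module. The short sketch below applies NFKC to the three characters from the comment. The `nfkc` helper is just an illustrative wrapper. Ⅵ folds to "VI" and the Angstrom sign folds to U+00C5. Cyrillic В passes through unchanged because it is a separate letter, not a compatibility variant.

```python
import unicodedata

def nfkc(s: str) -> str:
    """Compatibility decomposition followed by canonical composition."""
    return unicodedata.normalize("NFKC", s)

examples = [
    "\u2165",  # ROMAN NUMERAL SIX
    "\u212B",  # ANGSTROM SIGN
    "\u0412",  # CYRILLIC CAPITAL LETTER VE
]

for ch in examples:
    out = nfkc(ch)
    # Show the resulting code points so visually identical output is distinguishable.
    print(ch, "->", out, [f"U+{ord(c):04X}" for c in out])
```

NFKD would instead leave the Angstrom sign as "A" followed by U+030A COMBINING RING ABOVE.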


petesergeant 1398 days ago

I wonder at what point you need to give up and just OCR a rendering.


leni536 1398 days ago

Honestly, for a brand new name service I would just stick to ASCII to sidestep the whole issue.


thematrixturtle 1398 days ago

Because fuck those people who want to use non-English names, amirite?

Seriously, it's 2022, we have better solutions than ASCII by now. And for what it's worth, even in ASCII some chars like l, I and 1 are quite confusable.


cmroanirgo 1398 days ago

DNS support is nowhere near Unicode. At best we get local-language support for non-ASCII domains, but everyone else will see them as Punycode. Of course, phishing is one good reason why it's like this.

> Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.

https://en.wikipedia.org/wiki/Internationalized_domain_name

https://en.wikipedia.org/wiki/Punycode
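To see the Punycode transcription the quote describes, you can use Python's built-in `idna` codec. It implements the older IDNA 2003 rules (RFC 3490/3492), not IDNA 2008, and converts a name to its ASCII form. The `to_dns_ascii` helper is a hypothetical wrapper for illustration. The Cyrillic example shows that a lookalike name is stored as a visibly different `xn--` label.

```python
def to_dns_ascii(domain: str) -> str:
    """Convert a Unicode domain to the ASCII form stored in DNS (IDNA 2003)."""
    return domain.encode("idna").decode("ascii")

print(to_dns_ascii("bücher.example"))  # xn--bcher-kva.example

# Cyrillic "а" (U+0430) in place of Latin "a": it renders much like
# "apple.com", but the first label becomes an xn-- Punycode label.
print(to_dns_ascii("\u0430pple.com"))
```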


thematrixturtle 1398 days ago

DNS dates from 1983 and we're stuck with its limitations, but the GP is building a new system.
