by bazzargh 3498 days ago
Nope. The browser is behaving sensibly, since you can't register that domain. It's applying the same rules that the registrars do.

ICANN require that registries follow RFC 3491 (Nameprep) and related RFCs before allowing a name to be registered: https://www.icann.org/resources/unthemed-pages/idn-guideline... . Among other things, RFC 3491 applies NFKC normalization and case-folding:

    irb(main):016:0> "\ufeff\uff30\uff21\uff39\uff30\uff21\uff2c"
    => "ＰＡＹＰＡＬ"
    irb(main):017:0> "\ufeff\uff30\uff21\uff39\uff30\uff21\uff2c".unicode_normalize(:nfkc).downcase
    => "paypal"
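
As a rough sketch of the "among other things": RFC 3491 also maps some characters, U+FEFF included, to nothing before normalizing. The helper below is hypothetical (`rough_nameprep` and `MAPPED_TO_NOTHING` are names made up here, not from any library). It uses whatever Unicode tables your Ruby ships with rather than the Unicode 3.2 tables the RFC specifies, and it skips the prohibited-character and bidi checks, so treat it as an approximation only.

```ruby
# Characters that Nameprep (RFC 3454 table B.1) maps to nothing.
MAPPED_TO_NOTHING =
  /[\u00AD\u034F\u1806\u180B-\u180D\u200B-\u200D\u2060\uFE00-\uFE0F\uFEFF]/

# Hypothetical approximation of the mapping and normalization steps only.
def rough_nameprep(label)
  label
    .gsub(MAPPED_TO_NOTHING, "")  # drop zero-width characters and similar
    .unicode_normalize(:nfkc)     # fullwidth letters become ASCII letters
    .downcase(:fold)              # Unicode case folding
end

fullwidth = "\ufeff\uff30\uff21\uff39\uff30\uff21\uff2c"
puts rough_nameprep(fullwidth)
```

With that, the fullwidth string and plain "paypal" compare equal, which is the comparison a registry effectively makes.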

Interesting. So, out of interest, why is the same not being applied for ɢ? (When I ran it through Python's unidecode I got the roman symbol all the same).
Because 'small capital g' doesn't have a compatibility decomposition to G, but wide letter P does have a compatibility decomposition to 'normal' P. Unicode normalization kills large classes of homograph attacks but by no means all. Conventions over mixing scripts from different languages stop some more, but there's no single answer.
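
To see the difference for yourself, here is a small sketch that compares each character with its NFKC form. `nfkc_changes?` is a helper name made up for this example; `String#unicode_normalize` is standard Ruby.

```ruby
# Hypothetical helper: true when NFKC rewrites the character.
def nfkc_changes?(char)
  char.unicode_normalize(:nfkc) != char
end

samples = {
  "fullwidth P (U+FF30)"     => "\uff30",
  "small capital G (U+0262)" => "\u0262"
}

samples.each do |name, char|
  normalized = char.unicode_normalize(:nfkc)
  status = normalized == char ? "unchanged" : "becomes #{normalized.inspect}"
  puts "#{name}: #{status}"
end
```

Only the fullwidth letter is rewritten, so normalization alone leaves ɢ looking like a G.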
Doing the "ɢ" conversion here[0], I get

  xn--1na
[0]https://www.punycoder.com/
The problem is that the RFCs aren't restrictive enough, partly because the IETF doesn't have much authority over registrars. The domain name rules really ought to be something like "one script, plus numbers, in a domain name part". But this runs into such things as the tendency in Japan to mix kanji with English words. Then there's the whole right-to-left mark business, which has to coexist with left-to-right TLDs.
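
A minimal sketch of what a "one script, plus numbers" check could look like, assuming a deliberately short, hypothetical list of scripts (`SCRIPTS`, `scripts_in` and `single_script?` are names invented here; a real rule would need the full Unicode script data and a policy for shared characters):

```ruby
# Hypothetical, incomplete list of scripts to test against.
SCRIPTS = {
  latin:    /\p{Latin}/,
  cyrillic: /\p{Cyrillic}/,
  greek:    /\p{Greek}/,
  han:      /\p{Han}/,
  hiragana: /\p{Hiragana}/,
  katakana: /\p{Katakana}/
}.freeze

# Scripts used in one domain label, ignoring ASCII digits and hyphens.
def scripts_in(label)
  label.each_char
       .reject { |ch| ch.match?(/[0-9-]/) }
       .map { |ch| SCRIPTS.find { |_, re| ch.match?(re) }&.first || :other }
       .uniq
end

def single_script?(label)
  scripts_in(label).size <= 1
end

puts single_script?("paypal")                   # all Latin
puts single_script?("p\u0430ypal")              # Latin plus Cyrillic U+0430
puts single_script?("\u65e5\u672c\u8a9eenglish") # kanji plus English
```

The last case is the Japanese mixing problem mentioned above: a strict rule rejects it. Note also that ɢ (U+0262) is itself a Latin-script character, so a rule like this would not flag it.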
So if I mix ASCII with obscure UTF8 characters like the domain in OP's post I can register it then?

Something like www.paypal.com --> www.xn--pal-n76secrc.com

No. When you apply NFKC normalization to that string, you get just 'paypal', which PayPal have already registered. You can try that here: http://mct.verisign-grs.com/ - notice how the output is not the same as that of some online converters based on punycode.js, because punycode.js doesn't have nameprep support: https://github.com/bestiejs/punycode.js/issues/40