by ergot 3499 days ago
Most browsers should forcibly transcribe this to Punycode[1]:

    https://www.𝙿𝙰𝚈𝙿𝙰𝙻.com/
And yet when I paste this into the latest Firefox it redirects to https://www.paypal.com/

No 301 redirects or anything; the browser just treats it like ASCII, which it clearly is not. It actually happens to be Fullwidth:

https://en.wikipedia.org/wiki/Fullwidth_form

Serious phishing opportunity if you ask me!

[1] https://en.wikipedia.org/wiki/Punycode

3 comments

Nope. The browser is behaving sensibly, since you can't register that domain. It's applying the same rules that the registrars do.

ICANN require that registries follow RFC3491 and related RFCs for name prep before allowing a name to be registered https://www.icann.org/resources/unthemed-pages/idn-guideline... . What that one does is (among other things) NFKC normalization and case-folding:

    irb(main):016:0> "\ufeff\uff30\uff21\uff39\uff30\uff21\uff2c"
    => "ＰＡＹＰＡＬ"
    irb(main):017:0> "\ufeff\uff30\uff21\uff39\uff30\uff21\uff2c".unicode_normalize(:nfkc).downcase
    => "paypal"
Interesting. So, out of interest, why is the same not being applied for ɢ? (When I ran it through Python's unidecode I got the roman symbol all the same).
Because 'small capital g' doesn't have a compatibility decomposition to G, but wide letter P does have a compatibility decomposition to 'normal' P. Unicode normalization kills large classes of homograph attacks but by no means all. Conventions against mixing scripts from different languages stop some more, but there's no single answer.
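A quick way to check this yourself, using the same unicode_normalize call as the irb session above. This is only a sketch; nfkc_changes? is a made-up helper name, not a library method.

```ruby
# Does NFKC normalization alter the string at all?
# nfkc_changes? is a hypothetical helper for illustration only.
def nfkc_changes?(str)
  str.unicode_normalize(:nfkc) != str
end

fullwidth_p = "\uff30"   # FULLWIDTH LATIN CAPITAL LETTER P
small_cap_g = "\u0262"   # LATIN LETTER SMALL CAPITAL G

puts nfkc_changes?(fullwidth_p)             # true: folds to plain "P"
puts nfkc_changes?(small_cap_g)             # false: nothing to decompose to
puts fullwidth_p.unicode_normalize(:nfkc)   # P
```

So the fullwidth letter collapses into ASCII before any lookup happens, while the small capital survives as a distinct character and ends up Punycode-encoded.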
Doing the "ɢ" conversion here[0], I get

  xn--1na
[0]https://www.punycoder.com/
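For anyone who would rather not trust an online converter, here is a rough Ruby sketch of the encoding step, following the pseudocode in RFC 3492. It deliberately does no nameprep or NFKC first, so it shows only the raw Punycode stage. The method names (digit, adapt, punycode_encode) are my own, not from any library.

```ruby
# Minimal Punycode encoder after the RFC 3492 pseudocode.
# Parameters from the RFC: base 36, tmin 1, tmax 26, skew 38, damp 700.
# No nameprep/NFKC is applied; input is encoded exactly as given.

def digit(d)
  # 0..25 -> "a".."z", 26..35 -> "0".."9"
  (d < 26 ? d + 97 : d - 26 + 48).chr
end

def adapt(delta, numpoints, first)
  delta = first ? delta / 700 : delta / 2
  delta += delta / numpoints
  k = 0
  while delta > (35 * 26) / 2
    delta /= 35
    k += 36
  end
  k + 36 * delta / (delta + 38)
end

def punycode_encode(label)
  cps = label.codepoints
  basic = cps.select { |c| c < 128 }
  out = basic.pack("U*")
  out += "-" unless basic.empty?
  n = 128
  delta = 0
  bias = 72
  h = b = basic.size
  while h < cps.size
    # Next code point to handle: the smallest one not yet encoded.
    m = cps.select { |c| c >= n }.min
    delta += (m - n) * (h + 1)
    n = m
    cps.each do |c|
      delta += 1 if c < n
      next unless c == n
      # Emit delta as a variable-length base-36 integer.
      q = delta
      k = 36
      loop do
        t = (k - bias).clamp(1, 26)
        break if q < t
        out += digit(t + (q - t) % (36 - t))
        q = (q - t) / (36 - t)
        k += 36
      end
      out += digit(q)
      bias = adapt(delta, h + 1, h == b)
      delta = 0
      h += 1
    end
    delta += 1
    n += 1
  end
  out
end

puts "xn--" + punycode_encode("\u0262")   # xn--1na
```

The "xn--" prefix is just the marker that says the rest of the label is Punycode.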
The problem is that the RFCs aren't restrictive enough, partly because the IETF doesn't have much authority over registrars. The domain name rules really ought to be something like "one script, plus numbers, in a domain name part". But this runs into such things as the tendency in Japan to mix kanji with English words. Then there's the whole right-to-left mark business, which has to coexist with left-to-right TLDs.
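To make the "one script, plus numbers" idea concrete, here is a hedged Ruby sketch using the regex engine's Unicode script properties. The script list is an arbitrary subset I picked for illustration and the method names are hypothetical; it is not any registry's actual policy.

```ruby
# Hypothetical "one script, plus digits and hyphen" check for a single label.
# The set of scripts below is an illustrative subset, not a real policy.
def scripts_in(label)
  known = {
    latin: /\p{Latin}/, cyrillic: /\p{Cyrillic}/, greek: /\p{Greek}/,
    han: /\p{Han}/, hiragana: /\p{Hiragana}/, katakana: /\p{Katakana}/
  }
  letters = label.each_char.reject { |ch| ch.match?(/[0-9-]/) }
  letters.map { |ch|
    found = known.find { |_, re| ch.match?(re) }
    found ? found.first : :other
  }.uniq
end

def single_script?(label)
  scripts_in(label).size <= 1
end

puts single_script?("paypal")                    # true
puts single_script?("p\u0430ypal")               # false: Cyrillic "a" among Latin
puts single_script?("\u65e5\u672c\u8a9eshop")    # false: Han plus Latin
```

The last case is exactly the Japanese problem: a strict rule rejects a label that a real business might reasonably want.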
So if I mix ASCII with obscure UTF8 characters like the domain in OP's post I can register it then?

Something like www.ｐａｙpal.com --> www.xn--pal-n76secrc.com

No. When you apply NFKC normalization to that string, you get just 'paypal', so Paypal have already registered the result. You can try that here: http://mct.verisign-grs.com/ - notice how the output is not the same as some online converters based on punycode.js, because that doesn't have nameprep support https://github.com/bestiejs/punycode.js/issues/40
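A small Ruby sketch of that step, assuming the label in question is fullwidth p, a, y followed by ASCII "pal" (that is the input which Punycode-encodes to pal-n76secrc). Note that NFKC plus downcase is only an approximation of nameprep, which also does its own mapping and prohibited-character checks.

```ruby
# Approximate the nameprep outcome with NFKC + downcase.
# Real nameprep (RFC 3491) does more; this covers only the folding step.
mixed = "\uff50\uff41\uff59pal"   # fullwidth p, a, y, then ASCII "pal"
prepped = mixed.unicode_normalize(:nfkc).downcase

puts mixed.ascii_only?     # false
puts prepped               # paypal
puts prepped.ascii_only?   # true: nothing left to Punycode-encode
```

Once the label is pure ASCII there is no xn-- form at all, which is why a converter without nameprep gives a different answer from the registry tool.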
Those characters are not fullwidth.

This:

www.ｐａｙｐａｌ.com

or this:

www.ＰＡＹＰＡＬ.com

would be fullwidth.

What you actually posted are characters in the Mathematical Alphanumeric Symbols block. Specifically:

𝙿 — U+1D67F MATHEMATICAL MONOSPACE CAPITAL P

𝙰 — U+1D670 MATHEMATICAL MONOSPACE CAPITAL A

𝚈 — U+1D688 MATHEMATICAL MONOSPACE CAPITAL Y

𝙻 — U+1D67B MATHEMATICAL MONOSPACE CAPITAL L
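If you want to check the code points rather than take my word for it, here is a short Ruby sketch. The block ranges are the standard ones for Mathematical Alphanumeric Symbols (U+1D400 to U+1D7FF) and Halfwidth and Fullwidth Forms (U+FF00 to U+FFEF); the variable names are just for illustration.

```ruby
# The six characters from the original post, written as escapes.
op = "\u{1D67F}\u{1D670}\u{1D688}\u{1D67F}\u{1D670}\u{1D67B}"

puts op.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
# U+1D67F U+1D670 U+1D688 U+1D67F U+1D670 U+1D67B

in_math_block = op.codepoints.all? { |cp| (0x1D400..0x1D7FF).cover?(cp) }
in_fullwidth  = op.codepoints.any? { |cp| (0xFF00..0xFFEF).cover?(cp) }
puts in_math_block   # true
puts in_fullwidth    # false

# These also carry compatibility decompositions, so NFKC folds them to ASCII.
puts op.unicode_normalize(:nfkc).downcase   # paypal
```

Different block, same outcome: normalization maps them to plain letters, which is why the browser lands on the real site.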

How is that a phishing opportunity if it redirects you to the real website?