by ergot 3499 days ago

Most browsers should forcibly transcribe this to Punycode[1]:

    https://www.𝙿𝙰𝚈𝙿𝙰𝙻.com/

And yet when I paste this into the latest Firefox it redirects to https://www.paypal.com/

There are no 301 redirects or anything; the browser just treats it like ASCII, which it clearly is not. It actually happens to be Fullwidth:

https://en.wikipedia.org/wiki/Fullwidth_form

Serious phishing opportunity if you ask me!

[1] https://en.wikipedia.org/wiki/Punycode

3 comments

bazzargh 3499 days ago

Nope. The browser is behaving sensibly, since you can't register that domain. It's applying the same rules that the registrars do.

ICANN requires that registries follow RFC 3491 (Nameprep) and related RFCs before allowing a name to be registered: https://www.icann.org/resources/unthemed-pages/idn-guideline... . Among other things, Nameprep applies NFKC normalization and case-folding:

    irb(main):016:0> "\ufeff\uff30\uff21\uff39\uff30\uff21\uff2c"
    => "ＰＡＹＰＡＬ"
    irb(main):017:0> "\ufeff\uff30\uff21\uff39\uff30\uff21\uff2c".unicode_normalize(:nfkc).downcase
    => "paypal"

2T1Qka0rEiPr 3498 days ago

Interesting. So why is the same not being applied to ɢ? (When I ran it through Python's unidecode I got the Roman letter all the same.)

bazzargh 3498 days ago

Because 'small capital G' (ɢ) doesn't have a compatibility decomposition to G, but fullwidth P does have a compatibility decomposition to 'normal' P. Unicode normalization kills large classes of homograph attacks, but by no means all. Conventions against mixing scripts from different languages stop some more, but there's no single answer.
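
The difference can be checked with the same `unicode_normalize` call used earlier in the thread. This is only a minimal sketch, assuming Ruby 2.2 or later; `changed_by_nfkc?` is a hypothetical helper name, not part of any library:

```ruby
# Hypothetical helper: true when NFKC normalization would alter the string.
def changed_by_nfkc?(text)
  text.unicode_normalize(:nfkc) != text
end

# U+0262 LATIN LETTER SMALL CAPITAL G has no compatibility decomposition,
# so NFKC leaves it alone.
puts changed_by_nfkc?("\u0262")

# U+FF30 FULLWIDTH LATIN CAPITAL LETTER P has a compatibility
# decomposition to ASCII "P", so NFKC rewrites it.
puts changed_by_nfkc?("\uFF30")
puts "\uFF30".unicode_normalize(:nfkc) == "P"
```

Characters in the first category survive normalization, which is why they remain usable as homographs.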

7Z7 3498 days ago

Doing the "ɢ" conversion here[0], I get

  xn--1na

[0] https://www.punycoder.com/

Animats 3498 days ago

The problem is that the RFCs aren't restrictive enough, partly because the IETF doesn't have much authority over registrars. The domain name rules really ought to be something like "one script, plus numbers, in a domain name part". But this runs into such things as the tendency in Japan to mix kanji with English words. Then there's the whole right-to-left mark business, which has to coexist with left-to-right TLDs.
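
As a rough illustration of what a "one script, plus numbers" rule could look like, here is a sketch using Ruby's regex script properties. It is not any registry's actual policy: the script list is deliberately short, and `SCRIPTS`, `scripts_in`, and `single_script?` are hypothetical names chosen for this example.

```ruby
# A small, incomplete table of scripts to recognise (an assumption for
# this sketch; a real policy would need to cover far more scripts).
SCRIPTS = {
  latin:    /\p{Latin}/,
  cyrillic: /\p{Cyrillic}/,
  greek:    /\p{Greek}/,
  han:      /\p{Han}/,
  hiragana: /\p{Hiragana}/,
  katakana: /\p{Katakana}/
}.freeze

# List the distinct scripts used in a label, ignoring ASCII digits and hyphens.
# Characters matching none of the patterns are reported as :other.
def scripts_in(label)
  label.each_char
       .reject { |c| c.match?(/[0-9-]/) }
       .map { |c| (SCRIPTS.find { |_name, pattern| c.match?(pattern) } || [:other]).first }
       .uniq
end

# The proposed rule: at most one script per domain name part.
def single_script?(label)
  scripts_in(label).size <= 1
end

puts single_script?("paypal")            # all Latin
puts single_script?("p\u0430ypal")       # Latin with a Cyrillic "а" (U+0430)
puts single_script?("\u65E5\u672Cshop")  # kanji mixed with an English word
```

The last case shows the tension mentioned above: a strict rule rejects the Cyrillic homograph, but it also rejects a kanji-plus-English label.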

ergot 3498 days ago

So if I mix ASCII with obscure UTF8 characters like the domain in OP's post I can register it then?

Something like www.ｐａｙpal.com --> www.xn--pal-n76secrc.com

bazzargh 3498 days ago

No. When you apply NFKC normalization to that string, you get just 'paypal', so PayPal have already registered the result. You can try that here: http://mct.verisign-grs.com/ - notice that its output differs from that of some online converters based on punycode.js, because punycode.js doesn't have nameprep support: https://github.com/bestiejs/punycode.js/issues/40
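
A quick way to see this for the mixed string is sketched below. It only approximates Nameprep with NFKC plus `downcase`, as in the irb session earlier in the thread; the full RFC 3491 profile also has mapping tables, prohibited characters, and bidi checks. `nameprep_like` is a hypothetical helper name.

```ruby
# Rough stand-in for Nameprep: NFKC normalization followed by lowercasing.
# This is an approximation, not the full RFC 3491 profile.
def nameprep_like(label)
  label.unicode_normalize(:nfkc).downcase
end

# Fullwidth "pay" (U+FF50, U+FF41, U+FF59) followed by ASCII "pal".
mixed = "\uFF50\uFF41\uFF59pal"
prepared = nameprep_like(mixed)

puts prepared
# Once the label is pure ASCII there is nothing left to Punycode-encode,
# so no xn-- form is produced for it.
puts prepared.ascii_only?
```

A converter that skips this preparation step encodes the raw fullwidth characters instead, which is where the xn-- output comes from.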

arm 3499 days ago

Those characters are not fullwidth.

This:

www.ｐａｙｐａｌ.com

or this:

www.ＰＡＹＰＡＬ.com

would be fullwidth.

What you actually posted are characters in the Mathematical Alphanumeric Symbols block. Specifically:

𝙿 — U+1D67F MATHEMATICAL MONOSPACE CAPITAL P

𝙰 — U+1D670 MATHEMATICAL MONOSPACE CAPITAL A

𝚈 — U+1D688 MATHEMATICAL MONOSPACE CAPITAL Y

𝙻 — U+1D67B MATHEMATICAL MONOSPACE CAPITAL L
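
These characters also carry compatibility decompositions to plain ASCII letters, which can be confirmed with a short sketch (assuming Ruby 2.2 or later; the constant and method names are only illustrative):

```ruby
# The label from the original post, written as escapes so the code points
# are explicit: mathematical monospace capitals P, A, Y, P, A, L.
SPOOFED_LABEL = "\u{1D67F}\u{1D670}\u{1D688}\u{1D67F}\u{1D670}\u{1D67B}"

# Format each character's code point in the usual U+XXXX notation.
def code_points(text)
  text.each_char.map { |c| format("U+%04X", c.ord) }
end

puts code_points(SPOOFED_LABEL).join(" ")

# NFKC maps each mathematical letter to its ASCII counterpart;
# lowercasing then gives the ordinary domain label.
puts SPOOFED_LABEL.unicode_normalize(:nfkc).downcase
```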

thefreeman 3498 days ago

How is that a phishing opportunity if it redirects you to the real website?
