>This is highly non-trivial once you realize that the world speaks more than ASCII and things like http://www.xn--n3h.net exist.
I was under the impression that requests to and from the server still used ASCII?
That is, the server would see a host header as this:
Host: www.xn--n3h.net
And not as this:
Host: www.☃.net
Anything else is a question of URL-encoding, which, if not used, would cause interesting bugs with space characters, let alone anything more exotic like snowmen.
Edit for completeness: in my server logs, the GET request for a /☃ URL is URL-encoded to
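A minimal Python sketch of the two ASCII forms described above: the hostname goes out as IDNA/Punycode and the path is percent-encoded as UTF-8. It assumes Python's built-in `idna` codec (which implements IDNA 2003) and `urllib.parse.quote`; `host_header` and `request_line` are hypothetical helpers for illustration, not part of any real HTTP client:

```python
from urllib.parse import quote

def host_header(hostname):
    """Hypothetical helper: the Host header as sent on the wire (IDNA/Punycode)."""
    return "Host: " + hostname.encode("idna").decode("ascii")

def request_line(path):
    """Hypothetical helper: the GET line with the path percent-encoded as UTF-8."""
    # quote() keeps '/' by default and escapes everything else outside the
    # unreserved set, including spaces and non-ASCII characters.
    return "GET " + quote(path) + " HTTP/1.1"

print(host_header("www.☃.net"))       # Host: www.xn--n3h.net
print(request_line("/☃"))             # GET /%E2%98%83 HTTP/1.1
print(request_line("/snow man"))      # GET /snow%20man HTTP/1.1
```

The last line shows the space-character case: without encoding, the space would break the request line itself.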
If a user copied the URL from the address bar, it will be correctly percent-encoded already.
You can put the same percent-encoded URL in the href attribute of a hyperlink. A properly encoded URL will not contain any character that requires escaping in an HTML context.
When a user clicks on that link, the browser will navigate to the percent-encoded URL but display the snowman icon in the address bar. If the user copies it, it will transparently turn back into the percent-encoded URL. All modern browsers do this.
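A rough Python sketch of the mapping described above: the wire URL is what goes in the `href`, the display form is what the address bar shows, and copying maps back to the wire form. It assumes a URL with only scheme, host and path (no port, userinfo, query or fragment), and `to_display` / `to_clipboard` are hypothetical names; real browsers layer extra IDN display policies on top of this:

```python
import html
from urllib.parse import urlsplit, urlunsplit, quote, unquote

def to_display(wire_url):
    """Hypothetical: decode Punycode host and percent-encoded path for display."""
    parts = urlsplit(wire_url)
    host = parts.hostname.encode("ascii").decode("idna")
    return urlunsplit((parts.scheme, host, unquote(parts.path), "", ""))

def to_clipboard(display_url):
    """Hypothetical: turn the displayed URL back into its ASCII wire form."""
    parts = urlsplit(display_url)
    host = parts.hostname.encode("idna").decode("ascii")
    return urlunsplit((parts.scheme, host, quote(parts.path), "", ""))

wire = "http://www.xn--n3h.net/%E2%98%83"
print(html.escape(wire) == wire)      # True: nothing to escape in the href
shown = to_display(wire)
print(shown)                          # http://www.☃.net/☃
print(to_clipboard(shown) == wire)    # True: the round trip is lossless
```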
I just tried doing that with a few domain names containing an umlaut (äöü), and every single time the letter was copied to the clipboard as-is (even though behind the scenes, at the request level, it would have been encoded). This is what I'd expect as a regular user: users don't want to deal with encoded, unreadable URLs.
I've had an international domain since 2006, and the sad truth is they still aren't widely supported 10 years later (the fuckyeahmarkdown website being a case in point). I don't think people are deliberately filtering out those characters; they just aren't aware that such names are even possible.
In the beginning I used to file bug reports whenever I encountered websites that couldn't handle my domain, but I eventually resigned myself to the fact that most people just don't care. Nowadays I don't even bother trying the Unicode form most of the time, and just use the Punycode version instead.
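One plausible way a site ends up rejecting such domains is an ASCII-only hostname check. A minimal sketch of that failure and a simple fix, assuming a hypothetical regex-based validator (not taken from any particular site) and the example name `bücher.example`: convert the name to its Punycode form with Python's `idna` codec before validating.

```python
import re

# Hypothetical ASCII-only label rule: letters, digits, hyphens,
# 1-63 chars, no leading or trailing hyphen.
LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")

def is_valid_ascii_hostname(name):
    """Naive check that silently rejects any non-ASCII (Unicode) label."""
    return all(LABEL.fullmatch(label) for label in name.rstrip(".").split("."))

def is_valid_hostname(name):
    """Accept IDNs by converting to the ASCII (Punycode) form first."""
    try:
        ascii_name = name.encode("idna").decode("ascii")
    except UnicodeError:  # empty/too-long labels or prohibited characters
        return False
    return is_valid_ascii_hostname(ascii_name)

print(is_valid_ascii_hostname("bücher.example"))  # False
print("bücher.example".encode("idna"))            # b'xn--bcher-kva.example'
print(is_valid_hostname("bücher.example"))        # True
```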