Hacker News
by tokenizerrr 3567 days ago
Right but how does the user submit it and what do you put in the href?

If a user copied the URL from the address bar, it will be correctly percent-encoded already.

You can put the same percent-encoded URL in the href attribute of a hyperlink. Other than the & between query parameters, which should be written as &amp;, a properly encoded URL will not contain any character that requires escaping in a double-quoted HTML attribute.
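A minimal sketch of building such a link in Python, assuming a made-up example.com URL with a snowman in the path. The URL and the link text are placeholders for illustration only; html.escape is applied so that the & in the query string is written as &amp; inside the attribute.

```python
from html import escape
from urllib.parse import quote

# Hypothetical URL for illustration: a snowman in the path plus two query parameters.
url = "https://example.com/" + quote("☃") + "?a=1&b=2"

# Escape for the HTML attribute context; only the "&" changes here.
link = f'<a href="{escape(url, quote=True)}">snowman</a>'

print(url)
print(link)
```

The percent-encoded path passes through the HTML escaping step unchanged, which is the point being made above.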

When a user clicks on that link, the browser will navigate to the percent-encoded URL but display the snowman icon in the address bar. If the user copies it, it will transparently turn back into the percent-encoded URL. All modern browsers do this.

I just tried doing that with a few domain names containing an umlaut (äöü), and every single time the letter itself was copied to the clipboard (even though behind the scenes, at the request level, it would have been encoded). This is what I expect as a regular user: regular users don't want to deal with encoded, unreadable URLs.
I tried with http://њњњ.срб , which Firefox copies correctly, but Chromium copies as http://xn--g2aaa.xn--90a3ac/ — not very useful.

This is a different mechanism from the one used for the path part, where both Firefox and Chromium give https://ru.wikipedia.org/wiki/%D0%A0%D0%BE%D1%81%D1%81%D0%B8... rather than the readable https://ru.wikipedia.org/wiki/Россия

The two methods are Punycode [1], used for the host name, and percent-encoding [2], used for the path.

[1] https://en.wikipedia.org/wiki/Punycode

[2] https://en.wikipedia.org/wiki/Percent-encoding
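To make the difference concrete, here is a rough sketch using only the Python standard library. It assumes Python's built-in "idna" codec (which implements the older IDNA 2003 rules) is close enough to what browsers do for these particular names; that is an assumption, not a statement about browser internals.

```python
from urllib.parse import quote, unquote

host = "њњњ.срб"
path_segment = "Россия"

# Host names: each non-ASCII label becomes an "xn--" Punycode label.
ascii_host = host.encode("idna").decode("ascii")

# Paths: each UTF-8 byte of a non-ASCII character becomes %XX.
encoded_segment = quote(path_segment)

print(ascii_host)
print(encoded_segment)

# Both encodings are reversible, which is how a browser can show the readable form.
print(ascii_host.encode("ascii").decode("idna"))
print(unquote(encoded_segment))
```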

If you select the entire URL bar, you get the encoded form. If you leave off the protocol, or just the h, you get it unencoded.
If I'm a user writing a URL, then I will write it as it appears in the URL bar. That means you must be able to accept URLs that contain Unicode.

Keep in mind that Unicode isn't just for emoji. Plenty of languages use characters that are not in ASCII.
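One way to accept such input is to normalise whatever the user typed into the ASCII form before storing or requesting it. The function below is only a sketch under a few assumptions: to_ascii_url is a hypothetical helper name, it uses Python's IDNA 2003 codec for the host, it drops any user:password part, and it does not handle IPv6 literals.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def to_ascii_url(url: str) -> str:
    """Hypothetical helper: convert a URL typed with Unicode into ASCII form."""
    parts = urlsplit(url)

    # Host: Punycode-encode non-ASCII labels (userinfo is dropped in this sketch).
    host = parts.hostname or ""
    if host:
        host = host.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Path, query, fragment: percent-encode non-ASCII characters.
    # "%" is treated as safe so already-encoded input is not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, host, path, query, fragment))

print(to_ascii_url("https://ru.wikipedia.org/wiki/Россия"))
print(to_ascii_url("http://њњњ.срб/"))
```

Because "%" is left alone, passing in a URL that was already copied in encoded form should come back unchanged, so the same function can be applied to both kinds of input.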