by bjt2n3904 2864 days ago
Ugh. This just further poisons me against Unicode being a good thing. ASCII or bust.

"ABCD" and "𝓐𝓑𝓒𝓓" are the same thing, but they also aren't. Am I supposed to normalize everything on username creation to prevent people from making duplicates?

2 comments

You either have to do that, or restrict usernames to ASCII subset only (and face the wrath of non-Latin alphabet users), or restrict to a whitelist of ranges that only represent alpha characters across all languages (i.e. not math symbols, nor box-drawing, nor emoji, etc.)

i18n is complicated because the diversity of language is itself complicated. I feel your pain, though, don't get me wrong.
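For what it's worth, a rough sketch of the first and third options in Python, using only the standard library's unicodedata module. The function names are made up for illustration, and treating "letters, combining marks and decimal digits" as the whitelist is my assumption, not an established rule. NFKC folds compatibility characters such as the mathematical script letters onto their plain forms, so the duplicate check is done on the folded key:

```python
import unicodedata


def canonical_username(raw: str) -> str:
    """Return a comparison key: compatibility-normalized and case-folded.

    Hypothetical helper. Store this key next to the display name and
    enforce uniqueness on the key.
    """
    # NFKC maps compatibility characters (e.g. U+1D4D0, a mathematical
    # bold script "A") to their ordinary equivalents.
    folded = unicodedata.normalize("NFKC", raw).casefold()
    # Case folding can undo normalization in rare cases, so normalize again.
    return unicodedata.normalize("NFKC", folded)


def is_letters_and_digits(name: str) -> bool:
    """Allow only letters (L*), combining marks (M*) and decimal digits (Nd).

    Assumed whitelist: this rejects symbols, box-drawing characters, emoji,
    punctuation and whitespace.
    """
    if not name:
        return False
    for ch in name:
        cat = unicodedata.category(ch)
        if not (cat[0] in "LM" or cat == "Nd"):
            return False
    return True


def accept(raw: str) -> bool:
    # Check the category whitelist on the normalized form: the mathematical
    # letters are themselves classified as letters before normalization.
    return is_letters_and_digits(canonical_username(raw))
```

With this, "ABCD" and "𝓐𝓑𝓒𝓓" produce the same key, so the second registration would be caught as a duplicate. It does nothing about look-alikes across scripts (Latin "a" vs. Cyrillic "а"), which is a separate problem.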

Don't normalise, just set the rules you want to apply. Internationalised domain names suffer from the same issue, so cribbing their fix would probably work for you: pick a script and restrict the allowable characters to it.

You probably never want to allow unrestricted use of any character set for a username, even ASCII - otherwise I could take the username 'bjt2n3904 ' (note the trailing space), which would look identical to yours on the page.
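A minimal sketch of the "pick a script" rule plus the whitespace check, again in Python. Caveats up front: the standard library has no Unicode script property, so this uses the first word of each character's Unicode name ("LATIN", "CYRILLIC", ...) as a crude stand-in, and both ALLOWED_SCRIPTS and the function names are hypothetical. This is not what the IDN rules actually specify, only the same idea in miniature:

```python
import unicodedata

# Hypothetical allow-list; extend it with whatever scripts you support.
ALLOWED_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}


def rough_script(ch: str) -> str:
    """Approximate a character's script by the first word of its name.

    Heuristic only: a real implementation should use the Unicode Script
    property from a library that exposes it.
    """
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def validate_username(name: str) -> bool:
    """Accept names whose letters all come from one allowed script."""
    # Reject empty names and leading/trailing whitespace ('bjt2n3904 ').
    if not name or name != name.strip():
        return False
    scripts = set()
    for ch in name:
        cat = unicodedata.category(ch)
        if cat == "Nd":
            continue  # decimal digits may accompany any script
        if not cat.startswith("L"):
            return False  # no spaces, punctuation, symbols, emoji, controls
        scripts.add(rough_script(ch))
    # Exactly one script, and it must be on the allow-list. Names made only
    # of digits are rejected because they contain no letters at all.
    return len(scripts) == 1 and scripts <= ALLOWED_SCRIPTS
```

Under these rules 'bjt2n3904' passes, 'bjt2n3904 ' fails on the trailing space, and a Latin name with one Cyrillic letter swapped in fails the single-script check.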