by Kenji 3498 days ago
Unicode URLs are the devil. Too many indistinguishable characters. URLs should stay pure ASCII, imho. And I say that as someone whose language requires non-ASCII symbols.

Or, in Bruce Schneier's words: "Unicode is just too complex to ever be secure."
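To make the "indistinguishable characters" point concrete, here is a minimal Python sketch. The helper names `is_pure_ascii_host` and `ascii_host` are hypothetical. It pulls the host out of a URL and shows its ASCII form using Python's built-in "idna" codec, which implements the older IDNA 2003 rules (RFC 3490), not IDNA 2008. A Cyrillic "а" (U+0430) looks like a Latin "a" on screen, but the encoded host comes out as an "xn--" Punycode label.

```python
from urllib.parse import urlsplit


def is_pure_ascii_host(url: str) -> bool:
    """True if the URL's host contains only ASCII characters."""
    host = urlsplit(url).hostname or ""
    return host.isascii()  # str.isascii() needs Python 3.7+


def ascii_host(url: str) -> str:
    """Return the host in its ASCII (Punycode) form, so lookalikes stand out.

    Uses Python's built-in "idna" codec (IDNA 2003 / RFC 3490); every
    non-ASCII label is rewritten as an "xn--" label.
    """
    host = urlsplit(url).hostname or ""
    return host.encode("idna").decode("ascii")


latin = "https://apple.com/"
cyrillic = "https://\u0430pple.com/"  # first letter is CYRILLIC SMALL LETTER A

print(is_pure_ascii_host(latin), ascii_host(latin))        # True apple.com
print(is_pure_ascii_host(cyrillic), ascii_host(cyrillic))  # False, then an "xn--" host
```

Both URLs render almost identically in a browser bar; only the encoded form makes the difference obvious.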


But what about the poor underrepresented folks using foreign character sets?

You really need to support things like 'sub café {} café()' => Undefined subroutine café in your friendly, social programming language, or you will be accused of discrimination. Of course, the two é are different code point sequences that are not normalized.
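The "two é" trap exists because Unicode has two spellings of the same visible character: the precomposed U+00E9 and "e" followed by U+0301 COMBINING ACUTE ACCENT. A minimal Python sketch (the `same_identifier` helper is hypothetical) shows the two strings compare unequal until normalized. For contrast, Python itself NFKC-normalizes identifiers while parsing (PEP 3131), so the equivalent of the Perl example resolves to the same function there.

```python
import unicodedata

precomposed = "caf\u00e9"   # é as one code point, U+00E9
decomposed = "cafe\u0301"   # e + U+0301 COMBINING ACUTE ACCENT


def same_identifier(a: str, b: str) -> bool:
    """Compare two names after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                 # False
print(same_identifier(precomposed, decomposed))  # True

# Python normalizes identifiers to NFKC at parse time (PEP 3131),
# so defining with one spelling and calling with the other works.
namespace = {}
exec(f"def {precomposed}(): return 'called'\nresult = {decomposed}()", namespace)
print(namespace["result"])                       # called
```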

Which Unicode-friendly language actually checks for mixed-script confusables? My guess is Java only. Even Perl 6 falls into this trap.

http://unicode.org/reports/tr39/#Mixed_Script_Confusables
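UTS #39 defines the real mixed-script test in terms of the Unicode Script_Extensions property and its confusables data files, neither of which Python's standard library exposes. As a rough sketch only, assuming a letter's script can be approximated by the first word of its Unicode character name ("LATIN", "CYRILLIC", "GREEK"), a check could look like this (`rough_scripts` and `is_mixed_script` are hypothetical names):

```python
from __future__ import annotations

import unicodedata


def rough_scripts(identifier: str) -> set[str]:
    """Guess the set of scripts used by the letters of an identifier.

    Crude approximation: takes the first word of each letter's Unicode name.
    This is NOT the UTS #39 algorithm, which uses Script_Extensions data.
    """
    scripts: set[str] = set()
    for ch in identifier:
        if not ch.isalpha():
            continue  # skip digits, "_" and combining marks such as U+0301
        name = unicodedata.name(ch, "")
        if name:
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(identifier: str) -> bool:
    """True if the identifier's letters come from more than one (rough) script."""
    return len(rough_scripts(identifier)) > 1


print(is_mixed_script("apple"))       # False
print(is_mixed_script("\u0430pple"))  # True: Cyrillic а + Latin pple
print(is_mixed_script("cafe\u0301"))  # False: the combining accent is skipped
```

The name heuristic misfires in places: Japanese identifiers legitimately mix letters whose names start with "CJK", "HIRAGANA" and "KATAKANA", which the UTS #39 restriction levels allow, and a name spelled entirely in Cyrillic lookalikes passes because it is not mixed at all.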

When it's just accents, it's OK. But when your users have a language that uses a radically different alphabet, sometimes they can't even read Latin transliterations.

Agreed. But then you need to declare your exotic encoding somehow, such as in Perl via "use encoding 'greek';", and then the parser doesn't need to guess about mixed-script encodings on every identifier: only Latin and Greek are valid, everything else is invalid.

How many languages even check for mixed-script confusables? For dynamic languages this check is much too expensive, yet they are the ones leading the "good cause": allowing everything and checking nothing.
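The declared-encoding idea can be approximated in Python by asking whether an identifier is representable in the declared legacy encoding. ISO-8859-7 (Python codec name "iso8859_7") contains ASCII plus Greek, so anything outside that set is rejected. This is a sketch of the idea, not of how Perl's encoding pragma works internally, and `fits_declared_encoding` is a hypothetical helper.

```python
def fits_declared_encoding(identifier: str, encoding: str = "iso8859_7") -> bool:
    """True if every character of `identifier` exists in the declared encoding.

    ISO-8859-7 covers ASCII plus Greek, so the encoding itself acts as the
    allowlist: Latin (ASCII only) and Greek pass, everything else fails.
    """
    try:
        identifier.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(fits_declared_encoding("alpha"))                     # True
print(fits_declared_encoding("\u03b1\u03bb\u03c6\u03b1"))  # True: Greek "αλφα"
print(fits_declared_encoding("\u0430pple"))  # False: Cyrillic а is not in ISO-8859-7
print(fits_declared_encoding("caf\u00e9"))   # False: é is not in ISO-8859-7 either
```

This only narrows the character set. Greek "ο" (omicron) and Latin "o" both fit ISO-8859-7, so confusables between the two allowed scripts still get through.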