by jcranmer 2727 days ago
Case insensitivity sounds good except it quickly runs afoul of "language isn't so simple."

If I define a variable as "groß", does "GROSS" or "GROẞ" match it (or both, which probably implies "gross" would match as well)? What about "ê" and "E"? Or the infamous i/I/İ/ı debacle, which could make matching "insane" to "INSANE" locale-dependent? How do you define case-insensitivity in a way that makes sense?
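To make those questions concrete, here is a rough sketch using Python's built-in case mappings, which follow the default (locale-independent) Unicode rules. The helper name `same_ignoring_case` is made up for illustration, and a language implementing case-insensitive identifiers could reasonably pick different rules.

```python
# Sketch only: Python's default, locale-independent Unicode case mappings.
def same_ignoring_case(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names using full Unicode case folding."""
    return a.casefold() == b.casefold()

print("groß".upper())                        # GROSS (uppercasing ß expands to two letters)
print("GROẞ".lower())                        # groß  (capital ẞ maps back to ß)
print(same_ignoring_case("groß", "GROSS"))   # True
print(same_ignoring_case("groß", "GROẞ"))    # True
print(same_ignoring_case("groß", "gross"))   # True, so all three names collide

# The Turkish/Azeri letters: U+0130 is dotted capital İ, U+0131 is dotless ı.
print("INSANE".lower() == "insane")          # True, but only because locale is ignored
print("\u0131".upper())                      # I (dotless ı uppercases to plain I)
print(len("\u0130".lower()))                 # 2 (i followed by U+0307 combining dot)
print(same_ignoring_case("\u0131", "I"))     # False: ı does not fold to i
```

Under Turkish casing rules, lowercasing "INSANE" would be expected to give "ınsane" instead, which is exactly the locale dependence in question.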

1 comments

These are solved problems, though, and Unicode identifiers are rare in practice...

See Normalization Form KC and Clause 21 of ISO/IEC 10646:2017.

"Normalization forms are the mechanisms allowing the selection of a unique coded representation among alternative, but equivalent coded text representations of the same text. Normalization forms for use with ISO/IEC 10646 are specified in the Unicode Standard UAX#15..." yada yada

Unicode normalization doesn't actually solve a single problem I mentioned. All of the listed characters are equal to themselves in both NFC and NFKC.
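A quick way to check this yourself, as a sketch with Python's standard `unicodedata` module. It assumes the precomposed code points for ê (U+00EA) and İ (U+0130); `is_stable` is just an illustrative name.

```python
import unicodedata

# The characters from the comment above, written as explicit code points
# so there is no ambiguity about precomposed vs. decomposed input.
chars = ["\u00df", "\u1e9e", "\u00ea", "E", "i", "I", "\u0130", "\u0131"]

def is_stable(ch: str) -> bool:
    """True if ch is unchanged by both NFC and NFKC normalization."""
    return all(unicodedata.normalize(form, ch) == ch for form in ("NFC", "NFKC"))

for ch in chars:
    print(f"U+{ord(ch):04X} {ch} stable={is_stable(ch)}")
```

Every line should report `stable=True`: normalization picks one representation of the same character, it never maps one case to another.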

Also Unicode identifiers aren't rare in terms of language support. Most of the popular languages support them--C/C++, C#, Java, PHP, Python, Perl, Swift, Go, Rust, Ruby, JavaScript, even Ada. It's actually difficult to find a popular language that prohibits Unicode identifiers entirely (MATLAB does, not sure about Visual Basic).