by celoyd 5696 days ago
I understand the need for localization and all, but 46 THOUSAND characters? Jeez.

There are that many Han characters alone, so I’m not sure what the surprise is. It’s not like you have to hard-code them in your grammar.

If anything, I’d hope that new languages in 2010 allow any of the roughly 100,000 non-control non-whitespace [edit: non-punctuation] Unicode characters. For a lot of the code I see, ASCII is at least as constraining as, say, fixnums would be.

2 comments

Actually, it is possible it does. I only ran the test from 0x0000 to 0xFFFF. Maybe I should revisit.
Moreover, you already have trivial code for that in $JAVAISH_IMPLEMENTATION's Character.isUnicodeIdentifier{Start,Part}.
I used Character.isJavaIdentifierStart(int) and Character.isJavaIdentifierPart(int) to write a file with the ranges that I cut and pasted into my C# code. And thank goodness, too! I'm sure I would've made a typo or missed characters if I had to type all that myself.
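For anyone who wants to reproduce that kind of range dump, here is a rough sketch of how it might be done. It is not the poster's actual code: the class name `IdentifierRanges` and the `ranges` helper are made up for illustration, and it assumes Java 8 or later. It scans the whole code point space up to 0x10FFFF rather than stopping at 0xFFFF, and collapses consecutive matches into inclusive ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class IdentifierRanges {
    // Hypothetical helper: collapse every code point in [from, to] that
    // satisfies `test` into inclusive {start, end} ranges.
    static List<int[]> ranges(int from, int to, IntPredicate test) {
        List<int[]> out = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a range"
        for (int cp = from; cp <= to; cp++) {
            if (test.test(cp)) {
                if (start < 0) start = cp;           // open a new range
            } else if (start >= 0) {
                out.add(new int[] {start, cp - 1});  // close the open range
                start = -1;
            }
        }
        if (start >= 0) out.add(new int[] {start, to}); // range runs to the end
        return out;
    }

    public static void main(String[] args) {
        // Scan all of Unicode (0x0000..0x10FFFF), not just 0x0000..0xFFFF.
        // The method reference resolves to the int overload of isJavaIdentifierStart.
        List<int[]> starts = ranges(Character.MIN_CODE_POINT,
                                    Character.MAX_CODE_POINT,
                                    Character::isJavaIdentifierStart);
        for (int[] r : starts) {
            System.out.printf("0x%04X-0x%04X%n", r[0], r[1]);
        }
    }
}
```

Swapping in `Character::isJavaIdentifierPart` (or the `isUnicodeIdentifierStart`/`isUnicodeIdentifierPart` pair) gives the other tables. The exact ranges printed depend on the Unicode version your JDK ships with, so the output is best regenerated rather than copied between versions.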